English Pages [397] Year 2019
Linear Algebra Concepts and Applications
AMS / MAA
TEXTBOOKS
VOL 47
Przemyslaw Bogacki
Providence, Rhode Island
Committee on Books
Jennifer J. Quinn, Chair

MAA Textbooks Editorial Board
Stanley E. Seltzer, Editor
Bela Bajnok, Matthias Beck, Heather Ann Dye, William Robert Green, Charles R. Hampton, Suzanne Lynne Larson, John Lorch, Michael J. McAsey, Virginia Noonburg, Jeffrey L. Stuart, Ron D. Taylor, Jr., Elizabeth Thoren, Ruth Vanderpool
2010 Mathematics Subject Classification. Primary 15Axx, 15A03, 15A04, 15A06, 15A09, 15A15, 15A18, 97-01, 97H60.
For additional information and updates on this book, visit www.ams.org/bookpages/text-47
Library of Congress Cataloging-in-Publication Data Names: Bogacki, Przemyslaw, 1963- author. Title: Linear algebra : concepts and applications / Przemyslaw Bogacki. Description: Providence, Rhode Island : MAA Press, an imprint of the American Mathematical Society, [2019] | Series: AMS/MAA textbooks ; volume 47 | Includes index. Identifiers: LCCN 2018041789 | ISBN 9781470443849 (alk. paper) Subjects: LCSH: Algebras, Linear–Textbooks. | AMS: Linear and multilinear algebra; matrix theory – Basic linear algebra – Basic linear algebra. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Vector spaces, linear dependence, rank. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Linear transformations, semilinear transformations. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Linear equations. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Matrix inversion, generalized inverses. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Determinants, permanents, other special matrix functions. msc | Linear and multilinear algebra; matrix theory – Basic linear algebra – Eigenvalues, singular values, and eigenvectors. msc | Mathematics education – Instructional exposition (textbooks, tutorial papers, etc.). msc | Mathematics education – Algebra – Linear algebra. msc Classification: LCC QA184.2 .B64 2019 | DDC 512/.5–dc23 LC record available at https://lccn.loc.gov/2018041789
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for permission to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For more information, please visit www.ams.org/publications/pubpermissions. Send requests for translation rights and licensed reprints to reprint-permission@ams.org.

© 2019 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at https://www.ams.org/
To my wife
Contents

Preface
Glossary of Notation

1 Vectors and Matrices
  1.1 Vectors
  1.2 Matrices
  1.3 Matrix Multiplication
  1.4 Introduction to Linear Transformations
  1.5 Chapter Review

2 Linear Systems
  2.1 Systems of Linear Equations
  2.2 Elementary Matrices and the Geometry of Linear Systems
  2.3 Matrix Inverse
  2.4 Applications of Linear Systems and Matrix Factorizations
  2.5 Chapter Review

3 Determinants
  3.1 Cofactor Expansions
  3.2 Applications of Determinants
  3.3 Chapter Review

4 Vector Spaces
  4.1 Vector Spaces
  4.2 Subspaces
  4.3 Linear Independence
  4.4 Basis and Dimension
  4.5 Coordinates
  4.6 Rank and Nullity
  4.7 Chapter Review

5 Linear Transformations
  5.1 Linear Transformations in General Vector Spaces
  5.2 Kernel and Range
  5.3 Matrices of Linear Transformations
  5.4 Chapter Review

6 Orthogonality and Projections
  6.1 Orthogonality
  6.2 Orthogonal Projections and Orthogonal Complements
  6.3 Gram-Schmidt Process and Least Squares Approximation
  6.4 Introduction to Singular Value Decomposition
  6.5 Chapter Review

7 Eigenvalues and Singular Values
  7.1 Eigenvalues and Eigenvectors
  7.2 Diagonalization
  7.3 Applications of Eigenvalues and Eigenvectors
  7.4 Singular Value Decomposition
  7.5 Chapter Review

A Answers to Selected Odd-Numbered Exercises
B Recurring References to Selected Transformations in R2
C Twelve Equivalent Statements
Index
Preface

This is a textbook designed for an introductory linear algebra course. While many students may have been exposed to some of the relevant topics (e.g., vectors, matrices, determinants), it is the author's experience that no such familiarity can be assumed of all students; consequently, this book assumes none.

At several places throughout the text, the symbol ò marks portions that require calculus. If you have taken some introductory calculus course(s) already, you should be able to go through most of these (and solve some of the exercises marked likewise). However, in many instances, knowledge of multivariable calculus is required. If you are going to take such a course in the future, you may want to revisit the respective portion of this text after completing it – doing so is likely to enhance your understanding of both subjects: calculus and linear algebra.

Most odd-numbered exercises are answered in Appendix A. The exceptions are those marked with an asterisk (*). These range from quite easy to fairly challenging, but they generally expect you to solve them more independently than most other exercises do. Some of these may ask you to prove results that are actually used later in the book.

In many sections you will find exercises in which you are asked if the given statement is true or false ( T/F? ). In some cases, you will be guided more specifically to
• find an example that matches the given description or explain why this cannot be done ( ? ) or
• find if the given statement always holds or find a counterexample ( ? ).
Another feature of this text is a deck of cards containing equivalent statements and (on reverse side) their “negatives” located at the end of the book. Throughout the text, we shall continue to build these equivalences up. As more and more statements are added to the list, you may find it helpful to actually cut these out and use them as a manipulative when working with problems requiring them. The Linear Algebra Toolkit (latoolkit.com) is an online environment designed by the author to help a student learn to correctly perform the steps involved in some basic linear algebra procedures and to learn why these steps are used. Many of the instances where the Toolkit may be useful are marked by the symbol
[LA Toolkit logo: latoolkit.com].

I would like to express my gratitude to all those who have supported my work on this book, including my family and my colleagues in the Department of Mathematics and Statistics at Old Dominion University. Last but not least, I greatly appreciate the feedback I have received from other professors and from students who commented on previous versions of this manuscript.
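Glossary of Notation

| Notation | Meaning | Page |
| --- | --- | --- |
| $\vec{v} = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ | n-vector | 2 |
| $\mathbb{R}^n$ | n-space | 2 |
| $\overrightarrow{PQ}$ | vector from P to Q | 2 |
| $\lVert \vec{v} \rVert$ | length (or magnitude) of $\vec{v}$ | 3 |
| $\vec{u} \cdot \vec{v}$ | dot product | 8 |
| $\vec{u} \times \vec{v}$ | cross product | 15 |
| $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$ | matrix | 17 |
| $I_n$ | n × n identity matrix | 17 |
| $A^T$ | matrix transpose | 20 |
| $\operatorname{col}_j A$ | jth column of A | 21 |
| $\operatorname{row}_i A$ | ith row of A | 21 |
| $A^k$ | matrix power | 29 |
| $A^{-1}$ | matrix inverse | 87 |
| $\det A = \lvert A \rvert$ | determinant of the matrix A | 116 |
| $M_{ij}$, $A_{ij}$ | minor and cofactor of the matrix A | 116 |
| $\operatorname{adj} A$ | adjoint of the matrix A | 132 |
| $M_{mn}$ | space of all m × n matrices | 143 |
| $F_X$ | space of all functions defined over a nonempty set $X \subseteq \mathbb{R}$ | 143 |
| $P_n$ | space of all polynomials of degree n or less | 144 |
| $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$ | span of vectors $\vec{u}_1, \ldots, \vec{u}_k$ | 155 |
| $\{\vec{e}_1, \ldots, \vec{e}_n\}$ | standard basis for $\mathbb{R}^n$ | 177 |
| $\dim V$ | dimension of V | 182 |
| $[\vec{v}]_S$ | coordinate vector of $\vec{v}$ with respect to the basis S | 193 |
| $\operatorname{rank} A$ | rank of the matrix A | 209 |
| $\operatorname{nullity} A$ | nullity of the matrix A | 210 |
| $\ker F$ | kernel of the transformation F | 225 |
| $\operatorname{range} F$ | range of the transformation F | 225 |
| $F^{-1}$ | inverse of the transformation F | 231 |
| $P_{T \leftarrow S}$ | coordinate-change matrix | 246 |
| $V^{\perp}$ | orthogonal complement of the subspace V | 266 |
| $\operatorname{proj}_V \vec{u}$ | orthogonal projection of the vector $\vec{u}$ onto the subspace V | 267 |
| $A^{+}$ | pseudoinverse of the matrix A | 347 |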
Chapter 1
Vectors and Matrices
One way to look at vectors and matrices is to view them as data structures: tables (or arrays) of numbers. However, as we will see in this chapter (and in the rest of this book), vectors and matrices are much more than just storage devices. Their many applications make them indispensable tools of modern science and mathematics. An important example of such an application is three-dimensional modeling (or 3D modeling) – an area of computer graphics that relies on various mathematical tools involving vectors and matrices. These tools allow a digital content creator to represent a variety of subjects and to manipulate those representations. The following figure shows renderings of a face of a human female model.1
[Figure: three renderings of the model's face, for t = 0, t = 0.5, and t = 1.]
At the end of Section 1.1, we will discuss the notion of a polygon mesh, which is used in this model. As we shall see, vectors serve as the key ingredient of such polygon meshes; furthermore, by performing appropriate operations on those vectors, we can even make our model smile! As we progress in our studies of linear algebra, we will be able to do even more: for instance, later in this chapter, we will see how matrices can be used to perform transformations (e.g., reflections and rotations) on such models.
1 These renderings actually use two models created by Daz Productions, Inc.: Victoria 3.0 (including Expression Morphs) and Millenium Flip Hairstyle 2.0.
1.1 Vectors

DEFINITION
A sequence of n real numbers is called an n-vector. Such a vector, denoted by
$$\vec{v} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},$$
is said to have components $a_1, a_2, \ldots,$ and $a_n$. The set of all n-vectors is called the n-space and is denoted $\mathbb{R}^n$.

[Figure: the vector $\vec{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ drawn in the xy-plane as an arrow starting at the origin.]
Most likely, you are already familiar with n-vectors, in particular those for n = 2 (vectors in the plane) and n = 3 (vectors in space). Although the scope of this book will extend beyond these two cases, our ability to visualize vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ as arrows will be of great importance to us. Often, these arrows will be fixed to start at the origin, like those contained in the figures on the left.

[Figure: the vector $\vec{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ drawn in xyz-space as an arrow starting at the origin O.]

Sometimes, however, the arrows can be positioned elsewhere. We refer to the point P at the beginning of the arrow as the initial point, while the point Q at the end (at the tip of the arrow) is called the terminal point. A vector corresponding to such an arrow from P to Q can be denoted $\overrightarrow{PQ}$. If the points are specified by their coordinates $P(x_1, x_2, \ldots, x_n)$ and $Q(y_1, y_2, \ldots, y_n)$, then the vector can be expressed as
$$\overrightarrow{PQ} = \begin{bmatrix} y_1 - x_1 \\ y_2 - x_2 \\ \vdots \\ y_n - x_n \end{bmatrix}.$$
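EXAMPLE 1.1
Consider the points P(1, 3), Q(4, 4), S(3, 1), as well as the origin O(0, 0). The following three vectors are depicted in the graph on the left:
$$\vec{u} = \overrightarrow{PQ} = \begin{bmatrix} 4-1 \\ 4-3 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix},$$
$$\vec{v} = \overrightarrow{SP} = \begin{bmatrix} 1-3 \\ 3-1 \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix},$$
$$\vec{w} = \overrightarrow{OS} = \begin{bmatrix} 3-0 \\ 1-0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$

[Figure: the points O, P, Q, S in the xy-plane with the arrows $\vec{u}$ (from P to Q), $\vec{v}$ (from S to P), and $\vec{w}$ (from O to S).]

Geometrically, $\vec{u}$ in the above example is a translation of $\vec{w}$. However, components of any vector merely specify the vector's direction and length and not its position. Consequently, $\vec{u}$ and $\vec{w}$ in our example are considered equal.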
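The component formula for a vector from P to Q can be tried out directly. The following is a minimal sketch in Python, assuming points and vectors are stored as plain tuples and lists; the helper name `vector_between` is ours for illustration and does not come from the text.

```python
def vector_between(p, q):
    """Return the components of the vector from point p to point q."""
    if len(p) != len(q):
        raise ValueError("both points need the same number of coordinates")
    # Subtract initial-point coordinates from terminal-point coordinates.
    return [qi - pi for pi, qi in zip(p, q)]


# Points of Example 1.1
P, Q, S, O = (1, 3), (4, 4), (3, 1), (0, 0)

u = vector_between(P, Q)
v = vector_between(S, P)
w = vector_between(O, S)

print(u, v, w)   # [3, 1] [-2, 2] [3, 1]
print(u == w)    # True: equal components, even though the arrows sit elsewhere
```

Comparing the component lists with `==` mirrors the idea that two vectors are equal when their corresponding components agree, regardless of where the arrows are drawn.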
DEFINITION
Generally, two n-vectors
$$\vec{u} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad\text{and}\quad \vec{v} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$
are equal if their corresponding components are equal; i.e., $a_i = b_i$ for $i = 1, 2, \ldots, n$.
For an additional discussion of vectors that can start at any point compared to vectors restricted to starting at the origin only, refer to the subsection on p. 10.
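[Figure: the vector $\vec{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ drawn in xyz-space from the origin O to the point B, with A the projection of B onto the xy-plane. The length of OA is $\sqrt{x^2 + y^2}$, the length of AB is $|z|$, and the length of the hypotenuse is $\|OB\| = \sqrt{\|OA\|^2 + \|AB\|^2} = \sqrt{x^2 + y^2 + z^2}$.]

The formal definition of the length of a vector follows:

DEFINITION
The length (or magnitude) of the vector $\vec{v} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$ is
$$\|\vec{v}\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}.$$

$\|\overrightarrow{PQ}\|$ represents the distance between the points P and Q.

EXAMPLE 1.2
The magnitudes of the vectors
$$\vec{u} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad\text{and}\quad \vec{w} = \begin{bmatrix} 0 \\ 1 \\ 3 \\ 1 \\ -2 \end{bmatrix}$$
are
• $\|\vec{u}\| = \sqrt{3^2 + 4^2} = 5$,
• $\|\vec{v}\| = \sqrt{(-1)^2 + 2^2 + 4^2} = \sqrt{21}$, and
• $\|\vec{w}\| = \sqrt{0^2 + 1^2 + 3^2 + 1^2 + (-2)^2} = \sqrt{15}$,
respectively.

The length of an n-vector is a nonnegative number. The only n-vector with zero length is the vector all of whose components are equal to 0 (see Exercise 55):

DEFINITION
The n-vector $\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ is called the zero vector and is denoted $\vec{0}$.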
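The length formula translates into a few lines of code. This is a minimal sketch, assuming vectors are plain Python lists; the function name `length` is our own choice.

```python
import math


def length(v):
    """Length (magnitude) of an n-vector: square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in v))


# Vectors of Example 1.2
print(length([3, 4]))                                       # 5.0
print(math.isclose(length([-1, 2, 4]) ** 2, 21))            # True
print(math.isclose(length([0, 1, 3, 1, -2]) ** 2, 15))      # True

# The zero vector is the only vector of length 0.
print(length([0, 0, 0]) == 0)                               # True
```

The squared lengths are compared with `math.isclose` because square roots such as $\sqrt{21}$ are stored only approximately in floating-point arithmetic.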
Vector operations

DEFINITION (Vector Addition)
The sum of two n-vectors $\vec{u} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$ is the n-vector
$$\vec{u} + \vec{v} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}.$$

A geometric interpretation of this definition is very helpful, especially in $\mathbb{R}^2$ and $\mathbb{R}^3$. In $\mathbb{R}^2$, taking $\vec{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} c \\ d \end{bmatrix}$ yields $\vec{u} + \vec{v} = \begin{bmatrix} a + c \\ b + d \end{bmatrix}$. Constructing a parallelogram with two adjacent sides made up by $\vec{u}$ and $\vec{v}$, translated (if necessary) so they have the same initial point, the sum is the vector from this common initial point to the opposite corner of the parallelogram.

[Figure: the parallelogram spanned by $\vec{u}$ and $\vec{v}$ in the xy-plane, with $\vec{u} + \vec{v}$ along its diagonal.]

[Figure: An alternative illustration of vector addition $\vec{u} + \vec{v}$ involves translating $\vec{v}$ so that its initial point coincides with the terminal point of $\vec{u}$.]

DEFINITION (Scalar Multiplication)
Given the scalar (real number) c and an n-vector $\vec{u} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$, the scalar multiple of $\vec{u}$ by c is the n-vector
$$c\vec{u} = \begin{bmatrix} ca_1 \\ ca_2 \\ \vdots \\ ca_n \end{bmatrix}.$$

[Figure: a vector $\vec{v}$ in the xy-plane together with its scalar multiples $-\vec{v}$ and $2\vec{v}$.]

The scalar multiple $c\vec{v}$ is parallel to the vector $\vec{v}$, i.e., $c\vec{v}$ is in the same direction as $\vec{v}$,2 but its length is adjusted:
$$\|c\vec{v}\| = \sqrt{(ca_1)^2 + \cdots + (ca_n)^2} = \sqrt{c^2}\,\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} = |c|\,\|\vec{v}\|.$$
If c < 0, then $c\vec{v}$ has opposite orientation to $\vec{v}$.

2 Strictly speaking, to refer to $c\vec{v}$ and $\vec{v}$ as parallel vectors (or vectors in the same direction), we must ensure $c \neq 0$ and $\vec{v} \neq \vec{0}$ since the zero vector has no direction.
EXAMPLE 1.3
For vectors $\vec{u} = \begin{bmatrix} -3 \\ 2 \\ 0 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 4 \\ 1 \\ 1 \\ 0 \end{bmatrix}$, we have
$$\vec{u} + \vec{v} = \begin{bmatrix} 1 \\ 3 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad 2\vec{v} = \begin{bmatrix} 8 \\ 2 \\ 2 \\ 0 \end{bmatrix}.$$

One scalar multiple of a nonzero vector $\vec{v}$ is of special significance: $\frac{1}{\|\vec{v}\|}\vec{v}$ is called the unit vector in the direction of $\vec{v}$ (the name is justified since $\left\|\frac{1}{\|\vec{v}\|}\vec{v}\right\| = \frac{1}{\|\vec{v}\|}\|\vec{v}\| = 1$).

The following theorem states ten properties of vector addition and scalar multiplication. We will revisit these in Chapter 4.

THEOREM 1.1
1. If $\vec{u}$ and $\vec{v}$ are n-vectors, then $\vec{u} + \vec{v}$ is also an n-vector.
2. $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ for all n-vectors $\vec{u}$ and $\vec{v}$.
3. $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$ for all n-vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$.
4. There exists an n-vector $\vec{z}$ such that $\vec{u} + \vec{z} = \vec{z} + \vec{u} = \vec{u}$ for all n-vectors $\vec{u}$.
5. For every n-vector $\vec{u}$, there exists an n-vector $\vec{d}$ such that $\vec{u} + \vec{d} = \vec{d} + \vec{u} = \vec{z}$.
6. If c is a real number and $\vec{u}$ is an n-vector, then $c\vec{u}$ is also an n-vector.
7. $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$ for all n-vectors $\vec{u}$, $\vec{v}$ and for all real numbers c.
8. $(c + d)\vec{u} = c\vec{u} + d\vec{u}$ for all n-vectors $\vec{u}$ and for all real numbers c and d.
9. $(cd)\vec{u} = c(d\vec{u})$ for all n-vectors $\vec{u}$ and for all real numbers c and d.
10. $1\vec{u} = \vec{u}$ for all n-vectors $\vec{u}$.

OUTLINE OF THE PROOF
Properties 1, 6, and 10 follow immediately from the corresponding definitions.
Property 2 holds because of the commutative property of real number addition: a + b = b + a, which applies at every component of the resulting vectors. Likewise, relevant properties of real numbers are used to justify 3, 7, 8, and 9.
In property 4, we let $\vec{z}$ be the zero n-vector $\vec{0}$.
In property 5, the vector $\vec{d}$ should be taken to be $(-1)\vec{u}$.
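Vector addition, scalar multiplication, and the unit vector can be sketched in Python as follows. This is an illustrative sketch only, assuming vectors are plain lists; the helper names `add`, `scale`, `length`, and `unit` are ours. The final lines spot-check properties 2 and 7 of Theorem 1.1 on one pair of vectors, which illustrates the properties but of course does not prove them.

```python
import math


def add(u, v):
    """Componentwise sum of two n-vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Scalar multiple c*v."""
    return [c * a for a in v]


def length(v):
    return math.sqrt(sum(a * a for a in v))


def unit(v):
    """Unit vector in the direction of a nonzero vector v."""
    n = length(v)
    if n == 0:
        raise ValueError("the zero vector has no direction")
    return scale(1 / n, v)


# Vectors of Example 1.3
u = [-3, 2, 0, 1]
v = [4, 1, 1, 0]
print(add(u, v))      # [1, 3, 1, 1]
print(scale(2, v))    # [8, 2, 2, 0]

# A unit vector has length 1 (up to floating-point rounding).
print(math.isclose(length(unit([3, 4])), 1.0))            # True

# Spot checks of properties 2 and 7 of Theorem 1.1
print(add(u, v) == add(v, u))                             # True
print(scale(3, add(u, v)) == add(scale(3, u), scale(3, v)))  # True
```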
Let us introduce some convenient terminology to refer to the vector $\vec{d}$ of property 5:

DEFINITION
Given an n-vector $\vec{u}$, the scalar multiple $(-1)\vec{u}$ is called the negative of $\vec{u}$ and is denoted $-\vec{u}$.

Adding scalar multiples of n-vectors results in an expression called a linear combination, which will be of great importance to us throughout this book.

DEFINITION (Linear Combination)
A linear combination of n-vectors $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k$ is the n-vector
$$c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k$$
where $c_1, c_2, \ldots, c_k$ are real numbers.

EXAMPLE 1.4
$$2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + (-3)\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 4\begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} -6 \\ 1 \end{bmatrix}.$$

The difference of two n-vectors is defined as the linear combination:
$$\vec{u} - \vec{v} = 1\vec{u} + (-1)\vec{v}.$$

[Figure: vectors $\vec{u}$, $\vec{v}$, and $-\vec{v}$ in the xy-plane, together with $\vec{u} - \vec{v}$ and its translation.]

As we have already pointed out, only vectors in $\mathbb{R}^2$ and in $\mathbb{R}^3$ can be visualized as arrows (in the plane and in space, respectively). While this approach does not work for vectors in $\mathbb{R}^4$, $\mathbb{R}^5$, etc., there are many other ways to interpret such vectors.
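A linear combination can be computed component by component. The sketch below assumes vectors are plain Python lists; the function name `linear_combination` is ours. It reproduces Example 1.4 and then forms a difference as the combination with coefficients 1 and −1.

```python
def linear_combination(coeffs, vectors):
    """Return c1*u1 + c2*u2 + ... + ck*uk for n-vectors u1, ..., uk."""
    if not vectors or len(coeffs) != len(vectors):
        raise ValueError("need one coefficient per vector and at least one vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same number of components")
    # The i-th component of the result combines the i-th components of all vectors.
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]


# Example 1.4
print(linear_combination([2, -3, 4], [[1, 2], [0, 1], [-2, 0]]))   # [-6, 1]

# The difference u - v is the linear combination 1*u + (-1)*v.
u, v = [1, 2], [0, 1]
print(linear_combination([1, -1], [u, v]))                          # [1, 1]
```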
EXAMPLE 1.5
Gold has many uses, a number of which require it to be alloyed with other metals, such as silver, copper, zinc, nickel, etc. Consider the alloy called the "9 karat yellow alloy" containing
• 9 parts gold,
• 2 parts silver,
• 11 parts copper, and
• 2 parts zinc.

Note that the total number of parts is 24. It is customary to specify alloy content in such a way and to designate the number of parts corresponding to gold (out of 24) as the "karat number". Mathematically, the description above can be restated using the vector notation:
$$\vec{u} = \begin{bmatrix} 9/24 \\ 2/24 \\ 11/24 \\ 2/24 \end{bmatrix} \begin{matrix} \leftarrow \text{gold content} \\ \leftarrow \text{silver content} \\ \leftarrow \text{copper content} \\ \leftarrow \text{zinc content} \end{matrix}$$

This notation makes it possible to answer questions such as "How much of each metal is contained in 600g of the 9 karat alloy?" by simply forming a scalar multiple of $\vec{u}$:
$$600\vec{u} = 600\begin{bmatrix} 9/24 \\ 2/24 \\ 11/24 \\ 2/24 \end{bmatrix} = \begin{bmatrix} 225 \\ 50 \\ 275 \\ 50 \end{bmatrix}.$$
The answer is: "225g of gold, 50g of silver,...," etc.

Let us consider a second alloy made up of
• 21 parts gold,
• 2 parts silver, and
• 1 part copper.

Clearly, this alloy is different from the first. One difference that would be of great importance to the jeweler (and to the customer!) is the much higher gold content in the second alloy, which would make it much more expensive. Another difference (of interest to a linear algebra student) is that this alloy can be represented by a vector in $\mathbb{R}^3$, whereas the previous one required a vector in $\mathbb{R}^4$. However, to facilitate combining the two alloys, we shall represent the second one as a vector in $\mathbb{R}^4$ as well:
$$\vec{v} = \begin{bmatrix} 21/24 \\ 2/24 \\ 1/24 \\ 0 \end{bmatrix} \begin{matrix} \leftarrow \text{gold content} \\ \leftarrow \text{silver content} \\ \leftarrow \text{copper content} \\ \leftarrow \text{zinc content} \end{matrix}$$

Here is an example of a question that can be answered by applying vector operations to the two vectors: "What alloy do we obtain by mixing equal amounts of the first and the second one?"
$$\vec{w} = \frac{1}{2}\vec{u} + \frac{1}{2}\vec{v} = \frac{1}{2}\begin{bmatrix} 9/24 \\ 2/24 \\ 11/24 \\ 2/24 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 21/24 \\ 2/24 \\ 1/24 \\ 0 \end{bmatrix} = \begin{bmatrix} 15/24 \\ 2/24 \\ 6/24 \\ 1/24 \end{bmatrix}.$$
Vector $\vec{w}$ completely describes the resulting (15 karat) alloy.

Note that we can easily accommodate additional ingredients in our alloys by increasing the number of components in our vectors (making them vectors in $\mathbb{R}^5$, $\mathbb{R}^6$, etc.). E.g., the following 5-vector represents a "14 karat white alloy" composed of five metals:
$$\begin{bmatrix} 0.585 \\ 0.005 \\ 0.270 \\ 0.070 \\ 0.070 \end{bmatrix} \begin{matrix} \leftarrow \text{gold content} \\ \leftarrow \text{silver content} \\ \leftarrow \text{copper content} \\ \leftarrow \text{zinc content} \\ \leftarrow \text{nickel content} \end{matrix}$$
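The alloy calculations of Example 1.5 can be reproduced exactly with rational arithmetic. The sketch below is one possible way to do so, assuming Python's standard `fractions` module; the variable names (`METALS`, `grams`, and so on) are ours.

```python
from fractions import Fraction

METALS = ("gold", "silver", "copper", "zinc")

# Content vectors of the two alloys, in parts out of 24
u = [Fraction(9, 24), Fraction(2, 24), Fraction(11, 24), Fraction(2, 24)]
v = [Fraction(21, 24), Fraction(2, 24), Fraction(1, 24), Fraction(0)]

# Scalar multiple: grams of each metal in 600 g of the 9 karat alloy
grams = [600 * share for share in u]
print(dict(zip(METALS, map(int, grams))))
# {'gold': 225, 'silver': 50, 'copper': 275, 'zinc': 50}

# Linear combination: equal amounts of both alloys
half = Fraction(1, 2)
w = [half * a + half * b for a, b in zip(u, v)]
print(24 * w[0])      # 15, the karat number of the mixture
print(sum(w) == 1)    # True: the shares of a content vector add up to 1
```

Using `Fraction` instead of floating-point numbers keeps values such as 11/24 exact, so the results match the hand computation without rounding.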
Dot product

The operation of scalar multiplication defined on p. 4 creates a vector from a scalar and another vector. A different way of performing multiplication using vectors is introduced in the following definition: this time, a scalar is created from two vectors.

DEFINITION (Dot Product)
The dot product of two n-vectors $\vec{u} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$ is the real number
$$\vec{u} \cdot \vec{v} = a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_ib_i.$$

[Figure: A comparison of the syntax of scalar multiplication $c\vec{u} = \vec{v}$ (a scalar and a vector produce a vector) and the dot product $\vec{u} \cdot \vec{v} = d$ (two vectors produce a scalar).]

EXAMPLE 1.6
• $\begin{bmatrix} 3 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \end{bmatrix} = (3)(-2) + (2)(1) = -4.$
• $\begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} 5 \\ 2 \\ 3 \end{bmatrix} = (0)(5) + (1)(2) + (4)(3) = 14.$
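The dot product is a one-line computation. Here is a minimal sketch, assuming vectors are plain Python lists; the function name `dot` is ours.

```python
def dot(u, v):
    """Dot product of two n-vectors: the sum of products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))


# Example 1.6
print(dot([3, 2], [-2, 1]))         # -4
print(dot([0, 1, 4], [5, 2, 3]))    # 14

# The dot product of a vector with itself is its squared length.
print(dot([3, 4], [3, 4]))          # 25
```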
THEOREM 1.2 (Properties of Dot Product)
Let c be a scalar and let $\vec{u}$, $\vec{v}$, and $\vec{w}$ be n-vectors.
1. $\vec{u} \cdot \vec{u} = \|\vec{u}\|^2$.
2. $\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$.
3. $\vec{u} \cdot (\vec{v} + \vec{w}) = (\vec{u} \cdot \vec{v}) + (\vec{u} \cdot \vec{w})$.
4. $(c\vec{u}) \cdot \vec{v} = c(\vec{u} \cdot \vec{v}) = \vec{u} \cdot (c\vec{v})$.

PROOF of part 3
Let us denote $\vec{u} = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$, $\vec{v} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, and $\vec{w} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$.
The left-hand side equals
$$\vec{u} \cdot (\vec{v} + \vec{w}) = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \cdot \begin{bmatrix} b_1 + c_1 \\ \vdots \\ b_n + c_n \end{bmatrix} = a_1(b_1 + c_1) + \cdots + a_n(b_n + c_n).$$
Evaluating the right-hand side yields
$$(\vec{u} \cdot \vec{v}) + (\vec{u} \cdot \vec{w}) = (a_1b_1 + \cdots + a_nb_n) + (a_1c_1 + \cdots + a_nc_n).$$
Expanding each product $a_i(b_i + c_i) = a_ib_i + a_ic_i$ and regrouping the terms shows that the two sides are equal. Proofs of the remaining parts are left as exercises.

It is often useful to combine data contained in two vectors by calculating their dot product.
EXAMPLE 1.7
For breakfast, Bill has half-a-serving of cereal "X" with one serving of cereal "Y" and one serving of milk. According to the nutrition labels
• each serving of cereal "X" contains 110 calories,
• each serving of cereal "Y" contains 190 calories, and
• one serving of milk has 150 calories.

[Figure: a Nutrition Facts label listing 190 calories per serving.]

To determine the total amount of calories consumed by Bill, we can set up vectors
$$\vec{w} = \begin{bmatrix} 110 \\ 190 \\ 150 \end{bmatrix} \begin{matrix} \leftarrow \text{calories/serving of cereal "X"} \\ \leftarrow \text{calories/serving of cereal "Y"} \\ \leftarrow \text{calories/serving of milk} \end{matrix} \qquad \vec{b} = \begin{bmatrix} 0.5 \\ 1 \\ 1 \end{bmatrix} \begin{matrix} \leftarrow \text{servings of cereal "X"} \\ \leftarrow \text{servings of cereal "Y"} \\ \leftarrow \text{servings of milk} \end{matrix}$$
and then proceed to evaluate the dot product:
$$\vec{b} \cdot \vec{w} = (0.5)(110) + (1)(190) + (1)(150) = 395.$$
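The breakfast computation of Example 1.7 is a direct use of the dot product. The sketch below pairs each quantity with its label; the variable names are ours.

```python
def dot(u, v):
    """Dot product of two n-vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))


# Order of components: cereal "X", cereal "Y", milk
calories_per_serving = [110, 190, 150]   # vector w of Example 1.7
servings = [0.5, 1, 1]                   # vector b of Example 1.7

total = dot(servings, calories_per_serving)
print(total)   # 395.0
```

Both vectors must list the foods in the same order; the dot product silently produces a meaningless number if the components are misaligned.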
Orthogonal vectors

DEFINITION
Two n-vectors $\vec{u}$ and $\vec{v}$ are said to be orthogonal if $\vec{u} \cdot \vec{v} = 0$.

Consider three nonzero vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$, with their initial points at the origin, except that $\vec{v}$ is translated so that its initial point is at the terminal point of $\vec{u}$; therefore, the three vectors form a triangle.

[Figure: the triangle formed by $\vec{u}$, the translated $\vec{v}$, and $\vec{u} + \vec{v}$.]

Using some of the properties in Theorem 1.2 we obtain
$$\begin{aligned} \|\vec{u} + \vec{v}\|^2 &= (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}) \\ &= \vec{u} \cdot \vec{u} + \vec{u} \cdot \vec{v} + \vec{v} \cdot \vec{u} + \vec{v} \cdot \vec{v} \\ &= \|\vec{u}\|^2 + \|\vec{v}\|^2 + 2(\vec{u} \cdot \vec{v}). \end{aligned} \tag{1}$$
This yields the Pythagorean formula
$$\|\vec{u} + \vec{v}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2 \quad\text{if and only if}\quad \vec{u} \cdot \vec{v} = 0. \tag{2}$$

[Figure: Vectors $\vec{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -b \\ a \end{bmatrix}$ are orthogonal.]

Generally, in $\mathbb{R}^2$ and $\mathbb{R}^3$, if two vectors $\vec{u}$ and $\vec{v}$ are orthogonal then either
• at least one of the two vectors is $\vec{0}$ or
• the vectors are perpendicular.

While it is easy to see whether two vectors in $\mathbb{R}^2$ appear orthogonal, in $\mathbb{R}^3$ computing the dot product is more helpful.

EXAMPLE 1.8
Let $\vec{u} = \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} -4 \\ 0 \\ -2 \end{bmatrix}$, and $\vec{w} = \begin{bmatrix} 0 \\ -2 \\ 3 \end{bmatrix}$. Which vectors among these are orthogonal?

SOLUTION
$\vec{u} \cdot \vec{v} = (-1)(-4) + (3)(0) + (2)(-2) = 0 \Rightarrow \vec{u}$ and $\vec{v}$ are orthogonal;
$\vec{u} \cdot \vec{w} = (-1)(0) + (3)(-2) + (2)(3) = 0 \Rightarrow \vec{u}$ and $\vec{w}$ are orthogonal;
$\vec{v} \cdot \vec{w} = (-4)(0) + (0)(-2) + (-2)(3) = -6 \neq 0 \Rightarrow \vec{v}$ and $\vec{w}$ are not orthogonal.
Since all three vectors are nonzero, it follows that $\vec{u}$ is perpendicular to both $\vec{v}$ and $\vec{w}$.

[Figure: the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ drawn in xyz-space; $\vec{v}$ and $\vec{w}$ are not orthogonal.]
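The orthogonality test of Example 1.8 and the Pythagorean formula can both be checked numerically. The following is a sketch under the assumption that all components are integers, so that comparing a dot product with 0 is exact; with floating-point data a tolerance (for instance `math.isclose`) would be more appropriate. The helper names are ours.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal(u, v):
    """Exact test; reliable here because the components are integers."""
    return dot(u, v) == 0


vectors = {"u": [-1, 3, 2], "v": [-4, 0, -2], "w": [0, -2, 3]}

for (name_a, a), (name_b, b) in combinations(vectors.items(), 2):
    print(name_a, name_b, dot(a, b), is_orthogonal(a, b))
# u v 0 True
# u w 0 True
# v w -6 False

# Pythagorean formula for the orthogonal pair u, v:
# ||u + v||^2 equals ||u||^2 + ||v||^2
u, v = vectors["u"], vectors["v"]
s = [a + b for a, b in zip(u, v)]
print(dot(s, s), dot(u, u) + dot(v, v))   # 34 34
```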
Orthogonality extends the notion of perpendicularity into higher-dimensional spaces Rn . It will be discussed in more detail in Chapter 6.
Free vectors, fixed vectors, and position vectors
Often, vectors have a specified length and direction but do not have any prescribed location. E.g., in Example 1.1 two vectors were shown to be equal in spite of their different locations. This kind of behavior characterizes a free vector.

On the other hand, there are instances where it is appropriate to specify the vector's exact location – such vectors are called fixed vectors. An extremely important example is provided by position vectors – such a vector representing a point P will always be taken to begin at the origin O and end at P. A position vector $\vec{p} = \overrightarrow{OP}$ and the point P will sometimes be used interchangeably.

In this book, we will assume all vectors have their initial points at the origin, unless their translation has been explicitly specified.

Finally, note that even when it is algebraically possible to carry out a certain operation on vectors, doing so may or may not make sense, depending on the situation. For example, if $\vec{p}$ and $\vec{q}$ are both position vectors (corresponding to the points P and Q, respectively), it does not make sense to add $\vec{p} + \vec{q}$ (refer to Exercise 51). However, it does make sense to add to $\vec{p}$ a vector $\vec{u}$ that represents a displacement in a certain direction of a certain magnitude: $\vec{p} + \vec{u}$. It also makes sense to take a barycentric combination (also called an affine combination) of position vectors, $t\vec{p} + (1 - t)\vec{q}$, which is a linear combination whose coefficients add up to one – see Exercise 52 (barycentric combinations will be discussed later in this book).
Application: Polygon meshes in computer graphics
Just as sets of line segments can be used to approximate curves, even sophisticated surfaces in space can often be represented quite accurately by collections of planar figures. When each of these figures is a polygon in the plane and the collection satisfies certain consistency conditions (e.g., an edge should not be shared by more than two polygons), we refer to such a collection as a polygon mesh. While nonplanar pieces (e.g., based on parametric equations involving polynomials or rational functions) can offer additional advantages, the relative simplicity of polygon meshes makes them an important tool in surface modeling. Here is a very simple example illustrating some of the main ideas behind the construction of a polygon mesh.
[Figure: three panels, (a), (b), and (c), each showing xyz-axes with origin O; they depict, respectively, the vertices, the edges, and the polygons of a simple mesh.]

(a) Define the vertices, using their respective position vectors. In our case, the vertex collection includes
$$\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 2 \\ 5 \\ 0 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 1 \\ 5 \\ 0 \end{bmatrix}, \quad \vec{v}_4 = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}, \quad\text{and}\quad \vec{v}_5 = \begin{bmatrix} 2 \\ 5 \\ 3 \end{bmatrix}.$$

(b) Define the edges of the mesh by specifying pairs of vertices. Here, we have edge1 with endpoints $\vec{v}_1$ and $\vec{v}_2$, edge2 from $\vec{v}_2$ to $\vec{v}_3$, edge3 from $\vec{v}_3$ to $\vec{v}_1$, and five more.

(c) Define the polygons. In some implementations, each polygon is defined as a list of edges (e.g., the bottom one would be edge1, edge2, edge3), while in others, we refer directly to the list of vertices (the bottom polygon would then be defined by $\vec{v}_1$, $\vec{v}_2$, and $\vec{v}_3$).

A variety of software tools is currently available for creating 3D models and working with them. Many of the state-of-the-art packages are the ones used by professional graphics designers and animators and can cost thousands of dollars. Fortunately, there are a number of respectable entry-level packages that are available free of charge, including these:
• 3D modeling software
  • Wings 3D (URL: http://www.wings3d.com)
  • Blender (URL: http://www.blender.org)3
• 3D figure posing, scene design, and animation tools
  • DAZ|Studio (URL: http://www.daz3d.com).

3 Blender supports other types of surfaces in addition to polygon meshes, including polynomial and rational surface patches.
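Steps (a) through (c) of the mesh construction map naturally onto simple data structures. The sketch below is one possible layout, not the format of any particular 3D package: vertices are position vectors stored as tuples, edges are pairs of vertex indices, and a polygon is given either as a list of edge indices or as a list of vertex indices. Only the three edges named in the text are listed; the remaining five are not specified there and are left out. All names are ours.

```python
# (a) Vertices as position vectors; index 0 corresponds to v1, index 1 to v2, etc.
vertices = [
    (2, 0, 0),  # v1
    (2, 5, 0),  # v2
    (1, 5, 0),  # v3
    (1, 5, 3),  # v4
    (2, 5, 3),  # v5
]

# (b) Edges as pairs of vertex indices (edge1, edge2, edge3 from the text)
edges = [(0, 1), (1, 2), (2, 0)]

# (c) The bottom polygon, in both representations mentioned in the text
bottom_by_edges = [0, 1, 2]      # indices into `edges`
bottom_by_vertices = [0, 1, 2]   # indices into `vertices`


def boundary_edges(polygon):
    """Edges obtained by walking around a polygon given as a list of vertex indices."""
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]


# Both representations describe the same triangle.
print(boundary_edges(bottom_by_vertices))                                      # [(0, 1), (1, 2), (2, 0)]
print(boundary_edges(bottom_by_vertices) == [edges[i] for i in bottom_by_edges])  # True
```

Storing indices rather than copies of the coordinates means that moving a vertex automatically moves every edge and polygon that uses it.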
3D modeling software products typically allow the user to freely manipulate individual vertices, edges, and polygons. Models can be created from scratch or can be arbitrarily edited.

On the other hand, figure posing software tends to work with existing models, usually created with a 3D modeling software. Software of this type provides the users with higher-level tools to access certain properties of the models. Many such tools involve the technique known as morphing.

In a typical setup, to define a morph, we need to specify the collection of all vertices to be affected, $\vec{p}_i$, and the corresponding displacement vectors $\vec{d}_i$. After morphing, the new position of the ith vertex becomes
$$\vec{p}_i + t\,\vec{d}_i.$$
The parameter t has the same value for all of the vertices undergoing the morphing. The picture below illustrates how one of the five vertices on Pinocchio's nose is undergoing morphing with t = 0, t = 1, and t = 3 – see the margin for corresponding shaded renderings.

[Figure: the vertex $\vec{p}_1$ with displacement vector $\vec{d}_1$, showing the positions $\vec{p}_1$, $\vec{p}_1 + \vec{d}_1$, and $\vec{p}_1 + 3\vec{d}_1$.]
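The morphing rule, new position = $\vec{p}_i + t\,\vec{d}_i$ with one shared parameter t, can be sketched as follows. The vertex positions and displacement vectors below are hypothetical values chosen only for illustration (the text does not list the displacements it uses), and the function name `morph` is ours.

```python
def morph(positions, displacements, t):
    """Move every vertex p_i to p_i + t * d_i, using the same t for all vertices."""
    if len(positions) != len(displacements):
        raise ValueError("need exactly one displacement vector per vertex")
    return [
        [p + t * d for p, d in zip(p_i, d_i)]
        for p_i, d_i in zip(positions, displacements)
    ]


# Hypothetical data: two vertices, only the first one is displaced.
positions = [[2, 0, 0], [2, 5, 0]]
displacements = [[0, -1, 0], [0, 0, 0]]

for t in (0, 1, 3):
    print(t, morph(positions, displacements, t))
# 0 [[2, 0, 0], [2, 5, 0]]
# 1 [[2, -1, 0], [2, 5, 0]]
# 3 [[2, -3, 0], [2, 5, 0]]
```

A vertex that should stay put simply receives the zero vector as its displacement, so the same formula covers affected and unaffected vertices alike.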
Admittedly, this is a rather crude piece of 3D “art”. At the beginning of the chapter, we have seen an illustration featuring the “Smile” morph of a (much more complex) female 3D model “Victoria 3” from DAZ 3D. Below, we show the same renderings accompanied by the corresponding polygon meshes; note that this model involves over 70,000 polygons.
[Figure: renderings of the model for t = 0, t = 0.5, and t = 1, each accompanied by the corresponding polygon mesh.]
EXERCISES

In Exercises 1–2, use the points $P_1(2, 5)$, $P_2(-1, 3)$, and $P_3(0, 5)$ in the plane.
1. Find the vector $\overrightarrow{P_1P_2}$ and its length.
2. Find the vector $\overrightarrow{P_1P_3}$ and its length.

In Exercises 3–4, use the points $Q_1(0, -2, 1)$, $Q_2(2, 2, 1)$, and $Q_3(1, 4, -3)$ in the space.
3. Find the vector $\overrightarrow{Q_2Q_1}$ and its length.
4. Find the vector $\overrightarrow{Q_2Q_3}$ and its length.

In Exercises 5–14, use the 2-vectors $\vec{u} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} 6 \\ -4 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$.
5. Evaluate each expression: a. $\vec{u} + \vec{w}$; b. $2\vec{v}$; c. $-\vec{w}$; d. $\|\vec{w}\|$.
6. Evaluate each expression: a. $\vec{v} + \vec{w}$; b. $3\vec{w}$; c. $-2\vec{u}$; d. $\|\vec{v}\|$.
7. Evaluate $2\vec{u} - \vec{v}$.
8. Evaluate $\vec{u} + \vec{v} - 2\vec{w}$.
9. Evaluate $-(3\vec{u} + \vec{v}) + \vec{w}$.
10. Verify property 3 of Theorem 1.1.
11. Evaluate $\vec{u} \cdot \vec{v}$.
12. Evaluate $\vec{w} \cdot \vec{v}$.
13. Evaluate $\vec{w} \cdot \vec{w}$.
14. Verify property 3 of Theorem 1.2.

In Exercises 15–24, use the 3-vectors $\vec{u} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}$.
15. Repeat Exercise 5 for these vectors.
16. Repeat Exercise 6 for these vectors.
17. Evaluate $\vec{w} + 4\vec{v}$.
18. Evaluate $-\vec{u} - 2(\vec{v} + \vec{w})$.
19. Evaluate $4\vec{u} - (\vec{v} - \vec{w})$.
20. For c = −3, verify property 7 of Theorem 1.1.
21. Find the unit vector in the direction of $\vec{w}$.
22. Find the unit vector in the direction of $\vec{u}$.
23. Evaluate $\vec{u} \cdot \vec{w}$.
24. Evaluate $\vec{u} \cdot \vec{v}$.

In Exercises 25–30, use the 4-vectors $\vec{u} = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 2 \\ 0 \\ 3 \\ 0 \end{bmatrix}$.
25. Repeat Exercise 5 for these vectors.
26. Repeat Exercise 6 for these vectors.
27. Evaluate $-2\vec{u} + \vec{w}$.
28. Evaluate $\vec{v} - 3\vec{w}$.
29. Evaluate $\vec{v} \cdot \vec{v}$.
30. Evaluate $\vec{w} \cdot \vec{u}$.
In Exercises 31–34, decide whether each statement is true or false. Justify your answer.
31. If $\|\vec{u}\| = 0$, then $\vec{u} = \vec{0}$.
32. If $\vec{u} \cdot \vec{v} = 0$, then either $\vec{u}$ or $\vec{v}$ or both must be zero vectors.
33. The vectors $\begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix}$ are orthogonal.
34. The vectors $\begin{bmatrix} 4 \\ 1 \\ 0 \\ -2 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \\ 2 \\ -1 \end{bmatrix}$ are orthogonal.
In Exercises 35–40, decide whether the given statement holds for all vectors and scalars. If so, justify your claim. If not, find a counterexample.
35. $c(\vec{u} - \vec{v}) = c\vec{u} - c\vec{v}$ for all 3-vectors $\vec{u}$ and $\vec{v}$ and scalars c.
36. $c(\vec{u} \cdot \vec{v}) = (c\vec{u}) \cdot (c\vec{v})$ for all 2-vectors $\vec{u}$ and $\vec{v}$ and scalars c.
37. $\|c\vec{u}\| = c\|\vec{u}\|$ for all 3-vectors $\vec{u}$ and scalars c.
38. $\vec{u} \cdot (\vec{v} + \vec{w}) = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}$ for all 4-vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$.
39. $(\vec{u} - \vec{v}) - \vec{w} = \vec{u} - (\vec{v} - \vec{w})$ for all 2-vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$.
40. $(\vec{u} \cdot \vec{v})\vec{w} = (\vec{u} \cdot \vec{w})(\vec{v} \cdot \vec{w})$ for all 3-vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$.
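41. * Prove part 1 of Theorem 1.2.
42. * Prove part 2 of Theorem 1.2.
43. * Prove part 4 of Theorem 1.2.

The cross product of two 3-vectors is defined according to the formula

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \times \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{bmatrix}. \qquad (3)$$

44. Evaluate each cross product: a. $\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \times \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$; b. $\begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix} \times \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$.
45. Evaluate each cross product: a. $\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} \times \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$; b. $\begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix} \times \begin{bmatrix} -6 \\ 2 \\ -4 \end{bmatrix}$.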
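Formula (3) translates directly into code. The following is a minimal sketch in plain Python; the function names `cross` and `dot` are chosen here for illustration and are not part of the text. The vectors used are deliberately not those of Exercises 44–45.

```python
def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors, component by component as in formula (3)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return [a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1]


u = [1, 2, 3]
v = [4, 5, 6]
w = cross(u, v)
print(w)                     # [-3, 6, -3]
print(dot(u, w), dot(v, w))  # 0 0
```

The two zero dot products in the last line illustrate, for this one pair of vectors, the orthogonality property that Exercise 47 asks you to prove in general.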
46. * Show that if $\vec{u}$ and $\vec{v}$ are 3-vectors such that $\vec{v} = k\vec{u}$ for some scalar k, then $\vec{u} \times \vec{v} = \vec{0}$.
47. * Show that for all 3-vectors $\vec{u}$ and $\vec{v}$, the cross product $\vec{u} \times \vec{v}$ is orthogonal both to $\vec{u}$ and to $\vec{v}$.
48. * Show that for all 3-vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$, the following property holds: $\vec{u} \times (\vec{v} + \vec{w}) = (\vec{u} \times \vec{v}) + (\vec{u} \times \vec{w})$.
49. * Show that for all 3-vectors $\vec{u}$ and $\vec{v}$ and for all scalars c the following property holds: $(c\vec{u}) \times \vec{v} = c(\vec{u} \times \vec{v}) = \vec{u} \times (c\vec{v})$.
50. * Show that for all 3-vectors $\vec{u}$ and $\vec{v}$, the following property holds: $\vec{u} \times \vec{v} = -\vec{v} \times \vec{u}$.
51. Consider the two points A and B in the diagram in the margin. This diagram contains two different coordinate systems: $(x, y)$ and $(x', y')$.

[Figure: Illustration for Exercise 51, showing the points A and B together with the axes x, y and x′, y′.]

a. Find the position vector $\vec{p}$ of the point A in the system $(x, y)$. In the same system, find the position vector $\vec{q}$ of the point B. Plot the point C whose position vector with respect to $(x, y)$ is $\vec{p} + \vec{q}$.
b. In the coordinate system $(x', y')$ (gray axes), find the position vectors $\vec{u}$ and $\vec{v}$ for A and B, respectively. Plot the point D whose position vector with respect to $(x', y')$ is $\vec{u} + \vec{v}$. Note that $C \neq D$!

52. Repeat Exercise 51 except that instead of $\vec{p} + \vec{q}$ and $\vec{u} + \vec{v}$, calculate
a. $2\vec{p} - \vec{q}$ and
b. $2\vec{u} - \vec{v}$.
(Unlike in Exercise 51, you should obtain the same point in both parts.)

53. Twenty students are enrolled in the same linear algebra class. Their test averages, quiz averages, and final exam scores are kept in a spreadsheet

| | Test Average | Quiz Average | Final |
|---|---|---|---|
| Andrew Anderson | $t_1$ | $q_1$ | $f_1$ |
| Barbara Banks | $t_2$ | $q_2$ | $f_2$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| Trevor Thompson | $t_{20}$ | $q_{20}$ | $f_{20}$ |

so that they can be viewed as vectors $\vec{t}$, $\vec{q}$, and $\vec{f}$ in $\mathbb{R}^{20}$, respectively. If the overall course grade is determined based on the scheme

| Test Average | Quiz Average | Final |
|---|---|---|
| 40% | 25% | 35% |

set up a linear combination of these vectors to calculate the 20-vector $\vec{a}$ whose components are overall course averages of the twenty students.

54. * Prove the parallelogram law for n-vectors $\vec{u}$ and $\vec{v}$: $\|\vec{u} + \vec{v}\|^2 + \|\vec{u} - \vec{v}\|^2 = 2\|\vec{u}\|^2 + 2\|\vec{v}\|^2$.
55. * Show that if $\vec{u}$ is an n-vector such that $\|\vec{u}\| = 0$, then $\vec{u} = \vec{0}$.
1.2 Matrices

Most computer users have at least some acquaintance with spreadsheet applications. While modern spreadsheets can be quite complex, their fundamental functionality is to organize numerical data in a table, arranged in rows and columns. In mathematics, we refer to a “spreadsheet” of numbers as a matrix:

DEFINITION An m × n matrix is an array of real numbers containing m rows and n columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
The number $a_{ij}$, located in the ith row and jth column of A, is called the (i, j) entry of the matrix. We can also write $A = [a_{ij}]$ to represent the entire matrix.

EXAMPLE 1.9 $B = \begin{bmatrix} 5 & 8 & -1 & 1 \\ 2 & 7 & 9 & 1 \end{bmatrix}$ is a 2 × 4 matrix. Its entries include $b_{11} = 5$, $b_{12} = 8$, etc.
DEFINITION (Special types of matrices)

| Matrix type | Definition |
|---|---|
| Square matrix | A is an n × n matrix (same number of rows and columns). |
| Diagonal matrix | A is a square matrix such that $a_{ij} = 0$ whenever $i \neq j$. |
| Rectangular diagonal matrix | A is a matrix such that $a_{ij} = 0$ whenever $i \neq j$ (not necessarily a square matrix). |
| Upper triangular matrix | A is a square matrix such that $a_{ij} = 0$ whenever $i > j$. |
| Lower triangular matrix | A is a square matrix such that $a_{ij} = 0$ whenever $i < j$. |
| Unit upper (lower) triangular matrix | A is an upper (lower) triangular matrix such that $a_{ii} = 1$ for all i. |
| Scalar matrix | A is a diagonal matrix such that $a_{ii} = c$ for all i. |
| Identity matrix | A is a diagonal matrix such that $a_{ii} = 1$ for all i. |
The n × n identity matrix is denoted by In . All entries of a matrix for which the row index equals the column index: a11 , a22 , . . . form the main diagonal of the matrix.
The following diagram illustrates the relationships between some of these types of matrices and contains their examples.
[Figure: nested diagram of matrix types (matrices, square matrices, upper triangular matrices, lower triangular matrices, diagonal matrices, scalar matrices, identity matrices), with example matrices placed in each region.]
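The definitions of the special matrix types can be checked mechanically. The sketch below is one possible way to do so in plain Python, representing a matrix as a list of rows; all function names are illustrative choices made here, not notation from the text.

```python
def is_square(A):
    """True if A has as many columns in every row as it has rows."""
    return all(len(row) == len(A) for row in A)


def is_upper_triangular(A):
    """Square, with a_ij = 0 whenever i > j."""
    n = len(A)
    return is_square(A) and all(A[i][j] == 0
                                for i in range(n) for j in range(n) if i > j)


def is_lower_triangular(A):
    """Square, with a_ij = 0 whenever i < j."""
    n = len(A)
    return is_square(A) and all(A[i][j] == 0
                                for i in range(n) for j in range(n) if i < j)


def is_diagonal(A):
    """Square, with a_ij = 0 whenever i != j."""
    return is_upper_triangular(A) and is_lower_triangular(A)


def is_scalar(A):
    """Diagonal, with all main diagonal entries equal to the same number c."""
    return is_diagonal(A) and all(A[i][i] == A[0][0] for i in range(len(A)))


def is_identity(A):
    """Diagonal, with all main diagonal entries equal to 1."""
    return is_diagonal(A) and all(A[i][i] == 1 for i in range(len(A)))


print(is_upper_triangular([[5, 1], [0, 3]]))          # True
print(is_diagonal([[5, 1], [0, 3]]))                  # False
print(is_scalar([[6, 0, 0], [0, 6, 0], [0, 0, 6]]))   # True
print(is_identity([[6, 0, 0], [0, 6, 0], [0, 0, 6]])) # False
```

Note how `is_diagonal` is written as "both upper and lower triangular", and how scalar and identity matrices are tested as special cases of diagonal matrices, mirroring the nesting of the matrix types.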
DEFINITION The matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are equal if they have the same size (m × n) and their corresponding entries are equal: $a_{ij} = b_{ij}$ for all i and j.

Matrix operations

Operations of matrix addition and of multiplication of a matrix by a scalar follow the pattern of similar operations that were defined in the previous section for vectors. In fact, it can be easily verified that, when applied to matrices with one column, these operations become identical to the respective vector operations.

DEFINITION (Matrix Addition) The sum of two m × n matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ is the m × n matrix $A + B = [c_{ij}]$ with $c_{ij} = a_{ij} + b_{ij}$ for all i and j.

(Entries of A + B are obtained by adding the corresponding entries of A and B.)

Note that if A and B have different sizes, they cannot be added.
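EXAMPLE 1.10
• $\begin{bmatrix} 1 & 0 & -1 & 2 \\ 2 & 4 & 0 & -2 \end{bmatrix} + \begin{bmatrix} 2 & 0 & 0 & 1 \\ 3 & 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 & -1 & 3 \\ 5 & 5 & 2 & -1 \end{bmatrix}$
• $\begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ is undefined.

DEFINITION (Scalar Multiplication) Given an m × n matrix $A = [a_{ij}]$ and a real number k, the scalar multiple of A by k is the m × n matrix $kA = [c_{ij}]$ such that $c_{ij} = k a_{ij}$ for all i and j.

(kA is obtained by multiplying every entry of A by k. Example: $3 \begin{bmatrix} 3 & 1 \\ 2 & 1 \\ 1 & 0 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 9 & 3 \\ 6 & 3 \\ 3 & 0 \\ 6 & 6 \end{bmatrix}$.)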
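Both operations are entry-by-entry, which makes them short to express in code. The following sketch in plain Python (matrices as lists of rows; the names `mat_add` and `scalar_mul` are illustrative assumptions) reproduces the numbers of Example 1.10 and of the scalar multiple shown above.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    same_size = len(A) == len(B) and all(len(ra) == len(rb)
                                         for ra, rb in zip(A, B))
    if not same_size:
        raise ValueError("A + B is undefined for matrices of different sizes")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(k, A):
    """Multiply every entry of A by the real number k."""
    return [[k * a for a in row] for row in A]


A = [[1, 0, -1, 2], [2, 4, 0, -2]]
B = [[2, 0, 0, 1], [3, 1, 2, 1]]
print(mat_add(A, B))   # [[3, 0, -1, 3], [5, 5, 2, -1]]
print(scalar_mul(3, [[3, 1], [2, 1], [1, 0], [2, 2]]))
# [[9, 3], [6, 3], [3, 0], [6, 6]]

try:
    mat_add([[2, 3], [1, 0], [0, 0]], [[1, 0, 0], [0, 1, 0]])
except ValueError as err:
    print(err)         # the 3 x 2 and 2 x 3 matrices cannot be added
```

Raising an error for mismatched sizes is a design choice of this sketch; it simply mirrors the statement that such a sum is undefined.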
The following theorem lists a number of properties of matrix addition and scalar multiplication. Note the similarities to Theorem 1.1.
THEOREM 1.3
1. If A and B are m × n matrices, then A + B is also an m × n matrix.
2. A + B = B + A for all m × n matrices A and B.
3. (A + B) + C = A + (B + C) for all m × n matrices A, B, and C.
4. There exists an m × n matrix Z such that A + Z = Z + A = A for all m × n matrices A.
5. For every m × n matrix A, there exists an m × n matrix D such that A + D = D + A = Z.
6. If c is a real number and A is an m × n matrix, then cA is also an m × n matrix.
7. c(A + B) = cA + cB for all m × n matrices A and B and for all real numbers c.
8. (c + d)A = cA + dA for all m × n matrices A and for all real numbers c and d.
9. (cd)A = c(dA) for all m × n matrices A and for all real numbers c and d.
10. 1A = A for all m × n matrices A.

OUTLINE OF THE PROOF The proof proceeds analogously to the proof of Theorem 1.1. In particular, Z is the zero m × n matrix (whose entries all equal 0), and D = (−1)A (D is called the negative of the matrix A).
The difference of two matrices can be obtained using the negative: A − B = 1A + (−1)B.
Transpose and symmetric matrices

DEFINITION (Transpose) The transpose of an m × n matrix $A = [a_{ij}]$ is the n × m matrix $A^T = [b_{ij}]$ with $b_{ij} = a_{ji}$ for all i and j.

An easy way to construct a transpose of a given matrix is to take its rows and write them as columns of the transpose.

EXAMPLE 1.11 If $B = \begin{bmatrix} 4 & 3 \\ 0 & 2 \\ -1 & 2 \end{bmatrix}$, then $B^T = \begin{bmatrix} 4 & 0 & -1 \\ 3 & 2 & 2 \end{bmatrix}$.
It should be clear that if we take a transpose of $A^T$, we are going to get the original matrix A back (rewriting rows as columns, then doing it again, restores the original rows and columns). Also, the result of taking a transpose of a sum of two matrices is the same as taking the sum of transposes of the two matrices; likewise, the transpose of a scalar multiple is equal to the scalar multiple of the transpose. Let us record these properties in the following theorem.

THEOREM 1.4 (Properties of Transpose)
For any m × n matrices A, B and any scalar c,
1. $(A + B)^T = A^T + B^T$;
2. $(cA)^T = cA^T$;
3. $(A^T)^T = A$.

In most cases, taking a transpose of a matrix, we obtain a matrix different from the original one. For example, this is obviously the case for any matrix that is not a square matrix (e.g., B of Example 1.11). The same goes for many square matrices, e.g., $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. However, there are some special square matrices, e.g., $\begin{bmatrix} -1 & 4 \\ 4 & 3 \end{bmatrix}$, for which the transpose equals the original matrix. The following definition refers to all such matrices.

DEFINITION If A is a matrix such that $A = A^T$, then it is called a symmetric matrix.
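The "rows become columns" recipe and the symmetry test are easy to try out. The following is a small sketch in plain Python (matrices as lists of rows); `transpose` and `is_symmetric` are names assumed here for illustration.

```python
def transpose(A):
    """Write the rows of A as the columns of the result."""
    return [list(column) for column in zip(*A)]


def is_symmetric(A):
    """A matrix is symmetric when it equals its own transpose."""
    return A == transpose(A)


B = [[4, 3], [0, 2], [-1, 2]]             # matrix B of Example 1.11
print(transpose(B))                       # [[4, 0, -1], [3, 2, 2]]
print(transpose(transpose(B)) == B)       # True  (property 3 of Theorem 1.4)
print(is_symmetric([[1, 2], [3, 4]]))     # False
print(is_symmetric([[-1, 4], [4, 3]]))    # True
print(is_symmetric(B))                    # False (B is not square)
```

A check on particular matrices like this one illustrates Theorem 1.4 but does not prove it.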
Partitioned matrices

Given an m × n matrix A, we will often find it convenient to consider its individual columns as vectors in $\mathbb{R}^m$, denoting $\mathrm{col}_j A$ = (jth column of A). Many matrices in this book will be specified in the form

$$A = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix}$$

where $\vec{u}_j = \mathrm{col}_j A$. (Sometimes, we may write $A = \begin{bmatrix} | & | & \cdots & | \\ \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \\ | & | & \cdots & | \end{bmatrix}$ to emphasize the column-by-column structure.)

According to the definition on p. 2, a vector is defined to be a column vector. However, in some cases it will be useful to use rows instead, $\mathrm{row}_i A$ = (ith row of A), so that the matrix A can be specified row-by-row:

$$A = \begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ \vdots \\ \vec{v}_m^{\,T} \end{bmatrix} \quad \left(\text{or } A = \begin{bmatrix} - & \vec{v}_1^{\,T} & - \\ - & \vec{v}_2^{\,T} & - \\ & \vdots & \\ - & \vec{v}_m^{\,T} & - \end{bmatrix}\right)$$

with $\vec{v}_i^{\,T} = \mathrm{row}_i A$.

Column vectors of the n × n identity matrix $I_n$ will be of particular significance to us. To make it easier to refer to these vectors in the future, we shall denote the jth column of the n × n identity matrix by $\vec{e}_j$:

$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$

Every n-vector $\vec{u} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ can be written as the linear combination

$$\vec{u} = x_1 \vec{e}_1 + x_2 \vec{e}_2 + \cdots + x_n \vec{e}_n.$$

In $\mathbb{R}^3$, we use the notation $\vec{i} = \vec{e}_1$, $\vec{j} = \vec{e}_2$, $\vec{k} = \vec{e}_3$.

EXAMPLE 1.12 $\begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} = 2\vec{i} - 3\vec{j} + \vec{k}$
Expressing an m×n matrix in the column-by-column fashion results in a 1×n array where each entry is an m × 1 matrix (vector). Likewise, a row-by-row form is an m × 1 array containing 1 × n row matrices.
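Generally, whenever a matrix is expressed as an array containing individual matrices, the matrix is said to be in a partitioned form.

EXAMPLE 1.13 The matrix $A = \begin{bmatrix} 2 & 4 & 0 & 1 \\ 7 & 9 & 1 & 8 \\ 3 & 6 & 5 & 2 \end{bmatrix}$ can be partitioned in several ways, e.g.:
• $\left[\begin{array}{cc|cc} 2 & 4 & 0 & 1 \\ \hline 7 & 9 & 1 & 8 \\ 3 & 6 & 5 & 2 \end{array}\right]$, which can also be written $A = \begin{bmatrix} B & C \\ D & E \end{bmatrix}$ with $B = \begin{bmatrix} 2 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 0 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 7 & 9 \\ 3 & 6 \end{bmatrix}$, and $E = \begin{bmatrix} 1 & 8 \\ 5 & 2 \end{bmatrix}$.
• $\left[\begin{array}{ccc|c} 2 & 4 & 0 & 1 \\ 7 & 9 & 1 & 8 \\ 3 & 6 & 5 & 2 \end{array}\right] = \begin{bmatrix} F & G \end{bmatrix}$ with $F = \begin{bmatrix} 2 & 4 & 0 \\ 7 & 9 & 1 \\ 3 & 6 & 5 \end{bmatrix}$ and $G = \begin{bmatrix} 1 \\ 8 \\ 2 \end{bmatrix}$.
• $\left[\begin{array}{c|c|c|c} 2 & 4 & 0 & 1 \\ 7 & 9 & 1 & 8 \\ 3 & 6 & 5 & 2 \end{array}\right] = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \vec{u}_3 & \vec{u}_4 \end{bmatrix}$ (this is the column-by-column partitioned matrix).
• $\left[\begin{array}{cccc} 2 & 4 & 0 & 1 \\ \hline 7 & 9 & 1 & 8 \\ \hline 3 & 6 & 5 & 2 \end{array}\right] = \begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ \vec{v}_3^{\,T} \end{bmatrix}$ (this is the row-by-row partitioned matrix).

EXERCISES

1. Given the matrix $A = \begin{bmatrix} 2 & 1 & 0 & -1 \\ -2 & 3 & -3 & 1 \\ 4 & 5 & -5 & -4 \end{bmatrix}$, determine a. $a_{21}$, b. $a_{34}$, c. $\mathrm{col}_3 A$, d. $\mathrm{row}_2 A$, e. $A^T$.
2. Given the matrix $B = \begin{bmatrix} 5 & 6 & 7 \\ 1 & 0 & 8 \\ -1 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix}$, determine a. $b_{32}$, b. $b_{41}$, c. $\mathrm{col}_2 B$, d. $\mathrm{row}_1 B$, e. $B^T$.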
In Exercises 3–4 for each matrix determine whether it is a. b. c. d. e. f.
a diagonal matrix, an upper triangular matrix, a lower triangular matrix, a scalar matrix, an identity matrix, a symmetric matrix. 1 0 0 0 3. K = ; L= ; 0 0 0 0 ⎡ ⎡ ⎤ 1 0 0 1 2 ⎢ ⎢ 0 1 ⎢ ⎥ 4. P = ⎣ 1 0 1 ⎦ ; Q = ⎢ ⎢ 0 0 ⎣ 2 1 0 0 0
M= 1 0 1 0
0 1 0 1
0 0 5 5 ⎤
; N=
In Exercises 7–10, use the matrices A =
.
⎡ ⎤ ⎥ 7 0 0 ⎥ ⎥ ⎥; R = 1 0 ; S = ⎢ ⎣ 0 7 0 ⎦. ⎥ 0 1 ⎦ 0 0 7
In Exercises 5–6, perform each matrix operation if possible. ⎡ ⎤ ⎡ ⎤ 3 0 1 −1 10 5 1 0 ⎢ ⎥ ⎢ ⎥ 5. a. ⎣ 5 −2 ⎦ + ⎣ 3 + 0 ⎦ ; b. −1 0 3 1 2 0 −2 3 8 −3 1 1 −2 0 6. a. + ; b. 2 −1 + 0 1 3 0 −6 1
0 1 −1 0
3 −1 1 2
,B =
⎤ 6 1 0 ⎢ ⎥ ; c. −3 ⎣ 1 1 ⎦ . 0 0 0 3 1 2 ; c. 4 . 4 2 1
1 3 0 4
⎡
,C =
−2 −1 1 0
and the scalar c = −3. 7. Verify property 2 of Theorem 1.3. 8. Verify property 7 of Theorem 1.3. 9. Verify property 3 of Theorem 1.4. 10. Verify property 1 of Theorem 1.4. A matrix is called skew-symmetric if AT = −A. 11. Which of the following matrices are skew-symmetric: ⎡ ⎤ ⎡ ⎤ 0 0 3 0 −4 1 3 0 0 ⎢ ⎥ ⎢ ⎥ A = ⎣ 0 0 −1 ⎦ ; B = ; C=⎣ 4 ? 0 ⎦; D = −3 −1 0 0 −3 1 0 0 0 12. * Prove that a. if a matrix is skew-symmetric, then it must be a square matrix with all diagonal entries equal to zero; b. if A and B are skew-symmetric n × n matrices, then A + B is also skew-symmetric.
A matrix is called tridiagonal if it is a square matrix satisfying the condition

$$a_{ij} = 0 \text{ if } |i - j| > 1. \qquad (4)$$

13. Which of the following matrices are tridiagonal:
$A = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 2 & -1 & 5 & 0 \\ 0 & 4 & 1 & 7 \\ 0 & 0 & 6 & -1 \end{bmatrix}$; $B = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}$; $C = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$?

14. * Prove that
a. if c is a scalar and A is a tridiagonal matrix, then cA is a tridiagonal matrix;
b. if A is a tridiagonal matrix, then so is $A^T$.
In Exercises 15–22, decide whether each statement is true or false. Justify your answer.
15. If A and B are 2 × 3 matrices, then 3A + 4B is also a 2 × 3 matrix.
16. If A is a 2 × 2 matrix and B is a 4 × 4 matrix, then 5(A + B) is a 2 × 4 matrix.
17. $(6A)^T = 6A^T$ for every matrix A.
18. $((A^T)^T)^T = A^T$ for every matrix A.
19. For every 3 × 4 matrix A, the matrix $A^T + A$ is also 3 × 4.
20. If A is a symmetric matrix, then $A^T$ is also a symmetric matrix.
21. Every symmetric matrix is square.
22. Every diagonal matrix is upper triangular.

In Exercises 23–26, show an example of a matrix that matches the description or explain why such a matrix does not exist.
23. A scalar matrix that is also lower triangular.
24. A matrix that is simultaneously upper triangular and lower triangular.
25. A matrix that is simultaneously symmetric and skew-symmetric (defined above Exercise 11).
26. A unit upper triangular matrix that is also skew-symmetric (defined above Exercise 11).
1.3 Matrix Multiplication

In the previous section, we have introduced several algebraic operations on matrices. While this section is devoted solely to one additional operation, this operation will prove invaluable. Before formally defining the new operation, we present the following example as a motivation.
EXAMPLE 1.14 In Example 1.7 on p. 9, we have described how Bill used a dot product of two vectors to determine the total number of calories he consumed during breakfast.

[Figure: Nutrition Facts label. Serving Size: 1 cup (25g); Servings Per Container: About 9; Calories 190; Total Fat 1g (2% Daily Value); Total Carbohydrate 46g (15% Daily Value); Protein 4g.]

Ann is now joining Bill for breakfast: she will have one serving of cereal “X” and one half of a serving of milk. Being more nutrition conscious, she reads the nutrition labels more closely. She wants to keep track of her fat, carbohydrate, protein, as well as calorie intake during this meal. Guided by Example 1.7, we can suggest that Ann set up dot products of vectors, e.g.

$$\vec{a} = \begin{bmatrix} 1 \\ 0 \\ 0.5 \end{bmatrix} \begin{matrix} \leftarrow \text{servings of cereal “X”} \\ \leftarrow \text{servings of cereal “Y”} \\ \leftarrow \text{servings of milk} \end{matrix} \qquad \vec{x} = \begin{bmatrix} 0 \\ 1 \\ 8 \end{bmatrix} \begin{matrix} \leftarrow \text{fat [g]/serving of cereal “X”} \\ \leftarrow \text{fat [g]/serving of cereal “Y”} \\ \leftarrow \text{fat [g]/serving of milk} \end{matrix}$$

so that $\vec{a} \cdot \vec{x} = (1)(0) + (0)(1) + (0.5)(8) = 4$ indicates how many grams of fat Ann has consumed. Ann could continue in the same manner by using vectors $\vec{y} = \begin{bmatrix} 25 \\ 46 \\ 12 \end{bmatrix}$ and $\vec{z} = \begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix}$ to correspond to the carbohydrates and proteins and borrowing the vector $\vec{w}$ from Example 1.7.

Encouraged by the initial success, Ann now sets out to tabulate the intake of calories, fat, carbohydrate, and protein during this meal for both of them

| | Calories | Fat [g] | Carbohydrate [g] | Protein [g] |
|---|---|---|---|---|
| Ann | ? | ? | ? | ? |
| Bill | ? | ? | ? | ? |

and quickly realizes that the answer involves even more dot products. The tables

| | Cereal “X” | Cereal “Y” | Milk | |
|---|---|---|---|---|
| Ann | 1 | 0 | 0.5 | $= \vec{a}^{\,T}$ |
| Bill | 0.5 | 1 | 1 | $= \vec{b}^{\,T}$ |

and

| | Calories | Fat [g] | Carbohydrate [g] | Protein [g] |
|---|---|---|---|---|
| Cereal “X” | 110 | 0 | 25 | 4 |
| Cereal “Y” | 190 | 1 | 46 | 4 |
| Milk | 150 | 8 | 12 | 8 |
| | $\vec{w}$ | $\vec{x}$ | $\vec{y}$ | $\vec{z}$ |

are combined to obtain

| | Calories | Fat [g] | Carbohydrate [g] | Protein [g] |
|---|---|---|---|---|
| Ann | $\vec{a} \cdot \vec{w} = 185$ | $\vec{a} \cdot \vec{x} = 4$ | $\vec{a} \cdot \vec{y} = 31$ | $\vec{a} \cdot \vec{z} = 8$ |
| Bill | $\vec{b} \cdot \vec{w} = 395$ | $\vec{b} \cdot \vec{x} = 9$ | $\vec{b} \cdot \vec{y} = 70.5$ | $\vec{b} \cdot \vec{z} = 14$ |
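The combined table of Example 1.14 is nothing more than eight dot products, one for each pairing of a person with a nutrient. A minimal sketch in plain Python follows; the variable names mirror the vectors of the example, while the helper `dot` and the name `table` are assumptions made here.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


# Servings of cereal "X", cereal "Y", and milk
a = [1, 0, 0.5]      # Ann
b = [0.5, 1, 1]      # Bill

# Amount per serving of cereal "X", cereal "Y", and milk
w = [110, 190, 150]  # calories
x = [0, 1, 8]        # fat [g]
y = [25, 46, 12]     # carbohydrate [g]
z = [4, 4, 8]        # protein [g]

# One row per person, one column per nutrient
table = [[dot(person, nutrient) for nutrient in (w, x, y, z)]
         for person in (a, b)]
print(table)  # [[185.0, 4.0, 31.0, 8.0], [395.0, 9.0, 70.5, 14.0]]
```

Each entry pairs one row of the servings table with one column of the nutrition table, which is the pattern that the definition of matrix multiplication formalizes.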
The operation of matrix multiplication defined below can be viewed as a way of combining information from two data tables (matrices) into one.

DEFINITION (Matrix Multiplication) Given an m × n matrix $A = [a_{ij}]$ and an n × p matrix $B = [b_{ij}]$, the product of A and B is the m × p matrix $AB = [c_{ij}]$ with

$$c_{ij} = (\mathrm{row}_i A)^T \cdot \mathrm{col}_j B.$$

[Figure: schematic of the product AB, highlighting the ith row of A and the jth column of B, which together determine the entry $c_{ij}$ of the m × p product.]

For notational convenience, we can allow the dot product to operate directly on both rows and columns (rather than restricting it solely to column vectors), as long as both have the same number of elements. With this understanding, the main equation in the definition above could be simplified to $c_{ij} = \mathrm{row}_i A \cdot \mathrm{col}_j B$.
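EXAMPLE 1.15 Let $A = \begin{bmatrix} 2 & 3 \\ 0 & 1 \\ -1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} -1 & 4 & 2 & 0 & 6 \\ 1 & 1 & 0 & 0 & 3 \end{bmatrix}$. Using the notation of the definition, m = 3, n = 2, and p = 5; therefore, the product C = AB is a 3 × 5 matrix. For instance, we can calculate the (1, 2) entry of the product:

$$c_{12} = (\mathrm{row}_1 A)^T \cdot \mathrm{col}_2 B = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 1 \end{bmatrix} = (2)(4) + (3)(1) = 11.$$

Let us highlight the row and column that participated in this calculation: the first row of A, $\begin{bmatrix} 2 & 3 \end{bmatrix}$, and the second column of B, $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$, produce the entry in the first row and second column of the product:

$$\begin{bmatrix} 2 & 3 \\ 0 & 1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} -1 & 4 & 2 & 0 & 6 \\ 1 & 1 & 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} \cdot & 11 & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot \end{bmatrix}.$$

Each of the remaining entries of C is determined likewise; e.g.,

$$c_{35} = (\mathrm{row}_3 A)^T \cdot \mathrm{col}_5 B = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 6 \\ 3 \end{bmatrix} = (-1)(6) + (2)(3) = 0.$$

We ask the reader to verify that

$$AB = \begin{bmatrix} 1 & 11 & 4 & 0 & 21 \\ 1 & 1 & 0 & 0 & 3 \\ 3 & -2 & -2 & 0 & 0 \end{bmatrix}.$$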
Note that the product BA of the matrices in the example above is undefined since the number of columns of B (five) does not match the number of rows of A (three). Generally, it is not true that AB = BA (however, it does hold for some matrices A and B). For clarity, we shall occasionally say that A is postmultiplied by B, or premultiplied by B, in order to distinguish AB from BA, respectively.
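The definition can be turned into a few lines of code. The sketch below, in plain Python with matrices stored as lists of rows, computes every entry $c_{ij}$ as the dot product of row i of A with column j of B; the function name `matmul` and the choice to raise `ValueError` for mismatched sizes are assumptions of this sketch.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of the first matrix must match rows of the second")
    p = len(B[0])
    # c_ij = (row i of A) . (column j of B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[2, 3], [0, 1], [-1, 2]]
B = [[-1, 4, 2, 0, 6], [1, 1, 0, 0, 3]]

print(matmul(A, B))
# [[1, 11, 4, 0, 21], [1, 1, 0, 0, 3], [3, -2, -2, 0, 0]]

try:
    matmul(B, A)   # B has five columns, A has three rows
except ValueError as err:
    print("BA is undefined:", err)
```

The first printed matrix agrees with the product AB of Example 1.15, and the attempt to form BA fails for exactly the size reason given above.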
Properties of matrix multiplication

While there is no commutativity property of matrix multiplication, here are some of the properties this operation does possess.

THEOREM 1.5 (Properties of Matrix Multiplication)
Let A, B be m × n matrices, C, D be n × p matrices, and E be a p × q matrix. Also, let r be a real number.
1. $(AC)E = A(CE)$.
2. $A(C + D) = AC + AD$.
3. $(A + B)C = AC + BC$.
4. $r(AC) = (rA)C = A(rC)$.
5. $(AC)^T = C^T A^T$.
6. $A I_n = I_m A = A$.

PROOF
Property 1. On the left-hand side, the m × p matrix AC has its (i, k) entry equal to $\sum_{j=1}^{n} a_{ij} c_{jk}$. The m × q matrix (AC)E has its (i, l) entry equal to $\sum_{k=1}^{p} \left( \sum_{j=1}^{n} a_{ij} c_{jk} \right) e_{kl}$.
On the right-hand side, the (j, l) entry of the n × q matrix CE is $\sum_{k=1}^{p} c_{jk} e_{kl}$. The (i, l) entry of the m × q matrix A(CE) is $\sum_{j=1}^{n} a_{ij} \left( \sum_{k=1}^{p} c_{jk} e_{kl} \right)$. Both sides are equal to $\sum_{j=1}^{n} \sum_{k=1}^{p} a_{ij} c_{jk} e_{kl}$.

Property 2. The m × p matrix on the left-hand side has its (i, k) entry equal to $\sum_{j=1}^{n} a_{ij} (c_{jk} + d_{jk})$. The m × p matrix on the right-hand side has its (i, k) entry equal to $\sum_{j=1}^{n} a_{ij} c_{jk} + \sum_{j=1}^{n} a_{ij} d_{jk}$.

Property 3. This can be proved in the same manner as property 2 (see Exercise 32 on p. 33).

Property 4. All three matrices are m × p.
r(AC) has its (i, k) entry equal to $r \left( \sum_{j=1}^{n} a_{ij} c_{jk} \right)$;
(rA)C has its (i, k) entry equal to $\sum_{j=1}^{n} (r a_{ij}) c_{jk}$;
A(rC) has its (i, k) entry equal to $\sum_{j=1}^{n} a_{ij} (r c_{jk})$.
All three matrices are identical.

Property 5. Note that both sides are p × m matrices. The (k, i) entry of $(AC)^T$ is the same as the (i, k) entry of AC; i.e., $(\mathrm{row}_i A)^T \cdot \mathrm{col}_k C$. The (k, i) entry of $C^T A^T$ is $(\mathrm{row}_k C^T)^T \cdot \mathrm{col}_i A^T$. Since a row of a transpose matches the corresponding column of the original matrix and vice versa,

$$(\mathrm{row}_k C^T)^T \cdot \mathrm{col}_i A^T = \mathrm{col}_k C \cdot (\mathrm{row}_i A)^T$$

so that the two sides are equal.

Property 6. We show only $A I_n = A$ ($I_m A = A$ can be shown similarly). The (i, k) entry of the m × n matrix $A I_n$ is

$$(\mathrm{row}_i A)^T \cdot \mathrm{col}_k I_n = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{ik} \\ \vdots \\ a_{in} \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \begin{matrix} \\ \\ \\ \leftarrow k\text{th position} \\ \\ \\ \end{matrix} = a_{ik}.$$

Therefore, $A I_n = A$.
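EXAMPLE 1.16 Let us illustrate property 5 of Theorem 1.5 using the matrices A and B based on the data in Example 1.14:

$$A = \begin{bmatrix} 1 & 0 & 0.5 \\ 0.5 & 1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 110 & 0 & 25 & 4 \\ 190 & 1 & 46 & 4 \\ 150 & 8 & 12 & 8 \end{bmatrix}.$$

The left-hand side matrix,

$$(AB)^T = \begin{bmatrix} 185 & 4 & 31 & 8 \\ 395 & 9 & 70.5 & 14 \end{bmatrix}^T = \begin{bmatrix} 185 & 395 \\ 4 & 9 \\ 31 & 70.5 \\ 8 & 14 \end{bmatrix},$$

can be interpreted as a table:

| | Ann | Bill |
|---|---|---|
| Calories | 185 | 395 |
| Fat [g] | 4 | 9 |
| Carbohydrate [g] | 31 | 70.5 |
| Protein [g] | 8 | 14 |

To obtain this “transposed” table using matrix multiplication, it makes sense to multiply

$$B^T = \begin{bmatrix} 110 & 190 & 150 \\ 0 & 1 & 8 \\ 25 & 46 & 12 \\ 4 & 4 & 8 \end{bmatrix} \begin{matrix} \leftarrow \text{cal.} \\ \leftarrow \text{fat} \\ \leftarrow \text{carb.} \\ \leftarrow \text{protein} \end{matrix} \quad \text{times} \quad A^T = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \\ 0.5 & 1 \end{bmatrix} \begin{matrix} \leftarrow \text{cereal “X”} \\ \leftarrow \text{cereal “Y”} \\ \leftarrow \text{milk} \end{matrix}$$

(the columns of $B^T$ correspond to cereal “X”, cereal “Y”, and milk; the columns of $A^T$ correspond to Ann and Bill), confirming that

$$(AB)^T = B^T A^T.$$

Attempting to multiply $A^T B^T$ instead would lead us nowhere, as a 3 × 2 matrix cannot be multiplied by a 4 × 3 matrix.

(Generally, $(AC)^T \neq A^T C^T$.)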
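Property 5 can also be checked numerically on the breakfast data. The following sketch in plain Python repeats two small helper functions (`matmul` and `transpose`, names assumed here) so that it runs on its own.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of the first matrix must match rows of the second")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Write the rows of A as the columns of the result."""
    return [list(column) for column in zip(*A)]


A = [[1, 0, 0.5], [0.5, 1, 1]]                          # servings (Ann, Bill)
B = [[110, 0, 25, 4], [190, 1, 46, 4], [150, 8, 12, 8]]  # nutrition per serving

left = transpose(matmul(A, B))
right = matmul(transpose(B), transpose(A))
print(left)           # [[185.0, 395.0], [4.0, 9.0], [31.0, 70.5], [8.0, 14.0]]
print(left == right)  # True

try:
    matmul(transpose(A), transpose(B))   # 3 x 2 times 4 x 3
except ValueError as err:
    print("A^T B^T is undefined:", err)
```

The exact comparison `left == right` is safe here only because every number involved is a multiple of 0.5 and is therefore represented exactly in floating point; with general decimal data a tolerance-based comparison would be the more cautious choice.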
Matrix power

The notation $A^k$, with an n × n matrix A and a nonnegative integer k, represents the kth power of the matrix A, defined in a way consistent with the powers of real numbers:

$$A^k = \underbrace{AA \cdots A}_{k \text{ factors}}, \qquad A^0 = I_n.$$

EXAMPLE 1.17 If $A = \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix}$, then

$$A^3 = AAA = (AA)A = \begin{bmatrix} 4 & 7 \\ 0 & 25 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 8 & 39 \\ 0 & 125 \end{bmatrix}.$$

Block multiplication of partitioned matrices
The partitioned matrices, introduced in the previous section, can be used in matrix multiplication as illustrated below (see Exercises 43 and 44 for a justification).
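EXAMPLE 1.18 Consider two partitioned matrices:

$$A = \left[\begin{array}{cc|c} 0 & 2 & 4 \\ 1 & 1 & 3 \\ \hline 9 & 0 & 5 \end{array}\right] \quad \text{and} \quad B = \left[\begin{array}{cccc} 2 & 0 & 3 & 1 \\ 1 & 5 & 1 & 2 \\ \hline 4 & 1 & 0 & 0 \end{array}\right].$$

We can write $A = \begin{bmatrix} C & D \\ E & F \end{bmatrix}$ and $B = \begin{bmatrix} G \\ H \end{bmatrix}$ where $C = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$, $E = \begin{bmatrix} 9 & 0 \end{bmatrix}$, $F = \begin{bmatrix} 5 \end{bmatrix}$, $G = \begin{bmatrix} 2 & 0 & 3 & 1 \\ 1 & 5 & 1 & 2 \end{bmatrix}$, and $H = \begin{bmatrix} 4 & 1 & 0 & 0 \end{bmatrix}$.

The product AB can be obtained as follows:

$$AB = \begin{bmatrix} C & D \\ E & F \end{bmatrix} \begin{bmatrix} G \\ H \end{bmatrix} = \begin{bmatrix} CG + DH \\ EG + FH \end{bmatrix}.$$

Since

$$CG + DH = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 3 & 1 \\ 1 & 5 & 1 & 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 10 & 2 & 4 \\ 3 & 5 & 4 & 3 \end{bmatrix} + \begin{bmatrix} 16 & 4 & 0 & 0 \\ 12 & 3 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 18 & 14 & 2 & 4 \\ 15 & 8 & 4 & 3 \end{bmatrix}$$

and

$$EG + FH = \begin{bmatrix} 9 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 3 & 1 \\ 1 & 5 & 1 & 2 \end{bmatrix} + \begin{bmatrix} 5 \end{bmatrix} \begin{bmatrix} 4 & 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 18 & 0 & 27 & 9 \end{bmatrix} + \begin{bmatrix} 20 & 5 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 38 & 5 & 27 & 9 \end{bmatrix},$$

it follows that

$$AB = \begin{bmatrix} CG + DH \\ EG + FH \end{bmatrix} = \begin{bmatrix} 18 & 14 & 2 & 4 \\ 15 & 8 & 4 & 3 \\ 38 & 5 & 27 & 9 \end{bmatrix}.$$

Check that multiplying AB directly (without partitions), you can obtain the same result.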
Multiplication of partitioned matrices in such a way is referred to as block multiplication. Note that in order for block multiplication to be possible, care must be taken to partition the matrices in the proper way, so that the corresponding dimensions match. Block multiplication can be very useful when large matrices are multiplied using parallel computers. The required multiplications between partitions can be divided among individual processors, and their outputs can be assembled to form the final product. Such a solution can be obtained more quickly (as compared to multiplying the matrices in their entirety), as each individual processor is solving a smaller problem and does not require access to all the data, just the relevant partitions.
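The computation of Example 1.18 can be replayed in code to confirm that the blockwise and direct products agree. This is a sequential sketch in plain Python (the helpers `matmul` and `mat_add` are names assumed here); it does not attempt to show the parallel execution mentioned above, only that the four partition products can be computed independently and then assembled.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of the first matrix must match rows of the second")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Partitions of A = [[C, D], [E, F]] and B = [[G], [H]] from Example 1.18
C = [[0, 2], [1, 1]]
D = [[4], [3]]
E = [[9, 0]]
F = [[5]]
G = [[2, 0, 3, 1], [1, 5, 1, 2]]
H = [[4, 1, 0, 0]]

# Each of the four products below could be computed independently
top = mat_add(matmul(C, G), matmul(D, H))      # CG + DH
bottom = mat_add(matmul(E, G), matmul(F, H))   # EG + FH
blockwise = top + bottom                       # stack the block rows

A = [[0, 2, 4], [1, 1, 3], [9, 0, 5]]
B = [[2, 0, 3, 1], [1, 5, 1, 2], [4, 1, 0, 0]]
direct = matmul(A, B)

print(blockwise)            # [[18, 14, 2, 4], [15, 8, 4, 3], [38, 5, 27, 9]]
print(blockwise == direct)  # True
```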
EXERCISES
⎡
⎤ 2 −1 4 0 0 1 ⎢ ⎥ In Exercises 1–8, use the matrices A = ⎣ 3 , 0 ⎦, B = 2 −2 0 3 2 1 3 4 4 1 1 2 0 C= ,D= , and E = 5 −2 1 −1 −1 0 2 to evaluate each expression if possible. 1. a. CD; b. DC; c. AD; d. DA; e. CE; f. EC. 2. a. CB; b. BC; c. AB; d. BA; e. BD; f. DB. T T 3. a. (AC) ; b. AT C T ; c. C T AT ; d. B T C ; e. C T B. 4. a. (CD)T ; b. C T DT ; c. DT C T ; d. (DT C)T ; e. C T D. 5. a. C 2 ; b. D2 . 6. a. E 3 ; b. B 4 . 7. a. ED + CAT ; b. ACD; c. C 2 ED + AB. 8. a. BE − EB; b. B T EC; c. DT E 2 − 2A. In Exercises 9–12, use the matrices A =
0 2 −1 1 3 −2
,B =
2 2 0 0 1 3
,
Section 1.3 Matrix Multiplication ⎡
⎤
⎡
31
⎤
4 −2 ⎢ ⎥ ⎢ ⎥ C = ⎣ 1 ⎦ , D = ⎣ 1 ⎦ , and E = 4 2 . 1 −3 9. Verify property 1 of Theorem 1.5. 10. Verify property 2 of Theorem 1.5. 11. Verify property 3 of Theorem 1.5. 12. Using the scalar r = 2, verify property 4 of Theorem 1.5.
In Exercises 13–22, decide whether each statement is true or false. Justify your answer.
13. $AB^2 = (AB)B$ for all 3 × 3 matrices A and B.
14. $(A - B)C = AC - BC$ for all 2 × 2 matrices A, B, and C.
15. $(A - B)(A + B) = A^2 - B^2$ for all n × n matrices A and B.
16. $(AB)^2 = A^2 B^2$ for all n × n matrices A and B.
17. $(A^2)^3 = A^6$ for all n × n matrices A.
18. $(A + B)^2 = A^2 + 2AB + B^2$ for all n × n matrices A and B.
19. $(A - B)^2 = A^2 - 2AB + B^2$ for all n × n matrices A and B.
20. $A^3 A^4 = A^7$ for all n × n matrices A.
21. $(A^T B)^T = B^T A$ for all n × n matrices A and B.
22. $(A^T B^T)^T = AB$ for all n × n matrices A and B.
23. A car company operates three plants PL1, PL2, and PL3 that manufacture four models: a compact car (CC), a luxury sedan (LS), a pickup truck (PT), and a minivan (MV). If the production during an average day at each plant is specified in this table

| | CC | LS | PT | MV |
|---|---|---|---|---|
| Plant 1 | 100 | 30 | 0 | 0 |
| Plant 2 | 0 | 0 | 100 | 80 |
| Plant 3 | 0 | 30 | 60 | 40 |

and the retail price as well as the profit made on each vehicle sold is

| | Price ($) | Profit ($/vehicle) |
|---|---|---|
| CC | 18,000 | 1,200 |
| LS | 40,000 | 5,000 |
| PT | 24,000 | 2,500 |
| MV | 30,000 | 3,500 |

use matrix multiplication to calculate the entries in the following table:

| | Total revenue ($/day) | Total profit ($/day) |
|---|---|---|
| Plant 1 | ? | ? |
| Plant 2 | ? | ? |
| Plant 3 | ? | ? |
24. John, Kate, Laura, and Michael are enrolled in the same linear algebra class. Their test averages, quiz averages, and final exam scores are as follows:

| | Test Avg. | Quiz Avg. | Final |
|---|---|---|---|
| John | 75 | 80 | 60 |
| Kate | 70 | 80 | 100 |
| Laura | 95 | 100 | 85 |
| Michael | 80 | 90 | 95 |

If the overall course grade is determined based on the scheme

| Test Avg. | Quiz Avg. | Final |
|---|---|---|
| 0.5 | 0.2 | 0.3 |

set up matrix multiplication to determine the overall course averages of the four students in a 4 × 1 matrix.

25. John’s family lives in Canada; he spends 30 minutes in an average month calling them. He also has friends in Japan and France and his calls to each of these countries average 40 minutes per month. Kate has family in Canada and in U.K. and spends about an hour per month calling each country. Her typical monthly international phone bill also includes about 20 minutes of calls to friends in Mexico and 30 minutes to those in France. They are trying to compare offers of three different telephone companies: Diagonal Enterprises, Symmetric Telecom, and Transpose Labs, which advertise the following rates per minute to the countries John and Kate are interested in calling:

| | D.E. | S.T. | T.L. |
|---|---|---|---|
| Canada | $0.05 | $0.10 | $0.05 |
| France | $0.10 | $0.10 | $0.10 |
| Japan | $0.20 | $0.10 | $0.30 |
| Mexico | $0.05 | $0.10 | $0.15 |
| U.K. | $0.10 | $0.10 | $0.05 |

Use matrix multiplication to arrive at a table representing expected monthly cost to John and to Kate, with each of the three companies:

| | D.E. | S.T. | T.L. |
|---|---|---|---|
| John | ? | ? | ? |
| Kate | ? | ? | ? |

Based on the values obtained, which company should John choose? What about Kate?

In Exercises 26–28, we shall let $0_{m \times n}$ denote the zero m × n matrix (i.e., the m × n matrix all of whose entries are zero).
26. Show that if A is an m × n matrix, then a. $A 0_{n \times p} = 0_{m \times p}$, b. $0_{k \times m} A = 0_{k \times n}$.
27. * Generally, if A is an m × n matrix and B is an n × p matrix, $AB = 0_{m \times p}$ does not imply that either $A = 0_{m \times n}$ or $B = 0_{n \times p}$.
a. Using the vectors $\vec{u}$ and $\vec{v}$ of Example 1.8, construct the nonzero matrices $A = \vec{u}^{\,T}$ and $B = \vec{v}$ whose product AB is a zero 1 × 1 matrix.
b. Find examples of nonzero 2 × 2 matrices A and B such that $AB = 0_{2 \times 2}$.
28. * An n × n matrix A such that for some positive integer k, $A^k = 0_{n \times n}$ is called a nilpotent matrix.
a. Show that $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ is a nilpotent matrix.
b. Find an example of a 2 × 2 nonzero nilpotent matrix.
29. * Let A be an m × n matrix and $\vec{x}$ be an n-vector.
a. Apply Theorem 1.5 and Theorem 1.2 to show that $\vec{x}^{\,T} A^T A \vec{x} = \|A\vec{x}\|^2$.
b. Use the result of part a and Exercise 55 on p. 16 to show that $A^T A \vec{x} = \vec{0}$ implies $A\vec{x} = \vec{0}$.
c. Apply Theorem 1.5 to show that $A\vec{x} = \vec{0}$ implies $A^T A \vec{x} = \vec{0}$, consequently (together with the result of part b) establishing

$$A\vec{x} = \vec{0} \text{ if and only if } A^T A \vec{x} = \vec{0}. \qquad (5)$$
30. * If A is an m × n matrix, show that both AT A and AAT are symmetric. 31. * If A and B are n × n symmetric matrices, is AB always symmetric? If so, prove it. If not, find a counterexample. 32. Assuming A, B are m × n matrices and C is an n × p matrix, show that a. the matrices L = (A + B)C and R = AC + BC have the same size, b. the (i, k) entries of both matrices L and R of part a are equal to each other for all i = 1, . . . , m and k = 1, . . . , p. ⎡ ⎤ 1 −2 0 2 1 −1 ⎢ ⎥ 33. Consider the matrices A = ⎣ 0 . Verify that mul1 ⎦ and B = 1 −1 1 3 2 3 tiplying AB directly, the result you obtain is the same as when using block multiplication C and B = E F . with A = D ⎡
2 1 ⎢ 34. Consider the matrices A = ⎣ 0 −1 1 0 AB directly, the result youobtain is E . A = C D and B = F
⎤ ⎡ 2 −1 2 ⎥ ⎢ 3 ⎦ and B = ⎣ 3 0 2 0 1 the same as when using
⎤ ⎥ ⎦ . Verify that multiplying block multiplication with
→ 35. * Assume A is an m × n matrix partitioned into columns A = − u 1 ⎤ ⎡ − → T v1 ⎢ . ⎥ . ⎥ is an n × p matrix partitioned into rows B = ⎢ ⎣ . ⎦ . Show that − T v→ n → →− u→− v T + ··· + − v→T AB = − u 1 1
without relying on block multiplication.
n n
···
− u→ n
and B
36. * Show that a product of two n × n diagonal matrices is also a diagonal matrix.
37. * Show that if D is a diagonal matrix and k is a positive integer, then $D^k$ is a diagonal matrix whose main diagonal entries are kth powers of the corresponding entries in D.
38. * Assume A is an m × n matrix, B is an n × p matrix, and both A and B are rectangular diagonal. Show that AB is also a rectangular diagonal matrix.
39. * Show that a product of two n × n upper triangular matrices is also an upper triangular matrix.
40. * Show that a product of two n × n unit lower triangular matrices is also a unit lower triangular matrix.
41. * Assume A and B are two n × n tridiagonal matrices; i.e. (see formula (4)), $a_{ij} = b_{ij} = 0$ if $|i - j| > 1$.
a. Show that C = AB is not generally tridiagonal.
b. Show that C = AB is pentadiagonal, i.e., satisfies $c_{ij} = 0$ if $|i - j| > 2$.
42. * If A is an n × n matrix and $c_0, \ldots, c_i$ are real numbers, then $p(A) = c_i A^i + c_{i-1} A^{i-1} + \cdots + c_2 A^2 + c_1 A + c_0 I_n$ is called a matrix polynomial. If $q(A) = d_j A^j + \cdots + d_1 A + d_0 I_n$, show that the two polynomials commute; i.e., $p(A)q(A) = q(A)p(A)$.
43. * Prove each equality below:
a. $\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} B = \begin{bmatrix} A_1 B \\ A_2 B \end{bmatrix}$ where $A_1$ is m × p, $A_2$ is n × p, and B is p × q.
b. $A \begin{bmatrix} B_1 & B_2 \end{bmatrix} = \begin{bmatrix} AB_1 & AB_2 \end{bmatrix}$ where A is m × n, $B_1$ is n × p, and $B_2$ is n × q.
c. $\begin{bmatrix} A_1 & A_2 \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = A_1 B_1 + A_2 B_2$ where $A_1$ is m × n, $A_2$ is m × p, $B_1$ is n × q, and $B_2$ is p × q.
44.
a. * Use the results of the previous exercise to prove that

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\ A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22} \end{bmatrix}$$

with all of the partitions appropriately sized.
b. * Again, assuming all partitions are appropriately sized, discuss how the results of the previous exercise could be used (repeatedly) to prove the general equality

$$\begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} \begin{bmatrix} B_{11} & \cdots & B_{1p} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{np} \end{bmatrix} = \begin{bmatrix} C_{11} & \cdots & C_{1p} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mp} \end{bmatrix}$$

where $C_{ij} = A_{i1} B_{1j} + A_{i2} B_{2j} + \cdots + A_{in} B_{nj}$.
45. * If A and B are n × n matrices, show that ⎡ ⎤⎡ In A 0 In ⎢ ⎥⎢ In B ⎦ ⎣ 0 ⎣ 0 0
0
In
0
−A In 0
AB
⎤
⎥ −B ⎦ = I3n . In
46. * Consider an m × m matrix in partitioned form, assuming$^4$ i > 1 and i + 1 < j < m,
\[ A = \begin{bmatrix} I_{i-1} & 0 & 0 & 0 & 0 \\ 0 & a & 0 & b & 0 \\ 0 & 0 & I_{j-i-1} & 0 & 0 \\ 0 & c & 0 & d & 0 \\ 0 & 0 & 0 & 0 & I_{m-j} \end{bmatrix} \]
(a and b are in the ith row, c and d are in the jth row; a and c are in the ith column, b and d are in the jth column), and an m × n matrix partitioned as follows:
\[ B = \begin{bmatrix} P \\ \vec{u}^T \\ Q \\ \vec{v}^T \\ R \end{bmatrix} \]
(where $\vec{u}^T$ is the ith row and $\vec{v}^T$ is the jth row).
a. Show that if a = d = 1 and b = 0, then $AB = \begin{bmatrix} P \\ \vec{u}^T \\ Q \\ \vec{v}^T + c\,\vec{u}^T \\ R \end{bmatrix}$.
b. Show that if a = d = 0 and b = c = 1, then $AB = \begin{bmatrix} P \\ \vec{v}^T \\ Q \\ \vec{u}^T \\ R \end{bmatrix}$.
c. Show that if b = c = 0 and d = 1, then $AB = \begin{bmatrix} P \\ a\,\vec{u}^T \\ Q \\ \vec{v}^T \\ R \end{bmatrix}$.
1.4 Introduction to Linear Transformations
Matrix-vector product
A special case of multiplication of two matrices arises when the second matrix is a vector (i.e., it has only one column). As shown in the following theorem, such a product can be expressed as a linear combination of the columns of the first matrix.
4 It can be shown that the results of Exercise 46 will also hold if we permit i = 1, j = i + 1, or j = m, but some of the partitions shown in matrices A and B would no longer be present in those cases.
THEOREM 1.6 The product of an m × n matrix $A = \begin{bmatrix} | & | & \cdots & | \\ \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \\ | & | & \cdots & | \end{bmatrix}$ and an n-vector $\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ is
\[ A\vec{v} = x_1 \vec{u}_1 + x_2 \vec{u}_2 + \cdots + x_n \vec{u}_n. \]
PROOF Multiplying $A\vec{v}$ using the formula in the definition on p. 26, we obtain
\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} \]
\[ = \begin{bmatrix} a_{11}x_1 \\ a_{21}x_1 \\ \vdots \\ a_{m1}x_1 \end{bmatrix} + \begin{bmatrix} a_{12}x_2 \\ a_{22}x_2 \\ \vdots \\ a_{m2}x_2 \end{bmatrix} + \cdots + \begin{bmatrix} a_{1n}x_n \\ a_{2n}x_n \\ \vdots \\ a_{mn}x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}. \]
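The statement of Theorem 1.6 can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `column_combination` and the sample matrix are our own illustrative choices, not part of the text. It computes the product once entry by entry and once as a linear combination of columns.

```python
def mat_vec(A, v):
    """Entry-by-entry product: the ith entry is (row i of A) dotted with v."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, v)) for row in A]


def column_combination(A, v):
    """Linear combination x1*u1 + ... + xn*un of the columns u1, ..., un of A."""
    m = len(A)
    result = [0] * m
    for j, x_j in enumerate(v):
        for i in range(m):
            result[i] += x_j * A[i][j]  # add x_j times the ith entry of column j
    return result


# Hypothetical sample data: a 2 x 3 matrix and a 3-vector.
A = [[1, 2, 0],
     [3, -1, 4]]
v = [2, 1, -1]

print(mat_vec(A, v))             # [4, 1]
print(column_combination(A, v))  # [4, 1]
```

Both computations perform the same multiplications; they differ only in the order in which the terms are grouped, which is exactly the content of the proof.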
The multiplication of a matrix by a vector is frequently going to be performed under the assumption that the matrix is fixed (while the vector can change). Considering this point of view, we can think of the process of calculating $A\vec{v}$ as a transformation (alternatively called a function or a mapping):
\[ F(\vec{v}) = A\vec{v}. \tag{6} \]
It should be clear that for an m × n matrix, this transformation accepts a vector in $\mathbb{R}^n$ as its input (or "independent variable") and produces a vector in $\mathbb{R}^m$ as its output ("dependent variable"). This can be succinctly expressed using the notation $F : \mathbb{R}^n \to \mathbb{R}^m$, which designates $\mathbb{R}^n$ as the domain of F and $\mathbb{R}^m$ as the codomain of F. Note that F must produce a single codomain value for each value from its domain. However, there may be some values in the codomain which cannot be obtained by applying F to any domain value; the subset of the codomain consisting of all the outputs of F is called the range of F.
[Figure: the transformation F maps a vector v to F(v) = Av.]
In your study of calculus and other areas of mathematics, you have encountered a large number of functions. For instance:
• $p(x) = x^2$ is an example of a scalar-valued function of one variable; $p : \mathbb{R} \to \mathbb{R}$ indicates that both the domain and the codomain of p are the set of all real numbers (however, the range of p is the set of nonnegative real numbers).
• $g(x, y) = e^{xy}$ is a scalar-valued function of two variables; $g : \mathbb{R}^2 \to \mathbb{R}$ designates the domain to be the set of all two-vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ with real components while the codomain is $\mathbb{R}$ (the range is the set of all positive real numbers).
• $\vec{r}(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}$ defines a vector-valued function of one variable, which can be alternatively expressed as parametric equations $x = \cos t$, $y = \sin t$; $\vec{r} : \mathbb{R} \to \mathbb{R}^2$ and therefore the domain is $\mathbb{R}$ and the codomain is $\mathbb{R}^2$ (the range is the set of all 2-vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ such that $x^2 + y^2 = 1$, i.e., the unit circle in the xy-plane centered at the origin).
Compared to the general functions mapping $\mathbb{R}^n$ into $\mathbb{R}^m$ like the ones listed above, it turns out that transformations defined according to formula (6) are very special. Their key properties, which follow from parts 2 and 4 of Theorem 1.5, are that for all n-vectors $\vec{u}$ and $\vec{v}$ and for all scalars c
\[ A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v} \quad \text{and} \quad A(c\vec{u}) = c(A\vec{u}). \]
We shall distinguish those mappings F which behave in this fashion.
DEFINITION (Linear Transformation) $F : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if
1. for all n-vectors $\vec{u}, \vec{v}$, $F(\vec{u} + \vec{v}) = F(\vec{u}) + F(\vec{v})$ and
2. for all n-vectors $\vec{u}$ and real numbers c, $F(c\vec{u}) = cF(\vec{u})$.
Note that none of the functions p, g, and $\vec{r}$ discussed above satisfy these conditions; therefore, they are not linear transformations. By repeatedly applying the two conditions in the linear transformation definition, the following important result can be established.
THEOREM 1.7 F is a linear transformation if and only if for any n-vectors $\vec{u}_1, \ldots, \vec{u}_k$ and scalars $c_1, \ldots, c_k$
\[ F(c_1 \vec{u}_1 + \cdots + c_k \vec{u}_k) = c_1 F(\vec{u}_1) + \cdots + c_k F(\vec{u}_k). \tag{7} \]
Obviously, for every matrix A, the transformation F defined by formula (6) is a linear transformation. The converse is also true.
THEOREM 1.8 $F : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if there exists an m × n matrix A such that $F(\vec{v}) = A\vec{v}$.
[Margin note: This theorem claims equivalence of the statements "F is a linear transformation" and "$F(\vec{v}) = A\vec{v}$ for some A". To prove ⇔, it is sufficient to prove ⇒ and ⇐.]
PROOF
Part I (⇒) Let F be a linear transformation. Using the notation introduced in Example 1.12, we can write
\[ \vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \vec{e}_1 + x_2 \vec{e}_2 + \cdots + x_n \vec{e}_n \]
so that
\[ F(\vec{v}) = F(x_1 \vec{e}_1 + x_2 \vec{e}_2 + \cdots + x_n \vec{e}_n) = x_1 F(\vec{e}_1) + x_2 F(\vec{e}_2) + \cdots + x_n F(\vec{e}_n). \tag{8} \]
This linear combination can now, by Theorem 1.6, be rewritten as the matrix-vector product
\[ F(\vec{v}) = \underbrace{\begin{bmatrix} | & | & \cdots & | \\ F(\vec{e}_1) & F(\vec{e}_2) & \cdots & F(\vec{e}_n) \\ | & | & \cdots & | \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{9} \]
demonstrating that $F(\vec{v})$ can be written as a product $A\vec{v}$ where the jth column of A is $F(\vec{e}_j)$.
Part II (⇐) If $F(\vec{v}) = A\vec{v}$, then F is a linear transformation by properties 2 and 4 of Theorem 1.5.
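The construction used in Part I of the proof translates directly into a procedure: apply F to each standard unit vector and place the results side by side as columns. Below is a minimal sketch in plain Python; the function names and the sample transformation `F` are hypothetical and chosen only for illustration.

```python
def standard_basis(n):
    """Return the unit vectors e1, ..., en of R^n as lists."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def matrix_of(F, n):
    """Build the matrix whose jth column is F(ej), as in formula (9)."""
    columns = [F(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(A, v):
    """Matrix-vector product, row by row."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def F(v):
    """A hypothetical linear transformation from R^3 to R^2."""
    x, y, z = v
    return [2 * x - y, y + 3 * z]


A = matrix_of(F, 3)
print(A)                      # [[2, -1, 0], [0, 1, 3]]
print(mat_vec(A, [1, 2, 3]))  # [0, 11]
print(F([1, 2, 3]))           # [0, 11]
```

Note that this procedure yields a matrix reproducing F only when F is actually linear; for a nonlinear F the resulting matrix would agree with F on the unit vectors but not in general.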
Geometric interpretation of linear transformations in R2
Throughout this book, we will often refer to specific linear transformations $F : \mathbb{R}^2 \to \mathbb{R}^2$. On one hand these transformations are simple enough to be easily described both algebraically (using 2 × 2 matrices) and geometrically (in the xy-plane). On the other hand, they form a stepping stone that will enable us to gain insight into more complicated settings. In this subsection, we shall present some important examples of such transformations.
EXAMPLE 1.19 Consider the transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ given by
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
According to Theorem 1.8, this transformation is linear. Performing the multiplication on the right-hand side we obtain
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} x \\ 0 \end{bmatrix}. \]
This corresponds to the (orthogonal) projection of the vector $\vec{u} = \begin{bmatrix} x \\ y \end{bmatrix}$ onto the x-axis since the resulting vector $F(\vec{u})$ retains the x-component of $\vec{u}$ but has its y-component equal to zero.
Likewise, $F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ represents the projection onto the y-axis.
[Figures: a vector u and its image F(u) under each of the two projections.]
EXAMPLE 1.20 The scaling transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[ F(\vec{u}) = k\vec{u} \tag{10} \]
is a linear transformation whose matrix representation is
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
If k > 1, then F is called a dilation; if 0 < k < 1, it is called a contraction. In the special case k = 1, $F(\vec{u}) = \vec{u}$ is called the identity transformation with the identity matrix $I_2$ serving as the matrix of this transformation.
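EXAMPLE 1.21 Let us consider a transformation which performs a reflection with respect to the y-axis. A vector $\vec{u} = \begin{bmatrix} x \\ y \end{bmatrix}$ should be transformed into its mirror image, where the mirror is positioned along the y-axis. This means reversing the sign of the x-coordinate, while keeping the y-coordinate unchanged:
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} -x \\ y \end{bmatrix}. \]
It can be easily verified that this transformation can be rewritten in the matrix form:
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
To perform reflection with respect to the x-axis instead, use the transformation
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}. \]
[Figures: a vector u and its mirror image F(u) for each of the two reflections.]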
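EXAMPLE 1.22 A counterclockwise rotation by 90 degrees is performed by the transformation
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
[Figure: a vector u with components a, b and its image F(u) with components −b, a.]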
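Consider a general linear transformation in $\mathbb{R}^2$
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
To help understand it, let us investigate what happens when F is applied to the unit vectors $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$:
\[ F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a \\ c \end{bmatrix}; \quad F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} b \\ d \end{bmatrix}. \]
[Margin note: This is consistent with equation (9) in the proof of Theorem 1.8, according to which the columns of the matrix of the linear transformation F represent results of applying the transformation to the unit vectors $\vec{e}_1, \vec{e}_2, \ldots$.]
We can write
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = F(x\vec{e}_1 + y\vec{e}_2) = xF(\vec{e}_1) + yF(\vec{e}_2). \]
The nature of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ is fully revealed by the results of applying F to $\vec{e}_1$ and $\vec{e}_2$. Any other vector in $\mathbb{R}^2$ is a linear combination of $\vec{e}_1$ and $\vec{e}_2$; applying F to such a vector results in a linear combination of $F(\vec{e}_1)$ and $F(\vec{e}_2)$ with the same coefficients.
[Figure: F maps the unit square with sides $\vec{e}_1$, $\vec{e}_2$ (left) into the parallelogram with sides $F(\vec{e}_1)$, $F(\vec{e}_2)$ (right).]
E.g., the vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, which would be placed along the diagonal (starting at the origin) of the square on the left, is mapped into the vector along the corresponding diagonal of the parallelogram on the right.
The vector $\vec{u} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$ is transformed into $F(\vec{u}) = 2F(\vec{e}_1) - 1F(\vec{e}_2)$ as illustrated below.
[Figure: the vector u drawn on the grid generated by $\vec{e}_1$, $\vec{e}_2$ (left) and F(u) drawn on the grid generated by $F(\vec{e}_1)$, $F(\vec{e}_2)$ (right).]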
As we will see in the next two examples, it is sometimes convenient to proceed in the opposite → → direction: the information obtained from F (− e1 ) and F (− e2 ) can be used to build a matrix A → − → − such that F ( x ) = A x .
EXAMPLE 1.23 Let us now consider F representing a counterclockwise rotation by angle α in $\mathbb{R}^2$.
Recall from the discussion of vector addition on p. 4 that the sum of two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^2$ is a vector along the diagonal of the parallelogram with two adjacent sides formed by $\vec{u}$ and $\vec{v}$. When both vectors are rotated by α, the entire parallelogram is rotated as well, including the diagonal (the sum). Therefore, $F(\vec{u} + \vec{v}) = F(\vec{u}) + F(\vec{v})$ (see the margin illustration). Likewise, it can be easily seen that rotating a scalar multiple of a vector is equivalent to taking a scalar multiple (with the same scalar) of the rotated vector: $F(c\vec{u}) = cF(\vec{u})$. Therefore, by definition, F is a linear transformation.
[Margin figure: the parallelogram formed by u and v is rotated by α, so that F(u + v) = F(u) + F(v).]
To find the matrix of F, it is sufficient to find $F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ and $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$. From trigonometry we have
\[ F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}; \quad F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} -\sin\alpha \\ \cos\alpha \end{bmatrix}. \]
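Consequently, the rotation transformation can be expressed as
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \tag{11} \]
Note that the transformation discussed in Example 1.22 is just a special case of our transformation, with α = 90°.
[Margin figure: the unit vectors $\vec{e}_1$, $\vec{e}_2$ and their images $F(\vec{e}_1)$, $F(\vec{e}_2)$ under rotation by α, with the components cos α, sin α, and −sin α marked.]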
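Formula (11) is easy to experiment with. The following minimal sketch in plain Python (the helper names `rotation_matrix` and `apply` are our own, introduced only for illustration) builds the rotation matrix for a given angle in radians and checks that α = 90° reproduces, up to rounding error, the matrix of Example 1.22.

```python
import math


def rotation_matrix(alpha):
    """Matrix of the counterclockwise rotation by alpha (radians), formula (11)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s],
            [s, c]]


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


R90 = rotation_matrix(math.pi / 2)

# Entries are floating-point numbers, so cos(90 degrees) comes out as a tiny
# nonzero value; rounding recovers the exact matrix of Example 1.22.
print([[round(entry) for entry in row] for row in R90])  # [[0, -1], [1, 0]]

image = apply(R90, [1.0, 0.0])  # e1 should be rotated (approximately) into e2
```

Because the trigonometric functions are evaluated in floating point, comparisons with exact values should use a small tolerance rather than strict equality.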
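EXAMPLE 1.24 Consider the polygon representing the outline of the capital letter "F" pictured in the margin. If the position vector $\vec{v}$ of each of the ten corner points undergoes the linear transformation $F(\vec{v}) = A\vec{v}$ and if the corresponding points are connected in the same way, the letter "F" will still be recognizable, but distorted. An easy way to find the matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is to examine how this transformation affects horizontal vectors and vertical ones. Once this is clear, the reasoning presented on p. 39 yields $F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} a \\ c \end{bmatrix}$; $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} b \\ d \end{bmatrix}$.
It may be helpful to draw a rectangle aligned with the axes surrounding the original figure. The transformed figure is then surrounded by a parallelogram. Here are some specific examples:
[Margin figures: the original letter "F" and illustrations for parts (a), (b), (c), and (d).]
(a) Every vertical vector $\begin{bmatrix} 0 \\ a \end{bmatrix}$ transforms into $\begin{bmatrix} a \\ a \end{bmatrix}$ while every horizontal vector remains unchanged.
Since $F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, we have $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.
The remaining three transformations can be analyzed in a similar manner.
(b) Every horizontal vector is doubled, while every vertical vector remains unchanged.
Since $F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$ and $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, we obtain $A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$.
(c) Every horizontal vector is reversed, while every vertical vector remains unchanged.
$F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$ and $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ yield $A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$.
(d) Every horizontal vector $\begin{bmatrix} a \\ 0 \end{bmatrix}$ is transformed into $\begin{bmatrix} 0 \\ a \end{bmatrix}$, while every vertical vector $\begin{bmatrix} 0 \\ b \end{bmatrix}$ is transformed into $\begin{bmatrix} -b \\ 0 \end{bmatrix}$.
From $F\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $F\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$ we obtain $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.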
Geometric interpretation of linear transformations in R3
Although it becomes more difficult than for transformations in $\mathbb{R}^2$, many linear transformations in $\mathbb{R}^3$ can also be interpreted geometrically. The table below lists matrices M for linear transformations
\[ F\left(\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right) = M \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]
which are three-dimensional extensions of some of the transformations discussed in the previous subsection.

Projection onto the ...
  x-axis: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$   y-axis: $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$   z-axis: $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Projection onto the ...
  xy-plane: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$   xz-plane: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$   yz-plane: $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Reflection with respect to the ...
  x-axis: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$   y-axis: $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$   z-axis: $\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Reflection with respect to the ...
  xy-plane: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$   xz-plane: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$   yz-plane: $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

[Margin figures: "Projection onto the y-axis" and "Reflection with respect to the xy-plane", each showing a vector u with components a, b, c and its image F(u).]

Dilation and contraction can be defined by formula (10), which here becomes
\[ F\left(\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right) = \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}. \]
When k = 1, this becomes an identity transformation.

Rotations in three dimensions are somewhat more difficult to picture but are of great importance in a number of applications. For example, consider a motion of an airplane in the air, with the coordinate system positioned in such a way that the airplane's center of mass is at the origin, positive x-axis represents the forward direction, positive y-axis points directly to the right, and positive z-axis points down.
The change in orientation of the body of the aircraft is described by:
• roll – rotation about the x-axis,
• pitch – rotation about the y-axis, and
• yaw – rotation about the z-axis.
[Margin figure: an airplane with the x-, y-, and z-axes and the rotations labeled Roll, Pitch, and Yaw.]
EXAMPLE 1.25 A rotation in $\mathbb{R}^3$ about any line passing through the origin can easily be shown to be a linear transformation by following an argument similar to the one given in Example 1.23. In particular, if F denotes the rotation about the z-axis by the angle α measured counterclockwise when looking at the xy-plane from the positive z-direction, then
\[ F\left(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha \\ \sin\alpha \\ 0 \end{bmatrix}; \quad F\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} -\sin\alpha \\ \cos\alpha \\ 0 \end{bmatrix}; \quad F\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]
(For the first two equalities, compare to Example 1.23; the third vector remains unchanged.)
Consequently, this transformation can be expressed as
\[ F\left(\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}. \]
For a discussion of other rotations in R3 refer to the exercises in this section.
An example involving an application of a linear transformation
It is difficult to overestimate the importance of the geometric interpretation of linear transformations in $\mathbb{R}^2$ and $\mathbb{R}^3$. At the same time, such transformations have numerous applications outside of the geometry of vectors. Our next example illustrates one of them.
EXAMPLE 1.26 The following is based on a model proposed by van Doorp et al.$^5$ to describe population dynamics of perennial grasses. Let us begin with a verbal description, which will lead us to its mathematical formulation.
• Three distinct stages are considered: (S) seed, (V) vegetative adult plant, and (G) generative adult (flowering) plant.
• The following rules govern the transition from one year to the next:
(S) A seed can become a vegetative adult plant with a probability of 0.006 (in other words, of any 1,000 seeds, only 6 are expected to germinate); the remaining seeds die.
(V) Of any five vegetative adult plants, two are expected to remain vegetative (V), two will flower (G), and one will die.
(G) The exact same scenario is anticipated for a generative adult (flowering) plant (40% will remain in (G), 40% will become (V), and 20% will die) with one crucial distinction: each flowering plant is expected to produce 500 seeds.
[Margin figure: transition diagram between SEED (S), VEGETATIVE ADULT (V), and GENERATIVE ADULT (G), with arrows labeled by the rates 0.006, 0.4, and 500.]
If we let $\vec{x}^{(n)}$ be the "state vector" representing the population count in each of the three categories in year n,
\[ \vec{x}^{(n)} = \begin{bmatrix} S^{(n)} \\ V^{(n)} \\ G^{(n)} \end{bmatrix}, \]
then the following matrix represents the transformation that occurs over the course of one year:
\[ M = \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix}. \]
All we have to do now is multiply M by $\vec{x}^{(n)}$ to obtain the next year's state vector $\vec{x}^{(n+1)}$.
Let us suppose we would like to perform a simulation using the model described here, starting with just 12,500 seeds (and no adult plants), which means $\vec{x}^{(0)} = \begin{bmatrix} 12{,}500 \\ 0 \\ 0 \end{bmatrix}$. Here are the necessary calculations to determine the state in the next year:
\[ \vec{x}^{(1)} = \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 12500 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 75 \\ 0 \end{bmatrix}. \]
After the first year of simulation, all we have is 75 vegetative plants. We let the simulation take its own course without any intervention (e.g., no added seeds, etc.). Let's see what happens next year...
\[ \vec{x}^{(2)} = \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 0 \\ 75 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 30 \\ 30 \end{bmatrix}. \]
Somehow, looking at these numbers, we may feel a bit disappointed. After all, haven't we started with 12,500 seeds, only to see 12,425 of them perish? And now, of the 75 plants for n = 1, only 60 are still alive.... Well, there is good news on the horizon; let's just multiply M by the current state vector to see the prediction for the following year:
\[ \vec{x}^{(3)} = \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 0 \\ 30 \\ 30 \end{bmatrix} = \begin{bmatrix} 15{,}000 \\ 24 \\ 24 \end{bmatrix}. \]
Wow! Not only are we getting more than the initial number of seeds back, but there are also some generative plants to produce even more. Now, perhaps for the first time, we may begin to feel that this model represents an actual growth in the described population. Indeed, this is true: later in the book (in Chapter 7), we will be able to determine a precise rate at which this population is growing in the long run.
5 Reference: D. van Doorp, P. Schippers, and J. M. van Groenendael, Migration rates of grassland plants along corridors in fragmented landscapes assessed with a cellular automation model, Landscape Ecology, vol. 12, no. 1, pp. 39–50 (1997).
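The year-by-year calculations of Example 1.26 can be automated. Here is a minimal sketch in plain Python, assuming the matrix M and the initial state given in the example; the function names `step` and `simulate` are our own. Since the entries of M are decimal fractions, the computed values may differ from the exact ones by tiny rounding errors, so the results are rounded for display.

```python
# Transition matrix M of Example 1.26 (rows and columns ordered S, V, G).
M = [[0.0,   0.0, 500.0],
     [0.006, 0.4,   0.4],
     [0.0,   0.4,   0.4]]


def step(M, x):
    """Advance the state vector by one year: return M x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def simulate(M, x0, years):
    """Return the list of state vectors x(0), x(1), ..., x(years)."""
    states = [x0]
    for _ in range(years):
        states.append(step(M, states[-1]))
    return states


states = simulate(M, [12500.0, 0.0, 0.0], 3)
for n, x in enumerate(states):
    # Round to suppress floating-point noise in the printed values.
    print(n, [round(component, 6) for component in x])
```

Increasing the `years` argument is a simple way to observe the long-run growth mentioned at the end of the example before the tools of Chapter 7 become available.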
Matrix multiplication and linear transformations
In the example above, we have taken one step at a time in generating the state vectors for years 1, 2, 3, etc. One might ask instead whether $\vec{x}^{(2)}$ can be obtained directly from $\vec{x}^{(0)}$ (without calculating $\vec{x}^{(1)}$ in the process). The answer is: yes!
\[ \vec{x}^{(2)} = M\vec{x}^{(1)} = M(M\vec{x}^{(0)}) = M^2 \vec{x}^{(0)}. \]
In fact, we discovered that the transformation from $\vec{x}^{(0)}$ to $\vec{x}^{(2)}$ (or from $\vec{x}^{(k)}$ to $\vec{x}^{(k+2)}$ for any k) is also a linear transformation whose matrix is
\[ M^2 = \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix} = \begin{bmatrix} 0 & 200 & 200 \\ 0.0024 & 0.32 & 3.32 \\ 0.0024 & 0.32 & 0.32 \end{bmatrix}. \]
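Of course, in the same way we could represent a transition over three years using $M^3$, etc. Generally, applying one linear transformation $F(\vec{y}) = A\vec{y}$ to the outcome of another, $G(\vec{x}) = B\vec{x}$, is also a linear transformation, which is a composition of the two:
\[ F(G(\vec{x})) = F(B\vec{x}) = AB\vec{x}. \]
The product of matrices A and B of the individual transformations is the matrix of their composition.
EXAMPLE 1.27 Let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be the counterclockwise rotation by 90 degrees introduced in Example 1.22:
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
Performing this transformation four times in a row,
\[ F\left(F\left(F\left(F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)\right)\right)\right) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}^4 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}, \]
is equivalent (obviously!) to keeping the original vector unchanged.
[Margin note: Verify that $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}^4 = I_2$.]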
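EXAMPLE 1.28 Let us now recall the rotation by the angle α (counterclockwise) whose matrix representation we found in Example 1.23 was $\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$.
Consider the composition of linear transformations $F(G(H(\vec{x})))$ where:
• F is the counterclockwise rotation by 45 degrees represented by
\[ F(\vec{x}) = \begin{bmatrix} \cos(45^\circ) & -\sin(45^\circ) \\ \sin(45^\circ) & \cos(45^\circ) \end{bmatrix} \vec{x} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \vec{x}. \]
• G is the reflection with respect to the x-axis (discussed in Example 1.21)
\[ G(\vec{x}) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \vec{x}. \]
• H is the clockwise rotation by 45 degrees (i.e., counterclockwise by −45°), represented by
\[ H(\vec{x}) = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) \\ \sin(-45^\circ) & \cos(-45^\circ) \end{bmatrix} \vec{x} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \vec{x}. \]
The composition can be expressed as
\[ F(G(H(\vec{x}))) = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \vec{x} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \vec{x}. \]
(Check the algebra!) This can be rewritten as
\[ F\left(G\left(H\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)\right)\right) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix} \]
which corresponds to the reflection in the line y = x. You should understand that when a vector $\vec{x}$ is first rotated by −45° ($H(\vec{x})$), then reflected with respect to the x-axis ($G(H(\vec{x}))$), then finally rotated by 45° ($F(G(H(\vec{x})))$), the result is precisely that. (Draw some illustrations if necessary.)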
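The algebra in Example 1.28 can also be checked numerically. The sketch below, in plain Python, multiplies the three matrices in the order F, G, H; the helper names `mat_mul` and `rotation` are our own, and the comparison uses a small tolerance because cos 45° and sin 45° are only approximated in floating point.

```python
import math


def mat_mul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def rotation(alpha_degrees):
    """Matrix of the counterclockwise rotation by the given angle in degrees."""
    alpha = math.radians(alpha_degrees)
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]


F = rotation(45)           # counterclockwise rotation by 45 degrees
G = [[1, 0], [0, -1]]      # reflection with respect to the x-axis
H = rotation(-45)          # clockwise rotation by 45 degrees

# H acts first, then G, then F, so the matrix of the composition is F G H.
FGH = mat_mul(F, mat_mul(G, H))
print([[round(entry, 12) for entry in row] for row in FGH])
```

The order of the factors matters: the transformation applied first corresponds to the rightmost matrix in the product.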
As we have seen already, composing linear transformations corresponds to multiplying their matrices. However, there is another possible meaning for a product of two matrices. Consider such a product where the second matrix is partitioned column by column:
\[ \underbrace{A}_{m \times n} \, \underbrace{B}_{n \times p} = A \begin{bmatrix} \vec{b}_1 \mid \vec{b}_2 \mid \cdots \mid \vec{b}_p \end{bmatrix}. \]
Let us rewrite this expression using block multiplication:
\[ AB = \begin{bmatrix} A\vec{b}_1 \mid A\vec{b}_2 \mid \cdots \mid A\vec{b}_p \end{bmatrix}. \tag{12} \]
In other words, a product AB could also be interpreted as applying the linear transformation corresponding to A to each column of B.
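Formula (12) can be illustrated with a short computation. In the following minimal sketch (plain Python; the helper names and the sample matrices are hypothetical), the product AB is computed directly and then rebuilt by applying A to each column of B separately.

```python
def mat_vec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mat_mul(A, B):
    """Product AB computed entry by entry."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def columns(B):
    """Return the columns b1, ..., bp of B as lists."""
    return [list(col) for col in zip(*B)]


# Hypothetical sample data: A is 3 x 2 and B is 2 x 3.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, -1, 4]]

AB = mat_mul(A, B)

# Apply the transformation with matrix A to each column of B, then
# place the resulting vectors side by side as columns.
images = [mat_vec(A, b) for b in columns(B)]
rebuilt = [list(row) for row in zip(*images)]

print(AB)       # [[4, -2, 9], [1, -1, 4], [5, 1, -1]]
print(rebuilt)  # [[4, -2, 9], [1, -1, 4], [5, 1, -1]]
```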
What if the matrix of the linear transformation is not square?
The examples of linear transformations discussed so far in this section all involved square matrices: mapping $\mathbb{R}^2 \to \mathbb{R}^2$ resulted in a 2 × 2 matrix, transforming $\mathbb{R}^3 \to \mathbb{R}^3$ corresponded to a 3 × 3 matrix, etc. It was legitimate to introduce linear transformations in such a fashion since a great number of important transformations are precisely of this flavor. In fact, the matrix must be square to permit the sort of recursive application described in Example 1.26. However, to avoid leaving the reader with a false impression that we can never have m ≠ n in $F : \mathbb{R}^m \to \mathbb{R}^n$, let us include some examples, which involve mapping 3-vectors into 2-vectors.
EXAMPLE 1.29 Let $G : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined as
\[ G\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}. \]
A less elaborate version of the above formula,
\[ G\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} y \\ z \end{bmatrix}, \]
should make it clear that G takes a vector in the $\mathbb{R}^3$ space and projects it into the yz-plane. Unlike the projection in the table on p. 42, where the result was a 3-vector (with first component zero), our G produces a 2-vector ("skipping" the zero component). The two transformations are related, but distinct.
A problem affecting many disciplines involves visualization of three-dimensional objects using a two-dimensional medium, such as this page. As we will see, at least some possible solutions to this problem are based on linear transformations$^6$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. The mapping described in Example 1.29 is sometimes referred to as "front elevation" transformation. Four of the sides of the unit cube project onto straight line segments. Instead, we could have projected onto the xz-plane ("side elevation") or the xy-plane ("top"). The following two examples contain some of the other possible solutions for this problem. For each transformation, we shall show the two-dimensional image obtained by transforming the position vectors of the vertices of the unit cube 0 ≤ x, y, z ≤ 1. To help us distinguish between the three- and two-dimensional objects, we shall use x, y, and z as coordinates in $\mathbb{R}^3$ but will use u and v (instead of x and y) as coordinates in $\mathbb{R}^2$.
6 While it is common to refer to these transformations as projections in computer graphics terminology, we will not do so in this book, as the term "projection" has a special meaning in linear algebra.
EXAMPLE 1.30
\[ F_1\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = x \begin{bmatrix} \frac{-1}{2\sqrt{2}} \\ \frac{-1}{2\sqrt{2}} \end{bmatrix} + y \begin{bmatrix} 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{-1}{2\sqrt{2}} & 1 & 0 \\ \frac{-1}{2\sqrt{2}} & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]
is an example of a cabinet transformation, in which two of the axes (in this case, y and z) are kept unchanged, while the remaining axis (x) is scaled by $\frac{1}{2}$ and positioned to form a certain angle to the other axes in the plane. This angle in our case is 135°, but it could be different.
[Margin figure: Illustration for Example 1.30, showing the images of the x-, y-, and z-axes in the uv-plane.]
Transformations of this type are used widely for visualization of three-dimensional objects in books, including this one. Because of the ease with which they can be rendered, they also tend to be used when one attempts to create a 3D sketch by hand.
v z
⎡
⎡ ⎤ ⎤ x √ x 3 −1 0 ⎢ ⎢ ⎥ ⎥ 2 √ √2 E XAMPLE 1.31 F2 (⎣ y ⎦) = ⎣ y ⎦ defines an orthographic − 3 3 −1 4 4 2 z z axonometric transformation: it involves rotations followed by the transformation discussed in Example 1.29. (See Exercise 29 for details on the rotations involved.) Transformations of this type are often used by computer graphics software when visualizing 3D objects on the screen (the user is often allowed to interactively change the rotations involved, resulting in different views).
u
y x Illustration for Example 1.31
Linear transformations from Rn to Rm
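Much of the discussion in this section has been devoted to linear transformations that have a special form:
• $F : \mathbb{R}^2 \to \mathbb{R}^2$ – transformations in the plane, and $F : \mathbb{R}^3 \to \mathbb{R}^3$ – transformations in space (e.g., projections, reflections, and rotations),
• $F : \mathbb{R}^3 \to \mathbb{R}^2$ – transformations mapping 3-space to the plane (e.g., cabinet transformations and orthographic axonometric transformation).
As we have already shown in Example 1.26, linear transformations can be very useful outside the geometric contexts of the $\mathbb{R}^2$ plane or $\mathbb{R}^3$ space.
EXAMPLE 1.32 Consider a linear transformation F such that
\[ F\left(\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad F\left(\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} -3 \\ 2 \end{bmatrix}, \quad F\left(\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad F\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]
(a) Express $F\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right)$ as a linear combination.
(b) Find the matrix of F.
(c) Calculate $F\left(\begin{bmatrix} -2 \\ 3 \\ 0 \\ 4 \end{bmatrix}\right)$ using the linear combination obtained in (a) as well as using the matrix obtained in (b).
SOLUTION
(a) According to formula (8), we can write
\[ F\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = F\left(x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}\right) \]
\[ = x_1 F\left(\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}\right) + x_2 F\left(\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}\right) + x_3 F\left(\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}\right) + x_4 F\left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}\right) = x_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 2 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 1 \end{bmatrix} + x_4 \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]
(b) From formula (9), we know that the matrix A of the linear transformation F is
\[ A = \begin{bmatrix} | & | & | & | \\ F(\vec{e}_1) & F(\vec{e}_2) & F(\vec{e}_3) & F(\vec{e}_4) \\ | & | & | & | \end{bmatrix} = \begin{bmatrix} 2 & -3 & 0 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix}. \]
(c) From the representation obtained in part (a)
\[ F\left(\begin{bmatrix} -2 \\ 3 \\ 0 \\ 4 \end{bmatrix}\right) = -2 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} -3 \\ 2 \end{bmatrix} + 0 \begin{bmatrix} 0 \\ 1 \end{bmatrix} + 4 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -9 \\ 8 \end{bmatrix} \]
whereas from part (b),
\[ F\left(\begin{bmatrix} -2 \\ 3 \\ 0 \\ 4 \end{bmatrix}\right) = \begin{bmatrix} 2 & -3 & 0 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 3 \\ 0 \\ 4 \end{bmatrix} = \begin{bmatrix} -9 \\ 8 \end{bmatrix}. \]

Linearizations of nonlinear transformations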
By now, a careful reader may have realized our definition of linear transformation is actually inconsistent with the well-established notion of a linear function
\[ f(x) = y_0 + m(x - x_0). \tag{13} \]
It can be easily shown that if $y_0 - mx_0 \neq 0$, f is not a linear transformation (see Exercise 33), even though it clearly represents a straight line (with slope m, passing through the point $(x_0, y_0)$). Apparently, the definition of linear transformation only admits lines that pass through the origin. One way in which equations of other lines can also be expressed as linear transformations is by rewriting (13) as
\[ \Delta y = m\,\Delta x \]
with $\Delta y = f(x) - y_0$ and $\Delta x = x - x_0$.
Applying this idea to the linearization of a differentiable vector-valued function $\vec{r}(s) = \begin{bmatrix} x(s) \\ y(s) \end{bmatrix}$, we can similarly express it as a linear transformation:
\[ \underbrace{\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}}_{\Delta\vec{L}} = \underbrace{\begin{bmatrix} x'(s_0) \\ y'(s_0) \end{bmatrix}}_{\vec{r}\,'(s_0)} \Delta s \tag{14} \]
with $\Delta\vec{L} = \vec{L}(s) - \vec{L}(s_0)$ and $\Delta s = s - s_0$.
Let us develop a similar linearization for a differentiable function $\vec{r} : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $\vec{r}(s, t) = \begin{bmatrix} x(s, t) \\ y(s, t) \end{bmatrix}$ that would locally, again, be a linear transformation
\[ \underbrace{\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}}_{\Delta\vec{L}} = \underbrace{\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}}_{J} \begin{bmatrix} \Delta s \\ \Delta t \end{bmatrix}. \tag{15} \]
To build the entries of J, first consider the case where Δt = 0. The right-hand side of (15) then becomes $\Delta s \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}$. Comparing to the right-hand side of (14) we obtain
\[ \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} = \begin{bmatrix} x_s(s_0, t_0) \\ y_s(s_0, t_0) \end{bmatrix} = \vec{r}_s(s_0, t_0). \]
Note that we had to replace ordinary derivative notation with partial derivatives since there are two independent variables now. Likewise, Δs = 0 leads to
\[ \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} = \begin{bmatrix} x_t(s_0, t_0) \\ y_t(s_0, t_0) \end{bmatrix} = \vec{r}_t(s_0, t_0). \]
Consequently,
\[ \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \underbrace{\begin{bmatrix} x_s(s_0, t_0) & x_t(s_0, t_0) \\ y_s(s_0, t_0) & y_t(s_0, t_0) \end{bmatrix}}_{J} \begin{bmatrix} \Delta s \\ \Delta t \end{bmatrix} = \Delta s\,\vec{r}_s(s_0, t_0) + \Delta t\,\vec{r}_t(s_0, t_0). \tag{16} \]
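[Figure: a rectangle with sides Δs and Δt at $(s_0, t_0)$ in the st-plane, and its image in the xy-plane bounded by the curves $\vec{r}(s, t_0)$, $\vec{r}(s, t_0 + \Delta t)$, $\vec{r}(s_0, t)$, and $\vec{r}(s_0 + \Delta s, t)$, with the vectors $\Delta s\,\vec{r}_s(s_0, t_0)$ and $\Delta t\,\vec{r}_t(s_0, t_0)$ and the points M and N marked.]
In the picture above, the point N is the approximation of the point M resulting from this linearization. The matrix J in (16) is called the Jacobian matrix of this transformation at $(s_0, t_0)$.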
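EXAMPLE 1.33 The transformation $\vec{r}(s, t) = \begin{bmatrix} st \\ s^2 - t \end{bmatrix}$ is clearly nonlinear. Let us discuss its linearizations at two points: a. s = t = 1, b. s = t = 3.
[Figure: the grid lines s = 0, 1, 2, 3, 4 and t = 0, 1, 2, 3, 4 in the st-plane and their images under $\vec{r}$ in the xy-plane.]
a. The Jacobian matrix is
\[ J = \begin{bmatrix} \frac{\partial}{\partial s}(st) & \frac{\partial}{\partial t}(st) \\ \frac{\partial}{\partial s}(s^2 - t) & \frac{\partial}{\partial t}(s^2 - t) \end{bmatrix} = \begin{bmatrix} t & s \\ 2s & -1 \end{bmatrix}. \]
At (s, t) = (1, 1), the linearization is
\[ \Delta\vec{L} = \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} \Delta s \\ \Delta t \end{bmatrix} = \Delta s \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \Delta t \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]
b. At (s, t) = (3, 3), the linearization is
\[ \Delta\vec{L} = \begin{bmatrix} 3 & 3 \\ 6 & -1 \end{bmatrix} \begin{bmatrix} \Delta s \\ \Delta t \end{bmatrix} = \Delta s \begin{bmatrix} 3 \\ 6 \end{bmatrix} + \Delta t \begin{bmatrix} 3 \\ -1 \end{bmatrix}. \]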
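To get a feeling for how well the linearization of Example 1.33 works, we can compare $\Delta\vec{L} = J\,\Delta\vec{u}$ with the actual change of $\vec{r}$ for a small step. The following is a minimal sketch in plain Python; the function names and the step sizes Δs = 0.1, Δt = 0.05 are our own illustrative choices.

```python
def r(s, t):
    """The nonlinear transformation of Example 1.33."""
    return [s * t, s * s - t]


def jacobian(s, t):
    """Jacobian matrix of r at (s, t): [[t, s], [2s, -1]]."""
    return [[t, s],
            [2 * s, -1]]


def linearized_change(s0, t0, ds, dt):
    """Delta L = J(s0, t0) applied to the vector [ds, dt]."""
    J = jacobian(s0, t0)
    return [J[0][0] * ds + J[0][1] * dt,
            J[1][0] * ds + J[1][1] * dt]


def actual_change(s0, t0, ds, dt):
    """Delta r = r(s0 + ds, t0 + dt) - r(s0, t0)."""
    moved, base = r(s0 + ds, t0 + dt), r(s0, t0)
    return [moved[0] - base[0], moved[1] - base[1]]


ds, dt = 0.1, 0.05                         # hypothetical small step
approx = linearized_change(1, 1, ds, dt)   # close to [0.15, 0.15]
exact = actual_change(1, 1, ds, dt)        # close to [0.155, 0.16]
error = [e - a for e, a in zip(exact, approx)]
print(approx, exact, error)
```

For this particular function the error can be found by hand: it equals Δs Δt in the first component and (Δs)² in the second, so it shrinks much faster than the step itself as the step gets smaller.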
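More generally, if a differentiable function $\vec{r} : \mathbb{R}^n \to \mathbb{R}^m$ defined by
\[ \vec{r}(s_1, \ldots, s_n) = \begin{bmatrix} x_1(s_1, \ldots, s_n) \\ \vdots \\ x_m(s_1, \ldots, s_n) \end{bmatrix} \]
has the Jacobian matrix
\[ J = \begin{bmatrix} \frac{\partial x_1}{\partial s_1} & \cdots & \frac{\partial x_1}{\partial s_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial x_m}{\partial s_1} & \cdots & \frac{\partial x_m}{\partial s_n} \end{bmatrix} \]
defined at the point corresponding to $\vec{u}_0 \in \mathbb{R}^n$, then $\Delta\vec{L} = J\,\Delta\vec{u}$ approximates $\Delta\vec{r} = \vec{r}(\vec{u}_0 + \Delta\vec{u}) - \vec{r}(\vec{u}_0)$ for $\Delta\vec{u}$ sufficiently close to $\vec{0}$.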
Section 1.4 Introduction to Linear Transformations
EXERCISES
1. Find the matrix of the linear transformation F : R → R for which F ( 1 3 0 2 . Use this matrix to determine F ( ). and F ( )= 1 2 1 2
2
2. Find the matrix of the linear transformation F : R → R for which F ( 0 −1 4 and F ( )= . Use this matrix to determine F ( ). 1 1 1 2
1 0
)=
0
2
1
1
0
)=
1 1
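3. Find the matrix of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ which doubles every horizontal vector and reverses every vertical vector.

4. Find the matrix of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ which preserves every horizontal vector and triples every vertical vector.

5. Find the matrix of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ which maps every horizontal vector into the zero vector and preserves every vertical vector.

6. Find the matrix of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ which quadruples every horizontal vector and scales every vertical vector by the factor $\frac{1}{2}$.

In Exercises 7–12, find the matrix of the given linear transformation.

7. $F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = x_1 \begin{bmatrix} 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ 0 \end{bmatrix}$.

8. $F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = x_1 \begin{bmatrix} 4 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.

9. $F\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = 2x_1 \begin{bmatrix} -2 \\ 8 \\ 1 \end{bmatrix} - x_3 \begin{bmatrix} 4 \\ 0 \\ 0 \end{bmatrix}$.

10. $F\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = x_2 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

11. $F\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = x_1 \begin{bmatrix} 6 \\ 1 \end{bmatrix} + 3x_2 \begin{bmatrix} 3 \\ 2 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 7 \end{bmatrix}$.

12. $F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = x_1 \begin{bmatrix} 4 \\ 2 \\ 0 \\ 6 \\ 3 \end{bmatrix} + 4x_2 \begin{bmatrix} 1 \\ 0 \\ 5 \\ 2 \\ -1 \end{bmatrix}$.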
In Exercises 13–15, for each matrix $A$, consider a linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = A \begin{bmatrix} x \\ y \end{bmatrix}$.

a. On graph paper, sketch the unit square and the parallelogram obtained by applying $F$ to the unit square (as was done in a figure on p. 40).

b. Use the grid obtained from the parallelogram in part a to visually obtain the value $F\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right)$.

c. Confirm the answer obtained in part b by algebraically calculating $F\left(\begin{bmatrix} 2 \\ 1 \end{bmatrix}\right)$.

13. A = 14. A = 15. A =
1 1 0 1 1
1 2
1 2
1
.
1 1 −1 1
. .
16. If the position vector $\vec{v}$ of each of the corner points of the letter “F” pictured in the margin undergoes the linear transformation $F(\vec{v}) = A\vec{v}$ and the corresponding points are connected in the same way, match each transformed letter to the corresponding matrix $A$.

[Figure: the letter “F” in the xy-plane and the three transformed letters (a), (b), (c).]
(i) A =
0 1 −1 0
x
; (ii) A =
2 0 0 2
; (iii) A =
x
1 0 −1 1
.
17. If the position vector $\vec{v}$ of each of the corner points of the letter “R” pictured in the margin undergoes the linear transformation $F(\vec{v}) = A\vec{v}$ and the corresponding points are connected in the same way, match each transformed letter to the corresponding matrix $A$.

[Figure: the letter “R” in the xy-plane and the three transformed letters (a), (b), (c).]
(i) A =
−2 0 0 1
x
; (ii) A =
1 0 0 2
; (iii) A =
2 1 0 1
x
.
18. If the position vector $\vec{v}$ of each of the corner points of the letter “J” pictured in the margin undergoes the linear transformation $F(\vec{v}) = A\vec{v}$ and the corresponding points are connected in the same way, write the matrix $A$ that results in each transformed letter using −2, −1, 0, 1, or 2 as entries.

[Figure: the letter “J” in the xy-plane and the three transformed letters (a), (b), (c).]
19. If the position vector $\vec{v}$ of each of the corner points of the symbol “$\vec{v}$” pictured in the margin undergoes the linear transformation $F(\vec{v}) = A\vec{v}$ and the corresponding points are connected in the same way, write the matrix $A$ that results in each transformed symbol using −2, −1, 0, 1, or 2 as entries.

[Figure: the symbol “$\vec{v}$” in the xy-plane and the three transformed symbols (a), (b), (c).]
20. The transformations in Exercises 1 and 13 are examples of a shear transformation. All shear transformations parallel to the x-axis in $\mathbb{R}^2$ have the form
$$F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
where $a$ is a real number. From the geometric point of view, discuss how the value of $a$ influences the result of this transformation.

21. Find the matrix $A$ of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ performing the counterclockwise rotation by 30 degrees. Verify that $A^3$ matches the matrix from Example 1.22.

22. Find the matrix $A$ of the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ performing the counterclockwise rotation by 60 degrees. Calculate the matrix $A^3$. What is the geometric interpretation of the transformation corresponding to $A^3$?

In Exercises 23–24, form a necessary composition of projections and rotations in $\mathbb{R}^2$ (in a similar way to Example 1.28) to obtain the matrix of the specific linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$.

23. Projection onto the line $y = x$.

24. Projection onto the line $y = -x$.
25. * Let $F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = A \begin{bmatrix} x \\ y \end{bmatrix}$ and $G\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = B \begin{bmatrix} x \\ y \end{bmatrix}$ represent counterclockwise rotations in $\mathbb{R}^2$ by angles $\alpha$ and $\beta$, respectively. Find the matrix of counterclockwise rotation by $\alpha + \beta$ in two ways:
a. by multiplying the matrices of $F$ and $G$ (Hint: Use formula (11) to find both.)
b. directly from formula (11) with the angle $\alpha + \beta$.
Make sure to verify the answers in parts a and b are equivalent by applying standard trigonometric identities.

26. Show that
$$H\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha & 0 & \sin\alpha \\ 0 & 1 & 0 \\ -\sin\alpha & 0 & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
represents the rotation about the y-axis by the angle $\alpha$ measured counterclockwise when looking at the xz-plane from the positive y-direction.

27. * Find the matrix $A$ such that $G\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = A \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ represents the rotation about the x-axis by angle $\alpha$ measured counterclockwise when looking at the yz-plane from the positive x-direction.
28. Apply a sequence of two rotations:
• the rotation about the y-axis by angle $\varphi$ measured counterclockwise when looking at the xz-plane from the positive y-direction, followed by
• the rotation about the z-axis by angle $\theta$ measured counterclockwise when looking at the xy-plane from the positive z-direction
to the vector $\begin{bmatrix} 0 \\ 0 \\ \rho \end{bmatrix}$ to prove the spherical coordinate system identities:
$$\begin{aligned} x &= \rho \cos\theta \sin\varphi \\ y &= \rho \sin\theta \sin\varphi \\ z &= \rho \cos\varphi. \end{aligned}$$

[Illustration for Exercise 28: the point (x, y, z) at distance ρ from the origin, with the angles φ and θ marked.]
29. * Verify that the composition of linear transformations $G(H(F(\vec{x})))$ where
• $F$ is the rotation about the z-axis by angle $\frac{\pi}{6}$ measured clockwise when looking at the xy-plane from the positive z-direction,
• $H$ is the rotation about the y-axis by angle $\frac{\pi}{6}$ measured counterclockwise when looking at the xz-plane from the positive y-direction, and
• $G$ is the projection of Example 1.29
is the axonometric transformation specified in Example 1.31.

30. * Consider the cabinet transformation $F_1$ introduced in Example 1.30.
a. Show that
$$\left\| F_1(\vec{e}_1) \right\| = \frac{1}{2}$$
so that each vector parallel to the x-axis in $\mathbb{R}^3$ is transformed into a vector whose length is half of the original length.
b. Find the matrix of the cavalier transformation $F_3$ such that $F_3(\vec{e}_1) = 2F_1(\vec{e}_1)$, $F_3(\vec{e}_2) = F_1(\vec{e}_2)$, and $F_3(\vec{e}_3) = F_1(\vec{e}_3)$.

[Illustration for Exercise 30]
31. a. In a certain town, it has been observed that
• if a day is sunny, then 9 out of 10 times it is followed by another sunny day,
• if a day is cloudy, then it has a 70% chance of being followed by another cloudy one.
Write the 2 × 2 matrix $A$ of the transformation
$$\begin{bmatrix} S^{(i+1)} \\ C^{(i+1)} \end{bmatrix} = A \begin{bmatrix} S^{(i)} \\ C^{(i)} \end{bmatrix}$$
where $S^{(i)}$ and $C^{(i)}$ represent the probability of the ith day being sunny or cloudy, respectively.
b. Assume that tomorrow there is a 90% chance of a cloudy day in that town. Use the matrix obtained in part a to determine the likelihood that the day after tomorrow will be sunny.

[Illustration for Exercise 31: transition diagram with the states S (SUNNY) and C (CLOUDY) and the probabilities 0.9, 0.1, 0.7, and 0.3.]

32. Three telephone companies, Diagonal Enterprises (DE), Symmetric Telecom (ST), and Transpose Labs (TL), all offer wireless service, requiring all their customers to sign one-year contracts. According to the market research, at the end of their contract
• of all the DE customers, 25% are likely to renew, another 25% are likely to switch to TL, and 50% will switch to ST instead;
• 50% of the current ST and TL customers will switch to DE – the remaining customers will renew their existing contracts.
a. Write the 3 × 3 matrix $A$ of the transformation
$$\begin{bmatrix} D^{(i+1)} \\ S^{(i+1)} \\ T^{(i+1)} \end{bmatrix} = A \begin{bmatrix} D^{(i)} \\ S^{(i)} \\ T^{(i)} \end{bmatrix}$$
where $D^{(i)}$, $S^{(i)}$, and $T^{(i)}$ represent the fractions of the customers who have contracts with DE, ST, and TL at the beginning of the ith year.
b. Assume that at the beginning of year 2019, DE serves 50% of the customers, whereas ST and TL serve 25% each. Use the matrix obtained in part a to determine the projected market share of each company at the beginning of year 2020.

[Illustration for Exercise 32: transition diagram with the states DE, ST, and TL.]

33. * Show that $f : \mathbb{R} \to \mathbb{R}$ defined by (13) is a linear transformation if and only if $y_0 = mx_0$.

34. * Use block multiplication to obtain a more concise proof of Theorem 1.6.
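35. * Show that for all real values $a$ and $b$, the linear transformation
$$F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
can be considered as a composition $H(G(\vec{x})) = F(\vec{x})$ of a rotation $G\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ and a scaling $H\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = k \begin{bmatrix} x \\ y \end{bmatrix}$. Find formulas for $\alpha$ and $k$ in terms of $a$ and $b$. (Note: $F$ is sometimes called a rotation-dilation transformation.)

36. * Let $L$ be a line in the plane with parametric equations $x = at$ and $y = bt$ where $a^2 + b^2 = 1$ (i.e., the direction vector has unit length).
a. Show that
$$G\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{17}$$
performs a projection onto the line $L$ by verifying that
$$G\left(\begin{bmatrix} a \\ b \end{bmatrix}\right) = \begin{bmatrix} a \\ b \end{bmatrix} \quad\text{and}\quad \left( \begin{bmatrix} x \\ y \end{bmatrix} - G\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) \right) \cdot \begin{bmatrix} a \\ b \end{bmatrix} = 0.$$
b. Verify that when projecting onto one of the axes, the formula (17) reduces to the formulas presented in Example 1.19.

[Illustration for Exercise 36: the line L with direction vector (a, b), the vector (x, y), its projection G(x, y), and the difference between the two.]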
37. * Let $L$ be a line in the plane with the parametric equations $x = at$ and $y = bt$ where $a^2 + b^2 = 1$ (i.e., the direction vector has unit length).
a. Prove that if $H : \mathbb{R}^2 \to \mathbb{R}^2$ is a linear transformation performing a reflection with respect to the line $L$, then
$$H(\vec{x}) + \vec{x} = 2G(\vec{x})$$
where $G$ is the linear transformation performing a projection onto the line $L$.
b. Use part a and (17) to show that
$$H\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 2a^2 - 1 & 2ab \\ 2ab & 2b^2 - 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
performs a reflection with respect to the line $L$.
c. Verify that when reflection is performed with respect to one of the axes, the formula of part b yields the formulas of Example 1.21.

38. * Four different shapes were used in Exercises 16–19 to illustrate the effect of a linear transformation. Those shapes were chosen on purpose: they do not have any symmetries. A symmetry would enable more than one linear transformation $F$ to keep the shape unchanged, making it impossible to visually distinguish $F$ from the identity.

Which of the shapes shown below could also be used? If they cannot, identify a linear transformation other than identity that maps the shape to itself (you can superimpose a coordinate system as convenient).

ADGHIJKMNPQ TUVWY124567

[Illustration for Exercise 39]
Exercises 39–42 refer to the discussion of Jacobian matrices of nonlinear transformations starting on p. 48, including Example 1.33. Calculus is required.

39. * The graph in the margin shows the image of the square $1 \le s, t \le 5$ under the nonlinear transformation $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s/t \\ s + t \end{bmatrix}$.
a. Label each curve with the appropriate value s = 1, . . . , s = 5 and each line with t = 1, . . . , t = 5.
b. Find the Jacobian matrix and evaluate it at four (s, t) points: (2, 2), (2, 4), (4, 2), and (4, 4). Sketch the parallelogram corresponding to the linear transformation (16) of the 2 × 2 square centered at (2, 2) and then compare it to the nonlinear image of the same square. Repeat this for the remaining three points.

40. * The graph in the margin shows the image of the square $0 \le s, t \le 4$ under the nonlinear transformation $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s^2/4 \\ s - t \end{bmatrix}$.
a. Label each curve with the appropriate value t = 0, . . . , t = 4 and each line with s = 0, . . . , s = 4.
b. Find the Jacobian matrix and evaluate it at four (s, t) points: (1, 1), (1, 3), (3, 1), and (3, 3). Sketch the parallelogram corresponding to the linear transformation (16) of the 2 × 2 square centered at (1, 1) and then compare it to the nonlinear image of the same square. Repeat this for the remaining three points.

[Illustration for Exercise 40]
41. * The graph in the margin shows the image of the square $0 \le s, t \le 4$ under the nonlinear transformation $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s^2 - t \\ s + t^2 \end{bmatrix}$.
a. Label each curve with the appropriate value s = 0, . . . , s = 4 or t = 0, . . . , t = 4.
b. Find the Jacobian matrix and evaluate it at four (s, t) points: (1, 1), (1, 3), (3, 1), and (3, 3). Sketch the parallelogram corresponding to the linear transformation (16) of the 2 × 2 square centered at (1, 1) and then compare it to the nonlinear image of the same square. Repeat this for the remaining three points.

[Illustration for Exercise 41]

42. * The graph in the margin shows the image of the square $1 \le s, t \le 5$ under the nonlinear transformation $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s^2/t \\ st \end{bmatrix}$.
a. Label each curve with the appropriate value s = 1, . . . , s = 5 or t = 1, . . . , t = 5.
b. Find the Jacobian matrix and evaluate it at four (s, t) points: (2, 2), (2, 4), (4, 2), and (4, 4). Sketch the parallelogram corresponding to the linear transformation (16) of the 2 × 2 square centered at (2, 2) and then compare it to the nonlinear image of the same square. Repeat this for the remaining three points.

[Illustration for Exercise 42]
1.5 Chapter Review
2 Linear Systems
In high school algebra, systems of two or three linear equations in two or three unknowns are frequently discussed, along with methods for solving them, e.g., substitution, or elimination by subtraction. Such systems often arise in real life situations, but they may involve more equations and/or unknowns. In this chapter, we will introduce methods for solving such systems (regardless of their size). Matrices and vectors, which were studied in the first chapter, will play a prominent role here as well. Consider a scenario involving three kinds of containers, shaped as cubes, prisms, and cylinders. All cubes weigh exactly the same; the same can be said about all prisms and all cylinders, but their respective weights are unknown. The only information available to us is that each of the following three scales is in the state of equilibrium.
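[Figure: three balanced scales, where x, y, and z denote the weights (in kg) of a cube, a prism, and a cylinder. SCALE 1: x + 2z = 4. SCALE 2: 2x + y = 7. SCALE 3: y + z = 4.]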
Let us eliminate the cubes from the second scale, incorporating the information from the first scale. For each cube we remove from the left side of scale 2, we must also remove two cylinders from the left and four 1kg weights from the right. Since we are removing two cubes, we must remove twice as much: four cylinders from the left and eight 1kg weights from the right. Note, however, that there are no cylinders to remove on the left side – instead, to keep the scale balanced, we can add the four cylinders to the other side. Likewise, after removing the seven 1kg weights from the right, the eighth one is added on the other side. Here is the result on the second scale (the other two are unchanged):
[Figure: SCALE 2 after the cubes have been eliminated: y − 4z = −1.]
We can now proceed to use the current state of scale 2 to remove the prism from scale 3. To do so, along with removing the prism from the left side, we must remove a single 1kg weight from the left side (or, equivalently, add it on the right side) and remove four cylinders from the right side (or add them on the other side). The result:
[Figure: SCALE 3 after the prism has been eliminated: 5z = 5.]
The third scale now contains sufficient information to determine the weight of a cylinder: since five of them weigh 5kg, one must weigh 1kg. Using this information in conjunction with the equilibrium represented by the second scale, we can see that a prism must weigh 3kg. Finally, the first scale helps us determine that a cube weighs 2kg. We have managed to solve the system of equations
$$\begin{aligned} x + 2z &= 4 \\ 2x + y &= 7 \\ y + z &= 4, \end{aligned}$$
obtaining the solution
$$\begin{aligned} x &= 2 \\ y &= 3 \\ z &= 1. \end{aligned}$$
In this chapter, we shall develop procedures for solving similar systems, referring directly to the algebraic notation.
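As a preview of that algebraic point of view, the two eliminations performed on the scales can be sketched in a few lines of Python. The list layout (coefficients of x, y, z followed by the right-hand side) and the helper name `combine` are our own assumptions for illustration only.

```python
from fractions import Fraction

# Each equation is stored as [coefficient of x, of y, of z, right-hand side].
scale1 = [1, 0, 2, 4]   # x + 2z = 4
scale2 = [2, 1, 0, 7]   # 2x + y = 7
scale3 = [0, 1, 1, 4]   # y + z = 4

def combine(eq, multiple, other):
    """Return eq + multiple * other, computed term by term."""
    return [a + multiple * b for a, b in zip(eq, other)]

# Remove the two cubes from scale 2 using scale 1.
scale2 = combine(scale2, -2, scale1)   # y - 4z = -1
# Remove the prism from scale 3 using the new scale 2.
scale3 = combine(scale3, -1, scale2)   # 5z = 5

# Work backwards: cylinder first, then prism, then cube.
z = Fraction(scale3[3], scale3[2])
y = scale2[3] - scale2[2] * z
x = scale1[3] - scale1[2] * z
print(scale2, scale3, x, y, z)
```

Exact fractions are used so that the division in the last step introduces no rounding.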
2.1 Systems of Linear Equations

DEFINITION (Linear System) A system of m linear equations in n unknowns (or linear system) can be written as
$$\begin{array}{ccccccccc} a_{11}x_1 & + & a_{12}x_2 & + & \cdots & + & a_{1n}x_n & = & b_1 \\ a_{21}x_1 & + & a_{22}x_2 & + & \cdots & + & a_{2n}x_n & = & b_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{m1}x_1 & + & a_{m2}x_2 & + & \cdots & + & a_{mn}x_n & = & b_m \end{array} \tag{18}$$
with real numbers $a_{ij}$ (the coefficient in the ith equation associated with the jth unknown) and $b_i$ (the right-hand side value in the ith equation). The n-vector $\begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix}$ is called a solution of the linear system if substituting $x_1 = s_1, \ldots, x_n = s_n$ into all equations results in each equation becoming a true equality. The set containing all such solutions is called the solution set of the system.
Let us begin by illustrating one of the methods for solving linear systems you are likely to have encountered in your study of algebra.
EXAMPLE 2.1 To solve the linear system
$$\begin{aligned} 2x + y &= 5 \\ -4x + 3y &= 5 \end{aligned}$$
using the method of elimination by addition (or subtraction), we want to multiply one or both equations (on both sides) by numbers such that when the resulting equations are added (or subtracted), one of the unknowns is eliminated. One way to accomplish this in our example is to multiply the first equation by 2:
$$\begin{aligned} 4x + 2y &= 10 \\ -4x + 3y &= 5. \end{aligned}$$
Adding the two equations results in $5y = 15$; therefore, $y = 3$ and, substituting into either one of the original equations, $x = 1$. Check your answer!
Solving systems of equations (especially larger ones) might become quite involved; however, verifying the legitimacy of the solution we obtained is often quite easy: simply substitute the solution into the left-hand sides of the original system and check that they agree with the corresponding right-hand sides:
$$\begin{aligned} 2(1) + 3 &= 2 + 3 = 5 \\ -4(1) + 3(3) &= -4 + 9 = 5 \end{aligned}$$
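The same substitution check is easy to automate. Below is a minimal sketch; the function name `satisfies` and the way each equation is stored (a list of coefficients paired with its right-hand side) are hypothetical choices made for this illustration.

```python
def satisfies(equations, values):
    """Check a candidate solution by substitution.

    equations: list of (coefficients, right-hand side) pairs
    values:    candidate values of the unknowns, in the same order
    """
    return all(
        sum(c * v for c, v in zip(coeffs, values)) == rhs
        for coeffs, rhs in equations
    )

# The system of Example 2.1: 2x + y = 5 and -4x + 3y = 5.
system = [([2, 1], 5), ([-4, 3], 5)]

print(satisfies(system, [1, 3]))    # the solution found above
print(satisfies(system, [3, -1]))   # satisfies the first equation only
```

The second candidate is included as a reminder that a solution must satisfy every equation of the system, not just one of them.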
The main objective of this section will be to develop a systematic method for solving linear systems whose number of equations and/or unknowns can be considerably larger than 2 or 3. However, writing such large systems in their full form can be time- and space-consuming; therefore, we will first introduce a more compact way to represent linear systems.
Matrix representation of linear systems
Consider an m × n matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{19}$$
and an n-vector (or n × 1 matrix)
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
The product $A\vec{x}$ is an m-vector (or m × 1 matrix)
$$A\vec{x} = \begin{bmatrix} \operatorname{row}_1 A \cdot \vec{x} \\ \operatorname{row}_2 A \cdot \vec{x} \\ \vdots \\ \operatorname{row}_m A \cdot \vec{x} \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}.$$
Setting this vector equal to the m-vector
$$\vec{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$
we obtain a vector equation
$$A\vec{x} = \vec{b},$$
which is equivalent to the linear system (18). The matrix $A$ is called the coefficient matrix of the linear system. We shall also introduce the augmented matrix of the linear system, defined to be the following m × (n + 1) matrix:
$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right].$$
EXAMPLE 2.2 The linear system
$$\begin{aligned} 9x \phantom{{}-2y} - z &= 5 \\ -2y + 2z &= 0 \end{aligned}$$
can be represented using the coefficient matrix $A = \begin{bmatrix} 9 & 0 & -1 \\ 0 & -2 & 2 \end{bmatrix}$, the unknown vector $\vec{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, and the right-hand side vector $\vec{b} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$ in the matrix form $A\vec{x} = \vec{b}$. Its augmented matrix is
$$\left[\begin{array}{ccc|c} 9 & 0 & -1 & 5 \\ 0 & -2 & 2 & 0 \end{array}\right].$$

The augmented matrix can be considered as a data structure representing a linear system; of the three ingredients of the system $A\vec{x} = \vec{b}$, this matrix only omits the unknown vector, which contains just the names of the unknowns. These names are inconsequential to the solution itself.
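To make the "data structure" remark concrete, here is a small sketch that stores the system of Example 2.2 as nested Python lists. The helper names `augmented` and `mat_vec`, and the candidate vector tried at the end, are our own illustrative choices.

```python
def augmented(A, b):
    """Append the right-hand side b as an extra column of A."""
    return [row + [rhs] for row, rhs in zip(A, b)]

def mat_vec(A, x):
    """Compute the product A x as the list of dot products (row_i A) . x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Coefficient matrix and right-hand side of Example 2.2.
A = [[9, 0, -1],
     [0, -2, 2]]
b = [5, 0]

print(augmented(A, b))

# A candidate vector (chosen for illustration) solves the system exactly
# when A x equals b.
candidate = [1, 4, 4]
print(mat_vec(A, candidate) == b)
```

Notice that the names x, y, z never appear in the data: only the positions of the columns matter.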
Reduced row echelon form
EXAMPLE 2.3 Consider the system
$$\begin{aligned} x &= 3 \\ y &= 1 \end{aligned}$$

If all linear systems were this easy to solve, this book would probably be a lot shorter!

Here is another “nice” system:

EXAMPLE 2.4
$$\begin{aligned} x + 2w &= 1 \\ y - w &= 0 \\ z + 3w &= -2 \end{aligned}$$
You should have no trouble seeing that this system possesses infinitely many solutions: letting $w$ be an arbitrary number, we can calculate $x = 1 - 2w$, $y = w$, and $z = -2 - 3w$.

Both these systems have augmented matrices
$$\left[\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 1 & 1 \end{array}\right] \quad\text{and}\quad \left[\begin{array}{cccc|c} 1 & 0 & 0 & 2 & 1 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 3 & -2 \end{array}\right]$$
that adhere to the conditions listed in the following definition.
DEFINITION (Reduced Row Echelon Form) A matrix $A$ is said to be in reduced row echelon form (r.r.e.f.) if it satisfies the following conditions:
1. If there are any zero rows in $A$, these are positioned below all other (nonzero) rows.
2. Every nonzero row of $A$ must have its first nonzero entry equal to 1. It is called the leading entry of that row.
3. For any two nonzero rows of $A$, the leading entry of the row below is located to the right of the leading entry of the row above. (We can describe this by saying that the leading entries form a “staircase pattern”.)
4. In any column that contains a leading entry, all remaining entries must equal zero.
If a matrix satisfies conditions 1–3, then we say it is in row echelon form (r.e.f.). A column that contains a leading entry is called a leading column. Likewise, a column without a leading entry is referred to as a nonleading column.

EXAMPLE 2.5 Let us consider the following matrices:
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix},\quad C = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix},\quad D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$
$$E = \begin{bmatrix} 0 & 0 & 1 & 100 \end{bmatrix},\quad F = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},\quad G = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},\quad H = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

• The matrices $A$, $D$, $E$, and $G$ satisfy all four conditions above. These matrices are in reduced row echelon form (and, also, in row echelon form).
• The matrix $C$ satisfies conditions 1–3, but the entry $c_{13} = 3 \neq 0$ violates condition 4. Consequently, this matrix is in row echelon form but not in the reduced row echelon form.
• The remaining three matrices are neither in reduced row echelon form nor in row echelon form:
  • $B$ violates condition 1 (its second row – zero row – should not be positioned above a nonzero row);
  • $F$ violates condition 2 (the first nonzero entry in the second row is not 1);
  • $H$ violates condition 3 (leading entries in first two rows do not form a staircase pattern).
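The four conditions of the definition translate almost line by line into a short test. The sketch below is one possible way to write it; the function names are hypothetical, and the matrices tried at the end are the augmented matrices of Examples 2.3 and 2.4 together with a few small matrices made up for illustration.

```python
def leading_positions(M):
    """Column index of the first nonzero entry in each row (None for a zero row)."""
    return [next((j for j, v in enumerate(row) if v != 0), None) for row in M]

def is_ref(M):
    """Conditions 1-3: row echelon form."""
    seen_zero_row = False
    previous = -1
    for row, j in zip(M, leading_positions(M)):
        if j is None:
            seen_zero_row = True
            continue
        if seen_zero_row:      # condition 1: nonzero row below a zero row
            return False
        if row[j] != 1:        # condition 2: first nonzero entry must be 1
            return False
        if j <= previous:      # condition 3: staircase pattern
            return False
        previous = j
    return True

def is_rref(M):
    """Conditions 1-4: reduced row echelon form."""
    if not is_ref(M):
        return False
    for i, j in enumerate(leading_positions(M)):
        # condition 4: a leading column has zeros everywhere else
        if j is not None and any(M[k][j] != 0 for k in range(len(M)) if k != i):
            return False
    return True

print(is_rref([[1, 0, 3], [0, 1, 1]]))                                  # Example 2.3
print(is_rref([[1, 0, 0, 2, 1], [0, 1, 0, -1, 0], [0, 0, 1, 3, -2]]))   # Example 2.4
print(is_ref([[1, 2, 3], [0, 0, 1]]), is_rref([[1, 2, 3], [0, 0, 1]]))  # r.e.f. only
print(is_ref([[0, 0], [0, 1]]))                                         # zero row on top
```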
Here is another example of a matrix satisfying all four conditions.

EXAMPLE 2.6 The matrix in reduced row echelon form:
$$\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array}\right]$$
is the augmented matrix of the linear system
$$\begin{aligned} x &= 0 \\ y &= 0 \\ 0 &= 1 \\ 0 &= 0 \end{aligned}$$
which has no solution because of the contradiction 0 = 1 contained in the third equation.
An augmented matrix in reduced row echelon form immediately reveals all information about the solutions of the underlying system.

THEOREM 2.1 Suppose the linear system $A\vec{x} = \vec{b}$ has the augmented matrix $[A|\vec{b}]$ in reduced row echelon form.
(a) If $[A|\vec{b}]$ contains a row $[0 \cdots 0 \mid 1]$, then the system is inconsistent (has no solutions).
(b) If $[A|\vec{b}]$ does not contain a row $[0 \cdots 0 \mid 1]$, then the system is consistent (has at least one solution). In this case:
(b1) If every column of $A$ contains a leading entry, then the system has a unique solution.
(b2) If one or more columns of $A$ do not contain leading entries, then the system has infinitely many solutions.

PROOF Part (a) is true since the row $[0 \cdots 0 \mid 1]$ corresponds to the equation 0 = 1. Because this equation has no solution, any system that contains it cannot have a solution either.

(b) If $[A|\vec{b}]$ does not contain the row $[0 \cdots 0 \mid 1]$, then it follows that $\vec{b}$ cannot contain a leading entry – instead, each nonzero row of $[A|\vec{b}]$ must contain a leading entry in $A$. Let $k$ denote the number of nonzero rows of $[A|\vec{b}]$, let the $k$ leading columns of $A$ be numbered $i_1, \ldots, i_k$, and let the $n - k$ nonleading columns be numbered $j_1, \ldots, j_{n-k}$. The pth leading column is the pth column of $I_n$; i.e., $\operatorname{col}_{i_p} A = \vec{e}_p$. Therefore, the equation $A\vec{x} = \vec{b}$ is equivalent to
$$\begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_k} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \\ 0 \\ \vdots \\ 0 \end{bmatrix} - x_{j_1} \underbrace{\begin{bmatrix} a_{1,j_1} \\ a_{2,j_1} \\ \vdots \\ a_{k,j_1} \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\operatorname{col}_{j_1} A} - \cdots - x_{j_{n-k}} \underbrace{\begin{bmatrix} a_{1,j_{n-k}} \\ a_{2,j_{n-k}} \\ \vdots \\ a_{k,j_{n-k}} \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\operatorname{col}_{j_{n-k}} A}, \tag{20}$$
where the last $m - k$ entries of each vector are zeros.

(b1) If every column of $A$ has a leading entry, then $k = n$ and the system becomes $x_1 = b_1$, $x_2 = b_2$, ..., $x_n = b_n$, explicitly specifying the unique solution. Each unknown is specified in exactly one equation, guaranteeing the solution exists and is unique.

(b2) If one or more columns of $A$ do not contain leading entries, then $k < n$ so that $n - k$ $(> 0)$ unknowns $x_{j_1}, \ldots, x_{j_{n-k}}$ are arbitrary. The remaining $k$ leading columns correspond to unknowns $x_{i_1}, \ldots, x_{i_k}$ whose values can be determined using formula (20).
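The case analysis of Theorem 2.1 can be sketched as a short function. This is only an illustration under the theorem's own assumption that the augmented matrix passed in is already in reduced row echelon form (the function does not check this); the name `classify` and the returned strings are our own.

```python
def classify(aug):
    """Apply Theorem 2.1 to an augmented matrix [A | b] that is already in r.r.e.f."""
    n = len(aug[0]) - 1                     # number of unknowns (columns of A)
    nonzero_rows = [row for row in aug if any(v != 0 for v in row)]
    for row in nonzero_rows:
        # In r.r.e.f., a nonzero row with zeros throughout A is [0 ... 0 | 1].
        if all(v == 0 for v in row[:n]):
            return "inconsistent"           # case (a)
    k = len(nonzero_rows)                   # number of leading columns of A
    if k == n:
        return "unique solution"            # case (b1)
    return "infinitely many solutions"      # case (b2)

example_2_3 = [[1, 0, 3], [0, 1, 1]]
example_2_4 = [[1, 0, 0, 2, 1], [0, 1, 0, -1, 0], [0, 0, 1, 3, -2]]
example_2_6 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]

for name, aug in [("2.3", example_2_3), ("2.4", example_2_4), ("2.6", example_2_6)]:
    print("Example", name, "->", classify(aug))
```

Counting nonzero rows works here because, for a consistent system in r.r.e.f., every nonzero row contributes exactly one leading entry located in $A$.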
When an unknown corresponds to a leading column, we shall refer to it as a leading unknown. Note that for consistent systems (b1) and (b2) with augmented matrix in reduced row echelon form (r.r.e.f.), each leading unknown is specified using the information contained in that leading entry’s row of the augmented matrix.
Row equivalence
While it is clear that linear systems with augmented matrices in reduced row echelon form are very easy to solve, you are probably beginning to wonder just how likely we are to continue to run into such systems throughout our study of linear algebra. Well, the bad news is that the answer is “not very likely.” However, the good news is that it is possible to transform any linear system into an equivalent one (i.e., with the same solution set) whose augmented matrix is in r.r.e.f. by performing a sequence of the following operations on the augmented matrix (in these operations, we often denote the ith row by $r_i$):

DEFINITION (Elementary Row Operations) Operations of the following three types are called elementary row operations:
1. add a multiple of one row to another row: $r_i + kr_j \to r_i$ where $i \neq j$;
2. multiply a row by a nonzero number: $kr_i \to r_i$ where $k \neq 0$;
3. interchange two rows: $r_i \leftrightarrow r_j$.
If a matrix $B$ is obtained from a matrix $A$ by a finite sequence of elementary row operations, then we say that $A$ is row equivalent to $B$.

The reader may notice that elementary row operations appear to be akin to the elimination approach used to solve smaller systems (e.g., in Example 2.1). Indeed, performing any elementary row operation of one of these three types will never alter the solution set of the system (we shall completely justify this statement in the next section). Therefore, we arrive at the following procedure to solve a linear system:
• Transform the system’s augmented matrix, by a finite sequence of elementary row operations, to a reduced row echelon form.
• Obtain the solution set, as discussed in the previous subsection.
Let us illustrate this procedure in the following annotated example.

EXAMPLE 2.7
Solve the linear system
$$\begin{aligned} x \phantom{{}+3y} + 2z + 3w &= 5 \\ 2x \phantom{{}+3y} + 4z \phantom{{}+3w} &= 4 \\ y + z + w &= 4 \\ x + 3y + 5z + 2w &= 13 \end{aligned}$$
We begin by forming the augmented matrix of the system:
$$\left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 2 & 0 & 4 & 0 & 4 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{array}\right]$$
At any point during this procedure, one column will be distinguished as the pivotal column. During the first part of the procedure, the location of the pivotal column will move from left to right (1st, 2nd, ..., 5th column). In each pivotal column, a single entry called the pivot is selected according to the following pivoting strategy:
1. Each pivot must contain 1.
2. The first pivot must be in the first row.
3. Each subsequent pivot must reside in the row immediately following the row of the previously selected pivot.
4. If the desired pivot entry and all entries below it contain 0, the column must be skipped – it will not be a pivotal column.
5. If the desired pivot entry contains 0, but there is at least one nonzero entry below it, then the two rows should be interchanged.
6. If the desired pivot entry contains a nonzero value not equal to 1, then the row should be multiplied by the reciprocal of that value.
During the first part, all entries below each pivot must be eliminated. During the second part, the same columns will be revisited in reverse order (some columns may be skipped, as shown below), and the pivots chosen during the first part are used to eliminate entries above them. At the end, the matrix is in reduced row echelon form, and each pivot becomes its leading entry. In the discussion below we will use a rectangle and an oval to designate the pivot and the entries to be eliminated, respectively.

[Figure: the augmented matrix above with the pivot (the (1, 1) entry) enclosed in a rectangle and the entries to be eliminated (the remaining entries of the first column) enclosed in an oval.]

Column 1
At the beginning, the first column is the pivotal column, its first entry (1) is the pivot, and the three entries below it (2, 0, and 1) are to be made into zeros. This requires the following two row operations (rather than three, since the (3,1) entry is already 0):

Step 1. $r_2 - 2r_1 \to r_2$
$$\left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 2 & 0 & 4 & 0 & 4 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{array}\right] \longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{array}\right]$$

Step 2. $r_4 - r_1 \to r_4$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 3 & 3 & -1 & 8 \end{array}\right]$$

At this point, all of the entries inside of the oval under the pivot have been eliminated.

Column 2
We now advance to the next column. Since the (2, 2) entry equals zero, we interchange the second and third rows to establish a nonzero pivot there:

Step 3. $r_2 \leftrightarrow r_3$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 3 & 3 & -1 & 8 \end{array}\right]$$

and then we can eliminate the entries under the pivot:

Step 4. $r_4 - 3r_2 \to r_4$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 0 & 0 & -4 & -4 \end{array}\right]$$

Column 3
Advancing to the third column, we realize that not only is the (3, 3) entry zero, but so is the rest of that column under it. Consequently, the third column will not be used as a pivotal column.

Column 4
Skipping to the fourth column, to create the pivot in the (3, 4) entry, we multiply the third row by the reciprocal of the value currently there, then use the pivot to eliminate the entry below it.

Step 5. $-\frac{1}{6} r_3 \to r_3$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -4 & -4 \end{array}\right]$$

Step 6. $r_4 + 4r_3 \to r_4$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

Column 5
The fifth column is not pivotal as the (4, 5) entry is zero, and there are no entries below it.

In the second part, we go back through the list of the pivotal columns.

Column 5
Column 5 was not pivotal in the previous part; therefore, we skip it again.

Column 4
In column 4, we want to eliminate all entries above the pivot:

Step 7. $r_2 - r_3 \to r_2$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

Step 8. $r_1 - 3r_3 \to r_1$
$$\longrightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
Column 3 is not pivotal; therefore, it is skipped. Column 2 already has the (1, 2) entry above the pivot equal zero. Column 1 has no entries above the pivot. The procedure of transforming our augmented matrix to a reduced row echelon form is now complete. Note that the leading entries in this matrix (enclosed in boxes) are the same as the pivots identified above.
$$\left[\begin{array}{rrrr|r} \boxed{1} & 0 & 2 & 0 & 2 \\ 0 & \boxed{1} & 1 & 0 & 3 \\ 0 & 0 & 0 & \boxed{1} & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
The absence of a row $[0 \cdots 0 \mid 1]$ means the system is consistent. Since the third column does not contain a leading entry, the corresponding unknown $z$ is arbitrary. The solution can now be obtained easily by rewriting this matrix in the form of a linear system
$$\begin{aligned} x + 2z &= 2 \\ y + z &= 3 \\ w &= 1 \\ 0 &= 0 \end{aligned}$$
$x = 2 - 2z$, $y = 3 - z$, $z$ is arbitrary, $w = 1$. To check this general solution, let us substitute it into the original system:
$$\begin{aligned} (2 - 2z) + 2z + 3 &= 5 \\ 2(2 - 2z) + 4z &= 4 \\ (3 - z) + z + 1 &= 4 \\ (2 - 2z) + 3(3 - z) + 5z + 2 &= 13 \end{aligned}$$

Checking solutions
In the remainder of this book, we will engage in solving numerous linear systems. To save space (and some trees!), we will not go through the process of checking our answers after each of those solutions; however, we recommend that you do so as often as reasonably possible. (E.g., when you are solving some systems while taking an exam, it makes perfect sense to check your answers before turning the paper in!) When you do so, note that the extent to which you can check your answer depends upon the size of the solution set (no solution, one solution, infinitely many solutions).
• Clearly, if there is no solution, then there is nothing to check. (One could imagine trying random numbers just to make sure they don’t satisfy the system, but such an approach would be grossly inefficient.)
• If there is one solution, it is usually an easy matter to check whether it satisfies each of the equations.
• The case of infinitely many solutions has a general solution involving arbitrary constant(s), which can be verified similarly (as shown at the end of the example above), possibly requiring some algebra.
A word of caution is in order: successfully checking a solution (or solutions) does not mean you found them all! (E.g., if you miss an arbitrary constant in your solution and erroneously claim the solution is unique, you will be unable to detect that mistake by checking your solution.)
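The substitution check described above can also be run mechanically. The following is a minimal sketch, assuming the original system of Example 2.7 is the one encoded by its augmented matrix (x + 2z + 3w = 5, 2x + 4z = 4, y + z + w = 4, x + 3y + 5z + 2w = 13); the function name `check_general_solution` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def check_general_solution(z):
    """Substitute x = 2 - 2z, y = 3 - z, w = 1 into the four original equations."""
    x, y, w = 2 - 2 * z, 3 - z, 1
    return (
        x + 2 * z + 3 * w == 5
        and 2 * x + 4 * z == 4
        and y + z + w == 4
        and x + 3 * y + 5 * z + 2 * w == 13
    )

# After substitution each equation is linear in z, so agreement at two
# distinct values of z already implies agreement for every z.
samples = [Fraction(0), Fraction(1), Fraction(-7, 3)]
all_pass = all(check_general_solution(z) for z in samples)
print(all_pass)

# The single point (x, y, z, w) = (2, 3, 0, 1) passes as well, so a successful
# check alone cannot reveal that an arbitrary constant was overlooked.
single_point_passes = check_general_solution(Fraction(0))
print(single_point_passes)
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by rounding.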
Algorithm for reducing a matrix to r.r.e.f.
In Example 2.7, we have shown a complete sequence of elementary row operations which transformed the given matrix to a reduced row echelon form. Every operation performed in that example was fully explained, so that you should be able to perform this procedure on different matrices. Refer to the following flowchart for additional guidance on the operations necessary to identify the pivots and eliminate the entries below them:
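[Flowchart: reducing an m × n matrix A to r.r.e.f.]
Input: m × n matrix A.
1. Let i = 1 and j = 1.
2. Is j ≤ n? If no, eliminate the entries above the pivots to obtain the r.r.e.f. of A and stop.
3. Is a_ij ≠ 0? If no, is there a_kj ≠ 0 with k > i? If yes, perform ri ↔ rk; if no, let j = j + 1 and return to step 2.
4. Is a_ij = 1? If no, let c = a_ij and perform (1/c) ri → ri.
5. For each k > i, perform rk − a_kj ri → rk.
6. Let i = i + 1, let j = j + 1, and return to step 2.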
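The flowchart above can be sketched in code as follows. This is an illustrative implementation rather than the book's own: the name `rref`, the use of exact `Fraction` arithmetic, the 0-based indexing, and the explicit guard that stops once all rows have been used are assumptions made here.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the r.r.e.f. of the matrix and its 0-based pivot positions."""
    A = [[Fraction(x) for x in row] for row in rows]
    m = len(A)
    n = len(A[0]) if m else 0
    pivots = []
    i = 0
    # First part: find the pivots and eliminate the entries below them.
    for j in range(n):
        if i >= m:                      # assumed guard: no rows left for a pivot
            break
        k = next((r for r in range(i, m) if A[r][j] != 0), None)
        if k is None:                   # column j is not pivotal
            continue
        A[i], A[k] = A[k], A[i]         # r_i <-> r_k (no effect when k == i)
        c = A[i][j]
        A[i] = [x / c for x in A[i]]    # (1/c) r_i -> r_i
        for r in range(i + 1, m):       # r_k - a_kj r_i -> r_k for each k > i
            f = A[r][j]
            if f != 0:
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
        pivots.append((i, j))
        i += 1
    # Second part: go back through the pivotal columns, clearing entries above pivots.
    for i, j in reversed(pivots):
        for r in range(i):
            f = A[r][j]
            if f != 0:
                A[r] = [x - f * y for x, y in zip(A[r], A[i])]
    return A, pivots

# Augmented matrix of Example 2.7
A0 = [[1, 0, 2, 3, 5], [2, 0, 4, 0, 4], [0, 1, 1, 1, 4], [1, 3, 5, 2, 13]]
R, pivots = rref(A0)
for row in R:
    print([str(x) for x in row])
```

For this input the sketch performs the same eight row operations as Steps 1 through 8 of Example 2.7, and the pivots fall in columns 1, 2, and 4 (reported 0-based as 0, 1, 3).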
Gauss-Jordan reduction and Gaussian elimination
The procedure for solving a linear system that was illustrated in Example 2.7 involved the following steps:
• form the augmented matrix of the linear system;
• transform the augmented matrix to a reduced row echelon form;
• using the linear system obtained from the r.r.e.f. of the augmented matrix, find the solution.
This procedure is called Gauss-Jordan reduction. Other methods for solving linear systems include Gaussian elimination:
• form the augmented matrix of the linear system;
• transform the augmented matrix to a row echelon form;
• solve the resulting system by backsubstitution.

EXAMPLE 2.8
Solve the system of Example 2.7 by Gaussian elimination.
Steps 1–6 lead to a row echelon form of the augmented matrix

\[
\left[\begin{array}{cccc|c} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]

Rather than continuing row operations (to obtain r.r.e.f.), we write the corresponding linear system

\[
\begin{aligned}
x + 2z + 3w &= 5 \\
y + z + w &= 4 \\
w &= 1 \\
0 &= 0
\end{aligned}
\]

In the process of backsubstitution, we solve for the unknowns corresponding to the leading variables, from the bottom to the top.

\[
\begin{aligned}
w &= 1 \\
y &= 4 - z - w = 3 - z \\
x &= 5 - 2z - 3w = 2 - 2z
\end{aligned}
\]

where z is arbitrary. This solution matches the one obtained in Example 2.7.
Linear Algebra Toolkit
The author of this text has created a web-based interactive tool designed specifically to help students like yourself to improve their skill and understanding of many procedures we will be discussing here. Of those procedures, elementary row operations, Gauss-Jordan reduction, and Gaussian elimination occupy a central place in a linear algebra curriculum.
The name of this tool is Linear Algebra Toolkit. To access it, point your web browser to the URL latoolkit.com. Of the modules listed in the main directory, you should find the following ones relevant to the material in this section:
• row operation calculator,
• transforming a matrix to row echelon form,
• transforming a matrix to reduced row echelon form,
• solving a system of linear equations.
The row operation calculator module allows you to specify which elementary row operations should be performed on the given matrix and see them performed by the Toolkit. It can be particularly useful while you are learning the mechanics of these operations, e.g., to verify the arithmetic at each step.
EXERCISES In Exercises 1–4, write the augmented matrix and the coefficient matrix corresponding to each linear system. (Do not solve the system.) 1. The system:
x1
2. The system:
2x1
+
−
6x2 3x2 x2 4x2
z + 3z
3. The system: 7x
4. The system:
3x
+ x3
= 5 = 0 = 1
− x3
x1
x
= 0 = 1
4 5
= −6 = 3
5y − y + 4y 2y 0 0
= =
= 2 = −1 = 0 = 0
In Exercises 5–6, write the linear system corresponding to each augmented matrix. (Do not solve the system.) ⎡ ⎤ 2 1 2 1 ⎢ ⎥ ⎢ 3 0 −1 0 ⎥ 2 −3 4 ⎢ ⎥. ; b. ⎢ 5. a. 6 0 0 1 0 ⎥ ⎣ 0 1 ⎦ 0 0 0 1
6. a.
0 1 −1 3 1 0 2 4
⎡
;
⎤ 1 0 5 0 ⎢ ⎥ b. ⎣ 0 1 0 0 ⎦ . 0 0 0 0
In Exercises 7–8, determine if the given matrix is in (i) both reduced row echelon form and row echelon form, (ii) row echelon form, but not in reduced row echelon form, (iii) neither. ⎡ ⎤ ⎡ ⎤ 0 1 0 2 1 2 0 1 ⎢ ⎥ ⎢ 0 0 1 0 ⎥ 0 0 0 ⎢ ⎥ ⎢ ⎥ 7. a. ⎣ 0 0 1 3 ⎦ ; b. ⎢ ; c. ⎥ 1 0 0 ⎣ 0 0 0 1 ⎦ 0 0 0 0 0 0 0 0 ⎡ ⎤ ⎡ 1 2 1 ⎢ ⎥ 0 ⎢ 0 1 4 ⎥ ⎥ ; b. 0 1 ; c. 1 0 ; d. ⎢ 8. a. ⎢ ⎣ 0 ⎢ 0 0 0 ⎥ 0 0 0 2 ⎣ ⎦ 0 0 0 0
⎡
⎢ ⎢ ; d. ⎢ ⎢ ⎣
1 0 0 0
0 0 1 0
0 1 0 0
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
⎤ 1 2 3 ⎥ 0 1 2 ⎦. 0 0 1
Each of the Exercises 9–12 contains an augmented matrix of a linear system in reduced row echelon form. i. Mark each leading entry in the matrix with a box. ii. Determine if the system corresponding to the given augmented matrix in r.r.e.f. has no solution, one solution, or infinitely many solutions. iii. If the system is consistent, find all solutions. Do not use technology. 9. a. ⎡
1 0 0 0 1 2
1 0 ⎢ 10. a. ⎣ 0 1 0 0 ⎡ 1 3 ⎢ ⎢ 0 0 11. a. ⎢ ⎢ 0 0 ⎣ 0 0
⎡
;
⎤ 1 0 0 ⎢ ⎥ b. ⎣ 0 1 0 ⎦ ; 0 0 1
c.
1 0 −3 4 0 1 2 5
⎤ ⎤ ⎡ 0 −1 1 0 6 ⎥ ⎥ ⎢ 0 2 ⎦ ; b. ⎣ 0 1 0 ⎦ ; c. 1 0 0 0 5 ⎤ ⎡ 0 0 ⎥ 0 1 0 −5 0 1 0 ⎥ ⎥ ; b. ⎢ 0 0 1 3 0 ⎣ 0 1 ⎥ ⎦ 0 0 0 0 1 0 0
⎤ 1 6 0 0 0 1 ⎥ ⎢ 12. a. ⎣ 0 0 1 0 4 3 ⎦; 0 0 0 1 −1 2 ⎡
b.
0 0 0 0 0 0
⎡
1 1 ⎢ ⎣ 0 0 0 0 ⎤ 0 ⎥ 0 ⎦. 0 .
.
⎤ 0 0 ⎥ 1 0 ⎦. 0 1
In Exercises 13–26:
a. Write an augmented matrix for the given system.
b. Use elementary row operations to transform the augmented matrix to an r.e.f. and r.r.e.f.
c. Use the r.e.f. and backsubstitution (Gaussian elimination) to solve the system.
d. Use the r.r.e.f. (Gauss-Jordan reduction) to solve the system. Make sure the solutions obtained in parts c and d agree.
e. Check your solutions by substituting them back into the original system.
You should not use technology while working on steps a–e, but it is a good idea to refer to the Linear Algebra Toolkit and compare its results to your solutions. x 5x
14.
−2x x
15.
2x x
+ 4y + 2y
= =
3 −1
x
2y − 3y
= =
−4 7
+
=
0
16. 2x 17.
2x1 −x1
18.
x1
19.
20.
+ 5y + 2y
−2 13
13.
= =
= −6 = 8
+ 4y + 3y
+ +
y
x2 3x2
x1
+ 2x2 2x2 − x2
3x −x 4x
+ y + 2y
+ x3 + 2x3 − 2x3 + z
x
+
2z
+
3z
21.
+ 2y + y + y
+ 3z + 3z
x −x x
y + 2y − y + 2y
− z − z − z
= 0 = −2 = 3
= 3 = −4 = 4
x 2x −x
22.
= =
= 2 = 1 = 4
+ z
2y x
− x4 + 4x4
+ 3x3 + 2x3
−
3w
−
3w
= 2 = 0 = −1 = 3
= 0 = 6 = −6
7 0
23.
x x 2x
24. x
+ y + 3y − y y 2y + 3y
+ 5w + w + + +
z 2z 2z
z 25. x1 x1 26.
?
−x1 3x1 2x1
x2 − x2 − x2
+ w + w + 2w
+ x3 − x3 + x3
+ +
x2 3x2 x2
+ x4 − x4 − 3x4 − 2x4
− x3 + 2x3 − x3 − x3
= 1 = 0 = −3 = −3 = −1 = 5 = 6 = 5 − 2x5 + 2x5 + 3x5 + x5
− x4 + 5x4 + 8x4 + 2x4
= = = =
= = = =
0 0 0 0
1 −1 4 2
In Exercises 27–32, show an example of an augmented matrix in r.r.e.f. of a system that matches each description or explain why such a system cannot exist.
27. A system of 2 equations in 3 unknowns with a unique solution.
28. A system of 2 equations in 3 unknowns with no solution.
29. A system of 2 equations in 3 unknowns with infinitely many solutions.
30. A system of 3 equations in 2 unknowns with a unique solution.
31. A system of 3 equations in 2 unknowns with no solution.
32. A system of 3 equations in 2 unknowns with infinitely many solutions.

In Exercises 33–38, decide whether each statement is true or false. Justify your answer.
33. If A and B are m × n matrices in r.r.e.f., then A + B is also in r.r.e.f.
34. The matrix In is in r.r.e.f.
35. The system whose augmented matrix is I4 has no solution.
36. If A is a square matrix in r.e.f., then A is upper triangular.
37. Matrices A and 3A are row equivalent.
38. For all m × n matrices A, C and m × p matrices B, D, if [A|B] is row equivalent to [C|D], then A is row equivalent to C.
2.2 Elementary Matrices and the Geometry of Linear Systems

The following theorem, while easy to prove, is of great importance to justify the procedure adopted in the previous section for solving linear systems.

THEOREM 2.2 Let $A$ be an $m \times n$ matrix, $\vec{b}$ be an $m$-vector, and let $C$ be a $p \times m$ matrix.
(a) If $\vec{x} = \vec{s}$ is a solution of the linear system $A\vec{x} = \vec{b}$, then it is also a solution of the system $(CA)\vec{x} = C\vec{b}$.
(b) If there exists an $m \times p$ matrix $B$ such that $BC = I_m$ (see footnote 7), then $\vec{x} = \vec{s}$ is a solution of the linear system $A\vec{x} = \vec{b}$ if and only if it is also a solution of the system $(CA)\vec{x} = C\vec{b}$.

PROOF If
\[ A\vec{s} = \vec{b}, \tag{21} \]
then premultiplying both sides by $C$ yields
\[ C(A\vec{s}) = C\vec{b} \]
and, by applying property 1 of Theorem 1.5,
\[ (CA)\vec{s} = C\vec{b}, \tag{22} \]
proving part (a). To prove part (b), premultiply both sides by $B$ and apply property 1 of Theorem 1.5 to obtain
\[ (BC)A\vec{s} = (BC)\vec{b}. \tag{23} \]
Because of the assumption $BC = I_m$ and by property 6 of Theorem 1.5, this is equivalent to (21). Consequently, from part (a) we have the following implications:
\[
(\vec{s} \text{ is a solution of } A\vec{x} = \vec{b})
\;\Rightarrow\; (\vec{s} \text{ is a solution of } CA\vec{x} = C\vec{b})
\;\Rightarrow\; (\vec{s} \text{ is a solution of } A\vec{x} = \vec{b}),
\]
which proves part (b).
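To see why the reversibility assumption in part (b) matters, here is a small worked illustration. The matrices are chosen here for the purpose of the illustration and do not come from the text.

```latex
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
\vec{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad
C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]
% The original system A x = b has the unique solution x_1 = 1, x_2 = 2.
% The premultiplied system (CA) x = C b reads
\[
\begin{aligned}
x_1 &= 1 \\
0 &= 0
\end{aligned}
\]
% so its solutions are (1, t) for every real t. The original solution (1, 2)
% is among them, as part (a) guarantees, but the solution set has grown.
% No matrix B satisfies BC = I_2: the second column of C is zero, hence so is
% the second column of BC for every B.
```

In this illustration part (a) applies while part (b) does not, which is consistent with the statement that premultiplication by a non-reversible matrix may introduce additional solutions.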
According to part (a) of the theorem above, any linear transformation applied to the columns of the augmented matrix results in a system whose solution set contains all of the solutions of the original system (and, possibly, additional ones); by part (b), if that linear transformation can be reversed, then the solution sets of the original and transformed systems are identical. We shall demonstrate that elementary row operations correspond to the transformations of the latter kind, therefore leaving the solution set unchanged.

THEOREM 2.3 For any elementary row operation applied to a matrix A to obtain a matrix B, there exists a matrix E, called an elementary matrix, such that postmultiplying it by A yields the same B (i.e., EA = B). Furthermore, each elementary row operation can be reversed by performing another elementary row operation.

PROOF The following table contains the elementary matrices corresponding to the three types of the elementary row operations, as well as the matrices associated with reversing these (we leave it for the reader to verify the details; in particular, see Exercise 46 on p. 35 in Section 1.3).

Footnote 7: In some books, when BC = I, B is referred to as a left inverse of C, and C is called a right inverse of B.
Elementary row operations for transformation A → B and the corresponding elementary matrices E (B = EA)
• rowi A + k rowj A → rowi B: E is the identity matrix with the entry in the ith row and jth column replaced by k.
• k rowi A → rowi B: E is the identity matrix with the entry in the ith row and ith column replaced by k.
• rowi A → rowj B, rowj A → rowi B: E is the identity matrix with its ith and jth rows interchanged (zeros in the (i, i) and (j, j) entries, ones in the (i, j) and (j, i) entries).

Elementary row operations and elementary matrices reversing the operations above
• rowi B − k rowj B → rowi A: the identity matrix with the entry in the ith row and jth column replaced by −k.
• (1/k) rowi B → rowi A: the identity matrix with the entry in the ith row and ith column replaced by 1/k.
• rowi B → rowj A, rowj B → rowi A: the identity matrix with its ith and jth rows interchanged (the same matrix as for the original interchange).
Note that because of Theorem 2.3, if A is row equivalent to B, then B is also row equivalent to A. Consequently, we can simply refer to A and B being row equivalent from now on.
COROLLARY 2.4 Linear systems with row equivalent augmented matrices have identical solution sets.

Performing an elementary row operation on a vector $\vec{x}$ is a linear transformation $F(\vec{x}) = E\vec{x}$ (by Theorem 1.8). Therefore, by (9), any elementary matrix can be constructed by applying the corresponding elementary row operation to the identity matrix of the appropriate size. We shall illustrate this in the following two examples.
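The construction just described (apply the row operation to the identity matrix) can be tried out numerically. The sketch below is only an illustration: the helper names `identity`, `matmul`, `add_multiple`, and `swap` are hypothetical, rows are numbered from 1 as in the text, and the 3 × 7 test matrix `M` is arbitrary. The two operations used are the ones from Example 2.10.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def add_multiple(M, i, j, k):
    """Return a copy of M after r_i + k r_j -> r_i (rows numbered from 1)."""
    R = [row[:] for row in M]
    R[i - 1] = [a + k * b for a, b in zip(M[i - 1], M[j - 1])]
    return R

def swap(M, i, j):
    """Return a copy of M after r_i <-> r_j (rows numbered from 1)."""
    R = [row[:] for row in M]
    R[i - 1], R[j - 1] = R[j - 1], R[i - 1]
    return R

# Elementary matrices obtained by applying each operation to I_3
Ea = add_multiple(identity(3), 2, 3, Fraction(-3, 2))   # r2 - (3/2) r3 -> r2
Eb = swap(identity(3), 2, 3)                            # r2 <-> r3

# Premultiplying an arbitrary 3 x 7 matrix by E matches applying the operation directly.
M = [[c + 7 * r for c in range(7)] for r in range(3)]
same_a = matmul(Ea, M) == add_multiple(M, 2, 3, Fraction(-3, 2))
same_b = matmul(Eb, M) == swap(M, 2, 3)

# The reversing operation r2 + (3/2) r3 -> r2 gives a matrix that undoes Ea.
Ea_rev = add_multiple(identity(3), 2, 3, Fraction(3, 2))
undone = matmul(Ea_rev, Ea) == identity(3)
print(same_a, same_b, undone)
```

The last check mirrors the second half of Theorem 2.3: the matrix of the reversing operation, multiplied by the original elementary matrix, returns the identity.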
EXAMPLE 2.9 Consider the elementary matrices corresponding to the row operations conducted in Example 2.7. Denote the original augmented matrix
\[
A_0 = \begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 2 & 0 & 4 & 0 & 4 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{bmatrix};
\]
we shall refer to a matrix resulting from step i by Ai. All elementary matrices in this example must be 4 × 4. (Only a matrix of this size when multiplied by a 4 × 5 matrix results in another 4 × 5 matrix.)

• Step 1 of the example involved the row operation r2 − 2r1 → r2. The corresponding elementary matrix can be obtained by applying this operation to I4 (or, equivalently, replacing its (2, 1) entry with −2). Multiplying produces the correct matrix:
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{E_1}
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 2 & 0 & 4 & 0 & 4 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{bmatrix}}_{A_0}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{bmatrix}}_{A_1}.
\]

• In Step 2, another row operation of type 1 was performed: r4 − r1 → r4. Using an elementary matrix,
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix}}_{E_2}
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{bmatrix}}_{A_1}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 3 & 3 & -1 & 8 \end{bmatrix}}_{A_2}.
\]

• Step 3 involved a row operation of type 3: interchanging rows 2 and 3. The same thing can be accomplished when multiplying by the elementary matrix E3 as follows:
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{E_3}
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 3 & 3 & -1 & 8 \end{bmatrix}}_{A_2}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 3 & 3 & -1 & 8 \end{bmatrix}}_{A_3}.
\]

• The fourth step implemented another row operation of type 1, r4 − 3r2 → r4.
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -3 & 0 & 1 \end{bmatrix}}_{E_4}
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 3 & 3 & -1 & 8 \end{bmatrix}}_{A_3}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 0 & 0 & -4 & -4 \end{bmatrix}}_{A_4}.
\]

• To create a pivot in the (3, 4) entry, Step 5 scaled the third row by performing an operation of type 2: −(1/6) r3 → r3. The corresponding elementary matrix is obtained from I4, replacing its (3, 3) entry by −1/6:
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\tfrac{1}{6} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{E_5}
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & -6 & -6 \\ 0 & 0 & 0 & -4 & -4 \end{bmatrix}}_{A_4}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 0 & 1 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -4 & -4 \end{bmatrix}}_{A_5}.
\]

We leave it as an exercise for the reader to perform the remaining multiplications by elementary matrices to parallel the operations carried out in Example 2.7. After finding three additional E matrices, it will be possible to obtain the reduced row echelon form:
\[
E_8 E_7 E_6 E_5 E_4 E_3 E_2 E_1
\begin{bmatrix} 1 & 0 & 2 & 3 & 5 \\ 2 & 0 & 4 & 0 & 4 \\ 0 & 1 & 1 & 1 & 4 \\ 1 & 3 & 5 & 2 & 13 \end{bmatrix}
=
\underbrace{\begin{bmatrix} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{A_8}.
\]
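One way to check work on this exercise is to multiply the eight elementary matrices numerically. The sketch below assumes that E6, E7, and E8 correspond to Steps 6 through 8 of Example 2.7 (r4 + 4r3 → r4, r2 − r3 → r2, r1 − 3r3 → r1); the helper names `row_add`, `row_scale`, and `row_swap` are hypothetical and build each matrix by modifying an identity matrix.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def row_add(n, i, j, k):
    """Elementary matrix of r_i + k r_j -> r_i: identity with (i, j) entry k."""
    E = identity(n)
    E[i - 1][j - 1] = Fraction(k)
    return E

def row_scale(n, i, k):
    """Elementary matrix of k r_i -> r_i: identity with (i, i) entry k."""
    E = identity(n)
    E[i - 1][i - 1] = Fraction(k)
    return E

def row_swap(n, i, j):
    """Elementary matrix of r_i <-> r_j: identity with rows i and j interchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

A0 = [[1, 0, 2, 3, 5], [2, 0, 4, 0, 4], [0, 1, 1, 1, 4], [1, 3, 5, 2, 13]]

steps = [
    row_add(4, 2, 1, -2),                # E1: r2 - 2 r1 -> r2
    row_add(4, 4, 1, -1),                # E2: r4 - r1 -> r4
    row_swap(4, 2, 3),                   # E3: r2 <-> r3
    row_add(4, 4, 2, -3),                # E4: r4 - 3 r2 -> r4
    row_scale(4, 3, Fraction(-1, 6)),    # E5: -(1/6) r3 -> r3
    row_add(4, 4, 3, 4),                 # E6 (assumed from Step 6)
    row_add(4, 2, 3, -1),                # E7 (assumed from Step 7)
    row_add(4, 1, 3, -3),                # E8 (assumed from Step 8)
]

A8 = A0
for E in steps:          # premultiply in order: E1 first, E8 last
    A8 = matmul(E, A8)
for row in A8:
    print([str(x) for x in row])
```

Because each new matrix multiplies on the left, applying E1 first and E8 last corresponds to the product E8 E7 ··· E1 A0 written above.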
EXAMPLE 2.10 Find the elementary matrix that performs the given elementary row operation on a 3 × 7 matrix:
a. r2 − (3/2) r3 → r2;
b. r2 ↔ r3.

SOLUTION In both cases, the elementary matrix we are seeking must be 3 × 3. (Only a matrix of this size when multiplied on the right by a 3 × 7 matrix results in another 3 × 7 matrix.) We apply the given elementary row operation to I3.

a. Applying r2 − (3/2) r3 → r2 to
\[
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{yields the elementary matrix}\quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -\tfrac{3}{2} \\ 0 & 0 & 1 \end{bmatrix}.
\]

b. Interchanging the last two rows in I3 results in the elementary matrix
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
\]

Properties of linear systems
One might ask: if we performed a different sequence of elementary row operations, would the resulting reduced row echelon form be guaranteed to be the same? As it turns out, the answer to this question is “yes”.
THEOREM 2.5 Any m × n matrix A is row equivalent to a unique matrix in reduced row echelon form.

PROOF The pivoting strategy on p. 68 provides a constructive proof of the existence of the r.r.e.f. A is row equivalent to.

To show the uniqueness of the r.r.e.f., let us assume that A is row equivalent to both B and C in r.r.e.f. Consider B and C to be coefficient matrices of linear systems $B\vec{x} = \vec{0}$ and $C\vec{x} = \vec{0}$. The solution sets of these systems are guaranteed to be identical by Corollary 2.4; each of these solutions is also guaranteed to be a solution of the linear system $(B - C)\vec{x} = \vec{0}$ (but not vice versa).

Let j be the first nonzero column of B − C. It is not possible to have both jth columns of B and C as leading columns since the equality of all previous columns would require that there be the same number of leading columns (say, i) among them and the next leading column in both matrices be $\vec{e}_{i+1}$ (making their difference $\vec{0}$). Therefore, if j is the first nonzero column of B − C, then the jth columns of the matrices B or C cannot both be leading columns. Consequently, the solution set of at least one of the systems $B\vec{x} = \vec{0}$ and $C\vec{x} = \vec{0}$ is guaranteed to contain a vector
\[
\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_{j-1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\quad (\text{the entry } 1 \text{ is in the } j\text{th position}).
\]
Since both solution sets must be identical, they must both contain $\vec{y}$; therefore, so does the solution set of $(B - C)\vec{x} = \vec{0}$. On the other hand, the jth column must be the first leading column of B − C; thus $(B - C)\vec{y} = \operatorname{col}_j(B - C) \neq \vec{0}$ so that $\vec{y}$ is not a solution of $(B - C)\vec{x} = \vec{0}$. Assuming that B − C contains a nonzero column leads us to a contradiction. We conclude that B = C.
Geometry of linear systems
By Theorem 2.1 and Corollary 2.4, every linear system has either
1. one solution,
2. infinitely many solutions, or
3. no solution.
The following table contains examples of each of these three possibilities for systems
• of two linear equations in two unknowns and
• of three linear equations in three unknowns.
Note that the configurations of the lines and planes shown in this table do not exhaust all of the possibilities (e.g., another way in which a system of three equations in three unknowns can have no solution is when the three planes are parallel).

[Table of figures. Columns: one solution; infinitely many solutions; no solution. Rows: linear system of two equations in two unknowns (each equation corresponds to a line); linear system of three equations in three unknowns (each equation corresponds to a plane).]
Homogeneous linear systems
A linear system whose right-hand side equals $\vec{0}$ is called a homogeneous system. Every homogeneous system $A\vec{x} = \vec{0}$ is guaranteed to have at least one solution $\vec{x} = \vec{0}$, which we will refer to as the trivial solution. Additionally, some homogeneous systems may possess nontrivial solutions, $\vec{x} \neq \vec{0}$.

EXAMPLE 2.11 Solve the following homogeneous linear systems.

a.
\[
\begin{aligned}
x - 2y + z + 2w &= 0 \\
2x - 3y + 3z + w &= 0 \\
-3y + z - w &= 0
\end{aligned}
\]
and b.
\[
\begin{aligned}
-2x + 3y &= 0 \\
x + 2y &= 0 \\
2x + y &= 0 \\
3x - y &= 0.
\end{aligned}
\]

SOLUTION
a. The augmented matrix of the system,
\[
\left[\begin{array}{cccc|c} 1 & -2 & 1 & 2 & 0 \\ 2 & -3 & 3 & 1 & 0 \\ 0 & -3 & 1 & -1 & 0 \end{array}\right],
\quad\text{has the r.r.e.f.}\quad
\left[\begin{array}{cccc|c} 1 & 0 & 0 & \tfrac{7}{2} & 0 \\ 0 & 1 & 0 & -\tfrac{1}{2} & 0 \\ 0 & 0 & 1 & -\tfrac{5}{2} & 0 \end{array}\right].
\]
The column corresponding to the unknown w contains no leading entry; therefore, w is arbitrary, whereas the remaining unknowns can be solved for (in terms of w):
\[
x = -\tfrac{7}{2} w, \qquad y = \tfrac{1}{2} w, \qquad z = \tfrac{5}{2} w.
\]
This system has infinitely many solutions, which include
• the trivial solution: taking w = 0 leads to x = y = z = w = 0, and
• nontrivial solutions; e.g., if w = 2, then x = −7, y = 1, z = 5.

b. The augmented matrix of the second system is
\[
\left[\begin{array}{cc|c} -2 & 3 & 0 \\ 1 & 2 & 0 \\ 2 & 1 & 0 \\ 3 & -1 & 0 \end{array}\right].
\]
It follows from its reduced row echelon form,
\[
\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right],
\]
that the system has a unique solution, x = y = 0 (the trivial solution).

You can use the Linear Algebra Toolkit (latoolkit.com) to reproduce the elementary row operations used in parts (a) and (b) of this example.
THEOREM 2.6 If a homogeneous system of m linear equations in n unknowns has only the trivial solution, then m ≥ n.

PROOF For the linear homogeneous system to have a unique solution, it must have a leading entry in every left-hand side column of the r.r.e.f. of the augmented matrix. Each of the n leading entries must be located in a different row; consequently, the number of rows, m, must not be less than n.
Our next result provides a connection between solutions of a general linear system and solutions of the homogeneous system with the same coefficient matrix.
THEOREM 2.7 Let $A$ be an $m \times n$ matrix and let $\vec{b}$ be an $m$-vector. If $\vec{p}$ is a solution of the system $A\vec{x} = \vec{b}$, then every solution $\vec{s}$ of that system can be expressed as
\[ \vec{s} = \vec{p} + \vec{g} \]
where $\vec{g}$ is some solution of the homogeneous system $A\vec{x} = \vec{0}$.

PROOF Let $\vec{p}$ be a solution of $A\vec{x} = \vec{b}$. For every solution $\vec{s}$ of $A\vec{x} = \vec{b}$, we can define $\vec{g} = \vec{s} - \vec{p}$, which is a solution of $A\vec{x} = \vec{0}$ since
\[ A\vec{g} = A(\vec{s} - \vec{p}) = A\vec{s} - A\vec{p} = \vec{b} - \vec{b} = \vec{0}. \]

[Figure: in x1, x2, x3 coordinates, the solution set of Ax = b drawn as a translation, by the vector p, of the solution set of Ax = 0; the vectors g, p, and s = p + g are marked.]

The vector $\vec{p}$ in our last theorem is sometimes called a particular solution of the system $A\vec{x} = \vec{b}$. Geometrically, the solution set of a linear system can be viewed as a translation of the solution set of the associated homogeneous system by the particular solution of the original system.
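As a concrete instance of this decomposition, the general solution found in Example 2.7 (x = 2 − 2z, y = 3 − z, z arbitrary, w = 1) can be split into a particular solution and a solution of the homogeneous system. The split below takes the particular solution at z = 0; any other value of z would serve equally well.

```latex
\[
\underbrace{\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}}_{\vec{s}}
=
\begin{bmatrix} 2 - 2z \\ 3 - z \\ z \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}}_{\vec{p}}
+
\underbrace{z \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \end{bmatrix}}_{\vec{g}}
\]
% Check that g solves the homogeneous system, using the coefficient matrix
% of Example 2.7 (the first four columns of its augmented matrix):
\[
\begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & 0 & 4 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 3 & 5 & 2 \end{bmatrix}
\begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \end{bmatrix}
=
\begin{bmatrix} -2 + 2 \\ -4 + 4 \\ -1 + 1 \\ -2 - 3 + 5 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.
\]
```

Here the solution set of the homogeneous system is a line through the origin, and the solution set of the original system is the same line translated by the particular solution.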
A word of caution is in order: this does not imply that the two solution sets are identical in size. For instance, check that replacing 0 with 1 on the right-hand side of the first equation in part b of Example 2.11 will result in an inconsistent system; the translation described above cannot occur if a particular solution does not exist.

The following result focuses on the relationship between solution sets of $A\vec{x} = \vec{b}$ and $A\vec{x} = \vec{0}$ when A is a square matrix.

THEOREM 2.8 Given an n × n matrix A, the following three statements are equivalent:
A. A is row equivalent to In.
B. For every n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
C. The system $A\vec{x} = \vec{0}$ has only the trivial solution.

PROOF The equivalence of these statements means A⇔B, B⇔C, and A⇔C. However, it is sufficient to prove A⇒B, B⇒C, and C⇒A:
• Part I (A⇒B) From the assumption that A is row equivalent to In, it follows that for every n-vector $\vec{b}$, there exists an n-vector $\vec{d}$ such that the augmented matrix $[A \,|\, \vec{b}]$ is row equivalent to $[I_n \,|\, \vec{d}]$. This matrix is in reduced row echelon form, which, by Theorem 2.5, is unique. This means $\vec{d}$ is unique, and from $I_n \vec{x} = \vec{d}$ we conclude that the system has a unique solution $\vec{x} = \vec{d}$.
• Part II (B⇒C) We assume $A\vec{x} = \vec{b}$ has a unique solution for every n-vector $\vec{b}$. Therefore, it also has a unique solution for the specific n-vector $\vec{b} = \vec{0}$.
• Part III (C⇒A) Every column of the r.r.e.f. of A must contain a leading entry (otherwise the column without a leading entry would correspond to an arbitrary variable, leading to many solutions). The only way for an n × n matrix in r.r.e.f. to have leading entries in all n columns is when that matrix equals In.
EXERCISES In Exercises 1–10: a. Find the elementary matrix that performs each row operation. b. Multiply this matrix by the given matrix A and verify that the result is consistent with the desired row operation. ⎡
⎤ 1 2 0 1 ⎢ ⎥ 1. Apply r3 + 4r1 → r3 to ⎣ 0 1 −2 2 ⎦ . −4 3 1 2 ⎡ ⎤ 1 2 −2 ⎢ ⎥ ⎢ 0 1 2 ⎥ ⎢ ⎥. 2. Apply r4 − 3r2 → r4 to ⎢ 4 ⎥ ⎣ 0 0 ⎦ 0 3
1
⎡
⎤ 1 −3 5 ⎢ ⎥ 3. Apply r2 − 2r3 → r2 to ⎣ 0 1 2 ⎦. 0 0 1 ⎡
4.
5.
6.
7.
1 4 ⎢ Apply r1 + r3 → r1 to ⎣ 0 1 0 0 2 3 Apply r2 − r1 → r2 to 2 4 1 2 Apply r1 − 2r2 → r1 to 0 1 ⎡ 1 3 1 ⎢ ⎢ 0 1 2 Apply 12 r3 → r3 to ⎢ ⎢ 0 0 2 ⎣ 0 0 3 ⎡
1 2
⎢ 8. Apply 2r1 → r1 to ⎣ 3 4 ⎡ 0 ⎢ ⎢ 0 9. Apply r1 ↔ r3 to ⎢ ⎢ 2 ⎣ −7 ⎡
⎤ −1 3 ⎥ 0 1 ⎦. 1 2 1 −2 . 5 2 3 . 6 ⎤ −2 ⎥ 5 ⎥ ⎥. −6 ⎥ ⎦ 7
⎤ 1 −1 ⎥ 2 1 1 ⎦. 0 1 2 ⎤ 2 ⎥ 3 ⎥ ⎥. 5 ⎥ ⎦ 1
3 2
⎤ 1 2 0 −2 0 ⎢ ⎥ 10. Apply r2 ↔ r3 to ⎣ 0 0 5 0 1 ⎦. 0 3 1 1 0
11. Find elementary matrices that perform the following row operations on a 4 × 5 matrix:
a. r4 + 5r1 → r4; b. r1 ↔ r4; c. 3r2 → r2.

12. Find elementary matrices that perform the following row operations on a 3 × 8 matrix:
a. r1 ↔ r2; b. (1/5) r1 → r1; c. r3 − r1 → r3.

13. Find elementary matrices that perform the following row operations on a 5 × 3 matrix:
a. −6r5 → r5; b. r3 ↔ r5; c. r2 − (1/2) r5 → r2.

14. Find elementary matrices that perform the following row operations on a 2 × 4 matrix:
a. r2 + 4r1 → r2; b. r1 ↔ r2; c. −r2 → r2.
In Exercises 15 and 16, consider a linear system $A\vec{x} = \vec{b}$ with an $m \times n$ matrix $A$, and let $\vec{e}_i$ denote the ith column of the $m \times m$ identity matrix $I_m$.

15. a. * Show that $E = I_m + k\,\vec{e}_i \vec{e}_j^{\,T}$ is the elementary matrix corresponding to the operation rowi + k rowj → rowi.
b. * Prove that premultiplying by $E^* = I_m - k\,\vec{e}_i \vec{e}_j^{\,T}$ completely reverses the operation performed in part a. (Hint: Show that $E^* E = (I_m - k\,\vec{e}_i \vec{e}_j^{\,T})(I_m + k\,\vec{e}_i \vec{e}_j^{\,T}) = I_m$.)

16. a. * Show that $E = I_m - (\vec{e}_i - \vec{e}_j)(\vec{e}_i - \vec{e}_j)^T$ is the elementary matrix corresponding to the operation ri ↔ rj.
b. * Prove that premultiplying by the same E completely reverses this operation.
In Exercises 17–19, decide whether each statement is true or false. Justify your answer.
17. A linear system has infinitely many solutions if and only if the r.r.e.f. of its augmented matrix contains a row [0 · · · 0 | 0].
18. A linear system has no solution if and only if the r.r.e.f. of its augmented matrix contains a row [0 · · · 0 | 1].
19. Suppose the linear system is consistent and has the augmented matrix with r.r.e.f. $[C \,|\, \vec{d}]$. The system has a unique solution if and only if every column of C contains a leading entry.

20. * Perform different sequences of elementary row operations to solve the system of Example 2.7 on p. 67 beginning with
a. r2 ↔ r1; then (1/2) r1 → r1 to create a pivot in the (1, 1) entry, and then continue with the usual pivoting strategy;
b. r1 ↔ r4, and then continue with the usual pivoting strategy.
Show that the r.r.e.f. of the augmented matrix obtained in either case is identical to the one obtained at the end of the example.

21. * Modify the table of figures from p. 82 to illustrate solutions of
• a system of 2 equations in 3 unknowns, and
• a system of 3 equations in 2 unknowns.
(You may want to refer to Exercises 27–32 on p. 76.)
2.3 Matrix Inverse

In Section 1.2, some algebraic operations on matrices were defined, including matrix addition, scalar multiplication, and subtraction. Later on, in Section 1.3, we introduced a product of two matrices. As these operations share many properties with their counterparts for real numbers (although not all: e.g., matrix multiplication is not commutative), you might wonder whether it’s also possible to divide two matrices.

Before we tackle this question, let us take a step back to the arithmetic of real numbers and take a close look at the operation of division among such numbers. Saying that $a^{-1}$ is the reciprocal of a nonzero number $a$ is equivalent to saying that $a a^{-1} = a^{-1} a = 1$, where 1 is the neutral element of real number multiplication ($1a = a(1) = a$ for all $a$). Dividing $a$ by a nonzero number $b$ amounts to multiplying $a$ by the reciprocal of $b$: $a/b = ab^{-1}$.

In matrix multiplication, the identity matrix plays the role of the neutral element (see property 6 in Theorem 1.5). If A is an m × n matrix, then $A I_n = I_m A = A$. Let us introduce a counterpart of a real number reciprocal for matrices.

You might ask whether $AB = I_m$ and $BA = I_n$ can be satisfied by m × n matrices A and n × m matrices B with m ≠ n. It turns out to be impossible, as will be shown in Exercise 23 on p. 215. This is the reason why our definition admits square matrices only.
DEFINITION (Matrix Inverse) An inverse of an n × n matrix A is a matrix B such that $AB = BA = I_n$. If a matrix A has an inverse, then A is said to be invertible, or nonsingular. Otherwise, we say A is noninvertible, or singular.

THEOREM 2.9 If an inverse of a matrix exists, then it is unique.

PROOF Let B and C both be inverses of an n × n matrix A. From the definition, we have $AC = I_n$ (*) and $BA = I_n$ (**). Then
\[
B \overset{\text{Th. 1.5 part 6}}{=} B I_n
\overset{(*)}{=} B(AC)
\overset{\text{Th. 1.5 part 1}}{=} (BA)C
\overset{(**)}{=} I_n C
\overset{\text{Th. 1.5 part 6}}{=} C.
\]
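According to the above theorem, we can refer to the inverse of a nonsingular matrix A. We shall denote the inverse of A by $A^{-1}$.

EXAMPLE 2.12
a. Is $\begin{bmatrix} 1 & 4 \\ -1 & -3 \end{bmatrix}$ the inverse of $\begin{bmatrix} -3 & -4 \\ 1 & 1 \end{bmatrix}$?
b. Is $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$ the inverse of $\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix}$?

SOLUTION
a. Performing the matrix multiplications
\[
\begin{bmatrix} 1 & 4 \\ -1 & -3 \end{bmatrix}
\begin{bmatrix} -3 & -4 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2
\quad\text{and}\quad
\begin{bmatrix} -3 & -4 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 4 \\ -1 & -3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2,
\]
we conclude that $\begin{bmatrix} 1 & 4 \\ -1 & -3 \end{bmatrix}$ is the inverse of $\begin{bmatrix} -3 & -4 \\ 1 & 1 \end{bmatrix}$.

b. Multiplying the matrices yields
\[
\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 3 & 5 & 8 \\ 6 & 7 & 11 \\ 2 & 2 & 3 \end{bmatrix} \neq I_3.
\]
Consequently, $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$ is not the inverse of $\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix}$.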
Our next objective is to find a method to invert a given matrix, if possible. Before we can accomplish this, we need to develop some additional theory.

LEMMA 2.10 If A and B are n × n matrices such that AB = In, then A is row equivalent to In.

PROOF Let us assume that A is not row equivalent to In. Because of this assumption, there must exist a finite number of elementary matrices E1, . . . , Ek such that the matrix Ek · · · E1 A contains a zero row. Therefore, so does the matrix Ek · · · E1 AB. However, since we assumed AB = In, it follows that Ek · · · E1 AB = Ek · · · E1 In, which means we reached a contradiction: a product of elementary matrices and an identity matrix on the right-hand side cannot have a zero row. We conclude that A is row equivalent to In.

LEMMA 2.11 If A is row equivalent to In, then there exists a matrix C such that CA = In.

PROOF Since A is row equivalent to In, there exist elementary matrices E1, . . . , Ek such that Ek · · · E1 A = In. Consequently, C = Ek · · · E1 satisfies CA = In.
The lemmas above serve as stepping stones which yield important results that follow.
THEOREM 2.12 If A and B are n × n matrices such that $AB = I_n$, then $BA = I_n$.

PROOF Let us assume that A and B are n × n matrices such that $AB = I_n$. From Lemma 2.10, it follows that A is row equivalent to $I_n$. Furthermore, from Lemma 2.11, there exists a matrix C such that $CA = I_n$. By following steps similar to those in the proof of Theorem 2.9, we can show that $C = B$ (indeed, $C = CI_n = C(AB) = (CA)B = I_nB = B$); therefore, $BA = CA = I_n$.
Recall that in part a of Example 2.12, we verified that AB = In and BA = In as well. According to Theorem 2.12, the latter follows automatically from the former. Theorem 2.9 and Lemmas 2.10 and 2.11 lead us directly to our next theorem.
THEOREM 2.13 An n × n matrix is nonsingular if and only if it is row equivalent to $I_n$.
Consequently, once we see that a square matrix cannot be row reduced to identity, we can conclude that it must be singular. For instance, if A has its ith row filled completely with zeros, then so does AB, making it impossible for AB = In to hold true.
THEOREM 2.14 An n × n matrix containing a row of zeros must be singular.
We are now ready to formulate an efficient method to decide whether a given n × n matrix A is invertible and, if so, to find its inverse. We begin by forming an n × 2n matrix $[A\,|\,I_n]$.
• If A is invertible, then following Theorem 2.13 as well as the reasoning in our proof of Lemma 2.11, we have $E_k \cdots E_1 A = I_n$ and $E_k \cdots E_1 I_n = A^{-1}$; therefore, $E_k \cdots E_1 [A\,|\,I_n] = [I_n\,|\,A^{-1}]$ so that $[A\,|\,I_n]$ is row equivalent to $[I_n\,|\,A^{-1}]$, which is guaranteed to be in a reduced row echelon form (check!).
• If A is singular, then by Theorem 2.13 the r.r.e.f. of A is not $I_n$; instead, it must be a matrix whose nth row (and possibly others) is completely made up of zeros.
Note that by Theorem 2.14 we can conclude that A is singular as soon as we see it become row equivalent to a matrix with a zero row (without having to obtain the r.r.e.f.). Let us summarize our findings.

Procedure for finding the inverse of an n × n matrix A:
Form the n × 2n matrix $[A\,|\,I_n]$ and perform elementary row operations to obtain $[C\,|\,D]$ (C and D are n × n matrices) where either $C = I_n$ or C contains a row of zeros.
· If $C = I_n$, then $D = A^{-1}$.
· If C contains a row of zeros, then the matrix A is singular.
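The procedure can be sketched in a few lines of Python. This is only an illustration, not the Linear Algebra Toolkit's implementation: the function name `invert` and the choice of exact `Fraction` arithmetic are assumptions made here so that the arithmetic matches hand computation.

```python
from fractions import Fraction

def invert(A):
    """Row reduce [A | I_n]; return the inverse, or None when A is singular."""
    n = len(A)
    # Form the n x 2n matrix [A | I_n] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Look for a nonzero entry in column c, at or below row c.
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None                    # no pivot: the left block cannot become I_n
        M[c], M[p] = M[p], M[c]            # row interchange
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]     # scale the pivot to 1
        for r in range(n):
            if r != c and M[r][c] != 0:
                k = M[r][c]
                M[r] = [a - k * b for a, b in zip(M[r], M[c])]   # r_r - k r_c -> r_r
    return [row[n:] for row in M]          # the right block D = A^{-1}
```

For a matrix such as [[1, -1, 0], [1, 0, -1], [-6, 2, 3]] the function returns the right-hand block of $[I_3\,|\,A^{-1}]$; for a singular matrix it returns `None` as soon as a pivot is missing.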
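EXAMPLE 2.13 Find the inverse of $A = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ -6 & 2 & 3 \end{bmatrix}$.

\[
[A\,|\,I_3] = \left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ -6 & 2 & 3 & 0 & 0 & 1 \end{array}\right]
\]
$r_2 - r_1 \to r_2$:
\[
\left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ -6 & 2 & 3 & 0 & 0 & 1 \end{array}\right]
\]
$r_3 + 6r_1 \to r_3$:
\[
\left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & -4 & 3 & 6 & 0 & 1 \end{array}\right]
\]
$r_3 + 4r_2 \to r_3$:
\[
\left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 0 & -1 & 2 & 4 & 1 \end{array}\right]
\]
$-1r_3 \to r_3$:
\[
\left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -2 & -4 & -1 \end{array}\right]
\]
$r_2 + r_3 \to r_2$:
\[
\left[\begin{array}{rrr|rrr} 1 & -1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & -3 & -1 \\ 0 & 0 & 1 & -2 & -4 & -1 \end{array}\right]
\]
$r_1 + r_2 \to r_1$:
\[
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & -2 & -3 & -1 \\ 0 & 1 & 0 & -3 & -3 & -1 \\ 0 & 0 & 1 & -2 & -4 & -1 \end{array}\right] = [I_3\,|\,A^{-1}]
\]
Answer: $A^{-1} = \begin{bmatrix} -2 & -3 & -1 \\ -3 & -3 & -1 \\ -2 & -4 & -1 \end{bmatrix}$.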
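The inverse obtained in the example above can be checked by multiplying it by the matrix A:
\[
\begin{bmatrix} -2 & -3 & -1 \\ -3 & -3 & -1 \\ -2 & -4 & -1 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ -6 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]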
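EXAMPLE 2.14 Find the inverse of $A = \begin{bmatrix} 1 & 2 & 0 \\ 3 & -1 & 2 \\ -2 & 3 & -2 \end{bmatrix}$.

\[
[A\,|\,I_3] = \left[\begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 3 & -1 & 2 & 0 & 1 & 0 \\ -2 & 3 & -2 & 0 & 0 & 1 \end{array}\right]
\]
$r_2 - 3r_1 \to r_2$:
\[
\left[\begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & -7 & 2 & -3 & 1 & 0 \\ -2 & 3 & -2 & 0 & 0 & 1 \end{array}\right]
\]
$r_3 + 2r_1 \to r_3$:
\[
\left[\begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & -7 & 2 & -3 & 1 & 0 \\ 0 & 7 & -2 & 2 & 0 & 1 \end{array}\right]
\]
$r_3 + r_2 \to r_3$:
\[
\left[\begin{array}{rrr|rrr} 1 & 2 & 0 & 1 & 0 & 0 \\ 0 & -7 & 2 & -3 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right] = [C\,|\,D]
\]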
Since C contains a row of zeros, we conclude that A is singular.
This procedure is implemented by the matrix inverse module included in the Linear Algebra Toolkit (entitled “Calculating the inverse using row operations”). You can use it to invert matrices up to size 6 × 6.
Properties of matrix inverse

THEOREM 2.15 (Inverse of a Matrix Product) If A and B are n × n nonsingular matrices, then AB is also a nonsingular matrix and $(AB)^{-1} = B^{-1}A^{-1}$.

PROOF According to the definition of the inverse, for n × n matrices C and D, $C^{-1} = D$ if $CD = I_n$ and $DC = I_n$. Taking $C = AB$ and $D = B^{-1}A^{-1}$ and using properties 1 and 6 of Theorem 1.5 as well as the definition of the inverse, we obtain
\[
CD = (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n.
\]
Theorem 2.12 yields $DC = I_n$; therefore, we conclude that $D = B^{-1}A^{-1}$ is the inverse of $C = AB$.
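A quick numerical check of the product rule may help fix the order of the factors. The sketch below is illustrative only: the helper `matmul` is a name introduced here, A and its inverse are taken from Example 2.12a, and B is assumed to be the 90-degree rotation matrix, whose inverse is easy to verify by hand.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

I2 = [[1, 0], [0, 1]]
A, A_inv = [[1, 4], [-1, -3]], [[-3, -4], [1, 1]]   # Example 2.12a
B, B_inv = [[0, -1], [1, 0]], [[0, 1], [-1, 0]]     # rotation by 90 degrees (assumed example)

AB = matmul(A, B)
D = matmul(B_inv, A_inv)        # candidate inverse: B^{-1} A^{-1}
wrong = matmul(A_inv, B_inv)    # factors in the wrong order

print(matmul(AB, D) == I2)      # the product rule holds for this pair
print(matmul(AB, wrong) == I2)  # reversing the order generally fails
```

The second comparison shows why the order matters: matrix multiplication is not commutative, so $A^{-1}B^{-1}$ need not invert AB.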
Two additional properties of matrix inverse are stated in the theorem below. Their proofs are left as exercises (25 and 26 on p. 97) and can be conducted similarly to the proof of Theorem 2.15.
THEOREM 2.16 If A is a nonsingular matrix, then
1. $A^{-1}$ is also a nonsingular matrix and $(A^{-1})^{-1} = A$;
2. $A^T$ is also a nonsingular matrix and $(A^T)^{-1} = (A^{-1})^T$.
Four equivalent statements
According to Theorem 2.13, an n × n matrix A is nonsingular if and only if it is row equivalent to $I_n$. On the other hand, Theorem 2.8 established the equivalence of the latter statement to two others:
• the linear system $A\vec{x} = \vec{b}$ has a unique solution for any n-vector $\vec{b}$, and
• the system $A\vec{x} = \vec{0}$ has a unique solution.
These two theorems can be “repackaged” more neatly in the following form.

4 Equivalent Statements
For an n × n matrix A, the following statements are equivalent.
1. A is nonsingular.
2. A is row equivalent to $I_n$.
3. For every n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
4. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.

Throughout this book, we will be adding more and more statements to this list. The more we add, the more we will appreciate the advantage offered by this “multilateral” format as opposed to the “bilateral” theorems like Theorem 2.13. Note that the equivalence means that if one of the statements is not satisfied, none of the others are satisfied either.

4 Equivalent “Negative” Statements
For an n × n matrix A, the following statements are equivalent.
-1. A is singular.
-2. A is not row equivalent to $I_n$.
-3. For some n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has either no solution or many solutions.
-4. The homogeneous system $A\vec{x} = \vec{0}$ has nontrivial solutions.
Inverse of the coefficient matrix and solution of a system

According to the equivalent conditions introduced above, a linear system of n equations in n unknowns with a nonsingular coefficient matrix A is guaranteed to have a unique solution. If $A^{-1}$ is actually known, then it can be used to determine this unique solution. Multiplying both sides of
\[
A\vec{x} = \vec{b}
\]
from the left by $A^{-1}$ we obtain
\[
A^{-1}A\vec{x} = A^{-1}\vec{b}.
\]
The left-hand side equals $I_n\vec{x}$, so that
\[
\vec{x} = A^{-1}\vec{b}. \tag{24}
\]
This is an explicit formula for the unique solution $\vec{x}$.
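EXAMPLE 2.15 The linear system
\[
\begin{array}{rcrcrcr}
x_1 & - & x_2 & & & = & -2 \\
x_1 & & & - & x_3 & = & 2 \\
-6x_1 & + & 2x_2 & + & 3x_3 & = & -3
\end{array}
\]
can be written in the form $A\vec{x} = \vec{b}$ with
\[
A = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ -6 & 2 & 3 \end{bmatrix} \quad\text{and}\quad \vec{b} = \begin{bmatrix} -2 \\ 2 \\ -3 \end{bmatrix}.
\]
In Example 2.13, we found the inverse of the coefficient matrix A (at the same time showing that A is nonsingular): $A^{-1} = \begin{bmatrix} -2 & -3 & -1 \\ -3 & -3 & -1 \\ -2 & -4 & -1 \end{bmatrix}$. Equation (24) yields
\[
\vec{x} = \underbrace{\begin{bmatrix} -2 & -3 & -1 \\ -3 & -3 & -1 \\ -2 & -4 & -1 \end{bmatrix}}_{A^{-1}} \underbrace{\begin{bmatrix} -2 \\ 2 \\ -3 \end{bmatrix}}_{\vec{b}} = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}.
\]
(Check that $A\vec{x} = \vec{b}$. You may also want to verify that the same solution is obtained using Gauss-Jordan reduction or Gaussian elimination.)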
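The suggested check can also be carried out mechanically. In the short sketch below, the helper name `matvec` is an assumption introduced for illustration; the data are those of Example 2.15.

```python
def matvec(M, v):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, -1, 0], [1, 0, -1], [-6, 2, 3]]
A_inv = [[-2, -3, -1], [-3, -3, -1], [-2, -4, -1]]   # found in Example 2.13
b = [-2, 2, -3]

x = matvec(A_inv, b)        # formula (24): x = A^{-1} b
print(x)                    # [1, 3, -1]
print(matvec(A, x) == b)    # True: x indeed solves A x = b
```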
Suppose we are asked to solve multiple linear systems that share the same coefficient matrix, but for different right-hand side vectors: $A\vec{x}^{(1)} = \vec{b}^{(1)}$, $A\vec{x}^{(2)} = \vec{b}^{(2)}$, etc. Rather than performing elementary row operations on $[A\,|\,\vec{b}^{(1)}]$, $[A\,|\,\vec{b}^{(2)}]$, etc., an attractive alternative would appear to be to find the inverse of A, then multiply it by each right-hand side vector. Unfortunately, when using finite precision arithmetic, this approach can potentially introduce large errors into the computation. However, an efficient solution can be found that does not have the numerical drawbacks associated with inverting matrices; it will involve matrix factorizations discussed in the following section.
Invertible transformations

In Section 1.4, we have introduced linear transformations and have shown that if $F : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation, then an n × n matrix A exists such that $F(\vec{x}) = A\vec{x}$ for all n-vectors $\vec{x}$. If A is nonsingular, then we can define another transformation $G : \mathbb{R}^n \to \mathbb{R}^n$ by taking $G(\vec{x}) = A^{-1}\vec{x}$ for all $\vec{x}$ in $\mathbb{R}^n$. The transformation G is called an inverse transformation of F because
\[
G(F(\vec{x})) = \vec{x} \quad\text{and}\quad F(G(\vec{y})) = \vec{y} \quad\text{for all } \vec{x} \text{ and } \vec{y} \text{ in } \mathbb{R}^n.
\]

[Figure: F maps $\vec{v}$ to $\vec{w} = F(\vec{v})$, and G maps $\vec{w}$ back to $\vec{v} = G(\vec{w})$.]

A transformation F is said to be invertible if it has an inverse transformation; otherwise, it is called noninvertible.
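EXAMPLE 2.16 The transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined in Example 1.22 performed a counterclockwise rotation by 90 degrees:
\[
F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}}_{A}\begin{bmatrix} x \\ y \end{bmatrix}.
\]
To invert A, we set up
\[
\left[\begin{array}{rr|rr} 0 & -1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{array}\right]
\]
and perform elementary row operations ($r_1 \leftrightarrow r_2$; $-r_2 \to r_2$) to obtain
\[
\left[\begin{array}{rr|rr} 1 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 \end{array}\right].
\]
The transformation F is invertible. Its inverse transformation,
\[
G\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \underbrace{\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}}_{A^{-1}}\begin{bmatrix} x \\ y \end{bmatrix},
\]
corresponds to clockwise rotation by 90 degrees (check by referring to Example 1.23).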
EXAMPLE 2.17 One of the transformations listed in the table on p. 42 is
\[
F\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix},
\]
the projection of vectors in $\mathbb{R}^3$ onto the xy-plane. There are two very good reasons why this transformation is not invertible:
• the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ is singular (check!) and
• it is impossible to find any function (not just a linear transformation) that will “recover” the original vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ from its “shadow” in the xy-plane $\begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$ (the z component of the original vector is irretrievably lost).
Note that linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$ with $n \neq m$ cannot be invertible, as their matrices are not square (only square matrices can possibly be inverted).
EXERCISES
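1. Is $\begin{bmatrix} 4 & 1 \\ -1 & 2 \end{bmatrix}$ the inverse of $\begin{bmatrix} 1 & 3 \\ 1 & 6 \end{bmatrix}$?
2. Is $\begin{bmatrix} 3 & 8 \\ 2 & 5 \end{bmatrix}$ the inverse of $\begin{bmatrix} -5 & 8 \\ 2 & -3 \end{bmatrix}$?
3. Is $\begin{bmatrix} -4 & 0 & -3 \\ 0 & 1 & 2 \\ 7 & 0 & 5 \end{bmatrix}$ the inverse of $\begin{bmatrix} 5 & 0 & 3 \\ 14 & 1 & 8 \\ -7 & 0 & -4 \end{bmatrix}$?
4. Is $\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 4 \\ 3 & 2 & 1 \end{bmatrix}$ the inverse of $\begin{bmatrix} 1 & 1 & 3 \\ 2 & 3 & 1 \\ 1 & 2 & 2 \end{bmatrix}$?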
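In Exercises 5–12, follow the procedure on p. 89 to find the inverse of each matrix if possible. If you obtained the inverse, verify it by multiplying it by the original matrix. (Do not use technology.)
5. a. $\begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix}$; b. $\begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix}$; c. $\begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$.
1 −2 −2 1 −1 2 2 . 6. a. ; b. ; c. 3 4 −1 1 2 −4 2
7. a. $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$; b. $\begin{bmatrix} -2 & 0 & 1 \\ 0 & 1 & 0 \\ 2 & 0 & -1 \end{bmatrix}$; c. $\begin{bmatrix} -3 & 2 & 2 \\ 0 & 1 & -1 \\ 1 & -1 & 0 \end{bmatrix}$.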
⎤ ⎡ 3 0 −2 1 0 0 ⎢ ⎥ ⎢ 8. a. ⎣ 0 1 0 ⎦ ; b. ⎣ 3 −2 0 −1 0 1 −2 −3 1 ⎡ ⎡ ⎤ 0 1 −1 1 0 0 1 ⎢ ⎢ ⎥ ⎢ ⎢ 0 1 ⎥ 1 2 0 ⎥ ⎢ 2 0 9. a. ⎢ ⎢ 0 0 −1 0 ⎥ ; b. ⎢ 2 1 1 ⎦ ⎣ ⎣ 0 1 0 0 0 0 1 ⎡ ⎤ ⎡ 1 0 1 1 1 1 1 ⎢ ⎥ ⎢ ⎢ 0 1 ⎥ ⎢ 1 1 1 0 1 ⎥ ⎢ 10. a. ⎢ ⎢ 0 1 −1 0 ⎥ ; b. ⎢ 1 1 2 ⎣ ⎦ ⎣ −1 0 0 0 1 2 1
⎤
⎡
⎤ 0 1 1 ⎥ ⎢ ⎥ ⎦ ; c. ⎣ 3 −1 2 ⎦ . 2 1 3 ⎤ 0 ⎥ 1 ⎥ ⎥. 0 ⎥ ⎦ −1 ⎤ 1 ⎥ 2 ⎥ ⎥. 1 ⎥ ⎦ 1
96
Chapter 2 Linear Systems ⎡
⎤
⎡
1 0 ⎢ 1 ⎢ 0 ⎢ ⎢ 11. a. ⎢ 0 0 ⎢ ⎣ 0 −3 0 0
⎢ 0 0 0 ⎢ ⎥ ⎢ 0 0 0 ⎥ ⎢ ⎥ ⎢ 1 0 0 ⎥ ⎥ ; b. ⎢ ⎢ ⎥ ⎢ 0 1 0 ⎦ ⎢ ⎣ 0 0 1
⎡
⎤
⎢ ⎢ ⎢ 12. a. ⎢ ⎢ ⎢ ⎣
1 0 0 0 0
0 0 1 0 0
0 1 0 0 0
0 0 0 1 0
0 0 0 0 1
⎡
⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ; b. ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎢ ⎣
1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 4 0
⎤ 0 ⎥ 0 ⎥ ⎥ 0 ⎥ ⎥ ⎥. 0 ⎥ ⎥ 0 ⎥ ⎦ 1 ⎤
1 0 0 0 0 0 ⎥ 0 1 0 0 0 0 ⎥ ⎥ 0 0 1 0 0 0 ⎥ ⎥ ⎥. 0 0 0 1 0 0 ⎥ ⎥ 0 0 0 0 1 0 ⎥ ⎦ 0 0 0 0 0 0
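In Exercises 13–16, solve each system by using the inverse of the coefficient matrix.
13.
\[
\begin{array}{rcrcr}
x & - & y & = & 1 \\
-4x & + & 3y & = & -3
\end{array}
\]
14.
\[
\begin{array}{rcrcr}
2x & + & 3y & = & 3 \\
2x & + & 4y & = & 2
\end{array}
\]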
15. 3y 2y
−x x
+
2x x x
− y
+ 2z + z
= 0 = −1 = −1
16.
+ y
+ z + 2z
= 1 = 0 = −4
In Exercises 17–20, determine if the given linear transformation is invertible. If so, describe the inverse transformation geometrically and find its matrix.
17. $F : \mathbb{R}^2 \to \mathbb{R}^2$ performing reflection with respect to the y-axis.
18. $F : \mathbb{R}^2 \to \mathbb{R}^2$ performing the dilation by the factor 3.
19. $F : \mathbb{R}^3 \to \mathbb{R}^3$ performing the projection onto the z-axis.
20. $F : \mathbb{R}^3 \to \mathbb{R}^3$ performing the reflection with respect to the xz-plane.
In Exercises 21–24, you are given inverses $A^{-1}$ and $B^{-1}$; all problems in parts a–d can be solved by relying on properties of the matrix inverse, without actually inverting any matrices (in particular, without inverting $A^{-1}$ to obtain A or inverting $B^{-1}$ to obtain B).
21. Given $A^{-1} = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ 1 & 1 & 1 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 0 & 3 & 1 \\ 3 & 1 & 3 \\ 1 & 3 & 0 \end{bmatrix}$,
a. evaluate $(AB)^{-1}$,
b. evaluate $(A^T)^{-1}$,
c. find $\vec{x}$ such that $A\vec{x} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$,
d. are the matrices A and B row equivalent? (Justify your answer.)
22. Given $A^{-1} = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 4 & 0 & 3 & 1 \end{bmatrix}$,
a. evaluate $(BA)^{-1}$,
b. evaluate $(A^T)^{-1}$,
c. find $\vec{x}$ such that $A\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}$,
d. are the matrices A and B row equivalent? (Justify your answer.)
23. Given $A^{-1} = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 2 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 2 & 0 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$,
a. evaluate $(B^TA)^{-1}$,
b. evaluate $((A^{-1})^{-1})^{-1}$,
c. evaluate $(B^2)^{-1}$,
d. evaluate $(AB^{-1})^{-1}(BA^{-1})^{-1}$.
24. Given $A^{-1} = \begin{bmatrix} 5 & 4 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 3 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ 2 & 1 & 1 \\ 0 & 1 & -2 \end{bmatrix}$,
a. evaluate $(A^TB)^{-1}$,
b. evaluate $((B^T)^{-1})^T$,
c. evaluate $(A^2)^{-1}$,
d. evaluate $B^{-1}(AB^{-1})^{-1}$.
In Exercises 25 and 26, you may find it helpful to adopt an approach similar to the one used in the proof of Theorem 2.15. 25. * Prove part 1 of Theorem 2.16. 26. * Prove part 2 of Theorem 2.16.
In Exercises 27–34, decide whether each statement is true or false. Justify your answer.
27. For all n × n nonsingular matrices A and B, $(A^{-1}B^{-1})^T = (A^TB^T)^{-1}$.
28. For all n × n nonsingular matrices A, $(A^3)^{-1} = (A^{-1})^3$.
29. If A is a 2 × 2 matrix such that the linear system $A\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ has no solution, then A is nonsingular.
30. If $A\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = A\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, then A is invertible.
31. If A is row equivalent to $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, then $A\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 9 \\ 7 \\ 3 \end{bmatrix}$ is consistent.
32. If A is a nonsingular matrix, then its r.r.e.f. contains at least one zero row.
33. All nonsingular 5 × 5 matrices are row equivalent.
34. There exist some invertible linear transformations $F : \mathbb{R}^2 \to \mathbb{R}^3$.
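35. * If matrices A, B, C, and D have sizes m × m, m × n, n × m, and n × n, respectively, and A is nonsingular, show that
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I_m & 0 \\ CA^{-1} & I_n \end{bmatrix}\begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}.
\]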
2.4 Applications of Linear Systems and Matrix Factorizations
Application: Alloys
In the following example, a linear system is set up using the alloy vectors introduced in Example 1.5.

EXAMPLE 2.18 Suppose a supply of the following three alloys of gold is available:
• alloy 1 (22 karat), made up of 22 parts gold, 1 part silver, and 1 part copper;
• alloy 2 (14 karat), made up of 14 parts gold, 6 parts silver, and 4 parts copper;
• alloy 3 (18 karat), made up of 18 parts gold and 6 parts copper.
How can these three alloys be combined to obtain:
• alloy 4 (18 karat), made up of 18 parts gold, 3 parts silver, and 3 parts copper?
To solve this problem, we set up a system of three linear equations in three unknowns:
\[
\begin{array}{rcrcrcr}
\frac{22}{24}x_1 & + & \frac{14}{24}x_2 & + & \frac{18}{24}x_3 & = & \frac{18}{24} \\[2pt]
\frac{1}{24}x_1 & + & \frac{6}{24}x_2 & & & = & \frac{3}{24} \\[2pt]
\frac{1}{24}x_1 & + & \frac{4}{24}x_2 & + & \frac{6}{24}x_3 & = & \frac{3}{24}
\end{array}
\]
In this system, the unknowns $x_1$, $x_2$, and $x_3$ represent the amounts of the first, second, and third alloy used to mix together and obtain the fourth alloy; we must insist on these unknowns being nonnegative in order for the solution to make sense. The first equation expresses the requirement that the amount of gold in the mix (left-hand side) matches the required amount of gold (right-hand side). The second and the third equations express the analogous requirements with respect to silver and copper, respectively.

Using the augmented matrix of the system (and referring to the Linear Algebra Toolkit for details)
\[
\left[\begin{array}{ccc|c} \frac{22}{24} & \frac{14}{24} & \frac{18}{24} & \frac{18}{24} \\[2pt] \frac{1}{24} & \frac{6}{24} & 0 & \frac{3}{24} \\[2pt] \frac{1}{24} & \frac{4}{24} & \frac{6}{24} & \frac{3}{24} \end{array}\right]
\xrightarrow{\text{a sequence of elem. row ops.}}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{3}{7} \\[2pt] 0 & 1 & 0 & \frac{3}{7} \\[2pt] 0 & 0 & 1 & \frac{1}{7} \end{array}\right]
\]
we can conclude that the system has a unique solution:
\[
x_1 = \tfrac{3}{7}, \quad x_2 = \tfrac{3}{7}, \quad x_3 = \tfrac{1}{7}.
\]
Therefore, alloy 4 can be obtained by mixing three parts of alloy 1, three parts of alloy 2, and one part of alloy 3.
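As a sanity check on the solution, the mix can be recomputed in exact arithmetic. The sketch below is an illustration with variable names chosen here; it works in parts out of 24, which scales every equation by 24 and so leaves the solution unchanged.

```python
from fractions import Fraction

# Parts (out of 24) of gold, silver, copper in alloys 1, 2, 3 and in the target alloy 4.
alloys = [[22, 1, 1], [14, 6, 4], [18, 0, 6]]
target = [18, 3, 3]

x = [Fraction(3, 7), Fraction(3, 7), Fraction(1, 7)]   # amounts of alloys 1, 2, 3

# Metal content of the mix: for each metal, add up the contributions of the three alloys.
mix = [sum(x[i] * alloys[i][m] for i in range(3)) for m in range(3)]

print(mix == target)   # True: gold, silver, and copper all match alloy 4
print(sum(x))          # 1: the three amounts make up one unit of alloy 4
```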
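Here is a vector interpretation of this solution
\[
\frac{3}{7}\begin{bmatrix} \frac{22}{24} \\[2pt] \frac{1}{24} \\[2pt] \frac{1}{24} \end{bmatrix} + \frac{3}{7}\begin{bmatrix} \frac{14}{24} \\[2pt] \frac{6}{24} \\[2pt] \frac{4}{24} \end{bmatrix} + \frac{1}{7}\begin{bmatrix} \frac{18}{24} \\[2pt] 0 \\[2pt] \frac{6}{24} \end{bmatrix} = \begin{bmatrix} \frac{18}{24} \\[2pt] \frac{3}{24} \\[2pt] \frac{3}{24} \end{bmatrix},
\]
where each alloy is represented by a vector: $\begin{bmatrix} \text{gold content} \\ \text{silver content} \\ \text{copper content} \end{bmatrix}$.

Application: Balancing chemical reaction equations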
Chemical equations are often used to describe chemical reactions. For example, the reaction taking place when methane is burning in the air is represented by the equation CH4 + 2O2 → CO2 + 2H2 O. Methane (CH4 ) and oxygen (O2 ) are the reactants, positioned on the left side of the arrow. On the right side, the products of the reaction are listed: carbon dioxide (CO2 ) and water (H2 O). During such a reaction, atoms can be neither created nor destroyed. For example, there are four oxygen atoms on the reactant side (since 2O2 denotes two molecules, each containing two oxygen atoms) and four oxygen atoms on the product side.
EXAMPLE 2.19 Let us balance the reaction
\[
x_1\,\mathrm{NH_3} + x_2\,\mathrm{O_2} \to x_3\,\mathrm{NO} + x_4\,\mathrm{H_2O}.
\]
We form a system of equations, each corresponding to a different element. Each molecule of both ammonia ($\mathrm{NH_3}$) and nitrogen oxide (NO) contains one atom of nitrogen. Therefore, if reactants include $x_1$ molecules of ammonia and products include $x_3$ molecules of nitrogen oxide, then we must have $x_1 = x_3$. Likewise, we can balance the number of hydrogen atoms (keeping in mind that each ammonia molecule contains three hydrogen atoms, while each molecule of water has two): $3x_1 = 2x_4$. Finally, for oxygen we have $2x_2 = x_3 + x_4$. These three equations can now be rewritten in the standard form (18)
\[
\begin{array}{rcrcrcrcr}
x_1 & & & - & x_3 & & & = & 0 \\
3x_1 & & & & & - & 2x_4 & = & 0 \\
 & & 2x_2 & - & x_3 & - & x_4 & = & 0
\end{array} \tag{25}
\]
The augmented matrix of this linear system, $\left[\begin{array}{rrrr|r} 1 & 0 & -1 & 0 & 0 \\ 3 & 0 & 0 & -2 & 0 \\ 0 & 2 & -1 & -1 & 0 \end{array}\right]$, has the reduced row echelon form: $\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -\frac{2}{3} & 0 \\ 0 & 1 & 0 & -\frac{5}{6} & 0 \\ 0 & 0 & 1 & -\frac{2}{3} & 0 \end{array}\right]$. (Check this by hand, or use the Linear Algebra Toolkit.) Since the fourth column has no leading entry, $x_4$ is arbitrary, whereas the remaining variables can be expressed in terms of $x_4$:
\[
x_1 = \tfrac{2}{3}x_4, \qquad x_2 = \tfrac{5}{6}x_4, \qquad x_3 = \tfrac{2}{3}x_4.
\]
While mathematically this system has infinitely many solutions, including these three
(a) $x_1 = \frac{2}{3}$, $x_2 = \frac{5}{6}$, $x_3 = \frac{2}{3}$, $x_4 = 1$,
(b) $x_1 = -8$, $x_2 = -10$, $x_3 = -8$, $x_4 = -12$,
(c) $x_1 = 0$, $x_2 = 0$, $x_3 = 0$, $x_4 = 0$,
none of these are considered acceptable solutions to the problem of balancing the given chemical equation for various reasons:
(a) it doesn’t make sense to consider a fraction (e.g., 2/3) of a molecule,
(b) the coefficients should not be negative (otherwise a reactant would become a product, and vice versa), and
(c) an equation with all coefficients equal to zero represents no reaction.
What we really want is to make sure that all the coefficients are positive integers, using as small values as possible. Taking $x_4 = 6$ yields $x_1 = 4$, $x_2 = 5$, and $x_3 = 4$, and this corresponds to the balanced reaction
\[
4\,\mathrm{NH_3} + 5\,\mathrm{O_2} \to 4\,\mathrm{NO} + 6\,\mathrm{H_2O}.
\]
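The last step, choosing the smallest free value that clears all denominators, can be sketched as follows. The function name `smallest_integer_solution` is hypothetical; it assumes the free variable has been set to 1 and that all ratios are positive, as in the ammonia example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def smallest_integer_solution(ratios):
    """Scale rational coefficients to the smallest positive integers.

    Assumes the free variable's own ratio is 1, so the scale factor is the
    least common multiple of the denominators."""
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in ratios), 1)
    return [int(r * scale) for r in ratios]

# x1, x2, x3 expressed in terms of x4 (from the r.r.e.f.), followed by x4 itself.
ratios = [Fraction(2, 3), Fraction(5, 6), Fraction(2, 3), Fraction(1)]
x1, x2, x3, x4 = smallest_integer_solution(ratios)
print(x1, x2, x3, x4)   # 4 5 4 6
```

The result reproduces the balanced reaction with coefficients 4, 5, 4, and 6.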
Application: Network flow

EXAMPLE 2.20 Consider the network of one-way streets depicted in the margin. Each number indicates the traffic flow, in cars per hour, measured along the given street. Our objective is to determine the unknown traffic flow figures (indicated with question marks).

[Figure: network of one-way streets with measured flows of 200, 300, and 400 cars per hour and four unknown flows marked “?”.]

Let us superimpose a network of oriented line segments on our street network so that we can clearly observe the connections between various quantities. Furthermore, we denote the four unknown quantities by x, y, z, and w and then proceed to set up equations reflecting the relationships between the known and the unknown quantities.

[Figure: the same network drawn with oriented line segments; the unknown flows are labeled x, y, z, and w.]

The intersections are designated with solid dots in our diagram. Each intersection yields exactly one equation, based on the principle
\[
\begin{pmatrix} \text{total traffic} \\ \text{arriving at} \\ \text{the intersection} \end{pmatrix} = \begin{pmatrix} \text{total traffic} \\ \text{leaving} \\ \text{the intersection} \end{pmatrix}.
\]
In the following table, we apply this principle to obtain the equations corresponding to the highlighted intersection.

[Table: three copies of the network diagram, each with one intersection highlighted, and the corresponding equation.]
\[
z + 200 = y + 400, \qquad w + 200 = z + 300, \qquad x + y + 200 = w + 400
\]
The three linear equations can now be rewritten in the standard form, with the unknown terms on the left-hand sides:
\[
\begin{array}{rcrcrcrcr}
 & - & y & + & z & & & = & 200 \\
 & & & - & z & + & w & = & 100 \\
x & + & y & & & - & w & = & 200.
\end{array}
\]
The augmented matrix of this system, $\left[\begin{array}{rrrr|r} 0 & -1 & 1 & 0 & 200 \\ 0 & 0 & -1 & 1 & 100 \\ 1 & 1 & 0 & -1 & 200 \end{array}\right]$, has the reduced row echelon form $\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 500 \\ 0 & 1 & 0 & -1 & -300 \\ 0 & 0 & 1 & -1 & -100 \end{array}\right]$. (Check!)
Therefore, there are infinitely many solutions to our problem
\[
x = 500, \qquad y = w - 300, \qquad z = w - 100,
\]
where w can have an arbitrary value.

To avoid reversing the directions of the one-way streets in our network, we must choose w to make sure that the inequalities
\[
(1)\ x \geq 0; \quad (2)\ y \geq 0; \quad (3)\ z \geq 0; \quad (4)\ w \geq 0
\]
hold true. We shall refer to the solutions of our system that satisfy these inequalities as feasible solutions. To determine the w values leading to the feasible solutions, rewrite the four inequalities in terms of w:
\[
(1)\ 500 \geq 0; \quad (2)\ w - 300 \geq 0; \quad (3)\ w - 100 \geq 0; \quad (4)\ w \geq 0.
\]
Refer to the figure in the margin to see how we can use the w-axis to graphically find the set of values of w that satisfies all four inequalities simultaneously; in our case all such solutions correspond to $w \geq 300$.

[Figure: the w-axis with marks at 0, 100, 200, and 300, showing the solution sets of inequalities (1) through (4) and their intersection.]

For example, if w = 350, then x = 500, y = 50, and z = 250.
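The feasibility condition for the one-parameter family x = 500, y = w − 300, z = w − 100 can be tested directly. The function names `flows` and `feasible` below are assumptions introduced for this sketch.

```python
def flows(w):
    """Traffic flows of the street network, parameterized by the free variable w."""
    return {"x": 500, "y": w - 300, "z": w - 100, "w": w}

def feasible(w):
    """A solution is feasible when no flow is negative (no street is reversed)."""
    return all(v >= 0 for v in flows(w).values())

print(flows(350))                   # {'x': 500, 'y': 50, 'z': 250, 'w': 350}
print(feasible(350), feasible(299)) # True False
```

Scanning integer values of w confirms that the smallest feasible choice is w = 300, where y drops to zero.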
EXAMPLE 2.21 A two-level intersection is shown in the margin, including traffic volumes per hour during the evening commute. Our objective is to determine all remaining traffic volumes throughout this intersection.

[Figure: two-level intersection with measured traffic volumes between 1000 and 7000 vehicles per hour.]

We begin by redrawing the diagram using one-way street segments and then denoting each unknown quantity by a name $x_1, \ldots, x_9$. There are eight intersections involving these one-way segments. Each of them is marked by a solid dot; make sure to distinguish these intersections from the situations where one segment passes over another one at a different level!

[Figure: the intersection redrawn with one-way segments; the unknown volumes are labeled $x_1$ through $x_9$.]

Similarly to the previous example, each intersection leads to a single equation. Verify that the system has an augmented matrix
\[
\left[\begin{array}{rrrrrrrrr|r}
-1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & -5000 \\
1 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1000 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 7000 \\
0 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & -4000 \\
0 & 0 & 0 & 1 & 0 & 0 & -1 & -1 & 0 & -3000 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 7000 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2000 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & -5000
\end{array}\right]
\]
whose reduced row echelon form is
\[
\left[\begin{array}{rrrrrrrrr|r}
1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -2000 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 2000 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 2000 \\
0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 1 & -1000 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 7000 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 5000 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 2000 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right].
\]
(Use the Linear Algebra Toolkit to verify this.)

Two of the nine unknowns, $x_7$ and $x_9$, are arbitrary, subject to the restriction that they lead to nonnegative values of all unknowns. To graphically determine feasible solutions, we plot an intersection of half-planes corresponding to the inequalities (1) $x_1 \geq 0$; . . . ; (9) $x_9 \geq 0$. We plot the line for each equation, along with the number tag on the side of it where the inequality holds (lines not adjacent to the feasible region are plotted as dotted lines).

[Figure: the feasible region in the $x_7x_9$-plane, bounded by the lines tagged 1 through 9; the $x_7$-axis runs from 0 to 7000 and the $x_9$-axis from 0 to 5000.]

An example of a feasible solution can be obtained by taking $x_7 = 5000$, $x_9 = 1000$, leading to $x_1 = 3000$, $x_2 = 1000$, $x_3 = 3000$, $x_4 = 3000$, $x_5 = 2000$, $x_6 = 4000$, $x_8 = 1000$. Determining nonnegative feasible solutions in cases involving three or more arbitrary variables is considerably more complex and will not be required in this book (geometrically, the corresponding feasible region will be an intersection of half-spaces in higher-dimensional space).⁸
Application: Polynomial interpolation

Consider n + 1 given points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ such that the values $x_0, x_1, \ldots, x_n$ are distinct. A polynomial of degree n or less, $p(x) = a_0 + a_1x + \cdots + a_nx^n$, that passes through all the given points, i.e., $p(x_i) = y_i$ for $i = 0, 1, \ldots, n$, is called the Lagrange interpolating polynomial. Its coefficients satisfy the following linear system:
\[
\begin{array}{ccccccccc}
a_0 & + & x_0a_1 & + & \cdots & + & x_0^na_n & = & y_0 \\
a_0 & + & x_1a_1 & + & \cdots & + & x_1^na_n & = & y_1 \\
\vdots & & \vdots & & & & \vdots & & \vdots \\
a_0 & + & x_na_1 & + & \cdots & + & x_n^na_n & = & y_n
\end{array}
\]

[Figure: the graph of y = p(x) passing through the points $(x_0, y_0), (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.]

It can be shown that this linear system has a unique solution.
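To see the system in action, one can build its augmented matrix from the data points and row reduce it. The following is a minimal sketch under stated assumptions: the function name `interpolate` is introduced here, exact `Fraction` arithmetic is used to avoid rounding, and the sample points are those of Example 2.22.

```python
from fractions import Fraction

def interpolate(points):
    """Return [a0, ..., an] with p(x_i) = y_i, via Gauss-Jordan reduction."""
    n = len(points)
    # Row i of the augmented matrix is [1, x_i, x_i^2, ..., x_i^(n-1) | y_i].
    M = [[Fraction(x) ** j for j in range(n)] + [Fraction(y)] for x, y in points]
    for c in range(n):
        # A pivot exists in every column because the x_i are distinct.
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [a - k * b for a, b in zip(M[r], M[c])]
    return [row[n] for row in M]

coeffs = interpolate([(0, 2), (1, 4), (2, 3), (3, 2)])
print(coeffs)   # coefficients a0, a1, a2, a3 as exact fractions
```

For these four points the coefficients come out as 2, 9/2, −3, and 1/2.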
EXAMPLE 2.22 Find the Lagrange interpolating polynomial of degree 3 or less passing through (0, 2), (1, 4), (2, 3), and (3, 2).

The polynomial we are seeking has the form $p(x) = a_0 + a_1x + a_2x^2 + a_3x^3$ with the coefficients $a_0$, $a_1$, $a_2$, and $a_3$ satisfying the linear system
\[
\begin{array}{rcrcrcrcr}
a_0 & & & & & & & = & 2 \\
a_0 & + & a_1 & + & a_2 & + & a_3 & = & 4 \\
a_0 & + & 2a_1 & + & 2^2a_2 & + & 2^3a_3 & = & 3 \\
a_0 & + & 3a_1 & + & 3^2a_2 & + & 3^3a_3 & = & 2
\end{array}
\]
Since the augmented matrix $\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 2 \\ 1 & 1 & 1 & 1 & 4 \\ 1 & 2 & 4 & 8 & 3 \\ 1 & 3 & 9 & 27 & 2 \end{array}\right]$ has the r.r.e.f. $\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & \frac{9}{2} \\ 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 1 & \frac{1}{2} \end{array}\right]$, we conclude that $p(x) = 2 + \frac{9}{2}x - 3x^2 + \frac{1}{2}x^3$ is the desired polynomial.

[Figure: the graph of y = p(x) passing through the points (0, 2), (1, 4), (2, 3), and (3, 2).]

⁸ A related problem of great practical importance is a linear programming problem; it involves maximizing a linear function of the unknowns over the feasible region.

Linear systems with parameters
The linear systems we have solved so far in this chapter involved only given real numbers as coefficients and right-hand side values. However, in some problems it will become desirable to study such systems in which at least some of these values are not explicitly specified but are left as parameters instead.

EXAMPLE 2.23 Consider the linear system with the augmented matrix
\[
\left[\begin{array}{rr|r} a & 4 & -6 \\ 1 & a & 3 \end{array}\right].
\]
The first elementary row operation we will perform is $r_1 \leftrightarrow r_2$. This is done to avoid having to divide by a, since we cannot guarantee $a \neq 0$.
\[
\left[\begin{array}{rr|r} 1 & a & 3 \\ a & 4 & -6 \end{array}\right]
\]
Now, we can safely eliminate the (2, 1) entry using the operation $r_2 - ar_1 \to r_2$.
\[
\left[\begin{array}{rr|r} 1 & a & 3 \\ 0 & 4 - a^2 & -6 - 3a \end{array}\right]
\]
If $4 - a^2 \neq 0$, i.e., a is neither −2 nor 2, then the system has a unique solution (each of the first two columns will contain a leading entry).
If a = −2, then the matrix becomes $\left[\begin{array}{rr|r} 1 & -2 & 3 \\ 0 & 0 & 0 \end{array}\right]$, corresponding to infinitely many solutions (the second unknown is arbitrary since there is no leading entry in its column).
If a = 2, then the matrix $\left[\begin{array}{rr|r} 1 & 2 & 3 \\ 0 & 0 & -12 \end{array}\right]$ indicates the system has no solution (the second equation, 0 = −12, is inconsistent).
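The case analysis can be mirrored by a small function that inspects the second row of the reduced matrix. The name `classify` is an assumption made for this sketch; it is meant for integer or `Fraction` values of a, since floating-point input could blur the test for zero.

```python
def classify(a):
    """Classify the system with augmented matrix [[a, 4, -6], [1, a, 3]].

    After r1 <-> r2 and r2 - a*r1 -> r2 the second row is [0, 4 - a^2 | -6 - 3a]."""
    pivot, rhs = 4 - a * a, -6 - 3 * a
    if pivot != 0:
        return "unique solution"
    return "infinitely many solutions" if rhs == 0 else "no solution"

for a in (-2, 0, 2):
    print(a, classify(a))
```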
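LU decomposition

EXAMPLE 2.24 Consider a sequence of elementary row operations transforming the coefficient matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 3 & 0 & -1 \\ 2 & 2 & 3 \end{bmatrix}$ to an upper triangular matrix:
\[
A \xrightarrow{r_2 - \frac{3}{2}r_1 \to r_2} \begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 2 & 2 & 3 \end{bmatrix}
\xrightarrow{r_3 - r_1 \to r_3} \begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 0 & 1 & 2 \end{bmatrix}
\xrightarrow{r_3 + \frac{2}{3}r_2 \to r_3} \underbrace{\begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}}_{U}.
\]
(Note that the matrix U is not in row echelon form.) The same transitions can be expressed in terms of elementary matrices:
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac{2}{3} & 1 \end{bmatrix}}_{E_3}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}}_{E_2}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ -\frac{3}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{E_1}
\underbrace{\begin{bmatrix} 2 & 1 & 1 \\ 3 & 0 & -1 \\ 2 & 2 & 3 \end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}}_{U}. \tag{26}
\]
Recall from the table on p. 78 that the result of the operation $r_i + kr_j \to r_i$ is completely reversed by $r_i - kr_j \to r_i$ (also see Exercise a in Section 2.2 on p. 86); therefore,
\[
E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ \frac{3}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix}.
\]
Premultiplying both sides of the equation (26) by $E_1^{-1}E_2^{-1}E_3^{-1}$ yields
\[
A = E_1^{-1}E_2^{-1}E_3^{-1}U.
\]
The product $E_1^{-1}E_2^{-1}E_3^{-1}$ is unit lower triangular, with 1’s on the main diagonal (since it is a product of unit lower triangular matrices $E_i^{-1}$; see Exercise 40 in Section 1.3 on p. 34). It can be thought of as a result of performing a sequence of elementary row operations on the identity matrix:
• $r_3 - \frac{2}{3}r_2 \to r_3$ applied to $I_3$ yields $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix} = E_3^{-1}$,
• $r_3 + r_1 \to r_3$ applied to the above yields $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -\frac{2}{3} & 1 \end{bmatrix} = E_2^{-1}E_3^{-1}$,
• then $r_2 + \frac{3}{2}r_1 \to r_2$ applied to the above result leads to $\begin{bmatrix} 1 & 0 & 0 \\ \frac{3}{2} & 1 & 0 \\ 1 & -\frac{2}{3} & 1 \end{bmatrix} = E_1^{-1}E_2^{-1}E_3^{-1}$.
Notice how each subsequent row operation (and the corresponding matrix multiplication) changes precisely one entry in the resulting matrix. More specifically, $r_i + kr_j \to r_i$ introduces the number k at the (i, j) entry. This works because the sequence of the j values is nonincreasing (in our case, j = 2, 1, 1), so that in $r_i + kr_j$ we always have $r_j = \vec{e_j}^T$.⁹

Denoting $L = E_1^{-1}E_2^{-1}E_3^{-1}$, we obtain an LU decomposition (also called LU factorization) of A:
\[
A = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ \frac{3}{2} & 1 & 0 \\ 1 & -\frac{2}{3} & 1 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}}_{U}.
\]

⁹ To convince yourself about the importance of the j sequence being nonincreasing, compare this to the product $E_3E_2E_1$, where the j values used are 1, 1, 2; therefore,
\[
E_3E_2E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac{2}{3} & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ -\frac{3}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{3}{2} & 1 & 0 \\ -2 & \frac{2}{3} & 1 \end{bmatrix},
\]
whose (3, 1) entry is −2 rather than the multiplier −1 of $E_2$.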
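Once an LU decomposition of the coefficient matrix A is known, the system
\[
A\vec{x} = \vec{b}
\]
can be rewritten as
\[
L(\underbrace{U\vec{x}}_{\vec{y}}) = \vec{b}
\]
and solved in two easy steps:
• first, solve $L\vec{y} = \vec{b}$,
• then solve $U\vec{x} = \vec{y}$.
Since both coefficient matrices, L and U, are triangular, solving each of these is as simple as the backsubstitution discussed in Example 2.8.

In particular, for the system $A\vec{x} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}$, we have
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ \frac{3}{2} & 1 & 0 \\ 1 & -\frac{2}{3} & 1 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\vec{y}}
= \underbrace{\begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}}_{\vec{b}}
\]
or
\[
\begin{array}{rcrcrcr}
y_1 & & & & & = & 1 \\
\frac{3}{2}y_1 & + & y_2 & & & = & 5 \\
y_1 & - & \frac{2}{3}y_2 & + & y_3 & = & 2
\end{array}
\]
which is easily solved from top to bottom:
\[
y_1 = 1, \qquad y_2 = 5 - \tfrac{3}{2}y_1 = \tfrac{7}{2}, \qquad y_3 = 2 - y_1 + \tfrac{2}{3}y_2 = \tfrac{10}{3},
\]
and
\[
\underbrace{\begin{bmatrix} 2 & 1 & 1 \\ 0 & -\frac{3}{2} & -\frac{5}{2} \\ 0 & 0 & \frac{1}{3} \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vec{x}}
= \underbrace{\begin{bmatrix} 1 \\ \frac{7}{2} \\ \frac{10}{3} \end{bmatrix}}_{\vec{y}},
\]
i.e.,
\[
\begin{array}{rcrcrcr}
2x_1 & + & x_2 & + & x_3 & = & 1 \\
 & - & \frac{3}{2}x_2 & - & \frac{5}{2}x_3 & = & \frac{7}{2} \\
 & & & & \frac{1}{3}x_3 & = & \frac{10}{3}
\end{array}
\]
for which backsubstitution yields
\[
x_3 = \frac{10/3}{1/3} = 10, \qquad x_2 = \frac{\frac{7}{2} + \frac{5}{2}x_3}{-3/2} = -19, \qquad x_1 = \frac{1 - x_2 - x_3}{2} = 5.
\]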
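The factor-then-solve pattern can be sketched as follows. This is an illustrative implementation, not a library routine: the names `lu` and `solve_lu` are introduced here, exact `Fraction` arithmetic is assumed, and no row interchanges are performed, so the code fails with a division by zero whenever a pivot vanishes.

```python
from fractions import Fraction

def lu(A):
    """LU decomposition using only operations r_i - k r_j -> r_i (nonzero pivots assumed)."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            k = U[i][j] / U[j][j]
            U[i] = [a - k * b for a, b in zip(U[i], U[j])]   # r_i - k r_j -> r_i
            L[i][j] = k          # the reversing operation r_i + k r_j puts k at (i, j)
    return L, U

def solve_lu(L, U, b):
    """Solve L y = b from top to bottom, then U x = y by backsubstitution."""
    n = len(b)
    y = []
    for i in range(n):
        y.append(Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

L, U = lu([[2, 1, 1], [3, 0, -1], [2, 2, 3]])
print(solve_lu(L, U, [1, 5, 2]))   # the solution x1 = 5, x2 = -19, x3 = 10 as Fractions
```

Once L and U are stored, each additional right-hand side costs only the two triangular solves in `solve_lu`; the elimination in `lu` is not repeated.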
PA = LU decomposition
The elementary row operations executed at the beginning of Example 2.24 did not include any operations of the type kri → ri . The resulting pivots (and leading entries) were not made to equal 1, as the standard pivoting strategy would. Consequently, the matrix U was upper triangular, but not in row echelon form. We can think of this arrangement as postponing the row-scaling elementary operations kri → ri – they could be performed next, to result in a row echelon form, if we so desired. Considering a sequence of operations of the type ri + krj → ri guaranteed that the product of the corresponding elementary matrices, as well as its inverse, is a unit lower triangular matrix. This plan will work well as long as each pivotal column contains a nonzero pivot at the correct position. However, if a row interchange is required, then our plan needs to be modified, as demonstrated in the following example.
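EXAMPLE 2.25
In order to obtain a pivot at the (1,1) entry of the matrix
\[
A = \begin{bmatrix} 0 & 0 & 2 \\ -4 & 3 & 2 \\ 2 & 1 & 5 \end{bmatrix},
\]
the first row must be interchanged with one of the other two rows. If we interchange it with the second row, then creating a pivot in the second column will require an additional row interchange:
\[
A \xrightarrow{r_1 \leftrightarrow r_2}
\begin{bmatrix} -4 & 3 & 2 \\ 0 & 0 & 2 \\ 2 & 1 & 5 \end{bmatrix}
\xrightarrow{r_3 + \frac{1}{2} r_1 \to r_3}
\begin{bmatrix} -4 & 3 & 2 \\ 0 & 0 & 2 \\ 0 & \frac{5}{2} & 6 \end{bmatrix}
\xrightarrow{r_2 \leftrightarrow r_3}
\underbrace{\begin{bmatrix} -4 & 3 & 2 \\ 0 & \frac{5}{2} & 6 \\ 0 & 0 & 2 \end{bmatrix}}_{U}.
\]
Using elementary matrices, this can be written as
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}}_{E_3}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & 1 \end{bmatrix}}_{E_2}
\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{E_1}
A = U.
\]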
The matrices $E_1$ and $E_3$ are not unit lower triangular; therefore, we will be unable to claim a decomposition A = LU of the type discussed in the last subsection. Here is how we can accomplish something comparable by modifying the original sequence of elementary row operations:
• Perform all the row interchanges (operations of type $r_i \leftrightarrow r_j$) first, before any other row operations are performed. (These are exactly the same row interchanges performed in the original sequence.)
• Follow with the sequence of operations of the type $r_i + kr_j \to r_i$. The numbers of rows i and j may need to be adjusted to reflect the new positions of rows.
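Implementing this for our matrix, we have
\[
A \xrightarrow{r_1 \leftrightarrow r_2}
\begin{bmatrix} -4 & 3 & 2 \\ 0 & 0 & 2 \\ 2 & 1 & 5 \end{bmatrix}
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{bmatrix} -4 & 3 & 2 \\ 2 & 1 & 5 \\ 0 & 0 & 2 \end{bmatrix}
\xrightarrow{r_2 + \frac{1}{2} r_1 \to r_2} U,
\]
i.e.,
\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{E_3^*}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}}_{E_2^*}
\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{E_1^*}
A = U.
\]
Letting
\[
P = E_2^* E_1^* = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\quad\text{and}\quad
L = (E_3^*)^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
we arrive at the decomposition PA = LU.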
DEFINITION A matrix obtained by multiplying elementary n × n matrices corresponding to row interchanges $r_i \leftrightarrow r_j$ is called a permutation matrix.
Such a matrix properly “stores” all of the row interchanges involved. E.g., it can be easily verified that in the last example,
\[
PA = \begin{bmatrix} \mathrm{row}_2 A \\ \mathrm{row}_3 A \\ \mathrm{row}_1 A \end{bmatrix},
\]
correctly reflecting the operations $r_1 \leftrightarrow r_2$; $r_2 \leftrightarrow r_3$.
Similarly to A = LU, the PA = LU decomposition is helpful when solving a linear system $A\vec{x} = \vec{b}$.
EXAMPLE 2.26
Solve the system
\[
\begin{array}{rcl}
2x_3 & = & 4 \\
-4x_1 + 3x_2 + 2x_3 & = & 13 \\
2x_1 + x_2 + 5x_3 & = & 3
\end{array}
\]
The coefficient matrix of this system is the matrix A of Example 2.25. Using the decomposition obtained therein,
\[
\underbrace{\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}}_{P}
\underbrace{\begin{bmatrix} 0 & 0 & 2 \\ -4 & 3 & 2 \\ 2 & 1 & 5 \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} -4 & 3 & 2 \\ 0 & \frac{5}{2} & 6 \\ 0 & 0 & 2 \end{bmatrix}}_{U},
\]
we can solve the system $A\vec{x} = \vec{b}$ (with $\vec{b} = \begin{bmatrix} 4 \\ 13 \\ 3 \end{bmatrix}$) by first premultiplying both sides by P:
\[ PA\vec{x} = P\vec{b} \]
and then using the decomposition
\[ LU\vec{x} = P\vec{b}. \]
Denoting $\vec{y} = U\vec{x}$, it should be clear that the system can be, again, solved in two steps, each involving simple backsubstitution:
• Solve the system $L\vec{y} = P\vec{b}$. In our case, $P\vec{b} = \begin{bmatrix} 13 \\ 3 \\ 4 \end{bmatrix}$. A process similar to that in Example 2.24 yields $\vec{y} = \begin{bmatrix} 13 \\ \frac{19}{2} \\ 4 \end{bmatrix}$ (check!).
• To finish, solve the system $U\vec{x} = \vec{y}$. Once again, we ask the reader to verify that backsubstitution yields $\vec{x} = \begin{bmatrix} -3 \\ -1 \\ 2 \end{bmatrix}$.
Numerical considerations
In this book, we generally assume all computations are performed exactly. However, many real-life problems involve data for which only approximate values are available. Moreover, it is common for problems of practical importance to involve massive data sets, so that the corresponding matrices and vectors tend to be very large. These are among the reasons why it may not be practical to insist on performing calculations exactly; instead, one uses the tools of numerical linear algebra, dealing with approximate data using finite precision computer arithmetic. The detailed study of numerical linear algebra is outside the scope of this text, but we shall occasionally find it appropriate to refer to some numerical aspects of the topics we cover. One of the most frequently used numerical procedures for (approximately) solving linear systems is known as Gaussian elimination with partial pivoting. It is based on a modification of the Gaussian elimination we introduced in Section 2.1. In partial pivoting, each time we select a pivot in a pivotal column, we do so by selecting the value of the largest magnitude among the eligible entries (i.e., at or below the pivot location) – this is done to minimize the potential for error growth in the computation. The solution is typically carried out using the PA = LU decomposition. Refer to Exercise 45 for an example.
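The partial pivoting strategy described above can be sketched as follows. This is an illustrative sketch, not a production routine: it assumes exact `Fraction` arithmetic (so the output can be compared with Examples 2.25 and 2.26 rather than demonstrating rounding behavior), and the names `plu_partial_pivoting` and `solve_plu` are hypothetical. The permutation is stored as a list `perm`, meaning that row i of PA is row `perm[i]` of A.

```python
from fractions import Fraction as F

def plu_partial_pivoting(A):
    """Return (perm, L, U) with P A = L U; row i of P A is row perm[i] of A."""
    n = len(A)
    U = [[F(v) for v in row] for row in A]
    L = [[F(0)] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: largest magnitude entry at or below the pivot position.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]          # carry along multipliers stored so far
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]            # multiplier for r_i - m r_k -> r_i
            L[i][k] = m
            U[i] = [U[i][j] - m * U[k][j] for j in range(n)]
    for i in range(n):
        L[i][i] = F(1)                       # unit lower triangular
    return perm, L, U

def solve_plu(perm, L, U, b):
    """Solve A x = b via L y = P b, then U x = y."""
    n = len(b)
    pb = [F(b[i]) for i in perm]
    y = []
    for i in range(n):
        y.append(pb[i] - sum(L[i][j] * y[j] for j in range(i)))
    x = [F(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

A = [[0, 0, 2], [-4, 3, 2], [2, 1, 5]]       # matrix of Example 2.25
perm, L, U = plu_partial_pivoting(A)
x = solve_plu(perm, L, U, [4, 13, 3])        # system of Example 2.26
print(perm)                                  # [1, 2, 0]
print([str(v) for v in x])                   # ['-3', '-1', '2']
```

For this particular matrix the pivots chosen by magnitude happen to coincide with the interchanges used in Example 2.25, so the computed L and U agree with the ones found by hand; in general, partial pivoting may select different rows than a hand computation would.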
EXERCISES
In Exercises 1–4, balance each chemical reaction using the smallest positive integers.
1. $x_1\,\mathrm{N_2O_5} \to x_2\,\mathrm{NO_2} + x_3\,\mathrm{O_2}$
2. $x_1\,\mathrm{Fe_2O_3} + x_2\,\mathrm{CO} \to x_3\,\mathrm{FeO} + x_4\,\mathrm{CO_2}$
3. $x_1\,\mathrm{C_{10}H_{16}} + x_2\,\mathrm{Cl_2} \to x_3\,\mathrm{C} + x_4\,\mathrm{HCl}$
4. $x_1\,\mathrm{As} + x_2\,\mathrm{NaOH} \to x_3\,\mathrm{Na_3AsO_3} + x_4\,\mathrm{H_2}$
In Exercises 5–8, some reaction equations contain errors, making them impossible to balance. If possible, balance each reaction; if it is not possible, state so.
5. $x_1\,\mathrm{C_6H_{12}O_6} \to x_2\,\mathrm{C_2H_5OH} + x_3\,\mathrm{CO}$
6. $x_1\,\mathrm{C_2H_6} + x_2\,\mathrm{O_2} \to x_3\,\mathrm{CO_2} + x_4\,\mathrm{H_2O}$
7. $x_1\,\mathrm{Rb_3PO_4} + x_2\,\mathrm{CrCl_3} \to x_3\,\mathrm{RbCl} + x_4\,\mathrm{CrPO_4}$
8. $x_1\,\mathrm{AgI} + x_2\,\mathrm{Fe_2(CO_3)_3} \to x_3\,\mathrm{FeI_2} + x_4\,\mathrm{Ag_2CO_3}$
In Exercises 9–12, consider the given network of one-way streets along with the traffic flow figures (in vehicles per hour). Solve for the traffic flow at all one-way street segments that are not given. There are infinitely many solutions with one or two arbitrary values – describe the feasible set of these values corresponding to nonnegative rates of flow. Provide one specific example involving only positive numbers.
9. The one-way street network involving 4 intersections (equations) and five unknowns.
[Figure for Exercise 9: one-way street network with unknown flows x, y, z, v, w; diagram not reproduced.]
10. The one-way street network involving 6 intersections (equations) and six unknowns. (Note that to simplify matters, we are using x twice – why is this legitimate?)
[Figure for Exercise 10: one-way street network with unknown flows u, v, w, x, y, z; diagram not reproduced.]
11. The one-way street network involving 4 intersections and 6 unknowns (you need to label them first, then draw an arrow diagram).
12. The given interstate interchange can be viewed as a one-way street network. Draw an appropriate arrow diagram (make sure you correctly identify all six “intersection” points where lanes either split or merge).
In Exercises 13–17, consider the following five alloys:
          Alloy I   Alloy II   Alloy III   Alloy IV   Alloy V
gold      22/24     14/24      18/24       18/24      18/24
silver    1/24      6/24       0           3/24       2/24
copper    1/24      4/24       6/24        3/24       4/24
13. How can the alloys I, II, and III be mixed to obtain alloy V?
14. How can the alloys II, III, and IV be mixed to obtain alloy V?
15. How can the alloys I, II, III, and V be mixed to obtain alloy IV?
16. How can the alloys I, III, IV, and V be mixed to obtain alloy II?
17. Of the set of the three alloys III, IV, and V, which one can be obtained by mixing the remaining ones in the set?
[Figure for Exercise 11: one-way street network; diagram not reproduced.]
18. Most of the coins presently in circulation in the United States are made of alloys of copper and nickel. The following table specifies approximate nickel and copper content in one dollar’s worth of the following three coins:

                  20 nickels (=$1)   10 dimes (=$1)   1 Susan B. Anthony Dollar
nickel [grams]    25                 2                1
copper [grams]    75                 21               7

Coins that are no longer fit for circulation (worn or mutilated) are melted at the United States Mint and reused for manufacturing new coins.
Which one of the three denominations can be obtained by melting coins of the remaining two? In what proportion should these two be mixed together? What is the net “gain” or “loss” realized in processing $1 worth of old coins (excluding production costs)?
[Figure for Exercise 12: interstate interchange with given and unknown traffic flows; diagram not reproduced.]
19. Find the Lagrange interpolating polynomial of degree 2 or less that passes through (0, 2), (1, 4), and (2, 0).
20. Find the Lagrange interpolating polynomial of degree 2 or less that passes through (0, 1), (2, 1), and (3, 7).
21. Find the Lagrange interpolating polynomial of degree 3 or less that passes through (0, 0), (1, 2), (2, 2), and (3, 0).
22. Find the Lagrange interpolating polynomial of degree 3 or less that passes through (−1, 0), (0, 6), (1, 6), and (2, 12).
23. The Hermite interpolating polynomial p(x) of degree 2n + 1 satisfies the conditions $p(x_i) = y_i$ and $p'(x_i) = d_i$ for i = 0, 1, . . . , n, where the values $x_i$, $y_i$, and $d_i$ are given (and the $x_i$'s are distinct). Use a linear system to determine $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ such that $p(0) = 2$, $p'(0) = 1$, $p(1) = 4$, and $p'(1) = 0$.
24. Repeat Exercise 23 for $p(1) = 1$, $p'(1) = 1$, $p(2) = 1$, and $p'(2) = -1$.
25. If possible, find a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ such that $p(0) = 1$, $p'(0) = 1$, $p'(2) = 0$, and $p(3) = 2$. Note that this is not a Hermite interpolating polynomial – e.g., at x = 2, the first derivative is specified, but the value is not. This is an example of a Hermite-Birkhoff interpolation problem, and it may or may not have a solution (even if it does, the solution need not be unique).
26. (Another Hermite-Birkhoff interpolation problem) If possible, find a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ such that $p(0) = -1$, $p'(0) = 2$, $p'(1) = 2$, and $p(2) = -1$.
27. Given the values $x_0 < x_1 < \cdots < x_n$ and the corresponding values $y_0, y_1, \ldots, y_n$, a natural cubic spline is defined by
\[
s(x) = \begin{cases}
p_0(x) = y_0 + b_0(x - x_0) + c_0(x - x_0)^2 + d_0(x - x_0)^3 & \text{if } x_0 \le x < x_1 \\
p_1(x) = y_1 + b_1(x - x_1) + c_1(x - x_1)^2 + d_1(x - x_1)^3 & \text{if } x_1 \le x < x_2 \\
\quad\vdots \\
p_{n-1}(x) = y_{n-1} + b_{n-1}(x - x_{n-1}) + c_{n-1}(x - x_{n-1})^2 + d_{n-1}(x - x_{n-1})^3 & \text{if } x_{n-1} \le x \le x_n
\end{cases}
\]
where
\[
\begin{array}{ll}
p_i(x_{i+1}) = y_{i+1} & \text{for } i = 0, \ldots, n-1, \\
p_{i-1}'(x_i) = p_i'(x_i) & \text{for } i = 1, \ldots, n-1, \\
p_{i-1}''(x_i) = p_i''(x_i) & \text{for } i = 1, \ldots, n-1, \\
p_0''(x_0) = p_{n-1}''(x_n) = 0. &
\end{array}
\tag{27}
\]
For n = 2, set up and solve a system of 6 linear equations in 6 unknowns $b_0, c_0, d_0, b_1, c_1, d_1$ to determine the natural cubic spline passing through the points (0, 2), (1, 3), and (2, 0). After forming the polynomials $p_0$ and $p_1$, verify that the six conditions in (27) hold true.
28. * Use the formulas of the previous exercise to set up a system of 9 equations in 9 unknowns to determine the natural cubic spline passing through the points (0, 0), (1, 1), (2, 5), and (3, 6). Use the Linear Algebra Toolkit to solve this system. After forming the polynomials $p_0$, $p_1$, and $p_2$, verify that the nine conditions in (27) hold true.
In Exercises 29–32, for the system with the given augmented matrix, find all values of a that correspond to (i) no solution, (ii) one solution, (iii) many solutions.
29. $\left[\begin{array}{cc|c} 1 & 3 & 1 \\ a & 6 & 2 \end{array}\right]$
30. $\left[\begin{array}{cc|c} 2 & 4 & 8 \\ 3 & a-1 & 1 \end{array}\right]$
31. $\left[\begin{array}{cc|c} a & 1 & 0 \\ -1 & a & 0 \end{array}\right]$
32. $\left[\begin{array}{cc|c} a-1 & -2 & 0 \\ -2 & a-1 & 0 \end{array}\right]$
In Exercises 33–36, for the system with the given augmented matrix, find all values of a and b that correspond to (i) no solution, (ii) one solution, (iii) many solutions.
33. $\left[\begin{array}{cc|c} 1 & a+1 & 0 \\ a-1 & 3 & b \end{array}\right]$
34. $\left[\begin{array}{cc|c} a & 0 & b \\ 1 & b & 0 \end{array}\right]$
35. $\left[\begin{array}{cc|c} a & b & 0 \\ b & a & 0 \end{array}\right]$
36. $\left[\begin{array}{cc|c} 1 & 1 & 2 \\ a & b & 0 \\ a^2 & b^2 & 2 \\ a^3 & b^3 & 0 \end{array}\right]$
In Exercises 37–39, prove the following properties of an n × n permutation matrix P.
37. * Each row of P and each column of P contains exactly one nonzero entry, which equals 1.
38. * If the (i, j) entry of P, $p_{ij}$, is 1, then $\mathrm{row}_i(PA) = \mathrm{row}_j A$.
39. * $P^T P = P P^T = I_n$.
40. After the first three elementary row operations in Example 2.13, the matrix $A = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ -6 & 2 & 3 \end{bmatrix}$ was transformed to $U = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{bmatrix}$. Determine the unit lower triangular matrix L such that A = LU. Verify that multiplying LU yields A.
In Exercises 41–43, for the given linear system:
a. Find an LU decomposition of the coefficient matrix of the system A (verify).
b. Use the decomposition found in part a to solve the given system: first solve $L\vec{y} = \vec{b}$, then $U\vec{x} = \vec{y}$. Verify that your solution agrees with the answer printed in Appendix A for each original exercise.
41. The linear system of Exercise 13 on p. 75 in Section 2.1.
42. The linear system of Exercise 19 on p. 75 in Section 2.1.
43. The linear system of Exercise 23 on p. 76 in Section 2.1.
44. Check that when the matrix $A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 2 & 2 \\ -1 & 2 & 1 \end{bmatrix}$ undergoes the elementary row operations $r_2 - r_1 \to r_2$, $r_3 + r_1 \to r_3$, $r_2 \leftrightarrow r_3$, and $r_3 - 3r_2 \to r_3$, it is transformed to $U = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & -5 \end{bmatrix}$. Determine the unit lower triangular matrix L and the permutation matrix P such that PA = LU. Verify this equality.
45. For the linear system of Exercise 42 above, find a PA = LU decomposition that avoids introducing fractions into the matrices L and U (apply $r_1 \leftrightarrow r_2$ before any other elementary row operations). Use this decomposition to solve the system.
Chapter 2 Linear Systems
2.5 Chapter Review
Section 2.5 Chapter Review
Chapter 3 Determinants
Determinants
You are likely to have already worked with determinants of 2 × 2, or even 3 × 3, matrices. In this chapter, we shall discuss determinants of those and larger square matrices, as well as their applications. In the previous chapter, we introduced some methods to solve systems of linear equations. It turns out that one of the important applications of determinants involves solving such linear systems. For instance, consider a system
\[
\begin{array}{rcl}
ax + by & = & c \\
dx + ey & = & f
\end{array}
\]
Using a few algebra steps, you should be able to verify that
\[
x = \frac{ce - bf}{ae - bd} \quad\text{and}\quad y = \frac{af - cd}{ae - bd}
\]
solve our system (as long as $ae \neq bd$). As we will learn in this chapter, these ratios involve determinants of some 2 × 2 matrices; we shall also learn how to extend this kind of reasoning to find unique solutions of systems of n linear equations in n unknowns for any positive integer n by using determinants of n × n matrices.
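The two ratios above can be checked numerically. The following is a small sketch under stated assumptions: the function name `solve_2x2` and the sample coefficients are made up for illustration, and exact rational arithmetic is used so the check is not affected by rounding.

```python
from fractions import Fraction as F

def solve_2x2(a, b, c, d, e, f):
    """Solve ax + by = c, dx + ey = f using the ratio formulas (requires ae != bd)."""
    denom = a * e - b * d
    if denom == 0:
        raise ValueError("ae = bd: the formulas do not apply")
    x = F(c * e - b * f, denom)
    y = F(a * f - c * d, denom)
    return x, y

# Sample system (chosen for illustration): 2x + y = 5, x + 3y = 10
x, y = solve_2x2(2, 1, 5, 1, 3, 10)
print(x, y)  # 1 3
```

Substituting back, 2(1) + 3 = 5 and 1 + 3(3) = 10, so both equations are satisfied.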
3.1 Cofactor Expansions

DEFINITION (Determinant) The determinant of a 1 × 1 matrix $A = [a_{11}]$ is $\det A = a_{11}$. If n > 1, the determinant of an n × n matrix A is defined recursively:
\[
\det A = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det M_{1j}
= a_{11} \det M_{11} - a_{12} \det M_{12} + a_{13} \det M_{13} - \cdots + (-1)^{1+n} a_{1n} \det M_{1n}
\tag{28}
\]
where $M_{ij}$, referred to as the i, j minor of A, denotes the (n − 1) × (n − 1) submatrix obtained by deleting the ith row and the jth column from A.
In addition to det A, some sources use |A| to denote the determinant of A. This notation should not be misread to imply that the determinant of A is always nonnegative – e.g., any 1 × 1 matrix whose only entry is a negative number will have the same number as its determinant.
It is sometimes convenient to introduce the i, j cofactor of A
\[ A_{ij} = (-1)^{i+j} \det M_{ij}. \]
Formula (28) can then be rewritten as
\[ \det A = \sum_{j=1}^{n} a_{1j} A_{1j} = a_{11} A_{11} + a_{12} A_{12} + \cdots + a_{1n} A_{1n}. \]

EXAMPLE 3.1
\[ \det \begin{bmatrix} 2 & 1 \\ -3 & 4 \end{bmatrix} = (2) \det [4] - (1) \det [-3] = (2)(4) - (1)(-3) = 11. \]

Generally, for a 2 × 2 matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, the definition of the determinant yields
\[ \det A = a_{11} \det([a_{22}]) - a_{12} \det([a_{21}]) \]
(here $M_{11} = [a_{22}]$ and $M_{12} = [a_{21}]$), resulting in the standard formula
\[ \det A = a_{11} a_{22} - a_{12} a_{21}. \tag{29} \]
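The following example uses this formula when evaluating determinants of 2 × 2 minors while calculating the determinant of a larger matrix.

EXAMPLE 3.2
\[
\det \begin{bmatrix} 0 & 2 & 1 \\ 1 & 2 & -4 \\ 2 & 1 & 3 \end{bmatrix}
= 0 \det \begin{bmatrix} 2 & -4 \\ 1 & 3 \end{bmatrix}
- 2 \det \begin{bmatrix} 1 & -4 \\ 2 & 3 \end{bmatrix}
+ 1 \det \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}
\]
\[
= (-2)[(1)(3) - (-4)(2)] + (1)[(1)(1) - (2)(2)] = (-2)(11) - 3 = -25.
\]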
We can derive a formula that will provide us with an alternative approach to computing the determinant of a 3 × 3 matrix $A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$:
\[
\det A = a_{11} \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\]
\[
= a_{11}(a_{22} a_{33} - a_{23} a_{32}) - a_{12}(a_{21} a_{33} - a_{23} a_{31}) + a_{13}(a_{21} a_{32} - a_{22} a_{31})
\]
so that
\[
\det A = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}.
\tag{30}
\]
(We leave it as an easy exercise for the reader to verify this formula by applying it to the matrix of Example 3.2.) As a mnemonic device, it can be helpful to think of the formula (29) as multiplying the entries along the main diagonal, then subtracting the product obtained along the other diagonal. There is a similar, albeit more elaborate, scheme for 3 × 3 matrices: after copies of the first two columns are appended to the matrix, six “diagonals” are used, as shown in the margin.
[Margin figure: the 3 × 3 array of entries $a_{ij}$ with a copy of its first two columns appended on the right, showing the six “diagonals”; diagram not reproduced.]
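Attempting to expand a 4 × 4 determinant would be a tedious process, producing 24 terms – each being a product of four entries. It must be noted that no “diagonal” mnemonic devices exist for matrices larger than 3 × 3. Instead, we can use the definition of the determinant, as shown below.

EXAMPLE 3.3
\[
\det \begin{bmatrix} 0 & 2 & 3 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & -2 & 0 & 1 \\ 1 & 1 & 0 & -2 \end{bmatrix}
= 0 - 2 \det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & -2 \end{bmatrix}
+ 3 \det \begin{bmatrix} 1 & -1 & 0 \\ 0 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix}
- 1 \det \begin{bmatrix} 1 & -1 & 0 \\ 0 & -2 & 0 \\ 1 & 1 & 0 \end{bmatrix}
\]
\[
= -2\left(1 \det \begin{bmatrix} 0 & 1 \\ 0 & -2 \end{bmatrix} - 0 + 0\right)
+ 3\left(1 \det \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix} - (-1) \det \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix} + 0\right)
- \left(1 \det \begin{bmatrix} -2 & 0 \\ 1 & 0 \end{bmatrix} - (-1) \det \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + 0\right)
\]
\[
= -2(0 - (1)(0)) + 3\,[((-2)(-2) - (1)(1)) + (0 - (1)(1))] - [((-2)(0) - 0) + (0 - 0)] = 6.
\]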
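The recursive definition (28) translates almost directly into code. The sketch below is illustrative only; the helper names `minor` and `det` are chosen here, and indices are 0-based, so the sign $(-1)^{1+j}$ of the text becomes `(-1) ** j`. The recursion repeats the first-row expansion at every level, so the amount of work grows roughly like n!, which matches the remark about 24 terms for a 4 × 4 matrix.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row, as in (28)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

print(det([[2, 1], [-3, 4]]))                      # 11  (Example 3.1)
print(det([[0, 2, 1], [1, 2, -4], [2, 1, 3]]))     # -25 (Example 3.2)
print(det([[0, 2, 3, 1], [1, -1, 0, 0],
           [0, -2, 0, 1], [1, 1, 0, -2]]))         # 6   (Example 3.3)
```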
Cofactor expansion along any row or column
While the definition of the determinant was based on the cofactor expansion along the first row, the result below will show that any other row or any column can be used instead – proof of this result is rather technical and is placed at the end of this section.
THEOREM 3.1 For n > 1, the determinant of an n × n matrix A can be obtained by cofactor expansion along the ith row:
\[
\det A = \sum_{j=1}^{n} a_{ij} A_{ij} = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij})
\tag{31}
\]
for any i = 1, . . . , n, as well as by cofactor expansion along the jth column:
\[
\det A = \sum_{i=1}^{n} a_{ij} A_{ij} = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij})
\]
for any j = 1, . . . , n.

COROLLARY 3.2 If a square matrix A has a zero row or zero column, then det A = 0.
For determinants of large matrices, it may be helpful to choose a cofactor expansion along a specific row or column that takes advantage of the structure of the matrix.
EXAMPLE 3.4 In Example 3.3 on p. 118, the initial cofactor expansion of the determinant of the matrix
\[
A = \begin{bmatrix} 0 & 2 & 3 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & -2 & 0 & 1 \\ 1 & 1 & 0 & -2 \end{bmatrix}
\]
has been performed along the first row. According to Theorem 3.1, we could obtain the same determinant by expanding along any of the remaining rows or along any column. For instance, an expansion along the second row yields
\[
\det A = -(1) \det \begin{bmatrix} 2 & 3 & 1 \\ -2 & 0 & 1 \\ 1 & 0 & -2 \end{bmatrix}
+ (-1) \det \begin{bmatrix} 0 & 3 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & -2 \end{bmatrix} - 0 + 0
\]
\[
= -\left(-3 \det \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix} + 0 - 0\right)
- \left(-3 \det \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix} + 0 - 0\right)
\]
\[
= (3)(4 - 1) + (3)(0 - 1) = 6.
\]
[Margin note: Notice the pattern of signs (− + − +) in this expansion. Keep in mind that the i, j cofactor includes a factor of $(-1)^{i+j}$. When expanding along the ith row or jth column, you should determine the first sign in that pattern from $(-1)^{i+1}$ or $(-1)^{1+j}$; after that the subsequent signs in the pattern will alternate.]
Note that this new expansion involved only two 3 × 3 determinants (each of which was evaluated by expanding along the second column), whereas the expansion in Example 3.3 required three. While this helped reduce the complexity of our calculation, it turns out we can simplify things even more by expanding along the third column instead:
\[
\det A = (3) \det \begin{bmatrix} 1 & -1 & 0 \\ 0 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix} - 0 + 0 - 0
= (3)(4 - 1 + 0 - 0 - 1 - 0) = 6.
\]
Because of the three zeros in that column, our expansion involved just one 3 × 3 determinant (which we evaluated by applying formula (30)).
Properties of determinants
Determinants of some types of matrices can be calculated easily.

THEOREM 3.3 If A is an upper triangular n × n matrix, then its determinant is equal to the product of the entries on the main diagonal: $\det A = a_{11} a_{22} \cdots a_{nn}$.

[Margin note: To establish that the statement S(i) holds true for i = 1, 2, . . ., the proof by induction proceeds in two parts: the induction basis proves S(1), and the induction step proves S(n − 1) ⇒ S(n). As a result, we have a chain of implications S(1) ⇒ S(2) ⇒ S(3) ⇒ · · · justifying that S(i) is valid for all i = 1, 2, . . ..]

PROOF We prove our result by induction on the size of the matrix, n.
• Induction basis: We show the statement of the theorem holds true for n = 1. From the definition of the determinant, $\det[a_{11}] = a_{11}$.
• Induction step: Assuming the statement of the theorem holds true for all (n − 1) × (n − 1) matrices, i.e., the determinant of any (n − 1) × (n − 1) upper triangular matrix equals the product of the entries on the main diagonal (this is the induction hypothesis), we show that the statement holds for all n × n matrices.
By cofactor expansion along the first column,
\[
\det A = a_{11} \det M_{11} - a_{21} \det M_{21} + a_{31} \det M_{31} - \cdots + (-1)^{n+1} a_{n1} \det M_{n1}.
\]
However, since A is upper triangular, $a_{21} = a_{31} = \cdots = a_{n1} = 0$. Furthermore, $M_{11}$ is an (n − 1) × (n − 1) upper triangular matrix, so that from the induction hypothesis $\det M_{11} = a_{22} a_{33} \cdots a_{nn}$. Consequently, $\det A = a_{11} \det M_{11} = a_{11} a_{22} a_{33} \cdots a_{nn}$.
Theorem 3.3 implies that the determinant of any diagonal matrix is equal to the product of the main diagonal entries. Here is an important special case of this situation.

COROLLARY 3.4 For any positive integer n, $\det I_n = 1$.

THEOREM 3.5 If A is a square matrix with $\mathrm{row}_i A = \mathrm{row}_j A$ ($i \neq j$), then det A = 0.

PROOF The proof proceeds by induction. For n = 2, the statement is obviously true:
\[ \det \begin{bmatrix} a & b \\ a & b \end{bmatrix} = ab - ab = 0. \]
Assuming the statement is true for all (n − 1) × (n − 1) matrices (induction hypothesis), we want to show it for all n × n matrices. Let A be such a matrix whose ith and jth rows are identical. Calculating det A by cofactor expansion along any other row k involves minors that are (n − 1) × (n − 1) matrices and have two identical rows. Therefore, determinants of all these minors are zero, making det A = 0 as well.
Our next theorem establishes the linearity of the determinant with respect to a single row of the matrix.
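THEOREM 3.6 Let i be an integer 1 ≤ i ≤ n, and let A, B, C be n × n matrices such that
\[ \alpha\,\mathrm{row}_i A + \beta\,\mathrm{row}_i B = \mathrm{row}_i C \]
and
\[ \mathrm{row}_j A = \mathrm{row}_j B = \mathrm{row}_j C \quad \text{for all } j \neq i. \]
Then
\[ \alpha \det A + \beta \det B = \det C. \]

PROOF Using the cofactor expansion along the ith row (formula (31)) we obtain
\[
\alpha \det A + \beta \det B
= \alpha \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det M_{ij} + \beta \sum_{j=1}^{n} (-1)^{i+j} b_{ij} \det M_{ij}
= \sum_{j=1}^{n} (-1)^{i+j} \underbrace{(\alpha a_{ij} + \beta b_{ij})}_{c_{ij}} \det M_{ij}
= \det C.
\]
(Note that the minors $M_{ij}$ are the same in all three matrices A, B, and C.)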
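A word of caution is appropriate here: the linearity of the determinant with respect to a row of the matrix does not mean the determinant is linear with respect to the matrix. Here are a few counterexamples:
\[
\underbrace{\det \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}}_{0}
+ \underbrace{\det \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}}_{0}
\neq
\underbrace{\det\left( \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right)}_{1},
\]
\[
\underbrace{2 \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{2}
\neq
\underbrace{\det\left( 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)}_{4};
\]
therefore, generally
\[ \det A + \det B \neq \det(A + B) \quad\text{and}\quad k \det A \neq \det(kA). \]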
The property that follows addresses the relationship between the determinant of a matrix and that of its transpose.
THEOREM 3.7 For every n × n matrix A, $\det A = \det A^T$.

PROOF The proof proceeds by induction. The statement of the theorem obviously holds true for all 1 × 1 matrices since they are always symmetric. Our induction hypothesis is that $\det A = \det A^T$ for all (n − 1) × (n − 1) matrices. Assuming this, let us show that $\det A = \det A^T$ for all n × n matrices as well. Let $B = A^T$ and let $N_{ij}$ denote the i, j minor of B. Expanding det B along the first row (by definition),
\[ \det A^T = \sum_{j=1}^{n} b_{1j} (-1)^{1+j} \det(N_{1j}). \]
The minor $N_{1j}$ is the transpose of the j, 1 minor of A, $M_{j1}$. From the induction hypothesis, both minors have equal determinants, so that
\[ \det A^T = \sum_{j=1}^{n} a_{j1} (-1)^{1+j} \det(M_{j1}), \]
which is a cofactor expansion along the first column of det A.
As a consequence of this, the results in this section that were stated for rows can now be restated for columns as well; e.g.,
• the determinant of a lower triangular matrix is the product of its main diagonal entries;
• the determinant of a square matrix having two equal columns is 0;
• the determinant of A is linear with respect to a column of A.
Row operations and determinants
The following theorem specifies how elementary row operations affect the determinant.

THEOREM 3.8 If C results from an n × n matrix A by performing an operation
1. $kr_p \to r_p$, then det C = k det A,
2. $r_p + kr_q \to r_p$, then det C = det A (if n > 1 and $p \neq q$), and
3. $r_p \leftrightarrow r_q$, then det C = − det A (if n > 1 and $p \neq q$).

PROOF (1) Follows from Theorem 3.6 by letting α = k and β = 0.
(2) Let B be the matrix obtained by copying all rows of A, then overwriting the pth row with the qth row, so that B agrees with A in every row except the pth and has two identical rows:
\[
B = \begin{bmatrix}
\mathrm{row}_1 A \\ \vdots \\ \mathrm{row}_q A \\ \vdots \\ \mathrm{row}_{p-1} A \\ \mathrm{row}_q A \\ \mathrm{row}_{p+1} A \\ \vdots \\ \mathrm{row}_n A
\end{bmatrix}
\begin{array}{l}
\\ \\ \\ \\ \\ \leftarrow p\text{th row} \\ \\ \\ \\
\end{array}
\]
By Theorem 3.5, det B = 0. On the other hand, by Theorem 3.6 (applied to the pth row), letting α = 1 and β = k we have det A + k det B = det C. Consequently, det A = det C.
(3) This property follows from the previous two properties. According to them, the sequence of elementary row operations
\[
\begin{bmatrix} \vdots \\ \vec{u}^T \\ \vdots \\ \vec{v}^T \\ \vdots \end{bmatrix}
\xrightarrow{r_q + r_p \to r_q}
\begin{bmatrix} \vdots \\ \vec{u}^T \\ \vdots \\ \vec{u}^T + \vec{v}^T \\ \vdots \end{bmatrix}
\xrightarrow{r_p - r_q \to r_p}
\begin{bmatrix} \vdots \\ -\vec{v}^T \\ \vdots \\ \vec{u}^T + \vec{v}^T \\ \vdots \end{bmatrix}
\xrightarrow{r_q + r_p \to r_q}
\begin{bmatrix} \vdots \\ -\vec{v}^T \\ \vdots \\ \vec{u}^T \\ \vdots \end{bmatrix}
\xrightarrow{-1 r_p \to r_p}
\begin{bmatrix} \vdots \\ \vec{v}^T \\ \vdots \\ \vec{u}^T \\ \vdots \end{bmatrix}
\]
(in which the pth row is the upper displayed row and the qth row is the lower one) results in multiplying the determinant by the factor (−1) – on the other hand, this sequence of row operations is equivalent to the operation $r_p \leftrightarrow r_q$.
[Margin note, latoolkit.com: The module “calculating the determinant using row operations” can be used to create your own annotated examples. Try making up your own matrices (while you can go up to 12 by 12, you should probably begin your experiments with some 3 × 3 and 4 × 4 matrices).]
Based on Theorem 3.8, it is possible to calculate det A by row reducing A to an upper triangular matrix B (e.g., in r.e.f.) whose determinant can then be found easily by Theorem 3.3, as long as we make sure to keep track of all changes to the determinant value resulting from the elementary row operations.

EXAMPLE 3.5 Let us find the determinant of $A = \begin{bmatrix} 0 & 2 & 1 \\ 1 & 2 & -4 \\ 2 & 1 & 3 \end{bmatrix}$ by performing a sequence of elementary row operations.
To place a nonzero pivot in the (1,1) entry, we use the elementary row operation $r_1 \leftrightarrow r_2$. As a result, we obtain (by part 3 of Theorem 3.8)
\[ A_1 = \begin{bmatrix} 1 & 2 & -4 \\ 0 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix} \quad\text{with } \det A_1 = -\det A. \]
We eliminate the pivotal column entry under the pivot: $r_3 - 2r_1 \to r_3$. By part 2 of Theorem 3.8, such an operation does not influence the determinant.
\[ A_2 = \begin{bmatrix} 1 & 2 & -4 \\ 0 & 2 & 1 \\ 0 & -3 & 11 \end{bmatrix} \quad\text{with } \det A_2 = \det A_1. \]
To create a pivot equal to 1 in the (2, 2) entry, we scale the second row: $\frac{1}{2} r_2 \to r_2$. The determinant scales by the same factor (part 1 of Theorem 3.8):
\[ A_3 = \begin{bmatrix} 1 & 2 & -4 \\ 0 & 1 & \frac{1}{2} \\ 0 & -3 & 11 \end{bmatrix} \quad\text{with } \det A_3 = \tfrac{1}{2} \det A_2. \]
Finally, eliminate the entry under the pivot: $r_3 + 3r_2 \to r_3$. Once again, the determinant does not change.
\[ A_4 = \begin{bmatrix} 1 & 2 & -4 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & \frac{25}{2} \end{bmatrix} \quad\text{with } \det A_4 = \det A_3. \]
By Theorem 3.3, $\det A_4 = (1)(1)(\frac{25}{2})$. On the other hand,
\[ \frac{25}{2} = \det A_4 = \det A_3 = \frac{1}{2} \det A_2 = \frac{1}{2} \det A_1 = \frac{1}{2} (-\det A). \]
Solving for det A, we obtain det A = −25.
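The bookkeeping of Example 3.5 can be automated. The sketch below is one possible arrangement, not the book's algorithm verbatim: the name `det_by_row_reduction` is hypothetical, exact `Fraction` arithmetic is assumed, and, unlike the example, it scales every pivot row (including the last one) so that the reduced matrix has ones on the diagonal. The variable `factor` accumulates the effect of each row interchange and scaling as described in Theorem 3.8.

```python
from fractions import Fraction as F

def det_by_row_reduction(A):
    """Determinant via row reduction, tracking changes per Theorem 3.8."""
    n = len(A)
    M = [[F(v) for v in row] for row in A]
    factor = F(1)  # invariant: det A = factor * det M
    for k in range(n):
        # Find a row at or below position k with a nonzero entry in column k.
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            return F(0)            # no pivot in this column: A is singular
        if p != k:
            M[k], M[p] = M[p], M[k]
            factor = -factor       # part 3: an interchange flips the sign
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        factor *= pivot            # part 1: scaling a row by 1/pivot
        for i in range(k + 1, n):
            m = M[i][k]
            M[i] = [M[i][j] - m * M[k][j] for j in range(n)]  # part 2: no change
    # M is now upper triangular with ones on the diagonal, so det M = 1.
    return factor

print(det_by_row_reduction([[0, 2, 1], [1, 2, -4], [2, 1, 3]]))  # -25
```

For large matrices this approach needs on the order of n³ arithmetic operations, compared with roughly n! for a full cofactor expansion.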
We shall now discuss determinants of elementary matrices.

THEOREM 3.9 Let E be an n × n elementary matrix.
1. If E corresponds to the operation $k\,\mathrm{row}_i \to \mathrm{row}_i$, then det E = k.
2. If E corresponds to the operation $\mathrm{row}_i + k\,\mathrm{row}_j \to \mathrm{row}_i$, then det E = 1.
3. If E corresponds to the operation $\mathrm{row}_j \leftrightarrow \mathrm{row}_i$, then det E = −1.
4. If A is an n × n matrix, then det(EA) = det E det A.

PROOF For part 1, apply the elementary row operation $k\,\mathrm{row}_i \to \mathrm{row}_i$ to $I_n$: $B = EI_n$. From part 1 of Theorem 3.8, $\det B = k \det I_n$. Since by Corollary 3.4 $\det I_n = 1$ and B = E, we have det E = k.
Parts 2 and 3 are proved in the same way. To prove part 4, we investigate three possible cases:
• Case I: E corresponds to the operation $k\,\mathrm{row}_i \to \mathrm{row}_i$. By part 1 of Theorem 3.8, det(EA) = k det A = det E det A (by part 1 of this theorem).
• Case II: E corresponds to the operation $\mathrm{row}_i + k\,\mathrm{row}_j \to \mathrm{row}_i$. By part 2 of Theorem 3.8, det(EA) = det A, which by part 2 of this theorem also equals det E det A.
• Case III: E corresponds to the operation $\mathrm{row}_j \leftrightarrow \mathrm{row}_i$. Follows similarly from part 3 of Theorem 3.8 and part 3 of this theorem.
Theorems 3.8 and 3.9 allow us to justify the following important results.
THEOREM 3.10 If A is a square matrix, then det A = 0 if and only if A is singular.

PROOF By Theorem 2.13, A is nonsingular if and only if it is row equivalent to I. Therefore, the determinant of a nonsingular matrix A, after it is multiplied by a sequence of nonzero numbers (corresponding to the elementary row operations), becomes equal to det I = 1. Consequently, $\det A \neq 0$. On the other hand, the determinant of a singular matrix A, after multiplying by the nonzero numbers, becomes equal to the determinant of the r.r.e.f. matrix with at least one zero row. By Corollary 3.2, the determinant of such a matrix is 0; therefore, det A = 0 as well.
THEOREM 3.11 For all n × n matrices A and B,
\[ \det(AB) = \det A \det B. \]

PROOF If A is row equivalent to I, then $E_k \cdots E_1 A = I$, which, by repeatedly applying Theorem 3.9, implies $\det E_k \cdots \det E_1 \det A = \det I = 1$; therefore, $\det A = \dfrac{1}{\det E_k \cdots \det E_1}$. On the other hand, $E_k \cdots E_1 AB = B$ and $\det E_k \cdots \det E_1 \det(AB) = \det(B)$, so that $\det(AB) = \dfrac{\det B}{\det E_k \cdots \det E_1} = \det A \det B$.
If A is not row equivalent to I, then det A = 0 and there exists a matrix C with at least one zero row such that $E_k \cdots E_1 A = C$. Consequently, $E_k \cdots E_1 AB = CB$. Since CB has a zero row, we must have det(CB) = 0 by Corollary 3.2. As none of the $\det E_j$ can be zero, we must have det(AB) = 0 = det A det B.
For any nonsingular matrix A, we have $AA^{-1} = I$ so that from Theorem 3.11 and Corollary 3.4 we have $\det A \det A^{-1} = 1$. This can be rewritten as follows.

COROLLARY 3.12 If A is a nonsingular matrix, then $\det A^{-1} = 1/\det A$.
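Theorem 3.11, together with the earlier caution that the determinant is not linear in the matrix, can be spot-checked on a pair of 2 × 2 matrices using formula (29). In this sketch A is the matrix of Example 3.1, while B, the scalar 2, and the helper names `det2` and `matmul2` are arbitrary choices made for illustration. A single numerical check is of course not a proof.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix by formula (29)."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [-3, 4]]    # matrix of Example 3.1, det A = 11
B = [[1, 6], [-1, 1]]    # illustrative second matrix, det B = 7

AB = matmul2(A, B)
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
two_A = [[2 * v for v in row] for row in A]

print(det2(AB), det2(A) * det2(B))        # 77 77  (multiplicative)
print(det2(A_plus_B), det2(A) + det2(B))  # 43 18  (not additive)
print(det2(two_A), 2 * det2(A))           # 44 22  (det(2A) = 2^2 det A here)
```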
Five equivalent statements

Theorem 3.10 allows us to add yet another equivalent statement to the list from p. 92.

5 Equivalent Statements
For an n × n matrix A, the following statements are equivalent.
1. A is nonsingular.
2. A is row equivalent to $I_n$.
3. For every n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
4. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.
5. $\det A \neq 0$.

5 Equivalent “Negative” Statements
For an n × n matrix A, the following statements are equivalent.
-1. A is singular.
-2. A is not row equivalent to $I_n$.
-3. For some n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has either no solution or many solutions.
-4. The homogeneous system $A\vec{x} = \vec{0}$ has nontrivial solutions.
-5. $\det A = 0$.
Proof of Theorem 3.1 on p. 118
For n = 2, we have M11
det A
M12
= a11 a22 − a12 ( a21 )
Expansion along the first row (Definition)
M22
M21
= −a21 ( a12 ) + a22 a11 Expansion along the second row M11
M21
= a11 a22 − a21 ( a12 ) M12
Expansion along the first column
M22
= −a12 ( a21 ) + a22 ( a11 ) Expansion along the second column
We assume the statement of the theorem holds true for any (n − 1) × (n − 1) square matrix – this is our induction hypothesis. Our task now is to establish the validity of the statement for an n × n matrix A. By definition, det A =
n j=1
a1j (−1)1+j det M1j .
(32)
126
Chapter 3 Determinants Since M1j is an (n − 1) × (n − 1) matrix, its determinant can be expanded along any of its rows j−1 n det M1j = apq (−1)p−1+q det M1,p;q,j + apq (−1)p−1+q−1 det M1,p;j,q (33) q=1
q=j+1
or columns det M1j det M1j
=
=
n p=2 n
apq (−1)p−1+q det M1,p;q,j if q < j,
(34)
apq (−1)p−1+q−1 det M1,p;j,q if q > j,
p=2
where M1,p;j,q is the (n − 2) × (n − 2) submatrix of A obtained by deleting its 1st and pth rows, as well as jth and qth columns: ⎡
M1,p;j,q
a21 .. .
⎢ ⎢ ⎢ ⎢ ⎢ ap−1,1 =⎢ ⎢ a ⎢ p+1,1 ⎢ .. ⎢ . ⎣ an1
··· ··· ··· ···
a2,j−1 .. . ap−1,j−1 ap+1,j−1 .. . an,j−1
a2,j+1 .. . ap−1,j+1 ap+1,j+1 .. . an,j+1
··· ··· ··· ···
a2,q−1 .. . ap−1,q−1 ap+1,q−1 .. . an,q−1
a2,q+1 .. . ap−1,q+1 ap+1,q+1 .. . an,q+1
··· ··· ··· ···
a2n .. . ap−1n ap+1n .. . ann
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎥ ⎦
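Combining (32) and (33) results in
$$\begin{aligned}
\det A &= \sum_{j=1}^{n} a_{1j}(-1)^{1+j}\left(\sum_{q=1}^{j-1} a_{pq}(-1)^{p-1+q}\det M_{1,p;q,j} + \sum_{q=j+1}^{n} a_{pq}(-1)^{p-1+q-1}\det M_{1,p;j,q}\right)\\
&= \sum_{q=1}^{n} a_{pq}(-1)^{p+q}\left(\sum_{j=1}^{q-1} a_{1j}(-1)^{1+j}\det M_{1,p;j,q} + \sum_{j=q+1}^{n} a_{1j}(-1)^{j}\det M_{1,p;q,j}\right)\\
&= \sum_{q=1}^{n} a_{pq}(-1)^{p+q}\det M_{pq}.
\end{aligned}$$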
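Likewise, (32) and (34) lead to
$$\begin{aligned}
\det A &= \sum_{j=1}^{q-1} a_{1j}(-1)^{1+j}\sum_{p=2}^{n} a_{pq}(-1)^{p-1+q-1}\det M_{1,p;j,q} + a_{1q}(-1)^{1+q}\det M_{1,q}\\
&\quad + \sum_{j=q+1}^{n} a_{1j}(-1)^{1+j}\sum_{p=2}^{n} a_{pq}(-1)^{p-1+q}\det M_{1,p;q,j}\\
&= a_{1q}(-1)^{1+q}\det M_{1,q} + \sum_{p=2}^{n} a_{pq}(-1)^{p+q}\left(\sum_{j=1}^{q-1} a_{1j}(-1)^{1+j}\det M_{1,p;j,q} + \sum_{j=q+1}^{n} a_{1j}(-1)^{j}\det M_{1,p;q,j}\right)\\
&= \sum_{p=1}^{n} a_{pq}(-1)^{p+q}\det M_{pq}.
\end{aligned}$$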
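The statement just proved (every row and every column gives the same cofactor expansion) can be checked numerically. The following is a minimal sketch, not the book's own code; the helper names `minor`, `det_first_row`, `expand_along_row`, and `expand_along_column` and the sample matrix are assumptions made for illustration, and indices are 0-based (which leaves the sign $(-1)^{p+q}$ unchanged).

```python
def minor(a, i, j):
    """Submatrix of a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_first_row(a):
    """Determinant by cofactor expansion along the first row (the definition)."""
    if not a:  # determinant of the empty matrix, used as the recursion base
        return 1
    return sum((-1) ** j * a[0][j] * det_first_row(minor(a, 0, j))
               for j in range(len(a)))


def expand_along_row(a, p):
    """Cofactor expansion along row p."""
    return sum((-1) ** (p + q) * a[p][q] * det_first_row(minor(a, p, q))
               for q in range(len(a)))


def expand_along_column(a, q):
    """Cofactor expansion along column q."""
    return sum((-1) ** (p + q) * a[p][q] * det_first_row(minor(a, p, q))
               for p in range(len(a)))


# Hypothetical 4 x 4 sample matrix, chosen only for illustration.
A = [[2, -1, 0, 3],
     [1, 4, -2, 0],
     [0, 5, 1, -1],
     [3, 0, 2, 2]]

# Collect the values of all 4 row expansions and all 4 column expansions.
values = {expand_along_row(A, p) for p in range(4)}
values |= {expand_along_column(A, q) for q in range(4)}
print(values)  # a set with a single element: all eight expansions agree
```

The recursion mirrors the induction in the proof: each expansion of an $n \times n$ determinant is expressed through $(n-1) \times (n-1)$ determinants.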
Section 3.1 Cofactor Expansions
EXERCISES
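In Exercises 1–2,
i. calculate the determinant by cofactor expansion along a row or column,
ii. calculate the determinant by performing elementary row operations,
iii. calculate the determinant using formulas (29) and (30).

1. a. $\det\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$; b. $\det\begin{bmatrix} 1 & 3 & 0 \\ 0 & 3 & 5 \\ 2 & 0 & 1 \end{bmatrix}$; c. $\det\begin{bmatrix} 9 & 7 & 2 \\ 8 & 6 & 7 \\ 0 & 0 & 0 \end{bmatrix}$.

2. a. $\det\begin{bmatrix} 1 & 6 \\ -1 & 1 \end{bmatrix}$; b. $\det\begin{bmatrix} 2 & 4 & 0 \\ 1 & 1 & 2 \\ 1 & 0 & -1 \end{bmatrix}$; c. $\det\begin{bmatrix} 0 & 1 & 0 \\ -1 & 1 & 1 \\ 2 & 1 & 2 \end{bmatrix}$.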
In Exercises 3–4, i. calculate the determinant by cofactor expansion along a row or column, ii. calculate the determinant by performing elementary row operations. Make sure your answers match. ⎡ ⎤ ⎡ 1 1 0 3 ⎢ ⎢ ⎥ ⎢ ⎢ ⎢ 0 0 1 0 ⎥ ⎥ ⎢ ; b. det ⎢ 3. a. det ⎢ ⎢ ⎥ ⎢ ⎣ 0 2 2 4 ⎦ ⎣ −2 0 0 3 ⎡
1 0 ⎢ ⎢ 0 0 4. a. det ⎢ ⎢ 0 2 ⎣ 1 0
0 1 2 0
1 0 4 4
⎡
⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥; b. det ⎢ ⎥ ⎢ ⎢ ⎦ ⎣
0 2 0 0 4
−1 0 2 0 4
2 0 0 0 0 1 2 0 0 −1 0 0 1 0 1 3 0 2 0 0 0 1 1 0 0
0 1 0 3 0 2 3 −2 2 0 0 1 1 0 0
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦ ⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
In Exercises 5–8, calculate each determinant using any method. ⎡ ⎤ ⎡ ⎤ 2 0 1 2 1 3 0 −2 ⎢ ⎥ ⎢ ⎥ 5. a. det ; b. det ⎣ −3 1 2 ⎦; c. det ⎣ 2 −1 −2 ⎦ . 3 −7 0 2 1 1 1 2 6. a. det
⎡
2 6 −1 1
⎡
;
2 1 0 ⎢ ⎢ 4 0 2 7. a. det ⎢ ⎢ 1 −1 1 ⎣ 0 0 −1
⎤ 0 1 −2 ⎢ ⎥ b. det ⎣ 3 −2 0 ⎦; 4 1 3 3 0 2 2
⎤
⎡
⎢ ⎥ ⎢ ⎥ ⎢ ⎥; b. det ⎢ ⎢ ⎥ ⎢ ⎦ ⎣
−1 0 1 0
⎡
⎤ 4 1 −1 ⎢ ⎥ c. det ⎣ 1 2 1 ⎦. −1 1 3
0 1 0 1 −1 0 0 1 2 1 1 0
0 0
1 0
1 1 −1 1 0
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
⎡ ⎢ ⎢ 8. a. det ⎢ ⎢ ⎣
−1 2 2 1
2 1
3 1 −1 2 4 1
⎡
⎤
0 ⎢ ⎢ ⎥ ⎢ −1 ⎥ ⎥; b. det ⎢ ⎢ ⎥ 2 ⎦ ⎢ ⎣ 0
3 0 1 1 0
2 0 1 1 0 1 2 0 1 −2 1 0 0 0 1 2 2 −1 0 −1
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
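9. Let $A = \begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ \vec{v}_3^{\,T} \end{bmatrix}$ be a $3 \times 3$ matrix such that $\det A = 6$. Calculate the determinant of
a. $\begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} + 2\vec{v}_1^{\,T} \\ \vec{v}_3^{\,T} \end{bmatrix}$; b. $\begin{bmatrix} \vec{v}_3^{\,T} \\ \vec{v}_2^{\,T} \\ \vec{v}_1^{\,T} \end{bmatrix}$; c. $\begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{0}^{\,T} \\ \vec{v}_3^{\,T} \end{bmatrix}$; d. $A^{-1}$; e. $2A$.

10. Let $A = \begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ \vec{v}_3^{\,T} \\ \vec{v}_4^{\,T} \end{bmatrix}$ be a $4 \times 4$ matrix such that $\det A = -4$. Calculate the determinant of
a. $\begin{bmatrix} \vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ \vec{v}_3^{\,T} - \vec{v}_2^{\,T} \\ \vec{v}_4^{\,T} \end{bmatrix}$; b. $\begin{bmatrix} 3\vec{v}_1^{\,T} \\ \vec{v}_2^{\,T} \\ 2\vec{v}_3^{\,T} \\ \vec{v}_4^{\,T} \end{bmatrix}$; c. $\begin{bmatrix} \vec{v}_1 & \vec{v}_2 & \vec{v}_3 & \vec{v}_4 \end{bmatrix}$; d. $A^3$; e. $-A$.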
11. Let $A = \begin{bmatrix} 2 & 0 & 0 \\ 6 & -1 & 0 \\ 0 & 4 & 3 \end{bmatrix}$ and let $B$ be a matrix such that $B^{-1} = \begin{bmatrix} -1 & 8 & 8 \\ 0 & 2 & 3 \\ 0 & 0 & -2 \end{bmatrix}$. Calculate the determinant of
a. $A^T$; b. $A$; c. $AB$.
12. Let $A$ and $B$ be $4 \times 4$ matrices such that $\det A = 2$ and $\det B = -7$. Calculate the determinant of
a. $ABA$; b. $A^T A^2$; c. $2B$; d. $(3A)^{-1}$; e. $(AB^T)^2(B^2)$.

13. Let $A$ and $B$ be $3 \times 3$ matrices such that $\det A = -3$ and $\det B = 5$. Calculate the determinant of
a. $B^T A^2$; b. $A^{-1}B$; c. $3A$; d. $(2B)^{-1}$; e. $A^T B A^{-1}$; f. $(A^{-1}B)^{-1}(BA)^T$.

T/F?
In Exercises 14–21, decide whether each statement is true or false. Justify your answer.
14. If $A$ is a square matrix, then $\det A^3 = (\det A)^3$.
15. If all entries of a square matrix $A$ are positive, then $\det A > 0$.
16. If $A$ and $B$ are $n \times n$ invertible matrices, then $\det(AB^{-1}) = \dfrac{\det A}{\det B}$.
17. If one of the columns of an $n \times n$ matrix $A$ is a linear combination of the remaining columns, then $\det A = 0$.
18. If $A$ and $B$ are $n \times n$ matrices such that $\det(AB) = 0$, then $A$ is row equivalent to $B$.
19. If $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix, then $\det(AB) = \det(BA)$.
20. If $A$ is a lower triangular matrix, then $\det A$ is the product of the main diagonal entries of $A$.
21. If $A$ is a $3 \times 3$ matrix with $\det A = 2$, then $A$ is row equivalent to $I_3$.
22. * Consider the $n \times n$ exchange matrix
$$J_n = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}.$$
a. Calculate $\det J_i$ for $i = 1, 2, 3, 4, 5$.
b. Find a general formula for $\det J_n$ for all positive integers $n$.
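23. * Prove that for any 3-vectors $\vec{u}$, $\vec{v}$, $\vec{w}$, and $\vec{r}$,
$$(\vec{u} \times \vec{v}) \cdot (\vec{w} \times \vec{r}) = \det\begin{bmatrix} \vec{u}\cdot\vec{w} & \vec{v}\cdot\vec{w} \\ \vec{u}\cdot\vec{r} & \vec{v}\cdot\vec{r} \end{bmatrix}.$$
(Cross product $\vec{u} \times \vec{v}$ is defined in formula (3) on p. 15.)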
24. * Let $A$, $B$, $C$, and $D$ be $n \times n$ matrices. Show that it is generally not true that $\det\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ equals $\det A \det D - \det B \det C$. (Hint: Find four $2 \times 2$ matrices that can serve as a counterexample.)
25. * Let $A$ and $B$ be square matrices (not necessarily of the same size). Show that
$$\det\begin{bmatrix} A & C \\ 0 & B \end{bmatrix} = \det A \det B.$$
(Hint: Consider elementary row operations necessary to reduce $A$ to an upper triangular matrix $U$ without affecting $B$.)

26. * Show that a block diagonal matrix
$$A = \begin{bmatrix} C_1 & 0 & \cdots & 0 \\ 0 & C_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_k \end{bmatrix}$$
where $C_1, \ldots, C_k$ are square matrices (not necessarily of the same size) satisfies $\det A = \det C_1 \cdots \det C_k$.

27. * A permutation of the set of integers $S = \{1, 2, \ldots, n\}$ is any rearrangement of the set $S$, i.e., a sequence $i_1, \ldots, i_n$ such that all $i_j$ are in $S$ and they are all distinct. E.g.,
• $\{1, 2\}$ has two permutations: $(1, 2)$ and $(2, 1)$, while
• $\{1, 2, 3\}$ has six: $(1, 2, 3)$, $(2, 3, 1)$, $(3, 1, 2)$, $(3, 2, 1)$, $(2, 1, 3)$, and $(1, 3, 2)$.
a. Prove that the set $\{1, 2, \ldots, n\}$ has $n! = 1 \cdot 2 \cdots n$ permutations. (Hint: Use mathematical induction.)
b. The permutation $(2, 1)$ has one inversion required to arrange it in the increasing order, whereas the permutation $(1, 2)$ requires zero inversions. Likewise, the permutation $(3, 1, 2)$ requires two inversions to make it increasing: first interchange 1 and 3 to yield $(1, 3, 2)$ and then interchange 3 and 2. Determine the required number of inversions for each of the five remaining permutations of $\{1, 2, 3\}$ listed above.
c. An alternative formula to calculate determinants is
$$\det A = \sum \sigma_{i_1 \ldots i_n}\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n} \tag{35}$$
where the sum is taken over all $n!$ permutations of $\{1, 2, \ldots, n\}$ and $\sigma_{i_1 \ldots i_n} = 1$ whenever the permutation has an even number of inversions, and $\sigma_{i_1 \ldots i_n} = -1$ whenever that number is odd. Expand the formula (35) for $n = 2$ and $n = 3$ and compare the results with the standard formulas (29) and (30).
d. By induction, prove that the formula (35) holds true for any $n$.

28. A permutation matrix was defined on p. 108 as a matrix obtained by multiplying elementary matrices corresponding to the operations $r_i \leftrightarrow r_j$. In this exercise, you will investigate connections between a permutation matrix and the notions of permutation and inversion introduced in Exercise 27.
a. Show that any permutation $i_1, \ldots, i_n$ of the set $\{1, 2, \ldots, n\}$ corresponds to the permutation matrix $P = \begin{bmatrix} \vec{e}_{i_1}^{\,T} \\ \vdots \\ \vec{e}_{i_n}^{\,T} \end{bmatrix}$ so that $P\begin{bmatrix} 1 \\ 2 \\ \vdots \\ n \end{bmatrix} = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_n \end{bmatrix}$, where the matrix $P$ is a product of the elementary matrices of the operations $r_i \leftrightarrow r_j$ corresponding to the inversions in the permutation.
b. Show that $\det P = 1$ whenever the number of inversions of the corresponding permutation is even, and $\det P = -1$ whenever the number is odd.
3.2 Applications of Determinants

Theorem 3.8 described the manner in which performing elementary row operations on a square matrix affects its determinant. In light of Theorem 3.7, we can similarly consider column operations:
• adding a multiple of one column to another does not affect the determinant,
• multiplying a column by a scalar $k$ results in multiplying the determinant by $k$ as well,
• interchanging two columns reverses the sign of the determinant.
In the following subsection, we shall derive a method for solving linear systems by using elementary column operations on the coefficient matrix.
Cramer's rule

Consider an $n \times n$ coefficient matrix $A$ of a linear system $A\vec{x} = \vec{b}$ with a unique solution $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ (i.e., $A$ is nonsingular).

Let us multiply the $j$th column of $A$ by the $j$th component of the solution vector, $x_j$ (even though this is considered unknown until the system is solved, we are assuming it exists and is unique). Doing so results in multiplying the determinant by the same factor ($x_j$).

Now, let us add $x_1$ times the first column to the $j$th column. Similarly, add $x_2$ times the second column to the $j$th, etc. We will perform $n - 1$ operations $\mathrm{col}_j + x_k\,\mathrm{col}_k \to \mathrm{col}_j$ for all $k$ values $1, \ldots, j-1, j+1, \ldots, n$. None of these operations affect the determinant. As a result, we obtain the following matrix:
$$A_j = \begin{bmatrix}
a_{11} & \cdots & a_{1,j-1} & a_{11}x_1 + \cdots + a_{1j}x_j + \cdots + a_{1n}x_n & a_{1,j+1} & \cdots & a_{1n}\\
a_{21} & \cdots & a_{2,j-1} & a_{21}x_1 + \cdots + a_{2j}x_j + \cdots + a_{2n}x_n & a_{2,j+1} & \cdots & a_{2n}\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
a_{n1} & \cdots & a_{n,j-1} & a_{n1}x_1 + \cdots + a_{nj}x_j + \cdots + a_{nn}x_n & a_{n,j+1} & \cdots & a_{nn}
\end{bmatrix}$$
whose $j$th column became equal to the right-hand side vector $\vec{b}$ and whose determinant equals $x_j \det A$. This leads to the following result.

THEOREM 3.13 (Cramer's Rule) If $A$ is a nonsingular $n \times n$ matrix and $\vec{b}$ is an $n$-vector, then the linear system $A\vec{x} = \vec{b}$ has the unique solution $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ whose components are given by
$$x_j = \frac{\det A_j}{\det A}$$
where $A_j$ is the matrix obtained from $A$ by replacing its $j$th column with $\vec{b}$.
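EXAMPLE 3.6 Let us use Cramer's rule to solve the linear system
$$\begin{array}{rcrcrcr}
2x_1 & + & x_2 & - & 4x_3 & = & -5\\
3x_1 & & & - & 2x_3 & = & 6\\
-x_1 & + & 3x_2 & + & x_3 & = & -4
\end{array}$$
$\det A = \det\begin{bmatrix} 2 & 1 & -4 \\ 3 & 0 & -2 \\ -1 & 3 & 1 \end{bmatrix} = -25 \neq 0$; therefore, the system has a unique solution, which can be found using Cramer's rule. We have
$$\det A_1 = \det\begin{bmatrix} -5 & 1 & -4 \\ 6 & 0 & -2 \\ -4 & 3 & 1 \end{bmatrix} = -100, \quad \det A_2 = \det\begin{bmatrix} 2 & -5 & -4 \\ 3 & 6 & -2 \\ -1 & -4 & 1 \end{bmatrix} = 25, \quad \det A_3 = \det\begin{bmatrix} 2 & 1 & -5 \\ 3 & 0 & 6 \\ -1 & 3 & -4 \end{bmatrix} = -75;$$
thus
$$x_1 = \frac{\det A_1}{\det A} = \frac{-100}{-25} = 4, \quad x_2 = \frac{\det A_2}{\det A} = \frac{25}{-25} = -1, \quad x_3 = \frac{\det A_3}{\det A} = \frac{-75}{-25} = 3.$$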
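The computation in this example can be reproduced with a short script. This is a sketch under stated assumptions rather than a recommended solver: the function names `det` and `cramer` are hypothetical, the determinant is computed by first-row cofactor expansion (fine for small matrices only), and exact rational arithmetic from Python's standard `fractions` module is used so that no rounding occurs.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:  # empty matrix: recursion base case
        return 1
    return sum((-1) ** j * a[0][j]
               * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def cramer(a, b):
    """Solve a x = b via x_j = det(A_j) / det(A) for integer-valued input."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires a nonsingular matrix")
    solution = []
    for j in range(len(a)):
        # A_j: copy of A whose jth column is replaced with b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution


A = [[2, 1, -4], [3, 0, -2], [-1, 3, 1]]
b = [-5, 6, -4]
print(cramer(A, b))  # [Fraction(4, 1), Fraction(-1, 1), Fraction(3, 1)]
```

Each unknown costs one extra determinant, which is one way to see why the rule becomes inefficient for large systems.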
The main limitations of Cramer's rule, compared to Gaussian elimination, are the assumptions that were made in Theorem 3.13:
• the system must have the same number of equations and unknowns (so that its coefficient matrix is square), and
• the solution must be unique (i.e., the coefficient matrix must be nonsingular).
Even if both of these conditions are met, Cramer's rule is very inefficient for large systems compared to methods based on Gaussian elimination.
Adjoint

We begin this subsection by defining its main subject: the adjoint matrix. Simply put, the adjoint of $A$ is a matrix of all cofactors of $A$, but with a twist: all row and column indices are reversed.

DEFINITION (Adjoint) If $n \geq 2$, then the adjoint of an $n \times n$ matrix $A$ is the matrix
$$\operatorname{adj} A = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix}.$$
The adjoint of a $1 \times 1$ matrix is the $1 \times 1$ matrix $[1]$.

We now state the key result involving the adjoint.

THEOREM 3.14 For every $n \times n$ matrix $A$,
$$A \operatorname{adj} A = (\det A)\, I_n.$$

PROOF If $n = 1$, the statement is obviously true. Let $n \geq 2$. The $(i, j)$ entry of the matrix $D = A \operatorname{adj} A$ satisfies
$$d_{ij} = \operatorname{row}_i A \cdot \operatorname{col}_j(\operatorname{adj} A) = a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn}.$$
If $i = j$, this expression equals $\det A$ by Theorem 3.1 (this is the cofactor expansion along the $i$th row).
If $i \neq j$, the expression can be seen as the cofactor expansion along the $j$th row of the determinant of the following matrix, in which the $i$th row of $A$ is copied into the $j$th row:
$$Z = \begin{bmatrix} \operatorname{row}_1 A \\ \vdots \\ \operatorname{row}_i A \\ \vdots \\ \operatorname{row}_{j-1} A \\ \operatorname{row}_i A \\ \operatorname{row}_{j+1} A \\ \vdots \\ \operatorname{row}_n A \end{bmatrix}$$
$$a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn} = z_{j1}Z_{j1} + z_{j2}Z_{j2} + \cdots + z_{jn}Z_{jn}$$
(since $Z_{jk} = A_{jk}$ for all $k$). Because $Z$ has two identical rows, by Theorem 3.5, $\det Z = 0$. Consequently,
$$d_{ij} = \begin{cases} \det A & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$
which completes the proof.
COROLLARY 3.15 If $A$ is a nonsingular matrix, then
$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A.$$
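EXAMPLE 3.7 The matrix $A = \begin{bmatrix} 2 & -1 & 0 \\ 0 & 1 & -2 \\ 3 & 1 & 1 \end{bmatrix}$ has the cofactors
$$\begin{aligned}
A_{11} &= (-1)^{1+1}\det\begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix} = 3, &
A_{12} &= (-1)^{1+2}\det\begin{bmatrix} 0 & -2 \\ 3 & 1 \end{bmatrix} = -6, &
A_{13} &= (-1)^{1+3}\det\begin{bmatrix} 0 & 1 \\ 3 & 1 \end{bmatrix} = -3,\\
A_{21} &= (-1)^{2+1}\det\begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix} = 1, &
A_{22} &= (-1)^{2+2}\det\begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix} = 2, &
A_{23} &= (-1)^{2+3}\det\begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix} = -5,\\
A_{31} &= (-1)^{3+1}\det\begin{bmatrix} -1 & 0 \\ 1 & -2 \end{bmatrix} = 2, &
A_{32} &= (-1)^{3+2}\det\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} = 4, &
A_{33} &= (-1)^{3+3}\det\begin{bmatrix} 2 & -1 \\ 0 & 1 \end{bmatrix} = 2.
\end{aligned}$$
The adjoint matrix of $A$ is
$$\operatorname{adj} A = \begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix} = \begin{bmatrix} 3 & 1 & 2 \\ -6 & 2 & 4 \\ -3 & -5 & 2 \end{bmatrix}.$$
Let us verify the result from Theorem 3.14:
$$A \operatorname{adj} A = \begin{bmatrix} 2 & -1 & 0 \\ 0 & 1 & -2 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} 3 & 1 & 2 \\ -6 & 2 & 4 \\ -3 & -5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 12 \end{bmatrix}.$$
Since $\det A = 12$ (check), the matrix we obtained equals $(\det A)\, I_3$.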
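The cofactor bookkeeping in this example is easy to automate. Below is a minimal sketch (the names `det`, `minor`, `adjoint`, and `matmul` are assumptions introduced here, not notation from the text) that builds the adjoint as the transpose of the cofactor matrix, checks Theorem 3.14 for the matrix of Example 3.7, and forms the inverse as in Corollary 3.15.

```python
from fractions import Fraction


def minor(a, i, j):
    """Submatrix of a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjoint(a):
    """Transpose of the matrix of cofactors; [[1]] for a 1 x 1 matrix."""
    n = len(a)
    if n == 1:
        return [[1]]
    cofactors = [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
                 for i in range(n)]
    return [list(column) for column in zip(*cofactors)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[2, -1, 0], [0, 1, -2], [3, 1, 1]]
adj = adjoint(A)
print(adj)             # [[3, 1, 2], [-6, 2, 4], [-3, -5, 2]]
print(matmul(A, adj))  # [[12, 0, 0], [0, 12, 0], [0, 0, 12]]

# Corollary 3.15: the inverse is adj A divided by det A (exact fractions).
inverse = [[Fraction(entry, det(A)) for entry in row] for row in adj]
```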
Geometric interpretation of determinants
EXAMPLE 3.8 Consider the parallelogram constructed using the rows of the matrix
$$A = \begin{bmatrix} 2 & 1 \\ 4 & 5 \end{bmatrix}$$
(as well as their translations).

In Gauss-Jordan reduction, we would typically begin by scaling the first row to obtain 1 as the pivot. However, in this example, we proceed differently, keeping 2 as the pivot value for subsequent calculation. The first elementary row operation we perform is $r_2 - 2r_1 \to r_2$, obtaining
$$B = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}.$$

[Margin illustrations: the parallelograms spanned by the rows $r_1$ and $r_2$ of the matrices $A$, $B$, and $C$ in the $xy$-plane.]

Note that
• $\det A = \det B$ (from Theorem 3.8) and
• the area of the parallelogram corresponding to the row vectors of $B$ is the same as the area of the original parallelogram – examine the margin illustrations to convince yourself that both parallelograms can be seen as having the same base (the first row) and that the heights of the two parallelograms are indeed equal.

Now, once again, we do not scale the second row of $B$ to obtain a leading one – instead we perform
$$r_1 - \tfrac{1}{3} r_2 \to r_1,$$
giving us
$$C = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$
whose determinant matches that of $B$ (and of $A$) and for which the area of the parallelogram remains, again, unchanged. Of course the last parallelogram is really a rectangle, which makes the calculation of its area remarkably straightforward: $2 \cdot 3 = 6$; the same straightforward calculation yields the determinant of the diagonal matrix $C$. Denoting by $\operatorname{area}(M)$ the area of the parallelogram corresponding to the rows of $M$, we conclude that
$$\operatorname{area}(A) = \operatorname{area}(B) = \operatorname{area}(C) = \det C = \det B = \det A.$$
In this example, we did not have to interchange rows of the matrix; however, according to Theorem 3.8 the magnitude of the determinant would not change even if we did. In fact, for any nonsingular matrix $A$ we can obtain a diagonal matrix $D$ by performing a finite sequence of operations of the types $r_i + k r_j \to r_i$ and $r_i \leftrightarrow r_j$ and staying away from operations $k r_i \to r_i$ (the reader is asked to justify this claim in Exercise 25 on p. 139). This yields
$$|\det A| = |\det D| \tag{36}$$
so that the area of the parallelogram constructed from the rows of a $2 \times 2$ matrix $A$ equals $|\det A|$. If $A$ is singular, then one row vector must be a scalar multiple of the other (why?), making the area 0, which again matches $\det A$.
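The equality of areas and determinant magnitudes for the matrices $A$, $B$, and $C$ of Example 3.8 can be confirmed numerically. The sketch below is illustrative only: `det2` and `parallelogram_area` are hypothetical helper names, and the area is computed independently of the determinant formula by applying the shoelace formula to the four vertices $0$, $r_1$, $r_1 + r_2$, $r_2$.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def parallelogram_area(m):
    """Shoelace area of the parallelogram with vertices 0, r1, r1 + r2, r2."""
    (x1, y1), (x2, y2) = m
    vertices = [(0, 0), (x1, y1), (x1 + x2, y1 + y2), (x2, y2)]
    shifted = vertices[1:] + vertices[:1]
    twice_area = sum(xa * yb - xb * ya
                     for (xa, ya), (xb, yb) in zip(vertices, shifted))
    return abs(twice_area) / 2


A = [[2, 1], [4, 5]]
B = [[2, 1], [0, 3]]   # after r2 - 2 r1 -> r2
C = [[2, 0], [0, 3]]   # after r1 - (1/3) r2 -> r1
swapped = [[4, 5], [2, 1]]  # rows of A interchanged: the sign flips, the area does not

for name, m in [("A", A), ("B", B), ("C", C), ("swapped", swapped)]:
    print(name, det2(m), parallelogram_area(m))
```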
EXAMPLE 3.9 Use the determinant to find the area of the triangle with vertices $P(3, 1)$, $Q(-1, 2)$, and $R(0, -1)$.

SOLUTION The area in question is one half of the area of the parallelogram built using the vectors
$$\overrightarrow{PQ} = \begin{bmatrix} -1 - 3 \\ 2 - 1 \end{bmatrix} = \begin{bmatrix} -4 \\ 1 \end{bmatrix} \quad\text{and}\quad \overrightarrow{PR} = \begin{bmatrix} 0 - 3 \\ -1 - 1 \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \end{bmatrix}.$$
Therefore,
$$\begin{aligned}
(\text{Area of triangle } PQR) &= \frac{1}{2}\left|\det\begin{bmatrix} -4 & 1 \\ -3 & -2 \end{bmatrix}\right|\\
&= \frac{1}{2}\,|(-4)(-2) - (1)(-3)| \quad \text{(using formula (29))}\\
&= \frac{1}{2}\,|8 + 3|\\
&= \frac{11}{2}.
\end{aligned}$$

A similar argument can be applied to $3 \times 3$ matrices as well. This time, row vectors of $A$ give rise to a parallelepiped, whose volume can be shown to remain the same under a row reduction involving operations of types $r_i \leftrightarrow r_j$ and $r_i + k r_j \to r_i$. Once again, if the matrix $A$ is nonsingular, then we can use such operations to obtain a diagonal matrix $D$. This matrix corresponds to a box with the magnitude of the volume equal to the absolute value of the product of the main diagonal entries ($= |\det D|$). By the equality (36) the volume of the original parallelepiped is given by $|\det A|$.

[Margin figure: a parallelepiped with edges $\vec{u}$, $\vec{v}$, $\vec{w}$, together with the edge $\vec{v} + 2\vec{u}$ and the translated vector $2\vec{u}$.] Performing an elementary row operation $r_i + k r_j \to r_i$ does not affect the volume of the parallelepiped obtained from the row vectors.

Applications of determinants in multivariable calculus
Recall that in Example 1.33 in Section 1.4 (p. 50), we studied a nonlinear transformation $\vec{r}(s, t) = \begin{bmatrix} st \\ s^2 - t \end{bmatrix}$ along with its linearizations at the specific locations, based on the Jacobian matrix
$$J(s, t) = \begin{bmatrix} \dfrac{\partial}{\partial s}(st) & \dfrac{\partial}{\partial t}(st) \\[2mm] \dfrac{\partial}{\partial s}(s^2 - t) & \dfrac{\partial}{\partial t}(s^2 - t) \end{bmatrix} = \begin{bmatrix} t & s \\ 2s & -1 \end{bmatrix}.$$
In our discussion of geometric properties of $2 \times 2$ determinants, we have shown that the determinant of a $2 \times 2$ matrix $A$ represents a signed area of the parallelogram spanned by the row vectors of $A$ (as well as the one spanned by the column vectors, from Theorem 3.7). Therefore, the absolute value of the determinant of the Jacobian at a point corresponds to the ratio of the area resulting from the transformation divided by the original area.
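[Figure: the grid of lines $s = 0, 1, \ldots, 4$ and $t = 0, 1, \ldots, 4$ in the $st$-plane (left) and its image under $\vec{r}$ in the $xy$-plane (right), with the column vectors of the Jacobian matrix drawn at selected points.]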
Consult the figure above (depicting column vectors of the Jacobian matrix) to verify that it makes sense for the following values of the Jacobian determinants to be interpreted in this way:
$$|\det J(1, 1)| = \left|\det\begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}\right| = 3, \qquad |\det J(3, 3)| = \left|\det\begin{bmatrix} 3 & 3 \\ 6 & -1 \end{bmatrix}\right| = 21.$$
This information is extremely useful when performing a change of variables in multiple integrals, as illustrated in the following example.
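The area-ratio interpretation can also be tested numerically. The following sketch (all function names and the step size `h` are assumptions chosen for illustration) maps a small square with corner $(s, t)$ through $\vec{r}$, approximates the image by the quadrilateral through the four mapped corners, and compares the ratio of areas with $|\det J(s, t)|$.

```python
def r(s, t):
    """The transformation r(s, t) = (st, s^2 - t) from Example 1.33."""
    return (s * t, s * s - t)


def jacobian_det(s, t):
    """det [[t, s], [2s, -1]] = -t - 2 s^2."""
    return t * (-1) - s * (2 * s)


def polygon_area(points):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    shifted = points[1:] + points[:1]
    return abs(sum(xa * yb - xb * ya
                   for (xa, ya), (xb, yb) in zip(points, shifted))) / 2


def area_ratio(s, t, h=1e-3):
    """(Area of the image of a small h-by-h square) / (area of the square)."""
    corners = [r(s, t), r(s + h, t), r(s + h, t + h), r(s, t + h)]
    return polygon_area(corners) / (h * h)


for point in [(1, 1), (3, 3)]:
    print(point, abs(jacobian_det(*point)), area_ratio(*point))
```

The ratios approach 3 and 21 as `h` shrinks; the agreement is only approximate because the Jacobian describes the transformation to first order.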
EXAMPLE 3.10 In this example, we shall find the area of the region $D$ pictured in the following figure. From calculus, we know this area is specified by the double integral
$$I = \iint_D 1 \, dx \, dy. \tag{37}$$
However, because of the complicated shape of the region $D$ in the $xy$-plane, it would be awkward to set up the iterated integrals corresponding to $I$. Instead, we change the variables from $x$ and $y$ to $s$ and $t$, making sure to include the magnitude of the Jacobian determinant as a scaling factor:
$$I = \iint_R |\det(J(s, t))| \, ds \, dt = \int_1^3 \int_1^4 (t + 2s^2) \, ds \, dt. \tag{38}$$
One way to justify this change of variables is by recognizing that it reflects the equality of volumes of two solids:
• formula (37) calculates the volume of the solid with base $D$ and constant height 1, whereas
• formula (38) corresponds to the volume of the solid with base $R$ and a variable height specified by $z = t + 2s^2$.

After a few additional steps, a student familiar with multivariable calculus basics should be able to evaluate the double integral in (38) to obtain the final answer 96.
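For readers who want to see those additional steps, one possible evaluation is sketched here; it assumes the limits $1 \le s \le 4$ and $1 \le t \le 3$ as read from (38).

```latex
\begin{aligned}
I &= \int_1^3 \int_1^4 (t + 2s^2)\, ds\, dt
   = \int_1^3 \Big[\, ts + \tfrac{2}{3}s^3 \Big]_{s=1}^{s=4} dt \\
  &= \int_1^3 \big( 3t + 42 \big)\, dt
   = \Big[\, \tfrac{3}{2}t^2 + 42t \Big]_{t=1}^{t=3} \\
  &= \big( 13.5 + 126 \big) - \big( 1.5 + 42 \big) = 96 .
\end{aligned}
```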
[Figure: the rectangle $R$ in the $st$-plane and the corresponding region $D$ in the $xy$-plane, bounded by images of the grid lines $s = \text{const}$ and $t = \text{const}$.]

EXERCISES
In Exercises 1–8, solve each system using Cramer’s rule if possible. 1.
3x1 −2x1
− +
x2 3x2
= 3 = 5
2.
2x1 −4x1
− +
x2 3x2
= −3 = 6
3.
2x1 −4x1
− x2 + 2x2
= 1 = 6
4.
x1 −2x1
+ −
= 3 = −1
5.
x1 x1
6.
4x1 x1
7.
x1 2x1
2x2 9x2
+ 2x2 + x2 x2
+ 3x3 + 3x3
= 1 = −1 = 0
− 5x2 x2 + 2x2
+ x3 + x3 + 3x3
= 0 = 0 = 1
+ 2x2 3x2 − x2
− x3 − 2x3 + x3
= 2 = 1 = 1
2x1 −x1 x1
8.
− +
3x2 x2
+ 2x3 − x3 + x3
= 6 = 0 = 3
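In Exercises 9–10, find the adjoint of the given matrix. Verify that $A \operatorname{adj} A = (\det A)\, I_n$.

9. a. $\begin{bmatrix} -1 & 2 \\ 3 & 1 \end{bmatrix}$; b. $\begin{bmatrix} 1 & 1 & 2 \\ -2 & 1 & -1 \\ 0 & 2 & -3 \end{bmatrix}$; c. $\begin{bmatrix} 2 & 0 & -1 \\ 0 & 1 & 0 \\ -4 & 0 & 2 \end{bmatrix}$.

10. a. $\begin{bmatrix} 4 & 0 \\ 1 & 1 \end{bmatrix}$; b. $\begin{bmatrix} 4 & 1 & 0 \\ 1 & -2 & 2 \\ 0 & 0 & 1 \end{bmatrix}$; c. $\begin{bmatrix} -4 & 2 & 2 \\ 1 & 0 & 0 \\ 0 & -1 & 1 \end{bmatrix}$.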
In Exercises 11–12, use determinants to find the area of the triangle with given vertices. 11. P (5, 2), Q(−1, 3), and R(4, 0). 12. P (1, −3), Q(−2, −3), and R(−5, 6). Before you attempt Exercises 13–14, recall how in Example 1.24 on p. 41, a linear transformation F : R2 → R2 was visualized using its effect on a specific shape. Directly comparing the “input” and “output” shape, you should be able to approximate the determinant of the corresponding 2 × 2 matrix. E.g., determinant of the matrix in part (a) of Example 1.24 is 1 since the area does not change. In part (b), it should be 2, whereas in part (c) it is −1 – the area is the same, but the shape is “flipped”. You should see that these match the actual determinant values obtained from the matrices themselves. 13. For each of the three transformations of the letter “F” depicted in Exercise 16 on p. 52, visually estimate the determinant based on how the transformation affects the shape. (All determinants here will be integers between −4 and 4.) Confirm that the same determinant is calculated from the correct matrix. 14. Repeat Exercise 13 for the transformations of the letter “R” in Exercise 17 on p. 52.
T/F?
In Exercises 15–19, decide whether each statement is true or false. Justify your answer.
15. The adjoint of a diagonal matrix is diagonal.
16. For all $n \times n$ matrices $A$, $\operatorname{adj}(A^T) = (\operatorname{adj} A)^T$.
17. For all real numbers $k$ and $n \times n$ matrices $A$, $\operatorname{adj}(kA) = k \operatorname{adj} A$.
18. If $\operatorname{adj} A = A^{-1}$, then $\det A = 1$.
19. If $A$ is singular, then $\operatorname{adj} A$ does not exist.
In Exercises 20–24, refer to the discussion of the determinant of a Jacobian beginning on p. 135. Calculus is required.

20. * Let $a$ and $b$ be two positive real numbers.
a. Show that the change of variables $\begin{bmatrix} x \\ y \end{bmatrix} = \vec{r}(s, t) = \begin{bmatrix} as \\ bt \end{bmatrix}$ maps the region inside the unit circle $C$ in the $s, t$ coordinate system, $s^2 + t^2 \leq 1$, into the region inside the ellipse $E$: $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} \leq 1$ in the $x, y$ coordinate system.
b. Find the Jacobian $J(s, t)$, and use it to prove that the area of the ellipse $E$ is $\pi a b$ by converting the double integral expressing the area of $E$, $\iint_E 1 \, dx \, dy$, to the variables $s$ and $t$.

21. * For the nonlinear transformation defined in Exercise 39 on p. 56, find the determinants of the Jacobian matrices obtained at each of the four points. Verify that the magnitude of the determinant approximates the ratio of the transformed region area over the original region area near each point.
22. * Repeat Exercise 21 for the nonlinear transformation defined in Exercise 40 on p. 56.
23. * Repeat Exercise 21 for the nonlinear transformation defined in Exercise 41 on p. 57.
24. * Repeat Exercise 21 for the nonlinear transformation defined in Exercise 42 on p. 57.
25. * In Section 2.4, we introduced a $PA = LU$ decomposition of a square matrix $A$, in which $P$ is a permutation matrix, $L$ is unit lower triangular, and $U$ is upper triangular. In Exercise 28 on p. 130, it was shown that the determinant of an $n \times n$ permutation matrix is 1 or $-1$.
a. Prove that $\det L = 1$.
b. Show that if $A$ is nonsingular, then the main diagonal entries of the upper triangular matrix $U$ are all nonzero. Using this fact, explain how a sequence of elementary row operations of the type $r_i + k r_j \to r_i$ can be applied to $U$ to obtain a diagonal matrix $D$.
c. Use parts a–b of this exercise to justify the claim that any nonsingular matrix $A$ can be row-reduced to a diagonal matrix without using operations of type $k r_i \to r_i$.

26. * Prove that the area of the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ equals
$$\frac{1}{2}\left|\det\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}\right|.$$

27. * Show that four points in 3-space, $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and $(x_4, y_4, z_4)$, are coplanar (reside on the same plane) if and only if
$$\det\begin{bmatrix} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \\ x_4 & y_4 & z_4 & 1 \end{bmatrix} = 0.$$

28. * Prove that the adjoint of a singular matrix $A$ must also be singular by considering two possible cases:
a. When $A = 0$ ($n \times n$ zero matrix), show that $\operatorname{adj} A = 0$ as well; therefore, it is singular.
b. If $A$ is singular but $A \neq 0$, then show that assuming $\operatorname{adj} A$ is nonsingular would lead to a contradiction. (Hint: Use Theorem 3.14.)
3.3 Chapter Review
Chapter 4 Vector Spaces
4.1 Vector Spaces

Theorem 1.1 listed ten properties of addition and scalar multiplication of $n$-vectors. We then saw the same ten properties apply to $m \times n$ matrices in Theorem 1.3. We might ask whether there are any other sets that also satisfy these properties. In this section, we are going to see that the answer is "yes" and will study a number of such sets. They will be called vector spaces, but the word "vector" will now carry a more general (and abstract) meaning.10

DEFINITION (Vector Space)
The set $V$, together with operations of addition and scalar multiplication, is called a vector space if the following conditions are satisfied:
1. If $\vec{u}, \vec{v}$ are in $V$, then $\vec{u} + \vec{v}$ is also in $V$ (i.e., $V$ is closed under the operation of addition).
2. $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ for all $\vec{u}, \vec{v}$ in $V$.
3. $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$ for all $\vec{u}, \vec{v}, \vec{w}$ in $V$.
4. There exists an element $\vec{z}$ in $V$ such that $\vec{u} + \vec{z} = \vec{z} + \vec{u} = \vec{u}$ for all $\vec{u}$ in $V$.
5. For every $\vec{u}$ in $V$, there exists an element $\vec{d}$ in $V$ (called the negative of $\vec{u}$) such that $\vec{u} + \vec{d} = \vec{d} + \vec{u} = \vec{z}$.
6. If $c$ is a real number and $\vec{u}$ is in $V$, then $c\vec{u}$ is also in $V$ (i.e., $V$ is closed under the operation of scalar multiplication).
7. $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$ for all $\vec{u}, \vec{v}$ in $V$ and for all real numbers $c$.
8. $(c + d)\vec{u} = c\vec{u} + d\vec{u}$ for all $\vec{u}$ in $V$ and for all real numbers $c$ and $d$.
9. $(cd)\vec{u} = c(d\vec{u})$ for all $\vec{u}$ in $V$ and for all real numbers $c$ and $d$.
10. $1\vec{u} = \vec{u}$ for all $\vec{u}$ in $V$.

[Margin figure: vectors $\vec{u}$, $\vec{v}$, their sum $\vec{u} + \vec{v}$, and a scalar multiple, all inside the set $V$.] Illustration of conditions 1 and 6 from the definition of vector space.

In the literature, the vector space defined above is sometimes referred to as a vector space over the field of real numbers. Although we shall not discuss this in our text, a vector space can be similarly defined over any field, e.g., the field of complex numbers.

10 In some texts, a vector in $\mathbb{R}^n$ is referred to as an $n$-tuple.
Note that to completely describe a vector space, one must specify three items:
• the set $V$,
• the operation of addition of elements of set $V$, and
• the operation of scalar multiplication of a real number by an element of $V$.

It follows from Theorems 1.1 and 1.3 that both of the following are properly defined vector spaces:
• $\mathbb{R}^n$ with the usual operations of vector addition and scalar multiplication, and
• $M_{mn}$, which denotes the set of all $m \times n$ matrices with the usual matrix addition and scalar multiplication.

Here is a different example:

EXAMPLE 4.1 Consider $F_X$ to be the set of all functions defined over a nonempty set $X \subseteq \mathbb{R}$ ($X$ is a subset of the set of all real numbers and can be $\mathbb{R}$ itself) with
• the operation of addition defined by $(f + g)(x) = f(x) + g(x)$ for all $x$ in $X$ and
• the operation of scalar multiplication defined by $(cf)(x) = cf(x)$ for all $x$ in $X$.

Note that it may sometimes be awkward to refer to these "vectors" using the notation $\vec{u}$, $\vec{v}$, etc.; instead, we may find it convenient to revert to the standard function notation ($f$, $g$, etc.), keeping in mind that these are to be considered vectors in the general sense. Let us check at least some of the ten conditions of vector space, leaving verification of the remaining ones as an exercise for the reader.
Condition 1. If functions $f$ and $g$ are both defined over the entire set $X$, then their sum is also going to be defined over the same set. Thus, $F_X$ is closed under the operation of addition.

Condition 2. The left-hand side contains a vector (function) $f + g$ whose value at any $x$ is $(f + g)(x) = f(x) + g(x)$. The right-hand side contains $g + f$, whose value at any $x$ is $(g + f)(x) = g(x) + f(x)$. Since for all $x$ in $X$ the values are identical, it follows that $f + g = g + f$.

Condition 4. Does there exist a vector in our set (i.e., a function) $z$ such that $f + z = z + f = f$ for all $f$ in $F_X$? Indeed, we can define $z$ to be a function such that $z(x) = 0$ for all $x$ in $X$.

Condition 5. Now we want to verify that for every $f$ in $F_X$, there exists another function $d$ in $F_X$ such that $f + d = d + f = z$ (where $z$ is the zero function defined above). This time, we need to let $d$ be a function whose values, $d(x) = -f(x)$, are negatives of the corresponding values of $f$ for all $x$ in $X$.
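The pointwise operations of Example 4.1 translate directly into code, since functions are ordinary values in Python. The sketch below is only an illustration: the names `add`, `scale`, `zero`, and `negative` and the sample functions are assumptions, and checking finitely many sample points is a spot check of conditions 2, 4, and 5, not a proof.

```python
import math


def add(f, g):
    """The vector f + g: the function x -> f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(c, f):
    """The vector c f: the function x -> c * f(x)."""
    return lambda x: c * f(x)


def zero(x):
    """The zero vector z of condition 4: z(x) = 0 for all x."""
    return 0


def negative(f):
    """The negative d of condition 5: d(x) = -f(x)."""
    return lambda x: -f(x)


f = math.sin                 # sample vectors in F_R
g = lambda x: x * x

samples = [-2.0, -0.5, 0.0, 1.0, 3.25]
commutative = all(add(f, g)(x) == add(g, f)(x) for x in samples)   # condition 2
has_zero = all(add(f, zero)(x) == f(x) for x in samples)           # condition 4
has_negative = all(add(f, negative(f))(x) == 0 for x in samples)   # condition 5
print(commutative, has_zero, has_negative)
```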
A special case of Example 4.1 is $F_{\mathbb{R}}$: the space of functions defined over all real numbers. Such functions include polynomials, studied in our next example.

EXAMPLE 4.2 Let $P_n$ be the set of all polynomials of degree $n$ or less, i.e., functions $p$ which can be expressed in the form
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1}x^{n-1} + a_n x^n$$
with real numbers (coefficients) $a_0, a_1, \ldots, a_n$. We define the addition $p + q$ and scalar multiplication $cp$ in a manner consistent with Example 4.1:
$$(p + q)(x) = p(x) + q(x) \quad\text{and}\quad (cp)(x) = cp(x) \quad\text{for all } x.$$
It can be verified that the ten conditions of vector space are satisfied. We will pay particular attention (for reasons that will be revealed in the next section) to conditions 1 and 6.

Condition 1 is satisfied because adding one vector (polynomial) $p$ in $P_n$ such that
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1}x^{n-1} + a_n x^n$$
to another vector $q$ in $P_n$ with
$$q(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_{n-1}x^{n-1} + b_n x^n$$
we obtain a $p + q$ whose value at any real $x$ can be expressed as
$$(p + q)(x) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots + (a_n + b_n)x^n.$$
This is obviously another polynomial of degree $n$ or less, which allows us to conclude that $p + q$ is in $P_n$.

Condition 6 can be shown similarly (take the $p$ above, a real number $c$, and consider $cp$ – it is also in $P_n$).
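One concrete way to picture $P_n$ is to store a polynomial as the list of its $n + 1$ coefficients $[a_0, a_1, \ldots, a_n]$; closure under both operations then amounts to the result again being a list of length $n + 1$. This representation and the names `poly_add`, `poly_scale`, and `evaluate` are assumptions made for illustration, with sample polynomials chosen arbitrarily in $P_2$.

```python
def poly_add(p, q):
    """Coefficients of p + q: add corresponding coefficients."""
    return [a + b for a, b in zip(p, q)]


def poly_scale(c, p):
    """Coefficients of c p: multiply every coefficient by c."""
    return [c * a for a in p]


def evaluate(p, x):
    """Value a0 + a1 x + ... + an x^n of the polynomial at x."""
    return sum(a * x ** k for k, a in enumerate(p))


p = [1, 0, 2]     # p(x) = 1 + 2x^2
q = [3, -1, 5]    # q(x) = 3 - x + 5x^2

total = poly_add(p, q)      # [4, -1, 7], still three coefficients: p + q is in P_2
tripled = poly_scale(3, p)  # [3, 0, 6], still three coefficients: 3p is in P_2

# The coefficient-wise operations agree with the pointwise definitions.
pointwise_ok = all(evaluate(total, x) == evaluate(p, x) + evaluate(q, x)
                   and evaluate(tripled, x) == 3 * evaluate(p, x)
                   for x in range(-3, 4))
print(total, tripled, pointwise_ok)
```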
The example above is actually a template that can be used to build infinitely many specific vector spaces, e.g.:
• $P_2$ contains all polynomials of degree 2 or less, i.e., quadratics (degree = 2), linear functions (degree = 1), constant functions (degree = 0), and polynomial $p(x) = 0$ with no degree.
• $P_3$ is the set of all polynomials of degree 3 or less. This means that it includes all of the contents of $P_2$ and all cubic polynomials.

You may begin to wonder whether we could build polynomial spaces that are simpler than the ones discussed above, such as "the set of all quadratic polynomials". Unfortunately, the following example will shatter our hopes for such simplicity.
EXAMPLE 4.3 Let $V$ be the set of all polynomials of degree exactly 2. Each of the vectors in $V$ has the form
$$p(x) = a_0 + a_1 x + a_2 x^2 \quad\text{with } a_2 \neq 0.$$
It is because of this last requirement that $V$ fails to be a vector space. E.g., taking the following $p$ and $q$ in $V$:
$$p(x) = 2 + 3x - x^2 \quad\text{and}\quad q(x) = 1 - x + x^2$$
we obtain $(p + q)(x) = 3 + 2x$, which is not in $V$, in violation of condition 1.

[Margin figure: the set $V$ containing $2 + 3x - x^2$, $1 - x + x^2$, and $4 + 6x - 2x^2$ (the result of multiplying $2 + 3x - x^2$ by 2), while the sum $3 + 2x$ and the product of $2 + 3x - x^2$ with the scalar 0, namely $0$, fall outside $V$.] Illustration of Example 4.3 – the set $V$ is not closed under vector addition or scalar multiplication.

Condition 1 is not the only one that breaks down for this example. In condition 4, we will be unable to find a zero vector as $p(x) = 0$ is outside $V$. Condition 6 fails as well (just take $c = 0$). Note that some of the conditions (e.g., condition 2) are satisfied for our $V$. However, for $V$ to be a proper vector space, all ten conditions must be satisfied.
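The failure of closure in Example 4.3 can be seen by tracking degrees. In the sketch below, polynomials are again stored as coefficient lists $[a_0, a_1, a_2]$; this representation and the helper `degree` are assumptions introduced for illustration, with `None` standing for "no degree" in the case of the zero polynomial.

```python
def degree(p):
    """Degree of the polynomial with coefficients [a0, a1, ...]; None for zero."""
    nonzero_powers = [k for k, a in enumerate(p) if a != 0]
    return max(nonzero_powers) if nonzero_powers else None


p = [2, 3, -1]   # p(x) = 2 + 3x - x^2, degree exactly 2
q = [1, -1, 1]   # q(x) = 1 - x + x^2,  degree exactly 2

total = [a + b for a, b in zip(p, q)]   # 3 + 2x
doubled = [2 * a for a in p]            # 4 + 6x - 2x^2
zeroed = [0 * a for a in p]             # the zero polynomial

print(degree(p), degree(q))   # 2 2
print(degree(total))          # 1: the sum left V (condition 1 fails)
print(degree(doubled))        # 2: this particular multiple stays in V
print(degree(zeroed))         # None: c = 0 takes us outside V (condition 6 fails)
```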
EXAMPLE 4.4 Consider the set $V$ of all infinite sequences $\{x_1, x_2, x_3, \ldots\}$ of real numbers, with the operations
$$\{x_1, x_2, x_3, \ldots\} + \{y_1, y_2, y_3, \ldots\} = \{x_1 + y_1, x_2 + y_2, x_3 + y_3, \ldots\}$$
and
$$c\{x_1, x_2, x_3, \ldots\} = \{cx_1, cx_2, cx_3, \ldots\}.$$
It can be shown that this set satisfies all ten conditions of the vector space.

You may recall from calculus, or other courses, that an infinite sequence $\{x_1, x_2, x_3, \ldots\}$ can be defined as a function whose domain is the set of positive integers, $\mathbb{N}$. Therefore, the space of infinite sequences from the above example can be referred to as $F_{\mathbb{N}}$ (using the notation of Example 4.1). Interestingly, this space can also be thought of as an extension of the space $\mathbb{R}^n$, except that the number of components of these "vectors" is taken to be infinite.

In all the vector spaces discussed so far ($\mathbb{R}^n$, $M_{mn}$, $P_n$, the space of functions defined over $X$, and the space of sequences), the operations of addition and scalar multiplication were defined in the usual way (consistent with how we normally add vectors, matrices, functions, and sequences, and how we multiply each of those by a scalar). The following example, on the other hand, will be choosing these operations in an "unusual" way instead.
(Margin note: You may find it useful to denote such operations by special symbols, e.g., $\vec{u} \oplus \vec{v}$ and $c \odot \vec{u}$, to emphasize their nonstandard meaning and help prevent confusion with the standard operations.)
EXAMPLE 4.5 Let V be the set of all positive real numbers, denoted by $\mathbb{R}^+$. Here is how we propose to perform operations on vectors in V:
$$\vec{u} + \vec{v} = xy \quad\text{and}\quad c\vec{u} = x^c,$$
where we assumed that $\vec{u}$ had a value x and $\vec{v}$ had a value y. Somewhat surprisingly, this set satisfies all ten conditions of a vector space.

Condition 1. A product of two positive numbers is also in $\mathbb{R}^+$.
Condition 2. $xy = yx$ for all positive x and y.
Condition 3. $(xy)z = x(yz)$ for all positive numbers x, y, and z.
Condition 4. Let the zero vector correspond to the value 1. Then $x \cdot 1 = 1 \cdot x = x$ for all positive x.
Condition 5. The negative of a vector should contain the reciprocal of the original number: $(x)\left(\frac{1}{x}\right) = \left(\frac{1}{x}\right)(x) = 1$.
Condition 6. If x is a positive number, then so is $x^c$ for any real c.
Condition 7. $(xy)^c = x^c y^c$ for all positive x and y and all scalars c.
Condition 8. $x^{c+d} = x^c x^d$ for all positive x and all real numbers c and d.
Condition 9. $x^{cd} = (x^c)^d$ for all positive x and all real numbers c and d.
Condition 10. $x^1 = x$ for all positive x values.
It is interesting to note how many standard algebra formulas were involved in verifying the conditions in the example above.
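The verification in Example 4.5 can also be spot-checked numerically. The following Python sketch is only an illustration, not part of the original text: the function names `add`, `scale`, `negative`, and `check_conditions` are hypothetical, and random sampling with a floating-point tolerance can support the algebraic argument but cannot replace it.

```python
import math
import random

def add(x, y):
    """Nonstandard 'vector addition' on R+: multiply the two values."""
    return x * y

def scale(c, x):
    """Nonstandard 'scalar multiplication' on R+: raise x to the power c."""
    return x ** c

ZERO = 1.0  # the value playing the role of the zero vector

def negative(x):
    """The 'negative' of x is its reciprocal."""
    return 1.0 / x

def check_conditions(x, y, z, c, d):
    """Return a dict mapping each condition number (1-10) to True/False."""
    close = lambda a, b: math.isclose(a, b, rel_tol=1e-9)
    return {
        1: add(x, y) > 0,
        2: close(add(x, y), add(y, x)),
        3: close(add(add(x, y), z), add(x, add(y, z))),
        4: close(add(x, ZERO), x) and close(add(ZERO, x), x),
        5: close(add(x, negative(x)), ZERO),
        6: scale(c, x) > 0,
        7: close(scale(c, add(x, y)), add(scale(c, x), scale(c, y))),
        8: close(scale(c + d, x), add(scale(c, x), scale(d, x))),
        9: close(scale(c * d, x), scale(d, scale(c, x))),
        10: close(scale(1, x), x),
    }

random.seed(0)
samples = [
    check_conditions(
        random.uniform(0.1, 10), random.uniform(0.1, 10), random.uniform(0.1, 10),
        random.uniform(-3, 3), random.uniform(-3, 3),
    )
    for _ in range(1000)
]
all_hold = all(all(result.values()) for result in samples)
print(all_hold)
```

Each dictionary entry mirrors one of the ten conditions as verified above, with the product playing the role of the sum and the power playing the role of the scalar multiple.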
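EXAMPLE 4.6 Let V be the set of all 2-vectors for which the operations are defined as follows:
$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_2 + b_2 \\ a_1 + b_1 \end{bmatrix} \quad\text{and}\quad k\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} ka_2 \\ ka_1 \end{bmatrix}.$$

(Margin note: As discussed in the margin note for the previous example, you may find it helpful to use notation like $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \oplus \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_2 + b_2 \\ a_1 + b_1 \end{bmatrix}$ and $k \odot \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} ka_2 \\ ka_1 \end{bmatrix}$ instead.)

A number of conditions will hold just fine, specifically conditions 1, 2, 6, and 7. However, we must remember that if just one condition fails, then the set fails to be a vector space.

E.g., condition 8, $(c + d)\vec{u} = c\vec{u} + d\vec{u}$, is not satisfied because
$$\text{LHS} = (c + d)\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} (c + d)a_2 \\ (c + d)a_1 \end{bmatrix},$$
$$\text{RHS} = c\vec{u} + d\vec{u} = \begin{bmatrix} ca_2 \\ ca_1 \end{bmatrix} + \begin{bmatrix} da_2 \\ da_1 \end{bmatrix} = \begin{bmatrix} ca_1 + da_1 \\ ca_2 + da_2 \end{bmatrix}.$$
We conclude that V is not a vector space.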
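A concrete instance makes the failure of condition 8 in Example 4.6 easy to see. The Python sketch below is an illustration under assumed sample values (the vectors `u`, `v` and scalars `c`, `d` are arbitrary choices, and the helper names are hypothetical); it encodes the two nonstandard operations and compares both sides of conditions 7 and 8.

```python
def add(u, v):
    """Nonstandard addition of Example 4.6: add componentwise, then swap."""
    (a1, a2), (b1, b2) = u, v
    return (a2 + b2, a1 + b1)

def scale(k, u):
    """Nonstandard scalar multiplication of Example 4.6: scale, then swap."""
    a1, a2 = u
    return (k * a2, k * a1)

# Arbitrary sample data (an assumption made for illustration only)
u = (1, 2)
v = (3, 5)
c, d = 2, 3

# Condition 7: c(u + v) = cu + cv
cond7_holds = scale(c, add(u, v)) == add(scale(c, u), scale(c, v))

# Condition 8: (c + d)u = cu + du
lhs8 = scale(c + d, u)
rhs8 = add(scale(c, u), scale(d, u))
cond8_holds = lhs8 == rhs8

print(cond7_holds, lhs8, rhs8, cond8_holds)
```

For this sample, the left-hand side of condition 8 has its components swapped relative to the right-hand side, which is exactly the mismatch derived symbolically above.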
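To successfully solve problems similar to the last example, you may want to first "encode" the nonstandard operations using placeholders rather than variable names:
$$\begin{bmatrix} \square \\ \triangle \end{bmatrix} + \begin{bmatrix} \diamond \\ \circ \end{bmatrix} = \begin{bmatrix} \triangle + \circ \\ \square + \diamond \end{bmatrix} \quad\text{and}\quad k\begin{bmatrix} \square \\ \triangle \end{bmatrix} = \begin{bmatrix} k\triangle \\ k\square \end{bmatrix}.$$
Now, to carry out any nonstandard operation, e.g., $\begin{bmatrix} ca_2 \\ ca_1 \end{bmatrix} + \begin{bmatrix} da_2 \\ da_1 \end{bmatrix}$, just fill the placeholders with the corresponding expressions:
$$\begin{bmatrix} ca_2 \\ ca_1 \end{bmatrix} + \begin{bmatrix} da_2 \\ da_1 \end{bmatrix} = \begin{bmatrix} ca_1 + da_1 \\ ca_2 + da_2 \end{bmatrix}.$$

Properties of vector spaces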
Throughout the current section, we continued to emphasize the following:
• Elements of a vector space V may not resemble vectors in $\mathbb{R}^n$; furthermore, operations of addition and scalar multiplication defined in V may be quite different from their usual counterparts in $\mathbb{R}^n$.
• In spite of the possible discrepancies mentioned above, elements of V behave just like vectors in $\mathbb{R}^n$ do with respect to the ten conditions of the definition of the vector space.

We shall conclude this section by stating a few additional properties that general vector spaces "inherit" from $\mathbb{R}^n$. The first property strengthens condition 4 of the definition of the vector space.
THEOREM 4.1 If V is a vector space, then there exists a unique element $\vec{z}$ in V such that $\vec{u} + \vec{z} = \vec{z} + \vec{u} = \vec{u}$ for all $\vec{u}$ in V.

PROOF
Suppose there are two vectors $\vec{z}$ and $\vec{y}$ in V, which satisfy $\vec{u} + \vec{z} = \vec{u}$, $\vec{z} + \vec{u} = \vec{u}$, $\vec{u} + \vec{y} = \vec{u}$, and $\vec{y} + \vec{u} = \vec{u}$ for all $\vec{u}$ in V. Applying the first equality with $\vec{u} = \vec{y}$ and then the fourth one with $\vec{u} = \vec{z}$ yields $\vec{y} = \vec{y} + \vec{z} = \vec{z}$, establishing uniqueness of the zero vector.

This unique zero vector will often be denoted by $\vec{0}$ (just as in $\mathbb{R}^n$).

THEOREM 4.2 If V is a vector space, then
(a) $0\vec{u} = \vec{0}$ for every $\vec{u}$ in V;
(b) $c\vec{0} = \vec{0}$ for every real number c;
(c) if $c\vec{u} = \vec{0}$, then either $c = 0$ or $\vec{u} = \vec{0}$ (or both).

PROOF
(a) For any $\vec{u}$ in V, we can write
$$\begin{aligned} \vec{u} &= 1\vec{u} && \text{(condition 10 of the vector space definition)} \\ &= (1 + 0)\vec{u} \\ &= 1\vec{u} + 0\vec{u} && \text{(condition 8 of the vector space definition)} \\ &= \vec{u} + 0\vec{u} && \text{(condition 10 of the vector space definition).} \end{aligned}$$
It follows from Theorem 4.1 that $0\vec{u}$ must be the unique zero vector of the vector space V.

(b) Let $\vec{u}$ be a vector in V. From part (a), we can write $\vec{0} = 0\vec{u}$; therefore, for any real number c, we have
$$\begin{aligned} c\vec{0} &= c(0\vec{u}) \\ &= (c \cdot 0)\vec{u} && \text{(condition 9 of the vector space definition)} \\ &= 0\vec{u} \\ &= \vec{0} && \text{(part (a) of this theorem).} \end{aligned}$$

(c) If $c = 0$, then $c\vec{u} = \vec{0}$ for any $\vec{u}$ in V by part (a) of this theorem. If $c\vec{u} = \vec{0}$ but $c \neq 0$, then
$$\begin{aligned} \vec{u} &= 1\vec{u} && \text{(condition 10 of the vector space definition)} \\ &= \left(\tfrac{1}{c} \cdot c\right)\vec{u} && \text{(we assumed } c \neq 0\text{)} \\ &= \tfrac{1}{c}(c\vec{u}) && \text{(condition 9 of the vector space definition)} \\ &= \tfrac{1}{c}\vec{0} && \text{(we assumed } c\vec{u} = \vec{0}\text{)} \\ &= \vec{0} && \text{(part (b) of this theorem).} \end{aligned}$$
THEOREM 4.3 For every $\vec{u}$ in a vector space V, the vector $\vec{d} = (-1)\vec{u}$ is the unique vector such that $\vec{u} + \vec{d} = \vec{d} + \vec{u} = \vec{0}$.

PROOF
First of all, we show that $\vec{d} = (-1)\vec{u}$ satisfies $\vec{u} + \vec{d} = \vec{d} + \vec{u} = \vec{0}$:
$$\begin{aligned} \vec{u} + (-1)\vec{u} &= 1\vec{u} + (-1)\vec{u} && \text{(condition 10 of the vector space definition)} \\ &= (1 + (-1))\vec{u} && \text{(condition 8 of the vector space definition)} \\ &= 0\vec{u} \\ &= \vec{0} && \text{(part (a) of Theorem 4.2);} \end{aligned} \tag{39}$$
$(-1)\vec{u} + \vec{u} = \vec{0}$ follows from condition 2 of the vector space definition.

For any $\vec{u}$ in V, let $\vec{d}$ be a vector in V such that
$$\vec{d} + \vec{u} = \vec{0}.$$
Adding $(-1)\vec{u}$ to both sides yields
$$(\vec{d} + \vec{u}) + (-1)\vec{u} = \vec{0} + (-1)\vec{u}.$$
Let us use condition 3 on the left-hand side and condition 4 on the right-hand side,
$$\vec{d} + (\vec{u} + (-1)\vec{u}) = (-1)\vec{u};$$
then applying (39) and condition 4 on the left-hand side results in
$$\vec{d} = (-1)\vec{u},$$
which shows uniqueness of the negative of $\vec{u}$.

The unique negative of $\vec{u}$ will often be denoted by $-\vec{u}$. Furthermore, we can perform a subtraction of two elements of V by adding one to the negative of the other:
$$\vec{u} - \vec{v} = \vec{u} + (-\vec{v});$$
since V is closed under both addition and scalar multiplication, it follows that $\vec{u} - \vec{v}$ is in V. The same claim can be made about any linear combination of vectors in V.

THEOREM 4.4 If $\vec{u}_1, \ldots, \vec{u}_k$ are elements of a vector space V and $c_1, \ldots, c_k$ are real numbers, then the linear combination
$$c_1\vec{u}_1 + \cdots + c_k\vec{u}_k \tag{40}$$
is in V.

The theorem is a consequence of conditions 1 and 6 of the vector space definition. A formal proof by induction is left as Exercise 13 for the reader.
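EXERCISES

In Exercises 1–6, check whether the set V with the given operations satisfies the specific conditions of the vector space definition on p. 142.

1. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ b_2 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} ka_1 \\ ka_2 \end{bmatrix}$; Conditions 2, 3, 7, 8.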
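2. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 2a_1 + 2b_1 \\ 2a_2 + 2b_2 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 2ka_1 \\ 2ka_2 \end{bmatrix}$; Conditions 2, 3, 4, 7, 8, 9, 10.

3. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1, x_2, x_3 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$; Conditions 7, 8, 9, 10.

4. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1, x_2, x_3 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 + b_3 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} ka_1 \\ ka_2 \\ ka_3 \end{bmatrix}$; Conditions 2, 3, 7, 8.

5. $V = \left\{\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} \,\middle|\, a \in \mathbb{R}\right\}$; $\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} + \begin{bmatrix} b & 0 \\ 0 & b \end{bmatrix} = \begin{bmatrix} a + b & 0 \\ 0 & a + b \end{bmatrix}$; $k\begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} = \begin{bmatrix} ka & 0 \\ 0 & ka \end{bmatrix}$; Conditions 2, 3, 4, 9, 10.

6. $V = \{a_2t^2 + a_1t + a_0 \mid a_0, a_1, a_2 \in \mathbb{R}\}$; $(a_2t^2 + a_1t + a_0) + (b_2t^2 + b_1t + b_0) = a_0 + b_0$; $k(a_2t^2 + a_1t + a_0) = ka_0$; Conditions 2, 3, 8, 9, 10.

In Exercises 7–10, decide whether the given set with the given operations is a vector space. If so, verify all ten conditions of the definition. Otherwise, indicate at least one condition that fails.

7. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ 0 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} ka_1 \\ 0 \end{bmatrix}$.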
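8. $V = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 - b_1 \\ a_2 - b_2 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} k/a_1 \\ k/a_2 \end{bmatrix}$.

9. $V = \left\{\begin{bmatrix} x_1 \\ 0 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{R}\right\}$; $\begin{bmatrix} a_1 \\ 0 \end{bmatrix} + \begin{bmatrix} b_1 \\ 0 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ 0 \end{bmatrix}$; $k\begin{bmatrix} a_1 \\ 0 \end{bmatrix} = \begin{bmatrix} ka_1 \\ 0 \end{bmatrix}$.

10. $V = \left\{\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \,\middle|\, a_{11}, a_{12}, a_{21}, a_{22} \in \mathbb{R}\right\}$; $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$; $k\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$.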
11. * Show that the set $F_X$ defined in Example 4.1, along with the operations of addition and scalar multiplication, satisfies conditions 6–10 of the vector space definition on p. 142.

12. * Show that the set V defined in Example 4.4, along with the operations of addition and scalar multiplication, satisfies all ten conditions of the vector space definition on p. 142.

13. * Prove Theorem 4.4 by induction:
a. Show that for k = 1, the expression (40) is in V.
b. For any k = 2, 3, ..., show that if $c_1\vec{u}_1 + \cdots + c_{k-1}\vec{u}_{k-1}$ is in V, then the expression (40) is in V as well.
(See margin notes next to the proof of Theorem 3.3 for information on proofs by induction.)

14. * Let V and W be vector spaces, defined with specific operations of addition and scalar multiplication. Show that the set of all pairs of vectors from the two spaces $V \times W = \{(\vec{v}, \vec{w}) \mid \vec{v} \in V, \vec{w} \in W\}$ with the operations $(\vec{v}_1, \vec{w}_1) + (\vec{v}_2, \vec{w}_2) = (\vec{v}_1 + \vec{v}_2, \vec{w}_1 + \vec{w}_2)$ and $c(\vec{v}, \vec{w}) = (c\vec{v}, c\vec{w})$ is a vector space. ($V \times W$ is called a Cartesian product of V and W.)

15. * If $V_1, \ldots, V_k$ are vector spaces (defined with specific operations of addition and scalar multiplication), show that their Cartesian product $V_1 \times \cdots \times V_k = \{(\vec{v}_1, \ldots, \vec{v}_k) \mid \vec{v}_i \in V_i \text{ for all } i = 1, \ldots, k\}$ is a vector space (with operations defined as in Exercise 14).

16. * Discuss how the vector space $\mathbb{R}^n$ can be viewed as a Cartesian product of n factors $\mathbb{R} \times \cdots \times \mathbb{R}$ (see Exercise 15). Likewise, discuss how $M_{mn}$ can relate to Cartesian products of $\mathbb{R}^m$ or $\mathbb{R}^n$ spaces.

17. * A vector-valued function of one variable is a function $\vec{r} : \mathbb{R} \to \mathbb{R}^n$ where $\vec{r}(t) = \begin{bmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{bmatrix}$. (Often, such functions are expressed as parametric equations; e.g., $x = f_1(t)$, $y = f_2(t)$, and t is referred to as the parameter.) If each of the component functions $f_i$ is in the vector space $F_{\mathbb{R}}$ (the space of functions defined over all real numbers), check that the set of all vector-valued functions $\vec{r}$ is a Cartesian product of n factors $F_{\mathbb{R}} \times \cdots \times F_{\mathbb{R}}$.
4.2 Subspaces

Some of the examples of vector spaces discussed in the previous section involved subsets of other vector spaces. For instance, the set of polynomials of degree 2 or less, $P_2$, is a subset of the set of all functions defined over $\mathbb{R}$, $F_{\mathbb{R}}$. Moreover, both $P_2$ and $F_{\mathbb{R}}$ had the operations of addition and scalar multiplication defined in the same way. Such a relationship between two vector spaces will be of fundamental importance to us.

DEFINITION (Subspace) If V is a vector space with the given operations of vector addition and scalar multiplication and W is a subset of V that satisfies the conditions
a. W contains the zero vector of V,
b. W is closed under the operation of vector addition, and
c. W is closed under the operation of scalar multiplication,
then W is called a subspace of V.

Clearly, $P_2$ (or, generally, $P_n$ for any n) is a subset of $F_{\mathbb{R}}$ satisfying all three conditions of the definition above; therefore, it is a subspace of $F_{\mathbb{R}}$.

THEOREM 4.5 If W is a subspace of a vector space V, then W is also a vector space.

PROOF
The set W satisfies conditions 1 and 6 of the definition of the vector space automatically as a result of conditions b and c of the definition of the subspace. The remaining eight conditions of the vector space definition also follow, as they hold for all vectors in V, including all those in the subset W (condition a above ensures that the same zero vector used in V can still be used in W for condition 4).
For every vector space V there exist the following two subspaces:
• the subspace composed solely of the zero vector in V and
• the entire V.
You should encounter no difficulty trying to verify that these "extreme" subspaces satisfy conditions a–c of the definition.

EXAMPLE 4.7 Let $W = \left\{\begin{bmatrix} x \\ 2x \end{bmatrix} \,\middle|\, x \in \mathbb{R}\right\}$.

W is the set of all 2-vectors whose second component equals twice the first. In other words, for a vector $\begin{bmatrix} \square \\ \triangle \end{bmatrix}$ to be in W, we must have $\triangle = 2\,\square$. Obviously, W is a subset of $\mathbb{R}^2$ (which can be written $W \subseteq \mathbb{R}^2$).

We will show that W, taken together with the usual operations of vector addition and scalar multiplication, i.e.,
$$\begin{bmatrix} x \\ 2x \end{bmatrix} + \begin{bmatrix} y \\ 2y \end{bmatrix} = \begin{bmatrix} x + y \\ 2x + 2y \end{bmatrix} \quad\text{and}\quad c\begin{bmatrix} x \\ 2x \end{bmatrix} = \begin{bmatrix} cx \\ (c)(2x) \end{bmatrix},$$
satisfies the three conditions required of a subspace.

Condition a. Holds since $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ is in W. (Take $x = 0$ in $\begin{bmatrix} x \\ 2x \end{bmatrix}$.)

Condition b. Adding two elements of W, $\begin{bmatrix} x \\ 2x \end{bmatrix}$ and $\begin{bmatrix} y \\ 2y \end{bmatrix}$, we obtain $\begin{bmatrix} x + y \\ 2x + 2y \end{bmatrix}$. Since $2x + 2y = 2(x + y)$, it follows that $\begin{bmatrix} x + y \\ 2x + 2y \end{bmatrix}$ is in W. Consequently, W is closed under vector addition.

Condition c. Taking a scalar multiple of a vector in W yields
$$k\begin{bmatrix} x \\ 2x \end{bmatrix} = \begin{bmatrix} kx \\ k(2x) \end{bmatrix},$$
which is in W (again, because $k(2x) = 2(kx)$), making W closed under scalar multiplication.

We conclude that W is a subspace of $\mathbb{R}^2$.
EXAMPLE 4.8 Let us check whether $W = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_1x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$ is a subspace of $\mathbb{R}^3$.

W is a subset of $\mathbb{R}^3$. Consider the conditions of the definition of a subspace.

Condition a. $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ is in W. (Take $x_1 = x_2 = 0$ in $\begin{bmatrix} x_1 \\ x_2 \\ x_1x_2 \end{bmatrix}$.)

Condition b. Take two arbitrary 3-vectors in W: $\begin{bmatrix} x_1 \\ x_2 \\ x_1x_2 \end{bmatrix}$ and $\begin{bmatrix} y_1 \\ y_2 \\ y_1y_2 \end{bmatrix}$. Their sum,
$$\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_1x_2 + y_1y_2 \end{bmatrix},$$
generally does not have $x_1x_2 + y_1y_2$ equal to the product $(x_1 + y_1)(x_2 + y_2)$ (e.g., take $x_1 = x_2 = y_1 = y_2 = 1$). This means that the result of our summation is outside W.

Since W is not closed under the operation of addition, it cannot be a subspace of $\mathbb{R}^3$. (There is no need to check condition c.)
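The counterexample suggested in Example 4.8 ($x_1 = x_2 = y_1 = y_2 = 1$) can be replayed in a few lines. This Python sketch is illustrative only; the membership test `in_W` is a hypothetical helper that encodes the defining property "third component equals the product of the first two".

```python
def in_W(vec):
    """Membership test for W: the third component must equal x1 * x2."""
    x1, x2, x3 = vec
    return x3 == x1 * x2

a = (1, 1, 1)   # x1 = x2 = 1, so the third component is 1 * 1
b = (1, 1, 1)   # y1 = y2 = 1
total = tuple(p + q for p, q in zip(a, b))  # usual componentwise addition

print(in_W(a), in_W(b), total, in_W(total))
```

Both summands pass the membership test, while their sum does not, so closure under addition fails for this particular pair.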
EXAMPLE 4.9 $W = \left\{\begin{bmatrix} a & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \,\middle|\, a \in \mathbb{R}\right\}$ is not a subspace of $M_{23}$, as it fails to meet condition a of the definition of a subspace: $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ is not in W.

The null space
Recall from Section 2.2 that a homogeneous system of m equations in n unknowns (i.e., a system with all right-hand side values equal to zero) has either one solution (the trivial solution, made up of all zeros) or many solutions (the trivial one as well as nontrivial ones). The following example introduces an important vector space related to any such system.
EXAMPLE 4.10 Given an m × n matrix A, the solution set of the homogeneous system
$$A\vec{x} = \vec{0}$$
is a subspace of $\mathbb{R}^n$.

Denoting the solution set by W, it is clear that W is a subset of $\mathbb{R}^n$. Let us proceed to verify that W satisfies the three conditions for a subspace:

Condition a. The zero vector in $\mathbb{R}^n$, $\vec{0}$, is guaranteed to be a solution of the system (called the trivial solution).

Condition b. To show that W is closed under vector addition, consider two vectors $\vec{x}$ and $\vec{y}$ in W. Since both these vectors are among the solutions of our system, we have
$$A\vec{x} = \vec{0} \quad\text{and}\quad A\vec{y} = \vec{0}.$$
Adding the equations, we obtain
$$A\vec{x} + A\vec{y} = \vec{0},$$
which can be rewritten as
$$A(\vec{x} + \vec{y}) = \vec{0},$$
showing that $\vec{x} + \vec{y}$ is also a solution of our system and is therefore in W. (Margin note: We use properties of matrix multiplication from Theorem 1.5.)

Condition c. We can show that W is closed under scalar multiplication in a similar manner. Considering a scalar multiple of a solution $\vec{x}$, we can write
$$A(c\vec{x}) = c(A\vec{x}) = c\vec{0} = \vec{0},$$
demonstrating that $c\vec{x}$ is in W.

Since the solution set of the homogeneous system $A\vec{x} = \vec{0}$ with an m × n matrix A is a subspace of $\mathbb{R}^n$, it is also a vector space. We will refer to this space as the solution space of the homogeneous system, or the null space of the matrix A.
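The three subspace conditions for a null space can be watched in action on a small instance. In the Python sketch below, the matrix `A`, the solutions `x` and `y`, and the scalar `c` are sample values chosen for illustration (they do not come from the text), and `matvec` is a hypothetical helper for the matrix-vector product.

```python
def matvec(matrix, vec):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

# Sample 2 x 3 matrix (an assumption for illustration)
A = [[1, 2, 1],
     [2, 4, 2]]

zero = [0, 0, 0]
x = [-2, 1, 0]   # one solution of A x = 0
y = [-1, 0, 1]   # another solution of A x = 0
c = 7            # arbitrary scalar

x_plus_y = [p + q for p, q in zip(x, y)]
c_times_x = [c * p for p in x]

print(matvec(A, zero))       # condition a: the zero vector is a solution
print(matvec(A, x_plus_y))   # condition b: the sum of two solutions
print(matvec(A, c_times_x))  # condition c: a scalar multiple of a solution
```

Each product comes out as the zero vector of $\mathbb{R}^2$, matching conditions a, b, and c for this particular choice of data.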
Indexed sets of vectors
The concept of a set is fundamental in mathematics. Here are just two examples of sets we have already dealt with in this book:
• the n-space $\mathbb{R}^n$ was defined as a set of all n-vectors (p. 2),
• the solution set of a linear system was defined as a set of all solutions of the given system (p. 62).

Rearranging the elements of the set changes nothing, and neither does repeating an element: e.g., the sets $\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$, $\left\{\begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}$, and $\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}$ are identical.

This is often desirable; e.g., when considering the solution set of a linear system, one is usually not interested in listing the solutions in any prescribed order. However, in the remaining portion of this book, we will often be required to keep track of both the order and the count of elements on the list. We will refer to such a list as an indexed set [11]. Curly braces will be used to surround elements of an indexed set, just as is done for ordinary sets, but you should keep the differences between the two concepts clear in your mind:
Set | Indexed set
Number of times an element appears is irrelevant: {a, a, b} = {a, b} | Number of times an element appears matters: {a, a, b} ≠ {a, b}
Order of elements is irrelevant: {a, b} = {b, a} | Order of elements matters: {a, b} ≠ {b, a}
If S and T are both indexed sets, we shall say S is a subset of T (denoting it $S \subseteq T$) if S can be obtained from T by deleting a certain number of its elements (possibly zero). Additionally, we will sometimes merge (i.e., concatenate) two indexed sets $\{\vec{u}_1, \ldots, \vec{u}_i\}$ and $\{\vec{v}_1, \ldots, \vec{v}_j\}$ to form another indexed set $\{\vec{u}_1, \ldots, \vec{u}_i, \vec{v}_1, \ldots, \vec{v}_j\}$.

In many places, we will explicitly refer to indexed sets for emphasis. However, for simplicity, we adopt the following convention for all sets in the remainder of this book:
• every set with a finite number of elements (e.g., $\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$) will be implicitly understood to be an indexed set (i.e., repeating or rearranging elements would create a different set),
• every set with an infinite number of elements (e.g., $\left\{\begin{bmatrix} x \\ 0 \end{bmatrix} \,\middle|\, x \in \mathbb{R}\right\}$) will be implied to be an ordinary set (i.e., repeating or rearranging elements makes no difference).

[11] Other possible terminologies used elsewhere are sequence, ordered collection, or tuple.
Subspaces spanned by sets of vectors

Given vectors $\vec{u}_1, \ldots, \vec{u}_k$ in an abstract vector space V, we can form a linear combination using the same formula as on p. 6:
$$c_1\vec{u}_1 + \cdots + c_k\vec{u}_k.$$
Based on Theorem 4.4, any such linear combination remains in V. We now introduce new terminology to refer to a set composed of all possible linear combinations (with varying scalar values) of the given vectors.

[Figure: a set S of vectors u1, u2, u3 in V, and span S, which contains 0 and linear combinations such as 3u2 and u1 + 4u3.]

DEFINITION (Span) The set of all linear combinations of vectors $\vec{u}_1, \ldots, \vec{u}_k$ in a vector space V is called the span of $\vec{u}_1, \ldots, \vec{u}_k$ and is denoted by
$$\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}.$$

Here is an important property of span.

THEOREM 4.6 Let $\vec{u}_1, \ldots, \vec{u}_k$ be vectors in a vector space V. Then $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$ is a subspace of V.

PROOF
From Theorem 4.4 it follows that $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$ is a subset of V. Let us demonstrate that it satisfies the three conditions for a subspace:

Condition a. Taking $c_1 = \cdots = c_k = 0$, we have
$$\begin{aligned} 0\vec{u}_1 + \cdots + 0\vec{u}_k &= \vec{0} + \cdots + \vec{0} && \text{(part (a) of Theorem 4.2)} \\ &= \vec{0} && \text{(condition 4 of the vector space definition, applied repeatedly)} \end{aligned}$$
as one of the vectors in $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$.

Condition b. Consider two vectors, $\vec{v}$ and $\vec{w}$, in $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$:
$$\begin{aligned} \vec{v} &= c_1\vec{u}_1 + \cdots + c_k\vec{u}_k, \\ \vec{w} &= d_1\vec{u}_1 + \cdots + d_k\vec{u}_k. \end{aligned}$$
Adding these equations and repeatedly applying conditions 2, 3, and 8 of the vector space definition on the right-hand side, we get
$$\vec{v} + \vec{w} = (c_1 + d_1)\vec{u}_1 + \cdots + (c_k + d_k)\vec{u}_k.$$
We conclude that $\vec{v} + \vec{w}$ remains in $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$.

Condition c. Taking a scalar multiple of a vector
$$\vec{v} = c_1\vec{u}_1 + \cdots + c_k\vec{u}_k,$$
we obtain
$$d\vec{v} = d(c_1\vec{u}_1 + \cdots + c_k\vec{u}_k).$$
Repeatedly applying conditions 7 and 9 of the vector space definition, we can write
$$d\vec{v} = dc_1\vec{u}_1 + \cdots + dc_k\vec{u}_k;$$
thus $\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\}$ is closed under scalar multiplication.
So far, the word “span” has been used solely as a noun. However, we shall also occasionally find it useful as a verb.
Let $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ be a set of vectors $\vec{u}_1, \ldots, \vec{u}_k$ in a vector space V. Saying
"$\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_k\} = V$" or "span S = V" ("span" as a noun)
is equivalent to saying
"The vectors $\vec{u}_1, \ldots, \vec{u}_k$ span the space V" or "S spans V" ("span" as a verb).

EXAMPLE 4.11 Recall Example 4.7, in which we have demonstrated that $W = \left\{\begin{bmatrix} x \\ 2x \end{bmatrix} \,\middle|\, x \in \mathbb{R}\right\}$ is a subspace of $\mathbb{R}^2$.

Another (quicker) way to do so would be to notice that
$$W = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\},$$
which, by Theorem 4.6, implies that W is a subspace of $\mathbb{R}^2$.

Geometrically, the space W contains all the vectors parallel to the vector $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, i.e., the entire straight line $\vec{x} = t\begin{bmatrix} 1 \\ 2 \end{bmatrix}$.
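EXAMPLE 4.12 The set
$$W = \operatorname{span}\left\{\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}}_{\vec{u}_1}, \underbrace{\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}}_{\vec{u}_2}, \underbrace{\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}}_{\vec{u}_3}\right\}$$
is a subspace of $M_{22}$ (the space of all 2 × 2 matrices).

This subspace contains all symmetric 2 × 2 matrices since any such matrix can be expressed as a linear combination of the vectors $\vec{u}_1$, $\vec{u}_2$, and $\vec{u}_3$:
$$\underbrace{\begin{bmatrix} a & b \\ b & c \end{bmatrix}}_{\text{arbitrary } 2 \times 2 \text{ symmetric matrix}} = \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & b \\ b & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & c \end{bmatrix} = a\vec{u}_1 + b\vec{u}_2 + c\vec{u}_3.$$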
EXAMPLE 4.13 The set W of polynomials p(t) of degree 2 or less such that p(0) = 0 is a subspace of $P_2$. This is because any such polynomial can be written as
$$p(t) = a_2t^2 + a_1t + \underbrace{0}_{\text{required for } p(0) = 0}$$
where $a_1$ and $a_2$ are arbitrary, so that $W = \operatorname{span}\{t, t^2\}$.

Note that a "slight" modification of the definition of our set can lead to dramatic results. The set $W_1$ of polynomials of degree 2 or less such that p(0) = 1 (rather than p(0) = 0 used in W) is not a subspace of $P_2$, as it violates all three conditions (a–c) of the definition of subspace. E.g., condition a fails as the zero polynomial in $P_2$, $p(t) \equiv 0$, is not in $W_1$.
When do the vectors span the entire space?

In the last three examples, we have seen sets that span
• a subspace of $\mathbb{R}^2$ (containing vectors along a given line passing through the origin),
• a subspace of $M_{22}$ (containing all symmetric 2 × 2 matrices), and
• a subspace of $P_2$ (containing polynomials whose graphs pass through the origin).

In each of those cases, the resulting subspace W was based on a proper subset of the respective space V ($\mathbb{R}^2$, $M_{22}$, or $P_2$). Sometimes, however, we would like to ask a different question:

Does the given set of vectors span the entire vector space V? (41)
EXAMPLE 4.14 Do the vectors $\vec{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ span $\mathbb{R}^3$?

SOLUTION
Recall that saying "vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ span $\mathbb{R}^3$" is equivalent to saying "$\operatorname{span}\{\vec{u}, \vec{v}, \vec{w}\} = \mathbb{R}^3$".

The notation $\operatorname{span}\{\vec{u}, \vec{v}, \vec{w}\}$ represents the set of all possible linear combinations
$$c_1\vec{u} + c_2\vec{v} + c_3\vec{w}. \tag{42}$$
For this set to be identical to $\mathbb{R}^3$, it must be true that every vector $\vec{x}$ in $\mathbb{R}^3$ can be represented as a linear combination (42):
$$c_1\vec{u} + c_2\vec{v} + c_3\vec{w} = \vec{x}. \tag{43}$$
We can endow $\vec{x} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}$ with components $d_1$, $d_2$, and $d_3$ which are arbitrary real numbers. The vector equation (43) can be rewritten as a linear system
$$\begin{array}{rcrcrcl} c_1 & & & + & c_3 & = & d_1 \\ & & c_2 & + & c_3 & = & d_2 \\ c_1 & - & c_2 & + & c_3 & = & d_3 \end{array}$$
We are now on familiar turf since solving linear systems has been covered in great detail in Section 2.1. Let us proceed to solve the system applying elementary row operations to the augmented matrix
$$\left[\begin{array}{rrr|c} 1 & 0 & 1 & d_1 \\ 0 & 1 & 1 & d_2 \\ 1 & -1 & 1 & d_3 \end{array}\right].$$
We must be very careful to carry out the appropriate algebraic operations on the right-hand side involving the arbitrary values.

$r_3 - r_1 \to r_3$ yields $\left[\begin{array}{rrr|c} 1 & 0 & 1 & d_1 \\ 0 & 1 & 1 & d_2 \\ 0 & -1 & 0 & -d_1 + d_3 \end{array}\right]$,

$r_3 + r_2 \to r_3$ yields $\left[\begin{array}{rrr|c} 1 & 0 & 1 & d_1 \\ 0 & 1 & 1 & d_2 \\ 0 & 0 & 1 & -d_1 + d_2 + d_3 \end{array}\right]$,

$r_2 - r_3 \to r_2$ yields $\left[\begin{array}{rrr|c} 1 & 0 & 1 & d_1 \\ 0 & 1 & 0 & d_1 - d_3 \\ 0 & 0 & 1 & -d_1 + d_2 + d_3 \end{array}\right]$,

and, finally, $r_1 - r_3 \to r_1$ yields $\left[\begin{array}{rrr|c} 1 & 0 & 0 & 2d_1 - d_2 - d_3 \\ 0 & 1 & 0 & d_1 - d_3 \\ 0 & 0 & 1 & -d_1 + d_2 + d_3 \end{array}\right]$.

The reduced row echelon form contains no row $[\,0 \cdots 0 \mid \text{nonzero}\,]$ that would render the system inconsistent. Consequently, the system and the corresponding vector equation (43) possess solutions $\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$ for all 3-vectors $\vec{x} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}$. We conclude that our vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ span $\mathbb{R}^3$.
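The right-hand column of the reduced row echelon form in Example 4.14 gives explicit formulas for the coefficients: $c_1 = 2d_1 - d_2 - d_3$, $c_2 = d_1 - d_3$, $c_3 = -d_1 + d_2 + d_3$. As a sanity check, the Python sketch below (an illustration; the helper names `coefficients` and `combine` are hypothetical) substitutes these coefficients back into $c_1\vec{u} + c_2\vec{v} + c_3\vec{w}$ for randomly chosen integer targets.

```python
import random

u = (1, 0, 1)
v = (0, 1, -1)
w = (1, 1, 1)

def coefficients(d1, d2, d3):
    """Coefficients c1, c2, c3 read off the r.r.e.f. in Example 4.14."""
    return (2 * d1 - d2 - d3, d1 - d3, -d1 + d2 + d3)

def combine(c1, c2, c3):
    """The linear combination c1*u + c2*v + c3*w, computed componentwise."""
    return tuple(c1 * a + c2 * b + c3 * c for a, b, c in zip(u, v, w))

random.seed(1)
targets = [tuple(random.randint(-9, 9) for _ in range(3)) for _ in range(100)]
all_reached = all(combine(*coefficients(*d)) == d for d in targets)
print(all_reached)
```

Integer arithmetic keeps the comparison exact, so a mismatch in any of the three formulas would show up immediately.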
EXAMPLE 4.15 Given the polynomials (or "vectors") in $P_3$, $p_1(t) = t + t^3$, $p_2(t) = -1 + t^2$, and $p_3(t) = t$, let us determine whether $p_1$, $p_2$, $p_3$ span the entire $P_3$.

For this to happen we would need to demonstrate that for all real numbers $d_1$, $d_2$, $d_3$, and $d_4$ we can find $c_1$, $c_2$, and $c_3$ such that
$$c_1p_1(t) + c_2p_2(t) + c_3p_3(t) = d_1 + d_2t + d_3t^2 + d_4t^3. \tag{44}$$
Note that the right-hand side of equation (44) contains an arbitrary polynomial in $P_3$. Substituting the given expressions for the three polynomials,
$$c_1(t + t^3) + c_2(-1 + t^2) + c_3t = d_1 + d_2t + d_3t^2 + d_4t^3,$$
and collecting like terms, we get
$$-c_2 + (c_1 + c_3)t + (c_2)t^2 + (c_1)t^3 = d_1 + d_2t + d_3t^2 + d_4t^3.$$
Two polynomials are equal for all t only if the coefficients in front of the same powers of t on both sides are equal. This results in the linear system
$$\begin{array}{rcrcrcl} & - & c_2 & & & = & d_1 \\ c_1 & & & + & c_3 & = & d_2 \\ & & c_2 & & & = & d_3 \\ c_1 & & & & & = & d_4 \end{array}$$
The augmented matrix is
$$\left[\begin{array}{rrr|c} 0 & -1 & 0 & d_1 \\ 1 & 0 & 1 & d_2 \\ 0 & 1 & 0 & d_3 \\ 1 & 0 & 0 & d_4 \end{array}\right].$$
Proceeding strictly according to the pivoting sequence would involve a number of row operations ($r_1 \leftrightarrow r_2$; $r_4 - r_1 \to r_4$; etc.). However, a nice shortcut is available if we choose to perform the row operation $r_3 + r_1 \to r_3$ instead, as it yields
$$\left[\begin{array}{rrr|c} 0 & -1 & 0 & d_1 \\ 1 & 0 & 1 & d_2 \\ 0 & 0 & 0 & d_1 + d_3 \\ 1 & 0 & 0 & d_4 \end{array}\right].$$
Even though this matrix is not yet in r.r.e.f. (or even in r.e.f.), the third row can become $[\,0 \cdots 0 \mid \text{nonzero}\,]$ if $d_1 + d_3 \neq 0$; therefore, our system, and equation (44), can become inconsistent. As a result, our three polynomials do not span $P_3$.
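The conclusion of Example 4.15 can be illustrated with specific right-hand sides. The Python sketch below is one possible way to do this, assuming exact rational arithmetic via `fractions.Fraction`; the helpers `rref_pivots` and `in_span` are hypothetical names, and the two target polynomials are sample choices (one with $d_1 + d_3 \neq 0$, one with $d_1 + d_3 = 0$).

```python
from fractions import Fraction

def rref_pivots(rows):
    """Reduce a matrix to r.r.e.f. (exact arithmetic) and return its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column, at or below row r
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        pivot = m[r][col]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return pivots

def in_span(coefficient_matrix, target):
    """True if the system [coefficient_matrix | target] is consistent."""
    augmented = [row + [t] for row, t in zip(coefficient_matrix, target)]
    # The system is inconsistent exactly when the last column holds a pivot
    return len(coefficient_matrix[0]) not in rref_pivots(augmented)

# Columns hold the coefficients of p1, p2, p3; rows correspond to 1, t, t^2, t^3
A = [[0, -1, 0],
     [1, 0, 1],
     [0, 1, 0],
     [1, 0, 0]]

constant_one = [1, 0, 0, 0]    # the polynomial 1, with d1 + d3 = 1
reachable = [1, 2, -1, 3]      # 1 + 2t - t^2 + 3t^3, with d1 + d3 = 0

print(in_span(A, constant_one), in_span(A, reachable))
```

A single unreachable polynomial is enough to show that the three polynomials do not span $P_3$; the second target shows that polynomials with $d_1 + d_3 = 0$ can still be reached.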
THEOREM 4.7 Let $\vec{u}_1, \ldots, \vec{u}_k, \vec{u}_{k+1}, \ldots, \vec{u}_n$ be vectors in a vector space V. If the vectors $\vec{u}_1, \ldots, \vec{u}_k$ span V, then the vectors $\vec{u}_1, \ldots, \vec{u}_k, \vec{u}_{k+1}, \ldots, \vec{u}_n$ also span V.

PROOF
From the assumption that $\vec{u}_1, \ldots, \vec{u}_k$ span V, it follows that for every vector $\vec{v}$ in V there exist real numbers $c_1, \ldots, c_k$ such that $\vec{v} = c_1\vec{u}_1 + \cdots + c_k\vec{u}_k$. Consequently, we can write $\vec{v} = c_1\vec{u}_1 + \cdots + c_k\vec{u}_k + 0\vec{u}_{k+1} + \cdots + 0\vec{u}_n$, which allows us to conclude that $\vec{u}_1, \ldots, \vec{u}_k, \vec{u}_{k+1}, \ldots, \vec{u}_n$ also span V.
EXAMPLE 4.16
a. In Example 4.14, we have shown that the vectors $\vec{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ span $\mathbb{R}^3$. By Theorem 4.7, it follows that vectors $\vec{u}$, $\vec{v}$, $\vec{w}$, $\vec{z} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ span $\mathbb{R}^3$ as well.

b. The vectors in $P_3$ (polynomials), $p_1(t) = t + t^3$, $p_2(t) = -1 + t^2$, and $p_3(t) = t$, do not span $P_3$ according to Example 4.15; hence $p_1(t) = t + t^3$ and $p_2(t) = -1 + t^2$ do not span $P_3$ either (if they did, it would contradict the result of Theorem 4.7).
Linear Algebra Toolkit implementation
You will probably be glad to learn that the Linear Algebra Toolkit includes a module designed to solve problems of the type (41). Try to invoke this module for the vectors in the recent examples to see how it works. You might notice some discrepancies between the solutions generated by the Toolkit and the solutions above:
• The Toolkit follows the pivoting sequence introduced in our coverage of Gauss-Jordan reduction in Section 2.1, always going all the way to the r.r.e.f. In doing so, it is unable to take advantage of the various shortcuts, such as the one seen in Example 4.15.
• The Toolkit does not accept variables as entries in matrices. Consequently, it cannot use $d_1$, $d_2$, etc., as the right-hand side values. Let us use the vectors from Example 4.14 to illustrate the work-around used. The vector equation (43) can be written as
$$c_1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = d_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + d_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + d_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Performing an elementary row operation on the "enlarged" augmented matrix with number entries,
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & -1 & 1 & 0 & 0 & 1 \end{array}\right],$$
e.g., $r_3 - r_1 \to r_3$, results in
$$\left[\begin{array}{rrr|rrr} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & -1 & 0 & -1 & 0 & 1 \end{array}\right],$$
which is equivalent to performing the operations using $d_1$, $d_2$, and $d_3$.
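To see why the enlarged matrix is equivalent to working with $d_1$, $d_2$, $d_3$, one can read each row of the right-hand block as the coefficients of $d_1$, $d_2$, $d_3$. The Python sketch below is only an illustration of that idea (it says nothing about how the Toolkit itself is implemented); the helper names and the sample values in `d` are assumptions.

```python
# Enlarged augmented matrix for the vectors of Example 4.14
enlarged = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, -1, 1, 0, 0, 1],
]

def subtract_row(matrix, target, source):
    """Return a copy of matrix after r_target - r_source -> r_target (0-based rows)."""
    result = [row[:] for row in matrix]
    result[target] = [a - b for a, b in zip(matrix[target], matrix[source])]
    return result

def evaluate_rhs(matrix, d):
    """Read the last three columns of each row as coefficients of d1, d2, d3."""
    return [sum(coef * value for coef, value in zip(row[3:], d)) for row in matrix]

after = subtract_row(enlarged, 2, 0)   # the row operation r3 - r1 -> r3

# Compare with the same operation on an ordinary augmented matrix
d = (5, -2, 7)  # arbitrary sample values for d1, d2, d3
numeric = [row[:3] + [value] for row, value in zip(enlarged, d)]
numeric_after = subtract_row(numeric, 2, 0)

same = evaluate_rhs(after, d) == [row[3] for row in numeric_after]
print(after[2], same)
```

The third row of the right-hand block becomes $[-1 \;\; 0 \;\; 1]$, i.e., $-d_1 + d_3$, which is the entry obtained in Example 4.14 after the same row operation.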
EXERCISES

In Exercises 1–6, decide whether the given set of 2-vectors, with the usual operations, is a valid subspace of $\mathbb{R}^2$.

1. $W = \left\{\begin{bmatrix} x_1 \\ x_1 + 3 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{R}\right\}$.
2. $W = \left\{\begin{bmatrix} x_1 \\ x_1 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{R}\right\}$.
3. $W = \left\{\begin{bmatrix} 0 \\ x_1 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{R}\right\}$.
4. $W = \left\{\begin{bmatrix} x_1 \\ 1 \end{bmatrix} \,\middle|\, x_1 \in \mathbb{R}\right\}$.
5. $W = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R} \text{ and } x_1 \geq 0\right\}$.
6. $W = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R} \text{ and } x_1x_2 \leq 0\right\}$.

In Exercises 7–10, decide whether the given set of 3-vectors, with the usual operations, is a valid subspace of $\mathbb{R}^3$.

7. $W = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_1 - x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$.
8. $W = \left\{\begin{bmatrix} x_1 \\ 0 \\ x_2 \end{bmatrix} \,\middle|\, x_1, x_2 \in \mathbb{R}\right\}$.
9. $W = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1, x_2, x_3 \in \mathbb{R} \text{ and } x_1 + 2x_2 - x_3 = 0\right\}$.
10. $W = \left\{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1, x_2, x_3 \in \mathbb{R} \text{ and } x_1 - x_2 + x_3 = 1\right\}$.

11. Is the set $W = \{a + at + at^2 \mid a \in \mathbb{R}\}$, with the usual operations in $P_2$, a subspace of $P_2$?
12. Is the set $W = \{a + t \mid a \in \mathbb{R}\}$, with the usual operations in $P_1$, a subspace of $P_1$?
13. Is the set $W = \left\{\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \,\middle|\, a \in \mathbb{R}\right\}$, with the usual operations in $M_{22}$, a subspace of $M_{22}$?
14. Is the set $W = \left\{\begin{bmatrix} 0 & a_1 & 0 \\ a_2 & 0 & a_3 \end{bmatrix} \,\middle|\, a_1, a_2, a_3 \in \mathbb{R} \text{ and } a_1 + a_2 = a_3\right\}$, with the usual operations in $M_{23}$, a subspace of $M_{23}$?
15. Is the set of 3 × 4 matrices in reduced row echelon form, with the usual operations in $M_{34}$, a subspace of $M_{34}$?
16. Is the set of lower triangular 2 × 2 matrices, with the usual operations in $M_{22}$, a subspace of $M_{22}$?
17. Is the set of functions continuous on $\mathbb{R}$, with the usual operations in $F_{\mathbb{R}}$, a subspace of $F_{\mathbb{R}}$?
18. Is the set of functions defined on $\mathbb{R}$ with a discontinuity at 0, with the usual operations in $F_{\mathbb{R}}$, a subspace of $F_{\mathbb{R}}$?
19. Is the set of nondecreasing functions defined on $\mathbb{R}$, with the usual operations in $F_{\mathbb{R}}$, a subspace of $F_{\mathbb{R}}$?
20. Is the set of functions defined on $\mathbb{R}$ that are periodic with a period 2π, with the usual operations in $F_{\mathbb{R}}$, a subspace of $F_{\mathbb{R}}$?

In Exercises 21–30, decide whether the given vectors span the vector space. Follow the procedure introduced in Examples 4.14 and 4.15. Do not use technology while solving these exercises; however, after you solve the exercise, you may want to compare your solution to the one generated by the Linear Algebra Toolkit.

21. Do the vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ span $\mathbb{R}^2$?
22. Do the vectors $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ span $\mathbb{R}^2$?
23. Does the set $\left\{\begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}\right\}$ span $\mathbb{R}^3$?
24. Does the set $\left\{\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}\right\}$ span $\mathbb{R}^3$?
25. Do the vectors $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ span $\mathbb{R}^3$?
26. Do the vectors $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -3 \\ 2 \\ 2 \end{bmatrix}$, and $\begin{bmatrix} 5 \\ 2 \\ 4 \end{bmatrix}$ span $\mathbb{R}^3$?
27. Does the set $\{1 - 3t^2, t, -1 + 2t^2\}$ span $P_2$?
28. Does the set $\{1 + t^2 + t^3, t - 2t^2 + t^3\}$ span $P_3$?
29. Does the set $\left\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}\right\}$ span $M_{22}$?
30. Does the set $\left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right\}$ span $M_{22}$?

T/F? In Exercises 31–34, decide whether each statement is true or false. Justify your answer.

31. Every subspace of $\mathbb{R}^3$ contains infinitely many vectors.
32. If $\vec{u}$ and $\vec{v}$ are vectors in a vector space V and $\vec{0}$ is the zero vector in V, then $\operatorname{span}\{\vec{u}, \vec{v}, \vec{0}\} = \operatorname{span}\{\vec{u}, \vec{v}\}$.
33. If $\vec{u}$, $\vec{v}$, and $\vec{w}$ are vectors in a vector space V, then $\operatorname{span}\{\vec{u}, \vec{v}\}$ is a subspace of $\operatorname{span}\{\vec{u}, \vec{v}, \vec{w}\}$.
34. Any set of 3 × 5 matrices spans a subspace of $M_{35}$.

35. * Let $\vec{u}_1, \ldots, \vec{u}_{k-1}, \vec{u}_k$ be vectors in $\mathbb{R}^n$ and let A be an n × n matrix.
a. Show that if $\vec{u}_k \in \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_{k-1}\}$, then $A\vec{u}_k \in \operatorname{span}\{A\vec{u}_1, \ldots, A\vec{u}_{k-1}\}$.
b. Show that it is not generally true that if $\vec{u}_k \notin \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_{k-1}\}$, then $A\vec{u}_k \notin \operatorname{span}\{A\vec{u}_1, \ldots, A\vec{u}_{k-1}\}$. (Hint: Find an A for which the statement is never true.)
c. Show that if A is nonsingular and $\vec{u}_k \notin \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_{k-1}\}$, then $A\vec{u}_k \notin \operatorname{span}\{A\vec{u}_1, \ldots, A\vec{u}_{k-1}\}$.

36. * Let W be a subspace of a vector space V and let $\vec{u}$ and $\vec{v}$ be vectors in V such that $\vec{u}, \vec{v} \notin W$. Is it always true that $\vec{u} + \vec{v} \notin W$?

37. * Let A be an m × n matrix and let B be an n × p matrix. Show that the null space of B is a subspace of the null space of AB. Find examples of 2 × 2 matrices A and B and a 2-vector $\vec{v}$ such that $AB\vec{v} = \vec{0}$ but $B\vec{v} \neq \vec{0}$.

38. * Let A and B be n × n matrices such that A is nonsingular. Show that the null spaces of B and AB are identical.

39. * Let $W_1$ and $W_2$ be subspaces of a vector space V. The sum of these subspaces, $W_1 + W_2$, is defined to be the set of all vectors that can be expressed as a sum of vectors in the two subspaces:
$$W_1 + W_2 = \{\vec{w}_1 + \vec{w}_2 \mid \vec{w}_1 \in W_1;\ \vec{w}_2 \in W_2\}. \tag{45}$$
Show that $W_1 + W_2$ is a subspace of V.

40. * Let $V_1$ be a subspace of V and let $W_1$ be a subspace of W. Show that the Cartesian product $V_1 \times W_1$ is a subspace of $V \times W$. (See Exercise 14 on p. 150 for the definition of the Cartesian product of vector spaces.)
41.
a. * Let $W_1$ and $W_2$ be subspaces of $V$. Show that the intersection
$$W_1 \cap W_2 = \{\vec{u} \mid \vec{u} \in W_1 \text{ and } \vec{u} \in W_2\}$$
is also a subspace of $V$.
b. * Given the homogeneous system
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 0 \\ 2x_1 - x_2 - 3x_3 &= 0 \end{aligned}$$
illustrate how the solution space of this system (a line) is the intersection of the solution spaces of the individual equations (planes).
4.3 Linear Independence

DEFINITION Let $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ be an indexed set of vectors in a vector space $V$. We say that $S$ is linearly independent if the only real numbers $c_1, \ldots, c_k$ that satisfy the equation
$$c_1\vec{u}_1 + \cdots + c_k\vec{u}_k = \vec{0}$$
are $c_1 = \cdots = c_k = 0$. Otherwise (i.e., if at least some $c_i$'s can be nonzero), $S$ is said to be linearly dependent.
EXAMPLE 4.17 Consider the vectors
$$\vec{u}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix},\quad \vec{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix},\quad \vec{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix}.$$
The equation
$$c_1\begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
corresponds to the linear system
$$\begin{aligned} c_1 \phantom{{}+2c_2} + c_3 &= 0 \\ 2c_1 + c_2 + c_3 &= 0 \\ c_1 + 2c_2 \phantom{{}+c_3} &= 0 \\ c_2 - c_3 &= 0 \end{aligned}$$
whose augmented matrix
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 2 & 1 & 1 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{array}\right]$$
has the reduced row echelon form (check!):
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right],$$
making $c_1 = c_2 = c_3 = 0$ the only solution. Therefore, the set $S = \{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$ is linearly independent (abbreviated L.I.).
EXAMPLE 4.18 Consider the vectors
$$\vec{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix},\quad \vec{v}_2 = \begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix},\quad \vec{v}_3 = \begin{bmatrix} 3 \\ 3 \\ -4 \\ 1 \end{bmatrix}.$$
Proceeding as in the previous example, we leave it for the reader to check that the equation
$$c_1\begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 3 \\ 3 \\ -4 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
can be rewritten as a linear system whose augmented matrix
$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 0 \\ -1 & 2 & 3 & 0 \\ 2 & -3 & -4 & 0 \\ 1 & 0 & 1 & 0 \end{array}\right]$$
has the reduced row echelon form
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
There are infinitely many solutions $c_1 = -c_3$, $c_2 = -2c_3$, with $c_3$ arbitrary. Consequently, the set $T = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is linearly dependent (abbreviated L.D.).
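The procedure of Examples 4.17 and 4.18 can be sketched in a few lines of Python. This is only an illustration, not part of the text's own toolkit: the helper names `rref` and `is_linearly_independent` are hypothetical, and exact fractions are used so that no rounding affects the row reduction. The vectors are placed in the columns of a coefficient matrix, and the set is declared L.I. exactly when every column receives a leading entry.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # column c gets no leading entry
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def is_linearly_independent(vectors):
    """L.I. exactly when each column of [u1 ... uk] contains a leading entry."""
    matrix = [list(row) for row in zip(*vectors)]  # the vectors become columns
    _, pivots = rref(matrix)
    return len(pivots) == len(vectors)

u = [[1, 2, 1, 0], [0, 1, 2, 1], [1, 1, 0, -1]]    # Example 4.17
v = [[1, -1, 2, 1], [1, 2, -3, 0], [3, 3, -4, 1]]  # Example 4.18
print(is_linearly_independent(u))  # True
print(is_linearly_independent(v))  # False
```

The sketch works with the coefficient matrix rather than the augmented matrix; since the right-hand sides are all zero, the extra column would never change.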
For brevity, we shall sometimes say things like “vectors $\vec{u}_1, \vec{u}_2, \vec{u}_3$ are linearly independent” or “$\vec{v}_1, \vec{v}_2, \vec{v}_3$ are linearly dependent” without explicitly referring to the corresponding sets. In spite of that, we want to emphasize that linear independence/dependence is a property of an indexed set and not a property of individual vectors. For instance, there is nothing inherently dependent about the three vectors in Example 4.18 taken individually, as the set $\{\vec{v}_1, \vec{v}_2\}$ can easily be shown to be L.I. (check!).

According to Theorem 1.6, the product of an $n \times k$ matrix
$$A = \begin{bmatrix} | & | & \cdots & | \\ \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_k \\ | & | & \cdots & | \end{bmatrix}$$
and a $k$-vector $\vec{v} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix}$ can be written as a linear combination of the columns of $A$:
$$A\vec{v} = c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k.$$
Consequently, to determine whether $\vec{u}_1, \ldots, \vec{u}_k$ are linearly independent, one can check the number of solutions of the corresponding homogeneous system, as was done in the last two examples. The following table summarizes this relationship:
| The $n$-vectors $\vec{u}_1, \ldots, \vec{u}_k$ are | The equation $c_1\vec{u}_1 + \cdots + c_k\vec{u}_k = \vec{0}$ can be solved | The linear system $\begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_k \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$ has |
|---|---|---|
| linearly independent | only by $c_1 = \cdots = c_k = 0$ | only the trivial solution |
| linearly dependent | with at least one of the $c_i$'s nonzero | the trivial solution as well as nontrivial solutions |
According to the above discussion, another statement can be added to the list of equivalent statements from p. 125.

6 Equivalent Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
1. $A$ is nonsingular.
2. $A$ is row equivalent to $I_n$.
3. For every $n$-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
4. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.
5. $\det A \neq 0$.
6. The columns of $A$ are linearly independent.
6 Equivalent “Negative” Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
-1. $A$ is singular.
-2. $A$ is not row equivalent to $I_n$.
-3. For some $n$-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has either no solution or many solutions.
-4. The homogeneous system $A\vec{x} = \vec{0}$ has nontrivial solutions.
-5. $\det A = 0$.
-6. The columns of $A$ are linearly dependent.
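For square matrices, statements 5 and 6 give a quick independence test. The following minimal sketch assumes small matrices with integer entries and uses cofactor expansion along the first row; the function names and the two sample matrices are illustrative choices, not taken from the text.

```python
def det(a):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        # delete row 0 and column j to form the minor
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def columns_independent(a):
    """Statement 6 decided through statement 5 (square matrices only)."""
    return det(a) != 0

A = [[1, 0, 1],
     [0, 1, 1],
     [1, -1, 1]]      # illustrative nonsingular matrix
B = [[1, 2],
     [2, 4]]          # second column is twice the first

print(det(A), columns_independent(A))  # 1 True
print(det(B), columns_independent(B))  # 0 False
```

This shortcut applies only when the number of vectors equals the number of their components; otherwise the matrix is not square and the row-reduction approach is needed.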
Linear independence and dependence in general vector spaces
Up to now, our examples focused on linear independence of vectors in $\mathbb{R}^n$. We shall now proceed to the treatment of vectors in other vector spaces.

EXAMPLE 4.19 Consider the following three vectors in the space $M_{22}$, i.e., $2 \times 2$ matrices:
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
According to the definition, the set of these vectors is linearly independent if and only if the only solution of the equation
$$c_1\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + c_2\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + c_3\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
is $c_1 = c_2 = c_3 = 0$. Performing the operations on the left-hand side, we obtain
$$\begin{bmatrix} c_1 + c_3 & c_2 \\ c_2 & c_1 - c_3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
For two such matrices to be equal, their corresponding entries must be equal. This leads us to the homogeneous system
$$\begin{aligned} c_1 + c_3 &= 0 \\ c_2 &= 0 \\ c_2 &= 0 \\ c_1 - c_3 &= 0 \end{aligned}$$
whose augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{array}\right].$$
Performing a sequence of elementary row operations, $r_4 - r_1 \to r_4$, $r_3 - r_2 \to r_3$, $r_3 \leftrightarrow r_4$, $-\tfrac{1}{2}r_3 \to r_3$, and finally $r_1 - r_3 \to r_1$ to clear the entry above the last leading entry, results in the r.r.e.f.
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
Every left-hand side column contains a leading entry; therefore, the solution is unique: $c_1 = c_2 = c_3 = 0$. We conclude that the three given vectors in $M_{22}$ are linearly independent.

Note: The procedure we follow in these examples is implemented by the “Linear independence and dependence” module of the Linear Algebra Toolkit. Note that the toolkit uses coefficient matrices instead of the augmented matrices of the homogeneous systems. The two approaches are equivalent since the right-hand sides are all zero anyway.
EXAMPLE 4.20 Consider the following vectors in $P_2$, that is, polynomials of degree 2 or less:
$$p_1(t) = 1 + t^2,\quad p_2(t) = 1 + t,\quad p_3(t) = t + t^2,\quad p_4(t) = t - t^2.$$
In order for these (i.e., for their indexed set) to be linearly independent, the equation
$$c_1p_1(t) + c_2p_2(t) + c_3p_3(t) + c_4p_4(t) = 0 \tag{46}$$
must have only one solution, $c_1 = c_2 = c_3 = c_4 = 0$. Substituting the individual polynomials into (46),
$$c_1(1 + t^2) + c_2(1 + t) + c_3(t + t^2) + c_4(t - t^2) = 0,$$
then grouping like powers of $t$,
$$(c_1 + c_2) + (c_2 + c_3 + c_4)t + (c_1 + c_3 - c_4)t^2 = 0,$$
we arrive at the homogeneous system
$$\begin{aligned} c_1 + c_2 \phantom{{}+c_3+c_4} &= 0 \\ c_2 + c_3 + c_4 &= 0 \\ c_1 \phantom{{}+c_2} + c_3 - c_4 &= 0 \end{aligned}$$
(for two polynomials to be equal, the coefficients corresponding to the same power of the variable on both sides must be equal) with the augmented matrix
$$\left[\begin{array}{cccc|c} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & -1 & 0 \end{array}\right].$$
After the elementary row operations $r_3 - r_1 \to r_3$, $r_3 + r_2 \to r_3$, $\tfrac{1}{2}r_3 \to r_3$ are executed, we obtain a row echelon form
$$\left[\begin{array}{cccc|c} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{array}\right]. \tag{47}$$
(Obtaining the reduced row echelon form would require some additional operations without giving us any more information at this stage.) The fourth column (corresponding to the unknown $c_4$) contains no leading entry, allowing us to let $c_4$ be arbitrary. Consequently, the given vectors in $P_2$ are linearly dependent.
Refer to the module “Linear independence and dependence” of the Linear Algebra Toolkit for a step-by-step explanation of the process involved in deciding linear independence or dependence of indexed sets of vectors in Rn , Pn , and Mmn .
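Examples 4.19 and 4.20 both reduce a question in $M_{22}$ or $P_2$ to a question about ordinary columns of numbers. The sketch below makes that reduction explicit; it is an illustration under stated assumptions, with hypothetical helper names (`rank`, `flatten`, `independent`). A matrix is represented by its entries listed row by row, and a polynomial by its coefficients of $1, t, t^2$.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries found by Gaussian elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def flatten(matrix):
    """Coordinates of a matrix: its entries listed row by row."""
    return [entry for row in matrix for entry in row]

def independent(coordinate_vectors):
    """L.I. exactly when every column of the coefficient matrix has a leading entry."""
    columns_as_rows = [list(r) for r in zip(*coordinate_vectors)]
    return rank(columns_as_rows) == len(coordinate_vectors)

# Example 4.19: three matrices in M22
matrices = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[1, 0], [0, -1]]]
# Example 4.20: coefficients of 1, t, t^2 for p1, p2, p3, p4
polys = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, -1]]

print(independent([flatten(M) for M in matrices]))  # True
print(independent(polys))                           # False
```

The flattened matrices reproduce the four equations of Example 4.19 (one per entry), and the coefficient lists reproduce the three equations of Example 4.20 (one per power of $t$).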
Properties of linear independence and dependence
The following results will help us appreciate the significance of linear independence or dependence of vectors.

THEOREM 4.8 Let $\vec{u}_1$ and $\vec{u}_2$ be vectors in a vector space $V$.
1. $S = \{\vec{u}_1\}$ is linearly dependent if and only if $\vec{u}_1 = \vec{0}$.
2. $S = \{\vec{u}_1, \vec{u}_2\}$ is linearly dependent if and only if at least one of the vectors is a scalar multiple of the other.

PROOF
1. Part I (⇒)
From part (c) of Theorem 4.2, if $c_1\vec{u}_1 = \vec{0}$, then either $c_1 = 0$ or $\vec{u}_1 = \vec{0}$ (or both). However, assuming linear dependence of $\{\vec{u}_1\}$ implies that $c_1$ can be nonzero. Consequently, $\vec{u}_1 = \vec{0}$.
Part II (⇐)
Since $\vec{u}_1 = \vec{0}$, then $c_1 = 1 \neq 0$ leads to $c_1\vec{u}_1 = \vec{0}$ and to the linear dependence of $\{\vec{u}_1\}$.

2. Part I (⇒)
From the definition, the linear dependence of $\{\vec{u}_1, \vec{u}_2\}$ implies that at least one of the two scalars $c_1$ or $c_2$ can be nonzero while $c_1\vec{u}_1 + c_2\vec{u}_2 = \vec{0}$. If $c_1 \neq 0$, then we add the negative of $c_2\vec{u}_2$ to both sides:
$$\begin{aligned}
(c_1\vec{u}_1 + c_2\vec{u}_2) + (-(c_2\vec{u}_2)) &= \vec{0} + (-(c_2\vec{u}_2)) \\
c_1\vec{u}_1 + (c_2\vec{u}_2 + (-(c_2\vec{u}_2))) &= -(c_2\vec{u}_2) && \text{(conditions 3 and 4 of the vector space def.)} \\
c_1\vec{u}_1 + \vec{0} &= -(c_2\vec{u}_2) && \text{(condition 5 of the vector space definition)} \\
c_1\vec{u}_1 &= -(c_2\vec{u}_2) && \text{(condition 4 of the vector space definition)} \\
c_1\vec{u}_1 &= (-1)(c_2\vec{u}_2) && \text{(Theorem 4.3)} \\
c_1\vec{u}_1 &= (-c_2)\vec{u}_2 && \text{(condition 9 of the vector space definition)} \\
\tfrac{1}{c_1}(c_1\vec{u}_1) &= \tfrac{1}{c_1}((-c_2)\vec{u}_2) && \text{(multiply both sides by } \tfrac{1}{c_1}\text{)} \\
\tfrac{c_1}{c_1}\vec{u}_1 &= \tfrac{-c_2}{c_1}\vec{u}_2 && \text{(condition 9 of the vector space definition)} \\
\vec{u}_1 &= \tfrac{-c_2}{c_1}\vec{u}_2 && \text{(} \tfrac{c_1}{c_1} = 1 \text{ and cond. 10 of the vector space def.)}
\end{aligned}$$
Likewise, if $c_2 \neq 0$, then $\vec{u}_2 = \tfrac{-c_1}{c_2}\vec{u}_1$, making one of the vectors a scalar multiple of the other.
Part II (⇐)
If $\vec{u}_1 = d\vec{u}_2$, then we can write $c_1\vec{u}_1 + c_2\vec{u}_2 = \vec{0}$ where $c_2 = -d$ and $c_1 = 1 \neq 0$. Likewise, if $\vec{u}_2 = d\vec{u}_1$, we can take $c_1 = -d$ and $c_2 = 1 \neq 0$. In either case, we have established linear dependence of $\{\vec{u}_1, \vec{u}_2\}$. (Note that in this part of the proof we do not assume $d \neq 0$.)
THEOREM 4.9 Let $S$ be an indexed set of $m$ vectors in a vector space $V$. If there exists a linearly dependent subset of $S$, then $S$ is linearly dependent.

PROOF
Rearranging vectors in an indexed set does not affect the set's linear independence since the corresponding rearrangement of the terms in $c_1\vec{v}_1 + \cdots + c_m\vec{v}_m = \vec{0}$ leads to an equivalent equation.
Let us rearrange the vectors in $S$ as necessary so that the vectors $\vec{v}_1, \ldots, \vec{v}_k$ are the elements of the linearly dependent subset of $S$; i.e., there exist scalars $c_1, \ldots, c_k$, not all of which are zero, such that $c_1\vec{v}_1 + \cdots + c_k\vec{v}_k = \vec{0}$. Suppose the remaining vectors in $S$ (outside the subset) are $\vec{v}_{k+1}, \ldots, \vec{v}_m$. The equation
$$c_1\vec{v}_1 + \cdots + c_k\vec{v}_k + 0\vec{v}_{k+1} + \cdots + 0\vec{v}_m = \vec{0}$$
is satisfied with at least some of the coefficients nonzero, therefore making $S$ linearly dependent.
The following corollary follows directly from the last two theorems – it identifies special situations that make the linear dependence of the given vectors apparent.
COROLLARY 4.10 Let $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ be an indexed set of vectors in a vector space $V$.
1. If at least one of the vectors $\vec{u}_1, \ldots, \vec{u}_k$ is a zero vector, then $S$ is linearly dependent.
2. If a vector appears more than once in $S$, then $S$ is linearly dependent.
Note that the second part of the corollary helps justify our insistence on using the “indexed set” terminology while referring to $\{\vec{u}_1, \ldots, \vec{u}_k\}$. For instance, the indexed set $T = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$ is L.D. because $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is repeated, but as an ordinary set $T$ would equal $\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$, which is L.I. While a rearrangement of elements in a set does not affect the linear independence, the number of times an element appears in an indexed set can affect it.

THEOREM 4.11 For any integer $k > 1$, the vectors $\vec{u}_1, \ldots, \vec{u}_k$ in a vector space $V$ are linearly dependent if and only if at least one of them can be expressed as a linear combination of the remaining vectors.

PROOF
Part I (⇒)
Assuming $\vec{u}_1, \ldots, \vec{u}_k$ are linearly dependent, it follows from the definition that there exist scalars $c_1, \ldots, c_k$, not all of which are zero, such that $c_1\vec{u}_1 + \cdots + c_k\vec{u}_k = \vec{0}$. Let $c_j \neq 0$ be one of these nonzero coefficients. Then, we can add the negative of $c_j\vec{u}_j$ to both sides of the vector equation and apply conditions 2, 3, 4, and 5 of the vector space definition:
$$c_1\vec{u}_1 + \cdots + c_{j-1}\vec{u}_{j-1} + c_{j+1}\vec{u}_{j+1} + \cdots + c_k\vec{u}_k = -(c_j\vec{u}_j).$$
Taking a scalar multiple of both sides using the scalar $\tfrac{-1}{c_j}$, then applying Theorem 4.3 as well as conditions 7, 9, and 10 of the vector space definition yields
$$\tfrac{-c_1}{c_j}\vec{u}_1 + \cdots + \tfrac{-c_{j-1}}{c_j}\vec{u}_{j-1} + \tfrac{-c_{j+1}}{c_j}\vec{u}_{j+1} + \cdots + \tfrac{-c_k}{c_j}\vec{u}_k = \vec{u}_j,$$
which concludes this part of the proof.
Part II (⇐)
Suppose it is possible to express $\vec{u}_i$ as a linear combination of the remaining vectors:
$$\vec{u}_i = d_1\vec{u}_1 + \cdots + d_{i-1}\vec{u}_{i-1} + d_{i+1}\vec{u}_{i+1} + \cdots + d_k\vec{u}_k.$$
Adding $-\vec{u}_i = (-1)\vec{u}_i$ (Theorem 4.3) to both sides, applying conditions 2, 3, 4, and 5 of the vector space definition, then substituting $d_i = -1$ results in
$$\vec{0} = d_1\vec{u}_1 + \cdots + d_{i-1}\vec{u}_{i-1} + d_i\vec{u}_i + d_{i+1}\vec{u}_{i+1} + \cdots + d_k\vec{u}_k,$$
which shows that the vectors are linearly dependent (since $d_i = -1 \neq 0$). This concludes the proof.
According to the above theorem, linear dependence of vectors can be thought of as an indication of redundancy among the vectors in that it is possible to reproduce one of these vectors by combining the others. On the other hand, this is not possible if the vectors are linearly independent.
EXAMPLE 4.21 In Example 4.20, we have established linear dependence of the set of polynomials
$$p_1(t) = 1 + t^2,\quad p_2(t) = 1 + t,\quad p_3(t) = t + t^2,\quad p_4(t) = t - t^2.$$
To interpret this dependence using Theorem 4.11, we will demonstrate that one of the polynomials can be expressed as a linear combination of the remaining ones.
Using the row echelon form (47), we proceed to solve our system by backsubstitution:
$$\begin{aligned} c_4 &= \text{arbitrary} \\ c_3 &= 0 \\ c_2 &= -c_3 - c_4 = -c_4 \\ c_1 &= -c_2 = c_4 \end{aligned}$$
One of the many nontrivial solutions will be obtained if we let $c_4 = 1$. Then $c_3 = 0$, $c_2 = -1$, and $c_1 = 1$. After substituting these values into (46),
$$1p_1(t) - 1p_2(t) + 0p_3(t) + 1p_4(t) = 0,$$
we are now ready to express one of the polynomials as a linear combination of the remaining ones, e.g.,
$$p_4(t) = -1p_1(t) + 1p_2(t) + 0p_3(t).$$
Check: $-(1 + t^2) + (1 + t) = t - t^2$.
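The backsubstitution of Example 4.21 can be mirrored by a short routine that, for a linearly dependent list, returns one nontrivial choice of coefficients. This is a sketch only: the names `rref` and `dependency` are hypothetical, polynomials are stored as coefficient lists for $1, t, t^2$, and the routine sets the last free unknown to 1 and any other free unknowns to 0, which is one convenient choice among infinitely many.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def dependency(vectors):
    """Coefficients c1..ck, not all zero, with c1*v1 + ... + ck*vk = 0; None if L.I."""
    m, pivots = rref([list(row) for row in zip(*vectors)])
    free = [c for c in range(len(vectors)) if c not in pivots]
    if not free:
        return None
    f = free[-1]
    coeffs = [Fraction(0)] * len(vectors)
    coeffs[f] = Fraction(1)              # the arbitrary unknown is set to 1
    for row, p in enumerate(pivots):
        coeffs[p] = -m[row][f]           # each leading unknown is solved from its row
    return coeffs

polys = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, -1]]  # p1, p2, p3, p4
c = dependency(polys)
print(c == [1, -1, 0, 1])                # True: p1 - p2 + 0 p3 + p4 = 0
# Solve for p4 in terms of the others, as in the example
print([-ci / c[3] for ci in c[:3]] == [-1, 1, 0])  # True: p4 = -p1 + p2 + 0 p3
```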
It is important that we read the statement of Theorem 4.11 in a precise manner. In particular, the theorem does not claim that, in a linearly dependent set, every vector can be expressed as a linear combination of the others. You can easily see that would not be true in the recent example: there is no way to represent $p_3(t)$ as a linear combination of the other polynomials listed. According to the theorem, none of the three matrices in Example 4.19 can be expressed as a linear combination of the other two.
While working with linearly dependent sets, it will sometimes be convenient to distinguish the first vector on the list $\vec{u}_1, \ldots, \vec{u}_k$ that can be represented as a linear combination of the preceding ones. If a set is linearly dependent, such a vector must exist unless the first vector on the list is a zero vector.
THEOREM 4.12 The vectors $\vec{u}_1, \ldots, \vec{u}_k$ in a vector space $V$ are linearly dependent if and only if
• $\vec{u}_1 = \vec{0}$ or
• at least one of the vectors $\vec{u}_2, \ldots, \vec{u}_k$ can be expressed as a linear combination of the preceding vectors.

PROOF
Part I (⇒)
If $k = 1$, then $\vec{u}_1 = \vec{0}$ from part 1 of Theorem 4.8.
If $k > 1$, then by definition there exist scalars $c_1, \ldots, c_k$, not all of which are zero, such that $c_1\vec{u}_1 + \cdots + c_k\vec{u}_k = \vec{0}$. Let $c_j$ be the last nonzero scalar on the list.
• If $j = 1$, then $c_1\vec{u}_1 = \vec{0}$ so that $\vec{u}_1 = \vec{0}$ from part (c) of Theorem 4.2.
• If $j > 1$, then $\vec{u}_j = \tfrac{-c_1}{c_j}\vec{u}_1 + \cdots + \tfrac{-c_{j-1}}{c_j}\vec{u}_{j-1}$ (refer to Part I of the proof of Theorem 4.11).
Part II (⇐)
If $\vec{u}_1 = \vec{0}$, then $c_1\vec{0} + 0\vec{u}_2 + \cdots + 0\vec{u}_k = \vec{0}$ with any nonzero $c_1$ so that the vectors $\vec{u}_1, \ldots, \vec{u}_k$ are linearly dependent.
Suppose it is possible to express $\vec{u}_i$ as a linear combination of the preceding vectors:
$$\vec{u}_i = d_1\vec{u}_1 + \cdots + d_{i-1}\vec{u}_{i-1}.$$
Adding $(-1)\vec{u}_i$ to both sides and substituting $d_i = -1$ as well as $d_{i+1} = \cdots = d_k = 0$ (refer to Part II of the proof of Theorem 4.11) results in
$$\vec{0} = d_1\vec{u}_1 + \cdots + d_{i-1}\vec{u}_{i-1} + d_i\vec{u}_i + d_{i+1}\vec{u}_{i+1} + \cdots + d_k\vec{u}_k,$$
which shows that the vectors are linearly dependent (since $d_i = -1 \neq 0$). This concludes the proof.
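Theorem 4.12 suggests a concrete search: scan the list for the first vector that is either zero in the first position or a combination of the vectors before it. One way to sketch this, assuming the vectors are given as coordinate lists, relies on the fact that a column of $[\vec{u}_1 \cdots \vec{u}_k]$ receives no leading entry during row reduction exactly when it is a combination of the columns to its left (or is a zero first column). The names `rref` and `first_redundant` are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def first_redundant(vectors):
    """0-based index of the first vector that is zero (in first position) or a
    combination of the preceding ones; None when the list is L.I."""
    _, pivots = rref([list(row) for row in zip(*vectors)])
    for j in range(len(vectors)):
        if j not in pivots:
            return j
    return None

print(first_redundant([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, -1]]))  # 3 (p4 of Example 4.20)
print(first_redundant([[1, 0], [0, 1], [0, 1]]))                       # 2 (the repeated vector)
print(first_redundant([[0, 0], [1, 0]]))                               # 0 (zero first vector)
print(first_redundant([[1, 0], [0, 1]]))                               # None
```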
Applications of linear independence and dependence
(Alloys were discussed in Example 1.5, p. 6, and Example 2.18, p. 98.)

EXAMPLE 4.22 Given the following alloys

| | Gold | Silver | Copper | Zinc |
|---|---|---|---|---|
| Alloy 1 | 14/24 | 1/24 | 8/24 | 1/24 |
| Alloy 2 | 18/24 | 0 | 6/24 | 0 |
| Alloy 3 | 19/24 | 0 | 5/24 | 0 |
| Alloy 4 | 9/24 | 2/24 | 11/24 | 2/24 |
| Alloy 5 | 18/24 | 4/24 | 2/24 | 0 |

consider the vectors describing the mix of the four metals in each alloy:
$$\vec{u}_1 = \begin{bmatrix} 14/24 \\ 1/24 \\ 8/24 \\ 1/24 \end{bmatrix},\quad \vec{u}_2 = \begin{bmatrix} 18/24 \\ 0 \\ 6/24 \\ 0 \end{bmatrix},\quad \vec{u}_3 = \begin{bmatrix} 19/24 \\ 0 \\ 5/24 \\ 0 \end{bmatrix},\quad \vec{u}_4 = \begin{bmatrix} 9/24 \\ 2/24 \\ 11/24 \\ 2/24 \end{bmatrix},\quad \vec{u}_5 = \begin{bmatrix} 18/24 \\ 4/24 \\ 2/24 \\ 0 \end{bmatrix}.$$
For each indexed set listed below, find if it is linearly independent or linearly dependent.
(a) $S = \{\vec{u}_1, \vec{u}_3, \vec{u}_4\}$
To find if the set is L.I. or L.D., solve the homogeneous system corresponding to the equation
$$a\vec{u}_1 + b\vec{u}_3 + c\vec{u}_4 = \vec{0}.$$
Its augmented matrix
$$\left[\begin{array}{ccc|c} 14/24 & 19/24 & 9/24 & 0 \\ 1/24 & 0 & 2/24 & 0 \\ 8/24 & 5/24 & 11/24 & 0 \\ 1/24 & 0 & 2/24 & 0 \end{array}\right] \quad\text{has r.r.e.f.}\quad \left[\begin{array}{ccc|c} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
Since the third column contains no leading entry, the third unknown ($c$) can be set equal to an arbitrary value. Therefore, the homogeneous system has infinitely many solutions. Consequently, $S$ is linearly dependent. Thus it is possible to create an alloy or mix of alloys using a mix of other alloys in the set.
For example, a solution $c = 1$, $b = 1$, $a = -2$ leads to $-2\vec{u}_1 + 1\vec{u}_3 + 1\vec{u}_4 = \vec{0}$. Moving the negative term to the other side yields $1\vec{u}_3 + 1\vec{u}_4 = 2\vec{u}_1$. Therefore, mixing equal amounts of alloy 3 and alloy 4 results in alloy 1.
Another consequence of the dependence of this set: if another alloy is obtained by mixing alloys 1, 3, and 4, then there are also different ways to mix these alloys to obtain the same outcome.

(b) $T = \{\vec{u}_1, \vec{u}_2, \vec{u}_4, \vec{u}_5\}$
The equation
$$a\vec{u}_1 + b\vec{u}_2 + c\vec{u}_4 + d\vec{u}_5 = \vec{0}$$
corresponds to the homogeneous system whose augmented matrix
$$\left[\begin{array}{cccc|c} 14/24 & 18/24 & 9/24 & 18/24 & 0 \\ 1/24 & 0 & 2/24 & 4/24 & 0 \\ 8/24 & 6/24 & 11/24 & 2/24 & 0 \\ 1/24 & 0 & 2/24 & 0 & 0 \end{array}\right] \quad\text{has r.r.e.f.}\quad \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right].$$
All left-hand side columns contain leading entries; therefore, the homogeneous system has a unique solution: $a = b = c = d = 0$ (the trivial solution). Consequently, $T$ is linearly independent. Because of this, it is not possible to create an alloy or mix of alloys using a mix of other alloys in the set.
Another consequence of the independence of this set: if another alloy is obtained by mixing alloys 1, 2, 4, and 5, then there is no other way to mix these alloys to obtain the same outcome.
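Both parts of Example 4.22 can be double-checked with exact fractions. The sketch below is illustrative only (the dictionary `u` and the helpers `rank` and `independent` are hypothetical names): it confirms that equal parts of alloys 3 and 4 reproduce alloy 1, and it counts leading entries to classify the two indexed sets.

```python
from fractions import Fraction as F

def rank(rows):
    """Number of leading entries found by Gaussian elimination (exact arithmetic)."""
    m = [[F(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Gold, silver, copper, zinc content of each alloy (table of Example 4.22)
u = {
    1: [F(14, 24), F(1, 24), F(8, 24), F(1, 24)],
    2: [F(18, 24), F(0), F(6, 24), F(0)],
    3: [F(19, 24), F(0), F(5, 24), F(0)],
    4: [F(9, 24), F(2, 24), F(11, 24), F(2, 24)],
    5: [F(18, 24), F(4, 24), F(2, 24), F(0)],
}

def independent(indices):
    """True when the chosen alloy vectors form a linearly independent indexed set."""
    columns = [u[i] for i in indices]
    return rank([list(row) for row in zip(*columns)]) == len(columns)

# Equal parts of alloys 3 and 4
mix = [(a + b) / 2 for a, b in zip(u[3], u[4])]
print(mix == u[1])                 # True: the mix is alloy 1
print(independent([1, 3, 4]))      # False: set S of part (a)
print(independent([1, 2, 4, 5]))   # True: set T of part (b)
```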
In Section 2.4, we applied the procedures for solving linear systems to balancing chemical reactions.

EXAMPLE 4.23 In Example 2.19, the equation
$$x_1\,\mathrm{NH_3} + x_2\,\mathrm{O_2} \to x_3\,\mathrm{NO} + x_4\,\mathrm{H_2O}$$
was balanced by rewriting it as a homogeneous linear system (25). This system can also be written using a linear combination of vectors:
$$x_1\underbrace{\begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}}_{\vec{u}_1} + x_2\underbrace{\begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}}_{\vec{u}_2} + x_3\underbrace{\begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}}_{\vec{u}_3} + x_4\underbrace{\begin{bmatrix} 0 \\ -2 \\ -1 \end{bmatrix}}_{\vec{u}_4} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Having shown that this system has nontrivial solutions (e.g., $x_1 = 4$, $x_2 = 5$, $x_3 = 4$, and $x_4 = 6$), we conclude that the set $\{\vec{u}_1, \vec{u}_2, \vec{u}_3, \vec{u}_4\}$ is linearly dependent.
EXAMPLE 4.24 Consider a reaction “equation”
$$x_1\,\mathrm{H_2SO_4} + x_2\,\mathrm{B(OH)_3} \to x_3\,\mathrm{B_2(SO_4)_3} + x_4\,\mathrm{H_2}. \tag{48}$$
Following the procedure of Example 2.19 yields a homogeneous system with the coefficient matrix
$$\begin{bmatrix} 2 & 3 & 0 & -2 \\ 1 & 0 & -3 & 0 \\ 4 & 3 & -12 & 0 \\ 0 & 1 & -2 & 0 \end{bmatrix}, \quad\text{whose reduced row echelon form is}\quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Therefore, the only way for the equality
$$x_1\underbrace{\begin{bmatrix} 2 \\ 1 \\ 4 \\ 0 \end{bmatrix}}_{\vec{v}_1} + x_2\underbrace{\begin{bmatrix} 3 \\ 0 \\ 3 \\ 1 \end{bmatrix}}_{\vec{v}_2} + x_3\underbrace{\begin{bmatrix} 0 \\ -3 \\ -12 \\ -2 \end{bmatrix}}_{\vec{v}_3} + x_4\underbrace{\begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\vec{v}_4} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
to be satisfied is when $x_1 = x_2 = x_3 = x_4 = 0$, making the vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4$ linearly independent. Consequently, equation (48) cannot be balanced (see footnote 12).
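Examples 4.23 and 4.24 can be reproduced by a small routine that looks for a nontrivial solution of the homogeneous system and, when one exists, clears the denominators. The sketch is hedged: `rref` and `balance` are hypothetical names, the routine assumes at most one free unknown matters (it sets the last free unknown to 1 and the others to 0), and it does not check that the resulting coefficients are positive.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def rref(rows):
    """Return (reduced row echelon form, pivot column indices), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def balance(matrix):
    """Integer solution of the homogeneous system, or None if only the trivial one exists."""
    m, pivots = rref(matrix)
    n = len(matrix[0])
    free = [c for c in range(n) if c not in pivots]
    if not free:
        return None                      # columns are L.I.: the reaction cannot be balanced
    x = [Fraction(0)] * n
    x[free[-1]] = Fraction(1)
    for row, p in enumerate(pivots):
        x[p] = -m[row][free[-1]]
    # multiply by the least common multiple of the denominators
    scale = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in x), 1)
    return [int(v * scale) for v in x]

ammonia = [[1, 0, -1, 0],      # N   (columns: NH3, O2, NO, H2O)
           [3, 0, 0, -2],      # H
           [0, 2, -1, -1]]     # O
misprint = [[2, 3, 0, -2],     # coefficient matrix of Example 4.24
            [1, 0, -3, 0],
            [4, 3, -12, 0],
            [0, 1, -2, 0]]

print(balance(ammonia))    # [4, 5, 4, 6]
print(balance(misprint))   # None
```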
EXERCISES
In Exercises 1–6, determine whether the given vectors are linearly independent or dependent. If they are dependent, express one of them as a linear combination of the remaining vectors. 1. a.
1 0
2. a.
1 1
,
,
2
−1
0 1
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 0 1 0 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ; b. ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ −1 ⎦ ; c. ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ . 0 −1 1 1 0 0
,
⎡
−1 1
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎤ ⎡ 1 −1 2 1 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ ⎢ ; b. ⎣ 2 ⎦ , ⎣ 1 ⎦ ; c. ⎣ 1 ⎦ , ⎣ 1 ⎦ , ⎣ 2 ⎦ . 1 2 −1 1 3 ⎡
⎡
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 2 1 0 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 3. a. ⎣ 2 ⎦ , ⎣ 0 ⎦ ; b. ⎣ 1 ⎦ , ⎣ 0 ⎦ , ⎣ 2 ⎦ , ⎣ 2 ⎦ . 0 0 0 2 1 0 ⎡
⎤ ⎡ 0 ⎢ ⎥ ⎢ 4. a. ⎣ 1 ⎦ , ⎣ 1 ⎤ ⎡ ⎡ 0 1 ⎥ ⎢ ⎢ ⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ 5. ⎢ ⎢ 0 ⎥,⎢ 0 ⎦ ⎣ ⎣ 0 0 ⎡ ⎤ ⎡ 1 2 ⎢ ⎥ ⎢ ⎢ 3 ⎥ ⎢ 1 ⎥ ⎢ 6. ⎢ ⎢ 0 ⎥,⎢ 2 ⎣ ⎦ ⎣ 1 1
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 1 0 1 1 0 0 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 1 ⎦ , ⎣ 0 ⎦ , ⎣ 0 ⎦ ; b. ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 1 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ . 0 0 1 1 0 1 0 0 ⎤ ⎤ ⎡ 0 ⎥ ⎥ ⎢ ⎥ ⎢ 0 ⎥ ⎥ ⎥,⎢ ⎥ ⎢ 1 ⎥. ⎦ ⎦ ⎣ 1 ⎤ ⎡ ⎤ 1 ⎥ ⎢ ⎥ ⎥ ⎢ 3 ⎥ ⎥,⎢ ⎥ ⎥ ⎢ 0 ⎥. ⎦ ⎣ ⎦ 1
12. This “equation” contains an intentionally introduced misprint – it should read $x_1\,\mathrm{H_2SO_4} + x_2\,\mathrm{B(OH)_3} \to x_3\,\mathrm{B_2(SO_4)_3} + x_4\,\mathrm{H_2O}$ instead.
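In Exercises 7–16, determine whether the given vectors are linearly independent or dependent. If they are dependent, express one of them as a linear combination of the remaining vectors.
7. Vectors $1 + t^2$ and $2t + t^2$ in $P_2$.
8. Vectors $1 - t + t^2$, $1 + t^2$, and $-1 + 2t$ in $P_2$.
9. Vectors $1 + t$, $t + t^2$, and $1 + 2t + t^2$ in $P_2$.
10. Vectors $1 + t$, $t + t^2$, $1 + t^2$, and $t^2$ in $P_2$.
11. Vectors $1 + t + t^2$, $2t + 2t^3$, and $1 + t^2 - t^3$ in $P_3$.
12. Vectors $1 + 2t^2$, $t^2 + t^3$, $t + t^3$, and $1 + t$ in $P_3$.
13. Vectors $\begin{bmatrix} 0 & 1 \\ -1 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and $\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$ in $M_{22}$.
14. Vectors $\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, and $\begin{bmatrix} 0 & 2 \\ 2 & -1 \end{bmatrix}$ in $M_{22}$.
15. Vectors $\begin{bmatrix} 3 & 0 & 1 \\ 0 & 2 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ in $M_{23}$.
16. Vectors $\begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 1 & 0 \end{bmatrix}$, $\begin{bmatrix} 2 & 0 \\ 0 & 4 \\ 2 & 0 \end{bmatrix}$, and $\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{bmatrix}$ in $M_{32}$.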
17. * Consider the reaction equations of Exercises 5–8 in Section 2.4 in the same way as was done in Examples 4.23 and 4.24. Which of these equations correspond to linearly independent sets, and which are linearly dependent? Compare this to the answers obtained in Exercises 5–8 in Section 2.4.

In Exercises 18–19, consider the following five alloys:

| | Alloy I | Alloy II | Alloy III | Alloy IV | Alloy V |
|---|---|---|---|---|---|
| gold | 22/24 | 14/24 | 18/24 | 18/24 | 18/24 |
| silver | 1/24 | 6/24 | 0 | 3/24 | 2/24 |
| copper | 1/24 | 4/24 | 6/24 | 3/24 | 4/24 |

18. Are the vectors corresponding to the alloys III, IV, and V linearly independent or linearly dependent? Can one of the three alloys be obtained by mixing the other two? (Compare the result of Exercise 17 on p. 110 in Section 2.4.)
19. Are the vectors corresponding to the alloys I, II, and III linearly independent or linearly dependent? Can one of the three alloys be obtained by mixing the other two?
20.
[Figure for Exercise 20: one-way street network diagram with segments labeled $x_1$ through $x_{10}$.]
a. Verify that by following the procedure demonstrated in Example 2.20 on p. 101, the one-way street network diagram in the margin leads to a homogeneous linear system with coefficient matrix
$$A = \begin{bmatrix} -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & -1 & 1 & 1 & 1 \end{bmatrix}.$$
b. Obtain the reduced row echelon form of $A$.
c. Show that the ten column vectors of $A$ (let us call them $\vec{u}_1, \ldots, \vec{u}_{10}$) are linearly dependent.
d. Use r.r.e.f. of $A$ to identify three of the vectors $\vec{u}_1, \ldots, \vec{u}_{10}$ that are linearly independent.
e. Suppose you are in charge of placing traffic-monitoring devices throughout this network. Where would you place seven of them to guarantee complete (unique) information about all ten segments?
f. Suppose that after you made your proposal based on the result obtained in the previous exercise, three counterproposals were offered, each featuring alternative locations of the traffic-monitoring devices. Which of these three no longer guarantee access to complete information about all ten segments?
[Figure: Counterproposals B, C, and D, each marking which of the streets $x_1, \ldots, x_{10}$ are monitored. Legend: street where traffic is monitored; street where traffic is not monitored.]
In Exercises 21–24, decide whether each statement is true or false. Justify your answer.
21. If $S = \{\vec{u}, \vec{v}, \vec{w}\}$ is a set of 4-vectors such that $\vec{u} + \vec{v} + \vec{w} = \vec{0}$, then $S$ is linearly independent.
22. If $S$ is a set that contains $\vec{u}$ and $2\vec{u}$, then $S$ is linearly dependent.
23. If $S$ is linearly independent, then any subset of $S$ is also linearly independent.
24. The set $S = \{\vec{u}\}$ consisting of one nonzero vector $\vec{u}$ is linearly independent.
25. * Let $\vec{u}_1, \ldots, \vec{u}_k, \vec{u}_{k+1}$ be vectors in a vector space $V$. Prove that if $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ is linearly independent and $\vec{u}_{k+1}$ is not in $\operatorname{span} S$, then $\{\vec{u}_1, \ldots, \vec{u}_k, \vec{u}_{k+1}\}$ is linearly independent.
26. * Let $\vec{u}_1, \ldots, \vec{u}_k$ be linearly independent vectors in $\mathbb{R}^n$.
a. If $A$ is nonsingular, show that $A\vec{u}_1, \ldots, A\vec{u}_k$ must be linearly independent.
b. If $A$ is an $n \times n$ matrix, show that $A\vec{u}_1, \ldots, A\vec{u}_k$ do not necessarily have to be linearly independent. (Hint: Find an example of $A$ such that $A\vec{u}_1, \ldots, A\vec{u}_k$ are always L.D.)
c. If $A$ is singular, are $A\vec{u}_1, \ldots, A\vec{u}_k$ guaranteed to be linearly dependent?
27. * Let $A$ be an $m \times n$ matrix and $\vec{u}_1, \ldots, \vec{u}_k$ be linearly dependent vectors in $\mathbb{R}^n$. Show that $A\vec{u}_1, \ldots, A\vec{u}_k$ also must be linearly dependent.
4.4 Basis and Dimension

There are infinitely many ways to find sets of vectors that span the entire $\mathbb{R}^2$; for example,
• $S_1 = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}$,
• $S_2 = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$, etc.
The span of $S_1$ is composed of all linear combinations
$$c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 2 \\ 3 \end{bmatrix}. \tag{49}$$
However, it turns out that the third vector is a linear combination of the first and the second:
$$\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
Therefore, (49) can be rewritten as
$$c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + c_3\left(2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = (c_1 + 2c_3)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (c_2 + 3c_3)\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
It shows that, while we can span $\mathbb{R}^2$ using all three vectors in $S_1$, we don't really need all three for this purpose – the first two will do just fine. These two vectors (i.e., the set $S_2$) appear to be a more economical way to span $\mathbb{R}^2$.
The situation we have encountered with the set $S_1$ should look familiar. Based on Theorem 4.11, the fact that the third vector is a combination of others means that $S_1$ is linearly dependent. Its third vector was redundant and could be eliminated without affecting the resulting span. Clearly, no such redundancy exists in the set $S_2$, as this set is linearly independent.
In general, it is possible to span any space (other than $\{\vec{0}\}$) using infinitely many different sets:
• those sets that are linearly dependent include redundant vectors, which can be safely deleted (with no change to the span);
• those sets that are linearly independent contain no redundant vectors. Should any one of them be removed from the set, the corresponding span would be reduced accordingly.
We introduce new terminology to refer to the sets of the latter variety.

DEFINITION (Basis) An indexed set of vectors $\{\vec{u}_1, \ldots, \vec{u}_k\}$ in a vector space $V$ is called a basis for $V$ if
1. $\vec{u}_1, \ldots, \vec{u}_k$ span $V$ and
2. $\vec{u}_1, \ldots, \vec{u}_k$ are linearly independent.
For brevity, we will sometimes say “vectors $\vec{u}_1, \ldots, \vec{u}_k$ form a basis for $V$” without explicitly mentioning the indexed set.
EXAMPLE 4.25 The vectors
$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix},\quad \ldots,\quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$
(where $\vec{e}_j$ is the $j$th column of $I_n$) are a basis for $\mathbb{R}^n$, as they clearly
• span $\mathbb{R}^n$ (every $n$-vector $\vec{u} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ can be expressed as a linear combination $x_1\vec{e}_1 + x_2\vec{e}_2 + \cdots + x_n\vec{e}_n = \vec{u}$) and
• are linearly independent ($x_1\vec{e}_1 + x_2\vec{e}_2 + \cdots + x_n\vec{e}_n = \vec{0}$ requires $x_1 = x_2 = \cdots = x_n = 0$).
The set $\{\vec{e}_1, \ldots, \vec{e}_n\}$ is called the standard basis for $\mathbb{R}^n$.
EXAMPLE 4.26 Consider the vectors $\vec{u} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\vec{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ in $\mathbb{R}^3$. In Example 4.14, we established that $\operatorname{span}\{\vec{u}, \vec{v}, \vec{w}\} = \mathbb{R}^3$. To verify linear independence, we set
$$c_1\vec{u} + c_2\vec{v} + c_3\vec{w} = \vec{0},$$
which leads to a homogeneous system with augmented matrix
$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & -1 & 1 & 0 \end{array}\right].$$
After the same four elementary row operations as those performed in Example 4.14, the r.r.e.f. is reached:
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right].$$
The presence of a leading entry in each left-hand side column implies that the only solution is $c_1 = c_2 = c_3 = 0$, which signifies linear independence of $\vec{u}$, $\vec{v}$, and $\vec{w}$. We conclude that these vectors form a basis for $\mathbb{R}^3$.
EXAMPLE 4.27 The vectors $\begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 3 \\ -4 \\ 1 \end{bmatrix}$ were shown in Example 4.18 to be linearly dependent; thus, they do not form a basis for $\mathbb{R}^4$.
EXAMPLE 4.28 Example 4.17 concerned the vectors $\begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 0 \\ -1 \end{bmatrix}$, which were shown to be linearly independent. These vectors do not span $\mathbb{R}^4$ (check!) so that they cannot be argued to form a basis for $\mathbb{R}^4$. However, they do span a subspace of $\mathbb{R}^4$ (recall Theorem 4.6) and, consequently, can be considered a basis for that subspace.
EXAMPLE 4.29 The polynomials $1, t, t^2, \ldots, t^n$ form a basis for $P_n$.
• They span $P_n$ since any polynomial of degree $n$ or less can be easily expressed as a linear combination of these vectors: $c_0(1) + c_1t + c_2t^2 + \cdots + c_nt^n$.
• They are linearly independent since for $c_0(1) + c_1t + c_2t^2 + \cdots + c_nt^n = 0$ to hold we must have $c_0 = c_1 = c_2 = \cdots = c_n = 0$.
The polynomials $1, t, t^2, \ldots, t^n$ are referred to as the standard basis (or the monomial basis) for $P_n$.
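EXAMPLE 4.30 A basis for the space $M_{mn}$ of $m \times n$ matrices can be formed by the following $m \times n$ matrices:
$$\begin{array}{cccc}
\begin{bmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, & \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, & \cdots & \begin{bmatrix} 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \\
\vdots & \vdots & & \vdots \\
\begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix}, & \begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 1 & \cdots & 0 \end{bmatrix}, & \cdots & \begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.
\end{array}$$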
The case of the smallest vector space $V = \{\vec{0}\}$ is special and will be investigated next.

EXAMPLE 4.31 There is only one vector in the vector space $V = \{\vec{0}\}$, i.e., $\vec{v} = \vec{0}$, and it certainly spans the space. However, $\{\vec{0}\}$ is linearly dependent.
Since the only available vector in $V$ does not form a basis, nothing can. We will sometimes describe this situation by saying “$V$ has no basis” or, equivalently, “the basis of $V$ is an empty set.”
We opened this section with an example involving a set $S_1$ that spanned the desired space ($\mathbb{R}^2$) but was not a basis because it contained too many vectors. After removing a redundant vector, the new set $S_2$ became a basis for the space. The following result states that this can be accomplished for any set that spans the space. It also goes on to state that we can add vectors to a linearly independent set to make it a basis.

THEOREM 4.13 If $S = \{\vec{v}_1, \ldots, \vec{v}_m\}$ spans a vector space $V$ and $T = \{\vec{u}_1, \ldots, \vec{u}_n\}$ is linearly independent in $V$, then
a. it is possible to obtain a basis $S'$ for $V$ from $S$ by deleting some of its vectors if necessary, and
b. it is possible to obtain a basis $T'$ for $V$ from $T$ by adding some vectors to the set if necessary.

PROOF
a. Since $S = \{\vec{v}_1, \ldots, \vec{v}_m\}$ spans $V$, every vector $\vec{u} \in V$ can be expressed as a linear combination of vectors in $S$:
\[ \vec{u} = c_1 \vec{v}_1 + \cdots + c_m \vec{v}_m. \]
• If $S$ is linearly independent, then it is already a basis for $V$ ($S' = S$).
• If $S$ is linearly dependent, then by Theorem 4.12
  • either $\vec{v}_1 = \vec{0}$, in which case we create $S_1$ by deleting $\vec{v}_1$,
  • or at least one vector in $S$ can be expressed as a linear combination of the preceding ones, in which case we create $S_1$ by deleting the first such vector.
In either case, we obtain a subset $S_1 \subset S$ of $m - 1$ vectors such that $\operatorname{span} S_1 = \operatorname{span} S$. If $S_1$ is linearly independent, then $S' = S_1$. Otherwise we create $S_2$ by deleting $\vec{0}$ or the first vector in $S_1$ that is a linear combination of the preceding ones. We still have $\operatorname{span} S_2 = \operatorname{span} S$. Continuing this process, we obtain a linearly independent subset $S'$ after a finite number of steps.

b. Let us form the indexed set $U$ in which the vectors of $T$ are followed by the vectors of $S$. Since $U$ spans $V$ (by Theorem 4.7), we can apply the procedure of part a of this theorem to obtain a basis $U'$ for $V$. Based on the proof of part a, the only time a vector in $U$ can be deleted is if it either equals $\vec{0}$ or can be expressed as a linear combination of the preceding ones. However, as the vectors in $T$ are known to be linearly independent, none of them can be deleted in this way. Therefore, the basis $U'$ contains all vectors in $T$; consequently, we can take $T' = U'$.
The following two examples show how the construction of $S'$ and $T'$ is carried out when $V = \mathbb{R}^n$. Forming a (reduced) row echelon form of a matrix turns out to be a key ingredient.

EXAMPLE 4.32 (Illustration of part a of Theorem 4.13) The set
\[
S = \Bigl\{
\underbrace{\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}}_{\vec{v}_1},\
\underbrace{\begin{bmatrix} -1 \\ -2 \\ -3 \end{bmatrix}}_{\vec{v}_2},\
\underbrace{\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}}_{\vec{v}_3},\
\underbrace{\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}}_{\vec{v}_4}
\Bigr\}
\]
spans a subspace $W$ of $\mathbb{R}^3$ (by Theorem 4.6). To determine a basis $S'$ for $W$, let us begin by setting
\[ c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3 + c_4 \vec{v}_4 = \vec{0}. \]
The corresponding system has augmented matrix
\[
\left[\begin{array}{rrrr|r} 1 & -1 & 0 & 1 & 0 \\ 2 & -2 & 1 & 1 & 0 \\ 3 & -3 & 1 & 2 & 0 \end{array}\right],
\]
which is equivalent ($r_2 - 2r_1 \to r_2$; $r_3 - 3r_1 \to r_3$; $r_3 - r_2 \to r_3$) to
\[
\left[\begin{array}{rrrr|r} 1 & -1 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]
Since the second and fourth columns contain no leading entries, we can let $c_2$ and $c_4$ have arbitrary values. For example:
• If $c_2 = 1$, $c_4 = 0$, then $\vec{v}_2$ can be expressed as a linear combination of $\vec{v}_1$ and $\vec{v}_3$. (Check!)
• If $c_2 = 0$, $c_4 = 1$, then $\vec{v}_4$ can be expressed as a linear combination of $\vec{v}_1$ and $\vec{v}_3$. (Check!)
Therefore, every vector in span $S$ can be expressed as a linear combination of $\vec{v}_1$ and $\vec{v}_3$. Also note that $\vec{v}_1$ and $\vec{v}_3$ are linearly independent. Consequently, $S' = \{\vec{v}_1, \vec{v}_3\}$ forms a basis for span $S$: it contains those of the original column vectors that correspond to leading entries in the r.r.e.f.

LA TOOLKIT.COM: The Linear Algebra Toolkit features a module “Finding a basis of the space spanned by the set” designed to carry out the steps outlined in the last example, showing more details in the process. You can apply this module to problems involving $\mathbb{R}^n$, $P_n$, and $M_{mn}$ spaces.
EXAMPLE 4.33 (Illustration of part b of Theorem 4.13) Find a basis for $\mathbb{R}^4$ that contains the vectors in the set
\[
T = \Bigl\{
\underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \\ 3 \end{bmatrix}}_{\vec{u}_1},\
\underbrace{\begin{bmatrix} 0 \\ 2 \\ 0 \\ 5 \end{bmatrix}}_{\vec{u}_2}
\Bigr\}.
\]
$T$ is linearly independent since its vectors are not scalar multiples of each other. Begin by appending any set of vectors spanning $\mathbb{R}^4$ to the vectors in set $T$; e.g., we can use the standard basis:
\[
\underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \\ 3 \end{bmatrix}}_{\vec{u}_1},\
\underbrace{\begin{bmatrix} 0 \\ 2 \\ 0 \\ 5 \end{bmatrix}}_{\vec{u}_2},\
\underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\vec{u}_3},\
\underbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{\vec{u}_4},\
\underbrace{\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\vec{u}_5},\
\underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}}_{\vec{u}_6}.
\]
Clearly, the set $\{\vec{u}_1, \vec{u}_2, \vec{u}_3, \vec{u}_4, \vec{u}_5, \vec{u}_6\}$ spans $\mathbb{R}^4$ (since its subset does). We can now follow the procedure of the previous example to find a linearly independent subset. The equation
\[ c_1 \vec{u}_1 + c_2 \vec{u}_2 + c_3 \vec{u}_3 + c_4 \vec{u}_4 + c_5 \vec{u}_5 + c_6 \vec{u}_6 = \vec{0} \]
is equivalent to the homogeneous system with the augmented matrix
\[
\left[\begin{array}{rrrrrr|r} 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 3 & 5 & 0 & 0 & 0 & 1 & 0 \end{array}\right].
\]
The r.r.e.f.
\[
\left[\begin{array}{rrrrrr|r} 1 & 0 & 0 & -5 & 0 & 2 & 0 \\ 0 & 1 & 0 & 3 & 0 & -1 & 0 \\ 0 & 0 & 1 & 5 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \end{array}\right]
\]
indicates that $c_4$ and $c_6$ are arbitrary. Therefore, the corresponding vectors, $\vec{u}_4$ and $\vec{u}_6$, can be expressed as linear combinations of the remaining vectors. The other vectors, $\vec{u}_1$, $\vec{u}_2$, $\vec{u}_3$, and $\vec{u}_5$, are linearly independent. Therefore, $T' = \{\vec{u}_1, \vec{u}_2, \vec{u}_3, \vec{u}_5\}$ is a basis for $\mathbb{R}^4$ (again, we use the leading entries as pointers to indicate which of the original vectors to retain).
Remember that the procedure used in both examples above depends on the ordering of the vectors. This was of particular importance in the last example: by placing the linearly independent vectors in the indexed set T first, we made sure that they would be kept, while some of the other vectors will be eliminated.
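The procedure of Examples 4.32 and 4.33 (place the vectors in the columns of a matrix, row-reduce, keep the vectors that correspond to leading entries) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pivot_columns` and `basis_of_span` are made up here, vectors are assumed to be tuples of integers or fractions, and the zero right-hand-side column of the homogeneous system is omitted because row operations never change it.

```python
from fractions import Fraction


def pivot_columns(columns):
    """Row-reduce the matrix whose columns are the given vectors and
    return the indices of the columns that hold leading entries."""
    # Row i of the matrix collects the i-th entry of every vector.
    m = [[Fraction(v[i]) for v in columns] for i in range(len(columns[0]))]
    pivots, r = [], 0
    for c in range(len(columns)):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading entry in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots


def basis_of_span(vectors):
    """Keep exactly the vectors that correspond to leading entries."""
    return [vectors[j] for j in pivot_columns(vectors)]


# Example 4.32: a spanning set of a subspace of R^3 (keeps v1 and v3).
S = [(1, 2, 3), (-1, -2, -3), (0, 1, 1), (1, 1, 2)]
print(basis_of_span(S))

# Example 4.33: the vectors of T followed by the standard basis of R^4
# (keeps u1, u2, u3 and u5).
T = [(1, 1, 0, 3), (0, 2, 0, 5)]
standard = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(basis_of_span(T + standard))
```

Because the vectors are scanned from left to right, listing the vectors of T first is what guarantees that they are retained, in line with the remark on ordering above.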
Dimension

A standard basis for $\mathbb{R}^3$ is given by the vectors
\[
\vec{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\quad
\vec{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\quad\text{and}\quad
\vec{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Also, in Example 4.26 we found that a different set of vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ provides us with another basis for $\mathbb{R}^3$. It is therefore clear that a basis is not unique: generally, for any vector space (other than $\{\vec{0}\}$), a number of different bases can be found. However, in this subsection, we shall discover a key property that all such bases for a given vector space share, after proving the following theorem.

THEOREM 4.14 If $\vec{u}_1, \ldots, \vec{u}_m$ span a vector space $V$ and $\vec{v}_1, \ldots, \vec{v}_n$ are linearly independent vectors in $V$, then $m \geq n$.

PROOF
Since the vectors $\vec{u}_1, \ldots, \vec{u}_m$ span $V$, every vector $\vec{v}_i$ can be expressed as a linear combination of $\vec{u}_1, \ldots, \vec{u}_m$:
\[
\begin{aligned}
\vec{v}_1 &= a_{11}\vec{u}_1 + \cdots + a_{m1}\vec{u}_m \\
&\ \ \vdots \\
\vec{v}_n &= a_{1n}\vec{u}_1 + \cdots + a_{mn}\vec{u}_m
\end{aligned}
\]
Consider the linear combination
\[
\vec{0} = c_1\vec{v}_1 + \cdots + c_n\vec{v}_n = (a_{11}c_1 + \cdots + a_{1n}c_n)\vec{u}_1 + \cdots + (a_{m1}c_1 + \cdots + a_{mn}c_n)\vec{u}_m
\]
(we applied conditions 2, 3, 7, 8, and 9 of the vector space definition). Any solution of the homogeneous system
\[
\begin{aligned}
a_{11}c_1 + \cdots + a_{1n}c_n &= 0 \\
&\ \ \vdots \\
a_{m1}c_1 + \cdots + a_{mn}c_n &= 0
\end{aligned}
\]
makes the right-hand side above equal to $\vec{0}$, and linear independence of $\vec{v}_1, \ldots, \vec{v}_n$ then implies $c_1 = \cdots = c_n = 0$; therefore, this system must have only the trivial solution. It follows from Theorem 2.6 that $m \geq n$.
THEOREM 4.15 If the indexed sets $S = \{\vec{u}_1, \ldots, \vec{u}_m\}$ and $T = \{\vec{v}_1, \ldots, \vec{v}_n\}$ are both bases for a vector space $V$, then $m = n$.

PROOF
Because $S$ spans $V$ and $T$ is linearly independent, it follows by Theorem 4.14 that $m \geq n$. Likewise, since $S$ is linearly independent and $T$ spans $V$, we have $m \leq n$. The only way for both of these inequalities to hold simultaneously is when $m = n$.
Let us reiterate the crucial point made by the above theorem: while a vector space may have a number of different bases, each of these bases must contain exactly the same number of vectors.
DEFINITION (Dimension) The number of vectors in any basis for a vector space $V$ is called the dimension of $V$ and is denoted by $\dim V$.

It follows from the definition that the vector space $\{\vec{0}\}$ has dimension 0 since that space has no basis (refer to Example 4.31). From the various examples, we can deduce the dimensions of the following important vector spaces:
• $\dim \mathbb{R}^n = n$ (Example 4.25)
• $\dim P_n = n + 1$ (Example 4.29)
• $\dim M_{mn} = mn$ (Example 4.30)
Not every vector space has a finite dimension. For instance, the vector space of all functions defined for all real numbers FR (see Example 4.1) has infinite dimension. In this book, we shall primarily concern ourselves with finite-dimensional spaces.
Shortcuts
Knowing the dimension of a vector space $V$ can be of great value when deciding whether a given set $S$ is a basis for $V$. Let us say that the dimension of $V$ is $n$:
• If the number of vectors in $S$ is less than or more than $n$, then $S$ cannot be a basis for $V$.
• If the number of vectors in $S$ equals $n$, then it is possible (but not guaranteed) that $S$ is a basis for $V$.
Here is a result that offers an additional shortcut in the latter case.
THEOREM 4.16 Let $S = \{\vec{u}_1, \ldots, \vec{u}_n\}$ be an indexed set of vectors in a vector space $V$ with $\dim V = n$. Then
a. if $S$ spans $V$, then $S$ is a basis for $V$, and
b. if $S$ is linearly independent, then $S$ is a basis for $V$.

PROOF
Both parts follow from Theorem 4.13. In part a, by part a of Theorem 4.13, we can obtain a set $S'$ that is a basis for $V$ by deleting some vectors from $S$. But, if we actually deleted any vectors, the size of the resulting basis would be less than $\dim V = n$, a contradiction. Consequently, $S$ itself must be a basis for $V$.
In part b, we use part b of Theorem 4.13 in a similar manner: adding vectors to the set $S$ to create a basis would contradict $\dim V = n$, therefore proving that $S$ is a basis for $V$.

LA TOOLKIT.COM: The reason why there is no separate module for checking if the set is a basis for the space in the Linear Algebra Toolkit is that there really is no need for such a module. This is because of the last theorem: once we know that we have the correct number of vectors in our set ($n = \dim V$) it is sufficient to just use the “linear independence and dependence” module to check if the set is L.I. (If the set has an “incorrect” number of vectors, $n \neq \dim V$, then it cannot be a basis for $V$ anyway.)
Finally, using Theorem 4.14, we can conclude that
• a set of vectors in a vector space $V$ that is linearly independent must contain no more than $\dim V$ vectors and
• a set of vectors in a vector space $V$ that spans $V$ cannot contain fewer than $\dim V$ vectors.
EXAMPLE 4.34 Given
\[
\vec{u}_1 = \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix},\
\vec{u}_2 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix},\
\vec{u}_3 = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix},\
\vec{u}_4 = \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix},\
\vec{u}_5 = \begin{bmatrix} 0 \\ 4 \\ 5 \end{bmatrix},
\]
which of the indexed sets
• $S_1 = \{\vec{u}_1, \vec{u}_2\}$,
• $S_2 = \{\vec{u}_3, \vec{u}_4\}$,
• $S_3 = \{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$,
• $S_4 = \{\vec{u}_2, \vec{u}_3, \vec{u}_5\}$,
• $S_5 = \{\vec{u}_1, \vec{u}_2, \vec{u}_3, \vec{u}_5\}$,
• $S_6 = \{\vec{u}_2, \vec{u}_3, \vec{u}_4, \vec{u}_5\}$
form bases for $\mathbb{R}^3$?

SOLUTION
Since the numbers of vectors in $S_1$, $S_2$, $S_5$, and $S_6$ do not match $\dim \mathbb{R}^3 = 3$, none of these sets can be a basis for $\mathbb{R}^3$. The numbers of vectors in $S_3$ and $S_4$ match $\dim \mathbb{R}^3$; thus by Theorem 4.16 it is sufficient to check whether these sets are linearly independent. Recall from the equivalent statements that the columns of a square matrix are L.I. if and only if the determinant of the matrix is nonzero.
Since
\[
\det \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \vec{u}_3 \end{bmatrix} = \det \begin{bmatrix} -1 & 2 & 1 \\ 3 & 0 & 2 \\ 1 & 3 & 4 \end{bmatrix} = -5 \neq 0,
\]
$S_3$ must be L.I. and consequently (by Theorem 4.16) is a basis for $\mathbb{R}^3$.
On the other hand,
\[
\det \begin{bmatrix} \vec{u}_2 & \vec{u}_3 & \vec{u}_5 \end{bmatrix} = \det \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 4 \\ 3 & 4 & 5 \end{bmatrix} = 0;
\]
thus $S_4$ is L.D. We conclude that $S_4$ is not a basis for $\mathbb{R}^3$.

EXAMPLE 4.35 We will show that the set $S = \{(1 - t)^3,\ 3t(1 - t)^2,\ 3t^2(1 - t),\ t^3\}$ is a basis for $P_3$.
Since the number of vectors in $S$, four, matches $\dim P_3$, by Theorem 4.16, it is sufficient to show that $S$ is linearly independent to conclude that $S$ is a basis for $P_3$. Linear independence of $S$ means that the only solution for the equation
\[ c_1(1 - t)^3 + c_2\, 3t(1 - t)^2 + c_3\, 3t^2(1 - t) + c_4 t^3 = 0 \tag{50} \]
is $c_1 = c_2 = c_3 = c_4 = 0$.
The left-hand side of equation (50) can be expanded,
\[ c_1(1 - 3t + 3t^2 - t^3) + 3c_2(t - 2t^2 + t^3) + 3c_3(t^2 - t^3) + c_4 t^3 = 0, \]
and then the like powers of $t$ can be collected:
\[ c_1 + (-3c_1 + 3c_2)t + (3c_1 - 6c_2 + 3c_3)t^2 + (-c_1 + 3c_2 - 3c_3 + c_4)t^3 = 0. \]
For the last equation to be satisfied for all $t$, the coefficients in front of $1$, $t$, $t^2$, and $t^3$ must all be equal to 0:
\[
\begin{aligned}
c_1 &= 0 \\
-3c_1 + 3c_2 &= 0 \\
3c_1 - 6c_2 + 3c_3 &= 0 \\
-c_1 + 3c_2 - 3c_3 + c_4 &= 0
\end{aligned}
\]
This system can be solved in the standard way, established in Chapter 2 (with the augmented matrix
\[
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 0 \\ -3 & 3 & 0 & 0 & 0 \\ 3 & -6 & 3 & 0 & 0 \\ -1 & 3 & -3 & 1 & 0 \end{array}\right]
\quad\text{in r.r.e.f.}\quad
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]
\]
) or can be solved in a more straightforward manner by solving the first equation for $c_1$ first, then substituting the result into the second equation and solving for $c_2$, etc. Either way, it should be clear that the only solution is the trivial solution, which means $S$ is linearly independent, making it a basis for $P_3$.
(Margin note: $n!$ is called the factorial of $n$ and is defined as $n! = 1 \cdot 2 \cdot \cdots \cdot n$ with $0! = 1$.)

Generally, the $n + 1$ polynomials
\[ p_i(t) = \frac{n!}{i!\,(n - i)!}\, t^i (1 - t)^{n - i}, \tag{51} \]
where $i = 0, 1, \ldots, n$, form a basis for $P_n$, called the Bernstein basis. In the last example, we have verified that this is indeed the case for $n = 3$ (we omit the proof for general $n$).
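The proof for general n is omitted, but the pattern seen for n = 3 can be explored numerically. The sketch below is an illustration added here, not the book's argument: it expands each polynomial of (51) into monomial coefficients with the binomial theorem and checks that the resulting coefficient matrix is triangular with a nonzero diagonal, which is what forces the trivial solution in an equation like (50). The function names are hypothetical.

```python
from math import comb


def bernstein_coefficients(n):
    """Row i holds the coefficients of 1, t, ..., t^n in p_i(t) from (51)."""
    rows = []
    for i in range(n + 1):
        row = [0] * (n + 1)
        for j in range(i, n + 1):
            # t^i (1 - t)^(n - i) contributes (-1)^(j - i) * C(n - i, j - i) to t^j
            row[j] = comb(n, i) * comb(n - i, j - i) * (-1) ** (j - i)
        rows.append(row)
    return rows


def is_triangular_with_nonzero_diagonal(rows):
    """True when all entries left of the diagonal vanish and the diagonal does not."""
    size = len(rows)
    return all(rows[i][i] != 0 for i in range(size)) and all(
        rows[i][j] == 0 for i in range(size) for j in range(i)
    )


# For n = 3 the rows are the expansions used in Example 4.35.
for row in bernstein_coefficients(3):
    print(row)
print(is_triangular_with_nonzero_diagonal(bernstein_coefficients(3)))
```

The rows for n = 3 are the transposes of the columns of the coefficient matrix in Example 4.35. The check only covers the values of n it is run for.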
Subspaces of a finite-dimensional space

While there are many important vector spaces whose dimensions are infinite (e.g., the space $F_{\mathbb{R}}$ of all functions defined on the set of real numbers), in this book we most often study finite-dimensional vector spaces, such as $\mathbb{R}^n$, $P_n$, or $M_{mn}$. The following result relates dimensions of such spaces to those of their subspaces.
THEOREM 4.17 Let $V$ be a finite-dimensional vector space with a subspace $W$.
1. $\dim W \leq \dim V$.
2. If $\dim W = \dim V$, then $W = V$.

PROOF
Part 1 follows directly from Theorem 4.14: since a basis for $W$ must be a linearly independent set of vectors in $W$ (and also in $V$), it cannot contain more vectors than does a basis for $V$ (which spans $V$).
To prove part 2, let $S$ be a basis for $W$. Since $S$ is a linearly independent set of vectors in $W$ (and also in $V$) and it contains $\dim W = \dim V$ vectors, by Theorem 4.16, $S$ must also be a basis for $V$. Since $W = \operatorname{span} S$ and $V = \operatorname{span} S$, we must have $W = V$.
The possible dimensions of subspaces of the 2-space and of the 3-space are:
• Subspaces of $\mathbb{R}^2$: a point (dimension = 0), a line (dimension = 1), the plane (dimension = 2).
• Subspaces of $\mathbb{R}^3$: a point (dimension = 0), a line (dimension = 1), a plane (dimension = 2), the space (dimension = 3).
[Figure: sketches of each type of subspace in the xy-plane and in xyz-space.]
EXAMPLE 4.36 In Example 4.34, we considered six sets of vectors in $\mathbb{R}^3$; one of those sets, $S_3$, was shown to form a basis for $\mathbb{R}^3$. Consequently, $\dim \operatorname{span} S_3 = 3$: the set $S_3$ spans the entire 3-space. Let us determine the dimensions of the subspaces of $\mathbb{R}^3$ spanned by each of the remaining five sets in that example:

• The equation $c_1\vec{u}_1 + c_2\vec{u}_2 = \vec{0}$ can be rewritten as a system
\[
\begin{aligned}
-1c_1 + 2c_2 &= 0 \\
3c_1 + 0c_2 &= 0 \\
1c_1 + 3c_2 &= 0
\end{aligned}
\]
whose augmented matrix $\left[\begin{array}{rr|r} -1 & 2 & 0 \\ 3 & 0 & 0 \\ 1 & 3 & 0 \end{array}\right]$ has the r.r.e.f. $\left[\begin{array}{rr|r} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right]$. Since the system has only the trivial solution $c_1 = c_2 = 0$, $S_1$ is linearly independent (see footnote 13); hence it forms a basis for span $S_1$. We conclude that the dimension of span $S_1$ is 2 (span $S_1$ is a plane passing through the origin in 3-space).

• We could proceed as above to check for linear independence of $S_2$. However, it would be faster to use Theorem 4.8 instead: by inspection, $\vec{u}_4 = 2\vec{u}_3$ yields the linear dependence of $S_2$. The subspace of $\mathbb{R}^3$ spanned by $S_2$ is also spanned by the (nonzero) vector $\vec{u}_3$. Consequently, the dimension of span $S_2$ is 1 (span $S_2$ is a line passing through the origin in 3-space).

• The equation $c_2\vec{u}_2 + c_3\vec{u}_3 + c_5\vec{u}_5 = \vec{0}$ can be rewritten as a system
\[
\begin{aligned}
2c_2 + 1c_3 + 0c_5 &= 0 \\
0c_2 + 2c_3 + 4c_5 &= 0 \\
3c_2 + 4c_3 + 5c_5 &= 0
\end{aligned}
\]
whose augmented matrix $\left[\begin{array}{rrr|r} 2 & 1 & 0 & 0 \\ 0 & 2 & 4 & 0 \\ 3 & 4 & 5 & 0 \end{array}\right]$ has the r.r.e.f. $\left[\begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$. There is no leading entry in the third column; therefore, the third vector ($\vec{u}_5$) can be expressed as a linear combination of the first two vectors ($\vec{u}_2$ and $\vec{u}_3$), which are linearly independent (since they correspond to leading columns; see Example 4.32). The vectors $\vec{u}_2$ and $\vec{u}_3$ form a basis for span $S_4$; the dimension of span $S_4$ is 2 (span $S_4$ is a plane passing through the origin in 3-space; see the illustration in the margin).
[Margin figure: the plane $\operatorname{span}\{\vec{u}_2, \vec{u}_3, \vec{u}_5\}$ in xyz-space, with the vectors $\vec{u}_2$, $\vec{u}_3$, and $\vec{u}_5$ drawn from the origin O.]

• Noticing that $S_5$ is a superset of $S_3$, which spans the entire $\mathbb{R}^3$, we conclude that $S_5$ also spans $\mathbb{R}^3$, so the dimension of span $S_5$ is 3.

• The equation $c_2\vec{u}_2 + c_3\vec{u}_3 + c_4\vec{u}_4 + c_5\vec{u}_5 = \vec{0}$ can be rewritten as a system whose augmented matrix has the reduced row echelon form $\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 2 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$. The positions of leading entries indicate that the first two vectors ($\vec{u}_2$ and $\vec{u}_3$) form a basis for span $S_6$. The dimension of span $S_6$ is 2 (span $S_6$ is a plane passing through the origin in 3-space).
Footnote 13: Alternatively, linear independence of $S_1$ could be established by noticing that neither vector is a scalar multiple of the other, then using Theorem 4.8.
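The dimensions found in Example 4.36 can be cross-checked by counting leading entries mechanically. The sketch below is an illustration under stated assumptions: the function name `span_dimension` is invented here, and, unlike the example, it places the vectors in the rows of the matrix (row operations do not change the span of the rows, so the number of nonzero rows left after elimination equals the dimension of the span).

```python
from fractions import Fraction


def span_dimension(vectors):
    """Dimension of the span of the given vectors, via forward elimination on rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    found = 0  # number of leading entries located so far
    for c in range(len(rows[0])):
        p = next((i for i in range(found, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[found], rows[p] = rows[p], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][c] / rows[found][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
        if found == len(rows):
            break
    return found


u1, u2, u3, u4, u5 = (-1, 3, 1), (2, 0, 3), (1, 2, 4), (2, 4, 8), (0, 4, 5)
sets = {
    "S1": [u1, u2],
    "S2": [u3, u4],
    "S3": [u1, u2, u3],
    "S4": [u2, u3, u5],
    "S5": [u1, u2, u3, u5],
    "S6": [u2, u3, u4, u5],
}
for name, vectors in sets.items():
    print(name, span_dimension(vectors))
```

The printed dimensions are 2, 1, 3, 2, 3, 2 for S1 through S6, in agreement with Examples 4.34 and 4.36.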
EXERCISES
−1 1. Is { , } a basis for R2 ? 3 2 −4 2. Is { , } a basis for R2 ? −1 2 1 7 0 3. Is { , , } a basis for R2 ? 3 −5 2 1 4. Is { } a basis for R2 ? −1 2 1
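5. Is $\Bigl\{\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^3$?
6. Is $\Bigl\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^3$?
7. Is $\Bigl\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^3$?
8. Is $\Bigl\{\begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^3$?
9. Is $\Bigl\{\begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^4$?
10. Is $\Bigl\{\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 0 \\ 0 \end{bmatrix}\Bigr\}$ a basis for $\mathbb{R}^4$?
11. Is $\{t^2, t\}$ a basis for $P_2$?
12. Is $\{2t, -1 + 2t^2, t - t^2\}$ a basis for $P_2$?
13. Is $\{2 + t, 1 + 3t\}$ a basis for $P_1$?
14. Is $\{t, 1 - 2t, 2 - 3t\}$ a basis for $P_1$?
15. Is $\{t^2 + t, t^3, 1 + t\}$ a basis for $P_3$?
16. Is $\bigl\{\begin{bmatrix} 1 & 0 & 3 \end{bmatrix}, \begin{bmatrix} 0 & 2 & 1 \end{bmatrix}, \begin{bmatrix} -1 & 1 & 1 \end{bmatrix}, \begin{bmatrix} 2 & 1 & 0 \end{bmatrix}\bigr\}$ a basis for $M_{13}$?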
⎤ ⎡ 0 0 ⎥ ⎢ 1 ⎦,⎣ 1 1 0
⎡
⎤ 0 0 ⎥ 1 0 ⎦} a basis for M33 ? 0 0 0 1 1 1 1 , , , 0 0 0 1 0 0 1 1 1 1 , , , 1 1 0 0 1
2 ⎢ 17. Is {⎣ 0 1 1 ⎢ 18. Is {⎣ 0 0 1 19. Is { 0 1 20. Is { 1
⎤ 1 ⎥ 0 ⎦} a basis for M32 ? 0
1 1 1 1 0 1 1 1
} a basis for M22 ? } a basis for M22 ?
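In Exercises 21–24, for each set S,
a. find a subset of S that forms a basis for span S,
b. determine dim span S,
c. geometrically describe span S as a point, a line, or the entire plane.
21. $\Bigl\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \end{bmatrix}\Bigr\}$
22. $\Bigl\{\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \end{bmatrix}\Bigr\}$
23. $\Bigl\{\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}\Bigr\}$
24. $\Bigl\{\begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\Bigr\}$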
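In Exercises 25–30, for each set S,
a. find a subset of S that forms a basis for span S,
b. determine dim span S,
c. geometrically describe span S as a point, a line, a plane, or the entire 3-space.
25. $\Bigl\{\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\Bigr\}$
26. $\Bigl\{\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix}\Bigr\}$
27. $\Bigl\{\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\Bigr\}$
28. $\Bigl\{\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\Bigr\}$
29. $\Bigl\{\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\Bigr\}$
30. $\Bigl\{\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\Bigr\}$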
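In Exercises 31–36, for each set S,
a. find a subset of S that forms a basis for span S,
b. determine dim span S.
31. $\Bigl\{\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 0 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 1 \\ -6 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -3 \\ -3 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \\ -3 \end{bmatrix}\Bigr\}$
32. $\Bigl\{\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \end{bmatrix}\Bigr\}$
33. $\{-t,\ 1 - t,\ 2 + t,\ t - t^2\}$
34. $\{t + t^2 - t^3,\ 1 + 2t^2 + t^3,\ 1 - 2t + 3t^3,\ 1 + t + 3t^2,\ 1 - t + t^2 + 2t^3\}$
35. $\Bigl\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\Bigr\}$
36. $\Bigl\{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}\Bigr\}$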
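In Exercises 37–38, find a basis for $\mathbb{R}^3$ that contains the given vectors.
37. a. $\begin{bmatrix} 1 \\ -3 \\ 0 \end{bmatrix}$; b. $\begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}$.
38. a. $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$; b. $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.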
T/F? In Exercises 39–44, decide whether each statement is true or false. Justify your answer.
39. A vector space cannot have two different bases.
40. If there are m vectors that are linearly independent in a vector space V and n vectors span the space, then m ≤ n.
41. If a vector space has dimension 1, then it can be spanned by one vector.
42. A basis cannot contain a zero vector.
43. Every linearly independent set of vectors in a vector space V is a basis for some subspace of V.
44. If a vector space has dimension 7, then it can be spanned by 6 vectors.
45. * Consider two linear equations
\[ u_1 x + u_2 y + u_3 z = b \quad\text{and}\quad v_1 x + v_2 y + v_3 z = c \]
corresponding to two planes whose normal vectors
\[ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \quad\text{and}\quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \]
are linearly independent. Consider a third plane whose equation is obtained by adding k times the first equation to the second one.
a. Show that the third plane contains the line of intersection of the first two.
b. Show that the normal vector to the third plane is in $\operatorname{span}\{\vec{u}, \vec{v}\}$.
c. Show that every vector in $\operatorname{span}\{\vec{u}, \vec{v}\}$ can be obtained as a normal to this plane by taking the appropriate k value, except for vectors in one specific direction. What is this direction?
4.5 Coordinates

We open this section with a result that showcases a key property of a basis.

THEOREM 4.18 Let $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ be an indexed set of vectors in a vector space $V$. $S$ is a basis for $V$ if and only if every vector in $V$ can be expressed as a linear combination of vectors in $S$ in a unique way.

PROOF
Part I (⇒)
Assuming $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ is a basis for $V$, it follows that $S$ spans $V$; therefore, for every vector $\vec{x}$ in $V$, we can write
\[ \vec{x} = c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k. \tag{52} \]
To show that this representation is unique, let us assume there is another way to express $\vec{x}$ as a linear combination of vectors in $S$:
\[ \vec{x} = d_1\vec{u}_1 + d_2\vec{u}_2 + \cdots + d_k\vec{u}_k. \]
Subtracting the two equations and applying conditions 2, 3, 5, and 8 of the vector space definition leads to
\[ \vec{0} = (c_1 - d_1)\vec{u}_1 + (c_2 - d_2)\vec{u}_2 + \cdots + (c_k - d_k)\vec{u}_k. \]
It follows from linear independence of $S$ (it was assumed to be a basis) that the only way to satisfy the last equation is with all the coefficients equal to zero: $c_1 - d_1 = c_2 - d_2 = \cdots = c_k - d_k = 0$, which implies that $d_j = c_j$ for all $j$. Therefore, the representation (52) is unique.

Part II (⇐)
Assuming there is a unique way to express every vector in $V$ as a linear combination of vectors in $S$, we want to show that $S$ is a basis for $V$; i.e.,
a. $S$ spans $V$, and
b. $S$ is linearly independent.
Part a ($S$ spans $V$) follows immediately since we have assumed that every vector in $V$ can be expressed as a linear combination of vectors in $S$; we don't even need the uniqueness here.
Part b can be demonstrated if we focus on the zero vector in $V$, $\vec{0}$. We have
\[ \vec{0} = c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k \]
with a unique set of values $c_1, c_2, \ldots, c_k$ (since this was true for any vector in $V$). Because $c_1 = c_2 = \cdots = c_k = 0$ is one such set of values, it is the only one. This implies that $S$ is linearly independent.
Concluding, $S$ is a basis for $V$.
Let us stop and think about the significance of the above theorem.
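Recall the two sets introduced at the beginning of the previous section:
• $S_1 = \Bigl\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix}\Bigr\}$,
• $S_2 = \Bigl\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\Bigr\}$.
Both of these sets span the entire $\mathbb{R}^2$. However, $S_2$ is also linearly independent, thereby forming a basis for $\mathbb{R}^2$; $S_1$ is linearly dependent, so it cannot be a basis.
Because both sets span $\mathbb{R}^2$, any 2-vector $\vec{u} = \begin{bmatrix} x \\ y \end{bmatrix}$ can be expressed as a linear combination of the vectors in either set:
\[ \begin{bmatrix} x \\ y \end{bmatrix} = d_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + d_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} + d_3\begin{bmatrix} 2 \\ 3 \end{bmatrix} \tag{53} \]
and
\[ \begin{bmatrix} x \\ y \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{54} \]
The key distinction is that for the given values of $x$ and $y$, only one set of coefficients $c_1$ and $c_2$ satisfies (54), whereas many possible combinations of coefficients $d_1$, $d_2$, and $d_3$ solve (53).
For example, the only way to express the vector $\begin{bmatrix} 5 \\ 6 \end{bmatrix}$ in the form (54) is
\[ \begin{bmatrix} 5 \\ 6 \end{bmatrix} = 5\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 6\begin{bmatrix} 0 \\ 1 \end{bmatrix}, \]
but there are many ways to do so in the form (53), e.g.,
\[
\begin{aligned}
\begin{bmatrix} 5 \\ 6 \end{bmatrix}
&= 5\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 6\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 2 \\ 3 \end{bmatrix} \\
&= 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 3 \end{bmatrix} \\
&= 1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 2 \\ 3 \end{bmatrix}, \text{ etc.}
\end{aligned}
\]
[Margin figures: the vector $(5, 6)$ in the xy-plane drawn as $5\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 6\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, as $3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, and as $1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 2 \\ 3 \end{bmatrix}$.]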
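To see where the many representations of the form (53) come from, one can solve for the coefficients directly. The following short computation treats the vector with $x = 5$, $y = 6$; the parameter name $s$ is introduced here only for illustration.
\[
\begin{aligned}
d_1 + 2d_3 &= 5 \\
d_2 + 3d_3 &= 6
\end{aligned}
\quad\Longrightarrow\quad
d_3 = s,\quad d_1 = 5 - 2s,\quad d_2 = 6 - 3s \quad (s \text{ arbitrary}).
\]
Taking $s = 0, 1, 2$ reproduces the three representations listed above, and every other value of $s$ gives yet another one. In (54) there is no free parameter: $c_1 = 5$ and $c_2 = 6$ are forced.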
It is generally preferable to be able to reference a vector in a simple, unique manner afforded by a basis, as opposed to the multitude of possible references when a basis is not used. If you play chess (or at least are familiar with the way chess pieces move), then the following illustration may appeal to you. Consider various ways in which different pieces can move four squares to the right and two squares forward (without going left or back): (Q) A queen can do so in many ways, including some possible diagonal moves. There is no uniqueness here. (R) A rook can do it in a few ways, too, but all of them amount to composing a horizontal motion (by 4) with a vertical one (by 2). (P) A pawn can’t get there at all, as it can only move forward (if it doesn’t capture another piece).
In terms of 2-vectors:
(Q) The set that describes possible directions for a queen,
\[ \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \]
spans the entire plane, but is too big to form a basis for it (it is linearly dependent).
(R) The two available directions for a rook correspond to
\[ \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \]
which spans $\mathbb{R}^2$ and is linearly independent.
(P) The only direction a pawn can take (unless capturing another piece) is along $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (actually, it cannot move back; therefore, the set of its available directions is even smaller than the span of this vector); this set is linearly independent but does not span the plane. It cannot form a basis for the plane.
[Margin figures: “The queen can get there in many ways... while the rook must move 4 across and 2 forward... whereas the pawn can't get there at all!”]

The unique values of the coefficients assigned to the individual basis vectors will now be given a special name.
DEFINITION
If $\vec{v}$ is a vector in a vector space $V$ and $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ is a basis for $V$, then the coordinate vector of $\vec{v}$ with respect to $S$ is the k-vector
\[ [\vec{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix} \]
where
\[ \vec{v} = c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k. \]
EXAMPLE 4.37 The standard basis for $\mathbb{R}^n$, $S = \{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ (where $\vec{e}_j$ is the jth column of $I_n$), corresponds to particularly nice coordinate vectors.
If $\vec{v} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then $[\vec{v}]_S = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ (check!). In other words, $[\vec{v}]_S = \vec{v}$.
(A special case of this was discussed earlier in this section: the set $S_2$ was the standard basis for $\mathbb{R}^2$, and we had $\Bigl[\begin{bmatrix} 5 \\ 6 \end{bmatrix}\Bigr]_{S_2} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$.)
Finding coordinates of a given vector

EXAMPLE 4.38 In Example 4.26, we have demonstrated that the set
\[ T = \Bigl\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\Bigr\} \]
forms a basis for $\mathbb{R}^3$.
Let us find the coordinate vector $[\vec{v}]_T$ of $\vec{v} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$.
By definition, $[\vec{v}]_T = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$ where
\[ c_1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}. \]
This corresponds to the linear system
\[
\begin{aligned}
c_1 \phantom{{}+c_2} + c_3 &= 3 \\
c_2 + c_3 &= 0 \\
c_1 - c_2 + c_3 &= 1
\end{aligned}
\]
with augmented matrix
\[ \left[\begin{array}{rrr|r} 1 & 0 & 1 & 3 \\ 0 & 1 & 1 & 0 \\ 1 & -1 & 1 & 1 \end{array}\right]. \]
After a sequence of elementary row operations ($r_3 - r_1 \to r_3$; $r_3 + r_2 \to r_3$; $r_2 - r_3 \to r_2$; $r_1 - r_3 \to r_1$) we arrive at the reduced row echelon form:
\[ \left[\begin{array}{rrr|r} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -2 \end{array}\right]. \]
The solution of the system is also the answer to our question: $[\vec{v}]_T = \begin{bmatrix} 5 \\ 2 \\ -2 \end{bmatrix}$.
(Check: $5\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} - 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$.)

LA TOOLKIT.COM: The module “solving a linear system” can be used to generate more detail for this part of our solution.
EXAMPLE 4.39 The set $S = \{2t,\ -1 + 2t^2,\ t - t^2\}$ can be shown to be a basis for $P_2$ (see Exercise 12 in Section 4.4).
We would like to find the coordinate vector of $\vec{u} = -3 - 4t + 6t^2$ with respect to the basis $S$.
According to the definition, $[\vec{u}]_S = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$ where
\[ c_1(2t) + c_2(-1 + 2t^2) + c_3(t - t^2) = -3 - 4t + 6t^2. \]
For these two polynomials to be equal, the coefficients corresponding to the same power of $t$ on both sides must match:
\[
\begin{aligned}
-c_2 &= -3 \\
2c_1 + c_3 &= -4 \\
2c_2 - c_3 &= 6
\end{aligned}
\]
This system is simple enough to be solved directly by substitution, or we can use the standard approach based on the augmented matrix,
\[ \left[\begin{array}{rrr|r} 0 & -1 & 0 & -3 \\ 2 & 0 & 1 & -4 \\ 0 & 2 & -1 & 6 \end{array}\right]. \]
Its reduced row echelon form $\left[\begin{array}{rrr|r} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \end{array}\right]$ yields the solution $[\vec{u}]_S = \begin{bmatrix} -2 \\ 3 \\ 0 \end{bmatrix}$.
(Check: $-2(2t) + 3(-1 + 2t^2) + (0)(t - t^2) = -3 - 4t + 6t^2$.)
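Examples 4.38 and 4.39 follow the same recipe: write the basis vectors as the columns of an augmented matrix, append the given vector as the last column, and row-reduce. A minimal Python sketch of that recipe is given here for illustration. It assumes vectors are tuples of integers or fractions and that a polynomial in $P_2$ is encoded by its coefficients of $1$, $t$, $t^2$; the function name `coordinates` and this encoding are choices made for the sketch, not notation from the text.

```python
from fractions import Fraction


def coordinates(basis, v):
    """Solve c1*b1 + ... + ck*bk = v by Gauss-Jordan elimination.

    Returns [c1, ..., ck], or None when no unique solution exists
    (the set is linearly dependent, or v is outside its span).
    """
    k, n = len(basis), len(v)
    # Augmented matrix: basis vectors as columns, v as the last column.
    m = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    r = 0
    for c in range(k):
        p = next((i for i in range(r, n) if m[i][c] != 0), None)
        if p is None:
            return None  # a column without a leading entry
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(n):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    # A leftover row reading 0 = nonzero means the system is inconsistent.
    if any(m[i][k] != 0 for i in range(r, n)):
        return None
    return [m[i][k] for i in range(k)]


# Example 4.38: coordinates of (3, 0, 1) with respect to T.
T = [(1, 0, 1), (0, 1, -1), (1, 1, 1)]
print(coordinates(T, (3, 0, 1)))

# Example 4.39: 2t, -1 + 2t^2, t - t^2 and -3 - 4t + 6t^2 as coefficient tuples.
S = [(0, 2, 0), (-1, 0, 2), (0, 1, -1)]
print(coordinates(S, (-3, -4, 6)))
```

The two calls return the coordinate vectors (5, 2, -2) and (-2, 3, 0) found above, as exact fractions.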
Let us summarize the procedure followed in the two examples above. Given a basis $S$ and a vector $\vec{u}$, finding $[\vec{u}]_S$ generally involves solving a linear system (unless the problem becomes as easy as in Example 4.37, requiring hardly any work).
Moving in the opposite direction, i.e., finding $\vec{u}$ when $[\vec{u}]_S$ is provided, turns out to be more straightforward, as evidenced by the following two examples.
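Finding the vector given its coordinate vector

EXAMPLE 4.40 Once again, consider $T = \Bigl\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\Bigr\}$, a basis for $\mathbb{R}^3$ (see Example 4.26).
Knowing the coordinate vector $[\vec{w}]_T = \begin{bmatrix} 6 \\ -2 \\ 1 \end{bmatrix}$ allows us to evaluate the corresponding vector $\vec{w}$ as follows:
\[ \vec{w} = 6\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ 9 \end{bmatrix}. \]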
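Going from a coordinate vector back to the vector needs no linear solve, only a linear combination. A small Python sketch of the computation in Example 4.40 follows; the function name `from_coordinates` is hypothetical, and vectors are assumed to be equal-length tuples of numbers.

```python
def from_coordinates(basis, coords):
    """Return c1*b1 + ... + ck*bk, entry by entry."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(n)]


# Example 4.40: the basis T of R^3 and the coordinate vector (6, -2, 1).
T = [(1, 0, 1), (0, 1, -1), (1, 1, 1)]
print(from_coordinates(T, (6, -2, 1)))  # [7, -1, 9]
```

The same function applies to spaces such as $M_{22}$ once each matrix is flattened into a tuple of its entries, which is one possible encoding among several.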
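EXAMPLE 4.41 The set
\[ S = \Bigl\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\Bigr\} \]
is a basis for $M_{22}$ (refer to Exercise 19 in Section 4.4).
Based on $[\vec{u}]_S = \begin{bmatrix} 7 \\ -5 \\ 4 \\ -1 \end{bmatrix}$, we have
\[ \vec{u} = 7\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} - 5\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + 4\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} - \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & -2 \\ 3 & -1 \end{bmatrix}. \]

Coordinates with respect to a basis for a subspace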
EXAMPLE 4.42 The set $S = \left\{ \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix} \right\}$ is linearly independent by part 2 of Theorem 4.8 since neither vector $\vec{u}_1 = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$ nor $\vec{u}_2 = \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}$ is a scalar multiple of the other one. Consequently, $S$ is a basis for a two-dimensional subspace of $\mathbb{R}^3$, a plane through the origin.
[Margin figure: the vectors $\vec{u}_1$, $\vec{u}_2$, $\vec{v}_1$, $\vec{v}_2$, $\vec{v}_3$ plotted against the x-, y-, and z-axes.]
a. Given $\vec{v}_1 = \begin{bmatrix} 2 \\ 4 \\ 4 \end{bmatrix}$, let us calculate $[\vec{v}_1]_S$. We need to solve
\[ c_1 \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 4 \end{bmatrix}, \]
which corresponds to a system whose augmented matrix $\left[\begin{array}{cc|c} 0 & 2 & 2 \\ 2 & 0 & 4 \\ 1 & 2 & 4 \end{array}\right]$ has the reduced row echelon form $\left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{array}\right]$. We conclude that $c_1 = 2$ and $c_2 = 1$; therefore, $[\vec{v}_1]_S = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. As illustrated in the margin figure, $\vec{v}_1$ is a result of adding two vectors $\vec{u}_1$ (shown in white) and one vector $\vec{u}_2$ (shown in gray).
b. Repeating the procedure of part a for the vector $\vec{v}_2 = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$ yields the augmented matrix $\left[\begin{array}{cc|c} 0 & 2 & 2 \\ 2 & 0 & 3 \\ 1 & 2 & 0 \end{array}\right]$ with the reduced row echelon form $\left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]$. There are no values $c_1$ and $c_2$ for which $c_1 \vec{u}_1 + c_2 \vec{u}_2 = \vec{v}_2$ since $\vec{v}_2$ is not on the plane spanned by $\vec{u}_1$ and $\vec{u}_2$ (see the figure on the margin). Consequently, $[\vec{v}_2]_S$ cannot be found.
c. To find the vector $\vec{v}_3$ given its coordinate vector $[\vec{v}_3]_S = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ we simply form the linear combination
\[ \vec{v}_3 = 1 \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}. \]
(As in part a, this linear combination is illustrated in the margin figure.)
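Parts a and b of Example 4.42 follow the same recipe: row reduce the augmented matrix and check whether a leading entry lands in the last column. The sketch below is one possible way to automate this with exact fractions; the names `rref` and `coordinates` are hypothetical helpers, and the code assumes the given basis vectors are linearly independent.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the row that receives the next leading entry
    for col in range(len(m[0])):
        if r == len(m):
            break
        # Look for a nonzero entry in this column, at or below row r.
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

def coordinates(basis, v):
    """Coordinate vector of v with respect to basis, or None if v is not in the span.

    Assumes the vectors in basis are linearly independent.
    """
    k = len(basis)
    augmented = [[b[i] for b in basis] + [v[i]] for i in range(len(v))]
    reduced = rref(augmented)
    # A row [0 ... 0 | nonzero] means the system is inconsistent.
    for row in reduced:
        if all(x == 0 for x in row[:k]) and row[k] != 0:
            return None
    return [reduced[i][k] for i in range(k)]

S = [[0, 2, 1], [2, 0, 2]]          # basis from Example 4.42
print(coordinates(S, [2, 4, 4]))    # part a
print(coordinates(S, [2, 3, 0]))    # part b: vector off the plane
```

Returning `None` in the inconsistent case mirrors the conclusion of part b that the coordinate vector cannot be found.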
Linearity of coordinates
Let $S = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n\}$ be a basis for a vector space $V$. If
\[ [\vec{v}]_S = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \quad \text{and} \quad [\vec{w}]_S = \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}, \]
i.e.,
\[ \vec{v} = c_1 \vec{u}_1 + \cdots + c_n \vec{u}_n \quad \text{and} \quad \vec{w} = d_1 \vec{u}_1 + \cdots + d_n \vec{u}_n, \]
then applying conditions 2, 3, 7, 8, and 9 of the vector space definition we obtain
\[ a\vec{v} + b\vec{w} = a(c_1 \vec{u}_1 + \cdots + c_n \vec{u}_n) + b(d_1 \vec{u}_1 + \cdots + d_n \vec{u}_n) = (ac_1 + bd_1)\vec{u}_1 + \cdots + (ac_n + bd_n)\vec{u}_n \]
so that
\[ [a\vec{v} + b\vec{w}]_S = \begin{bmatrix} ac_1 + bd_1 \\ \vdots \\ ac_n + bd_n \end{bmatrix} = a \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} + b \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} = a[\vec{v}]_S + b[\vec{w}]_S. \]
This means that a coordinate vector of a linear combination of two vectors can be obtained by taking a linear combination of the individual coordinate vectors (with the same coefficients). Applying this result repeatedly (using mathematical induction), the following general theorem can be proved.
THEOREM 4.19 Let $S$ be a basis for a finite-dimensional vector space $V$. If $\vec{v}_1, \ldots, \vec{v}_k$ are vectors in $V$ and $a_1, \ldots, a_k$ are scalars, then
\[ [a_1 \vec{v}_1 + \cdots + a_k \vec{v}_k]_S = a_1 [\vec{v}_1]_S + \cdots + a_k [\vec{v}_k]_S. \]
The theorem above will lead us to a number of results. Here is one of them.
THEOREM 4.20 If $\vec{u}_1, \ldots, \vec{u}_m$ are vectors in a finite-dimensional vector space $V$ and $S$ is a basis for $V$, then
1. the vectors $[\vec{u}_1]_S, \ldots, [\vec{u}_m]_S$ are linearly independent if and only if the vectors $\vec{u}_1, \ldots, \vec{u}_m$ are linearly independent, and
2. $\dim \operatorname{span}\{[\vec{u}_1]_S, \ldots, [\vec{u}_m]_S\} = \dim \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_m\}$.
PROOF If $m = 1$, then $\{\vec{u}_1\}$ is L.D. $\Leftrightarrow$ $\vec{u}_1 = \vec{0}$ $\Leftrightarrow$ $[\vec{u}_1]_S = \vec{0}$ $\Leftrightarrow$ $\{[\vec{u}_1]_S\}$ is L.D.
If $m > 1$, from the linearity of coordinates, whenever one of the vectors $\vec{u}_1, \ldots, \vec{u}_m$ is expressible as a linear combination of others, the same can be said about $[\vec{u}_1]_S, \ldots, [\vec{u}_m]_S$, and vice versa:
\[ \vec{u}_j = c_1 \vec{u}_1 + \cdots + c_{j-1} \vec{u}_{j-1} + c_{j+1} \vec{u}_{j+1} + \cdots + c_m \vec{u}_m \]
is equivalent to
\[ [\vec{u}_j]_S = c_1 [\vec{u}_1]_S + \cdots + c_{j-1} [\vec{u}_{j-1}]_S + c_{j+1} [\vec{u}_{j+1}]_S + \cdots + c_m [\vec{u}_m]_S. \]
If no such redundant vectors can be found in either set, then both are linearly independent. Otherwise, after all such redundant vectors are eliminated from both lists, the number of remaining linearly independent vectors is the same on both. In all cases, the number of linearly independent vectors in both sets is equal to
\[ \dim \operatorname{span}\{[\vec{u}_1]_S, \ldots, [\vec{u}_m]_S\} = \dim \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_m\}. \]
Application: Control points of Bézier curves
Polynomials are indispensable for computer graphics, geometric design, computer fonts, and many related areas. Typically, a curve is modeled using polynomials in a parametric and piecewise fashion.
[Figure: A parametric cubic curve (top right corner) with the corresponding cubic graphs for the y-component cubic (on top) and x-component cubic (on the right; axes have been swapped to match the parametric curve axis).]
The parametric approach is made necessary by the fundamental limitation of any function $y = f(x)$ (of which a polynomial function is a special case): a graph of such a function in the xy-plane must not pass through the same vertical line more than once (this is sometimes referred to as the "vertical line test" for a function graph). While some curves can be modeled this way, most cannot (e.g., ∞ or ϕ). All of these shapes can, however, be described using parametric polynomial curves $x = x(t)$ and $y = y(t)$ (for space curves, we add the third component, $z = z(t)$). We will sometimes refer to such parametric polynomials using the standard vector-valued function notation
\[ \vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix} \quad \text{or} \quad \vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}. \]
(If the components $x$, $y$, and $z$ are polynomials in $P_n$, then these vector-valued polynomials are in Cartesian product spaces $P_n \times P_n$ and $P_n \times P_n \times P_n$; see Exercise 17 on p. 150.)
Some shapes may require several such pieces; i.e., they can be generated by piecewise (parametric) polynomial curves. Theoretically, complicated shapes could be modeled using fewer pieces involving parametric polynomials of high degree, but the increased complexity makes this approach less practical. Instead, piecewise parametric cubic curves are used very often, as they offer adequate flexibility.
The specific curve illustrated here corresponds to
\[ \begin{aligned} x(t) &= 2 - 3t + 15t^2 - 10t^3 \\ y(t) &= 5 - 12t + 12t^2, \end{aligned} \tag{55} \]
taken over the interval $[0, 1]$ for $t$ (a standard assumption for parametric curves). While it is possible to confirm that at least some values of these polynomials match the ones depicted (e.g., $x(0) = 2$, $y(0) = 5$), there is no easy way to relate representation (55) to the corresponding shape of the cubic.
Things will improve a great deal if we consider the Bernstein basis for $P_3$ instead (see Example 4.35): $S = \{(1-t)^3, \; 3t(1-t)^2, \; 3t^2(1-t), \; t^3\}$. The coordinate vectors
\[ [x(t)]_S = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \quad \text{and} \quad [y(t)]_S = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix} \]
are solutions of equations
\[ \begin{aligned} c_1 (1-t)^3 + c_2 \, 3t(1-t)^2 + c_3 \, 3t^2(1-t) + c_4 t^3 &= x(t) \\ d_1 (1-t)^3 + d_2 \, 3t(1-t)^2 + d_3 \, 3t^2(1-t) + d_4 t^3 &= y(t). \end{aligned} \]
These can be solved by following the procedure used in Example 4.39. The solutions are
\[ [x(t)]_S = \begin{bmatrix} 2 \\ 1 \\ 5 \\ 4 \end{bmatrix} \quad \text{and} \quad [y(t)]_S = \begin{bmatrix} 5 \\ 1 \\ 1 \\ 5 \end{bmatrix} \tag{56} \]
(verify!).
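One way to carry out the suggested verification is to expand the Bernstein combination back into powers of $t$ and compare with (55). The sketch below does this with the binomial expansion of $(1-t)^{n-i}$; the function name `bernstein_to_power` is a hypothetical helper, not something defined in the text.

```python
from math import comb

def bernstein_to_power(coords):
    """Return [a0, a1, ..., an] such that
    sum_i coords[i] * C(n, i) * t^i * (1 - t)^(n - i) = a0 + a1*t + ... + an*t^n."""
    n = len(coords) - 1
    power = [0] * (n + 1)
    for i, c in enumerate(coords):
        # (1 - t)^(n - i) = sum_k C(n - i, k) * (-1)^k * t^k
        for k in range(n - i + 1):
            power[i + k] += c * comb(n, i) * comb(n - i, k) * (-1) ** k
    return power

print(bernstein_to_power([2, 1, 5, 4]))   # coefficients of x(t), constant term first
print(bernstein_to_power([5, 1, 1, 5]))   # coefficients of y(t), constant term first
```

The two printed lists can be compared term by term with the coefficients of $x(t)$ and $y(t)$ in (55), where the $t^3$ coefficient of $y(t)$ is zero.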
It turns out that if we form a matrix of these two columns
\[ [\, [x(t)]_S \mid [y(t)]_S \,] = \begin{bmatrix} 2 & 5 \\ 1 & 1 \\ 5 & 1 \\ 4 & 5 \end{bmatrix} \]
its rows carry a great deal of geometric significance in light of the fact that, from calculus, the derivative vector $\vec{p}\,'(t_0)$ is tangential to the curve $\vec{p}(t)$ at $t = t_0$ and of the following result.
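THEOREM 4.21 Let $S$ denote the Bernstein basis for $P_3$. The parametric curve with two cubic polynomial components $\vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$ such that
\[ [\, [x(t)]_S \mid [y(t)]_S \,] = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \\ a_{41} & a_{42} \end{bmatrix} \]
satisfies the following properties:
1. $\vec{p}(0) = \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix}$ and $\vec{p}(1) = \begin{bmatrix} a_{41} \\ a_{42} \end{bmatrix}$,
2. $\vec{p}\,'(0) = 3 \left( \begin{bmatrix} a_{21} \\ a_{22} \end{bmatrix} - \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix} \right)$ and $\vec{p}\,'(1) = 3 \left( \begin{bmatrix} a_{41} \\ a_{42} \end{bmatrix} - \begin{bmatrix} a_{31} \\ a_{32} \end{bmatrix} \right)$.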
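PROOF Note that letting $t = 0$, we obtain
\[ \begin{aligned} x(0) &= a_{11}(1-0)^3 + a_{21} \, 3(0)(1-0)^2 + a_{31} \, 3(0)^2(1-0) + a_{41}(0)^3 = a_{11}, \\ y(0) &= a_{12}(1-0)^3 + a_{22} \, 3(0)(1-0)^2 + a_{32} \, 3(0)^2(1-0) + a_{42}(0)^3 = a_{12}. \end{aligned} \]
Letting $t = 1$ yields $\vec{p}(1) = \begin{bmatrix} a_{41} \\ a_{42} \end{bmatrix}$ in a similar fashion.
To establish the second property, begin by differentiating
\[ \vec{p}(t) = \begin{bmatrix} a_{11}(1-t)^3 + a_{21} \, 3t(1-t)^2 + a_{31} \, 3t^2(1-t) + a_{41} t^3 \\ a_{12}(1-t)^3 + a_{22} \, 3t(1-t)^2 + a_{32} \, 3t^2(1-t) + a_{42} t^3 \end{bmatrix}, \]
\[ \vec{p}\,'(t) = \begin{bmatrix} -a_{11} \, 3(1-t)^2 + a_{21} \, 3\left((1-t)^2 - 2t(1-t)\right) + a_{31} \, 3\left(2t(1-t) - t^2\right) + 3a_{41} t^2 \\ -a_{12} \, 3(1-t)^2 + a_{22} \, 3\left((1-t)^2 - 2t(1-t)\right) + a_{32} \, 3\left(2t(1-t) - t^2\right) + 3a_{42} t^2 \end{bmatrix}, \]
then evaluating at $t = 0$ and $t = 1$:
\[ \vec{p}\,'(0) = \begin{bmatrix} -3a_{11} + 3a_{21} \\ -3a_{12} + 3a_{22} \end{bmatrix}, \qquad \vec{p}\,'(1) = \begin{bmatrix} -3a_{31} + 3a_{41} \\ -3a_{32} + 3a_{42} \end{bmatrix}. \]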
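The endpoint properties of Theorem 4.21 can be spot-checked numerically on the example curve. The sketch below is an illustration under stated assumptions: `bezier_point` and `bezier_derivative` are hypothetical helper names, and the derivative is computed with the standard identity $\vec{p}\,'(t) = n \sum_{i} B_{i,n-1}(t)(\vec{b}_{i+1} - \vec{b}_i)$, a known property of Bézier curves that is not derived in this section (the proof above differentiates the components directly instead).

```python
from math import comb

def bezier_point(control, t):
    """Evaluate the curve whose Bernstein coordinates are the rows of control."""
    n = len(control) - 1
    weights = [comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]
    dim = len(control[0])
    return [sum(w * b[d] for w, b in zip(weights, control)) for d in range(dim)]

def bezier_derivative(control, t):
    """Evaluate p'(t) as n times the degree n-1 curve built from consecutive differences."""
    n = len(control) - 1
    diffs = [[q - p for p, q in zip(control[i], control[i + 1])] for i in range(n)]
    return [n * value for value in bezier_point(diffs, t)]

control = [[2, 5], [1, 1], [5, 1], [4, 5]]    # rows of the matrix built from (56)

print(bezier_point(control, 0), bezier_point(control, 1))
print(bezier_derivative(control, 0), bezier_derivative(control, 1))
```

The first line of output should reproduce the first and last rows of the matrix, and the second line should equal three times the difference of the first two rows and of the last two rows, respectively.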
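Notes:
• Theorem 4.21 can be extended to polynomial spaces $P_n$ where $n \ge 1$. If $S$ is the Bernstein basis for $P_n$ and
\[ [\, [x(t)]_S \mid [y(t)]_S \,] = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \vdots & \vdots \\ a_{n1} & a_{n2} \\ a_{n+1,1} & a_{n+1,2} \end{bmatrix}, \]
then
  • $\vec{p}(0) = \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix}$ and $\vec{p}(1) = \begin{bmatrix} a_{n+1,1} \\ a_{n+1,2} \end{bmatrix}$, and
  • $\vec{p}\,'(0) = n \left( \begin{bmatrix} a_{21} \\ a_{22} \end{bmatrix} - \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix} \right)$ and $\vec{p}\,'(1) = n \left( \begin{bmatrix} a_{n+1,1} \\ a_{n+1,2} \end{bmatrix} - \begin{bmatrix} a_{n1} \\ a_{n2} \end{bmatrix} \right)$.
• The theorem also applies to polynomial curves in 3-space, where $\vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}$.
We refer to the rows of the matrix
\[ [\, [x(t)]_S \mid [y(t)]_S \,] = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \\ a_{41} & a_{42} \end{bmatrix} \]
as the control points of the cubic polynomial. These points are also said to form the control polygon.
[Margin figure: xy-plot accompanying the discussion of control points.]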
The significance of the control polygon for the polynomial segment corresponding to $t \in [0, 1]$ is that:
• The first point decides where the curve begins, at $t = 0$. Likewise, the last control point corresponds to the curve's end, when $t = 1$. (In our example, these two points are at (2, 5) and (4, 5), respectively.)
• The segment of the control polygon from the first to the second control point is tangential to the curve at the beginning. Likewise, the segment connecting the last two control points is tangential to the curve at the end.
A parametric polynomial curve specified using a control polygon (i.e., coordinates with respect to the Bernstein basis) is called a Bézier curve. In a typical application, a sequence of Bézier curves is used to model a more complicated shape. One such application is in generating scalable computer fonts. Following is an illustration of a scalable font letter "r", whose outline involves 16 segments. Some of these segments are straight, but most are cubic Bézier curves. The nodes (the points where one Bézier curve or line segment connects to another) are marked with squares, and the remaining control points are marked with circles. There are three types of nodes involved when connecting cubic Bézier curves. In the order of increasing smoothness, these are: cusps, asymmetric smooth nodes, and symmetric smooth nodes. Examples of each of these are included in our outline.
Cusp node - is not collinear with the two adjacent control points
Smooth symmetric node - the adjacent control points are collinear and the node is their midpoint
Smooth asymmetric node - the adjacent control points are collinear but their distances from the node are different
[Margin figure: dilation/contraction, shown in the xy-plane.]
Representation of such outline curves using control points is much more efficient than storing the individual pixels, especially when one considers the need to store various sizes of the same font. Resizing a scalable font is an easy matter: perform the dilation/contraction linear transformation (see Example 1.20) on each control point (viewed as a position vector). In the same fashion, one can perform rotations (Example 1.23), reflections (Example 1.21), or obtain a slanted version of the font by using a shear transformation (see Exercise 20 in Section 1.4). For a formal justification of this, see Exercise 13 on p. 205.
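The claim that transforming the control points transforms the whole curve can be checked on the example cubic. The following sketch uses exact fractions; the shear matrix and the helper names `bezier_point` and `transform` are illustrative assumptions rather than anything prescribed by the text.

```python
from fractions import Fraction
from math import comb

def bezier_point(control, t):
    """Point at parameter t on the Bezier curve with the given control points."""
    n = len(control) - 1
    weights = [comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]
    return [sum(w * b[d] for w, b in zip(weights, control)) for d in range(2)]

def transform(B, point):
    """Multiply the 2 x 2 matrix B by a position vector."""
    return [B[0][0] * point[0] + B[0][1] * point[1],
            B[1][0] * point[0] + B[1][1] * point[1]]

control = [[2, 5], [1, 1], [5, 1], [4, 5]]     # control points of the example cubic
shear = [[1, Fraction(1, 2)], [0, 1]]          # hypothetical shear matrix
moved_control = [transform(shear, b) for b in control]

# Compare "transform the control points, then evaluate" with
# "evaluate, then transform the point" at several parameter values.
samples = [Fraction(k, 8) for k in range(9)]
agree = all(
    bezier_point(moved_control, t) == transform(shear, bezier_point(control, t))
    for t in samples
)
print(agree)
```

Only four points are transformed, yet every sampled point of the sheared curve matches, which is why storing control points is so economical.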
Also, designing polynomial curves is routinely performed by directly editing control polygons.
[Margin figures: rotation, reflection, and shear, each shown in the xy-plane.]
Bézier curves possess many additional useful properties. Let us conclude this section by stating that for any positive integer $n$ and for any real number $t$,
\[ \sum_{i=0}^{n} \frac{n!}{i!(n-i)!} \, t^i (1-t)^{n-i} = 1, \]
which follows from the binomial expansion for a power of a sum:
\[ (x + y)^n = \sum_{i=0}^{n} \frac{n!}{i!(n-i)!} \, x^i y^{n-i} \]
when we let $x = t$ and $y = 1 - t$.
This means that any point $\vec{x}$ on a Bézier curve is a barycentric combination of the control points $\vec{b}_1, \ldots, \vec{b}_{n+1}$:
\[ \vec{x} = c_1 \vec{b}_1 + \cdots + c_{n+1} \vec{b}_{n+1} \quad \text{with} \quad \sum_{i=1}^{n+1} c_i = 1. \]
Moreover, when $0 \le t \le 1$, each coefficient $c_j = \frac{n!}{(j-1)!(n-j+1)!} \, t^{j-1} (1-t)^{n-j+1} \ge 0$, making $\vec{x}$ a convex combination of the control points. The curve is in the convex hull of its control points, the smallest convex set that contains all of them.
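Both facts (the coefficients always sum to 1, and they are nonnegative when $0 \le t \le 1$) are easy to confirm with exact arithmetic. This is a small sketch; `bernstein_weights` is a hypothetical helper name and the sample values of $t$ are arbitrary choices.

```python
from fractions import Fraction
from math import comb

def bernstein_weights(n, t):
    """Coefficients n!/(i!(n-i)!) * t^i * (1-t)^(n-i) for i = 0, ..., n."""
    return [comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]

inside = bernstein_weights(3, Fraction(1, 3))   # t inside [0, 1]
outside = bernstein_weights(3, Fraction(2))     # t outside [0, 1]

print(sum(inside), min(inside) >= 0)
print(sum(outside), min(outside) >= 0)
```

For $t$ outside $[0, 1]$ the combination is still barycentric (the sum is 1), but some coefficients become negative, so it is no longer convex.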
EXERCISES
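1. Given a basis $S = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \end{bmatrix} \right\}$ for $\mathbb{R}^2$,
a. find $[\vec{v}]_S$ for $\vec{v} = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$ (check your answer),
b. find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$.
2. Given a basis $T = \left\{ \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 5 \end{bmatrix} \right\}$ for $\mathbb{R}^2$,
a. find $[\vec{u}]_T$ for $\vec{u} = \begin{bmatrix} -6 \\ -7 \end{bmatrix}$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_T = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$.
3. Given a basis $T = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}$ for $\mathbb{R}^3$,
a. find $[\vec{w}]_T$ for $\vec{w} = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_T = \begin{bmatrix} 2 \\ 7 \\ -3 \end{bmatrix}$.
4. Given a basis $S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \right\}$ for $\mathbb{R}^3$,
a. find $[\vec{u}]_S$ for $\vec{u} = \begin{bmatrix} -1 \\ -3 \\ 4 \end{bmatrix}$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_S = \begin{bmatrix} 6 \\ -5 \\ 2 \end{bmatrix}$.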
5. Given a basis $T = \left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \right\}$ for $\mathbb{R}^4$,
a. find $[\vec{w}]_T$ for $\vec{w} = \begin{bmatrix} 4 \\ -2 \\ -2 \\ 0 \end{bmatrix}$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_T = \begin{bmatrix} 4 \\ 1 \\ -2 \\ 1 \end{bmatrix}$.
6. Given a basis $S = \left\{ \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -2 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ -1 \end{bmatrix} \right\}$ for $\mathbb{R}^4$,
a. find $[\vec{u}]_S$ for $\vec{u} = \begin{bmatrix} 0 \\ -3 \\ 0 \\ -1 \end{bmatrix}$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_S = \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}$.
7. Given a basis $S = \{t + 1, \; t - 1\}$ for $P_1$,
a. find $[\vec{v}]_S$ for $\vec{v} = 3t - 1$ (check your answer),
b. find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} 6 \\ 4 \end{bmatrix}$.
8. Given a basis $T = \{1 + t^2, \; 1 - t, \; t - t^2\}$ for $P_2$,
a. find $[\vec{u}]_T$ for $\vec{u} = 2 - t + 3t^2$ (check your answer),
b. find $\vec{v}$ such that $[\vec{v}]_T = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$.
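9. Given a basis $S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$ for $M_{22}$,
a. find $[\vec{v}]_S$ for $\vec{v} = \begin{bmatrix} 3 & 2 \\ -1 & 1 \end{bmatrix}$ (check your answer),
b. find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$.
10. Given a basis $S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right\}$ for $M_{22}$,
a. find $[\vec{v}]_S$ for $\vec{v} = \begin{bmatrix} -1 & 1 \\ 3 & 3 \end{bmatrix}$ (check your answer),
b. find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} 1 \\ 0 \\ 1 \\ -1 \end{bmatrix}$.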
11. Consider a basis $S = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix} \right\}$ for a two-dimensional subspace of $\mathbb{R}^3$ (a plane passing through the origin).
a. If possible, find $[\vec{u}]_S$ and $[\vec{v}]_S$ for $\vec{u} = \begin{bmatrix} 2 \\ -2 \\ -11 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$. What does it say about a vector when its coordinate vector with respect to $S$ cannot be found?
b. Find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$.
12. Consider the set $S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \right\}$. Recall that in Example 4.19 on p. 166, $S$ was shown to be linearly independent. Therefore, $S$ forms a basis for a three-dimensional subspace of $M_{22}$.
a. If possible, find $[\vec{u}]_S$ and $[\vec{v}]_S$ for $\vec{u} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 & 2 \\ 2 & -3 \end{bmatrix}$. What does it say about a vector when its coordinate vector with respect to $S$ cannot be found?
b. Find $\vec{w}$ such that $[\vec{w}]_S = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix}$.
13. * Let $\vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$ be a parametric curve in the plane and let $B$ be a $2 \times 2$ matrix.
a. Show that $\vec{q}(t) = B \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}$ for all $t$ also defines a parametric curve in the plane and find the formulas for the components of $\vec{q}(t) = \begin{bmatrix} x^*(t) \\ y^*(t) \end{bmatrix}$ involving elements of $B$.
b. If $x(t)$ and $y(t)$ are polynomials in $P_m$ and $S$ is a basis for $P_m$, prove that
\[ \begin{bmatrix} [x^*(t)]_S^T \\ [y^*(t)]_S^T \end{bmatrix} = B \begin{bmatrix} [x(t)]_S^T \\ [y(t)]_S^T \end{bmatrix}. \]
(If $S$ is a Bernstein basis, then this result implies that the same matrix transforming each of the points on the curve $\vec{p}(t)$ to $\vec{q}(t)$ can instead be multiplied by a transpose of the matrix of control points of $\vec{p}$ to obtain the transpose of the matrix of control points for $\vec{q}$.)
c. Is the same type of result valid for parametric curves in 3-space $\vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}$?
14. * Show that for every quadratic Bézier curve in 3-space $\vec{p}(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}$ (where $x(t), y(t), z(t) \in P_2$) there exists a plane that contains the entire curve. (Therefore, such a curve is not truly a space curve. This is one of the main reasons why quadratic curves are of limited use in 3-space.)
15. * Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times p$ matrix. Furthermore, let $S = \{\vec{u}_1, \ldots, \vec{u}_q\}$ be a basis for the null space of $B$.
a. Show that there exists a basis $T = \{\vec{u}_1, \ldots, \vec{u}_q, \vec{v}_1, \ldots, \vec{v}_k\}$ for the null space of the product $AB$ (where $k$ can be 0).
b. If $k \ne 0$ in the basis $T$ obtained in part a, show that $B\vec{v}_1, \ldots, B\vec{v}_k$ are linearly independent vectors in the null space of $A$. (Hint: Use Theorem 4.18 to show that assuming linear dependence, i.e., $c_1 B\vec{v}_1 + \cdots + c_k B\vec{v}_k = \vec{0}$ with at least some nonzero coefficients, yields $\vec{v} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k$ in the null space of $B$ such that $\vec{v} \notin \operatorname{span} S$, a contradiction.)
16. * If $W_1$ and $W_2$ are subspaces of a vector space $V$, then the sum of $W_1$ and $W_2$ defined in (45) was shown in Exercise 39 on p. 162 to be a subspace of $V$. If $W_1 \cap W_2 = \{\vec{0}\}$, then the sum of $W_1$ and $W_2$ is called a direct sum and is denoted
\[ W_1 \oplus W_2 = \{\vec{w}_1 + \vec{w}_2 \mid \vec{w}_1 \in W_1; \; \vec{w}_2 \in W_2\}. \]
Show that if $W_1$ and $W_2$ are finite-dimensional, then for every vector $\vec{u} \in W_1 \oplus W_2$ there exist unique vectors $\vec{w}_1$ in $W_1$ and $\vec{w}_2$ in $W_2$ such that $\vec{u} = \vec{w}_1 + \vec{w}_2$.
4.6 Rank and Nullity
In this section, we shall restrict our discussion to matrices and the n-spaces of vectors (thus, no general vector spaces, composed of functions, polynomials, etc., will be dealt with here). Recall Theorem 4.6 stating that any set of vectors in a vector space $V$ spans a subspace of $V$. It follows that the columns of an $m \times n$ matrix span a subspace of $\mathbb{R}^m$. Likewise, the rows of such a matrix, considered as n-vectors, span a subspace of $\mathbb{R}^n$.
DEFINITION (Column Space and Row Space) Let $A$ be an $m \times n$ matrix.
• The subspace of $\mathbb{R}^m$ spanned by the $n$ columns of $A$ is called the column space of $A$. Its dimension is said to be the column rank of $A$.
• The subspace of $\mathbb{R}^n$ spanned by the $m$ (transposed) rows of $A$ is called the row space of $A$. Its dimension is said to be the row rank of $A$.
In the next two subsections, we develop procedures for gathering important information about the column and row spaces, in particular: their bases and their dimensions (the column rank and row rank).
Finding a basis for the column space
EXAMPLE 4.43 Let $A = \begin{bmatrix} 1 & -1 & 0 & 1 \\ 2 & -2 & 1 & 1 \\ 3 & -3 & 1 & 2 \end{bmatrix}$. The column space of $A$,
\[ \operatorname{span}\left\{ \underbrace{\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}}_{\vec{v}_1}, \underbrace{\begin{bmatrix} -1 \\ -2 \\ -3 \end{bmatrix}}_{\vec{v}_2}, \underbrace{\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}}_{\vec{v}_3}, \underbrace{\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}}_{\vec{v}_4} \right\}, \]
is a subspace of $\mathbb{R}^3$.
In Example 4.32, we found that the vectors $\vec{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\vec{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ form a basis for this space. While you should refer to that example for more details, in essence the result was based on the r.r.e.f. (or r.e.f.) of $A$:
\[ \begin{bmatrix} 1 & -1 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
The positions of the leading entries indicate which columns of the original matrix $A$ form a basis for the column space of $A$. (Note that Example 4.32 included the zero right-hand side values, which can be safely omitted as they will not influence the layout of the leading entries.) The column rank of $A$ is 2.
Finding a basis for the row space
Before we propose a procedure to produce a basis for the row space, let us state two helpful results.
THEOREM 4.22 If matrices $A$ and $B$ are row equivalent, then row space of $A$ = row space of $B$.
PROOF Performing any elementary row operation on a matrix $A$ results in a matrix $A_1$ whose rows are linear combinations of rows of $A$. Consequently the row space of $A_1$ is contained in the row space of $A$. Repeating this reasoning for each subsequent row operation in the chain $A \to A_1 \to A_2 \to \cdots \to B$ we see that the row space of $B$ is contained in the row space of $A$. However, by Theorem 2.3, we can also perform a sequence of elementary row operations on $B$ to transform it into $A$, showing that the row space of $A$ is contained in the row space of $B$. We conclude that the row spaces of the matrices $A$ and $B$ are identical.
THEOREM 4.23 The nonzero rows of a matrix in a row echelon form are linearly independent.
PROOF Let us refer to the third condition of the definition of r.r.e.f. and r.e.f. on p. 65, according to which the leading entries in nonzero rows (i.e., the leftmost nonzero entries in these rows) must form a staircase pattern. This guarantees that below each leading entry in r.e.f. there can be zeros only. We now set a linear combination of these nonzero rows equal to a transposed zero vector (entries marked $\ast$ are arbitrary):
\[ \begin{aligned} & c_1 \, [\, 1 \;\; \ast \;\; \ast \;\; \ast \;\; \cdots \,] \\ +\; & c_2 \, [\, 0 \;\; 0 \;\; 1 \;\; \ast \;\; \cdots \,] \\ +\; & c_3 \, [\, 0 \;\; 0 \;\; 0 \;\; 1 \;\; \cdots \,] \\ +\; & \cdots \\ =\; & [\, 0 \;\; 0 \;\; 0 \;\; 0 \;\; \cdots \;\; 0 \,] \end{aligned} \]
• Focus on the column with the first leading entry. To match the zero value in this column on the right-hand side, $c_1$ must equal zero.
• Now, with $c_1 = 0$, look at the column with the second leading entry. In order to match the zero value underneath, $c_2$ must equal zero.
• etc.
In conclusion, all values $c_1, c_2, \ldots$ must equal zero, making the nonzero rows linearly independent.
(A technical note: In the proof above, it was easier to write matrix rows as $1 \times n$ matrices, to keep their row-by-row appearance. In future examples, we will sometimes transpose such rows so that they can be considered vectors in $\mathbb{R}^n$.)
If a matrix contains zero rows, then its row space is spanned by its nonzero rows (refer to Exercise 32 on p. 162 in Section 4.2). According to Theorems 4.22 and 4.23, the nonzero rows of an r.e.f. of $A$ form a basis for the row space of $A$.
EXAMPLE 4.44 The r.r.e.f. we found in the previous example for $A = \begin{bmatrix} 1 & -1 & 0 & 1 \\ 2 & -2 & 1 & 1 \\ 3 & -3 & 1 & 2 \end{bmatrix}$ can serve as a row echelon form:
\[ \begin{bmatrix} 1 & -1 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
The two nonzero rows form a basis for the row space of $A$. Written as vectors in $\mathbb{R}^4$, these are $\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}$.
The row rank of $A$ is 2.
The rank of a matrix
Let us compare the procedures for finding bases for column and row spaces:
For column space of $A$:
1. Find $B$, an r.e.f. of $A$.
2. The leading entries of $B$ point to columns of $A$ that form a basis.
For row space of $A$:
1. Find $B$, an r.e.f. of $A$.
2. The rows of $B$ that contain leading entries form a basis.
The main difference between the two procedures is that
• for a basis of the column space, the leading entries in a row echelon form of $A$ are used as pointers to the original matrix, whereas
• for a basis of the row space, the rows containing leading entries in an r.e.f. of $A$ are themselves a basis (rather than being pointers).
It is at least as important to realize the main thing that the two procedures have in common:
• the number of vectors in bases for both the column space and the row space is the same as the number of leading entries in r.e.f. of $A$.
In other words, we arrive at the following result.
THEOREM 4.24 For any matrix $A$, column rank $A$ = row rank $A$.
In light of this, it is no longer necessary to distinguish between the two ranks.
DEFINITION (Rank of a Matrix) The rank of a given matrix $A$, denoted by rank $A$, is the number equal to both the column rank of $A$ and the row rank of $A$.
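The two procedures compared above can be sketched in code as follows. This is an illustrative implementation using exact fractions, not a prescribed algorithm; the names `rref_with_pivots`, `column_space_basis`, and `row_space_basis` are hypothetical, and the r.r.e.f. is used in place of an arbitrary r.e.f.

```python
from fractions import Fraction

def rref_with_pivots(rows):
    """Return (R, pivots): the r.r.e.f. of the matrix and its leading-column indices (0-based)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

def column_space_basis(A):
    """Leading entries act as pointers: pick the corresponding columns of the ORIGINAL matrix."""
    _, pivots = rref_with_pivots(A)
    return [[row[p] for row in A] for p in pivots]

def row_space_basis(A):
    """The nonzero rows of the reduced matrix are themselves a basis."""
    R, pivots = rref_with_pivots(A)
    return R[:len(pivots)]

A = [[1, -1, 0, 1], [2, -2, 1, 1], [3, -3, 1, 2]]   # matrix of Examples 4.43 and 4.44
print(column_space_basis(A))
print(row_space_basis(A))
print(len(rref_with_pivots(A)[1]))                  # rank A
```

Both bases have as many vectors as there are leading entries, which is the content of Theorem 4.24.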
EXAMPLE 4.45 As shown in the last two examples, the rank of $\begin{bmatrix} 1 & -1 & 0 & 1 \\ 2 & -2 & 1 & 1 \\ 3 & -3 & 1 & 2 \end{bmatrix}$ is 2.
Basis for null space
If $A$ is an $m \times n$ matrix, then the solution set of the homogeneous system $A\vec{x} = \vec{0}$ is a subspace of $\mathbb{R}^n$ (refer to Example 4.10 for a verification of that); this space is called the solution space of the system, or the null space of the matrix $A$.
DEFINITION (Nullity) The nullity of $A$ (denoted nullity $A$) is the dimension of the null space of $A$.
EXAMPLE 4.46 Let
\[ A = \begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ -1 & 3 & 1 & 0 & 4 \\ 2 & 0 & 4 & 1 & 3 \\ 0 & 2 & 2 & -2 & 6 \end{bmatrix}. \]
To solve the system $A\vec{x} = \vec{0}$, we could form the augmented matrix $[A \mid \vec{0}]$ and transform it to the r.r.e.f. $[B \mid \vec{0}]$, or, equivalently, simply transform $A$ to its r.r.e.f. $B$. Both approaches carry exactly the same amount of information because of the zero right-hand sides.
LATOOLKIT.COM: The module "finding a basis of a null space of a matrix" will allow you to generate additional examples like this one.
Using the latter approach, we find the r.r.e.f. of $A$:
\[ \begin{bmatrix} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \]
Of the five unknowns (the components of the vector $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}$), $x_3$ and $x_5$ are arbitrary because the corresponding columns in the r.r.e.f. do not contain leading entries. The remaining unknowns can be solved for:
\[ \begin{aligned} x_1 &= -2x_3 - 2x_5 \\ x_2 &= -x_3 - 2x_5 \\ x_4 &= x_5. \end{aligned} \]
Substituting these expressions into $\vec{x}$ we can write
\[ \vec{x} = \begin{bmatrix} -2x_3 - 2x_5 \\ -x_3 - 2x_5 \\ x_3 \\ x_5 \\ x_5 \end{bmatrix} = x_3 \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \\ 1 \end{bmatrix}. \]
Note the pattern of 1's and 0's in rows number 3 and 5:
\[ x_3 \begin{bmatrix} \cdot \\ \cdot \\ 1 \\ \cdot \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} \cdot \\ \cdot \\ 0 \\ \cdot \\ 1 \end{bmatrix}. \]
A similar pattern will always emerge as a result of this procedure: when the $j$th unknown is arbitrary, the vector corresponding to $x_j$ has 1 on the $j$th component, and the remaining vectors have 0 on the $j$th component.
Furthermore, this pattern ensures that the vectors in question (in our case $\begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \\ 1 \end{bmatrix}$) are linearly independent since the only way to have the third component equal zero is when $x_3 = 0$, and the same goes for other arbitrary unknowns.
As a result, the vectors $\begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \\ 1 \end{bmatrix}$ form a basis for the null space of $A$.
The nullity of $A$ is 2.
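The procedure of Example 4.46 can be sketched as follows: reduce the matrix, then build one basis vector per column without a leading entry, placing 1 in that position and the negated r.r.e.f. entries in the positions of the leading unknowns. The helper names `rref_with_pivots` and `null_space_basis` are assumptions made for this illustration.

```python
from fractions import Fraction

def rref_with_pivots(rows):
    """Return (R, pivots): the r.r.e.f. of the matrix and its leading-column indices (0-based)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

def null_space_basis(A):
    """One basis vector for each column of the r.r.e.f. that has no leading entry."""
    R, pivots = rref_with_pivots(A)
    n = len(A[0])
    basis = []
    for j in range(n):
        if j in pivots:
            continue
        vec = [Fraction(0)] * n
        vec[j] = Fraction(1)                  # the arbitrary unknown itself
        for row_index, p in enumerate(pivots):
            vec[p] = -R[row_index][j]         # leading unknown solved in terms of x_j
        basis.append(vec)
    return basis

A = [[0, 1, 1, 2, 0],
     [-1, 3, 1, 0, 4],
     [2, 0, 4, 1, 3],
     [0, 2, 2, -2, 6]]                        # matrix of Example 4.46
for vec in null_space_basis(A):
    print([int(x) for x in vec])              # entries are integers in this example
```

Each vector returned has 1 in the position of its own arbitrary unknown and 0 in the positions of the other arbitrary unknowns, which is the pattern that guarantees linear independence.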
In general, when the vector equation (20) is applied to a homogeneous system and zero components are omitted, we obtain, for the leading unknowns,
\[ \begin{bmatrix} x_{i_1} \\ x_{i_2} \\ \vdots \\ x_{i_k} \end{bmatrix} = x_{j_1} \begin{bmatrix} c_{1,j_1} \\ c_{2,j_1} \\ \vdots \\ c_{k,j_1} \end{bmatrix} + x_{j_2} \begin{bmatrix} c_{1,j_2} \\ c_{2,j_2} \\ \vdots \\ c_{k,j_2} \end{bmatrix} + \cdots + x_{j_{n-k}} \begin{bmatrix} c_{1,j_{n-k}} \\ c_{2,j_{n-k}} \\ \vdots \\ c_{k,j_{n-k}} \end{bmatrix} \]
where $C$ denotes the negative of the reduced row echelon form of $A$. On the other hand, for the nonleading unknowns, we also have
\[ \begin{bmatrix} x_{j_1} \\ x_{j_2} \\ \vdots \\ x_{j_{n-k}} \end{bmatrix} = x_{j_1} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_{j_2} \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_{j_{n-k}} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}. \]
We can merge these two vector equations into one by employing partitioned matrices (vectors), with the leading unknowns listed above the nonleading ones:
\[ \begin{bmatrix} x_{i_1} \\ \vdots \\ x_{i_k} \\ \hline x_{j_1} \\ x_{j_2} \\ \vdots \\ x_{j_{n-k}} \end{bmatrix} = x_{j_1} \begin{bmatrix} c_{1,j_1} \\ \vdots \\ c_{k,j_1} \\ \hline 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_{j_2} \begin{bmatrix} c_{1,j_2} \\ \vdots \\ c_{k,j_2} \\ \hline 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_{j_{n-k}} \begin{bmatrix} c_{1,j_{n-k}} \\ \vdots \\ c_{k,j_{n-k}} \\ \hline 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}. \]
The pattern of 1's and 0's that was observed in the example will guarantee that the vectors on the right-hand side are linearly independent. This will continue to be the case after the rows are interchanged on both sides (to sort in the increasing order of the index of $x_i$ instead of grouping the leading unknowns above the nonleading ones). Hence, the dimension of the null space of a matrix equals the number of nonleading columns.
Rank-nullity theorem
Recall that vectors in bases for both column and row spaces of $A$ corresponded to the locations of the leading entries in r.e.f. of $A$. Consequently, we had
\[ \text{rank } A = \text{number of columns with leading entries in r.e.f. of } A. \tag{57} \]
When identifying vectors to form a basis of the null space of $A$, we look for the columns in r.e.f. of $A$ without leading entries. Each of them corresponds to an arbitrary unknown and a vector in a basis for the null space. Thus, the dimension of the null space of $A$,
\[ \text{nullity } A = \text{number of columns without leading entries in r.e.f. of } A. \tag{58} \]
Adding equalities (57) and (58) leads us directly to the main result of this section.
THEOREM 4.25 (Rank-Nullity Theorem) If $A$ is an $m \times n$ matrix, then
\[ \text{rank } A + \text{nullity } A = n. \]
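As a quick worked check of Theorem 4.25 against values already obtained in this section (no new data are introduced): the $4 \times 5$ matrix of Example 4.46 has leading entries in columns 1, 2, and 4 of its r.r.e.f. and a null space basis of two vectors, while the $3 \times 4$ matrix of Examples 4.43 to 4.45 has rank 2, so its nullity must be 2 as well.

\[ \underbrace{3}_{\text{rank } A} + \underbrace{2}_{\text{nullity } A} = 5 = n \quad \text{(Example 4.46)}, \qquad \text{nullity } A = n - \text{rank } A = 4 - 2 = 2 \quad \text{(Example 4.43)}. \]

The second computation agrees with the r.r.e.f. found in Example 4.43, whose columns 2 and 4 contain no leading entries.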
Nine equivalent statements
For an $n \times n$ matrix $A$, Theorem 4.25 leads to two possible scenarios:
(a) If nullity $A = 0$, then rank $A = n$ so that
• the $n$ columns of $A$ must be linearly independent and
• the $n$ rows of $A$ are L.I.
(b) If nullity $A > 0$, then rank $A < n$. Consequently, the $n$ columns of $A$ (and the $n$ rows of $A$) are L.D.
We incorporate these results into our lists of equivalent statements from p. 165:
9 Equivalent Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
1. $A$ is nonsingular.
2. $A$ is row equivalent to $I_n$.
3. For every n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
4. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.
5. $\det A \ne 0$.
6. The columns of $A$ are linearly independent.
7. The rows of $A$ are linearly independent.
8. rank $A = n$.
9. nullity $A = 0$.
9 Equivalent "Negative" Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
-1. $A$ is singular.
-2. $A$ is not row equivalent to $I_n$.
-3. For some n-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has either no solution or many solutions.
-4. The homogeneous system $A\vec{x} = \vec{0}$ has nontrivial solutions.
-5. $\det A = 0$.
-6. The columns of $A$ are linearly dependent.
-7. The rows of $A$ are linearly dependent.
-8. rank $A < n$.
-9. nullity $A > 0$.
EXERCISES
In Exercises 1–4, given the matrix A, find i. a basis for the column space of A, ii. a basis for the row space of A, iii. the rank of A, iv. a basis for the null space of A, v. the nullity of A. 1. a.
1 0 2 2 −1 1
⎡
⎤ 1 1 ⎢ ⎥ 2. a. ⎣ −1 1 ⎦ ; 1 2 ⎡ 1 0 −1 ⎢ ⎢ 0 2 0 3. a. ⎢ ⎢ 0 1 3 ⎣ 1 0 1 ⎡ ⎢ ⎢ 4. a. ⎢ ⎢ ⎣
−2 0 0 1
0 1 0 2
2 1 0 1
; b.
2 −1
−4 2
⎡
⎤ 1 0 −1 2 ⎢ ⎥ ; c. ⎣ −2 0 2 −4 ⎦ . 1 0 −1 2
⎤ 0 −1 0 ⎢ ⎥ b. ⎣ 1 2 1 ⎦; 0 1 0 ⎤ ⎡ 0 0 0 ⎢ ⎥ ⎢ ⎥ ⎥ ; b. ⎢ 0 1 −1 ⎢ 0 0 ⎥ 0 ⎣ ⎦ 0 1 1 2 2 0 1
0 0 0 2
⎤
⎡
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ; b. ⎢ ⎥ ⎢ ⎢ ⎦ ⎣
⎡
⎡
⎤ 1 2 0 2 1 ⎢ ⎥ c. ⎣ 0 0 1 1 1 ⎦. 2 4 −1 3 1 ⎤ 1 1 −1 ⎥ 0 0 2 ⎥ ⎥. 2 2 −2 ⎥ ⎦ 0 2 0
−1 2 −7 3 0 1 2
0
1 7 0 1 −2 −1 0 3 3 0 6 −2
−2
2 1
6 1 4 1 2 0 −1 −3 4 6 2 0
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
5. Given a linear system $A\vec{x} = \vec{b}$ with an $m \times n$ coefficient matrix $A$, what conditions should rank $A$ and rank $[A \mid \vec{b}]$ satisfy in order for the system to have a. no solution, b. one solution, or c. many solutions?
T/F? In Exercises 6–10, decide whether each statement is true or false. Justify your answer.
6. No $3 \times 3$ matrix can have rank equal to zero.
7. rank $I_n = n$.
8. If columns of a $5 \times 5$ matrix span $\mathbb{R}^5$, then so do the rows.
9. The columns of any $4 \times 7$ matrix are linearly dependent.
10. If a $5 \times 9$ matrix $A$ has nullity 4, then its columns span $\mathbb{R}^5$.
[Margin figure for Exercise 16c: regions labeled $\mathbb{R}^m$, column space of $A$, and column space of $AB$, each containing a placeholder "?" for an example vector.]
In Exercises 11–14, show an example of a matrix described or explain why such a matrix cannot exist.
11. It is possible for a $3 \times 6$ matrix to have nullity 4.
12. It is possible for a $6 \times 3$ matrix to have nullity 4.
13. It is possible for a $6 \times 3$ matrix to have rank 4.
14. It is possible for a $3 \times 6$ matrix to have rank 4.
15. * Show that if $A$ is an $m \times n$ matrix, then $A^T A$ is invertible if and only if rank$(A) = n$. (Hint: You may find the result established in Exercise 29 on p. 33 useful.)
16. Consider an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$.
a. Show that for any m-vector $\vec{y}$, the following is always true:
\[ \vec{y} \in (\text{column space of } AB) \;\Rightarrow\; \vec{y} \in (\text{column space of } A); \tag{59} \]
therefore, (column space of $AB$) $\subseteq$ (column space of $A$).
b. Using $m = n = p = 3$, $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, and $B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, find an example of a vector $\vec{y} \in$ (column space of $A$) for which $\vec{y} \notin$ (column space of $AB$). Since this shows that the converse of (59) is not generally true, it also follows that the column spaces of $AB$ and $A$ are generally not equal.
c. Based on the results of parts a and b, the diagram on the left illustrates the relationship between the two column spaces. Fill in the three examples of vectors in the diagram for $A$ and $B$ defined in part b.
17. Consider an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$.
a. Show that for any p-vector $\vec{x}$,
\[ \vec{x} \in (\text{null space of } B) \;\Rightarrow\; \vec{x} \in (\text{null space of } AB). \]
b. Using $m = n = p = 3$, $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, and $B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, find an example of a vector $\vec{x} \in$ (null space of $AB$) for which $\vec{x} \notin$ (null space of $B$).
c. Draw a diagram similar to the one in part c of Exercise 16, and for $A$ and $B$ defined in part b of this exercise, include one example of a vector from each subset (a total of three vectors).
18. Consider an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$. Follow the steps of the last two examples to draw a diagram similar to the one in part c of Exercise 16 illustrating the relationship between the row space of $AB$ and the row space of one other matrix (which one?). Illustrate this relationship by choosing specific $A$ and $B$, then including one example of a vector from each subset (a total of three vectors).
19. * The spaces considered in the previous three exercises: column space of $A$, null space of $A$, and row space of $A$, along with the null space of $A^T$ are sometimes referred to as the four fundamental spaces of a matrix $A$. Repeat Exercise 18 for the null space of $A^T$.
20. * Use Theorem 4.25 to show that for any $m \times n$ matrix $A$
\[ \text{rank } A^T + \text{nullity } A^T = m. \]
Furthermore, if $B$ is an $n \times p$ matrix, show that
\[ \text{rank } A \ge \text{rank}(AB), \quad \text{rank } B \ge \text{rank}(AB), \quad \text{nullity } A^T \le \text{nullity}(AB)^T, \quad \text{nullity } B \le \text{nullity}(AB). \]
21. * Prove Sylvester's Law of Nullity: if $A$ and $B$ are $n \times n$ matrices, then
\[ \max\{\text{nullity}(A), \text{nullity}(B)\} \le \text{nullity}(AB) \le \text{nullity}(A) + \text{nullity}(B). \]
(Hints: The first inequality follows from Exercise 20 and Theorem 4.25. When proving the second inequality, refer to Exercise 15 on p. 206.)
22. * Let $U$ and $V$ be $n \times n$ nonsingular matrices. Show that the rank of the $n \times 2n$ matrix $C = [\, U + V \mid U - V \,]$ is $n$. (Hint: Show that the rank of $C^T$ is $n$ by row reducing it to a matrix whose $2n$ rows include $n$ linearly independent ones.)
23. * Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times m$ matrix such that $AB = I_m$ and $BA = I_n$. Show that $m$ must equal $n$. (Hint: Use rank inequalities of Exercise 20.)
4.7 Chapter Review
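Row space, column space, and rank

Elementary row operations transform the $m \times n$ matrix
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & a_{24} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & a_{34} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & a_{m4} & \cdots & a_{mn} \end{bmatrix} \]
into a matrix in row echelon form, e.g.,
\[ B = \begin{bmatrix} 1 & b_{12} & b_{13} & b_{14} & \cdots & b_{1n} \\ 0 & 0 & 1 & b_{24} & \cdots & b_{2n} \\ 0 & 0 & 0 & 1 & \cdots & b_{3n} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}. \]
• The leading columns of $A$ form a basis for the column space of $A$ (the space spanned by all columns of $A$).
• The nonzero rows of $B$ form a basis for the row space of $A$ (the space spanned by all rows of $A$).
• Number of leading entries of $B$ = dim(row space of $A$) = dim(column space of $A$) = rank $A$.

Null space and nullity

The solution set of the homogeneous system $A\vec{x} = \vec{0}$ is a vector space called the null space of $A$.
dim(null space of $A$) = nullity $A$.

Rank and nullity

After elementary row operations transform $A$ into a row echelon form $B$, rank $A$ is the number of columns with leading entries and nullity $A$ is the number of columns without leading entries. Since $n$ is the number of columns,
\[ \operatorname{rank} A + \operatorname{nullity} A = n. \]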
Chapter 5 Linear Transformations

In Section 1.4 we introduced the notion of a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Several specific linear transformations in $\mathbb{R}^2$ and $\mathbb{R}^3$ were discussed therein, including examples of projections, reflections, and rotations.

[Figure: Illustration of condition 1 of the definition of linear transformation: does $F(\vec{u}) + F(\vec{v})$ equal $F(\vec{u} + \vec{v})$?]
In this chapter, linear transformations will be investigated in more detail. We will extend this notion to general vector spaces, so that we will be able to consider linear transformations $F : V \to W$ where $V$ and $W$ are vector spaces as defined in Section 4.1 (i.e., they will no longer be restricted to be $\mathbb{R}^n$).
5.1 Linear Transformations in General Vector Spaces

As we have already seen in Section 1.4, a transformation (or a function) $F : V \to W$ is a rule that assigns a single value from the codomain $W$ to each value from its domain $V$. As we have done in Section 1.4 for $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, we distinguish a special class of such transformations for more general $V$ and $W$.
[Figure: Illustration of condition 2 of the definition of linear transformation: does $cF(\vec{u})$ equal $F(c\vec{u})$?]
DEFINITION (Linear Transformation)
Given vector spaces $V$ and $W$, the transformation $F : V \to W$, which assigns a single vector $F(\vec{u})$ in $W$ to every vector $\vec{u}$ in $V$, is said to be a linear transformation if
1. for all vectors $\vec{u}, \vec{v}$ in $V$, $F(\vec{u} + \vec{v}) = F(\vec{u}) + F(\vec{v})$ and
2. for all vectors $\vec{u}$ in $V$ and real numbers $c$, $F(c\vec{u}) = cF(\vec{u})$.
The vector $F(\vec{u})$ is called the image of $\vec{u}$ under the transformation $F$.
THEOREM 5.1 If $F : V \to W$ is a linear transformation and $V$, $W$ are vector spaces, then
a. $F(\vec{0}) = \vec{0}$ and
b. $F(c_1\vec{u}_1 + c_2\vec{u}_2 + \cdots + c_k\vec{u}_k) = c_1F(\vec{u}_1) + c_2F(\vec{u}_2) + \cdots + c_kF(\vec{u}_k)$ for all vectors $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k$ in $V$ and real numbers $c_1, c_2, \ldots, c_k$.

PROOF (part a)
\[ \begin{aligned} F(\vec{0}) &= F(0\,\vec{0}) && \text{(Theorem 4.2)} \\ &= 0F(\vec{0}) && \text{(condition 2 of the linear transformation definition)} \\ &= \vec{0} && \text{(Theorem 4.2)} \end{aligned} \]
We leave the proof of part b as an exercise for the reader. We begin with a few examples involving transformations from Rn to Rm , such as those discussed in Section 1.4. Subsequently, we shall provide more general examples.
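EXAMPLE 5.1 $F_1(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + y \\ 2z \end{bmatrix}$ is a linear transformation:

• Condition 1 is satisfied since
\[ \text{LHS} = F_1(\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}) = F_1(\begin{bmatrix} x + x' \\ y + y' \\ z + z' \end{bmatrix}) = \begin{bmatrix} (x + x') + (y + y') \\ 2(z + z') \end{bmatrix}, \]
\[ \text{RHS} = F_1(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) + F_1(\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}) = \begin{bmatrix} x + y \\ 2z \end{bmatrix} + \begin{bmatrix} x' + y' \\ 2z' \end{bmatrix} = \begin{bmatrix} x + y + x' + y' \\ 2z + 2z' \end{bmatrix}. \]
• Condition 2 holds as well:
\[ \text{LHS} = F_1(c\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = F_1(\begin{bmatrix} cx \\ cy \\ cz \end{bmatrix}) = \begin{bmatrix} cx + cy \\ 2cz \end{bmatrix}, \qquad \text{RHS} = cF_1(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = c\begin{bmatrix} x + y \\ 2z \end{bmatrix}. \]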
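EXAMPLE 5.2 $F_2(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x^2 - y^2 \\ xy \\ x \end{bmatrix}$ is not a linear transformation since
\[ F_2(2\begin{bmatrix} 1 \\ 0 \end{bmatrix}) = F_2(\begin{bmatrix} 2 \\ 0 \end{bmatrix}) = \begin{bmatrix} 4 \\ 0 \\ 2 \end{bmatrix} \quad \text{does not equal} \quad 2F_2(\begin{bmatrix} 1 \\ 0 \end{bmatrix}) = 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}, \]
violating condition 2.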
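The two conditions of the definition can also be spot-checked numerically. The following Python sketch is only an illustration: the helper names (`add`, `scale`, `passes_spot_check`) and the small grid of test values are assumptions made here, not part of the text. Passing such a check does not prove linearity, but a single failure does disprove it.

```python
from itertools import product

def F1(v):
    """Example 5.1: F1(x, y, z) = (x + y, 2z)."""
    x, y, z = v
    return (x + y, 2 * z)

def F2(v):
    """Example 5.2: F2(x, y) = (x^2 - y^2, xy, x)."""
    x, y = v
    return (x * x - y * y, x * y, x)

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple of a tuple."""
    return tuple(c * a for a in u)

def passes_spot_check(F, dim, values=(-2, 0, 1, 3)):
    """Test conditions 1 and 2 on a small grid of integer vectors and scalars."""
    vectors = list(product(values, repeat=dim))
    for u in vectors:
        for v in vectors:
            if F(add(u, v)) != add(F(u), F(v)):      # condition 1
                return False
        for c in values:
            if F(scale(c, u)) != scale(c, F(u)):     # condition 2
                return False
    return True

print(passes_spot_check(F1, 3))   # no counterexample found on the grid
print(passes_spot_check(F2, 2))   # a counterexample exists
print(F2(scale(2, (1, 0))), scale(2, F2((1, 0))))   # the pair used in Example 5.2
```

The last line reproduces the specific counterexample from Example 5.2.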
An alternative method to proceed in Example 5.1 would be to use Theorem 1.8: once a matrix representation is found for $F_1$,
\[ F_1(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}, \]
it follows that $F_1$ is a linear transformation. However, a word of caution is in order: just because such a matrix representation does not appear possible, as in Example 5.2, we cannot rule out its existence.
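EXAMPLE 5.3 $F_3(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} \dfrac{x^3 - x^2 + x - 1}{x^2 + 1} + 1 \\ y \end{bmatrix}$ may not look like a linear transformation. However, after some simple algebra,
\[ \frac{x^3 - x^2 + x - 1}{x^2 + 1} + 1 = \frac{(x^2 + 1)(x - 1)}{x^2 + 1} + 1 = x - 1 + 1 = x, \]
we conclude that it is a linear transformation:
\[ F_3(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}. \]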
EXAMPLE 5.4 The transformation $F : M_{nn} \to M_{nn}$ defined by $F(A) = A^T$ satisfies conditions 1 and 2 of the definition of a linear transformation as a consequence of parts 1 and 2 of Theorem 1.4, respectively ($(A + B)^T = A^T + B^T$ and $(cA)^T = cA^T$).

EXAMPLE 5.5 $H : M_{nn} \to M_{nn}$ defined by $H(A) = A^{-1}$ is not a linear transformation. In fact, $H$ does not even produce a result for every square matrix (only for nonsingular ones).
While it is no longer necessary, it can also be checked that H does not satisfy either condition of the linear transformation definition.
EXAMPLE 5.6 The transformation $G : P_3 \to P_3$ given by
\[ G(a_0 + a_1t + a_2t^2 + a_3t^3) = a_1 + 2a_2t + 3a_3t^2 \tag{60} \]
is a linear transformation:

Condition 1 of the definition is satisfied since
\[ \begin{aligned} G(a_0 + a_1t + a_2t^2 + a_3t^3 + b_0 + b_1t + b_2t^2 + b_3t^3) &= G((a_0 + b_0) + (a_1 + b_1)t + (a_2 + b_2)t^2 + (a_3 + b_3)t^3) \\ &= (a_1 + b_1) + 2(a_2 + b_2)t + 3(a_3 + b_3)t^2 \\ &= G(a_0 + a_1t + a_2t^2 + a_3t^3) + G(b_0 + b_1t + b_2t^2 + b_3t^3). \end{aligned} \]
Condition 2 holds as well:
\[ \begin{aligned} G(c(a_0 + a_1t + a_2t^2 + a_3t^3)) &= G(ca_0 + ca_1t + ca_2t^2 + ca_3t^3) \\ &= ca_1 + 2ca_2t + 3ca_3t^2 \\ &= cG(a_0 + a_1t + a_2t^2 + a_3t^3). \end{aligned} \]

Formula (60) in the example above could be rewritten as $G(p) = p'$ since the transformation in question amounts to differentiation of polynomials. Here is another calculus-related transformation.
EXAMPLE 5.7 Let $C_{\mathbb{R}}$ denote the subspace of $F_{\mathbb{R}}$ containing functions continuous over the entire set of real numbers (see Exercise 17 in Section 4.2). Consider the transformation $H : C_{\mathbb{R}} \to \mathbb{R}$ defined by
\[ H(f) = \int_0^1 f(x)\,dx. \]
Standard properties of definite integrals imply
• $\int_0^1 (f(x) + g(x))\,dx = \int_0^1 f(x)\,dx + \int_0^1 g(x)\,dx$ and
• $\int_0^1 cf(x)\,dx = c\int_0^1 f(x)\,dx$;
therefore, $H$ is a linear transformation.
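For polynomial inputs, which are continuous on the whole real line, the transformation $H$ can be evaluated exactly, so both conditions can be checked on a concrete pair of functions. The sketch below assumes a polynomial is stored as its coefficient list `[a0, a1, a2, ...]`; that representation and the sample polynomials are choices made here for illustration only.

```python
from fractions import Fraction

def H(p):
    """Integral over [0, 1] of a0 + a1 x + a2 x^2 + ..., given p = [a0, a1, a2, ...].

    Each term a_k x^k integrates to a_k / (k + 1) on [0, 1].
    """
    return sum(Fraction(a, k + 1) for k, a in enumerate(p))

f = [1, 0, 3]     # f(x) = 1 + 3x^2
g = [0, 4, -6]    # g(x) = 4x - 6x^2
c = 5

f_plus_g = [a + b for a, b in zip(f, g)]
c_times_f = [c * a for a in f]

print(H(f), H(g))
print(H(f_plus_g) == H(f) + H(g))   # condition 1 for this pair
print(H(c_times_f) == c * H(f))     # condition 2 for this scalar
```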
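EXAMPLE 5.8 Let $F : P_4 \to M_{22}$ be the transformation defined by
\[ F(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) = \begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix}. \]
We check that condition 1 of the linear transformation definition holds:
\[ \begin{aligned} &F((a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) + (b_0 + b_1t + b_2t^2 + b_3t^3 + b_4t^4)) \\ &\quad = F((a_0 + b_0) + (a_1 + b_1)t + (a_2 + b_2)t^2 + (a_3 + b_3)t^3 + (a_4 + b_4)t^4) \\ &\quad = \begin{bmatrix} a_0 + b_0 + 2(a_2 + b_2) & 0 \\ a_0 + b_0 - 4(a_4 + b_4) & (a_2 + b_2) + 2(a_4 + b_4) \end{bmatrix} \end{aligned} \]
equals
\[ \begin{aligned} &F(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) + F(b_0 + b_1t + b_2t^2 + b_3t^3 + b_4t^4) \\ &\quad = \begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix} + \begin{bmatrix} b_0 + 2b_2 & 0 \\ b_0 - 4b_4 & b_2 + 2b_4 \end{bmatrix} \\ &\quad = \begin{bmatrix} a_0 + b_0 + 2(a_2 + b_2) & 0 \\ a_0 + b_0 - 4(a_4 + b_4) & (a_2 + b_2) + 2(a_4 + b_4) \end{bmatrix}. \end{aligned} \]
Condition 2 is also satisfied:
\[ F(c(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4)) = F(ca_0 + ca_1t + ca_2t^2 + ca_3t^3 + ca_4t^4) = \begin{bmatrix} ca_0 + 2ca_2 & 0 \\ ca_0 - 4ca_4 & ca_2 + 2ca_4 \end{bmatrix} \]
equals
\[ cF(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) = c\begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix} = \begin{bmatrix} ca_0 + 2ca_2 & 0 \\ ca_0 - 4ca_4 & ca_2 + 2ca_4 \end{bmatrix}. \]
We conclude that $F$ is a linear transformation.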
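EXAMPLE 5.9 Define $G : M_{23} \to P_2$ by
\[ G(\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}) = a_{11} + a_{22}t + t^2. \]
This is not a linear transformation because, e.g.,
\[ G(2\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}) = G(\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}) = t^2 \]
does not match
\[ 2G(\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}) = 2t^2, \]
violating the second condition of the linear transformation definition.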
The process of determining a coordinate vector from a given vector, discussed in Section 4.5, provides an important example of a linear transformation.

EXAMPLE 5.10 Let $S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ be a basis for a vector space $V$. Theorem 4.19 implies that the transformation $F : V \to \mathbb{R}^n$ defined by
\[ F(\vec{v}) = [\vec{v}]_S \quad \text{for any } \vec{v} \in V \]
is linear.

Composition of linear transformations

In Section 1.4, we have seen examples of compositions of two or more linear transformations in $n$-spaces (Examples 1.27 and 1.28); you may recall that in each of those examples, the resulting composite transformation was also linear. This is true in general.

THEOREM 5.2 If $V_1$, $V_2$, and $V_3$ are vector spaces and $F : V_1 \to V_2$, $G : V_2 \to V_3$ are linear transformations, then $H : V_1 \to V_3$ defined by
\[ H(\vec{u}) = G(F(\vec{u})) \]
is also a linear transformation.

PROOF
We begin by demonstrating condition 1 of the definition. For any vectors $\vec{u}_1$ and $\vec{u}_2$ in $V_1$, we have
\[ \begin{aligned} H(\vec{u}_1 + \vec{u}_2) &= G(F(\vec{u}_1 + \vec{u}_2)) \\ &= G(F(\vec{u}_1) + F(\vec{u}_2)) && \text{(since $F$ is a linear transformation)} \\ &= G(F(\vec{u}_1)) + G(F(\vec{u}_2)) && \text{(since $G$ is a linear transformation)} \\ &= H(\vec{u}_1) + H(\vec{u}_2). \end{aligned} \]
Condition 2 holds because for any $\vec{u}$ in $V_1$ and scalar $c$,
\[ \begin{aligned} H(c\vec{u}) &= G(F(c\vec{u})) \\ &= G(cF(\vec{u})) && \text{(since $F$ is a linear transformation)} \\ &= cG(F(\vec{u})) && \text{(since $G$ is a linear transformation)} \\ &= cH(\vec{u}). \end{aligned} \]

[Figure: Illustration for Theorem 5.2: $F$ maps $\vec{u}$ in $V_1$ to $F(\vec{u})$ in $V_2$, and $G$ maps $F(\vec{u})$ to $G(F(\vec{u})) = H(\vec{u})$ in $V_3$, with $H = G \circ F$.]
The notation H = G ◦ F is sometimes used to denote a composition discussed above.
EXAMPLE 5.11 Let $F : P_3 \to P_3$ be a transformation defined by $F(p) = G(G(p))$ where $G$ is the transformation introduced in Example 5.6.

By Theorem 5.2, this is a linear transformation.

The reader should perform the necessary algebra to verify that
\[ F(a_0 + a_1t + a_2t^2 + a_3t^3) = 2a_2 + 6a_3t \]
so that $F$ finds the second derivative of the original polynomial, $F(p) = p''$.
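The composition in Example 5.11 can be traced on concrete coefficients. In the Python sketch below, a polynomial in $P_3$ is stored as the list `[a0, a1, a2, a3]`; this representation, the helper `compose`, and the sample polynomials are assumptions made for illustration.

```python
def G(p):
    """Example 5.6: differentiate a0 + a1 t + a2 t^2 + a3 t^3, given as [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = p
    return [a1, 2 * a2, 3 * a3, 0]

def compose(outer, inner):
    """Return the transformation u -> outer(inner(u))."""
    return lambda u: outer(inner(u))

F = compose(G, G)          # Example 5.11: F(p) = G(G(p))

p = [5, -1, 4, 2]          # 5 - t + 4t^2 + 2t^3
q = [0, 3, 0, -7]          # 3t - 7t^3

print(G(p))                # coefficients of p'
print(F(p))                # coefficients of p'' = 2*a2 + 6*a3*t

# Condition 1 for the composite, on this one pair of polynomials:
p_plus_q = [a + b for a, b in zip(p, q)]
print(F(p_plus_q) == [a + b for a, b in zip(F(p), F(q))])
```

For `p` above, $a_2 = 4$ and $a_3 = 2$, so the second derivative is $8 + 12t$, which matches the formula $2a_2 + 6a_3t$.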
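EXERCISES

1. For each $F : \mathbb{R}^2 \to \mathbb{R}^2$ decide if $F$ is a linear transformation:
a. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 2x \\ 3x \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 0 \\ x - y \end{bmatrix}$; c. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 2 \\ -y \end{bmatrix}$.

2. For each $F : \mathbb{R}^2 \to \mathbb{R}^3$ decide if $F$ is a linear transformation:
a. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x^2 \\ x \\ y \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + y \\ x - y \\ y - x \end{bmatrix}$; c. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} y \\ y \\ y \end{bmatrix}$.

3. For each $F : \mathbb{R}^3 \to \mathbb{R}^2$ decide if $F$ is a linear transformation:
a. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x \\ yz \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} y \\ 2z \end{bmatrix}$; c. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + 2z \\ 3y - 2 \end{bmatrix}$.

4. For each $F : \mathbb{R}^3 \to \mathbb{R}^3$ decide if $F$ is a linear transformation:
a. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} y - z \\ z - x \\ x - y \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x - 1 \\ x \\ x + 1 \end{bmatrix}$; c. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} \pi x \\ 0 \\ y + z \end{bmatrix}$.

5. For each $G : \mathbb{R}^2 \to \mathbb{R}$ decide if $G$ is a linear transformation:
a. $G(\begin{bmatrix} x \\ y \end{bmatrix}) = \ln\dfrac{e^x}{e^y}$; b. $G(\begin{bmatrix} x \\ y \end{bmatrix}) = \sin x$; c. $G(\begin{bmatrix} x \\ y \end{bmatrix}) = \dfrac{x^2 - y^2}{x + y}$.

6. For each $G : \mathbb{R} \to \mathbb{R}^2$ decide if $G$ is a linear transformation:
a. $G(x) = \begin{bmatrix} \sqrt{x^2} \\ 0 \end{bmatrix}$; b. $G(x) = \begin{bmatrix} (\sqrt[3]{x})^3 \\ x \end{bmatrix}$; c. $G(x) = \begin{bmatrix} (\sqrt{x})^2 \\ -x \end{bmatrix}$.

7. Is $F : M_{mn} \to M_{mn}$ defined by $F(A) = 2A$ for all $A \in M_{mn}$ a linear transformation?

8. Given an $m \times n$ matrix $B$, is $G : M_{mn} \to M_{mn}$ defined by $G(A) = A + B$ for all $A \in M_{mn}$ a linear transformation? (If it is true only for some special matrices $B$, specify those in your answer.)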
9. Given an $n \times n$ matrix $B$, is $H : M_{nn} \to M_{nn}$ defined by $H(A) = BA$ for all $A \in M_{nn}$ a linear transformation? (If it is true only for some special matrices $B$, specify those in your answer.)

10. Is $F : M_{nn} \to M_{nn}$ defined by $F(A) = I_n$ for all $A \in M_{nn}$ a linear transformation?

11. Is $G : M_{nn} \to M_{nn}$ defined by $G(A) = A^2$ for all $A \in M_{nn}$ a linear transformation?

12. Is $H : M_{mn} \to M_{mn}$ defined by $H(A) = \text{r.r.e.f. } A$ for all $A \in M_{mn}$ a linear transformation?

13. Is $F : M_{mn} \to \mathbb{R}$ defined by $F(A) = \operatorname{rank} A$ for all $A \in M_{mn}$ a linear transformation?

14. Is $G : M_{mn} \to \mathbb{R}^m$ defined by $G(A) = \operatorname{col}_1 A$ for all $A \in M_{mn}$ a linear transformation?
In Exercises 15–19, decide if the given transformation $F : P_3 \to P_3$ is linear.

15. $F(a_0 + a_1t + a_2t^2 + a_3t^3) = a_0$.

16. $F(a_0 + a_1t + a_2t^2 + a_3t^3) = a_0 - a_1t + a_2t^2 - a_3t^3$.

17. $F(a_0 + a_1t + a_2t^2 + a_3t^3) = 6a_3$.

18. $F(a_0 + a_1t + a_2t^2 + a_3t^3) = a_0t + a_1a_2$.

19. $F(a_0 + a_1t + a_2t^2 + a_3t^3) = a_0 - 1$.
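20. Is $H : P_2 \to M_{22}$ defined by $H(at^2 + bt + c) = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$ a linear transformation?

21. Is $F : M_{23} \to P_3$ defined by $F(\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}) = a_{11}t^3 + 1$ a linear transformation?

22. Is $G : \mathbb{R}^3 \to P_2$ defined by $G(\begin{bmatrix} a \\ b \\ c \end{bmatrix}) = (at + b)(bt - c)$ a linear transformation?

23. Is $H : \mathbb{R}^2 \to M_{13}$ defined by $H(\begin{bmatrix} a \\ b \end{bmatrix}) = \begin{bmatrix} a & a + b & a + 2b \end{bmatrix}$ a linear transformation?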
24. Show that $\operatorname{tr} : M_{nn} \to \mathbb{R}$ defined by
\[ \operatorname{tr}(\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}) = a_{11} + a_{22} + \cdots + a_{nn} \]
is a linear transformation. (This transformation is called the trace of the square matrix.)

25. * Let $V$ and $W$ be finite-dimensional vector spaces. Show that the set of all linear transformations $F : V \to W$ forms a vector space with the operations defined by $(F + G)(\vec{x}) = F(\vec{x}) + G(\vec{x})$ and $(cF)(\vec{x}) = c(F(\vec{x}))$. What is the zero “vector” $\vec{z}$ in this space? (See condition 4 of the definition on p. 142.)

26. * Consider the vector space of all linear transformations $F : \mathbb{R}^2 \to \mathbb{R}^2$ (special case of the space considered in the previous exercise). Show that the dimension of this space is 4 by finding a basis with four elements (transformations).
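27. * Consider the following subsets of the space $W$ of all linear transformations $F : \mathbb{R}^2 \to \mathbb{R}^2$ (see the previous two exercises). Which of these subsets are subspaces of $W$?
a. All scaling transformations (including contractions and dilations),
\[ \{F \mid F(\vec{x}) = c\vec{x},\ c \in \mathbb{R}\}. \]
b. All rotations,
\[ \{F \mid F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},\ \alpha \in \mathbb{R}\}. \]
c. All shear transformations parallel to the $x$-axis:
\[ \{F \mid F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},\ a \in \mathbb{R}\}. \]
d. All rotation-dilation transformations (see Exercise 35 on p. 55):
\[ \{F \mid F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},\ a, b \in \mathbb{R}\}. \]

28. * Prove part b of Theorem 5.1 by induction:
a. show that the statement of the theorem holds true for $k = 1$;
b. for any $k = 2, 3, \ldots$, show that if the statement holds true for $k - 1$, then it does for $k$ as well.
(See margin notes next to the proof of Theorem 3.3 for information on proofs by induction.)

5.2 Kernel and Range

DEFINITION
Let $F : V \to W$ be a linear transformation. Then:
a. The kernel of $F$ is the subset of $V$ containing all vectors whose image is the zero vector:
\[ \ker F = \{\vec{v} \mid F(\vec{v}) = \vec{0}\}. \]
b. The range of $F$ is the subset of $W$ containing all images of vectors in $V$:
\[ \operatorname{range} F = \{\vec{w} \mid F(\vec{v}) = \vec{w} \text{ for some } \vec{v} \in V\}. \]

[Figure: Illustration for the definition of the kernel and the range: $\ker F$ is the part of $V$ that $F$ maps to $\vec{0}$ in $W$; range $F$ is the part of $W$ covered by images of vectors in $V$.]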
THEOREM 5.3 If $F : V \to W$ is a linear transformation, then
a. $\ker F$ is a subspace of $V$ and
b. range $F$ is a subspace of $W$.

PROOF
$\ker F$ is a subset of $V$ that contains $\vec{0}$ (since, by part a of Theorem 5.1, $F(\vec{0}) = \vec{0}$). If $\vec{u}$ and $\vec{v}$ are in $\ker F$ (i.e., $F(\vec{u}) = F(\vec{v}) = \vec{0}$), then so is $\vec{u} + \vec{v}$ (since $F(\vec{u} + \vec{v}) = F(\vec{u}) + F(\vec{v}) = \vec{0}$). Also, $c\vec{u}$ is in $\ker F$ since $F(c\vec{u}) = cF(\vec{u}) = \vec{0}$. Therefore, $\ker F$ is a subspace of $V$.
Proof of part b is left as an exercise.
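The following examples feature a number of linear transformations which will be used to illustrate the newly introduced notions of kernel and range.

EXAMPLE 5.12 The linear transformation introduced in Example 1.19, defined by $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x \\ 0 \end{bmatrix}$, performs projection onto the $x$-axis.

• Any vector with first component equal to zero (i.e., positioned along the $y$-axis) is projected into the origin (zero vector): $F(\begin{bmatrix} 0 \\ y \end{bmatrix}) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Any other vector (i.e., with $x \neq 0$) is projected onto $\begin{bmatrix} x \\ 0 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Therefore, the kernel of $F$ consists of all scalar multiples of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$:
\[ \ker F = \operatorname{span}\{\begin{bmatrix} 0 \\ 1 \end{bmatrix}\}. \]
• The range of $F$ contains all possible images of $F$. These are all vectors of the form $\begin{bmatrix} x \\ 0 \end{bmatrix}$, i.e., all vectors positioned along the $x$-axis:
\[ \operatorname{range} F = \operatorname{span}\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}\}. \]

[Figure: Illustration for Example 5.12: $\ker F$ in $\mathbb{R}^2$ is mapped by $F$ to $\vec{0}$.]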
EXAMPLE 5.13 Consider the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[ F(\vec{u}) = 2\vec{u}, \]
which is a special case of a dilation transformation, discussed in Example 1.20. This transformation results in doubling the original vector.

Clearly, the only vector transformed into the zero vector is the zero vector itself; therefore,
\[ \ker F = \{\vec{0}\}. \]
On the other hand, any vector in $\mathbb{R}^2$, $\begin{bmatrix} a \\ b \end{bmatrix}$, is the image of another vector, specifically $\begin{bmatrix} a/2 \\ b/2 \end{bmatrix}$. Consequently, range $F = \mathbb{R}^2$.

[Figure: Illustration for Example 5.13: $\ker F = \{\vec{0}\}$ and range $F = \mathbb{R}^2$.]
EXAMPLE 5.14 Once again, let us consider the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$, $F(\vec{u}) = k\vec{u}$, defined in Example 1.20, taking $k = 0$, i.e.,
\[ F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Since every vector in $\mathbb{R}^2$ is transformed into the zero vector, we have $\ker F = \mathbb{R}^2$. On the other hand, the range of this transformation consists of just one vector:
\[ \operatorname{range} F = \{\vec{0}\}. \]

[Figure: Illustration for Example 5.14: $\ker F = \mathbb{R}^2$ and range $F = \{\vec{0}\}$.]
The previous three examples were intentionally kept at a simple level so that we could use them to introduce the concepts of kernel and range without having them obscured by excessive algebraic manipulation. We should now be ready for more comprehensive examples, involving considerable computations. The good news is that large parts of these computations will follow the procedures developed in Chapter 4.
(Margin note, Linear Algebra Toolkit, latoolkit.com: Refer to the modules “finding the kernel/range of a linear transformation” for more detailed solutions of problems of this type.)

EXAMPLE 5.15 Let $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + 2z \\ -2y + 2z \\ 2x + y + 3z \\ y - z \end{bmatrix}$. Note that $F : \mathbb{R}^3 \to \mathbb{R}^4$ is a linear transformation by Theorem 1.8 as
\[ F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} 1 & 0 & 2 \\ 0 & -2 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}. \]
According to the definition, the kernel of $F$ consists of all vectors $\vec{v} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ for which
\[ F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \text{i.e.,} \quad \begin{bmatrix} x + 2z \\ -2y + 2z \\ 2x + y + 3z \\ y - z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \]
This matrix equation is equivalent to the homogeneous system of equations
\[ \begin{array}{rcrcrcl} x & & & + & 2z & = & 0 \\ & - & 2y & + & 2z & = & 0 \\ 2x & + & y & + & 3z & = & 0 \\ & & y & - & z & = & 0 \end{array} \]
whose coefficient matrix $\begin{bmatrix} 1 & 0 & 2 \\ 0 & -2 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & -1 \end{bmatrix}$ has r.r.e.f. $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. Following the procedure of Example 2.11 on p. 82, we determine that the system has infinitely many solutions:
\[ x = -2z, \quad y = z, \quad z \text{ arbitrary}, \]
i.e.,
\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -2z \\ z \\ z \end{bmatrix} = z\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}. \]
The kernel of $F$ is the solution set of this system, which is a subspace of $\mathbb{R}^3$ with a basis $\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$.

To find the range of $F$, let us begin by writing the image of an arbitrary vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ as a linear combination with $x$, $y$, and $z$ as coefficients:
\[ \begin{bmatrix} x + 2z \\ -2y + 2z \\ 2x + y + 3z \\ y - z \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ -2 \\ 1 \\ 1 \end{bmatrix} + z\begin{bmatrix} 2 \\ 2 \\ 3 \\ -1 \end{bmatrix}. \]
The range of $F$ is the set of all such images, which can be written as
\[ \operatorname{span}\{\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 3 \\ -1 \end{bmatrix}\}. \]
However, using the r.r.e.f. above and following the procedure of Example 4.32, we determine a basis for the range: $\begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -2 \\ 1 \\ 1 \end{bmatrix}$.
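The kernel and range computations of Example 5.15 can be reproduced with exact arithmetic. The following Python sketch is one possible implementation, not the textbook's or the toolkit's: the function names `rref`, `kernel_basis`, and `range_basis` are made up here, and the routine is a plain Gauss-Jordan elimination using the standard library's `fractions` module.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, indices of the leading columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in this column.
        pivot_row = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot_row is None:
            continue
        m[r], m[pivot_row] = m[pivot_row], m[r]
        lead = m[r][col]
        m[r] = [x / lead for x in m[r]]               # make the leading entry 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:             # clear the rest of the column
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """One basis vector per nonleading column, read off from the r.r.e.f."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -m[r][free]
        basis.append(v)
    return basis

def range_basis(rows):
    """Columns of the original matrix located in the leading-column positions."""
    _, pivots = rref(rows)
    return [[row[j] for row in rows] for j in pivots]

A = [[1, 0, 2],
     [0, -2, 2],
     [2, 1, 3],
     [0, 1, -1]]

print(kernel_basis(A))   # compare with the kernel basis found in Example 5.15
print(range_basis(A))    # compare with the range basis found in Example 5.15
```

Using `Fraction` instead of floating-point numbers keeps the test `!= 0` reliable, which matters when deciding which columns are leading.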
EXAMPLE 5.16 In Example 5.8, we have established that $F : P_4 \to M_{22}$ defined by
\[ F(a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4) = \begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix} \]
is a linear transformation.

The kernel of $F$ consists of all polynomials in $P_4$ whose image under the transformation $F$ is $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$. Setting
\[ \begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \tag{61} \]
yields the linear system
\[ \begin{array}{rcrcrcl} a_0 & + & 2a_2 & & & = & 0 \\ & & & & 0 & = & 0 \\ a_0 & & & - & 4a_4 & = & 0 \\ & & a_2 & + & 2a_4 & = & 0 \end{array} \tag{62} \]
whose coefficient matrix $\begin{bmatrix} 1 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & -4 \\ 0 & 0 & 1 & 0 & 2 \end{bmatrix}$ has the reduced row echelon form
\[ \begin{bmatrix} 1 & 0 & 0 & 0 & -4 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \tag{63} \]
Following again the procedure of Example 2.11 on p. 82, we find that the system has infinitely many solutions:
\[ a_0 = 4a_4, \quad a_1 \text{ arbitrary}, \quad a_2 = -2a_4, \quad a_3 \text{ arbitrary}, \quad a_4 \text{ arbitrary}. \]
The kernel of $F$ contains polynomials of the form
\[ a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4 = 4a_4 + a_1t - 2a_4t^2 + a_3t^3 + a_4t^4 = a_1(t) + a_3(t^3) + a_4(4 - 2t^2 + t^4). \]
Since the vectors $t$, $t^3$, $4 - 2t^2 + t^4$ are linearly independent [14] and span the kernel of $F$, they form a basis for $\ker F$. (Setting $a_1(t) + a_3(t^3) + a_4(4 - 2t^2 + t^4) = 0$ requires that $a_1 = a_3 = a_4 = 0$: compare the coefficients in front of $t$, $t^3$, and $t^4$ on both sides.)

The range of $F$ is a subspace of $M_{22}$ consisting of all images of vectors in $P_4$ under $F$. Let us express the image of an arbitrary polynomial in $P_4$, $a_0 + a_1t + a_2t^2 + a_3t^3 + a_4t^4$, as a linear combination with $a_0$, $a_1$, $a_2$, $a_3$, and $a_4$ as coefficients:
\[ \begin{bmatrix} a_0 + 2a_2 & 0 \\ a_0 - 4a_4 & a_2 + 2a_4 \end{bmatrix} = a_0\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + a_1\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + a_2\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} + a_3\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + a_4\begin{bmatrix} 0 & 0 \\ -4 & 2 \end{bmatrix}. \]
The vectors
\[ \vec{u}_1 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad \vec{u}_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \vec{u}_3 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad \vec{u}_4 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \vec{u}_5 = \begin{bmatrix} 0 & 0 \\ -4 & 2 \end{bmatrix} \]
span range $F$. According to Theorem 4.13, it must be possible to obtain a basis for range $F$ from the indexed set $S = \{\vec{u}_1, \vec{u}_2, \vec{u}_3, \vec{u}_4, \vec{u}_5\}$ by possibly deleting some of the vectors from the set. Applying a procedure analogous to that of Example 4.32, we begin by setting
\[ a_0\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + a_1\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + a_2\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} + a_3\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + a_4\begin{bmatrix} 0 & 0 \\ -4 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]
However, this is equivalent to (61), so once again it yields the system (62) whose coefficient matrix has the r.r.e.f. (63). Following the argument made in Example 4.32, the nonleading columns correspond to the vectors $\vec{u}_2$, $\vec{u}_4$, $\vec{u}_5$, which can be expressed as linear combinations of those corresponding to the leading columns ($\vec{u}_1$ and $\vec{u}_3$). The latter two vectors are linearly independent (check!). Therefore, we conclude that $\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$ and $\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ form a basis for range $F$.

[14] Vectors obtained from this procedure are always linearly independent; refer to a related discussion in Example 4.46.
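The conclusions of Example 5.16 can be checked on concrete data. In the Python sketch below, a polynomial in $P_4$ is stored as `[a0, a1, a2, a3, a4]` and a $2 \times 2$ matrix as a nested list; these representations, the helper `combo`, and the sample polynomial are assumptions made for illustration. The check confirms that the three kernel basis polynomials map to the zero matrix and that a sample image is a combination of the two range basis matrices with coefficients $a_0 - 4a_4$ and $a_2 + 2a_4$.

```python
def F(p):
    """Examples 5.8 and 5.16: map [a0, a1, a2, a3, a4] to a 2x2 matrix."""
    a0, a1, a2, a3, a4 = p
    return [[a0 + 2 * a2, 0],
            [a0 - 4 * a4, a2 + 2 * a4]]

kernel_candidates = [
    [0, 1, 0, 0, 0],      # t
    [0, 0, 0, 1, 0],      # t^3
    [4, 0, -2, 0, 1],     # 4 - 2t^2 + t^4
]
zero = [[0, 0], [0, 0]]
print(all(F(p) == zero for p in kernel_candidates))

u1 = [[1, 0], [1, 0]]     # range basis matrices found in the example
u3 = [[2, 0], [0, 1]]

def combo(alpha, beta):
    """Entrywise alpha*u1 + beta*u3."""
    return [[alpha * u1[i][j] + beta * u3[i][j] for j in range(2)]
            for i in range(2)]

p = [3, -7, 5, 11, 2]     # an arbitrary sample polynomial
a0, _, a2, _, a4 = p
print(F(p) == combo(a0 - 4 * a4, a2 + 2 * a4))
```

The coefficients follow from comparing the bottom row of $F(p)$ with the bottom rows of the two basis matrices.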
One-to-one and onto transformations
DEFINITION (One-to-one and Onto)
Let $F : V \to W$ be a linear transformation.
• If $F(\vec{u}) = F(\vec{v})$ implies $\vec{u} = \vec{v}$, then the transformation $F$ is said to be one-to-one.
• If range $F = W$, then $F$ is said to be onto $W$.
To establish that $F$ is onto, we typically follow the condition in the definition above. However, the following result offers a way of testing for $F$ being one-to-one that may be more efficient.

THEOREM 5.4 A linear transformation $F$ is one-to-one if and only if $\ker F = \{\vec{0}\}$.

PROOF
Part I ($\Rightarrow$)
Assuming $F$ is one-to-one, we know that $F(\vec{u}) = F(\vec{v})$ implies $\vec{u} = \vec{v}$. By part a of Theorem 5.1 we have $F(\vec{0}) = \vec{0}$. Consequently, for any vector $\vec{w}$ such that $F(\vec{w}) = \vec{0}$ we must have $\vec{w} = \vec{0}$. This shows that $\ker F = \{\vec{0}\}$.

Part II ($\Leftarrow$)
Given that $\ker F = \{\vec{0}\}$, let us consider vectors $\vec{u}$ and $\vec{v}$ such that
\[ \begin{aligned} F(\vec{u}) &= F(\vec{v}) \\ F(\vec{u}) - F(\vec{v}) &= F(\vec{v}) - F(\vec{v}) && \text{(add $-F(\vec{v})$ to both sides)} \\ F(\vec{u}) - F(\vec{v}) &= \vec{0} && \text{(apply condition 5 of the vector space definition)} \\ F(\vec{u}) + (-1)F(\vec{v}) &= \vec{0} && \text{(apply Theorem 4.3)} \\ F(\vec{u}) + F((-1)\vec{v}) &= \vec{0} && \text{(apply condition 2 of the linear transformation definition)} \\ F(\vec{u} + (-1)\vec{v}) &= \vec{0} && \text{(apply condition 1 of the linear transformation definition)} \end{aligned} \]
Since $\ker F = \{\vec{0}\}$, we must have
\[ \begin{aligned} \vec{u} + (-1)\vec{v} &= \vec{0} \\ \vec{u} + (-1)\vec{v} + \vec{v} &= \vec{0} + \vec{v} && \text{(add $\vec{v}$ to both sides)} \\ \vec{u} + \vec{0} &= \vec{0} + \vec{v} && \text{(apply Th. 4.3 and condition 5 of the vector space def.)} \\ \vec{u} &= \vec{v} && \text{(apply condition 4 of the vector space definition)} \end{aligned} \]
so that $F$ is one-to-one.

[Figure: four diagrams of transformations, labeled “one-to-one and onto”, “not one-to-one and onto”, “one-to-one and not onto”, and “not one-to-one and not onto”.]
EXAMPLE 5.17 Of the five transformations discussed in Examples 5.12–5.16, $F$ in Example 5.13 is one-to-one and onto, whereas the remaining four are neither.

EXAMPLE 5.18 The transformation introduced in Example 5.6 could be considered as $G : P_3 \to P_2$. In this case, it is onto $P_2$ (every polynomial in $P_2$ is a derivative of some $p$ in $P_3$), but it is not one-to-one (any constant-valued polynomial, e.g., $p(t) = 5$, is transformed into the derivative $p'(t) = 0$).

[Figure: In Example 5.18, $G$ is onto but is not one-to-one. $G$ maps $P_3$ onto $P_2 = \operatorname{range} G$; for instance, $t^2 + 1$ is mapped to $2t$, while $\ker G$ (containing $0$ and $5$) is mapped to $0$.]
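EXAMPLE 5.19 Consider the linear transformation $F : \mathbb{R} \to \mathbb{R}^2$ defined by
\[ F(x) = \begin{bmatrix} \frac{x}{2} \\ \frac{x}{2} \end{bmatrix} \]
(projecting a point from the $x$-axis onto the line $y = x$).

This transformation is not onto, because range $F = \operatorname{span}\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}\}$ covers only the line $y = x$ in $\mathbb{R}^2$.

$F$ is one-to-one by Theorem 5.4 since for $F(x)$ to be the zero vector, we must have $x = 0$ (another way to establish this is to use the definition directly: if two projections onto $y = x$ are equal, $\begin{bmatrix} \frac{x_1}{2} \\ \frac{x_1}{2} \end{bmatrix} = \begin{bmatrix} \frac{x_2}{2} \\ \frac{x_2}{2} \end{bmatrix}$, then we must have $x_1 = x_2$).

[Figure: In Example 5.19, $F$ is one-to-one but is not onto. $F$ maps $\mathbb{R}$ into $\mathbb{R}^2$; $\ker F = \{0\}$ and range $F$ is the line through $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.]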
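For a transformation given by a matrix, $\vec{x} \mapsto A\vec{x}$ with $A$ of size $m \times n$, both properties can be read off the rank: by Theorem 5.4 and the rank-nullity relation, the transformation is one-to-one exactly when $\operatorname{rank} A = n$, and it is onto $\mathbb{R}^m$ exactly when $\operatorname{rank} A = m$. The Python sketch below applies this test to the matrix transformations of Examples 5.12–5.15 and 5.19; the function names `rank` and `classify` are invented for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot_row = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot_row is None:
            continue
        m[r], m[pivot_row] = m[pivot_row], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(rows):
    """Return (one_to_one, onto) for x -> A x, where A has the given rows."""
    k = rank(rows)
    return (k == len(rows[0]), k == len(rows))

examples = {
    "5.12": [[1, 0], [0, 0]],
    "5.13": [[2, 0], [0, 2]],
    "5.14": [[0, 0], [0, 0]],
    "5.15": [[1, 0, 2], [0, -2, 2], [2, 1, 3], [0, 1, -1]],
    "5.19": [[Fraction(1, 2)], [Fraction(1, 2)]],
}
for name, A in examples.items():
    print(name, classify(A))
```

The results can be compared with the statements in Examples 5.17 and 5.19.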
EXAMPLE 5.20 In Example 5.10, we introduced the coordinate vector with respect to the basis $S$ for a vector space $V$ with dimension $n$,
\[ F(\vec{u}) = [\vec{u}]_S, \]
as a linear transformation $F : V \to \mathbb{R}^n$.

This transformation is one-to-one by Theorem 5.4 since $[\vec{u}]_S = \vec{0}$ implies that $\vec{u} = \vec{0}$, making $\ker F = \{\vec{0}\}$. It is onto $\mathbb{R}^n$, because every vector $[\vec{u}]_S$ in $\mathbb{R}^n$ corresponds to a linear combination of vectors in $S$ that equals some $\vec{u}$ in $V = \operatorname{span} S$.
Invertible linear transformations
DEFINITION (Invertible Linear Transformation)
A linear transformation $F : V \to W$ is said to be invertible if a transformation $G : W \to V$, called an inverse of $F$, exists such that
\[ G(F(\vec{v})) = \vec{v} \quad \text{for all } \vec{v} \in V \tag{64} \]
and
\[ F(G(\vec{w})) = \vec{w} \quad \text{for all } \vec{w} \in W. \tag{65} \]
While the definition does not explicitly require an inverse of a linear transformation to be linear or unique, the following theorems guarantee it always will be both.
THEOREM 5.5 If a linear transformation $F$ is invertible, then its inverse is also a linear transformation.

PROOF
• $F(\vec{v}_1) = \vec{w}_1$, $F(\vec{v}_2) = \vec{w}_2$, and $F(\vec{v}_1 + \vec{v}_2) = \vec{w}_1 + \vec{w}_2$ imply $G(\vec{w}_1 + \vec{w}_2) = \vec{v}_1 + \vec{v}_2 = G(\vec{w}_1) + G(\vec{w}_2)$ and
• $F(\vec{v}) = \vec{w}$ and $F(c\vec{v}) = c\vec{w}$ imply $G(c\vec{w}) = c\vec{v} = cG(\vec{w})$.
THEOREM 5.6 If a linear transformation $F$ is invertible, then it has a unique inverse.

(The reader is asked to prove this theorem in Exercise 28.) In light of Theorem 5.6 we can refer to the inverse of a linear transformation $F$; we shall denote it by $F^{-1}$.
THEOREM 5.7 Let $F : V \to W$ be a linear transformation. $F$ is invertible $\Leftrightarrow$ $F$ is one-to-one and onto.

PROOF
Part I ($\Rightarrow$)
Since $F(G(\vec{w})) = \vec{w}$ for all $\vec{w} \in W$, $F$ is onto $W$.
Let $\vec{v}_1$ and $\vec{v}_2$ be two vectors in $V$ such that $F(\vec{v}_1) = F(\vec{v}_2)$. Then we must have $G(F(\vec{v}_1)) = G(F(\vec{v}_2))$, implying $\vec{v}_1 = \vec{v}_2$. Therefore, $F$ is one-to-one.

Part II ($\Leftarrow$)
If $F$ is one-to-one and onto, then for every $\vec{w} \in W$ there exists a unique $\vec{v} \in V$ such that $F(\vec{v}) = \vec{w}$. Let $G : W \to V$ be defined to assign that $\vec{v}$ to the corresponding $\vec{w}$:
\[ G(\vec{w}) = \vec{v} \quad \text{whenever } F(\vec{v}) = \vec{w}, \]
which yields both (64) and (65). We conclude that $F$ is invertible.

[Figure: $F$ maps $\vec{v}$ to $F(\vec{v}) = \vec{w}$, and $G$ maps $\vec{w}$ back to $G(\vec{w}) = \vec{v}$.]
Here are some examples of invertible transformations and their inverses:
• Dilation/contraction $F : \mathbb{R}^n \to \mathbb{R}^n$, $F(\vec{x}) = c\vec{x}$ with a nonzero $c$ has the inverse $F^{-1}(\vec{x}) = \frac{1}{c}\vec{x}$.
• Rotation by angle $\alpha$ in $\mathbb{R}^2$ (Example 1.23) has the inverse corresponding to rotation by $-\alpha$.
• Any reflection transformation in $\mathbb{R}^2$ or $\mathbb{R}^3$ defined so far (e.g., Example 1.21, Example 1.28) also serves as its own inverse.
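The rotation item in the list above can be checked numerically against conditions (64) and (65). The sketch below assumes the usual rotation matrix $\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}$; the angle and test vector are arbitrary sample values, and equality is tested only up to floating-point tolerance.

```python
import math

def rotation(alpha):
    """Return the map rotating a vector (x, y) in R^2 by the angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return lambda v: (c * v[0] - s * v[1], s * v[0] + c * v[1])

alpha = 0.7               # sample angle in radians
F = rotation(alpha)
G = rotation(-alpha)      # candidate inverse of F

v = (3.0, -2.0)
back = G(F(v))            # should reproduce v, as required by (64)
forth = F(G(v))           # should reproduce v, as required by (65)

print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(back, v)))
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(forth, v)))
```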
EXAMPLE 5.21 Given a basis $S = \{\vec{u}_1, \ldots, \vec{u}_n\}$ for the vector space $V$, the transformation $F : V \to \mathbb{R}^n$ defined by $F(\vec{u}) = [\vec{u}]_S$ (Example 5.10) has its inverse $F^{-1} : \mathbb{R}^n \to V$ defined by
\[ F^{-1}(\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}) = a_1\vec{u}_1 + \cdots + a_n\vec{u}_n. \]
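As a small concrete instance of Example 5.21, take $V = \mathbb{R}^2$ with a specific basis. In the Python sketch below, the basis vectors `u1`, `u2` and the sample vectors are hypothetical values chosen only for illustration, and the coordinate vector is found with Cramer's rule, which is just one convenient way to solve the $2 \times 2$ system.

```python
from fractions import Fraction

u1 = (2, 1)    # hypothetical basis vectors of R^2, chosen for illustration
u2 = (1, 3)

def F_inverse(coords):
    """Map a coordinate vector (a1, a2) to the vector a1*u1 + a2*u2."""
    a1, a2 = coords
    return (a1 * u1[0] + a2 * u2[0], a1 * u1[1] + a2 * u2[1])

def F(v):
    """Coordinate vector [v]_S, solving a1*u1 + a2*u2 = v by Cramer's rule."""
    det = u1[0] * u2[1] - u2[0] * u1[1]          # nonzero because S is a basis
    a1 = Fraction(v[0] * u2[1] - u2[0] * v[1], det)
    a2 = Fraction(u1[0] * v[1] - v[0] * u1[1], det)
    return (a1, a2)

v = (7, -4)
print(F(v))                               # coordinates of v with respect to S
print(F_inverse(F(v)) == v)               # condition (64) for this v
print(F(F_inverse((5, -2))) == (5, -2))   # condition (65) for this coordinate vector
```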
Application: Quadrature formulas
According to the fundamental theorem of calculus, for any continuous function $f$
\[ \int_a^b f(x)\,dx = F(b) - F(a) \tag{66} \]
where $F$ is an antiderivative of $f$; i.e., $F'(x) = f(x)$ for all $x$.

As long as an antiderivative of $f$ is available, the formula (66) is frequently the most convenient way to evaluate the definite integral. Unfortunately, there are many functions whose antiderivatives cannot be expressed in closed form (e.g., $e^{x^2}$, $\sin x^2$). Even if the antiderivative exists in a closed form, it may not always be easy to find. Additionally, our function $f$ itself may not be given in a closed form; instead, our knowledge of $f$ may be limited to a finite number of data values, making it impossible to obtain an expression for $F$.

These are just some of the possible situations where it becomes desirable to numerically approximate $\int_a^b f(x)\,dx$. The most popular way of doing so is to use a quadrature formula: a summation
\[ \sum_{i=0}^{n} c_i f(x_i) = c_0 f(x_0) + \cdots + c_n f(x_n), \tag{67} \]
where $c_0, \ldots, c_n$ and $x_0, \ldots, x_n$ are given real numbers.

Since such a formula only approximates $\int_a^b f(x)\,dx$, the key question is: how close is the sum to the integral? One of the ways to address this issue is by studying the degree of precision of a quadrature formula, defined as the largest integer $k$ such that for any polynomial $p(x)$ of degree $k$ or less, the formula is exact; i.e., $\sum_{i=0}^{n} c_i p(x_i) = \int_a^b p(x)\,dx$.

Let $C$ denote the set of all functions continuous on $\mathbb{R}$. Consider a transformation $G : C \to \mathbb{R}$ defined by
\[ G(f) = \int_a^b f(x)\,dx - \sum_{i=0}^{n} c_i f(x_i). \]
We leave it as an exercise to demonstrate that $G$ is a linear transformation. The degree of precision of the quadrature formula (67) is the largest integer $k$ such that $P_k \subseteq \ker G$.

EXAMPLE 5.22 Find the degree of precision of the Trapezoidal Rule quadrature formula
\[ \frac{b - a}{2}\left[f(a) + f(b)\right]. \]
Because $G$ is linear, $G(a_0 + a_1x + \cdots + a_kx^k) = a_0G(1) + a_1G(x) + \cdots + a_kG(x^k)$. Therefore, the degree of precision is the largest integer $k$ such that $G(1) = G(x) = \cdots = G(x^k) = 0$ and $G(x^{k+1}) \neq 0$.
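• $G(1) = \int_a^b 1\,dx - \frac{b - a}{2}[1 + 1] = [x]_a^b - \frac{b - a}{2}(2) = b - a - (b - a) = 0.$
• $G(x) = \int_a^b x\,dx - \frac{b - a}{2}[a + b] = \left[\frac{x^2}{2}\right]_a^b - \frac{b^2 - a^2}{2} = \frac{b^2 - a^2}{2} - \frac{b^2 - a^2}{2} = 0.$
• $G(x^2) = \int_a^b x^2\,dx - \frac{b - a}{2}\left[a^2 + b^2\right] = \left[\frac{x^3}{3}\right]_a^b - \frac{b - a}{2}\left(a^2 + b^2\right) = \frac{b^3 - a^3}{3} - \frac{b - a}{2}\left(a^2 + b^2\right) = -\frac{1}{6}b^3 + \frac{1}{6}a^3 - \frac{1}{2}a^2b + \frac{1}{2}ab^2 = -\frac{(b - a)^3}{6} \neq 0.$

Since $P_1 \subseteq \ker G$ and $P_2 \not\subseteq \ker G$, we conclude that the degree of precision of the Trapezoidal Rule is 1.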
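The search for the degree of precision can be automated by evaluating $G(x^k)$ exactly for $k = 0, 1, 2, \ldots$ until a nonzero value appears. The Python sketch below is one way to do this; the function names, the cutoff `max_degree`, and the sample interval $[1, 4]$ are assumptions made for illustration, and only the Trapezoidal Rule of Example 5.22 is evaluated.

```python
from fractions import Fraction

def G_monomial(k, nodes, weights, a, b):
    """G(x^k): exact integral of x^k over [a, b] minus the quadrature sum."""
    exact = (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
    approx = sum(Fraction(c) * Fraction(x) ** k for c, x in zip(weights, nodes))
    return exact - approx

def degree_of_precision(nodes, weights, a, b, max_degree=20):
    """Largest k with G(1) = G(x) = ... = G(x^k) = 0, or -1 if G(1) != 0."""
    k = -1
    while k < max_degree and G_monomial(k + 1, nodes, weights, a, b) == 0:
        k += 1
    return k

a, b = 1, 4                               # an arbitrary sample interval
half = Fraction(b - a, 2)                 # both trapezoidal weights equal (b - a)/2

print(degree_of_precision([a, b], [half, half], a, b))
print(G_monomial(2, [a, b], [half, half], a, b), -Fraction((b - a) ** 3, 6))
```

On this interval the value of $G(x^2)$ agrees with $-\frac{(b-a)^3}{6}$, as in the hand computation.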
EXERCISES
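1. Which of the vectors $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -2 \\ 1 \end{bmatrix}$ are in $\ker F$ for the given linear transformation?
a. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + 2y \\ 2x + 4y \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 2x \\ 3y \end{bmatrix}$.

2. Which of the vectors $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ are in $\ker F$ for the given linear transformation?
a. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$; b. $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x \\ x \\ z \end{bmatrix}$.

3. Which of the vectors in Exercise 1 are in range $F$ for each given transformation?

4. Which of the vectors in Exercise 2 are in range $F$ for each given transformation?

5. Which of the vectors $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 4 \\ 0 & 2 \end{bmatrix}$, $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $\begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}$ in $M_{22}$ are in $\ker F$ for the given linear transformation?
a. $F(\begin{bmatrix} a & b \\ c & d \end{bmatrix}) = -a + c + at^2$; b. $F(\begin{bmatrix} a & b \\ c & d \end{bmatrix}) = b + (a + 4d)t + (c - b)t^2$.

6. Which of the vectors $0$, $1 + t$, $t^2$, $1 + 2t + 3t^2$ in $P_2$ are in $\ker F$ for the given linear transformation?
a. $F(a + bt + ct^2) = \begin{bmatrix} a & b \\ b & a \end{bmatrix}$; b. $F(a + bt + ct^2) = \begin{bmatrix} 0 & 2c \\ 0 & c \end{bmatrix}$.

7. Which of the vectors in $M_{22}$ listed in Exercise 5 are in range $F$ for each transformation given in Exercise 6?

8. Which of the vectors in $P_2$ listed in Exercise 6 are in range $F$ for each transformation given in Exercise 5?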
In Exercises 9–23, consider the given linear transformation F. a. Find a basis for ker F. b. Find a basis for range F. c. Is F onto? d. Is F one-to-one? e. Is F invertible?
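9. $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} y \\ x \end{bmatrix}$.

10. $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x - y \\ 0 \end{bmatrix}$.

11. $F : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x - y \\ y - z \\ z - x \end{bmatrix}$.

12. $F : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + y + z \\ y + z \\ z \end{bmatrix}$.

13. $F : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + y \\ y + z \end{bmatrix}$.

14. $F : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + 2y \\ x - y \\ y \end{bmatrix}$.

15. $F : \mathbb{R}^3 \to \mathbb{R}^4$ defined by $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x - 2z \\ y + z \\ x + 2y \\ x + y - z \end{bmatrix}$.

16. $F : \mathbb{R}^4 \to \mathbb{R}^2$ defined by $F(\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}) = \begin{bmatrix} x + 3y - w \\ -y + 2z + w \end{bmatrix}$.

17. $F : P_1 \to P_1$ defined by $F(a_0 + a_1 t) = a_1$.

18. $F : P_1 \to P_1$ defined by $F(a_0 + a_1 t) = a_1 + a_0 t$.

19. $F : P_2 \to P_2$ defined by $F(a_0 + a_1 t + a_2 t^2) = -a_0 + a_1 + a_2 - a_1 t + (a_0 - 2a_1 + a_2)t^2$.

20. $F : P_3 \to P_1$ defined by $F(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = 2a_0 - a_2 + 3a_3 + (6a_0 + a_1 - 2a_3)t$.

21. $F : M_{22} \to \mathbb{R}$ defined by $F(\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}) = a_{11} + a_{22}$.

22. $F : \mathbb{R}^2 \to P_2$ defined by $F(\begin{bmatrix} x \\ y \end{bmatrix}) = x - 2y + (x - y)t + (x + y)t^2$.

23. $F : P_2 \to M_{22}$ defined by $F(a_0 + a_1 t + a_2 t^2) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$.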
24. Prove part b of Theorem 5.3.
25. Find the degree of precision of the Midpoint Rule quadrature formula $(b - a)\,f\!\left(\frac{a+b}{2}\right)$ approximating $\int_a^b f(x)\,dx$.

26. a. Calculate the degree of precision of Simpson's Rule $\frac{1}{3}[f(0) + 4f(1) + f(2)]$ approximating $\int_0^2 f(x)\,dx$.
b. Show that under the substitution $u = \frac{x(b-a)}{2} + a$ and $g(u) = f(x)$ we have
\[ \int_0^2 f(x)\,dx = \frac{2}{b-a}\int_a^b g(u)\,du \]
and
\[ \frac{1}{3}[f(0) + 4f(1) + f(2)] = \frac{1}{3}\left[g(a) + 4g\!\left(\frac{a+b}{2}\right) + g(b)\right] \]
so that $\frac{b-a}{6}\left[g(a) + 4g\!\left(\frac{a+b}{2}\right) + g(b)\right]$ approximates $\int_a^b g(u)\,du$ with the same degree of precision as the one obtained in part a.

27. a. * Consider a quadrature formula $c_0 f(x_0) + c_1 f(x_1)$ approximating the integral $\int_{-1}^{1} f(x)\,dx$. Find the values of $c_0$, $c_1$, $x_0$, and $x_1$ that result in $G(1) = G(x) = G(x^2) = G(x^3) = 0$. Hint: Solve a system of four linear equations in the unknowns $c_0$ and $c_1$, where $x_0$ and $x_1$ will become parts of the coefficients. See Exercise 36 on p. 112.
b. * Show that the method obtained in part a has $G(x^4) \neq 0$, resulting in the degree of precision equal to 3. (This method is an example of Gaussian quadrature.)
c. * Find a substitution similar to the one used in part b of the previous exercise to show that the resulting method approximates an integral over an arbitrary closed interval with the same degree of precision.
28. * Prove Theorem 5.6. (Hint: Refer to the proof of the related Theorem 2.9.) ⎡ ⎡ ⎤ ⎤ x x −1 √ 1 0 ⎢ ⎢ ⎥ ⎥ 29. a. * Find the kernel of the cabinet transformation F1 (⎣ y ⎦) = 2−12 ⎣ y ⎦ √ 0 1 2 2 z z defined in Example 1.30. b. * Find the kernel of the transformation ⎡ ⎡ axonometric ⎤ ⎤ x √ x 3 −1 0 ⎢ ⎢ ⎥ ⎥ 2 √ F2 (⎣ y ⎦) = −√23 −1 ⎣ y ⎦ 3 4 4 2 z z defined in Example 1.31. c. * What is the geometric significance of the kernels obtained in parts a and b?
30. * Find the kernel and range of the transformation defined in formula (17). Discuss the geometric significance of these results.
5.3 Matrices of Linear Transformations

According to Theorem 1.8 in Section 1.4, for every linear transformation $F : \mathbb{R}^n \to \mathbb{R}^m$ there exists an $m \times n$ matrix $A$ such that $F(\vec{u}) = A\vec{u}$ for all n-vectors $\vec{u}$. (And, vice versa: every $F$ specified by $F(\vec{u}) = A\vec{u}$ must be a linear transformation.) The following theorem will establish a similar correspondence for general linear transformations $F : V \to W$ and related matrices.
THEOREM 5.8 Let $F : V \to W$ be a linear transformation. Moreover, let the set $S = \{\vec{v}_1, \ldots, \vec{v}_n\}$ be a basis for $V$, and let $T = \{\vec{w}_1, \ldots, \vec{w}_m\}$ be a basis for $W$. Then there exists a unique matrix $A$ such that for every vector $\vec{x}$ in $V$
\[ [F(\vec{x})]_T = A[\vec{x}]_S. \tag{68} \]
The jth column of $A$ is $[F(\vec{v}_j)]_T$.

PROOF
Letting $[\vec{x}]_S = \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}$, i.e., $\vec{x} = d_1\vec{v}_1 + \cdots + d_n\vec{v}_n$, we obtain
\begin{align*}
[F(\vec{x})]_T &= [F(d_1\vec{v}_1 + \cdots + d_n\vec{v}_n)]_T \\
&= [d_1 F(\vec{v}_1) + \cdots + d_n F(\vec{v}_n)]_T && \text{(apply part b of Theorem 5.1)} \\
&= d_1[F(\vec{v}_1)]_T + \cdots + d_n[F(\vec{v}_n)]_T && \text{(apply Theorem 4.19)} \\
&= \begin{bmatrix} | & \cdots & | \\ [F(\vec{v}_1)]_T & \cdots & [F(\vec{v}_n)]_T \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} && \text{(apply Theorem 1.6)} \\
&= A[\vec{x}]_S.
\end{align*}
To see that
\[ A = \begin{bmatrix} | & \cdots & | \\ [F(\vec{v}_1)]_T & \cdots & [F(\vec{v}_n)]_T \\ | & \cdots & | \end{bmatrix} \tag{69} \]
is the only matrix with the property
\[ [F(\vec{x})]_T = A[\vec{x}]_S, \]
suppose there is another matrix, $B$, satisfying
\[ [F(\vec{x})]_T = B[\vec{x}]_S \]
for all $\vec{x}$ in $V$. Subtracting the two equations and applying Theorem 1.5 we obtain
\[ \vec{0} = (A - B)[\vec{x}]_S. \]
Since this is true for all n-vectors $[\vec{x}]_S$, we have $\operatorname{nullity}(A - B) = n$; therefore, by Theorem 4.25, $\operatorname{rank}(A - B) = 0$. The only matrix with zero rank is the zero matrix; thus $A = B$.
The matrix A referred to in Theorem 5.8 is called the matrix of the linear transformation F with respect to S and T . When dealing with a transformation F : V → V, using S for both domain and codomain (i.e., “input” and “output” spaces), we refer to A as “the matrix of F with respect to S” (rather than “S and S”).
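EXAMPLE 5.23 Consider $F : \mathbb{R}^2 \to \mathbb{R}^3$ defined by
\[ F(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}) = \begin{bmatrix} x_1 \\ x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}. \]

(a) Find the matrix of $F$ with respect to the bases
\[ S = \{\underbrace{\begin{bmatrix} 1 \\ 1 \end{bmatrix}}_{\vec{v}_1}, \underbrace{\begin{bmatrix} 1 \\ -1 \end{bmatrix}}_{\vec{v}_2}\} \quad \text{and} \quad T = \{\underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\vec{w}_1}, \underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}}_{\vec{w}_2}, \underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}_{\vec{w}_3}\}. \]

(b) Use the matrix obtained in (a) to evaluate $F(\begin{bmatrix} -2 \\ 3 \end{bmatrix})$. Evaluate the same expression directly.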
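SOLUTION
(a) $F(\vec{v}_1) = F(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

To find the coordinate vector with respect to the vectors in $T$, $[F(\vec{v}_1)]_T = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$, we need to find $a_1$, $a_2$, and $a_3$ such that $a_1\vec{w}_1 + a_2\vec{w}_2 + a_3\vec{w}_3 = F(\vec{v}_1)$. The resulting linear system has the augmented matrix
\[ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right] \quad \text{whose r.r.e.f. is} \quad \left[\begin{array}{ccc|c} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{array}\right]. \]
Thus $[F(\vec{v}_1)]_T = \begin{bmatrix} -1 \\ -1 \\ 3 \end{bmatrix}$.

The coordinate vector of $F(\vec{v}_2) = F(\begin{bmatrix} 1 \\ -1 \end{bmatrix}) = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$ with respect to the vectors in $T$ can be found in a similar manner, using the augmented matrix
\[ \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & -1 \end{array}\right] \quad \text{with r.r.e.f.} \quad \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right]. \]
The matrix of $F$ with respect to bases $S$ and $T$ is
\[ A = \begin{bmatrix} | & | \\ [F(\vec{v}_1)]_T & [F(\vec{v}_2)]_T \\ | & | \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -1 & 1 \\ 3 & -1 \end{bmatrix}. \]

Shortcut: The matrix $A$ could have been obtained more quickly by forming a "combined" augmented matrix
\[ \left[\begin{array}{ccc|cc} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 2 & 0 \\ 0 & 0 & 1 & 3 & -1 \end{array}\right] \quad \text{whose r.r.e.f. is} \quad \left[\begin{array}{ccc|cc} 1 & 0 & 0 & -1 & 1 \\ 0 & 1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 3 & -1 \end{array}\right]. \]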
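(b) We shall be guided by the following diagram:
\[ \begin{array}{ccc} \vec{x} & \xrightarrow{\;****\;} & F(\vec{x}) \\ \downarrow {*} & & \uparrow {***} \\ [\vec{x}]_S & \xrightarrow{\;**\;} & [F(\vec{x})]_T \end{array} \]
Step *: Given $\vec{x}$, find $[\vec{x}]_S$.
Step **: Given $[\vec{x}]_S$, find $[F(\vec{x})]_T$.
Step ***: Given $[F(\vec{x})]_T$, find $F(\vec{x})$.
Step ****: Given $\vec{x}$, find $F(\vec{x})$.

The process of determining $F(\vec{x})$ using the matrix $A$ obtained in part (a) will be carried out in three steps, denoted *, **, and ***. The same result will then be obtained directly in Step ****.

Begin by finding the coordinate vector of $\begin{bmatrix} -2 \\ 3 \end{bmatrix}$ with respect to the vectors in $S$. To do so, we solve a linear system with the augmented matrix $\left[\begin{array}{cc|c} 1 & 1 & -2 \\ 1 & -1 & 3 \end{array}\right]$. Since the r.r.e.f. of the matrix is $\left[\begin{array}{cc|c} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{5}{2} \end{array}\right]$, we obtain $[\begin{bmatrix} -2 \\ 3 \end{bmatrix}]_S = \begin{bmatrix} \frac{1}{2} \\ -\frac{5}{2} \end{bmatrix}$.

We will use the equation $[F(\vec{x})]_T = A[\vec{x}]_S$ to find $[F(\vec{x})]_T$:
\[ [F(\begin{bmatrix} -2 \\ 3 \end{bmatrix})]_T = \begin{bmatrix} -1 & 1 \\ -1 & 1 \\ 3 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} \\ -\frac{5}{2} \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ 4 \end{bmatrix}. \]
As you should recall from Section 4.5, finding a vector whose coordinate vector is given is simply a matter of evaluating the appropriate linear combination of the basis vectors:
\[ F(\begin{bmatrix} -2 \\ 3 \end{bmatrix}) = -3\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - 3\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}. \]
Evaluating $F(\begin{bmatrix} -2 \\ 3 \end{bmatrix})$ directly from the definition of $F$, i.e.,
\[ F(\begin{bmatrix} -2 \\ 3 \end{bmatrix}) = \begin{bmatrix} -2 \\ -2 + 3 \\ -2 + 2(3) \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}, \]
yields (as expected!) the same result.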
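The procedure of Example 5.23 can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `solve` and `matrix_of` are hypothetical (they do not come from the text), exact rational arithmetic is used, and `solve` assumes that the given vectors form a basis so that each system has a unique solution.

```python
from fractions import Fraction

def solve(basis, rhs):
    """Coordinates of rhs with respect to `basis` (a list of vectors),
    found by Gauss-Jordan elimination on the augmented matrix."""
    n = len(basis)
    aug = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(rhs[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def F(x):
    """The transformation of Example 5.23."""
    x1, x2 = x
    return [x1, x1 + x2, x1 + 2 * x2]

def matrix_of(F, S, T):
    """Matrix whose jth column is [F(v_j)]_T, as in Theorem 5.8."""
    cols = [solve(T, F(v)) for v in S]
    return [[cols[j][i] for j in range(len(S))] for i in range(len(T))]

S = [[1, 1], [1, -1]]
T = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
A = matrix_of(F, S, T)

x = [-2, 3]
x_S = solve(S, x)                                                # Step *
Fx_T = [sum(a * c for a, c in zip(row, x_S)) for row in A]       # Step **
Fx = [sum(Fx_T[j] * T[j][i] for j in range(3)) for i in range(3)]  # Step ***
print(A, x_S, Fx_T, Fx, F(x))                                    # Step ****: F(x)
```

The final two printed vectors are the result of Steps * through *** and of the direct evaluation (Step ****); they should agree, as in the example.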
Here is an example of a matrix of linear transformation involving polynomial spaces.

EXAMPLE 5.24 Let $S$ and $T$ be the Bernstein bases for $P_3$ and $P_2$, respectively (see formula (51) on p. 184),
\[ S = \{\underbrace{(1-t)^3}_{q_0(t)}, \underbrace{3t(1-t)^2}_{q_1(t)}, \underbrace{3t^2(1-t)}_{q_2(t)}, \underbrace{t^3}_{q_3(t)}\}, \]
\[ T = \{\underbrace{(1-t)^2}_{p_0(t)}, \underbrace{2t(1-t)}_{p_1(t)}, \underbrace{t^2}_{p_2(t)}\}. \]
In Example 5.6, we have shown $G(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = a_1 + 2a_2 t + 3a_3 t^2$ to be a linear transformation from $P_3$ to $P_3$. Here, we will let $G : P_3 \to P_2$ instead. The matrix of $G$ with respect to $S$ and $T$ has a form
\[ A = \begin{bmatrix} | & | & | & | \\ [G(q_0)]_T & [G(q_1)]_T & [G(q_2)]_T & [G(q_3)]_T \\ | & | & | & | \end{bmatrix}. \]
We follow in the footsteps of Example 5.23. Begin by applying $G$ to the vectors in $S$:
\begin{align*}
G((1-t)^3) &= G(1 - 3t + 3t^2 - t^3) = -3 + 6t - 3t^2 \\
G(3t(1-t)^2) &= G(3t - 6t^2 + 3t^3) = 3 - 12t + 9t^2 \\
G(3t^2(1-t)) &= G(3t^2 - 3t^3) = 6t - 9t^2 \\
G(t^3) &= 3t^2
\end{align*}
To find $[-3 + 6t - 3t^2]_T = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, $[3 - 12t + 9t^2]_T = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$, $[6t - 9t^2]_T = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$, and $[3t^2]_T = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$ we set
\begin{align*}
x_1(1-t)^2 + x_2\,2t(1-t) + x_3 t^2 &= -3 + 6t - 3t^2 \\
y_1(1-t)^2 + y_2\,2t(1-t) + y_3 t^2 &= 3 - 12t + 9t^2 \\
z_1(1-t)^2 + z_2\,2t(1-t) + z_3 t^2 &= 6t - 9t^2 \\
w_1(1-t)^2 + w_2\,2t(1-t) + w_3 t^2 &= 3t^2
\end{align*}
Expanding the left-hand sides yields $x_1 + (-2x_1 + 2x_2)t + (x_1 - 2x_2 + x_3)t^2$, and similar expressions for $y$, $z$, and $w$. The corresponding linear systems can be written in a single augmented matrix with all four right-hand sides
\[ \left[\begin{array}{ccc|cccc} 1 & 0 & 0 & -3 & 3 & 0 & 0 \\ -2 & 2 & 0 & 6 & -12 & 6 & 0 \\ 1 & -2 & 1 & -3 & 9 & -9 & 3 \end{array}\right]. \]
The r.r.e.f. is
\[ \left[\begin{array}{ccc|cccc} 1 & 0 & 0 & -3 & 3 & 0 & 0 \\ 0 & 1 & 0 & 0 & -3 & 3 & 0 \\ 0 & 0 & 1 & 0 & 0 & -3 & 3 \end{array}\right]; \]
therefore, the matrix is $A = \begin{bmatrix} -3 & 3 & 0 & 0 \\ 0 & -3 & 3 & 0 \\ 0 & 0 & -3 & 3 \end{bmatrix}$.

Rank and nullity, once again
While different matrices can represent the same transformation (depending on the choice of bases), all such matrices share some important properties described below.
THEOREM 5.9 Let V and W be finite-dimensional vector spaces with bases S and T, respectively, and let F : V → W be a linear transformation whose matrix with respect to S and T is A. Then
1. dim range F = rank A, and
2. dim ker F = nullity A.
PROOF
Let $S = \{\vec{v}_1, \ldots, \vec{v}_n\}$.

Proof of property 1. Using the fact that $S$ spans $V$ (since it is a basis) and that $F$ is a linear transformation, we can write
\begin{align*}
\operatorname{range} F &= \{F(\vec{v}) \mid \vec{v} \in V\} \\
&= \{F(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) \mid c_1, \ldots, c_n \in \mathbb{R}\} \\
&= \{c_1 F(\vec{v}_1) + \cdots + c_n F(\vec{v}_n) \mid c_1, \ldots, c_n \in \mathbb{R}\} \\
&= \operatorname{span}\{F(\vec{v}_1), \ldots, F(\vec{v}_n)\}.
\end{align*}
On the other hand,
\[ (\text{column space of } A) = \operatorname{span}\{[F(\vec{v}_1)]_T, \ldots, [F(\vec{v}_n)]_T\}. \]
By property 2 of Theorem 4.20, these two spaces have the same dimension.

Proof of property 2. The kernel of $F$ consists of all vectors $\vec{v}$ in $V$ for which
\[ F(\vec{v}) = \vec{0}. \tag{70} \]
Expressing $\vec{v}$ as a linear combination of vectors in the basis $S$ and using the linearity of $F$, we can write
\begin{align*}
F(c_1\vec{v}_1 + \cdots + c_n\vec{v}_n) &= \vec{0} \\
c_1 F(\vec{v}_1) + \cdots + c_n F(\vec{v}_n) &= \vec{0} \tag{71}
\end{align*}
Since $\vec{u} = \vec{0}$ is the only vector such that $[\vec{u}]_T = \vec{0}$, (71) is equivalent to
\[ [c_1 F(\vec{v}_1) + \cdots + c_n F(\vec{v}_n)]_T = \vec{0}, \]
which, by linearity of coordinates, can be rewritten as
\[ c_1[F(\vec{v}_1)]_T + \cdots + c_n[F(\vec{v}_n)]_T = \vec{0} \]
or, by Theorem 1.6, as
\[ A[\vec{v}]_S = \vec{0}. \tag{72} \]
Because (70) and (72) are equivalent, we conclude that the dimension of the kernel of $F$ equals the dimension of the null space of $A$.
The correspondence established in Theorem 5.9 leads us to the following definition.

DEFINITION (Rank and Nullity) If F : V → W is a linear transformation, then
• the rank of F (denoted rank F) is defined to be the dimension of range F, and
• the nullity of F (denoted nullity F) is defined to be the dimension of ker F.
The following result is a consequence of Theorem 4.25.
T HEOREM 5.10 Let V and W be finite-dimensional vector spaces. If F : V → W is a linear transformation, then rank F + nullity F = dim V.
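In view of Theorem 5.9, the identity of Theorem 5.10 can be checked on any matrix of a transformation by counting pivots. The sketch below is an illustration only: the function name `rank` is hypothetical, exact rational arithmetic is assumed, and the matrix used is the matrix $A$ of $G : P_3 \to P_2$ found in Example 5.24, so that $\dim V = \dim P_3 = 4$.

```python
from fractions import Fraction

def rank(matrix):
    """Number of pivots found by forward (Gaussian) elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Look for a nonzero entry in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, n_rows)
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            f = rows[r][col] / rows[pivot_row][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return pivot_row

# Matrix of G : P3 -> P2 with respect to the Bernstein bases (Example 5.24).
A = [[-3, 3, 0, 0],
     [0, -3, 3, 0],
     [0, 0, -3, 3]]

rank_A = rank(A)
nullity_A = len(A[0]) - rank_A   # number of columns minus rank (Theorem 4.25)
print(rank_A, nullity_A, rank_A + nullity_A)
```

Here the number of columns of $A$ plays the role of $\dim V$, since $A$ has one column for each vector of the basis $S$.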
EXAMPLE 5.25 Let us verify the above theorem for five of the examples presented in the previous section:

                 rank F    nullity F    dim V
Example 5.12        1          1          2
Example 5.13        2          0          2
Example 5.14        0          2          2
Example 5.15        2          1          3
Example 5.16        2          3          5

EXAMPLE 5.26 Recall the linear transformation $G : P_3 \to P_3$ given by
\[ G(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = a_1 + 2a_2 t + 3a_3 t^2 \]
of Example 5.6. The kernel of $G$ is the set of all polynomials in $P_3$ such that $G(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = 0$; i.e., $a_1 + 2a_2 t + 3a_3 t^2 = 0$. Clearly, this implies $a_1 = a_2 = a_3 = 0$ so that
\[ \ker G = \{a_0 + a_1 t + a_2 t^2 + a_3 t^3 \mid a_1 = a_2 = a_3 = 0\} = \{a_0 \mid a_0 \in \mathbb{R}\} = \operatorname{span}\{1\}. \]
The zeroth-degree polynomial $p(t) = 1$ forms a basis for $\ker G$. The range of $G$ is the set of all polynomials in $P_3$ that can be obtained as images of $G$. All images of $G$ have the form $a_1 + 2a_2 t + 3a_3 t^2$ so that $\operatorname{range} G = \operatorname{span}\{1, t, t^2\}$. The linearly independent vectors $1$, $t$, and $t^2$ form a basis for $\operatorname{range} G$.

Consequently, for this transformation,
\[ \underbrace{\operatorname{rank} G}_{=3} + \underbrace{\operatorname{nullity} G}_{=1} = \underbrace{\dim P_3}_{=4}. \]

Matrix of composite transformation
In Theorem 5.2, we proved that a composition of two linear transformations is also a linear transformation. Additionally, you may recall a discussion in Section 1.4 (beginning on p. 44) focused on the connection between composing transformations in Rn and multiplying their matrices. In this subsection, we would like to extend this important relationship to linear transformations in general vector spaces.
THEOREM 5.11 Let $V_1$, $V_2$, and $V_3$ be finite-dimensional vector spaces with bases $S = \{\vec{v}_1, \ldots, \vec{v}_m\}$, $T = \{\vec{w}_1, \ldots, \vec{w}_n\}$, and $U$, respectively. If
• $F : V_1 \to V_2$ is a linear transformation whose matrix with respect to $S$ and $T$ is $A$ and
• $G : V_2 \to V_3$ is a linear transformation whose matrix with respect to $T$ and $U$ is $B$,
then the matrix of the composite linear transformation $H : V_1 \to V_3$ defined by $H(\vec{u}) = G(F(\vec{u}))$ with respect to bases $S$ and $U$ is $BA$.

[Figure: the composition $H = G \circ F$; $F$ maps $\vec{u}$ in $V_1$ to $F(\vec{u})$ in $V_2$, and $G$ maps $F(\vec{u})$ to $G(F(\vec{u})) = H(\vec{u})$ in $V_3$.]

PROOF
The matrix of $H$ with respect to $S$ and $U$ is
\[ C = \begin{bmatrix} | & \cdots & | \\ [H(\vec{v}_1)]_U & \cdots & [H(\vec{v}_m)]_U \\ | & \cdots & | \end{bmatrix}. \]
To show that $C = BA$, we show that the jth column of $C$ equals the jth column of $BA$; i.e.,
\[ B \operatorname{col}_j A = \begin{bmatrix} | & \cdots & | \\ [G(\vec{w}_1)]_U & \cdots & [G(\vec{w}_n)]_U \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix} \tag{73} \]
where
\[ \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix} = \operatorname{col}_j A = [F(\vec{v}_j)]_T \]
so that
\[ F(\vec{v}_j) = a_{1j}\vec{w}_1 + \cdots + a_{nj}\vec{w}_n. \tag{74} \]
The right-hand side of (73) can be expressed as a linear combination of columns of $B$,
\[ B \operatorname{col}_j A = a_{1j}[G(\vec{w}_1)]_U + \cdots + a_{nj}[G(\vec{w}_n)]_U. \]
From linearity of coordinates (Theorem 4.19) and the fact that $G$ is a linear transformation, we can further rewrite the right-hand side as
\begin{align*}
B \operatorname{col}_j A &= [a_{1j}G(\vec{w}_1) + \cdots + a_{nj}G(\vec{w}_n)]_U \\
&= [G(a_{1j}\vec{w}_1 + \cdots + a_{nj}\vec{w}_n)]_U.
\end{align*}
After using (74),
\[ B \operatorname{col}_j A = [G(F(\vec{v}_j))]_U = [H(\vec{v}_j)]_U, \]
we can conclude that the jth column of $C$ equals the jth column of $BA$ for any $j$. Thus, $C = BA$.
In spite of the presence of quite a few details in the statement of the above theorem (owing to the three vector spaces and their bases), the main idea remains the same as in Section 1.4: a composition of two linear transformations can be represented by the product of matrices of the individual transformations.

EXAMPLE 5.27 Consider the transformation $J : P_2 \to P_1$ performing differentiation on any polynomial of degree 2 or less:
\[ J(b_0 + b_1 t + b_2 t^2) = b_1 + 2b_2 t. \]
By following the procedure of Example 5.24, the reader should be able to show that the matrix of $J$ with respect to the Bernstein bases $T = \{(1-t)^2, 2t(1-t), t^2\}$ and $U = \{1 - t, t\}$ is
\[ B = \begin{bmatrix} -2 & 2 & 0 \\ 0 & -2 & 2 \end{bmatrix}. \]
If we let $H : P_3 \to P_1$ be the composition of $J$ and the transformation $G$ discussed in Example 5.24, $H = J \circ G$, then the matrix of $H$ with respect to the Bernstein bases for $P_3$ and $P_1$, $S$ and $U$, is
\[ BA = \begin{bmatrix} -2 & 2 & 0 \\ 0 & -2 & 2 \end{bmatrix} \begin{bmatrix} -3 & 3 & 0 & 0 \\ 0 & -3 & 3 & 0 \\ 0 & 0 & -3 & 3 \end{bmatrix} = \begin{bmatrix} 6 & -12 & 6 & 0 \\ 0 & 6 & -12 & 6 \end{bmatrix}. \]
The transformation $H$ corresponds to taking the second derivative of a polynomial in $P_3$. E.g., the polynomial $3t(1-t)^2$ has the coordinate vector $\left[3t(1-t)^2\right]_S = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$. Using the matrix $BA$ we have
\[ \left[H(3t(1-t)^2)\right]_U = \begin{bmatrix} 6 & -12 & 6 & 0 \\ 0 & 6 & -12 & 6 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -12 \\ 6 \end{bmatrix}, \]
which leads to $H(3t(1-t)^2) = -12(1 - t) + 6t = -12 + 18t$. This matches the result obtained directly:
\begin{align*}
H(3t(1-t)^2) &= J(G(3t - 6t^2 + 3t^3)) \\
&= J(\underbrace{3 - 12t + 9t^2}_{\text{first derivative}}) \\
&= \underbrace{-12 + 18t}_{\text{second derivative}}.
\end{align*}
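The matrix product in Example 5.27 is easy to reproduce mechanically. The following sketch is an illustration only; the helper name `matmul` is hypothetical, and the conversion from coordinates in $U = \{1 - t, t\}$ to monomial coefficients uses the identity $c_0(1 - t) + c_1 t = c_0 + (c_1 - c_0)t$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Matrix of G : P3 -> P2 (Example 5.24) and of J : P2 -> P1 (Example 5.27),
# both with respect to Bernstein bases.
A = [[-3, 3, 0, 0],
     [0, -3, 3, 0],
     [0, 0, -3, 3]]
B = [[-2, 2, 0],
     [0, -2, 2]]

BA = matmul(B, A)                    # matrix of H = J o G (Theorem 5.11)

# Coordinate vector of 3t(1 - t)^2 with respect to S, as a column.
q1_S = [[0], [1], [0], [0]]
h_U = matmul(BA, q1_S)               # [H(3t(1 - t)^2)]_U

# Convert c0*(1 - t) + c1*t to monomial form c0 + (c1 - c0)*t.
c0, c1 = h_U[0][0], h_U[1][0]
monomial = [c0, c1 - c0]             # [constant term, coefficient of t]
print(BA, h_U, monomial)
```

The list `monomial` holds the constant term and the coefficient of $t$ of $H(3t(1-t)^2)$, to be compared with the second derivative computed directly in the example.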
Matrix of inverse transformation
Applying Theorem 5.11 to the situation where G is an inverse transformation of F leads to the following result.

THEOREM 5.12 Let V and W be finite-dimensional vector spaces with bases S and T, respectively. Furthermore, let F : V → W be a linear transformation, whose matrix with respect to S and T is A. Then:
1. F is invertible if and only if A is invertible.
2. The matrix of the inverse transformation $F^{-1}$ with respect to T and S is $A^{-1}$.

PROOF
Property 1. By Theorem 5.7, F is invertible if and only if F is one-to-one and onto.
A transformation F is one-to-one ⇔ $\ker F = \{\vec{0}\}$; i.e., dim ker F = 0.
F is onto W ⇔ dim range F = dim W. In light of Theorem 5.10, this is equivalent to dim ker F = 0 and dim W = dim V, and, by Theorem 5.9, to nullity A = 0 and m = n. Using the equivalent statements on p. 212, this is true if and only if A is nonsingular.
Property 2. Follows from Theorem 5.11, with $G = F^{-1}$.
EXAMPLE 5.28 In Examples 5.10 and 5.21, we have defined the transformation $F : V \to \mathbb{R}^n$ and its inverse $F^{-1} : \mathbb{R}^n \to V$, respectively, by
\[ F(\vec{v}) = [\vec{v}]_S \]
and
\[ F^{-1}(\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}) = a_1\vec{v}_1 + \cdots + a_n\vec{v}_n \]
where $S = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ is a basis for the vector space $V$. Letting $T$ denote the standard basis for $\mathbb{R}^n$, it should be an easy exercise for the reader to see that both the matrix of $F$ with respect to $S$ and $T$ and the matrix of $F^{-1}$ with respect to $T$ and $S$ are $I_n$.
Eleven equivalent statements
Based on Theorems 5.12 and 5.7, we can add two equivalent statements to our list. To save space, we do not include all of the previously introduced statements (the list of the first nine statements appears on p. 212).

11 Equivalent Statements
For an n × n matrix A, the following statements are equivalent.
1. A is nonsingular.
⋮
10. If A is a matrix of a linear transformation, then the transformation is invertible.
11. If A is a matrix of a linear transformation, then the transformation is one-to-one and onto.

11 Equivalent "Negative" Statements
For an n × n matrix A, the following statements are equivalent.
-1. A is singular.
⋮
-10. If A is a matrix of a linear transformation, then the transformation is not invertible.
-11. If A is a matrix of a linear transformation, then the transformation is neither one-to-one nor onto.

Statement "-11" may appear to be considerably stronger than a simple negation of statement "11". However, for a square matrix of a linear transformation, Theorem 5.10 implies that when rank F < n (meaning that F is not onto), then nullity F = n − rank F > 0 (i.e., F is not one-to-one), and vice versa.
Change of basis

An important special case of the formula
\[ [F(\vec{x})]_T = A[\vec{x}]_S \]
occurs when $F$ is the identity transformation, $F(\vec{x}) = \vec{x}$. Under such a scenario, the matrix $A$ is called the coordinate-change matrix from the basis $S$ to $T$ and is denoted $P_{T \leftarrow S}$:
\[ [\vec{x}]_T = P_{T \leftarrow S}[\vec{x}]_S. \tag{75} \]
To determine the coordinate-change matrix, follow the same procedure introduced for finding the matrix of linear transformation, taking $F$ to be the identity transformation, $I(\vec{x}) = \vec{x}$.
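EXAMPLE 5.29 Consider two bases for $\mathbb{R}^3$:
• the standard basis $S = \{\underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\vec{i}}, \underbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}}_{\vec{j}}, \underbrace{\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}}_{\vec{k}}\}$ and
• the basis $T = \{\underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{\vec{u}_1}, \underbrace{\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}}_{\vec{u}_2}, \underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}_{\vec{u}_3}\}$ introduced in Example 4.26.

We shall find two coordinate-change matrices, a. $P_{S \leftarrow T}$ and b. $P_{T \leftarrow S}$.

Part a is actually quite straightforward:
\[ P_{S \leftarrow T} = \begin{bmatrix} | & | & | \\ [\vec{u}_1]_S & [\vec{u}_2]_S & [\vec{u}_3]_S \\ | & | & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}. \]
In part b,
\[ P_{T \leftarrow S} = \begin{bmatrix} | & | & | \\ [\vec{i}]_T & [\vec{j}]_T & [\vec{k}]_T \\ | & | & | \end{bmatrix}, \]
we are solving three linear systems, which can be arranged in a common augmented matrix:
\[ \left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & -1 & 1 & 0 & 0 & 1 \end{array}\right]. \]
Its r.r.e.f. is
\[ \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -1 & -1 \\ 0 & 1 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 & 1 & 1 \end{array}\right]; \]
thus
\[ P_{T \leftarrow S} = \begin{bmatrix} 2 & -1 & -1 \\ 1 & 0 & -1 \\ -1 & 1 & 1 \end{bmatrix}. \]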
Note that the steps performed at the end of the example are reminiscent of the procedure of Section 2.3, in which a sequence of row operations applied to an n × 2n matrix
\[ [A \mid I_n] \to [I_n \mid A^{-1}] \]
resulted in the inverse in the right half of the matrix. This is consistent with Theorem 5.12: since
• $P_{S \leftarrow T}$ is the matrix of $I : \mathbb{R}^3 \to \mathbb{R}^3$ with respect to $T$ and $S$ and
• $P_{T \leftarrow S}$ is the matrix of the inverse transformation $I^{-1} : \mathbb{R}^3 \to \mathbb{R}^3$ (same as $I$) with respect to $S$ and $T$;
therefore,
\[ P_{T \leftarrow S} = P_{S \leftarrow T}^{-1}. \]
Moreover, according to Theorem 5.11, coordinate-change matrices can be multiplied,
\[ P_{U \leftarrow S} = P_{U \leftarrow T}P_{T \leftarrow S}, \]
if $S$, $T$, and $U$ are bases for the same vector space.
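The relationship $P_{T \leftarrow S} = P_{S \leftarrow T}^{-1}$ for the bases of Example 5.29 can be confirmed with a short computation. This is a sketch under stated assumptions: the function name `inverse` is hypothetical, it row reduces $[A \mid I_n]$ with exact rational arithmetic, and it assumes the input matrix is nonsingular. The test vector $(1, 2, 3)$ is an arbitrary choice, not taken from the text.

```python
from fractions import Fraction

def inverse(M):
    """Inverse of a nonsingular matrix via row reduction of [M | I]."""
    n = len(M)
    aug = [[Fraction(x) for x in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]   # right half holds the inverse

# Columns are u1, u2, u3 of the basis T (Example 5.29).
P_S_from_T = [[1, 0, 1],
              [0, 1, 1],
              [1, -1, 1]]
P_T_from_S = inverse(P_S_from_T)

# Coordinates of an arbitrary vector with respect to T, using formula (75).
u = [1, 2, 3]
u_T = [sum(p * x for p, x in zip(row, u)) for row in P_T_from_S]

# Rebuild u from its T-coordinates as a check.
rebuilt = [sum(P_S_from_T[i][j] * u_T[j] for j in range(3)) for i in range(3)]
print(P_T_from_S, u_T, rebuilt)
```

Since $S$ is the standard basis here, $[\vec{u}]_S = \vec{u}$, which is why `u` itself is multiplied by the coordinate-change matrix.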
EXERCISES
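1. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + y \\ 3x + y \end{bmatrix}$ with respect to the basis $S = \{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} -2 \\ 3 \end{bmatrix})$ twice: directly and using the matrix.

2. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + 2y \\ -x + y \end{bmatrix}$ with respect to the basis $S = \{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} 3 \\ -4 \end{bmatrix})$ twice: directly and using the matrix.

3. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} x + 2y \\ 2x + y \end{bmatrix}$ with respect to the bases $S = \{\begin{bmatrix} -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} 6 \\ 0 \end{bmatrix})$ twice: directly and using the matrix.

4. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} 2x \\ x + y \end{bmatrix}$ with respect to the bases $S = \{\begin{bmatrix} 2 \\ 4 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} -1 \\ 2 \end{bmatrix}, \begin{bmatrix} -3 \\ 2 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} -2 \\ 3 \end{bmatrix})$ twice: directly and using the matrix.

5. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \end{bmatrix}) = \begin{bmatrix} y \\ x + y \\ x \end{bmatrix}$ with respect to the bases $S = \{\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} 2 \\ -4 \end{bmatrix})$ twice: directly and using the matrix.

6. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x + y \\ y + z \end{bmatrix}$ with respect to the bases $S = \{\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ -1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix})$ twice: directly and using the matrix.

7. Find the matrix of the linear transformation $F(\begin{bmatrix} x \\ y \\ z \end{bmatrix}) = \begin{bmatrix} x \\ x + y \\ -x \end{bmatrix}$ with respect to the basis $S = \{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}\}$. Calculate $F(\begin{bmatrix} 1 \\ 4 \\ 4 \end{bmatrix})$ twice: directly and using the matrix.

8. Find the matrix of the linear transformation $F : P_2 \to P_1$ defined by $F(a_0 + a_1 t + a_2 t^2) = a_1 + 2a_2 t$ with respect to the bases $S = \{1, 1 + t, 1 + t^2\}$ and $T = \{1, t\}$. Calculate $F(3 + 4t + t^2)$ twice: directly and using the matrix.

9. Find the matrix of the linear transformation $F : P_2 \to P_2$ defined by $F(a_0 + a_1 t + a_2 t^2) = a_0 + 2a_1 t + 4a_2 t^2$ with respect to the bases $S = \{1, t, t^2\}$ and $T = \{1 - 2t + t^2, 2t - 2t^2, t^2\}$. Calculate $F(-2 + t + 3t^2)$ twice: directly and using the matrix.

10. Find the matrix of the linear transformation $F : P_2 \to \mathbb{R}^3$ defined by $F(a_0 + a_1 t + a_2 t^2) = \begin{bmatrix} a_0 - a_1 \\ 0 \\ a_1 + 3a_2 \end{bmatrix}$ with respect to the bases $S = \{1 + t^2, t, -1 + t^2\}$ and $T = \{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\}$. Calculate $F(4 + 2t - t^2)$ twice: directly and using the matrix.

11. Given the bases $S = \{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}\}$ for $\mathbb{R}^2$, find the coordinate-change matrix $P_{T \leftarrow S}$. For $\vec{u} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, check that one of the coordinate vectors $[\vec{u}]_S$ and $[\vec{u}]_T$ can be obtained by multiplying $P_{T \leftarrow S}$ by the other one.

12. Given the bases $S = \{\begin{bmatrix} 1 \\ 5 \end{bmatrix}, \begin{bmatrix} 4 \\ 3 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\}$ for $\mathbb{R}^2$, find the coordinate-change matrix $P_{T \leftarrow S}$. For $\vec{u} = \begin{bmatrix} 2 \\ -7 \end{bmatrix}$, check that one of the coordinate vectors $[\vec{u}]_S$ and $[\vec{u}]_T$ can be obtained by multiplying $P_{T \leftarrow S}$ by the other one.

13. Given the bases $S = \{\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \end{bmatrix}\}$ for $\mathbb{R}^2$, find the coordinate-change matrix $P_{T \leftarrow S}$. For $\vec{u} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$, check that one of the coordinate vectors $[\vec{u}]_S$ and $[\vec{u}]_T$ can be obtained by multiplying $P_{T \leftarrow S}$ by the other one.

14. Given the bases $S = \{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\}$ for $\mathbb{R}^3$, find the coordinate-change matrix $P_{T \leftarrow S}$. For $\vec{u} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$, check that one of the coordinate vectors $[\vec{u}]_S$ and $[\vec{u}]_T$ can be obtained by multiplying $P_{T \leftarrow S}$ by the other one.

15. Given the bases $S = \{\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\}$ and $T = \{\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\}$ for $\mathbb{R}^3$, find the coordinate-change matrix $P_{T \leftarrow S}$. For $\vec{u} = \begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}$, check that one of the coordinate vectors $[\vec{u}]_S$ and $[\vec{u}]_T$ can be obtained by multiplying $P_{T \leftarrow S}$ by the other one.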
16. Find the coordinate-change matrix from the monomial basis to the Bernstein basis for $P_2$.

17. Find the coordinate-change matrix from the Bernstein basis to the monomial basis for $P_2$.

18. Find the coordinate-change matrix from the Bernstein basis to the monomial basis for $P_3$.

19. Find the coordinate-change matrix from the monomial basis to the Bernstein basis for $P_3$.

20. * Two of the most popular scalable font technologies are the TrueType fonts (supported by Apple and Microsoft) and the Type 1 fonts (by Adobe). One of the key differences is that TrueType fonts are based on piecewise quadratic Bézier curves, whereas Type 1 fonts use piecewise cubic Bézier curves. In many instances, it becomes necessary to convert a font from one system to another. Your task in this exercise is to find a matrix $A$ of the linear transformation $H$, which practically looks like an identity transformation $H(p(t)) = p(t)$ except for the fact that $H : P_2 \to P_3$. This matrix is supposed to be found with respect to the Bernstein bases $S$ and $T$ for $P_2$ and $P_3$, respectively, so that $[p(t)]_T = A[p(t)]_S$ for all $p(t)$ in $P_2$.

T/F? In Exercises 21-24, decide whether each statement is true or false. Justify your answer.

21. If $F : \mathbb{R}^3 \to \mathbb{R}^4$ is a linear transformation and $F$ is one-to-one, then the rank of $F$ is 4.

22. If $F : \mathbb{R}^5 \to \mathbb{R}^3$ is a linear transformation and $F$ is onto $\mathbb{R}^3$, then the rank of $F$ is 3.

23. If $F : \mathbb{R}^4 \to \mathbb{R}^4$ is a linear transformation and the nullity of $F$ is 0, then $F$ is invertible.

24. If $F : \mathbb{R}^2 \to \mathbb{R}^4$ is a linear transformation, then $F$ cannot be one-to-one.

25. * Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation and $S = \{\vec{u}_1, \ldots, \vec{u}_n\}$ is a basis for $\mathbb{R}^n$ such that
\[ F(\vec{u}_1) = c_1\vec{u}_1, \ldots, F(\vec{u}_n) = c_n\vec{u}_n \]
for some real numbers $c_1, \ldots, c_n$. Show that the matrix of $F$ with respect to $S$ is diagonal.

26. * Let $F : V \to W$ be a linear transformation, and let $S$ and $S^*$ be bases for $V$, while $T$ and $T^*$ are bases for $W$. If $A$ is the matrix of $F$ with respect to $S$ and $T$, while $B$ is the matrix of $F$ with respect to $S^*$ and $T^*$, show that $B = P_{T^* \leftarrow T}\,A\,P_{S \leftarrow S^*}$.

27. * Given an $n \times n$ matrix $A$, let $F(\vec{x}) = A\vec{x}$. Show that $A$ is the matrix of $F$ with respect to the standard basis for $\mathbb{R}^n$.

28. * Given a nonsingular matrix $C = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}$, let the indexed set $T$ contain its columns, $T = \{\vec{u}_1, \ldots, \vec{u}_n\}$, whereas $S = \{\vec{e}_1, \ldots, \vec{e}_n\}$ contains the standard basis for $\mathbb{R}^n$. Show that $C = P_{S \leftarrow T}$.

29. * Suppose you want to find a linear transformation $F : \mathbb{R}^n \to \mathbb{R}^m$ such that
\[ F(\vec{v}_1) = \vec{w}_1, \ldots, F(\vec{v}_k) = \vec{w}_k \]
where $\vec{v}_1, \ldots, \vec{v}_k$ are given n-vectors and $\vec{w}_1, \ldots, \vec{w}_k$ are given m-vectors. (We are not assuming anything about these vectors: they can be L.I., L.D., etc.)
a. Show examples where $\vec{v}_i$'s and $\vec{w}_i$'s are specified in such a way that
i. $F$ is defined uniquely by the constraints,
ii. no $F$ can satisfy the constraints,
iii. infinitely many different $F$ definitions will satisfy our constraints.
b. Propose a procedure to decide between i, ii, and iii for any given $\vec{v}_i$'s and $\vec{w}_i$'s.
c. Generalize your procedure to apply if $F : V \to W$ with $\vec{v}_1, \ldots, \vec{v}_k$ in the vector space $V$ and $\vec{w}_1, \ldots, \vec{w}_k$ in the vector space $W$.
5.4 Chapter Review
Linear transformation
Given vector spaces V and W, F : V → W is called a linear transformation if for every u and v in V and for every real number c:
F(u + v) = F(u) + F(v) and F(cu) = cF(u).
[Figure: F maps u, v, u + v, and cu in V to F(u), F(v), F(u + v), and F(cu) in W.]

Kernel of a linear transformation
ker F = { v | F(v) = 0 }, the set of all v in V such that F(v) is the zero vector in W.
ker F is a subspace of V.

Range of a linear transformation
range F = { w | F(v) = w }, the set of all w in W such that w is an image of some v in V.
range F is a subspace of W.

Linear transformation F is one-to-one
if F(u) = F(v) implies u = v (equivalent to ker F = {0}).
Different vectors in V cannot have the same image in W.

Linear transformation F is onto
if range F = W.
No vectors in W can be outside the range of F.
6
Orthogonality and Projections
In Section 1.1, we have introduced orthogonality of two vectors in $\mathbb{R}^n$, which generalized the familiar notion of perpendicular vectors in the plane or space into higher-dimensional n-spaces. This chapter will feature further discussion of orthogonality but will also introduce the $\mathbb{R}^n$ analogue of an angle between any two vectors (not just perpendicular ones).

[Figure: the vector $\vec{u}$ with components 2, 4, and 3 in the xyz-coordinate system, shown together with the vectors $\vec{p}$ and $\vec{n}$.]

Last but not least, we will investigate an important concept of orthogonal projection. A simple example of a projection occurs when we seek the y-component of the vector $\vec{u} = \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix}$; we are looking for a way to represent $\vec{u}$ as a sum of two vectors, $\vec{p} + \vec{n}$, such that $\vec{p}$ (the orthogonal projection) lies along the y-axis and $\vec{n}$ is orthogonal to it. While it is extremely easy to solve this problem in our case¹⁵, in this chapter we shall perform projections in more general contexts (i.e., onto lines other than coordinate axes or onto higher-dimensional subspaces of $\mathbb{R}^n$).
6.1 Orthogonality

DEFINITION Let $S = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k\}$ be an indexed set of vectors in $\mathbb{R}^n$. We say that
• $S$ is an orthogonal set if $\vec{u}_i \cdot \vec{u}_j = 0$ for all $i \neq j$,
• $S$ is an orthonormal set if $S$ is orthogonal and $\|\vec{u}_i\| = 1$ for all $i$.

If $S$ is an orthogonal (orthonormal) set that forms a basis for a space $V$, then $S$ is called an orthogonal (orthonormal) basis for $V$. For simplicity, we will occasionally say "$\vec{u}_1, \ldots, \vec{u}_k$ are orthogonal (orthonormal)" which will be synonymous with "the indexed set $\{\vec{u}_1, \ldots, \vec{u}_k\}$ is orthogonal (orthonormal)".
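EXAMPLE 6.1 The set $S_1 = \left\{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix}\right\}$ is orthogonal since
\[ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix} = 0. \]
The set is not orthonormal, however, since $\left\|\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\right\| = \sqrt{6} \neq 1$ (we don't have to check the other lengths).

[Figure: Illustration for Example 6.1.]

¹⁵ This simplicity is one of the reasons why the Cartesian coordinate system is so appealing.

EXAMPLE 6.2 The set $S_2 = \left\{\begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}, \begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}\right\}$ is orthonormal since
• it is orthogonal: $\begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix} \cdot \begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{2} \end{bmatrix} = 0$ and
• the lengths of both vectors are $\sqrt{\frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4}} = 1$.

EXAMPLE 6.3 The set $S_3 = \left\{\begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \end{bmatrix}\right\}$ is not orthogonal. Even though
\[ \begin{bmatrix} 3 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 3 \end{bmatrix} = 0, \]
the other dot products do not equal zero:
\[ \begin{bmatrix} 3 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 10, \qquad \begin{bmatrix} -2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2. \]
(Since this set is not orthogonal, it cannot be orthonormal.)

EXAMPLE 6.4 The set $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ where $\vec{e}_i$ is the ith column of $I_n$ forms an orthonormal basis for $\mathbb{R}^n$.

The following result establishes a connection between orthogonality (or orthonormality) of vectors and their linear independence.

THEOREM 6.1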
(Orthogonality and Linear Independence)
→, − → − → n 1. If S = {− u 1 u2 , . . . , uk } is an orthogonal set of nonzero vectors in R , then S is linearly independent. → − → →, − n 2. If S = {− u 1 u2 , . . . , uk } is an orthonormal set of vectors in R , then it is linearly independent. P ROOF To prove part 1, let us consider the equation → → + ··· + c − → − u c1 − 1 k uk = 0 . Taking a dot product of each side and the ith vector of the set S and applying Theorem 1.2 we obtain →·− → → − − → − → − → c1 ( − u 1 ui ) + · · · + ci ( ui · ui ) + · · · + ck (uk · ui ) = 0. From the orthogonality of S, it follows that all the dot products in parentheses are zero except 2 → → → for − ui · − ui = − ui which cannot be zero (since the vectors were assumed to be nonzero).
256
Chapter 6 Orthogonality and Projections Consequently, ci = 0. Because this argument can be repeated for all i = 1, 2, . . . , k, we have c1 = · · · = ck = 0, making S linearly independent. Part 2 follows once we realize that an orthonormal set must not contain a zero vector (since the lengths of all its vectors equal 1).
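The definitions of orthogonal and orthonormal sets translate directly into a pairwise dot-product check. The following is a minimal sketch in plain Python; the helper names `dot`, `is_orthogonal`, and `is_orthonormal` are illustrative choices, not notation from the text. It tests the sets of Examples 6.1–6.3 using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal(vectors):
    """True if every pair of distinct vectors has dot product zero."""
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))

def is_orthonormal(vectors):
    """Orthogonal, and each vector has length 1 (equivalently, u . u == 1)."""
    return is_orthogonal(vectors) and all(dot(u, u) == 1 for u in vectors)

half = Fraction(1, 2)
S1 = [(1, 2, 1), (-2, 1, 0), (1, 2, -5)]                       # Example 6.1
S2 = [(half, half, half, half), (-half, -half, half, half)]   # Example 6.2
S3 = [(3, 2), (-2, 3), (2, 2)]                                 # Example 6.3

print(is_orthogonal(S1), is_orthonormal(S1))
print(is_orthogonal(S2), is_orthonormal(S2))
print(is_orthogonal(S3))
```

Comparing $\vec{u} \cdot \vec{u}$ with 1 avoids square roots, so the check stays exact for rational entries.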
Because of the above theorem, it follows that every orthogonal set of nonzero vectors in Rn (and every orthonormal set) forms a basis for some subspace of Rn . Here is one of the main reasons why such orthogonal (or orthonormal) bases are very desirable.
THEOREM 6.2
If $S = \{\vec{u}_1, \vec{u}_2, \dots, \vec{u}_k\}$ is an orthogonal basis for a subspace $V$ of $\mathbb{R}^n$, then any vector $\vec{v}$ in $V$ satisfies
\[
[\vec{v}]_S = \begin{bmatrix}
\dfrac{\vec{v} \cdot \vec{u}_1}{\vec{u}_1 \cdot \vec{u}_1} \\[2ex]
\dfrac{\vec{v} \cdot \vec{u}_2}{\vec{u}_2 \cdot \vec{u}_2} \\
\vdots \\
\dfrac{\vec{v} \cdot \vec{u}_k}{\vec{u}_k \cdot \vec{u}_k}
\end{bmatrix}.
\]
If $S$ is an orthonormal basis, then the formula simplifies to
\[
[\vec{v}]_S = \begin{bmatrix}
\vec{v} \cdot \vec{u}_1 \\
\vec{v} \cdot \vec{u}_2 \\
\vdots \\
\vec{v} \cdot \vec{u}_k
\end{bmatrix}.
\]

PROOF
Since $S = \{\vec{u}_1, \vec{u}_2, \dots, \vec{u}_k\}$ is a basis for $V$, by Theorem 4.18 any vector $\vec{v}$ in $V$ can be expressed as a unique linear combination of vectors in $S$:
\[
\vec{v} = c_1 \vec{u}_1 + \cdots + c_k \vec{u}_k.
\]
Taking a dot product of both sides with the $i$th vector in $S$ yields
\[
\vec{v} \cdot \vec{u}_i = c_1 (\vec{u}_1 \cdot \vec{u}_i) + \cdots + c_i (\vec{u}_i \cdot \vec{u}_i) + \cdots + c_k (\vec{u}_k \cdot \vec{u}_i),
\]
where every dot product $\vec{u}_j \cdot \vec{u}_i$ with $j \neq i$ equals 0; therefore,
\[
c_i = \frac{\vec{v} \cdot \vec{u}_i}{\vec{u}_i \cdot \vec{u}_i}.
\]
If $S$ is orthonormal, then $\vec{u}_i \cdot \vec{u}_i = 1$.
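EXAMPLE 6.5
The set $S_1 = \{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$ with
\[
\vec{u}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad
\vec{u}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \quad
\vec{u}_3 = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix}
\]
of Example 6.1 is orthogonal. Moreover, because it contains no zero vector, by Theorem 6.1 $S_1$ is linearly independent. Consequently (by Theorem 4.16), it is a basis for $\mathbb{R}^3$.

Ordinarily, finding a coordinate vector $[\vec{v}]_{S_1}$ for $\vec{v} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$ would require that we solve a corresponding linear system. However, we can use the shortcut offered by Theorem 6.2:
\[
[\vec{v}]_{S_1} = \begin{bmatrix}
\dfrac{\vec{v} \cdot \vec{u}_1}{\vec{u}_1 \cdot \vec{u}_1} \\[2ex]
\dfrac{\vec{v} \cdot \vec{u}_2}{\vec{u}_2 \cdot \vec{u}_2} \\[2ex]
\dfrac{\vec{v} \cdot \vec{u}_3}{\vec{u}_3 \cdot \vec{u}_3}
\end{bmatrix}
= \begin{bmatrix}
\dfrac{4 + 0 + 1}{1 + 4 + 1} \\[2ex]
\dfrac{-8 + 0 + 0}{4 + 1 + 0} \\[2ex]
\dfrac{4 + 0 - 5}{1 + 4 + 25}
\end{bmatrix}
= \begin{bmatrix} \tfrac{5}{6} \\[0.5ex] -\tfrac{8}{5} \\[0.5ex] -\tfrac{1}{30} \end{bmatrix}.
\]
(Check: $\tfrac{5}{6} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - \tfrac{8}{5} \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} - \tfrac{1}{30} \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$.)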
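The shortcut of Theorem 6.2 is a one-line computation per coordinate. Here is a small sketch (the function name `coordinates` is an illustrative assumption) that reproduces the coordinate vector of Example 6.5 with exact fractions and repeats the check by rebuilding $\vec{v}$ from its coordinates.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def coordinates(v, orthogonal_basis):
    """Coordinates of v with respect to an orthogonal basis (Theorem 6.2).

    Assumes v lies in the span of the basis; no linear system is solved.
    """
    return [Fraction(dot(v, u), dot(u, u)) for u in orthogonal_basis]

S1 = [(1, 2, 1), (-2, 1, 0), (1, 2, -5)]
v = (4, 0, 1)

c = coordinates(v, S1)
print(c)  # the entries are 5/6, -8/5, -1/30

# Check: the linear combination of the basis vectors rebuilds v.
rebuilt = [sum(ci * u[j] for ci, u in zip(c, S1)) for j in range(len(v))]
print(rebuilt == list(v))
```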
It's even easier to find a coordinate vector with respect to an orthonormal basis.

EXAMPLE 6.6
The orthonormal set $S_2 = \{\vec{u}_1, \vec{u}_2\}$ with
\[
\vec{u}_1 = \begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix}, \quad
\vec{u}_2 = \begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 1/2 \end{bmatrix}
\]
(see Example 6.2) is linearly independent (Theorem 6.1), so that it spans a two-dimensional subspace of $\mathbb{R}^4$.

Let us use the result in Theorem 6.2 to find a coordinate vector $[\vec{v}]_{S_2}$ for $\vec{v} = \begin{bmatrix} 3 \\ 3 \\ -1 \\ -1 \end{bmatrix}$:
\[
[\vec{v}]_{S_2} = \begin{bmatrix} \vec{v} \cdot \vec{u}_1 \\ \vec{v} \cdot \vec{u}_2 \end{bmatrix}
= \begin{bmatrix} \tfrac32 + \tfrac32 - \tfrac12 - \tfrac12 \\[0.5ex] -\tfrac32 - \tfrac32 - \tfrac12 - \tfrac12 \end{bmatrix}
= \begin{bmatrix} 2 \\ -4 \end{bmatrix}.
\]
By checking that
\[
2 \begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix} - 4 \begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ -1 \\ -1 \end{bmatrix},
\]
we conclude that the vector $\vec{v}$ is in span $S_2$.

The final step above was crucial. To see why, consider a different vector, $\vec{w} = \begin{bmatrix} 1 \\ 3 \\ 3 \\ 1 \end{bmatrix}$; repeating the same procedure would yield $\begin{bmatrix} \vec{w} \cdot \vec{u}_1 \\ \vec{w} \cdot \vec{u}_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$, which turns out not to be equal to $[\vec{w}]_{S_2}$:
\[
4 \begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix} + 0 \begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix} \neq \vec{w}.
\]
Apparently, $\vec{w}$ is not in span $S_2$; therefore, it makes no sense to ask for a coordinate vector of $\vec{w}$ with respect to the basis $S_2$.
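The "final step" of Example 6.6 can be phrased as a membership test: compute the dot products with the orthonormal vectors, rebuild the linear combination, and compare it with the original vector. A minimal sketch follows, with the helper name `in_span_orthonormal` chosen for illustration only.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def in_span_orthonormal(v, orthonormal_set):
    """Return (candidate coefficients, whether they actually rebuild v)."""
    coeffs = [dot(v, w) for w in orthonormal_set]
    rebuilt = [sum(c * w[j] for c, w in zip(coeffs, orthonormal_set))
               for j in range(len(v))]
    return coeffs, rebuilt == list(v)

half = Fraction(1, 2)
S2 = [(half, half, half, half), (-half, -half, half, half)]

print(in_span_orthonormal((3, 3, -1, -1), S2))  # coefficients 2 and -4; v is in span S2
print(in_span_orthonormal((1, 3, 3, 1), S2))    # coefficients 4 and 0; w is not in span S2
```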
Orthogonal matrices

Consider an $m \times n$ matrix $A$ partitioned in the column-by-column fashion
\[
A = \begin{bmatrix} | & | & & | \\ \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \\ | & | & & | \end{bmatrix}.
\]
The product $A^T A$ is the following $n \times n$ matrix:
\[
A^T A = \begin{bmatrix} \vec{u}_1^{\,T} \\ \vec{u}_2^{\,T} \\ \vdots \\ \vec{u}_n^{\,T} \end{bmatrix}
\begin{bmatrix} | & | & & | \\ \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \\ | & | & & | \end{bmatrix}
= \begin{bmatrix}
\vec{u}_1 \cdot \vec{u}_1 & \vec{u}_1 \cdot \vec{u}_2 & \cdots & \vec{u}_1 \cdot \vec{u}_n \\
\vec{u}_2 \cdot \vec{u}_1 & \vec{u}_2 \cdot \vec{u}_2 & \cdots & \vec{u}_2 \cdot \vec{u}_n \\
\vdots & \vdots & \ddots & \vdots \\
\vec{u}_n \cdot \vec{u}_1 & \vec{u}_n \cdot \vec{u}_2 & \cdots & \vec{u}_n \cdot \vec{u}_n
\end{bmatrix}.
\]
In light of this representation, the following theorem is obvious.

THEOREM 6.3
The columns of $A$ are orthonormal if and only if $A^T A = I_n$.

EXAMPLE 6.7
$A = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & -1/2 \\ 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$ has orthonormal columns (Example 6.2). We have
\[
A^T A = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ -1/2 & -1/2 & 1/2 & 1/2 \end{bmatrix}
\begin{bmatrix} 1/2 & -1/2 \\ 1/2 & -1/2 \\ 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
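Since the $(i, j)$ entry of $A^T A$ is just $\vec{u}_i \cdot \vec{u}_j$, Theorem 6.3 can be checked without forming a transpose explicitly. The sketch below (the name `gram` is an assumption made for illustration) builds the matrix of pairwise dot products for the columns of Example 6.7.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def gram(columns):
    """Matrix of pairwise dot products of the columns, i.e. A^T A."""
    return [[dot(u, v) for v in columns] for u in columns]

half = Fraction(1, 2)
columns = [(half, half, half, half), (-half, -half, half, half)]

AtA = gram(columns)
identity = [[1 if i == j else 0 for j in range(len(columns))]
            for i in range(len(columns))]
print(AtA == identity)  # columns are orthonormal exactly when this is True
```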
The result below reveals important properties of matrices of this type when they are multiplied by vectors.

THEOREM 6.4
Let $A$ be an $m \times n$ matrix with orthonormal columns.
1. $(A\vec{u}) \cdot (A\vec{v}) = \vec{u} \cdot \vec{v}$ for all $n$-vectors $\vec{u}$ and $\vec{v}$.
2. $\|A\vec{u}\| = \|\vec{u}\|$ for all $n$-vectors $\vec{u}$.

PROOF
1. $(A\vec{u}) \cdot (A\vec{v}) = (A\vec{u})^T (A\vec{v}) = \vec{u}^{\,T} A^T A \vec{v} = \vec{u}^{\,T} I_n \vec{v} = \vec{u}^{\,T} \vec{v} = \vec{u} \cdot \vec{v}$.
2. Follows from the first part, as $\|A\vec{u}\|^2 = (A\vec{u}) \cdot (A\vec{u}) = \vec{u} \cdot \vec{u} = \|\vec{u}\|^2$.
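Clearly, for an $m \times n$ matrix with $n$ orthonormal columns, it is required that $m \geq n$. An important special case arises when $m = n$. By Theorem 2.12, if $A$ is an $n \times n$ matrix, then $A^T A = I_n$ implies $A A^T = I_n$, so the two statements yield $A^T = A^{-1}$.

DEFINITION (Orthogonal Matrix)
An $n \times n$ matrix $A$ is said to be orthogonal if $A^T = A^{-1}$.

Make sure to keep in mind that in an orthogonal matrix, column vectors must be orthonormal (rather than just orthogonal).

EXAMPLE 6.8
The $2 \times 2$ matrix $B = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{-1}{\sqrt2} \\[0.5ex] \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{bmatrix}$ has orthonormal columns so that $B$ is an orthogonal matrix:
\[
B^T B = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\[0.5ex] \frac{-1}{\sqrt2} & \frac{1}{\sqrt2} \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt2} & \frac{-1}{\sqrt2} \\[0.5ex] \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]

EXERCISES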
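In Exercises 1–4, decide whether the given set is (i) orthonormal, (ii) orthogonal, but not orthonormal, or (iii) neither.

1. a. $\left\{ \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \end{bmatrix} \right\}$; b. $\left\{ \begin{bmatrix} -1/3 \\ 2/3 \\ -2/3 \end{bmatrix}, \begin{bmatrix} 2/3 \\ -1/3 \\ -2/3 \end{bmatrix} \right\}$; c. $\left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}$.

2. a. $\left\{ \begin{bmatrix} \frac{1}{\sqrt2} \\[0.5ex] \frac{1}{\sqrt2} \end{bmatrix}, \begin{bmatrix} \frac{-1}{\sqrt2} \\[0.5ex] \frac{1}{\sqrt2} \end{bmatrix} \right\}$; b. $\left\{ \begin{bmatrix} -2 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}$; c. $\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} \right\}$.

3. a. $\left\{ \begin{bmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{bmatrix}, \begin{bmatrix} -1/2 \\ 1/2 \\ -1/2 \\ 1/2 \end{bmatrix} \right\}$; b. $\left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}$.

4. a. $\left\{ \begin{bmatrix} 1 \\ -1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}$; b. $\left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} \right\}$.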
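5. Given the orthogonal set $S = \left\{ \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 6 \end{bmatrix} \right\}$, find the coordinate vector $[\vec{v}]_S$ for $\vec{v} = \begin{bmatrix} 0 \\ -20 \end{bmatrix}$. Check your answer.

6. Given the orthogonal set $T = \left\{ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} \right\}$, find the coordinate vector $[\vec{u}]_T$ for $\vec{u} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$. Check your answer.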
2 3 2 3 −1 3
⎢ 7. Given the orthonormal set S = {⎣ ⎡
⎤ ⎡ ⎥ ⎢ ⎦,⎣
⎤ 3 ⎢ ⎥ → → w = ⎣ −9 ⎦ . Check your answer. [− w ]S for − 12 ⎤ ⎡ ⎡ ⎢ ⎢ 8. Given the orthonormal set T = {⎢ ⎢ ⎣ ⎡ ⎢ ⎢ − → → − nate vector [ u ]T for u = ⎢ ⎢ ⎣
4 0 −2 −4
⎤
1 2 1 2 1 2 1 2
⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
2 3 −1 3 2 3
−1 2 −1 2 1 2 1 2
⎤ ⎡
−1 3 2 3 2 3
⎥ ⎢ ⎦,⎣
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
⎤ ⎥ ⎦}, find the coordinate vector
⎤ ⎡
−1 2 1 2 −1 2 1 2
⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
10. a.
T/F?
1 −1 1 1
⎢ ⎢ ; b. ⎢ ⎢ ⎣
−1 2 1 2 −1 2 −1 2
1 2 1 2 1 2 −1 2
⎤ ⎥ ⎥ ⎥}, find the coordi⎥ ⎦
⎥ ⎥ ⎥ . Check your answer. ⎥ ⎦
In Exercises 9–10, decide whether the matrix is orthogonal. ⎡ ⎡ ⎤ −1 √ √1 0 2 1 −2 2 ⎢ 2 0 1 ⎢ ⎥ 9. a. ; b. ⎣ 1 2 2 ⎦ ; c. ⎢ 0 −1 0 ⎣ −1 0 √1 √1 2 −2 1 0 2 2 ⎡
−1 2 1 2 1 2 −1 2
−1 2 −1 2 1 2 −1 2
−1 2 1 2 1 2 1 2
⎤
⎡
⎢ ⎥ ⎢ ⎥ ⎥ ; c. ⎢ ⎢ ⎥ ⎣ ⎦
1 0 0 0
0 0 0 1
⎤ ⎥ ⎥. ⎦
0 0 0 0
0 0 1 0
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
In Exercises 11–15, decide whether each statement is true or false. Justify your answer.

11. If a set $S = \{\vec{u}, \vec{v}, \vec{w}\}$ of vectors in $\mathbb{R}^4$ is orthogonal, then so is $T = \{\vec{u}, \vec{v}, \vec{w}, \vec{0}\}$, where $\vec{0}$ is the zero 4-vector.
12. If a set $S$ of five vectors in $\mathbb{R}^5$ is orthogonal, then $S$ forms a basis for $\mathbb{R}^5$.
13. If $T$ is an orthonormal set of vectors in $\mathbb{R}^7$, then $T$ is a basis for $\mathbb{R}^7$.
14. If $A$ is an orthogonal matrix, then $A^T$ is also orthogonal.
15. If $A$ is an orthogonal matrix, then $A^{-1}$ is also orthogonal.

16. * Prove that if $A$ is an orthogonal matrix, then $\det A$ equals either $-1$ or $1$.
17. * Let $A$ be an orthogonal $n \times n$ matrix and let $\{\vec{u}, \vec{v}\}$ be an orthonormal set of vectors in $\mathbb{R}^n$. Show that $\{A\vec{u}, A\vec{v}\}$ is also orthonormal.
18. * Prove that if $A$ and $B$ are $n \times n$ orthogonal matrices, then $AB$ is also orthogonal.
19. * Let $A$ be an $m \times n$ matrix. If $\vec{u}$ is an $m$-vector orthogonal to each column of $A$, show that $A^T \vec{u}$ equals the zero $n$-vector.
20. * Show that if $A = [\, \vec{u}_1 \,|\, \cdots \,|\, \vec{u}_n \,]$ is an orthogonal matrix and $S = \{\vec{u}_1, \dots, \vec{u}_n\}$, then for any $n$-vector $\vec{x}$ we have $[\vec{x}]_S = A^T \vec{x}$.
21. * Let $A$ be an $m \times n$ matrix, let $U$ be an orthogonal $m \times m$ matrix, and let $V$ be an orthogonal $n \times n$ matrix. Show that the matrix of the linear transformation $F : \mathbb{R}^n \to \mathbb{R}^m$, $F(\vec{x}) = A\vec{x}$, with respect to the bases made up of the columns of $V$ and columns of $U$ is $U^T A V$.
22. * Suppose $\{\vec{u}_1, \dots, \vec{u}_k\}$ and $\{\vec{v}_1, \dots, \vec{v}_l\}$ are two linearly independent indexed sets of vectors in $\mathbb{R}^n$ such that $\vec{u}_i \cdot \vec{v}_j = 0$ for all $i$ and $j$. Show that the indexed set $\{\vec{u}_1, \dots, \vec{u}_k, \vec{v}_1, \dots, \vec{v}_l\}$ is linearly independent.
Hint: Consider the first nonzero coefficient in the linear combination $c_1 \vec{u}_1 + \cdots + c_k \vec{u}_k + d_1 \vec{v}_1 + \cdots + d_l \vec{v}_l = \vec{0}$, and show that it cannot be any of the $c_i$'s (as that would contradict the linear independence of the first set) or any of the $d_j$'s (which would contradict the linear independence of the second set or the orthogonality $\vec{u}_i \cdot \vec{v}_j = 0$).
6.2 Orthogonal Projections and Orthogonal Complements

We have considered a number of special cases of orthogonal projections in Section 1.4:
• projections of vectors in $\mathbb{R}^2$ onto the coordinate axes (Example 1.19) and onto any line passing through the origin (Exercise 36 on p. 55) and
• projections of vectors in $\mathbb{R}^3$ onto the coordinate axes or coordinate planes (p. 42).

Our main objective in this section will be to provide a unified treatment of all projections in $\mathbb{R}^2$ and $\mathbb{R}^3$, then extend it to the general $\mathbb{R}^n$ space.
Orthogonal projection onto a line in R2 or R3
Let us assume that we have a given vector $\vec{u}$ and a given nonzero vector $\vec{v}$, both in $\mathbb{R}^n$ with $n = 2$ or $3$. We seek to construct an $n$-vector $\vec{p}$ that becomes an orthogonal projection of $\vec{u}$ onto the line extending along $\vec{v}$ passing through the origin, i.e., orthogonal projection of $\vec{u}$ onto the span of $\vec{v}$, which we denote as
\[
\vec{p} = \operatorname{proj}_{\operatorname{span}\{\vec{v}\}} \vec{u}.
\]
This vector has to satisfy the following two requirements:
• $\vec{p}$ must be parallel to $\vec{v}$ (the vector we are projecting onto), i.e., there exists a scalar $k$ such that
\[
\vec{p} = k\vec{v}, \tag{76}
\]
and
• $\vec{n} = \vec{u} - \vec{p}$ must be orthogonal to $\vec{v}$, i.e.,
\[
(\vec{u} - \vec{p}) \cdot \vec{v} = 0. \tag{77}
\]

[Figure: the vector u, its projection p along v, and n = u − p.]

Substituting (76) into (77) and using properties of dot products from Theorem 1.2, we obtain
\[
(\vec{u} - k\vec{v}) \cdot \vec{v} = 0,
\]
\[
\vec{u} \cdot \vec{v} - k (\vec{v} \cdot \vec{v}) = 0,
\]
which can be solved for $k$ (since $\vec{v} \neq \vec{0}$):
\[
k = \frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}.
\]
The resulting formula for an orthogonal projection of $\vec{u}$ onto the span of a nonzero vector $\vec{v}$ is
\[
\operatorname{proj}_{\operatorname{span}\{\vec{v}\}} \vec{u} = \frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}\, \vec{v}. \tag{78}
\]
THEOREM 6.5
If $\vec{u}$ and $\vec{v}$ are two nonzero vectors in $\mathbb{R}^n$ with $n = 2$ or $3$, then the angle $\alpha$ between $\vec{u}$ and $\vec{v}$ satisfies
\[
\cos \alpha = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|}.
\]

PROOF
Let us consider the five possible cases:
1. If $\vec{u}$ and $\vec{v}$ are in the same direction, i.e., $\vec{u} = c\vec{v}$ for some $c > 0$, then
\[
\text{RHS} = \frac{(c\vec{v}) \cdot \vec{v}}{\|c\vec{v}\| \, \|\vec{v}\|} = 1 = \cos 0 = \text{LHS}.
\]
2. If $\vec{u}$ and $\vec{v}$ form an acute angle $\alpha$ (i.e., $0 < \alpha < \frac{\pi}{2}$), then the scalar $k$ in (76) must be positive, as the vectors $\vec{p} = k\vec{v}$ and $\vec{v}$ point in the same direction. In this case, we have
\[
\text{LHS} = \cos \alpha = \frac{\|\vec{p}\|}{\|\vec{u}\|} = \frac{\|k\vec{v}\|}{\|\vec{u}\|}
= \frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}} \cdot \frac{\|\vec{v}\|}{\|\vec{u}\|}
= \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|} = \text{RHS}.
\]
3. If $\vec{u}$ and $\vec{v}$ are perpendicular, i.e., $\vec{u} \cdot \vec{v} = 0$, then
\[
\text{RHS} = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|} = 0 = \cos \frac{\pi}{2} = \text{LHS}.
\]
4. If the angle $\alpha$ between $\vec{u}$ and $\vec{v}$ is obtuse ($\frac{\pi}{2} < \alpha < \pi$), then $\vec{p} = k\vec{v}$ and $\vec{v}$ are in opposite directions, and $k < 0$ in (76). Let
\[
\beta = \pi - \alpha
\]
denote the acute angle that is supplementary to $\alpha$. This is the angle between the vectors $\vec{u}$ and $\vec{p}$ (rather than $\vec{v}$!), so we can refer to the case already established in part 2 of this proof:
\[
\cos \beta = \frac{\vec{u} \cdot \vec{p}}{\|\vec{u}\| \, \|\vec{p}\|}.
\]
[Figure: obtuse angle α between u and v, with the projection p and the supplementary angle β.]
Using the trigonometric identity $\cos \alpha = -\cos(\pi - \alpha) = -\cos \beta$ and the fact that $-k > 0$, we have
\[
\text{LHS} = -\cos \beta = \frac{-\vec{u} \cdot \vec{p}}{\|\vec{u}\| \, \|\vec{p}\|}
= \frac{-k\,(\vec{u} \cdot \vec{v})}{|k| \, \|\vec{u}\| \, \|\vec{v}\|}
= \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|} = \text{RHS}.
\]
5. Finally, if $\vec{u}$ and $\vec{v}$ are in opposite directions, i.e., $\vec{u} = -c\vec{v}$ for some $c > 0$, then
\[
\text{RHS} = \frac{(-c\vec{v}) \cdot \vec{v}}{\|{-c}\vec{v}\| \, \|\vec{v}\|} = -1 = \cos \pi = \text{LHS}.
\]
The following table summarizes the relationship between the angles and their cosines.

cos α = 1         | α = 0°: vectors are in the same direction
0 < cos α < 1     | 0° < α < 90°: acute angle
cos α = 0         | α = 90°: right angle
−1 < cos α < 0    | 90° < α < 180°: obtuse angle
cos α = −1        | α = 180°: vectors are in the opposite directions

[Figure: graph of y = cos α for α between 0° and 180°.]

[Figure: the vectors u, v, p, and n of Example 6.9.]
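EXAMPLE 6.9
Let us consider the orthogonal projection of $\vec{u} = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$ onto the line spanned by the vector $\vec{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. Equation (78) yields
\[
\operatorname{proj}_{\operatorname{span}\{\vec{v}\}} \vec{u}
= \frac{\begin{bmatrix} 2 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \end{bmatrix}}{\begin{bmatrix} 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \end{bmatrix}} \begin{bmatrix} 2 \\ 1 \end{bmatrix}
= \frac{10}{5} \begin{bmatrix} 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 2 \end{bmatrix}.
\]
Check that $\vec{n} = \vec{u} - \vec{p} = \begin{bmatrix} 2 \\ 6 \end{bmatrix} - \begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \end{bmatrix}$ is orthogonal to $\vec{v}$.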
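Formula (78) amounts to computing one scalar $k$ and scaling $\vec{v}$ by it. A minimal sketch, with the function name `project_onto_line` chosen here for illustration, reproduces Example 6.9 and the orthogonality check on $\vec{n} = \vec{u} - \vec{p}$.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_line(u, v):
    """Orthogonal projection of u onto span{v}, formula (78); v must be nonzero."""
    k = Fraction(dot(u, v), dot(v, v))
    return [k * b for b in v]

u, v = (2, 6), (2, 1)
p = project_onto_line(u, v)
n = [a - b for a, b in zip(u, p)]

print(p)          # components 4 and 2
print(n)          # components -2 and 4
print(dot(n, v))  # 0, so n is orthogonal to v
```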
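EXAMPLE 6.10
The cosine of the angle $\alpha$ between $\vec{u} = \begin{bmatrix} \sqrt{6} \\ \sqrt{2} \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -1 \\ -\sqrt{3} \end{bmatrix}$ is
\[
\cos \alpha = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|}
= \frac{-\sqrt{6} - \sqrt{6}}{\sqrt{6 + 2}\,\sqrt{1 + 3}}
= \frac{-2\sqrt{6}}{4\sqrt{2}}
= -\frac{\sqrt{3}}{2},
\]
which implies $\alpha = 150^\circ$.

EXAMPLE 6.11
The cosine of the angle $\alpha$ between $\vec{u} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ is $\cos \alpha = \dfrac{1}{3\sqrt{3}}$. This corresponds to $\alpha \approx 78.9^\circ$.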
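The angle computations of Examples 6.10 and 6.11 can be reproduced in floating point as follows. This is only a sketch: the function name `angle_degrees` is an illustrative assumption, and the clamping step is a common numerical precaution (rounding can push the quotient slightly outside $[-1, 1]$), not something the theorem requires.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def angle_degrees(u, v):
    """Angle between nonzero vectors u and v, in degrees (Theorem 6.5)."""
    c = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    c = max(-1.0, min(1.0, c))  # guard against rounding error
    return math.degrees(math.acos(c))

# Example 6.10: approximately 150 degrees
print(angle_degrees((math.sqrt(6), math.sqrt(2)), (-1, -math.sqrt(3))))

# Example 6.11: approximately 78.9 degrees
print(angle_degrees((2, 1, 2), (1, 1, -1)))
```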
Angle between vectors in Rn
The main difference between the discussions of $\mathbb{R}^n$ for $n \leq 3$ vs. $n > 3$ (and the reason for having a separate subsection devoted to the latter) is that the notion of an "angle" between vectors does not automatically extend to higher-dimensional spaces. Such an extension will be offered here; however, we first need to build some theory to support the subsequent developments.

THEOREM 6.6 (Cauchy-Schwarz Inequality)
For all $n$-vectors $\vec{u}$ and $\vec{v}$
\[
|\vec{u} \cdot \vec{v}| \leq \|\vec{u}\| \, \|\vec{v}\|.
\]

PROOF
If $\vec{v} = \vec{0}$, the inequality obviously holds (and becomes an equality: $0 = 0$).
If $\vec{v} \neq \vec{0}$, we apply Theorem 1.2 to obtain
\begin{align*}
0 &\leq \left\| \vec{u} - \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\, \vec{v} \right\|^2 \\
&= \left( \vec{u} - \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\, \vec{v} \right) \cdot \left( \vec{u} - \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\, \vec{v} \right) \\
&= \vec{u} \cdot \vec{u} - 2\, \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\, (\vec{u} \cdot \vec{v}) + \frac{(\vec{u} \cdot \vec{v})^2}{\|\vec{v}\|^4}\, (\vec{v} \cdot \vec{v}) \\
&= \|\vec{u}\|^2 - \frac{2 (\vec{u} \cdot \vec{v})^2}{\|\vec{v}\|^2} + \frac{(\vec{u} \cdot \vec{v})^2}{\|\vec{v}\|^2} \\
&= \|\vec{u}\|^2 - \frac{(\vec{u} \cdot \vec{v})^2}{\|\vec{v}\|^2}.
\end{align*}
Multiplying both sides by $\|\vec{v}\|^2$ yields
\[
0 \leq \|\vec{u}\|^2 \|\vec{v}\|^2 - (\vec{u} \cdot \vec{v})^2,
\]
\[
(\vec{u} \cdot \vec{v})^2 \leq \|\vec{u}\|^2 \|\vec{v}\|^2.
\]
Taking a square root on both sides leads to the desired inequality.

THEOREM 6.7 (Triangle Inequality)
For all $n$-vectors $\vec{u}$ and $\vec{v}$
\[
\|\vec{u} + \vec{v}\| \leq \|\vec{u}\| + \|\vec{v}\|.
\]

PROOF
Applying Theorems 1.2 and 6.6 yields
\begin{align*}
\|\vec{u} + \vec{v}\|^2 &= (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}) \\
&= \vec{u} \cdot \vec{u} + 2 (\vec{u} \cdot \vec{v}) + \vec{v} \cdot \vec{v} \\
&\leq \vec{u} \cdot \vec{u} + 2 |\vec{u} \cdot \vec{v}| + \vec{v} \cdot \vec{v} \\
&\leq \|\vec{u}\|^2 + 2 \|\vec{u}\| \, \|\vec{v}\| + \|\vec{v}\|^2 \\
&= (\|\vec{u}\| + \|\vec{v}\|)^2.
\end{align*}
Since both $\|\vec{u} + \vec{v}\|$ and $\|\vec{u}\| + \|\vec{v}\|$ are nonnegative, taking a square root on both sides gives the desired inequality.

We are now tooled up to properly define angles between vectors in higher-dimensional $n$-spaces.

DEFINITION
Let $\vec{u}$ and $\vec{v}$ be nonzero $n$-vectors with $n > 3$. The angle $\alpha$ between $\vec{u}$ and $\vec{v}$ is the real number in the interval $[0, \pi]$ that satisfies
\[
\cos \alpha = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|}. \tag{79}
\]

Note that Theorem 6.6 ensures this definition makes sense in that $-1 \leq \dfrac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|} \leq 1$, resulting in legitimate values for $\cos \alpha$.
At this point you are hopefully convinced that technically we can now calculate an angle between two vectors in, say, R6 . It is quite another story when it comes to the meaning of this calculation – what good is it to calculate such an angle if the vectors themselves cannot even be “seen”? Hopefully, the following example will help address this (very legitimate) concern.
EXAMPLE 6.12
The cosine of the angle $\alpha$ between $\vec{u} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ is
\[
\cos \alpha = \frac{2}{\sqrt{3}\,\sqrt{2}} = \frac{\sqrt{2}}{\sqrt{3}} \approx 0.816.
\]
Although this value of cosine, as well as the corresponding value $\alpha \approx 35.3^\circ$, cannot be readily interpreted in the physical, three-dimensional world, it does have other applications. Consider the following "dictionary" of six words
\[
\begin{bmatrix} \text{apple} \\ \text{banana} \\ \text{celery} \\ \text{dill} \\ \text{eggplant} \\ \text{fig} \end{bmatrix}.
\]
The vector $\vec{u}$ can be viewed as representing a query, containing the words "banana, dill, eggplant" (we set a component equal to 1 if the query or the document contains the word; otherwise, we set it equal to 0).
This query can be compared to the document represented by $\vec{v}$ (with words "banana, eggplant") by calculating the cosine of the angle between the two. The larger the cosine (i.e., the smaller the magnitude of the angle), the more similar the query and the document are.

Let us take another document, containing all six words, corresponding to the vector $\vec{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$. Comparing the query ($\vec{u}$) to this vector we obtain
\[
\cos \beta = \frac{3}{\sqrt{3}\,\sqrt{6}} = \frac{1}{\sqrt{2}} \approx 0.707.
\]
Since $\cos \beta < \cos \alpha$, which corresponds to the angle $\beta = 45^\circ$ exceeding $\alpha$, we conclude that the first document (represented by $\vec{v}$) matches the query ($\vec{u}$) more closely than the second document ($\vec{w}$) does.
In spite of the fact that the second document contains all of the words in the query, it should not be judged a closer match (if it were allowed to happen, then every time we looked for a document in a search engine, we would keep bringing up encyclopedias and dictionaries, because of their completeness, rather than their relevance to our search!).
The cosine formula accomplishes this by dividing the "raw" match count (the dot product in the numerator) by the lengths of the two vectors (so that longer documents are penalized).
Document similarity models^16 used in practice tend to be more elaborate, but a number of them are related to the idea described above.
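The query-versus-document comparison of Example 6.12 can be sketched in a few lines. The names `DICTIONARY`, `to_vector`, and `cosine` are illustrative assumptions; the 0/1 encoding is the one described in the example, not a model used by any particular search engine.

```python
import math

DICTIONARY = ["apple", "banana", "celery", "dill", "eggplant", "fig"]

def to_vector(words):
    """0/1 vector: component is 1 exactly when the dictionary word occurs."""
    return [1 if term in words else 0 for term in DICTIONARY]

def cosine(u, v):
    """Cosine of the angle between nonzero vectors u and v, formula (79)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

query = to_vector({"banana", "dill", "eggplant"})
doc1 = to_vector({"banana", "eggplant"})
doc2 = to_vector(set(DICTIONARY))

print(round(cosine(query, doc1), 3))  # about 0.816
print(round(cosine(query, doc2), 3))  # about 0.707
```

The larger cosine for the first document reflects the conclusion of the example: the shorter, more focused document is the closer match.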
Orthogonal complements
Having extended the notion of the angle from $\mathbb{R}^2$ and $\mathbb{R}^3$ to $\mathbb{R}^n$, we are about to similarly generalize the concept of a projection beyond those discussed at the beginning of the section, in which we projected onto a line in $\mathbb{R}^2$ or in $\mathbb{R}^3$. However, we will first need to develop a concept of the orthogonal complement of a vector space.

DEFINITION (Orthogonal Complement)
If $V$ is a subspace of $\mathbb{R}^n$, then the set $V^\perp$ of all $n$-vectors orthogonal to every $n$-vector in $V$ is called the orthogonal complement of $V$.
The notation $V^\perp$ is read "V perp".

Let $\vec{v}_1, \dots, \vec{v}_k$ be a basis for the subspace $V$. Then the $k \times n$ matrix
\[
A = \begin{bmatrix} \vec{v}_1^{\,T} \\ \vdots \\ \vec{v}_k^{\,T} \end{bmatrix}
\]
has
• (Row space of $A$) $= V$ and
• (Null space of $A$) $= V^\perp$
since any solution vector $\vec{x}$ of the homogeneous system $A\vec{x} = \vec{0}$ must be orthogonal to each row of $A$. By Theorem 4.25, this leads us to the following result.

16 One of the early articles on the subject is the paper by Gerard Salton and Christopher Buckley, "Term-Weighting Approaches in Automatic Text Retrieval", Inf. Process. Manage. 24, 3 (1988), 513–524, but a number of more recent sources are also available.
THEOREM 6.8
If $V$ is a subspace of $\mathbb{R}^n$, then $V^\perp$ is also a subspace of $\mathbb{R}^n$ and
\[
\dim V^\perp = n - \dim V.
\]

In addition to
\[
(\text{Row space of } A)^\perp = (\text{Null space of } A),
\]
we could consider the transpose of $A$, along with the homogeneous system $A^T \vec{y} = \vec{0}$, which leads us to
\[
(\text{Column space of } A)^\perp = (\text{Null space of } A^T).
\]

EXAMPLE 6.13
In order to find a basis for the orthogonal complement of
\[
V = \operatorname{span}\left\{
\begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \\ -2 \end{bmatrix},
\begin{bmatrix} -1 \\ 3 \\ -1 \\ 1 \\ 2 \end{bmatrix},
\begin{bmatrix} 0 \\ 3 \\ 1 \\ 1 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ 3 \\ 3 \\ 1 \\ 0 \end{bmatrix}
\right\}
\]
in $\mathbb{R}^5$, we construct the matrix whose rows are populated by the given vectors:
\[
A = \begin{bmatrix}
2 & 0 & 4 & 0 & -2 \\
-1 & 3 & -1 & 1 & 2 \\
0 & 3 & 1 & 1 & 1 \\
1 & 3 & 3 & 1 & 0
\end{bmatrix}
\quad \text{and its r.r.e.f.} \quad
\begin{bmatrix}
1 & 0 & 2 & 0 & -1 \\
0 & 1 & \tfrac13 & \tfrac13 & \tfrac13 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]
The null space of $A$ is the orthogonal complement we are seeking. It contains vectors in the form
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} -2x_3 + x_5 \\ -\tfrac13 x_3 - \tfrac13 x_4 - \tfrac13 x_5 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= x_3 \begin{bmatrix} -2 \\ -\tfrac13 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 0 \\ -\tfrac13 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 1 \\ -\tfrac13 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
\]
The vectors $\begin{bmatrix} -2 \\ -\tfrac13 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ -\tfrac13 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ -\tfrac13 \\ 0 \\ 0 \\ 1 \end{bmatrix}$ form a basis for the orthogonal complement $V^\perp$ of the space $V$.
Note that the dimension of $V$ is 2 (in spite of being spanned by four vectors).
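The procedure of Example 6.13 (place the spanning vectors in the rows of a matrix, row reduce, read off a basis of the null space) can be sketched with exact fractions. The helpers `rref` and `null_space_basis` below are illustrative implementations written for this example, not routines from the text or from a library.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form; returns (matrix, list of pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]   # move the pivot row into place
        scale = m[r][c]
        m[r] = [x / scale for x in m[r]]  # make the leading entry 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:   # clear the rest of the column
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One basis vector of the null space per free (non-pivot) column."""
    m, pivots = rref(rows)
    n = len(m[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for row, p in zip(m, pivots):     # solve for the leading variables
            x[p] = -row[free]
        basis.append(x)
    return basis

A = [[2, 0, 4, 0, -2],
     [-1, 3, -1, 1, 2],
     [0, 3, 1, 1, 1],
     [1, 3, 3, 1, 0]]

for vector in null_space_basis(A):
    print([str(x) for x in vector])
```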
Orthogonal projection onto a subspace of Rn

THEOREM 6.9
If $V$ is a subspace of $\mathbb{R}^n$, then for every vector $\vec{u}$ in $\mathbb{R}^n$ there exists a unique vector $\vec{p}$ in $V$ such that
\[
\vec{u} = \vec{p} + \vec{n} \tag{80}
\]
for some vector $\vec{n}$ in $V^\perp$.
The vector $\vec{p}$ is called the orthogonal projection of $\vec{u}$ onto $V$ and is denoted $\vec{p} = \operatorname{proj}_V \vec{u}$.

PROOF
If $V = \{\vec{0}\}$, then $V^\perp = \mathbb{R}^n$, so for every $\vec{u}$ in $\mathbb{R}^n$ the equation (80) is satisfied by the unique vectors $\vec{p} = \vec{0}$ and $\vec{n} = \vec{u}$.
At the other extreme, if $V = \mathbb{R}^n$, then $V^\perp = \{\vec{0}\}$, so once again, for every $\vec{u}$ in $\mathbb{R}^n$ the equation (80) is satisfied by the unique vectors $\vec{p} = \vec{u}$ and $\vec{n} = \vec{0}$.

For all remaining cases (i.e., when $0 < \dim V = k < n$), there exists a basis $\vec{v}_1, \dots, \vec{v}_k$ for $V$. Also, by Theorem 6.8, there exists a basis $\vec{v}_{k+1}, \dots, \vec{v}_n$ for $V^\perp$.
The vectors $\vec{v}_1, \dots, \vec{v}_k, \vec{v}_{k+1}, \dots, \vec{v}_n$ form a basis for $\mathbb{R}^n$ (by Exercise 22 on p. 261 and by part b of Theorem 4.16); thus it follows from Theorem 4.18 that for any vector $\vec{u}$ in $\mathbb{R}^n$, there exist unique real numbers $c_1, \dots, c_n$ such that
\[
\vec{u} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k + c_{k+1} \vec{v}_{k+1} + \cdots + c_n \vec{v}_n, \tag{81}
\]
allowing us to construct the vectors
\[
\vec{p} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k \tag{82}
\]
in $V$ and
\[
\vec{n} = c_{k+1} \vec{v}_{k+1} + \cdots + c_n \vec{v}_n
\]
in $V^\perp$ that satisfy equation (80). The vector $\vec{p}$ is uniquely determined: if $\vec{u} = \vec{q} + \vec{m}$ with $\vec{q} = d_1 \vec{v}_1 + \cdots + d_k \vec{v}_k$ and $\vec{m} = d_{k+1} \vec{v}_{k+1} + \cdots + d_n \vec{v}_n$, then $\vec{q} \neq \vec{p}$ would contradict Theorem 4.18.
EXAMPLE 6.14
Let us find the orthogonal projection of the vector $\vec{u} = \begin{bmatrix} -5 \\ 5 \\ -5 \\ 3 \\ 7 \end{bmatrix}$ onto the subspace $V$ of $\mathbb{R}^5$ discussed in Example 6.13.

It follows from the calculations performed in Example 6.13 that $\dim V = 2$, so that any two linearly independent vectors in $V$ form a basis for $V$. In this case, we can use any two of the four given vectors since none of them is a scalar multiple of another (see Theorem 4.8). For instance, let us take the first two given vectors $\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \\ -2 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} -1 \\ 3 \\ -1 \\ 1 \\ 2 \end{bmatrix}$ as a basis for $V$.

In Example 6.13 it was also shown that $\vec{v}_3 = \begin{bmatrix} -2 \\ -\tfrac13 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\vec{v}_4 = \begin{bmatrix} 0 \\ -\tfrac13 \\ 0 \\ 1 \\ 0 \end{bmatrix}$, and $\vec{v}_5 = \begin{bmatrix} 1 \\ -\tfrac13 \\ 0 \\ 0 \\ 1 \end{bmatrix}$ form a basis for $V^\perp$. The equation $\vec{u} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + c_3 \vec{v}_3 + c_4 \vec{v}_4 + c_5 \vec{v}_5$ corresponds to a linear system whose augmented matrix
\[
\left[\begin{array}{ccccc|c}
2 & -1 & -2 & 0 & 1 & -5 \\
0 & 3 & -\tfrac13 & -\tfrac13 & -\tfrac13 & 5 \\
4 & -1 & 1 & 0 & 0 & -5 \\
0 & 1 & 0 & 1 & 0 & 3 \\
-2 & 2 & 0 & 0 & 1 & 7
\end{array}\right]
\]
has the reduced row echelon form
\[
\left[\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 2 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{array}\right].
\]
We conclude that
\[
\operatorname{proj}_V \vec{u} = -1\,\vec{v}_1 + 2\,\vec{v}_2 = \begin{bmatrix} -4 \\ 6 \\ -6 \\ 2 \\ 6 \end{bmatrix}.
\]
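The answer of Example 6.14 can be double-checked against the defining property in Theorem 6.9: $\vec{p}$ must lie in $V$, and $\vec{n} = \vec{u} - \vec{p}$ must be orthogonal to every vector spanning $V$. The short sketch below performs only this verification (it takes the coefficients $-1$ and $2$ from the example rather than solving the linear system); the variable names are illustrative.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

# The four vectors spanning V in Example 6.13.
spanning = [(2, 0, 4, 0, -2), (-1, 3, -1, 1, 2), (0, 3, 1, 1, 1), (1, 3, 3, 1, 0)]
u = (-5, 5, -5, 3, 7)

# p = -1*v1 + 2*v2, with coefficients taken from Example 6.14.
p = [-1 * a + 2 * b for a, b in zip(spanning[0], spanning[1])]
n = [a - b for a, b in zip(u, p)]

print(p)                               # the projection found in the example
print(n)                               # the component in the orthogonal complement
print([dot(n, v) for v in spanning])   # all zeros: n is orthogonal to V
```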
All projections discussed in Section 1.4 are special cases of the orthogonal projection introduced here. Recall that projections onto axes and coordinate planes in Section 1.4 were actually linear transformations. To see that this is also the case for a general orthogonal projection onto a subspace $V$ of $\mathbb{R}^n$, consider three possible cases: (i) $\dim V = 0$, (ii) $0 < \dim V < n$, and (iii) $\dim V = n$. We leave it as an exercise for the reader to show that $\operatorname{proj}_V \vec{u}$ is a linear transformation in cases (i) and (iii) (see Exercises 33 and 34, respectively). In case (ii), let $S = \{\vec{v}_1, \dots, \vec{v}_k\}$ be a basis for $V$, a subspace of $\mathbb{R}^n$, and let $\{\vec{v}_{k+1}, \dots, \vec{v}_n\}$ be a basis for $V^\perp$, so that $T = \{\vec{v}_1, \dots, \vec{v}_k, \vec{v}_{k+1}, \dots, \vec{v}_n\}$ is a basis for $\mathbb{R}^n$ (refer to the proof of Theorem 6.9 for details). Let us consider three linear transformations:
• $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by $F(\vec{u}) = [\vec{u}]_T$ (see Example 5.10),
• $G : \mathbb{R}^n \to \mathbb{R}^k$ defined by $G(\vec{u}) = [\, I_k \,|\, 0 \,]\, \vec{u}$, where $0$ denotes the $k \times (n - k)$ zero matrix, and
• $H : \mathbb{R}^k \to \mathbb{R}^n$ defined by $H\!\left( \begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} \right) = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k$ (see Example 5.21).

We leave it for the reader to verify that
\[
\operatorname{proj}_V \vec{u} = H(G(F(\vec{u}))).
\]
The following result follows directly from Theorem 5.2.

THEOREM 6.10
If $V$ is a subspace of $\mathbb{R}^n$, then $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by $F(\vec{u}) = \operatorname{proj}_V \vec{u}$ is a linear transformation.

In light of this, every projection onto a subspace of $\mathbb{R}^n$ can be represented using a matrix-vector multiplication $F(\vec{u}) = \operatorname{proj}_V \vec{u} = B\vec{u}$, where the matrix $B$ is the matrix of the linear transformation $F$, which was discussed in Sections 1.4 and 5.3. As we shall see in the following subsection, if we know an orthogonal basis for $V$, it will be easy to find the orthogonal projection of a vector onto $V$ (compared to Example 6.14) and to find the matrix of this transformation.
Orthogonal projection onto a subspace of Rn with a known orthogonal basis

[Figure: the projection proj_V u in the plane V, shown together with proj_span{v1} u and proj_span{v2} u.]

If $S = \{\vec{v}_1, \dots, \vec{v}_k\}$ is an orthogonal basis for $V$, then equation (82) and Theorem 6.2 yield
\[
\operatorname{proj}_V \vec{u} = \frac{\vec{u} \cdot \vec{v}_1}{\vec{v}_1 \cdot \vec{v}_1}\, \vec{v}_1 + \cdots + \frac{\vec{u} \cdot \vec{v}_k}{\vec{v}_k \cdot \vec{v}_k}\, \vec{v}_k. \tag{83}
\]
Note that the formula (78)
\[
\operatorname{proj}_{\operatorname{span}\{\vec{v}\}} \vec{u} = \frac{\vec{u} \cdot \vec{v}}{\vec{v} \cdot \vec{v}}\, \vec{v}
\]
for the orthogonal projection onto a line in $\mathbb{R}^2$ or $\mathbb{R}^3$ is a special case of (83). According to formula (83), the orthogonal projection onto a subspace $V$ can be obtained by adding projections onto spans of individual vectors in an orthogonal basis for $V$.
If an orthonormal basis $\{\vec{w}_1, \dots, \vec{w}_k\}$ for $V$ is available, then (83) can be simplified to
\[
\operatorname{proj}_V \vec{u} = (\vec{u} \cdot \vec{w}_1)\, \vec{w}_1 + \cdots + (\vec{u} \cdot \vec{w}_k)\, \vec{w}_k. \tag{84}
\]
If $\dim V = 1$, then this formula simplifies to
\[
\operatorname{proj}_{\operatorname{span}\{\vec{w}\}} \vec{u} = (\vec{u} \cdot \vec{w})\, \vec{w}. \tag{85}
\]
EXAMPLE 6.15
The vectors $\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \\ -1 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$ form an orthogonal set (check that $\vec{v}_1 \cdot \vec{v}_2 = 0$). To find the orthogonal projection of $\vec{u} = \begin{bmatrix} 3 \\ 1 \\ 2 \\ -2 \end{bmatrix}$ onto $V = \operatorname{span}\{\vec{v}_1, \vec{v}_2\}$, we can apply formula (83):
\begin{align*}
\operatorname{proj}_V \vec{u}
&= \frac{\vec{u} \cdot \vec{v}_1}{\vec{v}_1 \cdot \vec{v}_1}\, \vec{v}_1 + \frac{\vec{u} \cdot \vec{v}_2}{\vec{v}_2 \cdot \vec{v}_2}\, \vec{v}_2 \\
&= \frac{6 + 0 + 2 + 2}{4 + 0 + 1 + 1} \begin{bmatrix} 2 \\ 0 \\ 1 \\ -1 \end{bmatrix}
 + \frac{0 + 1 + 2 - 2}{0 + 1 + 1 + 1} \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} \\
&= \frac{10}{6} \begin{bmatrix} 2 \\ 0 \\ 1 \\ -1 \end{bmatrix} + \frac{1}{3} \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} \tfrac{10}{3} \\[0.5ex] \tfrac{1}{3} \\[0.5ex] 2 \\[0.5ex] -\tfrac{4}{3} \end{bmatrix}.
\end{align*}
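Formula (83) says that the projection onto $V$ is the sum of the projections onto the individual vectors of an orthogonal basis. A minimal sketch follows (the function name `project_onto_subspace` is an illustrative assumption); it reproduces Example 6.15 exactly.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_subspace(u, orthogonal_basis):
    """Formula (83): sum of the projections of u onto each basis vector.

    The basis vectors must be nonzero and mutually orthogonal.
    """
    p = [Fraction(0)] * len(u)
    for v in orthogonal_basis:
        k = Fraction(dot(u, v), dot(v, v))
        p = [a + k * b for a, b in zip(p, v)]
    return p

v1, v2 = (2, 0, 1, -1), (0, 1, 1, 1)
u = (3, 1, 2, -2)

print(dot(v1, v2))                                          # 0: the basis is orthogonal
print([str(x) for x in project_onto_subspace(u, [v1, v2])])  # 10/3, 1/3, 2, -4/3
```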
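For any two $n$-vectors $\vec{u}$ and $\vec{v}$ we have
\[
\vec{u}^{\,T} \vec{v} = \vec{v}^{\,T} \vec{u} = [\, \vec{u} \cdot \vec{v} \,]. \tag{86}
\]
Let us take advantage of this equation and of the fact that a scalar multiple $c\vec{u}$ can be written as a matrix product of $\vec{u}$ and the $1 \times 1$ matrix $[c]$ to rewrite the formula (84) for an orthogonal projection onto any subspace $V$ of $\mathbb{R}^n$:
\begin{align*}
\operatorname{proj}_V \vec{u} &= (\vec{u} \cdot \vec{w}_1)\, \vec{w}_1 + \cdots + (\vec{u} \cdot \vec{w}_k)\, \vec{w}_k \\
&= \vec{w}_1 \vec{w}_1^{\,T} \vec{u} + \cdots + \vec{w}_k \vec{w}_k^{\,T} \vec{u} \\
&= \underbrace{\left( \vec{w}_1 \vec{w}_1^{\,T} + \cdots + \vec{w}_k \vec{w}_k^{\,T} \right)}_{B} \vec{u}. \tag{87}
\end{align*}
If an orthogonal basis $\vec{v}_1, \dots, \vec{v}_k$ is available for $V$, then $B$ can be calculated using the formula
\[
B = \frac{1}{\vec{v}_1 \cdot \vec{v}_1}\, \vec{v}_1 \vec{v}_1^{\,T} + \cdots + \frac{1}{\vec{v}_k \cdot \vec{v}_k}\, \vec{v}_k \vec{v}_k^{\,T}. \tag{88}
\]

EXAMPLE 6.16
Let us construct the matrix (with respect to the standard basis) of the projection discussed in Example 6.15 using the formula (88):
\begin{align*}
B &= \frac{1}{\vec{v}_1 \cdot \vec{v}_1}\, \vec{v}_1 \vec{v}_1^{\,T} + \frac{1}{\vec{v}_2 \cdot \vec{v}_2}\, \vec{v}_2 \vec{v}_2^{\,T} \\
&= \frac{1}{6} \begin{bmatrix} 2 \\ 0 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & -1 \end{bmatrix}
 + \frac{1}{3} \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 1 & 1 \end{bmatrix} \\
&= \frac{1}{6} \begin{bmatrix} 4 & 0 & 2 & -2 \\ 0 & 0 & 0 & 0 \\ 2 & 0 & 1 & -1 \\ -2 & 0 & -1 & 1 \end{bmatrix}
 + \frac{1}{3} \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix}
\tfrac23 & 0 & \tfrac13 & -\tfrac13 \\[0.5ex]
0 & \tfrac13 & \tfrac13 & \tfrac13 \\[0.5ex]
\tfrac13 & \tfrac13 & \tfrac12 & \tfrac16 \\[0.5ex]
-\tfrac13 & \tfrac13 & \tfrac16 & \tfrac12
\end{bmatrix}.
\end{align*}
We can verify that multiplying this matrix by the vector $\vec{u} = \begin{bmatrix} 3 \\ 1 \\ 2 \\ -2 \end{bmatrix}$ yields
\[
\begin{bmatrix}
\tfrac23 & 0 & \tfrac13 & -\tfrac13 \\[0.5ex]
0 & \tfrac13 & \tfrac13 & \tfrac13 \\[0.5ex]
\tfrac13 & \tfrac13 & \tfrac12 & \tfrac16 \\[0.5ex]
-\tfrac13 & \tfrac13 & \tfrac16 & \tfrac12
\end{bmatrix}
\begin{bmatrix} 3 \\ 1 \\ 2 \\ -2 \end{bmatrix}
= \begin{bmatrix} \tfrac{10}{3} \\[0.5ex] \tfrac13 \\[0.5ex] 2 \\[0.5ex] -\tfrac43 \end{bmatrix},
\]
which matches $\operatorname{proj}_V \vec{u}$ obtained in Example 6.15.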
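Formula (88) builds the projection matrix as a sum of scaled outer products $\vec{v}_i \vec{v}_i^{\,T}$. The sketch below (with illustrative helper names `projection_matrix` and `mat_vec`) assembles $B$ for the basis of Example 6.15 and multiplies it by $\vec{u}$, as in Example 6.16.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def projection_matrix(orthogonal_basis):
    """Formula (88): B = sum of v v^T / (v . v) over an orthogonal basis."""
    n = len(orthogonal_basis[0])
    B = [[Fraction(0)] * n for _ in range(n)]
    for v in orthogonal_basis:
        s = dot(v, v)
        for i in range(n):
            for j in range(n):
                B[i][j] += Fraction(v[i] * v[j], s)   # entry (i, j) of v v^T / s
    return B

def mat_vec(B, u):
    """Matrix-vector product B u."""
    return [dot(row, u) for row in B]

B = projection_matrix([(2, 0, 1, -1), (0, 1, 1, 1)])
for row in B:
    print([str(x) for x in row])

print([str(x) for x in mat_vec(B, (3, 1, 2, -2))])  # 10/3, 1/3, 2, -4/3
```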
Yet another way to express the matrix $B$ of (87) is
\[
B = A A^T, \tag{89}
\]
where $A$ is the $n \times k$ matrix
\[
A = \begin{bmatrix} | & | & & | \\ \vec{w}_1 & \vec{w}_2 & \cdots & \vec{w}_k \\ | & | & & | \end{bmatrix}.
\]
In the example that follows, we will consider the orthogonal projection onto the xy-plane in $\mathbb{R}^3$,
\[
F\!\left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} \right) = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix},
\]
introduced in Section 1.4, attempting to verify that (89) will reproduce the same formula.

EXAMPLE 6.17
An orthonormal basis for the xy-plane in $\mathbb{R}^3$ is $\vec{w}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\vec{w}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. Therefore, the matrix of this linear transformation (with respect to the standard basis) is
\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
One of our objectives in the next section will be to develop a method for finding an orthogonal (or orthonormal) basis for a given subspace V of Rn , therefore allowing us to apply the procedures we developed in this subsection in case any basis for V (not necessarily orthogonal) is given.
EXERCISES
In Exercises 1–4, find the cosine of the angle between the two vectors. Are these vectors:
• forming an acute angle?
• forming an obtuse angle?
• perpendicular?
• in the same direction? or
• in the opposite directions?
1. a.
2. a.
⎡
1 4
,
−2 1
2 −3
⎤ ⎡ ⎤ −1 3 ⎥ ⎢ ⎢ ⎥ b. ⎣ 2 ⎦ , ⎣ 1 ⎦; 1 1 ⎡
,
−4 3
⎤ ⎡
⎤
1 2 ⎢ ⎥ ⎢ ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎥,⎢ ⎥ 3. a. ⎢ ⎢ 2 ⎥ ⎢ 4 ⎥; ⎣ ⎦ ⎣ ⎦ 1 2
;
⎡
;
⎤ ⎡ ⎤ 1 0 ⎢ ⎥ ⎢ ⎥ c. ⎣ 0 ⎦ , ⎣ 2 ⎦ . 2 1
⎤ ⎡ ⎤ 4 −6 ⎢ ⎥ ⎢ ⎥ b. ⎣ 2 ⎦ , ⎣ −3 ⎦; −2 3 ⎡
⎢ ⎢ ⎢ b. ⎢ ⎢ ⎢ ⎣
0 1 1 −1 1
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣
0 −1 1 −1 1
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
⎡
⎤ ⎡ ⎤ 3 1 ⎥ ⎢ ⎥ ⎢ c. ⎣ −4 ⎦ , ⎣ 2 ⎦ . 5 1 ⎡
⎡
1 ⎢ ⎢ 0 4. a. ⎢ ⎢ −1 ⎣ −1
Section 6.2 Orthogonal Projections and Orthogonal Complements ⎤ ⎤ ⎡ ⎡ 1 1 ⎤ ⎤ ⎡ ⎥ ⎥ ⎢ ⎢ ⎢ 1 ⎥ ⎢ 2 ⎥ 0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎥ ⎢ −1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥,⎢ ⎥. ⎥,⎢ ⎥ ⎢ 1 ⎥; b. ⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎦ ⎣ ⎢ −1 ⎥ ⎢ −1 ⎥ ⎥ ⎢ ⎥ ⎢ 1 ⎣ 1 ⎦ ⎣ 1 ⎦ 1 2 ⎡
⎢ ⎢ 5. Find a basis for the orthogonal complement of span{⎢ ⎢ ⎣ ⎡ ⎢ ⎢ 6. Find a basis for the orthogonal complement of span{⎢ ⎢ ⎣ ⎡ ⎢ ⎢ 7. Find a basis for the orthogonal complement of span{⎢ ⎢ ⎣ ⎡ ⎢ ⎢ ⎢ 8. Find a basis for the orthogonal complement of span{⎢ ⎢ ⎢ ⎣
⎤ ⎡
⎤ 0 ⎥ ⎢ ⎥ ⎥ ⎢ 2 ⎥ 4 ⎥,⎢ ⎥ ⎥ ⎢ 1 ⎥} in R . ⎦ ⎣ ⎦ 1 ⎤ ⎡ ⎤ 1 1 ⎥ ⎢ ⎥ ⎢ ⎥ 3 ⎥ ⎥ , ⎢ 2 ⎥} in R4 . ⎥ ⎥ −1 ⎦ ⎢ ⎣ 1 ⎦ 1 4 ⎤ ⎡ ⎤ ⎡ −1 3 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 2 ⎥ ⎢ 1 ⎥ ⎢ 3 ,⎢ ⎥,⎢ 1 ⎥ ⎦ ⎣ −2 ⎦ ⎣ −1 −3 1 −2 1 1 0 0
0 0 3 0 1
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣
0 2 2 1 0
⎤ ⎥ ⎥ ⎥} in R4 . ⎥ ⎦
⎤ ⎥ ⎥ ⎥ ⎥} in R5 . ⎥ ⎥ ⎦
→ − → − → → → − In Exercises 9–10, for the given − u and − v , find − p =projspan{→ v } u and verify that n = → − → − → − → − → − → − → − u − p is orthogonal to v . Sketch the vectors u , v , p , and n similarly to the figure next to Example 6.9. 3 −2 −4 1 → − → − → − → − 9. a. u = , v = ; b. u = , v = . −1 4 2 2 1 2 1 2 → − → − → − → − , v = ; b. u = 10. a. u = , v = . 4 −2 2 4 → → → → − → − − → − In Exercises 11–14, for the given − u and − v , find − p = projspan{→ v } u and verify that u − p → − is orthogonal to v . ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 1 1 2 ⎢ ⎥ → ⎢ ⎥ ⎢ ⎥ → ⎢ ⎥ → → 11. a. − u = ⎣ 2 ⎦, − v = ⎣ 1 ⎦; b. − u = ⎣ 0 ⎦, − v = ⎣ −2 ⎦ . 1 ⎡
1
⎤ ⎡ ⎤ 0 4 ⎢ ⎥ → ⎢ ⎥ → 12. a. − u = ⎣ 0 ⎦, − v = ⎣ 5 ⎦; 0 −1
1 ⎡
1
⎤ ⎡ ⎤ 3 2 ⎢ ⎥ → ⎢ ⎥ → v = ⎣ −1 ⎦ . b. − u = ⎣ −1 ⎦, − 2 1
Chapter 6 Orthogonality and Projections ⎡
⎤
⎡
1 0 ⎢ ⎥ ⎢ ⎢ ⎢ 1 ⎥ − → ⎥ → ⎢ 1 13. a. − u =⎢ ⎢ 2 ⎥, v = ⎢ −1 ⎣ ⎦ ⎣ −1 3 ⎡
⎤
⎡
⎡
⎤
⎢ ⎢ ⎢ → − b. u = ⎢ ⎢ ⎢ ⎣
⎥ ⎥ ⎥; ⎥ ⎦
3 1 ⎢ ⎥ ⎢ ⎢ ⎢ −1 ⎥ − → ⎥ → ⎢ 1 14. a. − u =⎢ ⎢ 0 ⎥, v = ⎢ −2 ⎣ ⎦ ⎣ 1 1
⎤ ⎥ ⎥ ⎥; ⎥ ⎦
3 0 −1 1 2 ⎡
⎢ ⎢ ⎢ → − b. u = ⎢ ⎢ ⎢ ⎣
⎡
⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ − ⎢ ⎥, → ⎥ v =⎢ ⎢ ⎥ ⎣ ⎦
0 2 −2 4 0
⎤
2 1 −1 −1 −1 ⎡
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ − ⎢ ⎥, → ⎥ v =⎢ ⎢ ⎥ ⎣ ⎦
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
3 2 −1 1 0
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
→ u for In Exercises 15–18, verify that the given set S is orthonormal and then find projspanS − → − the given vector u . ⎡ 1 ⎤ ⎡ 2 ⎤ ⎡ ⎤ 1 3 3 ⎥ ⎢ ⎥ → ⎢ ⎢ ⎥ 15. S = {⎣ 23 ⎦ , ⎣ − 23 ⎦}, − u = ⎣ −2 ⎦ . − 23 ⎡ ⎢ 16. S = {⎣ ⎡ ⎢ ⎢ 17. S = {⎢ ⎢ ⎣ ⎡
2 3 2 3 1 3 1 2 1 2 1 2 1 2
− 13
⎤ ⎡ ⎥ ⎢ ⎦,⎣
− 23
⎤ ⎡
1 3 2 3 1
⎥ ⎢ 21 ⎥ ⎢ −2 ⎥,⎢ ⎥ ⎢ −1 ⎦ ⎣ 2
− 12
⎢ 1 ⎢ 2 18. S = {⎢ ⎢ −1 ⎣ 2 − 12
1 2
⎤ ⎡
⎤
1 ⎡
⎤ 1 ⎥ − ⎥ → ⎢ ⎦}, u = ⎣ −2 ⎦ . 1 ⎤ ⎡ ⎤ 1 ⎥ ⎢ ⎥ ⎢ −2 ⎥ ⎥ − ⎥ ⎢ ⎥}, → u = ⎢ 1 ⎥. ⎥ ⎦ ⎣ ⎦ 1 ⎤ ⎡ ⎤
1 −1 ⎥ ⎢ 21 ⎥ ⎢ 21 ⎥ ⎢ −2 ⎥ ⎢ 2 ⎥,⎢ ⎥ ⎢ ⎥ ⎢ 1 ⎥,⎢ 1 ⎦ ⎣ 2 ⎦ ⎣ 2 − 12 − 12
⎡
⎥ ⎢ ⎥ − ⎢ ⎥}, → ⎢ ⎥ u =⎢ ⎦ ⎣
2 −1 1 0
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
→ In Exercises 19–22, verify that the given set S is orthogonal and then find projspanS − u for → − the given vector u . ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 −1 5 ⎢ ⎥ ⎢ ⎥ → ⎢ ⎥ 19. S = {⎣ 2 ⎦ , ⎣ 1 ⎦}, − u = ⎣ 2 ⎦. −1 ⎡ ⎢ 20. S = {⎣ ⎡ ⎢ ⎢ 21. S = {⎢ ⎢ ⎣
1
−2
⎤ ⎡ ⎤ ⎡ ⎤ 3 1 4 ⎥ ⎢ ⎥ → ⎢ ⎥ u = ⎣ 0 ⎦. −1 ⎦ , ⎣ 1 ⎦}, − 1 −2 1 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ −1 0 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ → ⎢ 0 ⎥ ⎥ , ⎢ −1 ⎥ , ⎢ −1 ⎥}, − ⎢ ⎢ ⎥ ⎢ ⎥ u =⎢ 1 ⎥ ⎦ ⎣ 1 ⎦ ⎣ −1 ⎦ ⎣ 1 0 1
⎤ 2 ⎥ 0 ⎥ ⎥. 1 ⎥ ⎦ 2
⎡ ⎢ ⎢ 22. S = {⎢ ⎢ ⎣
Section 6.2 Orthogonal Projections and Orthogonal Complements ⎤ ⎡ ⎤ ⎡ ⎤ 2 −1 4 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ → ⎢ 1 ⎥ 1 ⎥ ⎥ , ⎢ 1 ⎥}, − ⎢ ⎥ ⎢ ⎥ u = ⎢ 0 ⎥. 1 ⎥ ⎦ ⎣ 1 ⎦ ⎣ ⎦ 1 0 −1
23. For the vectors $\vec{u}$ and $\vec{v}$ specified in Exercise 9a, find the matrix $B$ (with respect to the standard basis) of the orthogonal projection onto $\operatorname{span}\{\vec{v}\}$ and verify that $B\vec{u}$ yields the same vector as the $\operatorname{proj}_{\operatorname{span}\{\vec{v}\}}\vec{u}$ which was found in Exercise 9a. (Compare $B\vec{u}$ to the answer for Exercise 9a found in Appendix A.)
24. Repeat Exercise 23 for the vectors specified in Exercise 9b.
25. Repeat Exercise 23 for the vectors specified in Exercise 11a.
26. Repeat Exercise 23 for the vectors specified in Exercise 11b.
27. Repeat Exercise 23 for the vectors specified in Exercise 13a.
28. Repeat Exercise 23 for the vectors specified in Exercise 13b.
29. For the vector $\vec{u}$ and the set $S$ specified in Exercise 15, find the matrix $B$ (with respect to the standard basis) of the orthogonal projection onto $\operatorname{span} S$ and verify that $B\vec{u}$ yields the same vector as the $\operatorname{proj}_{\operatorname{span} S}\vec{u}$ which was found in Exercise 15. (Compare $B\vec{u}$ to the answer for Exercise 15 found in Appendix A.)
30. Repeat Exercise 29 for the vector and the set specified in Exercise 19.
31. Repeat Exercise 29 for the vectors specified in Exercise 21.
32. Repeat Exercise 29 for the vectors specified in Exercise 17.
33. * Let $V$ be the subspace of $\mathbb{R}^n$ whose dimension is 0 (i.e., $V = \{\vec{0}\}$). Show that $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by $F(\vec{u}) = \operatorname{proj}_V \vec{u}$ is a linear transformation. (Refer to the proof of Theorem 6.9 for information on the projection onto such a subspace.)
34. * Let $V$ be the subspace of $\mathbb{R}^n$ whose dimension is $n$ (i.e., $V = \mathbb{R}^n$). Show that $F : \mathbb{R}^n \to \mathbb{R}^n$ defined by $F(\vec{u}) = \operatorname{proj}_V \vec{u}$ is a linear transformation. (Refer to the proof of Theorem 6.9 for information on the projection onto such a subspace.)
In Exercises 35–36, consider $V$ to be a subspace of $\mathbb{R}^n$.
35. * Prove that the Pythagorean formula holds for any $n$-vector $\vec{u}$:
$$\|\vec{u}\|^2 = \|\operatorname{proj}_V \vec{u}\|^2 + \|\vec{u} - \operatorname{proj}_V \vec{u}\|^2.$$
36. * Prove that for any $n$-vector $\vec{u}$ the following inequalities hold:
$$\|\vec{u}\| \ge \|\operatorname{proj}_V \vec{u}\| \tag{90}$$
and
$$\|\vec{u}\| \ge \|\vec{u} - \operatorname{proj}_V \vec{u}\|. \tag{91}$$
37. * Let $A$ be an $m \times n$ matrix with orthonormal columns. Show that the angle between two $n$-vectors $\vec{u}$ and $\vec{v}$ remains the same as the angle between the $m$-vectors $A\vec{u}$ and $A\vec{v}$.
38. All cabinet transformations F : R3 → R2 are linear transformations with the properties ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 0 1 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ , F ( ⎣ 0 ⎦) = F (⎣ 0 ⎦) = , F (⎣ 1 ⎦) = 2 0 1 0 0 1 ⎡ ⎤ 1 1 ⎢ ⎥ (see Example 1.30 on p. 47), but different angles α between and F (⎣ 0 ⎦) can be 0 0 used. a. Show that the transformation ⎡ ⎡ ⎤ ⎤ x x −1 √ 1 0 ⎢ ⎢ ⎥ ⎥ F1 (⎣ y ⎦) = 2−12 ⎣ y ⎦ √ 0 1 2 2 z z introduced in Example 1.30 corresponds to using α = 135◦ . b. Derive a formula for the cabinet transformation using the angle α = 120◦ instead. → → → → 39. * Prove that for any two n-vectors − u and − v , the rank of their outer product − u− v T is 0 or → → 1. (Hint: Find elementary row operations to introduce zeros into n − 1 rows of − u− v T .) When is the rank 0? In Exercises 40–42, we use the notion of a direct sum of spaces V and W, V ⊕ W, which was introduced in Exercise 16 on p. 206. 40. * Let V be a subspace of Rn . Prove that V ⊕ V ⊥ = Rn . → → − → → 41. * Let − p and − q be n-vectors such that → p ·− q = 0. Denoting V =span{− p } and W = → − ⊥ n (span{ q }) , show that V ⊕ W = R . 42. * Let V and W be subspaces of Rn such that V ⊕ W = Rn . It follows from the result of → → → Exercise 16 on p. 206 that for every n-vector − u , there exist unique vectors − v in V and − w → − → − → − → − → − in W such that u = v + w . We call the transformation F ( u ) = v the projection onto V along W. a. Show that in the special case W = V ⊥ , the transformation F becomes the orthogonal projection onto V. → → → → → → → → u is b. Let − p and − q be n-vectors such that − p ·− q = 1. Show that F (− u) = − p− q T− → − → − ⊥ the projection onto V = span{ p } along W = (span{ q }) . (Hint: Use the result of → → → → Exercise 41 to show that every n-vector − u can be expressed as − u = c− p +− w where → → − → − → − → − → − T− w · q = 0 and then show that c = u · q = q u .)
6.3 Gram-Schmidt Process and Least Squares Approximation

Suppose a linearly independent set of $n$-vectors $S = \{\vec{u}_1, \ldots, \vec{u}_k\}$ is given. According to Chapter 4, $S$ forms a basis for $V = \operatorname{span} S$, a subspace of $\mathbb{R}^n$. We would like to construct an orthogonal (or orthonormal) basis for $V$.

It should be clear that once an orthogonal basis $\vec{v}_1, \ldots, \vec{v}_k$ for $V$ is found, an orthonormal basis $\vec{w}_1, \ldots, \vec{w}_k$ can be created simply by letting
$$\vec{w}_j = \frac{1}{\|\vec{v}_j\|}\,\vec{v}_j \quad \text{for } j = 1, \ldots, k.$$
Therefore, we shall focus on obtaining an orthogonal basis $\vec{v}_1, \ldots, \vec{v}_k$.
We construct the orthogonal basis one vector at a time.

[Figure: Illustration of the second step of the Gram-Schmidt process, showing $\vec{v}_1 = \vec{u}_1$, $\vec{u}_2$, $\operatorname{proj}_{\operatorname{span}\{\vec{v}_1\}}\vec{u}_2$, and $\vec{v}_2$.]

After the first step, consisting of simply copying the first vector from $S$,
$$\vec{v}_1 = \vec{u}_1,$$
the remaining vectors will be built according to these guidelines:
• in the second step, $\vec{v}_2$ is determined, such that it is orthogonal to $\vec{v}_1$ and
$$\operatorname{span}\{\vec{v}_1, \vec{v}_2\} = \operatorname{span}\{\vec{u}_1, \vec{u}_2\},$$
• in the third step, $\vec{v}_3$ is determined, such that it is orthogonal to the vectors found before it, $\vec{v}_1, \vec{v}_2$, and
$$\operatorname{span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3\} = \operatorname{span}\{\vec{u}_1, \vec{u}_2, \vec{u}_3\},$$
• etc.
• in the last ($k$th) step, $\vec{v}_k$ is determined, such that it is orthogonal to the vectors found before it, $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_{k-1}$, and
$$\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} = \operatorname{span}\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k\}.$$

[Figure: Illustration of the third step of the Gram-Schmidt process, showing $\vec{v}_1$, $\vec{v}_2$, $\vec{u}_3$, $\operatorname{proj}_{\operatorname{span}\{\vec{v}_1, \vec{v}_2\}}\vec{u}_3$, and $\vec{v}_3$.]

The set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_i\}$ created after the $i$th step is an orthogonal basis for $\operatorname{span}\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_i\}$.
This is why, at the end, the resulting set of vectors is an orthogonal basis for the entire $V$.

Here is how the steps are executed. In the second step, let
$$\vec{v}_2 = \vec{u}_2 - \operatorname{proj}_{\operatorname{span}\{\vec{v}_1\}}\vec{u}_2 = \vec{u}_2 - \frac{\vec{u}_2\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1.$$
As we have shown in the previous section, such $\vec{v}_2$ is orthogonal to $\vec{v}_1$. Moreover, $\vec{v}_1$ and $\vec{v}_2$ form an orthogonal basis for $\operatorname{span}\{\vec{u}_1, \vec{u}_2\}$. (Check!)

In step three, we observe that
$$\vec{v}_3 = \vec{u}_3 - \operatorname{proj}_{\operatorname{span}\{\vec{v}_1, \vec{v}_2\}}\vec{u}_3 = \vec{u}_3 - \frac{\vec{u}_3\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1 - \frac{\vec{u}_3\cdot\vec{v}_2}{\vec{v}_2\cdot\vec{v}_2}\,\vec{v}_2$$
is orthogonal to every vector in the subspace $\operatorname{span}\{\vec{v}_1, \vec{v}_2\}$, including $\vec{v}_1$ and $\vec{v}_2$ themselves. Therefore, $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is an orthogonal basis for $\operatorname{span}\{\vec{u}_1, \vec{u}_2, \vec{u}_3\}$.

Generally, during the $i$th step, we would create
$$\vec{v}_i = \vec{u}_i - \operatorname{proj}_{\operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_{i-1}\}}\vec{u}_i = \vec{u}_i - \frac{\vec{u}_i\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1 - \cdots - \frac{\vec{u}_i\cdot\vec{v}_{i-1}}{\vec{v}_{i-1}\cdot\vec{v}_{i-1}}\,\vec{v}_{i-1}. \tag{92}$$

The procedure described above, for finding an orthogonal basis $\vec{v}_1, \ldots, \vec{v}_k$ for $V$, as well as an orthonormal basis $\vec{w}_1, \ldots, \vec{w}_k$ for $V$, is called the Gram-Schmidt process.
EXAMPLE 6.18 Consider the vectors $\vec{u}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \\ 3 \end{bmatrix}$, $\vec{u}_2 = \begin{bmatrix} 2 \\ 4 \\ 0 \\ -3 \end{bmatrix}$, and $\vec{u}_3 = \begin{bmatrix} 7 \\ -1 \\ 9 \\ 9 \end{bmatrix}$. It can be verified that these vectors are linearly independent; therefore, they form a basis for a three-dimensional subspace $V$ of $\mathbb{R}^4$. We will use the Gram-Schmidt process to find
• an orthogonal basis $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ for $V$ and
• an orthonormal basis $\{\vec{w}_1, \vec{w}_2, \vec{w}_3\}$ for $V$.

After copying the first vector,
$$\vec{v}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \\ 3 \end{bmatrix},$$
we apply the formula (92) to find $\vec{v}_2$ and $\vec{v}_3$:
$$\vec{v}_2 = \vec{u}_2 - \frac{\vec{u}_2\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1 = \begin{bmatrix} 2 \\ 4 \\ 0 \\ -3 \end{bmatrix} - \frac{-15}{15}\begin{bmatrix} 1 \\ -2 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix},$$
$$\vec{v}_3 = \vec{u}_3 - \frac{\vec{u}_3\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1 - \frac{\vec{u}_3\cdot\vec{v}_2}{\vec{v}_2\cdot\vec{v}_2}\,\vec{v}_2 = \begin{bmatrix} 7 \\ -1 \\ 9 \\ 9 \end{bmatrix} - \frac{45}{15}\begin{bmatrix} 1 \\ -2 \\ 1 \\ 3 \end{bmatrix} - \frac{28}{14}\begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 4 \\ 0 \end{bmatrix}.$$

An orthonormal basis can be constructed from the unit vectors in the direction of each of the orthogonal basis vectors:
$$\vec{w}_1 = \frac{1}{\|\vec{v}_1\|}\,\vec{v}_1 = \frac{1}{\sqrt{15}}\begin{bmatrix} 1 \\ -2 \\ 1 \\ 3 \end{bmatrix}, \quad
\vec{w}_2 = \frac{1}{\|\vec{v}_2\|}\,\vec{v}_2 = \frac{1}{\sqrt{14}}\begin{bmatrix} 3 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \quad
\vec{w}_3 = \frac{1}{\|\vec{v}_3\|}\,\vec{v}_3 = \frac{1}{\sqrt{21}}\begin{bmatrix} -2 \\ 1 \\ 4 \\ 0 \end{bmatrix}.$$
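The steps of Example 6.18 can be reproduced with a few lines of code. This is a minimal sketch of formula (92) in plain Python, assuming the input vectors are linearly independent (otherwise a zero vector would appear and the division would fail); the names `gram_schmidt` and `normalize` are illustrative only.

```python
from fractions import Fraction
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthogonal basis built one vector at a time, following formula (92)."""
    basis = []
    for u in vectors:
        v = [Fraction(x) for x in u]
        for b in basis:
            coeff = dot(u, b) / dot(b, b)     # (u_i . v_j) / (v_j . v_j)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return basis

def normalize(v):
    """Unit vector in the direction of v (floating point)."""
    length = math.sqrt(dot(v, v))
    return [float(x) / length for x in v]

u1 = [1, -2, 1, 3]
u2 = [2, 4, 0, -3]
u3 = [7, -1, 9, 9]
orthogonal = gram_schmidt([u1, u2, u3])
orthonormal = [normalize(v) for v in orthogonal]
print(orthogonal)
```

The orthogonal vectors are kept as exact fractions so they can be compared with the hand computation; only the normalization step switches to floating point, because the lengths $\sqrt{15}$, $\sqrt{14}$, and $\sqrt{21}$ are irrational.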
In the following example, the Gram-Schmidt process will help us find a matrix representation for the orthogonal projection onto a plane in R3 .
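EXAMPLE 6.19 Consider the orthogonal projection $F : \mathbb{R}^3 \to \mathbb{R}^3$ onto the plane $x - y - 2z = 0$ (i.e., onto $V = \left\{\begin{bmatrix} x \\ y \\ z \end{bmatrix} \,\middle|\, x - y - 2z = 0\right\}$). The solution space of the plane equation consists of vectors
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y + 2z \\ y \\ z \end{bmatrix} = y\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix},$$
so that the vectors $\vec{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $\vec{u}_2 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$ form a basis for $V$.

We are going to use formula (83) to find the matrix of $F$. However, for this we need an orthogonal basis for $V$, which we shall find using the Gram-Schmidt process:
$$\vec{v}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$
$$\vec{v}_2 = \vec{u}_2 - \frac{\vec{u}_2\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} - \frac{2}{2}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.$$
The matrix of the transformation can be obtained using (83) as
$$\begin{aligned}
\frac{1}{\vec{v}_1\cdot\vec{v}_1}\,\vec{v}_1\vec{v}_1^T + \frac{1}{\vec{v}_2\cdot\vec{v}_2}\,\vec{v}_2\vec{v}_2^T
&= \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & -1 & 1 \end{bmatrix} \\
&= \frac{1}{2}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}
= \begin{bmatrix} \frac{5}{6} & \frac{1}{6} & \frac{1}{3} \\ \frac{1}{6} & \frac{5}{6} & -\frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} & \frac{1}{3} \end{bmatrix}.
\end{aligned}$$

Least squares approximation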
DEFINITION Let $A$ be an $m \times n$ matrix and let $\vec{b}$ be an $m$-vector. An $n$-vector $\vec{y}$ is called a least squares solution of the system $A\vec{x} = \vec{b}$ if the inequality
$$\|A\vec{y} - \vec{b}\| \le \|A\vec{x} - \vec{b}\|$$
holds for every $n$-vector $\vec{x}$.

If we denote the column space of $A$ by $W$, then the system
$$A\vec{x} = \vec{b} \tag{93}$$
is either
• consistent when $\vec{b} \in W$ (in which case any of its solutions will also be its least squares solution and vice versa; see Exercise 15) or
• inconsistent when $\vec{b} \notin W$.
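[Figure: the vectors $\vec{b}$, $A\vec{x}$, and $A\vec{y} = \operatorname{proj}_W \vec{b}$ relative to the subspace $W$, together with translated copies of $A\vec{x} - \vec{b}$, $\operatorname{proj}_W(A\vec{x} - \vec{b})$, and $A\vec{x} - \vec{b} - \operatorname{proj}_W(A\vec{x} - \vec{b}) = \operatorname{proj}_W \vec{b} - \vec{b} = A\vec{y} - \vec{b}$.]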
Regardless of that, the system
$$A\vec{x} = \operatorname{proj}_W \vec{b} \tag{94}$$
is always consistent since its right-hand side $\operatorname{proj}_W \vec{b}$ is in the column space of $A$. Moreover, for any $n$-vector $\vec{x}$ and for any $n$-vector $\vec{y}$ that solves (94) we have
$$\begin{aligned}
\|A\vec{x} - \vec{b}\| &\ge \|A\vec{x} - \vec{b} - \operatorname{proj}_W(A\vec{x} - \vec{b})\| && \text{from inequality (91)} \\
&= \|A\vec{x} - \vec{b} - \operatorname{proj}_W(A\vec{x}) + \operatorname{proj}_W \vec{b}\| && \text{since } \operatorname{proj}_W \text{ is a linear transformation} \\
&= \|A\vec{x} - \vec{b} - A\vec{x} + \operatorname{proj}_W \vec{b}\| && \text{since } A\vec{x} \text{ is in } W \\
&= \|\operatorname{proj}_W \vec{b} - \vec{b}\| \\
&= \|A\vec{y} - \vec{b}\| && \text{since } \vec{y} \text{ is a solution of (94),}
\end{aligned}$$
which leads to the following important result.

THEOREM 6.11 Let $W$ denote the column space of $A$. If $\vec{y}$ is a solution of the system $A\vec{x} = \operatorname{proj}_W \vec{b}$, then it is also a least squares solution of the system $A\vec{x} = \vec{b}$.

When $\vec{b}$ is projected onto $W$,
$$\vec{b} = \operatorname{proj}_W \vec{b} + \vec{n}, \tag{95}$$
the vector $\vec{n}$ is orthogonal to all vectors in $W$, including all columns of $A$. Therefore (see Exercise 19 in Section 6.1), $A^T\vec{n} = \vec{0}$ so that $A^T\vec{b} = A^T\operatorname{proj}_W \vec{b}$. Consequently, premultiplying both sides of (94) by $A^T$ yields the system of equations
$$A^TA\vec{x} = A^T\vec{b} \tag{96}$$
known as normal equations of the least squares problem (93). If the coefficient matrix $A^TA$ is invertible, this leads to the following explicit formula for the least squares solution of (93):
$$\vec{x} = \left(A^TA\right)^{-1}A^T\vec{b}. \tag{97}$$
EXAMPLE 6.20 We leave it as an easy exercise for the reader to check that $\vec{b} = \begin{bmatrix} 2 \\ 0 \\ 2 \\ 6 \end{bmatrix}$ is not in the column space of $A = \begin{bmatrix} 1 & -2 \\ 1 & -2 \\ -1 & 1 \\ 1 & -1 \end{bmatrix}$; thus the system $A\vec{x} = \vec{b}$ is inconsistent. The normal equations (96) involve the coefficient matrix
$$A^TA = \begin{bmatrix} 1 & 1 & -1 & 1 \\ -2 & -2 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ 1 & -2 \\ -1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 4 & -6 \\ -6 & 10 \end{bmatrix}.$$
Together with the right-hand side,
$$A^T\vec{b} = \begin{bmatrix} 1 & 1 & -1 & 1 \\ -2 & -2 & 1 & -1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 2 \\ 6 \end{bmatrix} = \begin{bmatrix} 6 \\ -8 \end{bmatrix},$$
we can form the augmented matrix of the system, $\left[\begin{array}{cc|c} 4 & -6 & 6 \\ -6 & 10 & -8 \end{array}\right]$, whose reduced row echelon form $\left[\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 1 & 1 \end{array}\right]$ reveals the unique least squares solution $\vec{x} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$.
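The normal-equations route of Example 6.20 can be sketched in code as follows. This is an illustrative implementation under the assumption that $A^TA$ is invertible (so that the least squares solution is unique); the helper names `solve` and `least_squares` are not from the text, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Gauss-Jordan elimination for a square, invertible system M x = rhs."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b, formula (96)."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]
    return solve(AtA, Atb)

A = [[1, -2], [1, -2], [-1, 1], [1, -1]]
b = [2, 0, 2, 6]
x = least_squares(A, b)
print(x)
```

For larger or poorly conditioned problems, forming $A^TA$ explicitly is usually avoided in numerical practice; the sketch follows the text's derivation rather than a production approach.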
Alternative formula for orthogonal projection
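The formula (89) developed in Section 6.2 for orthogonal projection onto a column space of a matrix required that the matrix have nonzero orthogonal or orthonormal columns. Using formulas (97) and (94) we can write
$$\operatorname{proj}_W \vec{b} = A\left(A^TA\right)^{-1}A^T\vec{b} \tag{98}$$
provided that $A^TA$ is nonsingular, i.e., if $A$ has linearly independent columns (see Exercise 15 in Section 4.6).

EXAMPLE 6.21 Let us use formula (98) to obtain the matrix of the orthogonal projection onto the plane $x - y - 2z = 0$ (we did so already in Example 6.19 using a different method). As was shown in that example, vectors $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$ form a basis for this plane. Therefore, our plane is the column space of the matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$ with rank 2. The matrix (with respect to the standard basis) of the orthogonal projection onto the plane is
$$\begin{aligned}
A\left(A^TA\right)^{-1}A^T
&= \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\left(\begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}^{-1}\begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{5}{6} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{1}{3} \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} && \text{(verify the inverse calculation)} \\
&= \begin{bmatrix} \frac{5}{6} & \frac{1}{6} & \frac{1}{3} \\ \frac{1}{6} & \frac{5}{6} & -\frac{1}{3} \\ \frac{1}{3} & -\frac{1}{3} & \frac{1}{3} \end{bmatrix} && \text{(check both matrix multiplications),}
\end{aligned}$$
which matches the matrix obtained in Example 6.19.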
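The two checks suggested in Example 6.21 can also be carried out mechanically. The sketch below evaluates formula (98) for a matrix with exactly two linearly independent columns, which is an assumption made here so that a closed-form $2 \times 2$ inverse suffices; `inverse_2x2` and `projection_matrix` are illustrative helper names.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A^T A is singular: columns of A are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def projection_matrix(A):
    """A (A^T A)^{-1} A^T for a matrix A with two independent columns, formula (98)."""
    At = transpose(A)
    return matmul(matmul(A, inverse_2x2(matmul(At, A))), At)

A = [[1, 2], [1, 0], [0, 1]]
B = projection_matrix(A)
print(B)
```

Because a projection applied twice changes nothing, `matmul(B, B)` should reproduce `B`, which gives a quick sanity check that does not depend on knowing the answer in advance.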
EXERCISES In each of the Exercises 1–4, a basis for a space V is provided. Use the Gram-Schmidt process to obtain an orthogonal basis and an orthonormal basis for V. ⎡ ⎢ 1. a. ⎣
⎡
⎤ ⎡
⎤
−2 −3 ⎥ ⎢ ⎥ 1 ⎦ , ⎣ 2 ⎦; 2 5 ⎤ ⎡
⎤
1 0 ⎢ ⎥ ⎢ ⎥ 2. a. ⎣ 1 ⎦ , ⎣ 2 ⎦; 0 2
⎡ ⎢ b. ⎣
⎡ ⎢ b. ⎣
⎡
⎤ ⎡
⎤ ⎡
⎡
⎤ ⎡
⎤ ⎡
⎤ ⎡
⎤
3 ⎥ ⎥ ⎢ ⎥ ⎢ 3 ⎥ ⎥,⎢ ⎥; ⎥ ⎢ 3 ⎥ ⎦ ⎣ ⎦ 0
⎢ ⎢ c. ⎢ ⎢ ⎣ ⎡
⎤
−3 2 ⎥ ⎢ ⎥ 4 ⎦ , ⎣ −3 ⎦; 3 −1 ⎤
⎡
⎤
2 4 ⎥ ⎢ ⎥ 1 ⎦ , ⎣ 3 ⎦; −1 2
−2 −1 1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎢ 2 ⎥ ⎢ −3 ⎥ ⎢ −1 ⎥ ⎥ ⎥,⎢ ⎥,⎢ 3. a. ⎢ ⎢ 0 ⎥ ⎢ 2 ⎥ ⎢ 3 ⎥; ⎦ ⎣ ⎦ ⎦ ⎣ ⎣ 2 −1 2
1 0 ⎥ ⎢ ⎢ ⎢ 1 ⎥ ⎢ 0 ⎥ ⎢ 4. a. ⎢ ⎢ 1 ⎥ , ⎢ −3 ⎦ ⎣ ⎣ 0 1
⎤ ⎡
⎡ ⎢ ⎢ ⎢ b. ⎢ ⎢ ⎢ ⎣ ⎡ ⎢ ⎢ ⎢ b. ⎢ ⎢ ⎢ ⎣
1 0 0
⎢ ⎢ c. ⎢ ⎢ ⎣
⎤ ⎡
0 2 0
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ 0 ⎦ ⎣ 0 4 −1
1 2 0 −1 −1
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣
1 6 0 1 −2
1 1 1 1
0 −2 −1 1
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣ ⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣ ⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣
2 0 −1
6 4 0 2
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
2 −7 −5 5
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
⎤
⎥ ⎥ ⎥ ⎥. ⎥ ⎥ 1 ⎦ 4
0 0 1 0 0
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎥ ⎢ ⎦ ⎣
1 1 −1 1 0
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎦
In Exercises 5–8, find a matrix (with respect to the standard basis) for the linear transformation $F : \mathbb{R}^3 \to \mathbb{R}^3$ performing the orthogonal projection onto the given plane in two ways:
a. using formula (89), following the procedure of Example 6.19, and
b. using formula (98), following the procedure of Example 6.21.
Make sure both methods yield the same matrix.
5. $x - y - z = 0$.
6. $x - 2y + 5z = 0$.
7. $x + z = 0$.
8. $y - 2z = 0$.

In Exercises 9–14, find a least squares solution of the inconsistent system $A\vec{x} = \vec{b}$ with the given $A$ and $\vec{b}$. If many least squares solutions exist, find a general formula.
9. $A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$.
10. $A = \begin{bmatrix} 2 & 1 \\ -1 & 0 \\ -1 & -3 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$.
11. $A = \begin{bmatrix} 1 & -2 \\ 1 & -2 \\ -1 & 1 \\ 1 & -1 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 2 \\ 0 \\ 2 \\ 6 \end{bmatrix}$.
12. $A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \\ -1 & -1 \\ 0 & 1 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 0 \\ 3 \\ -1 \\ 0 \end{bmatrix}$.
13. $A = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 0 & 4 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
14. $A = \begin{bmatrix} 0 & 1 & 0 & -1 \\ 1 & 2 & -1 & 0 \\ 1 & 0 & -1 & 2 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$.
15. * Let $A$ be an $m \times n$ matrix and let $\vec{b}$ be an $m$-vector in the column space of $A$. Show that $\vec{y}$ is a least squares solution of $A\vec{x} = \vec{b}$ if and only if $A\vec{y} = \vec{b}$.
16. * Let $B$ be an $m \times n$ matrix whose columns are orthogonal vectors with $m \ge n$.
a. If all columns of $B$ are nonzero vectors, show that an orthogonal $m \times m$ matrix $U$ and an $m \times n$ rectangular diagonal matrix $R$ can be found such that
$$B = UR. \tag{99}$$
(Hint: Let the main diagonal entries of $R$ equal the magnitudes of the corresponding columns of $B$.)
b. If there are some zero columns in $B$, show that (99) can still be obtained. (Hint: Refer to the hint from part a and consider using the Gram-Schmidt process.)
17. * Let $B$ be an $m \times n$ matrix whose columns are orthogonal vectors with $m < n$.
a. Show that at least $n - m$ columns of $B$ must be zero vectors.
b. Assuming the zero columns are all on the right side of $B$, show that $B$ can be factored as in (99) with an orthogonal $m \times m$ matrix $U$ and an $m \times n$ rectangular diagonal matrix $R$.
18. * Let $\vec{b}$ be an $m$-vector, let $A$ be an $m \times n$ matrix, and let $V$ denote the row space of $A$. Show that if $A\vec{x} = A\vec{y} = \vec{b}$, then
$$\operatorname{proj}_V \vec{x} = \operatorname{proj}_V \vec{y}.$$
(Hint: $\vec{x} = \operatorname{proj}_V \vec{x} + \vec{n}_1$ and $\vec{y} = \operatorname{proj}_V \vec{y} + \vec{n}_2$ with $\vec{n}_1, \vec{n}_2 \in V^\perp$.)
19. * Show that the Gram-Schmidt process formula can be rearranged by computing
$$\vec{v}_i = \vec{u}_i - (\vec{u}_i\cdot\vec{w}_1)\,\vec{w}_1 - \cdots - (\vec{u}_i\cdot\vec{w}_{i-1})\,\vec{w}_{i-1} \quad\text{and}\quad \vec{w}_i = \frac{\vec{v}_i}{\|\vec{v}_i\|} \quad \text{for } i = 1, 2, \ldots, k \tag{100}$$
so that the projection in each step uses orthonormal vectors $\vec{w}_i$, rather than the orthogonal vectors $\vec{v}_i$ as in (92).
20. * Prove that (100) is equivalent to
$$\underbrace{\begin{bmatrix} | & | & | & & | \\ \vec{u}_1 & \vec{u}_2 & \vec{u}_3 & \cdots & \vec{u}_k \\ | & | & | & & | \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} | & | & | & & | \\ \vec{w}_1 & \vec{w}_2 & \vec{w}_3 & \cdots & \vec{w}_k \\ | & | & | & & | \end{bmatrix}}_{Q}\;\underbrace{\begin{bmatrix} \|\vec{v}_1\| & \vec{u}_2\cdot\vec{w}_1 & \vec{u}_3\cdot\vec{w}_1 & \cdots & \vec{u}_k\cdot\vec{w}_1 \\ 0 & \|\vec{v}_2\| & \vec{u}_3\cdot\vec{w}_2 & \cdots & \vec{u}_k\cdot\vec{w}_2 \\ 0 & 0 & \|\vec{v}_3\| & \cdots & \vec{u}_k\cdot\vec{w}_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \|\vec{v}_k\| \end{bmatrix}}_{R}.$$
This is known as a QR decomposition of the matrix $A$, which is of great importance in numerical linear algebra.
21. * Use results from Example 6.18 to obtain a $4 \times 3$ matrix $Q$ with orthonormal columns and a $3 \times 3$ upper triangular matrix $R$ such that $A = \begin{bmatrix} 1 & 2 & 7 \\ -2 & 4 & -1 \\ 1 & 0 & 9 \\ 3 & -3 & 9 \end{bmatrix} = QR$ (see Exercise 20).
22. * An alternative representation, also referred to as a QR decomposition in some sources, involves $A = \begin{bmatrix} Q \mid \widetilde{Q} \end{bmatrix}\begin{bmatrix} R \\ 0 \end{bmatrix}$ where $\begin{bmatrix} Q \mid \widetilde{Q} \end{bmatrix}$ is an $n \times n$ orthogonal matrix. In this case, columns of $\widetilde{Q}$ form an orthonormal basis for the orthogonal complement of the column space of $A$. Append the appropriate fourth column to the matrix $Q$ found in Exercise 21 so that the resulting matrix is orthogonal. Verify that when multiplying this $4 \times 4$ matrix by the matrix obtained by adding a zero row at the bottom of the "old" matrix $R$, the result continues to be $A$.
23. * Show that the matrix of orthogonal projection onto the column space of $A$, $B = A(A^TA)^{-1}A^T$, of (98) is
• idempotent: $B^2 = B$ and
• symmetric: $B^T = B$.
6.4 Introduction to Singular Value Decomposition

This section will introduce the notions of singular values and singular vectors. It includes some key theoretical results but will mostly be focused on developing a thorough geometric appreciation for these notions when applied to $2 \times 2$ matrices. In Section 7.4, we will develop a systematic approach to computing singular values and vectors for a general $m \times n$ matrix.

EXAMPLE 6.22 Consider the following $2 \times 2$ matrices
$$A = \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}$$
along with their associated linear transformations
$$F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad\text{and}\quad G\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = B\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.$$
In Section 1.4, on p. 39 we discussed how the columns of the $2 \times 2$ matrix lead to a geometric visualization of the corresponding linear transformation, where each square on the input side is transformed into a parallelogram on the output side. The given $A$ and $B$ can be illustrated as follows:
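[Figure: the unit square spanned by $\vec{e}_1$ and $\vec{e}_2$ in the $x_1x_2$-plane and its images in the $y_1y_2$-plane: the parallelogram spanned by $F(\vec{e}_1)$, $F(\vec{e}_2)$ and the parallelogram spanned by $G(\vec{e}_1)$, $G(\vec{e}_2)$.]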
As you can see, some output vectors resulting from both $F$ and $G$ appear to be much longer than others, depending on the direction of the input (even if we restrict ourselves to input vectors of the same length). For example $F(\vec{e}_1)$ is three times as long as $F(\vec{e}_2)$, even though $\|\vec{e}_1\| = \|\vec{e}_2\| = 1$. Let us scrutinize this in more detail by studying the set of output vectors resulting from the unit input vectors only. In other words, the question is: what set does the unit circle
$$x_1^2 + x_2^2 = 1 \tag{101}$$
map into on the $y_1y_2$-plane?

• Answering this for $A$ is quite simple. We have
$$3\sqrt{2}\,x_1 = y_1 \quad\text{and}\quad \sqrt{2}\,x_2 = y_2;$$
therefore,
$$x_1 = \frac{y_1}{3\sqrt{2}} \quad\text{and}\quad x_2 = \frac{y_2}{\sqrt{2}}.$$
Substituting into equation (101) we obtain
$$\frac{y_1^2}{18} + \frac{y_2^2}{2} = 1,$$
which describes an ellipse centered at the origin, with the horizontal semimajor axis and the vertical semiminor axis of lengths $\sqrt{18} = 3\sqrt{2}$ and $\sqrt{2}$, respectively:
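[Figure: the unit circle in the $x_1x_2$-plane and its image under $F$, an ellipse in the $y_1y_2$-plane with semi-axes $3\sqrt{2}$ and $\sqrt{2}$; the vectors $\vec{e}_1$, $\vec{e}_2$, $F(\vec{e}_1)$, $F(\vec{e}_2)$ are marked.]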
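• For $B$, the algebra is a little more involved. The linear system
$$\begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
has the augmented matrix $\left[\begin{array}{cc|c} 3 & 3 & y_1 \\ -1 & 1 & y_2 \end{array}\right]$, whose reduced row echelon form is $\left[\begin{array}{cc|c} 1 & 0 & \frac{1}{6}y_1 - \frac{1}{2}y_2 \\ 0 & 1 & \frac{1}{6}y_1 + \frac{1}{2}y_2 \end{array}\right]$. Consequently, we have
$$x_1 = \frac{1}{6}y_1 - \frac{1}{2}y_2, \qquad x_2 = \frac{1}{6}y_1 + \frac{1}{2}y_2.$$
Substituting into (101) yields
$$\frac{y_1^2}{36} - \frac{y_1y_2}{6} + \frac{y_2^2}{4} + \frac{y_1^2}{36} + \frac{y_1y_2}{6} + \frac{y_2^2}{4} = 1,$$
which simplifies to the same ellipse equation we have obtained above:
$$\frac{y_1^2}{18} + \frac{y_2^2}{2} = 1. \tag{102}$$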
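The claim that $G$ maps the unit circle onto the ellipse (102) can be spot-checked numerically. The following sketch samples points on the unit circle at one-degree steps (an arbitrary choice made for this illustration) and measures how far each image is from satisfying (102); the function names are hypothetical.

```python
import math

B = [[3.0, 3.0], [-1.0, 1.0]]

def apply(M, x):
    """Multiply a 2 x 2 matrix by a 2-vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def ellipse_residual(y):
    """Left-hand side of (102) minus 1; zero exactly on the ellipse."""
    return y[0] ** 2 / 18 + y[1] ** 2 / 2 - 1

residuals = []
for k in range(360):
    t = math.radians(k)
    y = apply(B, [math.cos(t), math.sin(t)])   # image of a unit vector
    residuals.append(abs(ellipse_residual(y)))

max_residual = max(residuals)
print(max_residual)
```

A maximum residual at the level of floating-point rounding is consistent with the algebra above; it is a sanity check on sampled points, not a proof.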
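[Figure: the unit circle in the $x_1x_2$-plane and its image under $G$, the same ellipse in the $y_1y_2$-plane with semi-axes $3\sqrt{2}$ and $\sqrt{2}$; the vectors $\vec{e}_1$, $\vec{e}_2$, $G(\vec{e}_1)$, $G(\vec{e}_2)$ are marked.]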
Let us now examine which specific unit input vectors map to the longest and shortest outputs (i.e., major and minor axes, respectively).
• Of all unit input vectors for $F$,
  • vectors $\pm\vec{e}_1$ transform into the longest possible vectors ($\pm 3\sqrt{2}\,\vec{e}_1$),
  • vectors $\pm\vec{e}_2$ transform into the shortest ones ($\pm\sqrt{2}\,\vec{e}_2$).
• For $G$, we can use (102) to determine the following:
  • The longest output vectors $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \pm\begin{bmatrix} 3\sqrt{2} \\ 0 \end{bmatrix}$ result from $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \pm\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$.
  • The shortest output vectors $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \pm\begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}$ result from $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \pm\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$.

Note that the patterns we established with respect to the unit input vectors $\vec{u}$ will continue to apply to other vectors $k\vec{u}$ since $\|F(k\vec{u})\| = |k|\,\|F(\vec{u})\|$.

Let $A$ be an $m \times n$ matrix with the corresponding linear transformation $F : \mathbb{R}^n \to \mathbb{R}^m$, $F(\vec{x}) = A\vec{x} = \vec{y}$. To generalize the discussion of the example above to apply to this situation, consider the following construction:

• Step 1. Of all the input vectors $\vec{x}$ such that $\|\vec{x}\| = 1$, find a vector $\vec{v}_1$ that leads to the longest output $A\vec{x}$; i.e.,
$$\|A\vec{v}_1\| = \max_{\substack{\vec{x} \in \mathbb{R}^n \\ \|\vec{x}\| = 1}} \|A\vec{x}\|.$$
• Step 2. Let $W_2 = (\operatorname{span}\{\vec{v}_1\})^\perp$, the orthogonal complement (see p. 266) of the span of the vector we found in Step 1. Find a vector $\vec{v}_2$ maximizing the length of $A\vec{x}$ for all unit vectors in $W_2$:
$$\|A\vec{v}_2\| = \max_{\substack{\vec{x} \in W_2 \\ \|\vec{x}\| = 1}} \|A\vec{x}\|.$$
• Step $i$. Let $W_i = (\operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_{i-1}\})^\perp$. Let $\vec{v}_i$ be a vector maximizing the length of $A\vec{x}$ for all unit vectors in $W_i$:
$$\|A\vec{v}_i\| = \max_{\substack{\vec{x} \in W_i \\ \|\vec{x}\| = 1}} \|A\vec{x}\|.$$

The procedure is carried out for $i = 1, 2, \ldots, n$. Note that in the final step $W_n$ has dimension 1 (straight line through the origin), so that there are only two vectors $\vec{x}$ such that $\|\vec{x}\| = 1$ and $\vec{x} \in W_n$, one being the negative of the other. Either one of them can be taken to be the final $\vec{v}_n$.

While it may not be immediately obvious how one should go about choosing the appropriate $\vec{v}_i$ vectors desired here, such vectors always exist$^{17}$ (although they are not unique). The following result states crucial properties of these vectors.
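For a $2 \times 2$ matrix, the construction above can be imitated by brute force. The sketch below is only an approximation by grid search over directions, not an exact or systematic method (the text defers that to Section 7.4); the function name `greedy_directions_2x2` and the sample count are assumptions made for this illustration.

```python
import math

def norm_of_image(M, x):
    """Length of M x for a 2 x 2 matrix M and a 2-vector x."""
    y0 = M[0][0] * x[0] + M[0][1] * x[1]
    y1 = M[1][0] * x[0] + M[1][1] * x[1]
    return math.hypot(y0, y1)

def greedy_directions_2x2(M, samples=3600):
    # Step 1: search unit vectors for the longest output.
    # Half a circle suffices because x and -x give outputs of equal length.
    best_x, best_len = None, -1.0
    for k in range(samples):
        t = math.pi * k / samples
        x = (math.cos(t), math.sin(t))
        length = norm_of_image(M, x)
        if length > best_len:
            best_x, best_len = x, length
    v1 = best_x
    # Step 2: in R^2 the orthogonal complement of span{v1} is a line,
    # so the unit vector v2 is determined up to sign.
    v2 = (-v1[1], v1[0])
    return v1, best_len, v2, norm_of_image(M, v2)

B = [[3.0, 3.0], [-1.0, 1.0]]
v1, length1, v2, length2 = greedy_directions_2x2(B)
print(v1, length1)
print(v2, length2)
```

For the matrix $B$ of Example 6.22 the search lands on the direction at $45^\circ$, which happens to lie on the sampling grid, so the two lengths agree with $3\sqrt{2}$ and $\sqrt{2}$ up to rounding. For a general matrix the result is only as accurate as the grid allows.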
17
One way to justify this would be to realize that at every step of the procedure we are maximizing a continuous function of n variables over a closed and bounded nonempty domain in Rn – such a maximum is guaranteed to exist.
THEOREM 6.12 Let $A$ be an $m \times n$ matrix. If the vectors $\vec{v}_1, \ldots, \vec{v}_n$ satisfy the condition
$$\|A\vec{v}_i\| = \max_{\substack{\vec{x} \in W_i \\ \|\vec{x}\| = 1}} \|A\vec{x}\| \quad \text{for } i = 1, 2, \ldots, n$$
where $W_1 = \mathbb{R}^n$ and $W_i = (\operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_{i-1}\})^\perp$ for $i = 2, \ldots, n$, then
1. the numbers $\|A\vec{v}_i\|$ form a nonincreasing sequence: $\|A\vec{v}_1\| \ge \|A\vec{v}_2\| \ge \cdots \ge \|A\vec{v}_n\|$,
2. the vectors $\vec{v}_1, \ldots, \vec{v}_n$ form an orthonormal basis for $\mathbb{R}^n$,
3. the vectors $A\vec{v}_1, \ldots, A\vec{v}_n$ are orthogonal.

PROOF
1. Assume there are two integers $i < j$ such that $\|A\vec{v}_i\| < \|A\vec{v}_j\|$. Since $\vec{v}_j \in W_i$, this leads to a contradiction as $\max\limits_{\substack{\vec{x} \in W_i \\ \|\vec{x}\| = 1}} \|A\vec{x}\| \ge \|A\vec{v}_j\|$.
2. By construction, $\vec{v}_i \in W_i = (\operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_{i-1}\})^\perp$ so that $\vec{v}_i$ is orthogonal to every vector in $\operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_{i-1}\}$, including $\vec{v}_1, \ldots, \vec{v}_{i-1}$ themselves. Furthermore, $\|\vec{v}_i\| = 1$ so that the set is orthonormal. Finally, since the set contains $n = \dim \mathbb{R}^n$ linearly independent vectors (by Theorem 6.1), it follows from Theorem 4.16 that it is a basis for $\mathbb{R}^n$.
3. Consider a vector $\vec{w} = \alpha\vec{v}_i + \beta\vec{v}_j$ with $\alpha^2 + \beta^2 = 1$ and $1 \le i < j \le n$. Then we must have
$$\|A\vec{v}_i\| \ge \|A\vec{w}\|$$
as $\vec{w} \in W_i$ and $\|\vec{w}\| = 1$. Squaring both sides yields
$$\begin{aligned}
\vec{v}_i^TA^TA\vec{v}_i &\ge \alpha^2\,\vec{v}_i^TA^TA\vec{v}_i + 2\alpha\beta\,\vec{v}_i^TA^TA\vec{v}_j + \beta^2\,\vec{v}_j^TA^TA\vec{v}_j \\
\beta^2\left(\vec{v}_i^TA^TA\vec{v}_i - \vec{v}_j^TA^TA\vec{v}_j\right) &\ge 2\alpha\beta\,\vec{v}_i^TA^TA\vec{v}_j \\
\vec{v}_i^TA^TA\vec{v}_i - \vec{v}_j^TA^TA\vec{v}_j &\ge \frac{2\alpha}{\beta}\,\vec{v}_i^TA^TA\vec{v}_j \quad \text{(assuming } \beta \ne 0\text{)}.
\end{aligned}$$
If $\vec{v}_i^TA^TA\vec{v}_j \ne 0$, then we reach a contradiction, as the right-hand side of this inequality can be made arbitrarily large by choosing $\beta$ sufficiently close to 0 (and the sign of $\alpha\beta$ to be the same as the sign of $\vec{v}_i^TA^TA\vec{v}_j$). Therefore, for this inequality to be satisfied for all $\vec{w}$, we must have
$$\vec{v}_i^TA^TA\vec{v}_j = (A\vec{v}_i)\cdot(A\vec{v}_j) = 0.$$

Part 1 of Theorem 6.12 implies that once a zero shows up in the sequence $\|A\vec{v}_i\|$, then all of its remaining elements are zero as well. If we let $\|A\vec{v}_r\|$ be the last nonzero element, then according to part 3 of the theorem the vectors
$$\vec{u}_1 = \frac{1}{\|A\vec{v}_1\|}\,A\vec{v}_1, \;\ldots,\; \vec{u}_r = \frac{1}{\|A\vec{v}_r\|}\,A\vec{v}_r$$
are orthonormal.

If $r < m$, then we can construct $m - r$ vectors $\vec{u}_{r+1}, \ldots, \vec{u}_m$ by first obtaining a basis for $(\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_r\})^\perp$ (see Example 6.13), then using the Gram-Schmidt process to convert that basis to an orthonormal one.

Vectors $\vec{u}_1, \ldots, \vec{u}_m$ assembled in this fashion form an orthonormal basis for $\mathbb{R}^m$.
DEFINITION (Singular Value Decomposition) Let $r$ be the index of the last nonzero value in the $\|A\vec{v}_i\|$ sequence; i.e.,
$$\|A\vec{v}_1\| \ge \cdots \ge \|A\vec{v}_r\| > \|A\vec{v}_{r+1}\| = \cdots = \|A\vec{v}_n\| = 0.$$
We shall refer to
$$\sigma_1 = \|A\vec{v}_1\|, \;\ldots,\; \sigma_r = \|A\vec{v}_r\|$$
as well as $\sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0$ as singular values of $A$.

The orthonormal vectors $\vec{v}_1, \ldots, \vec{v}_n$ are called the right singular vectors of $A$.

The left singular vectors of $A$ are the orthonormal vectors $\vec{u}_1, \ldots, \vec{u}_m$ such that
• $\vec{u}_i = \dfrac{1}{\sigma_i}\,A\vec{v}_i$ for $i = 1, \ldots, r$,
• if $r < m$, then $\vec{u}_{r+1}, \ldots, \vec{u}_m$ is an orthonormal basis for $(\operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_r\})^\perp$.

Matrix form of SVD
A singular value decomposition (abbreviated SVD) of $A$ is often convenient to express in matrix form.
THEOREM 6.13 For every $m \times n$ matrix $A$, there exists an orthogonal $m \times m$ matrix $U$, an orthogonal $n \times n$ matrix $V$, and a rectangular diagonal $m \times n$ matrix $\Sigma$ such that
\[ AV = U\Sigma \tag{103} \]
or, equivalently,
\[ A = U\Sigma V^T. \tag{104} \]
PROOF Populating the first $r$ main diagonal entries of an $m \times n$ rectangular diagonal matrix with singular values of $A$ we obtain
\[ \Sigma = \left[ \begin{array}{cccc|c} \sigma_1 & 0 & \cdots & 0 & \\ 0 & \sigma_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \\ 0 & 0 & \cdots & \sigma_r & \\ \hline & & 0 & & 0 \end{array} \right] \tag{105} \]
where the upper-left diagonal block occupies $r$ rows and $r$ columns, and the zero blocks fill the remaining $m - r$ rows and $n - r$ columns.
Use left singular vectors as columns for an $m \times m$ matrix $U$ and right singular vectors as columns for an $n \times n$ matrix $V$:
\[ U = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_m \end{bmatrix}, \qquad V = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}. \]
Both matrices are orthogonal because their columns are orthonormal.
We can write
\[ AV = \begin{bmatrix} A\vec{v}_1 & \cdots & A\vec{v}_r & A\vec{v}_{r+1} & \cdots & A\vec{v}_n \end{bmatrix} \]
and
\[ U\Sigma = \begin{bmatrix} \sigma_1\vec{u}_1 & \cdots & \sigma_r\vec{u}_r & \vec{0} & \cdots & \vec{0} \end{bmatrix}. \]
The first $r$ left singular vectors have been defined so as to make the corresponding columns in both matrices equal. The remaining $n - r$ columns equal $\vec{0}$ since we have $A\vec{v}_i = \vec{0}$ for all $i > r$. The equality (104) is obtained from (103) by postmultiplying it on both sides by $V^T = V^{-1}$.
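EXAMPLE 6.23
Let us recast our findings of Example 6.22 using the terminology of SVD.
Both matrices, $A$ and $B$, share the same singular value sequence:
\[ \sigma_1 = 3\sqrt{2}, \qquad \sigma_2 = \sqrt{2}. \]
For the matrix $A$, right singular vectors could be taken as $\vec{v}_1 = \vec{e}_1$ and $\vec{v}_2 = \vec{e}_2$, and left singular vectors are also $\vec{u}_1 = \vec{e}_1$ and $\vec{u}_2 = \vec{e}_2$. The resulting SVD is
\[ \underbrace{\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{V} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}}_{\Sigma}. \]
For $B$, taking $\vec{v}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$, $\vec{u}_1 = \vec{e}_1$, and $\vec{u}_2 = \vec{e}_2$ yields a more interesting SVD
\[ \underbrace{\begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}}_{B} \underbrace{\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}}_{V} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}}_{\Sigma} \]
(verify this by multiplying the matrices out on each side).
The geometric interpretation of this result is that the transformation $G(\vec{x}) = B\vec{x}$ can be expressed as
\[ G(\vec{x}) = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}}_{V^T} \vec{x}. \]
Recall from Example 1.23 that the multiplication $\begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ rotates the vector $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ counterclockwise by $\alpha$. Therefore, $V^T\vec{x}$ rotates $\vec{x}$ by $-45^\circ$ (or, $45^\circ$ clockwise). Then, premultiplying by $\Sigma$ amounts to scaling the (new) horizontal component by the factor $3\sqrt{2}$ and scaling the vertical by $\sqrt{2}$. The final multiplication by $U = I$ leaves the vector unchanged.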
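The "verify this by multiplying the matrices out" step of Example 6.23 can also be done numerically. The following is only a minimal sketch using plain Python lists; the helper names (`matmul`, `transpose`, `close`) are illustrative choices and not part of the text, and the entries of $B$ are taken to be rows $(3, 3)$ and $(-1, 1)$, the reading consistent with the stated singular vectors.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison with a small floating-point tolerance."""
    return all(abs(a - b) <= tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

s = 1 / math.sqrt(2)
B = [[3.0, 3.0], [-1.0, 1.0]]                            # matrix B of Example 6.23
V = [[s, -s], [s, s]]                                    # columns are v1, v2
U = [[1.0, 0.0], [0.0, 1.0]]                             # columns are u1 = e1, u2 = e2
Sigma = [[3 * math.sqrt(2), 0.0], [0.0, math.sqrt(2)]]   # singular values on the diagonal

svd_103_holds = close(matmul(B, V), matmul(U, Sigma))                      # BV = U Sigma
svd_104_holds = close(B, matmul(matmul(U, Sigma), transpose(V)))           # B = U Sigma V^T
print(svd_103_holds, svd_104_holds)
```

Both comparisons use a tolerance because $1/\sqrt{2}$ is not exactly representable in floating point.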
Geometry of SVD in general case
Let $A$ be an $m \times n$ matrix. Using a singular value decomposition of $A$, the linear transformation $\vec{y} = A\vec{x}$ can be rewritten as
\[ \vec{y} = U\Sigma V^T\vec{x} \quad \text{or} \quad U^T\vec{y} = \Sigma V^T\vec{x}. \]
Let us denote $\vec{y}^{\,*} = U^T\vec{y}$ and $\vec{x}^{\,*} = V^T\vec{x}$. Since $U$ and $V$ (as well as their transposes) are orthogonal matrices, then by Theorem 6.4 and Exercise 37 on p. 276, multiplication by each of these matrices preserves lengths of vectors and angles between them.
Consider the effect of $\vec{y} = A\vec{x}$ when applied to the set of unit vectors $\{\vec{x} \mid \|\vec{x}\| = 1\}$. This set could be described as a hypersphere in $\mathbb{R}^n$: a generalization of a circle $x_1^2 + x_2^2 = 1$ in $\mathbb{R}^2$ and a sphere $x_1^2 + x_2^2 + x_3^2 = 1$ in $\mathbb{R}^3$.
• Performing $\vec{x}^{\,*} = V^T\vec{x}$ results in another hypersphere of the same size (individual directions may change from $\vec{x}$ to $\vec{x}^{\,*}$, but lengths and angles do not). When $\vec{x}$ equals the $i$th column of $V$ (i.e., the $i$th row of $V^T$), it is transformed into $\vec{x}^{\,*} = \vec{e}_i$.
• The step $\vec{y}^{\,*} = \Sigma\vec{x}^{\,*}$ scales each of the axes in the $\vec{x}^{\,*}$ coordinate system by the respective singular value. This yields a hyperellipsoid: a generalization of an ellipse in $\mathbb{R}^2$ and an ellipsoid in $\mathbb{R}^3$. The principal axes of this hyperellipsoid correspond to the axes of the $\vec{y}^{\,*}$ coordinate system (the first $\min(m, n)$ axes in the $\vec{x}^{\,*}$ and $\vec{y}^{\,*}$ coordinate systems are the same, except that the longer vectors are filled with zeros at the end). A zero singular value corresponds to "flattening" the hyperellipsoid in the respective direction.
• Finally, $\vec{y} = U\vec{y}^{\,*}$ results in another hyperellipsoid of the same shape, but in a different direction. Each axis of the $\vec{y}^{\,*}$ coordinate system (principal axis of the hyperellipsoid) transforms into a column of $U$ in the $\vec{y}$ coordinate system.
The following diagram summarizes the relationship between the vectors $\vec{x}$, $\vec{x}^{\,*}$, $\vec{y}^{\,*}$, and $\vec{y}$.
\[ \begin{array}{ccc} \vec{x} & \xrightarrow{\ A\ } & \vec{y} \\ \downarrow {\scriptstyle V^T \text{ (preserves lengths and angles)}} & & \uparrow {\scriptstyle U \text{ (preserves lengths and angles)}} \\ \vec{x}^{\,*} & \xrightarrow{\ \Sigma \text{ (scales each principal axis by singular value)}\ } & \vec{y}^{\,*} \end{array} \]
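The three stages $\vec{x} \to \vec{x}^{\,*} \to \vec{y}^{\,*} \to \vec{y}$ can be traced numerically. The sketch below does so for the matrix $B$ of Example 6.23 on a few unit vectors; it assumes the factors $U$, $\Sigma$, $V$ given in that example, and the function name `svd_stages` is a hypothetical helper introduced only for illustration.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

s = 1 / math.sqrt(2)
B = [[3.0, 3.0], [-1.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
sigma1, sigma2 = 3 * math.sqrt(2), math.sqrt(2)
Sigma = [[sigma1, 0.0], [0.0, sigma2]]
V = [[s, -s], [s, s]]

def svd_stages(x):
    """Return (x*, y*, y) for the three steps x* = V^T x, y* = Sigma x*, y = U y*."""
    x_star = matvec(transpose(V), x)   # rotation: lengths and angles preserved
    y_star = matvec(Sigma, x_star)     # scaling along the coordinate axes
    y = matvec(U, y_star)              # final orthogonal step (identity here)
    return x_star, y_star, y

max_error = 0.0
for k in range(8):
    angle = 2 * math.pi * k / 8
    x = [math.cos(angle), math.sin(angle)]          # a unit vector on the circle
    x_star, y_star, y = svd_stages(x)
    direct = matvec(B, x)
    on_ellipse = (y_star[0] / sigma1) ** 2 + (y_star[1] / sigma2) ** 2
    max_error = max(max_error,
                    abs(norm(x_star) - 1.0),        # x* is still a unit vector
                    abs(on_ellipse - 1.0),          # y* lies on the ellipse
                    abs(y[0] - direct[0]),
                    abs(y[1] - direct[1]))          # staged result equals Bx
print(max_error < 1e-12)
```

The ellipse test uses the semi-axes $\sigma_1$ and $\sigma_2$, which is the two-dimensional case of the hyperellipsoid described above.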
EXERCISES
In Exercises 1–2, we are given the images resulting from applying the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ to the position vectors of each point shown in the margin. Based on these plots, identify both singular values of each matrix $A$ – all values in these exercises are known to be integers 0, 1, 2, 3, and 4.
1. [Figure: the unit vectors e1, e2 in the (x1, x2) plane and, in panels a–d, their images F(e1), F(e2) in the (y1, y2) plane; in panel c, F(e2) = 0.]
2. [Figure: the unit vectors e1, e2 in the (x1, x2) plane and, in panels a–d, their images F(e1), F(e2) in the (y1, y2) plane.]
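In Exercises 3–4, plot a parallelogram and an inscribed ellipse corresponding to each transformation $F\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = A\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ specified by the given matrix $A$. Based on these plots, identify both singular values of each matrix $A$ – all values in these exercises are known to be integers 0, 1, 2, 3, 4, and 5.
3. a. $\begin{bmatrix} -3 & -1 \\ 1 & 3 \end{bmatrix}$; b. $\begin{bmatrix} 3 & -2 \\ 2 & -3 \end{bmatrix}$; c. $\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$; d. $\begin{bmatrix} 0 & 3 \\ -5 & 0 \end{bmatrix}$.
4. a. $\begin{bmatrix} 1 & -4 \\ 4 & -1 \end{bmatrix}$; b. $\begin{bmatrix} 2 & -2 \\ 2 & 1 \end{bmatrix}$; c. $\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$; d. $\begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix}$.
While generally left singular vectors are not uniquely defined, there is only one way to match each listed pair in Exercises 5–8 to different transformations. Make sure you pay attention to the order: the first vector listed within each pair, $\vec{u}_1$, must correspond to the larger singular value $\sigma_1$.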
→ − → − 5. Match transformation of Exercise 1 with thepair of left singular vectors u1 , u 2 : each 1 −1 −2 1 √ √ √ √ 0 1 1 0 2 2 5 5 , ; ii. , ; iii. , ; iv. , . i. 1 1 1 2 √ √ √ √ 1 0 0 1 2 2 5 5 − → − → 6. Match of Exercise 2 withthe pair of left singular vectors u1, u2 : each transformation −2 1 1 2 1 −1 √ √ √ √ √ √ 1 0 5 5 5 5 2 2 , ; iii. , ; iv. , . i. , ; ii. −2 √1 √2 √ √1 √1 √1 0 1 5 5 5 5 2 2 − → − → 7. Match of Exercise 3 withthe pair of left singular vectors u1 , u 2 : each transformation −1 √1 √1 √1 √ 0 1 1 0 2 2 2 2 , ; iii. , ; iv. , . i. , ; ii. −1 √ √1 √1 √1 1 0 0 1 2 2 2 2 − → − →: 8. Match 2 transformation of Exercise 4 with thepair of left singular vectors u1 , u each i.
To complete Exercises 9 and 10, you will need the solutions of Exercises 1, 3, 5, and 7. (These can be found in Appendix A.)
√1 5 √2 5
,
√2 5 −1 √ 5
; ii.
√2 5 √1 5
,
√1 5 √2 5
; iii.
√1 2 √1 2
,
√1 2 −1 √ 2
; iv.
√1 2 −1 √ 2
,
√1 2 √1 2
9. For each transformation of Exercise 1, with the left singular vectors chosen in Exercise 5, − → → − select vectors v1, v2 : the corresponding pair of right singular −1 −2 √1 √ √1 √ −1 0 0 1 2 2 5 , ; iii. , ; iv. , −15 . i. , ; ii. −2 √1 √1 √ √ 0 −1 1 0 2 2 5 5 Set up the matrices U, Σ, and V, and then verify that the equality (104) holds. 10. For each transformation of Exercise 3, with the left singular vectors chosen in Exercise 7, − → → − select pair of right singular vectors v1 , v2: the corresponding −1 −1 −1 √1 √ √ √ −1 0 1 0 2 2 2 2 , −1 ; ii. , ; iii. , ; iv. , . i. −1 −1 √ √ √ √1 0 1 0 −1 2 2 2 2 Set up the matrices U, Σ, and V, and then verify that the equality (104) holds.
6.5 Chapter Review
Chapter 7 Eigenvalues and Singular Values
One of the leading themes in our study of linear algebra has been a thorough investigation of the relationship
\[ A\vec{v} = \vec{w}. \]
Sometimes, as in Section 1.4, we assume both the matrix $A$ and the vector $\vec{v}$ are given and analyze the vector $\vec{w}$ as the outcome of the multiplication (i.e., linear transformation). At other times, as in Chapter 2, we consider the right-hand side vector $\vec{w}$ to be given, along with the coefficient matrix $A$, while focusing on solving for the unknown vector $\vec{v}$.
In most of this chapter, we shall pursue yet another question:
Given a matrix $A$, which vectors $\vec{v}$ produce results $A\vec{v} = \vec{w}$ that point in the same direction as the original vector $\vec{v}$? (106)
If $A$ is a rectangular $m \times n$ matrix with $m \ne n$, then the vectors $\vec{v}$ and $\vec{w}$ "live" in two different vector spaces ($\mathbb{R}^n$ and $\mathbb{R}^m$, respectively), making it impossible for $\vec{v}$ and $\vec{w}$ to be in the same direction. Consequently, (106) requires $A$ to be a square matrix.
7.1 Eigenvalues and Eigenvectors
The fundamental question (106) leads to the following definition:
DEFINITION (Eigenvalue and Eigenvector)
Let $A$ be an $n \times n$ matrix. A nonzero $n$-vector $\vec{v}$ is called an eigenvector of $A$ if
\[ A\vec{v} = \lambda\vec{v} \tag{107} \]
for some scalar $\lambda$, which is called an eigenvalue associated with the eigenvector $\vec{v}$.
[Margin figure: vectors u, Au, v, Av.] $\vec{v}$ is an eigenvector of $A$ associated with an eigenvalue $\lambda = 2$ (since $A\vec{v} = 2\vec{v}$), while $\vec{u}$ is not a real eigenvector of $A$ (since $A\vec{u} \ne \lambda\vec{u}$ for any real-valued $\lambda$).
Note that if it weren't for the restriction $\vec{v} \ne \vec{0}$, then the above definition would lead to every square matrix having all real numbers as its eigenvalues ($A\vec{0} = x\vec{0}$ for all $x$). Therefore, for an eigenvalue to be properly defined, eigenvectors must never equal zero (however, an eigenvalue itself can be zero).
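EXAMPLE 7.1
Consider the linear transformation
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \]
This is the orthogonal projection onto the $x$-axis in $\mathbb{R}^2$ already introduced in Section 1.4 in Example 1.19.
• It should be clear that any vector $\begin{bmatrix} a \\ 0 \end{bmatrix}$ (positioned along the $x$-axis) will be transformed into itself. Any such nonzero vector is an eigenvector of the matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ associated with the eigenvalue $\lambda = 1$ since
\[ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} a \\ 0 \end{bmatrix} = (1) \begin{bmatrix} a \\ 0 \end{bmatrix}. \]
• While it may not be immediately obvious that any nonzero vector $\begin{bmatrix} 0 \\ b \end{bmatrix}$ along the $y$-axis is also an eigenvector of $A$, it follows from
\[ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = (0) \begin{bmatrix} 0 \\ b \end{bmatrix}. \]
These eigenvectors are all associated with the eigenvalue $\lambda = 0$.

EXAMPLE 7.2
Another linear transformation in $\mathbb{R}^2$ defined in Section 1.4,
\[ F\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \]
performed a counterclockwise rotation by 90 degrees (see Example 1.22).
No nonzero vector remains pointed in its original direction after this transformation! Consequently, we are unable to identify real eigenvalues or corresponding eigenvectors in this case.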
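The definition (107) translates directly into a test that can be applied to a candidate vector. The following is a minimal sketch, not a method from the text: the function name `eigenvalue_if_eigenvector` and the tolerance are assumptions made for illustration. It returns the real eigenvalue when $A\vec{v} = \lambda\vec{v}$ holds for a nonzero $\vec{v}$ and `None` otherwise.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvalue_if_eigenvector(A, v, tol=1e-12):
    """Return lambda if A v = lambda v for the nonzero vector v, else None."""
    if all(abs(x) <= tol for x in v):
        return None                      # the zero vector is never an eigenvector
    w = matvec(A, v)
    i = max(range(len(v)), key=lambda k: abs(v[k]))   # index of a nonzero entry
    lam = w[i] / v[i]                    # the only possible value of lambda
    if all(abs(wi - lam * vi) <= tol for wi, vi in zip(w, v)):
        return lam
    return None

projection = [[1, 0], [0, 0]]            # Example 7.1
rotation = [[0, -1], [1, 0]]             # Example 7.2

print(eigenvalue_if_eigenvector(projection, [5, 0]))   # vector along the x-axis
print(eigenvalue_if_eigenvector(projection, [0, 3]))   # vector along the y-axis
print(eigenvalue_if_eigenvector(projection, [1, 1]))   # not an eigenvector
print(eigenvalue_if_eigenvector(rotation, [1, 0]))     # rotated off its own line
```

Note that an eigenvalue of zero is a legitimate return value, so callers should compare the result against `None` rather than rely on truthiness.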
“Eyeballing” eigenvalues and eigenvectors works fine for simple transformations that can be readily visualized. On the other hand, we will frequently want to find eigenvalues and eigenvectors for more complex transformations, often involving matrices of size 3 × 3 or larger. In the following subsection, we shall develop a mechanism to address such cases.
Procedure for finding eigenvalues and eigenvectors
Let us begin with the simplest possible case: for a $1 \times 1$ matrix $A = [a]$ and $\vec{v} = [x]$, (107) becomes a scalar equation
\[ ax = \lambda x. \]
We can solve it for $\lambda$ and $x$ as follows:
\[ \lambda x - ax = 0, \]
\[ (\lambda - a)x = 0. \]
Since the definition requires $x \ne 0$, we have the eigenvalue $\lambda = a$ that corresponds to an eigenvector $\vec{v} = [x]$ where $x$ is any nonzero number.
If $A$ is an $n \times n$ matrix, equation (107) can be rewritten as
\[ \lambda\vec{v} - A\vec{v} = \vec{0} \]
but, unlike the $1 \times 1$ case discussed above, it cannot be written in the form $(\lambda - A)\vec{v} = \vec{0}$, because the difference $\lambda - A$ of a scalar and a matrix cannot be evaluated.
Instead, let us premultiply $\vec{v}$ by $I_n$ (the neutral element of matrix multiplication) so that the equation
\[ \lambda I_n\vec{v} - A\vec{v} = \vec{0} \]
can be legitimately rewritten as
\[ (\lambda I_n - A)\vec{v} = \vec{0}. \tag{108} \]
Recall from the definition that if $\vec{v}$ is an eigenvector of $A$, it cannot be a zero vector. Therefore, such a $\vec{v}$ is a nontrivial solution of the homogeneous system (108). According to the equivalent conditions (see Appendix C), for the system to have such solutions, its coefficient matrix, $\lambda I_n - A$, must be singular, and
\[ \det(\lambda I_n - A) = 0. \tag{109} \]
This leads us to the following.
Procedure for finding eigenvalues and eigenvectors of a matrix $A$.
1. Find all eigenvalues $\lambda = \lambda_1, \lambda_2, \ldots$, i.e., solutions of the equation $\det(\lambda I_n - A) = 0$.
2. For each eigenvalue $\lambda_i$ identified in the first step, find the nontrivial solutions $\vec{v}$ of the system $(\lambda_i I_n - A)\vec{v} = \vec{0}$, which are the corresponding eigenvectors.
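For a $2 \times 2$ matrix the two steps of the procedure can be carried out in a few lines, since $\det(\lambda I_2 - A)$ is a quadratic in $\lambda$. The sketch below is restricted to real eigenvalues of $2 \times 2$ matrices; the function name `eigen_2x2` is hypothetical, and reading the eigenvector off a single nonzero row of $\lambda I_2 - A$ is a shortcut that works only in this small case.

```python
import math

def eigen_2x2(A, tol=1e-12):
    """Return [(eigenvalue, basis of eigenspace), ...] for real eigenvalues of a 2x2 matrix."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det       # discriminant of lambda^2 - trace*lambda + det
    if disc < -tol:
        return []                        # step 1 finds no real eigenvalues
    root = math.sqrt(max(disc, 0.0))
    eigenvalues = sorted({(trace + root) / 2, (trace - root) / 2}, reverse=True)
    result = []
    for lam in eigenvalues:
        # Step 2: rows of the singular coefficient matrix (lam*I - A).
        rows = [(lam - a, -b), (-c, lam - d)]
        row = max(rows, key=lambda r: abs(r[0]) + abs(r[1]))
        if abs(row[0]) + abs(row[1]) <= tol:
            basis = [(1.0, 0.0), (0.0, 1.0)]   # lam*I - A = 0: every vector solves the system
        else:
            basis = [(-row[1], row[0])]        # orthogonal to the nonzero row
        result.append((lam, basis))
    return result

print(eigen_2x2([[1, 0], [0, 0]]))    # projection onto the x-axis
print(eigen_2x2([[0, -1], [1, 0]]))   # rotation by 90 degrees
print(eigen_2x2([[1, 2], [2, 1]]))
```

Any nonzero scalar multiple of a returned basis vector is an equally valid eigenvector, so the particular vectors printed are one choice among many.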
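EXAMPLE 7.3
Let us apply this procedure to the matrix
\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \]
discussed in Example 7.1.
Step 1 of the procedure:
\[ \lambda I_2 - A = \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \lambda - 1 & 0 \\ 0 & \lambda \end{bmatrix}, \]
\[ \det(\lambda I_2 - A) = (\lambda - 1)\lambda. \]
The eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0$.
Step 2 of the procedure:
For $\lambda_1 = 1$, the homogeneous system (108) becomes
\[ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
The coefficient matrix has the reduced row echelon form $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ so that $x$ is arbitrary and $y = 0$. Therefore, the solution is
\[ \begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \]
Eigenvectors associated with $\lambda_1 = 1$ are all scalar multiples of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (except for the zero vector).
For $\lambda_2 = 0$, the homogeneous system (108) becomes
\[ \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
In this case, the coefficient matrix has the r.r.e.f. $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ with $y$ arbitrary and $x = 0$. We obtain the solution
\[ \begin{bmatrix} x \\ y \end{bmatrix} = y \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
Eigenvectors associated with $\lambda_2 = 0$ are all scalar multiples of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (except for the zero vector).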
EXAMPLE 7.4
Recall the $2 \times 2$ matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ of Example 7.2. $A$ has
\[ \det(\lambda I_2 - A) = \det \begin{bmatrix} \lambda & 1 \\ -1 & \lambda \end{bmatrix} = \lambda^2 + 1. \]
This confirms our conclusion from Example 7.2: the matrix has no real eigenvalues. It only has two complex eigenvalues: $\pm i$.
Our focus in this book will be on real eigenvalues and eigenvectors. However, in some examples, it will be necessary to acknowledge the presence of complex eigenvalues and eigenvectors.
Compare these findings to the result of Exercise 31 on p. 112 in Section 2.4.
To gain further insight into the geometry involved in the last two examples, let us picture
• a set of unit vectors $\vec{u}_i$ starting at the origin (using thin lines), along with
• the corresponding vectors $A\vec{u}_i$, each of which is translated to begin at the terminal point of $\vec{u}_i$ (we use thicker lines for these vectors).
The resulting "eigenpicture"18 is often sufficiently clear to help us "eyeball" the eigenvectors (where the "thin" vector is collinear with its "thick" extension) and eigenvalues (signed ratio of the lengths of the two).
[Figure: "Eigenpicture" for Example 7.3. Horizontal vectors are collinear with their extensions (eigenvalue = 1), while vertical vectors have zero extensions (eigenvalue = 0).]
[Figure: "Eigenpicture" for Example 7.4. No "thin" vectors are aligned with their extensions – no real eigenvalues.]

EXAMPLE 7.5
The matrix $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has
\[ \det(\lambda I_2 - A) = \det \begin{bmatrix} \lambda - 1 & -2 \\ -2 & \lambda - 1 \end{bmatrix} = (\lambda - 1)^2 - 4 = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1). \]

18 The term "eigenpicture" and the idea described here have been introduced by Steven Schonefeld in the article "Eigenpictures: Picturing the Eigenvector Problem" in the College Mathematics Journal, Vol. 26, No. 4 (Sept. 1995), 316–319.
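The two eigenvalues, $\lambda = 3$ and $\lambda = -1$, will now be scrutinized for the corresponding eigenvectors.
Compare these findings to the results in Exercise 32 on p. 112 in Section 2.4.
• For $\lambda_1 = 3$, the homogeneous system (108) has the coefficient matrix
\[ 3I_2 - A = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}, \]
whose reduced row echelon form is $\begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}$. The eigenvectors are the nontrivial solutions of the system, i.e., vectors
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ y \end{bmatrix} = y \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{with } y \ne 0. \tag{110} \]
• For $\lambda_2 = -1$, the homogeneous system (108) has the coefficient matrix
\[ -1I_2 - A = \begin{bmatrix} -2 & -2 \\ -2 & -2 \end{bmatrix}, \]
with r.r.e.f. $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$. The eigenvectors are
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ y \end{bmatrix} = y \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \text{with } y \ne 0. \]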
The eigenvalues and eigenvectors in the previous two examples were initially investigated geometrically, with an algebraic confirmation provided afterwards. The most recent example is more typical of how we are going to solve such problems in that the solution was obtained by purely algebraic means. However, such solutions still have important geometric meaning, as illustrated in the margin.
[Margin figure: the upper picture shows unit vectors u, including v1, v2, e1, e2; the lower picture shows their images F(u) = Au, including Av1, Av2, Ae1, Ae2.]
The upper picture contains unit vectors $\vec{u}$ in $\mathbb{R}^2$, whereas the lower picture features their images under the transformation, $A\vec{u}$.
Two vectors are made thicker than others: in the upper picture, these are $\vec{v}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$. Since $\vec{v}_1$ is among the eigenvectors in (110), the lower picture contains the image $A\vec{v}_1$ in the same direction as $\vec{v}_1$ and exactly three times as long as $\vec{v}_1$, just as we expected. Likewise, the image of $\vec{v}_2$, $A\vec{v}_2$, is in the opposite direction to $\vec{v}_2$, with equal magnitude (again, no surprise there).
Vectors in directions other than $\vec{v}_1$ or $\vec{v}_2$ are transformed to images that are not aligned with the original vectors anymore (e.g., we show $\vec{e}_1$ and $\vec{e}_2$ with their images).
If we wanted to use an "eigenpicture" to illustrate this example, then the portion of the picture near the eigenvectors with $\lambda = -1$ would not look clear, as the corresponding extensions would overlap the original vectors (with reverse directions). One way to help avoid this is to modify the eigenpicture so that whenever $A\vec{u}_i$ and $\vec{u}_i$ form an obtuse angle, $A\vec{u}_i$ is translated so that its terminal (rather than initial) point coincides with the terminal point of $\vec{u}_i$.
Characteristic polynomial and characteristic equation

For any $2 \times 2$ matrix $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$,
\[ \det(\lambda I_2 - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} \\ -a_{21} & \lambda - a_{22} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22}) - a_{12}a_{21} = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11}a_{22} - a_{12}a_{21} \]
is a quadratic polynomial in the variable $\lambda$ whose leading coefficient is 1.
If $A$ is a $3 \times 3$ matrix, then expanding $\det(\lambda I_3 - A)$ along the first row yields
\[ \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} \\ -a_{21} & \lambda - a_{22} & -a_{23} \\ -a_{31} & -a_{32} & \lambda - a_{33} \end{bmatrix} = (\lambda - a_{11}) \underbrace{A_{11}}_{\lambda^2 + \cdots} + \underbrace{(-a_{12})A_{12} + (-a_{13})A_{13}}_{\text{degree} < 3} = \lambda^3 + (\text{terms of degree} < 3). \]
Repeating this argument for matrices of size $4 \times 4$, $5 \times 5$, etc., it can be shown that for an $n \times n$ matrix $A$, $\det(\lambda I_n - A)$ is a polynomial of degree $n$ with leading coefficient 1 (see Exercise 32 for an alternative justification of this claim). This polynomial is called the characteristic polynomial of $A$, and the equation (109) is referred to as the characteristic equation of $A$.
From algebra, the characteristic equation of an $n \times n$ matrix $A$ has precisely $n$ roots. This count includes
• multiple roots (e.g., $(\lambda - 3)^2 = 0$ has a double root 3, so that 3 would be counted twice) and
• complex roots (e.g., $\lambda^2 + 16 = 0$ has two complex roots $\lambda = \pm 4i$).
Every root $\lambda_i$ of the characteristic equation corresponds to a factor $(\lambda - \lambda_i)^m$ of the characteristic polynomial. We refer to the exponent $m$ as the algebraic multiplicity of the eigenvalue $\lambda_i$.
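The coefficients of $\det(\lambda I_n - A)$ can also be computed mechanically. The sketch below does not use the cofactor expansion described above; it swaps in the Faddeev–LeVerrier recursion, a different standard technique, and uses exact rational arithmetic from Python's `fractions` module. The function name `char_poly` is an illustrative choice. The first matrix is that of Example 7.5; the second is the $3 \times 3$ matrix of Example 7.8 later in this section.

```python
from fractions import Fraction

def char_poly(A):
    """Coefficients of det(lambda*I - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # M_1 = I
    coeffs = [Fraction(1)]               # the leading coefficient is always 1
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k     # c = -trace(A M_k) / k
        coeffs.append(c)
        # M_{k+1} = A M_k + c I
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

p2 = char_poly([[1, 2], [2, 1]])                              # lambda^2 - 2 lambda - 3
p3 = char_poly([[-1, -1, 2], [-1, -1, -2], [2, -2, 2]])       # lambda^3 - 12 lambda - 16
print([int(c) for c in p2], [int(c) for c in p3])
```

In both cases the first coefficient is 1 and the list has $n + 1$ entries, in line with the claim that the characteristic polynomial has degree $n$ and leading coefficient 1.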
Eigenspaces
If $\vec{v}$ and $\vec{w}$ are eigenvectors associated with the same eigenvalue $\lambda$, then
\[ A(\vec{v} + \vec{w}) = A\vec{v} + A\vec{w} = \lambda\vec{v} + \lambda\vec{w} = \lambda(\vec{v} + \vec{w}) \tag{111} \]
and for any scalar $c$,
\[ A(c\vec{v}) = c(A\vec{v}) = c(\lambda\vec{v}) = \lambda(c\vec{v}). \tag{112} \]
It might be tempting to proclaim that "the set of eigenvectors associated with $\lambda$ forms a subspace of $\mathbb{R}^n$"; however, it would also be wrong. This is because, upon closer examination, we should notice that such a set does not contain $\vec{0}$ (eigenvectors must be nonzero), which disqualifies it from being a valid subspace of $\mathbb{R}^n$.
It would be a shame to give up this easily. Instead, let us construct a set containing
• all eigenvectors associated with $\lambda$ and
• the zero vector.
Such a set (which still satisfies (111) and (112)) is a well-defined subspace of $\mathbb{R}^n$. We shall call it an eigenspace associated with the eigenvalue $\lambda$. The dimension of this space is called the geometric multiplicity of $\lambda$.
EXAMPLE 7.6
Find all eigenvalues and a basis for each associated eigenspace for the matrix
\[ A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 5 & -10 \\ 1 & 0 & 2 & 0 \\ 1 & 0 & 0 & 3 \end{bmatrix}. \]
Step 1. The characteristic polynomial:
\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - 1 & 0 & 0 & 0 \\ 0 & \lambda - 1 & -5 & 10 \\ -1 & 0 & \lambda - 2 & 0 \\ -1 & 0 & 0 & \lambda - 3 \end{bmatrix} = (\lambda - 1)(-1)^2 \det \begin{bmatrix} \lambda - 1 & -5 & 10 \\ 0 & \lambda - 2 & 0 \\ 0 & 0 & \lambda - 3 \end{bmatrix} = (\lambda - 1)^2(\lambda - 2)(\lambda - 3). \]
Eigenvalues: $\lambda_1 = 1$ (algebraic multiplicity 2), $\lambda_2 = 2$, $\lambda_3 = 3$.
LATOOLKIT.COM The toolkit has no separate module for finding eigenvalues or eigenvectors. However, once you've solved the characteristic equation, then, for each eigenvalue you found, you can form the coefficient matrix of the homogeneous system $\lambda I - A$ and apply the module "finding a basis of a null space of a matrix" to complete the process of finding a basis for each eigenspace.
Step 2. For the eigenvalue $\lambda_1 = 1$, the system (108) becomes
\[ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -5 & 10 \\ -1 & 0 & -1 & 0 \\ -1 & 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \]
The coefficient matrix has the reduced row echelon form:
\[ \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
$x_2$ and $x_4$ are arbitrary, while $x_1 = -2x_4$ and $x_3 = 2x_4$:
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ 2 \\ 1 \end{bmatrix}. \]
A basis for the eigenspace: $\left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \\ 1 \end{bmatrix} \right\}$.
For the eigenvalue $\lambda_2 = 2$, the system (108) yields
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -5 & 10 \\ -1 & 0 & 0 & 0 \\ -1 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \]
The coefficient matrix has the r.r.e.f.
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
$x_3$ is arbitrary, while $x_1 = x_4 = 0$ and $x_2 = 5x_3$:
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_3 \begin{bmatrix} 0 \\ 5 \\ 1 \\ 0 \end{bmatrix}. \]
A basis for the eigenspace: $\left\{ \begin{bmatrix} 0 \\ 5 \\ 1 \\ 0 \end{bmatrix} \right\}$.
For the eigenvalue $\lambda_3 = 3$, a basis for the eigenspace is $\left\{ \begin{bmatrix} 0 \\ -5 \\ 0 \\ 1 \end{bmatrix} \right\}$. (Check!)
In the example above the algebraic multiplicities matched the corresponding geometric multiplicities for each eigenvalue. This need not be true in general, as shall be illustrated later in this chapter.
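The bases found in Example 7.6 can be checked against the definition, and the geometric multiplicity of each eigenvalue can be computed as the nullity $n - \operatorname{rank}(\lambda I - A)$. The following is a minimal sketch with exact arithmetic; the helper names (`matvec`, `rank`, `shifted`) are illustrative and not taken from the text.

```python
from fractions import Fraction

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rank(M):
    """Rank via forward elimination with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def shifted(A, lam):
    """Form the coefficient matrix lam*I - A of the homogeneous system (108)."""
    n = len(A)
    return [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

A = [[1, 0, 0, 0], [0, 1, 5, -10], [1, 0, 2, 0], [1, 0, 0, 3]]
eigenspaces = {1: [[0, 1, 0, 0], [-2, 0, 2, 1]], 2: [[0, 5, 1, 0]], 3: [[0, -5, 0, 1]]}

# Each basis vector v must satisfy A v = lambda v.
verified = {lam: all(matvec(A, v) == [lam * x for x in v] for v in basis)
            for lam, basis in eigenspaces.items()}
# Geometric multiplicity = nullity of (lambda*I - A).
geometric = {lam: len(A) - rank(shifted(A, lam)) for lam in eigenspaces}
print(verified, geometric)
```

The computed nullities agree with the number of basis vectors listed for each eigenspace, which is the match between algebraic and geometric multiplicities noted above.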
Solving characteristic equations
Sometimes, the characteristic equation may become too difficult to solve by hand. If the matrix in question is sufficiently large (e.g., $6 \times 6$), which is often the case in real-life applications, then it may not be possible to find an exact solution of the corresponding characteristic equation.
However, for many smaller and simpler problems, standard algebra is powerful enough to find the solutions. We are about to mention some techniques that will help us solve such "reasonable" equations. Before we do so, however, let us make one thing clear: solving a nonlinear equation $f(\lambda) = 0$ is a very different matter from solving linear equations or linear systems. In trying to succeed here, your best hope is to extract factors out of the function $f(\lambda)$. Remember: if you manage to rewrite $f(\lambda)$ as a product, then the equation
\[ (\text{factor}_1)(\text{factor}_2) = 0 \]
is satisfied when either $\text{factor}_1 = 0$ or $\text{factor}_2 = 0$.
• Whichever approach you use to evaluate $\det(\lambda I - A)$ (whether it's the specific $2 \times 2$ or $3 \times 3$ formulas, row operations, or cofactor expansion), common factors involving $\lambda$ will frequently emerge spontaneously.

EXAMPLE 7.7
The characteristic polynomial of $A = \begin{bmatrix} -1 & 4 & 0 \\ 3 & -2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$ is
\[ \det(\lambda I_3 - A) = \det \begin{bmatrix} \lambda + 1 & -4 & 0 \\ -3 & \lambda + 2 & 0 \\ -2 & 0 & \lambda - 4 \end{bmatrix}. \]
Expanding along the third column, we obtain
\[ \det(\lambda I_3 - A) = (\lambda - 4) \det \begin{bmatrix} \lambda + 1 & -4 \\ -3 & \lambda + 2 \end{bmatrix}. \]
Your next step will be to calculate the $2 \times 2$ determinant, but the key thing is to keep the factor $(\lambda - 4)$ separate:
\[ (\lambda - 4)\left[(\lambda + 1)(\lambda + 2) - 12\right] = (\lambda - 4)\left(\lambda^2 + 3\lambda - 10\right) \]
rather than multiplying it out (the WRONG WAY):
\[ (\lambda - 4)\left(\lambda^2 + 3\lambda - 10\right) = \lambda^3 - \lambda^2 - 22\lambda + 40 \]
and instantly making an easy problem quite difficult!
Instead, factoring the quadratic in the parentheses in $(\lambda - 4)\left(\lambda^2 + 3\lambda - 10\right)$ we arrive at
\[ (\lambda - 4)(\lambda + 5)(\lambda - 2), \]
leading to the eigenvalues $\lambda_1 = 4$, $\lambda_2 = -5$, and $\lambda_3 = 2$.

• Use the quadratic formula if you have a quadratic factor $\lambda^2 + b\lambda + c$ which you are unable to factor using one of the shortcuts.
• If you have a cubic, or higher degree polynomial, which you don't seem to be able to extract factors from in any other way, then the following result might be helpful.
If all coefficients on the left-hand side of the equation
\[ \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0 \]
are integers, then all integer solutions of the equation are among the factors of the free term $a_0$.
This is because, in the factored form of the same equation,
\[ (\lambda - b_1) \cdots (\lambda - b_n) = 0, \]
we have $a_0 = (-1)^n b_1 \cdots b_n$. (This result can also be extended to cover rational coefficients and rational solutions but does not work for irrational solutions, e.g., in $\lambda^2 - 2 = 0$.)
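EXAMPLE 7.8
The matrix $A = \begin{bmatrix} -1 & -1 & 2 \\ -1 & -1 & -2 \\ 2 & -2 & 2 \end{bmatrix}$ has the characteristic polynomial
\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda + 1 & 1 & -2 \\ 1 & \lambda + 1 & 2 \\ -2 & 2 & \lambda - 2 \end{bmatrix} = \lambda^3 - 12\lambda - 16. \quad \text{(Check!)} \]
The free term $a_0 = -16$ has the following integer factors: $1, -1, 2, -2, 4, -4, 8, -8, 16, -16$. Take a deep breath, and start checking these, one at a time, until we find one that works:
\[ \begin{aligned} \det((1)I - A) &= (1)^3 - 12(1) - 16 = -27 \ne 0, \\ \det((-1)I - A) &= (-1)^3 - 12(-1) - 16 = -5 \ne 0, \\ \det((2)I - A) &= (2)^3 - 12(2) - 16 = -32 \ne 0, \\ \det((-2)I - A) &= (-2)^3 - 12(-2) - 16 = 0. \end{aligned} \]
We have found one eigenvalue, $\lambda = -2$, which means that the characteristic polynomial $\det(\lambda I - A)$ must contain a factor $(\lambda + 2)$. Let us use polynomial division to determine the quotient:
\[ \begin{array}{rrrrr} & \lambda^2 & -\,2\lambda & -\,8 & \\ \hline (\lambda + 2)\ \big) & \lambda^3 & & -\,12\lambda & -\,16 \\ & -\lambda^3 & -\,2\lambda^2 & & \\ \hline & & -\,2\lambda^2 & -\,12\lambda & \\ & & +\,2\lambda^2 & +\,4\lambda & \\ \hline & & & -\,8\lambda & -\,16 \\ & & & +\,8\lambda & +\,16 \\ \hline & & & & 0 \end{array} \]
We have
\[ \det(\lambda I - A) = \lambda^3 - 12\lambda - 16 = (\lambda + 2)(\lambda^2 - 2\lambda - 8) = (\lambda + 2)(\lambda + 2)(\lambda - 4). \]
There is a double eigenvalue (algebraic multiplicity 2) $\lambda_1 = -2$ and a single eigenvalue (algebraic multiplicity 1) $\lambda_2 = 4$.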
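The search over factors of the free term and the subsequent polynomial division can be automated for monic polynomials with integer coefficients. The sketch below assumes a nonzero free term $a_0$ and uses synthetic division in place of the long division shown above; all function names are hypothetical helpers chosen for illustration.

```python
def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(coeffs, x):
    """Evaluate a polynomial (coefficients listed highest degree first) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs):
    """Distinct integer roots of a monic integer polynomial with nonzero free term."""
    candidates = [s * d for d in divisors(coeffs[-1]) for s in (1, -1)]
    return sorted(r for r in candidates if evaluate(coeffs, r) == 0)

def deflate(coeffs, root):
    """Synthetic division by (lambda - root); returns (quotient, remainder)."""
    quotient = [coeffs[0]]
    for c in coeffs[1:]:
        quotient.append(c + root * quotient[-1])
    remainder = quotient.pop()
    return quotient, remainder

def roots_with_multiplicity(coeffs):
    """Integer roots repeated by algebraic multiplicity, plus the leftover factor."""
    roots, remaining = [], list(coeffs)
    for r in integer_roots(coeffs):
        while len(remaining) > 1:
            q, rem = deflate(remaining, r)
            if rem != 0:
                break
            roots.append(r)
            remaining = q
    return roots, remaining

p = [1, 0, -12, -16]                      # lambda^3 - 12 lambda - 16 from Example 7.8
print(deflate(p, -2))                     # quotient lambda^2 - 2 lambda - 8, remainder 0
print(roots_with_multiplicity(p))
```

When the leftover factor has degree greater than zero, the remaining roots are not integers and other tools (such as the quadratic formula) are needed.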
Twelve equivalent statements
Recall that when we added statement 5 (see p. 125) we stipulated that det A = 0 is equivalent to A being singular. Since det(0I − A) = 0 if and only if det A = 0, we can add a final equivalent statement to each list.
12 Equivalent Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
1. $A$ is nonsingular.
2. $A$ is row equivalent to $I_n$.
3. For every $n$-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has a unique solution.
4. The homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution.
5. $\det A \ne 0$.
6. The columns of $A$ are linearly independent.
7. The rows of $A$ are linearly independent.
8. $\operatorname{rank} A = n$.
9. $\operatorname{nullity} A = 0$.
10. If $A$ is a matrix of a linear transformation, then the transformation is invertible.
11. If $A$ is a matrix of a linear transformation, then the transformation is one-to-one and onto.
12. Zero is not an eigenvalue of $A$.

12 Equivalent "Negative" Statements
For an $n \times n$ matrix $A$, the following statements are equivalent.
-1. $A$ is singular.
-2. $A$ is not row equivalent to $I_n$.
-3. For some $n$-vector $\vec{b}$, the system $A\vec{x} = \vec{b}$ has either no solution or many solutions.
-4. The homogeneous system $A\vec{x} = \vec{0}$ has nontrivial solutions.
-5. $\det A = 0$.
-6. The columns of $A$ are linearly dependent.
-7. The rows of $A$ are linearly dependent.
-8. $\operatorname{rank} A < n$.
-9. $\operatorname{nullity} A > 0$.
-10. If $A$ is a matrix of a linear transformation, then the transformation is not invertible.
-11. If $A$ is a matrix of a linear transformation, then the transformation is neither one-to-one nor onto.
-12. Zero is an eigenvalue of $A$.
EXERCISES
In Exercises 1–4, refer to the graphs in the margin (assuming all vectors begin at the origin), and choose the best answer from (a)–(g):
(a) the given vector is an eigenvector of $A$ associated with $\lambda = 1$,
(b) the given vector is an eigenvector of $A$ associated with $\lambda = -1$,
(c) the given vector is an eigenvector of $A$ associated with $\lambda = 2$,
(d) the given vector is an eigenvector of $A$ associated with $\lambda = -2$,
(e) the given vector is an eigenvector of $A$ associated with $\lambda = \frac{1}{2}$,
(f) the given vector is an eigenvector of $A$ associated with $\lambda = -\frac{1}{2}$,
(g) the given vector is not an eigenvector of $A$ associated with a real eigenvalue.
[Margin figure: vectors u1, u2, u3, u4 and their images Au1, Au2, Au3, Au4.]
1. The vector $\vec{u}_1$.
2. The vector $\vec{u}_2$.
3. The vector $\vec{u}_3$.
4. The vector $\vec{u}_4$.
5. For each "eigenpicture" corresponding to a $2 \times 2$ matrix $A$, choose the correct statement:
(i) $A$ has no real eigenvalue.
(ii) $A$ has only one real eigenvalue, and the corresponding eigenspace has dimension 1.
(iii) $A$ has only one real eigenvalue, and the corresponding eigenspace has dimension 2.
(iv) $A$ has two distinct real eigenvalues.
[Figure: eigenpictures a, b, c.]
6. Repeat Exercise 5 for the following "eigenpictures".
[Figure: eigenpictures a, b, c.]
In Exercises 7–8, form the characteristic polynomial, and calculate all eigenvalues of the given matrix.
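7. a. $\begin{bmatrix} 2 & 4 \\ 1 & -1 \end{bmatrix}$; b. $\begin{bmatrix} 0 & 2 & 0 \\ 1 & -1 & 0 \\ 0 & 2 & -3 \end{bmatrix}$.
8. a. $\begin{bmatrix} 5 & 1 \\ 3 & 3 \end{bmatrix}$; b. $\begin{bmatrix} 2 & 2 & -2 \\ 0 & -2 & 0 \\ -1 & 2 & 3 \end{bmatrix}$.
9. Find bases for all eigenspaces of the matrix in Example 7.7. (Use the eigenvalues that were found in the example.)
10. Find bases for all eigenspaces of the matrix in Example 7.8. (Use the eigenvalues that were found in the example.)
In Exercises 11–16, find all eigenvalues and bases for the corresponding eigenspaces for the given matrix.
11. a. $\begin{bmatrix} 4 & 2 \\ 0 & 1 \end{bmatrix}$; b. $\begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}$; c. $\begin{bmatrix} 1 & -2 \\ 8 & 9 \end{bmatrix}$.
12. a. $\begin{bmatrix} -1 & 0 \\ 6 & -1 \end{bmatrix}$; b. $\begin{bmatrix} -1 & 3 \\ 2 & 4 \end{bmatrix}$; c. $\begin{bmatrix} 2 & -6 \\ -5 & -5 \end{bmatrix}$.
13. a. $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 4 & 2 \\ -1 & 0 & -3 \end{bmatrix}$; b. $\begin{bmatrix} 2 & 0 & 0 \\ 0 & -2 & -2 \\ 0 & -2 & 1 \end{bmatrix}$; c. $\begin{bmatrix} 1 & -4 & 4 \\ 2 & 2 & 4 \\ 0 & 2 & -1 \end{bmatrix}$; d. $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & -2 \\ 1 & -1 & 0 \end{bmatrix}$.
14. a. $\begin{bmatrix} -3 & -1 & 0 \\ 3 & 1 & 0 \\ -2 & 1 & 3 \end{bmatrix}$; b. $\begin{bmatrix} 1 & 0 & 4 \\ 0 & 9 & 0 \\ 2 & 0 & 8 \end{bmatrix}$; c. $\begin{bmatrix} -4 & 2 & -1 \\ 4 & -3 & -3 \\ -4 & 2 & 2 \end{bmatrix}$; d. $\begin{bmatrix} 3 & 4 & -1 \\ 3 & 3 & 2 \\ -9 & 2 & 9 \end{bmatrix}$.
15. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 3 & 3 & -1 & 1 \\ -2 & 3 & 0 & 0 \end{bmatrix}$.
16. $\begin{bmatrix} 0 & -1 & -1 & 0 \\ -1 & 1 & 0 & -1 \\ -2 & 1 & 2 & 1 \\ 2 & -1 & -2 & -1 \end{bmatrix}$.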
T/F? In Exercises 17–24, decide whether each statement is true or false. Justify your answer.
17. If an $n \times n$ matrix $A$ is nonsingular, then it has no more than $n - 1$ nonzero eigenvalues.
18. If $A$ is the zero $n \times n$ matrix, then the only eigenvalue of $A$ is zero.
19. If the linear transformation $F : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $F(\vec{x}) = A\vec{x}$ doubles every vertical vector, then $\lambda = 2$ is an eigenvalue of $A$.
20. If $A$ is an $n \times n$ matrix such that $A\vec{u} = \vec{u}$, then $\lambda = 1$ is an eigenvalue of $A$.
21. If $\det(A + I) = 0$, then $\lambda = 1$ is an eigenvalue of $A$.
22. If $A$ is a $5 \times 5$ matrix such that the rank of $3I_5 - A$ is 2, then the dimension of the eigenspace associated with $\lambda = 3$ is 2.
23. If $A$ is a scalar $n \times n$ matrix, then it has only one eigenvalue of algebraic multiplicity $n$.
24. If $A$ is an $n \times n$ matrix, then $A^T$ has the same eigenvalues as $A$.
25. * Recall the definition of the $n \times n$ exchange matrix $J_n$ introduced in Exercise 22 on p. 129.
a. Find all eigenvalues and corresponding eigenvectors of $J_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
b. Find all eigenvalues and corresponding eigenvectors of $J_3 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$.
c. Generalize the results from parts a and b to find all eigenvalues and corresponding eigenvectors of $J_n$ for any positive integer $n$.
26. * Show that the eigenvalues of any lower triangular (or upper triangular) matrix $A$ are its main diagonal entries.
27. * Given the polynomial
\[ p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + a_{n-2}\lambda^{n-2} + \cdots + a_2\lambda^2 + a_1\lambda + a_0, \tag{113} \]
the matrix
\[ B = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 & -a_0 \\ 1 & 0 & 0 & \cdots & 0 & 0 & -a_1 \\ 0 & 1 & 0 & \cdots & 0 & 0 & -a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & -a_{n-3} \\ 0 & 0 & 0 & \cdots & 1 & 0 & -a_{n-2} \\ 0 & 0 & 0 & \cdots & 0 & 1 & -a_{n-1} \end{bmatrix} \tag{114} \]
is called the companion matrix of $p$.
a. Verify that the characteristic polynomial of $\begin{bmatrix} 0 & 0 & 16 \\ 1 & 0 & 12 \\ 0 & 1 & 0 \end{bmatrix}$ is the same polynomial as in Example 7.8 on p. 305.
b. By induction, prove that for the polynomial $p(\lambda)$ of (113) and its companion matrix $B$ of (114) we have $\det(\lambda I_n - B) = p(\lambda)$.
28. * Show that if $\lambda$ is an eigenvalue of a nonsingular matrix $A$, then $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$. How are the corresponding eigenspaces related?
29. * Show that if $\lambda$ is an eigenvalue of a matrix $A$, then
a. $\lambda^2$ is an eigenvalue of $A^2$,
b. $k\lambda$ is an eigenvalue of $kA$.
30. * Suppose the $n \times n$ matrices $A$ and $B$ both have the same $\lambda$ as an eigenvalue. Show that neither $A + B$ nor $AB$ is guaranteed to also have $\lambda$ as an eigenvalue. (Hint: Try finding $2 \times 2$ diagonal matrices $A$ and $B$ that will serve as a counterexample.)
31. * Suppose the matrix $A$ has an eigenvalue $\lambda$ corresponding to an eigenvector $\vec{x}$, and the matrix $B$ has an eigenvalue $\mu$ corresponding to the same eigenvector $\vec{x}$.
a. Show that for any numbers $p$ and $q$, $pA + qB$ also has $\vec{x}$ as an eigenvector. What eigenvalue does it correspond to?
b. Show that $AB$ and $BA$ both have $\vec{x}$ as their eigenvector as well. What are the corresponding eigenvalues?
32. * Exercise 27 on p. 129 introduced the following determinant formula (35):
\[ \det A = \sum \sigma_{i_1 \ldots i_n}\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}. \]
Apply this formula to the determinant of $\lambda I_n - A$ (where $A$ is an $n \times n$ matrix) to show that the characteristic polynomial of $A$ is a polynomial of degree $n$. Show that of the $n!$ terms in this summation, the only term containing $\lambda^n$ is $(\lambda - a_{11}) \cdots (\lambda - a_{nn})$, therefore justifying that the coefficient of $\lambda^n$ is 1.
33. * Consider an $n \times n$ matrix $A$.
a. Use Theorem 3.6 to show that
\[ \det(\lambda I_n - A) = \det \begin{bmatrix} \lambda & 0 & \cdots & 0 \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix} + \det \begin{bmatrix} -a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix}. \tag{115} \]
b. Use cofactor expansion of the first determinant on the right-hand side above to show that it must contain $\lambda$ as a factor (i.e., it is a polynomial in $\lambda$ with zero free term).
c. Show that $\det(\lambda I_n - A) = \lambda q(\lambda) + \det(-A)$ where $q(\lambda)$ is a polynomial in $\lambda$. (Hint: Use Theorem 3.6 to break up the last determinant in (115) into two terms using the second row, etc.)
d. Use the result of part c to prove that
\[ \det A = \lambda_1 \lambda_2 \cdots \lambda_n. \tag{116} \]
34. * A $k \times k$ matrix
\[ \begin{bmatrix}
c & 1 & 0 & \cdots & 0 & 0 \\
0 & c & 1 & \cdots & 0 & 0 \\
0 & 0 & c & \ddots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & c & 1 \\
0 & 0 & 0 & \cdots & 0 & c
\end{bmatrix} \]
is called a Jordan block.
a. Show that $\lambda = c$ is the only eigenvalue of this matrix.
b. Find the eigenspace corresponding to $\lambda = c$, and show that this eigenvalue has geometric multiplicity 1.
7.2 Diagonalization

DEFINITION We say a matrix $A$ is similar to a matrix $B$ if there exists a nonsingular matrix $P$ such that $B = P^{-1}AP$. If a matrix $A$ is similar to a diagonal matrix, then $A$ is said to be diagonalizable (i.e., $A$ can be diagonalized).

If $A$ is similar to $B$, then the characteristic polynomial of $B$,
\[ \det(\lambda I - B) = \det(\lambda I - P^{-1}AP) = \det(P^{-1}(\lambda I)P - P^{-1}AP) = \det(P^{-1}(\lambda I - A)P) = \det P^{-1} \det(\lambda I - A) \det P = \det(\lambda I - A), \]
is the same as the characteristic polynomial of $A$. Therefore, the two matrices share the same eigenvalues.

THEOREM 7.1 Let $A$ be an $n \times n$ matrix. $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

PROOF
Part I (⇒) By definition, for $A$ to be diagonalizable, a matrix $P$ and a diagonal matrix $D$ must exist such that $P^{-1}AP = D$ so that $AP = PD$. We can specify columns of $P$ and entries of $D$ in this equation
\[ A \begin{bmatrix} | & \cdots & | \\ \vec{u}_1 & \cdots & \vec{u}_n \\ | & \cdots & | \end{bmatrix} = \begin{bmatrix} | & \cdots & | \\ \vec{u}_1 & \cdots & \vec{u}_n \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{bmatrix} \tag{117} \]
where the column vectors $\vec{u}_1, \ldots, \vec{u}_n$ are linearly independent (since $P$ is nonsingular). The matrix equation (117) can be rewritten as
\[ \begin{bmatrix} | & \cdots & | \\ A\vec{u}_1 & \cdots & A\vec{u}_n \\ | & \cdots & | \end{bmatrix} = \begin{bmatrix} | & \cdots & | \\ d_1\vec{u}_1 & \cdots & d_n\vec{u}_n \\ | & \cdots & | \end{bmatrix}, \tag{118} \]
where the equality must hold between the corresponding columns
\[ A\vec{u}_i = d_i\vec{u}_i \quad \text{for } i = 1, \ldots, n, \tag{119} \]
which means $A$ has $n$ linearly independent eigenvectors $\vec{u}_1, \ldots, \vec{u}_n$.
Part II (⇐) Assuming $A$ has $n$ linearly independent eigenvectors $\vec{u}_1, \ldots, \vec{u}_n$ associated with eigenvalues $d_1, \ldots, d_n$, we can write (119). Rewriting these in the matrix form, we obtain (118) and (117); i.e., $AP = PD$. Since the columns of the $n \times n$ matrix $P$ are linearly independent, $P$ is nonsingular, leading to $P^{-1}AP = D$.
THEOREM 7.2 If $\lambda_1, \ldots, \lambda_k$ are distinct real eigenvalues of the matrix $A$, then the corresponding eigenvectors $\vec{u}_1, \ldots, \vec{u}_k$ are linearly independent.

PROOF
Assume $\vec{u}_1, \ldots, \vec{u}_k$ are linearly dependent. Since $\vec{u}_1$ cannot be the zero vector (from the definition of an eigenvector), it follows from Theorem 4.12 that at least one of the vectors $\vec{u}_2, \ldots, \vec{u}_k$ can be expressed as a linear combination of the preceding vectors. Let $\vec{u}_j$ be the first such vector (therefore, $\vec{u}_1, \ldots, \vec{u}_{j-1}$ are linearly independent):
\[ \vec{u}_j = c_1\vec{u}_1 + \cdots + c_{j-1}\vec{u}_{j-1}. \tag{120} \]
Premultiplying both sides by $A$,
\[ A\vec{u}_j = A(c_1\vec{u}_1 + \cdots + c_{j-1}\vec{u}_{j-1}), \]
and using properties 2 and 4 of Theorem 1.5, we obtain
\[ A\vec{u}_j = c_1(A\vec{u}_1) + \cdots + c_{j-1}(A\vec{u}_{j-1}). \]
Since $\vec{u}_1, \ldots, \vec{u}_j$ are eigenvectors of $A$ corresponding to $\lambda_1, \ldots, \lambda_j$, we can write
\[ \lambda_j\vec{u}_j = c_1\lambda_1\vec{u}_1 + \cdots + c_{j-1}\lambda_{j-1}\vec{u}_{j-1}. \]
Multiplying both sides of (120) by $\lambda_j$ yields
\[ \lambda_j\vec{u}_j = c_1\lambda_j\vec{u}_1 + \cdots + c_{j-1}\lambda_j\vec{u}_{j-1}. \]
Let us subtract the last two equations, combining the like terms:
\[ \vec{0} = c_1(\lambda_1 - \lambda_j)\vec{u}_1 + \cdots + c_{j-1}(\lambda_{j-1} - \lambda_j)\vec{u}_{j-1}. \]
Since $\vec{u}_1, \ldots, \vec{u}_{j-1}$ are linearly independent and the $\lambda$'s are distinct, this implies $c_1 = \cdots = c_{j-1} = 0$. However, in this case (120) would imply $\vec{u}_j = \vec{0}$, which contradicts the assumption that $\vec{u}_j$ is an eigenvector.
Assuming that $\vec{u}_1, \ldots, \vec{u}_k$ are linearly dependent resulted in a contradiction. Therefore, we must conclude that $\vec{u}_1, \ldots, \vec{u}_k$ are linearly independent instead.
If an $n \times n$ matrix $A$ has $n$ distinct and real eigenvalues, then it follows from Theorem 7.2 that $A$ has $n$ linearly independent eigenvectors. By Theorem 7.1, this leads us to the conclusion that such a matrix $A$ can be diagonalized.

THEOREM 7.3 If all the roots of $\det(\lambda I - A)$ are real and distinct, then $A$ is diagonalizable.
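EXAMPLE 7.9 We will attempt to diagonalize $A = \begin{bmatrix} 4 & 1 & -3 \\ -2 & 0 & 2 \\ 1 & 1 & 0 \end{bmatrix}$.

The characteristic polynomial is
\[ \det(\lambda I_3 - A) = \det \begin{bmatrix} \lambda - 4 & -1 & 3 \\ 2 & \lambda & -2 \\ -1 & -1 & \lambda \end{bmatrix} = \lambda^3 - 4\lambda^2 + 3\lambda = \lambda(\lambda^2 - 4\lambda + 3) = \lambda(\lambda - 3)(\lambda - 1). \]
The eigenvalues are $\lambda_1 = 0$, $\lambda_2 = 3$, and $\lambda_3 = 1$, all with algebraic multiplicities 1. Let us find eigenvectors for each eigenvalue.

• For $\lambda_1 = 0$, the homogeneous system $(0I_3 - A)\vec{x} = \vec{0}$ has the coefficient matrix $\begin{bmatrix} -4 & -1 & 3 \\ 2 & 0 & -2 \\ -1 & -1 & 0 \end{bmatrix}$ whose reduced row echelon form is $\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$. The eigenvectors are $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_3 \\ -x_3 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$ with $x_3 \neq 0$.

• For $\lambda_3 = 1$, the homogeneous system $(1I_3 - A)\vec{x} = \vec{0}$ has the coefficient matrix $\begin{bmatrix} -3 & -1 & 3 \\ 2 & 1 & -2 \\ -1 & -1 & 1 \end{bmatrix}$ whose r.r.e.f. is $\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. The eigenvectors are $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_3 \\ 0 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ with $x_3 \neq 0$.

• For $\lambda_2 = 3$, the system $(3I_3 - A)\vec{x} = \vec{0}$ has the coefficient matrix $\begin{bmatrix} -1 & -1 & 3 \\ 2 & 3 & -2 \\ -1 & -1 & 3 \end{bmatrix}$ with the r.r.e.f. $\begin{bmatrix} 1 & 0 & -7 \\ 0 & 1 & 4 \\ 0 & 0 & 0 \end{bmatrix}$. The eigenvectors are $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7x_3 \\ -4x_3 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} 7 \\ -4 \\ 1 \end{bmatrix}$ with $x_3 \neq 0$.

According to Theorem 7.2, eigenvectors $\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 7 \\ -4 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ are linearly independent, as they correspond to distinct eigenvalues. This permits us to use them as columns of the matrix $P$, which will diagonalize $A$: $P^{-1}AP = D$.

E.g., $P = \begin{bmatrix} 1 & 7 & 1 \\ -1 & -4 & 0 \\ 1 & 1 & 1 \end{bmatrix}$ will correspond to $D = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

Column positions in matrices $P$ and $D$ must match:
\[ P = \begin{bmatrix} | & | & \cdots & | \\ \vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_n \\ | & | & \cdots & | \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}. \]

Let us verify that $P^{-1}AP = D$; i.e., $AP = PD$:
\[ AP = \begin{bmatrix} 4 & 1 & -3 \\ -2 & 0 & 2 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 7 & 1 \\ -1 & -4 & 0 \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 21 & 1 \\ 0 & -12 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \]
\[ PD = \begin{bmatrix} 1 & 7 & 1 \\ -1 & -4 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 21 & 1 \\ 0 & -12 & 0 \\ 0 & 3 & 1 \end{bmatrix}. \]

Note that the order of columns (eigenvectors) in $P$ agrees with the order in which the diagonal entries (eigenvalues) appear in $D$. Any rearrangement in one of these matrices always triggers a corresponding rearrangement in the other one.

E.g., changing the matrix $P$ to $\begin{bmatrix} 1 & 7 & 1 \\ 0 & -4 & -1 \\ 1 & 1 & 1 \end{bmatrix}$ would mean that $D$ becomes $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.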
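The check $AP = PD$ from Example 7.9 can also be done mechanically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper name `matmul` and the variable names `P_swapped` and `D_swapped` are illustrative choices, not notation from the text.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Matrices from Example 7.9.
A = [[4, 1, -3], [-2, 0, 2], [1, 1, 0]]
P = [[1, 7, 1], [-1, -4, 0], [1, 1, 1]]   # eigenvectors as columns
D = [[0, 0, 0], [0, 3, 0], [0, 0, 1]]     # eigenvalues in the same column order

AP = matmul(A, P)
PD = matmul(P, D)
print(AP == PD)

# Reordering the columns of P requires the same reordering on the diagonal of D.
P_swapped = [[1, 7, 1], [0, -4, -1], [1, 1, 1]]
D_swapped = [[1, 0, 0], [0, 3, 0], [0, 0, 0]]
print(matmul(A, P_swapped) == matmul(P_swapped, D_swapped))
```

Comparing $AP$ with $PD$ avoids computing $P^{-1}$, and integer arithmetic keeps the comparison exact.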
EXAMPLE 7.10 Eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 5 & -10 \\ 1 & 0 & 2 & 0 \\ 1 & 0 & 0 & 3 \end{bmatrix}$ were found in Example 7.6:

• The eigenspace associated with the eigenvalue $\lambda_1 = 1$ has a basis $\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \\ 1 \end{bmatrix}$.

• The eigenspace corresponding to $\lambda_2 = 2$ has a basis $\begin{bmatrix} 0 \\ 5 \\ 1 \\ 0 \end{bmatrix}$.

• The eigenvalue $\lambda_3 = 3$ has a basis for the eigenspace $\begin{bmatrix} 0 \\ -5 \\ 0 \\ 1 \end{bmatrix}$.

Forming the matrix $P$ out of the linearly independent eigenvectors,
\[ P = \begin{bmatrix} 0 & -2 & 0 & 0 \\ 1 & 0 & 5 & -5 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \]
we can diagonalize the matrix $A$ as follows:
\[ P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} = D. \]
Check:
\[ AP = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 5 & -10 \\ 1 & 0 & 2 & 0 \\ 1 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 0 & -2 & 0 & 0 \\ 1 & 0 & 5 & -5 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -2 & 0 & 0 \\ 1 & 0 & 10 & -15 \\ 0 & 2 & 2 & 0 \\ 0 & 1 & 0 & 3 \end{bmatrix}, \]
\[ PD = \begin{bmatrix} 0 & -2 & 0 & 0 \\ 1 & 0 & 5 & -5 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & -2 & 0 & 0 \\ 1 & 0 & 10 & -15 \\ 0 & 2 & 2 & 0 \\ 0 & 1 & 0 & 3 \end{bmatrix}. \]
Therefore, $AP = PD$.
EXAMPLE 7.11 Let us try to diagonalize $A = \begin{bmatrix} 2 & 0 & 0 \\ 2 & 0 & -1 \\ 3 & 2 & 3 \end{bmatrix}$.

This matrix has the characteristic polynomial $\lambda^3 - 5\lambda^2 + 8\lambda - 4 = (\lambda - 2)^2(\lambda - 1)$ so that there are two eigenvalues: $\lambda_1 = 2$ of algebraic multiplicity 2, and $\lambda_2 = 1$ of algebraic multiplicity 1.

For $\lambda_1 = 2$, the homogeneous system has the coefficient matrix $\begin{bmatrix} 0 & 0 & 0 \\ -2 & 2 & 1 \\ -3 & -2 & -1 \end{bmatrix}$. The reduced row echelon form is $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & 0 \end{bmatrix}$. The solution space of this system (the eigenspace corresponding to $\lambda_1 = 2$) has dimension 1 so that the geometric multiplicity of $\lambda_1 = 2$ does not match the algebraic multiplicity (= 2) of this eigenvalue. Consequently, we will not be able to extract two linearly independent vectors out of this eigenspace, which makes it impossible to diagonalize this matrix.
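The multiplicity count in Example 7.11 can be confirmed by computing the nullity of $\lambda I_3 - A$ directly. The sketch below is one way to do this under stated assumptions: it uses exact rational arithmetic from Python's `fractions` module, and the helper names `rank` and `geometric_multiplicity` are illustrative, not from the text.

```python
from fractions import Fraction

def rank(M):
    """Rank via Gauss-Jordan elimination with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

A = [[2, 0, 0], [2, 0, -1], [3, 2, 3]]  # matrix of Example 7.11
n = len(A)

def geometric_multiplicity(lam):
    """Dimension of the eigenspace of A for lam: the nullity of (lam*I - A)."""
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

print(geometric_multiplicity(2), geometric_multiplicity(1))
```

The two geometric multiplicities add up to 2, which is less than $n = 3$, so by Theorem 7.1 there is no set of three linearly independent eigenvectors.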
As illustrated in the last example, it is possible for the geometric multiplicity of an eigenvalue to be smaller than the algebraic one; we have also seen a number of instances where the two multiplicities were equal. A curious reader would be right to ask: is it possible for the geometric multiplicity to exceed the algebraic one? The answer to this question turns out to be negative – see Exercise 22 for a justification.
Calculating powers of a matrix using its diagonalization
Eigenvalues, eigenvectors, and diagonalization of matrices have numerous applications, some of which will be discussed in the next section. For now, let us show a simple way in which diagonalization of a matrix can be used to simplify calculating its power. If $A$ is diagonalizable, there exist a nonsingular matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$. Premultiplying both sides by $P$ and postmultiplying both by $P^{-1}$ yields $A = PDP^{-1}$.

Powers of $A$ can now be expressed in the following manner:
\[ A^2 = AA = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}, \]
\[ A^3 = A^2A = PD^2P^{-1}(PDP^{-1}) = PD^2(P^{-1}P)DP^{-1} = PD^3P^{-1}, \]
etc. Generally,
\[ A^k = PD^kP^{-1} \tag{121} \]
(this can be formally proved by induction – see Exercise 15). If $D = \begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{bmatrix}$, then $D^k = \begin{bmatrix} d_1^k & & 0 \\ & \ddots & \\ 0 & & d_n^k \end{bmatrix}$ (see Exercise 37 on p. 34), therefore making it easy to calculate $A^k$ even for large values of $k$.

EXAMPLE 7.12 Given $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, find $A^8$ (eighth power of $A$).

SOLUTION Instead of repeatedly multiplying $A$ by itself by “brute force”, let us follow the approach based on diagonalization of $A$. Recall that in Example 7.5 on p. 299, we found $A$ to have eigenvalues 3 and −1 corresponding to eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$, respectively. Therefore, we can let $P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$. With $P^{-1} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix}$ (check!), we have $P^{-1}AP = D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$; therefore, (121) yields
\[ A^8 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3^8 & 0 \\ 0 & (-1)^8 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 6561 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 3281 & 3280 \\ 3280 & 3281 \end{bmatrix}. \]
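The result of Example 7.12 can be cross-checked by comparing $PD^8P^{-1}$ with eight explicit multiplications. This is a small sketch in plain Python, assuming exact rational arithmetic via `fractions.Fraction` for the entries of $P^{-1}$; the helper `matmul` and the names `via_diagonalization` and `brute_force` are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2], [2, 1]]
P = [[1, -1], [1, 1]]
half = Fraction(1, 2)
P_inv = [[half, half], [-half, half]]

k = 8
D_k = [[3 ** k, 0], [0, (-1) ** k]]   # powers of the diagonal entries only
via_diagonalization = matmul(matmul(P, D_k), P_inv)

brute_force = [[1, 0], [0, 1]]        # identity, then multiply by A k times
for _ in range(k):
    brute_force = matmul(brute_force, A)

print([[int(x) for x in row] for row in via_diagonalization])
print(via_diagonalization == brute_force)
```

The diagonalization route needs only two matrix products regardless of $k$, whereas the brute-force loop grows with $k$.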
Diagonalization of symmetric matrices
Symmetric matrices have very special properties with respect to their eigenvalues, eigenvectors, and diagonalizability compared to general square matrices. For instance, recall from Theorem 7.2 that, for a general matrix, distinct real eigenvalues must correspond to linearly independent eigenvectors. In case of a symmetric matrix, we can strengthen this result considerably, as shown below.

THEOREM 7.4 Eigenvectors corresponding to different eigenvalues of a symmetric matrix are orthogonal.

PROOF
Let $\vec{u}$ and $\vec{v}$ be eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, with $\lambda_1 \neq \lambda_2$. The expression $\vec{v}^T A\vec{u}$ can be evaluated in two ways. First, using the equality $A\vec{u} = \lambda_1\vec{u}$, we have
\[ \vec{v}^T A\vec{u} = \vec{v}^T(\lambda_1\vec{u}) = \lambda_1\vec{v}^T\vec{u}. \]
On the other hand, transposing both sides of $A\vec{v} = \lambda_2\vec{v}$ yields $\vec{v}^T A = \lambda_2\vec{v}^T$ (since $A = A^T$); thus
\[ \vec{v}^T A\vec{u} = \lambda_2\vec{v}^T\vec{u}. \]
Equating both expressions we obtained results in
\[ \lambda_1\vec{v}^T\vec{u} - \lambda_2\vec{v}^T\vec{u} = 0 \]
or
\[ (\lambda_1 - \lambda_2)\,\vec{v}^T\vec{u} = 0. \]
Because of the assumption $\lambda_1 \neq \lambda_2$, this implies that $\vec{u}$ and $\vec{v}$ are orthogonal.
We have already seen a number of matrices that cannot be diagonalized, either due to complex eigenvalues (Example 7.4) or because an eigenvalue's algebraic and geometric multiplicities disagreed (Example 7.11). The good news is that none of this can happen for a symmetric matrix, as evidenced by the important result below.
THEOREM 7.5 (The Spectral Theorem) A matrix $A$ can be orthogonally diagonalized, i.e., $Q^TAQ = D$ where $Q$ is an orthogonal matrix and $D$ is diagonal, if and only if $A$ is symmetric.

PROOF
Part I (⇒) Premultiplying both sides of $Q^TAQ = D$ by $Q = (Q^T)^{-1}$ and postmultiplying by $Q^T = Q^{-1}$, we obtain $A = QDQ^T$. $A$ is symmetric because $A^T = (Q^T)^T D^T Q^T = QDQ^T = A$.
Part II (⇐) In Section 6.4, we have established that every $n \times n$ matrix $A$ has a singular value decomposition (104) $A = U\Sigma V^T$ where $U$ and $V$ are $n \times n$ orthogonal matrices and $\Sigma$ is an $n \times n$ diagonal matrix. If $A$ is symmetric, then $A = A^T = V\Sigma U^T$. Postmultiplying the first and the second equation by $V$ and $U$, respectively, yields
\[ AV = U\Sigma \quad \text{and} \quad AU = V\Sigma. \]
Adding and subtracting these, we obtain
\[ A(U + V) = (U + V)\Sigma \quad \text{and} \quad A(U - V) = (U - V)(-\Sigma). \]
In Exercise 22 on p. 215 it was shown that $n$ of the $2n$ vectors $\vec{u}_1 + \vec{v}_1, \ldots, \vec{u}_n + \vec{v}_n, \vec{u}_1 - \vec{v}_1, \ldots, \vec{u}_n - \vec{v}_n$ are linearly independent. Those vectors are (real) eigenvectors of $A$, with the corresponding (real) eigenvalues $\sigma_i$ or $-\sigma_i$. By Theorem 7.1 $A$ is diagonalizable, and by Theorem 7.4 its eigenvectors corresponding to distinct eigenvalues are orthogonal. Therefore, we can find $n$ orthonormal eigenvectors to form the columns of an orthogonal matrix $Q$ (within each eigenspace, we can use the Gram-Schmidt process to obtain such vectors if necessary).
Here are some important consequences of Theorem 7.5:

• every symmetric matrix has only real eigenvalues, and

• their algebraic multiplicities match the corresponding geometric multiplicities.

To perform the orthogonal diagonalization of a symmetric matrix, we can follow these steps:

Orthogonal diagonalization of a symmetric $n \times n$ matrix $A$.
1. Begin by forming the characteristic polynomial $\det(\lambda I_n - A)$ and finding its zeros – eigenvalues of $A$. (All of these are guaranteed to be real.)
2. For each eigenvalue of algebraic multiplicity 1, solve the corresponding homogeneous system to find a nonzero eigenvector. A unit vector in its direction can be placed in the appropriate column of $Q$.
3. For each eigenvalue of algebraic multiplicity greater than 1,
• solve the corresponding homogeneous system, obtaining a basis for the solution space (eigenspace),
• unless the basis obtained above is already orthogonal, apply the Gram-Schmidt process to produce an orthogonal basis,
• divide each of the orthogonal basis vectors by its magnitude – the resulting vectors should be placed in the appropriate columns of $Q$.

EXAMPLE 7.13 Let us orthogonally diagonalize $A = \begin{bmatrix} -1 & 1 & -1 & 2 \\ 1 & 1 & -1 & -2 \\ -1 & -1 & -1 & -2 \\ 2 & -2 & -2 & -2 \end{bmatrix}$.

The characteristic polynomial is
\[ \det(\lambda I_4 - A) = \det \begin{bmatrix} \lambda + 1 & -1 & 1 & -2 \\ -1 & \lambda - 1 & 1 & 2 \\ 1 & 1 & \lambda + 1 & 2 \\ -2 & 2 & 2 & \lambda + 2 \end{bmatrix} \]
\[ = (\lambda + 1) \det \begin{bmatrix} \lambda - 1 & 1 & 2 \\ 1 & \lambda + 1 & 2 \\ 2 & 2 & \lambda + 2 \end{bmatrix} - (-1) \det \begin{bmatrix} -1 & 1 & 2 \\ 1 & \lambda + 1 & 2 \\ -2 & 2 & \lambda + 2 \end{bmatrix} + \det \begin{bmatrix} -1 & \lambda - 1 & 2 \\ 1 & 1 & 2 \\ -2 & 2 & \lambda + 2 \end{bmatrix} - (-2) \det \begin{bmatrix} -1 & \lambda - 1 & 1 \\ 1 & 1 & \lambda + 1 \\ -2 & 2 & 2 \end{bmatrix} \]
\[ = (\lambda + 1)(\lambda^3 + 2\lambda^2 - 10\lambda + 4) + (-\lambda^2 + 4) + (-\lambda^2 - 6\lambda + 16) + 2(-2\lambda^2 + 8) = \lambda^4 + 3\lambda^3 - 14\lambda^2 - 12\lambda + 40. \]
Check the factors of the free term 40 (1, −1, 2, −2, 4, −4, 5, −5, etc.):
$\det(1I_4 - A) = 1 + 3 - 14 - 12 + 40 = 18 \neq 0$ ⇒ 1 is not an eigenvalue,
$\det(-1I_4 - A) = 1 - 3 - 14 + 12 + 40 = 36 \neq 0$ ⇒ −1 is not an eigenvalue,
$\det(2I_4 - A) = 16 + 3(8) - 14(4) - 12(2) + 40 = 0$ ⇒ 2 is an eigenvalue.
We could use long division right now, but the quotient would be a cubic. Instead, it's better to keep looking until we find one more...
$\det(-2I_4 - A) = 16 + 3(-8) - 14(4) - 12(-2) + 40 = 0$ ⇒ −2 is an eigenvalue.
Consequently, $(\lambda - 2)(\lambda + 2) = \lambda^2 - 4$ is a factor of the characteristic polynomial. Use long division of $\lambda^4 + 3\lambda^3 - 14\lambda^2 - 12\lambda + 40$ by $\lambda^2 - 4$: subtracting $\lambda^2(\lambda^2 - 4)$ leaves $3\lambda^3 - 10\lambda^2 - 12\lambda + 40$; subtracting $3\lambda(\lambda^2 - 4)$ leaves $-10\lambda^2 + 40$; subtracting $-10(\lambda^2 - 4)$ leaves the remainder 0. The quotient is $\lambda^2 + 3\lambda - 10$.
The characteristic polynomial in the factored form
\[ \lambda^4 + 3\lambda^3 - 14\lambda^2 - 12\lambda + 40 = (\lambda - 2)(\lambda + 2)(\lambda^2 + 3\lambda - 10) = (\lambda - 2)^2(\lambda + 2)(\lambda + 5) \]
means that there are the following eigenvalues:

• $\lambda_1 = 2$ with algebraic multiplicity 2,

• $\lambda_2 = -2$ with algebraic multiplicity 1, and

• $\lambda_3 = -5$ with algebraic multiplicity 1.

To find eigenvectors corresponding to $\lambda_1 = 2$, we must solve the homogeneous system $(2I_4 - A)\vec{x} = \vec{0}$. The coefficient matrix, $\begin{bmatrix} 3 & -1 & 1 & -2 \\ -1 & 1 & 1 & 2 \\ 1 & 1 & 3 & 2 \\ -2 & 2 & 2 & 4 \end{bmatrix}$, has the r.r.e.f. $\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. The eigenvectors, i.e., nontrivial solutions, are
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_3 \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix} \]
so that a basis for the eigenspace is formed by $\vec{u}_1 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix}$ and $\vec{u}_2 = \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix}$. These vectors are not orthonormal – we will use the Gram-Schmidt process to obtain an orthonormal basis for the eigenspace.
Begin by finding an orthogonal set:
\[ \vec{v}_1 = \vec{u}_1 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix}; \qquad \vec{v}_2 = \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\vec{v}_1 \cdot \vec{v}_1}\,\vec{v}_1 = \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix} - \frac{4}{6} \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} \\ -\frac{2}{3} \\ -\frac{2}{3} \\ 1 \end{bmatrix} \]
(replace $\vec{v}_2$ with 3 times this vector, $\begin{bmatrix} 2 \\ -2 \\ -2 \\ 3 \end{bmatrix}$, for convenience).

An orthonormal basis for the eigenspace is
\[ \vec{w}_1 = \frac{1}{\|\vec{v}_1\|}\,\vec{v}_1 = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix}; \qquad \vec{w}_2 = \frac{1}{\|\vec{v}_2\|}\,\vec{v}_2 = \frac{1}{\sqrt{21}} \begin{bmatrix} 2 \\ -2 \\ -2 \\ 3 \end{bmatrix}. \]
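For $\lambda_2 = -2$, the coefficient matrix of the homogeneous system $\begin{bmatrix} -1 & -1 & 1 & -2 \\ -1 & -3 & 1 & 2 \\ 1 & 1 & -1 & 2 \\ -2 & 2 & 2 & 0 \end{bmatrix}$ has the r.r.e.f. $\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ so that a solution is $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_3 \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$.

An orthonormal basis for the eigenspace is formed by the unit vector $\frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$.

Check that an orthonormal basis for the eigenspace associated with $\lambda_3 = -5$ is formed by $\frac{1}{\sqrt{7}} \begin{bmatrix} -1 \\ 1 \\ 1 \\ 2 \end{bmatrix}$.

To conclude,
\[ Q = \begin{bmatrix}
\frac{-1}{\sqrt{6}} & \frac{2}{\sqrt{21}} & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{7}} \\
\frac{-2}{\sqrt{6}} & \frac{-2}{\sqrt{21}} & 0 & \frac{1}{\sqrt{7}} \\
\frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{21}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{7}} \\
0 & \frac{3}{\sqrt{21}} & 0 & \frac{2}{\sqrt{7}}
\end{bmatrix}. \]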
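The product $Q^TAQ$ equals
\[ \begin{bmatrix}
\frac{-1}{\sqrt{6}} & \frac{-2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & 0 \\
\frac{2}{\sqrt{21}} & \frac{-2}{\sqrt{21}} & \frac{-2}{\sqrt{21}} & \frac{3}{\sqrt{21}} \\
\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} & 0 \\
\frac{-1}{\sqrt{7}} & \frac{1}{\sqrt{7}} & \frac{1}{\sqrt{7}} & \frac{2}{\sqrt{7}}
\end{bmatrix}
\begin{bmatrix} -1 & 1 & -1 & 2 \\ 1 & 1 & -1 & -2 \\ -1 & -1 & -1 & -2 \\ 2 & -2 & -2 & -2 \end{bmatrix}
\begin{bmatrix}
\frac{-1}{\sqrt{6}} & \frac{2}{\sqrt{21}} & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{7}} \\
\frac{-2}{\sqrt{6}} & \frac{-2}{\sqrt{21}} & 0 & \frac{1}{\sqrt{7}} \\
\frac{1}{\sqrt{6}} & \frac{-2}{\sqrt{21}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{7}} \\
0 & \frac{3}{\sqrt{21}} & 0 & \frac{2}{\sqrt{7}}
\end{bmatrix}
= \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & -5 \end{bmatrix}. \]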
Note how the order of the columns of Q corresponds to the order of the corresponding eigenvalues along the main diagonal of D.
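The orthogonal diagonalization of Example 7.13 can be checked numerically. The following sketch, in plain Python with floating-point arithmetic, normalizes the orthogonal eigenvectors found in the example, assembles $Q$, and forms $Q^TAQ$ and $Q^TQ$; the helpers `matmul` and `transpose` and the variable names are illustrative choices, and results should be compared up to rounding error.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows (hypothetical helper)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[-1, 1, -1, 2], [1, 1, -1, -2], [-1, -1, -1, -2], [2, -2, -2, -2]]

# Orthogonal (not yet unit) eigenvectors from Example 7.13 with their eigenvalues.
eigenpairs = [([-1, -2, 1, 0], 2), ([2, -2, -2, 3], 2),
              ([1, 0, 1, 0], -2), ([-1, 1, 1, 2], -5)]

unit_vectors = []
for v, _ in eigenpairs:
    norm = math.sqrt(sum(x * x for x in v))
    unit_vectors.append([x / norm for x in v])

Q = transpose(unit_vectors)            # unit eigenvectors become columns of Q
QtAQ = matmul(matmul(transpose(Q), A), Q)
QtQ = matmul(transpose(Q), Q)          # should be close to the identity

for row in QtAQ:
    print([round(x, 10) + 0.0 for x in row])
```

Up to rounding, the printed matrix is diagonal with the eigenvalues in the same order as the columns of $Q$.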
Spectral Theorem and projections
An interesting connection can be made between the orthogonal diagonalization and the orthogonal projection onto a subspace $V$ in $\mathbb{R}^n$ with an orthonormal basis $\vec{w}_1, \ldots, \vec{w}_k$
\[ \mathrm{proj}_V \vec{u} = \vec{w}_1\vec{w}_1^T\vec{u} + \cdots + \vec{w}_k\vec{w}_k^T\vec{u}, \]
introduced in Section 6.2. The orthogonal diagonalization can be written as $A = QDQ^T$ where $Q$, as an $n \times n$ orthogonal matrix, must have $n$ orthonormal columns. Denoting those by $\vec{w}_1, \ldots, \vec{w}_n$ we obtain
\[ A = QDQ^T = \begin{bmatrix} | & \cdots & | \\ \vec{w}_1 & \cdots & \vec{w}_n \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \begin{bmatrix} - & \vec{w}_1^T & - \\ & \vdots & \\ - & \vec{w}_n^T & - \end{bmatrix} \]
\[ = \begin{bmatrix} | & \cdots & | \\ \lambda_1\vec{w}_1 & \cdots & \lambda_n\vec{w}_n \\ | & \cdots & | \end{bmatrix} \begin{bmatrix} - & \vec{w}_1^T & - \\ & \vdots & \\ - & \vec{w}_n^T & - \end{bmatrix} = \lambda_1\vec{w}_1\vec{w}_1^T + \cdots + \lambda_n\vec{w}_n\vec{w}_n^T. \]
A linear transformation $F(\vec{u}) = A\vec{u}$ with a symmetric matrix $A$ can be viewed as a sum of scaled orthogonal projections of $\vec{u}$: each projection is performed onto the eigenspace and is scaled by the corresponding eigenvalue. (In Exercise 25 this argument is extended to cover nonsymmetric diagonalizable matrices.)
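The decomposition $A = \lambda_1\vec{w}_1\vec{w}_1^T + \cdots + \lambda_n\vec{w}_n\vec{w}_n^T$ can be illustrated with the matrix of Example 7.13. The sketch below assumes the orthogonal eigenvectors found there and uses the identity $\vec{w}\vec{w}^T = \vec{v}\vec{v}^T / (\vec{v} \cdot \vec{v})$ for $\vec{w} = \vec{v}/\|\vec{v}\|$, which keeps the arithmetic exact; the variable names are illustrative.

```python
from fractions import Fraction

A = [[-1, 1, -1, 2], [1, 1, -1, -2], [-1, -1, -1, -2], [2, -2, -2, -2]]

# (eigenvalue, orthogonal eigenvector) pairs from Example 7.13.
eigenpairs = [(2, [-1, -2, 1, 0]), (2, [2, -2, -2, 3]),
              (-2, [1, 0, 1, 0]), (-5, [-1, 1, 1, 2])]

n = len(A)
reconstructed = [[Fraction(0)] * n for _ in range(n)]
for lam, v in eigenpairs:
    v_dot_v = sum(x * x for x in v)
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of lam * w w^T, where w = v / ||v||.
            reconstructed[i][j] += Fraction(lam * v[i] * v[j], v_dot_v)

print(reconstructed == A)
```

Each summand is a rank-one matrix that projects onto the line spanned by one unit eigenvector and then scales by its eigenvalue.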
EXERCISES

In Exercises 1–4, determine whether the given matrix $A$ is diagonalizable. If so, find a matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$. Verify your results by checking that $AP = PD$.
1. a.
2. a.
⎡ ⎢ 3. a. ⎣
⎡
6 −6 6 −7
−1 1 2 0
;
;
1 −2 −4 0
b.
−5 −2 2 −1
⎡
2 −4
6 4 ⎢ 4. a. ⎣ 0 0 0 6
2 −2 ⎢ b. ⎣ 2 −3 0 0 ⎡ ⎤ ⎢ 3 ⎢ ⎥ 1 ⎦ ; b. ⎢ ⎢ ⎣ −6
⎤
0 ⎥ 0 ⎦; 6
⎡
⎡
;
⎤ 3 2 2 ⎢ ⎥ c. ⎣ −1 0 2 ⎦ . 0 0 0
⎤ −2 ⎥ 1 ⎦; −1 0 −2 1 0 −1 0 0 1 2 0 0 0
1 −2 −1 2 ⎢ ⎢ 0 0 0 2 b. ⎢ ⎢ 0 2 −2 2 ⎣ 1 1 2 0
⎡
⎤ −3 −3 0 ⎢ ⎥ c. ⎣ 0 −3 0 ⎦ . −3 0 3 ⎤ −1 ⎥ 0 ⎥ ⎥. −2 ⎥ ⎦ −2 ⎤ ⎥ ⎥ ⎥. ⎥ ⎦
In Exercises 5–6, determine whether the given matrix A is orthogonally diagonalizable. If so, find an orthogonal matrix Q and a diagonal matrix D such that QT AQ = D. Check that the two sides actually equal. ⎤ ⎡ ⎤ ⎡ 1 2 0 0 1 ⎡ ⎤ 2 0 0 0 ⎥ ⎢ ⎢0 1 0 0 1⎥ 0 1 −1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ −1 −2 0 1 −1 −1 ⎢ ⎥ ⎥ ; d. ⎢ 0 0 1 2 0 ⎥ . 5. a. ; b. ⎣ 1 0 1 ⎦ ; c. ⎢ ⎥ ⎢ ⎥ ⎢ −2 2 1 −1 ⎦ ⎥ ⎢ ⎣ 0 −1 0 0 1 5 0 −1 1 0 ⎦ ⎣ 0 −1 −1 1 1 2 0 0 1 ⎡ ⎤ ⎡ ⎤ −2 −1 1 0 −4 1 2 ⎢ ⎥ ⎢ −1 −1 −4 6 1 −1 ⎥ ⎢ ⎥ ⎢ ⎥. 6. a. ; b. ⎣ 1 −4 2 ⎦ ; c. ⎢ 6 1 1 −1 1 ⎥ ⎣ 1 ⎦ 2 2 −1 0 −1 1 −2 In Exercises 7–8, follow the procedure of Example 7.12 to calculate the following powers of (diagonalizable) matrices. (If the matrix is symmetric, you may find it easier to use orthogonal diagonalization.) 6 10 11 2 2 −1 1 0 1 ; b. ; c. . 7. a. 1 3 1 −1 2 1
8. a.
2 −3 −2 1
7
; b.
−1 2 1 −2
9
; c.
$\begin{bmatrix} 2 & -2 \\ -2 & -1 \end{bmatrix}^8$.
9. * Given that $\vec{v}$ is a nonzero $n$-vector, consider the matrix $A = \vec{v}\vec{v}^T$.
a. Show that $A$ has an eigenvalue $\lambda = 0$ of algebraic multiplicity $n - 1$.
b. Determine the nonzero eigenvalue of $A$ and a corresponding eigenvector.
T/F?

In Exercises 10–13, decide whether each statement is true or false. Justify your answer.
10. If $A$ is a $2 \times 2$ matrix such that $P^{-1}AP = \begin{bmatrix} -3 & 0 \\ 0 & 1 \end{bmatrix}$, then 1 is an eigenvalue of $A$.
11. If an $n \times n$ matrix $A$ has $n$ linearly independent columns, then $A$ is diagonalizable.
12. If a $3 \times 3$ matrix $A$ has only real eigenvalues, then $A$ is symmetric.
13. If a $3 \times 3$ matrix $A$ is symmetric, then $A$ has only real eigenvalues.
14. * Given the matrices $Q = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $D = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$, verify that $Q^TAQ = D$, even though $A$ is not symmetric. Why does this not contradict Theorem 7.5?
15. * Use mathematical induction to prove equality (121).
16. * Show that if $A$ and $B$ are similar matrices, then $\det A = \det B$.
17. * Given an $n \times n$ matrix $A$, let $F(\vec{x}) = A\vec{x}$. If $B = P^{-1}AP$ for some nonsingular matrix $P = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}$ (i.e., $B$ is similar to $A$), show that $B$ is the matrix of $F$ with respect to the basis $T = \{\vec{u}_1, \ldots, \vec{u}_n\}$. (Hint: Consider Exercises 27 and 28 on p. 250.)
18. * Combine the results from Exercise 17 above and Exercise 25 on p. 250 to show that if $B = P^{-1}AP$ and the columns of $P$ are eigenvectors of $A$, then $B$ is diagonal. (Do not use Theorem 7.1.)
19. * Show that if $A$ and $B$ are similar matrices, then $\operatorname{rank} A = \operatorname{rank} B$.

In Exercises 20 and 21, refer to Exercise 42 in Section 1.3 (p. 34) for a discussion of matrix polynomials.
20. * The Cayley-Hamilton Theorem states that if $A$ is an $n \times n$ matrix with the characteristic polynomial $p(\lambda) = \det(\lambda I - A)$, then $p(A) = 0_{n \times n}$ where $0_{n \times n}$ denotes the $n \times n$ zero matrix. Verify the statement of this theorem if $A$ is diagonalizable.
21. * In this exercise, we will prove the Cayley-Hamilton Theorem (see Exercise 20) in the general case of an $n \times n$ matrix $A$. Let us denote the characteristic polynomial of $A$:
\[ p(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_2\lambda^2 + c_1\lambda + c_0. \]
a. Applying Theorem 3.14 to the matrix $\lambda I - A$ show that there exist $n \times n$ matrices $B_0, \ldots, B_{n-1}$ such that
\[ \left(\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_2\lambda^2 + c_1\lambda + c_0\right) I = (\lambda I - A)\left[\lambda^{n-1}B_{n-1} + \lambda^{n-2}B_{n-2} + \cdots + \lambda^2 B_2 + \lambda B_1 + B_0\right]. \]
(Hint: The polynomial in brackets is the adjoint of $\lambda I - A$. Why is its degree $\leq n - 1$?)
b. Expand the right-hand side of the equality obtained in part a. Both sides are now polynomials of $n$th degree. This equality is required to hold for all values of $\lambda$; therefore, the (matrix) coefficients corresponding to the same power of $\lambda$ must be the same. Set up the resulting system of $n + 1$ matrix equations:
(LHS matrix coefficient of $\lambda^n$) = (RHS matrix coefficient of $\lambda^n$)
⋮
(LHS matrix coefficient of $\lambda^1$) = (RHS matrix coefficient of $\lambda^1$)
(LHS matrix coefficient of $\lambda^0$) = (RHS matrix coefficient of $\lambda^0$).
c. Multiply both sides of the first equation by $A^n$, both sides of the second equation by $A^{n-1}$, etc. Adding all resulting equations should yield the statement of the Cayley-Hamilton Theorem.
22. * Given an $n \times n$ matrix $A$, suppose
\[ \det(\lambda I_n - A) = (\lambda - \lambda_1)^{p_1} \cdots (\lambda - \lambda_k)^{p_k} \]
so that all eigenvalues of $A$ are real numbers, and the eigenvalue $\lambda_i$ has algebraic multiplicity $p_i$. Let us assume that the geometric multiplicity of $\lambda_i$ exceeds its algebraic multiplicity; i.e., there exist $p_i + 1$ linearly independent eigenvectors corresponding to $\lambda_i$: $\vec{u}_1, \ldots, \vec{u}_{p_i+1}$.
a. Use the result of Theorem 4.13, part b, to show that a basis for $\mathbb{R}^n$ can be obtained by adding $n - (p_i + 1)$ vectors $\vec{v}_j$ to the list: $T = \{\vec{u}_1, \ldots, \vec{u}_{p_i+1}, \vec{v}_{p_i+2}, \ldots, \vec{v}_n\}$.
b. Let $P$ be a matrix whose columns are the vectors in the basis $T$ (in the same order). Show that $AP = PF$ where $F = \begin{bmatrix} D & B \\ 0 & C \end{bmatrix}$ and $D$ is the $(p_i + 1) \times (p_i + 1)$ diagonal matrix whose main diagonal entries all equal $\lambda_i$.
c. Use the result of Exercise 25 on p. 129 to show that the characteristic polynomial of the matrix $F$ from part b has a factor $(\lambda - \lambda_i)^{p_i+1}$.
d. Show that $A$ and $F$ are similar matrices, and then prove that part c leads to a contradiction, therefore forcing us to acknowledge that
(Algebraic multiplicity of $\lambda_i$) ≥ (Geometric multiplicity of $\lambda_i$).
23. a. * Verify that the matrix $A = \begin{bmatrix} 2 & 0 & 0 \\ 2 & 0 & -1 \\ 3 & 2 & 3 \end{bmatrix}$ of Example 7.11 satisfies $AP = PF$ where $P = \begin{bmatrix} 0 & 0 & 1 \\ -1 & -5 & 7 \\ 1 & 10 & -7 \end{bmatrix}$ and $F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}$ (while $F$ is not a diagonal matrix, it can be viewed as a block diagonal matrix with two Jordan blocks on the main diagonal – refer to Exercise 34 on p. 310).
b. * Partition $P$ into columns: $P = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \vec{u}_3 \end{bmatrix}$ to show that
\[ A\vec{u}_1 = 1\vec{u}_1, \qquad A\vec{u}_2 = 2\vec{u}_2, \qquad A\vec{u}_3 = 2\vec{u}_3 + \vec{u}_2. \]
c. * In part b, you have shown that the first two columns of $P$ are eigenvectors of $A$. Verify that the third column, $\vec{u}_3 = \begin{bmatrix} 1 \\ 7 \\ -7 \end{bmatrix}$, satisfies
\[ (2I_3 - A)^2\,\vec{u}_3 = \vec{0}. \]
(Any nonzero vector $\vec{x}$ such that $(\lambda I - A)^k\vec{x} = \vec{0}$ is called a generalized eigenvector of $A$ corresponding to $\lambda$.)
24. * Suppose $\vec{x}$ is a generalized eigenvector associated with $\lambda$; i.e., there exists a positive integer $k$ for which
\[ (\lambda I - A)^k\vec{x} = \vec{0}. \tag{122} \]
Show that $\lambda$ is an eigenvalue of $A$ associated with some eigenvector $\vec{y}$; i.e., the nullity of $\lambda I - A$ is at least 1. (Hint: Demonstrate that assuming $\operatorname{nullity}(\lambda I - A) = 0$ would render (122) impossible.)
25. * Let $A$ be a diagonalizable matrix so that a diagonal matrix $D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$ exists such that $A = PDP^{-1}$. Partitioning $P$ into columns and $P^{-1}$ into rows, we have $P = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}$ and $P^{-1} = \begin{bmatrix} \vec{w}_1^T \\ \vdots \\ \vec{w}_n^T \end{bmatrix}$.
a. Show that $A = \lambda_1\vec{v}_1\vec{w}_1^T + \cdots + \lambda_n\vec{v}_n\vec{w}_n^T$.
b. Show that $\vec{v}_i \cdot \vec{w}_i = 1$ for all $i = 1, \ldots, n$.
c. On p. 321, $A\vec{u}$ was described as “a sum of scaled orthogonal projections of $\vec{u}$: each projection is performed onto the eigenspace and is scaled by the corresponding eigenvalue.” However, that description applies to symmetric matrix $A$ only – how can $A\vec{u}$ be described in terms of projections if $A$ is diagonalizable but not symmetric? (Hint: Use the results of parts a and b above along with Exercise 42 on p. 276.)
7.3 Applications of Eigenvalues and Eigenvectors
Population growth example revisited
EXAMPLE 7.14 In Example 1.26 of Section 1.4, we have studied a model for population dynamics of perennial grasses based on the following linear transformation:
\[ \vec{x}^{(n+1)} = F(\vec{x}^{(n)}) = \underbrace{\begin{bmatrix} 0 & 0 & 500 \\ 0.006 & 0.4 & 0.4 \\ 0 & 0.4 & 0.4 \end{bmatrix}}_{A} \vec{x}^{(n)}. \]
Recall that each component of the vectors $\vec{x}^{(n)}$ contains information on the population count in year $n$ among: seeds (S), vegetative adults (V), and generative adults (G). Based on the initial state $\vec{x}^{(0)} = \begin{bmatrix} 12{,}500 \\ 0 \\ 0 \end{bmatrix}$, the state vectors for years 1, 2, and 3 have already been found in Example 1.26 of Section 1.4. On p. 326, we present a table containing the corresponding data through the year 20. It appears that, after the initial perturbations (e.g., years 1 and 2 with no seeds), later on, all three groups (S, V, and G) enjoy a steady growth. To gain some insight into the rate of this perceived growth, on p. 327 we tabulate the ratios (for all $n = 4, \ldots, 20$) of the counts of S, V, and G in the $n$th year divided by the count in the year $n - 1$. For larger $n$ values, each of the three populations appears to be growing at about the same rate, close to 1.4.
Table for Example 7.14 showing the counts of S, V, G.

  n       S(n)    V(n)   G(n)
  0      12500       0      0
  1          0      75      0
  2          0      30     30
  3      15000      24     24
  4      12000     109     19
  5       9600     123     51
  6      25680     127     70
  7      34944     233     79
  8      39475     334    125
  9      62396     421    184
 10      91850     616    242
 11     120850     894    343
 12     171555    1220    495
 13     247464    1715    686
 14     342991    2445    961
 15     480259    3420   1362
 16     681164    4795   1913
 17     956521    6770   2683
 18    1341528    9520   3781
 19    1890620   13370   5321
 20    2660321   18820   7476
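The table of counts can be regenerated by iterating $\vec{x}^{(n+1)} = A\vec{x}^{(n)}$ from the initial state of Example 7.14. The sketch below is plain Python with floating-point arithmetic; it assumes that the tabulated counts are the unrounded iterates rounded to the nearest integer for display, and the names `history` and `ratios` are illustrative.

```python
A = [[0, 0, 500], [0.006, 0.4, 0.4], [0, 0.4, 0.4]]
x = [12500.0, 0.0, 0.0]          # initial state: seeds only

history = [x]
for _ in range(20):
    # One year of the model: multiply the current state by A.
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    history.append(x)

for n, (s, v, g) in enumerate(history):
    print(n, round(s), round(v), round(g))

# Year-over-year growth ratios of S, V and G at n = 20.
ratios = [history[20][i] / history[19][i] for i in range(3)]
print([round(r, 3) for r in ratios])
```

Rounding is applied only when printing, so the iteration itself carries the fractional counts forward.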
Perron-Frobenius theory for regular matrices
Let us see how these findings relate to the eigenvalues and eigenvectors of A. The characteristic polynomial of A is ⎡ ⎤ λ 0 −500 ⎢ ⎥ det(λI3 − A) = det ⎣ −0.006 λ − 0.4 −0.4 ⎦ 0 −0.4 λ − 0.4 = λ (λ − 0.4)2 − 0.16 − (500) (0.0024) = λ λ2 − 0.8λ + 0.16 − 0.16 − 1.2 = λ3 − 0.8λ2 − 1.2. It would not be an easy matter to find zeros of this polynomial by hand. Using a computational device instead, we found that, of the three eigenvalues, two are complex, while one is → approximately λ ≈ 1.407. This real eigenvalue corresponds to the eigenspace span{− v } where, ⎡ ⎤ 355.5 ⎢ ⎥ → approximately, − v ≈ ⎣ 2.52 ⎦ . (You should be able to verify, using your calculator, that 1 ⎤ ⎡ 500 ⎥ ⎢ → → both A− v and λ− v are approximately equal to ⎣ 3. 54 ⎦ .) 1. 408 Note that the rate of growth in the second table approaches 1.407 very closely, and the propor→ tion of S, V, and G counts in the first table resembles that of the eigenvector − v more and more as n increases.
→ In some ways, the behavior we have observed may not be surprising: if − v is an eigenvector −−−−→ −−→ (n+1) corresponding to an eigenvalue λ and we have a system of the form x = Ax(n) , then −−→ → letting x(n) = c− v ,we obtain −−−−→ → → → x(n+1) = A(c− v ) = c (A− v ) = cλ− v, −−→ −−−−→ which means x(n+1) = λx(n) . −−→ However, this leaves several questions unanswered: In our case x(0) was not an eigenvector – why did the subsequent iterates approach multiples of eigenvectors? Can this type of behavior be expected to continue if we change the starting vector? In the presence of multiple eigenvalues and their corresponding eigenvectors, how do we know which eigenvectors are being approached?
In this subsection, we shall discuss some key theoretical results that will allow us to gain further insight into many linear models like the one discussed above.
DEFINITION A matrix $A = [a_{ij}]$ is said to be positive if $a_{ij} > 0$ for all $i$ and $j$. Likewise, we say $A$ is nonnegative if $a_{ij} \ge 0$ for all $i, j$. An $n \times n$ nonnegative matrix $A$ is called regular if there exists a positive integer $k$ such that $A^k$ is positive.
Since an n-vector is an n × 1 matrix, we can apply this definition to refer to positive or nonnegative vectors. We omit the proof¹⁹ of the following theorem, parts of which are due to Perron and Frobenius:
THEOREM 7.6
If $A$ is a regular $n \times n$ matrix, then:
1. There exists a positive real eigenvalue $\lambda_{\max}$ (called the dominant eigenvalue) such that all remaining eigenvalues (real or complex) are smaller in magnitude.²⁰
2. The eigenvalue $\lambda_{\max}$ is a single root of the characteristic equation of $A$.
3. Any eigenvector $\vec{v}$ corresponding to $\lambda_{\max}$ is either positive or its negative is. No other eigenvalue has a nonnegative eigenvector in its eigenspace.
4. If $\vec{v}$ is a positive eigenvector corresponding to $\lambda_{\max}$, then for any nonnegative n-vector $\vec{x} \neq \vec{0}$ there exists a constant $c > 0$ such that
\[
\lim_{r \to \infty} \frac{A^r \vec{x}}{\lambda_{\max}^r} = c\vec{v}. \tag{123}
\]

Table for Example 7.14 showing the ratios of counts of S, V, G.
n    S(n)/S(n-1)   V(n)/V(n-1)   G(n)/G(n-1)
4    0.8           4.55          0.8
5    0.8           1.13          2.675
6    2.675         1.033         1.361
7    1.361         1.828         1.13
8    1.13          1.435         1.581
9    1.581         1.257         1.472
10   1.472         1.465         1.316
11   1.316         1.451         1.42
12   1.42          1.364         1.442
13   1.442         1.406         1.386
14   1.386         1.426         1.4
15   1.4           1.399         1.418
16   1.418         1.402         1.404
17   1.404         1.412         1.403
18   1.403         1.406         1.409
19   1.409         1.404         1.407
20   1.407         1.408         1.405

EXAMPLE 7.15
The matrix $A$ in Examples 1.26 and 7.14 is nonnegative, but not positive. We ask the reader to verify that while $A^2$ is not positive,
\[
A^3 = \begin{bmatrix} 1.2 & 160 & 160 \\ 0.00192 & 1.456 & 2.656 \\ 0.00192 & 0.256 & 1.456 \end{bmatrix}
\]
is. Consequently, $A$ is regular; thus Theorem 7.6 applies.
The matrix $A$ has three eigenvalues: one real, $\lambda \approx 1.407$, and two complex, $\lambda \approx -0.303 \pm 0.872i$. The magnitude of the latter two is approximately 0.923. As expected, the dominant eigenvalue is real: $\lambda_{\max} \approx 1.407$.
Taking $\vec{v} \approx \begin{bmatrix} 355.5 \\ 2.52 \\ 1 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 12{,}500 \\ 0 \\ 0 \end{bmatrix}$ we can verify that for large $r$, $\dfrac{A^r \vec{x}}{\lambda_{\max}^r} \approx 8.14\,\vec{v}$.

Note that if $A$ is regular, so is $A^T$; therefore, the theorem applies to it as well, guaranteeing that the dominant eigenvalue $\lambda_{\max}$ of $A^T$ ($A^T$ and $A$ always have the same eigenvalues) corresponds to a positive eigenvector $\vec{w}$. It can be shown that this eigenvector yields the following formula (also left without a proof) for the constant $c$ in (123):
\[
c = \frac{\vec{w} \cdot \vec{x}}{\vec{w} \cdot \vec{v}}. \tag{124}
\]

19 Throughout this book, we have made an effort to provide a complete justification to all major statements we made. However, we are making an exception for this theorem. An interested reader can find a number of sources containing a proof, including the article “The Many Proofs and Applications of Perron’s Theorem” by C. R. MacCluer, SIAM Review, 42 (2000), pp. 487–498, as well as many of the references contained therein.
20 The magnitude of a complex number $a + bi$ is $\sqrt{a^2 + b^2}$.
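The limit (123) and the formula (124) can be checked numerically for the matrix of Examples 7.14 and 7.15. The sketch below is only an illustration under stated assumptions: the helper names (`mat_vec`, `dominant_root`, `scaled_power`) are hypothetical, the dominant eigenvalue is located by bisection on the characteristic polynomial, and the eigenvectors are approximated by repeated multiplication rather than by any method the text prescribes.

```python
# Matrix A of Examples 7.14 and 7.15, read off from det(lambda*I - A).
A = [[0.0, 0.0, 500.0],
     [0.006, 0.4, 0.4],
     [0.0, 0.4, 0.4]]

def mat_vec(M, x):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dominant_root(lo=1.0, hi=2.0, steps=100):
    """Bisection for the real root of t^3 - 0.8 t^2 - 1.2 between lo and hi."""
    f = lambda t: t**3 - 0.8 * t**2 - 1.2
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def scaled_power(M, x, lam, r):
    """Return M^r x / lam^r, dividing by lam at every step to avoid overflow."""
    for _ in range(r):
        x = [xi / lam for xi in mat_vec(M, x)]
    return x

lam_max = dominant_root()
x0 = [12500.0, 0.0, 0.0]

limit = scaled_power(A, x0, lam_max, 200)   # approximates c * v, as in (123)
c_limit = limit[2]                          # v is scaled so that its last entry is 1
v = [entry / c_limit for entry in limit]

# A positive eigenvector of A^T (any positive multiple works in (124)).
w = scaled_power(transpose(A), [1.0, 1.0, 1.0], lam_max, 200)
c_formula = dot(w, x0) / dot(w, v)          # formula (124)

print(lam_max, v)
print(c_limit, c_formula)
```

Both values of c should agree with each other and with the value 8.14 quoted in Example 7.15, up to rounding.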
Markov chains

EXAMPLE 7.16 Imagine a building with three rooms connected by open passages shown in the margin. A rat is initially placed in one of the rooms and is moving to one of the neighboring rooms, choosing a passage at random.

[Margin figure: layout of rooms 1, 2, and 3 with the passages connecting them]

If a rat is currently in room 1, it can end up in any of the neighboring two rooms with the same likelihood, i.e., with probability 1/2 for each. On the other hand, a rat in room 2 is twice as likely to move to room 3 next (as two openings lead there) as to room 1, making the corresponding probabilities 2/3 and 1/3, respectively. (A similar situation applies to a rat in room 3.)
The following transition matrix can be used to multiply a vector $\vec{x}^{(n)} = \begin{bmatrix} p_1^{(n)} \\ p_2^{(n)} \\ p_3^{(n)} \end{bmatrix}$ to obtain $\vec{x}^{(n+1)}$, where $p_i^{(n)}$ denotes the probability of the rat being in room $i$ after the $n$th move:
\[
A = \begin{bmatrix} 0 & 1/3 & 1/3 \\ 1/2 & 0 & 2/3 \\ 1/2 & 2/3 & 0 \end{bmatrix}.
\]
Note that $A$ is nonnegative. Squaring $A$ we obtain a positive matrix
\[
A^2 = \begin{bmatrix} \frac{1}{3} & \frac{2}{9} & \frac{2}{9} \\ \frac{1}{3} & \frac{11}{18} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{6} & \frac{11}{18} \end{bmatrix}
\]
so $A$ is a regular matrix. In agreement with Theorem 7.6, its eigenvalues are $1$ ($= \lambda_{\max}$), $-\frac{1}{3}$, and $-\frac{2}{3}$ (check!).
Since the matrix $1I_3 - A = \begin{bmatrix} 1 & -1/3 & -1/3 \\ -1/2 & 1 & -2/3 \\ -1/2 & -2/3 & 1 \end{bmatrix}$ has the r.r.e.f. $\begin{bmatrix} 1 & 0 & -\frac{2}{3} \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$, the eigenspace associated with 1 is $\operatorname{span}\left\{\begin{bmatrix} \frac{2}{3} \\ 1 \\ 1 \end{bmatrix}\right\}$.
Exactly one of these eigenvectors has components that add up to unity:
\[
\vec{v} = \frac{1}{\frac{2}{3} + 1 + 1}\begin{bmatrix} \frac{2}{3} \\ 1 \\ 1 \end{bmatrix} = \frac{3}{8}\begin{bmatrix} \frac{2}{3} \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} \\ \frac{3}{8} \\ \frac{3}{8} \end{bmatrix} = \begin{bmatrix} 0.25 \\ 0.375 \\ 0.375 \end{bmatrix}.
\]
It is rather amazing to see how quickly this vector is being approached when we start with the rat in room 1:

n    x^(n) (entries listed top to bottom)
0    1, 0, 0
1    0, 1/2, 1/2
2    1/3, 1/3, 1/3
3    2/9, 7/18, 7/18  ≈  0.222, 0.389, 0.389
4    7/27, 10/27, 10/27  ≈  0.259, 0.370, 0.370

The convergence is not always this quick, as evidenced when the rat starts in room 2 instead:

n    x^(n) (entries listed top to bottom)
0    0, 1, 0
1    1/3, 0, 2/3
2    2/9, 11/18, 1/6
3    7/27, 2/9, 14/27  ≈  0.259, 0.222, 0.519
4    20/81, 77/162, 5/18  ≈  0.247, 0.475, 0.278

However, we do get there eventually as $\vec{x}^{(10)} \approx \begin{bmatrix} 0.25 \\ 0.384 \\ 0.366 \end{bmatrix}$ and $\vec{x}^{(20)} \approx \begin{bmatrix} 0.25 \\ 0.375 \\ 0.375 \end{bmatrix}$.
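The iterates listed in Example 7.16 can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming Python's standard `fractions` module; the function names `step` and `iterate` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction as F

# Transition matrix of Example 7.16 (column j holds the probabilities of leaving room j).
A = [[F(0), F(1, 3), F(1, 3)],
     [F(1, 2), F(0), F(2, 3)],
     [F(1, 2), F(2, 3), F(0)]]

def step(M, x):
    """One move of the chain: x^(n+1) = M x^(n)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def iterate(M, x, n):
    """Apply n moves to the starting probability vector x."""
    for _ in range(n):
        x = step(M, x)
    return x

start_room_1 = [F(1), F(0), F(0)]
start_room_2 = [F(0), F(1), F(0)]

for n in range(5):
    print(n, iterate(A, start_room_1, n), iterate(A, start_room_2, n))

# Decimal approximations of the later iterates when starting in room 2.
x10 = [float(p) for p in iterate(A, start_room_2, 10)]
x20 = [float(p) for p in iterate(A, start_room_2, 20)]
print(x10, x20)
```

Working with fractions avoids rounding, so the printed vectors for n = 0, ..., 4 can be compared entry by entry with the two tables above.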
You may have noticed that the constant $c$ of (123) appears to be 1 for our choice of the eigenvector $\vec{v}$. This is not a coincidence, as we are about to establish below.
A nonnegative vector $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ is said to be a probability vector (or a stochastic vector) if $x_1 + \cdots + x_n = 1$. A square matrix whose columns are all probability vectors is called a stochastic matrix.
THEOREM 7.7
If $A$ is a regular $n \times n$ stochastic matrix, then:
1. There exists an eigenvalue $\lambda_{\max} = 1$ such that all remaining eigenvalues (real or complex) are smaller in magnitude.
2. The eigenvalue $\lambda_{\max} = 1$ is a single root of the characteristic equation of $A$.
3. Any eigenvector $\vec{v}$ corresponding to $\lambda_{\max} = 1$ is either positive or its negative is. No other eigenvalues correspond to nonnegative eigenvectors.
4. There is a unique eigenvector $\vec{v}$ corresponding to $\lambda_{\max} = 1$ that satisfies the definition of a probability vector; $\vec{v}$ is called the stable vector of $A$. For any probability n-vector $\vec{x}$
\[
\lim_{r \to \infty} A^r \vec{x} = \vec{v}. \tag{125}
\]
PROOF Each of the four properties mirrors respective properties of the more general Theorem 7.6.
From properties 1–3 of that theorem, we know a positive $\vec{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$ can be found (scaled so that its entries add up to 1) such that $A\vec{v} = \lambda_{\max}\vec{v}$ for some real positive $\lambda_{\max}$. However, because both $A$ and $\vec{v}$ are stochastic, the sum of the elements in the product $A\vec{v}$ must be
\[
\sum_{i=1}^{n} (\operatorname{row}_i A) \cdot \vec{v} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} v_j = \sum_{j=1}^{n} \underbrace{\left(\sum_{i=1}^{n} a_{ij}\right)}_{=1} v_j = \sum_{j=1}^{n} v_j = 1.
\]
The only way for the right-hand side vector $\lambda_{\max}\vec{v}$ to meet the same requirement is when $\lambda_{\max} = 1$.
To prove property 4, observe that the positive n-vector $\vec{w} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$ is an eigenvector of $A^T$ corresponding to $\lambda_{\max} = 1$:
\[
A^T \vec{w} = \begin{bmatrix} a_{11} & \cdots & a_{n1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} + \cdots + a_{n1} \\ \vdots \\ a_{1n} + \cdots + a_{nn} \end{bmatrix} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \vec{w},
\]
since each of the column sums $a_{1j} + \cdots + a_{nj}$ equals 1.
Since both $\vec{x}$ and $\vec{v}$ have entries that sum to 1, we have $\vec{w} \cdot \vec{x} = \vec{w} \cdot \vec{v} = 1$ so that (124) yields $c = 1$, consequently making (123) lead to (125).

A process described by $\vec{x}^{(n+1)} = A\vec{x}^{(n)}$ where both $A$ and $\vec{x}$ are stochastic is called a Markov chain. While quite simple in nature (as a transition from one state to the next is linear and involves “no memory” of the previous states), Markov chains are used to model a surprisingly diverse spectrum of real-life phenomena.
Let us conclude this subsection with a warning to the reader: check the assumptions of a theorem carefully before you use it.
EXAMPLE 7.17 Suppose we modified the layout of the building of Example 7.16 as shown in the margin.

[Margin figure: modified layout of rooms 1, 2, and 3]

The resulting transition matrix is $A = \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix}$.
This matrix is stochastic, but we must check if $A$ is regular before invoking Theorem 7.7. To do so, let us evaluate powers of $A$:
\[
A^2 = \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix} = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \end{bmatrix},
\]
\[
A^3 = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix}.
\]
There is no need to go any further: since $A^3 = A$, we will get $A^4 = A^3 A = AA = A^2$ next, then $A^5 = A^3 = A$, etc. All odd powers of $A$ are the same as $A$, whereas all even powers of $A$ are the same as $A^2$. As neither $A$ nor $A^2$ is positive, we conclude that $A$ is not regular.
Theorem 7.7 does not apply here! Indeed, $A$ has two eigenvalues of magnitude one: 1 and −1. Furthermore, if we let a rat start in room 1, we get a sequence

n    x^(n) (entries listed top to bottom)
0    1, 0, 0
1    0, 1, 0
2    1/2, 0, 1/2
3    0, 1, 0
4    1/2, 0, 1/2
5    0, 1, 0
···  ···

that obviously diverges.
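Checking regularity by hand, as in Examples 7.16 and 7.17, amounts to inspecting successive powers of A. The sketch below automates that search. It relies on one fact that is not stated in the text and is brought in here as an assumption: by Wielandt's bound, if an n × n nonnegative matrix has any positive power, then the power with exponent (n − 1)² + 1 is already positive, so the search can stop there. The function name `regularity_exponent` is hypothetical.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_positive(M):
    return all(entry > 0 for row in M for entry in row)

def regularity_exponent(A):
    """Smallest k with A^k positive, or None if no power up to the bound is positive."""
    n = len(A)
    bound = (n - 1) ** 2 + 1        # Wielandt's bound (assumption from outside the text)
    power = A
    for k in range(1, bound + 1):
        if is_positive(power):
            return k
        power = mat_mul(power, A)
    return None

rooms_7_16 = [[0, 1/3, 1/3], [1/2, 0, 2/3], [1/2, 2/3, 0]]
rooms_7_17 = [[0, 1/2, 0], [1, 0, 1], [0, 1/2, 0]]

print(regularity_exponent(rooms_7_16))   # 2, since A^2 is positive
print(regularity_exponent(rooms_7_17))   # None, the powers alternate between A and A^2
```

For a nonnegative matrix, a return value of None is therefore evidence (under the stated bound) that the matrix is not regular, which matches the conclusion of Example 7.17.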
Google’s PageRank

Generally, web search engines attempt to arrange the list of pages that match the user’s query, placing the (hopefully) most relevant ones near the top. In this subsection we present a key idea behind the PageRank algorithm which was developed by Sergey Brin and Lawrence Page,²¹ the founders of Google.

EXAMPLE 7.18 Let us examine four web pages, each of which contains hyperlinks to some of the other pages (see the illustration in the margin). Suppose a web user is clicking on those hyperlinks at random, with equal probability of each click. If $p_i^{(n)}$ denotes the probability of visiting page $i$ after the $n$th click, then the probability vector
\[
\vec{x}^{(n)} = \begin{bmatrix} p_1^{(n)} \\ p_2^{(n)} \\ p_3^{(n)} \\ p_4^{(n)} \end{bmatrix}
\]
undergoes the transformation $\vec{x}^{(n+1)} = A\vec{x}^{(n)}$ with the stochastic transition matrix
\[
A = \begin{bmatrix} 0 & 0 & 0 & 1/2 \\ 1/2 & 0 & 0 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 1 & 1 & 0 \end{bmatrix} \quad \text{(check!)}.
\]

[Margin figure: Illustration for Example 7.18, showing pages 1, 2, 3, and 4 (a “hand” indicates where a user is clicking on a hyperlink; the arrow points to the link’s target)]

It can be checked (not necessarily by hand) that $A^6$ is a positive matrix; thus $A$ is regular. The reader should verify that the eigenspace corresponding to the dominant eigenvalue 1 is $\operatorname{span}\left\{\begin{bmatrix} 1/2 \\ 1/4 \\ 3/4 \\ 1 \end{bmatrix}\right\}$. Therefore, in the limit, $\vec{x}^{(n)}$ approaches the stable vector
\[
\frac{1}{\frac{1}{2} + \frac{1}{4} + \frac{3}{4} + 1}\begin{bmatrix} 1/2 \\ 1/4 \\ 3/4 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}.
\]
Based on these probabilities, we would rank these four pages in this order: 4, 3, 1, 2 (from the most likely to be visited to the least likely).
In general, in the set of $n$ web pages, we consider an $n \times n$ stochastic matrix $A$ such that $a_{ij}$ represents the probability of jumping to page $i$ from page $j$. If multiple links exist from page $j$ to $i$, they are counted as one (this is different from the situation involving rats moving between rooms in the last subsection, where multiple openings between two rooms would increase the likelihood of the corresponding move).
However, a page ranking scheme devised in this way could potentially fail to produce a unique $\lim_{n \to \infty} \vec{x}^{(n)}$.

21 S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Seventh International World-Wide Web Conference (WWW 1998), April 14–18, 1998, Brisbane, Australia, 1998.

EXAMPLE 7.19 Consider three web pages where
• page 1 links to page 2,
• page 2 links to both pages 1 and 3, and
• page 3 links to page 2.
After we construct the transition matrix $A = \begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix}$, we discover that this is the same matrix used in Example 7.17. Clearly, the kind of unstable behavior exhibited by probability vectors there would be highly undesirable in a search engine!
To ameliorate this problem, Brin and Page used a simple modification of the original stochastic matrix $A$ by considering the matrix
\[
C = (1 - \varepsilon)A + \varepsilon B. \tag{126}
\]
The $n \times n$ matrix $B$ has all of its $n^2$ entries equal to $\frac{1}{n}$. The number $\varepsilon$ is chosen arbitrarily to be 0.15. The resulting $n \times n$ matrix $C$ is a stochastic matrix (an easy proof is left as an exercise for the reader). The Markov chain defined by $\vec{x}^{(n+1)} = C\vec{x}^{(n)}$ can be interpreted as follows:
• When visiting any page, there is a probability $\varepsilon = 0.15$ that the next page visited will be chosen completely at random (from among all pages available, not just those linked to from the current one).
• With probability $1 - \varepsilon = 0.85$, the user will follow a link on the current page (with equal probability assumed for each page being linked to, as discussed above).
At first, this modification may not appeal to us, until we realize that it accomplishes something of great importance: it ensures $C$ is a positive matrix; therefore, it is regular. Consequently, convergence of the sequence $C^n \vec{x}^{(0)}$ to the unique stable vector is guaranteed.

EXAMPLE 7.20 Let us construct the matrix $C$ for the “web” of three pages in Example 7.19:
\[
C = 0.85\begin{bmatrix} 0 & 1/2 & 0 \\ 1 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix} + 0.15\begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix} = \begin{bmatrix} 0.05 & 0.475 & 0.05 \\ 0.9 & 0.05 & 0.9 \\ 0.05 & 0.475 & 0.05 \end{bmatrix} = \frac{1}{40}\begin{bmatrix} 2 & 19 & 2 \\ 36 & 2 & 36 \\ 2 & 19 & 2 \end{bmatrix}.
\]
The eigenspace corresponding to the dominant eigenvalue is $\operatorname{span}\left\{\begin{bmatrix} 1 \\ \frac{36}{19} \\ 1 \end{bmatrix}\right\} = \operatorname{span}\left\{\begin{bmatrix} 19/74 \\ 36/74 \\ 19/74 \end{bmatrix}\right\}$.
The resulting page rank would place page 2 at the top, followed by pages 1 and 3 (a tie).
Using some technology, the reader can check that constructing $C$ in Example 7.18 leads to the stable vector approximately equal to $\begin{bmatrix} 0.202 \\ 0.123 \\ 0.288 \\ 0.387 \end{bmatrix}$. While this is somewhat different from the vector $\begin{bmatrix} 0.2 \\ 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}$ we obtained for $A$, this difference does not affect the resulting page rank in this case.
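The construction (126) and the resulting ranking can be tried out on the webs of Examples 7.18 and 7.19. This is a rough sketch rather than the algorithm as actually deployed: the function names (`modified_matrix`, `stable_vector`, `ranking`) and the parameter name `eps` are hypothetical, and the stable vector is approximated simply by applying C repeatedly to a uniform starting vector, which Theorem 7.7 justifies because C is positive.

```python
def modified_matrix(A, eps=0.15):
    """C = (1 - eps) A + eps B, where every entry of B equals 1/n, as in (126)."""
    n = len(A)
    return [[(1 - eps) * A[i][j] + eps / n for j in range(n)] for i in range(n)]

def stable_vector(C, iterations=200):
    """Approximate the stable vector by iterating x -> C x from the uniform vector."""
    n = len(C)
    x = [1.0 / n] * n
    for _ in range(iterations):
        x = [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

def ranking(v):
    """Page numbers (starting at 1) ordered from most to least likely to be visited."""
    return sorted(range(1, len(v) + 1), key=lambda page: -v[page - 1])

web_7_18 = [[0, 0, 0, 1/2], [1/2, 0, 0, 0], [1/2, 0, 0, 1/2], [0, 1, 1, 0]]
web_7_19 = [[0, 1/2, 0], [1, 0, 1], [0, 1/2, 0]]

v4 = stable_vector(modified_matrix(web_7_18))
v3 = stable_vector(modified_matrix(web_7_19))
print([round(p, 3) for p in v4], ranking(v4))
print(v3)
```

With 200 iterations the error term, which shrinks at least as fast as a multiple of 0.85 raised to the iteration count, is far below the three decimal places quoted in the text.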
Systems of linear ordinary differential equations with constant coefficients

An ordinary differential equation is an equation involving (ordinary) derivative(s) of an unknown function. Numerous important problems in a variety of disciplines involve systems of such equations, albeit they are often impossible to solve exactly; one frequently has to settle for numerically approximating their solutions instead. In a typical differential equations course, special classes of differential equations and their systems are identified for which an analytical (exact) solution can be found. In this subsection, we shall focus on one such class: systems of linear ordinary differential equations of the first order with constant coefficients:
\[
\begin{aligned}
\frac{dx_1}{dt} &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n + b_1 \\
\frac{dx_2}{dt} &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n + b_2 \\
&\;\;\vdots \\
\frac{dx_n}{dt} &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n + b_n.
\end{aligned} \tag{127}
\]
It turns out that the theory of eigenvalues and eigenvectors we have developed in this chapter will be very useful here.
Before addressing the general system (127), let us examine the simplest case $n = 1$:
\[
\frac{dx}{dt} = ax + b. \tag{128}
\]
It can be easily verified that if $a \neq 0$, the exact solution²² of this equation is
\[
x(t) = Ke^{at} - \frac{b}{a}. \tag{129}
\]
In addition to the differential equation (128), an initial condition $x(t_0) = x_0$ is often imposed. If so, the arbitrary constant $K$ has to be specified accordingly:
\[
Ke^{at_0} - \frac{b}{a} = x_0, \qquad K = \frac{x_0 + \frac{b}{a}}{e^{at_0}}
\]
so that
\[
x(t) = \left(x_0 + \frac{b}{a}\right)e^{a(t - t_0)} - \frac{b}{a}. \tag{130}
\]

22 Simple calculus is sufficient to verify this solution by differentiation. To obtain this solution, one could follow the standard procedure for solving separable differential equations.

The system (127) can be expressed in the matrix form
\[
\frac{d\vec{x}}{dt} = A\vec{x} + \vec{b} \tag{131}
\]
where $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$, $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$, and $\vec{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$. Let us assume $A$ is nonsingular and diagonalizable; i.e., there exists a nonsingular matrix $P$ and a diagonal matrix $D$ with nonzero main diagonal entries such that $A = PDP^{-1}$. Substituting this into equation (131) and premultiplying both sides by $P^{-1}$ yields
\[
P^{-1}\frac{d\vec{x}}{dt} = DP^{-1}\vec{x} + P^{-1}\vec{b}. \tag{132}
\]
Let us substitute $\vec{c} = P^{-1}\vec{b}$ and $\vec{y} = P^{-1}\vec{x}$. Denoting $P^{-1} = [q_{ij}]$ we have
\[
P^{-1}\frac{d\vec{x}}{dt} = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix}\begin{bmatrix} \frac{dx_1}{dt} \\ \vdots \\ \frac{dx_n}{dt} \end{bmatrix} = \begin{bmatrix} q_{11}\frac{dx_1}{dt} + \cdots + q_{1n}\frac{dx_n}{dt} \\ \vdots \\ q_{n1}\frac{dx_1}{dt} + \cdots + q_{nn}\frac{dx_n}{dt} \end{bmatrix} = \begin{bmatrix} \frac{d}{dt}(q_{11}x_1 + \cdots + q_{1n}x_n) \\ \vdots \\ \frac{d}{dt}(q_{n1}x_1 + \cdots + q_{nn}x_n) \end{bmatrix} = \begin{bmatrix} \frac{dy_1}{dt} \\ \vdots \\ \frac{dy_n}{dt} \end{bmatrix} = \frac{d\vec{y}}{dt};
\]
thus equation (132) becomes
\[
\frac{d\vec{y}}{dt} = D\vec{y} + \vec{c}. \tag{133}
\]
Since $D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}$ is a diagonal matrix, the matrix equation (133) actually describes a very simple system of differential equations:
\[
\begin{aligned}
\frac{dy_1}{dt} &= d_1 y_1 + c_1 \\
&\;\;\vdots \\
\frac{dy_n}{dt} &= d_n y_n + c_n
\end{aligned}
\]
where each component of the solution $y_1, \ldots, y_n$ is obtained in the same manner as in equation (129)
\[
y_i(t) = K_i e^{d_i t} - \frac{c_i}{d_i}
\]
so that the solution of the system (127) is
\[
\vec{x} = P\begin{bmatrix} K_1 e^{d_1 t} - \frac{c_1}{d_1} \\ \vdots \\ K_n e^{d_n t} - \frac{c_n}{d_n} \end{bmatrix}.
\]
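EXAMPLE 7.21 Consider the system of ordinary differential equations
\[
\begin{aligned}
\frac{dx_1}{dt} &= x_1 - 4x_2 + 2 \\
\frac{dx_2}{dt} &= -2x_1 - 6x_2 - 5.
\end{aligned}
\]
The matrix form (131) has $A = \begin{bmatrix} 1 & -4 \\ -2 & -6 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 2 \\ -5 \end{bmatrix}$. The characteristic polynomial of $A$ is
\[
\det(\lambda I - A) = (\lambda - 1)(\lambda + 6) - 8 = \lambda^2 + 5\lambda - 14 = (\lambda + 7)(\lambda - 2).
\]
The r.r.e.f. of $-7I_2 - A = \begin{bmatrix} -8 & 4 \\ 2 & -1 \end{bmatrix}$ is $\begin{bmatrix} 1 & -\frac{1}{2} \\ 0 & 0 \end{bmatrix}$ whereas the r.r.e.f. of $2I_2 - A = \begin{bmatrix} 1 & 4 \\ 2 & 8 \end{bmatrix}$ equals $\begin{bmatrix} 1 & 4 \\ 0 & 0 \end{bmatrix}$. Populating columns of $P$ with eigenvectors we obtain $P = \begin{bmatrix} 1 & -4 \\ 2 & 1 \end{bmatrix}$ which diagonalizes $A$:
\[
P^{-1}AP = D = \begin{bmatrix} -7 & 0 \\ 0 & 2 \end{bmatrix}.
\]
When substituting $\vec{y} = P^{-1}\vec{x}$ and $\vec{c} = P^{-1}\vec{b}$ we need to determine $\vec{c}$. We can do so by solving the system $P\vec{c} = \vec{b}$. Since the r.r.e.f. of $[P \mid \vec{b}] = \left[\begin{array}{cc|c} 1 & -4 & 2 \\ 2 & 1 & -5 \end{array}\right]$ is $\left[\begin{array}{cc|c} 1 & 0 & -2 \\ 0 & 1 & -1 \end{array}\right]$, $\vec{c} = \begin{bmatrix} -2 \\ -1 \end{bmatrix}$. The system (133) now becomes
\[
\begin{aligned}
\frac{dy_1}{dt} &= -7y_1 - 2 \\
\frac{dy_2}{dt} &= 2y_2 - 1
\end{aligned}
\]
and has a general solution
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} K_1 e^{-7t} - \frac{2}{7} \\ K_2 e^{2t} + \frac{1}{2} \end{bmatrix}
\]
so that the final solution is
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & -4 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} K_1 e^{-7t} - \frac{2}{7} \\ K_2 e^{2t} + \frac{1}{2} \end{bmatrix} = \begin{bmatrix} K_1 e^{-7t} - 4K_2 e^{2t} - \frac{16}{7} \\ 2K_1 e^{-7t} + K_2 e^{2t} - \frac{1}{14} \end{bmatrix}
\]
with arbitrary constants $K_1$ and $K_2$.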
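The general solution obtained in Example 7.21 can be sanity-checked numerically by comparing a finite-difference derivative of the formula with the right-hand sides of the system. The sketch below is only such a check, not a solver; the constants K1 = 1.5 and K2 = −0.5, the step size, and the function names are arbitrary choices made for this illustration.

```python
import math

def x(t, K1, K2):
    """General solution of Example 7.21 for given constants K1 and K2."""
    e1, e2 = math.exp(-7 * t), math.exp(2 * t)
    return (K1 * e1 - 4 * K2 * e2 - 16 / 7,
            2 * K1 * e1 + K2 * e2 - 1 / 14)

def rhs(x1, x2):
    """Right-hand sides of the two differential equations."""
    return (x1 - 4 * x2 + 2, -2 * x1 - 6 * x2 - 5)

def residual(t, K1, K2, h=1e-5):
    """Largest gap between a central-difference derivative and the right-hand side."""
    ahead, behind = x(t + h, K1, K2), x(t - h, K1, K2)
    derivative = [(a - b) / (2 * h) for a, b in zip(ahead, behind)]
    return max(abs(d - r) for d, r in zip(derivative, rhs(*x(t, K1, K2))))

# Check the formula at t = 0, 0.1, ..., 1 for one arbitrary choice of constants.
worst = max(residual(t / 10, 1.5, -0.5) for t in range(11))
print(worst)
```

A residual that stays near the accuracy of the finite-difference approximation suggests the formula satisfies both equations; trying other values of K1 and K2 should behave the same way.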
Imposing an initial condition $\vec{x}(t_0) = \vec{x}_0$ will lead to specifying the constants $K_1, \ldots, K_n$ similarly to equation (130), as illustrated in the following example.
EXAMPLE 7.22 Three tanks contain salt dissolved in water. The capacities of the three tanks are $V_1 = 20$, $V_2 = 80$, and $V_3 = 40$ (in liters); they contain solutions with initial salt contents of 3 kg, 2 kg, and 1 kg, respectively. At the constant rate of 80 liters per minute
• water (containing no salt) flows into the first tank,
• solution from the first tank flows into the second tank,
• solution from the second tank flows into the third,
• solution from the third tank drains outside the system.
We assume the contents of the three tanks are constantly stirred to ensure uniform concentration of the solute. Our objective is to find the salt contents (in kg) in the three tanks as functions of time: $x_1(t)$, $x_2(t)$, and $x_3(t)$.
The following system of differential equations describes this situation:
\[
\begin{aligned}
\underbrace{\frac{dx_1}{dt}}_{\substack{\text{rate of change of salt} \\ \text{content in the 1st tank}}} &= -\underbrace{\frac{x_1}{20}(80)}_{\substack{\text{current salt concentration in the first} \\ \text{tank times the (out)flow rate}}} \\
\underbrace{\frac{dx_2}{dt}}_{\substack{\text{rate of change of salt} \\ \text{content in the 2nd tank}}} &= \underbrace{\frac{x_1}{20}(80)}_{\text{inflow of salt from 1st tank}} - \underbrace{\frac{x_2}{80}(80)}_{\text{minus the outflow}} \\
\underbrace{\frac{dx_3}{dt}}_{\substack{\text{rate of change of salt} \\ \text{content in the 3rd tank}}} &= \underbrace{\frac{x_2}{80}(80)}_{\text{inflow of salt from 2nd tank}} - \underbrace{\frac{x_3}{40}(80)}_{\text{minus the outflow}}
\end{aligned}
\]
In the matrix form (131), this system has $A = \begin{bmatrix} -4 & 0 & 0 \\ 4 & -1 & 0 \\ 0 & 1 & -2 \end{bmatrix}$ and $\vec{b} = \vec{0}$. Clearly, the eigenvalues of $A$ are −4, −1, and −2 (see Exercise 26 on p. 309). Since these are distinct and real, by Theorem 7.3, $A$ is diagonalizable.
• The r.r.e.f. of $-4I_3 - A = \begin{bmatrix} 0 & 0 & 0 \\ -4 & -3 & 0 \\ 0 & -1 & -2 \end{bmatrix}$ is $\begin{bmatrix} 1 & 0 & -\frac{3}{2} \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}$.
• The r.r.e.f. of $-1I_3 - A = \begin{bmatrix} 3 & 0 & 0 \\ -4 & 0 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ is $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$.
• The r.r.e.f. of $-2I_3 - A = \begin{bmatrix} 2 & 0 & 0 \\ -4 & -1 & 0 \\ 0 & -1 & 0 \end{bmatrix}$ is $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
Therefore, we have $P^{-1}AP = D$ with $P = \begin{bmatrix} 3 & 0 & 0 \\ -4 & 1 & 0 \\ 2 & 1 & 1 \end{bmatrix}$ and $D = \begin{bmatrix} -4 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -2 \end{bmatrix}$.
Following the procedure described above we obtain the general solution
\[
\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ -4 & 1 & 0 \\ 2 & 1 & 1 \end{bmatrix}\begin{bmatrix} K_1 e^{-4t} \\ K_2 e^{-t} \\ K_3 e^{-2t} \end{bmatrix} = \begin{bmatrix} 3K_1 e^{-4t} \\ -4K_1 e^{-4t} + K_2 e^{-t} \\ 2K_1 e^{-4t} + K_2 e^{-t} + K_3 e^{-2t} \end{bmatrix}.
\]
The values of constants $K_1, K_2, K_3$ can now be determined by imposing the initial condition $x_1(0) = 3$, $x_2(0) = 2$, $x_3(0) = 1$ so that (check!) $K_1 = 1$, $K_2 = 6$, $K_3 = -7$ and the final solution is
\[
\begin{aligned}
x_1(t) &= 3e^{-4t} \\
x_2(t) &= -4e^{-4t} + 6e^{-t} \\
x_3(t) &= 2e^{-4t} + 6e^{-t} - 7e^{-2t}.
\end{aligned}
\]
EXERCISES

1. In Exercise 31 on p. 55, you were asked to find a transition matrix $A$ such that
\[
\begin{bmatrix} S^{(n+1)} \\ C^{(n+1)} \end{bmatrix} = A\begin{bmatrix} S^{(n)} \\ C^{(n)} \end{bmatrix}
\]
describes the change in probabilities of sunny or cloudy weather on a given day in a town, where
• if a day is sunny, then 9 out of 10 times it is followed by another sunny day,
• if a day is cloudy, then it has 70% chance of being followed by another cloudy one.
[Figure: transition diagram with states S (SUNNY) and C (CLOUDY); arrows labeled 0.1, 0.9, 0.3, 0.7]
a. Show that $A$ is a regular stochastic matrix.
b. Starting with a sunny day, i.e., $\begin{bmatrix} S^{(0)} \\ C^{(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, calculate the probability vectors $\begin{bmatrix} S^{(n)} \\ C^{(n)} \end{bmatrix}$ for $n = 1, 2, 3, 4, 5$.
c. Repeat part b starting with a cloudy day.
d. Find the stable vector of $A$ and compare it to the vectors obtained in parts b and c.
2. Repeat the previous exercise for a different town, in which the weather is more likely to change. See the following figure.
[Figure: transition diagram with states S (SUNNY) and C (CLOUDY); arrows labeled 0.2, 0.8, 0.4, 0.6]
In Exercises 3–6, consider the given layout of the rooms and a rat choosing any of the openings at random to move to another room. Find the transition matrix $A$ corresponding to the Markov chain. One of the matrices in each exercise is regular, and one is not. For the regular one, determine the stable vector $\vec{v}$.
3. a. [Figure: room layout]  b. [Figure: room layout]
4. a. [Figure: room layout]  b. [Figure: room layout]
5. a. [Figure: room layout]  b. [Figure: room layout]
6. a. [Figure: room layout]  b. [Figure: room layout]
In Exercises 7–8, consider the “web” consisting of the three pages linked as described.
a. Find the stochastic matrix $A$ as in Example 7.18 and show that $A$ is not regular.
b. Find the regular stochastic matrix $C$ of (126) and find the stable vector $\vec{v}$.
c. Determine the page rank of the three pages.
7. Page 1 links to both pages 2 and 3, while each of the pages 2 and 3 links to page 1 only.
8. Page 1 links to both pages 2 and 3. Pages 2 and 3 link to each other. (No link leads to page 1.)
9. a. * Show that when the process defined by $\begin{bmatrix} x^{(n+1)} \\ y^{(n+1)} \end{bmatrix} = A\begin{bmatrix} x^{(n)} \\ y^{(n)} \end{bmatrix}$ with $A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$ is started with $\begin{bmatrix} x^{(0)} \\ y^{(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, it calculates the Fibonacci sequence

n       0  1  2  3  4  5  6   7   8   9   10  ···
x^(n)   1  1  2  3  5  8  13  21  34  55  89  ···

which satisfies the recursive relationship $x^{(0)} = x^{(1)} = 1$; $x^{(n+2)} = x^{(n)} + x^{(n+1)}$ for $n = 0, 1, \ldots$.
b. * Show that $A$ is a regular matrix (although it is not stochastic). Use Theorem 7.6 to determine the limit behavior (123) of the sequence $\begin{bmatrix} x^{(n)} \\ y^{(n)} \end{bmatrix}$, calculating the constant $c$ for $\begin{bmatrix} x^{(0)} \\ y^{(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ according to (124). Verify that this prediction agrees with the actual values generated.
10. Prove that the matrix $C$ defined in (126) is a stochastic matrix.
In Exercises 11–16, find the general solution of the given system of ordinary differential equations.
11. $\frac{dx_1}{dt} = 2x_1 + 6x_2$; $\frac{dx_2}{dt} = 3x_1 + 5x_2$
12. $\frac{dx_1}{dt} = -2x_1 - 2x_2$; $\frac{dx_2}{dt} = -3x_1 - 3x_2$
13. $\frac{dx_1}{dt} = -3x_1 - 3x_2 - 4$; $\frac{dx_2}{dt} = -6x_1 + 4x_2 + 1$
14. $\frac{dx_1}{dt} = 3x_1 + 7x_2 + 3$; $\frac{dx_2}{dt} = 7x_1 + 3x_2 + 4$
15. $\frac{dx_1}{dt} = 3x_1 + x_2 + x_3 + 2$; $\frac{dx_2}{dt} = x_1 + 3x_2 + x_3 + 2$; $\frac{dx_3}{dt} = x_1 + x_2 + 3x_3 + 5$
16. $\frac{dx_1}{dt} = x_1 + 2x_2 + 3$; $\frac{dx_2}{dt} = x_2 + 2x_3 + 2$; $\frac{dx_3}{dt} = 2x_2 + x_3 - 2$
In Exercises 17–20, find the solution of the given system of ordinary differential equations that satisfies the given initial condition.
17. $\frac{dx_1}{dt} = 3x_1 - x_2$; $\frac{dx_2}{dt} = -5x_1 + 7x_2 + 6$; $x_1(0) = 0$; $x_2(0) = 1$
18. $\frac{dx_1}{dt} = 2x_1 + 2x_2 + 6$; $\frac{dx_2}{dt} = -5x_1 - 9x_2 + 6$; $x_1(0) = -10$; $x_2(0) = -5$
19. $\frac{dx_1}{dt} = 2x_1 - x_2 - 1$; $\frac{dx_2}{dt} = -x_1 + 3x_2 - x_3 + 5$; $\frac{dx_3}{dt} = -x_2 + 2x_3 - 1$; $x_1(0) = 3$; $x_2(0) = -3$; $x_3(0) = 0$
20. $\frac{dx_1}{dt} = -2x_1 + 3x_3 - 4$; $\frac{dx_2}{dt} = x_1 + 3x_2 + 2x_3 + 1$; $\frac{dx_3}{dt} = -x_1 + 2x_3 + 4$; $x_1(0) = -10$; $x_2(0) = -14$; $x_3(0) = -12$
In Exercises 21–22, consider the connected tanks containing solution of salt. Find the salt contents (in kg) in those tanks as functions of time (in minutes).
21. a. and b. [Figures: systems of connected tanks. Labels appearing in the figures: flow rates of 10 l/min and 20 l/min; V1 = 10 liters, x1(0) = 2 kg; V2 = 10 liters, x2(0) = 1 kg; V1 = 10 liters, x1(0) = 1 kg; V2 = 10 liters, x2(0) = 2 kg; V3 = 10 liters, x3(0) = 1 kg]
22. a. and b. [Figures: systems of connected tanks. Labels appearing in the figures: flow rates of 5 l/min, 10 l/min, and 15 l/min; V1 = 10 liters, x1(0) = 3 kg; V2 = 5 liters, x2(0) = 1 kg; V1 = 10 liters, x1(0) = 0 kg; V2 = 10 liters, x2(0) = 2 kg; V3 = 10 liters, x3(0) = 1 kg]
7.4 Singular Value Decomposition

In Section 6.4, we have shown that every $m \times n$ matrix $A$ has a singular value decomposition (SVD). The main purpose of this section will be to develop a procedure to obtain such a decomposition.
It is easy to check that the $n \times n$ matrix $A^T A$ is symmetric (see Exercise 30 in Section 1.3 on p. 33). By Theorem 7.5, it is orthogonally diagonalizable: $V^T A^T A V = C$, or, equivalently, $A^T A = V C V^T$, with an orthogonal matrix $V = [\vec{v}_1 | \cdots | \vec{v}_n]$ and a diagonal matrix $C = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$. The identity
\[
(A\vec{v}_i) \cdot (A\vec{v}_j) = \vec{v}_i^{\,T} A^T A \vec{v}_j = \lambda_j \vec{v}_i^{\,T} \vec{v}_j = \begin{cases} \lambda_j & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
\]
implies that
• all eigenvalues of $A^T A$ are nonnegative (since $\lambda_j = \|A\vec{v}_j\|^2$) and
• the vectors $A\vec{v}_1, \ldots, A\vec{v}_n$ are orthogonal.
Let us arrange the $\lambda_i$’s in nonincreasing order: $\lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0$ and rearrange the columns of $V$ accordingly.
The $m \times n$ matrix $AV = [A\vec{v}_1 | \cdots | A\vec{v}_n]$ has columns that are orthogonal vectors, whose lengths are $\sqrt{\lambda_1} \ge \cdots \ge \sqrt{\lambda_r} > 0 = \cdots = 0$. Therefore, taking $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \ldots, r$ it is possible (see Exercises 16 and 17 in Section 6.3 on p. 283) to write
\[
AV = U\Sigma \tag{134}
\]
which is a desired SVD (see (103)) and can be rewritten (see (104))
\[
A = U\Sigma V^T. \tag{135}
\]
As in Section 6.4
• $U$ denotes an $m \times m$ orthogonal matrix whose columns are left singular vectors,
• $\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}$ is an $m \times n$ rectangular diagonal matrix; its main diagonal entries $\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{\min(m,n)}$ are the singular values of $A$; the first $r$ (nonzero) singular values are also main diagonal entries of the $r \times r$ diagonal matrix $D$,
• $V$ is an $n \times n$ orthogonal matrix whose columns are right singular vectors.
We shall at times find it convenient to partition $U$ and $V$ in the following way:
\[
\underbrace{U}_{m \times m} = [\,\underbrace{\tilde{U}}_{m \times r} \mid \underbrace{\hat{U}}_{m \times (m-r)}\,] \quad \text{and} \quad \underbrace{V}_{n \times n} = [\,\underbrace{\tilde{V}}_{n \times r} \mid \underbrace{\hat{V}}_{n \times (n-r)}\,] \tag{136}
\]
so that the columns that correspond to the nonzero singular values (in $\tilde{U}$ and $\tilde{V}$) can be easily distinguished from the remaining ones (in $\hat{U}$ and $\hat{V}$).
We summarize our construction below.²³

Procedure for constructing an SVD of $A$
1. Perform an orthogonal diagonalization of the symmetric matrix $A^T A$: $A^T A = V C V^T$, arranging the eigenvalues on the main diagonal of $C$ in a nonincreasing order $\lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0$. (Refer to the procedure on p. 318.) Calculate the singular values $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \ldots, r$ and the matrix $\Sigma$ according to (105).
2. Construct each of the first $r$ columns of $U$ (i.e., the columns of $\tilde{U}$) as a unit vector in the direction of $A\vec{v}_i$:
\[
\vec{u}_i = \frac{1}{\sigma_i} A\vec{v}_i \quad \text{if } i \le r.
\]
3. The remaining $m - r$ columns of $U$, that is, the columns of $\hat{U}$, should be constructed as an orthonormal basis for the orthogonal complement of the column space of $\tilde{U}$. This step will be similar to the procedure illustrated in Example 6.13 in Section 6.2, followed by the Gram-Schmidt process.

[Margin note: (Column space of $\hat{U}$) = (Column space of $\tilde{U}$)$^{\perp}$]
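EXAMPLE 7.23 We shall follow the above procedure to obtain a singular value decomposition of the matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Step 1.
\[
A^T A = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}.
\]
The characteristic polynomial is $\det\begin{bmatrix} \lambda - 2 & -2 \\ -2 & \lambda - 5 \end{bmatrix} = (\lambda - 2)(\lambda - 5) - 4 = \lambda^2 - 7\lambda + 6 = (\lambda - 6)(\lambda - 1)$.
Since the eigenvalues are $\lambda_1 = 6$ and $\lambda_2 = 1$, the singular values of $A$ are $\sigma_1 = \sqrt{6}$ and $\sigma_2 = 1$.
For $\lambda = 6$, the system $(\lambda I - A^T A)\vec{x} = \vec{0}$ has coefficient matrix $\begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}$ with reduced row echelon form $\begin{bmatrix} 1 & -\frac{1}{2} \\ 0 & 0 \end{bmatrix}$. The corresponding eigenspace is $\operatorname{span}\left\{\begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}\right\}$. Therefore, the first right singular vector is $\vec{v}_1 = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix}$.
For $\lambda = 1$, the homogeneous system $(\lambda I - A^T A)\vec{x} = \vec{0}$ has coefficient matrix $\begin{bmatrix} -1 & -2 \\ -2 & -4 \end{bmatrix}$ with r.r.e.f. $\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$. The eigenspace is $\operatorname{span}\left\{\begin{bmatrix} -2 \\ 1 \end{bmatrix}\right\}$. The second right singular vector is $\vec{v}_2 = \begin{bmatrix} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$.
We obtain $V = \begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$ and $\Sigma = \begin{bmatrix} \sqrt{6} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$.

Step 2.
Columns of $\tilde{U}$ are
\[
\vec{u}_1 = \frac{1}{\sigma_1} A\vec{v}_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix} = \frac{1}{\sqrt{30}}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{\sqrt{30}}\begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix},
\]
\[
\vec{u}_2 = \frac{1}{\sigma_2} A\vec{v}_2 = \frac{1}{1}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix}.
\]

Step 3.
The remaining column of $U$, $\vec{u}_3$, should be a unit vector spanning the one-dimensional space $(\operatorname{span}\{\vec{u}_1, \vec{u}_2\})^{\perp}$. This space is also the solution space of the homogeneous system with coefficient matrix
\[
\begin{bmatrix} \vec{u}_1^{\,T} \\ \vec{u}_2^{\,T} \end{bmatrix} = \begin{bmatrix} \frac{5}{\sqrt{30}} & \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{30}} \\ 0 & \frac{-2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}.
\]
At the beginning of the elimination process, it is a good idea to scale each row by its common denominator: $\begin{bmatrix} 5 & 1 & 2 \\ 0 & -2 & 1 \end{bmatrix}$.
The reduced row echelon form $\begin{bmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \end{bmatrix}$ implies that the solution space is $\operatorname{span}\left\{\begin{bmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 1 \end{bmatrix}\right\} = \operatorname{span}\left\{\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}\right\}$. Therefore, we can take
\[
\vec{u}_3 = \frac{1}{\sqrt{6}}\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}.
\]
We found the following singular value decomposition of our matrix:
\[
\underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{5}{\sqrt{30}} & 0 & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{30}} & \frac{-2}{\sqrt{5}} & \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{30}} & \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{6}} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \sqrt{6} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{-2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}}_{V^T}.
\]

23 This procedure is adequate for simple problems we solve in this book by hand. Different algorithms are more appropriate for calculations performed using finite precision arithmetic, which often involve much larger problems.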
In Exercise 19 in Section 4.6 (p. 215), we have introduced the four fundamental spaces of a matrix A : the row space of A, the column space of A, as well as the null spaces of A and of AT . A singular value decomposition of A is closely related to all four. The reader may recall from the proof of Theorem 6.13 that the equation (134) can be rewritten so that each side is partitioned into columns → → − → → → − → − → − → − [ A− v1 | · · · | A− vr | A− v− r+1 | · · · | Avn ] = [ σ 1 u1 | · · · | σ r ur | 0 | · · · | 0 ]. r columns
n−r columns
r columns
n−r columns
This makes it clear that •
•
•
− → → the columns of V5 , − v− r+1 , . . . , vn , form an orthonormal basis for the null space of A (in fact, they can be picked to be any orthonormal basis for that space, which is one of the reasons why SVD is not unique); − → vr , form an orthonormal basis for the row space of A – unlike V5 , the columns of V6 , − v1 , . . . , → this cannot be just any orthonormal basis for that space, however; the requirement is that the → → vr be orthogonal (this is ensured by using orthogonal diagonalization vectors A− v1 , . . . , A− T of A A); → – unit vectors in the directions →, . . . , − 6 , i.e., vectors − u as a consequence, the columns of U u 1 r → − → − of the vectors A v , . . . , A v – form an orthonormal basis for the column space of A (again, 1
•
SVD and rank
r
this is not just any such orthonormal basis); −−→, . . . , − 5, u u→, can be chosen to be any orthonormal basis for the finally, the columns of U r+1
m
orthogonal complement of the column space of A, i.e., for the null space of AT .
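Consider the following two linear systems:
\[
\text{System I:}\quad
\begin{aligned} x + 2y &= 2 \\ -x + y &= 4 \end{aligned}
\qquad\text{and}\qquad
\text{System II:}\quad
\begin{aligned} x + 2y &= 2 \\ 3x + 5y &= 4 \end{aligned}
\]
These systems may appear to be very similar; e.g., the rank of each coefficient matrix
\[
A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}
\]
equals 2, making each system possess a unique solution. In fact, we can easily solve them by hand to find out it is the same solution for both: $x = -2$, $y = 2$. In many ways, however, these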
two systems are very different. To begin to explore these differences, let us plot the two lines corresponding to the equation in each system.
[Figure: two plots in the xy-plane. System I: the lines x + 2y = 2 and −x + y = 4. System II: the lines x + 2y = 2 and 3x + 5y = 4.]
When trying to “eyeball” the solution, the first picture is much more helpful than the second one. The lines in the first picture can be seen to be forming an angle close to 90°, whereas the lines in the second picture follow very similar directions, making their intersection look “fuzzy”. Indeed, the coefficient matrix of the second system is very close to being singular.

          matrix A    matrix B
σ1 ≈      2.303       6.243
σ2 ≈      1.303       0.160
The approximate singular values of $A$ and $B$ (i.e., square roots of eigenvalues of $A^T A$ and $B^T B$) are listed in the table. The large value of the ratio between the largest and the smallest singular value ($\sigma_1/\sigma_2 \approx 38.97$) is an indication that $B$ is much closer to being singular (or “rank deficient”) than the matrix $A$ (for which $\sigma_1/\sigma_2 \approx 1.77$).

EXAMPLE 7.24 Consider the $9 \times 9$ matrix
\[
C \approx \begin{bmatrix}
0.03 & 0.80 & 0.57 & 0.40 & 0.95 & 0.00 & 0.12 & 0.33 & 0.01 \\
0.20 & -0.13 & -0.33 & 0.23 & 0.10 & 0.53 & -0.11 & 0.08 & 0.38 \\
0.23 & 0.43 & 0.05 & 0.23 & 0.52 & 0.62 & -0.04 & 0.07 & 0.44 \\
0.00 & 0.05 & 0.45 & 0.40 & 0.30 & 0.06 & 0.35 & 0.80 & -0.07 \\
0.23 & 0.41 & 0.04 & 0.29 & 0.56 & 0.62 & -0.04 & 0.12 & 0.44 \\
-0.03 & 0.27 & 0.23 & 0.06 & 0.25 & -0.27 & 0.08 & 0.07 & -0.16 \\
0.11 & 0.93 & 0.57 & -0.11 & 0.60 & 0.18 & 0.11 & -0.17 & 0.14 \\
0.03 & 1.07 & 0.77 & 0.51 & 1.25 & 0.27 & 0.12 & 0.43 & 0.16 \\
0.00 & 0.27 & 0.20 & 0.40 & 0.55 & 0.09 & 0.03 & 0.35 & 0.05
\end{bmatrix}.
\]
Linearly independent 9-vectors were used to determine four of the columns of $C$, while the remaining columns were filled with linear combinations of the other four. Can you tell which ones are which? We didn’t think so! Actually, we can’t tell them apart either. Not only that, but using standard procedures developed earlier in this book, it can be shown that $C$ is row equivalent to $I_9$; therefore, rank $C = 9$, so that all of its columns are linearly independent!

This discrepancy is explained by the fact that we used finite precision arithmetic when writing entries of $C$, which have been rounded from the entries of the original linearly dependent vectors. Fortunately, singular values of $C$ reveal the true nature of the data involved:

σ1 ≈ 3.157699, σ2 ≈ 1.474733, σ3 ≈ 0.821445, σ4 ≈ 0.360370, σ5 ≈ 0.009484, σ6 ≈ 0.006951, σ7 ≈ 0.004126, σ8 ≈ 0.001321, σ9 ≈ 0.000597.
Since the first four singular values are much larger than the remaining five, we can conclude that the “numerical rank” of C is 4.
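The singular values quoted earlier for the $2 \times 2$ coefficient matrices $A$ and $B$ can be reproduced with a short sketch. It uses the closed-form eigenvalues of the symmetric $2 \times 2$ matrix $M^T M$, which is a shortcut that only works for two columns; the function name `singular_values_2x2` is an illustrative choice, and a general-purpose SVD routine would be needed for a matrix such as $C$.

```python
import math

def singular_values_2x2(M):
    """Singular values of a 2x2 matrix, largest first.

    They are the square roots of the eigenvalues of M^T M = [[p, q], [q, r]],
    which for a symmetric 2x2 matrix are mean +/- dev as computed below.
    """
    (a, b), (c, d) = M
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2
    dev = math.hypot((p - r) / 2, q)
    return math.sqrt(mean + dev), math.sqrt(max(mean - dev, 0.0))

A = [[1, 2], [-1, 1]]
B = [[1, 2], [3, 5]]

sv_A = singular_values_2x2(A)
sv_B = singular_values_2x2(B)
ratio_A = sv_A[0] / sv_A[1]
ratio_B = sv_B[0] / sv_B[1]

print("A:", sv_A, "ratio", ratio_A)
print("B:", sv_B, "ratio", ratio_B)
```

The ratio $\sigma_1/\sigma_2$ computed here is the quantity the text uses to judge how close each matrix is to being singular.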
SVD and linear transformation
Considering the linear transformation $F : \mathbb{R}^n \to \mathbb{R}^m$ defined by $F(\vec{x}) = A\vec{x}$, it follows that $\Sigma = U^T A V$ is the matrix of $F$ with respect to the bases composed of the columns of $V$ and columns of $U$ (see Exercise 21 in Section 6.1 on p. 261). In other words, denoting $S = \{\vec{v}_1, \ldots, \vec{v}_n\}$ and $T = \{\vec{u}_1, \ldots, \vec{u}_m\}$,
\[
[A\vec{x}]_T = \Sigma [\vec{x}]_S.
\]
Let us draw a diagram similar to the one introduced in part (b) of the solution of Example 5.23:
\[
\begin{array}{ccc}
\vec{x} & \xrightarrow{\;****\;} & A\vec{x} \\
\downarrow {*} & & \uparrow {*}{*}{*} \\
{[\vec{x}]_S} & \xrightarrow{\;**\;} & {[A\vec{x}]_T}
\end{array}
\]
Just as the transition marked **** corresponds to multiplication by a matrix, it turns out that so do the remaining three transitions in this diagram:

* $[\vec{x}]_S = V^T \vec{x}$ (see Exercise 20 in Section 6.1 on p. 261)

** $[A\vec{x}]_T = \Sigma [\vec{x}]_S = \Sigma V^T \vec{x}$

*** $A\vec{x} = U [A\vec{x}]_T = U \Sigma V^T \vec{x}$
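The three transitions can be traced numerically. The sketch below reuses the factors of the decomposition found for the $3 \times 2$ matrix earlier in this section and applies them one at a time to a sample vector; the vector `x` and the helper `matvec` are arbitrary illustrative choices.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

s30, s5, s6 = math.sqrt(30), math.sqrt(5), math.sqrt(6)

A = [[1, 2], [1, 0], [0, 1]]
U = [[5 / s30, 0.0, -1 / s6],
     [1 / s30, -2 / s5, 1 / s6],
     [2 / s30, 1 / s5, 2 / s6]]
Sigma = [[s6, 0.0], [0.0, 1.0], [0.0, 0.0]]
Vt = [[1 / s5, 2 / s5], [-2 / s5, 1 / s5]]

x = [3.0, -1.0]                      # sample input vector (arbitrary)

coords_S = matvec(Vt, x)             # transition *  : coordinates of x relative to S
coords_T = matvec(Sigma, coords_S)   # transition ** : coordinates of Ax relative to T
via_svd = matvec(U, coords_T)        # transition ***: back to standard coordinates
direct = matvec(A, x)                # transition ****: multiply by A directly

print("three steps:", via_svd)
print("direct     :", direct)
```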
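[Figure: the transformation $\vec{x} \mapsto A\vec{x}$ (****) shown together with its three steps: multiply by $V^T$ (*), multiply by $\Sigma$ (**), multiply by $U$ (***). The unit vectors $\vec{v}_1$, $\vec{v}_2$ are mapped to $A\vec{v}_1 = \sigma_1 \vec{u}_1$ and $A\vec{v}_2 = \sigma_2 \vec{u}_2 = \vec{u}_2$.]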
Least squares problem and pseudoinverse

In Section 6.3, we introduced the problem of finding a least squares solution of a (possibly inconsistent) linear system $A\vec{x} = \vec{b}$. Our approach to finding a least squares solution was based on solving a system with the same coefficients, but whose right-hand side has been projected onto $W$, the column space of $A$ (see (94)):
\[
A\vec{x} = \operatorname{proj}_W \vec{b}.
\]
While the resulting system is guaranteed to be consistent and its solution is a least squares solution of $A\vec{x} = \vec{b}$, this solution may or may not be unique.

If $\vec{x}$ is a least squares solution of $A\vec{x} = \vec{b}$, and therefore a solution of $A\vec{x} = \operatorname{proj}_W \vec{b}$, it can be easily seen that $\vec{z}$, the orthogonal projection of $\vec{x}$ onto the row space of $A$, also satisfies $A\vec{z} = \operatorname{proj}_W \vec{b}$, since
\[
A\vec{z} = A \widehat{V}\widehat{V}^T \vec{x} = \left( \widehat{V}\widehat{V}^T A^T \right)^T \vec{x} = \left( A^T \right)^T \vec{x} = A\vec{x}.
\]
Here $\widehat{V}\widehat{V}^T$ is the projection matrix onto the row space of $A$, and projecting each of the rows of $A$ (columns of $A^T$) onto the row space of $A$ leaves it unchanged.

Furthermore, all least squares solutions of $A\vec{x} = \vec{b}$, when projected onto the row space, result in the same vector $\vec{z}$ (apply the result of Exercise 18 in Section 6.3 on p. 283 to $A\vec{x} = A\vec{z} = \operatorname{proj}_W \vec{b}$). Therefore, $\vec{z}$ is the only least squares solution in the row space of $A$. Consequently, the inequality (90) on p. 275, $\|\vec{z}\| \le \|\vec{x}\|$, becomes strict, $\|\vec{z}\| < \|\vec{x}\|$, for all least squares solutions $\vec{x} \ne \vec{z}$, making $\vec{z}$ the unique least squares solution of $A\vec{x} = \vec{b}$ that has minimum length. To derive an explicit formula for $\vec{z}$, we introduce the notion of a pseudoinverse of $A$.
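DEFINITION The pseudoinverse of an $m \times n$ matrix $A$ with an SVD $A = U \Sigma V^T$ is the $n \times m$ matrix
\[
A^+ = V \Sigma^+ U^T
\]
where
\[
\Sigma^+ = \begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix},
\]
with block rows of heights $r$ and $n - r$, and block columns of widths $r$ and $m - r$.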
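Because
\[
\begin{aligned}
AA^+ &= U \Sigma \underbrace{V^T V}_{I_n} \Sigma^+ U^T = U \Sigma \Sigma^+ U^T \\
&= \left[\, \widehat{U} \mid \widetilde{U} \,\right]
\begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} D^{-1} & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \widehat{U}^T \\ \widetilde{U}^T \end{bmatrix} \\
&= \left[\, \widehat{U} D \mid 0 \,\right]
\begin{bmatrix} D^{-1} \widehat{U}^T \\ 0 \end{bmatrix} \\
&= \widehat{U} \underbrace{D D^{-1}}_{I_r} \widehat{U}^T = \widehat{U}\widehat{U}^T
\end{aligned}
\]
is the projection matrix onto the column space of $A$, $A^+ \vec{b}$ is a least squares solution of $A\vec{x} = \vec{b}$. Moreover, since $\widehat{V}\widehat{V}^T A^+ = A^+$, it follows that
\[
\vec{z} = A^+ \vec{b} \tag{137}
\]
is the least squares solution of $A\vec{x} = \vec{b}$ that has minimum length.

One of the consequences of (137) is that the pseudoinverse of $A$ is unique. (See Exercise 24.)

[Figure: illustration for Example 7.25, showing the vector (0, 0, 3) and its projection onto the column space of $A$, $-1 \cdot (1, 1, 0) + 1 \cdot (2, 0, 1) = (1, -1, 1)$.]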
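The properties just derived can be checked numerically. The following sketch builds $A^+$ for the matrix of Example 7.23 as the sum of the rank-one terms $\frac{1}{\sigma_i}\vec{v}_i \vec{u}_i^T$ (an equivalent way of writing $V\Sigma^+U^T$), then checks that $AA^+$ behaves like a projection and that $A^+\vec{b}$ satisfies the normal equations $A^TA\vec{z} = A^T\vec{b}$. The right-hand side `b` and all function names are illustrative assumptions.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def pinv_from_svd(sigmas, u_cols, v_cols):
    """Sum of (1/sigma_i) * v_i * u_i^T over the nonzero singular values."""
    n, m = len(v_cols[0]), len(u_cols[0])
    P = [[0.0] * m for _ in range(n)]
    for s, u, v in zip(sigmas, u_cols, v_cols):
        for i in range(n):
            for j in range(m):
                P[i][j] += v[i] * u[j] / s
    return P

s30, s5, s6 = math.sqrt(30), math.sqrt(5), math.sqrt(6)
A = [[1, 2], [1, 0], [0, 1]]
sigmas = [s6, 1.0]
u_cols = [[5 / s30, 1 / s30, 2 / s30], [0.0, -2 / s5, 1 / s5]]   # u_1, u_2
v_cols = [[1 / s5, 2 / s5], [-2 / s5, 1 / s5]]                   # v_1, v_2

A_plus = pinv_from_svd(sigmas, u_cols, v_cols)
P = matmul(A, A_plus)                 # A A^+, expected to be a projection matrix

b = [1.0, 2.0, 3.0]                   # sample right-hand side (arbitrary)
z = matvec(A_plus, b)
lhs = matvec(matmul(transpose(A), A), z)   # A^T A z
rhs = matvec(transpose(A), b)              # A^T b

print("A+ =", A_plus)
print("z  =", z)
```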
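EXAMPLE 7.25 In Example 7.23, we established that
\[
\underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix}
5/\sqrt{30} & 0 & -1/\sqrt{6} \\
1/\sqrt{30} & -2/\sqrt{5} & 1/\sqrt{6} \\
2/\sqrt{30} & 1/\sqrt{5} & 2/\sqrt{6}
\end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \sqrt{6} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ -2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}}_{V^T}.
\]
The pseudoinverse of $A$ is
\[
\begin{aligned}
A^+ &= V \Sigma^+ U^T \\
&= \begin{bmatrix} 1/\sqrt{5} & -2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}
\begin{bmatrix} 1/\sqrt{6} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix}
5/\sqrt{30} & 1/\sqrt{30} & 2/\sqrt{30} \\
0 & -2/\sqrt{5} & 1/\sqrt{5} \\
-1/\sqrt{6} & 1/\sqrt{6} & 2/\sqrt{6}
\end{bmatrix} \\
&= \begin{bmatrix} 1/6 & 5/6 & -1/3 \\ 1/3 & -1/3 & 1/3 \end{bmatrix}.
\end{aligned}
\]
Let us use this pseudoinverse to find the least squares solution of the system $A\vec{x} = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix}$ that has minimum length:
\[
\vec{z} = \begin{bmatrix} 1/6 & 5/6 & -1/3 \\ 1/3 & -1/3 & 1/3 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix}
= \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
\]

∫ Application: Change of variables in double integrals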
In Examples 1.33 (p. 50) and 3.10 (p. 136), we derived linearizations of nonlinear transformations from R2 to R2 using their Jacobian matrices. As we saw there, the magnitude of the Jacobian determinant acts as a scaling factor involved in such transformations: when it’s greater than one (or less than 1), the corresponding portion of the region expands (or shrinks) as a result of performing the transformation. Singular value decomposition of the Jacobian matrix provides additional information about the geometry of the associated transformation.
[Figure: the st-plane with grid lines s = 0, ..., 4 and t = 0, ..., 4, and their images in the xy-plane. The Jacobian matrix at (s, t) = (3, 3) has singular values σ1 ≈ 6.7 and σ2 ≈ 3.1. Because of this, and of different singular vectors, the resulting ellipse is different from the ellipse corresponding to the Jacobian at (s, t) = (1, 1), with σ1 ≈ 2.3 and σ2 ≈ 1.3.]
Let us consider a different example of a nonlinear transformation.

EXAMPLE 7.26 The nonlinear transformation $\vec{r}(s, t) = \begin{bmatrix} st \\ \frac{s^2 - t^2}{2} \end{bmatrix}$ has the Jacobian matrix
\[
J = \begin{bmatrix}
\frac{\partial}{\partial s}(st) & \frac{\partial}{\partial t}(st) \\
\frac{\partial}{\partial s}\left(\frac{s^2 - t^2}{2}\right) & \frac{\partial}{\partial t}\left(\frac{s^2 - t^2}{2}\right)
\end{bmatrix}
= \begin{bmatrix} t & s \\ s & -t \end{bmatrix}.
\]
Since $J$ is symmetric, its singular values are the magnitudes of its nonzero eigenvalues (see Exercise 23 on p. 358). Because these eigenvalues are $\pm\sqrt{s^2 + t^2}$, we have $\sigma_1 = \sigma_2 = \sqrt{s^2 + t^2}$. As can be seen in the following illustration, a unit circle centered at $(s, t)$ is transformed to a circle (rather than ellipse) with radius $\sqrt{s^2 + t^2}$.
[Figure: unit circles in the st-plane and their circular images in the xy-plane under the transformation of Example 7.26.]
This is an example of a conformal transformation since it preserves angles (see Exercise 20 on p. 358).
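The equality $\sigma_1 = \sigma_2 = \sqrt{s^2 + t^2}$ can be confirmed at sample points. This sketch evaluates the Jacobian of Example 7.26 and computes its singular values from the eigenvalues of $J^TJ$; the sample points and the function names are illustrative choices.

```python
import math

def jacobian(s, t):
    """Jacobian matrix of r(s, t) = (s*t, (s^2 - t^2)/2)."""
    return [[t, s], [s, -t]]

def singular_values_2x2(M):
    """Singular values of a 2x2 matrix via the eigenvalues of M^T M."""
    (a, b), (c, d) = M
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2
    dev = math.hypot((p - r) / 2, q)
    return math.sqrt(mean + dev), math.sqrt(max(mean - dev, 0.0))

points = [(1.0, 1.0), (3.0, 3.0), (2.0, -1.0)]
checks = []
for s, t in points:
    sigma1, sigma2 = singular_values_2x2(jacobian(s, t))
    checks.append((sigma1, sigma2, math.hypot(s, t)))
    print((s, t), sigma1, sigma2, math.hypot(s, t))
```

For this $J$ the off-diagonal entry of $J^TJ$ is $ts - st = 0$ and both diagonal entries equal $s^2 + t^2$, which is why the two singular values coincide.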
∫ Application: Surface integrals

A commonly used representation of a surface $S$ involves the parametric equation $\vec{r}(s, t) = \begin{bmatrix} x(s, t) \\ y(s, t) \\ z(s, t) \end{bmatrix}$, which defines a transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$. While a Jacobian matrix can be found,
\[
J(s, t) = \begin{bmatrix} x_s(s, t) & x_t(s, t) \\ y_s(s, t) & y_t(s, t) \\ z_s(s, t) & z_t(s, t) \end{bmatrix},
\]
we cannot find a determinant of such a matrix. To properly perform integration on $S$,
\[
\iint_S f(x, y, z)\, dS = \iint_Q f(x(s, t), y(s, t), z(s, t))\, g(s, t)\, ds\, dt, \tag{138}
\]
we need to identify the “scaling factor” $g(s, t)$ that accounts for the shrinking or expanding that elements of $Q$ (in $\mathbb{R}^2$) undergo when they are mapped to $S$ (in $\mathbb{R}^3$). After recalling Example 7.23 on p. 342, as well as the discussion that follows it, we hope that the reader will agree with the following conclusion:
\[
g(s, t) = \sigma_1(s, t)\, \sigma_2(s, t) \tag{139}
\]
where $\sigma_1(s, t)$ and $\sigma_2(s, t)$ denote the singular values of the Jacobian matrix $J(s, t)$.
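A numerical comparison may make (139) more concrete. The sketch below takes a hypothetical surface not discussed in the text, the paraboloid $\vec{r}(s, t) = (s, t, s^2 + t^2)$, computes $\sigma_1\sigma_2$ from the eigenvalues of $J^TJ$, and compares it with the length of the cross product of the two columns of $J$, the formula quoted in Exercise 26. All function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def paraboloid_partials(s, t):
    """Columns r_s and r_t of the Jacobian of r(s, t) = (s, t, s^2 + t^2)."""
    return [1.0, 0.0, 2 * s], [0.0, 1.0, 2 * t]

def scaling_factor_svd(rs, rt):
    """sigma_1 * sigma_2 for the 3x2 Jacobian with columns rs and rt."""
    p, q, r = dot(rs, rs), dot(rs, rt), dot(rt, rt)   # entries of J^T J
    mean = (p + r) / 2
    dev = math.hypot((p - r) / 2, q)
    sigma1 = math.sqrt(mean + dev)
    sigma2 = math.sqrt(max(mean - dev, 0.0))
    return sigma1 * sigma2

def scaling_factor_cross(rs, rt):
    """Length of r_s x r_t, the formula found in many calculus books."""
    c = cross(rs, rt)
    return math.sqrt(dot(c, c))

comparisons = []
for s, t in [(0.0, 0.0), (1.0, 2.0), (-0.5, 0.25)]:
    rs, rt = paraboloid_partials(s, t)
    comparisons.append((scaling_factor_svd(rs, rt), scaling_factor_cross(rs, rt)))
    print((s, t), comparisons[-1])
```

Agreement at a few sample points is only a sanity check; the general equivalence for rank $J = 2$ is the subject of Exercise 26.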
∫ Application: Cartography

Mapping the features from the surface of the Earth to two-dimensional media has been explored for many years. Such mapping actually represents going from $\mathbb{R}^3$ to $\mathbb{R}^2$,
\[
F\left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} \right) = \begin{bmatrix} X(x, y, z) \\ Y(x, y, z) \end{bmatrix};
\]
however, it is often more convenient to develop the corresponding formulas by going in the “reverse” direction:
\[
G\left( \begin{bmatrix} X \\ Y \end{bmatrix} \right) = \begin{bmatrix} x(X, Y) \\ y(X, Y) \\ z(X, Y) \end{bmatrix}.
\]
Approximating the surface of the Earth with a sphere and choosing the unit of length so that the radius equals 1, we obtain standard formulas relating the $(x, y, z)$ coordinates to the longitude $\theta$ and the latitude $\phi$:
\[
x = \cos\theta \cos\phi, \qquad y = \sin\theta \cos\phi, \qquad z = \sin\phi.
\]
In a cylindrical projection (the word “projection” here is used more loosely than in the preceding sections), we typically assume
\[
\theta = X, \qquad \phi = h(Y).
\]
This results in
\[
G\left( \begin{bmatrix} X \\ Y \end{bmatrix} \right) = \begin{bmatrix} \cos X \cos(h(Y)) \\ \sin X \cos(h(Y)) \\ \sin(h(Y)) \end{bmatrix}
\]
and the Jacobian
\[
J = \begin{bmatrix}
-\sin X \cos(h(Y)) & -\sin(h(Y))\, h'(Y) \cos X \\
\cos X \cos(h(Y)) & -\sin(h(Y))\, h'(Y) \sin X \\
0 & \cos(h(Y))\, h'(Y)
\end{bmatrix}.
\]
While this may look intimidating, singular values of $J$ (i.e., reciprocals of singular values of the Jacobian of $F$) are square roots of eigenvalues of the following surprisingly simple matrix:
\[
J^T J = \begin{bmatrix} \cos^2(h(Y)) & 0 \\ 0 & (h'(Y))^2 \end{bmatrix}. \tag{140}
\]
(You should be able to prove it easily using basic trigonometric identities; see Exercise 25.) The latitude $\phi$ is assumed to be between $-\pi/2$ and $\pi/2$, making its cosine nonnegative. Moreover, the mapping $\phi = h(Y)$ must be invertible, with the inverse relationship $Y = H(\phi)$; we shall therefore assume $h'(Y) > 0$ for all $Y$ (the other possibility, $h'(Y) < 0$, would result in a map that is “upside down”). Consequently, singular values of $J$ are $\cos(h(Y))$ and $h'(Y)$, while the singular values of the Jacobian of $F$ are
\[
\frac{1}{\cos(h(Y))} = \frac{1}{\cos\phi} \qquad\text{and}\qquad \frac{1}{h'(Y)} = H'(\phi).
\]
Here are four examples of cylindrical projections.
Confirm these findings by comparing each of the six circles placed on the surface of the globe above to its image in the corresponding projection.
Projection                        Y = H(φ)             singular value     values at
                                                       pair formulas      φ = 0    φ = π/6    φ = π/3
Mercator                          ln(sec φ + tan φ)    1/cos φ            1        2/√3       2
                                                       1/cos φ            1        2/√3       2
Cylindrical Equidistant           φ                    1/cos φ            1        2/√3       2
                                                       1                  1        1          1
Lambert Cylindrical Equal-Area    sin φ                1/cos φ            1        2/√3       2
                                                       cos φ              1        √3/2       1/2
Gall Orthographic                 2 sin φ              1/cos φ            1        2/√3       2
                                                       2 cos φ            2        √3         1
Several important properties of these projections can be deduced from singular values of their Jacobians:

• Mercator projection is conformal (it preserves angles) since it has σ1 = σ2.

• Both Lambert and Gall projections preserve areas since they have constant σ1 σ2.

The positions of left and right singular vectors have been noted as well.
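The singular values of the four cylindrical projections can be spot-checked without using (140) directly. The sketch below approximates the Jacobian of $G$ by central differences, takes the square roots of the eigenvalues of $J^TJ$, and inverts them to obtain the singular values of the Jacobian of $F$. The inverse functions $h$ for each projection, the step size `eps`, and the sample longitude are assumptions made for this illustration.

```python
import math

# For each projection: (H, h), where Y = H(phi) and phi = h(Y) is its inverse.
PROJECTIONS = {
    "Mercator": (lambda phi: math.log(1 / math.cos(phi) + math.tan(phi)),
                 lambda Y: 2 * math.atan(math.exp(Y)) - math.pi / 2),
    "Cylindrical Equidistant": (lambda phi: phi, lambda Y: Y),
    "Lambert Cylindrical Equal-Area": (math.sin, math.asin),
    "Gall Orthographic": (lambda phi: 2 * math.sin(phi),
                          lambda Y: math.asin(Y / 2)),
}

def G(h, X, Y):
    """Point on the unit sphere with longitude X and latitude h(Y)."""
    phi = h(Y)
    return [math.cos(X) * math.cos(phi), math.sin(X) * math.cos(phi), math.sin(phi)]

def singular_values_of_F(h, X, Y, eps=1e-6):
    """Reciprocals of the singular values of the Jacobian of G (largest first)."""
    GX = [(a - b) / (2 * eps) for a, b in zip(G(h, X + eps, Y), G(h, X - eps, Y))]
    GY = [(a - b) / (2 * eps) for a, b in zip(G(h, X, Y + eps), G(h, X, Y - eps))]
    p = sum(a * a for a in GX)                 # entries of J^T J
    q = sum(a * b for a, b in zip(GX, GY))
    r = sum(b * b for b in GY)
    mean = (p + r) / 2
    dev = math.hypot((p - r) / 2, q)
    return 1 / math.sqrt(mean - dev), 1 / math.sqrt(mean + dev)

phi = math.pi / 6     # sample latitude
X = 0.3               # sample longitude (the result should not depend on it)
results = {name: singular_values_of_F(h, X, H(phi))
           for name, (H, h) in PROJECTIONS.items()}

for name, pair in results.items():
    print(name, pair, "product:", pair[0] * pair[1])
```

Equal singular values correspond to the conformal property, and a product that stays the same at every latitude corresponds to the equal-area property noted in the list above.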
[Figures: Mercator Projection; Cylindrical Equidistant Projection; Lambert Cylindrical Equal-Area Projection; Gall Orthographic Projection.]
While cylindrical projections we discussed so far are quite popular, there are numerous other types of projections in use. Without going into details involving their formulas, let us present just one example: a gnomonic projection.
[Figure: Gnomonic Projection, with the meridians θ = 0, θ = π/6 and the parallels φ = 0, φ = π/6, φ = π/3 marked.]

As before, we show how small circles on the surface of the Earth become distorted. Here are some things worth noting:

• The shape of the distorted ellipse depends on both the longitude θ and the latitude φ (for cylindrical projections, it only depended on the latter).

• Neither the left nor the right singular vectors are generally aligned with the meridians and the parallels (which was the case for cylindrical projections).

• The reason why we did not include singular vectors for the bottom left circle is that there is no unique way to position these since both singular values are equal at that location (the singular vectors form bases for two-dimensional spaces). This was also true for some of the circles of cylindrical projections (including all of those for Mercator projection); we still plotted examples of singular vectors that could be used. What is different here is that there is no way to choose these vectors to change continuously around that location (look at the rotations singular vectors undergo).

The subject of mapping the Earth has a long history and is quite fascinating. Regrettably, in this book we will have to cut our discussion of it short, concluding with some further references:

• Timothy G. Feeman, Portraits of the earth: A mathematician looks at maps, Mathematical World, Vol. 18, American Mathematical Society, Providence, RI, 2002.

• Eric W. Weisstein, “Map Projection”, from MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/MapProjection.html (and hyperlinks therein).
EXERCISES

In Exercises 1–4, find a singular value decomposition of the given matrix. Compare your answer to the results obtained in Exercises 3, 7, and 10 on p. 293.

1. $\begin{bmatrix} -3 & -1 \\ 1 & 3 \end{bmatrix}$ (see part a of Exercise 3 and part i of Exercises 7 and 10 on p. 293)

2. $\begin{bmatrix} 3 & -2 \\ 2 & -3 \end{bmatrix}$ (see part b of Exercise 3 and part ii of Exercises 7 and 10 on p. 293)

3. $\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$ (see part c of Exercise 3 and part iii of Exercises 7 and 10 on p. 293)

4. $\begin{bmatrix} 0 & 3 \\ -5 & 0 \end{bmatrix}$ (see part d of Exercise 3 and part iv of Exercises 7 and 10 on p. 293)

In Exercises 5–10, find a singular value decomposition of the given matrix.

5. $\begin{bmatrix} -2 & 2 \\ 1 & 2 \end{bmatrix}$.

6. $\begin{bmatrix} -2 & -2 \\ 5 & 2 \end{bmatrix}$.

7. $\begin{bmatrix} 4 & 2 \\ -1 & 2 \end{bmatrix}$.

8. $\begin{bmatrix} -2 & 1 \\ 2 & -1 \end{bmatrix}$.

9. $\begin{bmatrix} -2 & 0 \\ 0 & 2 \\ -4 & -4 \end{bmatrix}$.

10. $\begin{bmatrix} 0 & 2 \\ 2 & -1 \\ -1 & -2 \end{bmatrix}$.

11. * Given an SVD of an $m \times n$ matrix $A$,
\[
A = \left[\, \widehat{U} \mid \widetilde{U} \,\right]
\begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \widehat{V}^T \\ \widetilde{V}^T \end{bmatrix},
\]
show that
\[
A^T = \left[\, \widehat{V} \mid \widetilde{V} \,\right]
\begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \widehat{U}^T \\ \widetilde{U}^T \end{bmatrix}, \tag{141}
\]
making the singular values of $A$ and $A^T$ the same.

In Exercises 12–15, use the result of Exercise 11 to find a singular value decomposition of each matrix.

12. $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & -2 \end{bmatrix}$.

13. $\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & -2 \end{bmatrix}$.

14. $\begin{bmatrix} 1 & 0 & 2 \\ 2 & 0 & 4 \end{bmatrix}$.

15. $\begin{bmatrix} 0 & 1 & 0 & -1 \\ 1 & 2 & -1 & 0 \\ 1 & 0 & -1 & 2 \end{bmatrix}$.

16. Use an SVD obtained in Exercise 9 to find the pseudoinverse of $\begin{bmatrix} -2 & 0 \\ 0 & 2 \\ -4 & -4 \end{bmatrix}$.

17. Use an SVD obtained in Exercise 13 to find the pseudoinverse of $\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & -2 \end{bmatrix}$.

18. a. Use an SVD obtained in Exercise 14 to find the pseudoinverse of $A = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 0 & 4 \end{bmatrix}$.
    b. Use $A^+$ obtained in part a to find the least squares solution of smallest magnitude of the inconsistent system $A\vec{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
    c. Verify that the solution found in part b is among the solutions found in Exercise 13 on p. 283.

19. a. Use an SVD obtained in Exercise 15 to find the pseudoinverse of $A = \begin{bmatrix} 0 & 1 & 0 & -1 \\ 1 & 2 & -1 & 0 \\ 1 & 0 & -1 & 2 \end{bmatrix}$.
    b. Use $A^+$ obtained in part a to find the least squares solution of smallest magnitude of the inconsistent system $A\vec{x} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$.
    c. Verify that the solution found in part b is among the solutions found in Exercise 14 on p. 283.

20. * Let $A$ be an $n \times n$ matrix with equal nonzero singular values $\sigma_1 = \cdots = \sigma_n = c \ne 0$.
    a. Show that for any $n$-vector $\vec{x}$, $\|A\vec{x}\| = c\,\|\vec{x}\|$.
    b. Show that for any $n$-vectors $\vec{x}$ and $\vec{y}$, $(A\vec{x}) \cdot (A\vec{y}) = c^2 (\vec{x} \cdot \vec{y})$.
    c. Use parts a and b to show that the transformation $F(\vec{x}) = A\vec{x}$ is conformal, i.e., it preserves angles. (Hint: Show that the cosine of the angle between $\vec{x}$ and $\vec{y}$ equals the cosine of the angle between $A\vec{x}$ and $A\vec{y}$.)

21. * Use equation (116) on p. 310 to prove that the product of all singular values of an $m \times n$ matrix $A$ satisfies
\[
\sigma_1 \cdots \sigma_r = \sqrt{\det(A^T A)} \ \text{ if } r = n \le m,
\qquad
\sigma_1 \cdots \sigma_r = \sqrt{\det(A A^T)} \ \text{ if } r = m \le n.
\]

22. * Show that $A^+ A$ is the matrix of projection onto the row space of $A$.

23. * Show that if $A$ is a symmetric matrix, then the singular values of $A$ are the magnitudes of the eigenvalues of $A$. (Hint: Use orthogonal diagonalization of $A$.)

24. * Show that if $A$ is an $m \times n$ matrix, its pseudoinverse is unique. (Hint: Use the fact that (137) provides the unique least squares solution with smallest length of $A\vec{x} = \vec{b}$, considering $m$ least squares problems with $\vec{b} = \vec{e}_1, \ldots, \vec{b} = \vec{e}_m$.)

∫ 25. * Prove equation (140).

∫ 26. The formula for $g(s, t)$ of (138) found in many calculus books is
\[
g(s, t) = \left\| \begin{bmatrix} x_s(s, t) \\ y_s(s, t) \\ z_s(s, t) \end{bmatrix} \times \begin{bmatrix} x_t(s, t) \\ y_t(s, t) \\ z_t(s, t) \end{bmatrix} \right\|.
\]
Use the result of Exercise 21 to prove that this formula for $g(s, t)$ is equivalent to (139) when rank $J = 2$.

∫ 27. Each complex number $x + yi$ (where $i = \sqrt{-1}$ is the imaginary unit) can be represented in the plane using the vector $\begin{bmatrix} x \\ y \end{bmatrix}$. Show that if the transformation $F\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \begin{bmatrix} u(x, y) \\ v(x, y) \end{bmatrix}$ (or, written in terms of complex number notation, $F(x + yi) = u(x, y) + i\,v(x, y)$) has continuous partial derivatives that satisfy the Cauchy-Riemann equations $u_x = v_y$ and $u_y = -v_x$ with $u_x \ne 0$ or $u_y \ne 0$, then $F$ is a conformal transformation. (See Exercise 20.) Hint: Show that the Jacobian matrix of $F$ has two equal positive singular values.
7.5 Chapter Review
A
Answers to Selected Odd-Numbered Exercises
Section 1.1
⎤ −2 √ √ −3 1 12 0 ⎢ ⎥ •1. ; length is 13. •3. ⎣ −4 ⎦ ; length is 2 5. •5. a. ; b. ; c. ;d. 3. −2 5 −8 −3 0 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 2 0 √ −4 −9 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ •7. . •9. . •11. − 2. •13. 9. •15. a. ⎣ 2 ⎦ ; b. ⎣ 4 ⎦ ; c. ⎣ −2 ⎦ ; d. 2 2. 8 1 4 −6 −2 ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎡ ⎤ ⎡ ⎤ −2 −2 4 0 ⎥ ⎢ ⎥ ⎢ ⎥ 4 ⎢ −5 ⎢ 0 ⎥ ⎢ 6 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ √ ⎢ ⎥ ⎢ ⎥ ⎥ ; d. 13. ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ; c. ⎢ ; b. ⎢ •17. ⎣ 10 ⎦ . •19. ⎣ 0 ⎦ . •21. ⎣ √2 ⎦ . •23. 4. •25. a. ⎢ ⎥ ⎥ ⎥ ⎣ −3 ⎦ ⎣ 2 ⎦ ⎣ 3 ⎦ √1 −10 13 2 0 4 2 ⎤ ⎡ −2 ⎢ ⎥ ⎢ −2 ⎥ ⎥ •27. ⎢ ⎢ 3 ⎥ . •29. 15. •31. TRUE. •33. FALSE. •35. TRUE. ⎣ ⎦ −4 ⎡ ⎤ 1 ⎢ ⎥ → •37. FALSE; e.g., counterexample: − u = ⎣ 0 ⎦ , c = −3. 0 3 2 1 → − → − → − •39. FALSE; e.g., counterexample: u = , v = ,w = . 0 0 0 ⎡ ⎤ ⎡ ⎤ −3 0 y' y ⎢ ⎥ ⎢ ⎥ •45. a. ⎣ 6 ⎦ ; b. ⎣ 0 ⎦ .
⎡
−1 0 1 2 3 → − → − → − − → •51. p = , q = , p + q = ; 4 2 6 3 4 7 → − → → → u = , − v = ,− u +− v = . 3 1 4 → − → − → → q + 0.35 f . •53. − a = 0.40 t + 0.25−
C D A B
x'
1
x 1 Illustration for Exercise 51
Section 1.2 •1. a. a21 = −2; b. a34
⎡
⎤ 0 ⎢ ⎥ = −4; c. col3 A = ⎣ −3 ⎦ ; d. row2 A = −2 −5
3 −3 1
;
Appendix A Answers to Selected Odd-Numbered Exercises ⎡ ⎢ ⎢ T e. A = ⎢ ⎢ ⎣
⎤
a. a diagonal matrix 2 −2 4 b. an upper triangular matrix ⎥ 1 3 5 ⎥ ⎥ . •3. c. a lower triangular matrix 0 −3 −5 ⎥ ⎦ d. a scalar matrix −1 1 −4 e. an identity matrix f. a symmetric matrix
⎡
⎤ 4 −1 ⎢ ⎥ •5. a.⎣ 8 −2 ⎦ ; b. cannot be evaluated; c. 0 3 3 −1 1 3 4 •7. LHS = + = 1 2 0 4 1 T = •9. AT
3 −1
1 2
T
=
3 −1 1 2
K
L
M
N
Yes
Yes
No
No
Yes
Yes
No
No
Yes
Yes
Yes
No
No
Yes
No
No
No
No
No
No
Yes
Yes
No
No
⎡
⎤ −18 −3 ⎢ ⎥ ⎣ −3 −3 ⎦ . 0 0 2 1 3 3 −1 4 2 ; RHS = + = . 6 0 4 1 2 1 6
= A. •11. A and D. •13. A and B. •15. TRUE. •17. TRUE.
•19. FALSE. •21. TRUE. •23. Matrix exists – e.g.,
2 0 0 2
. •25. Matrix exists – e.g.,
0 0
0 0
.
Section 1.3 ⎤ 7 3 3 16 −1 −1 13 −3 ⎢ ⎥ •1. a. ; b. cannot be evaluated; c. ⎣ 12 3 3 ⎦ ; d. ; 18 7 7 −3 −2 9 1 1 6 8 6 8 1 9 11 1 9 11 e. ; f. . •3. a. ; b. cannot be evaluated; c. ; 10 −4 10 −4 10 12 6 10 12 6 22 −10 0 18 22 −10 0 18 29 4 d. ; e. . •5. a. ; b. cannot be evaluated. 12 4 0 −2 12 4 0 −2 5 24 ⎡ ⎤ 14 −9 −9 10 11 12 ⎢ ⎥ ; b.⎣ 48 −3 −3 ⎦ ; c. cannot be evaluated. •7. a. 14 13 6 50 5 5 4 2 11 •9 Both sides equal . •11. Both sides equal . •13. TRUE. 20 10 9 •15. FALSE. •17. TRUE. •19. FALSE. •21. TRUE. Total revenue ($/day) Total profit ($/day)
•23.
⎡
Plant 1
3,000,000
270,000
Plant 2
4,800,000
530,000
Plant 3
3,840,000
440,000
D.E.
S.T.
•25. John
$13.50
$11.00
Kate
$13.00
T.L. $17.50 ; John should choose Symmetric Telecom, while Kate ought to se-
$17.00 $12.00 ⎡ ⎤ −2 4 −1 −7 ⎢ ⎥ lect Transpose Labs. •33. ⎣ 1 −1 1 3 ⎦. 3 1 5 7
Appendix A Answers to Selected Odd-Numbered Exercises Section 1.4 4 2 1 12 ; . • 3. •1. 0 1 2 0 ⎡ ⎤ −4 0 −4 6 ⎢ ⎥ •9. ⎣ 16 0 0 ⎦ . • 11. 1 2 0 0 y •13. 3 a., b. 3 2 ; 1 1 e2 3 u c. . 0 e1 -3 -2 -1 1 1 -1
•15. a., b. 3 ; −1 3 c. . −1
0 −1
. • 5.
9 0 −2 6 0 7
2x
2
6x
3y
.
.
y F(e2)
1
F(u) x
0 -3 -2 -1 F(e1) 2 -1
3
-3
-3 y
2
F u
F(e2)
1
x
-2
0 -3 -2 -1 1 2 -1 F(e ) 1 -2
-3
-3
2
3
−1 0 0 −2
3
y
3
−1 0 −1 1
2 0 1 1
x 3 F(u)
•19. a. ; b. ; c. . −1 equals the matrix of rotation by 90 degrees clockwise. 0 1 1 √1 √1 0 0.9 0.3 0.36 2 2 2 2 = 1 1 . •31. a. ; b. . −1 √ √1 0 0.1 0.7 0.64 2 2 2 2
⎡
−
. • 7.
x
1 6 0 3
⎢ ⎢ . • 3. ⎢ ⎢ ⎣
0 0 7 0 0 5 1 −1 x
•5. a.
2 −1 3 0
2
Section 2.1
;
F
1 e2 0 e 1 1 -3 -2 -1 -1
3
2
1 6 0 0 3 1
0 0 0 1
-2
3
•17. a. (i); b. (iii); c. (ii). √ 3 −1 0 2 2 √ ; •21. 1 3 1 2 2 1 −1 √ √ 1 2 2 •23. 1 1 √ √ 0 2 2
•1.
-2
= 4 =
0
b.
+
⎤ ⎡ 1 4 ⎥ ⎢ ⎢ 3 5 ⎥ ⎥; ⎢ ⎥ 0 −6 ⎦ ⎢ ⎣ 0 3
0 0 7 0 0 5 1 −1
2y
+ z
= 2
− z
= 0
+ z
=
3x y
1 3 0 0
⎤ ⎥ ⎥ ⎥. ⎥ ⎦
0
0 = 1 •7. a. Both reduced row echelon form, and row echelon form; b. row echelon form, but not in reduced row echelon form; c. neither; d. neither. ⎤ ⎡ 1 0 0 1 0 0 ⎥ ⎢ ; ii. one solution; iii. x = 0, y = 2; b. i. ⎣ 0 1 0 ⎦ ; ii. no solution; •9. a. i. 0 1 2 0 0 1
1 0 −3 4 ; ii. infinitely many solutions; iii. x = 3z + 4, y = −2z + 5, z−arbitrary. 0 1 2 5 ⎡ ⎤ ⎡ ⎤ 1 3 0 0 0 1 0 −5 0 0 ⎢ ⎥ ⎢ 0 0 1 0 ⎥ ⎢ ⎥ ⎥ •11. a. i. ⎢ 3 0 0 ⎦; ⎢ 0 0 0 1 ⎥ ; ii. no solution; b. i. ⎣ 0 0 1 ⎣ ⎦ 0 0 0 0 1 0 0 0 0 0 ii. infinitely iii. x1 =arbitrary, x 2 = 5x4 ,x3 = −3x4 , x 4 =arbitrary, x5 = 0. many solutions; 1 5 −2 3 1 5 −2 1 0 •13. a. ; b. r.e.f.: ; r.r.e.f. ; c., d. x = 3, y = −1. 5 2 13 0 1 −1 0 1 −1 3 1 2 32 1 2 0 2 4 ; b. r.e.f.: ; r.r.e.f.: ; •15. a. augmented matrix: 1 2 −1 0 0 1 0 0 1 2 1 3 −1 7 ; c., d. 0 = 1 ⇒ no solution. •17. a. Augmented matrix: −1 3 2 4 0 7 1 0 1 −1 3 1 12 23 −1 2 2 ; r.r.e.f.: ; b. r.e.f.: 0 1 1 1 1 0 1 1 1 1 c., d. x1 = −x3 + x4 + 3, x2 = −x3 − x4 + 1, x3 =arbitrary, x4 =arbitrary. ⎡ ⎤ ⎤ ⎡ 3 0 1 2 1 0 13 23 ⎢ ⎥ ⎥ ⎢ •19. a. augmented matrix ⎣ −1 1 0 1 ⎦ ; b. r.e.f.: ⎣ 0 1 13 53 ⎦ ;
c. i.
4 2 1 4
⎤
⎡
0 0
1
2
1 0 0 0 ⎥ ⎢ r.r.e.f.: ⎣ 0 1 0 1 ⎦ ; c., d. x = 0, y = 1, z = 2. 0 0 1 2 ⎤ ⎤ ⎡ ⎡ 0 0 1 2 3 −3 1 2 3 −3 ⎥ ⎥ ⎢ ⎢ •21. a. augmented matrix ⎣ 2 1 3 0 6 ⎦ ; b. r.e.f.: ⎣ 0 1 1 −2 −2 ⎦ ; −1 1 0 −3 −6 0 0 0 0 0 ⎤ ⎡ 4 1 0 1 1 ⎥ ⎢ r.r.e.f.: ⎣ 0 1 1 −2 −2 ⎦ c., d. x = −z − w + 4, y = −z + 2w − 2, z =arbitrary, w =arbitrary. 0 0 0
0
⎡
⎢ ⎢ •23. a. augmented matrix ⎢ ⎢ ⎣ ⎡
0 1 0 1 1 2 3 0 −1 ⎤
0 0 1 2
1 5 1 0 0 −3 0 −3
⎤
⎡
⎥ ⎢ ⎥ ⎢ ⎥ ; b. r.e.f.: ⎢ ⎥ ⎢ ⎦ ⎣
1 0 0 0
0 1 0 0
1 0 5 0 −4 −1 1 2 −2 0 1 0
⎤ ⎥ ⎥ ⎥; ⎥ ⎦
1 0 ⎥ 0 −1 ⎥ ⎥ c., d. x = 1, y = −1, z = −2, w = 0. 0 −2 ⎥ ⎦ 1 0 ⎤ ⎡ 0 1 1 1 −2 0 ⎥ ⎢ ⎢ 0 −1 −1 −1 2 0 ⎥ ⎥; ⎢ •25. a. augmented matrix ⎢ 0 −3 3 0 ⎥ ⎦ ⎣ 1 −1 1 0 1 −2 1 0 ⎡ ⎤ ⎤ ⎡ 1 −1 0 −3 3 0 1 0 1 −2 1 0 ⎢ ⎥ ⎥ ⎢ ⎢ 0 ⎢ 1 1 1 −2 0 ⎥ 1 −2 0 ⎥ ⎥ ; r.r.e.f.: ⎢ 0 1 1 ⎥; b. r.e.f.: ⎢ ⎢ 0 ⎢ 0 0 0 0 0 0 0 0 ⎥ 0 0 0 ⎥ ⎣ ⎦ ⎦ ⎣ 0 0 0 0 0 0 0 0 0 0 0 0 c., d. x1 = −x3 + 2x4 − x5 , x2 = −x3 − x4 + 2x5 , x3 =arbitrary, x4 =arbitrary, x5 =arbitrary. •27. Such system does not exist: we need three leading entries in the first three columns of the r.r.e.f., but 1 0 ⎢ ⎢ 0 1 r.r.e.f.: ⎢ ⎢ 0 0 ⎣ 0 0
0 0 1 0
1 0 0 3 0 0 1 5
there are only two rows, making it impossible. •29. E.g.,
⎡
⎤ 1 0 0 ⎢ ⎥ . • 31. E.g., ⎣ 0 0 1 ⎦ . 0 0 0
•33. FALSE. •35. TRUE. •37. TRUE.
Section 2.2 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ 1 0 0 1 2 0 1 1 0 0 1 −3 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ •1. a. ⎣ 0 1 0 ⎦ ; b.⎣ 0 1 −2 2 ⎦ . • 3. a. ⎣ 0 1 −2 ⎦ ; b. ⎣ 0 1 4 0 1 0 11 1 6 0 0 1 0 0 ⎤ ⎡ ⎡ 1 3 1 1 0 0 0 ⎥ ⎢ ⎢ ⎢ ⎢ 0 1 0 0 ⎥ 1 0 2 3 1 −2 ⎥ ; b. ⎢ 0 1 2 •5. a. ; b. . • 7. a. ⎢ ⎥ ⎢ 0 0 1 ⎢ 1 −1 1 0 1 4 4 ⎣ ⎣ 0 0 2 0 ⎦ 0 0 0 1 0 0 3 ⎤ ⎡ ⎤ ⎡ 2 5 0 0 1 0 ⎥ ⎢ ⎥ ⎢ ⎢ 0 3 ⎥ ⎢ 0 1 0 0 ⎥ ⎥ ⎥ ; b. ⎢ •9. a. ⎢ ⎢ 0 2 ⎥. ⎢ 1 0 0 0 ⎥ ⎦ ⎣ ⎦ ⎣ −7 1 0 0 0 1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 0 0 1 1 0 0 0 1 0 0 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ 0 1 0 0 ⎥ ⎥ ⎥ ; b. ⎢ 0 1 0 0 ⎥ ; c. ⎢ 0 3 0 0 ⎥ . •11. a. ⎢ ⎢ 0 0 1 0 ⎥ ⎢ 0 0 1 0 ⎥ ⎢ 0 0 1 0 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 1 0 0 0 5 0 0 1 0 0 0 1 ⎡ ⎤ ⎤ ⎡ ⎤ ⎡ 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 0 1 0 0 0 ⎥ ⎢ 0 1 0 0 0 ⎥ ⎢ 0 1 0 0 − 12 ⎥ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ •13. a. ⎢ 0 ⎥ 0 ⎥ ⎢ 0 0 1 0 ⎥ ; b. ⎢ 0 0 0 0 1 ⎥ ; c. ⎢ 0 0 1 0 ⎥. ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ 0 ⎦ 0 ⎦ ⎣ 0 0 0 1 ⎣ 0 0 0 1 0 ⎦ ⎣ 0 0 0 1 0 0 0 0 −6 0 0 0 0 1 0 0 1 0 0 •17.FALSE. •19.TRUE.
Section 2.3 •1. No. •3. Yes. •5. a. ⎡
1 ⎢ ⎣ 0 0
⎤
−5 3 2 −1
; b.
5 4 − 12
0 −2 ⎥ 0 ⎦; 0 1
− 34 1 2
⎤ 5 ⎥ 0 ⎦. 1 −2 5 −3 7
⎡
⎤
1 2 4 ⎢ ⎥ b. no inverse (singular matrix); c. ⎣ 1 2 3 ⎦ . •9. a. 1 1 3 ⎡
1 0 0 ⎢ ⎢ 0 1 0 ⎢ b. no inverse – singular matrix. •11. a. ⎢ ⎢ 0 0 1 ⎢ ⎣ 0 3 0 0 0 0
0 0 0 1
⎢ ⎢ ⎢ ⎢ ⎣
0 0 0 0
0 1
1 0 0 0 ⎤
⎥ ⎥ ⎥. ⎥ ⎦
; c. no inverse (singular matrix).
1 2
⎡
⎤
0 0 1 2 0 −1 0 0 ⎡ 1 ⎢ ⎢ 0 ⎥ ⎢ ⎥ ⎢ 0 ⎥ ⎥ ; b. ⎢ ⎢ ⎥ ⎢ 0 ⎥ ⎢ ⎦ ⎢ 0 ⎣ 0
−1 0 0 1 0 1 0 0 0 0
⎤ ⎥ ⎥ ⎥; ⎥ ⎦ 0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 4
0
0 0 0 0 0 1
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥. ⎥ ⎥ ⎥ ⎦
•7. a.
Appendix A Answers to Selected Odd-Numbered Exercises 367 ⎤ ⎡ ⎤ x −1 x 0 ⎢ ⎥ ⎢ ⎥ •13. = . •15. ⎣ y ⎦ = ⎣ −4 ⎦ . y −1 z 6 −1 0 −1 . •17. Inverse transformation: reflection with respect to the y-axis (same as F ) A = 0 1 ⎡ ⎤ ⎡ ⎡ ⎤ ⎤ 3 4 7 1 1 1 1 ⎢ ⎢ ⎥ ⎥ ⎥ −1 T −1 ⎢ ) = •19. Not invertible. •21. a. B −1 A−1 = ⎣ 7 11 ; b. (A ; c. A 0 ⎦ ⎣ 1 ⎦= ⎣ 2 2 1 ⎦ 1 4 8 −1 −1 0 1 ⎡ ⎤ 4 ⎢ ⎥ ⎣ 5 ⎦ ; d. yes; by equivalent conditions, both matrices must be row equivalent to I4 . ⎡
5
⎡
⎢ ⎢ •23. a. A−1 (B −1 )T = ⎢ ⎢ ⎣ ⎡
1 0 ⎢ ⎢ 0 1 d. ⎢ ⎢ 0 0 ⎣ 0 0 •27. TRUE.
1 2 2 2
2 1 3 2
1 3 2 4
1 1 1 2
⎤
⎡
⎥ ⎢ ⎥ ⎢ ⎥; b. A−1 = ⎢ ⎥ ⎢ ⎦ ⎣
⎤ 0 0 ⎥ 0 0 ⎥ ⎥. 1 0 ⎥ ⎦ 0 1 •29. FALSE. •31. TRUE. •33. TRUE.
0 2 0 2
1 0 2 0
0 1 0 2
1 0 1 0
⎤
⎡
⎥ ⎢ ⎥ ⎢ ⎥ ; c. B −1 B −1 = ⎢ ⎥ ⎢ ⎦ ⎣
1 1 2 1
2 2 3 1
1 3 2 2
1 2 1 1
⎤ ⎥ ⎥ ⎥; ⎥ ⎦
Section 2.4
•1. 2 N2O5 → 4 NO2 + O2. •3. C10H16 + 8 Cl2 → 10 C + 16 HCl. •5. Cannot be balanced. •7. Rb3PO4 + CrCl3 → 3 RbCl + CrPO4.
•9. All solutions satisfy x = 200 + v, y = −100 + v, z = 100 + v, v = arbitrary, w = 100. For a solution to be feasible, it must have v ≥ 100. E.g., x = 400, y = 100, z = 300, v = 200, w = 100.
•11. All solutions satisfy x = −100 − z + w, y = 600 − z + w, z = arbitrary, u = −200 + w, v = 100 + w, w = arbitrary. Feasible solutions correspond to the shaded region in the zw-plane, e.g., x = 100, y = 800, z = 100, u = 100, v = 400, w = 300.
[Figure: the network for Exercise 11 and the shaded region of feasible solutions in the zw-plane.]
•13. Melt together 2 parts of alloy I, 2 parts of alloy II, and 3 parts of alloy III. •15. Many solutions. E.g., melt together 10 parts of alloy I, 10 parts of alloy II, 1 part of alloy III, and 7 parts of alloy V. •17. Adding 1 part of alloy III and 2 parts of alloy IV, we can obtain alloy V. It is not possible to obtain alloys III or IV from the other two.
•19. p(x) = 2 + 5x − 3x^2. •21. p(x) = 3x − x^2. •23. p(x) = 2 + x + 4x^2 − 3x^3. •25. No solution. •27. p0(x) = 2 + 2x − x^3, p1(x) = 3 − 1(x − 1) − 3(x − 1)^2 + 1(x − 1)^3.
•29. (i) Impossible; (ii) a ≠ 2; (iii) a = 2. •31. (i) Impossible; (ii) for all real a values; (iii) impossible. •33. (i) a = ±2 and b ≠ 0; (ii) a ≠ ±2; (iii) a = ±2 and b = 0. •35. (i) Impossible; (ii) a ≠ b and a ≠ −b; (iii) a = b or a = −b.
Appendix A Answers to Selected Odd-Numbered Exercises 1 0 −2 → − → − → − •41. a. L = ; b. solving L y = b yields y = and 5 1 23 ⎤ ⎡ 1 0 0 0 ⎥ ⎢ ⎢ 1 1 0 0 ⎥ −1 → − → − → − ⎥; ⎢ solving U x = y yields x = . •43. a. L = ⎢ 3 1 0 ⎥ 3 ⎦ ⎣ 2 0 −1 2 1 ⎤ ⎡ ⎤ ⎡ 1 0 ⎥ ⎢ ⎥ ⎢ ⎢ −1 ⎥ ⎢ −2 ⎥ → − → − − → → − → − → − ⎢ ⎥ ⎢ ⎥ b. solving L y = b yields y = ⎢ ⎥ and solving U x = y yields x = ⎢ −1 ⎥ . ⎣ −2 ⎦ ⎦ ⎣ 0 1 ⎡ ⎤ ⎤ ⎡ ⎡ ⎤ 0 1 0 1 0 0 −1 1 0 ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ •45. P = ⎣ 1 0 0 ⎦ , L = ⎣ −3 1 0 ⎦ , U = ⎣ 0 3 1 ⎦. 0 0 1
−4
0 0 −1
2 1
Section 3.1
•1. a. −5; b. 33; c. 0. •3. a. −10; b. 48. •5. a. 6; b. −12; c. 3. •7. a. −16; b. 0. •9. a. 6; b. −6; c. 0; d. 1/6; e. 48. •11. a. −6; b. −6; c. −3/2. •13. a. 45; b. −5/3; c. −81; d. 1/40; e. 5; f. 9. •15. FALSE. •17. TRUE. •19. FALSE. •21. TRUE.
Section 3.2 •1. x1 = 2, x2 = 3. • 3. Cramer’s rule cannot be used here. •5. Cramer’s rule cannot be used here. •7. x1 = 0, x2 = 3, x3 = 4. • 9. a. ⎡
⎤ ⎡ ⎤ −1 7 −3 2 0 1 ⎢ ⎥ ⎢ ⎥ b. ⎣ −6 −3 −3 ⎦ ; c. ⎣ 0 0 0 ⎦ . • 11. −4 −2 3 4 0 2 •15. TRUE. •17. FALSE. •19. FALSE.
13 2 .
1 −2 −3 −1
;
• 13. a. 4; b. 1; c. 1.
Section 4.1 •1. Properties 3 and 7 are satisfied. Properties 2 and 8 are not satisfied. •3. Properties 7, 9, and 10 are satisfied. Property 8 is not satisfied. •5. Properties 2, 3, 4, 9, and 10 are satisfied. •7. Not a vector space. Properties 4 and 10 are not satisfied. •9. Vector space. All ten properties hold. Section 4.2 •1. Not a subspace of R2 . •3. Subspace of R2 . •5. Not a subspace of R2 . •7. Subspace of R3 . •9. Subspace of R3 . • 11. Subspace of P2 . • 13. Not a subspace of M22 . • 15. Not a subspace of M34 . •17. Subspace of FR . • 19. Not a subspace of FR . • 21. Yes. •23. No. •25. Yes. •27. Yes. •29. No. •31. FALSE. •33. TRUE. Section 4.3 → = 1− → − 1− →; c. linearly independent. •1. a. Linearly independent; b. linearly dependent, e.g., − u u u 3 1 2 − → − → − → → + 2− → − 3− →. •3. a. Linearly dependent, e.g., u2 = 0u1 ; b. linearly dependent, e.g., u2 = 2− u u u 1 3 4 •5. Linearly independent. •7. Linearly independent.
•9. Linearly dependent, e.g., 1 + 2t + t2 = (1 + t) + t + t2 . •11. Linearly dependent, e.g., 2t + 2t3 = 2 1 + t + t2 − 2 1 + t2 − t3 . 0 0 0 3 0 1 •13. Linearly independent. •15. Linearly dependent, e.g., =0 . 0 0 0 0 2 0 •19. Linearly independent. Not possible to mix two of the alloys to obtain the third. •21. FALSE. •23. TRUE. Section 4.4 •1. Yes. •3. No. •5. No. •7. No. •9. Yes. •11. No. •13. Yes. •15. No. •17. No. •19.⎡Yes. ⎤ ⎡ ⎤ 0 1 1 1 1 ⎢ ⎥ ⎢ ⎥ •21. a. , ; b. 2; c. the plane. •23. a. ; b. 1; c. a line. •25. a. ⎣ 1 ⎦, ⎣ 0 ⎦ ; 2 1 −1 −1 2 b. 2; c. a plane. ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 0 1 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ •27. a. ⎣ 2 ⎦, ⎣ 4 ⎦ , ⎣ 2 ⎦ ; b. 3; c. the 3-space. •29. a. No basis; b. 0; c. a point. ⎡ ⎢ ⎢ •31. a. ⎢ ⎢ ⎣
1 1 0 2
⎤ ⎡
0 −1 1
⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
0
3
1 0 0 1 ⎤ ⎡
3 ⎤ ⎡
2 2 0
⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
⎤ ⎡ ⎥ ⎢ ⎥ ⎢ ⎥,⎢ ⎥ ⎢ ⎦ ⎣
−6
3
1 1 0
⎤ ⎥ ⎥ ⎥ ; b. 4. • 33. a. −t, 1 − t, t − t2 ; b. 3. ⎥ ⎦
2
⎡
⎤ ⎡ ⎤ ⎡ ⎤ 1 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ; b. 3. • 37. a. ⎣ −3 ⎦ , ⎣ 0 ⎦ , ⎣ 0 ⎦ ; 0 0 1
0 0 0 1 , 1 1 0 −1 ⎡ ⎤ ⎡ ⎤ 1 2 0 ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ b. ⎣ 0 ⎦ , ⎣ 0 ⎦ , ⎣ 1 ⎦ . • 39. FALSE. •41. TRUE. •43. TRUE. −2 −3 0 •35. a.
,
Section 4.5
⎡
⎤
⎡
⎤
⎡
3 1 −1 −1
⎤
⎡
5 −3 −1 3
⎤
3 −8 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥. •1. a. ; b. . • 3. a. ⎣ 4 ⎦ ; b. ⎣ 7 ⎦ . •5. a. ⎢ ⎥ ; b. ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ 1 −3 ⎤ ⎡ 1 ⎥ ⎢ ⎢ 2 ⎥ −1 1 2 1 − → ⎥ . •11. a. [ u ]S = ; •7. a. ; b. 10t + 2. •9. a. ⎢ ⎢ −1 ⎥ ; b. 0 1 −3 2 ⎦ ⎣ 4 ⎡ ⎤ 4 ⎢ ⎥ → → → [− v ]S cannot be found since − v is not in the plane spanned by S; b. − w = ⎣ 10 ⎦. −1 2
5 10
−1 Section 4.6
⎤ ⎡ 1 0 1 0 ⎢ ⎥ ⎢ •1. a. i. , ; ii. ⎣ 0 ⎦ , ⎣ 1 2 −1 2 3 2 1 2 b. i. ; ii. ; iii. 1; iv. −1 −2 1
⎡
⎤
⎤ −2 ⎥ ⎢ ⎥ ⎦ ; iii. 2; iv. ⎣ −3 ⎦ ; v. 1; 1 ; v. 1.;
⎡
Appendix A Answers to Selected Odd-Numbered Exercises ⎤ ⎤ ⎡ ⎤ ⎡ ⎡ ⎤ ⎡ ⎡ ⎤ −2 1 0 1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 1 ⎥ ⎢ ⎢ 1 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ c. i. ⎣ −2 ⎦ ; ii. ⎢ ⎥ ; iii. 1; iv. ⎢ 0 ⎥ , ⎢ 1 ⎥ , ⎢ 0 ⎥ ; v. 3. −1 ⎦ ⎦ ⎣ ⎦ ⎣ ⎣ ⎦ ⎣ 1 1 0 0 2 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 −1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 1 0 0 ⎢ 0 ⎥ ⎢ 2 ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥,⎢ ⎥,⎢ ⎥ ; ii. ⎢ •3. a. i. ⎢ , , 0 1 0 ⎦ ; iii. 3; iv. no basis; v. 0; ⎣ ⎦ ⎣ ⎦ ⎣ ⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ 3 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ −1 0 1 1 0 1 ⎤ ⎡ ⎡ ⎤ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎡ 1 0 0 0 0 0 ⎤ ⎤ ⎡ ⎤ ⎡ ⎡ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ 1 0 0 ⎢ 0 ⎥ ⎢ −1 ⎥ ⎢ −1 ⎢ 1 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎢ 0 ⎥ ⎢ −1 ⎥ ⎢ 1 ⎢ −1 ⎥ ⎢ 1 ⎥ ⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ −1 ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ; ii. ⎢ ⎥, ⎢ ⎥, ⎢ ; iii. 3; iv. , , b. i. ⎢ ⎥, ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥, ⎢ ⎢ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ 2 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ 1 ⎢ ⎢ 0 1 0 0 −1 ⎦ ⎦ ⎣ ⎦ ⎣ ⎣ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ ⎢ ⎢ 0 1 1 ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 1 ⎦ 0 −1 −1 1 2 0 → − → − → − v. 3. •5. a. rank [A | b ] = rank A + 1; b. rank [A | b ] = rank A = n; c. rank [A | b ] = rank A < n. ⎡ ⎤ 1 0 0 0 0 0 ⎢ ⎥ •7. TRUE. •9. TRUE. •11. e.g. ⎣ 0 1 0 0 0 0 ⎦ .
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥; ⎥ ⎥ ⎥ ⎦
0 0 0 0 0 0 •13. Such matrix cannot exist: rank+nullity=3; therefore, both must be no bigger than 3. → − → − − → → → → → •17. a. If − x ∈ (null space of B), then B − x = 0 . Therefore, AB − x = A 0 = 0 , which implies − x ∈ (null space of AB); ⎡ ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤ 0 1 0 0 0 0 → − ⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥ → → b. e.g., − x = ⎣ 1 ⎦ ∈ (null space of AB) but B − x = ⎣ 0 1 0 ⎦ ⎣ 1 ⎦ = ⎣ 1 ⎦ = 0 so that 0 − → x ∈ / (null space of B); p R null space of AB c. 1 0 0 0 null space 1 0 0 of B 0 1
0 0 0
0
0
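The argument in Section 4.6, Exercise 17 (the null space of B is contained in the null space of AB, but not necessarily the other way around) can be checked numerically. The sketch below is only an illustration: the matrices `A` and `B` and the vectors `x` and `y` are hypothetical choices made for this example, not the matrices from the exercise.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical matrices chosen for illustration only.
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]
B = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
AB = matmul(A, B)

x = [0, 0, 1]  # B x = 0, so x is in the null space of B
y = [0, 1, 0]  # B y != 0, yet (AB) y = 0

print("Bx    =", matvec(B, x))   # zero vector
print("(AB)x =", matvec(AB, x))  # zero vector as well, as part a predicts
print("By    =", matvec(B, y))   # nonzero
print("(AB)y =", matvec(AB, y))  # zero: y is in null(AB) but not in null(B)
```

Here `x` illustrates part a (membership in the null space of B forces membership in the null space of AB), while `y` plays the role of the counterexample in part b.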
Section 5.1 •1. a. Yes; b. yes; c. no. •3. a. No; b. yes; c. no. •5. a. Yes; b. no; c. no. •7. Yes. •9. Yes. •11. No. •13. No. •15. Yes. •17. Yes. •19. No. •21. No. •23. Yes.
Section 5.2 0 −2 0 0 1 0 1 1 −2 •1. a. , ; b. . •3. a. , ; b. , , , . 0 1 0 0 2 0 0 2 1 0 0 0 4 0 0 4 0 0 0 1 1 , ; b. , . •7. a. , ; •5. a. 0 0 0 2 0 0 0 −1 0 0 1 1 0 4 1 0 0 0 . •9. a. No basis; b. , ; c. yes; d. yes; e. yes. b. , 0 2 0 1 0 0 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 −1 1 1 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ •11. a. ⎣ 1 ⎦; b. ⎣ 0 ⎦, ⎣ 1 ⎦; c. no; d. no; e. no. •13. a. ⎣ −1 ⎦; b. , ; 0 1 1 −1 0 1
Appendix A Answers to Selected Odd-Numbered Exercises ⎤ ⎤ ⎡ ⎡ ⎡ ⎤ 0 1 ⎥ 2 ⎥ ⎢ ⎢ ⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ c. yes; d. no; e. no. •15. a. ⎣ −1 ⎦; b. ⎢ ⎥, ⎢ 2 ⎥; c. no; d. no; e. no. 1 ⎦ ⎦ ⎣ ⎣ 1 1 1 •17. a. 1; b. 1; c. no; d. no; e. no. •19. a. No basis; b. −1 + t2 , 1 − t − 2t2 , 1 + t2 (or any other set of 3 L.I. vectors in P2 ); c. yes; d. yes; e. yes. 0 1 0 0 −1 0 •21. a. , , ; b. 1; c. yes; d. no; e. no. 0 0 1 0 0 1 •23. a. 1, t, t2 ; b. no basis; c. no; d. no; e. no. •25. Degree of precision is 1. Section 5.3 •1.
3
1
−1 −1 ⎡
1 2 2 ⎢ 1 •7. ⎣ 0 2 −1 − 12
•11. PT ←S = •13. PT ←S = ⎡ ⎢ •15. PT ←S = ⎣
;
1
. •3.
−3 ⎤ ⎡ 3
−1
3
;
⎡
⎤ ⎡ ⎤ 4 2 −4 ⎢ ⎥ ⎢ ⎥ . •5. ⎣ 2 1 ⎦ ; ⎣ −2 ⎦ . 1 −2 2
0 −3 12 ⎡ ⎤ 2 1 0 0 2 ⎥ ⎢ ⎥ ⎢ ⎥ − 12 ⎦ ; ⎣ 6 ⎦ . •9. ⎣ 1 1 0 ⎦ ; −2 + 2t + 12t2 . − 12 −2 1 2 4 1 1 1 − 2 − → → 2 2 2 u ]S = PT ←S = [− u ]T . ; PT ←S [− = 1 5 1 3 2 2 2 −5 − 32 4 − 12 → → u ]S = PT ←S = [− u ]T . ; PT ←S [− = 3 1 2 −13 2 2 ⎡ ⎤ ⎡ ⎤ ⎤ −5 1 2 1 5 ⎢ ⎥ ⎢ ⎥ ⎥ → → u ]S = PT ←S ⎣ 6 ⎦ = ⎣ 3 ⎦ = [− u ]T . 1 2 2 ⎦ ; PT ←S [− ⎤
2 3 2 ⎡
6
⎡
⎤
−2
1 1 0 0 ⎢ ⎢ 1 ⎢ ⎥ •17. PS←T = ⎣ −2 2 0 ⎦ . • 19. PT ←S = ⎢ ⎢ 1 ⎣ 1 −2 1 1 •21. FALSE. •23. TRUE.
0 1 3 2 3
1
0 0 1 3
1
0 0 0 1
4 ⎤ ⎥ ⎥ ⎥. ⎥ ⎦
Section 6.1 •1. a. (ii) Orthogonal, but not orthonormal; b. (i) orthonormal; c. (iii) neither. ⎡ ⎤ −8 2 ⎢ ⎥ •3. a. (i) Orthonormal; b. (ii) orthogonal, but not orthonormal. •5. . • 7 ⎣ 13 ⎦ . −3 1 •9. a. Orthogonal matrix; b. not an orthogonal matrix; c. orthogonal matrix. •11. TRUE. •13. FALSE. •15. TRUE. Section 6.2 −10 √ ; the vectors form an obtuse angle; b. 0; the vectors are perpendicular; •1. a. √17 13 2 c. 5 ; the vectors form an acute angle. •3. a. are ⎤ in the same direction; ⎡ 1; the⎤vectors ⎡ 1
1
2 ⎥ ⎢ 2 ⎥ ⎢ −1 ⎥ ⎢ 2 ⎥ ⎢ −1 1 ⎥ ⎢ 2 ⎥ b. 2 ; the vectors form an acute angle. •5. ⎢ ⎢ 1 ⎥,⎢ 0 ⎥. ⎦ ⎣ ⎦ ⎣ 0 1
Appendix A Answers to Selected Odd-Numbered Exercises ⎡ ⎤ ⎡ ⎤ 5 −5 7 7 ⎢ −1 ⎥ ⎢ 8 ⎥ ⎢ 7 ⎥ ⎢ 7 ⎥ 1 0 ⎢ ⎢ ⎥ ⎥ •7. ⎢ ⎥,⎢ ⎥ . •9. a. −2 ; b. 0 . ⎣ 1 ⎦ ⎣ 0 ⎦ 0 1 y 4
[Figures: two plots in the xy-plane showing the vectors u, v, p, and n.]
Illustration for Exercise 9a.
Illustration 9b. ⎡ for Exercise ⎤ ⎤ 1 ⎡ ⎤ ⎡ 2 ⎤ −1 ⎢ 1 ⎥ ⎢ 2 ⎥ ⎥ ⎢ 1 3 ⎢ ⎥ ⎥ ⎢ −1 ⎢ ⎥ ⎢ −2 ⎥ ⎥ ⎢ ; b. ⎢ •11. a. ⎣ 1 ⎦ ; b. ⎣ 3 ⎦ . •13. a. ⎢ − 12 ⎥ ⎢ ⎥ ⎥. ⎢ 1 ⎥ ⎣ 1 ⎦ 1 1 ⎣ −2 ⎦ 3 1 − 12 ⎡ ⎤ ⎡ 7 ⎤ 1 7 ⎢ 1 ⎥ 9 2 ⎢ −2 ⎥ ⎥ ⎢ → → − → 8 . •17. proj ⎢ ⎥ . • 19. projspanS − •15. projspanS − u = u = u = 2 ⎦. ⎣ spanS ⎢ −1 ⎥ 9 ⎣ ⎦ 2 11 7 −2 −9 1 ⎡ ⎤ 8 ⎡ 1 1 1 ⎤ ⎢ 31 ⎥ 3 3 3 1 2 ⎢ 3 ⎥ −5 ⎢ 1 1 1 ⎥ → 5 ⎥ . •23. B = u =⎢ . •25. B = •21. projspanS − . ⎣ ⎢ 1 ⎥ 3 3 3 ⎦ 4 − 25 ⎣ ⎦ 5 1 1 1 ⎡ ⎢ ⎢ •27. B = ⎢ ⎢ ⎣
⎡
7 3 1 4 1 4 − 14 − 14
1 4 1 4 − 14 − 14
− 14 − 14 1 4 1 4
3
⎤
⎡ 5 − 14 ⎥ 9 1 ⎥ −4 ⎥ ⎢ . •29. B = ⎣ − 29 1 ⎥ 4 ⎦ − 49 1 4
− 29 8 9 − 29
− 49 − 29 5 9
3
3
⎤ 1 0 3 ⎥ ⎢ 1 ⎥ ⎢ 0 − ⎥ 3 ⎥ ⎢ . . •31. B = ⎦ ⎢ 0 0 1 0 ⎥ ⎦ ⎣ 1 2 − 13 0 3 3 ⎤
⎡
Section 6.3 ⎡ ⎤ ⎤ ⎡ ⎤ ⎡ 2 ⎤ 1 −2 1 −3 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ •1. a. Orthogonal basis: ⎣ 1 ⎦ , ⎣ 0 ⎦ ; orthonormal basis: ⎣ 13 ⎦ , √12 ⎣ 0 ⎦ ; 2 1 2 1 3 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 2 2 2 2 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ b. orthogonal basis: ⎣ 1 ⎦ , ⎣ 3 ⎦ ; orthonormal basis: √16 ⎣ 1 ⎦ , √162 ⎣ 3 ⎦ ; −1 7 −1 7 ⎡ ⎡ ⎡ ⎤ ⎤ ⎤ ⎡ ⎤ 1 3 1 3 ⎢ ⎢ ⎢ ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎥ ⎥ ⎥, ⎢ ⎥ ; orthonormal basis: 1 ⎢ 1 ⎥ , √1 ⎢ 1 ⎥ . c. orthogonal basis: ⎢ 2 ⎢ ⎢ ⎢ 1 ⎥ ⎢ −3 ⎥ ⎥ ⎥ 20 ⎣ 1 ⎦ ⎣ −3 ⎦ ⎣ ⎦ ⎣ ⎦ 1 −1 1 −1 ⎡
2 3 1 3
1 3 2 3
Appendix A Answers to Selected Odd-Numbered Exercises 373 ⎤ ⎡ ⎤ ⎡ ⎤ ⎤ ⎡ ⎤ ⎡ ⎡ −2 0 1 1 0 −4 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ 2 ⎥ ⎢ −1 ⎥ ⎢ 12 ⎥ ⎢ ⎢ ⎢ 2 −1 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 1 ⎢ ⎥ 1 ⎢ 1 ⎥ •3. a. Orthogonal basis: ⎢ ⎢ 0 ⎥ , ⎢ 2 ⎥ , ⎢ 0 ⎥ ; orthonormal basis: 3 ⎢ 0 ⎥ , √6 ⎢ 2 ⎥ , √18 ⎢ 0 ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎣ 1 1 2 2 1 1 2 ⎤ ⎤ ⎡ ⎤ ⎡ ⎡ ⎤ ⎡ 1 ⎤ ⎤ ⎡ ⎡ √ 1 2 1 1 1 3 ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ 1 ⎥ ⎢ √ ⎢ 0 ⎥ ⎢ 2 ⎥ ⎢ −2 ⎥ ⎢ 0 ⎥ ⎢ 3 ⎥ ⎢ −2 ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ ⎢ 1 ⎢ ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 0 ⎥ b. orthogonal basis: ⎢ −1 ⎥ ⎥ , √18 ⎢ ⎢ 0 ⎥ , ⎢ 0 ⎥ , ⎢ −1 ⎥ ; orthonormal basis: √2 ⎢ 0 ⎥ , ⎢ ⎥. ⎢ ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ ⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦ ⎢ ⎣ 1 ⎦ ⎣ 0 ⎦ √1 1 2 −1 −1 1 3 ⎡ 2 ⎤ ⎤ ⎡ 1 1 1 1 0 −2 3 3 3 2 2 3 ⎢ 1 ⎥ ⎥ ⎢ 1 2 •5. ⎣ 3 . • 11. . − 3 ⎦ . •7. ⎣ 0 1 0 ⎦ . • 9. 3 0 1 1 1 1 1 2 − − 0 2 ⎤ 2 ⎡ 3 ⎤3 ⎡ 33 − 2x x1 3 5 ⎥ ⎢ ⎥ ⎢ •13. ⎣ x2 ⎦ = ⎣ x2 ⎦ ( x2 and x3 are arbitrary). ⎡
x3
x3
Section 6.4 •1. a. 2, 1; b. 1, 1; c. 2, 0; d. 4, 1. •3. a. 4, 2; b. 5, 1; c. 3, 1; d. 5, 3. •5. a. iii; b. iv; c. i; d. ii. •7. a. ii; b. iv; c. i; d. iii. •9. a. i; b. iii; c. ii; d. iv.
Section 7.1 •1. Answer (f). •3. Answer (c). •5. a. iv; b. i; c. ii.
•7. a. Characteristic polynomial: $\lambda^2 - \lambda - 6 = (\lambda + 2)(\lambda - 3)$; eigenvalues: −2 and 3; b. characteristic polynomial: $(\lambda + 3)(\lambda + 2)(\lambda - 1)$; eigenvalues: 1, −2 and −3.
•9. For $\lambda_1 = -5$ the eigenspace has basis $\begin{bmatrix} -9 \\ 9 \\ 2 \end{bmatrix}$; for $\lambda_2 = 2$ the eigenspace has basis $\begin{bmatrix} -4 \\ -3 \\ 4 \end{bmatrix}$; for $\lambda_3 = 4$ the eigenspace has basis $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$.
1 − 23 ; ; for λ1 = 1 the eigenspace has basis •11. a. For λ1 = 4 the eigenspace has basis 1 0 −1 5 b. for λ1 = 1 the eigenspace has basis ; for λ1 = 7 the eigenspace has basis ; 1 1 ⎤ ⎡ −3 −1 ⎥ ⎢ 2 . •13. a. For λ1 = 0 the eigenspace has basis ⎣ −1 c. for λ = 5 the eigenspace has basis ; 2 ⎦ 1 1 ⎤ ⎡ ⎤ ⎡ 0 0 ⎢ ⎥ ⎢ −2 ⎥ for λ2 = 4 the eigenspace has basis ⎣ 1 ⎦ ; for λ3 = −3 the eigenspace has basis ⎣ 7 ⎦ ;
0 ⎡
⎤ ⎡
⎤
1
⎡ ⎤ 1 0 0 ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ b. for λ1 = 2 the eigenspace has basis ⎣ 0 ⎦ , ⎣ −1 ; for λ2 = −3 the eigenspace has basis ⎣ 2 ⎦ ; 2 ⎦ 1 0 1
⎤ ⎥ ⎥ ⎥; ⎥ ⎦
Appendix A Answers to Selected Odd-Numbered Exercises ⎡ −5 ⎤ 2
⎢ c. for λ1 = 1 the eigenspace has basis ⎣ 1 1 ⎡ −2 ⎢ for λ3 = −1 the eigenspace has basis ⎣ 0 1 ⎡ 2 −3 ⎢ 4 for λ2 = −2 the eigenspace has basis ⎣ 3 ⎡
⎡
⎥ ⎢ ⎦ ; for λ2 = 2 the eigenspace has basis ⎣
−2 3 2
⎡
⎤
⎤ ⎥ ⎦;
1
⎤ 1 ⎢ ⎥ ⎥ ⎦ ; d. for λ1 = 1 the eigenspace has basis ⎣ 1 ⎦ ; 0 ⎡ ⎤ 0 ⎢ ⎢ 0 ⎥ ⎦ . •15. For λ1 = −1 the eigenspace has basis ⎢ ⎢ 1 ⎣ 1 0 ⎤ ⎡ ⎤
⎤ ⎥ ⎥ ⎥; ⎥ ⎦
−1 0 ⎢ ⎥ ⎢ 2 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎥ for λ2 = 0 the eigenspace has basis ⎢ ⎢ 1 ⎥ ; for λ3 = 1 the eigenspace has basis ⎢ −1 ⎥ . ⎣ ⎦ ⎣ 4 ⎦ 1 1 •17. FALSE. •19. TRUE. •21. FALSE. •23. TRUE.
Section 7.2
•1. a. Diagonalizable with P = ⎡
2 3 3 2
−3 and D = 0 ⎤ ⎡ 0 ⎥ ⎢ ⎦ and D = ⎣ 0 0 ⎡ ⎤ 0 ⎢ ⎥ ⎢ 0 ⎥ ⎥ and D = ⎢ ⎢ 0 ⎥ ⎣ ⎦
0 2
; b. not diagonalizable;
⎤ 0 0 ⎥ 1 0 ⎦ . •3. a. Not diagonalizable; 0 2 ⎤ 0 0 0 ⎥ −1 0 0 ⎥ ⎥. 0 2 0 ⎥ ⎦ 0 0 0 −2 0 0 0 4 −1 √2 √ −2 0 5 5 •5. a. Orthogonally diagonalizable: Q = and D = ; √1 √2 0 3 5 5 ⎡ ⎤ ⎡ ⎤ −1 √1 √ √1 1 0 0 2 6 3 ⎢ 1 ⎥ ⎢ ⎥ −1 ⎥ √ √1 √ and D = ⎣ 0 1 b. orthogonally diagonalizable: Q = ⎢ 0 ⎦; ⎣ 2 6 3 ⎦ √2 √1 0 0 −2 0 6 3 ⎤ ⎡ ⎤ ⎡ 0 1 0 0 −1 0 0 0 ⎥ ⎢ ⎢ √1 −1 −1 ⎥ √ ⎢ 0 2 0 0 ⎥ ⎥ ⎢ 3 0 √ 2 6 ⎥ ⎥; ⎢ ⎢ and D = ⎢ c. orthogonally diagonalizable: Q = ⎢ 1 ⎥ −1 ⎥ 1 ⎣ 0 0 2 0 ⎦ ⎣ √3 0 √2 √6 ⎦ √1 √2 0 0 0 2 0 0 3 6 d. not symmetric – not orthogonally diagonalizable. 682 683 1366 2730 512 −512 •7. a. ; b. ; c. . • 11. FALSE. •13. TRUE. 1366 1365 1365 2731 −512 512 2 −1 −2 ⎢ c. diagonalizable: P = ⎣ −4 1 1 1 0 0 ⎡ 1 −7 1 1 ⎢ ⎢ 0 −3 0 0 b. diagonalizable: P = ⎢ ⎢ 0 1 2 2 ⎣
Section 7.3
•1. a. $A = \begin{bmatrix} 0.9 & 0.3 \\ 0.1 & 0.7 \end{bmatrix}$ is positive; therefore, it is regular. A is stochastic since each column has nonnegative entries adding up to one;
b.
n        0    1     2      3       4        5
S^(n)    1    0.9   0.84   0.804   0.7824   0.76944
C^(n)    0    0.1   0.16   0.196   0.2176   0.23056
c.
n        0    1     2      3       4        5
S^(n)    0    0.3   0.48   0.588   0.6528   0.69168
C^(n)    1    0.7   0.52   0.412   0.3472   0.30832
d. $\vec{v} = \begin{bmatrix} 3/4 \\ 1/4 \end{bmatrix}$; both sequences in b and c appear to approach this vector (although c does so more slowly).
•3. a. [2 × 2 matrix with entries 0 and 1] is not regular; b. $\begin{bmatrix} 0 & 2/5 & 1/4 \\ 2/3 & 0 & 3/4 \\ 1/3 & 3/5 & 0 \end{bmatrix}$ is regular; stable vector: $\begin{bmatrix} 1/4 \\ 5/12 \\ 1/3 \end{bmatrix}$.
•5. a. $\begin{bmatrix} 0 & 1/3 & 0 & 0 \\ 1 & 0 & 1/3 & 1/3 \\ 0 & 1/3 & 0 & 2/3 \\ 0 & 1/3 & 2/3 & 0 \end{bmatrix}$ is regular; stable vector: $\begin{bmatrix} 1/10 \\ 3/10 \\ 3/10 \\ 3/10 \end{bmatrix}$; b. $\begin{bmatrix} 0 & 1/3 & 1/3 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 2/3 & 2/3 & 0 \end{bmatrix}$ is not regular.
⎡
⎤ ⎤ ⎡ 1 1 1 0 0 ⎥ ⎥ ⎢ 0 0 ⎦ ; A2 = ⎣ 0 12 21 ⎦ ; A3 = A, etc.; A is not regular. 0 0 0 12 21 ⎤ ⎡ ⎤ 9 36/62 20 10 10 ⎢ ⎢ ⎥ → 1 1 ⎥; − b. C = ⎣ 19 v = ⎣ 13/62 ⎦ ; c. page rank: page 1, followed by pages 2 and 3 (tied). 40 20 20 ⎦ 19 1 1 13/62 40 20 20 8t •11. x1 (t) = K1 e + 2K2 e−t ; x2 (t) = K1 e8t − K2 e−t . 9 −5t •13. x1 (t) = 3K1 e−5t − K2 e6t − 13 + 3K2 e6t − 10 . 30 ; x2 (t) = 2K1 e 1 1 •15. x1 (t) = K1 e5t + K2 e2t − 10 ; x2 (t) = K1 e5t − K2 e2t − K3 e2t − 10 ; x3 (t) = K1 e5t + K3 e2t − 85 . 2 2t 9 35 8t 2 2t 9 8t •17. x1 (t) = −7 24 e + 3 e − 8 ; x2 (t) = 24 e + 3 e − 8 . •19. x1 (t) = 32 e2t + e4t + et − 12 ; x2 (t) = −2e4t + et − 2; x3 (t) = e4t − 32 e2t + et − 12 . −3t •21. a. x1 (t) = −1 + 32 e−t ; x2 (t) = 12 e−3t + 32 e−t ; 2 e 4 −3t 4 −2t b. x1 (t) = 3 e − 3e + e−t ; x2 (t) = 43 e−3t + 23 e−2t ; x3 (t) = 43 e−3t + 23 e−2t − e−t . 0 ⎢ •7. a. A = ⎣ 1/2 1/2 ⎡ 1 9
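The tables and the stable vector in the answer to Section 7.3, Exercise 1 can be reproduced with a few lines of exact arithmetic. The following is a minimal sketch, assuming the state vector is ordered as (S, C) and is updated by multiplying by A once per step; the helper names `step` and `iterate` are introduced here for illustration only.

```python
from fractions import Fraction as F

# Transition matrix from Exercise 1a (columns sum to one).
A = [[F(9, 10), F(3, 10)],
     [F(1, 10), F(7, 10)]]

def step(M, v):
    """One step of the chain: returns M v."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def iterate(M, v, n):
    """Returns the list of states v, Mv, ..., M^n v."""
    states = [v]
    for _ in range(n):
        v = step(M, v)
        states.append(v)
    return states

from_b = iterate(A, [F(1), F(0)], 5)  # part b starts from (S, C) = (1, 0)
from_c = iterate(A, [F(0), F(1)], 5)  # part c starts from (S, C) = (0, 1)
stable = [F(3, 4), F(1, 4)]           # candidate stable vector from part d

for n, (sb, sc) in enumerate(zip(from_b, from_c)):
    print(n, [float(c) for c in sb], [float(c) for c in sc])

# A stable vector is unchanged by one more step of the chain.
print("A v == v:", step(A, stable) == stable)
```

Using `Fraction` avoids floating-point rounding, so the printed decimals can be compared digit by digit with the tables in parts b and c.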
Section 7.4 ⎡ ⎡ −1 1 ⎤ 1 ⎤ √1 √ √ √ 1 ⎥ 4 0 ⎢ 2 ⎥ ⎢ 2 2 2 •1. ⎣ 1 1 ⎦ 0 2 ⎣ −1 1 ⎦ . •3. 0 √ √ √ √ 2 ⎡ 22 −12 ⎤ ⎡ −12 2 ⎤ √ √ √ √ 3 0 1 ⎥ ⎥ ⎢ 5 ⎢ 5 5 5 •5. ⎣ 1 2 ⎦ 0 2 ⎣ 2 1 ⎦ . •7. 0 √ √ √ √ 5 5 5 ⎤ ⎡ 5 −2 −1 1 √ √ ⎡ ⎤⎡ 1 1 ⎤ 2 ⎢ 3 2 3 ⎥ 6 0 √ √ ⎥ ⎢ 1 2 ⎥⎢ ⎥⎢ ⎢ 2 ⎥ √1 •9. ⎢ √ ⎥ ⎣ 0 2 ⎦ ⎣ −12 1 ⎦. 2 3 ⎥ ⎢ 3 2 √ √ ⎣ −4 1 ⎦ 0 0 2 2 √ 0 3 3 2 ⎡ 2 1 − 23 3 3 √ 0 1 3 0 0 ⎢ 1√ 1 √ •13. 0 2 ⎣ 2 2 2 √ √ √ 2 0 1 0 0 1 2 1 −6 2 3 2 6 2
0 −1
1 0 ⎡ √ 0 2 5 0 ⎢ √ ⎣ 0 5 1
⎤ ⎥ ⎦.
3 0 0 1
0 1
2 √ 5 −1 √ 5
. 1 ⎤ √ 5 ⎥ 2 ⎦. √ 5
Appendix A Answers to Selected Odd-Numbered Exercises ⎡ ⎢ •15. ⎣ ⎡ ⎢ •17. ⎣
√ √ 2/ 6 0 −1/ 3 √ √ √ 1/ 2 −1/ 3 −1/ 6 √ √ √ 1/ 2 1/ 3 1/ 6 ⎡ ⎤ 0 1 2 ⎢ 1 2 9 ⎢ 6 1 ⎥ . •19. a. ⎢ 0 ⎢ 0 9 ⎦ ⎣ 1 − 29 2 − 16
⎡ 1 1 ⎤ √ 2√ 2 2 0 0 0 ⎢ 2 1 √ −2 2 ⎥⎢ 0 ⎥⎢ 6 0 0 ⎦⎢ ⎦⎣ 0 ⎢ 1 √2 0 ⎣ 2 0 0 0 0 1 1 − 2 ⎤ ⎤ 2 ⎡ ⎤⎡
1 8 7 24 − 18 1 − 24
1 8 1 − 24 − 18 7 24
1
⎥ ⎥ ⎥ ; b. ⎥ ⎦
8 ⎢ 7 ⎢ 24 ⎢ ⎢ −1 ⎣ 8 1 − 24
⎥ ⎥ ⎥. ⎥ ⎦
− 12 0 √ 1 2 2 1 2
1 2
1 2 √
⎤
⎥ 2 ⎥ ⎥. 0 ⎥ ⎦ 1 2
Appendix B
Recurring References to Selected Transformations in R^2
Matrices

Projection: $A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$, $A_3 = \begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}$
Scaling (dilation, contraction, etc.): $B_k = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$
Rotation: $C_\alpha = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}$, $C_{\pi/2} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$

Section 1.4
Projection:
• Example 1.19, p. 38: Projection onto x-axis, $F_1(\vec{u}) = A_1\vec{u}$; projection onto y-axis, $F_2(\vec{u}) = A_2\vec{u}$
• Exercise 36, p. 55: Projection onto a line, $F_3(\vec{u}) = A_3\vec{u}$
Scaling:
• Example 1.20, p. 38: $G_k(\vec{u}) = B_k\vec{u}$; dilation if k > 1; contraction if 0 < k < 1; identity if k = 1 ($B_1 = I$)
Rotation:
• Example 1.22, p. 39: Counterclockwise rotation by 90°, $H_{\pi/2}(\vec{u}) = C_{\pi/2}\vec{u}$
• Example 1.23, p. 40: Counterclockwise rotation by α, $H_\alpha(\vec{u}) = C_\alpha\vec{u}$
• Example 1.27, p. 45: Composition, $C_{\pi/2}^4 = I$

Section 2.3
Rotation:
• Example 2.16, p. 94: Inverse of $C_{\pi/2}$

Section 4.5
Scaling:
• Discussion on p. 202: Dilation of scalable fonts
Rotation:
• Discussion on p. 202: Rotation of scalable fonts

Section 5.1
Scaling:
• Exercise 27 a, p. 225: Scaling transformations form a subspace of space of lin. transf.
Rotation:
• Exercise 27 b, p. 225: Rotations form a subspace of space of lin. transf.

Section 5.2
Projection:
• Example 5.12, p. 226: Kernel and range of $F_1$
• Example 5.17, p. 231: $F_1$ is neither onto nor one-to-one
Scaling:
• Example 5.13, p. 226: Kernel and range of $G_2$
• Example 5.14, p. 227: Kernel and range of $G_0$
• Example 5.17, p. 231: $G_2$ is both onto and one-to-one; $G_0$ is neither
• Discussion on p. 232: $G_k$ is invertible for $k \neq 0$
Rotation:
• Discussion on p. 232: $H_\alpha$ is invertible

Section 5.3
Projection:
• Example 5.25, p. 242: rank $F_1$ + nullity $F_1$ = dim R^2
Scaling:
• Example 5.25, p. 242: rank $G_k$ + nullity $G_k$ = dim R^2

Section 7.1
Projection:
• Example 7.1, p. 296: Eigenvalues of $A_1$ (geometrically)
• Example 7.3, p. 298: Eigenvalues and eigenvectors of $A_1$
Rotation:
• Example 7.2, p. 297: Eigenvalues of $C_{\pi/2}$ (geometrically)
• Example 7.4, p. 299: Eigenvalues of $C_{\pi/2}$ (algebraically)
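Several of the facts referenced in this appendix are easy to confirm numerically. The sketch below checks that the fourth power of the 90° rotation matrix is the identity, that rotation by −90° inverts it, and that the projection matrices are idempotent. Two things here are assumptions of this example rather than statements from the text: that (a, b) in A3 is a unit vector (the value (3/5, 4/5) is a hypothetical choice), and the helper names `matmul` and `rotation`.

```python
import math
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Y)] for row in X]

def rotation(alpha):
    """Counterclockwise rotation matrix C_alpha."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha),  math.cos(alpha)]]

I2 = [[1, 0], [0, 1]]
A1 = [[1, 0], [0, 0]]          # projection onto the x-axis
C90 = [[0, -1], [1, 0]]        # C_{pi/2}
C90_inverse = [[0, 1], [-1, 0]]  # rotation by -90 degrees

# Four quarter turns return every vector to where it started.
C90_fourth = matmul(matmul(C90, C90), matmul(C90, C90))

# Assumption: (a, b) is a unit vector along the line being projected onto.
a, b = F(3, 5), F(4, 5)
A3 = [[a * a, a * b], [a * b, b * b]]

print("C90^4 == I:", C90_fourth == I2)
print("C90 * C90_inverse == I:", matmul(C90, C90_inverse) == I2)
print("A1^2 == A1:", matmul(A1, A1) == A1)
print("A3^2 == A3:", matmul(A3, A3) == A3)

# C_alpha at alpha = pi/2 agrees with C90 up to floating-point rounding.
close = all(math.isclose(p, q, abs_tol=1e-12)
            for rp, rq in zip(rotation(math.pi / 2), C90)
            for p, q in zip(rp, rq))
print("rotation(pi/2) ~ C90:", close)
```

Idempotence (applying a projection twice changes nothing) is what distinguishes A1 and A3 from the scaling and rotation matrices in the same table.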
Appendix C
Twelve Equivalent Statements
Index adjoint, 132 affine combination, 10 angle between n-vectors, 265 augmented matrix, 63 axonometric transformation, 47, 236 barycentric combination, 10 basis, 176 standard, for Pn , 178 standard, for Rn , 177 Bernstein basis, 184 Bézier curve, 201 block multiplication, 30 cabinet transformation, 47, 236 Cartesian product, 150 cavalier transformation, 54 characteristic equation, 301 characteristic polynomial, 301 codomain, 36 coefficient matrix, 63 cofactor, 117 column rank, 206 column space, 206 companion matrix, 309 components of a vector, 2 composition of linear transformations, 45 conformal transformation, 358 consistent system, 66 contraction, 39 control points, 201 control polygon, 201 coordinate vector, 193 coordinate-change matrix, 246 Cramer’s rule, 130 cross product, 15 cubic spline, 111 degree of precision of quadrature, 233 determinant, 116 diagonal matrix, 17 diagonalizable matrix, 311 difference of matrices, 19 of vectors, 6 dilation, 39 dimension, 182 direct sum, 206 distance between points, 3 domain, 36 dominant eigenvalue, 327 dot product, 8 eigenspace, 302 eigenvalue, 296 eigenvector, 296 elementary column operations, 130 elementary matrix, 77 elementary row operations, 67
exchange matrix, 129, 309 feasible solutions, 102 fixed vector, 10 free vector, 10 fundamental spaces of a matrix, 215 Gauss-Jordan reduction, 72 Gaussian elimination, 72 Gaussian quadrature, 236 generalized eigenvector, 324 Gram-Schmidt process, 277 Hermite interpolating polynomial, 111 Hermite-Birkhoff interpolating polynomial, 111 homogeneous linear system, 82 homogeneous system solution space, 153 idempotent matrix, 284 identity matrix, 17 identity transformation, 39 image, 218 inconsistent system, 66 indexed set, 154 induction, proof by, 119 initial point, 2 interpolating polynomial, 103 intersection of subspaces, 163 inverse of a matrix, 87 inverse transformation, 94 inversion, 129 invertible linear transformation, 231 invertible matrix, 87 Jacobian matrix, 50 Jordan block, 310 kernel, 225 Lagrange interpolating polynomial, 103 leading column, 65 leading entry, 65 least squares solution, 279 length of a vector, 3 linear combination, 6 linear system, 62 solution, 62 linear transformation, 37, 218 matrix of, 37, 237 linear transformations composition of, 45, 222, 242 linearly dependent vectors, 163 linearly independent vectors, 163 lower triangular matrix, 17 LU decomposition, 104 magnitude of a vector, 3 main diagonal, 17
matrices block multiplication of, 30 row equivalent, 67 matrix, 17 addition, 18 augmented, 63 coefficient, 63 column, 17 entry, 17 inverse, 87 invertible, 87 multiplication, 26 noninvertible, 87 nonsingular, 87 power, 29 product, 26 row, 17 singular, 87 trace, 225 matrix of linear transformation, 237 midpoint rule, 236 minor, 116 monomial basis, 178 morphing, 12 multiplication of matrices, 26 multiplicity algebraic, 301 geometric, 302
polynomial interpolation, 103 polynomials, 144 position vector, 10 positive matrix, 326 power of matrix, 29 probability vector, 329 product of matrices, 26 projection, 38, 42 projection onto subspace, 267 pseudoinverse, 347
n-space, 2 n-vector, 2 natural cubic spline, 111 negative of a vector, 6 nilpotent matrix, 33 noninvertible matrix, 87 nonleading column, 65 nonnegative matrix, 326 nonsingular matrix, 87 nontrivial solutions of homogeneous system, 82 normal equations, 280 null space, 153 nullity, 210 nullity of linear transformation, 241
scalar multiple of a matrix, 19 of a vector, 4 scaling, 38 shear transformation, 53 similar matrices, 311 Simpson’s rule, 236 singular matrix, 87 singular values, 289, 341 singular vectors, 289, 341 skew-symmetric matrix, 23 solution of linear system, 62 solution set, 62 solution space, 153 space, 2 span, 155 spherical coordinates, 54 spline interpolation, 111 square matrix, 17 stable vector, 329 standard basis for Pn , 178 standard basis for Rn , 177 stochastic matrix, 329 subspace, 151 sum of matrices, 18 of subspaces, 162 of vectors, 4 symmetric matrix, 20 system of linear equations, 62
one-to-one, linear transformation, 230 onto, linear transformation, 230 orthogonal complement, 266 orthogonal diagonalization, 317 orthogonal matrix, 259 orthogonal projection, 267 orthogonal set, 254 orthogonal vectors, 9 orthographic transformation, 47 orthonormal set, 254 P A = LU decomposition, 107 particular solution, 84 partitioned form of matrix, 22 pentadiagonal matrix, 34 permutation, 129 permutation matrix, 108 pivot, 68
QR decomposition, 284 quadrature formula, 233 range, 36, 225 rank, 209 rank of linear transformation, 241 reduced row echelon form, 65 reflection, 39, 42 regular matrix, 326 rotation, 43 by 90 degrees counterclockwise, 39 by an arbitrary angle, 40 rotation-dilation transformation, 55 row echelon form, 65 row equivalent matrices, 67 row rank, 206 row space, 206
terminal point, 2 trace of a matrix, 225
transpose, 20 tridiagonal matrix, 24 trivial solution of homogeneous system, 82 unit vector in the direction of the given vector, 5 upper triangular matrix, 17 vector, 2
addition, 4 vector space, 142 vectors linearly dependent, 163 linearly independent, 163 zero vector, 3
Linear Algebra: Concepts and Applications is designed to be used in a first linear algebra course taken by mathematics and science majors. It provides complete coverage of core linear algebra topics, including vectors and matrices, systems of linear equations, general vector spaces, linear transformations, eigenvalues, and eigenvectors. All results are carefully, clearly, and rigorously proven. The exposition is very accessible. The applications of linear algebra are extensive and substantial; several of them recur throughout the text in different contexts, including many that elucidate concepts from multivariable calculus. Unusual features of the text include a pervasive emphasis on the geometric interpretation and viewpoint as well as a very complete treatment of the singular value decomposition. The book includes over 800 exercises and numerous references to the author’s custom software Linear Algebra Toolkit.
For additional information and updates on this book, visit www.ams.org/bookpages/text-47