Introduction to Vectors and Tensors [Volume 1: Linear and Multilinear Algebra] 9780486469140, 048646914X, 2008024877, 0306375087


155 88 12MB

English Pages 563 Year 2008

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
INTRODUCTION TO VECTORS AND TENSORS
Linear and Multilinear Algebra
Volume 1
Copyright Ray M. Bowen and C.-C. Wang (ISBN 0-306-37508-7 (v. 1))
PREFACE
To Volume 1
CONTENTS
CONTENTS
PART III. VECTOR AND TENSOR ANALYSIS
Selected Reading for Part I
Chapter 0
ELEMENTARY MATRIX THEORY
Exercises
Chapter 1
SETS, RELATIONS, AND FUNCTIONS
Section 1. Sets and Set Algebra
Exercises
Section 2. Ordered Pairs, Cartesian Products, and Relations
Exercises
Section 3. Functions
Exercises
Chapter 2
GROUPS, RINGS, AND FIELDS
Section 4. The Axioms for a Group
Exercises
Section 5. Properties of a Group
Exercises
Section 6. Group Homomorphisms
Exercises
Section 7. Rings and Fields
Exercises
Selected Readings for Part II
Chapter 3
VECTOR SPACES
Section 8. The Axioms for a Vector Space
Exercises
(u,x)+(v,y)=(u+v,x+y)
(u,v)+(x,y)=(u+x,v+y)
Section 9. Linear Independence, Dimension, and Basis
Exercises
Section 10. Intersection, Sum and Direct Sum of Subspaces
Exercises
Section 11. Factor Space
Exercises
Section 12. Inner Product Spaces
u • v
MM
u • v iMM
(3) IM + l |v|| > llu + v||;
llu - vll = 1 lv - ull
||u - w| R
Exercises
J: Tsr(V )^L(Tqp(V);Tq-ps(V))
Section 35. Tensors on Inner Product Spaces
G p :T p (V )>Tp (V)
I (u,v) = u • v
Exercises
Chapter 8
EXTERIOR ALGEBRA
Section 36 Skew-Symmetric Tensors and Symmetric Tensors
Exercises
Section 37 The Skew-Symmetric Operator
Exercises
Section 38. The Wedge Product
vav=0
Exercises
Section 39. Product Bases and Strict Components
Exercises
Section 40. Determinant and Orientation
D= dE
Exercises
Section 41. Duality
Exercises
Section 42. Transformation to the Contravariant Representation
Recommend Papers

Introduction to Vectors and Tensors [Volume 1: Linear and Multilinear Algebra]
 9780486469140, 048646914X, 2008024877, 0306375087

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

INTRODUCTION TO VECTORS AND TENSORS Second Edition Two Volumes Bound as One

Volume 1: Linear and Multilinear Algebra

RAYM. BOWEN Mechanical Engineering Texas A&M University College Station, Texas

C.-C. WANG Mathematical Sciences Rice University Houston, Texas

DOVER PUBLICATIONS, INC. Mineola, New York

Copyright Copyright ~ 1976.2008 by Ray M. Bowen and c.-c. Wang All rights reserved .

Bibliographical Note This Dover edition. nrst published in 2008. is a revised and expanded second edition or the work originally published in 1976 by Plenum Press. New York.

Library oj Congress Catalogillg-ill-Publication Data Bowen. Ray M .. 1936Introd uction to vectors and tensors I Ray M. Bowen and C.-c. Wang. - 2nd ed .. rev. and expanded. p. cm. Originally published: ew York: Plenum Press. 1976. Includes bibliographical rererences a nd index. ISBN-13: 978-0-486-46914-0 ISBN-IO: 0-486-46914-X I. Vector algebra . 2. Vector ana lysis. 3. Tensor a lgebra. 4. Calculu s or tenso rs. I. Wa ng. Chao-cheng. 1938- II. Title. QA200.B68 2008 5 12'.5 c22 2008024877 Manufactured in the United States or America Dover Publications. Inc .. 31 East 2nd Street. Mineola. .Y. 11 50 1

INTRODUCTION TO VECTORS AND TENSORS Linear and Multilinear Algebra

Volume 1

Ray M. Bowen Mechanical Engineering Texas A&M University College Station, Texas

and C.-C. Wang Mathematical Sciences Rice University Houston, Texas

Copyright Ray M. Bowen and C.-C. Wang (ISBN 0-306-37508-7 (v. 1)) Updated 2008

PREFACE

To Volume 1 This work represents our effort to present the basic concepts of vector and tensor analysis. Volume I begins with a brief discussion of algebraic structures followed by a rather detailed discussion of the algebra of vectors and tensors. Volume II begins with a discussion of Euclidean Manifolds which leads to a development of the analytical and geometrical aspects of vector and tensor fields. a discussion of general differentiable manifolds. We have not included a discussion of general differentiable manifolds. However, we have included a chapter on vector and tensor fields defined on Hypersurfaces in a Euclidean Manifold.

In preparing this two volume work our intention is to present to Engineering and Science students a modern introduction to vectors and tensors. Traditional courses on applied mathematics have emphasized problem solving techniques rather than the systematic development of concepts. As a result, it is possible for such courses to become terminal mathematics courses rather than courses which equip the student to develop his or her understanding further.

As Engineering students our courses on vectors and tensors were taught in the traditional way. We learned to identify vectors and tensors by formal transformation rules rather than by their common mathematical structure. The subject seemed to consist of nothing but a collection of mathematical manipulations of long equations decorated by a multitude of subscripts and superscripts. Prior to our applying vector and tensor analysis to our research area of modern continuum mechanics, we almost had to relearn the subject. Therefore, one of our objectives in writing this book is to make available a modern introductory textbook suitable for the first in-depth exposure to vectors and tensors. Because of our interest in applications, it is our hope that this book will aid students in their efforts to use vectors and tensors in applied areas. The presentation of the basic mathematical concepts is, we hope, as clear and brief as possible without being overly abstract. Since we have written an introductory text, no attempt has been made to include every possible topic. The topics we have included tend to reflect our personal bias. We make no claim that there are not other introductory topics which could have been included. Basically the text was designed in order that each volume could be used in a one-semester course. We feel Volume I is suitable for an introductory linear algebra course of one semester. Given this course, or an equivalent, Volume II is suitable for a one semester course on vector and tensor analysis. Many exercises are included in each volume. However, it is likely that teachers will wish to generate additional exercises. Several times during the preparation of this book we taught a one semester course to students with a very limited background in linear algebra and no background in tensor analysis. Typically these students were majoring in Engineering or one of the Physical Sciences. However, we occasionally had students from the Social Sciences. For this one semester course, we covered the material in Chapters 0, 3, 4, 5, 7 and 8 from Volume I and selected topics from Chapters 9, 10, and 11 from Volume 2. As to level, our classes have contained juniors,

iii

iv

PREFACE

seniors and graduate students. These students seemed to experience no unusual difficulty with the material. It is a pleasure to acknowledge our indebtedness to our students for their help and forbearance. Also, we wish to thank the U. S. National Science Foundation for its support during the preparation of this work. We especially wish to express our appreciation for the patience and understanding of our wives and children during the extended period this work was in preparation.

Houston, Texas

R.M.B. C.-C.W.

CONTENTS Vol. 1

Linear and Multilinear Algebra

Contents of Volume 2....................................................................................

vii

PART 1 BASIC MATHEMATICS Selected Readings for Part 1......................................................................................

2

CHAPTER 0 Elementary Matrix Theory................................................................

3

CHAPTER 1 Sets, Relations, and Functions...........................................................

13

Sets and Set Algebra.............................................................. Ordered Pairs" Cartesian Products" and Relations................ Functions................................................................................

13 16 18

CHAPTER 2 Groups, Rings and Fields..................................................................

23

The Axioms for a Group........................................................ Properties of a Group............................................................. Group Homomorphisms......................................................... Rings and Fields.....................................................................

23 26 29 33

Section 1. Section 2. Section 3.

Section 4. Section 5. Section 6. Section 7.

PART I1 VECTOR AND TENSOR ALGEBRA Selected Readings for Part II......................................................................................

40

CHAPTER 3 Vector Spaces.....................................................................................

41

The Axioms for a Vector Space............................................. Linear Independence, Dimension and Basis.......................... Intersection, Sum and Direct Sum of Subspaces.................... Factor Spaces.......................................................................... Inner Product Spaces.............................................................. Orthogonal Bases and Orthogonal Compliments................... Reciprocal Basis and Change of Basis...................................

41 46 55 59 62 69 75

CHAPTER 4. Linear Transformations.....................................................................

85

Section 8. Section 9. Section 10. Section 11. Section 12. Section 13. Section 14.

Section 15.

Definition of a Linear Transformation....................................

v

85

CONTENTS OF VOLUME 1

Section Section Section Section

16. 17. 18. 19.

Sums and Products of Linear Transformations....................... Special Types of Linear Transformations............................... The Adjoint of a Linear Transformation................................. Component Formulas..............................................................

CHAPTER 5. Determinants and Matrices.................................................................

Section 20. Section 21. Section 22. Section 23

93 97 105 118

125

The Generalized Kronecker Deltas and the Summation Convention....................... 125 Determinants............................................................................ 130 The Matrix of a Linear Transformation................................... 136 Solution of Systems of Linear Equations.................................. 142

CHAPTER 6 Spectral Decompositions.....................................................................

Section 24. Section 25. Section 26. Section 27. Section 28. Section 29. Section 30.

vi

145

Direct Sum of Endomorphisms............................................... Eigenvectors and Eigenvalues................................................. The Characteristic Polynomial................................................ Spectral Decomposition for Hermitian Endomorphisms......... Illustrative Examples................................................................ The Minimal Polynomial......................................................... Spectral Decomposition for Arbitrary Endomorphisms..........

145 148 151 158 171 176 182

CHAPTER 7. Tensor Algebra...................................................................................

203

Section 31. Section 32. Section 33. Section 34. Section 35.

Linear Functions, the Dual Space........................................... The Second Dual Space, Canonical Isomorphisms................ Multilinear Functions, Tensors............................................... Contractions............................................................................ Tensors on Inner Product Spaces............................................

CHAPTER 8. Exterior Algebra.................................................................................

Section 36. Section 37. Section 38. Section 39. Section 40. Section 41. Section 42.

203 213 218 229 235 247

Skew-Symmetric Tensors and Symmetric Tensors.................. 247 The Skew-Symmetric Operator............................................... 250 The Wedge Product................................................................. 256 Product Bases and Strict Components..................................... 263 Determinants and Orientations................................................ 271 Duality..................................................................................... 280 Transformation to Contravariant Representation................ 287

CONTENTS Vol. 2

Vector and Tensor Analysis

PART III. VECTOR AND TENSOR ANALYSIS Selected Readings for Part III...............................................................

296

CHAPTER 9. Euclidean Manifolds................................................................

297

Section 43. Section 44. Section 45. Section 46. Section 47. Section 48.

Euclidean Point Spaces................................................. Coordinate Systems....................................................... Transformation Rules for Vector and Tensor Fields.... Anholonomic and Physical Components of Tensors.. Christoffel Symbols and Covariant Differentiation...... Covariant Derivatives along Curves.............................

297 306 324 332 339 353

CHAPTER 10. Vector Fields and Differential Forms....................................

359

Section 49. Section 5O. Section 51. Section 52. Section 53. Section 54.

Lie Derivatives............................................................. Frobenius Theorem.................................................... Differential Forms and Exterior Derivative................. The Dual Form of Frobenius Theorem: the Poincare Lemma......................................................................... Vector Fields in a Three-Dimensiona1 Euclidean Manifold, I. Invariants and Intrinsic Equations......... Vector Fields in a Three-Dimensiona1 Euclidean Manifold, II. Representations for Special Class of Vector Fields........................ 399

359 368 373 381

389

CHAPTER 11. Hypersurfaces in a Euclidean Manifold

Section 55. Section 56. Section 57. Section 58. Section 59. Section 60. Section 61.

Normal Vector, Tangent Plane, and Surface Metric Surface Covariant Derivatives Surface Geodesics and the Exponential Map Surface Curvature, I. The Formulas of Weingarten and Gauss Surface Curvature, II. The Riemann-Christoffel Tensor and the Ricci Identities 443 Surface Curvature, III. The Equations of Gauss and Codazzi Surface Area, Minimal Surface vii

407 416 425

433

449 454

CONTENTS OF VOLUME 11 Section 62.

Surfaces in a Three-Dimensional Euclidean Manifold

viii 457

CHAPTER 12. Elements of Classical Continuous Groups Section 63. Section 64. Section 65. Section 66. Section 67.

The General Linear Group and Its Subgroups The Parallelism of Cartan One-Parameter Groups and the Exponential Map Subgroups and Subalgebras Maximal Abelian Subgroups and Subalgebras

463 469 476 482 486

CHAPTER 13. Integration of Fields on Euclidean Manifolds, Hypersurfaces, and Continuous Groups Section 68. Section 69. Section 70. Section 71. Section 72.

Arc Length, Surface Area, and Volume Integration of Vector Fields and Tensor Fields Integration of Differential Forms Generalized Stokes’ Theorem Invariant Integrals on Continuous Groups

491 499 503 507 515

INDEX.................................................................................................................................. x

PART 1

BASIC MATHEMATICS

Selected Reading for Part I BIRKHOFF, G., and S. MACLANE, A Survey ofModern Algebra, 2nd ed., Macmillan, New York, 1953. Frazer, R. A., W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge University Press, Cambridge, 1938. Halmos, P. R., Naive Set Theory, Van Nostrand, New York, 1960. Kelley, J. L., Introduction to Modern Algebra, Van Nostrand, New York 1965. Mostow, G. D., J. H. Sampson, and J. P. Meyer, Fundamental Structures ofAlgebra, McGrawHill, New York, 1963. van der Waerden, B. L., Modern Algebra, rev. English ed., 2 vols., Ungar, New York, 1949, 1950.

Chapter 0

ELEMENTARY MATRIX THEORY When we introduce the various types of structures essential to the study of vectors and tensors, it is convenient in many cases to illustrate these structures by examples involving matrices. It is for this reason we are including a very brief introduction to matrix theory here. We shall not make any effort toward rigor in this chapter. In Chapter V we shall return to the subject of matrices and augment, in a more careful fashion, the material presented here.

An M by N matrix A is a rectangular array of real or complex numbers Aij arranged in

M rows and N columns. A matrix is often written A1N A2N

(0.1) AMN

AA M1M2

and the numbers Aij are called the elements or components of A . The matrix A is called a real

matrix or a complex matrix according to whether the components of A are real numbers or complex numbers. A matrix of M rows and N columns is said to be of order M by N or M x N. It is customary to enclose the array with brackets, parentheses or double straight lines. We shall adopt the notation in (0.1). The location of the indices is sometimes modified to the forms Aij , Aij , or Aij . Throughout this chapter the placement of the indices is unimportant and shall always be written as in (0.1). The elements Ai1, Ai2,..., AiNare the elements of the ith row ofA , and the elements A1k, A2k,..., ANk are the elements of the kth column. The convention is that the first index denotes the row and the second the column.

A row matrix is a 1 x N matrix, e.g.,

[

A11

A12

while a column matrix is an M x 1 matrix, e.g.,

3







An

]

4

ELEMENTARY MATRIX THEORY

Chap. 0 A11

A21

AM1

The matrix A is often written simply

A = [A]

(0.2)

A square matrix is an N x N matrix. In a square matrix A, the elements A11, A22,..., ANN are its diagonal elements. The sum of the diagonal elements of a square matrix A is called the trace and is written trA . Two matrices A and B are said to be equal if they are identical. That is, A and B have the same number of rows and the same number of columns and Aij = Bij,

i =1,...,N,

j =1,...,M

A matrix, every element of which is zero, is called the zero matrix and is written simply0.

If A =[ A ] and B = [B, ] are two M x N matrices, their sum (difference) is an M x N

matrix A+ B (A - B) whose elements are Aj +Bj (Aj - Bj). Thus A ± B = [A, ± B,]

(0.3)

Two matrices of the same order are said to be conformable for addition and subtraction. Addition and subtraction are not defined for matrices which are not conformable. If 2 is a number and A is a matrix, then 2A is a matrix given by 2 A = [_2AJ ] = A2

(0.4)

- A = (-1) A = [- A,]

(0.5)

Therefore,

These definitions of addition and subtraction and, multiplication by a number imply that A+B=B+A

(0.6)

Chap. 0

5

ELEMENTARY MATRIX THEORY A+(B+C)=(A+B)+C

(0.7)

A+0=A

(0.8)

A-A=0

(0.9)

2( A + B) = 2A + 2B

(0.10)

(2 + ^) A = 2A + pA

(0.11)

1A=A

(0.12)

and

where A, B and C are as assumed to be conformable.

If A is an M x N matrix and B is an N x K matrix, then the product of B by A is written AB and is an M xK matrix with elements A11

A12

A= A21

A22

A31

A32

^N

==1

AjjBs , i = 1,...,M , s = 1,...,K . For example, if

B=

and

B11

B12

B21

B22

then AB is a 3x 2 matrix given by A11

A12

AB = A2i

A22

A31

A32

B11

B12

B21

B22

A11B11

+A12B21

A11B12

+ A12B22

A21B11

+ A22B21

A21B12

+ A22B22

A31B11

+ A32B21

A32B12

+ A32B22

The product AB is defined only when the number of columns of A is equal to the number of rows ofB . If this is the case, A is said to be conformable to B for multiplication. If A is conformable to B , then B is not necessarily conformable to A . Even if BA is defined, it is not necessarily equal to AB . On the assumption that A , B ,and C are conformable for the indicated sums and products, it is possible to show that

A(B+C)=AB+AC

(0.13)

(A+B)C=AC+BC

(0.14)

6

ELEMENTARY MATRIX THEORY

Chap. 0

and

A(BC) = (AB)C

(0.15)

However, AB ^ BA in general, AB = 0 does not imply A = 0 or B = 0, and AB = AC does not necessarily imply B = C .

The square matrix I defined by 10

0

01

0

I=

(0.16)

0 0

•••

1

is the identity matrix. The identity matrix is a special case of a diagonal matrix which has all of its elements zero except the diagonal ones. A square matrix A whose elements satisfy Aij = 0, i> j, is called an upper triangular matrix , i.e.,

A13

A11

A12

0 0

A22

0

A33

0

0

0

A1N

A=

A

A lower triangular matrix can be defined in a similar fashion. A diagonal matrix is both an upper triangular matrix and a lower triangular matrix.

If A and B are square matrices of the same order such that AB = BA = I , then B is called the inverse of A and we write B = A-1 . Also, A is the inverse of B , i.e. A= B-1 . If A has an inverse it is said to be nonsingular. If A and B are square matrices of the same order with inverses A-1 and B-1 respectively, then (AB)-1= B-1A-1

Equation (0.17) follows because

(0.17)

Chap. 0

ELEMENTARY MATRIX THEORY

7

(AB)B-1A-1=A(BB-1)A-1=AIA-1=AA-1=I

and similarly (B-1A-1)AB=I

The matrix of order N x M obtained by interchanging the rows and columns of an M x N matrix A is called the transpose of A and is denoted by AT . It is easily shown that (AT)T= A

(0.18)

(AA)T = AAT

(0.19)

(A+B)T=AT+BT

(0.20)

(AB)T = BTAT

(0.21)

and

A square matrix A is symmetric if A= AT and skew-symmetric if A= -AT . Therefore, for a symmetric matrix

Aij = Aji

(0.22)

Aij = -Aji

(0.23)

and for a skew-symmetric matrix

Equation (0.23) implies that the diagonal elements of a skew symmetric-matrix are all zero. Every square matrix A can be written uniquely as the sum of a symmetric matrix and a skew-symmetric matrix, namely A = 1(A + AT) + 2(A - AT)

(0.24)

If A is a square matrix, its determinant is written detA . The reader is assumed at this stage to have some familiarity with the computational properties of determinants. In particular, it should be known that

detA=detAT

and

detAB=(detA)(detB)

(0.25)

8

ELEMENTARY MATRIX THEORY

Chap. 0

If A is an N x N square matrix, we can construct an (N -1) x (N -1) square matrix by removing the ith row and the jth column. The determinant of this matrix is denoted by Mijand is the minor of Aij . The cofactor of Aij is defined by cof Aij =(-1)i+jMij

(0.26)

For example, if A11

A12

(0.27)

AA 21 22 then

= A22, M12= A21 M21= A12 cof A11 =A22 cof A12 = -A21,

M11

M22

= A11

etc.

The adjoint of an Nx N matrix A , written adjA, is an Nx N matrix given by

adjA =

cof A11

cof A21



cof A12

cof A22



■ cof An 1 ■■cof AN2

■ ■

(0.28)



_cof AN

cof A2N





cof ANN _

The reader is cautioned that the designation “adjoint” is used in a different context in Section 18. However, no confusion should arise since the adjoint as defined in Section 18 will be designated by a different symbol. It is possible to show that A(adjA) =(detA)I =(adjA)A

(0.29)

We shall prove (0.29) in general later in Section 21; so we shall be content to establish it for N = 2 here. For N = 2 A12

adjA = - A21

Then

A11

Chap. 0

ELEMENTARY MATRIX THEORY

A (adj A) =

a12

A22

A22

— A12

'An An

9

An A22 - A12 A21

0

0

An A22 — A12 A21

Therefore

A (adj A) = (An A22 — A12 A21)I = (det A)I

Likewise

(adj A) A = (det A) I If det A ^ 0, then (0.29) shows that the inverse A 1 exists and is given by A _j = adj A det A

(0.30)

Of course, if A 1 exists, then (det A)(det A ’) = det I = 1 ^ 0 . Thus nonsingular matrices are those with a nonzero determinant. Matrix notation is convenient for manipulations of systems of linear algebraic equations. For example, if we have the set of equations

x + A12 x2 + A13 x3 +

+ A1 NxN

A11 l

yl

a21 xl + a22x2 + A23x3 + ’ ’ ’ + A2NxN =

y2

x + AN 2 x2 + AN 3 x3 + • • • + ANNxN = y-2

AN l l

then they can be written

An

A12

Al N

xl x2

yi y2

AN l

AN 2

A

xN

yN

The above matrix equation can now be written in the compact notation

AX = Y

(0.31)

ELEMENTARYMATRIX THEORY

Chap. 0

10

and if A is nonsingular the solution is X = A-1Y

(0.32)

For N = 2, we can immediately write 1 A22 det A - A21

X1 X2

- A12 A11

y1 y2

(0.33)

Exercises 0.1

Add the matrices

1

-5

2

6

Add the matrices

0.2

2i

3

7 + 2i

5

4 + 3i

i

.2 i.

■-5i ’

+3

. 6 _

Multiply

3

2i

4 + 3i

■- 5

0.4

2 5i -4 -4 i 3 + 3i - i

Add ■1■

0.3

+

’2i 7 + 2i 1 i -■ 3i

8’

6i

2

Show that the product of two upper (lower) triangular matrices is an upper lower triangular matrix. Further, if A

= 1A]•

B = 1 B, ]

are upper (lower) triangular matrices of order N x N, then

(AB)„ = (BA) „ = AiBu

Chap. 0

ELEMENTARY MATRIX THEORY

11

for all i = 1,...,N. The off diagonal elements (AB)ij and (BA)ij, i ^ j, generally are not

equal, however.

0.5

What is the transpose of

2i 3 7 + 2i 5 4 + 3i i

0.6

What is the inverse of 3 0 7

1’ 2

1

0

”5

0.7

4

Solve the equations 5x+3y=2 x+4y=9 by use of (0.33).

Chapter 1

SETS, RELATIONS, AND FUNCTIONS The purpose of this chapter is to introduce an initial vocabulary and some basic concepts. Most of the readers are expected to be somewhat familiar with this material; so its treatment is brief.

Section 1. Sets and Set Algebra The concept of a set is regarded here as primitive. It is a collection or family of things viewed as a simple entity. The things in the set are called elements or members of the set. They are said to be contained in or to belong to the set. Sets will generally be denoted by upper case script letters, A, B, C, D,., and elements by lower case letters a, b, c, d,.... The sets of complex numbers, real numbers and integers will be denoted by C , R , and I , respectively. The notation a e A means that the element a is contained in the set A; if a is not an element of A, the notation a e A is employed. To denote the set whose elements are a, b, c, and d, the notation {a,b,c,d}is employed. In mathematics a set is not generally a collection of unrelated objects like a

tree, a doorknob and a concept, but rather it is a collection which share some common property like the vineyards of France which share the common property of being in France or the real numbers which share the common property of being real. A set whose elements are determined by their possession of a certain property is denoted by {x|P (x)}, where x denotes a typical element and P(x) is the property which determines xto be in the set. If Aand B are sets, B is said to be a subset of Aif every element of B is also an element of A. It is customary to indicate that B is a subset of A by the notation B c A, which may be read as “ B is contained in A,” or A o B which may be read as “ A contains B .” For example, the set of integers I is a subset of the set of real numbers R , Ic R. Equality of two sets A and B is said to exist if Ais a subset of B and B is a subset of A; in equation form

A = B & A c B and B c A

(1.1)

A nonempty subset B of A is called a proper subset of Aif B is not equal to A. The set of integers I is actually a proper subset of the real numbers. The empty set or null set is the set with no elements and is denoted by 0 . The singleton is the set containing a single element a and is denoted by {a}. A set whose elements are sets is often called a class.

13

14

Chap. 1



SETS, RELATIONS, FUNCTIONS

Some operations on sets which yield other sets will now be introduced. The union of the sets A and B is the set of all elements that are either in the set A or in the set B . The union of A and B is denoted by A u B

A uB = {a |a e A or aeB}

(1.2)

It is easy to show that the operation of forming a union is commutative,

AuB=BuA

(1.3)

and associative Au(BuC)=(AuB)uC

(1.4)

The intersection of the sets A and B is the set of elements that are in both A and B . The intersection is denoted by A n B and is specified by

A n B = {a | a e A and a eB}

(1.5)

Again, it can be shown that the operation of intersection is commutative

AnB=BnA

(1.6)

and associative An(BnC)=(AnB)nC

(1.7)

Two sets are said to be disjoint if they have no elements in common, i.e. if

AnB = 0

(1.8)

The operations of union and intersection are related by the following distributive laws:

An(BuC)=(AnB)u(AnC) (1.9)

Au(BnC)=(AuB)n(AuC) The complement of the setB with respect to the set Ais the set of all elements contained in A but not contained inB . The complement of B with respect to the set A is denoted by A/B and is specified by A / B = {a | a e A and a £B}

It is easy to show that

(1.10)

Sets and Set Algebra

Sec. 1

A /(A / B) = B for

15

BcA

(1.11)

and A/B = A ^ A oB = 0

(1.12)

Exercises 1.1

List all possible subsets of the set {a,b,c,d} .

1.2 1.3

List two proper subsets of the set {a,b} .

1.4 1.5 1.6

1.7 1.8

Prove the following formulas: (a) A = A u B «• B c A. (b) A = A0 . (c) A=AuA. Show that IoR=I. Verify the distributive laws (1.9). Verify the following formulas: (a) A c B &■ A o B = A. (b) A o0=0. (c) AoA=A. Give a proof of the commutative and associative properties of the intersection operation. Let A ={-1,-2,-3,-4}, B ={-1,0,1,2,3,7}, C ={0}, and D ={-7,-5,-3,-1,1,2,3}. List

the elements of A uB,A oB,A uC,BoC,A uBuC,A/B, and (D/C) uA.

16

Chap. 1

SETS, RELATIONS, FUNCTIONS

Section 2. Ordered Pairs, Cartesian Products, and Relations The idea of ordering is not involved in the definition of a set. For example, the set {a, b} is equal to the set {b,a}. In many cases of interest it is important to order the elements of a set. To define an ordered pair (a, b) we single out a particular element of {a,b}, say a , and define (a, b) to be the class of sets; in equation form, (a,b) ^ {{a},{a,b}}

(2.1)

This ordered pair is then different from the ordered pair (b, a) which is defined by

(b,a) ^ {{b},{b,a}}

(2.2)

It follows directly from the definition that (a, b) = (c, d) «• a = c and b = d

Clearly, the definition of an ordered pair can be extended to an ordered N -tuple (a1,a2,...,aN) . The Cartesian Product of two sets A andB is a set whose elements are ordered pairs (a,b) where a is an element of A and b is an element ofB . We denote the Cartesian product of A andB by A xB .

A x B = {(a, b)| a e A, b eB}

(2.3)

It is easy to see how a Cartesian product of N sets can be formed using the notion of an ordered Ntuple. Any subset K ofA xB defines a relation from the set A to the setB . The notation aKbis employed if (a, b) e K; this notation is modified for negation as aKb if (a, b) £ K. As an example of a relation let A be the set of all Volkswagons in the United States and let B be the set of all Volkswagons in Germany. Let be K o A x B be defined so that aKb if b is the same color as a .

If A= B, then Kis a relation onA . Such a relation is said to be reflexive if aKa for all a e A, it is said to be symmetric if aKb whenever bKa for any a and b in A, and it is said to be transitive if aKb and bKc imply that aKc for any a, band cin A . A relation on a set A is said to be an equivalence relation if it is reflexive, symmetric and transitive. As an example of an equivalence relation K on the set of all real numbers R let aKb& a = b for all a, b . To verify that this is an equivalence relation, note that a = a for each a e K (reflexivity), a = b ^ b = a for each a e K (symmetry), and a = b, b = c ^ a = c for a, b and c in K (transitivity).

Ordered Pairs, Cartesian Products, Relations

Sec. 2

17

A relation K on A is antisymmetric if aKbthen bKa implies b= a. A relation on a set A is said to be a partial ordering if it is reflexive, antisymmetric and transitive. The equality relation is a trivial example of partial ordering. As another example of a partial ordering on R , let the inequality < be the relation. To verify that this is a partial ordering note that a < a for every a : R (reflexivity), a < b and b < a ^ a = b for all a, b : R (antisymmetry), and a < b and b < c ^ a < c for all a, b, c ' R (transitivity). Of course inequality is not an equivalence relation since it is not symmetric.

Exercises 2.1

Define an ordered triple (a, b, c) by (a,b,c) =((a,b),c)

and show that (a, b, c) = (d, e, f) «• a = d, b = e , and c = f . 2.2

Let A be a set and K an equivalence relation on A. For each a e A consider the set of all elements x that stand in the relation K to a ; this set is denoted by {x |aKx} , and is called an equivalence set. (a) Show that the equivalence set of a contains a . (b) Show that any two equivalence sets either coincide or are disjoint. Note: (a) and (b) show that the equivalence sets form a partition of A ; A is the disjoint union of the equivalence sets.

2.3

On the set Rof all real numbers does the strict inequality < constitute a partial ordering on the set?

2.4

For ordered triples of real numbers define (a1,b1,c1) < (a2,b2,c2) if a1Y is a one-to-one function, then there exists a function f: f (X) ^ X which associates with each ye f(X ) the unique x eX such that y= f(x). We call f-1 the inverse of f, written x = f-1(y). Unlike the preimage defined by (3.3), the definition of inverse functions is possible only for functions that are one-to-one. Clearly, we have f-1(f(X))= X. The composition of two functions f : X >Y and g : > Y ^ Z is a function h : X ^ Z defined by h (x) = g (f (x)) for all x eX . The function h is written h = g ° f . The composition of any finite number of functions is easily defined in a fashion similar to that of a pair. The operation of composition of functions is not generally commutative, g°f

*f°g

Indeed, if g ° f is defined, f ° g may not be defined, and even if g ° f and f ° g are both defined, they may not be equal. The operation of composition is associative

Chap. 1

20

SETS, RELATIONS, FUNCTIONS

h ° (g ° f) = (h ° g) ° f if each of the indicated compositions is defined. The identity function id : X ^ X is defined as the function id(x) = x for all x eX . Clearly if f is a one-to-one correspondence from X to Y, then

f-1 ° f = idX ,

f ° f-1 = id*

where idX and idY denote the identity functions of X andY , respectively. To close this discussion of functions we consider the special types of functions called sequences. A finite sequence of N terms is a function f whose domain is the set of the first

N positive integers {1,2,3,...,N} . The range of fis the set of N elements { f (1), f (2),..., f (N)} ,

which is usually denoted by { f1, f2,..., fN} . The elements f1, f2,..., fN of the range are called terms of the sequence. Similarly, an infinite sequence is a function defined on the positive integers I + . The range of an infinite sequence is usually denoted by { fn } , which stands for the infinite set

{ f1, f2, f3,...} .

A function g whose domain is I +and whose range is contained in I +is said to be

order preserving if m< n implies that g(m) < g(n) for all m, ne I + . If f is an infinite sequence and g is order preserving, then the composition f ° g is a subsequence of f . For example, let f. = 1/(n +1),

g. = 4

Then f o g (n) = V(4n +1)

is a subsequence of f .

Exercises 3.1 3.2 3.3

Verify the results (3.2) and (3.4). How may the domain and range of the sine function be chosen so that it is a one-to-one correspondence? Which of the following conditions defines a function y= f(x) ? x2+y2=1,

3.4 3.5 3.6

xy=1,

x,yeR.

Give an example of a real valued function of a complex variable. Can the implication of (3.5) be reversed? Why? Under what conditions is the operation of composition of functions commutative?

Sec. 3

3.7



Functions

21

Show that the arc sine function sin-1really is not a function from R to R . How can it be made into a function?

Chapter 2

GROUPS, RINGS, AND FIELDS The mathematical concept of a group has been crystallized and abstracted from numerous situations both within the body of mathematics and without. Permutations of objects in a set is a group operation. The operations of multiplication and addition in the real number system are group operations. The study of groups is developed from the study of particular situations in which groups appear naturally, and it is instructive that the group itself be presented as an object for study. In the first three sections of this chapter certain of the basic properties of groups are introduced. In the last section the group concept is used in the definition of rings and fields.

Section 4. The Axioms for a Group Central to the definition of a group is the idea of a binary operation. If G is a non-empty set, a binary operation on G is a function from C x C to G. If a, b e G, the binary operation will be denoted by * and its value by a * b . The important point to be understood about a binary operation on G is that G is closed with respect to * in the sense that if a, b eG then a * b eG also. A binary operation * on G is associative if

(a*b)*c=a*(b*c)

for all a,b,ceG

(4.1)

Thus, parentheses are unnecessary in the combination of more than two elements by an associative operation. A semigroup is a pair (G,*) consisting of a nonempty set G with an associative binary

operation * . A binary operation * on G is commutative if a*b=b*a

for all a,beG

(4.2)

The multiplication of two real numbers and the addition of two real numbers are both examples of commutative binary operations. The multiplication of two N by N matrices (N > 1) is a noncommutative binary operation. A binary operation is often denoted by a ° b, a ■ b, or ab rather than bya*b; if the operation is commutative, the notation a+ b is sometimes used in place of a*b. An element e e G that satisfies the condition

23

GROUPS, RINGS, FIELDS

Chap. 2

24

e *a = a*e = a

for all a ■ G

(4.3)

is called an identity element for the binary operation * on the set G . In the set of real numbers with the binary operation of multiplication, it is easy to see that the number 1 plays the role of identity element. In the set of real numbers with the binary operation of addition, 0 has the role of the identity element. Clearly G contains at most one identity element ine . For if e1 is another identity element in G , then e1 * a = a * e1 = a for all a e G also. In particular, if we choose a = e , then e1 *e=e. But from (4.3), we have also e1*e= e1. Thus e1 = e. In general, G need not have any identity element. But if there is an identity element, and if the binary operation is regarded as multiplicative, then the identity element is often called the unity element; on the other hand, if the binary operation is additive, then the identity element is called the zero element. In a semigroup G containing an identity element e with respect to the binary operation * , an element a-1 is said to be an inverse of the element a if

a*a-1=a-1*a=e

(4.4)

In general, a need not have an inverse. But if an inverse a-1 of a exists, then it is unique, the proof being essentially the same as that of the uniqueness of the identity element. The identity element is its own inverse. In the set R /{0} with the binary operation of multiplication, the

inverse of a number is the reciprocal of the number. In the set of real numbers with the binary operation of addition the inverse of a number is the negative of the number. A group is a pair (G,*) consisting of an associative binary operation * and a set G which

contains the identity element and the inverses of all elements of G with respect to the binary operation * . This definition can be explicitly stated in equation form as follows.

Definition. A group is a pair (G,*) where G is a set and * is a binary operation satisfying the following: (a)

(a*b)*c=a*(b*c)for all a,b,ce G .

(b)

There exists an element e e G such that a* e=e*a=a for all a e G .

(c)

For every a e G there exists an element a-1 e G such that a*a-1=a-1*a=e.

If the binary operation of the group is commutative, the group is said to be a commutative (or Abelian) group.

The Axioms for a Group

Sec. 4

25

The set R /{0} with the binary operation of multiplication forms a group, and the set R

with the binary operation of addition forms another group. The set of positive integers with the binary operation of multiplication forms a semigroup with an identity element but does not form a group because the condition © above is not satisfied. A notational convention customarily employed is to denote a group simply by G rather than by the pair (G, *) . This convention assumes that the particular * to be employed is understood.

We shall follow this convention here.

Exercises 4.1

Verify that the set G e {1,i,-1, -i}, where i2 = -1, is a group with respect to the binary operation of multiplication.

4.2

Verify that the set G consisting of the four 2 x 2 matrices 10 01

0 -1

1 0

0 -1 10

-1

0

0 -1

constitutes a group with respect to the binary operation of matrix multiplication.

4.3

Verify that the set G consisting of the four 2x 2 matrices

10 01

0 01

-1

10 0 -1

-1

0

0 -1

constitutes a group with respect to the binary operation of matrix multiplication.

4.4

Determine a subset of the set of all 3x 3matrices with the binary operation of matrix multiplication that will form a group.

4.5

If A is a set, show that the set G, G = {f\f : A ^ A, f is a one-to-one correspondence}

is a one-to-one correspondence}constitutes a group with respect to the binary operation of composition.

GROUPS, RINGS, FIELDS

Chap. 2

26

Section 5. Properties of a Group In this section certain basic properties of groups are established. There is a great economy of effort in establishing these properties for a group in general because the properties will then be possessed by any system that can be shown to constitute a group, and there are many such systems of interest in this text. For example, any property established in this section will automatically hold for the group consisting of R/{0} with the binary operation of multiplication, for the group

consisting of the real numbers with the binary operation of addition, and for groups involving vectors and tensors that will be introduced in later chapters. The basic properties of a group G in general are:

1.

The identity element e ■ G is unique.

2. The inverse element aof any element a : G is unique. The proof of Property 1 was given in the preceding section. The proof of Property 2 follows by essentially the same argument.

If n is a positive integer, the powers of a : G are defined as follows: (i)For n = 1,a1 = a.

( )

(ii) For n > 1, an = an-1 * a. (iii) a0 = e. (iv) a-n = a 3.

.

If m, n, k are any integers, positive or negative, then for a : G km+n

a) (

=a

mk+nk

In particular, when m=n=-1, we have: 4.

(a )

= a for all a :G

5.

(a * b)

1 = b* a-1 for all a, b ■' G

-

For square matrices the proof of this property is given by (0.17); the proof in general is exactly the same. The following property is a useful algebraic rule for all groups; it gives the solutions x and y to the equations x *a=b and a*y=b, respectively. 6.

For any elements a,bin G , the two equations x *a= b and a*y=b have the unique solutions x = b*a 1 and y = a 1 *b.

Properties of a Group

Sec. 5

27

Proof. Since the proof is the same for both equations, we will consider only the equation x * a = b. Clearly x = b * asatisfies the equation x * a = b,

(

x* a = b* a

1 )* a = b* (a 1 * a) = b* e = b

Conversely, x*a=b,implies

(

)

x=x*e=x* a*a-1 =(x*a)*a-1=b*a-1

which is unique. As a result we have the next two properties.

7.

For any three elements a,b,c in G , either a* c=b*c or c* a=c*b implies that

a=b. 8. For any two elements a,b in the group G , either a* b= b or b*a=b implies that a is the identity element. A non-empty subset G ' of G is a subgroup of G if G ' is a group with respect to the binary operation of G , i.e., G' is a subgroup of G if and only if (i) e eG' (ii) a eG' ^aeG' (iii) a, b eG' ^ a * b eG'.

9. Let G' be a nonempty subset of G . Then G' is a subgroup if a, b eG' ^ a * b- eG'.

Proof. The proof consists in showing that the conditions (i), (ii), (iii) of a subgroup are satisfied. (i) Since G ' is non-empty, it contains an element a , hence a*a 1 =eeG' . (ii) If b e G ' then

( )

e*b 1=b 1eG ' . (iii) If a,be G ' , then a* b 1 1 =a*beG ' .

If G is a group, then G itself is a subgroup of G , and the group consisting only of the element e is also a subgroup of G . A subgroup of G other than G itself and the group e is called a proper subgroup of G .. 10.

The intersection of any two subgroups of a group remains a subgroup.

Exercises 5.1

Show that e is its own inverse.

5.2

Prove Property 3.

28

Chap. 2



GROUPS, RINGS, FIELDS

5.3

Prove Properties 7 and 8.

5.4

Show that x = y in Property 6 if G is Abelian.

5.5

If G is a group and a : G show that a * a = a implies a = e .

5.6

Prove Property 10.

5.7

Show that if we look at the non-zero real numbers under multiplication, then (a) the rational numbers form a subgroup, (b) the positive real numbers form a subgroup, (c) the irrational numbers do not form a subgroup.

Group Homorphisms

Sec. 6

Section 6.

29

Group Homomorphisms

A “group homomorphism” is a fairly overpowering phrase for those who have not heard it or seen it before. The word homomorphism means simply the same type of formation or structure. A group homomorphism has then to do with groups having the same type of formation or structure.

Specifically, if G and H are two groups with the binary operations * and °, respectively, a function f : G ^ H is a homomorphism if f (a * b ) = f (a) ° f (b)

for all a, b :G

(6.1)

If a homomorphism exists between two groups, the groups are said to be homomorphic. A homomorphism f : G ^ H is an isomorphism if f is both one-to-one and onto. If an isomorphism exists between two groups, the groups are said to be isomorphic. Finally, a homomorphism f : G > G is called an endomorphism and an isomorphism f : G > G is called an automorphism. To illustrate these definitions, consider the following examples:

1.

Let G be the group of all nonsingular, real, N x N matrices with the binary operation of matrix multiplication. Let H be the group R /{0} with the binary operation of scalar multiplication. The function that is the determinant of a matrix is then a homomorphism from G to H .

2.

Let G be any group and let H be the group whose only element ise. Let f be the function that assigns to every element of G the value e; then fis a (trivial) homomorphism.

3.

The identity function id : G > G is a (trivial) automorphism of any group.

4.

Let G be the group of positive real numbers with the binary operation of multiplication and let H be the group of real numbers with the binary operation of addition. The log function is an isomorphism between G and H .

5.

Let G be the group R /{0} with the binary operation of multiplication. The function that takes the absolute value of a number is then an endomorphism of G into G . The restriction of this function to the subgroup H of H consisting of all positive real numbers is a (trivial) automorphism of H, however.

From these examples of homomorphisms and automorphisms one might note that a homomorphism maps identities into identities and inverses into inverses. The proof of this

30

Chap. 2

GROUPS, RINGS, FIELDS

observation is the content of Theorem 6.1 below. Theorem 6.2 shows that a homomorphism takes a subgroup into a subgroup and Theorem 6.3 proves a converse result. Theorem 6.1 . If f :G ^ H is a homomorphism, then f (e) coincides with the identity element e0 of H and

( )

f a-1 = f (a)-1

Proof. Using the definition (6.1) of a homomorphism on the identity a = a * e, one finds that

f

(a ) = f ( a * e ) = f (a ) ° f (e )

From Property 8 of a group it follows that f (e) is the identity element e0 of H. Now, using this

result and the definition of a homomorphism (6.1) applied to the equation e=a*a-1 , we find

e0

=

( ) = f (a * a- ) = f (a) ° f (a-)

f e

( )

It follows from Property 2 that f a-1 is the inverse of f (a) .

Theorem 6.2 . If f :G ^ H is a homomorphism and if G' is a subgroup of G , then f (G') is a subgroup of H. Proof. The proof will consist in showing that the set f(G') satisfies the conditions (i), (ii), (iii) of

a subgroup.

(i) Since

e eG' it follows from Theorem 6.1 that the identity element e0 of H is

(ii)For any a eG', f (a) 1 e f (G') by Theorem 6.1. (iii)For any f (a) ° f (b) e f (G') since f (a) ° f (b) = f (a * b)e f (G') .

contained in f (G') .

a,b eG',

As a corollary to Theorem 6.2 we see that f (G) is itself a subgroup of H.

Theorem 6.3 . If f :G ^ H is a homomorphism and if H 'is a subgroup of H , then the preimage f 1(H ')is a subgroup of G .

The kernel of a homomorphism f :G ^ H is the subgroup f(e0) of G . In other words, the kernel of f is the set of elements of G that are mapped by f to the identity element e0 of H .

Sec. 6

31

Group Homorphisms

The notation K( f) will be used to denote the kernel of f . In Example 1 above, the kernel of f consists of N x N matrices with determinant equal to the real number 1, while in Example 2 it is the entire group G . In Example 3 the kernel of the identity map is the identity element, of course. Theorem 6.4 . A homomorphism f :G ^ H is one-to-one if and only if K (f) = {e}.

Proof. This proof consists in showing that f (a) = f(b) implies a= bif and only if K(f)= {e}.

(

)

If f (a) = f (b), then f (a) ° f (b) 1 = e0, hence f a * b = e0 and it follows that a * b- e K(f). Thus, if K(f ) = {e},then a * b= e or a = b. Conversely, now, we assume f

is

one-to-one and since K(f)is a subgroup of G by Theorem 6.3, it must contain e. If K(f)contains any other element a such that f(a) = e0, we would have a contradiction since f is one-to-one; therefore K(f)= {e}.

Since an isomorphism is one-to-one and onto, it has an inverse which is a function from H onto G . The next theorem shows that the inverse is also an isomorphism.

Theorem 6.5 . If f :G ^ H is an isomorphism, then f: H > G is an isomorphism.

Proof Since f is one-to-one and onto, it follows that f 1 is also one-to-one and onto (cf. Section

3). Let a0be the element of H such that a= f 1 (a0) for any a e G ; then a * b = f-1 (a0 )* f-1 (b0) • But a * b is the inverse image of the element a0 ° b0 e Hbecause

f (a * b) = f (a) ° f (b) = a0 * b0 since f is a homomorphism. Therefore f- (a0) * f(b0) = f(a0 ° b0), which shows that fsatisfies the definition (6.1) of a homomorphism.

Theorem 6.6 . A homomorphism f :G ^ H is an isomorphism if it is onto and if its kernel contains only the identity element of G .

The proof of this theorem is a trivial consequence of Theorem 6.4 and the definition of an isomorphism.

Chap. 2

32

GROUPS, RINGS, FIELDS

Exercises 6.1

If f :G ^ H and g: H ^ M are homomorphisms, then show that the composition of the mappings f and g is a homomorphism from G to M .

6.2

Prove Theorems 6.3 and 6.6.

6.3

Show that the logarithm function is an isomorphism from the positive real numbers under multiplication to all the real numbers under addition. What is the inverse of this isomorphism?

6.4

Show that the function f : f : (R, +)^ R+, • defined by f (x) = x2 is not a

( )

homomorphism.

Rings and Fields

Sec. 7

Section 7.

33

Rings and Fields

Groups and semigroups are important building blocks in developing algebraic structures. In this section we shall consider sets that form a group with respect to one binary operation and a semigroup with respect to another. We shall also consider a set that is a group with respect to two different binary operations.

Definition. A ring is a triple (D, +, •) consisting of a set D and two binary operations + and • such that (a)

D with the operation + is an Abelian group.

(b)

The operation • is associative.

(c)

D contains an identity element, denoted by 1, with respect to the operation • , i.e., 1•a =a•1 =a

for all a e D.

(d)

The operations + and • satisfy the distributive axioms a •( b + c ) = a • b + a • c

(b + c )• a = b • a + c • a

(7.1)

The operation + is called addition and the operation • is called multiplication. As was done with the notation for a group, the ring (D, +, •) will be written simply as D . Axiom (a) requires that D contains the identity element for the + operation. We shall follow the usual procedure and denote this element by 0. Thus, a+0=0+a=a. Axiom (a) also requires that each element a eD have an additive inverse, which we will denote by -a, a+ (-a) =0 . The quantity a+ bis called the sum of a and b , and the difference between a and b is a+ (-b) , which is usually written as a- b. Axiom (b) requires the set D and the multiplication operation to form a semigroup. If the multiplication operation is also commutative, the ring is called a commutative ring. Axiom (c) requires that D contain an identity element for multiplication; this element is called the unity element of the ring and is denoted by 1. The symbol 1 should not be confused with the real number one. The existence of the unity element is sometimes omitted in the ring axioms and the ring as we have defined it above is called the ring with unity. Axiom (d), the distributive axiom, is the only idea in the definition of a ring that has not appeared before. It provides a rule for the interchanging of the two binary operations.

34

Chap. 2

GROUPS, RINGS, FIELDS

The following familiar systems are all examples of rings. 1.

The set I of integers with the ordinary addition and multiplication operations form a ring.

2.

The set Ra of rational numbers with the usual addition and multiplication operations form a ring.

3.

The set R of real numbers with the usual addition and multiplication operations form a ring.

4.

The set C of complex numbers with the usual addition and multiplication operations form a ring.

5.

The set of all N by N matrices form a ring with respect to the operations of matrix addition and matrix multiplication. The unity element is the unity matrix and the zero element is the zero matrix.

6.

The set of all polynomials in one (or several) variable with real (or complex) coefficients form a ring with respect to the usual operations of addition and multiplication.

Many properties of rings can be deduced from similar results that hold for groups. For example, the zero element 0, the unity element 1 , the negative element -a of an element a are unique. Properties that are associated with the interconnection between the two binary operations are contained in the following theorems:

Theorem 7.1 . For any element a t D , a ■ 0 = 0 ■ a = 0.

Proof. For any b tD we have by Axiom (a) that b + 0 = b. Thus for any a t Dit follows that (b + 0) ■ a = b ■ a, and the Axiom (d) permits this equation to be recast in the form b ■ a + 0 ■ a = b ■ a. From Property 8 for the additive group this equation implies that 0 ■ a = 0. It can be shown that a ■0=0by a similar argument. Theorem 7.2 . For all elements a, btD

(-a)■ b = a ■(-b) = -(a ■ b) This theorem shows that there is no ambiguity in writing - a ■ b for -(a ■ b).

Many of the notions developed for groups can be extended to rings; for example, subrings and ring homomorphisms correspond to the notions of subgroups and group homomorphisms. The interested reader can consult the Selected Reading for a discussion of these ideas.

The set of integers is an example of an algebraic structure called an integral domain.

Rings and Fields

Sec. 7

35

Definition. A ring D is an integral domain if it satisfies the following additional axioms: (e)

The operation • is commutative.

(f)

If a, b, c are any elements of D with c ^ 0, then

a•c = b•c^a = b

(7.2)

The cancellation law of multiplication introduced by Axiom (f) is logically equivalent to the assertion that a product of nonzero factors is nonzero. This is proved in the following theorem. Theorem 7.3 . A commutative ring D is an integral domain if and only if for all elements a, b : D, a • b ^ 0 unless a = 0 or b = 0.

Proof. Assume first that D is an integral domain so the cancellation law holds. Suppose a • b = 0 and b ^ 0. Then we can write a • b = 0 • b and the cancellation law implies a = 0. Conversely, suppose that a product in a commutative ring D cannot be zero unless one of its factors is zero. Consider the expression a • c = b • c, which can be rewritten as a • c - b • c = 0 or as (a - b) • c = 0. If c ^ 0 then by assumption a - b = 0or a = b. This proves that the cancellation law holds. The sets of integers I , rational numbers Ra, real numbers R , and complex numbers C are examples of integral domains as well as being examples of rings. Sets of square matrices, while forming rings with respect to the binary operations of matrix addition and multiplication, do not, however, form integral domains. We can show this in several ways; first by showing that matrix multiplication is not commutative and, second, we can find two nonzero matrices whose product is zero, for example, 5

0j0 0

0 0

0

1

00 00

The rational numbers, the real numbers, and the complex numbers are examples of an algebraic structure called a field.

Definition. A field F is an integral domain, containing more than one element, and such that any element a : F / {0} has an inverse with respect to multiplication.

Chap. 2

36

GROUPS, RINGS, FIELDS

It is clear from this definition that the set F and the addition operation as well as the set F /{0} and the multiplication operation form Abelian groups. Hence the unity element 1 , the zero

element 0, the negative (-a), as well as the reciprocal (1/ a), a ^ 0, are unique. The formulas of arithmetic can be developed as theorems following from the axioms of a field. It is not our purpose to do this here; however, it is important to convince oneself that it can be done.

The definitions of ring, integral domain, and field are each a step more restrictive than the other. It is trivial to notice that any set that is a field is automatically an integral domain and a ring. Similarly, any set that is an integral domain is automatically a ring. The dependence of the algebraic structures introduced in this chapter is illustrated schematically in Figure 7.1. The dependence of the vector space, which is to be introduced in the next chapter, upon these algebraic structures is also indicated.

Commutative ring

▼ Integral domain I

Field

Vector space Figure 1. A schematic of algebraic structures.

Rings and Fields

Sec. 7

37

Exercises 7.1

Verify that Examples 1 and 5 are rings.

7.2

Prove Theorem 7.2.

7.3

Prove that if a,b,c,d are elements of a ring D , then (a)

(-a )•(-b ) = a ■ b.

(b)

(—1)’ a = a •(—!) = —a•

(c)

(a + b )•( c + d ) = a ■ c + a ■ d + b ■ c + b ■ d.

7.4

Among the Examples 1-6 of rings given in the text, which ones are actually integral domains, and which ones are fields?

7.5

Is the set of rational numbers a field?

7.6

Why does the set of integers not constitute a field?

7.7

If F is a field, why is F /{0}an Abelian group with respect to multiplication?

7.8

Show that for all rational numbers x, y , the set of elements of form x + yJ2 constitutes a field.

7.9

For all rational numbers x, y, z, does the set of elements of the form x + y^/2" + z>/3 form a field? If not, show how to enlarge it to a field.

PART 2

VECTOR AND TENSOR ALGEBRA

Selected Readings for Part II Frazer, R. A., W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge University Press, Cambridge, 1938. GREUB, W. H., Linear Algebra, 3rd ed., Springer-Verlag, New York, 1967. Greub, W. H., Multilinear Algebra, Springer-Verlag, New York, 1967. Halmos, P. R., Finite Dimensional Vector Spaces, Van Nostrand, Princeton, New Jersey, 1958. Jacobson, N., Lectures in Abstract Algebra, Vol. II. Linear Algebra, Van Nostrand, New York, 1953. Mostow, G. D., J. H. Sampson, and J. P. Meyer, Fundamental Structures ofAlgebra, McGrawHill, New York, 1963. Nomizu, K., Fundamentals ofLinear Algebra, McGraw-Hill, New York, 1966.

Shephard, G. C., Vector Spaces ofFinite Dimensions, Interscience, New York, 1966. Veblen, O., Invariants of Quadratic Differential Forms, Cambridge University Press, Cambridge, 1962.

Chapter 3

VECTOR SPACES Section 8.

The Axioms for a Vector Space

Generally, one’s first encounter with the concept of a vector is a geometrical one in the form of a directed line segment, that is to say, a straight line with an arrowhead. This type of vector, if it is properly defined, is a special example of the more general notion of a vector presented in this chapter. The concept of a vector put forward here is purely algebraic. The definition given for a vector is that it be a member of a set that satisfies certain algebraic rules. A vector space is a triple (V,F,f ) consisting of (a) an additive Abelian group V , (b) a

field F, and (c) a function f :Fy.V >V called scalar multiplication such that

f

(2, (/, v )) = (2//, v) f

f

(2 + /, u ) = f (2, u )+f (/, u) f(2,u+v)=f(2,u)+f(2,v) f (1,v)=v

f

(8.1)

for all 2,/^F and all u, v ^V. A vector is an element of a vector space. The notation

(V,F,f )for a vector space will be shortened to simply

V . The first of (8.1) is usually called the

associative law for scalar multiplication, while the second and third equations are distributive laws, the second for scalar addition and the third for vector addition. It is also customary to use a simplified notation for the scalar multiplication function f .

We shall write f (2,v) =2vand also regard 2v and v2 to be identical. In this simplified notation

we shall now list in detail the axioms of a vector space. In this definition the vector u+ v in V is called the sum of u and v and the difference of u and v is written u- vand is defined by u-v=u+(-v)

(8.2)

Definition. Let V be a set and F a field. V is a vector space if it satisfies the following rules: (a)

There exists a binary operation in V called addition and denoted by + such that

41

Chap. 3

42

VECTOR SPACES

(1)

(u + v) + w = u + (v + w)

(2) (3) (4)

u + v = v + u for all u,v eV . There exists an element 0 eV such that u + 0 = u for all u eV. For every u eV there exists an element -u eV. such that u + (-u) = 0 .

for all u, v, w eV.

(b) There exists an operation called scalar multiplication in which every scalar AeF can be combined with every element u eV to give an element Au eV such that (1)

A( pu ) = ( Ap) u

(2)

(A + p) u = Au + pu A(u+v)=Au+Av

(3)

(4) 1u= u for all A,peF and all u,veV .

If the field F employed in a vector space is actually the field of real numbers R, the space is called a real vector space. A complex vector space is similarly defined. Except for a few minor examples, the material in this chapter and the next three employs complex vector spaces. Naturally the real vector space is a trivial special case. The reason for allowing the scalar field to be complex in these chapters is that the material on spectral decompositions in Chapter 6 has more usefulness in the complex case. After Chapter 6 we shall specialize the discussion to real vector spaces. Therefore, unless we provide some qualifying statement to the contrary, a vector space should be understood to be a complex vector space.

There are many and varied sets of objects that qualify as vector spaces. The following is a list of examples of vector spaces:

1.

(

)

The vector space CN is the set of all N-tuples of the form u = A1,A2,...,AN , where N is a positive integer and A1,A2,...,AN eC . Since an N-tuple is an ordered set, if

)

(

v = p1,p2,...,pN is a second N-tuple, then uandvare equal if and only if pk=Ak for all k=1,2,...,N

The zero N-tuple is 0 = (0, 0,..., 0) and the negative of the N-tuple u is

(

)

-u = -A1,-A2,...,-AN . Addition and scalar multiplication of N-tuples are defined by the

formulas

(

)

u+v= p1+A1,p2+A2,...,pN+AN

The Axioms for a Vector Space

Sec. 8

43

and

(

^u = ^21, ^2,...., ^2N

)

respectively. The notation CN is used for this vector space because it can be considered to be an Nth Cartesian product ofC . 2.

3.

The set V of all N xM complex matrices is a vector space with respect to the usual operation of matrix addition and multiplication by a scalar. Let H be a vector space whose vectors are actually functions defined on a set A with values in C . Thus, if h e H, xeAthen h(x)eC and h: A ^C . If k is another vector of H then equality of vectors is defined by h = k «• h (x) = k (x) for all x e A

The zero vector is the zero function whose value is zero for all x . Addition and scalar multiplication are defined by

(h+k)(x)=h(x)+k(x) and

( 2h )(x ) = 2( h ( x )) respectively.

4.

Let P denote the set of all polynomials u of degree N of the form u = 20 + /.• x + 22 x2 +----- + AN xN

where20,21,22,..., AN eC. The set Pforms a vector space over the complex numbers C if addition and scalar multiplication of polynomials are defined in the usual way. 5. 6.

The set of complex numbers C , with the usual definitions of addition and multiplication by a real number, forms a real vector space. The zero element 0 of any vector space forms a vector space by itself.

The operation of scalar multiplication is not a binary operation as the other operations we have considered have been. In order to develop some familiarity with the algebra of this operation consider the following three theorems.

Chap. 3

44

VECTOR SPACES

Theorem 8.1 . Au = 0 «• A = 0 or u = 0 . Proof. The proof of this theorem actually requires the proof of the following three assertions:

(a) 0u = 0,

(b) A0 = 0,

(c) Au = 0 ^ A = 0 or u = 0

To prove (a), take Li = 0 in Axiom (b2) for a vector space; then Au = Au + 0u. Therefore Au- Au=Au-Au+0u

and by Axioms (a4) and (a3)

0= 0+0u=0u

which proves (a). To prove (b), set v= 0 in Axiom (b3) for a vector space; then Au= Au+ A0. Therefore Au- Au=Au-Au+A0

and by Axiom (a4)

0= 0+A0=A0

To prove (c), we assume Au= 0. If A= 0, we know from (a) that the equation Au= 0 is satisfied. If A ^ 0, then we show that umust be zero as follows:

u = 1u = A | — | u = —(Au) = —(0) = 0 < A)

Theorem 8.2 .

(-A) u = -Au

A

'

A

'

for all A e C, u e V.

Proof. Let p= 0and replace A by -A in Axiom (b2) for a vector space and this result follows directly. Theorem 8.3 . -Au = A (-u) for all AeC, u e V.

The Axioms for a Vector Space

Sec. 8

45

Finally, we note that the concepts of length and angle have not been introduced. They will be introduced in a later section. The reason for the delay in their introduction is to emphasize that certain results can be established without reference to these concepts.

Exercises 8.1

8.2 8.3

8.4

Show that at least one of the axioms for a vector space is redundant. In particular, one might show that Axiom (a2) can be deduced from the remaining axioms. [Hint: expand (1+1)(u+v)by the two different rules.] Verify that the sets listed in Examples 1- 6 are actually vector spaces. In particular, list the zero vectors and the negative of a typical vector in each example. Show that the axioms for a vector space still make sense if the field F is replaced by a ring. The resulting structure is called a module over a ring. Show that the Axiom (b4) for a vector space can be replaced by the axiom

(b4) 8.5

2u = 0 «• 2 = 0or u = 0

Let Vand Ube vector spaces. Show that the set VxU is a vector space with the definitions

(u,x)+(v,y)=(u+v,x+y) and 2(u,x) = (2u,2x)

where u,v e V; x,y eU; and 2eF

8.6 8.7

Prove Theorem 8.3. Let V be a vector space and consider the set VxV. Define addition in Vx Vby

(u,v)+(x,y)=(u+x,v+y) and multiplication by complex numbers by

(2 + iy) (u, v) = (2u - ^v, uw + 2v) where 2,u' R. Show that VxV is a vector space over the field of complex numbers.

Chap. 3

46

VECTOR SPACES

Linear Independence, Dimension, and Basis

Section 9.

The concept of linear independence is introduced by first defining what is meant by linear dependence in a set of vectors and then defining a set of vectors that is not linearly dependent to be linearly independent. The general definition of linear dependence of a set of N vectors is an algebraic generalization and abstraction of the concepts of collinearity from elementary geometry.

Definition. A finite set of N(N > 1) vectors {v1,v2,..., vN}in a vector space V is said to be

}

{

linearly dependent if there exists a set of scalars A1,A2,..., AN , not all zero, such that

N

SAV

j

=0

(9.1)

j=1

The essential content of this definition is that at least one of the vectors {v1, v2,..., vN} can

be expressed as a linear combination of the other vectors. This means that if A * 0, then v1 =

S

vj ,where

N2 Pj

lij = -A

/ A for j = 2,3,..., N. As a numerical example, consider the two

vectors v1=(1,2,3),

v2=(3,6,9)

from R 3. These vectors are linearly dependent since

v2= 3v1

The proof of the following two theorems on linear dependence is quite simple. Theorem 9.1 . If the set of vectors {v1, v 2, — , v N } is linearly dependent, then every other finite set

of vectors containing {v1, v 2, — , v N } is linearly dependent.

Theorem 9.2 . Every set of vectors containing the zero vector is linearly dependent.

Linear Independence, Dimension, Basis

Sec. 9

47

A set of N (N > 1) vectors that is not linearly dependent is said to be linearly independent. Equivalently, a set of N(N > 1) vectors {v1, v2,..., vN} is linearly independent if (9.1) implies 21 = 22 = ... = 2 = 0. As a numerical example, consider the two vectors v1 = (1,2) and

v2 = (2,1) from R2. These two vectors are linearly independent because 2v1 + 2v2 = 2(1,2) + ^(2,1) = 0 ^ 2 + 2y = 0,22 + y = 0

and 2 + 2// = 0, 22 + y = 0 ^ 2 = 0, y = 0

Theorem 9.3 . Every non empty subset of a linearly independent set is linearly independent. A linearly independent set in a vector space is said to be maximal if it is not a proper subset of any other linearly independent set. A vector space that contains a (finite) maximal, linearly independent set is then said to be finite dimensional. Of course, if V is not finite dimensional, then it is called infinite dimensional. In this text we shall be concerned only with finite­ dimensional vector spaces.

Theorem 9.4 . Any two maximal, linearly independent sets of a finite-dimensional vector space must contain exactly the same number of vectors. Proof. Let {v1,..., vN} and {u1, , uM } be two maximal, linearly independent sets of V . Then we must show that N = M. Suppose that N ^ M, say N < M. By the fact that {v1,..., vN} is

{

}

maximal, the sets v1,...,vN,u1 ,...,{v1,...,vN,uM} are all linearly dependent. Hence there exist relations 211v1 + - + 21 N v N + /4u1 = 0

: 2M 1v1 +

(9.2)

+ 2MN v N + ^M uM = 0

where the coefficients of each equation are not all equal to zero. In fact, the coefficients ^1,...,y.M of the vectors u1,...,uM are all nonzero, for if u. = 0 for any i, then

2 1 v1 + - + 2iN v N =

0

Chap. 3

48

VECTOR SPACES

for some nonzero Xi 1,...,XiN contradicting the assumption that {v1,...,vN} is a linearly independent

set. Hence we can solve (9.2) for the vectors {u1,... ,uM} in terms of the vectors{v1,..., vN} , obtaining = P11 v1 + - + PN vN

u1

(9.3) u M = PM1 V1 +

+ PMN v N

where pij = -Xj Ipi for i = 1,...,M; j = 1,...,N.

Now we claim that the first N equations in the above system can be inverted in such a way that the vectors {v1,..., vN} are given by linear combinations of the vectors {u1, ,uN} . Indeed,

inversion is possible if the coefficient matrix [pij]for i, j = 1,...,N is nonsingular. But this is clearly the case, for if that matrix were singular, there would be nontrivial solutions {a1,....,aN} to the linear system N

Xa P >

v

i=1

= 0,

j = 1,..., N

(9.4)

Then from (9.3) and (9.4) we would have N

N

/N \ | aPj Vj = o \ i=1 J

^au = 2 S i

j=1

i =1

contradicting the assumption that set {u1,..., uN} ,being a subset of the linearly independent set

{u1,...,uM}, is linearly independent. Now if the inversion of the first N equations of the system (9.3) gives V1

= £1u1 + -+£N u N

(9.5) VN

= ^N 1u1 + ... + ^NN u N

Linear Independence, Dimension, Basis

Sec. 9

49

where ^Jj] is the inverse of [yj] for i, j = 1,...,N ,we can substitute (9.5) into the remaining

M - N equations in the system (9.3), obtaining r

n

-k| - -

u N+1 =

j=1

+1 jjji

i=1

r

n

uM =

\

n

n

J

U

\

-|k- ^k J j=1

i=1

But these equations contradict the assumption that the set {u1,... , uM} is linearly independent. Hence M > N is impossible and the proof is complete.

An important corollary of the preceding theorem is the following.

Theorem 9.5 . Let {u1, , uN} be a maximal linearly independent set in V , and suppose that

{v1,..., vN}

is given by (9.5). Then {v1,..., vN} is also a maximal, linearly independent set if and

only if the coefficient matrix [ j] in (9.5) is nonsingular. In particular, if vi = ui for i = 1,...,k -1, k +1,...,N but vk ^ uk , then {u1,...,uk-1, vk,uk+1,...,uN} is a maximal, linearly independent set if and only if the coefficient jkk in the expansion of vk in terms of {u1,...,uN } in

(9.5) is nonzero. From the preceding theorems we see that the number N of vectors in a maximal, linearly independent set in a finite dimensional vector space V is an intrinsic property of V . We shall call this number N the dimension of V , written dim V , namely N = dim V , and we shall call any maximal, linearly independent set of V a basis of that space. Theorem 9.5 characterizes all bases of V as soon as one basis is specified. A list of examples of bases for vector spaces follows:

1.

The set ofN vectors

(1,0,0,...,0) (0,1,0,...,0) (0,0,1,...,0) (0,0,0,...,1)

x------ v-----N

Chap. 3

50

VECTOR SPACES

is linearly independent and constitutes a basis for C N , called the standard basis. 2.

If M2x2 denotes the vector space of all 2 x 2 matrices with elements from the complex numbers C , then the four matrices 10 0 0,

0 1 00

0 0

0 0 01

10,

form a basis for M 2x2 called the standard basis.

3.

The elements 1 and i = V-1 form a basis for the vector space C of complex numbers over the field of real numbers.

The following two theorems concerning bases are fundamental.

Theorem 9.6 . If {e1,e2,...,eN} is a basis for V , then every vector in V has the representation

N

^e

v=

(9.6)

j j

j=1

{

}

where ?, £\...,£N are elements of C which depend upon the vector v e V.

The proof of this theorem is contained in the proof of Theorem 9.4.

}

Theorem 9.7 . The N scalars { ||u1|-||u2|| for all u1,u2 in an inner product space V.

12.8

Prove by direct calculation that

12.5 12.6

^ ^ N=1

k in (12.9) is real.

N=1 ekp.jp.

If U is a subspace of an inner product space V , prove that U is also an inner product space. 12.10 Prove that theN x N matrix [ejk J defined by (12.6) is nonsingular.

12.9

70

Chap. 3

VECTOR SPACES

Section 13. Orthonormal Bases and Orthogonal Complements Experience with analytic geometry tells us that it is generally much easier to use bases consisting of vectors which are orthogonal and of unit length rather than arbitrarily selected vectors. Vectors with a magnitude of 1 are called unit vectors or normalized vectors. A set of vectors in an inner product space V is said to be an orthogonal set if all the vectors in the set are mutually orthogonal, and it is said to be an orthonormal set if the set is orthogonal and if all the vectors are unit vectors. In equation form, an orthonormal set {i1,...,iM} satisfies the conditions

i jk j • i k = $jk

ifj=k

= |1 ‘0

(13.1)

if j * k

The symbol 8jk introduced in the equation above is called the Kronecker delta.

Theorem 13. 1. An orthonormal set is linearly independent. Proof. Assume that the orthonormal set {i1, i2,..., iM} is linearly dependent, that is to say there

}

{

exists a set of scalars l1,//,..., IM , not all zero, such that M

EAi j=1

j

=0

The inner product of this sum with the unit vector ik gives the expression M

E A8 = 18

1 k + - + AM 8k jM

=l=0

j=1

for k= 1, 2,..., M. A contradiction has therefore been achieved and the theorem is proved.

As a corollary to this theorem, it is easy to see that an orthonormal set in an inner product space V can have no more than N = dimV' elements. An orthonormal set is said to be complete in an inner product space V if it is not a proper subset of another orthonormal set in the same space. Therefore every orthonormal set with N = dimV elements is complete and hence maximal in the sense of a linearly independent set. It is possible to prove the converse: Every complete orthonormal set in an inner product space V has N = dimV elements. The same result is put in a slightly different fashion in the following theorem.

Sec. 13

Orthonormal Bases, Orthogonal Complements

71

Theorem 13. 2. A complete orthonormal set is a basis forV ; such a basis is called an orthonormal basis. Any set of linearly independent vectors can be used to construct an orthonormal set, and likewise, any basis can be used to construct an orthonormal basis. The process by which this is done is called the Gram-Schmidt orthogonalization process and it is developed in the proof of the following theorem. Theorem 13. 3. Given a basis {e1,e2,...,eN}of an inner product spaceV, then there exists an orthonormal basis {i1,...,iN} such that {e1...,ek} and {i1,...,ik} generate the same subspace Uk of V, for each k= 1,., N.

Proof. The construction proceeds in two steps; first a set of orthogonal vectors is constructed, then this set is normalized. Let {d1,d2,...,dN } denote a set of orthogonal, but not unit, vectors. This set

is constructed from {e1, e2,..., eN} as follows: Let d1= e1 and put d2 = e2 + pd1

The scalar p will be selected so that d 2 is orthogonal to d1; orthogonality of d1 and d2 requires that their inner product be zero; hence

d2 • d1 = 0 = e2 • d1 + pd1 • d1 implies P = - elA

d1 • di where d1 • d1 P 0 since d1 p 0. The vector d2 is not zero, because e1 and e2 are linearly independent. The vector d3 is defined by

d3= e3+P2d2+P1d1 The scalars P2 and P1 are determined by the requirement that d3 be orthogonal to both d1 and d2 ; thus

Chap. 3

72

VECTOR SPACES

d3 • d1 = e3 • d1 + ^d1 • d1 = 0 d3 • d2 = d3 • d2 + ^2d2 • d2 = 0 and, as a result, el = - el • d1 d1 • dj

d d

/2 = - e3 • d2 d2 • d2

The linear independence of e1,e2,and e3 requires that d3 be nonzero. It is easy to see that this scheme can be repeated until a set of N orthogonal vectors {d1, d..., d N} has been obtained. The orthonormal set is then obtained by defining

ik = dk /||dJ|, k = 1,2,.,N It is easy to see that {e1.,ek}, {d1,.,dk},and, hence, {i1,., ik} generate the same subspace for

each k.

The concept of mutually orthogonal vectors can be generalized to mutually orthogonal subspaces. In particular, if U is a subspace of an inner product space, then the orthogonal complement of U is a subset of V, denoted by U1, such that

U 1={v|v• u = 0 for all u eU}

(13.2)

The properties of orthogonal complements are developed in the following theorem.

Theorem 13. 4. If U is a subspace of V, then (a) U1 is a subspace of V and (b) V = U ® U1.

Proof. (a) If u1 and u2 are in U1 , then

u1 • v = u2 • v = 0 for all v e U . Therefore for any 21,22 e C,

Sec. 13

Orthonormal Bases, Orthogonal Complements

73

(Alui + ^2u2 ) • v = Alui • v + ^2u2 • v = 0 Thus A1u1 + A2u2 e U1.

(b) Consider the vector space U + U1. Let v e U o U1. Then by (13.2), v • v = 0, which implies v = 0. Thus U + U 1= U ® U1. To establish that V = U ® U1, let {ip...,iR}be an orthonormal basis for U . Then consider the decomposition R

R

v=/2 e1 + 3/5/2 + 2 e2

The contravariant components of the vector v are illustrated in Figure 4. To develop a geometric feeling for covariant and contravariant components, it is helpful for one to perform a vector decomposition of this type for oneself.

e2

Figure 4. The covariant and contravariant components of a vector in R2. A substantial part of the usefulness of orthonormal bases is that each orthonormal basis is self-reciprocal; hence contravariant and covariant components of vectors coincide. The self-

Chap. 3

82

VECTOR SPACES

reciprocity of orthonormal bases follows by comparing the condition (13.1) for an orthonormal basis with the condition (14.1) for a reciprocal basis. In orthonormal systems indices are written only as subscripts because there is no need to distinguish between covariant and contravariant components.

Formulas for transferring from one basis to another basis in an inner product space can be developed for both base vectors and components. In these formulas one basis is the set of linearly independent vectors {e1..... eN}, while the second basis is the set {e15..., eN}. From the fact that both bases generate V , we can write N

S Te . s

k = 1.2..... N

(14.17)

e .

q = 1.2..... N

(14.18)

e= =

s=1

and N

eq =

yT

, k

ks where Tqk k and Tks are both sets of N22 scalars that are related to one another. From Theorem 9.5 the

N x N matrices [T J and [Tqk J must be nonsingular.

Substituting (14.17) into (14.18) and replacing eq by NN

y

es. we obtain

N=1 Sqs

N

yy

y

k=1 s=1

s=1

e qqksq = T kTs e = S e

which can be rewritten as

" (N*. y | y T^kT s=1

x k=1

qks

A - ^q |e s = 0

(14.19)

)

The linear independence of the basis {es} requires that

N

y TT = S

q

k=1

(14.20)

Sec. 14.

Reciprocal Basis and Change ofBasis

83

A similar argument yields N w—
W, show that

dim R (BA) < min (dim R ( A) ,dim R (B))

16.4

Let A : V ^ U be a linear transformation and define A : V /K (A) ^ U as in Exercises 15.3. If P :V >V/K(A) is the canonical projection defined in Exercise 15.4, show that A = AP

This result along with the results of Exercises 15.3 and 15.4 show that every linear transformation can be written as the composition of an onto linear transformation and a regular linear transformation.

Sec. 17

Special Types

97

Section 17. Special Types ofLinear Transformations In this section we shall examine the properties of several special types of linear transformations. The first of these is one called an isomorphism. In section 6 we discussed group isomorphisms. A vector space isomorphism is a regular onto linear transformation A : V ^ U . It immediately follows from Theorem 15.8 that if A : V ^ U is an isomorphism, then

dimV = dimU

(17.1)

An isomorphism A : V ^ U establishes a one-to-one correspondence between the elements of V and U . Thus there exists a unique inverse function B : U >V with the property that if u= Av

(17.2)

v=B(u)

(17.3)

then

for all u e U and v e V. We shall now show that B is a linear transformation. Consider the vectors u1 and u2 eU and the corresponding vectors [as a result of (17.2) and (17.3)] v1 and

v2 e V. Then by (17.2), (17.3), and the properties of the linear transformation A,

B (Au1 + pu2) = B (A Av. + pAv2)

(

)

= B A (Av1 + pv 2) = Av1 + z/v 2

= AB (u1) + lib (u2)

Thus B is a linear transformation. This linear transformation shall be written A-1 . Clearly the linear transformation A-1 is also an isomorphism whose inverse is A ; i.e.,

(A-1)-1= A

(17.4)

LINEAR TRANSFORMATIONS

Chap. 4

98

Theorem 17. 1. If A : V ^ U and B : U ^ W are isomorphisms, then BA : V ^ W is an isomorphism whose inverse is computed by

(BA)-1 = A-1B-1

(17.5)

Proof. The fact that BA is an isomorphism follows directly from the corresponding properties of A and B . The fact that the inverse of BA is computed by (17.5) follows directly because if u=Av

and

w=Bu

then

v=A-1u

and

u=B-1w

Thus

v = (BA)-1 w = A-1B-1w

Therefore

((BA)

1

)

- A-1B-1 w = 0 for all w e W, which implies (17.5).

The identity linear transformation I: V >V is defined by (17.6)

Iv = v

for all v in V . Often it is desirable to distinguish the identity linear transformations on different vector spaces. In these cases we shall denote the identity linear transformation by IV . It follows from (17.1) and (17.3) that if A is an isomorphism, then

AA-1=IUand

A-1A = IV

Conversely, if A is a linear transformation from V toU , and if there exists a linear transformation B : U >V such that AB = IU and BA = I^, then A is an isomorphism and

(17.7)

Sec. 17

99

Special Types

B=A-1. The proof of this assertion is left as an exercise to the reader. Isomorphisms are often referred to as invertible or nonsingular linear transformations.

A vector space V and a vector space U are said to be isomorphic if there exists at least one isomorphism from V toU . Theorem 17. 2. Two finite-dimensional vector spaces V and U are isomorphic if and only if they have the same dimension.

Proof. Clearly, if V and U are isomorphic, by virtue of the properties of isomorphisms, dimV=dimU. If U and V have the same dimension, we can construct a regular onto linear transformation A :V ^ U as follows. If {e1,...,eN} is a basis for Vand {b1,...,bN} is a basis for U , define A by k=1,...,N

Aek=bk,

(17.8)

Or, equivalently, if N

v=

Ev e

* *

k =1

then define A by N

Av =

X v 'b *

*

(17.9)

k =1

A is regular because if Av = 0, then v=0. Theorem 15.10 tells us A is onto and thus is an isomorphism.

As a corollary to Theorem 17.2, we see that V and the vector space CN, where N = dimV , are isomorphic. In Section 16 we introduced the notation L(V;U)for the vector space of linear

transformations from V to U . The set L(V;V) corresponds to the vector space of linear

transformations V >V . An element of L(V; V) is called an endomorphism ofV”. This nomenclature parallels the previous usage of the word endomorphism introduced in Section 6. If

100

LINEAR TRANSFORMATIONS

Chap. 4

an endomorphism is regular (and thus onto), it is called an automorphism. The identity linear transformation defined by (17.6) is an example of an automorphism. If A e L(V;V),then it is easily seen that

(17.10)

AI = IA = A Also, if A and are inL(V;V), it is meaningful to compute the products AB and BA ;

however,

(17.11)

AB * BA

in general. For example, let V be a two-dimensional vector space with basis {e1,e2} and define A and B by the rules 2 Ae k

=

2

YA e

k j

and

Bek =

yB e

k j

j=1

j=1

where Ajk and Bjk . k,j= 1,2, are prescribed. Then 22

Y

BAe =kkjl A\Bl ,.e. and j,1=1

y

ABe. = B\A .e, kkjl j,1=1

An examination of these formulas shows that it is only for special values of Ajk and Bjkthat AB = BA.

The set L(V;V) has defined on it three operations. They are (a) addition of elements of L(V;V), (b) multiplication of an element of L(V;V) by a scalar, and (c) the product of a pair of elements of L(V;V). The operations (a) and (b) make L(V;V) into a vector space,

while it is easily shown that the operations (a) and (c) make L(V;V) into a ring. The structure

of L(V;V) is an example of an associative algebra.

Sec. 17

Special Types

101

The subset of L(V;V)that consists of all automorphisms of V is denoted by GL (V ).

It is immediately apparent that GL (V ) is not a subspace of L(V;V) , because the sum of two of its elements need not be an automorphism. However, this set is easily shown to be a group with respect to the product operation. This group is called the general linear group. Its identity element is I and if A eGL(V), its inverse is A-1 eGL(V).

A projection is an endomorphism P e L (V ;V) which satisfies the condition

P2 =P

(17.12)

The following theorem gives an important property of a projection. Theorem 17. 3. If P : V >V is a projection, then V = R (P )© K ( P )

(17.13)

Proof. Let v be an arbitrary vector in V . Let w=v-Pv

(17.14)

Then, by (17.12), Pw = Pv -P(Pv)=Pv -Pv = 0. Thus, we K (P). Since Pv e R(P), (17.14) implies that V=R(P)+K(P)

To show that R (P )n K (P ) = {0}, let u e R (P) n K (P). Then, since u e R (P) for some

v e V , u= Pv. But, since u is also inK(P),

0=Pu=P(Pv)=Pv=u

which completes the proof.

102

LINEAR TRANSFORMATIONS

Chap. 4

The name projection arises from the geometric interpretation of (17.13). Given any v e V, then there are unique vectors u e R (P) and w e K (P) such that

(17.15)

v=u+w

where

Pu = u

and

Pw = 0

(17.16)

Geometrically, P takes v and projects in onto the subspace R(P) along the subspaceK (P) .

Figure 5 illustrates this point for V= R2.

Figure 5

Given a projectionP , the linear transformation I- P is also a projection. It is easily shown that

V = R (I - P )© K (I - P )

Sec. 17

Special Types

103

and

R(I-P)=K(P), K(I-P)=R(P)

It follows from (17.16) that the restriction of P to R(P) is the identity linear transformation on the subspace R(P). Likewise, the restriction of I- P to K(P) is the identity linear transformation on K (P). Theorem 17.3 is a special case of the following theorem.

Theorem 17. 4. If Pk , k= 1,..., R, are projection operators with the properties that

Pk2 =Pk,

k=1,...,R

PkPq=0,

k^q

(17.17)

and R

i=

2P

k

(17.18)

k=1

then V = R (P1 )© R (P2 )©•••© R (PR)

(17.19)

The proof of this theorem is left as an exercise for the reader. As a converse of Theorem 17.4, if V has the decomposition

©•••© ;

v = v1 vR

(17.20)

then the endomorphisms Pk : V >V defined by

Pkv= vk,

k= 1,..., R

(17.21)

104

Chap. 4

LINEAR TRANSFORMATIONS

where v = v12 + v, +---- + v„R

(17.22)

are projections and satisfy (17.18). Moreover, Vk = R(Pk), k = 1,...,R.

Exercises 17.1

17.2 17.3 17.4

Let A : V ^ U and B: U >V be linear transformations. If AB = I, then B is the right inverse of A . If BA = I, then B is the left inverse ofA . Show that A is an isomorphism if and only if it has a right inverse and a left inverse. Show that if an endomorphism A of V commutes with every endomorphism B of V , then A is a scalar multiple of I. Prove Theorem 17.4 An involution is an endomorphism L such that L2 = I. Show that L is an involution if and only if P = 2 (L +I) is a projection.

17.5

Consider the linear transformation A

V,1

: V1 ^ U defined in Exercise 15.2. Show that if

V = V1 © K (A), then A |V,1 is am isomorphism from V 1to R (A).

17.6

If A is a linear transformation from V to U where dimV = dimU, assume that there exists a linear transformation B : U >V such that AB = I (or BA = I). Show that A is an isomorphism and that B= A-1

Sec. 18

105

Adjoint

Section 18. The Adjoint of a Linear Transformation The results in the earlier sections of this chapter did not make use of the inner product structure onV . In this section, however, a particular inner product is needed to study the adjoint of a linear transformation as well as other ideas associated with the adjoint.

Given a linear transformation A : V ^ U, a function A*: U >V is called the adjoint of

A if

( )

u •( Av ) = A*u • v

(18.1)

for all v e V and u e U. Observe that in (18.1) the inner product on the left side is the one in U , while the one for the right side is the one in V . Next we will want to examine the properties of the adjoint. It is probably worthy of note here that for linear transformations defined on real inner product spaces what we have called the adjoint is often called the transpose. Since our later applications are for real vector spaces, the name transpose is actually more important. Theorem 18.1. For every linear transformation A : V ^ U, there exists a unique adjoint A*: U >V satisfying the condition (18.1).

Proof. Existence. Choose a basis {e1,...,eN} for V and a basis {b1,...,bM}for U . Then A can be characterized by the M x N matrix fAak J in such a way that

M

Ae t =

£A

a„

k = 1,..., N

b„,

a=1

This system suffices to define A since for any v e V with the representation N

v=

2v e

k k

k=1

the corresponding representation of Av is determined by P Aak 1 and P vk 1 by

(18.2)

LINEAR TRANSFORMATIONS

Chap. 4

106

M

Av =

(

a

a=1

{

}

{

N

y| y A k v

k

y k=1

\ Ib J

}

Now let e1,..., eN and b1,..., bM be the reciprocal bases of {e1,..., eN} and {bp..., bM}, respectively. We shall define a linear transformation A* by a system similar to (18.2) except that we shall use the reciprocal bases. Thus we put N

A*ba =

yA

(18.3)

e

ka k

k=1

where the matrix [ A *ka ] is defined by

Aa AAk a = -A k

(18.4)

for all a = 1,..., M and k = 1,..., N. For any vector u eU the representation of A*u relative to e1,...,eN is then given by

{

}

NM

y|y y a *

a*u=

k=1

a=1

u

J

C defined by

X

trA= N Akk

(19.8)

k=1

It easily follows from (19.7) and (14.21) that trA is independent of the choice of basis ofV . Later we shall give a definition of the trace which does not employ the use of a basis. If A e L(V;U), then A* e L(U;V). The components of A*are obtained by the same logic as was used in obtaining (19.1). For example, N

A*b“ =

XA

ak

(19.9)

k=1

where the MN scalars A *ka (k = 1,..., N ;a = 1,..., M) are the components of A* with respect to

{b1,..., bM}and {e1,...,eN} .

From the proof of Theorem 18.1 we can relate these components of

A* to those of A in (19.1); namely,

120

LINEAR TRANSFORMATIONS

Chap. 4

Asa = Aas

(19.10)

If the inner product spaces U and V are real, (19.10) reduces to

(19.11)

ATa = Aas

By the same logic which produced (19.1), we can also write M

Ae k =

XA

(19.12)

ak b“

and MM

X A k b =X A k b

Ae k =

a

a

a

a=1

(19.13)

a

a=1

If we use (14.6) and (14.9), the various components of A are related by NMN

X

Aak =

s=1

A'

=

M

XX b^ **

=

ft=1 s=1

XA

ft^

ft=1

(19.14)

where

baft = b,,-b ft = bft,

baft = ba • bft = b

and

(19.15)

A similar set of formulas holds for the components of A*. Equations (19.14) and (19.10) can be used to obtain these formulas. The transformation rules for the components defined by (19.12) and (19.13) are easily established to be N -----

M

yy sa'A

Ak=( Ae k )• k a a=

a

jj

ft=1 j=1

ft

T

(19.16)

Sec. 19

Component Formulas

121

M

)

Aak = ( Aek • ba =

___

N

XX SaA p

T

(19.17)

Pj j

p=1 j=1

and N -----

M

( )

SS S

Aak = Aek • ba =

-----

(19.18)

A

a PT

p=1 j=1

where

( )

(19.19)

A Pj = Aej • b p

(

(19.20)

( ) f

(19.21)

-Aa = Ae, • ba

)

and

A‘ = Ae j • b

a

a

The quantities Sba introduced in (19.16) and (19.18) above are related to the quantities S a by formulas like (14.20) and (14.21).

Exercises 19.1 19.2 19.3

Express (18.20), written in the form A*A = I, in components. Verify (19.14) Establish the following properties of the trace of endomorphisms of V :

trI = dimV tr(A+B)=trA+trB tr ( 2A ) = 2trA

tr AB = trBA

122

LINEAR TRANSFORMATIONS

Chap. 4 and

trA* = trA

19.4

If A,B e L(V;V), where V is an inner product space, define A • B = tr AB*

Show that this definition makes L(V;V) into an inner product space.

19.5 19.6 19.7

19.8

In the special case when V is a real inner product space, show that the inner product of Exercise 19.4 implies that A (V ;V ) ± S (V ;V) . Verify formulas (19.16)-(19.18). Use the results of Exercise 14.6 along with (19.14) to derive the transformation rules (19.16)-(19.18) Given an endomorphism A e L(V;V) , show that

tr A = ±Ak k=1

( )

where Akq = Aeq • ek . Thus, we can compute the trace of an endomorphism from (19.8)

or from the formula above and be assured the same result is obtained in both cases. Of course, the formula above is also independent of the basis ofV . 19.9

Exercise 19.8 shows that trAcan be computed from two of the four possible sets of components of A . Show that the quantities

^

:1 Akk and

N

£

N

Akk are not equal to each

other in general and, moreover, do not equal tr A . In addition, show that each of these quantities depends upon the choice of basis of V . 19.10 Show that

(

)

A *ka = A*ba • ek = Aak

A*ka

and

(

=

)

ba • ek = Aak

A*

Sec. 19



123

Component Formulas

(

)

A'ka = A'ba • ek = Aak

Chapter 5 DETERMINANTS AND MATRICES In Chapter 0 we introduced the concept of a matrix and examined certain manipulations one can carry out with matrices. In Section 7 we indicated that the set of 2 by 2 matrices with real numbers for its elements forms a ring with respect to the operations of matrix addition and matrix multiplication. In this chapter we shall consider further the concept of a matrix and its relation to a linear transformation.

Section 20. The Generalized Kronecker Deltas and the Summation Convention Recall that a M x N matrix is an array written in anyone of the forms A11

A11

A1 N A =[ Aaj ],

A=

AM1

AM N A

A1N

A=

A

AM1

(20.1) A

A1

A =[ Aa],

A=

•••

AN = [ Aaj ]

A =• •

A

AM1

A

A

Throughout this section the placement of the indices is of no consequence. The components of the matrix are allowed to be complex numbers. The set of M x N matrices shall be denoted byMMxN . It is an elementary exercise to show that the rules of addition and scalar multiplication insure that M MxN is a vector space. The reader will recall that in Section 8 we used the set M MxN as one of several examples of a vector space. We leave as an exercise to the reader the fact that the dimension of M MxN isMN . In order to give a definition of a determinant of a square matrix, we need to define a permutation and consider certain of its properties. Consider a set of K elements {a1,..., aK}. A

permutation is a one-to-one function from {a1,...,aK} to{a1,...,aK }. If r is a permutation, it is customary to write

125



Chap 5

126

DETERMINANTS AND MATRICES

a1

a2



aK

a(ai)

a(a2)



^(aK )

(20.2)

It is a known result in algebra that permutations can be classified into even and odd ones: The permutation a in (20.2) is even if an even number of pairwise interchanges of the bottom row is required to order the bottom row exactly like the top row, and a is odd if that number is odd. For a given permutation a, the number of pairwise interchanges required to order the bottom row the same as the top row is not unique, but it is proved in algebra that those numbers are either all even or all odd. Therefore the definition fora to be even or odd is meaningful. For example, the permutation

2 3X 1 3J

is odd, while the permutation f1 a=I I2

2 3

3

1

is even.

a

The parity of a, denoted by s , is defined by +1 if a is an even permutation

sa =
N-M > 0, since R(A) is a subspace of U and M = dimU.

If in (23.1) and (23.3) M = N, the system (23.1) has a solution for all uk if and only if (23.3) has the trivial solution v1 = v2 = ••• = vN = 0 only. For, in this circumstance, A is regular

Systems ofLinear Equations

Sec. 23

143

and thus invertible. This means det |_ Aak J ^ 0, and we can use (21.19) and write the solution of

(23.1) in the form

vj=

N 1 (cof Aaj) ua det [ Aak J a=1

^

(23.4)

which is the classical Cramer's rule and is to N dimension of equation (0.32).

Exercise 23.1

Given the system of equations (23.1), the augmented matrix of the system is the matrix obtained from [Aak J by the addition of a co1umn formed from u’,...,uN . Use Theorem 23.1 and prove that the system (23.1) has a solution if and only if the rank of [AakJ equals the rank of the augmented matrix.

Chapter 6

SPECTRAL DECOMPOSITIONS In this chapter we consider one of the more advanced topics in the study of linear transformations. Essentially we shall consider the problem of analyzing an endomorphism by decomposing it into elementary parts.

Section 24. Direct Sum ofEndomorphisms If Ais an endomorphism of a vector space V , a subspace V1 of V is said to be A invariant if A maps V1 to V1 . The most obvious example of an A -invariant subspace is the null space K(A). Let A1,A2,...,AL be endomorphisms of V ; then an endomorphism A is the direct sum of A1,A2,...,AL if (24.1)

A = A1 + A2 + ••• + A L and

Ai A j = 0

i*j

(24.2)

For example, equation (17.18) presents a direct sum decomposition for the identity linear transformation.

Theorem 24. 1. If A e L (V ;U) and V = V11 ® V2 ® •••® VL, where each subspace Vj is A invariant, then A has the direct sum decomposition (24.1), where each Ai is given by

Aivi=Avi

(24.3)1

Aivj=0

(24.3)2

for all vie Vi and

145

SPECTRAL DECOMPOSITIONS

Chap 6

146

For all v j ^Vj, i ^ j , for i = 1,..., L . Thus the restriction of A to Vj coincides with that of A j; further, each Vi is Aj -invariant, for all j= 1,..., L.

Proof. Given the decomposition V = V1 ® V2 ®---®VL, then v e V has the unique representation v = v1 +-----+ vL, where each vi e Vi . By the converse of Theorem 17.4, there exist L projections Pj,...,PL (19.18) and also vj e Pj(V) . From (24.3), we have Aj = APj, and since Vj is A invariant, APj = PjA . Therefore, if i ^ j ,

AiA iP =0 ij= APiAP i j= AAP ij where (17.17)2 has .been used. Also

A1 + A 2 + ••• + A L = AP1 + AP2 + ••• + APL = A(P1 + P2 +••• + Pl ) =A

where (17.18) has been used. When the assumptions of the preceding theorem are satisfied, the endomorphism Ais said to be reduced by the subspaces V1,...,VL . An important result of this circumstance is contained in the following theorem.

Theorem 24. 2. Under the conditions of Theorem 24.1, the determinant of the endomorphism A is given by det A = det A1 det A 2 ••• det A L

(24.4)

where Ak denotes the restriction of Ato Vk for all k= 1,..., L.

The proof of this theorem is left as an exercise to the reader.

Exercises 24.1

Assume that A eL(V;V) is reduced by V1,...,VL and select for the basis of V the union of some basis for the subspaces V1,...,VL . Show that with respect to this basis M(A) has the following block form:

Sec. 24



Direct Sum ofEndomorphisms

147

0 (24.5)

M(A)= 0

AL

where Aj is the matrix of the restriction of A to Vj . 24.2 24.3

Use the result of Exercise 24.1 and prove Theorem 24.2. Show that if V = V1 ®V2 ®---®VL, then A e L (V ;V) is reduced by V1,..., VL if and only if APj= PjA for j= 1,..., L, where Pj is the projection on Vj given by (17.21).

148

SPECTRAL DECOMPOSITIONS

Chap 6

Section 25 Eigenvectors and Eigenvalues Given an endomorphism A e L(V;V), the problem of finding a direct sum decomposition of Ais closely related to the study of the spectral properties of A . This concept is central in the discussion of eigenvalue problems.

A scalar 2 is an eigenvalue of A e L (V;V) if there exists a nonzero vector v eV such that

Av = 2v

(25.1)

The vector v in (25.1) is called an eigenvector of A corresponding to the eigenvalue 2 . Eigenvalues and eigenvectors are sometimes refered to as latent roots and latent vectors, characteristic roots and characteristic vectors, and proper values and proper vectors. The set of all eigenvalues of A is the spectrum of A, denoted by rr( A). For any 2 e rr( A), the set V (2) = {v e V\ Av = 2v}

is a subspace of V , called the eigenspace or characteristic subspace corresponding to 2 . The geometric multiplicity of 2 is the dimension of V (2).

Theorem 25. 1. Given any 2 e ct(A) , the corresponding eigenspace V(2) is A -invariant.

The proof of this theorem involves an elementary use of (25.1). Given any 2 e ct(A) , the restriction of v to V (2), denoted byAV ( ), has the property that

2

AV (2)u

= 2u

(25.2)

2

for all u in V (2). Geometrically, AV ( ) simply amplifies u by the factor 2. Such linear transformations are often called dilatations. Theorem 25. 2. A e L(V;V) is not regular if and only if 0 is an eigenvalue of A ; further, the corresponding eigenspace is K(A).

Sec. 25

Eigenvectors and Eigenvalues

149

The proof is obvious. Note that (25.1) can be written as (A - 2I)v = 0, which shows that

(25.3)

V (2) = K (A - 2I)

Equation (25.3) implies that 2 is an eigenvalue of A if and only if A-2I is singular. Therefore, by Theorem 22.2,

2 g ct(A) ^ det(A - 2I) = 0

(25.4)

The polynomial f (2) of degree N = dimV defined by

f(2)= det(A - 2I)

(25.5)

is called the characteristic polynomial of A . Equation (25.5) shows that the eigenvalues of A are roots of the characteristic equation

(25.6)

f(2)=0

Exercises 25.1

Let A be an endomorphism whose matrix M(A) relative to a particular basis is a triangular matrix, say

A11 A12

A1 N A

M(A)=

0

ANN

What is the characteristic polynomial of A ? Use (25.6) and determine the eigenvalues of A.

25.2

Show that the eigenvalues of a Hermitian endomorphism of an inner product space are all real numbers and the eigenvalues of a skew-Hermitian endomorphism are all purely imaginary numbers (including the number 0= i0).

150

Chap 6



SPECTRAL DECOMPOSITIONS

25.3

Show that the eigenvalues of a unitary endomorphism all have absolute value 1 and the eigenvalues of a projection are either 1 or 0.

25.4

Show that the eigenspaces corresponding to different eigenvalues of a Hermitian endomorphism are mutually orthogonal.

25.5

If B e L (V ;V) is regular, show that B-1AB and A have the same characteristic equation. How are the eigenvectors of B-1AB related to those of A ?

Sec. 26

The Characteristic Polynomial

151

Section 26 The Characteristic Polynomial In the last section we found that the eigenvalues of A e L(V; V) are roots of the characteristic polynomial

f (2) = 0

(26.1)

f (2) = det(A - 2I)

(26.2)

where

If {e1,..., e2} is a basis for V, then by (19.6)

Aek= Ajkej

(26.3)

Therefore, by (26.2) and (21.11),

f 2) = 71!5''N (AN - A^) - (ANN - ^-)

(26.4)

If (26.4) is expanded and we use (20.11), the result is

f (2) = (-2)N + ^ (-2)N-1 + ••• + a—1 (-2) + a

(26.5)

where

= j! ^i'1:J;A',i ••• j

(26.6)

Since f (2) is defined by (26.2), the coefficients p.j, j = 1,...,N, are independent of the choice of basis for V. These coefficients are called the fundamental invariants of A . Equation (26.6) specializes to

152

SPECTRAL DECOMPOSITIONS

Chap 6

Pi = tr A,

}

{

p2 = 2 (trA)2 - tr A2 ,

and

ll = det A

(26.7)

where (21.4) has been used to obtain (26.7)2. Since f (A) is a Nth degree polynomial, it can be factored into the form

f (A) = (A1- A)d1 (A2 - A)d2... (Al - A)dL

(26.8)

where A1,...,AL are the distinct foots of f(A) = 0 and d1,..., dL are positive integers which must

satisfy

^

L==1

dj = N. It is in writing (26.8) that we have made use of the assumption that the scalar

field is complex. If the scalar field is real, the polynomial (26.5), generally, cannot be factored. In general, a scalar field is said to be algebraically closed if every polynomial equation has at least one root in the field, or equivalently, if every polynomial, such as f (A), can be factored into the form (26.8). It is a theorem in algebra that the complex field is algebraically closed. The real field, however, is not algebraically closed. For example, if A is real the polynomial equation f(A) = A2 +1=0 has no real roots. By allowing the scalar fields to be complex, we are assured that every endomorphism has at least one eigenvector. In the expression (26.8), the integer dj is

called the algebraic multiplicity of the eigenvalue Aj . It is possible to prove that the algebraic multiplicity of an eigenvalue is not less than the geometric multiplicity of the same eigenvalue. However, we shall postpone the proof of this result until Section 30.

An expression for the invariant pj can be obtained in terms of the eigenvalues if (26.8) is expanded and the results are compared to (26.5). For example, p1=trA =d1A1+d2A2+...+dLAL

(26.9) pN=detA=A1d1A2d2...ALdL

The next theorem we want to prove is known as the Cayley-Hamilton theorem. Roughly speaking, this theorem asserts that an endomorphism satisfies its own characteristic equation. To make this statement clear, we need to introduce certain ideas associated with polynomials of endomorphisms. As we have done in several places in the preceding chapters, if A e L(V;V), A 2 is defined by

A2 = AA

Sec. 26

The Characteristic Polynomial

153

Similarly, we define by induction, starting from

A0 = I and in general

A k = AA k-1 = A k-1A where k is any integer greater than one. If k and l are positive integers, it is easily established by induction that

AkAl= AlAk= Al+k

(26.10)

Thus Ak and A l commute. A polynomial in A is an endomorphism of the form

g (A) = a0AM + a1AM-1 + ••• + aM-1 A + aM I

(26.11)

where M is a positive integer and a0,...,aM are scalars. Such polynomials have certain of the properties of polynomials of scalars. For example, if a scalar polynomial

g (1) = a0

+ a1

-1

+•••+ aM-11+ aM

can be factored into

g(1) = a0( 1 - n1)( 1 - n) •••(1 - nM) then the polynomial (26.11) can be factored into

g(A) = ao(A - n1I)(A - n2I) •••(A - nM I)

(26.12)

The order of the factors in (26.12) is not important since, as a result of (26.10), the factors commute. Notice, however, that the product of two nonzero endomorphisms can be zero. Thus the formula

154

Chap 6

SPECTRAL DECOMPOSITIONS

g1(A)g2(A) = 0

(26.13)

for two polynomials g1 and g2 generally does not imply one of the factors is zero. For example, any projection P satisfies the equation P(P-I) = 0 , but generally P and P- I are both nonzero.

Theorem 26.1. (Cayley-Hamilton). If f (2) is the characteristic polynomial (26.5) for an endomorphism A , then

f (A) = (-A) N + ^ (-A)N-1 + ••• + n—1 (-A) + n I = 0

(26.14)

Proof. : The proof which we shall now present makes major use of equation (0.29) or, equivalently, (21.19). If adj(A - 2I) is the endomorphism whose matrix is adj [Apq - 25 q ], where [ Ap q ] = M (A), then by (26.2) and (0.29) (see also Exercise 40.5)

(adj(A-2I))(A-2I)=f(2)I

(26.15)

By (21.18) it follows that adj(A -2I) is a polynomial of degree N - 1 in 2. Therefore

adj(A - 2I) = B0(-2)N-1+ B1(-2)N-2+ • • • + BN-2(-2) + BN-1

(26.16)

where B0,..., BN-1 are endomorphisms determined by A . If we now substitute (26.16) and (26.15) into (26.14) and require the result to hold for all 2, we find B0=I

B0A + B1 = //J B1A + B2 = ^I • • •

B N-2A + B N-1 = U-1I

B N-1A = "I

(26.17)

The Characteristic Polynomial

Sec. 26

155

Now we multiply (26.17)1 . by (-A)N , (26.17)2 by (-A)N 1, (26.17)3 by (-A)N 2 ,...,(26.17)kby (-A)N-k+1 , etc., and add the resulting N equations, to find

(-A)N + M1 (-A)N-1 + • • • + MN-1 (-A) + MN I = 0

(26.18)

which is the desired result.

Exercises 26.1

If N = dimV is odd, and the scalar field is R , prove that the characteristic equation of any A e L V;V) has at least one eigenvalue.

26.2

If N = dimV is odd and the scalar field is R , prove that if detA >0(V is

f(2) =(-1)-N2N-L(1-2)

where N = dimV and L = dimP(V) .

26.6

If A e L(V;V) is regular, express A-1 as a polynomial in A .

26.7

Prove directly that the quantity Mj defined by (26.6) is independent of the choice of basis for V .

26.8

Let C be an endomorphism whose matrix relative to a basis {e1,...,eN} is a triangular matrix of the form

SPECTRAL DECOMPOSITIONS

Chap 6

156

0

10 • 010

0

(26.19)

M(C) = 0 01

0

0

i.e. C maps e1 to 0 , e2 to e1, e3 to e2,.... Show that a necessary and sufficient condition for the existence of a component matrix of the form (26.19) for an endomorphism C is that

CN=0

but

CN-1 * 0

(26.20)

We call such an endomorphism C a nilcyclic endomorphism and we call the basis {e1,...,eN} for the form (26.19) a cyclic basis for C . 26.9

If $,..., $N denote the fundamental invariants of A-1, use the results of Exercise 26.4 to show that

0j = (det A)//-.—j

for j=1,...,N. 26.10 It follows from (26.16) that

BN-1 = adjA Use this result along with (26.17) and show that

adj A = (-A)N-1 + m (—A)N—2 + • • • + MN—1I 26.11 Show that MN—1 = tr(adj A)

Sec. 26

The Characteristic Polynomial

157

and, from Exercise (26.10), that

{

}

-A)N 1 + ^1 tr(-A)N 2 + ■ ■ ■ + ^N-2 tr(-A) Nt--rW 1

.LlN-1 =----

158

SPECTRAL DECOMPOSITIONS

Chap 6

Section 27. Spectral Decomposition for Hermitian Endomorphisms An important problem in mathematical physics is to find a basis for a vector space V in which the matrix of a given A e L(V;V) is diagonal. If we examine (25.1), we see that if V has a basis of eigenvectors of A , then M(A) is diagonal and vice versa; further, in this case the diagonal elements of M(A) are the eigenvalues of A . As we shall see, not all endomorphisms have matrices which are diagonal. Rather than consider this problem in general, we specialize here to the case where A ,.then M (A) is Hermitian and show that every Hermitian endomorphism has a matrix which takes on the diagonal form. The general case will be treated later, in Section 30. First, we prove a general theorem for arbitrary endomorphisms about the linear independence of eigenvectors. Theorem 27.1. If X,...,XL are distinct eigenvalues of A eL(V;V) and if u1,...,uL are

eigenvectors corresponding to them, then {u1,..., uL} form a linearly independent set.

Proof. If {u1,..., uL} is not linearly independent, we choose a maximal, linearly independent subset, say {u1,..., uS}, from the set {u1,..., uL} ; then the remaining vectors can be expressed

uniquely as linear combinations of {u1,...,uS}, say

uS+11 = a11 ,u, +----- + aSSS uS

(27.1)

where a1,...,aS are not all zero and unique, because {u1,..., uS} is linearly independent. Applying

A to (27.1) yields

Xs+1u s+1 = (aX)u1 + ••• + (asXs )u s

(27.2)

Now if XS+1 = 0, then X1,...,XS are nonzero because the eigenvalues are distinct and (27.2)

contradicts the linear independence of {u1,..., u s}; on the other hand, if Xs+1 ^ 0, then we can divide (27.2) by Xs+1 , obtaining another expression of us+1 as a linear combination of {u1,..., us}

contradicting the uniqueness of the coefficients a1,...,as . Hence in any case the maximal linearly independent subset cannot be a proper subset of {u1,..., uL} ; thus {u1,..., uL} is linearly

independent.

Sec. 27

159

Hermitian Endomorphisms

As a corollary to the preceding theorem, we see that if the geometric multiplicity is equal to the algebraic multiplicity for each eigenvalue of A , then the vector space V admits the direct sum representation

v

= v (A1) ©v(A2) ©•••©r (AL)

where A1,..., AL are the distinct eigenvalues of A . The reason for this representation is obvious, since the right-hand side of the above equation is a subspace having the same dimension as V ; thus that subspace is equal to V . Whenever the representation holds, we can always choose a basis of V formed by bases of the subspaces V (A1),...,V (AL) . Then this basis consists entirely of eigenvectors of A becomes a diagonal matrix, namely,

A1

0

A2

• •

M (A)=

(27.3)1

A2

• • AL

0

• •

AL

where each AK is repeated dk times, dk being the algebraic as well as the geometric multiplicity of AK . Of course, the representation of V by direct sum of eigenspaces of A is possible if A has

N = dimV distinct eigenvalues. In this case the matrix of A taken with respect to a basis of eigenvectors has the diagonal form

160

SPECTRAL DECOMPOSITIONS

Chap 6

A

0

M(A)=

(27.3)2

0 ^

If the eigenvalues of v are not all distinct, then in general the geometric multiplicity of an eigenvalue may be less than the algebraic multiplicity. Whenever the two multiplicities are different for at least one eigenvalue of A , it is no longer possible to find any basis in which the matrix of A is diagonal. However, if V is an inner product space and if A is Hermitian, then a diagonal matrix of A can always be found; we shall now investigate this problem.

Recall that if u and v are arbitrary vectors in V, the adjoint A* of A e L(V;V) is defined by

Au • v = u • A*v

(27.4)

As usual, if the matrix of A is referred to a basis {ek} , then the matrix of A* is referred to the

{ }

reciprocal basis ek and is given by [cf. (18.4)]

M(A*) = M(A)T

(27.5)

where the overbar indicates the complex conjugate as usual. If A Hermitian, i.e., if A= A* , then (27.4) reduces to Au • v =u •Av for all u,ve V .

Theorem 27.2. The eigenvalues of a Hermitian endomorphism are all real. Proof. Let A e LV;V) be Hermitian. Since Au = Au for any eigenvalue A, we have

(27.6)

Sec. 27

Hermitian Endomorphisms

161

2 = Au-u u•u

(27.7)

Therefore we must show that Au • u is real or, equivalently, we must show Au • u = Au • u . By (27.6)

Au • u =u •Au =Au •u

(27.8)

where the rule u • v = v • u has been used. Equation (27.8) yields the desired result.

Theorem 27. 3. If A is Hermitian and if V1 is an A -invariant subspace of V, then V11 is also A invariant .

Proof. If v e V1 and u e V11, then Av • u = 0 because Av e V1 . But since A is Hermitian, Av • u = v • Au = 0 . Therefore, Au e Vf1, which proves the theorem.

Theorem 27. 4. If A is Hermitian, the algebraic multiplicity of each eigenvalue equa1s the geometric multiplicity. Proof Let V (20) be the characteristic subspace associated with an eigenvalue 20 . Then the

geometric multiplicity of 20 is M = dimV (20) . By Theorems 13.4 and 27.3

V = V (2o) -V(Ao)1

Where V (20) and V (20)1 are A -invariant. By Theorem 24.1

A=A1+A2 and by Theorem 17.4 I= P1+P2

(27.9)

162



Chap 6

SPECTRAL DECOMPOSITIONS

where P1 projects V onto V(A00), P2 projects V onto V(^0)x, A1 = AP1, and A2 = AP2. By Theorem 18.10, P1 and P2 are Hermitian and they also commute with A . Indeed, for any v e V,

P1v VK(20) and P2v e V(20)1, and thus AP1v e V(20) and AP2v e V(^J1. But since

Av=A(P1+P2)v=AP1v+AP2v

we see that AP1v is the V(20) component of Av and AP2vis the V(^o)1 component of Av . Therefore P1Av = AP1v,

P2Av = AP2v

P1A = AP1,

P2A = AP2

for all v e V, or, equivalently

Together with the fact that P1 , and P2 are Hermitian, these equations imply that A1 and A2 are also Hermitian. Further, A is reduced by the subspaces V (20) and V (^0)1, since

A1A2= AP1AP2= A2P1P2 Thus if we select a basis {e1,..., eN} such that {e1,..., eM} span V (20) and {eM+1,..., eN} span V (^0)1, then by the result of Exercise 24.1 the matrix of A to {e k } takes the form

A11

A1 M A

AM1

A

M(A) =

AM+1

AM+1 M+1

0

M+1

Sec. 27

163

Hermitian Endomorphisms

and the matrices of A, and A2 are

0

M(A1) =

(27.10)

which imply det(A - 21) = det(A; - 2Im)) det(A2 -

By (25.2), A1 = Vm); thus by (21.21)

)

164

SPECTRAL DECOMPOSITIONS

Chap 6

det(A - 2I) = (20 - 2)M det(A2 - 2Ir()

On the other hand, 20 is not an eigenvalue of A2. Therefore

det(A 2 — 2I, () * 0 Hence the algebraic multiplicity of 20 equals M, the geometric multiplicity.

The preceding theorem implies immediately the important result that V can be represented by a direct sum of eigenspaces of A if A is Hermitian. In particular, there exists a basis consisting entirely in eigenvectors of A , and the matrix of A relative to this basis is diagonal. However, before we state this result formally as a theorem, we first strengthen the result of Theorem 27.1 for the special case that A is Hermitian.

Theorem 27. 5. If A is Hermitian, the eigenspaces corresponding to distinct eigenvalues 21 and 22 are orthogonal.

Proof. Let Au1= 21u1 and Au2= 22u2. Then

21u1 • u2 = Au1 • u2 = u1 • Au2 = 22u1 • u2

Since 21 ^ 22, u1 • u2 = 0, which proves the theorem.

The main theorem regarding Hermitian endomorphisms is the following.

Theorem 27. 6. If A is a Hermitian endomorphism with (distinct) eigenvalues 21,22,...,2N , then V has the representation V = Vf2) ©V (22) ©•••©^(2l )

where the eigenspaces V(2k ) are mutually orthogonal.

(27.11)

Sec. 27

165

Hermitian Endomorphisms

The proof of this theorem is a direct consequence of Theorems 27.4 and 27.5 and the remark following the proof of Theorem 27.1.

Corollary (Spectral Theorem). If A is a Hermitian endomorphish with (distinct) eigenvalues X1,...,XL then

N

A=

EX P j

j

(27.12)

j=1

where Pj is the perpendicular projection of V onto V (Xj) , for j = 1,..., L .

Proof. By Theorem 27.6, A has a representation of the form (24.1). Let u be an arbitrary element of V , then, by (27.11),

u = u1 +----- + u L

(27.13)

where uj e V(Xj) . By (24.3), (27.13), and (25.1)

Aju= Ajuj=Auj=Xjuj

But uj= Pju; therefore

Aj= XjPj

(no sum)

which, with (24.1) proves the corollary.

The reader is reminded that the L perpendicular projections satisfy the equations L

EP

=I

(27.14)

Pj2 = Pj

(27.15)

j

j=1

166

SPECTRAL DECOMPOSITIONS

Chap 6

(27.16)

Pj=P*j and

i*j

PjPi = 0

(27.17)

These equations follow from Theorems 14.4, 18.10 and 18.11. Certain other endomorphisms also have a spectral representation of the form (27.12); however, the projections are not perpendicular ones and do not obey the condition (27.17).

Another Corollary of Theorem 27.6 is that if A is Hermitian, there exists an orthogonal basis for V consisting entirely of eigenvectors of A . This corollary is clear because each eigenspace is orthogonal to the other and within each eigenspace an orthogonal basis can be selected. (Theorem 13.3 ensures that an orthonormal basis for each eigenspace can be found.) With respect to this basis of eigenvectors, the matrix of A is clearly diagonal. Thus the problem of finding a basis for V such that M(A) is diagonal is solved for Hermitian endomorphisms.

If f (A) is any polynomial in the Hermitian endomorphism, then (27.12),(27.15) and (27.17) can be used to show that L

^ f (2 )P

f (A) =

j

j

(27.18)

j=1

where f (2) is the same polynomial except that the variable A is replaced by the scalar 2. For example, the polynomial P2 has the representation N A

2=

Z2

2Pj

j=1

In general, f(A) is Hermitian if and only if f (2j) is real for all j = 1,..., L . If the eigenvalues of

A are all nonnegative, then we can extract Hermitian roots of A by the following rule: N

A1/k = Z21 j/kPj

(27.19)

j=1

(

)

where 2’7k > 0. Then we can verify easily that A1/k

= A . If A has no zero eigenvalues, then

Sec. 27

Hermitian Endomorphisms

167

1

L

(27.20)

A-1 = 2 — P;j j=1

Aj

which is easily confirmed. A Hermitian endomorphism A is defined to be

positive definite positive semidefinite > if v • Av ■ 1 negative semidefinite negative definite

> 0' >0 oi


0

1 and mk= gkak , then the

polynomials gk, gk2,..., gk(ak -1) are proper divisors of mk . Hence by Theorem 30.3

V(Ak) = K(gk(A)) o K(g2(A)) o---o K(gkak-1)(A)) o K(mk(A))

(30.43)

where the inclusions are strictly proper and the dimensions of the subspaces change by at least one in each inclusion. Thus (30.41) holds.

The second part of the theorem can be proved as follows: If Ak is a simple root of m , then mk= gk, and, thus K(gk(A)) = K(gk2(A)) = V(Ak) . On the other hand, if Ak is not a simple root

of m , then mk is at least of second degree. In this case gk and gk2 are both divisors of m . But

since gk is also a proper divisor of gk2, by Theorem 30.3, K(gk(A)) is strictly a proper subspace of K(gk2(A)), so that (30.42) cannot hold, and the proof is complete.

The preceding three theorems show that for each eigenvalue Ak of A , generally there are two nonzero A -invariant subspaces, namely, the eigenspace V(Ak) and the subspace K(mk(A)) . For definiteness, let us call the latter subspace the characteristic subspace corresponding to Ak and denote it ‘by the more compact notation U(Ak) . Then V(Ak) is a subspace of U(Ak) in general, and the two subspaces coincide if and only if Ak is a simple root of m . Since Ak is the only eigenvalue of the restriction of A to U(Ak), by the Cayley-Hamilton theorem we have also

U(Ak)=K((A-AkI)dk)

(30.44)

192

SPECTRAL DECOMPOSITIONS

Chap 6

where dk is the algebraic multiplicity of Ak, which is also the dimension of U(Ak). Thus we can determine the characteristic subspace directly from the characteristic polynomial of A by (30.44). Now if we define Pk to be the projection on U(Ak) in the direction of the remaining

U(Aj), j * k , namely L

R(Pk) = U(Ak), K(Pk) = © U(Aj)

(30.45)

j=1 j*k

and we define Bk to be A- AkPk on U(Ak) and 0 on U(Aj), j * k, then A has the spectral

decomposition by a direct sum L

E

A=

j=1

(30.46)

B j)

j■+

where

2=Pj PB = B.P. = Bj jj jj P

L

j=1,...,L

(30.47)

Ba = 0, 1 < aj < dj

Pj■ Pk

=

0

PjBk=BkPj=0L B j B k = 0,

j*k, j,k=1,...,L

(30.48)

,

In general, an endomorphism B satisfying the condition

Ba =0

(30.49)

for some power a is called nilpotent. From (30.49) or from Theorem 30.9 the only eigenvalue of a nilpotent endomorphism is 0, and the lowest power a satisfying (30.49) is an integer a,1 < a< N, such that tN is the characteristic polynomial of B and ta is the minimal polynomial of B . In view of (30.47) we see that each endomorphism Bj in the decomposition (30.46) is nilpotent and can be

Sec. 30

Arbitrary Endomorphisms

193

regarded also as a nilpotent endomorphism on U(2j). In order to decompose A further from (30.46) we must determine a spectral decomposition for each Bj . This problem is solved in

general as follows. First, we recall that in Exercise 26.8 we have defined a nilcylic endomorphism C to be a nilpotent endomorphism such that

but

CN = 0

CN-1 * 0

(30.50)

Where N is the dimension of the underlying vector space V . For such an endomorphism we can find a cyclic basis {e1,...,eN} which satisfies the conditions

CN-1e1= 0,

CN-1e2=e1,...,CeN=eN-1

(30.51)

CN-1eN =e1,

CN-2e2=e2,...,CeN=eN-1

(30.52)

or, equivalently

so that the matrix of C takes the simple form (26.19). Indeed, we can choose eN to be any vector

such that CN-1eN ^ 0; then the set {e1,...,eN} defined by (30.52) is linearly independent and thus forms a cyclic basis for C . Nilcyclic endomorphisms constitute only a special class of nilpotent endomorphisms, but in some sense the former can be regarded as the building blocks for the latter. The result is made precise by the following theorem.

Theorem 30.1 2. Let B be a nonzero nilpotent endomorphism of V in general, say B satisfies the conditions

but

Ba = 0

B a-1 * 0

(30.53)

for some integer a between 1 and N . Then there exists a direct sum decomposition for V :

v=

vf ®---®rM

and a corresponding direct sum decomposition for B (in the sense explained in Section 24):

(30.54)

194

Chap 6

SPECTRAL DECOMPOSITIONS

B = B1 + ••• + B M

(30.55)

such that each Bj is nilpotent and its restriction to Vj is nilcyclic. The subspaces V1,...,VM in the

decomposition (30.54) are not unique, but their dimensions are unique and obey the following rules: The maximum of the dimension of V1,...,VM is equal to the integer a in (30.53); the number Na of subspaces among V1,...,VM having dimension a is given by

Na =N-dimK(Ba-1)

(30.56)

More generally, the number Nb of subspaces among V1,...,VM having dimensions greater than or equal to b is given by

Nb =dimK(Bb) -dimK(Bb-1)

(30.57)

for all b= 1,...,a. In particular, when b = 1 , N1 is equal to the integer M in (30.54), and (30.57) reduces to

M= dimK(B)

(30.58)

Proof. We prove the theorem by induction on the dimension of V . Clearly, the theorem is valid for one-dimensional space since a nilpotent endomorphism there is simply the zero endomorphism which is nilcylic. Assuming now the theorem is valid for vector spaces of dimension less than or equal to N - 1 , we shall prove that the same is valid for vector spaces of dimension N . Notice first if the integer a in (30.53) is equal to N , then B is nilcyclic and the assertion is trivially satisfied with M = 1, so we can assume that 1 < a< N. By (30.53)2, there exists a vector ea L

'1

0

Applying this rule to the restriction of Bj to V for all j= 1,...,M and using the fact that the kernel of Bk is equal to the direct sum of the kernel of the restriction of (Bj)k for all j = 1,..., M ,

prove easily that (30.57) holds. Thus the proof is complete. In general, we cannot expect the subspaces V1,...,VM in the decomposition (30.54) to be unique. Indeed, if there are two subspaces among V1,...,VM having the same dimension, say

dim V1 = dim V2 = a, then we can decompose the direct sum V1 ® V2 in many other ways, e.g.,

V1 ®V2 = v 1 ®v 2

(30.74)

and when we substitute (30.74) into (30.54) the new decomposition

v

= v 1 ®r 2 ®V3 ®---®vn

possesses exactly the same properties as the original decomposition (30.54). For instance we can define V1 and V2 to be the subspaces generated by the linearly independent cyclic set {e 1,...,ea}

{

}

and f1,...,fa , where we choose the starting vectors ea and fa by ~

_

fa = )y e a + V self-adjoint! Another example will illustrate the point even clearer. The operation of taking the opposite vector of any vector is an isomorphism -:V >V

which is defined by using the vector space structure alone. Evidently, we cannot suppress the minus sign without any ambiguity. To test whether the isomorphism Jcan be made a canonical one without any ambiguity, we consider the effect of this choice on the notations for the dual basis and the dual of a linear transformation. Of course we wish to have {e i}, when considered as a basis for V **, to be the

{ }

dual basis of ei , and A, when considered as a linear transformation from V** to U**, to be the dual of A*. These results are indeed correct and they are contained in the following.

{ }

Theorem 32. 2. Given any basis {ei} for V , then the dual basis of its dual basis ei is {Jei}.

{ }

Proof. This result is more or less obvious. By definition, the basis {ei } and its dual basis ei are

related by

for i, j = 1,..., N. From (32.4) we have , e i) = (e i, e j)

Je j

Comparing the preceding two equations, we see that Je j, e = 8lj

{

}

which means that {Je1,., JeN} is the dual basis of e1,.eN .

{ }

Because of this theorem, after suppressing the notation for J we say that {ei} and ej are dual relative to each other. The next theorem shows that the relation holds between AandA*.

216

Chap. 7

TENSOR ALGEBRA

Theorem 32. 3. Given any linear transformation A :V >U, the dual of its dual A* is JU AJ-1. Here J V denotes the isomorphism from V to V** defined by (32.4) and JU denotes the

isomorphism from UtoU** defined by a similar condition. Proof. By definition A and A*are related by (31.13)

(u *** , Av = ( Au , v for all u* e U*, v e V. From (32.4) we have (u ****** , Av = < JU Av, u ), (Au , v = < JY v, A u )

Comparing the preceding three equations, we see that JUAJ-1 ( Jv),u‘} = (Jv, A*u*^

Since JV is an isomorphism, we can rewrite the last equation as -1 *** ** (JUAJ^v ,u ) = (v** ,Au )

Because v** e V** and u* e U* are arbitrary, it follows that JUAJ-1 is the dual of A*. So if we

suppress the notations for JUandJV,then AandA* are the duals relative to each other.

A similar result exists for the operation of taking the orthogonal complement of a subspace; we have the following result. Theorem 32. 4. Given any subspace UofV,the orthogonal complement of its orthogonal

complement U1 is J (U) . We leave the proof of this theorem as an exercise. Because of this theorem we say that UandU1 are orthogonal to each other. As we shall see in the next few sections, the use of canonical isomorphisms, like the summation convention, is an important device to achieve economy in writing. We shall make use of this device whenever possible, so the reader should be prepared to allow one symbol to represent two or more different objects.

The last three theorems show clearly the advantage of making Ja canonical isomorphism, so from now on we shall suppress the symbol forJ . In general, if an isomorphism from a vector space V to a vector space U is chosen to be canonical, then we write

Sec. 32



Second Dual Space, Canonical Isomorphism

217

V=U

(32.7)

V = V **

(32.8)

In particular, we have

Exercises 32.1 32.2

Prove theorem 32.4. Show that by making J a canonical isomorphism essentially we have identified V with V**,v****,.„, and v* with ***v****,^ . So a symbol vorv*,jnfact, represents

infinitely many objects.

Chap. 7

218

TENSOR ALGEBRA

Section 33. Multilinear Functions, Tensors In Section 31 we discussed the concept of a linear function and the concept of a bilinear function. These concepts can be generalized in an obvious way to multilinearfunctions. In general if V1,..., Vs ,is a collection of vector spaces, then a s - linear function is a function A fx-xfs^R

(33.1)

that is linear in each of its variables while the other variables are held constant. If the vector spaces Vf,..., Vs, are the vector space V or its dual space V*, then A is called a tensor on V. More specifically, a tensor of order (p, q) on V, where pand q are positive integers, is a (p+q)linear function

x--jxV V *

x Vx ^xf ^ R

p times

(33.2)

q times

We shall extend this definition to the case p= q= 0 and define a tensor of order (0,0) to be a scalar in R. A tensor of order (p,0)is a pure contravariant tensor of order p and a tensor of order (0, q) is a pure covariant tensor of order q. In particular, a vector v e V is a pure contravariant tensor of order one. This terminology, of course, is defined relative to a given vector space V as we have explained in Section 31. If a tensor is not a pure contravariant tensor or a pure covariant tensor, then it is a mixed tensor, and for a mixed tensor of order (p,q), pis the contravariant order and q is the covariant order.

For definiteness, we denote the set of all tensors of order (p,q)on V by the symbol

Tq p (V). However, the set of pure contravariant tensors of order p shall be denoted simply by Tp (V)and the set of pure covariant tensors of order q shall be denoted simply Tq (V). Of

course, tensors of order (0,0) form the set R and

T1(V)=V,

T1(V)=V*

Here we have made use of the identification of V with V' ** as explained in Section 32.

We shall now give some examples of tensors.

Example 1. If A is an endomorphism of V, then we define a function A : V * xV ^ R by

(33.3)

Sec. 33

Multilinear Functions, Tensors

(

A v*, v ) = ^ v*, Av^

219

(33.4)

./✓♦ for all v * eV and v e-1S V. Clearly A is bilinear and thus A e1~\ Tf1 (V). As we shall see later, it is

possible to establish a canonical isomorphism from L(V,V)to T1 1 (V)in such a way that the

endomorphism A is identified with the bilinear function A. Then the same symbol A shall represent two objects, namely, an endomorphism of Vand a bilinear function of V* x V. Then (33.4) becomes simply

(

)

A v*, v = ( v*, Av^

(33.5)

Under this canonical isomorphism, the identity automorphism of V is identified with the scalar product,

(

I v*, v

)

, Iv^ = ( v*, v^

= ^ v*

(33.6)

Example 2. If v is a vector in Vand v* is a covector in V*, then we define a function v ® v* :V x^K ^ R

(33.7)

by

(

)

v ® v* u*, u = (u‘, ^v‘, u)

(33.8)

for all u’eeV *, u eV. Clearly, v ® v* is a bilinear function, so v ® v* e Tf1 (V). If we make

use of the canonical isomorphism to be established between T1 1 (V) and L(V;V), the tensor v ® v* corresponds to an endomorphism of V such that

(

)

v^ v u ,u = (u ,v^v u)

or equivalently (u , v)( v ,u) = (u , v^ v u)

for all u* e V*, u e V. By the bilinearity of the scalar product, the last equation can be rewritten in the form

220

TENSOR ALGEBRA

Chap. 7 * \ \ / * * x-x u */ ,v) = \u ,v^vu

Then by the definiteness of the scalar product, we have v* ,u)v = v®v * u,

(33.9)

for all u e V

which defines v ® v* as an endomorphism of V. The tensor or the endomorphism v ® v* is called the tensor product of vand v*.

Clearly, the tensor product can be defined for arbitrary number of vectors and covectors. Let v1,...,vpbe vectors in V and v1,...,vq be covectors in V*. Then we define a function xV x-x^ ^R v1 ®-®vp ®v1 ®-®vq :V* x... -y-'x' ^ * X \____ / p times

q times

by

)

(

v1&■■■ ® vp ® v1 &■■■ ® vq u1,...,up,u1,...,uq

(33.10)

= (u1, v1) ■■■{ u p,v p)( v1, u1) ■■■{ v q,u q)

for all u1,., uq eV and u1,.,uq eV *. Clearly this function is (p+q)-linear, so that v1 ® ••• ® v p ® v1 ®--- ® v q e Tq p (Y) , is called the tensor product of v1,., v p and v1,..., v q.

Having seen some examples of tensors on V ,we turn now to the structure of the set Tqp (). We claim that Tqp () has the structure of a vector space and the dimension of

Tqp () is equal to N(p+q) , where N is the dimension of V . To make Tqp ()a vector space, we define the operation of addition of any A,Be Tqp (

) and the scalar multiplication of Aby

a e R by

(

)

(A+B) v1,.,vp,v1,.vq

(

) (

^ A v1,..., v p, Y1,... v q + B v1,..., v p, Y1,... v q

)

(33.11)

and

(aA)(v1,..., v p, Yj,... v q ) = a A (v1,..., v p, v1,... v q)

(33.12)

Sec. 33

221

Multilinear Functions, Tensors

respectively, for all v1,...,vp : V* and v1,., vq : V. We leave as an exercise to the reader the proof of the following theorem. Theorem 33.1. Tqp (V)is a vector space with respect to the operations of addition and scalar

multiplication defined by (33.11) and (33.12). The null element of Tqp (V) ,of course, is the

zero tensor 0 :

(

)

(33.13)

0 v1,...,vp, vi,...,vq = 0

for all v1,..., vp ' V* and v1,., vq r V.

Next, we determine the dimension of the vector space Tqp (V) by introducing the concept of a product basis.

{ }

Theorem 33.2. Let {ei}and ei be dual bases for VandV *. Then the set of tensor products

{

}

e,»••• ® eip ® ej1 ®-® eiq, ip.,ip, J„..„ jt = 1..... N

(33.14)

forms a basis for Tqp (V), called the product basis. In particular,

dimTqp(V)=N(p+q)

(33.15)

Proof. We shall prove that the set of tensor products (33.14) is a linearly independent generating set for Tqp (V). To prove that the set (33.14) is linearly independent, let j i 1 qei.1 ®-®eip. ®ej1 ®•••®eq = 0

i,...,i p j. ,...,j

A1

(33.16)

where the right-hand side is the zero tensor given by (33.13). Then from (33.10) we have

0 = Aj

j

=A ii r' pj....j 1q

(

)

ei ®---®ei ®eq ®---®eq ek1 ,.,ekp,el1,.,elq j1.jq 1p ee

j 1,eh Xekj p,ei }(e 1,eY\e ,el 1p 1 q

k

= A h"i\j1.j,■ q^i1 ••• diip Pdlj1 ■ ■ ■ 8lj q = A k1"'kpll1.llq

(33.17)

TENSOR ALGEBRA

Chap. 7

222

which shows that the set (33.14) is linearly independent. Next, we show that every tensor A eT^ p (V) can be expressed as a linear combination of the set (33.14). We define

}

{

N(p+q)scalars A^j j ,i1...ip,jv..jq=1,.,N by

A

\..h =

(

A e

' e ip, en,-, ejg

)

(33.18)

Now we reverse the steps of equation (33.17) and obtain

(

)

AiA .e.j1---®jq®e.ip ®ej1 ®-®ej \ ek1l,..., ekpl,e.,...e. i1 l V lq )

(

= A ek1,.,ekp,el1,.elq

(33.19)

) { }

for all k1,.,kp, l1,.,lq = 1,.,N. Since A is multilinear and since {ei} and ei are dual bases for V andV' *, the condition (33.19) implies that

A^"ip j1. jq e i1 ^ ^ eip ^ e j1 ®- ^ e jq

(

(

V1

= A v1,.,vp,v1,.vq

, ■■■, v p, v1, . v q

)

)

for all v1,., vq e V and v1,., v p e V * and thus A = Ai1"Aj1.j.e q ii1 ®-®eiip ®e11 ®-®elq

(33.20)

Now from Theorem 9.10 we conclude that the set (33.14) is a basis for Tqp (V).

Having determined the dimension of Tqp (V) , we can now prove that T11 (V)is isomorphic to L(V;V), a result mentioned in Example 1. As before we define the operation

^L (V;V )^ T11 (V) by the condition (33.4). Since from (16.8) and (33.15), the vector space L(V;V) and T11 (V) are of the same dimension, namely N2, it suffices to show that the operation Indeed, let A = 0. Then from (33.13) and (33.4) we have

^v*, Av) = 0

is one-to-one.

Sec. 33

Multilinear Functions, Tensors

223

for all v* e V* and all v e V. Now since the scalar product is definite, this condition implies

Av = 0 for all v e V, and thus A = 0. Consequently the operation

is an isomorphism.

As remarked in Example 1, we shall suppress the notation by regarding the isomorphism it represents as a canonical one. Therefore, we can replace the formula (33.4) by the formula (33.5).

Now returning to the space Tqp (V) in general, we see that a corollary of Theorem 33.2 is the simple fact that the set of all tensor products of the form (33.10) is a generating set for Tqp (V). We define a tensor that can be represented as a tensor product to be a simple tensor. It

should be noted, however, that such a representation for a simple tensor is not unique. Indeed, from (33.8), we have v ® v* =( 2v )®(2 v’^

for any v e V and v’ e V* since

(

)

v ® v* u*,u

for all u* andu. In general, the tensor product, as defined by (33.10), can be regarded as a mapping ® : Vx---x Vx V* x---x V’^ Tqp (V)

(33.21)

q times

p times

given by

(

)

®: Y1,-, vp, v1,-, vq = vi

&■■■ ®

vp ® v1 x-® vq

(33.22)

for all v1,...,vp Vf and v1,...,vq V**. It is easy to verify that the mapping ® is multilinear in the usual sense, i.e.,

Chap. 7

224

)

(

TENSOR ALGEBRA

(

® Vi,—,a v + fru,..., v q = a® Vi,—, v,-, v q

)

(

+ fr® vp-, u,..., v q

(33 23)

)

where av + /Ai , v and u all take the same position in the argument of ® but that position is arbitrary. From (33.23), we see that

(av1) ® v2 ® • • • ® vp ® v1 ® • • • ® vq = v1 ®(av2)®®vp ®v1 ®

®vq

We can extend the operation of tensor product from vectors and covectors to tensors in general. If A eTqp (V) and B e Ts r (V), then we define their tensor product A ® B to be a tensor of order (p +r,q+s)by

(

)

A ® B v, v p+r, vi,... v q+s

(

) (

= A v ....... v p , vi,... v q B v p ■',..., v p+ r , v q+1,..., v q.,

)

(33.24)

for all v1,..., vp+r e V* and vp..., vq+s e V . Applying this definition to arbitrary tensors AandB yields a mapping ®: Tqp (V) Tsr (V)^ T,Pr (V)

(33.25)

Clearly this operation can be further extended to more than two tensor spaces, say TqPk (V) ^ Tq+'■,+pk (^)

®: Tq:'R

is a bilinear function. By Theorem 34.1 ( , c can be factored through the tensor product ®: VxV* ^T11 (V)

i.e., there exists a linear map

C: T14 (V) ^ R such that ( , =CCo®

(34.6)

Sec. 34

231

Contractions

or, equivalently,

(

(34.7)

v, v J = C v ® v*

for all v e V and v* e V*. This linear function C is the simplest kind of contraction operation. It transforms the tensor space T1 1 (V) to the tensor space T1-11-1(V) = T00 (V) = R. In general if A e Tqp (V) is an arbitrary simple tensor, say

A = v1 ®...®vp ®v1 ®-® vq

(34.8)

then for each pair of integers (i, j) , where 1 < i < p, 1 < j < q, we seek a unique linear transformation

Cj: Tqp(VPTq-;'(V)

(34.9)

such that

CjA = (vj, vi)v1 ®-® vi-1 ® vi+1 ®-® vp ® v1 ®-®vj-1 ® vj+1 ®-® vq

(34.10)

for all simple tensors A . A more compact notation for the tensor product on the right-hand side is v1 ®-® vi - ® vp ®-® v1 ®- vj - ® vq

(34.11)

Since the representation of a simple tensor by a tensor product is not unique, the existence of such a linear transformation Cij is by no means obvious. However, we can prove that Cij does exist and is uniquely determined by the condition (34.10). Indeed, using the universal factorization property, we can prove the following theorem. Theorem 34.2. A unique contraction operation Cij satisfying the condition (34.10) exists.

Proof. We define a multilinear transformation of the form (34.1) with U= Tq-p1-1 (V) by Z(vp...,vp,v1,...vq ) v j, v A v, ®^ v i — ® v p ® v1 ® •- v j — ® v q

(34.12)

Chap. 7

232

TENSOR ALGEBRA

for all v1,...,vp eV and v1,...vq eV*. Then by Theorem 34.1 there exists a unique linear transformation Cij of the form (34.9) such that (34.10) holds, and thus the proof is complete. Next we express the contraction operation in component form.

Theorem 34.3. If A e Tqp (V) is an arbitrary tensor of order (p, q), then in component form

{ }

relative to any dual bases {ei} and ei we have k1.ki.kp

= Ak1.ki-1tki+1.kp

(34.13)

Proof. Since the contraction operation is linear applying it to the component form (33.20), we get CjA = A'"\ ,q (elj,e,)eti®^ek ■■■ ®e,p ®e'®-elj

®elq

(34.14)

which means nothing but the formula (34.13) because we have ^elj, ek = 3k.

In particular, for the special case C given by (34.6) we have C(A)=Att

(34.15)

for all A e T11 (V). If we now make use of the canonical isomorphism %1 (V ) = L (V ;V)

we see that C coincides with the trace operation defined by (19.8). Using the contraction operation, we can define a scalar product for tensors. If A e q ( )an e pq ( ),eensorproucsaensorn pp++qq ( )ansene p

by (33.24). We apply the contraction

C = C1

O...O

C1 o cq+1

q times

to A ® B, then the result is a scalar (A,B), namely

O...O p times

cq+1

(34.16)

Sec. 34

Contractions

233

(A, B) = C (A ® B)

(34.17)

called the scalar product of A and B. It is a simple matter to see that , , } is a bilinear and

definite function in the sense similar to those properties of the scalar product of v and v* explained in Section 31. We can use the bilinear function

(,

): Tqp(V)xTpq(V)^>R

(34.18)

to identify the space Tqp (V) with the dual space Tpq (V) of the space Tpq (V) or equivalently,

we can define the dual space Tpq (V) abstractly as usual and then introduce a canonical isomorphism from Tqp (V) to Tpq (V) through (34.17). Thus we write Tqp (V) = Tq (V)*

Of course we shall also identify the second dual space Tpq (V) general in Section 32. Hence, we have

(34.19) with Tpq (V) as explained in

Tqp (V)* = Tpq (V)**

(34.20)

which follows also by interchanging pandqin (34.19). From (34.13) and (34.16), the scalar product (A,B) is given by the component form

(A, B = A'"'\j.iB"

(34.21)

{ }

relative to any dual bases {ei} and ei for V and V*.

Exercises 34.1

Show that

(

(A, V' ®-® vq ®v1 ®••• ® vp) = A v',-,vp, V',-,vq

34.2

)

for all A e Tq p (V), v',---, v q p, or

by lowering the last p-p1 superscripts of A if p > p1 . Thus if A has the component form

( )

(35.13), then Gqp1 -1 GqpA has the component form

(G )-1 G 'A = p1 q1 p q

e

Ai^ip>.

j1... jq1 . i1 e. ® • • • ®ip1

e11 ® - ® ejq1

(35.28)

eip1kp1-p if p1 >p

(35.29)

®

where Ai1...ip1

=Ai1...ip j1 ... jq1

eip+1k1 k1...kp1-pj1... jq1

and

A1... p1

=Ai.i1...i7p1... kp-p1

— eekk; p-p1jjp-p1 if p > p11 jjp-p1+1... jjq1ekj k1j1

(35.30)

240

Chap. 7

TENSOR ALGEBRA

For example, if A e T21 (V) has the representation A = Ajki k et ® e j ® ek then the covariant representation of A is G12 AA jke e j fa e k = - Ai jkeil eee e l fa e j fa e k-A e l fa e j fa e k = ei fa ee =ljk eee

the contravariant representation of A is

( )

fG AA jkeei to e j to ek- A ejekme to e to e = Ailm e to e to e G 3\—1 G G12A =A ^ e ^ e = A jke e ei ^ el ^ em — A ei ^ el ^ em and the representation of A in T12 (V) is G( GM A= - A jkei fa e j fa ek = - A jke A1 ei fa e l ^ e— = Ail ke i ^ el ^ ek 1)—1 G12 ee etc. These formulas follow from (35.20), (35.24), and (35.29).

Of course, we can apply the operations of lowering and raising to indices at any position, e.g., we can define a “component” Ai1 j i3 - '“j. for A e Tr (V) by

Ai1 j i3; ia.. . — Aj

e 11 e313 -ea“ -

j i1

(35.31)

However, for simplicity, we have not yet assigned any symbol to such representations whose “components” have an irregular arrangement of superscripts and subscripts. In our notation for the component representation of a tensor A e Tqp (V) the contravariant superscripts always

come first, so that the components of A are written as Ai1...ipj...jas shown in (35.13), not as A^

j i1'ip

or as any other rearrangement of positions of the superscripts i1 ^. ip and the

subscripts j1 .^ j , such as the one defined by (35.31). In order to indicate precisely the position of the contravariant indices and the covariant indices in the irregular component form, we may use, for example, the notation

V ®V *®V ®^ ®V ®

(1)

(2)

(3)

(a)

®V *® (b)

(35.32)

for the tensor space whose elements have components of the form on the left-hand side of (35.31), where the order of V and V* in (35.32) are the same as those of the contravariant and

Sec. 35

Tensors on Inner Product Spaces

241

the covariant indices, respectively, in the component form. In particular, the simple notation Tqp (V)now corresponds to V'®...0V0

(1)

(p)

V*0...0 V*

(p+1)

(35.33)

(p+q)

Since the irregular tensor spaces, such as the one in (35.32), are not convenient to use, we shall avoid them as much as possible, and we shall not bother to generalize the notation Gqp to isomorphisms from the irregular tensor spaces to their corresponding pure covariant representations. So far, we have generalized the isomorphism G for vectors to the isomorphisms Gqp for tensors of type (p,q)in general. Equation (35.1) forG , however, contains more information than just the fact that G is an isomorphism. If we read that equation reversely, we see that G can be used to compute the inner product onV . Indeed, we can rewrite that equation as v • u = ^Gv, u) = G G1v, u

(35.34)

This idea can be generalized easily to tensors. For example, if A and B are tensors in Tr (V) , then we can define an inner product A • B by

A • B = (Gr A,B)

(35.35)

We leave the proof to the reader that (35.35) actually defines an inner productT r (V). By use of (34.21) and (35.20), it is possible to write (35.35) in the component form

A•B=A

(35.36)

.ehj j1...A jr]'B i1

or, equivalently, in the forms

Ai1. irBii i1 . ir

A • B = ^ A -i-1 eBi [ A"lrBh 2^e

(35.37) etc.

Equation (35.37) suggests definitions of inner products for other tensor spaces besides Tr (V) .

If A and B are in Tqp (V), we define A • B by

TENSOR ALGEBRA

Chap. 7

242

p+q-1p p+q-1p A ’B = G GqA’ G GqB

(

)

(

)

\G A,(G )-1G B/ p q

p+q

p q

(35.38)

Again, we leave it as an exercise to the reader to establish that (35.38) does define an inner product onTqp (V); moreover, the inner product can be written in component form by (35.37) (35.37) also. In section 32, we pointed out that if we agree to use a particular inner product, then the isomorphism G can be regarded as canonical. The formulas of this section clearly indicate the desirability of this procedure if for no other reason than notational simplicity. Thus, from this point on, we shall identify the dual spaceV * witli^, i.e.,

V = V* by suppressing the symbolG . Then in view of (35.2) and (35.7), we shall suppress the symbol Gq p also. Thus, we shall identify all tensor spaces of the same total order, i.e., we write

T r (V) = T1 r-1 (V ) = - = Tr (V)

(35.39)

By this procedure we can replace scalar products throughout our formulas by inner products according to the formulas (31.9) and (35.38). The identification (35.39) means that, as long as the total order of a tensor is given, it is no longer necessary to specify separately the contravariant order and the covariant order. These separate orders will arise only when we select a particular component representation of the tensor. For example, if A is of total order r , then we can express Aby the following different component forms:

A = A-ieii1 ®...®eiir = A11..Jr—'. ei ®...® ei ® ej-

jr

1

r-1

(35.40)

= Aj1... j.r ej1 ®••• ®ej’

where the placement of the indices indicates that the first form is the pure contravariant representation, the last form is the pure covariant representation, while the intermediate forms are various mixed tensor representations, all having the same total order r . Of course, the various representations in (35.40) are related by the formulas (35.20), (35.25), (35.29), and (35.30). In fact, if (35.31) is used, we can even represent Aby an irregular component form such as

A = A1 ^-aj2 ... ejbi ...®i1 ej2 i® 3 ei ® -ia® ei ® - ® ejb ® -

(35.41)

Sec. 35

Tensors on Inner Product Spaces

243

provided that the total order is unchanged. The contraction operator defined in Section 34 can now be rewritten in a form more convenient for tensors defined on an inner product space. If A is a simple tensor of (total) order r, r > 2 , with the representation A = v1 ®... ® v r then CijA , where 1 < i < j < r, is a simple tensor of (total) order r - 2 defined by (34.10),

)

(

CjA S v • • v j v1 ® - v i " v j " ^ v r

(35.42)

By linearity, Cij can be extended to all tensors of orderr . If A e T (V) has the representation A = Ak^ ek1 ®... ® ekr then by (35.42) and (35.26)

(

)

CA = A,... , * C j e *'® - ® ekr =e

'

= Ak

„ek1 ®"eki "ekj - ® ekr

Aki...k

kj k 1 ......

e

k k' j... r

(35.43)

■ ■■■? k---ekj - ■ ekr i

j

i

It follows from the definition (35.42) that the complete contraction operator C [cf. (35.16)] C: T2r (V )^ R can be written

C = C'2 o C'3

O...O

C'(r+')

(35.44)

Also, if A and B are inTr(V), it is easily establish that [cf. (34.17)]

A • B = C (A ® B)

(35.45)

TENSOR ALGEBRA

Chap. 7

244

In closing this chapter, it is convenient for later use to record certain formulas here. The identity automorphism I of V corresponds to a tensor of order 2. Its pure covariant representation is simply the inner product

I (u,v) = u • v

(35.46)

The tensor I can be represented in any of the following forms:

I = ei.ei ®ij ej = eeii ® ej, = = — 2gn dx1 77 J . JL i > = 1 dgq ii Zgu

Where the indices z and j are not summed.

if

i*7

Chap. 9

352

47.7

EUCLIDEAN MANIFOLDS

Show that

d fk T^-S dx j il

d fk I ft I k I — dxl7}[jJr + “1 il Jr1[jJr

ftIfk 1 r1 r=0 [ j J [tl

The quantity on the left-hand side of this equation is the component of a fourth-order tensor R , called the curvature tensor, which is zero for any Euclidean manifold2.

2 The Maple computer program contains a package tensor that will produce Christoffel symbols and other important tensor quantities associated with various coordinate systems.

Sec. 48

Covariant Derivatives along Curves

353

Section 48. Covariant Derivatives along Curves In the preceding section we have considered covariant differentiation of tensor fields which are defined on open submanifolds in E . In applications, however, we often encounter vector or tensor fields defined only on some smooth curve in E . For example, if X: (a, b) ^ E is a smooth curve, then the tangent vector X is a vector field on E . In this section we shall consider the problem of representing the gradients of arbitrary tensor fields defined on smooth curves in a Euclidean space.

Given any smooth curve X:(a, b) ^ E and a field A:(a, b) ^ Tq (V) we can regard the value A(t) as a tensor of order q at X(t). Then the gradient or covariant derivative of A along X is defined by

dA(t) s lim A(t + At) - A(t) dt At > At

(48.1)

for all t e (a, b) . If the limit on the right-hand side of (48.1) exists, then dA(t)/dt is itself also a tensor field of order q on X. Hence, we can define the second gradient d2A(t)/dt2 by replacing A by dA(t)/dt in (48.1). Higher gradients of A are defined similarly. If all gradients of A exist, then A is C'" -smooth on X. We are interested in representing the gradients of A in component form.

Let x be a coordinate system covering some point of X. Then as before we can characterize X by its coordinate representation (X (t), i = 1,..., N). Similarly, we can express A in component form

A( t) = A^ (t )gi h(X (t)) ®...® giq (X( t))

where the product basis is that of x at X(t), the point where A(t) is defined. Differentiating (48.2) with respect to t, we obtain

(48.2)

EUCLIDEAN MANIFOLDS

Chap. 9

354 dA(t) dt

dAi1...iq (t) g ii( k (t)) ®...® giq (k( t)) dt dgi (k(t)) +Ai1...iq (t) dg 1(kJ(t)) ®-®gi.(k(t))+•••+gii(k(t))®-®-±— dt q 1 dt

(48.3)

By application of the chain rule, we obtain

dg,( k(t)) = dgi(k(t)) dV (t) dt dx j dt

(48.4)

where (47.5) and (47.6) have been used. In the preceding section we have represented the partial derivative of gi by the component form (47.26). Hence we can rewrite (48.4) as

dg^J k \ j g k (k(t» dt [ ij J dt

Substitution of (48.5) into (48.3) yields the desired component representation

dA(t) =1 dAi"^ (t) dt I1dt xg i1

Aki2...iq (t) ■

f i, l dV (t) ’ ii' ■ + • ••+Ai1...iq-1k(t)1 1kjJ 1kj J J dt l

(48.6)

(k(t)) ®-"® g ( (t)) iq k

where the Christoffel symbols are evaluated at the position k(t) . The representation (48.6) gives the contravariant components (dA(t)/dt)i1'''iq of dA(t)/dt in terms of the contravariant components

A1"iq (t) of A(t) relative to the same coordinate chart x. If the mixed components are used, the representation becomes

(

dA(t) JdA'"\..,, t) + dt | dt

^i ii (t) 1 ! + • ••+Ai1 ..i k j1... js

Aki2...ir

r-1

kl

-Ai ...i

Ai1 ..i

1

j1... js-1k

kj2... js

(t)1 kl >

k dVl(t) t) 1V/l jslJ dt

(

xgi (k(t)) ® • • • ® g (k(t)) ® gj (k(t)) ® • • • ® g j (k(t)) i

r

s

(48.7)

Covariant Derivatives along Curves

Sec. 48

355

We leave the proof of this general formula as an exercise. In view of the representations (48.6) and (48.7), we see that it is important to distinguish the notation

( dA ! ir I dt J jj1 ... jjs ^ 1-

which denotes a component of the covariant derivative dAdt, from the notation

dAi1...ir j1... js

dt which denotes the derivative of a component of A . For this reason we shall denote the former by the new notation DAi1...ir j1... js

Dt As an example we compute the components of the gradient of a vector field v along X. By

(48.6)

dv(t) dt

dU (t) dt

(48.8)

or, equivalently,

DUi(t) Dt

dUi(t) i 1 dV (t) + Uk (t) dt kj J dt

(48.9)

In particular, when v is the ith basis vector gi and then X is the jth coordinate curve, then (48.8) reduces to

356

EUCLIDEAN MANIFOLDS

Chap. 9

k

dg i

r gk

dx1

which is the previous equation (47.26).

Next if v is the tangent vector X of I, then (48.8) reduces to

d2 X (t) dt2

fdK (t) dK (t) f i 1 dK (t) 1 „ z „ ---- —---r-----— g i (X (t)) dt2

dt

dt

(48.10)

where (47.4) has been used. In particular, if X is a straight line with homogeneous parameter, i.e., • r» X *C =v = const, , then ,1 if

d2Ki(t)

+

dK (t) dK (t) f i' j r=o, dt dt

i=1,...,N

(48.11)

This equation shows that, for the straight line X(t) = x+tv given by (47.7), we can sharpen the result (47.9) to

(i i k j i f 1 NIr 122I1+o (122) x ( t) ) = x1 +U1 - uU1 — < > t2,..., xN+uNt - -2 Uu 1 < I 2 [ kj kj j j

X(

(48.i2)

for sufficiently small t, where the Christoffel symbols are evaluated at the point x = X(0) .

Now suppose that A is a tensor field on U, say A e T™, and let X:(a, b) ^ U be a curve in U . Then the restriction of A on X is a tensor field

IA(t) - A(X(t)),

t e (a, b)

(48.i3)

In this case we can compute the gradient of A along X either by (48.6) or directly by the chain rule of (48.i3). In both cases the result is

Sec. 48

Covariant Derivatives along Curves dA(X(t)) dt

dA^'i (X(t)) + Akiki2...ii q (t) Li1 dx j

357

..iq k1

i (t) L

dX (t) dt

(48.14)

xg ^(X (t)) &■■■& g' (X( t)) i

or, equivalently,

dA(X(t)) = ( grad A(X(t))) X(t t) dt

(48.15)

where grad A(X (t)) is regarded as an element of L (V ;Ti (V)) as before. Since grad A is given

by (47.15), the result (48.15) also can be written as

dA(X(t)) _ dA(X(t)) dX (t) dt dx j dt

(48.16)

A special case of (48.16), when A reduces to gi, is (48.4).

By (48.16) it follows that the gradient of the metric tensor, the volume tensor, and the skew­ symmetrization operator all vanish along any curve.

Exercises 48.1

Prove the formula (48.7).

48.2

If the parameter t of a curve X is regarded as time, then the tangent vector

dX v=— dt

is the velocity and the gradient of v

358

Chap. 9

EUCLIDEAN MANIFOLDS

dv a=— dt

is the acceleration. For a curve 1 in a three-dimensional Euclidean manifold, express the acceleration in component form relative to the spherical coordinates.

Chapter 10

VECTOR FIELDS AND DIFFERENTIAL FORMS Section 49. Lie Derivatives Let E be a Euclidean manifold and let uand vbe vector fields defined on some open set Uin E. In Exercise 45.2 we have defined the Lie bracket [u, v] by

[u,v]=(gradv)u-(grad u)v

(49.1)

In this section first we shall explain the geometric meaning of the formula (49.1), and then we generalize the operation to the Lie derivative of arbitrary tensor fields. To interpret the operation on the right-hand side of (49.1), we start from the concept of the flow generated by a vector field. We say that a curve X: (a, b) ^ U is an integral curve of a

vector field v if

dt

=v ( X (t))

(49.2)

for all t e (a, b). By (47.1), the condition (49.2) means that X is an integral curve of v if and

only if its tangent vector coincides with the value of v at every point X (t). An integral curve

may be visualized as the orbit of a point flowing with velocity v. Then the flow generated by v is defined to be the mapping that sends a point X(t0) to a point X(t) along any integral curve of v. To make this concept more precise, let us introduce a local coordinate system x. Then the condition (49.2) can be represented by

dA (t) /dt=U (X(t))

(49.3)

where (47.4) has been used. This formula shows that the coordinates (A (t),i = 1,...,N) of an

integral curve are governed by a system of first-order differential equations. Now it is proved in the theory of differential equations that if the fields U on the right-hand side of (49.3) are smooth, then corresponding to any initial condition, say

359

360

Chap. 10

VECTOR FIELDS

k (0) = Xo

(49.4)

or equivalently

X (0) = x0,

i = 1,_,N

(49.5)

a unique solution of (49.3) exists on a certain interval (-8, 8), where 8 may depend on the initial point x0 but it may be chosen to be a fixed, positive number for all initial points in a sufficiently small neighborhood of x0. For definiteness, we denote the solution of (49.2)

corresponding to the initial point xbyk(t,x);then it is known that the mapping from x to k(t,x); then it is known that the mapping from x to k(t,x) is smooth for each t belonging to the interval of existence of the solution. We denote this mapping by p t, namely

x) = k (t,x)

Pt (

(49.6)

and we call it the flow generated by v . In particular,

Po (x) = x,

xG U

(49.7)

reflecting the fact that xis the initial point of the integral curve k(t,x).

Since the fields U are independent of t, they system (49.3) is said to be autonomous. A characteristic property of such a system is that the flow generated by v forms a local oneparameter group. That is, locally, Pt1+ t2 = Pt 1 ° Pt2

(49.8)

k(t1+t2,x)=k(t2,k(t1,x))

(49.9)

or equivalently

for all t1,t2,x such that the mappings in (49.9) are defined. Combining (49.7) with (49.9), we see that the flow Pt is a local diffeomorphism and, locally,

Pt-1 = P-t

(49.1o)

Sec. 49

Lie Derivatives

361

Consequently the gradient, grad p t, is a linear isomorphism which carries a vector at any point

x to a vector at the point pt (x). We call this linear isomorphism the parallelism generated by the flow and, for brevity, denote it by Pt . The parallelism Pt is a typical two-point tensor whose component representation has the form

x) = (x) g (p (x)) ® g (x)

Pt (

Pt

j

i

t

(49.1 1)

j

or, equivalently,

[Pt(x)] (gj (x)) = Pt(x)j gi (Pt(x))

(49.12)

where {gi (x)} and {gj (pt (x))} are the natural bases of x at the points x and pt (x),

respectively. Now the parallelism generated by the flow of v gives rise to a difference quotient of the vector field u in the following way: At any point x eU , Pt(x) carries the vector u(x)atx to the vector (Pt (x)) (u (x)) atpt (x), which can then be compared with the vector u(pt (x)) also at

pt (x). Thus we define the difference quotient

1[u(p (x))-(P (x))(u(x))] t

t

(49.13)

The limit of this difference quotient as t approaches zero is called the Lie derivative of u with respect to v and is denoted by

1 Pu(pt (x))-(Ptv/ t (x))(u(x))] VLu(xt ) tlim ^0 ^V^^^\ /\v//_|

(49.14)

We now derive a representation for the Lie derivative in terms of a local coordinate system x. In view of (49.14) we see that we need an approximate representation for Pt to within first-order terms in t. Let x0 be a particular reference point. From (49.3) we have

X (t, x0) = x0 +ui ( x0) t + o (t) Suppose that x is an arbitrary neighboring point of x0, say x1 = x0 + Axi. Then by the same argument as (49.15) we have also

(49.15)

Chap. 10

362

VECTOR FIELDS

X (t, x ) = x1 +ui (x) t + o (t) = x’ + Axi +U (x0) t +--- (Axj) t + o(Axk ) + o (t) 0 v 0' dx1 \

(49.16) '

Subtracting (49.15) from (49.16), we obtain

dUxxA / X (t,x)-X (t,x0) = Axi +--- dx j ' (Ax1)t + o(Axk) + o(t)

(49.17)

It follows from (49.17) that

(

Pt\t (x0/ 0)j i. yo (gradny p (x ))ji = S'j + U oU Qx^ x0) t + 0v (7t)

(49.18) v 7

Now assuming that u is also smooth, we can represent u (pt (x0)) approximately by

(49.19)

x )) = u (x )+dudxx0) u (x ) t + o (t)

u (pt (

0

j

0

0

where (49.15) has been used. Substituting (49.18) and (49.19) into (49.14) and taking the limit, we finally obtain

x

Lv u(x0)= du ( 0 ) u j dx j

dUj-

x

x)

u(

0

x)

gi (

0

(49.20)

where x0 is arbitrary. Thus the field equation for (49.20) is

du1

j

dU

j

Lu= —uj j---------- - uJ gi v dx1 dx1

(49.21)

By (47.39) or by using a Cartesian system, we can rewrite (49.21) as Lv u=(ui,juj-ui,juj)gi

(49.22)

Consequently the Lie derivative has the following coordinate-free representation:

L u=(gradu)v-(gradv)u

(49.23)

Sec. 49

363

Lie Derivatives

Comparing (49.23) with (49.1), we obtain L u =[v, u]

(49.24)

Since the Lie derivative is defined by the limit of the difference quotient (49.13), L u vanishes if and only if u commutes with the flow of v in the following sense: v u ° Pt = P u

(49.25)

When u satisfies this condition, it may be called invariant with respect tov . Clearly, this condition is symmetric for the pair (u,v), since the Lie bracket is skew-symmetric, namely

[u, v] =-[v,u]

(49.26)

v o 9 t = Q tv

(49.27)

In particular, (49.25) is equivalent to

where 9t and Qt denote the flow and the parallelism generated by u.

Another condition equivalent to (49.25) and (49.27) is

95 ° P t = P t ° 95

(49.28)

which means that

Pt (an integralcurveof u) =anintegralcurve of u

9s (an integralcurveof v) =anintegralcurve of v

(49.29)

To show that (49.29) is necessary and sufficient for

[u,v] =0

(49.30)

we choose any particular integral curve 1: (-3,3)^-U for v, At each point 1 (t) we define an integral curve p (5, t) for u such that ^ (0, t ) = 1 (t)

(49.31)

Chap. 10

364

VECTOR FIELDS

The integral curves p (•, t) with t e (-3,8) then sweep out a surface parameterized by (5, t). We shall now show that (49.30) requires that the curves p (5, •) be integral curves of v for all 5. As before, let x be a local coordinate system. Then by definition

dp (5, t) / d5 = u' (^(5, t))

(49.32)

dp (0, t)/d5 = U (p(0, t))

(49.33)

dp (5, t) / d5 = U (p(5, t))

(49.34)

z (5, t)=dp (5, t)—u (^ (5, t))

(49.35)

z (0, t) = 0

(49.36)

and

and we need to prove that

for 5 ^ 0. We put

By (49.33)

We now show that Z (5, t) vanishes identically. Indeed, if we differentiate (49.35) with respect to 5 and use (49.32), we obtain

Z = _d [ dp_ l-du^dp_ ds d11 d5 J dx1 d5 du1 dp1 dx1 dt

dU

_

dx1

(49.37)

dU { dU U', - dU ---- — Z +1 dx] I dx1 dx1 As a result, when (49.30) holds, z' are governed by the system of differential equations

dZ = dU_ z d5 dx1

(49.38)

Sec. 49

365

Lie Derivatives

and subject to the initial condition (49.36) for each fixed t. Consequently, Z = 0 is the only

solution. Conversely, when Z vanishes identically on the surface, (49.37) implies immediately that the Lie bracket of uandv vanishes. Thus the condition (49.28) is shown to be equivalent to the condition (49.30).

The result just established can be used to prove the theorem mentioned in Section 46 that a field of basis is holonomic if and only if

[ h i, h j ] = 0

(49.39)

for all i, j -1,..., N. Necessity is obvious, since when {h i} is holonomic, the components of each hi relative to the coordinate system corresponding to {hi} are the constants 5'j. Hence from (49.21) we must have (49.39). Conversely, suppose (49.39) holds. Then by (49.28) there exists a surface swept out by integral curves of the vector fields h1 and h2. We denote the

surface parameters by x1 and x2. Now if we define an integral curve for h3 at each surface point

(x1, x2),

then by the conditions

[h1,h3] = [h2,h3] =0 we see that the integral curves of h1, h2,h3 form a three-dimensional net which can be regarded as a “surface coordinate system” (x1,x2,x3) on a three-dimensional hypersurface in the N-

dimensional Euclidean manifold E . By repeating the same process based on the condition (49.39), we finally arrive at an N-dimensional net formed by integral curves of h1,., hN. The

corresponding N-dimensional coordinate system x1,., xN now forms a chart in E and its natural basis is the given basis {hi} . Thus the theorem is proved. In the next section we shall make use of this theorem to prove the classical Frobenius theorem.

So far we have considered the Lie derivative L u of a vector field u relative tov only.

Since the parallelism Pt generated byv is a linear isomorphism, it can be extended to tensor fields. As usual for simple tensors we define Pt (a ® b ® •••) = ( Pt a)®( Pt b)® •••

(49.40)

Then we extend Pt to arbitrary tensors by linearity. Using this extended parallelism, we define the Lie derivative of a tensor field A with respect to v by

x )s limt 1 [ A (p (x))-(P (x))(A (x))]

LA ( v

t ^°

t

t

(49.41)

Chap. 10

366

VECTOR FIELDS

which is clearly a generalization of (49.14). In terms of a coordinate system it can be shown that

(Lv A)i1i j

1- js

dA

i' r jjl •■• js j . kkk _________ k

dx

U

dUi_______ a , -,r k U dx k j1-j dx k dU i+h,,,, Ai -ir dU ______ ir_______ dx11 j1-j k dxj

Ak2-h

1

-1

(49.42)

s

+Ai1

,

l

k-js

s-1

which generalizes the formula (49.21). We leave the proof of (49.42) as an exercise. By the same argument as (49.22), the partial derivatives in (49.42) can be replaced by covariant derivatives. It should be noted that the operations of raising and lowering of indices by the Euclidean metric do not commute with the Lie derivative, since the parallelism Pt generated by the flow generally does not preserve the metric. Consequently, to compute the Lie derivative of a tensor field A,we must assign a particular contravariant order and covariant order to A. The formula (49.42) is valid when A is regarded as a tensor field of contravariant order r and covariant order s. By the same token, the Lie derivative of a constant tensor such as the volume tensor or the skew-symmetric operator generally need not vanish. Exercises

49.1 49.2 49.3

Prove the general representation formula (49.42) for the Lie derivative. Show that the right-hand side of (49.42) obeys the transformation rule of the components of a tensor field. In the two-dimensional Euclidean plane E , consider the vector field v whose components relative to a rectangular Cartesian coordinate system (x1,x2) are

u2 = ax2

U = a x 1,

where a is a positive constant. Determine the flow generated by v. In particular, find the integral curve passing through the initial point

(x 1,x2)=(1,1) 0

49.4

0

In the same two-dimensional plane E consider the vector field such that

Sec. 49

367

Lie Derivatives

U = x 2,

u2 = x1

Show that the flow generated by this vector field is the group of rotations of E . In particular, show that the Euclidean metric is invariant with respect to this vector field.

49.5

Show that the flow of the autonomous system (49.3) possesses the local one-parameter group property (49.8).

Chap. 10

368

VECTOR FIELDS

Section 50. The Frobenius Theorem The concept of the integral curve of a vector field has been considered in some detail in the preceding section. In this section we introduce a somewhat more general concept. If v is a non-vanishing vector field in E , then at each point x in the domain of v we can define a one­ dimensional Euclidean space D (x) spanned by the vector v (x). In this sense an integral curve of v corresponds to a one-dimensional hypersurface in E which is tangent to D (x) at each

point of the hypersurface. Clearly this situation can be generalized if we allow the field of Euclidean spaces D to be multidimensional. For definiteness, we call such a field D a distribution in E , say of dimension D. Then a D-dimensional hypersurface L in E is called an integral surface of D if L is tangent to D at every point of the hypersurface. Unlike a one-dimensional distribution, which corresponds to some non-vanishing vector field and hence always possesses many integral curves, a D-dimensional distribution D in general need not have any integral hypersurface at all. The problem of characterizing those distributions that do possess integral hypersurfaces is answered precisely by the Frobenius theorem. Before entering into the details of this important theorem, we introduce some preliminary concepts first.

Let D be a D-dimensional distribution and let v be a vector field. Then v is said to be contained in D if v (x )eD (x) at each x in the domain of v. Since D is D-dimensional, there exist D vector fields {va,a = 1,...,D} contained in Dsuch that D(x) is spanned by

{v (x),a = 1,.,D} at each point x

in the domain of the vector fields. We call

a

{va, a = 1,..., D} a local basis for D.

We say that D is smooth at some point x0 if there exists a

local basis defined on a neighborhood of x0 formed by smooth vector fields va contained in D. Naturally, D is said to be smooth if it is smooth at each point of its domain. We shall be interested in smooth distributions only.

We say that D is integrable at a point x0 if there exists a local coordinate system x defined on a neighborhood of x0 such that the vector fields {ga,a= 1,., D} form a local basis

for D . Equivalently, this condition means that the hypersurfaces characterized by the conditions xD+1 = const,., xN=const

(50.1)

are integral hypersurfaces of D . Since the natural basis vectors gi of any coordinate system are smooth, by the very definition D must be smooth at x0 if it is integrable there. Naturally, D is said to be integrable if it is integrable at each point of its domain. Consequently, every integrable distribution must be smooth.

Sec. 50

The Frobenius Theorem

369

Now we are ready to state and prove the Frobenius theorem, which characterizes integrable distributions. Theorem 50.1. A smooth distribution D is integrable if and only if it is closed with respect to the Lie bracket, i.e.,

u, v e D > [u, v]e D

(50.2)

Proof. Necessity can be verified by direct calculation. If u, v e D and supposing that x is a local coordinate system such that {ga ,a = 1,..., D} forms a local basis for D, then u and v have the component forms u=uaga,

(50.3)

v = U' ga

where the Greek index a is summed from 1 to D. Substituting (50.3) into (49.21), we see that

[u,v]

dU p dx ^ u

dua

'

p

]

(50.4)

ga

Thus (50.2) holds. Conversely, suppose that (50.2) holds. Then for any local basis {v a, a = 1,., D} for D we have [ va, v p ] = Cap v y

(50.5)

where Crap are some smooth functions. We show first that there exists a local basis

{ua ,a = 1,..., D} which satisfies the somewhat stronger condition [ua, u p ] = 0,

a, P = 1,.,D

(50.6)

To construct such a basis, we choose a local coordinate system y and represent the basis {va} by the component form

v a = U k1 + - + UD k d + UD+1k d +1 +

= UpPk p + UkA

+ U kn

(50.7)

Chap. 10

370

VECTOR FIELDS

where {ki} denotes the natural basis of y, and whre the repeated Greek indices p and A are

summed from 1 to D and D +1 to N, respectively. Since the local basis {va] is linearly independent, without loss of generality we can assume that the D x D matrix |^yp^ is nonsingular, namely

det [upa ]# 0

(50.8)

Now we define the basis {ua} by

a = 1,_,D

Ua - uvp,

(50.9)

where [u~ip] denotes the inverse matrix of [yp], as usual. Substituting (50.7) into (50.9), we

see that the component representation of ua is ua = Sfr p + uA k a=

k

a

+ uA k A

(50.10)

We now show that the basis {ua } has the property (50.6). From (50.10), by direct calculation based on (49.21), we can verify easily that the first D components of [ua, up] are zero, i.e., [ua,up] has the representation

[u« • u p ] = Kpk A

(50.11)

but, by assumption, D is closed with respect to the Lie bracket; it follows that

[ua, up] = Kapuy

(50.12)

Substituting (50.10) into (50.12), we then obtain

[u., u p ]=KYp k Y + K.Apk A

(50.13)

Comparing this representation with (50.11), we see that KYp must vanish and hence, by (50.12), (50.6) holds. Now we claim that the local basis {u

V and that

}

for D can be extended to a field of basis {ui} for

The Frobenius Theorem

Sec. 50

[u i, u j ] = 0,

371

i,j = 1,., N

(50.14)

This fact is more or less obvious. From (50.6), by the argument presented in the preceding section, the integral curves of {ua } form a “coordinate net” on a D-dimensional hypersurface defined in the neighborhood of any reference point x0. To define uD+1, we simply choose an

arbitrary smooth curve X (t, x0) passing through x0, having a nonzero tangent, and not belonging to the hyper surface generated by {u1,..., u D}. We regard the points of the curve X (t, x 0) to have constant coordinates in the coordinate net on the D-dimensional hypersurfaces, say (x^,..., xD ). Then we define the curves X (t, x) for all neighboring points x on the hypersurface

of xo , by exactly the same condition with constant coordinates ( x1,., xD ). Thus, by definition, the flow generated by the curves X (t, x) preserves the integral curves of any ua. Hence if we define uD+1 to be the tangent vector field of the curves X(t,x) , then

[uD+1, u ] = 0,

a = 1,... ,D

(50.15)

where we have used the necessary and sufficient condition (49.29) for the condition (49.30).

Having defined the vector fields {u1,.,uD+1}which satisfy the conditions (50.6) and (50.15), we repeat that same procedure and construct the fields uD+2,uD+3,., until we arrive at a

field of basis {u1,., uN}. Now from a theorem proved in the preceding section [cf. (49.39)] the

condition (50.14) is necessary and sufficient that {ui} be the natural basis of a coordinate system x. Consequently, D is integrable and the proof is complete. From the proof of the preceding theorem it is clear that an integrable distribution D can be characterized by the opening remark: In the neighborhood of any point x0 in the domain of

D there exists a D-dimensional hypersurface S such that D (x) is the D-dimensional tangent

hyperplane of S at each point x in S .

We shall state and prove a dual version of the Frobenius theorem in Section 52.

Exercises 50.1

Let D be an integrable distribution defined on U. Then we define the following equivalence relation on U:x0 ~ x1 «• there exists a smooth curve X in U joining x0 tox1 and tangent to D at each point, i.e., X (t) e D (X (t)), for all t. Suppose that S is an

equivalence set relative to the preceding equivalence relation. Show that S is an

372

Chap. 10



VECTOR FIELDS

integral surface of D . (Such an integral surface is called a maximal integral surface or a leaf of D .)

Sec. 51

Differential Forms, Exterior Derivative

373

Section 51. Differential Forms and Exterior Derivative In this section we define a differential operator on skew-symmetric covariant tensor fields. This operator generalizes the classical curl operator defined in Section 47.

Let U be an open set in E and let A be a skew-symmetric covariant tensor field of order r on U, i.e.,

A: U ^ Tr (V)

(51.1)

Then for any point x e U , A (x) is a skew-symmetric tensor of order r (cf. Chapter 8).

Choosing a coordinate chart xr onU as before, we can express A in the component form A= A

g i1 ® ... ® gir 1---rra °

i

(51.2)

where {gi } denotes the natural basis of xr . Since A is skew-symmetric, its components obey the identify

A

^V- j... k ...i,

-Ai1.k.j.i

(51.3)

for any pair of indices (j,k) in (i1,...,ir). As explained in Section 39, we can then represent A by

A=

E

Ai^ g i1 A • ■ ■A gir = -1 Ai^ g i1 A • ■ ■A g 1 r r! 1 r

(51.4)

H

)

v t + A t - p t,t+at (v (t)) ( du r lim--------------- ,1--- =- = +----uA At ^ dt

(56.37)

which is consistant with the previous formula (56.21) for the surface covariant derivative Dv/Dt of v along 1. Since the parallel transport pt-t is a linear isomorphism from S1 (t) to S1 (t), it gives rise to various induced parallel transport for tangential tensors. We define a tangential tensor field A on 1 a constant field or a parallel field if

D ADt = 0

(56.38)

Then the equations of parallel transport along 1for tensor fields of the forms (56.20) are dA dt

r

A A.

r

F1...Fr-1A fr r n V _ ^

dA1



ALj J dt

=0

(56.39)

If we donate the induced parallel transport by P01, or more generally by Pt T, then as before we have

DA = Aim A (t + At)-Pt,t+at (A (t))

(56.40)

Sec. 56

Surface Covariant Derivatives

423

The coordinate-free definitions (56.34) and (56.40) demonstrate clearly the main difference between the surface covariant derivative on S and the ordinary covariant derivative on the Euclidean manifold E . Specifically, in the former case the parallelism used to compute the difference quotient is the path-dependent surface parallelism, while in the latter case the parallelism is simply the Euclidean parallelism, which is path-independent between any pair of points in E . In classical differential geometry the surface parallelism p, or more generally P, defined by the equation (56.28) or (56.39) is known as the Levi-Civita parallelism or the Riemannian parallelism. This parallelism is generally path-dependent and is determined completely by the surface Christoffel symbols.

Exercises 56.1

Use (55.18), (56.5), and (56.4) and formally derive the formula (56.3).

56.2

Use (55.9) and (56.5) and the assumption that D hr / Dys is a surface vector field and formally derive the formula (56.11).

56.3

Show that rq dhA DhAA = fOl lr A= |r J I dyr Dyr rA

and

.

h

dh£ dy

A 'a, S

h

A

D hr

' Dys

f rl J ^ < SA J

These formulas show that the tangential parts of dhjdyr and dhr/dys coincide with

DhA/Dyr and Dhr/Dys , respectively.

56.4

Show that the results of Exercise 56.3 can be written

phA^I DIa LI —A I =---- A Idy rJ Dyr

, and

। phr L| —^dys

D hr Dys

where L is the field whose value at each x e Sis the projection Lx defined in Exercise 55.1.

Chap. 11

424

56.5

HYPERSURFACES

Use the results of Exercise 56.3 and show that

( dA ^ VA = L*| —A I®hA [dy aJ where L*is the linear mapping induced by L .

56.6

Adopt the result of Exercise 56.5 and derive (56.10). This result shows that the above formula for VA can be adopted as the definition of the surface gradient. One advantage of this approach is that one does not need to introduce the formal operation D ADyA . If needed, it can simply be defined to be L* (d^ dyA).

56.7

Another advantage of adopting the result of Exercise 56.5 as a definition is that it can be used to compute the surface gradient of field on S which are not tangential. For example, each gk can be restricted to a field on S but it does not have a component representation of the form (56.9). As an illustration of this concept, show that

Vgk

j ! klJ g

dx q dx1 h s® h A dy dy

and

( d2 xk VL = [dy rdy A

x' rsidxk rki dx1 dx1 ^ dd^h® ® hA ® hr = 0 J---F + [ Ar J dys [jl J a/ ay^) gks dy ®

Note that VL =0 is no more than the result (56.12). 56.8

Compute the surface Christoffel symbols for the surfaces defined in Excercises 55.5 and 55.7.

56.9

If A is a tensor field on E, then we can, of course, calculate its spatial gradient, grad A. Also, we can restrict its domain Sand compute the surface gradientVA . Show that

VA=L* (gradA) 56.10 Show that

(det [ a ])1 (det [ a ])1 dys

d

Ar

'

Ar

'

r®i [ J ®s J
, = ^XQA dy

r = 1,... n -1

(59.13)

Chap. 11

446

HYPERSURFACES

can be integrated near x0 for each prescribed initial value

U r( x 0 ) = U,

r = 1,... N -1

(59.14)

Proof. Necessity. Let {vL, L = 1,..., N -1} be linearly independent and satisfy (59.13). Then we can obtain the surface Christoffel symbols from

0=

duL rl Q !> +U < QA dy A

(59.15)

and

r ri dul ^ -uQL dy A < “A J

(59.16)

where [uQ J denotes the inverse matrix of |U“]. Substituting (59.16) into (59.10), we obtain

directly

(59.17)

R rLAQ = 0

Sufficiency. It follows from the Frobenius theorem that the conditions of integrability for the system of first-order partial differential equations

u=-u “i n dyA |“Aj

(59.18)

are

d u“r r |QA dyL

d dy A k

U“


(60.5)

= -u nR rQyA hr as required by (59.9), while the right-hand side is a vector having the normal projection

P( brAU r)

dy y

d( brryyu r) 1 1 1 ^ dyA + bryu ,a - bfau ,y n

(60.6)

and the tangential projection

(-b

u r b,b bryu r b,) h,

(60.7)

where we have used Weingarten’s formula (58.7) to compute the covariant derivatives dn / dyy and dn / dyA . Consequently, (60.5) implies that

d( brAu r) d( bryu r) V rA —---- V + bbryur A - brAur y = 0 dy y dyA ’A ,y

(60.8)

and that

Ur R%yA = brAur b,- bryurb,

(60.9)

for all tangential vector fields v.

If we choose v = hr, then (60.9) becomes

R%yA = bfa b,- bryb,

(60.10)

R,ryA = bfa b,y — bryb,a

(60.11)

or, equivalently,

Sec. 60

Surface Curvature, III

451

Thus the Riemann-Christoffel tensor R is completely determined by the second fundamental form. The important result (60.11) is called the equations of Gauss in classical differential geometry. A similar choice for v. in (60.8) yields

dbrA dys

bOA "

ol rs

dbrs +bos ^ ol!> = 0 dy rA,

(60.12)

By use of the symmetry condition (56.4) we can rewrite (60.12) in the more elegant form b ia.s brs,A =

0

(60.13)

which are the equations of Codazzi. The importance of the equations of Gauss and Codazzi lies not only in their uniting the second fundamental form with the Riemann Christoffel tensor for any hypersurfaces S in E , but also in their being the conditions of integrability as asserted by the following theorem. Theorem 60.1. Suppose that arA and brA are any presceibed smooth functions of (yn) such

that [arA] is positive-definite and symmetric, [brA] is symmetric, and together [arA] and [brA] satisfy the equations of Gauss and Codazzi. Then locally there exists a hyper surface S with representation

x = xi (yn)

(60.14)

on which the prescribed arA and brA are the first and second fundamental forms.

Proof. For simplicity we choose a rectangular Cartesian coordinate system xi in E . Then in component form (58.21) and (58.7) are represented by .A A , i I S1, i dhrr / dyA = bia rAn ■ y r^^k > h4, s

dn / dy r = - brA ha

(60.15)

We claim that this system can be integrated and the solution preserves the algebraic conditions

SijhrhA= arA,

Sijh[n1 = 0,

i)in n1 = 1

(60.16)

In particular, if (60.16) are imposed at any one reference point, then they hold identically on a neighborhood of the reference point.

Chap. 11

452

HYPERSURFACES

The fact that the solution of (60.15) preserves the conditions (60.16) is more or less obvious. We put ffrA = = 8 ujh'h t"a1 - a arA,

ffr ==u8jh'n tn 1,

8 n'n1 -11 ff = = ^nn^

(60 17) (w.i/)

Then initially at some point (y01, y02) we have f:a(y0,yo’)=f■(y' y0) = f (y ■ y2) = 0

Now from (60.15) and (60.17) we can verify easily that

'■f| = brsfa+ b*J-, f = -bA.f ' ,A f,+ brsf, dy dy IJSJ

f dy

’fa

(60.19)

along the coordinate curve of any ys. From (60.18) and (60.19) we see that frA, fr, and f must vanish identically. Now the system (60.15) is integrable, since by use of the equations of Gauss and Codazzi we have

d rdhr i d rdhri /nQ , ,Q , ,^,i I a”I —F I = (R rsA - brsba + brAby )ho dy\dy\) dyA ,dySJ v rsA rs A rA o

—fI —a”

+ ( brs,A brA,s) n =

(60.20)

0

and

d | dn1 dy Mdyr) dy Y[dy sH bOr b“J h° = 0

(60.21)

Hence locally there exist functions hr (y“) and n1 (y“) which verify (60.15) and (60.16).

Next we set up the system of first-order partial differential equations

dx1 / dy A= hA (y “)

(60.22)

where the right-hand side is the solution of (60.15) and (60.16). Then (60.22) is also integrable, since we have

Sec. 60

Surface Curvature, III

453

dhA h i M Ql , ii= d ( dx1 — baln + 1 AL r hQ! dy AldyL dy [

(60.23)

Consequently the solution (60.14) exists; further, from (60.22), (60.16), and (60.15) the first and the second fundamental forms on the hypersurface defined by (60.14) are precisely the prescribed fields arA and brA, respectively. Thus the theorem is proved.

It should be noted that, since the algebraic conditions (60.16) are preserved by the solution of (60.15), any two solutions (hr,n1) and (hr,nj) of (60.15) can differ by at most a transformation of the form hr = Qjhj,

ni = Q11

(60.24)

where [Qj ] is a constant orthogonal matrix. This transformation corresponds to a change of

rectangular Cartesian coordinate system on E used in the representation (60.14). In this sense the fundamental forms arA and brA , together with the equations of Gauss andCodazzi, determine the hyper surface S locally to within a rigid displacement in E . While the equations of Gauss show that the Riemann Christoffel tensor R of S is completely determined by the second fundamental form B, the converse is generally not true. Thus mathematically the extrinsic characterization of curvature is much stronger than the intrinsic one, as it should be. In particular, if B vanishes, then S reduces to a hyperplane which is trivially developable so that R vanishes. On the other hand, if R vanishes, then S can be developed into a hyperplane, but generally B does not vanish, so S need not itself be a hyperplane. For example, in a three-dimensional Euclidean space a cylinder is developable, but it is not a plane.

Exercise 60.1

In the case where N = 3 show that the only independent equations of Codazzi are

bAA,r - bAr,A =

0

for A ^ r and where the summation convention has been abandoned. Also show that the only independent equation of Gauss is the single equation

R1212=b11b22-b122 =detB

Chap. 11

454

HYPERSURFACES

Section 61. Surface Area, Minimal Surface Since we assume that S is oriented, the tangent plane Sx of each point x e S is equipped with a volume tensor S. Relative to any positive surface coordinate system (yr) with

natural basis {hr}• S can be represented by S

h1

(61.1)

a • • • a h N-1

where a = det [ arA]

If U is a domain in S covered by (yr), then the surface area

(61.2) g{U)

is defined by

) = J- • -J4ady1 ••• dyN 1

ct(U

(61.3)

U

where the (N -1) -fold integral is taken over the coordinates (yr) on the domain U. Since Oa

obeys the transformation rule 0a

y/a det dyr dyA

(61.4)

relative to any positive coordinate systems (yr) and (yr), the value of ct(U) is independent of the choice of coordinates. Thus o can be extended into a positive measure on S in the sense of measure theory.

We shall consider the theory of integration relative to ct in detail in Chapter 13. Here we shall note only a particular result wich connects a property of the surface area to the mean curvature of S . Namely, we seek a geometric condition for S to be a minimal surface, which is defined by the condition that the surface area ct (U) be an extremum in the class of variations of hypersurfaces having the same boundary as U . The concept of minimal surface is similar to that of a geodesic whichwe have explored in Section 57, except that here we are interested in the variation of the integral ct given by (61.3) instead of the integral s given by (57.5).

As before, the geometric condition for a minimal surface follows from the EulerLagrange equation for (61.3), namely,

Sec. 61

Surface Area, Minimal Surface

455

d d48 ^ dOa dy r t dhr , ~Xxr

(61.5)

where a is regarded as a function of x and hr through the representation

dx ' d x j arA = 8 hrh1 = 8^ |Xr dy dy

(61.6)

For simplicity we have chosen the spatial coordinates to be rectangular Cartesian, so that arA does not depend explicitly on x'. From (61.6) the partial derivative of 08 with respect to hr is given by bO8

Tro

a arA 8'j hA

Oa 8ij h rj

(61.7)

Substituting this formula into (61.5), we see that the condition for S to be a minimal surface is 08 8.'j

or

/rj dh rj ) n h ^+—- | = 0, dyr I

i = 1,_, N

(61.8)

or, equivalently, in vector notation . r dh dhr । _ ^ hr +—- I = 0 or dyr

O

a

(61.9)

In deriving (61.8) and (61.9), we have used the identity

a |O I = !> a ~yO or W

(61.10)

which we have noted in Exercise 56.10. Now from (58.20) we can rewrite (61.9) in the simple form Ob bjTn = 0

(61.11)

456

Chap. 11

HYPERSURFACES

As a result, the geometric condition for a minimal surface is IB = tr B = br= 0

(61.12)

In general, the first invariant IB of the second fundamental form B is called the mean curvature of the hyper surface. Equation (61.12) then asserts that S is a minimal surface if and only if its mean curvature vanishes. Since in general B is symmetric, it has the usual spectral decomposition N-1

B = £ Pc K c r

(61.13)

r=i

where the principal basis {cr} is orthonormal on the surface. In differential geometry the

direction of c r (x) at each point x e S is called a principal direction and the corresponding proper number p (x) is called a principal (normal) curvature at x. As usual, the principal

(normal) curvatures are the extrema of the normal curvature

Kn = B ( s, s)

(61.14)

in all unit tangents s at any point x [cf. (58.29)2]. The mean curvature IB, of course, is equal to the sum of the principal curvatures, i.e.,

IB = £ P r=1

(61.15)

Sec. 62

Three-Dimensional Euclidean Manifold

457

Section 62. Surfaces in a Three-Dimensional Euclidean Manifold In this section we shall apply the general theory of hypersurfaces to the special case of a two-dimensional surface imbedded in a three-dimensional Euclidean manifold. This special case is the commonest case in application, and it also provides a good example to illustrate the general results.

As usual we can represent S by xi = xi (yr)

(62.1)

where i ranges from 1 to 3 and r ranges from 1 to 2. We choose (xi) and (yr) to be positive

spatial and surface coordinate systems, respectively. Then the tangent plane of S is spanned by the basis hr=-T g i, dy

(62.2)

r = 1,2

and the unit normal n is given by n=

h1 x h2 12

(62.3)

I|h1 x h2||

The first and the second fundamental forms arA and brA relative to (yr) are defined by arA = hr’ hA.

, dhr , dn brA = n---- r = -hr-------rA dyA r dyA

(62.4)

These forms satisfy the equations of Gauss and Codazzi:

R®rsA brA b®s

brsb® a,

brA,s brs,A

0

(62.5)

Since brA is symmetric, at each point x e S there exists a positive orthonormal basis

{c (x)} r

relative to which B (x) can be represented by the spectral form B (x) = Pi (x)

c ( x) ® c (x) + p (x) c ( x) ® c (x) 1

1

2

2

(62.6)

Chap. 11

458

HYPERSURFACES

The proper numbers $1 (x) and $2 (x) are the principal (normal) curvatures of S at x. In general {c r (x)} is not the natural basis of a surface coordinate system unless

[c1,c2] =0

(62.7)

However, locally we can always choose a coordinate system (zr) in such a way that the natural basis {hr} is parallel to {c r (x)}. Naturally, such a coordinate system is called a principal

coordinate system and its coordinate curves are called the lines of curvature. Relative to a principal coordinate system the components arA and brA satisfy b12 = a12 = 0

h1 =^11 c1,

and

h2 = ^/a22 c2

(62.8)

The principal invariants of B are I b = tr B = $1 + $2 = arA brA

IIb = det B = $1$2 = det [ bAr]

(62.9)

In the preceding section we have defined IB to be the mean curvature. Now IIB is called the Gaussian curvature. Since

(62.10)

bA = a rQ baq

we have also

det [b det[a

II = B

] = det [baq]

aq

]

aq

(62.11)

a

where the numerator on the right-hand side is given by the equation of Gauss:

det[bAq]=b11b22-b122=R1212

(62.12)

Notice that for the two-dimensional case the tensor R is completely determined by R1212, since from (62.5)1, or indirectly from (59.10), R-rEA vanishes when ® = r or when E = A.

We call a point x e S elliptic, hyperbolic, or parabolic if the Gaussian curvature threre is positive, negative, or zero, respectively. These terms are suggested by the following geometric considerations: We choose a fixed reference point x0 e S and define a surface coordinate

Sec. 62

Three-Dimensional Euclidean Manifold

459

system (yr) such that the coordinates of x0 are (0,0)) and the basis {hr( x)} is orthonormal, i.e., (62.13)

a fa(0.0) = ,•

In this coordinate system a point x e S having coordinate (y1. y2) near (0.0) is located at a

normal distance d(y1,y2) from the tangenet plane Sx with d(y1,y2) given approximately by d (y ’. y 2 ) =

n (x )•(x - x ) 0

0

= n (x0 )

x ) yr+ 22 Dhf Dy

hr(

yryA

0

(62.14)

x0

=1 brA (x0) yr yA where (62.14)2.3 are valid to within an error of third or higher order in yr. From this estimate we see that the intersection of S with a plane parallel to Sx is represented approximately by the curve

bfa(

x ) yr yA = const 0

(62.15)

which is an ellipse. a hyperbola. or parallel lines when x0 is an elliptic. hyperbolic. or parabolic point. respectively. Further. in each case the principal basis {cr} of B (x0) coincides with the principal axes of the curves in the sense of conic sections in analytical geometry. The estimate (62.14) means also that in the rectangular Cartesian coordinate system (xi) induced by the

orthonormal basis {h1(x0).h2(x0).n}. the surface S can be represented locally by the

equations x1 = y ’.

x2 = y2.

x3 = 2 bFa( x0) yr yA

16

(62. )

As usual we can define the notion of conjugate directions relative to the symmetric bilinear form of B(x0). We say that the tangential vectors u.ve Sx are conjugate at x0 if B (u. v) = u • (Bv) = v • (Bu) = 0

(62.17)

460

Chap. 11

HYPERSURFACES

For example, the principal basis vectors c1(x0) and c2(x0) are conjugate since the components of B form a diagonal matrix relative to {cr} . Geometrically, conjugate directions may be explained in the following way: We choose any curve X(t) e S such that

X (0) = u eS

(62.18)

Then the conjugate direction vofu corresponds to the limit of the intersection of SX(t) with Sx as t tends to zero. We leave the proof of this geometric interpretation for conjugate directions as an exercise.

A direction represented by a self-conjugate vector v is called an asymptotic direction. In this case v satisfies the equation B (v, v ) = v.( Bv ) = 0

(62.19)

Clearly, asymptotic directions exit at a point xo if and only if xo is hyperbolic or parabolic. In the former case the asympototic directions are the same as those for the hyperbola given by (62.15), while in the latter case the asymptotic direction is unique and coincides with the direction of the parallel lines given by (62.15). If every point of S is hyperbolic, then the asymptotic lines form a coordinate net, and we can define an asymptotic coordinate system. Relative to such a coordinate system the components brA satisfy the condition

b11 = b22 = 0

(62.20)

II B =-b122 /a

(62.21)

and the Gaussian curvature is given by

A minimal surface which does not reduce to a plane is necessarily hyperbolic at every point, since when [> + [>2 = 0 we must have [> [>2 < 0 unless [> = [>2 = 0. From (62.12), S is parabolic at every point if and only if it is developable. In this case the asymptotic lines are straight lines. In fact, it can be shown that there are only three kinds of developable surfaces, namely cylinders, cones, and tangent developables. Their asymptotic lines are simply their generators. Of course, these generators are also lines of curvature, since they are in the principal direction corresponding to the zero principal curvature.

Sec. 62

Three-Dimensional Euclidean Manifold

461

Exercises 62.1

Show that

J= = n • (h1 x h2) 62.2

Show that

b _ 1 d2x (dx dx rA Oa dyAdyr ^dy1 dy2 62.3

Compute the principal curvatures for the surfaces defined in Exercises 55.5 and 55.7.

Chapter 12

ELEMENTS OF CLASSICAL CONTINUOUS GROUPS In this chapter we consider the structure of various classical continuous groups which are formed by linear transformations of an inner product space V . Since the space of all linear transformation of V is itself an inner product space, the structure of the continuous groups contained in it can be exploited by using ideas similar to those developed in the preceding chapter. In addition, the group structure also gives rise to a special parallelism on the groups.

Section 63. The General Linear Group and Its Subgroups In section 17 we pointed out that the vector space L(V;V) has the structure of an algebra, the product operation being the composition of linear transformations. Since V is an inner product space, the transpose operation

T :L (V ;V )^ L (V ;V )

(63.1)

is defined by (18.1); i.e., u• AT v = Au • v,

u, ve^

(63.2)

for any A eL(V;V). The inner product of any A,BeL(V;V) is then defined by

(cf. Exercise 19.4))

A • B = tr (ABT )

(63.3)

From Theorem 22.2, a linear transformation A eL(V;V) is an isomorphism of V if and only if det A * 0

(63.4)

Clearly, if A and B are isomorphisms, then AB and BA are also isomorphisms, and from Exercise 22.1 we have

463

464

Chap. 12

CLASSICAL CONTINUOUS GROUP

det(AB)=det(BA) = (detA) (detB)

(63.5)

As a result, the set of all linear isomorphisms of V forms a group GL (V ) , called the general linear group ofV . This group was mentioned in Section 17. We claim that GL (V ) is the disjoint

union of two connected open sets in L(V;V) . This fact is more of less obvious since GL (V ) is the reimage of the disjoint union (-»,0)u(0,») under the continuous map det: L (V ;V )^> R

If we denote the primeages of (-»,0) and (0,») under the mapping det by GL (Y) and GL (V )+ , respectively, then they are disjoint connected open sets inL(V;V) , and

GL (V) = GL (V)- oGL (V)+

(63.6)

Notice that from (63.5) GL(V )+ is, but GL (V )- is not, closed with respect to the group operation. So GL(V )+ is, but GL(V )-is not, a subgroup ofGL(V) . We call the elements of GL(V )+ and GL(V )- proper and improper transformations ofV , respectively. They are

separated by the hypersurface S defined by

AeS &AeL (V ;V)

and

det A = 0

Now since GL(V) is an open set in L(V;V) , any coordinate system in L(V;V)

corresponds to a coordinate system in GL(V) when its coordinate neighborhood is restricted to a subset ofGL(V). For example, if {ei} and {ei} are reciprocal bases foiV, then {ei ® ei} is a basis

for L(V;V) which gives rise to the Cartesian coordinate system {Xij } . The restriction of {Xij } to GL(V) is then a coordinate system onGL(V) . A Euclidean geometry can then be defined on

GL (V) as we have done in general on an arbitrary open set in a Euclidean space. The Euclidean

geometry on GL(V) is not of much interest, however, since it is independent of the group structure onGL(V).

The group structure on GL (V) is characterized by the following operations.

1. Left-multiplication by any AeGL (V ) ,

The General Linear Group

Sec. 64

465

LA : GL(V)GL(V)

is defined by LA (X) = AX,

Xe GL(V)

(63.7)

2. Right-multiplication by any A e GL (V)

RA :GL (V)^GL(V) is defined by

RA (X ) = XA,

X eGL (V)

(63.8)

3. Inversion J :GL(V}>GL(V)

is defined by J(X)=X-1,

XeGL (V)

(63.9)

Clearly these operations are smooth mappings, so they give rise to various gradients which are fields of linear transformations of the underlying inner product space L(V;V). For example, the gradient of LA is a constant field given by

V LA (Y) = AY,

Ye L (V; V)

(63.10)

since for any Y eL(V;V) we have d (A X + A Yt) d LA (X+Y t) = d tt dt

=AY

By the same argument V RA is also a constant field and is given by

Chap. 12

466

V RA ( Y ) = YA,

CLASSICAL CONTINUOUS GROUP

YeL (V ;V )

(63.11)

On the other hand, VJ is not a constant field; its value at any point X e GL(V) is given by

[VJ (X )](Y ) = -X-1YX,

Y e L (V ;V)

(63.12)

In particular, at X=I the value of VJ(I) is simply the negation operation,

[VJ (I)]( Y ) = - Y,

YeL (V ;V )

(63.13)

We leave the proof of (63.13) as an exercise.

The gradients VLA , AeGL(V), can be regarded as a parallelism on GL(V) in the

following way: For any two points X and Yin GL (V) there exists a unique AeGL(V) such that Y=LA X. The corresponding gradient V LA at the point X is a linear isomorphism

[VLA (X)] GLV)x >GL(V)y

(63.14)

where GL(V)X denotes the tangent space of GL(V) at X , a notation consistent with that introduced in the preceding chapter. Here, of course, GL(V)X coincides with L(V;V)X , which

is a copy of L(V;V). We inserted the argument X in VLA (X) to emphasize the fact that

VLA (X) is a linear map from the tangent space of GL(V) at X to the tangent space of GL(V)at the image point Y. This mapping is given by (63.10) via the isomorphism of GL(V)X and GL(V)Y with L(V;V).

From (63.10) we see that the parallelism defined by (63.14) is not the same as the Euclidean parallelism. Also, VLA is not the same kind of parallelism as the Levi-Civita parallelism on a hypersurface because it is independent of any path joining X and Y. In fact, if X and Y do not belong to the same connected set of GL(V), then there exists no smooth curve joining them at all, but the parallelism VLA (X) is still defined. We call the parallelism VLA (X) with AeGL (V) the

Cartan parallelism on GL(V), and we shall study it in detail in the next section. The choice of left multiplication rather than the right multiplication is merely a convention. The gradient RAalso defined a parallelism on the group. Further, the two parallelisms VLA and VRA are related by

The General Linear Group

Sec. 64

467

VLAA =VJ ° VRA- ° VJ 1

(63.15)

Since LA and RA are related by LA = J R o

A-1 o

J

(63.16)

for all AeGL(V) .

Before closing the section, we mention here that besides the proper general linear group GL (V )+ , several other subgroups of GL(V)are important in the applications. First, the special linear group SL(V)is defined by the condition

A eSL (V )& det A = 1

(63.17)

Clearly SL(V)is a hypersurface of dimension N2 -1 in the inner product space L(V;V). Unlike GL(V), SL(V)is a connected continuous group. The unimodular group UM (V) is

defined similiarly by

AeUM (V )^ |det A = 1

(63.18)

Thus SL(V) is the proper subgroup of UM(V) , namely

UM(V )+ =SL (V )

(63.19)

Next the orthogonal group O(V)is defined by the condition

AeO (V )& A-1 = A T

(63.20)

From (18.18) or (63.2) we see that A belongs to O(V) if and only if it preserves the inner product of V , i.e., Au • Av = u • v,

u, v e V

(63.21)

Further, from (63.20) if AeO(V), then det A has absolute value 1. Consequently, O(V)is a subgroup of UM(V). As usual, O(V)has a proper component and an improper component, the

468



Chap. 12

CLASSICAL CONTINUOUS GROUP

former being a subgroup of O(V), denoted by SO (V ) , called the special orthogonal group or the rotational group of V . As we shall see in Section 65, O(V)is a hypersurface of dimension

1 N(N -1) . From (63.20), O(V)is contained in the sphere of radius NN in L(V;V) since if 2 AgO (V), then

A • A = tr AA T = tr AA 1 = tr I = N

(63.22)

As a result, O(V) is bounded in L(V;V). SinceUM (V), SL(V), O(V), and SO (V) are subgroups of GL(V), the restrictions of the group operations of GL (V )can be identified as the group operations of the subgroups. In particular, if X, Yg SL (V) and Y = LA (X), then the restriction of the mapping V LA (X) defined

by (63.14) is a mapping

[VL„ ( X )]:SL V)X ^ SL V)Y

(63.23)

Thus there exists a Cartan parallelism on the subgroups as well as onGL(V). The tangent spaces

of the subgroups are subspaces of L(V;V) of course; further , as explained in the preceding chapter, they vary from point to point. We shall characterize the tangent spaces of SL(V) and

SO (V) at the identity element I in Section 66. The tangent space at any other point can then be obtained by the Cartan parallelism, e.g.,

1

SL(V')x =[VLX ( )](SL(V)i) for any XgSL (V) .

Exercise 63.1 Verify (63.12) and (63.13).

Sec. 64

The Parallelism of Cartan

469

Section 64. The Parallelism of Cartan The concept of Cartan parallelism on GL (V )and on its various subgroups was introduced in the preceding section. In this section, we develop the concept in more detail. As explained before, the Cartan parallelism is path-independent. To emphasize this fact, we now replace the notation VLA (X) by the notation C (X,Y), where Y=AX . Then we put

C (X > C (I, X >VLx (I)

(64.1)

for any X eGL (V) . It can be verified easily that

C ( X, Y ) = C (Y ) ° C ( X )-1

(64.2)

for all X and Y. We use the same notation C(X,Y) for the Cartan parallelism from X to Y for

pairs X, Y in GL(V)or in any continuous subgroup of GL(V), such as SL(V). Thus

C (X, Y) denotes either the linear isomorphism C(X,Y) :GL(V)X >GL(V)Y

(64.3)

C (X, Y) : SL (V)X ^ SL (V)Y

(64.4)

or its various restrictions such as

when X, Y belong to SL (V ).

Let V be a vector field on GL(V), that is V :GL(V) >L(V;V)

(64.5)

Then we say that V is a left-invarient field if its values are parallel vectors relative to the Cartan parallelism, i.e.,

[ C (X.Y)](V (X ))=V (Y)

(64.6)

Chap. 12

470

CLASSICAL CONTINUOUS GROUP

for any X, Y in GL (V ). Since the Cartan parallelism is not the same as the Euclidean parallelism induced by the inner product spaceL (V;V) , a left-invarient field is not a constant field. From

(64.2) we have the following representations for a left-invarient field.

Theorem 64. 1. A vector field V is left-invariant if and only if it has the representation V (X )=[C ( X )]( V (I))

(64.7)

for all X eGL (V) .

As a result, each tangent vector V(I) at the identity element I has a unique extension into a left-variant field. Consequently, the set of all left-invariant fields, denoted by

g l (V ), is a copy

of the tangent space GL(V)I which is canonically isomorphic to L(V;V) . We call the

restriction map

|I :gl(V) >GL(V)I = L(V;V) the standard representation of gl(V). Since the elements of

g l (V )

(64.8)

satisfy the representation

(64.7), they characterize the Caratan parallelism completely by the condition (64.6) for allV e gl (V ). In the next section we shall show that gl (V ) has the structure of a Lie algebra, so we call it the Lie Algebra ofGL(V). Now using the Cartan parallelism C(X,Y) , we can define an opereation of covariant

derivative by the limit of difference similar to (56.34), which defines the covariant derivative relative to the Levi-Civita parallelism. Specifically, let X(t)be a smooth curve in GL(V) and let U(t)be a vector field on X(t). Then we defince the covariant derivative of U(t)along

X (t) relative to the Cartan parallelism by DU (t) = U (t + A t )-[C (X (t), X (t +A t))](U (t))

Dt

At

(64.9)

Since the Cartan parallelism is defined not just on GL(V)but also on the various continuous subgroups of GL(V), we may use the same formula (64.9) to define the covariant derivative of

tangent vector fields along a smooth curve in the subgroups also. For this reason we shall now derive a general representation for the covariant derivative relative to the Cartan parallelism without restricting the underlying continuous group to be the general linear groupGL(V). Then

Sec. 64

The Parallelism of Cartan

471

we can apply the representation to vector fields on GL (V ) as well as to vector fields on the various continuous subgroups ofGL (V ), such as the special linear group SL (V ). The simplest way to represent the covariant derivative defined by (64.9) is to first express the vector field U(t) in component form relative to a basis of the Lie algebra of the underlying continuous group. From (64.7) we know that the values {Er (X), r = 1,..., M} of a left-invariant

field of basis {Er (X), r=1,..., M} form a basis of the tangent space at X for all X belonging to the underlying group. Here M denotes the dimension of the group; it is purposely left arbitrary so as to achieve generality in the representation. Now since U(t) is a tangent vector atX(t), it can be

represented as usual by the component form relative to the basis {Er (X (t))}, say U (t )=U r( t) Er( X (t))

(64.10)

where r is summed from 1 to M . Substituting (64.10) into (64.9) and using the fact that the basis {Er} is formed by parallel fields relative to the Cartan parallelism, we obtain directly DU(t)_ dUr(t) Er( X (t)) Dt dt

(64.11)

This formula is comparable to the representation of the covariant derivative relative to the Euclidean parallelism when the vector field is expressed in component form in the terms of a Cartesian basis. As we shall see in the next section, the left-variant basis {Er} used in the representation

(64.11) is not the natural basis of any coordinate system. Indeed, this point is the major difference between the Cartan parallelism and the Euclidean parallelism, since relative to the latter a parallel basis is a constant basis which is the natural basis of a Cartesian coordinate system. If we introduce a local coordinate system with natural basis {Hr ,r = 1,..., M}, then as usual we can

represent X (t) by its coordinate functions (Xr (t)), and Er and U (t) by their components

Er = EJ Ha

and

U (t ) = U r( t) Hr( X (t))

(64.12)

By using the usual transformation rule, we then have U r( t )=U a( t) F

(X (t))

where PFAr~l is the inverse of PEJ! . Substituting (64.13) into (64.11), we get

(64.13)



Chap. 12

472

CLASSICAL CONTINUOUS GROUP

D U = dUA + U q d F dXs E J H Dt ^ dt dX2 dt rJ A

(64.14)

We can rewrite this formula as

DU Dt

dUAqA dX IH U < LAl ^tJ H * dt qs

---------- +

(64.15)

where L ErA LAqs =E

d F> = d Xs

Fr 1 q

d ErA d Xs

(64.16)

Now the formula (64.15) is comparable to (56.37) with LAqs playing the role of the Christoffel

symbols, except that LAqs is not symmetric with respect to the indices q and s . For definiteness,

we call LAqs the Cartan symbols. From (64.16) we can verify easily that they do not depend on the choice of the basis {Er}. It follows from (64.12)1 and (64.16) that the Cartan symbols obey the same transformation rule as the Christoffel symbols. Specifically, if (Xr) is another coordinate system in which the

Cartan symbols are LQs , then

dXA d XT d X0 d XA d2 X * qs~L^ Q X * d X q d X s +d X * d X Qd X s

-A



,

&

(64.17)

This formula is comparable to (56.15). In view of (64.15) and (64.17) we can define the covariant derivative of a vector field relative to the Cartan parallelism by

VU =

d UA +uq Lqs IHA ® Hs d Xs

(64.18)

where {Hs}denotes the dual basis of{HA} . The covariant derivative defined in this way clearly

possesses the following property.

Theorem 64. 2. A vector field U is left-invariant if and only if its covariant derivative relative to the Cartan parallelism vanishes.

The Parallelism of Cartan

Sec. 64

473

The proof of this proposition if more or less obvious, since the condition

d UA + U Q LQS = 0 d XS

(64.19)

■'=0 d XS

(64.20)

is equivalent to the condition

where UA denotes the components of U relative to the parallel basis {Er}. The condition (64.20) means simply that UA are constant, or, equivalent, U is left-invariant.

Comparing (64.16) with (59.16), we see that the Cartan parallelism also possesses the following property.

Theorem 64. 3. The curvature tensor whose components are defined by Rr = d /'---- — Lr + L® Lr — L® Lr R SAQ _ - ,z A LSQ - ,ZQ LSA+LSQL®A LSAL®Q dX dX

(64.21)

vanishes identically.

Notice that (64.21) is comparable with (59.10) where the Christoffel symbols are replaced by the Cartan symbols. The vanishing of the curvature tensor R rSAQ= 0

(64.22)

is simply the condition of integrability of equation (64.19) whose solutions are left-invariant fields. From the transformation rule (64.17) and the fact that the second derivative therein is symmetric with respect to the indices Q and S , we obtain L®)

d XA d X T d X 0

) ) d X® d X Q d XS

r'&

(64.23)

which shows that the quanties defined by T A = A - LSQ A TQS “ L^QS

(64.24)

474

Chap. 12

CLASSICAL CONTINUOUS GROUP

are the components of a third-order tensor field. We call the T the torsion tensor of the Cartan parallelism. To see the geometric meaning of this tensor, we substitute the formula (64.16) into (64.24) and rewrite the result in the following equivalent form: T a E q E s E a d ET E a d E. Tas E