English Pages 305+x [319] Year 1967
LINEAR ALGEBRA
INTERNATIONAL SERIES IN PURE AND APPLIED MATHEMATICS
William Ted Martin and E. H. Spanier, Consulting Editors

AHLFORS - Complex Analysis
BELLMAN - Stability Theory of Differential Equations
BUCK - Advanced Calculus
BUSACKER AND SAATY - Finite Graphs and Networks
CHENEY - Introduction to Approximation Theory
CODDINGTON AND LEVINSON - Theory of Ordinary Differential Equations
DETTMAN - Mathematical Methods in Physics and Engineering
EPSTEIN - Partial Differential Equations
GOLOMB AND SHANKS - Elements of Ordinary Differential Equations
GRAVES - The Theory of Functions of Real Variables
GREENSPAN - Introduction to Partial Differential Equations
GRIFFIN - Elementary Theory of Numbers
HAMMING - Numerical Methods for Scientists and Engineers
HILDEBRAND - Introduction to Numerical Analysis
HOUSEHOLDER - Principles of Numerical Analysis
LASS - Elements of Pure and Applied Mathematics
LASS - Vector and Tensor Analysis
LEPAGE - Complex Variables and the Laplace Transform for Engineers
NEHARI - Conformal Mapping
NEWELL - Vector Analysis
RALSTON - A First Course in Numerical Analysis
ROSSER - Logic for Mathematicians
RUDIN - Principles of Mathematical Analysis
SAATY AND BRAM - Nonlinear Mathematics
SIMMONS - Introduction to Topology and Modern Analysis
SNEDDON - Elements of Partial Differential Equations
SNEDDON - Fourier Transforms
STOLL - Linear Algebra and Matrix Theory
STRUBLE - Nonlinear Differential Equations
WEINSTOCK - Calculus of Variations
WEISS - Algebraic Number Theory
ZEMANIAN - Distribution Theory and Transform Analysis
LINEAR ALGEBRA
WALTER NEF
Professor of Mathematics, University of Berne, Switzerland

Translated from the German by
J. C. Ault
Lecturer in Mathematics, University of Leicester

McGRAW-HILL BOOK COMPANY
New York · St. Louis · San Francisco · Toronto · London · Sydney

Published by McGraw-Hill Publishing Company Limited, McGraw-Hill House, Maidenhead, Berkshire, England

Authorized translation from the first German-language edition, copyrighted in Switzerland and published by Birkhäuser Verlag AG, Basel 94039

Lehrbuch der Linearen Algebra first published by Birkhäuser Verlag, Basel, 1966. © 1966 Birkhäuser Verlag, Basel

Copyright © 1967 McGraw-Hill, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of McGraw-Hill, Inc.
CONTENTS

1 SETS AND MAPPINGS
1.1 Sets
1.2 Families of Sets
1.3 Mappings

2 VECTOR SPACES
2.1 The Concept of a Vector Space, Examples
2.2 Rules for Calculation in Vector Spaces
2.3 Linear Combinations and Calculation with Subsets of a Vector Space
2.4 Subspaces of a Vector Space
2.5 Cosets and Quotient Spaces
2.6 Convex Sets

3 BASES OF A VECTOR SPACE, FINITE-DIMENSIONAL VECTOR SPACES
3.1 Bases of a Vector Space
3.2 Finite-Dimensional Vector Spaces
3.3 The Exchange Method
3.4 Convex Polyhedra

4 DETERMINANTS
4.1 Permutations
4.2 Determinants
4.3 Numerical Evaluation of Determinants

5 LINEAR MAPPINGS OF VECTOR SPACES, MATRICES
5.1 Linear Mappings
5.2 Linear Mappings of Finite-Dimensional Vector Spaces, Matrices
5.3 Linear Mappings of a Vector Space into Itself (Endomorphisms)
5.4 Change of Basis
5.5 Numerical Inversion of Matrices
5.6 The Exchange Method and Matrix Calculation

6 LINEAR FUNCTIONALS
6.1 Linear Functionals and Cosets
6.2 Duality in Finite-Dimensional Vector Spaces
6.3 Linear Functionals which are Positive on a Convex Set

7 SYSTEMS OF LINEAR EQUATIONS AND INEQUALITIES
7.1 The Solutions of a System of Linear Equations
7.2 Numerical Solution of a System of Linear Equations
7.3 Positive Solutions of a System of Real Linear Equations
7.4 Systems of Linear Inequalities

8 LINEAR PROGRAMMING
8.1 Linear Programmes
8.2 The Duality Law of Linear Programming
8.3 The Simplex Method for the Numerical Solution of Linear Programmes
8.4 The Treatment of Free Variables
8.5 General Linear Programmes
8.6 The Simplex Method and Duality

9 TCHEBYCHEV APPROXIMATIONS
9.1 Tchebychev's Method of Approximation
9.2 The Proof of Two Results used Earlier

10 GAME THEORY
10.1 Two-Person Zero-Sum Games, Pure Strategies
10.2 Mixed Strategies
10.3 The Evaluation of Games by the Simplex Method

11 FORMS OF THE SECOND DEGREE
11.1 Quadratic Forms on Real Vector Spaces
11.2 Hermitian Forms on Complex Vector Spaces

12 EUCLIDEAN AND UNITARY VECTOR SPACES
12.1 Euclidean Vector Spaces
12.2 Approximation in Euclidean Vector Spaces, The Method of Least Squares
12.3 Hilbert Spaces
12.4 Unitary Vector Spaces

13 EIGENVALUES AND EIGENVECTORS OF ENDOMORPHISMS OF A VECTOR SPACE
13.1 Eigenvectors and Eigenvalues
13.2 Symmetric Endomorphisms of a Euclidean Vector Space
13.3 The Transformation of Quadratic Forms to Principal Axes
13.4 Self-Adjoint Endomorphisms of a Unitary Vector Space
13.5 Extremal Properties of the Eigenvalues of Symmetric and Self-Adjoint Endomorphisms
13.6 Numerical Calculation of Eigenvalues and Eigenvectors

14 INVARIANT SUBSPACES, CANONICAL FORMS OF MATRICES
14.1 Invariant Subspaces
14.2 Canonical Forms of Matrices

Bibliography
Index
PREFACE

This book is based on an introductory course of lectures on linear algebra which I have often given in the University of Berne. This course is intended for students in their second year and is for both specialist and supplementary mathematicians. These latter usually include actuaries, astronomers, physicists, chemists and also, sometimes, mathematical economists.

Because of the wide diversity of the audience, it was necessary to develop the discussion from as few basic assumptions as possible and this is a feature of the book. It is assumed that the reader has had a grammar school education in mathematics and that he has the ability to think abstractly. It is also helpful if he has some knowledge of vector geometry because this will provide him with visual illustrations of the general theory.

It was also necessary to restrict the course to the most important topics so that in the book, with the exception of the last chapter, only real and complex vector spaces will be considered. In addition, it was necessary to consider the applications of linear algebra in accordance with the needs of the majority of the audience. Apart from details concerning the choice of material, the main effect of this requirement has been the inclusion of simple techniques for the solution of the most important types of numerical problem. Naturally, it has not been possible to give a complete discussion of these.

It is not usual practice to deal with linear programming, Tchebychev approximations, and game theory in a textbook of linear algebra but these topics are becoming increasingly important, so I have presented them in the form of introductory chapters.

The publication of this book would hardly have been possible without the very considerable help of several co-workers. Especially, I would like to thank Miss Dora Hanni for typing the manuscript and Messrs. H. P. Brier, H. P. Blau, D. Fischer, and N. Ragaz for their help with the proof-reading and also H. P. Blau for drawing the diagrams. Lastly, I thank the Birkhäuser Verlag and the McGraw-Hill Publishing Company, Limited for their always friendly co-operation and their careful preparation of the book.

W. NEF
Notes for the Reader

1. The book is divided into fourteen chapters and the chapters into sections. The numbering of definitions, theorems, examples and individual formulae starts afresh in each section. For example, a reference to Theorem 2.4;9 would mean Theorem 9 in section 2.4. Similarly, 3.3;(11) would mean formula (11) in section 3.3. The number of the section is omitted whenever the reference is made within the same section.

2. In contrast with the usual notation, the matrix which has just one column consisting of the components $\xi_k$ of a vector $x$ will be denoted by $\xi$ (instead of $x$). The transpose of $\xi$ is denoted by $\xi'$. This avoids the problem of having different things denoted by the same symbol. (See 5.2.2.)

3. At the end of some of the sections, there are 'Exercises' and 'Problems'. The exercises are numerical examples and the problems are theoretical examples which are often extensions of the material in the text.

4. Readers who are not interested in linear programming, Tchebychev approximations, and game theory may omit the following sections and chapters: 2.6, 3.4, 6.3, 7.3, 7.4; 8, 9, 10.
CHAPTER 1

SETS AND MAPPINGS

In this chapter we state and discuss the properties of sets and mappings which will be needed in the subsequent chapters. We will do this at an intuitive level without attempting to carry out a strictly axiomatic approach.

1.1 Sets

A set is a collection of objects which is thought of as an entity in itself. The objects $x$ in a set $M$ are usually referred to as the elements of the set and this relationship is written formally as $x \in M$. This is usually read as 'x is an element of M' or 'x belongs to M' or simply as 'x is in M'. A set $M$ is said to 'contain' its elements. If the object $y$ is not an element of $M$, we write $y \notin M$.

A set is completely determined by its elements. Consequently two sets $M$ and $N$ are considered to be equal if and only if they contain the same elements, i.e., $M = N$ if and only if

$x \in M$ implies $x \in N$ and $x \in N$ implies $x \in M$.

A set is said to be finite if it contains only a finite number of elements. If every element of the set $A$ is also an element of the set $B$, we say that $A$ is a subset of $B$ and express this in symbols by $A \subseteq B$ or $B \supseteq A$. These are usually read as 'A is a subset of B' or 'A is contained in B' or 'B contains A'. In order to prove that the set $A$ is a subset of the set $B$ it is therefore necessary to show that

$x \in A$ implies $x \in B$.
Clearly $A = B$ if and only if $A \subseteq B$ and $B \subseteq A$. Every set $A$ is a subset of itself. All other subsets $M$ of $A$ are referred to as proper subsets of $A$ and we express this in symbols by $M \subset A$.

In many cases it is convenient to be able to consider the set which contains no elements at all. We call this the empty set and denote it by the symbol $\emptyset$. The empty set is a subset of every other set.

In order to characterize a set, it is necessary to specify its elements. For a finite set $A$, this can be done by writing out a list of its elements in the form $A = \{x_1, \dots, x_n\}$. Thus, for example, $\{1, 3, 5\}$ is the set which contains the numbers 1, 3 and 5 as its elements. If $A$ is countably infinite, we will extend this notation to write $A = \{x_1, x_2, x_3, \dots\}$.

More generally, we will use the notation

$$A = \{x;\ \phi(x)\} \tag{1}$$

where $\phi(x)$ denotes a statement by which the elements of $A$ are characterized. In words (1) reads, 'A is the set of all those elements x for which the statement $\phi(x)$ is true'. For example, in 3-dimensional space, if $x$ is an arbitrary vector and $y$ is a fixed vector, then $A = \{x;\ x \text{ is orthogonal to } y\}$ is the set of all vectors $x$ which are orthogonal to the vector $y$. If $\xi$ is a real number and $N$ is the set of all natural numbers (positive non-zero integers), then $B = \{\xi;\ \xi^2 \in N\}$ is the set of square roots of the natural numbers.

1.2 Families of Sets
Many of the sets studied in mathematics have sets as their elements, that is to say, we will need to deal with 'sets of sets'. If we wish to emphasize that the elements of a set $\mathcal{S}$ are again sets, we will call $\mathcal{S}$ a family of sets and use capital script letters.

Example 1. For each positive non-zero real number $\rho$, let $A_\rho$ be the set $A_\rho = \{\xi;\ -\rho$ [...] $0$, then the intersection becomes $\bigcap(\mathcal{S}) = A_{\rho_0} \in \mathcal{S}$ and $\mathcal{S}$ has the least element $A_{\rho_0}$.

The difference set $A \setminus B$ of two sets $A$ and $B$ is defined by $A \setminus B = \{x;\ x \in A,\ x \notin B\}$. That is, $A \setminus B$ consists of those elements of $A$ which are not contained in $B$. For example, if $A = \{1, 2, 5, 7\}$ and $B = \{5, 6, 7\}$, then $A \setminus B = \{1, 2\}$ and $B \setminus A = \{6\}$.
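The difference set can be tried out directly on finite sets. The following is a minimal sketch in Python, assuming the built-in `set` type as a stand-in for finite sets; the helper name `difference` is our own and not part of the text.

```python
# Finite sets from the example in the text.
A = {1, 2, 5, 7}
B = {5, 6, 7}

def difference(a, b):
    """Return {x; x in a, x not in b}, following the definition of the difference set."""
    return {x for x in a if x not in b}

a_minus_b = difference(A, B)  # elements of A that are not in B
b_minus_a = difference(B, A)  # elements of B that are not in A
print(a_minus_b, b_minus_a)
```

Python's own operator `A - B` computes the same set, so the helper is only there to mirror the defining condition.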
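Problems

1. Prove the distributive laws

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

and

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

2. Let $E$ be a set and, for each subset $A$ of $E$, let $\bar{A}$ denote the complement of $A$ in $E$, i.e. the set $E \setminus A$. Prove the following rules.

(a) $\overline{A \cup B} = \bar{A} \cap \bar{B}$
(b) $\overline{A \cap B} = \bar{A} \cup \bar{B}$
(c) $A \subseteq B$ if and only if $A \cap \bar{B} = \emptyset$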
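Before attempting the proofs, it can be reassuring to check these rules by brute force on a small set. The sketch below is a sanity check only and does not replace a proof; the four-element set `E` and the helper names are our own assumptions.

```python
from itertools import combinations

# A small illustrative set; any finite set would do (assumption, not from the text).
E = frozenset({1, 2, 3, 4})

def subsets(s):
    """Return all subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def complement(a):
    """Complement of a in E, i.e. the difference set of E and a."""
    return E - a

def laws_hold():
    """Check both distributive laws and rules (a), (b), (c) for all subsets of E."""
    all_subsets = subsets(E)
    for a in all_subsets:
        for b in all_subsets:
            if complement(a | b) != complement(a) & complement(b):  # rule (a)
                return False
            if complement(a & b) != complement(a) | complement(b):  # rule (b)
                return False
            if a.issubset(b) != (len(a & complement(b)) == 0):      # rule (c)
                return False
            for c in all_subsets:
                if a & (b | c) != (a & b) | (a & c):                # first distributive law
                    return False
                if a | (b & c) != (a | b) & (a | c):                # second distributive law
                    return False
    return True

print(laws_hold())
```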
1.3 Mappings

1.3.1 General Mappings

Let $E$ and $F$ be two sets. A mapping of $E$ into $F$ is a rule which assigns to each element $x \in E$ a unique element $y \in F$. The element $y \in F$ which is assigned to the element $x \in E$ by the mapping $f$ is called the image of $x$ under $f$ and is denoted by $f(x)$. The set $E$ is called the domain of $f$ and $F$ is called the range of $f$.

Example 1. Suppose $E$ is the set of all squares in a Euclidean plane and $F$ is the set of all real numbers. A mapping of the domain $E$ into the range $F$ may be defined by assigning to each square the real number which is its area.

Mappings will usually be denoted by small Roman letters $f, g, \dots$ (An exception will be made in the case of permutations, cf. 4.1.) When an element $x \in E$ goes into the element $y \in F$ under the mapping $f$, this will be indicated by the notation $x \to y = f(x)$. Two mappings $f$ and $g$ of $E$ into $F$ will be considered to be equal if $f(x) = g(x)$ for all $x \in E$, and this will be written briefly as $f = g$.

Not every element of $F$ needs to be the image of an element in $E$. The images of the elements of $E$ form a subset of $F$ which we call the image of $E$ under $f$ and denote by $f(E)$. Thus $f(E) = \{y;\ y \in F \text{ and there is an element } x \in E \text{ such that } y = f(x)\}$. If it happens that $f(E) = F$, we say that $f$ is a mapping of $E$ onto $F$. More generally, if $A$ is a subset of $E$, we use the notation $f(A)$ for the set of all the images of the elements of $A$. That is, $f(A) = \{y;\ \text{there is an } x \in A \text{ such that } f(x) = y\}$.

Now let $E$, $F$ and $G$ be three sets and let $f$ be a mapping of $E$ into $F$ and $g$ a mapping of $F$ into $G$. We can define a mapping of $E$ into $G$ by assigning to the element $x \in E$ the element $z = g(f(x))$ in $G$. This mapping is called the product of $g$ and $f$ and we denote it by $gf$ (in this order!). The product mapping $gf$ is therefore defined by $gf(x) = g(f(x))$ for all $x \in E$.

It is important to remember the order of the factors. The product $fg$ may have no meaning in general and even when it has (e.g., when $E = F = G$) $fg$ need not be equal to $gf$.
Example 2. Suppose $E = F = G = \{1, 2, 3\}$. Let $f$ be defined by $f(1) = 2$; $f(2) = 3$; $f(3) = 1$ and $g$ by $g(1) = 1$; $g(2) = 3$; $g(3) = 2$. Then, for example, $fg(2) = f(3) = 1$ and $gf(2) = g(3) = 2$ and hence $fg \neq gf$.

Now let $H$ be a fourth set and let $h$ be a mapping of $G$ into $H$. Then both the products $h(gf)$ and $(hg)f$ have a meaning and, in the following fundamental Theorem, we prove that they are equal.

Theorem 1. Multiplication of mappings is associative.

Proof. For $x \in E$, let $y = f(x) \in F$, $z = g(y) \in G$ and $u = h(z) \in H$. Then $gf(x) = z$ and $h(gf)(x) = h(z) = u$. Similarly $hg(y) = u$ and therefore $(hg)f(x) = u$. Thus $h(gf)(x) = (hg)f(x)$ for all $x \in E$ and hence $h(gf) = (hg)f$.

In view of this theorem it is possible to omit the brackets and simply write $hgf$ for the product of $h$, $g$ and $f$.

If $f$ is a mapping of $E$ into $F$ and $K$ is a subset of $F$, we define the inverse image of $K$ under $f$ to be the subset of $E$ given by $f^{-1}(K) = \{x;\ x \in E,\ f(x) \in K\}$. That is, $f^{-1}(K)$ is the subset of $E$ which consists of all those elements whose images under $f$ are in $K$. Note that the symbol $f^{-1}$ does not represent a mapping in general because it is possible for many elements in $E$ to have the same image in $F$. Clearly $f^{-1}(F) = f^{-1}(f(E)) = E$, but we also note that it is possible for $f^{-1}(K)$ to be the empty set and that this will happen when $K \cap f(E) = \emptyset$. If $K$ consists of a single element, say $K = \{y\}$, we will write $f^{-1}(y)$ for $f^{-1}(\{y\})$ and this then means the set of all $x \in E$ for which $f(x) = y$.

Example 3. As in Example 1, let $E$ be the set of all squares in a Euclidean plane, $F$ the set of all real numbers and $f$ the mapping which assigns to each square its area. If $\xi \in F$ and $\xi > 0$, then $f^{-1}(\xi)$ is the set of all squares of side $\sqrt{\xi}$.
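The product of mappings and the inverse image introduced in this subsection can be tried out on finite sets. The sketch below assumes that a mapping of a finite set is stored as a Python dictionary; the mappings `f` and `g` are those of Example 2, while `h`, `product` and `inverse_image` are hypothetical names of our own.

```python
# Mappings of {1, 2, 3} into itself, stored as dictionaries x -> image of x.
f = {1: 2, 2: 3, 3: 1}   # f from Example 2
g = {1: 1, 2: 3, 3: 2}   # g from Example 2
h = {1: 3, 2: 1, 3: 2}   # a third mapping, chosen only for illustration

def product(outer, inner):
    """Return the product 'outer inner', i.e. the mapping x -> outer(inner(x))."""
    return {x: outer[inner[x]] for x in inner}

def inverse_image(mapping, K):
    """Return the set of all x in the domain whose image lies in K."""
    return {x for x in mapping if mapping[x] in K}

fg = product(f, g)   # first g, then f
gf = product(g, f)   # first f, then g

# Associativity (Theorem 1), checked for this particular choice of h, g, f.
left = product(h, product(g, f))
right = product(product(h, g), f)

print(fg, gf, left == right, inverse_image(f, {3}))
```

The order of the arguments of `product` follows the convention of the text: the right-hand factor is applied first.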
1.3.2 One-to-One Mappings

A mapping $f$ is called one-to-one (1-1) when no two different elements in the domain have the same image in the range, i.e. when $x_1 \neq x_2$ implies $f(x_1) \neq f(x_2)$ or, alternatively, $f(x_1) = f(x_2)$ implies $x_1 = x_2$.

If $E$ is a finite set, a mapping $f$ of $E$ is clearly 1-1 if and only if the image $f(E)$ contains exactly the same number of elements as $E$.

When $f$ is 1-1, $f^{-1}(y)$ contains exactly one element $x \in E$ for each element $y \in f(E)$. Putting $f^{-1}(y) = x$ (instead of $\{x\}$), $f^{-1}$ can now be thought of as a mapping from $f(E)$ onto $E$ and we call it the inverse mapping of $f$. The mapping $f^{-1}$ is also 1-1. Obviously $f^{-1}f$ is the mapping of $E$ onto itself under which each $x \in E$ goes into itself. We call this the identity mapping on $E$. Similarly $ff^{-1}$ is the identity mapping on $f(E)$.

Theorem 2. The product of two 1-1 mappings is 1-1.

Proof. Let $f$ and $g$ be 1-1 mappings and suppose that $gf$ is defined. If $gf(x_1) = gf(x_2)$, then $g(f(x_1)) = g(f(x_2))$ and, since $g$ is 1-1, $f(x_1) = f(x_2)$ and hence, since $f$ is 1-1, $x_1 = x_2$.
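For finite sets, the counting criterion for 1-1 mappings and the construction of the inverse mapping can be sketched as follows. Mappings are again assumed to be stored as dictionaries; the particular mappings `f`, `g` and all function names are illustrative choices of our own.

```python
def is_one_to_one(mapping):
    """A finite mapping is 1-1 iff its image has as many elements as its domain."""
    return len(set(mapping.values())) == len(mapping)

def inverse_mapping(mapping):
    """Return the inverse mapping from the image back onto the domain."""
    if not is_one_to_one(mapping):
        raise ValueError("mapping is not one-to-one, so it has no inverse mapping")
    return {y: x for x, y in mapping.items()}

f = {"a": 1, "b": 4, "c": 9}      # a 1-1 mapping of E = {a, b, c} into the integers
g = {1: "x", 4: "y", 9: "z"}      # a 1-1 mapping defined on f(E)

f_inv = inverse_mapping(f)
identity_on_E = {x: f_inv[f[x]] for x in f}   # the product of f_inv and f
gf = {x: g[f[x]] for x in f}                  # the product gf of Theorem 2

print(is_one_to_one(f), identity_on_E, is_one_to_one(gf))
```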
1.3.3 Mappings of a Set into Itself

It is possible for the domain and the range of a mapping to be the same set and, in this case, we will be dealing with a mapping of a set into itself. The best known examples of this are provided by the concept of a function. For example the function $y = \sin x$ assigns to each real number $x$ the real number $\sin x$, so that the function is a mapping of the set of real numbers into itself.

The 1-1 mappings of a set $E$ onto itself are of particular interest. (Note that if $E$ is finite, a mapping of $E$ into itself is 1-1 if and only if it is onto $E$ so that only one of these two properties needs to be assumed.)

If we denote by $T(E)$ the set of all 1-1 mappings of the set $E$ onto itself, then $T(E)$ has the following properties.

1. If $f, g \in T(E)$, then $fg$ and $gf \in T(E)$.
2. If $f \in T(E)$, then $f^{-1} \in T(E)$.
3. The identity mapping on $E$ is in $T(E)$.

Together with the associativity of the multiplication of mappings proved earlier in Theorem 1.3;1, properties 1, 2, 3 show that $T(E)$ is a group. (For the concept of a group, see 2.1.1 or [25] pp. 1-3.)
Example 4. Let E ={1,2,3}. The group T(E) consists of the following six
mappings,
H122) 12—62:) #62?) fs=(;i2)
fs—(éii’) HHS)
In these, the mapping $f$ is defined by writing the elements of $E$ in the first row and then under each element its image under $f$. Thus $f_1$ is the identity mapping, and, for example, fgl =f3, fgl =f5. It is clear that $f_1, f_2, \dots, f_6$ are just the permutations of the three elements 1, 2, 3. Permutations of a general set of elements will be introduced later in 4.1 by using this idea.

Problems
1. Let $f$ be a mapping of the set $E$ into the set $F$ and let $A$, $B$ be subsets of $E$. Prove that $f(A \cup B) = f(A) \cup f(B)$ and $f(A \cap B) \subseteq f(A) \cap f(B)$. Give an example to show that the $\subseteq$ sign cannot be replaced by $=$ in the second of these rules. What more can be said if $f$ is 1-1?

2. Let $f$ be a mapping of $E$ into $E$ and let $A$, $B$ be subsets of $E$. Further let $A = f^{-1}(A^*)$ and $f(B) = B^*$. Prove that $f(A \cap B) = A^* \cap B^*$.
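The group $T(E)$ of Example 4 can also be enumerated mechanically. The following sketch assumes the standard-library function `itertools.permutations` and stores each mapping as a dictionary; the order in which the six mappings are generated is that of the library and need not agree with the numbering $f_1, \dots, f_6$ used in the text.

```python
from itertools import permutations

E = (1, 2, 3)

# Every 1-1 mapping of E onto itself, as a dictionary x -> image of x.
T = [dict(zip(E, images)) for images in permutations(E)]

def product(outer, inner):
    """Return the product 'outer inner', i.e. x -> outer(inner(x))."""
    return {x: outer[inner[x]] for x in inner}

def inverse(mapping):
    """Return the inverse mapping of a 1-1 mapping."""
    return {y: x for x, y in mapping.items()}

identity = {x: x for x in E}

# Properties 1, 2, 3 of T(E) from section 1.3.3.
closed_under_product = all(product(f, g) in T for f in T for g in T)
closed_under_inverse = all(inverse(f) in T for f in T)
contains_identity = identity in T

print(len(T), closed_under_product, closed_under_inverse, contains_identity)
```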
CHAPTER 2

VECTOR SPACES

2.1 The Concept of a Vector Space, Examples

Linear algebra can be characterized as the study of vector spaces. Vector spaces, which are sometimes also called linear spaces, are a particular kind of algebraic system and accordingly they are sets in which certain operations on the elements are defined. (For the general concept of an algebraic system, see for example [24].)

2.1.1 Real Vector Spaces

Let $E$ be a set whose elements are denoted by small Roman letters, and suppose that there is a rule which assigns to each ordered pair of elements $x, y \in E$ a further element $z \in E$ which we write as the sum of $x$ and $y$, i.e. we express the relationship between $z$ and $x, y$ by the formula

$$z = x + y.$$

The construction of $z$ from $x$ and $y$ is known as a binary operation on the set $E$ which in this case is written as addition. We have said that $z$ should be constructed from the ordered pair $x, y$ (i.e., the pair $y, x$ is different from the pair $x, y$), because initially we do not want to assume that the addition is commutative, i.e., that $x + y = y + x$ for all $x, y \in E$. We will be able to prove this as a consequence of the other conditions to be introduced later.

Now suppose that there is a further rule which assigns to each real number $\alpha$ and each element $x \in E$ a further element $u \in E$ which we write as $u = \alpha x$. The element $u$ will be referred to as the product of the element $x \in E$ with the real number (or the scalar) $\alpha$. The construction of $u$ from $\alpha$ and $x$ is a binary operation in the set of all real numbers and of elements of $E$ which we will call multiplication by scalars.

Definition 1. The set $E$ together with the operations considered above is a 'real vector space', if the operations satisfy the following seven axioms.
A1. Addition is associative: $x + (y + z) = (x + y) + z$ for all $x, y, z \in E$

A2. There is a zero-element $0 \in E$ such that $x + 0 = 0 + x = x$ for all $x \in E$

A3. To each $x \in E$, there is an inverse element $(-x) \in E$ such that $(-x) + x = x + (-x) = 0$

Except when there may be some risk of misunderstanding we will normally write $-x$ in place of $(-x)$.

M1. $1x = x$ for all $x \in E$

M2. Multiplication by scalars is associative: $\alpha(\beta x) = (\alpha\beta)x$ for all scalars $\alpha, \beta$ and all $x \in E$

D1. $\alpha(x + y) = \alpha x + \alpha y$ for all scalars $\alpha$ and all $x, y \in E$

D2. $(\alpha + \beta)x = \alpha x + \beta x$ for all scalars $\alpha, \beta$ and all $x \in E$

D1 and D2 are known as distributive laws.

We will prove in 2.2.5 that, as a consequence of these seven axioms, the addition is also commutative. Because this is so important, we note it here already in the form of a further axiom.

A4. $x + y = y + x$ for all $x, y \in E$ (see Theorem 2.2.1)

An algebraic system which has a binary operation satisfying the axioms A1, A2 and A3 is called a group. There are many examples of groups in which the operation is not commutative (e.g., the groups $A(E)$ and $GL_n$ in 5.3, where the operation is written as multiplication instead of addition). Consequently, in the case of a vector space, the axioms A1, A2 and A3 will not be sufficient in themselves to prove A4, and the other axioms must also be used in the proof.

The elements of a vector space will usually be referred to as vectors. This name comes from the first of the following examples of vector spaces in which the elements are vectors of 3-dimensional space in the sense of elementary geometry. As before, we will use small Roman letters for vectors and small Greek letters for scalars throughout the rest of this work.

The significance of Linear Algebra in mathematics and in its applications stems from the fact that various types of vector space arise naturally in many different branches of mathematics. We will now set out some examples of these for future reference.
Example 1. We start with an (affine or Euclidean) 2-dimensional plane or a 3-dimensional space and consider vectors in the sense of elementary geometry. These can be represented by directed segments (arrows) where two arrows which are obtained from each other by a parallel shift represent the same vector. For these vectors, an addition and a multiplication by real scalars are defined in a way which is well-known and may be remembered with the help of Fig. 2.

It is possible to prove by geometrical methods that these two operations satisfy the seven axioms of Definition 1. The set $E$ of all vectors of the plane or of the 3-space is therefore a real vector space with this addition and multiplication by scalars. We will refer to these two spaces as $G_2$ (for the plane) and $G_3$ (for the space). Note that they are different from the plane and space which appear in their construction and which are of course sets of points.

This example is of particular importance for two reasons.

1. It makes possible the use of Linear Algebra in the development of analytic geometry.
2. It enables us to illustrate results in Linear Algebra geometrically in a plane or in 3-space.

Example 2. Let $n$ be a natural number and let $E$ be the family of all ordered sets of $n$ real numbers (n-tuples). If $x = (\xi_1, \dots, \xi_n) \in E$ and $y = (\eta_1, \dots, \eta_n) \in E$ are two such ordered sets, we define their sum by

$$x + y = (\xi_1 + \eta_1, \dots, \xi_n + \eta_n) \in E,$$

and the product of $x$ and the scalar $\alpha$ by

$$\alpha x = (\alpha\xi_1, \dots, \alpha\xi_n).$$

It is easy to verify that the seven axioms are satisfied. In particular the zero-element is $0 = (0, \dots, 0) \in E$. (Note that here the left-hand zero denotes the zero-element or zero-vector of $E$, while the zeros inside the brackets denote the real number 0.) Further $-x = (-\xi_1, \dots, -\xi_n)$. The real vector space so defined is called the space of n-tuples and we will refer to it as $R_n$.
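The operations of the space of n-tuples, and a spot check of the seven axioms, can be sketched as follows. The sketch assumes Python tuples of `Fraction` values (exact rational arithmetic, so that equalities are not disturbed by rounding); the sample vectors, scalars and function names are our own, and checking a few vectors is of course no substitute for the general verification.

```python
from fractions import Fraction

def add(x, y):
    """Component-wise sum of two n-tuples."""
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    """Product of the scalar alpha with the n-tuple x."""
    return tuple(alpha * a for a in x)

def neg(x):
    """The inverse element -x."""
    return scale(-1, x)

# Sample vectors and scalars, chosen only for illustration.
x = (Fraction(1), Fraction(-2), Fraction(1, 2))
y = (Fraction(3, 4), Fraction(0), Fraction(5))
z = (Fraction(-1), Fraction(7, 3), Fraction(2))
zero = (Fraction(0),) * 3
alpha, beta = Fraction(3), Fraction(-1, 4)

axioms = {
    "A1": add(x, add(y, z)) == add(add(x, y), z),
    "A2": add(x, zero) == x and add(zero, x) == x,
    "A3": add(neg(x), x) == zero and add(x, neg(x)) == zero,
    "M1": scale(1, x) == x,
    "M2": scale(alpha, scale(beta, x)) == scale(alpha * beta, x),
    "D1": scale(alpha, add(x, y)) == add(scale(alpha, x), scale(alpha, y)),
    "D2": scale(alpha + beta, x) == add(scale(alpha, x), scale(beta, x)),
}
print(axioms)
```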
Example 3. Instead of ordered sets of $n$ real numbers, we can also consider countably infinite sequences of real numbers, i.e., $x = (\xi_1, \xi_2, \xi_3, \dots)$. Addition and multiplication by scalars can be defined exactly as in Example 2 (viz. 'term by term') and, in this way, a new real vector space is constructed which we call the space of sequences and denote by $F$.

We obtain another vector space $F_0$ which is a subset of $F$, by considering only those sequences which contain all zeros from some index on (the index may differ from sequence to sequence), i.e., those sequences which contain only a finite number of non-zero terms.

Example 4. In Example 3, a vector $x \in F$ is constructed by assigning to each natural number $k$ a real number $\xi_k$. Each $x$ is therefore a mapping of the set $N$ of natural numbers into the set $R$ of real numbers (i.e., a real-valued function defined on $N$). The same is true in Example 2 when $N$ is replaced by $\{1, 2, \dots, n\}$.

We obtain a generalization of these examples by replacing $N$ with an arbitrary set $A$. We denote by $F(A)$ the set of all mappings $f$ of the set $A$ into the set $R$ of real numbers. If we now define, for mappings $f, g \in F(A)$ and scalars $\alpha$,

$$f + g \ \text{ by } \ (f + g)(z) = f(z) + g(z) \quad \text{and} \quad \alpha f \ \text{ by } \ (\alpha f)(z) = \alpha f(z) \tag{1}$$

for all $z \in A$, then $F(A)$ becomes a real vector space.

By analogy with Example 3, we can further consider the set $F_0(A) \subseteq F(A)$ which consists of those mappings $f \in F(A)$ such that $f(z) \neq 0$ for only a finite number of elements $z \in A$. Obviously (1) defines operations in $F_0(A)$ and $F_0(A)$ is itself a vector space.
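The space $F_0(A)$ lends itself to a simple computational model. In the sketch below, a mapping with only finitely many non-zero values is assumed to be stored as a dictionary that lists exactly those values, every missing element of $A$ being sent to 0; this representation and the names `add`, `scale` and `value` are our own choices, not the text's.

```python
def value(f, z):
    """Value f(z); elements not listed in the dictionary are mapped to 0."""
    return f.get(z, 0)

def add(f, g):
    """Pointwise sum (f + g)(z) = f(z) + g(z), keeping only the non-zero values."""
    keys = set(f) | set(g)
    total = {z: value(f, z) + value(g, z) for z in keys}
    return {z: v for z, v in total.items() if v != 0}

def scale(alpha, f):
    """Pointwise product (alpha f)(z) = alpha * f(z), keeping only the non-zero values."""
    return {z: alpha * v for z, v in f.items() if alpha * v != 0}

# Two mappings on an arbitrary set A of labels, chosen only for illustration.
f = {"p": 2, "q": -1}
g = {"q": 1, "r": 5}

h = add(f, g)      # the values at "q" cancel, so "q" drops out of the dictionary
k = scale(3, f)

print(h, k, value(h, "q"))
```

Because sums and scalar multiples of finitely supported mappings are again finitely supported, the two operations never leave this representation, which is the computational counterpart of the remark that (1) defines operations in $F_0(A)$.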
Example 5. Again let $n$ be a natural number and let $P_n$ be the set of all real polynomials $x$ in a real variable $\tau$ which have degree at most $n$,

$$x(\tau) = \sum_{k=0}^{n} \alpha_k \tau^k \qquad (\alpha_0, \dots, \alpha_n \text{ real}).$$

We define addition and multiplication by scalars in the usual way for functions, i.e. we put

$$z = x + y \quad \text{when} \quad z(\tau) = x(\tau) + y(\tau) \quad \text{for all } \tau \in R$$

and

$$u = \alpha x \quad \text{when} \quad u(\tau) = \alpha x(\tau) \quad \text{for all } \tau \in R.$$

With these definitions $x + y$ and $\alpha x$ are polynomials of degree at most $n$ and the seven axioms are satisfied. This real vector space of polynomials of degree at most $n$ is denoted by $P_n$.

Example 6. In the same way, we can consider the set $P$ of all real polynomials (without restriction on the degree). We then obtain the space of polynomials $P$.

Example 7. Let $C$ be the set of continuous real-valued functions $x(\tau)$ on the real interval $-1 \le \tau \le +1$. Addition and multiplication by scalars are defined as in Example 5. Since, from $x, y \in C$, it follows that $x + y \in C$ and $\alpha x \in C$ for all scalars $\alpha$, and since the axioms are satisfied, we obtain another real vector space which will be denoted by $C$.

Of course we can replace the interval $-1 \le \tau \le +1$ by an arbitrary closed interval $\rho \le \tau \le \sigma$ to obtain a vector space $C(\rho, \sigma)$.
Example 8. Let $\tau$ be a variable angle measured in radians. The functions $\cos k\tau$ and $\sin k\tau$ are periodic with period $2\pi$ for each natural number $k$ and so can be thought of as real-valued functions on the circumference of a circle. The same is true for all finite sums of the form

$$x(\tau) = \tfrac{1}{2}\alpha_0 + \sum_{k=1}^{\infty} \{\alpha_k \cos k\tau + \beta_k \sin k\tau\} \tag{2}$$

with real coefficients $\alpha_k$ and $\beta_k$, all but a finite number of which are zero. (The factor $\tfrac{1}{2}$ in the term $\tfrac{1}{2}\alpha_0$ is introduced for purely technical reasons; see Example 3.1;8.) If we again define addition and multiplication by scalars as in Example 5, we see that, if $x$ and $y$ are of the form (2), then so are $x + y$ and $\alpha x$. Also the seven axioms are satisfied and hence we obtain the real vector space of trigonometric polynomials which we denote by $T$.

Example 9. Let $\xi_1, \dots, \xi_n$ be real variables. A linear form in these variables is a function $x$ which can be represented in the form

$$x(\xi_1, \dots, \xi_n) = \alpha_1\xi_1 + \dots + \alpha_n\xi_n = \sum_{k=1}^{n} \alpha_k\xi_k$$

with real coefficients $\alpha_1, \dots, \alpha_n$. It is easy to verify that the set of all linear forms in the variables $\xi_1, \dots, \xi_n$ together with their natural addition and multiplication by scalars is a real vector space $L_n$.

Example 10. The sum $x + y$ of two complex numbers and the product $\alpha x$ of a complex number $x$ by a real number $\alpha$ are again complex numbers and the seven axioms are satisfied. Hence the set of complex numbers with the usual addition and multiplication is a real vector space which we will denote by $K$. (In this context, only the multiplication by real numbers is relevant and the multiplication of arbitrary complex numbers is not needed.)

Example 11. In Example 10, the set of complex numbers can be replaced by the set of real numbers. The latter then becomes a real vector space which we call the vector space of real scalars and denote by $S_R$.

2.1.2 Complex Vector Spaces
The definition of the concept of a vector space can be straight-forwardly modified so that the complex numbers appear as scalars in place of the real numbers. We will then use the term ‘complex vector space’.
Definition 2. A ‘complex vector space’ E is a set in which an addition (x,y e E—>z=x+y e E) and a multiplication by complex numbers (a: e E, at complex—>u=ocx e E) are defined in such a way that the seven axioms of
Definition 1 are satisfied. We will again refer to the elements of a complex vector space as vectors and to the complex numbers as scalars in this context.
The theories of real and of complex vector spaces have very many results in common. Because of this, it is convenient in the following to make the
convention that all results, which are not specifically stated to apply to only one type of vector space, apply to both. Examples of Complex Vector Spaces Example 12. In Examples 2 and 3 (Rn,F,F0), if we use ordered sets and
sequences of complex numbers (instead of real numbers) and use complex scalars, we obtain three examples of complex vector spaces. Similarly we obtain a complex version of Example 4, by replacing F(A) with the set of all mappings of A into the complex numbers and at the same time use complex scalars. Example 13. Corresponding to Examples 5 and 6 (Pu and P), we obtain
complex vector spaces by leaving the variable 7- to be real but allowing the coefficients oak of the polynomials to be complex and using complex scalars. Notice that the set of polynomials with complex coefficients can also be
made into a real vector space by considering only the multiplication by real scalars. This set becomes a complex vector space as soon as arbitrary complex
numbers are allowed as scalars. Naturally we could also take 7- to be a complex variable and, in this way, 14
2.1: THE CONCEPT OF A VECTOR. SPACE
we would obtain new examples of vector spaces insofar as the elements are new. However it is well known that a polynomial in a complex variable 7- is completely determined by its values on the real numbers so that in fact no essentially different example is found in this way.
Example 14. Example 7 ($C$) can be made into a complex vector space by taking the functions $x(\tau)$ to be complex-valued continuous functions on the real interval $-1 \le \tau \le +1$. The sum of two functions of this type and the product of one by a complex scalar are again functions of the same type.

Example 15. A complex vector space is constructed from Example 8 ($T$), if the coefficients $\alpha_k$ and $\beta_k$ are allowed to be complex and the scalars are taken to be complex. In view of Euler's formulae

$$e^{ik\tau} = \cos k\tau + i \sin k\tau, \qquad \cos k\tau = \frac{e^{ik\tau} + e^{-ik\tau}}{2}, \qquad \sin k\tau = \frac{e^{ik\tau} - e^{-ik\tau}}{2i}$$

the complex trigonometric polynomials can be written more concisely in the form

$$x(\tau) = \sum_{k=-\infty}^{+\infty} \alpha_k e^{ik\tau} \qquad (\alpha_k = 0 \text{ for all but a finite number of } k\text{'s}).$$
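Euler's formulae in Example 15 can be checked numerically. The following is a minimal sketch; the function names `cos_via_exp`, `sin_via_exp` and `complex_trig_poly` are illustrative choices and do not come from the text, and the coefficient dictionary is just one assumed way to store the finitely many non-zero $\alpha_k$.

```python
import cmath
import math


def cos_via_exp(k, tau):
    """cos(k*tau) written as (e^{ik tau} + e^{-ik tau}) / 2."""
    return (cmath.exp(1j * k * tau) + cmath.exp(-1j * k * tau)) / 2


def sin_via_exp(k, tau):
    """sin(k*tau) written as (e^{ik tau} - e^{-ik tau}) / (2i)."""
    return (cmath.exp(1j * k * tau) - cmath.exp(-1j * k * tau)) / 2j


def complex_trig_poly(coeffs, tau):
    """Evaluate sum_k alpha_k e^{ik tau}.

    coeffs maps an integer k to alpha_k; only finitely many entries exist,
    which mirrors the condition that all but finitely many alpha_k vanish.
    """
    return sum(a * cmath.exp(1j * k * tau) for k, a in coeffs.items())


# cos(2*tau) corresponds to alpha_2 = alpha_{-2} = 1/2.
cos2_coeffs = {2: 0.5, -2: 0.5}
value = complex_trig_poly(cos2_coeffs, 0.7)
```

Here `value` should agree with `math.cos(1.4)` up to rounding, since the imaginary parts of the two exponentials cancel.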
Example 16. The complex version of Example 9 ($L$) is also easy to define. We merely have to take the coefficients $\alpha_k$ and the scalars to be complex. It is also possible to choose the variables $\xi_k$ to be real or complex.

Example 17. Finally the set of all complex numbers is itself a complex vector space with the usual addition and multiplication. We call it the vector space of complex scalars and denote it by $S_K$.

Problems

1. Is the set of all integers with their usual addition and multiplication by real scalars a vector space?
2. Show that the set of all vectors $x = (\xi_1, \xi_2, \xi_3) \in R_3$ for which $2\xi_1 - \xi_2 + \xi_3 = 0$ is a vector space with the operations defined in $R_3$. Is this still true for the set of those vectors for which $2\xi_1 - \xi_2 + \xi_3 = 1$?
3. Show that the set of all polynomials $x(\tau) \in P$ (Example 2.1;6) for which $\int_{-1}^{+1} x(\tau)\,d\tau = 0$ is a vector space with the operations defined in $P$. Is this still true if the condition is $\int_{-1}^{+1} x(\tau)\,d\tau = 1$?
CH. 2: VECTOR SPACES
4. Let $E_1, E_2$ be real vector spaces. Let $F$ be the set of all ordered pairs $(x_1, x_2)$, where $x_1 \in E_1$ and $x_2 \in E_2$, and define

$$(x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2), \qquad \alpha(x_1, x_2) = (\alpha x_1, \alpha x_2).$$

Prove that $F$ is a vector space. This is known as the direct product of $E_1$ and $E_2$. (If $E_2$ is spanned by a single element $u \ne 0$ (see Definition 2.4;2), then the construction of the direct product is also referred to as adjoining the vector $u$ to the space $E_1$.)
2.2 Rules for Calculation in Vector Spaces

2.2.1 Sums of Finitely Many Vectors
Addition in a vector space is initially defined only for two terms. If we are given three vectors $x, y, z \in E$, we can first form $x + y$ and then the sum of this vector with $z$, i.e., $(x + y) + z \in E$. Similarly we can form $x + (y + z) \in E$. The associative law A1 now says that these two expressions represent the same vector. In other words, it does not matter in which way the three terms are bracketed together (for a given order) and so the brackets may be omitted. Hence, for given $x, y, z$, $w = x + y + z$ is a well-defined vector in $E$. The same is true for an arbitrary number of vectors. For example, for four vectors, in view of A1,

$$x + (y + (z + w)) = (x + y) + (z + w) = ((x + y) + z) + w$$

so that it is possible to write $x + y + z + w$. In order to avoid possible misunderstandings, we note here that a sum of finitely many vectors has a well-defined meaning but that infinite series of vectors are meaningless (nevertheless see 12.3). A convenient notation for a sum of finitely many vectors is obtained by using the summation sign:

$$x = x_1 + \dots + x_n = \sum_{k=1}^{n} x_k.$$
2.2.2 The Zero-Vector and Inverse Vectors
In connection with A2, we note that, in a vector space $E$, there is only one zero-element. If $0$ and $0^*$ are two such elements, i.e. $0 + x = x$ and $y + 0^* = y$ for all $x, y \in E$, then in particular $0 = 0 + 0^* = 0^*$. (Put $x = 0^*$ and $y = 0$.)

Similarly we note that, in connection with A3, there is only one inverse vector of a given vector $x \in E$. If $(-x)$ and $(-x)^*$ are two such vectors, then

$$(-x)^* = 0 + (-x)^* = ((-x) + x) + (-x)^* = (-x) + (x + (-x)^*) = (-x) + 0 = (-x).$$

Note that we have not used either $x + (-x) = 0$ or $(-x)^* + x = 0$ in proving this. Axiom A3 could therefore be stated in the following weaker form: ‘To each $x \in E$ there is a left inverse element $(-x)$ such that $(-x) + x = 0$ and a right inverse element $(-x)^*$ such that $x + (-x)^* = 0$’. From the above argument it follows that $(-x) = (-x)^*$. A corresponding result is also true for the zero-vector.

According to A3, $-x$ also has an inverse vector which is actually equal to $x$ again (because $(-x) + x = x + (-x) = 0$). Thus

$$-(-x) = x. \qquad (1)$$

Since $(x + y) + ((-y) + (-x)) = x + ((y + (-y)) + (-x)) = x + (-x) = 0$ and similarly $((-y) + (-x)) + (x + y) = 0$, the inverse of a sum of vectors $x$ and $y$ is given by

$$-(x + y) = (-y) + (-x). \qquad (2)$$

2.2.3 Subtraction of Vectors
Corresponding to two given vectors $x, y \in E$, there is a unique vector $z \in E$ such that

$$z + x = y \qquad (3.1)$$

(viz. $z = y + (-x)$). This follows because, adding $-x$ to both sides of equation (3.1) on the right, we have

$$y + (-x) = (z + x) + (-x) = z + (x + (-x)) = z$$

and with this $z$ we obtain

$$z + x = (y + (-x)) + x = y + 0 = y.$$

Instead of $y + (-x)$ we usually write $y - x$ and call this the difference of $y$ and $x$ (in this order). We must remember however that $y - x$ has no other meaning than $y + (-x)$. The two equations

$$z + x = y \quad \text{and} \quad z = y - x \qquad (3.2)$$

therefore have the same meaning.
2.2.4 Rules for Multiplication by Scalars

If $x \in E$ and $\alpha$ is a scalar, then

$$\alpha x = 0 \text{ if and only if } \alpha = 0 \text{ or } x = 0. \qquad (4)$$

Proof. 1. For an arbitrary scalar $\alpha$,

$$\alpha 0 = \alpha(0 + 0) = \alpha 0 + \alpha 0$$

and, adding $-(\alpha 0)$ to both sides, it follows that $\alpha 0 = 0$.

2. For an arbitrary vector $x \in E$,

$$0x = (0 + 0)x = 0x + 0x$$

and, adding $-(0x)$ to both sides, it follows that $0x = 0$.

3. Let $\alpha x = 0$ and assume that $\alpha \ne 0$. By 1, M1 and M2, it follows that

$$x = 1x = \left(\tfrac{1}{\alpha}\alpha\right)x = \tfrac{1}{\alpha}(\alpha x) = \tfrac{1}{\alpha}0 = 0.$$

Since $x + (-1)x = 1x + (-1)x = 0x = 0$, and similarly $(-1)x + x = 0$, it follows that

$$-x = (-1)x. \qquad (5)$$

Multiplying both sides of (5) by an arbitrary scalar $\alpha$, it also follows that

$$\alpha(-x) = (-\alpha)x = -(\alpha x). \qquad (6)$$

If $k$ is a natural number, then

$$kx = x + x + \dots + x \quad (k \text{ terms}). \qquad (7)$$

Proof. The equation (7) is true for $k = 1$ (M1). We use induction on $k$ to prove (7) in general. Assume that (7) is true for a sum of $k - 1$ terms, then

$$kx = ((k - 1) + 1)x = (k - 1)x + 1x = (x + x + \dots + x) + x$$

where there are $k - 1$ terms in the brackets. Therefore (7) is also true for $k$ and hence, by induction, for all natural numbers. By using (6), it also follows that
$(-x) + (-x) + \dots + (-x) = k(-x) = (-k)x = -$ [...] $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$, $\lambda_1 x_1(0) + \lambda_2 x_2(0) = 1$ and $\lambda_1 x_1(1) + \lambda_2 x_2(1) \ge 0$.
that
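Theorem 1. If $C_1$ and $C_2$ are convex sets in a real vector space, then $C = \mu_1 C_1 + \mu_2 C_2$ is also convex for all real $\mu_1, \mu_2$.

Proof. Suppose $x_1, x_2 \in C$, so that $x_1 = \mu_1 c_{11} + \mu_2 c_{21}$ and $x_2 = \mu_1 c_{12} + \mu_2 c_{22}$ where $c_{11}, c_{12} \in C_1$ and $c_{21}, c_{22} \in C_2$. If $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$, then

$$y = \lambda_1 x_1 + \lambda_2 x_2 = \mu_1(\lambda_1 c_{11} + \lambda_2 c_{12}) + \mu_2(\lambda_1 c_{21} + \lambda_2 c_{22}) = \mu_1 c_1 + \mu_2 c_2$$

where $c_1 \in C_1$ and $c_2 \in C_2$. Hence $y \in C$.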
2.6: CONVEX SETS
Definition 2. Let $E$ be a real vector space and let $A$ be a subset of $E$. A ‘convex linear combination’ of the elements of $A$ is a linear combination $y = \sum_{x \in A} \lambda_x x$ where $\lambda_x \ge 0$ for all $x \in A$ and $\sum_{x \in A} \lambda_x = 1$.

Theorem 2. A convex set $C$ contains all convex linear combinations of its elements.

Proof. The proof is by induction on the number $n$ of vectors $x_1, \dots, x_n \in C$ in the convex linear combination $y = \lambda_1 x_1 + \dots + \lambda_n x_n$. The theorem is true for $n = 2$. Now assume that it is true for all convex linear combinations of at most $n - 1$ vectors. Since at least one of the coefficients, say $\lambda_n$, in the convex linear combination $y = \lambda_1 x_1 + \dots + \lambda_n x_n$ is strictly positive, we can write $y$ in the form

$$y = \lambda_1 x_1 + \mu\left(\frac{\lambda_2}{\mu} x_2 + \dots + \frac{\lambda_n}{\mu} x_n\right)$$

where $\mu = \lambda_2 + \dots + \lambda_n$ $(\ne 0)$. The vector in the brackets is a convex linear combination of $x_2, \dots, x_n$ and hence, by the induction hypothesis, it is in $C$. Since also $x_1 \in C$, $\lambda_1 + \mu = \lambda_1 + \lambda_2 + \dots + \lambda_n = 1$ and $\lambda_1, \mu \ge 0$, it follows that $y \in C$. The theorem is therefore true by induction for all $n$.

For an arbitrary subset $A$ of $E$, we will denote the set of all convex linear combinations of the vectors of $A$ by $K^*(A)$.
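The induction in the proof of Theorem 2 amounts to a recursion that builds an $n$-term convex linear combination out of two-term ones. The sketch below illustrates this under some assumptions: vectors are tuples of numbers, coefficients are exact `Fraction` values, and the names `pair_combination` and `convex_combination` are hypothetical. The case $\mu = 0$ is handled separately, which the proof avoids by choosing a strictly positive coefficient among the last $n - 1$.

```python
from fractions import Fraction


def pair_combination(l1, x, l2, y):
    """Return l1*x + l2*y for a two-term convex combination."""
    assert l1 >= 0 and l2 >= 0 and l1 + l2 == 1
    return tuple(l1 * a + l2 * b for a, b in zip(x, y))


def convex_combination(lams, xs):
    """Compute sum(lams[i] * xs[i]) using only two-term convex combinations."""
    assert all(l >= 0 for l in lams) and sum(lams) == 1
    if len(xs) == 1:
        return tuple(xs[0])
    mu = sum(lams[1:])
    if mu == 0:
        # Then lams[0] == 1 and the combination is just the first vector.
        return tuple(xs[0])
    # Rescaled coefficients lams[i] / mu are non-negative and sum to 1.
    inner = convex_combination([l / mu for l in lams[1:]], xs[1:])
    return pair_combination(lams[0], xs[0], mu, inner)


points = [(0, 0), (4, 0), (0, 4)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
y = convex_combination(weights, points)
```

Every intermediate result is a two-term convex combination of vectors already known to lie in the set, which is the content of the induction step.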
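Theorem 3. $K^*(A)$ is a convex set containing $A$.

Proof. 1. Every vector $x \in A$ is a convex linear combination $x = 1x$ of itself and therefore $K^*(A) \supseteq A$.

2. Suppose $x, y \in K^*(A)$ so that

$$x = \sum_{z \in A} \alpha_z z \quad \text{where} \quad \sum_{z \in A} \alpha_z = 1 \quad \text{and} \quad \alpha_z \ge 0 \text{ for all } z \in A,$$

$$y = \sum_{z \in A} \beta_z z \quad \text{where} \quad \sum_{z \in A} \beta_z = 1 \quad \text{and} \quad \beta_z \ge 0 \text{ for all } z \in A.$$

Now, if $\lambda + \mu = 1$ and $\lambda, \mu \ge 0$,

$$\lambda x + \mu y = \sum_{z \in A} (\lambda\alpha_z + \mu\beta_z) z$$

and in this $\sum_{z \in A} (\lambda\alpha_z + \mu\beta_z) = 1$ and $\lambda\alpha_z + \mu\beta_z \ge 0$ for all $z \in A$. Hence $\lambda x + \mu y \in K^*(A)$ and $K^*(A)$ is convex.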
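Theorem 4. Let $\mathcal{S}$ be a family of convex sets in a real vector space $E$, then $\bigcap(\mathcal{S})$ is also convex.

Proof. Suppose $x_1, x_2 \in \bigcap(\mathcal{S})$, so that $x_1, x_2 \in C$ for all $C \in \mathcal{S}$. Now, if $\lambda_1, \lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$, then $\lambda_1 x_1 + \lambda_2 x_2 \in C$ for all $C \in \mathcal{S}$. That is, $\lambda_1 x_1 + \lambda_2 x_2 \in \bigcap(\mathcal{S})$.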
Definition 3. Let $E$ be a real vector space. The ‘convex hull’ $K(A)$ of a subset $A$ of $E$ is the intersection of the family of convex sets which contain $A$. If $A$ is finite, say $A = \{a_1, \dots, a_n\}$, we will write $K(A) = K(a_1, \dots, a_n)$.

It follows immediately from Theorem 4 that $K(A)$ is convex.

Theorem 5. The correspondence $A \to K(A)$ is a closure operator, i.e.,

1. $K(A) \supseteq A$
2. If $A_1 \subseteq A_2$, then $K(A_1) \subseteq K(A_2)$
3. $K(K(A)) = K(A)$.

Proof. 1. $K(A)$ is the intersection of sets which contain $A$ and therefore $K(A)$ also contains $A$ (Theorem 1.2;3).

2. Let $\mathcal{S}_i$ be the family of convex sets containing $A_i$ $(i = 1, 2)$. Then, if $A_1 \subseteq A_2$, $\mathcal{S}_1 \supseteq \mathcal{S}_2$ and hence $K(A_1) = \bigcap(\mathcal{S}_1) \subseteq \bigcap(\mathcal{S}_2) = K(A_2)$ (Theorem 1.2;2).

3. More generally, we show that if $C$ is convex then $K(C) = C$. In this case, $C$ is a convex set containing itself and therefore, by Theorem 1.2;1, $K(C) \subseteq C$, but from 1. $K(C) \supseteq C$ and hence $K(C) = C$.

Since $K(A)$ is both convex and the intersection of all convex sets which contain $A$, $K(A)$ could also be defined as the least convex set containing $A$ (Theorem 1.2;4).

Theorem 6. If $A \subseteq K(B)$ and $B \subseteq K(A)$, then $K(A) = K(B)$.

Proof. By Theorem 5, if $A \subseteq K(B)$, then $K(A) \subseteq K(B)$ and similarly $K(B) \subseteq K(A)$.

Theorem 7. $K(A)$ consists of the convex linear combinations of the vectors of $A$, i.e., $K(A) = K^*(A)$.

Proof. 1. $K(A)$ is convex and contains $A$ and therefore, by Theorem 2, $K(A) \supseteq K^*(A)$.

2. By Theorem 3, $K^*(A)$ is a convex set containing $A$ and therefore $K(A) \subseteq K^*(A)$ (Theorem 1.2;1).
2.6.2 Convex Cones
Let E again be a real vector space.
Definition 4. If $A$ is a subset of $E$, a vector of the form $y = \sum_{x \in A} \lambda_x x$ where $\lambda_x \ge 0$ for all $x \in A$ is called a ‘positive linear combination’ of the elements of $A$. The vector $y$ is called a ‘strictly positive linear combination’ if not all $\lambda_x = 0$.

Definition 5. A subset $P$ of $E$ is a ‘convex cone’ if $P$ contains all positive linear combinations of any pair of its vectors.

A convex cone is obviously convex in the sense of Definition 1. Convex cones could also be characterized as those convex sets $C$ which contain all positive scalar multiples $\lambda x$ $(\lambda \ge 0)$ of all vectors $x \in C$.

The theory of convex cones is developed in a way directly analogous to that of convex sets, and we will therefore only set out the most important results and in the main without proofs.

If $K$ is a convex cone and $\alpha > 0$, then $\alpha K = K$. If $\alpha$ [...] $P(A)$ is a closure operator (Theorem 5).

Definition 6. A convex cone $K$ is said to be ‘acute’ if $K \cap (-K) = \{0\}$; i.e., if $x \in K$ and $-x \in K$ implies $x = 0$.

An acute convex cone therefore contains no inverses of any of its vectors except the zero-vector. A convex cone $K$ is acute if and only if $0 \in K$ is not a strictly positive linear combination of the other vectors in $K$.

For every convex cone $K$, the set $L = K \cap (-K)$ is a subspace of $E$ and $K$ is acute only when $L = \{0\}$. If $x \in K$, then the coset $L + x$ is contained in $K$. The set of all cosets of $L$ which are contained in $K$ will be denoted by $\bar{K}$ and clearly $\bar{K} \subseteq E/L$.

Theorem 8. If $K$ is a convex cone in $E$, then $\bar{K}$ is an acute convex cone in $E/L$.
Proof. 1. Suppose $L + x_1$ and $L + x_2 \in \bar{K}$ and let $\lambda_1, \lambda_2 \ge 0$. Then $x_1, x_2 \in K$ and therefore $\lambda_1 x_1 + \lambda_2 x_2 \in K$ and hence

$$L + (\lambda_1 x_1 + \lambda_2 x_2) = \lambda_1(L + x_1) + \lambda_2(L + x_2) \in \bar{K}.$$

2. If $L + x \in \bar{K}$ and $-(L + x) = L - x \in \bar{K}$, then $x \in K$ and $-x \in K$ and hence $x \in K \cap (-K) = L$, i.e., $L + x = L$.

Theorem 9. If $P, Q \subseteq E$ are convex cones such that $P + Q = E$ and $P \cap (-Q) = \{0\}$, then $P = -P$ and $Q = -Q$, i.e., $P$ and $Q$ are subspaces of $E$.

Proof. Suppose $p_1 \in P$. From the conditions of the theorem, there are elements $p \in P$ and $q \in Q$ such that $-p_1 = p + q$ and hence $-q = p_1 + p$. In other words $-q \in P \cap (-Q) = \{0\}$. Thus $q = 0$ and $p_1 = -p \in -P$. From this it follows that $P \subseteq -P$ and, multiplying by $-1$, that $-P \subseteq P$. Hence $P = -P$. Similarly $Q = -Q$.
Example 3. Every subspace of a real vector space is a convex cone. A coset is a convex cone if and only if it is a subspace.

Example 4. In the vector space $C$ (Example 2.1;7) the functions $x(\tau)$ for which $x(\tau) \ge 0$ for $-\tfrac{1}{2} \le \tau \le +\tfrac{1}{2}$ form a convex cone. If $x_1(\tau) \ge 0$ and $x_2(\tau) \ge 0$ for $-\tfrac{1}{2} \le \tau \le +\tfrac{1}{2}$ and $\lambda_1, \lambda_2 \ge 0$, then $\lambda_1 x_1(\tau) + \lambda_2 x_2(\tau) \ge 0$ for $-\tfrac{1}{2} \le \tau \le +\tfrac{1}{2}$.

Example 5. Examples of convex cones in the vector space $G_2$ (Example 2.1;1) are indicated in the following diagram.
[Fig. 7. Four convex cones in $G_2$, labelled (1) to (4).]
In (3) (half-plane) all positive multiples of the vector $e$ belong to the set, but not the strictly negative multiples. In (4), all multiples of $e$ belong to the set. (1), (2) and (3) are acute but (4) is not. In $G_3$, examples of convex cones are 3-sided pyramids and circular cones (infinite in only one direction) with vertices at the zero-vector.

Problems

1. Show that the set of all vectors $x = (\xi_1, \xi_2, \xi_3, \xi_4) \in R_4$ which satisfy the homogeneous linear inequalities $\xi_1 + \xi_2 - \xi_3 - 2\xi_4 \ge 0$ and $2\xi_1 + \xi_2 + \xi_3 + 4\xi_4 \ge 0$ is a convex cone.
2. If the right-hand sides of the inequalities in Problem 1 are replaced by 2 and 3 respectively, is the set still a convex cone? Is it a convex set?
3. Show that every subspace of a real vector space is convex. Is this the case for the cosets?
4. Find conditions for a coset to be a convex cone.
5. Let $E$ be a real vector space and let $x \in E$. Is the difference set $E \setminus \{x\}$ convex?
6. Show that, if $A, B$ are subspaces of a vector space, then

$$K(A \cup B) = K(K(A) \cup K(B)) \quad \text{and} \quad K(A + B) = K(A) + K(B).$$
CHAPTER 3
BASES OF A VECTOR SPACE, FINITE-DIMENSIONAL VECTOR SPACES

3.1 Bases of a Vector Space
3.1.1 Linear Dependence and Independence of Vectors
Definition 1. A subset $S$ of a vector space $E$ is said to be ‘linearly independent’ if there is no linear combination $\sum_{x \in S} \alpha_x x$ which is equal to the zero-vector except the one in which $\alpha_x = 0$ for all $x \in S$. A set which is not linearly independent is said to be linearly dependent.

Instead of saying that the ‘set’ $S$ is linearly dependent or independent, we will say that the vectors of $S$ are dependent or independent. It is clear that a set $S$ is linearly dependent, if there is a linear combination $y = \sum_{x \in S} \alpha_x x$ of the vectors of $S$ which is equal to the zero vector but has some coefficients not equal to zero.
Theorem 1. A set $S$ is linearly dependent if and only if there is a vector $x_0$ in $S$ which is a linear combination of the other vectors in $S$.

Proof. 1. Suppose $x_0 \in S$ and $x_0 = \sum_{x \in S \setminus \{x_0\}} \alpha_x x$. Then

$$x_0 - \sum_{x \in S \setminus \{x_0\}} \alpha_x x = 0$$

is a linear combination of the vectors of $S$ in which at least one coefficient is not zero.

2. Suppose $S$ is linearly dependent, so that $\sum_{x \in S} \alpha_x x = 0$ and $\alpha_{x_0} \ne 0$ for some $x_0 \in S$. Then

$$x_0 = -\frac{1}{\alpha_{x_0}} \sum_{x \in S \setminus \{x_0\}} \alpha_x x.$$
Example 1. A subset $S$ of the vector space $G_2$ (Example 2.1;1) is linearly independent only when $S = \{a\}$ and $a \ne 0$ or $S = \{a_1, a_2\}$ and $a_1, a_2$ are not parallel to the same line. In $G_3$, a subset $S$ is linearly independent only in the following three cases.

1. $S = \{a\}$; $a \ne 0$.
2. $S = \{a_1, a_2\}$; $a_1, a_2$ not parallel to the same line.
3. $S = \{a_1, a_2, a_3\}$; $a_1, a_2, a_3$ not parallel to the same plane.

3.1.2 The Concept of a Basis
Definition 2. A ‘basis’ $B$ of a vector space $E$ is a linearly independent spanning set for $E$ (Definition 2.4;3).

Example 2. In $G_2$ ($G_3$), a linearly independent set is a basis if and only if it contains two (three) vectors.

The elements of a basis (basis vectors) will normally be denoted by the letter $e$.

Theorem 2. If $B$ is a basis of $E$ and $B \ne \emptyset$, then every vector $x \in E$ can be written in exactly one way as a linear combination

$$x = \sum_{e \in B} \xi_e e \qquad (1)$$

of the elements of $B$. ($B = \emptyset$ is a basis if and only if $E = \{0\}$.)

Proof. 1. The fact that every vector $x \in E$ can be written in the form (1) is nothing more than the assumption that $B$ is a spanning set.

2. Suppose $x = \sum_{e \in B} \xi_e e = \sum_{e \in B} \eta_e e$. Then $\sum_{e \in B} (\xi_e - \eta_e) e = 0$ and, because $B$ is linearly independent, $\xi_e = \eta_e$ for all $e \in B$.
Definition 3. The scalars $\xi_e$ which are uniquely determined when $x \in E$ is written in the form (1) are known as the ‘components’ of $x$ with respect to the basis $B$.

Thus, for a given vector $x \in E$, there is a unique component $\xi_e$ corresponding to each basis vector $e \in B$. We note, however, that only a finite number of these components are not equal to zero. If the basis $B$ is finite or countably infinite, i.e., $B = \{e_1, \dots, e_n\}$ or $B = \{e_1, e_2, e_3, \dots\}$, we will denote the components by $\xi_k$ instead of $\xi_{e_k}$ and write, in place of (1),

$$x = \sum_{k=1}^{n} \xi_k e_k \quad \text{and} \quad x = \sum_{k=1}^{\infty} \xi_k e_k.$$

Example 3. In Fig. 8, which refers to the vector space $G_3$, $x = e_1 + e_2 + 2e_3$. Hence the components of $x$ with respect to the basis $B = \{e_1, e_2, e_3\}$ are $\xi_1 = 1$, $\xi_2 = 1$, $\xi_3 = 2$.
Fig. 8.
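Theorem 3. The components of a linear combination of vectors are the corresponding linear combinations of the components of the vectors.

Proof. Suppose

$$y = \sum_{k=1}^{n} \alpha_k x_k \quad \text{and} \quad x_k = \sum_{e \in B} \xi_{ke} e \quad (k = 1, \dots, n).$$

Then

$$y = \sum_{e \in B} \eta_e e \quad \text{where} \quad \eta_e = \sum_{k=1}^{n} \alpha_k \xi_{ke} \text{ for all } e \in B.$$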
Example 4. Let $B = \{e_1, e_2, e_3\}$ be a basis of the vector space $G_3$ and let $x_1 = e_1 + e_2 - 2e_3$, $x_2 = 2e_1 - e_2 + e_3$. Then $y = 2x_1 - 3x_2 = -4e_1 + 5e_2 - 7e_3$. Hence the components of $y$ are $\eta_1 = -4 = 2 \cdot 1 - 3 \cdot 2$, $\eta_2 = 5 = 2 \cdot 1 - 3(-1)$, $\eta_3 = -7 = 2(-2) - 3 \cdot 1$.

For a given vector $x \in E$, there is a component $\xi_e$ assigned to each basis vector $e \in B$. This means that, corresponding to each $x \in E$, there is a unique element of the vector space $F_0(B)$ (Example 2.1;4). Hence a mapping $f$ of $E$ onto $F_0(B)$ is given by $x \to f(x) \in F_0(B)$ where $f(x)$ is the function on $B$ which takes the value $\xi_e$ at $e \in B$. ($f$ is real- or complex-valued according as $E$ is real or complex.) Theorem 3 can now be expressed by saying that the mapping $f$ has the property

$$f\left(\sum_{k=1}^{n} \alpha_k x_k\right) = \sum_{k=1}^{n} \alpha_k f(x_k),$$

i.e., the image of a linear combination of vectors is the corresponding linear combination of the image vectors.

A mapping of one vector space into another vector space which has this property is known as a linear mapping (see Definition 5.1;1).

Further, $f$ is 1-1 because, by (1), vectors which have the same components are equal.
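The arithmetic of Example 4 is an instance of Theorem 3 and can be reproduced directly on rows of components. This is a minimal sketch, assuming a finite basis so that each vector is represented by a tuple of its components; the name `combine_components` is a hypothetical helper.

```python
def combine_components(alphas, component_rows):
    """Components of sum_k alphas[k] * x_k.

    component_rows[k] is the row of components of x_k with respect to a
    fixed finite basis; the result is formed component by component.
    """
    n = len(component_rows[0])
    return tuple(
        sum(a * row[i] for a, row in zip(alphas, component_rows))
        for i in range(n)
    )


x1 = (1, 1, -2)   # x1 = e1 + e2 - 2 e3
x2 = (2, -1, 1)   # x2 = 2 e1 - e2 + e3
y = combine_components((2, -3), (x1, x2))   # y = 2 x1 - 3 x2
```

The result `y` holds the components $\eta_1, \eta_2, \eta_3$ computed in Example 4.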
If the basis $B = \{e_1, \dots, e_n\}$ is finite, $F_0(B)$ can be replaced by $R_n$ (cf. the remarks on Example 2.1;4). In this case $f(x) = (\xi_1, \dots, \xi_n)$, i.e., to each $x \in E$ we assign its row of components.

Theorem 4. A subset $S$ of $E$ is linearly dependent if and only if the image set $f(S) \subseteq F_0(B)$ is linearly dependent.

Proof. Since $f$ is a linear 1-1 mapping of $E$ onto $F_0(B)$, Theorem 4 is a direct consequence of Theorem 5.1;5, which we may use here in advance.

Example 5. Let $B = \{e_1, e_2, e_3\}$ be a basis of $G_3$. Further let

$x_1 = e_1 + 2e_2 - e_3 \in G_3$ so that $f(x_1) = (1, 2, -1) \in R_3$,
$x_2 = -e_1 + e_2 + 2e_3 \in G_3$ so that $f(x_2) = (-1, 1, 2) \in R_3$,
$x_3 = -e_1 + 4e_2 + 3e_3 \in G_3$ so that $f(x_3) = (-1, 4, 3) \in R_3$.

Now, in $R_3$,

$$f(x_1) + 2f(x_2) - f(x_3) = (1, 2, -1) + 2(-1, 1, 2) - (-1, 4, 3) = (0, 0, 0) = 0 \in R_3,$$

i.e., the rows of components are linearly dependent in $R_3$. Hence the same is true for $x_1, x_2, x_3$ in the vector space $G_3$.

The next two theorems show that the concept of a basis can be defined in two ways other than that in Definition 2.
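By Theorem 4, a question about linear dependence in $E$ reduces to one about rows of components, which can be decided mechanically. The sketch below uses exact Gaussian elimination; elimination is a technique brought in here for illustration and is not the method of the text, and the names `rank` and `linearly_dependent` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def linearly_dependent(rows):
    """A finite list of rows is dependent iff its rank is below its length."""
    return rank(rows) < len(rows)


# Rows of components from Example 5.
example5 = [(1, 2, -1), (-1, 1, 2), (-1, 4, 3)]
dependent = linearly_dependent(example5)
```

For the rows of Example 5 the rank is 2, in agreement with the relation $f(x_1) + 2f(x_2) - f(x_3) = 0$.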
Theorem 5. A ‘linearly independent’ set $B \subseteq E$ is a basis if and only if it is ‘maximal’, i.e., if every set which properly contains $B$ is linearly dependent.

Proof. 1. Suppose $B$ is a maximal linearly independent set and let $z \in E \setminus B$. Then $B \cup \{z\}$ is linearly dependent and there is an equation

$$\sum_{e \in B} \alpha_e e + \beta z = 0$$

in which not all the coefficients $\alpha_e$ and $\beta$ are zero. In this case $\beta \ne 0$ because otherwise $B$ would be linearly dependent. Hence we can solve the equation for $z$ and $z = (-1/\beta) \sum_{e \in B} \alpha_e e$ means that $z \in L(B)$. Thus $B$ is a spanning set for $E$ and hence a basis.

2. Suppose $B$ is a basis of $E$, that $A$ is a subset of $E$ properly containing $B$, and that $z \in A \setminus B$. Now $z$ is a linear combination of the elements of $B$ (Theorem 2), so that $A$ is linearly dependent (Theorem 1). Hence $B$ is a maximal linearly independent set.

Theorem 6. A ‘spanning set’ $B$ of $E$ is a basis if and only if it is ‘minimal’, i.e., if no proper subset of $B$ spans $E$.

Proof. 1. Suppose $B$ is a basis, $A$ is a proper subset of $B$ and $z \in B \setminus A$. Since $B$ is linearly independent, $z \notin L(A)$ and therefore $A$ is not a spanning set for $E$. Hence $B$ is a minimal spanning set for $E$.

2. Let $B$ be a spanning set for $E$ which is not a basis, i.e., which is not linearly independent. Then there is an element $z \in B$ which is a linear combination of the other elements in $B$, i.e., $z \in L(B \setminus \{z\})$. But then $B \subseteq L(B \setminus \{z\})$ and, by Theorem 2.4;7, it follows that $L(B \setminus \{z\}) = L(B) = E$. Hence the proper subset $B \setminus \{z\} \subset B$ is a spanning set for $E$ and $B$ is not minimal.

Examples of Bases in Vector Spaces
We will describe bases for some of the examples of vector spaces introduced in 2.1, but it must be noted that, in each case, in addition to the given bases, there are infinitely many others. We note also that the following discussion holds both for real and complex vector spaces.

Example 1. $G_2$, $G_3$. We have already described all the bases of $G_2$ and $G_3$ in Example 3.1;2.

Example 2. $R_n$. The set of the following $n$ vectors is a basis of $R_n$:

$$e_1 = (1, 0, 0, \dots, 0, 0), \quad e_2 = (0, 1, 0, \dots, 0, 0), \quad \dots, \quad e_n = (0, 0, 0, \dots, 0, 1).$$

At this point it is convenient to introduce the function which is known as the Kronecker delta $\delta_{ik}$. The symbol $\delta_{ik}$ takes the value 1 when $i = k$ and 0 when $i \ne k$. Usually $i$ and $k$ will be natural numbers, but they may also be elements of an arbitrary set (Example 4). With the help of this symbol, we can describe the given basis for $R_n$ in the form

$$e_i = (\delta_{i1}, \delta_{i2}, \dots, \delta_{in}) \quad (i = 1, 2, \dots, n).$$

We show that $\{e_1, \dots, e_n\}$ is a basis as follows. If $x = (\xi_1, \dots, \xi_n) \in R_n$, then $x = \sum_{k=1}^{n} \xi_k e_k$ so that $e_1, \dots, e_n$ span the whole of $R_n$. Also, if $\sum_{k=1}^{n} \xi_k e_k = 0$, then $(\xi_1, \dots, \xi_n) = 0$ and $\xi_1 = \dots = \xi_n = 0$. Hence $e_1, \dots, e_n$ are linearly independent. At the same time, we see that, if $x = (\xi_1, \dots, \xi_n)$, then the scalars $\xi_1, \dots, \xi_n$ are the components of $x$ with respect to the given basis.
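The description of the basis of $R_n$ by means of the Kronecker delta translates almost literally into code. The following is a small sketch with hypothetical helper names (`delta`, `standard_basis`, `from_components`); vectors are modelled as tuples.

```python
def delta(i, k):
    """Kronecker delta: 1 when i == k, 0 otherwise."""
    return 1 if i == k else 0


def standard_basis(n):
    """The vectors e_i = (delta_i1, ..., delta_in) for i = 1, ..., n."""
    return [tuple(delta(i, k) for k in range(1, n + 1)) for i in range(1, n + 1)]


def from_components(xi, basis):
    """Form sum_k xi[k] * basis[k] for a finite basis of tuples."""
    n = len(basis[0])
    return tuple(sum(c * e[j] for c, e in zip(xi, basis)) for j in range(n))


basis = standard_basis(3)
x = from_components((5, -2, 7), basis)
```

Rebuilding a vector from its entries returns the same tuple, which reflects the remark that the scalars $\xi_1, \dots, \xi_n$ are themselves the components with respect to this basis.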
Example 3. The discussion of a basis for the space of sequences $F$ is outside the scope of this book because it requires transfinite methods. The countably infinite set of vectors

$$e_i = (\delta_{i1}, \delta_{i2}, \delta_{i3}, \dots) \quad (i = 1, 2, 3, \dots)$$

is a basis of the vector space $F_0$.

Example 4. The discussion of a basis for the vector space $F(A)$ is also outside the scope of this book except when $A$ is finite. In the latter case, the situation is analogous to the vector space $R_n$. The functions $e_x$ $(x \in A)$ given by

$$e_x(y) = \delta_{xy} \quad (y \in A)$$

form a basis $B$ of the space $F_0(A)$, where $\delta_{xy} = 1$ if $x = y$ and $\delta_{xy} = 0$ if $x \ne y$.

Proof. 1. $\sum_{x \in A} \alpha_x e_x = 0$ means $\sum_{x \in A} \alpha_x e_x(y) = \alpha_y = 0$ for all $y \in A$. Therefore $B$ is linearly independent.

2. An arbitrary mapping $f \in F_0(A)$ can be written in the form $f = \sum_{x \in A} f(x) e_x$, because $\sum_{x \in A} f(x) e_x(y) = f(y)$ for any $y \in A$.

Example 5. The vector space $P_n$ is obviously spanned by the polynomials $e_i$ which are given by $e_i(\tau) = \tau^i$ $(i = 0, \dots, n)$. Now suppose $\sum_{i=0}^{n} \lambda_i e_i = 0$, i.e., $\sum_{i=0}^{n} \lambda_i \tau^i = 0$ for all $\tau$. From the well-known fact that a polynomial only vanishes identically if all its coefficients are zero, it follows that $\lambda_i = 0$ for $i = 0, 1, \dots, n$. Hence $e_0, \dots, e_n$ are linearly independent and therefore form a basis.

Example 6. The countably infinite set of polynomials $e_i$, where $e_i(\tau) = \tau^i$ $(i = 0, 1, 2, \dots)$, is a basis of the space of polynomials $P$.

The discussion of a basis for Example 7 (the space of continuous functions) is again outside the scope of this book.

Example 8. The countably infinite set of functions $e_0$, $e_k$ and $f_k$ $(k = 1, 2, 3, \dots)$, where $e_0(\tau) = \tfrac{1}{2}$, $e_k(\tau) = \cos k\tau$ and $f_k(\tau) = \sin k\tau$, is a basis of the vector space $T$ of trigonometric polynomials. It follows immediately from the definition of $T$ that these functions span $T$. The fact that they are also linearly independent follows from the orthogonality relations which are proved in most standard texts on integral calculus.
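That is,

$$\int_0^{2\pi} \cos j\tau \cos k\tau \, d\tau = \int_0^{2\pi} \sin j\tau \sin k\tau \, d\tau = 0 \quad \text{if } j \ne k,$$

$$\int_0^{2\pi} \cos j\tau \sin k\tau \, d\tau = 0,$$

$$\int_0^{2\pi} (\cos k\tau)^2 \, d\tau = \int_0^{2\pi} (\sin k\tau)^2 \, d\tau = \pi \quad (k > 0).$$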
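The orthogonality relations can be spot-checked numerically. The sketch below approximates the integrals over $[0, 2\pi]$ with a composite midpoint rule; the quadrature rule, the step count and the names `integrate`, `inner`, `cos_k` and `sin_k` are all choices made for this illustration, and a numerical check is of course no substitute for the proofs in the calculus texts.

```python
import math


def integrate(f, a, b, steps=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def inner(f, g):
    """Integral of f(t) * g(t) over one period [0, 2*pi]."""
    return integrate(lambda t: f(t) * g(t), 0.0, 2 * math.pi)


def cos_k(k):
    """The function t -> cos(k t)."""
    return lambda t: math.cos(k * t)


def sin_k(k):
    """The function t -> sin(k t)."""
    return lambda t: math.sin(k * t)


mixed = inner(cos_k(2), cos_k(3))    # close to 0, since 2 != 3
cross = inner(cos_k(2), sin_k(2))    # close to 0
square = inner(sin_k(3), sin_k(3))   # close to pi
```

For low frequencies the midpoint rule on a full period is accurate to rounding error, so the three values should match $0$, $0$ and $\pi$ very closely.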
Hence, if

$$\alpha_0 e_0 + \sum_{k=1}^{\infty} (\alpha_k e_k + \beta_k f_k) = 0,$$

i.e., if $\tfrac{1}{2}\alpha_0 + \sum_{k=1}^{\infty} (\alpha_k \cos k\tau + \beta_k \sin k\tau) = 0$ for all $\tau$, and we want to show that $\beta_j = 0$, say, then we multiply both sides of the equation by $\sin j\tau$ and integrate from $0$ to $2\pi$. (It is possible to do this term by term because only a finite number of the coefficients are not zero.) In view of the orthogonality relations it follows that

$$\int_0^{2\pi} \beta_j (\sin j\tau)^2 \, d\tau = \pi\beta_j = 0 \quad \text{and therefore} \quad \beta_j = 0.$$

The proof that $\alpha_j = 0$ is just the same except that we multiply by $\cos j\tau$.

In the complex case (Example 2.1;15) another basis is the set of vectors $e_k$ given by $e_k(\tau) = e^{ik\tau}$ $(k = 0, \pm 1, \pm 2, \dots)$. It has already been shown that these span the vector space. Their linear independence is proved as follows.

$$\sum_{k=-\infty}^{+\infty} \alpha_k e_k = 0 \quad \text{means} \quad \sum_{k=-\infty}^{+\infty} \alpha_k e^{ik\tau} = 0 \text{ for all } \tau.$$

Multiplying by $e^{-ij\tau}$ and integrating from $-\pi$ to $+\pi$ (again only a finite number of the terms in the series are not zero), it follows that

$$\sum_{k=-\infty}^{+\infty} \alpha_k \int_{-\pi}^{+\pi} e^{i(k-j)\tau} \, d\tau = 2\pi\alpha_j = 0 \quad \text{and therefore} \quad \alpha_j = 0,$$

for $j = 0, \pm 1, \pm 2, \dots$ (In these formulae $i$ is the imaginary unit $\sqrt{-1}$ and $j, k$ are integers.)

Example 9. The linear forms $e_1, \dots, e_n$ given by

$$e_k(\xi_1, \dots, \xi_n) = \xi_k \quad (k = 1, \dots, n)$$

are a basis of the vector space $L_n$.

Example 10. The real vector space of complex numbers has the set $\{1, i\}$ as a basis, where $i$ is the imaginary unit.

Example 11. The vector space $S_R$ of real scalars has the set $B = \{1\}$ as a basis, and the same is true for the vector space $S_K$ of complex scalars (Example 17).

3.1.3 The Existence of Bases in a Vector Space
We will now show that every vector space has a basis. The general proof is not completely elementary because it uses results which depend on the axiom of choice or equivalently on the theory of well-ordered sets. We will base our proof on a theorem due to Zorn (usually referred to as Zorn's Lemma), which we will assume without further explanation. For those readers who are not familiar with this, we will also give an elementary proof which is only valid however for those vector spaces which have a finite or countably infinite spanning set.

Theorem 7 (Zorn). A non-empty partially ordered set, in which every non-empty chain (i.e., totally ordered subset) has an upper bound, has a maximal element.

A proof of Zorn's Lemma can be found for example in [23] p. 197.

Theorem 8. Every vector space has a basis.

Proof. Remembering Theorem 5, we show that there exists a maximal linearly independent subset $B$ of $E$. Let $\mathcal{A}$ be the family of all linearly independent subsets $A$ of $E$. $\mathcal{A}$ is partially ordered by the relation $A_1 \subseteq A_2$ (for $A_1, A_2 \in \mathcal{A}$). A chain in $\mathcal{A}$ is a subset $\mathcal{B}$ of $\mathcal{A}$ with the property that, if $A_1, A_2 \in \mathcal{B}$, then either $A_1 \subseteq A_2$ or $A_2 \subseteq A_1$. For each non-empty chain, we form the union $S(\mathcal{B}) = \bigcup(\mathcal{B}) \subseteq E$. Then

1. $S(\mathcal{B})$ is linearly independent. Suppose $\sum_{x \in S(\mathcal{B})} \alpha_x x = 0$, where only finitely many of the coefficients are not zero. We denote the corresponding vectors by $x_1, \dots, x_n$ and their coefficients by $\alpha_1, \dots, \alpha_n$. To each vector $x_k$, there exists a set $A_k \in \mathcal{B}$, such that $x_k \in A_k$ $(k = 1, \dots, n)$. Since $\mathcal{B}$ is totally ordered, one of these sets $A_k$ will be the largest. If we denote this by $A_{k_0}$, then $A_k \subseteq A_{k_0}$ for all $k = 1, \dots, n$. This means however that $x_1, \dots, x_n \in A_{k_0}$ and it follows, from the linear independence of $A_{k_0}$, that $\alpha_k = 0$ for $k = 1, \dots, n$. Hence $S(\mathcal{B})$ is linearly independent.

2. $S(\mathcal{B}) \supseteq A$ for all $A \in \mathcal{B}$. This follows immediately from Theorem 1.2;1.

Now 1. means that $S(\mathcal{B}) \in \mathcal{A}$ and 2. means that $S(\mathcal{B})$ is an upper bound for $\mathcal{B}$ in $\mathcal{A}$. Thus the partially ordered set $\mathcal{A}$ satisfies the conditions of Zorn's Lemma because $\mathcal{A}$ is not empty, since $\emptyset \in \mathcal{A}$. Therefore $\mathcal{A}$ has a maximal element $B$ and this set $B$ is a basis of $E$.

After this general proof, we will now show in a more elementary way that a vector space $E$, which has a finite or a countably infinite spanning set, has a basis which is either finite or countably infinite.

Suppose that $A \subseteq E$ is a spanning set for $E$ which consists of the finite or countably infinite number of vectors $e_k$ $(k = 1, 2, 3, \dots)$ where $e_1 \ne 0$. (If $E = \{0\}$, then $B = \emptyset$ is a basis.) We construct a subset $A^*$ of $A$ by putting $e_1 \in A^*$ and, for $k \ge 2$, $e_k \in A^*$ if, and only if, $e_k$ is not a linear combination of $e_1, \dots, e_{k-1}$, i.e., if $e_k \notin L(e_1, \dots, e_{k-1})$. Then $A^*$ is a basis of $E$, because
1. $A^*$ is linearly independent. If $A^*$ were linearly dependent, then there would be an equation $\sum_{k=1}^{n} \alpha_k e_k = 0$ with $n > 0$, $e_n \in A^*$ and $\alpha_n \ne 0$. This could be solved for $e_n$ as a linear combination of $e_1, \dots, e_{n-1}$ in contradiction of the definition of $A^*$.

2. $A^*$ spans $E$. We only need to show that $A$ is contained in $L(A^*)$ because then $E = L(A) \subseteq L(L(A^*)) = L(A^*)$ (Theorems 2.4;5 and 6) and hence $E = L(A^*)$. If $A$ were not contained in $L(A^*)$, then there would be an element $e_k \in A$ which was not in $L(A^*)$. Suppose that $e_{k_0}$ is the first element of $A$ with this property. Certainly $e_{k_0} \notin A^*$, and therefore $e_{k_0} \in L(e_1, \dots, e_{k_0 - 1})$. (Note that, since $e_1 \in A^*$, $k_0 \ge 2$.) But $e_1, \dots, e_{k_0 - 1} \in L(A^*)$ and hence $e_{k_0} \in L(L(A^*)) = L(A^*)$ and this is a contradiction.

We note that this elementary proof is still valid for an arbitrary spanning set $A$ of $E$, if $A$ can be so ordered that every non-empty subset of $A$ has a first element. That this is possible for all sets (even when they are not countable) is the content of the well-ordering theorem (see [23] p. 198).

Theorem 9. In a vector space $E$, every linearly independent set $C$ can be completed to a basis of $E$, i.e., there is a basis $B$ of $E$ such that $B \supseteq C$.

Proof. The proof is almost the same as that of Theorem 8, the only difference being that the family $\mathcal{A}$ consists of those linearly independent sets which contain $C$.
Theorem 10. Every spanning set $C$ of a vector space $E$ contains a basis, i.e., there is a basis $B$ of $E$ such that $B \subseteq C$.

Proof. Again the proof follows that of Theorem 8 except that here $\mathcal{A}$ is the family of linearly independent sets $A$ which are contained in $C$. This shows that there is a linearly independent set $B$ which is maximal in $C$, i.e., a linearly independent subset $B$ of $C$ such that every subset of $C$ which properly contains $B$ is linearly dependent. Then $L(B) \supseteq C$ and, because $L(C) = E$,

$$L(B) = L(L(B)) \supseteq L(C) = E.$$

Hence $B$ is a basis of $E$.
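For a finite spanning set of rows in $R_n$, the elementary construction of $A^*$ given after Theorem 8 can be carried out mechanically, and it produces a basis contained in the spanning set as in Theorem 10. The sketch below is one possible rendering: membership in a span is tested by comparing ranks, computed by exact Gaussian elimination (a technique assumed here, not taken from the text), and the names `rank` and `sift` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def sift(spanning):
    """Keep e_k only if it is not a linear combination of e_1, ..., e_{k-1}.

    The kept vectors span the same subspace as all earlier vectors, so it
    is enough to test e_k against the vectors kept so far.
    """
    kept = []
    for e in spanning:
        # Adding e raises the rank exactly when e is outside the span of kept.
        if rank(kept + [e]) > len(kept):
            kept.append(e)
    return kept


spanning_set = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 3)]
basis = sift(spanning_set)
```

Zero vectors are dropped automatically by this test, so the assumption $e_1 \ne 0$ made in the text is not needed in the code.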
Problems
1. Show that the vectors $x_1, \dots, x_4$ in Problem 2.3;2 form a basis of $R_4$.
2. Let $E$ be a vector space. Prove that
(a) If $S$ is a linearly independent subset of $E$ and $R$ is a subset of $S$, then $R$ is linearly independent.
(b) If $S$ is linearly dependent and $R \supseteq S$, then $R$ is linearly dependent.
(c) If $0 \in S$, then $S$ is linearly dependent.
(d) If $x \in E$, then $\{x\}$ is linearly dependent if and only if $x = 0$.
3. Prove that, if $x, y$ are linearly independent elements of the vector space $E$, then $x + y$ and $x - y$ are also linearly independent.
4. Let $\tau_0, \dots, \tau_n$ be distinct real numbers. Show that the polynomials $x_i(\tau)$ of degree $n$ which are given by $x_i(\tau_k) = \delta_{ik}$ $(i, k = 0, \dots, n)$ form a basis of the vector space $P_n$ (Example 2.1;5).
5. Let $B_1$ and $B_2$ be bases of the vector spaces $E_1$ and $E_2$ respectively. Show that the pairs $(e_1, 0)$, where $e_1 \in B_1$, and $(0, e_2)$, where $e_2 \in B_2$, form a basis of the direct product of $E_1$ and $E_2$ (see Problem 2.1;4).
6. Prove Theorems 9 and 10 in the case when $E$ has a finite or countably infinite spanning set. (Use the proof of the existence of a basis in this special case.)
3.2 Finite-Dimensional Vector Spaces

Definition 1. A vector space is said to be ‘finite-dimensional’ if it has a finite spanning set.

Examples. The Examples $G_2$, $G_3$, $R_n$, $P_n$, $L_n$, $K$, $S_R$ of 2.1.1 are all finite-dimensional because they all have finite bases.

In view of Theorem 3.1;10, every finite-dimensional vector space $E \ne \{0\}$ has a finite basis. The most important property of this type of vector space is that all bases contain the same number of elements (see Theorem 2). The proof of this depends on the following Exchange Theorem which is due to Steinitz.
Theorem 1. Let $E$ be a vector space, $a_1, \dots, a_n \in E$ and $L = L(a_1, \dots, a_n)$. If the subset $C$ of $L$ is linearly independent, then

1. $C$ is finite and the number $m$ of elements in $C$ is $m \le n$. (We will therefore denote the elements of $C$ by $e_1, \dots, e_m$.)
2. For each integer $r$, $0$ [...] $0$; $y_i \in P$ and there is a representation $x_1 = \sum_{i=1}^{n} \lambda_i y_i$ where $y_i \ne x_1$ for $i = 1, \dots, n$. Hence

$$x_1 = \sum_{i=1}^{n} \lambda_i \sum_{k=1}^{r} \mu_{ik} x_k, \qquad (1)$$

where $\sum_{k=1}^{r} \mu_{ik} = 1$ and $\mu_{ik} \ge 0$ for $i = 1, \dots, n$ and $k = 1, \dots, r$. The coefficient of $x_1$ on the right-hand side of (1) is $\sum_{i=1}^{n} \lambda_i \mu_{i1}$ which is strictly less than 1. Since $\sum_{i=1}^{n} \lambda_i = 1$ and $\lambda_i > 0$, it could only be equal to 1 if

$$\mu_{11} = \mu_{21} = \dots = \mu_{n1} = 1.$$

But this would mean that $y_i = x_1$ for all $i$, which is a contradiction of the assumptions. Hence (1) can be solved for $x_1$, to obtain

$$x_1 = \left(1 - \sum_{i=1}^{n} \lambda_i \mu_{i1}\right)^{-1} \sum_{k=2}^{r} \sum_{i=1}^{n} \lambda_i \mu_{ik} x_k.$$

The coefficients on the right-hand side are all positive or zero. Their sum is

$$\left(1 - \sum_{i=1}^{n} \lambda_i \mu_{i1}\right)^{-1} \sum_{i=1}^{n} \lambda_i \sum_{k=2}^{r} \mu_{ik} = \left(1 - \sum_{i=1}^{n} \lambda_i \mu_{i1}\right)^{-1} \sum_{i=1}^{n} \lambda_i (1 - \mu_{i1}) = 1.$$

Therefore $x_1$ is a convex linear combination of $x_2, \dots, x_r$.
Theorem 3. A convex polyhedron is the convex hull of its vertex-vectors.
Proof. Let $P = K(x_1, \ldots, x_r)$ and suppose that $x_1$ is not a vertex-vector. By Theorem 2, $\{x_1, \ldots, x_r\} \subseteq K(x_2, \ldots, x_r)$. Using Theorem 2.6;6, it follows that $P = K(x_2, \ldots, x_r)$. Now, if there is another of the vectors $x_2, \ldots, x_r$ which is not a vertex-vector, then this can also be left out in the same way, etc.
3.4.2 Simplexes
Suppose $P = K(x_0, \ldots, x_r)$ is a convex polyhedron. By Theorems 2.5;10 and 2.6;7, $P \subseteq N(P) = N(x_0, \ldots, x_r)$. This coset has dimension at most r, where the dimension of a coset is defined to be the dimension of the corresponding subspace. The subspace is in fact $L(x_1 - x_0, \ldots, x_r - x_0)$ (cf. the proof of Theorem 2.5;10).
Definition 3. A convex polyhedron $P = K(x_0, \ldots, x_r)$ is said to be an 'r-dimensional simplex' if the dimension of the coset $N(x_0, \ldots, x_r)$ is equal to r.
If P is an r-dimensional simplex, it is easy to see that all of the vectors $x_0, \ldots, x_r$ are vertex-vectors. For instance, if $x_r$ were not a vertex-vector then there would be a representation $x_r = \sum_{k=0}^{r-1} \lambda_k x_k$, where $\sum_{k=0}^{r-1} \lambda_k = 1$ (Theorem 2). But then $x_r \in N(x_0, \ldots, x_{r-1})$ (Theorem 2.5;10) and the dimension of the coset $N(x_0, \ldots, x_r)$ would not be greater than r - 1.
Example 3. In the vector space $G_2$ (Example 2.1;1) every simplex is one of the following types.
(a) consisting of only one vector (dim. 0)
(b) the convex hull of two distinct vectors (segment, dim. 1)
(c) the convex hull of three vectors whose end points are not collinear (when they are drawn from a common origin) (triangle, dim. 2)
3.4: CONVEX POLYHEDRA
In the vector space $G_3$, the same cases appear as in $G_2$ and also
(d) the convex hull of four vectors whose end points are not coplanar (tetrahedron, dim. 3).
Example 4. In a real vector space E, suppose that $e_1, \ldots, e_r$ are linearly independent. Then $S = K(0, e_1, \ldots, e_r)$ is an r-dimensional simplex.
Theorem 4. If $S = K(x_0, \ldots, x_r)$ is an r-dimensional simplex, then every vector $x \in S$ can be written in exactly one way as a convex linear combination of $x_0, \ldots, x_r$.
Proof. Suppose that $x = \sum_{k=0}^{r} \lambda_k x_k$ and $x = \sum_{k=0}^{r} \mu_k x_k$, where
\[ \sum_{k=0}^{r} \lambda_k = \sum_{k=0}^{r} \mu_k = 1. \]
Then
\[ \sum_{k=1}^{r} (\lambda_k - \mu_k)(x_k - x_0) = 0 \]
and, since the vectors $x_1 - x_0, \ldots, x_r - x_0$ are linearly independent, it follows that $\lambda_k = \mu_k$ for k = 1, ..., r and hence that $\lambda_0 = \mu_0$.
If r = n = dim E, then every vector $x \in E$ can be represented in exactly one way in the form $x = \sum_{k=0}^{r} \lambda_k x_k$, where $\sum_{k=0}^{r} \lambda_k = 1$. The numbers $\lambda_0, \ldots, \lambda_r$, which are uniquely determined by x and the condition $\sum_{k=0}^{r} \lambda_k = 1$, are called barycentric co-ordinates in the vector space E with respect to the simplex S.
If $x_0, \ldots, x_r$ are the vertex-vectors of an r-dimensional simplex $S_r$, then, for $0 \le s \le r$, any s + 1 of these vectors will be the vertex-vectors of an s-dimensional simplex $S_s$ contained in $S_r$. We refer to $S_s$ as an s-dimensional face of $S_r$. The number of s-dimensional faces of an r-dimensional simplex is $\binom{r+1}{s+1}$. In particular, there are r + 1 (r - 1)-dimensional faces and r + 1 0-dimensional faces (each consisting of just one vertex). $S_r$ has just one r-dimensional face, viz. itself.
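The barycentric co-ordinates and the face count just described can be tried out numerically. The following is only a sketch for the special case of a triangle in the plane (r = n = 2); the function names `barycentric`, `in_simplex` and `face_count` are our own and do not come from the text. The co-ordinates are found by solving $x - x_0 = \lambda_1 (x_1 - x_0) + \lambda_2 (x_2 - x_0)$ with Cramer's rule.

```python
from fractions import Fraction
from math import comb


def barycentric(x, x0, x1, x2):
    """Barycentric co-ordinates (l0, l1, l2) of x w.r.t. the triangle K(x0, x1, x2).

    Solves x - x0 = l1*(x1 - x0) + l2*(x2 - x0) by Cramer's rule and sets
    l0 = 1 - l1 - l2, so that l0 + l1 + l2 = 1.
    """
    a = [Fraction(x1[i]) - x0[i] for i in range(2)]  # x1 - x0
    b = [Fraction(x2[i]) - x0[i] for i in range(2)]  # x2 - x0
    c = [Fraction(x[i]) - x0[i] for i in range(2)]   # x - x0
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        # x1 - x0 and x2 - x0 are linearly dependent: no 2-dimensional simplex
        raise ValueError("the three vectors do not span a 2-dimensional simplex")
    l1 = (c[0] * b[1] - c[1] * b[0]) / det
    l2 = (a[0] * c[1] - a[1] * c[0]) / det
    return (1 - l1 - l2, l1, l2)


def in_simplex(coords):
    """x lies in the simplex exactly when all barycentric co-ordinates are >= 0."""
    return all(l >= 0 for l in coords)


def face_count(r, s):
    """Number of s-dimensional faces of an r-dimensional simplex."""
    return comb(r + 1, s + 1)


coords = barycentric((1, 1), (0, 0), (4, 0), (0, 4))
print(coords, in_simplex(coords))
print(face_count(3, 1))  # edges of a tetrahedron
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the condition that the co-ordinates sum to 1 holds exactly rather than up to rounding.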
Theorem 5. The intersection of two faces of a simplex is either a face or it is empty.
Proof. 1. From Theorem 4, it follows that the intersection is empty when the two faces have no vertex-vectors in common.
2. If the two faces have vertex-vectors in common, their intersection is the simplex spanned by these vertex-vectors.
Theorem 6. The intersection P of a coset N and a simplex S is either a convex polyhedron or it is empty.
Proof. 1. If $P \neq \emptyset$, then, for each $y \in P$, we can construct a unique face $S_y$ of S which contains y and has the least possible dimension. (By Theorem 5, $S_y$ is uniquely determined as the intersection of all the faces which contain y.) If $P \cap S_y = \{y\}$, we will say that y is a distinguished vector of P. Since each face of S can be the $S_y$ for at most one distinguished vector y and S has only a finite number of faces, it follows that there can only be a finite number of distinguished vectors.
2. We will now show that P is the convex hull of its distinguished vectors and this will prove the theorem.
Fig. 12.
3. Suppose that $y \in P$ is not distinguished. Then there is a vector $d \in E$, $d \neq 0$, such that $y + d \in P \cap S_y$. For arbitrary real numbers $\mu_1$, $\mu_2$ it then follows that
\[ z_1 = y + \mu_1 d \in N \quad \text{and} \quad z_2 = y - \mu_2 d \in N. \]
We will show in 4. that we may take $\mu_1$, $\mu_2$ to be positive and so choose them that $z_1$ and $z_2$ lie in proper faces of $S_y$ (Fig. 12). Then
\[ y = \frac{1}{\mu_1 + \mu_2} (\mu_2 z_1 + \mu_1 z_2), \]
i.e., y is a convex linear combination of $z_1, z_2 \in P$. If $z_1$, $z_2$ are not distinguished, we can repeat the process just described for each of them. After a finite number of steps (at the latest when we reach 0-dimensional faces of $S_y$), we obtain a representation of y as a convex linear combination of distinguished vectors of P.
4. It only remains to prove the assertion about the choice of $\mu_1$ and $\mu_2$. If $S_y = K(x_0, \ldots, x_s)$, then there are representations
\[ y = \sum_{k=0}^{s} \lambda_k x_k \quad \text{and} \quad y + d = \sum_{k=0}^{s} (\lambda_k + \delta_k) x_k \]
where $\lambda_k > 0$ (k = 0, ..., s), $\sum_{k=0}^{s} \lambda_k = 1$ and $\sum_{k=0}^{s} \delta_k = 0$. Since $d \neq 0$, not all $\delta_k$ are zero and therefore there must be one which is strictly positive and one which is strictly negative. We now choose $\mu_1 > 0$ so that $\lambda_k + \mu_1 \delta_k \ge 0$ for k = 0, ..., s and at least one of these is zero, which is possible because $\lambda_k > 0$ for all k and $\delta_k < 0$ for some k. Hence
\[ z_1 = y + \mu_1 d = \sum_{k=0}^{s} (\lambda_k + \mu_1 \delta_k) x_k \in S_y. \]
Since one of the coefficients is zero, $z_1$ actually belongs to a proper face of $S_y$. The choice of $\mu_2$ is made in a similar way.
3.4.3 Convex Pyramids
Let E be a finite-dimensional real vector space.
Definition 4. A 'convex pyramid' $P \subseteq E$ is the positive hull of a finite number of vectors $x_1, \ldots, x_r \in E$ (cf. 2.6.2).
A convex pyramid is a convex cone and is therefore convex.
Example 5. In Example 2.6;5, the convex cones (1), (2) and (4) are convex pyramids in the vector space $G_2$. On the other hand (3) is not a convex pyramid. In $G_3$, pyramids in the usual sense are convex pyramids provided they are convex and their origins correspond to the zero-vector.
Example 6. The set of all vectors $x \in E$, for which all the components (with respect to some basis) are positive, is a convex pyramid. It is the positive hull of the basis. Every subspace $L \subseteq E$ (and hence E itself) is a convex pyramid. If $\{e_1, \ldots, e_r\}$ is a basis of L, then $L = P(e_1, \ldots, e_r, -e_1, \ldots, -e_r)$.
The theory of convex pyramids naturally develops along similar lines to that of convex polyhedra. We will therefore first look for the analogues of the vertex-vectors. These are what are known as the edge-vectors.
In order to simplify the definition of these, we will say that two vectors $x, y \in E$ are similar if each is a strictly positive multiple of the other, i.e., $y = \alpha x$ with $\alpha > 0$. (The zero-vector is therefore only similar to itself.)
Definition 5. A vector x of a convex pyramid P is an 'edge-vector' of P if, from $x = \sum_{k=1}^{r} \lambda_k x_k$, $\lambda_k > 0$, $x_k \in P$, $x_k \neq 0$ (k = 1, ..., r), it follows that $x_1, \ldots, x_r$ are similar to x.
If x is an edge-vector of P, then so is every vector which is similar to x. Remembering Theorem 3, we could conjecture here that every convex pyramid is the positive hull of its edge-vectors. However, we can see that this is not always true, by considering the example of the convex pyramid E which has no edge-vectors at all.
Theorem 7. Every acute convex pyramid is the positive hull of its edge-vectors, where it is sufficient to choose just one representative from each class of similar edge-vectors. (See Definition 2.6;6.)
The proof of this theorem is directly similar to that of Theorem 3 and so we will not write it out again.
Problems
1. Prove that, if P, Q are convex polyhedra, then so are $K(P \cup Q)$ and $\lambda P + \mu Q$ (for all real $\lambda$, $\mu$).
2. Let $\{e_1, \ldots, e_n\}$ be a basis of a real vector space. Show that $e_1, \ldots, e_n$ are the vertex-vectors of an (n - 1)-dimensional simplex S.
3. In Problem 2, let $E = R_n$ and $e_i = (\delta_{i1}, \ldots, \delta_{in})$ (i = 1, ..., n). Find a condition on $\xi_1, \ldots, \xi_n$ in order that $x = (\xi_1, \ldots, \xi_n) \in S$.
4. Prove that, in a finite-dimensional vector space, the set of all those vectors all of whose components (with respect to a given basis) are positive or zero is a convex pyramid.
5. Prove that, if K is a convex polyhedron, then P(K) is a convex pyramid.
CHAPTER 4
DETERMINANTS
4.1 Permutations
A permutation of a finite set of n elements is a mapping of the set onto itself.
Since it is a mapping of a finite set onto itself, a permutation is automatically 1-1. If we denote the elements of the set A by 1, 2, ..., n and a permutation of A by $\phi$, then $\phi(k)$ will take each of the values from 1 to n exactly once as k runs through the elements of A. (The use of small Greek letters to denote permutations will be the only departure from our usual convention of using small Roman letters for mappings.)
A convenient method of presenting a permutation is to write down the elements of A in a row and then, underneath each of these, to write down its image under the permutation. For example, if the permutation is written in the form
\[ \phi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 3 & 1 & 4 \end{pmatrix} \]
then this means that $\phi_1$ is the mapping such that $\phi_1(1) = 2$, $\phi_1(2) = 5$, $\phi_1(3) = 3$, etc. The total number of permutations of a set of n elements is $n! = n(n-1) \cdots 3 \cdot 2 \cdot 1$, because there are n possibilities for the choice of $\phi_1(1)$, and then (n - 1) possibilities for $\phi_1(2)$ ($\phi_1(2) \neq \phi_1(1)$), etc. Finally $\phi_1(n)$ is completely determined by $\phi_1(1), \ldots, \phi_1(n-1)$.
Since permutations are mappings, they can be multiplied together (see 1.3.1). The product $\phi_2 \phi_1$ of the permutations $\phi_1$ and $\phi_2$ is given by $k \to \phi_2 \phi_1(k) = \phi_2(\phi_1(k))$. This multiplication of permutations is not commutative in general. For example, if $\phi_2$ is the permutation
\[ \phi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 4 & 1 & 3 & 2 \end{pmatrix} \]
then
\[ \phi_2 \phi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 2 & 1 & 5 & 3 \end{pmatrix} \]
and
\[ \phi_1 \phi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 1 & 2 & 3 & 5 \end{pmatrix} \neq \phi_2 \phi_1. \]
On the other hand, the multiplication of permutations is associative (Theorem 1.3;1).
The identity permutation $\epsilon$ is given by $\epsilon(k) = k$ (k = 1, ..., n) and we see that $\phi \epsilon = \epsilon \phi = \phi$ for all permutations $\phi$.
Since a permutation $\phi$ is a 1-1 mapping, it has an inverse $\phi^{-1}$ for which $\phi^{-1} \phi = \phi \phi^{-1} = \epsilon$ (see 1.3.2). This can be found by interchanging the rows in the above representation of $\phi$. For example,
\[ \phi_1^{-1} = \begin{pmatrix} 2 & 5 & 3 & 1 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 1 & 3 & 5 & 2 \end{pmatrix}. \]
These rules show that the permutations of a set of n elements together with their multiplication form a group, which is known as the symmetric group $S_n$. (See [25] pp. 61-64.)
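The product and the inverse of permutations are easy to check mechanically. The sketch below stores a permutation as a Python dictionary built from the bottom row of the two-row notation; the helper names `from_bottom_row`, `compose` and `inverse` are our own choices for illustration. It reproduces the products $\phi_2 \phi_1$ and $\phi_1 \phi_2$ and the inverse $\phi_1^{-1}$ of the examples above.

```python
def from_bottom_row(row):
    """Permutation of {1, ..., n} given by the bottom row of the two-row notation."""
    return {k: image for k, image in enumerate(row, start=1)}


def compose(p2, p1):
    """Product p2 p1, i.e. the mapping k -> p2(p1(k))."""
    return {k: p2[p1[k]] for k in p1}


def inverse(p):
    """Inverse permutation: interchange the two rows."""
    return {image: k for k, image in p.items()}


phi1 = from_bottom_row([2, 5, 3, 1, 4])
phi2 = from_bottom_row([5, 4, 1, 3, 2])
identity = from_bottom_row([1, 2, 3, 4, 5])

print(compose(phi2, phi1))  # bottom row 4 2 1 5 3
print(compose(phi1, phi2))  # bottom row 4 1 2 3 5
print(inverse(phi1))        # bottom row 4 1 3 5 2
```

Note that `compose(p2, p1)` applies `p1` first, in agreement with the convention $\phi_2 \phi_1(k) = \phi_2(\phi_1(k))$ used in the text.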
If $\psi$ is any permutation of A, then, as $\phi$ runs through all the permutations of A, so do $\phi \psi$ and $\psi \phi$ because, for any permutation $\theta$, $\theta = (\theta \psi^{-1}) \psi = \psi (\psi^{-1} \theta)$. Also $\phi^{-1}$ runs through all the permutations as $\phi$ does, because $\theta = (\theta^{-1})^{-1}$ for all permutations $\theta$.
A transposition is a permutation which interchanges two of the elements and leaves all the rest fixed. Thus the permutation $\phi$ is a transposition when there are two elements $k_1, k_2 \in A$ such that $\phi(k_1) = k_2$, $\phi(k_2) = k_1$ and $\phi(k) = k$ for all $k \neq k_1$ or $k_2$. We will denote this transposition briefly by $(k_1\ k_2)$. Obviously, every permutation can be written as a product of transpositions and in fact in many different ways. For example,
\[ \phi_1 = (1\ 4)(1\ 5)(1\ 2) = (2\ 5)(1\ 5)(4\ 5). \]
Theorem 1. It is not possible to write a permutation both as a product of an odd number of transpositions and as a product of an even number of transpositions.
Proof. The nature of the elements to be permuted is not relevant to the statement of the theorem. We will assume that they are real variables and denote them by $\xi_1, \ldots, \xi_n$ (instead of 1, ..., n). The function
\[ \Delta(\xi_1, \ldots, \xi_n) = (\xi_1 - \xi_2)(\xi_1 - \xi_3) \cdots (\xi_1 - \xi_n) \cdot (\xi_2 - \xi_3) \cdots (\xi_2 - \xi_n) \cdots (\xi_{n-1} - \xi_n) \]
has the special property that, if $\xi_1, \ldots, \xi_n$ are permuted, then either it is unchanged or it is simply multiplied by -1. Any transposition (for example of $\xi_1$ and $\xi_2$) has the latter effect. Hence a permutation which is a product of an odd number of transpositions will have the effect of multiplying $\Delta$ by -1 and a product of an even number of transpositions will leave $\Delta$ unchanged. Consequently no permutation can be both a product of an odd number of transpositions and a product of an even number of transpositions.
In view of Theorem 1, we can make the following definition.
Definition 1. A permutation is said to be 'even' (odd) if it is the product of an even (odd) number of transpositions. The 'characteristic' ch($\phi$) of a permutation $\phi$ is equal to +1 if $\phi$ is even and is equal to -1 if $\phi$ is odd.
Theorem 2.
1. ch($\epsilon$) = 1
2. ch($\phi_2 \phi_1$) = ch($\phi_2$) ch($\phi_1$)
3. ch($\phi^{-1}$) = ch($\phi$)
Proof. The identity permutation is the product of no transpositions and is therefore even. A representation of $\phi_2 \phi_1$ as a product of transpositions can be found by multiplying such representations of $\phi_1$ and $\phi_2$. The second rule follows from this. Finally $\phi \phi^{-1} = \epsilon$, and by the first and second rules, ch($\phi$) ch($\phi^{-1}$) = 1, from which the third rule follows.
Exercises
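1. Decide whether the following two permutations are odd or even.
\[ \text{(a)}\ \phi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 4 & 1 & 3 \end{pmatrix} \qquad \text{(b)}\ \psi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 1 & 5 & 2 \end{pmatrix} \]
Solution. (a) $\phi$ is odd. (b) $\psi$ is even.
2. Calculate the product $\phi \psi$ of the permutations in Exercise 1.
Solution.
\[ \phi \psi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 4 & 5 & 3 & 2 \end{pmatrix}. \]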
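The answers to Exercises 1 and 2 can be checked with a short program. The sketch below computes the characteristic by splitting a permutation into cycles and using the standard fact (not proved in the text) that a cycle of length m is a product of m - 1 transpositions; the names `characteristic` and `product_row` are our own.

```python
def characteristic(bottom_row):
    """ch of the permutation k -> bottom_row[k - 1] of {1, ..., n}.

    Each cycle of length m contributes m - 1 transpositions; the
    characteristic is +1 if the total is even and -1 if it is odd.
    """
    n = len(bottom_row)
    image = {k: bottom_row[k - 1] for k in range(1, n + 1)}
    seen = set()
    transpositions = 0
    for start in range(1, n + 1):
        if start in seen:
            continue
        length = 0
        k = start
        while k not in seen:  # walk once round the cycle containing start
            seen.add(k)
            k = image[k]
            length += 1
        transpositions += length - 1
    return 1 if transpositions % 2 == 0 else -1


def product_row(p2, p1):
    """Bottom row of the product p2 p1, i.e. k -> p2(p1(k))."""
    return [p2[p1[k] - 1] for k in range(len(p1))]


phi = [5, 2, 4, 1, 3]
psi = [4, 3, 1, 5, 2]
print(characteristic(phi), characteristic(psi))
print(product_row(phi, psi))
```

The second rule of Theorem 2 can be observed at the same time: the characteristic of the product equals the product of the characteristics.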
Problems
1. Show that, if $\phi$ is a permutation, then there is a power $\phi^p$ of $\phi$ which is equal to the identity permutation. (Hint: consider the countably infinite set of powers $\phi, \phi^2, \phi^3, \ldots$ and show that there are two of these with different exponents but which represent the same permutation.)
2. Prove that the set of all even permutations of n symbols is a group. This is known as the alternating group $A_n$. Is this also true for the set of all odd permutations?
4.2 Determinants
4.2.1 The Concept of a Determinant
Let $E_n$ be an n-dimensional vector space and let $\{e_1, \ldots, e_n\}$ be a basis of $E_n$ which will be kept fixed in the following. We consider functions $D(x_1, \ldots, x_n)$, whose variables $x_1, \ldots, x_n$ are vectors in $E_n$ and whose values are scalars. Thus, corresponding to each ordered set $\{x_1, \ldots, x_n\} \subseteq E_n$, there is a scalar value of the function $D(x_1, \ldots, x_n)$.
Definition 1. $D(x_1, \ldots, x_n)$ is said to be a 'determinant' in $E_n$ (with respect to the basis $\{e_1, \ldots, e_n\}$) if the following conditions are satisfied.
D1. If a permutation $\phi$ is applied to the variables, then $D(x_1, \ldots, x_n)$ is multiplied by the characteristic of $\phi$. That is to say, $D(\phi(x_1), \ldots, \phi(x_n)) = \mathrm{ch}(\phi) \cdot D(x_1, \ldots, x_n)$. (In particular, $D(x_1, \ldots, x_n)$ is multiplied by -1 if two vectors are interchanged.)
D2. $D(x_1, \ldots, x_n)$ is linear in each variable $x_k$. This means, for example, when k = 1,
\[ D(\alpha x_1 + \beta y_1, x_2, \ldots, x_n) = \alpha D(x_1, x_2, \ldots, x_n) + \beta D(y_1, x_2, \ldots, x_n) \]
for arbitrary scalars $\alpha$, $\beta$ (cf. Definition 6.1;1).
D3. $D(x_1, \ldots, x_n)$ is normalized, i.e., $D(e_1, \ldots, e_n) = 1$.
The dimension of a determinant is the dimension n of the vector space $E_n$ or equivalently the number of variables in $D(x_1, \ldots, x_n)$.
Of course it is not claimed initially that there exists a determinant in $E_n$ with respect to the basis $\{e_1, \ldots, e_n\}$. However, in the following, we will prove that there is in fact exactly one determinant.
Theorem 1. Let $\phi$ be a mapping of the basis $\{e_1, \ldots, e_n\}$ into itself ($\phi$ is not necessarily a permutation). Then, if there is a determinant D,
\[ D(\phi(e_1), \ldots, \phi(e_n)) = \begin{cases} \mathrm{ch}(\phi), & \text{if } \phi \text{ is a permutation} \\ 0, & \text{if two of the vectors } \phi(e_1), \ldots, \phi(e_n) \text{ are equal.} \end{cases} \]
Proof. If $\phi$ is a permutation, the assertion follows from D3 by substituting the basis vectors $e_1, \ldots, e_n$ for $x_1, \ldots, x_n$ in D1. On the other hand, if $\phi$ is not a permutation, then there are two distinct basis vectors $e_h$, $e_m$ such that $\phi(e_h) = \phi(e_m)$ [...]

[...] g(x) where g(x) = x + f(x) also linear?
5.2: LINEAR MAPPINGS OF FINITE-DIMENSIONAL SPACES
6. To each function $x(\tau) \in C$ (Example 2.1;7), there is assigned the function $y(\tau) = x(\tau) + x(-\tau) \in C$. Does this define a linear mapping of C into itself? What is its kernel? Answer the same questions for $z(\tau) = x(\tau) - x(-\tau)$.
7. The direct product of two vector spaces $E_1$ and $E_2$ (Problem 2.1;4) is mapped into itself by $f((x_1, x_2)) = (x_1, 0)$. Is f linear? What is its kernel?
8. Let K be a convex subset (cone, polyhedron, or pyramid) of a real vector space E and let f be a linear mapping of E. Show that $f(K) \subseteq f(E)$ is also a convex set (cone, polyhedron, or pyramid). Is it possible for f(K) to be acute, if K is not acute? (See Theorem 2.6;8.)
9. Prove that, if $L_1$ and $L_2$ are subspaces of a vector space, then the quotient spaces $(L_1 + L_2)/L_1$ and $L_2/(L_1 \cap L_2)$ are isomorphic. Derive a new proof for Theorem 3.2;7.
10. Let L be a subspace of the vector space E. Show that the set of all linear mappings of E into a vector space F for which f(L) = {0} is a subspace of $\mathcal{L}(E, F)$. More generally, show that the same is true for those mappings for which $f(L) \subseteq M$, where M is any subspace of F.
11. Prove that the inverse image of a subspace under a linear mapping of a vector space E is a subspace of E.
5.2 Linear Mappings of Finite-Dimensional Vector Spaces, Matrices
5.2.1 The Rank of a Linear Mapping
Theorem 1. Suppose E is a finite-dimensional vector space. A vector space F (with the same scalars) is isomorphic to E if and only if F is also finite-dimensional and dim F = dim E.
Proof. Theorem 5.1;5.
Definition 1. Let E be a finite-dimensional vector space and let f be a linear mapping of E into a vector space F. Then the 'rank' of f is the dimension of the image $f(E) \subseteq F$. (If E is finite-dimensional, then so is f(E).)
Theorem 2. If K is the kernel of the linear mapping f, then the rank of f is equal to dim(E/K) = dim E - dim K.
Proof. By Theorem 5.1;9, f(E) and E/K are isomorphic. Hence the assertion follows directly from Theorems 1 and 3.2;5.
Theorem 3. A linear mapping f of the finite-dimensional vector space E is 1-1 if and only if its rank is equal to dim E.
CH. 5: LINEAR MAPPINGS 0F VECTOR SPACES, MATRICES
Proof. 1. If f is 1-1, then, by Theorem 5.1;8, K = {0}. Hence the rank of f is equal to dim E (Theorem 2).
2. If the rank of f is equal to dim E, then, by Theorem 2, K = {0} and hence f is 1-1 (Theorem 5.1;8).
Theorem 4. Suppose dim E = dim F. A linear mapping f of E into F is 1-1 if and only if it is onto F.
Proof. 1. If f is 1-1, then, by Theorem 3, dim E = dim f(E) = dim F and hence f(E) = F by Theorem 3.2;4.
2. If f(E) = F, then the rank of f is equal to dim E and, by Theorem 3, f is 1-1.
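Theorems 2 and 3 can be illustrated numerically. The sketch below assumes that a linear mapping of an n-dimensional space E into an m-dimensional space F is given by an array of m rows and n columns of coefficients, and that the rank of the mapping equals the rank of this array as found by Gaussian elimination (a standard fact which is not established in this passage). The function names are our own.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def kernel_dimension(matrix):
    """dim K = dim E - rank (Theorem 2); dim E is the number of columns."""
    return len(matrix[0]) - rank(matrix)


def is_one_to_one(matrix):
    """Theorem 3: the mapping is 1-1 if and only if its rank equals dim E."""
    return rank(matrix) == len(matrix[0])


print(rank([[1, 2], [2, 4]]), kernel_dimension([[1, 2], [2, 4]]))
print(is_one_to_one([[1, 0], [0, 1], [1, 1]]))
```

In the first example the kernel is 1-dimensional, so the mapping is not 1-1; in the second the rank equals dim E = 2, so the mapping is 1-1 although it is not onto the 3-dimensional space F.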
5.2.2 Linear Mappings and Matrices
We will now investigate how the components of a vector can be used to express the components of its image under a linear mapping. Suppose therefore that $\{e_1, \ldots, e_n\}$ is a basis of E and that $\{f_1, \ldots, f_m\}$ is a basis of F. Suppose further that f is a linear mapping of E into F which, in accordance with Theorem 5.1;3, is given by
\[ f(e_k) = \sum_{i=1}^{m} \alpha_{ik} f_i \qquad (k = 1, 2, \ldots, n). \qquad (1) \]
Then the vector $x \in E$, $x = \sum_{k=1}^{n} \xi_k e_k$, is mapped onto
\[ f(x) = \sum_{k=1}^{n} \xi_k f(e_k) = \sum_{i=1}^{m} \Bigl[ \sum_{k=1}^{n} \alpha_{ik} \xi_k \Bigr] f_i. \]
The components $\eta_i$ of f(x) are therefore
\[ \eta_i = \sum_{k=1}^{n} \alpha_{ik} \xi_k \qquad (i = 1, \ldots, m). \qquad (2) \]
Thus we have the following result.
Theorem 5. Under a linear mapping, the components of the image of a vector x are linear functionals in the components of x (cf. 6.1;(1)).
We note that the coefficients $\alpha_{1k}, \ldots, \alpha_{mk}$ are the components of $f(e_k)$ (k = 1, ..., n).
For given bases, the linear mapping f is uniquely determined by the coefficients $\alpha_{ik}$ which, as in an exchange tableau (cf. 3.3), we usually arrange in the form of a rectangular array.
\[ A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn} \end{pmatrix} = (\alpha_{ik}) \]
We refer to a rectangular array of numbers like this as a matrix. The matrix A has m rows and n columns. The mn scalars $\alpha_{ik}$ are known as the elements of the matrix. The matrix A represents the linear mapping f with respect to the two given bases of E and F, and we will say that A is the matrix of f with respect to these bases. Referring to a matrix which has m rows and n columns as an m x n matrix, we can easily verify the following theorem.
Theorem 6. Let E and F be vector spaces of dimensions n and m in which fixed bases are given. Then each linear mapping f of E into F corresponds to an m x n matrix $A = (\alpha_{ik})$ such that f is given by (2). Conversely (2) represents a linear mapping of E into F for any m x n matrix $A = (\alpha_{ik})$. The correspondence between the linear mappings of E into F and the m x n matrices is 1-1.
Now, suppose that F is mapped by a further linear mapping g into a finite-dimensional vector space G. Let the relation corresponding to (2) for g be
\[ \zeta_h = \sum_{i=1}^{m} \beta_{hi} \eta_i \qquad (h = 1, \ldots, p) \qquad (3) \]
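where p is the dimension of G and $\zeta_1, \ldots, \zeta_p$ are components with respect to a given basis of G. Thus the mapping g corresponds to the matrix
\[ B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1m} \\ \vdots & & \vdots \\ \beta_{p1} & \cdots & \beta_{pm} \end{pmatrix} = (\beta_{hi}). \]
From (2) and (3), it follows by substitution that
\[ \zeta_h = \sum_{k=1}^{n} \Bigl[ \sum_{i=1}^{m} \beta_{hi} \alpha_{ik} \Bigr] \xi_k = \sum_{k=1}^{n} \gamma_{hk} \xi_k \qquad (h = 1, \ldots, p), \qquad (4) \]
where
\[ \gamma_{hk} = \sum_{i=1}^{m} \beta_{hi} \alpha_{ik} \qquad (h = 1, \ldots, p;\ k = 1, \ldots, n). \qquad (5) \]
Thus the coefficients $\gamma_{hk}$ of the product mapping gf can be calculated from the coefficients $\alpha_{ik}$ of f and $\beta_{hi}$ of g. The mapping gf is represented by the matrix
\[ C = \begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1n} \\ \vdots & & \vdots \\ \gamma_{p1} & \cdots & \gamma_{pn} \end{pmatrix} = (\gamma_{hk}). \]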
Definition 2. The matrix $C = (\gamma_{hk})$, where $\gamma_{hk}$ is given by (5), is called the 'product' C = BA of the matrices $A = (\alpha_{ik})$ and $B = (\beta_{hi})$.
Thus the element $\gamma_{hk}$ of the product matrix C = BA is obtained when the hth row of B is 'multiplied' into the kth column of A. We 'multiply' a row into a column containing the same number of elements by taking the sum of the products of the elements of the row with the corresponding elements of the column.
Example 1. On multiplying the row (1, 2, 3) into the column
\[ \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix} \]
we obtain the number 1.(-1) + 2.2 + 3.1 = 6.
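The rule of Definition 2 translates directly into a few lines of code. The following sketch, in which the names `row_into_column` and `matmul` are our own, multiplies a row into a column as in Example 1 and builds the product C = BA element by element from formula (5). The two 2 x 2 matrices used at the end are those of Example 3.

```python
def row_into_column(row, column):
    """Sum of the products of corresponding elements of a row and a column."""
    if len(row) != len(column):
        raise ValueError("row and column must contain the same number of elements")
    return sum(r * c for r, c in zip(row, column))


def matmul(B, A):
    """Product C = BA: element (h, k) is row h of B multiplied into column k of A."""
    columns_of_A = list(zip(*A))
    return [[row_into_column(row, col) for col in columns_of_A] for row in B]


print(row_into_column((1, 2, 3), (-1, 2, 1)))  # Example 1

A = [[1, 2], [0, 1]]
B = [[2, 1], [-1, 0]]
print(matmul(A, B))  # AB
print(matmul(B, A))  # BA
```

The first argument of `matmul` is the left-hand factor, so `matmul(B, A)` corresponds to applying the mapping with matrix A first and the mapping with matrix B second.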
Hence the matrix product C = BA only has a meaning when the row length (the number of elements in a row) of B is equal to the column length of A. The column length (or the number of rows) of the product matrix is then equal to that of B and its row length is equal to that of A.
Example 2. Let
\[ B = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix} \quad \text{and} \quad A = \begin{pmatrix} 1 & 2 & -1 & 2 \\ 0 & 1 & 2 & 1 \\ 3 & 1 & 1 & 0 \end{pmatrix}. \]
Then
\[ BA = \begin{pmatrix} 4 & 5 & 4 & 4 \\ 8 & 7 & 2^{*} & 5 \end{pmatrix}, \]
where, for example, the element marked with an asterisk is obtained by multiplying the second row of B into the third column of A:
2.(-1) + 1.2 + 2.1 = 2.
Multiplication of matrices is not commutative, i.e., the equation AB = BA is not true in general, even when both products have a meaning.
Example 3. If
\[ A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ -1 & 0 \end{pmatrix}, \]
then
\[ AB = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \quad \text{and} \quad BA = \begin{pmatrix} 2 & 5 \\ -1 & -2 \end{pmatrix}, \]
hence $AB \neq BA$.
In view of Definition 2, the statements of formulae (4) and (5) can be combined into the following theorem.
Theorem 7. The matrix of the product of two linear mappings is the product of their matrices (in the same order).
Naturally it is assumed here that the same basis of the intermediate space F is used for both mappings.
Theorem 8. Multiplication of matrices is associative.
Proof. Let $A = (\alpha_{hi})$, $B = (\beta_{ik})$, $C = (\gamma_{kl})$ where the row length of A is equal to the column length of B and similarly for B and C. Then
\[ [(AB)C]_{hl} = \sum_{k} \Bigl( \sum_{i} \alpha_{hi} \beta_{ik} \Bigr) \gamma_{kl} = \sum_{i,k} \alpha_{hi} \beta_{ik} \gamma_{kl} \]
and the same is true for $[A(BC)]_{hl}$. (For simplicity, we have denoted the element of (AB)C with indices h and l by $[(AB)C]_{hl}$.) The matrices (AB)C and A(BC) therefore have the same elements and hence they are equal.
The formulae (2), (3) and (4) can also be expressed more clearly in terms of matrices. If we denote the matrix which consists of the one column
[...] by $\xi$ and similarly the matrix [...]

[...] is not a permutation are zero. In the case of a permutation, if we put $k_i = \phi(i)$, the last expression becomes [...]

[...] $x \to f(x)$ of the vector space E
into the vector space S. These mappings are just those real- or complex-valued functions f on E for which
\[ f(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 f(x_1) + \alpha_2 f(x_2) \]
for all $x_1, x_2 \in E$ and all scalars $\alpha_1$, $\alpha_2$.
Definition 1. A 'linear functional' on a vector space E is a linear mapping of E into the 1-dimensional vector space of its scalars.
The linear functional f, which is given by f(x) = 0 for all $x \in E$, will be denoted by f = 0 and hence the statement $f \neq 0$ will mean that there is a vector $x \in E$ such that $f(x) \neq 0$. If $f \neq 0$, then f maps E onto S.
If E is finite-dimensional, then, in the case of a linear functional f, the representation 5.2;(2) of f becomes
\[ f(x) = \sum_{k=1}^{n} \alpha_k \xi_k. \qquad (1) \]
It will be seen that these are the functions which were introduced in Example 2.1;9.
Theorem 1. Let E be a vector space and let $x_0$ be a non-zero vector in E. Then there is a linear functional f on E such that $f(x_0) \neq 0$.
Proof. Since $x_0 \neq 0$, there is a basis B of E which contains $x_0$. Hence the theorem follows from Theorem 5.1;3.
Theorem 2. Let L be a subspace of E and $x_0 \in E \setminus L$. Then there is a linear functional f on E such that f(x) = 0 for all $x \in L$ and $f(x_0) \neq 0$.
CH. 6: LINEAR FUNCTIONALS
Proof. The canonical image $\bar{x}_0$ of $x_0$ in the quotient space E/L is not the zero-element (cf. Example 5.1;1). Hence, by Theorem 1, there is a linear functional $\bar{f}$ on E/L for which $\bar{f}(\bar{x}_0) \neq 0$. We define a mapping f of E into S by $f(x) = \bar{f}(\bar{x})$ for each $x \in E$, where $\bar{x}$ is the canonical image of x. Since $x \to \bar{x}$ is a linear mapping of E onto E/L (Problem 5.1;2), it follows that f is a linear functional on E (Theorem 5.1;6). It is easy to verify that f satisfies the requirements of the theorem.
6.1.2 Hyperplanes
Definition 2. Let H be a coset in a vector space E and let L be the subspace corresponding to H. Then H is said to be a 'hyperplane' of E, if the quotient space E/L has dimension 1.
Thus a coset is a hyperplane if and only if the corresponding subspace is a hyperplane. If dim E = n, then a subspace L of E is a hyperplane if and only if dim L = n - 1 (Theorem 3.2;5).
Theorem 3. A coset $H \subseteq E$ is a hyperplane if and only if E is the only coset which contains H as a proper subset.
Proof. 1. We will first deal with the case when H is a subspace.
1.1. Suppose the subspace L is a hyperplane. Then dim E/L = 1 and hence the subspace {0} = L/L of E/L is properly contained in only one subspace, viz. E/L. Hence, by Theorem 2.5;7, E is the only subspace which properly contains L.
1.2. Suppose that the subspace L of E is not properly contained in any subspace except E itself. Then the subspace {0} = L/L of E/L is not properly contained in any subspace except E/L. Hence dim E/L = 1 and L is a hyperplane of E.
2. We now prove the theorem for arbitrary hyperplanes.
2.1. Suppose H is a hyperplane of E. Then so is the subspace L corresponding to H so that E is the only subspace which properly contains L. Hence E is the only coset which contains H as a proper subset (cf. Problem 2.5;5).
2.2. Suppose H is a coset of E and E is the only coset which contains H as a proper subset. Then E is also the only subspace which properly contains the subspace L corresponding to H. Hence L is a hyperplane and so also is H.
Theorem 4. Let $f \neq 0$ be a linear functional on the vector space E. Then $L = f^{-1}(0) = \{x;\ x \in E,\ f(x) = 0\}$ is a subspace and a hyperplane of E. (By Definition 5.1;2, L is the kernel of the linear functional f.)
6.1: LINEAR FUNCTIONALS AND COSETS
Proof. 1. By Theorem 5.1;7, L is a subspace.
2. By Theorem 5.1;9, the vector spaces E/L and f(E) = S are isomorphic. By Theorem 5.2;1, dim E/L = dim S = 1 and therefore L is a hyperplane.
Theorem 5. Let $f \neq 0$ be a linear functional on the vector space E and let $\alpha$ be a scalar. Then $H = f^{-1}(\alpha) = \{x;\ x \in E,\ f(x) = \alpha\}$ is a hyperplane.
Proof. Since $f \neq 0$, H is not empty. Suppose therefore that $x_0 \in H$. Then $L = H - x_0 = f^{-1}(0)$ is a subspace and H is a coset of L. By Theorem 4, L is a hyperplane and therefore H is also a hyperplane (cf. Problem 3).
Theorem 6. Given any hyperplane H of E, then there exists a linear functional $f \neq 0$ on E and a scalar $\alpha$ such that $H = f^{-1}(\alpha)$. f is uniquely determined by H except for a non-zero scalar factor.
Proof. 1. Let L be the subspace corresponding to H. By Theorems 2 and 3, there is a linear functional $f \neq 0$ on E such that f(x) = 0 for all $x \in L$. By Theorem 4, $L^* = f^{-1}(0)$ is a hyperplane and $L^* \supseteq L$. Since L is a hyperplane itself, it follows that $L^* = L$. Now, $H = f^{-1}(\alpha)$, where $\alpha = f(x_0)$, and $x_0$ is any vector in H.
2. If $H = f^{-1}(\alpha) = g^{-1}(\beta)$, then $L = f^{-1}(0) = g^{-1}(0)$ is the subspace corresponding to H. If $z_0 \in E \setminus L$, then every $x \in E$ can be written uniquely in the form $x = y + \lambda z_0$ where $y \in L$ (cf. Problem 4). Hence $f(x) = \lambda f(z_0)$ and $g(x) = \lambda g(z_0)$, i.e., $f = \frac{f(z_0)}{g(z_0)}\, g$.
In the sense of analytic geometry, Theorem 6 states that every hyperplane has an equation of the form $f(x) = \alpha$ where $f \neq 0$. Theorem 5 states that every equation of this form is the equation of a hyperplane.
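The decomposition $x = y + \lambda z_0$ used in the proof of Theorem 6 can be made concrete. The sketch below assumes a real vector space with a fixed basis, so that a linear functional is given by coefficients $\alpha_k$ as in formula (1); the helper names `functional` and `decompose` are our own. It computes $\lambda = f(x)/f(z_0)$ and $y = x - \lambda z_0$, and illustrates that two functionals with the same kernel differ only by a scalar factor.

```python
from fractions import Fraction


def functional(alpha):
    """Linear functional f(x) = sum_k alpha_k * xi_k for fixed coefficients alpha."""
    def f(x):
        return sum(a * xi for a, xi in zip(alpha, x))
    return f


def decompose(f, z0, x):
    """Write x = y + lam * z0 with f(y) = 0; requires f(z0) != 0."""
    fz0 = f(z0)
    if fz0 == 0:
        raise ValueError("z0 must lie outside the kernel of f")
    lam = Fraction(f(x)) / fz0
    y = tuple(xi - lam * zi for xi, zi in zip(x, z0))
    return y, lam


f = functional((1, 2, 3))
g = functional((2, 4, 6))  # same kernel as f
z0 = (1, 0, 0)
x = (2, 1, 1)

y, lam = decompose(f, z0, x)
print(y, lam, f(y))
# f = (f(z0) / g(z0)) * g, checked here at the vector x
print(f(x) * g(z0) == g(x) * f(z0))
```

The vector y returned by `decompose` lies in the hyperplane $f^{-1}(0)$, and x itself lies in the hyperplane $f^{-1}(\alpha)$ with $\alpha = f(x)$.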
6.1.3 Systems of Linear Equations and Cosets
Theorem 7. Every coset M of E is the intersection of the family J of all those hyperplanes H which contain M.
Proof. 1. Clearly $M \subseteq \bigcap(J)$ (Theorem 1.2;3).
2. Suppose $y_0 \notin M$ (if M = E, the theorem is trivially true) and $x_0 \in M$. If L is the subspace corresponding to M, then $y_0 - x_0 \notin L$. By Theorem 2, there is a linear functional f on E such that f(x) = 0 for all $x \in L$ and $f(y_0 - x_0) \neq 0$. We put $f(x_0) = \alpha$ and consider the hyperplane $H = f^{-1}(\alpha)$. For each $x \in M$, $f(x) = f(x_0) = \alpha$ and hence $H \supseteq M$, i.e., $H \in J$. Since $f(y_0) \neq f(x_0)$, $y_0 \notin H$ and therefore also $y_0 \notin \bigcap(J)$. Hence $\bigcap(J) \subseteq M$.
If f is a linear functional on the vector space E and $\alpha$ is a scalar, then the equation $f(x) = \alpha$ is known as a linear equation. A vector $x_0 \in E$ is a solution of this equation, if $f(x_0) = \alpha$. By Theorem 6, every hyperplane is the set of solutions of a linear equation. Accordingly, Theorem 7 can also be stated as follows.
Theorem 8. Every coset is the set of solutions of a system of linear equations. (viz. the system consisting of the equations of the hyperplanes in the family J. If M = E, then $J = \emptyset$. The theorem is still true in this case because E is the set of solutions of the equation 0(x) = 0.)
By Theorems 5 and 2.5;9, the following converse of Theorem 8 is also true.
Theorem 9. The set of solutions of a system of linear equations is either empty or it is a coset.
It is in fact possible for the set of solutions to be empty, for example when there is an equation $f(x) = \alpha$ in which f is the zero functional and $\alpha \neq 0$.
Example 1. In the vector space $G_3$ (Example 2.1;1), the set of solutions of the linear equation (with respect to some basis)
\[ f(x) = \alpha_1 \xi_1 + \alpha_2 \xi_2 + \alpha_3 \xi_3 = \beta \qquad (f \neq 0) \]
is a plane, i.e., when the solution vectors are drawn from a fixed point, the set of their end points is a plane. A system of two linear equations has as its solution set either a straight line or a plane or it is empty. For three equations, the solution set can either be empty or a single vector or a straight line or a plane. If we allow f = 0, the solution set can also be the whole space, viz. when the system has only the one equation 0(x) = 0.
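The cases listed in Example 1 can be reproduced with a small program. The sketch below is our own illustration and not a method taken from the text: it applies Gaussian elimination with exact rational arithmetic to the system and returns `None` when the solution set is empty, and otherwise the dimension of the coset of solutions (0 for a single vector, 1 for a straight line, 2 for a plane, 3 for the whole space). The name `solution_set_dimension` is hypothetical.

```python
from fractions import Fraction


def solution_set_dimension(coeffs, rhs):
    """Dimension of the solution coset of the system, or None if it is empty.

    coeffs[j] holds the coefficients of the j-th equation, rhs[j] its
    right-hand side.
    """
    rows = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(coeffs, rhs)]
    n = len(coeffs[0])  # number of unknowns
    r = 0               # number of pivots found so far
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    # Rows below the pivots have only zero coefficients; such a row with a
    # non-zero right-hand side reads 0 = beta with beta != 0.
    if any(row[n] != 0 for row in rows[r:]):
        return None
    return n - r


print(solution_set_dimension([[1, 1, 1]], [1]))                        # plane
print(solution_set_dimension([[1, 1, 1], [1, -1, 0]], [1, 0]))         # straight line
print(solution_set_dimension([[1, 1, 1], [1, 1, 1]], [1, 2]))          # empty
print(solution_set_dimension([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [1, 2, 3]))  # single vector
```

The returned number is the dimension of the subspace corresponding to the coset of solutions, in agreement with Theorem 9.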
Problems
1. Let a linear functional f be given in the form (1). What happens to the coefficients $\alpha_k$ under the change of basis E=T1fl
2. Show that, if $k \in C$ (Example 2.1;7), then $f(x) = \int_{-1}^{+1} k(\tau) x(\tau)\, d\tau$ is a linear functional on C.
3. Show that, if f is a linear functional on a vector space, $\alpha$ is a scalar and $f^{-1}(\alpha) = H$, then $L = f^{-1}(0)$ is the subspace corresponding to the coset H.
4. Suppose f is a linear functional on the vector space E, $L = f^{-1}(0)$ and $z_0 \in E \setminus L$. Show that every vector $x \in E$ can be written in exactly one way in the form $x = y + \lambda z_0$ where $y \in L$.
6.2: DUALITY IN FINITE-DIMENSIONAL SPACES
5. Suppose $E_1$ and $E_2$ are vector spaces and $L_1$ is a hyperplane of $E_1$. Prove that the set of all pairs $(x_1, x_2)$, where $x_1 \in L_1$ and $x_2 \in E_2$, is a hyperplane in the direct product of $E_1$ and $E_2$ (Problem 2.1;4).
6. Prove that a subspace $L$ of the $n$-dimensional vector space $E$ is a hyperplane if and only if $\dim L = n - 1$.
7. Let $k(\sigma, \tau)$ be a continuous scalar valued function of the real variables $\sigma$, $\tau$ in the interval $-1 \le \sigma, \tau \le +1$. Show that, if $g \in C$, then the set of all $x \in C$ which satisfy the first order linear integral equation
$\int_{-1}^{+1} k(\sigma,\tau)\,x$ …
… $x \to f(x, y_0)$ is a linear functional on $E$, and 2. for each $x_0 \in E$, the mapping $y \to f(x_0, y)$ is a linear functional on $F$.
Choosing bases of $E$ and $F$ and denoting the corresponding vector components by $\xi_i$ and $\eta_k$ respectively, we can easily verify that
$$f(x,y) = \sum_{i,k} \alpha_{ik}\,\xi_i\,\eta_k \qquad (2)$$
is a bilinear form on $(E,F)$ for all $\alpha_{ik}$. On the other hand, every bilinear form $f$ on $(E,F)$ can be represented in the form (2), because, from Definition 2 and
$$x = \sum_i \xi_i e_i, \qquad y = \sum_k \eta_k f_k,$$
it follows that (2) is satisfied by $\alpha_{ik} = f(e_i, f_k)$.
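The correspondence between a bilinear form and its coefficients $\alpha_{ik} = f(e_i, f_k)$ can be checked numerically. The following is a minimal sketch in Python; the function names `bilinear` and `coefficients` and the sample coefficients are assumptions made for illustration and do not come from the text.

```python
def bilinear(alpha, xi, eta):
    """Evaluate f(x, y) = sum over i, k of alpha[i][k] * xi[i] * eta[k], as in (2)."""
    return sum(alpha[i][k] * xi[i] * eta[k]
               for i in range(len(xi)) for k in range(len(eta)))


def coefficients(f, dim_e, dim_f):
    """Recover alpha[i][k] = f(e_i, f_k) by evaluating f on the basis vectors."""
    def unit(n, j):
        return [1 if t == j else 0 for t in range(n)]
    return [[f(unit(dim_e, i), unit(dim_f, k)) for k in range(dim_f)]
            for i in range(dim_e)]


# Hypothetical coefficients for dim E = 2 and dim F = 3.
alpha = [[1, 0, 2],
         [0, -1, 3]]


def f(xi, eta):
    return bilinear(alpha, xi, eta)


recovered = coefficients(f, 2, 3)   # evaluating on basis vectors gives alpha back
value = f([1, 2], [3, 4, 5])        # 1*1*3 + 2*1*5 + (-1)*2*4 + 3*2*5
```

Evaluating the form on pairs of basis vectors is all that is needed to recover the coefficient matrix, which is the content of the last sentence above.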
6.2.3
Dual Pairs of Spaces
Definition 3. A pair $(E,F)$ of finite-dimensional vector spaces with a bilinear form $f(x,y)$ is said to be a 'dual pair of spaces' if
1. from $f(x, y_0) = 0$ for all $x \in E$, it follows that $y_0 = 0$ and
2. from $f(x_0, y) = 0$ for all $y \in F$, it follows that $x_0 = 0$.
The form $f(x,y)$ is known as the scalar product of the dual pair of spaces and in the following it will be denoted by $\langle x, y\rangle$.
Example 1. Suppose that $E$ and $F$ are 2-dimensional and $f(x,y) = \xi_1\eta_2$. Then this is not a dual pair of spaces, because, for the vector $x_0 \in E$ with $\xi_1 = 0$, $\xi_2 = 1$, $f(x_0, y) = 0$ for all $y \in F$. But $x_0 \neq 0$. However if we put $f(x,y) = \xi_1\eta_1 + \xi_2\eta_2$, then $(E,F)$ becomes a dual pair of spaces, because, if $f(x,y) = 0$ for all $y \in F$, then, with $\eta_1 = 1$, $\eta_2 = 0$, we have $\xi_1 = 0$ and, with $\eta_1 = 0$, $\eta_2 = 1$, we have $\xi_2 = 0$, i.e., $x = 0$. The same argument is still valid when the roles of $x$ and $y$ are interchanged.
Example 2. Let $E$ and $F$ be two vector spaces of the same dimension $n$ and let $\xi_i$ and $\eta_k$ be the vector components with respect to given bases of $E$ and $F$. Then $(E,F)$ is a dual pair of spaces with the scalar product $\langle x, y\rangle = \sum_{k=1}^{n} \xi_k\eta_k$.
Example 3. A vector space $E$ and its dual space $E^*$ is a dual pair of spaces when, for $x \in E$ and $f \in E^*$, the scalar product is defined by $\langle x, f\rangle = f(x)$. Because $f(x) = 0$ for all $x \in E$ is just the definition of $f = 0$ and, if $f(x) = 0$ for all $f \in E^*$, then, from Theorem 6.1;1, it follows that $x = 0$. Finally, it is easy to verify that $\langle x, f\rangle$ is a bilinear form.
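The failure of condition 2 in Example 1 can be replayed numerically. The sketch below, in Python, tests whether a given $x_0$ satisfies $f(x_0, y) = 0$ for all $y$; by linearity it is enough to test the basis vectors of $F$. The helper name `annihilates_all` and the encoding of a form by its coefficient matrix are assumptions made for illustration, and the first form is taken here to be $\xi_1\eta_2$.

```python
def annihilates_all(alpha, xi0):
    """True if f(x0, y) = 0 for every y, where f has coefficients alpha[i][k].

    By linearity in y it suffices to test the basis vectors of F, i.e. to
    check that every component of the row vector xi0' * alpha vanishes.
    """
    cols = len(alpha[0])
    return all(sum(xi0[i] * alpha[i][k] for i in range(len(xi0))) == 0
               for k in range(cols))


first_form = [[0, 1],     # f(x, y) = xi_1 * eta_2
              [0, 0]]
second_form = [[1, 0],    # f(x, y) = xi_1 * eta_1 + xi_2 * eta_2
               [0, 1]]

x0 = [0, 1]               # xi_1 = 0, xi_2 = 1, so x0 is not the zero vector

first_violates = annihilates_all(first_form, x0)     # condition 2 fails here
second_violates = annihilates_all(second_form, x0)   # no violation for this x0
```

Only one candidate vector is tested, so the second result illustrates the example rather than proving that the second form gives a dual pair.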
Now suppose that $(E,F)$ is a dual pair of spaces. Corresponding to each $x \in E$ we have the linear functional $f_x$ on $F$ which is given by $f_x(y) = \langle x, y\rangle$. In this way we define a mapping $g$ of $E$ into the dual space $F^*$ of $F$ and we assert that $g$ is both linear and 1-1. It is linear because
$$f_{\alpha_1 x_1 + \alpha_2 x_2}(y) = \langle \alpha_1 x_1 + \alpha_2 x_2, y\rangle = \alpha_1\langle x_1, y\rangle + \alpha_2\langle x_2, y\rangle = \alpha_1 f_{x_1}(y) + \alpha_2 f_{x_2}(y)$$
for all $y \in F$, and it is 1-1 because $f_{x_1} = f_{x_2}$ means $\langle x_1, y\rangle$ …
… $= 0$ for $i = 2, \ldots, n$. We choose $f_2, \ldots, f_n$ analogously. If $\sum_{k=1}^{n} \lambda_k f_k = 0$, then
$$\sum_{k=1}^{n} \lambda_k \langle e_i, f_k\rangle = \sum_{k=1}^{n} \lambda_k \delta_{ik} = \lambda_i = 0 \quad \text{for } i = 1, \ldots, n.$$
Thus $f_1, \ldots, f_n$ are linearly independent and hence they are a basis of $F$.
If the scalar product of a dual pair of spaces is referred to dual bases, then, from (2) and (4), it follows that
$$\langle x, y\rangle = \sum_{k=1}^{n} \xi_k\eta_k = \xi'\eta. \qquad (5)$$
Conversely, if the scalar product is given by (5), then the bases which are involved are dual.
Now suppose that, with the notation of 6.2.5, $f$ and $g$ are dual mappings. Suppose that with respect to given bases of $E_1$ and $E_2$, $f$ is given by $\xi_2 = A\xi_1$. (Here for example $\xi_1$ is the matrix which has just one column consisting of the components $\xi_{11}, \ldots, \xi_{1n}$ of $x_1 \in E_1$.) Suppose further that, with respect to the dual bases, $g$ is given by $\eta_1 = B\eta_2$. In view of (5), the duality condition (3) now reads
$$\xi_1' A' \eta_2 = \xi_1' B \eta_2.$$
Since this is valid for all $x_1 \in E_1$ and all $y_2 \in F_2$, i.e., for all values of $\xi_1$ and $\eta_2$, it follows that $B = A'$.
Theorem 9. Dual mappings are represented by transposed matrices with respect to dual bases. Conversely, transposed matrices represent dual mappings when they are referred to dual bases in dual pairs of spaces (which is always possible).
Proof. It only remains to prove the second part. If $A$ is an $m \times n$ matrix, we start with two vector spaces $E_1$ and $E_2$ of dimensions $n$ and $m$. Then, with respect to any bases of $E_1$ and $E_2$, $A$ represents a linear mapping $f$ of $E_1$ into $E_2$. Now suppose that $F_1$ and $F_2$ form dual pairs of spaces with $E_1$ and $E_2$ (e.g., we could put $F_i = E_i^*$). If we now refer $A'$ to the bases of $F_1$ and $F_2$ which are dual to the chosen bases of $E_1$ and $E_2$, then $A'$ represents the dual mapping $g$ of $f$. Because by (5), for $x_1 \in E_1$ and $y_2 \in F_2$,
$$\langle f(x_1), y_2\rangle = \xi_1' A' \eta_2 = \langle x_1, g(y_2)\rangle.$$
In view of Theorem 5.2;12, it now follows from Theorem 8 that transposed matrices have the same column rank. If we now define the row rank of a matrix in analogy with the column rank (cf. 5.2.4) to be the maximum number of linearly independent rows, then obviously the column rank of $A'$ is equal to the row rank of $A$. The column rank of a matrix is therefore always equal to the row rank so that we usually refer simply to the rank of a matrix, i.e.,
Theorem 10. The row and column ranks of a matrix are equal, so that both can be referred to simply as the ‘rank’ of the matrix.
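Both statements, that the transposed matrix satisfies the duality condition and that a matrix and its transpose have the same rank, can be checked on a small example. The following Python sketch uses exact rational arithmetic; the helper names and the sample vectors are assumptions for illustration, and the rank is computed by ordinary Gaussian elimination rather than by the exchange method of the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    for col in range(len(rows[0]) if rows else 0):
        # Look for a row at or below position `found` with a non-zero entry.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def scalar_product(xi, eta):
    """<x, y> = sum of xi_k * eta_k, the form (5) for dual bases."""
    return sum(a * b for a, b in zip(xi, eta))


def apply(matrix, vector):
    return [scalar_product(row, vector) for row in matrix]


A = [[1, 2, 1, 2],
     [2, 1, 2, 1],
     [1, 1, 1, 1]]
xi1 = [1, -2, 0, 3]     # arbitrary components of x1
eta2 = [2, 5, -1]       # arbitrary components of y2

lhs = scalar_product(apply(A, xi1), eta2)               # <f(x1), y2>
rhs = scalar_product(xi1, apply(transpose(A), eta2))    # <x1, g(y2)> with B = A'
same_rank = rank(A) == rank(transpose(A))
```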
CH. 6: LINEAR FUNCTIONALS
Theorem 11. If the matrix $A$ has rank $r$, then it contains a determinant of dimension $r$ whose value is not zero and every determinant of dimension greater than $r$ which is contained in $A$ has the value zero.
('Determinants which are contained in $A$' are obtained by choosing a number of the rows of $A$ and the same number of columns and taking those elements of $A$ which belong both to one of the chosen rows and to one of the chosen columns.)
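Theorem 11 can be illustrated by brute force on a small matrix: search for the largest determinant contained in $A$ whose value is not zero. The following Python sketch does this with a Laplace expansion; the function names are assumptions for illustration, and the method is only sensible for small matrices.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def rank_by_minors(a):
    """Largest size of a non-vanishing determinant contained in a."""
    n_rows, n_cols = len(a), len(a[0])
    for size in range(min(n_rows, n_cols), 0, -1):
        for rs in combinations(range(n_rows), size):
            for cs in combinations(range(n_cols), size):
                sub = [[a[i][k] for k in cs] for i in rs]
                if det(sub) != 0:
                    return size
    return 0


A = [[1, 2, 1, 2],
     [2, 1, 2, 1],
     [1, 1, 1, 1]]
r = rank_by_minors(A)
```

Here every determinant of dimension 3 contained in `A` vanishes, because the third row is a multiple of the sum of the first two, while the determinant formed from the first two rows and columns does not.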
Proof. 1. If $A$ has rank $r$, then it has $r$ linearly independent rows. These form a submatrix of $A$ which also has rank $r$, and therefore has $r$ linearly independent columns. The determinant formed from these $r$ partial columns of $A$ has a value not equal to zero.
2. If a determinant which is contained in $A$ has dimension greater than $r$, then its rows are linearly dependent, because the corresponding rows of $A$ are linearly dependent. Hence the determinant has the value zero (Theorem 4.2;7).
6.2.7
Numerical Calculation of the Rank of a Matrix
Let $A = (\alpha_{ik})$ be an $m \times n$ matrix whose rank is to be calculated. We start with a vector space $E$ of dimension $n$ and choose a basis $\{e_1, \ldots, e_n\}$ of $E$. If we put
$$f = Ae, \quad \text{i.e.,} \quad f_i = \sum_{k=1}^{n} \alpha_{ik} e_k \quad (i = 1, \ldots, m) \qquad (6)$$
then, by Theorem 3.1;4, the row rank (and hence the rank) of $A$ is equal to the maximum number of linearly independent vectors in the set $\{f_1, \ldots, f_m\}$ and hence to the dimension of $L(f_1, \ldots, f_m)$. We start now with the tableau which expresses the relations (6) in normal interpretation. This is the same tableau as 5.6;(1). Now we exchange vectors $f_i$ with vectors $e_k$ until it is no longer possible to do so, i.e., until there are no more suitable pivots available.
Note that it does not matter which pair of vectors we exchange first and which we exchange second, etc., because the only restriction lies in the availability of pivots. With a possible re-ordering of the rows and of the columns, the last tableau will have the form (see 5.6;(3) for the notation)
                f_1     e_2
    e_1  =       B       C
    f_2  =       D       0
where the bottom right-hand corner contains only zeros (otherwise it would be possible to make a further exchange). The vectors in the group $f_1$ are linearly independent because otherwise it would not have been possible to exchange them. It also follows from the tableau that each of the vectors in the group $f_2$ is a linear combination of those in the group $f_1$. Thus the dimension of $L(f_1, \ldots, f_m)$ and hence the rank of $A$ is equal to the number of vectors $f_i$ which have been exchanged.
In a practical calculation, it is clearly not necessary to carry each pivotal row and column into the rest of the subsequent calculation. In the next Example, we will show the full calculation on the left and the calculation reduced to its essentials on the right.
Example 5. To find the rank of the matrix
A =
     1   2   1   2
     2   1   2   1
     1   1   1   1

Full calculation

1st Tableau
            e1    e2    e3    e4
    f1 =    1*    2     1     2
    f2 =    2     1     2     1
    f3 =    1     1     1     1
            *    -2    -1    -2

2nd Tableau
            f1    e2    e3    e4
    e1 =    1    -2    -1    -2
    f2 =    2    -3     0    -3
    f3 =    1    -1*    0    -1
            1     *     0    -1

3rd Tableau
            f1    f3    e3    e4
    e1 =   -1     2    -1     0
    f2 =   -1     3     0     0
    e2 =    1    -1     0    -1

Reduced calculation (pivotal rows and columns are not carried forward)

2nd Tableau
            e2    e3    e4
    f2 =   -3     0    -3
    f3 =   -1*    0    -1
            *     0    -1

3rd Tableau
            e3    e4
    f2 =    0     0
The vector $f_2$ cannot be exchanged either with $e_3$ or $e_4$. The number of possible exchange steps is 2 and therefore the rank of $A$ is equal to 2.
Exercises
1. Find the ranks of the following two matrices.
A =
     1   2   3   4
    -1  -3   1   5
    -1  -4   5  14
B =
     1  -1   2
    -1   3   2
    -2   2  -4
    -3   5  -2
    -1  -1  -6
Solution. Rank A = rank B = 2.
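The exchange procedure of 6.2.7 can be sketched in code and checked against Example 5 and the solution above. The Python sketch below counts how many exchange steps are possible; the function names are assumptions, the pivot is simply the first non-zero entry found (the text notes that the order does not matter), and the exchange rule used is the usual one for a tableau $y_i = \sum_k a_{ik} x_k$, stated here as an assumption because section 5.6 is not reproduced in this part of the text.

```python
from fractions import Fraction


def exchange_step(t, r, s):
    """Exchange the row variable r with the column variable s (pivot t[r][s] != 0)."""
    p = t[r][s]
    new = [[None] * len(t[0]) for _ in t]
    for i in range(len(t)):
        for k in range(len(t[0])):
            if i == r and k == s:
                new[i][k] = 1 / p                        # pivot
            elif i == r:
                new[i][k] = -t[r][k] / p                 # rest of the pivotal row
            elif k == s:
                new[i][k] = t[i][s] / p                  # rest of the pivotal column
            else:
                new[i][k] = t[i][k] - t[i][s] * t[r][k] / p
    return new


def rank_by_exchange(a):
    """Number of exchange steps that can be carried out, which is the rank of a."""
    t = [[Fraction(v) for v in row] for row in a]
    free_rows = set(range(len(t)))        # rows still labelled by some f_i
    free_cols = set(range(len(t[0])))     # columns still labelled by some e_k
    steps = 0
    while True:
        pivot = next(((i, k) for i in sorted(free_rows) for k in sorted(free_cols)
                      if t[i][k] != 0), None)
        if pivot is None:
            return steps
        r, s = pivot
        t = exchange_step(t, r, s)
        free_rows.discard(r)
        free_cols.discard(s)
        steps += 1


example_5 = [[1, 2, 1, 2], [2, 1, 2, 1], [1, 1, 1, 1]]
exercise_a = [[1, 2, 3, 4], [-1, -3, 1, 5], [-1, -4, 5, 14]]
exercise_b = [[1, -1, 2], [-1, 3, 2], [-2, 2, -4], [-3, 5, -2], [-1, -1, -6]]
ranks = [rank_by_exchange(m) for m in (example_5, exercise_a, exercise_b)]
```

Unlike the reduced calculation of Example 5, this sketch carries the full tableau through every step, which is wasteful but keeps the rule easy to read.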
2. Find the dimension of the subspace of $R_5$ which is spanned by the following four vectors.
$x_1 = (2, -1, 3, 5, -2)$, $x_2 = (3, -2, 5, -1, 3)$,
$x_3 = (5, -3, 8, 4, 1)$, $x_4 = (1, 0, 1, 11, -7)$.
Solution. The dimension is 2.
3. Let
$f_1(x) = \xi_1 + 2\xi_2 + 3\xi_3 + 4\xi_4$
$f_2(x) = -\xi_1 - 3\xi_2 + \xi_3 + 5\xi_4$
$f_3(x) = -\xi_1 - 4\xi_2 + 5\xi_3 + 14\xi_4$
$g(x) = \xi_1 + 3\xi_2 - \xi_3 - 5\xi_4$
be linear functionals on the vector space $R_4$. Is there a vector $x \in R_4$ such that $g(x) \neq 0$ and $f_1(x) = f_2(x) = f_3(x) = 0$?
Solution. No, because $g$ is a linear combination of $f_1$, $f_2$, $f_3$.
Problems
1. The definition of the dual space can be extended to arbitrary vector spaces in an obvious way. Show that the dual space $F_0^*$ of the vector space $F_0$ (Example 2.1;3) is isomorphic to the space $F$.
6.3: LINEAR FUNCTIONALS (POSITIVE ON A CONVEX SET)
2. Let $L$ be a subspace of the vector space $E$. What special property characterizes the linear functionals $f \in L^{\dagger} \subseteq E^*$?
3. Show that, if $L$ is a subspace of the vector space $E$, then $L^*$ is isomorphic to $E^*/L^{\dagger}$.
4. Suppose that $L$ is the subspace of $R_5$ which is spanned by the vectors $z_1 = (2,1,1,2,0)$ and $z_2 = (1,2,3,1,1)$. Find a basis for the dual subspace $L^{\dagger} \subseteq R_5^*$.
5. Calculate the basis of $R_n^*$ which is dual to the basis $e_i = (\delta_{i1}, \ldots, \delta_{in})$ $(i = 1, \ldots, n)$ of $R_n$.
6. Let $t$ be an endomorphism of the finite-dimensional vector space $E$. What is the image of a linear functional $f \in E^*$ under the mapping which is dual to $t$?
7. Let $(E,F)$ be a dual pair of spaces. Calculate the dual subspaces $\{0\}^{\dagger}$ and $E^{\dagger}$.
8. Let $(E,F)$ be a dual pair of spaces and suppose that two bases of $E$ are connected by $\bar e = Se$ (see 5.4;(4)). What is the connection between the dual bases?
9. Show that $(L_1 + L_2)^{\dagger} = L_1^{\dagger} \cap L_2^{\dagger}$ and $(L_1 \cap L_2)^{\dagger} = L_1^{\dagger} + L_2^{\dagger}$.
10. Let $(E,F)$ be a dual pair of spaces and suppose $E = L_1 \oplus L_2$. Show that $F = L_1^{\dagger} \oplus L_2^{\dagger}$.
11. Show that the dual mapping of a product of two linear mappings is the product of the dual mappings in the reverse order.
12. Let $f(x,y) = \sum_{i,k} \alpha_{ik}\xi_i\eta_k$ be a bilinear form on two vector spaces of the same dimension. Find a necessary and sufficient condition on the $\alpha_{ik}$ so that $(E,F)$ is a dual pair of spaces with the scalar product $\langle x, y\rangle = f(x,y)$.
13. Show that the bilinear form $f(x,y)$ in Problem 12 is a product of a linear functional $g(x)$ on $E$ and a linear functional $h(y)$ on $F$ if and only if the rank of $A = (\alpha_{ik})$ is equal to 0 or 1.
14. What is the dual mapping of a projection? (see Problem 5.3;6).
6.3
Linear Functionals which are Positive on a Convex Set
6.3.1
The Separation Theorem
In 2- or 3-dimensional space, we have the intuitive idea that any convex set $K$ which does not contain the origin lies completely to one side of some line or plane through the origin (i.e., some hyperplane). If $f(x) = 0$ is the equation of this hyperplane, then $f(x) \ge 0$ for all $x \in K$ (multiplying $f$ by $-1$ if necessary).
We will now show that this state of affairs also obtains in all finite-dimensional real vector spaces.
Figure: the hyperplane $H = f^{-1}(0)$ and the side on which $f(x) > 0$.
… for all $x \in K$. We say that $f$ is positive on $K$ (although strictly speaking we should say 'non-negative').
Proof. 1. We consider the family $\mathcal{L}$ of all subspaces $L$ of $E$ which have the following property.
A: There is a linear functional $f \neq 0$ on $L$ such that $f(x) \ge 0$ for all $x \in L \cap K$.
We need to show that $E \in \mathcal{L}$.
2. If $K = \emptyset$, the theorem is trivially true. Suppose that $x_0 \in K$. If $\lambda x_0 \in K$, then $\lambda > 0$ because $0 \notin K$. Hence a linear functional $f \neq 0$ is defined on the 1-dimensional subspace $L(x_0)$ by $f(\lambda x_0) = \lambda$ so that $L(x_0)$ has property A. Hence $L(x_0) \in \mathcal{L}$ and $\mathcal{L}$ is not empty.
3. In 4. we will show that, if $L \in \mathcal{L}$ and $L \neq E$, then there is an $\bar L \in \mathcal{L}$ such that $\dim \bar L = \dim L + 1$. Since $\mathcal{L}$ is not empty, it follows that $E \in \mathcal{L}$ and the theorem will be proved.
4. Suppose $L \in \mathcal{L}$, $L \neq E$ and $f$ is a linear functional on $L$ as in property A. If $t$ is a vector in $E \setminus L$, we form the subspace $\bar L = L(L, t)$. Then $\dim \bar L = \dim L + 1$ and every vector $\bar x \in \bar L$ can be written uniquely in the form $\bar x = x + \lambda t$, where $x \in L$. We distinguish three cases.
1st case: For all $\bar x = x + \lambda t \in \bar L \cap K$, $\lambda \ge 0$. Then $\bar f(\bar x) = \lambda$ is a linear functional on $\bar L$, $\bar f \neq 0$ and $\bar f(\bar x) \ge 0$ for all $\bar x \in \bar L \cap K$. Hence $\bar L \in \mathcal{L}$.
2nd case: For all $\bar x = x + \lambda t \in \bar L \cap K$, $\lambda \le 0$. Then we put $\bar f(\bar x) = -\lambda$.
3rd case: If neither of the first two cases holds, then there exist
$$\bar x_1 = x_1 + \lambda_1 t \in \bar L \cap K \text{ where } x_1 \in L \text{ and } \lambda_1 > 0$$
and
$$\bar x_2 = x_2 - \lambda_2 t \in \bar L \cap K \text{ where } x_2 \in L \text{ and } \lambda_2 > 0.$$
We consider the two sets of real numbers
$$M_1 = \{\mu_1;\ \text{there exist } x_1 \in L,\ \lambda_1 > 0 \text{ such that } x_1 + \lambda_1 t \in \bar L \cap K \text{ and } \mu_1 = -f(x_1)/\lambda_1\}$$
$$M_2 = \{\mu_2;\ \text{there exist } x_2 \in L,\ \lambda_2 > 0 \text{ such that } x_2 - \lambda_2 t \in \bar L \cap K \text{ and } \mu_2 = +f(x_2)/\lambda_2\}.$$
Both $M_1$ and $M_2$ are not empty. Further $\mu_1 \le \mu_2$ for all $\mu_1 \in M_1$ and all $\mu_2 \in M_2$. Because, for the corresponding $x_1$, $\lambda_1$, $x_2$, $\lambda_2$,
$$\frac{1}{\lambda_1 + \lambda_2}(\lambda_1 x_2 + \lambda_2 x_1) \in L \cap K,$$
therefore $\lambda_1 f(x_2) + \lambda_2 f(x_1) \ge 0$ and hence
$$\mu_1 = -\frac{f(x_1)}{\lambda_1} \le \frac{f(x_2)}{\lambda_2} = \mu_2.$$
It follows that there is a real number $\theta$ such that $\mu_1 \le \theta \le \mu_2$ for all $\mu_1 \in M_1$ and all $\mu_2 \in M_2$.
We now consider the linear functional $\bar f$ on $\bar L$ given by $\bar f(x + \lambda t) = f(x) + \lambda\theta$. Clearly $\bar f \neq 0$ and it is easy to see that $\bar f(\bar x) \ge 0$ for all $\bar x \in \bar L \cap K$. Hence in this case also $\bar L \in \mathcal{L}$ and the proof is complete.
Theorem 2. Let $P$ be a convex pyramid in a real vector space $E$ of dimension $n$ and let $x_0 \in E \setminus P$. Then there exists a symmetric parallelepiped $W$ in $E$ such that $(x_0 + W) \cap P = \emptyset$. (Cf. Example 3.4;2.)
Proof. Suppose $P = P(a_1, \ldots, a_r)$ (see Definition 3.4;4). A linear mapping $g$ of the $r$-dimensional space $R_r$ onto the linear hull $L(P)$ of $P$ is given by $g(\lambda_1, \ldots, \lambda_r) = \sum_{k=1}^{r} \lambda_k a_k$. We choose a basis $\{e_1, \ldots, e_n\}$ of $E$ and denote the components of $x = \sum_{k=1}^{r} \lambda_k a_k$ by $\xi_1, \ldots, \xi_n$. Since each component is a linear functional on $E$, it follows that for each $x \in L(P)$, $\xi_i = h_i(\lambda_1, \ldots, \lambda_r)$ $(i = 1, \ldots, n)$, where $h_i$ is a linear functional on $R_r$ (Theorem 5.1;6). We denote the components of $x_0$ by $\xi_{0i}$ and construct the quantity
$$\rho = \min_{x \in P} \max_i |\xi_i - \xi_{0i}|$$
…
… $(k = 1, \ldots, r)\}$ is the dual pyramid of $P_0$ and $Q = \{x;\ f_0(x) \ge 0\}$ is the dual pyramid of $Q_0$. The hypothesis of the theorem means that $P \subseteq Q$. Therefore, by Theorem 6, $Q_0 \subseteq P_0$ and hence $f_0 \in P_0$ and the assertion follows.
If we choose a basis of $E$ (components $\xi_k$), then we have representations as follows.
$$f_i(x) = \sum_k \alpha_{ik}\xi_k \quad \text{or} \quad f(x) = A\xi$$
and
$$f_0(x) = \sum_k \gamma_k\xi_k \quad \text{or} \quad f_0(x) = \gamma'\xi.$$
Now, writing $C \ge 0$ to mean that all the elements of the matrix $C$ are non-negative, we may restate Theorem 7 in the following form.
Theorem 8. Let $A$ and $\gamma'$ be real matrices with the same number of columns and suppose that whenever $A\xi \ge 0$, it follows that $\gamma'\xi \ge 0$. Then there exists a matrix $\eta' \ge 0$ such that $\gamma' = \eta' A$. That is, $\gamma'$ is a linear combination of the rows of $A$ with non-negative coefficients.
6.3.3
The Minimax Theorem
Theorem 9. Let $E$ and $F$ be finite-dimensional real vector spaces, let $f(x,y)$ $(x \in E,\ y \in F)$ be a bilinear form and let $X \subseteq E$ and $Y \subseteq F$ be non-empty convex polyhedra. Then
$$\min_{y \in Y} \max_{x \in X} f(x,y) = \max_{x \in X} \min_{y \in Y} f(x,y). \qquad (5)$$
This theorem is known as the Minimax Theorem and is due to J. von Neumann. The right-hand side of the equation (5) is calculated by first finding, for each $x \in X$, a $y \in Y$ such that $f(x,y)$ is minimal. This minimal value is dependent on $x$. We then find an $x \in X$ such that the corresponding minimum is as large as possible. The left-hand side is calculated in an analogous manner.
The fact that the minimum on the left-hand side and the maximum on the right-hand side both exist is not obvious and will not in fact be proved until a later section (9.2.2). Nevertheless we will continue with the present theorem, assuming for the time being that the existence of both sides is known.
Proof of the Minimax Theorem. 1. Suppose that the value of $f(x,y)$ on the left-hand side of (5) is taken at $x = x_1$ and $y = y_1$ and that the value on the right-hand side is taken at $x = x_2$, $y = y_2$. Then
$$\min_{y \in Y} \max_{x \in X} f(x,y) = f(x_1, y_1) \ge f(x_2, y_1) \ge f(x_2, y_2) = \max_{x \in X} \min_{y \in Y} f(x,y).$$
Hence it only remains to prove that the left-hand side of (5) cannot be strictly greater than the right-hand side. In other words, that there is no real number $\theta$ such that
$$\min_{y \in Y} \max_{x \in X} f(x,y) > \theta > \max_{x \in X} \min_{y \in Y} f(x,y). \qquad (6)$$
2. We now prove that (6) is impossible for the special case in which the following extra conditions are satisfied.
(a) $E$ and $F$ form a dual pair of spaces with $\langle x, y\rangle = f(x,y)$.
(b) Each of the polyhedra $X$ and $Y$ is a subset of a hyperplane in $E$ and $F$ respectively and is not a subspace.
(c) $\theta = 0$.
From (6) and (c), it follows that
$$\max_{x \in X} f(x,y) > 0 \text{ for all } y \in Y,$$
and
$$\min_{y \in Y} f(x,y) < 0 \text{ for all } x \in X,$$
or, alternatively, for each $y \in Y$, there is an $x \in X$ such that $f(x,y) > 0$ and for each $x \in X$, there is a $y \in Y$ such that $f(x,y) < 0$ …
… and for each $x \in P$, $x \neq 0$, there is a $y \in Q$ such that $f(x,y)$ …
… $\ge 0 \quad (i = 1, \ldots, m). \qquad (3)$
Here we are looking for the solutions $(\xi_1, \ldots, \xi_n, \eta_1, \ldots, \eta_m)$ of the system of equations (3) for which $\eta_i \ge 0$ $(i = 1, \ldots, m)$. By Theorem 7.3;5, if there exist any solutions at all, then these solutions form the sum $P = P_1 + P_2$ of a convex polyhedron $P_1$ and a convex pyramid $P_2$. The transfer from $R_{m+n}$ to $R_n$ is achieved by the same linear mapping as in 7.4.1 and this maps $P$ 1-1 onto the set $K$ of solutions of (1). Hence we have the following theorem.
Theorem 2. The set of solutions of a system of linear inequalities (1) is either empty or it is the sum $K = K_1 + K_2$ of a convex polyhedron $K_1$ and a convex pyramid $K_2$. Note that $K_2$ can be $\{0\}$, i.e., $K = K_1$.
We must now make several remarks in connection with this Theorem 2.
1. The polyhedron $K_1$ is not uniquely determined if $K_2 \neq \{0\}$. For instance, we can certainly extend $K_1$ by introducing any vector from $K \setminus K_1$ as an extra vertex vector. This does not alter $K$.
Suppose that $x_1, \ldots, x_r$ are the vertex vectors of $K_1$. We will say that one of these ($x_1$ say) is superfluous if $x_1 \in K(x_2, \ldots, x_r) + K_2$. We can then omit $x_1$ as a vertex vector (so making $K_1$ smaller) without changing $K$. If after the omission of $x_1$ there is a further superfluous vertex vector, then this can also be omitted, and so on. We may therefore assume that $K_1$ has no superfluous vertex vectors.
2. The pyramid $K_2$ consists of the solutions of the homogeneous system associated with (1). For, suppose $z \in K_2$, then, for any $x \in K_1$ and $\lambda \ge 0$, $x + \lambda z \in K$, or, in components,
$$\sum_{k=1}^{n} \alpha_{ik}(\xi_k + \lambda\zeta_k) + \beta_i = \sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i + \lambda\sum_{k=1}^{n} \alpha_{ik}\zeta_k \ge 0 \quad (i = 1, \ldots, m). \qquad (4)$$
From this, it follows that
$$\sum_{k=1}^{n} \alpha_{ik}\zeta_k \ge 0 \quad (i = 1, \ldots, m). \qquad (5)$$
7.4: LINEAR INEQUALITIES
Conversely, if (5) is satisfied and $x \in K_1$, then (4) follows for $\lambda \ge 0$. Hence $x + \lambda z \in K$, i.e., $z = (1/\lambda)(-x + k_1) + k_2$, where $k_1 \in K_1$, $k_2 \in K_2$ and are dependent on $\lambda$. As $\lambda \to \infty$ the components of $(1/\lambda)(-x + k_1)$ converge to 0 and therefore the components of $k_2$ converge to those of $z$. As a simple argument will show, it follows that $z \in K_2$.
3. Consequently, the subspace $L = K_2 \cap (-K_2)$ consists exactly of the solutions of the system of equations
$$\sum_{k=1}^{n} \alpha_{ik}\zeta_k = 0 \quad (i = 1, \ldots, m) \qquad (6)$$
and therefore has the dimension $n - r(A)$ (Theorem 7.1;1), where $A = (\alpha_{ik})$.
4. Now, for a given (not superfluous) vertex vector $x_1 \in K_1$, suppose that exactly $s$ of the inequalities are satisfied with the equality sign, i.e., assuming that they are the first $s$, suppose that
$$\sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i = 0 \quad \text{for } i = 1, \ldots, s, \qquad (7)$$
$$\sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i > 0 \quad \text{for } i = s+1, \ldots, m. \qquad (8)$$
Let $M$ be the solution space of the system of equations
$$\sum_{k=1}^{n} \alpha_{ik}\delta_k = 0 \quad (i = 1, \ldots, s). \qquad (9)$$
If $d \in M$, then (7) is satisfied by $x = x_1 \pm \lambda d$ for all $\lambda$. Further, there exists a $\lambda_0 > 0$ such that, with $\lambda = \lambda_0$, (8) is also satisfied. Hence $x = x_1 \pm \lambda_0 d$ are solutions of (1). Hence
$$x_1 + \lambda_0 d = k_{11} + k_{21}, \qquad x_1 - \lambda_0 d = k_{12} + k_{22},$$
where $k_{11}, k_{12} \in K_1$ and $k_{21}, k_{22} \in K_2$.
By adding these, it follows that $x_1 = k_1^* + k_2^*$, where
$$k_1^* = \tfrac{1}{2}(k_{11} + k_{12}) \in K_1 \quad \text{and} \quad k_2^* = \tfrac{1}{2}(k_{21} + k_{22}) \in K_2.$$
But in this $k_2^*$ must be zero because otherwise $x_1$ would be superfluous. Hence $x_1 = \tfrac{1}{2}(k_{11} + k_{12})$. Since $x_1$ is a vertex vector it follows that $k_{11} = k_{12} = x_1$. Hence $\lambda_0 d = k_{21} = -k_{22} \in K_2 \cap (-K_2) = L$. Therefore $d \in L$, and hence $M \subseteq L$. On the other hand, it is clear that $M \supseteq L$ and it follows that the rank of the system of equations (9) is equal to $r(A)$.
Therefore there are $r(A)$ of the $s$ inequalities in (1) which are satisfied by $x_1$ with the equality sign, such that the corresponding left-hand sides of (9) are linearly independent. We will say, in this case, that the inequalities involved are also linearly independent.
CH. 7: LINEAR EQUATIONS AND INEQUALITIES
Definition 1. A solution $x = (\xi_1, \ldots, \xi_n)$ of the system (1) is said to be a 'basic solution' if it satisfies $r(A)$ linearly independent inequalities of the system with the equality sign. A basic solution is said to be normal, if exactly $r(A)$ inequalities are satisfied with the equality sign. A non-normal basic solution is said to be degenerate.
This proves the following theorem.
Theorem 3. The convex polyhedron $K_1$ (Theorem 2) can be so chosen that its vertex vectors are basic solutions of the system (1).
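For $n = 2$ and $r(A) = 2$ the basic solutions of Definition 1 can be listed by brute force: take every pair of linearly independent inequalities, impose equality, solve, and keep the point if it satisfies the whole system. The Python sketch below does this with Cramer's rule; the function name and the sample system are assumptions chosen for illustration, and the restriction to two unknowns is a simplification, not part of the definition.

```python
from fractions import Fraction
from itertools import combinations


def basic_solutions_2d(alpha, beta):
    """Basic solutions of alpha[i][0]*x + alpha[i][1]*y + beta[i] >= 0, with r(A) = 2."""
    found = set()
    for i, j in combinations(range(len(alpha)), 2):
        (a, b), (c, d) = alpha[i], alpha[j]
        det = a * d - b * c
        if det == 0:
            continue                      # the two inequalities are not independent
        # Solve a*x + b*y = -beta[i], c*x + d*y = -beta[j] by Cramer's rule.
        x = Fraction(-beta[i] * d + b * beta[j], det)
        y = Fraction(-a * beta[j] + c * beta[i], det)
        if all(r[0] * x + r[1] * y + bb >= 0 for r, bb in zip(alpha, beta)):
            found.add((x, y))
    return sorted(found)


# Hypothetical system: x >= 0, y >= 0, -x - y + 2 >= 0, -x + 1 >= 0.
alpha = [[1, 0], [0, 1], [-1, -1], [-1, 0]]
beta = [0, 0, 2, 1]
vertices = basic_solutions_2d(alpha, beta)
```

In this sample system the intersection point $(2, 0)$ of the second and third bounding lines is rejected because it violates the fourth inequality, so it is not a basic solution.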
Example 1. Let $n = 2$, $r(A) = 2$ and consequently $m \ge 2$. If we replace the inequality sign by an equals sign in one of the inequalities we will obtain the equation of a line (assuming that not both of $\alpha_{i1}$ and $\alpha_{i2}$ are zero). Then the solutions of the corresponding inequality form one of the two halfplanes bounded by the line. Thus $K$ is the intersection of the halfplanes corresponding to the inequalities of (1).
Fig. 15. Panels (a) and (b).
In Fig. 15a, $m = 5$ and $K_2 = \{0\}$. (The diagrams and the terminology naturally refer to the space $G_2$ which is isomorphic to $R_2$ (Example 2.1;1).) Every vertex vector of $K = K_1$ is a basic solution since it is the intersection of just two of the five lines. In Fig. 15b, $m = 4$ and $K_1$ is the convex hull of the three numbered vertex vectors. All three are basic solutions. Here we could extend $K_1$ by introducing any vector $x \in K \setminus K_1$ as a new vertex vector. However, these would be superfluous and not basic solutions.
Example 2. Let $n = 3$. Every inequality of (1) represents a halfspace which is bounded by the plane given by the corresponding equation. Thus $K$ is the intersection of finitely many halfspaces.
1st case. $r(A) = 3$ (hence $m \ge 3$). In this case, $K_1$ can be so chosen that each vertex is the unique intersection of three planes.
2nd case. $r(A) = 2$ (hence $m \ge 2$). In this case there is a set of parallel lines $S$ such that, for $x \in K$, the line of $S$ which passes through $x$ lies entirely in $K$.
Fig. 16.
In Fig. 16, $m = 3$ and $K$ is a 3-sided prism. $K_2$ is the line of the set $S$ which goes through 0. $K_1$ can be chosen to be the triangle with the numbered vertices. Every vertex vector lies in $r(A) = 2$ faces. The vertex vectors are therefore basic solutions. More generally every point which lies on an edge of the prism $K$ represents a basic solution.
In Fig. 17 ($m = 4$), $K_2$ is a wedge which is bounded by two halfplanes whose common bounding line is the line from the set $S$ which goes through 0. $K_1$ can again be chosen as a triangle.
Fig. 17.
3rd case. $r(A) = 1$ (hence $m \ge 1$). In this case, $K$ is either empty or it is a slice bounded by two parallel planes or a halfspace which is bounded by a plane. Every point of a boundary plane represents a basic solution.
4th case. $r(A) = 0$. Now $K = R_3$ if all $\beta_i \ge 0$, otherwise $K = \emptyset$.
Example 3. The basic solutions in Figs. 15, 16, 17 were all normal. For instance, if $n = r(A) = 3$, then a basic solution is degenerate when it is the intersection of at least four bounding planes. For example, if $K$ is a regular octahedron or icosahedron, then all vertices are degenerate basic solutions. On the other hand, if $K$ is a cube, tetrahedron or a regular dodecahedron, then all basic solutions are normal.
CHAPTER 8
LINEAR PROGRAMMING
8.1
Linear Programmes
A linear programme is a problem of the following kind. We are given a system of linear inequalities
$$\rho_i = \sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i \ge 0 \quad (i = 1, \ldots, m) \qquad (1)$$
and a linear functional
$$\theta = \sum_{k=1}^{n} \gamma_k\xi_k \qquad (2)$$
which will be referred to as the object function. We wish to find those vectors $(\xi_1, \ldots, \xi_n) \in R_n$ which satisfy (1) and for which the object function takes its minimum value. In matrix notation, a linear programme can therefore be formulated as follows.
$$\rho = A\xi + \beta \ge 0 \qquad (3)$$
$$\theta = \gamma'\xi = \min \qquad (4)$$
(As before, we write $C \ge 0$ if all the elements of the matrix $C$ are non-negative.) The inequalities (3) are known as the restrictions of the programme. A vector $(\xi_1, \ldots, \xi_n) \in R_n$ which satisfies the restrictions is known as an admissible solution. An admissible solution which satisfies (4) is known as a minimal or optimal solution and the corresponding value of the object function is known as the optimal value.
The special case in which the restrictions include the inequalities
$$\xi_k \ge 0 \quad (k = 1, \ldots, n), \quad \text{i.e., } \xi \ge 0, \qquad (5)$$
is of particular importance. In this case the programme is said to be a definite linear programme. There will also be cases in which the restrictions include only some of the inequalities (5), and then the variables which are not affected by (5) are known as free variables (see 8.4).
CH. 8: LINEAR PROGRAMMING
In practical applications, there will also be inequalities with $\le$ in place of $\ge$. They may be changed into the latter form simply by changing the sign of both sides. If one of the restrictions is an equation, then it may be replaced by two inequalities ($\alpha = 0$ is replaced by $\alpha \ge 0$ and $-\alpha \ge 0$), so that we obtain the form (3) again. It is also possible that the object function has to be maximized. Since it is possible to replace $\theta = \max$ by $-\theta = \min$, this case is also covered by (4).
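The three conversions just described can be written down directly. In the Python sketch below a restriction is stored as a pair (coefficient row, constant) meaning $\alpha\cdot\xi + \beta \ge 0$; this encoding, the function names and the sample programme are all assumptions made for illustration.

```python
def leq_to_geq(alpha_row, beta):
    """alpha.xi + beta <= 0  becomes  (-alpha).xi + (-beta) >= 0."""
    return [-a for a in alpha_row], -beta


def equation_to_inequalities(alpha_row, beta):
    """alpha.xi + beta = 0  becomes the pair  rho >= 0  and  -rho >= 0."""
    return [(list(alpha_row), beta), leq_to_geq(alpha_row, beta)]


def maximise_to_minimise(gamma):
    """theta = max is replaced by -theta = min."""
    return [-g for g in gamma]


def is_admissible(restrictions, xi):
    """Check rho_i = alpha_i.xi + beta_i >= 0 for every restriction."""
    return all(sum(a * x for a, x in zip(alpha_row, xi)) + beta >= 0
               for alpha_row, beta in restrictions)


# Hypothetical programme: xi_1 + xi_2 - 4 <= 0, xi_1 - xi_2 = 0, xi >= 0,
# with xi_1 + 2*xi_2 to be maximised.
restrictions = ([leq_to_geq([1, 1], -4)]
                + equation_to_inequalities([1, -1], 0)
                + [([1, 0], 0), ([0, 1], 0)])
gamma = maximise_to_minimise([1, 2])
```

After the conversion every restriction has the form (3) and the object function is to be minimised as in (4), so a single routine for the standard form is enough.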
By Theorem 7.4;2, the set $K$ of admissible solutions of a linear programme is either empty or it is the sum of a convex polyhedron $K_1$ and a convex pyramid $K_2$. If $K \neq \emptyset$, suppose that $a_1, \ldots, a_r$ are the vertex vectors of the polyhedron $K_1$ (by Theorem 7.4;3, we may assume that they are basic solutions) and that the pyramid $K_2$ is generated by $b_1, \ldots, b_s$. Then $x \in R_n$ is an admissible solution if and only if there are coefficients $\lambda_1, \ldots, \lambda_r$, $\mu_1, \ldots, \mu_s$, where $\lambda_k \ge 0$ $(k = 1, \ldots, r)$, $\sum_{k=1}^{r} \lambda_k = 1$, $\mu_k \ge 0$ $(k = 1, \ldots, s)$, such that
$$x = \sum_{k=1}^{r} \lambda_k a_k + \sum_{k=1}^{s} \mu_k b_k. \qquad (6)$$
Denoting the object function by $g(x)$, we have
$$g(x) = \sum_{k=1}^{r} \lambda_k\, g(a_k) + \sum_{k=1}^{s} \mu_k\, g(b_k).$$
We distinguish the following three cases.
1. $g(b_k) > 0$ for $k = 1, \ldots, s$.
If $x$ is to be a minimal solution then $\mu_k$ must be zero for all $k = 1, \ldots, s$. Let $a_{k_0}$ be a vertex vector for which $g(a_k)$ takes the least value $g(a_{k_0}) = \min_k g(a_k)$ …
… $\eta'\beta = \sigma_2$.
Proof. Multiplying (1) and (2) by $\eta'$ and $\xi$ respectively, it follows that
$$\gamma'\xi \ge \eta' A\xi \ge \eta'\beta.$$
Theorem 2. (Duality Theorem)
1. If (1) has an optimal solution, then so has (2) and vice versa. The optimal values of the two object functions are equal.
2. If (1) [(2)] has admissible solutions with arbitrarily small [large] values of the object function, then (2) [(1)] has no admissible solutions.
3. If (1) [(2)] has no admissible solutions, then (2) [(1)] either has no admissible solutions or it has admissible solutions with arbitrarily large [small] values of the object function.
Proof. 1. Suppose (1) has an optimal solution. Let the minimal value of the object function be $\sigma_1$. Then, it follows from
$$A\xi - \beta \ge 0, \quad \xi \ge 0,$$
that
$$\gamma'\xi - \sigma_1 \ge 0. \qquad (3)$$
Using this we will now show further that,
$$\text{if } A\xi - \beta\zeta \ge 0, \quad \xi \ge 0, \quad \zeta \ge 0, \quad \text{then } \gamma'\xi - \sigma_1\zeta \ge 0. \qquad (4)$$
Suppose first that $\zeta > 0$. We divide the conditions of (4) by $\zeta$ and obtain the conditions of (3) with $\zeta^{-1}\xi$ in place of $\xi$. By (3), it follows that
$$\gamma'\zeta^{-1}\xi - \sigma_1 \ge 0, \quad \text{i.e.,} \quad \gamma'\xi - \sigma_1\zeta \ge 0.$$
Suppose secondly that $\zeta = 0$. We must show that the conditions
$$A\xi_0 \ge 0, \quad \xi_0 \ge 0, \quad \gamma'\xi_0 < 0 \qquad (5)$$
cannot be satisfied simultaneously. Let $\xi_1$ be an optimal solution of (1), i.e.,
$$A\xi_1 \ge \beta, \quad \xi_1 \ge 0 \quad \text{and} \quad \gamma'\xi_1 = \sigma_1,$$
then, if there were a $\xi_0$ satisfying (5), it would follow by addition that
$$A(\xi_0 + \xi_1) \ge \beta, \quad (\xi_0 + \xi_1) \ge 0 \quad \text{and} \quad \gamma'(\xi_0 + \xi_1) < \sigma_1.$$
This is a contradiction because $\sigma_1$ is the minimal value of the object function.
This completes the proof of statement (4) and we will now write it in the form
$$\text{if } \begin{pmatrix} A & -\beta \\ I & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \xi \\ \zeta \end{pmatrix} \ge 0, \quad \text{then } (\gamma', -\sigma_1)\begin{pmatrix} \xi \\ \zeta \end{pmatrix} \ge 0.$$
8.3: THE SIMPLEX METHOD
(The matrices in this are written in terms of submatrices as shown. In particular $I$ is the $n \times n$ identity matrix.) By Theorem 6.3;8, there is a matrix $(\eta', \eta_1', \eta_2') \ge 0$, such that
$$(\eta', \eta_1', \eta_2')\begin{pmatrix} A & -\beta \\ I & 0 \\ 0 & 1 \end{pmatrix} = (\gamma', -\sigma_1),$$
i.e., $\eta' A + \eta_1' = \gamma'$ and $-\eta'\beta + \eta_2' = -\sigma_1$,
i.e., $\eta' A \le \gamma'$ and $\eta'\beta \ge \sigma_1$.
Thus $\eta'$ is an admissible solution of (2) and the corresponding value of the object function is at least $\sigma_1$. The first part of the theorem now follows because, by Theorem 1, the value of the object function of (2) is never greater than $\sigma_1$ at any of the admissible solutions of (2). (Using the fact that the relationship between dual programmes is symmetric.)
2. The second part of the Theorem is a direct consequence of Theorem 1.
3. The third part is a consequence of the first. Suppose for instance that (2) has admissible solutions, and suppose that the set of values of the object function on the set $K$ of admissible solutions is bounded above. Then (2) has a maximal solution and therefore, by 1., (1) has a minimal solution.
Finally we see that it is possible for both programmes simultaneously to have no admissible solutions, for example, when $A = 0$, $\beta > 0$ and $\gamma' < 0$.
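The inequality of Theorem 1 and the closing remark can be checked on small data. The Python sketch below assumes, as read off from the proof above, that programme (1) is $A\xi \ge \beta$, $\xi \ge 0$, $\gamma'\xi = \min$ and that programme (2) is $\eta' A \le \gamma'$, $\eta \ge 0$, $\eta'\beta = \max$; the function names and the numbers are illustrative assumptions.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def primal_admissible(A, beta, xi):
    """Programme (1): A xi >= beta and xi >= 0."""
    return (all(dot(row, xi) >= b for row, b in zip(A, beta))
            and all(x >= 0 for x in xi))


def dual_admissible(A, gamma, eta):
    """Programme (2): eta' A <= gamma' and eta >= 0."""
    columns = list(zip(*A))
    return (all(dot(eta, col) <= g for col, g in zip(columns, gamma))
            and all(e >= 0 for e in eta))


A = [[1, 2], [3, 1]]
beta = [2, 3]
gamma = [1, 1]
xi = [1, 1]                                 # admissible for (1)
eta = [Fraction(1, 5), Fraction(1, 5)]      # admissible for (2)

chain = (dot(gamma, xi),                         # gamma' xi
         dot(eta, [dot(row, xi) for row in A]),  # eta' A xi
         dot(eta, beta))                         # eta' beta

# Closing remark: A = 0, beta > 0, gamma' < 0 leaves both programmes empty.
A0, beta0, gamma0 = [[0, 0]], [1], [-1, -1]
samples = [[0, 0], [1, 0], [0, 3], [2, 5]]
primal_empty = not any(primal_admissible(A0, beta0, s) for s in samples)
dual_empty = not any(dual_admissible(A0, gamma0, [t]) for t in (0, 1, 7))
```

The last two checks only try a few sample vectors; the general statement follows because $0 \ge \beta$ and $0 \le \gamma'$ fail whatever $\xi$ and $\eta$ are.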
8.3
The Simplex Method for the Numerical Solution of Linear Programmes
Consider the following definite programme.
$$\rho = A\xi + \beta \ge 0, \qquad \xi \ge 0, \qquad \gamma'\xi = \min \qquad (1)$$
Our first task is to find the optimal basic solutions. It turns out that at the same time we will also obtain all the optimal solutions, i.e., those corresponding to section 2 of Theorem 8.1;1.
Because of the restrictions $\xi \ge 0$, the rank of the system of inequalities in a definite programme is equal to $n$. Hence, by Theorem 7.4;3, there are $n$ linearly independent variables in the set $\{\xi_1, \ldots, \xi_n, \rho_1, \ldots, \rho_m\}$ which take the value 0 in an admissible basic solution. The other variables and the object function can be expressed in terms of these linearly independent variables and the variable $\tau$ (which is to be set equal to 1) and this gives a tableau of the following kind.
                          (n vanishing variables)        tau
    (other variables) =        alpha*_ik                 beta*_i
    theta             =   gamma*_1  ...  gamma*_n        delta*          (2)
where the top row contains the $n$ vanishing variables and $\tau$, and the left-hand column contains the other variables and the object function $\theta$. Since the tableau corresponds to an admissible solution, $\beta^*_i \ge 0$ for all $i = 1, \ldots, m$, because these are the values of the variables on the left-hand side for the given basic solution.
Thus every admissible basic solution corresponds to a tableau of the form (2) with $\beta^*_i \ge 0$ $(i = 1, \ldots, m)$. Conversely, we can construct an admissible basic solution from any tableau of this form by putting the variables in the top row equal to zero and calculating the others from the tableau.
Normal and Degenerate Basic Solutions
Tableau (2) corresponds to a normal basic solution if and only if $\beta^*_i > 0$ for all $i = 1, \ldots, m$. If $\beta^*_1 = 0$, say, then, in addition to the $n$ variables in the top row, the variable in the first row also takes the value 0 (and conversely). A tableau which corresponds to a normal (degenerate) basic solution will itself be referred to as normal (degenerate).
Optimal Basic Solutions
If $\gamma^*_k \ge 0$ $(k = 1, \ldots, n)$, then tableau (2) represents an optimal basic solution and will then be said to be optimal itself. In every other admissible solution, at least one of the variables in the top row will take a strictly positive value. Hence the corresponding value of the object function is not smaller than that for the basic solution (2). If $\gamma^*_k > 0$ for all $k = 1, \ldots, n$, then (2) represents the unique optimal basic solution.
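The tests described so far read only the last column and the last row of a tableau of the form (2). A minimal Python sketch of these tests follows; the function name, the dictionary keys and the sample numbers are assumptions for illustration, and the optimality flag records only the sufficient condition $\gamma^*_k \ge 0$ stated above.

```python
def classify(betas, gammas):
    """Classify a tableau of the form (2) from its last column and last row.

    betas  : the constants beta*_i in the column under tau
    gammas : the coefficients gamma*_k in the row of the object function
    """
    admissible = all(b >= 0 for b in betas)
    degenerate = admissible and any(b == 0 for b in betas)
    # gamma*_k >= 0 for all k is the sufficient condition given in the text.
    optimal = admissible and all(g >= 0 for g in gammas)
    unique = admissible and all(g > 0 for g in gammas)
    return {"admissible": admissible, "degenerate": degenerate,
            "optimal": optimal, "unique_optimal": unique}


# Sample values chosen for illustration.
first = classify([1, 1, 3, 3, 1, 1], [0, 0, -1])
last = classify([2, 2, 1, 0, 1, 2], [0.5, 0, 0.5])
```

A tableau that fails the optimality test is not necessarily non-optimal when it is degenerate, so the flag should be read as "meets the criterion", not as a full decision procedure.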
Conversely, if the tableau (2) is normal and optimal, then y?) 0 (Ic=1,...,n). For, if yzo0 (i=1,...,m), we can choose a strictly positive value for the variable at the head of the
koth column so that the variables on the left-hand side are still positive. In this way, we obtain a new admissible solution, which gives a smaller value of the object function. 156
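The sign tests above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_tableau`, its argument layout and the sample numbers are hypothetical and do not come from the text. Note that the test on the $\gamma_k^*$ is only a sufficient condition for optimality when the tableau is degenerate.

```python
from fractions import Fraction


def classify_tableau(beta, gamma):
    """Apply the sign tests of this section to a tableau of the form (2).

    beta  -- last column: values beta_i* of the left-hand variables
    gamma -- last row without its final entry: the coefficients gamma_k*
    """
    admissible = all(b >= 0 for b in beta)   # every beta_i* >= 0
    normal = all(b > 0 for b in beta)        # every beta_i* > 0
    # gamma_k* >= 0 for all k is sufficient for optimality of an admissible
    # tableau; for a normal tableau it is also necessary.
    gamma_nonnegative = all(g >= 0 for g in gamma)
    return {
        "admissible": admissible,
        "normal": normal,
        "optimal_by_sign_test": admissible and gamma_nonnegative,
    }


# Hypothetical data: one beta_i* vanishes, so the tableau is degenerate.
example = classify_tableau(
    beta=[Fraction(2), Fraction(0), Fraction(1)],
    gamma=[Fraction(1, 2), Fraction(0), Fraction(1, 2)],
)
print(example)
```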
8.3: THE SIMPLEX METHOD
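On the other hand, some of the $\gamma_k^*$ can be negative in a degenerate optimal tableau, e.g., when one of the $\beta_i^*$, say $\beta_{i_0}^*$, is zero and …

$$
\begin{aligned}
\rho_1 &= -\xi_2 + \xi_3 + 1 \ge 0 & \qquad \xi_1 &\ge 0 \\
\rho_2 &= -\xi_1 + \xi_3 + 1 \ge 0 & \xi_2 &\ge 0 \\
\rho_3 &= -\xi_2 - \xi_3 + 3 \ge 0 & \xi_3 &\ge 0 \\
\rho_4 &= -\xi_1 - \xi_3 + 3 \ge 0 \\
\rho_5 &= \xi_1 - \xi_3 + 1 \ge 0 \\
\rho_6 &= \xi_2 - \xi_3 + 1 \ge 0 \\
\theta &= -\xi_3 = \min
\end{aligned}
$$

Fig. 22.

1st Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \xi_2 & \xi_3 & 1 \\ \hline
\rho_1 = & 0 & -1 & 1 & 1 \\
\rho_2 = & -1 & 0 & 1 & 1 \\
\rho_3 = & 0 & -1 & -1 & 3 \\
\rho_4 = & -1 & 0 & -1 & 3 \\
\rho_5 = & 1 & 0 & -1 & 1 \\
\rho_6 = & 0 & 1 & -1^* & 1 \\ \hline
\theta = & 0 & 0 & -1 & 0
\end{array}
$$

2nd Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \xi_2 & \rho_6 & 1 \\ \hline
\rho_1 = & 0 & 0 & -1 & 2 \\
\rho_2 = & -1 & 1 & -1 & 2 \\
\rho_3 = & 0 & -2 & 1 & 2 \\
\rho_4 = & -1 & -1 & 1 & 2 \\
\rho_5 = & 1 & -1^* & 1 & 0 \\
\xi_3 = & 0 & 1 & -1 & 1 \\ \hline
\theta = & 0 & -1 & 1 & -1
\end{array}
$$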
The second tableau is degenerate. In view of the rule $\gamma_{k_0} < 0$, only $\alpha_{52} = -1$ can be considered as a pivot. The next tableau still represents the same solution.
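The exchange step that leads from the 1st to the 2nd tableau of this example can be checked mechanically. The following is a minimal sketch, assuming that each row stores the coefficients of one left-hand variable in terms of the top-row variables and the constant column; the function name `exchange` and the data layout are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction


def exchange(rows, r, s):
    """Exchange the left-hand variable of row r with the top-row variable of column s.

    Row i reads: left_i = sum_j rows[i][j] * top_j, where the last column
    belongs to the constant 1 and must not be chosen as pivot column.
    """
    rows = [[Fraction(v) for v in row] for row in rows]
    pivot = rows[r][s]
    if pivot == 0:
        raise ValueError("pivot must be non-zero")
    result = []
    for i, row in enumerate(rows):
        if i == r:
            # Solve the pivot row for the top variable of column s.
            result.append([Fraction(1) / pivot if j == s else -v / pivot
                           for j, v in enumerate(row)])
        else:
            # Substitute that expression into every other row.
            factor = row[s]
            result.append([factor / pivot if j == s
                           else v - factor * rows[r][j] / pivot
                           for j, v in enumerate(row)])
    return result


# 1st tableau of the example: rows rho_1..rho_6, theta; columns xi_1, xi_2, xi_3, 1.
first = [
    [0, -1, 1, 1],
    [-1, 0, 1, 1],
    [0, -1, -1, 3],
    [-1, 0, -1, 3],
    [1, 0, -1, 1],
    [0, 1, -1, 1],
    [0, 0, -1, 0],
]
# Pivot in the row of rho_6 (index 5) and the column of xi_3 (index 2).
second = exchange(first, 5, 2)
for row in second:
    print([str(v) for v in row])
```

Applying `exchange(second, 4, 1)` in the same way performs the next step, with the pivot $\alpha_{52} = -1$.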
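3rd Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \rho_5 & \rho_6 & \tau \\ \hline
\rho_2 = & 0 & -1 & 0 & 2 \\
\rho_3 = & -2^* & 2 & -1 & 2 \\
\rho_4 = & -2 & 1 & 0 & 2 \\
\xi_2 = & 1 & -1 & 1 & 0 \\
\xi_3 = & 1 & -1 & 0 & 1 \\ \hline
\theta = & -1 & 1 & 0 & -1
\end{array}
$$

4th Tableau
$$
\begin{array}{r|rrrr}
 & \rho_3 & \rho_5 & \rho_6 & \tau \\ \hline
\rho_2 = & 0 & -1 & 0 & 2 \\
\xi_1 = & -\tfrac12 & 1 & -\tfrac12 & 1 \\
\rho_4 = & 1 & -1 & 1 & 0 \\
\xi_2 = & -\tfrac12 & 0 & \tfrac12 & 1 \\
\xi_3 = & -\tfrac12 & 0 & -\tfrac12 & 2 \\ \hline
\theta = & \tfrac12 & 0 & \tfrac12 & -2
\end{array}
$$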
The fourth tableau gives the minimal solution (1, 1, 2) with $\theta = -2$. This solution is degenerate because $\rho_3 = \rho_4 = \rho_5 = \rho_6 = 0$. Even though $\gamma_2 = 0$, there are no other optimal solutions because, if $\theta = -2$, then $\tfrac12(\rho_3 + \rho_6) = 0$ and hence $\rho_3 = \rho_6 = 0$ and $\rho_4 = -\rho_5 = 0$.
If we drop the principle that $\gamma_{k_0} < 0$ and try to choose a pivot in the first column of the second tableau, say $\alpha_{21} = -1$, then we have
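Tableau 3a
$$
\begin{array}{r|rrrr}
 & \rho_2 & \xi_2 & \rho_6 & \tau \\ \hline
\xi_1 = & -1 & 1 & -1 & 2 \\
\rho_3 = & 0 & -2 & 1 & 2 \\
\rho_4 = & 1 & -2^* & 2 & 0 \\
\rho_5 = & -1 & 0 & 0 & 2 \\
\xi_3 = & 0 & 1 & -1 & 1 \\ \hline
\theta = & 0 & -1 & 1 & -1
\end{array}
$$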
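We come to a new degenerate solution, but without decreasing the object function.

Tableau 4a
$$
\begin{array}{r|rrrr}
 & \rho_2 & \rho_4 & \rho_6 & \tau \\ \hline
\rho_1 = & 0 & 0 & -1 & 2 \\
\xi_1 = & -\tfrac12 & -\tfrac12 & 0 & 2 \\
\rho_3 = & -1 & 1 & -1 & 2 \\
\xi_2 = & \tfrac12 & -\tfrac12 & 1 & 0 \\
\rho_5 = & -1^* & 0 & 0 & 2 \\
\xi_3 = & \tfrac12 & -\tfrac12 & 0 & 1 \\ \hline
\theta = & -\tfrac12 & \tfrac12 & 0 & -1
\end{array}
$$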
This tableau represents the same solution as 3a. The pivot has to be chosen in the first column and there are two possibilities for this (see Exercise 2).
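Tableau 5a
$$
\begin{array}{r|rrrr}
 & \rho_5 & \rho_4 & \rho_6 & 1 \\ \hline
\xi_1 = & \tfrac12 & -\tfrac12 & 0 & 1 \\
\rho_3 = & 1 & 1 & -1 & 0 \\
\xi_2 = & -\tfrac12 & -\tfrac12 & 1 & 1 \\
\rho_2 = & -1 & 0 & 0 & 2 \\
\xi_3 = & -\tfrac12 & -\tfrac12 & 0 & 2 \\ \hline
\theta = & \tfrac12 & \tfrac12 & 0 & -2
\end{array}
$$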
This tableau again represents the optimal solution. The two routes to the solution are again marked by arrows in Fig. 22.
Case 3. The tableau is optimal. We restrict the discussion to normal tableaux, i.e., we assume that $\beta_i > 0$ for all $i = 1, \ldots, m$. In this case, the condition of being optimal is equivalent to $\gamma_k \ge 0$ for all $k = 1, \ldots, n$. Further, we have already seen that the tableau represents the unique optimal solution if and only if $\gamma_k > 0$ for all $k = 1, \ldots, n$. If $\gamma_{k_0} = 0$ for some $k_0$ and all the corresponding $\alpha_{ik_0}$ are positive, then we can obtain optimal solutions by giving at
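least one variable an arbitrarily large value (Theorem 8.1;1, part 2). If $\gamma_{k_0} = 0$ and there is an $i_0$ such that …

(a)
$$
\begin{aligned}
\xi_1 - \xi_2 + 4 &\ge 0 & \qquad \xi_1 &\ge 0 \\
-\xi_1 - \xi_2 + 8 &\ge 0 & \xi_2 &\ge 0 \\
-2\xi_1 + 3\xi_2 + 6 &\ge 0 \\
\theta = -2\xi_1 + \xi_2 &= \min
\end{aligned}
$$
(b)
$$
\begin{aligned}
2\xi_1 - 3\xi_2 - 3\xi_3 + 18 &\ge 0 & \qquad \xi_1 &\ge 0 \\
-2\xi_1 + \xi_2 + 10 &\ge 0 & \xi_2 &\ge 0 \\
-\xi_3 + 4 &\ge 0 & \xi_3 &\ge 0 \\
\theta = -\xi_1 - \xi_2 - 3\xi_3 &= \min
\end{aligned}
$$
(c)
$$
\begin{aligned}
4\xi_1 - 3\xi_2 - \xi_3 - 2\xi_4 + 54 &\ge 0 & \qquad \xi_1 &\ge 0 \\
-\xi_1 + 18 &\ge 0 & \xi_2 &\ge 0 \\
\xi_1 - 3\xi_2 + 40 &\ge 0 & \xi_3 &\ge 0 \\
-\xi_2 - \xi_3 + 30 &\ge 0 & \xi_4 &\ge 0 \\
-\xi_3 - 2\xi_4 + 48 &\ge 0 \\
\theta = \xi_1 - \xi_2 - \xi_3 - \xi_4 &= \min
\end{aligned}
$$
Solutions.
(a) $\xi_1 = 6$, $\xi_2 = 2$, $\theta = -10$
(b) $\xi_1 = 9$, $\xi_2 = 8$, $\xi_3 = 4$, $\theta = -29$
(c) $\xi_1 = 0$, $\xi_2 = 2$, $\xi_3 = 28$, $\xi_4 = 10$, $\theta = -40$.

8.4 The Treatment of Free Variables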
In 8.3 we assumed that the programme was definite, i.e., that there were no free variables. We will now describe a variation of the simplex method which allows us to eliminate any free variables at the beginning of the calculation. We will restrict the discussion to normal tableaux, but it will not be difficult to see how we would deal with degenerate tableaux. Further, we will still assume that the zero-vector is an admissible solution, i.e., $\beta_i \ge 0$ for all $i = 1, \ldots, m$ (see 8.5 for the case when this is not so).
The free variables are initially in the top row of the tableau and, since we eliminate them (i.e., we do not carry them into the rest of the calculation when they have been exchanged with other variables), it follows that free variables will only ever appear in the top rows of tableaux. As before, we will keep to the principles G1, G2 and G3 in 8.3. Then, as a consequence of G1, it follows that all the coefficients in the $\tau$-column of any of the tableaux must be positive (leaving out the row corresponding to the free variable which has been exchanged (see Example 1)).
Now we will assume that $\xi_1$ is free and therefore look for a pivot in the first column.
1. If $\alpha_{i1} = 0$ for all $i$, then we can always obtain admissible solutions by giving $\xi_1$ an arbitrary real value.
1.1. $\gamma_1 = 0$. Then $\theta$ is independent of $\xi_1$. Hence we need not consider $\xi_1$ at all in the solution.
1.2. $\gamma_1 \ne 0$. There exists no minimal solution.
2. Not all $\alpha_{i1} = 0$. Let $\alpha_{i_0 1} \ne 0$ be a possible pivot.
2.1. $\delta$ goes into $\delta - \beta_{i_0}\gamma_1/\alpha_{i_0 1}$ which, to agree with G3, should be $\le \delta$ and if possible $< \delta$.
2.1.1. $\gamma_1 < 0$. It follows that $\alpha_{i_0 1}$ should be $< 0$ and it is then possible to satisfy G3 in the strong form. (If all $\alpha_{i1}$ are positive, then there exists no minimal solution.)
2.1.2. $\gamma_1 > 0$. It follows that $\alpha_{i_0 1}$ should be $> 0$ and it is then possible to satisfy G3 in the strong form. (If all $\alpha_{i1}$ are negative, then there is no minimal solution.)
2.1.3. $\gamma_1 = 0$. It is only possible to satisfy G3 in the weak form.
2.2. For $i \ne i_0$, $\beta_i$ goes into $\beta_i - \beta_{i_0}\alpha_{i1}/\alpha_{i_0 1}$ which, to agree with G1, should be $\ge 0$ and if possible $> 0$ to agree with G2.
2.2.1. $\gamma_1 < 0$. Then $\alpha_{i_0 1} < 0$ and G2 is satisfied whenever $\alpha_{i1} > 0$. If $\alpha_{i1} < 0$, then as before $\beta_i/\alpha_{i1} \le \beta_{i_0}/\alpha_{i_0 1}$ and so we choose $i_0$ such that
$$\chi_{i_0} = \max_{i,\ \alpha_{i1} < 0} \chi_i .$$
2.2.2. $\gamma_1 > 0$. Similarly, we choose $i_0$ such that
$$\chi_{i_0} = \min_{i,\ \alpha_{i1} > 0} \chi_i .$$
2.2.3. $\gamma_1 = 0$. This case can be dealt with either as in 2.2.1 or 2.2.2.
Thus we have the following rule for choosing the pivot. If $\gamma_1 > 0$ ($< 0$), then we choose the pivot from among the positive (negative) coefficients of the first column. The one which is actually chosen is decided by calculating the characteristic quotients $\chi_i = \beta_i/\alpha_{i1}$ for each of the positive (negative) coefficients $\alpha_{i1}$ and finding the index $i_0$ such that
$$\chi_{i_0} = \min_{i,\ \alpha_{i1} > 0} \chi_i \qquad \Bigl(\max_{i,\ \alpha_{i1} < 0} \chi_i\Bigr).$$
If $\gamma_1 = 0$, we can consider it to be either positive or negative.
…
… $\omega_0$, every optimal solution of (1) is also an optimal solution of (2).
This statement of Theorem 1 corresponds to the intuitive idea that, for a large positive $\omega$, in order to make $\theta_1$ small we must first minimize $\theta_2$ and
then $\theta_0$.
Proof. 1. An admissible solution
$$x = \sum_{k=1}^{r} \lambda_k a_k + \sum_{k=1}^{s} \mu_k b_k \qquad (3)$$
(see 8.1;(6)) is optimal for an object function $\theta(x)$ if and only if
from $\lambda_k > 0$, it follows that $a_k$ is an optimal solution ($k = 1, \ldots, r$)
and
from $\mu_k > 0$, it follows that $\theta(b_k) = 0$ ($k = 1, \ldots, s$).
2. Let $K$ be the set of admissible solutions of the programmes (1) and (2) and let $x \in K$ be an admissible solution which is not minimal for $\theta_2$. Then, for the representation (3) of $x$, either
there exists $k_0$, such that $\lambda_{k_0} > 0$ and $a_{k_0}$ is not minimal for $\theta_2$,
or
there exists $k_0$, such that $\mu_{k_0} > 0$ and $\theta_2(b_{k_0}) \ne 0$.
In the first case, there exists a $c_{k_0} \in K$ such that $\theta_2(c_{k_0}) < \theta_2(a_{k_0})$. From this, it follows that
$$\theta_1(c_{k_0}) = \theta_0(c_{k_0}) + \omega\,\theta_2(c_{k_0}) < \theta_0(a_{k_0}) + \omega\,\theta_2(a_{k_0}) = \theta_1(a_{k_0}),$$
whenever
$$\omega > \frac{\theta_0(c_{k_0}) - \theta_0(a_{k_0})}{\theta_2(a_{k_0}) - \theta_2(c_{k_0})}.$$
If $\omega$ satisfies this condition, then $a_{k_0}$, and hence $x$, is not optimal for $\theta_1$.
In the second case
$$\theta_1(b_{k_0}) = \theta_0(b_{k_0}) + \omega\,\theta_2(b_{k_0}) \ne 0, \quad\text{when}\quad \omega \ne -\frac{\theta_0(b_{k_0})}{\theta_2(b_{k_0})}.$$
If $\omega$ satisfies this condition, then $x$ is not optimal for $\theta_1$.
3. Obviously, Theorem 1 will now be satisfied if we put $\omega_0$ equal to the largest of the numbers
$$\frac{\theta_0(c_k) - \theta_0(a_k)}{\theta_2(a_k) - \theta_2(c_k)} \quad\text{and}\quad -\frac{\theta_0(b_k)}{\theta_2(b_k)}$$
where $a_k$ runs through all the vectors in the set $\{a_1, \ldots, a_r\}$ for which $\theta_2$ is not optimal, $c_k$ satisfies the condition $\theta_2(c_k) < \theta_2(a_k)$ and $b_k$ runs through those vectors in the set $\{b_1, \ldots, b_s\}$ for which $\theta_2(b_k) \ne 0$.
We will first carry out the method for a particular example.
Example 1.
$$
\begin{aligned}
\rho_1 &= \xi_1 + 4\xi_2 - 8 \ge 0 & \qquad \xi_1 &\ge 0 \\
\rho_2 &= 2\xi_1 + 3\xi_2 - 12 \ge 0 & \xi_2 &\ge 0 \\
\rho_3 &= 2\xi_1 + \xi_2 - 6 \ge 0 \\
\theta &= \xi_1 + \xi_2 = \min
\end{aligned}
\qquad (4)
$$
We add $\xi_3 + 12$ to each of the restrictions on the left-hand side, where $\xi_3$ is a new free variable, and we introduce a new restriction $\xi_3 + 12 \ge 0$.
We also replace the object function by $\theta + \omega(\xi_3 + 12) = \xi_1 + \xi_2 + \omega(\xi_3 + 12)$ where $\omega$ is a constant whose actual value will not need to be known. In this way we obtain the new programme
$$
\begin{aligned}
\rho_1^* &= \xi_1 + 4\xi_2 + \xi_3 + 4 \ge 0 & \qquad \xi_1 &\ge 0 \\
\rho_2^* &= 2\xi_1 + 3\xi_2 + \xi_3 \ge 0 & \xi_2 &\ge 0 \\
\rho_3^* &= 2\xi_1 + \xi_2 + \xi_3 + 6 \ge 0 & \xi_3 &\ \text{free} \\
\rho_4^* &= \xi_3 + 12 \ge 0 \\
\theta_1 &= \xi_1 + \xi_2 + \omega(\xi_3 + 12) = \min
\end{aligned}
\qquad (5)
$$
The programme (5) has the zero vector as an admissible solution (all $\beta_i \ge 0$) and can therefore be solved by the earlier methods. From Theorem 1, we also know that, for sufficiently large $\omega$, every optimal solution of (5) is also optimal for the object function $\xi_3 + 12$ with the same restrictions. (The constant terms in the object functions obviously do not affect the solution.) Now however the minimum of $\xi_3 + 12$ is zero, providing (4) has any admissible solutions at all. If this is the case, then the solution of (5) will automatically give $\xi_3 + 12 = 0$ and the corresponding values of $\xi_1$ and $\xi_2$ will be an optimal solution of (4).
8.5: GENERAL LINEAR PROGRAMMES
Fig. 25. Programme (4) and programme (5).
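In order to carry out the calculation, it is not necessary to know the actual value of $\omega$. We merely assume that $\omega$ is large enough to ensure that, for any constants $\alpha$, $\beta$ ($> 0$) which appear in the calculation, the numbers $\alpha + \beta\omega$ and $\alpha - \beta\omega$ are positive and negative respectively irrespective of the sign of $\alpha$. For simplicity, we will leave out the asterisks on the $\rho_i$ in the following tableaux.

1st Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \xi_2 & \xi_3 & 1 \\ \hline
\rho_1 = & 1 & 4 & 1 & 4 \\
\rho_2 = & 2 & 3 & 1^* & 0 \\
\rho_3 = & 2 & 1 & 1 & 6 \\
\rho_4 = & 0 & 0 & 1 & 12 \\ \hline
\theta_1 = & 1 & 1 & \omega & 12\omega
\end{array}
$$
First we eliminate the free variable $\xi_3$.

2nd Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \xi_2 & \rho_2 & 1 \\ \hline
\rho_1 = & -1 & 1 & 1 & 4 \\
\rho_3 = & 0 & -2^* & 1 & 6 \\
\rho_4 = & -2 & -3 & 1 & 12 \\ \hline
\theta_1 = & 1 - 2\omega & 1 - 3\omega & \omega & 12\omega
\end{array}
$$

3rd Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1 & \rho_3 & \rho_2 & 1 \\ \hline
\rho_1 = & -1 & -\tfrac12 & \tfrac32 & 7 \\
\xi_2 = & 0 & -\tfrac12 & \tfrac12 & 3 \\
\rho_4 = & -2^* & \tfrac32 & -\tfrac12 & 3 \\ \hline
\theta_1 = & 1 - 2\omega & \tfrac{3\omega - 1}{2} & \tfrac{1 - \omega}{2} & 3(\omega + 1)
\end{array}
$$

4th Tableau
$$
\begin{array}{r|rrrr}
 & \rho_4 & \rho_3 & \rho_2 & 1 \\ \hline
\rho_1 = & \tfrac12 & -\tfrac54 & \tfrac74 & \tfrac{11}{2} \\
\xi_2 = & 0 & -\tfrac12 & \tfrac12 & 3 \\
\xi_1 = & -\tfrac12 & \tfrac34 & -\tfrac14 & \tfrac32 \\ \hline
\theta_1 = & \tfrac{2\omega - 1}{2} & \tfrac14 & \tfrac14 & \tfrac92
\end{array}
$$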
The fourth tableau is optimal. From it, we have $\xi_1 = \tfrac32$ and $\xi_2 = 3$. Hence, from the second tableau, we have $\xi_3 = -3 - 9 = -12$ and therefore $\xi_3 + 12 = 0$. This means that $\xi_1 = \tfrac32$, $\xi_2 = 3$ is an optimal solution of (4) and the minimum value of $\theta$ is $\tfrac92$. The programmes (4) and (5) are represented geometrically in Fig. 25. We note in particular that for (5) the planes $\theta_1 = \text{constant}$ become steeper as $\omega$ decreases. If $\omega$ is too small, then the optimum is taken at point B instead of point C which corresponds to the optimum of (4).
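The optimum of programme (4) can be cross-checked without the simplex method. The sketch below is an independent check, not the method of the text: it enumerates the intersection points of pairs of boundary lines, keeps the admissible ones, and picks the one with the smallest value of $\theta = \xi_1 + \xi_2$. This is valid here because the object function has positive coefficients and $\xi_1, \xi_2 \ge 0$, so the minimum is attained at a vertex. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Each triple (a1, a2, b) stands for the restriction a1*xi1 + a2*xi2 + b >= 0.
constraints = [
    (1, 4, -8),    # rho_1
    (2, 3, -12),   # rho_2
    (2, 1, -6),    # rho_3
    (1, 0, 0),     # xi_1 >= 0
    (0, 1, 0),     # xi_2 >= 0
]


def admissible_vertices(cons):
    """Intersect every pair of boundary lines and keep the admissible points."""
    points = set()
    for (a1, a2, b), (c1, c2, d) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines have no intersection point
        # Cramer's rule for a1*x1 + a2*x2 = -b, c1*x1 + c2*x2 = -d.
        x1 = Fraction(-b * c2 + a2 * d, det)
        x2 = Fraction(-a1 * d + b * c1, det)
        if all(p * x1 + q * x2 + r >= 0 for p, q, r in cons):
            points.add((x1, x2))
    return points


vertices = admissible_vertices(constraints)
best = min(vertices, key=lambda p: p[0] + p[1])
theta_min = best[0] + best[1]
print(best, theta_min)
```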
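This example shows that the method applied here is also applicable in general. This can be described in terms of the following rules for the solution of the general programme
$$
\begin{aligned}
&\sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i \ge 0 \qquad (i = 1, \ldots, m) \\
&\text{some of the } \xi_k \text{ are free and some are} \ge 0 \\
&\theta = \sum_{k=1}^{n} \gamma_k \xi_k = \min
\end{aligned}
\qquad (6)
$$
1. We construct the new programme
$$
\begin{aligned}
&\sum_{k=1}^{n} \alpha_{ik}\xi_k + \xi_{n+1} + \delta + \beta_i \ge 0 \qquad (i = 1, \ldots, m) \\
&\xi_{n+1} + \delta \ge 0 \\
&\text{conditions for } \xi_1, \ldots, \xi_n \text{ as in (6) and } \xi_{n+1} \text{ free} \\
&\theta_1 = \sum_{k=1}^{n} \gamma_k \xi_k + \omega(\xi_{n+1} + \delta) = \min
\end{aligned}
\qquad (7)
$$
where $\delta$ is chosen so that $\delta + \beta_i \ge 0$ for all $i = 1, \ldots, m$.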
2. We solve the programme (7), assuming that $\omega$ is so large that, for any constants $\alpha$ and $\beta$ ($> 0$) which appear in the calculation, the numbers $\alpha + \beta\omega$ and $\alpha - \beta\omega$ are positive and negative respectively, irrespective of the sign of $\alpha$.
3. If, in an optimal solution of (7), $\xi_{n+1} + \delta > 0$, then (6) has no admissible solution. On the other hand, if $\xi_{n+1} + \delta = 0$, then an optimal solution of (7) also gives an optimal solution of (6).
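Rule 1 is a purely mechanical construction, which the following sketch illustrates. The function name `augment_programme` is hypothetical, and the code assumes the particular choice of the smallest non-negative $\delta$ with $\delta + \beta_i \ge 0$; the text only requires some $\delta$ with that property. The object function is not built, because it contains the unspecified constant $\omega$.

```python
def augment_programme(A, beta):
    """Build the restrictions of programme (7) from those of programme (6).

    A    -- list of rows (alpha_i1, ..., alpha_in)
    beta -- list of constants beta_i
    Returns the enlarged coefficient rows, the new constants and delta.
    """
    n = len(A[0])
    # Assumption: take the smallest delta >= 0 with delta + beta_i >= 0 for all i.
    delta = max(0, -min(beta))
    # Every old restriction gets the extra term xi_{n+1} + delta ...
    new_rows = [list(row) + [1] for row in A]
    new_beta = [b + delta for b in beta]
    # ... and the restriction xi_{n+1} + delta >= 0 is appended.
    new_rows.append([0] * n + [1])
    new_beta.append(delta)
    return new_rows, new_beta, delta


# Data of programme (4) in Example 1.
rows, consts, delta = augment_programme(
    A=[[1, 4], [2, 3], [2, 1]],
    beta=[-8, -12, -6],
)
print(delta, rows, consts)
```

With the data of Example 1 this reproduces $\delta = 12$ and the restrictions of programme (5).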
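Exercise
Solve the linear programme
$$
\begin{aligned}
\xi_1 - \xi_2 - 2 &\ge 0 & \qquad \xi_1 &\ge 0 \\
\xi_1 + \xi_2 - 8 &\ge 0 & \xi_2 &\ge 0 \\
-2\xi_1 - \xi_2 + 20 &\ge 0 \\
\theta = 3\xi_1 + \xi_2 &= \min
\end{aligned}
$$
by introducing a new variable.
Solution. $\xi_1 = 5$, $\xi_2 = 3$, $\theta = 18$.

8.6 The Simplex Method and Duality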
In this section we will show that the duality law of Linear Programming (Theorem 8.2;2) is contained in the duality law of the Exchange Method (see 3.3.2).
We consider the definite programme
$$\rho = A\xi + \beta \ge 0, \qquad \xi \ge 0, \qquad \theta = \gamma'\xi = \min \qquad (1)$$
where we will assume that $\beta \ge 0$. The dual programme, denoting the variables by $\eta$ and $\sigma$, is
$$\sigma' = -\eta' A + \gamma' \ge 0, \qquad \eta' \ge 0, \qquad \tau = -\eta'\beta = \max \qquad (2)$$
(Note that here $\tau$ now denotes one of the two object functions and is not to be set equal to 1.)
Suppose that the first and final tableaux for (1) are
$$
\begin{array}{r|cc}
 & \xi_1 \ \cdots \ \xi_n & 1 \\ \hline
\rho_1, \ldots, \rho_m = & A & \beta \\
\theta = & \gamma' & 0
\end{array}
\quad (3)
\qquad\qquad
\begin{array}{r|cc}
 & \cdots & 1 \\ \hline
\vdots \ = & A^* & \beta^* \\
\theta = & \gamma^{*\prime} & \delta^*
\end{array}
\quad (4)
$$
where $\beta^* \ge 0$, $\gamma^{*\prime} \ge 0$, …
… $\sigma_3 = 2\eta_1 + \eta_2 - 6 \ge 0$, $\eta_2 \ge 0$, $-\tau = \eta_1 + \eta_2 = \min$, i.e., $\tau = -\eta_1 - \eta_2 = \max$.
This example is identical with Example 8.5;1 which we have already solved by introducing a free variable. Since the coefficients of the function $-\tau$ are positive, we can also solve the programme directly by writing the tableaux
in the vertical form. 2nd Tableau
Isl Tableau (71:02:03 =
‘711
”1
"2
‘2
l
—8
—12
—6
“’72: 02: 0'3:
7':
1
—n2 —4* —3 —1
1
0
‘711
i
—%
'i*
l
2
—6
—4
7':
i"
-01 a —e —i
l —2
'02 =
3rd Tableau
—as a} —ol —%
111 =
—, —% —4* %
-r :
% %
1—4’ ——a—2 —° —%
1
1,2 =
4th Tableau
02 =
0'1 =
1,1 =
1- =
—03
%
fi
‘2
i
‘02
"%
"%
i
i
3
”—f
%
‘17“
l
From the fourth tableau, we have the optimal solution $\eta_1 = \tfrac32$, $\eta_2 = 3$. The maximum value of $\tau$ is $-\tfrac92$ and hence the minimal value of $-\tau$ is $+\tfrac92$ which agrees with the earlier result.
Example 3.
$$
\begin{aligned}
\rho_1 &= \xi_1 + \xi_2 - 1 \ge 0 \\
\rho_2 &= -\xi_2 + 1 \ge 0 \\
\theta &= 2\xi_1 + \xi_2 = \min.
\end{aligned}
$$
In the horizontal interpretation, this example can only be solved by introducing a free variable $\xi_3$. However, since the coefficients in $\theta$ are positive, we choose to use the tableaux in vertical form where we will denote the variables by $\xi$, $\rho$ instead of $\eta$, $\sigma$.
1st Tableau
211d Tableau
P1 = P2 =
£2= p2= —0=
-0 =
—{;'1
-—1
0
2
—§1
1
—§2
—1*
1
1
—pl
—1
—1
l
0
l
1
1
-—1
l
1 O
—l
The optimal solution is $\xi_1 = 0$, $\xi_2 = 1$ and the minimum value of $\theta$ is 1.
Exercises
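1. Formulate the dual programmes of the three programmes in Exercise 8.3;4 and solve them.
Solutions.
(a)
$$
\begin{aligned}
-\eta_1 + \eta_2 + 2\eta_3 - 2 &\ge 0 & \qquad \eta_1 &\ge 0 \\
\eta_1 + \eta_2 - 3\eta_3 + 1 &\ge 0 & \eta_2 &\ge 0 \\
 & & \eta_3 &\ge 0 \\
\tau = -4\eta_1 - 8\eta_2 - 6\eta_3 &= \max
\end{aligned}
$$
$\eta_1 = 0$, $\eta_2 = \tfrac45$, $\eta_3 = \tfrac35$; $\tau = -10$
(b)
$$
\begin{aligned}
-2\eta_1 + 2\eta_2 - 1 &\ge 0 & \qquad \eta_1 &\ge 0 \\
3\eta_1 - \eta_2 - 1 &\ge 0 & \eta_2 &\ge 0 \\
3\eta_1 + \eta_3 - 3 &\ge 0 & \eta_3 &\ge 0 \\
\tau = -18\eta_1 - 10\eta_2 - 4\eta_3 &= \max
\end{aligned}
$$
$\eta_1 = \tfrac34$, $\eta_2 = \tfrac54$, $\eta_3 = \tfrac34$; $\tau = -29$
(c)
$$
\begin{aligned}
-4\eta_1 + \eta_2 - \eta_3 + 1 &\ge 0 & \qquad \eta_1 &\ge 0 \\
3\eta_1 + 3\eta_3 + \eta_4 - 1 &\ge 0 & \eta_2 &\ge 0 \\
\eta_1 + \eta_4 + \eta_5 - 1 &\ge 0 & \eta_3 &\ge 0 \\
2\eta_1 + 2\eta_5 - 1 &\ge 0 & \eta_4 &\ge 0 \\
 & & \eta_5 &\ge 0 \\
\tau = -54\eta_1 - 18\eta_2 - 40\eta_3 - 30\eta_4 - 48\eta_5 &= \max
\end{aligned}
$$
$\eta_1 = \tfrac16$, $\eta_2 = 0$, $\eta_3 = 0$, $\eta_4 = \tfrac12$, $\eta_5 = \tfrac13$; $\tau = -40$
The solutions could also be read off from the final tableau of Exercise 8.3;4 by using the vertical interpretation.
2. Solve the linear programme of Exercise 8.5, without introducing a new variable, by using the vertical interpretation.
Solution. The initial tableau is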
—§1 —._s2
P1:
P2:
—1 1
—1 —1
P3:
2 1
7=
3 1
CHAPTER 9
TCHEBYCHEV APPROXIMATIONS
9.1 Tchebychev's Method of Approximation
Consider the following real system of linear equations
$$\sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i = 0 \quad (i = 1, \ldots, m), \qquad\text{i.e., } A\xi + \beta = 0 \qquad (1)$$
and for the moment suppose that (1) is the expression of some physical law, i.e., that the physical quantities $\alpha_{ik}$, $\beta_i$, $\xi_k$ are connected by the relationships given in (1). Further suppose that the coefficients $\alpha_{ik}$ and $\beta_i$ can be measured by experiment and that the quantities $\xi_k$ have then to be calculated using the equations (1). Since experimental measurements are always subject to error, it is possible that the system (1) will be inconsistent and have no solution even though physical considerations will show that there should be a solution. The problem then is to solve the system 'as closely as possible' in a sense which must be more precisely defined.
For arbitrary real values $\xi_1, \ldots, \xi_n$ we will put
$$\varepsilon_i = \sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i \quad (i = 1, \ldots, m), \qquad\text{i.e., } \varepsilon = A\xi + \beta$$
and refer to these quantities as residuals. If $\xi$ is a solution of the system, then $\varepsilon = 0$. Consequently, when the system cannot be solved exactly, we will try to find values of the unknowns $\xi_1, \ldots, \xi_n$ such that the greatest of the absolute values of the residuals is as small as possible. The solution of this problem is known as Tchebychev's Method of Approximation.
We remark here that, apart from Tchebychev's Method, there are several other methods of approximating to the solution and in particular there is the method of least squares which is due to Gauss and which will be met in 12.2. In fact the method of least squares is more suitable than Tchebychev's method for solving those problems in which the inconsistency of the equations (1) is due to statistical errors in the coefficients, as for example in the sort of physical problem considered at the beginning of this section. (Cf. [20], p. 124). Tchebychev's method is more useful for the approximation of functions (see Example 4).
Now let $\sigma$ be the largest of the quantities $|\varepsilon_i|$ which we have to minimize. This can also be characterized as the least upper bound of the absolute values $|\varepsilon_i|$, so that $|\varepsilon_i| \le \sigma$ for all $i = 1, \ldots, m$ and equality holds for at least one value of $i$. Thus we have to find values of the unknowns $\xi_k$ which make $\sigma$ as small as possible. Since $|\varepsilon_i| \le \sigma$ may be written in the form $\varepsilon_i \le \sigma$ and $-\varepsilon_i \le \sigma$, we obtain the following linear programme:
$$
\begin{aligned}
\varepsilon_i - \sigma &= \sum_{k=1}^{n} \alpha_{ik}\xi_k + \beta_i - \sigma \le 0 \\
-\varepsilon_i - \sigma &= -\sum_{k=1}^{n} \alpha_{ik}\xi_k - \beta_i - \sigma \le 0
\end{aligned}
\qquad i = 1, \ldots, m; \qquad \sigma = \min \qquad (2)
$$
which has the free variables $\xi_1, \ldots, \xi_n$ and the object function $\theta = \sigma$ and clearly this could be solved directly using the techniques of Chapter 8. However, since the zero vector is not an admissible solution it pays to reformulate the programme by dividing through by $\sigma$ and introducing the following notation.
$$\xi_k^* = \frac{\xi_k}{\sigma}, \qquad \xi_0^* = \frac{1}{\sigma} \qquad (3)$$
Then (2) may be rewritten in the form
$$
\begin{aligned}
\rho_{i1} &= \sum_{k=1}^{n} \alpha_{ik}\xi_k^* + \beta_i\xi_0^* + 1 \ge 0; & i &= 1, \ldots, m \\
\rho_{i2} &= -\sum_{k=1}^{n} \alpha_{ik}\xi_k^* - \beta_i\xi_0^* + 1 \ge 0; & i &= 1, \ldots, m \\
\theta &= -\xi_0^* = \min
\end{aligned}
\qquad (4)
$$
This is again a linear programme with the free variables $\xi_0^*, \xi_1^*, \ldots, \xi_n^*$. It has $\xi_0^* = \xi_1^* = \ldots = \xi_n^* = 0$ as an admissible solution and it is easy to find the simple rules by which it is derived from (1). In particular, we note that
$$\rho_{i1} + \rho_{i2} = 2 \quad\text{for } i = 1, \ldots, m. \qquad (5)$$
The initial tableau for the simplex method is
$$
\begin{array}{r|ccc}
 & \xi_1^* \ \cdots \ \xi_n^* & \xi_0^* & 1 \\ \hline
\rho_{11}, \ldots, \rho_{m1} = & A & \beta & 1 \\
\rho_{12}, \ldots, \rho_{m2} = & -A & -\beta & 1 \\
\theta = & 0 \ \cdots \ 0 & -1 & 0
\end{array}
\qquad (6)
$$
During the calculation, it is helpful to keep the relations (5) in mind, because they must also be satisfied in each later tableau. This means,
1. If for some $i$, $\rho_{i1}$ and $\rho_{i2}$ are on the left-hand side, then the corresponding rows are identical except that they have opposite signs (apart from the last coefficients whose sum is equal to 2).
2. If $\rho_{i1}$ is on the left-hand side and $\rho_{i2}$ is in the top row, then the row corresponding to $\rho_{i1}$ consists entirely of zeros except that the coefficient under $\rho_{i2}$ is $-1$ and the last coefficient is 2.
By remembering these two rules, we can save ourselves a considerable amount of work in the calculation.
The final tableau of the method leads to the solution of the approximation problem as follows.
1. The minimal value of the object function is $\theta = -\xi_0^* = -1/\sigma$.
2. The values of the unknowns $\xi_k$ are given by $\xi_k = \sigma\xi_k^*$, $k = 1, \ldots, n$.
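The construction of the initial tableau (6) and the final back-substitution can be sketched as follows. The function names and the data layout (one list per tableau row, columns $\xi_1^*, \ldots, \xi_n^*, \xi_0^*, 1$) are assumptions made for this illustration; the small input system $\xi_1 - 1 = 0$, $\xi_1 = 0$ is used only as sample data.

```python
from fractions import Fraction


def chebyshev_initial_tableau(A, beta):
    """Rows of tableau (6): first all rho_i1, then all rho_i2, then theta."""
    n = len(A[0])
    plus = [list(row) + [b, 1] for row, b in zip(A, beta)]              # rho_i1
    minus = [[-a for a in row] + [-b, 1] for row, b in zip(A, beta)]    # rho_i2
    theta = [0] * n + [-1, 0]                                           # theta = -xi_0*
    return plus + minus + [theta]


def recover_solution(xi_star, xi0_star):
    """Undo the substitution (3): sigma = 1/xi_0*, xi_k = sigma * xi_k*."""
    sigma = Fraction(1) / Fraction(xi0_star)
    return [sigma * Fraction(v) for v in xi_star], sigma


# Sample system: xi_1 - 1 = 0 and xi_1 = 0 (one unknown, two equations).
tableau = chebyshev_initial_tableau(A=[[1], [1]], beta=[-1, 0])
for row in tableau:
    print(row)

# If the final tableau yields xi_1* = 1 and xi_0* = 2, the unknown is recovered as:
xi, sigma = recover_solution([1], 2)
print(xi, sigma)
```

The rows of each pair $\rho_{i1}$, $\rho_{i2}$ differ only in sign apart from the last coefficient, so relation (5) can be read off directly from the generated tableau.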
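Example 1. We will first consider a trivial example, viz. the system
$$\xi_1 = 1, \qquad \xi_1 = 0 \qquad (7)$$
with only one unknown which is supposed to satisfy the incompatible equations (7). We see immediately that the Tchebychev method gives the solution $\xi_1 = \tfrac12$. Both residuals then take the absolute value $\tfrac12$, while, for every other value of $\xi_1$, at least one of them takes a larger absolute value.
The linear programme corresponding to the system (7) is
$$
\begin{aligned}
\rho_{11} &= \xi_1^* - \xi_0^* + 1 \ge 0 \\
\rho_{12} &= -\xi_1^* + \xi_0^* + 1 \ge 0 \\
\rho_{21} &= \xi_1^* + 1 \ge 0 \\
\rho_{22} &= -\xi_1^* + 1 \ge 0 \\
\theta &= -\xi_0^* = \min
\end{aligned}
$$

1st Tableau
$$
\begin{array}{r|rrr}
 & \xi_1^* & \xi_0^* & 1 \\ \hline
\rho_{11} = & 1 & -1^* & 1 \\
\rho_{12} = & -1 & 1 & 1 \\
\rho_{21} = & 1 & 0 & 1 \\
\rho_{22} = & -1 & 0 & 1 \\ \hline
\theta = & 0 & -1 & 0
\end{array}
$$

2nd Tableau (Elimination of $\xi_0^*$)
$$
\begin{array}{r|rrr}
 & \xi_1^* & \rho_{11} & 1 \\ \hline
\xi_0^* = & 1 & -1 & 1 \\
\rho_{12} = & 0 & -1 & 2 \\
\rho_{21} = & 1 & 0 & 1 \\
\rho_{22} = & -1^* & 0 & 1 \\ \hline
\theta = & -1 & 1 & -1
\end{array}
$$

3rd Tableau (Elimination of $\xi_1^*$)
$$
\begin{array}{r|rrr}
 & \rho_{22} & \rho_{11} & 1 \\ \hline
\rho_{12} = & 0 & -1 & 2 \\
\rho_{21} = & -1 & 0 & 2 \\
\xi_1^* = & -1 & 0 & 1 \\ \hline
\theta = & 1 & 1 & -2
\end{array}
$$
This tableau is already optimal and gives $\xi_1^* = 1$, $\rho_{11} = 0$, and, from the 2nd tableau it follows that $\xi_0^* = 2$. The minimum of $\theta$ is $-2$ and therefore $\sigma_{\min} = \tfrac12$ and hence $\xi_1 = \tfrac12$. The calculation confirms the obvious result.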
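Example 2.
$$
\begin{aligned}
2\xi_1 - 4 &= 0 \\
\xi_2 - 1 &= 0 \\
\xi_1 + \xi_2 - 2 &= 0
\end{aligned}
\qquad (8)
$$
Fig. 26.
For any point $(\xi_1, \xi_2)$ of the plane, $|\xi_1 + \xi_2 - 2|$ is the distance of the point from the line $\xi_1 + \xi_2 - 2 = 0$ multiplied by $\sqrt{2}$ and similarly $|2\xi_1 - 4|$ and $|\xi_2 - 1|$ are the distances from the lines $2\xi_1 - 4 = 0$ and $\xi_2 - 1 = 0$ multiplied by 2 and 1 respectively. The problem now is to find a point $(\xi_1, \xi_2)$ which will make the largest of these distances (multiplied by the appropriate factor) as small as possible. (See Fig. 26, where the solution is denoted by x.) The corresponding linear programme is
$$
\begin{aligned}
\rho_{11} &= 2\xi_1^* - 4\xi_0^* + 1 \ge 0 \\
\rho_{12} &= -2\xi_1^* + 4\xi_0^* + 1 \ge 0 \\
\rho_{21} &= \xi_2^* - \xi_0^* + 1 \ge 0 \\
\rho_{22} &= -\xi_2^* + \xi_0^* + 1 \ge 0 \\
\rho_{31} &= \xi_1^* + \xi_2^* - 2\xi_0^* + 1 \ge 0 \\
\rho_{32} &= -\xi_1^* - \xi_2^* + 2\xi_0^* + 1 \ge 0 \\
\theta &= -\xi_0^* = \min
\end{aligned}
\qquad (9)
$$

1st Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1^* & \xi_2^* & \xi_0^* & 1 \\ \hline
\rho_{11} = & 2 & 0 & -4^* & 1 \\
\rho_{12} = & -2 & 0 & 4 & 1 \\
\rho_{21} = & 0 & 1 & -1 & 1 \\
\rho_{22} = & 0 & -1 & 1 & 1 \\
\rho_{31} = & 1 & 1 & -2 & 1 \\
\rho_{32} = & -1 & -1 & 2 & 1 \\ \hline
\theta = & 0 & 0 & -1 & 0
\end{array}
$$

2nd Tableau (elimination of $\xi_0^*$)
$$
\begin{array}{r|rrrr}
 & \xi_1^* & \xi_2^* & \rho_{11} & 1 \\ \hline
\xi_0^* = & \tfrac12 & 0 & -\tfrac14 & \tfrac14 \\
\rho_{12} = & 0 & 0 & -1 & 2 \\
\rho_{21} = & -\tfrac12^* & 1 & \tfrac14 & \tfrac34 \\
\rho_{22} = & \tfrac12 & -1 & -\tfrac14 & \tfrac54 \\
\rho_{31} = & 0 & 1 & \tfrac12 & \tfrac12 \\
\rho_{32} = & 0 & -1 & -\tfrac12 & \tfrac32 \\ \hline
\theta = & -\tfrac12 & 0 & \tfrac14 & -\tfrac14
\end{array}
$$

3rd Tableau (elimination of $\xi_1^*$)
$$
\begin{array}{r|rrrr}
 & \rho_{21} & \xi_2^* & \rho_{11} & 1 \\ \hline
\xi_1^* = & -2 & 2 & \tfrac12 & \tfrac32 \\
\rho_{12} = & 0 & 0 & -1 & 2 \\
\rho_{22} = & -1 & 0 & 0 & 2 \\
\rho_{31} = & 0 & 1 & \tfrac12 & \tfrac12 \\
\rho_{32} = & 0 & -1^* & -\tfrac12 & \tfrac32 \\ \hline
\theta = & 1 & -1 & 0 & -1
\end{array}
$$

4th Tableau (elimination of $\xi_2^*$)
$$
\begin{array}{r|rrrr}
 & \rho_{21} & \rho_{32} & \rho_{11} & 1 \\ \hline
\xi_2^* = & 0 & -1 & -\tfrac12 & \tfrac32 \\
\rho_{12} = & 0 & 0 & -1 & 2 \\
\rho_{22} = & -1 & 0 & 0 & 2 \\
\rho_{31} = & 0 & -1 & 0 & 2 \\ \hline
\theta = & 1 & 1 & \tfrac12 & -\tfrac52
\end{array}
$$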
Since all the free variables are now eliminated, this is the point at which the simplex method should begin. However, since the first part of the last row consists of all positive numbers, we already have an optimal tableau. The optimal solution is
$$\xi_2^* = \tfrac32; \qquad \rho_{11} = \rho_{21} = \rho_{32} = 0; \qquad \rho_{12} = \rho_{22} = \rho_{31} = 2.$$
From the third tableau it follows that $\xi_1^* = \tfrac92$. The minimal value of $\theta$ is $-\tfrac52$, hence $\sigma_{\min} = \tfrac25$ and finally $\xi_1 = \tfrac95$, $\xi_2 = \tfrac35$. Substituting these values in (8), we obtain the residuals $\varepsilon_1 = \varepsilon_2 = -\varepsilon_3 = -\tfrac25$. The fact that all three absolute values $|\varepsilon_i|$ are equal to $\tfrac25$ is the result of a more general rule, viz. the rank of the system of inequalities in programme (4) is equal to the rank of the matrix $(A, \beta)$ and, in our example, this is equal to 3. Hence, in a normal basic solution, three of the inequalities (9) must be satisfied with the equality sign.
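The residuals of Example 2 are easy to confirm by direct substitution. The sketch below uses exact rational arithmetic; the helper name `residuals` is illustrative only. For comparison it also evaluates the point $(2, 1)$, which satisfies the first two equations of (8) exactly but leaves a larger residual in the third.

```python
from fractions import Fraction as F

# System (8): 2*xi_1 - 4 = 0, xi_2 - 1 = 0, xi_1 + xi_2 - 2 = 0.
A = [[2, 0], [0, 1], [1, 1]]
beta = [-4, -1, -2]


def residuals(xi):
    """epsilon = A xi + beta for a candidate point xi."""
    return [sum(a * x for a, x in zip(row, xi)) + b for row, b in zip(A, beta)]


eps = residuals([F(9, 5), F(3, 5)])       # the Tchebychev solution
sigma = max(abs(e) for e in eps)
print(eps, sigma)

eps_other = residuals([F(2), F(1)])       # solves the first two equations exactly
print(eps_other, max(abs(e) for e in eps_other))
```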
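Example 3 (see Fig. 27).
$$
\begin{aligned}
\xi_1 + \xi_2 - 8 &= 0 \\
\xi_1 - \xi_2 - 2 &= 0 \\
\xi_1 + 2\xi_2 - 10 &= 0
\end{aligned}
\qquad (10)
$$

1st Tableau
$$
\begin{array}{r|rrrr}
 & \xi_1^* & \xi_2^* & \xi_0^* & 1 \\ \hline
\rho_{11} = & 1 & 1 & -8 & 1 \\
\rho_{12} = & -1 & -1 & 8 & 1 \\
\rho_{21} = & 1 & -1 & -2 & 1 \\
\rho_{22} = & -1 & 1 & 2 & 1 \\
\rho_{31} = & 1 & 2 & -10 & 1 \\
\rho_{32} = & -1 & -2 & 10 & 1 \\ \hline
\theta = & 0 & 0 & -1 & 0
\end{array}
$$
Fig. 27.
We find by elimination that
$$
\begin{aligned}
\xi_0^* &= -\tfrac18\rho_{22} + \tfrac38\xi_2^* - \tfrac18\rho_{31} + \tfrac14 \\
\xi_1^* &= -\tfrac54\rho_{22} + \tfrac74\xi_2^* - \tfrac14\rho_{31} + \tfrac32 .
\end{aligned}
$$
The 4th Tableau then becomes
$$
\begin{array}{r|rrrr}
 & \rho_{22} & \rho_{11} & \rho_{31} & 1 \\ \hline
\xi_2^* = & -1 & -4 & 3 & 2 \\
\rho_{12} = & 0 & -1 & 0 & 2 \\
\rho_{21} = & -1 & 0 & 0 & 2 \\
\rho_{32} = & 0 & 0 & -1^* & 2 \\ \hline
\theta = & \tfrac12 & \tfrac32 & -1 & -1
\end{array}
$$
At this point, we must start to use the simplex method.

5th Tableau
$$
\begin{array}{r|rrrr}
 & \rho_{22} & \rho_{11} & \rho_{32} & 1 \\ \hline
\rho_{12} = & 0 & -1 & 0 & 2 \\
\rho_{21} = & -1 & 0 & 0 & 2 \\
\rho_{31} = & 0 & 0 & -1 & 2 \\ \hline
\theta = & \tfrac12 & \tfrac32 & 1 & -3
\end{array}
$$
This tableau is optimal and we find
$$\rho_{12} = \rho_{21} = \rho_{31} = 2; \qquad \rho_{11} = \rho_{22} = \rho_{32} = 0$$
and hence $\xi_2^* = 8$, $\xi_1^* = 15$. The minimal value of $\theta$ is $-3$, hence $\sigma_{\min} = \tfrac13$ and $\xi_1 = 5$, $\xi_2 = \tfrac83$. Again all three $|\varepsilon_i|$ are equal to the same value $\tfrac13$ as they should be.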
Example 4. We start with the function $x(\tau) = \cos(\pi\tau/2)$ of a real variable $\tau$ and look for a polynomial $z(\tau) = \alpha_0 + \alpha_1\tau + \alpha_2\tau^2$ of degree at most 2 which most closely approximates to $x(\tau)$ at the points $\tau = 0, \pm 1, \pm 2$, i.e., for which the largest of the five absolute values $|z(\tau) - x(\tau)|$ ($\tau = 0, \pm 1, \pm 2$) is as small as possible. Since $x(\tau)$ is an even function, $z(\tau)$ must also be an even function and hence $\alpha_1 = 0$. Consequently, it is sufficient to consider only the values $\tau = 0, 1, 2$.
Now
$$
\begin{aligned}
z(0) - x(0) &= \alpha_0 - 1 \\
z(1) - x(1) &= \alpha_0 + \alpha_2 \\
z(2) - x(2) &= \alpha_0 + 4\alpha_2 + 1.
\end{aligned}
$$
Thus we have an approximation problem to solve for $\alpha_0$ and $\alpha_2$ using Tchebychev's method.

1st Tableau
$$
\begin{array}{r|rrrr}
 & \alpha_0^* & \alpha_2^* & \xi_0^* & 1 \\ \hline
\rho_{11} = & 1^* & 4 & 1 & 1 \\
\rho_{12} = & -1 & -4 & -1 & 1 \\
\rho_{21} = & 1 & 1 & 0 & 1 \\
\rho_{22} = & -1 & -1 & 0 & 1 \\
\rho_{31} = & 1 & 0 & -1 & 1 \\
\rho_{32} = & -1 & 0 & 1 & 1 \\ \hline
\theta = & 0 & 0 & -1 & 0
\end{array}
$$
The solution is $\alpha_0 = \tfrac34$, $\alpha_2 = -\tfrac12$, $\sigma = \tfrac14$. The required polynomial is therefore $z(\tau) = \tfrac14(3 - 2\tau^2)$. It differs from $x(\tau)$ by the absolute value $\tfrac14$ at each of the five points $\tau = 0, \pm 1, \pm 2$. Every other polynomial of degree at most 2 differs from $x(\tau)$ by more than $\tfrac14$ at at least one of the five points.
Exercises
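Use Tchebychev's method of approximation to solve the following systems of equations.
(a)
$$\xi_1 = 5, \qquad \xi_2 = 7, \qquad \xi_1 + \xi_2 = 0, \qquad \xi_1 - \xi_2 = 1$$
(b)
$$3\xi_1 + 3\xi_2 + 3\xi_3 = 90, \qquad \xi_1 + \xi_2 - \xi_3 = 20, \qquad \xi_1 - \xi_2 + \xi_3 = 20, \qquad -\xi_1 + \xi_2 + \xi_3 = 20$$
Solutions. (a) $\xi_1 = 1$, $\xi_2 = 3$. (b) $\xi_1 = \xi_2 = \xi_3 = 11$.

9.2 The Proof of Two Results used Earlier
9.2.1 Theorem 6.3;2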
In the proof of Theorem 6.3 ;2, we assumed that 9 = min max "‘i ' '~:Ar)_§iol Ak>0 l$i I‘Therefore there exists a ho such that
$$f(x_0, y_{k_0}) > \mu \quad\text{and}\quad \eta_{k_0} > 0.$$
In this case $y_{k_0}$ is inessential because, if $y$ is a mixed strategy of Min in which $y_{k_0}$ has a strictly positive weight, then it is easy to see that $f(x_0, y) > \mu$. Therefore $y$ is not optimal.
The Practical Significance of Mixed Strategies
In the applications of Game Theory, mixed strategies are usually used in the following stochastic way. (The concepts of Probability Theory which are used here may be found in [29].)
CH. 10: GAME THEORY
Let $x = (\xi_1, \ldots, \xi_m)$ and $y = (\eta_1, \ldots, \eta_n)$ be mixed strategies for Max and Min. We now think of the subscripts $i$, $k$ of the pure strategies as being independent random variables which take the values $i = 1, \ldots, m$ and $k = 1, \ldots, n$ with the probabilities $\xi_1, \ldots, \xi_m$ and $\eta_1, \ldots, \eta_n$. Then the pay-off is also a random variable which corresponds to the matrix element $\alpha_{ik}$ with probability $\xi_i\eta_k$. The expected value of the pay-off is therefore
$$g(x, y) = \sum_{i,k} \alpha_{ik}\xi_i\eta_k = \xi' A \eta = f(x, y),$$
i.e., it is equal to the value of the pay-off for the mixed strategies $x$ and $y$.
We now imagine that not just one but $N$ moves are played. At each move, Max and Min each independently choose a pure strategy at random subject to the probabilities $\xi_1, \ldots, \xi_m$ and $\eta_1, \ldots, \eta_n$ given by the mixed strategies $x$ and $y$. At each move the pay-off has the value of a matrix element $\alpha_{ik}$ (with a certain probability) and we can calculate the arithmetic mean of these values denoting it by $g_N(x, y)$. By the laws of Probability Theory, these means converge with probability 1 to the expected value as $N \to \infty$, i.e.,
$$\lim_{N \to \infty} g_N(x, y) = g(x, y) = f(x, y).$$
Example 3. In the game already considered (Example 2)
$$A = \begin{pmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 2 & 3 \\ 2 & 1 & 0 & 1 \end{pmatrix}$$
Max has 3 and Min has 4 pure strategies available. Both players can now achieve their optimal mixed strategies,
$$x = (\tfrac12, \tfrac16, \tfrac13) \quad\text{and}\quad y = (\tfrac16, \tfrac12, \tfrac13, 0)$$
by the following stochastic method. At each move, each player throws an ordinary die. Max chooses
$i = 1$ when he throws 1, 2 or 3
$i = 2$ when he throws 4
$i = 3$ when he throws 5 or 6.
Min chooses
$k = 1$ when he throws 1
$k = 2$ when he throws 2, 3 or 4
$k = 3$ when he throws 5 or 6
and he never chooses $k = 4$.
Thus at each move the fall of the die determines a pure strategy for each player. It corresponds to an element $\alpha_{ik}$ of $A$ as the value of the pay-off.
If the arithmetic mean of a sufficiently large number of these pay-offs is now calculated, then it approaches arbitrarily close to the value of the game, i.e., to $\mu = \tfrac56$.
The method of applying mixed strategies which we have described here must be used in practice with great care. As long as only a finite number of moves are played, there certainly is no guarantee that on average the players will reach the value of the game. The effective mean value will only approach within a given small distance $\varepsilon$ of the value $\mu$ with a given probability when the number of moves is sufficiently large. However, if the number of moves is given, then the mean value will differ from $\mu$ by given amounts with given probabilities.
For example, if just one move is played in the game above with the given optimal strategies, then Max has the probability 10/36 that he will only achieve the value 0. The probability that the pay-off for Min is either 1 or 2 (i.e., worse than $\mu$) is 26/36. If two moves are played, then Max has the probability 540/1296 of achieving on average only 0 or $\tfrac12$ while the probability that Min will have an average pay-off of at least 1 is 756/1296. In order to investigate this situation, it is necessary to make new approaches to the theory which are outside the scope of this book (see for example [26] chap. 13).
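The value of the game and the probabilities quoted above can be reproduced by exact enumeration. The following sketch uses only the matrix and the mixed strategies of Example 3; the variable names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

A = [[0, 1, 1, 2],
     [1, 0, 2, 3],
     [2, 1, 0, 1]]
x = [F(1, 2), F(1, 6), F(1, 3)]          # mixed strategy of Max
y = [F(1, 6), F(1, 2), F(1, 3), F(0)]    # mixed strategy of Min

# Expected pay-off xi' A eta.
value = sum(x[i] * A[i][k] * y[k] for i in range(3) for k in range(4))

# Distribution of the pay-off of a single move.
dist = {}
for i in range(3):
    for k in range(4):
        dist[A[i][k]] = dist.get(A[i][k], F(0)) + x[i] * y[k]

p_zero = dist[0]                   # Max achieves only the value 0
p_one_or_two = dist[1] + dist[2]   # pay-off worse than the value for Min

# Two independent moves: compare the sum of the two pay-offs with 1 and 2.
pairs = list(product(dist.items(), repeat=2))
p_mean_at_most_half = sum(p * q for (a, p), (b, q) in pairs if a + b <= 1)
p_mean_at_least_one = sum(p * q for (a, p), (b, q) in pairs if a + b >= 2)

print(value, p_zero, p_one_or_two, p_mean_at_most_half, p_mean_at_least_one)
```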
Problems
1. The pure strategy $i_0$ for Max is said to be strictly dominated if there is a mixed strategy $x = (\xi_1, \ldots, \xi_m)$ with $\xi_{i_0} = 0$ such that
$$\sum_{i=1}^{m} \xi_i\alpha_{ik} > \alpha_{i_0 k} \qquad (k = 1, \ldots, n). \qquad (1)$$
Show that every strictly dominated pure strategy is inessential.
2. If the '$>$' sign in (1) (Problem 1) is replaced by '$\ge$', then the strategy $i_0$ is said to be dominated. Is every dominated pure strategy inessential?
3. Prove the assertion concerning saddle points which was stated at the end of 10.1.
10.3 The Evaluation of Games by the Simplex Method
If we add the same constant $\lambda$ to all the elements of the matrix $A$, then the pay-off function $f(x, y)$ of the corresponding game becomes
$$f^*(x, y) = \xi'(A + \lambda D)\eta = \xi' A \eta + \lambda\,\xi' D \eta = f(x, y) + \lambda,$$
where $D$ is the $m \times n$ matrix all of whose elements are equal to 1, i.e., $f(x, y)$ is simply increased by the constant $\lambda$. Hence the value of the game is also increased by $\lambda$ and the optimal strategies are the same in both cases. Thus, if we know the value and the optimal strategies for the game $A$, then we also know them for the game $A + \lambda D$ and conversely.
By choosing $\lambda$ sufficiently large, we can ensure that the value $\mu + \lambda$ of $A + \lambda D$ is strictly positive. In particular this will be the case when all the elements of $A + \lambda D$ are strictly positive, because the pay-off function will then take only strictly positive values and hence the value of the game will also be strictly positive. Thus there will be no essential loss of generality for the numerical evaluation of a game if we assume that its value is strictly positive.
Thus we will consider a game with the matrix $A$ and the value $\mu > 0$. An optimal strategy $y = (\eta_1, \ldots, \eta_n)$ for Min satisfies the inequalities
$$A\eta \le \boldsymbol{\mu}, \qquad (1)$$
where $\boldsymbol{\mu}$ is the $m \times 1$ matrix all of whose elements are equal to $\mu$. Indeed $\mu$ is the least real number for which there exists $\eta$ such that both (1) and
$$\sum_{k=1}^{n} \eta_k = 1, \qquad \eta_k \ge 0 \quad (k = 1, \ldots, n) \qquad (2)$$
are satisfied. If there were a smaller number $\mu_1 < \mu$ with this property, then
$$f(x, y) = \xi' A \eta \le \xi'\boldsymbol{\mu}_1 = \mu_1$$
for every mixed strategy of Max, so that the value of the game would be less than or equal to $\mu_1$.
The relations (1) and (2) together with $\mu = \min$ constitute a linear programme, the solution of which will give the optimal strategies for Min and the value $\mu$. As in 9.1, it is useful to modify the programme by dividing by $\mu$. Putting $\eta_k^* = \eta_k/\mu$, we have
$$\sigma = -A\eta^* + \mathbf{1} \ge 0, \qquad \eta^* \ge 0, \qquad \theta = -\sum_{k=1}^{n} \eta_k^* = -\frac{1}{\mu} = \min \qquad (3)$$
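The initial tableau of the simplex method for the definite programme (3) is
$$
\begin{array}{r|cc}
 & \eta_1^* \ \cdots \ \eta_n^* & 1 \\ \hline
\sigma_1, \ldots, \sigma_m = & -A & 1 \\
\theta = & -1 \ \cdots \ -1 & 0
\end{array}
\qquad (4)
$$
Suppose that the final tableau is
$$
\begin{array}{r|cc}
 & \cdots & 1 \\ \hline
\vdots \ = & A^* & \beta^* \\
\theta = & \gamma^{*\prime} & \delta^*
\end{array}
$$
where $\beta_i^* \ge 0$ ($i = 1, \ldots, m$), $\gamma_k^* \ge 0$ ($k = 1, \ldots, n$) and …
… and
$$
\begin{vmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{12} & \alpha_{22} \end{vmatrix}
= \alpha_{11}\alpha_{22} - \alpha_{12}^2 > 0.
$$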
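Example 4. Let $q(x) = \xi_1^2 + 2\xi_2^2 + \xi_3^2 + 4\xi_1\xi_2 + 8\xi_1\xi_3 + 2\xi_2\xi_3 = \xi' A \xi$, where
$$A = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}$$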
11.1: QUADRATIC FORMS ON REAL SPACES
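1st Tableau
$$
\begin{array}{r|rrr}
 & \xi_1 & \xi_2 & \xi_3 \\ \hline
\zeta_1 = \eta_1 = & 1^* & 2 & 4 \\
\eta_2 = & 2 & 2 & 1 \\
\eta_3 = & 4 & 1 & 1
\end{array}
$$

2nd Tableau
$$
\begin{array}{r|rr}
 & \xi_2 & \xi_3 \\ \hline
\zeta_2 = \eta_2' = & -2^* & -7 \\
\eta_3' = & -7 & -15
\end{array}
$$
Note that $\eta_2'$ and $\eta_3'$ are not identical with the original $\eta_2$ and $\eta_3$.

3rd Tableau
$$
\begin{array}{r|r}
 & \xi_3 \\ \hline
\zeta_3 = \eta_3'' = & \tfrac{19}{2}
\end{array}
$$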
In the calculations it is possible to make use of the fact that all the tableaux are symmetric. For instance, in the 2nd tableau, only one of the coefficients $-7$ needs to be calculated; the other follows by symmetry. Thus $q(x)$ is put into the reduced form
$$q(x) = \zeta_1^2 - \tfrac{1}{2}\zeta_2^2 + \tfrac{2}{19}\zeta_3^2 \tag{8}$$
by the transformation
$$\begin{aligned} \zeta_1 &= \xi_1 + 2\xi_2 + 4\xi_3 \\ \zeta_2 &= -2\xi_2 - 7\xi_3 \\ \zeta_3 &= \tfrac{19}{2}\xi_3 \end{aligned} \tag{9}$$
The coefficients of $\zeta_k^2$ are the reciprocals of the pivots. From (8) it follows that $q$ is not positive definite.
The equations (9) are particularly easy to solve for the $\xi_k$ because the matrix of coefficients is triangular (i.e., all elements below the main diagonal are equal to zero). We first solve the 3rd equation for $\xi_3$, then the 2nd for $\xi_2$, and finally the 1st for $\xi_1$ to obtain
$$\begin{aligned} \xi_1 &= \zeta_1 + \zeta_2 + \tfrac{6}{19}\zeta_3 \\ \xi_2 &= -\tfrac{1}{2}\zeta_2 - \tfrac{7}{19}\zeta_3 \\ \xi_3 &= \tfrac{2}{19}\zeta_3 \end{aligned} \tag{10}$$
We can easily verify (8) by substituting (10) in the original representation of $q(x)$.
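The reduction of Example 4 can be checked mechanically. The following is a minimal sketch, not the book's exchange tableau itself: it assumes that using the diagonal elements in their natural order as pivots amounts to symmetric elimination, and the names `reduce_by_pivots`, `q_original` and `q_reduced` are illustrative only.

```python
from fractions import Fraction

# Matrix of the quadratic form in Example 4.
A = [[1, 2, 4],
     [2, 2, 1],
     [4, 1, 1]]

def reduce_by_pivots(a):
    """Eliminate with the diagonal pivots in natural order.

    Returns (pivots, rows); rows[k] holds the coefficients of zeta_k
    as a linear form in xi_1, ..., xi_n (the pivot row at step k).
    """
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    pivots, rows = [], []
    for k in range(n):
        p = m[k][k]
        if p == 0:
            raise ZeroDivisionError("zero pivot: natural order not usable")
        pivots.append(p)
        rows.append(m[k][:])
        for i in range(k + 1, n):
            factor = m[i][k] / p
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return pivots, rows

def q_original(a, xi):
    """Value of xi' A xi."""
    n = len(a)
    return sum(Fraction(a[i][j]) * xi[i] * xi[j]
               for i in range(n) for j in range(n))

def q_reduced(pivots, rows, xi):
    """Value of sum_k zeta_k^2 / pivot_k with zeta_k = rows[k] . xi."""
    total = Fraction(0)
    for p, row in zip(pivots, rows):
        zeta = sum(c * x for c, x in zip(row, xi))
        total += zeta * zeta / p
    return total

pivots, rows = reduce_by_pivots(A)
xi = [Fraction(1), Fraction(2), Fraction(3)]
print(pivots)
print(q_original(A, xi), q_reduced(pivots, rows, xi))
```

Exact rational arithmetic is used so that the pivots and the two values of the form can be compared without rounding.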
CH. 11: FORMS OF THE SECOND DEGREE
The equations (10) could also be found by doing the reduction with the full tableaux (see Example 12.1;4).
It is also easy to find the basis vectors $f_1, f_2, f_3$ to which the components $\zeta_1, \zeta_2, \zeta_3$ are referred. For, if we write (9) in the form $\zeta = S\xi$, then $f = S^* e = (S^{-1})' e$ where $\{e_1, e_2, e_3\}$ is the original basis (Theorem 5.4;2). Now $S^{-1}$ can be found from (10) and we have
$$f_1 = e_1, \qquad f_2 = e_1 - \tfrac{1}{2}e_2, \qquad f_3 = \tfrac{6}{19}e_1 - \tfrac{7}{19}e_2 + \tfrac{2}{19}e_3.$$
The method can obviously be applied in the way just shown (using the diagonal elements in their natural order as pivots) whenever $\Delta_1, \ldots, \Delta_n$ are not zero and in particular for positive definite forms. The matrix $S$ in (9) is then always a triangular matrix. Since the matrix of the reduced form is diagonal (i.e., all elements off the main diagonal are zero) and in view of the remark after the proof of Theorem 1, we have the following result.
Theorem 5. If, for a real symmetric matrix $A$, the determinants $\Delta_1, \ldots, \Delta_n$ are not zero, then there exists a non-singular real triangular matrix $S$ such that $A^* = S'AS$ is diagonal. (See Problem 2.)
11.1.4 The Inertia Theorem
If the reduced quadratic form $q(x) = \sum_{k=1}^{n} \lambda_k \zeta_k^2$ is transformed by substituting
$$\zeta_k = \frac{\tilde\zeta_k}{\sqrt{|\lambda_k|}} \ \text{ if } \lambda_k \ne 0 \qquad \text{and} \qquad \zeta_k = \tilde\zeta_k \ \text{ if } \lambda_k = 0,$$
then $q(x)$ becomes
$$q(x) = \sum_{k=1}^{n} \rho_k \tilde\zeta_k^2 \quad \text{where} \quad \rho_k = \pm 1 \text{ or } 0 \quad (k = 1, \ldots, n). \tag{11}$$
Of course, without carrying out the reduction, it is not possible to say which $\rho_k$ are equal to $+1$, $-1$ or $0$. However we do have the following Inertia Theorem which is due to Sylvester.
Theorem 6. If the quadratic form $q(x)$ is represented in the form (11), the numbers of positive and negative terms are uniquely determined by $q(x)$. In particular, these numbers are independent of the method of reduction and the transformation of the variables used to reach the form (11).
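Proof. If the theorem were false, then there would be a quadratic form with two representations
$$q(x) = \xi_1^2 + \ldots + \xi_p^2 - \xi_{p+1}^2 - \ldots - \xi_r^2 = \eta_1^2 + \ldots + \eta_q^2 - \eta_{q+1}^2 - \ldots - \eta_s^2,$$
where $p > q$. It could happen that no negative terms appear in either or both of these representations. On the other hand, we can assume that some positive terms actually appear in the first representation. Suppose that the components $\xi_k$ refer to the basis $\{e_1, \ldots, e_n\}$, and $\eta_k$ refer to the basis $\{f_1, \ldots, f_n\}$. Let $L_1 = L(e_1, \ldots, e_p)$ and $L_2 = L(f_{q+1}, \ldots, f_n)$. Since the dimensions of $L_1$ and $L_2$ add up to $p + (n - q) > n$, by Theorem 3.2;8 there is an $x_0 \in L_1 \cap L_2$, $x_0 \ne 0$. For this vector $x_0$, $\xi_{p+1} = \ldots = \xi_n = \eta_1 = \ldots = \eta_q = 0$, while not all of $\xi_1, \ldots, \xi_p$ are zero. From the first representation of $q$, it follows that $q(x_0) > 0$ and from the second that $q(x_0) \le 0$. Thus we have reached a contradiction.
Exercise
Reduce the following two quadratic forms to canonical form and state the corresponding transformation of the variables. Are these forms positive definite?
(a) $q(x) = \xi_1^2 + 3\xi_2^2 + 5\xi_3^2 - 2\xi_1\xi_2 + 4\xi_1\xi_3 - 6\xi_2\xi_3$
(b) $q(x) = 2\xi_1^2 + 3\xi_2^2 + 8\xi_3^2 + 2\xi_1\xi_2 - 8\xi_1\xi_3 + 6\xi_2\xi_3$.
Solution. (a) $q(x) = \zeta_1^2 + \tfrac{1}{2}\zeta_2^2 + 2\zeta_3^2$ where
$$\begin{aligned} \zeta_1 &= \xi_1 - \xi_2 + 2\xi_3 \\ \zeta_2 &= 2\xi_2 - \xi_3 \\ \zeta_3 &= \tfrac{1}{2}\xi_3 \end{aligned}$$
$q(x)$ is positive definite.
(b) $q(x) = \tfrac{1}{2}\zeta_1^2 + \tfrac{2}{5}\zeta_2^2 - \tfrac{1}{10}\zeta_3^2$ where
$$\begin{aligned} \zeta_1 &= 2\xi_1 + \xi_2 - 4\xi_3 \\ \zeta_2 &= \tfrac{5}{2}\xi_2 + 5\xi_3 \\ \zeta_3 &= -10\xi_3 \end{aligned}$$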
$q(x)$ is not positive definite.
Problems
1. Let $A$ and $S$ be $n \times n$ matrices. Show that, if $A$ is symmetric, then so is $S'AS$. (Cf. Theorem 1.)
2. Show that Theorem 5 is not true for the symmetric matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
3. Prove that, if $A$ is a real $n \times n$ matrix and $\det A \ne 0$, then the quadratic form $q(x) = \xi' A A' \xi$ is positive definite.
4. Suppose that, for the real $n \times n$ matrix $A$, the determinants $\Delta_1, \ldots, \Delta_n$ (cf. Theorem 4) are all strictly positive. Prove that all the diagonal elements $\alpha_{kk}$ are also strictly positive.
5. Suppose that the quadratic form $q(x)$ has the reduced form (6) with respect to a given basis. What is the polar form of $q(x)$ with respect to the same basis?
6. Prove that, if $A$ is a real symmetric matrix, then there is a real number $\lambda$ such that the quadratic form $\xi'(A + \lambda I)\xi$ is positive definite.
7. If $A$ is a real symmetric matrix, show that $\xi' A \xi$ is positive definite if and only if $\xi' A^{-1} \xi$ is positive definite.
8. Prove that the symmetric matrix $A$ of the quadratic form $q(x) = \xi' A \xi$ with respect to a given basis is uniquely determined by $q$.
9. Let $f(x, y) = \xi' A \eta$ be a bilinear form on the vector space $E$. Prove that $f(x, x) = 0$ for all $x \in E$ if and only if $f(x, y) = -f(y, x)$ for all $x, y \in E$, i.e., if $A = -A'$.
11.2 Hermitian Forms on Complex Vector Spaces
Part of the discussion in 11.1 can be carried over directly to the case of complex vector spaces. However, we meet with difficulties, for example when we come to the definition of positive definite quadratic forms. Apart from this, it is preferable on formal grounds to proceed somewhat differently in the complex case, as will be described in the present section. In this, we will not repeat proofs which are simple generalizations of those in 11.1. If $\alpha$ is a complex number, we will denote the complex conjugate of $\alpha$ by $\bar\alpha$.
11.2.1 Quasi-bilinear Forms
Definition 1. A ‘quasi-bilinear form’ on a complex vector space $E$ is a complex-valued function $f(x, y)$ of two variables $x, y \in E$ which is linear in $x$ and which is dependent on $y$ according to the rules
$$f(x, y_1 + y_2) = f(x, y_1) + f(x, y_2) \qquad \text{and} \qquad f(x, \alpha y) = \bar\alpha f(x, y)$$
for all $x, y, y_1, y_2 \in E$ and all scalars $\alpha$.
Example 1. We use the same notation as in Example 11.1;1, except that now $x(\sigma)$, $y(\tau)$ and $k(\sigma, \tau)$ are complex-valued functions of the real variables $\sigma, \tau$. Then
$$f(x, y) = \int_{-1}^{+1}\!\!\int_{-1}^{+1} k(\sigma, \tau)\, x(\sigma)\, \bar y(\tau)\, d\sigma\, d\tau$$
is a quasi-bilinear form.
If $E$ is finite-dimensional, the representation 11.1;(1) corresponds to the following representation of the quasi-bilinear form $f(x, y)$:
$$f(x, y) = \sum_{i,k} \alpha_{ik}\, \xi_i \bar\eta_k = \xi' A \bar\eta, \quad \text{where } \alpha_{ik} = f(e_i, e_k) \tag{1}$$
and where $\bar\eta$ means the matrix whose elements are the complex conjugates of the elements of $\eta$. For a given basis, $f$ and $A$ again determine each other uniquely. Under a change of basis $\xi = S\xi^*$, $A$ goes into $A^* = S' A \bar S$.
Definition 2. The quasi-bilinear form is said to be ‘Hermitian’ if
$$f(y, x) = \bar f(x, y).$$
(We write $\bar f(x, y)$ instead of $\overline{f(x, y)}$.)
We note that the property $f(x, \alpha y) = \bar\alpha f(x, y)$ of Definition 1 is a consequence of the linearity with respect to $x$ and the condition $f(y, x) = \bar f(x, y)$. ($f(x, \alpha y) = \bar f(\alpha y, x) = \bar\alpha \bar f(y, x) = \bar\alpha f(x, y)$.) The quasi-bilinear form in Example 1 is Hermitian if and only if $k(\tau, \sigma) = \bar k(\sigma, \tau)$.
Theorem 1. If the complex vector space $E$ is finite-dimensional, then $f(x, y) = \xi' A \bar\eta$ is Hermitian if and only if $A' = \bar A$. A matrix $A$ with this last property is also said to be Hermitian.
The simple proof of this is the content of Problem 5.
11.2.2 Quadratic and Hermitian Forms
Definition 3. A complex-valued function $q(x)$ on a complex vector space $E$ is said to be a ‘quadratic form’ if there is a quasi-bilinear form $f(x, y)$ (the ‘polar form’ of $q$) such that $q(x) = f(x, x)$.
Theorem 2. A quadratic form and its polar form on a complex vector space determine each other uniquely.
Proof. $f(x, y) = \tfrac{1}{4}\left[q(x + y) + i\,q(x + iy) - q(x - y) - i\,q(x - iy)\right]$.
Definition 4. A quadratic form on a complex vector space is said to be ‘Hermitian’ if its polar form is Hermitian. We then speak briefly of a ‘Hermitian form’.
Thus, in view of Example 1, a Hermitian form is given by
$$q(x) = \int_{-1}^{+1}\!\!\int_{-1}^{+1} k(\sigma, \tau)\, x(\sigma)\, \bar x(\tau)\, d\sigma\, d\tau$$
whenever $k(\sigma, \tau) = \bar k(\tau, \sigma)$.
If $E$ is finite-dimensional, then Hermitian forms may be written in the form
$$q(x) = \xi' A \bar\xi \tag{2}$$
where $A$ is Hermitian. Conversely (2) always gives a Hermitian form whenever $A$ is Hermitian. For a given basis, $q$ and $A$ determine each other uniquely.
Theorem 3. A quadratic form $q(x)$ on a complex vector space is Hermitian if and only if it only takes real values.
Proof. 1. If $q$ is Hermitian, then $q(x) = f(x, x) = \bar f(x, x) = \bar q(x)$.
2. If $q$ only takes real values, then in particular $q(x \pm y)$ and $q(x \pm iy)$ are real. Now if, in the proof of Theorem 2, we interchange $x$ and $y$, $q(x + y)$ and $q(x - y)$ do not change while $q(x + iy)$ and $q(x - iy)$ interchange. Therefore $f(y, x) = \bar f(x, y)$.
11.2.3 The Reduction of Hermitian Forms
In this section we will formulate the theorems corresponding to the results of 11.1.3. We will omit the proofs which are simple generalizations of those in 11.1.3.
Theorem 4. Given a Hermitian form $q(x)$ on a finite-dimensional complex vector space, then there is a basis with respect to which $q(x)$ takes the form
$$q(x) = \sum_{k=1}^{n} \lambda_k \xi_k \bar\xi_k = \sum_{k=1}^{n} \lambda_k |\xi_k|^2 \tag{3}$$
where the coefficients $\lambda_k$ are real.
A Hermitian form is said to be positive definite if it never takes negative values and if it only takes the value 0 for $x = 0$. Clearly the Hermitian form (3) is positive definite if and only if all the coefficients $\lambda_k$ are strictly positive. The determinants $\Delta_1, \ldots, \Delta_n$ (defined as in 11.1.3) are real for a Hermitian matrix $A$ (see Problem 4).
Theorem 5. If $q(x)$ is a Hermitian form and the determinants $\Delta_1, \ldots, \Delta_n$ are not zero, then there is a basis such that
$$q(x) = \Delta_1 |\xi_1|^2 + \frac{\Delta_2}{\Delta_1} |\xi_2|^2 + \ldots + \frac{\Delta_n}{\Delta_{n-1}} |\xi_n|^2.$$
Theorem 6. A Hermitian form is positive definite if and only if $\Delta_1, \ldots, \Delta_n$ are strictly positive.
Theorem 7. (‘Inertia Theorem’) If a Hermitian form is written in two different ways in the form (3), then the numbers of positive, negative and zero coefficients are the same in both forms.
Problems
1. Show that $A\bar A'$ is Hermitian for all matrices $A$.
2. Show that, if $A$ is a square matrix and $\det A \ne 0$, then $\xi' A \bar A' \bar\xi$ is positive definite.
3. Show that, if $A$ is Hermitian, then so is $A^{-1}$ (providing $A^{-1}$ exists).
4. Prove that, if $A$ is Hermitian, then $\det A$ is real.
5. Prove Theorem 11.2;1.
6. Let $f(x, y) = \xi' A \bar\eta$ be a quasi-bilinear form on a complex vector space $E$. Prove that $f(x, x) = 0$ for all $x \in E$ if and only if $f(x, y) = 0$ for all $x, y \in E$, i.e., if $A = 0$.
7. If $A$ is a Hermitian matrix, show that $S' A \bar S$ is also Hermitian, where $S$ is an arbitrary matrix for which the product is defined.
8. Adapt the reduction method described in 11.1.3 so that it can be applied to Hermitian forms and use it to prove Theorems 4, 5 and 6.
CHAPTER 12
EUCLIDEAN AND UNITARY VECTOR SPACES
As in the previous chapter, we will first consider the real case (Euclidean spaces) in sections 12.1 to 12.3 and then the complex case (Unitary spaces) in section 12.4.
12.1 Euclidean Vector Spaces
12.1.1 The Concept of a Euclidean Vector Space
Definition 1. A Euclidean vector space is a real vector space together with a positive definite quadratic form which is known as the ‘fundamental form’ of the space.
If $q(x)$ is the fundamental form of the Euclidean space $E$, then we refer to the real number $\|x\| = +\sqrt{q(x)}$ as the length or the norm of the vector $x \in E$. The zero vector is the only vector which has norm 0. For all scalars $\alpha$ and all $x \in E$, $\|\alpha x\| = |\alpha|\,\|x\|$. A vector $x \in E$ is said to be normalized if $\|x\| = 1$. If $y \in E$ and $y \ne 0$, then $x = \|y\|^{-1} y$ is normalized.
The polar form of the fundamental form $q(x)$, which we will now write briefly as $(x, y)$, is known as the scalar product of the vectors $x$ and $y$. Thus the scalar product is a symmetric bilinear form on $E$.
Example 1. The vectors of the 3-dimensional space of Euclidean geometry form a Euclidean vector space when $\|x\|$ is defined to be the length of the vector $x$ in the usual sense. It is from this example that the name ‘Euclidean vector space’ is derived. (See Problem 1.)
Example 2. The real $n$-dimensional space $R_n$ is a Euclidean vector space when
$$\|x\|^2 = \sum_{k=1}^{n} \xi_k^2 \quad \text{and hence} \quad (x, y) = \sum_{k=1}^{n} \xi_k \eta_k.$$
Example 3. The vector space $C$ (Example 2.1;7) is a Euclidean vector space when
$$\|x\|^2 = \int_{-1}^{+1} [x(\tau)]^2\, d\tau$$
(see Example 11.1;3). The scalar product is $(x, y) = \int_{-1}^{+1} x(\tau)\, y(\tau)\, d\tau$.
Theorem 1. (Cauchy-Schwarz Inequality.) For all vectors $x, y$ of a Euclidean vector space
$$|(x, y)| \le \|x\|\,\|y\|.$$
Equality holds if and only if $x$ and $y$ are linearly dependent.
Proof. 1. If $x$ and $y$ are linearly dependent, $y = \lambda x$ say, then both sides are equal to $|\lambda|\,\|x\|^2$.
2. If $x$ and $y$ are linearly independent, then we think of
$$q(\lambda x + \mu y) = (\lambda x + \mu y, \lambda x + \mu y) = \|x\|^2 \lambda^2 + 2(x, y)\lambda\mu + \|y\|^2 \mu^2$$
as a quadratic form in the variables $\lambda, \mu$. This is positive definite, and therefore
$$\begin{vmatrix} \|x\|^2 & (x, y) \\ (x, y) & \|y\|^2 \end{vmatrix} = \|x\|^2 \|y\|^2 - (x, y)^2 > 0$$
(see Theorem 11.1;4).
In view of the Cauchy-Schwarz inequality, it is possible to define the angle $\alpha$ between two vectors $x$ and $y$ ($\ne 0$) by the formula $\cos\alpha = (x, y)/(\|x\|\,\|y\|)$.
Theorem 2. (Triangle Inequality.) For all vectors $x, y$ of a Euclidean vector space
$$\|x\| - \|y\| \le \|x + y\| \le \|x\| + \|y\|.$$
Proof. By Theorem 1,
$$\|x + y\|^2 = \|x\|^2 + 2(x, y) + \|y\|^2 \le (\|x\| + \|y\|)^2$$
and the second inequality follows from this (remembering that both $\|x + y\|$ and $\|x\| + \|y\|$ are positive or zero). The first inequality follows from the second because
$$\|x\| = \|(x + y) - y\| \le \|x + y\| + \|y\|.$$
The vectors $x$ and $y$ are said to be orthogonal if $(x, y) = 0$. Since $(x, y)$ is a symmetric bilinear form, orthogonality is a symmetric relation between vectors.
The zero vector is the only vector which is orthogonal to all vectors $x \in E$. This means that every Euclidean vector space $E$ forms a dual pair of spaces with itself using the scalar product $(x, y)$ (see Definition 6.2;3). If $E$ is finite-dimensional, the definitions and theorems of sections 6.2.3 to 6.2.6 can be brought directly into the present work.
In particular the dual space, or as we will now say the orthogonal complement, $L^\perp$ of a subspace $L$ of $E$ consists of all those vectors $y \in E$ which are orthogonal to every vector $x \in L$. By Theorem 6.2;6, $L^{\perp\perp} = L$ (for finite-dimensional spaces) and the sum of the dimensions of $L$ and $L^\perp$ is equal to the dimension of $E$. On the other hand, $L \cap L^\perp = \{0\}$, because any vector $x$ in this intersection is orthogonal to itself and, if $\|x\|^2 = 0$, then $x = 0$. From these we have the following result. (Cf. Theorems 2.4;10 and 3.2;6.)
Theorem 3. Let $E$ be a finite-dimensional Euclidean vector space, $L$ a subspace of $E$ and $L^\perp$ its orthogonal complement. Then $E = L \oplus L^\perp$, i.e., for each $x \in E$ there is a unique representation $x = x_1 + x_2$ where $x_1 \in L$ and $x_2 \in L^\perp$.
Theorem 4. In a Euclidean vector space $E$, the vectors $a_1, \ldots, a_r \in E$ are linearly dependent if and only if Gram's determinant
$$G = \big|(a_i, a_k)\big| \quad (i, k = 1, \ldots, r)$$
is equal to zero (see Problem 2).
The elements of Gram's determinant are the scalar products of the vectors $a_1, \ldots, a_r$ taken in pairs.
Proof. If $\sum_{i=1}^{r} \lambda_i a_i = 0$, then
$$\Big(\sum_i \lambda_i a_i,\ a_k\Big) = \sum_i \lambda_i (a_i, a_k) = 0 \quad \text{for } k = 1, \ldots, r. \tag{1}$$
Conversely (1) means that $\sum_i \lambda_i a_i \in [L(a_1, \ldots, a_r)]^\perp$. On the other hand $\sum_i \lambda_i a_i \in L(a_1, \ldots, a_r)$. Hence by Theorem 3, it follows from (1) that $\sum_i \lambda_i a_i = 0$. Thus a linear relation between $a_1, \ldots, a_r$ is equivalent to the same relation between the rows of Gram's determinant and the theorem follows from Theorem 4.2;7.
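Theorem 4 lends itself to a direct numerical check. The sketch below is illustrative only: the helper names (`det`, `gram_determinant`, `dot`, `integral_product`) are assumptions, vectors of $R_3$ are given by their components, and polynomials are represented by coefficient lists with the scalar product $\int_{-1}^{+1} x(\tau) y(\tau)\, d\tau$ of Example 3.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    d = Fraction(1)
    for k in range(n):
        piv = next((i for i in range(k, n) if m[i][k] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != k:
            m[k], m[piv] = m[piv], m[k]
            d = -d
        d *= m[k][k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return d

def dot(x, y):
    """Scalar product of Example 2 (components in R_n)."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

def integral_product(p, q):
    """Integral of p*q over [-1, +1] for coefficient lists p, q."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += Fraction(pi) * Fraction(qj) * Fraction(2, i + j + 1)
    return total

def gram_determinant(vectors, scalar_product):
    return det([[scalar_product(a, b) for b in vectors] for a in vectors])

independent = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
dependent = [(1, 2, 3), (2, 4, 6), (0, 1, 0)]   # second = 2 * first
monomials = [[1], [0, 1], [0, 0, 1], [0, 0, 0, 1]]  # 1, t, t^2, t^3

print(gram_determinant(independent, dot))
print(gram_determinant(dependent, dot))
print(gram_determinant(monomials, integral_product))
```

The determinant vanishes exactly for the dependent triple and is non-zero for the other two sets, in line with the theorem.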
12.1.2 Orthogonalization
Definition 2. A subset $A$ of a Euclidean vector space $E$ is said to be ‘orthogonal’ if any two of its elements are orthogonal. It is said to be ‘orthonormal’ if, in addition, its elements are normalized.
Theorem 5. Any orthogonal set $A$ which does not contain the zero vector is linearly independent. Thus in particular an orthonormal set is linearly independent.
Proof. If $\sum_{x \in A} \lambda_x x = 0$, then, for any $y \in A$, it follows from the conditions of the theorem that $\sum_{x \in A} \lambda_x (x, y) = \lambda_y (y, y) = 0$, and hence $\lambda_y = 0$.
Theorem 6. Let $E$ be a Euclidean vector space and let $A$ be a finite or countably infinite subset of $E$. Then the subspace $L(A)$ of $E$ has an orthonormal basis. In particular every finite-dimensional Euclidean vector space has an orthonormal basis.
Proof. Since $L(A)$ has a finite or countably infinite basis (Theorem 3.1;10), we will not lose any generality by assuming that the set $A$ is linearly independent. We will prove the theorem by using the Schmidt Orthogonalization Method which will also produce a technique for constructing an orthonormal basis for $L(A)$.
Suppose that the vectors of the linearly independent set $A$ are $a_1, a_2, a_3, \ldots$. We first construct an orthogonal basis $e_1, e_2, e_3, \ldots$ by setting
$$\begin{aligned} e_1 &= a_1 \\ e_2 &= \lambda_{21} e_1 + a_2 \\ e_3 &= \lambda_{31} e_1 + \lambda_{32} e_2 + a_3, \ \text{etc.,} \end{aligned} \tag{2}$$
where the coefficients are to be determined as follows. Since $(e_1, e_2) = 0$, it follows that
$$(e_1, \lambda_{21} e_1 + a_2) = \lambda_{21}\|e_1\|^2 + (e_1, a_2) = 0$$
and, since $\|e_1\| = \|a_1\| \ne 0$, $\lambda_{21}$ can be determined from this equation. Since $(e_1, e_3) = 0$, it follows similarly that
$$(e_1, \lambda_{31} e_1 + \lambda_{32} e_2 + a_3) = \lambda_{31}\|e_1\|^2 + \lambda_{32}(e_1, e_2) + (e_1, a_3) = \lambda_{31}\|e_1\|^2 + (e_1, a_3) = 0$$
and $\lambda_{31}$ can be found from this. The condition $(e_2, e_3) = 0$ similarly leads to the equation $\lambda_{32}\|e_2\|^2 + (e_2, a_3) = 0$ for $\lambda_{32}$. (Since $a_1$ and $a_2$ may be written as linear combinations of $e_1$ and $e_2$ from (2), it follows that $e_1$ and $e_2$ are linearly independent and hence that $\|e_2\| \ne 0$.)
It is clear that this Schmidt Orthogonalization Method can be continued in such a way that a new vector $e_k$ is assigned to each vector $a_k$. In view of the method of construction, any two of these vectors $e_k$ will be orthogonal. The set $e_1, e_2, e_3, \ldots$ is a basis of $L(A)$ because, for any $r$, the vectors $a_1, \ldots, a_r$ may be written as linear combinations of $e_1, \ldots, e_r$ from (2). We can now obtain an orthonormal basis simply by normalizing the vectors $e_k$, i.e., by replacing $e_k$ with $\|e_k\|^{-1} e_k$.
We remark that the proof also shows that the vectors $e_k$ are uniquely determined by the formulae (2) and the condition of orthogonality. It is easy to see that the vectors $e_k$ are uniquely determined except for scalar factors, if we require any two to be orthogonal and each $e_k$ to be a linear combination of $a_1, \ldots, a_k$ $(k = 1, 2, 3, \ldots)$.
With a view to producing a concise technique for calculation, we will now show that the Schmidt Method can be carried out by using the Exchange Method. We first prove the following result.
Theorem 7. A basis $B$ of a Euclidean vector space $E$ is orthogonal if and only if the fundamental form is completely reduced with respect to this basis, i.e.,
$$\|x\|^2 = q(x) = \sum_{e \in B} \lambda_e \xi_e^2. \tag{3}$$
Further, $B$ is orthonormal if and only if $\lambda_e = 1$ for all $e \in B$.
Proof. 1. For any basis $B$, we have
$$q(x) = (x, x) = \sum_{e \in B} \sum_{f \in B} (e, f)\, \xi_e \xi_f. \tag{4}$$
Now, if $B$ is orthogonal, then $(e, f) = 0$ for $e \ne f$ and hence we have (3) with $\lambda_e = (e, e)$. Further, if $B$ is orthonormal, then $\lambda_e = (e, e) = 1$ for all $e \in B$.
2. From (3) and (4), it follows that $(e, f) = 0$ for $e \ne f$ and hence that $B$ is orthogonal (see Problem 11.1;8). Further $\lambda_e = (e, e)$ for all $e \in B$ so that $B$ is orthonormal when all $\lambda_e = 1$.
Thus, in view of Theorem 7, we can obtain an orthogonal basis by reducing the fundamental form. The Reduction Method of 11.1.3 can be easily extended to a countably infinite number of variables simply by allowing tableaux with a countably infinite number of rows and columns. Except for scalar factors, this method will give the same orthogonal basis as the Schmidt Method, because again each $e_k$ is a linear combination of $a_1, \ldots, a_k$ $(k = 1, 2, 3, \ldots)$. The norms $\|e_k\|$, which are used to normalize the basis vectors $e_k$, also appear in the method, because, from (3), we have $\|e_k\| = +\sqrt{\lambda_k}$. But $\lambda_k = 1/\delta_k$ where $\delta_k$ is the $k$th pivot. Thus the vector $e_k$ can be normalized by multiplying by $+\sqrt{\delta_k}$.
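The formulae (2) translate almost line by line into a short routine. The following is a sketch under stated assumptions rather than a definitive implementation: polynomials are represented by coefficient lists of equal length, the scalar product is the integral over $[-1, +1]$ of Example 3, and the names `scalar_product` and `schmidt` are illustrative. The vectors $e_k$ are left with coefficient 1 on $a_k$, as in (2), and are not normalized.

```python
from fractions import Fraction

def scalar_product(p, q):
    """Integral of p*q over [-1, +1] for coefficient lists p, q."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += Fraction(pi) * Fraction(qj) * Fraction(2, i + j + 1)
    return total

def schmidt(vectors):
    """Orthogonalize as in (2): e_k = sum_j lambda_kj e_j + a_k."""
    basis = []
    for a in vectors:
        e = [Fraction(c) for c in a]
        for b in basis:
            # lambda_kj is fixed by lambda_kj * ||e_j||^2 + (e_j, a_k) = 0
            lam = -scalar_product(b, a) / scalar_product(b, b)
            e = [ec + lam * bc for ec, bc in zip(e, b)]
        basis.append(e)
    return basis

# a_0 = 1, a_1 = t, a_2 = t^2, a_3 = t^3 as coefficient lists
monomials = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
orthogonal = schmidt(monomials)
squared_norms = [scalar_product(e, e) for e in orthogonal]

for e, n2 in zip(orthogonal, squared_norms):
    print(e, n2)
```

With this scaling the squared norms of the $e_k$ come out as $2, \tfrac23, \tfrac{8}{45}, \tfrac{8}{175}$; a basis produced by the Reduction Method may differ from this one by scalar factors, as remarked above.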
Example 4. The space $P_3$ (Example 2.1;5) is Euclidean with the fundamental form
$$\|x\|^2 = q(x) = \int_{-1}^{+1} [x(\tau)]^2\, d\tau$$
since, if a polynomial $x(\tau)$ vanishes on $-1 \le \tau \le +1$, it vanishes for all $\tau$ and therefore $x = 0$. A basis of $P_3$ is given by $a_0(\tau) = 1$; $a_1(\tau) = \tau$; $a_2(\tau) = \tau^2$; $a_3(\tau) = \tau^3$ (see Example 3.1;5). With respect to this basis, we have by (4) that
$$q(x) = \sum_{i,k=0}^{3} (a_i, a_k)\, \xi_i \xi_k$$
where
$$(a_i, a_k) = \int_{-1}^{+1} a_i(\tau)\, a_k(\tau)\, d\tau = \int_{-1}^{+1} \tau^{i+k}\, d\tau = \begin{cases} 0 & \text{if } (i + k) \text{ is odd} \\ \dfrac{2}{i + k + 1} & \text{if } (i + k) \text{ is even.} \end{cases}$$
Hence the initial tableau for the reduction of the fundamental form is
$$\begin{array}{r|rrrr} & \xi_0 & \xi_1 & \xi_2 & \xi_3 \\ \hline \zeta_0 = \eta_0 = & 2^* & 0 & \tfrac{2}{3} & 0 \\ \eta_1 = & 0 & \tfrac{2}{3} & 0 & \tfrac{2}{5} \\ \eta_2 = & \tfrac{2}{3} & 0 & \tfrac{2}{5} & 0 \\ \eta_3 = & 0 & \tfrac{2}{5} & 0 & \tfrac{2}{7} \end{array}$$
Using the elements in the main diagonal in their natural order as the pivots, we reach the following final tableau after four exchange steps.
m1 :2
cs
see—e 0 610% 04%0 £200 47’ £300 0 1% Here the pivots have the values 2, 2/3, 8/45, 8/175. 225
Thus the orthogonal basis is
$$\begin{aligned} e_0 &= \tfrac{1}{2} a_0 = \tfrac{1}{2} \\ e_1 &= \tfrac{3}{2} a_1 = \tfrac{3}{2}\tau \\ e_2 &= -\tfrac{15}{8}(a_0 - 3a_2) = -\tfrac{15}{8}(1 - 3\tau^2) \\ e_3 &= -\tfrac{35}{8}(3a_1 - 5a_3) = -\tfrac{35}{8}(3\tau - 5\tau^3). \end{aligned}$$
To normalize these polynomials it is necessary to multiply them by $\sqrt{2}$, $\sqrt{2/3}$, $\sqrt{8/45}$, $\sqrt{8/175}$ respectively. This gives
$$\bar e_0 = \sqrt{\tfrac{1}{2}}, \qquad \bar e_1 = \sqrt{\tfrac{3}{2}}\,\tau, \qquad \bar e_2 = \sqrt{\tfrac{5}{8}}\,(3\tau^2 - 1), \qquad \bar e_3 = \sqrt{\tfrac{7}{8}}\,(5\tau^3 - 3\tau)$$
which are the first four normalized Legendre polynomials. (We note, however, that the Legendre polynomials are not usually normalized in this way but by the condition that they should take the value $+1$ for $\tau = +1$.) All the Legendre polynomials can be obtained using this method. However for this special purpose there are much simpler methods. (Cf. [18], p. 190.)
12.1.3 Orthogonal Transformations and Matrices
Let $E$ be an $n$-dimensional Euclidean vector space and let $\xi_1, \ldots, \xi_n$ and $\eta_1, \ldots, \eta_n$ be the components of a vector $x \in E$ with respect to two orthonormal bases. Then, by Theorem 7,
$$\|x\|^2 = q(x) = \sum_{k=1}^{n} \eta_k^2 = \sum_{k=1}^{n} \xi_k^2 = \xi'\xi = \eta'\eta.$$
Now, if
$$\eta = S\xi \tag{5}$$
(cf. Theorem 5.4;2), then $\eta'\eta = \xi' S'S \xi$ and hence, in view of the symmetry of $S'S$,
$$S'S = I, \quad \text{i.e., } S^{-1} = S', \quad \text{i.e., } S^* = S \tag{6}$$
(see 5.4;(5)).
Definition 3. A real square matrix $S$ which has the property (6) is said to be ‘orthogonal’. The linear transformation (5) of the vector components which corresponds to $S$ is also said to be ‘orthogonal’.
Now suppose that (5) represents a change of basis in which the $\xi$-basis is orthonormal. If $S$ is orthogonal, then
$$\eta'\eta = \xi' S'S \xi = \xi'\xi = q(x)$$
so that the $\eta$-basis is also orthonormal (Theorem 7).
If $S$ is orthogonal, then, by Theorem 5.4;2, (5) corresponds to the transformation $f = S^* e = Se$ of the basis vectors. Thus the vector components are transformed in the same way as the basis vectors. Putting all this together, we have
Theorem 8. A change of basis between orthonormal bases is represented by an orthogonal matrix. Conversely, every orthogonal matrix can be expressed as the representation of a change of basis between orthonormal bases. In the transfer from one orthonormal basis to another, the vector components undergo the same transformation as the basis vectors.
From (6) it is clear that an orthogonal matrix always has an inverse which is also orthogonal ($(S^{-1})' = (S')^{-1} = (S^{-1})^{-1}$). Further, the product of two orthogonal matrices $S_1, S_2$ is also orthogonal ($(S_1 S_2)^{-1} = S_2^{-1} S_1^{-1} = S_2' S_1' = (S_1 S_2)'$). Since the identity matrix $I$ is orthogonal, we have the following theorem.
Theorem 9. The set of all $n \times n$ orthogonal matrices is a group with the usual matrix multiplication and is known as the orthogonal group of degree $n$.
If $S$ is orthogonal, then
$$(\det S)^2 = \det S \det S' = \det SS' = \det I = 1$$
and therefore $\det S = \pm 1$. Both possibilities occur. For example, when $n = 1$, $S_1 = (+1)$ and $S_2 = (-1)$ are orthogonal. The first matrix has the determinant $+1$ and the second has determinant $-1$. The set of all orthogonal matrices of degree $n$ which have determinant $+1$ is clearly also a group which is a subgroup of the orthogonal group.
If we interpret (5) (referred to an orthonormal basis) as representing an endomorphism $f$ of $E$ (cf. 5.3), then it follows from the orthogonality of $S$ that $f$ is an automorphism. Also, for all $x_1, x_2 \in E$,
$$(f(x_1), f(x_2)) = \eta_1'\eta_2 = \xi_1' S'S \xi_2 = \xi_1'\xi_2 = (x_1, x_2),$$
i.e., the scalar product of two vectors and hence the norm of a vector is invariant under the automorphism $f$. An automorphism with this property is called a rotation of the Euclidean vector space $E$. If $\det S = +1$, we refer to $f$ as a proper rotation and if $\det S = -1$, we refer to $f$ as a reflection. It is easy to verify that the matrix $S$ in (5) is orthogonal if (5) represents a rotation of $E$. Conversely we now have the following theorem.
Theorem 10. The set of all rotations of a Euclidean vector space of dimension $n$ and the set of all proper rotations are groups (with the multiplication of mappings). The first is isomorphic to the orthogonal group of degree $n$ and is a subgroup of the group $A(E)$. (Theorem 5.3;1.)
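The group properties of orthogonal matrices described in this section can be illustrated on $2 \times 2$ matrices with rational entries. This is only a sketch: the helper names are assumptions, and the entries $\tfrac35, \tfrac45$ are chosen so that property (6) can be tested exactly.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def is_orthogonal(s):
    """Check property (6): S'S = I."""
    n = len(s)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return matmul(transpose(s), s) == identity

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

c, s = Fraction(3, 5), Fraction(4, 5)      # c^2 + s^2 = 1
rotation = [[c, s], [-s, c]]               # determinant +1
reflection = [[c, s], [s, -c]]             # determinant -1
product = matmul(rotation, reflection)

for name, m in [("rotation", rotation), ("reflection", reflection),
                ("product", product)]:
    print(name, is_orthogonal(m), det2(m))
```

The product of the two matrices is again orthogonal, and its determinant is the product of the two determinants.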
A rotation of a Euclidean vector space $E$ is often referred to simply as an automorphism of $E$ because all the structural properties of $E$, i.e., the linear relationships, the norms and scalar products of vectors, are preserved.
Exercises
1. Use Gram's determinant to prove that the vectors $a_k(\tau) = \tau^k$ $(k = 0, 1, 2, 3)$ in the vector space $C$ (Example 2.1;7) are linearly independent.
Solution. Make $C$ Euclidean as in Example 3, and Gram's determinant is equal to $256/23625 \ne 0$.
2. (a) Prove that the vector space $C$ (Example 2.1;7) becomes Euclidean with the scalar product $(x, y) = \int_{-1}^{+1} \tau^2\, x(\tau)\, y(\tau)\, d\tau$.
(b) In this Euclidean space, orthogonalize the sequence of vectors $a_0 = 1$, $a_1 = \tau$, $a_2 = \tau^2$, $a_3 = \tau^3$.
Solution. If the fundamental form $q(x) = \sum_{i,k=0}^{3} (a_i, a_k)\, \xi_i \xi_k$ (on the subspace $P_3 \subseteq C$) is reduced by the exchange method, then the corresponding change of the basis vectors is
$$\begin{aligned} e_0 &= \tfrac{3}{2} a_0 = \tfrac{3}{2} \\ e_1 &= \tfrac{5}{2} a_1 = \tfrac{5}{2}\tau \\ e_2 &= -\tfrac{105}{8} a_0 + \tfrac{175}{8} a_2 = \tfrac{35}{8}(5\tau^2 - 3) \\ e_3 &= -\tfrac{315}{8} a_1 + \tfrac{441}{8} a_3 = \tfrac{63}{8}(7\tau^3 - 5\tau) \end{aligned}$$
3. If the polynomials $x(\tau)$ are considered only for the values $\tau = -2, -1, 0, 1, 2$, then they form a 5-dimensional vector space $E$. (Two polynomials are considered to be equal if and only if they take the same values for the five given values of $\tau$ (see Example 12.2;2).) The set of vectors $a_i(\tau) = \tau^i$ $(i = 0, \ldots, 4)$ is a basis of $E$ and $E$ is Euclidean with the fundamental form $q(x) = \sum_{k=-2}^{2} [x(k)]^2$. Orthogonalize the given basis.
Solution. $e_0 = \tfrac{1}{5}$, $e_1 = \tfrac{1}{10}\tau$, $e_2 = \tfrac{1}{14}(\tau^2 - 2)$, $e_3 = \tfrac{1}{72}(5\tau^3 - 17\tau)$, $e_4 = \tfrac{1}{288}(35\tau^4 - 155\tau^2 + 72)$.
The norms of these polynomials are $\sqrt{1/5}$, $\sqrt{1/10}$, $\sqrt{1/14}$, $\sqrt{5/72}$, $\sqrt{35/288}$.
Problems
1. Prove the assertion made in Example 1.
2. Prove that Gram's determinant (Theorem 4) is strictly positive if $a_1, \ldots, a_r$ are linearly independent. (Hint: Consider the quadratic form $\sum_{i,k=1}^{r} (a_i, a_k)\, \xi_i \xi_k$.)
3. Use Gram's determinant to show that the polynomials $a_0(\tau) = 1, \ldots, a_4(\tau) = \tau^4$ are a basis of $P_4$ (Example 2.1;5).
4. Prove that, if $E$ is a Euclidean vector space, $\{e_1, \ldots, e_n\}$ is a basis of $E$ and $\alpha_1, \ldots, \alpha_n$ are real numbers, then there is exactly one vector $x \in E$ such that $(x, e_k) = \alpha_k$ for $k = 1, \ldots, n$.
5. Show that the set $L$ of all odd functions in the Euclidean vector space $C$ (Example 3) (i.e., those functions $x(\tau)$ for which $x(-\tau) = -x(\tau)$ in $-1 \le \tau \le +1$) is a subspace of $C$. Which functions are in $L^\perp$?
6. Let $E$ be a Euclidean vector space and let $a \in E$. For which vectors $x \in E$ are $(x + a)$ and $(x - a)$ orthogonal?
7. Prove that
$$\Big[\sum_{k=1}^{n} \xi_k \eta_k\Big]^2 \le \Big[\sum_{k=1}^{n} \xi_k^2\Big]\Big[\sum_{k=1}^{n} \eta_k^2\Big]$$
for all real numbers $\xi_1, \ldots, \xi_n, \eta_1, \ldots, \eta_n$.
8. Prove that
$$\Big[\int_{-1}^{+1} x(\tau)\, y(\tau)\, d\tau\Big]^2 \le \Big[\int_{-1}^{+1} [x(\tau)]^2\, d\tau\Big]\Big[\int_{-1}^{+1} [y(\tau)]^2\, d\tau\Big] \quad \text{for all } x, y \in C.$$
9. Prove that Pythagoras's theorem, ‘$\|x \pm y\|^2 = \|x\|^2 + \|y\|^2$ if $x$ and $y$ are orthogonal’, is true in any Euclidean vector space.
10. Show that a $2 \times 2$ orthogonal matrix is either of the form
$$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \quad \text{or of the form} \quad \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}.$$
Which of these represent proper rotations?
11. Show that, if $f$ is a reflection of a Euclidean plane (i.e., a 2-dimensional Euclidean space), then $f^2 = e$.
12. Suppose $f$ is a rotation of a finite-dimensional Euclidean vector space $E$. What is the dual endomorphism of $f$?
13. Which rotations of a Euclidean vector space, in particular a Euclidean plane, are symmetric endomorphisms? (Cf. Definition 13.2;1.)
14. Show that, if $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_n\}$ are orthonormal bases of the Euclidean vector space $E$, then there is exactly one rotation $t$ of $E$ for which $t(e_k) = f_k$ $(k = 1, \ldots, n)$.
15. Prove that, given a basis of a real vector space, then there is a fundamental form which makes this basis orthonormal.
16. Prove that a subspace $L$ of a Euclidean vector space $E$ is itself a Euclidean vector space when $\|x\|$ $(x \in L)$ is defined to be equal to the norm of $x$ in $E$.
12.2 Approximation in Euclidean Vector Spaces, The Method of Least Squares
12.2.1 The General Approximation Problem
Suppose that we are given a finite set of vectors $a_1, \ldots, a_r$ in a Euclidean vector space $E$ and suppose that, for an arbitrary vector $z \in E$, we wish to find a vector $x_0 \in L = L(a_1, \ldots, a_r)$ for which $\|z - x_0\|$ is as small as possible, i.e., $\|z - x_0\| = \min_{x \in L} \|z - x\|$.
Theorem 1. There is exactly one vector $x_0 \in L$ for which
$$\|z - x_0\| = \min_{x \in L} \|z - x\|.$$
This is the vector $x_0$ which is uniquely determined by $x_0 \in L$ and $z - x_0 \in L^\perp$.
Proof. By Theorem 12.1;3 applied to the subspace $L(L, z)$, there is just one vector $x_0 \in L$ such that $z - x_0 \in L^\perp$. Now suppose that $x \in L$ and $x \ne x_0$. Then
$$\|z - x\|^2 = \|(z - x_0) + (x_0 - x)\|^2 = \|z - x_0\|^2 + 2(z - x_0, x_0 - x) + \|x_0 - x\|^2.$$
Since $z - x_0 \in L^\perp$ and $x_0 - x \in L$, the middle term vanishes. Further, since $x_0 - x \ne 0$, it follows that $\|z - x\|^2 > \|z - x_0\|^2$ and the theorem is proved.
In the applications of the theorem it is important to be able to calculate the coefficients $\lambda_k$ in the representation $x_0 = \sum_{k=1}^{r} \lambda_k a_k$. Since $(z - x_0, a_i) = 0$ for all $i = 1, \ldots, r$, it follows that
$$\sum_{k=1}^{r} (a_k, a_i)\, \lambda_k = (z, a_i) \quad (i = 1, \ldots, r) \tag{1}$$
is a system of $r$ linear equations for the $r$ coefficients $\lambda_k$. Every solution $(\lambda_1, \ldots, \lambda_r)$ of this system provides a representation of $x_0$. The $\lambda_1, \ldots, \lambda_r$ are uniquely determined if and only if $a_1, \ldots, a_r$ are linearly independent. It is only in this latter case that the determinant of the system (1) is not equal to zero (Theorem 12.1;4).
Definition 1. The vector $x_0$ which is uniquely determined as in Theorem 1 is said to be the ‘best approximation to’ $z$ as a linear combination of the vectors $a_1, \ldots, a_r$.
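The system (1) can be set up and solved mechanically once the scalar product is available. The sketch below is an illustration under stated assumptions: polynomials are given as coefficient lists, the scalar product is the integral over $[-1, +1]$ of Example 12.1;3, the vectors $a_k$ are assumed linearly independent, and the names `solve` and `best_approximation` are hypothetical helpers.

```python
from fractions import Fraction

def scalar_product(p, q):
    """Integral of p*q over [-1, +1] for coefficient lists p, q."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += Fraction(pi) * Fraction(qj) * Fraction(2, i + j + 1)
    return total

def solve(matrix, rhs):
    """Gauss-Jordan elimination; assumes a non-singular system."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for k in range(n):
        piv = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[piv] = aug[piv], aug[k]
        aug[k] = [v / aug[k][k] for v in aug[k]]
        for i in range(n):
            if i != k and aug[i][k] != 0:
                factor = aug[i][k]
                aug[i] = [vi - factor * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n] for row in aug]

def best_approximation(z, vectors):
    """Coefficients lambda_k of x0 = sum lambda_k a_k from system (1)."""
    gram = [[scalar_product(ak, ai) for ak in vectors] for ai in vectors]
    rhs = [scalar_product(z, ai) for ai in vectors]
    return solve(gram, rhs)

z = [0, 0, 0, 1]                      # z(t) = t^3
vectors = [[1], [0, 1], [0, 0, 1]]    # a_0 = 1, a_1 = t, a_2 = t^2
lambdas = best_approximation(z, vectors)
print(lambdas)

# z - x0 should be orthogonal to every a_i (Theorem 1)
residual = [-lambdas[0], -lambdas[1], -lambdas[2], Fraction(1)]
print([scalar_product(residual, a) for a in vectors])
```

In this illustration the best approximation to $\tau^3$ by polynomials of degree at most 2 is $\tfrac35\tau$, and the residual is orthogonal to each $a_i$.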
The following examples Will illustrate three important types of problem to which the theory may be applied. Example 1. Let E be the Euclidean vector space 0' (Example 12.1;3) and let a], e 0 be the polynomial ak(r) :7” (16:0, . . .,r). Then, for a given continuous function 2 e 0, the method described above will produce a ‘best approximation in mean squares’ for z on the interval —- 1 oo n
and therefore
$$\lim_{k \to \infty} \frac{\|x_k\|}{|\lambda_n|^k} = |\xi_n|\,\|e_n\|.$$
CH. 13: EIGENVALUES AND EIGENVECTORS
Hence by division
$$\lim_{k \to \infty} \frac{\|x_k\|}{\|x_{k-1}\|} = |\lambda_n|. \tag{8}$$
From (6) it further follows that
$$\frac{x_k}{\lambda_n^k} - \xi_n e_n = y_k$$
and, since $\lim_{k \to \infty} \|y_k\| = 0$,
$$\lim_{k \to \infty} \frac{x_k}{\lambda_n^k} = \xi_n e_n \tag{9}$$
where convergence is defined in $E$ as in 12.3;(2). Thus the sequence of vectors
$$\frac{x_1}{\lambda_n},\ \frac{x_2}{\lambda_n^2},\ \frac{x_3}{\lambda_n^3},\ \ldots$$
5 —2 —4
—2 2 2
— 2 5
with respect to some basis. We choose $0 to be the vector which has the components 1, 0, 0 with respect to this basis. Then the components of the vectors wk are
x1=(5 ,—2 ,—4 ) x2=(45 ,—22 ,—44) x3 = (445 , —222 , -—444) x4 = (4445, —2222, —4444) etc. We see immediately that
”k| ”xle—l”
tends to 10, so that the eigenvalue
with the greatest absolute value is A3: i 10.
From (9), we also see that the corresponding eigenvector has the components 2, — 1, —- 2 (apart from a scalar factor). Finally it is not possible that As: — 10, because, in this case, for sufficiently large h, each component of wk would change sign at each iteration. Hence A3 = + 10. The evaluation of the norm ”k at each step involves a certain amount of
calculation which is reduced if the given norm is replaced by a new one equal to the absolute value of the greatest component of x. We will denote this by 274
13.6: NUMERICAL CALCULATION OF EIGENVALUES
[m]. It is easy to verify that “$l ||en|| and ”3/i may be replaced by [wk], [en] and [yk] in the arguments (in particular (7)) which led to the conclusion (8), and hence we have lim [wk] b—Nao [Wk—1]
= WI-
(10)
By introducing the new norm $[x_k]$, we have also made it possible to drop the condition that $E$ should be Euclidean. (We remark however that $E$ can always be made Euclidean by putting $\|x\|^2 = \sum_k \xi_k^2$ where $\xi_k$ are the components of $x$ with respect to some basis and $f$ is represented by some matrix with respect to this basis.)
From the representation of $y_k$ in (6), we see that the greater the quotient $|\lambda_n| / |\lambda_{n-1}|$ is, i.e., the more $\lambda_n$ exceeds the other eigenvalues in absolute value, then the more rapid is the convergence in (8), (9) and (10).
From (9), we see that the components of $x_k$ become very large or very small as $k$ increases if $|\lambda_n| \ne 1$. In order to avoid this, we can divide $x_k$ by $\|x_k\|$ or $[x_k]$ after
$$\frac{\|x_k\|}{\|x_{k-1}\|} \quad\text{or}\quad \frac{[x_k]}{[x_{k-1}]}$$
has been calculated. In this way, we ensure that $\|x_k\| = 1$ or that $[x_k] = 1$. This has the added advantage of making it easier to recognize the eigenvector as a limit.
If $E$ is Euclidean and $f$ is symmetric then in general it is possible to find the eigenvalue $\lambda_n$ to within a given degree of accuracy with less effort than is involved in (8) or (10) by using the Rayleigh quotient (see 13.5;(6)). If
$e_n$ is again an eigenvector corresponding to $\lambda_n$ then
$$\lambda_n = \rho(e_n) = \frac{(e_n, f(e_n))}{(e_n, e_n)}.$$
Hence, the Rayleigh quotients
$$\rho(x_k) = \frac{(x_k, f(x_k))}{(x_k, x_k)} \qquad (k = 1, 2, 3, \dots)$$
are approximations to $\lambda_n$. If we assume that $\|x_k\| = \|e_n\| = 1$ and $x_k = e_n + dx_k$, then, after a short calculation, we obtain
$$\rho(x_k) - \lambda_n = (f(dx_k) - \lambda_n\, dx_k,\ dx_k). \qquad (11)$$
Since this is a quadratic form in $dx_k$, $\rho(x_k)$ will approach $\lambda_n$ particularly quickly when $f$ is symmetric.
We will check this by looking again at Example 5. The matrix $A$ is symmetric and hence it represents a symmetric endomorphism of $R_n$ (where $(x, y) = \sum_k \xi_k \eta_k$). The calculation gives for instance
$$\rho(x_2) = \frac{44445}{4445} = 9.998875\ldots = \frac{[x_5]}{[x_4]}.$$
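The comparison can be checked with exact rational arithmetic. The following is a small sketch, assuming the matrix and starting vector of Example 5; the function names `inner` and `rayleigh` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Multiply the square matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Scalar product (x, y) = sum of xi_k * eta_k."""
    return sum(a * b for a, b in zip(x, y))

def rayleigh(A, x):
    """Rayleigh quotient (x, Ax) / (x, x) as an exact fraction."""
    return Fraction(inner(x, mat_vec(A, x)), inner(x, x))

A = [[5, -2, -4], [-2, 2, 2], [-4, 2, 5]]
xs = [[1, 0, 0]]
for _ in range(5):
    xs.append(mat_vec(A, xs[-1]))              # xs[k] holds x_k

rho_2 = rayleigh(A, xs[2])
quotient_3 = Fraction(max(map(abs, xs[3])), max(map(abs, xs[2])))   # [x_3]/[x_2]
quotient_5 = Fraction(max(map(abs, xs[5])), max(map(abs, xs[4])))   # [x_5]/[x_4]

print(rho_2, float(rho_2))
print(10 - quotient_3, 10 - rho_2)             # errors of the two estimates
```

In this run the Rayleigh quotient at $x_2$ already has the accuracy that the quotient of norms only reaches at $x_5$, while the quotient $[x_3]/[x_2]$ formed from the same vectors is still off by $1/9$.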
Finally we remark that the Rayleigh quotient also converges to the eigenvalue in the case of an arbitrary matrix $A$. However the extra rapidity of the convergence is lost when $A$ is not symmetric.
In practical applications the main interest is often not in the eigenvalue with the greatest absolute value but in the one with the least absolute value. For example this is the case in oscillation problems, where the eigenvalues represent the eigenfrequencies. The iteration method can still be applied, by using Theorem 13.1;7. In view of this theorem, the eigenvalue of $A$ with the least absolute value is equal to the reciprocal of the eigenvalue of $A^{-1}$ with the greatest absolute value. There are methods of calculation which are based on this idea, but do not in fact require the inverse of the matrix to be completely calculated. (cf. [8], p. 286. For other numerical techniques for the calculation of eigenvalues see [27] chap. 10.)
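One way to apply the iteration to $A^{-1}$ without forming the inverse is to solve the linear system $A x_k = x_{k-1}$ at each step. The sketch below is an assumption about how such a method might look, not the procedure of the cited references; the names `solve` and `smallest_eigenvalue` are hypothetical, exact fractions are used to keep the illustration free of rounding, and $A$ is assumed to be regular.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Multiply the square matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solve(A, b):
    """Solve A y = b by Gauss-Jordan elimination (A is assumed to be regular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def smallest_eigenvalue(A, x0, steps):
    """Iterate x_k = A^{-1} x_{k-1}; return 1 / ([x_k]/[x_{k-1}]) and x_k."""
    x = [Fraction(v) for v in x0]
    quotient = None
    for _ in range(steps):
        y = solve(A, x)                        # y = A^{-1} x, without forming A^{-1}
        quotient = max(abs(c) for c in y) / max(abs(c) for c in x)
        x = y
    return 1 / quotient, x

# Matrix of Example 5; its eigenvalue of least absolute value is 1.
A = [[5, -2, -4], [-2, 2, 2], [-4, 2, 5]]
lam, x = smallest_eigenvalue(A, [1, 0, 0], 12)
print(float(lam), [float(c) for c in x])
```

For the matrix of Example 5 the estimate tends to 1, the reciprocal of the dominant eigenvalue of $A^{-1}$, and the iterate approaches a vector that $A$ leaves unchanged.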
Finally we will indicate how, after finding $\lambda_n$ and a corresponding eigenvector $e_n$, it is possible to find the eigenvalue $\lambda_{n-1}$, assuming that $f$ and $A$ are symmetric. In this case, every eigenvector corresponding to $\lambda_{n-1}$ is orthogonal to $e_n$ and the orthogonal complement of $L(e_n)$ is mapped into itself by $f$ (Theorem 13.2;4). If we assume that $|\lambda_{n-1}| > |\lambda_k|$ for $k = 1, \dots, n-2$, then we can start with any vector $x_0$ such that $(x_0, e_n) = 0$ and again apply the iterative method already described for $\lambda_n$. In this way we will find $\lambda_{n-1}$ and an eigenvector corresponding to this eigenvalue. However in actual numerical calculation there is always a risk that, during the method, the orthogonality to $e_n$ will be lost due to rounding errors. (See Example 7.) In view of this possibility, we must check the orthogonality at each step and if necessary we must restore it by replacing $x_k$ with $x_k - (x_k, e_n)e_n$ (assuming $e_n$ is normalized).
The other eigenvalues $\lambda_{n-2}, \lambda_{n-3}, \dots$ may also be found by the same method.
Example 6. We choose the same 3 × 3 matrix as in Example 1 and start with the arbitrarily chosen vector $x_0 = (1, 7, 3)$. Then $x_1 = (-43, 59, 51)$. In the following table, we list the results for several values of $k$ writing the vectors $x_k$ in normalized form so that $\|x_k\| = 1$. We will give the components to five decimal places although more places are effectively calculated.
 k    x_k                               [x_k]/[x_{k-1}]*    ρ(x_{k-1})
 5    -0.54815,  0.60559,  0.57687      -4.06489            -11.72404
10     0.58189, -0.57278, -0.57734      -2.77900             -2.83200
15    -0.57677,  0.57793,  0.57735      -3.01527             -3.02293
25    -0.57734,  0.57736,  0.57735      -3.00026             -3.00040
35    -0.57735,  0.57735,  0.57735      -3.00001             -3.00001
For convenience the method of evaluating $[x_k]/[x_{k-1}]^*$ has been adapted so that it is always the component of $x_k$ which has the same index as the component of $x_{k-1}$ with the greatest absolute value which is divided by the latter. This makes no difference to the final result.
From this table, we can see that $\rho(x_k)$ converges to the eigenvalue $\lambda_3 = -3$ and that $x_k$ converges to the eigenvector $(-1, 1, 1)$. In this example $\rho(x_k)$ does not converge any more rapidly than $[x_k]/[x_{k-1}]^*$. Note that $A$ is not symmetric.
Example 7.
$$A = \begin{pmatrix} 0.22 & 0.02 & 0.12 & 0.14 \\ 0.02 & 0.14 & 0.04 & -0.06 \\ 0.12 & 0.04 & 0.28 & 0.08 \\ 0.14 & -0.06 & 0.08 & 0.26 \end{pmatrix}$$
Note that $A$ is symmetric. We start with $x_0 = (1, 7, 3, 9)$.
 k    x_k                                     [x_k]/[x_{k-1}]*    ρ(x_{k-1})
 2    0.58045, 0.03537, 0.57421, 0.57629      0.43357             0.46796
 5    0.57755, 0.00163, 0.57852, 0.57598      0.47939             0.47999
10    0.57735, 0.00004, 0.57739, 0.57731      0.47996             0.48000
15    0.57735, 0.00000, 0.57735, 0.57735      0.48000             0.48000
We see that 0.48 is an eigenvalue with the eigenvector $(1, 0, 1, 1)$.
We will now repeat the method starting with $x_0 = (0, 1, 0, 0)$, i.e., with a vector which is orthogonal to the eigenvector just found. Providing the rounding errors do not interfere, we should find the eigenvalue with the second greatest absolute value and a corresponding eigenvector.
 k    x_k                                        [x_k]/[x_{k-1}]*    ρ(x_{k-1})
 5     0.01747, 0.59567,  0.55903, -0.57650      0.23231             0.23953
10     0.00056, 0.57791,  0.57679, -0.57735      0.23977             0.24000
11     0.00028, 0.57763,  0.57707, -0.57735      0.23988             0.24000
12     0.00014, 0.57749,  0.57721, -0.57735      0.23994             0.24000
30    -0.23477, 0.52746,  0.29269, -0.76223      0.26928             0.25132
40    -0.57735, 0.00127, -0.57608, -0.57862      0.47833             0.48000
As far as $k = 12$, the solution approaches the eigenvalue 0.24 and the eigenvector $(0, 1, 1, -1)$. However, the rounding errors then start to interfere (nine significant places were used in the calculation) and we are led once again to the greatest eigenvalue. Finally we note that, in this example, the Rayleigh quotient always converges more rapidly than $[x_k]/[x_{k-1}]$.
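The remedy described before Example 6, replacing $x_k$ by $x_k - (x_k, e_n)e_n$ at each step, can be sketched as follows for the matrix of Example 7. This is an illustration under assumptions: the function name `iterate_orthogonal` and its `restore` switch are hypothetical, ordinary double-precision floats are used instead of the nine significant places of the text, and the Euclidean norm is used for normalization.

```python
import math

def mat_vec(A, x):
    """Multiply the square matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Scalar product (x, y)."""
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    """Scale x to Euclidean norm 1."""
    norm = math.sqrt(inner(x, x))
    return [c / norm for c in x]

def iterate_orthogonal(A, e, x0, steps, restore=True):
    """Iterate x -> A x while keeping x orthogonal to the unit eigenvector e.

    Returns the last Rayleigh quotient and the last normalized iterate.
    """
    x = normalize(x0)
    rho = None
    for _ in range(steps):
        if restore:
            c = inner(x, e)
            x = [xi - c * ei for xi, ei in zip(x, e)]   # x - (x, e) e
        y = mat_vec(A, x)
        rho = inner(x, y) / inner(x, x)                  # Rayleigh quotient of x
        x = normalize(y)
    return rho, x

A = [[0.22, 0.02, 0.12, 0.14],
     [0.02, 0.14, 0.04, -0.06],
     [0.12, 0.04, 0.28, 0.08],
     [0.14, -0.06, 0.08, 0.26]]
e = normalize([1, 0, 1, 1])          # eigenvector found for the eigenvalue 0.48
rho, x = iterate_orthogonal(A, e, [0, 1, 0, 0], 100)
print(rho, x)
```

With the orthogonality restored at every step the iteration stays at the eigenvalue 0.24 and the direction $(0, 1, 1, -1)$ however many steps are taken. Calling the function with `restore=False` leaves the rounding errors uncorrected, which can be expected to let the iterates drift back towards the eigenvalue 0.48 as in the table above.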
Exercises
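1. Use the Gaussian Algorithm to find the eigenvalues and eigenvectors of the matrices
$$\text{(a)}\quad A = \begin{pmatrix} -8 & -27 & 9 \\ 9 & 28 & -9 \\ 9 & 27 & -8 \end{pmatrix} \qquad \text{(b)}\quad A = \begin{pmatrix} 5 & -4 & -2 & -1 \\ 6 & -6 & -1 & -2 \\ 2 & 0 & -1 & 2 \\ -4 & 4 & 2 & 2 \end{pmatrix}$$
Solutions.
(a) $\lambda_1 = 10$, $\lambda_2 = \lambda_3 = 1$; $x_1 = (1, -1, -1)$, $x_2 = (1, 0, 1)$, $x_3 = (0, 1, 3)$.
(b) $\lambda_1 = -2$, $\lambda_2 = -1$, $\lambda_3 = 1$, $\lambda_4 = 2$; $x_1 = (1, 2, 0, -1)$, $x_2 = (2, 3, 1, -2)$, $x_3 = (3, 3, 1, -2)$, $x_4 = (1, 1, 0, -1)$.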
2. Use the iterative method to find the greatest eigenvalue and the corresponding eigenvector of the matrix $A$ in Exercise 1(a).
Solution. Starting with $x_0 = (1, 0, 0)$, the method produces in sequence the vectors
$$x_1 = (-8, 9, 9),\quad x_2 = (-98, 99, 99),\quad x_3 = (-998, 999, 999).$$
From this, it is easy to see the eigenvalue 10 and the corresponding eigenvector $(-1, 1, 1)$.
CHAPTER 14
INVARIANT SUBSPACES, CANONICAL FORMS OF MATRICES
If the finite-dimensional vector space $E$ has a basis consisting of eigenvectors of the endomorphism $f$ of $E$, then it is particularly easy to see how this endomorphism operates on $E$. Each vector in the basis is simply multiplied by the corresponding eigenvalue and then it is straightforward to find the image of an arbitrary vector $x \in E$ (see 13.6;(6)). By Theorem 13.1;4, the matrix which represents $f$ with respect to some basis is reducible to diagonal form and hence is similar to a diagonal matrix. Apart from the order of the diagonal elements, this diagonal matrix is uniquely determined by $f$.
We will now investigate the situation in the case of an arbitrary endomorphism to see if there are any similar properties and also to see if every square matrix is similar to some matrix which has a simple canonical form.
14.1 Invariant Subspaces
14.1.1 Vector Spaces over an Arbitrary Field
So far we have considered real and complex vector spaces. The most important property of the real and complex numbers which we have used is the possibility of adding and multiplying them together (i.e., the existence of two binary operations on the set S of scalars) subject to the following axioms.
1. $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$. Addition is associative.
2. $\alpha + \beta = \beta + \alpha$. Addition is commutative.
3. There exists an element $0 \in S$ such that $0 + \alpha = \alpha$ for all $\alpha \in S$.
4. Corresponding to each element $\alpha \in S$, there is an element $-\alpha \in S$ such that $\alpha + (-\alpha) = 0$.
5. $(\alpha\beta)\gamma = \alpha(\beta\gamma)$. Multiplication is associative.
6. $\alpha\beta = \beta\alpha$. Multiplication is commutative.
7. There exists an element $1 \in S$ such that $1\alpha = \alpha$ for all $\alpha \in S$.
8. Corresponding to each element $\alpha\ (\ne 0) \in S$, there is an element $\alpha^{-1}$ such that $\alpha\alpha^{-1} = 1$.
9. $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$. Distributive.
10. $1 \ne 0$.
An algebraic structure which has two binary operations satisfying axioms 1, ..., 10 is known as a field. The sets of real and of complex numbers with their usual addition and multiplication are two examples of fields, but of course there are infinitely many other fields of which one well-known example is the field of rational numbers.
We can now redefine the concept of a vector space, replacing the fields of real or of complex numbers by an arbitrary field F. We will then speak of a vector space over the field F, and refer to F as the ground field of the vector
space. The elements of the field $F$ are the scalars for all vector spaces over $F$. A great many of the previous results are also valid for vector spaces over an arbitrary field $F$. In this last chapter, we will now deal basically with vector spaces over arbitrary fields because it is only in this case that the theory attains its true value while the main results take on a very special form when applied to real or complex vector spaces.
14.1.2 Polynomials
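A polynomial over a field $F$ is an expression of the form
$$\mathbf{f} = \alpha_0 + \alpha_1\xi + \dots + \alpha_n\xi^n = \sum_{k=0}^{n} \alpha_k \xi^k \qquad (1)$$
with coefficients $\alpha_k \in F$. If $\alpha_n \ne 0$, then $n$ is the degree of $\mathbf{f}$, and $\mathbf{f}$ is said to be monic if $\alpha_n = 1$. If $\mathbf{g} = \sum_{k=0}^{m} \beta_k \xi^k$ is a second polynomial over $F$ (with $m \le n$, say), then the sum of $\mathbf{f}$ and $\mathbf{g}$ is defined by
$$\mathbf{f} + \mathbf{g} = \sum_{k=0}^{n} (\alpha_k + \beta_k)\xi^k \qquad (2)$$
where $\beta_{m+1} = \dots = \beta_n = 0$. The product of $\mathbf{f}$ and $\mathbf{g}$ is defined by
$$\mathbf{f}\mathbf{g} = \sum_{k=0}^{m+n} \Bigl(\sum_{i+j=k} \alpha_i \beta_j\Bigr)\xi^k. \qquad (3)$$
If $\mathbf{f}$ and $\mathbf{g}$ have degrees $n$ and $m$ then $\mathbf{f}\mathbf{g}$ has degree $(m+n)$.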
It is easy to verify that these operations of addition and multiplication of the polynomials over $F$ satisfy the axioms 1, ..., 6 and 9 in the definition of a field (cf. 14.1.1). Thus the polynomials over a field $F$ form an algebraic structure which is known as a commutative ring. (Fields are a special type of commutative ring.) We refer to this ring as the ring of polynomials over the field $F$.
The elements $\alpha \in F$ can be considered as polynomials over $F$, in that they are the polynomials of degree 0 (except $\alpha = 0$, which has no degree). If we apply Definition (3) to this special case, then we obtain the following rule for the multiplication of the polynomial (1) by an element $\alpha \in F$.
$$\alpha\mathbf{f} = \sum_{k=0}^{n} (\alpha\alpha_k)\xi^k \qquad (4)$$
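The rules (2), (3) and (4) translate directly into operations on coefficient lists. The sketch below assumes the field of rational numbers, represented by Python's `Fraction`, and stores a polynomial as the list $[\alpha_0, \alpha_1, \dots, \alpha_n]$; the function names are hypothetical, and zero leading coefficients are not stripped from the results.

```python
from fractions import Fraction

def poly_add(f, g):
    """Sum (2): pad the shorter coefficient list with zeros, then add termwise."""
    n = max(len(f), len(g))
    f = f + [Fraction(0)] * (n - len(f))
    g = g + [Fraction(0)] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    """Product (3): the coefficient of xi^k is the sum of alpha_i * beta_j over i + j = k."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_scale(alpha, f):
    """Rule (4): multiply every coefficient by the scalar alpha."""
    return [alpha * a for a in f]

f = [Fraction(1), Fraction(2), Fraction(1)]    # 1 + 2 xi + xi^2
g = [Fraction(-1), Fraction(1)]                # -1 + xi

print(poly_add(f, g))                  # coefficients of 3 xi + xi^2
print(poly_mul(f, g))                  # coefficients of -1 - xi + xi^2 + xi^3
print(poly_scale(Fraction(1, 2), f))   # coefficients of 1/2 + xi + xi^2 / 2
```

The degree of the product is the sum of the degrees, here $2 + 1 = 3$.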
Division in the ring of polynomials over the field F has many properties which are similar to the properties of division in the ring of integers. We will now set out the most important of these, but we will not include their proofs which may be found in [24] chapter 3.
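The polynomial $\mathbf{f}$ is said to divide the polynomial $\mathbf{g}$, in symbols $\mathbf{f} \mid \mathbf{g}$, if there is a polynomial $\mathbf{h}$ such that $\mathbf{g} = \mathbf{f}\mathbf{h}$. If $\mathbf{f} \mid \mathbf{g}$ and $\mathbf{f}$ is neither equal to $\alpha\mathbf{g}$ ($\alpha \in F$) nor of degree 0, then $\mathbf{f}$ is said to be a proper divisor of $\mathbf{g}$. Every polynomial of degree 0 (i.e., every $\alpha \in F$, $\alpha \ne 0$) is a divisor of every polynomial. If $\mathbf{f} \mid \mathbf{g}$, then $\mathbf{g}$ will also be referred to as a multiple of $\mathbf{f}$.
A greatest common divisor (g.c.d.) of two polynomials $\mathbf{f}$ and $\mathbf{g}$ is any polynomial $\mathbf{h}$ which has the following two properties.
1. $\mathbf{h} \mid \mathbf{f}$ and $\mathbf{h} \mid \mathbf{g}$.
2. Every common divisor of $\mathbf{f}$ and $\mathbf{g}$ is also a divisor of $\mathbf{h}$.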
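Apart from a factor $\alpha \in F$, the g.c.d. $\mathbf{h}$ of $\mathbf{f}$ and $\mathbf{g}$ is uniquely determined by $\mathbf{f}$ and $\mathbf{g}$ and there exist polynomials $\mathbf{f}_1$ and $\mathbf{g}_1$, such that
$$\mathbf{h} = \mathbf{f}_1\mathbf{f} + \mathbf{g}_1\mathbf{g}. \qquad (5)$$
If 1 is a g.c.d. of $\mathbf{f}$ and $\mathbf{g}$, then $\mathbf{f}$ and $\mathbf{g}$ are said to be relatively prime.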
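A least common multiple (l.c.m.) of two polynomials $\mathbf{f}$ and $\mathbf{g}$ is any polynomial $\mathbf{h}$ such that
1. $\mathbf{f} \mid \mathbf{h}$ and $\mathbf{g} \mid \mathbf{h}$.
2. Every common multiple of $\mathbf{f}$ and $\mathbf{g}$ is also a multiple of $\mathbf{h}$.
Apart from a factor $\alpha \in F$, the l.c.m. $\mathbf{h}$ of $\mathbf{f}$ and $\mathbf{g}$ is also uniquely determined by $\mathbf{f}$ and $\mathbf{g}$.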
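A g.c.d. together with polynomials $\mathbf{f}_1$, $\mathbf{g}_1$ satisfying (5) can be computed by the Euclidean algorithm. The text does not prescribe this algorithm, so the following is only a sketch under assumptions: the ground field is the field of rational numbers, a polynomial is a list of coefficients in ascending powers with the zero polynomial written as `[]`, and all function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop zero leading coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p, q):
    """Return (quotient, remainder) of p by q; q must not be the zero polynomial."""
    rem = trim([Fraction(c) for c in p])
    q = trim([Fraction(c) for c in q])
    quot = []
    while len(rem) >= len(q):
        # cancel the leading term of the current remainder
        term = [Fraction(0)] * (len(rem) - len(q)) + [rem[-1] / q[-1]]
        quot = add(quot, term)
        rem = add(rem, mul([-c for c in term], q))
    return quot, rem

def extended_gcd(f, g):
    """Return (h, f1, g1) with h = f1*f + g1*g and h a monic g.c.d. (f, g not both zero)."""
    r0, r1 = trim([Fraction(c) for c in f]), trim([Fraction(c) for c in g])
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        quot, rem = divmod_poly(r0, r1)
        neg = [-c for c in quot]
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, mul(neg, s1))
        t0, t1 = t1, add(t0, mul(neg, t1))
    lead = r0[-1]                      # normalize so that the g.c.d. is monic
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]

f = [2, -3, 1]     # (xi - 1)(xi - 2)
g = [-1, 0, 1]     # (xi - 1)(xi + 1)
h, f1, g1 = extended_gcd(f, g)
print(h, f1, g1)
```

For this pair the monic g.c.d. is $\xi - 1$, and dividing the product $\mathbf{f}\mathbf{g}$ by it gives an l.c.m., here $(\xi - 1)(\xi - 2)(\xi + 1)$.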
The polynomials which correspond to the prime integers are the irreducible polynomials. A polynomial is said to be irreducible if it has no proper divisors and is not of degree 0. If $\mathbf{f}$ is irreducible, then so is $\alpha\mathbf{f}$ for all $\alpha \in F$, $\alpha \ne 0$.
Every polynomial has a representation as a product of irreducible polynomials which is unique in the following sense. If
$$\mathbf{f} = \mathbf{p}_1\mathbf{p}_2\cdots\mathbf{p}_r = \mathbf{q}_1\mathbf{q}_2\cdots\mathbf{q}_s$$
are two such representations, then $r = s$ and, with a suitable renumbering, $\mathbf{p}_k$ is identical with $\mathbf{q}_k$ except for a factor $\alpha_k \in F$ ($k = 1, \dots, r$). We see that the polynomials of degree 0 play the same part in the ring of polynomials as the two numbers $\pm 1$ in the ring of integers. They are known as the units of the ring. If $\mathbf{f}$ is monic, and the irreducible factors $\mathbf{p}_1, \dots, \mathbf{p}_r$ are also monic, then these are uniquely determined by $\mathbf{f}$.
An ideal in the ring $P$ of polynomials over the field $F$ is a subset $J$ of $P$ with the following properties.
1. With respect to addition, $J$ is a subgroup of $P$ (i.e., $J \ne \emptyset$ and, if $\mathbf{f}, \mathbf{g} \in J$, then $\mathbf{f} - \mathbf{g} \in J$; see [24] p. 372).
2. If $\mathbf{f} \in J$ and $\mathbf{g} \in P$, then $\mathbf{f}\mathbf{g} \in J$.
If an ideal $J$ of $P$ does not consist only of the zero polynomial (zero ideal), then it contains a polynomial $\mathbf{f}_0 \ne 0$ of minimum degree. Apart from a factor $\alpha \in F$, this is uniquely determined by $J$. Because, if $\mathbf{g}_0 \ne 0$ is a second polynomial of minimum degree in $J$, then, by (5), the g.c.d. $\mathbf{h}$ of $\mathbf{f}_0$ and $\mathbf{g}_0$ is also in $J$ and therefore has the same degree as $\mathbf{f}_0$ and $\mathbf{g}_0$. It follows that $\mathbf{f}_0$ and $\mathbf{g}_0$ only differ by a factor $\alpha \in F$. Now, if $\mathbf{f} \in J$, then again by (5), the g.c.d. of $\mathbf{f}$ and $\mathbf{f}_0$ is also in $J$ and hence $\mathbf{f}_0 \mid \mathbf{f}$. Thus every polynomial $\mathbf{f} \in J$ is a multiple of $\mathbf{f}_0$, i.e., $\mathbf{f}_0$ generates the ideal $J$. An ideal which is generated by a single element is generally referred to as a principal ideal. Thus every ideal
in the ring of polynomials over a field F is principal.
14.1.3 Minimal Polynomials and Annihilators
Now let $E$ be a vector space of finite dimension $n$ over a field $F$, let $f$ be an endomorphism of $E$ and let $\mathbf{f}$ be the polynomial (1) over $F$. By 5.3.1
$$\mathbf{f}_f = \sum_{k=0}^{n} \alpha_k f^k \qquad (6)$$
is again an endomorphism of $E$ ($f^k$ means the product of $k$ factors each equal to $f$ and $\alpha_0 f^0$ is the identity endomorphism multiplied by $\alpha_0$). The endomorphism $\mathbf{f}_f$ is constructed by substituting $f$ for $\xi$ in the polynomial $\mathbf{f}$. If $\mathbf{g}$ is a second polynomial over $F$, then we can also form $\mathbf{g}_f$ and we have
$$(\mathbf{f} + \mathbf{g})_f = \mathbf{f}_f + \mathbf{g}_f, \qquad (\mathbf{f}\mathbf{g})_f = \mathbf{f}_f\,\mathbf{g}_f. \qquad (7)$$
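The substitution (6) and the rules (7) can be tried out on a matrix representing $f$. The sketch below is an illustration under assumptions: the endomorphism is represented by the symmetric matrix of Example 13.6;5, polynomials are coefficient lists in ascending powers, and the helper names `mat_mul`, `mat_add`, `poly_mul` and `poly_at` are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_at(coeffs, A):
    """Evaluate sum_k coeffs[k] * A^k by Horner's scheme (A^0 is the unit matrix)."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[5, -2, -4], [-2, 2, 2], [-4, 2, 5]]
f = [-10, 1]            # xi - 10
g = [-1, 1]             # xi - 1
fg = poly_mul(f, g)     # xi^2 - 11 xi + 10

print(poly_at(fg, A) == mat_mul(poly_at(f, A), poly_at(g, A)))
print(poly_at([-11, 2], A) == mat_add(poly_at(f, A), poly_at(g, A)))
print(poly_at(fg, A))
```

Both comparisons hold, as (7) requires. For this particular matrix the product polynomial is even mapped to the zero matrix, since its eigenvalues are 10 and 1 and it is symmetric.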
The polynomial $\mathbf{f} = 1$ corresponds to the identity endomorphism and the polynomial $\mathbf{f} = 0$ corresponds to the zero-endomorphism. Now (7) means that the mapping $\mathbf{f} \to \mathbf{f}_f$ is a homomorphic mapping of the ring of polynomials into the ring of endomorphisms of $E$ (see 5.3.1).
Definition 1. A subspace $H$ of $E$ is said to be '$f$-invariant' if $f(H)$ is contained in $H$, i.e., if $H$ is mapped into itself by the endomorphism $f$.
If $H$ is $f$-invariant, then it is easy to see that $\mathbf{f}_f(H) \subseteq H$ for all polynomials $\mathbf{f}$ over $F$. Now, if $H$ is an $f$-invariant subspace of $E$ and $L$ is an arbitrary subspace of $E$, then we consider the set $J \subseteq P$ of all polynomials $\mathbf{f}$ over $F$ such that $\mathbf{f}_f(L) \subseteq H$. Then this set $J$ is an ideal. Because, if $\mathbf{f} \in J$ and $\mathbf{g} \in J$, then $(\mathbf{f} - \mathbf{g})_f(L) \subseteq \mathbf{f}_f(L) - \mathbf{g}_f(L) \subseteq H$. Further, if $\mathbf{f} \in J$ and $\mathbf{g} \in P$, then $(\mathbf{g}\mathbf{f})_f(L) = \mathbf{g}_f\,\mathbf{f}_f(L) \subseteq \mathbf{g}_f(H) \subseteq H$. Finally $0 \in J$ and therefore $J \ne \emptyset$.
This ideal $J$ is not the zero-ideal. Because, by Theorem 5.2;11, the $n^2 + 1$ endomorphisms $f^0 = 1, f, f^2, \dots, f^{n^2}$ are linearly dependent which means that there is a polynomial $\mathbf{f} \ne 0$ such that $\mathbf{f}_f = 0$. Hence $\mathbf{f}_f(L) \subseteq H$ and therefore $\mathbf{f} \in J$.
The polynomial $\mathbf{f}_0 \ne 0$ which generates $J$ is the one which is uniquely determined (apart from a factor $\alpha \in F$) as the polynomial with the least degree such that $(\mathbf{f}_0)_f(L) \subseteq H$. We now consider the following two special cases.
1. $L = E$, $H = \{0\}$. Then $\mathbf{f}_0$ is the unique (apart from a factor $\alpha \in F$) polynomial of the least degree such that $(\mathbf{f}_0)_f = 0$ (the zero-endomorphism).
2. Let $x \in E$. If we put $L = L(x)$ and $H = \{0\}$, then $\mathbf{f}_0$ is the unique (apart from a factor $\alpha \in F$) polynomial of the least degree such that $(\mathbf{f}_0)_f(x) = 0$.
Definition 2. The 'minimal polynomial' of the endomorphism $f$ is the unique monic polynomial $\mathbf{m}$ of the least degree such that $\mathbf{m}_f = 0$. The '$f$-annihilator' of a vector $x \in E$ is the unique monic polynomial $\mathbf{h}$ of the least degree such that $\mathbf{h}_f(x) = 0$. The $f$-annihilator of $x = 0$ is the polynomial 1.
Theorem 1. If $s$ is the degree of the $f$-annihilator $\mathbf{h}$ of $x \in E$ and $t \ge s$, then the vectors $x, f(x), \dots, f^{s-1}(x)$ are linearly independent, and the vectors $x, f(x), \dots, f^{s-1}(x), f^t(x)$ are linearly dependent.
Proof. 1. If the first set of vectors was linearly dependent, then there would be a polynomial $\mathbf{k} \ne 0$ of degree $< s$ such that $\mathbf{k}_f(x) = 0$ and this contradicts the definition of $\mathbf{h}$.
2. If $t = s$, the second part of the theorem follows from the fact that $\mathbf{h}_f(x)$ is a linear combination of $x, \dots, f^s(x)$. Now suppose that this part of the theorem is also true for some $t \ge s$. Then $f^t(x) \in L(x, \dots, f^{s-1}(x))$. Therefore
$$f^{t+1}(x) \in L(f(x), \dots, f^s(x)) \subseteq L(x, \dots, f^{s-1}(x))$$
so that the assertion is also true for $t + 1$ and hence by induction for all $t$.
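Theorem 1 suggests a direct way of computing the $f$-annihilator of a vector: form $x, f(x), f^2(x), \dots$ and stop at the first vector which depends linearly on its predecessors. The sketch below follows this idea under assumptions: $f$ is represented by a matrix over the rational numbers, the linear dependence is detected by exact elimination, and the function name `annihilator` is hypothetical.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Multiply the square matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def annihilator(A, x):
    """Coefficients (ascending powers) of the monic polynomial h of least degree with h(A) x = 0."""
    reduced = []                        # entries: (vector, combination, pivot index)
    v = [Fraction(c) for c in x]        # current vector f^k(x)
    k = 0
    while True:
        w = list(v)
        combo = [Fraction(0)] * k + [Fraction(1)]   # w = sum_j combo[j] * f^j(x)
        for vec, vec_combo, pivot in reduced:
            if w[pivot] != 0:
                factor = w[pivot] / vec[pivot]
                w = [a - factor * b for a, b in zip(w, vec)]
                padded = vec_combo + [Fraction(0)] * (len(combo) - len(vec_combo))
                combo = [a - factor * b for a, b in zip(combo, padded)]
        if all(c == 0 for c in w):
            return combo                # f^k(x) depends on x, ..., f^{k-1}(x)
        pivot = next(i for i, c in enumerate(w) if c != 0)
        reduced.append((w, combo, pivot))
        v = mat_vec(A, v)
        k += 1

A = [[5, -2, -4], [-2, 2, 2], [-4, 2, 5]]   # matrix of Example 13.6;5
print(annihilator(A, [1, 0, 0]))     # xi^2 - 11 xi + 10
print(annihilator(A, [2, -1, -2]))   # xi - 10, since this x is an eigenvector
print(annihilator(A, [0, 0, 0]))     # the polynomial 1
```

The coefficient of the highest power stays equal to 1 throughout, so the result is monic, and the loop ends after at most $n$ steps because $E$ cannot contain more than $n$ linearly independent vectors.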
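14.1.4 Invariant Subspaces
Theorem 2. If $\mathbf{f}$ is a polynomial over the field $F$ and $f$ is an endomorphism of $E$, then $\ker \mathbf{f}_f$ is an $f$-invariant subspace of $E$. ($\ker \mathbf{f}_f$ denotes the kernel of the endomorphism $\mathbf{f}_f$ (see Definition 5.1;2).)
Proof. If $x \in \ker \mathbf{f}_f$, then $\mathbf{f}_f(x) = 0$ and hence
$$\mathbf{f}_f(f(x)) = f(\mathbf{f}_f(x)) = f(0) = 0,$$
i.e., $f(x) \in \ker \mathbf{f}_f$.
We note that in general not all of the $f$-invariant subspaces can be found in this way, for example, when $f = e$ (the identity endomorphism), $\ker \mathbf{f}_e$ is either $\{0\}$ or $E$ depending on the polynomial $\mathbf{f}$, whereas in fact every subspace of $E$ is $e$-invariant.
Theorem 3. If $\mathbf{f} \mid \mathbf{g}$, then $\ker \mathbf{f}_f \subseteq \ker \mathbf{g}_f$.
Proof. If $x \in \ker \mathbf{f}_f$, then $\mathbf{f}_f(x) = 0$ and hence, if $\mathbf{g} = \mathbf{h}\mathbf{f}$, $\mathbf{g}_f(x) = \mathbf{h}_f\,\mathbf{f}_f(x) = 0$, i.e., $x \in \ker \mathbf{g}_f$.
Theorem 4. If $\mathbf{f}$ is a proper divisor of $\mathbf{g}$ and $\mathbf{g}$ is a divisor of the minimal polynomial $\mathbf{m}$ of $f$, then $\ker \mathbf{f}_f$ is a proper subset of $\ker \mathbf{g}_f$.
Proof. We put $\mathbf{m} = \mathbf{h}\mathbf{g}$ and $\mathbf{f}_1 = \mathbf{h}\mathbf{f}$. Then $\mathbf{f}_1$ is a proper divisor of $\mathbf{m}$ and therefore $\deg \mathbf{f}_1 < \deg \mathbf{m}$. Hence there is an $x_0 \in E$ such that $(\mathbf{f}_1)_f(x_0) \ne 0$. If we put $y_0 = \mathbf{h}_f(x_0)$, then $\mathbf{f}_f(y_0) = (\mathbf{h}\mathbf{f})_f(x_0) = (\mathbf{f}_1)_f(x_0) \ne 0$ and hence $y_0 \notin \ker \mathbf{f}_f$. On the other hand $\mathbf{g}_f(y_0) = (\mathbf{h}\mathbf{g})_f(x_0) = \mathbf{m}_f(x_0) = 0$ and hence $y_0 \in \ker \mathbf{g}_f$. The assertion follows from this and Theorem 3.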
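Theorem 5. If $\mathbf{h}$ is a g.c.d. of $\mathbf{f}$ and $\mathbf{g}$, then $\ker \mathbf{h}_f = \ker \mathbf{f}_f \cap \ker \mathbf{g}_f$.
Proof. 1. From Theorem 3, it follows that $\ker \mathbf{h}_f \subseteq \ker \mathbf{f}_f \cap \ker \mathbf{g}_f$.
2. Suppose $x \in \ker \mathbf{f}_f \cap \ker \mathbf{g}_f$, i.e., $\mathbf{f}_f(x) = \mathbf{g}_f(x) = 0$. Using (5), it follows that
$$\mathbf{h}_f(x) = (\mathbf{f}_1)_f\,\mathbf{f}_f(x) + (\mathbf{g}_1)_f\,\mathbf{g}_f(x) = 0$$
and therefore that $x \in \ker \mathbf{h}_f$.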
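Theorem 6. If $\mathbf{h}$ is an l.c.m. of $\mathbf{f}$ and $\mathbf{g}$ then $\ker \mathbf{h}_f = \ker \mathbf{f}_f + \ker \mathbf{g}_f$.
Proof. 1. From Theorem 3, it follows that $\ker \mathbf{f}_f \subseteq \ker \mathbf{h}_f$ and $\ker \mathbf{g}_f \subseteq \ker \mathbf{h}_f$ and therefore $\ker \mathbf{f}_f + \ker \mathbf{g}_f \subseteq \ker \mathbf{h}_f$.
2. Suppose $x \in \ker \mathbf{h}_f$. If we put $\mathbf{h} = \mathbf{f}_1\mathbf{f} = \mathbf{g}_1\mathbf{g}$ then 1 is a g.c.d. of $\mathbf{f}_1$ and $\mathbf{g}_1$. Therefore by (5) there exist $\mathbf{f}_2$, $\mathbf{g}_2$ such that $\mathbf{f}_2\mathbf{f}_1 + \mathbf{g}_2\mathbf{g}_1 = 1$. Putting $x_1 = (\mathbf{f}_2)_f (\mathbf{f}_1)_f(x)$, $\mathbf{f}_f(x_1) = (\mathbf{f}_2\mathbf{h})_f(x) = 0$ and therefore $x_1 \in \ker \mathbf{f}_f$. Similarly $x_2 = (\mathbf{g}_2)_f (\mathbf{g}_1)_f(x) \in \ker \mathbf{g}_f$. Further $x_1 + x_2 = e(x) = x$. Hence $x \in \ker \mathbf{f}_f + \ker \mathbf{g}_f$.
Theorem 7. If $\mathbf{f}$ and $\mathbf{g}$ are relatively prime, then $\ker (\mathbf{f}\mathbf{g})_f = \ker \mathbf{f}_f \oplus \ker \mathbf{g}_f$.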
Proof. This follows from Theorems 5 and 6, remembering that 1 is a g.c.d. and $\mathbf{f}\mathbf{g}$ is an l.c.m. of $\mathbf{f}$ and $\mathbf{g}$.
Definition 3. A subspace $L$ of $E$ is said to be '$f$-irreducible' if $L$ is $f$-invariant and there is no decomposition $L = L_1 \oplus L_2$ of $L$ into a direct sum of non-zero $f$-invariant subspaces.
Theorem 8. If $f$ is an endomorphism of the vector space $E$ of finite dimension $n$, then $E$ is equal to the direct sum of $f$-irreducible subspaces.
Proof. 1. For $n = 1$, the theorem is obvious because $E$ is itself $f$-irreducible.
2. We therefore use induction on the dimension $n$. Suppose that the theorem is true for all spaces of dimension less than $n$, and suppose that $\dim E = n$. There is nothing to prove if $E$ is already $f$-irreducible. Suppose therefore that there is a decomposition $E = L_1 \oplus L_2$, in which each $L_i$ is $f$-invariant, each $L_i \ne \{0\}$ and therefore $\dim L_i$