Linear Algebra [1 ed.]



English Pages 305+x [319] Year 1967


Table of contents:
Title
Contents
Preface
Notes for the Reader
1. Sets and Mappings
1.1 Sets
1.2 Families of Sets
1.3 Mappings
2. Vector Spaces
2.1 The Concept of a Vector Space, Examples
2.2 Rules for Calculation in Vector Spaces
2.3 Linear Combinations and Calculations with Subsets of a Vector Space
2.4 Subspaces of a Vector Space
2.5 Cosets and Quotient Spaces
2.6 Convex Sets
3. Bases of a Vector Space, Finite-Dimensional Vector Spaces
3.1 Bases of a Vector Space
3.2 Finite-Dimensional Vector Spaces
3.3 The Exchange Method
3.4 Convex Polyhedra
4. Determinants
4.1 Permutations
4.2 Determinants
4.3 Numerical Evaluation of Determinants
5. Linear Mappings of Vector Spaces, Matrices
5.1 Linear Mappings
5.2 Linear Mappings of Finite-Dimensional Vector Spaces, Matrices
5.3 Linear Mappings of a Vector Space into Itself (Endomorphisms)
5.4 Change of Basis
5.5 Numerical Inversion of Matrices
5.6 The Exchange Method and Matrix Calculation
6. Linear Functionals
6.1 Linear Functionals and Cosets
6.2 Duality in Finite-Dimensional Vector Spaces
6.3 Linear Functionals which are Positive on a Convex Set
7. Systems of Linear Equations and Inequalities
7.1 The Solutions of a System of Linear Equations
7.2 Numerical Solution of a System of Linear Equations
7.3 Positive Solutions of a System of Real Linear Equations
7.4 Systems of Linear Inequalities
8. Linear Programming
8.1 Linear Programmes
8.2 The Duality Law of Linear Programming
8.3 The Simplex Method for the Numerical Solution of Linear Programmes
8.4 The Treatment of Free Variables
8.5 General Linear Programmes
8.6 The Simplex Method and Duality
9. Tchebychev Approximations
9.1 Tchebychev's Method of Approximation
9.2 The Proof of Two Results used Earlier
10. Game Theory
10.1 Two-Person Zero-Sum Games, Pure Strategies
10.2 Mixed Strategies
10.3 The Evaluation of Games by the Simplex Method
11. Forms of the Second Degree
11.1 Quadratic Forms on Real Vector Spaces
11.2 Hermitian Forms on Complex Vector Spaces
12. Euclidean and Unitary Vector Spaces
12.1 Euclidean Vector Spaces
12.2 Approximation in Euclidean Vector Spaces, The Method of Least Squares
12.3 Hilbert Spaces
12.4 Unitary Vector Spaces
13. Eigenvalues and Eigenvectors of Endomorphisms of a Vector Space
13.1 Eigenvectors and Eigenvalues
13.2 Symmetric Endomorphisms of a Euclidean Vector Space
13.3 The Transformation of Quadratic Forms to Principal Axes
13.4 Self-Adjoint Endomorphisms of a Unitary Vector Space
13.5 Extremal Properties of the Eigenvalues of Symmetric and Self-Adjoint Endomorphisms
13.6 Numerical Calculation of Eigenvalues and Eigenvectors
14. Invariant Subspaces, Canonical Forms of Matrices
14.1 Invariant Subspaces
14.2 Canonical Forms of Matrices
Bibliography
Index

  • Commentary
  • English translation of https://libgen.is/book/index.php?md5=EA9A00BDDA09EE16FB525DDD107CE515. The German 2nd edition, https://libgen.is/book/index.php?md5=2BD14658B6913847334357DB3180E24D, has not been translated to my knowledge.

LINEAR ALGEBRA

INTERNATIONAL SERIES IN PURE AND APPLIED MATHEMATICS
William Ted Martin and E. H. Spanier, Consulting Editors

AHLFORS · Complex Analysis
BELLMAN · Stability Theory of Differential Equations
BUCK · Advanced Calculus
BUSACKER AND SAATY · Finite Graphs and Networks
CHENEY · Introduction to Approximation Theory
CODDINGTON AND LEVINSON · Theory of Ordinary Differential Equations
DETTMAN · Mathematical Methods in Physics and Engineering
EPSTEIN · Partial Differential Equations
GOLOMB AND SHANKS · Elements of Ordinary Differential Equations
GRAVES · The Theory of Functions of Real Variables
GREENSPAN · Introduction to Partial Differential Equations
GRIFFIN · Elementary Theory of Numbers
HAMMING · Numerical Methods for Scientists and Engineers
HILDEBRAND · Introduction to Numerical Analysis
HOUSEHOLDER · Principles of Numerical Analysis
LASS · Elements of Pure and Applied Mathematics
LASS · Vector and Tensor Analysis
LEPAGE · Complex Variables and the Laplace Transform for Engineers
NEHARI · Conformal Mapping
NEWELL · Vector Analysis
RALSTON · A First Course in Numerical Analysis
ROSSER · Logic for Mathematicians
RUDIN · Principles of Mathematical Analysis
SAATY AND BRAM · Nonlinear Mathematics
SIMMONS · Introduction to Topology and Modern Analysis
SNEDDON · Elements of Partial Differential Equations
SNEDDON · Fourier Transforms
STOLL · Linear Algebra and Matrix Theory
STRUBLE · Nonlinear Differential Equations
WEINSTOCK · Calculus of Variations
WEISS · Algebraic Number Theory
ZEMANIAN · Distribution Theory and Transform Analysis

LINEAR ALGEBRA

WALTER NEF
Professor of Mathematics, University of Berne, Switzerland

Translated from the German by

I. C. Ault, Lecturer in Mathematics, University of Leicester

McGRAW-HILL BOOK COMPANY

New York · St. Louis · San Francisco · Toronto · London · Sydney

Published by McGraw-Hill Publishing Company Limited, McGraw-Hill House, Maidenhead, Berkshire, England

Authorized translation from the first German-language edition, copyrighted in Switzerland and published by Birkhäuser Verlag AG, Basel

Lehrbuch der Linearen Algebra first published by Birkhäuser Verlag, Basel, 1966. © 1966 Birkhäuser Verlag, Basel

Copyright © 1967 McGraw-Hill, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of McGraw-Hill, Inc.


PREFACE

This book is based on an introductory course of lectures on linear algebra which I have often given in the University of Berne. This course is intended for students in their second year and is for both specialist and supplementary mathematicians. These latter usually include actuaries, astronomers, physicists, chemists and also, sometimes, mathematical economists.

Because of the wide diversity of the audience, it was necessary to develop the discussion from as few basic assumptions as possible and this is a feature of the book. It is assumed that the reader has had a grammar school education in mathematics and that he has the ability to think abstractly. It is also helpful if he has some knowledge of vector geometry because this will provide him with visual illustrations of the general theory.

It was also necessary to restrict the course to the most important topics, so that in the book, with the exception of the last chapter, only real and complex vector spaces will be considered. In addition, it was necessary to consider the applications of linear algebra in accordance with the needs of the majority of the audience. Apart from details concerning the choice of material, the main effect of this requirement has been the inclusion of simple techniques for the solution of the most important types of numerical problem. Naturally, it has not been possible to give a complete discussion of these. It is not usual practice to deal with linear programming, Tchebychev approximations, and game theory in a textbook of linear algebra, but these topics are becoming increasingly important, so I have presented them in the form of introductory chapters.

The publication of this book would hardly have been possible without the very considerable help of several co-workers. Especially, I would like to thank Miss Dora Hänni for typing the manuscript and Messrs. H. P. Brier, H. P. Blau, D. Fischer, and N. Ragaz for their help with the proof-reading, and also H. P. Blau for drawing the diagrams. Lastly, I thank the Birkhäuser Verlag and the McGraw-Hill Publishing Company, Limited for their always friendly co-operation and their careful preparation of the book.

W. NEF

Notes for the Reader

1. The book is divided into fourteen chapters and the chapters into sections. The numbering of definitions, theorems, examples and individual formulae starts afresh in each section. For example, a reference to Theorem 2.4;9 would mean the Theorem 9 in section 2.4. Similarly, 3.3;(11) would mean the formula (11) in section 3.3. The number of the section is omitted whenever the reference is made within the same section.

2. In contrast with the usual notation, the matrix which has just one column consisting of the components ξ of a vector x will be denoted by 𝔵 (instead of x). The transpose of 𝔵 is denoted by 𝔵′. This avoids the problem of having different things denoted by the same symbol. (See 5.2.2.)

3. At the end of some of the sections, there are 'Exercises' and 'Problems'. The exercises are numerical examples and the problems are theoretical examples which are often extensions of the material in the text.

4. Readers who are not interested in linear programming, Tchebychev approximations, and game theory may omit the following sections and chapters: 2.6, 3.4, 6.3, 7.3, 7.4; 8, 9, 10.

CHAPTER 1

SETS AND MAPPINGS

In this chapter we state and discuss the properties of sets and mappings which will be needed in the subsequent chapters. We will do this at an intuitive level without attempting to carry out a strictly axiomatic approach.

1.1 Sets

A set is a collection of objects which is thought of as an entity in itself. The objects in a set M are usually referred to as the elements of the set and this relationship is written formally as

x ∈ M.

This is usually read as 'x is an element of M' or 'x belongs to M' or simply as 'x is in M'. A set M is said to 'contain' its elements. If the object y is not an element of M, we write y ∉ M.

A set is completely determined by its elements. Consequently two sets M and N are considered to be equal if and only if they contain the same elements, i.e., M = N if and only if

x ∈ M implies x ∈ N and x ∈ N implies x ∈ M.

A set is said to be finite if it contains only a finite number of elements. If every element of the set A is also an element of the set B, we say that A is a subset of B and express this in symbols by

A ⊆ B or B ⊇ A.

These are usually read as 'A is a subset of B' or 'A is contained in B' or 'B contains A'. In order to prove that the set A is a subset of the set B it is therefore necessary to show that

x ∈ A implies x ∈ B.


Clearly A = B if and only if A ⊆ B and B ⊆ A. Every set A is a subset of itself. All other subsets M of A are referred to as proper subsets of A and we express this in symbols by M ⊂ A.

In many cases it is convenient to be able to consider the set which contains no elements at all. We call this the empty set and denote it by the symbol ∅. The empty set is a subset of every other set.

In order to characterize a set, it is necessary to specify its elements. For a finite set A, this can be done by writing out a list of its elements in the form A = {x₁, …, xₙ}. Thus, for example, {1, 3, 5} is the set which contains the numbers 1, 3 and 5 as its elements. If A is countably infinite, we will extend this notation to write A = {x₁, x₂, x₃, …}.

More generally, we will use the notation

A = {x; φ(x)}     (1)

where φ(x) denotes a statement by which the elements of A are characterized. In words, (1) reads: 'A is the set of all those elements x for which the statement φ(x) is true'. For example, in 3-dimensional space, if x is an arbitrary vector and y is a fixed vector, then A = {x; x is orthogonal to y} is the set of all vectors x which are orthogonal to the vector y. If ξ is a real number and N is the set of all natural numbers (positive non-zero integers), then

B = {ξ; ξ² ∈ N}

is the set of square roots of the natural numbers.

1.2 Families of Sets

Many of the sets studied in mathematics have sets as their elements, that is to say, we will need to deal with 'sets of sets'. If we wish to emphasize that the elements of a set 𝒮 are again sets, we will call 𝒮 a family of sets and use capital script letters.

Example 1. For each positive non-zero real number p, let Aₚ be the set Aₚ = {ξ; −p < ξ < p}. If 𝒮 is the family of all these sets and p ranges over all p ≥ p₀ > 0, then the intersection of the Aₚ is Aₚ₀ ∈ 𝒮 and 𝒮 has the least element Aₚ₀.

The difference set A\B of two sets A and B is defined by

A\B = {x; x ∈ A, x ∉ B}.

That is, A\B consists of those elements of A which are not contained in B. For example, if A = {1, 2, 5, 7} and B = {5, 6, 7}, then A\B = {1, 2} and B\A = {6}.


Problems

1. Prove the distributive laws

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

and

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

2. Let E be a set and, for each subset A of E, let Ā denote the complement of A in E, i.e. the set E\A. Prove the following rules.

(a) the complement of A ∪ B is Ā ∩ B̄
(b) the complement of A ∩ B is Ā ∪ B̄
(c) A ⊆ B if and only if A ∩ B̄ = ∅

1.3 Mappings

1.3.1 General Mappings

Let E and F be two sets. A mapping of E into F is a rule which assigns to each element x ∈ E a unique element y ∈ F. The element y ∈ F which is assigned to the element x ∈ E by the mapping f is called the image of x under f and is denoted by f(x). The set E is called the domain of f and F is called the range of f.

Example 1. Suppose E is the set of all squares in a Euclidean plane and F is the set of all real numbers. A mapping of the domain E into the range F may be defined by assigning to each square the real number which is its area.

Mappings will usually be denoted by small Roman letters f, g, …. (An exception will be made in the case of permutations, cf. 4.1.) When an element x ∈ E goes into the element y ∈ F under the mapping f, this will be indicated by the notation x → y = f(x). Two mappings f and g of E into F will be considered to be equal if f(x) = g(x) for all x ∈ E, and this will be written briefly as f = g.

Not every element of F needs to be the image of an element in E. The images of the elements of E form a subset of F which we call the image of E under f and denote by f(E). Thus f(E) = {y; y ∈ F and there is an element x ∈ E such that y = f(x)}. If it happens that f(E) = F, we say that f is a mapping of E onto F. More generally, if A is a subset of E, we use the notation f(A) for the set of all the images of the elements of A. That is, f(A) = {y; there is an x ∈ A such that f(x) = y}.

Now let E, F and G be three sets and let f be a mapping of E into F and g


a mapping of F into G. We can define a mapping of E into G by assigning to the element x ∈ E the element z = g(f(x)) in G. This mapping is called the product of g and f and we denote it by gf (in this order!). The product mapping gf is therefore defined by gf(x) = g(f(x)) for all x ∈ E.

It is important to remember the order of the factors. The product fg may have no meaning in general and even when it has (e.g., when E = F = G) fg need not be equal to gf.

Example 2. Suppose E = F = G = {1, 2, 3}. Let f be defined by f(1) = 2, f(2) = 3, f(3) = 1 and g by g(1) = 1, g(2) = 3, g(3) = 2. Then, for example, fg(2) = f(3) = 1 and gf(2) = g(3) = 2 and hence fg ≠ gf.

Now let H be a fourth set and let h be a mapping of G into H. Then both the products h(gf) and (hg)f have a meaning and, in the following fundamental Theorem, we prove that they are equal.

Theorem 1. Multiplication of mappings is associative.

Proof. For x ∈ E, let y = f(x) ∈ F, z = g(y) ∈ G and u = h(z) ∈ H. Then gf(x) = z and h(gf)(x) = h(z) = u. Similarly hg(y) = u and therefore (hg)f(x) = u. Thus h(gf)(x) = (hg)f(x) for all x ∈ E and hence h(gf) = (hg)f.

In view of this theorem it is possible to omit the brackets and simply write hgf for the product of h, g and f.

If f is a mapping of E into F and K is a subset of F, we define the inverse image of K under f to be the subset of E given by

f⁻¹(K) = {x; x ∈ E, f(x) ∈ K}.

That is, f⁻¹(K) is the subset of E which consists of all those elements whose images under f are in K. Note that the symbol f⁻¹ does not represent a mapping in general because it is possible for many elements in E to have the same image in F. Clearly f⁻¹(F) = f⁻¹(f(E)) = E, but we also note that it is possible for f⁻¹(K) to be the empty set and that this will happen when K ∩ f(E) = ∅. If K consists of a single element, say K = {y}, we will write f⁻¹(y) for f⁻¹({y}) and this then means the set of all x ∈ E for which f(x) = y.

Example 3. As in Example 1, let E be the set of all squares in a Euclidean plane, F the set of all real numbers and f the mapping which assigns to each square its area. If ξ ∈ F and ξ > 0, then f⁻¹(ξ) is the set of all squares of side √ξ.


1.3.2 One-to-One Mappings

A mapping f is called one-to-one (1-1) when no two different elements in the domain have the same image in the range, i.e. when

x₁ ≠ x₂ implies f(x₁) ≠ f(x₂)

or, alternatively,

f(x₁) = f(x₂) implies x₁ = x₂.

If E is a finite set, a mapping f of E is clearly 1-1 if and only if the image f(E) contains exactly the same number of elements as E.

When f is 1-1, f⁻¹(y) contains exactly one element x ∈ E for each element y ∈ f(E). Putting f⁻¹(y) = x (instead of {x}), f⁻¹ can now be thought of as a mapping from f(E) onto E and we call it the inverse mapping of f. The mapping f⁻¹ is also 1-1. Obviously f⁻¹f is the mapping of E onto itself under which each x ∈ E goes into itself. We call this the identity mapping on E.

Similarly ff⁻¹ is the identity mapping on f(E).

Theorem 2. The product of two 1-1 mappings is 1-1.

Proof. Let f and g be 1-1 mappings and suppose that gf is defined. If gf(x₁) = gf(x₂), then g(f(x₁)) = g(f(x₂)) and, since g is 1-1, f(x₁) = f(x₂) and hence x₁ = x₂.
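For mappings of finite sets, the 1-1 property and the inverse mapping can be computed directly. A sketch, again with dictionaries standing in for mappings; the non-1-1 example h is our own:

```python
# A mapping of a finite set E, given as a dict, is 1-1 exactly when no two
# elements share an image, i.e. when f(E) has as many elements as E.
def is_one_to_one(p):
    return len(set(p.values())) == len(p)

f = {1: 2, 2: 3, 3: 1}   # a 1-1 mapping of {1, 2, 3} onto itself
h = {1: 2, 2: 2, 3: 1}   # not 1-1: the elements 1 and 2 share the image 2
assert is_one_to_one(f) and not is_one_to_one(h)

def inverse(p):
    """For a 1-1 mapping p, p^{-1} is itself a mapping of p(E) onto E."""
    assert is_one_to_one(p)
    return {y: x for x, y in p.items()}

f_inv = inverse(f)
# f^{-1} f is the identity mapping on E:
assert all(f_inv[f[x]] == x for x in f)
```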

1.3.3 Mappings of a Set into Itself

It is possible for the domain and the range of a mapping to be the same set and, in this case, we will be dealing with a mapping of a set into itself. The best known examples of this are provided by the concept of a function. For example, the function y = sin x assigns to each real number x the real number sin x, so that the function is a mapping of the set of real numbers into itself.

The 1-1 mappings of a set E onto itself are of particular interest. (Note that if E is finite, a mapping of E into itself is 1-1 if and only if it is onto E, so that only one of these two properties needs to be assumed.)

If we denote by T(E) the set of all 1-1 mappings of the set E onto itself, then T(E) has the following properties.

1. If f, g ∈ T(E), then fg and gf ∈ T(E).
2. If f ∈ T(E), then f⁻¹ ∈ T(E).
3. The identity mapping on E is in T(E).

Together with the associativity of the multiplication of mappings proved earlier in Theorem 1.3;1, properties 1, 2, 3 show that T(E) is a group. (For the concept of a group, see 2.1.1 or [25] pp. 1-3.)


Example 4. Let E = {1, 2, 3}. The group T(E) consists of the following six mappings:

f₁ = (1 2 3; 1 2 3), f₂ = (1 2 3; 2 3 1), f₃ = (1 2 3; 3 1 2),
f₄ = (1 2 3; 2 1 3), f₅ = (1 2 3; 1 3 2), f₆ = (1 2 3; 3 2 1).

In these, the mapping f is defined by writing the elements of E in the first row and then under each element its image under f. Thus f₁ is the identity mapping and, with the labelling above, for example f₂⁻¹ = f₃. It is clear that f₁, f₂, …, f₆ are just the permutations of the three elements 1, 2, 3. Permutations of a general set of elements will be introduced later in 4.1 by using this idea.

Problems

1. Let f be a mapping of the set E into the set F and let A, B be subsets of E. Prove that f(A ∪ B) = f(A) ∪ f(B) and f(A ∩ B) ⊆ f(A) ∩ f(B). Give an example to show that the ⊆ sign cannot be replaced by = in the second of these rules. What more can be said if f is 1-1?

2. Let f be a mapping of E into E and let A, B be subsets of E. Further let A = f⁻¹(A*) and f(B) = B*. Prove that f(A ∩ B) = A* ∩ B*.
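The three group properties of T(E) can be verified by brute force for E = {1, 2, 3}. A sketch, using dictionaries for the six mappings of Example 4:

```python
from itertools import permutations

E = (1, 2, 3)
# T(E): all 1-1 mappings of E onto itself, i.e. the six permutations.
T = [dict(zip(E, image)) for image in permutations(E)]
assert len(T) == 6

def compose(p, q):
    # Product pq in the book's order: (pq)(x) = p(q(x)).
    return {x: p[q[x]] for x in E}

identity = {x: x for x in E}

# Property 1: T(E) is closed under products.
assert all(compose(p, q) in T for p in T for q in T)
# Property 2: the inverse of every mapping in T(E) is again in T(E).
assert all({y: x for x, y in p.items()} in T for p in T)
# Property 3: the identity mapping on E is in T(E).
assert identity in T
```

Together with the associativity checked earlier, these are exactly the group axioms for T(E).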

CHAPTER 2

VECTOR SPACES

2.1 The Concept of a Vector Space, Examples

Linear algebra can be characterized as the study of vector spaces. Vector spaces, which are sometimes also called linear spaces, are a particular kind of algebraic system and accordingly they are sets in which certain operations on the elements are defined. (For the general concept of an algebraic system, see for example [24].)

2.1.1 Real Vector Spaces

Let E be a set whose elements are denoted by small Roman letters, and suppose that there is a rule which assigns to each ordered pair of elements x, y ∈ E a further element z ∈ E which we write as the sum of x and y, i.e. we express the relationship between z and x, y by the formula

z = x + y.

The construction of z from x and y is known as a binary operation on the set E which in this case is written as addition. We have said that z should be constructed from the ordered pair x, y (i.e., the pair y, x is different from the pair x, y), because initially we do not want to assume that the addition is commutative, i.e., that x + y = y + x for all x, y ∈ E. We will be able to prove this as a consequence of the other conditions to be introduced later.

Now suppose that there is a further rule which assigns to each real number α and each element x ∈ E a further element u ∈ E which we write as u = αx. The element u will be referred to as the product of the element x ∈ E with the real number (or the scalar) α. The construction of u from α and x is a binary operation in the set of all real numbers and of elements of E which we will call multiplication by scalars.

Definition 1. The set E together with the operations considered above is a 'real vector space', if the operations satisfy the following seven axioms.

A1. Addition is associative: x + (y + z) = (x + y) + z for all x, y, z ∈ E.
A2. There is a zero-element 0 ∈ E such that x + 0 = 0 + x = x for all x ∈ E.
A3. To each x ∈ E, there is an inverse element (−x) ∈ E such that (−x) + x = x + (−x) = 0.

Except when there may be some risk of misunderstanding we will normally write −x in place of (−x).

M1. 1x = x for all x ∈ E.
M2. Multiplication by scalars is associative: α(βx) = (αβ)x for all scalars α, β and all x ∈ E.
D1. α(x + y) = αx + αy for all scalars α and all x, y ∈ E.
D2. (α + β)x = αx + βx for all scalars α, β and all x ∈ E.

D1 and D2 are known as distributive laws. We will prove in 2.2.5 that, as a consequence of these seven axioms, the addition is also commutative. Because this is so important, we will already make a note of it here in the form of a further axiom.

A4. x + y = y + x for all x, y ∈ E (see Theorem 2.2;1).

An algebraic system which has a binary operation satisfying the axioms A1, A2 and A3 is called a group. There are many examples of groups in which the operation is not commutative (e.g., the groups A(E) and GLn in 5.3, where the operation is written as multiplication instead of addition). Consequently, in the case of a vector space, the axioms A1, A2 and A3 will not be sufficient in themselves to prove A4, and the other axioms must also

be used in the proof.

The elements of a vector space will usually be referred to as vectors. This name comes from the first of the following examples of vector spaces in which the elements are vectors of 3-dimensional space in the sense of elementary geometry. As before, we will use small Roman letters for vectors and small Greek letters for scalars throughout the rest of this work.

The significance of Linear Algebra in mathematics and in its applications stems from the fact that various types of vector space arise naturally in many different branches of mathematics. We will now set out some examples of these for future reference.

Example 1. We start with an (affine or Euclidean) 2-dimensional plane or a 3-dimensional space and consider vectors in the sense of elementary geometry. These can be represented by directed segments (arrows) where two

arrows which are obtained from each other by a parallel shift represent the same vector. For these vectors, an addition and a multiplication by real scalars are defined in a way which is well-known and may be remembered with the help of Fig. 2.

It is possible to prove by geometrical methods that these two operations satisfy the seven axioms of Definition 1. The set E of all vectors of the plane or of the 3-space is therefore a real vector space with this addition and multiplication by scalars. We will refer to these two spaces as G2 (for the plane) and G3 (for the space). Note that they are different from the plane and space which appear in their construction and which are of course sets of points.

This example is of particular importance for two reasons.
1. It makes possible the use of Linear Algebra in the development of analytic geometry.
2. It enables us to illustrate results in Linear Algebra geometrically in a plane or in 3-space.

Example 2. Let n be a natural number and let E be the family of all ordered sets of n real numbers (n-tuples). If x = (ξ₁, …, ξₙ) ∈ E and y = (η₁, …, ηₙ) ∈ E are two such ordered sets, we define their sum by

x + y = (ξ₁ + η₁, …, ξₙ + ηₙ) ∈ E,

and the product of x and the scalar α by

αx = (αξ₁, …, αξₙ).

It is easy to verify that the seven axioms are satisfied. In particular the zero-element is 0 = (0, …, 0) ∈ E. (Note that here the left-hand zero denotes the zero-element or zero-vector of E, while the zeros inside the brackets denote the real number 0.) Further −x = (−ξ₁, …, −ξₙ). The real vector space so defined is called the space of n-tuples and we will refer to it as Rn.
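The axioms are particularly transparent in Rn, where every verification reduces to a componentwise calculation. A minimal sketch of the two operations of Example 2, with tuples modelling the n-tuples (the particular vectors are arbitrary choices):

```python
# Componentwise addition and multiplication by scalars in R^n (Example 2).
def add(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(alpha, x):
    return tuple(alpha * xi for xi in x)

x, y = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
zero = (0.0, 0.0, 0.0)

assert add(x, y) == (5.0, 7.0, 9.0)
assert add(x, zero) == x                                      # A2: zero-element
assert add(scale(-1, x), x) == zero                           # A3: inverse element
assert scale(2, add(x, y)) == add(scale(2, x), scale(2, y))   # D1
assert add(x, y) == add(y, x)                                 # A4: commutativity
```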

Example 3. Instead of ordered sets of n real numbers, we can also consider countably infinite sequences of real numbers, i.e., x = (ξ₁, ξ₂, ξ₃, …). Addition and multiplication by scalars can be defined exactly as in Example 2 (viz. 'term by term') and, in this way, a new real vector space is constructed which we call the space of sequences and denote by F.

We obtain another vector space F0 which is a subset of F, by considering only those sequences which contain all zeros from some index on (the index may differ from sequence to sequence), i.e., those sequences which contain only a finite number of non-zero terms.

function defined on N). The same is true in Example 2 when N is replaced by {1,2, . . .,n}. We obtain a generalization of these examples by replacing N with an arbitrary set A. We denote by F(A) the set of all mappingsf of the set A into the set R of real numbers. If we now define for mappings fig 6 F(A) and scalar ac and

f+9

by (f+9)(2) = f(2)+9(Z)

06f

by

(1)

(0613(2) = 04(2)

for all z e A, then F(A) becomes a real vector space. By analogy with Example 3, we can further consider the setF0(A) E F(A) which consists of those mappings f e F(A) such that f(z) #0 for only a finite number of elements 2 e A. Obviously (1) defines operations in F0(A) and F0(A) is itself a vector space.

Example 5. Again let n be a natural number and let Pn be the set of all real polynomials x in a real variable τ which have degree at most n,

x(τ) = Σₖ₌₀ⁿ αₖτᵏ     (α₀, …, αₙ real).

We define addition and multiplication by scalars in the usual way for functions, i.e. we put

z = x + y when z(τ) = x(τ) + y(τ) for all τ ∈ R

and

u = αx when u(τ) = αx(τ) for all τ ∈ R.


With these definitions x + y and αx are polynomials of degree at most n and the seven axioms are satisfied. This real vector space of polynomials of degree at most n is denoted by Pn.

Example 6. In the same way, we can consider the set P of all real polynomials (without restriction on the degree). We then obtain the space of polynomials P.

Example 7. Let C be the set of continuous real-valued functions x(τ) on the real interval −1 ≤ τ ≤ +1. Addition and multiplication by scalars are defined as in Example 5. Since, for x, y ∈ C, x + y ∈ C and αx ∈ C for all scalars α, and since the axioms are satisfied, we obtain another real vector space which will be denoted by C.

Of course we can replace the interval −1 ≤ τ ≤ +1 by an arbitrary closed interval ρ ≤ τ ≤ σ to obtain a vector space C(ρ, σ).
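The pointwise operations of Examples 5 to 7 can be modelled concretely for Pn by coefficient lists; this representation is our own device for illustration (the book treats the polynomials as functions). The sketch checks that the sum of two polynomials really is their pointwise sum:

```python
# A polynomial of degree at most n as its coefficient tuple (a_0, ..., a_n).
def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    return tuple(alpha * a for a in x)

def evaluate(x, tau):
    # x(tau) = sum over k of a_k * tau^k
    return sum(a * tau**k for k, a in enumerate(x))

x = (1, 0, 2)   # the polynomial 1 + 2*tau^2
y = (0, 3, 1)   # the polynomial 3*tau + tau^2
z = add(x, y)

# (x + y)(tau) = x(tau) + y(tau) for every tau, as in Example 5.
for tau in (-1.0, 0.0, 0.5, 2.0):
    assert evaluate(z, tau) == evaluate(x, tau) + evaluate(y, tau)
assert evaluate(scale(2, x), 2.0) == 2 * evaluate(x, 2.0)
```

Adding coefficient tuples and adding function values agree precisely because addition in Pn is defined pointwise.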

Example 8. Let τ be a variable angle measured in radians. The functions cos kτ and sin kτ are periodic of periodicity 2π for each natural number k and so can be thought of as real-valued functions on the circumference of a circle. The same is true for all finite sums of the form

x(τ) = ½α₀ + Σₖ₌₁ {αₖ cos kτ + βₖ sin kτ}     (2)

with real coefficients αₖ and βₖ, all but a finite number of which are zero. (The factor ½ in the term ½α₀ is introduced for purely technical reasons; see Example 3.1;8.)

If we again define addition and multiplication by scalars as in Example 5, we see that, if x and y are of the form (2), then so are x + y and αx. Also the seven axioms are satisfied and hence we obtain the real vector space of trigonometric polynomials which we denote by T.

Example 9. Let fl, . . ., 5,, be real variables. A linear form in these variables is a function x which can be represented in the form

x(a,...,a) = u1§1+...+«na = élaka with real coefficients a1, . . ., oak. It is easy to verify that the set of all linear forms in the variables $1, . . .,§,, together with their natural addition and multiplication by scalars is a real vector space Ln. Example 10. The sum x+y of two complex numbers and the product we of a complex number x by a real number at are again complex numbers and the seven axioms are satisfied. Hence the set of complex numbers with the


CH. 2: VECTOR SPACES

usual addition and multiplication is a real vector space which we will denote by K. (In this context, only the multiplication by real numbers is relevant and the multiplication of arbitrary complex numbers is not needed.)

Example 11. In Example 10, the set of complex numbers can be replaced by the set of real numbers. The latter then becomes a real vector space which we call the vector space of real scalars and denote by S_R.

2.1.2

Complex Vector Spaces

The definition of the concept of a vector space can be straight-forwardly modified so that the complex numbers appear as scalars in place of the real numbers. We will then use the term ‘complex vector space’.

Definition 2. A 'complex vector space' E is a set in which an addition (x, y ∈ E → z = x+y ∈ E) and a multiplication by complex numbers (x ∈ E, α complex → u = αx ∈ E) are defined in such a way that the seven axioms of Definition 1 are satisfied.

We will again refer to the elements of a complex vector space as vectors and to the complex numbers as scalars in this context.

The theories of real and of complex vector spaces have very many results in common. Because of this, it is convenient in the following to make the

convention that all results, which are not specifically stated to apply to only one type of vector space, apply to both.

Examples of Complex Vector Spaces

Example 12. In Examples 2 and 3 (Rⁿ, F, F₀), if we use ordered sets and sequences of complex numbers (instead of real numbers) and use complex scalars, we obtain three examples of complex vector spaces. Similarly we obtain a complex version of Example 4 by replacing F(A) with the set of all mappings of A into the complex numbers and at the same time using complex scalars.

Example 13. Corresponding to Examples 5 and 6 (Pⁿ and P), we obtain complex vector spaces by leaving the variable τ real but allowing the coefficients αₖ of the polynomials to be complex and using complex scalars. Notice that the set of polynomials with complex coefficients can also be made into a real vector space by considering only multiplication by real scalars. This set becomes a complex vector space as soon as arbitrary complex numbers are allowed as scalars.

Naturally we could also take τ to be a complex variable and, in this way,

2.1: THE CONCEPT OF A VECTOR SPACE

we would obtain new examples of vector spaces insofar as the elements are new. However, it is well known that a polynomial in a complex variable τ is completely determined by its values on the real numbers, so that in fact no essentially different example is found in this way.

Example 14. Example 7 (C) can be made into a complex vector space by taking the functions x(τ) to be complex-valued continuous functions on the real interval −1 ≤ τ ≤ +1. The sum of two functions of this type and the product of one by a complex scalar are again functions of the same type.

Example 15. A complex vector space is constructed from Example 8 (T) if the coefficients αₖ and βₖ are allowed to be complex and the scalars are taken to be complex. In view of Euler's formulae

e^{ikτ} = cos kτ + i sin kτ,

cos kτ = (e^{ikτ} + e^{−ikτ})/2,    sin kτ = (e^{ikτ} − e^{−ikτ})/(2i),

the complex trigonometric polynomials can be written more concisely in the form

x(τ) = Σ_{k=−∞}^{+∞} αₖ e^{ikτ}    (αₖ = 0 for all but a finite number of k's).

Example 16. The complex version of Example 9 (Lⁿ) is also easy to define. We merely have to take the coefficients αₖ and the scalars to be complex. It is also possible to choose the variables ξₖ to be real or complex.

Example 17. Finally, the set of all complex numbers is itself a complex vector space with the usual addition and multiplication. We call it the vector space of complex scalars and denote it by S_K.

Problems

1. Is the set of all integers with their usual addition and multiplication by real scalars a vector space?

2. Show that the set of all vectors x = (ξ₁, ξ₂, ξ₃) ∈ R³ for which 2ξ₁ − ξ₂ + ξ₃ = 0 is a vector space with the operations defined in R³. Is this still true for the set of those vectors for which 2ξ₁ − ξ₂ + ξ₃ = 1?

3. Show that the set of all polynomials x(τ) ∈ P (Example 2.1;6) for which ∫_{−1}^{+1} x(τ) dτ = 0 is a vector space with the operations defined in P. Is this still true if the condition is ∫_{−1}^{+1} x(τ) dτ = 1?


4. Let E₁, E₂ be real vector spaces. Let F be the set of all ordered pairs (x₁, x₂), where x₁ ∈ E₁ and x₂ ∈ E₂, and define

(x₁, x₂) + (y₁, y₂) = (x₁ + y₁, x₂ + y₂),
α(x₁, x₂) = (αx₁, αx₂).

Prove that F is a vector space. This is known as the direct product of E₁ and E₂. (If E₂ is spanned by a single element u ≠ 0 (see Definition 2.4;2), then the construction of the direct product is also referred to as adjoining the vector u to the space E₁.)
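The construction in Problem 4 is easy to realize concretely; a small Python sketch, taking both factors to be R for simplicity (the function names are mine):

```python
# Direct product of two vector spaces modelled as ordered pairs,
# with componentwise addition and scalar multiplication (Problem 4).
def pair_add(p, q):
    (x1, x2), (y1, y2) = p, q
    return (x1 + y1, x2 + y2)

def pair_scale(a, p):
    x1, x2 = p
    return (a * x1, a * x2)

p = (1.0, 2.0)
q = (3.0, -1.0)
r = pair_add(p, q)          # componentwise sum
s = pair_scale(2.0, p)      # componentwise scalar multiple
```

The seven axioms hold componentwise because they hold in each factor, which is the substance of the requested proof.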

2.2

Rules for Calculation in Vector Spaces

2.2.1

Sums of Finitely Many Vectors

Addition in a vector space is initially defined only for two terms. If we are given three vectors x, y, z ∈ E, we can first form x+y and then the sum of this vector with z, i.e., (x+y)+z ∈ E. Similarly we can form x+(y+z) ∈ E. The associative law A1 now says that these two expressions represent the same vector. In other words, it does not matter in which way the three terms are bracketed together (for a given order) and so the brackets may be omitted. Hence, for given x, y, z, w = x+y+z is a well-defined vector in E. The same is true for an arbitrary number of vectors. For example, for four vectors, in view of A1,

x + (y + (z+w)) = (x+y) + (z+w) = ((x+y) + z) + w

so that it is possible to write x+y+z+w. In order to avoid possible misunderstandings, we note here that a sum of finitely many vectors has a well-defined meaning but that infinite series of vectors are meaningless (nevertheless see 12.3). A convenient notation for a sum of finitely many vectors is obtained by using the summation sign:

x = x₁ + … + xₙ = Σ_{k=1}^n xₖ.

2.2.2

The Zero-Vector and Inverse Vectors

In connection with A2, we note that, in a vector space E, there is only one zero-element. If 0 and 0* are two such elements, i.e. 0 + x = x and y + 0* = y for all x, y ∈ E, then in particular 0 = 0 + 0* = 0*. (Put x = 0* and y = 0.) Similarly we note that, in connection with A3, there is only one inverse vector of a given vector x ∈ E. If (−x) and (−x)* are two such vectors, then

(−x)* = 0 + (−x)* = ((−x) + x) + (−x)* = (−x) + (x + (−x)*) = (−x) + 0 = (−x).

2.2: CALCULATION IN VECTOR SPACES

Note that we have not used either x + (−x) = 0 or (−x)* + x = 0 in proving this. Axiom A3 could therefore be stated in the following weaker form: 'To each x ∈ E there is a left inverse element (−x) such that (−x) + x = 0 and a right inverse element (−x)* such that x + (−x)* = 0.' From the above argument it follows that (−x) = (−x)*.

A corresponding result is also true for the zero-vector. According to A3, −x also has an inverse vector, which is actually equal to x again (because (−x) + x = x + (−x) = 0). Thus

−(−x) = x.    (1)

Since (x+y) + ((−y) + (−x)) = x + ((y + (−y)) + (−x)) = x + (−x) = 0, and similarly ((−y) + (−x)) + (x+y) = 0, the inverse of a sum of vectors x and y is given by

−(x+y) = (−y) + (−x).    (2)

2.2.3

Subtraction of Vectors

Corresponding to two given vectors x, y ∈ E, there is a unique vector z ∈ E such that

z + x = y    (3.1)

(viz. z = y + (−x)). This follows because, adding −x to both sides of equation (3.1) on the right, we have

y + (−x) = (z + x) + (−x) = z + (x + (−x)) = z

and with this z we obtain

z + x = (y + (−x)) + x = y + 0 = y.

Instead of y + (−x) we usually write y − x and call this the difference of y and x (in this order). We must remember however that y − x has no other meaning than y + (−x). The two equations

z + x = y    and    z = y − x    (3.2)

therefore have the same meaning.

2.2.4

Rules for Multiplication by Scalars

If x ∈ E and α is a scalar, then

αx = 0 if and only if α = 0 or x = 0.    (4)

Proof. 1. For an arbitrary scalar α, α0 = α(0+0) = α0 + α0


and, adding −(α0) to both sides, it follows that α0 = 0.

2. For an arbitrary vector x ∈ E, 0x = (0+0)x = 0x + 0x and, adding −(0x) to both sides, it follows that 0x = 0.

3. Let αx = 0 and assume that α ≠ 0. By 1, M1 and M2, it follows that x = 1x = ((1/α)α)x = (1/α)(αx) = (1/α)0 = 0.

Since x + (−1)x = 1x + (−1)x = 0x = 0, and similarly (−1)x + x = 0, it follows that

−x = (−1)x.    (5)

Multiplying both sides of (5) by an arbitrary scalar α, it also follows that

α(−x) = (−α)x = −(αx).    (6)

If k is a natural number, then

kx = x + x + … + x (k terms).    (7)

Proof. The equation (7) is true for k = 1 (M1). We use induction on k to prove (7) in general. Assume that (7) is true for a sum of k−1 terms; then

kx = ((k−1) + 1)x = (k−1)x + 1x = (x + x + … + x) + x

where there are k−1 terms in the brackets. Therefore (7) is also true for k and hence, by induction, for all natural numbers. By using (6), it also follows that

(−x) + (−x) + … + (−x) = k(−x) = (−k)x = −(kx) (k terms).

If λ₁, λ₂ ≥ 0 and λ₁ + λ₂ = 1, then λ₁x₁(0) + λ₂x₂(0) = 1 and λ₁x₁(1) + λ₂x₂(1) ≥ 0.

Theorem 1. If C₁ and C₂ are convex sets in a real vector space, then C = μ₁C₁ + μ₂C₂ is also convex for all real μ₁, μ₂.

Proof. Suppose x₁, x₂ ∈ C, so that x₁ = μ₁c₁₁ + μ₂c₂₁ and x₂ = μ₁c₁₂ + μ₂c₂₂, where c₁₁, c₁₂ ∈ C₁ and c₂₁, c₂₂ ∈ C₂. If λ₁, λ₂ ≥ 0 and λ₁ + λ₂ = 1, then

y = λ₁x₁ + λ₂x₂ = μ₁(λ₁c₁₁ + λ₂c₁₂) + μ₂(λ₁c₂₁ + λ₂c₂₂) = μ₁c₁ + μ₂c₂

where c₁ ∈ C₁ and c₂ ∈ C₂. Hence y ∈ C.

2.6: CONVEX SETS

Definition 2. Let E be a real vector space and let A be a subset of E. A 'convex linear combination' of the elements of A is a linear combination y = Σ_{x∈A} λₓx where λₓ ≥ 0 for all x ∈ A and Σ_{x∈A} λₓ = 1.

Theorem 2. A convex set C contains all convex linear combinations of its elements.

Proof. The proof is by induction on the number n of vectors x₁, …, xₙ ∈ C in the convex linear combination y = λ₁x₁ + … + λₙxₙ. The theorem is true for n = 2. Now assume that it is true for all convex linear combinations of at most n−1 vectors.

Since at least one of the coefficients, say λₙ, in the convex linear combination y = λ₁x₁ + … + λₙxₙ is strictly positive, we can write y in the form

y = λ₁x₁ + μ((λ₂/μ)x₂ + … + (λₙ/μ)xₙ)

where μ = λ₂ + … + λₙ (≠ 0). The vector in the brackets is a convex linear combination of x₂, …, xₙ and hence, by the induction hypothesis, it is in C. Since also x₁ ∈ C, λ₁ + μ = λ₁ + λ₂ + … + λₙ = 1 and λ₁, μ ≥ 0, it follows that y ∈ C. The theorem is therefore true by induction for all n.

For an arbitrary subset A of E, we will denote the set of all convex linear combinations of the vectors of A by K*(A).
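Theorem 2 can be spot-checked numerically for a concretely given convex set, say an interval in R; a small Python sketch (helper names mine):

```python
# A convex linear combination of points x_1, ..., x_n: coefficients
# lam_k >= 0 summing to 1 (Definition 2).
def convex_combination(lams, xs):
    assert all(l >= 0 for l in lams)
    assert abs(sum(lams) - 1.0) < 1e-12
    return sum(l * x for l, x in zip(lams, xs))

# The interval [0, 1] in R is convex, so any convex combination of
# points inside it stays inside it (Theorem 2).
pts = [0.1, 0.5, 0.9]
y = convex_combination([0.2, 0.3, 0.5], pts)
```

Here y = 0.2·0.1 + 0.3·0.5 + 0.5·0.9, which indeed lies back inside the interval.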

Theorem 3. K*(A) is a convex set containing A.

Proof. 1. Every vector x ∈ A is a convex linear combination x = 1x of itself and therefore K*(A) ⊇ A.

2. Suppose x, y ∈ K*(A), so that

x = Σ_{z∈A} α_z z  where Σ_{z∈A} α_z = 1 and α_z ≥ 0 for all z ∈ A,

y = Σ_{z∈A} β_z z  where Σ_{z∈A} β_z = 1 and β_z ≥ 0 for all z ∈ A.

Now, if λ + μ = 1 and λ, μ ≥ 0,

λx + μy = Σ_{z∈A} (λα_z + μβ_z) z

and in this Σ_{z∈A} (λα_z + μβ_z) = 1 and λα_z + μβ_z ≥ 0 for all z ∈ A. Hence λx + μy ∈ K*(A) and K*(A) is convex.


Theorem 4. Let 𝒮 be a family of convex sets in a real vector space E; then ⋂(𝒮) is also convex.

Proof. Suppose x₁, x₂ ∈ ⋂(𝒮), so that x₁, x₂ ∈ C for all C ∈ 𝒮. Now, if λ₁, λ₂ ≥ 0 and λ₁ + λ₂ = 1, then λ₁x₁ + λ₂x₂ ∈ C for all C ∈ 𝒮. That is, λ₁x₁ + λ₂x₂ ∈ ⋂(𝒮).

Definition 3. Let E be a real vector space. The 'convex hull' K(A) of a subset A of E is the intersection of the family of convex sets which contain A.

If A is finite, say A = {a₁, …, aₙ}, we will write K(A) = K(a₁, …, aₙ). It follows immediately from Theorem 4 that K(A) is convex.

Theorem 5. The correspondence A → K(A) is a closure operator, i.e.,

1. K(A) ⊇ A
2. If A₁ ⊆ A₂, then K(A₁) ⊆ K(A₂)
3. K(K(A)) = K(A).

Proof. 1. K(A) is the intersection of sets which contain A and therefore K(A) also contains A (Theorem 1.2;3).

2. Let 𝒮ᵢ be the family of convex sets containing Aᵢ (i = 1, 2). Then, if A₁ ⊆ A₂, 𝒮₁ ⊇ 𝒮₂ and hence K(A₁) = ⋂(𝒮₁) ⊆ ⋂(𝒮₂) = K(A₂) (Theorem 1.2;2).

3. More generally, we show that if C is convex then K(C) = C. In this case, C is a convex set containing itself and therefore, by Theorem 1.2;1, K(C) ⊆ C; but from 1, K(C) ⊇ C and hence K(C) = C.

Since K(A) is both convex and the intersection of all convex sets which contain A, K(A) could also be defined as the least convex set containing A (Theorem 1.2;4).

Theorem 6. If A ⊆ K(B) and B ⊆ K(A), then K(A) = K(B).

Proof. By Theorem 5, if A ⊆ K(B), then K(A) ⊆ K(B), and similarly K(B) ⊆ K(A).

Theorem 7. K(A) consists of the convex linear combinations of the vectors of A, i.e., K(A) = K*(A).

Proof. 1. K(A) is convex and contains A and therefore, by Theorem 2, K(A) ⊇ K*(A).

2. By Theorem 3, K*(A) is a convex set containing A and therefore K(A) ⊆ K*(A) (Theorem 1.2;1).


2.6.2

Convex Cones

Let E again be a real vector space.

Definition 4. If A is a subset of E, a vector of the form y = Σ_{x∈A} λₓx, where λₓ ≥ 0 for all x ∈ A, is called a 'positive linear combination' of the elements of A. The vector y is called a 'strictly positive linear combination' if not all λₓ = 0.

Definition 5. A subset P of E is a 'convex cone' if P contains all positive linear combinations of any pair of its vectors.

A convex cone is obviously convex in the sense of Definition 1. Convex cones could also be characterized as those convex sets C which contain all positive scalar multiples λx (λ ≥ 0) of all vectors x ∈ C.

The theory of convex cones is developed in a way directly analogous to that of convex sets, and we will therefore only set out the most important results, in the main without proofs. If K is a convex cone and α > 0, then αK = K. If α < 0, then αK = −K. The set P(A) of all positive linear combinations of the vectors of a subset A plays the role of K*(A), and A → P(A) is a closure operator (Theorem 5).

Definition 6. A convex cone K is said to be 'acute' if K ∩ (−K) = {0}; i.e., if x ∈ K and −x ∈ K implies x = 0.

An acute convex cone therefore contains no inverses of any of its vectors except the zero-vector. A convex cone K is acute if and only if 0 ∈ K is not a strictly positive linear combination of the other vectors in K.

For every convex cone K, the set L = K ∩ (−K) is a subspace of E and K is acute only when L = {0}. If x ∈ K, then the coset L + x is contained in K. The set of all cosets of L which are contained in K will be denoted by K̄ and clearly K̄ ⊆ E/L.

Theorem 8. If K is a convex cone in E, then K̄ is an acute convex cone in E/L.


Proof. 1. Suppose L + x₁, L + x₂ ∈ K̄ and let λ₁, λ₂ ≥ 0. Then x₁, x₂ ∈ K and therefore λ₁x₁ + λ₂x₂ ∈ K, and hence

L + (λ₁x₁ + λ₂x₂) = λ₁(L + x₁) + λ₂(L + x₂) ∈ K̄.

2. If L + x ∈ K̄ and −(L + x) = L − x ∈ K̄, then x ∈ K and −x ∈ K and hence x ∈ K ∩ (−K) = L, i.e., L + x = L.

Theorem 9. If P, Q ⊆ E are convex cones such that P + Q = E and P ∩ (−Q) = {0}, then P = −P and Q = −Q, i.e., P and Q are subspaces of E.

Proof. Suppose p₁ ∈ P. From the conditions of the theorem, there are elements p ∈ P and q ∈ Q such that −p₁ = p + q and hence −q = p₁ + p. In other words, −q ∈ P ∩ (−Q) = {0}. Thus q = 0 and p₁ = −p ∈ −P. From this it follows that P ⊆ −P and, multiplying by −1, that −P ⊆ P. Hence P = −P. Similarly Q = −Q.

Example 3. Every subspace of a real vector space is a convex cone. A coset is a convex cone if and only if it is a subspace.

Example 4. In the vector space C (Example 2.1;7) the functions x(τ) for which x(τ) ≥ 0 for −½ ≤ τ ≤ ½ form a convex cone. If x₁(τ) ≥ 0 and x₂(τ) ≥ 0 for −½ ≤ τ ≤ ½ and λ₁, λ₂ ≥ 0, then λ₁x₁(τ) + λ₂x₂(τ) ≥ 0 for −½ ≤ τ ≤ ½.

Example 5. Examples of convex cones in the vector space G₂ (Example 2.1;1) are indicated in the following diagram.

Fig. 7. (Four convex cones (1)–(4) in the plane, each drawn with vertex at 0 and containing a vector e.)


In (3) (a half-plane) all positive multiples of the vector e belong to the set, but not the strictly negative multiples. In (4), all multiples of e belong to the set. (1), (2) and (3) are acute but (4) is not. In G₃, examples of convex cones are 3-sided pyramids and circular cones (infinite in only one direction) with vertices at the zero-vector.

Problems

1. Show that the set of all vectors x = (ξ₁, ξ₂, ξ₃, ξ₄) ∈ R⁴ which satisfy the homogeneous linear inequalities ξ₁ + ξ₂ − ξ₃ − 2ξ₄ ≥ 0 and 2ξ₁ + ξ₂ + ξ₃ + 4ξ₄ ≥ 0 is a convex cone.

2. If the right-hand sides of the inequalities in Problem 1 are replaced by 2 and 3 respectively, is the set still a convex cone? Is it a convex set?

3. Show that every subspace of a real vector space is convex. Is this the case for the cosets?

4. Find conditions for a coset to be a convex cone.

5. Let E be a real vector space and let x ∈ E. Is the difference set E\{x} convex?

6. Show that, if A, B are subspaces of a vector space, then

K(A ∪ B) = K(K(A) ∪ K(B))  and  K(A + B) = K(A) + K(B).


CHAPTER 3

BASES OF A VECTOR SPACE, FINITE-DIMENSIONAL VECTOR SPACES

3.1

Bases of a Vector Space

3.1.1 Linear Dependence and Independence of Vectors

Definition 1. A subset S of a vector space E is said to be 'linearly independent' if there is no linear combination Σ_{x∈S} αₓx which is equal to the zero-vector except the one in which αₓ = 0 for all x ∈ S. A set which is not linearly independent is said to be linearly dependent.

Instead of saying that the 'set' S is linearly dependent or independent, we will say that the vectors of S are dependent or independent. It is clear that a set S is linearly dependent if there is a linear combination y = Σ_{x∈S} αₓx of the vectors of S which is equal to the zero-vector but has some coefficients not equal to zero.

Theorem 1. A set S is linearly dependent if and only if there is a vector x₀ in S which is a linear combination of the other vectors in S.

Proof. 1. Suppose x₀ ∈ S and x₀ = Σ_{x∈S\{x₀}} αₓx. Then

x₀ − Σ_{x∈S\{x₀}} αₓx = 0

is a linear combination of the vectors of S in which at least one coefficient is not zero.

2. Suppose S is linearly dependent, so that Σ_{x∈S} αₓx = 0 and α_{x₀} ≠ 0 for some x₀ ∈ S. Then

x₀ = −(1/α_{x₀}) Σ_{x∈S\{x₀}} αₓx.

Example 1. A subset S of the vector space G₂ (Example 2.1;1) is linearly independent only when S = {a} and a ≠ 0, or S = {a₁, a₂} and a₁, a₂ are not parallel to the same line. In G₃, a subset S is linearly independent only in the following three cases.

3.1: BASES OF A VECTOR SPACE

1. S = {a}; a ≠ 0.
2. S = {a₁, a₂}; a₁, a₂ not parallel to the same line.
3. S = {a₁, a₂, a₃}; a₁, a₂, a₃ not parallel to the same plane.

3.1.2

The Concept of a Basis

Definition 2. A 'basis' B of a vector space E is a linearly independent spanning set for E (Definition 2.4;3).

Example 2. In G₂ (G₃), a linearly independent set is a basis if and only if it contains two (three) vectors.

The elements of a basis (basis vectors) will normally be denoted by the letter e.

Theorem 2. If B is a basis of E and B ≠ ∅, then every vector x ∈ E can be written in exactly one way as a linear combination

x = Σ_{e∈B} ξₑe    (1)

of the elements of B. (B = ∅ is a basis if and only if E = {0}.)

Proof. 1. The fact that every vector x ∈ E can be written in the form (1) is nothing more than the assumption that B is a spanning set.

2. Suppose x = Σ_{e∈B} ξₑe = Σ_{e∈B} ηₑe. Then Σ_{e∈B} (ξₑ − ηₑ)e = 0 and, because B is linearly independent, ξₑ = ηₑ for all e ∈ B.

Definition 3. The scalars ξₑ which are uniquely determined when x ∈ E is written in the form (1) are known as the 'components' of x with respect to the basis B.

Thus, for a given vector x ∈ E, there is a unique component ξₑ corresponding to each basis vector e ∈ B. We note, however, that only a finite number of these components are not equal to zero. If the basis B is finite or countably infinite, i.e., B = {e₁, …, eₙ} or B = {e₁, e₂, e₃, …}, we will denote the components by ξₖ instead of ξ_{eₖ} and write, in place of (1),

x = Σ_{k=1}^n ξₖeₖ    and    x = Σ_{k=1}^∞ ξₖeₖ.

Example 3. In Fig. 8, which refers to the vector space G₃, x = e₁ + e₂ + 2e₃. Hence the components of x with respect to the basis B = {e₁, e₂, e₃} are ξ₁ = 1, ξ₂ = 1, ξ₃ = 2.


CH. 3: BASES OF A VECTOR SPACE

Fig. 8. (The vector x = e₁ + e₂ + 2e₃ in G₃, drawn with respect to the basis vectors e₁, e₂, e₃.)

Theorem 3. The components of a linear combination of vectors are the corresponding linear combinations of the components of the vectors.

Proof. Suppose

y = Σ_{k=1}^n αₖxₖ    and    xₖ = Σ_{e∈B} ξ_{ke} e    (k = 1, …, n).

Then

y = Σ_{e∈B} ηₑe    where    ηₑ = Σ_{k=1}^n αₖξ_{ke}    for all e ∈ B.

Example 4. Let B = {e₁, e₂, e₃} be a basis of the vector space G₃ and let x₁ = e₁ + e₂ − 2e₃, x₂ = 2e₁ − e₂ + e₃. Then y = 2x₁ − 3x₂ = −4e₁ + 5e₂ − 7e₃.

Hence the components of y are η₁ = −4 = 2·1 − 3·2, η₂ = 5 = 2·1 − 3·(−1), η₃ = −7 = 2·(−2) − 3·1.

For a given vector x ∈ E, there is a component ξₑ assigned to each basis vector e ∈ B. This means that, corresponding to each x ∈ E, there is a unique element of the vector space F₀(B) (Example 2.1;4). Hence a mapping f of E onto F₀(B) is given by x → f(x) ∈ F₀(B), where f(x) is the function on B which takes the value ξₑ at e ∈ B. (f is real- or complex-valued according as E is real or complex.)

Theorem 3 can now be expressed by saying that the mapping f has the property

f(Σ_{k=1}^n αₖxₖ) = Σ_{k=1}^n αₖf(xₖ),

i.e., the image of a linear combination of vectors is the corresponding linear combination of the image vectors.

A mapping of one vector space into another vector space which has this property is known as a linear mapping (see Definition 5.1;1). Further, f is 1-1 because, by (1), vectors which have the same components are equal.


If the basis B = {e₁, …, eₙ} is finite, F₀(B) can be replaced by Rⁿ (cf. the remarks on Example 2.1;4). In this case f(x) = (ξ₁, …, ξₙ), i.e., to each x ∈ E we assign its row of components.

Theorem 4. A subset S of E is linearly dependent if and only if the image set f(S) ⊆ F₀(B) is linearly dependent.

Proof. Since f is a linear 1-1 mapping of E onto F₀(B), Theorem 4 is a direct consequence of Theorem 5.1;5, which we may use here in advance.

Example 5. Let B = {e₁, e₂, e₃} be a basis of G₃. Further let

x₁ = e₁ + 2e₂ − e₃ ∈ G₃,  so that f(x₁) = (1, 2, −1) ∈ R³,
x₂ = −e₁ + e₂ + 2e₃ ∈ G₃,  so that f(x₂) = (−1, 1, 2) ∈ R³,
x₃ = −e₁ + 4e₂ + 3e₃ ∈ G₃,  so that f(x₃) = (−1, 4, 3) ∈ R³.

Now, in R³,

f(x₁) + 2f(x₂) − f(x₃) = (1, 2, −1) + 2(−1, 1, 2) − (−1, 4, 3) = (0, 0, 0) = 0 ∈ R³,

i.e., the rows of components are linearly dependent in R³. Hence the same is true for x₁, x₂, x₃ in the vector space G₃.

The next two theorems show that the concept of a basis can be defined in two ways other than that in Definition 2.
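Before turning to those theorems, note that the computation in Example 5 is easy to mechanize; a small Python sketch (the tuple helpers are mine), checking the combination f(x₁) + 2f(x₂) − f(x₃) componentwise:

```python
# Rows of components in R^3, as in Example 5.
f_x1 = (1, 2, -1)
f_x2 = (-1, 1, 2)
f_x3 = (-1, 4, 3)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(a, u):
    return tuple(a * c for c in u)

# f(x1) + 2 f(x2) - f(x3): if this is the zero row, the three rows
# are linearly dependent, and hence so are x1, x2, x3 themselves
# (Theorem 4).
combo = add(add(f_x1, scale(2, f_x2)), scale(-1, f_x3))
```

The zero result confirms the dependence relation found in the example.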

Theorem 5. A linearly independent set B ⊆ E is a basis if and only if it is 'maximal', i.e., if every set which properly contains B is linearly dependent.

Proof. 1. Suppose B is a maximal linearly independent set and let z ∈ E\B. Then B ∪ {z} is linearly dependent and there is an equation

Σ_{e∈B} αₑe + βz = 0

in which not all the coefficients αₑ and β are zero. In this case β ≠ 0, because otherwise B would be linearly dependent. Hence we can solve the equation for z, and z = (−1/β) Σ_{e∈B} αₑe means that z ∈ L(B). Thus B is a spanning set for E and hence a basis.

2. Suppose B is a basis of E, that A is a subset of E properly containing B, and that z ∈ A\B. Now z is a linear combination of the elements


of B (Theorem 2), so that A is linearly dependent (Theorem 1). Hence B is a maximal linearly independent set.

Theorem 6. A spanning set B of E is a basis if and only if it is 'minimal', i.e., if no proper subset of B spans E.

Proof. 1. Suppose B is a basis, A is a proper subset of B and z ∈ B\A. Since B is linearly independent, z ∉ L(A) and therefore A is not a spanning set for E. Hence B is a minimal spanning set for E.

2. Let B be a spanning set for E which is not a basis, i.e., which is not linearly independent. Then there is an element z ∈ B which is a linear combination of the other elements in B, i.e., z ∈ L(B\{z}). But then B ⊆ L(B\{z}) and, by Theorem 2.4;7, it follows that L(B\{z}) = L(B) = E. Hence the proper subset B\{z} of B is a spanning set for E and B is not minimal.

Examples of Bases in Vector Spaces

We will describe bases for some of the examples of vector spaces introduced in 2.1, but it must be noted that, in each case, in addition to the given bases, there are infinitely many others. We note also that the following discussion holds both for real and complex vector spaces.

Example 1. G₂, G₃. We have already described all the bases of G₂ and G₃ in Example 3.1;2.

Example 2. Rⁿ. The set of the following n vectors is a basis of Rⁿ:

e₁ = (1, 0, 0, …, 0, 0),  e₂ = (0, 1, 0, …, 0, 0),  …,  eₙ = (0, 0, 0, …, 0, 1).

At this point it is convenient to introduce the function which is known as the Kronecker delta δᵢₖ. The symbol δᵢₖ takes the value 1 when i = k and 0 when i ≠ k. Usually i and k will be natural numbers, but they may also be elements of an arbitrary set (Example 4). With the help of this symbol, we can describe the given basis for Rⁿ in the form

eᵢ = (δᵢ₁, δᵢ₂, …, δᵢₙ)    (i = 1, 2, …, n).

We show that {e₁, …, eₙ} is a basis as follows. If x = (ξ₁, …, ξₙ) ∈ Rⁿ, then x = Σ_{k=1}^n ξₖeₖ, so that e₁, …, eₙ span the whole of Rⁿ. Also, if Σ_{k=1}^n ξₖeₖ = 0, then (ξ₁, …, ξₙ) = 0 and ξ₁ = … = ξₙ = 0. Hence e₁, …, eₙ are linearly independent. At the same time, we see that, if x = (ξ₁, …, ξₙ), then the scalars ξ₁, …, ξₙ are the components of x with respect to the given basis.
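A minimal sketch of this construction in Python (the function names are mine):

```python
# Kronecker delta: 1 when i == k, 0 otherwise.
def delta(i, k):
    return 1 if i == k else 0

# The standard basis vector e_i of R^n, written via the delta symbol:
# e_i = (delta(i,1), delta(i,2), ..., delta(i,n)).
def e(i, n):
    return tuple(delta(i, k) for k in range(1, n + 1))

# Any x = (xi_1, ..., xi_n) equals the combination sum_k xi_k e_k,
# so the components of x are exactly its entries.
def combine(xis):
    n = len(xis)
    out = (0,) * n
    for k, xi in enumerate(xis, start=1):
        out = tuple(a + xi * b for a, b in zip(out, e(k, n)))
    return out
```

Recombining the components of a vector reproduces the vector itself, which is the uniqueness statement of Theorem 2 in this concrete case.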


Example 3. The discussion of a basis for the space of series F is outside the scope of this book because it requires transfinite methods. The countably infinite set of vectors

eᵢ = (δᵢ₁, δᵢ₂, δᵢ₃, …)    (i = 1, 2, 3, …)

is a basis of the vector space F₀.

Example 4. The discussion of a basis for the vector space F(A) is also outside the scope of this book except when A is finite. In the latter case, the situation is analogous to the vector space Rⁿ. The functions eₓ (x ∈ A) given by

eₓ(y) = δ_{xy}    (y ∈ A)

form a basis B of the space F₀(A), where δ_{xy} = 1 if x = y and δ_{xy} = 0 if x ≠ y.

Proof. 1. Σ_{x∈A} αₓeₓ = 0 means Σ_{x∈A} αₓeₓ(y) = αᵧ = 0 for all y ∈ A. Therefore B is linearly independent.

2. An arbitrary mapping f ∈ F₀(A) can be written in the form f = Σ_{x∈A} f(x)eₓ, because Σ_{x∈A} f(x)eₓ(y) = f(y) for any y.

Example 5. The vector space Pⁿ is obviously spanned by the polynomials eᵢ which are given by eᵢ(τ) = τⁱ (i = 0, …, n). Now suppose Σ_{i=0}^n λᵢeᵢ = 0, i.e., Σ_{i=0}^n λᵢτⁱ = 0 for all τ. From the well-known fact that a polynomial only vanishes identically if all its coefficients are zero, it follows that λᵢ = 0 for i = 0, 1, …, n. Hence e₀, …, eₙ are linearly independent and therefore form a basis.

Example 6. The countably infinite set of polynomials eᵢ, where eᵢ(τ) = τⁱ (i = 0, 1, 2, …), is a basis of the space of polynomials P.

The discussion of a basis for Example 7 (the space of continuous functions) is again outside the scope of this book.

Example 8. The countably infinite set of functions e₀, eₖ and fₖ (k = 1, 2, 3, …), where e₀(τ) = ½, eₖ(τ) = cos kτ and fₖ(τ) = sin kτ, is a basis of the vector space T of trigonometric polynomials. It follows immediately from the definition of T that these functions span T. The fact that they are also linearly independent follows from the orthogonality relations which are proved in most standard texts on integral calculus.


That is,

∫₀^{2π} cos jτ cos kτ dτ = 0    (j ≠ k),

∫₀^{2π} sin jτ sin kτ dτ = 0    (j ≠ k),

∫₀^{2π} cos jτ sin kτ dτ = 0,

∫₀^{2π} (cos kτ)² dτ = ∫₀^{2π} (sin kτ)² dτ = π    (k > 0).

Hence, if

α₀e₀ + Σ_{k=1}^∞ (αₖeₖ + βₖfₖ) = 0,

i.e., if ½α₀ + Σ_{k=1}^∞ (αₖ cos kτ + βₖ sin kτ) = 0 for all τ, and we want to show that βⱼ = 0, say, then we multiply both sides of the equation by sin jτ and integrate from 0 to 2π. (It is possible to do this term by term because only a finite number of the coefficients are not zero.) In view of the orthogonality relations it follows that

∫₀^{2π} βⱼ(sin jτ)² dτ = πβⱼ = 0    and therefore    βⱼ = 0.

The proof that αⱼ = 0 is just the same except that we multiply by cos jτ.

In the complex case (Example 2.1;15) another basis is the set of vectors eₖ given by eₖ(τ) = e^{ikτ} (k = 0, ±1, ±2, …). It has already been shown that these span the vector space. Their linear independence is proved as follows.

Σ_{k=−∞}^{+∞} αₖeₖ = 0    means    Σ_{k=−∞}^{+∞} αₖe^{ikτ} = 0 for all τ.

Multiplying by e^{−ijτ} and integrating from −π to +π (again only a finite number of the terms in the series are not zero), it follows that

Σ_{k=−∞}^{+∞} αₖ ∫_{−π}^{+π} e^{i(k−j)τ} dτ = 2παⱼ = 0    and therefore    αⱼ = 0

for j = 0, ±1, ±2, … (In these formulae i is the imaginary unit √(−1) and j, k are integers.)

Example 9. The linear forms e₁, …, eₙ given by

eₖ(ξ₁, …, ξₙ) = ξₖ    (k = 1, …, n)

are a basis of the vector space Lⁿ.
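The orthogonality relations used in Example 8 are easy to spot-check numerically; a hedged sketch using a plain Riemann sum over one full period (the step count is an ad hoc choice):

```python
import math

# Riemann-sum approximation of the integral of f over [0, 2*pi].
# For smooth periodic integrands this converges very quickly.
def integrate(f, n=20000):
    h = 2 * math.pi / n
    return sum(f(i * h) for i in range(n)) * h

# Orthogonality: cos(2t)*cos(3t) integrates to 0 over a period,
# while cos(2t)^2 integrates to pi.
off_diag = integrate(lambda t: math.cos(2 * t) * math.cos(3 * t))
diag = integrate(lambda t: math.cos(2 * t) ** 2)
```

This is only a numerical confirmation of the displayed integrals, not a proof; the exact values follow from the product-to-sum formulas.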


Example 10. The real vector space of complex numbers has the set {1, i} as a basis, where i is the imaginary unit.

Example 11. The vector space S_R of real scalars has the set B = {1} as a basis, and the same is true for the vector space S_K of complex scalars (Example 17).

3.1.3

The Existence of Bases in a Vector Space

We will now show that every vector space has a basis. The general proof is not completely elementary because it uses results which depend on the axiom of choice or, equivalently, on the theory of well-ordered sets. We will base our proof on a theorem due to Zorn (usually referred to as Zorn's Lemma), which we will assume without further explanation. For those readers who are not familiar with this, we will also give an elementary proof which is only valid, however, for those vector spaces which have a finite or countably infinite spanning set.

Theorem 7 (Zorn). A non-empty partially ordered set, in which every nonempty chain (i.e., totally ordered subset) has an upper bound, has a maximal element.

A proof of Zorn's Lemma can be found for example in [23] p. 197.

Theorem 8. Every vector space has a basis.

Proof. Remembering Theorem 5, we show that there exists a maximal linearly independent subset B of E. Let 𝒜 be the family of all linearly independent subsets A of E. 𝒜 is partially ordered by the relation A₁ ⊆ A₂ (for A₁, A₂ ∈ 𝒜). A chain in 𝒜 is a subset ℬ of 𝒜 with the property that, if A₁, A₂ ∈ ℬ, then either A₁ ⊆ A₂ or A₂ ⊆ A₁. For each non-empty chain, we form the union S(ℬ) = ⋃(ℬ) ⊆ E. Then

1. S(ℬ) is linearly independent. Suppose Σ_{x∈S(ℬ)} αₓx = 0, where only finitely many of the coefficients are not zero. We denote the corresponding vectors by x₁, …, xₙ and their coefficients by α₁, …, αₙ. To each vector xₖ, there exists a set Aₖ ∈ ℬ such that xₖ ∈ Aₖ (k = 1, …, n). Since ℬ is totally ordered, one of these sets Aₖ will be the largest. If we denote this by A_{k₀}, then Aₖ ⊆ A_{k₀} for all k = 1, …, n. This means however that x₁, …, xₙ ∈ A_{k₀} and it follows, from the linear independence of A_{k₀}, that αₖ = 0 for k = 1, …, n. Hence S(ℬ) is linearly independent.

2. S(ℬ) ⊇ A for all A ∈ ℬ. This follows immediately from Theorem 1.2;1.


Now 1. means that St?) 6 M and 2. means that St?) is an upper bound for .93 in .51. Thus the partially ordered set .2! satisfies the conditions of Zorn’s Lemma because 4% is not empty, since a e .51. Therefore .51 has a maximal element B and this set B is a basis of E. After this general proof, we will now show in a more elementary way that a vector space E, which has a finite or a countably infinite spanning set, has a basis which is either finite or countably infinite. Suppose that A E E is a spanning set for E which consists of the finite or countably infinite number ofvectors ek(k = l , 2, 3, . . .) where e1 aé 0. (IfE = {0}, then B: 35 is a basis.) We construct a subset A* of A by putting e1 6 A* and, fork 2 2, eh e A* if, and only if, eh is not a linear combination of e1, . . ., ek_1, i.e., if ek (f L (e1, . . .,e,,_1). Then A* is a basis of E, because

1. A* is linearly independent. If A* were linearly dependent, then there would be an equation Σ_{k=1}^n α_k e_k = 0 with n > 0, e_k ∈ A* and α_n ≠ 0. This could be solved for e_n as a linear combination of e1, …, e_{n-1}, in contradiction of the definition of A*.

2. A* spans E. We only need to show that A is contained in L(A*), because then E = L(A) ⊆ L(L(A*)) = L(A*) (Theorems 2.4;5 and 6) and hence E = L(A*). If A were not contained in L(A*), then there would be an element e_k ∈ A which was not in L(A*). Suppose that e_k0 is the first element of A with this property. Certainly e_k0 ∉ A*, and therefore e_k0 ∈ L(e1, …, e_{k0-1}). (Note that, since e1 ∈ A*, k0 ≥ 2.) But e1, …, e_{k0-1} ∈ L(A*) and hence e_k0 ∈ L(L(A*)) = L(A*), and this is a contradiction.

We note that this elementary proof is still valid for an arbitrary spanning set A of E, if A can be so ordered that every non-empty subset of A has a first element. That this is possible for all sets (even when they are not countable) is the content of the well-ordering theorem (see [23] p. 198).

Theorem 9. In a vector space E, every linearly independent set C can be completed to a basis of E, i.e., there is a basis B of E such that B ⊇ C.

Proof. The proof is almost the same as that of Theorem 8, the only difference being that the family 𝒜 consists of those linearly independent sets which contain C.
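The elementary construction of the basis A* given above, keeping each vector exactly when it is not a linear combination of the vectors already kept, can be sketched in code. The spanning vectors below are hypothetical, and linear dependence is tested with a small Gaussian-elimination rank routine:

```python
def rank(rows, tol=1e-10):
    """Rank of a list of row vectors, computed by Gaussian elimination."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][c]) > tol:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep v exactly when v does not lie in the span of the vectors kept so far."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) == len(basis) + 1:
            basis.append(v)
    return basis

spanning = [(1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
print(extract_basis(spanning))  # [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
```

The second and fourth vectors are linear combinations of their predecessors and are skipped, exactly as e_k ∉ A* when e_k ∈ L(e1, …, e_{k-1}).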

Theorem 10. Every spanning set C of a vector space E contains a basis, i.e., there is a basis B of E such that B ⊆ C.


Proof. Again the proof follows that of Theorem 8, except that here 𝒜 is the family of linearly independent sets A which are contained in C. This shows that there is a linearly independent set B which is maximal in C, i.e., a linearly independent subset B of C such that every subset of C which properly contains B is linearly dependent. Then L(B) ⊇ C and, because L(C) = E,

L(B) = L(L(B)) ⊇ L(C) = E.

Hence B is a basis of E.

Problems

1. Show that the vectors x1, …, x4 in Problem 2.3;2 form a basis of R4.

2. Let E be a vector space. Prove that
(a) If S is a linearly independent subset of E and R is a subset of S, then R is linearly independent.
(b) If S is linearly dependent and R ⊇ S, then R is linearly dependent.
(c) If 0 ∈ S, then S is linearly dependent.
(d) If x ∈ E, then {x} is linearly dependent if and only if x = 0.

3. Prove that, if x, y are linearly independent elements of the vector space E, then x+y and x-y are also linearly independent.

4. Let τ0, …, τn be distinct real numbers. Show that the polynomials p_k(τ) of degree n which are given by p_k(τ_i) = δ_ik (i, k = 0, …, n) form a basis of the vector space Pn (Example 2.1;5).

5. Let B1 and B2 be bases of the vector spaces E1 and E2 respectively. Show that the pairs (e1, 0), where e1 ∈ B1, and (0, e2), where e2 ∈ B2, form a basis of the direct product of E1 and E2 (see Problem 2.1;4).

6. Prove Theorems 9 and 10 in the case when E has a finite or countably infinite spanning set. (Use the proof of the existence of a basis in this special case.)
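The polynomials of Problem 4, with p_k(τ_i) = δ_ik, are exactly the Lagrange basis polynomials for the nodes τ0, …, τn. A minimal sketch with hypothetical nodes:

```python
def lagrange_basis(taus, k, t):
    """p_k(t) = prod over j != k of (t - tau_j) / (tau_k - tau_j),
    so that p_k(tau_i) = delta_ik."""
    p = 1.0
    for j, tj in enumerate(taus):
        if j != k:
            p *= (t - tj) / (taus[k] - tj)
    return p

taus = [0.0, 1.0, 2.0]  # hypothetical distinct nodes
table = [[round(lagrange_basis(taus, k, ti)) for ti in taus] for k in range(3)]
print(table)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

The identity pattern of the table is the statement p_k(τ_i) = δ_ik; uniqueness of degree-n interpolation makes these the polynomials of the problem.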

3.2 Finite-Dimensional Vector Spaces

Definition 1. A vector space is said to be 'finite-dimensional' if it has a finite spanning set.

Examples. The Examples G2, G3, Rn, Pn, Ln, K, SR of 2.1.1 are all finite-dimensional because they all have finite bases.

In view of Theorem 3.1;10, every finite-dimensional vector space E ≠ {0} has a finite basis. The most important property of this type of vector space is that all bases contain the same number of elements (see Theorem 2). The proof of this depends on the following Exchange Theorem, which is due to Steinitz.


Theorem 1. Let E be a vector space, a1, …, an ∈ E and L = L(a1, …, an). If the subset C of L is linearly independent, then
1. C is finite and the number m of elements in C is m ≤ n. (We will therefore denote the elements of C by e1, …, em.)
2. For each integer r, 0 ≤ r ≤ m,

λ_i > 0, y_i ∈ P, and there is a representation x1 = Σ_{i=1}^n λ_i y_i, where Σ_{i=1}^n λ_i = 1 and y_i ≠ x1 for i = 1, …, n. Hence

x1 = Σ_{i=1}^n λ_i Σ_{k=1}^r μ_ik x_k,   (1)

where Σ_{k=1}^r μ_ik = 1 and μ_ik ≥ 0 for i = 1, …, n and k = 1, …, r. The coefficient of x1 on the right-hand side of (1) is Σ_{i=1}^n λ_i μ_i1, which is strictly less than 1. Since Σ_{i=1}^n λ_i = 1 and λ_i > 0, it could only be equal to 1 if

μ_11 = μ_21 = … = μ_n1 = 1.

But this would mean that y_i = x1 for all i, which is a contradiction of the assumptions. Hence (1) can be solved for x1, to obtain

x1 = (1 - Σ_{i=1}^n λ_i μ_i1)^{-1} Σ_{k=2}^r Σ_{i=1}^n λ_i μ_ik x_k.

The coefficients on the right-hand side are all positive or zero. Their sum is

(1 - Σ_{i=1}^n λ_i μ_i1)^{-1} Σ_{i=1}^n λ_i Σ_{k=2}^r μ_ik = (1 - Σ_{i=1}^n λ_i μ_i1)^{-1} Σ_{i=1}^n λ_i (1 - μ_i1) = 1.

Therefore x1 is a convex linear combination of x2, …, xr.

Theorem 3. A convex polyhedron is the convex hull of its vertex-vectors.

Proof. Let P = K(x1, …, xr) and suppose that x1 is not a vertex-vector. By Theorem 2, {x1, …, xr} ⊆ K(x2, …, xr). Using Theorem 2.6;6, it follows that P = K(x2, …, xr). Now, if there is another of the vectors x2, …, xr which is not a vertex-vector, then this can also be left out in the same way, etc.

3.4.2 Simplexes

Suppose P = K(x0, …, xr) is a convex polyhedron. By Theorems 2.5;10 and 2.6;7, P ⊆ N(P) = N(x0, …, xr). This coset has dimension at most r, where the dimension of a coset is defined to be the dimension of the corresponding subspace. The subspace is in fact L(x1-x0, …, xr-x0) (cf. the proof of Theorem 2.5;10).

Definition 3. A convex polyhedron P = K(x0, …, xr) is said to be an 'r-dimensional simplex' if the dimension of the coset N(x0, …, xr) is equal to r.

If P is an r-dimensional simplex, it is easy to see that all of the vectors x0, …, xr are vertex-vectors. For instance, if xr were not a vertex-vector, then there would be a representation xr = Σ_{k=0}^{r-1} λ_k x_k, where Σ_{k=0}^{r-1} λ_k = 1 (Theorem 2). But then xr ∈ N(x0, …, x_{r-1}) (Theorem 2.5;10) and the dimension of the coset N(x0, …, xr) would not be greater than r-1.

Example 3. In the vector space G2 (Example 2.1;1), every simplex is one of the following types:
(a) consisting of only one vector (dim. 0);
(b) the convex hull of two distinct vectors (segment, dim. 1);
(c) the convex hull of three vectors whose end points are not collinear (when they are drawn from a common origin) (triangle, dim. 2).


3.4: CONVEX POLYHEDRA

In the vector space G3, the same cases appear as in G2 and also

(d) the convex hull of four vectors whose end points are not coplanar (tetrahedron, dim. 3).

Example 4. In a real vector space E, suppose that e1, …, er are linearly independent. Then S = K(0, e1, …, er) is an r-dimensional simplex.

Theorem 4. If S = K(x0, …, xr) is an r-dimensional simplex, then every vector x ∈ S can be written in exactly one way as a convex linear combination of x0, …, xr.

Proof. Suppose that

x = Σ_{k=0}^r λ_k x_k and x = Σ_{k=0}^r μ_k x_k, where Σ_{k=0}^r λ_k = Σ_{k=0}^r μ_k = 1.

Then

Σ_{k=1}^r (λ_k - μ_k)(x_k - x0) = 0

and, since the vectors x1-x0, …, xr-x0 are linearly independent, it follows that λ_k = μ_k for k = 1, …, r and hence that λ_0 = μ_0.

If r = n = dim E, then every vector x ∈ E can be represented in exactly one way in the form x = Σ_{k=0}^n λ_k x_k, where Σ_{k=0}^n λ_k = 1. The numbers λ_0, …, λ_n, which are uniquely determined by x and the condition Σ_{k=0}^n λ_k = 1, are called barycentric co-ordinates in the vector space E with respect to the simplex S.

If x0, …, xr are the vertex-vectors of an r-dimensional simplex S_r, then, for 0 ≤ s ≤ r, any s+1 of these vectors will be the vertex-vectors of an s-dimensional simplex S_s contained in S_r. We refer to S_s as an s-dimensional face of S_r. The number of s-dimensional faces of an r-dimensional simplex is the binomial coefficient (r+1 choose s+1). In particular, there are r+1 (r-1)-dimensional faces and r+1 0-dimensional faces (each consisting of just one vertex). S_r has just one r-dimensional face, viz. itself.
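The unique coefficients of Theorem 4 can be computed by solving the linear system from the proof. A plane (r = 2) sketch with a hypothetical triangle, using Cramer's rule for the 2 by 2 system:

```python
def barycentric(x0, x1, x2, x):
    """Solve x - x0 = l1*(x1 - x0) + l2*(x2 - x0) by Cramer's rule,
    then l0 = 1 - l1 - l2 (cf. the proof of Theorem 4)."""
    a, b = x1[0] - x0[0], x2[0] - x0[0]
    c, d = x1[1] - x0[1], x2[1] - x0[1]
    det = a * d - b * c  # nonzero, since x1 - x0 and x2 - x0 are independent
    rx, ry = x[0] - x0[0], x[1] - x0[1]
    l1 = (rx * d - b * ry) / det
    l2 = (a * ry - rx * c) / det
    return (1.0 - l1 - l2, l1, l2)

# the centroid of the triangle with vertices (0,0), (1,0), (0,1)
l0, l1, l2 = barycentric((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1/3, 1/3))
print(l1, l2)  # both approximately 1/3
```

The three coordinates sum to 1 by construction, and they are nonnegative exactly when x lies in the simplex.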

Theorem 5. The intersection of two faces of a simplex is either a face or it is empty.

Proof. 1. From Theorem 4, it follows that the intersection is empty when the two faces have no vertex-vectors in common.


2. If the two faces have vertex-vectors in common, their intersection is the simplex spanned by these vertex-vectors.

Theorem 6. The intersection P of a coset N and a simplex S is either a convex polyhedron or it is empty.

Proof. 1. If P ≠ ∅, then, for each y ∈ P, we can construct a unique face S_y of S which contains y and has the least possible dimension. (By Theorem 5, S_y is uniquely determined as the intersection of all the faces which contain y.) If P ∩ S_y = {y}, we will say that y is a distinguished vector of P. Since each face of S can be the S_y for at most one distinguished vector y, and S has only a finite number of faces, it follows that there can only be a finite number of distinguished vectors.

2. We will now show that P is the convex hull of its distinguished vectors, and this will prove the theorem.

Fig. 12.

3. Suppose that y ∈ P is not distinguished. Then there is a vector d ∈ E, d ≠ 0, such that y + d ∈ P ∩ S_y. For arbitrary real numbers μ1, μ2 it then follows that

z1 = y + μ1 d ∈ N and z2 = y - μ2 d ∈ N.

We will show in 4. that we may take μ1, μ2 to be positive and so choose them that z1 and z2 lie in proper faces of S_y (Fig. 12). Then

y = (1/(μ1 + μ2)) (μ2 z1 + μ1 z2),

i.e., y is a convex linear combination of z1, z2 ∈ P. If z1, z2 are not distinguished, we can repeat the process just described for each of them. After a finite number of steps (at the latest when we reach 0-dimensional faces of S_y), we obtain a representation of y as a convex linear combination of distinguished vectors of P.

4. It only remains to prove the assertion about the choice of μ1 and μ2. If S_y = K(x0, …, xs), then there are representations


y = Σ_{k=0}^s λ_k x_k and y + d = Σ_{k=0}^s (λ_k + δ_k) x_k,

where λ_k > 0 (k = 0, …, s), Σ_{k=0}^s λ_k = 1 and Σ_{k=0}^s δ_k = 0. Since d ≠ 0, not all δ_k are zero and therefore there must be one which is strictly positive and one which is strictly negative. We now choose μ1 > 0 so that λ_k + μ1 δ_k ≥ 0 for k = 0, …, s, and at least one of these is zero, which is possible because λ_k > 0 for all k and δ_k < 0 for some k. Hence

z1 = y + μ1 d = Σ_{k=0}^s (λ_k + μ1 δ_k) x_k ∈ S_y.

Since one of the coefficients is zero, z1 actually belongs to a proper face of S_y. The choice of μ2 is made in a similar way.

3.4.3 Convex Pyramids

Let E be a finite-dimensional real vector space.

Definition 4. A 'convex pyramid' P ⊆ E is the positive hull of a finite number of vectors x1, …, xr ∈ E (cf. 2.6.2).

A convex pyramid is a convex cone and is therefore convex.

Example 5. In Example 2.6;5, the convex cones (1), (2) and (4) are convex pyramids in the vector space G2. On the other hand, (3) is not a convex pyramid. In G3, pyramids in the usual sense are convex pyramids provided they are convex and their origins correspond to the zero-vector.

Example 6. The set of all vectors x ∈ E for which all the components (with respect to some basis) are positive is a convex pyramid. It is the positive hull of the basis. Every subspace L ⊆ E (and hence E itself) is a convex pyramid. If {e1, …, er} is a basis of L, then L = P(e1, …, er, -e1, …, -er).

The theory of convex pyramids naturally develops along similar lines to that of convex polyhedra. We will therefore first look for the analogues of the vertex-vectors. These are what are known as the edge-vectors.

In order to simplify the definition of these, we will say that two vectors x, y ∈ E are similar if each is a strictly positive multiple of the other, i.e., y = αx with α > 0. (The zero-vector is therefore only similar to itself.)


Definition 5. A vector x of a convex pyramid P is an 'edge-vector' of P if, from x = Σ_{k=1}^r λ_k x_k, λ_k > 0, x_k ∈ P, x_k ≠ 0 (k = 1, …, r), it follows that x1, …, xr are similar to x.

If x is an edge-vector of P, then so is every vector which is similar to x. Remembering Theorem 3, we could conjecture here that every convex pyramid is the positive hull of its edge-vectors. However, we can see that this is not always true, by considering the example of the convex pyramid E, which has no edge-vectors at all.

Theorem 7. Every acute convex pyramid is the positive hull of its edge-vectors, where it is sufficient to choose just one representative from each class of similar edge-vectors. (See Definition 2.6;6.)

The proof of this theorem is directly similar to that of Theorem 3 and so we will not write it out again.

Problems

1. Prove that, if P, Q are convex polyhedra, then so are K(P ∪ Q) and λP + μQ (for all real λ, μ).

2. Let {e1, …, en} be a basis of a real vector space. Show that e1, …, en are the vertex-vectors of an (n-1)-dimensional simplex S.

3. In Problem 2, let E = Rn and e_i = (δ_i1, …, δ_in) (i = 1, …, n). Find a condition on ξ1, …, ξn in order that x = (ξ1, …, ξn) ∈ S.

4. Prove that, in a finite-dimensional vector space, the set of all those vectors all of whose components (with respect to a given basis) are positive or zero is a convex pyramid.

5. Prove that, if K is a convex polyhedron, then P(K) is a convex pyramid.


CHAPTER 4

DETERMINANTS

4.1 Permutations

A permutation of a finite set of n elements is a mapping of the set onto itself. Since it is a mapping of a finite set onto itself, a permutation is automatically 1-1. If we denote the elements of the set A by 1, 2, …, n and a permutation of A by φ, then φ(k) will take each of the values from 1 to n exactly once as k runs through the elements of A. (The use of small Greek letters to denote permutations will be the only departure from our usual convention of using small Roman letters for mappings.)

A convenient method of presenting a permutation is to write down the elements of A in a row and then, underneath each of these, to write down its image under the permutation. For example, if the permutation is written in the form

φ1 = ( 1 2 3 4 5 )
     ( 2 5 3 1 4 )

then this means that φ1 is the mapping such that φ1(1) = 2, φ1(2) = 5, φ1(3) = 3, etc. The total number of permutations of a set of n elements is n! = n(n-1)·…·3·2·1, because there are n possibilities for the choice of φ1(1), and then n-1 possibilities for φ1(2) (φ1(2) ≠ φ1(1)), etc. Finally φ1(n) is completely determined by φ1(1), …, φ1(n-1).

Since permutations are mappings, they can be multiplied together (see 1.3.1). The product φ2φ1 of the permutations φ1 and φ2 is given by k → φ2(φ1(k)). This multiplication of permutations is not commutative in general. For example, if φ2 is the permutation

φ2 = ( 1 2 3 4 5 )
     ( 5 4 1 3 2 )

then

φ2φ1 = ( 1 2 3 4 5 )
       ( 4 2 1 5 3 )

and

φ1φ2 = ( 1 2 3 4 5 )
       ( 4 1 2 3 5 ) ≠ φ2φ1.
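The two products can be checked mechanically. Permutations are stored as dicts k → φ(k), and, as in the text, the product φ2φ1 applies φ1 first:

```python
# the two permutations from the example above
phi1 = {1: 2, 2: 5, 3: 3, 4: 1, 5: 4}
phi2 = {1: 5, 2: 4, 3: 1, 4: 3, 5: 2}

def compose(p2, p1):
    """(p2 p1)(k) = p2(p1(k)): apply p1 first, as in the text."""
    return {k: p2[p1[k]] for k in p1}

print([compose(phi2, phi1)[k] for k in range(1, 6)])  # [4, 2, 1, 5, 3]
print([compose(phi1, phi2)[k] for k in range(1, 6)])  # [4, 1, 2, 3, 5]
```

The two results differ, confirming that multiplication of permutations is not commutative.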

CH. 4: DETERMINANTS

On the other hand, the multiplication of permutations is associative (Theorem 1.3;1). The identity permutation ε is given by ε(k) = k (k = 1, …, n), and we see that φε = εφ = φ for all permutations φ.

Since a permutation φ is a 1-1 mapping, it has an inverse φ^{-1}, for which φ^{-1}φ = φφ^{-1} = ε (see 1.3.2). This can be found by interchanging the rows in the above representation of φ. For example,

φ1^{-1} = ( 2 5 3 1 4 ) = ( 1 2 3 4 5 )
          ( 1 2 3 4 5 )   ( 4 1 3 5 2 )

These rules show that the permutations of a set of n elements together with their multiplication form a group, which is known as the symmetric group S_n. (See [25] pp. 61-64.)

If ψ is any permutation of A, then, as φ runs through all the permutations of A, so do φψ and ψφ, because, for any permutation θ, θ = (θψ^{-1})ψ = ψ(ψ^{-1}θ). Also φ^{-1} runs through all the permutations as φ does, because θ = (θ^{-1})^{-1} for all permutations θ.

A transposition is a permutation which interchanges two of the elements and leaves all the rest fixed. Thus the permutation φ is a transposition when there are two elements k1, k2 ∈ A such that φ(k1) = k2, φ(k2) = k1 and φ(k) = k for all k ≠ k1, k2. We will denote this transposition briefly by (k1 k2). Obviously, every permutation can be written as a product of transpositions, and in fact in many different ways. For example,

φ1 = (1 4)(1 5)(1 2) = (2 5)(1 5)(4 5).

Theorem 1. It is not possible to write a permutation both as a product of an odd number of transpositions and as a product of an even number of transpositions.

Proof. The nature of the elements to be permuted is not relevant to the statement of the theorem. We will assume that they are real variables and denote them by ξ1, …, ξn (instead of 1, …, n). The function

Δ(ξ1, …, ξn) = (ξ1 - ξ2)(ξ1 - ξ3) ⋯ (ξ1 - ξn)
               · (ξ2 - ξ3) ⋯ (ξ2 - ξn)
               ⋯
               · (ξ_{n-1} - ξn)

has the special property that, if ξ1, …, ξn are permuted, then either it is unchanged or it is simply multiplied by -1. Any transposition (for example of ξ1 and ξ2) has the latter effect. Hence a permutation which is a product


of an odd number of transpositions will have the effect of multiplying Δ by -1, and a product of an even number of transpositions will leave Δ unchanged. Consequently no permutation can be both a product of an odd number of transpositions and a product of an even number of transpositions.

In view of Theorem 1, we can make the following Definition.

Definition 1. A permutation is said to be 'even' (odd) if it is the product of an even (odd) number of transpositions. The 'characteristic' ch(φ) of a permutation φ is equal to +1 if φ is even and is equal to -1 if φ is odd.

Theorem 2.
1. ch(ε) = 1
2. ch(φ2φ1) = ch(φ2) ch(φ1)
3. ch(φ^{-1}) = ch(φ)

Proof. The identity permutation is the product of no transpositions and is therefore even. A representation of φ2φ1 as a product of transpositions can be found by multiplying such representations of φ1 and φ2. The second rule follows from this. Finally φφ^{-1} = ε and, by the first and second rules,

ch(φ) · ch(φ^{-1}) = 1,

from which the third rule follows.

Exercises

1. Decide whether the following two permutations are odd or even.

(a) φ = ( 1 2 3 4 5 )     (b) ψ = ( 1 2 3 4 5 )
        ( 5 2 4 1 3 )             ( 4 3 1 5 2 )

Solution. (a) φ is odd. (b) ψ is even.

2. Calculate the product φψ of the permutations in Exercise 1.

Solution. φψ = ( 1 2 3 4 5 )
               ( 1 4 5 3 2 )
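The parities in Exercise 1, and the product rule ch(φψ) = ch(φ)·ch(ψ) of Theorem 2, can be verified by counting transpositions cycle by cycle (a cycle of length L is a product of L - 1 transpositions):

```python
def ch(p):
    """Characteristic (sign): (-1) to the number of transpositions;
    each cycle of length L contributes L - 1 transpositions."""
    seen, transpositions = set(), 0
    for start in p:
        if start in seen:
            continue
        length, k = 0, start
        while k not in seen:
            seen.add(k)
            k = p[k]
            length += 1
        transpositions += length - 1
    return 1 if transpositions % 2 == 0 else -1

phi = {1: 5, 2: 2, 3: 4, 4: 1, 5: 3}    # Exercise 1(a)
psi = {1: 4, 2: 3, 3: 1, 4: 5, 5: 2}    # Exercise 1(b)
phipsi = {k: phi[psi[k]] for k in psi}  # apply psi first
print(ch(phi), ch(psi), ch(phipsi))     # -1 1 -1
```

The output agrees with the solutions above, and ch(φψ) = ch(φ)·ch(ψ) as Theorem 2 requires.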

Problems

1. Show that, if φ is a permutation, then there is a power φ^p of φ which is equal to the identity permutation. (Hint: consider the countably infinite set of powers φ, φ², φ³, … and show that there are two of these with different exponents but which represent the same permutation.)

2. Prove that the set of all even permutations of n symbols is a group. This is known as the alternating group A_n. Is this also true for the set of all odd permutations?


4.2 Determinants

4.2.1 The Concept of a Determinant

Let En be an n-dimensional vector space and let {e1, …, en} be a basis of En which will be kept fixed in the following. We consider functions D(x1, …, xn), whose variables x1, …, xn are vectors in En and whose values are scalars. Thus, corresponding to each ordered set {x1, …, xn} ⊆ En, there is a scalar value of the function D(x1, …, xn).

Definition 1. D(x1, …, xn) is said to be a 'determinant' in En (with respect to the basis {e1, …, en}) if the following conditions are satisfied.

D1. If a permutation φ is applied to the variables, then D(x1, …, xn) is multiplied by the characteristic of φ. That is to say, D(φ(x1), …, φ(xn)) = ch(φ)·D(x1, …, xn). (In particular, D(x1, …, xn) is multiplied by -1 if two vectors are interchanged.)

D2. D(x1, …, xn) is linear in each variable x_k. This means, for example, when k = 1,

D(αx1 + βy1, x2, …, xn) = αD(x1, x2, …, xn) + βD(y1, x2, …, xn)

for arbitrary scalars α, β (cf. Definition 6.1;1).

D3. D(x1, …, xn) is normalized, i.e., D(e1, …, en) = 1.

The dimension of a determinant is the dimension n of the vector space En or, equivalently, the number of variables in D(x1, …, xn).

Of course it is not claimed initially that there exists a determinant in En with respect to the basis {e1, …, en}. However, in the following, we will prove that there is in fact exactly one determinant.

Theorem 1. Let φ be a mapping of the basis {e1, …, en} into itself (φ is not necessarily a permutation). Then, if there is a determinant D,

D(φ(e1), …, φ(en)) = ch(φ), if φ is a permutation;
D(φ(e1), …, φ(en)) = 0, if two of the vectors φ(e1), …, φ(en) are equal.

Proof. If φ is a permutation, the assertion follows from D3 by substituting the basis vectors e1, …, en for x1, …, xn in D1. On the other hand, if φ is not a permutation, then there are two distinct basis vectors e_k1, e_k2 such that

φ(e_k1) = φ(e_k2).

…Is the mapping g, where g(x) = x + f(x), also linear?


6. To each function x(τ) ∈ C (Example 2.1;7), there is assigned the function y(τ) = x(τ) + x(-τ) ∈ C. Does this define a linear mapping of C into itself? What is its kernel? Answer the same questions for z(τ) = x(τ) - x(-τ).

7. The direct product of two vector spaces E1 and E2 (Problem 2.1;4) is mapped into itself by f((x1, x2)) = (x1, 0). Is f linear? What is its kernel?

8. Let K be a convex subset (cone, polyhedron, or pyramid) of a real vector space E and let f be a linear mapping of E. Show that f(K) ⊆ f(E) is also a convex set (cone, polyhedron, or pyramid). Is it possible for f(K) to be acute, if K is not acute? (See Theorem 2.6;8.)

9. Prove that, if L1 and L2 are subspaces of a vector space, then the quotient spaces (L1+L2)/L1 and L2/(L1 ∩ L2) are isomorphic. Derive a new proof for Theorem 3.2;7.

10. Let L be a subspace of the vector space E. Show that the set of all linear mappings of E into a vector space F for which f(L) = {0} is a subspace of L(E, F). More generally, show that the same is true for those mappings for which f(L) ⊆ M, where M is any subspace of F.

11. Prove that the inverse image of a subspace under a linear mapping of a vector space E is a subspace of E.

5.2 Linear Mappings of Finite-Dimensional Vector Spaces, Matrices

5.2.1

The Rank of a Linear Mapping

Theorem 1. Suppose E is a finite-dimensional vector space. A vector space F (with the same scalars) is isomorphic to E if and only if F is also finite-dimensional and dim F = dim E.

Proof. Theorem 5.1;5.

Definition 1. Let E be a finite-dimensional vector space and let f be a linear mapping of E into a vector space F. Then the 'rank' of f is the dimension of the image f(E) ⊆ F. (If E is finite-dimensional, then so is f(E).)

Theorem 2. If K is the kernel of the linear mapping f, then the rank of f is equal to dim(E/K) = dim E - dim K.

Proof. By Theorem 5.1;9, f(E) and E/K are isomorphic. Hence the assertion follows directly from Theorems 1 and 3.2;5.

Theorem 3. A linear mapping f of the finite-dimensional vector space E is 1-1 if and only if its rank is equal to dim E.

CH. 5: LINEAR MAPPINGS OF VECTOR SPACES, MATRICES

Proof. 1. If f is 1-1, then, by Theorem 5.1;8, K = {0}. Hence the rank of f is equal to dim E (Theorem 2).
2. If the rank of f is equal to dim E, then, by Theorem 2, K = {0} and hence f is 1-1 (Theorem 5.1;8).

Theorem 4. Suppose dim E = dim F. A linear mapping f of E into F is 1-1 if and only if it is onto F.

Proof. 1. If f is 1-1, then, by Theorem 3, dim E = dim f(E) = dim F and hence f(E) = F by Theorem 3.2;4.
2. If f(E) = F, then the rank of f is equal to dim E and, by Theorem 3, f is 1-1.
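Theorem 2 can be illustrated numerically. For a mapping given, with respect to fixed bases, by a hypothetical matrix, the rank is the number of pivots of the reduced row echelon form and the kernel dimension is the number of free columns, so the two add up to dim E:

```python
def rref(m, tol=1e-10):
    """Reduced row echelon form; returns (matrix, pivot_columns)."""
    m = [row[:] for row in m]
    pivots, r = [], 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [a / m[r][c] for a in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# hypothetical matrix of a mapping from a 4-dimensional space into a
# 3-dimensional one; the third row is the sum of the first two
A = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 1.0],
     [1.0, 3.0, 1.0, 2.0]]
_, pivots = rref(A)
rank_f = len(pivots)
kernel_dim = len(A[0]) - rank_f
print(rank_f, kernel_dim)  # 2 2, and their sum is dim E = 4
```

The dependent third row lowers the rank to 2, and correspondingly the kernel has dimension 4 - 2 = 2.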

5.2.2

Linear Mappings and Matrices

We will now investigate how the components of a vector can be used to express the components of its image under a linear mapping. Suppose therefore that {e1, …, en} is a basis of E and that {f1, …, fm} is a basis of F. Suppose further that f is a linear mapping of E into F which, in accordance with Theorem 5.1;3, is given by

f(e_k) = Σ_{i=1}^m α_ik f_i   (k = 1, 2, …, n).   (1)

Then the vector x ∈ E, x = Σ_{k=1}^n ξ_k e_k, is mapped onto

f(x) = Σ_{k=1}^n ξ_k f(e_k) = Σ_{i=1}^m [ Σ_{k=1}^n α_ik ξ_k ] f_i.

The components η_i of f(x) are therefore

η_i = Σ_{k=1}^n α_ik ξ_k   (i = 1, …, m).   (2)

Thus we have the following result.

Theorem 5. Under a linear mapping, the components of the image of a vector x are linear functionals in the components of x (cf. 6.1;(1)).

We note that the coefficients α_1k, …, α_mk are the components of f(e_k) (k = 1, …, n).

For given bases, the linear mapping f is uniquely determined by the


coefficients α_ik which, as in an exchange tableau (cf. 3.3), we usually arrange in the form of a rectangular array:

A = (α_ik) = ( α_11  α_12  …  α_1n )
             ( α_21  α_22  …  α_2n )
             ( …                   )
             ( α_m1  α_m2  …  α_mn )

We refer to a rectangular array of numbers like this as a matrix. The matrix A has m rows and n columns. The mn scalars α_ik are known as the elements of the matrix. The matrix A represents the linear mapping f with respect to the two given bases of E and F, and we will say that A is the matrix of f with respect to these bases. Referring to a matrix which has m rows and n columns as an m × n matrix, we can easily verify the following theorem.

Theorem 6. Let E and F be vector spaces of dimensions n and m in which fixed bases are given. Then each linear mapping f of E into F corresponds to an m × n matrix A = (α_ik) such that f is given by (2). Conversely, (2) represents a linear mapping of E into F for any m × n matrix A = (α_ik). The correspondence between the linear mappings of E into F and the m × n matrices is 1-1.

Now, suppose that F is mapped by a further linear mapping g into a finite-dimensional vector space G. Let the relation corresponding to (2) for g be

ζ_h = Σ_{i=1}^m β_hi η_i   (h = 1, …, p),   (3)

where p is the dimension of G and ζ_1, …, ζ_p are components with respect to a given basis of G. Thus the mapping g corresponds to the matrix

B = (β_hi) = ( β_11  …  β_1m )
             ( …            )
             ( β_p1  …  β_pm )

From (2) and (3), it follows by substitution that

ζ_h = Σ_{k=1}^n [ Σ_{i=1}^m β_hi α_ik ] ξ_k = Σ_{k=1}^n γ_hk ξ_k   (h = 1, …, p),   (4)

where

γ_hk = Σ_{i=1}^m β_hi α_ik   (h = 1, …, p; k = 1, …, n).   (5)

Thus the coefficients γ_hk of the product mapping gf can be calculated from the coefficients α_ik of f and β_hi of g. The mapping gf is represented by the matrix

C = (γ_hk) = ( γ_11  …  γ_1n )
             ( …            )
             ( γ_p1  …  γ_pn )

Definition 2. The matrix C = (γ_hk), where γ_hk is given by (5), is called the 'product' C = BA of the matrices A = (α_ik) and B = (β_hi).

Thus the element γ_hk of the product matrix C = BA is obtained when the hth row of B is 'multiplied' into the kth column of A. We 'multiply' a row into

a column containing the same number of elements by taking the sum of the products of the elements of the row with the corresponding elements of the column.

Example 1. On multiplying the row (1, 2, 3) into the column

( -1 )
(  2 )
(  1 )

we obtain the number 1·(-1) + 2·2 + 3·1 = 6.
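Formula (5), each row of B multiplied into each column of A, is a direct sketch in code; the 1 × 3 times 3 × 1 case reproduces Example 1:

```python
def matmul(B, A):
    """C = BA via (5): C[h][k] = sum over i of B[h][i] * A[i][k],
    i.e. row h of B 'multiplied' into column k of A."""
    assert len(B[0]) == len(A)  # row length of B = column length of A
    return [[sum(B[h][i] * A[i][k] for i in range(len(A)))
             for k in range(len(A[0]))]
            for h in range(len(B))]

# Example 1 as a 1 x 3 times 3 x 1 product
print(matmul([[1, 2, 3]], [[-1], [2], [1]]))  # [[6]]
```

The assertion in the function is the dimension condition discussed next: the product only has a meaning when the row length of B equals the column length of A.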

Hence the matrix product C = BA only has a meaning when the row length (the number of elements in a row) of B is equal to the column length of A. The column length (or the number of rows) of the product matrix is then equal to that of B, and its row length is equal to that of A.

Example 2. Let

B = ( 1 0 1 )     and     A = ( 1 2 -1 0 )
    ( 2 1 2 )                 ( 1 2  2 1 )
                              ( 3 1  1 0 )

Then

BA = ( 4 3 0  0 )
     ( 9 8 2* 1 )

where, for example, the element marked with an asterisk is obtained by multiplying the second row of B into the third column of A:

2·(-1) + 1·2 + 2·1 = 2.

Multiplication of matrices is not commutative, i.e., the equation AB = BA

is not true in general, even when both products have a meaning.

Example 3. If

A = ( 1 2 )     and     B = (  2 1 )
    ( 0 1 )                 ( -1 0 )

then

AB = (  0 1 )     and     BA = (  2  5 )
     ( -1 0 )                  ( -1 -2 )

hence AB ≠ BA.

In view of Definition 2, the statements of formulae (4) and (5) can be combined into the following theorem.

Theorem 7. The matrix of the product of two linear mappings is the product of their matrices (in the same order).

Naturally it is assumed here that the same basis of the intermediate space F is used for both mappings.

Theorem 8. Multiplication of matrices is associative.

Proof. Let A = (α_hi), B = (β_ik), C = (γ_kl), where the row length of A is equal to the column length of B, and similarly for B and C. Then

[(AB)C]_hl = Σ_k ( Σ_i α_hi β_ik ) γ_kl = Σ_{i,k} α_hi β_ik γ_kl,

and the same is true for [A(BC)]_hl. (For simplicity, we have denoted the element of (AB)C with indices h and l by [(AB)C]_hl.) The matrices (AB)C and A(BC) therefore have the same elements and hence they are equal.

The formulae (2), (3) and (4) can also be expressed more clearly in terms of matrices. If we denote the matrix which consists of the one column

k_i is not a permutation are zero. In the case of a permutation, if we put k_i = φ(i), the last expression becomes

det C = Σ_φ ch(φ) α_{1,φ(1)} ⋯ α_{n,φ(n)} β…

…f(x) of the vector space E into the vector space S. These mappings are just those real- or complex-valued functions f on E for which

f(α1 x1 + α2 x2) = α1 f(x1) + α2 f(x2)

for all x1, x2 ∈ E and all scalars α1, α2.

Definition 1. A 'linear functional' on a vector space E is a linear mapping of

E into the 1-dimensional vector space of its scalars.

The linear functional f which is given by f(x) = 0 for all x ∈ E will be denoted by f = 0, and hence the statement f ≠ 0 will mean that there is a vector x ∈ E such that f(x) ≠ 0. If f ≠ 0, then f maps E onto S.

If E is finite-dimensional, then, in the case of a linear functional f, the representation 5.2;(2) of f becomes

f(x) = Σ_{k=1}^n α_k ξ_k.   (1)

It will be seen that these are the functions which were introduced in Example 2.1;9.

Theorem 1. Let E be a vector space and let x0 be a non-zero vector in E. Then there is a linear functional f on E such that f(x0) ≠ 0.

Proof. Since x0 ≠ 0, there is a basis B of E which contains x0. Hence the theorem follows from Theorem 5.1;3.

Theorem 2. Let L be a subspace of E and x0 ∈ E∖L. Then there is a linear functional f on E such that f(x) = 0 for all x ∈ L and f(x0) ≠ 0.

CH. 6: LINEAR FUNCTIONALS

Proof. The canonical image x̄0 of x0 in the quotient space E/L is not the zero-element (cf. Example 5.1;1). Hence, by Theorem 1, there is a linear functional f̄ on E/L for which f̄(x̄0) ≠ 0. We define a mapping f of E into S by f(x) = f̄(x̄) for each x ∈ E, where x̄ is the canonical image of x. Since x → x̄ is a linear mapping of E onto E/L (Problem 5.1;2), it follows that f is a linear functional on E (Theorem 5.1;6). It is easy to verify that f satisfies the requirements of the theorem.

6.1.2

Hyperplanes

Definition 2. Let H be a coset in a vector space E and let L be the subspace corresponding to H. Then H is said to be a 'hyperplane' of E if the quotient space E/L has dimension 1.

Thus a coset is a hyperplane if and only if the corresponding subspace is a hyperplane. If dim E = n, then a subspace L of E is a hyperplane if and only if dim L = n - 1 (Theorem 3.2;5).

Theorem 3. A coset H ⊆ E is a hyperplane if and only if E is the only coset which contains H as a proper subset.

Proof. 1. We will first deal with the case when H is a subspace.
1.1. Suppose the subspace L is a hyperplane. Then dim E/L = 1 and hence the subspace {0} = L/L of E/L is properly contained in only one subspace, viz. E/L. Hence, by Theorem 2.5;7, E is the only subspace which properly contains L.
1.2. Suppose that the subspace L of E is not properly contained in any subspace except E itself. Then the subspace {0} = L/L of E/L is not properly contained in any subspace except E/L. Hence dim E/L = 1 and L is a hyperplane of E.
2. We now prove the theorem for arbitrary hyperplanes.
2.1. Suppose H is a hyperplane of E. Then so is the subspace L corresponding to H, so that E is the only subspace which properly contains L. Hence E is the only coset which contains H as a proper subset (cf. Problem 2.5;5).
2.2. Suppose H is a coset of E and E is the only coset which contains H as a proper subset. Then E is also the only subspace which properly contains the subspace L corresponding to H. Hence L is a hyperplane and so also is H.

Theorem 4. Let f ≠ 0 be a linear functional on the vector space E. Then L = f^{-1}(0) = {x; x ∈ E, f(x) = 0} is a subspace and a hyperplane of E. (By Definition 5.1;2, L is the kernel of the linear functional f.)

6.1: LINEAR FUNCTIONALS AND COSETS

Proof. 1. By Theorem 5.1;7, L is a subspace.
2. By Theorem 5.1;9, the vector spaces E/L and f(E) = S are isomorphic. By Theorem 5.2;1, dim E/L = dim S = 1 and therefore L is a hyperplane.
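For a functional given by (1) on an n-dimensional coordinate space (the coefficients below are hypothetical), n - 1 independent solutions of f(x) = 0 can be written down explicitly, which matches dim L = n - 1:

```python
# A functional f(x) = sum of a[k] * x[k] on a 4-dimensional space,
# with hypothetical coefficients (the last one nonzero).
a = [2.0, -1.0, 3.0, 1.0]
f = lambda x: sum(ak * xk for ak, xk in zip(a, x))

n = len(a)
# n - 1 independent solutions of f(x) = 0: v_k = e_k - (a_k / a_n) e_n
kernel_basis = []
for k in range(n - 1):
    v = [0.0] * n
    v[k] = 1.0
    v[n - 1] = -a[k] / a[n - 1]
    kernel_basis.append(v)

print(len(kernel_basis), all(abs(f(v)) < 1e-12 for v in kernel_basis))  # 3 True
```

Each v_k has a 1 in a distinct coordinate, so the n - 1 vectors are linearly independent; together with any x0 with f(x0) ≠ 0 they span the whole space.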

Theorem 5. Let f ≠ 0 be a linear functional on the vector space E and let α be a scalar. Then H = f⁻¹(α) = {x : x ∈ E, f(x) = α} is a hyperplane.

Proof. Since f ≠ 0, H is not empty. Suppose therefore that x0 ∈ H. Then L = H − x0 = f⁻¹(0) is a subspace and H is a coset of L. By Theorem 4, L is a hyperplane and therefore H is also a hyperplane (cf. Problem 3).

Theorem 6. Given any hyperplane H of E, there exists a linear functional f ≠ 0 on E and a scalar α such that H = f⁻¹(α). f is uniquely determined by H except for a non-zero scalar factor.

Proof. 1. Let L be the subspace corresponding to H. By Theorems 2 and 3, there is a linear functional f ≠ 0 on E such that f(x) = 0 for all x ∈ L. By Theorem 4, L* = f⁻¹(0) is a hyperplane and L* ⊇ L. Since L is a hyperplane itself, it follows that L* = L. Now, H = f⁻¹(α), where α = f(x0) and x0 is any vector in H.

2. If H = f⁻¹(α) = g⁻¹(β), then L = f⁻¹(0) = g⁻¹(0) is the subspace corresponding to H. If z0 ∈ E∖L, then every x ∈ E can be written uniquely in the form x = y + λz0 where y ∈ L (cf. Problem 4). Hence f(x) = λf(z0) and g(x) = λg(z0), i.e., f = (f(z0)/g(z0)) g.

In the sense of analytic geometry, Theorem 6 states that every hyperplane has an equation of the form f(x) = α where f ≠ 0. Theorem 5 states that every equation of this form is the equation of a hyperplane.
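Theorems 4–6 can be made concrete in R3 (an illustrative sketch assuming numpy; the functional and the basis of its kernel are chosen by hand, not taken from the text):

```python
import numpy as np

# In R3 (coordinates w.r.t. a fixed basis) take the linear functional
# f(x) = a'x with a != 0; H = f^{-1}(alpha) is a hyperplane, i.e. a
# coset x0 + L of the 2-dimensional kernel L = f^{-1}(0).
a = np.array([1.0, -2.0, 3.0])
alpha = 6.0
x0 = np.array([6.0, 0.0, 0.0])      # one particular solution: a'x0 = 6

# A basis of L = ker f, found by inspection for this particular a:
l1 = np.array([2.0, 1.0, 0.0])      # a'l1 = 0
l2 = np.array([-3.0, 0.0, 1.0])     # a'l2 = 0

# Every x0 + s*l1 + t*l2 lies in H = f^{-1}(alpha):
for s, t in [(0, 0), (1, -2), (3.5, 0.25)]:
    x = x0 + s * l1 + t * l2
    assert np.isclose(a @ x, alpha)
```
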

6.1.3

Systems of Linear Equations and Cosets

Theorem 7. Every coset M of E is the intersection of the family J of all those hyperplanes H which contain M.

Proof. 1. Clearly M ⊆ ∩J (Theorem 1.2;3).

2. Suppose y0 ∉ M (if M = E, the theorem is trivially true) and x0 ∈ M. If L is the subspace corresponding to M, then y0 − x0 ∉ L. By Theorem 2, there is a linear functional f on E such that f(x) = 0 for all x ∈ L and f(y0 − x0) ≠ 0. We put f(x0) = α and consider the hyperplane H = f⁻¹(α). For each x ∈ M, f(x) = f(x0) = α and hence H ⊇ M, i.e., H ∈ J. Since f(y0) ≠ f(x0), y0 ∉ H and therefore also y0 ∉ ∩J. Hence ∩J ⊆ M.

CH. 6: LINEAR FUNCTIONALS

If f is a linear functional on the vector space E and α is a scalar, then the equation f(x) = α is known as a linear equation. A vector x0 ∈ E is a solution of this equation if f(x0) = α. By Theorem 6, every hyperplane is the set of solutions of a linear equation. Accordingly, Theorem 7 can also be stated as follows.

Theorem 8. Every coset is the set of solutions of a system of linear equations. (Viz. the system consisting of the equations of the hyperplanes in the family J. If M = E, then J = ∅. The theorem is still true in this case because E is the set of solutions of the equation 0(x) = 0.)

By Theorems 5 and 2.5;9, the following converse of Theorem 8 is also true.

Theorem 9. The set of solutions of a system of linear equations is either empty or it is a coset.

It is in fact possible for the set of solutions to be empty, for example when there is an equation f(x) = α in which f is the zero functional and α ≠ 0.

Example 1. In the vector space G3 (Example 2.1;1), the set of solutions of the linear equation (with respect to some basis)

f(x) = α₁ξ₁ + α₂ξ₂ + α₃ξ₃ = β    (f ≠ 0)

is a plane, i.e., when the solution vectors are drawn from a fixed point, the set of their end points is a plane. A system of two linear equations has as its solution set either a straight line or a plane or it is empty. For three equations, the solution set can either be empty or a single vector or a straight line or a plane. If we allow f = 0, the solution set can also be the whole space, viz. when the system has only the one equation 0(x) = 0.

Problems

1. Let a linear functional f be given in the form (1). What happens to the coefficients αₖ under a change of basis?

2. Show that, if k ∈ C (Example 2.1;7), then f(x) = ∫₋₁⁺¹ k(τ) x(τ) dτ is a linear functional on C.

3. Show that, if f is a linear functional on a vector space, α is a scalar and f⁻¹(α) = H, then L = f⁻¹(0) is the subspace corresponding to the coset H.

4. Suppose f is a linear functional on the vector space E, L = f⁻¹(0) and z0 ∈ E∖L. Show that every vector x ∈ E can be written in exactly one way in the form x = y + λz0 where y ∈ L.

6.2: DUALITY IN FINITE-DIMENSIONAL SPACES

5. Suppose E1 and E2 are vector spaces and L1 is a hyperplane of E1. Prove that the set of all pairs (x1, x2), where x1 ∈ L1 and x2 ∈ E2, is a hyperplane in the direct product of E1 and E2 (Problem 2.1;4).

6. Prove that a subspace L of the n-dimensional vector space E is a hyperplane if and only if dim L = n − 1.

7. Let k(σ, τ) be a continuous scalar-valued function of the real variables σ, τ in the interval −1 ≤ σ, τ ≤ +1. Show that, if g ∈ C, then the set of all x ∈ C which satisfy the first order linear integral equation

∫₋₁⁺¹ k(σ, τ) x(τ) dτ = g(σ)

is either empty or a coset of C.

… Definition 2 requires that 1. for each y0 ∈ F, the mapping x → f(x, y0) is a linear functional on E, and 2. for each x0 ∈ E, the mapping y → f(x0, y) is a linear functional on F.

Choosing bases of E and F and denoting the corresponding vector components by ξᵢ and ηₖ respectively, we can easily verify that

f(x, y) = Σᵢₖ αᵢₖ ξᵢ ηₖ    (2)

is a bilinear form on (E, F) for all αᵢₖ. On the other hand, every bilinear form f on (E, F) can be represented in the form (2), because, from Definition 2 and

x = Σᵢ ξᵢ eᵢ,  y = Σₖ ηₖ fₖ,

it follows that (2) is satisfied by αᵢₖ = f(eᵢ, fₖ).
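In coordinates, (2) is just a matrix product, which makes bilinearity easy to check numerically (an illustrative sketch assuming numpy; the coefficient matrix is made up):

```python
import numpy as np

# Hypothetical 2x3 coefficient matrix (alpha_ik); any real matrix defines
# a bilinear form on (E, F) via f(x, y) = sum_i sum_k alpha_ik xi_i eta_k.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

def f(x, y):
    """Bilinear form f(x, y) = x' A y in the chosen bases."""
    return x @ A @ y

x1, x2 = np.array([1.0, 2.0]), np.array([0.0, -1.0])
y = np.array([1.0, 1.0, 2.0])

# Linearity in the first slot: f(a*x1 + b*x2, y) = a*f(x1, y) + b*f(x2, y)
a, b = 2.0, -3.0
assert np.isclose(f(a * x1 + b * x2, y), a * f(x1, y) + b * f(x2, y))
# Linearity in the second slot as well:
assert np.isclose(f(x1, 4.0 * y), 4.0 * f(x1, y))
```
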

6.2.3

Dual Pairs of Spaces

Definition 3. A pair (E, F) of finite-dimensional vector spaces with a bilinear form f(x, y) is said to be a 'dual pair of spaces' if

1. from f(x, y0) = 0 for all x ∈ E, it follows that y0 = 0, and
2. from f(x0, y) = 0 for all y ∈ F, it follows that x0 = 0.

The form f(x, y) is known as the scalar product of the dual pair of spaces and in the following it will be denoted by (x, y).

Example 1. Suppose that E and F are 2-dimensional and f(x, y) = ξ₁η₂. Then this is not a dual pair of spaces, because, for the vector x0 ∈ E with ξ₁ = 0, ξ₂ = 1, f(x0, y) = 0 for all y ∈ F. But x0 ≠ 0.

However, if we put f(x, y) = ξ₁η₁ + ξ₂η₂, then (E, F) becomes a dual pair of spaces, because, if f(x, y) = 0 for all y ∈ F, then, with η₁ = 1, η₂ = 0, we have ξ₁ = 0 and, with η₁ = 0, η₂ = 1, we have ξ₂ = 0, i.e., x = 0. The same argument is still valid when the roles of x and y are interchanged.

Example 2. Let E and F be two vector spaces of the same dimension n and let ξᵢ and ηₖ be the vector components with respect to given bases of E and F. Then (E, F) is a dual pair of spaces with the scalar product (x, y) = Σₖ₌₁ⁿ ξₖηₖ.

Example 3. A vector space E and its dual space E* form a dual pair of spaces when, for x ∈ E and f ∈ E*, the scalar product is defined by (x, f) = f(x). Because f(x) = 0 for all x ∈ E is just the definition of f = 0 and, if f(x) = 0 for all f ∈ E*, then, from Theorem 6.1;1, it follows that x = 0. Finally, it is easy to verify that (x, f) is a bilinear form.

Now suppose that (E, F) is a dual pair of spaces. Corresponding to each x ∈ E we have the linear functional fₓ on F which is given by fₓ(y) = (x, y). In this way we define a mapping g of E into the dual space F* of F and we assert that g is both linear and 1-1. It is linear because

f(α₁x₁ + α₂x₂)(y) = (α₁x₁ + α₂x₂, y) = α₁(x₁, y) + α₂(x₂, y) = α₁fₓ₁(y) + α₂fₓ₂(y)

for all y ∈ F, and it is 1-1 because fₓ₁ = fₓ₂ means (x₁ − x₂, y) = 0 for all y ∈ F and hence x₁ = x₂.

… (eᵢ, f₁) = 0 for i = 2, ..., n. We choose f₂, ..., fₙ analogously. If Σₖ₌₁ⁿ λₖ fₖ = 0, then

Σₖ₌₁ⁿ λₖ(eᵢ, fₖ) = Σₖ₌₁ⁿ λₖ δᵢₖ = λᵢ = 0  for i = 1, ..., n.

Thus f₁, ..., fₙ are linearly independent and hence they are a basis of F.


If the scalar product of a dual pair of spaces is referred to dual bases, then, from (2) and (4), it follows that

(x, y) = Σₖ ξₖηₖ = ξ'η.    (5)

Conversely, if the scalar product is given by (5), then the bases which are involved are dual.

Now suppose that, with the notation of 6.2.5, f and g are dual mappings. Suppose that with respect to given bases of E1 and E2, f is given by ξ2 = Aξ1. (Here, for example, ξ1 is the matrix which has just one column consisting of the components ξ11, ..., ξ1n of x1 ∈ E1.) Suppose further that, with respect to the dual bases, g is given by η1 = Bη2. In view of (5), the duality condition (3) now reads

ξ1' A' η2 = ξ1' B η2.

Since this is valid for all x1 ∈ E1 and all y2 ∈ F2, i.e., for all values of ξ1 and η2, it follows that B = A'.

Theorem 9. Dual mappings are represented by transposed matrices with respect to dual bases. Conversely, transposed matrices represent dual mappings when they are referred to dual bases in dual pairs of spaces (which is always possible).

Proof. It only remains to prove the second part. If A is an m × n matrix, we start with two vector spaces E1 and E2 of dimensions n and m. Then, with respect to any bases of E1 and E2, A represents a linear mapping f of E1 into E2. Now suppose that F1 and F2 form dual pairs of spaces with E1 and E2 (e.g., we could put Fᵢ = Eᵢ*).

If we now refer A' to the bases of F1 and F2 which are dual to the chosen bases of E1 and E2, then A' represents the dual mapping g of f. Because, by (5), for x1 ∈ E1 and y2 ∈ F2,

(f(x1), y2) = ξ1' A' η2 = (x1, g(y2)).

In view of Theorem 5.2;12, it now follows from Theorem 8 that transposed matrices have the same column rank. If we now define the row rank of a matrix in analogy with the column rank (cf. 5.2.4) to be the maximum number of linearly independent rows, then obviously the column rank of A' is equal to the row rank of A. The column rank of a matrix is therefore always equal to the row rank, so that we usually refer simply to the rank of a matrix, i.e.,

Theorem 10. The row and column ranks of a matrix are equal, so that both can be referred to simply as the ‘rank’ of the matrix.
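Theorem 10 is easy to test numerically (a quick sketch assuming numpy, whose `matrix_rank` computes the rank from the singular values):

```python
import numpy as np

# Theorem 10 checked numerically: a matrix and its transpose always have
# the same rank.
rng = np.random.default_rng(1)
for _ in range(100):
    m, n = rng.integers(1, 6, size=2)
    A = rng.integers(-2, 3, size=(m, n)).astype(float)
    assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
```
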



Theorem 11. If the matrix A has rank r, then it contains a determinant of dimension r whose value is not zero, and every determinant of dimension greater than r which is contained in A has the value zero.

('Determinants which are contained in A' are obtained by choosing a number of the rows of A and the same number of columns and taking those elements of A which belong both to one of the chosen rows and to one of the chosen columns.)

Proof. 1. If A has rank r, then it has r linearly independent rows. These form a submatrix of A which also has rank r, and therefore has r linearly independent columns. The determinant formed from these r partial columns of A has a value not equal to zero.

2. If a determinant which is contained in A has dimension greater than r, then its rows are linearly dependent, because the corresponding rows of A are linearly dependent. Hence the determinant has the value zero (Theorem 4.2;7).

6.2.7

Numerical Calculation of the Rank of a Matrix

Let A = (αᵢₖ) be an m × n matrix whose rank is to be calculated. We start with a vector space E of dimension n and choose a basis {e₁, ..., eₙ} of E. If we put

f = Ae, i.e., fᵢ = Σₖ₌₁ⁿ αᵢₖ eₖ    (i = 1, ..., m),    (6)

then, by Theorem 3.1;4, the row rank (and hence the rank) of A is equal to the maximum number of linearly independent vectors in the set {f₁, ..., fₘ} and hence to the dimension of L(f₁, ..., fₘ). We start now with the tableau which expresses the relations (6) in normal interpretation. This is the same tableau as 5.6;(1). Now we exchange vectors fᵢ with vectors eₖ until it is no longer possible to do so, i.e., until there are no more suitable pivots available.

Note that it does not matter which pair of vectors we exchange first and which we exchange second, etc., because the only restriction lies in the availability of pivots. With a possible re-ordering of the rows and of the columns, the last tableau will have the form (see 5.6;(3) for the notation)

            f¹    e²
    e¹ =    B     C
    f² =    D     0

where the bottom right-hand corner contains only zeros (otherwise it would be possible to make a further exchange). The vectors in the group f¹ are linearly independent because otherwise it would not have been possible to exchange them. It also follows from the tableau that each of the vectors in the group f² is a linear combination of those in the group f¹. Thus the dimension of L(f₁, ..., fₘ) and hence the rank of A is equal to the number of vectors fᵢ which have been exchanged.

In a practical calculation, it is clearly not necessary to carry each pivotal row and column into the rest of the subsequent calculation. In the next example, we will show the full calculation on the left and the calculation reduced to its essentials on the right.

Example 5. To find the rank of the matrix

A = ( 1  2  1  2 )
    ( 2  1  2  1 )
    ( 1  1  1  1 )

1st tableau (pivot marked with *):

            e1    e2    e3    e4
    f1 =    1*    2     1     2
    f2 =    2     1     2     1
    f3 =    1     1     1     1
            *    −2    −1    −2

2nd tableau (f1 exchanged with e1):

            f1    e2    e3    e4
    e1 =    1    −2    −1    −2
    f2 =    2    −3     0    −3
    f3 =    1    −1*    0    −1
            1     *     0    −1

3rd tableau (f3 exchanged with e2):

            f1    f3    e3    e4
    e1 =   −1     2    −1     0
    f2 =   −1     3     0     0
    e2 =    1    −1     0    −1

In the calculation reduced to its essentials, the pivotal row and column are dropped at each step:

            e2    e3    e4                     e3    e4
    f2 =   −3     0    −3           f2 =       0     0
    f3 =   −1*    0    −1
            *     0    −1

The vector f2 cannot be exchanged either with e3 or e4. The number of possible exchange steps is 2 and therefore the rank of A is equal to 2.
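The exchange count can be reproduced with ordinary Gaussian elimination, which performs the same pivot search (an illustrative sketch assuming numpy; this is not the book's tableau bookkeeping, only the underlying pivot count):

```python
import numpy as np

def rank_by_elimination(A, tol=1e-10):
    """Count pivots found by Gaussian elimination with partial pivoting.

    Each successful pivot corresponds to one exchange of a vector f_i
    with a basis vector e_k; the rank equals the number of exchanges.
    """
    M = np.array(A, dtype=float)
    m, n = M.shape
    rank, row = 0, 0
    for col in range(n):
        if row >= m:
            break
        p = row + np.argmax(np.abs(M[row:, col]))   # largest available pivot
        if abs(M[p, col]) < tol:
            continue                                # no usable pivot here
        M[[row, p]] = M[[p, row]]
        M[row] = M[row] / M[row, col]
        for r in range(m):
            if r != row:
                M[r] -= M[r, col] * M[row]
        rank += 1
        row += 1
    return rank

# The matrix of Example 5: f1 + f2 = 3*f3, so the rank is 2.
A = [[1, 2, 1, 2],
     [2, 1, 2, 1],
     [1, 1, 1, 1]]
print(rank_by_elimination(A))  # → 2
```
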

Exercises

1. Find the ranks of the following two matrices.

A = (  1   2   3   4 )      B = (  1  −1  −2  −3  −1 )
    ( −1  −3   1   5 )          ( −1   3   2   5  −1 )
    ( −1  −4   5  14 )          (  2   2  −4  −2  −6 )

Solution. rank A = rank B = 2.

2. Find the dimension of the subspace of R5 which is spanned by the following four vectors.

x1 = (2, −1, 3, 5, −2),    x2 = (3, −2, 5, −1, 3),
x3 = (5, −3, 8, 4, 1),     x4 = (1, 0, 1, 11, −7).

Solution. The dimension is 2.

3. Let

f1(x) = ξ1 + 2ξ2 + 3ξ3 + 4ξ4,
f2(x) = −ξ1 − 3ξ2 + ξ3 + 5ξ4,
f3(x) = −ξ1 − 4ξ2 + 5ξ3 + 14ξ4,
g(x) = ξ1 + 3ξ2 − ξ3 − 5ξ4

be linear functionals on the vector space R4. Is there a vector x ∈ R4 such that g(x) ≠ 0 and f1(x) = f2(x) = f3(x) = 0?

Solution. No, because g is a linear combination of f1, f2, f3.

Problems

1. The definition of the dual space can be extended to arbitrary vector spaces in an obvious way. Show that the dual space F₀* of the vector space F₀ (Example 2.1;3) is isomorphic to the space F.


6.3: LINEAR FUNCTIONALS (POSITIVE ON A CONVEX SET)

2. Let L be a subspace of the vector space E. What special property characterizes the linear functionals f ∈ L⁰ ⊆ E*?

3. Show that, if L is a subspace of the vector space E, then L* is isomorphic to E*/L⁰.

4. Suppose that L is the subspace of R5 which is spanned by the vectors z1 = (2, 1, 1, 2, 0) and z2 = (1, 2, 3, 1, 1). Find a basis for the dual subspace L⁰.

5. Calculate the basis of Rₙ* which is dual to the basis eᵢ = (δᵢ₁, ..., δᵢₙ) (i = 1, ..., n) of Rₙ.

6. Let t be an endomorphism of the finite-dimensional vector space E. What is the image of a linear functional f ∈ E* under the mapping which is dual to t?

7. Let (E, F) be a dual pair of spaces. Calculate the dual subspaces {0}⊥ and E⊥.

8. Let (E, F) be a dual pair of spaces and suppose that two bases of E are connected by ē = Se (see 5.4;(4)). What is the connection between the dual bases?

9. Show that (L1 + L2)⊥ = L1⊥ ∩ L2⊥ and (L1 ∩ L2)⊥ = L1⊥ + L2⊥.

10. Let (E, F) be a dual pair of spaces and suppose E = L1 ⊕ L2. Show that F = L1⊥ ⊕ L2⊥.

11. Show that the dual mapping of a product of two linear mappings is the product of the dual mappings in the reverse order.

12. Let f(x, y) = Σᵢₖ αᵢₖ ξᵢηₖ be a bilinear form on two vector spaces of the same dimension. Find a necessary and sufficient condition on the αᵢₖ so that (E, F) is a dual pair of spaces with the scalar product (x, y) = f(x, y).

13. Show that the bilinear form f(x, y) in Problem 12 is a product of a linear functional g(x) on E and a linear functional h(y) on F if and only if the rank of A = (αᵢₖ) is equal to 0 or 1.

14. What is the dual mapping of a projection? (See Problem 5.3;6.)

6.3

Linear Functionals which are Positive on a Convex Set

6.3.1

The Separation Theorem

In 2- or 3-dimensional space, we have the intuitive idea that any convex set K which does not contain the origin lies completely to one side of some line or plane through the origin (i.e., some hyperplane). If f(x) = 0 is the equation of this hyperplane, then f(x) ≥ 0 for all x ∈ K (multiplying f by −1 if necessary).
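The picture can be reproduced numerically in R2 (an illustrative sketch assuming numpy; the convex set K is the hull of three made-up points, and the separating functional is chosen by inspection):

```python
import numpy as np

# K = convex hull of points that all lie strictly to the right of the
# vertical axis, so 0 is not in K; the functional f(x) = xi_1 (equation
# xi_1 = 0 of a line through 0) satisfies f(x) >= 0 on all of K.
vertices = np.array([[1.0, -2.0], [3.0, 1.0], [2.0, 4.0]])
f = lambda x: x[0]                              # a linear functional, f != 0

rng = np.random.default_rng(2)
for _ in range(500):
    w = rng.dirichlet(np.ones(len(vertices)))   # random convex weights
    x = w @ vertices                            # a point of K
    assert f(x) >= 0                            # f is positive on K
```
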

We will now show that this state of affairs also obtains in all finite-dimensional real vector spaces.


Fig. 14. (The hyperplane H = f⁻¹(0) through 0, with K in the halfspace f(x) > 0.)

Theorem 1. Let K be a convex set in a finite-dimensional real vector space E which does not contain 0. Then there is a linear functional f ≠ 0 on E such that f(x) ≥ 0 for all x ∈ K. We say that f is positive on K (although

strictly speaking we should say 'non-negative').

Proof. 1. We consider the family ℒ of all subspaces L of E which have the following property.

A: There is a linear functional f ≠ 0 on L such that f(x) ≥ 0 for all x ∈ L ∩ K.

We need to show that E ∈ ℒ.

2. If K = ∅, the theorem is trivially true. Suppose that x0 ∈ K. If λx0 ∈ K, then λ > 0 because 0 ∉ K. Hence a linear functional f ≠ 0 is defined on the 1-dimensional subspace L(x0) by f(λx0) = λ, so that L(x0) has property A. Hence L(x0) ∈ ℒ and ℒ is not empty.

3. In 4. we will show that, if L ∈ ℒ and L ≠ E, then there is an L̄ ∈ ℒ such that dim L̄ = dim L + 1. Since ℒ is not empty, it follows that E ∈ ℒ and the theorem will be proved.

4. Suppose L ∈ ℒ, L ≠ E and f is a linear functional on L as in property A. If t is a vector in E∖L, we form the subspace L̄ = L(L, t). Then dim L̄ = dim L + 1 and every vector x̄ ∈ L̄ can be written uniquely in the form x̄ = x + λt, where x ∈ L. We distinguish three cases.

1st case: For all x̄ = x + λt ∈ L̄ ∩ K, λ ≥ 0. Then f̄(x̄) = λ is a linear functional on L̄, f̄ ≠ 0 and f̄(x̄) ≥ 0 for all x̄ ∈ L̄ ∩ K. Hence L̄ ∈ ℒ.


2nd case: For all x̄ = x + λt ∈ L̄ ∩ K, λ ≤ 0. Then we put f̄(x̄) = −λ.

3rd case: If neither of the first two cases holds, then there exist

x̄₁ = x₁ + λ₁t ∈ L̄ ∩ K, where x₁ ∈ L and λ₁ > 0,

and

x̄₂ = x₂ − λ₂t ∈ L̄ ∩ K, where x₂ ∈ L and λ₂ > 0.

We consider the two sets of real numbers

M₁ = {μ₁ : there exists x₁ ∈ L, λ₁ > 0 such that x₁ + λ₁t ∈ L̄ ∩ K and μ₁ = −f(x₁)/λ₁},

M₂ = {μ₂ : there exists x₂ ∈ L, λ₂ > 0 such that x₂ − λ₂t ∈ L̄ ∩ K and μ₂ = +f(x₂)/λ₂}.

Both M₁ and M₂ are not empty. Further, μ₁ ≤ μ₂ for all μ₁ ∈ M₁ and all μ₂ ∈ M₂. Because, for the corresponding x₁, λ₁, x₂, λ₂,

(1/(λ₁ + λ₂)) (λ₁x₂ + λ₂x₁) ∈ L ∩ K,

therefore λ₁f(x₂) + λ₂f(x₁) ≥ 0 and hence

μ₁ = −f(x₁)/λ₁ ≤ f(x₂)/λ₂ = μ₂.

It follows that there is a real number θ such that μ₁ ≤ θ ≤ μ₂ for all μ₁ ∈ M₁ and all μ₂ ∈ M₂.

We now consider the linear functional f̄ on L̄ given by f̄(x + λt) = f(x) + λθ. Clearly f̄ ≠ 0 and it is easy to see that f̄(x̄) ≥ 0 for all x̄ ∈ L̄ ∩ K. Hence in this case also L̄ ∈ ℒ and the proof is complete.

Theorem 2. Let P be a convex pyramid in a real vector space E of dimension n and let x0 ∈ E∖P. Then there exists a symmetric parallelepiped W in E such that (x0 + W) ∩ P = ∅. (Cf. Example 3.4;2.)

Proof. Suppose P = P(a₁, ..., aᵣ) (see Definition 3.4;4). A linear mapping g of the r-dimensional space Rᵣ onto the linear hull L(P) of P is given by

g(λ₁, ..., λᵣ) = Σₖ₌₁ʳ λₖaₖ. We choose a basis {e₁, ..., eₙ} of E and denote the components of x = Σₖ₌₁ʳ λₖaₖ by ξ₁, ..., ξₙ. Since each component is a linear functional on E, it follows that, for each x ∈ L(P), ξᵢ = hᵢ(λ₁, ..., λᵣ) (i = 1, ..., n), where hᵢ is a linear functional on Rᵣ (Theorem 5.1;6). We denote the components of x0 by ξ₀ᵢ and construct the quantity

ρ = min over x ∈ P of max over i of |ξᵢ − ξ₀ᵢ|.

… is the dual pyramid of P₀, and Q = {x : (x, f₀) ≥ 0} is the dual pyramid of Q₀. The hypothesis of the theorem means that P ⊆ Q. Therefore, by Theorem 6, Q⁰ ⊆ P⁰ and hence f₀ ∈ P⁰ and the assertion follows.

If we choose a basis of E (components ξᵢ), then we have representations as follows.

follows. fax) = 12 “1:151 01‘ f0”) = Ag

and

fat”) = g 71 ft or f0(5'?) = Y";

Now, writing 020 to mean that all the elements of the matrix 0 are

non-negative, we may restate Theorem 7 in the following form. Theorem 8. Let A and y’ be real matrices with the same number of columns and

suppose that whenever A320, it follows that y’EZO. Then there exists a matrix 71’ >0 such that y’=1]’A. That is, y’ is a linear combination of the rows of A with non-negative coefficients. 6.3.3
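The easy direction of Theorem 8 can be illustrated numerically (a sketch assuming numpy; the matrices are made up for the demonstration): starting from a non-negative row η' and setting y' = η'A guarantees the implication, since Aξ ≥ 0 forces y'ξ = η'(Aξ) ≥ 0.

```python
import numpy as np

# Take eta' >= 0 and set y' = eta' A; then A xi >= 0 implies y' xi >= 0.
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 4)).astype(float)
eta = np.array([1.0, 0.5, 2.0])   # non-negative row coefficients
y = eta @ A                       # y' is a non-negative combination of rows

# Sample vectors xi; whenever A xi >= 0 holds, y' xi >= 0 must hold too.
for _ in range(1000):
    xi = rng.uniform(-5, 5, size=4)
    if np.all(A @ xi >= 0):
        assert y @ xi >= 0
```
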

6.3.3

The Minimax Theorem

Theorem 9. Let E and F be finite-dimensional real vector spaces, let f(x, y) (x ∈ E, y ∈ F) be a bilinear form and let X ⊆ E and Y ⊆ F be non-empty convex polyhedra. Then

min over y ∈ Y of max over x ∈ X of f(x, y) = max over x ∈ X of min over y ∈ Y of f(x, y).    (5)
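Equation (5) can be checked on a small concrete case (an illustrative sketch, not from the text; numpy assumed). Take f(x, y) = x'My on the probability simplices X = Y in R2 with the "matching pennies" matrix; both sides of (5) equal 0, taken at x = y = (1/2, 1/2):

```python
import numpy as np

# f(x, y) = x' M y on X = Y = probability simplex in R2,
# parametrized as (t, 1 - t) with 0 <= t <= 1.
M = np.array([[1.0, -1.0], [-1.0, 1.0]])
ts = [i / 100 for i in range(101)]        # grid containing t = 0.5 exactly

# max over x of (min over y): for fixed x, min_y x'My = smallest entry of x'M
lower = max(min(np.array([t, 1 - t]) @ M) for t in ts)
# min over y of (max over x): for fixed y, max_x x'My = largest entry of My
upper = min(max(M @ np.array([t, 1 - t])) for t in ts)

print(lower, upper)  # → 0.0 0.0
```
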

This theorem is known as the Minimax Theorem and is due to J. von Neumann. The right-hand side of the equation (5) is calculated by first finding, for each x ∈ X, a y ∈ Y such that f(x, y) is minimal. This minimal value is dependent on x. We then find an x ∈ X such that the corresponding minimum is as large as possible. The left-hand side is calculated in an analogous manner.

The fact that the minimum on the left-hand side and the maximum on the right-hand side both exist is not obvious and will not in fact be proved until a later section (9.2.2). Nevertheless we will continue with the present

theorem, assuming for the time being that the existence of both sides is known.

Proof of the Minimax Theorem. 1. Suppose that the value of f(x, y) on the left-hand side of (5) is taken at x = x₁ and y = y₁, and that the value on the right-hand side is taken at x = x₂, y = y₂. Then

min over y ∈ Y of max over x ∈ X of f(x, y) = f(x₁, y₁) ≥ f(x₂, y₁) ≥ f(x₂, y₂) = max over x ∈ X of min over y ∈ Y of f(x, y).

Hence it only remains to prove that the left-hand side of (5) cannot be strictly greater than the right-hand side; in other words, that there is no real number θ such that

min over y ∈ Y of max over x ∈ X of f(x, y) > θ > max over x ∈ X of min over y ∈ Y of f(x, y).    (6)

2. We now prove that (6) is impossible for the special case in which the following extra conditions are satisfied.

(a) E and F form a dual pair of spaces with (x, y) = f(x, y).

(b) Each of the polyhedra X and Y is a subset of a hyperplane in E and F respectively and is not a subspace.

(c) θ = 0.

From (6) and (c), it follows that

max over x ∈ X of f(x, y) > 0 for all y ∈ Y

and

min over y ∈ Y of f(x, y) < 0 for all x ∈ X,

or, alternatively, for each y ∈ Y, there is an x ∈ X such that f(x, y) > 0,

and

for each x ∈ X, there is a y ∈ Y such that f(x, y) < 0.

… and for each x ∈ P, x ≠ 0, there is a y ∈ Q such that f(x, y) < 0 …

ηᵢ = Σₖ₌₁ⁿ αᵢₖ ξₖ + βᵢ    (i = 1, ..., m).    (3)

Here we are looking for the solutions (ξ₁, ..., ξₙ, η₁, ..., ηₘ) of the system of equations (3) for which ηᵢ ≥ 0 (i = 1, ..., m). By Theorem 7.3;5, if there exist any solutions at all, then these solutions form the sum P = P₁ + P₂ of a convex polyhedron P₁ and a convex pyramid P₂. The transfer from R(m+n) to Rₙ is achieved by the same linear mapping as in 7.4.1 and this maps P 1-1 onto the set K of solutions of (1). Hence we have the following theorem.

Theorem 2. The set of solutions of a system of linear inequalities (1) is either empty or it is the sum K = K₁ + K₂ of a convex polyhedron K₁ and a convex pyramid K₂. Note that K₂ can be {0}, i.e., K = K₁.

We must now make several remarks in connection with this Theorem 2.

1. The polyhedron K₁ is not uniquely determined if K₂ ≠ {0}. For instance, we can certainly extend K₁ by introducing any vector from K∖K₁ as an extra vertex vector. This does not alter K.

Suppose that x₁, ..., xᵣ are the vertex vectors of K₁. We will say that one of these (x₁ say) is superfluous if x₁ ∈ K(x₂, ..., xᵣ) + K₂. We can then omit x₁ as a vertex vector (so making K₁ smaller) without changing K. If after the omission of x₁ there is a further superfluous vertex vector, then this can also be omitted, and so on. We may therefore assume that K₁ has no superfluous vertex vectors.

2. The pyramid K₂ consists of the solutions of the homogeneous system associated with (1). For, suppose z ∈ K₂; then, for any x ∈ K₁ and λ > 0, x + λz ∈ K, or, in components,

Σₖ₌₁ⁿ αᵢₖ(ξₖ + λζₖ) + βᵢ = Σₖ₌₁ⁿ αᵢₖξₖ + βᵢ + λ Σₖ₌₁ⁿ αᵢₖζₖ ≥ 0    (i = 1, ..., m).    (4)

From this, it follows that

Σₖ₌₁ⁿ αᵢₖ ζₖ ≥ 0    (i = 1, ..., m).    (5)

7.4: LINEAR INEQUALITIES

Conversely, if (5) is satisfied and x ∈ K₁, then (4) follows for λ ≥ 0. Hence x + λz ∈ K, i.e., z = (1/λ)(−x + k₁) + k₂, where k₁ ∈ K₁, k₂ ∈ K₂ and both are dependent on λ. As λ → ∞ the components of (1/λ)(−x + k₁) converge to 0 and therefore the components of k₂ converge to those of z. As a simple argument will show, it follows that z ∈ K₂.

3. Consequently, the subspace L = K₂ ∩ (−K₂) consists exactly of the

solutions of the system of equations

Σₖ₌₁ⁿ αᵢₖ ζₖ = 0    (i = 1, ..., m)    (6)

and therefore has the dimension n − r(A) (Theorem 7.1;1), where A = (αᵢₖ).

4. Now, for a given (not superfluous) vertex vector x₁ ∈ K₁, suppose that

exactly s of the inequalities are satisfied with the equality sign, i.e., assuming that they are the first s, suppose that

Σₖ₌₁ⁿ αᵢₖ ξₖ + βᵢ = 0  for i = 1, ..., s,    (7)

Σₖ₌₁ⁿ αᵢₖ ξₖ + βᵢ > 0  for i = s+1, ..., m.    (8)

Let M be the solution space of the system of equations

Σₖ₌₁ⁿ αᵢₖ ζₖ = 0    (i = 1, ..., s).    (9)

If d ∈ M, then (7) is satisfied by x = x₁ + λd for all λ. Further, there exists a λ₀ > 0 such that, with λ = λ₀, (8) is also satisfied. Hence x = x₁ ± λ₀d are solutions of (1). Hence

x₁ + λ₀d = k₁₁ + k₂₁,  x₁ − λ₀d = k₁₂ + k₂₂,

where

k₁₁, k₁₂ ∈ K₁ and k₂₁, k₂₂ ∈ K₂.

By adding these, it follows that

x₁ = k₁' + k₂',

where

k₁' = ½(k₁₁ + k₁₂) ∈ K₁ and k₂' = ½(k₂₁ + k₂₂) ∈ K₂.

But in this k₂' must be zero because otherwise x₁ would be superfluous. Hence x₁ = ½(k₁₁ + k₁₂). Since x₁ is a vertex vector, it follows that k₁₁ = k₁₂ = x₁. Hence λ₀d = k₂₁ = −k₂₂ ∈ K₂ ∩ (−K₂) = L. Therefore d ∈ L, and hence M ⊆ L. On the other hand, it is clear that M ⊇ L and it follows that the rank of the system of equations (9) is equal to r(A).

Therefore there are r(A) of the s inequalities in (1) which are satisfied by x₁ with the equality sign, such that the corresponding left-hand sides of (9) are linearly independent. We will say, in this case, that the inequalities involved are also linearly independent.

CH. 7: LINEAR EQUATIONS AND INEQUALITIES

Definition 1. A solution x = (ξ₁, ..., ξₙ) of the system (1) is said to be a 'basic solution' if it satisfies r(A) linearly independent inequalities of the system with the equality sign. A basic solution is said to be normal if exactly r(A) inequalities are satisfied with the equality sign. A non-normal basic solution is said to be degenerate.

This proves the following theorem.

Theorem 3. The convex polyhedron K₁ (Theorem 2) can be so chosen that its vertex vectors are basic solutions of the system (1).
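Definition 1 can be turned into a small numerical check (an illustrative sketch assuming numpy; `classify_solution` is a hypothetical helper, not from the text):

```python
import numpy as np

def classify_solution(A, beta, x, tol=1e-9):
    """Classify a solution of A x + beta >= 0 following Definition 1.

    x is basic if the rows of A belonging to the inequalities it
    satisfies with the equality sign contain r(A) linearly independent
    ones; it is normal if exactly r(A) inequalities are equalities.
    """
    A = np.asarray(A, float)
    x = np.asarray(x, float)
    slack = A @ x + np.asarray(beta, float)
    assert np.all(slack >= -tol), "x is not even a solution of (1)"
    active = np.abs(slack) <= tol                    # equality constraints
    r = np.linalg.matrix_rank(A)
    r_active = np.linalg.matrix_rank(A[active]) if active.any() else 0
    if r_active < r:
        return "not basic"
    return "basic and normal" if active.sum() == r else "basic and degenerate"

# Example (n = 2, r(A) = 2): the unit square 0 <= xi1, xi2 <= 1.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
beta = [0, 0, 1, 1]
print(classify_solution(A, beta, [0, 0]))    # vertex → basic and normal
print(classify_solution(A, beta, [0.5, 0]))  # edge point → not basic
```
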

Example 1. Let n = 2, r(A) = 2 and consequently m ≥ 2. If we replace the inequality sign by an equals sign in one of the inequalities, we obtain the equation of a line (assuming that not both of αᵢ₁ and αᵢ₂ are zero). Then the solutions of the corresponding inequality form one of the two halfplanes bounded by the line. Thus K is the intersection of the halfplanes corresponding to the inequalities of (1).

Fig. 15 (a), (b).

In Fig. 15a, m = 5 and K₂ = {0}. (The diagrams and the terminology naturally refer to the space G2 which is isomorphic to R2 (Example 2.1;1).) Every vertex vector of K = K₁ is a basic solution since it is the intersection of just two of the five lines. In Fig. 15b, m = 4 and K₁ is the convex hull of the three numbered vertex vectors. All three are basic solutions. Here we could extend K₁ by introducing any vector x ∈ K∖K₁ as a new vertex vector. However, these would be superfluous and not basic solutions.


Example 2. Let n = 3. Every inequality of (1) represents a halfspace which is bounded by the plane given by the corresponding equation. Thus K is the intersection of finitely many halfspaces.

1st case: r(A) = 3 (hence m ≥ 3). In this case, K₁ can be so chosen that each vertex is the unique intersection of three planes.

2nd case: r(A) = 2 (hence m ≥ 2). In this case there is a set of parallel lines S such that, for x ∈ K, the line of S which passes through x lies entirely in K.

Fig. 16.

In Fig. 16, m = 3 and K is a 3-sided prism. K₂ is the line of the set S which goes through 0. K₁ can be chosen to be the triangle with the numbered vertices. Every vertex vector lies in r(A) = two faces. The vertex vectors are therefore basic solutions. More generally, every point which lies on an edge of the prism K represents a basic solution.

In Fig. 17 (m = 4), K₂ is a wedge which is bounded by two halfplanes whose common bounding line is the line from the set S which goes through 0. K₁ can again be chosen as a triangle.

Fig. 17.

3rd case: r(A) = 1 (hence m ≥ 1). In this case, K is either empty or a slice bounded by two parallel planes or a halfspace which is bounded by a plane. Every point of a boundary plane represents a basic solution.

4th case: r(A) = 0. Now K = R3 if all βᵢ ≥ 0; otherwise K = ∅.

Example 3. The basic solutions in Figs. 15, 16, 17 were all normal. For instance, if n = r(A) = 3, then a basic solution is degenerate when it is the intersection of at least four bounding planes. For example, if K is a regular octahedron or icosahedron, then all vertices are degenerate basic solutions. On the other hand, if K is a cube, tetrahedron or a regular dodecahedron, then all basic solutions are normal.


CHAPTER 8

LINEAR PROGRAMMING

8.1

Linear Programmes

A linear programme is a problem of the following kind. We are given a system of linear inequalities

ηᵢ = Σₖ₌₁ⁿ αᵢₖ ξₖ + βᵢ ≥ 0    (i = 1, ..., m)    (1)

and a linear functional

θ = Σₖ₌₁ⁿ γₖ ξₖ,    (2)

which will be referred to as the object function. We wish to find those vectors (ξ₁, ..., ξₙ) ∈ Rₙ which satisfy (1) and for which the object function takes its minimum value.

In matrix notation, a linear programme can therefore be formulated as follows:

η = Aξ + β ≥ 0,    (3)

θ = y'ξ = min.    (4)

(As before, we write C ≥ 0 if all the elements of the matrix C are non-negative.)

The inequalities (3) are known as the restrictions of the programme. A vector (ξ₁, ..., ξₙ) ∈ Rₙ which satisfies the restrictions is known as an admissible solution. An admissible solution which satisfies (4) is known as a minimal or optimal solution and the corresponding value of the object function is known as the optimal value.

The special case in which the restrictions include the inequalities

ξₖ ≥ 0    (k = 1, ..., n),  i.e., ξ ≥ 0,    (5)

is of particular importance. In this case the programme is said to be a definite linear programme. There will also be cases in which the restrictions include only some of the inequalities (5), and then the variables which are not affected by (5) are known as free variables (see 8.4).

CH. 8: LINEAR PROGRAMMING

In practical applications, there will also be inequalities with ≤ in place of ≥. They may be changed into the latter form simply by changing the sign of both sides. If one of the restrictions is an equation, then it may be replaced by two inequalities (α = 0 is replaced by α ≥ 0 and −α ≥ 0), so that we obtain the form (3) again. It is also possible that the object function has to be maximized. Since it is possible to replace θ = max by −θ = min, this case is also covered by (4).
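A programme in the form (3)–(4) can be solved by enumerating basic solutions, in line with Theorems 7.4;2–3 (an illustrative sketch assuming numpy; the data are made up and the brute-force enumeration is only practical for tiny problems):

```python
import numpy as np
from itertools import combinations

# A tiny programme of the form (3)-(4):
# minimize theta = y' xi  subject to  A xi + beta >= 0.
A = np.array([[1.0, 0.0],    # xi1 >= 0
              [0.0, 1.0],    # xi2 >= 0
              [1.0, 1.0]])   # xi1 + xi2 - 1 >= 0
beta = np.array([0.0, 0.0, -1.0])
y = np.array([2.0, 3.0])

# A minimum (if attained) occurs at a basic solution, i.e. at an
# intersection of pairs of boundary hyperplanes that is admissible.
best = None
for i, j in combinations(range(3), 2):
    sub = A[[i, j]]
    if abs(np.linalg.det(sub)) < 1e-12:
        continue                                 # no unique intersection
    xi = np.linalg.solve(sub, -beta[[i, j]])
    if np.all(A @ xi + beta >= -1e-9):           # admissible?
        val = y @ xi
        if best is None or val < best:
            best = val
print(best)  # → 2.0, attained at the basic solution (1, 0)
```
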

By Theorem 7.4;2, the set K of admissible solutions of a linear programme is either empty or it is the sum of a convex polyhedron K₁ and a convex pyramid K₂. If K ≠ ∅, suppose that a₁, ..., aᵣ are the vertex vectors of the polyhedron K₁ (by Theorem 7.4;3, we may assume that they are basic solutions) and that the pyramid K₂ is generated by b₁, ..., bₛ. Then x ∈ Rₙ is an admissible solution if and only if there are coefficients λ₁, ..., λᵣ, μ₁, ..., μₛ, where λₖ ≥ 0 (k = 1, ..., r), Σₖ₌₁ʳ λₖ = 1, μₖ ≥ 0 (k = 1, ..., s), such that

x = Σₖ₌₁ʳ λₖaₖ + Σₖ₌₁ˢ μₖbₖ.    (6)

Denoting the object function by g(x), we have

g(x) = Σₖ₌₁ʳ λₖ g(aₖ) + Σₖ₌₁ˢ μₖ g(bₖ).

We distinguish the following three cases.

1. g(bₖ) > 0 for k = 1, ..., s. If x is to be a minimal solution, then μₖ must be zero for all k = 1, ..., s. Let a(k₀) be a vertex vector for which g(aₖ) takes the least value

θ = g(a(k₀)) = min over k of g(aₖ).

… η'β = θ₂.

Proof. Multiplying (1) and (2) by η' and ξ respectively, it follows that

y'ξ ≥ η'Aξ ≥ η'β.

Theorem 2. (Duality Theorem)

1. If (1) has an optimal solution, then so has (2) and vice versa. The optimal

values of the two object functions are equal.

2. If (1) [(2)] has admissible solutions with arbitrarily small [large] values of the object function, then (2) [(1)] has no admissible solutions.

3. If (1) [(2)] has no admissible solutions, then (2) [(1)] either has no admissible solutions or it has admissible solutions with arbitrarily large [small] values of the object function.

Proof. 1. Suppose (1) has an optimal solution. Let the minimal value of the object function be θ₁. Then it follows from

Aξ − β ≥ 0,  ξ ≥ 0,  that  y'ξ − θ₁ ≥ 0.    (3)

Using this we will now show further that,

if  Aξ − βζ ≥ 0,  ξ ≥ 0,  ζ ≥ 0,  then  y'ξ − θ₁ζ ≥ 0.    (4)

Suppose first that z > 0. We divide the conditions of (4) by C and obtain the conditions of (3) with {—1 g in place of E By (3), it follows that y’E—lg—al 2 0,

i.e.,

y’E—UIC 2 0.

Suppose secondly that i =0. We must show that the conditions

Ago > 0»

go > 0,

Y, go < 0

(5)

cannot be satisfied simultaneously. Let E1 be an optimal solution of (l), i.e.,

Agl 9 [3’

E1 9 0 and Y, g1 = 01,

then, if there were a go satisfying (5), it would follow by addition that

moi-£1) 9 0 and Y'(go+§1) < 01.

A(Eo+g1) > B,

This is a contradiction because 0'1 is the minimal value ofthe object function.

This completes the proof of statement (4) and we will now write it in the form:

if
    ( A  −β )
    ( I   0 ) (ξ, ζ)′ ≥ 0,
    ( 0   1 )
then
    (y′, −σ₁) (ξ, ζ)′ ≥ 0.


(The matrices here are written in terms of submatrices as shown; in particular, I is the n × n identity matrix.) By Theorem 6.3;8, there is a matrix (η′, η₁′, η₂) ≥ 0 such that

               ( A  −β )
(η′, η₁′, η₂)  ( I   0 )  =  (y′, −σ₁),
               ( 0   1 )

i.e.,   η′A + η₁′ = y′   and   −η′β + η₂ = −σ₁,
i.e.,   η′A ≤ y′   and   η′β ≥ σ₁.

Thus η′ is an admissible solution of (2) and the corresponding value of the object function is at least σ₁. The first part of the theorem now follows because, by Theorem 1, the value of the object function of (2) is never greater than σ₁ at any of the admissible solutions of (2). (Here we use the fact that the relationship between dual programmes is symmetric.)

2. The second part of the theorem is a direct consequence of Theorem 1.

3. The third part is a consequence of the first. Suppose for instance that (2) has admissible solutions, and suppose that the set of values of the object function on the set K of admissible solutions is bounded above. Then (2) has a maximal solution and therefore, by 1., (1) has a minimal solution.

Finally, we see that it is possible for both programmes simultaneously to have no admissible solutions, for example, when A = 0, β > 0 and y′ < 0.
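The content of part 1 of the Duality Theorem can be illustrated numerically. The sketch below (our own small example and naming, not from the text) solves a primal pair min y′x with Ax ≥ b, x ≥ 0 and its dual max b′u with A′u ≤ y, u ≥ 0 by enumerating candidate vertices, and confirms that the two optimal values coincide:

```python
from itertools import combinations

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule; None if singular."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None
    return ((b[0] * a22 - b[1] * a12) / det, (a11 * b[1] - a21 * b[0]) / det)

def best_vertex(lines, feasible, value, minimize):
    """Optimum over intersections of pairs of boundary lines (coeffs, rhs)."""
    best = None
    for (c1, r1), (c2, r2) in combinations(lines, 2):
        p = solve2((c1, c2), (r1, r2))
        if p is not None and feasible(p):
            v = value(p)
            if best is None or (v < best if minimize else v > best):
                best = v
    return best

A, b, y = [[2.0, 1.0], [1.0, 3.0]], [4.0, 6.0], [1.0, 1.0]

primal_lines = [(A[0], b[0]), (A[1], b[1]), ([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
primal = best_vertex(
    primal_lines,
    lambda p: all(sum(c * q for c, q in zip(row, p)) >= rhs - 1e-9
                  for row, rhs in zip(A, b)) and min(p) >= -1e-9,
    lambda p: y[0] * p[0] + y[1] * p[1], minimize=True)

At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]      # columns of A
dual_lines = [(At[0], y[0]), (At[1], y[1]), ([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
dual = best_vertex(
    dual_lines,
    lambda u: all(sum(c * q for c, q in zip(col, u)) <= yk + 1e-9
                  for col, yk in zip(At, y)) and min(u) >= -1e-9,
    lambda u: b[0] * u[0] + b[1] * u[1], minimize=False)
```

Both `primal` and `dual` come out as 2.8, the common optimal value of the two object functions.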

8.3

The Simplex Method for the Numerical Solution of Linear Programmes

Consider the following definite programme:

ρ = Aξ + β ≥ 0
ξ ≥ 0                    (1)
θ = y′ξ = min

Our first task is to find the optimal basic solutions. It turns out that at the same time we will also obtain all the optimal solutions, i.e., those corresponding to section 2 of Theorem 8.1;1.

Because of the restrictions ξ ≥ 0, the rank of the system of inequalities in a definite programme is equal to n. Hence, by Theorem 7.4;3, there are n linearly independent variables in the set {ξ₁, …, ξₙ, ρ₁, …, ρₘ} which take the value 0 in an admissible basic solution. The other variables and the object function can be expressed in terms of these linearly independent variables and the variable τ (which is to be set equal to 1), and this gives a tableau of the following kind.

CH. 8: LINEAR PROGRAMMING

          (the n vanishing variables)     τ
  … =              α*ᵢₖ                  β*₁
  ⋮                                       ⋮        (2)
  … =                                    β*ₘ
  θ =       γ*₁   …   γ*ₙ                 δ*

where the top row contains the n vanishing variables and τ, and the left-hand column contains the other variables and the object function θ. Since the tableau corresponds to an admissible solution, β*ᵢ ≥ 0 for all i = 1, …, m, because these are the values of the variables on the left-hand side for the given basic solution.

Thus every admissible basic solution corresponds to a tableau of the form (2) with β*ᵢ ≥ 0 (i = 1, …, m). Conversely, we can construct an admissible basic solution from any tableau of this form by putting the variables in the top row equal to zero and calculating the others from the tableau.

Normal and Degenerate Basic Solutions

Tableau (2) corresponds to a normal basic solution if and only if β*ᵢ > 0 for all i = 1, …, m. If β*₁ = 0, say, then, in addition to the n variables in the top row, the variable in the first row also takes the value 0 (and conversely). A tableau which corresponds to a normal (degenerate) basic solution will itself be referred to as normal (degenerate).

Optimal Basic Solutions

If γ*ₖ ≥ 0 (k = 1, …, n), then tableau (2) represents an optimal basic solution and will then be said to be optimal itself. In every other admissible solution, at least one of the variables in the top row will take a strictly positive value; hence the corresponding value of the object function is not smaller than that for the basic solution of (2). If γ*ₖ > 0 for all k = 1, …, n, then (2) represents the unique optimal basic solution.

Conversely, if the tableau (2) is normal and optimal, then γ*ₖ ≥ 0 (k = 1, …, n). For, if γ*ₖ₀ < 0 for some k₀, then, since β*ᵢ > 0 (i = 1, …, m), we can choose a strictly positive value for the variable at the head of the k₀-th column so that the variables on the left-hand side are still positive. In this way, we obtain a new admissible solution, which gives a smaller value of the object function.
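The exchange steps between tableaux of the form (2) can be sketched in code. The following is a minimal illustration (our own naming, not the book's notation): it assumes a definite programme with β ≥ 0, uses the smallest characteristic quotient to choose the pivot row, and contains no safeguard against cycling on degenerate tableaux:

```python
def simplex_min(A, b, c):
    """Minimize c'x subject to rho = A x + b >= 0 and x >= 0, starting from
    the admissible basic solution x = 0 (so b >= 0 is required).
    Row i of the tableau reads: left_i = sum_k T[i][k]*top_k + T[i][-1];
    the last row is the object function theta."""
    m, n = len(A), len(A[0])
    T = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    T.append(list(map(float, c)) + [0.0])
    top = [('x', k) for k in range(n)]        # variables across the top (value 0)
    left = [('rho', i) for i in range(m)]     # variables down the left-hand side
    while True:
        # optimality test: all object-row coefficients >= 0
        k0 = next((k for k in range(n) if T[m][k] < -1e-9), None)
        if k0 is None:
            break
        # rows that bound the increase of the top variable in column k0
        rows = [i for i in range(m) if T[i][k0] < -1e-9]
        if not rows:
            raise ValueError('object function unbounded below')
        i0 = min(rows, key=lambda i: -T[i][-1] / T[i][k0])   # char. quotient
        # exchange step: left_i0 and top_k0 swap places
        p = T[i0][k0]
        new = [-T[i0][k] / p for k in range(n + 1)]
        new[k0] = 1.0 / p
        for i in range(m + 1):
            if i != i0:
                a = T[i][k0]
                T[i] = [a * new[k] if k == k0 else T[i][k] + a * new[k]
                        for k in range(n + 1)]
        T[i0] = new
        top[k0], left[i0] = left[i0], top[k0]
    x = [0.0] * n
    for i, (kind, idx) in enumerate(left):
        if kind == 'x':
            x[idx] = T[i][-1]
    return x, T[m][-1]

# maximize xi1 + xi2 under xi1 <= 2, xi2 <= 3, xi1 + xi2 <= 4, xi >= 0,
# written as min -xi1 - xi2 with rho = A xi + b >= 0:
x, theta = simplex_min([[-1, 0], [0, -1], [-1, -1]], [2, 3, 4], [-1, -1])
```

The run ends with θ = −4 at the basic solution (2, 2); the whole edge between (2, 2) and (1, 3) is optimal, which the final tableau signals by a vanishing object-row coefficient.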

On the other hand, some of the γ*ₖ can be negative in a degenerate optimal tableau, e.g., when one of the β*ᵢ, say β*₁, is zero and α*₁ₖ₀ < 0.

Example.

ρ₁ = −ξ₂ + ξ₃ + 1 ≥ 0
ρ₂ = −ξ₁ + ξ₃ + 1 ≥ 0
ρ₃ = −ξ₂ − ξ₃ + 3 ≥ 0          ξ₁ ≥ 0,  ξ₂ ≥ 0,  ξ₃ ≥ 0
ρ₄ = −ξ₁ − ξ₃ + 3 ≥ 0
ρ₅ =  ξ₁ − ξ₃ + 1 ≥ 0
ρ₆ =  ξ₂ − ξ₃ + 1 ≥ 0
θ = −ξ₃ = min

Fig. 22.

1st Tableau
            ξ₁    ξ₂    ξ₃    τ
  ρ₁ =       0    −1     1    1
  ρ₂ =      −1     0     1    1
  ρ₃ =       0    −1    −1    3
  ρ₄ =      −1     0    −1    3
  ρ₅ =       1     0    −1    1
  ρ₆ =       0     1    −1*   1
  θ  =       0     0    −1    0

2nd Tableau
            ξ₁    ξ₂    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ρ₂ =      −1     1    −1    2
  ρ₃ =       0    −2     1    2
  ρ₄ =      −1    −1     1    2
  ρ₅ =       1    −1*    1    0
  ξ₃ =       0     1    −1    1
  θ  =       0    −1     1   −1

The second tableau is degenerate. In view of the characteristic-quotient rule (χ₅ = 0), only α₅₂ = −1 can be considered as a pivot. The next tableau still represents the same solution.

3rd Tableau
            ξ₁    ρ₅    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ρ₂ =       0    −1     0    2
  ρ₃ =      −2*    2    −1    2
  ρ₄ =      −2     1     0    2
  ξ₂ =       1    −1     1    0
  ξ₃ =       1    −1     0    1
  θ  =      −1     1     0   −1

4th Tableau
            ρ₃    ρ₅    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ρ₂ =       0    −1     0    2
  ξ₁ =      −½     1    −½    1
  ρ₄ =       1    −1     1    0
  ξ₂ =      −½     0     ½    1
  ξ₃ =      −½     0    −½    2
  θ  =       ½     0     ½   −2

The fourth tableau gives the minimal solution (1, 1, 2) with θ = −2. This solution is degenerate because ρ₃ = ρ₄ = ρ₅ = ρ₆ = 0. Even though γ₂ = 0, there are no other optimal solutions because, if θ = −2, then ½(ρ₃ + ρ₆) = 0 and hence ρ₃ = ρ₆ = 0 and ρ₄ = −ρ₅ = 0.

If we drop the principle that γₖ₀ < 0 and try to choose a pivot in the first column of the second tableau, say α₂₁ = −1, then we have

Tableau 3a
            ρ₂    ξ₂    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ξ₁ =      −1     1    −1    2
  ρ₃ =       0    −2     1    2
  ρ₄ =       1    −2*    2    0
  ρ₅ =      −1     0     0    2
  ξ₃ =       0     1    −1    1
  θ  =       0    −1     1   −1

We come to a new degenerate solution, but without decreasing the object function.

Tableau 4a
            ρ₂    ρ₄    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ξ₁ =      −½    −½     0    2
  ρ₃ =      −1     1    −1    2
  ξ₂ =       ½    −½     1    0
  ρ₅ =      −1*    0     0    2
  ξ₃ =       ½    −½     0    1
  θ  =      −½     ½     0   −1

This tableau represents the same solution as 3a. The pivot has to be chosen in the first column and there are two possibilities for this (see Exercise 2).

Tableau 5a
            ρ₅    ρ₄    ρ₆    τ
  ρ₁ =       0     0    −1    2
  ξ₁ =       ½    −½     0    1
  ρ₃ =       1     1    −1    0
  ξ₂ =      −½    −½     1    1
  ρ₂ =      −1     0     0    2
  ξ₃ =      −½    −½     0    2
  θ  =       ½     ½     0   −2

This tableau again represents the optimal solution. The two routes to the solution are marked by arrows in Fig. 22.

Case 3. The tableau is optimal. We restrict the discussion to normal tableaux, i.e., we assume that βᵢ > 0 for all i = 1, …, m. In this case, the condition of being optimal is equivalent to γₖ ≥ 0 for all k = 1, …, n. Further, we have already seen that the tableau represents the unique optimal solution if and only if γₖ > 0 for all k = 1, …, n. If γₖ₀ = 0 for some k₀ and all the corresponding αᵢₖ₀ are positive, then we can obtain optimal solutions by giving at

least one variable an arbitrarily large value (Theorem 8.1;1, part 2). If γₖ₀ = 0 and there is an i₀ such that αᵢ₀ₖ₀ < 0, then an exchange step with pivot αᵢ₀ₖ₀ leads to a further optimal basic solution.

Exercises

4. Solve the following linear programmes.

(a)  ξ₁ − ξ₂ + 4 ≥ 0                    ξ₁ ≥ 0
     −ξ₁ − ξ₂ + 8 ≥ 0                   ξ₂ ≥ 0
     −2ξ₁ + 3ξ₂ + 6 ≥ 0
     θ = −2ξ₁ + ξ₂ = min

(b)  2ξ₁ − 3ξ₂ − 3ξ₃ + 18 ≥ 0           ξ₁ ≥ 0
     −2ξ₁ + ξ₂ + 10 ≥ 0                 ξ₂ ≥ 0
     −ξ₃ + 4 ≥ 0                        ξ₃ ≥ 0
     θ = −ξ₁ − ξ₂ − 3ξ₃ = min

(c)  4ξ₁ − 3ξ₂ − ξ₃ − 2ξ₄ + 54 ≥ 0      ξ₁ ≥ 0
     −ξ₁ + 18 ≥ 0                       ξ₂ ≥ 0
     ξ₁ − 3ξ₂ + 40 ≥ 0                  ξ₃ ≥ 0
     −ξ₂ − ξ₃ + 30 ≥ 0                  ξ₄ ≥ 0
     −ξ₃ − 2ξ₄ + 48 ≥ 0
     θ = ξ₁ − ξ₂ − ξ₃ − ξ₄ = min

Solutions. (a) ξ₁ = 6, ξ₂ = 2, θ = −10.
(b) ξ₁ = 9, ξ₂ = 8, ξ₃ = 4, θ = −29.
(c) ξ₁ = 0, ξ₂ = 2, ξ₃ = 28, ξ₄ = 10, θ = −40.

8.4
The Treatment of Free Variables

In 8.3 we assumed that the programme was definite, i.e., that there were no free variables. We will now describe a variation of the simplex method which allows us to eliminate any free variables at the beginning of the calculation. We will restrict the discussion to normal tableaux, but it will not be difficult to see how we would deal with degenerate tableaux. Further, we will still assume that the zero vector is an admissible solution, i.e., βᵢ ≥ 0 for all i = 1, …, m (see 8.5 for the case when this is not so).

The free variables are initially in the top row of the tableau and, since we eliminate them (i.e., we do not carry them into the rest of the calculation once they have been exchanged with other variables), it follows that free variables will only ever appear in the top rows of tableaux. As before, we will keep to the principles G1, G2 and G3 of 8.3. Then, as a consequence of G1, all the coefficients in the τ-column of any of the tableaux must be positive (leaving out the row corresponding to the free variable which has been exchanged; see Example 1).

Now we will assume that ξ₁ is free and therefore look for a pivot in the first column.

1. If αᵢ₁ = 0 for all i, then we can always obtain admissible solutions by giving ξ₁ an arbitrary real value.
1.1. γ₁ = 0. Then θ is independent of ξ₁. Hence we need not consider ξ₁ at all in the solution.
1.2. γ₁ ≠ 0. There exists no minimal solution.

2. Not all αᵢ₁ = 0. Let αᵢ₀₁ ≠ 0 be a possible pivot.
2.1. δ goes into δ − βᵢ₀γ₁/αᵢ₀₁ which, to agree with G3, should be ≤ δ and if possible < δ.
2.1.1. γ₁ < 0. It follows that αᵢ₀₁ should be < 0 and it is then possible to satisfy G3 in the strong form. (If all αᵢ₁ are positive, then there exists no minimal solution.)
2.1.2. γ₁ > 0. It follows that αᵢ₀₁ should be > 0 and it is then possible to satisfy G3 in the strong form. (If all αᵢ₁ are negative, then there is no minimal solution.)
2.1.3. γ₁ = 0. It is only possible to satisfy G3 in the weak form.
2.2. For i ≠ i₀, βᵢ goes into βᵢ − βᵢ₀αᵢ₁/αᵢ₀₁ which, to agree with G1, should be ≥ 0 and if possible > 0 to agree with G2.
2.2.1. γ₁ < 0. Then αᵢ₀₁ < 0 and G2 is satisfied whenever αᵢ₁ > 0. If αᵢ₁ < 0, then as before βᵢ/αᵢ₁ ≤ βᵢ₀/αᵢ₀₁ is required, and so we choose i₀ such that χᵢ₀ is the maximum of the χᵢ over all i with αᵢ₁ < 0.
2.2.2. γ₁ > 0. Similarly, we choose i₀ such that χᵢ₀ is the minimum of the χᵢ over all i with αᵢ₁ > 0.
2.2.3. γ₁ = 0. This case can be dealt with either as in 2.2.1 or 2.2.2.

Thus we have the following rule for choosing the pivot. If γ₁ > 0 (< 0), then we choose the pivot from among the positive (negative) coefficients of the first column. The one which is actually chosen is decided by calculating the characteristic quotients χᵢ = βᵢ/αᵢ₁ for each of the positive (negative) coefficients αᵢ₁ and finding the index i₀ such that χᵢ₀ = min χᵢ (max χᵢ). If γ₁ = 0, we can consider it to be either positive or negative.

8.5
General Linear Programmes

Theorem 1 asserts that there is an ω₀ such that, for every ω > ω₀, every optimal solution of (1) is also an optimal solution of (2). This corresponds to the intuitive idea that, for a large positive ω, in order to make θ₁ small we must first minimize θ₂ and then θ₀.

Proof. 1. An admissible solution

x = λ₁a₁ + … + λᵣaᵣ + μ₁b₁ + … + μₛbₛ      (3)

(see 8.1;(6)) is optimal for an object function θ(x) if and only if: from λₖ > 0 it follows that aₖ is an optimal solution (k = 1, …, r), and from μₖ > 0 it follows that θ(bₖ) = 0 (k = 1, …, s).

2. Let K be the set of admissible solutions of the programmes (1) and (2) and let x ∈ K be an admissible solution which is not minimal for θ₂. Then, for the representation (3) of x, either

there exists a k₀ such that λₖ₀ > 0 and aₖ₀ is not minimal for θ₂,

or

there exists a k₀ such that μₖ₀ > 0 and θ₂(bₖ₀) ≠ 0.

In the first case, there exists a cₖ₀ ∈ K such that θ₂(cₖ₀) < θ₂(aₖ₀). From this, it follows that

θ₁(cₖ₀) = θ₀(cₖ₀) + ωθ₂(cₖ₀) < θ₀(aₖ₀) + ωθ₂(aₖ₀) = θ₁(aₖ₀),

whenever

ω > (θ₀(cₖ₀) − θ₀(aₖ₀)) / (θ₂(aₖ₀) − θ₂(cₖ₀)).

If ω satisfies this condition, then aₖ₀, and hence x, is not optimal for θ₁.

In the second case,

θ₁(bₖ₀) = θ₀(bₖ₀) + ωθ₂(bₖ₀) ≠ 0   when   ω ≠ −θ₀(bₖ₀)/θ₂(bₖ₀).

If ω satisfies this condition, then x is not optimal for θ₁.

3. Obviously, Theorem 1 will now be satisfied if we put ω₀ equal to the largest of the numbers

(θ₀(cₖ) − θ₀(aₖ)) / (θ₂(aₖ) − θ₂(cₖ))   and   −θ₀(bₖ)/θ₂(bₖ),

where aₖ runs through all the vectors in the set {a₁, …, aᵣ} for which θ₂ is not optimal, cₖ satisfies the condition θ₂(cₖ) < θ₂(aₖ), and bₖ runs through those vectors in the set {b₁, …, bₛ} for which θ₂(bₖ) ≠ 0.

We will first carry out the method for a particular example.

Example 1.

ρ₁ = ξ₁ + 4ξ₂ − 8 ≥ 0
ρ₂ = 2ξ₁ + 3ξ₂ − 12 ≥ 0          ξ₁ ≥ 0      (4)
ρ₃ = 2ξ₁ + ξ₂ − 6 ≥ 0            ξ₂ ≥ 0
θ = ξ₁ + ξ₂ = min

We add ξ₃ + 12 to each of the restrictions on the left-hand side, where ξ₃ is a new free variable, and we introduce a new restriction ξ₃ + 12 ≥ 0. We also replace the object function by θ + ω(ξ₃ + 12) = ξ₁ + ξ₂ + ω(ξ₃ + 12), where ω is a constant whose actual value will not need to be known. In this way we obtain the new programme

ρ₁* = ξ₁ + 4ξ₂ + ξ₃ + 4 ≥ 0       ξ₁ ≥ 0
ρ₂* = 2ξ₁ + 3ξ₂ + ξ₃ ≥ 0          ξ₂ ≥ 0
ρ₃* = 2ξ₁ + ξ₂ + ξ₃ + 6 ≥ 0       ξ₃ free      (5)
ρ₄* = ξ₃ + 12 ≥ 0
θ₁ = ξ₁ + ξ₂ + ω(ξ₃ + 12) = min

The programme (5) has the zero vector as an admissible solution (all βᵢ ≥ 0) and can therefore be solved by the earlier methods. From Theorem 1, we also know that, for sufficiently large ω, every optimal solution of (5) is also optimal for the object function ξ₃ + 12 with the same restrictions. (The constant terms in the object functions obviously do not affect the solution.) Now, however, the minimum of ξ₃ + 12 is zero, provided (4) has any admissible solutions at all. If this is the case, then the solution of (5) will automatically give ξ₃ + 12 = 0 and the corresponding values of ξ₁ and ξ₂ will be an optimal solution of (4).

Fig. 25. The programmes (4) and (5).

In order to carry out the calculation, it is not necessary to know the actual value of ω. We merely assume that ω is large enough to ensure that, for any constants α, β (> 0) which appear in the calculation, the numbers α + βω and α − βω are positive and negative respectively, irrespective of the sign of α. For simplicity, we will leave out the asterisks on the ρᵢ in the following tableaux.

1st Tableau
            ξ₁    ξ₂    ξ₃     τ
  ρ₁ =       1     4     1     4
  ρ₂ =       2     3     1*    0
  ρ₃ =       2     1     1     6
  ρ₄ =       0     0     1    12
  θ₁ =       1     1     ω   12ω

First we eliminate the free variable ξ₃.

2nd Tableau
            ξ₁      ξ₂      ρ₂    τ
  ρ₁ =      −1       1       1    4
  ρ₃ =       0      −2*      1    6
  ρ₄ =      −2      −3       1   12
  ξ₃ =      −2      −3       1    0
  θ₁ =    1−2ω    1−3ω       ω  12ω

3rd Tableau
            ξ₁        ρ₃          ρ₂         τ
  ρ₁ =      −1       −½          3/2         7
  ξ₂ =       0       −½           ½          3
  ρ₄ =      −2*      3/2         −½          3
  θ₁ =    1−2ω    (3ω−1)/2    (1−ω)/2    3(ω+1)

4th Tableau
            ρ₄         ρ₃     ρ₂      τ
  ρ₁ =       ½       −5/4    7/4    11/2
  ξ₂ =       0        −½      ½       3
  ξ₁ =      −½        3/4    −¼      3/2
  θ₁ =  (2ω−1)/2       ¼      ¼      9/2

The fourth tableau is optimal. From it, we have ξ₁ = 3/2 and ξ₂ = 3. Hence, from the second tableau, we have ξ₃ = −3 − 9 = −12 and therefore ξ₃ + 12 = 0. This means that ξ₁ = 3/2, ξ₂ = 3 is an optimal solution of (4) and the minimum value of θ is 9/2.

The programmes (4) and (5) are represented geometrically in Fig. 25. We note in particular that for (5) the planes θ₁ = constant become steeper as ω decreases. If ω is too small, then the optimum is taken at point B instead of point C which corresponds to the optimum of (4).

This example shows that the method applied here is also applicable in general. This can be described in terms of the following rules for the solution of the general programme

Σₖ₌₁ⁿ αᵢₖξₖ + βᵢ ≥ 0   (i = 1, …, m)
some of the ξₖ are free and some are ≥ 0        (6)
θ = Σₖ₌₁ⁿ γₖξₖ = min

1. We construct the new programme

Σₖ₌₁ⁿ αᵢₖξₖ + ξₙ₊₁ + δ + βᵢ ≥ 0   (i = 1, …, m)
ξₙ₊₁ + δ ≥ 0
conditions for ξ₁, …, ξₙ as in (6), and ξₙ₊₁ free        (7)
θ₁ = Σₖ₌₁ⁿ γₖξₖ + ω(ξₙ₊₁ + δ) = min

where δ is chosen so that δ + βᵢ ≥ 0 for all i = 1, …, m.

2. We solve the programme (7), assuming that ω is so large that, for any constants α and β (> 0) which appear in the calculation, the numbers α + βω and α − βω are positive and negative respectively, irrespective of the sign of α.

3. If, in an optimal solution of (7), ξₙ₊₁ + δ > 0, then (6) has no admissible solution. On the other hand, if ξₙ₊₁ + δ = 0, then an optimal solution of (7) also gives an optimal solution of (6).
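Rules 1–3 can be checked numerically on Example 1 of this section. The sketch below is our own: it fixes a concrete ω = 100 (large enough for this small example) and finds the optimum of (5) by brute-force enumeration of vertices rather than by the simplex method; `solve3` is a small Gaussian-elimination helper, not part of the text:

```python
from itertools import combinations

def solve3(M, r):
    """Gauss-Jordan solution of a 3x3 system; returns None if singular."""
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < 1e-10:
            return None
        A[col], A[piv] = A[piv], A[col]
        for i in range(3):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [u - f * v for u, v in zip(A[i], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

omega = 100.0                 # "sufficiently large" for this example
# constraints of (5) as (coefficients, constant): a.x + b >= 0
cons = [((1.0, 4.0, 1.0), 4.0), ((2.0, 3.0, 1.0), 0.0),
        ((2.0, 1.0, 1.0), 6.0), ((0.0, 0.0, 1.0), 12.0),
        ((1.0, 0.0, 0.0), 0.0), ((0.0, 1.0, 0.0), 0.0)]

def theta1(x):
    return x[0] + x[1] + omega * (x[2] + 12.0)

best = None
for trio in combinations(cons, 3):        # each vertex lies on 3 boundaries
    x = solve3([list(a) for a, _ in trio], [-b for _, b in trio])
    if x and all(sum(a * v for a, v in zip(a_, x)) + b_ >= -1e-7
                 for a_, b_ in cons):
        if best is None or theta1(x) < theta1(best):
            best = x
```

The optimum comes out at (3/2, 3, −12): the auxiliary term ξ₃ + 12 vanishes, so (4) is solvable and θ = ξ₁ + ξ₂ = 9/2, as found with the tableaux above.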

Exercise. Solve the linear programme

ξ₁ − ξ₂ − 2 ≥ 0              ξ₁ ≥ 0
ξ₁ + ξ₂ − 8 ≥ 0              ξ₂ ≥ 0
−2ξ₁ − ξ₂ + 20 ≥ 0
θ = 3ξ₁ + ξ₂ = min

by introducing a new variable.

Solution. ξ₁ = 5, ξ₂ = 3, θ = 18.

8.6
The Simplex Method and Duality

In this section we will show that the duality law of Linear Programming (Theorem 8.2;2) is contained in the duality law of the Exchange Method (see 3.3.2).

We consider the definite programme

ρ = Aξ + β ≥ 0
ξ ≥ 0                    (1)
θ = y′ξ = min

where we will assume that β ≥ 0. The dual programme, denoting the variables by η and σ, is

σ′ = −η′A + y′ ≥ 0
η ≥ 0                    (2)
τ = −η′β = max

(Note that here τ denotes one of the two object functions and is not to be set equal to 1.)

Suppose that the first and final tableaux for (1) are

            ξ₁ …  ξₙ    τ                           τ
  ρ₁ =                  β₁             … =         β*₁
   ⋮          A          ⋮     (3)      ⋮     A*     ⋮      (4)
  ρₘ =                  βₘ             … =         β*ₘ
  θ  =      y₁ …  yₙ    0              θ =  γ*₁ … γ*ₙ  δ*

where β* ≥ 0 and γ*′ ≥ 0.

Example 2.

σ₁ = η₁ + 4η₂ − 8 ≥ 0
σ₂ = 2η₁ + 3η₂ − 12 ≥ 0          η₁ ≥ 0
σ₃ = 2η₁ + η₂ − 6 ≥ 0            η₂ ≥ 0
θ = η₁ + η₂ = min,   i.e.,   τ = −η₁ − η₂ = max.

This example is identical with Example 8.5;1, which we have already solved by introducing a free variable. Since the coefficients of the function −τ are positive, we can also solve the programme directly by writing the tableaux in the vertical form.

1st Tableau
            σ₁ =   σ₂ =   σ₃ =   τ =
  −η₁       −1     −2     −2      1
  −η₂       −4*    −3     −1      1
   1        −8    −12     −6      0

2nd Tableau
            η₂ =   σ₂ =    σ₃ =    τ =
  −η₁        ¼    −5/4    −7/4*    3/4
  −σ₁       −¼    −3/4     −¼      1/4
   1         2     −6      −4      −2

3rd Tableau
            η₂ =    σ₂ =     η₁ =     τ =
  −σ₃       1/7    −5/7     −4/7     3/7
  −σ₁      −2/7    −4/7*     1/7     1/7
   1       10/7   −22/7     16/7   −26/7

4th Tableau
            η₂ =   σ₁ =    η₁ =    τ =
  −σ₃        ½     5/4     −¾      ¼
  −σ₂       −½    −7/4      ¼      ¼
   1         3    11/2     3/2   −9/2

From the fourth tableau, we have the optimal solution η₁ = 3/2, η₂ = 3. The maximum value of τ is −9/2 and hence the minimal value of −τ is +9/2, which agrees with the earlier result.

Example 3.

ρ₁ = ξ₁ + ξ₂ − 1 ≥ 0
ρ₂ = −ξ₂ + 1 ≥ 0
θ = 2ξ₁ + ξ₂ = min

In the horizontal interpretation, this example can only be solved by introducing a free variable ξ₃. However, since the coefficients in θ are positive, we choose to use the tableaux in vertical form, where we denote the variables by ξ, ρ instead of η, σ.

1st Tableau
            ρ₁ =   ρ₂ =   −θ =
  −ξ₁       −1      0      2
  −ξ₂       −1*     1      1
   1        −1      1      0

2nd Tableau
            ξ₂ =   ρ₂ =   −θ =
  −ξ₁        1     −1      1
  −ρ₁       −1      1      1
   1         1      0     −1

The optimal solution is ξ₁ = 0, ξ₂ = 1 and the minimum value of θ is 1.

Exercises

1. Formulate the dual programmes of the three programmes in Exercise 8.3;4 and solve them.

Solutions.

(a)  −η₁ + η₂ + 2η₃ − 2 ≥ 0          η₁ ≥ 0
     η₁ + η₂ − 3η₃ + 1 ≥ 0           η₂ ≥ 0,  η₃ ≥ 0
     τ = −4η₁ − 8η₂ − 6η₃ = max
     η₁ = 0, η₂ = 4/5, η₃ = 3/5;  τ = −10.

(b)  −2η₁ + 2η₂ − 1 ≥ 0              η₁ ≥ 0
     3η₁ − η₂ − 1 ≥ 0                η₂ ≥ 0
     3η₁ + η₃ − 3 ≥ 0                η₃ ≥ 0
     τ = −18η₁ − 10η₂ − 4η₃ = max
     η₁ = 3/4, η₂ = 5/4, η₃ = 3/4;  τ = −29.

(c)  −4η₁ + η₂ − η₃ + 1 ≥ 0          η₁ ≥ 0
     3η₁ + 3η₃ + η₄ − 1 ≥ 0          η₂ ≥ 0
     η₁ + η₄ + η₅ − 1 ≥ 0            η₃ ≥ 0
     2η₁ + 2η₅ − 1 ≥ 0               η₄ ≥ 0,  η₅ ≥ 0
     τ = −54η₁ − 18η₂ − 40η₃ − 30η₄ − 48η₅ = max
     η₁ = 1/6, η₂ = 0, η₃ = 0, η₄ = 1/2, η₅ = 1/3;  τ = −40.

The solutions could also be read off from the final tableau of Exercise 8.3;4 by using the vertical interpretation.

2. Solve the linear programme of the Exercise in 8.5, without introducing a new variable, by using the vertical interpretation.

Solution. The initial tableau is

            ρ₁ =   ρ₂ =   ρ₃ =   τ =
  −ξ₁       −1     −1      2      3
  −ξ₂        1     −1      1      1
   1        −2     −8     20      0

CHAPTER 9

TCHEBYCHEV APPROXIMATIONS

9.1 Tchebychev's Method of Approximation

Consider the following real system of linear equations

Σₖ₌₁ⁿ αᵢₖξₖ + βᵢ = 0   (i = 1, …, m),   i.e.,   Aξ + β = 0      (1)

and for the moment suppose that (1) is the expression of some physical law, i.e., that the physical quantities αᵢₖ, βᵢ, ξₖ are connected by the relationships given in (1). Further suppose that the coefficients αᵢₖ and βᵢ can be measured by experiment and that the quantities ξₖ have then to be calculated using the equations (1). Since experimental measurements are always subject to error, it is possible that the system (1) will be inconsistent and have no solution, even though physical considerations show that there should be a solution. The problem then is to solve the system 'as closely as possible' in a sense which must be more precisely defined.

For arbitrary real values ξ₁, …, ξₙ we will put

εᵢ = Σₖ₌₁ⁿ αᵢₖξₖ + βᵢ   (i = 1, …, m),   i.e.,   ε = Aξ + β

and refer to these quantities as residuals. If ξ is a solution of the system, then ε = 0. Consequently, when the system cannot be solved exactly, we will try to find values of the unknowns ξ₁, …, ξₙ such that the greatest of the absolute values of the residuals is as small as possible. The solution of this problem is known as Tchebychev's Method of Approximation.

We remark here that, apart from Tchebychev's method, there are several other methods of approximating to the solution, and in particular there is the method of least squares, which is due to Gauss and which will be met in 12.2. In fact the method of least squares is more suitable than Tchebychev's method for solving those problems in which the inconsistency of the equations (1) is due to statistical errors in the coefficients, as for example in the sort of physical problem considered at the beginning of this section.

(Cf. [20], p. 124.) Tchebychev's method is more useful for the approximation of functions (see Example 4).

Now let σ be the largest of the quantities |εᵢ|, which we have to minimize. It can also be characterized as the least upper bound of the absolute values |εᵢ|, so that |εᵢ| ≤ σ for all i = 1, …, m and equality holds for at least one value of i. Thus we have to find values of the unknowns ξₖ which make σ as small as possible. Since |εᵢ| ≤ σ may be written in the form εᵢ ≤ σ and −εᵢ ≤ σ, we obtain the following linear programme:

εᵢ = Σₖ₌₁ⁿ αᵢₖξₖ + βᵢ ≤ σ
                                    (i = 1, …, m)      (2)
−εᵢ = −Σₖ₌₁ⁿ αᵢₖξₖ − βᵢ ≤ σ
σ = min

which has the free variables ξ₁, …, ξₙ and the object function θ = σ, and clearly this could be solved directly using the techniques of Chapter 8. However, since the zero vector is not an admissible solution, it pays to reformulate the programme by dividing through by σ and introducing the following notation:

ξₖ* = ξₖ/σ,   ξ₀* = 1/σ.      (3)

Then (2) may be rewritten in the form

ρᵢ₁ = Σₖ₌₁ⁿ αᵢₖξₖ* + βᵢξ₀* + 1 ≥ 0
                                         (i = 1, …, m)      (4)
ρᵢ₂ = −Σₖ₌₁ⁿ αᵢₖξₖ* − βᵢξ₀* + 1 ≥ 0
θ = −ξ₀* = min

This is again a linear programme, with the free variables ξ₀*, ξ₁*, …, ξₙ*. It has ξ₀* = ξ₁* = … = ξₙ* = 0 as an admissible solution, and it is easy to see by what simple rules it is derived from (1). In particular, we note that

ρᵢ₁ + ρᵢ₂ = 2   for i = 1, …, m.      (5)

CH. 9: TCHEBYCHEV APPROXIMATIONS

The initial tableau for the simplex method is

f?

£3

1

B1 S

1 E

Pml =

Bin

1

P12 =

"‘31

1

P11 = S

6:

A

(6) —A

E Pm2 =

0 =

0

0

3

E

—3m

1

—l

0

During the calculation, it is helpful to keep the relations (5) in mind, because they must also be satisfied in each later tableau. This means, 1. If for some i, p“ and p52 are on the left—hand side, then the corresponding rows are identical except that they have opposite signs (apart

from the last coefficients whose sum is equal to 2). 2. If p“ is on the left-hand side and pi2 is in the top row, then the row corresponding to p“ consists entirely of zeros except that the coefficient

under piz is — 1 and the last coefficient is 2. By remembering these two rules, we can save ourselves a considerable amount of work in the calculation. The final tableau of the method leads to the solution of the approximation problem as follows. 1. The minimal value of the object function is 0: —§f,= — 1/0. 2. The values of the unknowns g, are given by 611: 0ft,

,0: 1,...,7L.

Example 1. We will first consider a trivial example, viz. the system

51 = 1

(7)

61:0

With only one unknown which is supposed to satisfy the incompatible equations (7 ). We see immediately that the Tchebychev method gives the solution 8:}. Both residuals then take the absolute value i, While, for 186

every other value of ξ₁, at least one of them takes a larger absolute value.

The linear programme corresponding to the system (7) is

ρ₁₁ = ξ₁* − ξ₀* + 1 ≥ 0
ρ₁₂ = −ξ₁* + ξ₀* + 1 ≥ 0
ρ₂₁ = ξ₁* + 1 ≥ 0
ρ₂₂ = −ξ₁* + 1 ≥ 0
θ = −ξ₀* = min

1st Tableau
            ξ₁*    ξ₀*    1
  ρ₁₁ =      1     −1*    1
  ρ₁₂ =     −1      1     1
  ρ₂₁ =      1      0     1
  ρ₂₂ =     −1      0     1
  θ   =      0     −1     0

2nd Tableau (elimination of ξ₀*)
            ξ₁*    ρ₁₁    1
  ξ₀* =      1     −1     1
  ρ₁₂ =      0     −1     2
  ρ₂₁ =      1      0     1
  ρ₂₂ =     −1*     0     1
  θ   =     −1      1    −1

3rd Tableau (elimination of ξ₁*)
            ρ₂₂    ρ₁₁    1
  ξ₁* =     −1      0     1
  ρ₁₂ =      0     −1     2
  ρ₂₁ =     −1      0     2
  θ   =      1      1    −2

This tableau is already optimal and gives ξ₁* = 1 and ρ₁₁ = 0, and from the 2nd tableau it follows that ξ₀* = 2. The minimum of θ is −2 and therefore σmin = ½ and hence ξ₁ = ½. The calculation confirms the obvious result.

Example 2.

2ξ₁ − 4 = 0
ξ₂ − 1 = 0          (8)
ξ₁ + ξ₂ − 2 = 0

Fig. 26.

For any point (ξ₁, ξ₂) of the plane, |ξ₁ + ξ₂ − 2| is the distance of the point from the line ξ₁ + ξ₂ − 2 = 0 multiplied by √2, and similarly |2ξ₁ − 4| and |ξ₂ − 1| are the distances from the lines 2ξ₁ − 4 = 0 and ξ₂ − 1 = 0 multiplied by 2 and 1 respectively. The problem now is to find a point (ξ₁, ξ₂) which will make the largest of these distances (multiplied by the appropriate factor) as small as possible. (See Fig. 26, where the solution is denoted by ×.) The corresponding linear programme is

ρ₁₁ = 2ξ₁* − 4ξ₀* + 1 ≥ 0
ρ₁₂ = −2ξ₁* + 4ξ₀* + 1 ≥ 0
ρ₂₁ = ξ₂* − ξ₀* + 1 ≥ 0
ρ₂₂ = −ξ₂* + ξ₀* + 1 ≥ 0          (9)
ρ₃₁ = ξ₁* + ξ₂* − 2ξ₀* + 1 ≥ 0
ρ₃₂ = −ξ₁* − ξ₂* + 2ξ₀* + 1 ≥ 0
θ = −ξ₀* = min

1st Tableau
            ξ₁*    ξ₂*    ξ₀*    1
  ρ₁₁ =      2      0     −4*    1
  ρ₁₂ =     −2      0      4     1
  ρ₂₁ =      0      1     −1     1
  ρ₂₂ =      0     −1      1     1
  ρ₃₁ =      1      1     −2     1
  ρ₃₂ =     −1     −1      2     1
  θ   =      0      0     −1     0

2nd Tableau (elimination of ξ₀*)
            ξ₁*    ξ₂*    ρ₁₁    1
  ξ₀* =      ½      0     −¼     ¼
  ρ₁₂ =      0      0     −1     2
  ρ₂₁ =     −½*     1      ¼     ¾
  ρ₂₂ =      ½     −1     −¼    5/4
  ρ₃₁ =      0      1      ½     ½
  ρ₃₂ =      0     −1     −½    3/2
  θ   =     −½      0      ¼    −¼

3rd Tableau (elimination of ξ₁*)
            ρ₂₁    ξ₂*    ρ₁₁    1
  ξ₁* =     −2      2      ½    3/2
  ρ₁₂ =      0      0     −1     2
  ρ₂₂ =     −1      0      0     2
  ρ₃₁ =      0      1      ½     ½
  ρ₃₂ =      0     −1*    −½    3/2
  θ   =      1     −1      0    −1

4th Tableau (elimination of ξ₂*)
            ρ₂₁    ρ₃₂    ρ₁₁    1
  ξ₂* =      0     −1     −½    3/2
  ρ₁₂ =      0      0     −1     2
  ρ₂₂ =     −1      0      0     2
  ρ₃₁ =      0     −1      0     2
  θ   =      1      1      ½   −5/2

Since all the free variables are now eliminated, this is the point at which the simplex method should begin. However, since the first part of the last row consists entirely of positive numbers, we already have an optimal tableau. The optimal solution is ξ₂* = 3/2 and

ρ₁₁ = ρ₂₁ = ρ₃₂ = 0,    ρ₁₂ = ρ₂₂ = ρ₃₁ = 2.

From the third tableau it follows that ξ₁* = 9/2. The minimal value of θ is −5/2, hence σmin = 2/5 and finally ξ₁ = 9/5, ξ₂ = 3/5. Substituting these values in (8), we obtain the residuals ε₁ = ε₂ = −2/5 and ε₃ = 2/5. The fact that all three absolute values |εᵢ| are equal to 2/5 is the result of a more general rule, viz. the rank of the system of inequalities in programme (4) is equal to the rank of the matrix (A, β), and in our example this is equal to 3. Hence, in a normal basic solution, three of the inequalities (9) must be satisfied with the equality sign.
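Example 2 can also be checked independently of the tableau calculation. With m = n + 1 equations in general position, the optimal residuals all attain ±σ, so the solution can be found by solving a 3×3 linear system for every sign pattern and keeping the smallest positive σ. The code below is our own brute-force sketch under that assumption (`solve3` is a small Gaussian-elimination helper, not from the text):

```python
from itertools import product

def solve3(M, r):
    """Gauss-Jordan solution of a 3x3 system; returns None if singular."""
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < 1e-10:
            return None
        A[col], A[piv] = A[piv], A[col]
        for i in range(3):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [u - f * v for u, v in zip(A[i], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

# system (8): rows (a_i, beta_i) so that eps_i = a_i . xi + beta_i
rows = [((2.0, 0.0), -4.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), -2.0)]

best = None
for signs in product((1.0, -1.0), repeat=3):
    # unknowns (xi1, xi2, sigma):  a_i . xi - s_i * sigma = -beta_i
    M = [[a[0], a[1], -s] for (a, _), s in zip(rows, signs)]
    r = [-b for _, b in rows]
    sol = solve3(M, r)
    if sol and sol[2] > 1e-12 and (best is None or sol[2] < best[2]):
        best = sol
```

The result is (ξ₁, ξ₂, σ) = (9/5, 3/5, 2/5), agreeing with the simplex calculation above.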

Example 3 (see Fig. 27).

ξ₁ + ξ₂ − 8 = 0
ξ₁ − ξ₂ − 2 = 0          (10)
ξ₁ + 2ξ₂ − 10 = 0

1st Tableau
            ξ₁*    ξ₂*    ξ₀*    1
  ρ₁₁ =      1      1     −8     1
  ρ₁₂ =     −1     −1      8     1
  ρ₂₁ =      1     −1     −2     1
  ρ₂₂ =     −1      1      2     1
  ρ₃₁ =      1      2    −10     1
  ρ₃₂ =     −1     −2     10     1
  θ   =      0      0     −1     0

Fig. 27.

We find by elimination that

ξ₀* = −⅛ρ₂₂ + ⅜ξ₂* − ⅛ρ₃₁ + ¼,
ξ₁* = −5/4·ρ₂₂ + 7/4·ξ₂* − ¼ρ₃₁ + 3/2.

The 4th Tableau then becomes

4th Tableau
            ρ₂₂    ρ₁₁    ρ₃₁    1
  ξ₂* =     −1     −4      3     2
  ρ₁₂ =      0     −1      0     2
  ρ₂₁ =     −1      0      0     2
  ρ₃₂ =      0      0     −1*    2
  θ   =      ½     3/2    −1    −1

At this point, we must start to use the simplex method.

5th Tableau
            ρ₂₂    ρ₁₁    ρ₃₂    1
  ρ₁₂ =      0     −1      0     2
  ρ₂₁ =     −1      0      0     2
  ρ₃₁ =      0      0     −1     2
  θ   =      ½     3/2     1    −3

This tableau is optimal and we find

ρ₁₂ = ρ₂₁ = ρ₃₁ = 2,    ρ₁₁ = ρ₂₂ = ρ₃₂ = 0

and hence ξ₂* = 8 and ξ₁* = 15. The minimal value of θ is −3, hence σmin = ⅓ and ξ₁ = 5, ξ₂ = 8/3. Again, all three |εᵢ| are equal to the same value ⅓, as they should be.

Example 4. We start with the function x(τ) = cos πτ/2 of a real variable τ and look for a polynomial z(τ) = α₀ + α₁τ + α₂τ² of degree at most 2 which most closely approximates to x(τ) at the points τ = 0, ±1, ±2, i.e., for which the largest of the five absolute values |z(τ) − x(τ)| (τ = 0, ±1, ±2) is as small as possible. Since x(τ) is an even function, z(τ) must also be an even function and hence α₁ = 0. Consequently, it is sufficient to consider only the values τ = 0, 1, 2.

Now

z(0) − x(0) = α₀ − 1
z(1) − x(1) = α₀ + α₂
z(2) − x(2) = α₀ + 4α₂ + 1.

Thus we have an approximation problem to solve for α₀ and α₂ using Tchebychev's method.

1st Tableau
            α₀*    α₂*    ξ₀*    1
  ρ₁₁ =      1*     0     −1     1
  ρ₁₂ =     −1      0      1     1
  ρ₂₁ =      1      1      0     1
  ρ₂₂ =     −1     −1      0     1
  ρ₃₁ =      1      4      1     1
  ρ₃₂ =     −1     −4     −1     1
  θ   =      0      0     −1     0

The solution is α₀ = ¾, α₂ = −½, σ = ¼. The required polynomial is therefore z(τ) = ¼(3 − 2τ²). It differs from x(τ) by the absolute value ¼ at each of the five points τ = 0, ±1, ±2. Every other polynomial of degree at most 2 differs from x(τ) by more than ¼ at at least one of the five points.

Exercises

Use Tchebychev's method of approximation to solve the following systems of equations.

(a)  ξ₁ = 5
     ξ₂ = 7
     ξ₁ + ξ₂ = 0
     ξ₁ − ξ₂ = 1

(b)  3ξ₁ + 3ξ₂ + 3ξ₃ = 90
     ξ₁ + ξ₂ − ξ₃ = 20
     ξ₁ − ξ₂ + ξ₃ = 20
     −ξ₁ + ξ₂ + ξ₃ = 20

Solutions. (a) ξ₁ = 1, ξ₂ = 3.  (b) ξ₁ = ξ₂ = ξ₃ = 11.
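The polynomial quoted in Example 4 can be checked by the same equioscillation reasoning as in Example 2 (all three residuals attain ±σ at the optimum). The sketch below is our own brute force over sign patterns, with a small Gaussian-elimination helper:

```python
from itertools import product

def solve3(M, r):
    """Gauss-Jordan solution of a 3x3 system; returns None if singular."""
    A = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < 1e-10:
            return None
        A[col], A[piv] = A[piv], A[col]
        for i in range(3):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [u - f * v for u, v in zip(A[i], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

# residuals of Example 4: (coefficients of (alpha0, alpha2), constant)
rows = [((1.0, 0.0), -1.0), ((1.0, 1.0), 0.0), ((1.0, 4.0), 1.0)]

best = None
for signs in product((1.0, -1.0), repeat=3):
    # unknowns (alpha0, alpha2, sigma):  a_i . alpha - s_i * sigma = -beta_i
    M = [[a[0], a[1], -s] for (a, _), s in zip(rows, signs)]
    r = [-b for _, b in rows]
    sol = solve3(M, r)
    if sol and sol[2] > 1e-12 and (best is None or sol[2] < best[2]):
        best = sol
```

The smallest positive σ is ¼, attained at α₀ = ¾, α₂ = −½, i.e., at the polynomial z(τ) = ¼(3 − 2τ²) of Example 4.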

9.2
The Proof of Two Results used Earlier

9.2.1 Theorem 6.3;2

In the proof of Theorem 6.3;2, we assumed that

θ = min max …

Therefore there exists a k₀ such that

f(x₀, yₖ₀) > μ.

In this case yₖ₀ is inessential because, if y is a mixed strategy of Min in which yₖ₀ has a strictly positive weight ηₖ₀ > 0, then it is easy to see that f(x₀, y) > μ. Therefore y is not optimal.

The Practical Significance of Mixed Strategies

In the applications of Game Theory, mixed strategies are usually used in the following stochastic way. (The concepts of Probability Theory which are used here may be found in [29].)

CH. 10: GAME THEORY

Let x = (ξ₁, …, ξₘ) and y = (η₁, …, ηₙ) be mixed strategies for Max and Min. We now think of the subscripts i, k of the pure strategies as being independent random variables which take the values i = 1, …, m and k = 1, …, n with the probabilities ξ₁, …, ξₘ and η₁, …, ηₙ. Then the pay-off is also a random variable, which takes the value of the matrix element αᵢₖ with probability ξᵢηₖ. The expected value of the pay-off is therefore

g(x, y) = Σᵢ,ₖ αᵢₖξᵢηₖ = ξ′Aη = f(x, y),

i.e., it is equal to the value of the pay-off for the mixed strategies x and y.

We now imagine that not just one but N moves are played. At each move, Max and Min each independently choose a pure strategy at random, subject to the probabilities ξ₁, …, ξₘ and η₁, …, ηₙ given by the mixed strategies x and y. At each move the pay-off has the value of a matrix element αᵢₖ (with a certain probability), and we can calculate the arithmetic mean of these values, denoting it by gN(x, y). By the laws of Probability Theory, these means converge with probability 1 to the expected value as N → ∞, i.e.,

lim gN(x, y) = g(x, y) = f(x, y)   as N → ∞.

Example 3. In the game already considered (Example 2),

      ( 0  1  1  2 )
  A = ( 1  0  2  3 )
      ( 2  1  0  1 )

Max has 3 and Min has 4 pure strategies available. Both players can now achieve their optimal mixed strategies,

x = (1/2, 1/6, 1/3)

and

y = (1/6, 1/2, 1/3, 0),

by the following stochastic method. At each move, each player throws an ordinary die. Max chooses

i = 1 when he throws 1, 2 or 3
i = 2 when he throws 4
i = 3 when he throws 5 or 6,

and Min chooses

k = 1 when he throws 1
k = 2 when he throws 2, 3 or 4
k = 3 when he throws 5 or 6,

and he never chooses k = 4.

Thus at each move the fall of the die determines a pure strategy for each player; this pair of pure strategies determines an element a_{ik} of A as the value of the pay-off.

10.3: THE EVALUATION OF GAMES

If the arithmetic mean of a sufficiently large number of these pay-offs is now calculated, then it approaches arbitrarily close to the value of the game, i.e., to μ = 5/6.
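The convergence of these means is easy to try out. The following sketch (a Python simulation; the matrix and the optimal strategies are those of Example 3, and the exact value 5/6 is computed as ξ'Aη) plays a large number of random moves and compares the running mean with the expected pay-off.

```python
import random

# Pay-off matrix of Example 3 (rows: pure strategies of Max,
# columns: pure strategies of Min).
A = [[0, 1, 1, 2],
     [1, 0, 2, 3],
     [2, 1, 0, 1]]

x = [1/2, 1/6, 1/3]         # optimal mixed strategy of Max
y = [1/6, 1/2, 1/3, 0]      # optimal mixed strategy of Min

# Exact expected pay-off g(x, y) = xi' A eta; equals 5/6 here.
g = sum(x[i] * A[i][k] * y[k] for i in range(3) for k in range(4))

def play(n, rng=random.Random(0)):
    """Arithmetic mean g_n(x, y) of n independently played moves."""
    total = 0
    for _ in range(n):
        i = rng.choices(range(3), weights=x)[0]   # Max's die throw
        k = rng.choices(range(4), weights=y)[0]   # Min's die throw
        total += A[i][k]
    return total / n

print(g)             # the value of the game, 5/6
print(play(100000))  # close to 5/6 for large n
```

With a fixed seed the simulation is reproducible; the deviation of the mean from 5/6 shrinks like 1/√n, which is exactly the caveat discussed in the next paragraph.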

The method of applying mixed strategies which we have described here must be used in practice with great care. As long as only a finite number of moves are played, there is certainly no guarantee that on average the players will reach the value of the game. The effective mean value will only approach within a given small distance ε of the value μ with a given probability when the number of moves is sufficiently large. However, if the number of moves is given, then the mean value will differ from μ by given amounts with given probabilities.

For example, if just one move is played in the game above with the given optimal strategies, then Max has the probability 10/36 that he will only achieve the value 0. The probability that the pay-off for Min is either 1 or 2 (i.e., worse than μ) is 26/36. If two moves are played, then Max has the probability 540/1296 of achieving on average only 0 or 1/2, while the probability that Min will have an average pay-off of at least 1 is 756/1296. In order to investigate this situation, it is necessary to make new approaches to the theory which are outside the scope of this book (see for example [26], chap. 13).

Problems

1. The pure strategy x_{i0} for Max is said to be strictly dominated if there is a mixed strategy x = (ξ1, …, ξm) with ξ_{i0} = 0 such that

Σ_{i=1}^{m} ξi a_{ik} > a_{i0 k}   (k = 1, …, n).   (1)

Show that every strictly dominated pure strategy is inessential.

2. If the '>' sign in (1) (Problem 1) is replaced by '≥', then the strategy x_{i0} is said to be dominated. Is every dominated pure strategy inessential?

3. Prove the assertion concerning saddle points which was stated at the end of 10.1.

10.3 The Evaluation of Games by the Simplex Method

If we add the same constant λ to all the elements of the matrix A, then the pay-off function f(x, y) of the corresponding game becomes

f*(x, y) = ξ'(A + λD)η = ξ'Aη + λξ'Dη = f(x, y) + λ,



where D is the m × n matrix all of whose elements are equal to 1; i.e., f(x, y) is simply increased by the constant λ. Hence the value of the game is also increased by λ, and the optimal strategies are the same in both cases. Thus, if we know the value and the optimal strategies for the game A, then we also know them for the game A + λD, and conversely. By choosing λ sufficiently large, we can ensure that the value μ + λ of A + λD is strictly positive. In particular this will be the case when all the elements of A + λD are strictly positive, because the pay-off function will then take only strictly positive values and hence the value of the game will also be strictly positive. Thus there is no essential loss of generality for the numerical evaluation of a game if we assume that its value is strictly positive.

We will therefore consider a game with the matrix A and the value μ > 0. An optimal strategy y = (η1, …, ηn) for Min satisfies the inequalities

Aη ≤ u,   (1)

where u is the m × 1 matrix all of whose elements are equal to μ. Indeed, μ is the least real number for which there exists η such that both (1) and

Σ_{k=1}^{n} ηk = 1,   ηk ≥ 0   (k = 1, …, n)   (2)

are satisfied. If there were a smaller number μ1 < μ with this property, then

f(x, y) = ξ'Aη ≤ ξ'u1 = μ1   (u1 having all elements equal to μ1)

for every mixed strategy x of Max, so that the value of the game would be less than or equal to μ1.

The relations (1) and (2), together with μ = min, constitute a linear programme, the solution of which gives the optimal strategies for Min and the value μ. As in 9.1, it is useful to modify the programme by dividing by μ. Putting η*_k := ηk/μ, we have

v := −Aη* + 1 ≥ 0,   η* ≥ 0,   Σ_{k=1}^{n} η*_k = 1/μ = min.   (3)

The initial tableau of the simplex method for the definite programme (3) is


           η*1  …  η*n     1
   v1 =                    1
    ⋮          −A          ⋮
   vm =                    1
  1/μ =     1   …   1      0        (4)
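Before the tableau calculus is carried further, the relations above can be checked on the game of Example 3 in 10.2. The following sketch (Python with exact rational arithmetic; the strategies and the value μ = 5/6 are taken from that example) verifies (1), the corresponding inequalities for Max, and the normalized solution η* = η/μ of (3).

```python
from fractions import Fraction as F

A = [[0, 1, 1, 2],
     [1, 0, 2, 3],
     [2, 1, 0, 1]]
x = [F(1, 2), F(1, 6), F(1, 3)]         # optimal for Max
y = [F(1, 6), F(1, 2), F(1, 3), F(0)]   # optimal for Min
mu = F(5, 6)                            # value of the game

# (1): every component of A eta is at most mu.
Ay = [sum(A[i][k] * y[k] for k in range(4)) for i in range(3)]
assert all(v <= mu for v in Ay)

# The dual inequalities for Max: every component of x'A is at least mu.
xA = [sum(x[i] * A[i][k] for i in range(3)) for k in range(4)]
assert all(v >= mu for v in xA)

# Programme (3): eta* = eta/mu satisfies -A eta* + 1 >= 0
# and sum of the eta*_k equals 1/mu.
eta_star = [v / mu for v in y]
assert all(1 - sum(A[i][k] * eta_star[k] for k in range(4)) >= 0
           for i in range(3))
assert sum(eta_star) == 1 / mu
print("optimality certificate verified; 1/mu =", 1 / mu)
```

The exact fractions avoid any rounding question: the certificate either holds identically or it does not.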

Suppose that the final tableau is

           …           1
  η* =        A*       β
 1/μ =        γ        δ

where βi ≥ 0 (i = 1, …, m), γk > 0 (k = 1, …, n) and δ > 0

and

  | α11  α12 |
  | α12  α22 |  =  α11 α22 − α12² > 0.

Example 4. Let

q(x) = ξ1² + 2ξ2² + ξ3² + 4ξ1ξ2 + 8ξ1ξ3 + 2ξ2ξ3 = ξ'Aξ,

where

        1  2  4
A  =    2  2  1
        4  1  1

11.1: QUADRATIC FORMS ON REAL SPACES

1st Tableau

           ξ1   ξ2   ξ3
  η1 =     1*    2    4
  η2 =      2    2    1
  η3 =      4    1    1

2nd Tableau

            ξ2    ξ3
  η'2 =    −2*   −7
  η'3 =    −7   −15

Note that η'2 and η'3 are not identical with the original η2 and η3.

3rd Tableau

             ξ3
  η''3 =   19/2

In the calculations it is possible to make use of the fact that all the tableaux are symmetric. For instance, in the 2nd tableau only one of the coefficients −7 needs to be calculated; the other follows by symmetry. Thus q(x) is put into the reduced form

q(x) = ζ1² − ½ζ2² + (2/19)ζ3²   (8)

(8)

by the transformation

ζ1 = ξ1 + 2ξ2 + 4ξ3
ζ2 =     −2ξ2 − 7ξ3        (9)
ζ3 =          (19/2)ξ3.

The coefficients of the ζk² are the reciprocals of the pivots. From (8) it follows that q is not positive definite.

The equations (9) are particularly easy to solve for the ξk because the matrix of coefficients is triangular (i.e., all elements below the main diagonal are equal to zero). We first solve the 3rd equation for ξ3, then the 2nd for ξ2, and finally the 1st for ξ1 to obtain

ξ1 = ζ1 + ζ2 + (6/19)ζ3
ξ2 =    −½ζ2 − (7/19)ζ3       (10)
ξ3 =            (2/19)ζ3.

We can easily verify (8) by substituting (10) in the original representation of q(x).
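This verification can be delegated to a short computation. The sketch below (Python with exact fractions; T denotes the matrix of the transformation (10)) substitutes (10) into the original representation of q(x) for a few sample vectors and recovers the reduced form (8).

```python
from fractions import Fraction as F

A = [[1, 2, 4],
     [2, 2, 1],
     [4, 1, 1]]          # matrix of q(x) in Example 4

# Transformation (10): xi = T zeta, T upper triangular.
T = [[F(1), F(1), F(6, 19)],
     [F(0), F(-1, 2), F(-7, 19)],
     [F(0), F(0), F(2, 19)]]

def q(xi):
    # original representation q(x) = xi' A xi
    return sum(A[i][k] * xi[i] * xi[k] for i in range(3) for k in range(3))

def reduced(z):
    # reduced form (8): zeta1^2 - (1/2) zeta2^2 + (2/19) zeta3^2
    return z[0]**2 - F(1, 2) * z[1]**2 + F(2, 19) * z[2]**2

for z in [(F(1), F(0), F(0)), (F(1), F(2), F(3)), (F(-2), F(5), F(7))]:
    xi = [sum(T[i][j] * z[j] for j in range(3)) for i in range(3)]
    assert q(xi) == reduced(z)
print("q(x) agrees with the reduced form (8)")
```

Since both sides are quadratic, agreement on enough independent sample vectors is already convincing; the identity itself holds for all ζ.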

CH. 11: FORMS OF THE SECOND DEGREE

The equations (10) could also be found by doing the reduction with the full tableaux (see Example 12.1;4).

It is also easy to find the basis vectors f1, f2, f3 to which the components ζ1, ζ2, ζ3 are referred. For, if we write (9) in the form ζ = Sξ, then f = S*e = (S⁻¹)'e, where {e1, e2, e3} is the original basis (Theorem 5.4;2). Now S⁻¹ can be found from (10) and we have

f1 = e1
f2 = e1 − ½e2
f3 = (6/19)e1 − (7/19)e2 + (2/19)e3.

The method can obviously be applied in the way just shown (using the diagonal elements in their natural order as pivots) whenever Δ1, …, Δn are not zero, and in particular for positive definite forms. The matrix S in (9) is then always a triangular matrix. Since the matrix of the reduced form is diagonal (i.e., all elements off the main diagonal are zero) and in view of the remark after the proof of Theorem 1, we have the following result.

Theorem 5. If, for a real symmetric matrix A, the determinants Δ1, …, Δn are not zero, then there exists a non-singular real triangular matrix S such that A* = S'AS is diagonal. (See Problem 2.)

11.1.4 The Inertia Theorem

If the reduced quadratic form q(x) = Σ_{k=1}^{n} λk ζk² is transformed by substituting

ζ*k = √|λk| ζk  if λk ≠ 0,   ζ*k = ζk  if λk = 0,

then q(x) becomes

q(x) = Σ_{k=1}^{n} ρk ζ*k²   (11)

where

ρk = +1, −1 or 0   (k = 1, …, n).

Of course, without carrying out the reduction, it is not possible to say which ρk are equal to +1, −1 or 0. However, we do have the following Inertia Theorem, which is due to Sylvester.

Theorem 6. If the quadratic form q(x) is represented in the form (11), then the numbers of positive and negative terms are uniquely determined by q(x). In particular, these numbers are independent of the method of reduction and of the transformation of the variables used to reach the form (11).


Proof. If the theorem were false, then there would be a quadratic form with two representations

q(x) = ξ1² + … + ξp² − ξ_{p+1}² − … − ξr² = η1² + … + ηq² − η_{q+1}² − … − ηs²,

where p > q. It could happen that no negative terms appear in either or both of these representations. On the other hand, we can assume that some positive terms actually appear in the first representation. Suppose that the components ξk refer to the basis {e1, …, en} and the ηk refer to the basis {f1, …, fn}. Let L1 = L(e1, …, ep) and L2 = L(f_{q+1}, …, fn). By Theorem 3.2;8 there is an x0 ∈ L1 ∩ L2, x0 ≠ 0. For this vector x0, ξ_{p+1} = … = ξn = η1 = … = ηq = 0, while not all of ξ1, …, ξp are zero. From the first representation of q it follows that q(x0) > 0, and from the second that q(x0) ≤ 0. Thus we have reached a contradiction.

Exercise

Reduce the following two quadratic forms to canonical form and state the corresponding transformation of the variables. Are these forms positive definite?

(a) q(x) = ξ1² + 3ξ2² + 5ξ3² − 2ξ1ξ2 + 4ξ1ξ3 − 6ξ2ξ3
(b) q(x) = 2ξ1² + 3ξ2² + 8ξ3² + 2ξ1ξ2 − 8ξ1ξ3 + 6ξ2ξ3.

Solution. (a) q(x) = ζ1² + ½ζ2² + 2ζ3², where

ζ1 = ξ1 − ξ2 + 2ξ3
ζ2 =     2ξ2 − ξ3
ζ3 =          ½ξ3;

q(x) is positive definite.

(b) q(x) = ½ζ1² + (2/5)ζ2² − (1/10)ζ3², where

ζ1 = 2ξ1 + ξ2 − 4ξ3
ζ2 =  (5/2)ξ2 + 5ξ3
ζ3 =        −10ξ3;

q(x) is not positive definite.

Problems

1. Let A and S be n × n matrices. Show that, if A is symmetric, then so is S'AS. (Cf. Theorem 1.)

2. Show that Theorem 5 is not true for the symmetric matrix

A = | 0  1 |
    | 1  0 |.


3. Prove that, if A is a real n × n matrix and det A ≠ 0, then the quadratic form q(x) = ξ'AA'ξ is positive definite.

4. Suppose that, for the real n × n matrix A, the determinants Δ1, …, Δn (cf. Theorem 4) are all strictly positive. Prove that all the diagonal elements a_kk are also strictly positive.

5. Suppose that the quadratic form q(x) has the reduced form (6) with respect to a given basis. What is the polar form of q(x) with respect to the same basis?

6. Prove that, if A is a real symmetric matrix, then there is a real number λ such that the quadratic form ξ'(A + λI)ξ is positive definite.

7. If A is a real symmetric matrix, show that ξ'Aξ is positive definite if and only if ξ'A⁻¹ξ is positive definite.

8. Prove that the symmetric matrix A of the quadratic form q(x) = ξ'Aξ with respect to a given basis is uniquely determined by q.

9. Let f(x, y) = ξ'Aη be a bilinear form on the vector space E. Prove that f(x, x) = 0 for all x ∈ E if and only if f(x, y) = −f(y, x) for all x, y ∈ E, i.e., if and only if A = −A'.

11.2 Hermitian Forms on Complex Vector Spaces

Part of the discussion in 11.1 can be carried over directly to the case of complex vector spaces. However, we meet with difficulties, for example, when we come to the definition of positive definite quadratic forms. Apart from this, it is preferable on formal grounds to proceed somewhat differently in the complex case, as will be described in the present section. In this, we will not repeat proofs which are simple generalizations of those in 11.1. If α is a complex number, we will denote the complex conjugate of α by ᾱ.

11.2.1

Quasi-bilinear Forms

Definition 1. A 'quasi-bilinear form' on a complex vector space E is a complex-valued function f(x, y) of two variables x, y ∈ E which is linear in x and which depends on y according to the rules

f(x, y1 + y2) = f(x, y1) + f(x, y2)   and   f(x, αy) = ᾱ f(x, y)

for all x, y, y1, y2 ∈ E and all scalars α.

Example 1. We use the same notation as in Example 11.1;1, except that now x(σ), y(τ) and k(σ, τ) are complex-valued functions of the real variables σ, τ. Then

f(x, y) = ∫_{−1}^{+1} ∫_{−1}^{+1} k(σ, τ) x(σ) ȳ(τ) dσ dτ

is a quasi-bilinear form.

11.2: HERMITIAN FORMS ON COMPLEX SPACES

If E is finite-dimensional, the representation 11.1;(1) corresponds to the following representation of the quasi-bilinear form f(x, y):

f(x, y) = Σ_{i,k} α_{ik} ξi η̄k = ξ'Aη̄,   where α_{ik} = f(ei, ek),   (1)

and where η̄ means the matrix whose elements are the complex conjugates of the elements of η. For a given basis, f and A again determine each other uniquely. Under a change of basis ξ = Sξ*, A goes into A* = S'AS̄.

Definition 2. The quasi-bilinear form is said to be 'Hermitian' if

f(y, x) = f̄(x, y).

(We write f̄(x, y) for the complex conjugate of f(x, y).) We note that the property f(x, αy) = ᾱf(x, y) of Definition 1 is a consequence of the linearity with respect to x and the condition f(y, x) = f̄(x, y):

f(x, αy) = f̄(αy, x) = ᾱ f̄(y, x) = ᾱ f(x, y).

The quasi-bilinear form in Example 1 is Hermitian if and only if k(τ, σ) = k̄(σ, τ).

Theorem 1. If the complex vector space E is finite-dimensional, then f(x, y) = ξ'Aη̄ is Hermitian if and only if A' = Ā. A matrix A with this property is also said to be Hermitian.

The simple proof of this is the content of Problem 5.

11.2.2

Quadratic and Hermitian Forms

Definition 3. A complex-valued function q(x) on a complex vector space E is said to be a 'quadratic form' if there is a quasi-bilinear form f(x, y) (the 'polar form' of q) such that q(x) = f(x, x).

Theorem 2. A quadratic form and its polar form on a complex vector space determine each other uniquely.

Proof. f(x, y) = ¼[q(x + y) + i q(x + iy) − q(x − y) − i q(x − iy)].

Definition 4. A quadratic form on a complex vector space is said to be 'Hermitian' if its polar form is Hermitian. We then speak briefly of a 'Hermitian form'. Thus, in view of Example 1, a Hermitian form is given by

q(x) = ∫_{−1}^{+1} ∫_{−1}^{+1} k(σ, τ) x(σ) x̄(τ) dσ dτ

whenever k(σ, τ) = k̄(τ, σ).
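The formula in the proof of Theorem 2 is a polarization identity for quasi-bilinear forms, and it can be checked numerically. The sketch below (plain Python; the matrix A and the test vectors are random and serve only as an illustration) builds f(x, y) = ξ'Aη̄ and recovers it from q(x) = f(x, x).

```python
import random

random.seed(1)
n = 3
A = [[complex(random.uniform(-1, 1), random.uniform(-1, 1))
      for _ in range(n)] for _ in range(n)]

def f(x, y):
    # quasi-bilinear: linear in x, conjugate-linear in y
    return sum(A[i][k] * x[i] * y[k].conjugate()
               for i in range(n) for k in range(n))

def q(x):
    return f(x, x)

def vec():
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(n)]

def comb(u, v, c=1):
    # the vector u + c*v
    return [a + c * b for a, b in zip(u, v)]

x, y = vec(), vec()
# Polarization formula from the proof of Theorem 2.
recovered = (q(comb(x, y)) + 1j * q(comb(x, y, 1j))
             - q(comb(x, y, -1)) - 1j * q(comb(x, y, -1j))) / 4
assert abs(recovered - f(x, y)) < 1e-12
print("polar form recovered from the quadratic form")
```

Note that A here is not Hermitian; the identity of Theorem 2 holds for every quasi-bilinear form, which is exactly what the uniqueness statement requires.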



If E is finite-dimensional, then Hermitian forms may be written in the form

q(x) = ξ'Aξ̄   (2)

where A is Hermitian. Conversely, (2) always gives a Hermitian form whenever A is Hermitian. For a given basis, q and A determine each other uniquely.

Theorem 3. A quadratic form q(x) on a complex vector space is Hermitian if and only if it takes only real values.

Proof. 1. If q is Hermitian, then q(x) = f(x, x) = f̄(x, x) = q̄(x).

2. If q takes only real values, then in particular q(x ± y) and q(x ± iy) are real. Now if, in the proof of Theorem 2, we interchange x and y, then q(x + y) and q(x − y) do not change while q(x + iy) and q(x − iy) are interchanged. Therefore

f(y, x) = f̄(x, y).

11.2.3

The Reduction of Hermitian Forms

In this section we will formulate the theorems corresponding to the results of 11.1.3. We will omit the proofs, which are simple generalizations of those in 11.1.3.

Theorem 4. Given a Hermitian form q(x) on a finite-dimensional complex vector space, there is a basis with respect to which q(x) takes the form

q(x) = Σ_{k=1}^{n} λk ζk ζ̄k = Σ_{k=1}^{n} λk |ζk|²   (3)

where the coefficients λk are real.

A Hermitian form is said to be positive definite if it never takes negative values and if it takes the value 0 only for x = 0. Clearly the Hermitian form (3) is positive definite if and only if all the coefficients λk are strictly positive. The determinants Δ1, …, Δn (defined as in 11.1.3) are real for a Hermitian matrix A (see Problem 4).

Theorem 5. If q(x) is a Hermitian form and the determinants Δ1, …, Δn are not zero, then there is a basis such that

q(x) = Δ1 |ζ1|² + (Δ2/Δ1) |ζ2|² + … + (Δn/Δ_{n−1}) |ζn|².

Theorem 6. A Hermitian form is positive definite if and only if Δ1, …, Δn are strictly positive.


Theorem 7. ('Inertia Theorem') If a Hermitian form is written in two different ways in the form (3), then the numbers of positive, negative and zero coefficients are the same in both forms.

Problems

1. Show that AĀ' is Hermitian for all matrices A.

2. Show that, if A is a square matrix and det A ≠ 0, then ξ'AĀ'ξ̄ is positive definite.

3. Show that, if A is Hermitian, then so is A⁻¹ (provided A⁻¹ exists).

4. Prove that, if A is Hermitian, then det A is real.

5. Prove Theorem 11.2;1.

6. Let f(x, y) = ξ'Aη̄ be a quasi-bilinear form on a complex vector space E. Prove that f(x, x) = 0 for all x ∈ E if and only if f(x, y) = 0 for all x, y ∈ E, i.e., if and only if A = 0.

7. If A is a Hermitian matrix, show that S'AS̄ is also Hermitian, where S is an arbitrary matrix for which the product is defined.

8. Adapt the reduction method described in 11.1.3 so that it can be applied to Hermitian forms, and use it to prove Theorems 4, 5 and 6.


CHAPTER 12

EUCLIDEAN AND UNITARY VECTOR SPACES

As in the previous chapter, we will first consider the real case (Euclidean spaces) in sections 12.1 to 12.3 and then the complex case (unitary spaces) in section 12.4.

12.1 Euclidean Vector Spaces

12.1.1 The Concept of a Euclidean Vector Space

Definition 1. A Euclidean vector space is a real vector space together with a positive definite quadratic form which is known as the 'fundamental form' of the space.

If q(x) is the fundamental form of the Euclidean space E, then we refer to the real number ‖x‖ = +√q(x) as the length or the norm of the vector x ∈ E. The zero vector is the only vector which has norm 0. For all scalars α and all x ∈ E, ‖αx‖ = |α| ‖x‖. A vector x ∈ E is said to be normalized if ‖x‖ = 1. If y ∈ E and y ≠ 0, then x = ‖y‖⁻¹ y is normalized.

The polar form of the fundamental form q(x), which we will now write briefly as (x, y), is known as the scalar product of the vectors x and y. Thus the scalar product is a symmetric bilinear form on E.

Example 1. The vectors of the 3-dimensional space of Euclidean geometry form a Euclidean vector space when ‖x‖ is defined to be the length of the vector x in the usual sense. It is from this example that the name 'Euclidean vector space' is derived. (See Problem 1.)

Example 2. The real n-dimensional space Rⁿ is a Euclidean vector space when

‖x‖² = Σ_{k=1}^{n} ξk²,   and hence (x, y) = Σ_{k=1}^{n} ξk ηk.

Example 3. The vector space C (Example 2.1;7) is a Euclidean vector space when

‖x‖² = ∫_{−1}^{+1} [x(τ)]² dτ

12.1: EUCLIDEAN VECTOR SPACES

(see Example 11.1;3). The scalar product is (x, y) = ∫_{−1}^{+1} x(τ) y(τ) dτ.

Theorem 1. (Cauchy–Schwarz Inequality.) For all vectors x, y of a Euclidean vector space

|(x, y)| ≤ ‖x‖ ‖y‖.

Equality holds if and only if x and y are linearly dependent.

Proof. 1. If x and y are linearly dependent, y = λx say, then both sides are equal to |λ| ‖x‖². 2. If x and y are linearly independent, then we think of

q(λx + μy) = (λx + μy, λx + μy) = ‖x‖²λ² + 2(x, y)λμ + ‖y‖²μ²

as a quadratic form in the variables λ, μ. This is positive definite, and therefore

| ‖x‖²    (x, y) |
| (x, y)   ‖y‖²  |  =  ‖x‖²‖y‖² − (x, y)² > 0

(see Theorem 11.1;4).

In view of the Cauchy–Schwarz inequality, it is possible to define the angle α between two vectors x and y (≠ 0) by the formula cos α = (x, y)/(‖x‖ ‖y‖).

Theorem 2. (Triangle Inequality.) For all vectors x, y of a Euclidean vector space

‖x‖ − ‖y‖ ≤ ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. By Theorem 1,

‖x + y‖² = ‖x‖² + 2(x, y) + ‖y‖² ≤ (‖x‖ + ‖y‖)²

and the second inequality follows from this (remembering that both ‖x + y‖ and ‖x‖ + ‖y‖ are positive or zero). The first inequality follows from the second because

‖x‖ = ‖(x + y) − y‖ ≤ ‖x + y‖ + ‖y‖.

The vectors x and y are said to be orthogonal if (x, y) = 0. Since (x, y) is a symmetric bilinear form, orthogonality is a symmetric relation between vectors.

The zero vector is the only vector which is orthogonal to all vectors x ∈ E. This means that every Euclidean vector space E forms a dual pair of spaces with itself by means of the scalar product (x, y) (see Definition 6.2;3). If E is finite-dimensional, the definitions and theorems of sections 6.2.3 to 6.2.6 can be carried over directly to the present situation.


CH. 12: EUCLIDEAN AND UNITARY VECTOR SPACES

In particular the dual space, or as we will now say the orthogonal complement, L⊥ of a subspace L of E consists of all those vectors y ∈ E which are orthogonal to every vector x ∈ L. By Theorem 6.2;6, L⊥⊥ = L (for finite-dimensional spaces) and the sum of the dimensions of L and L⊥ is equal to the dimension of E. On the other hand, L ∩ L⊥ = {0}, because any such vector is orthogonal to itself and, if ‖x‖² = 0, then x = 0. From these facts we have the following result. (Cf. Theorems 2.4;10 and 3.2;6.)

Theorem 3. Let E be a finite-dimensional Euclidean vector space, L a subspace of E and L⊥ its orthogonal complement. Then E = L ⊕ L⊥, i.e., for each x ∈ E there is a unique representation x = x1 + x2 where x1 ∈ L and x2 ∈ L⊥.

Theorem 4. In a Euclidean vector space E, the vectors a1, …, ar ∈ E are linearly dependent if and only if Gram's determinant

G = |(ai, ak)|

is equal to zero (see Problem 2).

The elements of Gram's determinant are the scalar products of the vectors a1, …, ar taken in pairs.

Proof. If Σ_{i=1}^{r} λi ai = 0, then

(Σ_i λi ai, ak) = Σ_i λi (ai, ak) = 0   for k = 1, …, r.   (1)

Conversely, (1) means that Σ_i λi ai ∈ [L(a1, …, ar)]⊥. On the other hand, Σ_i λi ai ∈ L(a1, …, ar). Hence, by Theorem 3, it follows from (1) that Σ_i λi ai = 0. Thus a linear relation between a1, …, ar is equivalent to the same relation between the rows of Gram's determinant, and the theorem follows from Theorem 4.2;7.
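Theorem 4 is easy to try out numerically. In the sketch below (Python with exact fractions, using the standard scalar product of Rⁿ from Example 2), Gram's determinant vanishes exactly for a linearly dependent triple and is strictly positive for an independent pair (cf. Problem 2 below).

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_det(vectors):
    """Determinant of the matrix of scalar products (a_i, a_k)."""
    G = [[dot(u, v) for v in vectors] for u in vectors]
    n = len(G)
    det = F(1)                       # Gaussian elimination, exact
    for j in range(n):
        pivot = next((i for i in range(j, n) if G[i][j] != 0), None)
        if pivot is None:
            return F(0)              # a zero column: determinant vanishes
        if pivot != j:
            G[j], G[pivot] = G[pivot], G[j]
            det = -det
        det *= G[j][j]
        for i in range(j + 1, n):
            factor = G[i][j] / G[j][j]
            G[i] = [a - factor * b for a, b in zip(G[i], G[j])]
    return det

a1 = [F(1), F(0), F(2)]
a2 = [F(0), F(1), F(1)]
a3 = [F(1), F(2), F(4)]                 # a3 = a1 + 2*a2
assert gram_det([a1, a2, a3]) == 0      # dependent: G vanishes
assert gram_det([a1, a2]) > 0           # independent: G is positive
print("Gram determinant test passed")
```

Exact rational arithmetic matters here: with floating point the determinant of a dependent family would merely be very small, not zero.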

12.1.2 Orthogonalization

Definition 2. A subset A of a Euclidean vector space E is said to be 'orthogonal' if any two of its elements are orthogonal. It is said to be 'orthonormal' if, in addition, its elements are normalized.

Theorem 5. Any orthogonal set A which does not contain the zero vector is linearly independent. Thus in particular an orthonormal set is linearly independent.


Proof. If Σ_{x∈A} λx x = 0, then, for any y ∈ A, it follows from the conditions of the theorem that Σ_{x∈A} λx (x, y) = λy (y, y) = 0, and hence λy = 0.

Theorem 6. Let E be a Euclidean vector space and let A be a finite or countably infinite subset of E. Then the subspace L(A) of E has an orthonormal basis. In particular, every finite-dimensional Euclidean vector space has an orthonormal basis.

Proof. Since L(A) has a finite or countably infinite basis (Theorem 3.1;10), we will not lose any generality by assuming that the set A is linearly independent. We will prove the theorem by using the Schmidt Orthogonalization Method, which will also produce a technique for constructing an orthonormal basis for L(A). Suppose that the vectors of the linearly independent set A are a1, a2, a3, …. We first construct an orthogonal basis e1, e2, e3, … by setting

e1 = a1
e2 = λ21 e1 + a2       (2)
e3 = λ31 e1 + λ32 e2 + a3,  etc.,

where the coefficients are to be determined as follows. Since (e1, e2) = 0, it follows that

(e1, λ21 e1 + a2) = λ21 ‖e1‖² + (e1, a2) = 0

and, since ‖e1‖ = ‖a1‖ ≠ 0, λ21 can be determined from this equation. Since (e1, e3) = 0, it follows similarly that

(e1, λ31 e1 + λ32 e2 + a3) = λ31 ‖e1‖² + λ32 (e1, e2) + (e1, a3) = λ31 ‖e1‖² + (e1, a3) = 0

and λ31 can be found from this. The condition (e2, e3) = 0 similarly leads to the equation λ32 ‖e2‖² + (e2, a3) = 0 for λ32. (Since a1 and a2 may be written as linear combinations of e1 and e2 from (2), it follows that e1 and e2 are linearly independent and hence that ‖e2‖ ≠ 0.)

It is clear that this Schmidt Orthogonalization Method can be continued in such a way that a new vector ek is assigned to each vector ak. In view of the method of construction, any two of these vectors ek are orthogonal. The set e1, e2, e3, … is a basis of L(A) because, for any r, the vectors a1, …, ar may be written as linear combinations of e1, …, er from (2). We can now obtain an orthonormal basis simply by normalizing the vectors ek, i.e., by replacing ek with ‖ek‖⁻¹ ek.

We remark that the proof also shows that the vectors ek are uniquely


determined by the formulae (2) and the condition of orthogonality. It is easy to see that the vectors ek are uniquely determined except for scalar factors if we require any two to be orthogonal and each ek to be a linear combination of a1, …, ak (k = 1, 2, 3, …).

With a view to producing a concise technique for calculation, we will now show that the Schmidt Method can be carried out by using the Exchange Method. We first prove the following result.

Theorem 7. A basis B of a Euclidean vector space E is orthogonal if and only if the fundamental form is completely reduced with respect to this basis, i.e.,

‖x‖² = q(x) = Σ_{e∈B} λe ξe².   (3)

Further, B is orthonormal if and only if λe = 1 for all e ∈ B.

Proof. 1. For any basis B, we have

q(x) = (x, x) = Σ_{e∈B} Σ_{f∈B} (e, f) ξe ξf.   (4)

Now, if B is orthogonal, then (e, f) = 0 for e ≠ f and hence we have (3) with λe = (e, e). Further, if B is orthonormal, then λe = (e, e) = 1 for all e ∈ B.

2. From (3) and (4) it follows that (e, f) = 0 for e ≠ f and hence that B is orthogonal (see Problem 11.1;8). Further, λe = (e, e) for all e ∈ B, so that B is orthonormal when all λe = 1.

Thus, in view of Theorem 7, we can obtain an orthogonal basis by reducing the fundamental form. The Reduction Method of 11.1.3 can easily be extended to a countably infinite number of variables simply by allowing tableaux with a countably infinite number of rows and columns. Except for scalar factors, this method gives the same orthogonal basis as the Schmidt Method, because again each ek is a linear combination of a1, …, ak (k = 1, 2, 3, …). The norms ‖ek‖, which are used to normalize the basis vectors ek, also appear in the method because, from (3), we have ‖ek‖ = +√λk. But λk = 1/δk, where δk is the kth pivot. Thus the vector ek can be normalized by multiplying by +√δk.
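The Schmidt method is easy to carry out mechanically. The sketch below (Python with exact fractions; polynomials are represented by coefficient lists and the scalar product is the integral of Example 3) orthogonalizes 1, τ, τ², τ³ by the formulae (2); up to scalar factors it reproduces the orthogonal polynomials of Example 4 below.

```python
from fractions import Fraction as F

def mul(p, q):
    # product of two polynomials given as coefficient lists
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def sp(p, q):
    # scalar product (p, q) = integral of p*q over [-1, +1];
    # the integral of tau^m is 0 for odd m and 2/(m+1) for even m
    return sum(c * F(2, m + 1) for m, c in enumerate(mul(p, q)) if m % 2 == 0)

def schmidt(vectors):
    es = []
    for a in vectors:
        e = list(a) + [F(0)] * (len(vectors[-1]) - len(a))
        for prev in es:
            lam = -sp(prev, a) / sp(prev, prev)   # from (prev, e) = 0, as in (2)
            e = [c + lam * d for c, d in zip(e, prev)]
        es.append(e)
    return es

monomials = [[F(1)], [F(0), F(1)], [F(0), F(0), F(1)], [F(0), F(0), F(0), F(1)]]
e = schmidt(monomials)
# e[2] is proportional to 3*tau^2 - 1, e[3] to 5*tau^3 - 3*tau.
print(e[2], e[3])
```

One obtains e2 = τ² − 1/3 and e3 = τ³ − (3/5)τ, which differ from the Legendre polynomials only by the scalar factors discussed above.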

Example 4. The space P3 (Example 2.1;5) is Euclidean with the fundamental form

‖x‖² = q(x) = ∫_{−1}^{+1} [x(τ)]² dτ

since, if a polynomial x(τ) vanishes on −1 ≤ τ ≤ +1, it vanishes for all τ and therefore x = 0. A basis of P3 is given by a0(τ) = 1; a1(τ) = τ; a2(τ) = τ²;


a3(τ) = τ³ (see Example 3.1;5). With respect to this basis we have, by (4),

q(x) = Σ_{i,k=0}^{3} (ai, ak) ξi ξk,

where

(ai, ak) = ∫_{−1}^{+1} ai(τ) ak(τ) dτ = ∫_{−1}^{+1} τ^{i+k} dτ
         = 0 if (i + k) is odd;  = 2/(i + k + 1) if (i + k) is even.

Hence the initial tableau for the reduction of the fundamental form is

           ξ0    ξ1    ξ2    ξ3
  η0 =     2*     0    2/3    0
  η1 =      0    2/3    0    2/5
  η2 =    2/3     0    2/5    0
  η3 =      0    2/5    0    2/7

Using the elements in the main diagonal in their natural order as the pivots, we reach the final tableau after four exchange steps. Here the pivots have the values 2, 2/3, 8/45, 8/175.


Thus the orthogonal basis is

e0 = ½ a0 = ½
e1 = (3/2) a1 = (3/2)τ
e2 = −(15/8)(a0 − 3a2) = −(15/8)(1 − 3τ²)
e3 = −(35/8)(3a1 − 5a3) = −(35/8)(3τ − 5τ³).

To normalize these polynomials it is necessary to multiply them by √2, √(2/3), √(8/45), √(8/175) respectively. This gives

ê0 = √(1/2)
ê1 = √(3/2) τ
ê2 = √(5/8)(3τ² − 1)
ê3 = √(7/8)(5τ³ − 3τ)

which are the first four normalized Legendre polynomials. (We note, however, that the Legendre polynomials are not usually normalized in this way but by the condition that they should take the value +1 for τ = +1.) All the Legendre polynomials can be obtained using this method. However, for this special purpose there are much simpler methods. (Cf. [18], p. 190.)

12.1.3 Orthogonal Transformations and Matrices

Let E be an n-dimensional Euclidean vector space and let ξ1, …, ξn and η1, …, ηn be the components of a vector x ∈ E with respect to two orthonormal bases. Then, by Theorem 7,

‖x‖² = q(x) = Σ_{k=1}^{n} ηk² = Σ_{k=1}^{n} ξk² = ξ'ξ = η'η.

Now, if

η = Sξ   (5)

(cf. Theorem 5.4;2), then η'η = ξ'S'Sξ and hence, in view of the symmetry of S'S,

S'S = I,   i.e.,   S⁻¹ = S',   i.e.,   S* = S   (6)

(see 5.4;(5)).

Definition 3. A real square matrix S which has the property (6) is said to be 'orthogonal'. The linear transformation (5) of the vector components which corresponds to S is also said to be 'orthogonal'.

Now suppose that (5) represents a change of basis in which the ξ-basis is orthonormal. If S is orthogonal, then

η'η = ξ'S'Sξ = ξ'ξ = q(x)


so that the η-basis is also orthonormal (Theorem 7).

If S is orthogonal, then, by Theorem 5.4;2, (5) corresponds to the transformation f = S*e = Se of the basis vectors. Thus the vector components are transformed in the same way as the basis vectors. Putting all this together, we have

Theorem 8. A change of basis between orthonormal bases is represented by an orthogonal matrix. Conversely, every orthogonal matrix can be interpreted as the representation of a change of basis between orthonormal bases. In the transfer from one orthonormal basis to another, the vector components undergo the same transformation as the basis vectors.

From (6) it is clear that an orthogonal matrix always has an inverse which is also orthogonal ((S⁻¹)' = (S')⁻¹ = (S⁻¹)⁻¹). Further, the product of two orthogonal matrices S1, S2 is also orthogonal ((S1S2)⁻¹ = S2⁻¹S1⁻¹ = S2'S1' = (S1S2)'). Since the identity matrix I is orthogonal, we have the following theorem.

Theorem 9. The set of all n × n orthogonal matrices is a group under the usual matrix multiplication; it is known as the orthogonal group of degree n.

If S is orthogonal, then

(det S)² = det S det S' = det SS' = det I = 1

and therefore det S = ±1. Both possibilities occur. For example, when n = 1, S1 = (+1) and S2 = (−1) are orthogonal. The first matrix has determinant +1 and the second has determinant −1. The set of all orthogonal matrices of degree n which have determinant +1 is clearly also a group; it is a subgroup of the orthogonal group.

If we interpret (5) (referred to an orthonormal basis) as representing an endomorphism f of E (cf. 5.3), then it follows from the orthogonality of S that f is an automorphism. Also, for all x1, x2 ∈ E,

(f(x1), f(x2)) = η1'η2 = ξ1'S'Sξ2 = ξ1'ξ2 = (x1, x2),

i.e., the scalar product of two vectors, and hence the norm of a vector, is invariant under the automorphism f. An automorphism with this property is called a rotation of the Euclidean vector space E. If det S = +1, we refer to f as a proper rotation, and if det S = −1, we refer to f as a reflection. It is easy to verify that the matrix S in (5) is orthogonal if (5) represents a rotation of E. Conversely we now have the following theorem.

CH. 12: EUCLIDEAN AND UNITARY VECTOR SPACES

Theorem 10. The set of all rotations of a Euclidean vector space of dimension n and the set of all proper rotations are groups (with the multiplication of mappings). The first is isomorphic to the orthogonal group of degree n and is a subgroup of the group A(E) (Theorem 5.3;1).
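These group properties can be illustrated with the 2 × 2 proper rotations (cf. Problem 10 in the exercises below). The sketch checks S'S = I, det S = +1, the closure S(α)S(β) = S(α + β), and the invariance of the scalar product.

```python
import math

def S(a):
    # proper rotation of the Euclidean plane through the angle a
    return [[math.cos(a), math.sin(a)],
            [-math.sin(a), math.cos(a)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(2) for j in range(2))

a, b = 0.7, -1.3
I = [[1.0, 0.0], [0.0, 1.0]]
assert close(matmul(transpose(S(a)), S(a)), I)     # S'S = I, property (6)
assert close(matmul(S(a), S(b)), S(a + b))         # closure of the group
det = S(a)[0][0] * S(a)[1][1] - S(a)[0][1] * S(a)[1][0]
assert abs(det - 1.0) < 1e-12                      # a proper rotation

# Invariance of the scalar product under x -> S x:
x, y = [1.0, 2.0], [-3.0, 0.5]
Sx = [sum(S(a)[i][k] * x[k] for k in range(2)) for i in range(2)]
Sy = [sum(S(a)[i][k] * y[k] for k in range(2)) for i in range(2)]
assert abs(sum(u * v for u, v in zip(Sx, Sy))
           - sum(u * v for u, v in zip(x, y))) < 1e-12
print("rotation checks passed")
```

The addition law S(α)S(β) = S(α + β) shows concretely that the proper rotations of the plane form a commutative subgroup.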

A rotation of a Euclidean vector space E is often referred to simply as an automorphism of E, because all the structural properties of E, i.e., the linear relationships and the norms and scalar products of vectors, are preserved.

Exercises

1. Use Gram's determinant to prove that the vectors ak(τ) = τ^k (k = 0, 1, 2, 3) in the vector space C (Example 2.1;7) are linearly independent.

Solution. Make C Euclidean as in Example 3; Gram's determinant is then equal to 256/23625.

2. (a) Prove that the vector space C (Example 2.1;7) becomes Euclidean with the scalar product

(x, y) = ∫_{−1}^{+1} τ² x(τ) y(τ) dτ.

(b) In this Euclidean space, orthogonalize the sequence of vectors a0 = 1, a1 = τ, a2 = τ², a3 = τ³.

Solution. If the fundamental form q(x) = Σ_{i,k=0}^{3} (ai, ak) ξi ξk (on the subspace P3 ⊆ C) is reduced by the exchange method, then the corresponding change of the basis vectors is

e0 = (3/2) a0 = 3/2
e1 = (5/2) a1 = (5/2) τ
e2 = −(105/8) a0 + (175/8) a2 = (35/8)(5τ² − 3)
e3 = −(315/8) a1 + (441/8) a3 = (63/8)(7τ³ − 5τ).

3. If the polynomials x(τ) are considered only for the values τ = −2, −1, 0, 1, 2, then they form a 5-dimensional vector space E. (Two polynomials are considered to be equal if and only if they take the same values for the five given values of τ (see Example 12.2;2).) The set of vectors ai(τ) = τ^i (i = 0, …, 4) is a basis of E, and E is Euclidean with the fundamental form

q(x) = Σ_{k=−2}^{+2} [x(k)]².

Orthogonalize the given basis.

Solution. e0 = 1/5, e1 = (1/10)τ, e2 = (1/14)(τ² − 2), e3 = (1/72)(5τ³ − 17τ), e4 = (1/288)(35τ⁴ − 155τ² + 72).

228


The norms of these polynomials are √(1/5), √(1/10), √(1/14), √(5/72), √(35/288).

Problems

1. Prove the assertion made in Example 1.

2. Prove that Gram's determinant (Theorem 4) is strictly positive if a1, …, ar are linearly independent. (Hint: Consider the quadratic form Σ_{i,k=1}^{r} (ai, ak) ξi ξk.)

3. Use Gram's determinant to show that the polynomials a0(τ) = 1, …, a4(τ) = τ⁴ are a basis of P4 (Example 2.1;5).

4. Prove that, if E is a Euclidean vector space, {e1, …, en} is a basis of E and α1, …, αn are real numbers, then there is exactly one vector x ∈ E such that (x, ek) = αk for k = 1, …, n.

5. Show that the set L of all odd functions in the Euclidean vector space C (Example 3) (i.e., those functions x(τ) for which x(−τ) = −x(τ) for −1 ≤ τ ≤ +1) is a subspace of C. Which functions are in L⊥?

6. Let E be a Euclidean vector space and let a ∈ E. For which vectors x ∈ E are (x + a) and (x − a) orthogonal?

7. Prove that

[Σ_{k=1}^{n} ξ_k η_k]² ≤ [Σ_{k=1}^{n} ξ_k²] [Σ_{k=1}^{n} η_k²]

for all real numbers ξ1, ..., ξ_n, η1, ..., η_n.

8. Prove that

[∫_{−1}^{+1} x(τ) y(τ) dτ]² ≤ [∫_{−1}^{+1} [x(τ)]² dτ] [∫_{−1}^{+1} [y(τ)]² dτ]

for all x, y ∈ C.

9. Prove that Pythagoras's theorem,

||x + y||² = ||x||² + ||y||² if x and y are orthogonal,

is true in any Euclidean vector space.

10. Show that a 2 × 2 orthogonal matrix is either of the form

(  cos α   sin α )
( −sin α   cos α )

or of the form

(  cos α   sin α )
(  sin α  −cos α ).

Which of these represent proper rotations?

11. Show that, if f is a reflection of a Euclidean plane (i.e., a 2-dimensional Euclidean space), then f² = e.

12. Suppose f is a rotation of a finite-dimensional Euclidean vector space E. What is the dual endomorphism of f?

CH. 12: EUCLIDEAN AND UNITARY VECTOR SPACES

13. Which rotations of a Euclidean vector space (in particular, of a Euclidean plane) are symmetric endomorphisms? (Cf. Definition 13.2;1.)

14. Show that, if {e1, ..., e_n} and {f1, ..., f_n} are orthonormal bases of the Euclidean vector space E, then there is exactly one rotation t of E for which t(e_k) = f_k (k = 1, ..., n).

15. Prove that, given a basis of a real vector space, there is a fundamental form which makes this basis orthonormal.

16. Prove that a subspace L of a Euclidean vector space E is itself a Euclidean vector space when ||x|| (x ∈ L) is defined to be equal to the norm of x in E.

12.2 Approximation in Euclidean Vector Spaces. The Method of Least Squares

12.2.1 The General Approximation Problem

Suppose that we are given a finite set of vectors a1, ..., a_r in a Euclidean vector space E and suppose that, for an arbitrary vector z ∈ E, we wish to find a vector x0 ∈ L = L(a1, ..., a_r) for which ||z − x0|| is as small as possible, i.e.,

||z − x0|| = min_{x∈L} ||z − x||.

Theorem 1. There is exactly one vector x0 ∈ L for which

||z − x0|| = min_{x∈L} ||z − x||.

This is the vector x0 which is uniquely determined by x0 ∈ L and z − x0 ∈ L⊥.

Proof. By Theorem 12.1;3 applied to the subspace L(L, z), there is just one vector x0 ∈ L such that z − x0 ∈ L⊥. Now suppose that x ∈ L and x ≠ x0. Then

||z − x||² = ||(z − x0) + (x0 − x)||² = ||z − x0||² + 2(z − x0, x0 − x) + ||x0 − x||².

Since z − x0 ∈ L⊥ and x0 − x ∈ L, the middle term vanishes. Further, since x0 − x ≠ 0, it follows that ||z − x||² > ||z − x0||² and the theorem is proved.

In the applications of the theorem it is important to be able to calculate the coefficients λ_k in the representation x0 = Σ_{k=1}^{r} λ_k a_k. Since (z − x0, a_i) = 0 for all i = 1, ..., r, it follows that

Σ_{k=1}^{r} (a_k, a_i) λ_k = (z, a_i)    (i = 1, ..., r)    (1)

is a system of r linear equations for the r coefficients λ_k. Every solution

12.2: THE METHOD OF LEAST SQUARES

(λ1, ..., λ_r) of this system provides a representation of x0. The λ1, ..., λ_r are uniquely determined if and only if a1, ..., a_r are linearly independent. It is only in this latter case that the determinant of the system (1) is not equal to zero (Theorem 12.1;4).

Definition 1. The vector x0 which is uniquely determined as in Theorem 1 is said to be the 'best approximation to z as a linear combination of the vectors a1, ..., a_r'.
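Setting up and solving the normal equations (1) is mechanical once the scalar products are known. A minimal sketch in R³ with the standard scalar product; the vectors a_k and z below are illustrative, not taken from the text:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def solve(m, b):
    # Gauss-Jordan elimination with partial pivoting for a small square system
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

a = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]   # the a_k spanning L
z = (1.0, 2.0, 0.0)

# system (1): sum_k (a_k, a_i) lambda_k = (z, a_i), i = 1, ..., r
gram = [[dot(ak, ai) for ak in a] for ai in a]
rhs = [dot(z, ai) for ai in a]
lam = solve(gram, rhs)

# best approximation x0 and its residual z - x0
x0 = [sum(l * ak[j] for l, ak in zip(lam, a)) for j in range(3)]
res = [zj - xj for zj, xj in zip(z, x0)]
print([round(l, 10) for l in lam])            # the coefficients lambda_k
print([round(dot(res, ai), 10) for ai in a])  # all zero: z - x0 is in L-perp
```

The residual check at the end is exactly the characterization of Theorem 1: the best approximation is the one whose error z − x0 is orthogonal to every a_i.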

The following examples will illustrate three important types of problem to which the theory may be applied.

Example 1. Let E be the Euclidean vector space C (Example 12.1;3) and let a_k ∈ C be the polynomial a_k(τ) = τ^k (k = 0, ..., r). Then, for a given continuous function z ∈ C, the method described above will produce a 'best approximation in mean squares' for z on the interval −1 ≤ τ ≤ +1.

and therefore

lim_{k→∞} ||x_k|| / |λ_n|^k = |ξ_n| ||e_n||.


CH. 13: EIGENVALUES AND EIGENVECTORS

Hence by division,

lim_{k→∞} ||x_k|| / ||x_{k−1}|| = |λ_n|.    (8)

From (6) it further follows that

x_k / λ_n^k − ξ_n e_n = y_k

and, since

lim_{k→∞} ||y_k|| = 0,

lim_{k→∞} x_k / λ_n^k = ξ_n e_n    (9)

where convergence is defined in E as in 12.3;(2). Thus the sequence of vectors

x1/λ_n,  x2/λ_n²,  x3/λ_n³,  ...

converges to an eigenvector corresponding to the eigenvalue λ_n with the greatest absolute value.

Example 5. Suppose that the endomorphism f is represented by the matrix

A = (  5  −2  −4 )
    ( −2   2   2 )
    ( −4   2   5 )

with respect to some basis. We choose x0 to be the vector which has the components 1, 0, 0 with respect to this basis. Then the components of the vectors x_k are

x1 = (    5,    −2,    −4 )
x2 = (   45,   −22,   −44 )
x3 = (  445,  −222,  −444 )
x4 = ( 4445, −2222, −4444 )

etc. We see immediately that

”k| ”xle—l”

tends to 10, so that the eigenvalue

with the greatest absolute value is A3: i 10.

From (9), we also see that the corresponding eigenvector has the components 2, — 1, —- 2 (apart from a scalar factor). Finally it is not possible that As: — 10, because, in this case, for sufficiently large h, each component of wk would change sign at each iteration. Hence A3 = + 10. The evaluation of the norm ”k at each step involves a certain amount of

calculation which is reduced if the given norm is replaced by a new one equal to the absolute value of the greatest component of x. We will denote this by 274

13.6: NUMERICAL CALCULATION OF EIGENVALUES

[x]. It is easy to verify that ||x_k||, ||e_n|| and ||y_k|| may be replaced by [x_k], [e_n] and [y_k] in the arguments (in particular (7)) which led to the conclusion (8), and hence we have

lim_{k→∞} [x_k] / [x_{k−1}] = |λ_n|.    (10)
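The iteration of Example 5, rescaled at every step with the maximum-component norm [x] so that the quotient (10) can be read off, can be sketched as follows:

```python
def matvec(a, x):
    return [sum(r * c for r, c in zip(row, x)) for row in a]

A = [[5, -2, -4],
     [-2, 2, 2],
     [-4, 2, 5]]                           # the matrix of Example 5

x = [1.0, 0.0, 0.0]                        # x0 of Example 5
for k in range(30):
    y = matvec(A, x)
    # signed quotient of the largest components, as in (10)
    q = max(y, key=abs) / max(x, key=abs)
    m = max(abs(c) for c in y)
    x = [c / m for c in y]                 # rescale so that [x_k] = 1

print(round(q, 6))               # 10.0, the eigenvalue of greatest absolute value
print([round(c, 6) for c in x])  # proportional to the eigenvector (2, -1, -2)
```

Rescaling by the largest component at every step keeps the iterates bounded, as the text recommends for the case |λ_n| ≠ 1.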

By introducing the new norm [x_k], we have also made it possible to drop the condition that E should be Euclidean. (We remark however that E can always be made Euclidean by putting ||x||² = Σ_k ξ_k², where the ξ_k are the components of x with respect to some basis and f is represented by some matrix with respect to this basis.)

From the representation of y_k in (6), we see that the greater the quotient |λ_n| / |λ_{n−1}| is, i.e., the more λ_n exceeds the other eigenvalues, the more rapid is the convergence in (8), (9) and (10). From (9), we see that the components of x_k become very large or very small as k increases if |λ_n| ≠ 1. In order to avoid this, we can divide x_k by ||x_k|| or [x_k] after

||x_k|| / ||x_{k−1}||  or  [x_k] / [x_{k−1}]

has been calculated. In this way, we ensure that ||x_k|| = 1 or that [x_k] = 1. This has the added advantage of making it easier to recognize the eigenvector as a limit.

If E is Euclidean and f is symmetric, then in general it is possible to find the eigenvalue λ_n to within a given degree of accuracy with less effort than is involved in (8) or (10) by using the Rayleigh quotient (see 13.5;(6)). If e_n is again an eigenvector corresponding to λ_n, then

λ_n = p(e_n) = (e_n, f(e_n)) / (e_n, e_n).

Hence, the Rayleigh quotients

p(x_k) = (x_k, f(x_k)) / (x_k, x_k)    (k = 1, 2, 3, ...)

are approximations to λ_n. If we assume that ||x_k|| = ||e_n|| = 1 and x_k = e_n + dx_k, then, after a short calculation, we obtain

p(x_k) − λ_n = (f(dx_k) − λ_n dx_k, dx_k).    (11)

Since this is a quadratic form in dx_k, p(x_k) will approach λ_n particularly quickly when f is symmetric.


We will check this by looking again at Example 5. The matrix A is symmetric and hence it represents a symmetric endomorphism of R³ (where (x, y) = Σ_k ξ_k η_k). The calculation gives for instance

p(x2) = 44445/4445 = 9.998875 = [x5]/[x4].
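The value p(x2) can be checked directly from the iterates of Example 5: since x3 = f(x2), the Rayleigh quotient is just (x2, x3)/(x2, x2):

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x2 = (45, -22, -44)        # iterates from Example 5
x3 = (445, -222, -444)     # x3 = A x2, i.e. f(x2)

p = Fraction(dot(x2, x3), dot(x2, x2))   # Rayleigh quotient p(x2)
print(dot(x2, x3), dot(x2, x2))          # 44445 4445
print(float(p))                          # 9.998875..., already very close to 10
```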

Finally we remark that the Rayleigh quotient also converges to the eigenvalue in the case of an arbitrary matrix A. However, the extra rapidity of the convergence is lost when A is not symmetric. In practical applications the main interest is often not in the eigenvalue with the greatest absolute value but in the one with the least absolute value. For example, this is the case in oscillation problems, where the eigenvalues represent the eigenfrequencies. The iteration method can still be applied, by using Theorem 13.1;7. In view of this theorem, the eigenvalue of A with the least absolute value is equal to the reciprocal of the eigenvalue of A⁻¹ with the greatest absolute value. There are methods of calculation which are based on this idea but do not in fact require the inverse of the matrix to be completely calculated. (Cf. [8], p. 286. For other numerical techniques for the calculation of eigenvalues see [27], chapter 10.)
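A minimal sketch of this idea: iterate with A⁻¹ without ever forming the inverse, by solving A y = x at each step (here with a plain Gauss-Jordan elimination, and with the matrix of Example 5, for which the quotients tend to 1):

```python
def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting (small systems only)
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

A = [[5.0, -2.0, -4.0],
     [-2.0, 2.0, 2.0],
     [-4.0, 2.0, 5.0]]     # the matrix of Example 5

x = [1.0, 2.0, 3.0]        # arbitrary starting vector
for k in range(40):
    y = solve(A, x)        # one iteration step with A^{-1}
    q = max(abs(c) for c in y) / max(abs(c) for c in x)
    m = max(abs(c) for c in y)
    x = [c / m for c in y]

print(round(1 / q, 6))     # the eigenvalue of A with least absolute value
```

Here 1/q tends to 1: for the matrix of Example 5 the eigenvalues are 10, 1, 1 (trace 12 and determinant 10 with the known eigenvalue 10 leave 1 twice), so 1 is the eigenvalue of least absolute value.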

Finally we will indicate how, after finding λ_n and a corresponding eigenvector e_n, it is possible to find the eigenvalue λ_{n−1}, assuming that f and A are symmetric. In this case, every eigenvector corresponding to λ_{n−1} is orthogonal to e_n, and the orthogonal complement of L(e_n) is mapped into itself by f (Theorem 13.2;4). If we assume that |λ_{n−1}| > |λ_k| for k = 1, ..., n−2, then we can start with any vector x0 such that (x0, e_n) = 0 and again apply the iterative method already described for λ_n. In this way we will find λ_{n−1} and an eigenvector corresponding to this eigenvalue. However, in actual numerical calculation there is always a risk that, during the method, the orthogonality to e_n will be lost due to rounding errors. (See Example 7.) In view of this possibility, we must check the orthogonality at each step and if necessary restore it by replacing x_k with x_k − (x_k, e_n) e_n (assuming e_n is normalized).

The other eigenvalues λ_{n−2}, λ_{n−3}, ... may also be found by the same method.
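The deflation just described can be sketched with the matrix of Example 7, whose greatest eigenvalue 0.48 belongs to the eigenvector (1, 0, 1, 1); the start vector is chosen orthogonal to it, and orthogonality is restored at every step:

```python
import math

def matvec(a, x):
    return [sum(r * c for r, c in zip(row, x)) for row in a]

A = [[0.22, 0.02, 0.12, 0.14],
     [0.02, 0.14, 0.04, -0.06],
     [0.12, 0.04, 0.28, 0.08],
     [0.14, -0.06, 0.08, 0.26]]            # the matrix of Example 7

r = 1.0 / math.sqrt(3.0)
e = [r, 0.0, r, r]                         # normalized eigenvector for 0.48

x = [0.0, 1.0, 0.0, 0.0]                   # start orthogonal to e
for k in range(60):
    y = matvec(A, x)
    s = sum(yi * ei for yi, ei in zip(y, e))
    y = [yi - s * ei for yi, ei in zip(y, e)]   # restore orthogonality to e
    q = max(abs(c) for c in y) / max(abs(c) for c in x)
    m = max(abs(c) for c in y)
    x = [c / m for c in y]

print(round(q, 6))                # 0.24, the next eigenvalue
print([round(c, 5) for c in x])   # proportional to (0, 1, 1, -1)
```

With exact re-orthogonalization at every step the drift back towards the greatest eigenvalue, visible in the book's table for k > 12, does not occur.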

Example 6. We choose the same 3 × 3 matrix as in Example 1 and start with the arbitrarily chosen vector x0 = (1, 7, 3). Then x1 = (−43, 59, 51). In the following table, we list the results for several values of k, writing the vectors x_k in normalized form so that [x_k] = 1. We give the components to five decimal places, although more places are effectively calculated.


k   |  x_k                              |  [x_k]/[x_{k−1}]*  |  p(x_{k−1})
5   |  −0.54815,  0.60559,  0.57687    |  −4.06489          |  −11.72404
10  |   0.58189, −0.57278, −0.57734    |  −2.77900          |  −2.83200
15  |  −0.57677,  0.57793,  0.57735    |  −3.01527          |  −3.02293
25  |  −0.57734,  0.57736,  0.57735    |  −3.00026          |  −3.00040
35  |  −0.57735,  0.57735,  0.57735    |  −3.00001          |  −3.00001

For convenience, the method of evaluating [x_k]/[x_{k−1}]* has been adapted so that it is always the component of x_k which has the same index as the component of x_{k−1} with the greatest absolute value which is divided by the latter. This makes no difference to the final result. From this table, we can see that p(x_k) converges to the eigenvalue λ3 = −3 and that x_k converges to the eigenvector (−1, 1, 1). In this example p(x_k) does not converge any more rapidly than [x_k]/[x_{k−1}]*. Note that A is not symmetric.

Example 7. Consider the matrix

A = ( 0.22   0.02   0.12   0.14 )
    ( 0.02   0.14   0.04  −0.06 )
    ( 0.12   0.04   0.28   0.08 )
    ( 0.14  −0.06   0.08   0.26 )

Note that A is symmetric. We start with x0 = (1, 7, 3, 9).

k   |  x_k                                    |  [x_k]/[x_{k−1}]*  |  p(x_{k−1})
2   |  0.58045, 0.03537, 0.57421, 0.57629    |  0.43357           |  0.46796
5   |  0.57755, 0.00163, 0.57852, 0.57598    |  0.47939           |  0.47999
10  |  0.57735, 0.00004, 0.57739, 0.57731    |  0.47996           |  0.48000
15  |  0.57735, 0.00000, 0.57735, 0.57735    |  0.48000           |  0.48000

We see that 0.48 is an eigenvalue with the eigenvector (1, 0, 1, 1). We will now repeat the method starting with x0 = (0, 1, 0, 0), i.e., with a vector which is orthogonal to the eigenvector just found. Provided the rounding errors do not interfere, we should find the eigenvalue with the second greatest absolute value and a corresponding eigenvector.


k   |  x_k                                      |  [x_k]/[x_{k−1}]*  |  p(x_{k−1})
5   |   0.01747, 0.59567, 0.55903, −0.57650    |  0.23231           |  0.23953
10  |   0.00056, 0.57791, 0.57679, −0.57735    |  0.23977           |  0.24000
11  |   0.00028, 0.57763, 0.57707, −0.57735    |  0.23988           |  0.24000
12  |   0.00014, 0.57749, 0.57721, −0.57735    |  0.23994           |  0.24000
30  |  −0.23477, 0.52746, 0.29269, −0.76223    |  0.26928           |  0.25132
40  |  −0.57735, 0.00127, −0.57608, −0.57862   |  0.47833           |  0.48000

As far as k = 12, the solution approaches the eigenvalue 0.24 and the eigenvector (0, 1, 1, −1). However, the rounding errors then start to interfere (nine significant places were used in the calculation) and we are led once again to the greatest eigenvalue. Finally, we note that, in this example, the Rayleigh quotient always converges more rapidly than [x_k]/[x_{k−1}].

Exercises

1. Use the Gaussian algorithm to find the eigenvalues and eigenvectors of the matrices

(a)  A = ( −8  −27    9 )
         (  9   28   −9 )
         (  9   27   −8 )

(b)  B = (  5  −4  −2  −1 )
         (  6  −6  −1  −2 )
         (  2   0  −1   2 )
         ( −4   4   2   2 )

Solutions.

(a)  λ1 = 10, λ2 = λ3 = 1;
     x1 = (1, −1, −1), x2 = (1, 0, 1), x3 = (0, 1, 3).

(b)  λ1 = −2, λ2 = −1, λ3 = 1, λ4 = 2;
     x1 = (1, 2, 0, −1), x2 = (2, 3, 1, −2), x3 = (3, 3, 1, −2), x4 = (1, 1, 0, −1).

2. Use the iterative method to find the greatest eigenvalue and the corresponding eigenvector of the matrix A in Exercise 1.



Solution. Starting with x0 = (1, 0, 0), the method produces in sequence the vectors

x1 = (  −8,   9,   9 )
x2 = ( −98,  99,  99 )
x3 = ( −998, 999, 999 )

From this, it is easy to see the eigenvalue 10 and the corresponding eigenvector (−1, 1, 1).

CHAPTER 14

INVARIANT SUBSPACES, CANONICAL FORMS OF MATRICES

If the finite-dimensional vector space E has a basis consisting of eigenvectors of the endomorphism f of E, then it is particularly easy to see how this endomorphism operates on E. Each vector in the basis is simply multiplied by the corresponding eigenvalue, and then it is straightforward to find the image of an arbitrary vector x ∈ E (see 13.6;(6)). By Theorem 13.1;4, the matrix which represents f with respect to some basis is reducible to diagonal form and hence is similar to a diagonal matrix. Apart from the order of the diagonal elements, this diagonal matrix is uniquely determined by f.

We will now investigate the situation in the case of an arbitrary endomorphism, to see if there are any similar properties and also to see if every square matrix is similar to some matrix which has a simple canonical form.

14.1 Invariant Subspaces

14.1.1 Vector Spaces over an Arbitrary Field

So far we have considered real and complex vector spaces. The most important property of the real and complex numbers which we have used is the possibility of adding and multiplying them together (i.e., the existence of two binary operations on the set S of scalars) subject to the following axioms.

1. (α + β) + γ = α + (β + γ). Addition is associative.
2. α + β = β + α. Addition is commutative.
3. There exists an element 0 ∈ S such that 0 + α = α for all α ∈ S.
4. Corresponding to each element α ∈ S, there is an element −α ∈ S such that α + (−α) = 0.
5. (αβ)γ = α(βγ). Multiplication is associative.
6. αβ = βα. Multiplication is commutative.
7. There exists an element 1 ∈ S such that 1α = α for all α ∈ S.


14.1 : INVARIANT SUBSPACES

8. Corresponding to each element α (≠ 0) ∈ S, there is an element α⁻¹ ∈ S such that αα⁻¹ = 1.
9. α(β + γ) = αβ + αγ. Multiplication is distributive over addition.
10. 1 ≠ 0.

An algebraic structure which has two binary operations satisfying axioms 1, ..., 10 is known as a field. The sets of real and of complex numbers with their usual addition and multiplication are two examples of fields, but of course there are infinitely many other fields, of which one well-known example is the field of rational numbers.

We can now redefine the concept of a vector space, replacing the fields of real or of complex numbers by an arbitrary field F. We will then speak of a vector space over the field F, and refer to F as the ground field of the vector space. The elements of the field F are the scalars for all vector spaces over F. A great many of the previous results are also valid for vector spaces over an arbitrary field F. In this last chapter, we will deal basically with vector spaces over arbitrary fields, because it is only in this case that the theory attains its true value, while the main results take on a very special form when applied to real or complex vector spaces.

14.1.2 Polynomials

A polynomial over a field F is an expression of the form

f = α0 + α1 ξ + ... + α_n ξ^n = Σ_{k=0}^{n} α_k ξ^k.    (1)

If α_n ≠ 0, then f is said to be of degree n. If g = Σ_{k=0}^{m} β_k ξ^k is a second polynomial over F (with n ≥ m, say), then the sum of f and g is defined by

f + g = Σ_{k=0}^{n} (α_k + β_k) ξ^k    (2)

where β_{m+1} = ... = β_n = 0. The product of f and g is defined by

fg = Σ_{k=0}^{n+m} ( Σ_{i+l=k} α_i β_l ) ξ^k.    (3)

CH. 14: INVARIANT SUBSPACES

If f and g have degrees n and m then fg has degree (m+n).
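Definitions (2) and (3) translate directly into code. A sketch representing a polynomial by the list of its coefficients α0, α1, ..., α_n:

```python
def poly_add(f, g):
    # definition (2): pad the shorter coefficient list with zeros
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    # definition (3): the coefficient of xi^k is the sum of
    # alpha_i * beta_l over all i + l = k
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for l, b in enumerate(g):
            prod[i + l] += a * b
    return prod

f = [1, 2]       # 1 + 2 xi   (degree 1)
g = [3, 0, 1]    # 3 + xi^2   (degree 2)
print(poly_add(f, g))  # [4, 2, 1]
print(poly_mul(f, g))  # [3, 6, 1, 2], of degree 1 + 2 = 3
```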

It is easy to verify that these operations of addition and multiplication of the polynomials over F satisfy the axioms 1, ..., 6 and 9 in the definition of a field (cf. 14.1.1). Thus the polynomials over a field F form an algebraic structure which is known as a commutative ring. (Fields are a special type of commutative ring.) We refer to this ring as the ring of polynomials over the field F. The elements α ∈ F can be considered as polynomials over F, in that they are the polynomials of degree 0 (except α = 0, which has no degree). If we apply definition (3) to this special case, then we obtain the following rule for the multiplication of the polynomial (1) by an element α ∈ F:

αf = Σ_{k=0}^{n} (αα_k) ξ^k.    (4)

Division in the ring of polynomials over the field F has many properties which are similar to the properties of division in the ring of integers. We will now set out the most important of these, but we will not include their proofs, which may be found in [24], chapter 3.

The polynomial f is said to divide the polynomial g, in symbols f|g, if there is a polynomial h such that g = fh. If f|g and f is neither equal to αg (α ∈ F) nor of degree 0, then f is said to be a proper divisor of g. Every polynomial of degree 0 (i.e., every α ∈ F, α ≠ 0) is a divisor of every polynomial. If f|g, then g will also be referred to as a multiple of f.

A greatest common divisor (g.c.d.) of two polynomials f and g is any polynomial h which has the following two properties.

1. h|f and h|g.
2. Every common divisor of f and g is also a divisor of h.

Apart from a factor α ∈ F, the g.c.d. h of f and g is uniquely determined by f and g, and there exist polynomials f1 and g1 such that

h = f1 f + g1 g.    (5)

If 1 is a g.c.d. of f and g, then f and g are said to be relatively prime.
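A g.c.d. of two polynomials can be computed just as for integers, by the Euclidean algorithm built on polynomial long division (an extended version of the same loop also produces the polynomials f1 and g1 of (5)). A sketch over the field of rational numbers:

```python
from fractions import Fraction

def degree(f):
    # index of the last non-zero coefficient; None for the zero polynomial
    d = None
    for k, a in enumerate(f):
        if a != 0:
            d = k
    return d

def poly_divmod(f, g):
    # long division: f = q*g + r with deg r < deg g
    q = [Fraction(0)] * len(f)
    r = [Fraction(c) for c in f]
    dg = degree(g)
    while degree(r) is not None and degree(r) >= dg:
        dr = degree(r)
        c = r[dr] / g[dg]
        q[dr - dg] = c
        for k in range(dg + 1):
            r[k + dr - dg] -= c * g[k]
    return q, r

def poly_gcd(f, g):
    # Euclidean algorithm; the result is determined up to a factor from F
    while degree(g) is not None:
        _, r = poly_divmod(f, g)
        f, g = g, r
    return f

h = poly_gcd([-1, 0, 1], [1, 2, 1])   # gcd of xi^2 - 1 and (xi + 1)^2
print(h)                              # a scalar multiple of 1 + xi
```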

A least common multiple (l.c.m.) of two polynomials f and g is any polynomial h such that

1. f|h and g|h.
2. Every common multiple of f and g is also a multiple of h.

Apart from a factor α ∈ F, the l.c.m. h of f and g is also uniquely determined by f and g.

The polynomials which correspond to the prime integers are the irreducible polynomials. A polynomial is said to be irreducible if it has no proper divisors and is not of degree 0. If f is irreducible, then so is αf for all α ∈ F, α ≠ 0.



Every polynomial has a representation as a product of irreducible polynomials which is unique in the following sense. If

f = p1 p2 ··· p_r = q1 q2 ··· q_s

are two such representations, then r = s and, with a suitable renumbering, p_k is identical with q_k except for a factor α_k ∈ F (k = 1, ..., r). We see that the polynomials of degree 0 play the same part in the ring of polynomials as the two numbers ±1 in the ring of integers. They are known as the units of the ring. If f is monic, and the irreducible factors p1, ..., p_r are also monic, then these are uniquely determined by f.

An ideal in the ring P of polynomials over the field F is a subset J of P with the following properties.

1. With respect to addition, J is a subgroup of P (i.e., J ≠ ∅ and, if f, g ∈ J, then f − g ∈ J; see [24], p. 372).
2. If f ∈ J and g ∈ P, then fg ∈ J.

If an ideal J of P does not consist only of the zero polynomial (the zero ideal), then it contains a polynomial f0 ≠ 0 of minimum degree. Apart from a factor α ∈ F, this is uniquely determined by J. Because, if g0 ≠ 0 is a second polynomial of minimum degree in J, then, by (5), the g.c.d. h of f0 and g0 is also in J and therefore has the same degree as f0 and g0. It follows that f0 and g0 only differ by a factor α ∈ F. Now, if f ∈ J, then again by (5), the g.c.d. of f and f0 is also in J and hence f0|f. Thus every polynomial f ∈ J is a multiple of f0, i.e., f0 generates the ideal J. An ideal which is generated by a single element is generally referred to as a principal ideal. Thus every ideal in the ring of polynomials over a field F is principal.

14.1.3 Minimal Polynomials and Annihilators

Now let E be a vector space of finite dimension n over a field F, let f be an endomorphism of E, and let f be the polynomial (1) over F. By 5.3.1,

f_f = Σ_{k=0}^{n} α_k f^k    (6)

is again an endomorphism of E (f^k means the product of k factors each equal to f, and α0 f^0 is the identity endomorphism multiplied by α0). The endomorphism f_f is constructed by substituting f for the indeterminate ξ in the polynomial f. If g is a second polynomial over F, then we can also form g_f, and we have

(f + g)_f = f_f + g_f,   (fg)_f = f_f g_f.    (7)


The polynomial f = 1 corresponds to the identity endomorphism and the polynomial f = 0 corresponds to the zero endomorphism. Now (7) means that the mapping f → f_f is a homomorphic mapping of the ring of polynomials into the ring of endomorphisms of E (see 5.3.1).

Definition 1. A subspace H of E is said to be 'f-invariant' if f(H) is contained in H, i.e., if H is mapped into itself by the endomorphism f.

If H is f-invariant, then it is easy to see that f_f(H) ⊆ H for all polynomials f over F.

Now, if H is an f-invariant subspace of E and L is an arbitrary subspace of E, then we consider the set J ⊆ P of all polynomials f over F such that f_f(L) ⊆ H. Then this set J is an ideal. Because, if f ∈ J and g ∈ J, then (f − g)_f(L) ⊆ f_f(L) − g_f(L) ⊆ H. Further, if f ∈ J and g ∈ P, then (gf)_f(L) = g_f(f_f(L)) ⊆ g_f(H) ⊆ H. Finally 0 ∈ J and therefore J ≠ ∅. This ideal J is not the zero ideal. Because, by Theorem 5.2;11, the n² + 1 endomorphisms f^0 = e, f, f², ..., f^{n²} are linearly dependent, which means that there is a polynomial f ≠ 0 such that f_f = 0. Hence f_f(L) ⊆ H and therefore f ∈ J. The polynomial f0 ≠ 0 which generates J is the one which is uniquely

determined (apart from a factor α ∈ F) as the polynomial with the least degree such that (f0)_f(L) ⊆ H. We now consider the following two special cases.

1. L = E, H = {0}. Then f0 is the unique (apart from a factor α ∈ F) polynomial of the least degree such that (f0)_f = 0 (the zero endomorphism).

2. Let x ∈ E. If we put L = L(x) and H = {0}, then f0 is the unique (apart from a factor α ∈ F) polynomial of the least degree such that (f0)_f(x) = 0.

Definition 2. The 'minimal polynomial' of the endomorphism f is the unique monic polynomial m of the least degree such that m_f = 0. The 'f-annihilator' of a vector x ∈ E is the unique monic polynomial h of the least degree such that h_f(x) = 0. The f-annihilator of x = 0 is the polynomial 1.

Theorem 1. If s is the degree of the f-annihilator h of x ∈ E and t ≥ s, then the vectors x, f(x), ..., f^{s−1}(x) are linearly independent, and the vectors x, f(x), ..., f^{s−1}(x), f^t(x) are linearly dependent.

Proof. 1. If the first set of vectors were linearly dependent, then there would be a polynomial k of degree < s such that k_f(x) = 0, and this contradicts the definition of h.

2. If t = s, the second part of the theorem follows from the fact that h_f(x) = 0 expresses f^s(x) as a linear combination of x, ..., f^{s−1}(x). Now suppose that this part of the theorem is also true for some t ≥ s. Then f^t(x) ∈ L(x, ..., f^{s−1}(x)). Therefore

f^{t+1}(x) ∈ L(f(x), ..., f^s(x)) ⊆ L(x, ..., f^{s−1}(x)),

so that the assertion is also true for t + 1 and hence by induction for all t.
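Theorem 1 also shows how to compute the f-annihilator in practice: apply f repeatedly to x and stop at the first iterate that is linearly dependent on its predecessors; the dependence coefficients give h. A sketch over the rationals, with f given by a matrix (the rotation matrix below is only an illustration):

```python
from fractions import Fraction

def matvec(a, x):
    return [sum(Fraction(r) * c for r, c in zip(row, x)) for row in a]

def express(v, basis):
    # solve sum_k c_k basis[k] = v by Gauss-Jordan elimination over Q;
    # returns the coefficients, or None if v is independent of basis
    n, cols = len(v), len(basis)
    m = [[basis[k][i] for k in range(cols)] + [v[i]] for i in range(n)]
    pivots, row = [], 0
    for col in range(cols):
        pr = next((r for r in range(row, n) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[row], m[pr] = m[pr], m[row]
        for r in range(n):
            if r != row and m[r][col] != 0:
                f = m[r][col] / m[row][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[row])]
        pivots.append((row, col))
        row += 1
    if any(m[r][cols] != 0 for r in range(row, n)):
        return None                       # inconsistent: v is independent
    c = [Fraction(0)] * cols
    for r, col in pivots:
        c[col] = m[r][cols] / m[r][col]
    return c

def annihilator(a, x):
    # the monic h of least degree with h_f(x) = 0 (for x != 0)
    iterates = [[Fraction(c) for c in x]]
    while True:
        v = matvec(a, iterates[-1])
        coeffs = express(v, iterates)
        if coeffs is not None:
            # f^s(x) = sum_k coeffs[k] f^k(x): h = xi^s - sum_k coeffs[k] xi^k
            return [-c for c in coeffs] + [Fraction(1)]
        iterates.append(v)

A = [[0, -1],
     [1, 0]]                              # rotation through a right angle
print(annihilator(A, [1, 0]))             # coefficients of 1 + xi^2
```

The loop always terminates: by the dependence of e, f, f², ..., f^{n²} noted above (or already by Theorem 1), a dependence must occur after at most dim E steps.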

14.1.4 Invariant Subspaces

Theorem 2. If f is a polynomial over the field F and f is an endomorphism of E, then ker f_f is an f-invariant subspace of E. (ker f_f denotes the kernel of the endomorphism f_f; see Definition 5.1;2.)

Proof. If x ∈ ker f_f, then f_f(x) = 0 and hence

f_f(f(x)) = f(f_f(x)) = f(0) = 0,

i.e., f(x) ∈ ker f_f.

We note that in general not all of the f-invariant subspaces can be found in this way; for example, when f = e (the identity endomorphism), ker f_e

is either {0} or E depending on the polynomial f, whereas in fact every subspace of E is e-invariant.

Theorem 3. If f|g, then ker f_f ⊆ ker g_f.

Proof. If x ∈ ker f_f, then f_f(x) = 0 and hence, if g = hf,

g_f(x) = h_f(f_f(x)) = 0,

i.e., x ∈ ker g_f.

Theorem 4. If f is a proper divisor of g and g is a divisor of the minimal polynomial m of f, then ker f_f is a proper subset of ker g_f.

Proof. We put m = hg and f1 = hf. Then f1 is a proper divisor of m and therefore deg f1 < deg m. Hence there is an x0 ∈ E such that (f1)_f(x0) ≠ 0. If we put y0 = h_f(x0), then f_f(y0) = (hf)_f(x0) = (f1)_f(x0) ≠ 0 and hence y0 ∉ ker f_f. On the other hand, g_f(y0) = (hg)_f(x0) = m_f(x0) = 0 and hence y0 ∈ ker g_f. The assertion follows from this and Theorem 3.

Theorem 5. If h is a g.c.d. of f and g, then ker h_f = ker f_f ∩ ker g_f.

Proof. 1. From Theorem 3, it follows that ker h_f ⊆ ker f_f ∩ ker g_f.

2. Suppose x ∈ ker f_f ∩ ker g_f, i.e., f_f(x) = g_f(x) = 0. Using (5), it follows that

h_f(x) = (f1)_f(f_f(x)) + (g1)_f(g_f(x)) = 0

and therefore x ∈ ker h_f.


Theorem 6. If h is an l.c.m. of f and g, then ker h_f = ker f_f + ker g_f.

Proof. 1. From Theorem 3, it follows that ker f_f ⊆ ker h_f and ker g_f ⊆ ker h_f, and therefore ker f_f + ker g_f ⊆ ker h_f.

2. Suppose x ∈ ker h_f. If we put h = f1 f = g1 g, then 1 is a g.c.d. of f1 and g1. Therefore by (5) there exist f2, g2 such that f2 f1 + g2 g1 = 1. Putting x1 = (f2)_f (f1)_f(x), we have f_f(x1) = (f2 h)_f(x) = 0 and therefore x1 ∈ ker f_f. Similarly, x2 = (g2)_f (g1)_f(x) ∈ ker g_f. Further, x1 + x2 = e(x) = x. Hence x ∈ ker f_f + ker g_f.

Theorem 7. If f and g are relatively prime, then ker (fg)_f = ker f_f ⊕ ker g_f.

Proof. This follows from Theorems 5 and 6, remembering that 1 is a g.c.d. and fg is an l.c.m. of f and g.

Definition 3. A subspace L of E is said to be 'f-irreducible' if L is f-invariant and there is no decomposition L = L1 ⊕ L2 of L into a direct sum of non-zero f-invariant subspaces.

Theorem 8. If f is an endomorphism of the vector space E of finite dimension n, then E is equal to a direct sum of f-irreducible subspaces.

Proof. 1. For n = 1, the theorem is obvious because E is itself f-irreducible.

2. We therefore use induction on the dimension n. Suppose that the theorem is true for all spaces of dimension less than n, and suppose that dim E = n. There is nothing to prove if E is already f-irreducible. Suppose therefore that there is a decomposition E = L1 ⊕ L2, in which each L_i is f-invariant, each L_i ≠ {0} and therefore dim L_i