ADVANCED CALCULUS
ALLEN DEVINATZ
Northwestern University
HOLT, RINEHART AND WINSTON
New York, Chicago, San Francisco, Atlanta, Dallas, Montreal, Toronto, London
Copyright © 1968 by Holt, Rinehart and Winston, Inc.
All Rights Reserved
Library of Congress Catalog Card Number: 68-18409
2689453
Printed in the United States of America
PREFACE

The contents of this book represent a somewhat expanded version of a one-year course that I have given from time to time since 1961. Those taking the course have been mainly undergraduate and first-year graduate students concentrating in mathematics. Occasionally, students from engineering and the physical sciences have taken the course and have told me they enjoyed it. I recommend that students who select a course such as this should generally have a little more mathematical maturity than that afforded by the usual freshman-sophomore courses in the calculus. One excellent way to gain such maturity is through a beginning course in linear algebra, although the contents of such a course are not a specific prerequisite for the understanding of this book.
Section 1.1 on logic is to be read by the student. Of course, such a
brief introduction is not intended to teach the student the elements
of logic, but rather to make him aware of the formal processes involved
in mathematical reasoning. It may not be too well understood on the
first reading, but if the student will reread it several times during the course, it probably will begin to appear more reasonable. The notation
of the propositional calculus is to be viewed as a concise shorthand for
mathematical statements. My experience has been that students learn to use the notation in a reasonable way in a relatively short time and with very little trouble.
For those instructors who do not wish to spend time on an extended treatment of the real number system, I have arranged matters so that they can begin the discussion of real numbers with Section 1.8. In that section, I have given what amounts to a set of axioms for the real number system, the more usual starting point for a beginning course in analysis.
If, in going through the material, my peers should at times accuse
me of being pedantic, I plead guilty to the charge; my aim in doing this has been deliberate. All too often students beginning the serious
study of mathematics get the idea that a vague or seemingly trivial
point should be waved away. I have tried to convey the idea to the novice
that he should be sure that he can really prove these seemingly trivial
points.
As far as the differential calculus is concerned, there is probably not too much choice in the way one can proceed. As for the integral calculus, I have chosen the more cumbersome and less general method of Riemann-Darboux integration and Jordan content rather than one of the more modern theories of measure and integration. Although I do not feel that a historical approach to a subject is necessarily always the best, in the case of integration my view is that a student cannot fully appreciate or even fully understand the more modern theories until he has seen the gradual and natural evolution of the ideas involved.
I make absolutely no claims to originality. I have no gimmicks or special pedagogical devices as aids in understanding. Mathematics is a difficult subject; I have tried to set down a small but important portion of it in as straightforward, clean, and concise a way as I know how, consistent with the level of student to whom it is addressed. Only the readers can ultimately decide whether or not I have succeeded.
I am grateful to several friends for their help in preparing the manuscript. I am deeply indebted to Sam Lachterman of St. Louis University. He read the entire manuscript, pointed out a large, but finite, number of errors, and showed me how to make several proofs in a shorter and more elegant way. Jacob K. Goldhaber of the University of Maryland read several of the chapters and gave me some excellent advice. Thanks are also due to my former colleagues, Sebastian Koh and A. Edward Nussbaum of Washington University. The former used a preliminary version of the first five chapters in his class and made several suggestions for improvement, while I had several helpful conversations with the latter on the subject matter of the book. Above all I am grateful to the various classes of students who endured varying versions of the course.

Evanston, Illinois
March 1968
A. D.
CONTENTS

Preface

CHAPTER 1 | THE REAL NUMBER SYSTEM
1.1 Some Ideas about Logic
1.2 Sets
1.3 Relations and Functions
1.4 The Natural Numbers
1.5 The Integers and the Rationals
1.6 Countability
1.7 The Reals
1.8 A Review of the Real Number System and Sequences
1.9 Properties of the Reals

CHAPTER 2 | LIMITS
2.1 The Limit Concept and Continuity
2.2 The Heine-Borel Theorem and Uniform Continuity
2.3 Monotone Functions
2.4 Limit Superior and Limit Inferior

CHAPTER 3 | INFINITE SERIES
3.1 Series of Real Numbers
3.2 Convergence Tests
3.3 Decimal Expansions
3.4 Sequences and Series of Functions
3.5 Infinite Products

CHAPTER 4 | DIFFERENTIATION
4.1 The Derivative Concept
4.2 Differentiation Rules
4.3 Mean Value Theorems
4.4 Taylor's Remainder Formulas
4.5 Power Series
4.6 The Weierstrass Approximation Theorem

CHAPTER 5 | INTEGRATION
5.1 Riemann-Darboux Integrals
5.2 Properties and Existence of Riemann-Darboux Integrals
5.3 Improper Integrals
5.4 Riemann-Stieltjes Integrals
5.5 Functions of Bounded Variation and the Existence of Riemann-Stieltjes Integrals

CHAPTER 6 | HIGHER DIMENSIONAL SPACE
6.1 Real Vector Spaces
6.2 Euclidean Spaces
6.3 Topology in Eⁿ
6.4 Continuous Functions
6.5 Linear Transformations
6.6 Determinants
6.7 Function Spaces

CHAPTER 7 | HIGHER DIMENSIONAL DIFFERENTIATION
7.1 Motivation
7.2 Directional Derivatives and Differentials
7.3 Differentiation Rules
7.4 Higher-Order Differentials and Taylor's Theorem
7.5 The Inverse and Implicit Function Theorems
7.6 Maxima and Minima

CHAPTER 8 | HIGHER DIMENSIONAL INTEGRATION
8.1 Riemann-Darboux Integrals
8.2 Jordan Content
8.3 Existence and Properties of Riemann-Darboux Integrals
8.4 Iterated Integration
8.5 The Transformation Theorem for Integrals

CHAPTER 9 | THE INTEGRATION OF DIFFERENTIAL FORMS
I. LINE INTEGRALS
9.1 Motivation and Definitions
9.2 The Length of a Curve
9.3 A Special Case of Stokes' Theorem
9.4 Closed and Exact Differentials
II. SURFACE INTEGRALS
9.5 Motivation and Definitions
9.6 The Algebra of Differential Forms
9.7 Closed and Exact Forms
9.8 Manifolds
9.9 Integration on Manifolds
9.10 Stokes' Theorem

Symbols
Index
2 I THE REAL NUMBER SYSTEM
new statements that are called true. In this sense mathematics is a complicated game and truth has nothing to do with reality (whatever that elusive thing is!) or various concepts of truth discussed by the philosophers. Truth, for us, shall be something prescribed by a set of rules.
To be somewhat more specific, a branch of mathematics is usually constructed in the following way. A small number of statements are written down which are called axioms and these are arbitrarily called true and the letter 't' assigned to them. By means of a given rule, from each true statement a new statement can be formed which is called false and the letter 'f' is assigned to these. Then there are various rules for assigning 't' or 'f' to new statements formed from collections of statements that already have 't' or 'f' values attached to them. This enlarges our collection of statements with 't' or 'f' attached to them. We can then use our rules on this enlarged collection to get a possibly still larger collection of statements having 't' or 'f' values attached to them. We can then apply the rules again to get a possibly still larger collection of statements having 't' or 'f' assigned to them, and so on.
It is always our hope that in starting from the given axioms and applying our rules we will not get a statement that has both letters 't' and 'f' attached to it. If we get a statement with both 't' and 'f' attached to it, we say that our axioms are inconsistent. If this is never the case, we say our axioms are consistent. For a consistent set of axioms, those statements taking on the value 't' are called lemmas, propositions, theorems, and corollaries. It is not always clear which true statements should bear which names. However, current usage seems to suggest the following rules. A theorem is an important true statement. A lemma is a true statement that is used in constructing the proof of another true statement and usually does not have wider applicability. A corollary is a true statement that is an immediate consequence of a true statement. Finally, a proposition is a true statement that is not a lemma or corollary but is not important enough to be called a theorem.
Many people also use the word 'scholium' to play the same role as the word 'proposition' or even possibly to be a true statement that is not as important as a proposition.
We shall now give some rules for forming new statements from given statements A and B that have values 't' or 'f'. That is to say, we shall construct new statements containing the statements A or B or both and give rules for assigning 't' or 'f' to the new statements. We shall do this by means of a truth table, which will list the symbol 't' or 'f' to be given to a new statement given the various combinations of 't' and 'f' values that A and B can take on.

a. Negation -A (To be read: not A.)

    A    -A
    t    f
    f    t

b. Implication A ⇒ B (To be read: A implies B, or if A then B, or B if A, or A only if B, or A is a sufficient condition for B, or B is a necessary condition for A.)

    A    B    A ⇒ B
    t    t    t
    t    f    f
    f    t    t
    f    f    t

c. Conjunction A & B (To be read: A and B.)

    A    B    A & B
    t    t    t
    t    f    f
    f    t    f
    f    f    f

d. Disjunction A ∨ B (To be read: A or B.)

    A    B    A ∨ B
    t    t    t
    t    f    t
    f    t    t
    f    f    f
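The four tables above can be reproduced mechanically. The following is a minimal sketch, assuming that Python's True and False stand in for 't' and 'f'; the function names neg, implies, conj, disj, letter, and table are illustrative choices and do not come from the text.

```python
from itertools import product

# Assumption: True plays the role of 't' and False the role of 'f'.
def neg(a):
    return not a

def implies(a, b):
    # 'f' only when A is 't' and B is 'f', as in table b.
    return (not a) or b

def conj(a, b):
    return a and b

def disj(a, b):
    return a or b

def letter(value):
    """Translate a boolean back into the letter used in the tables."""
    return 't' if value else 'f'

def table(op):
    """Rows (A, B, op(A, B)) in the order tt, tf, ft, ff."""
    return [(letter(a), letter(b), letter(op(a, b)))
            for a, b in product([True, False], repeat=2)]

for name, op in [("A => B", implies), ("A & B", conj), ("A V B", disj)]:
    print(name)
    for row in table(op):
        print("   ", *row)
```

Each printed row lists the values of A, B, and the compound statement, in the same row order as the tables.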
The preceding truth tables give a prescription for the use of the symbols '-', '⇒', '&', and '∨'. As such, we can loosely think of these tables as giving a meaning to statements containing these symbols. We shall try to explain this in more detail.
In our previous discussion we have used the word 'statement' as if this were a well-known concept to the reader. Actually we think the reader has a good idea of this concept, but we shall be pedantic and comment on it further. To form a statement in the written English language, for example, we begin with an alphabet consisting of 52 Latin letters (lower case and capitals), the various punctuation marks, and various other symbols such as parentheses, brackets, and so forth. We may even suppose the alphabet contains a symbol that cannot be seen: an empty space. A statement in the English language, meaningful or not, is a string of these symbols usually placed in a horizontal row, and one has a rule to tell where a statement begins and where it ends. A string of Latin letters that begins and ends with the empty-space symbol and has no empty-space symbol in between is called a word. A string of objects beginning with two empty-space symbols and a capital Latin letter and ending with a period, and having no period in between is called a sentence, and so forth. Statements in mathematics are formed in the same way, that is, by placing symbols in various positions. However, it is usually the case that we form these statements from a different collection of symbols than those that we use for the English language. The symbols '-', '⇒', '&', and '∨' are part of our mathematical alphabet.
Now, what we have described as statements of the written English language cannot be said to constitute the written English language. Most of the strings of symbols that would be written down would not be meaningful. The meaningful statements are those prescribed by means of lists of words contained in dictionaries and by means of the rules of grammar. A moment's reflection is enough to convince us that for someone who does not already know the written English language it would be impossible to describe the rules of grammar or how to use a dictionary in terms of the written language. The various rules must be described in terms of a different language that is understood by the learner. For a child this is usually done by means of a spoken language, and for someone who understands a different written language such as, for example, Hebrew or Sanskrit, the rules of written English can be described in terms of those languages.
The same situation persists with regard to the mathematical language. In the mathematical language we say that the meaningful statements are those which can be assigned a value 't' or 'f'. The rules whereby we describe which mathematical statements are meaningful must be prescribed by a language outside the mathematical language. For us, the describing language is the English language. We are assuming that a truth table is part of the English language, since if we had a mind to do it we could describe these tables in terms of the conventional language. So we see that truth tables are nothing more than rules, written in a language that we presumably understand, which describe which statements in our mathematical language are meaningful. Of course, we have some intuitive ideas of what we want and these prescriptions
of the truth tables are nothing more than formalizations of these intuitive ideas.
Let us give an example that illustrates how the truth-table method works. Let us suppose that A, B, and C are statements that can be given 't' or 'f' values. We wish to show that the statement

    [(A ⇒ B) & (B ⇒ C)] ⇒ [A ⇒ C]

always has a 't' value regardless of the values taken on by A, B, and C. To get the table to fit on one page, let us set 'D' for 'A ⇒ B', 'E' for 'B ⇒ C', 'F' for '(A ⇒ B) & (B ⇒ C)', and 'G' for 'A ⇒ C'. The table looks as follows.

    A    B    C    D    E    F    G    F ⇒ G
    t    t    t    t    t    t    t    t
    t    f    t    f    t    f    t    t
    t    t    f    t    f    f    f    t
    t    f    f    f    t    f    f    t
    f    t    t    t    t    t    t    t
    f    t    f    t    f    f    t    t
    f    f    t    t    t    t    t    t
    f    f    f    t    t    t    t    t
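The eight rows of this table can also be generated and checked by a short program. The sketch below assumes the same encoding of 't' and 'f' as Python booleans; the helper names implies, is_tautology, syllogism, and converse are hypothetical and serve only to illustrate the truth-table method.

```python
from itertools import product

def implies(a, b):
    # Truth table of implication: false only for a true, b false.
    return (not a) or b

def is_tautology(statement, n_vars):
    """True when the statement has the 't' value for every assignment."""
    return all(statement(*values)
               for values in product([True, False], repeat=n_vars))

def syllogism(a, b, c):
    # [(A => B) & (B => C)] => [A => C]
    return implies(implies(a, b) and implies(b, c), implies(a, c))

def converse(a, b):
    # (A => B) => (B => A), included as a statement that is NOT always 't'.
    return implies(implies(a, b), implies(b, a))

print(is_tautology(syllogism, 3))
print(is_tautology(converse, 2))
```

The second statement is included only for contrast: it takes the 'f' value when A is 'f' and B is 't', so the check fails for it.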
Since the last column always has the 't' value we have shown what we set out to show.
Once we have given the basic symbols of our mathematical language we can define new symbols in terms of our basic symbols. As a case in point we can define the equivalence symbol '⇔'. When we write A ⇔ B, this is to be read: A is equivalent with B. It is sometimes also read: A if and only if B. The set of symbols 'A ⇔ B' is defined as another representation for the statement (A ⇒ B) & (B ⇒ A), which is in terms of the basic symbols of our mathematical language. Once 'A ⇔ B' has been defined, we can consider it as the name of a statement. It is easily seen that the statement A ⇔ B has the value 't' attached to it whenever A and B both have the 't' value or whenever A and B both have the 'f' value. Otherwise A ⇔ B has the 'f' value attached to it.
In the above paragraph we have used a short symbol to replace a more cumbersome one. This is, in essence, the nature of a definition. A definition gives a (usually shorter) new name to something that can be described in terms of known symbols or names. The object, as in any other language, is efficiency in expression, which leads to efficiency in thought. The criterion for a definition is that it should only introduce new symbols or names for groups of known symbols and we should not be able to obtain any true statements by use of the definition that could not be obtained without it. In other words, we should think of definitions as simply introducing a system of shorthand into the mathematical language.
Suppose now that we have a set of statements H₁, H₂, ..., Hₙ which are meaningful in the sense that they have 't' or 'f' values attached to them. In addition, we suppose that there are statements A₁, ..., Aₘ so that each Hₖ is composed of some of the Aᵢ together with the logical symbols '-', '⇒', etc. It may not, in general, be known whether the Aᵢ are meaningful. Further, suppose C is a statement that is composed of some of the Aᵢ and the logical symbols, and we don't know in general whether C is meaningful. However, suppose that under the supposition that A₁, ..., Aₘ can be given 't' or 'f' values we find by the use of our truth-table rules that

    H₁ & H₂ & ... & Hₙ ⇒ C

always has a 't' value. Then, if all the Hₖ have the 't' value we shall give C a 't' value. This is a new rule for giving statements 't' or 'f' values and is usually called the rule of inference. Our hope here, as with the truth-table rules, is that starting from our axioms we cannot also give C an 'f' value, that is, -C cannot be given a 't' value by the scheme that has been outlined.
As an example of this new rule, suppose A₁ ⇒ A₂ and A₂ ⇒ A₃ are axioms and therefore have 't' values, even though we don't know whether A₁, A₂, and A₃ can be given 't' or 'f' values. However, under the supposition that they are meaningful we have established by a truth table that

    [(A₁ ⇒ A₂) & (A₂ ⇒ A₃)] ⇒ [A₁ ⇒ A₃]

always has a 't' value. Hence we would give A₁ ⇒ A₃ a 't' value. Another example is

    [A₁ & (A₁ ⇒ A₂)] ⇒ A₂.

If A₁ and A₁ ⇒ A₂ have 't' values, we would assign the 't' value to A₂.
In case a statement can be given a 't' value by means of the rules we have prescribed, then the statement is said to be derived or proved from the axioms. Suppose a statement C has a 't' value and it is obtained by our rules through an implication

    H₁ & H₂ & ... & Hₙ ⇒ C.

Each Hₖ is in turn either an axiom or obtained through an implication of a conjunction of other statements, and so forth, until we finally get back to where all the statements appearing in the conjunction on the left are axioms. The collection of all such statements is called the derivation or proof of C. However, it would be quite impractical to list all these statements beginning with the axioms and, as the reader well knows
from experience, in practice a proof usually consists of just a portion of this collection. In other words, the proof starts with known true statements that have been proved elsewhere and proceeds from there.
The set of rules we have given above is usually called the propositional calculus or sometimes a model for the propositional calculus. However, the symbols we introduced, and the rules for their use, are not rich enough to provide an adequate basis for most of the discourse of mathematics. Hence we shall introduce some new symbols together with rules for their use which in formal studies of logic is called the predicate calculus. A good deal of mathematics is described in the language of the predicate calculus, usually at an informal level, since a very formal approach gets very cumbersome and may often interfere with understanding. However, there are some situations in which a formal approach may very much clarify and facilitate the handling of complicated situations. We have in mind the precise statements of complicated definitions and, in particular, the negating of complicated statements.
Suppose that Q(x) is a statement that depends on a variable 'x'. The reader may think of a variable as the name of an unspecified object that can be replaced by any member of a specified set. The counterpart of a variable in the English language is a pronoun or a common noun. Since x is not a specified object, in general it would make no sense to associate a 't' or 'f' value with Q(x). However, by adding certain qualifying statements to Q(x) it may be possible to do so. One of these qualifying statements is 'for every x,' which in symbols is '(x)' or '∀x.' We may then write down a statement:

    (x)(Q(x))    or    ∀x Q(x).

This is to be read: For every x the statement Q(x) is true. Another qualifying statement is 'there exists an x', which in symbols is '(∃x).' We may then write down another statement:

    (∃x)(Q(x)).

This is to be read: There exists an x such that Q(x) is true. It may now be possible to attach 't' or 'f' values to these statements. The symbols '(x)' and '∀x' are called universal quantifiers and the symbol '(∃x)' is called an existential quantifier.
It is also possible that we may have a statement Q(x, y) which depends on two variables and we may write down a statement:

    (x)(y)(Q(x, y)).

This is to be read: For every x and for every y the statement Q(x, y) is true. We may also write down a statement:

    (x)(∃y)(Q(x, y)).
This statement is to be read: For every x there exists a y such that Q(x, y) is true. Clearly, the statements

    (x)(∃y)(Q(x, y))    and    (∃y)(x)(Q(x, y))

are different statements simply because the symbols are placed in a different order. However, even intuitively they cannot be considered equivalent. In the second case y is independent of which x is chosen, whereas in the first case y may depend on x. As an example, suppose x and y may be replaced by real numbers. We may then write the true statement:

(x)(∃y)(x
(x)(-Q(x)).
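The effect of quantifier order, and the way negation interchanges the two kinds of quantifier, can be tried out on a finite set. This is only a sketch under stated assumptions: the five-element DOMAIN and the predicates q and p are hypothetical and do not appear in the text, and Python's all and any stand in for the universal and existential quantifiers.

```python
DOMAIN = range(5)  # hypothetical finite set replacing the real numbers

def q(x, y):
    # Hypothetical predicate: y is the successor of x modulo 5.
    return y == (x + 1) % 5

# (x)(∃y)(Q(x, y)): the y that works may depend on x.
forall_exists = all(any(q(x, y) for y in DOMAIN) for x in DOMAIN)

# (∃y)(x)(Q(x, y)): one single y must work for every x.
exists_forall = any(all(q(x, y) for x in DOMAIN) for y in DOMAIN)

def p(x):
    # Hypothetical one-variable predicate used for the negation checks.
    return x < 3

# -(x)(P(x)) has the same value as (∃x)(-P(x)).
negated_universal_agrees = (not all(p(x) for x in DOMAIN)) == any(not p(x) for x in DOMAIN)

# -(∃x)(P(x)) has the same value as (x)(-P(x)).
negated_existential_agrees = (not any(p(x) for x in DOMAIN)) == all(not p(x) for x in DOMAIN)

print(forall_exists, exists_forall)
print(negated_universal_agrees, negated_existential_agrees)
```

With this choice of q the first quantified statement holds and the second fails, so the two orders are not interchangeable.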
Using these rules we can negate more complicated expressions. For example,

    -(x₁)(x₂)(∃x₃)(∃x₄)(x₅)(Q(x₁, ..., x₅))

is equivalent to

    (∃x₁)(∃x₂)(x₃)(x₄)(∃x₅)(-Q(x₁, ..., x₅)).

This is done by negating one at a time; that is, we consider

    Q₁(x₁) = (x₂)(∃x₃)(∃x₄)(x₅)(Q(x₁, ..., x₅)),

and then -(x₁)(Q₁(x₁)) is equivalent to (∃x₁)(-Q₁(x₁)), and continue on in this way.
Now, let us go on to some of the other rules. Usually it is necessary to start a chain of proof by means of known true statements that involve quantifiers. A broad description of the rules is as follows. First, have a consistent way of removing the quantifiers from the known true statements. Next manipulate the resulting quantifier-free statements by the rules of the propositional calculus. Finally, have a consistent way of replacing the quantifiers. The final quantified statement can be given a 't' value provided we have used all the rules correctly.
We believe that the rules of manipulating with quantifiers are best explained by means of examples. Let us first look at a simple situation involving only universal quantifiers. Suppose P(x), Q(x), and R(x) are statements involving the variable 'x', and it is known that the statements

    (x)(P(x) ⇒ Q(x)),    (x)(Q(x) ⇒ R(x)),

are true, that is, have a 't' value attached to them. It would seem natural, at least from the rules given for the propositional calculus, that we should be able to conclude that the statement

    (x)(P(x) ⇒ R(x))

has a 't' value attached to it. The rule here is to remove the universal quantifiers to get the statements

    P(x) ⇒ Q(x),    Q(x) ⇒ R(x).

Consider each of these two statements to have the 't' value attached to it even though with the variable 'x' the statements may have no meaning. However, we are thinking that if we replace x by any member of a given set, then the statements will be true. By the methods of the propositional calculus we conclude that

    {[P(x) ⇒ Q(x)] & [Q(x) ⇒ R(x)]} ⇒ {P(x) ⇒ R(x)}

is a true statement. Consequently, by the rule of inference we conclude that

    P(x) ⇒ R(x)

is a true statement. The rule now is to add the universal quantifier and get the statement

    (x)(P(x) ⇒ R(x)).
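On a finite set the conclusion just reached can be confirmed by brute force. The sketch below is an illustration and not part of the formal rules: it assumes a hypothetical three-element DOMAIN, represents each of P, Q, and R as a table of truth values, and tests every possible choice of the three predicates.

```python
from itertools import product

DOMAIN = [0, 1, 2]  # hypothetical three-element set

def implies(a, b):
    return (not a) or b

def forall(pred):
    """Universal quantifier over DOMAIN."""
    return all(pred(x) for x in DOMAIN)

def all_predicates():
    """Every predicate on DOMAIN, as a dict mapping x to a truth value."""
    return [dict(zip(DOMAIN, values))
            for values in product([True, False], repeat=len(DOMAIN))]

def chain_rule_holds():
    """Check that (x)(P => Q) and (x)(Q => R) never hold while (x)(P => R) fails."""
    for P, Q, R in product(all_predicates(), repeat=3):
        hyp1 = forall(lambda x: implies(P[x], Q[x]))
        hyp2 = forall(lambda x: implies(Q[x], R[x]))
        conclusion = forall(lambda x: implies(P[x], R[x]))
        if hyp1 and hyp2 and not conclusion:
            return False
    return True

print(chain_rule_holds())
```

There are 8 predicates on a three-element set, so the loop examines 512 triples. An exhaustive check of this kind says nothing about infinite sets, which is why the text relies on rules for removing and reinstating quantifiers.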
Let us now look at a simple situation that involves an existential quantifier. Suppose it is known that the statements

    (x)(P(x) ⇒ Q(x)),    (∃x)(P(x))

are true. It would seem reasonable that the rules we give should lead to the conclusion that the statement (∃x)(Q(x)) is true. Now, the statement (∃x)(P(x)) has the intuitive meaning that P(x) is true when x is replaced by only certain members (possibly only one) of a specified set. Hence the rule we now formulate is that when an existential quantifier is removed, the variable 'x' shall be replaced by a symbol that stands for a definite but unspecified member of some set. This is in accordance with the conventions used in ordinary mathematical discourse. For the purpose of this discussion let us use the beginning letters of the Latin alphabet to stand for these definite but unspecified symbols. Consequently, remove the existential quantifier to obtain the statement P(a). If we remove the universal quantifier from the statement (x)(P(x) ⇒ Q(x)) to get the statement P(x) ⇒ Q(x), then there would be no way for us to proceed. For the statement

    {P(a) & [P(x) ⇒ Q(x)]} ⇒ Q(a)

is not a true statement according to truth-table methods. But

    {P(a) & [P(a) ⇒ Q(a)]} ⇒ Q(a)

is a true statement, and by the rule of inference we conclude that Q(a) is a true statement. Hence we adopt the rule that whenever a universal quantifier is removed, we may retain the variable 'x' or replace it by one of the letters 'a', 'b', and so on, which stand for definite but unspecified objects. The tactics depend on just what is intended to be accomplished. Once we have established that Q(a) is true, we want to reinstate a quantifier. The rule is that whenever the letters 'a', 'b', and so on, appear in our statements, they can be quantified by existential quantifiers. Hence we get that the statement (∃x)(Q(x)) is to be given a 't' value.
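The steps of this example can be traced on a small concrete case. This is a sketch under assumptions: the four-element DOMAIN and the predicates P ("x is even") and Q ("x is at least 2") are hypothetical choices made only so that both hypotheses hold.

```python
DOMAIN = [1, 2, 3, 4]  # hypothetical specified set

def P(x):
    return x % 2 == 0   # hypothetical: "x is even"

def Q(x):
    return x >= 2       # hypothetical: "x is at least 2"

# The two known statements: (x)(P(x) => Q(x)) and (∃x)(P(x)).
hyp_universal = all((not P(x)) or Q(x) for x in DOMAIN)
hyp_existential = any(P(x) for x in DOMAIN)

# Removing the existential quantifier: 'a' names a definite member with P(a).
a = next(x for x in DOMAIN if P(x))

# Removing the universal quantifier with x replaced by a gives P(a) => Q(a),
# and the rule of inference then yields Q(a).
q_of_a = Q(a)

# Reinstating a quantifier: the existential one is justified ...
concl_exists = any(Q(x) for x in DOMAIN)
# ... whereas a universal quantifier would claim too much here.
concl_forall = all(Q(x) for x in DOMAIN)

print(hyp_universal, hyp_existential, q_of_a, concl_exists, concl_forall)
```

In this instance Q(1) fails, so (x)(Q(x)) is false even though both hypotheses and (∃x)(Q(x)) are true.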
In removing existential quantifiers, the rule we made is that the letters usually reserved for variables are replaced by other letters. This is done to serve as a warning that, when we reach the point where we want to reinstate quantifiers, we should not add a universal quantifier where we should have added an existential quantifier. As an example of the type of difficulty that could arise if we did not follow this procedure, let us consider the statements: (x)(x< 1�x+1xy ¥ x). (x)(3y)(-[x ¥ 0 &xy ¥ l]).
1.2 SETS

We shall adopt the naive, intuitive point of view that a set is a collection of objects without questioning what these words mean. The term 'b ∈ B' is to be read: b is an element of the set B. Often a set will be specified by some descriptive property. For example, suppose that Q(x) is a sentence containing a variable 'x'. Then we can form the class of all objects that when their names are substituted for 'x' make the sentence Q true. We denote this by means of the term '{x: Q(x)}', and this is to be read: The collection of all x such that Q(x) is true. As in the situation for quantifiers, x is understood to vary over a specified set. The set consisting of the one element a will be designated by '{a}', the set consisting of the two elements a and b will be designated by '{a, b}', and so on.
Given two elements a ∈ A and b ∈ B we can form the ordered pair (a, b). The reason we use the word 'ordered' is that in general (a, b) and (b, a) are not considered the same object. Indeed, the ordered pairs (a, b) and (a₁, b₁) shall be identified if and only if a = a₁ and b = b₁. Recall that the symbol '=' has the meaning that the symbols that stand to the left and right of it are simply different names for the same object and we adopted the rule that different names for the same object may be used interchangeably in any expression.
From two sets A and B we can form a new set, the Cartesian product A × B, which is defined by the equality

    A × B = {(x, y) : x ∈ A & y ∈ B}.    (1.2.1)

We can also form the intersection of two sets defined by the equality

    A ∩ B = {x : x ∈ A & x ∈ B}.    (1.2.2)
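Definitions (1.2.1) and (1.2.2) translate almost symbol for symbol into set comprehensions. The following is a minimal sketch, assuming two small hypothetical sets A and B of integers; Python tuples play the role of ordered pairs.

```python
from itertools import product

A = {1, 2, 3}  # hypothetical example sets
B = {2, 3, 4}

# (1.2.1): A × B = {(x, y) : x ∈ A & y ∈ B}
cartesian = {(x, y) for x in A for y in B}

# (1.2.2): A ∩ B = {x : x ∈ A & x ∈ B}
intersection = {x for x in A if x in B}

# Both agree with the standard library's own constructions.
same_product = cartesian == set(product(A, B))
same_intersection = intersection == (A & B)

# Ordered pairs are identified only when both components agree.
pairs_differ = (1, 2) != (2, 1)

print(sorted(cartesian))
print(sorted(intersection))
```

Here (1, 2) belongs to A × B while (4, 1) does not, since 4 is not an element of A.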
More generally, if 𝒞 is a collection of sets we shall define

    ∩{A : A ∈ 𝒞} = {x : (A)(A ∈ 𝒞 ⇒ x ∈ A)},    (1.2.2')

that is, the collection of all elements each of which belongs to every set in x ∈ B).    (1.2.6)
The term 'A ⊂ B' is to be read: A is contained in B, or A is a subset of B. If A ⊂ B and A ≠ B, A is called a proper subset of B. Two sets are to be identified if and only if they are contained in each other, that is,

    A = B ⇔ (A ⊂ B & B ⊂ A).    (1.2.7)

Since we have taken equality as a primitive logical notion, the equivalence (1.2.7) is to be viewed as an axiom rather than as a definition of equality between sets. For, from the rule we have adopted for the symbol '=', it is a simple matter to prove that

    A = B ⇒ (A ⊂ B & B ⊂ A).

However, the converse implication cannot be proved and in axiomatic set theory it is usually adopted as an axiom, provided equality is taken as a logical notion. Actually in making the definition (1.2.6) and in taking as an axiom (1.2.7) we should have used the universal quantifiers '(A)' and '(B)', so that these statements would refer to all pairs of sets rather than to two particular sets.
In our previous discussion we have introduced a new symbolism, '{x: Q(x)}', which has not been defined in terms of the symbolism of the predicate calculus. Hence we must either define this symbol in terms of the rules of the predicate calculus or else give new rules for operating with statements that contain these symbols. The first method is clearly the preferable one. Hence we take the symbol '{x: Q(x)}' to be a name for that set for which the following statement is true:

    (y)(y ∈ {x : Q(x)} ⇔ Q(y)).

A moment's reflection is enough to convince us that the intuitive meaning of this statement is the same as the intuitive meaning we previously gave to the term '{x: Q(x)}'.
As an example, A ∩ B is to be defined as that set for which the following statement is true:

    (x)(x ∈ A ∩ B ⇔ [x ∈ A & x ∈ B]).    (1.2.8)
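The defining statement for '{x: Q(x)}' can be checked directly when x varies over a finite set. This is a sketch under assumptions: the set UNIVERSE and the property Q below are hypothetical, chosen only to give a concrete instance.

```python
UNIVERSE = range(-5, 6)  # hypothetical specified set over which x varies

def Q(x):
    return x * x < 10    # hypothetical descriptive property

# The set named by '{x: Q(x)}'.
S = {x for x in UNIVERSE if Q(x)}

# (y)(y ∈ {x : Q(x)} ⇔ Q(y)), checked for every y in the universe.
defining_property = all((y in S) == Q(y) for y in UNIVERSE)

print(sorted(S))
print(defining_property)
```

The comparison `(y in S) == Q(y)` is the equivalence of two truth values, so the final line is the quantified statement evaluated over the whole universe.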
Let us prove that the following statement is true:

    A ∩ B ⊂ A.    (1.2.9)

First, removing the universal quantifier from (1.2.8) we get the statement:

    x ∈ A ∩ B ⇔ [x ∈ A & x ∈ B],    (1.2.9')

which for the purpose of applying the rules of the propositional calculus is assumed to have a 't' value. The truth-table method of the propositional calculus tells us that the following statement has a 't' value:

    {x ∈ A ∩ B ⇔ [x ∈ A & x ∈ B]} ⇒ {x ∈ A ∩ B ⇒ [x ∈ A & x ∈ B]}.    (1.2.10)

From (1.2.9'), (1.2.10), and the rule of inference we find that the following statement is true:

    x ∈ A ∩ B ⇒ [x ∈ A & x ∈ B].    (1.2.11)

The rules of the propositional calculus tell us that the following is true [Exercise 2(c) of Section 1.1]:

    [x ∈ A & x ∈ B] ⇒ x ∈ A.    (1.2.12)

Designating the statement (1.2.11) by 'R(x)' and the statement (1.2.12) by 'S(x)', we get the following true statement:

    [R(x) & S(x)] ⇒ [x ∈ A ∩ B ⇒ x ∈ A].    (1.2.13)

Using the rule of inference the following statement has a 't' value:

    x ∈ A ∩ B ⇒ x ∈ A.    (1.2.14)

Adding a universal quantifier we get the following true statement:

    (x)(x ∈ A ∩ B ⇒ x ∈ A).    (1.2.15)

Using the statement (1.2.6) and the rules of the propositional calculus, we arrive at the true statement

    (x)(x ∈ A ∩ B ⇒ x ∈ A) ⇒ A ∩ B ⊂ A.    (1.2.16)

Using the fact that statements (1.2.15) and (1.2.16) are true, by the rule of inference we finally arrive at the true statement:

    A ∩ B ⊂ A.

We have presented above a formal proof of the last statement, being careful to point out at each stage exactly what was being used. Of course, we could have developed a scheme so that the proof would have been more mechanical and the amount of space needed to write it down would have been much less. Nevertheless, we think the reader now sees how cumbersome a formal proof can be, even of the simplest statements. For this pragmatic reason most of the discourse of mathematics is carried on in an informal way. In an informal proof we do not write down all the steps but only those considered to be essential. This is analogous to the situation when in making an arithmetic or algebraic computation we usually do not take cognizance of the fact that we are using, for example, the commutative or associative laws, but suppose these are standard facts which the reader recognizes. For example, the chain of argument leading from (1.2.9') to (1.2.11) or the chain of argument leading from (1.2.11) to (1.2.14) is usually considered a standard argument and would not be mentioned in an informal proof. Of course, just how much is written down is at the discretion of the writer. Usually enough should
be written down so that it would be clear how to make the formal proof if any question should arise about the validity of the informal proof.
As an example of an informal proof let us show the following:

    A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

We have

    x ∈ A ∩ (B ∪ C) ⇔ [x ∈ A & x ∈ B ∪ C].

Also,

    x ∈ B ∪ C ⇔ [x ∈ B ∨ x ∈ C].

Now, it is easily checked by a truth table that

    [(x ∈ A) & (x ∈ B ∨ x ∈ C)] ⇔ [(x ∈ A & x ∈ B) ∨ (x ∈ A & x ∈ C)].

The disjunction on the right is equivalent with

    x ∈ (A ∩ B) ∪ (A ∩ C).

Consequently, we have shown that

    x ∈ A ∩ (B ∪ C) ⇔ x ∈ (A ∩ B) ∪ (A ∩ C),

which gives the equality we are seeking.
Of course, proving such an equality or discovering it may be two different matters. Often the way to discover such an equality is by looking at the Venn diagram. In this case the set in question is shown by the cross-sectioned area in Fig. 1.2.3.
FIGURE 1.2.3
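The truth-table check used in the informal proof above can also be carried out mechanically. The following is only a minimal sketch: the function name `distributes` and the three finite sets are our own illustrative choices, not part of the text. It runs through all eight assignments of truth values and then tests the set equality on one concrete instance.

```python
from itertools import product


def distributes(a: bool, b: bool, c: bool) -> bool:
    """Compare both sides of  a & (b v c)  <=>  (a & b) v (a & c)."""
    return (a and (b or c)) == ((a and b) or (a and c))


# Truth table: every one of the eight assignments of truth values.
truth_table_ok = all(
    distributes(a, b, c) for a, b, c in product([False, True], repeat=3)
)

# The same identity on hypothetical finite sets (chosen only for illustration).
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 6}
left = A & (B | C)           # A n (B u C)
right = (A & B) | (A & C)    # (A n B) u (A n C)
sets_ok = left == right
```

A check on particular sets is of course only an instance; it is the truth table that covers every case.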
The reader may now object that we have not defined the symbol '$\in$' in terms of the symbols of the predicate calculus. This is true, and in a formal development of mathematics it is necessary to give the rules or axioms that prescribe the use of this symbol. The situation is analogous to that of Euclidean geometry, where points and lines are taken as undefined objects and a set of axioms is given that gives the relationships between points and lines. In axiomatic set theory, sets and the symbol '$\in$' are taken as undefined things and a set of axioms is given that will allow us to develop the kind of theory of sets which seems intuitively reasonable to us. These axioms deal mainly with prescribing the conditions under which new sets can be formed from given sets. For example, in axiomatic set theory the facts that $A \cap B$ and $A \times B$ can be taken to be sets are usually given by axioms. In connection with the set $A \times B$, the notion of ordered pair can be defined by use of the axioms.
To try to give a reasonable axiomatic approach to set theory would be too difficult at this stage and would delay our study of the calculus for a long time. Hence, as we mentioned at the beginning of the discussion on sets, we shall suppose that everyone understands what a set is and we shall allow operations on sets and the construction of sets that seem intuitively reasonable. Such a procedure can, on occasion, lead to serious philosophical difficulties, but we shall pretend that they don't exist. Finally, let us remark that it is convenient to consider the set that has no elements. It is defined by the equality
$\emptyset = \{x : x \neq x\}$.
The set $\emptyset$ is called the null set or the empty set or the void set.
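Exercises
1. Draw Venn diagrams for the sets $A \setminus B$, $A^c$, and $A \cap (B \cup C)$. Give a schematic diagram (not a Venn diagram) for a Cartesian product set.
2. Give formal proofs of the following statements:
(a) $A \cup (B \cup C) = (A \cup B) \cup C$.
(b) $A \cap (B \cap C) = (A \cap B) \cap C$.
(c) $(A^c)^c = A$.
(d) $A \cap A^c = \emptyset$.
3. Prove the following:
(a) $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.
(b) $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$.
(c) $A \cap (A \cup B) = A$.
(d) $A \cup (A \cap B) = A$.
4. Prove the following:
(a) $(A \cap B)^c = A^c \cup B^c$.
(b) $(A \cup B)^c = A^c \cap B^c$.
5. Using the results of Exercises 2, 3, and 4, find the complements of the following sets:
(a) $A \cup B \cup C^c$.
(b) $A \cap (B \cup (C \cup D)^c)$.
(c) $(A \cup B^c) \cap (A \cup (B \cap C^c))$.
(d) $\emptyset$.
6. Prove the following by using the results of Exercise 3: If $A \cap B = A \cap C$ and $A \cup B = A \cup C$, then $B = C$.
7. Show the following:
(a) A�B;,,,_A\(A\B).
(b) $A \cap B = A \setminus (A \setminus B)$.
8. If $\mathcal{A}$ is any collection of sets and $B$ is any set, show the following:
(a) $B \cap \bigcup \{A : A \in \mathcal{A}\} = \bigcup \{A \cap B : A \in \mathcal{A}\}$.
(b) $B \cup \bigcap \{A : A \in \mathcal{A}\} = \bigcap \{A \cup B : A \in \mathcal{A}\}$.
9. If $\mathcal{A}$ is any collection of sets and $B$ is any set, show the following:
(a) $B \cup \bigcup \{A : A \in \mathcal{A}\} = \bigcup \{A \cup B : A \in \mathcal{A}\}$.
(b) $B \cap \bigcap \{A : A \in \mathcal{A}\} = \bigcap \{A \cap B : A \in \mathcal{A}\}$.
10. If $\mathcal{A}$ is any collection of sets, show the following:
(a) $(\bigcup \{A : A \in \mathcal{A}\})^c = \bigcap \{A^c : A \in \mathcal{A}\}$.
(b) $(\bigcap \{A : A \in \mathcal{A}\})^c = \bigcup \{A^c : A \in \mathcal{A}\}$.
11. If $\mathcal{A}$ is any collection of sets and $B$ is any set, use the results of the previous three exercises to show the following:
(a) $B \setminus \bigcup \{A : A \in \mathcal{A}\} = \bigcap \{B \setminus A : A \in \mathcal{A}\}$.
(b) $B \setminus \bigcap \{A : A \in \mathcal{A}\} = \bigcup \{B \setminus A : A \in \mathcal{A}\}$.
1.3 RELATIONS AND FUNCTIONS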
The concept of the Cartesian product of two sets leads to the concept of a relation. We shall first give a formal definition and then comment on the meaning.
1.3.1 Definition. A relation is a subset of a Cartesian product set. If $R$ is a relation, the set $\mathcal{D}(R) = \{x : (\exists y)((x, y) \in R)\}$ is called the domain of $R$ and the set $\mathcal{R}(R) = \{y : (\exists x)((x, y) \in R)\}$ is called the range of $R$. The relation defined by $R^{-1} = \{(y, x) : (x, y) \in R\}$ is called the inverse of the relation $R$. If $A$ is any set, then the set $R^{-1}(A) = \{x : (\exists y)(y \in A \;\&\; (x, y) \in R)\}$ is called the inverse image of $A$ under $R$.
An example of a relation is the following. Let $A$ be the set consisting of all men in the United States and $B$ the set of all people in the United States. Let $R$ be the set of all $(x, y) \in A \times B$ so that $x \in A$ and $y$ is a relative of $x$. Since $R$ is a subset of a Cartesian product it is a relation. Note that we also have $R \subset B \times B$. The domain of $R$ is the set of elements which are first members of the ordered pairs that are in $R$. In this case this is $A$.† Suppose $C$ is the subset consisting of those people in $B$ who have at least one living male relative. It is probably true that $C \neq B$. At any rate, $C$ is the set of elements which are the second members of the ordered pairs that are in $R$ and hence is the range of $R$. Note that it is not true that $R = A \times C$, although certainly $R \subset A \times C$.
The situation where to each element in the domain of a relation there corresponds only one element in the range so that the resulting ordered pair is in the relation is of special significance. Such relations are called functions and we now give the formal definition.
1.3.2 Definition. A function $F$ is a relation with the additional property that
$(x)(y)(z)([(x, y) \in F \;\&\; (x, z) \in F] \Longrightarrow y = z)$.
For example, if $A$ and $B$ are the sets given above, then the set of all $(x, y) \in A \times B$ with $x$ a husband and $y$ his legal wife is a function. A more pertinent example of the distinction between a relation and a function is perhaps the following:
$\{(x, y) : x^2 + y^2 = 1\}$ is a relation.
$\{(x, y) : x^2 + y^2 = 1 \text{ and } y \ge 0\}$ is a function.
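For finite relations, the notions of Definitions 1.3.1 and 1.3.2 can be tried out directly on sets of ordered pairs. The sketch below is only illustrative: the function names are hypothetical, and in place of the unit circle we assume the integer points on the circle $x^2 + y^2 = 25$, so that the relation is a finite set we can enumerate.

```python
def domain(R):
    """D(R): the set of first members of the pairs in R."""
    return {x for (x, _) in R}


def range_(R):
    """R(R): the set of second members of the pairs in R."""
    return {y for (_, y) in R}


def inverse(R):
    """The inverse relation {(y, x) : (x, y) in R}."""
    return {(y, x) for (x, y) in R}


def inverse_image(R, A):
    """{x : there is a y in A with (x, y) in R}."""
    return {x for (x, y) in R if y in A}


def is_function(R):
    """True when (x, y) in R and (x, z) in R always force y == z."""
    seen = {}
    for x, y in R:
        if seen.setdefault(x, y) != y:
            return False
    return True


# Assumed stand-in for the circle example: integer points with x^2 + y^2 = 25.
circle = {(x, y) for x in range(-5, 6) for y in range(-5, 6) if x * x + y * y == 25}
upper = {(x, y) for (x, y) in circle if y >= 0}
```

Here `circle` fails the test of Definition 1.3.2, since both (3, 4) and (3, -4) belong to it, while `upper` passes.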
Some people prefer the words multivalued function in place of the word 'relation.'
If $F$ is a function and $(x, y) \in F$, then the usual convention is to denote the second member $y$ by $F(x)$. We shall follow this convenient notation. The element $F(x)$ is called the value of $F$ at $x$, and we also often speak of it as the map of $x$ under $F$. In case $F$ maps distinct elements of its domain into distinct elements of its range the function $F$ is said to be one to one. The formal definition is the following:
1.3.3 Definition. A function $F$ is said to be one to one $\Longleftrightarrow$
$(x)(y)(F(x) = F(y) \Longrightarrow x = y)$.
In the statement above the variables are, of course, understood to represent elements of $\mathcal{D}(F)$. In case $F$ is a one-to-one function, it is clear that $F^{-1}$ is also a function. However, we shall state this as a formal proposition and leave the proof as an exercise.
1.3.4 Proposition. If $F$ is a one-to-one function, then $F^{-1}$ is also a one-to-one function.
Given two or more functions there may be ways of combining these functions to get a new function. We shall give one way here, the composition of two functions. We shall give other ways later.
†We are supposing that every man has a relative.
1.3.5 Definition. If $F$ and $G$ are functions, then $F \circ G$ is that function having domain $\{x : G(x) \in \mathcal{D}(F)\}$ and $\forall x \in \mathcal{D}(F \circ G)$,
$F \circ G(x) = F(G(x))$.
In very formal terms we can write
$F \circ G = \{(x, y) : (\exists z)((x, z) \in G \;\&\; (z, y) \in F)\}$.
By an abuse of language we shall often designate the range of a function $f$ by the symbol
$\{f(x) : x \in \mathcal{D}(f)\}$.
If $A$ is a set, we can define a new function $g$ as that subset of $f$ consisting of those ordered pairs (possibly void) whose first members belong to $A$. We shall often write
$g = f|A$.
If $A \subset \mathcal{D}(f)$, when we write
$f(A) = \{f(x) : x \in A\}$,
we are referring to the range of $g$.
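The formal description of the composition as a set of ordered pairs, and of the restriction of a function to a set, translate almost word for word into operations on finite sets of pairs. The following is a sketch under our own assumptions: the names `compose`, `restrict`, and `is_one_to_one`, and the small functions `F` and `G`, are hypothetical and chosen only for illustration.

```python
def compose(F, G):
    """F o G = {(x, y) : there is a z with (x, z) in G and (z, y) in F}."""
    return {(x, y) for (x, z) in G for (w, y) in F if z == w}


def restrict(f, A):
    """The subset of f whose first members belong to A."""
    return {(x, y) for (x, y) in f if x in A}


def is_one_to_one(F):
    """For a function F: distinct first members give distinct values."""
    values = [y for (_, y) in F]
    return len(values) == len(set(values))


# Hypothetical finite functions, written as sets of ordered pairs.
G = {(1, "a"), (2, "b"), (3, "c")}
F = {("a", 10), ("b", 20)}

# The domain of F o G is {x : G(x) is in the domain of F}, here {1, 2},
# because G(3) = "c" is not in the domain of F.
FG = compose(F, G)
```

Note that `is_one_to_one` presupposes that its argument is already a function in the sense of Definition 1.3.2; it does not test that property itself.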
A function is an important special type of relation. There is another special type of relation, an equivalence relation, which plays an extremely important role in all branches of mathematics.
1.3.6 Definition. A relation $R$ is said to be an equivalence relation if and only if the following are satisfied:
(a) $(x)(y)((x, y) \in R \Longrightarrow (y, x) \in R)$. (symmetric)
(b) $(x)(y)(z)([(x, y) \in R \;\&\; (y, z) \in R] \Longrightarrow (x, z) \in R)$. (transitive)
Many authors prefer to talk about an equivalence relation $R$ on a set $X$ and add the condition
(c) $(x)(x \in X \Longrightarrow (x, x) \in R)$. (reflexive)
The condition (c) simply assures us that $X \subset \mathcal{D}(R)$. In fact, from (a) and (b) it is easy to prove the following:
$(x)(x \in \mathcal{D}(R) \Longrightarrow (x, x) \in R)$.
Indeed, (a) tells us that if $(x, y) \in R$, then $(y, x) \in R$. Hence from (b) we get that
$(x, y) \in R \;\&\; (y, x) \in R \Longrightarrow (x, x) \in R$.
We shall usually denote an equivalence relation by the symbol '$\equiv$' and instead of '$(x, y) \in\ \equiv$' we shall write '$x \equiv y$'. It is not hard to check that it is possible to use the symbol '$\Longleftrightarrow$' to define an equivalence relation in the Cartesian product $X \times X$, where $X$ is the set of all meaningful statements. We shall soon meet other familiar equivalence relations.
1.3.7 Theorem. Let $X$ be a set and $\equiv$ an equivalence relation having domain and range the set $X$. There is a collection $\mathcal{E}$ of subsets of $X$ so that $X = \bigcup \{E : E \in \mathcal{E}\}$, where $\forall E, F \in \mathcal{E}$, $E \neq F \Longrightarrow E \cap F = \emptyset$ and $x, y \in E \Longrightarrow x \equiv y$; $x \equiv y \Longrightarrow \exists E \in \mathcal{E}$ so that $x, y \in E$. (The sets $E \in \mathcal{E}$ are called equivalence classes.)
Proof. For every $x \in X$ let $E(x) = \{y : y \equiv x\}$. Since, as we have shown, $x \in \mathcal{D}(\equiv) \Longrightarrow x \equiv x$, it follows that $x \in E(x)$. For any sets $E(x)$ and $E(y)$ suppose $E(x) \cap E(y) \neq \emptyset$ and let $z \in E(x) \cap E(y)$. We have $z \equiv x$ and for $w \in E(x)$ we have $w \equiv x$. From the symmetry condition (a) we get $x \equiv z$, and thus from the transitivity condition (b) we get $w \equiv x \;\&\; x \equiv z \Longrightarrow w \equiv z$. On the other hand, $z \equiv y$, and hence from (b) $w \equiv z \;\&\; z \equiv y \Longrightarrow w \equiv y$. Hence we have shown that $w \in E(x) \Longrightarrow w \in E(y)$, which means $E(x) \subset E(y)$. By making the same kind of argument for the set $E(y)$ we arrive at the conclusion that $E(y) \subset E(x)$. This shows $E(x) = E(y)$. If we now take $\mathcal{E} = \{E(x) : x \in X\}$, we see that the theorem is proved.
Exercises
1.
Let $f$ be a function and $A, B \subset \mathcal{D}(f)$. Show the following:
(a) $A \subset B \Longrightarrow f(A) \subset f(B)$.
(b) $f(A \cup B) = f(A) \cup f(B)$.
(c) $f(A \setminus B) \subset f(A)$.
(d) $f(A \cap B) \subset f(A) \cap f(B)$.
2. Prove Proposition 1.3.4: The inverse of a one-to-one function is a function, which is also one to one.
3. If $f$ is a one-to-one function, show that $\forall x \in \mathcal{D}(f)$, $f^{-1} \circ f(x) = x$.
4. Let $f$ be a function and $A$ and $B$ subsets of $\mathcal{R}(f)$. Prove the following:
(a) $A \subset B \Longrightarrow f^{-1}(A) \subset f^{-1}(B)$.
(b) $f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B)$.
(c) $f^{-1}(A \setminus B) = f^{-1}(A) \setminus f^{-1}(B)$.
(d) $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$.
5. Give an example which shows that we may not have equality in Exercise 1(d). However, show that if $f$ is a one-to-one function we get equality.
6. Suppose $f$ and $g$ are functions such that $\mathcal{R}(g) \subset \mathcal{D}(f)$ and $\forall x \in \mathcal{D}(g)$, $f \circ g(x) = x$. Show that $g$ is one to one. If, in addition, $\mathcal{R}(f) \subset \mathcal{D}(g)$ and $\forall y \in \mathcal{D}(f)$, $g \circ f(y) = y$, show that $f = g^{-1}$.
7. Define a relation on the set $Z$ of integers by writing $n \equiv m \Longleftrightarrow n - m$ is divisible by 5. Show that this is an equivalence relation. How many equivalence classes are there?
8. For ordered pairs $(x, y)$ and $(u, v)$ of real numbers write $(x, y) \equiv (u, v) \Longleftrightarrow$ there exists a real number $t > 0$ so that $(x, y) = (tu, tv)$. Show that this is an equivalence relation and give a geometric description of the equivalence classes.
9. Suppose $R$ is a relation with the following properties:
($\alpha$) $y \in \mathcal{R}(R) \Longrightarrow (y, y) \in R$.
($\beta$) $(x, y), (z, y) \in R \Longrightarrow (z, x) \in R$.
Prove that $(x, y) \in R \Longrightarrow (y, x) \in R$.
1.4 THE NATURAL NUMBERS
In this section we shall give a set of axioms for the natural numbers and derive some of their more important properties. The proofs we give will be informal, as explained in Section 1.2, and the set theory we shall use will be intuitive.
One may, quite legitimately, ask why we bother to be so formal about the development of the real number system when we are being so informal about logic and set theory. One answer is that the first serious questions about the nature of mathematics arose in connection with the real numbers, first among the ancient Greeks and later again among the nineteenth-century mathematicians. Hence an enormous amount of intellect and energy has been expended in trying to clarify the nature of these objects. Many people seem to feel that between certain limits these efforts have been successful and that a usable system can be obtained from a few psychologically satisfying and clearly stated principles or axioms. Of course, there may be sharp disagreement on just where to start and how far one can go without getting involved in contradictions. We shall start with a set of axioms that are not as minimal and/or perhaps not as intuitively satisfactory as others. However, we feel they are reasonably satisfactory and have the advantage that the development of the real numbers can proceed quite rapidly from them.
The name 'the natural numbers' is given to any set $N$ together with two functions $+$ and $\cdot$ each with domain $N \times N$ and range in $N$ satisfying the following axioms:
(a) $(x)(y)(x + y = y + x)$.
(a') $(x)(y)(x \cdot y = y \cdot x)$. (commutative laws)
(b) $(x)(y)(z)(x + (y + z) = (x + y) + z)$.
(b') $(x)(y)(z)(x \cdot (y \cdot z) = (x \cdot y) \cdot z)$. (associative laws)
(c) $(x)(y)(z)(x \cdot (y + z) = x \cdot y + x \cdot z)$. (distributive law)
(d) $1 \in N \;\&\; (x)(x \cdot 1 = x)$.
(e) For every $x, y$ in $N$, one and only one of the following is possible: (trichotomy law)
(1) $x = y$.
(2) $(\exists z)(x = y + z)$.
(3) $(\exists z)(y = x + z)$.
(f) For every $M \subset N$, the following is true: (induction)
$[1 \in M \;\&\; (x)(x \in M \Longrightarrow x + 1 \in M)] \Longrightarrow M = N$.
Using these axioms it is immediately possible to prove a number of results about the natural numbers $N$. [We are using '$N$' to designate the natural numbers, although strictly speaking we should use the triple '$(N, +, \cdot)$'.] However, let us first make some comments about the previous axioms. The axioms (a) through (c) are of course the familiar ones from arithmetic. The first part of the conjunction of axiom (d) says that $N \neq \emptyset$ and names a particular element. The second part of the axiom states a property for this element. Axiom (e) has been stated rather informally for the sake of clarity. It simply says that we can have one and only one possibility; either $x$ and $y$ are the same, $x$ is greater than $y$, or $x$ is less than $y$. More formally, we could have given this axiom in terms of two axioms:
(e') $(x)(y)(x \neq y \Longleftrightarrow (\exists z)(y = x + z \lor x = y + z))$.
(e'') $(x)(y)((\exists z)(y = x + z) \Longrightarrow\ \sim(\exists z)(x = y + z))$.
The last axiom (f) is often stated in the following way: If $P(x)$ is a statement depending on $x$, then
(f') $[P(1) \;\&\; (x)(P(x) \Longrightarrow P(x + 1))] \Longrightarrow (x)(P(x))$.
This can be translated to our statement (f) by the following device. Set $M = \{x : P(x)\}$; then if (f') is true, (f) is true for $M$, and vice versa.
Let us now give some examples that show how these axioms may be used to obtain other true statements about $N$. Our first statement says that $N$ has no zero element; actually it says more.
1.4.1 Proposition. There is no $x$ and no $y$ in $N$, so that $x + y = x$; that is,
$\sim(\exists x)(\exists y)(x + y = x)$.
Proof. We shall prove this by contradiction. Suppose $(\exists x)(\exists y)(x + y = x)$. This implies
$(\exists x)(\exists y)((x = x) \;\&\; (x + y = x))$,
which contradicts the trichotomy axiom (e).
Our next statement is to the effect that we have a cancellation law in $N$ with respect to multiplication.
1.4.2 Proposition. If $x \cdot z = y \cdot z$, then $x = y$, and vice versa, that is,
$(x)(y)(z)(x \cdot z = y \cdot z \Longleftrightarrow x = y)$.
Proof. The fact that $x = y \Longrightarrow x \cdot z = y \cdot z$ follows from our rules that $x$ and $y$ are different names for the same thing and hence we may use them interchangeably in any expression. Hence we must prove the implication
$(x)(y)(z)(x \cdot z = y \cdot z \Longrightarrow x = y)$.
Suppose this is not true. Then using our rule for negating statements we have
$(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; x \neq y)$. (1.4.1)
By the trichotomy axiom (e) [or (e')]
$x \neq y \Longrightarrow (\exists w)(x = y + w \lor y = x + w)$,
and by the distributive law the latter statement implies
$x \neq y \Longrightarrow (\exists w)(x \cdot z = y \cdot z + w \cdot z \lor y \cdot z = x \cdot z + w \cdot z)$. (1.4.2)
Hence from (1.4.1) and (1.4.2) we get
$[(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; x \neq y)] \Longrightarrow [(\exists x)(\exists y)(\exists z)(x \cdot z = y \cdot z \;\&\; (\exists w)(x \cdot z = y \cdot z + w \cdot z \lor y \cdot z = x \cdot z + w \cdot z))]$,
which contradicts the trichotomy axiom.
1.4.3
Definition (x)(y)(x < y�(3z) (y=x+z)), (x)(y)(x�y�x x+ z� y+ w). Proof.
Exercise.
1.4.5
Proposition (x) (1 �x).
Proof..
Let us set
M= {x: 1.;;: x}.
that is,
1.4
Clearly 1
THE NATURAL NUMBERS I 29
{x)(x+ 1 E M). The latter statement follows from x+ 1 and z = I. Hence {x)(x E M =:::} x+1 E M), and by the principle of induction it follows that M = N. EM
and
Definition l.4.3 by putting y
=
The next statement is to the effect that there is no natural number between two successive natural numbers.
1.4.6
Proposition
-(3x)(3y)(x k C\ &2-(1T n) We =
to be the smallest element in
1Tn+i (n+ 1)
•
1T(j) < 1T(k) and if l E C and I� 1T(n) for some l E gi( 1T). To prove these statements we first note that if
property thatj
S/4.
If we now call the isomorphic image of $r$ in $R$ by the same name, we have proved the lemma.
1.7.15 Theorem. $R^+$ is Archimedian-ordered in the sense that
$(x)(y)(x, y \in R^+ \Longrightarrow (\exists n)(n \in N \;\&\; x \le ny))$.
Proof. By the previous lemma $\exists r, s \in Q^+$, so that $0 < r < y$ and $x < s < x + 1$. Since $Q^+$ is Archimedian-ordered (Exercise 13 of Section 1.5), $\exists n \in N$, so that $x < s \le nr < ny$.
1.7.16 Definition. The absolute value is that function with domain $R$ and range $R^+ \cup \{0\}$ defined by the following:
$|x| = \begin{cases} x & \Longleftrightarrow x \ge 0, \\ -x & \Longleftrightarrow x \le 0. \end{cases}$
1.7.17 Theorem. For every $x$ and $y$ in $R$,
$x \le |x|$, $-x \le |x|$, $|x| = |-x|$,
$\bigl||x| - |y|\bigr| \le |x + y| \le |x| + |y|$.
Proof. See Propositions 1.5.7 and 1.5.8.
The important question now arises as to what happens if we repeat the process for real Cauchy sequences that we have just gone through for rational Cauchy sequences. Theorem 1.7.20 below shows that we get nothing new.
A real sequence is a function with domain N0=N U {0} and range in R. A real Cauchy sequence x is a sequence such that Ve>0, 3N so that if n,m EN0 and n,m �N, then lx(n)- x(m) I 0=> (3N) (n) (m)(n,m E N0 & n,m�N=> lx(n) - x(m) I 0, 3N so that Vn E N0 with n�N, lx(n)- al < E. If the real sequence x has a limit a we say that x is convergent, and also x converges to a. In the formalism of the predicate calculus the definition of a limit would be as follows:
a ER is
a limit of the real sequence
x ¢:::)
(e)(E>0=>(3N)(n)(n E N0 & n�N=>lx(n)- al ((/2),
€-
[r-;;{P) - r(p) ] > (@),
N.
If we now identify € with the constant Cauchy sequence defined by
r;{jj)
=
€, we have shown that for Vn � N the Cauchy sequences
r.-+ [�-T] are positive, where
r is
r. ..- [.... rn .. -r'] ....
and
the rational Cauchy sequence that evaluated at
p is r(p). If we take the equivalence classes of these sequences we get R(T,') + R(T,;'- r) > 0
R("f;) - R(T,;- r) > 0.
and
If we now set
a= R(T), then from the facts that and
r(n) = R (r,;-),
52 J THE REAL NUMBER SYSTEM
we have arrived at the conclusion that
Vn ;;;.: N,
lr(n)-al < E. x is a real Cauchy sequence. Using the Archimedian Un={m: m E Z & x(n) .;;; m/n} is nonvoid and hence, by the well ordering of N (see Exercise 17 of Section 1.5), Un has a minimal element mn. If we set r(n)= mn/n, then from the fact that (mn-I)/n < x(n) we get 0 .;;; r(n) - x(n) < I/n. Since x is a Cauchy sequence, the sequence r defined by the numbers r(n) is Cauchy. Indeed, Ve> 0, 3e' E Q+ with e' < E and 3M so that n,m ;;;.: M ==> lx(n) -x(m)I < e'/2. Hence, if n,m;;;.: max {M, 4/e}, we have Suppose now that
ordering of R+, the set
lr(n)-r(m)I .;;; lr(n)-x(n)I +lx(n)-x(m)I I I +ix(m)-r(m)I
0, 3N so that n
� N implies
lx(n) - al < e/2. Therefore, if
n,m
� N,
lx(n) - x(m) I
�
lx(n) - al
+
la - x(m) I
$\varphi(j) < \varphi(k)$ means that the range of $\varphi$ is denumerable. Hence, speaking loosely, $\varphi$ "picks out" an infinite number of the ordered pairs $(n, x(n))$ to form the sequence $y$. As a simple example, suppose we take $(x(n))$ to be the sequence given by $x(n) = n^2 + 2$. Take $\varphi(n) = 2n + 1$; then
$y(n) = x \circ \varphi(n) = (2n + 1)^2 + 2 = 4n^2 + 4n + 3$.
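The simple example above is easy to check numerically. In the sketch below the names `x`, `phi`, and `y` are our own labels for the sequence, the index function, and the resulting subsequence; `phi` stands for the strictly increasing function that picks out the terms, whatever symbol one prefers for it.

```python
def x(n: int) -> int:
    """The sequence x(n) = n^2 + 2."""
    return n * n + 2


def phi(n: int) -> int:
    """The index function 2n + 1 (strictly increasing on N_0)."""
    return 2 * n + 1


def y(n: int) -> int:
    """The subsequence y(n) = x(phi(n))."""
    return x(phi(n))


# phi preserves order on an initial segment of N_0 ...
phi_increasing = all(phi(j) < phi(j + 1) for j in range(100))

# ... and y agrees with the closed form 4n^2 + 4n + 3 there.
closed_form_ok = all(y(n) == 4 * n * n + 4 * n + 3 for n in range(100))
```

Checking finitely many indices proves nothing in general, of course; here the closed form follows from expanding $(2n + 1)^2 + 2$.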
54 I THE REAL NUMBER SYSTEM
D Exercises I.
Use the principle of induction to show the following inequalities: (a)
h
0
;;,.
&
n
E
N0 =>
n( (1+h) n ;;,. 1+nh+ 0
(b)
�
h
�
I
(I - h) n h
(c)
�
N0 => n (n 1 - nh+
h2•
E
0, n. E N, n
;;,.
(I+h) n 2.
n
&
n; I)
;;,. 2 =>
;;,. 1+nh +
2
�
; I) h2•
h2•
Show that every finite set in R has a unique maximum element
and a unique minimum element. Use this fact to give another proof of Lemma 1.7.7.
3.
Show that there is always an irrational (not rational) number
between any two real numbers. Recall that there exists an irrational number:
\/2.
If
lxl
1, show that Vk E
5.
If
6.
For every
x
as
n-oo.
N
E R, show that
xn ,-o n.
as
n-oo.
x(n) -a, y(n) - b as n - oo, and x(n)y(n) -ab as n -oo. 7.
If
8.
If
(x(n))
show that
x(n) + y(n) -a+ b
is a convergent sequence, show that every subsequence
(x (n)). Conversely, if every "proper" (x(n)) converges, then (x(n)) converges. By a "proper" subsequence we mean a subsequence x , where (m) (3n) (n ;;,. m & (n+I) > (n)+1). converges to the same limit as
subsequence of
0
9.
Without using Theorem 1.7.20, show that if a subsequence of a
Cauchy sequence converges, then the Cauchy sequence itself converges.
10.
Suppose
x(n) -a
as
n-oo and 3N such lx(n) - bl
What can be said about
la - bl
?
1.8
If
11.
A REVIEW OF THE REAL NUMBER SYSTEM AND SEQUENCES I 55
as
x(n)- a
show that
n- oo,
lx(n) I - lal
as
n- oo.
Give an
example that shows that the converse is not always true. For what value(s) of
12.
If
a,
if any, is it
(x(n))
always
true that
lx(n) I -l al
�
x(n) -a?
is a sequence and
x(2n) -a, x(2n
+
as
1) -a
n-
oo,
show that
x(n)-a 13.
Let
(s(n)) be
as
n-oo.
a sequence and set
x z
x+ IE M, then
$M = N$. Indeed, $M$ is an inductive set and hence $N \subset M$. Since $M \subset N$ we must have $M = N$.
Now that we have $N$ we can state the next property of the real number system.
(i) For every $x$ and $y$ in $R$ with $x > 0$ and $y > 0$, there is an $n$ in $N$ so that $x < ny$ (Archimedian ordering).
To state our last and rather crucial property for the real number system, it is necessary to consider the concepts of a sequence, a Cauchy sequence, and a limit of a sequence. For this purpose we have the following definitions:
1.8.4 A (real) sequence is a function with domain $N_0 = N \cup \{0\}$ and range in $R$. A sequence will usually be denoted by the term '$(x(n))$' or '$(x_n)$', and this is to indicate that $x(n)$ or $x_n$ is the value of the function at $n \in N_0$. A sequence $(y(n))$ is said to be a subsequence of $(x(n))$ $\Longleftrightarrow$ there is a function $\varphi$ with domain $N_0$ and range in $N_0$ so that $j < k \Longrightarrow \varphi(j) < \varphi(k)$ and $y(n) = x(\varphi(n))$.
1.8.5 The absolute value is the function defined by
$|x| = \begin{cases} x, & x \ge 0, \\ -x, & x < 0. \end{cases}$
1.8.6 The following properties hold for the absolute value:
$-x \le |x|$, $x \le |x|$, $|x| = |-x|$,
$\bigl||x| - |y|\bigr| \le |x + y| \le |x| + |y|$.
We are, incidentally, using $a \le b$ to mean $a < b$ or $a = b$.
1.8.7 A sequence $(x(n))$ is said to be a Cauchy sequence $\Longleftrightarrow$ $\forall \varepsilon > 0$, $\exists N$ so that if $m, n \ge N$, then $|x(n) - x(m)| < \varepsilon$.
1.8.8 A sequence $(x(n))$ is said to have a limit $\Longleftrightarrow$ $\exists a$ such that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Longrightarrow |x(n) - a| < \varepsilon$. The number $a$ is said to be the limit of $(x(n))$. If a sequence has a limit, it is said to be convergent.
We can now state a crucial property of the real number system:
(j) Every Cauchy sequence has a limit.
Let us note that if a sequence has a limit, then the limit must be unique. Indeed, suppose that $a$ and $b$ are limits of $(x(n))$. Then $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Longrightarrow |x(n) - a| < \varepsilon/2$ and $|x(n) - b| < \varepsilon/2$. Hence we have $\forall \varepsilon > 0$,
la- bl If
�
S =la - bl > 0,
la- x(n)I+ lx(n) - bl
is true and very easy to prove:
1.8.9 Every convergent sequence is Cauchy.
Indeed, suppose $x(n) \to a$. This means that $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Longrightarrow |x(n) - a| < \varepsilon/2$. Hence, if $m, n \ge N$, we get, using the triangle inequality,
$|x(n) - x(m)| \le |x(n) - a| + |x(m) - a| < \varepsilon$.
This, of course, proves that $(x(n))$ is Cauchy.
1.8.10 Every convergent sequence is bounded; that is, $\exists M$ so that $\forall n \in N_0$, $|x(n)| \le M$.
Suppose $x(n) \to a$; then $\exists N$ so that $n \ge N \Longrightarrow |x(n) - a| < 1$. Hence using the triangle inequality, we get that $n \ge N \Longrightarrow |x(n)| \le 1 + |a|$. Let $L = \max\{|x(n)| : n \in (0, N)\}$ and $M = \max(L, 1 + |a|)$. Clearly $\forall n \in N_0$, $|x(n)| \le M$.
1.8.11 The sum and product of two sequences $(x(n))$ and $(y(n))$ are defined as follows:
$(x + y)(n) = x(n) + y(n)$, $(xy)(n) = x(n)y(n)$.
Note we have reverted to the custom of dropping the symbol '$\cdot$' for multiplication.
1.8.12 If $x(n) \to a$ and $y(n) \to b$, then
$(x + y)(n) \to a + b$, $(xy)(n) \to ab$.
The facts that $x(n) \to a$ and $y(n) \to b$ mean first of all that $\exists M > 1$, so that $\forall n \in N_0$, $|x(n)| \le M$, and $|y(n)| \le M$. Second, $\forall \varepsilon > 0$, $\exists N$ so that $n \ge N \Longrightarrow |x(n) - a| < \varepsilon/2M$, $|y(n) - b| < \varepsilon/2M$. Hence, if $n \ge N$,
$|(x + y)(n) - (a + b)| \le |x(n) - a| + |y(n) - b| < \varepsilon$,
$|(xy)(n) - ab| \le |y(n)|\,|x(n) - a| + |a|\,|y(n) - b| \le M\{|x(n) - a| + |y(n) - b|\} < \varepsilon$.
In the last part of the above proof we have used the fact that $|x(n)| \le M$ for all $n \in N_0 \Longrightarrow |a| \le M$. We shall leave the verification of this simple fact to the reader.
In 1.8.2 we defined the natural numbers. Now take
$-N = \{x : -x \in N\}$, $Z = -N \cup N \cup \{0\}$.
The set $Z$ is, of course, called the integers. The rational numbers is the set
$Q = \{m/n : m, n \in Z,\ n \neq 0\}$.
We are, of course, writing $m/n$ for $m(1/n)$.
1.8.13 The rationals are dense in the reals in the sense that $\forall \varepsilon > 0$ and $\forall x \in R$, $\exists r \in Q$, so that $|x - r| < \varepsilon$.
Indeed, by the Archimedian ordering of $R$, $\exists n \in N$ so that $|x| < n$, or, equivalently, $-n < x < n$. Again, using Archimedian ordering, $\exists m \in N$ so that $1/m < \varepsilon$. Let $k \in N$ be the smallest natural number so that $x \le -n + k/m$. Then $-n + (k - 1)/m < x$ and if we set $r = -n + k/m$, we have
$|x - r| = -n + k/m - x < k/m - (k - 1)/m < \varepsilon$.
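The construction in the proof of 1.8.13 can be followed step by step in exact rational arithmetic. The sketch below is only an illustration under our own assumptions: the function name `rational_within` is hypothetical, the real number $x$ is represented by a value that Python's `fractions.Fraction` can hold exactly, and the floor and ceiling functions are used as shortcuts for the choices of $n$, $m$, and the smallest $k$ that the text obtains from the Archimedian ordering.

```python
import math
from fractions import Fraction


def rational_within(x, eps):
    """Return r = -n + k/m with 0 <= r - x < eps, as in the proof of 1.8.13."""
    x = Fraction(x)
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = math.floor(abs(x)) + 1   # a natural number with |x| < n
    m = math.floor(1 / eps) + 1  # a natural number with 1/m < eps
    # Smallest natural number k with x <= -n + k/m, that is, k >= m(x + n).
    # Since -n < x, the quantity m(x + n) is positive and so k >= 1.
    k = math.ceil(m * (x + n))
    return Fraction(-n) + Fraction(k, m)


# Usage: approximate the floating-point value of pi to within 1/1000.
target = Fraction(math.pi)
r = rational_within(target, Fraction(1, 1000))
```

Because $k$ is chosen smallest, $r - x$ lies between $0$ and $1/m$, which is the estimate used in the text.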
In the previous proof we have made use of some facts about the relation $<$ without explicitly mentioning them. For example, we said that $|x| < n$ is equivalent to the fact that $-n < x$ and $x < n$. Indeed, from $x \le |x|$ we get from (g) that $x < n$, and from $-x \le |x|$ we get $-x < n$, and from (e) and (h), $n + x > 0$. Now, using (d), (e), and (h) we get $x = -n + n + x > -n + 0 = -n$, which is what we set out to prove. The reverse implication follows in a similarly easy way. Note we are using the usual convention that $x - y = x + (-y)$. We are sure the reader can fill in the details of proofs of other facts.
Aside from the fact that the rationals are dense in the reals, they also have the property that they have the "same number" of elements as the positive integers. More formally this can be written as follows:
1.8.14 There exists a one-to-one function with domain $N$ and range $Q$.
The proof of this statement can be found in Section 1.6. More generally we can make the following definition:
1.8.15 A set $A$ is said to be denumerable $\Longleftrightarrow$ there exists a one-to-one function with domain $N$ and range $A$.
Another way of phrasing 1.8.14 is to say that the rationals are denumerable. We can also talk about finite sets.
1.8.16 A set $A$ is said to be finite $\Longleftrightarrow$ $A$ is the null set, in which case we say that $A$ has zero elements, or else there is a one-to-one function with domain the set $(1, n) = \{k : k \in N \;\&\; 1 \le k \le n\}$ and range $A$, in which case we say $A$ has $n$ elements. A set that is either finite or denumerable is called countable.
Exercises
Use only statements (a) through (j) as axioms in proving the following:
1. Show that $-(x + y) = -x + (-y)$ and $(xy)^{-1} = x^{-1}y^{-1}$.
2. If $n, m \in N$, then $n + m \in N$ and $n \cdot m \in N$.
3. Show that $0 < 1$.
4. For every $n \in N$, $1 \le n$.
5. It is not true that there is an n E N and an m E N so that n 0, 3x(n) such that 0 < lx(n) - al < e. Now, Ve> 0, 3N such that n;;;., N � ly(n) - al < e/2. Also, Vn;;;., N, 3n1;;;., n so that lx(n1) - y(n) I < e/2 and Vm> n1, 3n2;;;., m so that lx(n2) - y(m) I < e/2. Hence 3n1 and 3n2> n1 so that Jx(n1) - al lx(n2) -al Since (b)
x(n1) #- x(n2),
� �
lx(n1) - y(n) I + ly(n) - al < e, lx(n2) - y(m) I + ly(m) - al < e.
it follows that either
x(n1) #-a or x(n2) #-a.
Suppose now that A is any bounded infinite set in R. One way
of trying to reduce this case to the previous case is to try to choose a denumerable set inA. However, since it is not clear how to do this using only the axiom of induction, we shall proceed in a slightly different way. Let us set
m =infA,
M=supA.
These numbers exist by virtue of the Theorem 1.9.5. Next, let us put P = {x:
x
mA & Theorem 1.9.3 =>A & Theorem 1.9.5 =>A & Theorem 1.9.7.
[Incidentally, the reason we did not use Theorem 1.9.3 directly in the proof of Theorem 1.9.7(a) in taking $a$ as the limit of the monotone sequence $(y(n))$ is that we wanted to establish the above chain of implication.] If we can show that A & Theorem 1.9.7 $\Longrightarrow$ A & (j), then all these statements are equivalent and any one of the statements of Theorems 1.9.3, 1.9.5, or 1.9.7 can be used in place of (j).
1.9.8 Theorem. A & Theorem 1.9.7 $\Longrightarrow$ A & (j).
Proof. Let $(x(n))$ be a real Cauchy sequence. We distinguish two cases.
(a) The range of $(x(n))$ is finite. In this case $\exists N$ so that $n, m \ge N \Longrightarrow x(n) = x(m)$. Indeed, if this is not the case, $\forall k$, $\exists n_k \ge k$ & $\exists m_k \ge k$ such that $|x(n_k) - x(m_k)| = \delta_k > 0$. By hypothesis, it is immediate that the collection of numbers $\{\delta_k\}$ is finite. Let $\delta$ be the minimum of these numbers and $\varepsilon = \delta/2$. Since $(x(n))$ is Cauchy, $\exists L$ such that $k \ge L \Longrightarrow |x(n_k) - x(m_k)| < \varepsilon$, which is a contradiction. Take $a = x(n)$ for $n \ge N$, and this is clearly the limit of $(x(n))$.
(b) The range of $(x(n))$ is infinite. Let $a$ be an accumulation point of the range of $(x(n))$, which exists by Theorem 1.9.7. Since $(x(n))$ is Cauchy, $\exists N$ so that $n, m \ge N \Longrightarrow |x(n) - x(m)| < \varepsilon/2$. Now $a$ is an accumulation point of the set $\{x(n) : n \ge N\}$. Therefore, $\exists n_1 \ge N$ so that $|x(n_1) - a| < \varepsilon/2$. Hence, for any $n \ge N$,
$|x(n) - a| \le |x(n_1) - a| + |x(n) - x(n_1)| < \varepsilon$.
n, Inc., Boston, 1961.
Exercises
1. If $A$ and $B$ are bounded subsets of $R$ and $A \subset B$, show that
l.u.b. $A \le$ l.u.b. $B$, g.l.b. $B \le$ g.l.b. $A$.
2. If $A$ and $B$ are bounded subsets of $R$, show that
$\sup A \cup B = \max(\sup A, \sup B)$, $\inf A \cup B = \min(\inf A, \inf B)$.
3. If $A \subset R$ and $A$ has an upper bound that belongs to $A$, show that this upper bound must be $\sup A$.
4. If a set $A$ has a l.u.b. that does not belong to $A$, show that this l.u.b. is an accumulation point of $A$.
5. If $A$ is a denumerable set in $R$ and $a$ is an accumulation point of $A$, show that there is a sequence in $A$ which converges to $a$. If you use the axiom of induction, be sure you use it carefully and correctly.
6. (a) Show that the l.u.b. of the range of a bounded monotone nondecreasing sequence is the limit of the sequence.
(b) Show that if a subsequence of a monotone nondecreasing sequence is bounded, then the sequence is bounded.
Show that the sequence defined by the following expression is
monotone increasing and bounded:
3 a(n) = 2 8.
·
5
·
·
·
·
4
·
·
·
(2n+ 3) 2(n+ l)'
n
EN0•
Assuming the binomial theorem as known, show that the sequence defined as follows is monotone increasing and bounded:
$a(n) = \left(1 + \dfrac{1}{n + 1}\right)^{n + 1}$, $n \in N_0$.
The reader may recognize that the limit of this sequence is designated by 'e'.
9. Show that $\forall a \ge 0$ and $\forall n \in N$, there is a unique $y \ge 0$ so that $y^n = a$. Designate this unique $y$ by '$a^{1/n}$' and show that if $0 \le a < b$, then $a^{1/n} < b^{1/n}$, and conversely.
10. Prove that $\lim_{n \to \infty} n^{1/n} = 1$. [Hint: Set $n^{1/n} = 1 + a_n$ and hence $n = (1 + a_n)^n \ge [n(n - 1)/2!]\,a_n^2$, for $n \ge 2$.]
11. Show that $\lim_{n \to \infty} n!/n^n = 0$.
12.
What is the set of all accumulation points of the subset of the
rationals of the form
n,p,q
EN?
CHAPTER 2
LIMITS
We have already discussed the concept of function in Section 1.3. In this and in the next few chapters we shall be exclusively interested in those functions which have their domains and ranges in the real number system. A real sequence is an example of such a function. To discuss the properties of real-valued functions, it is convenient to introduce some notation and terminology for certain sets of real numbers. We shall define
]a,b[= {x: a(J) and V€ > 0, 3S > 0 such tliat Ix - al (f), then for f to be continuous at a, it is necessary and sufficient that
$\lim_{x \to a} f(x) = f(a)$.
2.1.6 Proposition. If a function $f$ is continuous at $a$, then $a \in \mathcal{D}(f)$ and for every sequence $(x_n)$ with range in $\mathcal{D}(f)$ and $x_n \to a$, we have $f(x_n) \to f(a)$. (AC) Conversely, if $a \in \mathcal{D}(f)$, and for every sequence $(x_n)$ with range in $\mathcal{D}(f)$, $x_n \to a$ implies $f(x_n) \to f(a)$, then $f$ is continuous at $a$.
Proof. If f is continuous at a, then Ve> 0, 3o > 0 so that Ix- al < o and x E �(J) � lf(x)-f(a)I< e. Also 3N so that n � N � lxn - al< o. Hence n � N � IJ(xn)-f(a)I< e, which is the proof of the first sentence. To prove the second statement, let us assume to the contrary that
0 so that Vo > 0 there exists an x E �(J) Ix- al< o and IJ(x)- J(a)I � e0. For n E N0, let On= I/ (n +I) and An= {x: x E �(J) & Ix- al< On & IJ(x)-f(a)I � eo}. Each set An is nonvoid and hence by (AC) there exists a sequence (xn) so that Xn E An. Clearly Xn--+ a, but since Vn E N0, IJ(xn)- J(a)I it is not true. Then 3e0>
so that
�
e0,
we get a contradiction.
2.1.7 Theorem. If f and g are continuous at $a \in \mathcal{D}(f) \cap \mathcal{D}(g)$, then $f + g$ and $fg$ are continuous at a, and if $g(a) \ne 0$, $f/g$ is also continuous at a.
Proof. In case a is an accumulation point of the domain common to f and g, then the theorem is an immediate consequence of Proposition 2.1.4 and the remark made prior to Proposition 2.1.6. In case a is not an accumulation point of the common domain, all the functions listed are automatically continuous.
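2.1.8 Theorem. If f and g are functions, if g is continuous at a and f is continuous at $b = g(a)$, then $f \circ g$ is continuous at a. (See Definition 1.3.5 for $f \circ g$.)
Proof. Since f is continuous at b, $\forall \varepsilon > 0$, $\exists \eta > 0$ so that $|y - b| < \eta$ and $y \in \mathcal{D}(f) \implies |f(y) - f(b)| < \varepsilon$. Also, since g is continuous at a, $\exists \delta > 0$ so that $|x - a| < \delta$ and $x \in \mathcal{D}(g) \implies |g(x) - g(a)| < \eta$. Hence $|x - a| < \delta$ and $x \in \mathcal{D}(f \circ g) \implies |f \circ g(x) - f \circ g(a)| < \varepsilon$.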
0
THE LIMIT CONCEPT AND CONTINUITY 175
a E ]O, I]. We know that 38(E, a) , depending Ix - al < 8 and x E ]O, I] � ll(x) - l(a) I < E.
and
so that
Therefore,
and this in turn implies that
Ix - al x � 1.
since
Hence for fixed
E,
is fixed and
E goes
to zero,
EXa
r},
r }.
These are monotone nonincreasing and nondecreasing functions, respectively, and we define
$\varlimsup_{x \to \infty} f(x) = \lim_{r \to \infty} \overline{\varphi}_f(r)$, $\varliminf_{x \to \infty} f(x) = \lim_{r \to \infty} \underline{\varphi}_f(r)$.
We can consider $\infty$ as an accumulation point of $\mathcal{D}(f)$ since every open interval $I(\infty)$ contains a point of $\mathcal{D}(f)$. If f is a bounded sequence, the latter quantities are called the limit superior and the limit inferior of the sequence, since the domain of a sequence has no finite accumulation point. We shall leave to the reader the easy task of formulating these notions at $-\infty$.
It is possible to get a geometric meaning of the previous definition which may make it more understandable. The number $\overline{\varphi}_f(r)$ may be thought of as measuring the size of the largest peak of f as x varies over the deleted interval $\{x : 0 < |x - a| < r\}$, and $\underline{\varphi}_f(r)$ measures the depth of the deepest valley. As r decreases to zero, the size of the largest peak decreases to $\overline{\varphi}_f(0+)$ and the size of the deepest valley shortens to $\underline{\varphi}_f(0+)$.
As an example, let us consider the function given by $f(x) = \sin(1/x)$ for $x \ne 0$. A sketch of the graph of this function is shown in Fig. 2.4.1.
Figure 2.4.1
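The peak and valley heights of $f(x) = \sin(1/x)$ over a deleted interval $0 < |x| < r$ can be estimated numerically. The following is only a sketch: the helper name `peak_and_valley` and the sampling scheme (points equally spaced in $1/h$, which suits this particular oscillation) are our own assumptions, and a finite sample can only approximate a supremum or infimum.

```python
import math

def peak_and_valley(f, a, r, n=20000, step=1e-3):
    """Estimate sup f and inf f over the deleted interval 0 < |x - a| < r.

    Hypothetical helper: samples h = 1/(1/r + i*step), so that 1/h runs
    through an interval of length n*step in equal steps.
    """
    hi, lo = -math.inf, math.inf
    for i in range(1, n + 1):
        h = 1.0 / (1.0 / r + i * step)  # 0 < h < r
        for x in (a - h, a + h):
            y = f(x)
            hi, lo = max(hi, y), min(lo, y)
    return hi, lo

def f(x):
    return math.sin(1.0 / x)

for r in (1.0, 0.1, 0.001):
    print(r, peak_and_valley(f, 0.0, r))
```

However small r is taken, the estimated peak stays near 1 and the estimated valley near -1, which is the behavior the limit superior and limit inferior are designed to record.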
As $x \to 0$, we get an infinite number of peaks and valleys of this function. It is clear that $\forall r > 0$, $\overline{\varphi}_f(r) = 1$, $\underline{\varphi}_f(r) = -1$. Hence
$\varlimsup_{x \to 0} \sin(1/x) = 1$, $\varliminf_{x \to 0} \sin(1/x) = -1$.
Note that this function does not have a limit at $x = 0$.
2.4.2 Proposition. The function f has the limit l at a $\iff$ a is an accumulation point of $\mathcal{D}(f)$ and
$\varlimsup_{x \to a} f(x) = l = \varliminf_{x \to a} f(x)$.
Proof. Suppose f(x) - z as x - a. This means Ve> 0, 38> 0 so that x E �(f) and 0 < Ix - a l < 8 � l f (x) - l l < e. It follows that if 0 < r ,,,-;; 8, then l�1(r) - ll ,,,-;; e and l 0, 38> 0 so that 1�1(8) - l l < e and l lim (Jg) (x).
x:::;o
x:::;o
x=o
2.4
LIMIT SUPERIOR AND LIMIT INFERIOR I 95
For the second example take
$f(x) = 1 + \sin(1/x)$, $g(x) = \cos(1/x)$, $x \ne 0$.
Clearly $f(x) \ge 0$ for all x in its domain, and g changes sign infinitely often in any neighborhood of zero. If we take $x_k$ so that $1/x_k = (2k+1)\pi$, $k = 0, \pm 1, \cdots$, we get $f(x_k)g(x_k) = -1$. Hence
$\varliminf_{x \to 0} (fg)(x) \le -1$.
On the other hand,
$\varliminf_{x \to 0} f(x) = 0$, $\varliminf_{x \to 0} g(x) = -1$,
which leads to the inequality
$\varliminf_{x \to 0} f(x)\, \varliminf_{x \to 0} g(x) > \varliminf_{x \to 0} (fg)(x)$.
Finally, as a third example we consider
$f(x) = -1 + \sin(1/x)$, $g(x) = -1 + \cos(1/x)$.
If we take $x_k$ so that $1/x_k = (2k+1)\pi$, $k = 0, 1, 2, \cdots$, we get $f(x_k)g(x_k) = 2$ and therefore
$\varlimsup_{x \to 0} (fg)(x) \ge 2$.
But $\varlimsup_{x \to 0} f(x) = \varlimsup_{x \to 0} g(x) = 0$ and consequently we get
$\varlimsup_{x \to 0} f(x)\, \varlimsup_{x \to 0} g(x) < \varlimsup_{x \to 0} (fg)(x)$.
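The values of the products at the points $x_k$ with $1/x_k = (2k+1)\pi$ can be checked numerically. This is a small sketch; the function names are ours, and the computed values agree with $-1$ and $2$ only up to rounding error.

```python
import math

def x_k(k):
    # Points with 1/x_k = (2k + 1)*pi, where sin(1/x_k) = 0 and cos(1/x_k) = -1.
    return 1.0 / ((2 * k + 1) * math.pi)

def second_example(x):
    # (fg)(x) with f(x) = 1 + sin(1/x), g(x) = cos(1/x)
    return (1 + math.sin(1 / x)) * math.cos(1 / x)

def third_example(x):
    # (fg)(x) with f(x) = -1 + sin(1/x), g(x) = -1 + cos(1/x)
    return (-1 + math.sin(1 / x)) * (-1 + math.cos(1 / x))

for k in (0, 10, 1000):
    print(k, second_example(x_k(k)), third_example(x_k(k)))
```

Since $x_k \to 0$, the products take values near $-1$ and near $2$ in every neighborhood of zero, which is all that the two inequalities for $(fg)$ require.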
Let us finish this section by giving an application of the concept of limit superior and limit inferior. This involves an extension of the idea of a Cauchy sequence.
2.4.4 Proposition. Suppose f is a function, a is an accumulation point of �(J ) and Ve > 0, 38 > 0 so that 0 < Ix-al < a, 0 < l y-al 0 so that V8 with �(J ) so that 0 < IY - al < a and -
$|f(y) - l| < \varepsilon/2$. Suppose we have taken $\delta$ small enough so that $x, y \in \mathcal{D}(f)$ and $0 < |x - a| < \delta$ and $0 < |y - a| < \delta \implies |f(x) - f(y)| < \varepsilon/2$. Then we get
$|f(x) - l| \le |f(x) - f(y)| + |f(y) - l| < \varepsilon$.
Exercises
1. If f is bounded and a is an accumulation point of $\mathcal{D}(f)$, show that
$\varliminf_{x \to a} (-f(x)) = -\varlimsup_{x \to a} f(x)$.
2. Show that the inequalities in Theorem 2.4.3(b) are reversed if f and g are nonpositive. 3. If f is a bounded nonnegative function and a is an accumulation point of �(f ), show that Va � 0,
Jim r) ), where cl> is a one-to-one function with domain and range N0• 0
0
It turns out that every rearrangement of an infinite series is convergent if and only if the series whose terms are the absolute values of the terms of the original series is convergent. We first give a formal definition.
3.1.5 Definition. An infinite series $(a, \sigma(a))$ is said to be absolutely convergent $\iff$ the sequence $\sigma(|a|)$ is convergent, where $\forall n \in N_0$, $|a|_n = |a_n|$. If an infinite series is convergent, but not absolutely convergent, it is called conditionally convergent.
A natural question is whether or not conditionally convergent series exist. The answer is in the affirmative and an example is given by the series
102 I INFINITE SERIES
$\sum_{k=0}^{\infty} \left( \frac{(-1)^k}{k+1} \right)$.
This series is certainly not absolutely convergent, since the series of absolute values is the harmonic series. The fact that the series is convergent is a consequence of Leibnitz' criteria, which will be established in Section 3.2.
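The contrast between the two sequences of partial sums can be seen numerically. The sketch below uses a hypothetical helper `partial_sums`; the comparison value $\log 2$ for the alternating series is a standard fact that is not proved in this section.

```python
import math

def partial_sums(n):
    """Partial sums sigma(a)_n and sigma(|a|)_n for a_k = (-1)**k / (k + 1)."""
    alternating = sum((-1) ** k / (k + 1) for k in range(n + 1))
    absolute = sum(1 / (k + 1) for k in range(n + 1))
    return alternating, absolute

for n in (10, 1000, 100000):
    alt, ab = partial_sums(n)
    print(n, alt, ab, math.log(2))
```

The alternating partial sums settle down while the partial sums of absolute values keep growing, roughly like $\log n$.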
3.1.6 Lemma. If V k finite subset of N0, then
ak
E N0,
�0,
, )),
we
3.1
Thus 0,,;;;;
an+,,;;;; Ian!, 0,,;;;; an-,,;;;; Ian!, and an=an+ - an-. n n "' L ak±,,;;;; L lakl ,,;;;; L lakl, k=O
and sirice
SERIFS OF REAL NUMBERS I 103
1. By the definition of limit superior, $\forall N$, $\exists n \ge N$ so that
$|a_n|^{1/n} > \rho - \varepsilon$,
or, what is the same thing,
$|a_n| > (\rho - \varepsilon)^n > 1$.
Consequently, it is not true that $|a_n| \to 0$ and thus $\sigma(|a|)$ cannot converge. If $\rho = \infty$, the sequence $|a_n|^{1/n}$ is unbounded and clearly we get the same result.
Let us prove the convergence of $\sum_{n=1}^{\infty} (1/n!)$ by means of the root test. First, if $k \in N$ and $n > k$, we have
$n! > k^{n-k+1}$.
Consequently, for all sufficiently large n,
$(n!)^{1/n} > k/k^{k/n} > k/2$.
This shows that
$\lim_{n \to \infty} \left( \frac{1}{n!} \right)^{1/n} = 0$,
and the series converges by the root test.
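As a numerical illustration of the root test for the series of reciprocal factorials, the sketch below evaluates $(1/n!)^{1/n}$ through logarithms and spot-checks the inequality $n! > k^{n-k+1}$ for a few small values. The function name `root_term` is ours.

```python
import math

def root_term(n):
    # (1/n!)**(1/n), computed with logarithms to avoid overflow for large n
    return math.exp(-math.lgamma(n + 1) / n)

# Spot-check n! > k**(n - k + 1) for n > k on a small range.
inequality_holds = all(
    math.factorial(n) > k ** (n - k + 1)
    for k in range(1, 8)
    for n in range(k + 1, 25)
)

for n in (10, 100, 1000):
    print(n, root_term(n))
print(inequality_holds)
```

The printed root terms decrease toward zero, in agreement with the limit computed above.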
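3.2.3 D'Alembert's Ratio Test. Suppose $(a_n)$ is a sequence for which $\exists N$ so that $n \ge N \implies a_n \ne 0$. Define
$r = \varliminf_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$, $R = \varlimsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$.
If $R < 1$, $\sigma(|a|)$ is convergent and if $r > 1$, $\sigma(|a|)$ is divergent.
Proof. This is an immediate consequence of the root test and the chain of inequalities,
$\varliminf_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| \le \varliminf_{n \to \infty} |a_n|^{1/n} \le \varlimsup_{n \to \infty} |a_n|^{1/n} \le \varlimsup_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$.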
We shall leave the proof of these inequalities to Exercise 1 at the end of the section. The convergence of the series of factorials that we have considered previously is also an immediate consequence of the ratio test since
$n!/(n+1)! = 1/(n+1) \to 0$.
The ratio test, when it works, is usually very easy to apply and hence is very useful. When it fails it may be possible to apply a slightly sharper variant of it.
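3.2.4 Raabe's Test. Suppose $(a_n)$ is a sequence for which $\exists N$ so that $n \ge N \implies a_n \ne 0$. Let us put
$\alpha = \varliminf_{n \to \infty} n \left( 1 - \left| \frac{a_{n+1}}{a_n} \right| \right)$, $\beta = \varlimsup_{n \to \infty} n \left( 1 - \left| \frac{a_{n+1}}{a_n} \right| \right)$.
If $\alpha > 1$, $\sigma(|a|)$ is convergent and if $\beta < 1$, $\sigma(|a|)$ is divergent.
Proof. If $\alpha > 1$, $\forall \rho$ so that $1 < \rho < \alpha$, $\exists K$ so that $k \ge K \implies$
$k \left( 1 - \frac{|a_{k+1}|}{|a_k|} \right) > \rho$.
Thus
$k|a_{k+1}| < (k - 1)|a_k| + (1 - \rho)|a_k|$,
or, rearranging terms, we get
$(\rho - 1)|a_k| < (k - 1)|a_k| - k|a_{k+1}|$.
Hence $k|a_{k+1}|$ is monotone decreasing for $k \ge K$ and, being bounded below by zero, is convergent. The series with terms $(k-1)|a_k| - k|a_{k+1}|$ therefore converges, since its partial sums telescope, and $\sigma(|a|)$ converges by comparison.
If $\beta < 1$, $\exists K$ so that $k \ge K \implies$
$\left| \frac{a_{k+1}}{a_k} \right| > 1 - \frac{1}{k}$.
Therefore, $k|a_{k+1}| > (k-1)|a_k|$, so that $k|a_{k+1}|$ is monotone increasing as k increases. For fixed $p \ge K$, $p > 1$, and $k \ge p$ we get
$|a_{k+1}| > (p - 1)|a_p|/k$.
Consequently, $\sigma(|a|)$ diverges by comparison with the harmonic series.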
3.%
CONVERGENCE TESTS I I 13
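Both the root test and the ratio test fail for the series
$\sum_{k=1}^{\infty} \left( \frac{1}{k} \right)^2$.
However, Raabe's test will show this is convergent. Indeed,
$\left( \frac{k}{k+1} \right)^2 = \frac{k^2}{k^2 + 2k + 1} < \frac{1}{1 + 2/k}$.
Thus
$1 - \left( \frac{k}{k+1} \right)^2 > 1 - \frac{1}{1 + 2/k} = \frac{2}{k+2}$,
and hence
$\lim_{k \to \infty} (k+2)\,[1 - (k/(k+1))^2] = \lim_{k \to \infty} k\,[1 - (k/(k+1))^2] \ge 2$.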
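The Raabe quantity for this series is easy to tabulate. In the sketch below the helper name `raabe_term` is ours; it evaluates $k(1 - |a_{k+1}/a_k|)$ for $a_k = 1/k^2$ and compares it with the lower bound $2k/(k+2)$ obtained above.

```python
def raabe_term(k):
    # k * (1 - |a_{k+1} / a_k|) for a_k = 1 / k**2
    return k * (1 - (k / (k + 1)) ** 2)

for k in (1, 10, 100, 10 ** 4, 10 ** 6):
    print(k, raabe_term(k), 2 * k / (k + 2))
```

Both columns approach 2, so $\alpha = 2 > 1$ and Raabe's test applies even though the ratio $|a_{k+1}/a_k|$ itself tends to 1.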
3.2.5 Cauchy's Condensation Test. If $(a_n)$ is a monotone nonincreasing sequence with nonnegative terms, then the following series converge or diverge simultaneously:
$\sum_{k=0}^{\infty} (a_k)$, $\sum_{k=0}^{\infty} (2^k a_{2^k})$.
Proof. The sequences of partial sums of these series are monotone nondecreasing. If a monotone nondecreasing sequence has a subsequence that is bounded, then the sequence is bounded and thus convergent. If the sequence is divergent, then every subsequence is unbounded.
Using the monotone nonincreasing character of $(a_n)$ we get
$\sum_{j=2^k}^{2^{k+1}-1} a_j \le 2^k a_{2^k} \le 2 \sum_{j=2^{k-1}}^{2^k - 1} a_j$,
where the expression on the right is taken to be $a_0$ if $k = 0$. Summing from $k = 0$ to $k = n$ we get
$\sigma(a)_{2^{n+1}-1} \le a_0 + \sum_{k=0}^{n} 2^k a_{2^k} \le 2\sigma(a)_{2^n - 1}$.
This set of inequalities, when taken together with the comments in the first paragraph, constitutes the proof.
As an example of the use of the condensation test, let us consider the convergence properties of the p series,
$\sum_{n=1}^{\infty} (1/n^p)$.
Recall that for $p = 1$ we called this the harmonic series and showed it diverged. Hence the comparison test shows that the p series diverges for $p < 1$. We can apply the condensation test only for $p \ge 0$, since otherwise the sequence $(1/n^p)$ is increasing. For $p \ge 0$ we must examine the convergence of the series
$\sum_{k=0}^{\infty} 2^k (1/2^{kp}) = \sum_{k=0}^{\infty} (2^{1-p})^k$,
a geometric series which converges for $p > 1$ and diverges for $p \le 1$.
There is another very useful test for absolute convergence called the integral test. We shall take this up in Chapter 5. For now let us turn to some other convergence tests which are not specifically tests for absolute convergence. Let us first prove a result called the Abel summation formula. In Chapter 5 we shall recognize this formula as a special case of integration by parts for Riemann-Stieltjes integrals.
3.2.6 Lemma (Abel). For every pair of sequences $(a_n)$ and $(b_n)$
$\sum_{k=0}^{n} a_k b_k = b_{n+1}\,\sigma(a)_n - \sum_{k=0}^{n} (b_{k+1} - b_k)\,\sigma(a)_k$. (3.2.1)
Proof. We have, upon setting $\sigma(a)_{-1} = 0$,
$a_k = \sigma(a)_k - \sigma(a)_{k-1}$.
Therefore,
$\sum_{k=0}^{n} a_k b_k = \sum_{k=0}^{n} b_k\,\sigma(a)_k - \sum_{k=1}^{n} b_k\,\sigma(a)_{k-1}$. (3.2.2)
The last sum begins at k= 1, since u(a)_1= 0. This last sum can also be written n
L
k=O
bk+1
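Abel's summation formula (3.2.1) is a finite identity, so it can be checked exactly with rational arithmetic. The following sketch uses a hypothetical helper `abel_sides`; the particular sequences $a_k = (-1)^k$ and $b_k = 1/(k+1)$ are chosen only for illustration.

```python
from fractions import Fraction

def abel_sides(a, b, n):
    """Return both sides of (3.2.1); b must have at least n + 2 terms."""
    sigma, running = [], Fraction(0)
    for k in range(n + 1):
        running += a[k]
        sigma.append(running)          # sigma[k] = a_0 + ... + a_k
    lhs = sum(a[k] * b[k] for k in range(n + 1))
    rhs = b[n + 1] * sigma[n] - sum(
        (b[k + 1] - b[k]) * sigma[k] for k in range(n + 1)
    )
    return lhs, rhs

a = [Fraction((-1) ** k) for k in range(12)]
b = [Fraction(1, k + 1) for k in range(12)]
for n in range(10):
    print(n, abel_sides(a, b, n))
```

With these sequences the left side is a partial sum of the alternating series considered in Section 3.1, rewritten in terms of the bounded partial sums $\sigma(a)_k$.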
j so that (3.3.2)
b would have an expansion of the form j-1
b= :L bkf3k + 2w. k=l b is in the complement of C. Also, j, so that bk ¥ 0; otherwise we see from (3.3.2) that "' J-1 b= :L bkf3k + :L 2t3k, k=l k=;+l
which contradicts the fact that
3k
>
which would contradict the fact that
bE cc. Let us set
J-1 q/3i-l = :L bk/3k. k=l
VkE{ 1, j - 1 ), bk is even, we see that q is even and moreover 31-1• By what we have just proved,
Since
q
n, we again get a contradiction. 2n-1 intervals of the form lq,n that we have described. Since each such interval has length l/3 n, the sum of their lengths add up to 2n-i /3n . Hence the sum of the lengths of the pair contradicts the fact that p #
For fixed n there are exactly
wise disjoint intervals which make up cc is
n=I Every point of the Cantor set is an accumulation point of C. A closed set with this property is called a perfect set. The proof is very easy. Suppose
$a \in C$ and
$a = \sum_{k=1}^{\infty} a_k/3^k$, $a_k \in \{0, 2\}$,
where $a_k \ne 0$ for an infinite number of k. Then the partial sums of this series are all different from a, belong to C, and converge to a. If, on the other hand,
$a = \sum_{k=1}^{j} a_k/3^k$, $a_k \in \{0, 2\}$,
then for $n > j$, the numbers
$a + 2/3^n$
are in C, are different from a, and converge to a.
The Cantor set is uncountable. The proof is also rather easy, being simply an application of the Cantor diagonal process. Suppose C is countable and $\Phi$ is a one-to-one function with domain N and range C. Set
$a^k = \Phi(k)$
and define
$a_k = 0 \iff a^k_k = 2$, $a_k = 2 \iff a^k_k = 0$,
where $a^k_j$ denotes the jth digit in the ternary expansion of $a^k$. Clearly, the number determined by the sequence $(a_k)$ in its ternary expansion belongs to C but is not in the range of $\Phi$. Collecting all the previous results we have proved the following theorem.
3.3.4 Theorem. The Cantor set is an uncountable perfect set in [0, 1] whose complement consists of the union of pairwise disjoint open intervals, the sum of whose lengths is 1.
It is interesting and instructive to look at the geometric positions of the intervals that comprise $C^c$ (see Fig. 3.3.1). The interval $I_{0,1}$ is ]1/3, 2/3[, that is, the "middle third" of the interval [0, 1].
Figure 3.3.1
The interval $I_{0,2}$ is ]1/9, 2/9[, and the interval $I_{2,2}$ is ]7/9, 8/9[. These intervals represent the "middle thirds" of the intervals that remain after $I_{0,1}$ is removed. Proceeding in this way, by removing the "middle thirds" of the intervals that remain after any given stage, after a denumerable number of steps we are left with Cantor's set.
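The middle-third construction can be carried out exactly for the first few stages. The sketch below, with a hypothetical helper `removed_intervals`, lists the open intervals removed at each stage and adds up their lengths; it only illustrates the finite stages and of course says nothing about the limiting set itself.

```python
from fractions import Fraction

def removed_intervals(stages):
    """Open middle thirds removed at stages 1, 2, ..., as lists of (left, right)."""
    remaining = [(Fraction(0), Fraction(1))]
    removed = []
    for _ in range(stages):
        stage, next_remaining = [], []
        for lo, hi in remaining:
            third = (hi - lo) / 3
            stage.append((lo + third, hi - third))
            next_remaining += [(lo, lo + third), (hi - third, hi)]
        removed.append(stage)
        remaining = next_remaining
    return removed

stages = removed_intervals(6)
total = sum(hi - lo for stage in stages for lo, hi in stage)
print(stages[0], stages[1])
print([len(stage) for stage in stages], total)
```

At stage n there are $2^{n-1}$ removed intervals, each of length $1/3^n$, and the total length removed after N stages is $1 - (2/3)^N$, which tends to 1.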
3.3
DECIMAL EXPANSIONS I 125
Exercises
1. If $m, p, n \in N$, $m > 1$, and $p < m^n$, show that it is possible to write
$p/m^n = \sum_{k=1}^{n} p_k/m^k$, $p_k \in (0, m-1)$.
2. If $p, m \in N$, $m > 1$, show that p has a unique representation in the form
$p = \sum_{k=0}^{n} p_k m^k$, $p_k \in (0, m-1)$, $p_n \ne 0$.
(Hint: Exercise 1 may be helpful.)
3. A sequence $(a_k)$ is said to be eventually periodic $\iff \exists N$ and $\exists p \in N$ so that $k \ge N \implies a_{k+p} = a_k$. If $m \in N$ and $m > 1$, we know that every $a \in [0, 1]$ can be written
$a = \sum_{k=1}^{\infty} a_k/m^k$, $a_k \in (0, m-1)$.
Show that a is rational $\iff (a_k)$ is eventually periodic.
4. Suppose that $m \in N$, $m > 1$, and suppose we make the decimal expansion of any number in [0, 1] unique by taking the representing series of a terminating decimal as a finite sum. Show that
L
k=I if and only if A=
5.
k ak/m
m,
n � L lik(x )I k=m+l
=
l 0,
is conditionally convergent� the infinite series
is conditionally convergent. 6.
Suppose the infinite product $\prod_{k=0}^{\infty} (1 + a_k)$ is conditionally convergent. Use the results of Exercise 5 to show that $\forall a \in R$, $a \ne 0$, there is a rearrangement of the product which converges to a.
CHAPTER 4
DIFFERENTIATION
4.1 THE DERIVATIVE CONCEPT
When speaking about the derivative of a function at a point, it is usual to take the domain of the function to be an open interval or an open set, and, of course, the point to be in this open set. However, situations often arise where functions are defined on closed intervals, and it is convenient to talk about the derivatives of the functions at the end points of the intervals. It is for this reason that we give the following slightly more general definition.
4.1.1 Definition. A function f is said to have a derivative at a $\iff a \in \mathcal{D}(f)$, a is an accumulation point of $\mathcal{D}(f)$, and
$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$
exists. In case this limit exists, it is called the derivative of f at a and is denoted by any one of the symbols '$f'(a)$', '$df(a)/dx$', or '$Df(a)$'. If f is differentiable at every point of its domain it is called differentiable. The derivative of a function f is that function $f'$ (or $df/dx$ or $Df$) with domain $\mathcal{D}(f') = \{x : f'(x) \text{ exists}\}$ (possibly void) whose value at the point $a \in \mathcal{D}(f')$ is the derivative of f at a.
From a logical point of view the notation $Df$, for the derivative of a function, is the most suitable. For we can consider D as a function with domain the collection $\mathcal{F}$ of all real-valued functions each having domain in R (including that function whose domain is the null set), and the range of D is also in $\mathcal{F}$. Then we can define
$D^2 = D \circ D$, $D^3 = D^2 \circ D$,
and so on. This "definition by induction" is made precise in the following way. It can be proved by induction that there exists a unique function F with domain N and range in the collection of functions each having domain $\mathcal{F}$ and range in $\mathcal{F}$ so that
$F(1) = D$, $F(n+1) = F(n) \circ D$.
If we set $D^n = F(n)$, then $\forall f \in \mathcal{F}$ we call $D^n f$ the nth derivative of f. Other notations are '$f^{(n)}$' and '$d^n f/dx^n$'. We also set $D^0 f = f$.
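The inductive definition $F(1) = D$, $F(n+1) = F(n) \circ D$ can be mimicked directly in code. The sketch below is restricted, as an assumption made purely for illustration, to polynomials represented by coefficient lists $[c_0, c_1, c_2, \ldots]$; the names `D`, `compose`, and `F` mirror the text but the representation is ours.

```python
def D(p):
    """Derivative of the polynomial c0 + c1*x + c2*x**2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:]

def compose(outer, inner):
    return lambda p: outer(inner(p))

def F(n):
    """F(1) = D and F(n + 1) = F(n) o D, for n in N = {1, 2, ...}."""
    return D if n == 1 else compose(F(n - 1), D)

cube = [0, 0, 0, 1]          # the polynomial x**3
print(F(1)(cube), F(2)(cube), F(3)(cube), F(4)(cube))
```

Here the empty list plays the role of the zero polynomial, so applying $D^n$ with n larger than the degree returns an empty list.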
4.1.2
at a� a lll8
Definition. E
We shall say that a function f is n-times differentiable £J(J 0, l/2n < S, and m be that integer so that m/2n �a < (m + 1) /2n. Set b1 m/2n, b2 = (m + l )/2n, and b = (b1 + b2)/2. If k > n, then sincefk has period l/2k, we have f (b1) =f (b) =fk(b2). If k < n, let p be that integer so k k that p/2k �b1 < (p + 1) /2k. Then, of course, p/2k < b< (p + 1) /2k and p/2k < b2 � (p + l)/2k. If pis even we have, for j E (1, 2), =
fk(b2) - fdb1) fk(bj)-h(b) = = 2k' b; -b b2 -b1 and, if pis odd,
In either event, if k >
n
or k
(f )
n
J:>(g).
h is
Formula (b) is a consequence of the equation
$\frac{(fg)(a+h) - (fg)(a)}{h} = f(a+h)\,\frac{g(a+h) - g(a)}{h} + g(a)\,\frac{f(a+h) - f(a)}{h}$,
the fact that f is continuous at a, and the fact that a limit of a sum and product is the sum of the limits and the product of the limits, respectively. Formula (c) is a consequence of the equation
$\frac{(1/g)(a+h) - (1/g)(a)}{h} = \frac{-1}{g(a+h)g(a)} \cdot \frac{g(a+h) - g(a)}{h}$,
the fact that g is continuous at a, and the fact that the limit of a product is the product of the limits. Note that since g is continuous and $g(a) \ne 0$, $g(a+h) \ne 0$ for all sufficiently small h for which $a + h \in \mathcal{D}(g)$.
4.2.2 Theorem (Chain Rule). If f and g are functions with $\mathcal{R}(g) \subset \mathcal{D}(f)$, and $g'(a)$ and $f'(g(a))$ exist, then $(f \circ g)'(a)$ exists and
$(f \circ g)'(a) = f'(g(a))\,g'(a)$.
Proof. The most natural way to begin to construct a proof is to consider the equality
$\frac{(f \circ g)(a+h) - (f \circ g)(a)}{h} = \frac{f(g(a+h)) - f(g(a))}{g(a+h) - g(a)} \cdot \frac{g(a+h) - g(a)}{h}$.
Since g has a derivative at a, it is continuous at a and hence as $h \to 0$, $g(a+h) - g(a) \to 0$. Consequently, taking limits on both sides of the
4.2
DIFFERENTIATION RULES I 147
above equality we should get the formula stated in the theorem. The one possible difficulty with this method is that $g(a+h) - g(a)$ could be zero for an infinite number of values of h in every neighborhood of zero. Hence it would not always be possible to divide by the quantity $g(a+h) - g(a)$. Consequently, we must proceed in a somewhat different way.
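The difficulty just described is not hypothetical. As a sketch, take $g(x) = x^2 \sin(1/x)$ for $x \ne 0$ with $g(0) = 0$, and $f(y) = e^y$; this particular pair is our own choice of illustration. Then $g(h) - g(0)$ vanishes at every $h = 1/(k\pi)$, yet the difference quotient of $f \circ g$ at 0 still tends to $f'(g(0))\,g'(0) = 1 \cdot 0 = 0$.

```python
import math

def g(x):
    # g(0) = 0 and g(x) = x**2 * sin(1/x) otherwise, so g'(0) = 0
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def f(y):
    return math.exp(y)

def quotient(h):
    # Difference quotient of f o g at a = 0
    return (f(g(h)) - f(g(0.0))) / h

# g(h) - g(0) is zero (up to rounding) at h = 1/(k*pi), arbitrarily close to 0.
zeros = [g(1.0 / (k * math.pi)) for k in range(1, 200)]
print(max(abs(z) for z in zeros))
print([quotient(h) for h in (1e-2, 1e-3, 1e-4, -1e-5)])
```

In floating point the values at $1/(k\pi)$ are only approximately zero, but they show why division by $g(a+h) - g(a)$ cannot be justified in general.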
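Since f is differentiable at $g(a)$, $\forall \varepsilon_1 > 0$, $\exists \delta_1 > 0$ so that $|y - g(a)| < \delta_1$ & $y \in \mathcal{D}(f) \implies$
$|f(y) - f(g(a)) - f'(g(a))(y - g(a))| \le \varepsilon_1 |y - g(a)|$. (4.2.1)
Also, since g is differentiable at a, $\forall \varepsilon_1 > 0$, $\exists \delta > 0$ so that $|x - a| < \delta$ & $x \in \mathcal{D}(g) \implies |g(x) - g(a)| < \delta_1$ and
$|g(x) - g(a) - g'(a)(x - a)| \le \varepsilon_1 |x - a|$.
Hence, if $|x - a|$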
0 so that $|x - a| < \delta_3$ and $x \in \mathcal{D}(g) \implies |g(x) - g(a)| < \delta_2$. Hence if $|x - a| < \delta_3$ and $x \in \mathcal{D}(g)$, we have from (4.2.6)
$|f(g(x)) - f(g(a)) - f'(g(a))(g(x) - g(a))| \le \varepsilon_1 |g(x) - g(a)|$. (4.2.6')
Let us put $\delta = \min(\delta_1, \delta_3)$ and $m = 1/|f'(g(a))|$. If we use (4.2.5) and (4.2.6'), then $\forall x \in \mathcal{D}(g)$ with $|x - a| < \delta$, we get
$\left| g(x) - g(a) - \frac{(f \circ g)'(a)}{f'(g(a))}(x - a) \right| \le \varepsilon_1 m \{ |g(x) - g(a)| + |x - a| \}$. (4.2.7)
Now set $M = |(f \circ g)'(a)|$ and use the triangle inequality on (4.2.7) to get
$(1 - m\varepsilon_1)\,|g(x) - g(a)| \le m(\varepsilon_1 + M)\,|x - a|$.
Take $\varepsilon_1 < 1/2m$, and we get
$|g(x) - g(a)| \le (1 + 2Mm)\,|x - a|$.
If we use this inequality in the right side of (4.2.7) and set $A = 2m(1 + mM)$, we get
$\left| g(x) - g(a) - \frac{(f \circ g)'(a)}{f'(g(a))}(x - a) \right| \le \varepsilon_1 A\,|x - a|$. (4.2.8)
Thus $\forall \varepsilon > 0$, take $\varepsilon_1 < \min(\varepsilon/A, 1/2m)$, and we see that $\exists \delta > 0$ so that $0 < |x - a| < \delta$ and $x \in \mathcal{D}(g) \implies$
$\left| \frac{g(x) - g(a)}{x - a} - \frac{(f \circ g)'(a)}{f'(g(a))} \right| < \varepsilon$. (4.2.9)
REMARKS. In Theorem 4.2.2 we demanded that $\mathcal{R}(g) \subset \mathcal{D}(f)$ to ensure that a is an accumulation point of $\mathcal{D}(f \circ g)$ so that we could talk about $(f \circ g)'(a)$. In Theorem 4.2.3 we needed $\mathcal{R}(g) \subset \mathcal{D}(f)$, since an examination of the proof reveals that otherwise (4.2.9) would not necessarily be valid for all $x \in \mathcal{D}(g)$ for which $0 < |x - a| < \delta$.
Exercises
1. Use the rules for differentiation to establish the following:
(a) If $f(x) = x^n$, $\forall x \in R$ and fixed $n \in N_0$, then f is differentiable.
(b) If c is a constant and f is differentiable at a, then cf is differentiable at a.
(c) Any polynomial function of degree $n \in N_0$,
$p(x) = \sum_{k=0}^{n} a_k x^k$,
is differentiable.
2. Use Theorem 4.2.3 to solve Exercise 1 of Section 4.1.
3. Assume the results of Exercise 1 of Section 4.1 as known. Suppose f and g are functions with $\mathcal{R}(g) \subset \mathcal{D}(f)$, $f'(y)$ exists and is continuous in an open interval around $g(a)$, $f'(g(a)) \ne 0$, and $(f \circ g)'(a)$ exists. Use the chain rule (Theorem 4.2.2) and the results of Exercise 1 of Section 4.1 to show that $g'(a)$ exists and
$g'(a) = (f \circ g)'(a)/f'(g(a))$.
4. Compute the derivative of $e(x) = e^x$ and, assuming the derivative of the logarithm function is known, use the chain rule to show that the following functions are differentiable and compute their derivatives:
(a) $f(x) = a^x = e^{x \log a}$, $a > 0$, $\forall x \in R$.
(b) $f(x) = x^a = e^{a \log x}$, $\forall x > 0$, $a \in R$.
5. Assuming that the logarithm is a differentiable function, use Theorem 4.2.3 (and not the chain rule) to show that the functions in (a) and (b) of Exercise 4 are differentiable.
6. Suppose that f and g are n times differentiable functions on ]a, b[. If h is the product of f and g, prove Leibnitz's formula for the nth derivative of h,
$h^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(n-k)}(x)\, g^{(k)}(x)$,
where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
4.3 MEAN VALUE THEOREMS
All the mean value theorems of the differential calculus are based on two principles: (a) a continuous function on a compact set assumes a maximum and a minimum, and (b) if a differentiable function is defined in an open interval about a point where it has a local maximum or minimum, then the derivative must be zero at this local maximum or minimum.
4.3.1 Theorem. If f is a function with $\mathcal{D}(f) = ]a, b[$, $a < b$, and if f has a local maximum or local minimum at c, and if moreover $f'(c)$ exists, then $f'(c) = 0$.
Proof. Let us suppose that f has a local minimum at c. Then
$\frac{f(c+h) - f(c)}{h} \ge 0$ if $h > 0$, and $\le 0$ if $h < 0$.
From this we see that the limit of the difference quotient, as $h \to 0$, must be zero.
4.3.2 Rolle's Theorem. If f is a continuous function with $\mathcal{D}(f) = [a, b]$, $a < b$, $f(a) = f(b) = 0$, and f is differentiable on ]a, b[, then $\exists c \in ]a, b[$ so that $f'(c) = 0$.
Proof. Since f is continuous on the compact set [a, b], it has a maximum and a minimum on this interval. If the maximum and minimum are taken on at the end points, we have $f(x) = 0$ for every x and hence the theorem is true. If f has a local maximum or minimum at $c \in ]a, b[$, then by Theorem 4.3.1, $f'(c) = 0$.
f(b) - f(a) = f'(c) . b-a Proof.
From a geometric point of view, the Mean Value Theorem
says that there is a point on the graph off where the tangent line to
f
is parallel to the line joining
(a, f(a))
to
(b,f(b)), Fig. 4.3.1. The (a,f(a)) and (b,f(b)) is
equation of the straight line through the points
f(b)-f(a) y=f(a) + (x-a). b-a y
a
c
Figure 4.3. 1
b
4.!
MEAN VALUE THEOREMS I 151
The difference between y andf(x) is
F(x)
=
f(b) - f(a) x a f(a) + b-a ( - )- f(x).
Now, F is a function which is continuous on
]a, b[, and F(a) F(b) 0. Hence F and find a c E ]a, b[ such that =
=
F'(c)
=
f(b)- (a) c f b-a - f'( )
differentiable on
=
0.
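The Mean Value Theorem only asserts that a suitable c exists, but in simple cases c can be located numerically as a zero of $F'$. The sketch below uses bisection, which is our own choice of method and assumes that $F'$ changes sign between the end points; the helper name `mean_value_point` and the example $f(x) = x^3$ on [0, 2] are illustrative assumptions.

```python
import math

def mean_value_point(df, slope, a, b, tol=1e-12):
    """Bisection for a zero c of F'(c) = slope - f'(c) in ]a, b[.

    Assumes F' has opposite signs at a and b (not guaranteed in general).
    """
    def phi(c):
        return slope - df(c)

    lo, hi = a, b
    if phi(lo) * phi(hi) > 0:
        raise ValueError("F' does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(lo) * phi(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)
c = mean_value_point(df, slope, a, b)
print(c, 2 / math.sqrt(3))
```

For this f the equation $3c^2 = 4$ can be solved by hand, giving $c = 2/\sqrt{3}$, which the bisection reproduces.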
If f and g are continuous b, and have derivatives on ]a, b[, then
Generalized Mean Value Theorem.
4.3.4
functions defined on [a,b], a 3c E ]a, b[ such that
0. =
The statement
P(l)
is a statement of the Mean Value Theorem and
hence is true. Assume
P(n- l) g(x)
n> l
is true for =
and set
6hf(x) . h [a, b- h] and is n- l [a, b], then x + (n- l)h P(n- l) to g and find a
The function g is defined and continuous on times differentiable on E
[a, b - h].
81
so that
]a, b- h[.
If
x + nh
E
Consequently, we may apply
0 < 81 < 1. Now,
1 g(x + (n - l) 81h) =
x u(x) exists on ]a, b[, show that
$f^{(2)}(x) = \lim_{h \to 0} \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$.
(Hint: Use L'Hospital's rule.)
11. Suppose f is differentiable on [a, b] and $f'(a) = \alpha$, $f'(b) = \beta$. Show that $f'$ takes on all values between $\alpha$ and $\beta$. [Hint: If $\gamma$ is strictly between $\alpha$ and $\beta$, show that $g(x) = f(x) - \gamma(x - a)$ takes on its maximum or its minimum in the open interval ]a, b[. This result is often called Darboux's theorem or property.]
12. Let (r n) be a sequence whose range consists of all rationals in ]O, I[. Show that
4.4 TAYLOR'S REMAINDER FORMULAS
If f is a polynomial of degree $n - 1$ and $a \in R$, then we may write
$f(x) = a_0 + a_1(x - a) + \cdots + a_{n-1}(x - a)^{n-1}$.
By successive differentiation of both sides of this equation at a we find that
$a_k = \frac{f^{(k)}(a)}{k!}$, $k \in (0, n-1)$.
For a general function defined on an interval [a, b], and which is $n - 1$ times differentiable there, we write
$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x, a)$,
where this formula serves to define $R_n$. The problem is to find a convenient form for the remainder $R_n$. Any such formula is called a Taylor's remainder formula, although none of the expressions for $R_n$ is due to Taylor himself. The method for obtaining expressions for the remainder is an application of Rolle's theorem.
Suppose J n - l continuous derivatives on [a, b], and n times differentiable on ]a, b[. Further suppose and '11 are continuous on [a, b], differentiable on ]a, b[ and Vx,y ]a, b[ with a y x, the determinant (y) '11'(y) I '(x) 'l'(x) I Then Vx ]a, b], 3c ]a, x[, so that n (x - a)k Rn(X, a), f(x)=k=OL-1 jk(a) l where (a) (a) n I p> Rn (x, a)= - -1I(x) , (-c)--v'{!'1-1 (x) '-(c-)- (n -(lc)) ! (x - c)n-1. (x) v(x) I F [a, x], a x b, ]a, x[, 3c ]a, x[ F'(c) '(c) 'l''(c) F(a) «l>(a) 'l'(a) = F(x) «l>(x) '11 (x) a t b, -1 J ,L -(t) (x - t)k. F(t) J(x) - nk=O k. F [a, b] ]a, b[ F'(t)=- (x(n--t)n)-1! pn>(t). F(a)=Rn (x, a), F(x) Rn(x,a), 4.4.1
Theorem. t is
has
E
(c) (x-c)n-1, '(c) (n-1) !
(b) The Roche form of the remainder:
$R_n(x, a) = \frac{f^{(n)}(c)}{p\,(n-1)!}(x - a)^p (x - c)^{n-p}$, $a < c < x$.
(c) The Lagrange form of the remainder:
$R_n(x, a) = \frac{f^{(n)}(c)}{n!}(x - a)^n$, $a < c < x$.
(d) The Cauchy form of the remainder:
$R_n(x, a) = \frac{f^{(n)}(c)}{(n-1)!}(x - a)(x - c)^{n-1}$, $a < c < x$.
Proof. To prove (a) choose $\Psi$ any nonzero constant and apply Theorem 4.4.1. To prove (b) set $\Phi(t) = (x - t)^p$, $1 \le p \le n$, in part (a). To prove (c) and (d), set $p = n$ and $p = 1$ in part (b), respectively.
REMARK: In Theorem 4.4.1 and Corollary 4.4.2 for the sake of convenience in the statements and proofs we have expanded about the left end point a. Clearly, everything will be equally valid if we expand around the right end point b. In the future we shall use this fact without comment even though we refer to Theorem 4.4.1 or Corollary 4.4.2.
The Taylor remainder formulas give a very convenient method for deciding when a given infinitely differentiable function is the sum of a convergent infinite series. For example, $\forall n \in N_0$ the nth derivative of the exponential function is again the exponential function. If we use the Lagrange form of the remainder we may write
$e^x = \sum_{k=0}^{n-1} \frac{x^k}{k!} + \frac{e^c}{n!}x^n$,
where c is a number between x and 0 and depends on both n and x. If $|x| \le b$, then
$\left| \frac{e^c}{n!}x^n \right| \le e^b\,\frac{b^n}{n!}$.
Since $b^n/n! \to 0$ (Exercise 6 of Section 1.7), we see that the remainder goes uniformly to zero in $[-b, b]$. Thus
$e^x = \sum_{k=0}^{\infty} x^k/k!$,
and the convergence of the series on the right is uniform on every compact set in R.
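The uniform estimate on the Lagrange remainder for the exponential function can be observed numerically. In the sketch below the names `taylor_exp` and `lagrange_bound` are ours; the code compares the actual error of the Taylor polynomial on a grid in $[-b, b]$ with the bound $e^b b^n/n!$.

```python
import math

def taylor_exp(x, n):
    # Taylor polynomial sum_{k=0}^{n-1} x**k / k!
    return sum(x ** k / math.factorial(k) for k in range(n))

def lagrange_bound(b, n):
    # Uniform bound e**b * b**n / n! for the remainder on [-b, b]
    return math.exp(b) * b ** n / math.factorial(n)

b = 2.0
grid = [-b + 4 * i / 100 for i in range(101)]
for n in (5, 10, 20):
    worst = max(abs(math.exp(x) - taylor_exp(x, n)) for x in grid)
    print(n, worst, lagrange_bound(b, n))
```

The worst error over the grid stays below the bound, and both shrink rapidly as n grows, reflecting $b^n/n! \to 0$.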
The proof we have given above to show that the exponential function is the sum of an infinite series leads immediately to a more general result: If a function f is defined and infinitely differentiable in an open interval $I(a)$ about the point a, and if $(f^{(k)})$ when restricted to every compact subinterval of $I(a)$ is uniformly bounded, then
$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k$
and the convergence is uniform on every compact set in $I(a)$. Indeed, suppose $J = \{x : |x - a| \le b\} \subset I(a)$ and let $M = \sup\{|f^{(k)}(x)| : x \in J\ \&\ k \in N_0\}$. Using the Lagrange form of the remainder again, we get for $x \in J$,
$|R_n(x, a)| = \left| \frac{f^{(n)}(c)}{n!}(x - a)^n \right| \le M\,\frac{b^n}{n!}$.
This estimate on $R_n$ gives the result.
Actually, it is a rather interesting fact that the conclusions we have obtained in the previous paragraph will follow from the considerably milder assumption that all the derivatives are uniformly bounded below (or above). This fact is due to Serge Bernstein. Before we prove Bernstein's theorem we shall prove a lemma. The development we give is taken from the book by W. Maak, An Introduction to Modern Calculus, Holt, Rinehart and Winston, Inc., New York, 1963.
4.4.3 Lemma. Suppose J has do main [a,b] , is infinitely differentiable, and 3m > 0 so that Vx in [a, b] and Vn E N0,
-m
,,;_:;
pn>(x) .
Then 3M so that Vx E [a,b[ and Vn E N0, j Proof.
Suppose, at first, that
remainder,
Vx E [a,b[, we
f (b) where
x
(x)I� (b -x)n Let us use the Cauchy form of the remainder in Taylor's formula:
pn>(O R (x,c)= (x-c)(x-On-1• n (n-I)! Using the above estimate for
IJ({)I.
I Rn(x, c) I � nM If
c
0 uniformly for
This proves the theorem, since the last statement of
the theorem is obvious.
The last theorem gives a sufficient condition in order that a function may be represented by a special kind of infinite series. Such functions have a special name that we emphasize by a formal definition.
4.4.5 Definition. If f is an infinitely differentiable function having an open domain whose values in some open interval about the point a can be represented as
$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k$,
then f is said to be analytic at a. The series is called the Taylor expansion of f at a. If f is analytic at every point of its domain it is called analytic.
We should point out that if a function is infinitely differentiable in the neighborhood of a point it does not necessarily mean that the function is analytic at the point. For example, the function given by
$f(x) = e^{-1/x^2}$, $x \ne 0$; $f(x) = 0$, $x = 0$,
is infinitely differentiable and $\forall n \in N_0$, $f^{(n)}(0) = 0$ (see Exercise 6 of Section 4.1). Hence, if f were analytic at zero, it would of necessity have to have zero values at every point of some neighborhood of the origin, which of course it doesn't.
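The flatness of this function at the origin can be glimpsed numerically. The sketch below assumes the exponent is $-1/x^2$, which is the usual form of this example; it shows that $f(h)/h^n$ is extremely small for small h even when n is large, while f itself is positive away from zero.

```python
import math

def f(x):
    # Assumed form of the example: exp(-1/x**2) for x != 0, and 0 at x = 0
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

for h in (0.5, 0.2, 0.1, 0.05):
    print(h, f(h), f(h) / h ** 20)
```

Every Taylor coefficient at zero vanishes, so the Taylor expansion at zero is identically zero, whereas the printed values of f are positive (although tiny).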
Exercises
1. Do Exercise 10 of Section 4.3 by using an appropriate form of Taylor's remainder formula and assuming that $f^{(2)}$ is continuous.
2. Let f be the function with domain ]-1, 1[ defined by
$f(x) = \frac{1}{1 - x}$.
Show that f is analytic at zero and that its Taylor expansion at zero converges uniformly on every compact subset of ]-1, 1[.
3. Generalizing the considerations of Exercise 2, let $f_a$ be the function with domain ]-1, 1[ defined by
$a \in R$.
Show that $f_a$ is analytic at zero and its Taylor expansion at zero converges uniformly on every compact subset of ]-1, 1[.
4. Show that the function f with domain ]-1, 1[ defined by
$f(x) = \log(1 - x)$
is analytic at zero and its Taylor expansion at zero converges uniformly on every compact subset of ]-1, 1[.
5. We have shown in Section 4.3 that the exponential function with values $e^x$ is analytic at zero and its Taylor expansion converges at every point of R. Hence we may write
$e = \sum_{k=0}^{\infty} 1/k! = \sum_{k=0}^{n} 1/k! + R_{n+1}$.
Show that $R_{n+1} < 1/(n!\,n)$. This estimate for $R_{n+1}$ shows that e is irrational. Indeed, if e is rational it can be written $e = p/q$, $p, q \in N$ and q
L k=O
llk' < pfq
(c)
POWER SERIES I 165
k [ g(x) - k=O 'f gk< >�a)(x - a)k ] pn>(c). . By choosing g to be a suitable function, obtain the Taylor formula with Lagrange remainder. Give an example of a function
8.
f that is defined and infinitely
differentiable in a neighborhood of zero so that in some interval
� �
f(
x) =
J 1.
Indeed, the proof of the Cauchy root test shows that the absolute convergence is uniform on every compact subinterval of $]a - r, a + r[$. However, the uniform convergence is also easily established by means of the Weierstrass M Test. For, if $0 < s < r$, then $\sum_{k=0}^{\infty} |c_k| s^k$ converges and hence $\sum_{k=0}^{\infty} |c_k|\,|x - a|^k$ converges uniformly in the interval $[a - s, a + s]$.
The number r of (4.5.1) is called the radius of convergence of the given power series and $]a - r, a + r[$ is called the interval of convergence.
The question as to whether a convergent power series is always the Taylor expansion of an analytic function is answered by the next two results.
4.5.2 Theorem. If the power series $\sum_{k=0}^{\infty} (c_k (x - a)^k)$ has a nonzero radius of convergence r, then the function f defined on ]a - r, a + r[ by
$$f(x) = \sum_{k=0}^{\infty} c_k (x - a)^k$$
is infinitely differentiable and
$$f'(x) = \sum_{k=1}^{\infty} k c_k (x - a)^{k-1},$$
where the latter series also has the radius of convergence r.
Proof. Since (Exercise 10 of Section 1.9 and Exercise 5 of Section 2.4)
$$\lim_{k \to \infty} k^{1/k} = 1,$$
it follows that both of the series above have the same radius of convergence. If we set $f_k(x) = c_k (x - a)^k$, then $f_k$ is differentiable and by Theorem 4.5.1 and the first paragraph of the proof of this theorem it follows that $\sum_{k=0}^{\infty} (f_k')$ is uniformly convergent on any compact subinterval of the interval of convergence. Thus we may apply Corollary 4.2.5 on termwise differentiation of a series to arrive at the differentiation formula of this theorem. The fact that the limit of the power series is infinitely differentiable follows by use of the axiom of induction.
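The differentiation formula of Theorem 4.5.2 can be checked numerically on a familiar series. The following Python sketch is an illustration only: the helper names are hypothetical, and the choice $c_k = 1$, $a = 0$ (the geometric series, with $f(x) = 1/(1 - x)$ on ]-1, 1[) is an assumption made for the example, not something taken from the text.

```python
def power_series_partial_sum(coeffs, a, x):
    """Evaluate the finite sum of coeffs[k] * (x - a)**k."""
    return sum(c * (x - a) ** k for k, c in enumerate(coeffs))


def differentiate_termwise(coeffs):
    """Return the coefficients of the termwise derivative.

    The term c_k (x - a)^k becomes k c_k (x - a)^(k - 1), so the new
    coefficient list starts with 1 * c_1.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


# Illustrative choice: c_k = 1, a = 0, so f(x) = 1/(1 - x) on ]-1, 1[.
coeffs = [1.0] * 200
x = 0.5
approx_f = power_series_partial_sum(coeffs, 0.0, x)
approx_f_prime = power_series_partial_sum(differentiate_termwise(coeffs), 0.0, x)
print(approx_f, 1.0 / (1.0 - x))             # partial sum versus f(x)
print(approx_f_prime, 1.0 / (1.0 - x) ** 2)  # termwise derivative versus f'(x)
```

Both partial sums are taken at a point of a compact subinterval of ]-1, 1[, where the theorem guarantees uniform convergence of the series and of its termwise derivative.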
4.5.3 Corollary. If the power series $\sum_{k=0}^{\infty} (c_k (x - a)^k)$ has a nonzero radius of convergence r, and if
$$f(x) = \sum_{k=0}^{\infty} c_k (x - a)^k, \qquad x \in\ ]a - r, a + r[,$$
then $\forall n \in N_0$,
$$c_n = f^{(n)}(a)/n!.$$
Proof. By the use of the previous theorem and the axiom of induction it is easy to establish that $\forall n \in N_0$,
$$f^{(n)}(x) = \sum_{k=n}^{\infty} k(k - 1) \cdots (k - n + 1)\, c_k (x - a)^{k-n}.$$
Setting $x = a$ in both sides gives the formula for $c_n$.
If we formally multiply two Maclaurin series together and collect terms all having the same powers of x, we get
$$\sum_{n=0}^{\infty} (a_n x^n) \cdot \sum_{n=0}^{\infty} (b_n x^n) = \sum_{n=0}^{\infty} (c_n x^n),$$
where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$
The power series on the right is called the Cauchy product of the series on the left. The natural question to ask concerns the value of the radius of convergence of the Cauchy product in relation to the radii of convergence of the series that make up the product. We shall prove a theorem of this nature for series of constant terms that will immediately answer this question for power series.
4.5.4 Definition. If $(a, \sigma(a))$ and $(b, \sigma(b))$ are infinite series, then the Cauchy product of these series is the series $(c, \sigma(c))$, where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}. \qquad (4.5.2)$$
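Formula (4.5.2) is easy to try out on finite lists of coefficients. The sketch below is an illustration only: the function name `cauchy_product` and the two sample series are assumptions chosen for the example. It uses exact rational arithmetic so that the coefficients can be compared without rounding.

```python
from fractions import Fraction
from math import factorial


def cauchy_product(a, b):
    """Return c_0, ..., c_{N-1} with c_n = sum_{k=0}^{n} a_k * b_{n-k}.

    N is the smaller of the two list lengths, so every c_n that is
    returned uses only coefficients that were actually supplied.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]


# Geometric series: a_n = b_n = 1 gives c_n = n + 1,
# the Maclaurin coefficients of 1/(1 - x)^2.
geometric_square = cauchy_product([1] * 6, [1] * 6)
print(geometric_square)

# Exponential series: a_n = b_n = 1/n! gives c_n = 2^n / n!,
# the Maclaurin coefficients of e^(2x).
exp_coeffs = [Fraction(1, factorial(n)) for n in range(6)]
exp_square = cauchy_product(exp_coeffs, exp_coeffs)
print(exp_square)
```

In both cases the Cauchy product reproduces the coefficients of the product of the two functions, which is what the formal multiplication of Maclaurin series suggests.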
The following theorem about Cauchy products is somewhat more general than is needed to establish the facts about Cauchy products of power series. The proof would be somewhat easier if we demanded that both series be absolutely convergent.
4.5.5 Theorem (Mertens). If (a, A, it follows that $I_k = \bigcup \{I^* : I^* \in \Delta_k\}$ and hence (Exercise 2 at the end of this section)
$$|I_k| = \sum_{I^* \in \Delta_k} |I^*|.$$
If $m^*(I^*) = \inf \{f(x) : x \in I^*\}$ and $I^* \subset I_k$, it is clear that $m_k \le m^*(I^*)$. Thus we get
$$\sum_{k=1}^{n} m_k |I_k| \le \sum_{k=1}^{n} \sum_{I^* \in \Delta_k} m^*(I^*) |I^*|.$$
Since $\Delta^* = \{I^* : I^* \in \Delta_k \ \&\ k \in (1, n)\}$, it follows that the right side of the above inequality is precisely $\underline{D}_f(\Delta^*)$. This proves the left-hand inequality of Lemma 5.1.6. The right-hand inequality follows by similar reasoning, and the middle inequality is obvious.
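Proof of Theorem 5.1.5. Suppose that the Riemann integral of f exists. Then f must be bounded (Exercise 4 of Section 5.1) and $\forall \varepsilon > 0$ there is a decomposition $\Delta$ of [a, b] such that
$$-\varepsilon/2 < R_f(\Delta, \{x_k\}) - R(f) < \varepsilon/2. \qquad (5.1.1)$$
We shall suppose that a < b, since otherwise the fact that D(f) exists and is equal to R(f) is trivial. If we choose $x_k \in I_k$ so that $M_k - f(x_k) < \varepsilon/2(b - a)$, then
$$\overline{D}_f(\Delta) - R_f(\Delta, \{x_k\}) = \sum_{k=1}^{n} (M_k - f(x_k)) |I_k| < \varepsilon/2. \qquad (5.1.2)$$
On the other hand, if we choose $x_k \in I_k$ so that $f(x_k) - m_k < \varepsilon/2(b - a)$, we get
$$R_f(\Delta, \{x_k\}) - \underline{D}_f(\Delta) = \sum_{k=1}^{n} (f(x_k) - m_k) |I_k| < \varepsilon/2. \qquad (5.1.3)$$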
From (5.1.1) and (5.1.2) and the definition of D(J) we get D(J) - R(J) � l5t Q_there are decompositiOns .6.1 and .6.2 so that D(f) -[21(.6.1) < e, and DtC.6.2) -D(f) < e. If .6., is the common refinement of .6.1 and .6.2, then Lemma 5.1.6 gives =
=
D(f) - []t(.6..)
.6., ==::} 5. 1. 7
�
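Proof. If the Riemann-Darboux integral of f exists, then $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon$ $\Rightarrow$
$$|R_f(\Delta, \{x_k\}) - R(f)| < \varepsilon/2, \qquad |R_f(\Delta', \{x'_k\}) - R(f)| < \varepsilon/2.$$
Hence, by the triangle inequality,
$$|R_f(\Delta, \{x_k\}) - R_f(\Delta', \{x'_k\})| < \varepsilon.$$
Conversely, suppose $R_f$ is Cauchy. The method of obtaining a limit is a variation on the theme of the proof of Proposition 2.4.4. There exists a decomposition $\Delta_0$ so that $\Delta \succ \Delta_0$ $\Rightarrow$ $|R_f(\Delta, \{x_k\}) - R_f(\Delta_0, \{x_{0k}\})| < 1$. Hence $\Delta \succ \Delta_0$ $\Rightarrow$ $|R_f(\Delta, \{x_k\})|$ is bounded by $1 + |R_f(\Delta_0, \{x_{0k}\})|$. For every $\Delta \succ \Delta_0$ let us set
$$\overline{\omega}(\Delta) = \sup \{R_f(\Delta', \{x'_k\}) : \Delta' \succ \Delta\},$$
$$\overline{\lim}\, R_f = \inf \{\overline{\omega}(\Delta) : \Delta \succ \Delta_0\}.$$
We claim that $\overline{\lim}\, R_f$ is the Riemann integral of f. First, note that $\Delta^* \succ \Delta$ $\Rightarrow$ $\overline{\omega}(\Delta^*) \le \overline{\omega}(\Delta)$. Next $\forall \varepsilon > 0$, $\exists \Delta_1 \succ \Delta_0$ so that (5.1.7)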
Also, 3A2 >-A0 so that A, A*>- A2 ==:} IR1(A, {xd)-R1(A*, {x\}) I
- A, and A' >-A,==:}
IR1(A, {xk}) -R1(A', {x�})I
< E.
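Let $\Delta_{1\varepsilon}$, $\Delta_{2\varepsilon}$, and $\Delta_{3\varepsilon}$ be the subsets of $\Delta_\varepsilon$ which are decompositions of [a, c], [c, d], and [d, b], respectively. If $\Delta_2$ is a decomposition of [c, d] and $\Delta_2 \succ \Delta_{2\varepsilon}$, then $\Delta = \Delta_{1\varepsilon} \cup \Delta_2 \cup \Delta_{3\varepsilon}$ is a decomposition of [a, b], which is a refinement of $\Delta_\varepsilon$. If $\Delta'_2 \succ \Delta_{2\varepsilon}$ and $\Delta' = \Delta_{1\varepsilon} \cup \Delta'_2 \cup \Delta_{3\varepsilon}$, then
Hence $R_g$ is Cauchy and the theorem follows from Theorem 5.1.7.
5.1.9 Theorem. If f and g are Riemann-Darboux integrable functions defined on [a, b], then (a) for all real numbers $\alpha$ and $\beta$, $\alpha f + \beta g$ is integrable, (b) fg is integrable, and (c) |f| is integrable.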
Proof.
For every
E
>
0 , 3d, so that d
>-
d, ==?
I �1 af(xk)IIkl - J:f(x) I I� J3g(xk)IIkl - J3 J: g(x) l
p (t) dt. (n - I)!
If $\beta \ge \alpha$ and g is integrable on $[\alpha, \beta]$, define
$$\int_\beta^\alpha g(t)\,dt = -\int_\alpha^\beta g(t)\,dt.$$
With this definition, the formula for $R_n$ is valid as well for $a \le x \le c \le b$. Hence
$$R_n(x, c) = \frac{1}{(n - 1)!} \int_c^x (x - t)^{n-1} f^{(n)}(t)\,dt. \qquad (5.2.1)$$
5.2.4 Theorem. If g is a continuous increasing function on [a, b] and differentiable on ]a, b[, f is defined on $\mathcal{R}(g)$ and integrable, and $(f \circ g) g'$ is integrable, then
$$\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f \circ g(x)\, g'(x)\,dx.$$
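The change of variable formula of Theorem 5.2.4 can be tested numerically. The following Python sketch is an illustration only: the choices $f(x) = \cos x$ and $g(x) = x^2$ on [0, 2] are assumptions made for the example, and the midpoint rule stands in for the Riemann sums of the text.

```python
import math


def midpoint_integral(func, a, b, n=20000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def f(x):
    return math.cos(x)


def g(x):
    # continuous and increasing on [0, 2], differentiable on ]0, 2[
    return x * x


def g_prime(x):
    return 2.0 * x


a, b = 0.0, 2.0
left_side = midpoint_integral(f, g(a), g(b))
right_side = midpoint_integral(lambda x: f(g(x)) * g_prime(x), a, b)
print(left_side, right_side, math.sin(4.0))
```

Both sides approximate $\sin 4$, the exact value of the integral of $\cos$ over [0, 4].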
Proof. If $\Delta = \{[a_k, b_k] : k \in (1, n)\}$ is a decomposition of [a, b], then $\Delta' = \{[a'_k, b'_k] : k \in (1, n)\}$, where $a'_k = g(a_k)$, $b'_k = g(b_k)$, is a decomposition of [g(a), g(b)]. Conversely, since g is continuous and increasing, it is one to one and every decomposition of [g(a), g(b)] comes from a decomposition of [a, b] in this way. Further, $x_k \in [a_k, b_k]$ $\Leftrightarrow$ $x'_k = g(x_k) \in [a'_k, b'_k]$ and $\Delta_1 \succ \Delta$ $\Leftrightarrow$ $\Delta'_1 \succ \Delta'$. Since f is integrable, for every $\varepsilon > 0$ there exists a decomposition $\Delta'_\varepsilon$ of [g(a), g(b)] so that if $\Delta' \succ \Delta'_\varepsilon$, then
$$\left| R_f(\Delta', \{x'_k\}) - \int_{g(a)}^{g(b)} f(x)\,dx \right|$$
0, we have
$$\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+h} f(t)\,dt, \qquad f(x) = \frac{1}{h} \int_x^{x+h} f(x)\,dt.$$
Hence
$$\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| \le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)|\,dt.$$
Since f is continuous at x, $\forall \varepsilon > 0$, $\exists \delta$ so that $|t - x| < \delta$ & $t \in \mathcal{D}(f)$ $\Rightarrow$ $|f(t) - f(x)| < \varepsilon$. If we take $h < \delta$, we are led to the conclusion
$$\lim_{h \searrow 0} \frac{F(x + h) - F(x)}{h} = f(x).$$
If $h < 0$, a similar argument shows that
$$\lim_{h \nearrow 0} \frac{F(x + h) - F(x)}{h} = f(x),$$
which concludes the proof of the theorem.
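The two one-sided limits in this proof can be observed numerically. In the Python sketch below, which is an illustration only, the function name `difference_quotient`, the choice $f = \cos$, and the point $x = 1$ are assumptions made for the example. The quantity $F(x + h) - F(x)$ is computed as the integral of f over the interval with endpoints x and x + h, exactly as in the first displayed identity of the proof.

```python
import math


def difference_quotient(f, x, h, n=2000):
    """Return (F(x + h) - F(x)) / h, where F(x) is the integral of f from a to x.

    F(x + h) - F(x) equals the integral of f from x to x + h; it is
    approximated by the midpoint rule with n subintervals. A negative h
    gives a negative step, which matches the sign convention for
    integrals taken from a larger to a smaller limit.
    """
    step = h / n
    integral = step * sum(f(x + (i + 0.5) * step) for i in range(n))
    return integral / h


x = 1.0
for h in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
    print(h, difference_quotient(math.cos, x, h), math.cos(x))
```

As |h| decreases, the printed quotients approach $\cos 1$ from both sides, in agreement with the continuity hypothesis on f at x.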
5.2.6 First Mean Value Theorem. If f and g are defined on [a, b], $g \ge 0$, fg and g are integrable, and f is bounded, then there exists a number c such that $\inf f \le c \le \sup f$ and
$$\int_a^b f(x) g(x)\,dx = c \int_a^b g(x)\,dx.$$
Proof. Let $m = \inf f$, $M = \sup f$. Since
$$m g(x) \le f(x) g(x) \le M g(x),$$
it follows that
$$m \int_a^b g(x)\,dx \le \int_a^b f(x) g(x)\,dx \le M \int_a^b g(x)\,dx.$$
Now, if $\int_a^b g(x)\,dx = 0$, then the theorem is clearly true. If $\int_a^b g(x)\,dx > 0$, set
$$c = \int_a^b f(x) g(x)\,dx \Big/ \int_a^b g(x)\,dx,$$
and it is immediate that $m \le c \le M$.
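The number c of the First Mean Value Theorem can be computed in a concrete case. The Python sketch below is an illustration only: the functions $f(x) = x$ and $g(x) = x^2$ on [0, 1] are assumptions chosen for the example, and the midpoint rule replaces the integrals of the text.

```python
def midpoint_integral(func, a, b, n=10000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def f(x):
    # bounded on [0, 1], with inf f = 0 and sup f = 1
    return x


def g(x):
    # nonnegative on [0, 1]
    return x * x


a, b = 0.0, 1.0
integral_fg = midpoint_integral(lambda x: f(x) * g(x), a, b)
integral_g = midpoint_integral(g, a, b)
c = integral_fg / integral_g
print(c)  # lies between inf f = 0 and sup f = 1
```

Here the exact integrals are 1/4 and 1/3, so c is 3/4, which indeed lies between the infimum and the supremum of f.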
5.2.7 Second Mean Value Theorem. If f and g are defined on [a, b], $g \ge 0$, fg and g are integrable, f is bounded, and $m \le \inf f \le \sup f \le M$, then $\exists c \in [a, b]$ such that
$$\int_a^b f(x) g(x)\,dx = m \int_a^c g(x)\,dx + M \int_c^b g(x)\,dx.$$
Proof. Define the function G on [a, b] by the equation
$$G(x) = m \int_a^x g(t)\,dt + M \int_x^b g(t)\,dt.$$
G is a continuous function (Exercise 4 of Section 5.2) and
$$\min G \le G(b) = m \int_a^b g(t)\,dt \le \int_a^b f(t) g(t)\,dt \le M \int_a^b g(t)\,dt = G(a) \le \max G.$$
Since G takes on all values between min G and max G, and since the above inequality shows that $\int_a^b f(t) g(t)\,dt$ is a number between this maximum and minimum, $\exists c \in [a, b]$ such that
$$G(c) = \int_a^b f(t) g(t)\,dt.$$
This proves the theorem.
We shall now present a theorem concerning the integration of a uniformly convergent sequence of integrable functions. The theorem is useful for a wide variety of purposes.
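5.2.8 Theorem. If $(f_n)$ is a sequence of functions each of which is defined on [a, b] and integrable, and if $f_n \to f$ uniformly on [a, b], then f is integrable and
$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx.$$
Proof. For every $\varepsilon > 0$, $\exists N$ so that $n \ge N$ and $x \in [a, b]$ $\Rightarrow$ $|f_n(x) - f(x)| < \varepsilon/3(b - a)$. In particular, this means that for every set $A \subset [a, b]$, and $\forall n \ge N$,
$$|\sup \{f_n(x) : x \in A\} - \sup \{f(x) : x \in A\}| \le \varepsilon/3(b - a),$$
$$|\inf \{f_n(x) : x \in A\} - \inf \{f(x) : x \in A\}| \le \varepsilon/3(b - a).$$
Fix $m \ge N$ and let $\Delta_\varepsilon$ be a decomposition of [a, b] so that $\Delta \succ \Delta_\varepsilon$ $\Rightarrow$
$$|\overline{D}_{f_m}(\Delta) - \underline{D}_{f_m}(\Delta)| < \varepsilon/3.$$
Hence
$$|\overline{D}_f(\Delta) - \underline{D}_f(\Delta)| \le |\overline{D}_f(\Delta) - \overline{D}_{f_m}(\Delta)| + |\overline{D}_{f_m}(\Delta) - \underline{D}_{f_m}(\Delta)| + |\underline{D}_{f_m}(\Delta) - \underline{D}_f(\Delta)| < \varepsilon.$$
Thus f is integrable. Further, $n \ge N$ $\Rightarrow$
$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \varepsilon/3,$$
which concludes the proof.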
5.2.9 Corollary. If $(f_n)$ is a sequence of functions each of which is defined on [a, b] and integrable, and if $\sum_{k=0}^{\infty} (f_k)$ is uniformly convergent to f on [a, b], then f is integrable and
$$\int_a^b f(x)\,dx = \sum_{k=0}^{\infty} \int_a^b f_k(x)\,dx.$$
We shall leave the obvious proof of this corollary for the reader. We should remark that it is possible to relax the hypothesis about the uniform convergence of the sequence $(f_n)$ and still obtain the conclusion of Theorem 5.2.8. However, the hypothesis cannot be relaxed all the way to pointwise convergence as the following simple example shows.
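Suppose $(f_n)$ is the sequence of functions, each having domain [0, 1] and defined in the following way:
$$f_n(x) = \begin{cases} n + 1 & \Leftrightarrow\ x \in\ ]0, 1/(n + 1)], \\ 0, & \text{otherwise.} \end{cases}$$
It is not hard to check that each $f_n$ is integrable and $\forall x \in [0, 1]$, $f_n(x) \to 0$. But $\forall n \in N_0$,
$$\int_0^1 f_n(x)\,dx = 1.$$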
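The behavior of this example can be made concrete with exact arithmetic. The Python sketch below is an illustration only: the function names are hypothetical, and the integral is computed from the fact that each $f_n$ is a step function, equal to n + 1 on an interval of length 1/(n + 1) and zero elsewhere.

```python
from fractions import Fraction


def f(n, x):
    """Value of f_n at x: n + 1 on ]0, 1/(n + 1)], and 0 otherwise."""
    return n + 1 if 0 < x <= Fraction(1, n + 1) else 0


def integral_of_f(n):
    """Integral of f_n over [0, 1]: height n + 1 times width 1/(n + 1)."""
    return (n + 1) * Fraction(1, n + 1)


x = Fraction(1, 100)
print([f(n, x) for n in (10, 50, 99, 100, 1000)])  # nonzero only while 1/(n + 1) >= x
print([integral_of_f(n) for n in (0, 10, 100, 1000)])
```

For the fixed point x = 1/100 the values $f_n(x)$ vanish as soon as n is at least 100, while the integrals stay equal to 1 for every n, so the sequence of integrals does not converge to the integral of the pointwise limit.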
If a sequence $(f_n)$ converges pointwise to an integrable function f and if the sequence is uniformly bounded, then the conclusion of Theorem 5.2.8 remains valid. The proof of this result belongs more properly to the circle of ideas connected with the Lebesgue theory of integration and we shall not prove it in this book. Note that in our previous example the sequence of functions does not remain uniformly bounded. We shall now finish this section by giving two different sufficient conditions that a Riemann-Darboux integral of a function exists. Although the conditions we shall give are not necessary conditions, they nevertheless are broad enough to be very useful in a wide variety of circumstances.
5.2.10 Theorem. If f is defined on [a, b] and is monotone nondecreasing [nonincreasing], then f has a Riemann-Darboux integral.
Proof. Let $\Delta = \{I_k : k \in (1, n)\}$ be any decomposition of [a, b]. Suppose that the intervals $I_k = [a_k, b_k]$ are named so that $a_1 = a$, $b_n = b$, and $a_k = b_{k-1}$ for $1 < k \le n$. Since f is nondecreasing, the minimum and maximum of f restricted to $I_k$ are taken on at $a_k$ and $b_k$, respectively. Hence
$$\overline{D}_f(\Delta) = f(a_2)(a_2 - a_1) + f(a_3)(a_3 - a_2) + \cdots + f(b)(b - a_n),$$
$$\underline{D}_f(\Delta) = f(a_1)(a_2 - a_1) + f(a_2)(a_3 - a_2) + \cdots + f(a_n)(b - a_n).$$
Thus
$$\overline{D}_f(\Delta) - \underline{D}_f(\Delta) = \sum_{k=1}^{n} [f(a_{k+1}) - f(a_k)][a_{k+1} - a_k].$$
Now, for $\varepsilon > 0$, choose $\Delta$ so that $|\Delta|$
( (x - c)s
by making the change of variable t= (x- c)s + c in ;;:. 0, pn> is nondecreasing and thus for x > c,
J 1. Thus the p series converges {:=:} p > 1 .
Let us give another example that shows how the techniques used in the proof of the integral test can be used to obtain rather refined estimates for certain finite sums. Let us show that
$$\sum_{k=2}^{n} \frac{1}{k \log k} = \log \log n + a + b_n,$$
where a is a constant that satisfies
0 < a
0 it is clear from the formula
$$-\log \varepsilon = \int_\varepsilon^1 \frac{dt}{t}$$
that the function with domain ]0, 1] and values 1/t does not have a convergent improper integral of the second kind. On the other hand, for $\varepsilon > 0$,
$$\lim_{\varepsilon \to 0} \left[ \int_{-1}^{-\varepsilon} \frac{dt}{t} + \int_\varepsilon^1 \frac{dt}{t} \right] = 0.$$
If an integral of a function exists in this sense, we say that f has a convergent Cauchy principal value integral. We give the formal definition below.
Definition. Suppose f has domain [a,b] \{c}, c E ]a,b[, and with O f dt +
c+E
f(t) dt.
Then the ordered pair (f, I(f)) is called a Cauchy principal value integral. Similarly, if f has domain $]-\infty, \infty[$ and $\forall x \ge 0$, $f|[-x, x]$ is integrable, and
$$I(f)(x) = \int_{-x}^{x} f(t)\,dt,$$
then the ordered pair (f, I(f)) is also called a Cauchy principal value integral. The Cauchy principal value integral (f, I(f)) is said to be convergent $\Leftrightarrow$ $\lim_{\varepsilon \to 0} I(f)(\varepsilon)$ or $\lim_{x \to \infty} I(f)(x)$ exists, and the latter numbers are called Cauchy principal values.
We shall now give some examples which show that the symmetry used in the definition of the Cauchy principal value may be very important. Since $t^2 \sin t/(1 + t^2)$ is an odd function, its integral over [-x, x] is zero. Also the integral of $1/(1 + t^2)$ over [-x, x] is 2 Arc tan x. Thus
$$\lim_{x \to \infty} \int_{-x}^{x} \frac{1 + t^2 \sin t}{1 + t^2}\,dt = \pi.$$
On the other hand,
$$\int_{-x}^{x+\pi} \frac{1 + t^2 \sin t}{1 + t^2}\,dt = 2\,\text{Arc tan}\,x + 2 \cos x + \int_x^{x+\pi} \frac{1 - \sin t}{1 + t^2}\,dt,$$
and thus the limit does not exist as $x \to \infty$.
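By the same type of reasoning we find that
$$\lim_{x \to \infty} \int_{-x}^{x} \frac{1 + t}{1 + t^2}\,dt = \pi.$$
On the other hand,
$$\lim_{x \to \infty} \int_{-x}^{2x} \frac{1 + t}{1 + t^2}\,dt = \lim_{x \to \infty} \int_{-x}^{x} \frac{1 + t}{1 + t^2}\,dt + \lim_{x \to \infty} \int_x^{2x} \frac{1 + t}{1 + t^2}\,dt.$$
Now,
$$\lim_{x \to \infty} \int_x^{2x} \frac{dt}{1 + t^2} = 0, \qquad \lim_{x \to \infty} \int_x^{2x} \frac{t}{1 + t^2}\,dt = \frac{1}{2} \lim_{x \to \infty} \log \left( \frac{1 + 4x^2}{1 + x^2} \right) = \log 2.$$
Thus
$$\lim_{x \to \infty} \int_{-x}^{2x} \frac{1 + t}{1 + t^2}\,dt = \pi + \log 2.$$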
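The dependence of the limit on the way the interval grows can be seen numerically. The Python sketch below is an illustration only: the function names are hypothetical and the midpoint rule, with an assumed number of subintervals, stands in for the integrals of the text.

```python
import math


def integrand(t):
    return (1.0 + t) / (1.0 + t * t)


def midpoint_integral(func, a, b, n=200000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def symmetric_integral(x):
    """Integral over the symmetric interval [-x, x]."""
    return midpoint_integral(integrand, -x, x)


def skewed_integral(x):
    """Integral over the unsymmetric interval [-x, 2x]."""
    return midpoint_integral(integrand, -x, 2.0 * x)


for x in (10.0, 100.0, 1000.0):
    print(x, symmetric_integral(x), skewed_integral(x))
print(math.pi, math.pi + math.log(2.0))
```

As x grows, the first column of integrals approaches $\pi$ and the second approaches $\pi + \log 2$, so the two ways of exhausting the real line give different limits.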
The definitions we have written down do not exhaust all the possibilities for defining improper integrals and the reader can undoubtedly think of cases we have not discussed. However, in most instances a suitable definition of an improper integral will be either a variant or a combination of the definitions we have discussed. Some of the following exercises are designed to exhibit the various possibilities.
D Exercises I.
State and prove results analogous to Theorems 5.3.2 and 5.3.3
for Cauchy principal value integrals.
2.
Discuss the convergence of the following improper integrals: (a)
(b)
(c)
3.
1
·
1
Discuss the convergence of the following improper integrals: (a)
(b)
(c) 4.
!100 (/� 1). L"" C2� ) J:""C3 � ) ·
L1 (t t dt). J: c- � t dt) . in/2 ( � ) · 0 log
in
t
cost
For what values of (a)
f000 (t"'e-1 dt).
a
will the following integrals converge?
(b) (c)
5.
J: (;!). Loo (;!).
Discuss the convergence of the following integrals: (a ) (b)
L (k )· J:oo C�( (2 ) J-oooo ( t2 t I tltl1/2 ) dt 00
•
(c)
6.
1112
For what values of ( a) (b) (c)
7.
+
00
f C � t)a ) J: ( � J· L"' Ca ��g t) · 1
( lo
a
•
will the following integrals converge?
.
r I g
Show the following:
i"' --t dt J"' --t dt. t2 t sin 2
sin
=
0
0
8.
State and prove an analogue of Abel's test for improper integrals.
9.
State and prove an analogue of Dirichlet's test for improper
integrals.
IO.
Discuss the convergence of the following integrals:
(a ) (b) (c)
I I.
L"' ( � t dt) . "' L ( I t� ti dt ) . oo C�! t dt . ) L( i 5
si
Use the integral test to establish convergence or divergence of
the following series: (a)
""
L
k=I
(k3e-k).
210 I INTEGRATION 00
(b) (c)
L
k=2
((log k)P/k).
� ( k (log �og k)")
·
Let p and q be fixed integers p �
12.
an=
pn
L k=qn+l
q �
1
1 and let
k
Use the ideas involved in the proof of the integral test to establish that
(�)
an� log (Hint:
"1 L-
k=l k
where a is constant and 13.
0
(D1.o)} D1 , 0(d)
=
L mk[g(bk)-g(ak) ],
=
=
=
[l(f,g)
=
su p {!l1.0(d):
d E £>([21,0)}
=
J: J: f(x) dg(x) ,
and call these numbers the upper and lower Darboux-Stieltjes integrals of f with respect to g, respectively. In case $\overline{D}(f, g) = \underline{D}(f, g) = D(f, g)$, we say that f is Darboux-Stieltjes integrable with respect to g and call D(f, g) the Darboux-Stieltjes integral of f with respect to g.
The fundamental theorem here, as in the case of Riemann-Darboux integrals, is the following:
5.4.3 Theorem. If f is bounded and g is monotone nondecreasing, then the Riemann-Stieltjes integral of f with respect to g exists if and only if the Darboux-Stieltjes integral exists and if they exist they are equal.
The proof of this theorem follows the details of the proof of Theorem 5.1.5, and we shall not reproduce it here.
5.4.4 Theorem. The Riemann-Stieltjes integral of the function f with respect to g exists $\Leftrightarrow$ the function $S_{f,g}$ is Cauchy in the sense that $\forall \varepsilon > 0$, $\exists \Delta_\varepsilon$ so that $\Delta \succ \Delta_\varepsilon$ and $\Delta' \succ \Delta_\varepsilon$ $\Rightarrow$
$$|S_{f,g}(\Delta, \{x_k\}) - S_{f,g}(\Delta', \{x'_k\})| < \varepsilon.$$
$$S_{f,g}(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(x_k)[g(b_k) - g(a_k)] = \sum_{k=1}^{n} f(x_k) g'(x_k)(b_k - a_k) = R_{fg'}(\Delta, \{x_k\}).$$
Since the limits on the left side and on the right side exist, they must be equal. This establishes the theorem.
An immediate corollary of Theorem 5.4.7 is Theorem 5.2.2. Indeed, if we take $f(x) = 1$, and note that
$$\int_a^b dg(x) = g(b) - g(a),$$
we immediately have Theorem 5.2.2.
The next theorem is integration by parts for Riemann-Stieltjes integrals. When taken in conjunction with the previous theorem, it is seen to yield Corollary 5.2.3.
5.4.8 Theorem. If f and g are defined on [a, b] and if the integral of f with respect to g exists, then the integral of g with respect to f exists and
$$\int_a^b f(x)\,dg(x) + \int_a^b g(x)\,df(x) = f(b)g(b) - f(a)g(a) = \int_a^b d\,f(x)g(x).$$
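The integration by parts formula can be checked on Riemann-Stieltjes sums. The Python sketch below is an illustration only: the function name `stieltjes_sum`, the uniform decomposition with midpoints as the intermediate points, and the choices $f(x) = e^x$, $g(x) = x^2$ on [0, 1] are all assumptions made for the example.

```python
import math


def stieltjes_sum(f, g, a, b, n=20000):
    """Riemann-Stieltjes sum of f with respect to g.

    Uses the decomposition of [a, b] into n intervals [a_k, b_k] of equal
    length, with x_k the midpoint of [a_k, b_k], and returns the sum of
    f(x_k) * (g(b_k) - g(a_k)).
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = a + (k + 1) * h
        total += f(0.5 * (left + right)) * (g(right) - g(left))
    return total


def f(x):
    return math.exp(x)


def g(x):
    return x * x


a, b = 0.0, 1.0
f_dg = stieltjes_sum(f, g, a, b)   # approximates the integral of f with respect to g
g_df = stieltjes_sum(g, f, a, b)   # approximates the integral of g with respect to f
boundary_term = f(b) * g(b) - f(a) * g(a)
print(f_dg + g_df, boundary_term)
```

The two sums add up to $f(b)g(b) - f(a)g(a) = e$ up to the discretization error, which is what the theorem predicts for the limits.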
Proof. Let $\Delta = \{I_k : k \in (1, n)\}$ be any decomposition of [a, b], where $I_k = [a_k, b_k]$. We may write
$$S_{g,f}(\Delta, \{x_k\}) = \sum_{k=1}^{n} g(x_k)[f(b_k) - f(a_k)].$$
On the other hand,
$$f(b)g(b) - f(a)g(a) = \sum_{k=1}^{n} [f(b_k)g(b_k) - f(a_k)g(a_k)].$$
Hence
$$f(b)g(b) - f(a)g(a) - S_{g,f}(\Delta, \{x_k\}) = \sum_{k=1}^{n} f(b_k)[g(b_k) - g(x_k)] + \sum_{k=1}^{n} f(a_k)[g(x_k) - g(a_k)].$$
Now $[a_k, x_k] \cup [x_k, b_k] = [a_k, b_k]$ and hence
$$\Delta' = \{[a_k, x_k] : k \in (1, n)\} \cup \{[x_k, b_k] : k \in (1, n)\}$$
is a decomposition of [a, b] and indeed is a refinement of $\Delta$. We may therefore write
$$S_{g,f}(\Delta, \{x_k\}) = f(b)g(b) - f(a)g(a) - S_{f,g}(\Delta', \{x'_k\}),$$
where $x'_k = a_k$ for the interval $[a_k, x_k]$ and $x'_k = b_k$ for the interval $[x_k, b_k]$. Since, by hypothesis, S(f, g) exists, the conclusion of the theorem is an immediate consequence of the last equality.
The generalization of Theorem 5.2.4 is the following:
5.4.9 Theorem. If g is defined, increasing, and continuous on [a, b], f and h are defined on [g(a), g(b)], and the Riemann-Stieltjes integral of f with respect to h exists, then the integral of $f \circ g$ with respect to $h \circ g$ exists, and
$$\int_{g(a)}^{g(b)} f(x)\,dh(x) = \int_a^b f \circ g(x)\,d\,h \circ g(x).$$
.12 0 so that \lx,y � M Ix
IJ(x)-f(y) I
E
12 - Yl 1 •
Is f necessarily of bounded variation?
9. Suppose f is defined on [a, b] and $\exists M > 0$ so that $\forall x, y \in [a, b]$,
$$|f(x) - f(y)| \le M |x - y|.$$
If g is any continuous function, show that the Riemann-Stieltjes integral of f with respect to g exists.
10. Suppose f is continuous and of bounded variation on [a, b]. Show that
$$\left| \int_a^b f(x)\,dg(x) \right| \le \int_a^b |f(x)|\,|dg(x)|,$$
where $|dg(x)|$ is another symbol for $dv_g(x)$.
CHAPTER 6
HIGHER-DIMENSIONAL SPACE
In the previous chapters we have developed the essentials of the calculus for real-valued functions with domains in the real number system. The topics we have developed include (1) the real number system, (2) sequences and series, (3) real-valued continuous functions, (4) differentiable functions, and (5) integration theory. The object of the remaining chapters will be to extend some of these theories to higher-dimensional situations. Some results extend in a straightforward manner. On the other hand, there are other results that are very simple and easy to prove in the one-dimensional case which become rather complicated and far-reaching theories in the higher-dimensional case. For example, the simple fact that a differentiable monotone function has an inverse that is also a differentiable monotone function generalizes to the considerably more difficult inverse function theorem. The rather elementary formula for the integration of a derivative becomes a much more complicated theorem that requires considerably more algebraic and analytic machinery for its proof. This theorem is generally called Stokes' theorem, but the names of Gauss and Green may also be associated with it.
In this chapter we shall lay the basic foundation for the subsequent chapters. We shall discuss real vector spaces with a certain distance function acting on pairs of points and shall also discuss general properties of continuous functions as well as special continuous functions called linear transformations.
6.1 REAL VECTOR SPACES
We shall begin our discussion with the n-fold Cartesian product $R^n$ of the real numbers. The set $R^n$ shall be defined as the set of all n-tuples $(x^1, \ldots, x^n)$, where $x^k \in R$ for $k \in (1, n)$. A much more formal definition of $R^n$ can be given as the collection of all functions with domain the set (1, n) and range in R. Indeed, a little reflection will convince the reader that one reasonable way to define an n-tuple is as a function with domain (1, n) and range in R. We shall continue to use the very suggestive notation that we have indicated above.
We shall now introduce two functions + and $\cdot$, the first one having domain $R^n \times R^n$ and range $R^n$ and the second one having domain $R \times R^n$ and range $R^n$. These functions are defined by the equalities
$$(x^1, \ldots, x^n) + (y^1, \ldots, y^n) = (x^1 + y^1, \ldots, x^n + y^n),$$
$$\alpha \cdot (x^1, \ldots, x^n) = (\alpha x^1, \ldots, \alpha x^n).$$
The triple $(R^n, +, \cdot)$ is a prototype example of what we shall call a real n-dimensional vector space and we shall designate it by $V^n$. By an abuse of language we shall write $x \in V^n$ rather than $x \in R^n$ when we want to emphasize that we are working with a vector space, and shall speak of the elements of $V^n$ rather than the elements of $R^n$. The elements of $V^n$ shall be called points or vectors and we shall designate them by letters without superscripts; for example,
$$x = (x^1, \ldots, x^n).$$
However, in two and three dimensions we shall often revert to the standard notations (x, y) and (x, y, z). As is usual, when we "multiply" a vector by a scalar (an element of R) we shall drop the dot. The numbers $x^k$ will be called the components of the vector x. The zero vector, 0, is that one for which every component is the zero element of R. We shall also set $-x = (-1)x$.
6.1.1 Definition. A finite set $\{x_k : k \in (1, m)\} \subset V^n$, where if $j \ne k$, then $x_j \ne x_k$, is said to be linearly independent $\Leftrightarrow$ for every finite set $\{\alpha_k : k \in (1, m)\} \subset R$:
$$\sum_{k=1}^{m} \alpha_k x_k = 0 \Rightarrow \alpha_k = 0, \qquad \forall k \in (1, m).$$
A set of vectors in $V^n$ that is not linearly independent is called linearly dependent.
REMARKS: In using the notation $\{x_k : k \in (1, m)\}$ we are already specifying a function $\varphi$ with domain (1, m) and range in $V^n$ by means of the equality $\varphi(k) = x_k$. The set $\{x_k : k \in (1, m)\}$ is the range of $\varphi$, and if this set contains more than one element there is more than one function with domain (1, m) and with range $\{x_k : k \in (1, m)\}$. In some instances, for example when we talk about matrices, it is important to know exactly which function $\varphi$ we are using. If this is the case, we shall call the function an ordered m-tuple of vectors and denote it by $(x_1, \ldots, x_m)$. Ordinarily it is only the range of the function that is important. For example, the first sentence in Definition 6.1.1 should perhaps more properly be stated as follows: A finite nonvoid set $A \subset V^n$ is said to be linearly independent $\Leftrightarrow$ for every function $\alpha$ with domain A and range in R we have
$$\sum_{x \in A} \alpha(x) x = 0 \Rightarrow \alpha(x) = 0, \qquad \forall x \in A.$$
Suppose A has m elements and $\varphi$ is any one-to-one function with domain (1, m) and range A. If we put $x_k = \varphi(k)$ and $\alpha_k = \alpha(x_k)$, then (referring to the introduction to Chapter 3) we get
$$\sum_{x \in A} \alpha(x) x = \sum_{k=1}^{m} \alpha(\varphi(k)) \varphi(k) = \sum_{k=1}^{m} \alpha_k x_k.$$
Hence, we have shown that Definition 6.1.1 is independent of $\varphi$. In using the notation $\{x_k : k \in (1, m)\}$ we usually mean that if $j \ne k$, then $x_j \ne x_k$, although this is not generally the case when we use the notation $(x_1, \ldots, x_m)$.
6.1.2 Definition. A nonempty subset $L \subset V^n$ is said to be a linear subspace of $V^n$ or a linear manifold $\Leftrightarrow$ $\forall x, y \in L$ and $\forall \alpha, \beta \in R$, $\alpha x + \beta y \in L$.
6.1.3 Definition. A vector $x \in V^n$ is said to be a linear combination of the vectors in the set $\{x_k : k \in (1, m)\} \subset V^n$ $\Leftrightarrow$ there exist numbers $\alpha_1, \ldots, \alpha_m \in R$ so that
$$x = \sum_{k=1}^{m} \alpha_k x_k.$$
The last definition should perhaps more properly be stated as follows: A vector $x \in V^n$ is said to be a linear combination of the vectors in the finite set $A \subset V^n$ $\Leftrightarrow$ there exists a function $\alpha$ with domain A and range in R so that
$$x = \sum_{y \in A} \alpha(y) y.$$
As in the discussion of Definition 6.1.1, the sum on the right is quite independent of any function with domain (1, m) and range A. Thus Definition 6.1.3 really makes sense quite independent of which function from (1, m) onto $\{x_k : k \in (1, m)\}$ we are using to define the sum. The terminology and notations we have adopted in the formal definitions 6.1.1 and 6.1.3 are more classical and standard than those we have used in the rewritten versions of these definitions. Hence, if the reader will keep in mind what is involved, there seems to be no real reason to change from the classical terminology and notations, and we shall use them in the future without further comment.
6.1.4 Definition. If L is a linear subspace of $V^n$, then a set $\{x_k : k \in (1, m)\} \subset L$ is said to generate L $\Leftrightarrow$ every $x \in L$ is a linear combination of the vectors in $\{x_k : k \in (1, m)\}$. A set of vectors that generates L is called a basis for L $\Leftrightarrow$ the set is linearly independent.
It is clear that every finite set $\{x_k : k \in (1, n)\} \subset V^n$ generates a linear subspace of $V^n$, namely, the set of linear combinations of the vectors of the given set.
As an example of a basis for $V^n$ consider the vectors
$$e_k = (e_k^1, \ldots, e_k^n), \qquad k \in (1, n), \qquad (6.1.1)$$
where $e_k^j = 0$ if $j \ne k$ and $e_k^k = 1$. Of course, a vector space may have many different bases, and for example a basis of $V^3$ is given by the three vectors
$$x_1 = (1, 0, 0), \qquad x_2 = (1, 1, 0), \qquad x_3 = (1, 1, 1).$$
We shall leave the verification to the reader.
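One way to carry out such a verification mechanically is to reduce the vectors by elimination and count the nonzero rows that remain. The Python sketch below is an illustration only: the function names are hypothetical, and the rank criterion it uses (m vectors are linearly independent exactly when the elimination leaves m pivots) is a standard fact of linear algebra rather than something established in the text so far. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gauss-Jordan elimination on the given rows."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    pivots = 0
    for col in range(len(rows[0])):
        # look for a row, not yet used as a pivot row, with a nonzero entry in this column
        pivot_row = next((i for i in range(pivots, len(rows)) if rows[i][col] != 0), None)
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        for i in range(len(rows)):
            if i != pivots and rows[i][col] != 0:
                factor = rows[i][col] / rows[pivots][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[pivots])]
        pivots += 1
        if pivots == len(rows):
            break
    return pivots


def is_linearly_independent(vectors):
    """True when only the zero combination of the vectors gives the zero vector."""
    return rank(vectors) == len(vectors)


basis_candidate = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(is_linearly_independent(basis_candidate))
print(is_linearly_independent([(1, 0, 0), (2, 0, 0)]))
```

Three linearly independent vectors in $V^3$ also generate $V^3$, so a positive answer for the first list is consistent with the claim that these vectors form a basis.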
6.1.5 Theorem. If $\{y_k : k \in (1, s)\}$ is a linearly independent set in the linear subspace of $V^n$ generated by the set $\{x_k : k \in (1, r)\} \subset V^n$, then $s \le r$.
Proof. Since $y_1 \ne 0$, we may write
r Y1 = L 0 so that
is convex and thus, by Corollary 6.3.12, is connected. Hence every element in $B(x, \rho)$ is equivalent to x and thus is in E. Hence E is open.
Suppose $Q^n$ is the Cartesian product of the rationals Q with itself n times, and $Q_A = Q^n \cap A$; that is, $Q_A$ is the set of points in A with rational components. Since A is open, $Q_A$ is denumerable. (Proof?) Let $\varphi$ be a one-to-one function with domain N and range $Q_A$. If E is a component of A, let
$$N_E = \{n : n \in N \ \&\ \varphi(n) \in E\}.$$
Since E is open, $N_E$ is nonvoid and thus by the well-ordering principle it has a minimal element $n_E$. Let $\psi$ be that function whose domain consists of the equivalence classes E and defined by $\psi(E) = n_E$. The one-to-one function $\varphi \circ \psi$ has range a subset of $Q_A$, and thus its range is countable. This means the collection of components of A is countable.
6.3.16 Corollary. If $A \subset E^1$ is open, then A is the countable union of open intervals.
Proof. From the last corollary, A is the countable union of open connected sets. Since by Theorem 5.3.10, every connected set in $E^1$ is an interval, the proof is complete.
INTERIORS AND BOUNDARIES
The concepts of the interior and boundary of a set in $E^n$ will be of some importance in Chapter 8 when we discuss Jordan content and the theory of integration in higher dimensions. In somewhat loose language the interior of a set A is the largest open set that is contained in A. The boundary of A is the set of points that do not belong to the interior of A or the interior of $A^c$. The formal definition is as follows.
6.3.17 Definition. If $A \subset E^n$, then the interior of A, denoted by $A^\circ$, is the union of all open sets contained in A. The boundary of A, denoted $\beta A$, is the set $\overline{A} \setminus A^\circ$.
It is clear that $x \in A^\circ$ $\Leftrightarrow$ $\exists \delta > 0$, so that $B(x, \delta) \subset A$. Also we think it is clear that $x \in \beta A$ $\Leftrightarrow$ $\forall \delta > 0$, $B(x, \delta) \cap A \ne \emptyset$ and $B(x, \delta) \cap A^c \ne \emptyset$. The boundary of a set is clearly a closed set and, if $A \subset B$, then $A^\circ \subset B^\circ$.
It is quite possible that a set may be rather "thin" and yet its boundary may be rather "thick." For example, the rationals in [0, 1] are in some sense rather "thin" but the boundary of this set consists of the whole interval [0, 1]. Note that this set of rationals has no interior. A single point in $E^n$ is an example of a closed set with no interior. The reader may easily construct an example of a denumerable closed set, which consequently has no interior. The Cantor set is an example of a closed set with no interior which nevertheless has an uncountable number of points.
Exercises
In the following exercises all sets are to be taken in $E^n$, unless otherwise specified.
1. Show that the intersection of any finite number of relatively open sets is relatively open and the union of any number of relatively open sets is relatively open.
2. Show that the union of any finite number of relatively closed sets is relatively closed and the intersection of any number of relatively closed sets is relatively closed.
3. Suppose U is a relatively open set in A. Show that $A \setminus U$ is relatively closed. If C is a relatively closed set in A, show that $A \setminus C$ is relatively open.
4. Let C be relatively compact in A. If $a \in A$, is it true that if we set $d = \inf \{|x - a| : x \in C\}$, then $\exists c \in C$, so that $d = |c - a|$?
5. Prove that a set $A \subset E^n$ is connected $\Leftrightarrow$ the subsets of A which are both open and closed relative to A are A itself and the null set.
6. If A and B are connected and $A \cap B \ne \emptyset$, show that $A \cup B$ is connected.
7. Let A be a connected set in $E^n$, and $\overline{A}$ its closure. If $A \subset B \subset \overline{A}$, show that B is connected.
8. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is open in $E^{n+m}$ $\Leftrightarrow$ A and B are open.
9. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is closed in $E^{n+m}$ $\Leftrightarrow$ A and B are closed.
10. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is compact $\Leftrightarrow$ A and B are compact.
11. Show that a sequence with range in $E^n$ is convergent if and only if it is a Cauchy sequence.
12. If $A \subset E^n$ and $B \subset E^m$, show that $A \times B$ is connected $\Leftrightarrow$ A and B are connected.
13. Show the following:
(a) $\beta A = [A^\circ \cup (A^c)^\circ]^c$.
(b) $\beta A = (A^\circ)^c \setminus (A^c)^\circ$.
14. Show the following:
(a) A is closed $\Leftrightarrow$ $\beta A \subset A$.
(b) A is open $\Leftrightarrow$ $\beta A \subset A^c$.
6.4 CONTINUOUS FUNCTIONS
In Chapter 2 we discussed the limit and continuity concept for functions having their domains and ranges in R. We shall now do the same for functions having their domains in $E^n$, $n \ge 1$, and ranges in $E^m$, $m \ge 1$. The definitions are essentially identical and the theorems and their proofs are, by and large, the same. Hence we shall not present as detailed a study here as we did in Chapter 2, but shall limit ourselves to those matters for which a slightly different formulation than given in Chapter 2 may lead to deeper insights.
6.4.1 Definition. If f is a function with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ and a is an accumulation point of $\mathcal{D}(f)$, then f is said to have the limit l at a $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in [B(a, \delta) \setminus \{a\}] \cap \mathcal{D}(f)$ $\Rightarrow$ $f(x) \in B(l, \varepsilon)$. In case f has a limit at a we write, as usual,
$$\lim_{x \to a} f(x) = l \qquad \text{or} \qquad f(x) \to l \ \text{as} \ x \to a.$$
6.4.2 Definition. If f is a function with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$, then f is said to be continuous at a $\Leftrightarrow$ $a \in \mathcal{D}(f)$ and $\forall \varepsilon > 0$, $\exists \delta > 0$ so that $x \in B(a, \delta) \cap \mathcal{D}(f)$ $\Rightarrow$ $f(x) \in B(f(a), \varepsilon)$. The function f is said to be continuous $\Leftrightarrow$ f is continuous at every point of its domain.
The concept of continuity at a point is a local concept, that is, depends only on the values that the given function takes on in some neighborhood of the given point. However, a continuous function has a "global" characterization that is important and useful.
6.4.3 Theorem. A function f with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ is continuous $\Leftrightarrow$ for every open $U \subset E^m$, $f^{-1}(U)$ is relatively open in $\mathcal{D}(f)$.
Proof. Suppose f is continuous and $U \subset E^m$ is open. If $f^{-1}(U)$ is void, it is open relative to $\mathcal{D}(f)$ and we are done. If $f^{-1}(U) \ne \emptyset$, let $x \in f^{-1}(U)$; since U is open, $\exists \rho > 0$ so that $B(f(x), \rho) \subset U$. From the continuity of f, $\exists \delta(x, \rho)$, so that $y \in B(x, \delta) \cap \mathcal{D}(f)$ $\Rightarrow$ $f(y) \in B(f(x), \rho)$, which in turn implies that $B(x, \delta) \cap \mathcal{D}(f) \subset f^{-1}(U)$. If we set
$$V = \bigcup \{B(x, \delta) : x \in f^{-1}(U)\},$$
then V is open and $f^{-1}(U) = V \cap \mathcal{D}(f)$.
Conversely, suppose $f^{-1}(U)$ is relatively open in $\mathcal{D}(f)$ for every open $U \subset E^m$. Let $a \in \mathcal{D}(f)$ and take $U = B(f(a), \varepsilon)$; then there is an open $V \subset E^n$ so that $f^{-1}(U) = V \cap \mathcal{D}(f)$. Since V is open, $\exists \delta > 0$, so that $B(a, \delta) \subset V$. Hence f takes $B(a, \delta) \cap \mathcal{D}(f)$ into $B(f(a), \varepsilon)$, which is the definition of continuity at a. This completes the proof.
6.4.4 Definition. A function $f$ is said to be an open function or an open map $\Leftrightarrow$ for every relatively open $A \subset \mathcal{D}(f)$, $f(A)$ is relatively open in $\mathcal{R}(f)$.

As we shall see in Chapter 7, open maps play a relatively important role in many considerations. Right now, Theorem 6.4.3 leads immediately to the following corollary.
6.4.5 Corollary. If $f$ is a one-to-one open map, then $f^{-1}$ is continuous.
250 I HIGHER-DIMENSIONAL SPACE
Now, an open one-to-one map need not itself be continuous, as the reader may easily verify (for example, refer to Theorem 2.3.6). A one-to-one function $f$ for which both $f$ and $f^{-1}$ are continuous is called a homeomorphism or topological map. So, a continuous one-to-one open map is topological. Exercise 4 at the end of this section will give another condition for a continuous one-to-one function to be topological.
6.4.6 Theorem. If $f$ is a continuous function with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ and if $\mathcal{D}(f)$ is compact, then $\mathcal{R}(f)$ is compact.

Proof. Let $\mathcal{U}$ be an open covering of $\mathcal{R}(f)$. For every $U \in \mathcal{U}$, there exists an open $V \subset E^n$ so that $f^{-1}(U) = V \cap \mathcal{D}(f)$. The collection of such $V$ is an open covering for $\mathcal{D}(f)$, and hence there are a finite number that cover $\mathcal{D}(f)$. Hence there are a finite number of elements of $\mathcal{U}$ that cover $\mathcal{R}(f)$.

6.4.7 Corollary. If $f$ is continuous and $\mathcal{D}(f)$ is compact, then $f$ is bounded; that is, $\mathcal{R}(f)$ is bounded.

Proof. $\mathcal{R}(f)$ is compact and therefore is bounded.

6.4.8 Corollary. If $f$ is real-valued and continuous and $\mathcal{D}(f) \subset E^n$ is compact, then $f$ assumes its maximum and its minimum on $\mathcal{D}(f)$.

Proof. Since $\mathcal{R}(f) \subset E^1$ is compact, it is closed and bounded. Hence $\mathcal{R}(f)$ must contain its supremum and infimum, which are the maximum and minimum of $f$, respectively.

6.4.9 Definition. A function $f$ with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ is said to be uniformly continuous $\Leftrightarrow$ $\forall \varepsilon > 0$, $\exists \delta > 0$ so that

$$x, y \in \mathcal{D}(f) \text{ and } x - y \in B(0,\delta) \Rightarrow f(x) - f(y) \in B(0,\varepsilon).$$

6.4.10 Theorem. If $f$ is continuous and $\mathcal{D}(f)$ is compact, then $f$ is uniformly continuous.

Proof. See the proof of Theorem 2.2.9.
In the one-dimensional case we showed (Theorem 2.2.14) that every continuous function with a closed interval domain takes on all values between its maximum and its minimum. This fact generalizes in a very interesting way for functions with domains and ranges in higher dimensional space.
6.4.11 Theorem. If $f$ is a continuous function with $\mathcal{D}(f) \subset E^n$ and $\mathcal{R}(f) \subset E^m$ and if $\mathcal{D}(f)$ is connected, then $\mathcal{R}(f)$ is connected.

Proof. Suppose $U$ and $V$ are open in $E^m$, $\mathcal{R}(f) \subset U \cup V$, and $U \cap \mathcal{R}(f)$ and $V \cap \mathcal{R}(f)$ are disjoint. Then $f^{-1}(U) = f^{-1}(U \cap \mathcal{R}(f))$ and $f^{-1}(V) = f^{-1}(V \cap \mathcal{R}(f))$ are disjoint and open relative to $\mathcal{D}(f)$, and $\mathcal{D}(f) = f^{-1}(U) \cup f^{-1}(V)$. Since $\mathcal{D}(f)$ is connected, one of the sets $f^{-1}(U)$ or $f^{-1}(V)$ must be the null set and hence the corresponding set $U \cap \mathcal{R}(f)$ or $V \cap \mathcal{R}(f)$ must be the null set. Thus $\mathcal{R}(f)$ must be connected.

Theorem 6.4.11 is an important aid in deciding whether or not a set in higher dimensions is connected. For example, the proof of Corollary 6.3.12 can be made in the following way. Suppose $C$ is a convex set in $E^n$ and $C = A \cup B$, where $A$ and $B$ are nonvoid disjoint relatively open sets in $C$. Let $x \in A$ and $y \in B$, and set

$$f(t) = (1 - t)x + ty, \qquad \text{for } t \in [0,1].$$

The function $f$ is continuous and since its domain is connected its range is connected. But $\mathcal{R}(f) = [\mathcal{R}(f) \cap A] \cup [\mathcal{R}(f) \cap B]$, where clearly, $\mathcal{R}(f) \cap A$ and $\mathcal{R}(f) \cap B$ are disjoint, nonvoid, relatively open sets in $\mathcal{R}(f)$. This is a contradiction. We think it is clear that the procedure we have just given can be generalized in a very simple way.

6.4.12 Definition. A set $A \subset E^n$ is said to be arcwise connected $\Leftrightarrow$ $\forall x, y \in A$, there exists a continuous function $f$, with domain $[0,1]$ and range in $A$, so that $f(0) = x$ and $f(1) = y$.

6.4.13 Proposition. Every arcwise connected set is connected.
The details of the proof are essentially the same as the proof we have just given above for convex sets, and we shall leave it as an exercise. As an example of how Proposition 6.4.13 can be used, let us consider the ring in $E^2$, which is the set

$$A = \{x : 0 < r_1 \le |x| \le r_2\}.$$
,(f2 ) [O, l] (see Theorem 3.3.2) and that &',(J ) [O, l] X [O, l]. To show that f is continuous, it is enough to show that f1 and f2 are continuous. l/3 2
Theorem. A linear transformation $T$ is one-to-one (nonsingular) $\Leftrightarrow$ the range of $T$ has the same dimension as the domain of $T$.

Proof. Suppose $\{u_k : k \in (1,r)\}$ is a basis for $\mathcal{D}(T)$. Since $T$ is linear, every element in $\mathcal{R}(T)$ is a linear combination of the vectors in the set $\{T(u_k) : k \in (1,r)\}$. Hence, if under the assumption that $T$ is one to one we can show that this latter set is linearly independent, we will have proved the necessity. If $T$ is one to one it follows that $T(u_k) \ne 0$ $\forall k \in (1,r)$, since $0$ is the only vector taken into $0$. Suppose that

$$\sum_{k=1}^{r} a_k T(u_k) = T\Big(\sum_{k=1}^{r} a_k u_k\Big) = 0.$$

Since $T$ is one to one, we get

$$\sum_{k=1}^{r} a_k u_k = 0,$$

and since $\{u_k : k \in (1,r)\}$ is a linearly independent set we have $a_k = 0$, $\forall k \in (1,r)$.

Conversely, suppose the range of $T$ has the same dimension as its domain. We must show that $T(x) = T(y) \Rightarrow x = y$. Using the linearity of $T$, this is equivalent with showing that $T(x) = 0 \Rightarrow x = 0$. Let us write

$$x = \sum_{k=1}^{r} a_k u_k;$$

then

$$T(x) = 0 \Rightarrow \sum_{k=1}^{r} a_k T(u_k) = 0.$$

Now, by hypothesis, the dimension of $\mathcal{R}(T)$ is $r$, and, as we have already noted, $\mathcal{R}(T)$ is generated by the set $\{T(u_k) : k \in (1,r)\}$. Consequently, this latter set is linearly independent and $\forall k \in (1,r)$, $a_k = 0$. Hence $x = 0$ and the proof of the theorem is complete.

6.5.3 Definition. The dimension of the range of a linear transformation is called the rank of the linear transformation.
Suppose $L$ and $M$ are linear subspaces in $V^n$ and $V^m$, respectively, and $S$ and $T$ are linear transformations each having domain $L$ and range in $M$. If for $a \in R$ we define

$$(aT)(x) = aT(x), \qquad (S + T)(x) = S(x) + T(x),$$

then these two operations will make the set of all linear transformations with domain $L$ and range in $M$ into a vector space. Indeed, this vector space is finite-dimensional, and if $p$ is the dimension of $L$ and $q$ is the dimension of $M$, then its dimension is $pq$. We have asked the reader to verify these facts in Exercise 4 at the end of this section.

There is a very useful representation for linear transformations by means of matrices. It is at this point that it becomes important to consider a basis of a vector space as a function rather than as a set (see the discussion in Section 6.1). We shall use the terminology ordered basis for such a function.

Suppose $A$ is a linear transformation with domain $L$ and range in the linear space $M$. Let $(u_1, \cdots, u_p)$ and $(v_1, \cdots, v_q)$ be ordered bases for $L$ and $M$, respectively. Then we may write

$$A(u_k) = \sum_{j=1}^{q} a_{jk} v_j, \qquad k \in (1,p).$$

The array of numbers

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{q1} & a_{q2} & \cdots & a_{qp} \end{bmatrix}$$

is called a $q \times p$ matrix. More specifically it is called a matrix representation for $A$ with respect to the ordered pair of ordered bases $((u_1, \cdots, u_p), (v_1, \cdots, v_q))$. From a very formal point of view, a $q \times p$ matrix is a function with domain $(1,q) \times (1,p)$ and range in $R$.

Suppose $B$ is a linear transformation with domain $M$ and range in $N$. Let $(w_1, \cdots, w_r)$ be an ordered basis for $N$ and let us compute the $r \times p$ matrix representation of $B \circ A$ with respect to the ordered pair $((u_1, \cdots, u_p), (w_1, \cdots, w_r))$. Let the matrix entries of $B$ with respect to the ordered pair of ordered bases $((v_1, \cdots, v_q), (w_1, \cdots, w_r))$ be denoted by $b_{jk}$, so that $B(v_j) = \sum_{l=1}^{r} b_{lj} w_l$. Then we have

$$B \circ A(u_k) = \sum_{j=1}^{q} a_{jk} B(v_j) = \sum_{l=1}^{r} \Big(\sum_{j=1}^{q} b_{lj} a_{jk}\Big) w_l, \qquad k \in (1,p).$$
6.5
LINEAR TRANSFORMATIONS I 259
It follows that if $c_{lk}$ is in the $l$th row and $k$th column of the matrix representation of $B \circ A$ with respect to the given bases, then

$$c_{lk} = \sum_{j=1}^{q} b_{lj} a_{jk}.$$

This serves to define a multiplication of matrices,

$$\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1q} \\ b_{21} & b_{22} & \cdots & b_{2q} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rq} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{q1} & a_{q2} & \cdots & a_{qp} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & & \vdots \\ c_{r1} & c_{r2} & \cdots & c_{rp} \end{bmatrix}.$$

Thus we see that we get $c_{lk}$ by "multiplying" the $l$th row of the matrix of $B$ with the $k$th column of the matrix of $A$; that is, $b_{lj}$ is multiplied by $a_{jk}$ and then the result is summed over $j$ to get $c_{lk}$.

Suppose now that $A$ is a linear transformation with domain a linear subspace $L \subset V^n$ and range in $L$. Let $\mathcal{U} = (u_1, \cdots, u_p)$ be an ordered basis for $L$ and let us designate the matrix representation of $A$ with respect to the pair $(\mathcal{U}, \mathcal{U})$ by $[a_{jk}]$. We want to investigate the question of how the matrix representation changes when we pick a new ordered basis $\mathcal{U}' = (u'_1, \cdots, u'_p)$ for $L$ and get a matrix representation $[a'_{jk}]$ with respect to the pair $(\mathcal{U}', \mathcal{U}')$. From this point on, for the sake of simplicity, let us agree that if $[a_{jk}]$ is a matrix representation of $A$ with respect to a pair of ordered bases $(\mathcal{U}, \mathcal{U})$ we shall say that $A$ has the matrix representation $[a_{jk}]$ with respect to the basis $\mathcal{U}$.

Let $Q$ be the linear transformation with domain and range $L$ that is defined by

$$Q\Big(\sum_{k=1}^{p} \xi^k u_k\Big) = \sum_{k=1}^{p} \xi^k u'_k.$$

This means $Q(u_k) = u'_k$, and we may write

$$u'_k = Q(u_k) = \sum_{j=1}^{p} q_{jk} u_j.$$

Hence the matrix representation of $Q$ with respect to the basis $\mathcal{U}$ is $[q_{jk}]$. Clearly, $Q$ is nonsingular, since the dimension of $\mathcal{R}(Q)$ is the same as the dimension of $\mathcal{D}(Q)$. Let $Q^{-1}$ be the inverse of $Q$ and suppose its matrix representation with respect to the basis $\mathcal{U}$ is $[r_{jk}]$.

Let us now compute the matrix representation of $A$ with respect to the basis $\mathcal{U}'$. We have

$$A(u'_k) = A \circ Q(u_k) = \sum_{j=1}^{p} s_{jk} u_j,$$

where

$$s_{jk} = \sum_{l=1}^{p} a_{jl} q_{lk}.$$

Now,

$$Q^{-1}(u_j) = \sum_{l=1}^{p} r_{lj} u_l,$$

and if we compose both sides with $Q$, and note that $Q(u_l) = u'_l$, we get

$$u_j = \sum_{l=1}^{p} r_{lj} u'_l.$$

Using this in the expansion for $A(u'_k)$ we get

$$A(u'_k) = \sum_{l=1}^{p} \Big[\sum_{j=1}^{p} r_{lj} s_{jk}\Big] u'_l.$$

Hence, if we write $[q_{jk}]^{-1}$ for the matrix $[r_{jk}]$, we have

$$[a'_{jk}] = [q_{jk}]^{-1} [a_{jk}] [q_{jk}]. \tag{6.5.1}$$

In Section 6.6 we shall show how to compute the numbers $r_{jk}$ in terms of the entries of $[q_{jk}]$.
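The row-by-column rule for the matrix of a composition and the change-of-basis formula (6.5.1) can be checked on a small case. The following is a minimal sketch in Python, not part of the original text; the helper names `matmul`, `inverse_2x2`, `column`, and `apply`, and the particular matrices `A` and `Q`, are assumptions chosen only for illustration. The columns of `Q` hold the coordinates of the new basis vectors with respect to the old basis, as in the text.

```python
from fractions import Fraction

def matmul(b, a):
    """Return [c_lk] with c_lk = sum over j of b_lj * a_jk."""
    inner = len(a)
    assert all(len(row) == inner for row in b), "inner dimensions must agree"
    return [[sum(b[l][j] * a[j][k] for j in range(inner))
             for k in range(len(a[0]))]
            for l in range(len(b))]

def inverse_2x2(q):
    """Inverse of a 2 x 2 matrix by the adjugate formula (enough for this sketch)."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[q[1][1] / det, -q[0][1] / det],
            [-q[1][0] / det, q[0][0] / det]]

def column(m, k):
    """The k-th column of m, as a list."""
    return [row[k] for row in m]

def apply(m, x):
    """Apply the matrix m to the coordinate vector x."""
    return [sum(m[j][k] * x[k] for k in range(len(x))) for j in range(len(m))]

# Hypothetical data: A in the old basis, Q carrying u_k to u'_k.
A = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]
Q = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(2)]]

# Formula (6.5.1): the matrix of A with respect to the new basis.
A_new = matmul(inverse_2x2(Q), matmul(A, Q))
```

In this example the $k$th column of `A_new` lists the coefficients of $A(u'_k)$ in terms of $u'_1, u'_2$; converting those coefficients back to old coordinates with `Q` should reproduce `apply(A, column(Q, k))`. Exact rational arithmetic is used so that the comparison is not blurred by rounding.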
Let us now turn our attention to properties of a linear transformation that are connected with the inner product on $V^n \times V^n$, or, what is the same thing, with the length function.

6.5.4 Theorem. Let $T$ be a linear transformation whose domain is a linear subspace $L \subset E^n$ and range is in $E^m$. There exists an $M > 0$ so that $\forall x \in L$,

$$|T(x)| \le M |x|. \tag{6.5.2}$$

In particular this means that $T$ is uniformly continuous.

Proof. Let $\{u_k : k \in (1,p)\}$ be any orthonormal basis for $L$. If $x \in L$, we may write

$$x = \sum_{k=1}^{p} \xi^k u_k, \qquad |x|^2 = \sum_{k=1}^{p} |\xi^k|^2.$$

If we apply the transform $T$ we get

$$T(x) = \sum_{k=1}^{p} \xi^k T(u_k),$$

and taking the norm of both sides and using the triangle inequality we get

$$|T(x)| \le \sum_{k=1}^{p} |\xi^k| \, |T(u_k)|.$$

Now use the Cauchy-Bunjakovsky-Schwarz inequality on the right to give

$$|T(x)| \le \Big\{\sum_{k=1}^{p} |T(u_k)|^2\Big\}^{1/2} \Big\{\sum_{k=1}^{p} |\xi^k|^2\Big\}^{1/2}.$$

This is the inequality (6.5.2). Now, replace $x$ by $x - y$ and use the linearity of $T$ to give

$$|T(x) - T(y)| \le M |x - y|.$$

This proves the uniform continuity.
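The constant produced by this proof can be computed explicitly. The sketch below, which is an illustration and not part of the original text, takes $L = E^2$ with the standard orthonormal basis, so that $T(u_k)$ is the $k$th column of a matrix; the matrix `T`, the helper names, and the random sampling are all assumptions made for the example.

```python
import math
import random

def apply(m, x):
    """Apply the matrix m to the vector x."""
    return [sum(m[j][k] * x[k] for k in range(len(x))) for j in range(len(m))]

def norm(v):
    """Euclidean length |v|."""
    return math.sqrt(sum(c * c for c in v))

def bound_from_proof(m):
    """M = (sum over k of |T(u_k)|^2)^(1/2), with u_k the standard basis vectors."""
    columns = [[row[k] for row in m] for k in range(len(m[0]))]
    return math.sqrt(sum(norm(c) ** 2 for c in columns))

# Hypothetical transformation from E^2 into E^3.
T = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
M = bound_from_proof(T)

# Sample vectors and record the largest observed ratio |T(x)| / |x|.
rng = random.Random(0)
samples = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(1000)]
worst_ratio = max(norm(apply(T, x)) / norm(x) for x in samples if norm(x) > 0)
```

The observed ratio never exceeds `M`, as (6.5.2) requires, but it is typically strictly smaller: the constant obtained from the Cauchy-Bunjakovsky-Schwarz step is a valid bound, not necessarily the best one.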
6.5.5 Corollary. If $T$ is a nonsingular linear transformation, then $\exists m > 0$, so that $\forall x \in \mathcal{D}(T)$,

$$m |x| \le |T(x)|.$$

Proof. Since $T^{-1}$ is a linear transformation, it follows by the previous theorem that $\exists M > 0$ so that $\forall y \in \mathcal{D}(T^{-1})$, $|T^{-1}(y)| \le M |y|$. But $\forall x \in \mathcal{D}(T)$, $\exists y \in \mathcal{D}(T^{-1})$, so that $T(x) = y$. Hence $|x| \le M |T(x)|$, and, taking $m = 1/M$, we are finished.

For any linear transformation $T$ let us set

$$\|T\| = \inf \{M : \forall x \in \mathcal{D}(T),\ |T(x)| \le M |x|\}. \tag{6.5.3}$$

Clearly, $\forall x \in \mathcal{D}(T)$ we have

$$|T(x)| \le \|T\| \, |x|.$$

The real number defined by (6.5.3) is called the norm of $T$ and it defines a distance function on the vector space of all linear transformations with domain a fixed space $L$ and range in a fixed space $M$. For this distance function to be useful, the triangle inequality should be satisfied and this is seen as follows. Let $S$ be another linear transformation with domain $L$ and range in $M$. Then $\forall x \in L$ we have

$$|(T + S)(x)| \le |T(x)| + |S(x)| \le \{\|T\| + \|S\|\} \, |x|.$$

This shows immediately that

$$\|T + S\| \le \|T\| + \|S\|.$$

It is also a very simple matter to check that $\forall a \in R$,

$$\|aT\| = |a| \, \|T\|,$$
and that
We shall leave the proofs of these simple facts as an exercise. There are other expressions for $\|T\|$ that are often very useful. For example,

$$\|T\| = \sup \{|T(x)| : |x| = 1\}. \tag{6.5.3'}$$

The proof of this is very simple. Indeed, if $|x| = 1$, $|T(x)| \le \|T\|$, and hence the right side of (6.5.3') is dominated by $\|T\|$. On the other hand, let $M_0$ be the right side of (6.5.3'). Then, $\forall x \ne 0$,

$$|T(x/|x|)| \le M_0.$$

From this it follows that $|T(x)| \le M_0 |x|$ and hence $\|T\| \le M_0$. This shows the equality. Another useful expression is

$$\|T\| = \sup \{|T(x) \cdot y| : |x| = |y| = 1 \ \&\ y \in \mathcal{R}(T)\}. \tag{6.5.3''}$$

To prove this, let us first note that if $L$ is a linear subspace of $E^n$, then $\forall z \in L$,

$$|z| = \sup \{|z \cdot y| : |y| = 1 \ \&\ y \in L\}. \tag{6.5.4}$$

Indeed, if $M_1$ is the right side of (6.5.4) the Cauchy-Bunjakovsky-Schwarz inequality shows that $M_1 \le |z|$. On the other hand, if $z \ne 0$, take $y = z/|z|$. Then $|z| = z \cdot z/|z| \le M_1$, which shows equality. If $z = 0$, the equality (6.5.4) is obvious.

To prove (6.5.3''), let $M_0$ be the right side of that equation. From the Cauchy-Bunjakovsky-Schwarz inequality we get

$$|T(x) \cdot y| \le |T(x)| \le \|T\|, \qquad \text{for } |x| = |y| = 1.$$

Hence $M_0 \le \|T\|$. On the other hand, from (6.5.4) we get $\forall x$, so that $|x| = 1$,

$$|T(x)| = \sup \{|T(x) \cdot y| : |y| = 1 \ \&\ y \in \mathcal{R}(T)\} \le M_0.$$

Hence, from (6.5.3'),

$$\|T\| = \sup \{|T(x)| : |x| = 1\} \le M_0.$$

6.5.6 Definition. A linear transformation with range in $R$ is called a linear functional.

Linear functionals have very interesting and useful representations, as the next theorem will show.
6.5.7 Theorem. If $\Lambda$ is a linear functional with domain $L \subset E^n$, then there exists a unique $y \in L$ so that $\forall x \in L$,

$$\Lambda(x) = x \cdot y.$$

Proof. Let $\{u_k : k \in (1,p)\}$ be an orthonormal basis for $L$. If $x \in L$ and $x = \sum_{k=1}^{p} \xi^k u_k$, we may write

$$\Lambda(x) = \sum_{k=1}^{p} \xi^k \Lambda(u_k).$$

If we set

$$y = \sum_{k=1}^{p} \Lambda(u_k) u_k,$$

it is clear that $\Lambda(x) = x \cdot y$. Suppose $z \in L$, so that $\forall x \in L$, $\Lambda(x) = x \cdot z$. Then $\forall x \in L$, $x \cdot (y - z) = 0$. In particular, take $x = y - z$ and we get $|y - z| = 0$, which implies $y = z$.
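The construction of the representing vector in this proof is entirely explicit, and it may help to see it carried out. The sketch below is illustrative only: the subspace (a plane in $E^3$), its orthonormal basis, the functional, and the function names are assumptions, not taken from the text.

```python
def dot(x, y):
    """Inner product x . y."""
    return sum(a * b for a, b in zip(x, y))

def representing_vector(functional, basis):
    """Return y = sum over k of functional(u_k) * u_k for an orthonormal basis (u_k)."""
    y = [0.0] * len(basis[0])
    for u in basis:
        coefficient = functional(u)
        y = [yi + coefficient * ui for yi, ui in zip(y, u)]
    return y

# Hypothetical subspace L of E^3 spanned by e_1 and (e_2 + e_3)/sqrt(2).
s = 2 ** -0.5
basis = [[1.0, 0.0, 0.0], [0.0, s, s]]

def functional(x):
    """A linear functional, here restricted to L."""
    return 3.0 * x[0] - x[1] + 5.0 * x[2]

y = representing_vector(functional, basis)

# A vector of L: x = 2 u_1 - 3 u_2.
x_in_L = [2.0, -3.0 * s, -3.0 * s]
```

Here `y` comes out as approximately $(3, 2, 2)$, which lies in $L$, and `functional(x_in_L)` agrees with `dot(x_in_L, y)`. The agreement is only claimed on $L$: for a vector outside $L$, such as $e_2$, the two values differ, which is consistent with the theorem asserting uniqueness of $y$ within $L$.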
6.5.8 Theorem. Let $A$ be a linear transformation with domain $L \subset E^n$ and range in $M \subset E^m$. Then there exists a unique linear transformation $A^t$ with domain $M$ and range in $L$ so that $\forall x \in L$ and $\forall y \in M$,

$$A(x) \cdot y = x \cdot A^t(y). \tag{6.5.5}$$

Proof. Fix $y \in M$ and set $\Lambda_y(x) = A(x) \cdot y$. Since $\Lambda_y$ is a linear functional with domain $L$, it follows from the last theorem that $\exists$ a unique $y^t \in L$ so that $\forall x \in L$,

$$A(x) \cdot y = x \cdot y^t. \tag{6.5.5'}$$

Since $\Lambda_{\alpha y + \beta z}(x) = \alpha \Lambda_y(x) + \beta \Lambda_z(x)$ it follows that

$$(\alpha y + \beta z)^t = \alpha y^t + \beta z^t.$$

Hence, if we set $A^t(y) = y^t$, it follows that $A^t$ is a linear transformation with domain $M$ and range in $L$. Since the $y^t$ for which (6.5.5') holds is unique, it follows that there is only one linear transformation $A^t$ for which (6.5.5) holds.
6.5.9 Theorem. If $A$ is a linear transformation with domain $L$ and range in $M$, if $N(A) = \{x : A(x) = 0\}$ is the null space of $A$, and if $N(A)^{\perp}$ is the orthogonal complement of $N(A)$ in $L$, then

$$N(A)^{\perp} = \mathcal{R}(A^t).$$

Proof. For every $x \in N(A)$ and $\forall y \in M$, we have

$$A(x) \cdot y = x \cdot A^t(y) = 0.$$

Thus $\mathcal{R}(A^t) \subset N(A)^{\perp}$. On the other hand, suppose $\exists z \in N(A)^{\perp} \setminus \mathcal{R}(A^t)$. Because $\mathcal{R}(A^t) \subset N(A)^{\perp}$, without loss of generality we may suppose that $z \in \mathcal{R}(A^t)^{\perp}$. Thus we have $\forall y \in M$,

$$A(z) \cdot y = z \cdot A^t(y) = 0,$$

from which it follows, upon setting $y = A(z)$, that $A(z) = 0$, and hence $z \in N(A) \cap N(A)^{\perp}$. It follows that $z = 0$, which is a contradiction. Hence $N(A)^{\perp} = \mathcal{R}(A^t)$.

6.5.10 Corollary. For every linear transformation $A$, rank $A$ = rank $A^t$.

Proof. The range of $A$ is clearly the same as the range of $A|N(A)^{\perp}$. Since $A|N(A)^{\perp}$ is a nonsingular linear transformation, we have

$$\text{rank } A = \dim N(A)^{\perp} = \dim \mathcal{R}(A^t) = \text{rank } A^t.$$
The linear transformation $A^t$ is called the transpose of $A$ with respect to the space $M$. If $A$ has the matrix representation $[a_{jk}]$ with respect to the ordered orthonormal bases $((u_1, \cdots, u_p), (v_1, \cdots, v_q))$, it is interesting and useful to compute the matrix representation of $A^t$ with respect to the bases $((v_1, \cdots, v_q), (u_1, \cdots, u_p))$. We have

$$A(u_k) = \sum_{j=1}^{q} a_{jk} v_j,$$

and thus

$$A(u_k) \cdot v_j = a_{jk}.$$

On the other hand,

$$A^t(v_j) = \sum_{k=1}^{p} a^t_{kj} u_k$$

and hence

$$A^t(v_j) \cdot u_k = a^t_{kj}.$$

Thus

$$a^t_{kj} = u_k \cdot A^t(v_j) = A(u_k) \cdot v_j = a_{jk}.$$

The matrix $[a^t_{kj}]$ is called the transpose of the matrix $[a_{jk}]$ and is a $p \times q$ matrix. It is usually denoted by $[a_{jk}]^t$.
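The defining relation (6.5.5) and the rule that the matrix of the transpose is obtained by interchanging rows and columns can be checked together on a small example. This is a minimal sketch, assuming the standard orthonormal bases of $E^3$ and $E^2$; the matrix `A`, the vectors, and the helper names are hypothetical.

```python
def dot(x, y):
    """Inner product x . y."""
    return sum(a * b for a, b in zip(x, y))

def apply(m, x):
    """Apply the matrix m to the vector x."""
    return [sum(m[j][k] * x[k] for k in range(len(x))) for j in range(len(m))]

def transpose(m):
    """Interchange rows and columns: entry (k, j) of the result is entry (j, k) of m."""
    return [list(col) for col in zip(*m)]

# Hypothetical 2 x 3 matrix: a transformation from E^3 into E^2.
A = [[1, 2, 0], [-1, 3, 4]]
x = [1, -2, 5]   # a vector of E^3
y = [7, -3]      # a vector of E^2

lhs = dot(apply(A, x), y)              # A(x) . y
rhs = dot(x, apply(transpose(A), y))   # x . A^t(y)
```

Integer entries are used so that both sides can be compared exactly.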
The rank of a linear transformation can be computed from its matrix representation with respect to any ordered pair of ordered bases. The precise facts are given in the following proposition.

6.5.11 Proposition. If $[a_{jk}]$ is the $q \times p$ matrix representation of a linear transformation $A$ with respect to any ordered pair of ordered bases, then the rank of $A$ is the dimension of the linear subspace in $V^q$ generated by the column vectors $\{a_k = (a_{1k}, \cdots, a_{qk}) : k \in (1,p)\}$, which is the same as the dimension of the linear subspace of $V^p$ generated by the row vectors $\{b_k = (a_{k1}, \cdots, a_{kp}) : k \in (1,q)\}$.

Proof. Suppose $(u_1, \cdots, u_p)$ is an ordered basis for $\mathcal{D}(A)$ and $(v_1, \cdots, v_q)$ is an ordered basis for a linear space that contains $\mathcal{R}(A)$. If $r$ is the rank of $A$, then there is a set $\{A(u_{k_i}) : i \in (1,r)\}$ of $r$ linearly independent vectors that generate $\mathcal{R}(A)$. Suppose that $\{\alpha_i : i \in (1,r)\} \subset R$ and

$$\sum_{i=1}^{r} \alpha_i a_{k_i} = 0.$$

Since

$$A(u_{k_i}) = \sum_{j=1}^{q} a_{j k_i} v_j,$$

it follows that

$$\sum_{i=1}^{r} \alpha_i A(u_{k_i}) = \sum_{j=1}^{q} \Big(\sum_{i=1}^{r} \alpha_i a_{j k_i}\Big) v_j = 0,$$

and hence $\forall i \in (1,r)$, $\alpha_i = 0$. Thus the vectors $\{a_{k_i} : i \in (1,r)\}$ are linearly independent in $V^q$. To show that these vectors generate the same space as the vectors $\{a_k : k \in (1,p)\}$, we first note that $\forall u_k$ there exist numbers $\{\beta_{ik} : i \in (1,r)\}$ so that

$$A(u_k) = \sum_{i=1}^{r} \beta_{ik} A(u_{k_i}).$$

Hence

$$a_{jk} = \sum_{i=1}^{r} \beta_{ik} a_{j k_i}, \qquad j \in (1,q).$$

But this says that

$$a_k = \sum_{i=1}^{r} \beta_{ik} a_{k_i},$$

which proves the assertion about the vectors $\{a_k : k \in (1,p)\}$.

To prove the assertion about the vectors $\{b_k : k \in (1,q)\}$, let $(e_1, \cdots, e_p)$ be an ordered orthonormal basis for $E^p$ and $(f_1, \cdots, f_q)$ be an ordered orthonormal basis for $E^q$. Let $B$ be the linear transformation with domain $E^p$, range in $E^q$, and whose matrix representation with respect to $((e_1, \cdots, e_p), (f_1, \cdots, f_q))$ is $[a_{jk}]$. By what we have proved in the previous paragraph rank $B$ = rank $A$. Also, we know from Corollary 6.5.10 that rank $B^t$ = rank $B$. The matrix representation of $B^t$ with respect to $((f_1, \cdots, f_q), (e_1, \cdots, e_p))$ is a $p \times q$ matrix with column vectors the set $\{b_k : k \in (1,q)\}$. Thus from the first paragraph of the proof, the dimension of the linear space generated by these vectors in $V^p$ is rank $B^t$ = rank $A$. This completes the proof.
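Proposition 6.5.11 says that the dimension of the span of the columns equals the dimension of the span of the rows. One way to see this on an example is to compute both by row reduction. The sketch below uses Gaussian elimination, which is a standard technique but not the method of the text; the function names and the sample matrix are assumptions.

```python
from fractions import Fraction

def rank(matrix):
    """Dimension of the row space, found by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def transpose(m):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*m)]

# Hypothetical 4 x 3 matrix: row 2 is twice row 1, and row 4 = row 1 + 2 * row 3.
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]

row_rank = rank(A)
column_rank = rank(transpose(A))  # the rows of the transpose are the columns of A
```

Both numbers come out equal for this matrix, as the proposition predicts for every matrix.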
SPECIAL LINEAR TRANSFORMATIONS

Projections. Suppose $M$ is a linear subspace of $E^n$ and $L$ is a linear subspace of $M$. If $L^{\perp}$ is the orthogonal complement of $L$ in $M$, then $\forall x \in M$ there is a unique $y \in L$ and a unique $y^{\perp} \in L^{\perp}$ so that

$$x = y + y^{\perp}.$$

The last statement is just Proposition 6.2.10(b). Let us define a linear transformation $P$ with domain $M$ and range $L$ by the equation

$$P(x) = y.$$

The linear transformation $P$ is called the projection of $M$ onto $L$. It has the following properties:

(a) $x \in L \Leftrightarrow P(x) = x$.
(b) $P^2 = P \circ P = P$.
(c) $P = P^t$.

We shall leave these simple facts as an exercise for the reader. If $\{u_k : k \in (1,r)\}$ is an orthonormal basis for the linear space $L$, it is a simple matter to compute $P(x)$ in terms of this basis. Indeed, let us write

$$P(x) = \sum_{k=1}^{r} \alpha_k u_k.$$

If we take the dot product of both sides with respect to $u_k$, we get

$$\alpha_k = P(x) \cdot u_k = x \cdot u_k.$$

The last equality follows from the facts that $P^t = P$ and $P(u_k) = u_k$. Hence

$$P(x) = \sum_{k=1}^{r} (x \cdot u_k) u_k.$$
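The closing formula for $P(x)$ translates directly into a few lines of code. The following sketch is an illustration under stated assumptions: the subspace is a plane in $E^3$ with a hand-picked orthonormal basis, and the names `project`, `basis`, and `x` are ours.

```python
def dot(x, y):
    """Inner product x . y."""
    return sum(a * b for a, b in zip(x, y))

def project(x, basis):
    """P(x) = sum over k of (x . u_k) u_k, for an orthonormal basis (u_k) of L."""
    result = [0.0] * len(x)
    for u in basis:
        coefficient = dot(x, u)
        result = [r + coefficient * ui for r, ui in zip(result, u)]
    return result

# Hypothetical subspace of E^3 with orthonormal basis u_1, u_2.
basis = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
x = [2.0, 5.0, 10.0]

p = project(x, basis)                       # the component y of x in L
residual = [a - b for a, b in zip(x, p)]    # the component of x orthogonal to L
```

Applying `project` to `p` again returns `p` (property (b)), and `residual` is orthogonal to each basis vector, so `x = p + residual` is the decomposition $x = y + y^{\perp}$ described above.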
Let us use the idea of projection to obtain the spherical representation of a vector in $E^n$. Let $P_1$ be the projection of $E^n$ onto the linear subspace $\{x : x \in E^n \ \&\ x^1 = 0\}$. $P_1$ may also be described as the projection of $E^n$ onto the space generated by the vectors $\{e_k : k \in (2,n)\}$. Here we are taking $e_k = (e_k^1, \cdots, e_k^n)$ where $e_k^j = 0 \Leftrightarrow j \ne k$, $e_k^k = 1$. Let $P_2$ be the projection of $E^n$ onto the linear subspace $\{x : x \in E^n \ \&\ x^1 = x^2 = 0\}$. The projection $P_2$ may also be described as the projection of $E^n$ onto the space generated by $\{e_k : k \in (3,n)\}$. Note that $\mathcal{R}(P_2) = \mathcal{R}(P_2|\mathcal{R}(P_1))$, so that $P_2$ restricted to $\mathcal{R}(P_1)$ is the projection of the latter subspace onto the subspace $\mathcal{R}(P_2)$. In general, let $P_j$, $j \in (1,n-1)$, be the projection of $E^n$ onto the subspace $\{x : x \in E^n \ \&\ x^1 = \cdots = x^j = 0\}$. Clearly, the last subspace is the space generated by $\{e_k : k \in (j+1,n)\}$, and $\mathcal{R}(P_j) = \mathcal{R}(P_j|\mathcal{R}(P_{j-1}))$, $j \in (2,n-1)$.

The vector $(\xi^1, 0, \cdots, 0)$ is the projection of $\xi$ onto the subspace generated by $e_1$, and we have

$$(\xi^1, 0, \cdots, 0) = (\xi \cdot e_1) e_1.$$

Now, there exists a unique $\theta_1 \in [0,\pi]$ so that $\cos\theta_1 = (\xi \cdot e_1)/|\xi|$, provided $|\xi| \ne 0$. Hence we get

$$\xi^1 = |\xi| \cos\theta_1.$$

The number $\theta_1$ may be considered as the measure of the angle between the vectors $\xi$ and $e_1$. Since $\xi = (\xi \cdot e_1) e_1 + P_1(\xi)$ and $e_1$ and $P_1(\xi)$ are orthogonal, we have

$$|\xi|^2 = |\xi|^2 \cos^2\theta_1 + |P_1(\xi)|^2.$$

But, we also have $|\xi|^2 \cos^2\theta_1 + |\xi|^2 \sin^2\theta_1 = |\xi|^2$, so that $|P_1(\xi)|^2 = |\xi|^2 \sin^2\theta_1$. Since $\theta_1 \in [0,\pi]$, $\sin\theta_1 \ge 0$, so that

$$|P_1(\xi)| = |\xi| \sin\theta_1.$$

Now, let us repeat this process with the vector $P_1(\xi)$ playing the role of $\xi$ and $P_2$ playing the role of $P_1$. We find that

$$(0, \xi^2, 0, \cdots, 0) = (P_1(\xi) \cdot e_2) e_2,$$

and there exists a unique $\theta_2 \in [0,\pi]$ so that $\cos\theta_2 = (P_1(\xi) \cdot e_2)/|P_1(\xi)|$, provided, of course, that $P_1(\xi) \ne 0$. We then get

$$\xi^2 = |P_1(\xi)| \cos\theta_2 = |\xi| \sin\theta_1 \cos\theta_2,$$

$$|P_2(\xi)| = |P_1(\xi)| \sin\theta_2 = |\xi| \sin\theta_1 \sin\theta_2.$$

The number $\theta_2$ may be considered the measure of the angle between $P_1(\xi)$ and $e_2$ (Fig. 6.5.1).

FIGURE 6.5.1
If $P_{n-2}(\xi) \ne 0$, then none of the vectors $\xi$, $P_1(\xi)$, $P_2 \circ P_1(\xi)$, $\cdots$, $P_{n-2} \circ P_{n-3} \circ \cdots \circ P_1(\xi)$ is zero, and we can proceed by induction and find that there exist unique $\theta_k \in [0,\pi]$, $k \in (1,n-1)$, so that $\forall k \in (1,n-1)$,

$$\xi^k = |\xi| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{k-1} \cos\theta_k,$$

and moreover

$$|\xi^n| = |\xi| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \sin\theta_{n-1}.$$

The last equation comes from the fact that

$$(0, \cdots, 0, \xi^n) = P_{n-1} \circ P_{n-2} \circ \cdots \circ P_1(\xi) = P_{n-1}(\xi).$$

If $\xi = 0$, these equations still hold, but the numbers $\theta_1, \cdots, \theta_{n-1}$ are no longer uniquely determined. Indeed, any numbers will do. If $\xi \ne 0$, but $P_1(\xi) = 0$, then from the equation $|P_1(\xi)| = |\xi| \sin\theta_1$, it follows that $\theta_1 = 0$ or $\theta_1 = \pi$. Again the equations above hold, but now $\theta_2, \cdots, \theta_{n-1}$ are no longer uniquely determined and again any numbers will work. Proceeding in this way, we see that if $\xi \ne 0$ and $\forall k \in (1,n-2)$, $\theta_k \in \,]0,\pi[$, then the vector $\xi$ uniquely determines the numbers $\theta_k$, $\forall k \in (1,n-1)$.

Unfortunately the last equation is an equation for $|\xi^n|$ rather than $\xi^n$. If we wish to remove the absolute value sign, it may no longer be true that we can take $\theta_{n-1} \in [0,\pi]$. However, if $\forall k \in (1,n-2)$, $\theta_k \in \,]0,\pi[$, then $\sin\theta_1 \cdots \sin\theta_{n-2} \ne 0$, and there exists a unique $\theta_{n-1} \in [0,2\pi[$ so that

$$\xi^{n-1} = |\xi| \sin\theta_1 \cdots \sin\theta_{n-2} \cos\theta_{n-1},$$

$$\xi^n = |\xi| \sin\theta_1 \cdots \sin\theta_{n-2} \sin\theta_{n-1}.$$

Let $S$ be the collection of $n$-tuples $(\rho, \theta_1, \cdots, \theta_{n-1})$ where $\rho \ge 0$, $\theta_k \in [0,\pi]$ for $k \in (1,n-2)$ and $\theta_{n-1} \in [0,2\pi]$. Let $S_0$ be this set of $n$-tuples with $\rho > 0$, $\theta_k \in \,]0,\pi[$ for $k \in (1,n-2)$ and $\theta_{n-1} \in [0,2\pi[$. We have proved the following result.

The function with domain $S$ defined, with $|\xi| = \rho$, by

$$\begin{aligned}
\xi^1 &= |\xi| \cos\theta_1 \\
\xi^2 &= |\xi| \sin\theta_1 \cos\theta_2 \\
&\;\;\vdots \\
\xi^k &= |\xi| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{k-1} \cos\theta_k \\
&\;\;\vdots \\
\xi^{n-1} &= |\xi| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \cos\theta_{n-1} \\
\xi^n &= |\xi| \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-2} \sin\theta_{n-1}
\end{aligned} \tag{6.5.6}$$

has range all of $E^n$. If this function is restricted to $S_0$, then it is one to one and its range is all of $E^n$ with the exception of the subspace generated by the set $\{e_j : j \in (1,n-2)\} \cup \{0\}$.

Note that if $n = 2$, the formulas (6.5.6) give the ordinary transformation from "polar coordinates" to "rectangular coordinates," and the exceptional set is $\{0\}$.
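The formulas (6.5.6) and the successive-projection argument that produced them can be sketched in code. This is an illustrative sketch only: the function names are ours, components are indexed from zero as Python requires, and `to_spherical` assumes the vector lies off the exceptional set (so that no division by zero occurs). The last angle is recovered with `math.atan2` reduced to $[0, 2\pi[$, which is one way, assumed here, of choosing the unique $\theta_{n-1}$ described in the text.

```python
import math

def to_cartesian(rho, thetas):
    """Apply (6.5.6): thetas = [theta_1, ..., theta_{n-1}], result is a vector of E^n."""
    xi = []
    sin_product = 1.0  # running product sin(theta_1) ... sin(theta_{k-1})
    for theta in thetas:
        xi.append(rho * sin_product * math.cos(theta))
        sin_product *= math.sin(theta)
    xi.append(rho * sin_product)  # last component: every factor is a sine
    return xi

def to_spherical(xi):
    """Recover (rho, thetas) for a vector off the exceptional set."""
    n = len(xi)
    rho = math.sqrt(sum(c * c for c in xi))
    thetas = []
    for k in range(n - 2):
        # Length of the vector after the first k components are projected away.
        tail = math.sqrt(sum(c * c for c in xi[k:]))
        cosine = max(-1.0, min(1.0, xi[k] / tail))  # guard against rounding
        thetas.append(math.acos(cosine))            # angle in [0, pi]
    # The last angle carries the sign of the final component; it lies in [0, 2*pi).
    thetas.append(math.atan2(xi[-1], xi[-2]) % (2.0 * math.pi))
    return rho, thetas

xi = [1.0, -2.0, 3.0, -4.0]
rho, thetas = to_spherical(xi)
round_trip = to_cartesian(rho, thetas)
```

For $n = 2$ the list `thetas` has a single entry and `to_cartesian` reduces to the usual polar-coordinate formulas, in agreement with the remark above.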
Symmetric Transformations. Suppose $L$ is a linear subspace of $E^n$ and $A$ is a linear transformation with $\mathcal{D}(A) = L$ and $\mathcal{R}(A) \subset L$. The linear transformation $A$ is said to be symmetric $\Leftrightarrow$ $A = A^t$.

If $(u_1, \cdots, u_r)$ is an ordered orthonormal basis for $L$ and $[a_{ij}]$ is the matrix representation of $A$ with respect to this basis, it follows from our discussion about the matrix representation of $A^t$ that $a^t_{ij} = a_{ji}$. But since $A = A^t$, we get that

$$a^t_{ij} = a_{ij} = a_{ji}.$$

Any matrix whose entries satisfy the last relation is called symmetric.

There is a method of computing the norm of a symmetric transformation that is usually more convenient than the methods we have indicated previously. Let us set

$$M = \sup \{|A(x) \cdot x| : |x| = 1\},$$

so that $\forall x \in L$, $|A(x) \cdot x| \le M |x|^2$. It is clear that

$$M \le \sup \{|A(x) \cdot y| : |x| = |y| = 1\} = \|A\|.$$

On the other hand, a direct computation shows that

$$A(x) \cdot y = \tfrac{1}{4} [A(x+y) \cdot (x+y) - A(x-y) \cdot (x-y)].$$

From the facts that

$$|A(x+y) \cdot (x+y) - A(x-y) \cdot (x-y)| \le M [|x+y|^2 + |x-y|^2]$$

and

$$|x+y|^2 + |x-y|^2 = 2[|x|^2 + |y|^2],$$

we get

$$\|A\| = \sup \{|A(x) \cdot y| : |x| = |y| = 1\} \le M.$$

Consequently, we have the following equality for symmetric transformations:

$$\sup \{|A(x) \cdot x| : |x| = 1\} = \|A\|. \tag{6.5.7}$$

Since $|A(x) \cdot x|$ is a continuous real-valued function and the unit sphere in $L$, $S = \{x : |x| = 1\}$, is compact it follows that $\exists x_0 \in S$, so that

$$\|A\| = \max \{|A(x) \cdot x| : x \in S\} = |A(x_0) \cdot x_0|.$$

Let us set $\mu_0 = A(x_0) \cdot x_0$. A direct computation shows that

$$0 \le |A(x_0) - \mu_0 x_0|^2 = |A(x_0)|^2 - \mu_0^2.$$

Since $|A(x_0)| \le \|A\| = |\mu_0|$, it follows that

$$|A(x_0) - \mu_0 x_0|^2 = 0.$$

This means that

$$A(x_0) = \mu_0 x_0. \tag{6.5.8}$$

Any number $\mu_0$ for which there exists a nonzero vector $x_0 \in \mathcal{D}(A)$ for which (6.5.8) is satisfied is called an eigenvalue or proper value of the linear transformation $A$. Any nonzero vector that satisfies (6.5.8) is called an eigenvector or proper vector for $A$. Our previous discussion shows that a symmetric transformation has an eigenvalue. The corresponding eigenvectors are not unique, since clearly any element in the linear space generated by an eigenvector satisfies the relation (6.5.8).
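The "direct computation" behind (6.5.7), namely that $4\,A(x) \cdot y$ equals $A(x+y) \cdot (x+y) - A(x-y) \cdot (x-y)$, depends on the symmetry of $A$. The sketch below checks the identity for one symmetric matrix and shows it failing for a matrix that is not symmetric. The matrices, vectors, and helper names are assumptions chosen for illustration; integer entries keep the arithmetic exact.

```python
def dot(x, y):
    """Inner product x . y."""
    return sum(a * b for a, b in zip(x, y))

def apply(m, x):
    """Apply the matrix m to the vector x."""
    return [sum(m[j][k] * x[k] for k in range(len(x))) for j in range(len(m))]

def quadratic(m, x):
    """The quadratic expression A(x) . x."""
    return dot(apply(m, x), x)

def polarization_gap(m, x, y):
    """4 A(x).y - [A(x+y).(x+y) - A(x-y).(x-y)]; zero when the identity holds."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return 4 * dot(apply(m, x), y) - (quadratic(m, plus) - quadratic(m, minus))

# Hypothetical symmetric matrix (a_ij = a_ji) and test vectors.
A_sym = [[2, -1, 0], [-1, 3, 4], [0, 4, 1]]
x = [1, 2, -3]
y = [4, 0, 5]
gap_symmetric = polarization_gap(A_sym, x, y)

# A matrix that is not symmetric, for contrast.
B = [[0, 1], [0, 0]]
gap_nonsymmetric = polarization_gap(B, [1, 0], [0, 1])
```

For the symmetric matrix the gap is zero, while for `B` it is not, which is consistent with (6.5.7) being stated only for symmetric transformations.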
For a given eigenvalue $\lambda$, let $M_\lambda$ be the linear subspace of all vectors in $L$ that satisfy the relation $Ax = \lambda x$. It is clear that $A$ takes every element in $M_\lambda$ into an element in $M_\lambda$. If $A$ is symmetric, then it is also true that $A$ takes every element in $M_\lambda^{\perp}$ into $M_\lambda^{\perp}$. To prove this last statement, let $x \in M_\lambda^{\perp}$; then $\forall y \in M_\lambda$ we have $x \cdot y = 0$. Now, $\forall y \in M_\lambda$, since $A(y) \in M_\lambda$ and $x \in M_\lambda^{\perp}$ we get

$$A(x) \cdot y = x \cdot A(y) = \lambda (x \cdot y) = 0.$$

This means $A(x) \in M_\lambda^{\perp}$.

Let $\mu_0$ be the eigenvalue whose existence we established several paragraphs back. Let $A_1$ be the restriction of $A$ to $M_{\mu_0}^{\perp}$; that is, $A_1 = A|M_{\mu_0}^{\perp}$. As we have shown in the last paragraph, $A_1$ is a symmetric linear transformation with domain $M_{\mu_0}^{\perp}$ and range in the same linear subspace. Hence, by the same existence proof as before, there exists a $\mu_1$ and an $x_1 \in M_{\mu_0}^{\perp}$, so that $|x_1| = 1$ and

$$A(x_1) = \mu_1 x_1.$$

Proceeding in this way (formally, by induction!) we find that there is an ordered orthonormal basis $(v_1, \cdots, v_r)$ for $L$ and eigenvalues $\{\lambda_1, \cdots, \lambda_r\}$ so that

$$A v_k = \lambda_k v_k.$$

We are not excluding the possibility that some of the $\lambda_k$ are the same. This can happen, for example, if the dimension of $M_{\mu_0}$ is more than 1. With respect to the ordered basis $(v_1, \cdots, v_r)$ the matrix representation of $A$ is

$$\begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_r \end{bmatrix},$$

where the entries off of the main diagonal are zero. This means that the matrix $[a_{ij}]$, which is the matrix representation of $A$ with respect to the ordered basis $(u_1, \cdots, u_r)$, is similar to a diagonal matrix in the sense that there is an invertible matrix $[b_{ij}]$ so that

$$[b_{ij}]^{-1} [a_{ij}] [b_{ij}]$$

is the given diagonal matrix. Because of this, we usually say that a symmetric matrix is diagonalizable.

Let us suppose that we have numbered the eigenvalues so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r.$$

For $x \in L$ let us write

$$x = \sum_{k=1}^{r} x^k v_k.$$

Hence we get

$$A(x) = \sum_{k=1}^{r} x^k \lambda_k v_k, \qquad A(x) \cdot x = \sum_{k=1}^{r} \lambda_k (x^k)^2.$$

This shows that $\forall x \in L$,

$$\lambda_r |x|^2 \le A(x) \cdot x \le \lambda_1 |x|^2.$$

Indeed, as the reader may easily verify,

$$\lambda_1 = \sup \{A(x) \cdot x : |x| = 1\},$$

$$\lambda_r = \inf \{A(x) \cdot x : |x| = 1\}.$$
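The last two formulas can be observed numerically in the plane. The sketch below is an illustration under stated assumptions: it takes a particular $2 \times 2$ symmetric matrix, uses the closed-form roots of its characteristic quadratic (a standard fact, not derived in the text) to obtain the eigenvalues, and compares them with the largest and smallest values of $A(x) \cdot x$ over sampled unit vectors. All names are ours.

```python
import math

def dot(x, y):
    """Inner product x . y."""
    return sum(a * b for a, b in zip(x, y))

def apply(m, x):
    """Apply the matrix m to the vector x."""
    return [sum(m[j][k] * x[k] for k in range(len(x))) for j in range(len(m))]

def eigenvalues_symmetric_2x2(m):
    """Eigenvalues of [[a, b], [b, c]], largest first, from the quadratic formula."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

# Hypothetical symmetric matrix.
A = [[2.0, 1.0], [1.0, 2.0]]
lam_1, lam_r = eigenvalues_symmetric_2x2(A)

# Sample the unit circle and evaluate A(x) . x at each sample point.
n_samples = 3600
values = []
for k in range(n_samples):
    t = 2.0 * math.pi * k / n_samples
    x = [math.cos(t), math.sin(t)]
    values.append(dot(apply(A, x), x))

sampled_sup = max(values)
sampled_inf = min(values)
```

For this matrix the sampled extremes agree with the two eigenvalues to within rounding, and every sampled value lies between them, as the inequality $\lambda_r |x|^2 \le A(x) \cdot x \le \lambda_1 |x|^2$ requires. Sampling only approximates the supremum and infimum in general; here the sample grid happens to contain the extremal directions.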
Orthogonal Transformations. Let $L$ be a linear subspace of $E^n$ and $A$ a linear transformation with $\mathcal{D}(A) = L$ and $\mathcal{R}(A) \subset L$. The linear transformation $A$ is said to be orthogonal $\Leftrightarrow$ $\forall x \in L$, $|A(x)| = |x|$.

$\mathcal{D}(f) = C$, $\mathcal{R}(f) \subset E^n$ and so that $\forall x \in C$ and $\forall a \in R$ with $a \ge 0$, $f(ax) = af(x)$. Show that $\exists M > 0$, so that $\forall x \in C$, $|f(x)| \le M |x|$.

8. Let $f$ be a function with domain $E^n$ and range in $E^m$ which is additive; that is, $\forall x, y \in E^n$,

$$f(x + y) = f(x) + f(y).$$

Show that if $f$ is continuous at $x = 0$, then it is continuous at every point of $E^n$.

9. Let $\Lambda_y$ be the linear functional defined on $E^n$ by the equation

$$\Lambda_y(x) = x \cdot y.$$
Show that
10. Let $A$ be an orthogonal linear transformation with domain and range a linear subspace $L \subset E^n$. Suppose $(u_1, \cdots, u_r)$ is any ordered orthonormal basis of $L$ and $[a_{ij}]$ the matrix representation of $A$ with respect to this basis. Show that

$$\sum_{j=1}^{r} a_{ij} a_{kj} = 0 \qquad \text{if } i \ne k.$$

11. Let $A$ be a symmetric transformation with domain a linear subspace $L \subset E^n$ and range in $E^n$. Let $[a_{ij}]$ be the matrix representation of $A$ with respect to the ordered orthonormal basis $(u_1, \cdots, u_r)$. Show that there is a matrix $[b_{ij}]$ that is the matrix representation of an orthogonal transformation from $L$ onto $L$ so that

$$[b_{ij}]^{-1} [a_{ij}] [b_{ij}]$$

is a diagonal matrix.

12. If $L$ and $M$ are linear subspaces of $E^n$ that have the same dimension, show that there is a linear transformation $A$ with domain $L$ and range $M$ so that $\forall x \in L$, $|A(x)| = |x|$.

13. Show that if $A$ is a linear transformation with domain $L \subset E^n$ and range $M \subset E^n$ so that $\forall x \in L$, $|A(x)| = |x|$, then $A$ can be extended to be an orthogonal transformation with domain and range $E^n$.

14. Let $P$ be a symmetric linear transformation with the property that $P^2 = P$. Show that $P$ is the projection of its domain onto its range. Give an example that shows that $P^2 = P$ does not imply that $P$ is a symmetric linear transformation.
15. Let $A$ be a symmetric linear transformation, $M$ a linear subspace of $\mathcal{D}(A)$, and $P$ the projection of $\mathcal{D}(A)$ onto $M$. Show that $A$ takes $M$ into itself $\Leftrightarrow$ $P \circ A = A \circ P$.

16. If $P$ is a nonzero projection show that $\|P\| = 1$. If $y$ is a nonzero vector in $E^n$ and $x$ is any vector in $E^n$, use the formula for the projection of $x$ onto the linear space generated by $y$ and the result of the first sentence of this exercise to give another proof of the Cauchy-Bunjakovsky-Schwarz inequality.

17. Show that a linear transformation is an open map.

6.6 DETERMINANTS

In his study of elementary algebra the reader has undoubtedly come across the notion of determinants and has learned enough of their properties to be able to use them for solving systems of linear equations. Our purpose in this section is to derive a number of properties of determinants in a rigorous way since they are very important quantities in the higher-dimensional calculus. Before we discuss determinants, it is necessary to discuss a certain class of functions, called permutations, which take finite sets onto themselves.
Definition. Let S be a finite set. A one-to-one function 0,
If we set

$$B = \bigcup \{B(x_k,\delta) : k \in (1,p)\},$$

then $\forall x \in B$,

$$\Phi(x) = \sum_{k=1}^{p} \varphi_k(x) \ne 0.$$

Let us define $h_k$ on $K$ by putting $\forall x \in K$

$$h_k(x) = \varphi_k(x)/\Phi(x).$$

We see immediately that this set of functions has the required properties. This completes the proof.

REMARK: The set of functions $\{h_k : k \in (1,p)\}$ that we used in the proof of the previous theorem is called a partition of unity for $K$ subordinate to the covering $\{B(x_k,\delta) : k \in (1,p)\}$. We shall meet these objects again later on, especially when we study integration on manifolds.
THE STONE-WEIERSTRASS THEOREM

We now come to an important generalization of the Weierstrass approximation theorem which was proved in Section 4.6. This generalized theorem was proved by M. H. Stone in a context that is more general than we shall present it, although the proof is the same. The reason we do not present the theorem in as general a context as originally given is that a discussion of the relevant concepts would take us too far afield. The theorem we shall state is valid only for $C^1(K)$, where $K$ is a compact set in $E^n$. For the sake of simplicity we shall simply write $C(K)$ in place of $C^1(K)$.
Definiti.on. A set A C C(K) is said to be an algebra 0 and Vx EK, 38(x,e) so that Vf EA and
Vy EK for which lx-yl < 8(x,e) we have IJ(x)-J(y)I
0, we must have L(v) L(O) L1(0) = 0, we have shown that L = L1.
Since for fixed Further, since
=
L1(v).
=
7.2.3 Proposition. If a function has a differential at a point $a$, then the function must be continuous at $a$.
The simple proof of this fact was given in Section 7.1 and we shall not repeat it here. However, we should call attention to the fact that the proof of Proposition 7.2.3 very definitely requires the use of the linearity of the differential. On the other hand, the proof given above of the unicity of the function $L$ which satisfies the conditions of Definition 7.2.2 requires only the homogeneity of $L$; that is, $L(\alpha u) = \alpha L(u)$.
7.2.4 Proposition. If $f$ is a function with domain in $E^n$ and range in $E^m$ and has a differential at $a$, then $\forall u \in E^n$, $D_u f(a)$ exists and
\[ D_u f(a) = df(a)(u). \]
Proof. Suppose $u \in E^n$ and $|u| = 1$. Now, $a \in \mathfrak{D}(f)^0$ and $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall h \in R$ with $0 < |h| < \delta$ we have $a + hu$
312 I HIGHER-DIMENSIONAL DIFFERENTIATION
$\in \mathfrak{D}(f)$ and
\[ |f(a + hu) - f(a) - df(a)(hu)| \le \epsilon |h|. \]
If we divide by $|h|$ and note that $df(a)$ is linear, we see that the proposition is proved in this case.
If $v \in E^n$ and $v \neq 0$, then upon setting $u = v/|v|$ we see that $|u| = 1$. Thus
\[ D_{v/|v|} f(a) = \frac{1}{|v|}\, df(a)(v). \]
But $\forall \alpha \in R$, $\alpha \neq 0$,
\[ \frac{f(a + h\alpha u) - f(a)}{h} = \alpha\, \frac{f(a + h\alpha u) - f(a)}{h\alpha}, \]
so that if $D_u f(a)$ exists, then $D_{\alpha u} f(a)$ exists and is equal to $\alpha D_u f(a)$. Thus $|v|\, D_{v/|v|} f(a) = D_v f(a)$. Finally, since $D_0 f(a) = df(a)(0) = 0$, we have completed the proof of the proposition.
REMARK:
For future reference, let us call attention to the fact that during the proof of the last theorem we have shown that if $D_u f(a)$ exists then $\forall \alpha \in R$, $D_{\alpha u} f(a)$ exists and
\[ D_{\alpha u} f(a) = \alpha D_u f(a). \]
As we saw at the end of Section 7.1, the converse of the last proposition is not true. That is to say, if $\forall u \in E^n$, $D_u f(a)$ exists, it is not necessarily true that $f$ has a differential at $a$. Indeed, the example we gave showed the existence of a function all of whose directional derivatives exist at the origin but the function itself is not continuous at the origin.
However, if in addition to the existence of the directional derivatives of a function at a point, the directional derivatives are continuous, then the function has a differential at the point. This is shown by the next theorem, which actually proves somewhat more.
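The failure of the converse can be checked numerically. The sketch below uses a standard counterexample, $f(x,y) = x^2 y/(x^4 + y^2)$ with $f(0,0) = 0$; this particular function is an assumption chosen for illustration and is not necessarily the example of Section 7.1. Every difference quotient at the origin settles down, yet $f$ is constantly $1/2$ along the parabola $y = x^2$, so $f$ is not continuous at the origin.

```python
def f(x, y):
    """Illustrative counterexample (an assumption, not taken from the text)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)


def directional_quotient_at_origin(u1, u2, h=1e-6):
    """Difference quotient [f(h*u) - f(0)] / h for one small h."""
    return (f(h * u1, h * u2) - f(0.0, 0.0)) / h


# For this f the limit is u1^2 / u2 when u2 != 0, and 0 when u2 == 0.
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
quotients = [directional_quotient_at_origin(u1, u2) for u1, u2 in directions]

# Along the parabola y = x^2 the value is 1/2 however close we are to the origin.
parabola_values = [f(t, t * t) for t in (1e-1, 1e-3, 1e-5)]
```

Note also that the quotient in the direction $(1,1)$ is not the sum of the quotients in the directions $(1,0)$ and $(0,1)$, so $u \mapsto D_u f(0)$ is not linear for this function.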
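7.2.5 Theorem. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$, $\{\mu_k : k \in (1,n)\}$ is a basis for $E^n$, $\exists i \in (1,n)$ so that $D_{\mu_i} f(a)$ exists, and there is a ball $B(a,\rho) \subset \mathfrak{D}(f)$ so that $\forall j \in (1,n) \setminus \{i\}$, $B(a,\rho) \subset \mathfrak{D}(D_{\mu_j} f)$ and $D_{\mu_j} f$ is continuous at $a$. Then $df(a)$ exists.
Proof. We shall break the proof into several steps.
(a) Suppose $u$ and $v$ are linearly independent vectors in $E^n$ so that $D_v f(a)$ exists, $B(a,\rho) \subset \mathfrak{D}(D_u f)$ and $D_u f$ is continuous at $a$. Then $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall \alpha, \beta \in R$ with $|\alpha u + \beta v| < \delta$ we have
\[ |f(a + \alpha u + \beta v) - f(a) - \alpha D_u f(a) - \beta D_v f(a)| \le \epsilon\, |\alpha u + \beta v|. \qquad (7.2.2) \]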
7.2 DIRECTIONAL DERIVATIVES AND DIFFERENTIALS I 313
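From the definition of the directional derivative, $\forall \epsilon > 0$, $\exists \delta_1 > 0$, so that if $|\beta v| < \delta_1$, then
\[ |f(a + \beta v) - f(a) - \beta D_v f(a)| \le \epsilon\, |\beta v|. \]
Next, let us set
\[ F(\alpha, \beta) = f(a + \alpha u + \beta v), \]
where we suppose that $a + \alpha u + \beta v \in \mathfrak{D}(f)$; this is certainly true if $|\alpha u| + |\beta v| < \rho$. We have
\[ D_1 F(\alpha, \beta) = \lim_{h \to 0} \frac{1}{h} [F(\alpha + h, \beta) - F(\alpha, \beta)] = \lim_{h \to 0} \frac{1}{h} [f(a + (\alpha + h)u + \beta v) - f(a + \alpha u + \beta v)] = D_u f(a + \alpha u + \beta v). \]
Since $\mathfrak{R}(F) \subset E^m$ we may write
\[ F(\alpha, \beta) = \sum_{k=1}^{m} F^k(\alpha, \beta) e_k, \]
where $\mathfrak{R}(F^k) \subset R$. Using the one-dimensional mean value theorem we get
\[ f^k(a + \alpha u + \beta v) - f^k(a + \beta v) = F^k(\alpha, \beta) - F^k(0, \beta) = \alpha D_u f^k(a + \theta_k u + \beta v), \]
where $0 \le |\theta_k| \le |\alpha|$. By hypothesis, $D_u f$ is continuous at $a$ and thus $\exists \delta_2$ with $0 < \delta_2 < \rho$, so that if $|\alpha u| + |\beta v| < \delta_2$, then
\[ |f(a + \alpha u + \beta v) - f(a + \beta v) - \alpha D_u f(a)| \le \epsilon\, |\alpha u|. \]
If we now write
\[ f(a + \alpha u + \beta v) - f(a) = f(a + \alpha u + \beta v) - f(a + \beta v) + f(a + \beta v) - f(a), \]
and take $\delta_3 = \min(\delta_1, \delta_2)$, then $\forall \alpha, \beta \in R$ with $|\alpha u| + |\beta v| < \delta_3$ we have
\[ |f(a + \alpha u + \beta v) - f(a) - \alpha D_u f(a) - \beta D_v f(a)| \le \epsilon\, [|\alpha u| + |\beta v|]. \]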
1, k < n - 1 and P(k) is true. Suppose u is a nonzero linear combination of (k + 1) vectors in A. Then 3l E (1, n) \{i}, so that u1 u - u1µ,1 is a linear combination of k vectors in A. From the hypothesis P (k), Va E R, Du1+aµ./(a) exists and where
u
ment:
=
=
=
=
j¢i Now, by part (a) of the proof, -.
(7.2.3") since by hypothesis
B (a, p) and is continuous at a. P(k) =::::} P(k + 1) and the induction is com Vu E E",D,,f(a) exists and (7.2.3) holds.
D,,1µ.J
is defined on
Thus (7.2.3') holds. Hence plete. This shows that
The second statement of part (b) is an immediate consequence of the fact that $u^l \mu_l$ and $u_1 + \alpha \mu_i$ are linearly independent, the fact that $D_{u^l \mu_l} f$ is defined on $B(a, \rho)$ and continuous at $a$, formula (7.2.3''), and the inequality (7.2.2).
(c) The function $f$ has a differential at $a$. From part (b) we know that $\forall u \in E^n$, $D_u f(a)$ exists. If we set $L(u) = D_u f(a)$, then (7.2.3) shows that $L$ is a linear transformation. The fact that (7.2.1) is satisfied for $L$ is simply the inequality (7.2.4). This completes the proof.
REMARK: The last theorem is usually stated in terms of the basis $\{e_j : j \in (1,n)\}$ since in practical situations the partial derivatives of a function are usually the easiest to compute. The fact that we stated the theorem in the form that allowed one directional derivative merely to exist and not necessarily be continuous at $a$ was not done for reasons of sophistry. This was done to include the case $n = 1$, where a function has a differential at a point if it has a derivative at the point, regardless of whether or not the function has a derivative in the neighborhood of the point. Indeed, if $f$ has domain in $R$ and is differentiable at $a$, then $\forall u \in R$,
\[ df(a)(u) = u f'(a). \]
Let us now note a particularly useful matrix form of the differential of a function at a point. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ and $df(a)$ exists. Let us find the matrix representation of this differential with respect to the ordered pair of ordered bases, $((e_1, \dots, e_n), (e_1, \dots, e_m))$. If $m \neq n$, we are using the same symbol $e_k$ to stand for different vectors in different dimensional spaces. However, we think that no confusion will result. We may write
\[ df(a)(e_k) = D_k f(a) = \sum_{j=1}^{m} D_k f^j(a)\, e_j. \]
Thus the matrix representation of $df(a)$ with respect to this ordered pair of ordered bases is
\[ \begin{pmatrix} D_1 f^1(a) & \cdots & D_n f^1(a) \\ \vdots & & \vdots \\ D_1 f^m(a) & \cdots & D_n f^m(a) \end{pmatrix}. \qquad (7.2.5) \]
This matrix is called the Jacobian matrix of $f$ at $a$, and, if $n = m$, the determinant of this matrix is called simply the Jacobian of $f$ at $a$ and is denoted by $J_f(a)$. Of course, the reader should be aware of the fact that the Jacobian matrix of a function can exist at a point without the function having a differential at the point in question.
If $f$ has a differential at $a$, then it is easy to compute $df(a)(u)$ in terms of the partial derivatives of $f$ and the components of $u$ with respect to the ordered basis $(e_1, \dots, e_n)$. We already noted such a form in formula (7.2.3). In general let us write
\[ u = \sum_{k=1}^{n} u^k e_k \]
and apply the linear transformation $df(a)$ to both sides. Noting that $df(a)(e_k) = D_k f(a)$, we get
\[ df(a)(u) = \sum_{k=1}^{n} u^k D_k f(a). \qquad (7.2.6) \]
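Formula (7.2.6) says that applying $df(a)$ to $u$ amounts to multiplying the Jacobian matrix by the column of components of $u$. The following is a minimal numerical sketch of that statement, assuming an illustrative map `f` from $E^2$ to $E^3$ and central differences in place of exact partial derivatives; neither is taken from the text, and all function names are hypothetical.

```python
import math


def f(x):
    """Illustrative map from E^2 to E^3 (an assumption, not from the text)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1) + x2 ** 2, math.exp(x1 - x2)]


def jacobian(func, a, h=1e-6):
    """Central-difference approximation; entry [j][k] approximates D_k f^j(a)."""
    m, n = len(func(a)), len(a)
    matrix = [[0.0] * n for _ in range(m)]
    for k in range(n):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        f_plus, f_minus = func(plus), func(minus)
        for j in range(m):
            matrix[j][k] = (f_plus[j] - f_minus[j]) / (2 * h)
    return matrix


def apply_differential(matrix, u):
    """df(a)(u) = sum_k u^k D_k f(a), that is, the product of the matrix with u."""
    return [sum(row[k] * u[k] for k in range(len(u))) for row in matrix]


def directional_derivative(func, a, u, h=1e-6):
    """Central-difference approximation of D_u f(a)."""
    plus = [ai + h * ui for ai, ui in zip(a, u)]
    minus = [ai - h * ui for ai, ui in zip(a, u)]
    return [(p - q) / (2 * h) for p, q in zip(func(plus), func(minus))]


a = [0.5, -1.0]
u = [2.0, 3.0]
via_matrix = apply_differential(jacobian(f, a), u)
via_quotient = directional_derivative(f, a, u)
```

The two lists agree up to the error of the difference quotients, which is what Proposition 7.2.4 together with (7.2.6) predicts for a function having a differential at $a$.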
The last formula can be put into a form that we feel sure the reader has seen in elementary calculus, even though the meaning may not have been clear at that time. For every $k \in (1, n)$, let $x^k$ be that function with domain $E^n$ and range $E^1$ defined by
\[ x^k(u) = u^k, \quad \forall u \in E^n. \qquad (7.2.7) \]
Note that we have also used the same symbol '$x^k$' as a variable. We think it will always be clear from the context in which way we are using this symbol, and when we use it as a function it will be clear on which space it is acting. The function $x^k$ is clearly a projection and an almost trivial calculation shows that $\forall x, u \in E^n$, $D_u x^k(x) = u^k$. Since $\forall u \in E^n$, this is a continuous function of $x$, 0 so that
since
lg(x) - g(a) - dg(a)(x - a)I ,,;;: E Ix - al/2M, g(x) I g(a) - 11
0 so that Vu E Eq, ldf(b)(u) I � N lu l. Hence =
l df(b)(g(x)-g(a))-df(b) 0dg(a)(x-a)I � N lg(x)-g(a)-dg(a)(x-a) I. Next, Ve> 0, 38' > 0 so that Vx
E
cB(g) with Ix-al
lg(x)-g(a)-dg(a)(x-a)I
�
0 so that Vx with lg(x) -g(a)I < 8" we have =
IJ0g(x) -f(b)-df(b)(g(x)-b)I Let us set 8 min(8', 8"/L); then Vx have from (7.3.2) through (7.3.5), =
E
�
E
(7.3.4)
cB(f 0g)
e lg(x) -bl/2L .
cB(J 0 g) with Ix-al
(7.3.5)
0, 38 > 0 so that \Ohek + O'td
\Di Dkf(a + Ohek 0
+
O'tei)
-
Di Dkf(a) I 0
P ( q +
1) ,
=
q+
1. Hence
and the corollary is proved.
7.4.4 Definition. A function $f$ is said to belong to the class $C^k$ $\iff$ all of the partial derivatives of $f$ of order $\le k$ have domain $\mathfrak{D}(f)$ and are continuous. The function $f$ is said to belong to the class $C^\infty$ $\iff$ all of the partial derivatives of $f$ (of all orders) have domain $\mathfrak{D}(f)$ and are continuous.
If $f \in C^k$, then from Corollary 7.4.3 it follows that any partial derivative of $f$ of order $\le k$ is independent of the manner in which the partial derivatives are composed. Hence the operator $D^\alpha$, which we shall define in an informal way below, becomes rather useful. Suppose that $\mathfrak{D}(f) \subset E^n$ and $\mathfrak{R}(f) \subset E^m$ and $\alpha = (\alpha_1, \dots, \alpha_n)$, where $\alpha_k \in N_0$. We shall set
\[ |\alpha| = \sum_{k=1}^{n} \alpha_k. \qquad (7.4.5) \]
If $f \in C^k$ and $|\alpha| \le k$, we shall set
\[ D^\alpha f = \Bigl( \prod_{k=1}^{n} \circ\, D_k^{\alpha_k} \Bigr)(f), \qquad (7.4.6) \]
where $D_k^{\alpha_k}$ is $\alpha_k$ compositions of $D_k$ and $D_k^0 f = f$. The notation $D^\alpha f$ has become popular with the advent of the modern theory of partial differential equations.
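The operator $D^\alpha$ is easy to experiment with on polynomials, where partial derivatives can be computed exactly. The sketch below is only an illustration under stated assumptions: a polynomial is stored as a dictionary from exponent tuples to coefficients, indices are zero-based in the code, and the helper names `partial`, `apply_in_order`, and `d_alpha` are hypothetical. It computes $|\alpha|$ and checks on one example that the order of composition does not matter.

```python
def partial(poly, k):
    """Apply D_k to a polynomial stored as {exponent tuple: coefficient}."""
    result = {}
    for exps, coeff in poly.items():
        if exps[k] == 0:
            continue  # this monomial does not depend on the k-th variable
        new_exps = exps[:k] + (exps[k] - 1,) + exps[k + 1:]
        result[new_exps] = result.get(new_exps, 0) + coeff * exps[k]
    return result


def apply_in_order(poly, order):
    """Compose the operators D_k in the order listed, e.g. order = [0, 1, 1]."""
    for k in order:
        poly = partial(poly, k)
    return poly


def d_alpha(poly, alpha):
    """D^alpha: alpha[k] compositions of D_k for each index k."""
    order = [k for k, a_k in enumerate(alpha) for _ in range(a_k)]
    return apply_in_order(poly, order)


# p = 3 (x^1)^2 (x^2)^3 + 5 x^1 x^2, with the first variable at index 0
p = {(2, 3): 3, (1, 1): 5}
alpha = (1, 2)
order_of_alpha = sum(alpha)               # |alpha| as in (7.4.5)
result = d_alpha(p, alpha)                # one D_1 and two D_2
reordered = apply_in_order(p, [1, 0, 1])  # same operators, different order
```

Both `result` and `reordered` equal `{(1, 1): 36}`, that is, $36\,x^1 x^2$, in agreement with the independence of order asserted by Corollary 7.4.3.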
7.4.5 Definition. Suppose $f$ is a function with domain in $E^n$ and range in $E^m$ which has a differential at the point $a$ and $\forall j \in (1, k-1)$ and $\forall (u_1, \dots, u_j)$, where $u_i \in E^n$, the function $\prod_{i=1}^{j} \circ\, D_{u_i} f$ has a differential at $a$. Then $\forall (u_1, \dots, u_k)$ we set
\[ d^k f(a)(u_1, \dots, u_k) = \prod_{i=1}^{k} \circ\, D_{u_i} f(a), \qquad (7.4.7) \]
and call $d^k f(a)$ the $k$th-order differential of $f$ at $a$. If $u_1 = u_2 = \cdots = u_k = u$, we shall set
\[ d^k f(a)(u)^k = d^k f(a)(u, \dots, u), \qquad (7.4.8) \]
and $d^0 f(a)(u)^0 = f(a)$.
A special case of Theorem 7.2.5 says that if a function has continuous first partials at a point, then the function has a differential at the point.
7.4 HIGHER-ORDER DIFFERENTIALS AND TAYLOR'S THEOREM j 329
The same type of result holds for kth-order differentials, although for the sake of simplicity we shall state it in a slightly less general form than is possible.
7.4.6 Theorem. Suppose $f$ has domain in $E^n$ and range in $E^m$, and there is a ball $B(a, \rho) \subset \mathfrak{D}(f)$, so that all the partials of $f$ of order $\le k$, $(k \ge 1)$, have $B(a, \rho)$ in their domains and are continuous there. Then $f$ has a $k$th-order differential at every $x \in B(a, \rho)$ and
\[ d^k f(x)(u_1, \dots, u_k) = \sum_{i_1=1}^{n} \cdots \sum_{i_k=1}^{n} u_1^{i_1} \cdots u_k^{i_k} \Bigl( \prod_{j=1}^{k} \circ\, D_{i_j} \Bigr) f(x). \qquad (7.4.9) \]
Proof. Let $P(k)$ be the statement of the theorem. The statement $P(1)$ is true by Theorem 7.2.5. Assume $P(k)$ is true and we shall try to prove $P(k+1)$. Since the hypotheses of $P(k+1)$ imply the hypotheses of $P(k)$, it follows from $P(k)$ that $f$ has a $k$th-order differential at every point of $B(a, \rho)$ and (7.4.9) holds. Each function on the right side of (7.4.9) has, by the hypotheses of $P(k+1)$, continuous first partials on $B(a, \rho)$, and thus by Theorem 7.2.5 has a differential at each point of this ball. Further, $\forall u_{k+1} \in E^n$ and $\forall x \in B(a, \rho)$,
\[ D_{u_{k+1}} \Bigl( \prod_{j=1}^{k} \circ\, D_{i_j} f \Bigr)(x) = \sum_{i_{k+1}=1}^{n} u_{k+1}^{i_{k+1}} D_{i_{k+1}} \Bigl( \prod_{j=1}^{k} \circ\, D_{i_j} f \Bigr)(x). \]
Hence, if we apply $D_{u_{k+1}}$ to both sides of (7.4.9) and use the last equality we see that the induction is complete.
REMARK:
Clearly there is nothing special about partial derivatives and we could have stated the previous theorem in terms of the directional derivatives of any basis for $E^n$. Note that the right side of (7.4.9) shows that $d^k f(x)$ is multilinear.
As an example of the formula (7.4.9) let us write down $d^3 f(x)(u)^3$ in terms of the more classical terminology for partial derivatives. We have
\[ d^3 f(x)(u)^3 = \sum_{i_1=1}^{n} \sum_{i_2=1}^{n} \sum_{i_3=1}^{n} u^{i_1} u^{i_2} u^{i_3} \frac{\partial^3 f(x)}{\partial x^{i_3} \partial x^{i_2} \partial x^{i_1}}, \]
where of course $u^{i_j}$ is the $i_j$ component of the vector $u$.
7.4.7 Theorem. Suppose $f$ is a real-valued function with $\mathfrak{D}(f) \subset E^n$, and the line segment $L = \{x : x = tb + (1-t)a \ \&\ t \in [0,1]\}$ is contained in $\mathfrak{D}(f)$. If $\forall k \in (1, m)$ and $\forall x \in L$, $f$ has a $k$th-order differential $d^k f(x)$, then $\forall x \in L$, $\exists c \in L$ so that $c = \gamma x + (1 - \gamma)a$, $\gamma \in\, ]0,1[$, and
\[ f(x) = \sum_{k=0}^{m-1} \frac{1}{k!}\, d^k f(a)(x - a)^k + \frac{1}{m!}\, d^m f(c)(x - a)^m. \qquad (7.4.10) \]
Proof. For $\forall t \in [0,1]$ let us set
\[ F(t) = f(tx + (1 - t)a). \]
Since $x \in L$ we may write $x = \tau b + (1 - \tau)a$ with $\tau \in [0,1]$. Hence $tx + (1 - t)a = t\tau b + (1 - t\tau)a$ with $t\tau \in [0,1]$. Thus $tx + (1 - t)a$ is in $L$ and hence by the chain rule, $\forall t \in [0,1]$,
\[ F'(t) = df(tx + (1 - t)a)(x - a). \]
It follows by an easy induction argument that $\forall k \in (1, m)$ and $\forall t \in [0,1]$ we have
\[ F^{(k)}(t) = d^k f(tx + (1 - t)a)(x - a)^k. \]
If we apply the one-dimensional Taylor formula with the Lagrange form of the remainder, Corollary 4.4.2(c), we get
\[ F(1) = \sum_{k=0}^{m-1} \frac{1}{k!} F^{(k)}(0) + \frac{1}{m!} F^{(m)}(\gamma), \quad \gamma \in\, ]0,1[. \]
If we substitute in for $F(1)$, $F^{(k)}(0)$, and $F^{(m)}(\gamma)$ we have completed the proof of the theorem.
The above theorem is valid only for real-valued functions. However, by applying it componentwise to a function whose range is in $E^m$, $m > 1$, we do get a Taylor remainder formula. However, the reader should be cautioned that the same point $c$ will not work for all components and will in general change with the different components.
There is an integral formula for the Taylor remainder that looks like the integral remainder formula for functions defined in $R$. The integral formula does not require that $f$ be real-valued. We only note that if $f(t) = \sum_{k=1}^{m} f^k(t) e_k$ is a continuous function with domain $[a, b]$ we define
\[ \int_a^b f(t)\, dt = \sum_{k=1}^{m} \Bigl( \int_a^b f^k(t)\, dt \Bigr) e_k. \]
7.4.7' Theorem. Suppose the hypotheses of Theorem 7.4.7 are satisfied and in addition $\forall x \in L$, $d^m f(tx + (1 - t)a)(x - a)^m$ is a continuous function of $t$. Then $\forall x \in L$,
\[ f(x) = \sum_{k=0}^{m-1} \frac{1}{k!}\, d^k f(a)(x - a)^k + \int_0^1 \frac{(1 - t)^{m-1}}{(m-1)!}\, d^m f(tx + (1 - t)a)(x - a)^m\, dt. \qquad (7.4.10') \]
Proof. As in the proof of Theorem 7.4.7 we set
\[ F(t) = f(tx + (1 - t)a). \]
Then by the formula (5.2.1) we get
\[ F(1) = \sum_{k=0}^{m-1} \frac{1}{k!} F^{(k)}(0) + \int_0^1 \frac{(1 - t)^{m-1}}{(m-1)!} F^{(m)}(t)\, dt. \]
If we substitute in for $F^{(k)}(0)$ and $F(1)$ we get formula (7.4.10'), which concludes the proof.
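The Lagrange form (7.4.10) can be checked numerically on a function whose higher-order differentials are known in closed form. The sketch below assumes the illustrative function $f(x) = \exp(w \cdot x)$ with a fixed vector $w$ (a choice made here, not in the text); for it, $F(t) = f(a + tu)$ gives $d^k f(a)(u)^k = F^{(k)}(0) = (w \cdot u)^k f(a)$. The code compares the error of the Taylor sum with the largest value the remainder term can take on the segment.

```python
import math

W = (1.0, 2.0)  # hypothetical weight vector; f(x) = exp(W . x)


def f(x):
    return math.exp(sum(w * xi for w, xi in zip(W, x)))


def kth_differential(a, u, k):
    """d^k f(a)(u)^k = F^(k)(0) for F(t) = f(a + t u); here (W . u)^k f(a)."""
    s = sum(w * ui for w, ui in zip(W, u))
    return s ** k * f(a)


def taylor_sum(a, x, m):
    """sum_{k=0}^{m-1} (1/k!) d^k f(a)(x - a)^k."""
    u = [xi - ai for xi, ai in zip(x, a)]
    return sum(kth_differential(a, u, k) / math.factorial(k) for k in range(m))


def remainder_bound(a, x, m):
    """Largest value of |(1/m!) d^m f(c)(x - a)^m| for c on the segment."""
    u = [xi - ai for xi, ai in zip(x, a)]
    s = sum(w * ui for w, ui in zip(W, u))
    biggest = max(f(a), f(x))  # exp(W . c) is monotone along the segment
    return abs(s) ** m * biggest / math.factorial(m)


a, x = (0.0, 0.0), (0.3, -0.1)
errors = [abs(f(x) - taylor_sum(a, x, m)) for m in range(1, 7)]
bounds = [remainder_bound(a, x, m) for m in range(1, 7)]
```

For each $m$ the entry of `errors` does not exceed the corresponding entry of `bounds`, as (7.4.10) requires, since the error equals the remainder term at some point $c$ of the segment.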
If we assume that $f \in C^m$, then we can write formulas (7.4.10) or (7.4.10') in the terminology of the operators $D^\alpha$. In this case Corollary 7.4.3 tells us that it doesn't matter in which order the $D_{i_j}$ are applied and we can write, for $k \le m$,
\[ \frac{1}{k!}\, d^k f(x)(u)^k = \sum_{|\alpha| = k} u^\alpha c_\alpha D^\alpha f(x), \]
where $u^\alpha = \prod_{j=1}^{n} (u^j)^{\alpha_j}$, and $c_\alpha$ is a constant independent of $f$ and $x$. This can be proved by an easy induction argument on $k$. Suppose $p$ is a polynomial of $n$ variables of degree $m$; that is,
\[ p(x) = \sum_{|\alpha| \le m} p_\alpha (x - a)^\alpha, \]
where $(x - a)^\alpha = \prod_{j=1}^{n} (x^j - a^j)^{\alpha_j}$. From Theorem 7.4.7 we can write
\[ p(x) = \sum_{|\alpha| \le m} (x - a)^\alpha c_\alpha D^\alpha p(a), \]
since the $(m+1)$st remainder vanishes. If $|\alpha| \le m$ and we apply $D^\alpha$ to both sides of this equation and evaluate at $x = a$, we get
\[ \alpha!\, p_\alpha = D^\alpha p(a) = \alpha!\, c_\alpha D^\alpha p(a), \]
where $\alpha! = \prod_{j=1}^{n} \alpha_j!$. If we choose $p$ so that $p_\alpha \neq 0$, we see that $c_\alpha = 1/\alpha!$. Thus we can write the formula (7.4.10) in the form
\[ f(x) = \sum_{|\alpha| \le m-1} \frac{1}{\alpha!} D^\alpha f(a)(x - a)^\alpha + \sum_{|\alpha| = m} \frac{1}{\alpha!} D^\alpha f(c)(x - a)^\alpha. \]

Exercises

1.
Compute $d^3 f(a)(x - a)^3$ for the following functions at the given point $a$:
(a) $f(x^1, x^2, x^3) = (x^2)^2 + 2x^1 (x^2)^2 + (x^3)^2$, $a = (1, 0, -1)$.
(b) $f(x^1, x^2, x^3) = \sin(x^1 + x^2 + x^3)$, $a = 0$.
(c) $f(x^1, x^2) = e^{x^1 x^2}$, $a = 0$.

2. Write Taylor's formula about $(0, 0)$ for $f(x^1, x^2) = \sin(x^1 + e^{x^2})$ for $m = 3$.

3. Suppose $p(x^1, x^2) = x^1 (x^2)^2 + 3(x^1)^2 x^2 + x^1 + 2$. Write this polynomial as a polynomial in powers of $(x^1 - 1)$ and $(x^2 - 1)$.

4. Suppose $f \in C^2$ and $D_1 f(a) = D_2 f(a) = 0$. If $B$ is a compact convex set in $\mathfrak{D}(f)$, show $\exists M > 0$ so that $\forall b \in B$, $|f(b) - f(a)| \le M |b - a|^2$.

5. Suppose $f$ is a real-valued function of class $C^\infty$ and $\mathfrak{D}(f)$ is an open set in $E^n$. Suppose $B \subset \mathfrak{D}(f)$ is a convex set containing $a \in \mathfrak{D}(f)$, and $\exists M$ so that $\forall \alpha$ and $\forall x \in B$, $|D^\alpha f(x)| \le M$. Show that $\forall x \in B$, the remainder in Taylor's formula goes to zero as $m \to \infty$.

6. State and prove an analogue of Bernstein's theorem, 4.4.4, for functions with domain in $E^n$, $n > 1$.

7.5 THE INVERSE AND IMPLICIT FUNCTION THEOREMS
Suppose $f$ is a function with domain in $E^n$, range in $E^m$ and $df(a)$ exists. Since $df(a)(x - a) + f(a)$ approximates $f(x)$ very closely in a neighborhood of $a$ we might hope that if $df(a)$ is nonsingular, then $f$ itself, restricted to a neighborhood of $a$, is a one-to-one function. It turns out that this is essentially the case and our first object in this section is to prove this, and indeed somewhat more.
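The statement that $df(a)(x - a) + f(a)$ approximates $f(x)$ very closely near $a$ can be made concrete by watching the ratio of the approximation error to $|x - a|$. The sketch below assumes the map $f(x, y) = (x^2 - y^2, 2xy)$, with its differential written out by hand from the partial derivatives; the point `a` and the direction are arbitrary choices made for illustration.

```python
import math


def f(x):
    """Illustrative C^1 map of E^2 into E^2 (chosen here as an example)."""
    return (x[0] ** 2 - x[1] ** 2, 2.0 * x[0] * x[1])


def df(a, u):
    """Differential of the map above at a, applied to u (Jacobian times u)."""
    return (2.0 * a[0] * u[0] - 2.0 * a[1] * u[1],
            2.0 * a[1] * u[0] + 2.0 * a[0] * u[1])


def norm(v):
    return math.sqrt(sum(c * c for c in v))


a = (1.0, 0.5)
direction = (0.6, -0.8)  # a unit vector
ratios = []
for r in (1e-1, 1e-2, 1e-3):
    x = (a[0] + r * direction[0], a[1] + r * direction[1])
    u = (x[0] - a[0], x[1] - a[1])
    fx, fa, lin = f(x), f(a), df(a, u)
    error = norm((fx[0] - fa[0] - lin[0], fx[1] - fa[1] - lin[1]))
    ratios.append(error / norm(u))
```

For this particular map the ratio equals $|x - a|$, so `ratios` shrinks in proportion to the distance from $a$, which is the behavior the definition of the differential demands.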
7.5.1 Proposition. Suppose $f$ is of class $C^1$ with an open domain in $E^n$ and range in $E^m$. For every compact set $K \subset \mathfrak{D}(f)$ and $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$ and $\forall u \in E^n$ we have
\[ |df(x)(u) - df(y)(u)| \le \epsilon |u|, \qquad (7.5.1) \]
\[ |f(x) - f(y) - df(x)(x - y)| \le \epsilon |x - y|. \qquad (7.5.2) \]
Proof. Let $S = \{u : u \in E^n \ \&\ |u| = 1\}$ be the unit sphere in $E^n$. From the expansion
\[ df(x)(u) = \sum_{j=1}^{n} u^j D_j f(x), \]
it is clear that $df(x)(u)$ is continuous on the Cartesian product $\mathfrak{D}(f) \times S$. If we restrict $df(x)(u)$ to $K \times S$, the restricted function is uniformly continuous. Thus $\forall \epsilon > 0$, $\exists \delta > 0$ so that $\forall x, y \in K$ with $|x - y| < \delta$, and $\forall u \neq 0$ we have
\[ |df(x)(u/|u|) - df(y)(u/|u|)| \le \epsilon. \]
Using the homogeneity of $df(x)$ and $df(y)$, if we multiply the inequality by $|u|$ we get (7.5.1). If $u = 0$, (7.5.1) is clearly true.
$\subset \mathfrak{D}(f)$ so that $\forall x, y \in B(a, \delta(a))$ we have (7.5.1). Using the mean value theorem, $\forall u \in E^m$ there is a $c$ on the straight line joining $x$ and $y$ so that
\[ [f(x) - f(y) - df(x)(x - y)] \cdot u = [df(c)(x - y) - df(x)(x - y)] \cdot u. \]
Replace $u$ by $[f(x) - f(y) - df(x)(x - y)]$ and for the corresponding $c$ we get, using the C-B-S inequality,
\[ |f(x) - f(y) - df(x)(x - y)| \le |df(c)(x - y) - df(x)(x - y)|. \]
If we now use the estimate (7.5.1) on the right, we have (7.5.2) in $B(a, \delta(a))$.
The collection $\{B(a, \delta(a)/2) : a \in K\}$ is an open covering for $K$ and thus reduces to a finite subcovering $\{B(a_j, \delta(a_j)/2) : j \in (1, q)\}$. Let $\delta = \min\{\delta(a_j)/2 : j \in (1, q)\}$ and suppose $x, y \in K$ with $|x - y| < \delta$. Now $\exists j \in (1, q)$ so that $|x - a_j| < \delta(a_j)/2$. Thus $|y - a_j| \le |y - x| + |x - a_j| < \delta(a_j)$. Hence $x, y \in B(a_j, \delta(a_j))$, and since we have the estimate (7.5.2) in this ball we have concluded the proof.

7.5.2 Corollary. If $f$ satisfies the hypotheses of Proposition 7.5.1 and $\exists a \in \mathfrak{D}(f)$ so that $df(a)$ is nonsingular, then there exists a ball $B(a, \delta) \subset \mathfrak{D}(f)$ and $\exists m > 0$, so that $\forall x \in B(a, \delta)$ and $\forall u \in E^n$ we have
\[ |df(x)(u)| \ge m |u|. \qquad (7.5.3) \]
Moreover, if $df(x)$ is nonsingular for every $x$ in a compact set $K \subset \mathfrak{D}(f)$, then $\exists m > 0$ so that (7.5.3) holds $\forall x \in K$.
Proof. Since $df(a)$ is nonsingular, it follows from Corollary 6.5.5 that $\exists m > 0$ so that $\forall u \in E^n$, $|df(a)(u)| \ge 2m |u|$. Now, from Proposition 7.5.1, $\exists \delta > 0$ so that $\forall x \in B(a, \delta)$ and $\forall u \in E^n$ we have
\[ |df(a)(u)| - |df(x)(u)| \le m |u|. \]
Thus
\[ |df(x)(u)| \ge |df(a)(u)| - m |u| \ge m |u|. \]
To prove the second statement, it follows from what we have just proved, that $\forall a \in K$, $\exists \delta(a) > 0$ and $\exists m(a) > 0$ so that (7.5.3) holds $\forall x \in B(a, \delta(a))$, provided $m$ is replaced by $m(a)$. The collection $\{B(a, \delta(a)) : a \in K\}$ is an open covering for $K$ and thus reduces to a finite subcovering $\{B(a_j, \delta(a_j)) : j \in (1, q)\}$. If we now take $m = \min\{m(a_j) : j \in (1, q)\}$ we have completed the proof.
The next two propositions constitute essentially the proof of the Inverse Function Theorem.
7.5.3 Proposition. Suppose $f \in C^1$ has (an open) domain in $E^n$, range in $E^m$, and $df(a)$ is nonsingular. Then there exists a ball $B(a, \delta) \subset \mathfrak{D}(f)$ and $\exists m > 0$ so that $\forall x \in B(a, \delta)$, $df(x)$ is nonsingular and $\forall x, y \in B(a, \delta)$, $|f(x) - f(y)| \ge m |x - y|$. In particular this means that $f|B(a, \delta)$ is a one-to-one function.
Proof. From Corollary 7.5.2 there exists a ball $B(a, 2\delta_1) \subset \mathfrak{D}(f)$ and $\exists m > 0$ so that $\forall x \in B(a, 2\delta_1)$ and $\forall u \in E^n$ we have
\[ |df(x)(u)| \ge 2m |u|. \qquad (7.5.3') \]
7.5.l
Thus
that
38,0
8 < 81, so that Vx,y EK with lx-yl IJ(x)-f(y) - df(x) (x-y)I ,,;;; m Ix - YI.
Vx,y
E
$\mathfrak{D}(f)$ so that $f(a) = 0$ and $df(a)$ is of rank $n$. Then there is an open set $U \subset E^m$ containing $0$, an open set $V \subset \mathfrak{D}(f)$ containing $a$, a function $g$ with $\mathfrak{D}(g) = U$ and $\mathfrak{R}(g) \subset V$ which satisfies the following:
(a) $g(0) = a$.
(b) $f \circ g(t) = 0$, $\forall t \in \mathfrak{D}(g)$.
(c) $g \in C^q$ and $\forall t \in U$, rank $dg(t) = m$.
(d) If $x \in V$ and $f(x) = 0$, then $x \in \mathfrak{R}(g)$.
Proof.
Let us identify $E^{m+n}$ with $E^m \times E^n$ in the obvious way, and identify $E^m$ with the subspace $E^m \times \{0\}$ of $E^{m+n}$ and $E^n$ with the subspace $\{0\} \times E^n$ of $E^{m+n}$. Let $M$ be any linear subspace of $E^{m+n}$ of dimension $n$ so that the range of $df(a)|M$ is $E^n$. Let $P$ be the projection of $E^{m+n}$ onto $M^\perp$ and $A$ any linear transformation of $E^{m+n}$ into itself which takes $M^\perp$ onto $E^m$. This is possible, since $\dim M = n$ and thus $\dim M^\perp = m$. (See Exercises 12 and 13 of Section 6.5.) If $\forall x \in \mathfrak{D}(f)$ we set
\[ F(x) = (A \circ P(x), f(x)), \]
then $F$ is a function of class $C^q$ with domain $\mathfrak{D}(f)$ and range in $E^{m+n}$. Further, $\forall u \in E^{m+n}$ we have
\[ dF(a)(u) = (dA \circ P(a)(u), df(a)(u)) = (A \circ P(u), df(a)(u)). \]
The range of $dF(a)$ is $E^{m+n}$. Indeed, let $(v_1, v_2) \in E^{m+n}$ and $u_1 \in M^\perp$ so that $A \circ P(u_1) = v_1$, $u_2 \in M$ so that $df(a)(u_2) = v_2 - df(a)(u_1)$, and $u = u_1 + u_2$. Then $A \circ P(u) = v_1$, $df(a)(u) = v_2$, and we see that $dF(a)(u) = (v_1, v_2)$. Consequently $dF(a)$ has rank $m + n$, which means that it is nonsingular.
If we apply the Inverse Function Theorem to $F$, we find that there is an open set $V \subset \mathfrak{D}(f)$ containing $a$ and an open set $W \subset E^{m+n}$ containing $F(a)$ so that $F|V$ is a one-to-one function with range $W$ and having an inverse function $G$ of class $C^q$. Set
\[ W_1 = \{\tau : \tau \in E^m \ \&\ (\tau, 0) \in W\}. \]
It is clear that $W_1$ is open in $E^m$ and is nonvoid, since $A \circ P(a) \in W_1$. For every $\tau \in W_1$ let us set
\[ h(\tau) = G(\tau, 0). \]
Then $h \in C^q$ and since $dG(\tau, 0)$ is nonsingular it follows that $dG(\tau, 0)|E^m$ has rank $m$. But $\forall u \in E^m$ we have
\[ dG(\tau, 0)(u, 0) = \sum_{i=1}^{m} u^i D_i G(\tau, 0) = dh(\tau)(u). \]
Hence rank $dh(\tau) = m$. Now,
\[ h(A \circ P(a)) = G(A \circ P(a), 0) = G(A \circ P(a), f(a)) = G \circ F(a) = a. \qquad (7.5.8) \]
Further, $\forall \tau \in W_1$,
\[ (A \circ P \circ h(\tau), f \circ h(\tau)) = F \circ h(\tau) = F \circ G((\tau, 0)) = (\tau, 0). \]
Thus, we get
\[ A \circ P \circ h(\tau) = \tau, \qquad (7.5.9) \]
\[ f \circ h(\tau) = 0. \qquad (7.5.10) \]
Note also, if $x \in V$ and $f(x) = 0$, then $F(x) \in W$ and
\[ x = G \circ F(x) = G(A \circ P(x), 0) = h(A \circ P(x)). \qquad (7.5.11) \]
Finally, let us set $U = W_1 - A \circ P(a) = \{t : t \in E^m \ \&\ t = \tau - A \circ P(a) \ \&\ \tau \in W_1\}$. Then $\forall t \in U$ let us set
\[ g(t) = h(\tau), \quad t = \tau - A \circ P(a). \]
Clearly $g$ satisfies the conclusions of Theorem 7.5.7, condition (d) coming from (7.5.11). The proof is complete.
Condition (d) is a uniqueness condition on $\mathfrak{R}(g)$ rather than on $g$ itself. We can get any number of other functions that satisfy the conclusions of the theorem by composing $g$ with a function of class $C^q$ that takes $U$ onto itself, leaves the origin fixed, and is of rank $m$ at every point of $U$. To pin down the uniqueness of $g$, the Implicit Function Theorem is usually stated in a special form. We state this as a corollary, although it is really a corollary of the proof.
7.5.8 Corollary. Suppose $f$ is of class $C^q$, $q \ge 1$, $\mathfrak{D}(f) \subset E^m \times E^n$, and $\mathfrak{R}(f) \subset E^n$. Suppose further that $(a, b) \in \mathfrak{D}(f)$ so that $f(a, b) = 0$ and $d_y f(a, b)$ is nonsingular, where $d_y f(a, b)$ is the differential of the function with domain in $E^n$ and values $f(a, y)$. Then there is an open set $U \subset E^m$ containing $a$ and an open set $Y \subset E^n$ containing $b$, so that $U \times Y \subset \mathfrak{D}(f)$ and a function $g$ with $\mathfrak{D}(g) = U$ and $\mathfrak{R}(g) \subset Y$ that satisfies the following:
(a') $g(a) = b$.
(b') $f(x, g(x)) = 0$, $\forall x \in \mathfrak{D}(g)$.
(c') $g \in C^q$.
(d') If $(x, y) \in U \times Y$ and $f(x, y) = 0$, then $y = g(x)$.
Proof. We shall use the notations of the proof of the last theorem, except that we shall designate the elements of $E^m \times E^n$ by $(x, y)$. Let $u = (0, u_2) \in E^m \times E^n$. From the formula
\[ df(a, b)(u) = \sum_{j=1}^{n} u_2^j D_{m+j} f(a, b) = d_y f(a, b)(u_2), \]
and the fact that $d_y f(a, b)$ is nonsingular and $\mathfrak{R}(df(a, b)) \subset E^n$, we see that $df(a, b)$ has rank $n$. Let $M = E^n$; then, of course, the orthogonal complement of $M$ in $E^{m+n}$ is $E^m$. As in the proof of Theorem 7.5.7 we let $P$ be the projection of $E^{m+n}$ onto $E^m$ so that $\forall (x, y) \in E^{m+n}$ we have $P(x, y) = x$. We take $A$ to be the identity transformation of $E^{m+n}$ onto itself. Hence the function $F$ of the last theorem becomes
\[ F(x, y) = (x, f(x, y)). \]
If we apply the proof of the last theorem, we find that there is an open neighborhood $U \subset E^m$ containing $a$ and an open neighborhood $U \times Y \subset \mathfrak{D}(f)$ containing $(a, b)$ and a function $h(x) = (h_1(x), h_2(x))$ of class $C^q$ with domain $U$ and range in $U \times Y$ so that from (7.5.8) we have
\[ h(P(a, b)) = h(a) = (a, b). \]
Thus $h_2(a) = b$. Further, from (7.5.9) we have
\[ h_1(x) = P \circ h(x) = x, \]
and thus from (7.5.10) we get
\[ f \circ h(x) = f(x, h_2(x)) = 0. \]
If we take $g = h_2$, then conditions (a'), (b') and (c') are satisfied.
To prove the unicity condition (d') let us suppose $(x, y) \in U \times Y$ and $f(x, y) = 0$; then $(x, y) \in \mathfrak{D}(F)$ and $F(x, y) = (x, 0)$. Applying the inverse function $G$ and recalling that $G(x, 0) = h(x)$ we get
\[ (x, y) = G \circ F(x, y) = G(x, 0) = (x, g(x)), \]
from which it follows that $y = g(x)$. This completes the proof.
Of course, the Jacobian matrix of $d_y f(a, b)$ is the matrix
\[ \begin{pmatrix} D_{m+1} f^1(a, b) & \cdots & D_{m+n} f^1(a, b) \\ \vdots & & \vdots \\ D_{m+1} f^n(a, b) & \cdots & D_{m+n} f^n(a, b) \end{pmatrix}. \]
As we remarked after the proof of the Inverse Function Theorem, the easiest way to check that $d_y f(a, b)$ is nonsingular is to check that the Jacobian, that is, the determinant of the above matrix, does not vanish.
The reader may find it instructive to go back and review the examples given before Theorem 7.5.7 in the light of that theorem and its corollary.
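Corollary 7.5.8 asserts the existence of $g$ but gives no recipe for computing it. As an illustration only, the sketch below solves $f(x, y) = x^2 + y^2 - 1 = 0$ for $y$ near $(a, b) = (0, 1)$, where $d_y f(a, b)(v) = 2v$ is nonsingular. It uses Newton's method in the variable $y$, which is a numerical device swapped in here and is not the argument used in the proof; the function and the starting point are assumptions chosen for the example.

```python
import math


def f(x, y):
    """Illustrative constraint function: the unit circle."""
    return x * x + y * y - 1.0


def d_y_f(x, y):
    """Partial derivative of f with respect to y; nonzero near (0, 1)."""
    return 2.0 * y


def g(x, y0=1.0, steps=50):
    """Approximate the implicit function by Newton's method started at y0 = b."""
    y = y0
    for _ in range(steps):
        y -= f(x, y) / d_y_f(x, y)
    return y


xs = [-0.3, -0.1, 0.0, 0.1, 0.3]
values = [g(x) for x in xs]
residuals = [abs(f(x, y)) for x, y in zip(xs, values)]
```

Here `g(0.0)` returns `b = 1`, the residuals are at rounding level, and the computed values agree with $\sqrt{1 - x^2}$, the branch through $(0, 1)$; this matches conditions (a'), (b'), and (d') on a small neighborhood of $a = 0$.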
Exercises

1. Define a function $f$ on $E^2$ by means of the equations
\[ f^1(x, y) = x^2 - y^2, \quad f^2(x, y) = 2xy. \]
Show that $f$ has a nonsingular differential at every point except the origin and thus at every point of $E^2 \setminus \{(0, 0)\}$ is locally a one-to-one function. Show that $f$ is not a one-to-one function. Is the restriction of $f$ to some neighborhood of $(0, 0)$ a one-to-one function? [Note: From the point of view of complex variables, if we set $z = x + iy$, then $f^1(x, y)$ is the real part of $z^2$ and $f^2(x, y)$ is its imaginary part.]

2. Let $f$ be that function on $E^2$ defined by
\[ f^1(x, y) = \begin{cases} x + x^2 \sin(1/x) & \text{if } x \neq 0, \\ 0 & \text{if } x = 0, \end{cases} \qquad f^2(x, y) = y. \]
Show that $df(0, 0)$ is nonsingular but that $f$ is not a one-to-one function on any neighborhood of the origin. Is $df(x, y)$ nonsingular for every $(x, y)$ in some neighborhood of the origin?

3. (a) Suppose that $f$ is a real-valued function defined on $E^2$ by $f(x, y) = x - y^2$. Does there exist a real-valued function $g$ defined in a neighborhood of $x = 0$ so that $f(x, g(x)) = 0$?
(b) Suppose that $f$ is the same function as in part (a). Show that there is a unique function $g$ defined on a suitable neighborhood of $x = 1$ so that $f(x, g(x)) = 0$ and $g(x) > 0$.

4. Suppose that $f$ is a real-valued function on $E^2$ defined by $f(x, y) = x^2 - y^2$. How many continuous functions $g$ do there exist, defined on a neighborhood of $x = 0$ so that $f(x, g(x)) = 0$? Are there more functions for which this is true if we remove the requirement of continuity on $g$?

5. Suppose $f$ has domain $E^2$ and is defined by the equations
\[ f^1(x, y) = e^x \cos y, \quad f^2(x, y) = e^x \sin y. \]
Show that $\mathfrak{R}(f) = E^2 \setminus \{0\}$. Is $f$ a one-to-one function? Is $f$ locally one-to-one? Note that in terms of the complex variable $z = x + iy$, $f^1(x, y)$ is the real part of $e^z$, $f^2(x, y)$ is its imaginary part.

6. If the open set $U$ of Corollary 7.5.8 is connected and $h$ is a continuous function with domain $U$ which satisfies (a') and (b'), show that $h = g$.

7. Suppose $f$ is a real-valued function with $\mathfrak{D}(f) \subset E^2$ and satisfies the hypotheses of Corollary 7.5.8. Compute the derivative of $g$ in terms of the partial derivatives of $f$.

8. Extend the results of Exercise 7 to higher dimensions, that is, where the domain and range of $f$ are in higher dimensions. In fact, show that
\[ dg(x) = -d_y f(x, g(x))^{-1} \circ d_x f(x, g(x)). \]
[Hint: Note that $\forall u \in E^m$ and $\forall v \in E^n$,
\[ d_x f(x, y)(u) = df(x, y)(u, 0), \quad d_y f(x, y)(v) = df(x, y)(0, v).] \]

9.
Suppose
f
E
C1
with domain in En and range in Em and
is nonsingular. Show that there is a ball B (a,
=JIB(a, p),
then
Va
E R so that
a= 1, 3m
lg(x)-g(y) I lx-yl"
Suppose
lim
lg( x) - g(y)I Ix - YI
f
E
C1
·
0,
>
�
m
·
Vu
Vx E J0(f ) , [Hint: Use the [J(x)-J(y)] when u = x - y.] E E",
u
=ft 0, and
=ft 0. Show thatfis a one-to-one function.
mean value theorem on
7 .6
df(a) g
so that if
and its domain is a convex set in E" and its
range is in En. Suppose further that
u df(x)(u)
J0(J)
> 0
lx-yl-O 10.
C
1 we have
Jim
lx-yl-O and if
a
0, then since dk is a continuous function, p so thatVe E B(a, cr ) andVk E (I, n), dk(c) > 0. Thus from (7.6.3), (7.6.4), and Theorem 6.6.14 it follows that Vx E B(a,cr), f(x) - f(a) > 0, and hence f(a) is a local minimum for J. k IfVk E (l,n), (-I) d(a) < 0, then arguing the same way as above and using Corollary 6.6.15, we find that f(a) is a local maximum for J. If Vk
3cr
�
To prove the last statement of the theorem we use the last statement in Corollary 6.6.15. This tells us that $\exists u, v \in E^n$, so that $T(a)(u) \cdot u > 0$, and $T(a)(v) \cdot v < 0$. Since the functions with values $T(x)(u) \cdot u$ and $T(x)(v) \cdot v$ are continuous, there is a ball $B(a, \eta) \subset \mathfrak{D}(f)$ so that $\forall c \in B(a, \eta)$, $T(c)(u) \cdot u > 0$ and $T(c)(v) \cdot v < 0$. Now $\forall \alpha \in R$, $\alpha \neq 0$, and $\forall c \in B(a, \eta)$,
\[ T(c)(\alpha u) \cdot \alpha u = |\alpha|^2\, T(c)(u) \cdot u > 0, \]
\[ T(c)(\alpha v) \cdot \alpha v = |\alpha|^2\, T(c)(v) \cdot v < 0. \]
For every $\epsilon > 0$, $\exists \alpha \in R$, $\alpha \neq 0$, so that $|\alpha u| < \epsilon$ and $|\alpha v| < \epsilon$. Suppose $0 < \epsilon < \eta$, and we set $y = \alpha u + a$ and $z = \alpha v + a$. Then $y, z \in B(a, \epsilon)$ and from (7.6.3) and (7.6.4),
7.6 MAXIMA AND MINIMA I 347
\[ f(y) - f(a) = T(c)(y - a) \cdot (y - a) = T(c)(\alpha u) \cdot \alpha u > 0, \]
\[ f(z) - f(a) = T(c')(z - a) \cdot (z - a) = T(c')(\alpha v) \cdot \alpha v < 0. \]
Thus the function with values $f(x) - f(a)$ changes sign in every neighborhood of $a$ so that $a$ is at a saddle point for $f$.
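The sign change in every neighborhood can be watched directly on a simple example. The sketch below assumes the illustrative function $f(x, y) = x^2 - y^2$ with critical point $a = (0, 0)$ and the directions $u = (1, 0)$, $v = (0, 1)$; these choices are made here for illustration and do not come from the text.

```python
def f(x, y):
    """Illustrative function with a critical point at the origin."""
    return x * x - y * y


a = (0.0, 0.0)
u = (1.0, 0.0)  # along u the increment f(a + alpha*u) - f(a) is positive
v = (0.0, 1.0)  # along v the increment f(a + alpha*v) - f(a) is negative

signs = []
for alpha in (1e-1, 1e-3, 1e-6):
    y_pt = (a[0] + alpha * u[0], a[1] + alpha * u[1])
    z_pt = (a[0] + alpha * v[0], a[1] + alpha * v[1])
    signs.append((f(*y_pt) - f(*a) > 0, f(*z_pt) - f(*a) < 0))
```

Every pair in `signs` is `(True, True)`: however small $\alpha$ is taken, the points $y$ and $z$ lie in $B(a, \epsilon)$ and give increments of opposite sign.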
LAGRANGE MULTIPLIERS
If $f$ is a real-valued function with $\mathfrak{D}(f) \subset E^n$, it very often happens that we are not interested in the local extrema of $f$ but rather in the local extrema of a new function $g$ that is $f$ restricted to a subset of $\mathfrak{D}(f)$. Usually the subset of $\mathfrak{D}(f)$ we are interested in is given by a set $\{x : h(x) = 0\} \cap \mathfrak{D}(f)$, where $h$ is a function with domain in $E^n$ and range in $E^m$, $m \le n$. This is a standard type of problem that arises, for example, in classical analytical mechanics. It is usually called an extremal problem for $f$ under the constraint $h(x) = 0$.
The method of Lagrange multipliers gives a necessary condition that a point $a$ should be at a local extremum of $f$ under the constraint $h(x) = 0$. Actually, it is based on Theorem 7.6.2, being an elaboration on that theme.
7.6.5 Theorem. Suppose $f$ and $h$ are of class $C^1$ with (open) domains in $E^n$. Suppose also that $f$ is real-valued, $\mathfrak{R}(h) \subset E^m$, $m \le n$, and $\forall x \in \mathfrak{D}(h)$, $dh(x)$ has rank $m$. A necessary condition that $a$ be at a local extremum for $f$ restricted to the set $\{x : h(x) = 0\} \cap \mathfrak{D}(f)$ is that $\exists \lambda \in E^m$, so that the function $F$ with domain $[\mathfrak{D}(f) \cap \mathfrak{D}(h)] \times E^m$ and defined by
\[ F(x, y) = f(x) + h(x) \cdot y \qquad (7.6.5) \]
has a critical point at $(a, \lambda)$; that is,
\[ dF(a, \lambda) = df(a) + \sum_{k=1}^{m} \lambda_k\, dh^k(a) = 0. \qquad (7.6.6) \]
Proof. We shall suppose that $m < n$, since otherwise, as we shall show later, the theorem is trivial. Since $h(a) = 0$, and rank $dh(a) = m$, according to the Implicit Function Theorem 7.5.7, there exists an open set $U \subset E^{n-m}$ containing the origin, and a function $g$ of class $C^1$ with domain $U$, so that $\forall t \in U$, rank $dg(t) = n - m$, $h \circ g(t) = 0$, and $g(0) = a$.
Since $g(0) \in \mathfrak{D}(f)$, $g$ is continuous, and $\mathfrak{D}(f)$ is open, there is a ball $B(0, \rho) \subset U$ so that $t \in B(0, \rho) \Rightarrow g(t) \in \mathfrak{D}(f)$. Consequently, since $a$ is at a local extremum for $f$ restricted to $\{x : h(x) = 0\} \cap \mathfrak{D}(f)$, it follows that $t = 0$ is at a local extremum for the function $f \circ g$, and is an interior point of the domain of this function. We may apply Theorem 7.6.2 to $f \circ g$, and also use the fact that $h \circ g$ is the zero function, to
348 I HIGHER-DIMENSIONAL DIFFERENTIATION
get the following two equations:
dh
0
g(0) = dh (a)
df
0
g(O)= df(a)
0
dg ( 0)
=
O,
(7.6.7)
dg(O)= 0.
0
Let N be the null space of dh(a) and N1- its orthogonal complement in En. Now, �(dh(a))= �(dh(a) IN1-), and dh(a) IN1- is a one-to-one func tion. Hence, since rank dh(a)= m, we must have dim N1-= m, and since dim N + dim N 1n, we must have dim N=n - m. Since rank dg(0) =n - m, it follows from the first equality of (7.6.7) that N �(dg(O)). Since df(a) is a linear functional on E", it follows from Theorem 6.5.7 that 3b EE" so that Vu EEn, =
=
df(a)(u) From the second equality of
df(a)
0
=
u · b.
(7.6.8)
(7.6.7) we get Vu EEn,
dg(O)(u)
=
dg(O)(u)
·
b= 0.
Thus b E N1-. Now, from Theorem 6.5.9 we know that �(dh(a)1) =N1-. Thus 3.A EE"', so that
b= -dh(a)1(A). If we use this in
(7.6.8) we get Vu EE", df(a)(u)= -dh(a)(u) ·A.
Now, $\forall u \in E^n$ and $\forall v \in E^m$,
$$dF(a,\lambda)(u,v) = df(a)(u) + dh(a)(u) \cdot \lambda + h(a) \cdot dy(v), \tag{7.6.10}$$
where, of course, by $dy$ we mean the differential of that function defined on $E^n \times E^m$ whose value at $(x,y)$ is $y$. Since $h(a) = 0$, it follows from (7.6.9) and (7.6.10) that
$$dF(a,\lambda) = df(a) + \sum_{k=1}^{m} \lambda_k\, dh^k(a) = 0.$$
If $n = m$, then we cannot use the preceding technique since $g$ does not exist. However, in this case $dh(a)$ has an inverse and thus $dh(a)^t$ has an inverse. So again $\exists \lambda \in E^n$ so that $b = -dh(a)^t(\lambda)$. We can then proceed exactly as before. However, this situation is really trivial, since $h$ is one to one in a neighborhood of $a$ and thus $a$ is the only point in the neighborhood where $h(x) = 0$. Consequently, $a$ is an isolated point of $\{x : h(x) = 0\} \cap \mathcal{D}(f)$. Of course, $f$ restricted to this set still has a relative maximum and minimum at $a$. The proof is concluded.
If we write (7.6.6) in terms of partial derivatives we get $n$ equations:
$$D_k f(a) + \sum_{j=1}^{m} \lambda_j D_k h^j(a) = 0,\qquad \forall k \in (1,n). \tag{7.6.11}$$
From the fact that $h(a) = 0$ we get $m$ more equations
$$h^j(a) = 0,\qquad \forall j \in (1,m). \tag{7.6.12}$$
If in the set of equations (7.6.11) and (7.6.12) we replace $(a,\lambda)$ by $(x,y)$, then these equations can be viewed as a system of $m+n$ equations in $m+n$ unknowns, $x^1,\ldots,x^n$ and $y_1,\ldots,y_m$. The points that are at the relative extrema of $f$ under the constraint $h(x) = 0$ must be among the solutions of this system of $m+n$ equations. The auxiliary solutions $\lambda_1,\ldots,\lambda_m$ are called Lagrange multipliers.
Unless the functions $f$ and $h$ are relatively simple, the method of Lagrange multipliers is difficult to apply. However, we shall now give an example which shows that it can lead to nice results. We shall obtain the so-called geometric-arithmetic means inequality. Other examples of its uses are given in the exercises at the end of the chapter. We shall prove the following statement: If $\forall k \in (1,n)$, $a^k \ge 0$, then
$$\Big(\prod_{k=1}^{n} a^k\Big)^{1/n} \le \frac{1}{n}\sum_{k=1}^{n} a^k. \tag{7.6.13}$$
To prove this, we shall find the maximum of the function
$$f(x) = \Big(\prod_{j=1}^{n} x^j\Big)^2,\qquad x^j > 0,\quad \forall j \in (1,n),$$
under the constraint
$$h(x) = \sum_{j=1}^{n} (x^j)^2 - 1 = 0.$$
By use of the method of Lagrange multipliers, the local extrema are contained among the solutions to the $n+1$ equations
$$\frac{2f(x)}{x^k} + 2\lambda x^k = 0,\quad k \in (1,n),\qquad \sum_{j=1}^{n} (x^j)^2 = 1. \tag{7.6.14}$$
Suppose $(b,\lambda)$ is a solution of the above system. If we multiply the $k$th equation in (7.6.14) by $b^k$ and divide by 2 we get
$$f(b) + \lambda (b^k)^2 = 0. \tag{7.6.15}$$
If we sum up over $k$ and use the last equation of (7.6.14) we get
$$n f(b) + \lambda = 0.$$
Putting this value of $\lambda$ into (7.6.15) we get, $\forall k \in (1,n)$,
$$(b^k)^2 = 1/n, \tag{7.6.16}$$
and thus
$$\lambda = -n \cdot n^{-n}. \tag{7.6.16'}$$
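The computation above can be checked numerically. The following is only a minimal sketch, not part of the text's argument: it assumes the stationarity equations take the form $D_k f + \lambda D_k h = 0$ with $f(x) = (\prod x^j)^2$ and $h(x) = \sum (x^j)^2 - 1$, and the function names `lagrange_residuals` and `candidate` are hypothetical helpers introduced for illustration.

```python
import math

def lagrange_residuals(b, lam):
    """Residuals of D_k f + lam * D_k h = 0 (k = 1..n) and of h = 0,
    for f(x) = (prod x_j)^2 and h(x) = sum x_j^2 - 1 (assumed forms)."""
    f = math.prod(b) ** 2
    # D_k f = 2 f / x_k  and  D_k h = 2 x_k
    stationarity = [2.0 * f / bk + 2.0 * lam * bk for bk in b]
    constraint = sum(bk * bk for bk in b) - 1.0
    return stationarity, constraint

def candidate(n):
    """The point with (b_k)^2 = 1/n and the multiplier lam = -n * n^(-n)."""
    b = [1.0 / math.sqrt(n)] * n
    lam = -n * float(n) ** (-n)
    return b, lam

for n in (2, 3, 5):
    b, lam = candidate(n)
    stationarity, constraint = lagrange_residuals(b, lam)
    # Both numbers printed after n should be at rounding-error level.
    print(n, max(abs(r) for r in stationarity), abs(constraint))
```

The residuals vanish up to rounding, which is consistent with (7.6.16) and (7.6.16') solving the system; the check says nothing about uniqueness, which the text argues separately.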
It is not difficult to check that the numbers given by (7.6.16) and (7.6.16') constitute a solution of the system (7.6.14) and thus this system has a unique solution in the domain where $f$ is defined. To see whether this solution leads to an extremum for $f$, let us extend $f$, in the obvious way, to a continuous function $F$ defined on the set $D = \{x : x \in E^n\ \&\ \forall j \in (1,n),\ x^j \ge 0\}$. If $S$ is the unit sphere in $E^n$, then since $F$ is continuous, $F|(D \cap S)$ must take on a maximum and minimum. Clearly, the minimum is taken on when $\exists j \in (1,n)$ so that $x^j = 0$, and the minimum is 0. Thus the maximum of $F|(D \cap S)$ is taken on when $\forall j \in (1,n)$, $x^j > 0$, and by Theorem 7.6.5 the point where the maximum is taken on must satisfy the system (7.6.14). If we specify that $\forall j \in (1,n)$, $x^j > 0$, this system has a unique solution. Hence it follows that the maximum is taken on at the point whose components are given by (7.6.16).
Let $\{a^k : k \in (1,n)\} \subset R_+$, $y^k = (a^k)^{1/2}$, $y = (y^1,\ldots,y^n)$, $x^k = y^k/|y|$, and $x = (x^1,\ldots,x^n)$. Then $|x| = 1$ and by what we have proved previously we have
$$\prod_{k=1}^{n} (x^k)^2 \le n^{-n}.$$
But
$$\prod_{k=1}^{n} (x^k)^2 = \frac{\prod_{k=1}^{n} a^k}{|y|^{2n}} = \frac{\prod_{k=1}^{n} a^k}{\big(\sum_{k=1}^{n} a^k\big)^n},$$
so that
$$\Big(\prod_{k=1}^{n} a^k\Big)^{1/n} \le \frac{1}{n}\sum_{k=1}^{n} a^k.$$
The last inequality is precisely (7.6.13). In case $\exists j \in (1,n)$ so that $a^j = 0$, the inequality (7.6.13) is obviously true. □

Exercises

1. Suppose $A$ is a compact set in $E^n$ with a nonvoid interior $A^0$, and $f$ is a real, continuous function with domain $A$ which has a differential at every point of $A^0$. If $\forall x \in \beta A = A \setminus A^0$, $f(x) = 0$, show $\exists a \in A^0$ so that $df(a) = 0$. This generalizes Rolle's theorem.

2. Let $f$ be a real-valued function defined on $E^2$ by the equation
$$f(x,y) = ax^2 + bxy + cy^2.$$
If $a \ne 0$ and $b^2 - 4ac = 0$, show that $f$ has either a relative maximum or a relative minimum at $(x,y) = (0,0)$.

3. Let $f$ be that real-valued function defined on $E^2$ by the equation
f(x,y) = 1 x2 + xy + 2 y3. 1
Find all the relative maxima and minima for $f$ restricted to the triangle and its interior which has vertices at the points $(-1, 2)$, $(-2, 4)$, and $(-1, 6)$. What is the maximum and minimum of $f$ restricted to this triangle and its interior?

4. Let $f$ be that real-valued function defined by the equation
$$f(x,y) = \frac{1}{x^2} + \frac{1}{y^2} + 2xy,\qquad x \ne 0,\ y \ne 0.$$
Find all the critical points of the function and decide whether they are at a relative minimum, a relative maximum, or at a saddle point.

5. Let $P$ be that plane in $E^3$ whose equation is
$$3x + y - 2z = 5.$$
Find that point on $P$ whose distance from the point $(12, 1, 5)$ is a minimum.

6. Find the shortest distance from the point $(3, 3, -1)$ to the surface in $E^3$ whose equation is $xy - z = 0$.

7. Let $f$ be a real-valued function with domain $E^2$ given by the equation
$$f(x,y) = ax^2 + bxy^2 + cy^4.$$
Show that if $b^2 - 4ac > 0$, then $f$ does not have a relative extremum at 0. However, if $\alpha^2 + \beta^2 \ne 0$, and $t \in R$, the function defined by $g_{\alpha\beta}(t) = f(\alpha t, \beta t)$ has a relative minimum at $t = 0$ if $a > 0$ and a relative maximum at $t = 0$ if $a < 0$.
8. Suppose $f$ is that real-valued function with $\mathcal{D}(f) = \{x : x \in E^n\ \&\ \forall j \in (1,n),\ x^j > 0\}$, and defined by
$$f(x) = \frac{1}{n}\sum_{j=1}^{n} x^j.$$
Use the method of Lagrange multipliers to find the minimum of this function under the constraint
$$h(x) = \prod_{j=1}^{n} x^j - 1 = 0.$$
Deduce the geometric-arithmetic means inequality (7.6.13).

9. (a) For fixed positive $p$ and $q$, let $f$ be that function defined on the open first quadrant of $E^2$ by the equation
$$f(x,y) = \frac{x^p}{p} + \frac{y^q}{q}.$$
Show that the minimum of $f$ under the constraint $h(x,y) = xy - 1 = 0$ is $(1/p) + (1/q)$.
(b) Use the result of part (a) to show that if $a \ge 0$ and $b \ge 0$, $p > 1$ and $q > 1$, and $(1/p) + (1/q) = 1$, then
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$
This is a generalization of the result that $2ab \le a^2 + b^2$.

10. Suppose $f$ and $g$ are nonnegative continuous functions on $[a,b]$ and $h$ is nondecreasing on the same interval. Use Exercise 9(b) to show that if $p > 1$ and $q > 1$, and $(1/p) + (1/q) = 1$, then
$$\int_a^b f(x)g(x)\,dh(x) \le \Big[\int_a^b f(x)^p\,dh(x)\Big]^{1/p}\Big[\int_a^b g(x)^q\,dh(x)\Big]^{1/q}.$$
This is known as Hölder's inequality. If $p = 1$ and $q = \infty$, define
$$\Big[\int_a^b g(x)^q\,dh(x)\Big]^{1/q} = \sup g.$$
Show that Hölder's inequality is true in this case also.

11. Use the results of Exercise 10 to show that if $p \ge 1$ and $(1/p) + (1/q) = 1$, and $\forall k \in (1,n)$, $a_k \ge 0$ and $b_k \ge 0$, then
$$\sum_{k=1}^{n} a_k b_k \le \Big[\sum_{k=1}^{n} a_k^{\,p}\Big]^{1/p}\Big[\sum_{k=1}^{n} b_k^{\,q}\Big]^{1/q}.$$

12.
for 1 � p
A,�
IRf(A,{xd)-Rf(A',{xD)I
n2 so that
(8.1.2)
Since $n' > n_2$, it follows that $\forall n > n_2$ we get
$$|S(n) - S(n')| < \varepsilon/2. \tag{8.1.3}$$
From the inequalities (8.1.2) and (8.1.3) we get that
$$n > n_2 \implies 0 \le \bar\varphi(n_2) - S(n) < \varepsilon. \tag{8.1.4}$$
From condition (c) $\exists N \in \mathcal{N}$ so that $N > n_1$ and $N > n_2$. Hence if $n > N$ we get from (8.1.1) and (8.1.4) and the monotone character of $\bar\varphi$ that
$$0 \le \bar\varphi(n) - s < \varepsilon,\qquad 0 \le \bar\varphi(n) - S(n) < \varepsilon.$$
From these two inequalities it is immediately clear that $n > N \implies$
$|S(n) - s| < \varepsilon$.
Let us see how we can apply this concept to the various definitions of limit that we have given. In case is taken as the relation $\ge$. Suppose $f$ is a real-valued function with $\mathcal{D}(f) \subset E^n$ and $a$ is an accumulation point of $\mathcal{D}(f)$. If $x, y \in \mathcal{D}(f) \setminus \{a\}$ set $x \succ y \iff |x - a| \le |y - a|$. It is easily checked that $\mathcal{D}(f) \setminus \{a\}$ is a directed set under this relation. The function $f$ restricted to the directed set $\mathcal{D}(f) \setminus \{a\}$ is a net and the function $f$ has the limit $l$ at $a$ $\iff$ the net $f$ has the limit $l$ at $a$.
In the case of the function $R_f$ we take $\mathcal{N}$ as the domain of $R_f$ and define $(\Delta^*, \{x_k^*\}) \succ (\Delta, \{x_k\}) \iff \Delta^*$ is a refinement of $\Delta$. Note that with this definition our discussion of Cauchy nets provides a general proof for Theorems 5.1.7 and 8.1.6.
For the functions $\bar{D}_f$ and $\underline{D}_f$ we can take $\mathcal{N}$ as the set of all decompositions of the given interval $I$ and take $\Delta^* \succ \Delta \iff \Delta^*$ is a refinement of $\Delta$. Then $\bar{D}_f$ becomes a monotone nonincreasing net and $\underline{D}_f$ a monotone nondecreasing net. The numbers $\bar{D}(f)$ and $\underline{D}(f)$ are the limits of these nets, respectively.
Of course, everything we have said for real nets will work as well for vector-valued nets with ranges in $E^n$, $n \ge 1$.
□

Exercises

1. Suppose that $f$ and $g$ are real-valued bounded functions each having as domain the closed interval $I \subset E^n$. Show that
$$\overline{\int_I} [f(x) + g(x)]\,dx \le \overline{\int_I} f(x)\,dx + \overline{\int_I} g(x)\,dx,$$
$$\underline{\int_I} f(x)\,dx + \underline{\int_I} g(x)\,dx \le \underline{\int_I} [f(x) + g(x)]\,dx.$$

2. Suppose $f$ and $g$ are bounded real-valued functions having as common domain the closed interval $I \subset E^n$. If $f \le g$ show that
$$\overline{\int_I} f(x)\,dx \le \overline{\int_I} g(x)\,dx,\qquad \underline{\int_I} f(x)\,dx \le \underline{\int_I} g(x)\,dx.$$

3. Suppose $f$ is a bounded integrable function with domain the closed interval $I \subset E^n$ and $J$ is a closed subinterval of $I$. Is it always true that

4. Suppose that $I$ and $J$ are closed intervals in $E^n$ so that $I \cup J$ is an interval and $I \cap J$ is at most an $(n-1)$-dimensional interval. If $f$ is a real bounded function with domain $I \cup J$, show that
T 1x(g(A))
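0 so that $\forall x, y \in \mathcal{D}(f)$ with $|x - y| < \delta$, we have $|f(x) - f(y)| < \varepsilon$.
Proof. Suppose $x \in \mathcal{D}(f)$. By the definition of $\overline{\operatorname{Lim}}$ and $\underline{\operatorname{Lim}}$, $\exists B(x, \delta(x))$ so that $\forall y \in B(x, \delta(x)) \cap \mathcal{D}(f)$ we have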
Lim J(t) - E/4 �x
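0 the compact set $\Omega(f, \varepsilon) = \{x : \omega(f, x) \ge \varepsilon\}$ has zero Jordan content.
Proof. Suppose $f$ has a Riemann-Darboux integral. Then $\forall \varepsilon > 0$ and $\forall \eta > 0$, there exists a decomposition $\Delta$ of $I$ so that
$$0 \le \bar{D}_f(\Delta) - \underline{D}_f(\Delta) \le \sum_{J \in \Delta} [M(J) - m(J)]\,|J| < \varepsilon\eta.$$
Let $\Delta' = \{J : J \in \Delta\ \&\ J \cap \Omega(f, \varepsilon) \ne \emptyset\}$; if $J \in \Delta'$, it follows that $M(J) - m(J) \ge \varepsilon$. Also, since $\Omega(f, \varepsilon) \subset \cup\{J : J \in \Delta'\}$, it follows from Theorem 8.2.4 that
$$\bar\chi(\Omega(f, \varepsilon)) \le \bar\chi(\cup\{J : J \in \Delta'\}) \le \sum_{J \in \Delta'} |J|.$$
Consequently,
$$\varepsilon\,\bar\chi(\Omega(f, \varepsilon)) \le \sum_{J \in \Delta'} [M(J) - m(J)]\,|J| < \varepsilon\eta.$$
Hence $\forall \eta > 0$ we have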
x(f!(f, e) )
0 , !f!(f, e)! there is a decomposition a of I so that
0 so that $\forall x, y \in K$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < 2\varepsilon$. Let $\Delta_1$ be a refinement of $\Delta$ so that $|\Delta_1| < \delta$. Let $\Delta_1^*$ be the set of all $J_1$ in $\Delta_1$ so that $\exists J \in \Delta^*$ with $J_1 \subset J$. If $L \in \Delta_1^*$, then clearly
$$M(L) - m(L) \le 2\varepsilon.$$
Consequently, if $M$ is an upper bound for $|f|$, we have
$$0 \le \bar{D}_f(\Delta_1) - \underline{D}_f(\Delta_1) = \sum_{J \in \Delta_1^*} [M(J) - m(J)]\,|J| + \sum_{J \in \Delta_1 \setminus \Delta_1^*} [M(J) - m(J)]\,|J|$$
$\le$
2e II I+2MDxn 0, 3p > 0 so that Vx E B(a, p) n I will be needed later on. Since
set/, it is enough to show that
we have Lim t-a
f(t) - T}/2
0 so that
J\!l(f, e)
Vx
E
B(a, p)
n
I,
is relatively open in/.
Proof of Theorem 8.2.5. Suppose $A$ is a bounded set in $E^n$. Then the characteristic function $\chi_A$ of $A$ is continuous except at the points of $\beta A$. For every $x \in \beta A$, $\omega(\chi_A, x) = 1$ and thus $\forall \varepsilon$ so that $0 < \varepsilon \le 1$, $\Omega(\chi_A, \varepsilon) = \beta A$. Embed $A$ into an interval $I$ and the last theorem tells us that
$$\underline{\int_I} \chi_A(x)\,dx = \overline{\int_I} \chi_A(x)\,dx$$
if and only if
$$|\Omega(\chi_A, \varepsilon)| = |\beta A| = 0.$$
An immediate corollary of Theorems 8.3.2 and 8.2.5 is the following.

8.3.3 Corollary. Suppose $A$ is a Jordan-measurable set and $f$ is a bounded continuous real-valued function with $\mathcal{D}(f) = A$. Then $f$ has a Riemann-Darboux integral.

Proof. Embed $A$ into $I$; then $f_A$ is continuous except possibly on $\beta A$. Thus $\forall \varepsilon > 0$, $\Omega(f_A, \varepsilon) \subset \beta A$, so that applying Theorem 8.2.4 and Theorem 8.2.5 we get $|\Omega(f_A, \varepsilon)| = 0$. The proof is completed by an application of Theorem 8.3.2.
To put the result of Theorem 8.3.2 into a more usable form, it is necessary to introduce the concept of an outer Lebesgue measure. The outer Lebesgue measure can be defined in a manner analogous to formula (8.2.6). The basic difference is that in defining outer Lebesgue measure we allow a countable number of intervals in the covering, rather than only a finite number, as in the case of outer Jordan content. Although this does not seem to be much of a difference, it actually turns out to be quite profound, and ultimately leads to a theory of integration that is much more flexible and useful than the theory of Riemann-Darboux integration.
8.3.4 Definition. If $A \subset E^n$, the outer Lebesgue measure of $A$ is defined as
$$\bar\lambda(A) = \operatorname{g.l.b.}\Big\{\sum |I| :$$
0 and
Vk
E
N0,
so that
L III .;;;1(4>(k)) +E/2k. Now,
'U
U {'Uk: k E N0} is an open covering for Band hence
=
1(B).;;;
� (� III ) .;;; �1((k)) +E
•
•
Since this is true,
VE>
0 we have proved (a). Part (b) is an immediate
consequence of the fact that every covering for $B$ is a covering for $A$.
The last proposition says, in particular, that the union of a countable number of sets of zero Lebesgue measure is again a set of zero Lebesgue measure. The reader should not come to the conclusion that sets of zero Lebesgue measure consist only of a countable number of points. Indeed, Cantor's set has zero Lebesgue measure and we have asked the reader to verify this in Exercise 10 at the end of this section. We now give a connection between outer Lebesgue measure and outer Jordan content.
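The covering idea behind the remark on countable sets can be made concrete with a small sketch. It is only an illustration under stated assumptions: the setting is one-dimensional, the $k$th point of a countable set is covered by an interval of length $\varepsilon/2^{k+1}$ (indexing from $k = 0$), and the function names `covering_lengths` and `total_length` are hypothetical.

```python
from fractions import Fraction

def covering_lengths(eps, count):
    """Lengths eps / 2^(k+1), k = 0..count-1, of intervals centred at the
    first `count` points of a countable set (assumed one-dimensional)."""
    return [Fraction(eps) / 2 ** (k + 1) for k in range(count)]

def total_length(eps, count):
    """Exact total length of the first `count` covering intervals."""
    return sum(covering_lengths(eps, count))

eps = Fraction(1, 100)
for count in (1, 10, 50):
    # The partial sums equal eps * (1 - 2^(-count)) and never reach eps.
    print(count, total_length(eps, count), total_length(eps, count) < eps)
```

Since every partial sum stays below `eps`, the whole countable covering has total length at most `eps`, whatever positive `eps` is chosen; exact rational arithmetic is used so that the comparison is not blurred by rounding.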
8.3.6 Proposition. If $A$ is a bounded set in $E^n$, then
$$\bar\lambda(A) \le \bar\chi(A), \tag{8.3.6}$$
and equality holds if $A$ is compact.
Proof.
E> {Ik: k
For every
of closed intervals
0, there is a covering of A by a finite number E
(1, m)}
so that
k=I Clearly,
Vk
E
( 1, m) _
there exists an open interval J k so that Ik C J k and m
>...;;; :L lhl k=I
0, 3 8 with 0 < 8 < 8' so that Vx, y E K with Ix - YI < 8 we have -71 < lfn(Y)l- IJg(x)I
0 there
is any n-dimensional interval in K with that
m
L lhl :s; III. k=I
and Let
a k be
the center of
I k.
Then using
(8.5. 7)
it follows is a finite
we get
Jg(I)J 0, 38> 0 so that x,y EBand Jx -yJ < 8 =>
Proof.
and AC
Jg(x)-g(y)-dg(x)(x-y)J
:s;
E
Jx-yJ.
{h: k E (l,m)} be a covering for Aby cubes so E (1,m)} C B, the center of h is in A, d(Ik) < 8 and
Let
(8.5.9) that
U {I k: k
m
L II kl :s; 2nx(B). k=I The factor
2"
I k in A. (8.5.9) we get
is needed to make sure we can get the center of
Let] be any one of these cubes and a its center. Then from
Jg(x) -g(a)-dg(a)(x-a) I :s; elVn, where 2l is the side length of]. Since
j11(a)
=
(8.5.9')
0, the rank of dg(a) is r 1/4}. 4.
{(x,y) : (x - 1) 2 l}, {(x,y): (x-1/2)2+y2 > 1/4},
to the four regions
k=l
8 .5
THE TRANSFORMATION THEOREM FOR INTEGRALS I 395
is a one-to-one function, and the Hessian off
Hr(x) = det[D; Dkf(x)] 0
¥-
0,
Vx
E
JE>(f).
Let A be a bounded Jordan-measurable set with A C JE>(f). Show that
IL H�x) I = l(Vf)-1 (A)I. 5.
Show that
$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$
Do this as follows: First note that

Change to polar coordinates and use the transformation theorem on the integral on the right. Do all this carefully, justifying each step.

6. Compute the volume of the unit ball $B(0,1)$ in $E^n$ by changing to spherical coordinates (see Section 6.5 and Exercise 8 of Section 8.4). [Hint: Use induction to show that the Jacobian of the spherical coordinate transformation in $E^n$ is
$$\rho^{n-1}(\sin\theta_1)^{n-2}(\sin\theta_2)^{n-3}\cdots(\sin\theta_{n-2}).]$$

7. Suppose $g$ is a function of class $C^1$ with an open domain in $E^2$ so that $\forall x \in \mathcal{D}(g)$, $J_g(x) \ne 0$. Give an example which shows that the transformation theorem may not necessarily be valid for this type of $g$.

8. Suppose $h$ is a linear transformation with domain $E^n$ and range in $E^n$. If $A \subset E^n$ is Jordan measurable show that $h(A)$ is Jordan measurable. [Hint: If $h$ is singular, then dim t1a. and Vtk
9.2
= where
l7Jkl
ak+1 [.
w
=
df 'Y
Let
be a decomposition of the domain of
so
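Using Theorems 5.2.1 (d) and 5.2.2 we get
$$\int_\gamma \omega = \int_\gamma df = \sum_{k=1}^{m} \int_{a_k}^{a_{k+1}} \frac{d(f \circ \gamma)}{dt}(t)\,dt = \sum_{k=1}^{m} [f \circ \gamma(a_{k+1}) - f \circ \gamma(a_k)].$$
Since $\gamma$ is closed, $\gamma(a_1) = \gamma(a_{m+1})$, and it follows that the integral of $\omega$ over $\gamma$ is zero.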
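To prove the converse we may assume, without loss of generality, that $\mathcal{D}(\omega)$ is arcwise connected. Otherwise we can work with each open component of $\mathcal{D}(\omega)$. Fix a point $x_0 \in \mathcal{D}(\omega)$ and $\forall x \in \mathcal{D}(\omega)$ let $\gamma_x$ be a piecewise smooth oriented curve in $\mathcal{D}(\omega)$ with $x_0$ the initial point and $x$ the final point of $\gamma_x$. Let us set
$$F(\gamma_x) = \int_{\gamma_x} \omega.$$
This defines a function of $\gamma_x$. We claim that for fixed $x_0$ and $x$ it is independent of the choice of $\gamma_x$. Indeed, suppose $\alpha_x$ is another piecewise smooth oriented curve which proceeds from $x_0$ to $x$. If $\gamma_x$ has domain $[a,b]$ suppose $\alpha_x$ is parameterized so that its domain is $[b,c]$. Define
$$\beta(t) = \begin{cases} \gamma_x(t), & \forall t \in [a,b], \\ \alpha_x(b + c - t), & \forall t \in [b,c]. \end{cases}$$
Then $\beta$ defines a piecewise smooth closed oriented curve in $\mathcal{D}(\omega)$. Hence
$$\int_{\gamma_x} \omega - \int_{\alpha_x} \omega = \int_\beta \omega = 0.$$
Thus for fixed $x_0$ we may set
$$f(x) = F(\gamma_x),$$
and this defines a real-valued function on $\mathcal{D}(\omega)$. We shall show that $\omega = df$. Since $\mathcal{D}(\omega)$ is open $\exists \delta > 0$ so that $B(x,\delta) \subset \mathcal{D}(\omega)$. Now, let $u \in E^n$ and $h \in R$ so that $h \ne 0$ and $x + hu \in B(x,\delta)$. Let us set
$$\gamma_{x+hu}(t) = \begin{cases} \gamma_x(t), & \forall t \in [a,b], \\ x + h(t - b)u, & \forall t \in [b, b+1]. \end{cases}$$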
Hence we get
$$\frac{f(x + hu) - f(x)}{h} = \frac{1}{h}\Big\{\int_{\gamma_{x+hu}} \omega - \int_{\gamma_x} \omega\Big\} = \frac{1}{h}\int_b^{b+1} \sum_{k=1}^{n} \omega_k \circ \gamma_{x+hu}(t)\,\frac{d\gamma^k_{x+hu}}{dt}(t)\,dt = \int_b^{b+1} \sum_{k=1}^{n} \omega_k(x + h(t - b)u)\,u^k\,dt.$$
As $h \to 0$ we get
$$D_u f(x) = \sum_{k=1}^{n} \omega_k(x)u^k = \omega(x)(u).$$
Since $\forall u \in E^n$ the right side is a continuous function of $x$, it follows that $df(x)$ exists, and hence $\omega(x) = df(x)$.
In case $\omega$ is of class $C^1$ it is possible to give conditions on the partials of $\omega_k$ so that $\omega$ is "locally exact." For the moment we shall restrict ourselves to the case where $\mathcal{D}(\omega)$ is an open set in $E^2$. Later on we shall consider the case where $\mathcal{D}(\omega) \subset E^n$. Let $B$ be an open ball in $\mathcal{D}(\omega)$ and $J$ an interval in $B$. From Stokes' theorem we get
$$\iint_J [D_1\omega_2 - D_2\omega_1]\,dx\,dy = \int_{\partial J} \omega.$$
Now, if $\forall (x,y) \in B$,
$$D_1\omega_2(x,y) = D_2\omega_1(x,y),$$
then we get that the line integral of $\omega$ along every oriented rectangle in $B$ is zero. Now, if $(a,b)$ is the center of $B$ and $(x,y) \in B$, set
$$f_1(x,y) = \int_a^x \omega_1(t,b)\,dt + \int_b^y \omega_2(x,t)\,dt,$$
$$f_2(x,y) = \int_b^y \omega_2(a,t)\,dt + \int_a^x \omega_1(t,y)\,dt.$$
The number $f_1(x,y)$ is the line integral of $\omega$ along the curve consisting of the horizontal straight line proceeding from $(a,b)$ to $(x,b)$ and then the vertical straight line from $(x,b)$ to $(x,y)$. The number $f_2(x,y)$ is the line integral along the curve consisting of the vertical line proceeding from $(a,b)$ to $(a,y)$ and then the horizontal straight line proceeding from $(a,y)$ to $(x,y)$. Since the line integral of $\omega$ around every oriented rectangle in $B$ is zero, it follows that $\forall (x,y) \in B$, $f_1(x,y) = f_2(x,y)$. Now, a simple calculation shows that
$$D_2 f_1(x,y) = \omega_2(x,y),\qquad D_1 f_2(x,y) = \omega_1(x,y).$$
Since $\omega_1$ and $\omega_2$ are continuous, if we set $f = f_1 = f_2$, then from Theorem 7.2.5 it follows that $df(x,y)$ exists and of course $\omega(x,y) = df(x,y)$. Thus $\omega|B$ is exact.
The discussion of the last paragraph prompts us to make the following definition.
9.4.3 Definition. A differential form $\omega$ with domain an open set in $E^n$ is said to be closed $\iff$ $\omega$ is of class $C^1$ and $\forall j, k \in (1,n)$ and $\forall x \in \mathcal{D}(\omega)$,
$$D_j\omega_k(x) = D_k\omega_j(x).$$
A little later on we shall present a definition of a closed differential form in a much more compact and more easily remembered notation. For now, let us remark that we have proved above that every closed differential form with domain in $E^2$ is "locally exact" in the sense that the restriction of the closed form to any ball in its domain is exact. However, it is not necessarily true that a closed form is "globally exact." For example, the form
$$\omega(x,y) = \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy$$
is defined on $E^2 \setminus \{0\}$ and is closed. However, $\omega$ is not an exact form. Indeed, if $J$ is any interval in $E^2$ containing the origin in its interior, then (see Exercise 7 of Section 9.3)
$$\int_{\partial J} \omega \ne 0.$$
It follows from Theorem 9.4.2 that $\omega$ is not exact, even though it is locally exact.
From the previous example it would seem that for a closed form to be exact, there would need to be additional conditions on its domain. This is actually the case, and for the purpose of obtaining these additional conditions we introduce the following definition. To make the notation easier we shall suppose, for the remainder of this section, that we shall only work with representatives from a given curve that have domain $[0,1]$.
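The failure of global exactness for the form $-y/(x^2+y^2)\,dx + x/(x^2+y^2)\,dy$ can be observed numerically. The sketch below is illustrative only: it approximates the line integral by a midpoint rule on the counterclockwise boundary of a rectangle, and the helper names `omega`, `rectangle`, and `line_integral` as well as the step count are assumptions made for this example.

```python
import math

def omega(x, y):
    """Coefficients (w1, w2) of -y/(x^2+y^2) dx + x/(x^2+y^2) dy."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def rectangle(x_lo, x_hi, y_lo, y_hi):
    """Counterclockwise boundary of [x_lo,x_hi] x [y_lo,y_hi], as a map on [0,1]."""
    corners = [(x_lo, y_lo), (x_hi, y_lo), (x_hi, y_hi), (x_lo, y_hi)]
    def path(t):
        s = 4.0 * (t % 1.0)      # t = 1 is sent back to the initial corner
        side = min(int(s), 3)    # which of the four sides we are on
        u = s - side             # position along that side, in [0, 1)
        p, q = corners[side], corners[(side + 1) % 4]
        return (p[0] + u * (q[0] - p[0]), p[1] + u * (q[1] - p[1]))
    return path

def line_integral(form, path, steps=4000):
    """Midpoint-rule approximation of the integral of `form` along `path`."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, steps + 1):
        x1, y1 = path(i / steps)
        w1, w2 = form(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += w1 * (x1 - x0) + w2 * (y1 - y0)
        x0, y0 = x1, y1
    return total

around_origin = line_integral(omega, rectangle(-1.0, 1.0, -1.0, 1.0))
away_from_origin = line_integral(omega, rectangle(2.0, 4.0, -1.0, 1.0))
print(around_origin, 2.0 * math.pi)  # these two values should nearly agree
print(away_from_origin)              # this value should be close to 0
```

The first rectangle contains the origin in its interior and the integral comes out near $2\pi$, so the form cannot be exact on $E^2 \setminus \{0\}$; the second rectangle lies in a ball avoiding the origin, where the form is exact and the integral is near zero.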
9.4.4 Definition. Two closed, piecewise smooth, oriented curves $\gamma_0$ and $\gamma_1$ in a set $E \subset E^n$ are said to be homotopic in $E$ $\iff$ there exists a continuous function $\Gamma$ with domain $[0,1] \times [0,1]$ that is piecewise smooth in each variable, has range in $E$, $\forall \tau \in [0,1]$, $\Gamma(\tau, 0) = \Gamma(\tau, 1)$, and $\forall t \in [0,1]$, $\Gamma(0,t) = \gamma_0(t)$ and $\Gamma(1,t) = \gamma_1(t)$. A piecewise smooth oriented curve $\gamma$ in $E$ is said to be homotopic to zero in $E$ $\iff$ $\gamma$ is homotopic in $E$ to a constant curve; that is, a curve $\gamma_0$ so that $\forall t \in [0,1]$, $\gamma_0(t) = \gamma_0(0)$.
It is not difficult to establish the fact that the homotopy relation is an equivalence relation, so that the piecewise smooth oriented curves in a given region break up into pairwise disjoint homotopy classes. It is also not difficult to show (Exercise 7 of Section 9.4) that the homotopy relation, 9.4.4, is independent of the piecewise smooth representatives we pick from each curve. If every closed, piecewise smooth oriented curve in an arcwise connected region is homotopic to zero, then we say that the region is simply connected. From the point of view of the homotopy relation, the last statement says that a region is simply connected if all the closed, oriented, piecewise smooth curves belong to the same homotopy class.

9.4.5 Definition. An open set $\mathcal{D} \subset E^n$ is said to be simply connected $\iff$ $\mathcal{D}$ is connected and every piecewise smooth, closed oriented curve in $\mathcal{D}$ is homotopic to zero.

Roughly speaking, a simply connected set in $E^2$ is one that has no holes in it. Of course in higher dimensions we no longer have such a simple interpretation. An example of an arcwise connected set in $E^3$ that is not simply connected is an anchor ring.
For the purpose of giving an example, let us note that if an open set in $E^n$ can be contracted to a point by means of straight lines, then the set is simply connected. To be more precise, let us say that the set $S \subset E^n$ is star-shaped with respect to the point $a \in S$ $\iff$ $\forall x \in S$, the straight line $L = \{y : y = (1 - t)a + tx\ \&\ t \in [0,1]\}$ belongs to $S$. Of course, every convex set is star-shaped with respect to every point in the set. Suppose $S$ is open and is star-shaped with respect to $a \in S$. If $\gamma$ is a piecewise smooth, oriented closed curve in $S$, set
$$\Gamma(\tau, t) = (1 - \tau)a + \tau\gamma(t),\qquad \forall \tau \in [0,1].$$
Clearly, $\Gamma$ has range in $S$, is continuous, is piecewise smooth in each variable, $\Gamma(\tau, 0) = \Gamma(\tau, 1)$, $\Gamma(0,t) = a$, and $\Gamma(1,t) = \gamma(t)$. Thus $\gamma$ is homotopic to zero.
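The straight-line contraction just described is easy to write down explicitly. The following sketch is only an illustration: the closed curve `circle`, the base point `a`, and the helper name `contract` are assumptions chosen for the example (a circle inside a disk about the origin, which is convex and hence star-shaped with respect to `a`).

```python
import math

def contract(a, gamma):
    """Return Gamma(tau, t) = (1 - tau) * a + tau * gamma(t)."""
    def Gamma(tau, t):
        return tuple((1.0 - tau) * ai + tau * gi for ai, gi in zip(a, gamma(t)))
    return Gamma

def circle(t):
    """A closed curve on [0,1]: the unit circle centred at (2, 1)."""
    return (2.0 + math.cos(2.0 * math.pi * t), 1.0 + math.sin(2.0 * math.pi * t))

a = (0.0, 0.0)
Gamma = contract(a, circle)

print(Gamma(0.0, 0.3))                  # the constant curve at a
print(Gamma(1.0, 0.3), circle(0.3))     # the original curve
print(Gamma(0.5, 0.0), Gamma(0.5, 1.0)) # each intermediate curve is closed
```

The three printed lines correspond to the three conditions $\Gamma(0,t) = a$, $\Gamma(1,t) = \gamma(t)$, and $\Gamma(\tau,0) = \Gamma(\tau,1)$; the last holds only up to rounding here because the circle is evaluated in floating point.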
We want ultimately to prove that every closed first-order differential form defined on a simply connected domain is exact. First, we shall prove that every closed form on an open domain is locally exact. We have proved this previously for closed forms having domains in $E^2$, making use of the Stokes-Green-Gauss theorem. Although we could still make use of this theorem in higher dimensions, it is much simpler to proceed by a more direct method. Of course, since we know what we are looking for, it is easy to discover a direct method of proof.

9.4.6 Theorem. Every closed first-order differential form with domain an open set in $E^n$ is locally exact; that is, its restriction to every open ball in its domain is exact.
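Proof. Let $\omega$ be a first-order closed differential form with domain an open set in $E^n$ and suppose $B(a,r) \subset \mathcal{D}(\omega)$. Define the function $s$ on $B(a,r) \times [0,1]$ by
$$s(x,t) = (1 - t)a + tx.$$
Clearly, for fixed $x$, the range of $s$ is the straight line joining $x$ and $a$. Let us set
$$f(x) = \int_0^1 \sum_{k=1}^{n} \omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t}\,dt.$$
If we apply the operator $D_j$ to both sides we may use Theorem 8.4.3 to move this operator from the outside to the inside of the integral. Now,
$$D_j\Big[\omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t}\Big] = [D_j\,\omega_k \circ s(x,t)]\,\frac{\partial s^k(x,t)}{\partial t} + \omega_k \circ s(x,t)\,D_j\frac{\partial s^k(x,t)}{\partial t}.$$
Further, using the chain rule, and noting the form of $s$, we get
$$D_j\,\omega_k \circ s(x,t) = t\,D_j\omega_k(s(x,t)).$$
Also,
$$D_j\frac{\partial s^k(x,t)}{\partial t} = \delta_{jk} = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases}$$
Now use the fact that $\omega$ is closed so that
$$D_j\omega_k = D_k\omega_j.$$
Thus we have
$$\sum_{k=1}^{n} D_j\Big[\omega_k \circ s(x,t)\,\frac{\partial s^k(x,t)}{\partial t}\Big] = t\,\frac{\partial\,\omega_j \circ s(x,t)}{\partial t} + \omega_j \circ s(x,t).$$
Consequently,
$$D_j f(x) = \int_0^1 t\,\frac{\partial\,\omega_j \circ s(x,t)}{\partial t}\,dt + \int_0^1 \omega_j \circ s(x,t)\,dt.$$
If we integrate the first integral by parts we get
$$D_j f(x) = \omega_j(x).$$
Since $\omega$ is continuous, it follows that $df(x)$ exists and, of course, is equal to $\omega(x)$.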
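The construction in this proof can be tried on a concrete closed form. The sketch below is illustrative and rests on assumptions not taken from the text: the sample form $\omega = (2xy + 1)\,dx + (x^2 + 3y^2)\,dy$, the use of composite Simpson quadrature for the integral over $t$, and the helper names `w`, `potential`, and `phi`. It uses $\partial s^k/\partial t = x^k - a^k$ for $s(x,t) = (1-t)a + tx$.

```python
def w(p):
    """Sample closed form on E^2: w1 = 2xy + 1, w2 = x^2 + 3y^2 (D1 w2 = D2 w1 = 2x)."""
    x, y = p
    return (2.0 * x * y + 1.0, x * x + 3.0 * y * y)

def potential(form, a, x, panels=50):
    """Composite Simpson approximation of
    integral_0^1 sum_k form_k(s(x,t)) * (x^k - a^k) dt,  s(x,t) = (1-t) a + t x."""
    def integrand(t):
        s = tuple((1.0 - t) * ai + t * xi for ai, xi in zip(a, x))
        return sum(wk * (xi - ai) for wk, xi, ai in zip(form(s), x, a))
    h = 1.0 / (2 * panels)
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, 2 * panels):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def phi(p):
    """A known potential of w, used only for comparison: x^2 y + x + y^3."""
    x, y = p
    return x * x * y + x + y ** 3

a, x = (0.5, -1.0), (2.0, 1.5)
print(potential(w, a, x), phi(x) - phi(a))   # the two values should nearly agree

# Central-difference check that the partial derivatives of the potential reproduce w.
step = 1e-5
d1 = (potential(w, a, (x[0] + step, x[1])) - potential(w, a, (x[0] - step, x[1]))) / (2 * step)
d2 = (potential(w, a, (x[0], x[1] + step)) - potential(w, a, (x[0], x[1] - step))) / (2 * step)
print((d1, d2), w(x))
```

For this polynomial form the integrand is quadratic in $t$, so Simpson's rule is exact up to rounding; for a general closed form the quadrature would only approximate $f$, and the agreement would depend on the number of panels.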
The theorem we have just proved is one of the crucial steps in showing that every closed first-order differential form in a simply connected region is exact. A second crucial step is the following lemma. For the purposes of this lemma we shall say a curve $\alpha$ is in a $\delta$-neighborhood of the curve $\gamma$ if there are representatives of each curve so that
$$\sup\{|\alpha(t) - \gamma(t)| : t \in [0,1]\} < \delta.$$

9.4.7 Lemma. Suppose $\omega$ is a closed first-order differential form with domain an open set in $E^n$. For every piecewise smooth, oriented curve $\gamma$ in $\mathcal{D}(\omega)$, $\exists \delta > 0$ so that for every piecewise smooth, oriented curve $\alpha$ in $\mathcal{D}(\omega)$ with $\alpha(0) = \gamma(0)$, $\alpha(1) = \gamma(1)$ and $\alpha$ in a $\delta$-neighborhood of $\gamma$ we have
$$\int_\alpha \omega = \int_\gamma \omega.$$

Proof. Let $2\delta$ be the distance from $\mathcal{R}(\gamma)$ to $\mathcal{D}(\omega)^c$. Since $\mathcal{R}(\gamma)$ is compact and $\mathcal{D}(\omega)^c$ is closed, it follows that $\delta > 0$. For every $\tau \in [0,1]$ and $\forall x \in B(\gamma(\tau), 2\delta)$ set
$$f_\tau(x) = \int_{\gamma_\tau} \omega + \int_{s_{\tau,x}} \omega,$$
where $\gamma_\tau$ is that oriented curve which has a representative defined on $[0,\tau]$ by $\gamma_\tau(t) = \gamma(t)$, and $s_{\tau,x}(t) = (1 - t)\gamma(\tau) + tx$, $t \in [0,1]$. As the proof of the last theorem shows, $\forall x \in B(\gamma(\tau), 2\delta)$, $df_\tau(x) = \omega(x)$. Next,
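$\forall x \in B(\alpha(\tau), \delta)$ let us set
$$g_\tau(x) = \int_{\alpha_\tau} \omega + \int_{r_{\tau,x}} \omega,$$
where $\alpha_\tau$ is defined in a manner similar to $\gamma_\tau$, and, of course, $r_{\tau,x}(t) = (1 - t)\alpha(\tau) + tx$. We also get that $\forall x \in B(\alpha(\tau), \delta)$, $dg_\tau(x) = \omega(x)$. Since $B(\alpha(\tau), \delta) \subset B(\gamma(\tau), 2\delta)$ it follows that $\forall x \in B(\alpha(\tau), \delta)$
$$d[f_\tau(x) - g_\tau(x)] = 0,$$
from which it follows that there is a number $c(\tau)$ so that $\forall x \in B(\alpha(\tau), \delta)$,
$$f_\tau(x) - g_\tau(x) = c(\tau).$$
We think it is clear that $c$ is a continuous function of $\tau$. Let us put
$$\tau_0 = \sup\{\tau : c(\tau) = 0\}.$$
The set on the right is nonvoid since 0 certainly belongs to it. Thus $\tau_0$ is well defined and we claim $\tau_0 = 1$. For suppose to the contrary that $\tau_0 < 1$. Take $\tau_0 < \tau_1 \le 1$ so that $t \in [\tau_0, \tau_1] \implies \gamma(t) \in B(\gamma(\tau_0), 2\delta)$ and $\alpha(t) \in B(\alpha(\tau_0), \delta)$. It is possible to do this because of the continuity of $\gamma$ and $\alpha$. Now, $\forall x \in B(\gamma(\tau_0), 2\delta)$ let us define
$$\beta_x(t) = \begin{cases} \gamma(t), & \forall t \in [\tau_0, \tau_1], \\ s_{\tau_1,x}(t - \tau_1), & \forall t \in [\tau_1, \tau_1 + 1], \\ s_{\tau_0,x}(\tau_1 + 2 - t), & \forall t \in [\tau_1 + 1, \tau_1 + 2]. \end{cases}$$
The function $\beta_x$ is the representative of a piecewise smooth, closed, oriented curve in $B(\gamma(\tau_0), 2\delta)$ (Fig. 9.4.1).

FIGURE 9.4.1

From Theorems 9.4.6 and 9.4.2 it follows that
$$\int_{\beta_x} \omega = \int_{\gamma_{\tau_1}} \omega - \int_{\gamma_{\tau_0}} \omega + \int_{s_{\tau_1,x}} \omega - \int_{s_{\tau_0,x}} \omega = 0.$$
This shows that $\forall x \in B(\gamma(\tau_0), 2\delta)$,
$$f_{\tau_0}(x) = f_{\tau_1}(x).$$
In exactly the same way we get that $\forall x \in B(\alpha(\tau_0), \delta)$
$$g_{\tau_0}(x) = g_{\tau_1}(x).$$
Thus $\forall x \in B(\alpha(\tau_0), \delta)$ we get
$$c(\tau_1) = f_{\tau_1}(x) - g_{\tau_1}(x) = f_{\tau_0}(x) - g_{\tau_0}(x) = c(\tau_0) = 0,$$
which, of course, is a contradiction. Consequently, since $\gamma(1) = \alpha(1)$, we get
$$f_1(\gamma(1)) = g_1(\alpha(1)).$$
But this says nothing more than
$$\int_\gamma \omega = \int_\alpha \omega.$$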
9.4.8 Theorem. Suppose $\omega$ is a closed first-order differential form with domain an open set in $E^n$. If $\alpha$ and $\beta$ are oriented, piecewise smooth closed curves in $\mathcal{D}(\omega)$, and are homotopic in $\mathcal{D}(\omega)$, then
$$\int_\alpha \omega = \int_\beta \omega.$$

Proof. We shall divide the proof into two parts.
(a) To begin with, we shall suppose that $\alpha(0) = \alpha(1) = \beta(0) = \beta(1)$. Let $\Gamma$ be a function on $[0,1] \times [0,1]$ which gives the homotopy between $\alpha$ and $\beta$, and let us suppose that $\forall \tau \in [0,1]$, $\Gamma(\tau, 0) = \Gamma(\tau, 1) = \alpha(0)$. Also, suppose that $\Gamma(0,t) = \alpha(t)$, $\Gamma(1,t) = \beta(t)$. Let $E$ be the set of all points $\sigma \in [0,1]$ with the property that $\forall \tau \in [0,\sigma]$
$$\int_\alpha \omega = \int_{\Gamma_\tau} \omega,$$
where $\Gamma_\tau(t) = \Gamma(\tau, t)$. The set $E$ is a nonvoid set, since $0 \in E$. Further, it is clearly bounded. Hence, $\tau_0 = \sup E$ is well defined. We claim $\tau_0 = 1$. Indeed, by Lemma 9.4.7, $\exists \varepsilon > 0$ so that if $\gamma$ is in an $\varepsilon$-neighborhood of $\Gamma_{\tau_0}$, then
$$\int_{\Gamma_{\tau_0}} \omega = \int_\gamma \omega.$$
Now, since $\Gamma$ is uniformly continuous, $\exists \delta > 0$ so that $|\tau_0 - \tau| < \delta \implies \forall t \in [0,1]$, $|\Gamma(\tau,t) - \Gamma(\tau_0,t)|$
m, it is still true that
$\mathcal{R}(T)$ is in an $m$-dimensional subspace of $E^n$, which is identifiable with $E^m$ through an orthogonal transformation $U$ of $E^n$ onto itself. Hence we now define the $m$-dimensional content of $T(A)$ by
$$|T(A)| = |U \circ T(A)|. \tag{9.5.2}$$
The content on the right is computable by (9.5.1). Of course, we must make sure that the definition (9.5.2) is independent of the orthogonal transformation $U$ which takes $\mathcal{R}(T)$ into $E^m$. Indeed, suppose $U_1$ is another such orthogonal transformation. Now, if $T$ is singular, the dimension of $\mathcal{R}(U \circ T)$ and $\mathcal{R}(U_1 \circ T)$ is less than $m$ and thus
$$|U \circ T(A)| = |U_1 \circ T(A)| = 0.$$
If $T$ is nonsingular, then $W = U_1 \circ U^{-1}|E^m$ is an orthogonal transformation of $E^m$ onto itself. Thus
$$|U \circ T(A)| = |\det U \circ T|\,|A| = |\det W \circ U \circ T|\,|A| = |\det U_1 \circ T|\,|A| = |U_1 \circ T(A)|.$$
Let us now give an effective way of computing the right side of (9.5.2) so that the operator $U$ does not intervene. Let us first note that $T^t T$ is a nonnegative symmetric linear transformation from $E^m$ into itself, that is, $\forall u \in E^m$, $T^t T(u) \cdot u \ge 0$. Thus this linear transformation has a matrix representation consisting of nonnegative eigenvalues down the main diagonal (see Section 6.5). Suppose we arrange them in nondecreasing order d