Introduction to Applied Numerical Analysis

This Dover edition, first published in 2012, is an unabridged republication of the edition published in 1989 by Hemisphe

262 48 10MB

English Pages [356] Year 1989

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
vii
Preface
ale
1
Roundoff and function evaluation
2.
33
Real zeros of a function
2
57
Complex zeros
4.
81
Zeros of polynomials
5.
105
Simultaneous linear equations and matrices
6.
143
Interpolation and roundoff estimation
7.
175
Integration
8.
199
Ordinary differential equations
9.
221
Optimization
10.
247
Least squares
i:
261
Orthogonal functions
ied:
283
Fourier series
13.
297
Chebyshev approximation
14,
309
Random processes
15.
325
Design of a library
329
Index
Recommend Papers

Introduction to Applied Numerical Analysis

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

RICHARD

W. HAMMING

Introduction o Applied Numerical Analysis

DOVER BOOKS ON MATHEMATICS HANDBOOK OF MATHEMATICAL FUNCTIONS: WITH FORMULAS, GRAPHS, AND MATHEMATICAL

Tastes, Edited by Milton Abramowitz and Irene A. Stegun. (0-486-61272-4) ABSTRACT AND CONCRETE CATEGORIES: THE Joy OF Cats, Jiri Adamek, Horst Herrlich,

George E. Strecker. (0-486-46934-4) NONSTANDARD METHODS IN STOCHASTIC ANALYSIS AND MATHEMATICAL Puysics, Sergio Albeverio, Jens Erik Fenstad, Raphael Hgegh-Krohn and Tom Lindstrom. (0-486-46899-2) Matuematics:

Its ConTENT,

METHops

AND

Meaninc,

A. D. Aleksandrov,

A. N.

Kolmogorov, and M. A. Lavrent’ev. (0-486-40916-3) COLLEGE GEOMETRY: AN INTRODUCTION TO THE MODERN GEOMETRY OF THE TRIANGLE AND

THE Circe, Nathan Altshiller-Court. (0-486-45805-9) THe Works oF ARCHIMEDES, Archimedes. Translated by Sir Thomas Heath. (0-486-42084-1) REAL VARIABLES WITH Basic METRIC Space Topotocy, Robert B. Ash. (0-486-47220-5) INTRODUCTION

TO DIFFERENTIABLE

MANIFOLDS,

Louis

Auslander

and

Robert

E.

MacKenzie. (0-486-47172-1) PROBLEM SOLVING THROUGH RECREATIONAL MATHEmatics, Bonnie Averbach and Orin

Chein. (0-486-40917-1) THEORY

OF

LINEAR

OPERATIONS,

Stefan

Banach.

Translated

by

F. Jellett.

(0-486-46983-2) Vector CatcuLus, Peter Baxandall and Hans Liebeck. (0-486-46620-5) INTRODUCTION TO VECTORS AND TENSORS: SECOND EDITION—T'Wo VoLUMES BouNnD AS ONE, Ray M. Bowen and C.-C. Wang. (0-486-46914-X) ADVANCED TRIGONOMETRY, C. V. Durell and A. Robson. (0-486-43229-7) FourIER ANALYSIS IN SEVERAL COMPLEX VARIABLES, Leon Ehrenpreis. (0-486-44975-0) THE THIRTEEN BOOKS OF THE ELEMENTS, VOL. 1, Euclid. Edited by Thomas L. Heath.

(0-486-60088-2) THE THIRTEEN BOOKS OF THE ELEMENTS, VOL. 2, Euclid. (0-486-60089-0) THE THIRTEEN BOOKS OF THE ELEMENTS, VOL. 3, Euclid. Edited by Thomas L. Heath.

(0-486-60090-4) AN INTRODUCTION

TO DIFFERENTIAL EQUATIONS AND THEIR APPLICATIONS,

Stanley J.

Farlow. (0-486-44595-X) PARTIAL DIFFERENTIAL EQUATIONS FOR SCIENTISTS AND ENGINEERS, Stanley J. Farlow. (0-486-67620-X) StocHastic

DIFFERENTIAL

EQUATIONS

AND

APPLICATIONS,

Avner

Friedman.

(0-486-45359-6) ADVANCED CaLcuLus, Avner Friedman. (0-486-45795-8) Point SET Topotocy, Steven A. Gaal. (0-486-47222-1) DiscovERING MATHEmatics: THE ArT oF InvesTIGATION, A. Gardiner. (0-486-45299-9) Lattice

THEORY:

First

CoNcEPTS

AND

DistRipuTivE

Lattices,

George

Gratzer.

(0-486-47173-X) ORDINARY DIFFERENTIAL Equations, Jack K. Hale. (0-486-47211-6) METHOps OF APPLIED MATHEMatics, Francis B. Hildebrand. (0-486-67002-3) Basic ALcesrA I: SEconp Epition, Nathan Jacobson. (0-486-47189-6) Basic ALGEBRA II: Seconp Epition, Nathan Jacobson. (0-486-47187-X) NUMERICAL SOLUTION OF PARTIAL DIFFERENTIAL EQUATIONS BY THE FINITE ELEMENT METHOD,

Claes Johnson. (0-486-46900-X) ADVANCED EUCLIDEAN GeomeTRY, Roger A. Johnson. (0-486-46237-4) GEOMETRY AND Convexity: A Stupy in MatHematTicaL METHops, Paul J. Kelly and

Max L. Weiss. (0-486-46980-8) TRIGONOMETRY REFRESHER, A, Albert Klaf. (0-486-44227-6) (continued on back flap)

a)

:

a

>

tt

7 a’

"

a

—s

Le

Po

sy

Richard WY,

Moin *

ne

a“

Me

og

id

Av! *

ia

,

. ry

A

# ¢

é



s

ie teal

»

-

% be

®

=

=



‘,

«

4

- |

ed »

a

7

7



:

7

oa

—S

i

:



is

7

(

Kd

|



: free

7

ass

ie

4

mt >

A

é

4

he

7

a

ry

Tes

me



FY

wee.

alt



ata rhe

Gae

oe

Ye Vie neal

"

a z

O _

oe

Preps, — {

a

, 1 7

A



“=

7s

ahs,

a. oe

ot.

Sn

aa

;

:

s-

Coet i

ct

9

Eines: rhe

“© ye © © wel : 7

| Caer

>

sno

Pana,

age

e

tee

(a4)

eure

ey ‘al

© mefolien rer os ’

=

Pre

esac

7

POG

or

meas J Satay ieee)

ts, Sepevem Sa

nr

Ce "i

tam

Cx: nr

v

coe

oF

wa

| atioken ot gertegns Inte ment thaw sw ast beTOG >sails 7 an dgaatieg awed srstuqmae Faga tears. aq oa)ot eee -

-egrnebnt adr rot oghstwond

besilarsega stoutbas iti

os

i:

= wren bape eodiane gnisogrine 10Om 0) 06 wo ep * : pneu Oh et ottotew aut? “nig banwhee ett i sient pit

aux {Eliade 10bisShe

t on,

sd

: el)

sar

ade

ie

ney ors -

preface

The appearance of yet another “Introduction to Applied Numerical Analysis” (even if the word “Applied” distinguishes it from most of the other books) requires justification. The justifications can be grouped under the following headings: 1 The Subject Matter Covered The material in this book was selected by considering those topics which the practicing scientist or engineer seems to need most, and which at the same time can be reasonably treated within the limits of a one-term course. In the belief that for many students this may well be the terminal course, many topics are treated rather lightly, instead of a few topics very intensively. The interrelationship of the material and its treatment has further affected the selection. The result is a somewhat unconventional set of topics for a first

course.

2 The Treatment of the Material There is a strong effort to treat the material as simply as possible and in a uniform way, rather than as a set of unrelated topics. Also, some effort is made to prepare the student to face new problems that are not treated in the book. 3 The Use of Illustrations There is a deliberate attempt to teach as much as possible through the use of pictures and simple numerical illustrations, rather than through the more usual use of equations and words. It is hoped that this method will make the learning process easier as well as longer lasting. The viewpoint adopted through4 The Viewpoint Adopted l analysis is a subject in itself numerica applied out the text is that nor a branch of mathematming program of part a neither is and ics. These differences are deliberately stressed since most students seem to feel that applied numerical analysis is closely related to mathematics or programming, or both, and they need to be warned repeatedly that in practice there are many significant differences. 5

The

Separation

of

Programming

from

Numerical

Analysis

, as is so Teaching programming along with numerical analysis principle ical pedagog nown well-k a s violate often tried today,

PREFACE

that you should not try to teach more than one new idea at a time. Such attempts usually introduce a new programming idea and then treat the numerical illustration in a superficial and often false way. As a result, the student emerges with a wrong conception of what numerical analysis is and how to use it practically. The material in this text has been covered a number of times in fourteen 2-hour lectures to an evening class of advanced undergraduate and graduate engineers. Thus, in the usual 3-hour course there may be time for an additional 14 lectures on the local high-level language, Fortran, Algol, Mad, PL/1, or Basic, along with the essential parts of the local monitor system. Probably, these lectures on programming should be given by an instructor other than the one who gives the numerical analysis lectures, because when introducing the beginner to a computer, the teacher is justly interested in showing off the power and the range of application that the computer has, and therefore he usually does not have a real feeling for the details that occur in applied numerical analysis (which uses a small fraction of the flexibility of the machine). This plan of teaching programming and numerical analysis separately but within a single 3-hour course can probably be done with well-prepared, more advanced students; with not well-prepared, less advanced students, the text could be used to fill the entire course and the programming could be taught separately. Few people today are an island unto themselves, and of necessity much of what they know and believe is learned from their colleagues. Thus, I am especially indebted to my colleagues at the Bell Telephone Laboratories and to the environment that encourages independent thinking. Many friends and students have made comments on earlier drafts, but as usual they are not responsible for the faults. In particular, I also wish to thank Prof. Roger Pinkham of Stevens Institute for helping me at every stage of the work, and for encouraging me to continue to the end. Others to whom I owe particular thanks are Jolin Athanassopoulos for most of Chap. 9; M. P. Epstein, J. F. Kaiser, R. G. Kayel, S. P. Morgan, and D. G. Schweikert for comments on the manuscript; and Mrs. Dorothy Luciani and Miss Geraldine Marky for first-class secretarial help.

richard w. hamming

Introduction to

APPLIED NUMERICAL ANALYSIS

| A

‘ M

sk

omajie

ri

>of

i



=

es %

ve

a

~~

neee

2.

ee

wake hates 4shesgenie RON Livner

0

Choose

125 = 0 Choose

1.38

=x =15 x= 1.44

y(1.44) = 0.0736 y(1.38) y(1.44) < 0

2.3

WARNINGS ON THE BISECTION METHOD

Choose

1.38 = x < 1.44 x=141

etc.

How shall we end the cycle of repetitions? Among the possibilities are:

|x1 —x2| SE X1— Xe

i——

=e

absolute accuracy in x i

P

relative accuracy in x (except for x = 0)

xy

3

|f(x) —f(xe)| =e

function values small

4

Repeat N times

a good method

Which one to use depends on the problem, author tends to favor method 4.

PROBLEMS

but, as indicated,

the

2.2

1

Apply the bisection method for three steps to find 3,

2

interval of 1 to 2. Find the zero between

given the original

z and 37/2 for which tan x =x.

Follow the flow

diagram at each step. Calculate to one decimal place

14+V3 t=

2

by first finding a suitable quadratic that it satisfies. Calculate to three decimal places the solution of xe™=1

Find three decimal places at the ‘‘first”” zero of cos x = x.

2.3

Warnings on the bisection method

proof,” yet ‘we The bisection method seems to be practically “idiot that the thought have may we First, need to be careful when we use it.

function was continuous when in fact it had a pole.

40

REAL ZEROS OF A FUNCTION

For example, if

F(x) =

x—

7

then the bisection process will come down to a small interval about x =, and probably none of our tests will have caught the error (an overflow in evaluating the function would have been a clue). We need, therefore, to include a test for the size of f(x,) before we accept the final interval (x,, x2) as containing a zero.

Go on

Print: Pole in interval (1 ’ Xa)

Seo

Xe

Note: fix,)flx2) > 0

x

2.3

WARNINGS ON THE BISECTION METHOD

Note that if we had used the end of repetition test |f(x,) —f(x2)| 0.300 x 10-3 ‘ae |

0.314 x 10!

inequality! and we do not have the fineness in spacing to achieve the

ee

function values is The effect of roundoff in the computation of the interval in which an find does method the that not serious in the sense for the method. claimed is that all is —which change there is a sign

41

42

REAL ZEROS OF A FUNCTION

We may be misled into identifying this interval with that in which there is a mathematical zero of the function, but from the practical point of view what more can we reasonably expect if we do not also include an estimate of the roundoff noise in the function values? (This is almost never done in practice, though obviously it could be included if necessary.)

PROBLEMS 2.3 1 2

2.4

Draw acomplete flow diagram for the heart of the bisection method with use of test 4 for the ending. Show that the bisection method always finds one value when it starts with an interval with three zeros in it. Discuss the general problem of an odd number of zeros.

A search method for the bisection method

The heart of the bisection method assumed that there was an interval Xx, =x = x, such that

F (xi) f(x2) < 0 What we probably started with was the problem of finding all the real zeros in some interval a = x < b, and we now face the task of partially filling this gap. The solution is simple; we start with a search step of size

where n is the number of subintervals to be searched. Having computed f(a), which we assume is not zero, we compute a+h and f(a +h) and then form

2.4

A SEARCH METHOD FOR THE BISECTION METHOD

>0

fiaf(ia+h)=4=0 0 and risk high

Each user must debate and settle for himself how he will choose the number of search subintervals in each particular problem. Note that we have used n, the number of subintervals to be searched, rather than h, the step size, as the input parameter. If we used h, then we might fall into the following kind of a trap. Suppose in

2.5

THE FALSE-POSITION METHOD (A POOR METHOD)

one instance we search from 0 x =—5, and without thinking that it will be necessary to change the sign of h; we try again with the result that the routine will never end. It should be clear that the storage for the list of the zeros found must be provided, but since, at most, n intervals will have sign changes, there will be, at most, n zeros found (by this simple process). Probably the count of the number of zeros found should be given first, and a warning, as well, of any odd-order poles that were picked up. We have deliberately left for the student the task of sketching out a complete diagram of the method (which may require some “‘initializing” in places).

PROBLEMS 2.4 1 2

Fill in the details of the complete search and bisection method. What step size would you use on

1

y=sinzs——

3

(x > 0)

and why? inExplain why for = 0, we stepped two intervals forward rather than one terval.

2.5

The false-position method (a poor method)

regula The idea behind the false-position method (sometimes called The records. ical mathemat earliest falsi) can be found in some of our inthe decrease to trying in that ion observat the method is based on if one end function, the of sign the in change a is there which terval in is probably closer to value is large and the other is small, then the zero the small value than it is to the large.

45

46

REAL ZEROS OF A FUNCTION

Fla) Approximating line

New guess

Drop this end value

Mathematically speaking, we pass a straight line through the end points [a, f(a)], and [b, f(b)] and use this line as an approximation to the function f(x). This line

ule)=fla) +P FO)6_ 4) has the zero

_ f(a(b—a) _ af(b) — bf (a) ~ f(b) fla) f(b) F(a) and this provides a next guess for the position of the zero. Note that f(b) —f(a) cannot be zero but is in fact an addition of two numbers because f(b) and f(a) have opposite signs. Having found the new guess, we evaluate the function at this point, namely, calculate f(x), and then drop the old end point where the function value has the same sign as f(x) has; thus, we keep the zero

2.6

A MODIFIED FALSE POSITION (A GOOD METHOD)

we are searching for inside the interval. If f(x) =0, it is treated separately. In the false-position method we cannot be sure of the number of steps required to decrease the interval by a preassigned amount; indeed, since the approach to the zero tends to be from one side only (see the accompanying figure), the approach can be very slow.

Modern computer usage often requires ‘‘time estimates’’ for the job to be run, and these are difficult to make for the false-position method, whereas they tend to be easy for the bisection method. For all these reasons, the author suggests avoiding the false-position method.

PROBLEMS 2.5 1

Draw a flow diagram for the complete false-position method, the initial search for an interval.

including

on 2 Find the V2 starting with a = 1 and b =2, by using the false-positi 3

2.6

method (three figures). Find the zero of xe* = 1 by the false-position method.

A modified false position (a good method)

improves it. A simple modification to the false-position method greatly one-sided slow, the is The main weakness in the original method step each at ly arbitrari we this, remedy approach to the zero. To picture anying accomp The 2. by keep we that value divide the function shows the effect of this modification.

47

48

REAL ZEROS OF A FUNCTION

Compute

new

x, fix)

Test

fla)f(x)

Have zero

Set b=x

f(b) = fix)

fia) = 40)

We note again that since f(a) f(b) < 0, there is no loss of accuracy in computing the denominator

f(b) — f(a) since this is an addition. This method is usually the most effective simple method to use so far as speed is concerned. We can no longer use the count of steps to be done as the ending test of the loop, and we have to use one of the €

tests. Thus, the timing of the process of finding a zero is no longer easily estimated. The same search technique as used for the bisection method is available for the modified false-position method.

2.8

PROBLEMS 1

NEWTON'S METHOD (ANOTHER METHOD TO AVOID)

2.6

Apply the modified method to

ya 2

—e

The choice of halving the function value was arbitrary. Discuss other pos-

sible choices and when to use them. 3

2.7

Draw a complete flow diagram for a zero-finding routine based on the modified false-position method.

The secant method (a method to avoid)

It is often proposed that in the false-position method we always keep the two most recent values of the function and use the secant line through them as the basis for the next guess. y

Secant line

The idea that the more recent values are closer and therefore better is basically a good one, but the method can fail badly when we get both points on the same side of the zero and the approximating straight line, because of roundoff or just plain bad luck, leads us far astray.

PROBLEM 2.7 4

2.8

Make sketches showing how the secant method can fail.

Newton’s method (another method to avoid)

finding real The calculus course usually gives Newton’s method for line tangent a fit to is method the zeros of a function. The idea behind of x, estimate current the of point the at function to the curve of the the zero.

49

50

REAL ZEROS OF A FUNCTION

Tangent line

The zero x;,,, of this tangent line provides the next guess for the zero of the function. In mathematical notation, let x, be the current guess. Then the tangent line is

y (x) =f (xn) +f" (xx) (x — Xe) and the value where y(x) =0

is

Xkt1 = Xk —

f (xx)

fi (xp)

This formula provides a method of going from one guess x; to the next guess x;,4;. Newton’s method when it works is fine, but the restrictions on the method are seldom discussed. The three sketches show some of the troubles that can occur when the method is used carelessly.

Inflection point

fle)

Xk+2

Xe+1

Xk

Possible multiple zero

x

2.8

NEWTON'S METHOD (ANOTHER METHOD TO AVOID)

F(x)

Xk+e Xk+1

Xk

23

Local minimum (maximum if f(x) < 0)

Thus, in practice, unless the local structure of the function is well understood, Newton's method is to be avoided. When Newton’s method closes in on a zero, it tends on each step almost to double the number of decimal places that are accurate; thus, from two figures accurate we get in one step almost four figures accurate, in the next step almost eight figures, and so on. This is why, when it can be made to work, it is a good method.

Find a formula for Newton’s method for the function y = xe7—1

SOLUTION

y' =(x+ lje* a pe ral re 8 Xn+i = Xn

(i,t 1).e=8

=X, —-

tee §

errr"

an re ¥

Xu, t1

en

EEE

Another fault of Newton’s method is that it requires the differentiation of the function to find the derivative, and then the coding of the derivative, both of which certainly increase the chances for an error. But in spite of its faults we shall frequently make use of the method when we have information to indicate that the troubles that can plague it are not going to happen.

PROBLEMS 2.8 Draw a flow diagram for Newton's method.

Apply the method to the V2. Apply the method to the VN. Apply the method to the WN. = 2wonp

51

REAL ZEROS OF A FUNCTION

5

Apply the method to the nth root of a number N. ANS.

2.9

1 N X441 = ” [«— 1) x, bere

Multiple zeros

The mathematical which makes

idea of a zero of a function y = f(x) is a number x,

y (xo) =f (x0) =0

In computing we must usually settle for being ‘“‘close”’ to zero. The idea of the multiplicity of a zero seems to have arisen first in connection with polynomials where a zero x) corresponds to the factor x — X9. Since a polynomial of degree n has n factors, it is natural to say that it has n zeros even when some of the factors are repeated.

Factor theorem for polynomials If f(xo) =0, then x — xo is a factor of f(x) and conversely. If (x — x9)* is a factor, then xo is a zero of multiplicity k.

Thus, using the idea of multiplicity, we can say that a polynomial of degree n has n zeros (real or complex). The idea of a zero can be extended to functions that can be expanded in a Taylor series about the point xo,

F(x) = dp + a(x — Xp) + a(x — x)? +°° = The multiplicity of the zero is the number of consecutive coefficients, starting with ao, which vanish, that is, if a,=0

and

a; #Q_

then a single zero

a)=a,=0

and

a, #0

then a double zero

dy)=d,=a,=0

and

a,#O0 __ thena

and so forth.

f(x) = ay (x — X%) + ag (x — x)? +--f(x) = de (x — xo)? + a3(x —xo)8 ++:

F(x) = a3 (x — x0)? + a4(x — Xo)4 + °°:

triple zero

2.9

But what are we to say about the all too common

MULTIPLE ZEROS

53

type of function

y= Vat —x®

5

atx=2+a?

aa

a

x

What is the multiplicity of the zero? The definition of multiplicity could be extended to such situations by defining it to be k > 0 if lim

f(z)

ESR rea

=c #0

(nor infinite)

z—r,

regardless of whether k is an integer or not. When k is not an integer, the method we are going to give of examining the successive derivatives will fail. The bisection method and some other methods as well will locate isolated zeros of odd order and will usually miss even-order zeros. If it is essential to find all the zeros in an interval and their multiplicities, then searching both the function and the derivative will be necessary.

If the zeros of

y =f(x)=0 are

%;, X2, X3, ° : « (which locates all odd-order zeros) and of

g =f'G) =0 are x,, xb, x}, ... (which locates all even-order zeros), then the combined set of points are the only ones that need to be examined carefully (if the zeros are all of integral order). There will be many difficult decisions to make, however. For example, consider the two functions

yi =fi(x) = x? and

Yo=fa(x) =x? +

54

REAL ZEROS OF A FUNCTION

Both functions have a zero in their derivatives at x = 0. But how small can € be before we decide that we shall attribute it to roundoff effects and say that both have a double zero? For the case of polynomials, the problem can be given a somewhat better answer, but for a general function, it is simply hard to decide.

2.10

Miscellaneous remarks

The problem of finding the real zeros of a function has been intensively studied for many years, and there are many methods described in the literature. We have given only a few of them. The bases for our choices have been simplicity, effectiveness, and range of application. We have tended to avoid methods whose success depends on properties of the function that the person who uses a computing machine is not likely to know and which can fail if the user cannot watch to see that the method actually works. Frequently, the problem is not merely to find the zeros of a function

y =f (x) but rather to find the zeros as functions of some parameter, Say 4.

Real {x(A)} Imaginary {x(A)} x,(A)

-_

Pe oa ae

2.10

MISCELLANEOUS REMARKS

55

Thus, we are given

y =f (x, d) and are asked_-to find the zeros

wid)

#

(k= 1, 2,-4.7)

as functions of \. Once we have found the zeros for the first few values of A, then we can usually use this information to guess approximately where the zeros for the next value of A will lie—we can track the zeros as functions of }. The methods for predicting where they might be found for the next A value can be developed from the methods given in Chap. 8. The root-locus method for locating roots (often treated in network-design courses) can also be used to advantage at times. We shall not, however, examine the question further in this book.

=

HAMI@ BUG BWAL oii

ors

nj

t

F

a

v

*F

3



é

~~

4

-_



;

=

-

i.

an



_

1

F

.

as ™



4. vf Sh

2

i

:

a

-~< ee I 4

|

4

ia

a

a

7

a

ie

_s

oats

Hl

re

i

7

_ itt

_

5

Yrs"

=

:

»

.

ging:

gibt gay > + of ihe,

pre.

ng

-

_

te Ue ee *

==

Ls

t¥ —

Te Ree ap sa

iz

=

i,

ey

ane

eae "

ab OF

a "ne ped

COMPLEX ZEROS

3.1.

Introduction

We now examine the problem of finding the complex zeros of an analytic function which lie in a finite region of the complex plane. By an analytic function we mean that at all points in the region the function can be represented by a convergent Taylor series about the point. For convenience only, we shall assume that we are searching a rectangular region for complex zeros.

Typical region in the complex plane

problem; In Chap. 2 we usea tne notation y = f(x) for the real-zero the which in notation le -variab complex usual we now change to the is variable nt depende the and iy + x = z is variable independent

w = f(z) =f (x+ iy) of functions in the Once it is clearly understood how the definitions it becomes obthen , domain x comple the to ed real domain are extend vious that w(z) can be written as

w(z) =f(x + iy) = u(x, y) + iv(x, y)

COMPLEX ZEROS

where we have grouped the real and

w=eF — 33 =ertiv



(x + iy)?

=e*(cos y +i sin y) — (x? + Qixy — y?) =(e* cos y — x? + y?) +i (e” sin y — 2xy)

imaginary terms separately. Thus, the single condition

f(x + iy) =0 is equivalent to the two conditions u(x,y) =0

v(x,y) =0 The u(x,y) is called the real part and the v(x,y) is called the imaginary part of f(z).

Example

Find the zeros of w = sin z

The basic definition is ez



sin z=

ei? e-¥ —

ei

2i

eit eu

sin (x + iy) =——_———_

2i e-*(cos x +i sin x) — e¥(cos x —i

Qi ef en? e eet Cae === COS @hora ta SAT 2i 2 evY—e-¥

v= cos x ~

.

o2 evY+e-¥

u = sin x ————

Zeros require

ll a2 oo

sin x)

3.1.

INTRODUCTION

From u=0, sinx=0

“x=kr

From v = 0,

e’—e4=0 “y=O0

Hence all zeros

The equation u(x,y) =0 defines a set of curves in the complex z= x+iy plane. The equation v(x,y) =0 defines another set of curves, and it is only at the intersections of these two sets of curves (one from the u =0 curves and one from the v =O curves) that w(z) =0 can be realized. Thus the problem of finding the complex zeros of w = f(z) is equivalent to finding the intersection of u(x,y) =0 and v(x,y) =0 curves, and this, algebraically, is the simultaneous solution of the two equations.

real values for For the frequent special functions which take on only be in the may that i’s any of spite (in nt real values of the argume it follows that zero, a is iy + z=x if then, ), function the of description

the conjugate z = x — iy is also a zero.

59

COMPLEX ZEROS

(x,—ty) A sketch of a proof goes as follows: Since i and —i are formally indistinguishable, we have at a zero

w = f(z) = f(x + iy) =0 and

w = f(z) =f (x— iy) =0 where the long conjugate bar means that the i’s in both the function and the argument are replaced by —i’s, and the short bar means only the i's that occur in the function itself are so replaced. By hypothesis

w = f(x) =u = f(x) so that

w = f(x — iy) = f(x — iy) =0 Thus, in the special case of real functions of a complex variable, whenever we find a zero x + iy in the upper half-plane, we know that there is the corresponding conjugate zero x —iy in the lower halfplane, and we therefore need only look for the zeros that fall in the upper half-plane.

PROBLEMS 1.

3.1

Find u and v if In (x + iy) =u + iv

2 3

Write a‘ asu + iv.

Write i! as u + iv.

3.2

3.2

THE CRUDE METHOD

The crude method

Since the bisection method for finding real zeros is so reliable and easy to understand, it is worth extending the ideas to the problem of finding complex zeros. The bisection method has two parts, the search process (isolation) and the refinement process (improvement). We first reexamine the search process for finding real zeros. We went step by step, using a suitably chosen step size, until we found a pair of adjacent values of f(x) with one in the upper half-plane and one in the lower.

A real zero

nearby

+

+a

aaa

+

We can imagine that we had simply recorded + or — (or possibly 0) at each search point and then by eye isolated the real zeros. Similarly, in the crude method for complex zeros we shall search our rectangular region, using a suitably chosen grid of points in the complex z= x +iy plane, and at each point we shall record the quadrant number 1, 2, 3, or 4 that

w = f(z) falls in (we record a 0 if the function falls on the axis).

u=0 or the

v=0

the quadrants as best we We now take colored pencils and color in curves The can from the recorded quadrant numbers.

u(x,y) =0

61

62

COMPLEX ZEROS

divide quadrants 1 and 2, and quadrants 3 and 4, whereas v(x,y) =0

divide quadrants 1 and 4, and quadrants 2 and 3. iv

(u,iv) plane

(x, iy) plane

Evidently, where (at least) four quadrants meet, we have a zero of w = F(x). If eight quadrants meet at a point, then there is, as we shall see, a double zero, and so forth, but at this stage we shall confine our attention to simple zeros where four quadrants meet. One way we can refine the accuracy of this crude method is by merely “enlarging” any small region we are interested in until we have (by successive enlargements if necessary) sufficient accuracy, or else roundoff (or the granularity of the number system) will cause trouble. This crude method is easy to understand and is very effective, but it can be costly in machine time.

PROBLEMS 1

3.2

Prove that the x axis in a plot of quadrant numbers is a line of zeros for a real function of a complex variable.

3.3

2

AN EXAMPLE

USING THE CRUDE METHOD

Plot the quadrant numbers for the function w = e*

(Os x = 27)

(0=y = 2n) From the plot we see why the function has no zeros.

3

3.3

Plot the quadrant numbers for w = z?

An example using the crude method

As an example of how the crude method works, consider the problem of finding the complex zeros of the function w = f(z) =e? — 2?

which lie near the origin. The function is a real function of a complex variable so that we need only explore the upper half-plane. We try the rectangular region —a7ixi2T

cKSySi2c

The quadrant numbers are easily computed on a machine and are plotted on the figure. The x axis is a line of 0's, as it should be (there is one other almost zero number which we have marked as a 0). In drawing the curves u=0 and v =0, we remember that the curves in the lower half-plane are the mirror images (in the x axis) of those in the as upper half but that the quadrant designations interchange 1 and 4 well as 2 and 3.

u=0

11144444 44 11146444444 44444 4 11118 111\8 444444 3 11144444 3 11144444 3 4 4 1114 2 2 ro 111 49\4 4 4 2 2 2-252 4 1 1 A Pee 22 m1 1 d eee oes it 2222N1 { 1 Vegeta 2222A1 es fecka ae Real zero

Complex zero

u=0 = eats

63

64

COMPLEX ZEROS

Does this picture seem to be reasonable? The real zero on the negative real axis seems to be about right because we know that as x goes from 0 through negative values, e* decreases from 1 toward 0, whereas z? goes from 0 to large positive values. Thus, they must be equal at some place (and we easily see that this happens before we reach —1). The complex zero we found is likely one of a family of zeros, the next one appearing in the band 2ns=y

t4r

An examination of the picture shows that it is a reasonably convincing display of where the complex zero is approximately located. We could easily refine the particular region if we wished by simply placing our points in a closer mesh.

3.4

The curves u=0

and vt =0

at a zero

At any point z = Zo, the Taylor expansion of a function has the form F(z) =f (Zo) + f" (zo) ee

+f"

(Zo) Gal

+ f(a) oes

or f (2) = do + 44 (% — Zo) + d2(z — Zo)? + a3 (z — Zp We set for each k ay, = A,e'*e and we also set

iy ay

Ax dx

(A, real)

)B+°--

EL Shep

3.4

THE CURVES u =0 and

v= 0 AT A ZERO

Therefore, the Taylor expansion has the form f(z) = Ape*o + A,pet*i1+® + A, p2etort26) 4+... At a simple zero f(z.) =0, then, Aj = 0, and for small p the “immediate neighborhood of zy f(z), looks like”’

F(z) = A, peXes*?) or f(z) ~ Aip [cos (¢; + 8) +i sin (¢; + 4)) The u =0 curves are approximately given by A,p cos (¢,; + 6) =0 or

=—bi +p +ke

(k =0, 1)

and the v =0 curves are approximately given by

Aip sin (d, + 6) =0 or

6=—d,+kx

(k=0,1)

curves intersect at right angles, and We see that the u =O and v =0 each have an angle of approxicolor to plan hence the quadrants we mately 90° at the zero.

COMPLEX ZEROS

The picture we color is therefore easy to interpret at a simple zero. Note that this happened at both zeros of the example in Sec. 3.3. At a double zero both f(z) and f’(z) are 0, so that the Taylor series looks alike F(z) =

A, p?e es +28) + Ag pretles +39) Se

We have for small p u ~ Ap? Cos (d2 + 26) 5 es A, p? sin (do ar 26)

fa

eS

rs

Oe Oy ay te re

ka

k

o=0-70=- 34> and the angles of the colored quadrants are approximately 45°.

It is easy to see that for a triple zero, the colored quadrant angles will be approximately 30°, and in general for a multiplicity of m, we shall have the u=0O and vw =O curves meeting at approximately a/2m radians (or 90°/m).

PROBLEMS 1

3.4

Sketch the u =0 and v =0 curves for

w=2?-z2+5

3.5

A PAIR OF EXAMPLES OF u =0 and v=0 CURVES

using the lattice points

2

3.5

k x=7

k=0, 1, 2,3, 4

eri

m=0,

1, 2, 3, 4

and check by using the quadratic equation formula. Prove that for a zero of order m, the quadrant angles are 7/2m radians.

A pair of examples of u = 0 and c = 0 curves

The following pair of simple polynomials illustrate how the u =0 and vu =0 curves behave at zeros and elsewhere in the plane. The first example is a simple cubic with zeros at —1, 0, and 1. The

polynomial is w = f(z) = (z + 1)2(z —1) =z? —2 = (x + iy)® — (x iy)

= (x? — Sxy* — x) + i(3x"y —y? —y) The real curves are defined by u =x?

— 3xy?

—x =0

or x(x? — 3y? — 1) =0

This is equivalent to two equations x=0 x? —3y?-—1=0

The latter is an hyperbola whose asymptotes are

(==

x

V3

67

COMPLEX ZEROS

The imaginary curves are defined by

v = 3x?y —y® —y=0 This again is equivalent to two equations

y=0 3x? — y2=1

The latter is again a hyperbola, but this time with asymptotes y=tV3x

The figure we have drawn

looks reasonable.

In the first place, at

each simple zero the real and imaginary curves cross at right angles as they should according to theory. Secondly, far out, that is going around a Circle of large radius, the curves look as if they were from a triple zero, and the local effects of the exact location of the zeros tends to fade out the farther we go out. For the second example, we move the zero from z=—1 to z=0, which makes a double zero at z = 0. The polynomial is

3.5

A PAIR OF EXAMPLES OF u =0

and v =0 CURVES

w = z?(z — 1) = z8 — 2?

= (x + iy)® — (x + iy)?

= 2° — Sxy? — x? + y* + 1(Sx*y\— y? — 2xy) The real curve is

u = x? — 3xy? — x? + y?=0 Solving for y?, we have co st which is easily plotted as it has a pole at x = 3, zeros at 0 and 1, and

symmetry about the x axis. As we expect, the asymptotes are parallel to

ye

=

x

+——

VS

The imaginary curve is v = 3x?y — y3 — 2xy = 0

which is y (3x? — y? — 2x) =0

or y=0

and

Sad) owis The last curve has asymptotes

y=+V3 (x—3) | which is what we expect.

70

COMPLEX ZEROS

Sketching these curves, we see that far out in the complex plane they are the same as in the previous example. At the double zero, there are two real curves and two imaginary curves alternating and crossing at 45° angles. The rest of the curves look as if the real and imaginary lines were being forced together by the movement of the zero from z=—1 to z=O, but as if they tend to repel each other strongly. Note that in the example in Sec. 3.3, the infinite sequence of zeros must be considered in judging the reasonableness of the shapes of the curves.

3.6

General rules for the

u =0 and v = 0 curves

We cite without proof the principle of the argument which comes from complex-variable theory. This principle states that as you go around any contour, rectangular or not, in a counterclockwise direction, you will get a progression of quadrant numbers like Wy

Ay

Wy ey

Sr

ey

Oi roy Se

ae tere

with as many complete cycles, 1, 2, 3, 4, as there are zeros inside (we are assuming that there are no poles in the region we are searching).

3.6

GENERAL RULES FOR THE u =0 AND v=0 CURVES

12321412341234 2 zeros inside

There

may at times be retrogressions numbers such as

in the sequence

of quadrant

1,1, 1,2, 2,3, 3, 2, 3,.3,4, 4,4, 1... for some

suitably shaped contour, but the total number of cycles completed is exactly the number of zeros inside. We have glossed over the instances of jumping over a quadrant number (say a 1 to a 3) and will take that up later. We have assumed that the 0’s that may occur are simply neglected, as they do not influence the total number of complete cycles. To understand this principle of the argument, the reader can try drawing various closed contours in the previous examples. No matter how involved he draws them, he will find that he will have the correct the number of zeros inside when he counts +1 if he circles the zero in the circles he if —1 counterclockwise direction, and when he counts course. of twice, count zeros zero in the clockwise direction. Double In the general analytic function, the u =0 and v =0 curves can be distilted at an angle to the coordinate system, they can be somewhat _torted and involved, but they must obey the three following restraints: 1

according to Ata zero the curves must cross alternately and be spaced the multiplicity of the zero.

71

72

COMPLEX ZEROS

2

3

Far away from any zeros, the local placement of the zeros must tend to fade out and present the pattern of an isolated multiple zero having the number of all the zeros inside (with their multiplicities). The number of cycles of 1, 2, 3, 4 going counterclockwise along any closed contour must equal the number of zeros inside the contour (when counted properly).

These three conditions so restrict the behavior of the curves we are following as to eliminate many pathological situations and make the problem tractable.

PROBLEMS 1

3.6

Sketch the curves for the polynomial having zeros at z=i,

2

z2=-i,

z=0

Sketch the curves for

w=zt+1 3

Sketch the curves for w=

3.7

24+ 227+ 1

An improved search method

One of the main faults of the crude method is that it wastes a great deal of machine time in calculating the function values (and corresponding quadrant numbers) at points which lie far from the u = 0 and v = 0 curves and hence give relatively little information. iy

x

Instead of filling in the whole area of points, we propose to trace out only the u =0 curves and to mark where they cross the v = 0 curves, which give, of course, the desired zeros.

3.7

AN IMPROVED SEARCH METHOD

The basic search pattern is to go counterclockwise around the area we are examining and look for a u =0 curve, which will be indicated by a change from quadrant number 1 to 2 (or 2 to 1) or else from 3 to 4 (or 4 to 3).

When we find such a curve, we shall track it until we meet a vo =0 curve, which is indicated by the appearance of a new quadrant number other than the two we were using to track the u =0 curve.

Having found the general location of a zero, we shall pause to refine it, but we shall need to pass over the zero finally to continue tracking our u =O curve until it goes outside the area we are searching for zeros. We need to know if this plan will probably locate all the zeros (‘probably depending on the step size we are using and not on the basic theory behind the plan). By the principle of the argument, the number of cycles we find is the number of zeros inside the region. u=0 (and v =0) curves we care about must cross the Thus, the boundary of the region—they cannot be confined within the region—and our search along the boundary will indeed locate all the curves we are looking for (unless the step size of the search is too large).

Impossible

It may happen that occasionally there is a jump we numbers, say from 1 to 3. We can easily see that where Just curve. crossed both a u =0 and a v =O it cross is, of course, not known, though probably

in the quadrant have in one step these two curves is near the edge.

73

74

COMPLEX ZEROS

Outside If we want if we wish our search going into

3.8

Inside

Find zero

to find this zero, then we assume that it is a 1, 2, 3; whereas to ignore it, we assume that it is a 1, 4, 3. We have to modify plan accordingly, but this is a small detail that is not worth at this point.

Tracking a u =0 curve

How shall we track a u =O curve? For convenience we start at the lower left-hand corner of the rectangular region we are examining for zeros and go counterclockwise, step by step, looking for a change in quadrant numbers that will indicate that we have crossed a u=0 curve. When we find such a change, we construct a squaret inside our region, using the search interval as one side. u=0

Start

The u =O curve must exit from the square so that a second side of the

square will have the same change in quadrant numbers. We continue in this manner, each time erecting a square on the side that has the quadrant number change, until we find that a different quadrant

number appears. +Squares only if x and y are comparable in size or importance (or both); otherwise, suitably shaped rectangles to maintain relative accuracy.

3.8

TRACKING A

u=0 CURVE

When this occurs, we know that we have crossed a v = 0 curve and that we are therefore near a complex zero of the function we are examining. As a practical matter of machine economy we evaluate one point of a new square and check that side for the quadrant number change, before we evaluate the second corner of the square. In this way we occasionally save one function evaluation, but see Sec. 3.10 for why this

is risky. It is easy by eye to make the new square, but it is a bit more difficult to write the details of a program that properly chooses the two points stage to of the next square to be examined. It is also necessary at each region the of out us check whether the curve we are tracking has led we are searching,

we come to it at a later and if so, we must mark the exit so that when we do not track this r), contou the around going time (while we are direction of course. same u = 0 curve again, this time in the reverse

75

76

COMPLEX ZEROS

PROBLEMS

3.8

1

In the example of Sec. 3.3, apply the tracking

2

zero in the first quadrant. Find the complex zeros as in the example

method

to the complex

in Sec. 3.3, except

in the

region 0 =x S27, 2a 0,h>0 g>0,ha kU

y

ole

ow

i

re

ov: < A0>,

P&G fe

(ies of} opt? gua tot isew Mtyguos show Hiv cortiem sounds eT onnioa’ én ¢vit dw ti ores slams 2 tn Rolgen irtgit tg team tiv a ton Gaeh MamUgie sit to Giginolig ard sanies 4 rise ton 6t a ow .be 1695 ; JS visbaced aeSOG GW UB aevis srt) ie Omit \Narte ow tacitpa se

atugaiatery uvig aemitemae lliw bodiem ie {

r

c

te .

ek

C=

aide

of-2v

>

a

é



"

,

1 os

d

F

»

»

|

ed

°

\

1

Tria

Li

@ntt-o8 gf Wa!

ing

ar)

Sat

,

i

j

| ae

at

a

FF

lephy

Viera We

ee

WON

At

fond &4 werting i

c

-

=

aC

we

CFIC

eg

od

7

:

er er

7

P/

vprart Ff1)

ry

tea

A

v

el

STiet

a

we ~

7

19

Oe

ane

RW.

un

are

=

fir ap ih

pela nell

epolonual

yeti it

=

OA , Ce val Ge ee i,

ie



rae - aly

re

Se

_ a

;

ey

tem

#

a3

@ geen Plat,

Sra

BOs

i" oe

Poh

6Fucriieias Of Axa reales

tage srt we

are

:apee

|

eae

al Ware baF £6 © ;

A.

| 6S

eS we nh Tok

(0 Wires

ate. aT*

i

bevy. chow nay (ergs

fon Method, an (alte

Ta.

»

"\C, + e3x) =e -7(1+x)

But if owing to roundoff the roots were pi, =—l+e P2=—l-€

4.2

SCALING

Solution y=

Gers

&)r + c,e-At

&)r

Apply initial conditions l=

CYFhics

0=—(1 a 2

—&)c, —(1+ €)cy

(1+)eru-em 4 (1—=)e-wren € €

Note roundoff trouble, especially near x = 0.

It is for this reason that the proposed routine differs so significantly from those usually offered by the pure mathematicians who are usually anxious to show how well they can separate zeros; we are equally anxious to find all multiple zeros so that at a later stage we shall not face roundoff disaster. Different objectives produce different methods. We shall confine the treatment to real polynomials; thus, we propose to find the real linear and real quadratic factors. If the complex zeros are wanted, they are easily found from the real quadratic factors by using the quadratic formula for the zeros. Another purpose of the chapter is to extend our analysis of Newton’s method.

4.2

Scaling

on the n degree same the of polynomial another in polynomial that results We equation. original the of those to related simply are zeros whose have a number of such transformations available. The simplest transformation is to divide the whole polynomial by the leading coefficient, which makes the first coefficient equal to 1.

The idea of scaling in this case is to make a transformation

Oe



1 P,(z)

an

=z"

Gy-

+——

a

2"! 4+- Hip ie ee

an

=0

an

eeES ee

Another transformation is to replace z by a new variable z’ z'

aed b

84

ZEROS OF POLYNOMINALS

or

z', = bz

and multiply the equation by b" to produce which the coefficient a,-, is multiplied by b*.

a new

polynomial

for

@ + +e oQ)-o@ee becomes

= ar + (22) aig. a_ofw

snsigtenatisn

n

2.3

+(27)=0 ee

3

eee

t term equal to If we now pick.b to make the coefficient of the constan toward a form rmation 1 in size, then we have made a second transfo that is more easily understood.

Set dy)b" a

an or

ieee

| it ao

Fn pyica) = (ci) |rSE 0 tt a

a

Ip -, Qn

A third transformation is the substitution z = —z' which changes the sign of alternate coefficients. It could be used to confine our search to the positive half of the plane if we wished, but this seems, in practice, to give us no advantage. A fourth transformation is the substitution

2=—

z'

followed by multiplying the whole equation by (z’)", which has the effect of reversing the order of the coefficients and of taking the zeros that lie outside the circle |z| =1 in the complex plane and bringing them inside.

4.2

(2')" Pa(=)= (2) z!

an

SCALING

ri Gn-1 ata

cits ae

(z')"— (z")" = o(z')" + ay (Zz!) +++ + ay

1 _ (cos @—isin 4) p(cos 6 +i sin 4) p or

AS eis Sty SSR TPM eee le

As a result, we can search the two finite intervals —1 =z=1 and —12' becomes (2')3 + 3(z’)? — 22’ + 7=0

If in turn this has only even-degree terms, then we repeat the transformation until we finally come to a polynomial which has at least one term of odd degree. The presence of this term will assure us that except in a fantastically unlikely combination of roundoffs we shall get a nonzero value for Ap. In loose language, we might have been exactly on a ridge of a saddle of the function P,,,(z) and have needed a little push to find our way down to lower values; the term of odd degree will supply such a push.

4.9

A MINIMIZATION APPROACH

When shall we apply the test for even-degree terms only? One place would be immediately after we have checked that both a, #0 and dy # 0. Another place would be when we are about to start the search for quadratic factors. To be safe, we can try it both places.

Note that the reduction can produce a polynomial of odd degree which would have at least one real zero; hence, we go back and search for real zeros again, even for multiple zeros!

We have, now, to consider how to undo the effects of the transformations z?—>z’. Probably the simplest way is to take the zeros of the factors we have found, write them in polar coordinate form, and apply de Moivre’s theorem, using the proper multiplicities.

De Moivre’s theorem (cos

6+i sin 6)" =cos n@+i sin né

Alternatively, we may note that it is easy to factor z*—pZz*—q

into

(22 + az + V—q)(z? —az + V—q) where

a? =2V—q +p We have to repeat this process the proper number of times (plus the simpler process for any real zeros that may appear).

4.9

A minimization approach

lead us to a We are still not certain that Bairstow’s equations will tions have modifica the smaller value—only reasonably confident that

method in - removed the worst objections to what is basically Newton's we are that sure ically mathemat being on insist we If two variables. value, then we smaller a find can we which in direction a in pointed 9. need to use some of the minimizing techniques of Chap.

99

100

ZEROS OF POLYNOMIALS

We are trying to minimize the distance f(p.q)

=

fo": |Pon(z) |? — —qr;* sf Prifo =}

Consider ‘‘a level curve” along which

flp,q) =¢

f (p,q) is a constant,

f(p.q) =c A nearby point on the same level curve would be

f(p + Ap, q + Aq) =c Their difference is

f(p + Ap, q + Aq) —f (p,q) =

af

-2ug eaeFells hen

terms

The tangent line at the point (p,q) is given by the linear terms

of Bp

of

Ap iheF Aq

=0

where now the point (p + Ap, g + Aq) is on the tangent line.

Perpendicular lines have negative reciprocal slopes, m,=-— Mo

A line perpendicular to this level line is

of Ap rs

Of Pt

si Bi

and points in the direction of ‘‘steepest descent.”’

4.9

A MINIMIZATION APPROACH

How far do we want to go in this direction? We know in this case

that we have a minimum of zero. Expanding f(p,q) about the point (Po,Go) On the level curve, we have, keeping only the linear terms,

F (P19)= f(P09) +f Ap + # Aq We want f(p,q) =0 so that this equation plus the equation for the direction,

2f oor +34 Aq =—f(PoG0)

of 3q 4? ~ws ap 49

=0

[where the derivatives are evaluated at the point (po,q0)], determines the step sizes Ap and Aq

—f (8f/dp)

AP = Taflap)® + (af/aa)* —f (af/dq)

44 = “affap)? + (@f/8q)" We now find the partial derivatives f(p.q) = —9ri? + prito + To"

3f= —pqryts — 2r:f2q + p* rors + Prove + 2rorsd

p

2f=—r

q

— 2rirsqg + prits + profs + 2rore

101

102

ZEROS OF POLYNOMIALS

where we have used the earlier relations (Sec. 4.7)

Fara

tts

aiana

The trouble that will appear is that at the minimum

of 0; ee0q 0

Op

so that the denominator approaches zero quadratically—but so does the numerator, so that in principle the limit is correct. Underflow is, however, a problem.

4.10

Multiple factors

The problem of multiple factors is theoretically quite difficult, as shown by the following example. In practice it is not so bad; rarely do more than double zeros occur in actual problems.

fiz) =(z + 8 = 8 + Bz? ++ + ++ 70z*+ ++

falz) = (z + 1)° + Go)®

+ 8241

= 28487 +--+ +7024 +--+ +8z+1+4 (7)!

To eight decimal places the coefficients of f,(z) are the same as those of f(z).

But the zeros of the first are at z =—1, whereas those of the second are on a circle of radius 4 about z=—-1.

4.10

MULTIPLE FACTORS

The theory of real zeros was carefully developed so that the obvious parallel would apply for quadratic factors with complex zeros, and so nothing more need be said. The same form for bounds on the roundoff are used, except, of course, we divide by the trial quadratic at each step.

103

a

;

ae

SSUUDAT BTU,

tm

-

«



:

ml.

—-

va

:

“7

ene sichail srtwa boqotevab yitube isa -anveeorss lawrto oer edt” (98 Brg ours xsikgoo hw etofze? oiTRIbRYP yr yigae divow lateral

th.

statue att fo abrued 2ot ma! oman oT bie od bear s1om nit on q

Mihi ° jerrucu to agenne bast ms

ie Ditssdeup inte ant yd sbi ne ow

a,

\

a. :

;

+s

*

,

=

i

>

7

> Oe ode



a

a

ee

=

_

-

7 .

=

J

7

@

yar thet Soule a a

ne

ee

i+

Ps

;

_

a

a?

>

+

- ‘“

e

ea

te

i

:

hy?

-

ed J

>

>

_

_

i)

ene

=

=

She

ide the) e)

=

ig to

:



tebe



6 Powel afte ys tie i

(ruey

—_

ee :

_

Tie grettan ofMalice We lollsenrsy Al tu” ve ‘meen =



_

_

a

ae

ys

>

>

wa feamero :

I

oorrect

>

P

.

*'

bcZ rere

r

So) en NG



G head witaty~

Bo ithe 1 @

de

a

7

G67)

"6

cane,

=

_

20"

es

Ax.

:

Be


les

of

id

ta

_

;

a)

;

ra

}

~

7

x 7

se

nn

a|

co

‘Ta.

-

7

=

of



~

a

:

;

7

®

ee

==

e

yeas

.

ee.

a

ee

7

A ars ‘

SIMULTANEOUS

5.1

LINEAR EQUATIONS AND MATRICES

The idea of Gaussian elimination

The problem of finding the numerical solution of a set of simultaneous linear equations Gy,1%1 +y,9%2

+ ° + + aintn = dy

Go1X1 + AeexX2 ++

* + denXn = b,

OnX1

* + an nXtn = b,

+ GneXe

Fs

occurs frequently. It is necessary to examine the theoretical as well as the practical aspects of the problem in order to understand what happens in actual practice. With the sigma notation,

The sigma notation n

Seas es

ts 4 Xe

i=1

Special cases: > c=c ry l=cn i=1

t=1

1

bs xy =%) i=1 0

5 a

0

i=1

Notice in n

y i=1 i andj are “dummy indices.” ee

n

=Dj=1 %

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

the system can be written as n

>» G1,5X; =D,

j=

»Y A2,;x; = be j=1

n

Dd 4n5Xs = by j=1

or even more simply as n

DY 4isx5

= by

Sy

Re

ere et

J 1

This compact notation is useful but should fool no one into thinking that anything fundamentally new has appeared. An example of the importance of notation

Consider multiplying CCLXIV by DXLIX or multiplying 264 by 549. Which is easier?

The Gaussian elimination method for solving these equations is usually taught in an elementary algebra course. The method proceeds as follows: Divide the first equation by a,,, to make the first coefficient equal to 1. Multiply this new first equation by az,,, and subtract the result from the second equation. Multiply the new first equation by a3, and subtract this result from the third equation, and so forth. Inn — 1 steps we get n — 1 equations, none of which have the first variable x,, but have only the variables x2, x3,..., Xn-

Solve

(1)

3x + 6y + 9z =39

(2)

Qx + 5y —2z2=3

(3)

x+3y—-—z=2

Divide (1) by 3

(1') (2’) (3’)

x+2y+3z=13 Qx + 5y —2z2 =3 x+3y—z=2

5.1

THE IDEA OF GAUSSIAN ELIMINATION

Subtract twice (1') from (2') and (1') from (3')

(1") (2") (3”)

x+2y+3z=

13

y — 8z =—23 y—4z=-11

Divide (2”) by 1, and subtract (2") from (3") ue) (De")

x+2y+3z= 13 y — 8z = —23

(372)

4z=

12

Next we repeat this process and eliminate the variable x. in n—2 steps. And'so forth.

If all goes well, we finally come down to a single equation in x, which is easily solved. We then substitute this value of x, in the equation having only x, and x,-, and solve for xp-}. eee Back substitution z=3

y =—-23+24=1 x=13-—2-9=2 Hence, x=2

Vea z=3

is the solution.

ee

ee

ee

the The obvious repetition of this back-substitution process produces the Thus %1 -.-5 Xn-1, xp, sequence the in solution, one x, at a time system of equations is solved. In this process we have divided an equation by its leading coefficient n times,

ee The divisors were

3, 1,4 Hence the determinant has the value 3x1x4=12 Eee ee

107

108

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

and each of these divisions changes the value of the determinant by the corresponding amount. The final determinant clearly has the value of 1, and since (by Theorem 5 of Appendix A) none of the other operations changed the value of the determinant of the system,

Product notation ays [] 1 =

122°

ETS

i=1

Determinant

lay| = Il d, i=1

where the d; are the divisors used.

ie et ieee lal

sa Saipa ial wet

i el tl til Tew

it follows that the value of the original determinant is the product of the divisors we used.

PROBLEMS 1

5.1

Solve the system xty+z=6 2x + 3y+z=1

x—-yt2z=3 What is the value of the determinant?

2

Solve the system

t—ytz=1 2x + 2y—-z2=3 3x —S3y+z=1

3

Solve the system

ax + by =e

cx +dy=f

4

Count the number of additions, multiplications, and divisions in the total Gaussian elimination process as described above.

5.2

5.2

PIVOTING

Pivoting

If the solution of simultaneous linear equations is this simple, then why is a large part of a chapter devoted to this topic? The answer is that we have neglected a number of important details, including the difficult topic of roundoff errors. Gaussian elimination with pivoting, either partial or complete, is the generally recommended method for solving linear equations. By partial pivoting we mean the selection of the largest-sized coefficient in the next column and the use of the corresponding equation as a basis for the elimination process.

(1) (2)

x+y+z=6 Q2x— y+2=3

(3)

3x+ Qy—z=4 big0 pivot

(3’)

x+3y-32=4

Subtract (3') from (1) and twice (3') from (2)

;

1

(1)

tae

gut 3gz=%

(2’)

—fythz=4 =

pivot

y—9z=-7 1

(2”)

Multiply (2'') by ts and subtract from (1') (1)

Hz z

Il

ers

Put this in (2") —

ll |

oe

+

ll vic ll “Wz!

Put both in (1)

x=6-2-3=1 Solution

to

109

110

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

This avoids the difficulty of what to do if the coefficient we wish to divide by happens to be zero (but see later if all are zero). Complete pivoting is the selection of the largest-sized of all the available coefficients as the basis for the next stage of elimination, which, of course, operates on the corresponding variable. It is generally agreed that it is usually not worth the extra search time to do complete pivoting. The effect of doing partial pivoting can be regarded as merely a reordering of the sequence of the equations in the given system, though this should not actually be done in the machine because, among other things, it would greatly confuse the user if he tries to find out what happened inside the machine.

5.3

Rank

The concept of congruent triangles in high-school geometry does not make a distinction between left-handed and right-handed triangles, though this is important in some applications of geometry. Similarly, the usual definition of the rank of a determinant (Appendix A, Sec. 5A.4) does not distinguish between row and column dependence (in a certain sense they are equivalent, but there are practical differences).

Column dependence

Row dependence Q2x+4y+

=13

Wx+4y+2z2

xt+t2y+2

=I11

pm

x+

A

XE

Oy+5z=

2

ele ie i ae eae Column of all zeros

z2=13

he y a x+2y+3z=1,= 135 Pivot/

wot” x +2y -Z= =2 Pivot/

y+

z=

6

x+2y +$z= Oy +0z= —y

+32

8 O

=-3

Row of all zeros

This distinction is often important to the user of a library routine when he wishes to understand what has happened to his problem if the system does not have the maximum rank. How are we to recognize these two kinds of linear dependence? In the partial pivoting process, if we find in searching for a pivot in the next column of numbers that all of the numbers are zero, then this clearly (with a little thought perhaps) shows that the column of zeros is some linear combination of the preceding columns. If instead we were to find a row of zeros, then this indicates that we have a linear dependence among the rows that we have so far used. Thus, in the

5.3

RANK

process of searching for our next pivot, we need to check each zero in the column to see if it is part of an entire row of zeros, as well as to check to see if the entire column is zeros. Having found a linear dependence, we should try to locate it as closely as we can so that the user is pointed in the right direction, since it is highly probable that he intended the equations to have the maximal rank. To locate the dependence more closely, we can drop one rowt (column) at a time and restart to see if the same row (column) of zeros appears. If it does not, then the dropped equation (variable) is part of the linear dependence;

if it does, then we drop the

row (column) and try dropping another. In time we are led to a minimal set of equations having the linear dependence. In principle there can be arbitrarily difficult combinations of row and column dependence and an arbitrarily large decrease in rank, but in practice this is unusual because, as we have said, the user generally intends the system to be of the maximal rank so we shall not go further into this intricate topic.

0

OE

Oe

I50

0

0

O

Oey

OO

Block of zeros

The relevant mathematical prove) is:

theorem

(which we shall not bother to

If the rank r of the matrix of coefficients of the unknowns is the same as the rank of the augmented matrix, then r of the equations can be solved for in terms of the other n — r variables; and if the ranks are not the same, then the equations are inconsistent. parentheses or always the fIn this notation, read always the word before ihe word in the parentheses.

111

112

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

It is perhaps worth noting, especially for column linear dependence, that many of the variables can be solved for uniquely and only some of the variables have a degree of arbitrariness.

PROBLEMS 1

5.3

Solve xty+z=

6

ox —y +2——=2 2x+0y+z=

2

2

Solve x+3y+2z=10

x—-y+0z=-1 xtyt+ z=

3

6

Solve

yt

2—a

x+y+z=a xt+y+z=a

5.4

Roundoff

The main difficulty with the preceding discussion is that it ignored the vital problem of roundoff.

Given the system x+fy=1 Qx+ zy =2

The rank is 1. Put the system in a computer:

Note coefficients .100 x 10'x+ .333 x 10° y/=.100 x 10? .200 x 10'x + .667 X 10°y = .200 x 10!

Now the rank is 2, and we can solve the system uniquely! .100 x 10-*y =0

ee x=1

5.5

SCALING

As shown by the trivially simple example of rank 1 in the insert, as SOON as we commit roundoff, we raise the rank of the system to the

full amount. Add to the roundoff of the initial coefficients, the conversion to binary and the subsequent arithmetic, and it becomes clear that we cannot in general expect to find the zeros we discussed in the section on rank. We therefore face the question of how small is ‘small.’ Unfortunately, the answer is not easy to give. For one reason, once the equations are fixed, then the value of the determinant is also fixed and the product of all the divisors (which are the largest numbers in their respective columns) is also determined. Therefore, the more successful we are in finding large pivots in the beginning, the more sure we are that we must find compensating small ones later on. In short, in the process of Gaussian elimination there are large-scale internal correlations in the arithmetic, and we are unable to follow them through in a simple fashion so that we can give a simple meaning to the idea of “small.” When we suspect a rank less than n, we can artificially try various (multiplicative) levels of roundoff noise and call numbers less than this noise zero, but there appears to be no way of having the routine in the machine make reasonable judgments on this matter and then use them automatically.

5.5

Scaling

The purpose of preliminary scaling is, among other things, to give meaning to “large” and ‘‘small.” As we did for polynomials, we examine the question of equivalent systems that can arise from scaling; unless we can somehow scale the equations properly, then what significance can there be in the statement, ‘‘take the largest number in the column as the pivot,” other than the chance of the way the equations were written down?

Consider the matrix of coefficients

© Ne

min am

where we assume that € is small and that l+e=]

113

114

SIMULTANEOUS

LINEAR EQUATIONS AND MATRICES

Scale by rows

Sa (1

eruleal 2e :)

2

ces

Pivot

Pivot

v[m rl wl_m vn[mm

bo

|

==)

vlm me

via oe

and the back substitution would be easy. But if we scale first by columns, Pivot 3

eal,

(1 Qe? Oper

3

1

1

ie

td

2

2 TS

OnpE Vm

«|

come

ag

The rank is 2.

It is obvious that there are n scale factors that can be used on the rows and another n on the variables by using x; = kx;

without changing the system in any fundamental way. We could, if we wish, also scale the constants on the right-hand side, but this is seldom worth the trouble. We therefore have 2n scale factors to decide upon.

5.6

A METHOD OF SIMULTANEOUS ROW AND COLUMN SCALING

About all the help in scaling one can find in the standard textbooks is something like ‘scale by rows and then by columns (or the other way around) to get the largest element in any row and any column to be around 1 in size.’ The simple example above shows the superficiality of this advice. Evidently, the usual advice about scaling is inadequate, and we are in trouble if we try to defend the pivoting method in detail.

PROBLEMS

5.5

Scale as best you can: 1 x+10y+100z=3 10x —y+10z=72 100x + 100y +z=9 2 100x+10y +z= 100 100x—y+xz=10 100x+y—z=1 3

100x+y+z=72 x+100y+z=9 x+y+

5.6

100z =-7

A method of simultaneous row and column scaling

Since we use multipliers of the rows and columns as scaling factors, it is natural to look at the logarithms of the absolute values of the coefficients. Let us imagine that the rows have been multiplied by 2 (i=1,...,n) and the columns by 2% (j =1,...,n+ 1), where r; and c; need not be integers.t We shall also multiply all of the n(n + 1) elements of the system by 2”. We write the augmented matrix G11 Gin b, Maca sewawisn Rees k omic s Gn,

ann

=A

br

and set

lai.5|

ee

=

9b

2”

(61h, ok oe, 12) j= 1,

.,n+1)

+ 1 columns since tit may well be preferable to scale only the first n of the n If so, the modifipivots. of selection the in involved not are terms constant the cations are easily made.

115

116

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

for the moment

where din: = b; Supposing new exponent is bij

+M

that a,;

#0, then the

++;

In order to make all n(n + 1) new elements simultaneously as close

minimize

to 0 as possible, we will tial n

m=> i=1

ntl

Ds (b, 3+ M +17; + ¢;)? j=1

We have more than enough parameters, and we select n

Ml

nti

Somes bis i=1 j=1

as the negative of the average of all the b,,. As the first step in finding the minimum, we differentiate with respect to the variables r; and c;, and set them equal to zero to get om

ara = 25 by tM trite)

om

re =25 j

(by +Mtnte))

=0

(j=1,..., 7)

=0

(G=1,..

i=1

These equations have the solution n+1

n=

2 (b,3

+ M)

hence,

==>

(bi3 + M) i=1

hence, n+1

> co, =0 j=1

05.

1)

5.7

ILL-CONDITIONED SYSTEMS

Direct substitutions show that these are the solutions since

2[ —(n + 1)r, + (n + 1)r, +0] =0 2(—nce,;+ 0 + nc;) =0

Thus, the r; and c; are the appropriate averages of bis +

M

This is the classical analysis of variance. If any of the coefficients a,, = 0, then there is no corresponding b,,; and we must treat it as ‘‘missing data’ as discussed in conventional statistics books on the topic. In this approach to scaling, we have minimized the variance of the scaled exponents of the terms that are in the system (and have excluded the zero terms). AS usual when we come to carry out the scaling, we calculate the scale factors to use in picking out the pivots, but we do not scale the equations themselves, so that the question of integer solutions to the scaling equations need not arise. In practice it may be sufficient merely to use the exponents of the a,,; as the bis without taking the full logarithm. Unfortunately, this method of scaling cannot be shown to be relevant to any known method of solution; it is merely a plausible method to use in place of the usual ‘‘scale by rows and then by columns” rule.

PROBLEMS 1 2 3

5.7

5.6

Scale Prob. 3 in Sec. 5.5. Scale Prob. 1 in Sec. 5.3. Scale Prob. 1 in Sec. 5.5.

Ill-conditioned systems

Since we have failed to be precise both in identifying zeros so that we can determine the rank of the system and in finding a sound meaning do to scaling, it is necessary to find some concept that will at least term The system.” part of the job. This idea is the “ill-conditioned in “itl-conditioned” is ill defined. The vague idea is that small changes we If result. final the in changes large produce can _ the initial system say ‘relatively are to take floating point seriously, then we should in a sense the Thus, small changes” and “relatively large changes.” rank in the of idea the for idea of ill conditioning is a substitute

118

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

presence of roundoff. Alternatively, it can be said to be an attempt to cope with the idea of linear dependence in the presence of roundoff noise. We seem to need some idea like ‘‘almost linearly dependent,” but as yet this idea has not been formulated clearly. If the ill-conditioned effect is in the original physical system, then it is usually called “unstable.” Thus, a pencil balanced on its point is an unstable system since small changes in the initial position result very soon in large differences in the subsequent positions of the pencil.

lt

Unstable system

The ill conditioning may arise not from the original physical system but from the mathematical formulation of the problem. in a sense the material on function evaluation in Chap. 1 shows how different formulations of mathematically equivalent equations can lead to accurately computable or to very inaccurately computable expressions. Another example is in the choice of the basis for representing the function (see Chap. 12 for still another example of this). Thus, if for the differential equation

we choose the hyperbolic functions as our basic solutions, we get the solution y = cosh x — sinh x

5.7

ILL-CONDITIONED SYSTEMS

wnereas if we choose the exponential functions (which are mathematically completely equivalent), we get

y=e* For large values of x the second is easily calculated, whereas the first is impractical. Yet another example is in the classic case of the circular vibrating drumhead. If we introduce cylindrical coordinates, as is customary, we find that the differential equation we have to solve has a singularity at the origin—a singularity that is not present in the original problem!

EEE (99““1IaI?5(£_e

Bessel’s equation

problem. Singularity is due to the coordinate system and not to the physical ee

on of Thus ill conditioning can be a result of the matematical formulati of ation reexamin a by the problem, and if so, this can best be handled through get to attempts any by the mathematical steps rather than . somehow by double or triple precision or other fancy gimmicks the way, le reasonab a in ed formulat is problem the Even when vague definithe only Using unstable. it make may solution of process tion above, we announce the simple theorem:

Pivoting can take a well-conditioned system into Theorem ons. an ill-conditioned system of simultaneous linear equati

119

120

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

The example of scaling in Sec. 5.5 shows that the ill condiProof tioning arose not in the scaling but in the choice of the top line to eliminate from the other two lines.

The symmetric system Pivot

Bx+ Qy+ z= 34+ 8 2x + 2ey + 2ez =6é x + Qey — ez =2e

appears to be properly sealed, so we pivot to get

(— 4+ 2e)y + (—3 + 2e)z =-2 + 4e

(—$+2e)y+(-3— ez=-1+ @ For small ¢, these are ill conditioned. But if we eliminate between the second and third equations, we get &y — 2€z =—€

In floating point, we divide out the €,

and can now solve for y=z=1

Put in the first equation, we have trouble, but it is safe in either of the others, to get

The example in the insert proves the theorem for simultaneous equations with the use of a simple symmetrical, 3 x 3 system. There are a number of misconceptions about ill-conditioned systems. One misconception is that once the system is scaled, then a small determinant of the system must mean it is ill conditioned, but as we have just seen, this is not so. Another misconception is that somehow ill conditioned is connected with ‘‘the angle between the graphs of the equations.’’ However, in floating point arithmetic, a scale change of x = Ax can change the angle radically without changing significantly the arithmetic we shall do to get the solution.

PROBLEM 1

5.7

Given the lines

y=mx+b, y=mx+b,

5.8

WHY DID THE EXAMPLES WORK?

show that if we scale x = Ax, then there is a \ such that for m,m, 0, find the \ that makes the angle greatest.

5.8

Why did the examples work?

It is reasonable to ask how typical these examples are and how often in the past the pivoting method has created the ill conditioning that was reported to occur by some library routines. The answers are not known at this time; all that is claimed is that textbooks and library descriptions rarely, if ever, mention this possibility (though it is apparently known in the folklore). When we did the pivoting in the examples, we added to the two lower equations the large numbers in the y and z columns of the top equation. If in the earlier example we set x = ex, the equations become

Bex+2y+ 2=343¢ Qx + Qy + 22=6 x+2y-—

z=2

and we would never choose the 3¢ coefficient as the pivot. Thus, in a sense the example depended on the “wrong units” in x.

pee SIH

seviber ie

AAP 1D fine) Sans

oo YM ie

a

Se

Thus, we later found a “linear dependence” because of the finite size of our computing system. Indeed, it is now easy to see that in general the use of a pivot for the elimination of a variable from all the other equations leads us to a set of equations where we ‘‘see”’ the original pivoting equation everywhere we look in the derived system; we are apparently trying hard to make the system linearly dependent! A system of equations Large numbers

Pivot

that pivoting is But what of the well-known arguments that “prove” a good thing because it tends to reduce roundoff?

121

122

SIMULTANEOUS LINEAR EQUATIONS AND MATRICES

Pivoting produces effective multipliers of the pivoting equation that are less than 1 in size; hence it apparently does not amplify the earlier roundoffs. If Ly means a typical equation and |A| < 1, AL, ata L, —

Lz

is a typical step in the elimination process. But

1

1

L, 1 +—L, rN 2 =»

3

»

2

2

_

xX3

_

Hence + Aa

ek

%

Pit

_

Dears 1

2

Pes 4%

Again +

To examine this point, consider the function defined by

P = P(x,, x2,..., Xn)

P = (x, — X2) (3 — Xa): + +(x, — Xn) (Xg"— Xa)> * (XQ — Kaloo Ge

=e)

n

=

I] (x; — x5) i DiAs,5

If all the b,; =0, then the system of equations is Definition said to be homogeneous.

Corollary 1

If D # 0, then the solution of the homogeneous

system is 28) oa ples

--=%,

=0

That this is a solution is obvious. Suppose that there was also some other solution, and let x; # 0. By Theorem 9 we are led to a contradiction since all the b; are zero; hence the solution is unique.

5A.4.

THE CASE OF

D=0O

Corollary 2 If D #0, then the system of equations has a unique solution. If there were two different solutions, then their difference would satisfy the corresponding homogeneous system which by Corollary 1 has only the trivial x; = 0 solution.

5A.4

The case

D=0

If the determinant of a system of equations is zero, then there are a great many things that can be the cause of this, the most probable being that the problem is formulated incorrectly. In view of this remark we will give only a brief summary of the case D = 0. First, consider only three variables so that each equation can be viewed as a plane in the three-dimensional Euclidean space. - lf D #0, then the three planes intersect in a single point.

lf D =0, then one of the following can happen: [1]

[2] [3] [4]

[5] [6]

Two planes intersect in a line, and this line is parallel to the third plane (inconsistent equations; no solution is possible).

The three planes intersect in a common

line (degenerate; many solu-

tions are possible). Two planes coincide and the third is parallel (inconsistent). Two planes coincide and the third intersects the common (degenerate; many solutions are possible). All three planes are parallel and distinct (inconsistent). The three planes coincide (degenerate).

plane

135

136

APPENDIX: MATRICES

s require To treat the general case of n equations, where the drawing inan n-dimensional space and the complications are much more volved, we need to introduce two new ideas. of the largFirst is the idea of rank, which is simply the order (size) columns) and rows various omitting by (formed est subdeterminant that is not zero. Second is the idea of an augmented matrix, which is the matrix (a;,;) with the column of b,; adjoined on the right, written as (a,,;,55); the rank thus, the matrix has n rows and (n + 1) columns. The idea of more colone course, of (except, same the is matrix ed augment the of umn than row must be omitted in forming the square subdeter given. is theorem minants). In Sec. 5.3 the main

APPENDIX: MATRICES 5B.1_

Basic operations

A matrix is a rectangular array of elements a,,;; thus, Qi:

Greg

“°°

Gin

Go1

Go2

°“"*

Gan

Gmi1

Ome

eeeteeeee A = Ay H | ceeereceeeteet Amn

is a matrix of size m X n (m by n). Two matrices are equal if, and only if, all their corresponding elements are the same. A matrix is said to be symmetrical if a;,; = 4;, for all i and j (which requires m = n). Symmetrical matrix

sD W/tn bee

The sum and difference of two matrices A and B are defined as

A+B=C with

Ci,j = Ay,y + Diy

(where, of course, A and B are of the same size}.

5B.1

BASIC OPERATIONS

137

It is clear from the definition that matrix addition is associative, that

is, A+(B+C)=(A+B)+C

and that the zero matrix O, all of whose elements are zero, plays the role of zero in the arithmetic of matrices.

(e) iT O00 (eee, 0.0.6 ~~

There are two types of multiplication to be considered. The first is by a constant k (often called ‘‘scalar” multiplication), which is defined by multiplication

kA = k(a;,3) = (kai,;)

In scalar multiplication every element of the matrix A is multiplied by the constant k.

Le ki 4 5 7 8

os 6] 2

=|

k 2k 3k 4k 5k 6k 7k 8k 9k

Since this definition differs from the corresponding one for the multiplication of a determinant by a constant k, the beginner is apt to have an uneasy feeling that inconsistencies are likely to arise and give trouole. These fears are groundless because in practice this confusion almost never occurs. The second kind of multiplication is that for two matrices

A-B=C where the c,,,; are defined by n

Cit

> Ginde,s k=1

-

138

APPENDIX; MATRICES

ae Oe ee

eo) ee

ea

“Row by column”

Rule

ie

Sa

igsvng |

ee

ee

Thus the elements of a row in A are term by term multiplied by the corresponding elements in a column of B and the products are

summed to get the corresponding element of the product C. This definition requires that the number of columns of A is the same as the number of rows of B. Thus, C has the dimension of the number of rows of A and the number of columns of B. For example,

bia

(as5

Gig.

“°°

G15)

ben

= (C11)

Baa has only one term where n

Cis >; Gide,

Clearly, multiplication of two matrices is not commutative since in the above case for the product BA the elements of C are

Dia ben

bas

(@11412 °° * Gyn) = (C15) = (b4,141,5)

Even for square matrices the two products need not be the same since there is no reason to expect that m

CD; Cindy. k=1 and

are the same.

5B.1

BASIC OPERATIONS

The identity matrix for multiplication is

It is easy to see that

IA=AI=A Diagonal matrices are more general than the identity matrix in the sense that they may have any values down the main diagonal while still having zero elements elsewhere. Evidently, the effect of multiplying on the left (right) by a diagonal matrix is to multiply the rows (columns) by the diagonal elements.

Diagonal matrix

dy

SOLO

0 0,

d, O

eee

eee ee eee t terrane

O «ds

The associative law of multiplication

(AB)C = A(BC) however, still holds as can be seen from interchanging the summation processes in the general term of the triple product.

> ( ey obs 2

iho > aj,j ( as BDinCk.t

EY

j

k

The definition of a product of two matrices allows us to represent a system of simultaneous linear equations, Gintn

Gist

t

"G1ec0

16

ce

Osikee

tt)

G2st2

1)

a9 GenXn

UME AMEG ecCesTREREC

ECS HST DFE RCUEE ESD CSM

NME CPPS FS

Ee eres

= by =

be

139

140

APPENDIX: MATRICES

as a matrix product

Ax=b where A is a square matrix n X n, and x and b are column matrices of size n X 1. The definition of matrix multiplication leads immediately to the

question whether

|A| |BI=|C]

if

AB=C

It is not obvious that this will be true, though the student is apt subconsciously to assume that it will be. One rather direct proof goes as

follows. Consider the large matrix in which we have written smaller matrices as the elements to be found in the region indicated.

(i) The determinant, by the basic definition, will be the product of all possible products (of the usual type) of A by one of the products of the usual type of B regardless of what is written in the lower left-hand corner. Thus, clearly,

A —I

0

3| ll (BI

Writing the whole matrix out in a little detail, we have

Miarn-duac Go, G2 —1 0 SUPT

re ***

99 O

0 0

bia

b, 2

bey

bo»

0 —1

Eee Ree eee ee eee

eee

eee)

We are going to use the 1s in the lower left to eliminate the a’s in the upper left. When we examine how we eliminate the ith row of A, we shall need to use for the multipliers of the lower rows the numbers Qi, Giz, ..., and these will multiply the corresponding b’s in the jth column, b,,;, be,;,... 3 thus, we shall find in the upper right-hand part exactly n

> Gin dr,y = C15 k=1

5B.1

BASIC OPERATIONS

so that we now have, after the elimination of all the a’s in the upper left,

ORC —I

B

We now interchange the rows of the upper half with the rows of the lower half; each interchange of course produces a sign change which we absorb by changing the sign of the 1 in the identity matrix. We have, therefore,

I B OG

Expanding this by the basic definition, we find that we have the determinant of C. Therefore, the definition of matrix multiplication is ‘‘consistent with” the definition of the value of a determinant, and we have

|A] |B| = |C|

141

rt

:

OOCAT OID

i .

wat.

nore

went!

iia

So

rm

n

A

or

agranih iPual wt ss ow Gace: eo wd fo ary eet. «iw Hart equ ait} %% —s BE herhs rigie & wba ssivod lo apceriomt al Mose ; \iael tow 4 we Gv

artery

qt?

>i ian

4...

O82

Jo

ane

toe

:

7 . 7

CURA (8

o~epal erly womaet ou gnin 3

PI a) ROUESNgiiuM vadww ene Jimi

dong wpaeak arte qth a: Awepttiny

lo ROHAN ol) ore toventT . SW: 6 $6 soy ott io noltiniiog ont” nae ?

=

ane ’

Ac rs 1OSG5 ow

anolevertl @ene

Gant awincie Eti®

Creggew iid

©

1h)

i re

INTERPOLATION AND ROUNDOFF

6.1

S

ESTIMATION

Linear Interpolation

Interpolation is usually first introduced in connection with the study of logarithms and trigonometry. The tables of logarithms and the trigonometric functions are generally arranged so that linear interpolation will give sufficient accuracy (for almost all values). y

ee ay

y(x)

Oo

hi

Error flx;)

x

x

Xe

x

Linear interpolation

The usual table of a function f(x) gives the values for f(x;) to a fixed

number of decimal places for a sequence of equally spaced values of x,. From these approximate values of f(x;) we are asked to estimate often the value of f(x) for some value x that is not in the table. This is lines.” the called “reading between x, The process of doing linear interpolation is simple. Given a value x, and we search in the table for a pair of values, which we shall call

x2, such that a aX Ne

If x, # x, we then assume

that between

the two values x, and x, the

144

INTERPOLATION AND ROUNDOFF ESTIMATION

by a straight line through the two

function f(x) can be approximated points. This line is given by

y(x) = f(x1) +

f (x2) — f(x) ( x—%1) ave Xo

_ f(xs) [x2 — x] + flee) [x — 41] Xg —Xy

and is the desired formula for estimating the value of f(x) from the given value of x. When we are at the end of a table and wish to estimate a value beyond the end, we use the same formula but call it extrapolation. As we shall later see, extrapolation is usually much more dangerous than is interpolation.

Error

fix) oo f(x2)

fix)

x1

Xe

x

x

Linear extropolation

lf we are given the value of the function and are asked to find the corresponding value of x, then the process is called inverse interpolation. For inverse interpolation the same straight line is used, but the equation is rearranged into the more convenient form =

=

f(x) — f(x1)

shale it nosis Ge any een [f(x2) — f(x)] + x20 f(x) — f(x1))

F (x2) — f(x1) where f(x) is the given value of the function which we wrote before as y(x). Note the symmetry of the equations for the two types of interpolation.

6.2

ANALYTIC SUBSTITUTION

Inverse interpolation

PROBLEMS 1

3

6.2

6.1

Derive the inverse interpolation formula. Linearly interpolate for sin 30° from the values sin 0°=0 and sin 45° = 0.707. ' Inversely interpolate for sin x =0.5 from sin 0°= 0 and sin 45° = 0.707.

Analytic substitution

The process of linear interpolation is an example of analytic substitution. In place of the function f(x), which we cannot handle, we substitute a straight line and use that as if it were the function f(x). For linear interpolation the particular straight line was chosen to pass through the two end points of the interval, where we knew the values of the function. In Chap. 2 when we treated Newton’s method, we also used a straight line in place of the function whose zero we were seeking, but we used the value of the function and the slope at the same point, rather than the value of the function at two distinct points. Newton’s method

Fix)

y(x) is tangent line

Zero of y(x) used as an approximation for the zero of f(x)

145

146

INTERPOLATION AND ROUNDOFF ESTIMATION

The idea of analytic substitution is central to much of numerical analysis. Repeatedly, when we have a function which we cannot handle, we replace it by some other analytic expression which we can handle and operate on the new expression as if it were the original function. Two steps are involved in analytic substitution. First, what class of functions shall we use? Up to now we have discussed the class of straight lines. Second, how shall we select the particular member of the class? Here we ‘have tried both using two points to determine the straight line and using a point plus a slope at the same point to determine the line. We shall later examine two other criteria for picking the approximating function: least squares in Chap. 10 and the Chebyshev, or minimax, criterion in Chap. 13. Classical numerical analysis generally uses a class of polynomials up to some fixed degree, and we shall do this most of the time. However, in Chap. 12 we shall consider the class of sines and cosines as

our approximating functions.

Polynomials y(x) =dyp

bayx +++ + + dyx%

Fourier series y (x) = do +d, cos x + dz cos 2x + b, sin x + by sin 2x

PROBLEM 1

6.3

6.2

Describe the “analytic substitution” used in the: a_ False position method. b Secant method.

Polynomial approximation

The values that we are given of a function are sometimes spaced so far apart that linear interpolation is not sufficiently accurate for our purposes. In such cases classical numerical analysis uses the class of nth-order polynomials P(x) = dp + a,x + gx? ++ °° + a,x"

and chooses a polynomial which the function.

passes through selected samples of

6.3

POLYNOMIAL APPROXIMATION

147

The simplest method to understand for finding a polynomial is the method of undetermined coefficients, which we shall use repeatedly.

This method assumes the form of the answer with arbitrary (undetermined) coefficients written in it. Then the conditions, such as passing through the various points, are applied to determine the arbitrary coefficients.

fix)

x;

ad

Condition that (x,,y;) satisfies the expression is y; = f(x:)

in the particular case of polynomial interpolation in a table of values of a function y(x), the condition that the polynomial pass exactly through the point (x;,y;) is that P,.(x;) = yy = do

+ 4X; + °°

+ ayx;"

y(x)

P,(x)

X1

Xe

X3

x4

Xs

If there are as many points, equally spaced or not, as there are undetermined coefficients (parameters) a,;, then we must have n+ 1 points and are led to n + 1 linear equations in the unknowns 4a;:

w=Saxd j=0

G=12..5n+1)

,

148

INTERPOLATION AND ROUNDOFF ESTIMATION

coefficients a; is the famous Van-

The determinant of the unknown dermonde determinant 1 1 D=

mecca cay ghd

Xi Xe

c cw cccccccccvcnncccccccanvescenes

|x? | scan

1

Xn

Kati

Xn+1

which cannot be zero if x; # x; as we shall prove in the next section. We can therefore, at least in principle, determine the unique in-

terpolating polynomial. Interpolation is, then, merely evaluating this polynomial at the desired point(s).

Example

a matter

of

Find cubic P3(x) such that P;(0) P3(0)=1 P,(1)= P3(1) =0

Solution P,

=

P,!

=

P, (0) Ps; @)

= =

do do do

i

a,x

+

+

a,x?

a;

+

Qasx

+

oasx-

ay,

Se

2

+

as

+

3a;

0

+

P,'(0)

=

a;

P;'(1)

=

ay

and

aj=O0,

Qa

PROBLEMS

=]

=) +

2a,

ty

a,=1

{dy Therefore,

Ux?

+ cule

ls Bus

=

0

=-!l

a= a3 =

1 —1

P3(x) =x +x? — x3

6.3

1

Construct the interpolating quadratic through [a, f(a)], [a t+h, fla+h)], and [a + 2h, f(a + 2h)].

2

Construct an interpolating cubic through the points (0,1), (1,2), (2,2), and

3

Construct an interpolating

(3,3). and (2,4).

cubic through the points (—1,1), (0,0), (1,1),

6.4

6.4

The Vandermonde

THE VANDERMONDE

DETERMINANT

determinant

To show that if x, # x,, then the Vandermonde determinant

Die

1

x,

HP

See

1

Xo

Xq?

Ya

ee a

x,"

Meet eyes Mahan s vvissssoaddunidecsouce lt

eter

aha

Xn+1

is not zero, we ignore where it came from and regard it as a function of the variables x, (i= 1, 2,...,n+ 1), DD

(xiites > 4: wei)

First, D is clearly a polynomial in the x, What is the degree of this polynomial regarded as a function of all the x;? Each term in the expansion of the determinant has an element from each row and each column. The element from the first column has 0 degree, the element from the second column has first degree, the element from the next column has second degree, and so forth, and in total the degree of the product is

Tt2+3+¢°*+n=

n(n + 1) -

Degree of this term is

O+1+2+:--tn as is the degree of every other term

Next, it is evident that if x, =x., then two rows of the determinant D=0; if x; =%4, are the same and D =O. Similarly, if x, =x3, then all the factors have must D Thus, xn4:. = then D = 0; etcetera, until x,

ll (x; — X1) = (2 — X1) (Xa — 1) 0

Hanes = y=1

which has a coefficient of +1. When we expand the product form and search for this same term, we find that it comes from the terms on the left sides of the parentheses and hence also has a coefficient of +1. We conclude, therefore, that

D=

I (xi — xj) i>j=1

and D # 0 for x; # x; (i ¥ j).

6.5

6.5

THE ERROR TERM

151

The error term

Having found the approximating polynomial, we naturally ask how much error we make in the analytic substitution process when we use this polynomial P,(x) in place of the original function y(x). We therefore examine the difference

y (x) — P» (x) This difference is zero at each x = x;; hence, we can write y (x) — P(e) = (We — 21)(SB x2) 2X

— Xg

K(x)

where K(x) is some function of x. For any value of x, say x (we are not using complex numbers so this is not the conjugate), we can write

y(x) = P,,(x) = (x — x1)(x— x2) + + « (x — xn) K(x) Consider, now, the expression

(x) = y(x) — Pa(x) — (x — x4)(x— 22) + + + (Xe — Xn41) K(X) where the x occurs only in the K(x) term. For this expression we know that (x) =0

If we

differentiate

for

Gee

ee

this expression

eo Tan

once,

we

and

x

get, because

of the

mean-value theorem,

d'(x) =0

Mean- value theorem

There is a 6 where tangent is horizontal

va

152

INTERPOLATION AND ROUNDOFF ESTIMATION

for at least

n +1

values of x. If we

differentiate again, by the same

reasoning we have

p"' (x) =0 for at least n values of x.

ye

Ae)

cece

URE!

Continuing in this manner, we come finally to

gm? (x) =0 for at least one value of x, say, x = x*, provided the function y(x) has at least n + 1 derivatives. When we actually carry out this differentiation with respect to x on the term

(x = 25) (x= x2) * + © (k= X41) K(x) we find that we have

ot! (x?) = y"t! (x) — (n + 1)! K(x) =0 The expression for K(x) is therefore =

Osc

ype

(

al(eteme

6.8

DIFFERENCES OF POLYNOMIALS

Thus

A drops each single power to a polynomial hence, since A is a linear operator,

k I

of lower

degree;

k CTE =>) a; Ax'

0=1

i=1

is a polynomial of degree k — 1 with leading coefficient equal to a;,kh

If now we apply the lemma repeatedly, we get, after k times, a, kth*

and after k + 1 times, exactly 0. The reason that this theorem is so important (in classical numerical analysis) is that we tend to tabulate our results at a small enough interval so that the function may be quite closely approximated locally by a polynomial of moderate degree. If as we take higher differences, it turns out that the differences of our (equally spaced) results do not get small rapidly, then usually we compute the results at a spacing half as large. TABLES OF THE FUNCTION Y = X? AT DIFFERENT SPACINGS

x

gmx

4

0

4

2

4

12

4 6

8

10

16 36 64

100

OVA

20

eS

36

Atty!

ey

Ay

= A*y

x

y

Ay 1

= A’y

0

0

1

0

0

8

1

1

3

2

1

1

4

6

2

4

5

2 2

et 1

{

—— 4 1 5 2

3

9

4

2

4

8

4

&

16

125

9

2

2

4

p4

if

1

4%

The effect of halving the spacing in x is to divide (approximately) the first differences by 2, the second differences by 4, the third differences by 8, and so forth. Thus, by choosing a sufficiently small spacing, we should in most practical cases be able to get the difference table to have small values. This remark must be tempered by the fact that we compute with the a typical numbers that are actually in the machine and the theorem is

160

INTERPOLATION AND ROUNDOFF ESTIMATION

mathematical proof concerned with infinitely precise numbers. we turn in Sec. 6.10 to the question of roundoff.

PROBLEMS 1

Thus,

6.8

Find the first and second differences (for h = 1) of y= 2x2 — 6x — 7

2

6.9

Show how to construct successive values of a polynomial from the starting differences.

An example of table extrapolation with polynomials

The manufacturing costs of a certain computer component were estimated to be

Year

Cost

1968 1969 1970 1971

63 52 43 38

How are we to judge this, and in particular how can we extend the table?

C

ACMA

1968 1969

63 «52

1

1971

38

©

Cycle

Ped

If we assume



aren

GHALG:

3

that it is a cubic, then the third differences are con-

stant and we can fill in the last column with 2s. From calculate in turn A?C, AC, and C for each year.

this we

can

6.9

AN EXAMPLE OF TABLE EXTRAPOLATION WITH POLYNOMIALS

Year

Cost

1968

63

1969

1970 1971

1972 1973

1974 1975

A

Az

52

maIL

2

39 48

‘ :

a 10"

AS

43 SE 4 2 Ee =ae 67 98

om 3

12

¢ £52

Me

The manufacturing costs are rising rapidly! But we note that changing the last two given estimated costs to

1970 1971

44 39

(which is a small change) lets us fit a quadratic:

Year

Cost

1968 1969

63 52

1970 1971 1972 1973 1974 1975

A

jt

AS

44 oS eles ep 3 7 38 42 49

“ 3 ames ets

whose extrapolated value at 1975 is one-half that of the cubic! Polynomials are simply not suited for extrapolation, but if you insist on using them, then the lower-order ones seem to be safer. Note that both have a minimum around 1972 and that the quadratic provides the more believable estimate, but is hardly a reliable prediction. a Further examination suggests adjusting the 1971 value only to fit

quadratic

162

INTERPOLATION AND ROUNDOFF ESTIMATION

Year

Cost

AC

A?C

196863, ions g 52°", ak 2 1070-6 43." 5 2 ivi seprseeea ee 2 1872 ea 31. gee? 1973 Gr28 me 2 1974 27 2 io7s ag Here the adjustment of the 1971 value change in view of the uncertainty of the believable set of costs, but by this time has been destroyed and we suspect we

from 38 to 36 (not a large estimates) gives a still more presumably our confidence could produce almost any-

thing we wanted if we fooled around long enough.

ee

eS ee

a

ee

ee

eee

As an alternative approach, consider that perhaps it is the log C that is important.

1968 1969 1970 1971

log C

A

1.799 LAN 1.6633 1.580

—83 35} —53

Suppose we alter the 1971 value so that the A is constant. We should have

log C = 1.550 C=35.5 We should then have the table for extrapolation. Cc

1968

63

1969 52 1970 43 1971 35.5 1972 «+++ 1973 + +--+ 187Atay 4 aang 1975

16.5

log C

1.799

1.716 1.633 1.550 1.467 1.384 1.901 1.218

A

we oe

Ey ee

6.10

ROUNDOFF

ec e a a Again, we could slightly modify the data.

Cc

log C

1968 1969 1970 1971 1972

63 52 44 385 33.5

1.799 1.716 1.643 1.580 1.527

1973 1974 1975

30.5 28.5 27.0

1.484 1.451 1.428

A

_g _9 _g3 5 —43 33 93

A rather different cost in 1975.

6.10

Roundoff

Suppose first we had a function which was zero identically, and that we made a small error € at only one point. Let us examine the difference table

0

a

0

ee oe mee

0 *

A 0

0

3e ie

We see immediately the growing triangle of ¢’s having binomial coefficients which grow rapidly. Thus the difference table tends to magnify small errors. Second, consider the computed table as the product of the true values y(x,;) = y, and of random roundoff factors of size (1 + €,). Thus, the values we have are

yi(1

+E) = yi + yiki

164

INTERPOLATION AND ROUNDOFF ESTIMATION

When we compute the difference table of this function, we get the and same result as if we had computed the differences of two tables original the of that being es differenc of table added them: the first function y; and the second table of differences being that of the roundoff y,€;. The reason that this is so is, of course, that the operator A is linear and the difference of a sum is the sum of the differences. e ee e ee Find the error in the table.

A

Az

AS

246 ee ee —28 300 2 —30 330 "i =37 967 wc Sanuat® 408 oc soee 448

0 5 =5 ‘

We suspect that A® shows an error. The pattern low

high low high is centered at about 330. The 5, —5 suggests —3é, 3é

The difference —5 — (5) =—10

suggests that

e=-2

Try 332 in place of 330:

A

change

A?

246 og ROO

ABS are 4

332

a2

3

Se78 406. ag

ase te

4 8

It looks much better!

~ 6e

ee

6.11

PHILOSOPHY

165

Notice in computing the difference table that usually y; and yj.,; have the same sign and therefore there is a cancellation and no loss of absolute accuracy even if the numbers do get smaller. It is only in the occasional situation where the algebraic signs of consecutive table values are different that an addition can occur and hence the possibility of a carry to the left which forces a roundoff that loses information.

PROBLEM 1.

6.10

Find the error in the table

2460 2718 3004 3318 3669 4055 4957

6.11

Philosophy

From the beginning we have stressed that we are using finite machines. One consequence of this is that we must recognize that roundoff can occur and at times do us serious harm if we are not careful. We are about to enter deeply into the domain of the infinite processes in mathematics which of necessity must be approximated by finite processes. Thus, in addition to roundoff we face on a computer a second source of trouble, truncation error. Truncation error versus roundoff error. Smaller steps usually reduce truncation error and may increase roundoff error.

Just and some must

as we recognized that we were going to use the words “small” “Jarge’’ when talking about numbers and therefore had to do thinking about scaling to give these words meaning, so now we try to create some kind of a theory which will enable us to separate the two effects that arise from the finiteness of the machine.

Finite number length leads to roundoff. Finite processes lead to truncation. wes Pete twin ig Le ool Pa

ye

ae

ae

-

166

INTERPOLATION AND ROUNDOFF ESTIMATION

The theory we shall produce rests on the two observations: 1

The higher differences of a suitably tabulated smooth function tend to

2

approach zero (Sec. 6.8) The higher differences due to random (Sec. 6.10).

roundoff errors tend to get large

The weakness of the theory was carefully indicated by the word “tend” in both statements. We are not going to present a perfect, reliable, elegant theory; rather we are going to develop what might be

called a ‘desperation theory” whose chief merit is that it is better than no theory at all. If the truncation error is too large, usually we can reduce it by a known multiplicative factor(+)* by halving the interval, (Sec. 6.8), but if we are already down to the roundoff level of accuracy, we shall clearly do ourselves harm (as well as increase the computing bill) with no gain in accuracy. Truncation error is typically of the form 7

hky'*)

(6)

Eas Ce

ork

4

hk y*-1) (6)

but may be

ilamgnly aaT Let us be clear about what we can expect from a roundoff theory. We could expect to state a bound on the error. If we develop this approach, we find that often the guaranteed bound is so pessimistic that we cannot afford to use it. Also it is likely to cost us dearly either in our own time or in machine time.t We intend to develop a statistical theory for estimating roundoff. Just as an insurance company by using statistics can often make quite successful predictions about the number of deaths they will have to pay off on (although on any individual they may be very wrong in estimating his death date), so too we shall make estimates for average behavior only and shall at times make gross errors. As we said, it is a desperation theory. Epidemics of trouble may occur and invalidate our predictions.

+A method now being developed under the name ‘range arithmetic"’ (also called ‘interval arithmetic’’) will probably provide one solution when it becomes generally available.

6.13.

6.12

ROUNDOFF IN THE KTH DIFFERENCE

The roundoff model

We will assume that our numbers y; have typical floating point errors and appear to us as yi(1 + €;)

where y, and & are uncorrelated. We will assume that “on the average” €, is as likely to have one signt as the other; thus, we assume

Ave {e,;} =0 We are, of course, using the customary statistical device of thinking of many repetitions of the same experiment (calculation) and of get-

ting an ensemble of ¢;. In practice the same input, barring machine failures, gives the same result. We are only ‘thinking’; thus, our averages of the €; are taken over the imagined repetitions. In addition to assuming ave {e,} =0, we assume that the error e, made in the ith function value is uncorrelated with those at other points. Uncorrelated means Ave

{¢,€,} =0

i Fj

Often, especially in recursive calculations, this assumption is plainly false and we must be careful how we apply the conclusions we are going to develop. As we noted, epidemics do occur, especially on computing machines that “chop” rather than round.

6.13

Roundoff in the kth difference

Let us fix in our minds a particular position in the kth difference column

of the roundoff error table of values y,¢,. The entries which are

far removed from the difference entry we are looking at are not used in its calculation. As we go up the table from the bottom, we come toa first entry that is involved in the computation of A*y,é;. The entry YierEire Will have a coefficient +1. The entry yisu-1Ei+n—-1 which follows

by +On many computers the numbers are not rounded but are “chopped” size are dropping the extra digits. If positive and negative numbers of the same

about equally likely, then this model is reasonable; otherwise it is not.

167

168

INTERPOLATION AND ROUNDOFF ESTIMATION

will have a coefficient —C(k,1), then ysx-2€i+n-2 With coefficient +C(k,2), and so forth. In all, we shall have the usual formula Riga ion CABONMESIGV) BYE ste

=E-1

Ak = (E — 1)*

k AK {y,€)} =>) (1 i=0 k

CsA) vise sbine-s

=>) (—1)F9C(K.S) yissEits

io

for computing the kth difference we fixed our attention on. How big is this number? If we bound the number we shall get

5.Clk,j) =(1+ 1) = 2 J=0

If we calculate the average, we shall, by the nature of our assumption

on the behavior of the errors €;, get exactly zero (the y; are fixed and the average is over the ensemble of the &;). It is much more reasonable to ask, ‘‘What is the average of the square of the kth difference? That is,

Ave { (A* y,&;)?} since in this kind of expression there will be no cancellation of plus and minus values of A*y;€;,

i

n

Variance {x1} =— > (x. — *P

nm

where x = mean of the.x;. In text, the mean of the y;&; is zero.

all terms will contribute according to the square of their size. Big errors, being more serious than small ones, are taken more seriously in-the calculation of the average. We might have tried Ave {|A*y,€;|}

6.13

ROUNDOFF IN THE KTH DIFFERENCE

but this is unmanageable theoretically, and we stick with the variance, as it is called in statistics books. Substituting our expression for A*y, squaring out, and taking the average term by term we get, recalling the lack of correlation of the ¢,,

k Ave {(A*y,€;)?} = Ave {D (1 j=0

Cl Ayreseers Sy (—1)"C(k,m) yiemeiem m=0

ty

=F

Dd CH 1P

Clk AN C(km) yrsyirm Ave {€145€:4+m}

j=0 m=0

k =P Crk Auris Ave { (€:45)?} j=0

Remember that the ¢,’s being uncorrelated means Ave

{€445€1+m } =0

G cad m)

Holding this formula in the back of our minds, let us ask what the variance is in the error terms of the orignial function values; this is by definition

Ave {(y:&)?} = 0? We call this quantity, 02, the variance or ‘“‘noise level” in the original

table. Now, returning to the formula we were examining and noting that Ave {y7&?} =y? Ave {¢?} =o? we have

k 2k)! Ave {(A*y,8))?} = 0? >) C2(k,j) = 0 ae j=0

(The last step is to be proved as an exercise.) in conclusion we see that the roundoff noise in the kth difference is approximately

(2k)! (k!)? times as great as the noise in the original table, and hence the difference

table,

because

it tends

to emphasize

the

errors,

should

169

170

INTERPOLATION AND ROUNDOFF ESTIMATION

provide a means function.

of forming an estimate of the roundoff error in a

(2k)! /(k!)?

~~

2 6 20 70 252 924 3,432 12,870 OnN — RWN Oa

Our next difficulty is a standard one in statistics; we have a single table and have developed a theory for an ensemble of tables. The standard solution is to appeal to ‘‘the ergodic hypothesis” that the average over the ensemble is the same as the average over the table, of which we have a short, finite sample. All one can say in defense of this assumption is that it is both plausible and necessary. The ergodic hypothesis is very commonly used by people without realizing it. For example, from an insurance mortality table giving the probabilities of 100,000 people’s dying next year, one tries to deduce the pattern over a single life—a typical ergodic hypothesis application.

PROBLEMS 1

6.13

Expand both sides of

(1+ 1)? = (1 + 2)7(1 + 1) and equate like powers of ¢ to get

C(a +b, r) = s C(a,s) C(b, r— s) 3s=0

2

Inthe previous problem, set a = b =r

to get

C(2r,r) = s C?(r,s) 8=0

6.14

3

CORRELATION IN THE KTH DIFFERENCES

In Prob. 1, leta=b=n, r=n +1 to get

C(2n, n + 1) = > C(n,s)C(n, s — 1) 8=0

6.14

Correlation in the kth differences

We now have the pressing problem of deciding which difference column to use. To answer this, we need to develop another fact about kth differences that we actually use, namely, that adjacent values are highly correlated in a negative way. By correlation we refer to the obvious fact that the A*y,¢, and A*y,,,¢;,, use many of the same values of Yj€;.

We therefore examine the quantity

Rey

c(kry (Ht)

x VC(@kkK)

s C(k,r) C(k, r+ 1)

Ch r+ 1)

2

VC(2hk)

~—

C(2kk)

ELS So) ee

Chk)

ae

ok +1

This shows that if one term in a kth difference column is, say, +, then probably the next is —; our theory of roundoff noise estimation Inin sign dicates that the higher k is, the more probable is the change between successive entries.

172

INTERPOLATION AND ROUNDOFF ESTIMATION

k

—k/k+1

Probability t+, %

ran ike ee ae 1

-3=-.50

66.5

2 3

-2=—.67 —3=—75

74.2 76.8

4

—}=-—.80

79.4

5 6 7

-§=—.83 —$=—.86 ~~ f=-.875

81.5 82.8 83.9

8

-—%=—.89

84.7

9

~—,=-.90

85.6

10

—-}?=-.91

86.3

20

-#=~—.953

90.1

+The probabilities were found from suitable tables.

6.15

A roundoff estimation test

We now apply the various pieces of the theory. If the function were a polynomial, then the higher differences would all be zero (provided we went to a sufficiently high order of differences). We agreed that we usually compute at small enough steps so that in the absence of roundoff errors the ideal differences tend to approach zero.

The second part of the theory is that if we have random data, we expect that a suitable column in the difference table will have alternating signs and that the differences will grow in size. We view our table as the sum of a table of the ideal numbers plus a table of the roundoff noise. The differences in the first table we hope go to zero;

Calculated or measured value

Ideal or

true

Roundoff error vy

y(xi) = ¥(x,) + Y(xie, We hope 1 2

A*Y(x;) become small for increasing k. A*Y(x,)& become large for increasing k.

6.15

AROUNDOFF ESTIMATION TEST

those in the second tend to grow as we go to higher differences, and to alternate in sign in going down a particular column. We, therefore, in our desperation theory assume that when we see alternating signs in some kth difference column, we are seeing roundoff noise. Although counterexamples are easily constructed, they seem to be artificial, and in practice, this is a fairly safe guide. How shall we test “alternating signs’? Experience shows that usually we operate around fourth and fifth differences where

k+t p = 79.4%

are the probabilities of a sign change between successive kth differences. Thus, if out of three possible changes in sign between four consecutive fourth or fifth differences we have at least two changes in sign, then we will say we are at “noise level.’’ This is a practical compromise between opposing forces and represents an average criterion, not absolute safety. Having found k, we can now apply Sec.

6.13. It is important to note that the structure of this roundoff theory does we start with the answers we make our estimate. which from and form a simple difference table particular calculation, the of independent is that Thus, it is a theory numbers in the particular the on depend obtained results the though not depend

on the particular calculation;

table.

PROBLEMS 4 2

6.15

Check the roundoff level in a standard five-place sin x table. Is it as expected? Estimate roundoff level in the table

21215 21236 21257 21286 21308 21329

173

4

;

« OF

SSAA,

MEPS ORT

Sl

Tp

ave

M Joa :

5

ekple gitikmaiin seq ew Mora tort? @irreroe ‘oper Ye tevoceat qe nt LA

acion

omiinn

grins

on

Ow Ria

ysl

sciup ine

:

s

tui!

ewaru)

7

spostege

Sioiw

gage

t.

soncielib

{"aagie.

enor

gnvarnete”

teat

ety nnn ona Huet

Vone

Cee

MA

€ a

nv

Blgmaxeteinugd Aguoitti

yibes 210

Gatowtrio7

veri

4c Of fwee

7%

2

re

¥

=

pascee eff AF esort 7 Seo seonssstii> teient of 9S Gv OB wor Of Snet B hwek py ing nm "he oh ortemaita of” Actes AN cordon want

4

Ts

eed

Pog al

=

ite ath @aatOM



R

ae

:

=

a

_

Rae

Pewee? agoets

“3

8 lo emijbasd edomn §tht

suet Heswted AO’ ri eeqnald witliecd eorti fo fun 1 BadT .

@! seGnROo ew! ace te avad ow wena gid HN) wy reel only Veneers

sient” pro! sinn’ i

Ard SQe Oe ose ying

oe

ow

©

Ww aw

ie etAdintoss: bn done! gnivegno Heswm all ae ere naa tw. A BAvel gritehh viele etutoncae

tp

Qa

asad viet) Molnar vee tte teh BIND 191) geDat ah PHY Sreyrnor a) ite, me

ow

eounkaale > pine

SPORE hia MT ee. izle eh

ati 04ber

Olas nocneetii sieve & 8

vette uta ae ateagret att Ie emai ragaamrn ai atl wcertt be eid Terwrcatit: bint? WEY

po ill

m

Can

FM

«= ®

or) rit ina hres

sh ‘

PE

DerAare walHari

a

*acten

aweeD

i

ana

4 «aie

=

o ae +

= oe * rh -

Rie

_

- eg

>

ae

P

4

oa

i

~~

mf)

OA sels

iw pthc

tentionlate

x

‘ -_

-

: —

~%

j

shiny ett

area

tl

-

TT

ama: ’4

'

z

Sru0cw etarveas ew ¥ ‘Bueee _

3.18

;

telat

aw. lieta: wold.

7

ICE 7

Neha

7

er ts

¥)

INTEGRATION

7.1.

Introduction

A common situation problem is given by methods of analytic tinct kinds; the first

in engineering and science is that the answer to a an integral that cannot be evaluated by the usual integration given in textbooks. There are two diskind is the indefinite integral

ix)= [F(tat which requires a table of values to give the answer and which we shall take up in the next chapter as a special case of integrating an ordinary differential equation; Indefinite integral e7

I(x,t) = y 0

dx 1+

x?

where t is a parameter. For each t we need a table

the second kind is the definite integral

uid)= [fla)dx

176

INTEGRATION

which is a single number and is treated in this chapter.

Definite Integral =e

—tz?

HO=}, 4a For each t we need a single number. ct

i

a

RS

ER

There are two distinct approaches to the definite integral; the first is to use a single formula for numerically integrating from a to b; the second is to divide the range a < x < b into a number of subintervals, apply a formula to each subinterval, and add together the results over each of the subintervals. The latter process generally uses the same formula in each subinterval and is called a composite formula.

Be

a

fix)

b

x

A single formula for the whole range

Fix)

a

b

A formula (straight line) for each interval

x

7.1

INTRODUCTION

The basic idea for finding an integral over a subinterval is to approximate the integrand f(x) (or a part of the integrand) by a polynomial and then analytically integrate this, a process we called ‘‘analytic substitution” in Chap. 6.

f fo dx =f (x)dx

lf we approximate the integrand with a polynomial of degree n, then obviously if the original function f(x) is a polynomial of degree n or less, the formula will be exact for f(x)= 1, x, ..., x". Conversely, if the formula is exact for f(x) =1,x,..., x", then it will be true for any linear combination of them, namely, any polynomial of degree n. These two not quite equivalent methods} both have their uses. The analytic substitution is a way of thinking about the problem; the “exact for 1, x,..., x”"' is a method for finding formulas easily.

Analytic substitution

Operate on p (x) as if it were f(x). Exact

Make formula exactly true for eee hs 2

There are so many different integration formulas that we shall only give a few of the more useful ones. By useful we require that they contain some method for estimating both the truncation and roundoff errors of the calculation or else have some other feature of considerable merit. These are severe, practical restrictions; however, the first one has led to the development of new formulas of some merit.

+Sometimes,

it is not possible to find the approximating

given y(—1), 4 (0), (1), yw (—1), (0), w’"(1).

polynomial,

e.g.,

177

178

INTEGRATION

7.2

The trapezoid rule

interval

If we approximate the function f(x) in an straight line, through the end points

y(x) = f(a) + oe)

a=x=b

by a

(x—a)

as we did in linear interpolation, and then integrate this approximation as if it were the original function, we get, after some algebra,

I -[ y(x) dx= [ey] which is simply trapezoid.

Example

the

classical

formula

(b — a)

for

finding

the

The trapazoid rule by the exact matching method.

We try

[foo = w, f(a) + w, f(b) and make it exact for both f= 1 andf = x. For f(x) = 1,

b-—a=w,+w, For f(x) =x, Be = a =w,a+ web

Eliminate w2b:

ehh

he

— b(b — a) = wa

22

— wb

pb=-wy b-a

Hence,

and the formula is [#0 dx =

P= * Ifla) +f(b))

area

in a

7.3.

THE TRUNCATION ERROR IN THE TRAPEZOID RULE

Usually the function f(x) is such that a single Straight line for the whole interval is not a good enough approximation. In such cases we

can divide the interval a < x < b into n subintervals,+ each of length h =(b—a)/n, and apply the formula to each subinterval (a,a +h),

(a+h,a+2h),...,(a+(n—1)

h, b) to get the composite formula

[ #9 dx =h{ 3f(a) +fla+h) +--+ + 3f(b)]

7.3

The truncation error in the trapezoid rule

We consider first the truncation error in the simple trapezoid formula.

,

Je.

uncation error equals E

a

ath=b

x

formula is +We need not use the same spacing for each interval, but then the not as elegant.

179

180

INTEGRATION

gives All we shall provide is an approximate theory (which, however,

essentially the same result as the exact theory). We write the Taylor series for the integrand as f (x) =f (a) + (x — a) f' Gi

Y f(a) +e

and substitute this in both sides of the simple trapezoid formula. The left-hand side is

[ [ras apa + 25% pra +] dx which upon integration becomes

(b — a)f (a) pe Y f(a ) —— Y fa) ta The right-hand side is, including the truncation error E,

AIfla) +b adf (a) +25" pla) + + fla] +E Equating the two sides and recalling that h = b — a, we get, after cancellation of like terms,

ow

eS or

ye:y+

ee

dey

E= oe yey

Jt+e+E

fi Woe

The next term in the expansion has a (b — a)* factor, and we shall assume that all the terms beyond the first error term are small. Although it is not always true, for this formula it is true that the error E is ex-

actly expressible in the form

E= _e—e Oe OL

ae es)

We next examine the composite formula. Here (b — a) is h. Thus, the error term for n intervals is hs

aah

"

(0;)

——

cee

=

BE,

sof

"

(On) = = ~

"

5 f"(64) i=1

7.3

THE TRUNCATION ERROR IN THE TRAPEZOID RULE

Now, by using the fact thatt

S cf"(6) = (5a)F) i=1

i=1

provided all the c, = 0, f” is continuous, and @ is inside the range of the 6;, the error term becomes

ee

pea

Oe

a

a

ane

(8)

since nh is the new range b —a. If we can bound the second derivative, then we have a bound on the truncation error. Unfortunately, such bounds either are not easily obtained or are useless. (See Probs. 4 and 5 below.)

PROBLEMS 7.3 1 2

Derive the composite trapezoid rule for unequal spacing. Derive the error term for Prob. 1 above.

Integrate by using the composite trapezoid rule,{ and estimate the truncation error in Probs. 3 to 6: 1

3 i e~** dx

byusing

h=%

0 12

oj

ie =

a

anit

dx

1

5 [ xinxas

by using h = =

18

h=%

0

TRUE ANS. = 12 6

iy sin? x dx

A

Bases

18

7 TRUE ANS. i.

$This important result may be proved by induction with the use of

C1 8(0;) + Co@(O2) = (cr + C2) 8(8) where 6, q,{A%(n — s)* +

(Senin

eee

ee

determine the q,. Appendix A derives the formulas for the coefficients qs.

7.9

Results

In Appendix A to this chapter the following table of the coefficients q, of A’ is derived. The first column gives the coefficients for the Gregory formula we started with. In the second column, we have the coefficients for ‘Simpson's formula plus end corrections.” Note that the coefficients are all smaller in size than for the trapezoid rule of Gregory’s formula. In this sense we can say that this formula is “better than’ the Gregory

7.9

Gregory,

=4

1

1

a ta

3

~

1

3g

ae)

3

4

_

6 _

6

0

a

4 — 355

eo780

4 — 380

ees

_

_548

_275



ie

24,192 33,953

3,628,800

a=0

0

790

=

5

7

Simpson,

hl

art

:

RESULTS

29,228

— 3,628,800

B

80 a2 60,480

— xaos 19,778

— 3,628,800

formula; however, there always exist special integrands for which a particular integration formula is exactly correct and for which a “‘better formula” will do worse. In the third column, a=0, we have the very interesting case in which half the coefficients are zero,

Remember, a=0

implies {c=2a=0

b=2-2a=2

For Simpson the of those than a =0 the first few gq, are positive and larger formula, but if we include the fourth differences as end correction terms, then the rest of the coefficients are smaller than either of the other two! Thus a = 0 seems especially attractive. We need to compute a number of the end values at the regular end spacing because we need the values for the differences for the corrections and for the roundoff estimates, and many of these function values never need to be computed!

191

INTEGRATION

a.

f(x)

b

wit

c

ot

b



(@

ot

b

_

c

ot

VAFG)-

Aly

Oy,

+Can be omitted if only A° is used.

but over the middle range we do not need half the integrands since they enter into the formula with zero coefficients. Experience shows that typically we can use the fourth (or sometimes the fifth) differences to estimate the roundoff error, and if we are not down to roundoff noise, then we can look at the first A*’s we dropped and ask if the truncation error is small enough. If the truncation error is not small enough, then we decrease the interval by a factor of 3 for the a=0 (thus, we can use the old points) and a factor of 5

for the a= 4 b

c=0

e

b

e

_@ © e e e e

Cc

Il @

© e oe oe

bc=0bc=0b c=0b c=0bc=0

NOR

these

two points

lf we keep the fourth differences (and roundoff noise is not causing trouble), then the truncation error should drop by a factor of about

3* = 243 for a = 0, and by a factor of 2* = 32 for a= 4. If we meet roundoff errors before we reduce the truncation to the

size we wish, then we can try fewer differences at that spacing and hope for the best, but generally we are in real trouble.

PROBLEMS

7.9

Apply the appropriate integration formula using at least five-decimal arithmetic and a suitable number of differences:

7.10

GAUSS QUADRATURE

i =

[ ent

as

h=%

a=0

0

2

Ly

[ sine x ax 0

3

4

7.10

0

8

[sine x ax 0

2 6

7

h=—

e-= 1+

7 h=— 8

= tt

ha

Z=0

10

1

1

sa Od

x?

Gauss quadrature

Up to now we have often used n + 1 equally spaced sample points and fitted a polynomial of degree n, using the n+ 1 sample points to determine the coefficients. As a result, the formula was exact for polynomials of degree n, though as in Simpson's formula it sometimes happens that the formula is exact for polynomials of one degree higher. The idea behind Gauss type of formulas is that by also using the positions of the sample point as parameters, we can get a formula by using n samples which is exact for polynomials of degree up to and including 2n — 1. For many purposes this is a significant gain. The theory, however, is so complex that we will give only one example of a Gauss quadrature formula. Consider the formula

[Fle dx = wafter) + wafles) where the w; and the x, are all parameters. Thus, we can trv to make the formula exact for

ah Pe iest f=:

2=wWw,+

2

f=x:

O = w, xX, + Were

f=xt:

2=w xi t+wx}

f=:

O= wx} + w2x}

193

194

INTEGRATION

s We need to solve these four nonlinear equations for our unknown pick we if that see to easy W1, We, X1, and xp. It is Wy, = xy

We

=

Ts

two then both the second and fourth equations are satisfied. The other equations become 2=2u,

or

w,=1

and

2=2w.x,

or

x2=4

Hence,

a1 x, =—= = +0.577--3 Our formula is, therefore,

J fiede=4(S3) +4(ya) and it is exact for cubics in spite of the fact that it uses only two sample points! There are three classic forms of Gauss quadrature,

[.f(x) dx

‘ie-* f(x) dx

in each of which the f(x) is approximated

fs e-* f(x) dx

by a polynomial. Extensive weights w; are

tables of the sample points x; and the corresponding available.

tHandbook

916-924,

of Mathematical

Functions,

NBS Appi. Math., Ser. 55, 1964, pp.

7A.1_

PROBLEMS

THE BASIC DERIVATION

7.10

Find the Gauss quadrature formula for the following:

1 f Flex) = ws f(a,) + wafleg) + waffles) NOTE:

x,;=x3;

w,=ws

Xo =0 2

2

| e-* f(x) dx = wf (x,) + wif(x,) 0

3

| e-2 F(x) dx = w,f(x,) + wof (x2) + waf (xs)

APPENDIX 7A.1_

The basic derivation

We are going to solve the system of equations

% x* dx = a(—n)* + b(—n + 1)* +--+ + a(n)* + y gol =n

— s)* +

s=1

(1) S2(—n)

(=

051)

by the ‘generating function method.” In this method we multiply the kth equation by t*/k! and sum all the equations. Using the fact that

we get n

| ett dx =ae—-™

+ be "++ ceCnt2 + --- + ae™

+3 q.[Arer-ot + (1) tem")

—n

But

ee ee The symbol A,e* means that x is the variable that is used in the differencing.

ee

eS

A,et

=

elrrit

Se

ae ett =

ett (et =

1)

196

INTEGRATION

hence, Antez

=

et (et —— i)

We have, therefore, n [

+ en)

a(e-™

ett dx =

+

n-1 b >

e(—nti+2k)t

n-1 »Y e(—n+2k)t

k=1

k=0

—n

+c

++ y q, [em(1 —e7*)* + e-™(1 — e')8] Doing the geometric sums and the integration, we get ent —

p-—nt

nt —

ean

e(r-i)t

—_,

+

——— =ale + e-™) + bhe'—e! oF t

a

e

>

(n-1t

eu em

+ »S qs {en*(1

_

ent)s + en

=

(1 — e')*]

s=1

Notice that we can divide this identity in ¢ into two parts. The first part is ent

Saito:

Sy

bent

nt

ees ak2a cl es

cet

oo

ee ciesgaaa ode

Sa

nt

p—-t\s (i— — 2-5)

s=1

and the second part is exactly the same except that ¢ is replaced by —t. Thus this single identity in t is all we need use to determine the

constants qs. This expression is an expansion in powers of 1 — e~‘, so we set c=1-—e-'

which is the same as

=—In (1 —») and factor out the e”, leaving (after some algebra)

—1

co

b(1—v)

c(1—v)?

inthe pliae oleArias

©

ead

?

8=1

Rearranging this, we have 2

2 a?

s=1

ae

1

b(1—v)

renuoehe

| c(1—v)?

Cen

7A.2_

THE GREGORY FORMULA

and we see that q, is simply the coefficient of v* in the power series expansion of the right-hand side.

7A.2 Before

The Gregory formula doing the general

formula. In this case

formula,

we turn to the special Gregory

a = 3 and b=c=1. Our formula for the q, is

therefore

The troublesome part is finding the power series for the reciprocal of the log term. To get this, we start by noting

In(i—v)

dv : fan

=-

vo?

d epsian v3

rts peuah =-o1+5+5+---) onl When

we divide this into 1 and combine

it with the two other terms,

we get

=

2

v

v?

19

= —F9 — 24 720

pi—ess

More values of the coefficients are given in the table in the text under Gregory, a = 3. These coefficients are the y’s, the coefficients of the differences in the integration formula.

7A.3

Other special cases

if we write the coefficients of Gregory’s formula as g,, we have from the definition

197

198

INTEGRATION

A.3 to elimi: This can be used in the general formula at the end of Sec. nate the term eS

1 In (1 —v)

to get the convenient formula

at peeCN. Be =e) eo)

ef e r eee 2 = ES

Now, using c = 2a, b = 2 — 2a, and some routine algebra ao

co

D

4

pee eee? +(5-8) (2—u) Dividing out this last term, vo

v/2

=

2—-v 1 sae

fu\e

te)

we get finally (upon equating like powers of v*)

Gs

8s

2

a

from which the table entries follow easily.

38

ORDINARY DIFFERENTIAL EQUATIONS

8.1

Meaning of a solution

Differential equations are of frequent occurrence in engineering, and relatively few of those that appear can be solved in closed form by the standard mathematical tricks. We therefore need to consider the numerical solution of ordinary differential equations. Given a single first-order ordinary differential equation

y’ =H = Fox) what do we mean by a solution?

The indefinite integral

uix) =| fle) dt is the special differential equation

y’(x) =f (x) where f(x,y) does not depend on y.

Loosely speaking, we mean a Curve

y = y(x) such that if we calculate the value of the derivative of y(x) at a point (x,y), then this value will be the same as that given by the differential equation [also evaluated at the same point (x,y)]. This suggests a crude method for numerically solving a given differential equation. At each point (x,y) of a rectangular mesh of points,

200

ORDINARY DIFFERENTIAL EQUATIONS

we Calculate the slope from the given differential equation

and then

draw through each point a short line segment which has the calculated slope.

A solution

A direction field

Such a picture is called a direction field. Our problem now is to draw a smooth curve through the direction field such that the curve is always tangent to the appropriate lines of the direction field. If we succeed, then we must have a solution of the given differential equation. We see immediately that through almost any point in the area of the direction field, we can draw a curve. Thus, we are given normally both the equation and a point through which the curve is to pass.

y

No solution outside circle!

y'=

V¥1—x?—y?

This point is usually called the Initial condition.

8.2

IMPROVED DIRECTION FIELD METHOD

The curves along which the slope has a constant value k are called isoclines and are defined by

k= f(x,y) Often this provides an easy way to draw a direction field. Note that the

maxima and minima of the solution The direction-field approach just useful and practical. On occasions can settle an important point under

PROBLEM 1

8.2

must lie on the zero isocline. described, though simple, is very a quick sketch of a direction field discussion.

8.1

Show that the inflection points of solutions lie on the curve y” =0.

Improved direction field method

lf we want only a single curve (solution), then it becomes immediately clear that we need draw the direction field only in the region where the solution is going to go and we can ignore all the rest of the (x,y) plane. Thus, we start at the initial point (xo,yo), calculate the slope yo =f(xo.yo), and go a “short distance” in this direction to a second point. We now regard this second point as a new initial point and repeat the process again.

Xo

x)

Xe

X3

x

After enough small steps we shall have the solution on the finite interval we were interested in. If we want to know how the solution behaves at infinity, we shall probably be satisfied (after a moderate dis~ tance) by the general behavior of the direction field far out, or else we shall have to make a change of variables to transform infinity to some finite point. It should be also clear that the smaller the steps we take, the less

201

202

ORDINARY DIFFERENTIAL EQUATIONS

truncation error there will be but the more effort we must expend and the greater the possible roundoff error will be. It is obvious that we need not make the drawing; we can merely tabulate on a sheet of paper the coordinates of the points. Then, given a point (x,,yn) on the solution curve, the process is 1

Calculate y’, =f(xn,yn), Which is the slope of the curve at the point.

A

Slope y'~ >;

aoe

2

Poy

x

Calculate the next point

Ynsi =n thy, ee And

repeat steps 1 and

ee ay|

2 until we reach the end

chosen a uniform step size Ax, =) for convenience.

Method applied to y’=Vv1

y(0)=0

—x?—y?

h=%

With two decimal places except in the values of y

x

y

aaa hd

0 0.1 0.2 0.3 0.4 05 0.6 0.7

0 0.100 0.199 0.295 0.386 0469 0.572 0.601

0+ 0= 0 0.014+0.01=0.02 0.04+0.04=0.08 0.09+0.09=0.18 0.16+0:15=0.31 025+0.22=047 0.36+029=0.65 0.49+036=0.85

Vall V0.98 =0.99 V0.92 =0.96 V0.82 =0.91 V0.69 =0.83 V0.53 =0.73 0.35 =0.59 /0.15 =0.39

0.8

0,640

0.64 + 0.41 = 1.05

imaginary

0.9 1.0

y'

of the interval. We have

8.3

PROBLEMS 1

MODIFIED EULER METHOD

8.2

Integrate

y’ =y? + x?

(Ov

S41)

starting at x = 0, y =0 and using h = 0.1. 2

Integrate y

through (0,0), using 3

=e"

—x*

Os

2=2)

h = 0.2.

Integrate

y'’=siny—x

@ ==)

y(0) =0, using h = 7/12.

8.3

Modified Euler method

The trouble with the preceding method is that we tend to make systematic truncation errors because we are generally using the derivative of the curve that applied somewhat before where we are now. The

following method, called the “modified Euler method” gives a better

(more economical and more practical) method of calculation. We suppose for the moment that we have a pair of points (xp-1,Yn-1) and (x,,Yn) and ask, “How shall we find the next point (Xni1: Yat)? We will predict the next value by applying the midpoint formula

to shies uit Tp-1

which gives

Bes

203

204

ORDINARY DIFFERENTIAL EQUATIONS

Xn-1

Xn

Xn+1

Predict Patt — Un

1 2hy,'

where p,4: is the predicted value of y,4,. Clearly, we are using the tangent line at the middle of the double interval as a guide for how to move across it. Using this predicted value, we compute the slope at the predicted point P'n+1 =f (Xa+1 Pati) zn

and apply the trapezoid rule to [ 4 y'(x) dx which gives Tn

h

,



Ynti — Yn +3 (Proi + Yn)

as the corrected

value y,.,. Here we are using the average of the slopes at the two ends of the interval of integration as the average slope in the interval.

Xn

Xn+1 Correct

x

8.4

STARTING THE METHOD

We are now ready for the next step. In both the predictor and the corrector we have avoided some of the systematic error we discussed for

the simple method.

The method has four steps Pasi = Yn

+ 2hy,'

Pasi =F (Xn41 1 Patr)

Ynti =

h





Un si 2 (Yn ae Pats)

Yn+t = f(xn415Yn+1)

8.4

Starting the method

We assumed that we had two starting values (xn-;,Yn-1) ANd (Xn, Yn); but when the problem comes to us we are given only one starting

point. We shall give two methods of starting, one suitable for hand calculation and one for machine calculation. The hand-calculation method of starting is based on the Taylor expansion

y(x +A)= u(x) + hy! (x) +57 9") to h2

The derivatives are easily found from the differential equation by differentiating

y' =f (x,y) ”

=

af , of +

ox

pa

a

,

dy y

y'’ = etc.

take and then evaluating the derivatives. The number of terms to desired. depends on the step size h and the accuracy

205

ORDINARY DIFFERENTIAL EQUATIONS

Se

eS SE

Example

y= y? + x? through the point (0,0)

y"’ = Qyy' + 2x y'"

=Oy'2?+

Vie =

Qyy"

+2

6y’y"” ae Qyy'”’

at (0,0)

y=0 BCA y'’

y?'

y”

=0

=

2

=0

Therefore,

(hystos, Hone oa 94. oe ae eg | ne hsae 3

The second method which is more suitable for machine calculation is based on the repeated use of the corrector. Given (xo, yo), we first estimate an earlier point CssinM=n))

by the linear approximation

X-1

Xo

x X17 =Xo—h

Y-1 = Yo — hy Y-1 =f (x_1,y-1)

8.5

ERROR ESTIMATES

207

We then use the corrector backward

h



'

Y-1=Yo Pr) (Yo + y1) yt

= f (x-1,y-1)

and repeat this backward corrector enough times so that the value of y_, settles down. If it does not settle down within a few trials, then h should be decreased. The value y_, is used only once, in the first prediction.

Example y'=y?+x?

point (0,0)

x... =0-h=-h y-; =0-—h0O=0 yl, =h?

Use corrector

yi=0-20+nh%) hs

“2 hs yoy = htt ~ h? NOTE:

We got y-; =—h3/2 and not, as we should have, —h*/3. Iteration on the corrector will not change either y_, or y'-, in this case. ee ee ea. Gis Seca (EE 1

2

8.5

Error estimates

The predictor, as noted, is the midpoint formula (Sec. 7.3) with the use of a double interval, so that the error is

E= (2h)8y""" (8) _ hy’’' (4) im 24 3 On the other hand, the error in the corrector (Sec. 7.2) is ="

E key

y"’’

vee

(82)

=

ORDINARY DIFFERENTIAL EQUATIONS

We are integrating y', not y. So the derivatives are 1 higher than in Chap. 7.

e ee So e es

Thus, the predictor value minus the corrector value is h3

p —c =|

h3

true mia y'’’(0;) |— | true +75 y''' (02)

Assuming that the third derivative does not change much in the double interval (and if it did, we should have to decrease h until it did not), then Ue (6,) = y’’ (@.) and we have

p—c=—ighiy'” (6) Pn+1

p-c Yn+1

Xn-1

Xn

Xn+1

x

Thus, in a computation we naturally monitor the quantity prs — Cna1 where what we wrote as y,,, is now labeled c,,,, the corrected value. From these values we can deduce that the actual truncation errors are probably about -3 (Pn+:1 — Cn+i) in the predictor and 3 (Detar Ca io the corrector. Summary Predict Pn+i = Yn-1 + 2hyn

Evaluate Pasi

=F (Xn+1,P asi)

Correct

Ynti = Yn + 2 (Yn + Pasi) h

,

Evaluate Yner =F (Xn41,Ynsr)

/

8.5

ERROR ESTIMATES

We have played a bit fast and loose in our arguments. For example, the error in the trapezoid rule is based on the assumption that we have the correct y,,, value. If p,4; — Cas, were large, then we have cause to

doubt that this is true and we would be tempted to iterate the corrector several times. As a practical matter it is probably better to shorten the interval than it is to iterate the corrector. The following is a slight elaboration of the basic method, which somewhat increases the accuracy and is based on the observation that the entries in the p — c column change slowly (if they do not, then we shall have to shorten the interval). Since we recognize that p,4, — Cn+1 gives us an estimate of the error, it is tempting to use this infor-

mation to make predict

small corrections

as we go along. Thus, when

we

Pati = Yn-1 + 2hyn

we suspect that the modified value

Mens: = Past —F (Pu — Cn) would be more accurate than the predicted value. We then compute

Mass =f (Xn+1+Mnt1) and correct, using Cn+i =

h

Un +3 (mies * vs)

We take as the final value Ynti = Cnti + + (Psi — Cn+1)

and compute Yn+1 =f (Xn41 Uns)

In doing the computation by hand, we should use a table of the form

TABLE

et Xn

Yn

Yn

Pn

Mn

mM

Cn

Pn—Cn

210

ORDINARY DIFFERENTIAL EQUATIONS

8.6

An example using Euler’s method

As an example of the preceding, consider the differential equation y’=y?+1

y(0) =0 To get started by the machine-calculation method y’ (0) = 1, we get from the linear extrapolation

with h=%

and

y(—0.2) = 0 —0.2(1) = —0.2 y'(—0.2) = 1.04 Then the backward

corrector

y(—0.2) = 0 —0.1(1 + 1.04) = —0.204 y'(—0.2) = 1.042 and we can see that further repetition will produce negligible improve-

ment. Let us do the first step in some detail. Predict p(0.2) = — 0.204 + 0.4(1) = 0.196 p'(0.2) = 1 + (0.196)? = 1.038 c(0.2)

=0 + 0.1(1 + 1.038) = 0.204

The initial step needs to be especially carefully done; otherwise we start wrong and continue to go wrong!

Since p — c = 0.196 — 0.204 = —0.008 is rather large, we repeat the first corrector step to be safe.

c'(0.2) = 1 + (0.204)? = 1.042 c(0.2)

=0 + 0.1(1 + 1.042) = 0.204

With no change, we feel somewhat safe and go on to the next step.

p(0.4) = 0 + 0.4 (1.042) = 0.417 p'(0.4) = 1 + (0.417)? = 1.174 c(0.4) = 0.204 + 0.1(1.042 + 1.174) = 0.426

8.7

ADAMS-MOULTON METHOD

Now p —c=-—0.009, so that the error is around 0.002 on the c(0.4) value, and if this error is tolerable for each step (remember it accumulates along the solution), then we can continue; otherwise we should restart, using a smaller h, say, h = 0.10 or h = 0.15.

PROBLEMS

8.6

1

Continue the example until x = 1.0, and compare the answer with the true solution.

2

Integrate

y’ =y? + x? using

8.7

=6 =1,0

=0.1 and y(0) =0.

Adams-Moulton

method t

Although the modified Euler method is very effective, it is often preferable to use a higher-accuracy formula. A pair of such formulas is given by: Predictor

h , ‘ ; 251 Yati = Yn +34 (s5y x ~ O9Nn-1 + 97y'5-2 — 9yi-s) +550 h>y>(6) Corrector

: Uni1 =n tSh (Sunes + 1948: — Syi-n’ + Ui-n) — oq19 PU(A Thus, the ‘“‘predictor minus corrector” is

270 he y®) 720 and if we want to modify, we subtract

251 270

(Pr - Cn)

+See almost any standard text, for example, R. W. Hamming, ‘‘Numerical Methods for Scientists and Engineers,’ chap. 15, McGraw-Hill Book Company, New York, 1962, or A. Ralston, “A First Course in Numerical Analysis, chap. 5, McGraw-Hill Book Company, New York, 1965.

211

212

ORDINARY DIFFERENTIAL EQUATIONS

whereas to final adjust, we add

aig? (

270

Pn+1

—C

n+1

)

Another widely used method (which we will not derive because the derivation is not like anything else in numerical methods and therefore does not illustrate an important idea) is the Runge-Kutta method. Here we use the formulas (see the same references)

ky =f (*nYn)

k= f(a +5, peek: ;) h h ks = f(x, 5) Yn + ke 3)

k, =f (Xn st h, Yn ate ksh)

Xnt1 =XInth

Yn+i — Un +2 (ky + 2k, + 2ks + k,)

1

Take the first slope k,, and go halfway across the interval.

2

Use this slope kz, and again start and go halfway.

3 4

Use this slope k3, and go all the way. Average k,, kz, ks, and k,, and use this as the final slope.

This makes an excellent starting method for the Adams-Moulton method and is occasionally used to integrate an entire solution. The latter is not recommended because of the lack of error control and the excessive computational labor of the four function evaluations per step, but it does save programming time and effort if no library routine is available.

8.8

Step size

The starting step size to use can be found by experiment or experience or from physical intuition.

8.8

STEP SIZE

Later, the prs: — Cas; gives an indication of when to halve or double.

Suitable halving formulas for Adams-Moulton are

Yn-y2 = qe [45¥n + 72yn-1 + 11yn-2 + h(—9y) + 36), + 3y/-»)]

Yn-ai2 =

7p [110 + 72yn-1 + 45yn-2 — h(3y, + 36y4,_, — 9y'_,)]

For Euler’s modified method we can use

Ynty

h

Yntij2 = Sa

When

1 2 3

;

af 8 (yn' a Yns)

it comes to doubling, we can

Carry old back values. Restart. Use special formulas for a couple of steps.

For example, to use method 3 for Euler’s predictor, we need a formula of the form

Ynse = GoYn + G1 Yn—-1 + A(boyn + Bivn-s) To find such a formula, we proceed pretty much as usual and make the formula exact for (because the formula is around x = nh):

y=, y=1

x—nh, Se

(x — nh)?, =d,+a,

y=x—nh : 2h y=(x—nh)?: 4h? y=(x—nh)?: 8h?

= = =

—ha, +h(bo+b;) ha, + 2h?(—b;) —h%a,+3h3(b;)

whose solution is a)

(x — nh)

=—27

b, = 18

a,

= 28

b,=12

214

ORDINARY DIFFERENTIAL EQUATIONS

8.9

Systems of equations

firstAlthough we have apparently discussed only the solution of a system a handle easily can order differential equation, the methods of n first- order equations

yi =filtsyiYar > ++ Un) Yy2 =felXiYiYo

++

Yn)

Yn =fralXYi1Yo

Tes:

Yn)

by the simple process of doing each operation

in parallel on each

equation.

Higher-order equations such as

very = are easily reduced to a system of first-order equations by the nota-

tional trick of writing

then

and the equation y” + y = 0 is equivalent to the two equations

z’+y=0

y' =2% Thus, our discussion really covered systems of first- and higher-order equations.

PROBLEM 1

8.9

Reduce the system y" =f(x,y,y',%,2')

z"=g(x,y,y',2,2') to a system of four first-order equations.

8.10

8.10

LINEAR EQUATIONS WITH CONSTANT COEFFICIENTS

Linear equations with constant coefficients

As a completely different approach which is based on the polynomial

approximation of a coefficient in the equation and not the polynomial approximation of the solution, consider the equation

y" + ay’ + by = f(x)

Me eis

y'(a)=y4

Let us approximate f(x) by a sequence of straight lines.

fx) Approximation F(x) = rr + qalx — x1)

x;

Xi+t

x

Thus, in the interval x; = x =x,;,,, we assume

F(x) = pi + gilx — x:) Given y(x) and y' (x) at the left of an interval, we can proceed to get the analytical solution of the equation in the interval as follows: We know the solution of the homogeneous equation is

c,e™

==) + csemale —*)

where m, and mz, are the roots of the characteristic equation

m?+am+b=0 and we assume

that m, # m,. A particular solution of the complete equation can be found by trying a solution of the form y = A; + B;(x —=

SRO

Re

ye

“moo of ssitay boteluvias ened! eau NSS Sw

: a)

j

oe

Won Oey

,

Bad, -.

=

yi

o

wm

iS

;

; _

ek

aa et _—.

beecoud.ot

wen erice ohua 3 ;,

a9)

820,t Wana



me

5

ey

of

:

+

a: a

a

ae | Pe sd

pritiere wen wo en eeuley eegvit eey nbO_ Sw tort wmsid od & arit of yibiqe! epwyho9 tity sioyo ort Yo noite ye) S iat bne eage ad MM xy

“hp S(t each jannil need bat 6 otsupe ta ont

ante

jee _bivow Aasteggs svuPtsh os san veal 6d a ilaaabae ci =

y Dt]

OPTIMIZATION

9.1

Introduction

Much

of the complex

art of engineering

is the art of optimization.

Therefore, it should be no surprise to the student to be told that in practice optimization can be very difficult, and only special cases can

be handied easily and reliably. Optimization is the third stage in the process of designing a system. The first stage is modeling, or simulating, the system. This stage includes the art and special knowledge of the field of application. Upon the quality of the modeling depends the value of the answer. All too often a poor model is accurately optimized; usually it is wiser to optimize a better model partially.

System Design

Step 1

Construct model.

Step 2

Construct objective function.

Step 3

Optimize.

The second step is to decide what is to be optimized, that is, to construct the so-called objective function that describes what is to be optimized (often the cost is to be minimized). This construction again lies in the field of application and outside the domain of a book on numerical methods. The construction of the objective function is often so difficult that in the past it has been left in a vague state of a general understanding of what is to be done, and this has at times led to poor workmanship because the designers did not clearly recognize what they were trying to do. However, let us be realistic and admit that in many applications it is not possible to say at the beginning just exactly exwhat is to be optimized; often it is only after the matter has been underis problem the of nature true amined from many sides that the stood to any extent. When we suppose that we are given an objective is well function to optimize, we are in fact supposing that the problem describe indeed does understood and that the function we are given the optimization problem accurately.

9)

222

OPTIMIZATION

Furthermore, in many cases of engineering design and evaluation, there are many goals to be considered. In these cases the optimizations for each of the goals must be combined at some time, either analytically through some new objective function that combines the various goals, or by some intuitive judgment. It is surprising how after long and careful analytical studies the final judgment is frankly a “hunch.” Optimization implies either maximizing or minimizing. But since the maximum of a function f(x) occurs at the same place as does the minimum of —f(x), and the extreme values are simply related, it is convenient to discuss only the minimization problem; this we will do.

Maximum

Minimum

The classical design methods often used the simple process of setting up the model plus the objective function (if only vaguely in the mind) and then adjusting one variable at a time in trying to minimize the numerical value of the objective function. In simple problems this may be satisfactory, but in more complex problems this method of “one variable at a time” is too slow and costly for practical use, and we now have methods which change many or all of the variables at the same time. Let us emphasize again, optimization is implied in almost every simulation or modeling, but the success of the optimization depends critically on the quality of the modeling and on the proper choice of the objective function. If either is not accurate or reasonably close to accurate, then it is highly unlikely that the result of a careful optimization will be of much value. Thus, before starting any optimization a careful examination should be made of the previous two steps.

9.2

9.2

RESULTS FROM THE CALCULUS

Results from the calculus

We begin with a brief review of optimization as it is usually taught in the calculus course. In a calculus course the function characteristically has a horizontal tangent at'the point where a maximum or minimum occurs. For a function of one variable this means that the derivative is zero at the point, whereas for a function of many variables it means that all the partial derivatives at the point are zero. In both cases this is not sufficient; it is only necessary.

Maximum

Maximum Minimum

| | | | |

Stationary

x

point

It is usually glossed over that the interval to be searched for extremes may be limited and that a special check needs to be made along the boundary. This remark will serve to recall that in fact what the method of “equating the derivative to zero’’ really found was relative maxima or minima, and that sometimes points which were found were neither; these are the so-called stationary points where the

function

is locally horizontal

but where

there are both larger and singularities, such

smaller values in the neighborhood. Furthermore, as cusps in the function, were tacitly excluded. y

Maximum

Cusp

223

224

OPTIMIZATION

For functions of a single variable, the additional test with the sign of the second derivative (supposing that the value of the second derivative is not zero) will separate the maxima and minima from each other as well as exclude the stationary points.

Second derivative test

Ret Holds water

VEeNS Does not hold water

For functions of more than one variable, beyond two variables and only observe that

: fash —Ff?rv)

the textbooks

>0

for frr

0

=0

undecided

Her) “su < jai

Ox,

j=

where x“ is the vector of the ith iteration. Both have their faults. In more detail, the method of steepest descent consists of the following steps: 1

Start at some suitable initial point x (a point having n coordinates). The ith iteration proceeds as follows.

9.6

2

STEEPEST DESCENT

Compute the negative gradient.

af OX)

Jat x=x

o| direction

The steps to be taken lie in this direction, that is,

af x)

= x, —h Bea

xj where h is a parameter.

Search in the negative gradient direction until three points are found for which the middle one has the least of the three values of the objective function. This can be done by stepping forward in equal-sized steps until a larger value than the immediately preceding one is reached. If the first step has this property, we bisect and try again, repeating the bisection until we find three equally spaced values with the desired property. Use quadratic interpolation about the middle point

ay 1,0 —#_,0 Sat a — 2, + a)

0: ¢

—h

h

x

to find the minimum along the line that was being searched, where

f (i) =

rh

f(xED

j

at

i—)

ax,

Thus, the new point is or

KD

= x,

—t—_

ax,

for each j.

5

Check to see if the stop criterion is met, and if not, return to step 2. Notice the that as the iterations progress, it will usually be necessary to decrease

search-step size h.

235

236

OPTIMIZATION

9.7

A variable metric method

Most variable metric methods can be considered quasi-Newton methods. In Newton’s method for finding a zero of a function of a valsingle variable, we use both the function and the first derivative should we ), derivative the of zero (a minimum a ues. In searching for use the derivative and the second derivative.

_ Ff (@) f" (a) For many variables we should need the inverse of the matrix of second partial derivatives (the Hessian matrix). ne

EE

a

Le

Se

Example Let f(x) be a quadratic

f(x) =at+ b’x+x"Qx Then the gradient is

u(x) = b + Ox Let x° be the minimum and x® any initial point. Then,

a(x) = 2(x*) + O(x* — x°) But g(x*) =0; hence

& =x?

OF) alt)

and if Q-) is known, then the minimum is found in one step. ee

se

—————————eE————e——E————————

If we had this inverse, and if the surface were exactly a quadratic, then we should expect to find the minimum in one step. In practice we neither evaluate the Hessian (since it is usually very expensive to do so), nor find the inverse; furthermore, the surface that we are searching for a minimum is not apt to resemble a quadratic until we are quite near the minimum. The variable metric, or quasi-Newton, methods all have the underlying idea of generating the direction of the next step of the search by multiplying the negative gradient by a matrix that in some sense is related to the inverse Hessian. Corrections in this matrix are later made so that it approaches the inverse Hessian. Perhaps the best known of these methods is the one made popular by Fletcher and

Powell. +R. Fletcher and M. J. D. Powell, “A Rapidly Converging Descent Method for Minimization,

Brit

Comp. J., vol 6, pp. 163 -168, 1963,

9.7

AVARIABLE METRIC METHOD

The steps are as follows: 1

Start with the matrix 7 as the initial guess H® for the matrix of quadratic terms. For the ith steps we proceed as follows: Compute the negative gradient just as before,

at

ax; Compute the new direction

of 8,0, =—-H (-—)

ax,

Find the step size t as before by quadratic interpolation.

Compute the vector

oj = ts, Compute the new value as before, x)

= x; + ts;

Compute the change in the negative gradient,

ifs (mer) la) @O=(—

of (+)

of)

SS

Compute the matrix

AO

oa, oT

ei (OT y (i) Go; ¥Y;

where the superscript T means the transpose.

Compute the matrix

OE

a otbial

MyOywor yO

10

Hw

Hy

Compute the matrix for the next iteration

Het) = HO + AW + BO 11

d, then return to step 2 for Check the stop criterion, and if it is not satisfie the next iteration.

237

238

OPTIMIZATION

9.8

Optimization subject to linear constraints

Up to now the problems in this chapter have been problems of unconstrained optimization. This has meant that in our search for a minimum of f(x:,...5 Xn), any value of the variables x,,...,%, Was a permissible one. There are many practical cases, however, in which physical or mathematical reasoning forces us to restrict our search to values of the variables which satisfy certain conditions.

Minimize

F(x) = (x1. — 4)? + (x2 — 4)? subject to Ko

X=

0

4x, +%,t1220 —x,—x,t420 X—x%2+520

These conditions, or constraints as we will call them from now on, can take on many forms. In this section and the next we shall examine some of the forms in which constraints appear and how they effect the

solution and we shall suggest methods for handling them. Some constraints are easier to handle than others. Consider, for example, nonnegative constraints, that is, each of the variables x,,..., x, is restricted

to nonnegative values. Under these conditions, the statement of the optimization problem would read: Minimize the function F(x,

ons

a8

ea)

subject to the constraints x, 20

(i=1,...,n)

The easiest way to solve this problem is through a transformation of variables. If we let x=y?

(=,

450, nN)

then we may minimize f as an unconstrained function of the y;,..., y, and be certain that the nonnegative constraints will be satisfied at the optimum. An alternative way is to apply the gradient or the quasi-Newton technique of Sec. 9.7 with the following modification: Whenever a variable, say x,, becomes zero and at the same time the

9.8

OPTIMIZATION SUBJECT TO LINEAR CONTRAINTS

xth component of the gradient is positive, then stop updating x, until of/dx;, changes sign. This is another way in which we can be sure that the search will be restricted to nonnegative values. The above tricks apply equally well when the variables are restricted to lying between lower and upper bounds, that is, when the conStraints are of the form x,—-1,=0

u;—x,

(i ll

att)

= 0

where the /; and u,; are given and, of course, u; > /,;. Here, we may again use a transformation on the variables. We may, for example, let x, = (u; —1,) sin? y, + 1; and solve the problem as unconstrained in terms of the y’s. Other transformations are possible; however, one has to be careful about introducing new local minima. This is because the minimization of f(y;,

.. + Yn) Will stop whenever

df/day;=0;i=1,...,n. But by the chain

rule,

of _ af ax dy,

AX; OY;

Therefore, any transformation on the x's which causes dx,/dy,; to be zero at a point which meets this constraint and is not on the boundary

will introduce a new minima. The constraint types discussed so far are linear and are special cases of the general linear constraints of the form

>

aisx,—b;

20

(GG=1,

285m)

i=1

Linear constraints are much easier to handle than the nonlinear ones,

which we discuss in the next section. The main reason for this is that the boundary of the feasible region is composed of straight lines in two dimensions, planes in three dimensions, and hyperplanes in higher dimensions. It is, consequently, easy to travel along a boundary during the search for the optimum. Several methods, developed especially for linearly constrained problems, take full advantage of this fact. Best known among these is the so-called gradient-projection method. This method has been designed primarily for linearly constrained problems with a nonlinear objective function; it is essentially

239

240

OPTIMIZATION

an extension of the steepest-descent method to linear constraints. This presentation is too lengthy to be included in this book and the reader is referred to the literature. A very important class of optimization problems is the one in which both the objective function and all the constraints are linear. Problems of this type, that is, minimize

Ae) = >, C1Xj i=1

subject to n

>

4%, —

b; 20

Giz Vigtecei tt)

i=1

are called linear programming problems, and they are important because of the enormous variety of applications which they have found in practical situations.} A little thought will make it intuitively clear that the solution of a linear program lies on the boundary of the feasible region, usually on a corner, but sometimes along one side of a constraint. This observation provides an efficient technique for solving linear programs. More details can, again, be found in the literature.

9.9

Nonlinear constraints—Lagrange multipliers

Up to now the constraints we have used have been linear inequalities. Had they been equalities, it would have been easy (in principle) to eliminate some of the variables and thus reduce the problem to one with no linear equality constraints. A constraint that is a nonlinear function of the variables makes the direct elimination appear to be difficult. Fortunately, the method of Lagrange multipliers makes it fairly easy in theory to handle the problem. Suppose we wish to find the stationary values (the extreme values plus possibly a few others) of the function

z = f(x,y) +Read, for example, the section beginning on p. 133 of ‘Nonlinear Mathematics ” by T. L. Saaty and J. Bram, McGraw Hill Book Company, New York, 1964.

tSee, for example, S. |. Gass, Book Company, 1969.

“Linear

Programming,”

3d ed., McGraw-Hill

9.9

NONLINEAR CONSTAINTS-LAGRANGE MULTIPLIERS

subject to the constraint

g(x,y) =0

alxy)=0 The proof is straightforward. g(x,y) =0 we find

Suppose

Bes

for the moment

that from

y = (x) Then we want to find the stationary values of

F(x, 6()) Thus, we differentiate fand set it equal to zero

of af



ax

=



ox

4H

af db_

‘anh

SF

oy dx

But g(x,y) =0 leads to og , og dy_ aeun aie”

or

om,

26 dd= 9

‘ax | dy dx

)

(9.2)

241

242

OPTIMIZATION

Eliminating d¢/dx from Eqs. (9.1) and (9.2), we get

of ag _ of 86

ax ay

dy ox

(9.3)

of og —+\— Es +r ay = 0

9.4 (9.4)

We now define \ by

and use this in Eq. (9.3) to eliminate dg/dy so that we get

OF, Gh Serene ax oy

dy ax

or

Fi, Miooa

ats

(F a0)

(ap me

Be

These two equations (9.4) and (9.5) together with g(x,y) =0 determine the quantities A, x, and y at the stationary point. These equations can be obtained formally by the following method: In place of f(x,y), we use

F (x,y) =f (x,y) + Ag(x,y) The derivatives

OF

107,

og

Ox adi dee orca

OF IS 32 CL oy

dy

oy

lead to the same equations as before and together with the constraint g(x,y) =0 determine the solution. For n variables x,, x2, ...,X, and the function

f =F X11%as ae

«1 Xn)

together with the m constraints

Ce

i

9.10

OTHER METHODS

leads to the function F=f+

181+

Asg2.+:

i “+

Amon

The n partial derivatives

oF net td

P (= Ter

1)

plus the m constraints

g,;=0

(1

rae

12)

determine the m + n unknown x, and A;. These equations must be satisfied at every extreme value of f unless all the Jacobians of the m functions g; with respect to every set of m variables chosen from Skene

9.10

«on. Xa 1S ZELO.

Other methods

The Lagrange multiplier approach is a classical one and has endured a long time because of its practical and theoretical importance. There are occasions, however, when the solution of the nonlinear equations resulting from this approach is quite difficult. In such cases, other methods are preferred in practice. One of the best-known alternative techniques for handling equality constraints was originated by R. Courant. In this method we minimize the function

Ptr ees ty) Fler

ead HPS [elie Bad)®

(7 > 0)

i=1

as an unconstrained problem for a sequence of monotonically increasing values of r. As r becomes infinitely large, the sum of the squares of the constraints is forced to zero. In this way, the sequence of the successive minima of F converges toward the solution of the given problem. We conclude this chapter with a few brief comments about nonlinear inequality constraints of the form giao

aca

0

(Leer UR casseats)

First let us examine the necessary conditions for the solution. It turns

243

244

OPTIMIZATION

out that here, as for the equality constraints, necessary conditions can be expressed through a Lagrange function. If a point x1, ...5 Xn isa minimum of the given objective function f(x;,..., %n) Subject to the above constraints, then it must be true that the negative gradient vector of f can be expressed as a linear combination of the negative gradients of the constraints which are binding. Geometrically this means that if the minimum point is on the boundary of the feasible region, then the negative gradient of f must fall within the cone which is formed by the gradients of the constraints which are binding. Constraints

————- > — Gradient

A little thought will show that if this were not true, then we should be able to move a little farther along the boundary to obtain a point at which f is smaller. There are two common philosophies in the various solution approaches suggested for problems with nonlinear inequality constraints: ‘‘the boundary-following approaches” and the “‘penaity-function techniques.” As their name implies, the boundary-following approaches suggest that when a constraint is (or is about to be) violated, then follow the

boundary of the feasible region defined

by that constraint

until a

point which satisfies the necessary conditions for a minimum is found. If the boundary is highly nonlinear, then the convergence of such approaches will be slow. In such cases, we prefer the penalty-function techniques.

9.10

OTHER METHODS

The idea of a penalty function was introduced earlier when we added the sum of the squares of equality constraints to f in order to “penalize” the minimization of the resulting function whenever a constraint is violated. A similar technique can be used for inequality constraints. Consider, for example, the functions

Be

Reo

het

m

Ma) Ft et

1

uilxys sss

0

Xp

oe

or

Fty,.-.. te)=f )—r Sin [eile .- 6s ta)

(r 0)

i=1

or

FC

=f)

+r SUB.a

where 2; = 0 if g; = 0, and 2; = g; if g; < 0. If we minimize these functions sequentially for a series of positive values for r (monotonically decreasing for the first two functions and increasing for the third one), then we approach a solution to the constrained problem.

245

.

get

ate

1h

of vaio. oo

99 ‘be? hod

aywy.-< pepe Sreaas ji Viteneg

4, oh atetitrterod

4

seepage

ae

aoiaanu! ic

reat

\ikeuge FO oniivess.

ef

(wd

onto

o£ agit BAT

Polesiminvn

> ip lenin?

scout

.€

ote pe ot) Oo) (ve ot sib: -

ae

Jochen

ail

A. Jesh

lernte.2or

castingveny(

ony

al sista

sbizany

;

;

Fle,

.

eps.



jimany

hex a

St Le

-

a

-

i"

es

"

wi bs

: a

(an

%)} ta i

ae

fs

iy

yt



PR

:

\e



a

ane

a

v

9 / =f ~

~~

*

.



ve

,



:

Liar...

i”

cethinl



A

re

Bf’ +



ni

=

.

;

(oka

ss

(eee

;

i

s

Lie

~

i

eo



{

.

es

.

z

.

y

Ty

(

it~

i

Mw

; oy — HP > 1 = A bas 65 ahem wer ache vines’ I. Sei we Bvt enNNGUpSs 2N0lISAUT weed asian Qi

Uns ahokgnyt ot tevit ull Vor grinee cet yileatnoto: om} 1 30h e8 *

hep aA OF Aoliuies & rsaoiqqe ewie 1) {ono pvt? ert sot

i

pan

snatde oe

7

_

_

at

J

pias”

at

“a

i

Any

=

re



,

Conn

i t= :

"

i

vo

Vm bee niles

i

«

fe

7) -

»

Pike =)

:

Wiy

:

a?

a Ls

,

7

j

farther

oho

a

. 7

a

»



7

Sponge ’



. wee

4

os

S

wh”?

0T Ge

ij mi

irc

» s



Wely wa

: ’

oS * ne ; ~

me :

-*

the

wit

reaio

a4

o

nee

whe

'

rm

1 “ Af

Seavey eT

ce

sia

¢

ame

‘weooreary

Yoh pint

oe

:

ore "i em!

-

< oa es

rr

ne 7

setee *

-

¢ Ww hed

¥

a

¢ orang

mnirenet

we te Bldaw,t" &

cant

. eae

Sree anna wh



a

:

ix

aa

wach can, aa athee he P ,aw Te 7 Seah“a ~ 7

—_ ;

_

LEAST SQUARES

10.1

The idea of least squares

Situations frequently arise in engineering and science in which there are more conditions to be satisfied than there are parameters to adjust. For example, we may be given a family of curves with, say, three Parameters, and we are required to find that member of the family which best approximates a set of M > 3 data points. The question is, of course: Best approximates in what sense? How shall we choose the particular member of the family?

x

One widely used method for selecting the values of the parameters that define the particular curve of best fit is to say they are given by the choice which minimizes the sum of the squares of the differences between the observations and the chosen curve at the corresponding points. Minimize M

os (fobs sgh (a k=l

This method is called the ‘‘least-squares method of fitting the curve.” Thus we have a special case of the general minimization problem which was treated in Chap. 9. The particular method of least squares has special features which deserve special treatment.

LO)

248

LEAST SQUARES

x

Before accepting this method as worth using, let us examine some of its features. It has been said that scientists and engineers believe that the principle of least squares is a mathematical theorem, whereas mathematicians believe it to be a physical law. Neither belief is correct; it is simply a convenient criterion for selecting a particular curve to fit some given data. The main characteristic of the method of leas squares is that it puts great emphasis on large errors, and very little emphasis on small errors; the method

will prefer 10 small errors 0

size 1 to one error of size 4.

4>

> (1)?= 10 k=1

An obvious fault of the method is that a gross blunder in the record ing of the data will usually completely dominate the result.

Probably a

~

More reasonable nano’

x

It is wise, therefore, to examine

the residuals—the

differences

be-

tween the observed data and the computer “data’’ from the approxi-

mating least-squares curve—to see if they appear to be reasonable or due to a single large error (or perhaps two large errors). Other choices of a criterion for selecting the particular member of the family of curves can be made, and of course different criteria give

10.1

THE IDEA OF LEAST SQUARES

different answers, sometimes quite different! The least-squares choice minimizes

but we might, for example, try to minimize M

> lel t=1

where the é; are the residuals.

This minimization

of the sum

of the

absolute values of the residuals is rarely used, probably because

it

leads to difficult mathematics. Another criterion, which we shall exam-

ine more closely in Chap. 13, is

min (f= a1,...,M et) which minimizes the maximum

residual error (€;).

x

In this chapter we shall confine ourselves to the least-squares criterion. Least-squares fitting is often regarded as a method of “smoothing the data” to eliminate unwanted ‘‘noise”’ or errors in the observations. Points on the least-squares curve are often regarded as the “smoothed values”

of the original data.

e OF ee ee e fees condiFrom the assumption that the errors satisfy certain plausible the tions, we can derive the least-squares criterion for selecting here.t so parameters, but we shall not do

Engineers," chap. 17, +R. W. Hamming, “Numerical Methods for Scientists and McGraw-Hill Book Company, New York, 1962.

249

250

LEAST SQUARES

10.2

The special case of a straight line

In order to approach the general method, let us start with the simple and very common problem of fitting a straight line y=ax+b to a set of data points (xi,41)

(j=1,...,M)

x

We want to pick two parameters a and b so that the calculated values

y(x;) =ax,+b are near the data points yj. Note that y; as used here means the data points (observations), whereas y(x;) are the calculated values

An example Observations:

(0, 0) (1, 1) (2, 1) (3, 4) (4, 4)

Guessed line

10.2

THE SPECIAL CASE OF ASTRAIGHT LINE

It is unreasonable to expect that the line will go exactly through each data point (x;,y;). Thus, we consider the residuals & =y; — y(x;)

= yi — (ax; + b) as a measure of the error. The least-squares criterion imizing M

requires min-

M

m(a,b) = '¥ &? => [yi — (ax, + b)]? i=1

where m(a,b) is the function of the parameters a and b that is to be minimized. As in the calculus, we differentiate with respect to each of the parameters and set the derivatives equal to zero.

om

25

[yi— (ax; + b)]x,;=0

i=1

Sp = ~2 3 (ui (ax, +b)]=0 Dropping the —2 factors and rearranging, we have

M

M

ayix a+b ~=1

Sua

>

i=1

i=1

aS x +b i=1

xvi

51=s Yi {=1

as the pair of linear equations to be solved for the unknown values of the parameters a and b.

NOTE:

S 1 =M and the symmetry of the coefficients. i=1

251

252

LEAST SQUARES

These equations are called the ‘‘normal equations,” and their solution determines the least-squares straight line (in this case).

Sum

x

y

x?

xy

0

0

0

0

1

1

4

1

2

1

4

2

3

4

9

12

4

4

16

16

10

10

30

31

ax30+bx10=31]+1 aX10+bx5=10|-2 10a =11 ll

a=

9 =i

5b =10=-]

10a

'. the line is

y=1.1x —4

PROBLEM 1

10.2

Find the least-square straight line fitting the data x

y

Nw Oo fF oo oO + & On

10.3

Polynomial approximation

A general polynomial of (fixed) degree N through M(>N) points is easy to do because the parameters (the coefficients of the polynomial)

10.3

POLYNOMIAL APPROXIMATION

occur linearly in the function, and as a result they also occur linearly in the normal equations that come from the differentiation process. To see how the method goes for the general polynomial, let the equation of the polynomial be

Uo

grt 4,2

0,8

e ay x”

(x1,44)

acs Ve pomilenae

and let the data be

We wish to minimize

m(a₀, ..., a_N) = m(a_j) = Σ_{i=1}^{M} ε_i² = Σ_{i=1}^{M} [y_i − (a₀ + a₁x_i + ⋯ + a_N x_i^N)]²

with respect to the parameters a₀, ..., a_N. Differentiating m(a_j) with respect to each a_k and setting the result equal to zero, we have the N + 1 equations

∂m(a_j)/∂a_k = −2 Σ_{i=1}^{M} [y_i − (a₀ + a₁x_i + ⋯ + a_N x_i^N)] x_i^k = 0   (k = 0, 1, ..., N)

Rearranging and writing out the corresponding normal equations in more detail, we have

a₀ Σ 1      + a₁ Σ x_i       + ⋯ + a_N Σ x_i^N      = Σ y_i
a₀ Σ x_i    + a₁ Σ x_i²      + ⋯ + a_N Σ x_i^{N+1}  = Σ x_i y_i
 . . . . . . . . . . . . . . . . . . . . . . . . .
a₀ Σ x_i^N  + a₁ Σ x_i^{N+1} + ⋯ + a_N Σ x_i^{2N}   = Σ x_i^N y_i

To simplify the notation we set

S_k = Σ_{i=1}^{M} x_i^k        (k = 0, 1, ..., 2N)

T_k = Σ_{i=1}^{M} y_i x_i^k    (k = 0, 1, ..., N)


The N + 1 equations can now be written as

S₀a₀   + S₁a₁      + ⋯ + S_N a_N      = T₀
S₁a₀   + S₂a₁      + ⋯ + S_{N+1} a_N  = T₁
 . . . . . . . . . . . . . . . . . .
S_N a₀ + S_{N+1}a₁ + ⋯ + S_{2N} a_N   = T_N

We need to solve these equations for the N + 1 unknowns a₀, a₁, ..., a_N. Note that there are only 3N + 2 sums to be found (2N + 1 S's and N + 1 T's).

It is easy to show that the determinant of these equations is not zero, for if the determinant were zero, then the homogeneous equations (with the right-hand sides all zero) would have a nonzero solution. This follows from the observations that D = 0 implies that any solution of the first N equations will automatically satisfy the last equation and that we can therefore assign any value we please to one of the variables in the first N equations. To show that the homogeneous equations have only the zero solution, we multiply the first homogeneous equation by a₀, the second by a₁, ..., and the last by a_N, and then add all these equations to get

Σ_{k=0}^{N} Σ_{j=0}^{N} a_k S_{k+j} a_j = Σ_{k=0}^{N} Σ_{j=0}^{N} a_k a_j Σ_{i=1}^{M} x_i^{k+j}
    = Σ_{i=1}^{M} (Σ_{k=0}^{N} a_k x_i^k)(Σ_{j=0}^{N} a_j x_i^j)
    = Σ_{i=1}^{M} y²(x_i) = 0

where y(x) = a₀ + a₁x + ⋯ + a_N x^N. This requires that the polynomial y(x) of degree N be zero for the M > N values x₁, ..., x_M; hence y(x) ≡ 0 for all x. Thus the homogeneous equations have only the zero solution, and therefore the original determinant was not zero, which is what we set out to prove.

Although the determinant is theoretically not zero, for N > 10 it is so small that it might as well be zero. As long as N is less than, say, 6, it is reasonable to solve the system of equations; around 10 or so, it is often difficult; and by N = 20, it is almost impossible. One solution to this trouble is given in the next chapter, on orthogonal functions.
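The ill-conditioning is easy to observe numerically. The following Python sketch (an added illustration, not part of the original text; it assumes NumPy is available) forms the normal-equation matrix S for equally spaced points on [0, 1] and prints its condition number, which grows explosively with the degree N:

```python
import numpy as np

# Normal-equation matrix S[k][j] = sum of x_i^(k+j) over equally spaced x_i.
M = 50
x = np.linspace(0.0, 1.0, M)

for N in (3, 6, 10, 15):
    S = np.array([[np.sum(x ** (k + j)) for j in range(N + 1)]
                  for k in range(N + 1)])
    print(N, np.linalg.cond(S))   # condition number explodes as N grows
```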

PROBLEM 10.3

1 Sketch a flow diagram for computing the sums S_k and T_k.

10.4 Weighted least squares

It is a common experience that all the data points are not equally reliable. Typically, they tend to get less reliable as one approaches one (or both) ends of the range of measurements. One method of taking care of this matter of variable accuracy is to attach suitable weights w_i ≥ 0 to each term in the sum

that is being minimized. A brief examination of the process shows that the effect is merely that the sums

S_k = Σ_{i=1}^{M} w_i x_i^k

T_k = Σ_{i=1}^{M} w_i y_i x_i^k

now include the weights w_i. Otherwise, there are no significant changes.
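A minimal sketch of the weighted fit, in Python with NumPy (added here for illustration; the function name and test data are invented for the example). It builds the weighted sums S_k and T_k exactly as above and solves the normal equations:

```python
import numpy as np

def weighted_poly_fit(x, y, w, N):
    """Least-squares polynomial of degree N with weights w_i >= 0.

    Builds S_k = sum w_i x_i^k and T_k = sum w_i y_i x_i^k, then solves
    the (N+1) x (N+1) normal equations for a_0, ..., a_N.
    """
    S = [np.sum(w * x**k) for k in range(2 * N + 1)]
    T = [np.sum(w * y * x**k) for k in range(N + 1)]
    A = np.array([[S[k + j] for j in range(N + 1)] for k in range(N + 1)])
    return np.linalg.solve(A, np.array(T))

x = np.linspace(0, 4, 5)
y = np.array([0., 1., 1., 4., 4.])
w = np.ones_like(x)                       # equal weights reproduce the earlier fit
print(weighted_poly_fit(x, y, w, 1))      # approximately [-0.2, 1.1]
```

Making the weights larger near the ends of the interval is one way to cure the large end errors mentioned later in this chapter.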

PROBLEM 10.4

1 Fill in the details of the derivation in this section.

10.5 The general linear case

Consider, next, the more general case in which the unknown coefficients a_j occur linearly. In this case we are approximating (fitting, smoothing) with the function

y(x) = a₁f₁(x) + a₂f₂(x) + ⋯ + a_N f_N(x)

where the f_j(x) are given functions of x. If the given data are (x_i, y_i), i = 1, ..., M, and M > N, then we want to minimize

m(a_j) = Σ_{i=1}^{M} ε_i² = Σ_{i=1}^{M} {y_i − [a₁f₁(x_i) + a₂f₂(x_i) + ⋯ + a_N f_N(x_i)]}²


As usual we differentiate with respect to each of the unknowns a_j in turn and set the derivatives equal to zero (and neglect the factor of −2). We get

a₁ Σ f_j f₁ + a₂ Σ f_j f₂ + ⋯ + a_N Σ f_j f_N = Σ y_i f_j   (j = 1, ..., N)

The summations are over i, the index of the sample points. We set

S_{k,j} = Σ_{i=1}^{M} f_k(x_i) f_j(x_i) = S_{j,k}

T_j = Σ_{i=1}^{M} y_i f_j(x_i)

and obtain the "normal equations"

S₁₁a₁ + S₁₂a₂ + ⋯ + S₁ₙa_N = T₁
S₂₁a₁ + S₂₂a₂ + ⋯ + S₂ₙa_N = T₂
 . . . . . . . . . . . . .
Sₙ₁a₁ + Sₙ₂a₂ + ⋯ + Sₙₙa_N = T_N

In the general case we can prove that the determinant is not zero in the same way as before, provided the f_j(x) are all linearly independent (over the set of points x_i).
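A compact Python sketch of the general linear case (added illustration; the helper name and test basis are invented for the example, and NumPy is assumed). The matrix product FᵀF yields exactly the S_{k,j}, and Fᵀy yields the T_j:

```python
import numpy as np

def linear_lsq(basis, x, y):
    """Fit y(x) = a_1 f_1(x) + ... + a_N f_N(x) by the normal equations.

    basis: list of callables f_j; x, y: data arrays with len(x) > len(basis).
    """
    F = np.array([[f(xi) for f in basis] for xi in x])  # design matrix
    S = F.T @ F          # S[k][j] = sum_i f_k(x_i) f_j(x_i)
    T = F.T @ y          # T[j]    = sum_i y_i f_j(x_i)
    return np.linalg.solve(S, T)

x = np.linspace(0, 2, 21)
y = 2.0 + 0.5 * np.sin(x)                  # noiseless test data
print(linear_lsq([lambda t: 1.0, np.sin], x, y))   # approximately [2.0, 0.5]
```

(In production work one usually solves the rectangular system F a ≈ y directly by QR factorization rather than forming FᵀF, precisely because of the conditioning trouble discussed above.)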

10.6 Nonlinear parameters

If the parameters of the family of curves do not occur in a linear fashion, then the normal equations are no longer linear and are hence usually more difficult to solve. For these cases we can use the methods in Chap. 9. The best simple strategy if there are some nonlinear parameters and some linear parameters seems to be as follows: Guess at a set of values for the nonlinear parameters, and then do a least-squares fit on the linear parameters. Regard the resulting sum of the squares of the residuals as a function of the nonlinear parameters only, and minimize it by the methods in Chap. 9.

Examples

y = a + b e^{cx}

y = a/(1 + b₁x²)

y = (a₁ + a₂x²)/(1 + b₁x²)

In the case

y = a + b e^{cx}

1 Guess at a c value.
2 Fit a and b as before.
3 Plot the sum of squares corresponding to c.
4 Change c, using the gradient

∂m/∂c = −2 Σ b x_i e^{cx_i} (y_i − a − b e^{cx_i})

and repeat the cycle from 2 on.

In this case the function and gradient evaluation includes the minimization of the linear parameters each time.
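The strategy is short to program. The Python sketch below (an added illustration; the data, function name, and the crude grid search over c are all chosen only for the example, and NumPy is assumed) does the inner linear fit for each trial c and keeps the c with the smallest sum of squared residuals; in practice the outer minimization would use the Chap. 9 methods instead of a grid:

```python
import numpy as np

def fit_linear_part(x, y, c):
    """For a fixed nonlinear parameter c, fit y ~ a + b*exp(c*x) linearly."""
    F = np.column_stack([np.ones_like(x), np.exp(c * x)])
    (a, b), *_ = np.linalg.lstsq(F, y, rcond=None)
    r = y - (a + b * np.exp(c * x))
    return a, b, float(r @ r)        # parameters and sum of squared residuals

# Synthetic data from a = 1, b = 2, c = -0.5 (values chosen for illustration).
x = np.linspace(0.0, 4.0, 40)
y = 1.0 + 2.0 * np.exp(-0.5 * x)

best = min(((*fit_linear_part(x, y, c), c) for c in np.linspace(-2.0, 0.0, 201)),
           key=lambda t: t[2])
a, b, m, c = best
print(a, b, c)     # close to 1, 2, -0.5
```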

10.7 General remarks

It frequently happens that the system of equations does not determine the parameters very accurately. This tends to worry the beginner.

(Figure: a broad, flat minimum; the interval of uncertainty in the parameters.)

The reason they are poorly determined is often that the minimum is broad and rather flat. It is true that the optimum values are then poorly


known, but it is also true that whichever set you choose (among those that look good) does not very much affect the value of the function you were minimizing; thus, the uncertainty does you little harm. The method of least squares seems to be much more complex to the beginner than it is. The method is: You simply write down the expression for the sum of the squares of the residuals,

m = Σ_{i=1}^{M} [y_i − y(x_i)]²

decide on the parameters you have available, and then minimize the expression as a function of these parameters. The algebra may get messy, but the ideas are very simple.

The student who does several least-squares fittings will soon notice that the biggest errors often occur at the ends of the interval, which can be annoying. Using suitable weights can cure this if necessary.

10.8 The continuous case

Sometimes we want to fit a function on an entire interval by a least-squares approximation. Consider the specific problem of fitting

y = e^x   (0 ≤ x ≤ 1)

by the straight line

Y = a + bx

Proceeding as usual, we form the square of the residuals and "sum" over all the data; that is, we integrate over the range:

m(a,b) = ∫₀¹ [e^x − (a + bx)]² dx

Now we differentiate with respect to the linear parameters:

∂m/∂a = −2 ∫₀¹ [e^x − a − bx] · 1 dx = 0

∂m/∂b = −2 ∫₀¹ [e^x − a − bx] x dx = 0

Upon doing the integrations, we get (after dropping the factor −2)

a + b/2 = e − 1

a/2 + b/3 = 1

Solving these equations, we get

b = 6(3 − e) = 1.69

a = 4e − 10 = 0.873

Hence,

Y = 0.873 + 1.69x

is the least-squares approximating polynomial. The use of a positive (or at least a nonnegative) weight function in the continuous case follows exactly the same process as in the discrete case.
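The continuous normal equations can also be built by numerical integration when the integrals are not convenient analytically. A minimal Python sketch (added illustration; it assumes SciPy's quad integrator is available) reproduces the fit above:

```python
import numpy as np
from scipy.integrate import quad

# Continuous least squares: fit e^x on [0, 1] by a + b*x.
# Normal equations: integral of (e^x - a - b*x) * g_i(x) dx = 0 for each basis g_i.
g = [lambda x: 1.0, lambda x: x]                    # basis functions 1 and x
S = np.array([[quad(lambda x: gi(x) * gj(x), 0, 1)[0] for gj in g] for gi in g])
T = np.array([quad(lambda x: np.exp(x) * gi(x), 0, 1)[0] for gi in g])
a, b = np.linalg.solve(S, T)
print(a, b)   # 0.8731..., 1.6903...
```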

PROBLEM 10.8

1 Approximate y = e^x, 0 ≤ x ≤ 1, in the least-squares sense.

. . .

and the functions are orthogonal over the interval −1 ≤ x ≤ 1. Define the new set f_i(x) as

f_i(x) = F_i(x) / [∫ F_i²(x) dx]^{1/2}

Then

∫ f_i²(x) dx = 1

The reason orthogonal functions play an important role in practical computing is that in a sense they are "more linearly independent" than the usual set of independent functions from which they are constructed. For example, in the interval (−1 ≤ x ≤ 1) the functions xⁿ are all rather similar to one another, but the Legendre polynomials differ rather more from each other than do the xⁿ.
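The orthogonalization construction asked for in the problems below is mechanical enough to hand to a computer-algebra system. A minimal sketch in Python (added illustration; it assumes SymPy is available, and the function name is invented for the example) orthogonalizes 1, x, x² over [−1, 1] with weight w(x) = 1:

```python
import sympy as sp

x = sp.symbols('x')

def gram_schmidt(powers, lo=-1, hi=1):
    """Orthogonalize 1, x, x^2, ... over [lo, hi] with weight w(x) = 1."""
    ortho = []
    for p in powers:
        f = p
        for g in ortho:
            # subtract the component of p along the already-built g
            f -= g * sp.integrate(p * g, (x, lo, hi)) / sp.integrate(g * g, (x, lo, hi))
        ortho.append(sp.expand(f))
    return ortho

print(gram_schmidt([sp.Integer(1), x, x**2]))
# [1, x, x**2 - 1/3] -- scale multiples of the first three Legendre polynomials
```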

PROBLEMS 11.4

1 Construct the first three Legendre polynomials by the orthogonalization process. Note that they differ from those listed by suitably chosen scale factors.

2 Modify the Gram-Schmidt process to construct orthonormal functions.

11.5 Least squares

Orthogonal functions are closely related to least squares, as is shown by the following theorem:

Theorem. In an orthogonal-function approximation

F(x) ~ Σ_{j=0}^{∞} a_j f_j(x)

the coefficients of the least-squares fit are given by the standard formula

a_j = [∫_a^b F(x) f_j(x) dx] / [∫_a^b f_j²(x) dx]

Proof. As always in least squares we want to minimize

m = ∫_a^b [F(x) − Σ_{j=0}^{N} a_j f_j(x)]² dx

We proceed in the standard way:

∂m/∂a_k = −2 ∫_a^b [F(x) − Σ_{j=0}^{N} a_j f_j(x)] f_k(x) dx = 0

or

∫_a^b F(x) f_k(x) dx = Σ_{j=0}^{N} a_j ∫_a^b f_j(x) f_k(x) dx = a_k ∫_a^b f_k²(x) dx

by the orthogonality, which is the formula stated.

The first importance of this theorem is that whenever the direct process of least-squares fitting leads to a system of equations for the unknown coefficients that is hard to solve, then the equivalent orthogonal-function approach (which leads to equations that are trivial to solve) may be used. In one sense, the trouble has been pushed


from solving the equations in the particular case to that of constructing the orthogonal functions in the general case. The Gram-Schmidt process of constructing the orthogonal set often gives roundoff trouble, and experience seems to indicate that for polynomial systems the three-term recurrence relation discussed in Sec. 11.8 provides a preferable approach in constructing them. However, once having found the orthogonal functions appropriate to the interval (set of points), then any specific set of data is easily processed. The second importance of this theorem is that we can determine the least-squares fit one term at a time rather than as in the direct method

where each time we change the degree of the polynomial we have to recompute all the coefficients (though not all the sums involved). An even better method, based on quasi-orthogonal functions, will be

given in Sec. 11.10.
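The term-at-a-time property is easy to observe with a ready-made orthogonal polynomial basis. The Python sketch below (added illustration; it assumes NumPy's polynomial.legendre module) fits e^x on [−1, 1] by Legendre series of increasing degree; because the basis is nearly orthogonal over equally spaced points, the low-order coefficients barely change as the degree grows, unlike the coefficients of an ordinary power-series fit:

```python
import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(-1, 1, 101)
F = np.exp(x)

for deg in (1, 2, 3):
    c = L.legfit(x, F, deg)        # Legendre-series coefficients
    print(deg, np.round(c, 4))     # earlier coefficients change only slightly
```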

11.6 The Bessel inequality

The quality of the least-squares fit with the use of orthogonal functions can be found in two ways. We will use a discrete set of points rather than a continuous interval to illustrate this result. The first, direct way is to compute the sum of the squares of the residuals at the points i = 1, 2, ..., M:

D² = Σ_{i=1}^{M} ε_i² = Σ_{i=1}^{M} [F(x_i) − Σ_{j=0}^{N} a_j f_j(x_i)]²

In the second method we merely transform this expression by expanding it out and making some simple substitutions, using the notation

λ_j = Σ_{i=1}^{M} f_j²(x_i)

We have

D² = Σ_{i=1}^{M} F²(x_i) − 2 Σ_{j=0}^{N} a_j Σ_{i=1}^{M} F(x_i) f_j(x_i) + Σ_{j=0}^{N} Σ_{k=0}^{N} a_j a_k Σ_{i=1}^{M} f_j(x_i) f_k(x_i)

   = Σ_{i=1}^{M} F²(x_i) − 2 Σ_{j=0}^{N} a_j² λ_j + Σ_{j=0}^{N} a_j² λ_j

   = Σ_{i=1}^{M} F²(x_i) − Σ_{j=0}^{N} a_j² λ_j

where we have used the discrete form of the theorem above, Σ_{i=1}^{M} F(x_i) f_j(x_i) = a_j λ_j, together with the orthogonality of the f_j over the points.


Thus, if we first compute

Σ_{i=1}^{M} F²(x_i)

by summing over all the data points and then subtract the square of each coefficient (times the corresponding λ_j), the result is the sum of the squares of the residuals for that many terms in the approximation. Since D² ≥ 0, it follows that Σ_{j=0}^{N} a_j² λ_j ≤ Σ_{i=1}^{M} F²(x_i), which is the Bessel inequality. If we now graph the quantity D² = D²(N), we often get a clue to whether or not to include more terms in the expansion.

(Figure: D²(N) decreasing as N increases.)

A better clue to when to stop in the process of fitting more and more terms in the least-squares approximation can be found by following the usual statistical practice of examining the residuals, in particular, examining the number of sign changes in the residuals.

(Figure: residuals that are mainly noise.)

Consider first the extreme condition of having only "noise" left in the residuals. Supposing that the residuals are random and independent of one another, we should, by using simple probability, expect that a given residual would be followed half the time by one of the same sign and half the time by one of the opposite sign. Thus we should expect to see about half the number of possible sign changes. If the number of sign changes were different from the number expected, either too low or too high, by several times the square root of the number of possible changes, then we should suspect that there was


some kind of "signal" left and that our assumption of pure noise was wrong.

At the other extreme of no noise but all signal, then, at least for the typical orthogonal system of functions, we should expect that the residuals would have about one more sign change than had the last function we fitted, and a couple of extra sign changes above that would not worry us.

(Figure: residuals that are mainly signal.)

The reason for this is that most orthogonal function systems have the property that the kth function has about k sign changes (this is exactly true of the important class of orthogonal polynomials; see Sec. 11.9). What we shall see in the residuals, of course, is probably a mixture of the two extremes, both some signal and some noise.

(Figure: noise superimposed on signal.)

The noise oscillations will tend to increase the number of sign changes near the crossings due to the signal, and the more sign changes we see, the less we expect to reduce the sum of the squares if we fit a few more orthogonal functions. This theory is a "down to earth" theory and is not given in the usual statistical courses. The test that is currently in statistical good graces is quite complex to understand, has some dubious assumptions (all tests must make some assumptions), and probably is of less use to the practicing scientist who is not highly trained in statistics and is therefore not in a position to appreciate what the orthodox test means. If you do understand advanced statistics, then by all means use it if it seems appropriate.
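The sign-change count itself is a one-line computation. A small Python sketch (added illustration; it assumes NumPy, and the function name is invented for the example) counts the changes and compares them with the pure-noise expectation described above:

```python
import numpy as np

def sign_changes(r):
    """Count sign changes in a sequence of residuals (zeros are skipped)."""
    s = np.sign(r)
    s = s[s != 0]
    return int(np.sum(s[:-1] != s[1:]))

rng = np.random.default_rng(1)
r = rng.standard_normal(100)         # pure "noise" residuals
n, possible = sign_changes(r), 99
# For independent random residuals we expect about possible/2 changes,
# give or take a few times sqrt(possible).
print(n, possible / 2, np.sqrt(possible))
```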

11.7 Orthogonal polynomials

If we choose the basic set of linearly independent functions as y_k(x) = x^k (k = 0, 1, ...) (the fundamental theorem of algebra shows that they are linearly independent), then the corresponding kth orthogonal function is a polynomial of degree k, and the set is called an "orthogonal polynomial set." The Legendre polynomials form the orthogonal polynomial set for the interval (−1 ≤ x ≤ 1) with weight w(x) = 1.

It is easy to convert an arbitrary polynomial

y(x) = a₀ + a₁x + ⋯ + a_N x^N

to a sum

y(x) = c₀P₀ + c₁P₁ + ⋯ + c_N P_N

of orthogonal polynomials. Take the orthogonal polynomial of degree N, and subtract from y(x) the appropriate multiple of it to make the term in x^N exactly vanish.

Example. Write x³ in terms of Legendre polynomials, that is,

x³ = a₀P₀ + a₁P₁ + a₂P₂ + a₃P₃

Since P₃ = (5x³ − 3x)/2 and P₁ = x, we have

x³ = (3/5)P₁ + (2/5)P₃

We then have the coefficient c_N of P_N(x) in the expansion. [The term P_N(x) no longer represents the Legendre polynomial system but now represents a typical orthogonal polynomial system.] The difference y(x) − c_N P_N is a polynomial of degree N − 1, and we next remove the appropriate multiple of P_{N−1} to cause the term in x^{N−1} to vanish; and so forth.


Sometimes there are tables giving the powers of x as sums of orthogonal polynomials, that is,

x^k = α₀P₀ + α₁P₁ + ⋯ + α_k P_k   (k = 0, 1, ...)

to make the conversion easy. The reverse process of going from the expansion in the orthogonal polynomials to a single polynomial is merely a matter of substituting the polynomial representation of P_k(x) in place of P_k(x) and gathering together like terms in x. The processes are simple in theory; in practice the loss of accuracy due to cancellation of large numbers is another matter.

It should be clear now that the problem of least-squares fitting a polynomial to some data is equivalent to first finding the appropriate orthogonal polynomials and then expanding the data in the formal orthogonal polynomial representation. Since this is so important a process, we will investigate the class of orthogonal polynomials a bit more.

We shall prove that† the nth polynomial f_n(x) has exactly n real, distinct zeros in the interval (for the discrete set of points, n changes in sign). To prove this, assume the opposite, that there are only k < n distinct zeros. Corresponding to each zero of odd multiplicity, include one factor in

q(x) = (x − x₁)(x − x₂) ⋯ (x − x_{k'})

where k' is the number of odd-multiplicity zeros and may be less than k. Now consider

∫ f_n(x) q(x) w(x) dx > 0

By hypothesis this has an integrand which does not change sign,‡ and hence the integral is clearly positive. On the other hand, we can write

q(x) = Σ_{j=0}^{k'} α_j f_j(x)

But q(x) is a polynomial of degree k' < n, so every f_j in this expansion has j < n and is orthogonal to f_n; the integral is therefore zero, which contradicts its positivity. Hence f_n(x) has exactly n real, distinct zeros.

†It is necessary to have the weight function w(x) ≥ 0.
‡We assumed w(x) ≥ 0.

Σ_{p=0}^{2N−1} sin ((2π/L)k x_p) sin ((2π/L)m x_p) = 0    (k ≠ m)
                                                  = N    (k = m ≠ 0, N)
                                                  = 0    (k = m = 0, N)

In fact we shall for future use prove more, namely, that when

k + m = 0, ±2N, ±4N, ...

the orthogonality conditions have special features. We begin simply by examining the series

Σ_{p=0}^{2N−1} e^{(2πi/L) j x_p} = Σ_{p=0}^{2N−1} e^{πijp/N}   (for integer j)

which is a geometric progression with ratio

r = e^{πij/N}

whose sum is

Σ_{p=0}^{2N−1} r^p = 0     (r ≠ 1)
                   = 2N    (r = 1)

The condition r = 1 means e^{πij/N} = 1, and this is true exactly when j = 0, ±2N, ±4N, ....

We next show that the set of functions

e^{(2πi/L) k x}

are orthogonal in the sense that the product of one function times the complex conjugate of another function, summed over the points x_p, is zero; that is, that

Σ_{p=0}^{2N−1} e^{(2πi/L) k x_p} e^{−(2πi/L) m x_p} = 0     (|k − m| ≠ 0, 2N, 4N, ...)
                                                   = 2N    (|k − m| = 0, 2N, 4N, ...)

This follows immediately from the above by writing the product as

Σ_{p=0}^{2N−1} e^{(2πi/L)(k − m) x_p}

and noting that k − m plays the role of j.

We now return to the "real functions" by using the Euler identity

e^{ix} = cos x + i sin x

The condition for the single exponential summed over the points x_p becomes two equations (the real and imaginary parts separately):

Σ_{p=0}^{2N−1} cos (πjp/N) = 0     (j ≠ 0, ±2N, ±4N, ...)
                           = 2N    (j = 0, ±2N, ±4N, ...)

Σ_{p=0}^{2N−1} sin (πjp/N) = 0     (for all j)

At last we are ready to prove the orthogonality of the Fourier functions over the set of equally spaced points x_p. The first of the three orthogonality equations can be written, by using the trigonometric identity

cos a cos b = ½[cos (a + b) + cos (a − b)]

as

Σ_p cos ((2π/L)k x_p) cos ((2π/L)m x_p) = ½ Σ_p [cos ((2π/L)(k + m)x_p) + cos ((2π/L)(k − m)x_p)]

Each sum on the right is, by the result just proved, either 0 or 2N, according to whether or not |k + m| and |k − m| are of the form 0, 2N, 4N, .... Since both k and m lie between 0 and N, the condition |k + m| = 0, 2N, 4N, ... cannot occur unless k = m = 0 or N. Thus we have the required orthogonality of the Fourier functions over the set of points x_p.

The orthogonality in turn leads directly to the expansion of an arbitrary function F(x) defined on the set of points x_p:

F(x) = A₀/2 + Σ_{k=1}^{N−1} [A_k cos ((2π/L)kx) + B_k sin ((2π/L)kx)] + (A_N/2) cos ((2π/L)Nx)

where

A_k = (1/N) Σ_{p=0}^{2N−1} F(x_p) cos ((2π/L)k x_p)

B_k = (1/N) Σ_{p=0}^{2N−1} F(x_p) sin ((2π/L)k x_p)

These are effectively the trapezoid rule for numerically integrating the corresponding integrals in the continuous interval. There is a second set of equally spaced points that we need to mention, namely,

t_p = (L/2N)(p + ½)

which are midway between the x_p points (recall that x_p = 0 is the same as x_{2N} due to periodicity). Repeating the steps briefly, we see that the sum

Σ_{p=0}^{2N−1} e^{(2πij/L) t_p} = 0      (r ≠ 1)
                              = ±2N    (r = 1)

as before. Hence, the rest follows as before, except for the special treatment of the end values in the summations for the coefficients.

PROBLEMS 12.3

Find the finite Fourier expansion for:

1 The data x(0) = 0, x(1) = 1, x(2) = 2, x(3) = 3; N = 2.
2 The data x(1/2) = 1, x(3/2) = 1, x(5/2) = −1, x(7/2) = −1; N = 2.

12.4 Relation of the discrete and continuous expansions

It is reasonable to ask: What is the relation between the two expansions we have found, the continuous and the discrete? Let the continuous expansion be

F(x) = a₀/2 + Σ_{k=1}^{∞} [a_k cos ((2π/L)kx) + b_k sin ((2π/L)kx)]

with lower-case letters for the coefficients. We pick x_p = Lp/2N for convenience. If we multiply F(x_p) by cos ((2π/L)k x_p) and sum, we get

Σ_{p=0}^{2N−1} F(x_p) cos ((2π/L)k x_p) = N A_k = N(a_k + a_{2N−k} + a_{2N+k} + a_{4N−k} + a_{4N+k} + ⋯)

Hence, the coefficient we calculate is

A_k = a_k + Σ_{m=1}^{∞} (a_{2Nm−k} + a_{2Nm+k})

which expresses the (upper-case) finite Fourier series coefficients in terms of the (lower-case) continuous Fourier series coefficients.

(Figure: aliasing; sampling folds higher frequencies onto lower ones.)

Similarly,

B_k = b_k + Σ_{m=1}^{∞} (−b_{2Nm−k} + b_{2Nm+k})

For the special constant term A₀, we have

A₀ = a₀ + 2 Σ_{m=1}^{∞} a_{2Nm}

Thus, various frequencies present in the original continuous signal F(x) are added together due to the sampling. This effect is called "aliasing" and is directly attributable to the act of sampling at equally spaced points; once the sampling has been done, the effect cannot be undone (from the samples alone). Aliasing is a well-known effect to watchers of TV westerns: the TV sampling makes the stagecoach wheels appear to go backward at certain forward speeds. Rotating helicopter blades also show this effect on TV or in the movies. Aliasing is also the basis of the stroboscopic effect that is used to observe rapidly rotating machinery with a flashing light, which causes the machinery to appear as if it were rotating slowly.
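The folding is easy to demonstrate numerically. In the Python sketch below (an added illustration, not from the original text; NumPy assumed), a sine of frequency 2N + k produces exactly the same samples at the points x_p as a sine of frequency k:

```python
import numpy as np

N = 8                                 # 2N = 16 sample points on [0, 1)
p = np.arange(2 * N)
x = p / (2 * N)                       # x_p = p/2N  (here L = 1)

k = 3
low  = np.sin(2 * np.pi * k * x)
high = np.sin(2 * np.pi * (2 * N + k) * x)   # frequency 2N + k aliases to k
print(np.max(np.abs(low - high)))     # ~1e-15: the samples are indistinguishable
```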

12.5 The fast Fourier series

How shall we compute the Fourier coefficients? The direct approach seems to require about (2N)² multiplications and additions. Recently a method has been popularized by J. W. Tukey and J. W. Cooley that significantly reduces the number of arithmetic operations needed to compute the finite Fourier coefficients (and because of the symmetry between the formulas for the coefficients and those for the value of the series, it also applies to evaluating Fourier series at equally spaced points). If the number of sample points is a power of 2 (or has many small factors), then the number of arithmetic operations by this method can be reduced from approximately N² to around N log N. This can be very significant for long runs of data and has produced a fundamental change in what is currently practical.

In order to present the essential idea of the Tukey-Cooley algorithm, we shall simplify the notation we are using. We will suppose that x_p = Lp/2N and that the length of the interval is L = 1. We also assume that the number of sample points 2N (which could have been an odd number, but we assumed was even for convenience) can be factored:

2N = GH

Then our sample points are

x_p = p/GH

The coefficients of the Fourier expansion are

A_k = A(k) = (1/2N) Σ_{p=0}^{2N−1} F(x_p) e^{−2πik x_p}

We now write

k = k₀ + k₁G
p = p₀ + p₁H

where k₀ < G, k₁ < H, p₀ < H, and p₁ < G. Then

A(k₀ + k₁G) = (1/2N) Σ_{p₀=0}^{H−1} Σ_{p₁=0}^{G−1} F((p₀ + p₁H)/GH) e^{−2πi(k₀ + k₁G)(p₀ + p₁H)/GH}

            = (1/2N) Σ_{p₀=0}^{H−1} e^{−2πik₀p₀/GH} e^{−2πik₁p₀/H} [Σ_{p₁=0}^{G−1} F((p₀ + p₁H)/GH) e^{−2πik₀p₁/G}]

where we have used

e^{−2πik₁p₁} = 1

We recognize that the contents of the brackets is the Fourier series of 1/H of the samples, phase-shifted p₀/GH. There are H such sums to be done. If we label these sums as A(k₀, p₀), we have

A(k₀ + k₁G) = Σ_{p₀=0}^{H−1} A(k₀, p₀) e^{−2πip₀[(k₀/GH) + (k₁/H)]}

to be evaluated, which is a second Fourier series, this time of H terms.


When we count the operations, we find them proportional to

GH(G + H)

Evidently, if the number of sample points had r factors G₁, G₂, ..., G_r, we should have, by repeating this process,

G₁G₂ ⋯ G_r (G₁ + G₂ + ⋯ + G_r)

operations. When the number of sample points is a power of 2, we get the log factor that we earlier announced. Much work has been done to optimize the fast Fourier series. Tukey and Cooley recommend finding factors of 2. Later examination of the problem of optimization produced the idea that it is better, when possible, to go by factors of 4; still later results suggest factors of 8 when possible. It is not the purpose of this book to try to produce the optimum library routine (because much still depends on the particular machine available) but rather to indicate reasonably efficient methods. The details of a very efficient Tukey-Cooley method are given in Mathematics of Computation, volume 19, 1965, pages 297 to 307.
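The divide-and-conquer idea is easiest to see for a power of 2, where each stage splits the samples into even- and odd-indexed halves. The recursive Python sketch below is an added illustration only (production routines are iterative and heavily optimized); the final division by the number of samples matches the 1/2N normalization used above:

```python
import cmath

def fft(f):
    """Recursive radix-2 transform: returns sum_p f[p] * exp(-2*pi*i*k*p/n)."""
    n = len(f)                     # n must be a power of 2
    if n == 1:
        return f[:]
    even = fft(f[0::2])            # transform of the even-indexed samples
    odd  = fft(f[1::2])            # transform of the odd-indexed samples
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # "twiddle" factor
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

samples = [0.0, 1.0, 2.0, 3.0]
coeffs = [c / len(samples) for c in fft(samples)]   # A(k) with the 1/2N factor
print(coeffs)
```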

12.6 Cosine expansion

If we use the interval (−L/2 ≤ x ≤ L/2), and from the definition of the function on (0 ≤ x ≤ L/2) define F(x) to be an even function, that is,

F(−x) = F(x)

(note that this does not introduce a discontinuity in the function), then all the sin terms will have zero coefficients.

(Figure: an even function, symmetric about x = 0.)

Two cases of importance occur, namely, x_p = Lp/2N and t_p = L(p + ½)/2N. For the x_p,

A_k = (1/N) [½F(0) + Σ_{p=1}^{N−1} F(x_p) cos ((2π/L)k x_p) + ½F(L/2) cos πk]

In the case of the t_p,

A_k = (1/N) Σ_{p=0}^{N−1} F(t_p) cos ((2π/L)k t_p)

The first is essentially the trapezoid rule, whereas the second is the midpoint formula. We shall use these in the next chapter.
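The trapezoid/midpoint character of the two sampling rules can be checked against the integral they approximate. The Python sketch below is an added illustration (NumPy and SciPy assumed; the normalization follows the continuous cosine coefficient a_k = 2 ∫₀¹ F(x) cos(πkx) dx for an even function with L = 2):

```python
import numpy as np
from scipy.integrate import quad

F = lambda x: np.exp(-x)          # the function on the half interval [0, 1]
k, N = 2, 16
h = 1.0 / N

g = lambda x: F(x) * np.cos(np.pi * k * x)
exact = 2 * quad(g, 0.0, 1.0)[0]

xp = np.arange(N + 1) * h         # the x_p, including both end points
trap = 2 * h * (g(xp[0]) / 2 + g(xp[1:-1]).sum() + g(xp[-1]) / 2)

tp = (np.arange(N) + 0.5) * h     # the midway points t_p
mid = 2 * h * g(tp).sum()

print(exact, trap, mid)           # both sampling rules are close to the integral
```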


DESIGN OF A LIBRARY

15.1 Why new library routines may be needed

The material in the preceding chapters has been selected for what is believed to be most needed by the practicing engineer or scientist,

and not for what is currently in textbooks and the literature. Therefore, it should not be surprising that the material is sometimes new, and not in the local library. Even when the material is well known, the presentation may be unusual. The selection of the material was also based on the observation that the scientist or engineer has a limited amount of time and effort that he can devote to the topic of numerical methods, and that although more material would of course be useful, it is often necessary to be practical and severely limit the chosen material. Thus, there is no pretense that the material selected will meet all the needs that may arise; it is merely hoped that it will meet some of the more common needs. No attempt was made to pick the "best" methods in the sense of the efficient use of machine time and capacity; rather, the methods were selected for the efficient use of the learner's time and effort. Furthermore, no attempt was made to be complete; most of the chapters could be, and some have been, the subject of a one-volume (or more) work.

Thus, the user of this book may often find that what he wants is not in the local computer library, and he must consider writing a program for the library. A few remarks of warning are therefore necessary.

15.2 Theories of library design

Most computing centers sooner or later decide to gather together a library of the "better" routines so that the average user can have approved processes readily available and can avoid the ones known to be "bad." Behind the selection of the material for the library are various beliefs which, unfortunately, are seldom clearly stated. Among the pure principles are the following, though in practice many different principles are generally used in the selection of particular library routines.

1. Get Rid of the Problem  The first method that all too often is

used is simply to write something that is supposed to work so that the problem of supplying a library routine will be finished. This approach is most often used by professional programmers with no interest in, and usually little knowledge of, numerical methods. Unfortunately, the above has too often been an accurate characterization of the programs that were issued by the manufacturers of computers.

2. Save Machine Time Because the first routines received from the manufacturer were so exceedingly slow, the local group often wrote some very fast ones, but without considering the breadth of coverage that may be asked for, or the accuracy of the results. Indeed, often the answers were clearly wrong for simple problems. 3. Mathematical Completeness Another theory, which comes mainly from a revulsion from the first two, is to write a program that both will cover every possible situation that can arise and will be sure to get “the right answer.” This approach is likely to be taken by the pure mathematician who has recently become interested in computation. Such programs, if they work, are apt to consume large amounts of time covering the pathological cases that are more often conceptual than real (so far as practice is concerned), and large amounts of storage as well. The underlying principles of the routine are apt to be very hard to understand, and the answers, though “right,” are often not appropriate. And almost always the mathematical proof that the method works ignores the actual numbers used by computers and hence to some extent is irrelevant.

4. Do Not Offend Anyone This approach, instead of trying to cover every circumstance in a single library routine, supplies a wide assortment of different routines covering somewhat the same ground but with each one having special properties and advantages in special situations. The user is thus faced with a list, say, of a dozen slightly differing routines for integrating a system of ordinary differential equations, and he does not know which one to choose. This guiding principle allows the computer-center group to avoid the responsibility of selecting the better routines and thus to avoid offending anyone.

5. Garbage Disposal Another method is to decide that the various programs in method 4 can be combined (without regard to the quality of the individual parts) into one program with the user’s having the option to select the particular details that he wants. This can produce a very long calling sequence for the library routine. In order not to baffle the beginner, it is sometimes arranged that he only needs to mention the sequence variations that he wants; those he does not want he omits and thus gets the standard version in those parts. The describing memo is apt to be very thick and discouraging to read. 6. Black-box Library There is a school of thought which believes that the user need not understand the inside workings of the


routine. But this is obviously false. In so simple and common a routine as that for finding the sine of an angle, unless the user realizes that in one way or another the input angle is converted to rotations and the integer part (in rotations) is dropped, he is not in a position to realize that for very large angles there must be a significant loss of accuracy. With more involved routines than for the sine it is even more difficult to understand what you have obtained if you do not understand the basic plan of solution. Thus, there is a real virtue in having library routines that are easy to understand; they save the user a lot of time and trouble when problems arise. Clear and easily read documentation is an important issue often overlooked by library planners. 7. A Guaranteed Library Another school of thought believes that every library routine should give answers accompanied with guaranteed error bounds. The quality of the error bounds may be dubious, but they must be rigorous bounds. Again, this luxury is nice to

have, but it is quite likely to cost a good deal of machine speed and storage, and for the user it is apt to be rather difficult to understand what is being done and how seriously he should take the reported bounds as being indicative of the probable error. The theory also supposes that the user looks at the output of the library computation before going on and that the routine is not buried inside a very large program, where in practice the error bounds are all too often ignored by the subsequent computation.

8. Trick Methods  Another difficulty with some routines is that when machine optimization is made the paramount goal, the library routine is very likely to be machine dependent, not only in particular details, but often in the basic plan of attack. The user suffers in such circumstances because he cannot hope to know why the program is written the way it is.

9. Answer the Right Question  Still another method, and one that the author of this book obviously believes in, is to try to have the library routine answer the probably asked question, not some apparently equivalent mathematical question. The finding of multiple zeros as multiple ones and not as close distinct ones is perhaps the best example of trying to find the right question that the library routine is supposed to answer rather than the question that is so often asked.

10. A Library Is a Library  The author also believes that a library of routines is not a collection of isolated programs, but rather must have an overall plan plus both a common internal structure and a common external structure in its programs. Thus, there should be a common level of "error returns," uniform protection against errors, error bounds that mean the same thing when they occur in various routines, and a uniform "calling sequence" structure as well. When possible, the same basic mathematical tools should be used (even at the cost of some machine efficiency) to reduce the burden of learning that falls on the user.


With all these criteria for selecting routines for a library, it should be clear that we are still very far from having any agreed-upon basis for selection. Unfortunately, we do not even know enough about what we want to do in order to say what cases we want to handle and with what frequencies the various subcases occur; hence optimization cannot be seriously considered. As an example, how often will multiple roots occur? If seldom, then a method that is very fast on simple zeros and rather slow on multiple ones may be preferable to one that is a bit slower on simple zeros but much faster on multiple ones. And so it goes; we simply do not know how to balance the gain in one characteristic against the loss in another, let alone what characteristics we most want. Too often machine speed is made the goal, rather than user efficiency. Meanwhile, something must be done, and various groups do what they can to establish the local library routines. Little is being done to approach any general processes for the selection of library routines. Often the selectors themselves have only a slight awareness of the bases for their preferences. Thus, the user must still wonder on what bases his local library routine was selected. All that can be said at this point is "beware."

Index

Absolute error, 20
Adams-Moulton method, 211
Aliasing, 292
Analytic substitution, 145, 177
Athanassopoulos, J., viii
Augmented matrix, 136
Average, 167
Back substitution, 107
Bairstow's method, 93, 231
Bessel's inequality, 272
Binary number system, 4
  conversion from, to decimal, 5
Binomial coefficient, 157
Bisection method, 36, 39
  search method for, 42
Bound on roundoff, 168
Buffon's needle, 311
Chebyshev approximation, 249, 297, 305
Chebyshev identities, 303
Chebyshev principle, 301
Chebyshev polynomials, 298
  orthogonality, 299
  three-term relation, 299
Chopping, 16
Constraints, 238, 243
  linear, 238
  nonlinear, 240
Cooley, J. W., 292
Correlation, 171
Cosine expansion, 294
Courant, R., 243
Cramer's rule, 134
Crude method, 61, 63
Delta operator, 155
Determinants, 127
  minors of, 132
Difference of polynomials, 158
Direct method, 177
Direction field, 200, 201
Economization, 306
Epstein, M. P., viii
Ergodic hypothesis, 170
Error estimate, 187
  of ordinary differential equations, 207
Error term, 151, 153, 166
Euler's modified method, 203
Exponent, 11
Extrapolation, 144, 160
False position, 45
Fast Fourier transform, 292
Finite Fourier series, 287, 291
Fisher, R. A., and Yates, 280n.
Fixed point numbers, 10
Fletcher, R., and M. J. D. Powell, 236n.
Floating point numbers, 10, 11
Fourier series, 283
  complex, 288
  finite, 287, 291
Gass, S. I., 240n.
Gauss elimination, 105, 122
Gauss-Jordan process, 123
Gauss quadrature, 193
Gradient, 226, 228, 244
  estimate of, 232
Gram-Schmidt method, 269
Gregory method, 188, 190, 197
Hamming, R. W., 211
  and R. S. Pinkham, 189n.
Ill-conditioned system, 117, 119, 122
Improved search, 72, 78
Integration, 175
  definite, 175
  indefinite, 175, 199
Interlacing zeros, 279
Interpolation, 143
  inverse, 144
  linear, 143
Inverse matrix, 124, 125
Kaiser, J. F., viii
Kayel, R. G., viii
Knuth, D. E., 320n.
Lagrange multipliers, 240
Least squares, 247, 271
  general case, 255
  nonlinear parameters, 256
  polynomial, 252
  straight line, 250
  weighted, 255
Legendre polynomials, 264, 275
Level curves, 100, 226
Library design, 325
Linear independence, 268
Mantissa, 11
Matrices, 136
Maximum and minimum, 222, 223
Mean-value theorem, 26, 151
Midpoint formula, 182
Minimization, 99
Modified false position, 47
Modified value, 209
Monte Carlo methods, 312
Morgan, S. P., viii
Newton's method, 49, 95, 145
Noise, 3, 169, 322
Normal equations, 252, 253
Objective function, 221
Optimization, 221, 238
Order of magnitude, 1
Ordinary differential equations, 199
  linear, 215
  solution of, 199
  starting, 205
  systems of, 214
  two-point, 217
Orthogonal Fourier series, 284, 288
Orthogonal functions, 261, 262
Orthogonal polynomials, 275
Orthonormal, 270
Overflow, 13
Pinkham, Roger S., viii, 189n.
Pivoting, 109
  complete, 110
  partial, 109
Polynomial approximation, 146
Predictor-corrector, 203
Principle of the argument, 70, 73
Quadratic interpolation, 233, 235, 237
Quasi-orthogonal polynomials, 280
Ralston, A., 211
Random numbers, 309
  distributions of, 320
  generation of, 313, 319
Rank, 110, 135
Relative error, 20
Rodrigues' formula, 265
Roundoff, 2, 16, 88, 163, 165, 167, 202
  estimation in table, 172
  in simultaneous equations, 112
  theories of, 31
Runge-Kutta method, 212
Saaty, T. L., and J. Bram, 240n.
Scaling, 83, 113, 115, 231
Schweikert, D. G., viii
Secant method, 49
Shift operator E, 156
Simpson's method, 184
  composite, 184, 190, 191
Simultaneous linear equations, 105
Steepest descent, 233
Taylor expansion, 26
Three-term recurrence relation, 277
Trapezoid rule, 173
  composite, 180
  truncation of, 179
Truncation error, 3
Tukey, J. W., 292
Underflow, 13
Undetermined coefficients, 147
Unstable, 118
u-v curves, 59, 64, 67, 70, 74
Vandermonde's determinant, 148, 149
Variable metric method, 236
Variance, 168
Wallis integral, 267
Zeros, 33, 57
  complex, 57, 91
  meaning of, 22
  multiple, 52, 77, 87, 89, 102
  polynomial, 81
  quadratic, 23
  real, 33, 86

saci _ cenolmips wal ‘

: AC TPAal ri ‘Bat neltsmixargqeisimonyiet

coosnatiumiéc

_

hE AoRenod-rataiber?|

2or

,

at oY moeNAaTA edt OSHS

Ef? _rreseed. weqoaZ j

GS

.idis

MBGAS

wOlyeT 7

8S 2ES|ngieioqaini serbeue

ay oi-aoiaT

WHilels) @orret9s)



Vs £ ‘4



an eizire sovi00 iwvogoive-asud .

bitiseqnT

BT

oof sl2oegmes

RST

-

™ _

ira)

Yo nbitesauy

aa

oath pldererny

28:

“CSE NO iio

he i* ate ro

mT rd



eer Ger”

hideer Waters, mG

-

aiem eideney

_*

aeet.

iri re

:

ae ath 33"ma

= BBnormebany

“gar acne?

¢ wiGetw

enneatarn ing 598:moth wn Wee

Lady aw 04:

iw tts, aa

|

:

-

\e)_.€2 801

re ., Smet ex SS to’ Qaingem SOF 28 18S Se wight “he

r ta)

9 nah

iskee> —_

>

“~

i #,, ae tsimanylog

x SaES88.citaybaup Bf Jew

&

ox

j

_

i

Ts _

“a

=

-

a4

si

iar

az, £

Hip 25 -;

vs ne



-



Sa, veon asf, re >ae _ >

%

(es, om 7

a

isuctiakarsSGD

7:

_

_

se=

jvm, 5 wy ae

tas, OY

ors |

as

2a

coe a nnasn’ & aa: . =

rs

=

moby:

zit

noi S1siwp

a"

d

i

RA no eon smn

Se ray so 9 98 aarned Yu o>

TF

Leae

- )

& tone aoisoneiT Ser wee eaut

z



*

CATALOG OF DOVER BOOKS

Mathematics-Bestsellers HANDBOOK

OF MATHEMATICAL

FUNCTIONS:

with Formulas, Graphs, and

Mathematical Tables, Edited by Milton Abramowitz and Irene A. Stegun. A classic resource for working with special functions, standard trig, and exponential logarithmic definitions and extensions, it features 29 sets of tables, some to as high as 20 places. 1046pp, 0-486-61272-4

8x 10 1/2.

ABSTRACT AND CONCRETE CATEGORIES: The Joy of Cats, JiriAdamek, Horst Herrlich, and George E. Strecker. This up-to-date introductory treatment employs category theory to explore the theory of structures. Its unique approach stresses concrete categories and presents a systematic view of factorization structures. Numerous examples. 0-486-46934-4 1990 edition, updated 2004. 528pp. 6 1/8 x 9 1/4. MATHEMATIGS:

Its Content, Methods

and Meaning, A. D. Aleksandrov,

A. N.

Kolmogorov, and M. A. Lavrent'ev. Major survey offers comprehensive, coherent discussions of analytic geometry, algebra, differential equations, calculus of variations, functions of a complex variable, prime numbers, linear and non-Euclidean geometry, topology, 0-486-40916-3 functional analysis, more. 1963 edition. 1120pp. 5 3/8 x 8 1/2.

INTRODUCTION TO VECTORS AND TENSORS: Second Edition-Two Volumes Bound as One, Ray M. Bowen and C.-C. Wang. Convenient single-volume compilation of two texts offers both introduction and in-depth survey. Geared toward engineering and science students rather than mathematicians, it focuses on physics and engineering appli0-486-46914-X cations. 1976 edition. 560pp. 6 1/2 x 9 1/4.

AN INTRODUCTION TO ORTHOGONAL POLYNOMIALS, Theodore S. Chihara. ion theoConcise introduction covers general elementary theory, including the representat the recurrence rem and distribution functions, continued fractions and chain sequences,

x 8 1/2. formula, special functions, and some specific systems. 1978 edition. 272pp. 5 3/8 0-486-47929-3

STS, Paul ADVANCED MATHEMATICS FOR ENGINEERS AND SCIENTI algebra, callinear DuChateau. This primary text and supplemental reference focuses on partial differential culus, and ordinary differential equations. Additional topics include edition. 400pp. 1992 problems. solved Includes methods. equations and approximation 71/2x9

1/4.

0-486-47930-7

AND ENGINEERS, PARTIAL DIFFERENTIAL EQUATIONS FOR SCIENTISTS differential equapartial solve and e formulat to how shows text StanleyJ.Farlow. Practical problems, elliptic-type probtions. Coverage of diffusion-type problems, hyperbolic-type available upon request. 1982 lems, numerical and approximate methods. Solution guide edition. 414pp. 6 1/8 x 9 1/4.

0-486-67620-X

ARY PROBLEMS, Avner VARIATIONAL PRINCIPLES AND FREE-BOUND nal methods in partial differFriedman. Advanced graduate-level text examines variatio undary problems. Features free-bo to ions applicat their es illustrat and ns ential equatio parabolic operators. 1982 edition. detailed statements of standard theory of elliptic and 0-486-47853-X 720pp. 6 1/8 x 9 1/4. THEORY, Steven A. Gaal. Unified LINEAR ANALYSIS AND REPRESENTATION and operator algebras on Hilbert rs operato of theory the from treatment covers topics ical groups; and the theory of Lie topolog for theory spaces; integration and representation . 704pp. 6 1/8 x9 bagn pesos edition algebras, Lie groups, and transform groups. 1973

lications.com Browse over 9,000 books at www.doverpub

CATALOG OF DOVER BOOKS

A SURVEY OF INDUSTRIAL MATHEMATICS, Charles R. MacCluer. Students learn how to solve problems they'll encounter in their professional lives with this concise single-volume treatment. It employs MATLAB and other strategies to explore typical industrial problems. 2000 edition. 384pp. 5 3/8 x 8 1/2.

0-486-47702-9

NUMBER SYSTEMS AND THE FOUNDATIONS OF ANALYSIS, Elliott Mendelson. Geared toward undergraduate and beginning graduate students, this study explores natural numbers, integers, rational numbers, real numbers, and complex num-

bers. Numerous exercises and appendixes supplement the text. 1973 edition. 368pp. 5 3/8 x 8 1/2. 0-486-45792-3

A FIRST LOOK AT NUMERICAL FUNCTIONAL ANALYSIS, W. W. Sawyer. Text by renowned educator shows how problems in numerical analysis lead to concepts of functional analysis. Topics include Banach and Hilbert spaces, contraction mappings, convergence, differentiation and integration, and Euclidean space. 1978 edition. 208pp. 5 3/8 x 8 1/2. 0-486-47882-3 FRACTALS,

CHAOS, POWER

LAWS: Minutes from an Infinite Paradise, Manfred

Schroeder. A fascinating exploration of the connections between chaos theory, physics, biology, and mathematics, this book abounds in award-winning computer graphics, optical illusions, and games that clarify memorable insights into self-similarity. 1992 edition. 448pp. 6 1/8 x 9 1/4. 0-486-47204-3

SET THEORY AND THE CONTINUUM PROBLEM, Raymond M. Smullyan and Melvin Fitting. A lucid, elegant, and complete survey of set theory, this three-part treatment explores axiomatic set theory, the consistency of the continuum hypothesis, and forcing and independence results, 1996 edition. 336pp. 6 x 9. 0-486-47484-4 DYNAMICAL systems

SYSTEMS, Shlomo Sternberg. A pioneer in the field of dynamical

discusses

one-dimensional

dynamics,

differential equations,

random

walks,

iterated function systems, symbolic dynamics, and Markov chains. Supplementary materials include PowerPoint slides and MATLAB exercises. 2010 edition. 272pp. 61/8 x9 1/4. 0-486-47705-3

ORDINARY DIFFERENTIAL EQUATIONS, Morris Tenenbaum and Harry Pollard. Skillfully organized introductory text examines origin of differential equations, then defines basic terms and outlines general solution of a differential equation. Explores integrating factors; dilution and accretion problems; Laplace Transforms; Newton's Interpolation Formulas, more. 818pp. 5 3/8 x 8 1/2. 0-486-64940-7 MATROID THEORY, D. J. A. Welsh. Text by a noted expert describes standard examples and investigation results, using elementary proofs to develop basic matroid properties before advancing to a more sophisticated treatment. Includes numerous exercises. 1976 edition. 448pp. 5 3/8 x 8 1/2. 0-486-47439-9 THE CONCEPT OF A RIEMANN SURFACE, Hermann Weyl. This classic on the general history of functions combines function theory and geometry, forming the basis of the modern approach to analysis, geometry, and topology. 1955 edition. 208pp. 5 3/8 x 8 1/2, 0-486-47004-0 THE LAPLACE TRANSFORM,

David Vernon Widder. This volume focuses on the

Laplace and Stieltjes transforms, offering a highly theoretical treatment. Topics include

fundamental formulas, the moment problem, monotonic functions, and Tauberian theo-

rems, 1941 edition. 416pp. 5 3/8 x 8 1/2.

0-486-47755-X

Browse over 9,000 books at www.doverpublications.com

CATALOG OF DOVER BOOKS

Mathematics-Logic and Problem Solving PERPLEXING PUZZLES AND TANTALIZING TEASERS, Martin Gardner. Ninetythree riddles, mazes, illusions, tricky questions, word and picture puzzles, and other challenges offer hours of entertainment for youngsters. Filled with rib-tickling drawings. Solutions. 224pp. 5 3/8 x 8 1/2.

0-486-25637.5

MY BEST MATHEMATICAL AND LOGIC PUZZLES, Martin Gardner. The noted expert selects 70 of his favorite "short" puzzles. Includes The Returning Explorer, The

Mutilated Chessboard, Scrambled Box Tops, and dozens more. Complete solutions in-

cluded. 96pp. 5 3/8 x 8 1/2.

0-486-28152-3

THE LADY OR THE TIGER?: and Other Logic Puzzles, Raymond M. Smullyan. Created by a renowned puzzle master, these whimsically themed challenges involve paradoxes about probability, time, and change; metapuzzles; and self-referentiality. Nineteen chapters advance in difficulty from relatively simple to highly complex. 1982 edition. 0-486-47027-X 240pp. 5 3/8 x 8 1/2. M. Smullyan. Raymond Puzzles, SATAN, CANTOR AND INFINITY: Mind-Boggling A renowned mathematician tells stories of knights and knaves in an entertaining look at the logical precepts behind infinity, probability, time, and change. Requires a strong background in mathematics. Complete solutions. 288pp. 5 3/8 x 8 1/2. 0-486-47036-9

THE RED BOOK OF MATHEMATICAL PROBLEMS, Kenneth S. Williams and Kenneth Hardy. Handy compilation of 100 practice problems, hints and solutions indispensable for students preparing for the William Lowell Putnam and other mathematical competitions. Preface to the First Edition. Sources. 1988 edition. 192pp. 0-486-69415-1 5 3/8 x8 1/2. KING ARTHUR IN SEARCH OF HIS DOG AND OTHER CURIOUS PUZZLES, features Raymond M. Smullyan. This fanciful, original collection for readers of all ages arithmetic puzzles, logic problems related to crime detection, and logic and arithmetic 3/8 x 8 1/2. puzzles involving King Arthur and his Dogs of the Round Table. 160pp. 5 0-486-47435-6 of Mathematics, UNDECIDABLE THEORIES: Studies in Logic and the Foundation Robinson. This M. Raphael and i Mostowsk Andrzej with tion collabora in Tarski Alfred Method in General "A treatises: three of consists logician well-known book by the famed ility in Mathematics," Proofs of Undecidability,” "Undecidability and Essential Undecidab edition. 112pp. 5 3/8 x and "Undecidability of the Elementary Theory of Groups." 1953 0-486-47703-7 8 1/2.

tion of essential topics LOGIC FOR MATHEMATICIANS,J. Barkley Rosser. Examina a major addition to the tedly "Undoub logic. in und backgro no assumes s and theorem edition.

ical Society. 1978 literature of mathematical logic." — Bulletin ofthe American Mathemat 0-486-46898-4 592pp. 6 1/8 x 9 1/4.

MATHEMATICS, Andrew INTRODUCTION TO PROOF IN ABSTRACT constitutes an acceptable what s student teaches text e Wohlgemuth. This undergraduat s as well as those requirproblem routine of proof, and it develops their ability to do proofs 0-486-47854-8 1/4. 9 x 1/2 6 384pp. edition. 1990 . insights ing creative , Patrick Suppes and Shirley Hill. FIRST GOURSE IN MATHEMATICAL LOGIC range of ation and context for wide Rigorous introduction is simple enough in present and validity; truth tables; terms, truth ce; inferen logical es; sentenc izing students. Symbol laws of identity; more. 288pp. and cation specifi sal univer redicates, universal quantifiers; 0-486-42259-3 5 3/8 x 8 1/2.

lications.com Browse over 9,000 books at www.doverpub

CATALOG OF DOVER BOOKS

Mathematics-Algebra and Calculus VECTOR CALCULUS, Peter Baxandall and Hans Liebeck. This introductory text offers a rigorous, comprehensive treatment. Classical theorems of vector calculus are amply illustrated with figures, worked examples, physical applications, and exercises with hints and answers. 1986 edition. 560pp. 5 3/8 x 8 1/2. 0-486-46620-5 ADVANCED CALCULUS: An Introduction to Classical Analysis, Louis Brand. A course in analysis that focuses on the functions of a real variable, this text introduces the basic concepts in their simplest setting and illustrates its teachings with numerous examples, theorems, and proofs. 1955 edition. 592pp. 5 3/8 x 8 1/2. 0-486-44548-8 ADVANCED CALCULUS, Avner Friedman. Intended for students who have already completed a one-year course in elementary calculus, this two-part treatment advances from functions of one variable to those of several variables. Solutions. 1971 edition. 432pp.

5 3/8 x 8 1/2.

0-486-45795-8

METHODS OF MATHEMATICS APPLIED TO CALCULUS, PROBABILITY, AND STATISTICS, Richard W. Hamming. This 4-part treatment begins with algebra and analytic geometry and proceeds to an exploration of the calculus of algebraic functions and transcendental functions and applications. 1985 edition. Includes 310 figures and 18 tables. 880pp. 6 1/2 x 9 1/4. 0-486-43945-3 BASIC ALGEBRA I: Second Edition, Nathan Jacobson. A classic text and standard reference for a generation, this volume covers all undergraduate algebra topics, including groups, rings, modules, Galois theory, polynomials, linear algebra, and associative alge-

bra, 1985 edition. 528pp. 6 1/8 x 9 1/4.

0-486-47189-6

BASIC ALGEBRA II: Second Edition, Nathan Jacobson. This classic text and standard reference comprises all subjects of a first-year graduate-level course, including in-depth coverage of groups and polynomials and extensive use of categories and functors. 1989 edition. 704pp. 6 1/8 x 9 1/4. 0-486-47187-X CALCULUS: An Intuitive and Physical Approach (Second Edition), Morris Kline. Application-oriented introduction relates the subject as closely as possible to science with explorations of the derivative; differentiation and integration of the powers of x; theorems on differentiation, antidifferentiation; the chain rule; trigonometric functions; more. Examples. 1967 edition. 960pp. 6 1/2 x 9 1/4. 0-486-40453-6

ABSTRACT ALGEBRA AND SOLUTION BY RADICALS, John E. Maxfield and Margaret W. Maxfield. Accessible advanced undergraduate-level text starts with groups, rings, fields, and polynomials and advances to Galois theory, radicals and roots of unity, and solution by radicals. Numerous examples, illustrations, exercises, appendixes. 1971

edition. 224pp. 6 1/8 x 9 1/4.

0-486-47723-1

AN INTRODUCTION TO THE THEORY OF LINEAR SPACES, Georgi E. Shilov. Translated by Richard A. Silverman. Introductory treatment offers a clear exposition of algebra, geometry, and analysis as parts of an integrated whole rather than separate subjects. Numerous examples illustrate many different fields, and problems include hints or answers. 1961 edition. 320pp. 5 3/8 x 8 1/2. 0-486-63070-6 LINEAR ALGEBRA, Georgi E. Shilov. Covers determinants, linear spaces, systems of linear equations, linear functions of a vector argument, coordinate transformations, the

canonical form of the matrix of a linear operator, bilinear and quadratic forms, and more. 387pp. 5 3/8 x 8 1/2. 0-486-63518-X

Browse over 9,000 books at www.doverpublications.com

CATALOG OF DOVER BOOKS Mathematics-Probability and Statistics BASIC PROBABILITY THEORY, Robert B. Ash. This text emphasizes the probabilistic way of thinking, rather than measure-theoretic concepts. Geared toward advanced undergraduates and graduate students, it features solutions to some of the problems. 1970 edition. 352pp. 5 3/8 x 8 1/2. 0-486-46628-0

PRINCIPLES OF STATISTICS, M. G. Bulmer. Concise description of classical statistics, from basic dice probabilities to modern regression analysis. Equal stress on theory and applications. Moderate difficulty; only basic calculus required. Includes problems with answers. 252pp. 5 5/8 x 8 1/4. 0-486-63760-3

OUTLINE OF BASIC STATISTICS: Dictionary and Formulas, John E. Freund and Frank J. Williams. Handy guide includes a 70-page outline of essential statistical formulas covering grouped and ungrouped data, finite populations, probability, and more, plus over 1,000 clear, concise definitions of statistical terms. 1966 edition. 208pp. 0-486-47769-X 5 3/8 x 8 1/2.

GOOD THINKING: The Foundations of Probability and Its Applications, Irving J. Good. This in-depth treatment of probability theory by a famous British statistician explores Keynesian principles and surveys such topics as Bayesian rationality, corroboration, hypothesis testing, and mathematical tools for induction and simplicity. 1983 edition. 0-486-47438-0 352pp. 5 3/8 x 8 1/2.

INTRODUCTION APPLICATIONS,

TO

PROBABILITY

THEORY

WITH

CONTEMPORARY

Lester L. Helms. Extensive discussions and clear examples, written

in plain language, expose students to the rules and methods of probability. Exercises foster problem-solving skills, and all problems feature step-by-step solutions. 1997 edition. 0-486-47418-6 368pp. 6 1/2 x 9 1/4. CHANCE,

LUCK, AND STATISTICS, Horace C. Levinson. In simple, non-technical

language, this volume explores the fundamentals governing chance and applies them to Scenty sports, government, and business. "Clear and lively ... remarkably accurate." — 0-486-41997-5 Monthly. 384pp. 5 3/8 x 8 1/2.

NS, FIFTY CHALLENGING PROBLEMS IN PROBABILITY WITH SOLUTIO and Frederick Mosteller. Remarkable puzzlers, graded in difficulty, illustrate elementary ingeneral originality, for selected were problems These . probability of aspects advanced solutions. detailed includes Also techniques. valuable e demonstrat terest, or because they 0-486-65355-2 88pp. 5 3/8 x 8 1/2. k for those seekEXPERIMENTAL STATISTICS, Mary Gibbons Natrella. A handboo g, constructing, ing engineering information and quantitative data for designing, developin analyzing of extremethe ts, experimen of planning the Covers t. equipmen testing and 76 tables. 560pp. 8 3/8 value data: and more. 1966 edition. Index. Includes 52 figures and x1.

0-486-43937-2

L. Nelson. Coherent inSTOCHASTIC MODELING: Analysis and Simulation, Barry ical, numerical, and simulatroduction to techniques also offers a guide to the mathemat analysis, and interpretation tion tools of systems analysis. Includes formulation of models, 0-486-47770-3 1/4. 9 x 1/8 6 336pp. edition. of results. 1995 Robert R. Sokal and F. INTRODUCTION TO BIOSTATISTICS: Second Edition, in mathematics, this ound backgr minimal a with aduates undergr James Rohlf. Suitable for tions and the testdistribu ental fundam to s introduction ranges from descriptive statistic ms and examples. 1987 edition. proble -out worked us numero s Include ses. hypothe of ing 0-486-46961-1 384pp. 6 1/8 x 9 1/4.



Mathematics-Geometry and Topology

PROBLEMS AND SOLUTIONS IN EUCLIDEAN GEOMETRY, M. N. Aref and William Wernick. Based on classical principles, this book is intended for a second course in Euclidean geometry and can be used as a refresher. More than 200 problems include hints and solutions. 1968 edition. 272pp. 5 3/8 x 8 1/2. 0-486-47720-7

TOPOLOGY OF 3-MANIFOLDS AND RELATED TOPICS, Edited by M. K. Fort, Jr. With a New Introduction by Daniel Silver. Summaries and full reports from a 1961 conference discuss decompositions and subsets of 3-space; n-manifolds; knot theory; the Poincaré conjecture; and periodic maps and isotopies. Familiarity with algebraic topology required. 1962 edition. 272pp. 6 1/8 x 9 1/4. 0-486-47753-3

POINT SET TOPOLOGY, Steven A. Gaal. Suitable for a complete course in topology, this text also functions as a self-contained treatment for independent study. Additional enrichment materials make it equally valuable as a reference. 1964 edition. 336pp. 5 3/8 x 8 1/2. 0-486-47222-1

INVITATION TO GEOMETRY, Z. A. Melzak. Intended for students of many different backgrounds with only a modest knowledge of mathematics, this text features selfcontained chapters that can be adapted to several types of geometry courses. 1983 edition. 240pp. 5 3/8 x 8 1/2. 0-486-46626-4

TOPOLOGY AND GEOMETRY FOR PHYSICISTS, Charles Nash and Siddhartha Sen. Written by physicists for physics students, this text assumes no detailed background in topology or geometry. Topics include differential forms, homotopy, homology, cohomology, fiber bundles, connection and covariant derivatives, and Morse theory. 1983 edition. 320pp. 5 3/8 x 8 1/2. 0-486-47852-1

BEYOND GEOMETRY: Classic Papers from Riemann to Einstein, Edited with an Introduction and Notes by Peter Pesic. This is the only English-language collection of these 8 accessible essays. They trace seminal ideas about the foundations of geometry that led to Einstein's general theory of relativity. 224pp. 6 1/8 x 9 1/4. 0-486-45350-2

GEOMETRY FROM EUCLID TO KNOTS, Saul Stahl. This text provides a historical perspective on plane geometry and covers non-neutral Euclidean geometry, circles and regular polygons, projective geometry, symmetries, inversions, informal topology, and more. Includes 1,000 practice problems. Solutions available. 2003 edition. 480pp. 6 1/8 x 9 1/4. 0-486-47459-3

TOPOLOGICAL VECTOR SPACES, DISTRIBUTIONS AND KERNELS, François Trèves. Extending beyond the boundaries of Hilbert and Banach space theory, this text focuses on key aspects of functional analysis, particularly in regard to solving partial differential equations. 1967 edition. 592pp. 5 3/8 x 8 1/2. 0-486-45352-9

INTRODUCTION TO PROJECTIVE GEOMETRY, C. R. Wylie, Jr. This introductory volume offers strong reinforcement for its teachings, with detailed examples and numerous theorems, proofs, and exercises, plus complete answers to all odd-numbered end-of-chapter problems. 1970 edition. 576pp. 6 1/8 x 9 1/4. 0-486-46895-X

FOUNDATIONS OF GEOMETRY, C. R. Wylie, Jr. Geared toward students preparing to teach high school mathematics, this text explores the principles of Euclidean and non-Euclidean geometry and covers both generalities and specifics of the axiomatic method. 1964 edition. 352pp. 6 x 9. 0-486-47214-0


Mathematics-History

THE WORKS OF ARCHIMEDES, Archimedes. Translated by Sir Thomas Heath. Complete works of ancient geometer feature such topics as the famous problems of the ratio of the areas of a cylinder and an inscribed sphere; the properties of conoids, spheroids, and spirals; more. 326pp. 5 3/8 x 8 1/2. 0-486-42084-1

THE HISTORICAL ROOTS OF ELEMENTARY MATHEMATICS, Lucas N. H. Bunt, Phillip S. Jones, and Jack D. Bedient. Exciting, hands-on approach to understanding fundamental underpinnings of modern arithmetic, algebra, geometry and number systems examines their origins in early Egyptian, Babylonian, and Greek sources. 336pp. 5 3/8 x 8 1/2. 0-486-25563-8

THE THIRTEEN BOOKS OF EUCLID'S ELEMENTS, Euclid. Contains complete English text of all 13 books of the Elements plus critical apparatus analyzing each definition, postulate, and proposition in great detail. Covers textual and linguistic matters; mathematical analyses of Euclid's ideas; classical, medieval, Renaissance and modern commentators; refutations, supports, extrapolations, reinterpretations and historical notes. 995 figures. Total of 1,425pp. All books 5 3/8 x 8 1/2. Vol. I: 443pp. 0-486-60088-2 Vol. II: 464pp. 0-486-60089-0 Vol. III: 546pp. 0-486-60090-4

A HISTORY OF GREEK MATHEMATICS, Sir Thomas Heath. This authoritative two-volume set covers the essentials of mathematics and features every landmark innovation and every important figure, including Euclid, Apollonius, and others. 5 3/8 x 8 1/2. Vol. I: 461pp. 0-486-24073-8 Vol. II: 597pp. 0-486-24074-6

A MANUAL OF GREEK MATHEMATICS, Sir Thomas L. Heath. This concise but thorough history encompasses the enduring contributions of the ancient Greek mathematicians whose works form the basis of most modern mathematics. Discusses Pythagorean arithmetic, Plato, Euclid, more. 1931 edition. 576pp. 5 3/8 x 8 1/2. 0-486-43231-9

CHINESE MATHEMATICS IN THE THIRTEENTH CENTURY, Ulrich Libbrecht. An exploration of the 13th-century mathematician Ch'in, this fascinating book combines what is known of the mathematician's life with a history of his only extant work, the Shu-shu chiu-chang. 1973 edition. 592pp. 5 3/8 x 8 1/2. 0-486-44619-0

PHILOSOPHY OF MATHEMATICS AND DEDUCTIVE STRUCTURE IN EUCLID'S ELEMENTS, Ian Mueller. This text provides an understanding of the classical Greek conception of mathematics as expressed in Euclid's Elements. It focuses on philosophical, foundational, and logical questions and features helpful appendixes. 400pp. 6 1/2 x 9 1/4. 0-486-45300-6

BEYOND GEOMETRY: Classic Papers from Riemann to Einstein, Edited with an Introduction and Notes by Peter Pesic. This is the only English-language collection of these 8 accessible essays. They trace seminal ideas about the foundations of geometry that led to Einstein's general theory of relativity. 224pp. 6 1/8 x 9 1/4. 0-486-45350-2

HISTORY OF MATHEMATICS, David E. Smith. Two-volume history — from Egyptian papyri and medieval maps to modern graphs and diagrams. Non-technical chronological survey with thousands of biographical notes, critical evaluations, and contemporary opinions on over 1,100 mathematicians. 5 3/8 x 8 1/2. Vol. I: 618pp. 0-486-20429-4 Vol. II: 736pp. 0-486-20430-8



(continued from front flap)

CALCULUS: AN INTUITIVE AND PHYSICAL APPROACH (SECOND EDITION), Morris Kline. (0-486-40453-6)
THE PHILOSOPHY OF MATHEMATICS: AN INTRODUCTORY ESSAY, Stephan Körner. (0-486-47185-3)
COMPANION TO CONCRETE MATHEMATICS: MATHEMATICAL TECHNIQUES AND VARIOUS APPLICATIONS, Z. A. Melzak. (0-486-45781-8)
NUMBER SYSTEMS AND THE FOUNDATIONS OF ANALYSIS, Elliott Mendelson. (0-486-45792-3)
EXPERIMENTAL STATISTICS, Mary Gibbons Natrella. (0-486-43937-2)
AN INTRODUCTION TO IDENTIFICATION, J. P. Norton. (0-486-46935-2)
BEYOND GEOMETRY: CLASSIC PAPERS FROM RIEMANN TO EINSTEIN, Edited with an Introduction and Notes by Peter Pesic. (0-486-45350-2)
THE STANFORD MATHEMATICS PROBLEM BOOK: WITH HINTS AND SOLUTIONS, G. Polya and J. Kilpatrick. (0-486-46924-7)
SPLINES AND VARIATIONAL METHODS, P. M. Prenter. (0-486-46902-6)
PROBABILITY THEORY, A. Renyi. (0-486-45867-9)
LOGIC FOR MATHEMATICIANS, J. Barkley Rosser. (0-486-46898-4)
PARTIAL DIFFERENTIAL EQUATIONS: SOURCES AND SOLUTIONS, Arthur David Snider. (0-486-45340-5)
INTRODUCTION TO BIOSTATISTICS: SECOND EDITION, Robert R. Sokal and F. James Rohlf. (0-486-46961-1)
MATHEMATICAL PROGRAMMING, Steven Vajda. (0-486-47213-2)
THE LOGIC OF CHANCE, John Venn. (0-486-45055-4)
THE CONCEPT OF A RIEMANN SURFACE, Hermann Weyl. (0-486-47004-0)
INTRODUCTION TO PROJECTIVE GEOMETRY, C. R. Wylie, Jr. (0-486-46895-X)
FOUNDATIONS OF GEOMETRY, C. R. Wylie, Jr. (0-486-47214-0)

SEE EVERY DOVER BOOK IN PRINT AT WWW.DOVERPUBLICATIONS.COM


RICHARD W. HAMMING

Introduction to Applied Numerical Analysis

This book by a prominent mathematician is appropriate for a single-semester course in applied numerical analysis for computer science majors and other upper-level undergraduates and graduate students. Although it does not cover actual programming, it focuses on the applied topics most pertinent to science and engineering professionals.



An extensive range of topics includes round-off and function evaluation, real zeros of a function, simultaneous linear equations and matrices, interpolation and roundoff estimation, integration, and ordinary differential equations. Additional subjects include optimization, least squares, orthogonal functions, Fourier series, Chebyshev approximation, and random processes. The author stresses the teaching of mathematical concepts through visual aids, and numerous diagrams and illustrations complement the text.

the” "— Dover (2012) unabridged republication of the edition published i &: . ae Hemisphere Publishing Corporation, New York, 1989. ;


PRINTED IN THE USA

ISBN-13: 978-0-486-48590-4
ISBN-10: 0-486-48590-0

9"780486"4859


$19.95 USA

See every Dover book in print at www.doverpublications.com
