Classical and Modern Numerical Analysis: Theory, Methods and Practice (Solutions, Instructor Solution Manual) 9781439842164



English Pages [162]


INSTRUCTOR’S ANSWER GUIDE FOR Classical and Modern Numerical Analysis: Theory, Methods and Practice

by Azmy S. Ackleh, Edward J. Allen, R. Baker Kearfott, Padmanabhan Seshaiyer

Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

Chapman & Hall/CRC Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2010 by Taylor and Francis Group, LLC Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number: 978-1-4398-4216-4 (Paperback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. 
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Instructor’s Answer Guide for Classical and Modern Numerical Analysis: Theory, Methods, and Practice Azmy S. Ackleh, University of Louisiana at Lafayette Edward J. Allen, Texas Tech University R. Baker Kearfott, University of Louisiana at Lafayette Padmanabhan Seshaiyer, George Mason University January 4, 2010

Contents

Acknowledgments
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10

Acknowledgments In addition to the authors, the following graduate students at the University of Louisiana at Lafayette have contributed to the answers in this guide: Pei Zhang, Qihua Huang, and Baoling Ma.


Chapter 1

1. (a) Let $f(x) = e^x$ for $x \in (-\infty, 0]$. We have $f \in C^1(-\infty, 0]$ and, by assumption, $x, y \in (-\infty, 0]$; without loss of generality, $y \le x$. We may therefore use the Mean Value Theorem to obtain that there exists a $\zeta \in [y, x] \subseteq (-\infty, 0]$ such that $f(x) - f(y) = e^{\zeta}(x - y)$. Hence,
\[ |e^x - e^y| = |e^{\zeta}|\,|x - y| \le |e^0|\,|x - y| = |x - y|, \qquad \forall x, y \le 0. \]

(b) Let $f(x) = x^p$ for $x \ge 0$ and $p \ge 1$. Then $f \in C^1[0, \infty)$. Suppose $0 \le y \le x$. The Mean Value Theorem then implies that there exists $\zeta \in [y, x] \subseteq [0, \infty)$ such that
\[ f(x) - f(y) = f'(\zeta)(x - y) = p\,\zeta^{p-1}(x - y). \]
Since $g(x) = x^{p-1}$ is a nondecreasing function for $x \ge 0$ and $p \ge 1$, we have $y^{p-1} \le \zeta^{p-1} \le x^{p-1}$. Thus,
\[ p\,y^{p-1}(x - y) \le x^p - y^p \le p\,x^{p-1}(x - y) \qquad \text{for } 0 \le y \le x \text{ and } p \ge 1. \]

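(c) The solution is already available in the Appendix of the book.

(d) Since $f' \in C[a, b]$ and $f'(x) \ne 0$ for all $x \in (a, b)$, either $f'(x) < 0$ or $f'(x) > 0$ for all $x \in (a, b)$. Hence $f$ is a continuous, strictly monotone function on $[a, b]$, so $f(x)$ can vanish at at most one point in $[a, b]$.

2. An answer is
\[ p(x) = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!}. \]
To see why, observe that, when $x \ne 0$, Taylor's Theorem gives
\[ \operatorname{sinc}(x) = \frac{\sin(x)}{x} = \frac{x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \dfrac{x^7}{7!} + \cos(\xi)\dfrac{x^9}{9!}}{x}, \]
where $\xi$ is between 0 and $x$. Thus,
\[ |\operatorname{sinc}(x) - p(x)| = \left| \cos(\xi)\,\frac{x^8}{9!} \right| \le \frac{|x|^8}{9!}. \]
Then $-0.2 \le x \le 0.2$ implies
\[ |\operatorname{sinc}(x) - p(x)| \le \frac{(0.2)^8}{9!} \approx 7.05 \times 10^{-12}. \]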
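The bound in Problem 2 can be spot-checked numerically. The following is a minimal Python sketch (not part of the original solution, which uses no code here); the function names `p`, `sinc` and `max_error` and the sampling grid are illustrative assumptions.

```python
import math

def p(x):
    """Degree-6 polynomial 1 - x^2/3! + x^4/5! - x^6/7! from the answer above."""
    x2 = x * x
    return 1.0 - x2 / 6.0 + x2 * x2 / 120.0 - x2 ** 3 / 5040.0

def sinc(x):
    """sin(x)/x, with the removable singularity at 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def max_error(radius=0.2, samples=2001):
    """Largest sampled |sinc(x) - p(x)| on [-radius, radius] (a grid, not a proof)."""
    pts = [-radius + 2.0 * radius * i / (samples - 1) for i in range(samples)]
    return max(abs(sinc(x) - p(x)) for x in pts)

# Theoretical bound (0.2)^8 / 9! from the Taylor remainder.
bound = 0.2 ** 8 / math.factorial(9)
print(f"sampled max error = {max_error():.4e}, bound = {bound:.4e}")
```

The sampled maximum should sit just below the bound, since the largest error occurs at the end points $x = \pm 0.2$.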

3. By Taylor's Theorem,
\[ f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{1}{2!} \int_x^{x+h} (x + h - t)^2 f'''(t)\,dt, \]
\[ f(x - h) = f(x) - h f'(x) + \frac{h^2}{2!} f''(x) + \frac{1}{2!} \int_x^{x-h} (x - h - t)^2 f'''(t)\,dt. \]
Subtracting the second equation from the first, dividing by $2h$, and simplifying then gives
\[
\begin{aligned}
\left| \frac{f(x+h) - f(x-h)}{2h} - f'(x) \right|
&= \frac{1}{4h} \left| \int_x^{x+h} (x + h - t)^2 f'''(t)\,dt - \int_x^{x-h} (x - h - t)^2 f'''(t)\,dt \right| \\
&\le \frac{1}{4h} \max_{x-h \le t \le x+h} |f'''(t)| \cdot \left( \left| \int_x^{x+h} (x + h - t)^2\,dt \right| + \left| \int_x^{x-h} (x - h - t)^2\,dt \right| \right) \\
&= \frac{1}{4h} \max_{x-h \le t \le x+h} |f'''(t)| \cdot \frac{2h^3}{3} = c h^2,
\end{aligned}
\]
where
\[ c = \frac{1}{6} \max_{x-h \le t \le x+h} |f'''(t)|. \]
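The $O(h^2)$ behavior derived in Problem 3 can be observed numerically: halving $h$ should reduce the error by a factor of about 4. The sketch below is in Python rather than matlab and is only an illustration; the helper names and the test function $\sin$ are assumptions, not part of the original solution.

```python
import math

def central_difference(f, x, h):
    """Central difference quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def error_ratio(f, fprime, x, h):
    """Errors for step h and h/2, and their ratio (about 4 for an O(h^2) method)."""
    e1 = abs(central_difference(f, x, h) - fprime(x))
    e2 = abs(central_difference(f, x, h / 2.0) - fprime(x))
    return e1, e2, e1 / e2

e1, e2, ratio = error_ratio(math.sin, math.cos, 1.0, 1e-2)
print(f"error(h) = {e1:.3e}, error(h/2) = {e2:.3e}, ratio = {ratio:.4f}")
```

For $f = \sin$, $|f'''| \le 1$, so the bound $c h^2$ with $c = 1/6$ can be compared directly with the printed error.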

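4. This is similar to Problem 3. By Taylor's Theorem, since $f$ has a continuous fourth derivative, we have
\[ f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{h^3}{3!} f'''(x) + \frac{1}{3!} \int_x^{x+h} (x + h - t)^3 f^{(4)}(t)\,dt, \]
\[ f(x - h) = f(x) + (-h) f'(x) + \frac{h^2}{2!} f''(x) + \frac{(-h)^3}{3!} f'''(x) + \frac{1}{3!} \int_x^{x-h} (x - h - t)^3 f^{(4)}(t)\,dt. \]
Combining these two equations according to the difference quotient gives
\[ \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - f''(x) = \frac{1}{6h^2} \left[ \int_x^{x+h} (x + h - t)^3 f^{(4)}(t)\,dt + \int_x^{x-h} (x - h - t)^3 f^{(4)}(t)\,dt \right]. \]
Bounding each integral, we have
\[ \left| \int_x^{x+h} (x + h - t)^3 f^{(4)}(t)\,dt \right| \le \max_t |f^{(4)}(t)| \cdot \left| \int_x^{x+h} (x + h - t)^3\,dt \right| = \max_t |f^{(4)}(t)| \cdot \frac{h^4}{4}, \]
\[ \left| \int_x^{x-h} (x - h - t)^3 f^{(4)}(t)\,dt \right| \le \max_t |f^{(4)}(t)| \cdot \left| \int_x^{x-h} (x - h - t)^3\,dt \right| = \max_t |f^{(4)}(t)| \cdot \frac{h^4}{4}. \]
Thus,
\[ \left| \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - f''(x) \right| \le c h^2 \qquad \text{for } c = \frac{1}{12} \max_t |f^{(4)}(t)|. \]

5. Let $x_n = e^{1/n} - \frac{1}{n} - 1$ and $\alpha_n = \frac{1}{n}$. By Taylor's Theorem,
\[ e^{1/n} = e^0 + e^0 \left( \frac{1}{n} - 0 \right) + \frac{1}{2n^2} e^{\zeta} \]
for some $\zeta$ between 0 and $1/n$. Therefore,
\[ x_n = e^{1/n} - \frac{1}{n} - 1 = \frac{1}{2n^2} e^{\zeta}. \]
Hence, $x_n = O(\alpha_n^2)$.

6. This is because
\[ \frac{x_n}{\alpha_n} = \frac{\dfrac{n+1}{n^2 \ln n}}{\dfrac{1}{n}} = \frac{n(n+1)}{n^2 \ln n} = \frac{1 + \frac{1}{n}}{\ln n} \to 0 \quad \text{as } n \to \infty. \]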

7. We follow definition (2) of fl on page 11 of the text, and we emulate the operations a machine would do, as follows:
\[ a \leftarrow 0.410 \times 10^{0}, \qquad b \leftarrow 0.135 \times 10^{-3}, \qquad c \leftarrow 0.431 \times 10^{-3}, \]
and
\[ \mathrm{fl}(b + c) = \mathrm{fl}(0.135 \times 10^{-3} + 0.431 \times 10^{-3}) = \mathrm{fl}(0.566 \times 10^{-3}) = 0.566 \times 10^{-3}, \]
so
\[
\begin{aligned}
\mathrm{fl}(a + 0.566 \times 10^{-3}) &= \mathrm{fl}(0.410 \times 10^{0} + 0.566 \times 10^{-3}) \\
&= \mathrm{fl}(0.410 \times 10^{0} + 0.000566 \times 10^{0}) \\
&= \mathrm{fl}(0.410566 \times 10^{0}) = 0.411 \times 10^{0}.
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
\mathrm{fl}(a + b) &= \mathrm{fl}(0.410 \times 10^{0} + 0.135 \times 10^{-3}) \\
&= \mathrm{fl}(0.410000 \times 10^{0} + 0.000135 \times 10^{0}) \\
&= \mathrm{fl}(0.410135 \times 10^{0}) = 0.410 \times 10^{0},
\end{aligned}
\]
so
\[
\begin{aligned}
\mathrm{fl}(0.410 \times 10^{0} + c) &= \mathrm{fl}(0.410 \times 10^{0} + 0.431 \times 10^{-3}) \\
&= \mathrm{fl}(0.410 \times 10^{0} + 0.000431 \times 10^{0}) \\
&= \mathrm{fl}(0.410431 \times 10^{0}) = 0.410 \times 10^{0} \ne 0.411 \times 10^{0}.
\end{aligned}
\]
Thus, the associative law does not hold for floating point addition with "round to nearest." Furthermore, this illustrates that accuracy is improved if numbers of like magnitude are added first in a sum.

8. We will not present it as formally as in the previous problem, although such computations should underlie the presentation here, so students get the proper idea of how floating point arithmetic works. On a 2-digit decimal machine with rounding to nearest, we obtain
\[ \frac{a - b}{c} \to \frac{0.41 - 0.36}{0.7} \to \frac{0.50 \times 10^{-1}}{0.7} \to 0.71 \times 10^{-1}, \]
while
\[ \frac{a}{c} \to \frac{0.41}{0.7} \to 0.59 \times 10^{0} \qquad \text{and} \qquad \frac{b}{c} \to \frac{0.36}{0.7} = 0.51 \times 10^{0}, \]
so
\[ \frac{a}{c} - \frac{b}{c} \to 0.80 \times 10^{-1} \ne 0.71 \times 10^{-1}. \]
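The hand computations in Problems 7 and 8 can be emulated with Python's `decimal` module, which rounds every arithmetic result to a chosen number of significant decimal digits. This is only a sketch under assumptions: the helper names are hypothetical, and the rounding mode `ROUND_HALF_EVEN` is one possible reading of "round to nearest" (no ties occur in these particular computations).

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN

def fl_sum_orders(a, b, c, digits):
    """Return (a + (b + c), (a + b) + c) in `digits`-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits                 # significant digits kept after each operation
        ctx.rounding = ROUND_HALF_EVEN    # assumed tie-breaking rule
        small_first = a + (b + c)
        large_first = (a + b) + c
    return small_first, large_first

def fl_quotients(a, b, c, digits):
    """Return ((a - b)/c, a/c - b/c) in `digits`-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.rounding = ROUND_HALF_EVEN
        combined = (a - b) / c
        separate = a / c - b / c
    return combined, separate

# Problem 7: three-digit arithmetic.
print(fl_sum_orders(Decimal("0.410"), Decimal("0.000135"), Decimal("0.000431"), 3))
# Problem 8: two-digit arithmetic.
print(fl_quotients(Decimal("0.41"), Decimal("0.36"), Decimal("0.7"), 2))
```

The two components of each returned pair differ, mirroring the results $0.411$ versus $0.410$ and $0.71 \times 10^{-1}$ versus $0.80 \times 10^{-1}$ above.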

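9. Let
\[ \mathrm{fl}(x_1^* x_3^*) = x_1^* x_3^* (1 + \epsilon_5), \qquad \mathrm{fl}(x_2^* x_4^*) = x_2^* x_4^* (1 + \epsilon_6), \]
and
\[ \mathrm{fl}\big(\mathrm{fl}(x_1^* x_3^*) + \mathrm{fl}(x_2^* x_4^*)\big) = \big(\mathrm{fl}(x_1^* x_3^*) + \mathrm{fl}(x_2^* x_4^*)\big)(1 + \epsilon_7). \]
Then
\[
\begin{aligned}
S_2^* &= \mathrm{fl}\big(\mathrm{fl}(x_1^* x_3^*) + \mathrm{fl}(x_2^* x_4^*)\big) \\
&= \mathrm{fl}\big(\mathrm{fl}(x_1(1 + \epsilon_1) x_3(1 + \epsilon_3)) + \mathrm{fl}(x_2(1 + \epsilon_2) x_4(1 + \epsilon_4))\big) \\
&= \mathrm{fl}\big(x_1(1 + \epsilon_1) x_3(1 + \epsilon_3)(1 + \epsilon_5) + x_2(1 + \epsilon_2) x_4(1 + \epsilon_4)(1 + \epsilon_6)\big) \\
&= \big[x_1(1 + \epsilon_1) x_3(1 + \epsilon_3)(1 + \epsilon_5) + x_2(1 + \epsilon_2) x_4(1 + \epsilon_4)(1 + \epsilon_6)\big](1 + \epsilon_7) \\
&= x_1 x_3 (1 + \epsilon_1)(1 + \epsilon_3)(1 + \epsilon_5)(1 + \epsilon_7) + x_2 x_4 (1 + \epsilon_2)(1 + \epsilon_4)(1 + \epsilon_6)(1 + \epsilon_7).
\end{aligned}
\]
(The above computation is an algebraic depiction of the accumulation of roundoff errors.) Since
\[ (1 - \delta) \le (1 + \epsilon_i) \le (1 + \delta), \qquad i = 1, \ldots, 7, \]
it follows (regardless of the algebraic signs of the $x_i$'s) that
\[ S_2 (1 - \delta)^4 \le S_2^* \le S_2 (1 + \delta)^4, \]
so
\[ (1 - \delta)^4 \le \frac{S_2^*}{S_2} \le (1 + \delta)^4. \]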

However, $(1 + \delta)^4 \le e^{4\delta}$, an elementary property of the exponential.

10. If IEEE single precision with rounding to nearest is being used, Theorem 1.6 may be used with $p = 1$, $\beta = 2$, and $t = 23$. Thus, regardless of which interval, the maximum relative error is bounded by
\[ \left| \frac{x - \mathrm{fl}(x)}{x} \right| \le \frac{p}{2} \beta^{1-t} = 2^{-23}. \]
(In fact, this bound is sharp for certain numbers in these intervals.)

11. Again, we use Theorem 1.6. If IEEE double precision with rounding to nearest is being used,
\[ \left| \frac{x - \mathrm{fl}(x)}{x} \right| \le \frac{p}{2} \beta^{1-t} = 2^{-53}. \]

12. $\delta = \epsilon_m$.

13. (a) $0.999 \times 10^{9}$

(b) $0.100 \times 10^{-9}$

(c) $\epsilon_m = \dfrac{\beta^{1-t}}{2} = \dfrac{10^{1-3}}{2} = 0.005$

(d) i. $\mathrm{fl}(f(0)) = 0.100 \times 10^{1}$ and $\mathrm{fl}(f(0.0008)) = 0.100 \times 10^{1}$.

ii. $\mathrm{fl}\big(\mathrm{fl}(f(0.0008)) - \mathrm{fl}(f(0))\big) = 0$, while the nearest machine number to $f(0.0008) - f(0)$ is $0.800 \times 10^{-3}$.

iii.
\[ \frac{\mathrm{fl}\big(\mathrm{fl}(f(0.0008)) - \mathrm{fl}(f(0))\big)}{\mathrm{fl}(0.0008)} = \frac{0}{0.800 \times 10^{-3}} = 0. \]
On the other hand, the first 10 digits of the exact value are 0.9999998933, so there is a total loss of accuracy. Note that the quotient is an approximation to $f'(0) = 1$, and, while the exact value is a very good approximation, the computed value is totally misleading.

14. For simplicity here, we interpret $\log(x)$ to mean the natural logarithm.

(a) When $x$ is large, $\log(x + 1)$ is closer to $\log(x)$ than $x + 1$ is to $x$, by a factor of approximately $\log'(x) = 1/x$. For example, with $x = 10^5$, $\log(10^5 + 1) - \log(10^5) \approx 10^{-5}$, while $\log(10^5) \approx 11.5$. Reasoning roughly, we can expect 5 digits of $\mathrm{fl}(\log(x + 1))$ and $\mathrm{fl}(\log(x))$ to be the same, and these digits will be lost. For example, with 16 decimal digits of accuracy (roughly the precision of IEEE double precision), we can expect only roughly $16 - 5 = 11$ digits to be correct in the floating-point result, for a relative error of roughly $10^{4}$ (since we divide by roughly 11.5 in the expression for the relative error). The absolute error will be small regardless of the value of $x$. On the other hand, since $|f'|$ is small for large $x$, the absolute error decreases as $x$ gets larger.

(b) There are several things we can do. One is to write $\log(x + 1) - \log(x)$ as $\log\left(\frac{x+1}{x}\right)$ or as $\log\left(1 + \frac{1}{x}\right)$, but there may be roundoff error in evaluating the argument of $\log$ in these expressions, for extremely large $x$. Another is to do the cancellation algebraically, before the numerical computation, by using, say, Taylor polynomials. This introduces a truncation error, but it will be smaller, the larger $x$ is, and we can control it by supplying a higher-degree Taylor polynomial. The Taylor series about $x$ is an alternating series, and there is roundoff error in its evaluation, but the terms are rapidly decreasing for large $x$, so, for large $x$, roundoff error can be negligible. For sufficiently large $x$, we can approximate $\log(x + 1) - \log(x)$ by $1/x$. For example, the degree-4 Taylor polynomial with remainder term for $\log(x + 1) - \log(x)$, centered at $x$, is
\[ \log(x + 1) - \log(x) \approx \frac{1}{x} - \frac{1}{2x^2} + \frac{1}{3x^3} - \frac{1}{4x^4}. \]
It is easily seen that the Taylor series is alternating, with terms decreasing in magnitude for $x > 1$, so the error is bounded by the first term left off. For example, when $x = 10^5$, the truncation error is bounded by $0.25 \times 10^{-20}$ if we approximate $\log(x + 1) - \log(x)$ by the degree-3 Taylor polynomial.

(c) The following matlab script (ch_1_prob_14_experiments.m) gives some results:

    x = 10^4;
    for i=1:4
        naive_value = log(x+1)-log(x);
        lxp1overx = log((x+1)/x);
        lonepodx = log(1 + 1/x);
        taylor3 = 1/x - 1/(2*x^2) + 1/(3*x^3);
        fprintf('%12.4e %12.4e %12.4e %12.4e, %12.4e\n',...
            x, naive_value, lxp1overx, lonepodx, taylor3);
        x = 1e4*x;
    end

Namely, we obtain

    >> ch_1_prob_14_experiments(1)
    1.0000e+004  9.9995e-005  9.9995e-005  9.9995e-005,  9.9995e-005
    1.0000e+008  1.0000e-008  1.0000e-008  1.0000e-008,  1.0000e-008
    1.0000e+012  1.0019e-012  1.0001e-012  1.0001e-012,  1.0000e-012
    1.0000e+016  0.0000e+000  0.0000e+000  0.0000e+000,  1.0000e-016
    >>

We see the beginning of loss of accuracy for $x = 10^{12}$, with the difference being the most inaccurate, and a total loss of accuracy for $x = 10^{16}$, for all forms except for the Taylor polynomial. (Looking at $x = 10^{13}$, $x = 10^{14}$, and $x = 10^{15}$ would reveal more about the loss of accuracy of the different representations.) Computing using Mathematica® and 100 digits of precision, we get
\[ \log(10^{12} + 1) - \log(10^{12}) = 9.999999999995000\ldots \times 10^{-13}, \]
\[ \log(10^{16} + 1) - \log(10^{16}) = 9.9999999999999995000\ldots \times 10^{-17}. \]
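The same experiment can be sketched in Python, whose floats are IEEE double precision. This is an illustration rather than a port of the matlab script: the function name and dictionary keys are assumptions, and the `log1p` variant (Python's standard `math.log1p`, which evaluates $\log(1 + y)$ accurately for small $y$) is an extra alternative not discussed in the original solution.

```python
import math

def log_difference_variants(x):
    """Several floating-point evaluations of log(x+1) - log(x) for large x."""
    return {
        "naive": math.log(x + 1.0) - math.log(x),        # subtracts nearly equal numbers
        "log_ratio": math.log((x + 1.0) / x),            # log((x+1)/x)
        "log_one_plus": math.log(1.0 + 1.0 / x),         # log(1 + 1/x)
        "log1p": math.log1p(1.0 / x),                    # avoids forming 1 + 1/x
        "taylor3": 1.0 / x - 1.0 / (2.0 * x**2) + 1.0 / (3.0 * x**3),
    }

for x in (1e4, 1e8, 1e12, 1e16):
    v = log_difference_variants(x)
    print(f"{x:10.1e} " + " ".join(f"{name}={val:.4e}" for name, val in v.items()))
```

At $x = 10^{16}$ the first three variants return exactly zero, because $x + 1$ and $1 + 1/x$ both round to their larger term, while the Taylor polynomial and `log1p` still return about $10^{-16}$.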

(By default, matlab rounds its internal results into a four decimal digit display, but will round into a 16 decimal digit display if the command format long is issued.)

15. (a) $\mathrm{fl}(\ln(0.1001 \times 10^{4})) = 0.6909 \times 10^{1}$ and $\mathrm{fl}(\ln(0.1000 \times 10^{4})) = 0.6908 \times 10^{1}$, so the difference $\mathrm{fl}(0.6909 \times 10^{1} - 0.6908 \times 10^{1})$ is $0.1000 \times 10^{-2}$, and, dividing by 2, the result is $0.5000 \times 10^{-3}$, whereas the closest floating point value is $0.4998 \times 10^{-3}$.

(b) See the answer to the previous problem. In this case, we write
\[ f(x) = \frac{\ln(x + 1) - \ln(x)}{2} = \ln\left( \left(1 + \frac{1}{x}\right)^{1/2} \right). \]
We obtain $\mathrm{fl}(1 + 1/x) = 0.1001 \times 10^{1}$ and $\mathrm{fl}((0.1001 \times 10^{1})^{0.5}) = 0.1000 \times 10^{1}$, so $f$ evaluates to 0, a total loss of accuracy. However, if we use the degree two Taylor polynomial
\[ f(1000) \approx \frac{1}{2} \left( \frac{1}{10^{3}} - \frac{1}{2 \times 10^{6}} \right), \]
we obtain
\[ \mathrm{fl}(1/(0.1000 \times 10^{4})) = 0.1000 \times 10^{-2}, \qquad \mathrm{fl}(1/(0.2000 \times 10^{7})) = 0.5000 \times 10^{-6}, \]
and taking the subtraction in the Taylor polynomial and rounding gives $\mathrm{fl}(0.1000 \times 10^{-2} - 0.5000 \times 10^{-6}) = 0.9995 \times 10^{-3}$. When we divide by 2 and round, we get $\mathrm{fl}(0.9995 \times 10^{-3} / 2) = \mathrm{fl}(0.49975 \times 10^{-3}) = 0.4998 \times 10^{-3}$, if we are using round to even.

(c) The relative error for the answer obtained in part (b) is smaller than the one obtained in part (a).

16. (An expanded explanation of a solution in the back of the book) Suppose $x^* = y^*$, but $|x - x^*| \le R|x|$ and $|y - x^*| \le R|y|$, or
\[ -R|x| \le x - x^* \le R|x|, \]
\[ -R|y| \le y - x^* \le R|y|. \]

Subtracting the second set of inequalities from the first gives −R(|x| + |y|) ≤ x − y ≤ R(|x| + |y|), that is, |x − y| ≤ R(|x| + |y|). This cannot happen if R
> This proves that the first three digits of $\pi$ are 3.14.

25. We use matlab with intlab to obtain the interval enclosures here. Care needs to be taken, since $-0.8$ is not exactly representable; the function rigorinfsup from the 2009 Moore / Kearfott / Cloud book may be used for this. We obtain
\[
\begin{aligned}
f([-1, -0.8]) &\subseteq \sin^2([-1, -0.8]) + \tfrac{1}{2}([-1, -0.8]) \\
&\subseteq [-0.8415, -0.7173]^2 + [-0.5, -0.4] \\
&\subseteq [0.0145, 0.3081].
\end{aligned}
\]
Since $0 \notin [0.0145, 0.3081]$, there is no solution to $f(x) = 0$ for $x \in [-1, -0.8]$.

26. We empirically compare the natural interval extension and the mean value extension.

(a) We have used the following matlab script (ch1_26a.m).

    format long
    intvalinit('displayinfsup')
    for i=1:10
        u(i)=4^(-i);
        l(i)=1-u(i);
        r(i)=1+u(i);
        wx(i)=r(i)-l(i);
        xintval(i)=infsup(l(i),r(i));
        fx(i)=xintval(i)^2-xintval(i);
        wfx(i)=2*rad(fx(i));
        % The following is only valid for l(i) > 0.5.
        wfux(i)=(r(i)^2-r(i))-(l(i)^2-l(i));
        E(i)=wfx(i)-wfux(i);
        E1(i)=E(i)/wx(i);
        E2(i)=E(i)/wx(i)^2;
    end
    u'
    xintval'
    fx'
    wfx'
    wfux';
    E1'
    E2'

From this, we compiled the following table.

     i   4^(-i)              x_i
     1   0.25000000000000    [ 0.74999999999998, 1.25000000000001]
     2   0.06250000000000    [ 0.93749999999998, 1.06250000000001]
     3   0.01562500000000    [ 0.98437499999998, 1.01562500000001]
     4   0.00390625000000    [ 0.99609374999998, 1.00390625000001]
     5   0.00097656250000    [ 0.99902343749998, 1.00097656250001]
     6   0.00024414062500    [ 0.99975585937498, 1.00024414062501]
     7   0.00006103515625    [ 0.99993896484373, 1.00006103515626]
     8   0.00001525878906    [ 0.99998474121093, 1.00001525878907]
     9   0.00000381469727    [ 0.99999618530273, 1.00000381469727]
    10   0.00000095367432    [ 0.99999904632568, 1.00000095367432]

     i   f(x_i)                                   w(f(x_i))          E(f;x)/w(x_i)   E(f;x)/w(x_i)^2
     1   [ -0.68750000000001, 0.81250000000001]   1.50000000000000   2               4
     2   [ -0.18359375000001, 0.19140625000001]   0.37500000000000   2               16
     3   [ -0.04663085937501, 0.04711914062501]   0.09375000000000   2               64
     4   [ -0.01170349121094, 0.01173400878907]   0.02343750000000   2               256
     5   [ -0.00292873382569, 0.00293064117432]   0.00585937500000   2               1024
     6   [ -0.00073236227036, 0.00073248147965]   0.00146484375000   2               4096
     7   [ -0.00018310174346, 0.00018310919405]   0.00036621093750   2               16384
     8   [ -0.00004577613436, 0.00004577660002]   0.00009155273438   2               65536
     9   [ -0.00001144407725, 0.00001144410635]   0.00002288818359   2               262144
    10   [ -0.00000286102204, 0.00000286102386]   0.00000572204590   2               1048576

(b) We use the following matlab script (ch1_26b.m). The function mean_value_form.m from the Moore / Kearfott / Cloud 2009 book is used in this script, and $f(x) = x^2 - x$ is programmed as the matlab function xsqmx.

    format long;
    intvalinit('displayinfsup')
    for i=1:10
        u(i)=4^(-i);
        l(i)=1-u(i);
        r(i)=1+u(i);
        wx(i)=r(i)-l(i);
        xintval(i)=infsup(l(i),r(i));
        fmvx(i)=mean_value_form('xsqmx',xintval(i));
        wfmvx(i)=2*rad(fmvx(i));
        % The following is only valid for l(i) > 0.5.
        wfux(i)=(r(i)^2-r(i))-(l(i)^2-l(i));
        E(i)=wfmvx(i)-wfux(i);
        E1(i)=E(i)/wx(i);
        E2(i)=E(i)/wx(i)^2;
    end
    u'
    xintval'
    fmvx'
    wfmvx'
    wfux';
    E1'
    E2'

From this, we compiled the following table.

     i   4^(-i)              x_i
     1   0.25000000000000    [ 0.74999999999998, 1.25000000000001]
     2   0.06250000000000    [ 0.93749999999998, 1.06250000000001]
     3   0.01562500000000    [ 0.98437499999998, 1.01562500000001]
     4   0.00390625000000    [ 0.99609374999998, 1.00390625000001]
     5   0.00097656250000    [ 0.99902343749998, 1.00097656250001]
     6   0.00024414062500    [ 0.99975585937498, 1.00024414062501]
     7   0.00006103515625    [ 0.99993896484373, 1.00006103515626]
     8   0.00001525878906    [ 0.99998474121093, 1.00001525878907]
     9   0.00000381469727    [ 0.99999618530273, 1.00000381469727]
    10   0.00000095367432    [ 0.99999904632568, 1.00000095367432]

     i   f_mv(x_i)                                w(f_mv(x_i))
     1   [ -0.37500000000001, 0.37500000000001]   0.75000000000000
     2   [ -0.07031250000001, 0.07031250000001]   0.14062500000000
     3   [ -0.01611328125001, 0.01611328125001]   0.03222656250000
     4   [ -0.00393676757813, 0.00393676757813]   0.00787353515625
     5   [ -0.00097846984864, 0.00097846984864]   0.00195693969727
     6   [ -0.00024425983429, 0.00024425983429]   0.00048851966858
     7   [ -0.00006104260684, 0.00006104260684]   0.00012208521366
     8   [ -0.00001525925473, 0.00001525925473]   0.00003051850945
     9   [ -0.00000381472637, 0.00000381472637]   0.00000762945274
    10   [ -0.00000095367614, 0.00000095367614]   0.00000190735227

     i   E(f_mv;x)/w(x_i)    E(f_mv;x)/w(x_i)^2
     1   0.50000000000000    1
     2   0.12500000000000    1
     3   0.03125000000000    1
     4   0.00781250000000    1
     5   0.00195312500000    1
     6   0.00048828125000    1
     7   0.00012207031250    1
     8   0.00003051757813    1
     9   0.00000762939453    1
    10   0.00000190734863    1

(c) Based on the results in parts 26a and 26b, we find that $\alpha_1 = 1$ and $\alpha_2 = 2$. Note: Since this particular function $f$ is monotonic over the intervals in question, the intervals (and these tables) can be computed symbolically, without using intlab. Some students may do it that way.

27. This problem can be done by modifying the lines setting l(i) and u(i) in the matlab scripts for parts (a) and (b) of problem 26. The results are the same: The mean value extension is clearly of order 2, but the natural extension is only of order 1.
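The orders observed in Problem 26 can be reproduced without intlab by a small Python sketch. It is only an approximation of the matlab scripts: intervals are plain `(lo, hi)` tuples, there is no outward rounding (so the results are not rigorous enclosures), and all function names are hypothetical.

```python
def iadd(a, b):
    return (a[0] + b[0], a[1] + b[1])

def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def imul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def width(a):
    return a[1] - a[0]

def natural_extension(x):
    """Natural interval extension of f(x) = x^2 - x."""
    return isub(imul(x, x), x)

def mean_value_extension(x):
    """Mean value form f(m) + f'(X)(X - m), with f'(x) = 2x - 1 and m the midpoint."""
    m = 0.5 * (x[0] + x[1])
    fm = m * m - m
    slope = isub(imul((2.0, 2.0), x), (1.0, 1.0))
    return iadd((fm, fm), imul(slope, isub(x, (m, m))))

def excess_widths(i):
    """Return (w(x_i), E for the natural extension, E for the mean value form)."""
    u = 4.0 ** (-i)
    x = (1.0 - u, 1.0 + u)
    # Exact range width; valid because f is increasing for x > 0.5.
    exact = (x[1] ** 2 - x[1]) - (x[0] ** 2 - x[0])
    return (width(x),
            width(natural_extension(x)) - exact,
            width(mean_value_extension(x)) - exact)

for i in range(1, 6):
    w, e_nat, e_mv = excess_widths(i)
    print(i, e_nat / w, e_mv / w ** 2)
```

The printed ratios stay at 2 and 1 respectively, consistent with $E/w(x_i)$ being constant for the natural extension (order 1) and $E/w(x_i)^2$ being constant for the mean value form (order 2).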


Chapter 2

1. (a) Yes, because $f(x) = \arctan(x)$ is continuous, $f(-4.9) \approx -1.369 < 0$ and $f(5.1) \approx 1.377 > 0$, that is, $f(-4.9) \cdot f(5.1) < 0$. We need
\[ \frac{1}{2^{k+1}} (5.1 - (-4.9)) = \frac{10}{2^{k+1}} < 10^{-2}. \]
Thus, $2^{k+1} > 10^3$, so $k + 1 > 3/\log_{10} 2 \approx 9.97$. Hence, we need 10 iterations.

(b) Applying Algorithm 2.1, we get

    k    a_k        b_k    x_k
    0    -4.9       5.1     0.1
    1    -4.9       0.1    -2.4
    2    -2.4       0.1    -1.15
    3    -1.15      0.1    -0.525
    4    -0.525     0.1    -0.2125
    5    -0.2125    0.1    -0.05625

Thus, $z \in (-0.2125, 0.1)$.

2. (a) We have provided the matlab function bisect_method, available from the web page at http://interval.louisiana.edu/Classical-and-Modern-NA/ to do bisection. This program does not print $a_k$, $b_k$, $f(a_k)$, $f(b_k)$ and $f(x_k)$ at each step, but may be made to do so by removing semicolons from the ends of appropriate statements, or by modifying the sprintf statement within the loop. The following is an example dialog (before alteration to print values as requested):

    >> bisect_method(-4.9,5.1,1e-2,'atan')
    N = 9
    -----------------------------
    Error         Estimate
    -----------------------------
    5.0000e+000   1.0000e-001
    2.5000e+000  -2.4000e+000
    1.2500e+000  -1.1500e+000
    6.2500e-001  -5.2500e-001
    3.1250e-001  -2.1250e-001
    1.5625e-001  -5.6250e-002
    7.8125e-002   2.1875e-002
    3.9063e-002  -1.7188e-002
    1.9531e-002   2.3437e-003
    ans = -0.007421875000000
    >>
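For readers without the matlab function, the following Python sketch performs the same kind of bisection and records the (error estimate, midpoint) pairs. It is not a port of bisect_method: the function name `bisect`, its return values, and the stopping rule (stop when half the bracket width drops below `tol`) are assumptions made for illustration.

```python
import math

def bisect(f, a, b, tol):
    """Bisection on [a, b]; returns (final midpoint, list of (error bound, midpoint))."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    history = []
    while True:
        x = 0.5 * (a + b)          # current estimate
        err = 0.5 * (b - a)        # |z - x| is at most half the bracket width
        history.append((err, x))
        if err < tol:
            return x, history
        fx = f(x)
        if fx == 0.0:
            return x, history
        if fa * fx < 0:
            b = x                  # root lies in [a, x]
        else:
            a, fa = x, fx          # root lies in [x, b]

root, steps = bisect(math.atan, -4.9, 5.1, 1e-2)
for err, x in steps:
    print(f"{err:12.4e} {x:12.4e}")
print("estimate:", root)
```

With these inputs the first nine recorded pairs agree with the dialog above, and the final midpoint agrees with the value reported as `ans`.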

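(b) (i) Recall that if $\frac{1}{2}(b_k - a_k) = (b - a)/2^{k+1} < \epsilon$, then $|z - x_k| < \epsilon$. For $a = -4.9$, $b = 5.1$ (with $\log$ denoting $\log_{10}$),
\[
\begin{aligned}
\frac{10}{2^{k+1}} < 10^{-2} \;&\Rightarrow\; k + 1 > \frac{3}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 10, \\
\frac{10}{2^{k+1}} < 10^{-4} \;&\Rightarrow\; k + 1 > \frac{5}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 17, \\
\frac{10}{2^{k+1}} < 10^{-8} \;&\Rightarrow\; k + 1 > \frac{9}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 30, \\
\frac{10}{2^{k+1}} < 10^{-16} \;&\Rightarrow\; k + 1 > \frac{17}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 57, \\
\frac{10}{2^{k+1}} < 10^{-32} \;&\Rightarrow\; k + 1 > \frac{33}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 110, \\
\frac{10}{2^{k+1}} < 10^{-64} \;&\Rightarrow\; k + 1 > \frac{65}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 216, \\
\frac{10}{2^{k+1}} < 10^{-128} \;&\Rightarrow\; k + 1 > \frac{129}{\log 2} \;\Rightarrow\; \text{stop at } k + 1 = 429.
\end{aligned}
\]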
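These iteration counts can be double-checked with exact integer arithmetic, which avoids any rounding in the logarithm. The helper below is a hypothetical Python sketch, not part of the original solution.

```python
def bisection_steps(width, digits):
    """Smallest m = k + 1 with width / 2**m < 10**(-digits), using exact integers."""
    m = 0
    # width / 2**m < 10**(-digits)  is equivalent to  width * 10**digits < 2**m
    while width * 10 ** digits >= 2 ** m:
        m += 1
    return m

counts = [bisection_steps(10, d) for d in (2, 4, 8, 16, 32, 64, 128)]
print(counts)
```

Here `width` is the initial bracket width $b - a = 10$ and `digits` is the exponent in the tolerance $10^{-\text{digits}}$.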

(ii) Students should observe the values of $f(x_k)$ to see if they are halved, on average, on each step. This may not be true for particular $f$ and $k$, but, on average, the behavior is linear, with convergence factor $C = 1/2$.

3. The function xsqm2 is provided to students through the web page for the course, at http://interval.louisiana.edu/Classical-and-Modern-NA/ The function bisect_method in the previous problem can be used, with the following matlab dialog. (The modifications of bisect_method.m for the extra printing are not represented here.)

    >> bisect_method(1,2,1e-2,'xsqm2')
    N = 6
    -----------------------------
    Error         Estimate
    -----------------------------
    5.0000e-001   1.5000e+000
    2.5000e-001   1.2500e+000
    1.2500e-001   1.3750e+000
    6.2500e-002   1.4375e+000
    3.1250e-002   1.4063e+000
    1.5625e-002   1.4219e+000
    ans = 1.4141

4. xsqm2.m can be modified to x4m1000.m (with modifications that should be clear), then the following matlab dialog may be used.

    >> bisect_method(1,10,1e-5,'x4m1000')
    N = 19
    -----------------------------
    Error         Estimate
    -----------------------------
    4.5000e+000   5.5000e+000
    2.2500e+000   7.7500e+000
    1.1250e+000   6.6250e+000
    5.6250e-001   6.0625e+000
    2.8125e-001   5.7813e+000
    1.4063e-001   5.6406e+000
    7.0313e-002   5.5703e+000
    3.5156e-002   5.6055e+000
    1.7578e-002   5.6230e+000
    8.7891e-003   5.6318e+000
    4.3945e-003   5.6274e+000
    2.1973e-003   5.6252e+000
    1.0986e-003   5.6241e+000
    5.4932e-004   5.6236e+000
    2.7466e-004   5.6233e+000
    1.3733e-004   5.6235e+000
    6.8665e-005   5.6234e+000
    3.4332e-005   5.6234e+000
    1.7166e-005   5.6234e+000
    ans = 5.6234
    >>

5. An answer is provided in the appendix to the book.

6. Consider $g(x) = x - \arctan x$.

(a) This is very simple to do in matlab. The following dialog may be used.

    >> x = 5
    x = 5
    >> x = x-atan(x)
    x = 3.6266
    >> x = x-atan(x)
    x = 2.3249
    >> x = x-atan(x)
    . . .

In this dialog, the command x = x-atan(x) may be recalled using the up-arrow key. We obtain the following.

         (Starting points)
    k    5               -5               -1               1
    0    5               -5               -1               1
    1    3.6266          -3.6266          -0.2146          0.2146
    2    2.3249          -2.3249          -0.0032          0.0032
    3    1.1603          -1.1603          -1.0987e-8       1.0987e-8
    4    0.3008          -0.3008          0                0
    5    0.0086          -0.0086          0                0
    6    2.1282e-7       -2.1282e-7       0                0
    7    3.028e-21       -3.028e-21       0                0
    8    0               0                0                0
    9    0               0                0                0

    k    (Starting point) 0.1
    0    0.1
    1    3.3135e-4
    2    1.2126e-11
    3    0
    4    0

(b) The sequence $x_{k+1} = g(x_k)$ is converging to $z = 0$. Here $g(x) = x - \arctan x$ and $g'(x) = 1 - \frac{1}{x^2 + 1}$, so $0 < g'(x) < 1$ for $x \ne 0$. Thus, for any starting point $x_0$,
\[ |x_1 - z| = |g(x_0) - g(z)| = |g'(\xi)|\,|x_0 - z| \]
for some $\xi$ between 0 and $x_0$. Thus, $|g'(\xi)| \le |g'(x_0)| < 1$, so $L = |g'(x_0)|$ is a Lipschitz constant for $g$ over the interval $[0, x_0]$. Furthermore, since $g' \ge 0$ everywhere, $g$ maps $[0, x_0]$ into $[0, x_0]$, so the hypotheses of the Contraction Mapping Theorem (Theorem 2.3, page 40) hold, and the method converges from any starting point, at least linearly. However, the table shows that the convergence near $z = 0$ is much faster than linear. To see why this is so, observe that $g'(0) = 0$ and $g''(0) = 0$, so $g(x) = x^3/3 + O(x^5)$ and the convergence is in fact cubic (for example, $0.1 \to 3.3135 \times 10^{-4} \to 1.2126 \times 10^{-11}$); correspondingly, the Lipschitz constant $L = x_0^2/(1 + x_0^2)$ itself goes to zero as $x_0 \to 0$.
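The iteration of Problem 6 is easy to reproduce. The Python sketch below (hypothetical helper names, written in Python rather than as the matlab dialog) generates the iterates from $x_0 = 0.1$ and prints the ratios $x_{k+1}/x_k^3$, which indicate how fast the error shrinks close to the fixed point.

```python
import math

def fixed_point_iterates(g, x0, steps):
    """Return [x0, g(x0), g(g(x0)), ...] with `steps` applications of g."""
    xs = [x0]
    for _ in range(steps):
        xs.append(g(xs[-1]))
    return xs

def g(x):
    return x - math.atan(x)

xs = fixed_point_iterates(g, 0.1, 2)
ratios = [xs[k + 1] / xs[k] ** 3 for k in range(2)]
print(xs)
print(ratios)
```

Only two steps are taken because, in double precision, the next iterate from this starting point is already computed as exactly zero, as in the table.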

7. (a)
\[ g'(x) = \frac{1}{2} - \frac{a}{2x^2}. \]
Therefore, for $x_0 \ge \sqrt{a}$, $0 \le g'(x_0) < \frac{1}{2}$, so all iterates remain within $[\sqrt{a}, x_0]$, and the hypotheses of the contraction mapping theorem hold for any $x_0 > \sqrt{a}$. For huge $x_0$, convergence is approximately linear for the first iterate or two, with convergence factor $C = \frac{1}{2}$. However, since $g'(\sqrt{a}) = 0$, we might expect quadratic convergence for $x_0$ near $\sqrt{a}$.

(b) For simplicity, we'll use $a = 9$, so $\sqrt{a} = 3$, and we'll use $x_0 = 9$. Using matlab as in problem 6, we obtain the following table, where the displayed numbers are rounded from the actual numbers used in the computations.

    k    x_k       |x_k - sqrt(a)|    |x_{k+1} - sqrt(a)| / |x_k - sqrt(a)|^2
    0    9         6                  0.0555
    1    5         2                  0.1000
    2    3.4       0.4                0.1471
    3    3.0235    0.0235             0.1654
    4    3.0000    9.155e-5           0.1667
    5    3.0000    1.397e-9           -
    6    3         0
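A table of this kind can be regenerated with a few lines of Python. The sketch assumes the iteration function is $g(x) = x/2 + a/(2x)$, which is consistent with the derivative given in part (a) but is not stated explicitly in this excerpt; the function name is hypothetical.

```python
import math

def sqrt_iteration_table(a, x0, steps):
    """Rows (k, x_k, |x_k - sqrt(a)|, |x_{k+1} - sqrt(a)| / |x_k - sqrt(a)|^2)."""
    root = math.sqrt(a)
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * x + a / (2.0 * x))   # assumed iteration g(x) = x/2 + a/(2x)
    rows = []
    for k in range(steps):
        e_k = abs(xs[k] - root)
        e_next = abs(xs[k + 1] - root)
        ratio = e_next / e_k ** 2 if e_k > 0 else None
        rows.append((k, xs[k], e_k, ratio))
    return rows

for row in sqrt_iteration_table(9.0, 9.0, 6):
    print(row)
```

The ratio column approaches $1/(2\sqrt{a}) = 1/6$; the last ratio or two are dominated by rounding error once the error itself reaches machine precision.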

We indeed observe quadratic convergence, with convergence factor about $1/6$ for $x_k$ near $\sqrt{a} = 3$. We also observe finite convergence. This is an artifact of the floating point computation, since, if the precision were unlimited, we would never achieve the exact fixed point in a finite number of iterations.

8. Observe that $|g'(x)| = |1 + c f'(x)|$ and choose $c$ such that $|1 + c f'(\alpha)| < 1$, i.e. $-2 < c f'(\alpha) < 0$.

If we choose, say, $c f'(\alpha) = -1$, i.e. $c = -1/f'(\alpha)$, then the conditions of Proposition 2.4 hold. The reason to choose $c$ so that $c f'(\alpha)$ is close to the center of $[-2, 0]$ is to make it likely that $g$ will be a contraction over as large an interval as possible. ($x$ would have to stray further from $\alpha$ to get to the boundary of the region where $|g'| < 1$.)

9. Since $\alpha$ is a root of $f(x)$, $f(\alpha) = 0$. When $x_{k+1} = g(x_k)$, we can show $g'(\alpha) = g''(\alpha) = 0$ to prove that the order of convergence of $x_k$ to $\alpha$ is at least 3. Note that
\[ g'(x) = \frac{ f(x) f''(x) - f'\!\left(x - \dfrac{f(x)}{f'(x)}\right) \dfrac{f(x) f''(x)}{f'(x)} + f\!\left(x - \dfrac{f(x)}{f'(x)}\right) f''(x) }{ f'(x)^2 }, \]
which reduces to $g'(\alpha) = 0$ when we plug in $x = \alpha$ and use $f(\alpha) = 0$. If we differentiate $g$ again, $g''(\alpha) = 0$ can be shown similarly.

10. (a) $f(0) = -1 < 0$ and $f(1) = 1 > 0$. Hence, by the intermediate value theorem, the positive real root is in $[0, 1)$. But $x^3 + x - 1 = 0$ is equivalent to $x = \sqrt[3]{1 - x}$, so we may set $g(x) = \sqrt[3]{1 - x}$. Then, $g : [0, 1] \to [0, 1]$, $g'(x) = -\frac{1}{3}(1 - x)^{-2/3}$, and $|g'(x)| = \left|\frac{1}{3}(1 - x)^{-2/3}\right|$. The maximum of $|g'(x)|$ is $|g'(0)| = \frac{1}{3} < 1$. Hence, $g(x)$ satisfies the condition of the contraction mapping theorem.

(b) We know $|x_n - \zeta| \le \left(\frac{1}{3}\right)^n |x_0 - \zeta|$, where $x_0$ is the initial guess and $\zeta$ is the root. Then, to get an accuracy of $10^{-4}$, we need
\[ \left(\frac{1}{3}\right)^n |x_0 - \zeta| < 10^{-4}, \]
and, since $|x_0 - \zeta| \le 1$, it suffices that
\[ \left(\frac{1}{3}\right)^n < 10^{-4} \;\Rightarrow\; n > \frac{4}{\log 3} \simeq 8.4. \]



11. Let $g(x) = \dfrac{x}{2} + \dfrac{\sqrt{1000}}{2x}$. Then, using the iteration function $g$ in problem 7 with starting points $x_0 > 1000^{1/4}$, the fixed point method converges quadratically, and we can get 5.62341325191715 as an approximation to $1000^{1/4}$ to within $10^{-5}$. For example, the following matlab script (supplied as file ch_2_11.m) may be used.

    clear all
    format long
    r=1000^(1/4);
    x=6;
    k=1;
    while (abs(x-r)>10^(-5) & k> ch_2_11
    x(k)=5.6352
    x(k)=5.6234
    x(k)=5.6234
    x = 5.623413251917146
    ans = 1.365485502446973e-011
    >>

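12. An answer is provided in the appendix to the book.

13. Since $x_{k+1} = g(x_k) = \gamma f(x_k) + h(x_k)$,
\[
\begin{aligned}
x_{k+1} - z &= g(x_k) - z \\
&= \gamma f(x_k) + h(x_k) - g(z) \\
&= \gamma f(x_k) + h(x_k) - (\gamma f(z) + h(z)) \\
&= \gamma (f(x_k) - f(z)) + h(x_k) - h(z),
\end{aligned}
\]
so
\[
\begin{aligned}
\|x_{k+1} - z\| &\le \gamma L \|x_k - z\| + \|h(x_k) - h(z)\| \\
&= \gamma L \|x_k - z\| + \|h'(\zeta)(x_k - z)\| \\
&\le \left(\gamma L + \tfrac{3}{4}\right) \|x_k - z\|.
\end{aligned}
\]
Hence for convergence, we require $\gamma L + \frac{3}{4}$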

< 1 ⇒ γL
    x = infsup(0,2)
    intval x = [ 0.0000, 2.0000]
    >> (1/3)*(x^2-2*x-5/4)
    intval ans = [ -1.7501, 0.9167]

so there is too much overestimation. To reduce the overestimation, we use the mean value form as in problem 26(b) of Chapter 1 and subdivide $[0, 2]$ into 10 intervals, with the matlab script ch_2_14.m, as follows.

    x = linspace(0,2,11)
    for i=1:10
        ix(i)=infsup(x(i),x(i+1));
        range(i) = mean_value_form('g_ch_2_14',ix(i));
    end
    range'

This gives the following enclosures for the ranges over the subintervals:

    intval ans =
    [ 1.2334, 1.3435]
    [ 1.1143, 1.2484]
    [ 0.9792, 1.1319]
    [ 0.8334, 0.9995]
    [ 0.6823, 0.8564]
    [ 0.5312, 0.7079]
    [ 0.3854, 0.5595]
    [ 0.2503, 0.4164]
    [ 0.1312, 0.2839]
    [ 0.0334, 0.1675]

Since the range of $g$ over $[0, 2]$ must be contained in the union of these intervals, this proves that $g$ maps $[0, 2]$ into itself, so all of the hypotheses of the contraction mapping theorem hold.

15. (a) First observe that $g(x)$ has a unique fixed point on $(-\infty, \infty)$ provided
\[ f(x) = x - g(x) = x - e^{-x} \]
has a unique zero on $(-\infty, \infty)$. Considering $f(x)$, we see that $f(0) = -1$ and $f(1) = 1 - e^{-1} > 0$, so by the intermediate value theorem, $f$ has a zero $z$ in $[0, 1]$. Suppose that there is another zero $z^* \in (-\infty, \infty)$. Then by the mean value theorem,
\[ 0 = f(z) - f(z^*) = f'(\zeta)(z - z^*), \]
implying that $f'(\zeta) = 1 + e^{-\zeta} = 0$ for some $\zeta \in (-\infty, \infty)$. This is impossible, so the zero of $f$ is unique, so $g$ has a unique fixed point in $(-\infty, \infty)$.

(b) $|g'(x)| = |e^{-x}| \le |e^{-\ln(1.1)}| = \frac{1}{1.1} < 1$ for $x \in [\ln(1.1), \ln(3)]$. Thus $g$ is a contraction on $[\ln(1.1), \ln(3)]$.

(c) On $[\ln(1.1), \ln(3)]$, $g(x)$ achieves its minimum and maximum at the end points, because $g'(x) \ne 0$ for $x \in [\ln(1.1), \ln(3)]$. Thus,
\[ \ln(1.1) \le \frac{1}{3} = g(\ln(3)) \le g(x) \le g(\ln(1.1)) = \frac{1}{1.1} \le \ln(3). \]
Therefore, $g : [\ln(1.1), \ln(3)] \to [\ln(1.1), \ln(3)]$.

(d) Case 1: Suppose that $x_0 \in [\ln(1.1), \ln(3)]$. Then, by the contraction mapping theorem, $x_t \to z \in [\ln(1.1), \ln(3)]$ as $t \to \infty$.

Case 2: Suppose that $x_0 \in (\ln(3), \infty)$. Then, $x_1 = e^{-x_0}$ satisfies $0 < x_1 < \frac{1}{3}$ and $e^{-1/3} < x_2 < 1$. So $x_2 \in [\ln(1.1), \ln(3)]$. Therefore, by Case 1, $x_t \to z$ as $t \to \infty$.

Case 3: Suppose that $x_0 \in (-\infty, \ln(1.1))$. Then, $x_1 \in (\frac{1}{1.1}, \infty)$. By Cases 1 and 2, $x_t \to z$ as $t \to \infty$.

16. We need to show that $S(z) = z$ and $S'(z) = 0$. The key is that $f(x) \to f(z) = 0$ as $x \to z$, so
\[ \frac{f(f(x) + x) - f(x)}{f(x)} \to f'(z). \]
This is not obvious, but can be seen by rewriting the quotient as
\[
\begin{aligned}
\frac{f(f(x) + x) - f(x)}{f(x)}
&= \frac{f(f(x) + x) - f(f(x) + z)}{f(x)} + \frac{f(f(x) + z) - f(z)}{f(x)} + \frac{f(z) - f(x)}{f(x)} \\
&= \frac{f'(f(x) + c_x)(x - z)}{f(x) - f(z)} + \frac{f(f(x) + z) - f(z)}{f(x)} - \frac{f'(\xi_x)(x - z)}{f(x)} \\
&\to f'(z) \frac{1}{f'(z)} + f'(z) - f'(z) \frac{1}{f'(z)} \\
&= f'(z).
\end{aligned}
\]
Plugging this into the expression for $S$ gives
\[ S(z) = z - \frac{f(z)}{f'(z)} = z. \]
Note that
\[ S'(x) = 1 - \frac{2 f(x) f'(x)}{f(f(x) + x) - f(x)} + \frac{f^2(x) \left[ f'(f(x) + x)(f'(x) + 1) - f'(x) \right]}{\left( f(f(x) + x) - f(x) \right)^2}, \]
and, letting $x \to z$, we get
\[ f'(f(x) + x)(f'(x) + 1) - f'(x) \to (f'(z))^2. \]
Therefore, $S'(x) \to 1 - 2 + 1 = 0$.

17. For the iteration method to be fourth order, we need the iteration function $g(x)$ to satisfy
\[ g(z) = z, \qquad g'(z) = 0, \qquad g''(z) = 0, \qquad g'''(z) = 0 \]
at the fixed point $z$. Note that:
\[ g(z) = z - a f(z) - b f(z)^2 - c f(z)^3 = z, \]
\[ g'(x) = 1 - a f'(x) - 2b f(x) f'(x) - 3c f(x)^2 f'(x), \]
\[ g'(z) = 0 \;\Rightarrow\; 1 - a f'(z) = 0 \;\Rightarrow\; a = \frac{1}{f'(z)}, \]
\[ g''(x) = -a f''(x) - 2b f(x) f''(x) - 2b f'(x)^2 - 3c f(x)^2 f''(x) - 6c f'(x)^2 f(x), \]
\[ g''(z) = 0 \;\Rightarrow\; -a f''(z) - 2b f'(z)^2 = 0 \;\Rightarrow\; b = -\frac{1}{2} \frac{f''(z)}{(f'(z))^3}. \]
Similarly, setting $g'''(z) = 0$ gives
\[ c = \left[ \frac{1}{2} \frac{f''(z)^2}{f'(z)^2} - \frac{1}{6} \frac{f'''(z)}{f'(z)} \right] (f'(z))^{-3}. \]

18. Note that $y^{(k+1)} = F(y^{(k)})$ is a fixed point iteration, where
\[ F(y) = y_j + \frac{h}{2} \left[ f(y_j) + f(y) \right]. \]
Clearly, $F : \mathbb{R} \to \mathbb{R}$ and $|F(u) - F(v)| \le \frac{hL}{2} |u - v|$. Thus, $F$ is a contraction on $\mathbb{R}$ if $\frac{hL}{2} < 1$. Hence, by the contraction mapping theorem, there is a unique $y_{j+1} \in \mathbb{R}$ such that $y_{j+1} = F(y_{j+1})$, and the iterations $y_{j+1}^{(k+1)}$ converge to $y_{j+1}$.

19. The method is of the form $x_{k+1} = g(x_k)$, with
\[ g(x) = -\frac{c}{x + b}. \]
We have
\[ g'(x) = \frac{c}{(x + b)^2}, \qquad \text{so} \qquad |g'(\alpha)| = \left| \frac{c}{(\alpha + b)^2} \right|. \]
Since $\alpha$ and $\beta$ are two real roots, we have $\alpha + \beta = -b$ and $\alpha\beta = c$. This implies
\[ |g'(\alpha)| = \left| \frac{\alpha\beta}{(-\beta)^2} \right| = \left| \frac{\alpha}{\beta} \right|. \]
For convergence, we require $|g'(\alpha)| < 1$, which gives $|\alpha| < |\beta|$.

20. (a) Since $f(x) = \arctan(x)$, $f'(x) = \frac{1}{1 + x^2}$ and $f''(x) = -\frac{2x}{(1 + x^2)^2}$. Let $G = [-R, R] \subset \mathbb{R}$. Then, $x^2 \le R^2$ for $x \in G$. Thus,
\[ |f'(x)| = \frac{1}{1 + x^2} \ge \frac{1}{1 + R^2}, \]
and
\[ |f''(x)| = \left| -\frac{2x}{(1 + x^2)^2} \right| = \frac{2|x|}{(1 + x^2)^2} \le \frac{3\sqrt{3}}{8}, \]
where the last inequality comes from calculus (finding the critical points of $f''$ by setting $f'''$ equal to zero). Letting $m = \frac{1}{1 + R^2}$, $M = 3\sqrt{3}/8$, we have
\[ \frac{2m}{M} = \frac{2}{1 + R^2} \cdot \frac{8}{3\sqrt{3}} = \frac{16}{3\sqrt{3}(1 + R^2)}. \]
Suppose $s$ is a nonnegative integer such that $R^2 < s$. Letting $\rho = 16/[3\sqrt{3}(1 + s)]$, we have $K_\rho(0) \subset G = [-R, R]$. (We also observe that $\rho < 2m/M$, so the conclusions of Theorem 2.5 are true with $K_\rho(0) = [-\rho, \rho]$.) For example, if $R = 2$, we may take $\rho < 16/[3\sqrt{3}(1 + 4)] \approx 0.615$. We may find an $R$ for which $\rho$ is maximum subject to $\rho \le R$ by solving $R = 16/[3\sqrt{3}(1 + R^2)]$ (say, using a computer algebra system such as Mathematica). This gives $\rho \approx R \approx 1.227$. Theorem 2.5 thus asserts that Newton iteration should converge for any starting point in the interval $[-1.227, 1.227]$.

(b) The Newton iteration is $x \leftarrow x - (1 + x^2)\arctan(x)$. This may be implemented, say, in a matlab dialog as we illustrated in the solution to problem 6.

(i) We form a table of results, where $n$ is the number of iterations needed, and where we put "—" if it did not converge in 20 iterations and $\infty$ if there was overflow in an iteration (and $-\infty$ if the result of an iteration overflowed but would have been negative).

  x0           n
  0.5          3
  1.0          5
  1.3          6
  1.4          −∞
  1.35         7
  1.375        8
  1.3875       9
  1.39375      ∞
  1.390625     11
  1.3921875    ∞
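The experiment in part (b)(i) can also be sketched in Python. This is a minimal sketch rather than the script used to produce the table: the function name `newton_arctan`, the stopping tolerance `1e-10`, and the blow-up threshold `1e10` are assumptions, so the iteration counts it reports need not match the table entry for entry.

```python
import math


def newton_arctan(x0, tol=1e-10, max_iter=20, blowup=1e10):
    """Newton's method for f(x) = arctan(x): x <- x - (1 + x^2) * arctan(x).

    Returns (last iterate, iterations used, status), where status is
    "converged", "diverged", or "undecided" (neither within max_iter).
    """
    x = x0
    for n in range(1, max_iter + 1):
        # x * x is used instead of x ** 2 so that a huge iterate gives inf
        # rather than raising OverflowError.
        x = x - (1.0 + x * x) * math.atan(x)
        if abs(x) > blowup:
            return x, n, "diverged"
        if abs(x) < tol:
            return x, n, "converged"
    return x, max_iter, "undecided"


for start in (0.5, 1.0, 1.3, 1.4, 1.5):
    print(start, newton_arctan(start))
```

Starting points well inside the convergence interval settle on the root 0 in a few steps, while starting points beyond the critical point p produce iterates that alternate in sign and grow in magnitude.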

(Note: We chose these values partially based on the method of bisection starting with [1.3, 1.4], to try to locate the point $p$.)

(ii) We observe an oscillating convergence when it converged, and values oscillating in sign and getting larger in magnitude in the cases where it did not converge. There is an interval, at approximately [−1.39375, 1.39375], in which the Newton iteration converges, and the iteration diverges outside this interval. Theorem 2.5 gives an interval in which convergence is guaranteed, but it is somewhat smaller than the actual convergence interval. For starting points within this interval but close to an end point, the convergence is initially slow. For starting points outside this interval but close to an end point, the divergence is initially slow. The slope of $f(x) = \arctan(x)$ is smaller when $x$ is farther from 0. If $x_0$ is far enough from 0, $f'(x_0) \simeq 0$, and
\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \simeq \infty \text{ or } -\infty. \]

(iii) (α) The sequence will oscillate between p and −p if x0 = p exactly.


(β) Actually, $p$ is probably irrational and thus cannot be represented in the computer. Thus, although it may take many iterations, we will eventually observe either convergence or divergence.

21. (a) $f'(x) = 2x$, so Newton's method becomes
\[ x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k} = \frac{x_k}{2} + \frac{a}{2x_k}, \]
the same iteration as in problem 7.

(b) For $x_0 = a > 1$, $f'(x) = 2x \ge 2a$, so $m = 2a$, and $f''(x) = 2$, so $M = 2$. This gives $\rho = 2(2)/(2a) = 2/a$, for a finite interval of convergence.

(c)

  k   x0 = 2    x0 = 4    x0 = 8    x0 = 16   x0 = 32   x0 = 64
  1   1.5000    2.2500    4.1250    8.0625    16.0313   32.0156
  2   1.4167    1.5694    2.3049    4.1553    8.0780    16.0390
  3   1.4142    1.4219    1.5863    2.3183    4.1628    8.0819
  4             1.4142    1.4235    1.5905    2.3216    4.1647
  5                       1.4142    1.4240    1.5915    2.3224
  6                                 1.4142    1.4241    1.5918
  7                                           1.4142    1.4241
  8                                                     1.4142

We observe monotonic convergence, regardless of how large we choose $x_0$.

(d) The graph is convex and increasing for $x > 0$. Thus, for $x_0 > \sqrt{a}$, the tangent line at $x_0$ intersects the horizontal axis below the graph of $f$, resulting in $x_1 > \sqrt{a}$, but $x_1$ closer to $\sqrt{a}$. This is why the iterates are monotonically decreasing to $\sqrt{a}$ for every $x_0 > \sqrt{a}$. To use the convergence theory from this chapter, observe that $f$ is increasing on $[0, \infty)$. With $x_0 > 0$, the sequence converges to $\sqrt{2}$. By Theorem 2.5 and problem 21b, the sequence converges quadratically with
\[ |x_{k+1} - \sqrt{2}| \le \frac{M}{2m}\,|x_k - \sqrt{2}|^2 = \frac{\sqrt{2}}{2}\,|x_k - \sqrt{2}|^2. \]

(e) Let
\[ g(x) = x - \frac{x^2 - 2}{2x} = \frac{x}{2} + \frac{1}{x}. \]
Then $g'(x) = \frac{1}{2} - \frac{1}{x^2}$, $g'(\sqrt{2}) = 0$, and $g''(x) = \frac{2}{x^3}$, so
\[ g''(\sqrt{2}) = \frac{\sqrt{2}}{2}. \]
Hence, by Theorem 2.4, the sequence converges quadratically.
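The columns of the table in part (c) can be regenerated with a few lines of Python. This is a minimal sketch with a = 2; the function name `newton_sqrt_iterates` and the choice to stop once the rounded iterate equals the rounded square root are assumptions made for illustration.

```python
def newton_sqrt_iterates(a, x0, digits=4, max_iter=50):
    """Iterates of x <- x/2 + a/(2x), rounded to `digits` decimal places.

    Stops as soon as the rounded iterate equals the rounded value of sqrt(a).
    """
    target = round(a ** 0.5, digits)
    x = x0
    iterates = []
    for _ in range(max_iter):
        x = 0.5 * x + a / (2.0 * x)
        iterates.append(round(x, digits))
        if iterates[-1] == target:
            break
    return iterates


for start in (2.0, 4.0, 8.0, 16.0, 32.0, 64.0):
    print(start, newton_sqrt_iterates(2.0, start))
```

Each doubling of the starting point adds roughly one iteration, because far from the root the step is close to a halving of the iterate.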


22. Consider the Taylor expansions of $f(x_n)$ and $f'(x_n)$ about the point $z$:
\[ f(x_n) = f(z) + f'(z)(x_n - z) + \frac{1}{2} f''(\zeta_1)(x_n - z)^2, \]
\[ f'(x_n) = f'(z) + f''(\zeta_2)(x_n - z), \]
where $\zeta_1, \zeta_2 \in (x_n, z)$ or $(z, x_n)$. Since $f(z) = f'(z) = 0$,
\[ f(x_n) = \frac{1}{2} f''(\zeta_1)(x_n - z)^2, \quad f'(x_n) = f''(\zeta_2)(x_n - z) \;\Rightarrow\; \frac{f(x_n)}{f'(x_n)} = \frac{f''(\zeta_1)(x_n - z)}{2 f''(\zeta_2)}. \]
Plugging this into the formula for Newton's method gives
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{f''(\zeta_1)}{2 f''(\zeta_2)}(x_n - z). \]
However,
\[ \frac{f''(\zeta_1)}{f''(\zeta_2)} \to 1 \quad \text{as } n \to \infty. \]
Hence,
\[ \lim_{n \to \infty} \frac{x_{n+1} - z}{x_n - z} = \frac{1}{2}, \quad \text{i.e. } e_{n+1} \approx \frac{1}{2} e_n \text{ for large } n. \]
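The limiting error ratio of 1/2 at a double root is easy to observe numerically. The sketch below uses a hypothetical test function, $f(x) = (x - 1)^2 (x + 2)$ with a double root at $z = 1$; the function, the starting point, and the name `newton_error_ratios` are illustrative choices and do not come from the problem statement.

```python
def newton_error_ratios(x0, steps=20):
    """Ratios e_{n+1} / e_n for Newton's method on f(x) = (x - 1)^2 (x + 2).

    The double root is z = 1, so the ratios should tend to 1/2.
    """
    def f(x):
        return (x - 1.0) ** 2 * (x + 2.0)

    def df(x):
        # d/dx [(x - 1)^2 (x + 2)] = 3 (x - 1)(x + 1)
        return 3.0 * (x - 1.0) * (x + 1.0)

    z = 1.0
    x = x0
    ratios = []
    for _ in range(steps):
        x_new = x - f(x) / df(x)
        ratios.append((x_new - z) / (x - z))
        x = x_new
    return ratios


print(newton_error_ratios(2.0))
```

For this function the first ratio is 5/9, and later ratios approach 0.5, which is linear rather than quadratic convergence.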

23. An answer is provided in the appendix to the book.

24. From Remark 2.11 (page 53), $g'(z) = 0$. Thus, $g'(x) = g'(z) + g''(c_x)(x - z) = g''(c_x)(x - z)$. Thus,
\[ |x_{k+1} - z| = |g(x_k) - g(z)| = |g''(c_k)|(x_k - z)^2 \le \left( \max_{x \in [x_k, z] \text{ or } [z, x_k]} |g''(x)| \right) (x_k - z)^2. \]
This shows that Newton's method is quadratically convergent.

25. An answer is provided in the appendix to the book.

26. An answer is provided in the appendix to the book.

27. (a) To begin the induction, we show $x_0 \ge x_1$ and $x_1 \ge 0$. But we get $x_1$ by subtracting $f(x_0)/f'(x_0)$, a non-negative quantity, since both the numerator and denominator are positive, from $x_0$. Furthermore, $f(x_1) < f(x_0)$ by Assumption (i), $x_0 - x_1 > 0$, and
\[ f(x_1) - f(x_0) \ge f'(x_0)(x_1 - x_0), \]
\[ -f(x_0) = f'(x_0)(x_1 - x_0), \]
where the inequality follows from Assumption (ii) and the equation follows from the definition of Newton's method. Subtracting the equation from the inequality then gives $f(x_1) \ge 0$. Thus, $x_1$ obeys the same assumptions as $x_0$, so we may repeat the argument. (Note that we just used the weaker assumption $f(x_0) \ge 0$.) Therefore, the assertion in this part of the problem is true by induction.

(b) By (a), the sequence $x_k$ is monotone and bounded, so it converges. Suppose $x_k \to z$ as $k \to \infty$. Taking limits on both sides of $x_{k+1} = x_k - f(x_k)/f'(x_k)$ results in
\[ \lim_{k \to \infty} f(x_k) = f\left( \lim_{k \to \infty} x_k \right) = f(z) = 0. \]

28. (a) We may use i_newton_step_no_fp.m from the Moore / Kearfott / Cloud book, if we have matlab. Doing so, we have the following matlab dialog.

>> x = midrad(0,0.5)
intval x = [ -0.5000, 0.5000]
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ -0.4637, 0.4637]
intval gradient derivative(s) fXg.dx = [ 0.7999, 1.0000]
intval x = [ 0.0000, 0.0000]
is_empty = 0
>> x = midrad(0.5,0.5)
intval x = [ 0.0000, 1.0000]
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ 0.0000, 0.7854]
intval gradient derivative(s) fXg.dx = [ 0.5000, 1.0000]
intval x = [ 0.0000, 0.0364]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ 0.0000, 0.0364]
intval gradient derivative(s) fXg.dx = [ 0.9986, 1.0000]
intval x = 1.0e-005 * [ 0.0000, 0.2002]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-005 * [ 0.0000, 0.2002]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-018 * [ 0.0000, 0.3342]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-018 * [ 0.0000, 0.3342]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-034 * [ 0.0000, 0.2408]
is_empty = 0
>> x = midrad(1,1)
intval x = [ 0.0000, 2.0000]
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ 0.0000, 1.1072]
intval gradient derivative(s) fXg.dx = [ 0.1999, 1.0000]
intval x = [ 0.0000, 0.2147]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ 0.0000, 0.2114]
intval gradient derivative(s) fXg.dx = [ 0.9559, 1.0000]
intval x = 1.0e-003 * [ 0.0000, 0.4090]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-003 * [ 0.0000, 0.4090]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-011 * [ 0.0000, 0.2851]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-011 * [ 0.0000, 0.2851]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-027 * [ 0.0000, 0.2020]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-027 * [ 0.0000, 0.2020]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-043 * [ 0.0000, 0.1122]
is_empty = 0
>> x = midrad(3,6)
intval x = [ -3.0000, 9.0000]
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ -1.2491, 1.4602]
intval gradient derivative(s) fXg.dx = [ 0.0121, 1.0000]
intval x = [ -3.0000, 1.7510]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ -1.2491, 1.0519]
intval gradient derivative(s) fXg.dx = [ 0.0999, 1.0000]
intval x = [ -0.0663, 1.7510]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ -0.0662, 1.0519]
intval gradient derivative(s) fXg.dx = [ 0.2459, 1.0000]
intval x = [ -0.0663, 0.1424]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = [ -0.0662, 0.1414]
intval gradient derivative(s) fXg.dx = [ 0.9801, 1.0000]
intval x = 1.0e-003 * [ -0.7514, 0.0184]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-003 * [ -0.7514, 0.0184]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-009 * [ -0.0165, 0.1906]
is_empty = 0
>> [x, is_empty] = i_newton_step_no_fp('atan',x)
intval gradient value fXg.x = 1.0e-009 * [ -0.0165, 0.1906]
intval gradient derivative(s) fXg.dx = [ 0.9999, 1.0000]
intval x = 1.0e-025 * [ -0.2585, 0.1293]
is_empty = 0
>>

We see that the interval Newton method, using intersection with the original interval, is stable, and appears to converge regardless of initial interval.

(b) $f'(x) = 1 + e^x$. Since $x \in [-1, 0]$, $f'(x) \in [1 + \frac{1}{e}, 2]$. If $\check{x} = -0.5$, we then have
\[ N(f; x, \check{x}) = \check{x} - \frac{f(\check{x})}{f'(x)} \subseteq [-0.5779, -0.5532] \subset [-1, 0]. \]
By Theorem 2.6, there exists a unique solution to $f(x) = 0$ for $x \in [-1, 0]$. (Note: i_newton_step_no_fp may be used as in part (a), to obtain the above result.)

(c) Iterating i_newton_step_no_fp until the displayed intervals become stationary, we obtain $x \in [-0.56714329040979, -0.56714329040978]$.
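The interval Newton step used in parts (b) and (c) can be imitated in plain Python. This is a rough sketch under stated assumptions: it takes $f(x) = x + e^x$ (inferred from $f'(x) = 1 + e^x$ and the reported root, not quoted from the problem statement), it uses ordinary floating point with no outward rounding, so unlike INTLAB it is not mathematically rigorous, and the name `interval_newton_step` is hypothetical.

```python
import math


def f(x):
    # Assumed test function, consistent with f'(x) = 1 + e^x.
    return x + math.exp(x)


def interval_newton_step(lo, hi):
    """One step N(f; x, mid) intersected with x = [lo, hi].

    f' is increasing, so f'([lo, hi]) = [1 + e^lo, 1 + e^hi] with a positive
    lower bound. Returns the new (lo, hi), or None if the intersection is empty.
    """
    mid = 0.5 * (lo + hi)
    f_mid = f(mid)
    d_lo = 1.0 + math.exp(lo)
    d_hi = 1.0 + math.exp(hi)
    # Dividing the point value f_mid by the interval [d_lo, d_hi].
    quotients = (f_mid / d_lo, f_mid / d_hi)
    n_lo = mid - max(quotients)
    n_hi = mid - min(quotients)
    new_lo, new_hi = max(lo, n_lo), min(hi, n_hi)
    if new_lo > new_hi:
        return None
    return new_lo, new_hi


box = (-1.0, 0.0)
for _ in range(3):
    box = interval_newton_step(*box)
    print(box)
```

The first step reproduces the enclosure of part (b) to the displayed digits. Only three steps are taken on purpose: once the width approaches rounding level, a non-rigorous float implementation can lose the root, which is exactly what outward rounding prevents.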

(d) Notice that $f(-1) f(0) = -1 + e^{-1} \approx -0.63 < 0$, and $|f'(x)| = |1 + e^x| > 0$, so it follows that there exists a unique zero $z$ of $f$ in $[-1, 0]$. Furthermore, if we let $m = 1 + \frac{1}{e}$ and $M = 1$, then $|f'(x)| \ge m$, $|f''(x)| = |e^x| \le M$, and Theorem 2.5 implies there is a neighborhood $[z - \rho, z + \rho] \subseteq [-1, 0]$ such that $z$ is the only zero of $f$ in $[z - \rho, z + \rho]$, where $\rho$ can be selected to be any number less than $2m/M$ provided that $[z - \rho, z + \rho] \subset [-1, 0]$. Students should consider which is easier to apply (the interval Newton method or the Kantorovich theorem) and which is more powerful, i.e. which leads to uniqueness in a larger interval.

29. By Assumption (a), $g(x) \subseteq x$, so $g$ maps the interval $x$ into itself. By Assumption (b), $\mathrm{mag}(g'(x)) < 1$, so $g$ is a contraction on $x$. Then by Theorem 2.3 (the Contraction Mapping Theorem), there exists a unique point $x$ in the interval such that $g(x) = x$. Therefore there is a unique solution of $f(x) = 0$ in the interval $x$.

30. Since $(1 - \sqrt{5})^k \le (1 + \sqrt{5})^k$, for $k = 0, 1, 2, \ldots$, we have
\[
\begin{aligned}
q_k &= \frac{1 + \sqrt{5}}{2\sqrt{5}}(1 + \sqrt{5})^k + \frac{\sqrt{5} - 1}{2\sqrt{5}}(1 - \sqrt{5})^k \\
&\le \frac{1 + \sqrt{5}}{2\sqrt{5}}(1 + \sqrt{5})^k + \frac{\sqrt{5} - 1}{2\sqrt{5}}(1 + \sqrt{5})^k \\
&= (1 + \sqrt{5})^k.
\end{aligned}
\]
Thus, Equation (2.19) follows.

31. A matlab code for the secant method is provided in ch_2_31.m.

32. By (2.20), $p(x_k) = f(x_k)$ is clear to see. Next, note that $p(x_{k-1}) = f(x_k) + (x_{k-1} - x_k) f[x_k, x_{k-1}]$, where
\[ f[x_k, x_{k-1}] = \frac{f(x_{k-1}) - f(x_k)}{x_{k-1} - x_k}. \]
Thus, $p(x_{k-1}) = f(x_{k-1})$. Also, since
\[ f[x_k, x_{k-1}, x_{k-2}] = \frac{f[x_{k-1}, x_{k-2}] - f[x_k, x_{k-1}]}{x_{k-2} - x_k}, \]
where
\[ f[x_{k-1}, x_{k-2}] = \frac{f(x_{k-2}) - f(x_{k-1})}{x_{k-2} - x_{k-1}} \quad \text{and} \quad f[x_k, x_{k-1}] = \frac{f(x_{k-1}) - f(x_k)}{x_{k-1} - x_k}, \]
we obtain $p(x_{k-2}) = f(x_{k-2})$.

33. A matlab code for Müller's method is provided in ch_2_33.m, which calls functions dif1.m and dif2.m.

34. A matlab function for Steffensen's method, steffensen.m, is available from the web site for the book, at http://interval.louisiana.edu/Classical-and-Modern-NA/
An example matlab script run_steffensen.m that calls steffensen.m is available from the same place.

35. We may use newton.m, available from the web site for the book. The actual computations for that routine work just as well whether we are using real arithmetic or complex arithmetic, but the print statements will not function as wished without modification. If we use the following function

function y = f_ch_2_35(z)
y = z^2+1;

(and a corresponding function for its derivative), we obtain the following dialog.

>> z = 0.2 + 0.7i
z = 0.2000 + 0.7000i
>> [zstar,success] = newton(z,'f_ch_2_35','f_ch_2_35_prime',1e-8,3)
1 0.200000000 0.550000000
2 -0.088679245 -0.012998398
3 -0.001238151 0.007446937
zstar = 0.2000 + 0.7000i
success = 0
>>

(The imaginary components are omitted, since the format in the print statement is for real values.) Presented in tabular form and rounded, the actual iterates are:

  k   z_k                  f(z_k)                       f'(z_k)
  0   0.2 + 0.7i           0.5500 + 0.2800i             0.4000 + 1.4000i
  1   -0.0887 + 1.0104i    -0.012998 - 0.1792i          -0.17736 + 2.0208i
  2   -0.0012 + 0.9963i    0.0074469 - 0.0024671i       -0.0024763 + 1.9925i
  3   1.0000i              -1.2412e-005 + 9.2858e-006i  9.2858e-006 + 2i
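The complex iterates in the table can be cross-checked without newton.m. The following is a small Python sketch using the built-in complex type; the function name `newton_complex` is an assumption, and the code simply hard-wires $f(z) = z^2 + 1$ and $f'(z) = 2z$.

```python
def newton_complex(z0, steps):
    """Newton's method for f(z) = z^2 + 1 in complex arithmetic.

    Returns the list [z_0, z_1, ..., z_steps].
    """
    z = z0
    history = [z]
    for _ in range(steps):
        z = z - (z * z + 1) / (2 * z)
        history.append(z)
    return history


for k, z in enumerate(newton_complex(0.2 + 0.7j, 3)):
    print(k, z, z * z + 1, 2 * z)
```

Starting in the upper half plane, the iterates approach the root $i$, in agreement with the tabulated values.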

36. This problem is similar to the next problem. The provided program roots_of_poly.m may also be used for this problem, and in particular for parts (b), (c), (d), and (e). The Wilkinson polynomial of degree 7 expanded in terms of powers of $x$ is
\[ W_7(x) = x^7 - 28x^6 + 322x^5 - 1960x^4 + 6769x^3 - 13132x^2 + 13068x - 5040, \]
while the Wilkinson polynomial of degree 20 expanded in terms of powers of $x$ is
\[
\begin{aligned}
W_{20}(x) ={}& x^{20} - 210x^{19} + 20615x^{18} - 1256850x^{17} + 53327946x^{16} - 1672280820x^{15} \\
& + 40171771630x^{14} - 756111184500x^{13} + 11310276995381x^{12} - 135585182899530x^{11} \\
& + 1307535010540395x^{10} - 10142299865511450x^{9} + 63030812099294896x^{8} \\
& - 311333643161390640x^{7} + 1206647803780373360x^{6} - 3599979517947607200x^{5} \\
& + 8037811822645051776x^{4} - 12870931245150988800x^{3} + 13803759753640704000x^{2} \\
& - 8752948036761600000x + 2432902008176640000.
\end{aligned}
\]
One point here is that, in power expansion, the number of digits in the coefficients of the Wilkinson polynomial is too high to represent the coefficients exactly in the floating point system and, furthermore, there is severe cancellation when evaluating the corresponding expression in the portion of the domain corresponding to the roots. Thus, in power form, the roots of the Wilkinson polynomial may not be accurately obtainable, but somewhat better results may be obtained if the polynomial is actually evaluated in product form. The problem may not be so obvious with the polynomial of degree 7 and IEEE double precision arithmetic. Another thing students can do is compute the condition number of the Wilkinson polynomial with respect to its coefficients.

37. A matlab program for this problem is provided in roots_of_poly.m. This matlab script uses the function mymuller.m, as well as the function mynewton.m (where the latter accepts $f$ in terms of its coefficients).

(I) The roots are: ±1, ±i.
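The remark about coefficient size in problem 36 can be illustrated with exact integer arithmetic in Python. This is a sketch, not one of the provided matlab programs; the helper name `wilkinson_coeffs` is hypothetical.

```python
import math


def wilkinson_coeffs(n):
    """Exact integer coefficients of (x - 1)(x - 2)...(x - n), highest power first."""
    coeffs = [1]
    for r in range(1, n + 1):
        # Multiply the current polynomial by (x - r).
        new = coeffs + [0]
        for i, c in enumerate(coeffs):
            new[i + 1] -= r * c
        coeffs = new
    return coeffs


w7 = wilkinson_coeffs(7)
w20 = wilkinson_coeffs(20)
print(w7)
print(w20)

# Coefficients that change when stored as IEEE double precision numbers.
inexact = [c for c in w20 if int(float(c)) != c]
print(len(inexact), "coefficients of W20 are not exactly representable as doubles")
```

The degree 7 coefficients are all below $2^{53}$ and are stored exactly, which is consistent with the remark that the difficulty is less visible for degree 7 in double precision.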

(II) The roots are: ±i, ±3, 5.

(III) The roots are: −0.00096 ± 0.9998i, 3.31287, −2.95127, and 4.18577. From the answers to (II) and (III), we see that the roots of the polynomial are sensitive to small changes in the coefficients.

38. Yes, it works for the Wilkinson polynomials of degree 7, 10 and 20. One can check this by typing the following in matlab's command window:

c = store_polynomial(7,'Wilkinson_polynomial')
roots_of_poly(c)

Above, the functions store_polynomial.m and Wilkinson_polynomial.m are included on the media distributed with this instructor's manual. These functions use matlab's symbolic mathematics toolbox. (That toolbox is required to run these.)


Chapter 3

1. An answer is provided in the appendix to the book.

2. Let $A = (a_{ij})$ be a symmetric positive definite matrix, and let $e_i$ be the $i$-th coordinate vector, i.e., that vector whose $i$-th coordinate is 1 and whose other coordinates are 0. Then, $e_i^T A e_i = a_{ii} > 0$. (This is also provided in the appendix to the book.)

3. (a) $(x^T A x)^T = x^T A^T x = -x^T A x$. Since $x^T A x = (x^T A x)^T$, we have $x^T A x = -x^T A x \Rightarrow x^T A x = 0$. For a given $i$, let $x = (x_j)$ be defined by $x_i = 1$ and $x_j = 0$ if $j \ne i$. Since $x^T A x = a_{ii}$ for this $x$, we have $a_{ii} = 0$.

(b) Suppose $(I - A)x = 0 \Rightarrow x = Ax$. Then, $x^T x = x^T A^T x = -x^T A x = -x^T x$. Hence, $x^T x = 0 \Rightarrow x = 0$. Therefore, we have shown $(I - A)x = 0 \Rightarrow x = 0$. Hence, $(I - A)$ is non-singular.

4. Note that $a_{ij} = w_j^T w_i = w_i^T w_j = a_{ji}$. Hence, $A$ is symmetric and real, since $w_i$ is real. Also note that
\[
\begin{aligned}
x^T A x &= \sum_{j=1}^{n} x_j \left( \sum_{i=1}^{n} a_{ij} x_i \right) = \sum_{j=1}^{n} x_j \sum_{i=1}^{n} w_j^T w_i x_i \\
&= \left( \sum_{j=1}^{n} x_j w_j \right)^T \left( \sum_{i=1}^{n} x_i w_i \right) = Z^T Z \ge 0,
\end{aligned}
\]
where $Z = \sum_{i=1}^{n} x_i w_i$.

5. Let $(\cdot, \cdot)_2$ denote the dot product defined by the second part of Example 3.5.

(a) $(x, x)_2 = x^H H x \ge 0$ and $x^H H x = 0$ only if $x = 0$ come directly from the definition of Hermitian matrix.

(b) $(y, x)_2^H = (y^H H x)^H = x^H H^H (y^H)^H = x^H H y = (x, y)_2$. However, since $v^H = \overline{v}^T$ and since $(y, x)_2^H = \overline{(y, x)_2}$, part (b) of Definition 3.19 holds.

(c) Part (c) of Definition 3.19 holds because of the linearity of matrix multiplication.

6. Let $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_n \ge 0$ be the singular values of $A$. Then we have $A v_1 = \sigma_1 u_1$ and $A^H u_1 = \sigma_1 v_1$. Hence, $A^H A v_1 = A^H \sigma_1 u_1 = \sigma_1 A^H u_1 = \sigma_1^2 v_1$. Thus, $\sigma_1^2$ is the largest eigenvalue of $A^H A$. Hence the 2-matrix norm is $\|A\|_2 = \sqrt{\rho(A^H A)} = \sqrt{\sigma_1^2} = \sigma_1 = \rho(A)$.

7. An answer is provided in the appendix to the book.

8. (a) First, observe that
\[ d_2 = \frac{1}{\min_{\|x\|_2 = 1} \|x\|_1} \quad \text{and} \quad d_1 = \min_{\|x\|_1 = 1} \|x\|_2. \]
To compute $d_2$, notice that
\[
\|x\|_1^2 = \left( \sum_{i=1}^{n} |x_i| \right)^2 = \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} |x_i||x_j| = \|x\|_2^2 + \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} |x_i||x_j|.
\]
Thus, the smallest $\|x\|_1$ can be is $\|x\|_2$ (and that value is actually achieved when $x$ is a coordinate vector), so $d_2 = 1$. To obtain $d_1$, we can compute $\min \|x\|_2^2$ subject to $\|x\|_1 = 1$. Without loss of generality, we may assume all components of $x$ are nonnegative. We use Lagrange multipliers $\nabla f = \lambda \nabla g$, where $f(x) = \|x\|_2^2$ and $g(x) = \sum_{i=1}^{n} x_i$. This gives
\[ \nabla f = \begin{pmatrix} 2x_1 \\ \vdots \\ 2x_n \end{pmatrix} = \lambda \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \lambda \nabla g. \]
Combining this with the constraint $g = 1$ gives $x_1 = x_2 = \cdots = x_n = 1/n$, with $\|x\|_2 = 1/\sqrt{n}$. Thus, $d_1 = 1/\sqrt{n}$. Summarizing,
\[ \frac{1}{\sqrt{n}} \|x\|_1 \le \|x\|_2 \le \|x\|_1. \]

(b) As in part (a),
\[ b_2 = \frac{1}{\min_{\|x\|_\infty = 1} \|x\|_1} \quad \text{and} \quad b_1 = \min_{\|x\|_1 = 1} \|x\|_\infty. \]
For $b_2$, $\min_{\|x\|_\infty = 1} \|x\|_1$ occurs when $x$ is a coordinate vector, so $b_2 = 1$. For $b_1$, we may assume all components are non-negative. We also see a symmetry in the solution: If $|x_k| = \|x\|_\infty$, where $x = \mathrm{argmin}_{\|x\|_1 = 1} \|x\|_\infty$, then any permutation of the components of $x$ is also a solution to the minimization problem. The instructor might point out to students that this is a special case of the Chebyshev equi-oscillation property discussed in Chapter 4: If one component of $x$ were larger than another component, we could decrease that component and increase all of the other components to decrease the maximum of the components. Therefore, $x_1 = x_2 = \cdots = x_n = 1/n = b_1$. Summarizing,
\[ \frac{1}{n} \|x\|_1 \le \|x\|_\infty \le \|x\|_1. \]

(c) Combining parts (a) and (b) gives
\[ \|x\|_\infty \le \|x\|_1 \le \sqrt{n}\,\|x\|_2, \quad \text{i.e.} \quad \frac{1}{\sqrt{n}} \|x\|_\infty \le \|x\|_2, \]
and $\|x\|_2 \le \|x\|_1 \le n \|x\|_\infty$, so
\[ \frac{1}{\sqrt{n}} \|x\|_\infty \le \|x\|_2 \le n \|x\|_\infty. \]

Thus, the implications from parts (a) and (b) are not as sharp as Proposition 3.2, which derives the bounds directly.

9. Since the two norms are equivalent, there is a $c$ such that $\|z - x_k\|_\alpha \le c \|z - x_k\|_\beta$. Since $\lim_{k \to \infty} \|z - x_k\|_\beta = 0$, given $\epsilon > 0$, there is an $N$ such that $k > N$ implies $\|z - x_k\|_\beta < \epsilon / c$, and thus $\|z - x_k\|_\alpha < \epsilon$.

10. By the Cauchy–Schwarz inequality,
\[ \|x\|_1^2 = \left( \sum_{j=1}^{n} |x_j| \right)^2 \le \left( \sum_{j=1}^{n} |x_j|^2 \right) \left( \sum_{j=1}^{n} 1^2 \right) = n \|x\|_2^2 \;\Rightarrow\; \|x\|_1 \le n^{1/2} \|x\|_2. \]
\[ \|A\|_1 = \sup_{\|x\|_1 = 1} \|Ax\|_1 \le \sup_{\|x\|_1 = 1} n^{1/2} \|Ax\|_2 \le \sup_{\|x\|_2 = 1} n^{1/2} \|Ax\|_2 = n^{1/2} \|A\|_2. \]
\[ \|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2 \le \sup_{\|x\|_2 = 1} n^{1/2} \|Ax\|_\infty \le \sup_{\|x\|_2 = 1} n^{1/2} \|A\|_\infty = n^{1/2} \|A\|_\infty. \]
Combining the results, we get $\|A\|_1 \le n^{1/2} \|A\|_2 \le n \|A\|_\infty$.

11. $\|A\|_\infty = 11$, $\|A\|_1 = 9$ and $\|A\|_2 = \sqrt{\rho(A^T A)} = 9.2450$, where $\rho(A) = 9$. Clearly, $\rho(A) \le \|A\|_1$, $\rho(A) \le \|A\|_2$, and $\rho(A) \le \|A\|_\infty$.

12. An answer is provided in the appendix to the book.

13. An answer is provided in the appendix to the book.

14. An answer is provided in the appendix to the book.

15. An answer is provided in the appendix to the book.

16. An answer is provided in the appendix to the book.

17. From Algorithm 3.2,

Number of divisions:
\[ \sum_{k=1}^{n-1} 1 + 1 = n. \]
Number of multiplications:
\[ \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} 1 = \sum_{k=1}^{n-1} (n - k) = \sum_{k=1}^{n-1} n - \sum_{k=1}^{n-1} k = n(n - 1) - \frac{n(n - 1)}{2} = \frac{n(n - 1)}{2}. \]
Hence, the number of multiplications and divisions is
\[ \frac{n(n - 1)}{2} + n = \frac{n^2 + n}{2}. \]
Number of additions and subtractions:
\[ \sum_{k=1}^{n-1} (n - k) = \frac{n(n - 1)}{2}. \]

18. From Algorithm 3.1,

Number of divisions:
\[ \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} 1 = \frac{n^2 - n}{2}. \]
Number of multiplications:
\[ \sum_{k=1}^{n-1} \sum_{i=k+1}^{n} \sum_{j=k}^{n} 1 = \frac{n^3 - n}{3}. \]
Hence, the number of multiplications and divisions is $n^3/3 + O(n^2)$.

Note: The number of additions and subtractions is also $n^3/3 + O(n^2)$. In fact, if we look at "fused multiply-add" $z \leftarrow ax + y$, the multiplications and additions can mostly be grouped as fused multiply-add. In many current computer chips, fused multiply-add is designed as a single operation.

19. Since $L L^{-1} = I$, the $i$-th column of $L^{-1}$ corresponds to the system
\[
\begin{pmatrix}
l_{11} & & & & & & \\
l_{21} & l_{22} & & & & & \\
l_{31} & l_{32} & l_{33} & & & & \\
\vdots & \vdots & & \ddots & & & \\
l_{i1} & l_{i2} & l_{i3} & \cdots & l_{ii} & & \\
\vdots & \vdots & \vdots & & & \ddots & \\
l_{n1} & l_{n2} & l_{n3} & \cdots & \cdots & \cdots & l_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}.
\]
Using forward substitution, we get $x_1 = x_2 = x_3 = \cdots = x_{i-1} = 0$ and $x_i = \frac{1}{l_{ii}}$. Hence, $L^{-1}$ is a lower triangular matrix.
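The closed-form counts in problems 17 and 18 can be checked by counting loop iterations directly. The sketch below mirrors only the summation limits written in those solutions; it is an assumption that these limits match the loop structure of Algorithms 3.1 and 3.2, which are not reproduced here, and the function names are hypothetical.

```python
def gaussian_elimination_counts(n):
    """(divisions, multiplications) implied by the sums in problem 18."""
    divisions = multiplications = 0
    for k in range(1, n):                  # k = 1, ..., n-1
        for i in range(k + 1, n + 1):      # i = k+1, ..., n
            divisions += 1                 # one multiplier per row
            for j in range(k, n + 1):      # j = k, ..., n (n-k+1 values)
                multiplications += 1
    return divisions, multiplications


def back_substitution_counts(n):
    """(divisions, multiplications) implied by the sums in problem 17."""
    divisions = multiplications = 0
    for k in range(n, 0, -1):              # rows n, n-1, ..., 1
        multiplications += n - k           # entries to the right of the diagonal
        divisions += 1                     # divide by the diagonal entry
    return divisions, multiplications


for n in (2, 5, 10):
    print(n, gaussian_elimination_counts(n), back_substitution_counts(n))
```

For each $n$ the counts agree with $(n^2 - n)/2$ and $(n^3 - n)/3$ for elimination, and with $n$ and $n(n - 1)/2$ for back substitution.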

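20. Consider the multiplier matrix at stage $r$ (3.11):
\[
M^{(r)} = \begin{pmatrix}
I_{r-1} & 0 & 0 \cdots 0 \\
0 & 1 & 0 \cdots 0 \\
0 & -m_{r+1,r} & \\
\vdots & \vdots & I_{n-r} \\
0 & -m_{n,r} &
\end{pmatrix}.
\]
Let $B = (M^{(r)})^{-1}$. We then have
\[
\begin{pmatrix}
I_{r-1} & 0 & 0 \cdots 0 \\
0 & 1 & 0 \cdots 0 \\
0 & -m_{r+1,r} & \\
\vdots & \vdots & I_{n-r} \\
0 & -m_{n,r} &
\end{pmatrix}
\cdot
\begin{pmatrix}
B_{r-1,r-1} & b_{r-1,r} & b_{r-1,r+1} \cdots b_{r-1,n} \\
b_{r,r-1} & b_{r,r} & b_{r,r+1} \cdots b_{r,n} \\
b_{r+1,r-1} & b_{r+1,r} & \\
\vdots & \vdots & B_{n-r,n-r} \\
b_{n,r-1} & b_{n,r} &
\end{pmatrix}
=
\begin{pmatrix}
I_{r-1} & 0 & 0 \cdots 0 \\
0 & 1 & 0 \cdots 0 \\
0 & 0 & \\
\vdots & \vdots & I_{n-r} \\
0 & 0 &
\end{pmatrix}.
\]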

Block-wise multiplication immediately yields the only non-zero elements in the $B$ matrix as $B_{r-1,r-1} = I_{r-1}$, $b_{r,r} = 1$, $B_{n-r,n-r} = I_{n-r}$ and $[b_{r+1,r}, \ldots, b_{n,r}]^T = [m_{r+1,r}, \ldots, m_{n,r}]^T$.

21. $M^{(r)}$ is the identity matrix except in the $r$-th column. Furthermore, in multiplication of a matrix $A$ on the left by a matrix $M$, the $i$-th row, $r$-th column of $M$ specifies what multiple of the $r$-th row of $A$ is to be in the linear combination of rows representing the $i$-th row of the result. However, the $r$-th column of $M^{(r)}$ contains precisely minus the multiplying factors for the $r$-th step of Gaussian elimination, and the $i$-th column (for $i \ne r$) contains the $i$-th column of the identity matrix, so applying $M^{(r)}$ translates to "replace the $i$-th row by it minus the multiplying factor times the $r$-th row, for $i > r$."

22. This follows from the results of problems 20 and 21.

23. Factoring the matrix requires $\frac{1}{3} n^3 + O(n^2)$ multiplications and divisions, and a similar number of additions and subtractions. Then, to find the $i$-th column of the inverse, we apply forward substitution and back-substitution to the $i$-th column of the identity matrix, that is, we solve $L y = e_i$, then solve $U x = y$. Since the first non-zero element of $e_i$ is the $i$-th one, solution of $L y = e_i$ is equivalent to solving $\tilde{L} \tilde{y} = e_{1,n-i}$, where $\tilde{L} \in \mathbb{R}^{(n-i+1) \times (n-i+1)}$ is the lower right part of $L$. This requires
\[ \sum_{j=1}^{n-i} (n - i + 1 - j) = \sum_{k=1}^{n-i} k = \frac{(n - i)(n - i + 1)}{2} \]
multiplications. Thus, doing the forward substitution requires
\[ \sum_{i=1}^{n} \frac{(n - i)(n - i + 1)}{2} = \sum_{j=1}^{n-1} \frac{j(j + 1)}{2} = \frac{1}{2} \sum_{j=1}^{n-1} j^2 + \frac{1}{2} \sum_{j=1}^{n-1} j = \frac{1}{6} n^3 + O(n^2). \]
The back-substitution also can be economized, since if $y_i$ is the vector with $L y_i = e_i$, the first through $(i-1)$-st elements of $y_i$ are zero; in other words, the upper left corner of $U$ need not be processed. Thus, the number of multiplications to back-substitute $y_i$ is
\[
\begin{aligned}
\sum_{j=i}^{n} (j - 1) + \sum_{j=1}^{i-1} (i - 1) &= \sum_{k=i-1}^{n-1} k + (i - 1)^2 \\
&= \sum_{k=1}^{n-1} k - \sum_{k=1}^{i-2} k + (i - 1)^2 \\
&= \frac{(n - 1)n}{2} - \frac{(i - 2)(i - 1)}{2} + (i - 1)^2 \\
&= \frac{i^2}{2} + \frac{n^2}{2} + O(n).
\end{aligned}
\]
Thus, the total number of operations for the $n$ back-substitutions is
\[ \sum_{i=1}^{n} \left( \frac{n^2}{2} + \frac{i^2}{2} + O(n) \right) = \frac{2}{3} n^3 + O(n^2). \]
Adding the operations for the forward substitutions and the back substitutions gives $\frac{5}{6} n^3 + O(n^2)$. Adding this to the $\frac{1}{3} n^3$ operations required to do the factorization of the matrix gives
\[ \frac{7}{6} n^3 + O(n^2) \]
multiplications to compute the inverse. Note the error in the question. Also, since the meaning of "operation" is not unique when talking about modern computers, the request could cause some confusion. A suggested reformulation of the question is in the errata.

24. The $A$, $B$, $C$, and $E$ are $m \times m$ matrices. Thus, to carry through the computation at the top of page 121 requires $n - 1$ inversions of an $m \times m$ matrix, $2(n - 1)$ multiplications of $m \times m$ matrices, and $n - 1$ subtractions of $m \times m$ matrices. Instead of the inversion and multiplication to form the $E_i$, we may do the factorization (requiring $\frac{1}{3} m^3 + O(m^2)$ operations) and $m$ forward substitutions and $m$ back-substitutions. The counts are done as in problems 17 and 23.

25. Note that $\|Ux\|_2^2 = (Ux, Ux) = (Ux)^H Ux = x^H U^H U x = x^H I x = (x, x) = \|x\|_2^2$, so $\|Ux\|_2 = \|x\|_2$.

26. $\kappa_2(U) = \|U\|_2 \|U^{-1}\|_2 = \sqrt{\rho(U^H U)} \sqrt{\rho(U^{-H} U^{-1})}$. Since $U$ is unitary, $U^H U = I$ and $U^{-H} U^{-1} = (U U^H)^{-1} = I$. Hence, we have $\kappa_2(U) = \sqrt{\rho(I)} \sqrt{\rho(I)} = 1$.

27. (a) Note that
\[ A^{-1} = \frac{1}{3} \begin{pmatrix} -2 & -4 & 3 \\ -2 & 11 & -6 \\ 3 & -6 & 3 \end{pmatrix}. \]
Hence, $\|A\|_\infty = 25$ and $\|A^{-1}\|_\infty = \frac{19}{3}$, so $\kappa_\infty(A) \approx 158.333$.

(b) Using floating point arithmetic with $\beta = 10$ and $t = 3$ and Algorithms 3.1 and 3.2, we get $x_3 = 0$, $x_2 = -1.33$, $x_1 = 1.66$.

(c) (This part is lengthy and tedious if done as indicated, but should provide insight into the process. A less lengthy and tedious variant would be to use intlab, but the relationship between conditioning and width of solution bounds would be more difficult to see using double precision arithmetic.) Using the forward substitution then back substitution as in the note on page 110 (but with three-digit rounding to nearest), we obtain
\[ Y = \begin{pmatrix} -0.670 \times 10^{0} & -0.134 \times 10^{1} & 0.100 \times 10^{1} \\ -0.667 \times 10^{0} & 0.367 \times 10^{1} & -0.200 \times 10^{1} \\ 0.100 \times 10^{1} & -0.200 \times 10^{1} & 0.100 \times 10^{1} \end{pmatrix}. \]
Computing $YA$ and $Yb$ with outwardly rounded interval arithmetic now gives
\[ YA \subseteq \begin{pmatrix} 0.970 & -0.04 & [-0.1, 0] \\ [-0.1, 0] & [0.9, 1.1] & [-0.1, 0.1] \\ 0 & 0 & 1 \end{pmatrix}, \quad Yb \subseteq \begin{pmatrix} 1.67 \\ [-1.34, -1.33] \\ 0 \end{pmatrix}, \]
where point values represent intervals of zero width (i.e. no roundoff error occurred). We now proceed with step 3 of Algorithm 3.5:

k = 1: For i = 2,
\[
\begin{aligned}
m_{1,2} &\leftarrow [-0.1, 0]/0.97 \subseteq [-0.104, 0], \\
\tilde{a}_{2,2} &\leftarrow [0.9, 1.1] - [-0.104, 0](-0.04) \subseteq [0.895, 1.1], \\
\tilde{a}_{2,3} &\leftarrow [-0.1, 0.1] - [-0.104, 0][-0.1, 0] \subseteq [-0.111, 0.1], \\
\tilde{b}_2 &\leftarrow [-1.34, -1.33] - [-0.104, 0](1.67) \subseteq [-1.34, -1.15].
\end{aligned}
\]
For i = 3,
\[ m_{1,3} \leftarrow 0, \quad \tilde{a}_{3,2} \leftarrow 0, \quad \tilde{a}_{3,3} \leftarrow 1, \quad \tilde{b}_3 \leftarrow 0. \]
At this point, we have
\[ \tilde{A} \subseteq \begin{pmatrix} 0.970 & -0.04 & [-0.1, 0] \\ 0 & [0.895, 1.1] & [-0.111, 0.1] \\ 0 & 0 & 1 \end{pmatrix}, \quad \tilde{b} \subseteq \begin{pmatrix} 1.67 \\ [-1.34, -1.15] \\ 0 \end{pmatrix}. \]
k = 2: For i = 3, we have $m_{2,3} \leftarrow 0$, and $\tilde{A}$ and $\tilde{b}$ do not change.

We now proceed with steps 4 and 5 (the back-substitution process):
\[
\begin{aligned}
x_3 &\leftarrow 0, \\
x_2 &\leftarrow \frac{[-1.34, -1.15] - 0 \cdot [-0.111, 0.1]}{[0.895, 1.1]} \subseteq [-1.50, -1.04], \\
x_1 &\leftarrow \frac{1.67 + 0.04 \cdot [-1.50, -1.04]}{0.97} \subseteq [1.65, 1.68].
\end{aligned}
\]

(d) The exact solution (via elementary row operations) is $x = \left( \frac{5}{3}, -\frac{4}{3}, 0 \right)^T$.

(e) Since the machine epsilon is $\epsilon_m = 5 \times 10^{-3}$ and the condition number is $\kappa \approx 1.58 \times 10^{2}$, reasoning roughly, we would expect the relative error in the solution to be on the order of $\kappa\,\epsilon_m$; that is, we would expect at most about one digit to be correct. We observe that the interval enclosure contains the exact solution (it must, if both are correct), and that it also happens to contain the approximate solution. Also, the widths of the interval enclosure reflect the expected relative error in the solution, based on the condition number, although the approximate solution happens to be much more accurate in this case.

28. $\varphi(x) = \frac{1}{2} \|Ax - b\|_2^2 = \frac{1}{2} (Ax - b, Ax - b)$. Expanding the inner product yields
\[ \varphi(x) = \frac{1}{2} \left[ (Ax)^T (Ax) - b^T (Ax) - (Ax)^T b + b^T b \right]. \]
For a minimum, we require that $\nabla \varphi(x) = 0$, which then yields
\[ \frac{1}{2} \left[ 2 A^T A x - 2 A^T b \right] = 0, \]
which in turn leads to the normal equations.

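29. (a) Let $A_1 = (1, 1, 1, 1)^T$. Then
\[ v = \frac{A_1}{\|A_1\|_\infty} = [1\ 1\ 1\ 1]^T. \]
Let $a_1 = A_1(1)$. Then
\[ \tilde{\sigma} = \mathrm{sgn}(a_1) \frac{\|A_1\|_2}{\|A_1\|_\infty} = 2. \]
From here, compute $u = (u_1, u_2, u_3, u_4)^T$, where $u_1 = a_1 + \tilde{\sigma} = 3$, $u_2 = A_1(2) = 1$, $u_3 = A_1(3) = 1$, and $u_4 = A_1(4) = 1$. Also, $\theta = \tilde{\sigma} u_1 = 6$ and $\sigma = \tilde{\sigma} \|A_1\|_\infty = 2$. We can then compute
\[ U_1 = I - \frac{u u^T}{\theta} = \begin{pmatrix} -1/2 & -1/2 & -1/2 & -1/2 \\ -1/2 & 5/6 & -1/6 & -1/6 \\ -1/2 & -1/6 & 5/6 & -1/6 \\ -1/2 & -1/6 & -1/6 & 5/6 \end{pmatrix}. \]
Repeating the same procedure for $A_2 = (0, 1, 2)^T$, we obtain
\[ U_2 \approx \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -0.4472 & -0.8944 \\ 0 & -0.4472 & 0.8 & -0.4 \\ 0 & -0.8944 & -0.4 & 0.2 \end{pmatrix}. \]
Then,
\[ R \approx \begin{pmatrix} -2 & -3 \\ 0 & -2.2361 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \]
and
\[ Q = \begin{pmatrix} -0.5000 & 0.6708 & 0.0236 & 0.5472 \\ -0.5000 & 0.2236 & -0.4393 & -0.7120 \\ -0.5000 & -0.2236 & 0.8079 & -0.2176 \\ -0.5000 & -0.6708 & -0.3921 & 0.3824 \end{pmatrix}. \]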
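The Householder computation of part (a) can be replayed numerically. The sketch below is not Algorithm code from the book: it omits the scaling by $\|A_1\|_\infty$ (which does not change the reflector), it treats sgn(0) as +1, and it takes $A$ to have rows (1, 0), (1, 1), (1, 2), (1, 3), a matrix inferred from the intermediate results shown in parts (a) and (b) rather than quoted from the problem statement.

```python
import math


def householder_r(a):
    """Apply Householder reflections I - u u^T / theta column by column.

    `a` is a list of rows; the returned list of rows holds R (entries below
    the diagonal are reduced to zero up to rounding).
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    for k in range(min(m - 1, n)):
        col = [r[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue
        sigma = norm if col[0] >= 0 else -norm   # sgn(0) taken as +1
        u = col[:]
        u[0] += sigma
        theta = sigma * u[0]                     # equals u^T u / 2
        for j in range(k, n):
            dot = sum(u[i] * r[k + i][j] for i in range(m - k))
            for i in range(m - k):
                r[k + i][j] -= u[i] * dot / theta
    return r


A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # inferred, see lead-in
for row in householder_r(A):
    print([round(v, 4) for v in row])
```

With these choices the first reflection uses $u = (3, 1, 1, 1)^T$ and $\theta = 6$, as in the solution, and the resulting $R$ has rows $(-2, -3)$ and $(0, -\sqrt{5})$.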

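(b)
\[
A_1 = P_{21} A = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 & 0 \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot A = \begin{pmatrix} \sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} \\ 1 & 2 \\ 1 & 3 \end{pmatrix},
\]
\[
A_2 = P_{31} A_1 = \begin{pmatrix} \sqrt{2}/\sqrt{3} & 0 & 1/\sqrt{3} & 0 \\ 0 & 1 & 0 & 0 \\ -1/\sqrt{3} & 0 & \sqrt{2}/\sqrt{3} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot A_1 = \begin{pmatrix} \sqrt{3} & \sqrt{3} \\ 0 & 1/\sqrt{2} \\ 0 & \sqrt{6}/2 \\ 1 & 3 \end{pmatrix},
\]
\[
A_3 = P_{41} A_2 = \begin{pmatrix} \sqrt{3}/2 & 0 & 0 & 1/2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1/2 & 0 & 0 & \sqrt{3}/2 \end{pmatrix} \cdot A_2 = \begin{pmatrix} 2 & 3 \\ 0 & 1/\sqrt{2} \\ 0 & \sqrt{6}/2 \\ 0 & \sqrt{3} \end{pmatrix},
\]
\[
A_4 = P_{32} A_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1/2 & \sqrt{3}/2 & 0 \\ 0 & -\sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot A_3 = \begin{pmatrix} 2 & 3 \\ 0 & \sqrt{2} \\ 0 & 0 \\ 0 & \sqrt{3} \end{pmatrix},
\]
\[
A_5 = P_{42} A_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{2}/\sqrt{5} & 0 & \sqrt{3}/\sqrt{5} \\ 0 & 0 & 1 & 0 \\ 0 & -\sqrt{3}/\sqrt{5} & 0 & \sqrt{2}/\sqrt{5} \end{pmatrix} \cdot A_4 = \begin{pmatrix} 2 & 3 \\ 0 & \sqrt{5} \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
This gives
\[
Q = (P_{42} \cdot P_{32} \cdot P_{41} \cdot P_{31} \cdot P_{21})^{-1} \approx \begin{pmatrix} 0.5000 & -0.6708 & 0.4082 & 0.3651 \\ 0.5000 & -0.2236 & -0.8165 & -0.1826 \\ 0.5000 & 0.2236 & 0.4082 & -0.7303 \\ 0.5000 & 0.6708 & 0 & 0.5477 \end{pmatrix}
\quad \text{and} \quad
R \approx \begin{pmatrix} 2 & 3 \\ 0 & 2.2361 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]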


(c) In matlab, [Q,R] = qr(A) yields
\[ R \approx \begin{pmatrix} -2 & -3 \\ 0 & -2.2361 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \]
and
\[ Q \approx \begin{pmatrix} -0.5000 & 0.6708 & 0.0236 & 0.5472 \\ -0.5000 & 0.2236 & -0.4393 & -0.7120 \\ -0.5000 & -0.2236 & 0.8079 & -0.2176 \\ -0.5000 & -0.6708 & -0.3921 & 0.3824 \end{pmatrix}. \]
This is the same result as in part (a) but differs from the one obtained in part (b).

(d) Using the $R$ and $Q$ in part (c) (or part (a)) along with $R(x_1, x_2, 0, 0)^T = Q^T b$ where $x = (x_1, x_2)^T$ yields $x \approx (1.2001, 2.1999)^T$.

(e) Using the $R$ and $Q$ in part (b) along with $Rx = Q^T b$ yields $x = (1.2001, 2.1999)^T$. Note that, even though the QR factorizations were distinct, the least squares solution to $Ax = b$ is the same.

30. (a) $A = LU$, where
\[ L = \begin{pmatrix} 1/3 & 4/11 & 1 \\ 2/3 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} 6 & -5 & 8 \\ 0 & 22/3 & -13/3 \\ 0 & 0 & -1/11 \end{pmatrix}. \]

(b) Solving $Ly = b$ yields $y = (15, -3, 1/11)^T$. Then, solving $Ux = y$ yields $x = (3, -1, -1)^T$.
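The two triangular solves of problem 30 can be verified in exact rational arithmetic. The sketch below uses Python's `fractions` module; note that the right-hand side $b = (4, 7, 15)^T$ is not stated in this solution and is reconstructed here as $L y$ from the reported $y$, so it should be read as an assumption, as should the helper names.

```python
from fractions import Fraction as F

L = [[F(1, 3), F(4, 11), F(1)],
     [F(2, 3), F(1), F(0)],
     [F(1), F(0), F(0)]]
U = [[F(6), F(-5), F(8)],
     [F(0), F(22, 3), F(-13, 3)],
     [F(0), F(0), F(-1, 11)]]
b = [F(4), F(7), F(15)]  # assumption: reconstructed as L*y, see lead-in


def matmul(a, c):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * c[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve_permuted_lower(lower, rhs):
    """Solve L y = rhs where the rows of L, read bottom to top, are lower triangular."""
    n = len(rhs)
    y = [F(0)] * n
    for step, row in enumerate(reversed(range(n))):
        s = rhs[row] - sum(lower[row][j] * y[j] for j in range(step))
        y[step] = s / lower[row][step]
    return y


def back_substitute(upper, rhs):
    """Solve U x = rhs for an upper triangular U."""
    n = len(rhs)
    x = [F(0)] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / upper[i][i]
    return x


y = solve_permuted_lower(L, b)
x = back_substitute(U, y)
print(matmul(L, U))
print(y, x)
```

Because the row-permuted factor $L$ is only triangular after reordering its rows, the forward solve processes the rows from the bottom up.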

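31. Since $A = LU$, we have
\[ \begin{pmatrix} C & B^T \\ B & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ X & I \end{pmatrix} \begin{pmatrix} C & Y \\ 0 & Z \end{pmatrix} = \begin{pmatrix} C & Y \\ XC & XY + Z \end{pmatrix}. \]
Then, $B^T = Y$, $B = XC$ and $XY + Z = 0$. Solving these three equations, we obtain $X = BC^{-1}$, $Y = B^T$ and $Z = -BC^{-1}B^T$.

32. An answer is provided in the appendix to the book.

33.
\[ A = LL^T = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}^T, \quad \text{where} \quad L = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 2 & 3 & 4 \end{pmatrix}. \]
$A$ is positive definite because its Cholesky factorization exists.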

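34. (a) Since $\|A^{-1}\Delta A\| < \delta\|A^{-1}\|\,\|A\| = r < 1$, it follows that $A + \Delta A = A(I + A^{-1}\Delta A)$ is non-singular.

(b) Since $(A + \Delta A)y = b + \Delta b$, we have $A^{-1}(A + \Delta A)y = A^{-1}(b + \Delta b)$. Since $Ax = b$, we then have $(I + A^{-1}\Delta A)y = x + A^{-1}\Delta b$. Then:

$$\begin{aligned} \|y\| &\le \|(I + A^{-1}\Delta A)^{-1}\| \left( \|x\| + \delta\|A^{-1}\|\,\|b\| \right)\\ &\le \frac{1}{1 - \|{-A^{-1}\Delta A}\|} \left( \|x\| + \delta\|A^{-1}\|\,\|b\| \right)\\ &\le \frac{1}{1 - r} \left( \|x\| + \delta\|A^{-1}\|\,\|b\| \right)\\ &= \frac{1}{1 - r} \left( \|x\| + r\,\frac{\|b\|}{\|A\|} \right). \end{aligned}$$

The proof is completed by noting that $\|b\| = \|Ax\| \le \|A\|\,\|x\|$.

35. Since $A = \begin{pmatrix} 0.1\alpha & 0.1\alpha\\ 1.0 & 1.5 \end{pmatrix}$, we have $A^{-1} = \dfrac{1}{0.05\alpha} \begin{pmatrix} 1.5 & -0.1\alpha\\ -1.0 & 0.1\alpha \end{pmatrix}$. Then, we have $\|A\|_\infty = \max\{0.2|\alpha|, 2.5\}$ and

$$\|A^{-1}\|_\infty = \max\left\{ \frac{1.5 + 0.1|\alpha|}{0.05|\alpha|}, \frac{1.0 + 0.1|\alpha|}{0.05|\alpha|} \right\} = \max\left\{ \frac{30}{|\alpha|} + 2, \frac{20}{|\alpha|} + 2 \right\} = \frac{30}{|\alpha|} + 2.$$

Hence,

$$\kappa(A) = \max\{0.2|\alpha|, 2.5\} \left( \frac{30}{|\alpha|} + 2 \right) = \max\left\{ 0.4|\alpha| + 6, \frac{75}{|\alpha|} + 5 \right\}.$$

The first expression increases with $|\alpha|$ and the second decreases, so for $\kappa(A)$ to be minimum we require that $0.4|\alpha| + 6 = \frac{75}{|\alpha|} + 5$, which yields $|\alpha| = 12.5$, e.g. $\alpha = 12.5$, with $\kappa(A) = 11$.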
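The minimizing value in problem 35 can be confirmed by a direct scan. The sketch below, in plain Python, evaluates $\kappa_\infty(A)$ from the explicit $2 \times 2$ inverse on a grid of positive $\alpha$; the grid range and spacing are arbitrary choices for illustration.

```python
def kappa_inf(alpha):
    """Infinity-norm condition number of A = [[0.1a, 0.1a], [1.0, 1.5]]."""
    A = [[0.1 * alpha, 0.1 * alpha], [1.0, 1.5]]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    Ainv = [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

    def norm(M):
        # maximum absolute row sum
        return max(sum(abs(v) for v in row) for row in M)

    return norm(A) * norm(Ainv)

grid = [k / 100 for k in range(100, 3001)]   # alpha from 1.00 to 30.00
best = min(grid, key=kappa_inf)
print(best, kappa_inf(best))
```

On this grid the smallest condition number occurs where the two branches of the maximum cross.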

36. $\|A\|_\infty = 2$, $\|A^{-1}\|_\infty = 4$. Hence $\kappa(A) = 8$.

37. An answer is provided in the appendix to the book.

38. (a) Multiplying the matrices yields $\alpha_1 = a_1$, which implies $\alpha_1 \ne 0$. Also note that

$$|\alpha_2| = |a_2 - \gamma_1 b_2| \ge |a_2| - |\gamma_1|\,|b_2| \ge |a_2| - |b_2| > |c_2| > 0,$$

which implies $\alpha_2 \ne 0$. Similarly,

$$|\alpha_3| = |a_3 - \gamma_2 b_3| \ge |a_3| - |\gamma_2|\,|b_3| \ge |a_3| - |b_3| > 0,$$

which implies $\alpha_3 \ne 0$. Note that we have used $|\gamma_i| < 1$ for $i = 1, 2$, which will be proven in part (b).

(b) Multiplying the matrices yields $\gamma_1 = c_1 / a_1$, which implies $|\gamma_1| < 1$. Note that

$$|\gamma_2| = \frac{|c_2|}{|\alpha_2|} = \frac{|c_2|}{|a_2 - b_2\gamma_1|} \le \frac{|c_2|}{|a_2| - |b_2|} < 1.$$

This implies that $|\gamma_2| < 1$.

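39. (a) Since the inverse of a lower triangular matrix is a lower triangular matrix, we have

$$\begin{pmatrix} \frac12 & 0 & 0 & 0\\ \frac14 & \frac12 & 0 & 0\\ \frac18 & \frac14 & \frac12 & 0\\ \frac1{16} & \frac18 & \frac14 & \frac12 \end{pmatrix} \begin{pmatrix} a & 0 & 0 & 0\\ b & c & 0 & 0\\ d & e & f & 0\\ g & h & i & j \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Therefore,

$$A^{-1} = \begin{pmatrix} 2 & 0 & 0 & 0\\ -1 & 2 & 0 & 0\\ 0 & -1 & 2 & 0\\ 0 & 0 & -1 & 2 \end{pmatrix}.$$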

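(b) $\kappa(A) = \|A\|_\infty \|A^{-1}\|_\infty = 2.8125$.

(c) $A(u - \tilde{u}) = b - \tilde{b}$, which yields $\|u - \tilde{u}\|_\infty = \|A^{-1}(b - \tilde{b})\|_\infty = 0.02$. Note that we have used $A^{-1}$ from part (a) and the vectors $b$ and $\tilde{b}$ given. We could also estimate the norm from

$$\frac{\|u - \tilde{u}\|_\infty}{\|u\|_\infty} \le \kappa_\infty(A)\,\frac{\|b - \tilde{b}\|_\infty}{\|b\|_\infty} = (2.8125) \cdot (0.01).$$

40. Let

$$x = \begin{pmatrix} 3\\ 0\\ 4 \end{pmatrix}, \quad y = \begin{pmatrix} -5\\ 0\\ 0 \end{pmatrix} \quad\text{and}\quad w = \frac{x - y}{\|x - y\|_2} \approx \begin{pmatrix} 0.8944\\ 0\\ 0.4472 \end{pmatrix}.$$

Then the Householder matrix is given by

$$H = I - 2ww^T = \begin{pmatrix} -0.6000 & 0 & -0.8000\\ 0 & 1.0000 & 0\\ -0.8000 & 0 & 0.6000 \end{pmatrix}.$$

41. (a) For the given algorithm, we have the following:

Number of Divisions:

$$\sum_{k=1}^{n} \sum_{i=k+1}^{n} 1 = \frac{n^2 - n}{2}.$$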

Number of Multiplications:

$$\sum_{k=1}^{n} \sum_{j=k}^{n} \sum_{s=1}^{k-1} 1 + \sum_{k=1}^{n} \sum_{j=k+1}^{n} \sum_{s=1}^{k-1} 1 = \frac{2n^3 - 3n^2 + n}{6}.$$

Adding the results yields the total number of multiplications and divisions for LU factorization: $\dfrac{n^3 - n}{3}$.

Number of Subtractions:

$$\sum_{k=1}^{n} \sum_{i=k+1}^{n} 1 + \sum_{k=1}^{n} \sum_{j=k}^{n} 1 = n^2.$$

Number of Additions:

$$\sum_{k=1}^{n} \sum_{j=k}^{n} \sum_{s=1}^{k-2} 1 + \sum_{k=1}^{n} \sum_{j=k+1}^{n} \sum_{s=1}^{k-2} 1 = \frac{2n^3 - 9n^2 + n}{6}.$$

Adding the results yields the total number of subtractions and additions for LU factorization: $\dfrac{n^3}{3} - \dfrac{n^2}{2} + \dfrac{n}{6}$.

(b) For solving $Ly = b$ using a forward substitution algorithm (with $l_{ii} = 1$) given by:

For i = 1 to n do
    $x_i = b_i - \sum_{j=1}^{i-1} l_{ij} x_j$
End do

The number of divisions: 0.

The number of multiplications: $\sum_{i=1}^{n} (i - 1) = \dfrac{n^2}{2} - \dfrac{n}{2}$.

The number of additions: $\sum_{i=1}^{n} (i - 2) = \dfrac{n^2 - 3n}{2}$,

and subtractions: $\sum_{i=1}^{n} 1 = n$.

(c) We now consider solving $Ux = y$ using the back substitution algorithm (with $l_{ii} = 1$), given by

For i = n to 1 (step: -1) do
    $x_i = \left( b_i - \sum_{j=i+1}^{n} u_{ij} x_j \right) / u_{ii}$
End do

The number of divisions is 0,

the number of multiplications is $\sum_{i=1}^{n} (i - 1) = \dfrac{n^2}{2} - \dfrac{n}{2}$,

the number of additions is $\sum_{i=1}^{n} (i - 2) = \dfrac{n^2 - 3n}{2}$, and

the number of subtractions is $\sum_{i=1}^{n} 1 = n$.
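The closed forms for the division, multiplication, and subtraction counts in part (a) can be spot-checked by evaluating the nested sums term by term. The following is a sketch in plain Python; the function name `lu_counts` is hypothetical, and the loops mirror the index ranges of the sums in part (a) rather than any particular LU code.

```python
def lu_counts(n):
    """Evaluate the operation-count sums of part (a) by brute force."""
    ks = range(1, n + 1)
    divisions = sum(1 for k in ks for i in range(k + 1, n + 1))
    multiplications = (
        sum(1 for k in ks for j in range(k, n + 1) for s in range(1, k))
        + sum(1 for k in ks for j in range(k + 1, n + 1) for s in range(1, k))
    )
    subtractions = (
        sum(1 for k in ks for i in range(k + 1, n + 1))
        + sum(1 for k in ks for j in range(k, n + 1))
    )
    return divisions, multiplications, subtractions

for n in (2, 5, 10):
    print(n, lu_counts(n))
```

Comparing the tuples with $(n^2 - n)/2$, $(2n^3 - 3n^2 + n)/6$ and $n^2$ for several $n$ gives a quick check of the algebra, including the combined count $(n^3 - n)/3$.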

(d) Since the leading term $\frac{1}{3} n^3$ in the factorization is the same in both cases, the difference is not significant for large n. Similarly, the leading term, $\frac{1}{2} n^2$, is the same in both cases for the forward-substitution and back-substitution.

42. (a) The eigenvalues of J are approximately $0.3737 \pm 0.8674i$ and $-0.7474$. Hence $\rho(J) \approx 0.9444 < 1$. The eigenvalues of G are 0, 0, and $-1$, so $\rho(G) = 1$.

(b) Starting with $x^{(0)} = [1\ 1\ 1]^T$, the Gauss-Seidel Method diverges. In particular the iterates $x^{(k+1)} = -(L + D)^{-1} U x^{(k)} + (L + D)^{-1} b$ are $(-1, -1, -1)^T$ for k even and $(1, 1, 1)^T$ for k odd.

43. We may use Gauss_Seidel_image from the Moore / Kearfott / Cloud book. (See http://interval.louisiana.edu/Classical-and-Modern-NA/) This results in the following matlab dialog.

>> A = [rigorinfsup('0.99','1.01') rigorinfsup('1.99','2.01')
rigorinfsup('2.99','3.01') rigorinfsup('3.99','4.01')]
intval A =
[ 0.9899, 1.0101] [ 1.9899, 2.0101]
[ 2.9899, 3.0101] [ 3.9899, 4.0101]
>> b = [rigorinfsup('-1.01','-0.99');rigorinfsup('0.99','1.01')]
intval b =
[ -1.0101, -0.9899]
[ 0.9899, 1.0101]
>> x0 = [infsup(-10,10);infsup(-10,10)]
intval x0 =
[ -10.0000, 10.0000]
[ -10.0000, 10.0000]
>> [x1,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x0)
intval x1 =
[ 2.5922, 3.4330]
[ -2.2654, -1.7450]
is_empty = 0
error_occurred = 0
>> [x2,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x1)
intval x2 =
[ 2.8175, 3.1938]
[ -2.1313, -1.8738]
is_empty = 0
error_occurred = 0
>> [x3,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x2)
intval x3 =
[ 2.8214, 3.1897]
[ -2.1265, -1.8785]
is_empty = 0
error_occurred = 0
>> [x4,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x3)
intval x4 =
[ 2.8215, 3.1895]
[ -2.1264, -1.8786]
is_empty = 0
error_occurred = 0
>>

The instructor may either have students gain experience with intlab by doing the above computation, or have students gain more detailed insight into the algorithm by having them carry out the computation by hand.

44. The example claims that A and B are irreducible and C is reducible. First, to show C is reducible, we need non-empty disjoint subsets S and T such that $S \cup T = \{1, 2, 3\}$ and $i \in S$ and $j \in T$ implies $c_{ij} = 0$. To do so, let $S = \{2\}$ and $T = \{1, 3\}$. Then $S \cup T = \{1, 2, 3\}$, S and T are nonempty and disjoint, and $c_{21} = 0$ and $c_{23} = 0$. Hence, C is reducible. Next, to show A is irreducible, one has to show that there do not exist non-empty disjoint subsets S and T such that $S \cup T = \{1, 2, 3, 4\}$ and $i \in S$ and $j \in T$ implies $a_{ij} = 0$. Consider the possibility that $S = \{1\}$ and $T = \{2, 3, 4\}$. Then $S \cup T = \{1, 2, 3, 4\}$ and $S \cap T = \emptyset$. But $a_{23} \ne 0$. Similarly consider all the other 13 possibilities to check the irreducibility of A. One can repeat the same argument for B as well.

45. Follow the steps presented in the text on page 169.

46. We proceed with Gauss_Seidel_image as in problem 43, with the following matlab dialog.

>> A = [intval('3.333') intval('15920') intval('-10.333')
intval('2.222') intval('16.710') intval('9.612')
intval('1.5611') intval('5.1791') intval('1.6852')]
intval A =
1.0e+004 *
[ 0.0003, 0.0004] [ 1.5920, 1.5920] [ -0.0011, -0.0010]
[ 0.0002, 0.0003] [ 0.0016, 0.0017] [ 0.0009, 0.0010]
[ 0.0001, 0.0002] [ 0.0005, 0.0006] [ 0.0001, 0.0002]
>> b = [intval('15913');intval('28.544');intval('8.4254')]
intval b =
1.0e+004 *
[ 1.5913, 1.5913]
[ 0.0028, 0.0029]
[ 0.0008, 0.0009]
>> x0 = [infsup(-10,10);infsup(-10,10);infsup(-10,10)]
intval x0 =
[ -10.0000, 10.0000]
[ -10.0000, 10.0000]
[ -10.0000, 10.0000]
>> [x1,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x0)
intval x1 =
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
is_empty = 0
error_occurred = 0
>> format long
>> x1
intval x1 =
[ 0.99999999999873, 1.00000000000105]
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999682, 1.00000000000389]
>> [x2,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x1)
intval x2 =
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
is_empty = 0
error_occurred = 0
>> [x3,is_empty,error_occurred] = Gauss_Seidel_image(A,b,x2)
intval x3 =
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
is_empty = 0
error_occurred = 0
>>

These results are mathematically rigorous, since intval rounds decimal strings into machine intervals that contain the exact decimal string. (Note that we need to use, e.g., intval('3.333'), rather than intval(3.333) or simply 3.333. In the second case, the decimal string 3.333 is converted to (a necessarily inexact) binary format before being passed to intval, and intval simply fills both endpoints of the interval with that approximate number. In the third case, only an approximate number is stored. Such care does not need to be taken when the number being converted is an integer with a small number of digits or a fraction whose denominator is a small power of 2.)

Yes, we have proven that the system has a unique solution in $x^{(0)}$, since $x^{(1)}$ is contained in the interior of $x^{(0)}$. This is true even though Gauss_Seidel_image takes the intersection of the Gauss–Seidel sweep with the original box, since that intersection is equal to the image if it is contained in the interior of the original box. In this case, the bounds are very tight, even though the condition number is moderately high.

47. We provide the code interval_Gaussian_elimination.m, which is a straightforward transcription of Algorithm 3.5:

function [x] = interval_Gaussian_elimination(A, b)
n = length(b);
Y = inv(mid(A));
Atilde = Y*A;
btilde = Y*b;
error_occurred = 0;
for k=1:n
   for i=k+1:n
      m_ik = Atilde(i,k)/Atilde(k,k);
      for j=k+1:n
         Atilde(i,j) = Atilde(i,j) - m_ik*Atilde(k,j);
      end
      btilde(i) = btilde(i) - m_ik*btilde(k);
   end
end
x(n) = btilde(n)/Atilde(n,n);
for k=n-1:-1:1
   x(k) = btilde(k);
   for j=k+1:n
      x(k) = x(k) - Atilde(k,j)*x(j);
   end
   x(k) = x(k)/Atilde(k,k);
end
x = x';

We have the following matlab dialog.

>> A = [intval('3.333') intval('15920') intval('-10.333')
intval('2.222') intval('16.710') intval('9.612')
intval('1.5611') intval('5.1791') intval('1.6852')]
intval A =
1.0e+004 *
[ 0.0003, 0.0004] [ 1.5920, 1.5920] [ -0.0011, -0.0010]
[ 0.0002, 0.0003] [ 0.0016, 0.0017] [ 0.0009, 0.0010]
[ 0.0001, 0.0002] [ 0.0005, 0.0006] [ 0.0001, 0.0002]
>> b = [intval('15913');intval('28.544');intval('8.4254')]
intval b =
1.0e+004 *
[ 1.5913, 1.5913]
[ 0.0028, 0.0029]
[ 0.0008, 0.0009]
>> x = interval_Gaussian_elimination(A,b)
intval x =
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
>> x = interval_Gaussian_elimination(A,b)
intval x =
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
[ 0.9999, 1.0001]
>> format long
>> x
intval x =
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
[ 0.99999999999999, 1.00000000000001]
>>

We observe that the result is indistinguishable, in this case, from the result for the interval Gauss–Seidel method, if we iterate the interval Gauss–Seidel method.

48. An answer is provided in the appendix to the book.

49. An answer is provided in the appendix to the book.

50. An answer is provided in the appendix to the book.

51. Strict diagonal dominance implies $x^{(m)} \to x$ as $m \to \infty$, where $x = (D - L)^{-1} U x + (D - L)^{-1} b$, or $(D - L - U)x = b$, or $Ax = b$. Consider $(D - L)^{-1} = (I - D^{-1}L)^{-1} D^{-1}$. But $\rho(D^{-1}L) < 1$ since A is strictly diagonally dominant. Furthermore, $(I - D^{-1}L)^{-1} = I + D^{-1}L + (D^{-1}L)^2 + \ldots \ge 0$ since $D, L \ge 0$. Thus $(D - L)^{-1} \ge 0$. Therefore, $x^{(m+1)} = (D - L)^{-1} U x^{(m)} + (D - L)^{-1} b \ge 0$ for each m, because $x^{(0)} = 0$, $(D - L)^{-1} \ge 0$, $U \ge 0$ and $b \ge 0$. Thus, $x^{(m)} \to x \ge 0$ as $m \to \infty$.

52. An answer is provided in the appendix to the book.

53. Let v be an eigenvector of A corresponding to $\lambda$. Then $Av = (aI + L + U)v = av + (L + U)v = \lambda v$, so $(L + U)v = (\lambda - a)v$, i.e. v is also an eigenvector of $(L + U)$, with eigenvalue $\lambda - a$. However, since A is strictly diagonally dominant, $|L + U| < |D|$, and Lemma 3.8 (page 151) implies $\rho(L + U) < \rho(D) = a$. Thus, $|\lambda - a| < a$, and the result follows. (Note that Lemma 3.8 deals with non-strict inequalities. However, the proof still goes through if the inequalities are strict.)

54. Let $A = \begin{pmatrix} 3 & 2\\ 2 & 4 \end{pmatrix}$, $b = \begin{pmatrix} 7\\ 10 \end{pmatrix}$ and $x^{(0)} = \begin{pmatrix} 0\\ 0 \end{pmatrix}$. Then the residual is $r^{(0)} = b - Ax^{(0)} = \begin{pmatrix} 7\\ 10 \end{pmatrix} = v^{(1)}$. Then, we have:

$$t_1 = \frac{(r^{(0)}, r^{(0)})}{(v^{(1)}, Av^{(1)})} \approx 0.1802, \qquad x^{(1)} = x^{(0)} + t_1 v^{(1)} \approx \begin{pmatrix} 1.2612\\ 1.8017 \end{pmatrix},$$

$$r^{(1)} = r^{(0)} - t_1 Av^{(1)} \approx \begin{pmatrix} -0.3869\\ 0.2709 \end{pmatrix}, \qquad s_1 = \frac{(r^{(1)}, r^{(1)})}{(r^{(0)}, r^{(0)})} \approx 0.0015,$$

$$v^{(2)} = r^{(1)} + s_1 v^{(1)} \approx \begin{pmatrix} -0.3765\\ 0.2858 \end{pmatrix}, \qquad t_2 = \frac{(r^{(1)}, r^{(1)})}{(v^{(2)}, Av^{(2)})} \approx 0.6938.$$

Hence $x^{(2)} = x^{(1)} + t_2 v^{(2)} \approx \begin{pmatrix} 1\\ 2 \end{pmatrix}$.
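The two conjugate gradient steps of problem 54 can be reproduced with a few lines of plain Python. This is a minimal sketch that follows the update formulas for $t_k$, $r^{(k)}$, $s_k$ and $v^{(k+1)}$ used in the solution; the function and helper names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, x0, steps):
    """Run a fixed number of conjugate gradient steps starting from x0."""
    x = x0[:]
    r = [bi - ci for bi, ci in zip(b, matvec(A, x))]
    v = r[:]                                   # first search direction
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0.0:
            break                              # already converged
        Av = matvec(A, v)
        t = rr / dot(v, Av)                    # step length t_k
        x = [xi + t * vi for xi, vi in zip(x, v)]
        r_new = [ri - t * avi for ri, avi in zip(r, Av)]
        s = dot(r_new, r_new) / rr             # s_k
        v = [rn + s * vi for rn, vi in zip(r_new, v)]
        r = r_new
    return x

A = [[3.0, 2.0], [2.0, 4.0]]
b = [7.0, 10.0]
x1 = conjugate_gradient(A, b, [0.0, 0.0], 1)
x2 = conjugate_gradient(A, b, [0.0, 0.0], 2)
print(x1, x2)
```

For a $2 \times 2$ symmetric positive definite system, the method terminates (up to rounding) after two steps, which is why $x^{(2)}$ agrees with the exact solution.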

55. (a) Using matlab:

>> norm(eye(2)-A*F,inf)
ans = 0.3750

(It is also not hard to obtain this exactly by hand.)

(b) For the method in Theorem 3.24, q = 0.375, and

$$\|X_k - A^{-1}\|_\infty \le \frac{(0.375)^k}{1 - 0.375} \|X_1 - X_0\|_\infty = \frac{(0.375)^k}{0.625} \|F\|_\infty = \frac{(0.375)^k}{0.625} \cdot (6.625) = (10.6) \cdot (0.375)^k.$$

Solving $(10.6) \cdot (0.375)^k < 10^{-8}$ for k gives

$$k > \frac{\log\left( \dfrac{10^{-8}}{10.6} \right)}{\log(0.375)} \approx 21.19.$$

Thus, k = 22 will do.

For the method in Theorem 3.25 with $X_0 = F$, $\|R_0\|_\infty = \|F\|_\infty = 0.375 = q$, and

$$\|X_k - A^{-1}\|_\infty \le \frac{(0.375)^{2^k}}{1 - 0.375} \|F\|_\infty = (0.375)^{2^k}\,\frac{0.375}{0.625} = (0.6)(0.375)^{2^k}.$$

Solving $(0.6)(0.375)^{2^k} < 10^{-8}$ gives

$$k > \frac{\log\left( \dfrac{\log\left( 10^{-8} / 0.6 \right)}{\log(0.375)} \right)}{\log(2)} \approx 4.2.$$

Thus, k = 5 iterations will do.
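The iteration counts in part (b) can also be found by simply stepping k until each bound drops below the tolerance. The sketch below, in plain Python, takes the two bounds $(10.6)(0.375)^k$ and $(0.6)(0.375)^{2^k}$ exactly as stated above; the helper name `smallest_k` is hypothetical.

```python
def smallest_k(bound, tol=1e-8):
    """Smallest nonnegative integer k with bound(k) < tol."""
    k = 0
    while bound(k) >= tol:
        k += 1
    return k

# Bound for the linearly convergent method (Theorem 3.24 in the text).
k_linear = smallest_k(lambda k: 10.6 * 0.375 ** k)
# Bound for the quadratically convergent method (Theorem 3.25 in the text).
k_quadratic = smallest_k(lambda k: 0.6 * 0.375 ** (2 ** k))
print(k_linear, k_quadratic)
```

The contrast between the two counts reflects the exponent $k$ in the first bound against $2^k$ in the second.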

56. (a) q1 = α1 y1 + α2 y2 , and we have for j = 1: z ← Aq1 = λ1 α1 y1 + λ2 α2 y2 . q2 ←

 1 Aq1 − (q1T Aq1 )q1 . kAq1 − (q1T Aq1 )q1 k2

Thus, since kq1 k22 = 1, q2 is orthogonal to q1 , yet is of the form q2 = β1 y1 + β2 y2 . 54

for j = 2: We proceed as with j = 1 to conclude that, if z from part 2(b)(ii) were non-zero, z would be a linear combination of y1 and y2 , but would be orthogonal to both q1 and q2 . Therefore, Algorithm 3.7 must stop with j = 2. (b) We assume we mean y0 = b and xm is computed according the the Full Orthogonalization Method as on page 173. The result follows from part (a) and Proposition 3.12. 57. In all cases, A+ = V Σ+ U T , where A = U ΣV T , and Σ and Σ+ are as follows. m > n:

and



..

          Σ=          



σ1

       0   ..  .  0         

0

. σr

0

0



1/σ1

    + Σ =   

..

0

.

0

1/σr 0

..

0 . 0

    ,   

where r is the rank of A. Then, since V is orthogonal, kV T xk2 = kxk2 for every x ∈ Rm , and a solution to Ax = U ΣV T x = b corresponds to a solution to U Σy = b, where y = V T x Similarly, because U is orthogonal, kU Σy − bk2 = kΣy − U T bk2 , so a least squares solution to U Σy = b is the same as a least squares solution to Σy = U T b, However, kΣy − U T bk22 =

r m X X (σi yi − (U T b)i )2 + (U T b)2i i=1


i=r+1

is minimized when yi = (U T b)i /σi , 1 ≤ i ≤ r and yi arbitrary when i > r. Furthermore, since kyk22 =

r X i=1

yi2 +

n X

yi2 ,

i=r+1

kyi k2 is minimized over all least squares solutions when yi = 0 for i > r. However, this y is given by y = Σ+ U T b. Since x = V y, x = V Σ+ U T b = A+ b. Pm m ≤ n: APsimilar argument holds for m ≤ n. However, i=r+1 (U T b)2i n and i=r+1 yi2 may not have any terms, depending on whether or not A is of full rank. Pm • When i=r+1 (U T b)2i doesn’t have any terms, the minimum residual is 0, and least squares solutions are actual solutions. Pn 2 • When i=r+1 yi doesn’t have any terms, the solution (least squares or actual solution) is unique. 58. We obtain w ≈ (−0.6115, −1.2751)T in step 5 of Algorithm 3.9, which gets updated to w ≈ (−0.0363, −1.1935) in step 6. Finally, x is approximately computed from step 7 to be x ≈ (0.9444, 0.1111, −0.7222)T , which is the same as the x reported in Example 3.19. 59. Note that ˜ 2 = kU ΣV T − U ΣV ˜ T k2 = kU T (U ΣV T − U ΣV ˜ T )V k2 , kA − Ak since U and V are orthogonal. Hence we have ˜ 2 = |σn | = σn . kA − Ak Let us suppose on the contrary that there exists B ∈ L(Rn ) with rank(B) = ˜ 2 . Then there exists an (n − r) dir < n such that kA − Bk2 < kA − Ak n mensional subspace W ⊆ R such that ∀w ∈ W, Bw = 0 =⇒ Aw = (A − B)w. Thus, kAwk2 = k(A − B)wk2 ≤ kA − Bk2 kwk2 < σn kwk2 . Now let K be the (r + 1) dimensional subspace defined by K = span(V:,1 , V:,2 , . . . , V:,r+1 ). If k ∈ K, then one can show that kA kk2 = kU ΣV T kk22 ≥ σn kkk2 . Thus, K ⊆ Rn is (r + 1) dimensional, so dim(K) + dim(W ) = n + 1. Therefore, there exists a vector x ∈ Rn − {0} such that x ∈ W ∩ K. But this is a contradiction, since if such a vector existed, then σn kxk2 ≤ kAxk2 < σn kxk2 . ˜ 2 ≤ kA − Bk2 . Therefore, kA − Ak 56

60. κ2 (A) = σ1 /σn ≈ 88.4483. However, σ1 /σ2 ≈ 19.8963 < 25. We may thus ˜ T , where Σ ˜ is obtained from Σ by replacing form a new matrix A˜ = U ΣV σ3 by 0. Note that, from problem 59, this is equivalent to projecting A onto the set of singular matrices (where the projection is with respect to the 2-norm). We then determine x as x = A˜+ b. We obtain x ≈ (−0.6205, 0.0245, 0.4428)T . Thus, to within the accuracy of 1/25 = 4%, we can only determine that the solution lies along the line   −0.6205  0.0245  + y3 V:,3 , y3 ∈ R. 0.4428 Note: This is a common type of analysis in data fitting. The parameter y3 (or multiple parameters, in case of higher-order rank deficiency) needs to be chosen through other information available with the application. 61. This problem is straightforward with matlab’s svd function. The singular value decomposition of A is A = U ΣV T where   −0.5475 −0.1832 −0.8165 0.4082  , U ≈  −0.3231 −0.8538 −0.7719 0.4873 0.4082   4.0791 0 0 0.6005  , and S ≈  0 0   −0.4027 −0.9153 V ≈ . −0.9153 0.4027 62. Consider A(AT A)−1 AT

= =

−1 P DQ (P DQ)T (P DQ) (P DQ)T P D(DT D)−1 DT P T .

Hence, kA(AT A)−1 AT k2 = kP D(DT D)−1 DT P T k2 = kD(DT D)−1 DT k2 , since P is unitary. Since 

   D(DT D)−1 DT =   

we get kA(AT A)−1 AT k2 = 1.




In 0 0 ..

. 0

   ,  

Chapter 4

1. (a) The degree-3 Taylor polynomial approximation is

$$T_3(x) = x - \frac{1}{6}x^3 \approx 1.0000x - 0.1667x^3.$$

(b) We may either expand in terms of the Legendre polynomials or compute the Gram matrix (defined on page 198) with respect to the inner product

$$(f, g) = \int_{-1}^{1} f(x)g(x)\,dx.$$

However, since we wish to compare coefficients to the other fits, and since an order 4 Gram matrix should not be too ill conditioned, we will write down the system (4.1) (page 198) with $w_j = x^{j-1}$. We obtain $P_3(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3$, with

$$\begin{pmatrix} 2 & 0 & \frac23 & 0\\ 0 & \frac23 & 0 & \frac25\\ \frac23 & 0 & \frac25 & 0\\ 0 & \frac25 & 0 & \frac27 \end{pmatrix} \begin{pmatrix} \alpha_0\\ \alpha_1\\ \alpha_2\\ \alpha_3 \end{pmatrix} = \begin{pmatrix} 0\\ -2\cos(1) + 2\sin(1)\\ 0\\ 10\cos(1) - 6\sin(1) \end{pmatrix}.$$

(The above was obtained with the help of Mathematica®.) An approximate solution of this system (with matlab) gives $P_3(x) \approx 0.9981x - 0.1576x^3$.

(c) We may use Lagrange polynomials, the Newton form, or the Vandermonde system. Since the order 4 Vandermonde system is not excessively ill conditioned and since it gives a form directly comparable with parts (a) and (b), we use it, obtaining $Q_3(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$, with

$$\begin{pmatrix} 1 & -1 & 1 & -1\\ 1 & -\frac13 & \frac19 & -\frac1{27}\\ 1 & \frac13 & \frac19 & \frac1{27}\\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \beta_3 \end{pmatrix} = \begin{pmatrix} \sin(-1)\\ \sin(-1/3)\\ \sin(1/3)\\ \sin(1) \end{pmatrix},$$

giving an approximate solution of

$$Q_3(x) \approx 0.9991x - 0.1576x^3.$$
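The coefficients in parts (b) and (c) can be recomputed from the two linear systems. This is a sketch in plain Python using a small Gaussian elimination routine with partial pivoting; the routine name `solve` is hypothetical, and the Gram matrix and right-hand sides are read as the $4 \times 4$ systems for $f(x) = \sin(x)$ on $[-1, 1]$ described above.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Part (b): Gram system for the monomials 1, x, x^2, x^3 on [-1, 1].
gram = [[2, 0, 2 / 3, 0],
        [0, 2 / 3, 0, 2 / 5],
        [2 / 3, 0, 2 / 5, 0],
        [0, 2 / 5, 0, 2 / 7]]
rhs = [0, 2 * math.sin(1) - 2 * math.cos(1), 0,
       10 * math.cos(1) - 6 * math.sin(1)]
alpha = solve(gram, rhs)

# Part (c): Vandermonde system at the nodes -1, -1/3, 1/3, 1.
nodes = [-1, -1 / 3, 1 / 3, 1]
vander = [[x ** j for j in range(4)] for x in nodes]
beta = solve(vander, [math.sin(x) for x in nodes])

print(["%.4f" % c for c in alpha])
print(["%.4f" % c for c in beta])
```

Since $\sin$ is odd and both the interval and the nodes are symmetric, the even coefficients come out as zero up to rounding, which matches the form of $P_3$ and $Q_3$ above.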

(d) This is already done, with the way we have computed the coefficients in the first place. We note the coefficients are very close, differing in the second or third significant digit.

(e)

2. We will use normalized Legendre polynomials here, as defined on page 203. If $\Lambda_0$ and $\Lambda_1$ are the first two Legendre polynomials, we obtain

$$\beta_0 = (\Lambda_0, e^x), \quad \beta_1 = (\Lambda_1, e^x), \quad\text{where}\quad (f, g) = \int_{-1}^{1} f(x)g(x)\,dx,$$

with $\Lambda_0(x) \equiv 1/\sqrt{2}$ and $\Lambda_1(x) \equiv x/\sqrt{2/3}$. We obtain $\beta_0 = (e - 1/e)/\sqrt{2} \approx 1.662$ and $\beta_1 = \sqrt{6}/e \approx 0.901$. We thus obtain

$$p_1(x) \approx 1.662\,\Lambda_0(x) + 0.901\,\Lambda_1(x) \approx 1.1752 + 1.1036x,$$

i.e. $b_0 \approx 1.1752$ and $b_1 \approx 1.1036$.

3. An answer is provided in the appendix to the book.

4. An answer is provided in the appendix to the book.

5. (a) We may modify plot_Runge_and_Lagrange_interpolant.m from the web page for the book, at http://interval.louisiana.edu/Classical-and-Modern-NA/. We obtain the following matlab script

xpts = linspace(-5,5,200);
z1 = 1./(1+xpts.^2);
[a] = Lagrange_interp_poly_coeffs(4,'runge',-5,5);
z2 = Lagrange_interp_poly_val(a,xpts);
[a] = Lagrange_interp_poly_coeffs(8,'runge',-5,5);
z3 = Lagrange_interp_poly_val(a,xpts);
[a] = Lagrange_interp_poly_coeffs(16,'runge',-5,5);
z4 = Lagrange_interp_poly_val(a,xpts);
plot(xpts,z1,xpts,z2,xpts,z3,xpts,z4);

The result is as follows.

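[Figure: the function 1/(1+x^2) and its interpolating polynomials of degree 4, 8, and 16 at equally spaced points, plotted for x in [-5, 5].]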

Students should observe that, the higher the degree, the worse the approximation near the ends of the interval.

(b) We use (4.11) on page 215. First, observe that the Maclaurin series for f is

$$f(x) = \sum_{n=0}^{\infty} (-1)^n x^{2n}$$

(since we may write f as a convergent geometric series with ratio $-x^2$), so $f^{(n+1)}(0)$ is 0 if n is even and $\pm(n + 1)!$ if n is odd. Thus, the factor $f^{(n+1)}(\xi)/(n + 1)!$ in (4.11) can only be bounded by approximately 1. The second factor $\prod_{j=0}^{n} (x - x_j)$, when the points are equally spaced, has a value equal to

$$\frac{h}{2} \cdot \frac{h}{2} \cdot \frac{3h}{2} \cdot \frac{5h}{2} \cdots \frac{(2n - 1)h}{2} = \left( \frac{h}{2} \right)^{n+1} \prod_{i=2}^{n} (2i - 1) = \left( \frac{b - a}{2n} \right)^{n+1} \prod_{i=2}^{n} (2i - 1) = \frac{(b - a)^{n+1}}{(2n)^2} \prod_{i=2}^{n} \frac{2i - 1}{2n},$$

when x is the midpoint of $[x_0, x_1]$ or $[x_{n-1}, x_n]$. This expression can be verified to not tend to zero. (A more sophisticated and clearer analysis may be obtained from the theory of analytic functions in the complex plane.) Depending on the level of the students, students can verify this rigorously, or compute values of this expression for various n.

(c) We can modify Lagrange_interp_poly_coeffs.m to use the Chebyshev points. To do so, one may simply replace the lines

for i =1:np1;
   x(i) = a + (i-1)*h;
end

by

for i =1:np1;
   t = cos((2*i-1)/np1 *pi/2);
   x(i) = 0.5 *((b-a)*t + (b+a));
end

(Our modifications are supplied to the instructor in Chebyshev_interp_poly_coeffs.m.) With appropriate modifications of the script (replacing Lagrange_interp_poly_coeffs by Chebyshev_interp_poly_coeffs), we obtain the following graph.

[Figure: the function 1/(1+x^2) and its interpolating polynomials at the Chebyshev points, plotted for x in [-5, 5].]

Students should observe that, with these better points of interpolation, there now appears to be convergence towards the actual function as we increase n.

(e) Students should observe that $\prod_{i=0}^{n} (x - x_i)$ is bounded according to (4.18) (page 219) in this case.

6. An answer is provided in the appendix to the book.

7. With the expression W (y) =

n Y

(y − yi ),

i=0

we consider the inverse of the function giving the yi , obtaining x = x − xi

=

y b+a − , b−a b−a 1 (y − yi ). b−a

xi =

yi b+a − , b−a b−a

and

Thus W (y) =

n Y

i=0

= =

[(b − a)(x − xi ))

(b − a)n+1

n Y

i=0

(x − xi )

(b − a)n+1 Tn+1 (x),

from which the result follows. 8. An answer is provided in the appendix to the book. 9. A possible reference for this is reference [17] of the text (Approximation Theory by Eliot W. Cheney). 10. The interpolating polynomial bounds for the equally spaced points are left as Exercise 22. For this exercise, we will use the Chebyshev interpolation points. We may use a routine analogous to Chebyshev interp poly coeffs.m that we employed in problem 5(c), except, for mathematical rigor, we must use rigorous bounds on the coefficients, we need to compute the approximating polynomial, although evaluated at a point, with interval arithmetic, and we need to plug in interval bounds in the error term and add it to the approximating polynomial. To obtain the coefficients, we modify Chebyshev interp poly coeffs.m and name it rigor Cheby interp poly coeffs.m, and to evaluate the polynomial, we may use Lagrange interp poly val as-is, since matlab will automatically do interval arithmetic if the arguments have an interval data type. The matlab script is as follows. % file ch_4_10.m % Modification of ch_4_5.m to do problem 10 from chapter 2. aa = rigorinfsup(’-0.1’,’-0.1’) bb = rigorinfsup(’0.1’,’0.1’) n=6 x = rigorinfsup(’0.01’,’0.01’) [a,points] = rigor_Cheby_interp_poly_coeffs(n,’sin’,aa,bb); y = Lagrange_interp_poly_val(a,x); % We use formula (4.11), page 219, with n+1=7 --


% The (n+1)-st derivative of f is -cos, on [-0.1,0.1] , bounded % below by -1 and above by -1+0.1^2/2-ub_fnp1 = midrad(-1,0) + aa^2/2; fnp1_range = infsup(-1,sup(ub_fnp1)); Error_bound = fnp1_range/midrad(factorial(n+1),0)... * Chebyprod(x,points) y = y + Error_bound

The routines we call from the above script are as follows. function [coeffs,x] = rigor_Cheby_interp_poly_coeffs(degree,f,a,b) n=degree; np1 = degree+1; % % % %

Construct the abscissas (first place with the change for intervals) -(The points don’t need to be bounded by intervals, since their inexactness only affects the quality of the approximation, not whether or not a rigorous enclosure is obtained.)

for i =1:np1; t = midrad(cos((2*i-1)/np1 *pi/2),0); x(i) = midrad(0.5,0) *((b-a)*t + (b+a)); end % Evaluate the function at the abscissas -y = feval(f,x); y=y’; % Set up the Vandermonde system -A = midrad(zeros(n,n),0); A(:,1) = midrad(1,0); for j=2:np1 for i=1:np1 A(i,j) = x(i)^(j-1); end end

function y = Chebyprod(x,points) np1 = size(points,2); y = x - points(1); for i=2:np1 y = y*(x-points(i)); end

This results in the following matlab dialog. >> ch_4_10 intval aa = [ -0.1001, -0.0999] intval bb = [ 0.0999, 0.1001] n = 6 intval x = [ 0.0099, 0.0101] intval Error_bound =


1.0e-012 * [ 0.1989, 0.2000] >> ch_4_10 intval aa = [ -0.1001, -0.0999] intval bb = [ 0.0999, 0.1001] n = 6 intval x = [ 0.0099, 0.0101] intval Error_bound = 1.0e-012 * [ 0.1989, 0.2000] intval y = [ 0.0099, 0.0100] >> format long >> ch_4_10 intval aa = [ -0.10000000000001, -0.09999999999999] intval bb = [ 0.09999999999999, 0.10000000000001] n = 6 intval x = [ 0.00999999999999, 0.01000000000001] intval Error_bound = 1.0e-012 * [ 0.19899802579364, 0.19999801587302] intval y = [ 0.00999983333416, 0.00999983333417] >> sin(x) intval ans = [ 0.00999983333416, 0.00999983333417] >>

11. An answer is provided in the appendix to the book. 12. We use (4.11) on page 215: Y n |f (ξ(x))| (x − x ) j (n + 1)! j=0  n+1 Y n K(n + 2)! 1 (2j − 1) (n + 1)! 2n j=2 (n+1)

kF − Pn k∞

=

≤ =

n (n + 2)K Y 2j − 1 →0 (2n)2 j=2 2n

as n → ∞,

where we use considerations as in problem 5(b) to bound the product. 13. Here, we are considering the Lagrange basis as defined in (4.8) on page 212. In particular, the Lk in this problem are simply the ℓk in (4.8). Since ℓk (xj ) = δj,k , and since the n-th degree polynomial that interpolates a function f at {xi }ni=0 is unique, and since each ℓk is of degree n, it follows


immediately that the n-th degree interpolating polynomial to f is n X

Pn (x) =

ℓk (x) f (xk ).

k=0

Thus, the n-th degree polynomial that interpolates f (x) ≡ 1 is n X

k=0

ℓk (x) · 1.

However, sine f (x) ≡ 1 is a polynomial of degree n or less, it interpolates itself, so uniqueness of the interpolating polynomial implies n X

k=0

ℓk (x) ≡ 1.

For the second assertion in this problem, differentiate ψ using the product rule, to obtain n n X Y ψ ′ (x) = (x − xi ), j=0 i=0,i6=j

so

n Y

ψ ′ (xk ) =

i=0,i6=k

so

(xk − xi ),

ψ(x) 1 · ′ , x − xk ψ (xk ) and the result follows immediately. ℓk (x) =

14. An answer is provided in the appendix to the book. 15. An answer is provided in the appendix to the book. 16. (a) The system of equations is given by   N0 (x0 ) · · · Nn (x0 )     .. .. ..   . . .   N0 (xn ) · · ·

Nn (xn )

c0





  ..  =   .    cn

y0



 ..  . .   yn

The fact that this system is upper triangular follows from the fact that Nk (x) has x − xj as a factor for j ≤ k, so Nk (xj ) = 0 for k > j. (b) We use forward substitution to solve the lower triangular system. Since N0 ≡ 1, c0 = y0 = y[x0 ]. We have c1

= =

y1 − N0 (x1 )c0 y1 − y0 = N1 (x1 ) x1 − x0 y[x1 ] − y[x0 ] = y[x0 , x1 ]. x1 − x0 65

Continuing the forward substitution, we see that the computations to compute ck are the same as the computations we used to compute them on page 213 following Definition 4.7, so ck = y[x0 , · · · xk ]. 17. Let ξ = ex . Then, ξ k = ekx , and finding {xk }nk=0 such that Pn interpolates {yi }ni=0 is equivalent to finding cP k such that Qn (ξi ) = yi , i = 0, 1, 2, . . . , n, n k where ξi = exi and Qn (ξ) = k=0 ck ξ . However, the ξi are distinct since the exponential function is strictly monotonic. Therefore, from the existence and uniqueness of the interpolating polynomial of degree n, the ck are unique. 18. We may use Theorem 4.8 (page 220). First, note that the polynomial should be of degree 3, rather than degree 2, so we will refer to it as P3 (x). We have n = 2, with x1 = 0, x2 = 2, and ℓ1 (x) = so

x−2 , −2

1 (1 + x)(x − 2)2 , 4 ˜ 1 (x) = 1 x(x − 2)2 , h 4 From this, we obtain h1 (x) =

ℓ2 (x) =

x , 2

1 (3 − x)x2 , 4 ˜ 2 (x) = 1 (x − 2)x2 . h 4 h2 (x) =

˜ 1 f ′ (0) + h ˜ 2 f ′ (1). P3 (x) = f (0)h1 (x) + f (1)h2 (x) + h (We provide a Mathematicar notebook that checks this.) To obtain the error bound, we simply plug in to (4.19) (page 220). Namely, we obtain f (x) − P3 (x) =

x2 (x − 2)2 (4) f (ξ) 24

for some ξ ∈ [0, 2].

19. By piecewise cubic Hermite interpolation, this means Hermite interpolation on each interval [xi , xi+1 ]. (Note that this gives a C 1 function, in contrast to a cubic spline, which gives a C 2 function.) As in the previous problem, we plug into (4.19) to get 2 2 ˆ 2 (x)| = (x − (i − 1)h) (x − ih) f4 (ξi ) |f (x) − H 24  2 2 h 1 ≤ max |f (4) (x)|, 4 24 0≤x≤1 where x ∈ [(i − 1)h, ih] and h2 /4 is the maximum of x(x − h) for x ∈ [0, h]. The conclusion to the problem follows immediately from this. 66

20. We may use Theorem 4.12 (page 234), with n = 1, so there must be three points in [0, 1] at which the error is maximum in absolute value. Let

$$e(x) = \ln(1 + x) - (a_0 + a_1 x), \quad\text{and}\quad \max_{0 \le x \le 1} |e(x)| = e_0.$$

Then, since $\ln(1 + x)$ is concave, it can be concluded that the maximum error occurs at the end points, that $e(0) = -e_0$, $e(1) = -e_0$, and there is a point $x_0 \in (0, 1)$ with $e(x_0) = e_0$. Setting $e(0) = e(1)$ gives $\ln(1) - a_0 = \ln(2) - a_0 - a_1$, whence $a_1 = \ln(2)$. Equating $e(x_0) = -e(0)$ or $e(x_0) = -e(1)$ gives

$$e(x_0) = \ln(1 + x_0) - (a_0 + \ln(2)x_0) = a_0.$$

Also, $x_0$ must be a critical point of $e(x) = \ln(1 + x) - a_0 - \ln(2)x$. We have

$$e'(x) = \frac{1}{1 + x} - \ln(2) = 0 \quad\text{when}\quad x = x_0 = \frac{1}{\ln(2)} - 1.$$

Combining these computations gives

$$a_0 = \frac{1}{2}\left[ \ln(1 + x_0) - \ln(2)x_0 \right] = \frac{1}{2}\left[ \ln\left( 1 + \frac{1}{\ln(2)} - 1 \right) - \ln(2)\left( \frac{1}{\ln(2)} - 1 \right) \right] = \frac{1}{2}\left[ \ln\left( \frac{1}{\ln(2)} \right) + \ln(2) - 1 \right].$$
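The equioscillation property used in problem 20 can be checked numerically. The sketch below, in plain Python, evaluates the error $e(x) = \ln(1 + x) - (a_0 + a_1 x)$ at the three critical points and on a fine grid; the grid size is an arbitrary choice for illustration.

```python
import math

a1 = math.log(2)
x0 = 1 / math.log(2) - 1                       # interior critical point
a0 = 0.5 * (math.log(1 / math.log(2)) + math.log(2) - 1)

def e(x):
    """Error of the linear approximation a0 + a1*x to ln(1 + x)."""
    return math.log(1 + x) - (a0 + a1 * x)

# The error alternates in sign with equal magnitude at 0, x0 and 1.
print(e(0.0), e(x0), e(1.0))

# Largest absolute error on a fine grid over [0, 1].
grid_max = max(abs(e(i / 10000)) for i in range(10001))
print(a0, grid_max)
```

The three printed error values have equal magnitude with alternating sign, and no grid point exceeds that magnitude, which is what Theorem 4.12 requires for $n = 1$.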

Note that the maximum error is $a_0 \approx 0.0298$.

21. Let $p_n(x)$ be the minimax approximation to f of degree n, and write $p_n = q_n + r_n$, where $q_n$ is even and $r_n$ is odd, and assume $r_n \not\equiv 0$. Let $M = \max_{x \in [-a,a]} |f(x) - p_n(x)|$, and let $x_0$ and $x_1$ be arbitrary points such that

$$e(x_0) = f(x_0) - q_n(x_0) - r_n(x_0) = M \quad\text{and}\quad e(x_1) = f(x_1) - q_n(x_1) - r_n(x_1) = -M.$$

Then, $e(-x_0) = f(x_0) - q_n(x_0) + r_n(x_0) \le M$, but $e(-x_0) = M$ implies $r_n(x_0) = 0$, so assume $e(-x_0) < M$, implying $r_n(x_0) < 0$. A similar argument shows that $r_n(x_1) > 0$. However, all such error-maximizing points are either $x_0$ or $x_1$, so we may replace $r_n$ by $(1 - \epsilon)r_n$, which, for sufficiently small $\epsilon$, will decrease $e(x_0)$ and increase $e(x_1)$ without $|e(x)|$ exceeding M at any other point x, contradicting the assumption that the maximum error was M. Therefore, $r_n \equiv 0$, and $p_n$ is even.

22. We will compute an enclosure for the range sin([−0.01, 0.05]) as was done on page 238 using Taylor series. We use the error in (4.11) (on page 215). However, we need rigorous bounds on the value of the function at the points of interpolation, so, in contrast to the Taylor polynomial approximation, the coefficients will end up being narrow intervals (of widths related to the roundoff in evaluating the function at the points of interpolation). For simplicity, to compute the coefficients, we will modify Lagrange interp poly coeffs.m to deal with the resulting interval systems of equations. (Lagrange interp poly val.m can be used as-is, since matlab will automatically use interval arithmetic if the arguments are intervals.) We obtain the resulting matlab function (stripped of the preliminary in-line documentation, which is similar to the non-interval version.) function [coeffs, x] = ivl_Lagrange_interp_poly_coeffs(degree, f, a, b) % Define the polynomial degree and end points of the interval -n = degree; h = (b-a)/(n); np1 = degree+1; % Construct the abscissas -for i =1:np1; % The points do not need to be exactly equally spaced for the % computation to be mathematically rigorous. x(i) = midrad(a + (i-1)*h,0); end % Evaluate the function at the abscissas -y = feval(f,x); y=y’; % Set up the Vandermonde system -A = midrad(zeros(np1,np1),0); A(:,1) = midrad(1,0); for j=2:np1 for i=1:np1 A(i,j) = x(i)^(j-1); end end % Solve the system using INTLAB’s routine -coeffs = verifylss(A,y); x=x’;

We obtain the following MATLAB dialog.

    >> format long
    >> [coeffs, nodes] = ivl_Lagrange_interp_poly_coeffs(5,'sin',-0.1,0.1)
    intval x =
    [ -0.10000000000001, -0.10000000000000]
    [ -0.06000000000001, -0.06000000000000]
    [ -0.02000000000001, -0.02000000000000]
    [ 0.01999999999999, 0.02000000000000]
    [ 0.05999999999998, 0.06000000000000]
    [ 0.10000000000000, 0.10000000000001]
    intval coeffs =
    [ -0.00000000000001, 0.00000000000001]
    [ 0.99999999999714, 0.99999999999715]
    [ -0.00000000000004, 0.00000000000004]
    [ -0.16666665844672, -0.16666665844512]
    [ -0.00000000000373, 0.00000000000373]
    [ 0.00833055589901, 0.00833055604810]
    intval nodes =
    [ -0.10000000000001, -0.10000000000000]
    [ -0.06000000000001, -0.06000000000000]
    [ -0.02000000000001, -0.02000000000000]
    [ 0.01999999999999, 0.02000000000000]
    [ 0.05999999999998, 0.06000000000000]
    [ 0.10000000000000, 0.10000000000001]
    >> valmp01 = Lagrange_interp_poly_val(coeffs,rigorinfsup('-0.01','-0.01'))
    intval valmp01 =
    [ -0.00999983333415, -0.00999983333414]
    >> valp05 = Lagrange_interp_poly_val(coeffs,rigorinfsup('0.05','0.05'))
    intval valp05 =
    [ 0.04997916927084, 0.04997916927086]
    >>

Here, we needed to use either INTLAB's intval function or rigorinfsup (as illustrated here), because −0.01 and 0.05 are not exactly representable in the internal binary format, and these functions convert the decimal representation into small intervals that contain the exact decimal representation. What remains in this problem is to bound the error according to (4.11). Noting that
$$\frac{d^6 \sin(x)}{dx^6} = -\sin(x)$$
and $\sin(x) \le x$ for $x \ge 0$, we can replace $|f^{(n+1)}(\xi(x))|$ in (4.11) by $[-0.1, 0.1]$ to bound the error, provided $x \in [-0.1, 0.1]$. We have created the following MATLAB function to do the computation.

    function [ivl_bnd] = Lagrange_interp_poly_error (nodes, x, fnp1_bnd)
    % The input arguments should be intervals
    np1 = size(nodes,1);
    prod = fnp1_bnd;
    for i=1:np1
       prod=prod*(x-nodes(i));
    end
    ivl_bnd = prod/factorial(np1);

With this MATLAB function, we follow the preceding dialog with:

    >> err_bnd_mp1 = Lagrange_interp_poly_error(nodes,...
    rigorinfsup('-0.01','-0.01'),rigorinfsup('0','0.1'))
    intval err_bnd_mp1 =
    1.0e-011 *
    [ -0.14437500000001, 0.00000000000000]
    >> err_bnd_mp01 = Lagrange_interp_poly_error(nodes,...
    rigorinfsup('-0.01','-0.01'),rigorinfsup('0','0.1'))
    intval err_bnd_mp01 =
    1.0e-011 *
    [ -0.14437500000001, 0.00000000000000]
    >> lower_enclosure = valmp01+err_bnd_mp01
    intval lower_enclosure =
    [ -0.00999983333559, -0.00999983333414]
    >> err_bnd_p05 = Lagrange_interp_poly_error(nodes,...
    rigorinfsup('-0.05','0.05'),rigorinfsup('0','0.05'))
    intval err_bnd_p05 =
    1.0e-010 *
    [ -0.92640625000001, 0.39703125000001]
    >> upper_enclosure = valp05+err_bnd_p05
    intval upper_enclosure =
    [ 0.04997916917820, 0.04997916931056]
    >> enclosure = infsup(inf(lower_enclosure),sup(upper_enclosure))
    intval enclosure =
    [ -0.00999983333559, 0.04997916931056]

Note that this is slightly, but not much, wider than the Taylor polynomial enclosure given in the text. This could be due to various factors, including the way we computed the enclosures. For part (b) of this problem, one may replace the loop

    for i =1:np1;
       % The points do not need to be exactly equally spaced for the
       % computation to be mathematically rigorous.
       x(i) = midrad(a + (i-1)*h,0);
    end

in ivl_Lagrange_interp_poly_coeffs by

    for i =1:np1;
       % The points do not need to be exactly equally spaced for the
       % computation to be mathematically rigorous.
       xi = cos((2*i-1)/(np1) * pi/2);
       x(i) = midrad(0.5*((b-a)*xi + (b+a)),0);
    end

then redo the computations. (Computation with Chebyshev interpolation points is in ivl_Cheby_interp_poly_coeffs.m, available from rbk@louisiana.edu with the other codes in this answer book.) The resulting enclosure is [ -0.00999983333770, 0.04997916931056], slightly worse than with equally spaced points. Issuing sin(rigorinfsup('-0.01','0.05')), INTLAB returns [ -0.00999983333417, 0.04997916927068], slightly better than any of the enclosures we have computed. (However, the differences in the enclosures in this problem are, in many contexts, probably not significant.)

23. Differentiating $s_j(x)$ in (4.29) yields:
$$
s_j'(x) = \begin{cases}
0, & x > x_{j+2},\\[2pt]
-\dfrac{1}{2h^3}(x_{j+2}-x)^2, & x_{j+1} \le x \le x_{j+2},\\[6pt]
-\dfrac{1}{2h} - \dfrac{1}{h^2}(x_{j+1}-x) + \dfrac{3}{2h^3}(x_{j+1}-x)^2, & x_j \le x \le x_{j+1},\\[6pt]
-\dfrac{2}{h^2}(x-x_j) - \dfrac{3}{2h^3}(x-x_j)^2, & x_{j-1} \le x \le x_j,\\[6pt]
\dfrac{1}{2h^3}(x-x_{j-2})^2, & x_{j-2} \le x \le x_{j-1},\\[6pt]
0, & x < x_{j-2},
\end{cases}
$$
and
$$
s_j''(x) = \begin{cases}
0, & x > x_{j+2},\\[2pt]
\dfrac{1}{h^3}(x_{j+2}-x), & x_{j+1} \le x \le x_{j+2},\\[6pt]
\dfrac{1}{h^2} - \dfrac{3}{h^3}(x_{j+1}-x), & x_j \le x \le x_{j+1},\\[6pt]
-\dfrac{2}{h^2} - \dfrac{3}{h^3}(x-x_j), & x_{j-1} \le x \le x_j,\\[6pt]
\dfrac{1}{h^3}(x-x_{j-2}), & x_{j-2} \le x \le x_{j-1},\\[6pt]
0, & x < x_{j-2}.
\end{cases}
$$
We can now easily check that $\lim_{x\to x_i} s_j'(x) = s_j'(x_i)$ and $\lim_{x\to x_i} s_j''(x) = s_j''(x_i)$

for $i = j-2, j-1, \ldots, j+2$. Therefore they are continuous.

24. The matrix $A$ is derived from
$$\sum_{j=0}^{N} c_j\varphi_j(x_i) + c_1\varphi_{-1}(x_i) + c_{N-1}\varphi_{N+1}(x_i) = 0, \qquad 0 \le i \le N.$$
$A$ is
$$
\begin{pmatrix}
\varphi_0(x_0) & \varphi_{-1}(x_0)+\varphi_1(x_0) & \cdots & \varphi_{N-1}(x_0) & \varphi_N(x_0)\\
\varphi_0(x_1) & \varphi_1(x_1) & \cdots & \varphi_{N-1}(x_1) & \varphi_N(x_1)\\
\vdots & \vdots & & \vdots & \vdots\\
\varphi_0(x_N) & \varphi_1(x_N) & \cdots & \varphi_{N-1}(x_N)+\varphi_{N+1}(x_N) & \varphi_N(x_N)
\end{pmatrix}.
$$
However, using the computations for $s_j'$ from problem 23 and the fact that $x_{i+1} - x_i = h$, we have
$$\varphi_j(x_{j-1}) = \frac{1}{6}, \qquad \varphi_j(x_j) = \frac{2}{3}, \qquad \varphi_j(x_{j+1}) = \frac{1}{6},$$
and $\varphi_j(x_i) = 0$ otherwise. Plugging these values for the $\varphi_j$ into our form for the matrix $A$ and multiplying by 6 gives the result.

25. This follows from the same computations as in problem 24.

26. An answer is provided in the appendix to the book.
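The nodal values used in problem 24 can be checked numerically. The following is a minimal sketch, assuming that (4.29) is the standard cubic B-spline written piecewise as in the comments below; the spacing h and the center node xj are arbitrary illustrative values, not taken from the text.

```matlab
% Check phi_j(x_{j-1}) = 1/6, phi_j(x_j) = 2/3, phi_j(x_{j+1}) = 1/6.
% Assumption: s below is the standard cubic B-spline, taken to be (4.29).
h  = 0.5;          % illustrative node spacing
xj = 1;            % illustrative center node x_j
s = @(x) ((x >= xj+h)   & (x <= xj+2*h)) .* (xj+2*h - x).^3/(6*h^3) ...
       + ((x >= xj)     & (x <  xj+h))   .* (1/6 + (xj+h-x)/(2*h) ...
             + (xj+h-x).^2/(2*h^2) - (xj+h-x).^3/(2*h^3)) ...
       + ((x >= xj-h)   & (x <  xj))     .* (2/3 - (x-xj).^2/h^2 ...
             - (x-xj).^3/(2*h^3)) ...
       + ((x >= xj-2*h) & (x <  xj-h))   .* (x - (xj-2*h)).^3/(6*h^3);
vals = s(xj + h*(-2:2));   % values at x_{j-2}, x_{j-1}, x_j, x_{j+1}, x_{j+2}
disp(vals)
```

Multiplying the three nonzero values by 6 gives the familiar 1, 4, 1 pattern in the interior rows of the scaled matrix.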

27. An answer is provided in the appendix to the book.

28. (a) Since $S_j(t)$ is quadratic, $S_j(t) = \alpha t^2 + \beta t + \gamma$ for $t \in [t_{j-1}, t_j]$, which implies $S(t) = \alpha t^2 + \beta t + \gamma$ for $t \in [t_{j-1}, t_j]$. Hence, $S'(t) = 2\alpha t + \beta$. Then, we have
$$S'(t_{j-1}) + S'(t_j) = 2\alpha(t_{j-1} + t_j) + 2\beta.$$
Also, from (i), we have
$$S(t_j) = \alpha t_j^2 + \beta t_j + \gamma = f(t_j), \qquad S(t_{j-1}) = \alpha t_{j-1}^2 + \beta t_{j-1} + \gamma = f(t_{j-1}).$$
Subtracting these and multiplying by $2/h_j$, with $h_j = t_j - t_{j-1}$, gives
$$\frac{2}{h_j}\bigl(f(t_j) - f(t_{j-1})\bigr) = 2\alpha(t_j + t_{j-1}) + 2\beta,$$
which proves the result.

(b) Using the result in part (a) we have
$$S'(t_0) + S'(t_1) = \frac{2}{h_1}\bigl(f(t_1) - f(t_0)\bigr),$$
which can be solved for
$$S'(t_1) = \frac{2}{h_1}\bigl(f(t_1) - f(t_0)\bigr) - f'(t_0).$$
Similarly, for $j = n$, we have
$$S'(t_{n-1}) + S'(t_n) = \frac{2}{h_n}\bigl(f(t_n) - f(t_{n-1})\bigr).$$
From these, we can then obtain the following linear system:
$$
\begin{pmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
1 & 1 & 0 & \cdots & \cdots & 0\\
0 & 1 & 1 & 0 & \cdots & 0\\
 & & \ddots & \ddots & & \\
0 & \cdots & \cdots & 0 & 1 & 1
\end{pmatrix}
\begin{pmatrix} S'(t_1)\\ S'(t_2)\\ \vdots\\ S'(t_n) \end{pmatrix}
=
\begin{pmatrix}
\frac{2}{h_1}\bigl(f(t_1) - f(t_0)\bigr) - f'(t_0)\\[4pt]
\frac{2}{h_2}\bigl(f(t_2) - f(t_1)\bigr)\\
\vdots\\
\frac{2}{h_n}\bigl(f(t_n) - f(t_{n-1})\bigr)
\end{pmatrix}.
$$

29. Euler's formula is $e^{iy} = \cos(y) + i\sin(y)$. Plugging into $p(x) = \sum_{j=0}^{N-1} c_j e^{ijx}$ gives
$$p(x) = \sum_{j=0}^{N-1} c_j\cos jx + \sum_{j=0}^{N-1} i c_j \sin(jx) = \sum_{j=0}^{N-1} a_j\cos jx + \sum_{j=0}^{N-1} b_j\sin(jx),$$
where $a_j = c_j$ and $b_j = i c_j$. (Thus, complex arithmetic need not be used in the analysis, if the function $f$ is real.)

30. One way to make sense of this problem is to interpret $f$ as a complex function, and interpret $t^{5/2}$ in terms of its principal branch. Such an interpretation is necessary, since $f$ is not defined as a real-valued function over $[0, 2\pi]$. As such a complex-valued function, $f$ satisfies the hypotheses of Theorem 4.17 (page 251, the theorem on convergence of Fourier series) with $n = 2$, and the result follows.

31. (a) Note that
$$f^{(l)}(0) = \hat f^{(l)}(0) = \sum_{k=-n}^{n} c_k (ik)^l \quad\text{for } l = 0, 1, 2, \ldots, 2n$$
gives the linear system $Ac = f$, where:
$$
Ac = \begin{pmatrix}
1 & 1 & \cdots & 1\\
i(-n) & i(-n+1) & \cdots & in\\
\vdots & \vdots & & \vdots\\
(-in)^{2n} & \cdots & \cdots & (in)^{2n}
\end{pmatrix}
\begin{pmatrix} c_{-n}\\ c_{-n+1}\\ \vdots\\ c_n \end{pmatrix}
= f = \begin{pmatrix} f(0)\\ f'(0)\\ \vdots\\ f^{(2n)}(0) \end{pmatrix}.
$$
But $A$ is a Vandermonde matrix and is nonsingular, so $c$ exists and is unique, so the approximation $\hat f(x)$ exists.

(b) We have
$$
\begin{aligned}
\hat f(0) &= \frac{1}{3} = c_{-1} + c_0 + c_1,\\
\hat f'(0) &= -\frac{1}{9\pi} = -ic_{-1} + ic_1,\\
\hat f''(0) &= \frac{2}{27\pi^2} = -c_{-1} - c_1.
\end{aligned}
$$
Solving these yields
$$\hat f(x) = \left(-\frac{i}{18\pi} - \frac{1}{27\pi^2}\right)e^{-ix} + \left(\frac{1}{3} + \frac{2}{27\pi^2}\right) + \left(\frac{i}{18\pi} - \frac{1}{27\pi^2}\right)e^{ix},$$
which can be simplified to yield
$$\hat f(x) = -\frac{2}{27\pi^2}\cos x - \frac{1}{9\pi}\sin x + \frac{1}{3} + \frac{2}{27\pi^2}.$$

32. We arrange the computations in a table such as Table 6.10, page 297 of Alan V. Oppenheim and Ronald W. Schafer, Digital Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1975. In our case,
$$W_N = e^{-i(2\pi/16)} = e^{-i(\pi/8)}, \qquad W_N^2 = e^{-i(\pi/4)}, \qquad W_N^4 = e^{-i(\pi/2)} = -i, \qquad W_N^8 = e^{-i\pi} = -1.$$
The values of $f$ need to be arranged initially according to the bit pattern of the indices, with the order of the function values determined by reversal of the bit patterns. With $N = 16$, the indices go from 0 to 15, so the bit pattern (binary representation) for index 5 would be $(0101)_2$. Thus, $f(x_5)$ would occur in position $(1010)_2 = 10$ of the initial array. (This is actually the eleventh listed number, since indexing starts with 0.) We then proceed as in the table we have just referenced. Note that $f(x_0)$ through $f(x_7)$ are equal to $-1$, while $f(x_8)$ through $f(x_{15})$ are equal to 1.

Note: If computing the FFT and gaining an idea of the approximation properties of the discrete Fourier transform are of interest, but understanding the algorithm in detail is not, students may use MATLAB's fft function instead of forming this table. Forming this table by hand may be time-consuming, since it may require some outside study of fast Fourier transform algorithms. Programs for fast Fourier transforms, in addition to being in MATLAB, are also available on the web.

In the table, $a = -2+2i$ and $b = -2-2i$. Rows are listed in bit-reversed order, and each stage column gives the array after that stage, with the twiddle-factor products (such as $2i$ and $a\,e^{-i\pi/4}$) already combined.
$$
\begin{array}{r|r|r|r|c|l|l}
\text{position} & i & f(x_i) & \text{stage 1} & \text{stage 2} & \text{stage 3} & \text{stage 4}\\
\hline
0 & 0 & -1 & 0 & 0 & 0 & 0\\
1 & 8 & 1 & -2 & a & a(1+e^{-i\pi/4}) & a(1+e^{-i\pi/4})(1+e^{-i\pi/8})\\
2 & 4 & -1 & 0 & 0 & 0 & 0\\
3 & 12 & 1 & -2 & b & b(1+e^{-i3\pi/4}) & b(1+e^{-i3\pi/4})(1+e^{-3i\pi/8})\\
4 & 2 & -1 & 0 & 0 & 0 & 0\\
5 & 10 & 1 & -2 & a & a(1-e^{-i\pi/4}) & a(1-e^{-i\pi/4})(1+e^{-5i\pi/8})\\
6 & 6 & -1 & 0 & 0 & 0 & 0\\
7 & 14 & 1 & -2 & b & b(1-e^{-i3\pi/4}) & b(1-e^{-i3\pi/4})(1+e^{-7i\pi/8})\\
8 & 1 & -1 & 0 & 0 & 0 & 0\\
9 & 9 & 1 & -2 & a & a(1+e^{-i\pi/4}) & a(1+e^{-i\pi/4})(1-e^{-i\pi/8})\\
10 & 5 & -1 & 0 & 0 & 0 & 0\\
11 & 13 & 1 & -2 & b & b(1+e^{-i3\pi/4}) & b(1+e^{-i3\pi/4})(1-e^{-3i\pi/8})\\
12 & 3 & -1 & 0 & 0 & 0 & 0\\
13 & 11 & 1 & -2 & a & a(1-e^{-i\pi/4}) & a(1-e^{-i\pi/4})(1-e^{-5i\pi/8})\\
14 & 7 & -1 & 0 & 0 & 0 & 0\\
15 & 15 & 1 & -2 & b & b(1-e^{-i3\pi/4}) & b(1-e^{-i3\pi/4})(1-e^{-7i\pi/8})
\end{array}
$$

Just for illustration purposes, we programmed this table as

    function y = ch_4_32_eval_trig_poly(x)
    c(1) = 0;
    c(2) = (-2 + 2*i)*(1 + exp(-i*pi/4))*(1 + exp(-i*pi/8));
    c(3) = 0;
    c(4) = (-2 - 2*i)*(1 + exp(-i*3*pi/4))*(1 + exp(-i*3*pi/8));
    c(5) = 0 ;
    c(6) = (-2+2*i)*(1 - exp(-i*pi/4))*(1+exp(-i*5*pi/8));
    c(7) = 0;
    c(8) = (-2-2*i)*(1 - exp(-i*3*pi/4))*(1+exp(-i*7*pi/8));
    c(9) = 0;
    c(10) = (-2 + 2*i)*(1 + exp(-i*pi/4))*(1 - exp(-i*pi/8));
    c(11) = 0;
    c(12) = (-2 - 2*i)*(1 + exp(-i*3*pi/4))*(1 - exp(-i*3*pi/8));
    c(13) = 0 ;
    c(14) = (-2+2*i)*(1 - exp(-i*pi/4))*(1-exp(-i*5*pi/8));
    c(15) = 0;
    c(16) = (-2-2*i)*(1 - exp(-i*3*pi/4))*(1-exp(-i*7*pi/8));
    c = c/16;
    y = c(2)*exp(i*x) + c(4)*exp(i*3*x) + c(6) * exp(i*5*x) + ...
        c(8)*exp(i*7*x) + c(10)*exp(i*9*x) + c(12) * exp(i*11*x) + ...
        c(14)*exp(i*13*x) + c(16)*exp(i*15*x);

and produced the following plot with the following MATLAB dialog.

    >> x = linspace(0,2*pi,500);
    >> for i=1:500;y(i) = ch_4_32_eval_trig_poly(x(i));end
    >> plot(x,y)

It can be verified that the fit does interpolate the data, although the graph shows the interpolant is not a good overall approximation to this step function. A better approximation may be obtained by taking a much higher degree polynomial, but there will still be overshoot at the discontinuities (at π and 0 or 2π). This is known as the “Gibbs phenomenon.”

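[Figure: plot of the trigonometric interpolant produced by the dialog above; the vertical axis runs from −1.5 to 1.5 and the horizontal axis from 0 to 7.]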
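The closed-form entries produced by the hand computation can be cross-checked against a library FFT. The sketch below relies on MATLAB's fft using the same sign convention as $W_N = e^{-2\pi i/N}$; the variable names (fvals, T, w) are ours, chosen for illustration, and the data are given in natural order, since fft does its own reordering.

```matlab
% Compare the hand-computed DFT entries of problem 32 with fft.
N = 16;
fvals = [-ones(1,8), ones(1,8)];   % f(x_0..x_7) = -1, f(x_8..x_15) = 1
X = fft(fvals);                    % X(k+1) = sum_j f(x_j) * exp(-2*pi*1i*j*k/N)

a = -2 + 2i;  b = -2 - 2i;
w = @(m) exp(-1i*m*pi/8);          % w(m) = W_N^m
T = zeros(1,N);                    % closed-form values; even frequencies are 0
T(2)  = a*(1 + w(2))*(1 + w(1));
T(4)  = b*(1 + w(6))*(1 + w(3));
T(6)  = a*(1 - w(2))*(1 + w(5));
T(8)  = b*(1 - w(6))*(1 + w(7));
T(10) = a*(1 + w(2))*(1 - w(1));
T(12) = b*(1 + w(6))*(1 - w(3));
T(14) = a*(1 - w(2))*(1 - w(5));
T(16) = b*(1 - w(6))*(1 - w(7));

err = max(abs(X - T));             % should be at roundoff level
disp(err)
```

Dividing X by 16 gives the coefficients c used in ch_4_32_eval_trig_poly.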

33. Considering (4.48) with $b_1 = 1$, we get the following system of equations.
$$
\begin{array}{c|l}
x_i & f(x_i)q(x_i) - p(x_i) = 0\\
\hline
0 & -a_0 = 0\\
0.05 & \ln(1.05)(b_0 + 0.05) - 0.05a_1 - 0.0025a_2 = 0\\
0.1 & \ln(1.1)(b_0 + 0.1) - 0.1a_1 - 0.01a_2 = 0\\
0.15 & \ln(1.15)(b_0 + 0.15) - 0.15a_1 - 0.0225a_2 = 0
\end{array}
$$
In matrix form, this is
$$
\begin{pmatrix}
\ln(1.05) & -0.05 & -0.0025\\
\ln(1.10) & -0.10 & -0.0100\\
\ln(1.15) & -0.15 & -0.0225
\end{pmatrix}
\begin{pmatrix} b_0\\ a_1\\ a_2 \end{pmatrix}
=
\begin{pmatrix} -0.05\ln(1.05)\\ -0.10\ln(1.10)\\ -0.15\ln(1.15) \end{pmatrix}.
$$
Putting this matrix into MATLAB gives

    >> A = [log(1.05) -0.05 -0.0025
    log(1.10) -0.10 -0.0100
    log(1.15) -0.15 -0.0225]
    A =
        0.0488   -0.0500   -0.0025
        0.0953   -0.1000   -0.0100
        0.1398   -0.1500   -0.0225
    >> cond(A)
    ans =
      8.2304e+003
    >> b = [-0.05*log(1.05)
    -0.10*log(1.05)
    -0.15*log(1.15)]
    b =
       -0.0024
       -0.0049
       -0.0210
    >> sol=A\b
    sol =
      -67.5883
      -67.4384
       30.6871
    >>

Thus, the rational interpolating function is given by
$$r(x) \approx \frac{-67.4384x + 30.6871x^2}{-67.5883 + x}.$$
We have $k = L + M + 1 = 4$, and we have that the fourth derivative of $fq$ is
$$\frac{8}{(x+1)^3} - \frac{6(x - 67.5883)}{(x+1)^4}.$$
This function is monotonically decreasing in $x$ over the interval in question, so, replacing $x$ by 0, we see it is bounded by 413.6. (This bound may be obtained by interval arithmetic, by first evaluating the first derivative over the interval to determine monotonicity, then evaluating the fourth derivative with interval arithmetic at the lower end point, or even by directly evaluating the fourth derivative over the entire interval.) A bound on the error is thus
$$
\begin{aligned}
|f(x) - r(x)| &\le \frac{|x(x-0.05)(x-0.1)(x-0.15)|}{24\,|-67.5883 + x|}\cdot \max_{x\in[0,0.15]}\left|\frac{d^4}{dx^4}\bigl[\ln(1+x)(-67.5883+x)\bigr]\right|\\
&\le \frac{4.102\times 10^{-5}\cdot 413.6}{24\times 67}\\
&\le 1.06\times 10^{-5}.
\end{aligned}
$$
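The fourth derivative used above can be obtained in one step from the Leibniz rule, since $q$ is linear. The following worked derivation is a sketch, taking $f(x) = \ln(1+x)$ and $q(x) = x - 67.5883$ from the computation above.

```latex
% Leibniz rule for (f q)^{(4)} when q is linear (q'' = 0):
\frac{d^4}{dx^4}\bigl[f(x)q(x)\bigr] = f^{(4)}(x)\,q(x) + 4 f'''(x)\,q'(x).
% With f(x) = \ln(1+x): f'''(x) = 2/(1+x)^3 and f^{(4)}(x) = -6/(1+x)^4; q'(x) = 1.
\frac{d^4}{dx^4}\bigl[\ln(1+x)(x - 67.5883)\bigr]
   = \frac{8}{(1+x)^3} - \frac{6\,(x - 67.5883)}{(1+x)^4}.
% At the left end point x = 0:
8 + 6 \cdot 67.5883 = 413.5298 \le 413.6 .
```

Both terms decrease as $x$ increases on $[0, 0.15]$, which is why the value at $x = 0$ serves as the bound.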

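34. We will use (4.44) (page 260) with $p(x) = p_0 + p_1 x$ and $q(x) = q_0 + q_1 x + q_2 x^2$. We have:
$$
\begin{array}{c|c|c}
\ell & f^{(\ell)}(x) & f^{(\ell)}(0)\\
\hline
0 & \tfrac{1}{2}\sqrt{\pi}\,\mathrm{erf}(x) & 0\\
1 & e^{-x^2} & 1\\
2 & -2xe^{-x^2} & 0\\
3 & 4x^2e^{-x^2} - 2e^{-x^2} & -2
\end{array}
$$
Equations (4.44) thus become
$$
\begin{array}{c|l}
i & \displaystyle\sum_{\ell=0}^{i} \frac{f^{(\ell)}(0)}{\ell!}\, q_{i-\ell} = p_i\\
\hline
0 & 0 = p_0,\\
1 & 1 = p_1,\\
2 & q_1 = 0,\\
3 & q_2 - \tfrac{1}{3} = 0.
\end{array}
$$
This gives the following Padé approximant.
$$r(x) = \frac{x}{1 + \frac{1}{3}x^2}.$$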
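A quick numerical check of this approximant is possible with MATLAB's built-in erf. The sketch below is illustrative only; the sample points are our choice. Since the Taylor series of $f$ is $x - x^3/3 + x^5/10 - \cdots$ and that of $r$ is $x - x^3/3 + x^5/9 - \cdots$, the difference $r(x) - f(x)$ should behave like $x^5/90$ near 0.

```matlab
% Compare the Pade approximant r with f(x) = (sqrt(pi)/2) erf(x) near 0.
f = @(x) sqrt(pi)/2 * erf(x);      % function from the table in problem 34
r = @(x) x ./ (1 + x.^2/3);        % approximant obtained above
x = [0.05 0.1 0.2];                % illustrative sample points
ratio = (r(x) - f(x)) ./ x.^5;     % expected to be close to 1/9 - 1/10 = 1/90
disp([x; ratio])
```

The ratio drifts slowly away from 1/90 as x grows, which reflects the next term of the error expansion.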

35. An answer is provided in the appendix to the book.

36. If $r(x) = \dfrac{a_0}{1 + b_1 x}$ is the Padé approximant to $f(x) = x$, then $r(0) = 0$ and $r'(0) = 1$. However, $r(0) = 0 \Longrightarrow a_0 = 0$ and hence $r(x) = 0$. Thus $r(x)$ cannot satisfy $r'(0) = 1$. Therefore the $R(0, 1)$ Padé approximant does not exist for $f(x) = x$.

37. If $r(x) = \dfrac{a_0}{1 + b_1 x}$ is the Padé approximant to $f(x) = 2 + 3x$, then $(2 + 3x)(1 + b_1 x) = a_0$, which yields $a_0 = 2$ and $b_1 = -\frac{3}{2}$.

38. For this example (4.49) becomes $\varphi(x) = \varphi(2x) + \varphi(2x - 1)$.

(a) This is true by assumption. (See the errata.)

(b) For $x \in (0, 1/2]$, $2x \in (0, 1]$, $2x - 1 \le 0$, and $\varphi(x) = \varphi(2x) + \varphi(2x - 1) = \varphi(2x)$. Similarly, if $x \in [1/2, 1]$, $\varphi(x) = \varphi(2x - 1)$. We will first use this to show that $\varphi$ is continuous on $[0, 1]$. Take an arbitrary $x \in [0, 1]$. Then $\varphi(x) = \varphi(x/2) = \cdots = \varphi(x/2^n)$ for $n$ arbitrary. Here we must make another assumption. (See the errata.) Let's assume $\lim_{x\to 0^+} \varphi(x) = \varphi(0)$. Since $\varphi(x)$ is equal to $\varphi(y)$, where we can make $y$ arbitrarily close to 0, $\varphi(x) = \varphi(0)$, and $\varphi$ is constant on $[0, 1]$. The integration condition now implies that the constant value is 1.

39. To say that $x^r \in V_0$ means that
$$x^r = \sum_{k=-\infty}^{\infty} a_{k,r}\,\varphi(x - k)$$
for some non-zero sequence $\{a_{k,r}\}_{k=-\infty}^{\infty}$, so (4.50) and the orthogonality property stated below (4.51) (on page 269) give
$$\int_{-\infty}^{\infty} x^r \varphi(x)\,dx = a_{0,r}.$$
Furthermore, representing $\varphi$ in terms of (4.49) and changing variables in the integration gives
$$\int_{-\infty}^{\infty} x^r \varphi(x)\,dx = 2^{-(1+r)} \sum_{\ell=0}^{L} a_{\ell,r}.$$
The remainder of the proof is left to the instructor.

Chapter 5

1. (a) Using absolute row sums, $a_{11} = 2$, $\rho_1 = 1$; $a_{22} = 2$, $\rho_2 = 2$; $a_{33} = 2$, $\rho_3 = 1$. Thus, every eigenvalue of $A$ lies in the disc $\{z \in \mathbb{C} : |z - 2| \le 2\}$.

(b) The three eigenvalues and corresponding eigenvectors are:
$$\lambda_1 = 2, \qquad v_1 = \left(-\frac{\sqrt{2}}{2},\, 0,\, \frac{\sqrt{2}}{2}\right)^T,$$
$$\lambda_2 = 2 + \sqrt{2}, \qquad v_2 = \left(-\frac{1}{2},\, \frac{\sqrt{2}}{2},\, -\frac{1}{2}\right)^T,$$
and
$$\lambda_3 = 2 - \sqrt{2}, \qquad v_3 = \left(\frac{1}{2},\, \frac{\sqrt{2}}{2},\, \frac{1}{2}\right)^T.$$
Every eigenvalue of $A$ lies in the disc $\{z \in \mathbb{C} : |z - 2| \le 2\}$.

2. (a) Computing the eigenvalues of $A$ directly from the definition, we get $\lambda_1 = 1$, $\lambda_2 = -1$.

(b) Using absolute row sums, $a_{11} = 0$, $\rho_1 = 1$; $a_{22} = 0$, $\rho_2 = 1$. Thus, every eigenvalue of $A$ lies in the disc $\{z \in \mathbb{C} : |z| \le 1\}$.

(c) In this case, the two discs are identical. A one-to-one correspondence between the discs and the eigenvalues can still be set up. Thus, $A$ is not a counterexample to Corollary 5.1.

3. The sum of the absolute values of the off-diagonal elements in each row is less than 1, and the diagonal entries are all 0. Hence, the Gerschgorin Circle Theorem implies that all eigenvalues have modulus less than 1.

4. Yes, zero can be in the spectrum of a diagonally dominant matrix. As an example, take
$$A = \begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}.$$
However, if $A$ is a strictly diagonally dominant matrix, zero cannot be in the spectrum of $A$.

5. First, we use Schur's Theorem to represent $A$ as
$$A = P(\Lambda + U)P^{-1},$$

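where $P$ is unitary, $\Lambda$ is diagonal, and $U$ is upper triangular with 0's on the diagonal. Then,
$$
\begin{aligned}
A^k &= P(\Lambda + U)^k P^{-1}\\
&= P\left(\Lambda^k + \Lambda^{k-1}U + U\Lambda^{k-1} + \Lambda^{k-2}U^2 + U^2\Lambda^{k-2} + \cdots + U^k\right)P^{-1}\\
&= P\left(\Lambda^k + \cdots + \Lambda^{k-n+1}U^{n-1} + U^{n-1}\Lambda^{k-n+1}\right)P^{-1},
\end{aligned}
$$
where the last equality follows from the fact that $U$ must be nilpotent of degree $n - 1$. Furthermore, since all matrix norms are equivalent, there is a $C$ independent of the matrix $M$ such that $\|M\| \le C\|M\|_2$. Combining this with the observation that $\rho(A^k) = \rho(A)^k$ then gives
$$
\begin{aligned}
\rho(A) \le \|A^k\|^{1/k}
&\le \|P\|^{1/k}\,\bigl\|\Lambda^k + \cdots + U^{n-1}\Lambda^{k-n+1}\bigr\|^{1/k}\,\|P^{-1}\|^{1/k}\\
&\le \|P\|^{1/k}\,C^{1/k}\,\bigl\|\Lambda^k + \cdots + U^{n-1}\Lambda^{k-n+1}\bigr\|_2^{1/k}\,\|P^{-1}\|^{1/k}\\
&\le \|P\|^{1/k}\,C^{1/k}\left(\|\Lambda\|_2^k + \cdots + \|U\|_2^{n-1}\|\Lambda\|_2^{k-n+1}\right)^{1/k}\|P^{-1}\|^{1/k}\\
&\le \|P\|^{1/k}\,C^{1/k}\left(2^n \max\{1, \|U\|_2, \ldots, \|U\|_2^{n-1}\}\,\|\Lambda\|_2^{k-n+1}\,\max\{1, \|\Lambda\|_2, \ldots, \|\Lambda\|_2^{n-1}\}\right)^{1/k}\|P^{-1}\|^{1/k}\\
&= \|P\|^{1/k}\,C^{1/k}\,2^{n/k}\,M_U^{1/k}\,M_\Lambda^{1/k}\,\|P^{-1}\|^{1/k}\,\|\Lambda\|_2^{(-n+1)/k}\,\|\Lambda\|_2,
\end{aligned}
$$
where
$$M_U = \max\{1, \|U\|_2, \ldots, \|U\|_2^{n-1}\}$$
and
$$M_\Lambda = \max\{1, \|\Lambda\|_2, \ldots, \|\Lambda\|_2^{n-1}\}.$$
However, all of the factors in the last expression except $\|\Lambda\|_2$ tend to 1 as $k \to \infty$, giving
$$\lim_{k\to\infty} \|A^k\|^{1/k} \le \|\Lambda\|_2.$$
But $\|\Lambda\|_2 = \rho(A)$, thus proving the theorem.

6. Let $q^{(0)} = (1, 1, 1)^T$. Then,
$$q^{(1)} = \frac{1}{\sigma_1}Aq^{(0)} \approx (1.0000, 0, 1.0000)^T \quad (\text{choosing } \sigma_{\nu+1} = \|Aq^{(\nu)}\|_\infty).$$
Thus,
$$
\begin{aligned}
q^{(2)} &= \frac{1}{\sigma_2}Aq^{(1)} \approx (1.0000, -1.0000, 1.0000)^T,\\
q^{(3)} &= \frac{1}{\sigma_3}Aq^{(2)} \approx (-0.7500, 1.0000, -0.7500)^T,\\
q^{(4)} &\approx (-0.7143, 1.0000, -0.7143)^T,\\
q^{(5)} &\approx (-0.7083, 1.0000, -0.7083)^T,\\
&\;\;\vdots\\
q^{(10)} &\approx (-0.7071, 1.0000, -0.7071)^T.
\end{aligned}
$$
Using (5.11),
$$
\begin{aligned}
\nu^{(1)} &= \frac{(Aq^{(0)})_1}{(q^{(0)})_1} \approx 1.0000,\\
\nu^{(2)} &= \frac{(Aq^{(1)})_1}{(q^{(1)})_1} \approx 2.0000,\\
\nu^{(3)} &\approx 3.0000,\\
\nu^{(4)} &\approx 3.5000,\\
\nu^{(5)} &\approx 3.4286,\\
&\;\;\vdots\\
\nu^{(10)} &\approx 3.4142.
\end{aligned}
$$
Let $q^{(0)} = (0, 1, 0)^T$. Then,
$$
\begin{aligned}
q^{(1)} &= (-0.5000, 1.0000, -0.5000)^T,\\
q^{(2)} &\approx (-0.6667, 1.0000, -0.6667)^T,\\
q^{(3)} &\approx (-0.7000, 1.0000, -0.7000)^T,\\
q^{(4)} &\approx (-0.7059, 1.0000, -0.7059)^T,\\
&\;\;\vdots\\
q^{(10)} &\approx (-0.7071, 1.0000, -0.7071)^T.
\end{aligned}
$$
Using (5.11), $\nu^{(1)} \approx 2.0000$, $\nu^{(2)} \approx 3.0000$, $\nu^{(3)} \approx 3.3333$, $\nu^{(4)} \approx 3.4000$, $\nu^{(5)} \approx 3.4118$, $\ldots$, $\nu^{(10)} \approx 3.4142$. Hence, the eigenvalue of $A$ of the largest absolute magnitude is $\lambda \approx 3.4142$ and the corresponding eigenvector is approximately $(-0.7071, 1.0000, -0.7071)^T$. The approximation of $\lambda$ is close to the result obtained from problem 1, that is, $\lambda = 2 + \sqrt{2}$, and the approximated eigenvector is a multiple of the eigenvector $v = \left(-\frac{1}{2}, \frac{\sqrt{2}}{2}, -\frac{1}{2}\right)^T$. The following MATLAB function can be used to solve the problem:

    function [v,d]=powermethod(A, q0, N);
    c = length(A);
    q = zeros(c,N);
    u = zeros(1,N);
    q(:,1)=q0;
    for n=2:N
       % Find a component of q(:,n-1) of largest magnitude.  (Assigning
       % i=c does not exit a for loop in MATLAB, so the last such index
       % is the one kept in k.)
       for i=1:c
          if abs(q(i,n-1))==max(abs( q(:,n-1)))
             k=i;
             i=c;
          end
       end
       q(:,n) = A*q(:,n-1);
       u(n) = q(k,n)/q(k,n-1);
       % Normalize by a component of largest magnitude.
       for i=1:c
          if abs(q(i,n)) == max(abs( q(:,n)))
             m = i;
             i = c;
          end
       end
       q(:,n) = q(:,n)./ q(m,n);
    end
    v = u(N);
    d = q(:,N);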
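For comparison, the iteration can also be written as a short self-contained script. This is a minimal sketch rather than the function above: it estimates the eigenvalue with the Rayleigh quotient instead of the component ratio in (5.11), which avoids dividing by a component that might be zero, and the iteration count is an arbitrary choice.

```matlab
% Compact power iteration for the matrix of problems 1 and 6.
A = [2 -1 0; -1 2 -1; 0 -1 2];
q = [1; 1; 1];                    % starting vector q^(0)
for nu = 1:30                     % number of iterations chosen for illustration
    z = A*q;
    lam = (q'*z)/(q'*q);          % Rayleigh quotient estimate of the eigenvalue
    [~, m] = max(abs(z));         % index of a component of largest magnitude
    q = z / z(m);                 % scale so that this component equals 1
end
fprintf('lambda is approximately %.4f\n', lam);
disp(q')
```

The computed pair should agree with $\lambda = 2 + \sqrt{2}$ and the scaled eigenvector $(-0.7071, 1.0000, -0.7071)^T$ found above.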

9. Let $q^{(0)} = (1, 1, 1)^T$ and let $\lambda = 1$. Then,
$$
\begin{aligned}
q^{(1)} &= (\lambda I - A)^{-1}q^{(0)} \approx (0.6667, 1.0000, 0.6667)^T,\\
q^{(2)} &= (\lambda I - A)^{-1}q^{(1)} \approx (0.7059, 1.0000, 0.7059)^T,\\
q^{(3)} &= (\lambda I - A)^{-1}q^{(2)} \approx (0.7071, 1.0000, 0.7071)^T,\\
q^{(4)} &\approx (0.7071, 1.0000, 0.7071)^T.
\end{aligned}
$$
Using (5.23), the Rayleigh Quotient, we have $\nu^{(1)} \approx 0.5714$, $\nu^{(2)} \approx 0.5854$, $\nu^{(3)} \approx 0.5858$, and $\nu^{(4)} \approx 0.5858$.

Let $q^{(0)} = (0, 1, 0)^T$ and $\lambda = 1$. Then,
$$
\begin{aligned}
q^{(1)} &= (\lambda I - A)^{-1}q^{(0)} \approx (1.0000, 1.0000, 1.0000)^T,\\
q^{(2)} &= (\lambda I - A)^{-1}q^{(1)} \approx (0.7143, 1.0000, 0.7143)^T,\\
q^{(3)} &= (\lambda I - A)^{-1}q^{(2)} \approx (0.7071, 1.0000, 0.7071)^T,\\
q^{(4)} &\approx (0.7071, 1.0000, 0.7071)^T,
\end{aligned}
$$
and $\nu^{(1)} \approx 0$, $\nu^{(2)} \approx 0.5714$, $\nu^{(3)} \approx 0.5854$ and $\nu^{(4)} \approx 0.5858$.

Let $q^{(0)} = (1, 1, 1)^T$ and $\lambda = 3$. Then,
$$
\begin{aligned}
q^{(1)} &= (\lambda I - A)^{-1}q^{(0)} \approx (0, 1.0000, 0)^T,\\
q^{(2)} &= (\lambda I - A)^{-1}q^{(1)} \approx (-0.6667, 1.0000, -0.6667)^T,\\
q^{(3)} &= (\lambda I - A)^{-1}q^{(2)} \approx (-0.7073, 1.0000, -0.7073)^T,\\
q^{(4)} &\approx (-0.7071, 1.0000, -0.7071)^T,\\
q^{(5)} &\approx (-0.7071, 1.0000, -0.7071)^T,
\end{aligned}
$$
and $\nu^{(1)} \approx 0$, $\nu^{(2)} \approx 4.0000$, $\nu^{(3)} \approx 3.4146$, $\nu^{(4)} \approx 3.4142$ and $\nu^{(5)} \approx 3.4142$. If we let $\lambda = 3$ in the inverse power method and compare with the results obtained from problem 6, we see that both methods converge to the same eigenvalue 3.4142 and the corresponding eigenvector $(-0.7071, 1.0000, -0.7071)^T$, but the inverse power method has a faster convergence rate. The MATLAB function inverse power method can be used to solve this problem. This function is available from the web site for the book, at http://interval.louisiana.edu/Classical-and-Modern-NA/#Chapter_5

10. Using
$$\rho_j = \Biggl(\sum_{k=1,\,k\ne j}^{n} |a_{jk}|^2\Biggr)^{1/2},$$
we see that all the discs $K_{\rho_j}$ are disjoint for $j \ge 2$. By Gerschgorin's Theorem for symmetric matrices and the fact that $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{2n} \le \lambda_{2n+1}$, $\lambda_{n+1,n+1}$ is in the disc
$$K_{\rho(n+1)} = \Biggl\{ z \in \mathbb{C} : |z - 1.5^{(n+1)}| \le \Biggl(\sum_{i,j=1,\,i\ne j} \bigl|0.5^{(i+j-1)}\bigr|^2\Biggr)^{1/2} \Biggr\},$$
and for large $n$, the disc radius is small. Thus we should choose a value in the disc $K_{\rho(n+1)}$, for example, $1.5^{(n+1)}$.

11. Using the power method with deflation, we obtain the approximation of all eigenvalues: 3.4142, 2.0372 and 0.6243. Comparing with the eigenvalues obtained from problem 1, the error is $O(10^{-2})$. The following MATLAB script can be used to solve the problem:

    clear
    clc
    A=[2 -1 0; -1 2 -1; 0 -1 2];
    N=20;L=length(A);lambda=zeros(1, L);
    for i=1:L
       q0=ones(L,1);
       [lam,x]=powermethod(A,q0,N);
       lambda(i)=lam;
       I=eye(L);
       sigma=sign(x(1));
       w=x+sigma*I(:,1);
       theta=1/2*norm(w,2)^2;
       T=I-w*w'/theta;
       T(:,1)=[];
       U=T;
       A=U'*A*U;
       L=length(A);
    end
    lambda
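The size of the error quoted above can be checked against MATLAB's built-in eig. This is a small illustrative sketch; the vector approx simply holds the three values reported in the solution.

```matlab
% Compare the deflation results of problem 11 with eig.
A = [2 -1 0; -1 2 -1; 0 -1 2];
exact  = sort(eig(A), 'descend');      % 2+sqrt(2), 2, 2-sqrt(2)
approx = [3.4142; 2.0372; 0.6243];     % values quoted in the solution
err = abs(approx - exact);
disp([exact, approx, err])
```

The dominant eigenvalue is accurate to the displayed digits, while the two deflated eigenvalues carry the errors of a few hundredths referred to in the text.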

12. We use Householder transformations. Let $A \in \mathbb{C}^{n\times n}$ be an $n$ by $n$ matrix in the form
$$A = \begin{pmatrix} a_{11} & a^{(1)H}\\ a_1 & A^{(1)} \end{pmatrix}, \qquad A^{(1)} \in \mathbb{C}^{(n-1)\times(n-1)}.$$
Construct a Householder transformation $H_1'$, such that $H_1' a_1 = -\sigma_1 e_1$, and let
$$H_1 = \begin{pmatrix} 1 & 0\\ 0 & H_1' \end{pmatrix}.$$
Then
$$H_1 A H_1 = \begin{pmatrix} a_{11} & h_1^H\\ \sigma & A_1 \end{pmatrix},$$
where $\sigma = (-\sigma_1, 0, \cdots, 0)^H$. With $A_1$ replacing $A$, continue the above process. After $n - 2$ steps, we obtain the upper Hessenberg matrix $H = H_{n-2} \cdots H_1 A H_1 \cdots H_{n-2}$. Letting $Q = H_1 H_2 \cdots H_{n-2}$, we have $A = QHQ^H$. Note that
$$h_1^H = a^{(1)H}H_1' = a^{(1)H} - \theta_1^{-1}\bigl(a^{(1)H}w^{(1)}\bigr)w^{(1)H},$$
where $w^{(1)}$ is the vector $u$ and $a_1$ is the vector $x$ in Lemma 3.4 on page 135. We also have
$$
\begin{aligned}
A_1 = H_1' A^{(1)} H_1' &= H_1'\bigl\{A^{(1)} - \theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H}\bigr\}\\
&= A^{(1)} - \theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H} - \theta_1^{-1}w^{(1)}\bigl(w^{(1)H}A^{(1)}\bigr)\\
&\quad + \theta_1^{-2}\,w^{(1)}\bigl[w^{(1)H}\bigl(A^{(1)}w^{(1)}\bigr)\bigr]w^{(1)H}.
\end{aligned}
$$
Computation of $h_1^H$ requires $O(n-1)$ multiplications. Since we don't actually store $H_1'$ as a matrix, but we only store $w^{(1)}$, computation of $H_1'$ only involves computing $w^{(1)}$, which requires $O(n-1)$ multiplications. Computation of $A_1$ requires the following matrix-vector multiplications and outer products, each of which requires $(n-1)^2$ multiplications:
$$
\begin{aligned}
v_1 &= A^{(1)}w^{(1)},\\
M_2 &= v_1 w^{(1)H},\\
v_3 &= w^{(1)H}A^{(1)},\\
\alpha_4 &= w^{(1)H}v_1,\\
M_5 &= w^{(1)}w^{(1)H},
\end{aligned}
$$
while the remaining operations in computing $A_1$ only require $O(n-1)$ multiplications. Thus, there is a total of $5(n-1)^2 + O(n-1)$ multiplications. Following the recursion with $A_1$ replacing $A$, the total number of multiplications for all $n - 2$ steps is thus
$$\sum_{i=1}^{n-2} 5(i-1)^2 + \sum_{i=1}^{n-2} O(i-1) = \frac{5}{6}(n-1)^3 + O(n^2).$$

13. (The analysis of this problem is based on the analysis of problem 12, and we will use the same notation.) Grouping factors differently than in problem 12, we use the fact that $w^{(1)}w^{(1)H}A^{(1)}$ and $w^{(1)H}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H}$ are Hermitian to rewrite $A_1$ as
$$
\begin{aligned}
A_1 &= A^{(1)} - \theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H} - \theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H} + \theta_1^{-2}\,w^{(1)}\bigl[w^{(1)H}\bigl(A^{(1)}w^{(1)}\bigr)\bigr]w^{(1)H}\\
&= A^{(1)} - 2\theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H} + \theta_1^{-2}\,w^{(1)}\bigl[w^{(1)H}\bigl(A^{(1)}w^{(1)}\bigr)\bigr]w^{(1)H}\\
&= A^{(1)} - 2\theta_1^{-1}\bigl(A^{(1)}w^{(1)}\bigr)w^{(1)H} + \theta_1^{-2}\bigl(w^{(1)H}A^{(1)}w^{(1)}\bigr)w^{(1)}w^{(1)H}.
\end{aligned}
$$
The computations thus reduce to
$$
\begin{aligned}
v_1 &= A^{(1)}w^{(1)},\\
M_2 &= w^{(1)}w^{(1)H},\\
M_3 &= v_1 w^{(1)H},\\
M_4 &= \bigl(w^{(1)H}A^{(1)}w^{(1)}\bigr)w^{(1)}w^{(1)H}.
\end{aligned}
$$
The remaining operations are $O(n-1)$ (except for the matrix addition). Thus, the number of multiplications is reduced from $5(n-1)^2 + O(n-1)$ to $4(n-1)^2 + O(n-1)$, so the total number of multiplications for all $n - 2$ steps of the recursion is reduced from $\frac{5}{6}(n-1)^3 + O\bigl((n-1)^2\bigr)$ to $\frac{2}{3}(n-1)^3 + O\bigl((n-1)^2\bigr)$.

14. First, $A_\nu - \mu_\nu I = Q_\nu R_\nu$ and $A_\nu = \mu_\nu I + Q_\nu R_\nu$. Also, we may rewrite $A$ as $A = (a_1, a_2, \cdots, a_n)$, $Q_\nu = (q_1, q_2, \cdots, q_n)$ and $R = (r_{ij})_{n\times n}$, where $a_i$ and $q_i$ are the $i$-th columns of $A$ and $Q_\nu$ respectively. Comparing both sides of $A_\nu = \mu_\nu I + Q_\nu R_\nu$,
$$
\begin{aligned}
a_1 &= r_{11}q_1 + \mu_\nu e_1,\\
a_2 &= r_{12}q_1 + r_{22}q_2 + \mu_\nu e_2,\\
&\;\;\vdots\\
a_n &= r_{1n}q_1 + r_{2n}q_2 + \cdots + r_{nn}q_n + \mu_\nu e_n.
\end{aligned}
$$

We can see from above that $Q_\nu$ is an upper Hessenberg matrix if $A_\nu$ is. Also, since $R_\nu$ is an upper triangular matrix, $A_{\nu+1} = R_\nu Q_\nu + \mu_\nu I$ is a Hessenberg matrix.

15. The QR method is outlined in (a), (b), and (c) on pp. 308–309. In (a), computing the origin shift does not require computation. (See the explanation on page 312 above Remark 5.23.) In part (b), a QR factorization is performed. If we perform it with Givens rotations, we see that at the $k$-th step, we need to apply $n - k$ rotations to $n - k$ rows, for $O((n-k)^2)$ operations, and summing from $k = n$ to 2 gives a total of $O(n^3)$ operations. If $A$ is upper Hessenberg, we only need to apply 1 rotation on each of the $n - k$ rows, for $O(n-k)$ rotations on the $k$-th step, for a total of $O(n^2)$ rotations. If $A$ is tridiagonal (e.g. if it is Hessenberg and Hermitian), only one rotation is required per step, for a total of $O(n)$ operations over the $n - 1$ steps. Students can fill in the details of this.

16. By hypothesis, $A$ has eigenvalues $\lambda_1 = 2$, $\lambda_2 = 4$, $\lambda_3 = 6$ with the corresponding eigenvectors $v_1$, $v_2$ and $v_3$. Thus the matrix $(A - 3I)^{-1}$ has eigenvalues $-1$, $1$ and $\frac{1}{3}$ with corresponding eigenvectors $v_1$, $v_2$ and $v_3$. And $(A - 5I)^{-1}$ has eigenvalues $-\frac{1}{3}$, $-1$ and $1$ with corresponding eigenvectors $v_1$, $v_2$ and $v_3$. Let $B = -(A - 3I)^{-1}(A - 5I)^{-1}$. Then, since $x_{k+1} = -(A - 3I)^{-1}(A - 5I)^{-1}x_k$,
$$x_k = Bx_{k-1} = B^2x_{k-2} = \cdots = B^kx_0 = B^k(v_1 + v_2 + v_3).$$
Note that
$$B(v_1 + v_2 + v_3) = -\frac{1}{3}v_1 + v_2 - \frac{1}{3}v_3,$$
so
$$x_k = \left(-\frac{1}{3}\right)^k v_1 + v_2 + \left(-\frac{1}{3}\right)^k v_3.$$
Therefore,
$$
\begin{aligned}
\|x_k - v_2\| &= \left\|\left(-\frac{1}{3}\right)^k v_1 + \left(-\frac{1}{3}\right)^k v_3\right\|\\
&= \left(\frac{1}{3}\right)^k \|v_1 + v_3\|\\
&\le \frac{1}{3^k}\bigl(\|v_1\| + \|v_3\|\bigr) \le \frac{c}{3^k},
\end{aligned}
$$
where $c$ is an upper bound on $\|v_1\| + \|v_3\|$.
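The convergence rate derived in problem 16 can be observed numerically. The sketch below uses an illustrative matrix of our own construction: the columns of V serve as $v_1$, $v_2$, $v_3$, and $A$ is assembled so that its eigenvalues are 2, 4 and 6; neither V nor the number of steps K comes from the text.

```matlab
% Iterate x_{k+1} = -(A - 3I)^{-1} (A - 5I)^{-1} x_k and check the 3^(-k) rate.
V = [1 1 0; 0 1 1; 1 0 1];        % columns v1, v2, v3 (illustrative choice)
A = V*diag([2 4 6])/V;            % A*v_i = lambda_i*v_i with lambda = 2, 4, 6
I = eye(3);
x = sum(V, 2);                    % x_0 = v1 + v2 + v3
K = 10;                           % number of steps, chosen for illustration
for k = 1:K
    x = -((A - 3*I) \ ((A - 5*I) \ x));
end
scaled_err = norm(x - V(:,2)) * 3^K;   % should be close to norm(v1 + v3)
disp([scaled_err, norm(V(:,1) + V(:,3))])
```

The two displayed numbers should agree to several digits, matching the identity $\|x_k - v_2\| = 3^{-k}\|v_1 + v_3\|$.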

17. (a) We have
$$
\begin{aligned}
AK &= A(k_1, k_2, \cdots, k_n)\\
&= A(b, Ab, \cdots, A^{n-1}b)\\
&= (Ab, A^2b, \cdots, A^nb)\\
&= (k_2, k_3, \cdots, k_n, A^nb)\\
&= (Ke_2, Ke_3, \cdots, Ke_n, A^nb)\\
&= K(e_2, e_3, \cdots, e_n, K^{-1}A^nb)\\
&= K(e_2, e_3, \cdots, e_n, -c).
\end{aligned}
$$
Hence, $K^{-1}AK = (e_2, e_3, \cdots, e_n, -c) = C$.

(b) $A$ and $K^{-1}AK$ are similar, and it is well-known that similar matrices have the same eigenvalues. In particular, if $\lambda$ is an eigenvalue of $A$ with eigenvector $v$, applying $K^{-1}AK$ on the left to $K^{-1}v$ verifies that $K^{-1}v$ is an eigenvector of $K^{-1}AK$ with eigenvalue $\lambda$.

(c) We will prove this by induction. For $n = 2$,
$$C = (e_2, -c) = \begin{pmatrix} 0 & -c_1\\ 1 & -c_2 \end{pmatrix},$$
so $\det(C - \lambda I) = \lambda^2 + c_2\lambda + c_1 = p(\lambda)$. Thus it is true for $n = 2$. Suppose that it is true for $n = k - 1$. That is, for $C$ a $(k-1) \times (k-1)$ matrix, $\det(C - \lambda I) = (-1)^{k-1}\bigl(\lambda^{k-1} + \sum_{i=1}^{k-1} c_i\lambda^{i-1}\bigr)$. Then, for $n = k$, we expand $\det(C - \lambda I)$, $C \in \mathbb{C}^{k\times k}$, by minors along the first row, to obtain
$$
\begin{aligned}
\det(C - \lambda I) &= (-\lambda)(-1)^{k-1}\Bigl(\lambda^{k-1} + \sum_{i=1}^{k-1} c_{i+1}\lambda^{i-1}\Bigr) + (-c_1)(-1)^{k-1}\\
&= (-1)^k\Bigl(\lambda^k + \sum_{i=1}^{k-1} c_{i+1}\lambda^i + c_1\Bigr)\\
&= (-1)^k\Bigl(\lambda^k + \sum_{i=1}^{k} c_i\lambda^{i-1}\Bigr).
\end{aligned}
$$
This ends the proof.

(d) In this problem, $K$ is any $n \times n$ matrix. From (a), (b), and (c), we can see that finding the eigenvalues of an $n \times n$ nonsingular matrix is equivalent to finding the roots of a polynomial of degree $n$. However, from Galois theory, we know that there cannot exist formulas for quintic or higher degree polynomial equations, so, generally, the eigenvalues of an $n \times n$ matrix with $n \ge 5$ cannot be computed in a finite number of steps.
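Parts (a) through (c) of problem 17 can be illustrated on a small example. The sketch below is only a numerical illustration: the matrix is the one from problem 1, and the starting vector b is an arbitrary choice for which the Krylov matrix K is nonsingular.

```matlab
% Build K = (b, Ab, A^2 b), form C = K^{-1} A K, and compare the roots of
% p(lambda) = lambda^3 + c3*lambda^2 + c2*lambda + c1 with the eigenvalues of A.
A = [2 -1 0; -1 2 -1; 0 -1 2];
b = [1; 0; 0];                     % illustrative starting vector
K = [b, A*b, A*A*b];               % Krylov matrix
C = K \ (A*K);                     % C = K^{-1} A K = (e2, e3, -c)
c = -C(:, 3);                      % last column of C is -c
p = [1, c(3), c(2), c(1)];         % polynomial coefficients, highest power first
r = sort(real(roots(p)));          % roots are real here, since A is symmetric
e = sort(eig(A));
disp([r, e])
```

For this example $c = (-4, 10, -6)^T$, so $p(\lambda) = \lambda^3 - 6\lambda^2 + 10\lambda - 4 = (\lambda - 2)(\lambda^2 - 4\lambda + 2)$, whose roots are $2$ and $2 \pm \sqrt{2}$.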

Chapter 6

1. Consider the two point Gauss-Legendre quadrature rule using the quadrature formula obtained from Table 6.1 (on page 349)
$$\int_{-1}^{1} f(x)\,dx = 2\sum_{j=0}^{m} \alpha_j f(z_j) + E(f),$$
where $\alpha_1 = \alpha_2 = \frac{1}{2}$, $z_1 = -\frac{1}{\sqrt{3}}$, $z_2 = \frac{1}{\sqrt{3}}$, $m = 1$. By Theorem 6.2, the error is
$$E(f) = \frac{f^{(4)}(\xi)}{4!}\int_{-1}^{1} p_2^2(x)\,dx.$$
Since $\{p_k(x)\}_{k=0}^{\infty}$ is the associated sequence of orthogonal polynomials generated by (6.33), we know by a previous result (on page 202) that $p_2(x) = x^2 - \frac{1}{3}$. Thus,
$$E(f) = \frac{1}{24}f^{(4)}(\xi)\int_{-1}^{1}\left(x^2 - \frac{1}{3}\right)^2 dx = \frac{1}{24}\cdot f^{(4)}(\xi)\cdot\frac{8}{45}.$$
Therefore, $E(f) = \frac{1}{24}\cdot f^{(4)}(\xi)\cdot\frac{8}{45} = \frac{1}{135}f^{(4)}(\xi)$. This yields
$$\int_{-1}^{1} f(x)\,dx = f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right) + \frac{1}{135}f^{(4)}(\xi).$$
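The error constant can be confirmed with a test function whose fourth derivative is constant. The sketch below uses $f(x) = x^4$, chosen by us for illustration, so that $f^{(4)}(\xi) = 24$ regardless of $\xi$ and the error formula predicts $24/135 = 8/45$ exactly.

```matlab
% Two-point Gauss-Legendre rule applied to f(x) = x^4 on [-1, 1].
f = @(x) x.^4;
z = [-1 1]/sqrt(3);          % nodes -1/sqrt(3) and 1/sqrt(3)
Q = sum(f(z));               % f(-1/sqrt(3)) + f(1/sqrt(3)) = 2/9
I_exact = 2/5;               % exact integral of x^4 over [-1, 1]
E_actual  = I_exact - Q;     % observed error
E_formula = 24/135;          % (1/135) * f''''(xi) with f'''' = 24
disp([E_actual, E_formula])
```

Both numbers equal $8/45 \approx 0.1778$, in agreement with the derivation above.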

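2. Suppose we have a finite-difference approximation method where the roundoff error is $O(\epsilon/h)$ and the truncation error is $O(h^n)$, so the total error is modeled by
$$E(h) = O\left(\frac{\epsilon}{h}\right) + O(h^n) \approx M\cdot\frac{\epsilon}{h} + N\cdot h^n,$$
for some constants $M$ and $N$. The minimum error occurs at $E'(h) = 0$, which gives $-M\cdot\dfrac{\epsilon}{h^2} + nNh^{n-1} = 0$. Hence, $-M\epsilon + nNh^{n+1} = 0$. Solving for $h$ gives
$$h = h_{opt} = \left(\frac{M\epsilon}{nN}\right)^{\frac{1}{n+1}} = \left(\frac{M}{nN}\right)^{\frac{1}{n+1}}\cdot\epsilon^{\frac{1}{n+1}} = O\bigl(\epsilon^{\frac{1}{n+1}}\bigr).$$
We thus get a minimal bound of the error as follows:
$$
\begin{aligned}
E(h_{opt}) &= M\frac{\epsilon}{h_{opt}} + Nh_{opt}^n\\
&= M\epsilon\left(\frac{nN}{M\epsilon}\right)^{\frac{1}{n+1}} + N\left(\frac{M\epsilon}{nN}\right)^{\frac{n}{n+1}}\\
&= M^{\frac{n}{n+1}}\cdot(nN)^{\frac{1}{n+1}}\cdot\epsilon^{\frac{n}{n+1}} + N^{\frac{1}{n+1}}\cdot M^{\frac{n}{n+1}}\cdot n^{-\frac{n}{n+1}}\cdot\epsilon^{\frac{n}{n+1}}\\
&= M^{\frac{n}{n+1}}N^{\frac{1}{n+1}}\left(n^{\frac{1}{n+1}} + n^{-\frac{n}{n+1}}\right)\cdot\epsilon^{\frac{n}{n+1}}\\
&= O\bigl(\epsilon^{\frac{n}{n+1}}\bigr).
\end{aligned}
$$
Thus, the optimal $h$ is $O\bigl(\epsilon^{\frac{1}{n+1}}\bigr)$ and the minimum achievable error bound is $O\bigl(\epsilon^{\frac{n}{n+1}}\bigr)$.

3. (a) Assuming that the roundoff error obeys $|e(x)| \le \epsilon|f(x)|$ for some relative $\epsilon$, that $|f(x)| \le M_0$ for some constant $M_0$, and that $|f^{(5)}(x)| \le M_5$ for some constant $M_5$, for all values of $x$ near $x_0$, we get the bound for (6.8) to be:
$$E(h) = \frac{3\epsilon M_0}{2h} + \frac{h^4M_5}{30}.$$

(b) For optimal $h$, set $E'(h) = 0$, which yields
$$h_{opt} = \sqrt[5]{\frac{45\epsilon M_0}{4M_5}}, \qquad E(h_{opt}) = \frac{15}{8}\sqrt[5]{\frac{4}{45}\,\epsilon^4M_0^4M_5}.$$

(c) With $f(x) = \ln x$ and $x_0 = 3$, (6.8) gives
$$
\begin{array}{l|l}
h & \text{computed } f'(3)\\
\hline
10^{-1} & 0.33333300280405\\
10^{-2} & 0.33333333330040\\
10^{-3} & 0.33333333333332\\
10^{-4} & 0.33333333333441\\
10^{-5} & 0.33333333335217\\
10^{-6} & 0.33333333336142\\
10^{-7} & 0.33333333278781\\
10^{-8} & 0.33333332390602\\
10^{-9} & 0.33333332390602\\
10^{-10} & 0.33333299083912\\
10^{-11} & 0.33332966017004\\
10^{-12} & 0.33341847801201
\end{array}
$$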

(d) Our computations predict the optimal $h$ to be $O(\sqrt[5]{\epsilon})$ and the minimum achievable error bound to be $O(\sqrt[5]{\epsilon^4})$. In the table, double precision IEEE arithmetic was used, such as can be done in MATLAB. Thus, the unit roundoff is $\epsilon \approx 2 \times 10^{-16}$, so $\epsilon^{1/5} \approx 10^{-3}$ and $\epsilon^{4/5} \approx 10^{-12}$. In this problem, a relative error of $10^{-12}$ corresponds to 12 or more of the displayed digits being correct. This is all consistent with rows 2, 3, and 4 of the above table, so our model of the total error gives good predictions.

4. (a) Assuming that $|e(x)| \le \epsilon|f(x)|$ for some relative $\epsilon$, that $|f(x)| \le M_0$ for some constant $M_0$, and that $|f^{(5)}(x)| \le M_5$ for some constant $M_5$, for all values of $x$ near $x_0$, we get the bound for (6.9) to be:
$$E(h) = \frac{32\epsilon M_0}{3h} + \frac{h^4M_5}{5}.$$

(b) For optimal $h$, set $E'(h) = 0$, which yields
$$h_{opt} = \sqrt[5]{\frac{40\epsilon M_0}{3M_5}}, \qquad E(h_{opt}) = \sqrt[5]{\left(\frac{40}{3}\right)^4\epsilon^4M_0^4M_5}.$$

(c) Approximating $f'(3)$ for $f(x) = \ln x$ using (6.9) yields:
$$
\begin{array}{l|l}
h & \text{computed } f'(3)\\
\hline
10^{-1} & 0.33333181879849\\
10^{-2} & 0.33333333314109\\
10^{-3} & 0.33333333333426\\
10^{-4} & 0.33333333334144\\
10^{-5} & 0.33333333340583\\
10^{-6} & 0.33333333360197\\
10^{-7} & 0.33333333722870\\
10^{-8} & 0.33333332760677\\
10^{-9} & 0.33333339792089\\
10^{-10} & 0.33333373098780\\
10^{-11} & 0.33333706165687\\
10^{-12} & 0.33343698172909
\end{array}
$$

(d) As in Problem 3, we predict the optimal h to be $O(\sqrt[5]{\epsilon})$ and the minimum achievable error bound to be $O(\sqrt[5]{\epsilon^4})$. Also as in Problem 3, we see that we achieve roughly 12 digits of accuracy with $h = 10^{-3}$, consistent with our predictions. However, the minimum error is somewhat larger. This is consistent with the fact that, in (6.9), the coefficient of the $h^4$ term is larger than in (6.8).

5. An answer is provided in the appendix to the book.

6. An answer is provided in the appendix to the book.

7. See the errata: A closed-form formula for the n-th term may not be practical. The degree-3 term is

$$-(u')^3\cos(u) - 3u'u''\sin(u) + u'''\cos(u),$$

while the degree-4 term is

$$\sin(u)\,u'^4 - 6\cos(u)\,u''u'^2 - 4\sin(u)\,u^{(3)}u' - 3\sin(u)\,u''^2 + \cos(u)\,u^{(4)}.$$

It is not hard to program a computer to compute the n-th coefficient symbolically. In fact, we obtained the degree-4 term above in a single statement using Mathematica, and a simpler program just for the purpose can be written by computer-savvy students. However, writing such a program will be somewhat more work than a usual assigned problem.

8. A similar problem arises here as in problem 7; see the errata at http://interval.louisiana.edu/Classical-and-Modern-NA/errata.pdf The components of the degree-4 Taylor object are

$$
\begin{aligned}
&u^n,\\
&n u^{n-1}u',\\
&(n-1)n\,u'^2u^{n-2} + n\,u''u^{n-1},\\
&(n-2)(n-1)n\,u'^3u^{n-3} + 3(n-1)n\,u'u''u^{n-2} + n\,u^{(3)}u^{n-1},\\
&(n-3)(n-2)(n-1)n\,u'^4u^{n-4} + 6(n-2)(n-1)n\,u'^2u''u^{n-3} + 3(n-1)n\,u''^2u^{n-2}\\
&\qquad + 4(n-1)n\,u'u^{(3)}u^{n-2} + n\,u^{(4)}u^{n-1}.
\end{aligned}
$$

(The above was obtained with Mathematica.) Computer-savvy students, especially those with a strong computer science background, can write a program that computes the general n-th term symbolically.

9. To find $\partial f/\partial x_1$, we simply solve the linear system

$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
2v_1 & 0 & -1 & 0 & 0\\
0 & 2v_2 & 0 & -1 & 0\\
0 & 0 & 1 & -1 & -1
\end{pmatrix}
\begin{pmatrix} v_1'\\ v_2'\\ v_3'\\ v_4'\\ v_5' \end{pmatrix}
=
\begin{pmatrix} 1\\ 0\\ 0\\ 0\\ 0 \end{pmatrix}
$$

for $v'$. Doing it with matlab, we obtain

    >> A = [1 0 0 0 0 ;
    0 1 0 0 0
    2 0 -1 0 0
    0 4 0 -1 0
    0 0 1 -1 -1]
    A =
         1     0     0     0     0
         0     1     0     0     0
         2     0    -1     0     0
         0     4     0    -1     0
         0     0     1    -1    -1
    >> b = [1;0;0;0;0]
    b =
         1
         0
         0
         0
         0
    >> f1 = A\b
    f1 =
         1
         0
         2
         0
         2
    >>

$\partial f/\partial x_1(1, 2)$ is now the last entry in the vector f1, namely, 2. To get $\partial f/\partial x_2$, we follow the above with this computation:

    >> b = [0;1;0;0;0]
    b =
         0
         1
         0
         0
         0
    >> f2 = A\b
    f2 =
         0
         1
         0
         4
        -4
    >>

Thus, $\partial f/\partial x_2(1, 2) = -4$, the last component of f2. There are some other things that the instructor may note for the students. First, in an efficient implementation, we use the fact that the system is lower triangular and very sparse. Second, directional derivatives can be computed just as easily, with the same amount of work that it takes to compute a single partial derivative, regardless of the number of variables. For example, the following matlab dialog will compute an approximate directional derivative in the direction $u = (1/\sqrt{2}, 1/\sqrt{2})^T$.

    >> b = [1/sqrt(2);1/sqrt(2);0;0;0]
    b =
        0.7071
        0.7071
             0
             0
             0
    >> D_u_f = A\b
    D_u_f =
        0.7071
        0.7071
        1.4142
        2.8284
       -1.4142
    >>

This shows that the directional derivative is $D_u(f)(1, 2) \approx -1.4142$.

10. We just did this at the end of the answer to problem 9.
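The remark about exploiting the lower triangular structure can be illustrated with a short Python sketch. It is not the book's code: `forward_substitute` is a hypothetical helper that solves the same 5-by-5 system by forward substitution, using the numerical matrix from the matlab dialog above.

```python
import math


def forward_substitute(A, b):
    """Solve A x = b for a lower triangular A with nonzero diagonal entries."""
    x = []
    for i, row in enumerate(A):
        partial = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - partial) / row[i])
    return x


# Matrix from the dialog above (the system evaluated at x = (1, 2)).
A = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [2, 0, -1, 0, 0],
    [0, 4, 0, -1, 0],
    [0, 0, 1, -1, -1],
]

# The last component of each solution is the requested derivative.
df_dx1 = forward_substitute(A, [1, 0, 0, 0, 0])[-1]
df_dx2 = forward_substitute(A, [0, 1, 0, 0, 0])[-1]
s = 1 / math.sqrt(2)
d_u = forward_substitute(A, [s, s, 0, 0, 0])[-1]
print(df_dx1, df_dx2, d_u)
```

Each solve costs one pass over the rows, so a directional derivative is no more expensive than a single partial derivative.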


11. First, since f is continuous on the closed interval [a, b], f is uniformly continuous on that interval, i.e., for any $\epsilon > 0$, there is a $\delta > 0$ such that for $x, y \in [a, b]$, $|y - x| < \delta$ implies $|f(y) - f(x)| < \epsilon$.

12. Let the error $E_m(f) = J(f) - Q_m(f)$. Consider any polynomial $P_m$ of degree m that satisfies $E_m(P_m) = 0$. We then have $J(P_m) = Q_m(P_m)$, which then yields

$$\int_a^b \rho(x)P_m(x)\,dx = \sum_{i=1}^{m} \alpha_i P_m(x_i).$$

Also, from the discussion on Gaussian quadrature, let $\sum_{i=0}^{m}|\alpha_i| < c$, where c is a positive constant independent of m and $a \le x_i \le b$ for $0 \le i \le m$. Moreover, if f(x) = 1, then

$$\int_a^b \rho(x)\,dx = \sum_{i=0}^{m}|\alpha_i| < c.$$

Now consider $\|E_m(f)\|_\infty$. We have

$$
\begin{aligned}
\|Q_m(f) - J(f)\|_\infty &= \max_{a\le x\le b}\,|Q_m(f) - J(f)|\\
&= \max_{a\le x\le b}\left|\sum_{i=1}^{m}\alpha_i f(x_i) - \int_a^b \rho(x)f(x)\,dx\right|\\
&= \max_{a\le x\le b}\left|\sum_{i=1}^{m}\alpha_i\bigl(f(x_i) - P_m(x_i)\bigr) - \int_a^b \rho(x)\bigl(f(x) - P_m(x)\bigr)\,dx\right|\\
&< 2c\epsilon,
\end{aligned}
$$

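where we have used the triangle inequality and the Weierstrass approximation theorem (page 205). (That is, we have chosen m so that there is a degree m polynomial such that $|P_m(x) - f(x)| < \epsilon$ for every $x \in [a, b]$.) The result follows.

13. (a) Define $F(x) = \int f(x)\,dx$. Then

$$\int_a^b f(x)\,dx = F(b) - F(a) = F\!\left(x_i + \frac{h}{2}\right) - F\!\left(x_i - \frac{h}{2}\right),$$

where $x_i = (a + b)/2$ and $h = b - a$. Note that

$$
\begin{aligned}
F\!\left(x_i + \frac{h}{2}\right) &= F(x_i) + \frac{h}{2}F'(x_i) + \left(\frac{h}{2}\right)^2\frac{F''(x_i)}{2} + \left(\frac{h}{2}\right)^3\frac{F'''(\xi_1)}{6},\\
F\!\left(x_i - \frac{h}{2}\right) &= F(x_i) - \frac{h}{2}F'(x_i) + \left(\frac{h}{2}\right)^2\frac{F''(x_i)}{2} - \left(\frac{h}{2}\right)^3\frac{F'''(\xi_2)}{6}.
\end{aligned}
$$

Hence,

$$F\!\left(x_i + \frac{h}{2}\right) - F\!\left(x_i - \frac{h}{2}\right) = hF'(x_i) + \frac{h^3}{48}\left[F'''(\xi_1) + F'''(\xi_2)\right].$$

Using these with $h = b - a$ and $x_i = (a + b)/2$, we have

$$\int_a^b f(x)\,dx = (b - a)\,f\!\left(\frac{b + a}{2}\right) + \frac{h^3}{24}\cdot\frac{1}{2}\left[F'''(\xi_1) + F'''(\xi_2)\right].$$

Using the Intermediate Value Theorem then yields the result.

(b) Note that

$$\left|\int_a^b f(x)\,dx - \sum_{j=0}^{N-1}\frac{b-a}{N}\,f\!\left(\frac{a_j + a_{j+1}}{2}\right)\right| = \left|\sum_{j=0}^{N-1}\left[\int_{a_j}^{a_{j+1}} f(x)\,dx - \frac{b-a}{N}\,f\!\left(\frac{a_j + a_{j+1}}{2}\right)\right]\right|.$$

Since $b - a = N(a_{j+1} - a_j)$, using part (a) gives

$$
\begin{aligned}
\left|\int_a^b f(x)\,dx - \sum_{j=0}^{N-1}\frac{b-a}{N}\,f\!\left(\frac{a_j + a_{j+1}}{2}\right)\right|
&\le \sum_{j=0}^{N-1}\frac{(a_{j+1} - a_j)^3}{24}M\\
&= \sum_{j=0}^{N-1}\frac{(b-a)^3}{24N^3}M\\
&= \frac{(b-a)^3 M}{24N^3}\sum_{j=0}^{N-1} 1\\
&= \frac{(b-a)^3}{24N^2}M.
\end{aligned}
$$

14. Note that $E_n(P_n) = 0$ yields

$$\int_0^1 P_n(x)\,dx = \sum_{i=1}^{m} w_i P_n(x_i).$$

Consider

$$
\begin{aligned}
\|E_n(f)\|_\infty &= \max_{a\le x\le b}\left|\int_0^1 f(x)\,dx - \sum_{i=1}^{m} w_i f(x_i)\right|\\
&= \max_{a\le x\le b}\left|\sum_{i=1}^{m} w_i\bigl(f(x_i) - P_n(x_i)\bigr) - \int_0^1\bigl(f(x) - P_n(x)\bigr)\,dx\right|\\
&\le 2\|f(x) - P_n(x)\|_\infty,
\end{aligned}
$$

where we have used the triangle inequality. The proof now follows by using the Weierstrass approximation theorem (page 205).

15. This problem is erroneously stated, and should be omitted. (See the errata at http://interval.louisiana.edu/Classical-and-Modern-NA/errata.pdf)

16. Given that

$$\int_0^h f(x)\,dx = h\left[A f(0) + B f\!\left(\frac{h}{3}\right) + C f(h)\right]:$$

(a) Since this is exact for degree $\le 2$, plug f(x) = 1, f(x) = x, and $f(x) = x^2$ into the formula to get

$$A + B + C = 1, \qquad \frac{B}{3} + C = \frac{1}{2}, \qquad \frac{B}{9} + C = \frac{1}{3}.$$

Solving these gives A = 0, B = 3/4, C = 1/4.

(b) With h = 2, the trapezoidal rule gives

$$\int_0^2 f(x)\,dx = \frac{2}{2}\left[f(0) + f(2)\right] = \frac{1}{2},$$

whence

$$f(0) + f(2) = \frac{1}{2}.$$

The quadrature rule given in part (a) for h = 2 gives

$$\int_0^2 f(x)\,dx = 2\left[\frac{3}{4}f\!\left(\frac{2}{3}\right) + \frac{1}{4}f(2)\right] = \frac{1}{4}.$$

Plugging f(0) = 3 into the first equation gives a value for f(2), which, when plugged into the second equation, gives f(2/3) = 1.

17.

$$
\begin{aligned}
\left|\int_0^1 f(x)\,dx - \sum_{i=0}^{N-1} f(x_i)h\right| &= \left|\sum_{i=0}^{N-1}\int_{x_i}^{x_{i+1}}\bigl(f(x) - f(x_i)\bigr)\,dx\right|\\
&= \left|\sum_{i=0}^{N-1}\int_{x_i}^{x_{i+1}} f'(\xi)(x - x_i)\,dx\right|,
\end{aligned}
$$

where $\xi \in (x_i, x_{i+1})$. Hence,

$$
\begin{aligned}
\left|\int_0^1 f(x)\,dx - \sum_{i=0}^{N-1} f(x_i)h\right| &\le \max_{0\le x\le 1}|f'(x)|\sum_{i=0}^{N-1}\int_{x_i}^{x_{i+1}}(x - x_i)\,dx\\
&= \max_{0\le x\le 1}|f'(x)|\;N\frac{h^2}{2}\\
&= \frac{h}{2}\max_{0\le x\le 1}|f'(x)|.
\end{aligned}
$$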
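The bound in problem 17 is easy to check numerically. The following Python sketch is only an illustration: the test function $f(x) = e^x$ is an arbitrary choice (not taken from the text), and `left_riemann` is a hypothetical helper name.

```python
import math


def left_riemann(f, n):
    """Left Riemann sum of f on [0, 1] with n equal subintervals."""
    h = 1.0 / n
    return h * sum(f(i * h) for i in range(n))


exact = math.e - 1.0        # integral of exp(x) over [0, 1]
max_fprime = math.e         # max of |f'(x)| = exp(x) on [0, 1]

results = []
for n in (10, 100, 1000):
    h = 1.0 / n
    err = abs(exact - left_riemann(math.exp, n))
    bound = 0.5 * h * max_fprime
    results.append((n, err, bound))
    print(f"N = {n:5d}   error = {err:.3e}   bound (h/2) max|f'| = {bound:.3e}")
```

In each row the observed error should stay below the bound and shrink roughly in proportion to h.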

18. An answer is provided in the appendix to the book. 19. An answer is provided in the appendix to the book. 20. Note that we are being asked to approximately evaluate 1 1 + √ erf(1), 2 2 2 where “erf” denotes the error function, common in statistics. We know that   Z 0 Z ∞ 1 1 1 1 −x2 −x2 √ √ e dx = e dx = , 2 2 2π −∞ 2π −∞ since Z



−∞

2 2 e−x dx

= =

Z





e−(x

−∞ −∞ Z 2π Z ∞

2

+y 2 )

dxdy

2

re−r drdθ

0

=

Z

0

2π.

The problem thus is related to evaluating Z 1 2 2 erf(1) = √ e−x dx. π 0

R1 2 The integral 0 e−x dx in turn can be evaluated approximately by the two-point Gaussian quadrature rule derived at the top of page 344, with a = 0, b = 1. We obtain Z 1 i √ 2 2 1 h −[(−1/√3+1)/2]2 e−x dx ≈ e + e−[(1/ 3+1)/2] ≈ 0.5814, 2 0 √ with 0.5814/ 2π ≈ 0.2320. Thus, the approximation to the given integral is about 0.5 + 0.2320 ≈ 0.7320. 99

R1 R1 2 2 Computing 0 e−x dx using matlab’s error function, we get 0 e−x dx ≈ 0.2979, for a total integral of about 0.7979, a discrepancy of only about 0.7979 − 0.7320 ≈ 0.0826, 0.7979

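about 8%.

21. An answer is provided in the appendix to the book.

22. The linear interpolant over $[x_i, x_{i+1}]$ is defined as:

$$\Phi_i(x) = f(x_i)\frac{x - x_{i+1}}{x_i - x_{i+1}} + f(x_{i+1})\frac{x - x_i}{x_{i+1} - x_i}.$$

Hence,

$$
\begin{aligned}
\int_a^b \Phi(x)\,dx &= \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}}\Phi_i(x)\,dx\\
&= \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}}\left[f(x_i)\frac{x - x_{i+1}}{x_i - x_{i+1}} + f(x_{i+1})\frac{x - x_i}{x_{i+1} - x_i}\right]dx\\
&= \sum_{i=0}^{n-1}\frac{x_{i+1} - x_i}{2}\bigl(f(x_i) + f(x_{i+1})\bigr).
\end{aligned}
$$

23. An answer is provided in the appendix to the book.

24. Let $E = \int_0^2 e^{x^2}\,dx$. If the error is proportional to $h^k$, we have

$$
\begin{aligned}
E - 16.50606 &\approx c\left(\tfrac{1}{4}\right)^k,\\
E - 16.45936 &\approx c\left(\tfrac{1}{8}\right)^k,\\
E - 16.45347 &\approx c\left(\tfrac{1}{16}\right)^k.
\end{aligned}
$$

Subtracting these equations and dividing the resulting equations (the constant c cancels), we get

$$\frac{0.0467}{0.00589} \approx \frac{\left(\tfrac{1}{4}\right)^k - \left(\tfrac{1}{8}\right)^k}{\left(\tfrac{1}{8}\right)^k - \left(\tfrac{1}{16}\right)^k} = 2^k.$$

Thus, $2^k \approx 7.929$, so $k \approx 2.99$. (Based on the theory for numerical integration, this result suggests that k = 3.)

25. An answer is provided in the appendix to the book.

26. An answer is provided in the appendix to the book.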
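The order estimate in problem 24 amounts to one logarithm of a ratio of differences. A minimal Python sketch follows; `estimated_order` is a hypothetical name, and the middle value 16.45936 is the one consistent with the quoted differences 0.0467 and 0.00589.

```python
import math


def estimated_order(q_h, q_h2, q_h4):
    """Estimate k from approximations computed with steps h, h/2 and h/4,
    assuming the error behaves like c*h**k."""
    return math.log2((q_h - q_h2) / (q_h2 - q_h4))


k = estimated_order(16.50606, 16.45936, 16.45347)
print(f"estimated order k = {k:.3f}")
```

The exact integral E is never needed, because it cancels when consecutive approximations are subtracted.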


Chapter 7

1. Note that

$$\tilde{y}_{k+1} = \tilde{y}_k + h[f(t_k, \tilde{y}_k) + \epsilon_k] + \rho_k.$$

Subtracting (7.15) from the above equation, we have

$$\tilde{y}_{k+1} - y(t_{k+1}) = \tilde{y}_k - y(t_k) + h[f(t_k, \tilde{y}_k) - f(t_k, y(t_k))] + h\epsilon_k + \rho_k - \frac{h^2}{2}v_k.$$

Hence,

$$
\begin{aligned}
\|\tilde{y}_{k+1} - y(t_{k+1})\| &\le \|\tilde{y}_k - y(t_k)\| + h\|f(t_k, \tilde{y}_k) - f(t_k, y(t_k))\| + h\epsilon + \rho + \frac{h^2}{2}c_1 M\\
&\le (1 + hL)\|\tilde{y}_k - y(t_k)\| + h\epsilon + \rho + \frac{h^2}{2}c_1 M
\end{aligned}
$$

with $\|\tilde{y}_0 - y_0\| = \|e_0\|$. Using Lemma 7.1, by setting $d_k = \|\tilde{y}_k - y(t_k)\|$, $\delta = hL$, $K^* = h\epsilon + \rho + \frac{h^2}{2}c_1 M$, and using $kh \le b - a$ for $0 \le k \le N$, we obtain

$$\max_{0\le k\le N}\|\tilde{y}_k - y(t_k)\| \le \|e_0\|e^{L(b-a)} + \frac{e^{L(b-a)} - 1}{L}\left(\frac{h}{2}c_1 M + \epsilon + \frac{\rho}{h}\right).$$

2. (a) Left Riemann sum.

(b) Clearly, f satisfies a Lipschitz condition with respect to y for all L > 0, since

$$\|f(t, y) - f(t, \tilde{y})\| = \|f(t) - f(t)\| = 0 \le L\|y - \tilde{y}\|.$$

Furthermore, by Theorem 7.3, we find that the error is O(h). Therefore, if the step size is sufficiently small and f is continuous on [0, 1], Euler's method may be appropriate to use in practice. However, the step size may need to be so small that roundoff error overwhelms the computation, and, if f is smoother than just Lipschitz-continuous, a higher-order method may be more practical.

(c) Since f satisfies a Lipschitz condition with respect to y for all L > 0, taking the limit in (7.24) as $L \to 0$, we have

$$\max_{0\le k\le N}\|\tilde{y}_k - y(t_k)\| \le \|e_0\| + \frac{h}{2}c_1 M + \epsilon + \frac{\rho}{h}.$$

Thus, the roundoff error grows like 1/h, and, since Euler's method is first-order, the point of minimum total error (where roundoff and truncation errors balance) gives a minimal error on the order of the square root of the machine epsilon.

3. An answer is provided in the appendix to the book.

4. An answer is provided in the appendix to the book.

5. Here, f(t, y(t)) = f(y) is a function of the dependent variable y only. Using the mean value theorem, we have

$$|f(t, y) - f(t, \tilde{y})| = |b(y - \tilde{y}) + c(\sin(y) - \sin(\tilde{y}))| \le b|y - \tilde{y}| + c|\cos\xi|\,|y - \tilde{y}| \le (b + c)|y - \tilde{y}|;$$

that is, f satisfies a Lipschitz condition in y with Lipschitz constant b + c. Hence, by Theorem 7.3, we obtain the desired result.

6. An answer is provided in the appendix to the book.

7. By hypothesis, we have

$$
\begin{aligned}
y_h(x_i) - y(x_i) &= c_1 h + c_2 h^2 + c_3 h^3 + \cdots,\\
y_{h/2}(x_i) - y(x_i) &= c_1\frac{h}{2} + c_2\frac{h^2}{4} + c_3\frac{h^3}{8} + \cdots,\\
y_{h/3}(x_i) - y(x_i) &= c_1\frac{h}{3} + c_2\frac{h^2}{9} + c_3\frac{h^3}{27} + \cdots.
\end{aligned}
$$

Hence, for any constants $a_1, a_2, a_3$,

$$
\begin{aligned}
a_1 y_h(x_i) &+ a_2 y_{h/2}(x_i) + a_3 y_{h/3}(x_i) - (a_1 + a_2 + a_3)\,y(x_i)\\
&= a_1 c_1 h + a_2 c_1\frac{h}{2} + a_3 c_1\frac{h}{3} + a_1 c_2 h^2 + a_2 c_2\frac{h^2}{4} + a_3 c_2\frac{h^2}{9}\\
&\quad + a_1 c_3 h^3 + a_2 c_3\frac{h^3}{8} + a_3 c_3\frac{h^3}{27} + \cdots.
\end{aligned}
$$

Solving the system of linear equations

$$a_1 + a_2 + a_3 = 1, \qquad a_1 c_1 + \frac{a_2 c_1}{2} + \frac{a_3 c_1}{3} = 0, \qquad a_1 c_2 + \frac{a_2 c_2}{4} + \frac{a_3 c_2}{9} = 0$$

yields $a_1 = \frac{1}{2}$, $a_2 = -4$, $a_3 = \frac{9}{2}$. (The first equation normalizes the weights so that the combination reproduces $y(x_i)$; the other two cancel the h and $h^2$ terms.) Thus,

$$\frac{1}{2}y_h(x_i) - 4y_{h/2}(x_i) + \frac{9}{2}y_{h/3}(x_i) = y(x_i) + O(h^3).$$

Now, let $\hat{y}(a) = \frac{1}{2}y_h(a) - 4y_{h/2}(a) + \frac{9}{2}y_{h/3}(a)$; then $\hat{y}(a)$ is an approximation to y(a) that is accurate to order $h^3$.
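The extrapolation weights 1/2, -4, 9/2 from problem 7 can be tried on a simple test problem. The sketch below is an illustration under stated assumptions: it uses Euler's method on $y' = y$, $y(0) = 1$ over [0, 1] (a test problem chosen here, not taken from the text), and the function names are hypothetical.

```python
import math


def euler_final(n):
    """Euler's method for y' = y, y(0) = 1 on [0, 1] with n equal steps."""
    h = 1.0 / n
    y = 1.0
    for _ in range(n):
        y += h * y
    return y


def extrapolated(n):
    """Combine step sizes h, h/2, h/3 with weights 1/2, -4, 9/2."""
    return 0.5 * euler_final(n) - 4.0 * euler_final(2 * n) + 4.5 * euler_final(3 * n)


err_euler = abs(euler_final(10) - math.e)
err10 = abs(extrapolated(10) - math.e)
err20 = abs(extrapolated(20) - math.e)
print(f"Euler error (N = 10):        {err_euler:.3e}")
print(f"extrapolated error (N = 10): {err10:.3e}")
print(f"extrapolated error (N = 20): {err20:.3e}")
print(f"ratio: {err10 / err20:.2f}")
```

Halving h should reduce the extrapolated error by a factor of roughly 8, in line with the $O(h^3)$ claim.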


8. First, we consider the midpoint rule for quadrature. In (6.28), we approximate the integral $\int_a^b f(x)\,dx$ by $(b - a)f(\frac{a+b}{2})$. To estimate the error, we expand f(x) about $\frac{a+b}{2}$ by Taylor's theorem and integrate f over [a, b]. Thus, we get

$$\int_a^b f(x)\,dx = (b - a)f\!\left(\frac{a+b}{2}\right) + \frac{1}{24}(b - a)^3 f''(\xi).$$

Hence, the error is $O(h^3)$. Next, we consider the midpoint method for IVP's. It has the form

$$y_0 = y(t_0), \qquad y_{j+1} = y_j + h\Phi(t_j, y_j, h),$$

where

$$\Phi(t_j, y_j, h) = f\!\left(t_j + \frac{h}{2},\; y_j + \frac{h}{2}f(t_j, y_j)\right).$$

Using Taylor's theorem in two variables, we obtain

$$
\begin{aligned}
y(t + h) - y(t) - h\Phi(t, y, h) &= y(t) + hy'(t) + \frac{h^2}{2}y''(t) + O(h^3)\\
&\quad - y(t) - hy'(t) - \frac{h^2}{2}y''(t) + O(h^3) = O(h^3).
\end{aligned}
$$

Therefore, the midpoint method has order p = 2. Observe that, if f is a function of t only (and not of y), then each step of the midpoint method reduces to an application of the midpoint rule for quadrature.

9. Running a short matlab program and letting h = 0.05, we can duplicate the table on page 396 as follows:

    i    t_i     y_i (Euler)   y_i (T.S. order 2)   y(t_i) (Exact)
    0    1       2             2                    2
    1    1.05    2.150000      2.155000             2.155084
    2    1.1     2.315250      2.320506             2.320684
    3    1.15    2.491532      2.497057             2.497337
    4    1.2     2.679410      2.685219             2.685611

Letting h = 0.01, we obtain the following results:

    i    t_i     y_i (Euler)   y_i (T.S. order 2)   y(t_i) (Exact)
    0    1       2             2                    2
    1    1.01    2.030000      2.030200             2.030201
    2    1.02    2.060602      2.060804             2.060805
    3    1.03    2.091612      2.091816             2.091818
    4    1.04    2.123034      2.123240             2.123243
    5    1.05    2.154873      2.155081             2.155084
    6    1.06    2.187132      2.187342             2.187346
    7    1.07    2.219815      2.220028             2.220327
    8    1.08    2.252928      2.253143             2.253148
    9    1.09    2.286474      2.286691             2.286697
    10   1.1     2.320458      2.320676             2.320684

Consider the global error in the order two Taylor series method. Notice that the error for $h_1 = 0.05$ at t = 1.05 is about $e_1 = 0.000084$, and the error for $h_2 = 0.01$ at t = 1.05 is about $e_2 = 0.0000035$. Thus,

$$\frac{e_2}{e_1} \approx 0.041216 \approx \frac{h_2^2}{h_1^2} = \frac{0.01^2}{0.05^2} = 0.04.$$

Similarly, the error for $h_1 = 0.05$ at t = 1.1 is about $\varepsilon_1 = 0.000177$, and the error for $h_2 = 0.01$ at t = 1.1 is about $\varepsilon_2 = 0.0000073$. Thus,

$$\frac{\varepsilon_2}{\varepsilon_1} \approx 0.041216 \approx \frac{h_2^2}{h_1^2} = \frac{0.01^2}{0.05^2} = 0.04.$$

Therefore, the global error in the order two Taylor series method is $O(h^2)$.

10. An answer is provided in the appendix to the book.

11. Using Taylor's theorem, we have

$$y(t_j) = y(t_{j-1}) + hy'(t_{j-1}) + \frac{h^2}{2}y''(t_{j-1}) + \frac{h^3}{6}y'''(\xi_j),$$

where $t_{j-1} \le \xi_j \le t_j$. Thus,

$$
\begin{aligned}
y_j - y(t_j) &= y_{j-1} - y(t_{j-1}) + h[f(t_{j-1}, y_{j-1}) - f(t_{j-1}, y(t_{j-1}))]\\
&\quad + \frac{h^2}{2}\left[\frac{d}{dt}f(t_{j-1}, y_{j-1}) - \frac{d}{dt}f(t_{j-1}, y(t_{j-1}))\right] - \frac{h^3}{6}y'''(\xi_j).
\end{aligned}
$$

Hence, by hypothesis and noticing that $h \le 1$, we obtain

$$
\begin{aligned}
|y_j - y(t_j)| &\le |y_{j-1} - y(t_{j-1})| + hL|y_{j-1} - y(t_{j-1})| + \frac{h^2}{2}L^2|y_{j-1} - y(t_{j-1})| + \frac{h^3}{6}M\\
&\le \left[1 + h\left(L + \frac{L^2}{2}\right)\right]|y_{j-1} - y(t_{j-1})| + \frac{h^3}{6}M, \qquad 1 \le j \le N,
\end{aligned}
$$

with $|y_0 - y(0)| = 0$. Therefore, using Lemma 7.1, by letting $d_j = |y_j - y(t_j)|$, $\delta = h(L + \frac{L^2}{2})$, $K^* = \frac{h^3}{6}M$, and noticing that $jh \le 1$ for $0 \le j \le N$, we obtain

$$|y_j - y(t_j)| \le \frac{h^2 M}{6L + 3L^2}\left(e^{L + \frac{L^2}{2}} - 1\right).$$

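12. By Definition 7.2, we need to consider

$$y(t + h) - [y(t) + h(\gamma_1 k_1 + \gamma_2 k_2 + \gamma_3 k_3)],$$

where

$$
\begin{aligned}
k_1 &= f(t, y),\\
k_2 &= f(t + \alpha_2 h,\; y + h\beta_{21}k_1),\\
k_3 &= f(t + \alpha_3 h,\; y + h(\beta_{31}k_1 + \beta_{32}k_2)).
\end{aligned}
$$

Expanding y using Taylor's theorem gives

$$
\begin{aligned}
y(t + h) &= y(t) + hy'(t) + \tfrac{h^2}{2}y''(t) + \tfrac{h^3}{6}y'''(t) + O(h^4)\\
&= y(t) + hf + \tfrac{h^2}{2}(f_t + ff_y) + \tfrac{h^3}{6}(f_{tt} + 2ff_{ty} + f^2f_{yy} + f_tf_y + ff_y^2) + O(h^4),
\end{aligned}
$$

where f = f(t, y), and with similar notation for the other terms. Similarly,

$$k_2 = f + h\alpha_2 f_t + h\beta_{21}ff_y + \tfrac{1}{2}h^2\alpha_2^2 f_{tt} + h^2\alpha_2\beta_{21}ff_{ty} + \tfrac{1}{2}h^2\beta_{21}^2 f^2f_{yy} + O(h^3),$$

and

$$
\begin{aligned}
k_3 &= f + h\alpha_3 f_t + h\beta_{31}ff_y + h\beta_{32}f_yk_2 + \tfrac{1}{2}h^2\alpha_3^2 f_{tt} + h^2\alpha_3\beta_{31}ff_{ty}\\
&\quad + h^2\alpha_3\beta_{32}f_{ty}k_2 + \tfrac{1}{2}h^2\beta_{31}^2 f^2f_{yy} + h^2\beta_{31}\beta_{32}ff_{yy}k_2 + \tfrac{1}{2}h^2\beta_{32}^2 f_{yy}k_2^2 + O(h^3)\\
&= f + h\alpha_3 f_t + h\beta_{31}ff_y + h\beta_{32}f_y(f + h\alpha_2 f_t + h\beta_{21}ff_y) + \tfrac{1}{2}h^2\alpha_3^2 f_{tt}\\
&\quad + h^2\alpha_3\beta_{31}ff_{ty} + h^2\alpha_3\beta_{32}ff_{ty} + \tfrac{1}{2}h^2\beta_{31}^2 f^2f_{yy} + h^2\beta_{31}\beta_{32}f^2f_{yy}\\
&\quad + \tfrac{1}{2}h^2\beta_{32}^2 f^2f_{yy} + O(h^3).
\end{aligned}
$$

Thus,

$$
\begin{aligned}
y(t + h) &- [y(t) + h(\gamma_1 k_1 + \gamma_2 k_2 + \gamma_3 k_3)]\\
&= h(1 - \gamma_1 - \gamma_2 - \gamma_3)f + \tfrac{1}{2}h^2(1 - 2\alpha_2\gamma_2 - 2\alpha_3\gamma_3)f_t\\
&\quad + \tfrac{1}{2}h^2(1 - 2\beta_{21}\gamma_2 - 2(\beta_{31} + \beta_{32})\gamma_3)ff_y + \tfrac{1}{6}h^3(1 - 3\alpha_2^2\gamma_2 - 3\alpha_3^2\gamma_3)f_{tt}\\
&\quad + \tfrac{1}{3}h^3(1 - 3\alpha_2\beta_{21}\gamma_2 - 3\alpha_3\gamma_3(\beta_{31} + \beta_{32}))ff_{ty}\\
&\quad + \tfrac{1}{6}h^3(1 - 3\beta_{21}^2\gamma_2 - 3\gamma_3(\beta_{31} + \beta_{32})^2)f^2f_{yy}\\
&\quad + \tfrac{1}{6}h^3(1 - 6\alpha_2\beta_{32}\gamma_3)f_tf_y + \tfrac{1}{6}h^3(1 - 6\beta_{21}\beta_{32}\gamma_3)ff_y^2 + O(h^4).
\end{aligned}
$$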


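Setting the coefficients of terms up to and including $h^3$ equal to zero gives

$$
\begin{aligned}
1 - \gamma_1 - \gamma_2 - \gamma_3 &= 0,\\
1 - 2\alpha_2\gamma_2 - 2\alpha_3\gamma_3 &= 0,\\
1 - 2\beta_{21}\gamma_2 - 2(\beta_{31} + \beta_{32})\gamma_3 &= 0,\\
1 - 3\alpha_2^2\gamma_2 - 3\alpha_3^2\gamma_3 &= 0,\\
1 - 3\alpha_2\beta_{21}\gamma_2 - 3\alpha_3\gamma_3(\beta_{31} + \beta_{32}) &= 0,\\
1 - 3\beta_{21}^2\gamma_2 - 3\gamma_3(\beta_{31} + \beta_{32})^2 &= 0,\\
1 - 6\alpha_2\beta_{32}\gamma_3 &= 0,\\
1 - 6\beta_{21}\beta_{32}\gamma_3 &= 0.
\end{aligned}
$$

Then, we can find a particular solution of these equations, e.g. $\gamma_1 = 0$, $\gamma_2 = \frac{1}{4}$, $\gamma_3 = \frac{3}{4}$, $\alpha_2 = 1$, $\alpha_3 = \frac{1}{3}$, $\beta_{21} = 1$, $\beta_{31} = \frac{1}{9}$, $\beta_{32} = \frac{2}{9}$.

13. The 4th order Runge–Kutta method applied to $y' = \lambda y$ yields

$$
\begin{aligned}
K_1 &= \lambda y_k,\\
K_2 &= \lambda\left(y_k + \tfrac{h}{2}K_1\right) = \lambda\left(1 + \tfrac{h}{2}\lambda\right)y_k,\\
K_3 &= \lambda\left(y_k + \tfrac{h}{2}K_2\right) = \lambda\left[1 + \tfrac{h}{2}\lambda\left(1 + \tfrac{h}{2}\lambda\right)\right]y_k,\\
K_4 &= \lambda(y_k + hK_3) = \lambda\left[1 + h\lambda\left(1 + \tfrac{h}{2}\lambda + \tfrac{h^2\lambda^2}{4}\right)\right]y_k.
\end{aligned}
$$

Thus,

$$y_{k+1} = y_k + \frac{h}{6}[K_1 + 2K_2 + 2K_3 + K_4] = \left[1 + \lambda h + \frac{(\lambda h)^2}{2} + \frac{(\lambda h)^3}{6} + \frac{(\lambda h)^4}{24}\right]y_k.$$

Hence, $y_k \to 0$ as $k \to \infty$ if and only if

$$\left|1 + \lambda h + \frac{(\lambda h)^2}{2} + \frac{(\lambda h)^3}{6} + \frac{(\lambda h)^4}{24}\right| < 1,$$

which for $\lambda$ real leads to the interval of absolute stability (−2.7853, 0).

14. This method applied to $y' = \lambda y$ yields

$$y_{i+1} = y_i + h\lambda\left(y_i + \frac{h}{8}\lambda y_i\right) = \left[1 + h\lambda + \frac{(h\lambda)^2}{8}\right]y_i.$$

Hence, $y_i \to 0$ as $i \to \infty$ if and only if

$$\left|1 + h\lambda + \frac{(h\lambda)^2}{8}\right| < 1,$$

which for $\lambda$ real leads to the interval of absolute stability (−8, −4) ∪ (−4, 0).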
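The left endpoint −2.7853 quoted in problem 13 can be reproduced numerically. The Python sketch below is an illustration only: it applies plain bisection to the growth factor $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$ with $z = \lambda h$, and the function names are hypothetical.

```python
def rk4_growth(z):
    """Growth factor of classical RK4 applied to y' = lambda*y, z = lambda*h."""
    return 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24


def left_endpoint(lo=-3.0, hi=-2.0, tol=1e-12):
    """Bisection for the real z in [lo, hi] where rk4_growth(z) = 1.

    Relies on rk4_growth(lo) > 1 and rk4_growth(hi) < 1.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rk4_growth(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z_left = left_endpoint()
print(f"left endpoint of the real stability interval: {z_left:.4f}")
```

For real z the growth factor stays positive, so only the condition $R(z) < 1$ needs to be located; the same bisection idea applies to the quadratic growth factor of problem 14.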

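15. Using Taylor's theorem, we have

$$y(t_{k+1}) = y(t_k) + hy'(t_k) + \frac{h^2}{2}y''(t_k) + \frac{h^3}{6}y'''(\xi_k),$$

and

$$
\begin{aligned}
y_{k+1} &= y_k + \frac{h}{2}[f(t_k, y_k) + f(t_{k+1}, y_{k+1})]\\
&= y_k + \frac{h}{2}y'(t_k) + \frac{h}{2}y'(t_{k+1}) + \frac{h}{2}[f(t_k, y_k) - f(t_k, y(t_k))]\\
&\quad + \frac{h}{2}[f(t_{k+1}, y_{k+1}) - f(t_{k+1}, y(t_{k+1}))]\\
&= y_k + \frac{h}{2}y'(t_k) + \frac{h}{2}\left[y'(t_k) + hy''(t_k) + \frac{h^2}{2}y'''(\eta_k)\right]\\
&\quad + \frac{h}{2}[f(t_k, y_k) - f(t_k, y(t_k))] + \frac{h}{2}[f(t_{k+1}, y_{k+1}) - f(t_{k+1}, y(t_{k+1}))].
\end{aligned}
$$

Hence,

$$
\begin{aligned}
|y_{k+1} - y(t_{k+1})| &= \Bigl|y_k - y(t_k) + \frac{h^3}{4}y'''(\eta_k) - \frac{h^3}{6}y'''(\xi_k) + \frac{h}{2}[f(t_k, y_k) - f(t_k, y(t_k))]\\
&\qquad + \frac{h}{2}[f(t_{k+1}, y_{k+1}) - f(t_{k+1}, y(t_{k+1}))]\Bigr|\\
&\le |y_k - y(t_k)| + \frac{h^3}{4}M + \frac{h^3}{6}M + \frac{hL}{2}|y_k - y(t_k)| + \frac{hL}{2}|y_{k+1} - y(t_{k+1})|.
\end{aligned}
$$

Thus,

$$|y_{k+1} - y(t_{k+1})| \le \frac{2 + hL}{2 - hL}|y_k - y(t_k)| + \frac{5h^3}{6(2 - hL)}M, \qquad 0 \le k \le N,$$

with $|y_0 - y(0)| = 0$. Since $\frac{2 + hL}{2 - hL} = 1 + \frac{2hL}{2 - hL}$, by Lemma 7.1, using hL < 1 and noticing that $kh \le 1$ for all $0 \le k \le N$, we obtain

$$|y_k - y(t_k)| \le \frac{5h^2 M}{12L}\left(e^{\frac{2khL}{2 - hL}} - 1\right) \le \frac{5h^2 M}{12L}\left(e^{2L} - 1\right).$$

16. (i) Here, $\Phi(t, y, h) = \frac{1}{4}f(t, y) + \alpha f(t + \frac{h}{4}, y + \beta hf(t, y))$. When $\alpha = \frac{3}{4}$, $\Phi(t, y, 0) = f(t, y)$, so this method is consistent.

(ii) By Taylor's theorem, we have

$$
\begin{aligned}
y(t + h) - [y(t) + h\Phi(t, y, h)] &= y(t + h) - y(t) - h\left[\frac{1}{4}f(t, y) + \alpha f\!\left(t + \frac{h}{4}, y + \beta hf(t, y)\right)\right]\\
&= y(t) + hf(t, y) - y(t) - \frac{h}{4}f(t, y) - h\alpha f(t, y) + O(h^2)\\
&= h\left(\frac{3}{4} - \alpha\right)f(t, y) + O(h^2).
\end{aligned}
$$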

Letting $\alpha = \frac{3}{4}$ and β be an arbitrary constant, we find that this method has order of accuracy p = 1.

(iii) Similarly, note that

$$
\begin{aligned}
y(t + h) &- [y(t) + h\Phi(t, y, h)]\\
&= y(t + h) - y(t) - \frac{h}{4}f(t, y) - h\alpha f\!\left(t + \frac{h}{4}, y + \beta hf(t, y)\right)\\
&= y(t) + hf(t, y) + \frac{h^2}{2}\left[\frac{\partial f(t, y)}{\partial t} + \frac{\partial f(t, y)}{\partial y}f(t, y)\right] - y(t) - \frac{h}{4}f(t, y)\\
&\quad - h\alpha\left[f(t, y) + \frac{h}{4}\frac{\partial f(t, y)}{\partial t} + \beta hf(t, y)\frac{\partial f(t, y)}{\partial y}\right] + O(h^3)\\
&= h\left(\frac{3}{4} - \alpha\right)f(t, y) + h^2\left(\frac{1}{2} - \frac{\alpha}{4}\right)\frac{\partial f(t, y)}{\partial t} + h^2\left(\frac{1}{2} - \alpha\beta\right)f(t, y)\frac{\partial f(t, y)}{\partial y} + O(h^3).
\end{aligned}
$$

If a β could be chosen so that the order of accuracy is p = 2, all coefficients would have to vanish, giving $\alpha - \frac{3}{4} = 0$ and $\frac{1}{2} - \frac{\alpha}{4} = 0$ simultaneously, an impossibility.

17. (a) The Trapezoidal method applied to $y' = \lambda y$ yields

$$y_{j+1} = y_j + \frac{h}{2}(\lambda y_j + \lambda y_{j+1}).$$

Thus,

$$y_{j+1} = \frac{2 + h\lambda}{2 - h\lambda}y_j.$$

Hence, $y_j \to 0$ as $j \to \infty$ if and only if $\left|\frac{2 + h\lambda}{2 - h\lambda}\right| < 1$. Therefore, the region of absolute stability for the Trapezoidal method is the interval (−∞, 0) if λ is real, and the open left half of the complex plane if λ is complex.

(b) The Backward Euler method applied to $y' = \lambda y$ yields

$$y_{j+1} = \frac{1}{1 - h\lambda}y_j.$$

Hence, $y_j \to 0$ as $j \to \infty$ if and only if $|1 - \lambda h| > 1$. Therefore, if λ is real, the region of absolute stability for the Backward Euler method is the interval (−∞, 0) ∪ (2, ∞), and if λ is complex, the region of absolute stability is the subset of the complex plane given by $\{x + yi : (1 - x)^2 + y^2 > 1\}$.

18. (a) By Taylor's theorem, we have

$$y(t_{i+1}) = y(t_i) + hy'(t_i) + \frac{h^2}{2}y''(t_i) + \frac{h^3}{6}y'''(\xi_i).$$

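Thus,

$$
\begin{aligned}
|y(t_{i+1}) - y_{i+1}| &= \left|y(t_i) + hy'(t_i) + \frac{h^2}{2}y''(t_i) + \frac{h^3}{6}y'''(\xi_i) - y_i - h\Phi(t_i, y_i, h)\right|\\
&\le |y(t_i) - y_i| + \left|hy'(t_i) + \frac{h^2}{2}y''(t_i) - h\Phi(t_i, y(t_i), h)\right|\\
&\quad + h|\Phi(t_i, y(t_i), h) - \Phi(t_i, y_i, h)| + \frac{h^3}{6}k_3\\
&\le |y(t_i) - y_i| + c_1h^3 + hM|y(t_i) - y_i| + \frac{h^3}{6}k_3\\
&= (1 + hM)|y(t_i) - y_i| + \left(c_1 + \frac{k_3}{6}\right)h^3, \qquad 0 \le i \le N - 1,
\end{aligned}
$$

with $|y(t_0) - y_0| = 0$. Hence, using Lemma 7.1 and noticing that $ih \le 1$ for $1 \le i \le N$, we obtain

$$|y(t_i) - y_i| \le \frac{(6c_1 + k_3)h^3}{6hM}\left(e^{ihM} - 1\right) \le \frac{6c_1 + k_3}{6M}\left(e^{M} - 1\right)h^2.$$

(b) For any $1 \le i \le N$, by (a) and Taylor's theorem, we have

$$
\begin{aligned}
|hy'(t_i) - (y_{i+1} - y_i)| &= |hy'(t_i) - y_{i+1} + y(t_{i+1}) - y(t_{i+1}) + y_i - y(t_i) + y(t_i)|\\
&\le |hy'(t_i) - y(t_{i+1}) + y(t_i)| + |y(t_{i+1}) - y_{i+1}| + |y(t_i) - y_i|\\
&\le \left|hy'(t_i) - \left[y(t_i) + hy'(t_i) + \frac{h^2}{2}y''(\xi_i)\right] + y(t_i)\right| + 2c_2h^2\\
&= \frac{h^2}{2}|y''(\xi_i)| + 2c_2h^2 \le \frac{k_2}{2}h^2 + 2c_2h^2 = c_3h^2.
\end{aligned}
$$

Hence,

$$\left|y'(t_i) - \frac{y_{i+1} - y_i}{h}\right| \le c_3h.$$

19. An answer is provided in the appendix to the book.

20. Since

$$\|B_{t_i}\|_\infty \le \frac{1}{2}, \qquad 0 \le i \le N,$$

hence,

$$\|I - B_{t_i}\|_\infty \ge \|I\|_\infty - \|B_{t_i}\|_\infty = 1 - \|B_{t_i}\|_\infty \ge \frac{1}{2}.$$

But $\|(I - B_{t_i})^{-1}(I - B_{t_i})\|_\infty = \|I\|_\infty = 1$, hence, $\|(I - B_{t_i})^{-1}\|_\infty \le 2$. Thus,

$$\|y_{i+1}\|_\infty \le \|I + h(I - B_{t_i})^{-1}\|_\infty\|y_i\|_\infty \le (1 + 2h)\|y_i\|_\infty, \qquad 0 \le i \le N - 1.$$

Furthermore,

$$\|y_N\|_\infty \le (1 + 2h)^N\|y_0\|_\infty \le e^{2Nh}\|y_0\|_\infty = e^2\|y_0\|_\infty.$$

21. The implicit Simpson's method defined by Equation (7.67) has the form

$$y_{j+2} - y_j = \frac{h}{3}[f(t_{j+2}, y_{j+2}) + 4f(t_{j+1}, y_{j+1}) + f(t_j, y_j)].$$

Here, $\alpha_2 = 1$, $\alpha_1 = 0$, $\alpha_0 = -1$, $\beta_2 = \frac{1}{3}$, $\beta_1 = \frac{4}{3}$, $\beta_0 = \frac{1}{3}$. Thus,

$$
\begin{aligned}
c_0 &= \alpha_0 + \alpha_1 + \alpha_2 = 0,\\
c_1 &= \alpha_1 + 2\alpha_2 - (\beta_0 + \beta_1 + \beta_2) = 0,\\
c_2 &= \tfrac{1}{2!}(\alpha_1 + 2^2\alpha_2) - \tfrac{1}{1!}(\beta_1 + 2\beta_2) = 0,\\
c_3 &= \tfrac{1}{3!}(\alpha_1 + 2^3\alpha_2) - \tfrac{1}{2!}(\beta_1 + 2^2\beta_2) = 0,\\
c_4 &= \tfrac{1}{4!}(\alpha_1 + 2^4\alpha_2) - \tfrac{1}{3!}(\beta_1 + 2^3\beta_2) = 0,\\
c_5 &= \tfrac{1}{5!}(\alpha_1 + 2^5\alpha_2) - \tfrac{1}{4!}(\beta_1 + 2^4\beta_2) = -\tfrac{1}{90} \ne 0.
\end{aligned}
$$

Hence, by Definition 7.9, the implicit Simpson's method has order of accuracy 4.

22. An answer is provided in the appendix to the book.

23. Note: A computer algebra system such as Mathematica may help for this problem.

(a) According to Definition 7.6, the method has the form

$$y_{n+2} + \alpha_1y_{n+1} + cy_n = h(\beta_0f_n + \beta_1f_{n+1} + \beta_2f_{n+2}).$$

For the method to be consistent and of highest possible order, we use the analysis represented in the conditions (7.73) on page 407. In particular, we have c0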

Hence, by Definition 7.9, the implicit Simpson’s method has order of accuracy 4. 22. An answer is provided in the appendix to the book. 23. Note: A computer algebra system such as Mathematica may help for this problem. (a) According to Definition 7.6, the method has the form yn+2 + α1 yn+1 + cyn = h(β0 fn + β1 fn+1 + β2 fn+2 ). For the method to be consistent and of highest possible order, we use the analysis represented in the conditions (7.73) on page 407. In particular, we have c0

=

1 + α1 + c,

c1

=

c2

=

c3

=

c4

=

2 + α1 − (β0 + β1 + β2 ), 1 α1 + 2 − (β1 + 2β2 ), 2 1 4 1 α1 + α2 − β1 − 2β2 , 6 3 2 1 2 1 4 α1 + α2 − β1 − β2 . 24 3 6 3 110

For consistency and high order, we want the leading ci ’s to equal zero. Setting 0 = c1 = c2 = c3 = c4 gives α1 α2 β0 β1 β2

= −1 − c, 15 1 = − c, 16 16 1 1 = − − c, 6 2 5 1 = − c, 6 2 1 = . 3

This gives an order-4 or higher method, regardless of c. (b) For maximal order, even higher order cq should vanish. We have c5

= =

1 4 1 1 α1 + α2 − β1 − β2 120 15 24 3 23 − c . 240

Hence, the method is of order 5 or higher if and only if c5 = 0, that is, c = 23. (c) The method is explicit if and only if β2 = 0. Then, however, the method cannot be fourth-order, since, for a fourth-order method, we have β2 = 13 . If we remove the condition c4 = 0 and add the condition β2 = 0, we obtain α1 α2 β0 β1

= −1 − c, 11 1 = − c, 16 16 1 1 = − − c, 2 2 3 1 = − c. 2 2

Thus, we obtain a family of explicit order-3 methods.

24. Note: Please see the errata at http://interval.louisiana.edu/Classical-and-Modern-NA/errata.pdf for corrections to the problem. The following is an analysis of consistency and stability, based on the corrections.

(i) Here, $\alpha_2 = 1$, $\alpha_1 = -2$, $\alpha_0 = 1$, $\beta_2 = 0$, $\beta_1 = 2$, and $\beta_0 = -2$. Thus,

$$c_0 = \alpha_0 + \alpha_1 + \alpha_2 = 1 - 2 + 1 = 0, \qquad c_1 = \alpha_1 + 2\alpha_2 - (\beta_0 + \beta_1 + \beta_2) = -2 + 2 - (2 - 2) = 0,$$

so the method is consistent. Note that the polynomial $\rho(z) = \alpha_0 + \alpha_1z + \alpha_2z^2 = 1 - 2z + z^2$ has a double root $z_1 = z_2 = 1$. Since this root of modulus 1 is not simple, this method is not stable, from Definition 7.11(b).

(ii) Here, $\alpha_1 = 1$, $\alpha_0 = -1$, $\beta_1 = 0$, $\beta_0 = 2$. Thus,

$$c_0 = \alpha_0 + \alpha_1 = -1 + 1 = 0, \qquad c_1 = \alpha_1 - (\beta_0 + \beta_1) = 1 - (0 + 2) = -1 \ne 0.$$

Hence, this method is not consistent. Note that the polynomial $\rho(z) = \alpha_0 + \alpha_1z = -1 + z$ has a single root z = 1 satisfying $|z| \le 1$, so this method is stable.

(iii) Here, $\alpha_1 = 1$, $\alpha_0 = -1$, $\beta_1 = 1$, $\beta_0 = 0$. Thus,

$$c_0 = \alpha_0 + \alpha_1 = -1 + 1 = 0, \qquad c_1 = \alpha_1 - (\beta_0 + \beta_1) = 1 - (0 + 1) = 0,$$

so the method is consistent. Note that the polynomial $\rho(z) = \alpha_0 + \alpha_1z = -1 + z$ has a single root z = 1 satisfying $|z| \le 1$, so the method is stable.

25. Here, $\alpha_2 = 1$, $\alpha_1 = 4$, $\alpha_0 = -5$, $\beta_2 = 0$, $\beta_1 = c$, and $\beta_0 = 2$. Thus,

$$c_0 = \alpha_0 + \alpha_1 + \alpha_2 = -5 + 4 + 1 = 0, \qquad c_1 = \alpha_1 + 2\alpha_2 - (\beta_0 + \beta_1 + \beta_2) = 4 + 2 - (2 + c) = 4 - c.$$

This method is consistent if and only if $c_0 = c_1 = 0$. Hence, when c = 4, this method is consistent.

26. Here, $\alpha_0 = -5$, $\alpha_1 = 4$, $\alpha_2 = 1$. Consider the polynomial $\rho(z) = \alpha_0 + \alpha_1z + \alpha_2z^2 = -5 + 4z + z^2$. It has roots $z_1 = -5$, $z_2 = 1$. Since $|z_1| > 1$, this method is not stable.

27. The predictor-corrector pair applied to $y' = \lambda y$ yields

$$
\begin{aligned}
y_{k+1} &= y_k + \alpha hf(t_k, y_k + hf(t_k, y_k)) + (1 - \alpha)hf(t_k, y_k)\\
&= y_k + \alpha h\lambda(y_k + h\lambda y_k) + (1 - \alpha)h\lambda y_k\\
&= \left(1 + h\lambda + \frac{h^2\lambda^2}{3}\right)y_k.
\end{aligned}
$$

Hence, $y_k \to 0$ as $k \to \infty$ if and only if $\left|1 + h\lambda + \frac{h^2\lambda^2}{3}\right| < 1$. If λ is real, the interval of absolute stability of this method is (−3, 0).

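28. The following matlab function can be used to solve the IVP:

    function [t,y]=ch_7_28(a, b, alpha, N, f)
    % Fourth-order predictor-corrector with Runge-Kutta starting values.
    y(1) =alpha; t(1)=a; h=(b-a)/N;
    % Starting values y(2), y(3), y(4) from the classical RK4 method.
    for i= 2:4
      t(i)=a+(b-a)/N*(i-1);
      K1=feval(f,t(i-1),y(i-1));
      K2=feval(f,t(i-1)+h/2,y(i-1)+h/2*K1);
      K3=feval(f, t(i-1)+h/2,y(i-1)+h/2*K2);
      K4=feval(f, t(i-1)+h,y(i-1)+h*K3);
      y(i)=y(i-1)+h/6*(K1+2*K2+2*K3+K4);
    end
    for i=5:N+1
      t(i)=a+(b-a)/N*(i-1);
      % Predictor (four-step explicit formula).
      y(i)=y(i-1)+h/24*(55*feval(f, t(i-1),y(i-1))...
        -59*feval(f, t(i-2),y(i-2))+37*feval(f, t(i-3),y(i-3))...
        -9*feval(f, t(i-4),y(i-4)));
      % Corrector (three-step implicit formula, using the predicted y(i)).
      y(i)=y(i-1)+h/24*(9*feval(f, t(i),y(i))...
        +19*feval(f, t(i-1),y(i-1))-5*feval(f, t(i-2),y(i-2))...
        +feval(f, t(i-3),y(i-3)));
    end

Here, t(i) corresponds to the mesh point a + (i − 1)h, so that t(1) = a and t(N+1) = b, as in the listing for problem 30.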
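For readers without matlab, the same scheme can be sketched in Python. This is a hedged port, not the manual's code: the name `rk4_abm4` is hypothetical, indexing is zero-based so that `t[i] = a + i*h`, and the test problem $y' = t + y$, $y(0) = 0$ on [0, 1] is inferred from the exact solution $y = -t + e^t - 1$ quoted in problem 29.

```python
import math


def rk4_abm4(f, a, b, alpha, n):
    """Integrate y' = f(t, y), y(a) = alpha on [a, b] with n >= 4 steps:
    three RK4 starting steps, then a fourth-order predictor-corrector."""
    h = (b - a) / n
    t = [a + i * h for i in range(n + 1)]
    y = [alpha]
    for i in range(1, 4):                       # RK4 starting values
        k1 = f(t[i - 1], y[i - 1])
        k2 = f(t[i - 1] + h / 2, y[i - 1] + h / 2 * k1)
        k3 = f(t[i - 1] + h / 2, y[i - 1] + h / 2 * k2)
        k4 = f(t[i - 1] + h, y[i - 1] + h * k3)
        y.append(y[i - 1] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    for i in range(4, n + 1):
        f1, f2, f3, f4 = (f(t[i - j], y[i - j]) for j in range(1, 5))
        # Predictor: explicit four-step formula.
        pred = y[i - 1] + h / 24 * (55 * f1 - 59 * f2 + 37 * f3 - 9 * f4)
        # Corrector: implicit three-step formula evaluated at the prediction.
        y.append(y[i - 1] + h / 24 * (9 * f(t[i], pred) + 19 * f1 - 5 * f2 + f3))
    return t, y


def rhs(t, y):
    """Right-hand side of the assumed test problem y' = t + y."""
    return t + y


exact = math.e - 2.0                             # y(1) for y = -t + exp(t) - 1
err10 = abs(rk4_abm4(rhs, 0.0, 1.0, 0.0, 10)[1][-1] - exact)
err20 = abs(rk4_abm4(rhs, 0.0, 1.0, 0.0, 20)[1][-1] - exact)
print(f"error with N = 10: {err10:.3e}")
print(f"error with N = 20: {err20:.3e}")
```

Doubling N should shrink the error by a factor approaching 16 as N grows, which is the fourth-order behavior discussed in problem 29.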

29. (a) Using the program developed in Exercise 28 with f(t,y) = t + y, we get the following table, corresponding to parts (a) through (d) of this problem.

    N      y_i                    e_i               α_i
    10     0.718283618752232      1.79 × 10^-6
    20     0.718282081879898      2.53 × 10^-7      0.1415
    40     0.718281849925418      2.14 × 10^-8      0.0847
    80     0.718281829997879      1.53 × 10^-9      0.07169
    160    0.718281828561739      1.03 × 10^-10     0.06673
    320    0.718281828465674      6.63 × 10^-12     0.06454
    640    0.718281828459467      4.21 × 10^-13     0.06315

    e − 2 ≈ 0.718281828459046

The $\alpha_i$ seem to be converging to 1/16 (1/0.06315 ≈ 15.84). This is because the overall method is of order 4, so when h is halved, the error goes down by roughly a factor of $(1/2)^4$.

(b) It is easy to find that the exact solution of this IVP is $y = -t + e^t - 1$. Hence, $y_e = -2 + e$.

30. Since matlab functions will accept vectors as well as scalars, minimal alteration is required for the function from problem 28. In particular, the only thing required is to change references to y(i) (and similar references to components of y) to references to y(i,:). The resulting matlab function will work for both scalar equations and systems of equations. The program is as follows.

    function [t,y]=ch_7_30(a, b, alpha, N, f)
    y(1,:) =alpha; t(1)=a; h=(b-a)/N;
    for i= 2:4
      t(i)=a+(b-a)/N*(i-1);
      K1=feval(f,t(i-1),y(i-1,:));
      K2=feval(f,t(i-1)+h/2,y(i-1,:)+h/2*K1);
      K3=feval(f, t(i-1)+h/2,y(i-1,:)+h/2*K2);
      K4=feval(f, t(i-1)+h,y(i-1,:)+h*K3);
      y(i,:)=y(i-1,:)+h/6*(K1+2*K2+2*K3+K4);
    end
    for i=5:N+1
      t(i)=a+(b-a)/N*(i-1);
      y(i,:)=y(i-1,:)+h/24*(55*feval(f, t(i-1),y(i-1,:))...
        -59*feval(f, t(i-2),y(i-2,:))+37*feval(f, t(i-3),y(i-3,:))...
        -9*feval(f, t(i-4),y(i-4,:)));
      y(i,:)=y(i-1,:)+h/24*(9*feval(f, t(i),y(i,:))...
        +19*feval(f, t(i-1),y(i-1,:))-5*feval(f, t(i-2),y(i-2,:))...
        +feval(f, t(i-3),y(i-3,:)));
    end

(a) Identifying $v_1$ with y and $v_2$ with y′, the system becomes

$$v' = \begin{pmatrix} v_2\\ 6e^t - 2v_1 - 3v_2 \end{pmatrix}.$$

Writing down the characteristic equation and using the method of undetermined coefficients (a technique learned in a junior engineering course on differential equations), the exact solution is $y(t) = e^{-2t} + e^{-t} + e^{t}$. (This result could also be obtained through a computer algebra system such as Mathematica or Maple.) The function for matlab is

    function fval = f_ch_7_30a(t,y)
    fval(1) = y(2);
    fval(2) = 6*exp(t) - 2*y(1) - 3*y(2);

and we produce a table of values with the matlab command

    [t,y] = ch_7_30(0,1,[3;-2],10,'f_ch_7_30a')

We may program the exact solution as

    function y = y_ch_7_30a_exact(t)
    y(:,1) = exp(-2*t) + exp(-t) + exp(t);
    y(:,2) = -2*exp(-2*t) - exp(-t) + exp(t);

and we produce a table of values for the exact solution and its derivative with the matlab command

    y_exact = y_ch_7_30a_exact(t)

and we obtain a plot with the command

    plot(t,y(:,1),t,y_exact(:,1))

The plot is as follows.

[Plot: computed and exact y(t) for 0 ≤ t ≤ 1; the vertical axis runs from 2.6 to 3.4.]

Indeed, with N = 10, the approximate solution is indistinguishable from the exact solution.

(b) With $v_1 = z$, $v_2 = z'$, $v_3 = y$, $v_4 = y'$, the system becomes

$$v' = \begin{pmatrix} v_2\\ v_1^2 - v_3 + e^t\\ v_4\\ v_1 - v_3^2 - e^t \end{pmatrix},$$

and the corresponding matlab function for f is

    function fval = f_ch_7_30b(t,y)
    fval(1) = y(2);
    fval(2) = y(1)^2 - y(3) + exp(t);
    fval(3) = y(4);
    fval(4) = y(1) -y(3)^2 - exp(t);

We then run our ODE integrator with the command

    [t,y] = ch_7_30(0,1,[0;0;1;-2],10,'f_ch_7_30b')

We plot z and y as a function of t with

    plot(t,y(:,1),t,y(:,3))

to obtain the following plot.

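[Plot: z(t) and y(t) for 0 ≤ t ≤ 1; the vertical axis runs from −2 to 1.]

31. Let P be the nonsingular matrix whose columns are the n linearly independent eigenvectors of A. Then

$$AP = P\begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},$$

where $\lambda_1, \lambda_2, \cdots, \lambda_n$ are the eigenvalues of A.

Letting y = Pz and using y′(t) = Ay(t), we get Pz′(t) = APz(t). Thus,

$$z'(t) = P^{-1}APz(t) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \cdots & \cdots & \cdots & \cdots\\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}z(t).$$

Hence,

$$z(t) = \begin{pmatrix} c_1e^{\lambda_1t}\\ c_2e^{\lambda_2t}\\ \cdots\\ c_ne^{\lambda_nt} \end{pmatrix}.$$

Thus,

$$y(t) = \sum_{i=1}^{n} c_ie^{\lambda_it}x_i,$$

where $x_i$ denotes the i-th column of P, that is, the eigenvector associated with $\lambda_i$.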

32. Consider the Pad´e method corresponding to m = 1 and n = 2, the Pad´e 6+2z approximation to ez is 6−4z+z 2 . To show that the method is A-stable, we must show that there are no solutions z = x + iy to 6 + 2z 6 − 4z + z 2 < 1 with x < 0. This leads to

|6 + 2z|2 < |6 − 4z + z 2 |2 ,

i.e. $(6 + 2x)^2 + (2y)^2 < (6 - 4x + x^2 - y^2)^2 + (-4y + 2xy)^2$.

Replacing the “ 0, where 0 ≤ h ≤ h0 .)

Therefore, when the solution y(t) is sufficiently smooth, the local error for this method is $O(h^4)$, which implies that the global error is $O(h^3)$.

34. (a) First of all, the three "hat" functions are

$$
\varphi_0(x) = \begin{cases} 1 - x, & 0 \le x \le 1,\\ 0, & 1 \le x \le 2, \end{cases}
\qquad
\varphi_1(x) = \begin{cases} x, & 0 \le x \le 1,\\ 2 - x, & 1 \le x \le 2, \end{cases}
\qquad
\varphi_2(x) = \begin{cases} 0, & 0 \le x \le 1,\\ x - 1, & 1 \le x \le 2. \end{cases}
$$

Since we have a finite data set, we will minimize with respect to the discrete dot product

$$(f, g) = \sum_{i=1}^{m} f(t_i)g(t_i),$$

with m = 7 and the $t_i$ in the set {0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2}. The overdetermined set of equations for least squares thus becomes

$$
\begin{pmatrix}
\varphi_0(t_1) & \varphi_1(t_1) & \varphi_2(t_1)\\
\varphi_0(t_2) & \varphi_1(t_2) & \varphi_2(t_2)\\
\vdots & \vdots & \vdots\\
\varphi_0(t_m) & \varphi_1(t_m) & \varphi_2(t_m)
\end{pmatrix}
\begin{pmatrix} c_0\\ c_1\\ c_2 \end{pmatrix}
=
\begin{pmatrix} a(t_1)\\ a(t_2)\\ \vdots\\ a(t_m) \end{pmatrix}.
$$

That is,

$$
\begin{pmatrix}
0.7 & 0.3 & 0\\
0.4 & 0.6 & 0\\
0.1 & 0.9 & 0\\
0 & 0.8 & 0.2\\
0 & 0.5 & 0.5\\
0 & 0.2 & 0.8\\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} c_0\\ c_1\\ c_2 \end{pmatrix}
=
\begin{pmatrix} 5.0\\ 5.2\\ 4.8\\ 4.7\\ 5.5\\ 4.2\\ 4.9 \end{pmatrix}.
$$

This system may be solved with a QR decomposition in matlab as follows.

    >> M = [0.7 0.3 0
    0.4 0.6 0
    0.1 0.9 0
    0 0.8 0.2
    0 0.5 0.5
    0 0.2 0.8
    0 0 1]
    M =
        0.7000    0.3000         0
        0.4000    0.6000         0
        0.1000    0.9000         0
             0    0.8000    0.2000
             0    0.5000    0.5000
             0    0.2000    0.8000
             0         0    1.0000
    >> b = [5;5.2;4.8;4.7;5.5;4.2;4.9]
    b =
        5.0000
        5.2000
        4.8000
        4.7000
        5.5000
        4.2000
        4.9000
    >> [Q,R] = qr(M)
    Q =
       -0.8616    0.2063    0.0673    0.3346    0.2481    0.1617    0.1041
       -0.4924   -0.2063   -0.0673   -0.5141   -0.4445   -0.3750   -0.3286
       -0.1231   -0.6188   -0.2020   -0.2854    0.0412    0.3677    0.5855
             0   -0.6051   -0.0461    0.7108   -0.2436   -0.1979   -0.1675
             0   -0.3782    0.2552   -0.1795    0.7793   -0.2620   -0.2894
             0   -0.1513    0.5564   -0.0698   -0.1979    0.6740   -0.4114
             0         0    0.7572    0.0034   -0.1826   -0.3687    0.5073
    R =
       -0.8124   -0.6647         0
             0   -1.3222   -0.4311
             0         0    1.3207
             0         0         0
             0         0         0
             0         0         0
             0         0         0
    >> QTB = Q'*b
    QTB =
       -7.4593
       -8.5705
        6.2508
       -0.2933
        0.5421
       -0.7225
        0.0011
    >> c = R(1:3,1:3)\QTB(1:3)
    c =
        5.1410
        4.9388
        4.7331
    >>
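As a cross-check on the coefficients c, the same least squares problem can be solved through the normal equations $M^TMc = M^Tb$. The Python sketch below does this in place of the QR factorization used in the matlab dialog (a different technique, adequate here only because the system is tiny and well conditioned); `hat` and `solve` are hypothetical helper names.

```python
def hat(i, t):
    """Hat function phi_i on the nodes 0, 1, 2."""
    return max(0.0, 1.0 - abs(t - i))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


ts = [0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.0]
data = [5.0, 5.2, 4.8, 4.7, 5.5, 4.2, 4.9]
M = [[hat(i, t) for i in range(3)] for t in ts]

# Normal equations: (M^T M) c = M^T b.
normal = [[sum(row[i] * row[j] for row in M) for j in range(3)] for i in range(3)]
rhs = [sum(row[i] * d for row, d in zip(M, data)) for i in range(3)]
c = solve(normal, rhs)
print([round(ci, 4) for ci in c])
```

For larger or ill-conditioned problems the QR approach shown in the dialog is generally preferable, since forming $M^TM$ squares the condition number.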

Note that the (n+1)-st through m-th components of $Q^Tb$, representing the residual in the least squares fit, are an order of magnitude smaller than the first 3 components. This gives confidence that the fit is good.

(b) We will try our program from problem 30. The function for the right-hand-side is as follows.

    function fval = f_ch_7_34(t,y)
    fval = a_ch_7_34(t)*y*(1-(1/5)*y);

    function a = a_ch_7_34(t)
    c = [ 5.1410; 4.9388; 4.7331 ];
    a = c(1)*phi_ch_7_34(0,t) + c(2)*phi_ch_7_34(1,t)...
    + c(3)*phi_ch_7_34(3,t);

    function val = phi_ch_7_34(i,t)
    if i==0
    if (t1)
    val = 0;
    else
    val = 1-t;
    end
    elseif i==1
    if (t2)
    val = 0;
    else
    if (t

    >> Y = [2.0030 1.2399; 0.0262 0.0767]
    Y =
        2.0030    1.2399
        0.0262    0.0767
    >> x = [8.0;-0.9]
    x =
        8.0000
       -0.9000
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
       10.2208
       -3.3200
    >> x = x - Y*F
    x =
       -8.3558
       -0.9131
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
        0.1945
       -0.4378
    >> x = x - Y*F
    x =
       -8.2026
       -0.8847
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
       -0.0096
        0.0137
    >> x = x - Y*F
    x =
       -8.2004
       -0.8855
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
      1.0e-004 *
       -0.1008
        0.1213
    >> x = x - Y*F
    x =
       -8.2004
       -0.8855
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
      1.0e-008 *
       -0.5761
        0.9325
    >> x = x - Y*F
    x =
       -8.2004
       -0.8855
    >> F = [x(1)*cos(x(2)) + 3 + exp(x(2)^2)
    x(1)*x(2)^2 - x(1) + 2*x(2)]
    F =
      1.0e-011 *
       -0.4020
        0.6519
    >>

At least initially, the error appears to be decreasing quadratically, with the norm of $F$ decreasing (roughly) from $10$ to $10^{-1}$ to $10^{-2}$ to $10^{-4}$ to $10^{-8}$, but then to $10^{-11}$.

(a) To show it is a contraction, it is sufficient to show $\|G'(x)\| < 1$ for every $x$ in the ball, by Theorem 8.3. Theorem 8.2 holds in any norm, so we may use $\|\cdot\|_\infty$, and use interval arithmetic to obtain bounds on the ranges of the elements of $G'$. We may use the following matlab dialog (with the intlab toolbox).

    >> x = [midrad(-8.2005,0.001);midrad(-.8855,0.001)]
    intval x =
    [   -8.2016,   -8.1994]
    [   -0.8866,   -0.8844]
    >> Fprime = [cos(x(2)), -x(1)*sin(x(2)) + 2*x(2)*exp(x(2)^2)
    x(2)^2-1, 2*x(1)*x(2)+2]
    intval Fprime =
    [    0.6321,    0.6337] [  -10.2457,  -10.2111]
    [   -0.2177,   -0.2141] [   16.5049,   16.5413]
    >> normGprime = norm(eye(2) - Y*Fprime,inf)
    intval normGprime =
    [    0.0000,    0.0613]
    >>

This proves that every matrix $G'(x)$ has norm at most $0.0613 < 1$, so $G$ is a contraction within the specified ball. To show that $G$ maps the ball into itself, we would like to do an interval evaluation of $G$. However, the naive interval evaluation does not give a sufficiently small range, so we try the mean value extension, using mean_value_form.m from the Moore / Kearfott / Cloud book. We program $G$ with the following function (G_ch_8_4.m; note that vectors need to be expressed as row vectors here).

    function G = G_ch_8_4(x)
    Y = [2.0030 1.2399; 0.0262 0.0767]';
    F = [x(1)*cos(x(2))+3+exp(x(2)^2), x(1)*x(2)^2-x(1)+2*x(2)];
    G = x - F*Y;
    G=G';

With this G, we may use the following dialog.

    >> x = [midrad(-8.2005,0.001), midrad(-.8855,0.001)]
    intval x =
    [   -8.2016,   -8.1994] [   -0.8866,   -0.8844]
    >> Gx = mean_value_form('G_ch_8_4',x)
    intval Gx =
    [   -8.2006,   -8.2003] [   -0.8855,   -0.8854]

This dialog proves that $G$ maps the ball into itself, so the hypotheses of the Contraction Mapping Theorem are satisfied. To compare with Theorem 8.4, we use the following dialog:

    >> x = [midrad(-8.2005,0.001), midrad(-.8855,0.001)]
    intval x =
    [ -8.20150000000001, -8.19949999999999] [ -0.88650000000001, -0.88449999999998]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20050410625927, -8.20038151981793] [ -0.88546588616765, -0.88546051094983]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20044280560666, -8.20044280185012] [ -0.88546317779996, -0.88546317157449]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20044280372724, -8.20044280372688] [ -0.88546317467291, -0.88546317466847]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20044280372707, -8.20044280372705] [ -0.88546317467069, -0.88546317467067]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20044280372707, -8.20044280372705] [ -0.88546317467068, -0.88546317467067]
    >> x = mean_value_form('G_ch_8_4',x)'
    intval x =
    [ -8.20044280372707, -8.20044280372705] [ -0.88546317467068, -0.88546317467067]
    >> Fprime = [cos(x(2)), -x(1)*sin(x(2)) + 2*x(2)*exp(x(2)^2)
    x(2)^2-1, 2*x(1)*x(2)+2]
    intval Fprime =
    [ 0.63293097575189, 0.63293097575190] [ -10.22773553579915, -10.22773553579913]
    [ -0.21595496630213, -0.21595496630212] [ 16.52238023738696, 16.52238023738698]
    >> Gprime = eye(2) - Y*Fprime
    intval Gprime =
    1.0e-003 *
    [ 0.00181828696133, 0.00181828696389] [ 0.05502186955480, 0.05502186960800]
    [ -0.01904564932681, -0.01904564932674] [ 0.70010683035603, 0.70010683035793]
    >> norm(Gprime,inf)
    intval ans =
    1.0e-003 *
    [ 0.71915247968278, 0.71915247968474]

(Note that we have iterated until the displayed intervals became stationary.) This proves rigorously that $\|G'\|_\infty < 0.72 \times 10^{-3}$ at the fixed point.

5. As in Problem 4, we may verify the hypotheses of the Contraction Mapping Theorem using interval arithmetic with a mean value extension for $G$. We program $G$ for mean_value_form.m as follows.

    function G = G_ch_8_5(x)
    G = [(x(1)^2+x(2)^2+8)/10; (x(1)*x(2)^2+x(1)+8)/10];

The following dialog proves that $G$ maps $D_0$ into itself.

    >> x = [infsup(0,1.5), infsup(0,1.5)]
    intval x =
    [    0.0000,    1.5000] [    0.0000,    1.5000]
    >> Gx = mean_value_form('G_ch_8_5',x)'
    intval Gx =
    [    0.4624,    1.3626] [    0.3359,    1.4985]

To verify $\|G'\| < 1$ on $D_0$, we use:

    >> Gprime = [2*x(1)/10, 2*x(2)/10; x(2)^2/10, x(1)/10]
    intval Gprime =
    [    0.0000,    0.3001] [    0.0000,    0.3001]
    [    0.0000,    0.2251] [    0.0000,    0.1501]
    >> norm(Gprime,inf)
    intval ans =
    [    0.0000,    0.6001]

This proves that $\|G'(x)\|_\infty < 0.6001$ for $x \in D_0$, so the hypotheses of the Contraction Mapping Theorem are satisfied.

6. Using G_ch_8_5.m from problem 5, we have

    >> x = [0.5;0.5]
    x =
        0.5000
        0.5000
    >> x = G_ch_8_5(x)
    x =
        0.8500
        0.8625
    >> x = G_ch_8_5(x)
    x =
        0.9466
        0.9482
    >> x = G_ch_8_5(x)
    x =
        0.9795
        0.9798
    >> x = G_ch_8_5(x)
    x =
        0.9919
        0.9920
    >> x = G_ch_8_5(x)
    x =
        0.9968
        0.9968
    >>
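The repeated calls in the dialog above can also be wrapped in a loop with a stopping test. The following is a minimal sketch, assuming a tolerance and an iteration cap chosen here purely for illustration; the anonymous function G reproduces the formula in G_ch_8_5.m.

```matlab
% Fixed-point iteration x <- G(x) for the map of problems 5 and 6.
G = @(x) [(x(1)^2 + x(2)^2 + 8)/10; (x(1)*x(2)^2 + x(1) + 8)/10];

x = [0.5; 0.5];      % same starting point as in the dialog
tol = 1e-12;         % illustrative tolerance on the step size
maxit = 200;         % illustrative iteration cap

for k = 1:maxit
    xnew = G(x);
    step = norm(xnew - x, inf);   % size of the latest update
    x = xnew;
    if step < tol
        break;                    % successive iterates agree to tol
    end
end
```

Since G is a contraction on D0, the iterates approach the fixed point (1, 1), which is easily checked by substituting into G.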

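7. An answer is provided in the appendix to the book.

8. (a) We have

$$
G(x) = \begin{pmatrix} 0.5 & -0.25 \\ -0.25 & 0.5 \end{pmatrix}
\begin{pmatrix} \cos(x_1) \\ \sin(x_2) \end{pmatrix}
+ \begin{pmatrix} 1 \\ 2 \end{pmatrix},
$$

so

$$
G'(x) = \begin{pmatrix} 0.5 & -0.25 \\ -0.25 & 0.5 \end{pmatrix}
\begin{pmatrix} -\sin(x_1) & 0 \\ 0 & \cos(x_2) \end{pmatrix},
$$

and

$$
\|G'(x)\|_\infty \le \left\| \begin{pmatrix} 0.5 & -0.25 \\ -0.25 & 0.5 \end{pmatrix} \right\|_\infty = 0.75,
$$

where the inequality is an equality for some $x_1$ and $x_2$. Thus, $G$ is a contraction on $\mathbb{R}^2$, and the hypotheses of the Contraction Mapping Theorem are satisfied.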

(b) We have $\alpha = 0.75$. Starting with $x^{(0)} = (0, 0)^T$, we have

$$
\|G(x^{(0)}) - x^{(0)}\|_\infty
= \left\| \begin{pmatrix} 0.5 \\ -0.25 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\|_\infty = 1.75.
$$

Thus, Formula (8.10) on page 443 (part of the Contraction Mapping Theorem) along with the condition $\|x^{(k)} - x^*\| < 0.001$ give

$$
\|x^{(k)} - x^*\|_\infty \le \frac{0.75^k}{1 - 0.75} \cdot 1.75 < 0.001.
$$

Solving the right inequality for $k$ gives

$$
k > \frac{\log\left(\frac{1}{7} \cdot 10^{-3}\right)}{\log(0.75)}.
$$

Since the right member is bounded above by 30.8, $k = 31$ will do. In fact, we may iterate the following function in matlab.

    function G = G_ch_8_8(x)
    A = [0.5 -0.25;-0.25 0.5];
    b = [1;2];
    v = [cos(x(1));sin(x(2))];
    G = A*v + b;

(We do the iteration as in problem 4.) We find that, actually, fewer than 18 iterations are required.

9. We iterate Newton's method using the function newton_sys.m, available for download from the web site http://interval.louisiana.edu/Classical-and-Modern-NA/ for matlab routines for the book. The function and its Jacobian matrix may be programmed as

    function y = F_ch_8_9a(x)
    y = [x(1)^2 - x(2)^2 + 1; 2*x(1)*x(2)]

    function A = Fprime_ch_8_9a(x)
    A = [2*x(1),-2*x(2);2*x(2),2*x(1)]

We obtain:

    >> x = [0.2;0.7]
    x =
        0.2000
        0.7000
    >> [root,success] = newton_sys(x,'F_ch_8_9a','Fprime_ch_8_9a',1e-8,3)
    y =
        0.5500
        0.2800
    i =
         1
    x =
        0.2000
        0.7000
    norm_fval =
        0.6172
    A =
        0.4000   -1.4000
        1.4000    0.4000
    y =
       -0.0130
       -0.1792
    i =
         2
    x =
       -0.0887
        1.0104
    norm_fval =
        0.1797
    A =
       -0.1774   -2.0208
        2.0208   -0.1774
    y =
        0.0074
       -0.0025
    i =
         3
    x =
       -0.0012
        0.9963
    norm_fval =
        0.0078
    A =
       -0.0025   -1.9925
        1.9925   -0.0025

These results are the same as what we got in problem 35 from Chapter 2.

10. The formula is programmed in the matlab script ch_8_10.m, available on the media distributed with this instructor's manual, as follows.

    % ch_8_10.m
    A = [2 -1 0;-1 2 -1; 0 1 2]
    X = diag([1/2,1/2,1/2])
    norm(eye(3)-A*X)
    for i=1:3
    X = 2*X - X*A*X
    norm(eye(3)-A*X)
    end

We get the following matlab dialog.

    >> ch_8_10
    A =
         2    -1     0
        -1     2    -1
         0     1     2
    X =
        0.5000         0         0
             0    0.5000         0
             0         0    0.5000
    ans =
        0.7071
    X =
        0.5000    0.2500         0
        0.2500    0.5000    0.2500
             0   -0.2500    0.5000
    ans =
        0.5000
    X =
        0.6250    0.2500    0.1250
        0.2500    0.5000    0.2500
       -0.1250   -0.2500    0.3750
    ans =
         0
    X =
        0.6250    0.2500    0.1250
        0.2500    0.5000    0.2500
       -0.1250   -0.2500    0.3750
    ans =
         0
    >>

(a) Convergence is extremely fast, and could be quadratic. We get the inverse of $A$ in only two iterations.

(b) It requires two matrix-matrix multiplications and multiplication of a matrix by a scalar per iteration. For a full matrix and usual matrix multiplication, this would be $2n^3 + n^2$ scalar multiplications. For a tridiagonal matrix (a very sparse matrix), the number of multiplications is $6n^2 + n^2 = 7n^2$.

Analysis: Computing the inverse of a tridiagonal matrix by Gaussian elimination with repeated back-substitution is also only proportional to $n^2$. Thus, this algorithm may only be the algorithm of choice for computing inverses if the computer architecture makes it easy to do matrix-matrix multiplications.

11. Let

$$
g(x) = f_i\left(x_1^{(k+1)}, x_2^{(k+1)}, \cdots, x_{i-1}^{(k+1)}, x, x_{i+1}^{(k)}, \cdots, x_n^{(k)}\right),
$$

where $x$ represents the $i$-th coordinate $x_i$. Then, the one-step SOR method involves computing one step of the univariate Newton's method applied to $g$ with starting point $x = x_i^{(k)}$, then relaxing the result. We observe that

$$
g'(x_i^{(k)}) = \frac{\partial f_i(x^{(k,i)})}{\partial x_i},
$$

so a step of Newton's method is

$$
\tilde{x} = x_i^{(k)} - \frac{f_i(x^{(k,i)})}{\dfrac{\partial f_i(x^{(k,i)})}{\partial x_i}},
$$

and the SOR relaxation of $\tilde{x}$ is

$$
\begin{aligned}
x_i^{(k+1)} &= (1 - \sigma_k)\, x_i^{(k)} + \sigma_k \tilde{x} \\
&= (1 - \sigma_k)\, x_i^{(k)} + \sigma_k \left( x_i^{(k)} - \frac{f_i(x^{(k,i)})}{\dfrac{\partial f_i(x^{(k,i)})}{\partial x_i}} \right) \\
&= x_i^{(k)} - \sigma_k \frac{f_i(x^{(k,i)})}{\dfrac{\partial f_i(x^{(k,i)})}{\partial x_i}}.
\end{aligned}
$$
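The final formula translates directly into a sweep over the coordinates. The sketch below is an illustration only: it borrows the two-equation system of problem 12 of this chapter as a test case, fixes the relaxation parameter at 1, and uses cell arrays f and df (names chosen here) to hold the component functions and their diagonal partial derivatives.

```matlab
% One-step SOR-Newton sweeps, illustrated on the system of problem 12.
% f{i} is the i-th component, df{i} is its partial derivative with
% respect to x_i.
f  = {@(x) x(1)^2 - 10*x(1) + x(2)^2 + 8, ...
      @(x) x(1)*x(2)^2 + x(1) - 10*x(2) + 8};
df = {@(x) 2*x(1) - 10, ...
      @(x) 2*x(1)*x(2) - 10};

sigma = 1.0;         % relaxation parameter (assumed constant here)
x = [0.5; 0.5];      % starting point

for k = 1:50
    for i = 1:numel(x)
        % x already holds the updated coordinates 1..i-1 and the old
        % coordinates i..n, so it plays the role of x^(k,i).
        x(i) = x(i) - sigma * f{i}(x) / df{i}(x);
    end
end
```

With these choices the sweeps settle at the solution (1, 1) of that system; other values of sigma or other starting points are not covered by this sketch.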

12. We will use newton_sys.m, available from http://interval.louisiana.edu/Classical-and-Modern-NA/ with the following function and derivative.

    function y = F_ch_8_12(x)
    y = [x(1)^2 - 10*x(1) + x(2)^2 + 8
    x(1)*x(2)^2 + x(1) - 10*x(2) + 8];

    function A = Fprime_ch_8_12(x)
    % Second row: partial derivatives of x(1)*x(2)^2 + x(1) - 10*x(2) + 8
    A = [2 * x(1) - 10, 2*x(2)
    x(2)^2 + 1, 2*x(1)*x(2) - 10];

We used the following matlab dialog.

    >> [x_star,success] = newton_sys([0.5;0.5], 'F_ch_8_12', 'Fprime_ch_8_12', 1e-10, 4)
    i =
         1
    x =
        0.5000
        0.5000
    norm_fval =
        5.0389
    i =
         2
    x =
        0.9377
        0.9392
    norm_fval =
        0.5357
    i =
         3
    x =
        0.9987
        0.9984
    norm_fval =
        0.0127
    i =
         4
    x =
        1.0000
        1.0000
    norm_fval =
      7.5046e-006
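The routine newton_sys.m itself is not reproduced in this manual. As a rough stand-in, a bare Newton loop for the same system can be sketched as follows; the stopping rule, the iteration cap, and the handle names F and J are assumptions made for this illustration and need not match what newton_sys.m does.

```matlab
% Bare Newton iteration for the system of problem 12 (not newton_sys.m).
F = @(x) [x(1)^2 - 10*x(1) + x(2)^2 + 8; ...
          x(1)*x(2)^2 + x(1) - 10*x(2) + 8];
J = @(x) [2*x(1) - 10, 2*x(2); ...
          x(2)^2 + 1,  2*x(1)*x(2) - 10];   % Jacobian matrix of F

x = [0.5; 0.5];                 % same starting point as in the dialog
for i = 1:20                    % illustrative iteration cap
    fx = F(x);
    if norm(fx) < 1e-12         % illustrative stopping tolerance
        break;
    end
    x = x - J(x) \ fx;          % solve J(x)*s = -F(x) and update
end
```

The loop should reproduce the behavior seen in the dialog, with the iterates approaching (1, 1).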

13. (a) Continuous F-differentiability follows from part of the conclusion of Theorem 8.3 (on page 444), after observing that each component of $F$ has continuous partial derivatives. (See the proof of Theorem 8.3.)

(b) By Lemma 8.4 (page 455), it is sufficient to show

$$F(y) - F(x) \ge F'(x)(y - x).$$

In particular, $F$ is convex if the above inequality holds for each component $f_i$ of $F$ and, indeed, $F$ is convex, by definition, if each component of $F$ is convex. Observe that $f_1$ and $f_n$ are linear, and hence must be convex. For $f_i$, $1 < i < n$, we have

$$
\begin{aligned}
f_i(y) - f_i(x) &= -(y_{i-1} - x_{i-1}) + 3(y_i - x_i) - (y_{i+1} - x_{i+1}) + e^{y_i} - e^{x_i} \\
&\ge -(y_{i-1} - x_{i-1}) + 3(y_i - x_i) - (y_{i+1} - x_{i+1}) + e^{x_i}(y_i - x_i) \\
&= (\nabla f_i)^T(x)(y - x),
\end{aligned}
$$

where the inequality follows from the convexity of the exponential function, thus verifying that the hypothesis of Lemma 8.4 holds. Hence, $F$ is convex.

(c) $F'$ is of the form

$$
\begin{pmatrix}
1 & 0 & \cdots & & & 0 \\
-1 & 3 + e^{x_2} & -1 & 0 & \cdots & 0 \\
0 & -1 & 3 + e^{x_3} & -1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & -1
\end{pmatrix}.
$$

We observe that, for all $x$, this matrix is strictly diagonally dominant, and hence is invertible.

(d) It is well known that a diagonally dominant matrix with positive diagonal elements and non-positive off-diagonal elements (known as an "M-matrix") is positive definite, and has an inverse that is positive. (For example, see M. Fiedler, Special Matrices and Their Applications in Numerical Mathematics, Martinus Nijhoff, Dordrecht, 1986, or R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.) Thus, Theorem 8.8 implies that the Newton iterates converge to the unique solution $x^*$ for any $x^{(0)} \in \mathbb{R}^n$.

14. An answer is provided in the appendix to the book.

15. We need to show that, given $\check{x} \in \boldsymbol{x}$, for any $x \in \boldsymbol{x}$, there is an $A \in \boldsymbol{A}$ such that $F(x) - F(\check{x}) = A(x - \check{x})$. However, the fact that $\boldsymbol{A}$ is a Lipschitz matrix means precisely that this is true for any $x \in \boldsymbol{x}$ and $\check{x} \in \boldsymbol{x}$, regardless of whether we are thinking of fixing $\check{x}$ and letting $x \in \boldsymbol{x}$ vary.

16. The fact that $\boldsymbol{A}$ is a Lipschitz matrix for $F$ over $\boldsymbol{x}$ follows directly from the definition of Lipschitz matrix and from the multivariate mean value theorem (Theorem 8.1 on page 441).

17. The proof is analogous to the argument leading to (2.11) on page 54: Let $x^* \in \boldsymbol{x}$ be a solution of $F(x) = 0$. Then the multivariate mean value theorem (Theorem 8.1, on page 441) implies that

$$0 = F(x^*) = F(\check{x}) + A(x^* - \check{x})$$

for some $A \in \boldsymbol{A}$ (since $\boldsymbol{A}$ is either a slope matrix or a Lipschitz matrix). Letting $v = x^* - \check{x}$, we have $Av = -F(\check{x})$, so, by the fundamental theorem of interval arithmetic, $v$ must be in any interval vector $\boldsymbol{v}$ that contains the solution set to $\boldsymbol{A}v = -F(\check{x})$. However, from our definition of $v$, it follows that $x^* = \check{x} + v \in \check{x} + \boldsymbol{v} = N(F, \boldsymbol{x}, \check{x})$.

18. We will use nonlinear_Gauss_Seidel_image.m and Krawczyk_step.m from the Moore / Kearfott / Cloud book. The function $F$ is the same as in problem 9 of this chapter.

(a) We obtain the following matlab dialog.

    >> x = [rigorinfsup('-0.1','0.2');rigorinfsup('0.8','1.1')]
    intval x =
    [   -0.1001,    0.2001]
    [    0.7999,    1.1001]
    >> xcheck = [0.05;0.95]
    xcheck =
        0.0500
        0.9500
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    [    0.0999,    0.1001]
    [    0.0949,    0.0951]
    intval gradient value y.x =
    [   -0.2101,    0.4001]
    [   -0.2201,    0.4401]
    intval gradient derivative(s) y.dx =
    [   -0.2001,    0.4001] [   -2.2001,   -1.5999]
    [    1.5999,    2.2001] [   -0.2001,    0.4001]
    intval x =
    [   -0.0429,    0.0262]
    [    0.9714,    1.0396]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
       -0.0083
        1.0055
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    [   -0.0110,   -0.0109]
    [   -0.0168,   -0.0167]
    intval gradient value y.x =
    [   -0.0808,    0.0582]
    [   -0.0892,    0.0545]
    intval gradient derivative(s) y.dx =
    [   -0.0858,    0.0524] [   -2.0792,   -1.9428]
    [    1.9428,    2.0792] [   -0.0858,    0.0524]
    intval x =
    [   -0.0015,    0.0015]
    [    0.9985,    1.0014]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
        0.0000
        0.9999
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-003 *
    [    0.1330,    0.1331]
    [    0.0082,    0.0083]
    intval gradient value y.x =
    [   -0.0027,    0.0030]
    [   -0.0030,    0.0030]
    intval gradient derivative(s) y.dx =
    [   -0.0030,    0.0030] [   -2.0027,   -1.9970]
    [    1.9970,    2.0027] [   -0.0030,    0.0030]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
       -0.0000
        1.0000
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-007 *
    [   -0.1062,   -0.1061]
    [   -0.0620,   -0.0619]
    intval gradient value y.x =
    1.0e-005 *
    [   -0.4486,    0.4465]
    [   -0.4078,    0.4066]
    intval gradient derivative(s) y.dx =
    [   -0.0001,    0.0001] [   -2.0001,   -1.9999]
    [    1.9999,    2.0001] [   -0.0001,    0.0001]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    error =
         0
    >>

We see quadratic convergence. Moreover, since the image under the method is contained in the interior of the original $\boldsymbol{x}$, the final answer must contain the unique solution to $F = 0$ within the original $\boldsymbol{x}$, that is, this computation is a proof that there is a unique solution, and it must lie within the box $([-0.0001, 0.0001], [0.9999, 1.0001])^T$. (The actual box was computed more accurately, but only five digits were displayed, because we used format short.)

(b) The computations are analogous to part (a):

    >> x = [rigorinfsup('-0.1','0.2');rigorinfsup('0.8','1.1')]
    intval x =
    [   -0.1001,    0.2001]
    [    0.7999,    1.1001]
    >> xcheck = [0.05;0.95]
    xcheck =
        0.0500
        0.9500
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    [    0.0999,    0.1001]
    [    0.0949,    0.0951]
    intval gradient value y.x =
    [   -0.2101,    0.4001]
    [   -0.2201,    0.4401]
    intval gradient derivative(s) y.dx =
    [   -0.2001,    0.4001] [   -2.2001,   -1.5999]
    [    1.5999,    2.2001] [   -0.2001,    0.4001]
    intval x =
    [   -0.0429,    0.0262]
    [    0.9714,    1.0396]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
       -0.0083
        1.0055
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    [   -0.0110,   -0.0109]
    [   -0.0168,   -0.0167]
    intval gradient value y.x =
    [   -0.0808,    0.0582]
    [   -0.0892,    0.0545]
    intval gradient derivative(s) y.dx =
    [   -0.0858,    0.0524] [   -2.0792,   -1.9428]
    [    1.9428,    2.0792] [   -0.0858,    0.0524]
    intval x =
    [   -0.0015,    0.0015]
    [    0.9985,    1.0014]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
        0.0000
        0.9999
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-003 *
    [    0.1330,    0.1331]
    [    0.0082,    0.0083]
    intval gradient value y.x =
    [   -0.0027,    0.0030]
    [   -0.0030,    0.0030]
    intval gradient derivative(s) y.dx =
    [   -0.0030,    0.0030] [   -2.0027,   -1.9970]
    [    1.9970,    2.0027] [   -0.0030,    0.0030]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    error =
         0
    >> xcheck = mid(x)
    xcheck =
       -0.0000
        1.0000
    >> [x,error] = nonlinear_Gauss_Seidel_image(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-007 *
    [   -0.1062,   -0.1061]
    [   -0.0620,   -0.0619]
    intval gradient value y.x =
    1.0e-005 *
    [   -0.4486,    0.4465]
    [   -0.4078,    0.4066]
    intval gradient derivative(s) y.dx =
    [   -0.0001,    0.0001] [   -2.0001,   -1.9999]
    [    1.9999,    2.0001] [   -0.0001,    0.0001]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    error =
         0
    >>

In this case, the Krawczyk iterates are identical to the interval Gauss-Seidel iterates, although, in general, this will not be the case. The interpretation of the results is the same as in part (a).

(c) We program our own solver, ch_8_18.m, based on the interval linear system bounder verifylss that is included in intlab and on formulas (8.59) and (8.60) on page 464. We use nonlinear_Gauss_Seidel_image.m as a template, to obtain the following.

    function NX = ch_8_18(X,y,f)
    % NX = ch_8_18(X,y,f)
    % (built using nonlinear_Gauss_Seidel_image as a template)
    % First compute f(y) using interval arithmetic to bound
    % roundoff error --
    n = length(X);
    iy = midrad(y,0);
    fy = feval(f,iy);
    % Now compute F'(X) --
    Xg = gradientinit(X);
    FXg = feval(f,Xg);
    % Now, do the interval Newton step --
    V = verifylss(FXg.dx, -fy);
    NX = y + V;

We have the following matlab dialog.

    >> x = [rigorinfsup('-0.1','0.2');rigorinfsup('0.8','1.1')]
    intval x =
    [   -0.1001,    0.2001]
    [    0.7999,    1.1001]
    >> xcheck = [0.05;0.95]
    xcheck =
        0.0500
        0.9500
    >> x = ch_8_18(x,xcheck,'F_ch_8_9a')
    intval y =
    [    0.0999,    0.1001]
    [    0.0949,    0.0951]
    intval gradient value y.x =
    [   -0.2101,    0.4001]
    [   -0.2201,    0.4401]
    intval gradient derivative(s) y.dx =
    [   -0.2001,    0.4001] [   -2.2001,   -1.5999]
    [    1.5999,    2.2001] [   -0.2001,    0.4001]
    intval x =
    [   -0.0286,    0.0233]
    [    0.9739,    1.0258]
    >> xcheck = mid(x)
    xcheck =
       -0.0026
        0.9999
    >> x = ch_8_18(x,xcheck,'F_ch_8_9a')
    intval y =
    [    0.0002,    0.0003]
    [   -0.0053,   -0.0052]
    intval gradient value y.x =
    [   -0.0523,    0.0523]
    [   -0.0586,    0.0478]
    intval gradient derivative(s) y.dx =
    [   -0.0571,    0.0466] [   -2.0516,   -1.9478]
    [    1.9478,    2.0516] [   -0.0571,    0.0466]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    >> xcheck = mid(x)
    xcheck =
        0.0000
        1.0000
    >> x = ch_8_18(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-005 *
    [    0.6870,    0.6871]
    [    0.0707,    0.0708]
    intval gradient value y.x =
    1.0e-003 *
    [   -0.1448,    0.1585]
    [   -0.1510,    0.1524]
    intval gradient derivative(s) y.dx =
    [   -0.0002,    0.0002] [   -2.0002,   -1.9998]
    [    1.9998,    2.0002] [   -0.0002,    0.0002]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    >> xcheck = mid(x)
    xcheck =
       -0.0000
        1.0000
    >> x = ch_8_18(x,xcheck,'F_ch_8_9a')
    intval y =
    1.0e-010 *
    [   -0.1168,   -0.1167]
    [   -0.0243,   -0.0242]
    intval gradient value y.x =
    1.0e-009 *
    [   -0.5862,    0.5629]
    [   -0.5770,    0.5721]
    intval gradient derivative(s) y.dx =
    [   -0.0001,    0.0001] [   -2.0001,   -1.9999]
    [    1.9999,    2.0001] [   -0.0001,    0.0001]
    intval x =
    [   -0.0001,    0.0001]
    [    0.9999,    1.0001]
    >>

We see that the convergence rate is similar to what we observed in parts (a) and (b), but the results in each iteration are sharper. (More sophisticated techniques are used in verifylss.m.)

19. Students' explanations may vary on this problem. The points of the problem are to highlight the analogy to multiplication of a scalar function by a scalar, and to give students additional practice in viewing properties of matrix multiplication. (Students can simply write down what the derivative is in terms of sums.)

20. The multivariate mean value theorem, proven in problem 1 of this chapter, shows that, for $x \in \boldsymbol{x}$ and $\check{x} \in \boldsymbol{x}$,

$$
f_i(x) = f_i(\check{x}) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(c_i)(x_j - \check{x}_j).
$$

If we replace the unknown quantity $\partial f_i / \partial x_j$ by an interval inclusion for its range over $\boldsymbol{x}$, and we replace $x$ by its range $\boldsymbol{x}$, then the fundamental theorem of interval arithmetic implies that

$$
f_i(x) \in f_i(\check{x}) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(\boldsymbol{x})(\boldsymbol{x}_j - \check{x}_j),
$$

from which the assertion follows.

21. On page 467, we have

$$
\varphi_i(t) \in (YF)_i(\check{x}_1, \ldots, \check{x}_{i-1}, t, \check{x}_{i+1}, \ldots, \check{x}_n) + I,
$$

where $I$ is an interval expression depending on the components of $\boldsymbol{x}$ other than $\boldsymbol{x}_i$, that we pretend is constant, even though it depends on the interval vector $\boldsymbol{x}$. (That is, we may think of $\varphi_i(t)$ as having values at the point $t$ that are intervals representing uncertainties due to variation in the coordinates other than the $i$-th one.) We then have

$$
\varphi_i'(t) = \frac{\partial (YF)_i}{\partial x_i}(\check{x}_1, \ldots, \check{x}_{i-1}, t, \check{x}_{i+1}, \ldots, \check{x}_n),
$$

so the univariate interval Newton method applied to $\varphi_i$ becomes

$$
N(\varphi_i; \boldsymbol{t}, \check{t}) = \check{t} - \frac{\varphi_i(\check{t})}{\varphi_i'(\boldsymbol{t})}
= \check{t} - \frac{(YF)_i(\check{x}) + I}{\dfrac{\partial (YF)_i}{\partial x_i}(\check{\boldsymbol{x}})},
$$

where $\check{x} = (\check{x}_1, \ldots, \check{x}_{i-1}, \check{t}, \check{x}_{i+1}, \ldots, \check{x}_n)$ and $\check{\boldsymbol{x}} = (\check{x}_1, \ldots, \check{x}_{i-1}, \boldsymbol{t}, \check{x}_{i+1}, \ldots, \check{x}_n)$. Plugging in the expressions for $(YF)_i(\check{x}) + I$ and for $\partial (YF)_i / \partial x_i(\check{\boldsymbol{x}})$ with $\check{t} = \check{x}_i$ and $\boldsymbol{t} = \boldsymbol{x}_i$ gives formula (8.62) for the nonlinear Gauss-Seidel method.

22. The details to be filled in are mentioned on lines 4 and 5 of page 470. These details are entirely analogous to the detailed argument on the previous page, so students merely need to go through that argument carefully, reversing inequality signs as appropriate.

23. By problem 19 of this section and the fact that the Jacobian matrix of the sum of two functions is the sum of the Jacobian matrices of the individual functions, the Jacobian matrix of $G$ is $G'(x) = I - Y F'(x)$. (Writing things down was already done in problem 19.)

24. This is a simple observation. We have $b_+ s = f(x_+) - f(x)$. We simply divide both sides by the scalar $b_+$, observe what $x_+$ and $s$ are, and observe how the quasi-Newton method is defined.

25. The students can give various proofs depending on which theorems they assume. Here is one proof that doesn't assume much. We have

$$
\|v + w\|_2^2 = (v + w) \circ (v + w) = \|v\|^2 + \|w\|^2 + 2\, v \circ w
$$

and

$$
(\|v\| + \|w\|)^2 = \|v\|^2 + \|w\|^2 + 2\|v\|\|w\|,
$$

so the two are equal if and only if $v \circ w = \|v\|\|w\|$. Now, write $w = w_v + w_{v^\perp}$, where $w_v = \tilde{\alpha} v$ and $w_{v^\perp} \circ v = 0$. Then, by expanding $(\tilde{\alpha} v + w_{v^\perp}) \circ (\tilde{\alpha} v + w_{v^\perp})$, we have

$$
\|w\|^2 = \|\tilde{\alpha} v + w_{v^\perp}\|^2 = \tilde{\alpha}^2 \|v\|^2 + \|w_{v^\perp}\|^2,
$$

so, for $v \circ w$ to equal $\|v\|\|w\|$, we must have

$$
v \circ w = \tilde{\alpha}\, v \circ v + 0 = \tilde{\alpha} \|v\|^2 = \|v\| \sqrt{\tilde{\alpha}^2 \|v\|^2 + \|w_{v^\perp}\|^2},
$$

whence

$$
\tilde{\alpha} \|v\| = \sqrt{\tilde{\alpha}^2 \|v\|^2 + \|w_{v^\perp}\|^2}.
$$

If $w \ne 0$, this can only happen if $\tilde{\alpha} \ge 0$ and $\|w_{v^\perp}\|^2 = 0$. We thus have $\alpha = 1/\tilde{\alpha}$, and the first part of the problem is done.

Now, we consider (8.80) on page 475. The Euclidean norm $\|A\|_E$ (also known as the Frobenius norm) is simply the norm of the matrix considered as a vector in $\mathbb{R}^{n \times n}$, so, by our previous computations, the square of the left side of the inequality and the square of the right side of the inequality in Remark 8.20, with $v = \lambda A$ and $w = (1 - \lambda) B$, differ by $2\|v\|\|w\| - 2\, v \circ w$, which we have seen to be zero only if $v = \alpha w$ for some positive scalar $\alpha$. It remains to show $\lambda (B_1 - B) + (1 - \lambda)(B_2 - B) \ne 0$ for any $\lambda$. But

$$
[\lambda (B_1 - B) + (1 - \lambda)(B_2 - B)]\, s = \lambda (y - Bs) + (1 - \lambda)(y - Bs) = y - Bs \ne 0.
$$

26. We have written a matlab function Broyden_step.m, according to (8.87), (8.88), and (8.89), as follows.

    function [xplus,fxplus,Bplusinv] = Broyden_step(x,fx,Binv,F)
    % First, compute the step based on the present matrix --
    s = -Binv * fx
    % Now, compute xplus and fxplus --
    xplus = x + s;
    fxplus = feval(F,xplus);
    % Update the inverse
    y = fxplus - fx;
    Binvy = Binv*y;
    Bplusinv = Binv + (s - Binvy)*s'*Binv / (s'*Binvy);

We initialize Binv to be the identity matrix, and produce the following matlab dialog. (We abridge this dialog by deleting all but the iterates x, until the last few steps.)

>> x = [0.2;0.7] x = 0.2000 0.7000 >> Binv = eye(2) Binv = 1 0 0 1 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -0.3500 0.4200 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = 5.9573 -1.5400 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -0.6430 0.2441 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -1.1435 -0.1400 -0.2951 -0.1710 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -0.5344 0.5900 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -0.6726 0.7673 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = 5.4204 -0.1398 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -0.9669 0.3866 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = -1.4500 0.1373 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = 1.4798 0.4210 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = 2.0821 1.2679 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) x = 0.5867 1.3869 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = -0.0227 -0.7070 x = 0.5640 0.6799 Fx = 0.8559 0.7669 Binv =


0.1316 0.2459 -0.3000 0.3214 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = -0.3013 0.0102 x = 0.2627 0.6901 Fx = 0.5928 0.3626 Binv = 0.3109 0.5429 -0.3656 0.2126 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = -0.3811 0.1396 x = -0.1184 0.8297 Fx = 0.3255 -0.1965 Binv = 0.3044 0.5362 -0.5544 0.0152 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = 0.0063 0.1835 x = -0.1122 1.0132 Fx = -0.0141 -0.2273 Binv = -0.0691 0.5592 -0.5416 0.0144 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = 0.1261 -0.0043 x = 0.0140 1.0089 Fx = -0.0177 0.0282 Binv = -0.0631 0.4928 -0.5381 -0.0246 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = -0.0150 -0.0088 x = -0.0010 1.0001 Fx = -0.0002 -0.0021 Binv = -0.0449 0.4699 -0.5405 -0.0216 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’)


s = 1.0e-003 * 0.9684 -0.1281 x = -0.0001 0.9999 Fx = 1.0e-003 * 0.1009 -0.1388 Binv = -0.0429 0.5056 -0.5390 0.0049 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = 1.0e-004 * 0.7448 0.5506 x = 0.0000 1.0000 Fx = 1.0e-004 * -0.0922 0.1020 Binv = -0.0232 0.4829 -0.5212 -0.0157 >> [x,Fx,Binv] = Broyden_step(x,Fx,Binv,’F_ch_8_9a’) s = 1.0e-005 * -0.5141 -0.4646 x = -0.0000 1.0000 Fx = 1.0e-007 * 0.7086 -0.7823 Binv = -0.0211 0.4809 -0.5193 -0.0175 >>

We observe that the iterates are essentially random until a reasonable $B^{-1}$ is developed, but then, there appears to be a superlinear convergence. Traditionally, this is ameliorated by choosing $t^{(k)} < 1$ in (8.74), and also, traditionally, Binv is sometimes initialized using a finite difference approximation, then taking the inverse. Note that the actual inverse at the solution is:

$$
(F'(x^*))^{-1} = \begin{pmatrix} 0.0 & 0.5 \\ -0.5 & 0.0 \end{pmatrix}.
$$

In this case, the Broyden iterates Binv appear to be converging to the actual inverse. (This is not the case in general.)

27. We use the function homotopy_step.m that is posted on the web page for the book. The required functions for this problem are also posted on the

web page. We use the matlab script ch_8_27 to generate the array of x and t to plot, as follows.

    >> ch_8_27
    >> x
    x =
       -2.0000   -1.6097   -1.2201   -0.8313   -0.4440   -0.0592    0.3201    0.6861    1.0113
    >> t
    t =
             0    0.0878    0.1781    0.2722    0.3721    0.4817    0.6091    0.7723    1.0115
    >> plot(x,t)
    >>

The plot is as follows.

[Figure: plot of t (vertical axis) against x (horizontal axis) for the points computed above.]

We do not see the predictor and corrector steps on this plot. However, they can be obtained by appropriately modifying homotopy_step.m, and using a variation of matlab's plot function.

Chapter 9

1. To see that the equations

$$
x_2 = x_1 + (x_3 - x_1)(1 - \alpha) \quad \text{and} \quad x_1 = x_0 + (x_2 - x_0)\alpha
$$

are the correct conditions, simply draw a figure of the process and study the relationship between successive steps of the algorithm. Here is a solution to the system: Plugging in the values for $x_0$, $x_1$, $x_2$, and $x_3$ into these two equations and simplifying gives the same equation for each, namely: $(a - b)(-1 + \alpha + \alpha^2) = 0$. Thus, unless the end points of the original interval are the same, we must have $\alpha^2 + \alpha - 1 = 0$, and the positive root of this quadratic is

$$
\alpha = \frac{\sqrt{5} - 1}{2}.
$$

Note: A computer algebra system, such as Mathematica or Maple, could be useful for this problem.

2. Since the Fréchet derivative of $\varphi$ at $x$ is $\nabla\varphi(x)$,

$$
\varphi(x + \lambda s) = \varphi(x) + \lambda (\nabla\varphi(x))^T s + o(\|\lambda s\|).
$$

Thus, if $\lambda > 0$ is sufficiently small, the term $\lambda (\nabla\varphi(x))^T s$ dominates the $o(\|\lambda s\|)$ term; the result then follows from the assumption $(\nabla\varphi(x))^T s < 0$.

3. An answer is provided in the appendix to the book.

4. This problem can actually be done using matlab's plotting facilities. For example, the following script (named ch_9_4.m) can be used

    line([-2.1,1.6],[-1.1,4.1],'Color','white')
    draw_simplex([-1,3],[-2,3],[-2,4],'black','1')
    draw_simplex([1/2,-1],[-1,3],[-2,3],'blue','2')
    draw_simplex([1/2,-1],[-1,3],[3/2,-1],'red','3')
    % Actual optimizing point --
    text(0,0,'*')

where draw_simplex is as follows.

    function [success] = draw_simplex(P1,P2,P3,color,label)
    line([P1(1),P2(1)],[P1(2),P2(2)],'Color',color);
    line([P1(1),P3(1)],[P1(2),P3(2)],'Color',color);
    line([P2(1),P3(1)],[P2(2),P3(2)],'Color',color);
    centroid = (1/3)*(P1+P2+P3);
    text(centroid(1),centroid(2),label,'Color',color)
    success = 1;

Running ch_9_4.m in the matlab command window gives the following plot.

[Figure: the three simplexes, labeled 1, 2, and 3 at their centroids, with the optimizing point marked by * at the origin.]

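5. We will choose $\lambda_k$ to minimize $\varphi$ in the direction $-\nabla\varphi(x^{(k)})$. (This is one variation of the steepest descent method.) We start with $x^{(0)} = (-2, 3)^T$:

Step 1:

$$
x^{(0)} \leftarrow \begin{pmatrix} -2 \\ 3 \end{pmatrix}, \qquad
s = -\nabla\varphi(x^{(0)}) = \begin{pmatrix} 4 \\ -6 \end{pmatrix},
$$

and $f(\lambda) = \varphi(x^{(0)} + \lambda s) = (-2 + 4\lambda)^2 + (3 - 6\lambda)^2$. For this simple illustrative example, $f$ can be minimized using elementary calculus, giving $\lambda_0 = \frac{1}{2}$, so

$$
x^{(1)} = \begin{pmatrix} -2 \\ 3 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 4 \\ -6 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad
s = -\nabla\varphi(x^{(1)}) = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
$$

and we have obtained a local minimum in only one step! (In general, this cannot be expected to happen. The reason it does here is that the contours of $\varphi$ are perfect circles centered on the optimizing point. The more eccentric the contours are, the less efficient the steepest descent method will be.)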
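For a quadratic objective the exact line search has a closed form, which makes the one-step behavior easy to reproduce. The sketch below assumes the objective is written as one half of x'*Q*x with a symmetric positive definite Q, so that the gradient is Q*x and the minimizing step along s = -Q*x is (s'*s)/(s'*Q*s). The matrix Qe is a hypothetical eccentric example introduced only for contrast; it is not taken from the problems.

```matlab
% Steepest descent with exact line search on phi(x) = (1/2)*x'*Q*x.
Q  = 2*eye(2);                 % gives phi(x) = x1^2 + x2^2, as in problem 5
x0 = [-2; 3];

s      = -Q*x0;                % steepest descent direction, -grad phi(x0)
lambda = (s'*s) / (s'*Q*s);    % exact minimizer of phi(x0 + lambda*s)
x1     = x0 + lambda*s;        % lands on the minimizer for circular contours

% Hypothetical eccentric case: contours are ellipses, not circles.
Qe      = diag([1, 10]);
se      = -Qe*x0;
lambdae = (se'*se) / (se'*Qe*se);
x1e     = x0 + lambdae*se;     % still far from the minimizer at the origin
```

With circular contours the first step reaches the origin, while in the eccentric case a single step leaves the iterate well away from it.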

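6. We again start with $x^{(0)} = (-2, 3)^T$, and we will carry 4 digits in the computations:

Step 1:

$$
x^{(0)} \leftarrow \begin{pmatrix} -2 \\ 3 \end{pmatrix}, \qquad
s = -\nabla\varphi(x^{(0)}) = \begin{pmatrix} 0.04 \\ -6 \end{pmatrix},
$$

and $f(\lambda) = \varphi(x^{(0)} + \lambda s) = (-2 + 0.04\lambda)^2 + (3 - 6\lambda)^2$. Again, calculus gives $\lambda_0 \approx 0.5022$, so

$$
x^{(1)} \leftarrow \begin{pmatrix} -2 \\ 3 \end{pmatrix} + 0.5022 \begin{pmatrix} 0.04 \\ -6 \end{pmatrix} \approx \begin{pmatrix} -1.9799 \\ -0.0132 \end{pmatrix}, \qquad
s \leftarrow -\nabla\varphi(x^{(1)}) \approx \begin{pmatrix} 0.0396 \\ 0.0264 \end{pmatrix}.
$$

Step 2: $f(\lambda) = \varphi(x^{(1)} + \lambda s) = (-1.9799 + 0.0396\lambda)^2 + (-0.0132 + 0.0264\lambda)^2$. We obtain $\lambda_0 \approx 34.7675$, so

$$
x^{(2)} \leftarrow \begin{pmatrix} -1.9799 \\ -0.0132 \end{pmatrix} + 34.7675 \begin{pmatrix} 0.0396 \\ 0.0264 \end{pmatrix} \approx \begin{pmatrix} -0.6032 \\ 0.9047 \end{pmatrix}, \qquad
s \leftarrow -\nabla\varphi(x^{(2)}) \approx \begin{pmatrix} 0.0121 \\ -1.8093 \end{pmatrix}.
$$

We do not see immediate convergence. In fact, if we plotted the iterates, we would see zig-zagging progress that would only slowly move towards the local minimum. This is because the contours are eccentric: The steepest descent method is sensitive to scaling.

7. This is very similar to the previous two problems. We have

$$
\nabla f(x, y) = \begin{pmatrix} -4 + 2x + y \\ -1 + x \end{pmatrix}, \quad \text{so} \quad
s = -\nabla f(x^{(0)}) = \begin{pmatrix} -1 \\ -1 \end{pmatrix},
$$

and $g(\lambda) = f(2 - \lambda, 1 - \lambda) = 3 - 2\lambda + 2\lambda^2$, with a minimum at $\lambda = \frac{1}{2}$. This gives

$$
x^{(1)} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} -1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1.5 \\ 0.5 \end{pmatrix}.
$$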

8. An answer is provided in the appendix to the book. 9. (a) min

v

subject to

v v v v v v v v

≥ ≥ ≥ ≥ ≥ ≥ ≥ ≥

x2 − 1 −x2 + 1 x1 + x2 − 4 −x1 − x2 + 4 2x1 + x2 − 5 −2x1 − x2 + 5 3x1 + x2 − 8 −3x1 − x2 + 8

(b) We choose to put it into the form of matlab's linprog solver. That is, we need to provide a vector $c$, a matrix $A$, and a vector $b$ such that the problem is $\min c^T x$ such that $Ax \le b$. We have three variables, where we identify the slack variable $x_3$ with $v$. We obtain
\[
c = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad
A = \begin{pmatrix}
0 & 1 & -1 \\
0 & -1 & -1 \\
1 & 1 & -1 \\
-1 & -1 & -1 \\
2 & 1 & -1 \\
-2 & -1 & -1 \\
3 & 1 & -1 \\
-3 & -1 & -1
\end{pmatrix}, \qquad\text{and}\qquad
b = \begin{pmatrix} 1 \\ -1 \\ 4 \\ -4 \\ 5 \\ -5 \\ 8 \\ -8 \end{pmatrix}.
\]
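The pattern in $A$ and $b$ is mechanical: each data point contributes one row for $v \ge f_i$ and one for $v \ge -f_i$. The Python sketch below shows that construction. It assumes the data points $(t_i, y_i) = (0,1), (1,4), (2,5), (3,8)$ and the model $x_1 t + x_2$, both inferred from the constraints in part (a) rather than quoted from the problem statement. The helper name `linf_fit_lp` is hypothetical.

```python
def linf_fit_lp(points):
    """Return (c, A, b) for  min c^T x  subject to  A x <= b,  with x = (x1, x2, v).

    Assumes the model x1*t + x2 and residuals f_i = x1*t_i + x2 - y_i.
    """
    c = [0, 0, 1]                      # the objective is the bound v alone
    A, b = [], []
    for t, y in points:
        # v >= x1*t + x2 - y      <=>    t*x1 + x2 - v <= y
        A.append([t, 1, -1])
        b.append(y)
        # v >= -(x1*t + x2 - y)   <=>   -t*x1 - x2 - v <= -y
        A.append([-t, -1, -1])
        b.append(-y)
    return c, A, b


points = [(0, 1), (1, 4), (2, 5), (3, 8)]   # inferred from the constraints in 9(a)
c, A, b = linf_fit_lp(points)
for row, rhs in zip(A, b):
    print(row, "<=", rhs)
```

The printed rows reproduce the matrix and right-hand side above, so the same helper could generate the inequality data for a longer data set.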

(c) We use matlab's linprog (from the optimization toolbox), with the following dialog.

    >> c = [0;0;1]
    c =
         0
         0
         1
    >> A = [0 1 -1; 0 -1 -1; 1 1 -1; -1 -1 -1; 2 1 -1; -2 -1 -1; 3 1 -1; -3 -1 -1]
    A =
         0     1    -1
         0    -1    -1
         1     1    -1
        -1    -1    -1
         2     1    -1
        -2    -1    -1
         3     1    -1
        -3    -1    -1
    >> b = [1;-1;4;-4;5;-5;8;-8]
    b =
         1
        -1
         4
        -4
         5
        -5
         8
        -8
    >> [x,minval] = linprog(c,A,b)
    Optimization terminated.
    x =
        2.0000
        1.5000
        0.5000
    minval =
        0.5000
    >>

The $\ell_\infty$ fit is thus $g(t) = 2t + 1.5$. (Compare this to the $\ell_2$ fit we may obtain using a singular value decomposition, namely $\tilde g(t) = 2.2t + 1.2$.)

10. Suppose $x$ solves (9.22), and set $v$ to be the minimum value of the objective function. Then, since $v \ge |f_i(x)|$ for all $i$, $(x, v)$ is a feasible point of (9.23). Furthermore, if $(x, v)$ does not minimize (9.23), there must be a smaller $\tilde v < v$ corresponding to a feasible point of (9.23). However, since at least one of the inequalities in (9.23) is an equality for $(x, v)$, $\tilde v$ would need to correspond to a different $x$, say $\tilde x$. Then, $\tilde x$ would lead to a smaller objective in (9.22), contradicting the optimality assumption.

Conversely, suppose $(x, v)$ solves (9.23). Then, $v = \max_{1 \le i \le m} |f_i(x)|$, because, otherwise, all of the constraints would be strictly satisfied, and a smaller $v$ would also satisfy the constraints, contradicting optimality of (9.23). If the corresponding $x$ is not the minimum of (9.22), we could find an $\tilde x$ such that $\max_{1 \le i \le m} |f_i(\tilde x)|$ is smaller. Then, setting
\[
\tilde v = \max_{1 \le i \le m} |f_i(\tilde x)|
\]
leads to a feasible point $(\tilde x, \tilde v)$ for (9.23) with smaller objective value, contradicting the assumption that $(x, v)$ optimizes (9.23).

11. (a) We identify $v_i$ with $|f_i(x)|$. We then have
\[
\min \sum_{i=1}^{m} v_i \quad\text{subject to}\quad
\begin{aligned}
v_i &\ge f_i(x), & 1 \le i \le m, \\
v_i &\ge -f_i(x), & 1 \le i \le m.
\end{aligned}
\]

(b) Again, we will use matlab's linprog function from the optimization toolbox, with the following dialog. (The objective vector c is not entered in the dialog shown; for this formulation it must be $c = (0, 0, 1, 1, 1, 1)^T$, which sums the $v_i$.)

    >> A = [ 0  1 -1  0  0  0
             0 -1 -1  0  0  0
             1  1  0 -1  0  0
            -1 -1  0 -1  0  0
             2  1  0  0 -1  0
            -2 -1  0  0 -1  0
             3  1  0  0  0 -1
            -3 -1  0  0  0 -1]
    A =
         0     1    -1     0     0     0
         0    -1    -1     0     0     0
         1     1     0    -1     0     0
        -1    -1     0    -1     0     0
         2     1     0     0    -1     0
        -2    -1     0     0    -1     0
         3     1     0     0     0    -1
        -3    -1     0     0     0    -1
    >> b = [1;-1;4;-4;5;-5;8;-8]
    b =
         1
        -1
         4
        -4
         5
        -5
         8
        -8
    >> [x,minval] = linprog(c,A,b)
    Optimization terminated.
    x =
        2.3333
        1.0000
        0.0000
        0.6667
        0.6667
        0.0000
    minval =
        1.3333
    >>
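As a cross-check on the linprog output, one can evaluate the residuals of the three candidate lines in each of the three norms. The Python sketch below does this under two assumptions: the data points $(0,1), (1,4), (2,5), (3,8)$ are inferred from the constraints in problem 9(a), and the displayed value 2.3333 is taken to be $7/3$. The names `fits` and `residual_norms` are hypothetical.

```python
points = [(0, 1), (1, 4), (2, 5), (3, 8)]     # inferred data, see the lead-in

fits = {
    "l1":   (7.0 / 3.0, 1.0),   # x(1), x(2) from the l1 dialog (2.3333 read as 7/3)
    "l2":   (2.2, 1.2),         # least-squares line quoted in problem 9
    "linf": (2.0, 1.5),         # x(1), x(2) from the l-infinity dialog
}


def residual_norms(slope, intercept):
    """Return the l1, l2 and l-infinity norms of the residual vector of one line."""
    r = [slope * t + intercept - y for t, y in points]
    return {
        "l1": sum(abs(v) for v in r),
        "l2": sum(v * v for v in r) ** 0.5,
        "linf": max(abs(v) for v in r),
    }


norms = {name: residual_norms(*coef) for name, coef in fits.items()}
winners = {name: min(fits, key=lambda f: norms[f][name]) for name in fits}
for name, best in winners.items():
    print("smallest", name, "residual norm:", best, "fit")
```

Each line gives the smallest residual in the norm it was designed for, and the $\ell_1$ and $\ell_\infty$ residual norms of the corresponding fits agree with the `minval` values 1.3333 and 0.5000 returned by linprog.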

The linprog result gives a fit of $h(t) = 2.3t + 1$, close to, but not the same as, the $\ell_\infty$ and $\ell_2$ fits. Summarizing, we get

    ℓ1 : 2.3t + 1.0
    ℓ2 : 2.2t + 1.2
    ℓ∞ : 2.0t + 1.5

12. (a) We will put the problem into the form of (9.18). Identify A with $x_1$, B with $x_2$, C with $x_3$, and D with $x_4$. We need to introduce a slack variable for each of the 10 inequality constraints, for a total of $n = 14$ variables. We have $m_1 = 10$. We have $c_0 = 0$, and we represent the objective as $z = c^T x$, where $c = (c_1, \ldots, c_n)^T$. We have $A = (A_1 \mid I_{10})$, where $I_{10}$ is the $10 \times 10$ identity matrix multiplying the slack variables and
\[
A_1 = \begin{pmatrix}
100 & 50 & 80 & 40 \\
-12 & -4 & -4.8 & -4 \\
-100 & 0 & 0 & 0 \\
100 & 0 & 0 & 0 \\
0 & -50 & 0 & 0 \\
0 & 50 & 0 & 0 \\
0 & 0 & -80 & 0 \\
0 & 0 & 80 & 0 \\
0 & 0 & 0 & -40 \\
0 & 0 & 0 & 40
\end{pmatrix},
\]
and
\[
b = (200{,}000,\ -18{,}000,\ 0,\ 100{,}000,\ 0,\ 100{,}000,\ 0,\ 100{,}000,\ 0,\ 100{,}000)^T,
\]
\[
c = (10.0,\ 3.5,\ 4.0,\ 3.2,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0,\ 0.0)^T.
\]

(b) We use matlab to obtain, for $t = 0$, that $x_2$, $x_3$, $x_6$, $x_9$, $x_{11}$, and $x_{14}$ are zero, while the other variables are nonzero. That is, we have 6 zero values and 8 nonzero values. For $t = 1250/0.9285$, we have that $x_2$, $x_6$, $x_9$, $x_{12}$, and $x_{14}$ are zero, while all the other variables are nonzero. That is, we have 5 zero values and 9 nonzero values. Since $m_1 = 10$ and $n - m_1 = 4$, this does not contradict Theorem 9.2. However, it is not the usual case, in which exactly $m_1$ variables are positive.

(c) At the midpoint of the line connecting the extreme points in part (b), exactly $m_1 = 10$ variables are positive; this is the usual case. (The variables $x_2$, $x_6$, $x_9$, and $x_{14}$ are zero.)

13. We use the following matlab dialog (in file ch_9_13.m) to draw the boundaries of the inequalities

    x = linspace(0,1);
    y1 = 1-x;
    y2 = (4+x)/2;
    plot(x,y1,x,y2);

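to get the following plot.

[Figure: the lines y1 = 1 - x (bottom) and y2 = (4 + x)/2 (top), plotted for 0 ≤ x ≤ 1 with the vertical axis running from 0 to 2.5.]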

Since the feasible set to the first inequality lies below the bottom line, and the feasible set to the second inequality lies above the top line, the feasible set for the problem is empty.

14. It will be most instructive if students solve this by hand, since they see the inner workings of the simplex method in that case. We will follow the steps starting on page 508. We do these steps here using matlab. In these computations, the first two columns of the tableau represent $x_1$ and $x_2$, the next four columns represent the slack variables, the last column represents the right side vector $b$, and the last row represents the objective value.

    >> A = [1 -2 1 0 0 0 4
    1 1 0 1 0 0 15
    -1 1 0 0 1 0 6
    1 0 0 0 0 1 8
    2 1 0 0 0 0 0 ]
    A =
         1    -2     1     0     0     0     4
         1     1     0     1     0     0    15
        -1     1     0     0     1     0     6
         1     0     0     0     0     1     8
         2     1     0     0     0     0     0
    >> for i=2:5;A(i,:) = A(i,:) - A(i,1)*A(1,:);end
    >> A
    A =
         1    -2     1     0     0     0     4
         0     3    -1     1     0     0    11
         0    -1     1     0     1     0    10
         0     2    -1     0     0     1     4
         0     5    -2     0     0     0    -8
    >> A(4,:) = A(4,:)/2
    A =
        1.0000   -2.0000    1.0000         0         0         0    4.0000
             0    3.0000   -1.0000    1.0000         0         0   11.0000
             0   -1.0000    1.0000         0    1.0000         0   10.0000
             0    1.0000   -0.5000         0         0    0.5000    2.0000
             0    5.0000   -2.0000         0         0         0   -8.0000
    >> for i=1:3;A(i,:) = A(i,:) - A(i,2)*A(4,:);end
    >> A(5,:) = A(5,:) - A(5,2)*A(4,:)
    A =
        1.0000         0         0         0         0    1.0000    8.0000
             0         0    0.5000    1.0000         0   -1.5000    5.0000
             0         0    0.5000         0    1.0000    0.5000   12.0000
             0    1.0000   -0.5000         0         0    0.5000    2.0000
             0         0    0.5000         0         0   -2.5000  -18.0000
    >> A(2,:) = A(2,:)/A(2,3)
    A =
        1.0000         0         0         0         0    1.0000    8.0000
             0         0    1.0000    2.0000         0   -3.0000   10.0000
             0         0    0.5000         0    1.0000    0.5000   12.0000
             0    1.0000   -0.5000         0         0    0.5000    2.0000
             0         0    0.5000         0         0   -2.5000  -18.0000
    >> for i=3:5;A(i,:) = A(i,:) - A(i,3)*A(2,:);end
    >> A
    A =
         1     0     0     0     0     1     8
         0     0     1     2     0    -3    10
         0     0     0    -1     1     2     7
         0     1     0     1     0    -1     7
         0     0     0    -1     0    -1   -23
    >>

Thus, the maximum occurs for $x_1 = 8$, $x_2 = 7$, $s_1 = 10$, $s_3 = 7$, and the maximum value is 23.
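The three pivots in the dialog can be replayed in exact rational arithmetic, which is a convenient way for students to check hand computations. The Python sketch below is a hypothetical re-implementation, not part of the original solution. The helper `pivot` performs a full Gauss-Jordan step on a chosen entry, and the pivot positions (rows and columns counted from zero) are read off from the matlab commands.

```python
from fractions import Fraction


def pivot(T, r, c):
    """Gauss-Jordan pivot on entry (r, c) of the tableau T (a list of rows), in place."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]                    # scale the pivot row
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]


rows = [
    [1, -2, 1, 0, 0, 0, 4],
    [1, 1, 0, 1, 0, 0, 15],
    [-1, 1, 0, 0, 1, 0, 6],
    [1, 0, 0, 0, 0, 1, 8],
    [2, 1, 0, 0, 0, 0, 0],      # objective row
]
T = [[Fraction(v) for v in row] for row in rows]

# Same pivots as the dialog: A(1,1), then A(4,2), then A(2,3) in matlab indexing.
for r, c in [(0, 0), (3, 1), (1, 2)]:
    pivot(T, r, c)

for row in T:
    print([str(v) for v in row])
```

The last row ends in $-23$ and the right-hand column gives $x_1 = 8$ (row 1) and $x_2 = 7$ (row 4), matching the final tableau in the dialog.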

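15. We first put the problem into the form for (9.1). We have:
\[
\begin{aligned}
\text{minimize}\quad & \varphi(x) = -2x_1 - x_2 \\
\text{subject to}\quad
& x_1 - 2x_2 \le 4, \\
& x_1 + x_2 \le 15, \\
& -x_1 + x_2 \le 6, \\
& x_1 \le 8, \\
& -x_1 \le 0, \\
& -x_2 \le 0.
\end{aligned}
\]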

There are no equality constraints in this formulation. We have
\[
\nabla\varphi = \begin{pmatrix} -2 \\ -1 \end{pmatrix}, \qquad
\nabla g = \begin{pmatrix} 1 & 1 & -1 & 1 & -1 & 0 \\ -2 & 1 & 1 & 0 & 0 & -1 \end{pmatrix}.
\]
We thus get a system of quadratics (including the $u_i g_i = 0$).

16. The problem in Example 9.10 is different from that of Example 9.11 in the sense that it is a 0-1 integer nonlinear programming problem, that is, the values of the decision variables (parameters) are all either 0 or 1. Let $r_i \in \{0, 1\}$ denote whether or not you harvest fish in year $i$, with $r_i = 1$ denoting that fish are harvested, and let $P_i$ denote the total profit actually cashed in up to and during year $i$, $0 \le i \le 2$, given in dollars at the beginning of year 0. It is assumed that the fish that are left in the pond at the end of year 3 represent profit. Let $A_i$ be the number of fish in the pond at the beginning of year $i$, so
\[
P_{i+1} = (0.75)^i\, 0.7\, r_{i+1} (1000 A_i) + P_i \qquad\text{and}\qquad A_{i+1} = (2 - r_i) A_i, \qquad 0 \le i \le 2,
\]
with $A_0 = 10$, $P_0 = r_0 (0.7)(1000) A_0$, and $P_4 = P_3 + 1000 A_3$. We need to choose $r_0$, $r_1$, and $r_2$ to maximize $P_4$. There is software for solving 0-1 integer programming problems, although these problems are difficult for large numbers of decision variables. In this particular example, we can program the objective in matlab, using the recursion, and try each of the 8 possible variable vectors $(r_0, r_1, r_2)$. (However, we call these [r(1),r(2),r(3)], since matlab subscripts must begin with 1.) We use the following function:

    function profit = P_ch_9_16(r)
    % r(i) = 1 if fish are harvested in the i-th year, 0 otherwise
    A(1) = 10;
    P(1) = r(1)*700*.75^3*A(1);
    for i=1:2
        A(i+1) = (2-r(i))*A(i);
        P(i+1) = (0.75)^(i) * 700 * r(i+1) * A(i) + P(i);
    end
    % P has three entries; the last one is the value reported in the dialog
    profit = P(3);

A corresponding matlab dialog is

    >> r = [0 0 0 ]
    r =
         0     0     0
    >> P_ch_9_16(r)
    ans =
         0
    >> r = [0 0 1 ]
    r =
         0     0     1
    >> P_ch_9_16(r)
    ans =
            7875
    >> r = [0 1 0 ]
    r =
         0     1     0
    >> P_ch_9_16(r)
    ans =
            5250
    >> r = [0 1 1 ]
    r =
         0     1     1
    >> P_ch_9_16(r)
    ans =
           13125
    >> r = [1 0 0 ]
    r =
         1     0     0
    >> P_ch_9_16(r)
    ans =
      2.9531e+003
    >> r = [1 0 1 ]
    r =
         1     0     1
    >> P_ch_9_16(r)
    ans =
      6.8906e+003
    >> r = [1 1 0 ]
    r =
         1     1     0
    >> P_ch_9_16(r)
    ans =
      8.2031e+003
    >> r = [1 1 1 ]
    r =
         1     1     1
    >> P_ch_9_16(r)
    ans =
      1.2141e+004
    >>

This computation shows that the maximum profit, in dollars at the beginning of year 0, is \$13,125, and it occurs when we do not harvest during the first year, but harvest during the second and third years. (We do not count the fish that are left in the pond.) This is consistent with the optimal path represented in Figure 9.6. Note: Working backward, as in Figure 9.6, instead of forward, as implied by these computations, we obtain the same result, but with fewer total operations in evaluating the function. (We do, however, need to store the values at each node.)
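The eight evaluations in the dialog can also be done in one pass. The Python sketch below is an illustrative port, not the authors' code. The function `profit` mirrors the matlab function P_ch_9_16 line by line, with indices shifted to start at 0, and the use of `itertools.product` for the enumeration is our own choice.

```python
from itertools import product


def profit(r):
    """Port of P_ch_9_16: r[i] in {0, 1} says whether fish are harvested in year i."""
    A = [10.0]                                  # A(1) = 10 in the matlab function
    P = [r[0] * 700 * 0.75**3 * A[0]]           # same initial term as the matlab P(1)
    for i in range(2):
        A.append((2 - r[i]) * A[i])
        P.append(0.75 ** (i + 1) * 700 * r[i + 1] * A[i] + P[i])
    return P[2]                                 # P(3) in matlab indexing


# Brute force over all 2^3 harvesting schedules.
table = {r: profit(r) for r in product((0, 1), repeat=3)}
best = max(table, key=table.get)
for r, value in table.items():
    print(r, value)
print("best schedule:", best, "profit:", table[best])
```

Exhaustive enumeration is only practical here because there are three binary decisions; the number of schedules doubles with each additional year, which is why the backward (dynamic programming) evaluation mentioned above becomes attractive.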


Chapter 10

1. An answer is provided in the appendix to the book.

2. (a) $A = A_2 + \frac{h}{2} A_1 + h^2 A_0$, where
\[
A_2 = \begin{pmatrix}
-2 & 1 & 0 & \cdots & 0 \\
1 & -2 & 1 & \ddots & \vdots \\
0 & 1 & -2 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & 1 \\
0 & \cdots & 0 & 1 & -2
\end{pmatrix},
\qquad
A_1 = \begin{pmatrix}
0 & -p_1 & 0 & \cdots & 0 \\
p_2 & 0 & -p_2 & \ddots & \vdots \\
0 & p_3 & 0 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -p_{N-2} \\
0 & \cdots & 0 & p_{N-1} & 0
\end{pmatrix},
\]
and $A_0$ is the diagonal matrix whose $i$-th diagonal entry is $-q_i$, $1 \le i \le N - 1$, where $p_i = p(x_i)$ and $q_i = q(x_i)$. Thus, the matrix is tridiagonal, with diagonal element in the $i$-th row equal to $-2 - h^2 q_i$, left off-diagonal element (except for $i = 1$) equal to $1 + \frac{h}{2} p_i$, and right off-diagonal element (except for $i = N - 1$) equal to $1 - \frac{h}{2} p_i$.

(b) From part (a), $A$ will be strictly diagonally dominant if
\[
|-2 - h^2 q_i| > \left|1 + \frac{h}{2} p_i\right| + \left|1 - \frac{h}{2} p_i\right|, \qquad 1 \le i \le N - 1.
\]
Whenever $\frac{h}{2}|p_i| \le 1$, both off-diagonal elements are nonnegative and the right side equals 2. Noting that
\[
|-2 - h^2 q_i| > 2 \qquad\text{and}\qquad h|p_i| \le h \max_{0 \le x \le 1} |p(x)|,
\]

a condition for $A$ to be strictly diagonally dominant is $2 > h \max_{0 \le x \le 1} |p(x)|$. Dividing this inequality by 2 gives the result.

3. An answer is provided in the appendix to the book.

4. We use the Taylor polynomial expansion
\[
y(h) = y(0) + h y'(0) + \frac{h^2}{2} y''(0) + \frac{h^3}{6} y'''(c_h)
\]
to represent $u_1 \sim y(h)$, while $-2u_0 = -2y(0)$ and $-2\alpha h = -2y'(0)h$. Plugging these in and canceling like terms, we obtain
\[
2u_1 - 2u_0 - 2\alpha h = h^2 y''(0) + \frac{h^3}{3} y'''(c_h).
\]
Dividing this by $h^2$ thus gives
\[
\frac{2u_1 - 2u_0 - 2\alpha h}{h^2} = y''(0) + O(h).
\]
A similar computation holds for $y''(1)$.

It is not true that (10.21) is second order when $\alpha = \beta = 0$. As a counterexample, consider
\[
y(t) = t^2 - \frac{2}{3} t^3.
\]
Then $y$ satisfies the differential equation
\[
y'' + 2y' + 4y = 2 - \frac{8}{3} t^3,
\]
with $y'(0) = y'(1) = \alpha = \beta = 0$, and with $y''(0) = 2$. However, assuming $u_1 = y(h)$ and $u_0 = y(0)$, the first approximation in (10.21) becomes
\[
\frac{2u_1 - 2u_0 - 2\alpha h}{h^2} = \frac{2h^2 - \frac{4}{3} h^3}{h^2} = 2 - \frac{4}{3} h = 2 + O(h).
\]
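The first-order behavior is easy to observe numerically. The Python sketch below uses the counterexample $y(t) = t^2 - \frac{2}{3}t^3$ from the text and evaluates the one-sided quotient for a few step sizes. The function names and the chosen values of $h$ are hypothetical, and $u_1 = y(h)$, $u_0 = y(0)$ are assumed exact, as in the argument above.

```python
def y(t):
    """Counterexample from the text: y'(0) = 0 and y''(0) = 2."""
    return t**2 - (2.0 / 3.0) * t**3


def second_derivative_at_0(h, alpha=0.0):
    """One-sided quotient (2*u1 - 2*u0 - 2*alpha*h) / h**2 with u1 = y(h), u0 = y(0)."""
    return (2.0 * y(h) - 2.0 * y(0.0) - 2.0 * alpha * h) / h**2


steps = (0.1, 0.05, 0.025)
errors = [abs(second_derivative_at_0(h) - 2.0) for h in steps]
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]
for h, e in zip(steps, errors):
    print(h, e)
print("error ratios when h is halved:", ratios)
```

Halving $h$ halves the error, so the ratios are close to 2. A second-order approximation would give ratios close to 4 instead.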

5. The "F"-norm is defined in formula (10.32) on page 548 to be
\[
\|Y - y\|_F^2 = \int_0^1 \left[ \frac{1}{2} r(x) \bigl(Y'(x) - y'(x)\bigr)^2 + \frac{1}{2} \bigl(Y(x) - y(x)\bigr)^2 s(x) \right] dx,
\]
where the differential equation is in the form (10.23) on page 544:
\[
\begin{cases}
-\dfrac{d}{dx}\left( r(x) \dfrac{dy}{dx} \right) + s(x) y(x) = f(x), & 0 < x < 1, \\[1ex]
y(0) = y(1) = 0.
\end{cases}
\]
Comparing these forms to this problem, we see that $r(x) \equiv 1$ and $s(x) \equiv 0$, which, plugging in to the formula for the "F"-norm, immediately gives
\[
\|Y - y\|_F^2 = \frac{1}{2} \int_0^1 \bigl(y'(x) - Y'(x)\bigr)^2 \, dx.
\]
To prove the bound, recall (10.34) (page 548), which asserts that $\|Y - y\|_F$ is minimum over all piecewise linear functions on the given partition. In particular, $\|Y - y\|_F$ is at least as small as $\|S - y\|_F$, where $S$ is any other piecewise linear function over the given partition. In particular, suppose $S$ is the piecewise linear interpolant to the actual solution $y$, and suppose $S_i(x)$ is the portion of this interpolant on $[x_{i-1}, x_i]$. Then the mean value theorem shows that $S_i'(x) \equiv y'(c_i)$ for some $c_i \in [x_{i-1}, x_i]$. Thus,
\[
\|S - y\|_F^2 = \sum_{i=1}^{M} \frac{1}{2} \int_{x_{i-1}}^{x_i} \bigl(y'(c_i) - y'(x)\bigr)^2 \, dx.
\]
However,
\[
|y'(c_i) - y'(x)| = |y''(\xi_i)|\,|c_i - x| \le \max_{0 \le x \le 1} |y''(x)|\, h.
\]
Thus,
\[
\int_{x_{i-1}}^{x_i} \bigl(y'(c_i) - y'(x)\bigr)^2 \, dx
\le \int_{x_{i-1}}^{x_i} \Bigl( \max_{0 \le x \le 1} |y''(x)|\, h \Bigr)^2 dx
= \Bigl( \max_{0 \le x \le 1} |y''(x)| \Bigr)^2 h^3,
\]
and
\[
\sum_{i=1}^{M} \frac{1}{2} \int_{x_{i-1}}^{x_i} \bigl(y'(c_i) - y'(x)\bigr)^2 \, dx
\le \frac{1}{2} M \Bigl( \max_{0 \le x \le 1} |y''(x)| \Bigr)^2 h^3
= \frac{1}{2} \frac{1}{h} \Bigl( \max_{0 \le x \le 1} |y''(x)| \Bigr)^2 h^3
= \frac{1}{2} \Bigl( \max_{0 \le x \le 1} |y''(x)| \Bigr)^2 h^2.
\]
Taking square roots thus gives
\[
\|Y - y\|_F \le \|S - y\|_F \le c\, h \max_{0 \le x \le 1} |y''(x)|,
\]

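where $c = 1/\sqrt{2}$. (Note the error in the statement of the problem in the book.)

6. Since the basis functions are smooth in this case, we may use (10.24) instead of (10.25), and keep in mind that the Galerkin method consists of setting up a system of equations with $Y = \sum_{i=1}^{M} c_i \varphi_i$, and
\[
(L(Y), \varphi_j) = (f, \varphi_j), \qquad 1 \le j \le M,
\]
where $L$ is the operator representing the left side of the differential equation and the dot product is given by
\[
(f, g) = \int_0^1 f(x) g(x) \, dx.
\]
We elect to use (10.24) here, although students may also use (10.25). We obtain, with $\varphi_1(x) = \sin(\pi x)$ and $\varphi_2(x) = \sin(2\pi x)$, that $\varphi_1$ and $\varphi_2$ are orthogonal with respect to our dot product, $(\varphi_1, \varphi_1) = \frac{1}{2}$, $(\varphi_2, \varphi_2) = \frac{1}{2}$, $-\varphi_1'' = \pi^2 \varphi_1$, $-\varphi_2'' = 4\pi^2 \varphi_2$, $(x, \varphi_1) = \frac{1}{\pi}$, and $(x, \varphi_2) = -\frac{1}{2\pi}$. Thus, the Galerkin orthogonality relations become
\[
\left[
\begin{pmatrix}
\pi^2 (\varphi_1, \varphi_1) & 4\pi^2 (\varphi_2, \varphi_1) \\
\pi^2 (\varphi_1, \varphi_2) & 4\pi^2 (\varphi_2, \varphi_2)
\end{pmatrix}
+ \left(\frac{\pi}{2}\right)^2
\begin{pmatrix}
(\varphi_1, \varphi_1) & (\varphi_2, \varphi_1) \\
(\varphi_1, \varphi_2) & (\varphi_2, \varphi_2)
\end{pmatrix}
\right]
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
=
\begin{pmatrix} (x, \varphi_1) \\ (x, \varphi_2) \end{pmatrix},
\]
that is,
\[
\left[
\begin{pmatrix} \pi^2/2 & 0 \\ 0 & 4\pi^2/2 \end{pmatrix}
+ \left(\frac{\pi}{2}\right)^2
\begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}
\right]
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
=
\begin{pmatrix} 1/\pi \\ -1/(2\pi) \end{pmatrix},
\]
or
\[
\begin{pmatrix} 5\pi^2/8 & 0 \\ 0 & 17\pi^2/8 \end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
=
\begin{pmatrix} 1/\pi \\ -1/(2\pi) \end{pmatrix}.
\]
Thus, $c_1 = 8/(5\pi^3)$, $c_2 = -4/(17\pi^3)$, and
\[
Y(x) = \frac{8}{5\pi^3} \sin(\pi x) - \frac{4}{17\pi^3} \sin(2\pi x)
\approx 0.0516 \sin(\pi x) - 0.0076 \sin(2\pi x).
\]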
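The coefficients can be confirmed by evaluating the inner products numerically. The Python sketch below rests on an assumption: the operator is taken to be $L(y) = -y'' + (\pi/2)^2 y$ with right side $f(x) = x$, as inferred from the matrices above rather than quoted from the problem statement. The composite Simpson helper and all names are hypothetical.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def inner(f, g):
    """Dot product (f, g) = integral of f*g over [0, 1]."""
    return simpson(lambda x: f(x) * g(x), 0.0, 1.0)


phi = [lambda x: math.sin(math.pi * x), lambda x: math.sin(2.0 * math.pi * x)]

# Assumed operator: L(phi_k) = -phi_k'' + (pi/2)^2 phi_k = ((k*pi)^2 + (pi/2)^2) phi_k.
Lphi = [
    lambda x, k=k: ((k * math.pi) ** 2 + (math.pi / 2.0) ** 2) * phi[k - 1](x)
    for k in (1, 2)
]

# The basis is orthogonal, so the 2x2 Galerkin system is diagonal.
coef = [inner(lambda x: x, phi[j]) / inner(Lphi[j], phi[j]) for j in range(2)]
off_diagonal = inner(Lphi[0], phi[1])
print(coef, off_diagonal)
```

The computed values agree with $8/(5\pi^3) \approx 0.0516$ and $-4/(17\pi^3) \approx -0.0076$, and the off-diagonal entry is zero to rounding error, which is why the system decouples.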

7. The equation is a Volterra integral equation of the second kind, as in formula (10.44) on page 554, and we can use the existence and uniqueness theorem (Theorem 10.6 on page 555). We have $g(t) \equiv 1$, and
\[
K(t, s, u) = \frac{\sin(t - s)}{1 + u^2}.
\]
It is immediate that $g$ and $K$ are continuous, so we merely need to verify that $K$ is Lipschitz in $u$. But $K$ is everywhere continuously differentiable in $u$, so, for some $c_{uv}$ between $u$ and $v$,
\[
|K(t, s, u) - K(t, s, v)|
= \left| \frac{\partial K}{\partial u}(t, s, c_{uv}) \right| |u - v|
= \left| -\frac{2 c_{uv} \sin(t - s)}{(1 + c_{uv}^2)^2} \right| |u - v|
\le \frac{3\sqrt{3}}{8} |u - v|.
\]
Hence, $K$ is Lipschitz with Lipschitz constant $3\sqrt{3}/8$, so the hypotheses of Theorem 10.6 hold, and the equation has a unique continuous solution.
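The constant $3\sqrt{3}/8$ is the maximum of $2|u|/(1 + u^2)^2$, attained at $u = 1/\sqrt{3}$. The Python sketch below checks this with a brute-force grid search. The grid range and spacing are arbitrary choices, and $|\sin(t - s)|$ is replaced by its upper bound 1.

```python
import math


def dK_du_bound(u):
    """Upper bound on |dK/du|: 2|u| / (1 + u^2)^2, using |sin(t - s)| <= 1."""
    return abs(2.0 * u) / (1.0 + u * u) ** 2


# The bound is even in u and decays for large u, so a grid on [0, 10] suffices.
grid = [k * 1e-4 for k in range(0, 100001)]
u_star = max(grid, key=dK_du_bound)
print(u_star, dK_du_bound(u_star), 3.0 * math.sqrt(3.0) / 8.0)
```

The grid maximum is attained near $u \approx 0.5774$ with value approximately $0.6495$, in agreement with the Lipschitz constant used above.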


8. The remark follows from the mean value theorem, as in the solution to problem 7, and as in the proof of Proposition 2.1 on page 41.

9. Assume $1 \le n \le N$. Then the iteration in Remark (10.13) is of the form $F = G(F)$, where $F = (F_1, \ldots, F_N)$. Also suppose $\tilde F = (\tilde F_1, \ldots, \tilde F_N)$. Finally, we should assume that $K$ satisfies a Lipschitz condition as in Definition 10.3 (page 555), with Lipschitz constant $L$, and that $\sum_{i=0}^{N} |w_{ni}| = M$. Then, the $n$-th component of $|G(F) - G(\tilde F)|$ satisfies
\[
\begin{aligned}
|G_n(F) - G_n(\tilde F)|
&= \left| h \sum_{i=1}^{n} w_{ni} \bigl[ K(t_n, t_i, F_i) - K(t_n, t_i, \tilde F_i) \bigr] \right| \\
&\le h \sum_{i=1}^{N} |w_{ni}|\, L\, |F_i - \tilde F_i| \\
&\le h L \sum_{i=1}^{N} |w_{ni}| \max_{1 \le i \le N} |F_i - \tilde F_i| \\
&\le h L M \max_{1 \le i \le N} |F_i - \tilde F_i|.
\end{aligned}
\]
Thus, $G$ is a contraction in $\|\cdot\|_\infty$ if $hLM < 1$. If we assume $K$ is Lipschitz over all of $\mathbb{R}$, then the above holds for every $F$ and $\tilde F$ in $\mathbb{R}^N$, and $G : \mathbb{R}^N \to \mathbb{R}^N$, so the Contraction Mapping Theorem (Theorem 8.2 on page 442) asserts that $G$ has a unique fixed point.

10. An answer is provided in the appendix to the book.

11. If we want the matrix to be symmetric, we need to assume the kernel is symmetric, that is, that $K(s, t) = K(t, s)$. As a counterexample, take $K(t, s) \equiv 0.9t$, and take $\varphi_1(t) \equiv 1$ and $\varphi_2(t) \equiv -t$. In this case, the matrix for the system (10.84) is
\[
\begin{pmatrix}
(L\varphi_1, \varphi_1) & (L\varphi_1, \varphi_2) \\
(L\varphi_2, \varphi_1) & (L\varphi_2, \varphi_2)
\end{pmatrix}
=
\begin{pmatrix}
0.550 & -0.200 \\
-0.275 & 0.183
\end{pmatrix}.
\]
(Although positive-definiteness is defined for non-symmetric matrices, we usually only deal with the symmetric case. Also, this problem becomes easier if the kernel is symmetric.)

If the kernel $K$ is symmetric and $|K| \le M < 1$, we will define $(\varphi, \psi)_L = (L\varphi, \psi)$, then prove that $(\cdot, \cdot)_L$ is an inner product according to Definition 4.1 on page 195. The matrix for the system (10.84) is then simply a Gram matrix, which we showed was positive definite in problem 8 of Chapter 4 (on page 285).

The first property of inner product, namely that $(\varphi, \varphi)_L \ge 0$ and $(\varphi, \varphi)_L = 0$ if and only if $\varphi \equiv 0$, follows from Lemma 10.2 and the fact that

$(\varphi, \psi)$ is an inner product. For the second property, we need to show $(\varphi, \psi)_L = (\psi, \varphi)_L$, which follows in a straightforward way from the symmetry $K(s, t) = K(t, s)$:
\[
\begin{aligned}
(\varphi, \psi)_L
&= (L\varphi, \psi) \\
&= \int_{t=0}^{1} \left( \varphi(t) - \int_{s=0}^{1} K(t, s) \varphi(s) \, ds \right) \psi(t) \, dt \\
&= (\varphi, \psi) - \int_0^1 \int_0^1 K(t, s) \varphi(s) \psi(t) \, ds \, dt \\
&= (\varphi, \psi) - \int_0^1 \int_0^1 K(s, t) \psi(t) \varphi(s) \, dt \, ds \\
&= (\varphi, L\psi) = (L\psi, \varphi) = (\psi, \varphi)_L.
\end{aligned}
\]
The third property of inner product, namely, linearity in each argument, follows directly from the linearity of $L$ and the linearity of $(\cdot, \cdot)$. Invoking the result of problem 8 on page 235, the requested property is thus proven.
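The counterexample matrix at the start of this solution can be recomputed directly from the definition $(L\varphi)(t) = \varphi(t) - \int_0^1 K(t, s)\varphi(s)\,ds$ used in the derivation above. The Python sketch below does so with a composite Simpson rule, which is exact up to rounding for the low-degree polynomials involved. The helper names are hypothetical.

```python
def simpson(f, a=0.0, b=1.0, n=100):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def K(t, s):
    """Non-symmetric kernel from the counterexample."""
    return 0.9 * t


def L(phi):
    """Return the function (L phi)(t) = phi(t) - integral_0^1 K(t, s) phi(s) ds."""
    return lambda t: phi(t) - simpson(lambda s: K(t, s) * phi(s))


basis = [lambda t: 1.0, lambda t: -t]        # phi_1 and phi_2 from the text

# Entry (i, j) is (L phi_i, phi_j), matching the layout of the matrix in the text.
M = [[simpson(lambda t: L(p)(t) * q(t)) for q in basis] for p in basis]
for row in M:
    print([round(v, 3) for v in row])
```

The two off-diagonal entries differ ($-0.200$ versus $-0.275$), confirming that the matrix is not symmetric for this kernel.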
