229 42 3MB
English Pages [357] Year 2009
Mathematics for Engineers IV Numerics by Gerd Baumann
Oldenbourg Verlag München
Prof. Dr. Gerd Baumann is head of Mathematics Department at German University Cairo (GUC). Before he was Professor at the Department of Mathematical Physics at the University of Ulm.
© 2010 Oldenbourg Wissenschaftsverlag GmbH Rosenheimer Straße 145, D-81671 München Telefon: (089) 45051-0 oldenbourg.de All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without the prior written permission of the Publishers. Editor: Kathrin Mönch Producer: Anna Grosser Cover design: Kochan & Partner, München Printed on acid-free and chlorine-free paper Printing: Druckhaus „Thomas Müntzer“ GmbH, Bad Langensalza ISBN 978-3-486-59042-5
Preface Theory without Practice is empty, Practice without Theory is blind. The current text Mathematics for Engineers is a collection of four volumes covering the first three up to the fifth terms in undergraduate education. The text is mainly written for engineers but might be useful for students of applied mathematics and mathematical physics, too. Students and lecturers will find more material in the volumes than a traditional lecture will be able to cover. The organization of each of the volumes is done in a systematic way so that students will find an approach to mathematics. Lecturers will select their own material for their needs and purposes to conduct their lecture to students. For students the volumes are helpful for their studies at home and for their preparation for exams. In addition the books may be also useful for private study and continuing education in mathematics. The large number of examples, applications, and comments should help the students to strengthen their knowledge. The volumes are organized as follows: Volume I treats basic calculus with differential and integral calculus of single valued functions. We use a systematic approach following a bottom -up strategy to introduce the different terms needed. Volume II covers series and sequences and first order differential equations as a calculus part. The second part of the volume is related to linear algebra. Volume III treats vector calculus and differential equations of higher order. In Volume IV we use the material of the previous volumes in numerical applications; it is related to numerical methods and practical calculations. Each of the volumes is accompani ed by a CD containing the Mathematica notebooks of the book. As prerequisites we assume that students had the basic high school education in algebra and geometry. However, the presentation of the material starts with the very elementary subjects like numbers and introduces in a systematic way step by step the concepts for functions. This allows us to repeat most of
vi
Mathematics for Engineers
the material known from high school in a systematic way, and in a broader frame. This way the reader will be able to use and categorize his knowledge and extend his old frame work to a new one. The numerous examples from engineering and science stress on the applications in engineering. The idea behind the text concept is summarized in a three step process: Theory Examples Applications When examples are discussed in connection with the theory then it turns out that the theory is not only valid for this specific example but useful for a broader application. In fact , usually theorems or a collection of theorems can even handle whole classes of problems. These classes are sometimes completely separated from this introductory example; e.g. the calculation of areas to motivate integration or the calculation of the power of an engine, the maximal height of a satellite in space, the moment of inertia of a wheel, or the probability of failure of an electronic component. All these problems are solvable by one and the same method, integration. However, the three-step process is not a feature which is always used. Some times we have to introduce mathematical terms which are used later on to extend our mathematical frame. This means that the text is not organized in a historic sequence of facts as traditional mathematics texts. We introduce definitions, theorems, and corollaries in a way which is useful to create progress in the understanding of relations. This way of organizing the material allows us to use the complete set of volumes as a reference book for further studies. The present text uses Mathematica as a tool to discuss and to solve examples from mathematics. The intention of this book is to demonstrate the usefulness of Mathematica in everyday applications and calculations. We will not give a complete description of its syntax but demonstrate by examples the use of its language. In particular, we show how this modern tool is used to solve classical problems and to represent mathematical terms. We hope that we have created a coherent way of a first approach to mathematics for engineers. Acknowledgments Since the first version of this text, many students made valuable suggestions. Because the number of responses are numerous, I give my thanks to all who contributed by remarks and enhancements to the text. Concerning the historical pictures used in the text, I acknowledge the support of the http://www-gapdcs.st-and.ac.uk/~history/ webserver of the University of St Andrews, Scotland. The author deeply appreciates the understanding and support of his wife, Carin, and daughter, Andrea, during the preparation of the books. Cairo Gerd Baumann
To Carin and Andrea
Contents 1. Outline 1.1. Introduction ......................................................................................................1 1.2. Concept of the Text ......................................................................................2 1.3. Organization of the Text ............................................................................3 1.4. Presentation of the Material .....................................................................4 2. Simulation Methods 2.1 Introduction ........................................................................................................5 2.2 Simulation .......................................................................................................... 7 2.2.1 Structure of Models ................................................................................7 2.2.2 Requirements on a Modeling Process ...................................................14 2.2.3 Modelling Process ....................................... ........................................15 2.2.4 Object-Oriented Approach Using Classes .............................................16 2.2.5 Tests and Exercises ...............................................................................20 2.2.5.1 Test Problems ..............................................................................20 2.2.5.2 Exercises .................................................................................... 21
2.3 Car on a Bumpy Road ...............................................................................21 2.3.1 Modelling Steps ....................................................................................21 2.3.1.1 Hardware ....................................................................................21 2.3.1.2 Formulation ................................................................................22 2.3.2 Class Definitions ....................................................................................22 2.3.2.1 Class for Setup ............................................................................24 2.3.2.2 Class Body ..................................................................................24 2.3.2.3 Classes for Axles ................................. ........................................25 2.3.2.4 Class Car ....................................................................................26 2.3.2.5 Class Simulation ..........................................................................27 2.3.3 Objects ..................................................................................................29 2.3.3.1 Body Object ................................................................................30 2.3.3.2 Axle Object .................................................................................30 2.3.3.3 Car Object ..................................................................................32 2.3.3.4 Simulation Object ........................................................................32
x
Mathematics for Engineers
3. Numbers and Errors 3.1 Introduction ..................................................................................................... 36 3.2 Numbers and Representation ............................................................... 37 3.2.1 Numbers on the Real Line ....................................................................37 3.2.2 Representation of Numbers .................................................................. 43 3.2.3 Tests and Exercises ............................................................................... 48 3.2.3.1 Test Problems .............................................................................48 3.2.3.2 Exercises ....................................................................................48
3.3 Errors on Computers .................................................................................. 49 3.3.1 Theory of Rounding Errors ................................................................... 63 3.3.2 Tests and Exercises ............................................................................... 67 3.3.2.1 Test Problems .............................................................................67 3.3.2.2 Exercises ....................................................................................67
3.4 Approximations ...................................................................................... .......69 3.4.1 Important Properties of Polynomials ....................................................69 3.4.2 Polynomial Approximation by Interpolation ....................................... .72 3.4.3 Polynomial Approximation by Least Squares .......................................77 3.4.4 Least Squares Approximation and Orthogonal Polynomials ................79 3.4.4.1 Legendre Polynomials ..................................................................80 3.4.4.2 Chebyshev Polynomials of the First Kind .......................................81 3.4.4.3 Chebyshev Polynomials of the Second Kind ...................................83 3.4.4.4 Laguerre Polynomials ..................................................................84 3.4.4.5 Hermite Polynomials ...................................................................86 3.4.5 Local Quadratic Approximation .......................................................... .92 3.4.6 Maclaurin Polynomial ...........................................................................95 3.4.7 Taylor Polynomial ................................................................................98 3.4.8 nth -Remainder .......................................................................................99 3.4.9 Tests and Exercises .............................................................................102 3.4.9.1 Test Problems ...........................................................................103 3.4.9.2 Exercises ..................................................................................103
3.5 Power Series and Taylor Series .................................................... .....104 3.5.1 Definition and Properties of Series .....................................................104 3.5.2 Differentiating and Integrating Power Series ......................................108 3.5.3 Practical Ways to Find Power Series ..................................................111 3.5.4 Generalized Power Series ...................................................................114 3.5.4.1 Fourier Sum ..............................................................................114 3.5.4.2 Fourier Series ............................................................................115 3.5.5 Tests and Exercises ............................................................................. 121 3.5.5.1 Test Problems ...........................................................................121 3.5.5.2 Exercises ..................................................................................121
Contents
4. Roots of Equations 4.1 Introduction .....................................................................................................123 4.2 Simple Root Finding Methods ...............................................................125 4.2.1 The Bisection Method ..........................................................................129 4.2.2 Method of False Position ......................................................................133 4.2.3 Secant Method ......................................................................................136 4.2.4 Newton's Method ..................................................................................141 4.2.5 Fixed-Point Method .............................................................................. 151 4.2.6 Tests and Exercises ............................................................................... 166 4.2.6.1 Test Problems .............................................................................166 4.2.6.2 Exercises ....................................................................................167
5. Numerical Integration 5.1 Introduction .....................................................................................................169 5.2 Trapezoidal and Simpson Method ......................................................170 5.2.1 Trapezoidal Method ..............................................................................170 5.2.2 Simpson's Method .................................................................................176 5.2.3 Generalized Integration Rules ..............................................................182 5.2.4 Error Estimations for Trapezoidal and Simplson's Rule .......................186 5.2.4.1 Error Formulas for Simpson's Rule ................................................190 5.2.5 Tests and Exercises ...............................................................................195 5.2.5.1 Test Problems .............................................................................195 5.2.5.2 Exercises ....................................................................................195
5.3 Gaussian Numerical Integration ...........................................................197 5.3.1 Tests and Exercises ...............................................................................209 5.3.1.1 Test Problems .............................................................................209 5.3.1.2 Exercises ....................................................................................209
5.4 Monte Carlo Integration ............................................................................210 5.4.1 Tests and Exercises ...............................................................................213 5.4.1.1 Test Problems .............................................................................213 5.4.1.2 Exercises ....................................................................................213
6. Solutions of Equations 6.1 Introduction .....................................................................................................215 6.2 Systems of linear Equations ...................................................................216 6.2.1 Gauß Elimination Method ....................................................................220 6.2.2 Operations Count ..................................................................................225 6.2.3 LU Factorization ...................................................................................227 6.2.4 Iterative Solutions .................................................................................236 6.2.5 Tests and Exercises ...............................................................................242 6.2.5.1 Test Problems .............................................................................242 6.2.5.2 Exercises ....................................................................................242
xi
Mathematics for Engineers
xii
6.3 Eigenvalue Problem ...................................................................................244 6.3.1 Tests and Exercises ............................................................................... 254 6.3.1.1 Test Problems .............................................................................254 6.3.1.2 Exercises ....................................................................................254
7. Ordinary Differential Equations 7.1 Introduction .............................................................................................. ...... 255 7.2 Mathematical Preliminaries .................................................................... 256 7.2.1 Tests and Exercises ...............................................................................261 7.2.1.1 Test Problems .............................................................................261 7.2.1.2 Exercises ....................................................................................261
7.3 Numerical Integration by Taylor Series .............................................262 7.3.1 Implicit and Predictor–Corrector Methods ...........................................271 7.3.2 Tests and Exercises ...............................................................................275 7.3.2.1 Test Problems .............................................................................275 7.3.2.2 Exercises ....................................................................................275
7.4 Runge-Kutta Methods ...............................................................................277 7.4.1 Tests and Exercises ...............................................................................285 7.4.1.1 Test Problems .............................................................................286 7.4.1.2 Exercises ....................................................................................286
7.5 Stiff Differential Equations .......................................................................287 7.5.1 Tests and Exercises ...............................................................................289 7.5.1.1 Test Problems .............................................................................289 7.5.1.2 Exercises ....................................................................................289
7.6 Two-Point Boundary Value Problems ................................................289 7.6.1 The Shooting Method ...........................................................................296 7.6.2 The Optimization Approach .................................................................301 7.6.3 The Finite Difference Approach ...........................................................309 7.6.4 The Collocation Method .......................................................................316 7.6.5 Tests and Exercises ...............................................................................319 7.6.5.1 Test Problems .............................................................................319 7.6.5.2 Exercises ....................................................................................319
7.7 Finite Elements and Boundary Value Methods .............................321 7.7.1 Finite Elements for Ordinary Differential Equations ............................322 7.7.2 Tests and Exercises ...............................................................................334 7.7.2.1 Test Problems .............................................................................334 7.7.2.2 Exercises ....................................................................................334 Appendix A. Functions Used System of Equations 6 ...................................................................................335 B. Notations C. Options References......................................................................................................338 Index.............................................................................................................. 341
1 Outline
1.1. Introduction We have compiled this material for a sequence of courses on the application of numerical approximation techniques. The text is designed primarily for undergraduate students from engineering who have completed the basic courses of calculus. Familiarity with the fundamentals of linear algebra and differential equations is helpful, but adequate introductory material on these topics is presented in the text so that those courses will be refreshed at the certain point. Our main objective with this book is to provide an introduction to modern approximation techniques; to explain how, why, and when they can be expected to work; and to provide a firm basis for future study of numerical analysis and scientific computing. The book contains material for a single term of study, but we expect many readers to use the text not only for a single term course but as a basic reference for their future studies. In such a course, students learn to identify the types of problems that require numerical techniques for their solution and see examples of the error propagation that can occur when numerical methods are applied. They accurately approximate the solutions of problems that cannot be solved symbolically in an exact way and learn techniques for estimating error bounds for the approximations. Mathematics is for engineers a tool. As for all other engineering applications working with tools you must know how they act and react in applications. The same is true for mathematics. If you know how a mathematical procedure (tool) works and how the components of this tool are connected by each other you will understand its application. Mathematical tools consist as engineering tools of components. Each component is usually divisible into other components until the basic components (elements) are found. The same idea is used in mathematics there are basic elements you should know as an engineer. Combining these basic elements we are able to set up a mathematical frame which
2
Mathematics for Engineers
incorporates all those elements which are needed to solve a problem. In other words, we use always basic ideas to derive advanced structures. All mathematical thinking follows a simple track which tries to apply fundamental ideas used to handle more complicated situations. If you remember this simple concept you will be able to understand advanced concepts in mathematics as well as in engineering.
1.2. Concept of the Text Every concept in the text is illustrated by examples, and we included more than 1,000 tested exercises for assignments, class work and home work ranging from elementary applications of methods and algorithms to generalizations and extensions of the theory. In addition, we included many applied problems from diverse areas of engineering. The applications chosen demonstrate concisely how numerical methods can be, and often must be, applied in real life situations. During the last 25 years a number of symbolic software packages have been developed to provide symbolic mathematical computations on a computer. The standard packages widely used in academic applications are Mathematica® , Maple® and Derive® . The last one is a package which is used for basic calculus while the two other programs are able to handle high sophisticated calculations. Both Mathematica and Maple have nearly the same mathematical functionality and are very useful in symbolic and numeric calculations. This is a major advantage if we develop algorithms and implement them to generate numerical results. This approach is quite different from the traditional approach where algorithms are developed by pencil and paper and afterwards implemented in packages like MATLAB or in a programming language like FORTRAN or C. Our approach in using Mathematica is a combination of symbolic and numeric approach, a so called hybrid approach, allowing the design and implementation in the same environment. However the author's preference is Mathematica because the experience over the last 25 years showed that Mathematica's concepts are more stable than Maple's one. The author used both of the programs and it turned out during the years that programs written in Mathematica 25 years ago still work with the latest version of Mathematica but not with Maple. Therefore the book and its calculations are based on a package which is sustainable for the future. Having a symbolic computer algebra program available can be very useful in the study of approximation techniques. The results in most of our examples and exercises have been generated using problems for which exact values can be determined, since this permits the performance of the approximation method to be monitored. Exact solutions can often be obtained quite easily using symbolic computation. In addition, for many numerical techniques the error analysis requires bounding a higher ordinary or partial derivative of a function, which can be a tedious task and one that is not particularly instructive once the techniques of calculus have been mastered. Derivatives can be quickly obtained symbolically, and a little insight often permits a symbolic computation to aid in the bounding process as well. We have chosen Mathematica as our standard package because of its wide distribution and reliability. Examples and exercises have been added whenever we felt that a computer algebra system would be of
IV - Chapter I: Outline
3
significant benefit, and we have discussed the approximation methods that Mathematica employs when it is unable to solve a problem exactly.
1.3. Organization of the Text The book is organized in chapters which will cover the first steps in Numerical Methods demonstrating when numeric is useful and how it can be applied to specific problems in simulation. We go on with the representation of numbers and the finite representation of numbers on a computer. Having introduced numbers and errors for a finite number representation we discuss how we can represent functions and their approximation by interpolation. Numerical interpolation is the basis of different other approaches in numeric such as integration and solution of differential equations. Related to the representation of functions are the methods of root finding for a function of a single or several variables. After the discussion of the procedures for numerical root finding we examine linear systems of equations. In this field the classical solution approaches are discussed as well as iterative methods to solve linear equations. The final topic is related to the solution of initial and boundary value problems. We will also examine the finite element method applied to ordinary differential equations. The whole material is organized in seven chapters where the first part of this chapter is the current introduction. In Chapter 2 we will deal with object oriented concepts of simulation; it is an introductory chapter to demonstrate where and how numeric methods is applied. In Chapter 3, we deal with concepts of numbers and the representation of numbers in finite formats. We also discuss the consequences of a finite representation and discuss the basic errors occurring in such representation. In Chapter 4 we use different approaches to approximate functions by polynomials. We also discuss how polynomials are represented in a numerically stable way. Chapter 5 discusses non polynomial functions and their roots. Different approaches are discussed to find roots of a single valued function. The classic approaches like bisection, Regula Falsi, and Newton's method are discussed. In addition, we use Banach's fixed point theorem to show that for a certain class of functions this approach is quite efficient. Chapter 6 deals with linear systems of equations. We discuss the methods by Gauß and give estimations on the number of operations needed to carry out this approach. For large systems of equations we introduce iterative solution procedures allowing us to treat also sparse equations. An important point in engineering are eigenvalues and eigenvectors, discussed in connection with the power method approach. Finally, in Chapter 7, we discuss the solution of ordinary differential equations. For ordinary differential equations the standard solver like Euler, Taylor, and Runge-Kutta's method are discussed. In addition to the solution of initial value problems, we discuss also the solution of boundary value problems especially two point boundary methods. The approaches used to solve boundary problems are shooting and optimization methods and finite differences and collocation methods. The finite element method and its basic principles are described for ordinary differential equations.
4
Mathematics for Engineers
1.4. Presentation of the Material Throughout the book I will use the traditional presentation of mathematical terms using symbols, formulas, definitions, theorems, etc. to set up the working frame. This representation is the classical mathematical part. In addition to these traditional presentation tools we will use Mathematica as a symbolic, numeric, and graphic tool. Mathematica is a computer algebra system allowing us to carry out hybrid calculations. This means calculations on a computer are either symbolic or/and numeric. Mathematica is a calculation tool allowing us to do automatic calculations on a computer. Before you use such kind of tool it is important to understand the mathematical concepts. The use of Mathematica allows you to minimize the calculations but you should be aware that you will only understand the concepts if you do your own calculations by pencil and paper. Once you have understood the way how to avoid errors in calculations and in concepts you are ready to use the symbolic calculations offered by Mathematica. It is important for your understanding that you make errors and derive an improved understanding from these errors. You will never reach a higher level of understanding if you apply the functionality of Mathematica as a black box solver of your problems. Therefore I recommend to you first try to understand by using pencil and paper calculations and then switch to the symbolic algebra system if you have understood the concepts. You can get a trial version of Mathematica directly from Wolfram Research by requesting a download address from where you can download the trial version of Mathematica. The corresponding web address to get Mathematica for free is: http://www.wolfram.com/products/mathematica/experience/request.cgi
2 Simulation Methods
2.1 Introduction Today we are in a situation which allows us to set up different kind of calculations ranging from simple algebraic calculations up to the complicated integration procedure for nonlinear partial differential equations. Since the 40ties of the last century there started a development of computers and software which supports the ideas of numerical calculations in an ever increasing way in efficiency. The new approach to mathematics by means of numerical calculations was dramatically changed and pushed in one direction by inventing and developing personal computers (PCs). PCs are now used by everybody who has to calculate or to simulate some kind of models. The PC somehow revolutionized the way how calculations are carried out today. In the past, numerical calculations were not the tool of the best choice but a burden for those which had to carry out such kind of calculations. For example, the calculations related to the now famous Kepler laws took more than 10 years of laborious calculations by Kepler himself. These calculations examining the paths of planets actually were numerical calculations done by pencil and paper. Later on, mechanical calculators and recently pocket calculators replaced ruler based calculators in engineering which was somehow a great step forward to make numerical calculations more efficient. Today, we use pocket calculators and computers to carry out this kind of calculations. However, numerical calculations are necessary to boil down a theory to a number; the theory itself can today be generated symbolically on computers. Since the beginning of mathematics there were always two branches of calculations; a symbolic and a numeric one. The symbolic calculations were used to establish the theoretical framework of mathematics while numerical calculations were used for specific practical applications. Up to now, these two branches remained in mathematics as a symbiosis of two ways to consistently formulate theoretical ideas and to derive practical information from it. Today, some of the mathematicians and
Mathematics for Engineers
6
engineers make a sharp distinction between numeric and symbolic and even some practically oriented engineers believe that numeric is the only way how calculations can be carried out today. This point of view ignores that today we have strong symbolic programs available which also run on computers and generate much more useful information than just a single number. The symbolic approach generates general formulas which allows us to describe the system or model under general conditions for the parameters. The symbolic approach to calculations is thus more useful for the theoretical work. It allows for example to formulate the theoretical background needed in numerical calculations. Another advantage of symbolic calculations is that formulas can be generated on a computer which are the basis of numerical calculations. This brings us to the point where we have to distinguish between the formulation of a model or equation and the evaluation of a model or equation. The formulation of a mathematical expression is solely related to the symbolic creation of the expression. The evaluation of an expression is related to the numerical evaluation and derivation of a number. To derive numbers from a symbolic expression, we have to specify the values of each symbol in the expression and use the algebraic rules we agreed on. The evaluation of an expression assumes that we know the symbols of this expressions and the numerical values for these symbols. On the other hand, if we know the symbolic representation of the expression then we usually distinguish between important symbols which can change their values and unimportant symbols which keep their values. This kind of classification allows us to introduce so called major variables and parameters in an expression, respectively. The major variables or model variables are those variables which can be changed continuously in the model. The parameters of the expression or model are those quantities which remain constant for the time of evaluation. Thus a model or an expression consists of two kinds of symbols: parameters and model variables. Model variables determine the basic structure of the model while parameters represent the influence of different quantities on the model. The basic definition of a model can be formulated as follows: Definition 2.1. Model and Parameters
A mathematical model is symbolically represented by the methods and the parameters. Methods define the structure of the model and thus the calculation while parameters define the influences onto the model. This kind of definition is used since the beginning of mathematics and becomes nowadays a practical application in software engineering. The link between the definition given and the application in generating calculation or simulation programs is the style of programming. Since the beginning of numerical calculations it was the case that between the tool (machine, pocket calculator, PC) there was always a link between the symbolic formulation and the numerical result. This link is known today as programming language allowing us to represent the original symbolic calculations in different steps appropriate for the calculation tool used to carry out the calculation. For mechanical calculations this programming language consists of how the steps of the calculations are applied to the machine while for computers the steps are collected in programs. During the past decades, different programming styles were developed such as sequential, functional, and object oriented programming. It turned out that the last programming style is most appropriate to establish a
IV - Chapter 2: Simulation Methods
7
one-to-one correspondence between the mathematical model and the programming style on a computer. This correspondence between the program and the theoretical background does not dramatically influence the numerical evaluation but helps to avoid difficult formulations. The following sections will introduce the ideas used in simulation and the methods to generate software from these ideas.
2.2 Simulation Simulation is the subject in engineering which generates the link between the reality and the theoretical description (model) of a situation. Simulation uses numerical and symbolic procedures to generate results which are related to the questions under discussion.
2.2.1 Structure of Models As discussed before, a model consists of two components. These two components are: the methods on which the model is based on and the parameters determining the properties of the model. The methods are the laws governing the model while the parameters are the interface of the method to the environment. Thus the parameters can be seen as the input quantities which determine the behavior of the model. The methods or laws of the model determine the structure of the model. This means that the methods determine the output or results in a specific way. These results are influenced by the parameters. The following diagram shows the schematic structure of a model.
Figure 2.1. Classification of a model by its parameters and methods.
To clarify the structure of a model let us discuss the most prominent model in engineering and physics. This model is Newton's third law which verbally is stated as to every action there is an equal and opposite reaction.
Mathematics for Engineers
8
Example 2.1. Newton's Law Newton's law was established by Sir Isak Newton in 1660s. He uses this model to describe his observations regarding the action and reaction of forces. Let us assume we are looking on a ball which is falling from a tower of height H. We know from Newton's second law that this fall can be described by the formula ma
F
(2.1)
where m is the mass of the ball, a the acceleration which the particle experiences, and F the force acting on the ball. The model here is a simple relation connecting physical (model) quantities like acceleration a and force F with model parameters like mass m. The distinction between parameters and model variables here is very simple because the only property the particle has is its mass. The model is generated by Newton's law combining acceleration with force. For the falling particle we know that the acting force is the gravitation force of earth. In this case F m g where g is the gravitational acceleration which in fact is also a constant. The model thus simplifies to ma
m g
(2.2)
or a
g.
(2.3)
Here the model reduces to a simple algebraic equation where the left hand side is the model variable and the right hand side represents a model parameter. Thus model parameters and model variables are the same. This situation changes if we assume that acceleration is a quantity which changes with time.Ô The first example was used to introduce the notation and terminology in modelling and simulation processes. However, the final relation was a very simple algebraic relation which does not need much mathematical efforts to determine the precise meaning of this relation and to evaluate the relation. Since the value of the terrestrial acceleration is known as g 9.81 mts 2 , there is no need to use numerical procedure to find a reliable value for a. However the question arises how accurate the value for g can be found in measurements or tables. The accuracy and precision will be discussed in a later section where we will define these terms. To see how models are created and how models are evaluated we will discuss here the same example with a different aim. Our goal is to introduce the time dependence of the model and to demonstrate how this time dependence changes the reliability of the results. Example 2.2. Newton's Law as a time-dependent relation If we assume that acceleration is defined due to the temporal changes of the velocity then we can replace acceleration in Newton's equation by the derivative of the velocity; i.e. a dv s dt. Thus Newton's law for a falling ball can be written as ma
m g
which is equivalent to
(2.4)
IV - Chapter 2: Simulation Methods
m
dv
m g
dt
9
(2.5)
or dv
g
dt
(2.6)
This is a simple first order differential equation for the velocity with constant right hand side. To solve this equation we can use either numerical or symbolic methods to derive a solution. Let us first use our mathematical knowledge to solve this equation. The equation is a separate equation which can be written in separated variables as g dt
dv
(2.7)
Integrating this equation results to à dv
g à dt
(2.8)
which will deliver the solution as a function of time g t C
v+t/
(2.9)
where C is an integration constant. The result gained is a linear function in time t. The parameters of the model are here g the terrestrial acceleration g and some integration constant C which is determined by the initial velocity of the ball. Let us assume that the initial velocity is zero and thus the constant C vanishes. This assumption will simplify the result to g t
v+t/
(2.10)
which is an exact result for the solution. On the other hand we can solve this first order ordinary differential equation by means of numerical methods. This means that we approximate the continuous description by a discrete one; i.e. dv dt
'v
v+ti1 / v+ti /
't
ti1 ti
g
(2.11)
which is equivalent to v+ti1 /
g+ti1 ti / v+ti /
(2.12)
This equation represents a formula which allows us to evaluate the velocity at a later time +ti1 / using the information we have for the velocity from previous times +ti /. This formula can be used to iteratively solve the velocities. Any iteration procedure starts with an initial value at the starting time t0 . Let us assume that the starting point for our iteration is t0 0. At the starting time the velocity is zero so that we start with v+t0 / 0. The next we have to choose is the time step we are interested in. Let's assume that we are interested in an interval of one second; i.e. the time step spans 1 second
Mathematics for Engineers
10
away from the initial one. So t1 1 and thus t1 t0 1. If we use the value for the terrestrial acceleration with g 9.81 we will gain the velocity at t1 1 by v1
0 9.81 +1 0/
9.81
If we continue the iteration for t2 which delivers v2
2 seconds, we iterate the formula by using the above formulas
v1 9.81 +2 1/
19.62
The next iteration is gained by the step v3
v2 9.81 +3 2/
29.43
The next step is to automatically evaluate this kind of formula up to a certain time. This iteration can be carried out by a program which can be used by a computer. Such kind of program is contained in the next line. vi tv tv 0 1 2 3 4 5 6 7 8 9 10 11
0; Table#i 1, vi vi 9.81 ++i 1/ i/, i, 0, 10'; Prepend#tv, 0, 0' 0 9.81 19.62 29.43 39.24 49.05 58.86 68.67 78.48 88.29 98.1 107.91
The results can be graphically represented in a plot, demonstrating the linear behavior of the solution.
11
IV - Chapter 2: Simulation Methods
0 20
vi #mss'
40 60 80 100 0
2
4
6
8
10
ti #s' Figure 2.2. Simple numerical solution for a motion using fixed time steps 't.
The theoretical result can also be used in a graph which is shown in the next plot: 0
v #mss'
20 40 60 80 100
0
2
4
6
8
10
t #s' Figure 2.3. Exact solution for the motion using the analytic expression, is given by v+t/
g t.
If we now plot both the theoretical and numerical result in common we see that the two solutions derived correspond. The following plot shows this result.
Mathematics for Engineers
12
0 20
v #mss'
40 60 80 100
0
2
4
6
8
10
t #s' Figure 2.4. Comparison of numerical and exact solution of the motion.
However, this picture changes if we change the time step 't +ti1 ti / by increasing the difference by a factor 2. If we generate the date again and plot the results as before vi tv tv 0 2 4 6 8 10 12
0; Table#i 2, vi vi 9.81 ++i 1/ i/, i, 0, 10, 2'; Prepend#tv, 0, 0' 0 9.81 19.62 29.43 39.24 49.05 58.86
we observe that the slope of the line changes.
13
IV - Chapter 2: Simulation Methods
0 10
vi #mss'
20 30 40 50 60
0
2
4
6
8
10
12
ti #s' Figure 2.5. Simple numerical solution for a motion using an increased fixed time step 't.
This becomes obvious when we plot the numerical result and the theoretical result in the same plot. 0
v #mss'
20 40 60 80 100
0
2
4
6
8
10
t #s' Figure 2.6. Numerical and theoretical solution in a common plot. The time step 't was increase by a factor 2.
The plot above shows one common characteristic of all numeric solutions generated by an iterative process. If the time step in the iteration is chosen the wrong way then the long term behavior of the numerical solutions deviates dramatically from the theoretical one. This phenomenon is always true for numerical calculations. Thus, the question raises how to guarantee that a numerical solution represents the correct figures of the calculation. This question is the central question in all numerical calculations.Ô These observations led us to the following question. Which requirements in a simulation are necessary to guarantee that the numerical results represent the true results? At the moment, we do not know how to
14
Mathematics for Engineers
define a true numerical representation and how to make sure that a related program generates the correct results.
2.2.2 Requirements on a Modeling Process Models and especially programs should satisfy some requirements to guarantee a quality standard for the calculations. Most of the programs related to models are currently written in a way to just solve a single and specific task. This means that programs are written in a way that they are only useful for a specific application. However, the development and testing of a program and thus a model needs a certain amount of time till the program is working and another period to make sure that the results are reliable. This observation on the development process consequently requires that a model respectively a program should be reusable. This means that the model or components of a model should be designed in such a way that in another setup these parts can be used again. Since the development of models are time and cost intensive a model should be scalable. This means that the development costs are gained not only once but several times. So if the model is scalable and the related program is useful for others too, then the return of investment should be a multiple of the development costs. To achieve this economic goal it is necessary to design the model in such a way that it can be easily used. This means that beside a complete documentation of the model there should be an easy way to use and apply the model. Usually, models need some fundamental theoretical description which are necessary for the development of the model but not for the application. However, the user should know what is the frame of the theoretical assumptions and where are the limits of these assumptions. Only if we know the limits and side conditions of a model we can make a judgment on the efficiency and reliability of an application. These few ideas concerning the general design process of a model and thus a program are summarized in the following list. The requirements on programs in practical applications are: 1. a program has to deliver reliable results 2. a program should be reusable 3. an application must be scalable 4. the use of the software should be as easy as possible 5. a program should be accompanied by a complete documentation. So far we discussed the basic requirements for a modelling process but did not describe how a model is generated and which steps are necessary to setup a model for symbolic or numeric calculations. The following section will briefly introduce the way how a model is generally developed and which steps are necessary to derive a model.
IV - Chapter 2: Simulation Methods
15
2.2.3 Modelling Process The creation of a model is usually a process which consists of different steps. These steps are basically related to the original physical system which exists in reality. The modeling process consists typically of an abstraction of the complicated real word incorporating only the most important features to describe the system. The abstraction is governed by efficiency and necessary arguments guaranteeing the basic properties of the real system. Thus, the abstraction is somehow a short hand description of the reality taking into account only the essential features of the system, compare Figure 2.7. If such an abstraction is derived from the reality it is necessary to create a mathematical formulation. The mathematical formulation usually consists of equations, differential equations, and boundary as well as initial conditions. The mathematical model again uses some abstract ways to describe the real system. This means that depending on the mathematical description the model can be simple or very complicated depending on the representation of the model. If the mathematical description is available, complicated or not the last step which must be carried out is to solve this model. At this point, numeric and numerical mathematics comes into play. The solution step of a model usually cannot be carried out in a pure symbolic way. Only special models or parts of a model are generally accessible by symbolic solution methods. Thus the standard approach to solve a mathematical model is numeric. However, numeric delivers a solution only under certain boundary conditions. This means that a numeric solution depends on the choice of some numeric parameters determining the solution strategy for a model. Thus, the solution derived is only valid under the validity of the numerical procedure. Our goal here is to clarify which of the assumptions are critical (if we apply numerical solution procedures) and which are not.
Figure 2.7. Simplified model for a car using springs and dampers in a simple mechanical model allowing elongations and rotations.
To summarize the steps used in a modelling process, we collected the different steps in the following list.
16
Mathematics for Engineers
Three step process: 1. the physical model (hardware, reality) 2. the mathematical formulation, and 3. the solution of the problem. The modeling process divides the real system into components and the components are divided into properties and methods to mathematically describe the system. The division of the real system into parts should be carried out in such a way that physical components are related to mathematical models or methods. This means that we look right from the beginning at parts and laws governing these components which combined generate the total system. This physical ideas have counterparts not only in the mathematical description but also in the programming style. The different programming styles also influence the efficiency of the calculations and especially the design of the model. In general there are the following ways to generate a program: procedural recursive functional object oriented These different types of program development allow to generate different kinds of programs which may be usable only once or they are reusable if programmed in an object - oriented way. The older programing approaches based on procedural programming languages like FORTRAN, MATLAB, C, etc. allow to generate programs in a direct way. However, the generated program using subprograms have a sequential structure. This means that there exists a chain of dependencies on the different parts of the program which makes it difficult to reuse parts of it. Recursive and functional programming generates also a dependent structure which cannot be efficiently divided in independent parts. The object oriented programming style allows to extract parts of the programs in such a way that these components can be reused in other programs. In addition, the object-oriented programming style is directly related to the hardware components, because the physical components are directly related to the classes and objects of the programs. How this works out will be discussed in more detail in the following section.
2.2.4 Object Oriented Approach Using Classes Object oriented programming has the advantage that classes consisting of properties and methods are directly related to the physical objects of the system. The methods are related to the mathematical model while the properties are the model parameters determining the systems property. An objectoriented approach in programming assumes that the total system can be divided into components which are described by specific formulas and parameters. Formulas correspond to methods and parameters correspond to properties of the class concept. The concept of a class is shown in Figure 2.8.
IV - Chapter 2: Simulation Methods
17
Figure 2.8. Representation of a class by parameters and methods.
A class basically consists of input values and output quantities. The input and output are related by methods and parameters which interact with each other. The methods are laws or formulas used in the calculations. The parameters are quantities which influence the formulas with their values. This influence is not only static but can be used in a dynamic way. The methods and parameters are used to derive an object from this class which is an entity determined by the choice of the parameters. Different parameter sets will define different objects. For each object the same methods are used to represent the laws determining the mathematical and physical behavior. The dynamic use of parameters allows us to connect different classes or different objects. Generating different objects allows us to generate a tool box and thus generate different models. The properties of a class can be summarized in two topics: classes depend on parameters determining the properties of the system classes are based on calculation methods There are different approaches to generate classes. The most recent way to generate programs based on classes is JAVA. Other programming languages are Simula, Smalltalk, C++, eclipse, .... An environment in Mathematica was designed by myself to have this kind of programing style not only available for numerical calculations but also for symbolic ones. The package within Mathematica allowing this style of programming is called Elements. Elements uses elementary data structures to represent a class. The most elementary structure in Mathematica is a list. Thus classes in Elements consist of lists containing the parameters and methods in different sublists. The classes are defined by the function Class[]. This function uses four arguments which define the class name, references to other classes, a list of properties, and a list of methods. The class name defines the name of the class by a string. The second argument is used to refer to other classes. If the class is based on the basic elements in Elements then the class Class["Elements"] should appear in the second slot. If the class we are going to define is based on another class than Elements, then this class name should occur at the second slot of the function. The third and fourth slot of the class function is
Mathematics for Engineers
18
used to define properties and methods used in this class. The following example demonstrates how Newton's equation can be represented in this class concept. Example 2.3. Class Concept Lets assume we need Newton's equation in a general form in our models. Solution 2.3. Newton's equation is given in its general form by m
d 2 x+t/
F
dt2
(2.13)
In words: mass multiplied by the second derivative of the dependent variable with respect to the independent variable equals the acting force. These terms can be directly implemented in a class. The following lines show how this works. newton
Class$"Newton", Class#"Element"',
+ parameter section / description "Classe Newton defining Newton's equation", Mass Force
10, Description "Mass of the body", Dimension ! kg, 0, Description "Force acting on the body",
Dimension ! kg m t s2 , dependentVariable x, Description "dependent variable used in Newton's equation", Dimension ! m, independentVariable t, Description "independent variable used in Newton's equation", Dimension ! s , + method section / equationOfMotion#' : Block#, Mass independentVariable,independentVariable dependentVariable# independentVariable' Force', Description "equation of motion", + Coordinates / getVariables#' : dependentVariable, independentVariable, Description "get dependent and independent variables"
(
Class Newton!
The class Newton allows to define the parameters mass m, force F, the dependent and independent variable x and t, respectively. The method used is the equation of motion defined in (2.13). Choosing the parameters of the class in different ways we are able to generate different types of objects. The different objects represent different models and thus we can generate a large variety of equations. The objects are generated by using the functionality of Elements which is contained in an object generator Í and represented by the function new[]. These functions allow us to define objects and
19
IV - Chapter 2: Simulation Methods
assign them to variables. The following line is an example for this procedure using the predefined values for the parameters. newton Înew#'
newton001
Object of Newton!
In Elements there are functions available which allow us to derive the properties of an object. The following line demonstrates this GetProperties#newton001' dependentVariable, x, description, Classe Newton defining Newton's equation, Force, 0, independentVariable, t, Mass, 10
Another function of Elements is SetProperties allowing us to change or set properties of an object SetProperties#newton001, Mass ! 100'
To check the new values of the object we use again GetProperties GetProperties#newton001' dependentVariable, x, description, Classe Newton defining Newton's equation, Force, 0, independentVariable, t, Mass, 100
The attributes attached to the properties can be derived by using the function GetAttributes[] GetAttributes#newton001, Force' Description Force of the body, Dimension
kg m s2
!
Attributes of the methods are derivable the same way, demonstrated in the next line GetAttributes#newton001, equationOfMotion' Description equation of motion
The methods are used by using the object and the function defined in the method's section combined with the object operator Í. The following line shows the derivation of the equation of motion newton001 ÎequationOfMotion#' 100 x
#t' m 0
Having the class Newton available we are now able to define different models by changing the properties of an object. Let us assume that a particle encounters a periodic external force with amplitude f0 and driving frequency Z. So we define a new object incorporating these properties of the force newton002
newtonÎnew#Mass ! m, Force ! f0 Sin#Z t''
Object of Newton!
20
Mathematics for Engineers
A second model uses a harmonic force due to a spring and in addition instead of the sine function a cosine function to represent the external force with amplitude A0 and driving frequency Z. The external force is here given by A0 cos+Z t/ newton003 newton Înew# unabhaengigeVariable ! s, Mass ! 20, Force ! k x#s' A0 Cos#Z s'' Object of Newton!
The two different equations of motion are now available by newton002 ÎequationOfMotion#' m x
#t' m f0 Sin#t Z'
and newton003 ÎequationOfMotion#' 20 x
#t' m A0 Cos#s Z' k x#s'
The use of classes and objects allows us to derive a huge variety of models by changing the properties of an object. This variation is not possible by using standard programming approaches like in procedural programs. The derivation of equations is one of the important steps in modelling a system. Also important is the last step of a simulation process which is related to the solution of the equations. To get the methods of a class Elements provides another function which allows to identify the methods in a class. This function is GetMethodsOfClass[className]. The following line demonstrates the application of this function. GetMethodsOfClass#newton' equationOfMotion, getVariables
Analogously, the properties of a class can be identified by using the function Properties[className]. For our example this is done by Properties#newton' dependentVariable, description, Force, independentVariable, Mass
The discussed example demonstrates that an object oriented approach to formulate a model is a natural approach and allows to setup different models by using a single class.Ô
2.2.5 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 2.2.5.1 Test Problems T1. What is a model? Give examples. T2. What is a class? Give examples. T3. How are classes and models related?
IV - Chapter 2: Simulation Methods
21
What is an object oriented representation? Give examples.
2.2.5.2 Exercises E1. Set up a model for a ball thrown in a base ball game. E2. Define a class structure for a falling ball incorporating air friction. E3. Implement in Mathematica a model for a falling particle with friction.
2.3 Car on a Bumpy Road The problem considered in this section is a practical application originating from the early phase of a car design process. Engineers in the automotive business have to know in advance how a new car will react under certain environmental conditions. In the sequel, we discuss a procedure for building a model that provides basic information on the behavior of a car at the very beginning of its design. In this early stage of the process, very little information on the new car is available and thus it is sufficient to reduce the model to the most essential properties of the car. These properties can include the mass distribution on a chassis, that is, the location of the engine and other heavy components like the fuel tank, batteries, gear box, and so on. Another essential feature in this early design phase is the undercarriage. In our model, these components are combined into a multibody system interacting by forces with each other and with the environment. Knowing the preliminary behavior of the new car, the next step is to improve the model over several iterations to incorporate more details about the interacting components. Here, we restrict our considerations to the very beginning of the design process and demonstrate how an object-oriented approach can be used to model different components in a transparent way.
2.3.1 Modelling Steps 2.3.1.1 Hardware
Figure 2.9 shows the simplified one-dimensional mechanical model of a car moving on a bumpy road. In principle, the car consists of three main components: the two axles and the car body. We assume that the two axles are coupled to the chassis by a dash pot and a spring. In addition to the translation, we allow a narrow rotation of the body around the center of gravity. The wheels are in contact with the road (this interaction is modeled by introducing springs and dampers representing the two tires). The symbols shown in Figure 2.9 have the following meaning: ki , di with +i 1 4/, mk with +k 1, 2, 3/, a, and b are the physical and geometric parameters of the system. The ki and di denote force and damping constants, mk are the wheel and chassis masses, and a and b determine the center of the mass of the chassis. JC2 I is the inertia moment of the chassis. The dynamical coordinates denoted by z1 , z2 , z3 , and E describe the vertical and angular motion of the car. The acting external forces on the car are due to bumps in the road and gravity. The whole system can be described by Newton’s equations.
Mathematics for Engineers
22
Figure 2.9. Simplified model for a car on a bumpy road.
2.3.1.2 Formulation
These equations of motion (Newton's equations) are given for the coordinates by m1 z1
F1 F3
(2.14)
m2 z2
F2 F4
(2.15)
m3 z3
F3
(2.16)
I E
b F3 a F4,
(2.17)
where Fi are the acting forces. One of the main problems in car design is to estimate how the total system will react under different forces, different damping parameters, different velocities, different environment conditions, and so on. These questions are partially answered in the following subsections. The complete examination of all these inquiries would exceed the space available in this section. Hence, we will restrict our consideration to a few examples. To attack the physical and mathematical aspects we first identify classes that would capture the essence of this problem. If we look at the equations of motion (2.14-17) and at Figure 2.9, it seems natural to divide the car into three main classes: the body, the axles, and the environment.
2.3.2 Class Definitions The problem can be split into the following classes (see Figure 2.10): the general setup class, the car body, the axles, the car itself, and some tools needed to carry out the simulation. The simulation class that provides graphical output refers to other classes that serve as a basis for the computation required to produce the output. The definition of classes here is mainly guided by the hardware.
IV - Chapter 2: Simulation Methods
23
Figure 2.10. Connections of classes and objects for the simple car model.
In Figure 2.10, the static class structure of the car model is shown. Basic components are springs dampers, gravitation, axles, and the road. With these components a higher class for a car is created. The
Mathematics for Engineers
24
car class is used in the simulation and animation class. The arrows show how the classes are interconnected and where objects of classes are involved. Objects are the interfaces of the model and allow a modular composition of the simulation. The car model can be split into the following classes which are discussed in the following subsections: the general setup class, the car body, the axles, the car itself, and some tools needed to carry out the simulation. The simulation class provides graphical output. This class refers to all other classes that serve as a basis for the computation required to produce the output. 2.3.2.1 Class for Setup
The general setup is concerned with geometric properties and the influence of the environment, such as the external force given here by four bumps in the road. Setup
Class$"Setup", Class#"Element"',
+ Parameters / a 1.2, Description ! "distance front axle", Dimension ! m, b 1.2, Description ! "distance rear axle", Dimension ! m, v 5, Description ! "velocity of the car", Dimension ! m s s, stepWidth 0.1`, Description ! "step width", Dimension ! m, stepHeight 0.07, Description ! "step height", Dimension ! m, + Methods / extForce$t_( : Fold#Plus, 0, Table# +Tanh#100 v +t n stepWidth/' Tanh#100 +v +t n stepWidth/ stepWidth/'/ stepHeight s 2, n, 1, 4'' ( Class Setup!
2.3.2.2 Class Body
The class Body consists solely of geometric and physical parameters of the car body. This class includes also the initial conditions for displacement, angle, and velocity.
IV - Chapter 2: Simulation Methods
25
Body Class#"Body", Setup, + Parameters / description "Body", m 1200, Description ! "mass", Dimension ! kg, z0 0, Description ! "initial displacement", Dimension ! m, zp0 0, Description ! "initial velocity", Dimension ! m s s, h 0.75, Description ! "geometric quantity", Dimension ! m, E0 0, Description ! "initial value for E", Dimension ! rad, Ep0 0, Description ! "initial value for E prime", Dimension ! rad s s, zFunName z, Description ! "name of the z function", Dimension ! "", betaFunName E , Description ! "name of the E function", Dimension ! "", + Methods / ' Class Body!
2.3.2.3 Classes for Axles
The common properties of the two axles are collected in the class CarAxle. This class contains only parameters - force and damping constants together with initial conditions for the equations of motion. CarAxle Class#"CarAxle", Setup, + Parameters / description "Axle", m 42.5, Description ! "mass of the axle", k1 150 000, Description ! "wheelbody force constant", d1 700, Description ! "wheelbody damping constant", k2 4000, Description ! "wheelroad force constant", d2 1800, Description ! "wheelroad damping constant", z0 0, Description ! "initial displacement", zp0 0, Description ! "initial velocity", zFunName z, Description ! "name of the function z", + Methods / ' Class CarAxle!
In the next step we define classes for the front and the rear axle. The front axle consist solely of a method generating the equation of motion for the axle by considering the acting forces.
Mathematics for Engineers
26
FrontAxle
Class$"FrontAxle", CarAxle,
+ Parameters / , + Methods / getEquations$body_ s; ObjectOfClassQ#body, Body'( : Block#f1 k1 +zFunName#t' extForce#t'/ d1 +zFunName '#t' D#extForce#t', t'/, f3 k2 ++bodyÎ zFunName/#t' b +bodyÎ betaFunName/#t' zFunName#t'/ d2 +D#+bodyÎ zFunName/#t' b +bodyÎ betaFunName/#t', t' zFunName'#t'/, m zFunName''#t' +f1 f3/ 1 0, zFunName#0' z0, zFunName'#0' zp0' ( Class FrontAxle!
The rear axle is generated in the same way by taking the relations specific for this component into account. RearAxle
Class$"RearAxle", CarAxle,
+ Parameters / , + Methods / getEquations$body_ s; ObjectOfClassQ#body, Body'( : Block#f2 k1 +zFunName#t' extForce#t +a b/ s v'/ d1 ++bodyÎ zFunName/'#t' D#extForce#t +a b/ s v', t'/, f4 k2 ++bodyÎ zFunName/#t' a +bodyÎ betaFunName/#t' zFunName#t'/ d2 +D#+bodyÎzFunName/#t' a +bodyÎ betaFunName/#t', t' zFunName'#t'/, m zFunName''#t' +f2 f4/ 1 0, zFunName#0' z0, zFunName'#0' zp0' ( Class RearAxle!
2.3.2.4 Class Car
Finally, after setting up classes for the body and axles, we define a class to describe the car. This class incorporates parameters as well as methods to combine information from different components and deliver the equations of motion.
27
IV - Chapter 2: Simulation Methods
Car
Class$"Car", Setup,
+ Parameters / description "Car", body Null, Description ! "body", frontAxle Null, Description ! "front axle", rearAxle Null, Description ! "rear axle", + Methods / Ic$a0_, b0_, h0_, m03_( :
Block#, m03 s 12 ++a0 b0/^2 h0^2/',
getEquations#' : Block# f3 frontAxle Îk2 ++body ÎzFunName/#t' b +bodyÎ betaFunName/#t' +frontAxle ÎzFunName/#t'/ frontAxle Îd2 +D#+bodyÎzFunName/#t' b +bodyÎ betaFunName/#t', t' +frontAxle ÎzFunName/'#t'/, f4 rearAxle Îk2 ++body ÎzFunName/#t' a +bodyÎ betaFunName/#t' +rearAxle ÎzFunName/#t'/ rearAxle Îd2 +D#+body ÎzFunName/#t' a +body Î betaFunName/#t', t' +rearAxle ÎzFunName/'#t'/, Union#body Î m +body ÎzFunName/''#t' 2 f3 0, +bodyÎzFunName/#0' body Îz0, +bodyÎzFunName/ '#0' bodyÎzp0, Ic#a, b, body Îh, body Î m' +body Î betaFunName/''#t' +b f3 a f4/ 1 0, +bodyÎ betaFunName/#0' body ÎE0, +body Î betaFunName/'#0' bodyÎ Ep0, frontAxleÎgetEquations#body', rearAxleÎgetEquations#body'' ', getVariables#' : frontAxle ÎzFunName, rearAxle ÎzFunName, body ÎzFunName, body Î betaFunName ( Class Car!
2.3.2.5 Class Simulation
Having the classes for all physical components available, we need a tool to carry out the simulation. The following classes provide an efficient simulation of the model and generate a graphical representation of the results.
28
Mathematics for Engineers
MakeSimulation
Class$"MakeSimulation", Class#"Element"',
+ Parameters / description "Simulation of the motion", car Null, Description ! "Car", maxSteps 100 000, Description ! "Maximum number of steps in NDSolve", + Methods / simulate$tstart_, tend_( : Block#nsol getNSolution#tstart, tend', $TextStyle FontFamily ! "Arial", FontSize ! 12; MapThread#Plot#Evaluate#carÎextForce#t', Ó1 s. nsol', t, tstart, tend, PlotRange ! All, PlotStyle ! RGBColor#0.25098, 0, 0.25098', RGBColor#0.9, 0, 0', AbsoluteThickness#3', AxesLabel ! "t #s'", " ", PlotLabel ! Ó2, GridLines ! Automatic, Frame ! True' &, Map#Apply#Ó, t' &, carÎ getVariables#'', "Front Axle", "Rear Axle", "Body Movement", "Rotation Angle"'', getNSolution$tstart_, tend_( : NDSolve#car ÎgetEquations#', car ÎgetVariables#', t, tstart, tend, MaxSteps ! maxSteps' ( Class MakeSimulation!
If we wish to see the motion of the car, it is convenient to declare a special class for animations. MakeAnimation
Class$"MakeAnimation", Class#"Element"',
+ Parameters / description "Animate the motion of the car" path
"C:\\Documents and Settings\\Gerd.Baumann\\My Documents\\Mma\\Books\\Engineering\\Vol_IV_Numerics\\ Simulation_Methods\\Generic",
Description ! "Path of the car model"
,
+ Methods / animate$sim1_, tstart_, tend_, Gt_( : Block$names, l1, tire1, tire2, window1, window2, window3, body, sol sim1 ÎgetNSolution#tstart, tend', SetDirectory#path'; names FileNames#" .dat"';
IV - Chapter 2: Simulation Methods
29
l1 Map#ReadList#Ó, Number, RecordLists ! True, RecordSeparators ! If#StringMatchQ#$OperatingSystem, "Windows "', "\n", "\r"'' &, names'; tire1 GrayLevel#0.752941', Polygon$l1##1'' s. x_, y_
! x, y z1#t'( ;
GrayLevel#0.752941', Polygon$l1##2'' s.
tire2
x_, y_ window1
! x, y z2#t'( ;
GrayLevel#0.752941', Polygon$l1##3'' s. x_, y_
! x Cos# E#t'' y Sin#E#t'',
y Cos# E#t'' x Sin#E#t'' z3#t'( ; window2
GrayLevel#0.752941', Polygon$l1##4'' s. x_, y_
! x Cos# E#t'' y Sin#E#t'',
y Cos# E#t'' x Sin#E#t'' z3#t'( ; window3
GrayLevel#0.752941', Polygon$l1##5'' s. x_, y_
! x Cos# E#t'' y Sin#E#t'',
y Cos# E#t'' x Sin#E#t'' z3#t'( ; body
RGBColor#0, 0, 0.25098', Polygon$l1##6'' s. x_, y_
! x Cos# E#t'' y Sin#E#t'',
y Cos# E#t'' x Sin#E#t'' z3#t'( ; ListAnimate#Table#Show#Graphics#body, tire1, tire2, window1, window2, window3 s. sol', AspectRatio ! Automatic, Axes ! False, Frame ! True, PlotRange ! .1, 4.4, 0, 1.4, FrameTicks ! None', t, tstart, tend, Gt'' ( ( Class MakeAnimation!
2.3.3 Objects Up to this point, nothing more than a framework was defined in which the computation can be carried out. The given classes represent essentially only templates for creating the calculation objects. In order to setup the simulation, we have to generate objects and specify their properties. The car is generated by a modular process incorporating the distinguished objects. At this point, it is apparent that we have a great advantage in designing a specific car compared to traditional approaches in simulation. For example, we can define different objects for axles and use them as in different simulations as parameters of the underlying model. Components (objects) of the same type can be exchanged without causing any harm to the system. In the sequel, we demonstrate how this process is carried out.
Mathematics for Engineers
30
2.3.3.1 Body Object
First, we create the body by applying Elements method new[] to the class Body. All properties except zFunName and betaFunName will have their default valued specified in the class declaration. b
Body Înew#zFunName z3, betaFunName E'
Object of Body!
The actual properties of the body can be checked by GetPropertiesForm#b' Property description a b betaFunName h m stepHeight stepWidth v z0 zFunName zp0 E0 Ep0
Value Body 1.2 1.2 E 0.75 1200 0.07 0.1 5 0 z3 0 0 0
2.3.3.2 Axle Object
Next, objects for the axles are created. The front axle is defined with a mass of 45.5 kg, the function representing the displacement of the center of the wheel attached to this axle is denoted by z1 . fa
FrontAxle Înew#m ! 45.5, zFunName z1'
Object of FrontAxle!
The properties of the axle are recalled by:
IV - Chapter 2: Simulation Methods
31
GetPropertiesForm#fa' Property description a b d1 d2 k1 k2 m stepHeight stepWidth v z0 zFunName zp0
Value Axle 1.2 1.2 700 1800 150 000 4000 45.5 0.07 0.1 5 0 z1 0
Since no values for the damping and the force constant were given in the statement above to new[], these properties are initialized with values specified in the class declaration itself. A second version of the front axle with a more stiffer damping is generated in the following line. We will use this object later in our simulations to compare the influence of choosing stiffer springs in a design. Here, k1 and k2 are set to the specific values. fas
FrontAxle Înew#m 45.5, d1 5000, d2 5000, k1 150 000, k2 150 000, zFunName z1'
Object of FrontAxle!
For the rear axle we just use the default settings. The coordinate of elongation is denoted by z2 . ra
RearAxle Înew#zFunName z2'
Object of RearAxle!
A check of parameters shows the values.
Mathematics for Engineers
32
GetPropertiesForm#ra' Property description a b d1 d2 k1 k2 m stepHeight stepWidth v z0 zFunName zp0
Value Axle 1.2 1.2 700 1800 150 000 4000 42.5 0.07 0.1 5 0 z2 0
2.3.3.3 Car Object
Having all components needed available, it is easy to generate the car by incorporating the defined objects into a common model. The car object is defined by c
CarÎnew#frontAxle fas, rearAxle ra, body b';
It consist of the front axle, the rear axle, and the body. We note that an object-oriented approach allows an easy change of the behavior of the model by just changing different components or their properties, respectively. Hence, if an engineer would have access to a collection of different bodies, axles and springs, he would be able to create and test many different car designs without the need to adapt the model. 2.3.3.4 Simulation Object
A simulation object is created from the class MakeSimulation. The simulation can be performed with any object of the class Car. sim
MakeSimulation Înew#car c'
Object of MakeSimulation!
Finally, the actual simulation is initiated by calling the simulate[] method of the class MakeSimulation. The input for this function are just the starting point and the end point of the simulation interval. The result are four plots for coordinates of axles, the body, and the rotation angle of the body. All quantities are plotted depending on the simulation time.
33
IV - Chapter 2: Simulation Methods
GraphicsGrid#Partition#sim Îsimulate#0, 2', 2''
Front Axle
Rear Axle
0.06
0.06
0.04
0.04 0.02
0.02 t #s'
0.00 0.0
0.5
1.0
1.5
2.0
t #s'
0.00 0.0
Body Movement 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 0.0
0.5
1.0
1.5
2.0
Rotation Angle 0.06 0.04 0.02 t #s'
0.00 0.5
1.0
1.5
0.0
2.0
0.5
1.0
1.5
2.0
Figure 2.11. Simulation results for the different components, front and rear axle, body motion, and rotation about the center of mass.
Now the hard work for an engineer starts. If he is interested in an optimization of the elongations, he must change the properties of the car in the simulation. How to achieve the best result and how to select the best strategy depends on the experience of the individual engineer. However, within an object oriented simulation environment he has the flexibility to incorporate his thoughts in a quick and transparent way. For example, he can change some parameters of the front axle to achieve a better performance of the car... sim ÎcarÎ frontAxleÎ d1 sim ÎcarÎ frontAxleÎ k1
700; 100 000;
GraphicsGrid#Partition#sim Îsimulate#0, 2', 2''
Front Axle
Rear Axle
0.06
0.06
0.04
0.04
0.02
0.02 t #s'
0.00 0.0
0.5
1.0
1.5
2.0
t #s'
0.00 0.0
Body Movement 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 0.0
0.5
1.0
1.5
2.0
Rotation Angle 0.06 0.04 0.02 t #s'
0.00 0.5
1.0
1.5
2.0
0.0
0.5
1.0
1.5
2.0
Figure 2.12. Simulation results for changed parameters of the front axle.
...or he can replace the whole front axle by another one with a smoother damper.
Mathematics for Engineers
34
sim ÎcarÎ frontAxle
fa
Object of FrontAxle! GraphicsGrid#Partition#sim Îsimulate#0, 2', 2''
Front Axle
Rear Axle
0.06
0.06
0.04
0.04
0.02
0.02 t #s'
0.00 0.0
0.5
1.0
1.5
2.0
t #s'
0.00 0.0
Body Movement 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00 0.0
0.5
1.0
1.5
2.0
Rotation Angle 0.06 0.04 0.02 t #s'
0.00 0.5
1.0
1.5
2.0
0.0
0.5
1.0
1.5
2.0
Figure 2.13. Simulation results for the model with a replaced front axle with different properties.
... and so on. However, a clever engineer will resort to some optimization strategies which can be carried out by an additional class in the development environment (not shown here). Finally, if the results are to be presented at meetings and workshops, the behavior of the optimized car can be shown as an animation using different simulation models or strategies. The generation of an animation object and the animation itself is started with an MakeAnimation Înew#'; anim an Îanimate#sim, 0, 2, 0.05'
Figure 2.14. Animation of the car motion during a ride over four bumpers.
The flip chart movie is suppressed in the printed version. However, the notebook version of this article shows you the movement of the car body and the wheels.
IV - Chapter 2: Simulation Methods
35
This example demonstrated that the object-oriented approach to modeling is very efficient and transparent to the user. It divides the process of modeling into two phases. In the first phase, the model and its components are implemented. In the optimization phase, the parameters of the model satisfying certain criteria are found. Both phases benefit from the availability of predefined components (objects). Using the classical approach, minor structural changes (e.g., using a different axle) require modifications by the model builder. This, in turn, may have made a subsequent verification step necessary. In contrast, the object-oriented approach allows this to be done much more easily. A new component (e.g., an axle) can be added or exchanged without affecting the soundness of the model. This fact is of crucial importance in the industrial environment where models contain a huge number of parameters. It is necessary to have a clear and well-founded procedure to handle them. If the process to manage and work with models is clearly specified, results can be obtained more quickly, and they are more predictable, and comprehensible.
3 Numbers and Errors
3.1 Introduction Numerical analysis uses results and methods from many areas of mathematics, particularly those of calculus and linear algebra. In this chapter, we consider a very useful tool from calculus, Taylor's theorem. This will be needed for both the development and understanding of many of the numerical methods discussed. Before we start with the discussion of approximations we have to examine the basic items which are numbers and their representation on computers. The representation of numbers is different compared to the symbolic representation in pencil and paper calculations. In such calculations we assume that a number can be represented by an infinite accuracy and precision. However the representation on a computer is completely different and need some basic understanding to prevent errors in calculations. The next two sections discuss the representation of numbers and the errors in numerical calculations. The last section of this chapter will introduce approximations using Maclaurin and Tailor's polynomials as a way to evaluate functions in a standard way. In this last section we also make estimations on the errors when the infinite numbers of terms are truncated and approximations are used. We will be concerned with this infinite series, which are sums that involve infinitely many terms. Infinite series are used, for example, to approximate trigonometric functions and logarithms, to solve differential equations, to evaluate difficult integrals, to create new functions, and to construct mathematical models of physical laws.
IV - Chapter 3: Numbers and Errors
37
3.2 Numbers and Representation 3.2.1 Numbers on the Real Line The simplest numbers are the natural numbers: 1
1, 2, 3, 4, 5, 6, …
1, 2, 3, 4, 5, 6, …
which are extended to all integers including zero and the negative integer numbers by: =
…, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, …
…, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, …
Such numbers are called integer numbers. With the exception that division by zero is ruled out, ratios of integers are called rational numbers. Examples are: 4
5 2 1 1 2 5 …, , , , 0, , , , …! 2 3 2 2 3 2
5 2 1 1 2 5 …, , , , 0, , , , …! 2 3 2 2 3 2
Observe that every integer is also a rational number since an integer p can be written as the ratio p
p 1
True
where p denotes any integer number. Division by zero is ruled out since we would want to be able to express the relationship y
p 0
1 encountered. More…
Power::infy : Infinite expression 0 y ComplexInfinity
in the alternative form 0y p 0 p
However, if p is different from zero, this equation is contradictory. Moreover, if p is equal to zero, this
38
Mathematics for Engineers
equation is satisfied by any number y, which means that the ratio p s 0 does not have a unique value — a situation that is mathematically unsatisfactory. For these reasons such symbols as p 0
and
0
(3.1)
0
are not assigned a value; they are said to be undefined. The early Greeks believed that the size of every physical quantity could, in theory, be represented by a rational number. They reasoned that the size of a physical quantity must consist of a certain whole number of units plus some fraction m s n of an additional unit. This idea was shattered in the fifth century B.C. by Hippasus of Metapontum, who demonstrated by geometric methods that the hypotenuse of a triangle in Figure 3.1 cannot be expressed as a ratio of integers. This unusual discovery demonstrated the existence of irrational numbers; that is, numbers not expressible as ratios of integers.
1 2 1 Figure 3.1. Calculation of the length of the hypotenuse by using Pythagoras formula a2 b2 case considered, is
11
2.
Other examples of irrational numbers are 1
2.
2.41421 3. 1.73205 3
7.
1.91293 N#S' 3.14159
c2 which, is for the specific
IV - Chapter 3: Numbers and Errors
39
N#cos+19 °/' 0.945519
The proof that S is irrational is difficult and evaded mathematicians for centuries; it was finally proven in 1761 by J.H. Lambert (see Figure 3.2).
Figure 3.2. Johann Heinrich Lambert (1728-1777). A Swiss-German scientist, Lambert taught at the Berlin Academy of Science. In addition to his work on irrational numbers, Lambert wrote landmark books on geometry, the theory on cartography, and perspective in art. His influential book on geometry foreshadowed the discovery of modern non-Euclidean geometry.
Rational and irrational numbers can be distinguished by their decimal representation. Rational numbers have repeating decimals; that is, from some point the decimals consists entirely of zeros or else a fixed finite sequence of digits are repeated over and over. Example 3.1. Rational numbers The following numbers demonstrate the behavior of rational numbers. 1 N% ) 2 0.5
1 s 2 terminates after the first digit. 1 N% ) 3 0.333333
1 s 3 has a continuous repetition of a single digit. +N#Ó1, 25' &/%
8 25
)
0.3200000000000000000000000
8 s 25 has two non-vanishing digits.
Mathematics for Engineers
40
+N#Ó1, 50' &/%
3 11
)
0.27272727272727272727272727272727272727272727272727
3/11 repeats the digits 2 and 7. 5 +N#Ó1, 50' &/% ) 7 0.71428571428571428571428571428571428571428571428571
5 s 7 repeats the 6 digits 714285 +N#Ó1, 50' &/%
6 13
)
0.46153846153846153846153846153846153846153846153846
6 s 13 repeats the 6 digits 461538. Irrational numbers are represented by non repeating decimals. An example for that behavior is +N#Ó1, 50' &/%
2 represented with 50 digits:
2)
1.4142135623730950488016887242096980785696718753769
with 250 digits this number looks as follows +N#Ó1, 250' &/%
2)
1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850g 387534327641572735013846230912297024924836055850737212644121497099935831413222665927g 505592755799950501152782060571470109559971605970274534596862014728517418640889199
even within 500 digits there is no repetition +N#Ó1, 550' &/%
2)
1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850g 387534327641572735013846230912297024924836055850737212644121497099935831413222665927g 505592755799950501152782060571470109559971605970274534596862014728517418640889198609g 552329230484308714321450839762603627995251407989687253396546331808829640620615258352g 395054745750287759961729835575220337531857011354374603408498847160386899970699004815g 030544027790316454247823068492936918621580578463111596668713013015618568987237235288g 509264861249497715421833420428568606014682472
2 does not begin to repeat from some point. The same behavior is observed for S
IV - Chapter 3: Numbers and Errors
41
+N#Ó1, 1000' &/#S' 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628g 034825342117067982148086513282306647093844609550582231725359408128481117450284102701g 938521105559644622948954930381964428810975665933446128475648233786783165271201909145g 648566923460348610454326648213393607260249141273724587006606315588174881520920962829g 254091715364367892590360011330530548820466521384146951941511609433057270365759591953g 092186117381932611793105118548074462379962749567351885752724891227938183011949129833g 673362440656643086021394946395224737190702179860943702770539217176293176752384674818g 467669405132000568127145263560827785771342757789609173637178721468440901224953430146g 549585371050792279689258923542019956112129021960864034418159813629774771309960518707g 211349999998372978049951059731732816096318595024459455346908302642522308253344685035g 261931188171010003137838752886587533208381420617177669147303598253490428755468731159g 562863882353787593751957781857780532171226806613001927876611195909216420199
which also is known as an irrational number. There is no repetition of digits within the given precision.Æ In 1637 René Descartes published a philosophical work called Discourse on the Method of Rightly Conducting the Reason. In the back of that book were three appendices that purported to show how the method could be applied to concrete examples. The first two appendices were minor works that endeavored to explain the behavior of lenses and the movement of shooting stars. The third appendix, however, was an inspired stroke of genius; it was described by the nineteenth century British philosopher John Stuart Mill as "The greatest single step ever made in the progress of the exact sciences". In that appendix René Descartes (Figure 3.3) linked together two branches of mathematics, algebra and geometry. Descartes' work evolved into a new subject called analytic geometry; it gave a way of describing algebraic formulas by means of geometric curves using algebraic formulas.
Figure 3.3. René Descartes (1596-1650). Descartes a French aristocrat, was the son of a government official. He graduated from the University of Poitiers with a law degree at age 20. After a brief probe into the pleasures of Paris he became a military engineer, first for the Dutch Prince of Nassau and then for the German Duke of Bavaria. It was during his service as a soldier that Descartes began to pursue mathematics seriously and developed his analytic geometry. The story goes that Descartes coined his main philosophical point of view in a village near Ulm sitting in an oven "Cogito ergo sum" (I am thinking and therefore I exist).
Mathematics for Engineers
42
In analytic geometry, the key step is to establish a correspondence between real numbers and points on a line. To accomplish this, we arbitrarily choose one direction along the line to be called positive and the other negative. It is usual to mark the positive direction with an arrowhead as shown in Figure 3.4. Next we choose an arbitrary reference point on the line to be called the origin, and select a unit of length for measuring distances.
Origin
Figure 3.4. Coordinate line allowing to measure the distance between two numbers. The origin separates the positive from the negative numbers.
With each real number we can associate a point on the line as follows: Associate with each positive number r the point that is a distance of r units in the positive direction away from the origin. Associate with each negative number r the point that is a distance of r units in the negative direction from the origin. Associate the origin with the number 0. The real number corresponding to a point on the line is called the coordinate of the point and the line is called a coordinate line or sometimes the real line. Example 3.2. Numbers on the real line In
Figure
3.5
3, 1.75, 1 s 2,
we
have
the
location
of
the
points
whose
coordinates
are
2 , S, and 4.
3 4
marked
3
1 1.75 ccccc 2 2 1 0
2 1
2
S 3
4 4
Figure 3.5. Location of some real numbers 5 on the coordinate line.Ô
It is evident from the way in which real numbers and points on a coordinate line are related that each real number corresponds to a single point and that each point corresponds to a single real number. This fact is described by stating that the real numbers and the point on a coordinate line are in one-to-one correspondence. Looking at the numbers from a more general point of view, we observe that real numbers contain rational numbers and rational numbers contain integer numbers, and integer numbers contain natural numbers. This relation is graphically shown in Figure 3.6.
IV - Chapter 3: Numbers and Errors
43
5 4 = 1
Figure 3.6. The relation of sets of numbers. Natural numbers 1 are include in integer numbers =, integer numbers are includes in rational numbers 4, and rational numbers are part of real number 5. Despite the graphical representation by circles the sets of numbers are infinite sets.
3.2.2 Representation of Numbers Numbers are represented by a finite sequence of symbols using the digits 1 up to 0 if we use the natural decimal system. These symbols are commonly called digits in a decimal representation. Digits are the atomic parts of a number. For example the digits of the integer 1567 are IntegerDigits#1567' 1, 5, 6, 7
The digits of a real number are RealDigits#124.24536798976873447832' 1, 2, 4, 2, 4, 5, 3, 6, 7, 9, 8, 9, 7, 6, 8, 7, 3, 4, 4, 7, 8, 3, 2, 3
where the first three digits are the leading digits on the left hand side of the decimal point. The sequences of digits are used to represent the different types of numbers. Integer numbers are based on the decimal system which is one of several system in which numbers can be represented. The decimal system is based on the ten symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. It is the system which is commonly used for calculations done by humans. If we represent a natural number such as 567859 in the decimal system we write the whole number by using individual digits representing the coefficients of powers of 10. The number 567856 is thus represented as 567 856 6 1 5 10 8 100 7 1000 6 10 000 5 100 000 True
Thus a string of digits represents a number according to the formula
44
Mathematics for Engineers
+an an1 …a2 a1 a0 /10
a0 100 a1 101 a2 102 ... an1 10n1 an 10n
(3.2)
This representation is only useful for integer numbers which are numbers greater than 1 excluding fractions of decimal numbers which are smaller than 1. Numbers smaller than 1, called fractions, can be represented in a similar way by using negative powers of 10. In general we have the symbolic representation a0 .b1 b2 b3 ...
a0 100 b1 101 b2 102 b3 103 ...
(3.3)
An example of this representation is 1.223575... which actually can be represented by 1.223575 1
2 10
2 100
3 1000
5 10 000
7 100 000
5 1 000 000
True
Some decimal numbers allow an infinite number of digits on the right of the decimal point as we discussed in the previous section. Thus a number like 2 , which is in fact an irrational number, has an infinite non recurring number of digits on the right of the decimal point. The following line shows 5000 digits of this number +N#Ó1, 5000' &/%
2)
1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850g 387534327641572735013846230912297024924836055850737212644121497099935831413222665927g 505592755799950501152782060571470109559971605970274534596862014728517418640889198609g 552329230484308714321450839762603627995251407989687253396546331808829640620615258352g 395054745750287759961729835575220337531857011354374603408498847160386899970699004815g 030544027790316454247823068492936918621580578463111596668713013015618568987237235288g 509264861249497715421833420428568606014682472077143585487415565706967765372022648544g 701585880162075847492265722600208558446652145839889394437092659180031138824646815708g 263010059485870400318648034219489727829064104507263688131373985525611732204024509122g 770022694112757362728049573810896750401836986836845072579936472906076299694138047565g 482372899718032680247442062926912485905218100445984215059112024944134172853147810580g 360337107730918286931471017111168391658172688941975871658215212822951848847208969463g 386289156288276595263514054226765323969461751129160240871551013515045538128756005263g 146801712740265396947024030051749531886292563138518816347800156936917688185237868405g 228783762938921430065586956868596459515550164472450983689603688732311438941557665104g 088391429233811320605243362948531704991577175622854974143899918802176243096520656421g 182731672625753959471725593463723863226148274262220867115583959992652117625269891754g 098815934864008345708518147223181420407042650905653233339843645786579679651926729239g 987536661721598257886026336361782749599421940377775368142621773879919455139723127406g 689832998989538672882285637869774966251996658352577619893932284534473569479496295216g 889148549253890475582883452609652409654288939453864662574492755638196441031697983306g 185201937938494005715633372054806854057586799967012137223947582142630658513221740883g
IV - Chapter 3: Numbers and Errors
45
238294728761739364746783743196000159218880734785761725221186749042497736692920731109g 636972160893370866115673458533483329525467585164471075784860246360083444911481858765g 555428645512331421992631133251797060843655970435285641008791850076036100915946567067g 688360557174007675690509613671940132493560524018599910506210816359772643138060546701g 029356997104242510578174953105725593498445112692278034491350663756874776028316282960g 553242242695753452902883876844642917328277088831808702533985233812274999081237189254g 072647536785030482159180188616710897286922920119759988070381854333253646021108229927g 929307287178079988809917674177410898306080032631181642798823117154363869661702999934g 161614878686018045505553986913115186010386375325004558186044804075024119518430567453g 368361367459737442398855328517930896037389891517319587413442881784212502191695187559g 344438739618931454999990610758704909026088351763622474975785885836803745793115733980g 209998662218694992259591327642361941059210032802614987456659968887406795616739185957g 288864247346358588686449682238600698335264279905628316561391394255764906206518602164g 726303336297507569787060660685649816009271870929215313236828135698893709741650447459g 096053747279652447709409924123871061447054398674364733847745481910087288622214958952g 959118789214917983398108378827815306556231581036064867587303601450227320882935134138g 722768417667843690529428698490838455744579409598626074249954916802853077398938296036g 213353987532050919989360751390644449576845699347127636450716327915470159773354863893g 942325727754003826027478567417258095141630715959784981800944356037939098559016827215g 403458158152100493666295344882710729239660232163823826661262683050257278116945103537g 937156882336593229782319298606467978986409208560955814261436363100461559433255047449g 397593399912541953230093217530447653396470662761166175351875464620967634558738616488g 019884849747926404506544489691004079421181692579685756378488149898641685499491635761g 448404702103398921534237703723335311564594438970365316672194904935188290580630740134g 686264167247011065346349391640714628556798017793381442404526913706660977763878486623g 800339232437047411533187253190601916599645538115788841380843323210533767461812178014g 296092832411362752540887372905129407339479433061943956936702079429515878228349321931g 666411130154959469837897767434443539337709957134988407890850815892366070088658105470g 949790465722988880892461282816013133701029080290999745647849581545614648715516390502g 419857906131093458783306200262207372471676685455499904994085710809925759928893236615g 438271955005781625133038153146577907926868500806984428479152424275441026805756321565g 322061885751225113063937025362927161968251259192025216058701189596732244239267423734g 490764646727375347964598819149807931718002423855453886038368310800779182466462754117g 444250018727779518164383451463461299020763343017968554385631667723518389336667042222g 110939144930287963812839889311731308430042125550185498506529455637766031461255909104g 611384768282359592477228629042642736163264585443392877263860343149804896397363329754g 885925681149296836126725898573833216436663487023477302610106130507298611534129948808g 7744731112295426527516536659117301423606265
This means that irrational numbers need an infinite number of digits to represent the exact number. This observation is in contradiction to the fact that a computer uses a finite number of digits to represent numbers. In general, a decimal number has the following representation
Mathematics for Engineers
46
+an an1 ... a1 a0 .b1 b2 b3 .../10
n
k 0
k 1
k k Å ak 10 Å bk 10
(3.4)
where the subscript 10 on the left hand side denotes the base of the number system. The first part on the right hand side represents the integer part of the number and the second sum represents the fractional part of the number. Other representations of numbers can be given in different bases than 10. Especially on computers the use of the binary system is common which is based on two numbers 1, 0. Other systems are the octal system with base 8 or the hexadecimal system which is using base 16 (see Table 3.1). The octal system uses the digits 0, 1, 2, 3, 4, 5, 6, 7 and the hexadecimal system is based on the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f . Generalizing these observations, the representation of any number in a base E can be written in analogy to the decimal system by n
+an an1 ... a1 a0 .b1 b2 b3 .../ E
Å ak Ek Å bk Ek k 0
(3.5)
k 1
If for example we have a number given in octal representation like +127 653/8 3 80 5 81 6 82 7 83 2 84 1 85 3 8 +5 8 +6 8 +7 8 +2 8 1 //// 44 971
(3.6)
the result is given in the decimal system on the right hand side. This means the conversion of the number given in a certain basis is straight forward by using the formula from above. For example, the conversion of an octal number to a decimal number works the same way as in the decimal system. We multiply the digits by powers of base E and assign to each digit in the base representation a power starting from right to left. Numbers which are smaller than the second symbol and larger than the first symbol in the base are converted by negative powers of the base. The flowing number in octal representation is +0.24671/8 +0.24671/8
0 80 2 81 4 82 6 83 7 84 1 85
10 681
0.325958
32 768
(3.7)
This procedure can be carried out for any basis. Using the binary system, the following number +0.0100101/2 is representing the number +0.0100101/2 0 20 0 21 1 22 0 23 0 24 1 25 0 26 1 27
37 128
0.289063
(3.8)
This can be automatically generated by the function FromDigits[] by specifying the base and the number of leading digits on the left hand side of the radix point which is the point separating the fractional values from the integer values in any basis. The example from above can be converted by
IV - Chapter 3: Numbers and Errors
47
N#FromDigits#0, 0, 1, 0, 0, 1, 0, 1, 1, 2'' 0.289063
The more important question is how to convert numbers between different basis. Let's assume we have given a number in base E and like to know the representation of the number in base J. For example if E 10 and J 2 then we convert a decimal number to a binary number. As we know, the number in decimal form is given by N
+cm cm1 ... c2 c1 c0 / E
c0 E+c1 E+c2 E+c3 .... E+cm / ...///
(3.9)
The same number in binary representation is N
+bn bn1 ... b2 b1 b0 /J
b0 b1 J1 b2 J2 b3 J3 ... bn Jn
(3.10)
If we subtract the largest term from the right from the left hand side and decide if the result is positive N bn Jn
Nn
b0 b1 J1 b2 J2 b3 J3 ... bn1 Jn1
(3.11)
then we can continue this way to determine the exponents n and the digits bn in the basis J. Nn bn1 Jn1
Nn1
b0 b1 J1 b2 J2 b3 J3 ... bn2 Jn2
(3.12)
... The algorithm to convert a decimal number to a number in base J is available in the function BaseForm[]. For example the conversion of the number 3781 in an octal representation is carried out by BaseForm#3781, 8' 73058
The algorithm simplifies somehow for the binary system which uses only two different digits which makes the procedure simpler. The following lines contain a way to convert a decimal number to a binary number. x
306 000 782;
y
; While%x ! 0, x , AppendTo#y, 1'; x
If%EvenQ#x', AppendTo#y, 0'; x 2 y
x 1; x
x 2
));
Reverse#y'
1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0
The inversion of the binary numbers to decimal numbers is quite easily done by using the FromDigits[] function. The application to the number examined above shows that the conversion is correct. FromDigits#y, 2' 306 000 782
48
Mathematics for Engineers
So far, we know how numbers are converted between different bases. However, we did not talk about precision and accuracy of a number. By precision of a decimal number we mean the total number of significant decimal digits in a number x, while accuracy counts the number of significant decimal digits to the right of the decimal point in x. The following Mathematica functions determine the accuracy and precession of a decimal number Precision#19 088.23000212381732357623' 24.2808 Accuracy#1.23000212381732357623' 20.
3.2.3 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 3.2.3.1 Test Problems T1. What are numbers? T2. How is a number represented? T3. What are decimal numbers? T4. What is the base of a number? T5. How is a fraction of a number defined?
3.2.3.2 Exercises E1. Can you trust your calculator? a. If you take 20 times the square root of 7 and then square the result 20 times, what should be the result? b. Carry out the calculation from part a) by using a pocket calculator. c. Compare the results of part a) and b). E2. Inscribe regular polygons into a halve a circle with radius one leads to the following recursive formula to calculate S:
Pn1
2n1
2
1
1
P2n 4n1
with P1
2
2
(1)
Carry out the calculation by using your calculator. See what happens! Surprising, isn't it? E3. Newton's method is widely used to calculate the zeros of a function. The calculation is based on the iteration formula xn1
xn
f +xn / f ' +xn /
(2)
with some initial value x0 close to the root given. a. Let us consider the function f +x/
x2 2
1 cos+x/
(3)
Show that x 0 is a root of this function and find the iteration formula for this function. b. Start the iteration process with the initial value x0 0.001 and other values close to zero. What are your
IV - Chapter 3: Numbers and Errors
49
observations?
E4. Why is 0.99999 … 0.9 equal to 1? a. Write your answer for 0.9999… as a series b. Calculate the sum of the series. E5. Convert the following numbers given in different basis to a decimal number. a. +123 AB/16 , b. +17.23/8 , c. +10 234/5 , d. convert the following binary numbers to a decimal number i) 1000111, ii) 11.0001, iii) 1.1111010101, and iv) 101010.01. E6. Conversion of numbers: a. Convert the following decimal numbers to binary numbers. i) 104, ii) 256, iii) 748.52 b. Convert the following decimal numbers to octal numbers i) 7, ii) 11, iii) 8888.1111 iv) 64.36, and v) 77777.0000092345 c. Convert the numbers from b) into base 5 and 7. E7. Find a binary number which approximates S to within 103 .
3.3 Errors on Computers At the beginning of this chapter we mentioned that numbers on a computer are represented by a finite number of digits. These digits are combined in a number format which is known as IEEE standard format. As we know, a number consists of several parts, the sign, the integer part, the fractional part and the radix point. The standard way to represent a non negative real number in decimal form is with an integer part, a fractional part, and a decimal point between them—for example, 67.5643, 0.07865, and 20078.67. Another standard form, often called normalized scientific notation, is obtained by shifting the decimal point and supplying appropriate powers of 10. Thus, the preceding numbers have alternative representations as 67.5643
0.675643 102
0.07865
0.7865 101
20 078.67
0.2007867 105
In normalized scientific notation, the number is represented by a fraction multiplied by 10n , and the leading digit in the fraction is not zero. This normalized scientific notation is also known as normalized floating-point representation. Such kind of number is represented in the decimal system by the expression x
0. d1 d2 d3 … 10n
(3.13)
where d1 0 and n is an integer, positive, negative or zero. The numbers d1 , d2 , … are the decimal digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The representation consists of three parts: a sign that is either or ,
Mathematics for Engineers
50
a number, and an integer power of 10. The number r the exponent.
0. d1 d2 … is called normalized mantissa and n
Now consider a number x written in binary format. In analogy with the formula for decimal numbers we can write x
V x 2m
(3.14)
where V 1 or V 1, m is an integer, and x is a binary fraction satisfying +1/2 x +10/2 . In decimal this means 1 x 2. Computer IBM 7094 Burroughs 5000 Series IBM 360 s 370 CDC 6000 and Cyber Series DEC 11 s 780 VAX Hewlett Packard 67
E
n M
2 8 16 2 2 10
27 13 6 48 24 10
27 26 26 210 27 99
m
PC 32 bit PC 64 bit Cray
2 23 2 52 2 48
28 211 214
Table 3.1. Number representation on computers. E represents the base, n length of the mantissa, and M m the maximal exponents of the number x +.d1 d2 …dn / E Ee with m e M . Many binary computers have a finite word length of 32 bits. Newer generations now work with 64 bits (see Table 3.1). We will discuss a machine of 32 bit word length. This word length is typically related to a single-precision floating-point number. Because of the 32-bit word length, as much as possible of the normalized floating-point number x 2m
(3.15)
must be contained in those 32 bits. One way of allocating the 32 bits is as follows: the sign of x takes 1 bit the exponent m take 8 bits and the mantissa x takes the remaining 23 bits. exponent fraction +23 bits/ +8 bits/ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 sg
ig gg n
Figure 3.7. Structure of a IEEE standard 32 bit floating number. The number consists of 1 sign bit, 8 bits for the exponent and 23 bits for the mantissa (fraction).
Information on the sign for the exponent m is contained in the eight bits allocated for the integer m .
IV - Chapter 3: Numbers and Errors
51
In such a scheme, we can represent real numbers with m as large as 27 1 represents numbers from -127 through 128.
127. So the exponent
In a normalized representation of a nonzero floating-point number, the first bit in the mantissa is always 1 so that this bit does not have to be stored. This can be accomplished by shifting the binary point. The binary point is not stored in the 32-bits but it is understood that the border between the first 9 bits and the last 23 bits represents the radix point. Since the bit format is restricted to the 32 bits and the allocation of bits is as described, the largest and smallest numbers represented in this format is such as 0.9 2128 3.06254 1038
and 0.9 2127 5.28972 1039
The exact numbers are ,2 223 0 2127 126
smallest on is 2
340 282 346 638 528 859 811 704 183 484 516 925 440 and the
1 85 070 591 730 234 615 865 843 651 857 942 052 864
. These numbers show that the actual range
on a computer is limited and that numbers larger or smaller than these numbers will cause some problems. We turn now to the error that can occur when we attempt to represent a given real number x on the computer. Based on our 32-bit model suppose that we let x 28 737 262 802 378 or 2298 472 368 . The exponents of these numbers exceed the limitations of the 32-bit model. So these numbers would overflow and underflow, respectively, and the relative error in replacing x by the closest machine number will be very large. Such numbers are outside the range of a 32-bit representation. When we develop numerical algorithms, we always base our reasoning on the real-number system. However, when we run the methods on a computer we work with finite precision numbers. The finiteprecision numbers that can be represented in floating-point format are only a finite subset of the real numbers. A consequence of this finite interval representation is that some of the rules that apply to the real number system do not hold for computer-representable numbers. For example the limiting processes are all violated by the finite number representation. Furthermore, many of the laws that hold for real numbers, such as that associative and distributive laws, do not carry over to finite-precision arithmetic. The discrepancy that arise from the theoretical development of methods and their practical implication as computer algorithms sometimes create difficulties.
Mathematics for Engineers
52
Origin
Figure 3.8. Real line on which all the numbers are located from up to .
When an arithmetic operation is performed with two floating-point numbers, the result is a real number which is not necessarily representable in floating-point as discussed above. To work with such result, some adjustment has to be made with the resulting number. Since there is always a floating-point number close to the exact value, we perform only a small error if we replace the result with a nearby floating-point number. This process is called rounding, and the small error committed by it is the rounding error. There are several possible rounding strategies: rounding to the nearest rounding toward zero rounding toward symmetric rounding (rounding to an even digit) 0.6666 … 0.67 chopping (nearest floating point number between x and 0) 0.6666 … 0.66 The most plausible way of rounding is to pick the floating-point number that is close to the true value. We call this rounding to the nearest and will use round(a) to denote the floating-point number associated with any real number a by rounding to the nearest. The error created by rounding to the nearest is easily expressed in terms of the specifics of the floating-point system. For example, the IEEE double format and in most other conventions a+1 K/
round+a/
(3.16)
with K
1 2
H
(3.17)
where H has the value of a machine epsilon or ulp which is 2 52 2.22045 1016 . This is the normal way of rounding, so that when we use the term without a qualifier we will mean rounding to the nearest. Another kind of rounding is rounding toward zero, denoted by round0 +a/. Here we pick the closest floating-point number on the side toward zero from the true result. In essence this means just chopping off the bits that do not fit into the limited field width for a, so this is called chopping or truncating. Truncation is easy to implement in hardware, but gives larger errors than rounding and is not often used. The IEEE standard also recommends two other kinds of rounding: rounding toward infinity or rounding up gives the smallest floating-point number greater than a, while rounding toward negative infinity or rounding down gives the largest floating-point number smaller than a. The different rounding strategies are illustrated in Figure 3.9.
IV - Chapter 3: Numbers and Errors
round round round0 ab a b
round
53
round round round round0 ab
Figure 3.9. Different rounding strategies for floating-point numbers.
Rounding is necessary for most floating-point operations, and the error created carries through any sequence of arithmetic operations. We use the symbols fl+x/ to denote the result we get when we evaluate an expression x with floating-point operations. If floating-point arithmetic is implemented carefully and properly, each individual operation yields maximum possible accuracy; that is the result of an arithmetic operation is the rounded true result, so that fl+a b/
round+a b/
+a b/ +1 K/
(3.18)
and fl+a b/
round+a b/
a b+1 K/
(3.19)
where K satisfies K H s 2. These rules assume that the result of the operation after rounding is computer-representable; that is, we do not create a number so large that it exceeds that range of representable numbers. If a result is too large we have an overflow as discussed previously. Some computers terminate computations when an overflow or underflow happens, but in the IEEE format the provision for positive infinity, negative infinity, and not-a-number make it possible for the software to deal with exceptional cases in a more controlled fashion. While it is easy to describe the effect of rounding in one operation, it is much more difficult to see what happens in a lengthy computation. Every step creates a small error, and these individual errors add up. The important question is to what extend the accumulated rounding error affects the accuracy of the final result. At first, it may seem that we need not worry about rounding at all. The number of significant digits used by computers is so much larger than what is needed and what the data accuracy warrants that even if error accumulates over millions of operations it will never become important enough to affect the answers. There is some validity to this point of view, but there are some computations where small errors can grow very rapidly and a result such an optimistic attitude is not justified. For example if we use simple arithmetic operations such as root taking of a number and iterate this several times, this reads symbolically
54
Mathematics for Engineers
…
2
where dots represent about 200 iterations. If we carry out this iteration numerically we make following observation Manipulate#i, N#Nest#Sqrt, 2, i'', i, 1, 200, 1'
i
52, 1.
Up to a certain iteration number, the result converges to a single number. The following table of numbers shows the sequence of numbers generated during this iteration. x
2;
tab1
Table%x
N% x , 6), i, 1, 200)
1.41421, 1.18921, 1.09051, 1.04427, 1.02190, 1.01089, 1.00543, 1.00271, 1.00135, 1.00068, 1.00034, 1.00017, 1.00008, 1.00004, 1.00002, 1.00001, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000
IV - Chapter 3: Numbers and Errors
55
Now if we take the final result which is actually a simple 1.0000 in the displayed form and invert the operation by squaring the number as often as we carried out the iteration for root taking then we get x
Last#tab1';
Table$x
N$x2 , 6(, i, 1, 200(
1.00000, 1.00000, 1.00000, 1.0000, 1.0000, 1.0000, 1.000, 1.000, 1.000, 1.00, 1.00, 1.00, 1.00, 1.0, 1.0, 1.0, 1., 1., 1., 0., 0., 0., 0., 0., 0., 0. 101 , 0. 102 , 0. 105 , 0. 1010 , 0. 1021 , 0. 1042 , 0. 1084 , 0. 10168 , 0. 10337 , 0. 10675 , 0. 101350 , 0. 102700 , 0. 105400 , 0. 1010 800 , 0. 1021 600 , 0. 1043 201 , 0. 1086 402 , 0. 10172 804 , 0. 10345 609 , 0. 10691 218 , 0. 101 382 436 , 0. 102 764 873 , 0. 105 529 747 , 0. 1011 059 494 , 0. 1022 118 988 , 0. 1044 237 976 , 0. 1088 475 953 , 0. 10176 951 907 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467 , 0. 10323 228 467
General::ovfl : Overflow occurred in computation. j General::ovfl : Overflow occurred in computation. j
Mathematics for Engineers
56
General::ovfl : Overflow occurred in computation. j General::stop : Further output of General::ovfl will be suppressed during this calculation. j
which actually is not the original number. This means that during our iteration there were some accumulated errors which changed the final result dramatically. Another situation coming from catastrophic cancellation is related to subtraction. Suppose we compute the difference of two numbers ab
z
(3.20)
where the value of a and b are very close to each other. If these quantities arise from some computation they could have small errors, so that instead of z we actually get z, with z z K,
(3.21)
where K is small. This makes z accurate in an absolute sense, but its relative error z z z
K
(3.22)
ab
can be quite large if a b. Another way of saying this is that when we subtract two close quantities, many of the correct digits cancel, leaving only those two seriously contaminated by error. When such quantities are used in further calculations, one small error can cause errors many orders of magnitude larger in subsequent calculations. Example 3.3. Errors in Subtractions Consider the quadratic equation a x2 b x c
(3.23)
0
with a 0. As we know, the two solutions to this equation can be found in algebraic form as x1
b
b2 4 ac
(3.24)
2a
and x2
b
b2 4 ac 2a
.
(3.25)
While these formulas are commonly used, they are not, always the best to use. Take the specific values a 1, b 10.00001, and c 0.0001, corresponding to the roots
IV - Chapter 3: Numbers and Errors
57
x .; SetAccuracy#a, 16'; SetAccuracy#b, 16'; SetAccuracy#c, 16'; 0, x(; solution Solve$a x2 b x c solution s. a ! 1.0, b ! 10.00001, c ! 0.00001
x1
x 10., x 9.99999 107
This way of solving the equation guarantees that the solution is generated as expected. However, if we numerically solve this equation by a procedure we do not know like FindRoot, we find x .; equation a x2 b x c 0 s. a ! 1.0, b ! 10.00001, c ! 0.0001; Print#equation'; solution FindRoot#equation, x, 0.001, AccuracyGoal ! 2'; x2 N#x s. solution, 30' 0.0001 10. x 1. x2 m 0 9.89801 106
The relative error of the two solutions is thus Abs#+x2 +x s. x1327// s x2' 0.89897
which is quite big. So we may have lost some significant digits in our calculations. The reason for this behavior is cancellation between the very close values of b and
b2 4 a c .Ô
These example show that rounding errors can grow rapidly in just a few steps. We say that such a computation is unstable. In numerical analysis, the term stability has various definitions. Intuitively, a computation is stable if small errors have small effects on the result and unstable if small errors can be significantly magnified. If a numerical calculation is unstable we can overcome the instability by changing the algorithm, but this option is not always available. As we also observed, a certain kind of recursive computations can be unstable, but it is not easy to see how to reformulate them in a more stable manner. Example 3.4. Recursion and Error Propagation Let the sequence f +n/ be defined by f +n_/ : 2.9 f +n 1/ 1.8 f +n 2/
and let n
3, 4, 5, … with f +1/
f +1/
1; f +2/
1, f +2/
0.9
0.9;
The iteration is carried out by first defining the initial values for the first two iteration steps and then iterating the sequence 20 times
Mathematics for Engineers
58
list1
f +1/, f +2/;
list1
Join#list1, Table# f +i/
f +i/, i, 3, 20''
1, 0.9, 0.81, 0.729, 0.6561, 0.59049, 0.531441, 0.478297, 0.430467, 0.38742, 0.348678, 0.313811, 0.28243, 0.254187, 0.228768, 0.205891, 0.185302, 0.166772, 0.150095, 0.135085
If we graphically represent the data we observe that the decay follows a certain law which can be identified as a power law 1.0
fn
0.8
0.6
0.4
0.2 0
5
10
15
20
n
This becomes obvious if we logarithmically plot the data in a logarithmic scale 1.00 0.70
fn
0.50
0.30
0.20 0.15 5
10
15
20
n
The solution actually satisfies the relation fn
0.9n1
(3.26)
This is graphically verified in the following plot where relation (3.26) is plotted in connection with the
IV - Chapter 3: Numbers and Errors
59
results derived from the iteration. 1.0
fn
0.8
0.6
0.4
0.2 0
5
10
15
20
n
However, if we iterate this recursion with a larger iteration number, say n numbers list1 list1
f +1/, f +2/; Join#list1, Table# f +i/
100, we get the following
f +i/, i, 3, 100''
1, 0.9, 0.81, 0.729, 0.6561, 0.59049, 0.531441, 0.478297, 0.430467, 0.38742, 0.348678, 0.313811, 0.28243, 0.254187, 0.228768, 0.205891, 0.185302, 0.166772, 0.150095, 0.135085, 0.121577, 0.109419, 0.0984771, 0.0886294, 0.0797664, 0.0717898, 0.0646108, 0.0581497, 0.0523347, 0.0471012, 0.042391, 0.0381516, 0.034336, 0.0309015, 0.0278096, 0.025025, 0.0225154, 0.0202495, 0.0181959, 0.016319, 0.0145724, 0.0128859, 0.0111388, 0.00910777, 0.00636275, 0.002058, 0.00548476, 0.0196102, 0.046997, 0.100993, 0.208285, 0.422239, 0.849581, 1.70375, 3.41164, 6.827, 13.6573, 27.3177, 54.6381, 109.279, 218.56, 437.121, 874.244, 1748.49, 3496.98, 6993.96, 13 987.9, 27 975.9, 55 951.7, 111 903., 223 807., 447 614., 895 228., 1.79046 106 , 3.58091 106 , 7.16182 106 , 1.43236 107 , 2.86473 107 , 5.72946 107 , 1.14589 108 , 2.29178 108 , 4.58357 108 , 9.16713 108 , 1.83343 109 , 3.66685 109 , 7.3337 109 , 1.46674 1010 , 2.93348 1010 , 5.86696 1010 , 1.17339 1011 , 2.34679 1011 , 4.69357 1011 , 9.38714 1011 , 1.87743 1012 , 3.75486 1012 , 7.50971 1012 , 1.50194 1013 , 3.00389 1013 , 6.00777 1013 , 1.20155 1014
This sequence of numbers shows that errors in this calculation accumulate rapidly and that finally the sequence becomes very large in magnitude
Mathematics for Engineers
60
0 2.0 10
13
fn
4.0 1013 6.0 1013 8.0 1013 1.0 1014 1.2 1014 0
20
40
60
80
100
n
At the end, the error is so big that we cannot trust in this calculation.Ô This example shows that rounding errors can become significant even in short iteration sequences. To guard against errors that may affect our conclusions, we would like to have a mathematically rigorous theory by which we can predict rounding effects. On the other hand, an analytic solution is always useful to check against numerical results. If we know a symbolic solution for a problem it is always wise to check the numerical results against the results derived from symbolic results. Unfortunately only in simple and simplified models a symbolic solution is derivable. However, these simplifications can be used to carry out the check for a certain domain of values. We will see in the next section that error propagation cannot be formulated in a simple and satisfying theory that we can easily use. As a result, we normally have to deal with rounding errors in a pragmatic way. Here are some rules of thumb that work in most circumstances. The first step is to look at an algorithm carefully for possible places where cancellations could cause instability. Whenever possible, we should look for a reformulation that will increase stability. If all parts of the computation look stable, the chances are good that the entire process is stable, but this is not an absolute guarantee. If there is any doubt, rearranging the details of the method may give an indication of the loss of significant digits. Example 3.5. Different Algorithms Evaluate the sum
V
Å +1/
i1
i 1
1 (3.27)
1 i2
where the following relation is an approximation by n terms to the exact sum with infinite terms n
S
Å +1/
i1
i 1
1 1 i2
.
(3.28)
Solution 3.5. The summation of the finite approximation S can be done by adding the terms in
IV - Chapter 3: Numbers and Errors
61
forward order n
Å +1/
1
i1
i 1
1
1i
2
2
1 5
1 10
…
(3.29)
or in reverse order S
+1/n1
1 1 n2
+1/n
1 1 +n 1/2
…
(3.30)
While these two ways are mathematically equivalent, the rounding is different so they may not give exactly the same results. The direct summation for 50000 terms with a 16 digit accuracy delivers 500 000
val1
i1 Å +1/ SetAccuracy% i 1
1 i 1 2
, 16)
0.3639854725
The summation in the downward direction delivers val2
Sum%+1/i1 SetAccuracy%
1 i 1 2
, 16), i, 500 000, 1, 1)
0.3639855151
The comparison shows that only 7 digits are reliable in this calculation '
val1 val2
4.3 108
The results for different values of summation terms is given in the following table. down
'
n
up
100 000
0.3639854725 0.3639854725 0. 1011
200 000
0.3639854725 0.3639854725 0. 1011
500 000
0.3639854725 0.3639855151 4.3 108
1 000 000 0.3639854725 0.3639855402 6.8 108
The results show that the reliable digits decrease due to the rounding errors which become larger when larger summation indices are taken into account. If the rounding of the numbers is transferred to higher digits by increasing the accuracy of the number, we will get identical results, this happens if we set accuracy to 32 digits
Mathematics for Engineers
62
n
up
down
'
100000.0
0.3639854724589339
0.3639854724589339
0.000000 1026
200000.0
0.3639854724964335
0.3639854724964335
0.000000 1026
500000.0
0.3639854725069334
0.3639854725069334
0.000000 1026
1000000.0
0.3639854725084334
0.3639854725084334
0.000000 1025
In the case with 32 valid digits the first 16 digits, of the number are the same and thus the summation can be trusted as valid. In fact, we do not know if the numerical result is trustworthy because we do not know the behavior for very large numbers; i.e. . However, for this sum we can actually find a symbolic value which is given by
sval
Å +1/ i 1
1 2
1
i1
1 i2
+1 S csch+S//
The numeric value approximated to 16 valid digits is then given by nsval
N#sval, 16'
0.3639854725089334
Comparing the symbolic value with the upward approximation shows that nsval val1 0. 1011
the first 10 digits are in agreement with each other. For the downward summation we find that the agreement is restricted to nsval val2 4.26 108
the first 7 digits.Ô Sometimes, it is not clear how to rearrange the computations suitably. In that case, we may want to rerun the problem, perturbing the data and other parameters by a small amount. If the new results show a significant change, we can conclude that there is a stability problem. This is useful even in stable calculations. When the input data are experimental and have an inherent measurement error, perturbing the initial data by an amount that reflects this data error, the computations will show how the data error affects the final result. These examples give us hints as to what to do in practice about rounding errors. First, we should carefully examine the algorithm for places where there could be significant loss of significance, and for repetitive computations in which there could be error growth in each step. If possible, the
IV - Chapter 3: Numbers and Errors
63
implementation should be changed to produce more stable results. If, in spite of our best efforts, excessive rounding errors still seem possible, recomputations with perturbed data is advisable. If small changes in the data cause large changes in the results, the computations are not trustworthy. At that point, a complete, in-depth re-examination of the problem is advisable.
3.3.1 Theory of Rounding Errors While the heuristic and ad hoc attacks on rounding, that we just discussed are very useful in practice, a more rigorous treatment would obviously help to complement that point of view. Many attempts have been made to develop a rigorous approach based on bounding the errors in each step and using inequalities to carry the bounds forward. On the whole, this has not been particularly successful. While it is not hard to write the basic rules of rounding, the analysis tends to become intractable for even moderately complicated problems. In a computation involving several operations, the rounding error incurred at one step is carried along to subsequent steps. To analyze the entire effect of rounding we have to see how all of the small errors propagate and accumulate. While we can easily find the bound on the rounding error in each step when these bounds are propagated things get messy quite quickly. Suppose we add three floating-point numbers a, b, c. Even in this simple case, we have to state the order of the operations, because we cannot assume that the associative law holds, and the conclusions may be affected by the order in which the steps are carried out. For simplicity, let us assume that all three numbers are positive and that we add a and b first. Then fl++a b/ c/
fl+fl+a b/ c/
fl++a b/ +1 K1 / c/
++a b/ +1 K1 / c/ +1 K2 /
(3.31)
where K1 and K2 are not larger than one-half ulp in magnitude. From this we can produce a bound on the error involved in the two consecutive floating-point additions by a b c fl++a b/ c/ +a b c/ K +a b/ ,K K2 0
(3.32)
where K max+K1 , K2 /. If we ignore the very small term K2 and assume that a, b, and c are all roughly the same size, we see that, approximately, the relative error in the sum of three numbers is one ulp— not an unexpected result. With a little work and a few judicious assumptions one can carry this sort of arguments another step forward. Example 3.6. Rounding Errors in Sums Estimate the effect of rounding on the sum n
Sn
Å xi
(3.33)
i 1
where 0 xi 1. Again we have to be specific about the order in which the additions are performed. We do this by defining the partial sums recursively as
Mathematics for Engineers
64
Si1 xi
Si with S0
(3.34)
0. In a floating-point computation, this becomes
fl+Si /
fl+fl+Si1 / xi /
+fl+Si1 / xi / +1 Ki /
fl+Si1 / xi +fl+Si1 / xi / Ki
(3.35)
where Ki H s 2. The errors Hi
Si fl+Si /
(3.36)
then satisfy the recurrence Hi
Hi1 +fl+Si1 / xi / Ki
with H1
(3.37)
0. If the accumulated effect of rounding is not too severe, we can expect that
fl+Si1 / xi n
(3.38)
so that Hi Hi1
1 2
H n.
(3.39)
From this it follows that Hn
1 2
H n2 ,
(3.40)
and we might expect that after a few thousand operations five or six significant digits could be lost.Ô The result of this example suggests, then, that rounding errors may accumulate quickly enough to grow significantly even in fairly short computations. But this is a worst-case situation, and in reality the problem is much less serious. First because Sn can be expected to be of order n, the relative error Sn fl+Sn / Sn
O+n/
(3.41)
grows only linearly and is more manageable. Furthermore, since rounding errors behave like random variables with mean zero, some individual rounding will increase the propagated error while others will decrease it. This tends to make the accumulated effect much smaller than the worst-case prediction. In reality, in this kind of simple computation the rounding error rarely grows quickly enough to become significant. Example 3.7. Sums of Random Variables The following program generates 1000000 random numbers between 0 and 1 and computes the sum of these numbers in forward and backward direction. We expect that the difference is a good measure of the accumulated rounding errors.
IV - Chapter 3: Numbers and Errors
65
000 tab SetAccuracy#RandomReal#', 32'1i 000 ; 1 res1 Fold#Plus, 0, tab'; res2 Fold#Plus, 0, Reverse#tab''; res1, res2
500 278.9518474483814227179046680, 500 278.9518474483814227179046680
Repeated runs with the program indicated that the average loss of accuracy in summing these numbers was only one significant digit if at all. This shows that summing numbers is actually not a critical operation on a computer.Ô The difficulty with rigorous error analysis is clear: not only is it cumbersome, but it also tends to give overly pessimistic results. Virtually from the first use of floating-point arithmetic, numerical analysts have sought alternatives that would give a handle on the rounding problem. In particular, mathematicians have looked at number representations that can contain in the notation an assessment of the accuracy. Among the various alternative that have been proposed, interval arithmetic has been the most productive. In interval arithmetic a real number a is represented by two floating-point numbers a1 and a2 , such that a1 a a2 .
(3.42)
We write this as a a #a1 , a2 ', indicating that the real number a is represented by the two floating-point numbers that enclose it. The width of the interval #a1 , a2 ' gives the uncertainty about the true value of a. The rules of interval arithmetic are chosen so that these uncertainties, including those caused by rounding, are carried correctly through a computation of many steps, and so that the true result is always guaranteed to be included in the interval. The rules are fairly straightforward. For example, if all arithmetic is precise, interval addition uses the rule #a1 , a2 ' #b1 , b2 '
#a1 b1 , a2 b2 '.
(3.43)
Since the sum of any number in the interval #a1 , a2 ' and any number in the interval #b1 , b2 ' will lie in the interval #a1 b1 , a2 b2 ', the inclusion rule for interval arithmetic is satisfied. The interval multiplication is a little less obvious, but it is still not hard to see that the correct rule is #a1 , a2 ' #b1 , b2 '
#min+a1 b1 , a1 b2 , a2 b1 , a2 b2 /, max+a1 b1 , a1 b2 , a2 b1 , a2 b2 /'.
(3.44)
Example 3.8. Interval Arithmetic Let a a #0.1, 0.2' and b a #1, 1.4' evaluate the sum and the product of the two values a and b by using interval arithmetic. Solution 3.8. The sum is given by a b a #0.1, 0.2' #1, 1.4'
#0.9, 1.6'
In Mathematica, this expression is represented by
(3.45)
Mathematics for Engineers
66
Interval#.1, .2' Interval#1, 1.4' Interval#0.9, 1.6'
which delivers the expected result. The multiplication of the intervals is defined as a b a #0.1, 0.2' #1, 1.4'
#0.14, 0.28'
which again is given in Mathematica by Interval#.1, .2' Interval#1, 1.4' Interval#0.14, 0.28'
This demonstrates that some of the features of interval arithmetic is available in the computer aided software (CAS) and can be used for more complicated calculations if needed.Ô These rules work only when all operations are carried out without error. In finite precision arithmetic where rounding is necessary, the rounding has to be done so that the inclusion requirement is not violated. For example, if a a #a1 , a2 ' and b a #b1 , b2 ', then a b a #round+a1 b1 /, round+a2 b2 /'
(3.46)
and the computed interval will enclose the true value a b. Similar rules can be established for subtraction, multiplication, and division, leading to a complete set of rules for interval arithmetic in an environment of computer arithmetic. Notice that direct rounding is essential to assure that the effects of rounding are properly accounted for. With interval arithmetic we have complete control of the rounding error. If it grows rapidly, the interval become very large, if the intervals stay small we can be sure that the rounding error is not significant. Although interval arithmetic was proposed nearly forty years ago, only recently has it become accepted as a serious tool for numerical computations. There are several reasons for this. It is certainly slower than ordinary floating-point arithmetic, because every interval computation requires several floatingpoint operations. But with rapidly increasing computer speed, this objection has lost much of its force. A second difficulty is that interval arithmetic requires directed rounding. For most of the earlier year this was not available on many machines, and as a result it was quite difficult to implement interval arithmetic. The IEEE recommendations have gone a long way toward remedying this situation. The most serious objection to interval arithmetic is that the intervals tend to grow much more rapidly than necessary, and that we often end up with intervals that are so large that we cannot get any useful information from them. This is still a valid point, but more sophisticated analysis has been able to overcome it in many cases. There are now many important applications where interval arithmetic is able to solve a problem that defines treatment by standard floating-point computation. As a result of this program, it is possible to find compilers for several programming languages, such as C and FORTRAN, that provide as intervals data type and the supporting functions.
IV - Chapter 3: Numbers and Errors
67
3.3.2 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 3.3.2.1 Test Problems T1. What is an ulp? T2. Why are errors generated on computers? T3. Are errors avoidable on computers? T4. What is a floating point number? T5. How is a standardized number defined? T6. What is a mantissa and an exponent of a number?
3.3.2.2 Exercises E1. The machine-H is the smallest number H such that 1 H 1.
(1)
Try to find the machine-H of your pocket calculator by using the following expression 2n ++1 2n / 1/.
(2)
Increase the natural number n and see what happens on your calculator. Obviously this expression should always be one. What can you say about the accuracy of your calculator? E2. The following numbers are given on a decimal computer with a four-digit normalized mantissa: a
0.4523 104 , b
0.2115 103 , c
0.2583 101
(3)
Perform the following operations, and indicate the error in the result, assuming symmetric rounding: a. a b c b. a s c c. a b d. a b c e. a b s c f. b cs a E3. Find the root of smallest magnitude of the equation x2 0.4002 100 x 0.8 104
0
(4)
use the formulas from Example 3.3 for solving the quadratic equation. Work in floating-point arithmetic using a fourdecimal-place mantissa. E4. Multiply the following numbers using the arithmetic in the number system in which they are represented and verify the results by converting the numbers to decimal system: a. +110 100.101/2 +1011.011/2 , b. +267.65/8 +34.23/8 , c. +BDA/16 +EEDD/16 , d. +165.38/10 +13.65/10 , e. +10 101.01/3 +101.01/3 , f. +1100/2 +100 111.11/2 , g. +1 230 001.23/2 i +133 221.112/2 i . E5. In the decimal number system there are some numbers with two infinite decimal expansions (e.g.,
68
Mathematics for Engineers
3.56999 ... 3.57000 ...). Does the negative decimal b 10) system have unique expansion for every number? E6. Express the following numbers in a floating-point representation with 24-bit fraction and 8-bit excess 126 exponent (i.e., the exponent offset 126): 3.14159265, 6.02486 1023 , 1.05443 107 . E7. Consider a machine with a word length of 60 bits, which represents floating-point numbers using 48 bits for fraction part, 11 bits for exponent and one bit for the sign. The exponent is represented in excess +717/8 form. What is the largest and smallest (nonzero) positive number that can be represented on this computer, if the numbers are all assumed to be normalized? If unnormalized numbers are permitted, what will be the range of numbers that can be represented? What is the smallest number x such that 0.314 x ! 0.314, on this machine? E8. Let all the numbers in the following calculations be correctly rounded to the number of digits shown a. 1.1034 0.956, b. 1 s 0.9785, c. 1.0 s0.0065, d. 2.654 1.76, e. 3.630.567 , f. tan+1.873/, g. Æ3.145 . For each of these calculations, determine the smallest interval in which the result using true instead of rounded values of the quantities must lie. E9. Consider the following expression 1048 315 1048 1032 679 1032
(5)
The correct value of this expression is 881, but most computers will not calculate it correctly. Assuming that there is no overflow, what is the value of this expression as calculated by computers using 48, 96, 100, 104 and 108-bit fraction part, respectively. What is the minimum size of fraction part needed to give the correct result. E10 Evaluate the polynomial p+x/
20 000 x4 74 641 x3 9282 x2 223 923 x 207 846
(6)
at x 1.732051 and compare the result with the exact value ,3.57230142698 109 0. Estimate the minimum number of bits required in the fraction part of a computer using binary floating-point arithmetic to give a reasonable value for the result. E11 Mathematics text books claim that a x a and a s x a, if and only if x 1, or a 0. Is this statement true with floating-point arithmetic? (Assume that there is no exponent overflow or underflow.) E12 Compute the sum +x x x … x x/ where x 1 s3 using three decimal digit floating-point arithmetic. What is the calculated value of the sum if the number of terms are 4, 30, 50, 300, 400, and 1000.
IV - Chapter 3: Numbers and Errors
69
3.4 Approximations In constructing numerical methods we often encounter the need for approximating functions of one or more variables. For example, approximations may be needed to compute the value of an elementary function, to represent the solution to a differential equation, or to solve a system of nonlinear equations. There are many ways of approximating functions, and the challenge is to find a way that is suitable to a particular application. This usually means that the approximation has to be worked with and that it gives good accuracy with a small amount of work, and that there exists a body of knowledge about it that helps us analyze the results. Polynomials score high on all these points and some sort of polynomial approximations are used in most numerical calculations.
3.4.1 Important Properties of Polynomials A function of the form pn +x/
a0 a1 x a2 x2 … an xn
(3.47) i
is a polynomial of degree n. The individual terms ai x are called monomials. For a given x, the value of pn +x/ can be computed from (3.47) by first forming xi , then multiplying by ai and adding all the monomials. This can obviously be done with only additions and multiplications, but is not often used in numerical work. A better way to evaluate (3.47) is Horner's scheme which is based on the nested form pn +x/
a0 x+a1 x+a2 x+a3 x+…/ …///
(3.48)
The polynomial in this form is conveniently evaluated recursively by p0
an
pi
x pi1 an1 i
(3.49) 1, 2, 3, …, n
(3.50)
This recursion can be organized in a table to keep track with the calculation an x p0
an p1
an1 x p0 an1 p2
an2 … a1 a0 x p1 an2 … pn1 pn pn +x/
Table 3.2. Horner's scheme represented in tabular form start with x on the left and fill in the value in each column. Transfer the result to the new column on the right of the current column and continue till the 1st column containing the constant term of the polynomial. The recursion (3.50) take fewer operations than first forming the monomials, then adding them. It also tends to be less subject to accumulation of rounding errors, so it is generally preferred over (3.47). In any case, either form shows that polynomials are easily evaluated with just a few arithmetic operations. This makes them ideal for computer work. Polynomials have a number of other desirable properties. The sum, difference, and product of two polynomials yield other polynomials. Polynomials are easily
Mathematics for Engineers
70
differentiated and integrated, with polynomials as the result. There are few functions that have these properties. To generate the Horner representation we have to rearrange the terms in the way given by (3.48) which is implemented in the Mathematica function Horner[]. The application of this function to a sixth order polynomial delivers 6
HornerForm%Å a#i' xi ) i 0
x +x +x +x +x +a+6/ x a+5// a+4// a+3// a+2// a+1// a+0/
To demonstrate the advantages of a Horner representation let us examine a polynomial of order 50. This can be done by so called Legendre polynomials. Define the derivative of a Legendre polynomial by dpoly+x_/
P50 +x/ x
1 140 737 488 355 328 ,630 570 903 409 776 208 342 578 107 850 x49 7 490 418 004 140 371 929 402 746 008 400 x47 41 737 844 651 936 814 720 022 517 706 600 x45 144 984 091 948 833 145 869 551 903 612 400 x43 351 937 190 940 312 716 989 799 378 930 100 x41 634 260 432 024 299 841 607 990 089 500 400 x39 880 125 430 955 067 757 736 930 068 014 600 x37 962 501 702 817 857 350 825 608 251 720 400 x35 842 188 989 965 625 181 972 407 220 255 350 x33 595 282 177 646 385 670 791 741 649 658 400 x31 341 736 064 945 147 329 528 592 428 507 600 x29 159 660 347 949 056 174 670 435 588 002 400 x27 60 650 197 110 518 092 326 107 025 312 600 x25 18 661 599 110 928 643 792 648 315 480 800 x23 4 619 750 073 449 067 396 810 199 429 200 x21 910 936 634 201 224 557 117 504 112 800 x19 141 096 163 449 646 194 988 309 060 950 x17 16 847 303 098 465 217 312 036 902 800 x15 1 511 937 457 554 570 784 413 568 200 x13 98 522 240 341 901 855 625 946 800 x11 4 441 576 408 856 231 196 251 700 x9 129 053 067 569 672 577 130 800 x7 2 161 175 772 697 866 124 200 x5 17 084 393 460 062 182 800 x3 40 293 380 802 033 450 x0
The following line plots the polynomial generated above. Plot uses machine-precision numbers to evaluate the polynomial by default. The plot is not very smooth due to the accumulation of representation errors associated with machine-precision computations.
IV - Chapter 3: Numbers and Errors
71
Plot#Evaluate#dpoly#x'', x, 1, 1.01`, PlotRange All, PlotPoints 200' 60 000 50 000 40 000 30 000 20 000 10 000 0 1.000
1.002
1.004
1.006
1.008
1.010
Next, the polynomial is put in Horner form before plotting. The time required to rearrange the polynomial is usually small compared to the time required to render the plot, and the time required to evaluate the polynomial is significantly reduced. The plot reveals that the Horner form of the polynomial is much more stable to evaluate using machine-precision numbers. Plot#Evaluate#HornerForm#dpoly#x''', x, 1, 1.01`, PlotRange All, PlotPoints 200' 60 000 50 000 40 000 30 000 20 000 10 000 0 1.000
1.002
1.004
1.006
1.008
1.010
Polynomials have been extensively studied and much is known about them. Since our main interest here is the use of polynomials to approximate non polynomial functions, we are most concerned about how this can be done effectively. This raises the question of what kind of functions can be approximated accurately by polynomials. The important Weierstrass Theorem guarantees that at least in principle, polynomials are suitable for a wide variety of approximations. Theorem 3.1. Weierstrass Theorem for Polynomial Approximation
Suppose that f is a continuous function on a finite interval #a, b'. Then, for any H ! 0, there exists a
Mathematics for Engineers
72
polynomial pn , such that maxaxb f +x/ pn +x/ H.
(3.51)
The Weierstrass theorem tells us that for any continuous function, there exists some polynomial that approximate that function arbitrarily closely. As stated, it does not tell us how to find the polynomial, but at least it assures us of its existence. To find an actual approximating polynomial, one sometimes can use another equally famous result, Taylor's Theorem. We will use Taylor's results in later discussions. For the moment we will use simple ideas for approximations only.
3.4.2 Polynomial Approximation by Interpolation A practical way of constructing polynomial approximations is the technique of interpolation. In its simplest form the interpolation problem can be stated as follows. Suppose that #a, b' is a finite interval and x0 , x1 , …, xn are distinct points in this interval. We also know the values of a function f at these points and want to find a polynomial pn +x/ of degree n that agrees with f at all points; that is pn +xi /
f +xi /,
i
0, 1, 2, …, n.
(3.52)
We will call this type of interpolation simple interpolation. There are many ways in which interpolating polynomials can be found. In the method of undetermined coefficients we consider the coefficients in (3.47) as unknowns that can be determined by imposing conditions on the polynomial. In simple interpolation, we have n 1 conditions specified by (3.52) to be satisfied, and n 1 coefficients in (3.47) to select. When we write this down explicitly, we get the linear system for the expansion coefficients as 1 x0 x20 … xn0 1 x1 x21 … xn1
1 xn x2n … xnn
a0 a1 an
f +x0 / f +x1 / . f +xn /
(3.53)
The coefficient matrix here is called a Vandermonde matrix. It is non singular because the system has a unique solution for any choice of the right hand side of the equation (3.53). The determinant of the Vandermonde matrix is therefore nonzero for distinct nodes x0 , x1 ,... , xn . However, the Vandermonde matrix is often ill conditioned, and the coefficients ai may therefore be inaccurately determined by solving system (3.53). Furthermore, the amount of work involved to obtain the polynomial in (3.47) is excessive. Therefore, this approach is not always recommended. Example 3.9. Interpolation To approximate Æx in #0, 1' by a second degree polynomial, we can use the interpolation points 1
0, 2 , 1 . The system to be solved then is
IV - Chapter 3: Numbers and Errors
73
1 0 0 eqs1
Thread% 1
1 2
1 4
.a0, a1, a2 1,
Æ , Æ!); TableForm#eqs1'
1 1 1 a0 1 a0
a1 2
a2 4
Æ
a0 a1 a2 Æ
The solutions follow by sol
Flatten#Solve#eqs1, a0, a1, a2''
a0 1, a1 3 4
Æ Æ, a2 2 ,1 2
Æ Æ0
and thus the approximating polynomial is given p2
a0 a1 x a2 x2 s. sol
2 ,1 2
Æ Æ0 x2 , 3 4
The function f +x/
Æ Æ0 x 1
Æx and its approximation are plotted in Figure 3.10
ex , p2
2.5
2.0
1.5
1.0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 3.10. Interpolation of the function f
Æx in the interval [0,1] by three points at x ± 0, 1 s 2, 1.
The interpolation approximation gives a good representation along the whole interval. There are only small deviations from the function to be approximated in between the interpolation points. As expected the interpolation polynomial is exact only for the interpolation points.Ô The method of undetermined coefficients is simple and intuitive, and gives results with minimum effort. But there are reasons why it is not always suitable. One of the reasons is that the system (3.53) tends to become ill-conditioned quickly as n increases and is not suitable for large n. Another reason is that it does not give a very explicit form of the polynomial, and so is hard to use for analysis.
Mathematics for Engineers
74
A second way of getting the interpolating polynomial, one that overcomes some of these limitations of the undetermined coefficients method, is the Lagrange form n
pn +x/
Å {i +x/ f +xi /
(3.54)
i 0
where {i +x/
+x x0 / +x x1 / …+x xi1 / +x xi1 / …+x xn / +xi x0 / +xi x1 / …+xi xi1 / +xi xi1 / …+xi xn /
.
(3.55)
The {i are the fundamental or Lagrange polynomials. It is an easy matter to check that (3.54) and (3.55) define a polynomial of degree n that satisfies all interpolating conditions. Example 3.10. Lagrange Interpolation Find a second order degree polynomial approximation to f +x/
Æx
in the interval #0, 1', using the Lagrange interpolation for the set of points 0, 1 s 2, 1. Solution 3.10. For this case (3.54) becomes p2 +x/
+x 0.5/ +x 1/ +0 0.5/ +0 1/
Æ0
+x 0/ +x 1/ +0.5 0/ +0.5 1/
Æ0.5
+x 0/ +x 0.5/ +1 0/ +1 0.5/
Æ1
After evaluating and collecting terms, we find p2 +x/
0.841679 x2 0.876603 x 1.
In spite of the low degree of the approximating polynomial, the result is quite good, showing a maximum error of about 0.014. Note that the answer we get here is identical to what we got in Example 3.9. This is as it should be; the different interpolation methods are theoretically equivalent, and if everything could be done precisely they would all give the same answer. In practice, though, this is not necessarily the case as rounding has varying effects.Ô A third way, that is very convenient for interpolation in tables, is the divided difference formula established by Newton. In this form, the interpolating polynomial is expressed by using the divided differences f #…' which are defined by f #xi , xi1 , …, xk '
f #xi1 , …, xk ' f #xi , …, xk1 ' xk xi
.
(3.56)
Using these quantities the interpolation polynomial can be written as pn +x/ f #x0 ' f #x0 , x1 ' +x x0 / f #x0 , x1 , x2 ' +x x0 / +x x1 / … f #x0 , x1 , x2 , …, xn ' +x x0 / +x x1 / …+x xn1 /. with f #xi '
(3.57)
f +xi /. The higher differences are shown in the table below. They serve to determine the
IV - Chapter 3: Numbers and Errors
75
coefficients by using the formula from above. x0
f #x0 '
x1
f #x1 '
x2
f #x2 '
x3
f #x3 '
f #x0 , x1 ' f #x0 , x1 , x2 ' f #x1 , x2 '
f #x0 , x1 , x2 , x3 ' f #x1 , x2 , x3 '
f #x2 , x3 '
f #x1 , x2 , x3 , x4 ' f #x2 , x3 , x4 '
f #x3 , x4 ' x4
f #x4 '
Table 3.3. A systematic scheme for calculating divided differences using equation (3.56) for a first number of steps.Ô The form of this interpolation formula is very convenient for computations. The divided differences are easy to evaluate and if we need to increase the degree of the polynomial, we only need to add a new term, saving much work. Example 3.11. Divided Differences Given the table of values x 0 1 2 3 f +x/ 6 3 6 9 let us derive the interpolating polynomial for f of degree at most 3 by using Newton’s divided difference form. We first construct the following divided difference table, like the model given in Table 3.3, but with one fewer entry in each column. xi f #xi ' f $xi , x j ( 0
6
1
3
f $xi , xk , x j ( f $xi , kk , xl , x j (
9 3 3 2
6
2 9
15 3
9
The values of the x j and f ,x j 0 are given in the first two columns, as in Table 3.3. To evaluate the divided difference formula, we require only the first numbers in columns 2, 3, 4, and 5, together with the first three numbers in the first column, delivering the interpolating polynomial p#x'
6 9 x 3 x +x 1/ 2 x +x 1/ +x 2/
6 9 x 3 +1 x/ x 2 +2 x/ + 1 x/ x
Mathematics for Engineers
76
Various other forms of constructing interpolating polynomials are known. Here we use, for the most part, either the method of undetermined coefficients or the Lagrange form. One question is, what happens when we increase the number of interpolation points and the degree of the interpolating polynomial? Does the polynomial converge to the function? A related question concerns the error of the approximation; in particular, how large the degree of the approximating polynomial needs to be to achieve a desired accuracy. The next theorem gives a practical answer to both of these questions. Theorem 3.2. Interpolating Polynomial
Assume that f is n 1 times continuously differentiable. Let x0 , x1 , …, xn be distinct points in #a, b' and let pn denote the interpolating polynomial on these points. Then f +x/ pn +x/
+x x0 / +x x1 / …+x xn / +n 1/
f +n1/ +x/
(3.58)
where a x b. From (3.58) we can draw some useful conclusions. If f is +n 1/ times continuously differentiable, then it follows easily that maxaxb + f +x/ pn +x//
ba
n1
+n 1/
maxaxb f +n1/ +x/ .
(3.59)
If we can find a bound for the +n 1/st derivative of f , then (3.59) gives us a way of guaranteeing the accuracy of the approximation. Such inequalities are called error bounds, because they allow us to put a mathematically rigorous limit on the error. Keep in mind that (3.59) is only a bound that guarantees that the error is smaller than the right side; in an actual computation the true error may be in fact much smaller. If that is the case, we say that the error bound is pessimistic or unrealistic. If a bound is close to the actual error we say that it is realistic. For many numerical methods finding realistic error bounds is quite complicated. Note that the bound (3.59) can be used only if f is +n 1/ times differentiable. If the function is less smooth; i.e. n the sense that it has fewer derivatives, then (3.59) is not applicable. While it is possible to derive other bounds for this case, we will not do so here. The bound (3.59) does not immediately guarantee convergence of the approximation. If the higher derivatives of f remain bounded then (3.59) indicates fast convergence, but this is the exception rather than the rule. For many functions, the magnitude of the higher derivatives increases, sometimes so much that the right side of (3.59) goes to infinity. Since the expression is only a bound, this does not necessarily mean that the approximation does not converge, but should be taken as a cautionary sign. The fact is that for many relatively simple functions approximation by interpolation does not work very well. The situation is fairly complicated so we will not go into it.
IV - Chapter 3: Numbers and Errors
77
The interpolating approximation and its error depend on the choice of the interpolation points. The error is zero at the interpolation points and generally larger at points distant from any of these points. If there is no clear reason for the choice of the interpolation points, it may be better to find an approximation that keeps the overall error small.
3.4.3 Polynomial Approximation by Least Squares Instead of asking that an approximation agree with the given function f at certain selected points, we can require instead that it agrees with the function closely on the whole interval, say by minimizing the square of 2-norm of the difference between the function and its approximation. The 2-norm of an integrable function f over an interval #a, b' is b
«« f ««22
Ã
f +x/ 2 Å x.
(3.60)
a
Using this kind of norm we can formulate the least square approximation by b
«« f pn ««22
2 Ã + f +x/ pn +x// Å x
(3.61)
a
with respect to all polynomials of degree n. This gives us the least squares approximation of the function by a polynomial. We can find a solution of the least square problem by writing the approximation as n
pn +x/
Å ai xi
(3.62)
i 0
inserting this expression into (3.61) delivers b
«« f pn ««22
Ã
a
2
n
f +x/ Å ai xi
Å x.
(3.63)
i 0
The minimum value of the error follows by differentiating with respect to the unknown coefficients a j d d aj
«« f pn ««22
0
d d aj
b
Ã
a
2
n
f +x/ Å ai xi
Åx
(3.64)
Åx
(3.65)
i 0
Equation (3.64) will vanish if (3.63) is extremal. 0
d d aj
b
Ã
a
n
n
f +x/2 2 Å ai xi f +x/ Å ai xi i 0
i 0
2
Mathematics for Engineers
78
b
d
0
d aj
n
d
2 Ã f +x/ Å x 2
d aj
a
a
i 0
n
b
b
Šai à xi f +x/ Šx
a
d aj
b
Ã
a
n
Å ai xi
2
Åx
(3.66)
i 0
b
2 à x j f +x/ Šx 2 Šai à xi x j Šx
0
d
(3.67)
a
i 0 n1
b j Å ai c j,i with j
0
0, 1, 2, …, n
(3.68)
i 1 n1
Å ai c j,i
b j with j
0, 1, 2, …, n
(3.69)
i 1
That is we set the derivative of (3.61) with respect to each coefficient ai to zero. This leads to the system of determining the coefficients, c11 c21
c12 c22
c13 c23
cn1,1 cn1,2 cn1,3
… …
c1,n1 c2,n1
… cn1,n1
a0 a1 an
b0 b1 . bn
(3.70)
where b
ci, j
i j2 Åx à x a
1 i j1
,bi j1 ai j1 0
(3.71)
and b
bi
i à x f +x/ Šx.
(3.72)
a
While this looks like an easy and straightforward solution to the problem, there are some issues of concern. One point is that the integral for bi may not have a closed form answer, so that its value would have to be determined numerically. Even though this is only a minor complication, it can be avoided by using a discrete least square approach. For this we use m sample points t1 , t2 , …, tm and try to make the discrepancy, Ui
pn +ti / f +ti /,
(3.73)
as small as possible for all i 1, 2, …, m. If m ! n 1 we cannot of course make all residuals vanish, but we can minimize the sum of the square of the Ui . When we do this, we see that we need to find the least square solution of the system
IV - Chapter 3: Numbers and Errors
c11 c21
c12 c22
c13 … c1,n1 c23 … c2,n1
cm,1 cm,2 cm,3
… cm,n1
79
a0 a1 an
b0 b1 . bm
(3.74)
where cij
ti
j1
(3.75)
bi
f +ti /.
(3.76)
and
In this version, there is no need to evaluate integrals. When m is much larger than n, the results of the discrete and the continuous least squares methods are not much different. There is, however, a much more serious problem in the conditioning of the matrices that arise from this approach. Example 3.12. Conditioning of Least Square Approximation If we take a
0, b
1, then in the equation of least square approximation is determined by
1
cij
(3.77)
i j1
and the matrix is a Hilbert matrix, which is very ill-conditioned. We should therefore expect that any attempt to solve the full least square problem (3.70) or the discrete version via (3.74) is likely to yield disappointing results.Ô The situation in this example is typical of similar cases. Least squares polynomial approximations, using form (3.47), are invariably very ill-conditioned and the method should be avoided except for very small n (say n 4). If higher degree least square polynomials are needed, we have to take a different path.
3.4.4 Least Squares Approximation and Orthogonal Polynomials Some of the difficulties in polynomial approximation come from the unsuitability of the monomial form (3.47). More effective ways can be found by relying on the concept of orthogonal polynomials. Definition 3.1. Orthogonal Polynomials
A sequence of polynomials p0 , p1 , …, pn , … is said to be orthogonal over the interval +a, b/ with respect to the weight function w+x/ if b
à w+x/ pi +x/ p j +x/ Šx a
0,
(3.78)
Mathematics for Engineers
80
for all i j. The sequence is said to be orthogonal if b
à w+x/ pi +x/ pi +x/ Šx
1,
(3.79)
a
for all i. Orthogonal polynomials are a major topic in approximation theory and many important results have been developed. Here we list only a few of the many types of orthogonal polynomials that have been studied, with recipes for computing the individual terms. As we will see, orthogonal polynomials are quite important in numerical analysis. Constructing algorithms with orthogonal polynomials often requires a knowledge of their special properties. 3.4.4.1 Legendre Polynomials
The Legendre polynomials Pn +x/ are orthogonal with respect to w+x/ 1 over +1, 1/. Legendre polynomials are useful if we have to deal with data or functions which are defined on a finite interval. The first few Legendre polynomials are tabulated in the following table n Pn +x/ 0 1 1 x 2 3 4 5
1 2 1 2 1 8 1 8
,3 x2 10 ,5 x3 3 x0 ,35 x4 30 x2 30 ,63 x5 70 x3 15 x0
Successive Legendre polynomials can be generated by the three-term recurrence formula Pn1 +x/
2n1 n1
x Pn +x/
n n1
Pn1 +x/.
(3.80)
These polynomials are orthogonal, but not orthonormal, so formula (3.79) will not be normalized to 1. However, they can be normalized by multiplying with a suitable constant. The following Figure 3.11 shows the first 5 Legendere polynomials.
IV - Chapter 3: Numbers and Errors
81
1.0
Pn +x/
0.5
0.0
0.5
1.0 1.0
0.5
0.0
0.5
1.0
x Figure 3.11. The first 5 Legendre polynomials are shown on the interval x ± #1, 1'.
We can verify the orthogonality of the polynomials by determining the matrix 1
à Pi +x/ P j +x/ Šx
ai, j
(3.81)
1
of the integrals with indices +i, j/ specifying the order of the respective Legendre polynomials. The matrix up to order 5 5 is shown in the next line 1
ParallelTable%Ã Pi +x/ P j +x/ Å x, i, 1, 5, j, 1, 5); MatrixForm#leg1'
leg1
1
2 3
0 0 0
0
0
2 5
0 0
0
2 7
0
0
0 0 0
2 9
0
0 0 0 0
2 11
0 0
It is obvious that the off diagonal elements vanish and that the diagonal elements i value different from 1.
j show a finite
3.4.4.2 Chebyshev Polynomials of the First Kind
The Chebyshev polynomials of the first kind Tn +x/ are orthogonal with respect to w+x/
1v
1 x2
along +1, 1/. Chebyshev polynomials are also useful to find approximations in a finite interval. The first few Chebyshev polynomials of the first kind are listed in the following table
Mathematics for Engineers
82
n Tn +x/ 0 1 1 x 2 2 x2 1 3 4 x3 3 x 4 8 x4 8 x2 1 5 16 x5 20 x3 5 x
Successive Chebyshev polynomials of the first kind can be generated by the three-term recurrence formula Tn1 +x/
2 x Tn +x/ Tn1 +x/.
(3.82)
Chebyshev polynomials of the first kind have many properties that make them attractive for numerical work. One of these is the identity Tn +x/
cos+n arcos+x//.
(3.83)
In spite of its appearance, the right side of (3.83) is a polynomial. From (3.83) it follows the important fact that the roots of Tn are explicitly given as zi
cos
+2 i 1/ S 2n
, i
0, 1, 2, …, n 1.
(3.84)
The following Figure 3.12 shows the first 5 Chebyshev polynomials of the first kind. 1.0
Tn +x/
0.5
0.0
0.5
1.0 1.0
0.5
0.0
0.5
1.0
x Figure 3.12. The first 5 Chebyshev polynomials of first kind are shown on the interval x ± #1, 1'.
The orthogonality of the polynomials can be verified by determining the matrix
IV - Chapter 3: Numbers and Errors
1
Ã
ai, j
Ti +x/ T j +x/
1
83
Åx
(3.85)
1 x2
of the integrals with indices +i, j/ specifying the order of the respective Chebyshev polynomials. The matrix of order 5 5 is shown in the next line 1
ParallelTable%Ã
leg1
Ti +x/ T j +x/
1
S 2
0 0 0 0
0
S 2
0 0 0
0 0
S 2
0 0
0 0 0
0
S 2
Å x, i, 1, 5, j, 1, 5); MatrixForm#leg1'
1 x2
S 2
0 0 0 0
Obviously the off diagonal elements vanish and that the diagonal elements i j show a finite value S s 2 different from 1. Contrary to the Legendre polynomials the diagonal elements are all the same for different polynomial orders. 3.4.4.3 Chebyshev Polynomials of the Second Kind
The Chebyshev polynomials of the second kind Un +x/ are orthogonal with respect to w+x/ 1 x2 along +1, 1/. The Chebyshev polynomials of the first kind are related to those of the second kind by Tn' +x/
n Un1 +x/.
(3.86)
An explicit form for Chebyshev polynomials of the second kind is Un +x/
sin++n 1/ arcos+x//
.
1x
2
The first few Chebyshev polynomials of second kind are n Un 0 1 1 2x 2 4 x2 1 3 8 x3 4 x 4 16 x4 12 x2 1 5 32 x5 32 x3 6 x
The following Figure 3.13 shows the first 5 Chebyshev polynomials of the first kind.
(3.87)
Mathematics for Engineers
84
6 4
Un +x/
2 0 2 4 6 1.0
0.5
0.0
0.5
1.0
x Figure 3.13. The first 5 Chebyshev polynomials of the second kind Un are shown on the interval #1, 1'.
The orthogonality of the polynomials can be verified by determining the matrix 1
ai, j
à Ui +x/ U j +x/ 1
1 x2 Å x
(3.88)
of the integrals with indices +i, j/ specifying the order of the respective Chebyshev polynomials. The matrix of order 5 5 is shown in the next line 1
ParallelTable%Ã
leg1
1
S 2
0 0 0 0
0
S 2
0 0 0
0 0
S 2
0 0
0 0 0
0
S 2
0 0 0 0
1 x2 Ui +x/ U j +x/ Å x, i, 1, 5, j, 1, 5); MatrixForm#leg1'
S 2
Again the off diagonal elements vanish and the diagonal elements i j show a finite value S s 2 different from 1. Contrary to the Legendre polynomials the diagonal elements have all the same finite value for all orders of the polynomial. 3.4.4.4 Laguerre Polynomials
These polynomials are orthogonal with respect to w+x/ Æx along the interval +0, /. Laguerre polynomials are polynomials which are useful in approximating functions in semi infinite intervals. The Laguerre polynomials can be generated from the three-term recurrence formula
IV - Chapter 3: Numbers and Errors
1
Ln1 +x/
n1
85
++2 n 1 x/ Ln +x/ n Ln1 +x//
starting with L0 +x/ table
1 and L1 +x/
(3.89)
1 x. The first few Laguerre polynomials are listed in the following
n Ln 0 1 1 1x 1 ,x2 4 x 20 2 1 , x3 9 x2 18 x 60 6 1 ,x4 16 x3 72 x2 96 x 240 24 1 , x5 25 x4 200 x3 600 x2 600 x 1200 120 1 ,x6 36 x5 450 x4 2400 x3 5400 x2 4320 720
2 3 4 5 6
x 7200
Orthogonality of Laguerre's polynomials are checked by verifying the integrals
ai, j
Ã
Li +x/ L j +x/ Æx Å x
(3.90)
0
for different polynomial orders i and j. The matrix elements with indices +i, j/ specifying the order of the respective Laguerre polynomials are calculated for a matrix of order 5 5 is shown in the next line
ParallelTable%Ã
leg1
Æx Li +x/ L j +x/ Å x, i, 1, 5, j, 1, 5); MatrixForm#leg1'
0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
The off diagonal elements vanish and the diagonal elements i Laguerre polynomials are orthogonal and normalized to 1.
j are equal to 1. This means that
A series of Laguerre polynomials with different polynomial order is shown in the following Figure 3.14.
Mathematics for Engineers
86
15 10 5 Ln +x/
0 5 10 15 0
2
4
6
8
10
x Figure 3.14. The first 5 Laguerre polynomials are shown on the interval x ± #0, 10'.
3.4.4.5 Hermite Polynomials
The weight w+x/
Æx along +, / leads to the Hermite polynomials. With 2
n Hn 0 1 1 2x 2 4 x2 2 3 8 x3 12 x 4 16 x4 48 x2 12 5 32 x5 160 x3 120 x
successive Hermite polynomials can be generated by Hn1 +x/
2 x Hn +x/ 2 n Hn1 +x/.
(3.91)
Orthogonality of Hermite's polynomials are checked by verifying the integrals
ai, j
Ã
Hi +x/ H j +x/ Æx Å x 2
(3.92)
for different polynomial orders i and j. The matrix elements with indices +i, j/ specifying the order of the respective Hermite polynomials are calculated for a matrix of order 5 5 is shown in the next line
IV - Chapter 3: Numbers and Errors
ParallelTable%Ã
leg1
S
2
0
0
S
8
87
Æx Hi +x/ H j +x/ Å x, i, 1, 5, j, 1, 5); MatrixForm#leg1' 2
0
0
0
0
0
0
S
0
0
0
0
48 0
0
0
0
0 384 0
0 S
0 3840
S
The off-diagonal elements vanish and the diagonal elements i j are equal to a finite value which increases with the order of the polynomial. This means that Hermite polynomials are orthogonal but not normalized. A series of Hermite polynomials with different polynomial order is shown in the following Figure 3.15.
100
Hn +x/
50
0
50
100
3
2
1
0
1
2
3
x Figure 3.15. The first 5 Hermite polynomials are shown on the interval x ± #3, 3'.
Orthogonal polynomials have many uses, but here we are interested only in orthogonal polynomials for the solution of the least square problem. For simplicity, assume that a 1 and b 1. Instead of the standard monomial form, we write the approximation as an expansion in Legendre polynomials n
pn +x/
Å ai Pi +x/.
(3.93)
i 0
We substitute this into the least square approximation (3.61) and repeat the process leading to (3.70) to find 1
cij
à Pi +x/ P j +x/ Šx.
(3.94)
1
Because of the orthogonality of the Legendre polynomials, the matrix in (3.70) is diagonal, making it
Mathematics for Engineers
88
well-conditioned and leading to the immediate solution 1
¼1 f +x/ Pi +x/ Å x
ai
1
¼1 P2i +x/ Å x
.
Example 3.13. Polynomial Approximation 2
Approximate f +x/
Æ x on #1, 1' by a polynomial of degree two, using a least square method.
Solution 3.13. Writing the solution as p2 +x/
a0 P0 +x/ a1 P1 +x/ a2 P2 +x/,
we can determine the coefficients by (3.95) as 1
2
x ¼1 Æ P0 +x/ Å x
a0
1
¼1 P0 +x/ P0 +x/ Å x Æ F+1/
For the first polynomial order we get 1
a1
2
¼1 Æ x P1 +x/ Å x 1
¼1 P1 +x/ P1 +x/ Å x 0
The second order Legendre polynomial delivers 1
a2
2
¼1 Æ x P2 +x/ Å x 1
¼1 P2 +x/ P2 +x/ Å x 5 4
Æ +3 5 F+1//
The result obtained this way is p2 5 8
a0 P0 +x/ a1 P1 +x/ a2 P2 +x/
Æ +3 5 F+1// ,3 x2 10 Æ F+1/
If we convert this symbolic formula to numeric values we find Simplify#N#p2'' 1.57798 x2 0.93666
The approximation and the original function are shown in the following Figure 3.16
(3.95)
IV - Chapter 3: Numbers and Errors
89
2.5
2
e x , p2
2.0
1.5
1.0 1.0
0.5
0.0
0.5
1.0
x Figure 3.16. Interpolation of the function f
2
Æx in the interval [-1,1] by using a Legendre polynomial approximation.Ô
It is worthwhile to note that the approximation by orthogonal polynomials gives in theory the same result as (3.70). But because the coefficients of the Legendre polynomials are integers, there is no rounding in this step and the whole process is much more stable. Least square approximation by polynomials is a good example of the stabilizing effect of a rearrangement of the computation. We can generalize the least square process by looking at the weighted error of the approximation «« f pn ««2w
b
2 Ã w+x/ + f +x/ pn +x// Å x,
(3.96)
a
where the weight function w+x/ is positive throughout +a, b/. This weighs pronounce the error more heavily where w+x/ is large, something that is often practically justified. If we know the polynomials Ii +x/ that are orthogonal with respect to this weight function, we write n
pn +x/
Å ai Ii +x/,
(3.97)
i 0
and, repeating the steps leading to (3.70), we find that b
cij
à w+x/ Ii +x/ I j +x/ Šx.
(3.98)
a
The orthogonality condition then makes the matrix in (3.70) diagonal and we get the result that b
ai
¼a w+x/ Ii +x/ f +x/ Å x b
¼a w+x/ I2i +x/ Å x
.
(3.99)
90
Mathematics for Engineers
Example 3.14. Weighted Approximation Find a second degree polynomial approximation to f +x/ square approximation with w+x/
1v
2
Æ x on +1, 1/, using a weighted least-
1 x2 .
Solution 3.14. Here we use the first kind of Chebyshev polynomials as an example. Using symmetry and (3.99) we find that the approximation is p2 +x/
a0 a1 T1 +x/ a2 T2 +x/,
(3.100)
where the coefficients a0 , a1 , and a2 follow by the integral ratios 2
1 Æ x T0 +x/
¼1 a0
1 T0 +x/ T0 +x/
¼1 Æ I0
Åx
1x2
Åx
1x2
1 2
where I0 +x/ is Bessel's function of order zero. The expansion coefficient a1 is 2
1 Æ x T1 +x/
¼1 a1
1 T1 +x/ T1 +x/
¼1
Åx
1x2
Åx
1x2
0
and the third coefficient is 2
1 Æ x T2 +x/
¼1 a2
1 T2 +x/ T2 +x/
¼1 2
Æ I1
Åx
1x2
Åx
1x2
1 2
here again I1 +x/ is Bessel's function. The total approximation is generated by these results by p2 2
a0 a1 T1 +x/ a2 T2 +x/ Æ ,2 x2 10 I1
1 2
Æ I0
1 2
The numerical representation is gained by Simplify#N#p2'' 1.70078 x2 0.902996
IV - Chapter 3: Numbers and Errors
91
The approximation of the function is graphically shown in the next plot
2.5
2
e x , p2
2.0
1.5
1.0 1.0
0.5
0.0
0.5
1.0
x Figure 3.17. Interpolation of the function f approximation.
2
Æx in the interval [-1,1] by using a weighted Chebyshev polynomial
Compared with the solution by Legendre polynomials the errors at the end points are much smaller but at the center the deviation is larger.Ô An advantage of least square methods over interpolation is that they converge under fairly general conditions. The proof of this is straightforward. Theorem 3.3. Convergence in the Mean
Let pn +x/ denote the nth degree least squares polynomial approximation to a function f , defined on a finite interval +a, b/. Then limn «« f pn ««2
0.
(3.101)
We call this behavior convergence in the mean. Proof 3.3. Given any H ! 0, the Weierstrass theorem tells us that there exists some polynomial pn such that max, pn +x/ f +x/ 0 H. This implies that b
2
2 Ã , f +x/ pn 0 Å x b a H . a
But then, because pn minimizes the square of the deviation, we must have
Mathematics for Engineers
92
b
2 2 Ã + f +x/ pn / Å x b a H a
and convergence follows.
QED Under suitable assumptions, almost identical arguments lead to the convergence of the weighted leastsquare method. But note that the arguments only prove convergence in the mean and do not imply that the error is small at every point of the interval.
3.4.5 Local Quadratic Approximation Recall that the local linear approximation of a function f at x0 is f +x/ f +x0 / f ' +x0 / +x x0 /.
(3.102)
In this formula, the approximating function p+x/
f +x0 / f ' +x0 / +x x0 /
(3.103)
is a first-degree polynomial satisfying p+x0 / f +x0 / and p' +x0 / f ' +x0 /. Thus, the local linear approximation of f at x0 has the property that its value and the values of its first derivatives match those of f at x0 . If the graph of a function f has a pronounced bend at x0 , then we can expect that the accuracy of the local linear approximation of f at x0 will decrease rapidly as we progress away from x0 (Figure 3.18).
1.5
y
1.0 0.5 0.0
f
local linear approximation
0.5 1.0 0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
x Figure 3.18. Graph of the function f +x/
x3 x and its linear approximation.
One way to deal with this problem is to approximate the function f at x0 by a polynomial p of degree 2 with the property that the value of p and the values of its first two derivatives match those of f at x0 . This ensures that the graphs of f and p not only have the same tangent line at x0 , but they also bend in
IV - Chapter 3: Numbers and Errors
93
the same direction at x0 . As a result, we can expect that the graph of p will remain close to the graph of f over a larger interval around x0 than the graph of the local linear approximation. The polynomial p is called the local quadratic approximation of f at x x0 . To illustrate this idea, let us try to find a formula for the local quadratic approximation of a function f at x 0. This approximation has the form f +x/ c0 c1 x c2 x2
(3.104)
which reads in Mathematica as f +x/ c2 x2 c1 x c0
eq11
f +x/ c2 x2 c1 x c0
where c0 , c1 , and c2 must be chosen so that the values of p+x/
c0 c1 x c2 x2
(3.105)
and its first two derivatives match those of f at 0. Thus, we want p+0/
f +0/, p' +0/
f ' +0/, p'' +0/
f '' +0/.
(3.106)
In Mathematica notation this reads eq1
p0 f +0/, pp0 f
+0/, ppp0 f
+0/
p0 f +0/, pp0 f
+0/, ppp0 f
+0/
where p0, pp0, and ppp0 is used to represent the polynomial, its first and second order derivative at x 0. But the values of p+0/, p' +0/, and p'' +0/ are as follows: p+x_/ : c0 c1 x c2 x2 ; p+x/ c0 c1 x c2 x2
The determining equation for the term p0 is p+0/ p0
eqh1 c0 p0
The first order derivative allows us to find a relation for the second coefficient pd
p+x/ x
c1 2 c2 x eqh2
+pd s. x 0/ pp0
c1 pp0
Mathematics for Engineers
94
The second order derivative in addition determines the higher order coefficient 2 p+x/
pdd
xx
2 c2 +pdd s. x 0/ ppp0
eqh3
2 c2 ppp0
Knowing the relations among the coefficients allows us to eliminate the initial conditions for the p coefficients which results to sol
Flatten#Solve#Eliminate#Flatten#eq1, eqh1, eqh2, eqh3', p0, pp0, ppp0', c0, c1, c2''
c0 f +0/, c1 f
+0/, c2
f
+0/ 2
!
and substituting these in the representation of the approximation of the function yields the following formula for the local quadratic approximation of f at x 0. eq11 s. sol f +x/
1 2
x2 f
+0/ x f
+0/ f +0/
Remark 3.1. Observe that with x 0. Formula (3.102) becomes f +x/ f +0/ f ' +0/ x and hence the linear part of the local quadratic approximation if f at 0 is the local linear approximation of f at 0. Example 3.15. Local Approximation Find the local linear and quadratic approximation of Æ x at x tions together. Solution 3.15. If we f +0/ f ' +0/ f '' +0/ Æ0 1
let
f +x/
Æx ,
Thus, the local quadratic approximation of ex at x Æx 1 x
1 2
then
0, and graph Æ x and the two approximaf ' +x/
f '' +x/
Æx ;
and
hence
0 is
x2
and the actual linear approximation (which is the linear part of the quadratic approximation) is Æ x 1 x. The graph of Æx and the two approximations are shown in the following Figure. As expected, the local quadratic approximation is more accurate than the local linear approximation near x 0.
IV - Chapter 3: Numbers and Errors
95
6
f
4
2
0 2
1
0
1
2
x Figure 3.19. Linear and quadratic approximation of the function f +x/
Æx .The quadratic approximation is dashed.Ô
3.4.6 Maclaurin Polynomial It is natural to ask whether one can improve on the accuracy of a local quadratic approximation by using a polynomial of order 3. Specifically, one might look for a polynomial of degree 3 with the property that its value and values of its first three derivatives match those of f at a point; and if this provides an improvement in accuracy, why not go on polynomials of even higher degree? Thus, we are led to consider the following general problem. Given a function f that can be differentiated n times at x x0 , find a polynomial p of degree n with the property that the value of p and the values of its first n derivatives match those of f at x0 . We will begin by solving this problem in the case where x0 p+x/
c0 c1 x c2 x2 c3 x3 … cn xn
0. Thus, we want a polynomial (3.107)
such that p' +0/, …, f +n/ +0/
f +0/
p+0/, f ' +0/
p+x/
c0 c1 x c2 x2 c3 x3 … cn xn
p+n/ +0/.
But
p' +x/
c1 2 c2 x 3 c3 x2 … n cn xn1
p'' +x/
2 c2 3 2 c3 x … n +n 1/ cn xn2
p+n/ +x/
n +n 1/ +n 2/ … +1/ cn
n cn
(3.108)
Mathematics for Engineers
96
Thus, to satisfy (3.108), we must have f +0/
p+0/
c0
f ' +0/
p' +0/
f '' +0/
p'' +0/
f ''' +0/
p''' +0/
2 3 c3
p+n/ +0/
n +n 1/ +n 2/ …+1/ cn
c1 2 c2
2 c2 3 c3
f +n/ +0/
n cn
which yields the following values for the coefficients of p+x/: f +0/, c1
c0
f ' +0/, c2
f '' +0/ 2
f +n/ +0/
, …, cn
n
.
The polynomial that results by using these coefficients in (3.107) is called the nth order Maclaurin polynomial for f . Definition 3.2. Maclaurin Polynomial
If f can be differentiated n times at x pn +x/
f +0/ f ' +0/ x
f '' +0/ 2
0, then we define the nth Maclaurin polynomial for f to be
x2 …
f +n/ +0/ n
xn .
(3.109)
The polynomial has the property that its value and the values of its first n derivatives match the values of f and its first n derivatives at x 0. Remark 3.2. Observe that p1 +x/ is the local linear approximation of f at 0 and p2 +x/ is the local quadratic approximation of f at x 0. Example 3.16. Maclaurin Polynomial Find the Maclaurin polynomials p0 , p1 , p2 , p3 , and pn for Æ x . Solution 3.16. For the exponential function, we know that the higher order derivatives are equal to the exponential function Table%
n Æ x xn
, n, 1, 8)
Æx , Æx , Æx , Æx , Æx , Æx , Æx , Æx
and thus the expansion coefficients of the polynomial defined as f +n/ +0/ follow as
IV - Chapter 3: Numbers and Errors
Table%
n Æ x xn
97
, n, 1, 8) s. x ! 0
1, 1, 1, 1, 1, 1, 1, 1
Therefore, we define a function generating the expansion coefficients and adding them up to a sum xm p$n_, x_, f_( : Fold%Plus, 0, Table%
m
+x,m f s. x ! 0/, m, 0, n))
The application of this function to our problem with the exponential Æ x delivers the sequence of approximations +TableForm#Ó1, TableHeadings None, "n", "pn "' &/$n, p#n, x, Æx 'n 0 ( 5
n pn 0 1 1 x1 2
x2 2 3
x1
x2 2
x1
x3 6
3
x 6
4
x4 24
5
x5 120
x4 24
x2 2
x1
x3 6
x2 2
x1
Figure 3.20 shows the graph of Æ x and the graphs of the first four Maclaurin polynomials. Note that the graphs of p1 +x/, p2 +x/, and p3 +x/ are virtually indistinguishable from the graph of Æ x near x 0, so that these polynomials are good approximations of Æx for x near 0.
6
f
4
2
0 2
1
0
1
2
x Figure 3.20. Maclaurin approximations of the function f +x/
Æx .The approximations are shown by dashed curves.Ô
Mathematics for Engineers
98
However, the farther x is from 0, the poorer these approximations become. This is typical of the Maclaurin polynomials for a function f +x/; they provide good approximations of f +x/ near 0, but the accuracy diminishes as x progresses away from 0. However, it is usually the case that the higher the degree of the polynomial, the larger the interval on which it provides a specified accuracy.
3.4.7 Taylor Polynomial Up to now we have focused on approximating a function f in the vicinity of x 0. Now we will consider the more general case of approximating f in the vicinity of an arbitrary domain value x0 . The basic idea is the same as before; we want to find an nth-degree polynomial p with the property that its first n derivatives match those of f at x0 . However, rather than expressing p+x/ in powers of x, it will simplify the computations if we express it in powers of x x0 ; that is, p+x/
c0 c1 +x x0 / c2 +x x0 /2 … cn +x x0 /n .
(3.110)
We will leave it as an exercise for you to imitate the computations used in the case where x0 show that c0
f +x0 /, c1
f ' +x0 /, c2
f '' +x0 / 2
, …, cn
f +n/ +x0 / n
0 to
(3.111)
.
Substituting these values in (3.110), we obtain a polynomial called the nth Taylor polynomial about x x0 for f . Definition 3.3. Taylor Polynomial
If f can be differentiated n times at x0 , then we define the nth Taylor polynomial for f about x to be pn +x/
f +x0 / f ' +x0 / +x x0 /
f '' +x0 / 2
+x x0 /2 …
f +n/ +x0 / n
+x x0 /n .
x0
(3.112)
Remark 3.3. Observe that the Maclaurin polynomials are special cases of the Taylor polynomials; that is, the nth-order Maclaurin polynomial is the nth-order Taylor polynomial about x0 0. Observe also that p1 +x/ is the local linear approximation of f at x x0 and p2 +x/ is the local quadratic approximation of f at x x0 . Example 3.17. Taylor Polynomial Find the five Taylor polynomials for ln+x/ about x
2.
Solution 3.17. In Mathematica Taylor series are generated with the function Series[]. The following line generates the table of the first five Taylor polynomials at x 2
IV - Chapter 3: Numbers and Errors
99
Table#m, Normal#Series#Log#x', x, 2, m'', m, 0, 4' ss TableForm#Ó, TableHeadings None, "m", "Tm "' & m Tm 0 log+2/ 1 2 3 4
x2 log+2/ 2 1 x2 8 +x 2/2 2 log+2/ 1 1 x2 +x 2/3 8 +x 2/2 2 log+2/ 24 1 1 1 64 +x 2/4 24 +x 2/3 8 +x 2/2
x2 2
log+2/
The graph of ln+x/ and its first four Taylor polynomials about x 2 are shown in Figure 3.21. As expected these polynomials produce the best approximations to ln+x/ near 2. 1.2 1.0
f
0.8 0.6 0.4 0.2 0.0 1.0
1.5
2.0
2.5
3.0
x Figure 3.21. Taylor approximations of the function f +x/
ln+x/. The approximations are shown by dashed curves.Ô
3.4.8 nth -Remainder The nth Taylor polynomial pn for a function f about x x0 has been introduced as a tool to obtain good approximations to values of f +x/ for x near x0 . We now develop a method to forecast how good these approximations will be. It is convenient to develop a notation for the error in using pn +x/ to approximate f +x/, so we define Rn +x/ to be the difference between f +x/ and its nth Taylor polynomial. That is n
Rn +x/
f +x/ pn +x/
This can also be written as
f +x/
6 k 0
f +k/ +x0 / k
+x x0 /k .
(3.113)
Mathematics for Engineers
100
n
f +x/
6 k 0
pn +x/ Rn +x/
f +k/ +x0 / k
+x x0 /k Rn +x/
(3.114)
which is called Taylor's formula with remainder. Finding a bound for Rn +x/ gives an indication of the accuracy of the approximation pn +x/ f +x/. The following theorem, which is given without proof states Theorem 3.4. Remainder Estimation
If the function f can be differentiated n 1 times on an interval I containing the number x0 , and if M is an upper bound for f +n1/ +x/ on I, that is f +n1/ +x/ M for all x in I, then Rn +x/
M +n 1/
x x0
n1
(3.115)
for all x in I. Proof 3.4. We are assuming that f can be differentiated n 1 times on the interval I containing the number x0 and that f +n1/ +x/ M
(3.116)
for all x in I. We want to show that Rn +x/
M +n 1/
x x0
n1
(3.117)
+x x0 /k .
(3.118)
for all x in I, where n
Rn +x/
f +x/
6 k 0
f +k/ +x0 / k
In our proof we will need the following two properties of Rn +x/: Rn +x0 /
R'n +x0 /
…
R+n/ n +x0 /
0
(3.119)
and R+n1/ +x/ n
f +n1/ +x/ for all x in I.
(3.120)
These properties can be obtained by analyzing what happens if the expression for Rn +x/ in Formula (3.118) is differentiated j times and x0 is then substituted in that derivatives. If j n, then the jth derivative of the summation in (3.118) consists of a constant term f + j/ +x0 / plus terms involving + j/ powers of x x0 . Thus Rn +x0 / 0 for j n, which proves all but the last equation in (3.119). For the last equation, observe that the nth derivative of the summation in (3.118) is the constant f +n/ +x0 /, so R+n/ 0. Formula (3.120) follows from the observation that the +n 1/-st derivative of the n +x0 / summation in (3.118) is zero.
IV - Chapter 3: Numbers and Errors
101
Now to the main part of the proof. For simplicity we will give the proof for the case where x x0 and leave the case where x x0 for the reader. It follows from (3.116) and (3.120) that R+n1/ +x/ M . n Thus calculating the mean value of this expression delivers x
x
x
x0
x0
x0
+n1/ à M Št à Rn +t/ Št à M Št.
However, it follows from (9) that R+n/ n +x0 / x
+n1/ Ã Rn +t/ Å t
R+n/ n +t/
x0
0, so
R+n/ n +x/.
x x0
(3.121)
(3.122)
Thus, performing the integration on the parts where M is involved, we obtain the inequalities M +x x0 / R+n/ n +x/ M +x x0 /.
(3.123)
Since the integration allows us to estimate the mean value of the nth order derivatives of R+n/ n +x/ we will repeat this step again to find M 2
+x/ +x x0 /2 R+n1/ n
M
+x x0 /2 .
2
(3.124)
If we keep repeating this process, then after n 1 integrations, we will obtain M +n 1/
+x x0 /n1 Rn +x/
M +n 1/
+x x0 /n1
(3.125)
which we can rewrite as Rn +x/
M +n 1/
+x x0 /n1 .
(3.126)
This completes the proof of (3.117), since the absolute value sign can be omitted in that formula when x x0 . So the receipt at the end is: find a constant M which is greater than the n 1-st order derivative of f and use the relation derived to estimate the magnitude of the error.
QED Example 3.18. Remainder of an Approximation Use the nth Maclaurin polynomial for Æ x to approximate Æ to five decimal-place accuracy. Solution 3.18. We note first that the exponential function Æ x has derivatives of all orders for every real number x. The Maclaurin polynomial is n
6 k 0
1 k
xk
1x
from which we have
x2 2
…
xn n
(3.127)
Mathematics for Engineers
102
n
Æ
Æ1
6 k 0
1k k
11
1 2
…
1 n
(3.128)
.
Thus, our problem is to determine how many terms to include in a Maclaurin polynomial for Æ x to achieve five decimal-place accuracy; that is, we want to choose n so that the absolute value of nth remainder at x 1 satisfies Rn +1/ 0.000005. To determine n we use the Remainder Estimation Theorem with f +x/ the interval [0,1]. In this case it follows from the Theorem that Rn +1/
Æx , x
1, x0
0, and I being
M
(3.129)
+n 1/
where M is an upper bound on the value of f +n1/ +x/ Æ x for x in the interval #0, 1'. However, Æ x is an increasing function, so its maximum value on the interval #0, 1' occurs at x 1; that is, Æx Æ on this interval. Thus, we can take M Æ in (3.116) to obtain Æ Rn +1/ . (3.130) +n 1/ Unfortunately, this inequality is not very useful because it involves Æ which is the very quantity we are trying to approximate. However, if we accept that Æ 3, then we can replace (3.130) with the following less precise, but more easily applied, inequality: Rn +1/
3 +n 1/
.
(3.131)
Thus, we can achieve five decimal-place accuracy by choosing n so that 3 +n 1/
0.000005
+n 1/ 600 000.
or
Since 9! = 362880 and 10!=3628800, the smallest value of n that meets this criterion is n five decimal-place accuracy Æ
11
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
2.71828
(3.132) 9. Thus, to
(3.133)
As a check, a calculator's twelve-digits representation of Æ is Æ2.71828182846, which agrees with the preceding approximation when rounded to five decimal places.Ô
3.4.9 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises.
IV - Chapter 3: Numbers and Errors
103
3.4.9.1 Test Problems T1. What is Vandermonde interpolation? T2. How works the Lagrange interpolation? T3. What is Horner's scheme? T4. Explain Newton's finite difference interpolation? T5. What is the difference between Lagrange interpolation and least square interpolation using Legendere polynomials? T6. How is a local approximation defined?
3.4.9.2 Exercises E1. What can be the maximum spacing in a uniformly spaced table of sin+x/ for the first quadrant, if it is desired to have truncation error of less than 107 . Assume that we want to use only (i) linear, (ii) quadratic, (iii) cubic interpolation. E2. Horner's scheme is a method to evaluate polynomials in an efficient way. Given a polynomial of order n we can write pn +x/
an xn an1 xn1 … a1 x a0
a0 x+a1 x+a2 … an +x / …//
(1)
We see that pn +x/ calculated by as follows p0
an , p k
ank x pk1 with k
1, 2, …, n
(2)
Then we obtain pn
pn +x/
(3)
This recursion can be organized in a table to keep track with the calculation an an1 an2 … a1 a0 x p 0 an p 1 p2 … pn1 pn pn +x/ This table is called the famous Horner Scheme. Calculate p3 +4/ for the polynomial p3 +x/ x3 6 x2 5 x 8. a. Set up the Horner scheme in tabular form and carry out the necessary calculations. b. What is the value of p3 +4/. Can you use the same Horner scheme for p3 +4.5/? E3. For the polynomial p4 +x/ x4 10 x3 35 x2 50 x 24 calculate: a. p4 +1.5/ by using Horner's scheme. b. use x 2.5 in p4 and remove the factor +x 2/ from p4 +x/. c. use x 3.5 in p4 and remove the factor +x 3/ from p4 +x/. What do you observe in the calculations b) and c)? E4. Given the following set of data +x0 , y0 / +2, 5/, +x1 , y1 / +1, 1/, and +x2 , y2 / +2, 2/. a. Calculate the Vandermonde Matrix of the given interpolation problem and set up the augmented matrix +V y/ in order to solve the LES. b. Solve the LES. c. Write down the interpolation polynomial. E5. If we are given a table of f +x/
log10 +sin+x//, for x
0.01, 0.02, ..., 1.00. Over what range of x values will linear
interpolation be sufficient for an accuracy of 105 or 104 . E6. Consider the data in the range 0.4 x 1.2 in the following Table. x 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 f +x/ 5.16 3.6933 3.14 3.0 3.1067 3.3886 3.81 4.351 5.0 5.7491 6.5933 7.5292 8.5543 Using simple interpolation, calculate a. p2 +0.9/ using the first three points, b. p2 +0.9/ using the last three points,
Mathematics for Engineers
104
p3 +0.9/ using the first four points, d. p3 +0.9/ using the last four points, and e. p4 +0.9/ using all five data points. E7. Construct a polynomial of order 4 using Lagrange interpolation based on the table of data listed in problem E5. E8. The constant pressure specific heat p of low pressure air are tabulated in the following Table. T #K' 1000 1100 1200 1300 1400 1500 1600 C p #kJ skg K' 1.141 1.573 1.1722 1.1858 1.1982 1.2095 1.2197 Using simple interpolation with the base point as close to the specified value of T possible, calculate a. C p +1120/ using two points, b. C p +1120/ using three points, c. p +1480/ using two points, and d. C p +1480/ using three points. e. Fit the C p +T/ data to a fourth-degree polynomial and compute the deviations C p +T/
a b T c T 2 d T 3 e T 4.
(4)
E9. Evaluate the following quantities by interpolating between tables with specified uniform spacing a. sin+44.444 °/ with h 50 °, 10 °, 5 °, 1° with h 2.0, 1.0, 0.5, 0.1 b. Æ0.111 with h 0.1, 0.01, 0.001. c. 0.0121 Use Newton's formula based on 2, 4, 6, ..., 20 points and compare the results with actual values. To study the effect of errors in the tabulated values on interpolation, multiply each value of the tabulated function by +1 H R/, where R is a random number in interval +0.5, 0.5/, and try H 105 and 103 . Repeat the above calculation using these values. E10 The following table gives the world record for running events s #m' 100 200 400 800 1500 5000 10 000 30 000 t #s' 9.93 19.72 43.86 101.73 209.45 780.40 1633.81 5358.8 Using Newton's interpolation and try to estimate the world record for 1000m, 2000m, 25000m and marathon (42195m). Compare your results with actual values: 132.18s, 291.39s, 4435.8s and 7632s, respectively. Also try to estimate the world record for distance covered in one hour (actual record 20944m). Repeat the process by using log+s/ versus log+t/.
3.5 Power Series and Taylor Series In this section, we return to Taylor sums and generalize their properties to series. We will discuss properties of series and their relation to approximations. In the sections below we will also introduce generalized series like Hermite and Fourier series which are special series using Hermite and trigonometric functions as a basis for their generation.
3.5.1 Definition and Properties of Series In Section 3.4.6 we defined the nth Maclaurin polynomial for a function f as
IV - Chapter 3: Numbers and Errors
n
Å
f +k/ +0/ k
k 0
xk
f +0/ f ' +0/ x
105
f '' +0/ 2
and the nth Taylor polynomial for f about x n
Å k 0
f
+k/
+x0 /
k
+x x0 /k
x2 …
f +n/ +0/ n
xn
(3.134)
x0 as
f +x0 / f ' +x0 / +x x0 /
f '' +x0 / 2
+x x0 /2 …
f +n/ +x0 / n
+x x0 /n .
(3.135)
Since then, we have gone on to consider sums with an infinite number of terms, so it is not a big step to extend the notion of Maclaurin and Taylor polynomials to series by not stopping the summation index at n. Thus we have the following definition. Definition 3.4. Taylor Polynomial
If f has derivatives of all order at x0 , then we call the series
Å k 0
f +k/ +x0 / k
+x x0 /k
f +x0 / f ' +x0 / +x x0 / the Taylor series for f about x
Å k 0
f +k/ +0/ k
xk
f '' +x0 / 2
+x x0 / … 2
f +k/ +x0 / k
(3.136) +x x0 / ….
x0 . In the special case where x0
f +0/ f ' +0/ x
f '' +0/ 2
x2 …
f +k/ +0/ k
k
0, this series becomes
xk …
(3.137)
in which case we call it the Maclaurin series of f . Note that the nth Maclaurin and Taylor polynomials are the nth partial sums for the corresponding Maclaurin and Taylor series. A generalization of Maclaurin and Taylor series is achieved by replacing the expansion coefficients by arbitrary constant c0 , c1 , c2 , c3 ,…. The related series are so called power series about x0 0 or about a certain finite value x0 which have representations as Definition 3.5. Power Series
A series with arbitrary expansion coefficients c0 , c1 , c2 , … is called a power series about x0 about an arbitrary value of x0 if the series is given as
0 or
Å ck +x x0 /k k 0
or
c0 c1 +x x0 / c2 +x x0 /2 … ck +x x0 /k ….
(3.138)
Mathematics for Engineers
106
Å ck xk
c0 c1 x c2 x2 … ck xk ….
(3.139)
k 0
Power series reduce to Taylor or Maclaurin series if the expansion coefficients are ck Taylor or ck
f +k/ +0/ k
f +k/ +x0 / k
for
for Maclaurin series, respectively.
k If a numerical value is substituted for x in a power series ½ k 0 ck x , then the resulting series of numbers may either converge or diverge. This leads to the problem of determining the set of x-values for which a given power series converges; this is called its convergence set.
The main result on convergence of a power series in x x0 can be formulated as follows. Theorem 3.5. Radius of Convergence k For a power series ½ k 0 ck +x x0 / , exactly one of the following statements is true:
a) The series converges only for x
x0 .
b) The series converges absolutely (and hence converges) for all real values of x. c) The series converges absolutely (and hence converges) for all x in some finite open interval +x0 R, x0 R/ and diverges if x x0 R or x ! x0 R. At either of the values x x0 R or x x0 R, the series may converge absolutely, converge conditionally, or diverge, depending on the particular series. Remark 3.4. The same theorem is valid for x0 rin type.
0 which is related to power series of the Maclau-
It follows from this theorem that the set of values for which a power series in x x0 converges is always an interval centered at x x0 ; we call this the interval of convergence. In part a) of the Theorem 3.5 the interval of convergence reduces to the single value x x0 , in which case we say that the series has radius of convergence R 0; in part b) the interval of convergence is infinite (the entire real line), in which case we say that the series has radius of convergence R ; and in part c) the interval extends between x0 R and x0 R, in which case we say that the series has radius of convergence R. Example 3.19. Convergence of a Power Series Find the interval of convergence and radius of convergence of the series
Å k 1
+x 5/k k2
.
(3.140)
Solution 3.19. In Mathematica Taylor series are generated with the function Series[]. The following line generates the table of the five first Taylor polynomials at x 2:
IV - Chapter 3: Numbers and Errors
U
uk1
lim
+x 5/k1
limk
uk
n
x 5 limk
k2
limk
+k 1/2 +x 5/k
k 1
1k k
Å
2
k 1
1 k
1
2
1 2
2
1 32
Å
+1/
k 1
k
1
1
k2
22
k1
1 32
4 and x
6, we substitute these values
…
(3.142)
which is a convergent hyper harmonic series with p
2
x 5 1, or 1 x 5 1, or 4 x 6. The series
To determine the convergence behavior at the endpoints x in the given series. If x 6, the series becomes
k
.
1 k
Thus, the series converges absolutely if diverges if x 4 or x ! 6.
Å
x5
2
1 1
107
1 42
2. If x
4, the series becomes
…
(3.143)
Since this series converges absolutely, the interval of convergence for the given series is #4, 6'. The radius of convergence is R 1.Ô If a function f is expressed as a power series on some interval, then we say that f is represented by the power series on that interval. Sometimes new functions actually originate as power series, and the properties of the function are developed by working with their power series representations. For example, the functions
J0 +x/
Å k 1
+1/k 22 k +k /2
x2 k
1
x2 22 +1/2
x4 24 +2/2
x6 26 +3/2
…
(3.144)
and
J1 +x/
Å k 1
+1/k 2
2 k1
+k 1/ k
x2 k1
x 2
x3 2 +1/ +2/ 3
x5 2 +2/ +3/ 5
…
(3.145)
which is called the Bessel function in honor of the German mathematician and astronomer Friedrich Wilhelm Bessel (1784-1846), arise naturally in the study of planetary motion and in various problems that involve heat flow. To find the domain of these functions, we must determine where their defining power series converge. For example, in the case J0 +x/ we have
Mathematics for Engineers
108
U
uk1
limk
uk
limk
x2 +k1/
22 k ++k//2
22 +k1/ ++k 1//2
x2 k
limk
x2 4 +k 1/2
(3.146) 01
so that the series converge for all x; that is, the domain of J0 +x/ is +, /. We leave it as an exercise to show that the power series for J1 +x/ also converges for all x.
3.5.2 Differentiating and Integrating Power Series We begin considering the following problem: Suppose that a function f is represented by a power series on an open interval. How can we use the power series to find the derivative of f on that interval? The solution to that problem can be motivated by considering the Maclaurin series for sin+x/:
Å
sin+x/
k 0
+1/k +2 k 1/
x
x2 k1
x3 3
x5 5
x7 7
… – x
(3.147)
Of course, we already know that the derivative of sin+x/ is cos+x/; however, we are concerned here with using the Maclaurin series to deduce this. The solution is easy—all we need to do is differentiating the Maclaurin series term by term and observe that the resulting series is a Maclaurin series for cos+x/: d dx
x
1
x3 3
3 x2 3
x5
5
5 x4
x7 7
5
…
7 x6 7
…
1
x2 2
x4 4
x6 6
(3.148) …
cos+x/
Here is another example Series#Æx , x, 0, 5'
exponential 1x
x2 2
x3 6
x4 24
x5 120
O,x6 0
The differentiation delivers exponential x 1x
x2 2
x3 6
x4 24
O,x5 0
which is again the series of the exponential function. The preceding computations suggest that if a function f is represented by a power series on an open interval, then a power series representation of f ' on that interval can be obtained by differentiating the
IV - Chapter 3: Numbers and Errors
109
power series for f term by term. This is stated more precisely in the following theorem, which we give without proof. Theorem 3.6. Differentiation of Power Series
Suppose that a function f is represented by a power series in x x0 that has a nonzero radius of convergence R; that is
f +x/
Å ck +x x0 /k
+x0 R x x0 R/.
(3.149)
k 0
Then: a) The function f is differentiable on the interval +x0 R, x0 R/. b) If the power series representation for f is differentiated term by term, then the resulting series has radius of convergence R and converges to f ' on the interval +x0 R, x0 R/; that is
f ' +x/
Å k 0
d dx
,ck +x x0 /k 0
x0 R x x0 R.
(3.150)
This theorem has an important implication about the differentiability of functions that are represented by power series. According to the theorem, the power series for f ' has the same radius of convergence as the power series for f , and this means that the theorem can be applied to f ' as well as f . However, if we do this, then we conclude that f ' is differentiable on the interval +x0 R, x0 R/, and the power series for f '' has the same radius of convergence as the power series for f and f '. We can now repeat this process ad infinity, applying the theorem successively to f '', f ''',…, f +n/ ,… to conclude that f has derivatives of all orders in the interval +x0 R, x0 R/. Thus, we have established the following result. Theorem 3.7. Differentiation of Power Series
If a function f can be represented by a power series in x x0 with a nonzero radius of convergence R, then f has derivatives of all orders on the interval +x0 R, x0 R/. In short, it is only the most well-behaved functions that can be represented by power series; that is, if a function f does not possess derivatives of all orders on an interval +x0 R, x0 R/, then it cannot be represented by a power series in x x0 on that interval. Example 3.20. Derivatives of Power Series We defined the Bessel function J0 +x/ as
J0 +x/
Å k 0
+1/k 22 k +k /2
x2 k
(3.151)
with an infinite radius of convergence. Thus J0 +x/ has derivatives of all orders on the interval +, /, and these can be obtained by differentiating the series term by term. For example, if we
Mathematics for Engineers
110
write (3.151) as
J0 +x/
1Å k 1
+1/k
x2 k
22 k +k /2
(3.152)
and differentiate term by term, we obtain
J0 ' +x/
Å
+1/k +2 k/
k 1
22 k +k /2
x2 k1
Å k 1
+1/k 22 k1 k +k 1/
x2 k1 .Ô
(3.153)
Remark 3.5. The computations in this example use some techniques that are worth noting. First, when a power series is expressed in sigma notation, the formula for the general term of the series will often not be of a form that can be used for differentiating the constant term. Thus, if the series has a nonzero constant term, as here, it is usually a good idea to split it off from the summation before differentiating. Second, observe how we simplified the final formula by canceling the factor k from the factorial in the denominator. This is a standard simplification technique. Since the derivative of a function that is represented by a power series can be obtained by differentiating the series term by term, it should not be surprising that an antiderivative of a function represented by a power series can be obtained by integrating the series term by term. For example, we know that sin+x/ is an antiderivative of cos+x/. Here is how this result can be obtained by integrating the Maclaurin series for cos+x/ term by term: Ã 1
à cos+x/ dx x
x3 +3/ 2
x2 2
x5 +5/ 4
x4 4
x7 +7/ 6
x6 6
… dx
… C
x
x3 3
x5 5
x7 7
(3.154) … C
sin+x/ C.
The same idea applies to definite integrals. For example, by direct integration we have 1
1
Ã
0
S 4
1
1
x 1 2
Šx à Series% 0
x 1 2
, x, 0, 6) Å x
1
à ,1 x2 x4 x6 O,x7 00 Šx 0
The preceding computations are justified by the following theorem, which we give without proof. Theorem 3.8. Integration of Power Series
Suppose that a function f is represented by a power series in x x0 that has a nonzero radius of convergence R; that is
f +x/
Å ck +x x0 /k k 0
x0 R x x0 R.
(3.155)
IV - Chapter 3: Numbers and Errors
111
a) If the power series representation of f is integrated term by term, then the resulting series has radius of convergence R and converges to an antiderivative for f +x/ on the interval +x0 R, x0 R/; that is
à f +x/ dx
Å k 0
ck k 1
+x x0 /k1 C
x0 R x x0 R
(3.156)
b) If D and E are points in the interval +x0 R, x0 R/, and if the power series representation of f is integrated term by term from D to E , then the resulting series converges absolutely on the interval +x0 R, x0 R/ and E
Ã
D
f +x/ dx
E
Å Ã ck +x x0 /k dx . D
k 0
(3.157)
Based on the Theorem 3.8 and 3.6 there are different practical ways to generate power series for functions.
3.5.3 Practical Ways to Find Power Series In this section we discuss different examples demonstrating how power series can be calculated in practice. Example 3.21. Power Series for ArcTan Find the Maclaurin series for arctan+x/. Solution 3.21. It would be tedious to find the Maclaurin series directly. A better approach is to start with the formula Ã
arcTan
1 x 1 2
Åx
tan1 +x/
and integrate the Maclaurin series 1 serexp
Series%
x 1 2
, x, 0, 7)
1 x2 x4 x6 O,x8 0
term by term. This yields intArcTan
or
x7 7
x5 5
à Normal#serexp' Šx x3 3
x
Mathematics for Engineers
112
arcTan intArcTan x7
tan1 +x/
7
x5 5
x3 3
x
which is defined for 1 x 1.Ô Remark 3.6. Observe that neither Theorem 3.6 nor Theorem 3.8 addresses what happens at the endpoints of the interval of convergence. However, it can be proved that if the Taylor series for f about x x0 converge to f +x/ for all x in the interval +x0 R, x0 R/, and if the Taylor series converges at the right endpoint x0 R, then the value that it converges to at that point is the limit of f +x/ as x x0 R from the left; and if the Taylor series converges at the left endpoint x0 R, then the value that it converges to at that point is the limit of f +x/ as x x0 R. Taylor series provide an alternative to Simpson's rule and other numerical methods for approximating definite integrals. Example 3.22. Approximation of Integrals Approximate the integral 1
x Ã Æ dx 2
0
to three decimal place accuracy by expanding the integrand in a Maclaurin series and integrating term by term. Solution 3.22. The simples way to obtain the Maclaurin series for Æx is to replace x by x2 in the Maclaurin series of the exponential function Æx . 2
Normal#Series#Æx , x, 0, 5''
exp x5 120
x4
24
x3 6
x2 2
x1
The replacement is given as exp s. x x2
reExp
x10 120
x8 24
x6 6
x4 2
x2 1
Therefore in the integral we find à reExp Šx
intReRxp
x11 1320
x9 216
x7 42
x5 10
x3 3
This can be generally be written as
x
IV - Chapter 3: Numbers and Errors
1
x Ã Æ dx 2
Å
0
k 0
113
+1/k +2 k 1/ k
which represents a convergent series. The requirement that the accuracy should be three decimal places requires that the remainder is smaller than 0.0005. To find a formula for such an estimation we represent the largest part of the remainder as the difference between the integral and the nth partial sum as 1
x Ã Æ dx sn
1
2
0
+2 +n 1/ 1/ +n 1/
0.0005
To determine the n value for which this condition is satisfied, we solve the following relation FindRoot%+2 +n 1/ 1/ +n 1/
1 0.0005
, n, 5)
n 4.21824
which delivers the value 4.21. Since n should be an integer we select n the approximation of the integral delivers
5 as a sufficient value thus
N#intReRxp s. x 1' 0.746729
Ô The following examples illustrate some algebraic techniques that are sometimes useful for finding Taylor series. Example 3.23. Multiplication of Series Find the first three nonzero terms in the Maclaurin series for the function f +x/ Solution 3.23. Using the series for Æx and arctan+x/ gives 2
exp
x6 6
Normal$Series$Æx , x, 0, 7(( 2
x4 2
Normal$Series$tan1 +x/, x, 0, 7((
arcTan
x7 7
The product is
x2 1
x5 5
x3 3
x
Æx arctan+x/. 2
Mathematics for Engineers
114
Expand#arcTan exp' x13 42
11 x11 105
94 x9 315
71 x7 105
31 x5 30
4 x3 3
x
More terms in the series can be obtained by including more terms in the factors. Moreover, one can prove that a series obtained by this method converges at each point in the intersection of the intervals of convergence of the factors. Thus we can be certain that the series we have obtained converges for all x in the interval 1 x 1.Ô
3.5.4 Generalized Power Series A powerful method to solve several problems is the representation of a function f in terms of an expansion by a series of special functions fk +x/ defined by
f +x/
Å fk +x/
(3.158)
k 0
with fk +x/ as a function satisfying some well behaved properties. Among these functions are the well known power series and so called Fourier series where the expansion functions (basis functions) are given in Table 3.4. Table 3.4. Different types of generalized series for approximations Function fk +x/
ak xk
Series type power series
fk +x/
ak cos+k x/ bk sin+k x/
Fourier series
fk +x/
ak Hk +x/
Hermite series
fk +x/
ak Kk +x/
Bessel series
fk +x/
ak sinc+x k h/
Sinc series
ak and bk are expansion coefficients of the series.
The use of the different basis function in a series expansion of a function will result to different representations of the function. The functions fk +x/ are usually chosen in such a way that they will reflect the symmetry or some property of the problem. If the problem has periodic properties the Fourier series expansion is the tool of choice. If the property has rotational properties the Bessel series are appropriate etc. In the following we will restrict to Fourier series which are the most important one for periodic phenomena. 3.5.4.1 Fourier Sum
A Fourier sum is given as a finite number of trigonometric functions weighted by different coefficients. More precise a function f can be approximated by the following sum
IV - Chapter 3: Numbers and Errors
f +x/
1 2
115
M
a0 Å ak cos+k x/ bk sin+k x/ k 1
where M ± 1 \ and ak and bk are some real numbers determining the kth weight of the trigonometric function in this expansion. 3.5.4.2 Fourier Series
Fourier series arise from the practical task of representing a given periodic function f +x/ in terms of cosine and sine functions. These series are called trigonometric series. For this kind of series we have to assume that the function f +x/ is a periodic function, for simplicity we first assume a 2 S period. In addition the function f +x/ should be integrable over this period. Let us further assume that f +x/ can be represented by a trigonometric series f +x/
1 2
a0 Å ak cos+k x/ bk sin+k x/
(3.159)
k 1
that is, we assume that this series converges and has f +x/ as its sum. The problem of this general form of a Fourier series is that the expansion coefficients ak and bk are unknown constants for each k, respectively. The reason why a0 is multiplied by 1 s 2 is purely technical and will be seen below. Since we assume 2 S periodicity we will restrict the domain of the function f to a symmetric interval on the real line. We can thus reduce to the problem given a function f : +S, S/ 5 and looking for a representation of the function f by (3.159). The determination of the coefficients ak and bk is a simple matter if certain assumptions are made. Assume that the series (3.159) converges and that equality holds — all this of course, remain to be settled later on — and assume further that the series can be integrated term by term from S to S and that the same is true when both sides of (3.159) are multiplied by either cos+k x/ or by sin+k x/, where k is a non negative integer. Then, if we multiply (3.159) by cos+m x/, we get S
à f +x/ cos+m x/ Šx S
1 2
S
S
k 1
S
S
S
S
(3.160)
a0 à cos+m x/ Šx Šak à cos+m x/ cos+k x/ Šx bk à cos+m x/ sin+k x/ Šx!.
To simplify this expression we look at the following integrals S
à cos+m x/ Šx S
2S 0
The direct calculation gives us
if m 0 if m 0
(3.161)
Mathematics for Engineers
116
S
à cos+m x/ Šx
v1
S
2 sin+S m/ m
The two critical values for this integral are gained by setting m
0 and m ± 1. For both cases we find
lim v1 m0
2S
and lim v1 m1
0
The second kind of integrals we need to know are S
à cos+m x/ cos+k x/ Šx S
S 0
if m k if m 0
(3.162)
The result can be verified by a direct integration and performing the limits for different values for m and k. The integration delivers S
à cos+k x/ cos+m x/ Šx
v2
S
2 k sin+S k/ cos+S m/ 2 m cos+S k/ sin+S m/ k 2 m2
The limit for m k gives lim v2 mk
sin+2 S k/ 2k
S
which is actually S if k ± 1. The second case for m ± 1 and m k gives lim v2 m2
2 k sin+S k/ k2 4
here we see that for any number k ± 1 this limit vanishes. The last integral needed to evaluate the expansion coefficients for the Fourier series is
IV - Chapter 3: Numbers and Errors
117
S
à cos+m x/ sin+k x/ Šx
0
(3.163)
S
which can be directly verified by S
à sin+k x/ cos+m x/ Šx S
0
Using these results in our formula above, we can simplify this expression and derive relations for the expansion coefficients ak and bk by ak
1 S
S
à f +x/ cos+m x/ Šx S
with m
0, 1, 2, 3, ….
(3.164)
The reason for the coefficient 1 s 2 in (3.159) is now apparent: without it the previous formula would not be valid for m 0 and a separate formula would be needed. A similar procedure, but multiplying (3.159) by sin+m x/, yields bk
1 S
S
à f +x/ sin+m x/ Šx
with m
1, 2, 3, ….
S
(3.165)
This simple method was devised by L. Euler in 1777, but only after having first obtained these formulas in a more complicated manner. Fourier arrived at the same result independently, also discovering the simple method after a much more complicated one. There is no reason to believe at this moment that the assumptions made above on convergence of the series (3.159) and on the term by term integration are valid. But whether or not they are— this will be decided later on—we take the resulting formula as our starting point. Definition 3.6. Fourier Coefficients
If the following integrals exist, the numbers ak
1 S
S
à f +x/ cos+m x/ Šx
with m
0, 1, 2, 3, ….
(3.166)
with m
1, 2, 3, ….
(3.167)
S
and bk
1 S
S
à f +x/ sin+m x/ Šx S
are called the Fourier coefficients of f on + S , S /. The series f +x/
1 2
a0 Å ak cos+k x/ bk sin+k x/ k 1
(3.168)
Mathematics for Engineers
118
where ak and bk are given by the above integrals, is called a Fourier series of f on + S , S /. Example 3.24. Fourier Coefficients Find the Fourier series of the function f +x/
if S x S s 2 if S s 2 x S
0 1
whose graph is given in Figure 3.23. This function describes the signal of a switch which is Off till x S s 2 and On from S s 2 x S. 1.0
0.8
f+x/
0.6
0.4
0.2
0.0 3
2
1
0
1
2
3
x Figure 3.22. Step function for a switch.
Solution 3.24. According to the definition of the coefficients we have to evaluate the following integrals S
¼ S 1 Å x a0
2
S
3 2
and for k ! 0 we find S
¼ S cos+k x/ Å x ak
2
S
Sk sin- 2 1 sin+S k/
Sk
If k ± 1 then the coefficients are
IV - Chapter 3: Numbers and Errors
119
k1
ak
+1/ 2 Sk
if k is odd if k is even.
0
The second kind of coefficients is give by S
¼ S sin+k x/ Å x 2
bk
S
Sk cos- 2 1 cos+S k/
Sk
Examining this relation by inserting different integers for k we find bk
1 Sk
if k is odd 1 Sk
,1 +1/ 0 if k i even ks2
Hence the Fourier series can be defined by the following line as a function approximating the upper limit of the series with M terms M
four+M_/ : Block%, Å sin+k x/ Which%OddQ#k', k 1
cos+k x/ Which%OddQ#k',
+1/
1 Sk
1 +1/ks2
, EvenQ#k',
Sk
k1 2
Sk
, EvenQ#k', 0)
3 22
)
A six term series is then given by four+6/ sin+x/ S
sin+3 x/ 3S
sin+5 x/ 5S
sin+2 x/ S
sin+6 x/ 3S
cos+x/ S
cos+5 x/ 5S
cos+3 x/
The corresponding graph of the approximation is given in the next figure
3S
3 4
)
Mathematics for Engineers
120
1.0
f+x/
0.8 0.6 0.4 0.2 0.0 3
2
1
0
1
2
3
x Figure 3.23. Step function and 6th order approximation by the Fourier series.
The following sequence of figures shows the behavior of the approximation if the upper limit of the sum is increases from 1 to 20.
1.4 1.2
f+x/
1.0 0.8 0.6 0.4 0.2 0.0 4
2
0
2
4
x
The oscillation of the approximation is characteristic for Fourier approximations. This phenomena occurs especial at discontinuities of the function and is known as Gibbs phenomenon. The presence of the phenomenon in a singe example, or in a collection of examples, does not imply its existence in the general case. But in 1906 Maxime Bocher showed that the partial sum of the Fourier series of an
IV - Chapter 3: Numbers and Errors
121
arbitrary function always has this over swing of about 9% at any jump discontinuity. This is an intrinsic defect of the convergence of Fourier series.Ô
3.5.5 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 3.5.5.1 Test Problems T1. What is an power series? T2. What theorems are available for calculating the radius of convergence? T3. What other series do you know except Mclaurin and Taylor series? T4. What is a power series useful for? T5. What are Fourier series?
3.5.5.2 Exercises 2
E1. Find the fourth Taylor polynomial T4 +x/ for the function f +x/ xÆx about x0 a. Find an upper bound for T4 +x/ f +x/ , for 0 x 0.4. 0.4
b. Approximate ¼0
0.
0.4
f +x/ Å x using ¼0 T4 +x/ Å x. 0.4
c. Find an upper bound for the error in b) using ¼0 T4 +x/ Å x. d. Approximate f ' +0.2/ using
T4' +0.2/,
and find the error.
E2. Use the error term of a Taylor polynomial to estimate the error involved in using sin+x/ x to approximate sin+1 °/. E3. Use a Taylor polynomial about Ss4 to approximate cos+42 °/ to an accuracy of 106 . E4. Let f +x/
+1 x/1 and x0
0. Find the nth Taylor polynomial Tn +x/ for f +x/ about x0 . Find a value of n necessary
for Tn +x/ to approximate f +x/ to within 106 on #0, 0.5'. E5. The polynomial p2 +x/
1
1 2
x2 is to be used to approximate f +x/
cos+x/ in $
1 1 , 2 (. 2
Find a bound for the maximum
error. E6. A Maclaurin polynomial for Æx is used to give the approximation 2.5 to Æ. The error bound in this approximation is established to be E 1 s 6. Find a bound for the error in E. E7. Integrate the Maclaurin series for Æx to show that 2
2
erf +x/ E8. Let f +x/ a. b. c. d. e.
S
Å k 0
+1/k x2 k1 +2 k 1/ k
(1)
ln,x2 20. Use Mathematica to determine the following.
The Taylor polynomial T3 +x/ for f expanded about x0 1. The maximum error f +x/ T3 +x/ , for 0 x 1. The Maclaurin polynomial M3 +x/ for f . The maximum error f +x/ M3 +x/ , for 0 x 1. Does T3 +0/ approximate f +0/ better than M3 +1/ approximates f +1/?
E9. The error function can also be expressed in the form erf +x/
2 S
2k x2 k1
k 0
1 3 5 …+2 k 1/
Æx Å 2
(2)
122
Verify that the series (2) and (1) agree for k
Mathematics for Engineers
1, 2, 3, and 4.
Find the nth Maclaurin polynomial Mn +x/ for f +x/
arctan+x/.
4 Roots of Equations
4.1 Introduction The following problem may be used as an introduction to the problem of root finding. An electrical cable is suspended from two towers that are 50 meters apart. The cable is allowed to dip 10 meters in the middle. How long is the cable? We know that the curve assumed by a suspended cable is a catenary (see Figure 4.1).
40
y
30
20
10
0
20
10
0
10
20
x Figure 4.1. Cable suspended between two towers (left and right in the figure).
When the y-axis passes through the lowest point, we can assume an equation of the form y N cosh+x s N/. Here N is a parameter to be determined. The conditions of the problem are that
Mathematics for Engineers
124
y+25/
y+0/ 10. Hence
N cosh
25 N
N 10.
(4.1)
From this equation, N can be determined by the methods discussed in this chapter. The result is N 32.79. The question now is how can we find this value and what are the procedures to calculate it. Another example for such kind of problems is the following missile-intercept problem. The movement of an object in the x y plane is decried by the parametrized equations x1 +t/
t and y1 +t/
1 Æt .
(4.2)
A second object moves according to the equations x2 +t/
1 cos+D/ t and y2 +t/
sin+D/ t 0.1 t2 .
(4.3)
Is it possible to choose a value for D so that both objects will be in the same place at some time? When we set the x and y coordinates equal to each other, we get the system t
1 cos+D/ t and 1 Æt
sin+D/ t 0.1 t2
(4.4)
that needs to be solved for the unknown D and t. If real values exist for these unknowns that satisfy the two equations, both objectives will be in the same place at some value t. But even though the problem is a rather simple one that yields a small system, there is no obvious way to get the answer, or even to see if there is a solution. However, if we graphically represent the two curves we observe that there is an intersection which means a solution (see Figure 4.2).
IV - Chapter 4: Roots of Equations
125
D
1.4 1.2 1.0 0.8 0.6 0.4 0.2
0.5
1.0
1.5
2.0
Figure 4.2. Two objects are crossing on a common point.
The numerical solution of a system of nonlinear equations is one of the more challenging tasks in numerical analysis and, as we will see, no completely satisfactory method exists for it. To understand the difficulties, we start with what at first seems to be a rather easy problem, the solution of a single equation in one variable f +x/
0.
(4.5)
The values of x that satisfy this equation are called the zeros or roots of the function f . In what is to follow we will assume that f is continuous and sufficiently differentiable where needed.
4.2 Simple Root Finding Methods To find the roots of a function of one variable is straightforward enough—just plot the function and see where it crosses the x-axis. The simplest methods are in fact little more than that and only carry out this suggestion in a systematic and efficient way. Relying on the intuitive insight of the graph of the function f , we can discover many different and apparently viable methods for finding the roots of a function of one variable. Suppose we have two values a and b, such that f +a/ and f +b/ have opposite signs. Then, because it is assumed that f is continuous, we know that there is a root somewhere in the interval #a, b'. To localize
Mathematics for Engineers
126
it, we take there is a root somewhere in the interval and compute f +c/. Depending on the sign of f +c/, we can then place the root in one of the two intervals #a, c' or #c, b'. We can repeat this procedure until the region in which the root is known to be located is sufficiently small. The algorithm is known as the bisection method. This method is based on the intermediate-value theorem which is shown in Figure 4.3. In this figure the graph of a function that is continuous on the closed interval #a, b' is shown. The figure suggests that if we draw any horizontal line y k, where k is between f +a/ and f +b/, then that line will cross the curve y f +x/ at least once along the interval #a, b'.
f+a/
y
k
a
x
b
f+b/ x Figure 4.3. Graph of a function with continuous behavior in the interval #a, b'.
Stated in numerical terms, if f is continuous on #a, b', then the function f must take on every value k between f +a/ and f +b/ at least once as x varies from a to b. For example, the polynomial p+x_/ :
x7 x 3
has a value of 3 at x 1 and a value of 129 at x 2. Thus, it follows from the continuity of p that the equation 3 x x7 k has at least one solution in the interval #1, 2' for every value of k between 3 and 129. This idea is stated more precisely in the following theorem. Theorem 4.1. Intermediate-Value Theorem
If f is continuous on a closed interval #a, b' and k is any number between f +a/ and f +b/, inclusive, then there is at least one number x in the interval #a, b' such that f +x/ k. Although this theorem is intuitively obvious, its proof depends on a mathematically precise development of the real number system, which is beyond the scope of this text. A variety of problems can be reduced to solving an equation f +x/ 0 for its roots. Sometimes it is possible to solve for the roots exactly using algebra, but often this is not possible and one must settle for decimal approximations of the roots. One procedure for approximating roots is based on the following consequences of the Intermediate-Value Theorem.
IV - Chapter 4: Roots of Equations
127
Theorem 4.2. Root Approximation
If f is continuous on #a, b', and if f +a/ and f +b/ are nonzero and have opposite signs, then there is at least one solution of the equation f +x/ 0 in the interval +a, b/. This result, which is illustrated in Figure 4.4, can be proved as follows.
y
f+a/!0
f+x/ 0
a
0
b
f+b/0 x
Figure 4.4. Graph of a function f +x/ allowing a root in the interval #a, b'.
Proof: Since f +a/ and f +b/ have opposite signs, 0 is between f +a/ and f +b/. Thus by the Intermediate-Value Theorem there is at least one number x in the interval #a, b' such that f +x/ 0. However, f +a/ and f +b/ are nonzero, so that x must lie in the interval +a, b/, which completes the proof. Example 4.1. Root finding of a Polynomial The equation x5 x 8 x2 1
0
can not be solved algebraically because the left hand side has no simple factors. Solution 4.1. If we graph p+x/ x5 x 8 x2 1 (Figure 4.5), then we are led to the conjecture that there is one real root and that this root lies inside the interval #2.1, 2'. The existence of a root is also confirmed by Theorem 4.2, since p+2.1/ 2.46101 and p+2/ 63 have opposite signs. Approximate this root to two decimal-place accuracy.
128
Mathematics for Engineers
#2.2, 1.5' 10
y
5
0
5
10 2.2
2.1
2.0
1.9
1.8
1.7
1.6
1.5
x
Figure 4.5. Graph of a function f +x/ allowing a root in the interval #a, b'.
The polynomial is defined by p+x_/ :
x5 8 x2 x 1
The following sequence of intervals shrinks the interval length in such a way that the conditions of Theorem 4.2 are satisfied. The intervals are given in curled brackets in the second argument of Map[]. Map#p#Ó317', p#Ó327' &, 2.1, 2, 2.1, 1.8, 2.06, 2.05, 2.059, 2.056, 2.0585, 2.058' ss TableForm#Ó, TableHeadings , "x", "p+x/"' & x
p+x/
2.46101 2.46101 0.0879704 0.031969 0.00402787
63 9.82432 0.464937 0.135084 0.0238737
The table shows that the interval is chosen in such a way that the signs of the polynomial p+x/ changes. However, the exact value of the root can be determined by FindRoot# p+x/ 0, x, 3.1' x 2.05843
IV - Chapter 4: Roots of Equations
Stating that the real value of x tal x-axis.Ô
129
2.0584 is the intersection of the polynomial p+x/ with the horizon-
4.2.1 The Bisection Method The bisection method is very simple and intuitive, but has all the major characteristics of other rootfinding methods. The simplest numerical procedure for finding a root is to repeatedly halve the interval #a, b', keeping the half on which f +x/ changes sign. This procedure is called the bisection method. It is guaranteed to converge to a root. The bisection method is very simple and uses the ideas introduced above that the product of the function at two different locations distinguishes three cases. If we have positive values there is no change in sign and thus no root, if we have a negative sign the two values are different in sign and we will have a root, if the result is zero we found the root itself. In general the following steps are used: 1. Step: Choose the lower and upper boundary of an interval including the root. This means f +xu / f +xl / 0. 2. Step: Estimate the root by the arithmetic mean. 3. Step: Make the following calculations to determine in which subinterval the root lies: a. If f +xl / f +xr / 0 the root lies in the lower interval. Therefore, set xu xr and return to step 2. b. If f +xl / f +xr / ! 0 the root lies in the upper interval. Therefore, set xl xr and return to step 2. c. If f +xl / f +xr / 0 the root equals xr ; terminate the computation. To be more precise in our definition, suppose that we are given an interval #a, b' satisfying f +a/ f +b/ 0 and an error tolerance H ! 0. Then the bisection method consists of the following steps: 1. Define c +a b/ s 2. 2. If b c H, then accept c as the root and stop. 3. If sign+ f +b// sign+ f +c// 0, then set a c. Otherwise, set b These algorithmic steps are implemented in the following lines:
c. Return to step 1.
Mathematics for Engineers
130
bisection$f_, a_, b_( : Block%c, H While%0
105 , ain
a, bin
b, m
1, results
,
0,
+ first step find the midpoint / +ain bin/ ; c 2 + second step select the root / If#+bin c/ H, Return#results''; AppendTo#results, m, c'; + third step select the interval / If#+f s. x ! bin/ +f s. x c/ 0, ain N#c', bin m m1
N#c'';
) )
The application of the function to a polynomial shows us the following results bisection$x6 x 1, 1, 1.4( ss TableForm#Ó, TableHeadings , "m", "c"' & m 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
c 1.2 1.1 1.15 1.125 1.1375 1.13125 1.13438 1.13594 1.13516 1.13477 1.13457 1.13467 1.13472 1.13474 1.13473
where m is the iteration step and c represents the approximation of the root at iteration step m. The graphical representation of the function shows that there is in fact an intersection with the x axis.
IV - Chapter 4: Roots of Equations
131
Plot$x6 x 1, x, 1, 1.4( 5 4 3 2 1 0 1 1.0
1.1
1.2
Figure 4.6. Graph of the function f +x/
1.3
1.4
x x 1 allowing a root in the interval #1, 1.3'. 6
In general, an iteration produces a sequence of approximate solutions; we will denote these iterates by x#0' , x#1' , x#2' , .... These sequence of approximations is shown dynamically in the following Figure 4.5.
5 4 3 2 1 0 1 1.0
1.1
1.2
1.3
Figure 4.7. Sequence of approximations of the root for the function f +x/
1.4
x6 x 1.
The difference between the various root-finding methods lies in what is computed at each step and how the next iterate is chosen.
Mathematics for Engineers
132
f+x0 /
x2
y
x4 x3
x0
x1
f+x1 / x Figure 4.8. The bisection method. After three steps the root is known to lie in the interval #x3 , x4 '.
To estimate the error bound of the bisection method we can proceed as follows. Let an , bn and cn denote the nth computed value of a, b, and c, respectively. Then easily we get 1
bn1 an1
2
+bn an / for n 1
(4.6)
and it is straightforward to deduce that bn an
1 n1
2
+b a/ for n 1
(4.7)
where b a denotes the length of the original interval with which we started. Since the root D is in either the interval #an , cn ' or +cn , bn ', we know that D cn cn an
bn cn
1 2
+bn an /.
(4.8)
This is the error bound for cn that is used in the second step of the bisection algorithm. Combining it with our estimation, we obtain the further bound D cn
1 2n
+b a/.
(4.9)
This shows that the iterates cn converges to D as n . To see how many iterations will be necessary, suppose we want to have D cn H. This will be satisfied if
(4.10)
IV - Chapter 4: Roots of Equations
1 2n
+b a/ H.
133
(4.11)
Taking logarithms of both sides, we can solve this to give n
log,
ba 0 H
(4.12)
.
log+2/
For the example we discussed above the number of iterations for an accuracy of 105 should be found within 1
n
log, 0.00001 0 log+2/
Thus we need about n
(4.13)
16.6096. 17 iterations which is in agreement with the calculation.
There are several advantages to the bisection method. The principal one is that the method is guaranteed to converge. In addition, the error bound, given is guaranteed to decrease by one-half with each iteration. Many other numerical methods have variable rates of decrease for the error, and these may be worse than the bisection method for some equations. The principal disadvantage of the bisection method is that it generally converges more slowly than most other methods. For functions f +x/ that have a continuous derivative, other methods are usually faster. These methods may not always converge; when they do converge, however, they are almost always much faster than the bisection method.
4.2.2 Method of False Position Suppose we have two iterates x0 and x1 that encloses the root. We can then approximate f +x/ by a straight line in the interval and find the place where this line cuts the x-axis. We take this as the new iterate
x2
x1
+x1 x0 / f +x1 / f +x1 / f +x0 /
.
(4.14)
When this process is repeated, we have to decide which of the three points x0 , x1 , or x2 , to select for starting the next iteration. There are two plausible choices. The first, we retain the last iterate and one point from the previous ones so that the two new points enclose the solution (Figure 4.9). This is the method of false position.
Mathematics for Engineers
134
y
f+x0 /
x3
x2
x1
x0 f+x1 /
x Figure 4.9. The method of false position. After the second iteration, the root is known to lie in the interval +x3 , x0 /.
The formula for the false position algorithm is based on the similarity of the two triangles involved in the iteration. Using the triangles generated by the straight line connecting the upper and lower value of the function in the interval #xn , xn1 ' we can write down the relation f +xn /
f +xn1 /
xn1 xn
xn1 xn1
(4.15)
This equation is equivalent to +xn1 xn1 / f +xn /
f +xn1 / +xn1 xn /
(4.16)
which is written by collecting terms as xn1 + f +xn / f +xn1 //
xn1 f +xn / xn f +xn1 /
(4.17)
which is equivalent to xn1 f +xn /
xn1
f +xn / f +xn1 /
xn f +xn1 / f +xn / f +xn1 /
If we add and subtract on the right hand side xn we find xn1
xn
xn
xn1 f +xn / f +xn / f +xn1 /
xn1 f +xn / f +xn / f +xn1 /
xn
xn f +xn1 / f +xn / f +xn1 /
xn f +xn / xn f +xn1 / xn f +xn1 / f +xn / f +xn1 /
(4.18)
IV - Chapter 4: Roots of Equations
xn xn xn
xn1 f +xn / f +xn / f +xn1 /
135
xn f +xn / f +xn / f +xn1 /
+xn1 xn / f +xn / f +xn / f +xn1 / +xn xn1 / f +xn / f +xn / f +xn1 /
The successive iterates of the false position method are then simply computed by xn
xn1
+xn xn1 / f +xn / f +xn / f +xn1 /
.
(4.20)
We use this form because it involves one less function evaluation and one less multiplication than the original relation (4.18) we started from. The algorithm for the secant method consists of three steps: 1. Generate the approximated root by the derived iteration formula 2. Check if the error requirements are satisfied; if yes stop and return the value 3. If sign+ f +a// sign+ f +c// 0, then set a c. Otherwise, set b c. Return to step 1. The following lines are an implementation of the secant method falsePositionMethod$f_, a_, b_( : Block%c, H While%0
105 , ain
a, bin
b, cold
b, k
0, results
0,
k k 1; + first step find the approximation / +bin ain/ ; c bin +f s. x ! bin/ +f s. x ! bin/ +f s. x ! ain/ + second step select the root and terminate / If#Abs#+cold c/' H, Return#results', cold c'; AppendTo#results, k, c'; + third step select the interval / If#+f s. x ! ain/ +f s. x ! c/ 0, ain N#c', bin N#c'' ) )
The application of the secant method shows the iteration steps
,
Mathematics for Engineers
136
falsePositionMethod$x6 x 1, 1, 2( ss TableForm#Ó, TableHeadings , "k", "c"' & k
c
1
63 62
2 3 4 5 6 7 8 9 10 11
1.19058 1.11766 1.14056 1.1328 1.13537 1.13451 1.1348 1.1347 1.13473 1.13472
The same example was used previously as an example for the bisection method. The results are given above. The last iterate equals the root D rounded to 5 significant digits. The false position method converge only a little bit faster than the bisection method. But as the iterates become closer to D, the speed of convergence increases.
4.2.3 Secant Method The secant method and the false position method are known as straight-line approximations to the given function y f +x/. Assume that two initial guesses to the root D are known and denoted by x0 and x1 . They may occur on opposite side of D or on the same side of D. The two points +x0 , f +x0 // and +x1 , f +x1 //, on the graph of y f +x/, determine a straight line, called a secant line. This line is an approximation to the graph of y f +x/ and its root x2 is an approximation of D (see Figure 4.10). To derive a formula for x2 , we proceed in a manner similar to that used to derive the false position formulas: Find the equation of the line and then find its root x2 . The equation of the line is given by y
p+x/
f +x1 / +x x1 /
f +x1 / f +x0 / x1 x0
.
(4.21)
Solving p+x2 / x2
0, we obtain x1 x0 x1 f +x1 / f +x1 / f +x0 /
(4.22)
Having found x2 , we can drop x0 and use x1 , x2 as a new set of approximate values for D. this leads to an improved value x3 ; and this process can be continued indefinitely. Doing so, we obtain the general iteration formula
IV - Chapter 4: Roots of Equations
xn1
xn f +xn /
xn xn1 f +xn / f +xn1 /
137
for n 1.
(4.23)
This is the secant method. It is called a two-point method, since two approximate values are needed to obtain an improved value. The bisection method is also a two-point method, but the secant method will almost always converge faster than bisection. Figure 4.10 illustrates how the secant method works and shows the difference between it and the method of false position. From this example we can see that now the successive iterates are no longer guaranteed to enclose the root.
y
f+x0 /
x1
x3
x2 x0
f+x1 /
x Figure 4.10. The secant method.
The algorithm for the secant method consists of three steps: 1. generate the approximated root by the derived iteration formula 2. change the boundary values a b and b c. 3. check if the error requirements are satisfied; if yes stop and return the value, if not return to step 1. The following lines are an implementation of the secant method:
Mathematics for Engineers
138
secantMethod$f_, a_, b_( : Block%c, H While%0
105 , ain
a, bin
b, cold
b, k
0, results
,
0,
k k 1; + first step find the approximation / +bin ain/ ; c bin +f s. x ! bin/ +f s. x ! bin/ +f s. x ! ain/ + second step select the root and terminate / ain N#bin'; bin N#c'; + third step select the root and terminate / If#Abs#+cold c/' H, Return#results', cold c'; AppendTo#results, k, c'; ) )
The application of the secant method shows the iteration steps secantMethod$x6 x 1, 1, 2( ss TableForm#Ó, TableHeadings , "k", "c"' & k c 1
63 62
2 3 4 5 6 7
1.03067 1.17569 1.12368 1.13367 1.13475 1.13472
The same example was used previously as an example for both the bisection and false position method. The results are given in the table above. The last iterate equals to the root D rounded to 5 significant digits. Contrary to the bisection method the secant method converge very rapidly. When the iterates become closer to D, the speed of convergence increases in a way which needs less steps. Example 4.2. Secant and False Position Method The function f +x_/ :
x2 Æ x 1
has a root in the interval #0, 1' since f +0/ f +1/ 0. Solution 4.2. The results for all three methods discussed so far, the bisection, the false position,
IV - Chapter 4: Roots of Equations
139
and secant methods, are demonstrated in the following. The function has a root near x 0.7 as shown in the following Figure 4.11.
1.5 1.0
f+x/
0.5 0.0 0.5 1.0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 4.11. Graph of the function f +x/
x2 Æx 1 for x ± #0, 1'.
All methods start with two points x0 derive the root.
0 and x1
1. The following tables show the steps needed to
First the bisection method is applied to the problem bisection#f#x', 0, 1' ss TableForm#Ó, TableHeadings , "k", "c"' & k
c
1
1 2
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0.75 0.625 0.6875 0.71875 0.703125 0.710938 0.707031 0.705078 0.704102 0.703613 0.703369 0.703491 0.70343 0.703461 0.703476
Next, we use the false position method
140
Mathematics for Engineers
falsePositionMethod#f#x', 0, 1' ss TableForm#Ó, TableHeadings , "k", "c"' & k
c
1
1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1.8816 0.420725 0.941745 0.589956 0.78112 0.660269 0.730769 0.68747 0.713283 0.697609 0.707023 0.701331 0.704759 0.70269 0.703937 0.703184 0.703638 0.703364 0.70353 0.70343 0.70349 0.703454 0.703476 0.703462
Æ1 Æ
Finally the secant method is used. secantMethod#f#x', 0, 1' ss TableForm#Ó, TableHeadings , "k", "c"' & k c 1 1 2 3 4 5 6 7
Æ1 Æ
0.569456 0.797357 0.685539 0.701245 0.703524 0.703467
IV - Chapter 4: Roots of Equations
141
The results for the different methods show that the bisection method needs the expected number of iterations. However, the false position method needs more steps than expected. If we look at the results generated during the iteration we observe that the root is approached. But during the first few iteration steps there is some oscillation around the root which makes the convergence not direct. Contrary to the secant method the false position method converge quite fast to the true root and does not show oscillations.Ô By using techniques from calculus and some algebraic manipulation, it is possible to show that the iterates xn satisfy D xn1
+D xn / +D xn1 /%
f '' +[n / 2 f ' +[n /
).
(4.24)
The unknown number [n is between xn and xn1 , and the unknown number [n is between the largest and the smallest of the numbers D, xn , and xn1 . The error formula closely resembles the Newton error formula which is discussed in the next section. This kind of formula should be expected, since the secant method can be considered as an approximation of Newton's method, based on the difference quotient f ' +xn /
f +xn / f +xn1 / xn xn1
.
(4.25)
Check as an exercise that the use of this expression in Newton's formula (4.31) will yield (see next subsection) xn xn1 xn1 xn f +xn / for n 1. (4.26) f +xn / f +xn1 /
4.2.4 Newton's Method Consider the graph of y f +x/ shown in Figure 4.12. The root D occurs where the graph crosses the xaxis. We will usually have an estimate of D, and it will be denoted here by x0 . To improve on this estimate, consider the straight line that is tangent to the graph at the point +x0 , f +x0 //. If x0 is near D, this tangent line should be nearly coincident with the graph of y f +x/ for points x about D. Then the root of the tangent line should nearly equal D. This root is denoted by x1 . To find a formula for x1 , consider the equation of the line tangent to the graph of y +x0 , f +x0 //. It is simply the graph of y p1 +x/ for the linear Taylor polynomial p1 +x/
f +x0 / f ' +x0 / +x x0 /.
f +x/ at (4.27)
By definition, x1 is the root of p1 +x/. Solving f +x0 / f ' +x0 / +x x0 / leads to
0
(4.28)
Mathematics for Engineers
142
x0
x1
f +x0 / f ' +x0 /
.
(4.29)
Since x1 is expected to be an improvement over x0 as an estimate of D, this entire procedure can be repeated with x1 as the initial guess. This leads to the new estimate x1
x2
f +x1 / f ' +x1 /
.
(4.30)
Repeating this process, we obtain a sequence of numbers x1 , x2 , x3 ,…that we hope will approach the root D. These numbers are called iterates, and they are defined recursively by the following general iteration formula: xn
xn1
f +xn /
n
f ' +xn /
0, 1, 2, …
This is Newton's method for solving f +x/
(4.31)
0. Some times it is also called Newton-Raphson method.
x1
0
x2
5 f+x/
10 15 20 x0 25 2.5
2.0
1.5
1.0
Figure 4.12. Iteration process of Newton's Method.
Another way to derive Newtons method is based on the simple facts of two point representations of lines. For this purpose, we note that the point-slope form of the tangent line to y f +x/ at the initial approximation x1 is y f +x1 /
f ' +x1 / +x x1 /.
(4.32)
If f ' +x1 / 0, then this line is not parallel to the x-axis and consequently it crosses the x-axis at some point +x2 , 0/. Substituting the coordinates of this point in the formula above yields 0 f +x1 /
f ' +x1 / +x2 x1 /.
Solving for x2 we obtain
(4.33)
IV - Chapter 4: Roots of Equations
x1
x2
143
f +x1 /
(4.34)
f ' +x1 /
The next approximation can be obtained more easily. If we view x2 as the starting approximation and x3 the new approximation, we can simply apply the given formula with x2 in place of x1 and x3 in place of x2 . This yields x2
x3
f +x2 / f ' +x2 /
.
(4.35)
provided f ' +x2 / 0. In general, if xn is the nth approximation, then it is evident from the two steps given above that the improved approximation xn1 is given by xn1
xn
f +xn / f ' +xn /
with n
1, 2, 3, ….
(4.36)
This formula is realized in Mathematica by the following line newtonsMethod$f_, x1_( :
x
f x f
s. x ! x1
The replacement x x1 in the function f f +x/ is necessary because we are dealing with numerical values in the calculation. The iteration of this function can be carried out by a special Mathematica function called Nest[] or NestList[]. These function generate a nested expression of the newtonsMethod[] function and deliver the approximation of the root. For example if we are going to determine one of the roots of the polynomial p+x/
x3 4 x2 5
0
defined as p+x_/ :
x3 4 x2 5
whose graph is given in Figure 4.13
Mathematics for Engineers
144
5
0
p+x/
5 10 15
2
1
0
1
2
3
4
x Figure 4.13. Graph of the polynomial p+x/
x3 4 x2 5.
If we apply Newton's Method to this polynomial we get for an initial value x1 list of 7 approximations
1.155 the following
NestList#newtonsMethod#p#x', Ó' &, 0.155, 7'
res
0.155, 4.357, 3.82396, 3.64124, 3.61838, 3.61803, 3.61803, 3.61803
The result is a list of approximations of the root starting with the initial value. Example 4.3. Newton's Method I Use Newton's Method to find
7
2.
Solution 4.3. Observe that finding x7 2
7
2 is equivalent to finding the positive root of the equation
0
so we take p+x_/ :
x7 2
and apply Newton's Method to this formula with an appropriate initial value x1 of the method delivers NestList#newtonsMethod#p#x', Ó' &, 0.85, 17' 0.85, 1.48613, 1.30035, 1.17368, 1.11532, 1.10442, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409, 1.10409
This means that
7
2
1.10409 ….Ô
Example 4.4. Newton's Method II Find the solution of the equation cos+x/
x.
0.85. The iteration
IV - Chapter 4: Roots of Equations
145
Solution 4.4. To apply Newton's method to a function, we are not restricted to polynomials. However, we can apply this method to any kind of function which allows a first order derivative. As in the current case, we rewrite the equation as p+x_/ : cos+x/ x
and apply Newton's Method to this expression to get NestList#newtonsMethod#p#x', Ó' &, 0.155, 7'
res
0.155, 0.876609, 0.742689, 0.739088, 0.739085, 0.739085, 0.739085, 0.739085
The symbolic expression for this iteration can be found by replacing the numerical initial value by a general symbol as in the next line shown. We use x1 as a symbol instead of a number. symbolicNewton x1 ,
NestList#newtonsMethod#p#x', Ó' &, x1 , 2' ss Simplify
Cos#x1 ' Sin#x1 ' x1 1 Sin#x1 '
Cos#x1 ' x1 1 Sin#x1 '
x1
,
1 'Sin#x1 ' x1 Cos#x1 ' Cos% Cos#x1Sin#x ) +1 Sin#x1 '/ Sin#x1 ' x1 ' 1
+1 Sin#x1 '/ -1 Sin%
Cos#x1 'Sin#x1 ' x1 1Sin#x1 '
)1
!
The result is a symbolic representation of the nested application of Newton's method and thus represents an approximation formula for the root if we insert an initial value x1 into this formula. symbolicNewton s. x1 0.155 0.155, 0.876609, 0.742689
The symbolic formula delivers the same values for the approximation as expected. However, the symbolic representation of Newton's formula allows us to set up a tree of approximation formulas which can be efficiently used for different initial values. The advantage of the symbolic approach is that we get a formula for the approximation which needs only a single numeric value to get the final answer. There are for example no rounding errors.Ô In the following we will examine the error bounds of Newton's method. Let us assume f +x/ has at least two continuous derivatives for all x in some interval about the root D. Further assume that f ' +x/ 0.
(4.37)
This says that the graph of y f +x/ is not tangent to the x-axis when the graph intersects at x D. Also note that combining f ' +x/ 0 with the continuity of f ' +x/ implies that f ' +x/ 0 for all x near D. To estimate the error we use Taylor's theorem to write f +D/
f +xn / +D xn / f ' +xn /
1 2
+D xn /2 f '' +cn /
where cn is an unknown point between D and xn . Note that f +D/
(4.38) 0 by assumption, and then divide
146
Mathematics for Engineers
f ' +xn / to obtain 0
f +xn / f ' +xn /
D xn +D xn /2
f '' +cn /
(4.39)
2 f ' +xn /
solving for D xn1 , we have D xn1
+D xn /2 %
f '' +cn / 2 f ' +xn /
).
(4.40)
This formula says that the error in xn1 is nearly proportional to the square of the error in xn . When the initial error is sufficiently small, this shows that the error in the succeeding iterates will decrease very rapidly. This formula can also be used to give a formal mathematical proof of the convergence of Newton's method. For the estimation of the error, we are computing a sequence of iterates xn , and we would like to estimate their accuracy to know when to stop the iteration. To estimate D xn , we note that, since f +D/ 0, we have f +xn /
f +xn / f +D/
f ' +[n / +xn D/
(4.41)
for some [n between xn and D, by the mean-value theorem. Solving for the error, we obtain f +xn /
D xn
f ' +[n /
f +xn /
(4.42)
f ' +xn /
provided that xn is so close to D that f ' +xn /
f ' +[n /. From Newton's iteration formula this becomes
D xn xn1 xn .
(4.43)
This is the standard error estimation formula for Newton's method, and it is usually fairly accurate. The following function uses this estimation of errors to terminate the iteration. newtonsMethod$f_, x1_( : Block%x1in
x1, xnew, H
0.00001,
+ generate an infinite loop / While%0
0,
+ Newton's iteration formula / f xnew x s. x ! x1in; x f + check the error related to +4.43/ / If#Abs#xnew x1in' H, Return#xnew''; x1in N#xnew'; Print#"x ", xnew' ) )
IV - Chapter 4: Roots of Equations
147
newtonsMethod$x6 x 1, 1( 6 x 5 x
1.14358
x
1.13491
x
1.13472 1.13472
From the discussion above, Newton's method converges more rapidly than the secant method. Thus Newton's method should require fewer iterations to attain a given error. However, Newton's method requires two function evaluations per iteration, that of f +xn / and f ' +xn /. And the secant method requires only one evaluation, f +xn /, if it is programmed carefully to retain the value of f +xn1 / from the previous iteration. Thus, the secant method will require less time per iteration than the Newton method. The decision as to which method should be used will depend on the factors just discussed, including the difficulty or expense of evaluating f ' +xn /; and will depend on integrable human factors, such as convenience of use. Newton's method is very simple to program and to understand; but for many problems with a complicated derivative f ' +x/, the secant method will probably be faster in actual running time on a computer. The derivation of both the Newton and secant methods illustrate a general principle of numerical analysis. When trying to solve a problem for which there is no direct or simple method of solution, approximate it by another problem that you can solve more easily. In both cases, we have replaced the solution of f +x/ 0 with the solution of a much simpler root finding problem for a linear equation. The nature of the approximation being used also leads to the following observation: When dealing with problems involving differentiable functions f +x/, move to a nearby problem by approximating each such f +x/ with a linear function. The linearization of mathematical problems is common throughout in applied mathematics and numerical analysis. Finding the zeros of a given function f , that is arguments [ for which f +[/ 0, is a classical problem. In particular, determining the zeros of a polynomial (the zeros of a polynomial are also known as its roots) has captured the attention of pure and applied mathematicians for centuries. However, much more general problems can be formulated in terms of finding zeros, depending upon the definition of the function f which is a mapping of its domain D to its range R. For example, if D R 5n , then a mapping f : 5n 5n is described by n real functions ¶ fi +x1 , x2 , …, xn / and i 1, 2, …, n. The arguments will be combined in a vector x +x1 , x2 , …, xn / ¶ such that the functions fi become simply fi +x/. The collection of these n functions fi in a vector will ¶ ¶ ¶ ¶ result to f +x/ + f1 +x/, f2 +x/, …, fn +x//. ¶ The problem of solving f +x/ 0 becomes that of solving a system of (nonlinear) equations:
Mathematics for Engineers
148
¶ fi +x/
0 with i
1, 2, …, n.
(4.44)
The classical Newton method for such nonlinear systems is obtained by linearizing f . Linearization is also a means of constructing iterative methods to solve equation systems of the form (4.45) f +x¶ / 0. ¶ ¶ If we assume that ¶x [ is a zero for f , that x¶ 0 is an approximation to [, and that f is differentiable for ¶ ¶ x x0 then to a first approximation ¶ ¶ ¶ ¶ ¶ (4.46) 0 f +[/ f +x0 / Df +x0 / ,[ x0 0, where Df +x¶ 0 /
f1 x1
…
f1 xn
fn x1
…
fn xn
(4.47)
x x0
is the Jacobian of the functions f and ¶ [ x¶ 0 ,[1 +x¶ 0 /1 , …, [n +x¶0 /n 0
(4.48)
If the Jacobian Df +x¶ 0 / is nonsingular, then the equation 0 f +x¶ / Df +x¶ / +x¶ x¶ / 0
0
can be solved for x¶1 ¶x 1
1
0
¶x ,Df +x¶ /01 f +x¶ / 0 0 0
(4.49)
¶ ¶ and x1 may be taken as a closer approximation to the zero [. The generalized Newton method for solving systems of equations (4.45) is given by ¶x n1
¶x ,Df +x¶ /01 f +x¶ / with n n n n
0, 1, 2, 3, ….
(4.50)
The major problem with one point methods is to find starting points of the iteration which guarantee the convergence to the root. How delicate this problem is shows the following sequence of plots. In the plots the starting points for two equations are chosen on a regular grid. To each root of the equations there is a color assigned so that the structure of the basin of attraction is represented by the different colors. The resulting pictures show intricately interwoven sets in the plane.
IV - Chapter 4: Roots of Equations
149
Figure 4.14. Basins of attraction for the equation zn 1
0 where z
x Ç y and n
2, 3, 4, ... 10.
Example 4.5. Systems of Equations The 3 3 nonlinear system 3 x Cos#y z'
eqSystem
Æx y 20 z
10 S 3 3
1 0 2 1 2 11 1 sin+z/ 10 10
1 2
m 0, x2 81 y
1
2
10
Sin#z'
11 10
m 0,
m 0!; eqSystem ss TableForm
3 x cos+y z/ x2 81 -y Æ
x y
20 z
1 3
0
+10 S 3/ 0
can be written in the form (4.45) by defining the three coordinate functions f1 , f2 , and f3 . These functions are assigned to the rows of the given matrix.
Mathematics for Engineers
150
Solution 4.5. The Jacobi matrix of the system of equations is gained by differentiating the components with respect to the unknowns x, y, and z. Before we can evaluate this matrix we need to extract the functions f1 , f2 , and f3 from the equations. This is done by removing the equal sign in the equations in Mathematica eqMatrix
1 2
eqSystem s. Equal$l_, r_( l r
3 x Cos#y z',
11 10
x2 81
1 10
2
y
Sin#z', Æx y
1 3
+ 3 10 S/ 20 z!
The Jacobi matrix of these functions follows by using the outer product of the differential with respect to the unknown variables jacobiMatrix 3 2x Æx y y
Outer#D, eqMatrix, x, y, z'; jacobiMatrix ss MatrixForm
z Sin#y z' 1 162 , 10 x Æ y
y0 x
y Sin#y z' Cos#z' 20
To set up Newton's iteration scheme we need the inverse of the Jacobi matrix inverse
Inverse#jacobiMatrix';
This matrix is multiplied by the vector of functions f . To use this product in an iteration we replace the variables by their current iteration values in a NestedList iterateSol NestList#++x, y, z inverse.eqMatrix/ s. Thread#x, y, z Ó317, Ó327, Ó337'/ &, 1., 1., 1., 16' 1., 1., 1., 0.919624, 0.461046, 0.503385, 0.500984, 0.187983, 0.520853, 0.500531, 0.0622526, 0.521975, 0.500099, 0.0134795, 0.523249, 0.500005, 0.0029802, 0.523524, 0.5, 0.00244396, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538, 0.5, 0.00244255, 0.523538
The results become stable after 9 iterations. To check that the found numbers are solutions of the original equations we insert these values into the original equations eqSystem s. Thread#x, y, z Last#iterateSol'' True, True, True
which demonstrates that the derived final values satisfy the original equations.Ô
IV - Chapter 4: Roots of Equations
151
4.2.5 Fixed-Point Method The Newton method and the secant method are examples of one-point and two-point methods, respectively. In this section, we give a more general introduction to iteration methods, presenting a general theory for one-point iteration formulas for a single variable. As a motivational example, consider solving the equation x2 7
(4.51)
0
for the root D 7 2.64575. To find this number we use the same ideas as in Newton's approach to set up a general iteration formula which can be stated as xn1
g+xn /
(4.52)
To solve the simple problem (4.51) we introduce four iteration schemes for this equation 1. xn1 2. xn1
3. xn1 4. xn1
7 xn x2n
(4.53)
7 (4.54)
xn 1 xn 1 2
xn
1 7
x2n
(4.55)
7 xn
(4.56)
As stated above, the iterations (4.53-56) all have the form (4.52) for appropriate continuous functions g+x/. For example, with (4.53) g+x/ 7 x x2 . The formulas (4.53-56) are represented by the graphs representing g+x/ and its first order derivative, respectively.
Mathematics for Engineers
152
4 g+x/
y
2
0 2 g'+x/
4 2.0
2.2
2.4
2.6
2.8
3.0
x Figure 4.15. Graph of the function g+x/
7 x x2 and its derivative g ' +x/
1 2 x.
We can iterate the function g+x/ by using the Mathematica function NestList[] to generate a sequence of numbers related to the iteration xn1
7 xn x2n
(4.57)
which delivers the following result for a specific initial value x0
2.6 as
NestList$,7 Ó Ó 0 &, 2.6, 6( 2
2.6, 2.84, 1.7744, 5.6259, 19.0249, 373.972, 140 222.
Assuming x 2 and x 3 as the lower and upper boundary of an interval in which the actual root is located, we observe that the first order derivative of this function g is larger than one (see Figure 4.15). The absolute value of the maximum of g ' +2/ 3 which is larger than 1. Take this for the moment as an observation and remember it in the following discussion. The second iteration formula xn1
7 xn
is also graphically represented on the interval x ± #2, 3' in the following Figure 4.16:
(4.58)
IV - Chapter 4: Roots of Equations
153
g+x/
3
y
2 1 0 g'+x/ 1
2.0
2.2
2.4
2.6
2.8
3.0
x Figure 4.16. Graph of the function g+x/
7 s x and its derivative g ' +x/
7 t x2 .
Again we can use the function NestList[] to generate a sequence of numbers based on iteration formula (4.54) 7 NestList%
Ó
&, 2.6, 16)
2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6, 2.69231, 2.6
In this case we observe that the sequence using the same initial value x0 2.6 oscillated between two values which are enclosing the root we are looking for. However, contrary to the previous sequence the current sequence does not diverge. If we examine the first order derivative of this iteration we observe that the magnitude g ' +x/ is bounded and the maximum of this value is greater than one (see Figure 4.16). The next Figure 4.17 shows the graph of g+x/ for the following iteration xn1
1 xn
1 7
x2n
(4.59)
Mathematics for Engineers
154
2.5 g+x/ 2.0
y
1.5 1.0 g'+x/
0.5 0.0 2.0
2.2
2.4
2.6
2.8
3.0
x Figure 4.17. Graph of the function g+x/
1x
1 7
x2 and its derivative g ' +x/
1
2 7
x.
The related sequence of (4.59) shows convergent to a single value which in fact represents to a certain accuracy the root of x2 7. NestList% 1 Ó
1 7
Ó2 &, 2.5, 6)
2.5, 2.60714, 2.63612, 2.64339, 2.64517, 2.64561, 2.64572
For this iteration formula we also observe that the magnitude of the first order derivative of g ' is smaller than one (compare Figure 4.17). For the iteration formula (4.56) xn1
1 2
xn
7 xn
g+x/ and g' +x/ on the interval x ± #2, 3' is shown in the following Figure 4.18.
(4.60)
IV - Chapter 4: Roots of Equations
155
2.5 g+x/ 2.0
y
1.5 1.0 g'+x/
0.5 0.0 2.0
2.2
2.4
2.6
2.8
3.0
x Figure 4.18. Graph of the function g+x/
1 2
7
1
x
2
-x 1 and its derivative g ' +x/
The generation of the sequence using the initial value x0 1 NestList% 2
Ó
-1
7 x2
1.
2.6 shows
7 Ó
&, 2.6, 6)
2.6, 2.64615, 2.64575, 2.64575, 2.64575, 2.64575, 2.64575
that we approach the same value as for the iteration formula (4.59). Here again the maximum of the first order derivative of g ' +x/ is smaller than one. All four iterations have the property that if the sequence xn n 0 has a limit D, then D is a root of the defining equation. For each equation, we check this as follows: Replace xn and xn1 by D, and then show that this implies D 7 . The next lines show you the results of this calculation for the different cases, respectively. Solve$D2 D 7 D, D( D 7 Solve%
D
D
Solve% D
7 , D
7
D, D) 7 , D D2 7
7
D 1 D, D)
7 , D
7
Mathematics for Engineers
156
1 Solve%
2
D
D
7 D
D, D)
7 , D
7
To explain this results, we are now going to discuss a general theory for one-point iteration formulas which explains all the observed facts. The iterations (4.53-56) all have the same form xn1
g+xn /
(4.61)
for appropriate continuous functions g+x/. If the iterates xn converge to a point D, then limn xn1 D
limn g+xn /
(4.62)
g+D/.
(4.63)
Thus D is a solution of the equation x (4.63) a fixed point equation.
g+x/, and D is called a fixed point of the function g. We call
The next step is to set up a general approach to explain when the iteration xn1 g+xn / will converge to a fixed point of g. We begin with a lemma on the existence of solutions of x g+x/. Corollary 4.1. Fixed point Existence
Let g+x/ be a continuous function on an interval #a, b', and suppose g satisfies the property a x b and a g+x/ b. Then the equation x
(4.64)
g+x/ has at least one solution D in the interval #a, b'.
Proof 4.1. Define the function f +x/ x g+x/. It is continuous for a x b. Moreover, f +a/ 0 and f +b/ 0. By the intermediate value theorem there must be a point x in #a, b' for which f +x/ 0. We usually denote this value of x by D.
QED The geometric meaning of Corollary 4.1 is shown in Figure 4.19 for a graphical interpretation of the solution of x g+x/. The solutions D are the x-coordinates of the intersection points of the graphs of y x and y g+x/.
IV - Chapter 4: Roots of Equations
157
b
y
g+x/
y x
a D
a
b
x Figure 4.19. Example for the fixed point lemma and demonstration of the geometric meaning of the fixed point equation.
The following panel contains an animation of the interval selection for different functions g+x/ used in (4.53-56).
Mathematics for Engineers
158
a b function
7 x x2
7 x
1x
x2
1
7
2
7
- x1 x
Figure 4.20. Representation of the fixed point lemma for the different functions used in the iterations (4.53-56).
The observations so far made are formulated in the following theorem. Theorem 4.3. Contraction Mapping
Assume g+x/ and g ' +x/ are continuous for a x b, and assume g satisfies the conditions of Corollary 4.1. Further assume that O
maxaxb g' +x/ 1
Then the following statements hold
(4.65)
IV - Chapter 4: Roots of Equations
S1: There is a unique solution D of x
159
g+x/ in the interval #a, b'.
S2: For any initial estimate x0 in #a, b', the iterates xn will converge to D . S3: D xn S4:
On
1O
D x limn D xn1 n
x0 x1 , n 0 g ' +D /.
Thus for xn close to D D xn1 g ' +D/ +D xn /.
(4.66)
Proof 4.3. There is some useful information in the proof, so we go through most of the details of it. Note first that the hypotheses on g allow us to use Corollary 4.1 to assert the existence of at least one solution to x g+x/. In addition, using the mean value theorem, we have that for any two points w and z in #a, b', g+w/ g+z/
g' +c/ +w z/
(4.67)
for some c between w and z. Using the property of O in this equation, we obtain g+w/ g+z/
g ' +c/ «« w z O w z
for a w, z b.
S1: Suppose there are two solutions, denoted by D and E. Then D ing these, we find that D E
g+D/ and E
g+D/ g+E/.
(4.68) g+E/. By subtract(4.69)
Take absolute values and use the estimation from above D E O D E
(4.70)
+1 O/ D E 0.
(4.71)
Since O 1, we must have D val #a, b'.
E; and thus, the equation x
g+x/ has only one solution in the inter-
S2: From the assumptions in Corollary 4.1, it can be shown that for any initial guess x0 in #a, b', the iterates xn will all remain in #a, b'. For example, if a x0 b, then Corollary 4.1 implies a g+x0 / b. Since x1 g+x0 /, this shows x1 is in #a, b'. Repeat the argument to show that x2 g+x1 / is in #a, b', and continue the argument inductively. To show that the iterates converge, subtract xn1 D xn1
g+D/ g+xn /
g+xn / from D
g+D/, obtaining
g ' +cn / +D xn /
(4.72)
for some cn between D and xn . Using the assumption of the Lemma, we get D xn O D xn
n 0.
Inductively, we can then show that
(4.73)
Mathematics for Engineers
160
D xn On D x0 ,
n 0.
(4.74)
Since O 1, the right side of this expression goes to zero as n , and this then shows that xn D as n . This kind of convergence is also known as a Cauchy sequence. S3: If we use D xn O D xn with n
0 we get
D x0 D x1 x1 x0 O D x0 x1 x0 +1 O/ D x0 x1 x0 D x0
1 1O
(4.75) (4.76)
x1 x0 .
(4.77)
combining this with the final result of S2 we can conclude that D xn
On 1O
S4: We use D xn1 limn
x0 x1 , n 0. g+D/ g+xn /
D xn1
(4.78)
g ' +cn / +D xn / to write
limn g ' +cn /.
D xn
(4.79)
Each cn is between D and xn , and xn D, by S2. Thus, cn D. Combining this with the continuity of the function g' +x/ to obtain limn g' +cn /
g ' +D/
(4.80)
thus finishes the proof.
QED We need a more precise way to deal with the concept of the speed of convergence of an iteration method. We say that a sequence xn n 0 converges to D with an order of convergence p 1 if D xn1 c D xn p , n 0
(4.81)
for some constant c 0. The cases p 1, p 2, and p 3 are referred to as linear, quadratic, and cubic convergence, respectively. Newton's method usually converges quadratically; and the secant method has order of convergence p
,1
5 0 t 2. For linear convergence, we make the additional
requirement that c 1; as otherwise, the error D xn need to converge to zero. If g ' +D/ 1 in the preceding theorem, then the relation D xn1 O D xn shows that the iterates xn are linearly convergent. If in addition, g' +D/ 0, then the relation D xn1 g' +D/ +D xn / proves the convergence is exactly linear, with no higher order of convergence being possible. In this case, we call the value g ' +D/ the linear rate of convergence.
IV - Chapter 4: Roots of Equations
161
In practice, Theorem 4.3 is seldom used directly. The main reason is that it is difficult to find an interval #a, b' for which the conditions of the Corollary is satisfied. Instead, we look for a way to use the theorem in a practical way. The key idea is the result D xn1 g ' +D/ +D xn /, which shows how the iteration error behaves when the iterates xn are near D. Corollary 4.2. Convergence of the Fixed Point Method
Assume that g+x/ and g ' +x/ are continuous for some interval c x d, with the fixed point D contained in the interval. Moreover, assume that g ' +D/ 1.
(4.82)
Then, there is an interval #a, b' around D for which the hypotheses, and hence also the conclusion, of Theorem 4.3 are true. And if to be contrary, g ' +D / ! 1, then the iteration method xn1 g+xn / will not converge to D . When g ' +D / 1 no conclusion can be drawn. If we check the iteration formulas (4.53-56) by this corollary we observe that the first and second iteration scheme is not converging to the real root. This behavior is one of the shortcomings of the fixed point method. In general fixed point methods are only used in practice if we know the interval in which the fixed point is located and if we have a function g+x/ available satisfying the requirements of Theorem 4.3. The following examples demonstrate the application of the fixed point theorem. Example 4.6. Fixed Point Method I Let g +x/ ,x2 10 t 5 on #1, 1'. The Extreme Value Theorem implies that the absolute minimum of g occurs at x 0 and g +0/ 1 s 5. Similarly, the absolute maximum of g occurs at x 1 and has the value g+1/ 0. Moreover, g is continuous and 2x
g ' +x/
5
2 5
,
for all x ± +1, 1/.
(4.83)
So g satisfies all the hypotheses of Theorem 4.3 and has a unique fixed point in #1, 1'. Solution 4.5. In this example, the unique fixed point [ in the interval #1, 1' can be determined algebraically. If D
D2 1
g+D/
5
then D2 5 D 1
0
which, by the quadratic formula, implies that solfp D
g+x_/ :
Solve$D2 5 D 1 0, D( 1 2
.5 1 5
29 2!, D
,x2 10
1 2
.5
29 2!!
(4.84)
Mathematics for Engineers
162
g+x/ x
s. x 4
8 5 1
Note that g actually has two fixed points at [ 2 ,5 29 0. However, the second solution of {{[ -0.19258240356725187}, {[ 5.192582403567252}} is not located in the interval we selected but included in the interval #4, 6'. This second solution of the fixed point equation does not satisfy the assumptions of Theorem 4.3, since g' +4/ 8 s 5 ! 1. Hence, the hypotheses of Theorem 4.3 are sufficient to guarantee a unique fixed point but are not necessary (see Figure 4.21).
6
4 g+x/
1 2
+5
29 /
2 1 2
+5
29 /
0
1
0
1
2
3
4
5
6
x Figure 4.21. Fixed point equation g+x/
,x2 10 t 5 and the fixed points in the interval #1, 1'.Ô
To demonstrate the use of the fixed point theorem in finding roots of equations let us examine the following example. Example 4.7. Fixed Point Method II The equation x3 4 x2 10 0 has a unique root in #1, 2'. There are many ways to change the equation to the fixed-point form x g+x/ using simple algebraic manipulation. We select the following representation of the fixed point equation with g+x/
10 4x
1 2
.
(4.85)
Solution 4.7. Using the function g+x/ as represented in (4.85), we first have to check the prerequisites of Theorem 4.3. The function
IV - Chapter 4: Roots of Equations
163
10
g+x_/ :
x4
assumes the following values at the boundaries of the interval g+1/, g+2/
5
2,
3
!
showing that the interval #1, 2' is mapped to itself. The first order derivative of g+x/ generates the values
g+x/ x
1
5
s. x 1,
, 2
5 3
12
g+x/ x
s. x 2!
!
representing values which are smaller than one in magnitude. Thus the assumptions of Theorem 4.3 are satisfied and the fixed point is reached by a direct iteration to be 10 NestList%
4Ó
1s2
&, 1., 7)
1., 1.41421, 1.35904, 1.36602, 1.36513, 1.36524, 1.36523, 1.36523
which represents the root located in the interval #1, 2' (see Figure 4.22). A sufficient accuracy of the root is reached within a few iteration steps.
Mathematics for Engineers
164
2.0
1.8
g+x/
1.6
1.4
1.2
1.0 1.0
1.2
1.4
1.6
1.8
2.0
x Figure 4.22. Fixed point equation g+x/
-
10 1s2 4x
1
and the fixed points in the interval #1, 2'.Ô
So far, we did not talk about the accuracy of the fixed point method. Actually, the accuracy of the method is determined by part S3 of Theorem 4.3. This relation relates the actual fixed point to the rate of convergence rate O D xn
On 1O
x0 x1 , n 0
(4.86)
if we know the accuracy which is given by D xn number of iterations by Solve%H
On x0 x1 1O
H in the nth iteration we are able to estimate the
, n)
H +1O/
n
log- x0x1 1 log+O/
!!
This formula needs two of the iteration steps x0 and x1 and the convergence rate O x ± #a, b'.
max+g ' +x// with
Example 4.8. Fixed Point Method III We will estimate the number of iterations for the fixed point problem x
2x with x ± #1 s 3, 1'.
We are interested in an accuracy of H
(4.87) 105 .
Solution 4.8. The solution of equation (4.87) tells us that the number of iterations are related to the accuracy H and the convergence rate of the fixed point equation. The convergence rate O is determined for this equation by defining g as
IV - Chapter 4: Roots of Equations
165
g+x_/ : 2x
and its derivative g+x/
derg
x
x
2
log+2/
For the given interval we find O
max derg s. x
1 3
, derg s. x 1!
log+2/ 2
which is smaller than one. The first two iterations of the given fixed point equation follows by NestList%2Ó1 &,
initials 1 , 3
1 3
1 3
, 1)
!
2
The magnitude of the difference of these two values are Subtract initials
absdiff 1 3
2
1 3
Using these information in the formula derived above we find log-
1O
1
105 absdiff
log+O/
log
1 100 000
log-
log+2/ 2 1 3
2
1 3
log+2/ 1 2
which is about 10 to 11 iterations. We can check this by comparing the symbolic fixed point solution with the numeric solution. The symbolic fixed point follows by solving the fixed point equation x 2x as shown next
166
Mathematics for Engineers
Flatten#Solve#x 2x , x''
solution x
W+log+2// log+2/
!
The numerical iteration delivers the following results numsol
NestList%2Ó1 &,
1. , 15) 3
0.333333, 0.793701, 0.576863, 0.67042, 0.628324, 0.646928, 0.638639, 0.642319, 0.640682, 0.641409, 0.641086, 0.64123, 0.641166, 0.641194, 0.641182, 0.641187
The magnitude of the difference between the exact fixed point D and the iterations is shown in Figure 4.23 1 0.1
«Dxn «
0.01 0.001 104 105 0
5
10
15
n Figure 4.23. Error of the fixed point equation x
2x represented in a logarithmic plot. The error D xn
H decreases
continuously with an exponential decay. The error level 105 is reached after 14 iterations which is in agreement with the estimation given above.Ô
4.2.6 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 4.2.6.1 Test Problems T1. What are simple root finding methods? Give examples. T2. What is the difference between two point methods and one point methods? T3. How does a bisection method work? T4. Explain Newton's method? State the basic idea. T5. What is the Regula Falsi method? T6. How does a secant method work?
IV - Chapter 4: Roots of Equations
167
How does a fixed point method work?
4.2.6.2 Exercises E1. A long conducting rod of diameter D meters and electrical resistance R per unit length is in a large enclosure whose walls (far away from the rod) are kept at temperature Ts °C. Air flows past the rod at temperature T . If an electrical current I passes through the rod, the temperature of the rod eventually stabilizes to T, where T satisfies Q+T/
S D h +T T / S D H V ,T 4 Ts4 0 I 2 R 8
0
(1)
where V is the Stefan-Boltzmann constant V 5.67 10 • 10~8 Wattst meter Kelvin , H is the rod surface emissivity H 0.8, h is the heat transfer coefficient of air flow h=20 Wattst meter2 Kelvin, D 0.1 meter, and I 2 R 100. The temperatures T Ts 25 °C. Use the method of bisection to find the steady state temperature T of the rod. 2
4
E2. Use Regula Falsi to find the root of the polynomial x3 4 x2 10 Use the starting interval #a, b'
(2)
0
#1, 2'.
E3. The Cray 1 supercomputer does not have a divide unit. Instead, to compute 1 s R, R ! 0 it forms a "reciprocal approximation," accurate to about half of a floating point word and then uses that as the initial guess for one Newton iteration. Let f +x/ +1 s x/ R. Show that the formula for Newton's method is xk1
xk +2 R xk /
(3)
a. Using this Newton iteration compute 1s R for R 1, 2, ..., 10. Tabulate the number of iterations needed to generate a result that is accurate to six digits. Use x0 0.01. b. Using the same function f +x/ repeat the computations with the bisection algorithm and compare the amount of work to that in (a). Use the initial interval #0.01, 2'. E4. Consider using Newton's method on the equation f +x/ +x 1/2 that has a double zero. Show that for this equation the iterations converge linearly rather than quadratically. Verify numerically that the convergence is linear at a multiple zero by trying to solve f +x/ exp+2 x/ 2 exp +x/ 1 0 and printing out the error at each iterate. E5. In laying water mains, utilities must be concerned with the possibility of freezing. Although soil and weather conditions are complicated, reasonable approximations can be made on the basis of the assumption that soil is uniform in all directions. In that case the temperature in degrees Celsius T+x, t/ at a distance x (in meters) below the surface, t seconds after the beginning of a cold snap, is given approximately by T+x, t/ Ts Ti Ts
x erf 2
Dt
(4)
where Ts is the constant surface temperature during the cold period, Ti is the initial soil temperature before the cold snap, and D is the thermal conductivity of the soil (in meters2 per second). Assume that Ti 20°C, Ts 15°C, D 0.138 106 m2 /sec. Use a numerical solution method to determine how deep a water main should be buried so that it will only freeze after 60 days exposure at this constant surface temperature. Note that water freezes at 0°C. Use a 4. order polynomial to approximate the error function erf +/ in the range of the following data set. x, erf +x/ 0, 0, 1, 0.59, 2, 0.905, 3, 0.987, 4, 0.999 E6. Use Regula Falsi and Newton's method to find the ten smallest positive values of x for which the line crosses the graph of tan+x/. The solution of this problem is important in determining the maximum load that a compressed rod will support without buckling. See, for example Timoshenko (1956) p 154. Compare the Regula Falsi and Newton method with respect to convergence, number of iterations and efficient handling. E7. Find the smaller root of the quadratic x2 100 x 1 0 by (i) using the quadratic formula, (ii) fixed-point iteration. Compare the efficiency of the two processes. E8. The problem "solve x2 x 2 0" can be reformulated in several ways for fixed point iteration, like
168
Mathematics for Engineers
x2 2
x 2. x
x2
3. x
1
4. x
x
2 x x2 x2 a
where a is any constant. Which of these iterations will converge to the two zeros and if it converges, then how fast would the convergence be in each of these cases? Estimate the number of iterations required to reduce the error by a factor of 100. In 4) estimate the range of a for which the iteration converges and also estimate the value of a for which the convergence is the fastest. E9. Compute 2 and 21s5 using the Newton's method and the secant iteration. Compare their efficiencies. E10 Find all real roots of a. Æx 2
b. exp. c. x
100
x2 a x131 131x2
1 with a
1 0.005 131x2
x 0.123
20.44
21.34, 22.8911, 25.3454
x2 a x131 131x2 2
1 with a
21.34, 22.8911, 25.3454
,x 2.21 x 0.420
5 Numerical Integration
5.1 Introduction The definite integral I+ f /
b
à f +x/ Šx
(5.1)
a
is defined in calculus as a limit of what are called Riemann sums. It is then proven that I+ f /
F+b/ F+a/
(5.2)
where F+x/ is any antiderivative of f +x/; this is the Fundamental Theorem of Calculus. Many integrals can be evaluated by using this formula, and a significant portion of most calculus textbooks is devoted to this approach. Nonetheless, most integrals cannot be evaluated by using (5.2) because most integrands f +x/ do not have antiderivatives expressible in terms of elementary functions. Examples of such integrals are 1
x Ã Æ Åx 0
2
S
S Ã x sin, x 0 Å x 0
So, other methods are needed for evaluating such integrals.
(5.3)
170
Mathematics for Engineers
In the first section of this chapter, we define two of the oldest and most popular numerical methods for approximating integrals like (5.1): the trapezoidal rule and Simpson's rule. We also analyze the error in using these methods and then obtain improvements on them. A third method is Gaussian quadrature which will be discussed in addition to the two methods stated. This method is more complicated in its origin than Simpson's and the trapezoidal method, but it is almost always much superior in accuracy for similar amounts of computation.
5.2 Trapezoidal and Simpson Method The central idea behind most formulas for approximating I+ f /
b
à f +x/ Šx
(5.4)
a
is to replace f +x/ by an approximating function whose integral can be evaluated. In this section, we will look at methods based on linear and quadratic interpolation.
5.2.1 Trapezoidal Method Approximate f +x/ by a linear polynomial p1 +x/
+b x/ f +a/ +x a/ f +b/
(5.5)
ba
which interpolates f +x/ at a and b (see Figure 5.1). The integral of p1 +x/ along #a , b' is the area of the shaded trapezoid shown in Figure 5.1; it is given by
y f#x'
p1+x/
a Figure 5.1. An illustration of the trapezoidal rule (5.1).
b
IV - Chapter 5: Numerical Integration
T1 + f /
f +a/ f +b/
+b a/
2
.
171
(5.6)
This approximates the integral I+ f / if f +x/ is almost linear on #a, b'. Example 5.1. Trapezoidal Method I Approximate the integral 1
I
Ã
0
1 1x
Åx
(5.7)
Solution 5.1. The true value of this integral is 1
I0
Ã
0
1 x1
Åx
log+2/
Defining the integrand as f +x_/ :
1 x1
and applying the interpolation formula to the interval #0, 1' we get the first approximation 1 T1 2
+1 0/ + f +0/ f +1//
3 4
The approximation deviates from the exact result by N#I0 T1' 0.0568528
The approximation is exact in the first digit on the right hand side of the radix point.Ô To improve the approximation T1 + f / in (5.4) when f +x/ is not a nearly linear function on #a, b', break the interval #a, b' into smaller subintervals and apply (5.5) on each subinterval. If the subintervals are small enough, then f +x/ will be nearly linear on each one. This idea is illustrated in Figure 5.2:
Mathematics for Engineers
172
y f+x/
p1+x/
x0
x1
a
x2 b
Figure 5.2. An illustration of the trapezoidal rule T2 +f / (5.2.1).
Example 5.2. Trapezoidal Method II Approximate the integral 1
I
Ã
0
1 1x
Åx
(5.8)
by using T1 + f / on two subintervals of equal length. Solution 5.2. The true value of this integral is 1
I0
Ã
1 x1
0
Åx
log+2/
For two subintervals of equal length 1
I1
Ã
2
0
log
3 2
1 x1
log
Å x Ã1 2
1
1 x1
Åx
4 3
Defining the integrand as f +x_/ :
1 x1
and applying the interpolation formula to the intervals #0, 1 s 2' and #1 s 2, 1' we get the first approximation
IV - Chapter 5: Numerical Integration
1 1 T2
2 2
0
f +0/ f
1 2
173
1 2
1
1
f +1/ f
2
1 2
17 24
The approximation deviates from the exact result by N#I1 T2' 0.0151862
The error in T2 is about 1 s 4 of that given for T1 compare Example 5.1.Ô We will derive a general formula to simplify the calculations when using several subintervals of equal length. Let the number of subintervals be denoted by n, and let h
ba
(5.9)
n
be the length of each subinterval. The endpoints of the subintervals are given by xj
a jh
j
0, 1, …, n.
(5.10)
Then break the integral into n subintervals I+ f /
xn
b
à f +x/ Šx a
Ã
n1
f +x/ Å x
ÅÃ
x0
i 0
xi1
f +x/ Å x.
(5.11)
xi
Approximate each subinterval by using a linear interpolating polynomial such as (5.5) and noting that each subinterval #xi , xi1 ' has length h. Then n1
Tn + f /
Å i 0
h 2
+ f +xi / f +xi1 //
h
1 2
f +x0 /
1 2
n2
f +xn / Å f +xi / .
(5.12)
i 1
This is called the trapezoidal numerical integration rule. The subscript n gives the number of subintervals being used; and the points x0 , x1 , …, xn are called the numerical integration node points. The following lines show an implementation of the trapezoidal method in a few steps using formula (5.12). The implementation is straight forward and does not use special features of the evaluation.
Mathematics for Engineers
174
trapezoidalMethod$f_, x_, a_, b_ , n_( : Block%h, xval, fval, + determine the step length / ba ; h n + generate the integration nodes / xval Table#a i h, i, 0, n'; + function values are generated / fval Map#+f s. x ! Ó/ &, xval'; First#fval' ; fval317 2 Last#fval' ; fval3Length#fval'7 2 + sum up the terms according to +5.12/ / N#h Fold#Plus, 0, fval'' )
The following line is an application of the function trapezoidalMethod to the function f +x/
x2 Æx
2
trapezoidalMethod,x2 Æx , x, 0, 1, 20 2
0.18932
Before giving some numerical examples of Tn + f /, we would like to discuss the choice of n. With a sequence of increasing values of n, Tn + f / will usually be an increasingly accurate approximation of I+ f /. But which sequence of values of n should be used? If n is doubled repeatedly, then the function values used in each T2 n + f / will include all of the earlier function values used in the preceding Tn + f /. Thus, the doubling of n will ensure that all previously computed information is used in the new calculation, making the trapezoidal rule less expensive than it would be otherwise. To illustrate how function values are reduced when n is doubled, consider T2 + f / and T4 + f /. T2 + f /
h
f +x0 / 2
f +x1 /
f +x2 /
with h
2
ba 2
, x0
a, x1
ab 2
, x2
b.
(5.13)
Also T4 + f / x0
h
a, x1
f +x0 /
f +x1 / f +x2 / f +x3 /
2 3ab 4
, x2
ab 2
, x3
f +x4 /
2 a3b 4
, x4
with h
ba 4
, (5.14)
b.
Comparing the two approximations, we observe that f +x1 / and f +x3 / need to be evaluated, as the other function values are known from the lower approximation. For this and other reasons, all of our examples of Tn + f / are based on doubling n.
IV - Chapter 5: Numerical Integration
175
Example 5.3. Higher Trapezoidal Method Here we will calculate Tn + f / using trapezoidal interpolation for three different functions. The functions are: f +x_/ : Æx
2
for the interval #0, 1'. 1 g+x_/ :
x2 1
for the interval #0, 4' and 1
h+x_/ :
cos+x/ 2
for the interval #0, 2 S'. The number of iterations are n
2, 4, 8, 16, 32, 64, 128, 256.
Solution 5.3. The exact values for these functions can be determined by using the antiderivatives of the functions 1
à f +x/ Šx
I1
0
1
S erf+1/
2
4
à g+x/ Šx
I2
0
1
tan +4/
and 2S
Ã
I3
h+x/ Å x
0
2S 3
The approximation for the different functions and different iteration numbers follow from the lines below Tf
+trapezoidalMethod+ f +x/, x, 0, 1, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
0.73137, 0.742984, 0.745866, 0.746585, 0.746764, 0.746809, 0.74682, 0.746823 Tg
+trapezoidalMethod+g+x/, x, 0, 4, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
1.45882, 1.32941, 1.32525, 1.32567, 1.32578, 1.32581, 1.32582, 1.32582
Mathematics for Engineers
176
Th
+trapezoidalMethod+h+x/, x, 0, 2 S, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
4.18879, 3.66519, 3.62779, 3.6276, 3.6276, 3.6276, 3.6276, 3.6276
The errors with respect to the exact value of the integrals are H1
I1 Tf
0.0154539, 0.00384004, 0.000958518, 0.000239536, 0.0000598782, 0.0000149692, 3.74227 106 , 9.35566 107 H2
I2 Tg
0.133006, 0.0035941, 0.000564261, 0.000144082, 0.000036038, 9.01059 106 , 2.25272 106 , 5.63183 107 H3
I3 Th
0.561191, 0.0375927, 0.000192788, 5.12258 109 , 8.88178 1016 , 8.88178 1016 , 0., 8.88178 1016
These data are collected in the following table for each integral and each number of used interpolation steps. TableForm$Prepend$2, 4, 8, 16, 32, 64, 128, 256, H1, H2, H3¬ , n, "H1", "H2", "H3"(( n 2 4 8
H1 0.0154539 0.00384004 0.000958518
H2 0.133006 0.0035941 0.000564261
H3 0.561191 0.0375927 0.000192788
16
0.000239536
0.000144082
5.12258 109
32
0.0000598782
0.000036038
64
0.0000149692
9.01059 10
8.88178 1016 6
8.88178 1016
128 3.74227 106 2.25272 106 0. 256 9.35566 107 5.63183 107 8.88178 1016
From the table it is obvious that the error decreases with increasing n. The third example converges very rapidly.Ô
5.2.2 Simpson's Method To improve trapezoidal integration in formula (5.6) we have to go to higher interpolations to approximate f +x/ on #a, b'. Let p2 +x/ be the quadratic polynomial that interpolates f +x/ at a, c +a b/ s 2 and b. Using this idea in the approximation of I+ f /, we get
IV - Chapter 5: Numerical Integration
177
b
I+ f / Ã p2 +x/ Å x a
+x c/ +x b/
b
à a
+a c/ +a b/
f +a/
+x a/ +x b/ +c a/ +c b/
f +c/
+x a/ +x c/ +b a/ +b c/
(5.15) f +b/! Å x.
This integral can be evaluated directly, but it is easier to first introduce h +b a/ s 2 and then change the variable of integration. We will evaluate the first term to illustrate the general procedure. Let u x a. Then t1
Simplify%
+x b/ +x c/ +a b/ +a c/
s. b a 2 h, c a h)
+a h x/ +a 2 h x/ 2 h2 t1 s. x a u
t2
+h u/ +2 h u/ 2 h2 b
Ã
a
+x c/ +x b/ +a c/ +a b/ 1 2 h2
2h
Ã
Åx
a2 h
Ã
+x c/ +x b/ Å x
a
+u h/ +u 2 h/ Å u
0
1 2 h2
u3 3
3 2
u h 2 h u! 2
2
2h 0
h 3
(5.16) .
The complete evaluation of (5.15) yields S2 + f /
h 3
% f +a/ 4 f
ab 2
f +b/).
The method is illustrated in Figure 5.3:
(5.17)
Mathematics for Engineers
178
Figure 5.3. An illustration of Simpson's rule S2 +f / (5.17).
The symbolic derivation of his formula takes the same steps as discussed above. First define the polynomial p2+x_/ :
E+a/ ++x b/ +x c// +a b/ +a c/
E+b/ ++x a/ +x c// +b a/ +b c/
E+c/ ++x a/ +x b// +c a/ +c b/
Then replace the points of interpolation with the specific values res1
Simplify%p2+x/ ss. c
ab 2
, b a 2 h!)
E+a/ ,a2 3 a h 2 a x 2 h2 3 h x x2 0 +a x/ +2 +a 2 h x/ E+a h/ +a h x/ E+a 2 h// 2 h2
and integrate the resulting polynomial with respect to x a2 h
Simplify%Ã
res1 Å x s. h
a
1 6
+a b/ 4 E
ab 2
In this formula we used E+x/
ba 2
)
E+a/ E+b/
f +x/.
Example 5.4. Simpson's Rule I Approximate the integral 1
I
Ã
0
1 1x
Åx
by using Simpson's rule.
(5.18)
IV - Chapter 5: Numerical Integration
179
Solution 5.4. First define the function 1
f +x_/ :
x1
and the step h ba
h
2
s. a 0, b 1
1 2
Then use the rule which delivers 1 S2
f +0/ f +1/ 4 f
3
01 2
h
25 36
The error now is H0
1
1
Ã
x1
0
log+2/
Å x S2
25 36
N#H0' 0.00129726
To compare this with the trapezoidal rule, use T2 , since the number of function evaluation is the same for both S2 and T2 by a factor of about 11, a significant increase in accuracy N#T2 S2' 0.0138889 N%
T2 S2 H0
)
10.7063
The accuracy is much better in a Simpson integration than in a trapezoidal integration.Ô The rule S2 + f / will be an accurate approximation to I+ f / if f +x/ is nearly quadratic on #a, b'. For other cases, proceed in the same manner as for the trapezoidal rule. Let n be an even integer, h +b a/ s n, and define the evaluation points for f +x/ by xj
a jh
j
0, 1, 2, …, n.
(5.19)
180
Mathematics for Engineers
Follow the ideas as stated for the trapezoidal method to introduce subintervals into the interval #a, b' #x0 , xn '. The intervals should be not too small and containing three interpolation points. Thus, xn
b
I+ f /
à f +x/ Šx a
Ã
n1
f +x/ Å x
x0
ÅÃ i 0
xi1
f +x/ Å x
(5.20)
xi
Approximate each subinterval by (5.17). This yields I+ f /
h
# f +x0 / 4 f +x1 / f +x2 / 3 f +x2 / 4 f +x3 / f +x4 / … f +xn2 / 4 f +xn1 / f +xn /'.
(5.21)
If these terms are combined and simplified, we obtain the formula Sn + f /
h 3
f +x0 / 4 f +x1 / 2 f +x2 / … 2 f +xn2 / 4 f +xn1 / f +xn /.
(5.22)
This is called Simpson's rule, and it has been among the most popular numerical integration methods for more than two centuries. The index n gives the number of subdivisions used in defining the integration node points x0 , x1 , x2 , …, xn . The following Mathematica function shows a simple implementation of Simpson's rule. simpsonMethod$f_, x_, a_, b_ , n_( : Block%h, xval, fval, intval, + determine the step length / ba ; h n + generate the integration nodes / xval Table#a i h, i, 0, n'; + evaluate the function values / fval Map#+f s. x ! Ó/ &, xval'; intval First#fval', Last#fval'; intval Join#intval, 2 Take#fval, 2, Length#fval' 1''; intval Flatten#Join#intval, 2 Take#fval, 2, Length#fval' 1, 2'''; + sum up the terms / h N% Fold#Plus, 0, intval') 3 )
The next line shows an application of the function to f +x/ simpsonMethod,Æx , x, 0, 2, 300 2
0.882081
Æx for x ± #0, 2'. 2
IV - Chapter 5: Numerical Integration
181
Example 5.5. Higher Simpson Method Here we will calculate Sn + f / using Simpson's interpolation for three different functions. The functions are: f +x_/ : Æx
2
for the interval #0, 1'. 1 g+x_/ :
x2 1
for the interval #0, 4' and 1
e+x_/ :
cos+x/ 2
for the interval #0, 2 S'. The number of iterations are n
2, 4, 8, 16, 32, 64, 128, 256.
Solution 5.5. The exact values for these functions can be determined by using the antiderivatives of the functions 1
à f +x/ Šx
I1
0
1
S erf+1/
2
4
à g+x/ Šx
I2
0
1
tan +4/
and 2S
Ã
I3
e+x/ Å x
0
2S 3
The approximation for the different functions and different iteration numbers follow from the lines below Sf
+simpsonMethod+ f +x/, x, 0, 1, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
0.74718, 0.746855, 0.746826, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824 Sg
+simpsonMethod+g+x/, x, 0, 4, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
1.23922, 1.28627, 1.32387, 1.32581, 1.32582, 1.32582, 1.32582, 1.32582 Se
+simpsonMethod+e+x/, x, 0, 2 S, Ó1/ &/ s 2, 4, 8, 16, 32, 64, 128, 256
4.88692, 3.49066, 3.61532, 3.62753, 3.6276, 3.6276, 3.6276, 3.6276
Mathematics for Engineers
182
The errors with respect to the exact value of the integrals are H1
I1 Sf
0.000356296, 0.000031247, 1.98772 106 , 1.24623 107 , 7.79456 109 , 4.87245 1010 , 3.04541 1011 , 1.90337 1012 H2
I2 Sg
0.086602, 0.0395432, 0.00195038, 4.02219 106 , 2.33365 108 , 1.4606 109 , 9.13321 1011 , 5.70899 1012 H3
I3 Se
1.25932, 0.13694, 0.0122738, 0.0000642559, 1.70753 109 , 8.88178 1016 , 8.88178 1016 , 4.44089 1016
These data are collected in the following table for each integral and each number of used interpolation steps. TableForm$Prepend$2, 4, 8, 16, 32, 64, 128, 256, H1, H2, H3¬ , n, "H1", "H2", "H3"(( n 2 4
H1 0.000356296 0.000031247
H2 0.086602 0.0395432
8
1.98772 106
0.00195038
0.0122738
16
1.24623 107
4.02219 106
0.0000642559
32
7.79456 109
2.33365 108
1.70753 109
64
4.87245 10
10
1.4606 10
H3 1.25932 0.13694
9
8.88178 1016
128 3.04541 1011 9.13321 1011 8.88178 1016 256 1.90337 1012 5.70899 1012 4.44089 1016
From the table it is obvious that the error decreases with increasing n. The third example converges very rapidly.A comparison of this table with the trapezoidal results shows that a certain accuracy is reached within a reduced number of steps.Ô
5.2.3 Generalized Integration Rules The generalization of the Trapezoidal and Simpson's rule are known as Newton-Cotes formulas. The integration (quadrature) formulas by Newton-Cotes can be derived if the integrand f is replaced by an appropriate interpolation polynomial pn +x/ so that b
b
à f +x/ Šx à pn +x/ Šx a
a
For the following calculation we assume an equidistant partition of the interval #a, b' with
(5.23)
IV - Chapter 5: Numerical Integration
a i h and i
xi
183
0, 1, 2, …, n
(5.24)
where h +b a/ s n and n ! 0 and n ± 10 . If we assume that the interpolation polynomial is generated by a Lagrange polynomial of order n then we have n
pn +x/
n
Å fi Li +x/ with Li +x/ i 0
Ç k 0
x xk (5.25)
xi xk
ki
and fi f +xi / for i 0, 1, 2, …, n. Introducing new variables with x Lagrange polynomial in a simpler form n
Li +x/
Ii +t/
Ç k 0
tk ik
a h t we can rewrite the
.
(5.26)
ki
This representation can be used in the integration of the polynomial pn +x/ as n
b
à pn +x/ Šx a
n
b
Šfi à Li +x/ Šx a
i 0
n
n
h Šfi à Ii +t/ Št i 0
0
h Å fi Di .
(5.27)
i 0
The weights Di are functions of n only and do not depend on the function values of f . Defining the Lagrange polynomials in their reduced representation is straight forward in Mathematica and allows us to use them in the calculation of the weights. The reduced Lagrange polynomials are defined by n
lagrangePolynomial+n_, i_/ : Ç If%i k, k 0
tk ik
, 1)
This function delivers the corresponding function Ii +t/ by lagrangePolynomial+3, 1/ 1 2
+2 t/ +3 t/ t
These functions can be used in the calculations of the coefficients Di . The following calculation lists the weights for n 2 corresponding to the 1 s 3 Simpson rule n
n
2; Table%Ã lagrangePolynomial+n, i/ Å t, i, 0, n) 0
1 4 1 , , ! 3 3 3
The weights in general satisfy the property
184
Mathematics for Engineers
n
Å Di
n
(5.28)
i 0
We can check this relation with our function. n
4; Fold%Plus, 0, Table%Ã lagrangePolynomial+n, i/ Å t, i, 0, n))
n
0
4
The related approximation of the integral follows next by summing over the corresponding function values. n
8; Factor%Fold%Plus, 0, Simplify%Table%h f +a h i/ Ã lagrangePolynomial+n, i/ Å t, i, 0, n))))
n
0
1 14 175
4 h +5888 f +a h/ 928 f +a 2 h/ 10 496 f +a 3 h/ 4540 f +a 4 h/ 10 496 f +a 5 h/ 928 f +a 6 h/ 5888 f +a 7 h/ 989 f +a 8 h/ 989 f +a//
representing Simpson's basic rule of integration. To get the higher order approximations we define a relation in Mathematica which delivers the quadrature formulas. integrationRulesLagrange$n_, h_( : n
Fold%Plus, 0, Table%h f#a i h' Ã lagrangePolynomial#n, i' Åt, 0
i, 0, n) ss Simplify) ss Factor
The function integrationRulesLagrange allows us to generate the integration formulas in a straight forward way. The next line sets up a table of these formulas up to order 6. TableForm$i, integrationRulesLagrange#i, h'6i 1 , TableHeadings None, n, Formula( n Formula 1 2 3 4 5 6
1 h + f +a h/ f +a// 2 1 h +4 f +a h/ f +a 2 h/ f +a// 3 3 h +3 f +a h/ 3 f +a 2 h/ f +a 3 h/ f +a// 8 2 h +32 f +a h/ 12 f +a 2 h/ 32 f +a 3 h/ 7 f +a 4 h/ 7 f +a// 45 5 h +75 f +a h/ 50 f +a 2 h/ 50 f +a 3 h/ 75 f +a 4 h/ 19 f +a 5 h/ 19 f +a// 288 1 h +216 f +a h/ 27 f +a 2 h/ 272 f +a 3 h/ 27 f +a 4 h/ 216 f +a 5 h/ 140
41 f +a 6 h/ 41 f +a//
IV - Chapter 5: Numerical Integration
185
The first formula is known as Trapezoidal rule, the second as Simpson's rule, the third is called 3 s 8 Simpson rule, the forth is the rule established by Milne, the fifth has no specific name, and the sixth is Weddle's rule to approximate integrals. For larger orders n ! 6 the function integrationRulesLagrange does not deliver useful formulas because of the insensitivity of the integral due to changes in the function f . In principle, we can generate these formulas but they are not drastically improving the accuracy of the integral. This behavior is discussed in more detail in the following example. Example 5.6. Error Estimation Recall the examples we discussed in the previous section, with 1
I
Ã
0
1 1x
Åx
ln+2/.
(5.29)
Solution 5.6. Here f +x/ 1 s +1 x/, #a, b' will give us the related results and errors. f +x_/ :
#0, 1', and h
+b a/ s n. The table from above
1 x1
TableForm% Table%i, integrationRulesLagrange i,
1 i
, log+2./ integrationRulesLagrange i,
1 i
!, i, 1, 10) s.
a 0., TableHeadings None, "Formula", "Value", "Error") Formula 1 2 3 4 5
Value 0.75 0.694444 0.69375 0.693175 0.693163
Error 0.0568528 0.00129726 0.000602819 0.0000274226 0.0000158485
6
0.693148 8.81695 107
7
0.693148 5.52783 107
8
0.693147 3.39735 108
9
0.693147 2.22241 108
10
0.693147 1.45038 109
It is obvious from this table that the higher order interpolation polynomials reduce the error of approximation. However, within the range of 6 to 10 there is only a marginal improvement of the error.Ô
Mathematics for Engineers
186
5.2.4 Error Estimations for Trapezoidal and Simplson's Rule In the preceding section, numerical results for all integrands but one showed a regular behavior in the error for both trapezoidal and Simpson rules. To explain this regular behavior we consider error formulas for these integration methods. These formulas will lead to a better understanding of the methods, showing both their weakness and strengths, and they will allow improvements of the methods. We begin by examining the error of the trapezoidal rule. Theorem 5.1. Trapezoidal Error Estimation
Let f +x/ have two continuous derivatives on #a, b', and let n be a positive integer. Then for the error in integrating b
I+ f /
à f +x/ Šx
(5.30)
a
using the trapezoidal rule Tn + f / of (5.6), we have EnT + f /
I+ f / Tn + f /
h2 +b a/ 12
f '' +cn /
(5.31)
The number cn is some unknown point in #a, b', and h
+b a/ s n.
Formula (5.31) can be used to bound the error in Tn + f /, generally by bounding the term f '' +cn / by its largest possible value on the interval #a, b'. This will be illustrated in the following example. Also note that the formula for EnT + f / is consistent with the behavior of the error observed in the calculations (examine this). Example 5.7. Error Estimation Recall the examples we discussed in the previous section, with 1
I
Ã
0
1
Åx
1x
ln+2/.
(5.32)
Solution 5.7. Here f +x/ 1 s +1 x/, #a, b' error formula (5.31), we obtain EnT + f /
h2 12
f '' +cn /
0 cn 1, h
#0, 1', and f '' +x/
1 s n.
2 t +1 x/3 . Substituting into the
(5.33)
The formula cannot be computed exactly because cn is not known. But we can bound the error by looking at the largest possible value for f '' +cn / . Bound f '' +x/ on #a, b' #0, 1' max0x1 Then
2 +1 x/3
2.
(5.34)
IV - Chapter 5: Numerical Integration
EnT + f / For n
h2 12
1 and n E1T + f /
h2
+2/
6
187
(5.35)
.
2, we have 1 6
and E2T + f /
+1 s 2/2 6
(5.36)
.
Comparing these results with the true errors, we see that these bounds are two to three times the actual errors.Ô A possible weakness in the trapezoidal rule can be inferred from the assumptions of Theorem 5.1. If f +x/ does not have two continuous derivatives on #a, b', then does Tn + f / converge more slowly? The answer is yes for some functions, especially if the first derivative is not continuous. The error formula (5.31) can only be used to bound the error, because f '' +cn / is unknown. This will be improved on by a more careful consideration of the error formula. A central element of our proof of (5.31) leis in being able to demonstrate the n #D, D h': Dh
Ã
D
f +x/ Å x h%
f +D/ f +D h/ 2
)
h3 12
1 case for an interval
f '' +c/
(5.37)
for some c in #D, D h'. We will use this formula to obtain the general formula (5.31) in Theorem 5.1. Recall the derivation of the trapezoidal rule Tn + f / as given in (5.11). Then EnT + f / xn
b
à f +x/ Šx Tn + f / a
Ã
x0
n1
f +x/ Å x Tn + f /
ÅÃ i 0
xi1
f +x/ Å x h f +xi / f +xi1 /.
(5.38)
xi
Apply (5.37) to each of the terms on the right side of (5.38), to obtain EnT + f /
h3 12
n
Å f '' +Ji /.
The unknown constants J1 , J2 , …, Jn are located #x0 , x1 ', #x1 , x2 ', …#xn1 , xn '. By factoring (5.39), we obtain EnT + f /
h2 12
(5.39)
i 1
in
the
respective
subintervals
n
Å h f '' +Ji /.
(5.40)
i 1
The sum in (5.40) is equal to +b a/ f '' +cn / for some cn ± #a, b', thus obtaining the general case of (5.31). To estimate the trapezoidal error, observe that the term of the sum in (5.40) is a Riemann sum for the
Mathematics for Engineers
188
integral b
à f '' +x/ Šx
f ' +b/ f ' +a/.
(5.41)
a
The Riemann sum is based on the partition of #a, b' as n , this sum will approach the integral (5.41). Using (5.41) to estimate the right side of (5.40), we find that EnT + f /
h2 12
# f ' +b/ f ' +a/'.
(5.42)
qT This error estimate will be denoted En + f /. It is called the asymptotic estimate of the error because it qT improves as n increases. As long as f ' +x/ is computable En + f / will be very easy to be derived. Example 5.8. Error Estimation for Trapezoidal Rule Again consider the case 1
I
Ã
0
1 1x
Å x.
(5.43)
Solution 5.8. Then f ' +x/ qT En + f / For n
h2
%
1
12 +1 1/
1 and n
qT E1 + f /
1
qT E2 + f /
1
2
1 t +1 x/2 and (5.42) yields the estimate 1 +1 0/
2
)
h2 16
with h
1 n
.
(5.44)
2
16
64
0.0625
(5.45)
0.0156
(5.46)
These compare quite closely to the true errors calculated above.Ô qT The estimate En + f / has several practical advantages over the earlier error formula (5.31). First, it confines that when n is doubled the error decreases by a factor of about 4, provided that f ' +b/ f ' +a/ 0. This agrees with the results for our calculations in Example 5.6. Second (5.42) implies that the convergence of Tn + f / will be more rapid when f ' +b/ f ' +a/ 0. This is a partial explanation of the very rapid convergence observed in the third integral in Example 5.6. Finally, (5.42) qT leads to a more accurate numerical integration formula by taking En + f / into account: I+ f / Tn + f /
h2 12
which can be write as
+ f ' +b/ f ' +a//
(5.47)
IV - Chapter 5: Numerical Integration
I+ f / Tn + f /
h2 12
189
+ f ' +b/ f ' +a//.
(5.48)
This formula is called the corrected trapezoidal rule, and it will be denoted by CTn + f /. This correction needs an extension of our formula defined above. The following lines define the extension of the formula. correctedTrapezoidalMethod$f_, x_, a_, b_ , n_( : Block%h, h +b a/ s n; fp x f; trapezoidalMethod#f, x, a, b, n'
h2 12
++fp s. x ! b/ +fp s. x ! a//
)
Example 5.9. Application of CTn + f / Let us evaluate the integral 1
I
x Ã Æ Åx 2
0.746824
(5.49)
0
and compare it with different orders of approximations by Tn + f / and CTn + f /. Solution 5.9. The exact value of this integral is 1
x Ã Æ Åx
I0
2
0
1 2
S erf+1/
where Erf+1/ is the so called error function at 1. Using the numerical procedures for Tn + f / and CTn + f / for orders of n
2, 4, 8, 16, 32, 64, 128, 256, 512;
then the trapezoidal method delivers the results Tn
,trapezoidalMethod,Æx , x, 0, 1, Ó10 &0 s n 2
0.73137, 0.742984, 0.745866, 0.746585, 0.746764, 0.746809, 0.74682, 0.746823, 0.746824
For the corrected trapezoidal method the results are CTn
,correctedTrapezoidalMethod,Æx , x, 0, 1, Ó10 &0 s n 2
0.746699, 0.746816, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824
The errors of the two methods are
Mathematics for Engineers
190
H1
I0 Tn
0.0154539, 0.00384004, 0.000958518, 0.000239536, 0.0000598782, 0.0000149692, 3.74227 106 , 9.35566 107 , 2.33891 107 H2
I0 CTn
0.000125571, 7.9575 106 , 4.98589 107 , 3.11797 108 , 1.949 109 , 1.21817 1010 , 7.61358 1012 , 4.75842 1013 , 2.9643 1014
The following table collects these information Prepend#Transpose#n, H1, H2, H2 s RotateLeft#H2'', "n", "H1", "H2", "Ratio"' ss TableForm n 2
H1 0.0154539
H2 0.000125571
Ratio 15.7802
4
0.00384004
7.9575 106
15.96
8
0.000958518
4.98589 107
15.9908
8
16
0.000239536
3.11797 10
32
0.0000598782
1.949 109
64
0.0000149692
1.21817 1010 16.
15.9978 15.9994
128 3.74227 106 7.61358 1012 16.0002 256 9.35566 107 4.75842 1013 16.0524 512 2.33891 107 2.9643 1014
2.36065 1010
The table shows that the corrected trapezoidal converges quite rapidly compared with the conventional method Tn + f /. When n is doubled the error in CTn + f / decreases by a factor of about 16.Ô 5.2.4.1 Error Formulas for Simpson's Rule
The type of analysis used in the preceding discussion can also be used to derive corresponding error formulas for Simpson's rule. These are stated in the following theorem, with the proof omitted. Theorem 5.2. Error for Simpson's Rule
Assume f +x/ has four continuous derivatives on #a, b', and let n be an even positive integer. Then the error in using Simpson's rule is given by EnS + f /
I+ f / Sn + f /
h4 +b a/ 180
f +4/ +cn /
with cn an unknown point in #a, b' and h the asymptotic error formula
(5.50)
+b a/ s n. Moreover, this error can be estimated with
IV - Chapter 5: Numerical Integration
qS En + f /
h4
180
191
+ f ''' +b/ f ''' +a//.
(5.51)
Note that (5.50) say that Simpson's rule is exact for all f +x/ that are polynomials of degree 3, whereas the quadratic interpolation on which Simpson's rule is based is exact only for f +x/ a polynomial of degree 2. The degree of precision being 3 leads to the power h4 in the error, rather than the power h3 , which would have been produced on the basis of the error in quadratic interpolation. It is this higher power h4 in the error and the simple form of the method that historically have caused Simpson's rule to be the most popular numerical integration rule. Example 5.10. Error in Simpson's Rule Estimate the error for the integral 1
I
Ã
0
1 1x
Åx
(5.52)
by applying Simpson's error formula. Solution 5.10. First let us define the function and the derivatives of third and fourth order 1
f +x_/ :
x1
f +x/ 3
f3
xxx 6 +x 1/4 4 f +x/
f4
xxxx 24
+x 1/5
The exact error is given by
EnS + f /
h4 180
f +4/ +cn /,
h
1 n
for some 0 cn 1. We can bound it by
(5.53)
Mathematics for Engineers
192
1
H
180
,h4 0 f4 s. x 0
2 h4 15
The asymptotic error is given by 1
Ha
180
,h4 0 ++f3 s. x 1/ +f3 s. x 0//
h4 32
qS Now if we set n 2, E2 + f / 0.00195; for comparison the actual error resulting from the calculations by applying Simpson's rule is 0.001308.Ô The behavior in I+ f / Sn + f / can be derived from (5.51). When n is doubled, h is halved, and h4 decreases by a factor of 16. Thus, the error EnS + f / should decrease by the same factor, provided that f ''' +b/ f ''' +a/. This is the error behavior observed in Example 5.9. The theory of asymptotic error formulas a
En + f / En + f /
(5.54)
such as for EnT + f / and EnT + f /, says that (5.54) is valid provided that a
limn
En + f / En + f /
(5.55)
1.
The needed size of n in (5.54) will vary with the integrand f , which is illustrated by Example 5.9. From (5.51) and (5.50), we also are lead to infer that Simpson's rule will not perform as well if f +x/ is not four times continuously differentiable on #a, b'. This is correct for most such functions, and other numerical methods are often necessary for integrating them. Example 5.11. Simpson's Rule with Slow Convergence Use Simpson's Rule to approximate 1
I
Ã
0
x Åx
2 3
.
(5.56)
Solution 5.11. The integration for different orders of iteration is done by S2
-simpsonMethod- x , x, 0, 1, Ó11 &1 s 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024
0.638071, 0.656526, 0.663079, 0.665398, 0.666218, 0.666508, 0.666611, 0.666647, 0.66666, 0.666664
The error of the numerical integration with respect to the exact value is
IV - Chapter 5: Numerical Integration
2
H
3
193
S2
0.0285955, 0.0101404, 0.00358739, 0.00126848, 0.000448484, 0.000158564, 0.0000560607, 0.0000198205, 7.00759 106 , 2.47756 106
The ratio of two consequent values is determined by ratio
H RotateLeft#H'
2.81996, 2.82668, 2.8281, 2.82837, 2.82842, 2.82843, 2.82843, 2.82843, 2.82843, 0.0000866416
The error and the ratio of the errors is collected in the following table Prepend#Transpose#2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, H, ratio', "n", "H", "Ratio"' ss TableForm n 2 4 8 16 32 64 128 256
H 0.0285955 0.0101404 0.00358739 0.00126848 0.000448484 0.000158564 0.0000560607 0.0000198205
512
7.00759 106 2.82843
Ratio 2.81996 2.82668 2.8281 2.82837 2.82842 2.82843 2.82843 2.82843
1024 2.47756 106 0.0000866416
The column Ratio shows that the convergence is much slower.Ô As was done by the trapezoidal rule, a corrected Simpson's rule can be defined: CSn + f /
Sn + f /
h4 180
+ f ''' +b/ f ''' +a//.
(5.57)
This will be usually a more accurate approximation than Sn + f /. The corrected Simpson's rule can be implemented by the following lines correctedSimpsonsMethod$f_, x_, a_, b_ , n_( : Block%h, h +b a/ s n; fp x,x,x f; simpsonMethod#f, x, a, b, n' )
h4 180
++fp s. x ! b/ +fp s. x ! a//
Mathematics for Engineers
194
Example 5.12. Application of CSn + f / Let us evaluate the integral 1
I
x Ã Æ Åx 2
0.746824
(5.58)
0
and compare it with different orders of approximations by Sn + f / and CSn + f /. Solution 5.12. The exact value of this integral formula 1
x Ã Æ Åx
I0
2
0
1
S erf+1/
2
where Erf+1/ is the so called error function at 1. Using the numerical procedures for Sn + f / and CSn + f / for orders of 2, 4, 8, 16, 32, 64, 128, 256, 512;
n
Then the trapezoidal method delivers the results Sn
,simpsonMethod,Æx , x, 0, 1, Ó10 &0 s n 2
0.74718, 0.746855, 0.746826, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824
For the corrected trapezoidal method the results are CSn
,correctedSimpsonsMethod,Æx , x, 0, 1, Ó10 &0 s n 2
0.746669, 0.746823, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824, 0.746824
The errors of the two methods are H1
I0 Sn
0.000356296, 0.000031247, 1.98772 106 , 1.24623 107 , 7.79456 109 , 4.87245 1010 , 3.04541 1011 , 1.90337 1012 , 1.19016 1013 H2
I0 CSn
0.000154648, 6.87001 107 , 8.15866 109 , 1.18803 1010 , 1.82354 1012 , 2.84217 1014 , 5.55112 1016 , 0., 0.
The following table collects this information
IV - Chapter 5: Numerical Integration
195
Prepend#Transpose#n, H1, H2, H2 s RotateLeft#H2'', "n", "H1", "H2", "Ratio"' ss TableForm n 2
H1 0.000356296
4
0.000031247
H2 0.000154648
Ratio 225.105
6.87001 107
84.205
8.15866 109
68.6738
1.98772 10
6
16
1.24623 10
7
32
7.79456 109
64
4.87245 1010 2.84217 1014 51.2
8
1.18803 10
10
65.1497
1.82354 1012 64.1602
128 3.04541 1011 5.55112 1016 ComplexInfinity 256 1.90337 1012 0.
Indeterminate
512 1.19016 1013 0.
0.
The table shows that the corrected Simpson's converges quite rapidly compared with the conventional method Sn + f /. When n is doubled the error in CSn + f / decreases by a factor of about 68.Ô
5.2.5 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 5.2.5.1 Test Problems T1. What is the basic idea of the trapezoidal rule? T2. How does Simpson's rule work? T3. How can the accuracy of a simple integration rule be improved? T4. What is the idea of introducing correction terms?
5.2.5.2 Exercises E1. Use the simple integration formulas like Trapezoidal and Simpson to approximate the following integrals : Ss2
a. ¼0 Æsin+x/ Å x 1
b. ¼0 Æ+x1/sx Å x 1
c. ¼0 ln+1 s x/ sin+x/ Å x 1
d. ¼0 x5 Æx Å x 2
E2. Evaluate the following integrals using composite Simpson' s rule with n
3 and n
5 subintervals:
1
a. ¼0 xx Å x 0.984 x
b. ¼0
x Åx
E3. Derive the Simpson's 1 s 3 rule and 3 s8 rule, using integrating the appropriate interpolating polynomial. E4. Evaluate the following integrals accurately, using simple Trapezoidal and Simpson's formulas: 1
a. ¼0 Æ5 x 1
cos+10 x/ x
Åx
b. ¼0 xm +ln+1 s x//n Å x for m
0, 1 s2, 1.1 and n
1, 2, 3
196
Mathematics for Engineers
1
¼0 ln+1 s x/ sin+x/ Å x 1
d. ¼0
1 bsin+20 S x/
Åx
for b
2, 1.1, 1.01, 1.001, …
E5. Consider the integral 1
Ã
I
0
1 1x
Åx
(1)
and solve it either with the error correction formulas for Trapezoidal or Simpson's rule. E6. Suppose a body of mass m is traveling vertically upward starting at the surface of the earth. If all resistance except gravity is neglected, the escape velocity v is given by v2
2gRÃ
[2 Å [ where [
1
x R
,
(2)
R 6 378 137 m is the radius of the earth, and g 9.81 mt s2 is the force of gravity at the earth's surface. Approximate the escape velocity v. E7. The Laguerre polynomials L0 +x/, L1 +x/, … form an orthogonal set on #0, / and satisfy ¼0 Æx Li +x/ L j +x/ Å x 0 for i j. The polynomial Ln +x/ has n distinct zeros x1 , x2 ,…, xn in #0, /. Let
Ã
cn,i
0
n
Æx Ç
x xj
Å x.
(3)
Åcn,i f +xi /
(4)
j 1
xi x j
i j
Show that the quadrature formula
Ã
0
f +x/ Æx
n
i 1
has degree of precision 2 n 1. E8. Use the transformation t x1 and then the Composite Simpson's rule and the given values of n to approximate the following improper integrals.
1
a. ¼1 b. c. d.
Å x, n
4
x2 9 4 ¼1 x sin+x/ Å x, n cos+x/ ¼1 x3 Å x, n 6 1 ¼1 1x4 Å x, n 4
6
E9. Use the Composite Simpson's rule and the given values of n to approximate the following improper integrals. 1
Æx
2
1x x Æx
a. ¼0 b. ¼0
3
Å x, n
+x1/2
Å x, n
6 8
IV - Chapter 5: Numerical Integration
197
5.3 Gaussian Numerical Integration The numerical methods studied in the last section were based on integrating linear and quadratic interpolating polynomials, and the resulting formulas were applied on subdivisions of ever smaller subintervals. In this section, we consider a numerical method that is based on the exact integration of polynomials of increasing degree; no subdivision of the integration interval is used. To motivate this approach, recall from Section 2.4 of Chapter 2 the material on approximation of functions. Let f +x/ be continuous on #a, b'. Then Un + f / denotes the smallest error bound that can be attained in approximating f +x/ with a polynomial pn +x/ of degree n on the given interval a x b. The polynomial pn +x/ that yields this approximation is called the minimax approximation of degree n for f +x/, maxaxb f +x/ pn +x/
Un + f /
(5.59)
and Un + f / is called the minimax error. From Theorem 3.1 of Chapter 3, it can be seen that Un + f / will often converge to zero quite rapidly. If we have a numerical integration formula to integrate low- to moderate-degree polynomials exactly, then the hope is that the same formula will integrate other functions f +x/ almost exactly, if f +x/ is well approximated by such polynomials. To illustrate the derivation of such integration formulas, we restrict our attention to the integral I+ f /
1
à f +x/ Šx.
(5.60)
1
Its relation to integrals along other intervals #a, b' will be discussed later. The integration formula is to have the general form n
In + f /
Å w j f ,x j 0
(5.61)
j 1
and we require that the nodes x1 , …, xn and weights w1 , …, wn be so chosen that In + f / all polynomials f +x/ of as large a degree as possible. Case n
I+ f / for
1 The integration formula has the form
1
à f +x/ Šx w1 f +x1 /.
(5.62)
1
It is to be exact for polynomials of as large a degree as possible. Using f +x/ 1 and forcing equality in (5.62) gives us 2
w1 .
(5.63)
Mathematics for Engineers
198
Now use f +x/
x and again force equality in (5.62). Then
w1 x1
0
which implies x1
(5.64) 0. Thus (5.62) becomes
1
à f +x/ Šx 2 f +0/ I1 + f /.
(5.65)
1
This is the midpoint formula from the trapezoidal approximation. The formula (5.65) is exact for all linear polynomials. To see that (5.65) is not exact for quadratics, let f +x/ 1
2
1
3
2 2 Ã x Å x 2 +0/
x2 . Then the error in (5.65) is given by
0.
(5.66)
2 The integration formula is
Case n 1
à f +x/ Šx w1 f +x1 / w2 f +x2 /.
(5.67)
1
and it has four unspecified quantities: x1 , x2 , w1 , and w2 . To determine these, we require it to be exact for the four monomials f +x/ ± 1, x, x2 , x3 .
(5.68)
This leads to the four equations 2 0
w1 w2 w1 x1 w2 x2
2 3
w1 x21 w2 x22
0
w1 x31 w2 x32
(5.69)
This is a nonlinear system in four unknowns; equations
2 w1 w2, 0 w1 x1 w2 x2,
equations ss TableForm 2 w1 w2 0 w1 x1 w2 x2 2 3
w1 x12 w2 x22
0 w1 x13 w2 x23
its solution can be shown to be
2 3
w1 x12 w2 x22 , 0 w1 x13 w2 x23 !;
IV - Chapter 5: Numerical Integration
199
Solve#equations, x1, x2, w1, w2'
solution
1
w1 1, w2 1, x2
, x1
1
3
!, w1 1, w2 1, x2
3
1
, x1
3
1
!!
3
This yields the integration formula Clear# f ' w1 f +x1/ w2 f +x2/ s. solution
interpol f
1
f
1
3
, f
3
1
f
3
1
!
3
Thus the second order interpolation formula becomes I2 + f /
f
1
f
3
1 3
1
à f +x/ Šx. 1
(5.70)
From being exact for the monomials in (5.68), one can show this formula will be exact for all polynomials of degree 3. It also can be shown by direct calculation to not be exact for the degree 4 polynomial f +x/ x4 . Thus I2 + f / has degree of precision 3. For cases n 3 there occurs a problem in the solution for the weights and the interpolation points because the determining system of equations becomes nonlinear. The following function generates the determining equations for a Gauß integration. gaussIntegration$f_, x_, a_, b_ , n_( : Block%, varsX Table#ToExpression#StringJoin#"x", ToString#i''', i, 1, n'; varsW Table#ToExpression#StringJoin#"w", ToString#i''', i, 1, n'; vec1 Table$varsXi , i, 0, 2 n 1(; vecB
2 ), 0), i, 0, 2 n 1); 2i1 Thread#Map#varsW.Ó &, vec1' vecB'
Table%If%EvenQ#i', Abs%
equations ) soli
gaussIntegration#f, x, a, b, 3' ss TableForm
w1 w2 w3 m 2 w1 x1 w2 x2 w3 x3 m 0 w1 x12 w2 x22 w3 x32 m
2 3
w1 x13 w2 x23 w3 x33 m 0 w1 x14 w2 x24 w3 x34 m
2 7
w1 x15 w2 x25 w3 x35 m 0
We clearly observe that the equations are nonlinear due to the fact that the xi are not specified yet. The
Mathematics for Engineers
200
problem here is that there exist no reliable procedure to find the solutions for nonlinear algebraic equations. A way out of this unsatisfying situation is that we use the following observation from our previous integration methods. All the discussed previous methods have simple patterns of points and weights and are therefore popular. But it is also clear that we can work out any number of other quadrature formulas by simply using different, not necessarily equal spaced, interpolation points. If we take the points x0 , x1 , …, xn , the simple interpolation polynomial is given as n
pn +x/
Å li +x/ f +xi /,
(5.71)
i 0
where the li are the Lagrange polynomials defined in (3.55). It follows immediately that the quadrature weights are given by b
à li +x/ Šx.
wi
(5.72)
a
Since this involves only an integration of polynomials, the weights can be evaluated explicitly. While (5.72) always gives the quadrature weights, using it may involve some tedious manipulations. Sometimes it is easier to use the method of undetermined coefficients. Since every quadrature formula is of the form n
b
à f +x/ Šx Šwi f +xi / a
(5.73)
i 0
once we have chosen the quadrature points the only remaining issue is to compute the weights wi . Since there are n 1 weights, we can select them so that the rule is exact for all polynomials up to degree n. Imposing these conditions, we are led to the system w0 w1 … wn
ba
w0 x0 w1 x1 … wn xn
b2 a2 2
w0 xn0 w1 xn1 … wn xnn
(5.74)
bn1 an1 n1
where the first equation comes from the requirement that f +x/ 1 is to be integrated exactly, the second equation comes from integrating f +x/ x, and so on. Solving the system gives the quadrature weights. Example 5.13. Determination of Weights Derive the weights for the five-point closed Newton-Cotes formula. It should be clear that the weights are independent of a and proportional to h, so for simplicity we choose a 0 and x0 0, x1 1, x2 2, x3 3, x4 4. Substituting this into (5.74) we get the system
IV - Chapter 5: Numerical Integration
eqs
201
w0 w1 w2 w3 w4 4, w1 2 w2 3 w3 4 w4 8, w1 22 w2 32 w3 42 w4 w1 23 w2 33 w3 43 w4 64, w1 24 w2 34 w3 44 w4
1024 5
64 3
,
!; TableForm#eqs'
w0 w1 w2 w3 w4 4 w1 2 w2 3 w3 4 w4 8 w1 4 w2 9 w3 16 w4
64 3
w1 8 w2 27 w3 64 w4 64 w1 16 w2 81 w3 256 w4
1024 5
Solution 5.13. Solving this system of equations and scaling to the given interval, we arrive at the quadrature formula sol
Flatten#Solve#eqs, w0, w1, w2, w3, w4''
w0
14 45
, w1
64 45
, w2
8 15
, w3
64 45
, w4
14 45
!
Simplify#h w0, w1, w2, w3, w4. f +a/, f +a h/, f +a 2 h/, f +a 3 h/, f +a 4 h/ s. sol' 1 45
7 f +a/ 12 f +a 1/ 7 f +a 2/ 32 f a
1 2
32 f a
3 2
which approximates the integral a4 h
Ã
f +x/ Å x
a
2
h 45
(5.75) 7 f +a/ 32 f +a 2 h/ 12 f +a 2 h/ 32 f +a 3 h/ 7 f +a 4 h/.Ô
Example 5.14. Determination of Weights using Chebyshev Polynomials Find the weights for the numerical quadrature 1
à f +x/ Šx w0 f +x0 / w1 f +x1 / w2 f +x2 / w3 f +x3 /, 1
where the xi are the roots of the Chebyshev polynomials T4 . Solution 5.14. The roots of the Chebyshev polynomials are rCheb4
N#Solve#T4 +x/ 0, x''
x 0.92388, x 0.92388, x 0.382683, x 0.382683
The equations for the weights follow by
(5.76)
Mathematics for Engineers
202
MapThread%Table#ToExpression#"w" ! ToString#i'', i, 0, 3'.Ó1 Ó2 &,
eqs1
+Ó1 s. rCheb4 &/ s Table$xk , k, 0, 3(, Table%
1k1 +1/k1 k1
, k, 0, 3)!); TableForm#eqs1'
w0 w1 w2 w3 2 0.92388 w0 0.92388 w1 0.382683 w2 0.382683 w3 0 0.853553 w0 0.853553 w1 0.146447 w2 0.146447 w3
2 3
0.788581 w0 0.788581 w1 0.0560427 w2 0.0560427 w3 0
The solution of this system of equations is wsols
NSolve#eqs1, w0, w1, w2, w3'
w0 0.264298, w1 0.264298, w2 0.735702, w3 0.735702
The quadrature formula follows by IF
w0, w1, w2, w3.+ f +x/ s. Ó1 &/ s rCheb4 s. wsols
0.264298 f +0.92388/ 0.735702 f + 0.382683/ 0.735702 f +0.382683/ 0.264298 f +0.92388/
f+x/ 3s8NC Chebyshev cos+x/ 0.0046446 0.000629755 Æx 0.00524573 0.000762062 1 x2
0.00614962 0.00104407
To compare this with the four-point closed Newton-Cotes rule; that is, the three eight rule, we tested it on several simple functions. The results are given in the table above. Since the results for the Chebyshev roots are always better, we might suspect that there is some advantage in unequally spaced quadrature points. This issue will be explored more in the following.Ô It is clear from the preceding discussion that we can develop many different numerical integration methods and achieve high orders of convergence by simply increasing the degree of the approximating polynomial. In practice though relatively low order composite methods are often preferred. Higher order quadratures not only have more complicated weight patterns, but may also show increasing instability as the degree of the approximating polynomial gets larger. But there is one kind of quadrature where rapid convergence can be achieved while maintaining stability. The derivation of the formulas is not obviously based on interpolation, so we need a different approach. In (5.74) we showed how, once the quadrature points are fixed, the n 1 weights can be determined by making the rule exact for polynomials of degree n. The above examples suggested that there may be some advantage in using unequally spaced quadrature points. In that case, we have an extra n 1 parameters which we could use to try to make the rule exact for polynomials up to degree 2 n 1. In principle, all we have to do is to extend (5.74) by treating x0 , x1 , x2 , …, xn as unknowns, writing down the n 1 extra equations that assure exact integration for xn1 , xn2 , …, x2 n1 , and solving the
IV - Chapter 5: Numerical Integration
203
resulting system. Unfortunately, now the system is nonlinear and quite difficult to solve. But the idea is still useful, we just have to look at the problem from a different angle. The way we can solve it shows a surprising connection with orthogonal polynomials. For the interval #1, 1' it turns out that the quadrature points we need the roots of Legendre polynomials. Theorem 5.3. Gauss-Legendre Integration
Let x0 , x1 , x2 , …, xn be the roots of the Legendre polynomial Ln1 . Then there exists a set of quadrature weights so that the integration rule with x0 , x1 , x2 , …, xn as quadrature points is exact for all polynomials of degree 2 n 1. 1.0
0.5
0.0
0.5
1.0
1.0
0.5
0.0
0.5
1.0
Figure 5.4. The first few Legendre polynomial on the interval #1, 1'.
To establish quadratures of this type, we first have to find the roots of the Legendre polynomials. For n greater than three this has to be done numerically and we know from the last chapter how to do it. Once the quadrature points have been found, we can use either (5.72) or the method of undetermined coefficients (5.74) to get the weights. To see how this works out, we discuss the following example. Example 5.15. Gauss-Legendre Integration Find the points and weights for a three-point quadrature on #1, 1' that is exact for all polynomials up to degree five. Solution 5.15. We know that P3 +x/ 1 2
,5 x3 3 x0
so the corresponding roots are
Mathematics for Engineers
204
Solve#P3 +x/ 0, x'
rLege
3
x 0, x
5
!, x
3 5
!!
when we use these values in (5.74), we obtain MapThread%Table#ToExpression#"w" ! ToString#i'', i, 0, 2'.Ó1 Ó2 &,
eqs1
+Ó1 s. rLege &/ s Table$xk , k, 0, 2(, Table%
1k1 +1/k1 k1
, k, 0, 2)!); TableForm#eqs1'
w0 w1 w2 2 3 5 3 w1 5
3 5
w2
3 w2 5
w1 0
2 3
The solution of this system is given by wsol
Solve#eqs1, w0, w1, w2'
w0
8 9
, w1
5 9
, w2
5 9
!!
It is easy to verify that the rule integrates all polynomials of degree five exactly.Ô Numerical integration rules in which the quadrature point are selected for extra accuracy are called Gaussian quadratures. The following lines are an implementation of a Gauss-Legendre quadrature formula for an order of n. gaussLgendreIntegration$f_, x_, a_, b_ , n_( : Block%z, varsX Table#x, i, 0, n'; varsW Table#ToExpression#StringJoin#"w", ToString#i''', i, 0, n'; solNum Sort#Flatten#Chop#N#Solve#LegendreP#n 1, x' 0, x'''''; vec1 Transpose$ MapThread$Ó1 s. Ó2 &, Transpose$Table$varsXi , i, 0, n((, solNum ((; bi1 ai1
, i, 0, n); i1 equations Thread#Map#varsW.Ó &, vec1' vecB'; + Print#TableForm#equations''; / wsol Flatten#NSolve#equations, varsW''; +Map#f s. Ó &, solNum'/.+varsW s. wsol/ vecB
Table%
)
The following example generates an interpolation formula of order 36 which is quite accurate for
IV - Chapter 5: Numerical Integration
205
integrations f36
gaussLgendreIntegration#f#x', x, 1, 1, 36' ss N
0.0052736 f#0.997945' 0.0122368 f#0.989186' 0.019133 f#0.973493' 0.0258796 f#0.950972' 0.032471 f# 0.921781' 0.0387966 f#0.886125' 0.0449029 f#0.844253' 0.0506237 f#0.796459' 0.0560804 f#0.743079' 0.0610281 f# 0.684486' 0.0656931 f#0.621093' 0.0697183 f#0.553342' 0.0734715 f# 0.481711' 0.0764507 f#0.406701' 0.0791959 f#0.328837' 0.0810386 f# 0.248668' 0.0827024 f#0.166754' 0.0833607 f#0.0836704' 0.083886 f#0.' 0.0833567 f#0.0836704' 0.0827102 f#0.166754' 0.0810276 f#0.248668' 0.0792098 f#0.328837' 0.0764347 f#0.406701' 0.0734882 f#0.481711' 0.0697013 f#0.553342' 0.0657099 f#0.621093' 0.0610127 f#0.684486' 0.0560951 f#0.743079' 0.0506111 f#0.796459' 0.0449134 f#0.844253' 0.0387879 f#0.886125' 0.0324777 f#0.921781' 0.0258749 f#0.950972' 0.019136 f#0.973493' 0.0122353 f#0.989186' 0.00527405 f#0.997945'
Replacing the arbitrary function f by cosine we find f36 s. f Cos 1.68294
which is quite accurate if we compare this with the symbolic result. 1.
à cos+x/ Šx 1
1.68294
Example 5.16. Integration Application Let us consider a sailing boat subject to the action of the wind force (see Figure 5.5). The mast, of length L, is denoted by the straight line AB, while one of the two shrouds (strings for the side stiffening of the mast) is represented by the straight line BO. Any infinitesimal element of the sail transmits to the corresponding element of length dx of the mast a force of magnitude equal to f +x/ dx. The change of f along with the height x, measured from point A (basis of the mast), is expressed by the following force law f +x_/ :
x D Æ x +J/ x E
where D, E, and J are known constants. We assume for the following calculation of the total force R on the mast the parameters D 50, E 5 s 3, and J 1 s 4. We expect that the results have at least an accuracy of 104 .
Mathematics for Engineers
206
Figure 5.5. Geometry of a sailing boat. L measures the mast length and b the height where the total force R acts on the mast. The real object in the wind (left). Forces R, H, T and V acting on the mast.
Solution 5.16. The resultant R of the force f is defined as L
R
à f +x/ Šx
I+ f /,
(5.77)
0
and is applied at a point at distance equal to b (to be determined) from the basis of the mast. Computing R and the distance b, given by b I+x f / s I+ f /, is crucial for the structural design of the mast and shroud sections. Indeed, once the values of R and b are known, it is possible to analyze the structure mast-shroud (using for instance the method of forces), thus allowing for the computation of the reactions V and H at the basis of the mast and the traction T that is transmitted by the shroud, and are drawn in Figure 5.5 (right). Then, the internal actions in the structure can be found, as well as the maximum stresses arising in the mast AB and in the shroud BO, from which, assuming that the safety verifications are satisfied, one can finally design the geometrical parameters of the sections of AB and BO. The force distribution along the mast is determined by the force density of the sail. Due to the geometry of the sail it is plausible that the largest partial forces will be generated at the base of the sail at x 0 (see Figure 5.6).
IV - Chapter 5: Numerical Integration
207
20 D
70
D
20
f
15
10
5
0 0
5
10
15
20
x Figure 5.6. Force density f of the sail for different parameters D ± 20, 30, 40, 50, 60, 70 with E
5 s 3, and J
1 s 4.
The total force acting at a height b on the mast can be obtained by integrating the force density along the mast length L. The following definition of the total force assumes that D 50, E 5 s 3, and J 1 s 4. L
F+L_/ : Ã
f +x/ s. D 50, E
0
5 3
,J
1 4
! Åx
For this parameters we will find the total force R to be R
F+L/
50 If%Re+L/
Integrate%
Æ
5 3
L
Î L ² 5,
5 Æ5s12 Ei- 4 3
5 1 12
5
5 Æ5s12 Ei- 12 1 3
L
4 Æ 4 4,
x 4
x
x 5 3
, x, 0, L, Assumptions » Re+L/
5 3
Î L ² 5 ))
The symbolic result shows us that the total force will be related to exponential integrals which are a special type of functions, depending on the mast length L. A constraint of the derived result is that the Length of the mast should be always greater than 5 s 3. For practical applications this is always the case. If we change the mast length L we will find a total force represented in Figure 5.7. The curve in this graph shows that the total force increases if we increase the length of the mast and reaches a nearly stable value for mast length greater than 15.
208
Mathematics for Engineers
100 80
R
60 40 20 0 0
5
10
15
20
L Figure 5.7. Total Force R as a function of the mast length L for D
50, E
5 s 3, and J
The total force is located at height b which is determined by the ratio b L
b+L_/ :
5
1 s 4.
I+x f / s I+ f /.
1
¼0 -x f +x/ s. D 50, E 3 , J 4 !1 Å x L
5
1
¼0 - f +x/ s. D 50, E 3 , J 4 !1 Å x
If we graph this quantity with respect to the mast height L, then we get a curve shown in Figure 5.8. Again we observe that the force center reaches some saturation if the mast height is larger than 15. 5
4
b
3
2
1
0 0
5
10
15
20
L Figure 5.8. Location of the total force R measured from the bottom of the mast as a function of the total mast height L.Ô
IV - Chapter 5: Numerical Integration
209
5.3.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 5.3.1.1 Test Problems T1. Summarize the basic idea of Gaussian integration. T2. State the advantages of a Gauß integral. T3. What is the difference of Simpson's rule and Gausß integration? T4. How are the integration points for a Gauß integral defined? T5. How do we get the weights for Gauß integrals?
5.3.1.2 Exercises E1. Use the Gauss quadrature to approximate the integrals: Ss2
a. ¼0 Æsin+x/ Å x 1
b. ¼0 Æ+x1/sx Å x 1
c. ¼0 ln+1 s x/ sin+x/ Å x 1
d. ¼0 x5 Æx Å x 2
Compare your results with problem No. E1 of section 4.2. E2. Use a Chebyshev-Gauss integration of order 4 with Chebyshev-T polynomials to approximate the following integrals: 1
a. ¼1 cos+x/ Å x 1
1
b. ¼1 2x Å x Compare your results with Simpson's 3 s8 rule. E3. Use Gauss-Legendre quadrature of order 3 to approximate the integral 1
a. ¼1 cos+x/ Å x 1
1
b. ¼1 2x Å x Compare your results with problem No. E2. E4. Compute, with an error less than 104 , the following integrals sin+x/
a. ¼0
,1x4 0 Æx
b. ¼0 c.
Åx Åx
+1x/5 x2 ¼ cos+x/ Æ
Åx
E5. Let us consider the quadrature formula Q+ f / I+ f /
1
w1 f +0/ w2 f +1/ w3 f ' +0/ for the approximation of
¼0 f +x/ Å x, where f ± C1 +#0, 1'/. Determine the coefficients w j , for j of exactness r 2.
1, 2, 3 in such a way that Q has degree
Mathematics for Engineers
210
5.4 Monte Carlo Integration Evaluation of integrals can also be done by approximations with random numbers. Random numbers are numbers which are generated in a random way within a given interval. Since the gambling games like roulette are mostly based on such random numbers and the finest gambling saloons are located in Monte Carlo the integration methods based on random numbers are called Monte Carlo procedures. To approximate a finite integral like 1
à f +x/ Šx
(5.78)
0
we generate random numbers x1 , x2 , x3 , …, xn in the interval +0, 1/ and sum the corresponding function values in the following way 1
à f +x/ Šx 0
1 n
n
Å f +xi /.
(5.79)
i 1
Here the integral is approximated by the average of n numbers f +x1 /, f +x2 /, …, f +xn /. When this is actually carried out, the error is of order 1 t
n , which is not at all competitive with good algorithms.
However, in higher dimensions, the Monte Carlo method can be quite attractive. For example 1
1
1
à à à f +x, y, z/ Šx Šy Šz 0
0
0
1 n
n
Å f +xi , yi , zi /
(5.80)
i 1
where +xi , yi , zi / is a random sequence of n points in the unit cube 0 x 1, 0 y 1, and 0 z 1. To obtain random points in the cube, we assume that we have a random sequence in +0, 1/ denotes by [1 , [2 , [3 ,… To get our first random point p1 in the cube, just let p1 +[1 , [2 , [3 /. The second is of course, p2 +[4 , [5 , [6 /, and so on. The following examples demonstrate the procedure for a simple integral and for a more advanced application. Example 5.17. Monte Carlo Integration of Functions Find the integral value of a function f +x/
Æ x for the interval #0, 1' by using Monte Carlo methods.
1
x Ã Æ Åx
?
(5.81)
0
Solution 5.17. We first evaluate the integral by determining the symbolic results which are combined with the boundaries to give
IV - Chapter 5: Numerical Integration
211
1
x Ã Æ Åx
I0
0
Æ1
The Monte Carlo method assumes that we have to generate a sequence of random numbers which are used as nodes in the integration step. Using this random numbers for x in the evaluation of the function values, we can generate the integral by summing these values and divide it by the number of generated random numbers. monteCarloIntegrate$f_, x_, a_, b_ , n_( : Block%, randomSequence
RandomReal#a, b'ni 1 ; functionValues
+f s. x Ó1 &/ s randomSequence;
Fold#Plus, 0, functionValues' +b a/ n
)
The application of the defined function to the exponential function delivers IM
monteCarloIntegrate+exp+x/, x, 0, 1, 10 000/
1.72261
This result is accurate up to H
I0 IM
0.00432425
the second digit.Ô The next example discusses how an integral for a given function in two dimensions is evaluated over a finite domain. Example 5.18. Monte Carlo Integration II Let us consider the problem of obtaining the numerical value of the integral à à f +x, y/ Å x Å y
à à sin- ln+x y 1/ 1 Šx Šy
:
:
(5.82)
over a disk in x, y-space, defined by the inequality :
+x, y/ x
1 2
2
y
1 2
2
1 4
!.
(5.83)
Solution 5.18. The domain is a circle of radius r 1 s 2 a sketch of this domain is given in the following plot filled with dots randomly distributed over this area. The surface defined over this area is shown beside this plot.
Mathematics for Engineers
212
We proceed by generating random points in the square #0, 1' #0, 1' and discard those that do not lie q in the disk of radius r 1 s 2. We take from the specified n points in the square only the n values which belong to the disk. The integral is estimated by à à f +x, y/ Šx Šy :
q +area of disk :/ +average height of f over n points/
,S r2 0
1 n
(5.84)
n
Å f +xi , yi /. i 1
This formula and the related domain is implemented next monteCarloIntegration$f_, x_, a_, b_ , y_, c_, d_ , n_( : Block%randVals, intvals, integral, randVals
DeleteCases% If% Ó1317
1
2
2
Ó1327
1 2
2
1 4
, Ó1, Null) & s
Partition#RandomReal#'ni 1 , 2', Null); intvals
+f s. x Ó1317, y Ó1327 &/ s randVals; 2
integral
S , 12 0 Fold#Plus, 0, intvals' Length#randVals'
)
The numerical integration for 105 points in the square delivers monteCarloIntegration%Sin%
Log#x y 1' ), x, 0, 1, y, 0, 1, 100 000)
0.567768
A sequence of different numbers of iteration taken from the square #0, 1' #0, 1' is shown in the following plot
IV - Chapter 5: Numerical Integration
213
ListPlot%Table%i, monteCarloIntegration%Sin%
Log#x y 1' ), x, 0, 1, y, 0, 1, i)!,
i, 10, 100 000, 1000), FrameLabel "n", "IM ")
0.570
IM
0.569 0.568 0.567 0.566 0
20 000
40 000
60 000
80 000
100 000
n
It is obvious that the results converge to a finite value which is about 0.5675… demonstrating that this integral has a finite value.Ô
5.4.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 5.4.1.1 Test Problems T1. What is a Monte Carlo integration? T2. What do we need for Monte Carlo integration?
5.4.1.2 Exercises 1 1 1x
E1. Approximate the integral ¼0
Å x by a Monte Carlo integration up to an accuracy of 103 .
E2. Evaluate the integral y à à ln+1 x/ Æ Å x Å y
(1)
,y z 0 ÅxÅ yÅz à à à ln+1 x/ Æ
(2)
2
:
for the domain :
+x, y/ x2 y2 2 .
E3. Evaluate the integral 2
2
:
for the domain :
+x, y, z/ x2 y2
1 4
z2 2 .
E4. Evaluate the integral 2 2 2 ,x y z 0 ÅxÅ yÅz à à à ,x y z 0 Æ 2
:
2
2
(3)
214
for the domain :
Mathematics for Engineers
+x, y, z/
1 2
x 2 y2
1 4
z2 2 .
6 Solutions of Equations
6.1 Introduction Systems of simultaneous linear equations occur in solving problems in a wide variety of disciplines, including mathematics, statistics, the physical, biological, and social sciences, engineering, and business. They arise directly in solving real-world problems, and they also occur as part of the solution process for other problems, for example, solving systems of simultaneous nonlinear equations. Numerical solutions of boundary value problems and initial boundary value problems for differential equations are a rich source of linear systems, especially large-size ones. In this chapter, we will examine some classical methods for solving linear systems, including direct methods such as the Gaußian elimination method, and iterative methods such as the Jacobi method and Gauß-Seidel method. The notation and theory for linear systems are given in the first section. By using matrix algebra we extend the ideas for solving linear systems and consider special types of linear systems and the effect of rounding errors. The next section introduces the idea of an iteration method for solving linear systems, giving some of the more common and simple iteration procedures.
216
Mathematics for Engineers
6.2 Systems of linear Equations One of the topics studied in elementary algebra is the solution of pairs of linear equations such as a x b y d x e y
c f
(6.1)
The coefficients a, b, …, f are given constants, and the task is to find the unknown values x, y. In this chapter, we examine the problem of finding solutions to longer systems of linear equations, containing more equations and unknowns. To write the most general system of linear equations that we will study, we must change the notation used in (6.1) to something more convenient. Let n be a positive integer. The general form for a system of n linear equations in the n unknowns x1 , x2 , x3 , …xn is a11 x1 a12 x2 ... a1 n xn b1 a21 x1 a22 x2 ... a2 n xn b2 am1 x1 am2 x2 ... amn xn bm .
(6.2)
The coefficients are given symbolically by aij , with i the number of the equation and j the number of the associated unknown component. On some occasions, to avoid possible confusion, we also use the symbol ai, j . The right-hand side b1 , b2 , …, bm are numbers given; and the problem is to calculate the unknowns x1 , x2 , …, xn . The linear system is said to be of order n. A solution of a linear equation a1 x1 a2 x2 … an xn b is a sequence of n numbers s1 , s2 , s3 , …, sn such that the equation is satisfied when we substitute x1 s1 , x2 s2 , …, xn sn . The set of all solutions of the equation is called its solution set or sometimes the general solution of the equation. A finite set of linear equations in the variables x1 , x2 , …, xn is called a system of linear equations or a linear system. The sequence of numbers s1 , s2 , s3 , …, sn is called a solution of the system if x1 s1 , x2 s2 , …, xn sn is a solution of every equation in the system. A system of equations that has no solutions is said to be inconsistent; if there is at least one solution of the system, it is called consistent. To illustrate the possibilities that can occur in solving systems of linear equations, consider a general system of two linear equations in the unknowns x1 x and x2 y: a1 x b1 y
c1
with a1 , b1 not both zero
a2 x b2 y
c2
with a2 , b2 not both zero.
The graphs of these equations are lines. Since a point +x, y/ lies on a line if and only if the numbers x and y satisfy the equation of the line, the solutions of the system of equations correspond to points of intersection of line l1 and line l2 . There are three possibilities, illustrated in Figure 6.1:
IV - Chapter 6: Solutions of Equations
217
2.5 2.0
y
1.5 1.0
l1
0.5 0.0
l2
0.5 1.0 0
1
2
3
4
5
x 3.0 2.5
y
2.0 1.5 l2 1.0 0.5 l1 0.0 0
1
2
3
4
5
x 2.5 2.0 1.5 y
l2 1.0 0.5 l1 0.0 0
1
2
3
4
5
x Figure 6.1. The three possible scenarios to solve a system of linear equations. Top a single and unique solution exists, middle no solution exists, and bottom an infinite number of solutions exists.
The line l1 and l2 may intersect at only one point, in which the system has exactly one solution The lines l1 and l2 may be parallel, in which case there is no intersection and consequently no solution to the system.
Mathematics for Engineers
218
The lines l1 and l2 may coincide, in which case there are infinitely many points of intersection and consequently infinitely many solutions to the system. Although we have considered only two equations with two unknowns here, we will show that the same three possibilities hold for arbitrary linear systems: Remark 6.1. Every system of linear equations has no solutions, or has exactly one solution, or has infinitely many solutions. For linear systems of small orders, such as the system (6.1) it is possible to solve them by paper and pencil calculations or with the help of a calculator, with methods learned in elementary algebra. For systems arising in most applications, however, it is common to have larger orders, from several dozens to millions. Evidently, there is no hope to solve such large systems by hand. We need to employ numerical methods for their solutions. For this purpose, it is most convenient to use the matrix/vector notation to represent linear systems and to use the corresponding matrix/vector arithmetic for their numerical treatment. The linear system of equations (6.2) is completely specified by knowing the coefficients aij and the right-hand constants bi . These coefficients are arranged as the elements of a matrix:
A
a11 a12 … a1 n a21 a22 … a2 n
(6.3)
an1 an2 … ann We say aij is the +i, j/ entry of a matrix A. Similarly, the right-hand constants bi are arranged in the form of a vector
b
b1 b2 bn
(6.4)
The letters A and b are the names given to the matrix and the vector. The indices of aij now give the numbers of the row and column of A that contain aij . The solution x1 , x2 , …, xn is written similarly ¶x
x1 x2 xn
(6.5)
With this notation the linear system (6.2) is then written in compact form ¶ A.x
b.
(6.6)
The reader with some knowledge of linear algebra will immediately recognize that the left-hand side of ¶ (6.6) is the matrix multiplied by a vector x, and (6.6) expresses the equality between the two vectors
IV - Chapter 6: Solutions of Equations
219
¶ A.x and b. The basic method for solving a system of linear equations is to replace the given system by a new system that has the same solution set but is easier to solve. This new system is generally obtained in a series of steps by applying the following three types of operations to eliminate unknowns symbolically: 1. Multiply an equation through by a nonzero constant. 2. Interchange two equations. 3. Add a multiple of one equation to another. Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, these three operations correspond to the following operations on the rows of the augmented matrix: 1. Multiply a row through by a nonzero constant. 2. Interchange two rows. 3. Add a multiple of one row to another row. These are called elementary row operations. The following example illustrates how these operations can be used to solve systems of linear equations. Since a systematic procedure for finding solutions will be derived in the next section, it is not necessary to worry about how the steps in this example were selected. The main effort at this time should be devoted to understanding the computations and the discussion. Example 6.1. Using Elementary Row Operations In the first line below we solve a system of linear equations by operating on the equations in the system, and in the same line we solve the same system by operating on the rows of the augmented matrix. Solution 6.1. x y2 z 9 2 x4 y3 z 1 3 x6 y5 z 0
1 1 2 9 2 4 3 1 3 6 5 0
Add 2 times the first equation (row) to the second to obtain which x y2 z 9 2 y 7 z 17 3 x6 y5 z 0
1 1 2 9 0 2 7 17 3 6 5 0
Add 3 times the first equation (row) to the third to obtain x y2 z 9 2 y 7 z 17 3 y 11 z 27
1 1 2 9 0 2 7 17 0 3 11 27
Multiply the second equation (row) by 1 s 2 to obtain
Mathematics for Engineers
220
x y2 z y
7 2
9 17 2
z
3 y 11 z
27
1 1
2
9
0 1
7 2
17 2
0 3 11 27
Add 3 times the second equation (row) to the third to obtain x y2 z
9
1 1
17 2 3 2
7 y 2 z 1 z 2
0 1 0 0
2
9
7 2 1 2
17 2 3 2
Multiply the third equation (row) by 2 to obtain x y2 z y
7 2
z
z
9 17 2
3
1 1
2
9
0 1
7 2
17 2
0 0
1
3
Add 1 times the second equation (row) to the first to obtain x y
11 z 2 7 z 2
z
35 2 17 2
1 0 0 1
3
0 0
11 2 7 2
35 2 17 2
1
3
11
Add 2 times the third equation (row) to the first and obtain x y z
1 2 3
The solution thus is x
7 2
times the third equation to the second to
1 0 0 1 0 1 0 2 0 0 1 3 1, y
2, z
3.Ô
6.2.1 Gauß Elimination Method We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row-echelon form. Now we shall give a step-by-step elimination procedure that can be used to reduce any matrix to reduced row-echelon form. As we state each step in the procedure, we shall illustrate the idea by reducing the following matrix to reduced row-echelon form. To demonstrate the procedure let us consider the following augmented matrix 0 0 2 0 7 12 2 4 10 6 12 28 2 4 5 6 5 1 Step 1: Locate the leftmost column that does not consist entirely of zeros.
IV - Chapter 6: Solutions of Equations
221
0 0 2 0 7 12 2 4 10 6 12 28 2 4 5 6 5 1 Step 2: Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in step 1. 2 4 10 6 12 28 0 0 2 0 7 12 2 4 5 6 5 1 Step 3: If the entry that is now at the top of the column found in step 1 is a, multiply the first row by 1 s a in order to introduce a leading 1. 1 2 5 3 6 14 0 0 2 0 7 12 2 4 5 6 5 1 Step 4: Add suitable multiples of the top row to the rows below so that all enters below the leading 1 become zeros. 1 2 5 3 6 14 0 0 2 0 7 12 0 0 5 0 17 29 Step 5: Now cover the top row in the matrix and begin again with step 1 applied to the submatrix that remains. Continue in this way until the entire matrix is in row-echelon form. 1 2 5 3
6
14
0 0
1
0
7 2
6
0 0
5
0 17 29
1 2 5 3
6
14
7 2 1 2
6
1 2 5 3
6
14
0 0
1
0
7 2
6
0 0
0
0
1
2
0 0
1
0
0 0
0
0
1
The entire matrix is now in row-echelon form. To find the reduced row echelon form we need the following additional step. Step 6: Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above to introduce zeros above the leading 1's.
Mathematics for Engineers
222
1 2 5 3
6
14
0 0
1
0
7 2
6
0 0
0
0
1
2
1 2 5 3 6 14 0 0 1 0 0 1 0 0 0 0 1 2 1 2 5 3 0 2 0 0 1 0 0 1 0 0 0 0 1 2 1 2 0 3 0 7 0 0 1 0 0 1 0 0 0 0 1 2 The last matrix is in reduced row-echelon form. When we use only the first five steps, the above procedure creates a row-echelon form and is called Gaußian elimination. Caring out step 6 in addition to the first five steps which generates the reduced row-echelon form is called Gauß-Jordan elimination. Remark 6.2. It can be shown that every matrix has a unique reduced-echelon form; that is, one will arrive at the same reduced row-echelon form for a given matrix no matter how the row operations are varied. In contrast, a row-echelon form of a given matrix is not unique; different sequences of row operations can produce different row-echelon forms. In Mathematica there exists a function which generates the reduced-echelon form of a given matrix. For the example above the calculation is done by 0 0 2 0 7 12 MatrixForm%RowReduce% 2 4 10 6 12 28 )) 2 4 5 6 5 1 1 2 0 3 0 7 0 0 1 0 0 1 0 0 0 0 1 2
Example 6.2. Gauß-Jordan Elimination Solve by Gauß-Jordan elimination the following system of linear equations x1 5 x2 3 x3 5 x5 3 x1 7 x2 4 x3 x4 3 x5 2 x6 2 x1 9 x2 9 x4 3 x5 12 x6
1 3 7
Solution 6.2. The augmented matrix of this system is
IV - Chapter 6: Solutions of Equations
am
223
1 5 3 0 5 0 1 3 7 4 1 3 2 3 ; 2 9 0 9 3 12 7
The reduced-echelon form is gained by adding 3 times the first row to the second row am
1 5 3 0 5 0 1 0 8 5 1 12 2 0 ; 2 9 0 9 3 12 7
Adding in addition 2 times the first row to the third one gives am
1 5 3 0 5 0 1 0 8 5 1 12 2 0 ; 0 1 6 9 7 12 5
Multiplying the third by 1 and interchanging the second with the third row will give am
1 5 3 0 5 0 1 0 1 6 9 7 12 5 ; 0 8 5 1 12 2 0
A multiple of the second row of 8 will give am
1 5 3 0 5 0 1 0 1 6 9 7 12 5 ; 0 0 43 73 44 98 40
Division of the last row by 43 gives am
1 5 3 0 5 0 1 6 9 7 0 0 1
73 43
44 43
0 12
1 5 ;
98 43
40 43
Adding a multiple of 6 of the third row to the second row produces 1 5 3 0 am
0 1 0 0 0 1
51 43 73 43
5
0
1
37 43 44 43
72 43 98 43
25 43 40 43
;
Three time the last row added to the first row generates 1 5 0 am
0 1 0 0 0 1
219 43 51 43 73 43
83 43 37 43 44 43
294 43 72 43 98 43
163 43 25 43 40 43
;
5 times the second row added to the first row gives
224
Mathematics for Engineers
36
1 0 0 43 am
0 1 0 0 0 1
51 43 73 43
102 43
37 43 44 43
66 43 72 43 98 43
38 43 25 43 40 43
;
The same result is generated by MatrixForm#RowReduce#am'' 36
102 43 37 43 44 43
66 43 72 43 98 43
1 0 0 43 0 1 0 0 0 1
51 43 73 43
38 43 25 43 40 43
The same augmented matrix am can be treated by a function which prints the intermediate steps. MatrixForm#am' 1 5 3 0 5 0 1 3 7 4 1 3 2 3 2 9 0 9 3 12 7
The application of the function GaussJordanForm to the matrix am is shown in the following lines. On the left of the matrix the operations carried out on rows are stated GaussJordanForm#am' ss MatrixForm
Forward pass
1 5 3 0 5 0 1 0 (Row 2)-(3)*(Row 1) 0 8 5 1 12 2 2 9 0 9 3 12 7 1 5 3 0 5 0 1 0 (Row 3)-(2)*(Row 1) 0 8 5 1 12 2 0 1 6 9 7 12 5 1 5 (Row 2)/(8) 0 1
3 0 5
0
5 8
1 8
0 1 6
3 2
1 1 4
9 7 12 5
1 5 3 0 (Row 3)-(1)*(Row 2) 0 1 0 0
(Row
43 3)/( 8 )
0
58 43 8
1 8 73 8
5
0
3 2
11 2
1 5 3 0
5
0
1
0 1 58
3 2
14
0
0 0 1
1 8 73 43
44 98 43 43
40 43
1 1 4 49 4
0 5
IV - Chapter 6: Solutions of Equations
225
Backward pass
1 5 0 (Row 1)-(3)*(Row 3) 0 1 58 0 0 1
219 43 1 8 73 43 219 43 51 43 73 43
1 5 0 5
(Row 2)-( 8 )*(Row 3) 0 1 0 0 0 1
83 43 3 2 44 43 83 43 37 43 44 43
1 0 0 36 102 43 43 51 43 73 43
(Row 1)-(5)*(Row 2) 0 1 0 0 0 1 36
102 43 37 43 44 43
1 0 0 43 51 43 73 43
0 1 0 0 0 1
66 43 72 43 98 43
37 43 44 43
294 43
163 43
14
0
40 43
98 43
294 43 72 43 98 43 66 43 72 43 98 43
163 43 25 43 40 43 38 43 25 43 40 43
38 43 25 43 40 43
in a single step. The result is that there are three leading variables x1 , x2 , and x3 which are determined by x4 , x5 , and x6 Ô
6.2.2 Operations Count It is important to know the length of a computation and, for that reason, we count the number of arithmetic operations involved in Gauß-Jordan elimination. For reasons that will be apparent in later sections, the count will be divided into three parts. Table 6.1. The following Table contains the number of operations for each of the steps in the elimination Step
Additions
1
+n 1/2
+n 1/2
n1
2
+n 2/
n1
1
+n 2/ 1
n2 1
Total
n+n1/ +2 n1/ 6
n+n1/ +2 n1/ 6
n+n1/ 2
2
Multiplications Divisions 2
The elimination step. We count the additions/subtractions (A,S), multiplications (M), and the divisions (D) in going from the original system to the triangular system. We consider only the operations for the coefficients of A and not for the right-hand side b. Generally the division and
Mathematics for Engineers
226
multiplications are counted together, since they are about the same in operation time. Doing this gives us AS
MD
n +n 1/ +2 n 1/
(6.7)
6 n +n 1/ +2 n 1/ 6
n +n 1/
1
2
3
n ,n2 10
(6.8)
AS denotes the number of addition and subtraction and MD denotes that of multiplication and division. Modification of the right side b. Proceeding as before, we get AS
MD
n+n 1/
+n 1/ +n 2/ … 1
(6.9)
2 n+n 1/
+n 1/ +n 2/ … 1
(6.10)
2
The back substitution step. As before AS
MD
0 1 … +n 1/
n+n 1/
n+n 1/
12…n
(6.11)
2
2
(6.12)
.
¶ Combining these results, we observe that the total number of operations to obtain x is AS
MD
n+n 1/ +2 n 1/ 6
n+n 1/ 2
n,n2 3 n 10
n+n 1/
n+n 1/ +2 n 5/
2
6
(6.13)
(6.14)
3
Since AS and MD are almost the same in all of these counts, only MD is discussed. These operations are also slightly more expensive in running time. For large values of n, the operation count for GaußJordan elimination is about n3 t 3. This means that as n is doubled, the cost of solving the linear system increases by a factor of 8. In addition, most of the cost of Gauß-Jordan elimination is in the elimination step since for the remaining steps MD
n+n 1/ 2
n+n 1/ 2
n2 .
(6.15)
Thus, once the elimination steps have been completed, it is much less expensive to solve the linear system. Consider solving the linear system
IV - Chapter 6: Solutions of Equations
¶ A.x
b
where A has order n and is non singular. Then A1 exists and A1 .+A.x¶ / A1 .b ¶ x
A1 .b.
227
(6.16)
(6.17) (6.18)
Thus, if A1 is known, then x¶ can be found by matrix multiplication.
For this, it might at first seem reasonable to find A1 and to solve for x¶. But this is not an efficient procedure, because of the great cost needed to find A1 . The operations cost for finding A1 can be shown to be MD
n3 .
(6.19)
¶ This is about three times the cost of finding x by the Gauß-Jordan elimination method. Actually there is no savings in using A1 . The chief value of A1 is a theoretical tool for examining the solution of non singular systems of linear equations. With a few exceptions, one seldom needs to calculate A1 explicitly.
6.2.3 LU Factorization When we use matrix multiplication, another meaning can be given to the Gaußian elimination method. The matrix of coefficients of the linear system being solved can be factored into the product of two triangular matrices. We begin this section with the result, and then we discuss the implications and ¶ applications of it. Let A.x b denote the system to be solved, as before, with A the n n coefficient matrix. In the elimination step of Gauß-elimination the linear system was reduced to the upper ¶ triangular system U.x g with
U
u11 u12 … u1 n 0 u22 … u2 n 0 0 … unn
(6.20)
Introduce an auxiliary lower triangular matrix L based on the multipliers of the Gauß-elimination operation.
L
1 0 … 0 l21 1 … 0 ln1 ln2 … 1
(6.21)
The relationship of the matrices L and U to the original matrix A is given by the following theorem.
Mathematics for Engineers
228
Theorem 6.1. LU Factorization
Let A be a non singular matrix, and let L and U be defined by (6.21) and (6.20). If U is produced without pivoting, then LU
A.
(6.22)
This is called the L U factorization of A. Also there is a result analogous to (6.22) when pivoting is used, but it will not be needed here. The strategy to solve the linear system of equations using the decomposition of A is as follows: ¶ ¶ 1. Step: Rewrite the system A.x b as L.U.x b. ¶ 2. Step: Define a new n 1 matrix y by U.x y 3. Step: Use U.x¶ y to rewrite L.U.x¶ b as L.y b and solve this system for y. ¶ 4. Step: Substitute y in U.x
y and solve for ¶x.
¶ Although this procedure replaces the problem of solving the single system A.x b by the problem of solving the two systems L. y b and U.x¶ y, the latter system are easy to solve because the coefficient matrices are triangular. To find formulas for the lij and uij , we multiply the elements of L and U and equate the results to the elements of the matrix A. Multiplying the first row of L and the first column of U shows that u11
a11 .
(6.23)
Then, from the second row of L and the first column of U, l21 u11
a21
(6.24)
.
(6.25)
so that l21
a21 a11
We can see without too much difficulty how to proceed: u12
a12
l21 u12 u22
(6.26) a22
(6.27)
giving u22
a22
a21 a11
a12
(6.28)
and so on. The general scheme is lij
1 u jj
j1
aij Å lik ukj ! k 1
i
2, 3, …, n;
j
1, 2, 3, …i 1
(6.29)
IV - Chapter 6: Solutions of Equations
229
j1
aij Å lik ukj
uij
i
1, 2, …, n
j
i, i 1, …, n.
(6.30)
k 1
If none of the denominators u jj are zero. This determines the matrices L and U uniquely. Although not entirely obvious, LU factorization is essentially equivalent to the Gauß elimination method. To see this, assume for the moment that no row interchanges are needed and look at the element in row i and column j. Since the elements in row i change only during the first i 1 stages, we know that A+k/ i, j
A+n/ i, j
(6.31)
where k i. Also from the way the elements are computed, we can see that i1
A+n/ i, j
aij Å mi,k1 A+k/ k, j
i1
aij Å mi,k1 A+n/ k, j ,
k 1
(6.32)
k 1
and mi, j1
A+n/ i, j
1
A+n/ j, j
A+n/ j, j
i1
aij Å mi,k1 A+n/ k, j !.
(6.33)
k 1
If we compare (6.32) and (6.33) with (6.29) and (6.30), we find that they are the same if we identify uij
A+n/ ij
(6.34)
lij
mi, j1 .
(6.35)
and
The upper triangular matrix in the LU decomposition is the reduced matrix of the Gauß elimination method, while the lower triangular matrix L contains all the multipliers used in the Gauß elimination method. As described, the LU factorization breaks down if one of the elements uii is zero, but as in the Gauß elimination method, this can be fixed by a row interchange. To express row interchanges formally, we introduce the concept of a permutation matrix. The square matrix P is said to be a permutation matrix if it consists entirely of zeros and ones, and each row and column has exactly one nonzero element. Example 6.3. Permutation Matrix The matrix P
RotateLeft#IdentityMatrix#3''; MatrixForm#P'
0 1 0 0 0 1 1 0 0
Mathematics for Engineers
230
is a permutation matrix. Solution 6.3. If we multiply a 3 3 matrix A by P, the second row of A becomes the first row of PA, the third row of A becomes the second row of P.A, and the first row of A becomes the third row of P.A. a b c MatrixForm%P. d e f ) g h k d e f g h k a b c
Post-multiplying by P rearranges columns, putting the third column into the first, the first into the second, and the second into the third. a b c MatrixForm% d e f .P) g h k c a b f d e k g h
.Ô Theorem 6.2. Permutation Matrix and LU Decomposition
Let A be a n n non singular matrix. Then there exists some permutation matrix P such that P.A
L.U
(6.36)
where L is a lower triangular matrix of the form (6.21) and U is an upper triangular matrix of the form (6.20). Because the Gauß elimination method and LU method are effectively identical, the total number of basic additions and multiplications to get the decomposition is approximately n3 t 3. Suppose we have found the decomposition PA
L U.
(6.37)
Then the linear system ¶ A.x b
(6.38)
leads to ¶ LU x
Pb ¶ Now we set U.x
(6.39) ¶ z and solve
IV - Chapter 6: Solutions of Equations
¶ L.z P b ¶ ¶ for z. Since L is triangular, solving for z can be done in O,n2 0 operations. Next, we solve ¶ U.x
¶ z
231
(6.40)
(6.41)
¶ for x, which can also be done in O,n2 0 operations. Thus, once the L U decomposition has been found, solving a linear system is a fast process. The implication of this is that if we have to solve a linear system many times with different right hand sides, we should probably do a L U decomposition first. It is also possible to use the Gauß elimination method in such situations by putting several right sides in the augmented matrix, but this is possible only if all of the right sides are available at the same time. The L U method does not require this and is therefore more flexible. Example 6.4. LU Decomposition To demonstrate the procedure discussed above let us consider a matrix A and a right hand side b. The matrix is given by A
1 2 3 a b c 4 6 8
1 2 3 a b c 4 6 8
The right hand side of our equation is b0
g, h, k
g, h, k
where the symbols a, b, c, g, h, k are constants. The equations are TableForm#Thread#A.x1, x2, x3 b0'' x1 2 x2 3 x3 g a x1 b x2 c x3 h 4 x1 6 x2 8 x3 k
Solution 6.4. The LU decomposition with pivoting is generated by the function "LUDecomposition[]" which generates a matrix representation which is decomposable in a lower and upper triangular matrix lu, p, cn
LUDecomposition#A'
1 2 3 4 , 1, 3, 2, 1! 4 2 b a a 2 a2bc
232
Mathematics for Engineers
The first element is a combination of upper- and lower-triangular matrices, the second element is a vector specifying rows used for pivoting, and for approximate numerical matrices A the third element is an estimate of the condition number of A. The decomposition due to the algorithm described above is done by l, u
LUMatrices#lu'
1, 0, 0, 4, 1, 0, a, a
b , 1!!, 2
1, 2, 3, 0, 2, 4, 0, 0, a 2 b c!
resulting into two matrices L
l and U
u where the lower and upper matrix has the form
MatrixForm#l' 1 4
0 1
a a
0 0 b 2
1
MatrixForm#u' 1 2 3 0 2 4 0 0 a2bc
To show that the product of the lower and upper matrix generates the original matrix with some pivoting steps we exchange the rows of A and subtract the product of L and U which simplifies to 0. Simplify#A3 p7 l.u' 0 0 0 0 0 0 0 0 0
The solution of the auxiliary system L.z¶ zV
P.b is a vector with solutions for the zi variables
Flatten#z1, z2, z3 s. Solve#Thread#l.z1, z2, z3 IdentityMatrix#3'3 p7.b0', z1, z2, z3''
g, k 4 g,
1 2
+6 a g 2 a k 4 b g b k 2 h/!
The final equation is generated and solved by the following step Solve#Thread#u.x1, x2, x3 zV', x1, x2, x3' x1
8 b g 3 b k 6 c g 2 c k 2 h
x2
2 +a 2 b c/ 8ag3ak 4cgck 4h 2 +a 2 b c/
,
, x3
6 a g 2 a k 4 b g b k 2 h 2 +a 2 b c/
!!
IV - Chapter 6: Solutions of Equations
233
which gives us in fact the solution which can be derived by the Gauß elimination method, too.Ô While the Gauß elimination method and LU methods are the most suitable for solving general linear equations, in practice one often deals with systems that have a special structure. In such cases, we can use variants of the LU method to increase the efficiency. Definition 6.1. Cholesky Factorization
If for a symmetric matrix A there exists a lower triangular matrix L such that L LT ,
A
(6.42)
and A is said to have a Cholesky factorization. If a matrix has a Cholesky factorization, it can be found in a straight forward manner. From a11 a12 … a1 n a21 a22 … a2 n
an1 an2 … ann
l11 0 … 0 l11 l21 … ln1 l21 l22 … 0 0 l22 … ln2 . ln1 ln2 … lnn 0 0 … lnn
(6.43)
we see that we must have l11
(6.44)
a11
l21 l11
a21
(6.45)
and so on, up to ln1 l11
an1 .
(6.46)
From these, for consecutive values of i, we compute i1
aii Å l2ik ,
lii
(6.47)
k 1
i 1, i 2, …, n
and for j l ji
1 lii
i1
a ji Å l jk lik .
(6.48)
k 1
These equations constitute the Cholesky algorithm. Not every matrix has a Cholesky decomposition, but there are some important cases where we can guarantee such a factorization.
Mathematics for Engineers
234
Definition 6.2. Positive-Definite Matrix
A square matrix A is said to be positive-definite if ¶T ¶ x Ax!0
(6.49)
for all x 0. It can be shown that this relation implies the existence of a constant k ! 0, such that ¶T ¶ (6.50) x Axk ¶ ¶ 1. for all x that satisfy «« x «« Theorem 6.3. Cholesky Factorization
If A is a symmetric positive-definite matrix, then it has a Cholesky factorization. The Cholesky method does not allow for row interchanges, but it is known that for positive-definite matrices this is not necessary. The advantage of the Cholesky method is that it is faster than the LU or Gauß elimination methods. It is not hard to show that the total number of arithmetic operations in the Cholesky method is about one-half that of the LU method. Since many applications lead to positivedefinite matrices, the Cholesky variation is quite useful. Example 6.5. Cholesky Factorization The following matrix should be decomposed using the properties of Cholesky's method. A
4 2 6 2 2 5 6 5 29
4 2 6 2 2 5 6 5 29
Solution 6.5. To get a decomposition we first have to check that A is symmetric and positive ¶T ¶ definite; i.e. A AT and x A x ! 0. First we check the symmetry. At
A¬
4 2 6 2 2 5 6 5 29
which obviously is the original matrix. Then we check the positive definite behavior by r1
Simplify#x1, x2, x3.+A.x1, x2, x3/'
4 x12 4 x1 +x2 3 x3/ 2 x22 10 x2 x3 29 x32
which is always positive for any arbitrary choice of x¶ . The following line is just a test that this is true for an arbitrary choice of variables
IV - Chapter 6: Solutions of Equations
235
r1 s. x1 RandomReal#1000, 1000', x2 RandomReal#1000, 1000', x3 RandomReal#1000, 1000' 1.52444 107
We can also check this behavior by PositiveDefiniteMatrixQ#A' True
Now we know that A is a symmetric positive definite matrix which allows a LU-decomposition in the frame of a Cholesky decomposition. First let us use the LU decomposition of A to compare it with the Cholesky decomposition. The LU decomposition can be found by l, u
LUMatrices#A'
1, 0, 0, 3, 1, 0, 2, 2, 1, 2, 2, 5, 0, 1, 14, 0, 0, 32
Delivering two matrices L l and U u representing the lower and upper triangular matrices. The product of the two matrices should deliver the original matrix up to row permutations which are due to pivoting steps used by the function. The product of the derived matrices delivers l.u 2 2 5 6 5 29 4 2 6
Which is in fact the original matrix with interchanged rows +1 2, 2 3, 3 1/. If we use the lower triangular matrix, we get as an result from Definition 6.1 l.l ¬ 1 3 2 3 10 8 2 8 9
which does not agree with the original matrix A. This disagreement is due to the algorithm used to derive the lower and upper matrices. In fact there are three different ways to derive these matrices which each is based on different assumptions. The method used above is called Doolittle's method and requires that 1's be on the diagonal of L, which results in the factorization described in Theorem 6.1. However, there is a second way to represent L and U, called Crout's method, requiring that 1's be on the diagonal elements of U. Finally there is Cholesky's method, which requires that lii uii , for each i. If we look for a Cholesky decomposition of A we will find v
CholeskyDecomposition#A' 2 1 3 0 1 2 0 0 4
which is of course different from the LU decomposition matrix. But now the matrix v satisfies the
Mathematics for Engineers
236
condition stated in Definition 6.1 which is v¬ .v 4 2 6 2 2 5 6 5 29
The result is in agreement with the original matrix as expected.Ô
6.2.4 Iterative Solutions ¶ The linear system A.x b that occur in many applications can have a very large order. For such systems, the Gaußian elimination method is often too expensive in either computation time or computer memory requirements, or possibly both. Moreover the accumulation of round-off errors can sometimes prevent the numerical solution from being accurate. As an alternative, such linear systems are usually solved with iteration methods, and that is the subject of this section. In an iterative method, a sequence of progressively accurate iterates is produced to approximate the solution. Thus, in general, we do not expect to get the exact solution in a finite number of iteration steps, even if the round-off error effect is not taken into account. In contrast, if round-off errors are ignored, the Gaußian elimination method produces the exact solution after +n 1/ steps of elimination and backward substitution for the resulting upper triangular system. Gaußian elimination method and its variants are usually called direct methods. In the study of iteration methods, a most important issue is the convergence property. We provide a framework for the convergence analysis of a general iteration method. For the two classical iteration methods, the Jacobi and Gauß-Seidel methods, studied in this section, a sufficient condition for convergence is stated. We begin with some numerical examples that illustrate to popular iteration methods. Following that, we give a more general discussion of iteration methods. Consider the linear system 9 x1 x2 x3 2 x1 10 x2 3 x3 3 x1 4 x2 11 x3
b1 b2 b3
(6.51)
One class of iteration methods for solving (6.51) proceeds as follows. In the equation numbered k, solve for xk in terms of the remaining unknowns. In the above case, 1 9
x1 x2 x3 Let x+0/
1 10 1 11
+b1 x2 x3 /
+b2 2 x1 3 x3 /
(6.52)
+b3 3 x1 4 x2 /
¶ +0/ +0/ T x+0/ be an initial guess of the true solution x. Then define an iteration sequence: 1 , x2 , x3
IV - Chapter 6: Solutions of Equations
x+k1/ 1
1 9
x+k1/ 2
1 10 1 11
x+k1/ 3
237
+k/ ,b1 x+k/ 2 x3 0
+k/ ,b2 2 x+k/ 1 3 x3 0 +k/ ,b3 3 x+k/ 1 4 x2 0
for k 0, 1, 2, … This is called the Jacobi iteration method or the method of simultaneous replacements. The system from above is solved for b
10, 19, 0
10, 19, 0
and the initial guess for the solution x1, x2, x3
0, 0, 0
0, 0, 0
The following steps show interactively how an iterative method works step by step. The first step uses the initial guess of the solution x1, x2, x3
N%
1 9
+b317 x2 x3/,
1 10
+b327 2 x1 3 x3/,
1 11
+b337 3 x1 4 x2/!)
1.11111, 1.9, 0.
The second step which uses the values from the first iteration step generates x1, x2, x3
N%
1 9
+b317 x2 x3/,
1 10
+b327 2 x1 3 x3/,
1 11
+b337 3 x1 4 x2/!)
0.9, 1.67778, 0.993939
The values derived are used again in the next iteration step x1, x2, x3
N%
1 9
+b317 x2 x3/,
1 10
+b327 2 x1 3 x3/,
1 11
+b337 3 x1 4 x2/!)
1.03513, 2.01818, 0.855556
again the resulting values are used in the next step x1, x2, x3
N%
1 9
+b317 x2 x3/,
0.98193, 1.94964, 1.01619
and so on
1 10
+b327 2 x1 3 x3/,
1 11
+b337 3 x1 4 x2/!)
Mathematics for Engineers
238
x1, x2, x3
1 N%
9
+b317 x2 x3/,
1 10
+b327 2 x1 3 x3/,
1 11
+b337 3 x1 4 x2/!)
1.00739, 2.00847, 0.97676
¶ The final result after this few iterations is an approximation of the true solution vector x 1, 2, 1. To measure the accuracy or the error of the solution we use the norm of vectors. Thus the error of the solution is estimated to be +k1/ ¶ +k/ (6.54) H «« x¶ x «« which gives a crude estimation of the total error in the calculation. The error of the individual components may be different for each component because the norm estimates an integral error of the iterates. To understand the behavior of the iteration method it is best to put the iteration formula into a vectormatrix format. Rewrite the linear system A.x¶ b as ¶ ¶ (6.55) N.x b P.x where A N P is a splitting of A. The matrix N must be non singular; and usually it is chosen so that the linear systems ¶ (6.56) N.z f are relatively easy to solve for general vectors f . For example, N could be diagonal, triangular, or tridiagonal. The iteration method is defined by ¶ +k1/ ¶ +k/ (6.57) N.x b P.x k 0, 1, 2, … The Jacobi iteration method is based on the diagonal structure of N. The following function implements the steps of a simple Jacobi iteration
IV - Chapter 6: Solutions of Equations
239
jacobiMethod$A_, b_, x0_, eps___( : Block%H
102 , Ha
100, P, x0in ,
+ determine the diagonal matrix N and its inverse / diag DiagonalMatrix#Tr#A, List''; idiag DiagonalMatrix#1 s Tr#A, List''; P diag A; Print#"N^+1/", " P ", N#Norm#idiag.P, 1'''; Print#"+N^+1/ P/2 ", N#Norm#idiag.P, 2'''; + set the initial guess for the solution / x0in x0; + iterate as long as the error is larger than the specified value / While%Ha ! H, xk1 Ha
idiag.+P.x0in/ idiag.b; N%
+xk1 x0in/.+xk1 x0in/ );
Print#xk , " ", PaddedForm#N#xk1', 8, 6', " H x0in N#xk1'
", PaddedForm#Ha, 8, 6'';
); xk1 )
The application of the function to the example discussed above delivers the successive iterates and the estimated error in printed form and a list of numbers as final result. jacobiMethod#9, 1, 1, 2, 10, 3, 3, 4, 11, 10, 19, 0, 0, 0, 0' N^+1/ P +N^+1/ P/2
0.474747 0.496804
xk
1.111111,
1.900000,
0.000000 H
2.201038
xk
0.900000,
1.677778, 0.993939 H
1.040128
xk
1.035129,
2.018182, 0.855556 H
0.391516
xk
0.981930,
1.949641, 1.016192 H
0.182571
xk
1.007395,
2.008472, 0.976760 H
0.075262
xk
0.996476,
1.991549, 1.005097 H
0.034765
xk
1.001505,
2.002234, 0.995966 H
0.014928
xk
0.999304,
1.998489, 1.001223 H
0.006820
0.999304, 1.99849, 1.00122
The printed values demonstrate that the error decreases and the solution approaches the result
Mathematics for Engineers
240
x¶
1, 2, 1 as expected from the calculations above.
For the general matrix A
a11 0 0 0 a22 0 0 0 ann
N
and P
aij of order n, the Jacobi method is defined with
(6.58)
N A. For the Gauß-Seidel method, let
N
and P
a11 0 a21 a22 an1 an2
0 0 ann
(6.59)
¶ N A. The linear system N.z
f is easily solved because N is diagonal for the Jacobi iteration methods and lower triangular for the Gauß-Seidel method. For systems A.x¶ b to which the Jacobi and Gauß-Seidel methods are often applied, the above matrices N are non singular. ¶+k/ ¶ ¶ +k/ To analyze the convergence of the iteration, subtract (6.55) from (6.57) and let H x x , obtaining ¶+k1/ NH ¶H+k1/
¶+k/ PH
(6.60)
M ¶H
+k/
(6.61)
where M N 1 P. Using the vector and matrix norm, we get +k1/ +k/ «« ¶H «« «« M «« «« ¶H «« .
(6.62)
By induction on k, this implies ¶+k/ ¶+0/ «« H «« «« M ««k «« H «« .
(6.63)
Thus, the error converges to zero if «« M «« 1.
(6.64)
We attempt to choose the splitting A N P so that this will be true, while also having the system ¶ N.z f be easily solvable. The result also says the error will decrease at each step by at least a factor +0/ of «« M ««; and that convergence will occur for any initial guess x¶ . The following lines implement the Seidel method. In Matrix form the Gauß-Seidel method reads +k1/ +L D/.x¶
¶ +k/ U.x b
which is equivalent to ¶ +k1/ ¶ +k/ x +L D/1 .,b U.x 0
(6.65)
(6.66)
IV - Chapter 6: Solutions of Equations
241
+k1/ and so can be viewed as a linear system of equations for x¶ with lower triangular coefficient matrix L D.
seidelMethod$A_, b_, x0_, eps___( : Block%H
106 , Ha
100, P, x0in ,
+ determine the diagonal matrix N and its inverse / diag DiagonalMatrix#Tr#A, List''; off A diag; l Table#If#i ! j, A3i, j7, 0', i, 1, Length#A', j, 1, Length#A''; u off l; ld diag l; ip Inverse#ld'; Print#"N^+1/ P ", N#Norm#ip.u, 1'''; Print#"+N^+1/ P/2 ", N#Norm#ip.u, 2'''; + set the initial guess for the soluti on / x0in x0; + iterate as long as the error is larger than the default value / While%Ha ! H, xk1 Ha
ip.+b u.x0in/; N%
+xk1 x0in/.+xk1 x0in/ );
Print#xk , " ", PaddedForm#N#xk1', 8, 6', " H x0in N#xk1'
", PaddedForm#Ha, 8, 6'';
); xk1 )
The same example as used for the Gauß-Jacobi method is used to demonstrate that the method converges quite rapidly. seidelMethod#9, 1, 1, 2, 10, 3, 3, 4, 11, 10, 19, 0, 0, 0, 0'
242
Mathematics for Engineers
N^+1/ P
0.520202
+N^+1/ P/2
0.328064
tk
1.111111,
1.677778, 0.913131 H
2.209822
tk
1.026150,
1.968709, 0.995753 H
0.314143
tk
1.003005,
1.998125, 1.000138 H
0.037686
tk
1.000224,
1.999997, 1.000060 H
0.003353
tk
1.000007,
2.000017, 1.000008 H
0.000224
tk
0.999999,
2.000003, 1.000001 H
0.000018
tk
1.000000,
2.000000, 1.000000 H
2.523111 106
tk
1.000000,
2.000000, 1.000000 H
2.980622 107
1., 2., 1.
Ô
6.2.5 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 6.2.5.1 Test Problems T1. What is a linear system of equations? T2. How is linear system of equations represented? T3. How many solutions does a linear system of equations allow? T4. How does the Gauß algorithm work? T5. Which kind of algebraic operations (additions, subtractions, multiplications interchanges, ...) do not change a linear system of equations? T6. What is a LU factorization? T7. Which kind of iterative solutions do you know? T8. How many solutions does a linear system of equations allow?
6.2.5.2 Exercises E1. For each of the following linear systems, obtain a solution by graphical methods, if possible. Explain the results from a geometrical point of view. x2 y 3 a. 2 x 4 y 6 x2 y 0 b. 2x4 y 0 2x yz 1 c. 2x4 yz 1 2x y 1 2 d. 4 x 2 y x3 y 5 E2. Use Gaußian elimination and two digit rounding arithmetic to solve the following linear systems. Do not reorder the
IV - Chapter 6: Solutions of Equations
243
equations. (The exact solution to each system is x 4x yz 8 6 a. 4 x 10 y 4 z x2 y4z 11 8 x 2 y 10 z 18 5 b. 2 x 4 y z x y3z 9
1, z
1, y
3).
E3. Consider the following matrices. Find the permutation matrix P so that P A can be factored into the product L U , where L is lower triangular with 1's on its diagonal and U is upper triangular for these matrices. 0 1 1 1 2 1 a. A 1 1 1 1 2 1 2 4 0 b. A 0 1 1 1 1 1 0 1 1 4 3 c. A 2 1 2 4 2 1 2 3 E4. Factor the following matrices into the L U 1.012 2.132 3.104 2.132 4.906 7.013 a. A 3.104 7.013 0.014 2 1 1 3 3 9 b. A 3 3 5 2.1756 4.0231 2.1732 4.0231 6.0000 0 c. A 1.0000 5.2107 1.1111 6.0235 7.00000 0
decomposition using the L U factorization algorithm with lii
5.1967 1.1973 0 4.1561
E5. Let A be the 20 20 tridiagonal matrix given by ai,i a1,2
a20,19
3, ai,i1
ai,i1
2, for each i
2. Let b be the twenty dimensional column vector given by b1
i 2, 3, ..., 19. Solve A x E6. Suppose m linear systems
1 for all i.
2, …, 19, and a1,1 b20
1 and bi
a20,20
3,
0, for each
b using the factorization for tridiagonal systems.
Ax
+k/
+k/
b , k
(1)
1, 2, …, m
are to be solved, each with the n n coefficient matrix A. Show that Gaußian elimination with backward substitution applied to the augmented matrix -A b
+1/
b
+2/
…b
+m/
1
(2)
requires n3 t 3 m n2 n s 3 multiplications/divisions and n3 t 3 m n2 n2 t 2 m n n s 6 additions/subtractions. E7. Find the first two iterations of the Jacobi method for the following linear systems, using x 0
0:
Mathematics for Engineers
244
10 5 0 0 x1 6 x2 5 10 4 0 25 0 4 8 1 x3 11 x4 0 0 1 5 11 4 1 0 1 0 0 x1 x2 1 4 1 0 1 0 0 1 4 0 0 1 x3 b. x4 1 0 0 4 1 0 0 1 0 1 4 1 x5 x6 0 0 1 0 1 4 3 1 1 x1 1 0 c. 3 6 2 x2 3 3 7 x3 4
0 5 0 6 2 6
E8. Repeat exercise E7 using the Gauß-Seidel method. E9. The forces on the bridge truss shown in the following Figure satisfy the equations listed in the table below:
Forces in a network of trusses. Joint Horizontal component 1
F1
2 2
2
2 2
f1
3 4
f1 f2 3 2
f2 f5 3 2
f4 f5
f4
Vertical component 2 2
0 0
2 2
f1 F2
f1 f3
f3 10 000
0 0
1 f 2 4
F3
0 1 f 2 4
0
0 0
Approximate the solution of the resulting linear system to within 102 accuracy. Use the Gauß-Seidel method, the Jacobi method to find a solution.
6.3 Eigenvalue Problem Another important problem in linear algebra is the computation of the eigenvalues and eigenvectors of a matrix. Eigenvalue problems occur frequently in connection with mechanical vibrations and other periodic motion. Example 6.6. String Three equal masses are connected by a string, uniformly spaced, between two supports.
IV - Chapter 6: Solutions of Equations
y1
245
y3
y2
Figure 6.2. A simple vibration problem.
When the masses are pulled from equilibrium in the y-direction, the tension in the string will pull them back toward equilibrium and cause the entire assembly to vibrate. Using a number of simplifying assumptions (e.g. that the extension from equilibrium is small compared with the length of the string) and some suitable scaling, the equations of motion will have the form d 2 y1 dt2
2 y1 y2
d2
y2 dt2
y1 2 y2 y3
2
d y3 dt2
(6.67)
y2 2 y3
Under certain conditions, the assembly will settle into a steady state in which all three masses move periodically with the same frequency. Solution 6.6. If we assume that the solution is given by yi
ai cos+Z t/
(6.68)
for each variable, we find that the vibrational frequencies are determined by Z 2 a1 Z2 a2
2 a1 a2 a1 2 a2 a3
Z2 a3
a2 2 a3
Putting O
(6.69)
Z2 , this can be written in matrix form as
2 1 0 a1 1 2 1 . a2 a3 0 1 2
a1 O a2 . a3
(6.70)
The vibrational frequencies are therefore determined by the eigenvalues of a symmetric 3 3 matrix.Ô The preceding is an instance of the simple eigenvalue problem ¶ ¶ A.x O x
(6.71)
Mathematics for Engineers
246
that is common in practice. Some physical situations lead to the more general eigenvalue problem A.x¶ O B.x¶ . (6.72) In both cases, the matrices are square, and we want to find eigenvalues O for which the equation has a ¶ nontrivial eigenvector x. Although the general eigenvalue problem has applications, we will restrict our discussion to the more common simple problem. The set of all eigenvalues of an eigenvalue problem, ¶ called the spectrum of A, will be denoted by V+A/. The pair +O, x/ is an eigensolution. As in the case of solutions of linear systems, the algorithms for computing eigenvalues involve a fair amount of manipulative detail. Fortunately, computer programs for them are readily available so the need for implementing them does not often arise. For most users it is not essential to remember all the fine points of the eigenvalue algorithms, but it is important to understand the basic ideas behind these methods, know how to use them, and have an idea of what their limitations are. For this, we first need to review some of the elementary results from the theory of eigenvalues. The results quoted here are results stated in the lecture on linear algebra (see Volume II). We will leave it to the reader to look up the proofs in books on linear algebra. The first few theorems are simple and can be proved as exercises. Theorem 6.4. Eigensolution
¶ ¶ If +O , x/ is an eigensolution of the matrix A, then +D O E , x/ is an eigensolution of the matrix ¶ ¶ D A E I. If x is an eigenvector, so is c x for any constant c 0. The second part of this theorem states that the eigenvectors are indeterminate to within a constant. ¶ When convenient, we can therefore assume that all eigenvectors are normalized by «« x «« 1. Theorem 6.5. Eigenvalues of A1 1 If +O , ¶x/ is an eigensolution for the matrix A and A is invertible, then , O , x¶ 0 is an eigensolution of
A1 .
Many of the techniques for solving eigenvalue problems involve transforming the original matrix into another one with closely related, but more easily computed, eigensolutions. The primary transformation used in this is given in the following definition. Definition 6.3. Similarity Transformation
Let T be an invertible matrix. Then the transformation B
T.A.T 1
(6.73)
is called a similarity transformation. The matrices A and B are said to be similar. Theorem 6.6. Eigensolution of Similar Matrices
¶ ¶ Suppose that +O , x/ is an eigensolution for the matrix A. Then +O , T.x/ is an eigensolution of the
IV - Chapter 6: Solutions of Equations
247
similar matrix B
T.A.T 1 .
(6.74)
An important theorem, called the Alternative Theorem, connects the solvability of a system A.x¶ with the spectrum of A.
b
Theorem 6.7. Alternative Theorem
The n n linear system ¶ +A O I/.x y
(6.75)
has a unique solution if and only if D is not in the spectrum of A. The alternative theorem gives us a way of analyzing the spectrum of a matrix. If O is in V+A/ then A O I cannot have an inverse; consequently, if A O I is not invertible, then O must be in the spectrum of A. This implies that the spectrum of A is the set of all O for which det+A O I/
0.
(6.76)
Since the determinant of any matrix can be found with a finite number of multiplications and divisions, say by expansion of minors, it follows that det+A O I/ is a polynomial of degree n in O. This is the characteristic polynomial which will be denoted by p A +O/
det+A O I/.
(6.77)
The eigenvalue problem is therefore reducible to a polynomial root-finding problem which, at least in theory, is well understood. While we usually do not solve eigenvalue problems quite in this way, the observation is used in many of the eigenvalue algorithms. For instance, we can draw on our extensive knowledge of polynomials and their roots to characterize the spectrum. Theorem 6.8. Number of Eigenvalues
An n n matrix has exactly n eigenvalues in the complex plane. In this theorem we must account for multiple roots of p A +O/. If O0 is a root of p A +O/ with multiplicity k, then O0 is an eigenvalue with the same multiplicity. In practice, one way of finding eigenvalues is to transform the original matrix by a sequence of similarity transformations until it becomes more manageable, for example, so that its determinant can be evaluated quickly or so that its eigenvalues can be immediately read off. The latter is certainly true if the matrix is diagonal. When a matrix is not diagonal, but in some way nearly so, we can use the next result to locate its eigenvalues approximately. Theorem 6.9. Spectrum of a Matrix A
The spectrum of a matrix A is contained entirely in the union of the disks (in the complex plane)
Mathematics for Engineers
248
di +D/
O : Aii O Å Aij !
i
1, 2, 3, …, n.
(6.78)
ji
This is a simplified version of the Gerschgorin Theorem. The more general version states that each disjoint region, which is made up from the union of one or more disk components, contains exactly as many eigenvalues as it has components. Theorem 6.9 is useful because it tells us exactly how many eigenvalues there are in a region. Unfortunately, we may have to go into the complex plane to find them and this complicates matters. There is also the issue of eigenvectors that is less readily resolved. All of this becomes a lot easier when we deal with symmetric matrices. Theorem 6.10. Symmetric Matrices Eigenvalues
The eigenvalues of a symmetric matrix are all real. To each eigenvalue there corresponds a distinct, real eigenvector. Theorem 6.11. Symmetric Matrices Eigenvalues
If A is symmetric, then the eigenvectors associated with two different eigenvalues are orthogonal to each other. This implies that, even if there are multiple eigenvalues, there are eigenvectors u 1 , u 2 , …, un such that T
ui u j
0
for all i j.
(6.79)
Since the orthogonality of the eigenvectors makes them linearly independent, it follows from this theorem that every n-vector x¶ can be expanded as a linear combination of the eigenvectors of a symmetric n n matrix. In particular, because we can always normalize the eigenvectors such that T u i ui 1, we have that for every n-vector x¶ ¶x
n
¶T Å ,x u i 0 ui .
(6.80)
i 1
Such simple results normally do not hold for non symmetric matrices, but they are of great help in working with symmetric cases. Since in practice symmetric matrices are very common, the symmetric eigenvalue problem has received a great deal of attention. The power method is conceptually the simplest way of getting eigenvalues of a symmetric matrix. It is an iterative method that, in many cases, converges to the eigenvalue of largest magnitude. For simplicity, we will assume that the eigenvalues of the symmetric matrix A are all simple and have been labeled so that O1 ! O2 …. The corresponding eigenvectors will be u 1 , u 2 ,… ¶ Take any vector x; by Theorem 6.11, this vector can be expanded as
IV - Chapter 6: Solutions of Equations
249
n
¶ x
Å ai u i .
(6.81)
i 1
Then A.x¶
n
Å ai Oi ui
(6.82)
i 1
and generally ¶ Ak .x
n k Å ai Oi ui i 1
n
O1k Å ai i 1
Oi O1
k
ui
(6.83)
As k becomes large all the terms in this sum will become small except the first, so that eventually ¶ (6.84) Ak x O1k a1 u 1 . If we take any component of this vector, say the ith one, and compare it for successive iterates, we find that the dominant eigenvalue O1 is approximated by ¶ ,Ak1 .x0i O1 (6.85) ¶ . ,Ak .x0i For stability reasons, we usually take i so that we get the largest component of ,Ak .x¶0i , but in principle any i will do. An approximation to the corresponding eigenvector can be obtained by u c Ak .x¶ , 1
(6.86)
where the constant c can be chosen for normalization. In practice, we do not evaluate (6.85) by computing the powers of A. Instead, we iteratively compute ¶ ¶ xk1 A.xk (6.87) with initial guess x¶0 ¶ +xk1 /i O1 ¶ . +xk /i
x¶. Then
Example 6.7. Power Method Apply the power method to the following matrix
(6.88)
Mathematics for Engineers
250
A
1 0 0.2` 0 3 0.1` 0.2` 0.1` 0 0.1` 0.2` 0.1`
1 0 0.2 0 3 0.1 0.2 0.1 0 0.1 0.2 0.1
0.1` 0.2` ; MatrixForm#A' 0.1` 2
0.1 0.2 0.1 2
with the initial guess x
1, 1, 1, 1
1, 1, 1, 1
or we can use any other random choice of the initial vector like x
RandomReal#'4i
1
0.624162, 0.456389, 0.61321, 0.514327
Solution 6.7. The iterations of the vectors are done by nested applications of the matrix product on the initial vector
IV - Chapter 6: Solutions of Equations
l1
251
NestList#A.Ó1 &, x, 24' 0.624162 0.798237 0.761254 1.01117 0.86346 2.02893 2.08166 8.98674 15.9562 64.569 155.054 539.992 1472.43 4737.49
0.456389 1.53335 4.45952 13.8089 40.8908 124.621 372.416 1127.49 3383.76 10 214.6 30 714.5 92 597.9 278 675. 839 663.
0.61321 0.221904 0.231619 0.801799 1.27505 5.17223 11.8862 42.1858 113.079 367.565 1051.34 3277.13 9644.8 29 498.1
0.514327 0.81364 2.03596 3.08074 9.10456 9.8171 45.2785 14.6769 259.97 169.717 1746.7 2770.14 13 361. 30 124.8
13 649.6
2.52796 106
87 926.3
111 107.
42 345.5
7.6149 10
266 637.
293 537.
125 027.
2.29301 107
381 570.
7
1.13899 10
6
6.90635 10 6
8
2.07981 10
799 313.
966 805.
2.41469 10
6
2.74484 106
7.25715 10
6
8.60265 106
3.45069 106 6.26389 108 2.18862 107 2.52305 107 1.0351 107
1.8864 109
6.58521 107 7.73505 107
3.12565 107 5.68126 109 1.98446 108 9.39656 10
7
1.71097 10
10
5.97398 10
8
2.302 108 6.98823 108
2.83327 108 5.15285 1010 1.79964 109 2.09343 109 8.52599 108 1.55184 1011 5.41886 109 6.32715 109
The observation is that the numbers in the vector increase in each iteration. However the ratio of the last and the previous to the last value deliver the eigenvalue in a correct way as predicted l
Last#Last#l1'' Last#l1327'
3.02239
This value is in agreement with the methods used by Mathematica to derive the complete set of eigenvalues Eigenvalues#A' 3.01163, 2.01529, 1.04267, 0.0390154
The normalization of the related vector delivers the eigenvector as
Mathematics for Engineers
252
Last#l1' u1 Last#l1'.Last#l1' 0.00548612, 0.998547, 0.0348682, 0.0407126
Again if we compare the derived eigenvector with the eigenvectors calculated by Mathematica we get a close agreement between the vectors. Note the minus sign in front of the eigenvector in the Mathematica calculation. Eigenvectors#A' 0.00548788, 0.99855, 0.0348708, 0.0406546, 0.0301335, 0.038901, 0.0445904, 0.997793, 0.980882, 0.0135727, 0.190454, 0.0376048, 0.192179, 0.0346609, 0.980063, 0.0366429
These approximations of the eigenvalue and the eigenvector can be expected to be accurate to at least three significant digits.Ô The power method is simple, but it has some obvious shortcomings. First, note that the argument leading to the iteration formula for O works only if a1 0; that is, a starting guess x¶ must have a component in the direction of the eigenvector u1 . Since we do not know u 1 , this is hard to enforce. In practice, though, rounding will eventually introduce a small component in this direction, so the power method should work in any case. But it may be quite slow and it is a good idea to use the best guess for u1 as an initial guess. If no reasonable value for u 1 is available, we can simply use a random number generator to choose a starting value. The power method is iterative, so its rate of convergence is of concern. It is not hard to see that it has an iterative order of convergence one and that each step reduces the error roughly by a factor of O2
c
O1
.
(6.89)
The method, as described, works reasonably well only if the dominant eigenvalue is simple and significantly separated from the next largest eigenvalue. We can sometimes make an improvement by shifting the origin, as indicated by Theorem 6.4. If we know the approximate positions of the eigenvalue closest to O1 say O2 , and the eigenvalue farthest from it, say Ok , we can shift the origin so that O2
Ok
O1
O1
,
(6.90)
which minimizes the ratio of largest to second largest eigenvalue. This can be achieved by shifting the origin by an amount E, where E is halfway between O2 and Ok . When we now apply the power method to the matrix A I E, we can expect faster convergence. Example 6.8. Power Method Improved Consider the matrix A from the last example
IV - Chapter 6: Solutions of Equations
253
MatrixForm#A' 1 0 0.2 0 3 0.1 0.2 0.1 0 0.1 0.2 0.1
0.1 0.2 0.1 2
Gerschgorin's theorem tells us that the approximated location of the eigenvalues are 2, 0, 1, 3. Solution 6.8. We therefore expect that in the computations the error is reduced approximately by a factor of 2 s 3 on each iteration. If we shift the origin with 12
E
2 1
2
If we shift the original matrix A by B
A E IdentityMatrix#4'; MatrixForm#B' 3 2
0
0.2 0.1
0
7 2
0.1 0.2
0.2 0.1
1 2
0.1 3
0.1 0.2 0.1 2
we find the eigenvalues by the same iteration procedure by using the same initial guess for x¶ 1, 1, 1, 1
x
1, 1, 1, 1
The iteration delivers the transformed vectors l1
NestList#B.Ó1 &, x, 14'
1, 1, 1, 1, 1.8, 3.8, 0.9, 1.1, 2.77, 13.17, 1.08, 2.68, 4.639, 46.739, 2.679, 1.001, 7.3942, 163.654, 6.8411, 11.5811, 13.6176, 575.79, 22.4229, 16.7827, 26.5893, 2020.86, 73.1923, 93.588, 63.8812, 7099.06, 253.359, 273.769, 173.871, 24 926.8, 876.739, 1040.88, 540.242, 87 539.7, 3069.91, 3529.1, 1777.25, 307 402., 10 749.9, 12 575.3, 6073.39, 1.0795 106 , 37 728.1, 43 870.1 , 21 042.7, 3.79078 106 , 132 415., 154 474. , 73 494.5, 1.33119 107 , 464 942., 541 791. , 257 409., 4.67464 107 , 1.63254 106 , 1.90353 106
Mathematics for Engineers
254
The ratio of the last and the previous to the last value deliver the eigenvalue Last#Last#l1'' l
Last#l1327'
E
3.01341
Here we have to use the shift E again because otherwise the eigenvalue is to large. The normalization of the related vector delivers the eigenvector as Last#l1' u1 Last#l1'.Last#l1' 0.00549851, 0.998549, 0.0348726, 0.0406613
We can expect an error attenuation of about 3 s 7 0.428571 on each iteration, about twice as fast as the original computation.Ô
6.3.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 6.3.1.1 Test Problems T1. What are eigenvalues? T2. What are eigenvectors? T3. What are eigenvalues and eigenvectors useful for? T4. What is the characteristic polynomial of a matrix? T5. How many eigenvalues does a n n matrix have? T6. What is the important property of a symmetric matrix? T7. How are the eigen values of the inverse matrix related to the original matrix?
6.3.1.2 Exercises E1. Compute the eigenvalues and associated eigenvectors of the following matrices. 2 1 a. A 1 2 1 1 b. A 2 2 c. A
d. A
0
1 3
1 3
0
2 1 1 2 3 2 1 1 2
E2. Show that the characteristic polynomial p +O/ E3. Show that A is singular if and only if O
det+A OI/ for the n n matrix A is an nth degree polynomial.
0 is an eigenvalue of A.
E4. Show that if O is a eigenvalue of a matrix A and «« • «« is a vector norm, then an eigenvector x associated with O exists with «« x «« 1.
7 Ordinary Differential Equations
7.1 Introduction Differential equations are among the most important mathematical tools used in producing models of engineering, of physics, and biology. In this chapter, we consider numerical methods for solving ordinary differential equations, that is, those differential equations that have only one independent variable. The differential equations we consider are of the form y' +x/
f +x, y+x//
x x0
(7.1)
where y+x/ is an unknown function that is being sought. This kind of equations is found in modeling electrical circuits for example. In a simple electrical circuit, the current I is a function of time I I+t/. The function I+t/ will satisfy an ordinary differential equation if the form as give above dI dt
f +t, I/.
(7.2)
Here again the right hand side is a function of t and I that depends on the circuit and on the nature of the electric forces acting or supplied to the circuit. In this chapter we will develop methods which will allow us to solve this kind of equations numerically. The following sections are concerned with the numerical solution of initial value problems for ordinary differential equations. We will introduce the most basic one-step methods, beginning with the most basic Euler scheme, and working up to the extremely popular Runge–Kutta fourth-order method that can be successfully employed in most situations.
Mathematics for Engineers
256
7.2 Mathematical Preliminaries It will be useful to review some elementary definitions and concepts from the theory of differential equations. An equation involving a relation between the values of an unknown function and one or more of its derivatives is called a differential equation. We shall always assume that the equation can be solved explicitly for the derivative of highest order. An ordinary differential equation of order n will then have the form y+n/ +x/
f ,x, y+x/, y+1/ +x/, …, y+n1/ +x/0
(7.3)
By a solution of (7.3) we mean a function I+x/ which is n times continuously differentiable on a prescribed interval and which satisfies (7.3); that is I+x/ must satisfy I+n/ +x/
f ,x, I+x/, I+1/ +x/, …, I+n1/ +x/0
(7.4)
The general solution of (7.3) will normally contain n arbitrary constants, and hence there exists an nparameter family of solutions. If y+x0 /, y' +x0 /, …, y+n1/ +x0 / are prescribed at one point x x0 , we have an initial-value problem. We shall always assume that the function f satisfies conditions sufficient to guarantee a unique solution to this initial-value problem. A simple example of a first-order equation is eq1
y+x/ x
y+x/
y
+x/ y+x/
Its general solution is the exponential function solution
DSolve#eq1, y, x'
y +x Ì c1 Æx /
where c1 is an arbitrary constant. If the initial condition y+x0 / written solution
y0 is prescribed, the solution can be
DSolve#eq1, y+x0/ y0, y, x'
y ,x Ì y0 Æxx0 0
Differential equations are further classified as linear and nonlinear. An equation is said to be linear if the function f in (7.3) involves y and its derivatives linearly. Linear differential equations possess the important property that if y1 +x/, y2 +x/, …, ym +x/ are any solutions of (7.3), then so is m
y+x/
C1 y1 +x/ C2 y2 +x/ … Cm ym +x/
Å Ck yk +x/ k 1
for arbitrary constants Ck . A simple second-order equation is
(7.5)
IV - Chapter 7: Ordinary Differential Equations
2 y+x/
eq2
xx
257
y+x/
y
+x/ y+x/
It is easily verified that Æx and Æx are solutions of this equation, and hence by linearity the following sum is also a solution: y +x Ì c1 Æ x c2 Æx /
sol2
y +x Ì c1 Æx c2 Æx /
which can be verified by a direct substitution of the solution into the equation eq2 s. sol2 True
Two solutions y1 and y2 of a second-order linear differential equation are said to be linearly independent if the Wronskian of the solution does not vanish, the Wronskian is defined by ,
W+y1 , y2 /
,
y1 y1 . , y2 y2
,
y1 y2 y2 y1
(7.6)
The concept of linear independence can be extended to the solutions of equations of higher order. If y1 +x/, y2 +x/,…, yn +x/ are n linearly independent solutions of a homogeneous differential equation of order n, then n
y+x/
Å Ck yk +x/
(7.7)
k 1
is called a general solution. Among linear equations, those with constant coefficients are particularly useful since they lend themselves to a simple treatment. We write the nth -order linear differential equation with constant coefficients in the form Ly
y+n/ an1 y+n1/ … a0 y+0/
0
(7.8)
where the ai are assumed to be real. If we seek solutions of (7.8) in the form Æ E x , then direct substitution shows that E must satisfy the polynomial equation En an1 En1 … a0
0
(7.9)
This is called the characteristic equation of the nth -order differential equation (7.8). If the equation (7.9) has n distinct roots Ei +i 1, 2, …, n/ then it can be shown that n
y+x/
Å Ck Æ Ek x k 1
(7.10)
258
Mathematics for Engineers
where the Ck are arbitrary constants. If E1 D Ç J is a complex root of (7.9), so is its conjugate, E2 D Ç J. Corresponding to such a pair of conjugate-complex roots are two solutions y1 ÆD x cos+J x/ and y2 ÆD x sin+J x/, which are linearly independent. When (7.9) has multiple roots, special techniques are available for obtaining linearly independent solutions. In particular, if E1 is a double root of (7.9), then y1 Æ E1 x and y2 x Æ E1 x are linearly independent solutions of (7.8). For the special equation eq3
2 y+x/ xx
a2 y+x/ 0
a2 y+x/ y
+x/ 0
the characteristic equation is chareq
Simplify%Thread%
eq3 s. y ,x Ì Æ x E 0 Æx E
, Equal))
a2 E2 0
with roots solCar3
Solve#chareq, E'
E Ç a, E Ç a
and its general solution is sol
Fold$Plus, 0, MapThread$Ó2 Æ x E s. Ó1 &, solCar3, c1 , Ç c2 ((
c1 ÆÇ a x Ç c2 ÆÇ a x
which can be represented by trigonometric functions Simplify#ExpToTrig#sol'' + c2 Ç c1 / sin+a x/ +c1 Ç c2 / cos+a x/
Finally, if Equation (7.8) is linear but non homogeneous, i.e., if Ly
y+n/ an1 y+n1/ … a0 y+0/
g+x/
(7.11)
and if ]+x/ is a particular solution of (7.11), i.e., if L ]+x/
g+x/
(7.12)
then the general solution of (7.11), assuming that the roots of (7.8) are distinct, is n
y+x/
]+x/ Å Ck Æ Ek x k 1
(7.13)
IV - Chapter 7: Ordinary Differential Equations
259
Example 7.1. Linear ODE Find the solution of the equation eq4
2 y+x/ xx
3 y+x/ 4
y+x/ x
x
y
+x/ 4 y
+x/ 3 y+x/ x
satisfying the initial conditions y+0/
initialConditions y+0/
4 9
, y
+0/
7 3
4 9
, y
+0/
7 3
!
!
Solution 7.1. To find a particular solution ]+x/, we try y +x Ì a x b/
particularSolution y +x Ì a x b/
since the right side is a polynomial of degree 1 and the left side is such a polynomial whenever y y+x/ is. Substituting this ansatz into the ODE, we find a polynomial relation eq4 s. particularSolution
ansatz1
3 +a x b/ 4 a x
who's coefficients should vanish to satisfy the relation. The resulting equations or the coefficient are coefficentEquations
Thread#Flatten#Normal#CoefficientArrays#ansatz1, x''' 0'
3 b 4 a 0, 3 a 1 0
which results to solCoeff a
1 3
Solve#coefficentEquations, a, b' ,b
4 9
!!
Hence the particular solution of the ODE is given by particularSolution s. solCoeff
particularSolution y x Ì
x 3
4 9
!
To find solutions of the homogeneous equation
Mathematics for Engineers
260
First#eq4' 0
homEq4
y +x/ 4 y +x/ 3 y+x/ 0
we examine the characteristic equation charEq
Simplify%Thread%
homEq4 s. y ,x Ì Æ x E 0 Æx E
, Equal))
E2 3 4 E
Its roots are Flatten#Solve#charEq, E''
solCharEq
E 1, E 3
Hence the two linearly independent solutions of the homogeneous system are Fold$Plus, 0, MapThread$Ó2 Æ x E s. Ó1 &, solCharEq, c1 , c2 ((
homogenousSol c1 Æx c2 Æ3 x
The general solution of equation thus is + y+x/ s. particularSolution/ homogenousSol
generalSolution c1 Æx c2 Æ3 x
x 3
4 9
To find the solution satisfying the initial conditions, we must have ic1
+generalSolution s. x 0/ initialConditions31, 27
c1 c2
4 9
4 9
and ic2
generalSolution x
c1 3 c2
1 3
s. x 0 initialConditions32, 27
7 3
defining a system of determining equations for the unknown constants C1 and C2 . The solution of these equations is solDet
Flatten#Solve#ic1, ic2, c1 , c2 ''
c1 1, c2 1
Hence the desired solution is
IV - Chapter 7: Ordinary Differential Equations
solution
261
y +x Ì $Y/ s. $Y +generalSolution s. solDet/
y xÌ
x 3
Æ x Æ3 x
4 9
This solution can be used to verify the original ODE just by inserting it into the equation Simplify#eq4 s. solution' True
demonstrating that the derived solution is in fact a solution.Ô
7.2.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 7.2.1.1 Test Problems T1. What is a differential equation? T2. What is an ordinary differential equation? T3. What is an order of an ordinary differential equation? T4. How are solutions of an ordinary differential equation defined? T5. What is the difference between a linear and a non linear ordinary differential equation? T6. What is the superposition principle telling you? T7. What is the Wronskian? T8. What is an initial value problem?
7.2.1.2 Exercises E1. Solve the following initial value problems a. y' y+x/ b with y+x0 / y0 and b ± 5. with y+x0 / y0 . b. y' y+x/2 with y+x0 / y0 . c. y' y s y' E2. Solve the following ordinary differential equations by using the characteristic method a. y'' y+x/. b. y+2/ y+x/ y'. E3. Find the solution of the following differential equations: a. y' t2 5 t t1s2 b. y' y c. y' y d. y'' y E4. Find the solution of the following initial value problems: a. y' t2 t1s3 y+0/ 2 b. y' 2 y y+0/ 15 c. y' y y+0/ 1 d. y'' y y+S/ 0, y' +0/ 2 E5. When a flexible cable of uniform density is suspended between two fixed points and hangs of its own weight, the
Mathematics for Engineers
262
shape y
f +x/ of the cable must satisfy a differential equation of the form d2 y dx2
k
1
dy
2
(1)
dx
where is a positive constant. a. Let z dys dx in the differential equation. Solve the resulting first order differential equation (in z), and then integrate to find y. b. Determine the length of the cable with its highest point at + b, h/ and its lowest point in the center +0, a/ assume h ! a.
7.3 Numerical Integration by Taylor Series We are now prepared to consider numerical methods for integrating differential equations. We shall first consider a first-order initial-value differential equation of the form y' +x/
f +x, y+x//
with y+x0 /
y0 .
(7.14)
The function f may be linear or nonlinear, but we assume that f is sufficiently differentiable with respect to both x and y. It is known that (7.14) possesses a unique solution if all f s y is continuous on the domain of interest. If y+x/ is the exact solution of (7.14), we can expand y+x/ into a Taylor series about the point x x0 : ser1
Series# y+x/, x, x0, 3'
y+x0/ +x x0/ y
+x0/
1 2
+x x0/2 y
+x0/
1 6
+x x0/3 y+3/ +x0/ O,+x x0/4 0
The derivatives in this expansion are not known explicitly since the solution is not known. However, if f is sufficiently differentiable, they can be obtained by taking the total derivative of (7.14) with respect to x, keeping in mind that y is itself a function of x. Thus we obtain for the first few derivatives:
IV - Chapter 7: Ordinary Differential Equations
263
f +x, y+x//;
t0
Do%t0
Append%t0,
Last#t0' x
s. y
+x/ f +x, y+x//), i, 1, 3);
TableForm#t0' f +x, y+x// f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x// f +0,1/ +x, y+x// , f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x//0 f +x, y+x// f +1,1/ +x, y+x// f +x, y+x// , f +x, y+x// f +0,2/ +x, y+x// f +1,1/ +x, y+x//0 f +2,0/ +x, y+x// , f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x//0 f +1,1/ +x, y+x// 2 , f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x//0 , f +x, y+x// f +0,2/ +x, y+x// f +1,1/ +x, y+x//0 f +0,1/ +x, y+x// , f +0,1/ +x, y+x// , f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x//0 f +x, y+x// f +1,1/ +x, y+x// f +x, y+x// , f +x, y+x// f +0,2/ +x, y+x// f +1,1/ +x, y+x//0 f +2,0/ +x, y+x//0 f +x, y+x// f +2,1/ +x, y+x// f +x, y+x// , f +x, y+x// f +1,2/ +x, y+x// f +2,1/ +x, y+x//0 f +x, y+x// , f +0,2/ +x, y+x// , f +x, y+x// f +0,1/ +x, y+x// f +1,0/ +x, y+x//0 f +x, y+x// f +1,2/ +x, y+x// f +x, y+x// , f +x, y+x// f +0,3/ +x, y+x// f +1,2/ +x, y+x//0 f +2,1/ +x, y+x//0 f +3,0/ +x, y+x//
Continuing in this manner, we can express any derivative of y in terms of f +x, y/ and its partial derivatives. It is already clear, however, that unless f +x, y/ is a very simple function, the higher total derivatives become increasingly complex. For practical reasons then, one must limit the number of terms in the Taylor expansion of the right hand side of (7.14) to a reasonable number, and this restriction leads to a restriction on the value of x for which the expansion is a reasonable approximation. If we assume that the truncated series above yields a good approximation for a step of length h, that is, for x x0 h, we can then evaluate y at x0 h; reevaluate the derivatives y', y''; etc., at x x0 h; and then use the Taylor expansion if f to proceed to the next step. If we continue in this manner, we will obtain a discrete set of values yn which are approximations to the true solution at the points xn x0 n h +n 0, 1, 2, …/. In this chapter we shall always denote the value of the exact solution at a point xn by y+xn / and of an approximate solution by yn . In order to formalize this procedure, we first introduce the operator Tk +x, y/ In[21]:=
f +x, y/
h 2
f ' +x, y/ …
hk1 k
f +k1/ +x, y/ with k
1, 2, …
TaylorFormula$x_, f_, x0_, y0_, n_( : Block#t0, t0 f; Do#t0 Append#t0, x Last#t0' s. y '#x' ! f', i, 1, n'; rule1 Map#Apply#Rule, Ó' &, Transpose#Table#D#y#x', x, i', i, 0, n', t0'' s. x ! x0; Normal#Series#y#x', x, x0, n'' s. rule1 s. y#x0' ! y0, x x0 ! h '
(7.15)
Mathematics for Engineers
264
The application of this operator to a function f +x, y+x// of order 0 delivers h TaylorFormula+x, f +x, y+x//, x0, y0, 0/ h f +x0, y0/
where we assume that a fixed step size h is being used, and where f + j/ denotes the jth total derivative of the function f +x, y+x// with respect to x. We can then state the following Algorithm. Algorithm Taylor: Taylor's algorithm of order k to find an approximate solution of the differential equation y' y+a/
f +x, y/
(7.16)
y0
(7.17)
along an interval #a, b' 1. Choose a step h +b a/ s N . Set xn
anh
with n
0, 1, 2, …, N
(7.18)
2. Generate approximations yn to y+xn / from the recursion yn1 yn h Tk +xn , yn / for n where Tk +x, y/ is defined by +7.15/.
0, 1, 2, …, N 1
(7.19)
3. Iterate formula (7.19) by N steps. The discussed algorithm is implemented in the following function TaylorSolve which uses the ODE and its initial conditions as its main input information. The dependent variable is defined in the second slot. The third slot contains the independent variable and the interval #a, b' on which the solution is derived. Nmax specifies the maximal number of subintervals in the interval #a, b'. The order k parameter defines the order of approximation in the Taylor series expansion.
IV - Chapter 7: Ordinary Differential Equations
In[22]:=
265
TaylorSolve$eq_List, depend_, independ_, a_, b_ , Nmax_, orderK_( : Block%f$h, h, x0, yn, lis, yn1, tylor, + extract information from equation / f$h Last#First#eq''; y0 Last#Last#eq''; + fix step length / +b a/ ; h Nmax + initialize iteration / x0 a; yn y0; xn x0; lis x0, yn; + generate Taylor formula / tylor TaylorFormula#independ, f$h, [, ], orderK'; + iterate the formula numerically / Do# yn1 yn N#h tylor s. [ ! xn, ] ! yn'; xn x0 +n 1/ h; AppendTo#lis, xn, yn1'; yn yn1, n, 0, Nmax 1'; + return the results / y ! Interpolation#lis' )
The following line shows an application of the function TaylorSolve to the initial value problem y'
+x y+x//2
with y+0/
1 y+x/ TaylorSolve
y+x/ x
1 on the interval #0, 2'.
+x y+x//2 y+x/ 1
(7.20)
, y+0/ 1!, y, x, 0, 2, 38, 1
y InterpolatingFunction#'
The returned solution represents the numerical solution in pairs xn , yn . Using these pairs in an interpolation we can represent them by a function. The accuracy of the algorithm depends on Nmax and k the order of approximation. To see how critical the influence of these parameters are let us examine this in a few examples. If we keep both of these numbers low, we can derive the following approximations of the solution
Mathematics for Engineers
266
s1
Table%TaylorSolve
y+x/ x
+x y+x//2 y+x/ 1
, y+0/ 1!, y, x, 0, 2, 8, k , k, 0, 3)
y InterpolatingFunction#', y InterpolatingFunction#', y InterpolatingFunction#', y InterpolatingFunction#'
These four solutions can be compared with a more sophisticated approach used by the Mathematica function NDSolve which generates for the same initial value problem the solution by sh
NDSolve%
y+x/ x
+x y+x//2 y+x/ 1
, y+0/ 1!, y, x, 0, 2)
y InterpolatingFunction#'
Plotting all five solutions in a common plot allows us to compare qualitatively the solutions.
1.30
k
0
k
2
k
3
k
1
1.25
y
1.20 1.15 1.10 1.05 1.00 0.0
0.5
1.0
1.5
2.0
x Figure 7.1. Solution of the initial value problem y '
+xy+x//2 1y+x/
with y+0/
1 by using different orders of a Taylor approximation.
The different approximation orders of the Taylor algorithm are located either above or below the Mathematica solution. The lowest order approximation, corresponding to an Euler approximation is located above the Mathematica solution. The higher order approximations of the Taylor series are below the Mathematica solution. The Taylor approximations only deviate from each other in the first approximation. However the higher approximations are very close to each other. This means that in a Taylor approximation of ordinary differential equations it is sufficient to work with a second order approximation. Higher order approximations do not increase the accuracy if the time steps are large. If we decrease the length of the time step the accuracy of the integration becomes better and better. The following sequence of solution demonstrates this behavior.
IV - Chapter 7: Ordinary Differential Equations
267
N
N
20 h
0.1 k
3
1.30 1.25 1.20 y
Out[24]=
1.15 1.10 1.05 1.00 0.0
0.5
1.0
1.5
2.0
x
Figure 7.2. Solution of the initial value problem y '
+xy+x//2 1y+x/
with y+0/
1 by using different orders of a Taylor approximation.
In conclusion the step size of the algorithm influences the accuracy more pronounced then the approximation order of the Taylor representation. In addition, if the step size is sufficiently small the differences between the lower order approximations and the "higher" order approximation in the Taylor series vanishes. Taylor's algorithm, and other methods based on the above algorithm, which calculate y at x xn1 by using only information about y and y' at a single point x xn , are frequently called one-step methods. Taylor's theorem with remainder shows that the local error of Taylor's algorithm of order k is E
hk1 f +k/ +[, y+[//
hk1
+k 1/
+k 1/
y+k1/ +[/ with xn [ xn h.
(7.21)
The Taylor algorithm is said to be of order k if the local error E as defined above is O,hk1 0. On setting k
0 in the algorithm above, we obtain Euler's method and its local error,
yn h f +xn , yn / with E
yn1
h2 2
y'' +[/
(7.22)
To illustrate Euler's method, consider the initial-value problem y'
y
with y+0/
1
(7.23)
Mathematics for Engineers
268
On applying (7.22) with h sol2
0.01 and retaining six decimal places, we obtain
TaylorSolve
y+x/ x
y+x/, y+0/ 1!, y, x, 0, 1, 100, 0
y InterpolatingFunction#'
Since the exact solution of this equation is y Æ x , we can compare Euler's algorithm with the exact solution. It is clear that, to obtain more accuracy with Euler's method, we must take a considerably smaller value for h this demonstrated in the following sequence of calculations.
Nit
h Out[32]=
Error Plot
0.0714286
2.5 2.0 1.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Figure 7.3. Solution of the initial value problem y '
0.00 0.02 0.04 0.06 0.08 0.10 0.0 0.2 0.4 0.6 0.8 1.0
+xy+x//2 1y+x/
with y+0/
1 by using different orders of a Taylor approximation.
Because of the small step size required, Euler's method is not commonly used for integrating differential equations. We could, of course, apply Taylor's algorithm of higher order to obtain better accuracy, and in general, we would expect that the higher the order of the algorithm, the greater the accuracy for a given step size. If f +x, y/ is a relatively simple function of x and y, then it is often possible to generate the required derivatives relatively cheaply on a computer by employing symbolic differentiation, or else by taking advantage of any particular properties the function f +x, y/ may have. However, the necessity of calculating the higher derivatives makes Taylor's algorithm completely unsuitable on high-speed computers for general integration purposes. Nevertheless, it is of great theoretical interest because most of the practical methods attempt to achieve the same accuracy as a Taylor algorithm of a given order without the disadvantage of having to calculate the higher derivatives. Although the general Taylor algorithm is hardly ever used for practical purposes, the special case of Euler's method will be considered in more detail for its theoretical implications. Example 7.2. Taylor Algorithm Using Taylor's series, find the solution of the differential equation
IV - Chapter 7: Ordinary Differential Equations
y+x/
eq5
x
y
+x/
1 2
x
1
y+x/2
x2
y+x/2
269
y+x/ x
y+x/ x
with the initial value y+1/ 1
initial
y+1/ 1
for the interval #1, 2'. Solution 7.2. The numerical solution follows by applying TaylorSolve to the ODE sol5
TaylorSolve+eq5, initial, y, x, 1, 2, 17, 2/
y InterpolatingFunction#'
the graphical representation shows that the solution agrees fairly well with the exact solution 1 y+x/ . x h
0.5
0.0588235
0.6
y
0.7 0.8 0.9 1.0 1.0
1.2
1.4
1.6
1.8
2.0
x Figure 7.4. Solution of the initial value problem y
+x/ order 2 and n
1 x2
y+x/2
y+x/ x
with y+1/
1 by using a Taylor approximation of
17.
Increasing the total number of subintervals decreases the time step so that the solution becomes fairly good for an approximation order of k 2
Mathematics for Engineers
270
h
0.5
0.00588235
0.6
y
0.7 0.8 0.9 1.0 1.0
1.2
1.4
1.6
1.8
2.0
x Figure 7.5. Solution of the initial value problem y
+x/ order 2 and n
1 x2
y+x/2
y+x/ x
with y+1/
1 by using a Taylor approximation of
170.
However, if we increase the interval and keep the number of total subdivisions constant, we observe that the numerical solution collapses as shown in Figure 7.6. h
0.00588235
0.5
y
0.0
0.5
1.0 5
10
15
20
x Figure 7.6. Solution of the initial value problem y
+x/ order 2 and n
1 x2
y+x/2
y+x/ x
with y+1/
1 by using a Taylor approximation of
170. The interval of integration was enlarged by a factor 10.
The increase of the total integration steps and the increase of the approximation order can somehow resolve this problem with a dramatic increase in calculation time
IV - Chapter 7: Ordinary Differential Equations
h
271
0.00037037
0.1
136.817,
y
0.2
!
0.3 0.4 5
10
15
20
x Figure 7.7. Solution of the initial value problem y
+x/ order 7 and n
1 x2
y+x/2
y+x/ x
with y+1/
1 by using a Taylor approximation of
2700. The interval of integration was enlarged to x±[0,20].
7.3.1 Implicit and Predictor–Corrector Methods From this point onwards, we shall abandon the original initial value problem, and turn our attention to numerically solving the equivalent integral equation. Let us rewrite the integral equation, starting at the mesh point k instead of t0 , and integrating until time t tk1 . The result is the basic integral formula u+tk1 /
u+tk / Ã
tk1
F+s, u+s// Å s
(7.24)
tk
that (implicitly) computes the value of the solution at the subsequent mesh point. Comparing this formula with the Euler Method uk1
uk h F+tk , uk /,
where h
and assuming for the moment that uk the integral by tk1
Ã
F+s, u+s// Å s h F+tk , u+tk //.
tk1 tk
(7.25)
u+tk / is exact, we discover that we are merely approximating
(7.26)
tk
As we discussed this is the so called Left Endpoint Rule for numerical integration — that approximates the area under the curve g +t/ F+t, u+t// between tk t tk1 by the area of a rectangle whose height g+tk / F+tk , u+tk // F+tk , uk / is prescribed by the left - hand endpoint of the graph. As indicated in Figure 7.8, this is a reasonable, but not especially accurate method of numerical integration.
272
Mathematics for Engineers
Figure 7.8. Different types of numerical integration methods.
In Chapter 5 we encountered much better methods of approximating the integral of a function. One of these is the Trapezoid Rule, which approximates the integral of the function g+t/ by the area of a trapezoid obtained by connecting the two points +tk , g+tk // and +tk1 , g+tk1 // on the graph of g by a
IV - Chapter 7: Ordinary Differential Equations
273
straight line, as in the second Figure 7.8. Let us therefore try replacing the Left Endpoint Rule by the more accurate trapezoidal approximation tk1
Ã
1
F+s, u+s// Å s
h F+tk , u+tk // F+tk1 , u+tk1 //.
2
tk
(7.27)
Substituting this approximation into the integral formula, and replacing the solution values u+tk /, u+tk1 / by their numerical approximations, leads to the (hopefully) more accurate numerical scheme uk1
1
uk
2
h F+tk , uk / F+tk1 , uk1 /,
(7.28)
known as theTrapezoid Method. It is an implicit scheme, since the updated value uk1 appears on both sides of the equation, and hence is only defined implicitly. Example 7.3. Implicit Method Consider the differential equation 1
u'
4
t u
3
(7.29)
Solution 7.3. The Trapezoid Method with a fixed step size h takes the form uk1
1
uk
2
h% 1
4 3
t k uk 1
4 3
tk1 uk1 ).
(7.30)
In this case, we can explicit solve for the updated solution value, leading to the recursive formula 1 2
1 uk1
1
1 2
h,1
h,1
4 t0 3 k
1
4 t 0 3 k1
uk
1
1 2
1 2
h
h
2 3
2 3
h tk
h+tk h/
uk
(7.31)
Implementing this scheme for three different step sizes we are able to compare the approximation with a reference solution generated by Mathematica. This comparison delivers the following errors between the computed and the reference solutions at times t 1, 2, 3. The reference solution for (7.29) can be derived with DSolve%
u+t/ t
u t Ì Æ
4t
2 t2 3
3 t
1 u+t/, u+0/ 1!, u, t)
!!
The errors at the different times are listed in the following table
Mathematics for Engineers
274
h 0.1
E+1/ E+2/ E+3/ 0.00133315 0.00060372 0.00012486
0.01
0.00001335 6.02 106 1.24 106
0.001 1.3 107
6. 108
1. 108
The numerical data indicates that the Trapezoid Method is of second order. For each reduction in step size by 1 s 10, the accuracy in the solution increases by, roughly, a factor of 1 s 100 1 t 102 ; that is, the numerical solution acquires two additional accurate decimal digits.Ô The main difficulty with the Trapezoid Method (and any other implicit scheme) is immediately apparent. The updated approximate value for the solution uk1 appears on both sides of the iteration equation. Only for very simple functions F+t, u/ can one expect to solve such an scheme explicitly for uk1 in terms of the known quantities tk , uk and tk1 tk h. The alternative is to employ a numerical equation solver, such as the bisection algorithm or Newton’s Method, to compute uk1 . In the case of Newton’s Method, one would use the current approximation uk as a first guess for the new approximation uk1 . The resulting scheme requires some work to program, but can be effective in certain situations. An alternative, less involved strategy is based on the following far-reaching idea. We already know a half-way decent approximation to the solution value uk1 — namely that provided by the more primitive Euler scheme a
uk h F+tk , uk /
uk1
(7.32)
Let’s use this estimated value in place of uk1 on the right hand side of the implicit equation. The result uk1 uk
1 2
a
h F+tk , uk / F,tk h, uk1 0
uk
1 2
h F+tk , uk / F+tk h, uk h F+tk , uk //
(7.33)
is known as the improved Euler method. It is a completely explicit scheme since there is no need to solve any equation to find the updated value uk1 . Example 7.4. Predictor Corrector Method For our favorite equation u'
1
4 3
t u
(7.34)
Solution 7.4. The improved Euler method begins with the Euler approximation a
uk1
uk h 1
4 3
tk uk
and then replaces it by the improved value
(7.35)
IV - Chapter 7: Ordinary Differential Equations
uk
uk1
uk
1 2
1
h% 1
2
h% 1
% 1
2 3
4 3
4 3
4
t k uk 1
tk uk 1
h2 %1 h 1
4 3
4 3
3
275
a
tk1 uk1 )
+tk h/ uk h 1
tk )
1 2
h2 1
4 3
4 3
tk uk )
2
t k ) uk .
Implementing this scheme leads to the following errors in the numerical solution at the indicated times. The Improved Euler Method performs comparably to the fully implicit scheme, and significantly better than the original Euler Method. h 0.1
E+1/ 0.0007023
0.01
4.59 106 0.00001068 0.00001264
0.001 4. 108
E+2/ E+3/ 0.00097842 0.00147748 1.1 107
1.2 107
The improved Euler method is the simplest of a large family of so-called predictor–corrector algorithms. In general, one begins a relatively crude method — in this case the Euler Method — to prea dict a first approximation uk1 to the desired solution value uk1 . One then employs a more sophisticated, typically implicit, method to correct the original prediction, by replacing the required update a uk1 on the right hand side of the implicit scheme by the less accurate prediction uk1 . The resulting explicit, corrected value uk1 will, provided the method has been designed with due care, be an improved approximation to the true solution. The numerical data of this example indicates that the improved Euler method is of second order since each reduction in step size by 1 s 10 improves the solution accuracy by, roughly, a factor of 1 s 100. Ô
7.3.2 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 7.3.2.1 Test Problems T1. How does Taylor's method work? T2. Which steps are used in a Taylor approximation of a ordinary differential equation? T3. What is a predictor corrector method? T4. How is Euler's method related to Taylor's method? T5. What is the idea of Euler's method?
7.3.2.2 Exercises E1. Calculate an approximate value for y+0.1/ using one step of the Taylor series method of order 3 on the ordinary differential equation y''
y2 Æt y with y+0/
1 and y ' +0/
1.
(1)
276
Mathematics for Engineers
Solve for y+1/ using Euler's method with a step length of h with y+0/ 0, a. y' 1 y2 b. y'
+1 x/1 y+x/
with y+0/
0.1 and h
0.01.
1.
E3. Using Taylor's series, find the solution of the differential equation x y with y+2/
x y' at x
2
(2)
2.1 correct to five decimal places.
E4. Solve the equation 1 y' x
2
y
y2 with y+1/
x
1
(3)
from x 1 to x 2. Use Taylor's algorithm of order 2. Solve the problem with h estimate the accuracy of the results. E5. For the equation x y
y'
1 s 16, 1 s32, 1 s64, 1 s 128 and
1 with y+1/
y2
1.
(4)
derive the difference equation corresponding to Taylor's algorithm of order 3. Carry out by hand one step of the integration with h 0.01. Write a program for solving this problem, and carry out the integration from x 1 to x 2, using h 1 s 64 and h 1 s128. E6. For the equation y'
2 y with y+0/
1.
(5)
obtain the exact solution of the differential equation and use Euler's method to integrate the equation. Estimate a value of h small enough to guarantee four place accuracy in the solution along the interval #0, 1'. Carry out the solution with an appropriate value of h for 10 steps. E7. Use Euler's method to approximate the solutions for each of the following initial value problems. a. y' x Æ3 x 2 y, 0 x 1, y+0/ 0, with h 0.5 b. y' 1 +x y/2 , 2 x 3, y+2/ 1, with h 0.5 c. y' 1 y s x, 1 x 2, y+1/ 2, with h 0.25 E8. In a circuit with impressed voltage E having resistance R, inductance L, and capacitance C in parallel, the current I satisfies the differential equation d2 E
dI dt Suppose C
0.3 farads, R
1.4 ohms, L
C
dt
1 dE R dt
1 L
(6)
E
1.7 henries, and the voltage is given by E+t/
If I+0/
2
0, find the current I for the values t
Æ0.06 S t sin+2 t S/.
0.1 j, where j
(7)
1, ..., 100.
E9. Given the initial value problem 2 y' with exact solution y +x/
x
y x2 Æ x ,
1 x 2, y+1/
0,
(8)
t2 ,Æt Æ0:
a. Use Taylor's method of order two with h 0.1 to approximate the solution, and compare it with the actual values of y. b. Use the answers generated in part a) and linear interpolation to approximate y at the following values, and
IV - Chapter 7: Ordinary Differential Equations
277
compare them to the actual values of y. i) y+1.04/, ii) y+1.55/ iii) y+1.97/ c. Use Taylor's method of order four with h 0.1 to approximate the solution, and compare it with the actual values of y. E10 A projectile of mass m 0.11 kg shot vertically upward with initial velocity v +0/ 8 m ss is slowed due to the force of gravity, Fg mg, and due to air resistance, Fr kv v , where g 9.8 m t s2 and k=0.002 kgs m. The differential equation for the velocity v is given by m v'
m g k v v .
(9)
a. Find the velocity after 0.1, 0.2,... , 1.0 s. b. To the nearest tenth of a second, determine when the projectile reaches its maximum height and begins falling.
7.4 Runge-Kutta Methods As mentioned previously, Euler's method is not very useful in practical problems because it requires a very small step size for reasonable accuracy. Taylor's algorithm of higher order is unacceptable as a general-purpose procedure because of the need to obtain higher total derivatives of y+x/. The RungeKutta methods attempt to obtain greater accuracy, and at the same time avoid the need for higher derivatives, by evaluating the function f +x, y/ at selected points on each subinterval. We shall derive here the simplest of the Runge-Kutta methods. A formula of the following form is sought: yn1
yn a k1 b k2
(7.37)
where k1
h f +xn , yn /
(7.38)
k2
h f +xn D h , yn E k1 /
(7.39)
and a, b, D, and E are constants to be determined so that (7.37) will agree with the Taylor algorithm of as high an order as possible. On expanding y+xn1 / in a Taylor series through terms of order h3 , we obtain xn .; yn . ser1
Series# y+h xn/, h, 0, 3'
y+xn/ h y
+xn/
1 2
h2 y
+xn/
1 6
h3 y+3/ +xn/ O,h4 0
278
Mathematics for Engineers
f +x, y+x//;
t0
Last#t0'
s. y
+x/ f +x, y+x//), i, 1, 2);
Do%t0
Append%t0,
rule1
Simplify%+Rule Ó1 &/ s Table%
x
i y+x/ xi
¬
, i, 1, 3), t0! s. x xn, y+_/ yn)
y
+xn/ f +xn, yn/, y
+xn/ f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/, 2
y+3/ +xn/ f +0,2/ +xn, yn/ f +xn, yn/2 - f +0,1/ +xn, yn/ 2 f +1,1/ +xn, yn/1 f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/ f +2,0/ +xn, yn/! ser01
ser1 s. rule1
y+xn/ h f +xn, yn/ 1 6
1 2
h2 , f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/0 2
h3 - f +0,2/ +xn, yn/ f +xn, yn/2 - f +0,1/ +xn, yn/ 2 f +1,1/ +xn, yn/1 f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/ f +2,0/ +xn, yn/1 O,h4 0
where we have used the Taylor expansions, and all functions involved are to be evaluated at xn , yn . On the other hand, using Taylor's expansion for functions of two variables, we find that k2 h Simplify$Normal#Series# f +h D xn, k1 E yn/, h, 0, 2, k1, 0, 2'' s.
k2Rule
f +2,2/ +xn, yn/ 0, f +1,2/ +xn, yn/ 0, f +2,1/ +xn, yn/ 0 ( k2 h
1 2
h2 D2 f +2,0/ +xn, yn/ h k1 D E f +1,1/ +xn, yn/
h D f +1,0/ +xn, yn/
1 2
k12 E2 f +0,2/ +xn, yn/ k1 E f +0,1/ +xn, yn/ f +xn, yn/
where all derivatives are evaluated at xn , yn . If we now substitute this expression for k2 into (7.37) and note that k1 rearrangement in powers of h that yn1 .
h f +xn , yn /, we find upon
IV - Chapter 7: Ordinary Differential Equations
279
yn1 Expand#a h f +xn, yn/ b k2 yn s. k2Rule s. k1 h f +xn, yn/, h' yn1 a h f +xn, yn/
1 2
b h3 D2 f +2,0/ +xn, yn/ 1
b h3 D E f +xn, yn/ f +1,1/ +xn, yn/
b h3 E2 f +xn, yn/2 f +0,2/ +xn, yn/ 2 b h2 D f +1,0/ +xn, yn/ b h2 E f +xn, yn/ f +0,1/ +xn, yn/ b h f +xn, yn/ yn form1
CoefficientList#Expand#a h f +xn, yn/ b k2 yn s. k2Rule s. k1 h f +xn, yn/', h'
yn, a f +xn, yn/ b f +xn, yn/, b D f +1,0/ +xn, yn/ b E f +xn, yn/ f +0,1/ +xn, yn/, 1 2
b D2 f +2,0/ +xn, yn/ b D E f +xn, yn/ f +1,1/ +xn, yn/
form2
6
2
b E2 f +xn, yn/2 f +0,2/ +xn, yn/!
CoefficientList#ser01, h'
y+xn/, f +xn, yn/, 1
1
1 2
, f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/0, 2
- f +0,2/ +xn, yn/ f +xn, yn/2 - f +0,1/ +xn, yn/ 2 f +1,1/ +xn, yn/1 f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/ f +2,0/ +xn, yn/1!
On comparing this with (7.37) we see that to make the corresponding powers of h and h2 agree we must have TableForm#Thread#form1 form2'' yn y+xn/ a f +xn, yn/ b f +xn, yn/ f +xn, yn/ b D f +1,0/ +xn, yn/ b E f +xn, yn/ f +0,1/ +xn, yn/ 1 2
1 2
, f +xn, yn/ f +0,1/ +xn, yn/ f +1,0/ +xn, yn/0
b D2 f +2,0/ +xn, yn/ b D E f +xn, yn/ f +1,1/ +xn, yn/
1 2
b E2 f +xn, yn/2 f +0,2/ +xn, yn/
1 6
- f +0,2/ +xn, yn/ f +xn,
which can be solved with respect to the unknowns a, b, D, and E solred
Reduce%a b 1, b D
b 1aÐa1 0ÐD
1 2
,bE
1 2 +a 1/
1 2
!, a, b, D, E)
ÐED
Although we have four unknowns, we have only three equations, and hence we still have one degree of freedom in the solution of this system. We might hope to use this additional degree of freedom to obtain a agreement of the coefficients in the h3 terms. It is obvious, however, that this is impossible for all functions f +x, y/. There are many solutions to this system, the simplest perhaps being
Mathematics for Engineers
280
solred s. a b
1 2
1 2
!
ÐD 1Ð E D
Since we fixed the parameters in (7.37) we can write down an algorithm for the integration Algorithm Runge-Kutta: Runge-Kutta method of order 2 for the equation y' y+x0 /
f +x, y/
(7.40)
y0
(7.41)
generate approximations yn to y+x0 n h/, for h fixed and n yn
yn1
1 2
0, 1, … using the recursion formula
+k1 k2 /
(7.42)
with k1
h f +xn , yn /
(7.43)
k2
h f +xn h, yn k1 /
(7.44)
The local error of (7.42) is of the form y+xn1 / yn1
h3 12
, fxx 2 f fxy f 2 fyy 2 f x f y 2 f f y2 0 O,h4 0
(7.45)
The complexity of the coefficient in this error term is characteristic of all Runge-Kutta methods and constitutes one of the least desirable features of such methods since local error estimates are very difficult to obtain. The local error of (7.42), is, however, of order h3 , whereas that of Euler's method is h2 . We can therefore expect to be able to use a larger step size with (7.42). The price we pay for this is that we must evaluate the function f +x, y/ twice for each step of the integration. Formulas of the Runge-Kutta type for any order can be derived by the method used above. However, the derivations become exceedingly complicated. The most popular and most commonly used formula of this type is contained in the following Algorithm. Algorithm Runge-Kutta: Runge-Kutta method of order 4 for the equation y' y+x0 /
f +x, y/
(7.46)
y0
(7.47)
generate approximations yn to y+x0 n h/, for h fixed and n yn1
yn
1 6
+k1 2 k2 2 k3 k4 /
0, 1, … using the recursion formula (7.48)
IV - Chapter 7: Ordinary Differential Equations
281
with k1
h f +xn , yn /
(7.49)
k2
h 1 h f xn , yn k1 2 2
(7.50)
k3
h 1 h f xn , yn k2 2 2
(7.51)
k4
h f +xn h, yn k3 /
(7.52)
The local discretization error of this Algorithm is O,h 0. Again the price we pay for the favorable 5
discretization error is that four function evaluations are required per step. This price may be considerable in computer time for those problems in which the function f +x, y/ is complicated. The Runge-Kutta methods have additional disadvantages, which will be discussed later. Formula (7.48) is widely used in practice with considerable success. It has the important advantage that it is self-starting: i.e., it requires only the value of y at a point x xn to find y and y' at x xn1 . The following function contains an implementation of a 4th order Runge-Kutta algorithm.
Mathematics for Engineers
282
RungeKutta$eq_List, depend_, independ_, a_, b_ , Nmax_( : Block%f$h, h, x0, y0, xn, yn, lis, k1, k2, k3, k4, + extract information from equation / f$h Last#First#eq''; y0 Last#Last#eq''; + fix step length / +b a/ ; h Nmax + initialize iteration / x0 a; yn y0; xn x0; lis x0, yn; + iterate the 4thorder Runge Kutta algorithm / Do% k1 k2 k3 k4 yn xn
h f$h s. x ! xn, y#x' ! yn; h 1 h f$h s. x ! xn , y#x' ! yn k1!; 2 2 h 1 k2!; h f$h s. x ! xn , y#x' ! yn 2 2 h f$h s. x ! xn h, y#x' ! yn k3; 1 +k1 2 k2 2 k3 k4/); N%yn 6 N#x0 n h';
AppendTo#lis, xn, yn', n, 1, Nmax); depend ! Interpolation#lis')
Example 7.5. Runge-Kutta The following initial value problem is solved by the 4th order Runge-Kutta algorithm y'
y
with y+0/
1
(7.53)
Solution 7.5. The line below carries out the iteration sol02
RungeKutta
y+x/ x
y+x/, y+0/ 1!, y, x, 0, 8, 8
y InterpolatingFunction#'
Comparing the numerical with the exact solution shows excellent agreement
IV - Chapter 7: Ordinary Differential Equations
283
1500
y
1000
500
0 0
2
4
6
8
x Figure 7.9. Solution of the initial value problem y ' y with y+0/ 1. The integration was performed with a 4th order RungeKutta method. The exact solution y Æx is plotted in connection with the numerical solution. It is obvious that both representations of the solution are close together.
However the absolute error of the calculations shows deviations especially for large arguments. 50
40
«yÆx «
30
20
10
0 0
2
4
6
8
x Figure 7.10. Representation of the absolute error for the initial value problem y '
y with y+0/
performed with a 4th order Runge-Kutta method. The error is determined by y Æx for n
1. The integration was
8 steps.
Increasing the number of integration steps reduces the error dramatically by two orders of magnitude.
Mathematics for Engineers
284
0.008
y
0.006
0.004
0.002
0.000 0
2
4
6
8
x Figure 7.11. Representation of the absolute error for the initial value problem y '
y with y+0/
performed with a 4th order Runge-Kutta method. The error is determined by y Æx for n
1. The integration was
100 steps.Ô
Example 7.6. Runge-Kutta Algorithm Using Runge-Kutta algorithm, find the solution of the differential equation eq5
y+x/ x
y
+x/
1 2
x
1 x2
y+x/2
y+x/2
y+x/ x
y+x/ x
with the initial value initial
y+1/ 1
y+1/ 1
for the interval #1, 2'. Solution 7.6. The numerical solution follows by applying RungeKutta to the ODE sol5
RungeKutta+eq5, initial, y, x, 1, 2, 17/
y InterpolatingFunction#'
the graphical representation of the absolute error y yex shows that the solution agrees fairly well 1 with the exact solution yex (see Figure 7.12). x
IV - Chapter 7: Ordinary Differential Equations
h
285
0.0588235
8. 106
«yyex «
6. 106 4. 106 2. 106
0 1.0
1.2
1.4
1.6
1.8
2.0
x Figure 7.12. Representation of the absolute error for the initial value problem y
+x/
1 x2
y+x/2
y+x/ x
with y+1/
1. The
th
integration was performed with a 4 order Runge-Kutta method.
However, if we increase now the interval and keep the number of total subdivisions, we observe that the numerical solution is stable. This was not the case with the Taylor series approximation where for the same initial value problem the procedure became unstable. h
0.00588235
0.00008
«yyex «
0.00006
0.00004
0.00002
0 5
10
15
20
x Figure 7.13. Representation of the absolute error for the initial value problem y
+x/
1 x2
y+x/2
y+x/ x
with y+1/
1. The
th
integration was performed with a 4 order Runge-Kutta method.Ô
7.4.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises.
286
Mathematics for Engineers
7.4.1.1 Test Problems T1. What is the Runge-Kutta method? T2. What is the basic idea to derive the Runge-Kutta method? T3. Is there only a single Runge-Kutta method? T4. Explain the difference between a second and fourth order Runge-Kutta method.
7.4.1.2 Exercises E1. Water flows from an inverted conical tank with circular orifice at the rate dx dt
0.6 S r2
x 2g
A+x/
(1)
,
where r is the radius of the orifice, x is the height of the liquid level from the vertex of the cone, and A+x/ is the area of the cross section of the tank x units above the orifice. Suppose r 0.1 m, g 9.82 m t s2 , and the tank has an initial water level of 8 m and initial volume of 512 S s3 m3 . a. Compute the water level after 10 min with h 20 s. b. Determine, to within 1 min, when the tank will be empty. E2. Use the fourth order Runge-Kutta method to approximate the solutions for each of the following initial value problems. a. y' x Æ3 x 2 y, 0 x 1, y+0/ 0, with h 0.5 b. y' 1 +x y/2 , 2 x 3, y+2/ 1, with h 0.5 c. y' 1 y s x, 1 x 2, y+1/ 2, with h 0.25 E3. Solve for y+1/ using the second order Runge-Kutta method with a step length of h a. y' 1 y2 with y+0/ 0, b. y'
+1 x/1 y+x/
with y+0/
0.1 and h
0.01.
1.
E4. Using the Runge-Kutta method of order four to obtain approximations to the solution of the initial-value problem
with h
0.2, N
10, and x
1
y'
y x2 1, 0 x 2,
y+0/
y'
y x2 1, 0 x 2,
y+0/
2
,
(2)
,
(3)
0.2 i.
E5. For the problem 1 2
compare Euler's method with h 0.025 and the Runge-Kutta fourth order method with h 0.1 at the common mesh points of these methods 0.1, 0.2, 0.3, 0.4, and 0.5. E6. Use the Runge-Kutta method of order four to approximate the solutions to each of the following initial value problems, and compare the results to the exact values. a. y' x Æ2 x 2 y, 0 x 1, y+0/ 0, 2 x 3, y+2/ 1, b. y' 1 +x y/2 , 1 x 2, y+1/ 2, c. y' 1 y s x , d. y' cos+2 x/ sin+x/, 0 x 1, y+0/ 1. E7. The initial value problem y' allows the solution
Æ y,
0x
2 10
,
y+0/
1
(4)
IV - Chapter 7: Ordinary Differential Equations
yex
287
1 ln+1 Æ x/.
(5)
Apply the fourth order Runge-Kutta method to verify this solution for x ± #0, 2'. Use a graphing tool to display the absolute error H yex y .
7.5 Stiff Differential Equations While the fourth order Runge–Kutta Method with a sufficiently small step size will successfully integrate a broad range of differential equations—at least over not unduly long time intervals — it does occasionally experience unexpected difficulties. While we have not developed sufficiently sophisticated analytical tools to conduct a thorough analysis, it will be instructive to look at why a breakdown might occur in a simpler context. Example 7.7. Stiff Differential Equations The elementary linear initial value problem du
250 u
dt
with u+0/
(7.54)
1.
is an instructive and sobering example. Solution 7.6. The explicit solution is easily found; it is a very rapidly decreasing exponential: u +t/ e250 t with u+1/ Æ250 2.66919 10109 . The following table gives the result of approximating the solution value u +1/ 2.67 10 109 at time t 1 using three of our numerical integration schemes for various step sizes: h
Euler
Improved Euler RK4
0.1
6.34 1013
3.99 1024
2.81 1041
0.01
4.07 1017
1.22 1021
1.53 1019
0.001 1.15 10
125
6.17 10
108
2.69 10109
The results are not misprints! When the step size is 0.1, the computed solution values are perplexing large, and appear to represent an exponentially growing solution — the complete opposite of the rapidly decaying true solution. Reducing the step size beyond a critical threshold suddenly transforms the numerical solution to an exponentially decaying function. Only the fourth order RK4 method with step size h=0.001 — and hence a total of 1000 steps — does a reasonable job at approximating the correct value of the solution at t 1.Ô You may well ask, what on earth is going on? The solution couldn’t be simpler — why is it so difficult to compute it? To understand the basic issue, let us analyze how the Euler Method handles such simple differential equations. Consider the initial value problem du dt
Ou
with u+0/
1,
(7.55)
Mathematics for Engineers
288
which has an exponential solution u+t/
ÆO t .
(7.56)
The Euler Method with step size h relies on the iterative scheme uk1
+1 O h/ uk and u0
1,
(7.57)
with solution uk
+1 O h/k .
(7.58)
If O ! 0, the exact solution (7.56) is exponentially growing. Since 1 O h ! 1, the numerical iterates are also growing, albeit at a somewhat slower rate. In this case, there is no inherent surprise with the numerical approximation procedure — in the short run it gives fairly accurate results, but eventually trails behind the exponentially growing solution. On the other hand, if O 0, then the exact solution is exponentially decaying and positive. But now, if O h 2, then 1 O h 1, and the iterates (7.57) grow exponentially fast in magnitude, with alternating signs. In this case, the numerical solution is nowhere close to the true solution; this explains the previously observed pathological behavior. If 1 1 O h 0, the numerical solutions decay in magnitude, but continue to alternate between positive and negative values. Thus, to correctly model the qualitative features of the solution and obtain a numerically respectable approximation, we need to choose the step size h so as to ensure that 0 1 O h, and hence h 1 s O when O 0. For the value O 250 in the example, then, we must choose h 1 s 250 0.004 in order that Euler's method gives a reasonable numerical answer. A similar, but more complicated analysis applies to any of the Runge– Kutta schemes. Thus, the numerical methods for ordinary differential equations exhibit a form of conditional stability. Paradoxically, the larger negative O is — and hence the faster the solution tends to a trivial zero equilibrium — the more difficult and expensive the numerical integration. The system (7.55) is the simplest example of what is known as a stiff differential equation. In general, an equation or system is stiff if it has one or more very rapidly decaying solutions. In the case of autonomous (constant coefficient) linear systems u A u, stiffness occurs whenever the coefficient matrix A has an eigenvalue with a large negative real part: ¥H+O/ C 0, resulting in a very rapidly decaying eigensolution. It only takes one such eigensolution to render the equation stiff, and ruin the numerical computation of even the well behaved solutions! Curiously, the component of the actual solution corresponding to such large negative eigenvalues is almost irrelevant, as it becomes almost instantaneously tiny. However, the presence of such an eigenvalue continues to render the numerical solution to the system very difficult, even to the point of exhausting any available computing resources. Stiff equations require more sophisticated numerical procedures to integrate, and we refer the reader for more details to the books by Hairer, Nörsett, and Iserles [12,17].
IV - Chapter 7: Ordinary Differential Equations
289
7.5.1 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 7.5.1.1 Test Problems T1. What are stiff differential equations? T2. Explain the problems with stiff differential equations. T3. How can stiff differential equations be identified?
7.5.1.2 Exercises E1. The differential equation
y' has the exact solution y
1 3
3 0x , 2
30 y,
Æ30 x . Using h
1 y+0/
(1)
3
0.1 for Euler's algorithm and Runge-Kutta fourth order algorithm gives
the results at x 1.5. Compare the results of the exact solution with the numerical approximation at the endpoint of the integration. E2. The stiff initial value problem y'
2
5 Æ5 x +y x/ 1, 0 x 1, y+0/
1
(2)
5 x
allows the exact solution y+x/ x Æ . Compare this solution with a numerical solution generated by a fourth order Runge-Kutta method with step length h 0.25 and h 0.1. E3. Solve the following stiff initial value problems using Euler's method, and compare the results with their exact solution. a. y' 9 y, 0 x 1, y+0/ Æ, with h 0.1, b. y'
20 ,y x2 0 2 x,
0 x 1,
y+0/
1 , 3
c. y'
20 y 20 sin+x/ cos+x/,
0 x 2,
y+0/
1, with h
with h
0.1, 0.25,
E4. Repeat the solution of exercise E3 with a Runge-Kutta fourth order method using step sizes h
0.1 and h
0.25.
E5. Consider the second order equation y'' 1001 y ' 1000 y
0, 0 x 2, y+0/
1 and y ' +0/
1
(3)
Solve the initial value problem using a fourth order Runge-Kutta method with step length h 0.01 and h 0.002. Discuss the numbers of integration steps to get the final value of the initial value problem at y+2/. Compare your numerical results with the exact solution.
7.6 Two-Point Boundary Value Problems When ordinary differential equations are required to satisfy boundary conditions at more than one value of the independent variable, the resulting problem is called a two-point boundary value problem. As the terminology indicates, the most common case by far is where boundary conditions are supposed to be satisfied at two points — usually the starting and ending values of the integration. However, the phrase two-point boundary value problem is also used loosely to include more complicated cases, e.g., where some conditions are specified at endpoints, others at interior (usually singular) points.
Mathematics for Engineers
290
The crucial distinction between initial value problems and two point boundary value problems is that in the former case we are able to start an acceptable solution at its beginning (initial values) and just march it along by numerical integration to its end (final values), while in the present case the boundary conditions at the starting point do not determine a unique solution to start with — and a “random” choice among the solutions that satisfy these (incomplete) starting boundary conditions is almost certain not to satisfy the boundary conditions at the other specified point. First we consider the problem of finding the shape of a cable that is fastened at each end and carries a distributed load. The cables of a suspension bridge provide an important example. Let u+x/ denote the position of the center line of the cable, measured upward from the x-axis, which we assume to be horizontal (see Figure 7.14). Our objective is to find the function u+x/. The shape of the cable is determined by the forces acting on it. In our analysis, we consider the forces that hold a small segment of the cable in place. (see Figure 7.14). The key assumption is that the cable is perfectly flexible. This means that force inside the cable is always a tension and that its direction at every point is the direction tangent to the center line. 5
4
u+x/
3 h1 2 x 1
0
x'x f+x/'x
h0
0.0
0.5
1.0
1.5
2.0
2.5
a 3.0
x Figure 7.14. Hanging cable for a bridge support.
We suppose that the cable is not moving. Then by Newton’s second law, the sum of the horizontal components of the forces on the segment is 0, and likewise for the vertical components. If T+x/ and T+x 'x/ are the magnitudes of the tensions at the ends on the segment, we have the two equations in horizontal and vertical direction: T+x 'x/ cos+I+x 'x// T+x/ cos+I+x//
0
T+x 'x/ sin+I+x 'x// T+x/ sin+I+x// f +x/ 'x
(7.59) 0
(7.60)
In the second equation, f +x/ is the intensity of the distributed load, measured in force per unit of horizontal length, so f +x/ 'x is the load borne by the small segment.
IV - Chapter 7: Ordinary Differential Equations
291
From Equation (7.59) we see that the horizontal component of the tension is the same at both ends of the segment. In fact, the horizontal component of tension has the same value — call it T — at every point, including the endpoints where the cable is attached to solid supports. By simple algebra we can now find the tension in the cable at the ends of our segment satisfies T
T+x 'x/
cos+I+x 'x//
and T+x/
T (7.61)
cos+I+x//
and substitute these into Equation (7.60), which becomes T cos+I+x 'x//
sin+I+x 'x//
T cos+I+x//
sin+I+x// f +x/ 'x
0
(7.62)
or T +tan+I+x 'x// tan+I+x/// f +x/ 'x
0
(7.63)
Before going further we should note (see Figure 7.14) that I+x/ measures the angle between the tangent to the center line of the cable and the horizontal. As the position of the center line is given by u+x/, tan+I+x// is just the slope of the cable at x. From elementary calculus we know du+x/
tan+I+x//
dx
(7.64)
.
Substituting the derivative for the slope and making some algebraic adjustments, we obtain du+x 'x/
T
dx
du+x/ dx
f +x/ 'x
0
(7.65)
Dividing through by 'x yields , T
du+x'x/ dx
du+x/ 0 dx
'x
f +x/.
(7.66)
In the limit, as x approaches 0, the difference quotient on the left hand side of (7.66) becomes the second derivative of u, and the result is the equation T
d 2 u+x/ dx2
f +x/.
(7.67)
which is valid for x in the range 0 x a, where the cable is located. In addition, u+x/ must satisfy the boundary conditions u+0/
h0 and u+a/
h1 .
(7.68)
Mathematics for Engineers
292
For any particular case, we must choose an appropriate model for the loading, f +x/. One possibility is that the cable is hanging under its own weight of w units of weight per unit length of cable. Then in Equation (7.67), we should put f +x/
w
1
du
2
(7.69)
dx
Therefore, with this assumption, the boundary value problem that determines the shape of the cable is
T
d 2 u+x/ dx2
w
1
du
2
dx
with u+0/
h0 and u+a/
h1 .
(7.70)
Note that the differential equation is nonlinear. Nevertheless, we can find its general solution in closed form and satisfy the boundary conditions by appropriate choice of the arbitrary constants that appear. Another case arises when the cable supports a load uniformly distributed in the horizontal direction, as given by a constant load f +x/
w.
(7.71)
This is approximately true for a suspension bridge. The boundary value problem to be solved is then T
d 2 u+x/ dx2
w with u+0/
h0 and u+a/
h1 .
(7.72)
The general solution of the differential equation (7.72) can be found by a polynomial solution satisfying the boundary conditions as u+x_/ : c1 x c2
w x2 2T
where c1 and c2 are arbitrary. The two boundary conditions require boundaryEquations
u+0/ h0, u+a/ h1; TableForm#boundaryEquations'
c2 h0 a2 w 2T
a c1 c2 h1
These two are solved for c1 and c2 in terms of given parameters. The result is gained from the following line solCoeff c1
Flatten#Solve#boundaryEquations, c1, c2'' a2 w 2 h0 T 2 h1 T 2aT
, c2 h0!
Inserting the derived coefficients c1 and c2 into our general polynomial ansatz, we find
IV - Chapter 7: Ordinary Differential Equations
293
u+x/ s. solCoeff
x ,a2 w 2 h0 T 2 h1 T0 2aT
h0
w x2 2T
Clearly, this function specifies the cable’s shape as part of a parabola opening upward. The discussed introductory example demonstrates the difference between an initial value problem and a two point boundary value problem. An initial value problem looks for a solution of a differential equation under the condition that the initial values are specified while a boundary value problem needs both the initial values and the terminal values of an interval to be defined before we start to integrate. The “standard” two-point boundary value problem has the following general form: We desire the solution to a set of N coupled first order ordinary differential equations, satisfying n1 boundary conditions at the starting point x0 a and a remaining set of n2 boundary conditions at the final point xn b. Recall that all differential equations of order higher than first can be written as coupled sets of first order ordinary differential equations. So far we have considered numerical solution of initial value problems, where all boundary conditions are specified at one point. Now we shall consider more general problems, where boundary conditions involve the function values at more than one point. Thus, for a system of n first order differential equations, the boundary conditions may be written in the form g j +u1 +a/, …, un +a/, u1 +b/, …, un +b//
0,
j
1, 2, …, n.
(7.73)
Here it is assumed that the boundary conditions involve function values at two different points. This definition can be easily generalized to more than two points. In practice, the boundary value problems usually involve boundary conditions at two different points and are also referred to as two-point boundary value problem. Further, in most cases the boundary conditions are separable in the sense that each individual condition involves the function value at one point only. In that case, the boundary conditions can be written as g j +u1 +a/, …, un +a//
0,
j
1, 2, …, n1 .
(7.74)
g j +u1 +b/, …, un +b//
0,
j
n1 1, …, n.
(7.75)
In this chapter, we assume that boundary conditions are in this form. The numerical methods can be easily generalized to more general forms. The simplest linear boundary value problem is one in which the solution of a second order equation, say d 2 u+x/ dx
2
p+x/
du+x/ dx
q+x/ u+x/
0
(7.76)
is specified at two distinct points, say u+a/
D and u+b/
E
(7.77)
Mathematics for Engineers
294
The solution u+x/, is sought in the interval a x b. A formal approach to the exact solution of the boundary value problem is obtained by considering the related initial value problem, U '' p+x/ U ' q+x/ U
0
D, and U ' +a/
U+a/
(7.78) s
(7.79)
The theory of solutions of such initial value problems is well known and if, for example, the functions p+x/ and q+x/ are continuous on #a, b', the existence of a unique solution of (7.78-79) in #a, b' is assured. Let us denote this solution by U
U+s; x/
(7.80)
and recall that every solution of (7.78) or (7.76) is a linear combination of two particular "independent" solutions of (7.76), u+1/ +x/ and u+2/ +x/, which satisfy, say, u+1/ +a/
1, u+1/' +a/
0; u+2/ +a/
0, u+2/' +a/
1.
(7.81)
Then the unique solution of (7.78) which satisfies (7.79) is D u+1/ +x/ s u+2/ +x/.
U+s; x/
(7.82)
Now if we take s such that U+s; b/
D u+1/ +b/ s u+2/ +b/
E
(7.83)
then u +x/ U+s; x/ is a solution of the boundary value problem (7.76-77). Clearly, there is at most one root of equation (7.83), s
E D u+1/ +b/ u+2/ +b/
,
(7.84)
provided that u+2/ +b/ 0. If, on the other hand, u+2/ +b/ 0 there may not be a solution of the boundary value problem (7.76-77). A solution would exist in this case only if E a u+1/ +b/, but it would not be unique since then U+s; x/ of (7.82) is a solution for arbitrary s. Thus there are two mutually exclusive cases for the linear boundary value problem, the so called alternative principle: either a unique solution exists or else the homogeneous problem (i.e., u +a/ u+b/ 0) has a non trivial solution (which is s u+2/ +x/ in this example). These observations permit us to study the solution of the inhomogeneous equation d 2 u+x/ dx
2
p+x/
du+x/ dx
q+x/ u+x/
r+x/
(7.85)
subject to the boundary conditions (7.77). This problem can be reduced to the previous case if a particular solution of (7.85), say u+ p/ +x/, can be found. Then we define
IV - Chapter 7: Ordinary Differential Equations
295
u+x/ u+ p/ +x/,
w+x/
(7.86)
and find that w+x/ must satisfy the homogeneous equation (7.76). The boundary conditions for w+x/ become, from (7.86) and (7.77) w+a/
D u+ p/ +a/
D'
(7.87)
w+b/
E u+ p/ +b/
E'
(7.88)
Thus, we can find the solution of (7.85) and (7.77) by solving (7.76-77) with +D, E/ replaced by +D', E'/. A definite problem for the determination of u+ p/ +x/ is obtained by specifying particular initial conditions, say u+ p/ +a/
u+ p/' +a/
(7.89)
0,
which provides a standard type of initial value problem for the equation (7.85). The formulation of boundary value problems for linear second order equations can be easily extended to more general nth order equations or equivalently to nth order systems of first order equations (not necessarily linear). For example, in the latter case we may consider the system u'
f +u, x/
(7.90)
where we use the row vectors u +u1 , u2 , …, un /, f + f1 , f2 , …, fn / and the functions fk fk +u1 , …, un , x/ are functions of n 1 variables. The n boundary conditions may be, say, u1 +a/ D1 , u2 +a/ D2 , …, um1 +a/ Dm1 , um1 1 +b/ …um2 +b/ Em2 with m1 m2 n and m1 , m2 ! 0. Thus, we specify m1 quantities at x
Em1 1 ,
a and the remaining n m1
fk +u, x/
(7.91) m2 quantities at x
b.
In analogy with (7.78-79) we consider the related initial value problem: U Ui +a/
f ,U , x0 Di
Um1 j +b/
sj
(7.92) with i
1, 2, …, m1
(7.93)
with j
1, 2, …, m2 .
(7.94)
We indicate the dependence on the m2 arbitrary parameters s j by writing Uk
Uk +s1 , s2 , …, sm2 , x/,
k
1, 2, …, n.
(7.95)
These parameters are to be determined such that Um1 j +s1 , s2 , …, sm2 , b/
E j,
j
1, 2, …, m2 .
(7.96)
This represents a system of m2 equations in the m2 unknowns s j . In the corresponding linear case (i.e., in which each fk is linear in all the Uk ) the system (7.96) becomes a linear system and its solvability is
Mathematics for Engineers
296
thus reduced to a study of the non singularity of a matrix of order m2 . In the general case, however, the roots of a transcendental system (7.96) are required and the existence and uniqueness theory is more complicated (and in fact, is not as completely developed as it is for the linear case). In the next few subsections we will examine different types of numerical methods for approximating the solutions of boundary value problems.
7.6.1 The Shooting Method We want to explain the simple shooting method first by means of an example. Suppose we are given the boundary-value problem u'' +x/
f +x, u, u'/ with u+a/
D and u+b/
E.
(7.97)
D and u' +a/
s.
(7.98)
The initial value problem u'' +x/
f +x, u, u'/ with u+a/
in general (7.98) has a uniquely determined solution u +x/ u+s; x/ which of course depends on the choice of the initial value s for u' +a/. To solve the boundary value problem (7.97), we must determine q q s s so as to satisfy the second boundary condition, u+b/ u+s; b/ E. In other words: one has to find q a zero s of the function F +s/ u+s; b/ E. For every argument s the function F+s/ can be computed. For this, one has to determine the value u +b/ u+s; b/ of the solution u+s; x/ of the initial value problem (7.98) at the point x b. The computation of F+s/ thus amounts to the solution of an initial value problem. q For determining a zero s of F+s/ one can use, in principle, any method of Chapter 4. If one knows, e.g., values s0 , s1 with F+s0 / 0, F+s1 / ! 0
(7.99)
q one can compute s by means of a simple bisection method. Since u+s; b/, and hence F+s/, are in general continuously differentiable functions of s, one can also use q Newton’s method to determine s. Starting with an initial approximation s0 , one then has to iteratively compute values si according to the prescription si1
si
F+si / F ' +si /
.
(7.100)
u+si ; b/ and thus F+si /, can be determined by solving the initial value problem u''
f +x, u, u'/, u+a/
D, u' +a/
Example 7.8. Shooting Method I Consider the boundary value problem
si .
(7.101)
IV - Chapter 7: Ordinary Differential Equations
3
u'' +x/
2
u2 , with u+0/
4, u+1/
297
(7.102)
1.
Solution 7.8. One finds the solution of the initial-value problem
equation u
+x/
2 u+x/ xx
3 u+x/2 2
3 u+x/2 2
, u+0/ 4, u
+0/ s!
, u+0/ 4, u
+0/ s!
by integrating this problem with a variety of initial conditions for s. The following line generates the solution and the final value for u+1/ as a list of data points +s, u+1, s// sData
Table#Flatten#V, u+1/ 1 s. NDSolve#equation s. s V, u, x, 0, 1'', V, 100, 0, 1';
The points found are graphically represented in the following Figure 7.15 representing the relation F+s/ u+1, s/ 1. The graph shows that there are two crossings of the function F+s/ with the s-axis. Consequently we expect to find two solutions of the boundary value problem.
80
u+1,s/
60
40
20
0 100
80
60
40
20
0
s Figure 7.15. Graph of the relation F+s/
u+1, s/ 1 for the boundary value problem u ''
3 2
u2 with u+0/
4, u+1/
1.
The data points found for F+s/ are used to generate an interpolation function to increase the accuracy of the root finding procedure needed to locate the initial conditions. F
Interpolation#sData'
InterpolatingFunction#'
The roots of F+s/ u+1, s/ 1 FindRoot to this equation
0 are found in the next step by applying the root finding methods of
Mathematics for Engineers
298
initialSolutions +FindRoot#F+s/ 0, s, Ó1' &/ s 10, 20; VValues s s. initialSolutions 8., 35.8586
The two roots for s are located at s 8 and s 35.858 …. Knowing the initial values allows us to derive the solution of the boundary value problem by finally solving the initial value problem. The solution follows by inserting the two initial values in the original equations and the corresponding initial conditions. solutions
Flatten#+NDSolve#equation s. s Ó1, u, x, 0, 1' &/ s VValues'
u InterpolatingFunction#', u InterpolatingFunction#'
The derived solutions are shown in Figure 7.16. It is obvious that the solution are different from each other. We can conclude from this result that a boundary value problem contrary to an initial value problem does not have a unique solution. 4 2 0
un
2 4 6 8 10 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 7.16. Graph of the two solutions for the boundary value problem u ''
3 2
u2 with u+0/
4, u+1/
1.Ô
The shooting method is not only useful for second order boundary value problems but can also be applied to higher order equations. The following example is discussing a third order ordinary differential equation. This third order problem needs in addition to the two conditions used for the second order equation an additional condition for the first order derivative. Example 7.9. Shooting Method II Let us consider the third order boundary value problem u''' +x/ x u+x/
,x3 2 x2 5 x 30 Æ x , with u+0/
0, u+1/
0, and u' +0/
1.
(7.103)
For this equation the boundary conditions are specified at a finite location. Contrary to Example 7.8 the boundary conditions are mixed. This means that we have so called Dirichlet (variables) and von
IV - Chapter 7: Ordinary Differential Equations
299
Neumann (derivatives) boundary conditions. Solution 7.9. The first step of solving Equation (7.103) is the reformulation in terms of initial values with a second order derivative at x 0. This second order derivative as initial condition should be determined in such a way that the boundary condition u+1/ 0 is satisfied. The equation plus initial conditions reads equation
3 u+x/ xxx
x u+x/ ,x3 2 x2 5 x 30 Æ x , u+0/ 0, u
+0/ 1, u
+0/ s!
u+3/ +x/ x u+x/ Æx ,x3 2 x2 5 x 30, u+0/ 0, u
+0/ 1, u
+0/ s
By integrating this problem with a variety of initial conditions for s we can set up the condition which serves to determine F+s/ u+1/ 0. The following line generates the solutions and the final values for u+1/. This information is collected in a list of data points +s, u+1, s// sData
Table#Flatten#V, u+1/ s. NDSolve#equation s. s V, u, x, 0, 1'', V, 10, 10, 1';
The points found are graphically represented in Figure 7.17 representing the relation F+s/ u+1, s/. The graph shows that for the specified initial values a linear relation exists with a single root at s 0.
4
u+1,s/
2
0 2 4 10
5
0
5
10
s Figure 7.17. Graph of the relation F+s/ u+0/ 0, u+1/ 0, and u ' +0/ 1.
u+1, s/ for the boundary value problem u ''' +x/ x u+x/
,x3 2 x2 5 x 30 Æx , with
The data points found for F+s/ are used to generate an interpolation function to increase the accuracy of the root finding procedure needed to locate the initial conditions. F
Interpolation#sData'
InterpolatingFunction#'
The roots of F+s/ u+1, s/ FindRoot to this equation
0 are found in the next step by applying the root finding methods of
300
Mathematics for Engineers
initialSolutions +FindRoot#F+s/ 0, s, Ó1' &/ s 1; VValues s s. initialSolutions 9.26777 108
There is only a single root for s 0. Knowing this initial value allows us to derive the solution of the boundary value problem by finally solving the initial value problem. The solution follows by inserting the initial value in the original equation and the corresponding initial conditions. solutions
Flatten#+NDSolve#equation s. s Ó1, u, x, 0, 1' &/ s VValues'
u InterpolatingFunction#'
For the given problem there exists a symbolic solution which is given by uex ,x x2 0 Æ x . The existence of the symbolic solution allows us to compare the numeric solution with the symbolic one. Especially we are able to determine the absolute error H uex un . Here uex represents the symbolic solution and un the numeric one. The following Figure 7.18 shows the local error H as a function of x. We observe that the error of our calculation lies below 108 . The variation of this error is large in the middle of the interval and becomes zero at the end points as expected. 1.4 108 1.2 108
«uex un «
1. 108 8. 109 6. 109 4. 109 2. 109 0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 7.18. Graph of the local error H with u+0/ 0, u+1/ 0, and u ' +0/ 1.Ô
uex un for the boundary value problem u ''' +x/ x u+x/
,x3 2 x2 5 x 30 Æx ,
IV - Chapter 7: Ordinary Differential Equations
301
7.6.2 The Optimization Approach Since the shooting approach needs a large number of iterations to find a compatible initial condition for the integration, we can ask for an optimized approach for the solution of a boundary problem. The solution of a standard boundary value problem by means of the shooting method is found by means of iterative techniques. These approaches use integration procedures like Runge – Kutta methods to integrate the differential equation by guessing and adjusting a parameter in such a way that a boundary value problem is transformed to an initial value problem. This is due to the fact that the integrating procedure usually solve initial value problems only. The following example will demonstrate a different approach using an optimization step to solve the problem. In fact this approach is nothing more than the shooting approach but automized. As an example assume we have the problem of a laminar flow over a flat plate. The problem, usually attributed to Blasius, represents an example of the exact solution of the Navier – Stokes equation. The mathematical formulation, originating from a similarity analysis, is expressed by a third order nonlinear ordinary differential equation with specified boundary conditions at finite and infinite points. The Blasius equation itself reads u+x/ u'' +x/ 2 u''' +x/
0
(7.104)
subject to the boundary conditions u+0/
0, u' +0/
0, and u' +/
1.
(7.105)
These three boundary conditions represent physical conditions of the problem. However, the two first conditions at x 0 can be read as initial conditions for equation (7.104). As mentioned the order of the ordinary differential equation is three. Therefore we need a third initial condition to solve a regular initial value problem. The idea here is to replace the boundary condition at infinity by an appropriate initial condition which is not specified as boundary condition. The function and its first order derivative are already defining initial conditions. Thus we are only able to use higher order derivatives to fix the initial value. The simplest case is to specify here the second order derivative at x 0 in such a way that we can satisfy the boundary condition at infinity. This means we introduce the initial condition u'' +0/
D subject to
min,+u' +/ 1/2 0.
(7.106)
This constraint allows us to formulate the original boundary value problem as an optimization problem. The constraint is subject to the condition that we minimize the distance between the boundary at infinity. Implied in the formulation is that u' +/ is obtained by integrating the differential equation (7.104). In addition equation (7.104) can be considered as a differential constraint to the solution of the minimization problem. There is another point which has to be considered if we are going to numerically integrate the differential equation (7.104). Usually we do not have access to the higher order derivatives of the function in an integration routine. Since (7.106) requires the specification of the second order derivative we need to adapt the initial condition D in our optimization process of the
Mathematics for Engineers
302
boundary condition. So, we have to make sure in our calculations that the needed derivatives are available. One way to solve this problem is that for the integration of the initial equations we have to express them in a system of first order differential equations. This needs a transformation of an nth order differential equation to a system of n first order differential equations. This transformation is always possible if we introduce n new variables assigned to the function and higher order derivatives. To carry out the transformation to a system of first order ordinary differential equations in Mathematica we have to apply the next steps to the original boundary value problem. First we define the equation by equation
2
3 u+x/ xxx
u+x/
2 u+x/ xx
0
2 u+3/ +x/ u+x/ u
+x/ 0
The next line solves the original equation with respect to the highest derivative solHigh
Flatten%Solve%equation,
u+3/ +x/
1 2
3 u+x/ xxx
))
u+x/ u
+x/!
Then we introduce new variable names to set up the system of first order ordinary differential equations varsName Table#ToExpression#"u" ! ToString#i'', i, 1, 3'; depVars +Ó1 x &/ s varsName u1+x/, u2+x/, u3+x/
The first order derivatives of these variables are needed as the left hand side of the system of first order differential equations deriVars
depVars x
u1
+x/, u2
+x/, u3
+x/
In our original equation there occur derivatives up to order three. We actually need only the first two derivatives in representing the transformation from the original variable to the new variables deriReplace
Thread%Table%
i u+x/ xi
, i, 0, 2) depVars)
u+x/ u1+x/, u
+x/ u2+x/, u
+x/ u3+x/
The original derivatives up to order three are needed in generating the system of first order differential equations. We use the first order derivatives of the new variables on the left hand side and set these equal to the derivatives of the original variable with increasing order. The derivatives of u itself are replaced by the expressions in new variables as collected above
IV - Chapter 7: Ordinary Differential Equations
origDeri
Table%
i u+x/ xi
303
, i, 1, 3)
u
+x/, u
+x/, u+3/ +x/ systemOfEquations Thread#deriVars origDeri' s. solHigh s. Thread%Table% u1
+x/ u2+x/, u2
+x/ u3+x/, u3
+x/
1 2
i u+x/ xi
, i, 0, 2) depVars)
u1+x/ u3+x/!
The resulting system of equations is a nonlinear system in variables uk with k 1, 2, 3. The corresponding initial conditions are derived with the help of initial conditions in the new variables defined by d1
deriReplace s. x 0
u+0/ u1+0/, u
+0/ u2+0/, u
+0/ u3+0/
Replacing the old conditions by the new one we find the needed initial equations for integrating the first order system initialValues
u+0/ 0, u
+0/ 0, u
+0/ D s. d1
u1+0/ 0, u2+0/ 0, u3+0/ D
The needed information of the integration problem is collected in the following function which performs a numerical integration provided the parameter D E is specified as a numeric value. solU$E_ ? NumericQ, xend_, logicSwitch_( : Block#equations, solution, sol, sol1, + collect the equations and the initial values / equations Flatten#systemOfEquations, initialValues' s. D E; + integrate the initial problem / solution NDSolve#equations, varsName, x, 0, xend'; + extract the solution / sol Flatten#u2#x' s. solution s. x xend'; sol1 solution; If#logicSwitch, sol, sol1' '
The function returns as a result either the final value for the dependent variable u or the complete solution. The returned information is specified by a logical switch with values True and False. For True the final value and for False the complete solution is returned. The cost function of our minimization problem is c function used in the optimization step
+u' +/ 1/2 represented by the following
Mathematics for Engineers
304
optimalInitials$boundary_List, E_, xEnd_( : Block#vec, + find the solution / vec solU#E, xEnd, True'; + cost function / +vec boundary/.+vec boundary/ '
The actual minimization can be done by the Mathematica function FindMinimum which delivers the minimal value and the parameter estimation for the initial condition startInitial
FindMinimum#optimalInitials+1, D, 10/, D'
1.08732 1018 , D 0.332057
Using this value for D in the final numerical integration of the system of first order differential equations we get the solution of the boundary value problem solU+D, 10, False/ s. Last#startInitial'
boundarySolution
u1 InterpolatingFunction#', u2 InterpolatingFunction#', u3 InterpolatingFunction#'
The graphical representation of the result shows how the solution, the first and second order derivative of the solution behaves. It is obvious from Figure 7.19 that the boundary conditions are satisfied for the solution. Even the boundary at infinity is represented in a correct way. 2.0
u1 , u2 , u3
1.5
1.0
0.5
0.0
0
2
4
6
8
10
x Figure 7.19. Solution for the boundary value problem u+x/ u '' 2 u ''' 0 with u+0/ 0, u ' +0/ 0, and u ' +/ 1 known as Blasius problem. Shown are the numerical approximation for u1 u, u2 u ', and u3 u '' shown in blue, red, green, respectively.Ô
This example demonstrates that the optimization approach is straight-forward and delivers reliable results. However, the amount of programming and the transformation of the original boundary value
IV - Chapter 7: Ordinary Differential Equations
305
problem to a system needs some additional work. The most important point is the loss of specific control of the initial conditions. The following example will demonstrate this for a similar problem as discussed above. Example 7.10. Optimization Approach Similar to the Blasius equation there is another famous boundary layer equation which was derived by Falkner and Skan [9]. The Falkner – Skan equation is also a third order ordinary differential equation which incorporates two parameters related to the physical conditions. The Falkner – Skan equation in the scaled variable u is given by D u+x/
falknerSkan
2 u+x/ xx
3 u+x/ xxx
E 1
u+x/
2
x
0
u+3/ +x/ D u+x/ u
+x/ E ,1 u
+x/2 0 0
where D and E are constants. This equation should be solved subject to the boundary conditions boundaryConditions
u+0/ 0, u
+0/ 0, u
+/ 1
u+0/ 0, u
+0/ 0, u
+/ 1
In fact the Falkner– Skan equation can be simplified by assuming D 1 and E 2 m s +m 1/ where m is the exponent in the power law of the boundary velocity [29]. The exponent m determines a critical parameter during the integration. Using the original notation the Falkner Skan equation can be simplified by the following transformations falknerSkanSim 2 m ,1 u
+x/2 0 m1
D u+x/
2 u+x/ xx
3 u+x/ xxx
E 1
u+x/
2
x
0 s. D 1, E
2m m1
!
u+3/ +x/ u+x/ u
+x/ 0
Solution 7.10. Since the type of the equation is similar to Blasius' equation we can use the procedures applied to this equation with minor modifications. First we solve the Falkner –Skan equation with respect to the highest derivative solHigh
Flatten%Solve%falknerSkanSim,
u+3/ #x'
3 u+x/ xxx
))
2 m 2 m u
#x'2 u#x' u
#x' m u#x' u
#x' 1m
!
Then we introduce new variable names to set up the system of first order ordinary differential equations varsName Table#ToExpression#"u" ! ToString#i'', i, 1, 3'; depVars +Ó1 x &/ s varsName u1#x', u2#x', u3#x'
Mathematics for Engineers
306
The first order derivatives of these variables are needed as the left hand side of the system of first order differential equations depVars
deriVars
x
u1 +x/, u2 +x/, u3
+x/
In our original equation there occur derivatives up to order three. We actually need only the first two derivatives in representing the transformation from the original variable to the new variables deriReplace
Thread%Table%
i u+x/ xi
, i, 0, 2) depVars)
u+x/ u1+x/, u
+x/ u2+x/, u
+x/ u3+x/
The original derivatives up to order three are needed in generating the system of first order differential equations. We use the first order derivatives of the new variables on the left hand side and set these equal to the derivatives of the original variable with increasing order. The derivatives of u itself are replaced by the expressions in new variables as collected in the previous lists origDeri
Table%
i u+x/ xi
, i, 1, 3)
u
+x/, u
+x/, u+3/ +x/ systemOfEquations Thread#deriVars origDeri' s. solHigh s. Thread%Table% u1
+x/ u2+x/, u2
+x/ u3+x/, u3
+x/
i u+x/ xi
, i, 0, 2) depVars)
m u1+x/ u3+x/ 2 m u2+x/2 2 m u1+x/ u3+x/ m1
!
The resulting system of equations is a nonlinear system in variables uk with k 1, 2, 3. The corresponding initial conditions are derived with the help of initial conditions in the new variables defined by d1
deriReplace s. x 0
u+0/ u1+0/, u
+0/ u2+0/, u
+0/ u3+0/
Replacing the old conditions by the new one we find the needed initial equations for integrating the first order system initialValues
u+0/ 0, u
+0/ 0, u
+0/ D s. d1
u1+0/ 0, u2+0/ 0, u3+0/ D
The needed information of the integration problem is collected in the following function which performs a numerical integration provided the parameter D E is specified as a numeric value. We
IV - Chapter 7: Ordinary Differential Equations
307
also have to add the parameter m as an argument of our integration function because the equation itself depend on the parameter m. solUFS$E_ ? NumericQ, P_, xend_, logicSwitch_( : Block#equations, solution, sol, sol1, + collect the equations and the initial values / equations Flatten#systemOfEquations, initialValues' s. D E, m P; + integrate the initial problem / solution NDSolve#equations, varsName, x, 0, xend'; + extract the solution / sol Flatten#u2#x' s. solution s. x xend'; sol1 solution; If#logicSwitch, sol, sol1' '
The cost function of our minimization problem is c function used in the optimization step
+u' +/ 1/2 represented by the following
optimalInitials$boundary_List, E_, P_, xEnd_( : Block#vec, + find the solution / vec solUFS#E, P, xEnd, True'; + cost function / +vec boundary/.+vec boundary/ '
The actual minimization can be done by Mathematica in a two-step process: first determining the optimal initial conditions and second find the solution to this initial condition. These two steps are incorporated in the function given in the next line mVariation$m_( : Block#startInitial, startInitial FindMinimum#optimalInitials#1, D, m, 5', D'; Print#"startInitial ", startInitial'; boundarySolution solUFS#D, m, 5, False' s. Last#startInitial' '
Since we are interested in the behavior of the velocity given by u2 in our set of equations, under the variation of the parameter m, we generate the values by Schlichting [29] data1
Map#Ó, mVariation#Ó' &, .091, 0.065, 0, 1 s 9, 1 s 3, 3 s 4';
The graphical representation of the result shows how the velocity u' +x/ u2 +x/ changes if m is changed. It is obvious from Figure 7.20 that the boundary conditions are satisfied for the choice of parameters specified.
Mathematics for Engineers
308
1.2 1.0
u2
0.8 0.6 0.4 0.2 0.0
0
1
2
3
4
5
x Figure 7.20. Solution for the boundary value problem ,1 +u '/2 0
2m m1
u+x/ u '' u '''
known as Falkner Skan problem. Shown are the numerical approximation for u2 m ± .091, 0.065, 0, 1 s 9, 1 s 3, 3 s 4, respectively.
0 with u+0/
0, u ' +0/
0, and u ' +/
1
u ' for Different values of
However if we change the parameter values m to values larger than 1 or smaller than 1 s 10, we observe that the method is not any more generating results which allow a consistent physical interpretation. Figure 7.21 shows three solutions for m ! 1 and two for m 1 s 10.
1.0 0.5
u2
0.0 0.5 1.0 1.5 2.0
0
1
2
3
4
5
x Figure 7.21. Solution for the boundary value problem ,1 +u '/2 0
2m m1
u+x/ u '' u '''
known as Falkner Skan problem. Shown are the numerical approximation for u2 m ± .15, .1, 1, 2, 4, respectively.
0 with u+0/
0, u ' +0/
0, and u ' +/
u ' for Different values of
The reason for this non physical behavior is that there exist not only a single solution for the boundary value problem but two. The multiple solutions occur because the boundary value problem is not uniquely solvable for values m W 0.35. For this range of values we always have two possible solu-
1
IV - Chapter 7: Ordinary Differential Equations
309
tions for the boundary value problem. The same happens for negative values of m. This means that the optimization method will find only one of the two solutions which might be the wrong one.Ô This example demonstrates that due to the possible multiple solution structure of boundary value problems we should take care of the choice of initial values. If the boundary value problem allows multiple solution the optimization approach should also have information about the correct choice of the initial value. This information needs more efforts in an algorithm to be detected. To avoid this kind of problem we can still take another approach which is completely different from the approaches discussed so far. The following section will discuss an approach based on finite differences.
7.6.3 The Finite Difference Approach We assume that we have a linear differential equation of order greater than one, with conditions specified at the end points of an interval #a, b'. We divide the interval #a, b' into N equal parts of width h. We set x0 a, xN b, and we define xk
x0 k h with k
1, 2, …, N 1
(7.107)
In Mathematica we introduce the simple transformation rule d0
x h k x0
x h k x0
for the interior mesh points. The corresponding values of u at these mesh points are denoted by uk
u+x0 k h/ with k
0, 1, 2, …, N.
(7.108)
To solve a boundary value problem by the method of finite differences, every derivative appearing in the equation, as well as in the boundary conditions, are replaced by an appropriate difference approximation. Central differences are usually preferred because they lead to greater accuracy. Some typical central-difference approximations are defined as replacement rules in the following Mathematica expressions for the first and second order derivatives d1
u
x Ì
u
x Ì
u+h +k 1/ x0/ u+h +k 1/ x0/ 2h
u+h +k 1/ x0/ u+h +k 1/ x0/ 2h
The second order derivative is replaced by d2
u
x Ì
u
x Ì
u+h +k 1/ x0/ u+h +k 1/ x0/ 2 u+h k x0/ h2
u+h +k 1/ x0/ u+h +k 1/ x0/ 2 u+h k x0/ h2
In each case the finite difference representation is an O,h2 0 approximation to the respective derivative.
310
Mathematics for Engineers
To illustrate the procedure, we consider the linear second order differential equation equation
2 u+x/ xx
f +x/
u+x/ x
g+x/ u+x/ q+x/
f +x/ u
+x/ g+x/ u+x/ u
+x/ q+x/
where f , g, and q are continuous functions of x. Under the boundary conditions boundaries
u+x0/ D, u+xN/ E
u+x0/ D, u+xN/ E
The finite difference approximation to this equation now follows by applying the transformations of the derivatives and the discretization of the variable to our equation. This results to discreteEquation
Simplify#equation s. d2 s. d1 s. d0 s. u_#h m_ x0' u+m/'
u+k 1/ +2 h f +k// u+k 1/ +h f +k/ 2/ 2 u+k/ ,h2 g+k/ 20 2 h2
q+k/
In this expression we also replace the discrete independent variable x0 k h by k itself to get variables depending only on the discrete index as uk u+x0 k h/. From this discretized version of the original differential equation we can derive a set of equations for the variables uk with k 1, 2, …, N 1. For our equation let us assume that N 8 so that the system of equations is generated by systemOfEquations
Table#discreteEquation, k, 1, 7' s. u+0/ D, u+8/ E
u+2/ + f +1/ h 2/ D +2 f +1/ h/ 2 u+1/ ,g+1/ h2 20 2 h2
q+1/,
u+1/ +2 f +2/ h/ u+3/ + f +2/ h 2/ 2 u+2/ ,g+2/ h2 20 2 h2 u+2/ +2 f +3/ h/ u+4/ + f +3/ h 2/ 2 u+3/ ,g+3/ h2 20 2 h2 u+3/ +2 f +4/ h/ u+5/ + f +4/ h 2/ 2 u+4/ ,g+4/ h2 20 2 h2 u+4/ +2 f +5/ h/ u+6/ + f +5/ h 2/ 2 u+5/ ,g+5/ h2 20 2 h2 u+5/ +2 f +6/ h/ u+7/ + f +6/ h 2/ 2 u+6/ ,g+6/ h2 20 2 h2 u+6/ +2 f +7/ h/ E + f +7/ h 2/ 2 u+7/ ,g+7/ h2 20 2 h2
q+2/, q+3/, q+4/, q+5/, q+6/,
q+7/!
IV - Chapter 7: Ordinary Differential Equations
D and uN
In this set of equation we replaced u0
311
E.
The coefficients of uk in this set can, of course, be computed since f +x/, g+x/, and q+x/ are known functions of x. This linear system can now be solved by any of the methods discussed in Chapter 6. In matrix form we have A.u k
(7.109)
b
where u k +u1 , u2 , …, uN1 / representing the vector of unknowns; b representing the vector of known quantities on the right hand side of our discretized equation; and A, the matrix of coefficients. The matrix A in this case is tridiagonal and of order N 1. It has the special form t3 Normal#CoefficientArrays#systemOfEquations, Table#u+k/, k, 1, 7'''; MatrixForm#t3327' g+1/ h2 2
f +1/ h2
h2
2 h2
2 f +2/ h
g+2/ h 2
2 h2
h2
2 h2
2 f +3/ h
g+3/ h2 2
f +3/ h2
2 h2
h2
2 h2
2 f +4/ h
g+4/ h 2
2 h2
h2
2 h2
2 f +5/ h
g+5/ h2 2
0
2
0
0
0
0
0
0
0
0
0 f +2/ h2
0 0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
f +4/ h2
2 h2
0
f +5/ h2
h2
2 h2
2 f +6/ h
g+6/ h2 2
2 h2
0
0
0 f +6/ h2
h2
2 h2
2 f +7/ h
g+7/ h2 2
2 h2
h2
The right hand side of the linear equation is also available and reads MatrixForm#t3317' D +2 f +1/ h/ 2 h2
q+1/
q+2/ q+3/ q+4/ q+5/ q+6/ E + f +7/ h2/ 2 h2
q+7/
The system A.uk b can be solved directly using algorithms from Chapter 6. We can even derive a solution symbolically for general functions f , g, and q. However, we will examine a specific equation in the following example to return to the numerical side of the problem. Example 7.11. Finite Difference Solution Solve the boundary value problem u'' +x/
u+x/ with the boundary conditions u+0/
0 and u+1/
1,
Mathematics for Engineers
312
using finite difference methods. Taking f +x/
0, g+x/
1, and q+x/
0.
Solution 7.11. We obtain the system for the determination of the unknowns uk by inserting the functions into the determining equations and fixing the step length by choosing the total number of discrete points in the interval to be Ntot
10
10
The step length h step
+b a/ s N is then given by
N%h
1 Ntot
)
h 0.1
The discrete system of equations including the boundary conditions follow by varying the discrete index k from 1 to N 1. systemOfEquations
Table#discreteEquation, k, 1, Ntot 1' s. u+0/ 0, u+Ntot/ 1
u+2/ + f +1/ h 2/ 2 u+1/ ,g+1/ h2 20 2 h2
q+1/,
u+1/ +2 f +2/ h/ u+3/ + f +2/ h 2/ 2 u+2/ ,g+2/ h2 20 2 h2 u+2/ +2 f +3/ h/ u+4/ + f +3/ h 2/ 2 u+3/ ,g+3/ h2 20 2 h2 u+3/ +2 f +4/ h/ u+5/ + f +4/ h 2/ 2 u+4/ ,g+4/ h2 20 2 h2 u+4/ +2 f +5/ h/ u+6/ + f +5/ h 2/ 2 u+5/ ,g+5/ h2 20 2 h2 u+5/ +2 f +6/ h/ u+7/ + f +6/ h 2/ 2 u+6/ ,g+6/ h2 20 2 h2 u+6/ +2 f +7/ h/ u+8/ + f +7/ h 2/ 2 u+7/ ,g+7/ h2 20 2 h2 u+7/ +2 f +8/ h/ u+9/ + f +8/ h 2/ 2 u+8/ ,g+8/ h2 20 2 h2 u+8/ +2 f +9/ h/ f +9/ h 2 u+9/ ,g+9/ h2 20 2 2 h2
q+2/, q+3/, q+4/, q+5/, q+6/, q+7/, q+8/,
q+9/!
The coefficient matrix and the right hand side are derived next
IV - Chapter 7: Ordinary Differential Equations
t3
313
Normal#CoefficientArrays#systemOfEquations, Table#u+k/, k, 1, Ntot 1''';
The next step performs the replacement of the arbitrary functions f , g, and q as well as the boundary values are specified. t4
t3 s. f +x Ì 0/, g +x Ì 1/, q +x Ì 0/ s. step s. D 0, E 1
0, 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201., 100., 0, 0, 0, 0, 0, 0, 0, 100., 201.
The resulting data represent the left hand side of the determining equations; the matrix A and the right hand side b of the discretized linear system. This system can be solved by a linear solver like solUk
N#LinearSolve#t4327, t4317''
0.0852447, 0.171342, 0.259152, 0.349554, 0.443452, 0.541784, 0.645534, 0.75574, 0.873502
Appending the boundary values to the result we get the complete data set of the solution AppendTo#solUk, 1'; PrependTo#solUk, 0';
The data points at different locations xk are generated by using the step length and the total number of discrete points xk 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.
uk 0 0.0852447 0.171342 0.259152 0.349554 0.443452 0.541784 0.645534 0.75574 0.873502 1
The corresponding exact solution can be used to generate the same points at xk using the exact solution of our differential equation
314
Mathematics for Engineers
xk
uex k
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.
0 0.0852337 0.17132 0.259122 0.349517 0.443409 0.54174 0.645493 0.755705 0.873482 1.
The graphical representation of the two data sets are shown in the following Figure 7.22. There is in fact no difference in the graphical representation of the data. However, we can graph the error of the numerical data by calculating the deviations from the exact solution (see Figure 7.23). 1.0
0.8
u
0.6
0.4
0.2
0.0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 7.22. Solution for the boundary value problem u '' +x/ u+x/ with u+0/ 0 and u+1/ 1. Shown are the exact solution u+x/ sinh+x/ s sinh+1/ and the numerical approximation with h 1 s 10. The numerical approximations and the exact solution are not distinguishable in this graph.
IV - Chapter 7: Ordinary Differential Equations
xk 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.
315
H «uk u« 0 0.0000109852 0.0000213697 0.0000305398 0.0000378552 0.0000426355 0.0000441459 0.0000415818 0.0000340525 0.0000205642 0.
The errors are shown in the following Figure 7.23
0.00004
H
0.00003
0.00002
0.00001
0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 7.23. Error for the boundary value problem u '' +x/ u+x/ with u+0/ 0 and u+1/ 1. Shown are the magnitude of the difference between the exact solution u+x/ sinh+x/ s sinh+1/ and the numerical approximation with h 1 s 10. The error reaches a maximum of approximately 4 105 at the center of the interval. Due to this magnitude of the error, the numerical results are accurate within four digits.Ô
The finite difference method delivers reliable results especially for linear problems. Another approach similar to this discretization is the collocation method. The difference here is the approximation of the derivatives of the differential equations. The following section will discuss this method in more detail.
Mathematics for Engineers
316
7.6.4 The Collocation Method In recent years a great deal of interest has focused on approximation methods for solving boundary value problems in both one and higher dimensional cases. In those approximation methods, rather than seeking a solution at a discrete set of points, an attempt is made to find a linear combination of linearly independent functions which provide an approximation to the solution. Actually the basic ideas are very old, having originated with Galerkin and Ritz, but more recently they have taken new shape under the term "finite element" methods, and they have been refined to the point where they are now very competitive with finite difference methods [31]. We shall sketch very briefly the basic notions behind these approximation methods focusing on the so called collocation method. For simplicity we assume that we have a second order linear boundary value problem which we write in the form u'' +x/ p+x/ u+x/ q+x/ u+x/ a0 u+a/ a1 u' +a/
D
b0 u+b/ b1 u' +b/
E
Let \ j +x/, j
r+x/,
axb
(7.110) (7.111)
with
a0 b0 0.
(7.112)
1, 2, …, N, be a set of linearly independent functions to be chosen in a manner to be
described later. An approximate solution to (7.110) is then sought in the form N
U+x/
Å uk \k +x/
(7.113)
k 1
The coefficients uk in this expansion are to be chosen so as to minimize some measure of the error in satisfying the boundary value problem. Different methods arise depending on the definition of the measure of error. In the collocation method the coefficients are chosen so that U+x/ satisfies the boundary conditions (7.111-112) and the differential equation (7.110) exactly at selected points interior to the interval #a, b'. Thus the uk satisfy the equations a0 U+a/ a1 U ' +a/
D
(7.114)
b0 U+b/ b1 U ' +b/
E
(7.115)
U '' +xi / p+xi / U+xi / q+xi / U+xi /
r+xi /,
i
1, 2, …, N 2
(7.116)
where the xi are a set of distinct points on the interval #a, b'. When written out (7.114-116) is a linear system of N equations in the N unknowns uk . Once (7.114-116) is solved, by, for example, the methods of Chapter 6, its solution uk is substituted into (7.113) to obtain the desired approximate solution. The error analysis for this method is very complicated and beyond the scope of this book. In
IV - Chapter 7: Ordinary Differential Equations
317
practice one can obtain a sequence of approximations by increasing the number N of basis functions. An estimate of the accuracy can then be obtained by comparing these approximate solutions at a fixed set of points on the interval #a, b'. We turn now to a consideration of the choice of the basis functions \k +x/. They are usually chosen so as to have one or more of the following properties: 1. The \k +x/ are continuously differentiable on #a, b' 2. The \k +x/ are orthogonal along the interval #a, b', i.e., b
à \k +x/ \ j +x/ Šx
0 for j k.
a
3. The \k +x/ are "simple" functions such as polynomials or trigonometric functions 4. The \k +x/ satisfy those boundary conditions (if any) which are homogeneous. One commonly used basis in today's collocation methods is the set of shifted sinc functions xk h \k +x/ sinc, h 0 , k M , M 1, …, 0, …, N which is orthogonal over the real line. Note that sinc+x/ sin+Sx/ s +Sx/ 0. Another important basis set is \k +x/ Pk +x/, k 0, 1, 2, …, N where Pk +x/ are the Legendre polynomials described in Chapter 3. These polynomials are orthogonal along the interval #1, 1'. In addition radial functions f +x/
Æ+xk h/ are discussed. 2
Example 7.12. Collocation Method A simple example of a second order boundary value problem is equation
2 u+x/ xx
u+x/
u
+x/ u+x/
with boundaries specified as boundaries
u+0/ 0, u+1/ 1
u+0/ 0, u+1/ 1
Our target is to find the solution u
u+x/.
Solution 7.12. We select polynomials for our basis functions and we seek an approximate solution U+x/ in the form solutionTarget
u ,x Ì u1 x u2 x2 u3 x3 0
u ,x Ì u1 x u2 x2 u3 x3 0
we see that U+0/ 0 regardless of the choice of the coefficients uk 's. Since there are three coefficients we must impose three conditions on U+x/. One condition is that U+x/ must satisfy the boundary condition at x 1, hence one equation for the u2 's is
Mathematics for Engineers
318
+u+1/ s. solutionTarget/ 1
eq1
u1 u2 u3 1
We can impose two additional conditions by insisting that U+x/ satisfy the differential equation of our problem exactly at two points interior to the interval #0, 1'. We choose, for no special reason, x 1 s 4 and x 3 s 5. One computes directly that Thread% u
eq2
u1 4
u2 16
u3 64
1
3 ,u
4
0,
3 u1 5
5
! s. solutionTarget 0, 0) 9 u2 25
27 u3 125
0!
The system of equations from the boundary condition and the internal point conditions can be solved directly to yield the solution Flatten#Solve#Flatten#eq1, eq2', u1, u2, u3''
coeffSolution u1
1 2
, u2
17 6
, u3
10 3
!
Substituting these into our targeted solution yields the approximate solution of the problem solutionTarget s. coeffSolution
approximateSolution u xÌ
10 x3 3
17 x2 6
x 2
The exact solution of the problem follows by exactSolution
u x Ì
Flatten%DSolve% Æ1x ,Æ2 x 10 Æ2 1
2 u+x/ xx
u+x/, u+0/ 0, u+1/ 1!, u, x))
!
The approximation and the exact solution u+x/ 2 Æ sinh+x/ t ,Æ2 10 can now be compared by using the uex uap H as an estimation of the error. The following Figure 7.24 demonstrates the result:
IV - Chapter 7: Ordinary Differential Equations
319
0.5 0.4
H
0.3 0.2 0.1 0.0 0.0
0.2
0.4
0.6
0.8
1.0
x Figure 7.24. The error H
uex uap is shown for the equation u ''
u for u+0/
0 and u+1/
1. The accuracy at the
endpoints is obvious while in the middle of the interval the deviation becomes large.
It is obvious from Figure 7.24 that the boundary conditions are satisfied. However, the approximation in the interval shows a large deviation from the exact solution. In conclusion the basis chosen for the approximation is not the best one.Ô
7.6.5 Tests and Exercises The following two subsections serve to test your understanding of the last section. Work first on the test examples and then try to solve the exercises. 7.6.5.1 Test Problems T1. What is the difference between an initial value problem and a boundary value problem? T2. Are boundary value problems uniquely solvable? T3. Are boundary value problems uniquely solvable? T4. Are initial value problems useful to solve boundary value problems? T5. How is optimization used in solving boundary value problems?
7.6.5.2 Exercises E1. The differential equation approximating the physical situation of a bar is of the form E I w'' +x/
S w+x/
q 2
x +x l/,
(1)
where w+x/ is the deflection a distance x from the left end of the bar, and l, q, E, S, and I represent, respectively, the length of the beam, the intensity of the uniform load, the modulus of elasticity, the stress at the endpoints, and the central moment of inertia. Since no deflection occurs at the ends of the beam, we also have the two boundary conditions w+0/
w+l/
0.
(2)
Find the solution of the problem. E2. In exercise E1 the deflection of a bar with supported end points subject to uniform loading was approximated by
320
Mathematics for Engineers
equation (1). Using a more appropriate representation of curvature gives the differential equation w '' +x/
EI
S w+x/
2 3s2
,1 +w' +x// 0
q 2
x +x l/, for 0 x l,
Approximate the solution by using the shooting method with h
(3)
0.1 m. Compare your results with E1.
E3. Use the shooting method to solve the boundary value problem y '' Æx y sin+y '/
0, 1 x 2, y+1/
y+2/
0.
(4)
E4. Use the optimization method to solve the boundary value problem 2
y ''
x
y'
2 x
2
y
sin+ln+x//
1 x 2,
,
x2
y+1/
1, y+2/
2.
(5)
Compare your result with the exact solution c1 x
y+x/
c2 x2
3 10
sin+ln+x//
1 10
cos+ln+x//,
(6)
where 11 c1
10
c2
1 and c2
70
+8 12 sin+ln+2// 4 cos+ln+2///.
(7)
E5. The boundary value problem 4 +y x/,
y''
0 x 1, y+0/
0, y+1/
2,
(8)
1
has the solution y+x/ Æ2 ,Æ4 10 ,Æ2 x Æ2 x 0 x. Use the finite difference method with step lengths h h 1 s 4 to approximate the solution and compare the results to the exact solution. E6. The boundary value problem y' 2 y cos+x/, 0 x
y '' has the solution y+x/
1 10
S 2
3
, y+0/
10
1
, y+S s2/
10
1 s 2 and
(9)
+sin+x/ 3 cos+x//. Use the shooting method and the finite difference method to find
approximations for the problem. Compare the numerical results generated by step lengths of length h Ss4 and h Ss 8 with the exact solution. E7. Let u represent the electrostatic potential between two concentric metal spheres of radii R1 and R2 +R1 R2 ). The potential of the inner sphere is kept constant at V1 volts, and the potential of the outer sphere is 0 volts. The potential in the region between the two spheres is governed by Laplace's equation, which, in this particular application, reduces to d2 u dr
2
2 du r dr
0, R1 r R2 , u+R1 /
V1 , u+R2 /
0.
(10)
Suppose R1 2 cm, R2 4 cm, and V1 110 volts. a. Approximate u+2.5/ and u+3/ using the optimization method. b. Try to solve the problem also with the finite difference method. Compare your numerical results. c. Compare the numerical results of part a) and b) with the exact potential given by u+r/ E8. Consider the boundary value problem
V1 R1
R2 r
r
R2 R1
.
(11)
IV - Chapter 7: Ordinary Differential Equations
y'' y
321
0, 0 x b, y+0/
0, y+b/
B.
(12)
Find choices for b and B so that the boundary value problem has a. Exactly one solution, b. No solution, c. Infinitely many solutions. E9. The deflection of a uniformly loaded, long rectangular plate under an axial tension force is governed by a second order differential equation. Let S represent the axial force and q the intensity of the uniform load. The deflection w along the elemental length is given by q ,x2 l x0, 0 x l, w+0/ w+l/ 0, D w'' +x/ S w+x/ (13) 2 where l is the length of the plate and D is the flexual rigidity of the plate. Let q
200 N sm, S
100 N s m,
D 6.6 106 N s m, and l 50 m. Approximate the deflection at 1 m intervals with the finite difference method. E10 Use the finite difference method with h 0.5 to approximate the solution to the boundary value problem y ''
+y'/2 y ln+x/, 1 x 2, y+1/
Compare your results to the exact solution y
0, y+2/
ln+2/.
(14)
ln+x/.
E11 Try to solve the boundary value problem y'' y
x, 0 x 1, y+0/
0, y+1/
1.
(15)
by the collocation method. Start with the trial function N
yN
x Åc j sin+ j S x/
(16)
j 1
which automatically satisfies the boundary conditions for all c j 's. Try N
2 and N
4 and compare the results.
7.7 Finite Elements and Boundary Value Methods In this section, we introduce the powerful Finite element method for finding numerical approximations to the solutions of boundary value problems involving both ordinary and partial differential equations can be solved by direct integration. The method relies on the characterization of the solution as the minimizer of a suitable quadratic functional. The innovative idea is to restrict the infinite-dimensional minimization principle characterizing the exact solution to a suitably chosen finite-dimensional subspace of the function space. When properly formulated, the solution to the resulting finitedimensional minimization problem approximates the true minimizer. The finite-dimensional minimizer is found by solving the induced linear algebraic system, using either direct or iterative methods. We begin with one-dimensional boundary value problems involving ordinary differential equations, and then we show how to adapt the finite element analysis to partial differential equations, specifically the two-dimensional Laplace and Poisson equations in the next chapter.
Mathematics for Engineers
322
7.7.1 Finite Elements for Ordinary Differential Equations The characterization of the solution to a linear boundary value problem via a quadratic minimization principle inspires a very powerful and widely used numerical solution scheme, known as the finite element method. In this section, we give a brief introduction to the finite element method in the context of one-dimensional boundary value problems involving ordinary differential equations. The underlying idea is strikingly simple. We are trying to find the solution to a boundary value problem by minimizing a quadratic functional P#u' on an infinite-dimensional vector space U. The solution u ± U to this minimization problem is found by solving a differential equation subject to specified boundary conditions. However, minimizing the functional on a finite dimensional subspace W ¯ U is a problem in linear algebra, and, moreover, one that we already know how to solve! Of course, restricting the functional P#u' to the subspace W will not lead to the exact minimizer. Nevertheless, if we choose W to be a sufficiently "large" subspace, the resulting minimizer w ± W may very well provide a reasonable approximation to the actual solution u ± U. A rigorous justification of this process, under appropriate hypotheses, requires a full analysis of the finite element method, and we refer the interested reader to either Strang and Fix or Zienkiewicz and Taylor [31, 38]. Here we shall concentrate on trying to understand how to apply the method in practice. To be a bit more explicit, consider the minimization principle P#u'
1 2
«« L#u' ««2 ; f , u?
(7.117)
for the linear system K#u'
f,
L ÍL,
K
where
(7.118)
representing our boundary value problem. The norm in (7.117) is typically based on some form of a
weighted inner product