English Pages 1055 [1051] Year 2021
Luigi Morino
Mathematics and Mechanics The Interplay Volume I: The Basics
Luigi Morino Professor of Engineering (retired) Roma Tre University Rome, Italy Professor of Aerospace and Mechanical Engineering (retired) Boston University Boston, MA, USA
ISBN 978-3-662-63205-5 ISBN 978-3-662-63207-9 (eBook) https://doi.org/10.1007/978-3-662-63207-9
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer-Verlag GmbH, DE part of Springer Nature. The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
To Nancy, Federica and Francesca
Preface
This is the first of three stand–alone volumes which cover, respectively: (1) basic calculus and basic mechanics, (2) ordinary differential equations and n-particle mechanics, and (3) partial differential equations and continuum mechanics. They will be referred to as Volume I, II and III, respectively. This preface covers all three volumes, which will be referred to as the book. The title, Mathematics and Mechanics — The Interplay, may be misleading. So let me clarify that the book covers only classical mechanics. Quantum mechanics and relativistic mechanics are not dealt with in this book, and neither is statistical mechanics, an important branch of classical mechanics. On the other hand, the range of mathematical topics is focused primarily on analysis (calculus and its generalizations) and matrix algebra (including generalized eigenvectors), with some elementary notions of Euclidean and analytical geometry. The interplay is enriched by historical notes that show the interaction in the development of the two subjects.
• Inspirations for the book When I was a student in Engineering, the books Introduzione alla Fisica Matematica by Enrico Persico (Ref. [54]) and Mathematical Methods in Engineering by Theodore von Kármán and Maurice A. Biot (Ref. [35]) were instrumental in my scientific development, by showing me how mathematics can be used to formulate and solve practical problems in physics and engineering. Also very important were the mathematical analysis books by Francesco Giacomo Tricomi of the Turin Polytechnic, in particular Refs. [66]–[70].
However, a greater motivation came about fifty years ago, shortly after my arrival in the United States, when my polymath friend Larry Bucciarelli described to me an unusual freshman program he had initiated at MIT with his friend David Oliver. The name of the program was (and still is) Concourse. As I remember it, the program provided integrated courses in mathematics, mechanics and social sciences. The common theme was Celestial Mechanics in the first term (which included calculus and the Newton formulation of the Kepler laws of the planets’ motion), and Population Growth in the second (which included matrix algebra). Up to seven professors attended each lecture, with one of them being in charge and the others actively participating. I felt that Concourse was a great idea to help the students better explore the interconnection between the various disciplines. I tried to follow such an approach by integrating mathematics and mechanics in my own courses, such as a sophomore course on mathematical mechanics, junior/senior courses on mathematical methods in engineering, and graduate courses on advanced mathematical physics and fluid–solid interaction, courses that are the basis for this book. I like to say that Concourse was my Urania, the ancient Greek muse of Astronomy, the only muse connected with science and mathematics — she is often represented carrying a celestial globe and a pair of compasses.
• The interplay This book is written in the same vein. Accordingly, its main objective is to present the interplay between mathematics and mechanics. Indeed, such interplay, like love, is a two–way street. This may be appreciated if we examine their development from a historical viewpoint. Initially, mechanics provided the stimulus for the development of new mathematical tools. For instance, Newton developed calculus to be able to explain the Kepler laws of the motion of the planets around the Sun. Later on, there have been cases in which mathematics was developed before mechanics. For instance, the Italian mathematician Gregorio Ricci-Curbastro and then his pupil Tullio Levi-Civita introduced, at the end of the nineteenth century, absolute differential calculus, now better known as tensor analysis, as a purely mathematical exercise (Ref. [41]). Their formulation was crucial for the development of the Einstein theory of general relativity (Ref. [19]). To further illustrate this point, the book includes historical notes, which show how the same people who were studying physics (in particular mechanics) were also instrumental in developing mathematics as a means to understand the world in which we live. The names of Archimedes, Galileo, Newton, Euler, Lagrange and Gauss pop to mind. I don’t make any claim
to be an expert on the history of mathematics or mechanics. Nonetheless, I think that you might like learning a bit about the historical context of the development of the interplay between the two. Accordingly, I have included short bio–sketches of all the scientists encountered. To emphasize the interplay, materials on mathematics and those on mechanics are intertwined. However, I believe it to be important to emphasize the distinction between mathematics and mechanics and to keep the two separate in your mind. Therefore, each chapter deals primarily with either mathematics or mechanics. Typically a chapter on mathematics provides the tools required for a chapter on mechanics that follows; for instance, derivatives are introduced before discussing acceleration in dynamics, and vectors are introduced before discussing forces in three–dimensional statics. Vice versa, a chapter on mechanics often anticipates mathematical concepts that are then addressed in a rigorous way in a mathematical chapter that follows; for instance, the solutions for the mass–spring and mass–spring–damper equations are introduced before the chapter on differential equations (in Vol. II) and provide both the motivation and the basic ideas behind such a mathematical tool.
• Features of the book This book is not intended as a textbook for a specific course. Indeed, the book covers material that, as a student, I had in about thirty one–semester courses. On the contrary, it is meant as a reference text that complements textbooks for specific courses, so as to provide the interconnection between the corresponding subject matters. Specifically, the book is meant to give you an adequate grasp of mathematics and mechanics, so as to make you capable of understanding the literature on those subjects that may be of interest to you. Also, I start from the very beginning, namely with the arithmetic for natural numbers, integers, rationals and real numbers, so as to make the book self–contained. Then, the level of mathematical sophistication is gradually increased, so as to span from elementary school all the way to graduate–level courses. One feature of the book I have been particularly careful with is to provide the proof of every single statement that is needed for the understanding of the book. In particular, I banned the expression “It may be shown that . . . ,” with the exception of those results that are not used later on in the book, which are provided only for your information. Granted, some of my mathematician friends will be horrified at some of the “proofs” I give. However, this is true
only for the high–school material, in particular in the first chapter, where I use intuition and/or illustrative examples rather than rigor, pretty much as expected in high–school. In more advanced chapters, sometimes I prefer to start with an intuitive introduction. When I do this, I warn you and add a second proof — a rigorous one this time. The sequence of the subjects is not dictated by historical considerations, but only by the desire to postpone complicated material as much as possible, akin to a course sequence in a well–organized curriculum. Thus, systems of algebraic equations (dealt with using the so–called Gaussian elimination technique), a relatively simple subject, are introduced first, followed by statics in one dimension. Specifically, you could see mechanics as being the driver (namely that the sequence of topics is dictated by mechanics), with the required mathematics introduced as needed. On the other hand, one can conceive of mathematics as the driver, with the mechanics chapters that follow providing illustrative examples of what to do with the math. In any case, the two are presented at an increasing level of complexity: basic calculus along with n-particle and rigid–body dynamics, with solutions of some simple problems (Volume I); systems of differential equations along with n-particle dynamics, with methods of solution for linear dynamics problems (Vol. II); multivariate calculus along with continuum mechanics, with application to fluids and solids (Vol. III).
• Gut–level mathematics It is a common belief among some pure mathematicians that what’s important in teaching mathematics is to instill rigor in young minds. However, there are always two sides to the same coin. This is true for mathematics as well. We have to distinguish between pure mathematics and applied mathematics, in particular, mathematical physics. In the first case, we address mathematics per se, for its own beauty, irrespective of its usefulness in possible applications. Some advocate the use of such an approach to teach mathematics even in high school, feeling that a different approach would take away the beauty and purity of mathematics. I beg to differ, strongly, with the proponents of such an approach! [I know, I’m not the only one.] While I fully share their point of view in reference to teaching students that are especially gifted in pure mathematics (for whom appropriate advanced–placement courses should always be available), I consider such an approach to be the main cause for the disaffection (or plain aversion) that way too many people have towards mathematics, with the disastrous consequence of a negative attitude towards science and technology. Indeed, once students are turned off, they are going to lose interest in the subject matter (“to hate it,” as they would put it), and memorize just what is needed to pass the course. On the contrary, in my half–century experience as a math teacher for non–mathematicians, I found that the other side of the coin (namely a simple intuitive approach, with plenty of enjoyable examples and useful applications) is more advantageous in introducing mathematics. Indeed, the intuitive approach is much more attractive than a stern, rigorous one. This is the reason why I stay away from the axiomatic approach. I will refer to such an approach as “gut–level mathematics.”1
1 I avoid using the term “intuitive mathematics,” in place of “gut–level mathematics”, because I don’t want to create confusion with the term intuitionism. Intuitionism is a school of thought in mathematics spearheaded by the Dutch mathematician Luitzen E. J. Brouwer (1881–1966; the Brouwer fixed point theorem is named after him), in direct opposition to the school of thought known as formalism, headed by the German mathematician David Hilbert. Intuitionism does not accept the “principle of excluded middle,” also known as “tertium non datur,” Latin for “no third (possibility) is given.” If you want to know more on the subject, see Chapter 9 (Hilbert versus Brouwer; Formalism versus Intuitionism) of the book by Hal Hellman, Ref. [30], pp. 179–199. Intuitionism became part of a larger movement called Constructivism. [If you want to know more on the subject, see Chapter 10, “Absolutists/Platonists versus Fallibilists/Constructivists. Are Mathematical Advances Discoveries or Inventions?” again of the book by Hal Hellman (Ref. [30], pp. 200–214).]
• Creativity and rigor But . . . ! There is more to the story. In general, having a gut–level grasp of mathematics helps in absorbing it, and hence in retaining it. Even more important, the gut–level approach helps with creativity in mathematics. Specifically, one may have an intuition (like a light bulb turning itself on in your brain, say, when you are falling asleep), such as: “There’s got to be a theorem that states such and such.” In my experience, this is facilitated by the gut–level approach used here. In adopting such an approach I was influenced by The Farther Reaches of Human Nature by Abraham H. Maslow (Ref. [46], pp. 81–95). In his account, creativity (not just in the mathematical and physical sciences, but in any field, including literature, visual arts and music) may be conceived as composed of two separate phases. First, you come up with an idea (primary phase of creativity), and then you implement/elaborate upon it (secondary phase of creativity), in particular you make sure that your idea is correct. Intuition is essential in the primary phase, rigor in the secondary one. Thus, mathematicians may obtain a hunch that there is a theorem such and such, and that the proof may be obtained following a certain procedure. This might emerge in their minds in the middle of the night, when they are half awake. The next morning (or the next few days, sometimes in the next few years), they make sure that the intuition is correct and develop a whole new formulation around it. For this, they need rigor. Accordingly, rigor is also important. The next morning, after you had your brilliant idea in the middle of the night, you roll up your sleeves, use all the rigor you can muster, and make sure that your intuition is correct (and sometimes it is not). For this reason, while emphasizing the intuitive approach, I will also be rigorous, with the exception of the above–mentioned cases.
• What do I assume you know? Where do I start from? In view of the stated objective of starting at a very elementary level, in this book no prior knowledge regarding mechanics is required. On the other hand, I will assume as a starting point that you are familiar with the basic mathematical concepts taught in high school. However, to make this book self–contained, in the beginning I review (and prove or illustrate) all the elementary material that you need to read this book, material that you had in high school, and some even earlier. In other words, I cover the material from the ground up. In summary, all you need is a spirit of scientific adventure, combined with patience, determination and perseverance. If you enjoy math and mechanics with their applications, and always wanted to know more about them, the journey will be rewarding — from pre–calculus to advanced applicable mathematics, and from no physics whatsoever to continuum mechanics of fluids and solids. But even if you have been turned off in the past, you might find out that the book’s subject matter is not as difficult as you might fear, and that the material in this book might even be fun and funny. As I used to say to my classes, I do not promise you Nirvana — I only promise you a rose garden. You might encounter a few thorns along the way, but in the end you’ll find out that it was worth the effort.
Acknowledgements
For the approach used in this book, I am strongly indebted to the professors I had in my freshman and sophomore years in college, Prof. Gaetano Fichera for Mathematical Analysis I and II, Prof. Edoardo Amaldi for Physics I and II, and Prof. Antonio Signorini for Analytical Mechanics, as well as Prof. Antonio Eula for graduate courses in Aerodynamics and my doctoral thesis advisor Prof. Paolo Santini. I am especially grateful to my old friends Prof. Tom Lardner of the University of Massachusetts, who read the whole first draft of the book and gave me valuable suggestions which are incorporated in the final draft, and Prof. Larry Bucciarelli of MIT, who not only inspired me with his course Concourse, but also provided me with valuable feedback on the final draft. I also wish to acknowledge the contributions by my friends Profs. Dan Keppler and Jerry Milgram also of MIT, Profs. Marvin Freedman, Guido Sandri, Ray Nagem and Sheryl Grace of Boston University, Prof. Doug Sondak of Wentworth Institute of Technology, Profs. Aldo Frediani and Franco Giannessi of the University of Pisa and Prof. Luciano Iess of the University of Rome La Sapienza. I am also indebted to Dr. Kadin Tseng from Boston University, Profs. Massimo Gennaretti, Umberto Iemma and Giovanni Bernardini of the University Roma Tre, and Prof. Franco Mastroddi of the University of Roma La Sapienza, my traveling companions in our joint quest for a deeper understanding of the interplay between mathematics and mechanics. Special thanks go to my friends Dr. Cecilia Leotardi of the Italian National Research Council, Laura Bates of the University of St. Andrews and Alessandra Parlato, who were instrumental in the early phase of the writing, along with Dr. Fabio Cetta of ENI and Dr. Emil Suciu, for their careful editing.
I also wish to thank Dr. Thomas Ditzinger and Holder Schäpe of Springer Verlag for their patient support and their valuable suggestions, which enhanced considerably the readability of the book. I should add that all of the students I taught over fifty odd years, through their questions and requests for clarification, contributed to my understanding of the difficulties encountered in learning the interplay between mathematics and mechanics, and hence to the style of the book. I have learned from them as much as they learned from me. Thanks to all of you. Especially appreciated are the contributions of my daughters, Francesca and Federica, and my nephews, Michael and Alex Roche, who read the pre–calculus portions of the manuscript and provided me with valuable feedback regarding the clarity (or lack thereof) of the introductory material from the point of view of somebody who, at that time, had only completed their bachelor’s degrees. Last, but not least, I wish to thank my wife Nancy, not just for her patience and understanding, especially while I was writing the book, but also for carefully reading the whole volume and patiently improving my English writing skills.
User Guide
The objectives stated in the Preface placed a lot of constraints on the structure of the book. This forced me to adopt some unorthodox writing techniques, which I want you to be aware of before reading this book.
• Structure of the book The book stems from a collection of lecture notes for several courses I taught in mathematics and mechanics over the course of several years, at Boston University and the University Rome III, specifically, a sophomore course in analytical mechanics, junior courses in mathematical methods in engineering (complex series, linear algebra, systems of differential equations, multivariate calculus) and graduate courses in advanced analysis and theoretical mechanics. The material on structural dynamics and aerodynamics stems from my aeronautical engineering course on aeroelasticity. In merging these sets of notes, I made sure to provide the proof for every relevant statement in the book. In addition, I made sure that any statement was introduced before it was used, so as to avoid circular proofs. Ditto for definitions. [Minor exceptions occur. In these cases, you are informed and motivation is provided.] As stated in the Preface, the book consists of three volumes. The three volumes are independent of each other, although each one begins where the preceding one finishes, so as to avoid overlaps. They primarily cover, respectively, (I) basic calculus and basic mechanics, (II) ordinary differential equations and n-particle mechanics, (III) partial differential equations and continuum mechanics.
Let us focus on the present one, namely Volume I. This is, in turn, divided into three parts. The first covers pre–calculus, specifically arithmetic, linear algebraic systems, elementary matrix algebra, one–dimensional statics, Euclidean geometry, functions, and analytical geometry. The second covers limits, derivatives, integrals, one–dimensional dynamics, logarithms and exponentials, Taylor polynomials, and the effects of damping on dynamics. The third covers vectors in physics, more on matrices, three–dimensional statics, multivariate calculus, three–dimensional single–particle and n-particle dynamics (including the Kepler laws and the dynamics of the Sun–Earth–Moon system), apparent forces in non–inertial frames of reference (including an analysis of tides), and rigid–body dynamics. [The level of this material is compatible with the freshman and sophomore years in mathematics, physics or engineering.]
• The symbols ♦ ♥ ♣ ♠ and more
As mentioned in the Preface, to make your life simpler, I tried to avoid introducing complicated material if unnecessary. For this reason, mathematics related to linear algebraic systems, matrices and linear algebra (often covered in a single chapter in standard textbooks) is spread over five separate chapters, three of which are in this volume. This allows me to increase gradually the level of complexity of the material covered, and at the same time, show how mathematics and mechanics are interrelated. However, for organizational purposes, sometimes I found it convenient to group simple material with more complicated material, so as to avoid having a very short chapter. In these cases, I introduced a simple way to indicate when the material is introduced before it is needed. At other times, for the sake of completeness, I introduced material at a level that some of you might find much too elementary. To help you in distinguishing the level being covered, I used the symbols ♦ ♥ ♣ ♠ to indicate material that might be too simple for you, or too difficult (to be postponed, or to be skipped altogether, or to be read only for your enjoyment). Specifically,
♦ : The material is very elementary, and I included it because I always hated so–called introductory books that assume that I know much more than what I actually do. It is for those of you that have been out of high school for a while and feel rusty on the subject matter. Indeed, ♦ is used primarily in Chapter 1. I recommend that you glance at it and skip it if you realize that it is something quite familiar to you from high school. [In the portions marked with ♦ , I often use illustrative examples instead of proofs or axioms.]
♥ : Extra material that I believe you might enjoy. The reason to include it is to give some flavor to the presentation, to spice it up, so to speak. It is like adding a yummy piquant sauce to an otherwise bland pasta dish. I refer to these items as “divertissements.” However, some of this material might be a bit complicated. My recommendation is that you work on any material of this kind only if you: (i) enjoy it, and (ii) do not find it too difficult to follow, as the material is virtually inessential to the understanding of the rest of the book.
♣ : I placed more difficult material at that point of the book simply to group it with material of the same nature, rather than inserting it when it is needed, thereby spreading it into bits and pieces. However, I deem it too difficult for you at that point in your journey. Nonetheless, it is used later in the book, and sooner or later you might want to read it. For the moment, it can wait. I recommend that you postpone it and read only when it is referenced.
♠ : The material is typically quite difficult and not required for the understanding of the rest of the book. It is included to address some issues at a deeper level, even though I deem the material not essential, in that it is not gut–level enough to provide a better grasp of the material. [Sometimes it is used just in connection with a difficult and formal proof of a theorem.] I didn’t make much of an effort to reduce its complexity. I recommend that you skip this material altogether. You might want to go back to it, whenever you feel ready for it and ... can no longer keep your curiosity under control!
◦ Warning. Subsubsections are marked with the symbols ♦ ♥ ♣ ♠ only when they differ from the corresponding subsection.
We have two more symbols, namely:
: This material is of historical interest. It is used in particular for footnotes giving short biographies of scientists mentioned in the text. [Indeed, I included a bio–sketch of each scientist encountered — sometimes, just the nationality and the dates of birth and death.] If you are not interested in the historical interplay between mathematics and mechanics, just skip these portions.
: This material deals with the meaning or the root of a term. It helped my students to retain the meaning of the term and I hope that this will help you as well. Again, if you are not interested, just skip it.
These are used primarily for footnotes. Indeed, the footnotes are only of the two above kinds. If you are not interested in these matters, you may skip all the footnotes.
◦ Warning. In the index, the footnotes dealing with biographies are listed by the last name of the scientist, followed by their first name.
A few more considerations are warranted. “Comment” is used at the beginning of a parenthetical sentence. Comments provide you with a deeper analysis of the material under consideration. If I need to reference such material, I use Remark instead. If I want to bring your attention to a comment that I deem particularly important, I will use Warning. If the parenthetical sentence is short, I use brackets, namely [. . . ]. If this is used within a Definition, it provides you with considerations, which — strictly speaking — are not part of the definition.
The case in which a sentence in brackets begins with “Hint: ” warrants a few extra words. Sometimes, I let you work out the details of a proof. If you are unable to accomplish that, you may use the help provided, which is indicated by “[Hint: . . . ].” Specifically, if you want to improve your mathematical skills, I recommend that you try to solve the corresponding problem and use the hints only if you cannot find the solution. In other words, you may conceive of the hints as “Solutions to exercises.” [Indeed, you will notice that no exercises are proposed in this book. These are inserted in the text. Whenever you encounter either “as you may verify,” or “Why?” that is an exercise that you may wish to address, so as to sharpen your grasp of the material. No exercise requires more than what has been covered before.]
◦ Warning. As stated in the Preface, in this book I presume that you need a refresher from high (or even elementary) school. However, I want you to be aware that you may start reading this book wherever you feel comfortable. [In any case, please read Section 1.1, on the existence and uniqueness of the solution of linear algebraic equations.]
Indeed, skipping some portions of the book should not cause you any problem, because you may easily identify the material you need, through the references I make to such material. Specifically, in the spirit of making your life easy, whenever appropriate I have been very meticulous (somebody might say excessively so) in referencing the corresponding material that was covered in earlier pages of the book, by providing you with the number of the appropriate equation, section, theorem, and so on. The equations are not only referenced: whenever feasible I repeated them (if an equation is short enough, I repeated it within the text). Sometimes, I referred to an equation by its name. Similarly, if you encounter a term that has been already introduced and you don’t remember its meaning, the Index, which is very detailed, can help you, by giving you the number of the page where the term was defined, either through a formal definition or within the text. I strongly encourage you to follow these guidelines. Otherwise, do not complain that the book is too difficult to read!
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii User Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv Part I The beginning – Pre–calculus 1
Back to high school — algebra revisited . . . . . . . . . . . . . . . . . . 1.1 Wanna learn math? Let’s talk bign`e and bab` a .............. 1.1.1 One equation with one unknown . . . . . . . . . . . . . . . . . . . 1.1.2 Two equations with two unknowns . . . . . . . . . . . . . . . . . 1.2 Where do we go from here? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Guidelines for reading this book . . . . . . . . . . . . . . . . . . . . 1.2.2 Axioms, theorems, and all that stuff ♦ . . . . . . . . . . . . . . 1.3 Natural numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Pierino. Arithmetic with natural numbers ♦ . . . . . . . . . 1.3.2 Division of natural numbers ♦ . . . . . . . . . . . . . . . . . . . . . 1.3.3 Fundamental theorem of arithmetic ♥ . . . . . . . . . . . . . . 1.3.4 Even and odd numbers ♦ . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.5 Greatest common divisor, least common multiple ♥ . . 1.4 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.1 Giorgetto and arithmetic with integers ♦ . . . . . . . . . . . . 1.5 Rational numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.1 Arithmetic with rational numbers ♦ . . . . . . . . . . . . . . . . 1.5.2 The rationals are countable ♥ . . . . . . . . . . . . . . . . . . . . .
3 5 5 7 9 9 11 14 16 19 21 23 23 24 24 27 27 32 xix
xx
Contents
1.5.3 Decimal representation of rationals ♣ . . . 33
1.6 Real numbers . . . 37
1.6.1 Irrational and real numbers . . . 39
1.6.2 Inequalities . . . 40
1.6.3 Bounded and unbounded intervals . . . 43
1.6.4 Approximating irrationals with rationals ♥ . . . 44
1.6.5 Arithmetic with real numbers ♥ . . . 45
1.6.6 Real numbers are uncountable ♥ . . . 50
1.6.7 Axiomatic formulation ♠ . . . 51
1.7 Imaginary and complex numbers . . . 52
1.7.1 Imaginary numbers ♣ . . . 52
1.7.2 Complex numbers ♣ . . . 53
1.8 Appendix. The Greek alphabet . . . 55
2 Systems of linear algebraic equations . . . 57
2.1 Preliminary remarks . . . 58
2.2 The 2 × 2 linear algebraic systems . . . 59
2.2.1 Method by substitution . . . 60
2.2.2 Gaussian elimination method . . . 63
2.2.3 Summary . . . 67
2.3 The 3 × 3 linear algebraic systems . . . 68
2.3.1 Gaussian elimination for 3 × 3 systems . . . 69
2.3.2 Illustrative examples . . . 71
2.3.3 Another illustrative example; ∞^2 solutions . . . 74
2.3.4 Interchanging equations and/or unknowns . . . 75
2.3.5 Existence and uniqueness theorems . . . 77
2.3.6 Explicit solution for a generic 3 × 3 system ♠ . . . 80
2.4 The n × n linear algebraic systems . . . 82
2.4.1 Gaussian elimination procedure . . . 83
2.4.2 Details of the Gaussian elimination procedure ♠ . . . 84
2.4.3 Existence and uniqueness theorems ♠ . . . 86
3 Matrices and vectors . . . 89
3.1 Matrices . . . 92
3.1.1 Column matrices and row matrices . . . 93
3.1.2 Basic operations on matrices and vectors . . . 95
3.2 Summation symbol. Linear combinations . . . 96
3.2.1 Summation symbol. Kronecker delta . . . 96
3.2.2 Linear combinations . . . 99
3.3 Linear dependence of vectors and matrices . . . 100
3.4 Additional operation on matrices . . . 103
3.4.1 Product between two matrices . . . 103
3.4.2 Products involving column and row matrices . . . 106
3.4.3 Symmetric and antisymmetric matrices . . . 109
3.4.4 The operation A : B ♣ . . . 113
3.5 A x = b revisited. Rank of A . . . 114
3.5.1 Linear independence and Gaussian elimination . . . 115
3.5.2 Rank. Singular matrices . . . 116
3.5.3 Linear dependence/independence of columns . . . 117
3.5.4 Existence and uniqueness theorem for A x = 0 ♣ . . . 118
3.5.5 Existence and uniqueness theorem for A x = b ♣ . . . 119
3.5.6 Positive definite matrices ♣ . . . 123
3.5.7 A x = b, with A rectangular ♣ ♣ ♣ . . . 124
3.6 Linearity and superposition theorems . . . 127
3.6.1 Linearity of matrix–by–vector multiplication . . . 127
3.6.2 Superposition in linear algebraic systems . . . 128
3.7 Appendix A. Partitioned matrices ♣ . . . 131
3.8 Appendix B. A tiny bit on determinants ♣ . . . 134
3.8.1 Determinants of 2 × 2 matrices ♣ . . . 135
3.8.2 Determinants of 3 × 3 matrices ♣ . . . 136
3.9 Appendix C. Tridiagonal matrices ♠ . . . 139
4 Statics of particles in one dimension . . . 141
4.1 Newton equilibrium law for single particles . . . 144
4.1.1 Particles and material points . . . 145
4.1.2 Forces . . . 146
4.2 Equilibrium of systems of particles . . . 151
4.2.1 The Newton third law of action and reaction . . . 151
4.3 Longitudinal equilibrium of spring–particle chains . . . 152
4.4 Spring–particle chains anchored at both ends . . . 154
4.4.1 Chains of three particles . . . 155
4.4.2 Chains of n particles . . . 157
4.5 Particle chains anchored at one endpoint . . . 159
4.6 Unanchored spring–particle chains . . . 161
4.7 Appendix A. Influence matrix ♥ . . . 162
4.8 Appendix B. General spring–particle systems ♠ . . . 166
4.8.1 The matrix K is positive definite/semidefinite ♥ . . . 167
4.9 Appendix C. International System of Units (SI) ♣ . . . 169
5 Basic Euclidean geometry . . . 175
5.1 Preliminaries . . . 176
5.2 Circles, arclengths, angles and π . . . 177
5.3 Secants, tangents and normals . . . 179
5.4 Polygons . . . 180
5.5 Triangles . . . 181
5.5.1 Triangle postulate . . . 182
5.5.2 Similar triangles . . . 184
5.5.3 Congruent triangles . . . 185
5.5.4 Isosceles triangles. Pons Asinorum . . . 187
5.6 Quadrilaterals . . . 188
5.6.1 Rectangles and squares . . . 189
5.6.2 Parallelograms ♣ . . . 190
5.6.3 Rhombuses ♣ . . . 191
5.7 Areas . . . 192
5.8 Pythagorean theorem . . . 194
5.8.1 The Pythagorean theorem in three dimensions . . . 196
5.8.2 Triangle inequality ♣ . . . 197
5.9 Additional results ♥ . . . 199
5.9.1 Three non–coaligned points and a circle ♥ . . . 199
5.9.2 Easy way to find midpoint and normal of a segment ♥ . . . 200
5.9.3 Circles and right triangles ♥ . . . 201
6 Real functions of a real variable . . . 203
6.1 Introducing functions . . . 204
6.1.1 Powers . . . 206
6.1.2 Roots . . . 207
6.1.3 Range and domain . . . 208
6.1.4 The one–hundred meter dash ♥ . . . 208
6.1.5 Even and odd functions ♣ . . . 211
6.2 Linear independence of functions . . . 212
6.2.1 Linear independence of 1, x and x^2 ♥ . . . 214
6.3 Polynomials . . . 215
6.3.1 Zeroth–degree polynomials . . . 216
6.3.2 First–degree polynomials . . . 216
6.3.3 Second–degree polynomials. Quadratic equations . . . 219
6.3.4 Polynomials of degree n ♣ . . . 223
6.3.5 Linear independence of 1, x, ..., x^n ♣ . . . 226
6.4 Trigonometric functions . . . 227
6.4.1 Sine, cosine and tangent . . . 228
6.4.2 Properties of trigonometric functions . . . 231
6.4.3 Bounds for π ♥ . . . 235
6.4.4 An important inequality ♣ . . . 239
6.4.5 Area of a circle ♥ . . . 239
6.5 Exponentiation. The functions x^α and a^x . . . 240
6.5.1 Exponentiation with natural exponent ♦ . . . 240
6.5.2 Exponentiation with integer exponent . . . 243
6.5.3 Exponentiation with rational exponent ♥ . . . 243
6.5.4 Exponentiation with real exponents ♣ . . . 246
6.5.5 The functions x^α and a^x, and then more ♣ . . . 248
6.6 Composite and inverse functions ♣ . . . 248
6.6.1 Graphs of inverse functions ♣ . . . 249
6.7 Single–valued and multi–valued functions ♣ . . . 250
6.8 Appendix. More on functions ♣ . . . 252
6.8.1 Functions, mappings and operators ♣ . . . 252
6.8.2 Functions vs vectors ♣ . . . 253
7 Basic analytic geometry . . . 255
7.1 Cartesian frames . . . 256
7.1.1 Changes of axes . . . 257
7.2 Types of representations . . . 259
7.2.1 Two dimensions . . . 260
7.2.2 Three dimensions . . . 260
7.3 Straight lines on a plane . . . 262
7.3.1 Reconciling two definitions of straight lines ♥ . . . 264
7.3.2 The 2 × 2 linear algebraic system revisited . . . 265
7.4 Planes and straight lines in space . . . 266
7.4.1 Planes in space . . . 266
7.4.2 Straight lines in space ♣ . . . 267
7.5 Circles and disks . . . 268
7.5.1 Polar coordinates . . . 269
7.5.2 Addition formulas for sine and cosine ♣ . . . 271
7.5.3 Prosthaphaeresis formulas ♣ . . . 273
7.5.4 Properties of triangles and circles ♥ . . . 275
7.6 Spheres and balls . . . 277
7.7 Conics: the basics . . . 278
7.7.1 Ellipses . . . 279
7.7.2 Parabolas . . . 280
7.7.3 Hyperbolas . . . 281
7.8 Conics. General and canonical representations ♠ . . . 283
7.9 Conics. Polar representation ♥ . . . 286
7.9.1 Ellipses ♥ . . . 286
7.9.2 Parabolas ♥ . . . 289
7.9.3 Hyperbolas ♥ . . . 291
7.9.4 Summary ♥ . . . 293
7.10 Appendix. “Why conic sections?” ♥ . . . 294
Part II Calculus and dynamics of a particle in one dimension
8 Limits. Continuity of functions . . . 299
8.1 Paradox of Achilles and the tortoise . . . 302
8.2 Sequences and limits . . . 303
8.2.1 Monotonic sequences . . . 308
8.2.2 Limits of important sequences . . . 309
8.2.3 Fibonacci sequence. The golden ratio ♥ . . . 311
8.2.4 The coast of Tuscany ♥ . . . 315
8.2.5 Principle of mathematical induction ♣ . . . 319
8.3 Limits of functions . . . 320
8.3.1 Rules for limits of functions . . . 321
8.3.2 Important limits of functions . . . 323
8.3.3 The O and o (“big O” and “little o”) notation ♣ . . . 326
8.4 Continuous and discontinuous functions . . . 328
8.4.1 Discontinuous functions . . . 330
8.5 Properties of continuous functions ♣ . . . 334
8.5.1 Successive subdivision procedure . . . 334
8.5.2 Boundedness vs continuity . . . 335
8.5.3 Supremum/infimum, maximum/minimum . . . 336
8.5.4 Maxima and minima of continuous functions . . . 338
8.5.5 Intermediate value properties . . . 342
8.5.6 Uniform continuity ♣ . . . 343
8.5.7 Piecewise–continuous functions . . . 345
8.6 Contours and regions . . . 347
8.6.1 Two dimensions . . . 347
8.6.2 Three dimensions ♣ . . . 349
8.6.3 Simply– and multiply–connected regions ♣ . . . 351
8.7 Functions of two variables ♣ . . . 353
8.7.1 Non–interchangeability of limits ♣ . . . 353
8.7.2 Continuity for functions of two variables ♣ . . . 354
8.7.3 Piecewise continuity for two variables ♣ . . . 355
8.7.4 Boundedness for functions of two variables ♣ . . . 356
8.7.5 Uniform continuity in two variables ♣ . . . 356
8.8 Appendix. Bolzano–Weierstrass theorem ♣ . . . 357
8.8.1 Accumulation points ♣ . . . 357
8.8.2 Bolzano–Weierstrass theorem ♣ . . . 358
8.8.3 Cauchy sequence ♣ . . . 359
8.8.4 A rigorous proof of Theorem 73, p. 308 ♣ . . . 360
9 Differential calculus . . . 361
9.1 Derivative of a function . . . 362
9.1.1 Differentiability implies continuity ♥ . . . 366
9.1.2 Differentials. Leibniz notation . . . 366
9.1.3 A historical note on infinitesimals . . . 368
9.1.4 Basic rules for derivatives . . . 369
9.1.5 Higher–order derivatives . . . 372
9.1.6 Notations for time derivatives . . . 374
9.2 Derivative of x^α . . . 374
9.2.1 The case α = 0 . . . 376
9.2.2 The case α = n = 1, 2, ... . . . 377
9.2.3 The case α = −1 . . . 377
9.2.4 The case α = −2, −3, ... . . . 377
9.2.5 The case α = 1/q . . . 378
9.2.6 The case α rational ♣ . . . 378
9.2.7 The case α real ♥ . . . 379
9.3 Additional algebraic derivatives . . . 381
9.4 Derivatives of trigonometric and inverse functions . . . 382
9.5 More topics on differentiable functions . . . 384
9.5.1 Local maxima/minima of differentiable functions . . . 384
9.5.2 Rolle theorem and mean value theorem . . . 386
9.5.3 The l’Hôpital rule ♣ . . . 387
9.5.4 Existence of discontinuous derivative ♥ . . . 391
9.6 Partial derivatives . . . 392
9.6.1 Multiple chain rule, total derivative ♣ . . . 393
9.6.2 Second– and higher–order derivatives ♣ . . . 394
9.6.3 Schwarz theorem on second mixed derivative ♣ . . . 395
9.6.4 Partial derivatives for n variables ♣ . . . 398
9.7 Appendix A. Curvature of the graph of f(x) . . . 399
9.7.1 Convex and concave functions ♣ . . . 399
9.7.2 Osculating circle. Osculating parabola ♣ . . . 400
9.7.3 Curvature vs f(x) ♣ . . . 401
9.8 Appendix B. Linear operators. Linear mappings ♣ . . . 402
9.8.1 Operators ♣ . . . 402
9.8.2 Linear operators ♣ . . . 403
9.8.3 Linear–operator equations ♣ . . . 406
9.8.4 Uniqueness and superposition theorems ♣ . . . 406
9.8.5 Linear mappings ♠ . . . 409
9.9 Appendix C. Finite difference approximations ♣ . . . 409
10 Integral calculus . . . 411
10.1 Integrals . . . 412
10.1.1 Properties of integrals . . . 414
10.1.2 Integrals of piecewise–continuous functions . . . 418
10.2 Primitives . . . 419
10.2.1 Primitives of elementary functions . . . 421
10.3 Use of primitives to evaluate integrals . . . 422
10.3.1 First fundamental theorem of calculus . . . 422
10.3.2 Second fundamental theorem of calculus . . . 424
10.4 Rules for evaluating integrals . . . 427
10.4.1 Integration by substitution . . . 427
10.4.2 Integration by parts . . . 429
10.5 Applications of integrals and primitives . . . 430
10.5.1 Separation of variables ♣ . . . 430
10.5.2 Linear independence of sin x and cos x . . . 433
10.5.3 Length of a line. Arclength . . . 434
10.6 Derivative of an integral . . . 436
10.6.1 The cases a = a(x) and b = b(x) . . . 437
10.6.2 The case f = f(u, x); the Leibniz rule . . . 437
10.6.3 General case: a = a(x), b = b(x), f = f(u, x) . . . 439
10.7 Appendix A. Improper integrals . . . 440
10.7.1 Improper integrals of the first kind ♣ . . . 440
10.7.2 Improper integrals of the second kind ♣ . . . 442
10.7.3 Riemann integrable functions ♣ . . . 443
10.8 Appendix B. Darboux integral ♣ . . . 443
11 Rectilinear single–particle dynamics . . . 447
11.1 Kinematics . . . 448
11.1.1 Rectilinear motion, velocity and acceleration . . . 448
11.1.2 The one–hundred meter dash again ♥ . . . 450
11.1.3 What is a frame of reference? . . . 452
11.2 The Newton second law in one dimension . . . 453
11.2.1 What is an inertial frame of reference? . . . 455
11.3 Momentum . . . 457
11.4 A particle subject to no forces . . . 459
11.4.1 The Newton first law in one dimension ♥ . . . 460
11.5 A particle subject to a constant force . . . 460
11.5.1 A particle subject only to gravity . . . 461
11.6 Undamped harmonic oscillators. Free oscillations . . . 464
11.6.1 Harmonic motions and sinusoidal functions . . . 469
11.7 Undamped harmonic oscillators. Periodic forcing . . . 470
11.7.1 Harmonic forcing with Ω ≠ ω . . . 470
11.7.2 Harmonic forcing with Ω = ω. Resonance . . . 474
11.7.3 Near resonance and beats (Ω ≈ ω) ♥ . . . 476
11.7.4 Periodic, but not harmonic, forcing ♣ . . . 478
11.7.5 An anecdote: “Irrational mechanics” ♥ . . . 479
11.8 Undamped harmonic oscillators. Arbitrary forcing . . . 484
11.9 Mechanical energy . . . 486
11.9.1 Kinetic energy and work . . . 486
11.9.2 Conservative forces and potential energy . . . 489
11.9.3 A deeper look: U = U(x, t) ♥ . . . 491
11.9.4 Conservation of mechanical energy . . . 492
11.9.5 Illustrative examples . . . 493
11.9.6 Minimum–energy theorem for statics . . . 494
11.9.7 Dynamics solution via energy considerations . . . 495
12 Logarithms and exponentials . . . 497
12.1 Natural logarithm . . . 497
12.1.1 Properties of ln x . . . 498
12.2 Exponential . . . 503
12.2.1 Properties of exponential . . . 504
12.3 Hyperbolic functions . . . 507
12.4 Inverse hyperbolic functions ♣ . . . 511
12.4.1 Derivatives of inverse hyperbolic functions ♣ . . . 511
12.4.2 Explicit expressions ♠ . . . 512
12.5 Primitives involving logarithm and exponentials . . . 514
12.6 Appendix. Why “hyperbolic” functions? ♥ . . . 518
12.6.1 What is u? ♥ . . . 519
13 Taylor polynomials . . . 521
13.1 Polynomials of degree one. Tangent line . . . 523
13.2 Polynomials of degree two; osculating parabola . . . 524
13.3 Polynomials of degree n . . . 526
13.4 Taylor polynomials of elementary functions . . . 527
13.4.1 Algebraic functions . . . 527
13.4.2 Binomial theorem . . . 530
13.4.3 Trigonometric functions and their inverses . . . 531
13.4.4 Logarithms, exponentials, and related functions . . . 534
13.5 Lagrange form of Taylor polynomial remainder ♣ . . . 537
13.6 Appendix A. On local maxima and minima again ♣ . . . 540
13.7 Appendix B. Finite differences revisited ♣ . . . 542
13.8 Appendix C. Linearization ♣ . . . 543
14 Damping and aerodynamic drag . . . 545
14.1 Dampers. Damped harmonic oscillators . . . 546
14.1.1 Dampers and dashpots . . . 546
14.1.2 Modeling damped harmonic oscillators . . . 547
14.2 Damped harmonic oscillator. Free oscillations . . . 548
14.2.1 Small–damping solution . . . 551
14.2.2 Large–damping solution . . . 552
14.2.3 Critical–damping solution ♣ . . . 554
14.2.4 Continuity of solution in terms of c ♥ . . . 555
14.3 Damped harmonic oscillator. Harmonic forcing . . . 556
14.4 Damped harmonic oscillator. Arbitrary forcing ♣ . . . 562
14.5 Illustrative examples . . . 563
14.5.1 Model of a car suspension system . . . 563
14.5.2 Time to change the shocks of your car! ♥ . . . 566
14.5.3 Other problems governed by same equations ♥ . . . 568
14.6 Aerodynamic drag ♥ . . . 569
14.6.1 Free fall of a body in the air ♥ . . . 569
14.6.2 A deeper analysis. Upward initial velocity ♥ . . . 573
14.7 Energy theorems in the presence of damping . . . 573
14.8 Appendix A. Stability of ẍ + p̆ ẋ + q̆ x = 0 . . . 574
14.9 Appendix B. Negligible–mass systems . . . 579
14.9.1 The problem with no forcing . . . 580
14.9.2 The problem with harmonic forcing . . . 581
14.9.3 The problem with arbitrary forcing ♣ . . . 582
Part III Multivariate calculus and mechanics in three dimensions
15 Physicists’ vectors . . . 585
15.1 Physicists’ vectors . . . 586
15.1.1 Multiplication by scalar and addition . . . 589
15.1.2 Linear independence of physicists’ vectors . . . 591
15.1.3 Projections . . . 592
15.1.4 Base vectors and components . . . 593
15.2 Dot (or scalar) product . . . 597
15.2.1 Projections vs components in R^3 . . . 600
15.2.2 Parallel– and normal–vector decomposition . . . 601
15.2.3 Applications . . . 602
15.2.4 Dot product in terms of components . . . 603
15.2.5 Lines and planes revisited . . . 604
15.2.6 Tangent and normal to a curve on a plane . . . 606
15.3 Cross (or vector) product . . . 606
15.3.1 Cross product in terms of components . . . 610
15.4 Composite products . . . 612
15.4.1 Scalar triple product . . . 612
15.4.2 Vector triple product . . . 615
15.4.3 Another important vector product . . . 616
15.5 Change of basis ♣ . . . 617
15.6 Gram–Schmidt procedure for physicists’ vectors ♣ . . . 619
15.7 Appendix. A property of parabolas ♥ . . . 620
16 Inverse matrices and bases . . . 623
16.1 Zero, identity and diagonal matrices . . . 625
16.1.1 The zero matrix . . . 625
16.1.2 The identity matrix . . . 626
16.1.3 Diagonal matrices . . . 627
16.2 The inverse matrix . . . 628
16.2.1 Definition and properties . . . 628
16.2.2 The problem A X = B. Inverse matrix again ♠ . . . 632
16.3 Bases and similar matrices . . . 632
16.3.1 Linear vector spaces R^n and C^n and their bases . . . 633
16.3.2 Similar matrices . . . 635
16.3.3 Diagonalizable matrices . . . 636
16.3.4 Eigenvalue problem in linear algebra ♥ . . . 637
16.4 Orthogonality in R^n . . . 638
16.4.1 Projections vs components in R^n . . . 640
16.4.2 Orthogonal matrices in R^n . . . 641
16.4.3 Orthogonal transformations . . . 642
16.4.4 Gram–Schmidt procedure in R^n . . . 643
16.5 Complex matrices ♣ . . . 644
16.5.1 Orthonormality in C^n ♣ . . . 645
16.5.2 Projections vs components in C^n ♣ . . . 646
16.5.3 Hermitian matrices ♣ . . . 647
16.5.4 Positive definite matrices in C^n ♣ . . . 648
Contents
xxxi
16.5.5 Unitary matrices ♠ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16.5.6 Gram–Schmidt procedure in Cn ♠ . . . . . . . . . . . . . . . . . . 16.6 Appendix A. Gauss–Jordan elimination ♣ . . . . . . . . . . . . . . . . . 16.7 Appendix B. The L-U decomposition, and more ♣ . . . . . . . . . . 16.7.1 Triangular matrices. Nilpotent matrices ♣ . . . . . . . . . . . 16.7.2 The L-U decomposition ♠ . . . . . . . . . . . . . . . . . . . . . . . . . 16.7.3 Tridiagonal matrices ♠ . . . . . . . . . . . . . . . . . . . . . . . . . . . .
648 649 649 654 654 659 660

17 Statics in three dimensions
17.1 Applied vectors and forces
17.2 Newton equilibrium law of a particle
17.2.1 Illustrative examples
17.3 Statics of n particles
17.3.1 General formulation
17.3.2 Statics of spring–connected particles
17.4 Chains of spring–connected particles
17.4.1 Exact formulation for spring–particle chains ♥
17.4.2 Equilibrium in the stretched configuration ♥
17.4.3 The loaded configuration (planar case) ♥
17.4.4 Linearized formulation (planar case) ♥
17.5 Moment of a force
17.5.1 Properties of moments
17.6 Particle systems
17.6.1 The Newton third law
17.6.2 Resultant force
17.6.3 Resultant moment
17.6.4 Equipollent systems. Torques and wrenches
17.6.5 Considerations on constrained particle systems
17.7 Rigid systems. Rigid bodies
17.7.1 Levers and balance scales
17.7.2 Rigid system subject to weight. Barycenter
17.8 Appendix A. Friction
17.8.1 Static friction
17.8.2 Dynamic friction
17.9 Appendix B. Illustrative examples of friction
17.9.1 Pushing a tall stool
17.9.2 Ruler on two fingers ♥
17.9.3 Particle on a slope with friction, no drag ♥
17.9.4 Particle on a slope with friction and drag ♥

18 Multivariate differential calculus
18.1 Differential forms and exact differential forms
18.2 Lines in two and three dimensions revisited
18.2.1 Lines in two dimensions
18.2.2 Lines in three dimensions ♣
18.3 Gradient, divergence, and curl
18.3.1 Gradient and directional derivative
18.3.2 Divergence ♣
18.3.3 The Laplacian ♣
18.3.4 Curl ♣
18.3.5 Combinations of grad, div, and curl ♣
18.3.6 The del operator, and more ♣
18.4 Taylor polynomials in several variables
18.4.1 Taylor polynomials in two variables
18.4.2 Taylor polynomials for n > 2 variables ♣
18.5 Maxima and minima for functions of n variables
18.5.1 Constrained problems. Lagrange multipliers ♣

19 Multivariate integral calculus
19.1 Line integrals in two dimensions
19.2 Path–independent line integrals
19.2.1 Simply connected regions
19.2.2 Multiply connected regions ♣
19.3 Double integrals
19.3.1 Double integrals over rectangular regions ♣
19.3.2 Darboux double integrals ♣
19.3.3 Double integrals over arbitrary regions
19.4 Evaluation of double integrals by iterated integrals
19.5 Green theorem
19.5.1 Consequences of the Green theorem
19.5.2 Green theorem vs Gauss and Stokes theorems ♣
19.5.3 Irrotational, lamellar, conservative, potential ♣
19.5.4 Exact differential forms revisited
19.5.5 Integrating factor ♣
19.6 Integrals in space
19.6.1 Line integrals in space
19.6.2 Surface integrals
19.6.3 Triple (or volume) integrals
19.6.4 Three–dimensional Gauss and Stokes theorems
19.7 Time derivative of volume integrals ♣
19.8 Appendix. Integrals in curvilinear coordinates ♣
19.8.1 Polar coordinates. Double integrals ♣
19.8.2 Cylindrical coordinates. Volume integrals ♣
19.8.3 Spherical coordinates. Volume integrals ♣
19.8.4 Integration using general coordinates ♣
19.8.5 Orthogonal curvilinear coordinates ♣
19.8.6 Back to cylindrical and spherical coordinates ♣

20 Single particle dynamics in space
20.1 Velocity, acceleration and the Newton second law
20.2 Illustrative examples
20.2.1 Particle subject to no forces. Newton first law
20.2.2 Particle subject to a constant force
20.2.3 Sled on snowy slope with friction (no drag)
20.2.4 Sled on snowy slope with friction and drag ♥
20.2.5 Newton gravitation and Kepler laws
20.3 Intrinsic velocity and acceleration components
20.3.1 A heavy particle on a frictionless wire
20.3.2 The circular pendulum
20.3.3 The Huygens isochronous pendulum ♥
20.4 The d’Alembert principle of inertial forces
20.4.1 Illustrative example: a car on a turn ♥
20.5 Angular momentum
20.5.1 Spherical pendulum ♥
20.6 Energy
20.6.1 Energy equation
20.6.2 Potential vs conservative force fields
20.6.3 Potential energy for forces of interest
20.6.4 Minimum–energy theorem for a single particle
20.7 Dynamics solutions, via energy equation
20.7.1 A bead moving along a wire, again
20.7.2 Exact circular pendulum equation ♥
20.8 Appendix A. Newton and the Kepler laws ♥
20.8.1 Newton law of gravitation ♥
20.8.2 Planets on circular orbits ♥
20.8.3 The Kepler laws ♥
20.8.4 Gravity vs gravitation ♥
20.9 Appendix B. Cycloids, tautochrones, isochrones ♣
20.9.1 Cycloids ♣
20.9.2 Tautochrones ♣
20.9.3 From cycloids to tautochrones ♠

21 Dynamics of n-particle systems
21.1 Dynamics of n-particle systems
21.2 Dynamics of mass–spring chains
21.2.1 Two–particle chain. Beats. Energy transfer ♥
21.2.2 A more general problem ♥
21.3 What comes next?
21.4 Momentum
21.4.1 Center of mass
21.5 Angular momentum
21.5.1 Useful expressions for angular momentum
21.6 Energy
21.6.1 Kinetic energy. The König theorem
21.6.2 Decomposition of the energy equation
21.6.3 Potential energy for two particles ♣
21.6.4 Potential energy for fields of interest ♣
21.7 Applications
21.7.1 How come cats always land on their feet? ♥
21.7.2 Two masses connected by a spring ♥
21.7.3 The Kepler laws revisited ♥
21.8 Appendix A. Sun–Earth–Moon dynamics ♥ ♥ ♥
21.9 Appendix B. Potential energy for n particles ♠
21.9.1 Conservative external forces ♠
21.9.2 Conservative internal forces ♠
21.9.3 Energy conservation for conservative fields ♠

22 Dynamics in non–inertial frames
22.1 Frames in relative motion with a common point
22.2 Poisson formulas
22.2.1 Transport theorem
22.3 Frames of reference in arbitrary motion
22.3.1 Point of the moving frame
22.3.2 Point in the moving frame
22.4 Apparent inertial forces
22.4.1 Some qualitative examples
22.4.2 Gravity vs gravitation revisited
22.4.3 Mass on a spinning wire ♥
22.4.4 A spinning mass–spring system ♥
22.4.5 Spherical pendulum, again ♥
22.4.6 Foucault pendulum ♥
22.5 Inertial frames revisited ♥
22.6 Appendix A. The tides ♥
22.6.1 A relatively simple model ♥
22.6.2 So! Is it the Sun, or is it the Moon? ♥
22.6.3 Wrapping it up ♥
22.7 Appendix B. Time derivative of rotation matrix ♠

23 Rigid bodies
23.1 Kinematics of rigid bodies
23.1.1 Instantaneous axis of rotation ♣
23.2 Momentum and angular momentum equations
23.2.1 Momentum equation
23.2.2 Angular momentum equation
23.2.3 Euler equations for rigid–body dynamics
23.2.4 Principal axes of inertia ♥
23.2.5 On the stability of frisbees and oval balls ♥
23.3 Energy equation
23.3.1 Comments
23.4 Planar motion
23.4.1 Momentum and angular momentum equations
23.4.2 Energy equation
23.5 A disk rolling down a slope ♥
23.5.1 Solution from P0 to P1
23.5.2 Solution from P1 to P2
23.5.3 Solution after P2
23.6 Pivoted bodies ♣
23.6.1 Illustrative example: my toy gyroscope ♥
23.7 Hinged bodies
23.7.1 Illustrative example: realistic circular pendulum
23.8 Appendix A. Moments of inertia ♥
23.9 Appendix B. Tensors
23.9.1 Matrices vs tensors
23.9.2 Dyadic product. Dyadic tensor
23.9.3 Second–order tensors
23.9.4 Tensor–by–vector product
23.9.5 Tensors components and similar matrices
23.9.6 Transpose of a tensor
23.9.7 Symmetric and antisymmetric tensors
23.9.8 An interesting formula ♠

References
Astronomical data
Index
Part I The beginning – Pre–calculus
Chapter 1
Back to high school — algebra revisited
Here we are, at the beginning of our journey into the wondrous and mysterious worlds of mathematics and mechanics!
• Overview of Part I

Let me start by giving you an overview of Part I. This covers, primarily, all the high school material that comes before calculus, mostly mathematics, namely the so–called pre–calculus material. You have encountered most of this in high school, although I will follow an approach that is probably more rigorous than what you are used to. This part goes to show you how little high school math you really need before tackling math at the college level.

Specifically, we begin with a problem I had to solve when I was quite young (namely buying pastries for my family) and introduce the so–called systems of two linear algebraic equations with two unknowns. Then, we discuss systems of numbers (natural, integer, rational, real and complex numbers) and their arithmetic (Chapter 1). In Chapter 2, we go back to the problem of buying pastries and generalize it to systems of n linear algebraic equations with n unknowns; we also discuss techniques for solving problems of this type. In Chapter 3, we recast the results of Chapter 2 by using a more modern mathematical jargon, namely in terms of matrices and elementary matrix algebra. Then, in Chapter 4, such material is used to study one–dimensional statics, specifically to find out under what conditions one or more particles that are forced to be co–aligned remain in equilibrium. We focus in particular on the equilibrium of chains of spring–connected particles. Next, in Chapter 5, we consider Euclidean geometry, primarily planar, up to the Pythagorean theorem and its applications. Then, in Chapter 6, we introduce the formal definition of function and present some basic functions, namely powers, polynomials, and basic trigonometric functions (sine, cosine and tangent), as well as the operation of exponentiation, namely the power of a real number when the exponent itself is a real number. Finally, in Chapter 7, we introduce the basic elements of the so–called analytic geometry, namely the use of mathematical formulas to describe some geometrical figures, in two and three dimensions, such as straight lines and planes, circles and spheres, as well as ellipses, parabolas and hyperbolas.

© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_1
• Overview of this chapter

As mentioned above, here we begin with a personal anecdote, specifically a problem I had to solve when I was quite young, namely how many bignè and babà I could buy for my family with the money my mother had given me. This problem may be formulated as a so–called system of two linear algebraic equations with two unknowns. The existence and uniqueness of the solution of this problem (or lack thereof) are also discussed.

Next, before we can proceed, we need a detour. After summarizing the guidelines presented in the User Guide for the reading of the book (Section 1.2), we review some basic concepts on numbers and their arithmetic, specifically: natural numbers (Section 1.3), integers (Section 1.4), rational numbers (Section 1.5), as well as irrational and real numbers (Section 1.6), along with a bit on complex numbers (Section 1.7). We also have an appendix, where we present the Greek alphabet (Section 1.8), essential if you want to be able to read other books on mathematics and/or physics.

◦ Warning. This chapter covers material with all levels of difficulty. Indeed, most of the material is marked with the symbols introduced in the User Guide (see also Subsection 1.2.1, on the guidelines for reading this book), namely:

♦ Maybe too easy for you.
♥ Not really needed, but you might like it.
♣ Postpone reading it.
♠ Forget about it.
Indeed, in some places, I went way beyond the call of duty, and, for the sake of completeness, I have included some advanced material, such as the fundamental theorem of arithmetic (Subsection 1.3.3), or the facts that: (i) the decimal representation of a rational number is periodic (Theorem 6, p. 35), (ii) √2 is irrational (Theorem 8, p. 39), (iii) rationals are countable (Subsection 1.5.2) and (iv) real numbers are uncountable (Subsection 1.6.6).
1. Introduction
1.1 Wanna learn math? Let’s talk bignè and babà

As promised in the preface, I’ll start from a very simple level. [As mentioned in the User Guide, you may start reading this book wherever you feel comfortable. Skipping some portion of the book should not cause you any problem, because you may easily identify the material you need through the references I make to such material. However, please read this section, on the existence and uniqueness of the solution of linear algebraic equations, a key ingredient of the overall construction.]

As stated above, I will start with a personal anecdote (with some very simple math), to show you, first of all, how intriguing math may pop up when you least expect it.

Wanna learn math? You will pardon my slang! I said “Wanna learn math” only to send a message: “Math is everyday business.” To convince you of that, . . . let’s talk about bignè and babà, delicious pastries of my youth.

Shortly after World War II, when I was still a young boy, on Sundays we kids were treated to sweets. One day, my mother was too busy cooking and sent me to buy the pastries. She gave me some money, say, the equivalent of 10 dollars, and told me: “Buy some good ones, whatever you like, for a total of 10 dollars.” [I grew up in Italy and we were using liras, but the math would be too cumbersome in liras: too many zeros. In those days, the exchange rate was 625 Italian liras to one US dollar.]

The pastries I liked the most were the bignè. When I got to the pastry shop, I found out that one bignè cost, say, the equivalent of 2 dollars. That was my first time, and I did not want to look like I didn’t know my arithmetic. So, I quickly divided 10 by 2 and told the owner “I would like to have 5 bignè,” gave him the 10 dollar bill and took my 5 bignè.

◦ For the culinary purist. A bignè is a pastry puff filled with either whipped cream, or custard, or zabaione, or chocolate. To be accurate, I should say a bignè ripieno (filled–up beignet, supposedly created in Florence in the XVI century, by a cook of Caterina de’ Medici), to distinguish it from a bignè fritto (fried beignet), which is similar to the American beignet (fried dough), of French origin.
1.1.1 One equation with one unknown

Let us formulate the problem in mathematical terms. Let x indicate the unknown number of bignè, whereas a = 2 denotes the cost of one bignè, expressed in dollars, and b = 10 is the total amount of money available, also expressed in dollars. Then, the above problem may be stated as

a x = b.

[Indeed, the cost of one bignè (namely a = 2) times the number of bignè (namely x) equals the total cost. This must be equal to the total amount of money available (namely b = 10).]

This is an example of an equation that you might remember having seen in high school, or even earlier. The quantities a and b are prescribed (namely known quantities), whereas the quantity x is not known, and indeed is called the unknown of the equation. [At the risk of oversimplifying, we can say that an equation consists of a mathematical expression equated to a constant and containing one or more unknown quantities. We will have plenty of opportunities to amplify upon this.]

The solution to the above equation is of course b divided by a, written as x = b/a. In our case, we obtain x = 10/2 = 5.

◦ Comment. Following tradition, unknowns are typically denoted by the last few letters of the alphabet, such as x, y, z, whereas constants (prescribed quantities) are typically denoted by the first few letters of the alphabet, such as a, b, c.
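
For readers who like to check things on a computer, the one–unknown problem a x = b can be sketched in a few lines of Python. This is only an illustration: the function name solve_one_unknown is made up here, and the use of exact fractions (instead of decimal numbers) is an assumption made for convenience.

```python
from fractions import Fraction


def solve_one_unknown(a, b):
    """Solve a x = b for x (hypothetical helper; assumes a is not zero)."""
    if a == 0:
        raise ValueError("a must be nonzero for a x = b to have a unique solution")
    return Fraction(b, a)


# Cost of one bignè (a = 2 dollars) and money available (b = 10 dollars).
x = solve_one_unknown(2, 10)
print(x)  # the number of bignè, x = b/a
```

With a = 2 and b = 15, the same function would return the fraction 15/2, a reminder that the answer need not be a whole number of pastries.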
• A more complicated problem

One day, I had a more complicated problem to address. We were going to have guests and there were going to be ten people at the table. Thus, my mother wanted me to buy ten pieces of pastry and she gave me the equivalent of 15 dollars. The bignè again cost the equivalent of 2 dollars. In addition, there were babà, small sponge cakes saturated with rum, also very good. The babà cost one dollar apiece. I figured out that one bignè plus one babà together would cost 3 dollars, and that 15/3 = 5. So I asked the owner for 5 bignè and 5 babà, gave him my 15 dollars and took my 10 pieces of pastry, which is exactly what I wanted. That day, I was very proud of my piece of bravura in arithmetic, as I got complimented for it by my parents.

◦ For the culinary purist. Italians consider babà as part of traditional Neapolitan cuisine. However, to be accurate, they are probably of Polish origin. Their introduction is attributed to Stanislaw Leszczynski, the king of Poland, who lived exiled in France in the eighteenth century. Thus, the Polish babà creation probably arrived in Naples through France.
1.1.2 Two equations with two unknowns

I had been lucky that day because, in this case, the number of bignè that I ended up buying was equal to the number of babà. What would have happened if my mother had given me 18 dollars instead of 15? I would not have known, then, how to formulate the problem so as to have one piece of pastry for each guest. I know how to do it now!

Let us call x the number of bignè and y the number of babà. Thus, we can say that I wanted x + y = 10 (total number of pastries), and I would have to pay 2x + y dollars (namely 2 dollars for each bignè and one dollar for each babà), which had to be equal to the total amount of money available, namely 18 dollars. The corresponding equation is given by 2x + y = 18. In mathematical jargon, we have a system of two equations with two unknowns (x and y), namely

x + y = 10,
2x + y = 18.     (1.1)

I claim that the solution to the above problem is

x = 8 and y = 2.     (1.2)

How did I come up with this solution? You’ll find that out in Section 2.2. For the time being, let us just verify that this is the case. [“Verifying” means that, upon replacing the unknowns with the solution obtained, an equation becomes an identity (namely the left side equals the right side).] Indeed, we have: 8 + 2 = 10 and 2 × 8 + 2 = 18.
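
The verification just described can also be carried out mechanically. The short Python sketch below checks the claimed solution of Eq. 1.1 and then, purely as an illustration, tries every whole number of pastries from 0 to 10. The function name satisfies_system and its parameter names are made up here, and the exhaustive search is not the solution method of Section 2.2.

```python
def satisfies_system(x, y, pieces=10, money=18, bigne_cost=2, baba_cost=1):
    """Return True if x bignè and y babà satisfy both equations of Eq. 1.1."""
    return x + y == pieces and bigne_cost * x + baba_cost * y == money


# Direct verification of the claimed solution x = 8, y = 2.
print(satisfies_system(8, 2))

# Exhaustive search over whole numbers of pastries (illustration only).
solutions = [(x, y) for x in range(11) for y in range(11) if satisfies_system(x, y)]
print(solutions)
```

The search finds the single pair (8, 2), in agreement with Eq. 1.2.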
• An even more complicated problem

At times, however, the problems can become much more complicated. Let us assume that both bignè and babà cost two dollars each. Something new appears in this case: if I replace a bignè with a babà, the total cost does not change. Let us see how mathematics informs us of this fact. The problem may be formulated as follows:

x + y = 10,
2x + 2y = c,     (1.3)

where c is the (yet unspecified) amount of dollars that my mother gave me. Let us discuss the problem on the basis of the value that the constant c takes.

Let us assume first that my mother had given me exactly 20 dollars, namely that c = 20. We now have

x + y = 10,
2x + 2y = 20.     (1.4)
In this case, the second equation is equivalent to the first from a mathematical point of view. For, if we divide the second by two we obtain the first. Then, we say that they are formally equal. However, the everyday meaning of the two equations is quite different. The first one says that the total number of pastries is 10, the second that the total cost is 20 dollars. The fact that they are formally equal means that, if the first is satisfied, the other is automatically satisfied as well. Thus, we say that the second equation is redundant (namely superfluous), in the sense that, from a mathematical point of view (and only from a mathematical point of view), the two equations tell us the same thing, although to the man on the street their meaning is totally different. Thus, one of the two equations may be dropped and we are left with one equation with two unknowns. As a consequence, the solution exists, but is not unique, since any combination of x and y that adds up to 10 is a good solution: for instance, 7 bignè and 3 babà, or 4 bignè and 6 babà are both correct solutions to the problem, as you may verify.

Next, let us assume that my mother had given me an amount different from 20 dollars, so that in Eq. 1.3 we have c ≠ 20. Then, the two equations contradict each other. For, the same quantity (namely x + y) must be equal to 10 and to c/2 ≠ 10. Of course, this is impossible, no matter how hard we try! In this case, the solution does not exist at all.

Remark 1. I told you about these personal anecdotes not only to send the message “Math is everyday business,” but also to show you how quite intriguing mathematical problems may arise when you least expect them. Indeed, we have encountered an example of a general problem that we will address in depth later on. Specifically, if we have n equations with n unknowns (where n may take any value you wish), typically the solution exists and is unique (as for Eq. 1.1). On the other hand, under certain unusual conditions, the solution may not exist at all (as for Eq. 1.3 with c ≠ 20), whereas, under even more unusual conditions, the solution might exist, but not be unique (as for Eq. 1.3 with c = 20).
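
The three situations of Remark 1 can be told apart by a small computation. The Python sketch below is only an illustration under stated assumptions: the function name classify_system is made up here, the test quantity det (the determinant of the coefficients) has not been introduced at this point of the text, and at least one of the four coefficients is assumed to be nonzero. Note also that “infinitely many” refers to real values of x and y; with whole pastries, Eq. 1.3 with c = 20 has only eleven solutions (x = 0, 1, ..., 10).

```python
def classify_system(a11, a12, b1, a21, a22, b2):
    """Classify the system  a11 x + a12 y = b1,  a21 x + a22 y = b2.

    Hypothetical helper; assumes at least one of the four coefficients is nonzero.
    """
    det = a11 * a22 - a12 * a21
    if det != 0:
        return "unique solution"
    # det == 0: the left sides are proportional, so compare the right sides too.
    consistent = (a11 * b2 - a21 * b1 == 0) and (a12 * b2 - a22 * b1 == 0)
    return "infinitely many solutions" if consistent else "no solution"


print(classify_system(1, 1, 10, 2, 1, 18))  # Eq. 1.1
print(classify_system(1, 1, 10, 2, 2, 20))  # Eq. 1.3 with c = 20
print(classify_system(1, 1, 10, 2, 2, 18))  # Eq. 1.3 with c = 18, an example of c different from 20
```

The three calls report a unique solution, infinitely many solutions, and no solution, respectively, matching the discussion of Eqs. 1.1 and 1.3.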
1.2 Where do we go from here?

In the preceding section, we dealt only with whole numbers (namely 1, 2, 3, and so on). However, in practice, we have to deal with decimal numbers, be they of finite or infinite length, periodic or not. Accordingly, before continuing with the story on bignè and babà, we have to take a step back and review some material that you learned in high school (or even earlier), and then some that is new to you.
1.2.1 Guidelines for reading this book Before anything else, you ought to be familiar with some guidelines on how to read this book. These guidelines have been provided in the User Guide. However, if you are like me, you probably skipped the User Guide. If so, I summarize them here, just to be on the safe side. [If you have not read the Preface either, this is the first of three volumes. The collection of the three volumes is referred to as the book.] The hardest issue I had to deal with in organizing the subject matter was how to group and sequence the material. Should I place a result just before it is needed, so as to make your life easier, as typical in a textbook? Or, should I group together material on the same subject matter, so as to be useful as a reference book, as typical in a treatise? I chose this second option, again with exceptions. [For instance, systems of linear algebraic equations and matrices are covered in five different chapters, three of which are in this volume (Chapters 2, 3 and 16.] However, one issue arose. When I decided to group together closely related material, I ended up placing together very simple material with very difficult ones. This is especially true in this chapter, where you will find, side by side, grammar school material with much more sophisticated stuff. That said, I chose to have it both ways — I wanted a book that is introductory and easy to read (like a textbook), but I also wanted the material not to be broken up in bits and pieces, so as to make it useful as a treatise, a reference book. Accordingly, as pointed out in the User Guide, I devised a system that allowed me to have my cake and eat it too. Specifically, some of the material is marked with the symbols ♦ , ♥ , ♣ , ♠ , with the following meanings and recommendations: ♦
The material is very elementary, and I included it because I always hated so–called introductory books that assume that I know much more than what I actually do. It is for those of you that have been out of high
Part I. The beginning – Pre–calculus
school for a while and feel rusty on the subject matter. I recommend that you glance at it and skip it if you realize that it is something quite familiar to you from high school. [These are the parts where I often use illustrative examples instead of axioms or proofs.] ♥ Extra material that I believe you might enjoy learning. The reason to include it is to give some flavor to the presentation, to spice it up, so to speak. It is like adding a yummy piquant sauce to a pasta dish. I refer to these items as “divertissements.” However, some of this material might be a bit complicated. My recommendation is that you work on any material of this kind only if you enjoy it and do not find it too difficult to follow, as the material is virtually inessential to the understanding of the rest of the book. ♣ I placed such material at that point of the book simply to group it with material of the same nature (rather than inserting it when it is needed), thereby avoiding spreading it in bits and pieces. However, I deem it too difficult for you at that point in our journey. Nonetheless, it is used later in the book, and sooner or later you might want to read it. For the moment, it can wait. I recommend that you postpone it and read it only when it is referenced. ♠ The material is typically quite difficult and not required for the understanding of the rest of the book. It is included to address some issues at a deeper level. [Sometimes it is used just in connection with the proof of a theorem if I deem the proof not essential.] I didn’t make much of an effort to reduce its complexity. I recommend that you skip this material altogether. You might want to go back to it, when you feel ready for it and can no longer keep your curiosity under control! I used two more symbols, primarily in footnotes, namely:
: This material deals with the meaning or the root of a term. Again, if you are not interested, just skip it.
: This material is of historical interest. It is used in particular in those footnotes that give a short biography of any scientist mentioned in the text. If you are not interested in the historical developments, just skip this material.
Read carefully: In the Index, the biographies are listed by the last name of the scientist, followed by their first name. Also, if you are looking for the meaning of a symbol, all the symbols are listed at the beginning of the Index.
◦ Warning. I strongly encourage you to pay attention to these guidelines. Otherwise, do not complain that the book is too difficult to read. [More details in the User Guide.]
1. Introduction
Remark 2. Let me say this again. As mentioned in the Preface, in this book mathematical concepts are first dealt with at an intuitive level. The expression used throughout the book to indicate this approach is “gut–level mathematics.” [A rigorous approach follows the gut–level one only if I deem it necessary.] Between a long proof that is intuitive, and a short elegant proof that is not easy to grasp at a gut level, I will always choose the first! You can bet on it! [Recall that a proof is provided for every statement that is later used in this volume. However, sometimes, in particular in this chapter, I use illustrative examples instead of axioms or proofs.]
◦ Warning. Here, I presume that you need a refresher from high (or even elementary) school. However, as pointed out in the User Guide, you may start reading this book wherever you feel comfortable, since detailed references to required preceding material are provided throughout the book.
• What do you need to know? You might ask: “What do I need to know to be able to start this journey? What do you expect me to know?” I only expect you to be familiar with some material from high school. However, as stated above, I hate introductory books that assume too much from the reader. Accordingly, in Part I of this volume, I review all the material that you need to read this book, material that you had in high school, and some even earlier. Let’s be clear. This part is not meant to replace the courses that you had then — only to refresh your memory. In other words, I tried to cover the material from the ground up. Thus, in answer to your earlier question (“What do I need to know?”), I can say that I presume that you have completed your high school studies, and that you have no problem with the specific material reviewed in this chapter (of course, only with the basic portions, namely excluding those marked with ♣ , ♥ and ♠ ). In closing, as stated in the Preface, all you need is a spirit of scientific adventure, combined with patience, determination and perseverance.
1.2.2 Axioms, theorems, and all that stuff
♦
Before we tackle arithmetic, you might want to refresh your memory on some basic terminology. To begin with, mathematics may be divided into three major areas (see for instance Refs. [6], [7] and [8]):
Algebra, which you might also remember from high school.1 This was introduced by the Arabs, in medieval times;2
Geometry, which includes Euclidean geometry, a subject introduced by the ancient Greeks, as you might remember from high school;3
Analysis, or mathematical analysis for clarity (also known as infinitesimal calculus), which is based upon the notion of limit (Chapter 8). This was introduced by Isaac Newton and Gottfried Wilhelm Leibniz (apparently independently), in the seventeenth century.4

Arithmetic vs algebra. What is the difference between arithmetic and algebra? What do they have in common? At the risk of oversimplifying, we can say that Arithmetic (from αριθμος, arithmos, ancient Greek for number) is a branch of algebra that deals with numbers and the corresponding operations. Algebra is much more general. Equations and unknowns are part of algebra, not of arithmetic! Even limiting ourselves to elementary algebra, we can say that the formulation regarding bignè and babà is already within the realm of algebra. [Want to know more about what is covered in modern algebra? Take a look at the book A Survey of Modern Algebra, by Birkhoff and Mac Lane (Ref. [12]).]
1
2 Muhammad bin Mūsā al–Khwārizmī (c. 780–850), a Persian mathematician, astronomer and geographer, is widely considered to be the father of algebra, as he wrote an influential book widely considered to be the first book on algebra. The book, an expanded compilation of how to solve some linear and quadratic equations, is entitled Al–Kitab al–mukhtaṣar fī hīsāb al–ğabr wa’l–muqābala (Ref. [3]), Arabic for The Compendious Book on Calculation by Completion and Balancing, and was later known as Kitab al–Jabr wa–l–Muqabala, or simply as al–Jabr (literally, “Completion,” but typically translated as The Book). The term algebra stems from this title (Boyer, Ref. [13], p. 104).
3 The term geometry comes from Ancient Greek: γεωμετρια (geometria, originally land–survey), from γε (earth, land) and μετρον (measure). [The Greek alphabet is addressed in Section 1.8.]
4
“The differential and integral calculus was invented independently by Newton and Leibniz, although the ancient Greeks had developed a closely related approach, the method by exhaustion. It is the instrument for almost all higher mathematics” (Russell, Ref. [57], p. 536). To be a bit more specific, Newton and Leibniz, along with their followers, quarreled about who should be credited for it. Newton discovered it first but didn’t publish it immediately. A good example is his manuscript titled “De analysi per aequationes numero terminorum infinitas, composed in 1669 on the basis of ideas acquired in 1665–1666, but not published until 1711” (Boyer, Ref. [13], p. 395). Similarly, his manuscript “Method of Fluxions and Infinite Series, written in 1671, was not published until 1736” (Hellman, Ref. [30], p. 69). The first publication on the subject is Philosophiae Naturalis Principia Mathematica (also known simply as Principia, Ref. [49]), dated 1687. Leibniz, on the other hand, rediscovered calculus in 1675–76, “in ignorance of Newton’s previous but unpublished work on the same subject. Leibniz’s work was first published in 1684. Newton’s in 1687. The consequent dispute as to priority was unfortunate, and discreditable to all parties” (Russell, Ref. [57], p. 582). More importantly, they used different approaches and this fact had an important effect on the evolution of mathematics in England and on the Continent: “Because Newton’s major work and first publication on the calculus, the Principia, used geometrical methods, the English continued to use mainly geometry for about a hundred years after his death. The Continentals took up Leibniz’s analytical methods and extended and improved them. These proved to be far more effective; so not only did the English mathematicians fall behind, but mathematics was deprived of contributions that some of the ablest minds might have made” (Kline, Ref. [38], Vol. 1, pp. 380–381). [See also Footnote 1, p.
367, on the different notations used by Newton and Leibniz, and their impact on subsequent developments.] If you would like to learn more about the disputes between Newton and Leibniz (and their followers), see Mathematical Thought from Ancient to Modern Times by Morris Kline (Ref. [38], Vol. 1, pp. 356–381), A History of Mathematics by Carl B. Boyer (Ref. [13], pp. 391–414), and Great Feuds in Mathematics: Ten of the Liveliest Disputes Ever by Hal Hellman (Ref. [30], pp. 51–72).

Also, “valid facts” in mathematics are either definitions (for which a proof is not envisioned), or propositions (for which a proof is provided). A proposition is usually referred to as a theorem. However, it is called a lemma if it is introduced just to streamline the proofs of other theorems. [In other words, a lemma is a fancy name for a “second–class” theorem, typically one needed only to prepare you for the proof of the theorem that follows.] On the other hand, a proposition is called a corollary if it is a simple consequence of the preceding theorem. Theorems, lemmas and corollaries are composed of a hypothesis (which is what we assume to be true, or the framework under which we are operating), a statement (which is what we want to prove), and the corresponding proof.

• Axioms, postulates
We have to be careful here! There is a risk. We might end up proving Proposition B in terms of Proposition A, and then Proposition A in terms of Proposition B. This would be faulty logic, a vicious circle, like a dog chasing its own tail. Accordingly, we have to begin from something that is not to be proven. For this, we need axioms (or postulates), namely statements for which a proof is not provided.
◦ Comment. For the ancient Greeks, an axiom was a self–evident truth. However, at some point in time, it was noted that Euclid’s fifth postulate was not at all self–evident. Indeed, it could be replaced by a different axiom, thereby introducing a geometry different from the Euclidean. [More on this in Definition 95, p. 183.] Since then, the term axiom has been understood with its modern meaning: one simply assumes it to hold true (sort of a hypothesis), and looks at its consequences.
◦ Warning. An axiomatic approach is one based upon the use of axioms. [For an example of an axiomatic formulation of numbers, see Subsection 1.6.7.] As stated in the Preface, in line with the gut–level mathematics approach, I’ll stay away from axiomatic formulations. [The motivations for this choice are further addressed in Subsection 1.6.7.]
• Primitive terms Similar considerations hold for definitions. We want to avoid “circular” definitions, namely defining “B ” in terms of “A,” and “A” in terms of “B.” Accordingly, we need “primitive terms,” namely terms for which the definition is not provided, in the sense that it is not deemed necessary, because everybody is assumed to be familiar with what is meant.
1.3 Natural numbers
Do you remember what natural numbers are? What about integers and rationals? What about irrational, real, imaginary and complex numbers? Let us start with these numbers, just to make sure we are operating on the same wavelength. Natural numbers are addressed in this section, with integers, rationals, real and complex numbers covered in the following ones.
◦ Warning. Allow me to point out that the material concerning numbers (from natural to complex) is very important, since it is at the foundation of mathematics, and is highly sophisticated if treated at the depth it deserves. I consider a rigorous treatment of the foundations of mathematics way too difficult for you at this point. Instead, in this chapter I will not be as rigorous as I would like to be; if I were, I am afraid you’d stop reading the book right here, and say: “An easy introduction! What’s he talking about? This is way too difficult for me to read.” Accordingly, in this chapter the basic material (and only the basic material, namely that not marked with ♣ , ♥ , ♠ ) is presented at a high school level (some even at a grammar school level). [If you want to know more about number theory (still relatively elementary), you might like to check out the book by Tattersall (Ref. [63]).
◦ Comment. If you want to have an idea of how complicated the foundation of mathematics can be, borrow from your library the book Principia Mathematica, by Bertrand Russell and Alfred Whitehead (Ref. [58]), an over a century old milestone in the field. A relatively more recent overview of the issues involved is given in Ref. [6] (not easy reading either). For an intermediate level approach, still considerably beyond the scope of this book, see the book by Walter Rudin (Ref. [56]). Excellent historical presentations are available in the books by Morris Kline (Mathematical Thought from Ancient to Modern Times, Ref. [38]), G. Temple (100 Years of Mathematics – A Personal Viewpoint, Ref. [65]), and Carl B. Boyer (A History of Mathematics, Ref. [13]). They address, in particular, how the work of Russell and Whitehead was reinterpreted by Kurt Gödel (Ref. [24]), to reach quite the opposite conclusions, namely the Gödel theorem on formally undecidable propositions.
Also very interesting, on historical matters in mathematics, is the book by Hal Hellman, Great Feuds in Mathematics: Ten of the Liveliest Disputes Ever (Ref. [30]), which shows that mathematics is not as clear–cut a subject as they made me believe: it is sometimes a matter of opinion, contrary to the Italian saying: “La matematica non è un’opinione” (“mathematics is not an opinion”).]

Let us introduce the following
Definition 1 (Natural number). Consider the ordered list of the numbers 0, 1, 2, 3, and so on. These are called the natural numbers. The symbol N denotes the collection of all the natural numbers. A natural number $n_1$ is called greater (smaller) than another natural number $n_2$ if it follows (precedes) $n_2$ in the ordered list of the natural numbers. Zero is the smallest of all the natural numbers.
◦ Warning. The definition of natural numbers is not uniform throughout the mathematical literature. Some authors do not include the number 0 with the natural numbers. Some even allow both possibilities in their definition (namely with and without zero). I chose the above definition simply because of convenience, as it will simplify the definition of the so–called natural powers of a number $a$, namely 1, $a$, $a^2$, and so on (Definition 16, p. 37).
◦ Comment. In school, the natural numbers are often referred to as the “whole numbers.” However, I prefer to avoid the term “whole number.” First of all, it is very rarely used in the mathematical literature (college level or higher). To make things worse, some authors state that whole numbers include 0, ±1, ±2, and so on. [The symbol ± indicates that both possibilities (plus and minus) are included.] In this book, these numbers are called integers (Definition 8, p. 24). Because of these considerations, I have chosen to ban the term “whole number” altogether from now on.5
◦ Comment. You might not know that, in the ancient days, zero was not a number.
If you want to learn more about all the pains the poor “zero” had to go through before being accepted with the other numbers, you might enjoy reading the book Zero. The Biography of a Dangerous Idea, by Charles Seife (Ref. [60], in particular pp. 71–81). According to Seife, in the wake of the invasion by the Arabs, who had adopted the zero from the Indians, accepting the introduction of zero as a number was resisted for centuries, especially by the Catholic Church, because accepting the zero meant accepting the void. In turn, accepting the void meant contradicting Aristotle (who, contrary to the atomists, didn’t believe that the void existed), thereby invalidating some of 5
You might be interested in the corresponding use in some other languages. In German the term whole translates as “ganz,” and “ganze Zahl ” (whole number) is the only term used for integer. The same holds true in Italian: the term whole translates as “intero,” and “numero intero” (whole number) is again the only term used for integer.
the Church’s beliefs, including some proofs of the existence of God. According to Seife (Ref. [60], p. 78), “The man who reintroduced zero to the West was Fibonacci.”6 Still according to Seife (Ref. [60], pp. 80–81), the acceptance was not immediate. In 1299, the city of Florence was still banning the use of zero. Only its usefulness to the accounting systems of Italian merchants forced the entrance of zero into western civilization in the fifteenth century. ◦ Comment. Incidentally, the Mayas included the zero in their numbering system. Thus, each of their eighteen 20–day months started with day 0 and ended with day 19 (Seife, Ref. [60], pp. 16–18). With their counting system, the last day of 1999, would have been truly the end of the last millennium, namely the end of a period composed of 2000 years, starting from day one, . . . pardon! . . . starting from day zero.
1.3.1 Pierino. Arithmetic with natural numbers
♦
To address the rules of arithmetic, let me tell you another story. When I was a young student in college, in order to make some pocket money, I used to give private lessons. One time, I had a student, Pierino, who was particularly smart and inquisitive, but definitely ill–prepared to take his arithmetic test — before I started working with him, of course. He was not happy just with proofs: he wanted to have a gut–level understanding of arithmetic, which would help him in “seeing” the various rules, and hence remembering them. He was also quite persistent and wouldn’t give up, unless I presented him with some sort of example that would help him with that. Here are the rules of arithmetic, pretty much the way that I finally came up with, in order to make him satisfied. [Incidentally, “gut–level mathematics,” is an expression I picked up from Pierino. He used the Italian expression “matematica viscerale.” The English literal translation, visceral mathematics, doesn’t really cut it. It does not quite have the same connotation.] My work with Pierino was limited to natural numbers. Starting from 0, each number is equal to the preceding one increased by 1. As I presented it to him, one may conceive of a specific number m (for instance, m = 23) as representing the number of beads in a jar, starting from zero (no beads) and adding one bead at a time. 6
Leonardo da Pisa (born Leonardo Bonacci, c. 1170–1250), better known as Fibonacci (from filius Bonacci, Latin for son of Bonacci), was a mathematician from Pisa (now in Italy), considered one of the great mathematicians of the Middle Ages. He wrote the highly influential Liber Abaci (Ref. [23]), which is among the first Western books based on Hindu–Arabic numbers. The book includes the Hindu–Arabic place–valued decimal system (zero included of course), and the Fibonacci sequence, discussed later (Eq. 8.25).
To make Pierino’s life simpler, I used very simple definitions for the operations of addition and multiplication. For instance, regarding addition, one may conceive of the natural number m + n as follows. Consider a jar that is initially empty; first put in m beads, and then add n more beads. The number m + n is equal to the total number of beads in the jar. Regarding multiplication, one may conceive of the number m × n as representing the total number of beads that end up in an initially empty jar, after you put in the jar first m beads, then m more beads, and then m more beads, for a total of n times. [The terms in a sum are called the addends.7 The terms in a product are called the factors.] These definitions made it easy for him to grasp that by adding 0 to a number you obtain that same number. Similarly, if you multiply a number by 1, you obtain the same number. Using mathematical expressions, these rules are written as
$$ m + 0 = m \quad \text{and} \quad m \times 1 = m. \tag{1.5} $$
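If you like to see things run, the jar–of–beads definitions above can be turned into a few lines of Python. This is only a sketch of mine: the function names add and multiply are not part of the text, and the jar is simply a counter that grows by one bead at a time.

```python
def add(m: int, n: int) -> int:
    """m + n: put m beads in an empty jar, then n more, one bead at a time."""
    jar = 0
    for _ in range(m):
        jar += 1
    for _ in range(n):
        jar += 1
    return jar


def multiply(m: int, n: int) -> int:
    """m × n: put m beads in an initially empty jar, for a total of n times."""
    jar = 0
    for _ in range(n):
        jar = add(jar, m)
    return jar


# Eq. 1.5: adding 0, or multiplying by 1, gives back the same number.
for m in range(50):
    assert add(m, 0) == m
    assert multiply(m, 1) == m

print(add(23, 4), multiply(23, 4))  # 27 92
```

The check covers only the first fifty natural numbers, so it is an illustration in the gut–level spirit, not a proof.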
Remark 3. As you might have learned in school, by definition the equality symbol “=” has the following three properties: (i) reflexive, namely a = a, (ii) symmetric, that is, if a = b, then b = a, and (iii) transitive, that is, if a = b and b = c, then a = c.

Next, I had to introduce the properties of natural numbers, again by using a gut–level approach, namely using examples rather than axioms or proofs. To begin with, we have that the order of the terms in a sum is irrelevant, as is the order of the terms in a product. Expressing this in mathematical terms, we have
$$ m + n = n + m \quad \text{and} \quad m \times n = n \times m. \tag{1.6} $$
[Incidentally, these are called the commutative properties of addition and multiplication, respectively. Don’t start panicking! I’ll not use this terminology: I always hated it, as I could never remember which is which. Instead, I will refer to the equation number.] Regarding the first equality in Eq. 1.6, Pierino was only partially satisfied when I pointed out that the number of beads in the jar does not depend on the order in which one puts the beads in the jar. That didn’t quite cut it! He was finally satisfied when I suggested that he could think of m + n as follows: place m beads, aligned along a straight line, followed by n beads, aligned along the same line. Then, by rotating the picture by 180°, one obtains the representation for n + m and of course the same number of beads. For the second equality in Eq. 1.6, he was happy when I pointed out to him that one may represent m × n as n rows of m beads each. By rotating the picture by 90°, one obtains the representation for n × m, namely m rows of n beads each. Of course, the number of beads is not changed by the rotation.

7 From addendum, Latin for “that which must be added” (gerundive of addere, Latin for “to add”).

Next, I had to convince him that, if one has a sequence of additions, one can arbitrarily choose which one to perform first. The same holds for a sequence of multiplications. For instance, we have
$$ (m + n) + p = m + (n + p) \quad \text{and} \quad (m \times n) \times p = m \times (n \times p). \tag{1.7} $$
[Incidentally, these are called associative properties of addition and multiplication, respectively.] The first expression in Eq. 1.7 says that adding p to m + n is equal to adding m to n + p. This is an indication that the number of beads in the jar does not depend upon how you group them. Alternatively, he could think of m beads aligned on a straight line, followed by n beads and then by p beads. By rotating the image by 180°, we have the representation of (p + n) + m, which is equivalent to m + (n + p), because of the first in Eq. 1.6. The total does not depend upon the order in which we put them in the jar! [The use of parentheses here is self–evident. More complicated cases are addressed in Subsubsection “Please excuse my dear aunt Sally (PEMDAS),” p. 30.]

The second in Eq. 1.7 says that multiplying m × n by p is equal to multiplying n × p by m. I told Pierino to think of (m × n) × p as p layers of n rows of m beads each, namely a parallelepipedal configuration, with m beads in one direction, n beads in another direction, and p beads along the vertical direction. By a suitable rotation of this configuration, you may have m beads along the vertical direction, which is a representation of the product (n × p) × m, which equals m × (n × p), because of the second equation in Eq. 1.6. Of course, we have the same number of beads in this case as well, thereby showing the validity of the second in Eq. 1.7.

Then, we had to come up with a way to visualize
$$ (m + n) \times p = m \times p + n \times p. \tag{1.8} $$
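The bead pictures behind the second equality in Eq. 1.6 and behind Eq. 1.8 can also be mimicked on a computer. In the Python sketch below, a picture is a list of rows and each bead is the character "o"; the helper names grid, count and rotate_90 are my own choices for illustration, and the values of m, n and p are arbitrary.

```python
def grid(m: int, n: int) -> list[list[str]]:
    """n rows of m beads each: the picture used for m × n."""
    return [["o"] * m for _ in range(n)]


def count(rows: list[list[str]]) -> int:
    """Count the beads, one row at a time."""
    return sum(len(row) for row in rows)


def rotate_90(rows: list[list[str]]) -> list[list[str]]:
    """Rotate the picture by 90 degrees: the rows become columns."""
    return [list(column) for column in zip(*rows[::-1])]


m, n, p = 4, 3, 5

# Second equality in Eq. 1.6: n rows of m beads become m rows of n beads.
turned = rotate_90(grid(m, n))
assert len(turned) == m and all(len(row) == n for row in turned)
assert count(turned) == count(grid(m, n))

# Eq. 1.8: p rows of m + n beads, split into p rows of m and p rows of n.
wide = grid(m + n, p)
left = [row[:m] for row in wide]
right = [row[m:] for row in wide]
assert count(wide) == count(left) + count(right)
assert count(left) == count(grid(m, p))
assert count(right) == count(grid(n, p))
```

Nothing is ever added or removed by rotating or splitting the picture, which is exactly why the counts agree.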
[Incidentally, this is called the distributive property.] Equation 1.8 says that multiplying m + n by p is equal to adding m × p to n × p. I told Pierino to consider p rows of m + n beads split into two groups, the first composed of p rows of m beads and the second composed of p rows of n beads. The number of beads is the same for the two cases.

Finally, we have one more rule, with which he never had any problem. Specifically, it was clear to him that a number multiplied by zero gives zero
$$ m \times 0 = 0 \tag{1.9} $$
and that
$$ \text{if } m \times n = 0, \text{ then, necessarily, } m = 0, \text{ or } n = 0, \text{ or both.} \tag{1.10} $$
This last rule implies that
$$ \text{if } m \times n = 0 \text{ and } m \neq 0, \text{ then, necessarily, } n = 0. \tag{1.11} $$

1.3.2 Division of natural numbers
♦
In grammar school, you learned how to divide a natural number, say n, by another one, say m ≠ 0, so as to obtain
$$ n = m \times q + r, \quad \text{where } r \text{ is smaller than } m, \tag{1.12} $$
and n, m ≠ 0, q and r are natural numbers. The number n is called the dividend (from “dividendum,” Latin for “that which is to be divided”), m is called the divisor (from “divider,” Latin for “that which divides”), q is called the quotient (from “quotiens,” Latin for “how many times”) and r is called the remainder. The corresponding procedure is often referred to as the long–division algorithm for natural numbers. [For instance, if you divide 201 (dividend) by 10 (divisor), you obtain 20 (quotient), with a remainder equal to 1. Indeed, 201 = 10 × 20 + 1.]
◦ Comment. Let us be a bit more sophisticated. Let us start from zero and keep adding the divisor m ≠ 0. Say we found a number $\breve{q} + 1$ such that $m \times (\breve{q} + 1)$ exceeds n, whereas $m \times \breve{q}$ doesn’t. Then, the quotient q coincides with $\breve{q}$, whereas the remainder r is given by $r = n - (m \times q) \in [0, m)$. In other words, given n and m ≠ 0, we can always obtain (for instance by using the long division algorithm) q as well as r. This result is unique. Accordingly, we have the following
Definition 2 (Divisible, or multiple). A natural number n is called divisible by (or multiple of) a nonzero natural number m if, and only if, n and m are such that the operation in Eq. 1.12 yields a remainder r equal to zero.
Remark 4 (If, and only if). If this is the first time you have encountered the expression “if, and only if,” you might wonder why I have added the portion “and only if.” So, let us address some basic elements from the field of mathematical logic. The statement “a certain fact (say Proposition A) is true if another fact (say Proposition B) is true” means the following: “the fact that B is true implies that A is also true” (in other words, “the fact that A is false implies that B is also false”). On the other hand, the statement “A is true only if B is true” means the following: “the fact that A is true implies that B is also true” (in other words, “the fact that B is false implies that A is also false”). Thus, the statement “A is true if, and only if, B is true” is equivalent to the following two statements: (i) “the fact that B is true implies that A is also true,” and also (ii) “the fact that A is true implies that B is also true.”8 [Looking at it from a different angle, namely using a different terminology, the statement “A is true if B is true” is equivalent to: “the fact that B is true is a sufficient (but not necessary) condition for A to be true,” whereas the statement “A is true only if B is true” is equivalent to: “the fact that B is true is a necessary (but not sufficient) condition for A to be true.” Finally, the statement “A is true if, and only if, B is true” is equivalent to: “the fact that B is true is a necessary and sufficient condition for A to be true.”]
◦ Warning. It is a standard practice among mathematicians to write “iff ” as a short–hand notation to mean “if, and only if.” From now on, I will often use this convention, especially in definitions.

We have the following
Definition 3 (Division for natural numbers). Iff n is divisible by m, we may introduce the operation of division for integers as the inverse of the operation of multiplication. Specifically, we write
$$ n/m = q \quad \text{or} \quad \frac{n}{m} = q, \tag{1.13} $$
iff n = q m.
Remark 5. I will not use the symbol “p : q” to denote division. Specifically, I will use the symbol “:” only in the following instances:
1. “:=” or “=:” to mean “equal, by definition” (Definition 32, p. 64).
2. Matrix operation, A : B (Eq. 3.95).
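Going back to Eq. 1.12 and Definition 2, the “keep adding the divisor” procedure described in the Comment above can be sketched in Python as follows. The names divide and is_divisible are mine, chosen just for this illustration, and the inputs are assumed to be natural numbers with m ≠ 0.

```python
def divide(n: int, m: int) -> tuple[int, int]:
    """Return (q, r) such that n = m × q + r, with 0 <= r < m (Eq. 1.12)."""
    if m == 0:
        raise ValueError("the divisor m must be nonzero")
    q = 0
    # Keep adding the divisor as long as one more copy does not exceed n.
    while m * (q + 1) <= n:
        q += 1
    return q, n - m * q


def is_divisible(n: int, m: int) -> bool:
    """Definition 2: n is a multiple of m iff the remainder is zero."""
    return divide(n, m)[1] == 0


q, r = divide(201, 10)
assert (q, r) == (20, 1) and 201 == 10 * q + r
assert is_divisible(36, 6) and not is_divisible(4, 6)
```

This is slower than the long–division algorithm you learned in grammar school, but it follows the Comment step by step, which is the point here.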
• Prime numbers We have the following Definition 4 (Prime number). A natural number greater than 1 is called prime iff it is divisible only by 1 and itself. 8
Just to make sure: “equivalent” is a two–way street — if A is equivalent to B, then B is automatically equivalent to A.
Accordingly, 1 is not a prime, even though in the past some mathematicians have included it among the primes. [The fact that the definition excludes the number 1 from the primes is necessary for the uniqueness in the fundamental theorem of arithmetic (Theorem 3, p. 23).] The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37. We have the following
Theorem 1 (The number of primes is infinitely large). There exist infinitely many primes.
◦ Proof : ♥ Consider the collection of all the prime numbers we have uncovered, say $p_1, p_2, p_3, \ldots, p_k$. We have to show that there exists at least an additional one. The number
$$ n = (p_1 \times p_2 \times p_3 \times \cdots \times p_k) + 1 \tag{1.14} $$
is not divisible by any of the numbers in the collection. [Indeed, for any $p_h$ (h = 1, . . . , k) the remainder is 1 ≠ 0.] Accordingly, there exist two possibilities: n is either a prime, or is divisible by a prime different from those above. In either case, there exists an additional prime.
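To get a feel for the proof above, here is a small Python experiment of mine, with the known primes taken to be the first six. The helper smallest_prime_factor is a name I made up; it finds the smallest divisor greater than 1 by plain trial division.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest divisor of n greater than 1 (necessarily a prime), for n > 1."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to the square root: n itself is prime


known = [2, 3, 5, 7, 11, 13]
n = prod(known) + 1                       # Eq. 1.14
assert all(n % p == 1 for p in known)     # the remainder is 1 for each known prime
new_prime = smallest_prime_factor(n)
assert new_prime not in known
print(n, new_prime)  # 30031 59
```

Note that here n = 30031 = 59 × 509 is not itself a prime. This is the second of the two possibilities in the proof: n is divisible by a prime that was not in the collection.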
1.3.3 Fundamental theorem of arithmetic
♥
We have the following
Theorem 2 (Fundamental theorem of arithmetic. Existence). Any natural number greater than one may be written as a product of prime numbers, called the “prime factors.” Specifically, for any natural number n greater than one, we have
$$ n = p_1 \times p_2 \times p_3 \times \cdots \times p_k, \tag{1.15} $$
where the prime factors $p_1, \ldots, p_k$ are prime numbers, not necessarily different from each other. The expression on the right side of Eq. 1.15 is called the “factorized form of n.” The multiplicity of a factor is the number of times such a factor appears in the factorized form.
◦ Proof : ♥ The theorem is trivially true if n is prime. [The fact that it may be difficult to determine whether n is prime is of no interest here.] Hence, let us assume that it is not prime. Accordingly, there exists a number, say $n_1 \neq n$ (and $n_1 \neq 1$), such that $n/n_1 = n_2$. [The fact that it may be difficult to identify $n_1$ is also of no interest here.] Thus, we have $n = n_1 \times n_2$. If both $n_1$ and $n_2$ are prime, we have completed the proof. So, let us consider the case in which $n_1$ and/or $n_2$ are not prime. We may repeat the considerations presented above for n, and continue to do so until all the factors are prime.
◦ Comment. The prime factors of the product of two natural numbers include all of those of each of the two numbers, as you may verify. [Hint: Perform the product in factorized form.]
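The factorized form of Eq. 1.15 is easy to compute for small numbers. The Python sketch below uses trial division, which is my choice for simplicity and not the splitting into $n_1 \times n_2$ used in the proof; the function name prime_factors is likewise just an illustrative name.

```python
from collections import Counter


def prime_factors(n: int) -> list[int]:
    """Prime factors of n > 1 in nondecreasing order, repeated by multiplicity."""
    if n < 2:
        raise ValueError("n must be a natural number greater than one")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # d divides n: peel off one factor d
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # what is left has no smaller divisor, so it is prime
        factors.append(n)
    return factors


assert prime_factors(360) == [2, 2, 2, 3, 3, 5]
assert Counter(prime_factors(360))[2] == 3   # multiplicity of the factor 2

# The Comment above: the factors of a product collect those of each number.
assert sorted(prime_factors(36) + prime_factors(10)) == prime_factors(360)
```

The last line is the hint in the Comment made concrete: performing the product 36 × 10 in factorized form simply merges the two lists of prime factors.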
• Euclid lemma
In order to address the uniqueness, we need a preliminary result, namely the Euclid lemma. To introduce the problem, consider the product of two natural numbers, say 36 = 4 × 9. We have that 36 is divisible by m = 6, even though neither 4, nor 9 is divisible by 6. [In fact, we have that 4 is divisible by 2, and 9 is divisible by 3, and that 6 = 2 × 3. Hence, 36 is divisible by 6.] Things however are different if the divisor m is a prime number. In this case, we have the following9
Lemma 1 (Euclid lemma). If the product $n = n_1 \times n_2$ of two natural numbers, $n_1$ and $n_2$, is divisible by a prime number p, then at least one of the two natural numbers, $n_1$ or $n_2$, must be divisible by p.
◦ Proof : ♥ An equivalent statement is: “If $n_1$ and $n_2$ are not divisible by the prime p, then their product $n = n_1 \times n_2$ is not divisible either.” Accordingly, let us prove this statement instead. If $n_1$ and $n_2$ are not divisible by p, it means that we have $n_h = p \times q_h + r_h$ (Eq. 1.12), with $r_h$ greater than 0 and smaller than p (h = 1, 2). Therefore, we have
$$ n = n_1 \times n_2 = (p \times q_1 + r_1)(p \times q_2 + r_2) = p \times (p \times q_1 \times q_2 + r_1 \times q_2 + r_2 \times q_1) + r_1 \times r_2. \tag{1.16} $$
To prove that n is not divisible by p, we simply have to show that $r_1 \times r_2$ is not divisible by p. To this end, note that both $r_1$ and $r_2$ are smaller than p, and hence each one of them is factorizable with prime factors less than p, and so is their product.
Footnote 9: The lemma is in Euclid’s Elements (Book VII, Prop. 30; Ref. [22], Vol. 2, pp. 319–332).
• Uniqueness of factorization (Eq. 1.15)
We have the following
Theorem 3 (Fundamental theorem of arithmetic. Uniqueness). The factorized form in Eq. 1.15 is unique. ◦ Proof : ♥ Assume that we have obtained two different representations for n, namely
$$ n = p_1 \times p_2 \times \cdots \times p_h \quad\text{and}\quad n = \breve{p}_1 \times \breve{p}_2 \times \cdots \times \breve{p}_k. \tag{1.17} $$
According to the Euclid lemma (Lemma 1 above), at least one of the $\breve{p}_j$, say $\breve{p}_*$, must be divisible by the prime number $p_1$. However, $\breve{p}_*$ is itself a prime number, and hence, being divisible by $p_1$, must coincide with $p_1$. Next, let $\breve{p}_*$ be renamed $\breve{p}_1$. [Rearrange the order of the $\breve{p}_j$’s if necessary.] Then, we can divide the two right sides of the equalities in Eq. 1.17 by $p_1 = \breve{p}_1$. Set $n_1 = n/p_1$ (smaller than n, since 1 is not a prime number). Next, we may repeat our reasoning and obtain that there exists a $\breve{p}_2 = p_2$, along with the resulting number $n_2 = n_1/p_2$. We may repeat this a total of h times and obtain that k = h and $p_j = \breve{p}_j$ (j = 1, . . . , k).
1.3.4 Even and odd numbers ♦
We have the following definition Definition 5 (Even and odd number). All the numbers that are multiples of 2 are called even. The others are called odd. Accordingly, we obtain that the number zero is an even number, as you may verify. We have the following Theorem 4 (Product of natural numbers). The product of two natural numbers, say n1 and n2, is odd if, and only if, n1 and n2 are both odd. ◦ Proof : If either n1 or n2 is even, there is a factor 2 in the corresponding factorization, and hence their product n1 × n2 is even. Conversely, if n1 and n2 are both odd, neither factorization contains the factor 2. The product of the two factorizations is then a factorization of n1 × n2 without the factor 2 and, since the factorized form is unique (Theorem 3), n1 × n2 is odd.
1.3.5 Greatest common divisor, least common multiple ♥
We have the following
Definition 6 (Greatest common divisor). Given two natural numbers, the greatest common divisor is the largest number that divides both of them. The greatest common divisor equals the product of all the common prime factors, with the multiplicity equal to the lowest one encountered in either number, as you may verify. We also have the following Definition 7 (Least common multiple). Given two natural numbers, the least common multiple is the smallest possible natural number that may be divided by both of them. The least common multiple equals the product of all the prime factors in the two numbers, with the multiplicity equal to the highest one encountered in either number, as you may verify.
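If you wish to verify the two statements numerically, the rules "lowest multiplicity" and "highest multiplicity" translate almost literally into Python. The sketch below is illustrative only: the function names are hypothetical, the factorization is obtained by plain trial division, and the operators & and | of collections.Counter are used because they keep, respectively, the minimum and the maximum count of each key.

```python
from collections import Counter
from math import prod


def prime_multiplicities(n: int) -> Counter:
    """Map each prime factor of n (n >= 1) to its multiplicity, by trial division."""
    counts = Counter()
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            counts[divisor] += 1
            n //= divisor
        divisor += 1
    if n > 1:
        counts[n] += 1
    return counts


def gcd_from_factors(a: int, b: int) -> int:
    """Product of the common prime factors, each with the lowest multiplicity."""
    common = prime_multiplicities(a) & prime_multiplicities(b)
    return prod(p ** k for p, k in common.items())


def lcm_from_factors(a: int, b: int) -> int:
    """Product of all the prime factors, each with the highest multiplicity."""
    union = prime_multiplicities(a) | prime_multiplicities(b)
    return prod(p ** k for p, k in union.items())


print(gcd_from_factors(12, 18), lcm_from_factors(12, 18))  # prints: 6 36
```

Here 12 = 2 × 2 × 3 and 18 = 2 × 3 × 3, so that the greatest common divisor is 2 × 3 and the least common multiple is 2 × 2 × 3 × 3.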
1.4 Integers
Sometimes, we use numbers that have a sign. For instance, we say the temperature of the air is −5 degrees to mean that it is 5 degrees below the zero, independently of the scale you use to measure the temperature. We can also say that we had a temperature drop of 10 degrees, bringing the temperature from 5 degrees to −5 degrees. Accordingly, the introduction of the operation of subtraction (inverse of the operation of addition) forces us to broaden our system of numbers to include negative numbers. Specifically, we have the following Definition 8 (Integer). The integers (from the Latin word “integer,” namely entire) are: 0, ±1, ±2, ±3, ±4 and so on. Thus, the integers may be divided into three groups: (i) positive integers, namely 1, 2, 3, 4, . . . , which coincide with the nonzero natural numbers; (ii) negative integers, namely −1, −2, −3, −4, . . . , which are the opposites of the positive integers; (iii) the zero, which is the opposite of itself. The symbol Z denotes the collection of all the integers (from Z, for Zahlen, German for numbers).
1.4.1 Giorgetto and arithmetic with integers ♦
Here, we extend the rules of arithmetic from natural numbers to integers. [This subsection is based upon my private lessons to Pierino’s older brother,
Giorgetto, who was even more inquisitive than Pierino. I used to call him Curious George. If you are like my young students Pierino and Giorgetto, and want to have a gut feeling for the rules, you might want to read this subsection.] Remark 6. The extension, from natural numbers to integers, is obtained by imposing that the rules introduced for natural numbers are still valid. For negative integers, we define −n as the number that added to n gives zero: n + (−n) = 0. We typically write n − n = 0. Adding a negative integer is equivalent to subtracting the corresponding positive integer. It may be thought of as removing beads from the jar. For instance, if we put 5 beads in the jar, and then remove 5 beads, we are left with no beads in the jar: 5 + (−5) = 0. Next, we have m × (−n) = −(m × n). [For, subtracting m beads from a jar n times is equivalent to subtracting m × n beads.] Moreover, I claim that
$$ (-m) \times (-n) = m \times n, \tag{1.18} $$
a particular case of which is the familiar (and mysterious) rule $(-1)^2 = 1$. In order to prove this, simply note that both sides of Eq. 1.18 are opposite of m × (−n). Indeed, we have
$$ (-m) \times (-n) + m \times (-n) = [(-m) + m] \times (-n) = 0, \qquad m \times n + m \times (-n) = m \times [n + (-n)] = 0. \tag{1.19} $$
[Hint: Use (m + n) × p = m × p + n × p (Eq. 1.8), which we impose to be valid for integers as well (Remark 6, above).] As a consequence, we have the following rule:
The product of two numbers with the same sign is positive; the product of two numbers with opposite signs is negative. (1.20)
We have the following Definition 9 (Natural power of integers). The n-th power of an integer m is given by the number 1 multiplied n times by m, namely
$$ m^n = \underbrace{m \times \cdots \times m}_{n\ \text{times}}, \tag{1.21} $$
where n = 1, 2, . . . is called the exponent. According to Eq. 1.20, even powers of a negative integer m (namely when n is an even natural number) are positive, whereas odd ones are negative.
◦ Comment. The above definition appears to me preferable to that I learned in high school, namely that $a^n$ equals the number a multiplied by itself n times (as in Eq. 1.21), because it includes $a^0 = 1$ (except for the case a = 0, Eq. 1.54). Next, let us address these generalizations at a greater depth. Subtraction is by definition the inverse of addition. Specifically, we say that
$$ p = m - n, \quad\text{iff}\quad n + p = m. \tag{1.22} $$
For instance, 2 = 5 − 3, since 2 + 3 = 5. In particular, we have (reverse the order of the two expressions in the above equation, and set p = 0)
$$ m = n, \quad\text{iff}\quad m - n = 0. \tag{1.23} $$
This allows us to introduce two more rules. Specifically, given an equality, you can: (i) add any number to both of its sides, and (ii) multiply both of its sides by any number. Let us prove these two rules. The first may be stated as
$$ m + p = n + p, \quad\text{iff}\quad m = n. \tag{1.24} $$
[Indeed, according to Eq. 1.23, the first expression is equivalent to (m + p) − (n + p) = m − n = 0, in agreement with the second expression.] The second rule may be stated as:
$$ m = n \quad\text{implies}\quad m \times p = n \times p. \tag{1.25} $$
[Indeed, according to Eq. 1.23, the first expression is equivalent to m − n = 0, which implies p × (m − n) = 0, in agreement with the second expression.] Remark 7. Let us be careful here. The reverse of Eq. 1.25 is true only if p does not vanish. Specifically, we have that
$$ m \times p = n \times p \quad\text{implies}\quad m = n, \quad\text{iff}\quad p \neq 0. \tag{1.26} $$
Indeed, dividing an equation by zero can yield disastrous consequences. For instance, starting from n = 0, we have n × 1 = 0, and hence n × 1 = n × 0. Then, dividing by n, one obtains 1 = 0!!! The explanation for this conundrum is simple. In this case, dividing by n is not allowed, because n = 0. For future reference, we also have that
$$ m = n \ \ \text{and}\ \ p = q \quad\text{imply}\quad m + k \times p = n + k \times q, \tag{1.27} $$
as you may verify. [Hint: Use Eq. 1.25, and then Eq. 1.24.]
1.5 Rational numbers
In Definition 3, p. 20, we introduced the operation of division between two natural numbers as the inverse of the multiplication, under the assumption that the first number is divisible by the second, namely that the remainder r in Eq. 1.12 is equal to zero. We do not know what to do when the remainder differs from zero. In order to address this issue, we have to broaden our system of numbers by introducing rational numbers. Specifically, we have the following Definition 10 (Rational number). A rational number r (or simply a rational) is the ratio of two integers, say p and q, namely the fraction “p divided by q,” denoted by
$$ r = p/q \quad\text{or}\quad r = \frac{p}{q}, \tag{1.28} $$
where p is called the numerator, q the denominator. The expression p/q will be referred to as the fractional representation of the rational number r. The symbol Q denotes the collection of all the rational numbers (from Q, for Quotient). ◦ Comment. Note that the same number may have different fractional representations. Indeed, we can multiply numerator and denominator by the same number (Eq. 1.30), and still have the same rational, albeit with a different representation. [For instance, 1/2 and 2/4 correspond to the same rational number.] Accordingly, we have the following Definition 11 (Irreducible fraction). We can divide numerator and denominator by their common factors (namely their greatest common divisor, Definition 6, p. 24). This yields a unique fraction that cannot be reduced any further (no more common divisors left). Such a fraction is said to be irreducible, or in lowest terms.
1.5.1 Arithmetic with rational numbers ♦
We know that division is the inverse of multiplication. In mathematical terms, we have that
$$ r = \frac{p}{q}, \quad\text{iff}\quad r \times q = p. \tag{1.29} $$
For instance, 1.5 = 3/2, since 1.5 × 2 = 3. As a consequence, we have that we can multiply numerator and denominator by the same number m ≠ 0, namely
$$ r = \frac{p}{q} = \frac{p \times m}{q \times m}. \tag{1.30} $$
[Hint: The first equality in Eq. 1.30 yields r × q = p, whereas the second yields r × q × m = p × m, which is equivalent to the first, because of Eq. 1.25.] ◦ Comment. A number r = p/q is positive iff p and q have the same sign, negative iff p and q have opposite signs. Accordingly, multiplying the numerator and denominator by −1, we have
$$ r = \frac{p}{q} = \frac{-p}{-q}. \tag{1.31} $$
Thus, without loss of generality, we can always assume the denominator to be positive, since, according to the equation above, we can always change the signs of both numerator and denominator, without changing the value of r.
• Sum of rationals
Next, we have to address a subtle rule, one which a few of my students have had problems with. How do we add p1/q1 to p2/q2? Once, one of my students (a junior in engineering! Boo to him!) wrote, incorrectly,
$$ r = \frac{p_1}{q_1} + \frac{p_2}{q_2} = \frac{p_1 + p_2}{q_1 + q_2}. \tag{1.32} $$
Another wrote, also incorrectly (boo to him too!)
$$ r = \frac{p_1}{q_1} + \frac{p_2}{q_2} = \frac{p_1 + p_2}{q_1 \times q_2}. \tag{1.33} $$
There seemed to be some confusion among my students (a very very few, I must add) as to what the correct rule is. So, let us look at the problem carefully. Let’s say you want to add a quarter of an hour to a third of an hour. That would be 15 minutes plus 20 minutes, for a total of 35 minutes. What we did here is first to make sure that the two terms are “compatible,” namely that they are both expressed in minutes (sixtieths of an hour). Then, they can be added, as we are no longer mixing apples and oranges. In general, we multiply numerator and denominator of p1/q1 by q2, and numerator and denominator of p2/q2 by q1, so as to have a common denominator. Then, we can add/subtract the two terms and obtain
$$ r = \frac{p_1}{q_1} \pm \frac{p_2}{q_2} = \frac{p_1 \times q_2}{q_1 \times q_2} \pm \frac{p_2 \times q_1}{q_2 \times q_1} = \frac{p_1 \times q_2 \pm p_2 \times q_1}{q_1 \times q_2}. \tag{1.34} $$
For future reference, Eq. 1.34 yields in particular
$$ m + \frac{p}{q} = \frac{m \times q}{q} + \frac{p}{q} = \frac{m \times q + p}{q}. \tag{1.35} $$
[Alternatively, Eq. 1.34 may be proven by using a more elegant approach (definitely less gut–level). Specifically, let us multiply r by q1 × q2 to yield
$$ r \times q_1 \times q_2 = \left( \frac{p_1}{q_1} \pm \frac{p_2}{q_2} \right) \times q_1 \times q_2 = p_1 \times q_2 \pm p_2 \times q_1. \tag{1.36} $$
Then, dividing by q1 × q2, we obtain the same result as in Eq. 1.34.] ◦ Comment. Note that, given two rational numbers, say r1 = p1/q1 and r2 = p2/q2, we have (use Eq. 1.34)
$$ r_1 - r_2 = \frac{p_1}{q_1} - \frac{p_2}{q_2} = \frac{p_1 \times q_2 - p_2 \times q_1}{q_1 \times q_2}. \tag{1.37} $$
Thus, two rational numbers, say r1 and r2 are equal (namely r1 − r2 = 0) iff p1 × q2 = p2 × q1 . ◦ Comment.♣ You might get picky and point out that the denominator in Eq. 1.34, namely q1 × q2 , may be replaced by the least common multiple between q1 and q2 (Definition 7, p. 24). You would be correct. However, this would make the presentation more cumbersome, without adding any insight into this issue. That said, Eq. 1.34 is OK as is.
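The quarter-of-an-hour example can be checked with a short Python sketch of Eq. 1.34, together with the equality test stated after Eq. 1.37. The names add_rationals and same_rational are hypothetical, and a rational is represented here simply as a numerator and a denominator (an assumption made for illustration). Fraction, from Python's standard library, is used only to convert the result into minutes.

```python
from fractions import Fraction


def add_rationals(p1: int, q1: int, p2: int, q2: int) -> tuple[int, int]:
    """Sum p1/q1 + p2/q2 over the common denominator q1 x q2 (Eq. 1.34), not reduced."""
    return p1 * q2 + p2 * q1, q1 * q2


def same_rational(p1: int, q1: int, p2: int, q2: int) -> bool:
    """Two fractions represent the same rational iff p1 x q2 = p2 x q1."""
    return p1 * q2 == p2 * q1


num, den = add_rationals(1, 4, 1, 3)   # a quarter plus a third of an hour
print(num, den)                        # prints: 7 12
print(Fraction(num, den) * 60)         # prints: 35 (minutes)

# The two incorrect rules, Eqs. 1.32 and 1.33, give different numbers:
print(same_rational(1 + 1, 4 + 3, num, den))  # prints: False
print(same_rational(1 + 1, 4 * 3, num, den))  # prints: False
```

The result 7/12 is left unreduced on purpose, since Eq. 1.34 uses q1 × q2 rather than the least common multiple as the common denominator.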
• Multiplications and divisions of rationals
Two more rules and we are home free for operations on rational numbers. The first pertains to multiplications. The product of two rationals equals the product of the numerators divided by the product of the denominators:
$$ r = \frac{p_1}{q_1} \times \frac{p_2}{q_2} = \frac{p_1 \times p_2}{q_1 \times q_2}. \tag{1.38} $$
[Hint: Multiply the first equality by q1 and then by q2 , to obtain r ×q1 ×q2 = p1 × p2 . Then, divide the result by q1 × q2 .] The last rule to consider pertains to divisions. The ratio of two rationals equals the product of the numerator of the first times the denominator of the second, divided by the product of the denominator of the first times the numerator of the second, namely
$$ r = \frac{p_1/q_1}{p_2/q_2} = \frac{p_1}{q_1} \times \frac{q_2}{p_2} = \frac{p_1 \times q_2}{q_1 \times p_2}. \tag{1.39} $$
Indeed, multiplying the first equality by p2/q2, one obtains
$$ r \times \frac{p_2}{q_2} = \frac{p_1}{q_1}. \tag{1.40} $$
Then, multiplying by q2 and dividing by p2, we obtain the second equality in Eq. 1.39. For instance, using Eqs. 1.38 and 1.39, we have
$$ \frac{2/3}{3/4} = \frac{2}{3} \Big/ \frac{3}{4} = \frac{2}{3} \times \frac{4}{3} = \frac{8}{9}. \tag{1.41} $$
Indeed, three fourths of 8/9 equal 2/3.
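The last two rules may also be tried out in Python. The following is a minimal sketch (hypothetical function names, fractions stored as numerator and denominator, results not reduced) that reproduces the example of Eq. 1.41 and the check that follows it.

```python
from fractions import Fraction


def multiply_rationals(p1: int, q1: int, p2: int, q2: int) -> tuple[int, int]:
    """Product of p1/q1 and p2/q2: numerators and denominators multiply (Eq. 1.38)."""
    return p1 * p2, q1 * q2


def divide_rationals(p1: int, q1: int, p2: int, q2: int) -> tuple[int, int]:
    """Ratio of p1/q1 to p2/q2: multiply by the inverted second fraction (Eq. 1.39)."""
    if p2 == 0:
        raise ZeroDivisionError("cannot divide by a rational equal to zero")
    return p1 * q2, q1 * p2


print(divide_rationals(2, 3, 3, 4))    # prints: (8, 9)
print(multiply_rationals(3, 4, 8, 9))  # prints: (24, 36)
print(Fraction(24, 36))                # prints: 2/3
```

The second call shows that three fourths of 8/9 give 24/36, which in lowest terms is 2/3.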
• Please excuse my dear aunt Sally (PEMDAS) In Subsection 1.3.1, we have seen the use of parentheses in the expressions (m + n) + p = m + (n + p) and (m × n) × p = m × (n × p) (Eq. 1.7), as well as (m + n) × p = m × p + n × p (Eq. 1.8). In these expressions, we have used parentheses to impose the order in which the operations are to be performed. Here, we want to discuss the issue for more general cases, for which the situation is not self–evident. Accordingly, let us address the use of parentheses, namely “(. . . ).” The same rules apply to brackets, namely “[. . . ],” and curly brackets (or braces), namely “{. . . }.” Specifically, we briefly review the “standard order” in which the operations are performed, namely the priorities of the four operations introduced thus far. ◦ Comment. The main reason to present this material is to tell you the convention used in this book for the symbol “/”. ◦ Warning. It should be emphasized that the rules presented here (namely the standard order of the operations and the rules for the use of the parentheses) are valid for all the number systems, namely: natural numbers, integers, rationals, as well as real numbers, and even complex numbers. The rules are relatively clear for multiplication, addition and subtraction: multiplications have priority over additions and subtractions, whereas additions and subtractions are “same level ” operations, in the sense that one does not have priority over the other. For example, we have that a + b × c means: multiply b times c and then add a to the result. If we want the addition to be performed first, we have to add parentheses, so as to have (a + b) × c, which means: “Multiply by c the result of the addition a + b.”
When we deal with division, things get a bit more complicated. We have to distinguish between $\frac{a}{b}$ and a/b. Specifically, when we use the notation $\frac{a}{b}$, no ambiguity arises. For instance,
$$ \frac{a_1 \times (b_1 + c_1)}{a_2 \times (b_2 + c_2)} \tag{1.42} $$
means: “Divide the result of the operations in the numerator by the result of the operations in the denominator.” However, the matters get muddy when we deal with a/b. What do we mean by a/b × c? Do we mean
$$ a/b \times c = a/(b \times c) = \frac{a}{b \times c}, \quad\text{or}\quad a/b \times c = (a/b) \times c = \frac{a}{b} \times c\,? \tag{1.43} $$
When I was in school and even in college, I was taught that the correct interpretation is the first one. Then, for my doctoral thesis, I was going to use a computer. [Incidentally, this was the first computer available at my university and I was quite excited. In those days though, you could not use the computer yourself. I had to write a computer program (less than twenty lines in my case) and give it to an operator, who would punch a bunch of cards, get them read by the computer, compile and run the code, print the results and give them to me. He and I were never able to get the code to run properly. I ended up getting the result graphically by hand.] I learned then that — to my surprise — in FORTRAN (the language used by our computer) multiplications and divisions are treated as same–level operations, and are performed from left to right. In this case, the correct interpretation is the second one in Eq. 1.43. In FORTRAN, if you want to obtain the first result, you have to write a/(b × c). Recently, I checked with several of my colleagues, both in Italy and in the States, regarding which one is the correct interpretation in arithmetic. They came up with different answers, namely: (i) first option in Eq. 1.43, (ii) second option, and even (iii) “I do not follow a specific rule all the time.” For instance, one colleague said: “I use Rule 2, unless it is clear that Rule 1 applies, as in a/2b = a/(2b).” Then, I checked on the web, under “arithmetic, order of operations” and I found out that the prevailing rule in the United States is the opposite of what I was taught, namely that the order of the operation is as follows: 1. Parentheses; 2. Exponents; 3. Multiplications and Divisions (same level operations; from left to right); 4. Additions and Subtractions (same level operations; from left to right).
[Of course, the rule for division refers to the sign “/” (see Eq. 1.42 otherwise).] The initials of the six operations listed above form the acronym PEMDAS. A mnemonic gimmick, often used in the States, to remember the acronym is the phrase “Please Excuse My Dear Aunt Sally.” ◦ Rule used in this book. In order to eliminate any possible misunderstanding, I will avoid the use of a/b × c altogether. I will use either: 1. a/(b × c), even if I was taught that the parentheses in this case are unnecessary in mathematics (but necessary in some computer programming languages, and in the PEMDAS rules), or 2. (a/b) × c, since, the way I was taught, the parentheses in this case are necessary in mathematics (but not in some computer programming languages). ◦ Warning. In this book, if you find a/b × c, it’s an accidental (Freudian?) lapsus. I like to call it a “slip of the . . . finger.”
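Python is one of the programming languages in which multiplications and divisions are same-level operations performed from left to right, so that the second interpretation in Eq. 1.43 applies. The short sketch below illustrates this; the values 8, 2 and 4 are arbitrary, and exact fractions are used only to keep rounding out of the picture.

```python
from fractions import Fraction

a, b, c = Fraction(8), Fraction(2), Fraction(4)

left_to_right = a / b * c              # evaluated as (a / b) * c

print(left_to_right == (a / b) * c)    # prints: True
print(left_to_right == a / (b * c))    # prints: False
print(left_to_right, a / (b * c))      # prints: 16 1
```

As in the rule adopted in this book, writing the parentheses explicitly removes any doubt about which of the two results is intended.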
1.5.2 The rationals are countable ♥
Let us begin by introducing the following definitions: Definition 12 (One–to–one relationship). By one–to–one relationship between the elements of two distinct mathematical entities, we mean that there is a full correspondence between the elements of the two groups, in the sense that for any element of the first group there exists one and only one corresponding element in the second group, and vice versa. Definition 13 (Countable/uncountable collection of numbers). A collection of infinitely many numbers is called countable, iff they can be organized so as to be in a one–to–one relationship with the natural numbers (namely, so as to have an element in correspondence with the number zero, then one in correspondence with the number one, and so on). A collection of infinitely many numbers that are not countable is called uncountable. Next, we have the following Theorem 5. The rational numbers are countable. ◦ Proof : To this end, consider first the positive rationals. Let us place them in an array that contains all of them, as
1/1  1/2  1/3  1/4  1/5  · · ·
2/1  2/2  2/3  2/4  2/5  · · ·
3/1  3/2  3/3  3/4  3/5  · · ·
4/1  4/2  4/3  4/4  4/5  · · ·
5/1  5/2  5/3  5/4  5/5  · · ·
· · ·
[It is apparent that the array contains all the positive rational numbers. You will find p/q on the p-th row and q-th column of the array.] Next, we can order them by successive diagonal lines (ascending from left to right), with an increasing number of elements in each diagonal. [In order to avoid any possible misunderstanding, the first few numbers are shown in Table 1.1. Note that the sum of numerator and denominator of each term on any given diagonal is constant. For instance, 3+1=2+2=1+3.] This process generates a one–to–one correspondence between positive rationals and natural numbers. Next, let us consider all the rationals. Specifically, let us consider the following sequence. Starting with zero as the first number in the sequence, let us include the positive rational numbers in the sequence shown in Table 1.1, with each one of them immediately followed by the corresponding negative one. [To avoid any possible misunderstanding, the first few numbers are shown in Table 1.2.] We have hereby generated a one–to–one correspondence between rationals and natural numbers. In other words, the rational numbers are countable!
Natural    0    1    2    3    4    5    6    7    8    9    10   ...
Rationals  1/1  2/1  1/2  3/1  2/2  1/3  4/1  3/2  2/3  1/4  5/1  ...
Table 1.1 Ordering of positive rationals

Natural    0    1    2     3    4     5    6     7    8     9    10    ...
Rationals  0    1/1  -1/1  2/1  -2/1  1/2  -1/2  3/1  -3/1  2/2  -2/2  ...
Table 1.2 Ordering of rationals
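The diagonal ordering of Tables 1.1 and 1.2 is easy to generate by computer. The Python sketch below uses hypothetical names and represents each fraction as a pair (numerator, denominator). Like the tables, it lists fractional representations such as 2/2 as they appear in the array. Skipping the reducible ones, so that each rational appears exactly once, would be a small variation that is not taken from the text.

```python
from itertools import count, islice


def positive_fractions():
    """Yield (p, q) along the diagonals p + q = constant, as in Table 1.1."""
    for diagonal_sum in count(2):              # p + q = 2, 3, 4, ...
        for p in range(diagonal_sum - 1, 0, -1):
            yield p, diagonal_sum - p


def all_fractions():
    """Yield zero, then each positive fraction followed by its opposite (Table 1.2)."""
    yield 0, 1
    for p, q in positive_fractions():
        yield p, q
        yield -p, q


print(list(islice(positive_fractions(), 6)))
# prints: [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
print(list(islice(all_fractions(), 5)))
# prints: [(0, 1), (1, 1), (-1, 1), (2, 1), (-2, 1)]
```

The position of a fraction in the generated sequence is the natural number associated with it in the tables.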
1.5.3 Decimal representation of rationals ♣
Let us introduce the following definitions: Definition 14 (Decimal representation). The decimal representation of a rational number consists of two strings of digits, separated by a point,
called the decimal point. [Use the long division algorithm, or your calculator, to obtain that.] The portion to the left of the decimal point is called the integer part, that to the right of the decimal point is called the fractional part. There exist three possibilities for the fractional part of the decimal representation of a rational number, namely
1. It does not exist (or, if you prefer, it is equal to zero). These are integers, namely rational numbers with an irreducible fraction (Definition 11, p. 27), whose denominator equals 1.
2. It is composed of a finite number of digits. These are called terminating decimals.
3. It is composed of an infinite string of digits, which after a while contains a substring that repeats itself forever. These are called periodic, or recurring decimals. [A proof of this fact is given in Theorem 6, p. 35.]
Definition 15 (Periodic number, period). The term periodic includes all three types (namely integers, terminating, and recurring). The portion of the fractional part that repeats itself is called the period. The number of digits of the period is called the period length. As you might remember from high school, in order to shorten our representations, we put a bar (called the vinculum for your information) over the period. [For instance, we have 1/3 = 0.333 · · · = $0.\overline{3}$.] Integers and terminating decimals are considered to have a period equal to zero, which is typically omitted. [For instance, the number 1 may be conceived as equivalent to 1.000000 · · · = $1.\overline{0}$, namely as a number with a period equal to zero. Similarly, we typically write 9/8 = 1.125, instead of 9/8 = $1.125\overline{0}$ (terminating decimal).] In order to clarify the classification above, let us consider some illustrative examples. For instance, if we use a calculator, we obtain
$$ 49/30 = 1.633\,333\,333\,333\,333\,333\,33. \tag{1.44} $$
If our calculator had more digits, we would simply have more threes. [For a proof follow the procedure in Eq. 1.50.] The digit 3 repeats itself forever. Hence, 3 is the period of this number. Thus, we write 49/30 = $1.6\overline{3}$. Next, let us make things a bit more interesting. Let us evaluate 10/17. Using my calculator, I get 10/17 = 0.5882352941176470 5882352941176470. Had I had a calculator with a greater number of digits, I would have obtained
$$ \frac{10}{17} = 0.5882352941176470\;5882352941176470\;5882352941176470. \tag{1.45} $$
You might note that the sequence 5882352941176470 repeats itself. Indeed, if you had a (highly theoretical) calculator with an infinite number of digits, you would obtain that this portion keeps on repeating itself forever. In other words, the period is 5882352941176470, and we can write 10/17 = $0.\overline{5882352941176470}$. [A proof is given below (see Eq. 1.51).]
• Any rational number is periodic ♥
It should be emphasized that decimals that are not periodic are not rational numbers. For, we have the following Theorem 6. Any rational number p/q is necessarily periodic and the period length cannot be greater than q − 1. ◦ Proof : ♥ If we divide p by q (long division algorithm, extended beyond the decimal point), we have a sequence of operations, each of which yields a remainder. Let us see what happens, specifically what happens after we start working on the fractional part of the resulting number. If the remainder equals 0, the period equals zero (Definition 15 above). So, we can concentrate on the possibility that the remainder is never equal to zero. Let us assume that we encounter a value of the remainder that is equal to one of those we already encountered while working on the fractional part of the resulting number. In this case, the sequence of operations repeats itself and therefore the number is periodic. Turning to the last part of the theorem, recall that, by definition, the remainder is an integer smaller than the divisor q (Eq. 1.12). Therefore, there exist at most q − 1 distinct nonzero values for the remainder. Accordingly, the number of digits in the period cannot be greater than q − 1. For instance, for the number 10/17, the sequence 5882352941176470 in Eq. 1.45 has 16 digits.
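The argument of the proof, namely carrying on the long division until a remainder shows up for the second time, may be turned into a short Python sketch. The function name and the choice of returning three pieces (integer part, digits before the period, period) are assumptions made for illustration, and the sketch is limited to p ≥ 0 and q > 0.

```python
def decimal_expansion(p: int, q: int) -> tuple[int, str, str]:
    """Return (integer part, digits before the period, period) of p/q, p >= 0, q > 0."""
    integer_part, remainder = divmod(p, q)
    digits = []
    first_seen = {}  # remainder -> position of the digit it produced
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, q)  # one step of long division
        digits.append(str(digit))
    if remainder == 0:
        # Terminating decimal (or integer): the period is zero.
        return integer_part, "".join(digits), "0"
    start = first_seen[remainder]  # the digits repeat from this position on
    return integer_part, "".join(digits[:start]), "".join(digits[start:])


print(decimal_expansion(49, 30))  # prints: (1, '6', '3')
print(decimal_expansion(9, 8))    # prints: (1, '125', '0')
print(decimal_expansion(10, 17))  # prints: (0, '', '5882352941176470')
```

Since the loop stops as soon as a nonzero remainder is repeated, and there are at most q − 1 distinct nonzero remainders, the period returned can never have more than q − 1 digits.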
• Any periodic number is rational ♥♥♥
Extra! Extra! Read all about it! This material is included to satisfy any curiosity you might have on this issue. We have seen that any rational number has a periodic decimal representation. You might wonder whether the reverse is true. The answer is: “Yes.” Indeed, we have the following Theorem 7. Any periodic decimal number is rational.
◦ Proof : Consider the following algorithm. Let r denote a number in its periodic decimal representation. For simplicity, let us assume r to be positive. Let m denote the number of digits of the decimal–part portion before the period. Multiply r by $10^m$ so as to move the decimal point just in front of the periodic portion. Thus, we can write
$$ r \times 10^m = k + 0.\overline{p}, \tag{1.46} $$
where k and the period p are natural numbers. Next, let n denote the period length. We have, ... “Surprise! Surprise!”:
$$ 0.\overline{p} \times 10^n = p + 0.\overline{p}. \tag{1.47} $$
[For instance, we have $0.\overline{367} \times 10^3 = 367.\overline{367}$.] The above equation may be written as $0.\overline{p} \times 10^n - 0.\overline{p} = p$, or
$$ 0.\overline{p} = \frac{p}{10^n - 1}. \tag{1.48} $$
Finally, combining Eqs. 1.46 and 1.48 and using a + b/c = (a c + b)/c (Eq. 1.35), we obtain
$$ r = \frac{1}{10^m} \times \left( k + \frac{p}{10^n - 1} \right) = \frac{k \times (10^n - 1) + p}{10^m \times (10^n - 1)}, \tag{1.49} $$
which is the ratio of two integers. This is a fractional representation of the periodic number under consideration. Hence, r is a rational number. We can summarize the above result with the following rule to transform a periodic number r into a fractional representation. Multiply r successively by 10 (say m times) so as to move the decimal point just in front of the periodic portion. Let p denote the period. Separate the result into the sum of k and $0.\overline{p}$. Multiply $0.\overline{p}$ by $q = 10^n - 1 = 99 \ldots 99$, with as many 9’s as the period length, namely n. This gives you p, namely an integer. Then, r may be expressed as in Eq. 1.49. For instance, using my calculator I obtain 1492/770 = $1.9\overline{376623}$. On the other hand, using the algorithm summarized above, we have
$$ 1.9\overline{376623} = \frac{1}{10} \left( 19 + \frac{376{,}623}{999{,}999} \right) = \frac{19 \times 999{,}999 + 376{,}623}{10 \times 999{,}999} = \frac{18{,}999{,}981 + 376{,}623}{9{,}999{,}990} = \frac{19{,}376{,}604}{9{,}999{,}990} = \frac{1{,}492}{770}. \tag{1.50} $$
[Hint: For the last equality, divide numerator and denominator by 12,987.] Similarly, applying our rule to 10/17 = $0.\overline{5882352941176470}$ (Eq. 1.45), we have
$$ \frac{5882352941176470}{9999999999999999} = \frac{10}{17}. \tag{1.51} $$
◦ Comment. In summary, the system of fractional numbers and that of periodic decimal numbers coincide, in the sense that any periodic number may be written as the ratio of two integers, and vice versa. Remark 8. Note that $0.\overline{9} = 1$. If you are not convinced, consider the fact that 1 − 0.999 999 999 = 0.000 000 001. Taking more and more digits, we have
$$ 1 - 0.\overline{9} = 1 - 0.999\,999\,999 \cdots = 0. \tag{1.52} $$
Alternatively, we have that $10 \times 0.\overline{9} - 0.\overline{9} = 9.\overline{9} - 0.\overline{9} = 9$ (use the rule in the proof of Theorem 7 above). This implies $0.\overline{9} = 9/(10 - 1) = 9/9 = 1$. The reason to bring this up is simply to point out that there is at least one case in which the decimal representation is not unique. A similar situation occurs whenever the period is 9. You’ll be glad to know that this is the only possible type of non–uniqueness for the decimal representation. [Regarding the non–uniqueness of the fractional representation, go back to Definition 11, p. 27, of irreducible fractions.]
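The rule of Eq. 1.49 lends itself to a direct numerical check. In the Python sketch below, the function name and the way the periodic number is passed (integer part, digits before the period, and period, the last two as strings) are assumptions made for illustration. Fraction, from the standard library, reduces the result to lowest terms.

```python
from fractions import Fraction


def periodic_to_fraction(integer_part: int, prefix: str, period: str) -> Fraction:
    """Apply Eq. 1.49: r = (k x (10^n - 1) + p) / (10^m x (10^n - 1))."""
    m = len(prefix)                       # digits between the point and the period
    n = len(period)                       # period length (at least one digit)
    k = int(str(integer_part) + prefix)   # r x 10^m = k + 0.(period)
    p = int(period)
    return Fraction(k * (10**n - 1) + p, 10**m * (10**n - 1))


print(periodic_to_fraction(1, "9", "376623") == Fraction(1492, 770))         # prints: True
print(periodic_to_fraction(0, "", "5882352941176470") == Fraction(10, 17))  # prints: True
print(periodic_to_fraction(0, "", "9"))                                      # prints: 1
```

The last line is the case discussed in Remark 8: a period equal to 9 yields the same rational as the next terminating decimal.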
1.6 Real numbers
Let us begin with the following (an extension of the natural power of an integer, Eq. 1.21) Definition 16 (Natural power of a). The n-th power of a real number a, denoted by
$$ b = a^n, \tag{1.53} $$
equals the number 1 multiplied n times (n = 0, 1, . . . ) by the number a. In particular, for n = 0 we have
$$ a^0 = 1 \qquad (a \neq 0). \tag{1.54} $$
[For the case a = 0, see Remark 9 below.] The numbers 1, a, $a^2$, . . . are called the natural powers of a. Remark 9. The reason for assuming a ≠ 0 in Eq. 1.54 is that the expression $0^0$ is complicated and controversial. Whereas it may be convenient in many cases to define $0^0 = 1$, I believe that this may be a source of confusion. For instance, as we will see, $x^0 = 1$ for any value of x ≠ 0, whereas $0^x$ equals 0 for
any value of x > 0. Accordingly, I prefer to leave the value of $0^0$ undefined, and address the issue case by case. Next, consider the inverse operation of $x = y^2$. We have the following Definition 17 (Square root). The square root b of a positive number a, denoted by $b = \sqrt{a}$, is a positive real number b > 0 such that its square equals a, namely
$$ b = \sqrt{a} > 0, \quad\text{iff}\quad a = b^2 \quad (a > 0). \tag{1.55} $$
In this book, $\sqrt{a}$ (a > 0) is always understood to be positive. I will use the notation
$$ b_\pm = \pm\sqrt{a}, \tag{1.56} $$
to indicate both roots. [If a = 0, we have b = 0.] We have seen that all the rational numbers are periodic. However, not all the numbers are periodic. An example familiar to all of us is the square root of 2, namely $\sqrt{2}$ = 1.4142135623730950488016887 . . . . [The proof that $\sqrt{2}$ is irrational is given in Theorem 8 below.] (See footnote 10.) In general, we have the following Definition 18. The n-th root of a real number x, is the number y whose n-th power gives back the original number x, namely
$$ y = \sqrt[n]{x} \quad\text{is equivalent to}\quad x = y^n. \tag{1.57} $$
Thus, the introduction of roots forces us to broaden our system of numbers to include irrational, and hence real numbers. These are addressed in the remainder of this section.
Footnote 10: “The square root of 2, which was the first irrational to be discovered, was known to early Pythagoreans” (Russell, Ref. [57], p. 209). After all, according to the well–known Pythagorean theorem (namely $a^2 + b^2 = c^2$, as we will see in Section 5.8), the diagonal–to–side ratio of any square equals $\sqrt{2}$. Pythagoras of Samos (c. 582–507 BC), a Greek mathematician, astronomer, scientist and philosopher (as well as musician, according to the legend), was the founder of the mathematical, mystic, religious, and scientific society called the Pythagorean Brotherhood. According to Penrose (Ref. [53], p. 10), “the members of the Pythagorean brotherhood were all sworn to secrecy. Accordingly, almost all of their detailed conclusions have been lost. Nonetheless, some of these conclusions were leaked out, with unfortunate consequences for the ‘moles’ — on at least one occasion, death by drowning!” The rule of secrecy was indeed taken seriously: “Hippasos of Metapontion, who violated this rule, was shipwrecked as a result of divine wrath at his impiety” (Russell, Ref. [57], p. 33).
1. Introduction
1.6.1 Irrational and real numbers

Accordingly, let us introduce the following

Definition 19 (Irrational number). Numbers whose decimal representation is not periodic are called irrational.

◦ Comment. Of course, this does not mean that they have gone out of their minds! Nor does it mean that these numbers behave in an unpredictable way, or that they use no logic in their thinking. It simply means that they are not rational, namely that they cannot be expressed as the ratio of two integers, as rational numbers can.

The proof of the theorem that follows is the first example of a proof by contradiction (also known as reductio ad absurdum, Latin for “reduction to absurdity”). Specifically, let us introduce the following

Definition 20 (Proof by contradiction). A proof by contradiction consists in the following. Assume (to obtain a contradiction) the opposite of the theorem to be true. Then, show that this yields a result that is patently wrong (contradiction). Therefore, our initial assumption must be wrong, thereby proving the theorem.

We have the following11

Theorem 8. The number $\sqrt{2}$ is irrational.

◦ Proof: ♥ Assume, to obtain a contradiction, that there exists a rational number $r = p/q$ such that $r^2 = (p/q)^2 = 2$, where $p$ and $q$ have no common prime factors, so that $p/q$ is irreducible (Definition 11, p. 27). This implies $p^2 = 2q^2$, namely that $p^2$ is even, and hence that $p$ is even. [For, if $p$ were an odd number, its square would be odd (Theorem 4, p. 23).] In turn, if $p$ is even, there exists an integer $p_1 = p/2$. Thus, we have that $4p_1^2 = 2q^2$, or $q^2 = 2p_1^2$, namely that $q$ is even. However, this contradicts the fact that $p/q$ is irreducible. Therefore, $\sqrt{2}$ is not rational.

Finally, we can discuss the real numbers, which are the typical numbers used in this book. We have the following

Definition 21 (Real number). A number whose decimal representation has a fractional part that is not necessarily periodic is called real. [In other words, the collection of the real numbers includes all the rational numbers and all the irrational numbers.] The symbol $\mathbb{R}$ denotes the collection of all the real numbers.

11 The theorem is in Euclid’s Elements (Book II, Prop. 10; Ref. [22], Vol. 1, pp. 395–402). The proof given here is often referred to as the Euclid proof of the irrationality of $\sqrt{2}$.
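The claim of Theorem 8 can be spot-checked numerically. The following is a minimal sketch in Python (a language not used in this book; the function name `has_rational_sqrt2` and the search bound are my own choices for illustration). It looks for integers p and q with p² = 2 q², for all denominators up to a given bound. Of course, a finite search is not a proof; it merely fails to find a counterexample, as the theorem says it must.

```python
from math import isqrt


def has_rational_sqrt2(max_q):
    """Return True if some fraction p/q with 1 <= q <= max_q satisfies p*p == 2*q*q."""
    for q in range(1, max_q + 1):
        # isqrt gives the largest integer p with p*p <= 2*q*q,
        # so it is the only candidate numerator for this q.
        p = isqrt(2 * q * q)
        if p * p == 2 * q * q:
            return True
    return False


# No denominator up to the bound yields an exact square root of 2.
found = has_rational_sqrt2(10_000)
print(found)
```

All the arithmetic here is done with integers, so no rounding is involved.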
Part I. The beginning – Pre–calculus
◦ Comment. The term “real” may be misleading. It makes it sound like the other systems of numbers (namely natural, integer and rational) are unreal, or maybe fake. However, the use of the term “real” will start to make sense when we encounter imaginary numbers (Section 1.7). They are really ... unreal! Totally out of this world!
• Equalities

In analogy with Eq. 1.23, we say that $a = b$ iff $a - b = 0$. We have that

$$\text{if } a = a' \text{ and } b = b', \text{ then } a + b = a' + b'. \tag{1.58}$$

For, $(a + b) - (a' + b') = (a - a') + (b - b') = 0$. This allows us to say that if two equations are satisfied then their sum is satisfied as well.
• Absolute value

Next, let us introduce the following

Definition 22 (Absolute value). The absolute value of a number, say $a$, is denoted by $|a|$ and is defined as follows:

$$|a| = \begin{cases} a, & \text{if } a \text{ is positive or zero;} \\ -a, & \text{if } a \text{ is negative.} \end{cases} \tag{1.59}$$
◦ Warning. The definition applies as well to integers and rationals. For future reference, note that | a × b | = | a | × | b |.
(1.60)
[Additional rules regarding the use of the absolute value are presented later (Eqs. 1.70–1.72).]
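Definition 22 translates directly into a few lines of code. Here is a minimal sketch in Python (the function name `absolute_value` and the sample values are hypothetical, chosen only for illustration). It implements Eq. 1.59 as written and then spot-checks Eq. 1.60 on a handful of integers and rationals, using exact fractions so that no rounding interferes.

```python
from fractions import Fraction


def absolute_value(a):
    """Eq. 1.59: return a if a is positive or zero, and -a if a is negative."""
    return a if a >= 0 else -a


# A few integers and rationals, as allowed by the Warning above.
samples = [Fraction(-7, 3), Fraction(0), Fraction(5, 2), -4, 9]

# Eq. 1.60: |a x b| = |a| x |b| for every pair in the sample.
product_rule_holds = all(
    absolute_value(a * b) == absolute_value(a) * absolute_value(b)
    for a in samples
    for b in samples
)
print(product_rule_holds)
```

Python also offers the built-in `abs`, which does the same job; the explicit version is shown only because it mirrors the two cases of the definition.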
1.6.2 Inequalities Here, we consider inequalities. Recall that real numbers may be positive, or negative, or zero (Definition 21, p. 39). We have the following
Definition 23 (The symbols >, <, ≥ and ≤). We introduce the symbols > and <, and agree that to say that $a > 0$ (to be read as “a is greater than zero”) means that $a$ is positive, whereas to say $a < 0$ (to be read as “a is less than zero”) means that $a$ is negative. Similarly, we introduce the symbols ≥ and ≤, and agree that to say that $a \ge 0$ (to be read “a is greater than or equal to zero”) means that $a$ is nonnegative, whereas to say that $a \le 0$ (to be read “a is less than or equal to zero”) means that $a$ is nonpositive.

Accordingly, Eq. 1.20 may be restated as follows

$$\begin{aligned} a \times b > 0 &\quad\text{iff}\quad (a > 0 \text{ and } b > 0) \text{ or } (a < 0 \text{ and } b < 0); \\ a \times b < 0 &\quad\text{iff}\quad (a > 0 \text{ and } b < 0) \text{ or } (a < 0 \text{ and } b > 0). \end{aligned} \tag{1.61}$$
We also have the following

Definition 24 (The inequality a > b). We say that

$$a > b \quad\text{iff}\quad a - b > 0. \tag{1.62}$$
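Similar definitions apply to the other types of inequalities. Accordingly, for $a$, $b$, $c$ real, we have

$$a > b \quad\text{is equivalent to}\quad a + c > b + c, \tag{1.63}$$

$$a > b \quad\text{is equivalent to}\quad a \times c > b \times c \qquad (c > 0), \tag{1.64}$$

$$a > b \quad\text{is equivalent to}\quad a \times c < b \times c \qquad (c < 0), \tag{1.65}$$

$$a > b \text{ and } b > c \quad\text{imply}\quad a > c. \tag{1.66}$$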
[Hint: The first three equations are an immediate consequence of the definition in Eq. 1.62 (use also Eq. 1.61). For the fourth one, use Eq. 1.62 to rewrite both inequalities on the left of Eq. 1.66 as a − b > 0 and b − c > 0. Then, add the two left sides and use the fact that the sum of two positive numbers is positive. The end result is a − c > 0, which is equivalent to a > c, as in Eq. 1.66.] ◦ Comment. The first statement (Eq. 1.63) says that we can add any number to both sides of an inequality. The second and third say that we can multiply any inequality by a positive real number, whereas if we multiply an inequality by a negative real number, the inequality gets reversed (namely the > sign is replaced by the < sign). The last one says that if a is greater than b, which in turn is greater than c, then a is also greater than c. Only slightly more complicated are the following inequalities
$$a > b \quad\text{is equivalent to}\quad \frac{1}{b} > \frac{1}{a} \qquad (a, b > 0), \tag{1.67}$$

$$a > b \quad\text{is equivalent to}\quad a^n > b^n \qquad (a, b, n > 0), \tag{1.68}$$
where a and b are real, and n integer. To prove the first one, let us multiply both sides of a > b by 1/(a b) > 0. This is a legitimate operation, because of Eq. 1.64. This yields 1/b > 1/a, as in Eq. 1.67. On the other hand, to prove Eq. 1.68, use Eq. 1.64 to write a > b as a/b > 1.
(1.69)
Multiplying this by $a/b$, we have $a^2/b^2 > a/b$. Then, use again $a/b > 1$ to obtain $a^2/b^2 > a/b > 1$. This way, we have obtained $a^2/b^2 > 1$ (use Eq. 1.66), in agreement with Eq. 1.68 with $n = 2$. You may repeat the procedure and obtain, successively, $a^3 > b^3$, $a^4 > b^4$, . . . , $a^n > b^n$, with $n$ as large as you want. [If we want to be rigorous, we should use the principle of mathematical induction, which we will address in Subsection 8.2.5. However, the above proof suffices here.]

For future reference, note that if $a$ and $\pm b$ have the same sign then $|a \pm b|$ equals $|a| + |b|$; on the other hand, if $a$ and $\pm b$ have opposite signs then $|a \pm b|$ equals $\bigl|\,|a| - |b|\,\bigr|$. We may exploit this to write

$$\bigl|\,|a| - |b|\,\bigr| \le |a \pm b| \le |a| + |b|. \tag{1.70}$$

This is sort of an overkill, because the expression in the middle can take only the two outer values: the left equal sign occurs when the signs of $a$ and $\pm b$ are opposite, the right one when they are equal. Nonetheless, it is a useful inequality to have for future reference. Next, replacing $a$ with $a + c$ and $b$ with $b + c$, Eq. 1.70 (taken with the minus sign) yields
$$\bigl|\,|a + c| - |b + c|\,\bigr| \le |a - b| \le |a + c| + |b + c|. \tag{1.71}$$

The left equal sign occurs when the signs of $a + c$ and $b + c$ are equal, the right one when they are opposite, as you may verify. Also for future reference, note that (use Eq. 1.70, as well as $|a \times b| = |a| \times |b|$, Eq. 1.60)

$$|\alpha \times a + \beta \times b| \le |\alpha| \times |a| + |\beta| \times |b|. \tag{1.72}$$

[If you are not familiar with the Greek alphabet, the symbols α and β denote the Greek letters alpha and beta (Section 1.8).]
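If you like to see rules at work before trusting them, the inequalities of this subsection lend themselves to a brute-force check. What follows is a minimal sketch in Python, under my own assumptions: the function names, the sample values and the range of exponents are hypothetical, and exact fractions are used so that rounding plays no role. It tests Eqs. 1.63 to 1.68 and Eqs. 1.70 to 1.72 on a small grid of rational numbers. Passing such a test proves nothing in general, but a failure would expose a mistake immediately.

```python
from fractions import Fraction
from itertools import product

# A small grid of positive, negative and zero rationals (illustrative choice).
values = [Fraction(n, d) for n in range(-4, 5) for d in (1, 2, 3)]
coarse = [Fraction(n, 2) for n in range(-3, 4)]


def check_order_rules(values):
    """Spot-check Eqs. 1.63-1.68 on every triple a, b, c with a > b."""
    for a, b, c in product(values, repeat=3):
        if a > b:
            assert a + c > b + c                      # Eq. 1.63
            if c > 0:
                assert a * c > b * c                  # Eq. 1.64
            if c < 0:
                assert a * c < b * c                  # Eq. 1.65 (reversed)
            if b > c:
                assert a > c                          # Eq. 1.66
            if b > 0:                                 # here a > b > 0
                assert 1 / b > 1 / a                  # Eq. 1.67
                for n in range(1, 6):
                    assert a ** n > b ** n            # Eq. 1.68
    return True


def check_absolute_value_bounds(values, coarse):
    """Spot-check Eqs. 1.70-1.72, with the outer absolute value on the left."""
    for a, b, c in product(values, repeat=3):
        lower = abs(abs(a) - abs(b))
        upper = abs(a) + abs(b)
        assert lower <= abs(a + b) <= upper           # Eq. 1.70, plus sign
        assert lower <= abs(a - b) <= upper           # Eq. 1.70, minus sign
        assert (abs(abs(a + c) - abs(b + c))
                <= abs(a - b)
                <= abs(a + c) + abs(b + c))           # Eq. 1.71
    for alpha, a, beta, b in product(coarse, repeat=4):
        assert (abs(alpha * a + beta * b)
                <= abs(alpha) * abs(a) + abs(beta) * abs(b))  # Eq. 1.72
    return True


print(check_order_rules(values), check_absolute_value_bounds(values, coarse))
```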
1.6.3 Bounded and unbounded intervals

Consider the following12

Definition 25 (Peano epsilon). The symbol ∈, known as the Peano epsilon, means “belongs to.”

We have the following

Definition 26 (Bounded interval). Consider all the real numbers between $a$ (called the lower boundary) and $b$ (called the upper boundary), where $b > a$. The collection of all these numbers is called an interval. We use the following notation, depending upon whether the numbers $a$ and/or $b$ are included in the interval:

$$\begin{aligned} x \in (a, b) &\quad\text{to mean}\quad a < x < b, \\ x \in [a, b] &\quad\text{to mean}\quad a \le x \le b, \\ x \in (a, b] &\quad\text{to mean}\quad a < x \le b, \\ x \in [a, b) &\quad\text{to mean}\quad a \le x < b. \end{aligned} \tag{1.73}$$
All the intervals above are called bounded. The intervals $(a, b)$ and $[a, b]$ are called, respectively, open and closed.

12 The symbol ∈ comes from ε, which denotes the Greek letter epsilon (Section 1.8, on the Greek alphabet). It is the first letter of the word ἐστίν (estin, ancient Greek for “is”). It was introduced by the Italian mathematician Giuseppe Peano (1858–1932), in 1899 (Ref. [52]). Peano also introduced a form for the remainder of Taylor polynomials (Eq. 13.16). His main contribution is the axiomatic theory of natural numbers, namely 0, 1, 2, etc. In his axiomatic approach, he uses the terms number, zero, and immediate successor as primitive terms. His five postulates are: (i) zero is a number; (ii) the immediate successor of a number is a number; (iii) zero is not the immediate successor of a number; (iv) no two numbers have the same immediate successor; (v) any property belonging to zero, and also to the immediate successor of every number that has that property, belongs to all the numbers. The fifth Peano postulate corresponds to the principle of mathematical induction, which we will address in Subsection 8.2.5. Incidentally, the five Peano postulates were the subject of the first day of classes I had in college, fresh out of high school, in the course of Mathematical Analysis I, taught by Prof. Gaetano Fichera. I remember him saying (jokingly?) that, when he got to page 2 of a book he was writing (in a cave, as a partisan during World War II), he wondered how he could justify the page numbering, since he had not yet introduced the number 2. For this reason, he said, he didn’t use the postulates. [Iff I understood him correctly, of course! The class was pretty big, several hundred students, and the acoustics was lousy.]

Sometimes, it may be of interest to consider all the numbers $x$ such that $x > a$ (or $x \ge a$), as well as $x < b$ (or $x \le b$). In this case, we use the following

Definition 27 (Unbounded interval. The infinity symbol ∞). The symbol ∞, to be read infinity, is used in the following notations:

$$\begin{aligned} x \in (a, \infty) &\quad\text{to mean}\quad a < x, \\ x \in [a, \infty) &\quad\text{to mean}\quad a \le x, \\ x \in (-\infty, b) &\quad\text{to mean}\quad x < b, \\ x \in (-\infty, b] &\quad\text{to mean}\quad x \le b, \\ x \in (-\infty, \infty) &\quad\text{to mean any number.} \end{aligned} \tag{1.74}$$

All the intervals above are called unbounded.

Remark 10. In this book, ∞ is not a real number. It is only a convenient symbol used to indicate an unbounded interval. [More uses will be introduced later.]
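The notation of Eqs. 1.73 and 1.74 boils down to four yes-or-no questions: is each end included, and is each end finite? Here is a minimal sketch in Python of that bookkeeping. The class name `Interval` and its field names are hypothetical, and `math.inf` is used only as a stand-in for the symbol ∞ (in line with Remark 10, it marks an unbounded end and is never treated as a member of an interval in the examples).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lower: float                 # a, or -math.inf for an unbounded left end
    upper: float                 # b, or math.inf for an unbounded right end
    lower_closed: bool = False   # True for "[", False for "("
    upper_closed: bool = False   # True for "]", False for ")"

    def __contains__(self, x):
        """Implement 'x belongs to the interval' as in Eqs. 1.73 and 1.74."""
        above = x >= self.lower if self.lower_closed else x > self.lower
        below = x <= self.upper if self.upper_closed else x < self.upper
        return above and below

    def is_bounded(self):
        return math.isfinite(self.lower) and math.isfinite(self.upper)


open_unit = Interval(0, 1)                                            # (0, 1)
closed_unit = Interval(0, 1, lower_closed=True, upper_closed=True)    # [0, 1]
half_line = Interval(0, math.inf, lower_closed=True)                  # [0, oo)

print(0 in open_unit, 0 in closed_unit, 10**9 in half_line)
print(open_unit.is_bounded(), half_line.is_bounded())
```

The only design choice worth noting is that the two ends are handled independently, which is exactly why four bounded and five unbounded cases can be covered by a single definition.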
1.6.4 Approximating irrationals with rationals
♥
For any real number, $a$, we can find a sequence of rational numbers that gets closer and closer to $a$. Specifically, any real number may be “identified” as closely as we wish by using rational numbers. For instance, we have

$$1.414\,213\,562\,373\,095\,048\,801\,688\,7 < \sqrt{2} < 1.414\,213\,562\,373\,095\,048\,801\,688\,8.$$

Note the generality of this approach. Indeed, given $a$, for any prescribed positive integer $q$ (arbitrarily large), there always exists a unique integer $p$ such that the product $q\,a$ lies between $p$ and $p + 1$, namely

$$p \le q\,a < p + 1, \tag{1.75}$$

or

$$\frac{p}{q} \le a < \frac{p+1}{q}, \tag{1.76}$$

which is read “a falls between p/q and (p + 1)/q,” or “a is bracketed by p/q and (p + 1)/q.” The length of the corresponding interval (namely $1/q$) is called the bracket. Accordingly, we say that any real number may be bracketed by two rational numbers, with a bracket as small as we wish. For example, if $a = \sqrt{2}$ (namely $a^2 = 2$), given any $q$ as large as we wish, Eq. 1.75 may be satisfied by finding $p$ such that $p^2 < 2\,q^2 < (p + 1)^2$. Thus, we have to look at the list of the squares of the integers and find out which is the unique number $p$ whose square is less than $2\,q^2$ (a known quantity), whereas the square of the following one is greater.
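The search through “the list of the squares of the integers” described above is easy to automate. The following is a minimal sketch in Python (the function name `bracket_sqrt2` is hypothetical). It relies on the standard-library function `math.isqrt`, which returns the largest integer whose square does not exceed its argument, so that p = isqrt(2 q²) is precisely the integer required by Eq. 1.75 for a = √2. Only integers and exact fractions are used.

```python
from fractions import Fraction
from math import isqrt


def bracket_sqrt2(q):
    """Return the rationals p/q and (p + 1)/q that bracket sqrt(2), as in Eq. 1.76."""
    p = isqrt(2 * q * q)   # the unique p with p*p <= 2*q*q < (p + 1)*(p + 1)
    return Fraction(p, q), Fraction(p + 1, q)


for q in (10, 1000, 10**6):
    lo, hi = bracket_sqrt2(q)
    # The bracket hi - lo equals 1/q, so it shrinks as q grows.
    print(q, lo, hi, hi - lo)
```

For instance, with q = 1000 the sketch finds p = 1414, since 1414² = 1 999 396 is less than 2 000 000, whereas 1415² = 2 002 225 is greater.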
Remark 11. This approach, namely “getting closer and closer without ever quite reaching” (that is, “bracketing something, with a bracket that may be made as small as we wish,” as in Eq. 1.76) is a very powerful tool and will be used again and again, for instance in connection with similar triangles (Theorem 42, p. 184), or in generalizing the definition of power $a^b$ to the case in which $a$ and $b$ are both real numbers (Section 6.5). [It should be pointed out that such an approach is an anticipation (a “sneak preview”) of the notion of limit addressed in Chapter 8.]
1.6.5 Arithmetic with real numbers
♥
For real numbers, we cannot use beads in a jar. Thus, the discussion of the rules of arithmetic within the field of real numbers has to be modified. [Of course, this is true for rational numbers as well.] The plan of action is as follows: instead of using beads to represent (natural) numbers, I will use geometry to obtain the rules of arithmetic. Remark 12 (Compass and straight edge). As a matter of fact, this approach was used by the ancient Greeks to solve algebraic problems (geometric algebra). In those days, algebra had not yet been introduced. All the geometrical and algebraic proofs had to be obtained necessarily only by using compasses and straight edges.
• Spaces. Points along a line vs numbers. Distance In order to achieve the above objective, let us begin by considering the notion of space and its dimensions. In classical mechanics, the world we live in has three dimensions. What does that mean? Let’s begin with some examples. To identify the location of a point on the Earth, we need latitude, longitude and altitude above (or below) the sea level. You may also think of height, width and length of a room, or the use of the terms back–and–forth, left–and–right and up–and–down. All we need to identify your location are three numbers. This is what we are referring to when we use the term three–dimensional space. Were you to live on a surface, akin to the inhabitants of Flatland, described by Edwin A. Abbott (1838–1926) in his 1884 book Flatland: A Romance of Many Dimensions (Ref. [1]), your world would be a two–dimensional space. You would need two numbers to identify the location of a point in your space.
If you were living along a line (like yourself in a car while driving along a highway) your world would be a one–dimensional space. You would need only one number to identify the location of a point (namely the location of your car along the highway).

To make our life simple, let us begin by addressing one–dimensional spaces, which is all we need now. Specifically, here we address the relationship between numbers and points along an infinite line. To proceed, let us introduce the so–called unit length, which is used to measure lengths, namely the distances between two points, measured along the line. If you think of yourself traveling along a highway, you would use, as your unit length, a mile or a kilometer. If you measure the length of a piece of cloth, you could use a yard or a meter. You could also use the combined length of your spread arms, which corresponds to two yards in Leonardo da Vinci’s Vitruvian Man.13 Or you could use the distance from your nose to your hand.14

The next item of business is to introduce a special point on the string from which we measure distances (any point would do, as long as we pick one). We will refer to this point as the origin of our one–dimensional world and denote it by $O$. Moreover, let us choose one of the two directions that you can travel along the line starting from the origin, and call it the positive direction. [Typically, for a horizontal line on a piece of paper the positive direction is to the right of $O$.]

Finally, consider the point along the positive direction of the string that has a unit distance from the origin. We will refer to this as Point $P_1$. We can repeat the process and identify a second point that has a unit distance from $P_1$, also along the positive direction, so that its distance from the origin equals twice the unit length. We will refer to this as Point $P_2$. We may perform this process $n$ times and identify $n$ points $P_k$ ($k = 1, 2, \ldots, n$), whose distance from the origin equals $k$ times the unit length. These points are in a one–to–one correspondence (Definition 12, p. 32) with the first $n$ natural numbers, $1, 2, \ldots, n$. If you go in the negative direction (namely the opposite one), you can introduce a one–to–one correspondence between $P_{-1}, P_{-2}, \ldots, P_{-n}$, and the negative integers, $-1, -2, \ldots, -n$. As $n$ gets larger and larger, we have generated a one–to–one correspondence between the integers and the points $O, P_{\pm 1}, P_{\pm 2}, P_{\pm 3}, P_{\pm 4}, \ldots$, as depicted in Fig. 1.1. [I never said that the line had to be straight. Think of the line on a map representing a given highway.]

13 Leonardo da Vinci (1452–1519) was an Italian artist and scientist. He had strong interests in a variety of subjects, which include painting, sculpture, architecture, mathematics, engineering, anatomy and astronomy, just to name a few. He is widely considered as the ideal Renaissance man.

14 Supposedly, the yard was introduced by Henry I of England (c. 1068–1135), as the length of a string stretched from the tip of his nose to the end of his thumb. The tradition even says that a foot corresponds to the length of his foot, and an inch to the width of his thumb. [In some languages, a single term is used for both inch and thumb. For instance, in Italian the term “pollice” is used for both.] However, the origin of these terms appears to be much more complicated, going back to the ancient Romans.
Fig. 1.1 Real line
Next, let us divide the portion of the string between $O$ and $P_1$ into ten equal sub–portions. This way, we have generated a one–to–one correspondence between the nine points at the sub–portion boundaries and the numbers 0.1 = 1/10, 0.2 = 2/10, . . . , 0.9 = 9/10. We may repeat the same construction for every portion of the string, say between $P_n$ and $P_{n+1}$, with $n \in (-\infty, \infty)$. Accordingly, we have constructed a one–to–one correspondence between the points introduced so far and all the decimal numbers with only one digit after the decimal point. We may repeat the same procedure for the resulting sub–portions of the line and construct a correspondence between points and all the decimal numbers with two digits after the decimal point (such as dollars and cents). We may continue this way and generate a one–to–one correspondence between all the real numbers and all the points on the line. [Of course, this is only an ideal construction, since (sooner or later) it will involve scale lengths that cannot even be measured.]15

15 Let us be sophisticated, for once. The one–to–one correspondence of real numbers and points on a line is known as the Cantor–Dedekind axiom, after Georg Cantor and Julius Wilhelm Richard Dedekind (1831–1916), a German mathematician who made important contributions to abstract algebra.

Remark 13 (Difference between $\mathbb{E}^n$ and $\mathbb{R}^n$) ♥. We should emphasize the importance of the introduction of the origin. Unless we introduce the origin, we cannot assign a number to the location of a point $P$ on the line. In high school, you probably had a course in Euclidean geometry (which is reviewed in Chapter 5). The corresponding spaces are called Euclidean, and are denoted by $\mathbb{E}^n$, where $n = 1, 2, 3$ is the number of dimensions. In a Euclidean space, there is no origin. In fact, the locations of the various points on the plane (or in space) are irrelevant. For, we are not interested in the location of a point, but only in the relative positions of the various points, specifically in their distances. On the contrary, once we introduce the origin, we also have a way to identify the location. In the one–dimensional world under consideration, a real number $x$ is all we need for this, provided of course that we introduce the origin and the positive direction. In Chapter 7, this is extended to two and three dimensions, by introducing Cartesian axes and Cartesian components to identify the location of a point, such as $P = (x, y, z)$. Such spaces are called Cartesian, and are denoted by $\mathbb{R}^n$, where $n = 1, 2, 3$ is the number of dimensions.
• Arithmetic rules for real numbers We may use the “getting closer and closer process,” used to go from rational to real numbers (Eq. 1.76), to extend the rules of algebra from rational to real numbers, as you may verify. However, here I propose an approach, which I find to be more gut– level (and so did Pierino’s and Giorgetto’s older brother, Filippo, nicknamed Pippo). Specifically, using the one–to–one correspondence introduced above, we can extend to real numbers all the rules that we have obtained thus far for integers and rational numbers. [This approach provides a gut–level understanding of the rules of algebra for rationals as well.] ◦ Warning. Here, we assume that the line is straight, so that the intervals correspond to segments. [I know! I am violating my own rule of not using a term before it is defined. Indeed, straight lines and segments will be formally introduced later (Definition 80, p. 176). What the heck! Nobody is perfect! You understand me anyway, don’t you? If not, go and read Chapter 5.] In particular, for additions, let us consider a horizontal straight line. Then, for visualizing, say a + b, we start from Pa and add an oriented segment (for a formal definition see again Definition 80, p. 176) with length b, which goes from left to right for b positive, whereas for b negative we add an oriented segment that goes from right to left. For subtractions, we use the rule that subtracting a positive (negative) number is equivalent to adding the corresponding negative (positive) one. Then, we may use the reasoning employed for integers (beads in a jar) and obtain that all the rules regarding addition and subtraction apply to real numbers as well, as you may verify. 
In Subsection 1.3.1, on arithmetic with natural numbers, for visualizing multiplications, we conceived the number m × n as representing the total number of beads that end up in an initially empty jar, when you put in the jar, first m beads, then m more additional beads, and then m more beads, for
a total of $n$ times. This was visualized as $n$ rows with $m$ beads each. As an alternative, we could replace each bead with a unit square (namely a square with sides equal to the unit length), to which we assign a unit area. Then, consider $n$ rows with $m$ unit squares each. We have $m \times n$ unit squares, to which we assign an area $A = m \times n$. By extrapolation, for visualizing multiplications between real numbers, say $a \times b$, we can use the area of a rectangle of sides $a$ and $b$, namely $A = a \times b$. Then, all the tricks I came up with to prepare Pierino for his exam may be used to obtain the equivalent rules for real numbers (of course, using lengths and areas instead of beads and rows of beads).

Remark 14. ♣ If you would like to see how a simple idea may be made complicated, here is the lengthy version of the above extrapolation for multiplications of real numbers. Let us start with a unit square, namely a square with unit sides. We define its area to be equal to one, namely $A = 1$. Then, consider a rectangle with sides equal respectively to $p_1$ and $p_2$, where $p_1$ and $p_2$ are positive integers. We identify its area with the number of unit squares that it contains, namely $A = p_1 \times p_2$. [In other words, we state that areas are additive, in the sense that the overall area of two distinct (specifically, not overlapping, with the possible exception of boundary points) geometrical figures equals the sum of the area of each one of them (additive property).]

Next, consider a rectangle whose sides $r_1$ and $r_2$ are equal to rational numbers, say, $r_1 = p_1/q_1$ and $r_2 = p_2/q_2$. In order to come up with the definition of the area for this case, consider a “mini–square” with sides equal to $1/(q_1 \times q_2)$. There are $(q_1 \times q_2)^2$ of these squares in a unit square. Thus, we can define the area of this mini–square to be $A = 1/(q_1 \times q_2)^2$. Accordingly, we have

$$r_1 = \frac{p_1}{q_1} = \frac{p_1 \times q_2}{q_1 \times q_2}. \tag{1.77}$$

Thus, we can align $p_1 \times q_2$ mini–squares along the side $r_1$. Similarly, we can fit $p_2 \times q_1$ mini–squares along the side $r_2$. Therefore, our rectangle contains $(p_1 \times q_2) \times (p_2 \times q_1)$ mini–squares. Accordingly, the area is

$$A = \frac{(p_1 \times q_2) \times (p_2 \times q_1)}{(q_1 \times q_2)^2} = \frac{p_1}{q_1} \times \frac{p_2}{q_2} = r_1 \times r_2. \tag{1.78}$$
For instance, let us check this out against a rectangle with sides $r_1 = 5/3$ and $r_2 = 1/2$, for which you can actually make a drawing. There are $5 \times 2 = 10$ mini–squares of size $1/(q_1 \times q_2) = 1/6$ in direction $r_1$, and 3 in direction $r_2$, for a total of $10 \times 3 = 30$ mini–squares. Each mini–square has an area equal to $(1/6)^2 = 1/36$. Accordingly, the total area is equal to $30/36 = 5/6$.
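The counting argument of Remark 14 can be replayed mechanically. Below is a minimal sketch in Python (the function name `area_by_mini_squares` and its argument names are hypothetical, introduced only for illustration). It counts the mini–squares along each side, multiplies by the area of one mini–square as in Eq. 1.78, and compares the result with the plain product of the two sides for the 5/3 by 1/2 rectangle discussed above.

```python
from fractions import Fraction


def area_by_mini_squares(p1, q1, p2, q2):
    """Area of a rectangle with sides p1/q1 and p2/q2, by counting mini-squares."""
    count_along_r1 = p1 * q2                      # mini-squares along side r1 (Eq. 1.77)
    count_along_r2 = p2 * q1                      # mini-squares along side r2
    mini_area = Fraction(1, (q1 * q2) ** 2)       # area of one mini-square
    return count_along_r1 * count_along_r2 * mini_area


area = area_by_mini_squares(5, 3, 1, 2)
print(area, area == Fraction(5, 3) * Fraction(1, 2))
```

The count is 10 × 3 = 30 mini–squares of area 1/36 each, which reduces to 5/6, in agreement with the direct product.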
Next, comes a question from me to you: “What would you do if a and b are irrational?” [Hint: Use Remark 11, p. 44, on the process of getting closer and closer. For a formal definition of rectangles, see Subsection 5.6.1.] Accordingly, we can generalize, to real numbers, the natural–number property m × n = n × m (second in Eq. 1.6) into a × b = b × a.
(1.79)
Similarly, following the same approach (in particular, the one–to–one correspondence between real numbers and points on a line, introduced above), you may show that all the properties of the operations obtained for natural numbers, integers, and rational numbers hold as well for real numbers. [Alternatively, you may use the approach in Subsection 1.6.7 (axiomatic approach, something I do not recommend at this point in our journey).]
1.6.6 Real numbers are uncountable
♥
For the sake of completeness, I include the following16

Theorem 9 (Real numbers are uncountable). Real numbers in the interval (0, 1) are uncountable.

◦ Proof: We prove this by contradiction (Definition 20, p. 39). Assume (to obtain a contradiction) that the real numbers in the interval (0, 1) are countable, namely that they can be arranged so as to be in a one–to–one correspondence with the natural numbers. Thus, we can list them as $x_1, x_2, \ldots$. The contradiction is obtained by constructing a number $a$ that is not included in such a list. Let $a_j$ and $x_{kj}$ denote the j-th digit in the decimal representation for $a$ and $x_k$, respectively. [In other words, $a$ and $x_k$ may be expressed as $a = 0.a_1 a_2 \ldots$ and $x_k = 0.x_{k1} x_{k2} \ldots$.] Next, let us choose the first digit of $a$ to be different from that of $x_1$ (namely $a_1 \neq x_{11}$), so that $a \neq x_1$. Then, choose the second digit of $a$ different from that of $x_2$ (namely $a_2 \neq x_{22}$), so that $a \neq x_2$. Continuing like this and choosing the k-th digit of $a$ to be different from the k-th digit of $x_k$ (namely $a_k \neq x_{kk}$, so that $a \neq x_k$), we have generated a number that differs from all the $x_k$, in contradiction to the assumption that the sequence $x_1, x_2, \ldots$ includes all the real numbers, thereby proving that the real numbers in (0, 1) are uncountable.

16 The procedure used in this theorem is known as the Cantor diagonal process, named after the German mathematician Georg Ferdinand Ludwig Philipp Cantor (1845–1911). Cantor’s mathematics was extremely innovative, truly amazing, and highly controversial at the time. He introduced the notions of one–to–one correspondence, countable and uncountable sets. His more advanced contributions (set theory with its paradoxes, transfinite numbers, $\aleph_0, \aleph_1, \aleph_2, \ldots$, and the corresponding cardinal numbers), are way beyond the level of this book. Just to give you a glimpse of his work, he proved that there are infinitely many possible sizes (cardinality) for infinite sets. The amazing third–middle set (introduced in Vol. III) is also named after him. If you want to know more about Cantor and his feuds with Kronecker, you might like to read Chapter 6 of the book Great Feuds in Mathematics: Ten of the Liveliest Disputes Ever, by Hal Hellman (Ref. [30]).
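The diagonal construction in the proof of Theorem 9 can be tried out on a finite list. What follows is a minimal sketch in Python, under assumptions of my own: the function name `diagonal_number`, the five sample digit strings, and the rule “use 5, unless the digit to avoid is 5, in which case use 6” are all illustrative choices (any rule producing a different digit would do). A computer can only handle finitely many numbers and digits, so this illustrates the mechanism and is not a substitute for the proof.

```python
def diagonal_number(listed):
    """Build the digits of a number that differs from listed[k] in its k-th digit.

    Each entry of `listed` holds the digits after the decimal point of one
    number in (0, 1), and must have at least as many digits as its position.
    """
    digits = []
    for k, x in enumerate(listed):
        avoided = x[k]                      # the k-th digit of the k-th number
        digits.append("5" if avoided != "5" else "6")
    return "".join(digits)


# Digits after the decimal point of x1, x2, ..., x5 (hypothetical sample list).
listed = ["50000", "33333", "14159", "71828", "41421"]
a = diagonal_number(listed)
print("0." + a)
```

By construction, the number obtained differs from the k-th entry at the k-th digit, so it cannot coincide with any entry of the list.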
1.6.7 Axiomatic formulation
♠
I have repeatedly stated that I do not use the axiomatic approach, which was mentioned in Footnote 12, p. 43, in connection with the Peano postulates. Here, however, I’ll give you a glimpse of it, so you have an idea of what I am trying to avoid.

Definition 28. Consider a collection $A$ of mathematical entities equipped with two operations (addition “+” and multiplication “×”) such that for every pair $r$ and $s$ in the collection, the sum $r + s$ and the product $r \times s$ belong to the collection. The collection is called a field iff the following axioms hold:

1. If $r = r' \in A$ and $s = s' \in A$, then $r + s = r' + s'$ and $r\,s = r'\,s'$.
2.a. $r + s = s + r$, for all $r$ and $s \in A$.
2.b. $r \times s = s \times r$, for all $r$ and $s \in A$.
3.a. $(r + s) + t = r + (s + t)$, for all $r, s, t \in A$.
3.b. $(r \times s) \times t = r \times (s \times t)$, for all $r, s, t \in A$.
4. $r \times (s + t) = r \times s + r \times t$, for all $r, s, t \in A$.
5.a. There exists an element 0, such that $r + 0 = r$, for all $r \in A$.
5.b. There exists an element 1, such that $1 \times r = r$, for all $r \in A$.
6.a. For each $r \in A$, there is a $(-r) \in A$, called the opposite of $r$, such that $r + (-r) = 0$.
6.b. For each $r \neq 0 \in A$, there is an $r^{-1} \in A$, called the inverse of $r$, such that $r^{-1} \times r = 1$.
7. If $r \neq 0$ and $r\,s = r\,t$, then $s = t$.

Delving deeper as to why I avoid the axiomatic approach, I should emphasize that the axioms are supposed to satisfy two conditions. Specifically, they have to be: (i) non–contradictory and (ii) independent. The first property means that no axiom contradicts the others. The second property means that we should make sure that we have not included as an axiom a fact that can be obtained from other axioms. [This is not an easy task. For instance, several great mathematicians tried and failed to prove that the Euclid fifth postulate can be obtained from the first four. More on the Euclid fifth postulate in Remark 47, p. 182.]

Accordingly, some trivial results have to be derived from the original set of axioms, because the additional results cannot be included with the original axioms. [Were we to do this, the resulting set of axioms would violate the rule of their being independent.] For instance, $r + t = s + t$ implies $0 = (r + t) - (s + t) = (r - s) + (t - t) = r - s$, namely $r = s$, that is, any number may be subtracted from an equality. [Hint: Use Axioms 1 and 3.a.] Also, $0 \times r + r = (0 + 1) \times r = r$ implies $0 \times r = 0$. Similarly, the fact that $-1 \times r$ equals $-r$ stems from $-1 \times r + r = (-1 + 1) \times r = 0$. [If you want to have an idea of how detailed and meticulous this process is, you might like to consult the book A Survey of Modern Algebra, by Birkhoff and Mac Lane (Ref. [12]).]

◦ Comment. The collection of all the real numbers satisfies the axiom properties in Subsection 1.6.7, and hence it is a field. The same holds true for rational numbers (and even for complex numbers, addressed in the next section). Accordingly, I will often use the expression in the real field (or in the complex field).
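To make the field axioms of Definition 28 less abstract, here is a minimal sketch in Python that spot-checks them on a small sample of rational numbers, which (as stated in the Comment above) form a field. The function name `satisfies_field_axioms` and the sample values are hypothetical. Exact fractions are used, so equalities are tested exactly. A check on a finite sample verifies nothing in general; it only shows what each axiom asks for, along with the two derived results mentioned above.

```python
from fractions import Fraction
from itertools import product

# A small sample of rationals, including zero and negative values.
sample = [Fraction(n, d) for n in (-3, -1, 0, 2, 5) for d in (1, 2, 7)]


def satisfies_field_axioms(elements):
    """Spot-check Axioms 2-6 and two derived results on the given elements."""
    zero, one = Fraction(0), Fraction(1)
    for r, s, t in product(elements, repeat=3):
        if r + s != s + r or r * s != s * r:                  # Axioms 2.a, 2.b
            return False
        if (r + s) + t != r + (s + t):                        # Axiom 3.a
            return False
        if (r * s) * t != r * (s * t):                        # Axiom 3.b
            return False
        if r * (s + t) != r * s + r * t:                      # Axiom 4
            return False
    for r in elements:
        if r + zero != r or one * r != r:                     # Axioms 5.a, 5.b
            return False
        if r + (-r) != zero:                                  # Axiom 6.a
            return False
        if r != zero and (1 / r) * r != one:                  # Axiom 6.b
            return False
        if zero * r != zero or (-one) * r != -r:              # derived results
            return False
    return True


print(satisfies_field_axioms(sample))
```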
1.7 Imaginary and complex numbers So far, we have addressed natural numbers, integers, rationals and real numbers (Sections 1.3–1.6). Do these encompass all the useful numbers? Not quite! Here, we deal with imaginary and complex numbers. Complex numbers will not be needed until we address quadratic equations (Subsection 6.3.3). Accordingly, I recommend that you skip reading the rest of this section and come back to it when needed.
1.7.1 Imaginary numbers
♣
We run into problems, for instance, when we try to find the square root of a negative number. This leads to the introduction of imaginary numbers, in particular the number $\imath = \sqrt{-1}$, which by definition is such that

$$\imath^2 = -1. \tag{1.80}$$

Next, let us introduce the imaginary numbers as the product of $\imath$ by a real number.
Imaginary numbers well deserve their name, since they are truly a product of our imagination. They make the term “real numbers” quite appropriate in comparison.
1.7.2 Complex numbers
♣
Here, we introduce complex numbers.

Remark 15. Complex numbers deserve their name. Indeed, they present a lot of complexities (play on words intended). They are the most complicated numbers that we encounter in this book. In spite of their lack of connection with physical reality, they can really be quite handy. Often their use simplifies considerably the mathematical manipulations needed to obtain the desired result. Indeed, after the initial shock, you will find out that they end up making our lives much, much simpler. Complex numbers may be very useful, even when the starting point and the end result are both real. Indeed, much of this book could not have been written without the use of complex numbers. This provides you with a glimpse of a very important lesson regarding mathematics: the more sophisticated your math is, the simpler it is to obtain the results. This whole book testifies to that.

We have the following

Definition 29 (Complex number). Complex numbers consist of the collection of all the numbers obtained as the sum of a real number and an imaginary number, namely

$$z := x + \imath\, y, \tag{1.81}$$

where the real numbers $x$ and $y$ are called respectively the real part and the imaginary part of $z$. They are denoted as follows

$$x = \Re(z) \quad\text{and}\quad y = \Im(z). \tag{1.82}$$

The symbol $\mathbb{C}$ denotes the collection of all the complex numbers. The complex conjugate of $z$, denoted by an overbar (namely $\bar{z}$), is defined as

$$\bar{z} := x - \imath\, y. \tag{1.83}$$

Note that

$$x = \Re(z) = \frac{1}{2}\,(z + \bar{z}) \quad\text{and}\quad y = \Im(z) = \frac{1}{2\imath}\,(z - \bar{z}). \tag{1.84}$$
The operations of addition and multiplication are identical to those for real numbers, plus of course the relationship $\imath^2 = -1$ (Eq. 1.80). Thus, given any two complex numbers, say $z_1 = x_1 + \imath\, y_1$ and $z_2 = x_2 + \imath\, y_2$, we have
$$z_1 + z_2 = (x_1 + x_2) + \imath\,(y_1 + y_2) = z_2 + z_1 \tag{1.85}$$
and
$$z_1 z_2 = (x_1 x_2 - y_1 y_2) + \imath\,(x_1 y_2 + x_2 y_1) = z_2 z_1, \tag{1.86}$$
as you may verify. The above equations imply
$$\overline{z_1 + z_2} = (x_1 + x_2) - \imath\,(y_1 + y_2) = \bar z_1 + \bar z_2 \tag{1.87}$$
and
$$\overline{z_1 z_2} = (x_1 x_2 - y_1 y_2) - \imath\,(x_1 y_2 + x_2 y_1) = \bar z_1\, \bar z_2. \tag{1.88}$$
We also have
$$z\, \bar z = x^2 + y^2 \ge 0, \tag{1.89}$$
with the equal sign only for z = 0. Then, we introduce the following
Definition 30 (Absolute value). The absolute value (or modulus, or magnitude) of a complex number z = x + ı y is defined by
$$|z| := \sqrt{x^2 + y^2} = \sqrt{z\, \bar z} \ge 0, \tag{1.90}$$
where again the equal sign holds only for z = 0.
Also, we have
Definition 31 (Complex plane). Consider a plane with its Cartesian axes. Given a complex number z = x + ı y, let the abscissa coincide with $x = \Re(z)$ and the ordinate with $y = \Im(z)$. In this way, we have constructed a one–to–one correspondence between a complex number and a point of a plane. Such a plane is known as the complex plane.
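The identities in Eqs. 1.84–1.90 can be checked numerically. The following is a minimal sketch, not part of the original text: it relies on Python's built-in complex type (which writes ı as j), the sample values z1 = 3 + 4ı and z2 = 1 − 2ı are arbitrary, and the helper name product_by_parts is made up for illustration.

```python
# Sketch only: sample values and helper names are illustrative assumptions.

def product_by_parts(z1, z2):
    """Multiply two complex numbers with real arithmetic only (Eq. 1.86)."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    return complex(x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

z1 = complex(3, 4)    # z1 = 3 + 4i
z2 = complex(1, -2)   # z2 = 1 - 2i
TOL = 1e-12           # tolerance for floating-point comparisons

# Eq. 1.86: the rule agrees with Python's own complex product.
assert abs(product_by_parts(z1, z2) - z1 * z2) < TOL

# Eqs. 1.87 and 1.88: conjugation distributes over sums and products.
assert abs((z1 + z2).conjugate() - (z1.conjugate() + z2.conjugate())) < TOL
assert abs((z1 * z2).conjugate() - z1.conjugate() * z2.conjugate()) < TOL

# Eq. 1.84: real and imaginary parts recovered from z and its conjugate.
real_part = (z1 + z1.conjugate()) / 2
imag_part = (z1 - z1.conjugate()) / 2j
assert abs(real_part - z1.real) < TOL and abs(imag_part - z1.imag) < TOL

# Eqs. 1.89 and 1.90: z times its conjugate equals |z|^2, a nonnegative real.
modulus_squared = (z1 * z1.conjugate()).real
assert abs(modulus_squared - abs(z1) ** 2) < TOL
print(abs(z1))
```

The tolerance is there only because computers store real numbers approximately; the identities themselves are exact.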
• Natural powers and roots of complex numbers
Next, consider the natural powers of complex numbers. Akin to the real numbers (Definition 16, p. 37), we say that
$$w = z^n \tag{1.91}$$
denotes the number 1 multiplied n times by z. Akin to real numbers, the n-th root of a complex number z is the number w whose n-th power gives back the original number z, namely
$$w = \sqrt[n]{z} \quad\text{is equivalent to}\quad z = w^n. \tag{1.92}$$
◦ Comment. As we will see, the n-th root of a complex number always exists. In fact, there always exist n of them. [After all, $z = \sqrt[n]{w}$ is a root of $z^n - w = 0$, and, as we will see, a polynomial of degree n always has n roots (Remark 50, p. 225).] To give you a simple example, consider the following problem: $z^4 + 4 = 0$. The solutions to this problem are z = ±1 ± ı, as you may verify. We will encounter plenty of additional examples in this book. [The extension of the exponentiation to the complex field (namely $z^w$, with z and w complex) is much more complicated, and it will be addressed in Vol. II.]
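The claim that z = ±1 ± ı solves $z^4 + 4 = 0$ can be verified directly from the definition of a natural power in Eq. 1.91. The sketch below is an illustration only: the function name natural_power is an assumption, and Python's built-in complex type stands in for the complex numbers.

```python
def natural_power(z, n):
    """Return z^n as the number 1 multiplied n times by z (Eq. 1.91)."""
    w = complex(1, 0)
    for _ in range(n):
        w = w * z
    return w

# The four candidate roots quoted above: z = +1 + i, +1 - i, -1 + i, -1 - i.
candidates = [complex(sx, sy) for sx in (1, -1) for sy in (1, -1)]

# Each residual z^4 + 4 should vanish (up to floating-point rounding).
residuals = [natural_power(z, 4) + 4 for z in candidates]
assert all(abs(r) < 1e-12 for r in residuals)
```

The intermediate step is instructive: (1 + ı)² = 2ı, and (2ı)² = −4, so that each of the four numbers is indeed a fourth root of −4.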
1.8 Appendix. The Greek alphabet
If you are not familiar with the Greek letters, you are at a disadvantage. They are so widely used in mathematics and in mechanics that it is a good idea for you to become familiar with them. [We have already introduced the letters α and β (Eq. 1.72) and (Footnote 12, p. 43).] In fact, it would have been very difficult for me to write this book without them. [I think of ε and δ in the discussion of limits in Chapter 8.] Moreover, were I to do that, I would not be doing you any favors: you would have problems in reading other books on mathematics and mechanics.
Accordingly, in Table 1.3 below, I have listed the symbol used for each Greek letter (lower case in the first/fifth column, and upper case in the second/sixth), as well as its name (third/seventh column). In the fourth/eighth column, you’ll find the English transliteration. Note that many of the upper case Greek letters are identical to their English equivalent. [Notable exceptions are the capital letters Rho and Chi, which are written like the English letters P and X, whereas their English equivalents are R and Ch.] Accordingly, only the following capital Greek letters are used in mathematics: Γ (Gamma), Δ (Delta), Θ (Theta), Λ (Lambda), Ξ (Xi), Π (Pi), Σ (Sigma), Υ (Upsilon), Φ (Phi), Ψ (Psi) and Ω (Omega).
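| Lower case | Upper case | Name | English | Lower case | Upper case | Name | English |
|---|---|---|---|---|---|---|---|
| α | A | Alpha | A | ν | N | Nu | N |
| β | B | Beta | B | ξ | Ξ | Xi | X |
| γ | Γ | Gamma | G | o | O | Omicron | O |
| δ | Δ | Delta | D | π | Π | Pi | P |
| ϵ; ε | E | Epsilon | E | ρ; ϱ | P | Rho | R |
| ζ | Z | Zeta | Z | σ; ς | Σ | Sigma | S |
| η | H | Eta | I | τ | T | Tau | T |
| θ; ϑ | Θ | Theta | Th | υ | Υ | Upsilon | Y |
| ι | I | Iota | I | φ; ϕ | Φ | Phi | Ph |
| κ | K | Kappa | K | χ | X | Chi | Ch |
| λ | Λ | Lambda | L | ψ | Ψ | Psi | Ps |
| μ | M | Mu | M | ω | Ω | Omega | O |
Table 1.3 The Greek alphabet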
◦ Warning. Note also that, at times, there exist two different symbols to indicate the same letter (specifically for lower case epsilon, theta, rho, sigma and phi ). I will not use them interchangeably.
Chapter 2
Systems of linear algebraic equations
In this chapter, we address the generalization of the formulation for bignè and babà. We also discuss methodologies to obtain their solution, in particular, the Gaussian elimination technique, which might be new to you.
• Overview of this chapter
As some of you might have learned in high school, a set of equations like those in Eq. 1.1 or in Eq. 1.3 is called a system of two linear algebraic equations with two unknowns.¹ The primary goal here is to present a technique known as the Gaussian elimination, namely a procedure used to find the solution of systems of equations, such as those encountered in Section 1.1, on bignè and babà. As already pointed out in Remark 1, p. 8, in reference to systems of two equations with two unknowns, we have that: (i) typically the solution exists and is unique, (ii) under some unusual conditions, the solution does not exist at all, and (iii) under even more unusual conditions, the solution exists, but is not unique. As we will find out, even for systems of n linear algebraic equations with n unknowns, we encounter the above three possibilities (and only those three): under certain conditions, the solution exists and is unique, but sometimes the solution might not even exist at all, or might exist, but not be unique.
The problem will be addressed in its full generality, namely for systems of n linear algebraic equations with n unknowns. However, to avoid a sudden increase in complexity, I will slowly build up the level of sophistication, by starting from the solution of a system of two equations with two unknowns, then going to a system of three equations with three unknowns, and then generalizing this even further. I would rather be too repetitive than too difficult to read.
Specifically, in Section 2.2, we address the problem of how to solve a system of two linear algebraic equations with two unknowns, namely a system of the type
$$\begin{aligned} a\,x + b\,y &= f,\\ c\,x + d\,y &= g. \end{aligned} \tag{2.1}$$
This means: “Find x and y such that these two equations are both satisfied.” For systems of two equations with two unknowns, we will consider two methods: first the method by substitution, and then the method by elimination, namely the Gaussian elimination procedure. Next, in Section 2.3, we extend the problem to the case of three linear algebraic equations with three unknowns; in particular, we first illustrate the Gaussian elimination procedure and then use it to obtain an explicit expression for the solution of a 3 × 3 system. Finally, in Section 2.4, we address systems of n equations with n unknowns (with n arbitrary), and present the Gaussian elimination procedure in its full generality.
¹ These equations are called algebraic because they involve only the operations used in algebra. The reason why they are called linear is not particularly relevant at this point. Hence, to make your life simpler, this point is postponed to Remark 34, p. 128.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_2
2.1 Preliminary remarks
In order to stir up your interest, I’d like to point out that the hidden agenda in presenting the Gaussian elimination is also to prepare the ground for one of the most important concepts in this book, that of linear independence of a given collection of mathematical quantities. [Specifically, in Chapter 3 we deal with linear independence of vectors, whereas in Chapter 6 we deal with linear independence of functions.] The material in this chapter makes it much easier to introduce such a concept.
◦ Warning. From now on, I will rarely use the symbol × to denote multiplication. That’s elementary algebra! Instead of a × b, I will use a b. For numbers, instead of 1 × 2, I will use 1 · 2, so as not to confuse 1 multiplied by 2 with 12. Accordingly, I can introduce the more compact expression “n × n linear algebraic system,” where the symbol × does not indicate multiplication, but is a shorthand notation for n equations with n unknowns. [The symbol × is also used to denote the dimensions of a matrix (Definition 43, p. 92), as well as the cross product between two vectors (Eq. 240).]
Remark 16. In the literature, the corresponding theorems of existence and uniqueness of the solution are typically presented in terms of determinants. [A notable exception is the beautiful book Linear Algebra Done Right, by Sheldon Axler (Ref. [5]), which unfortunately is definitely too difficult for you at this point in our journey.] However, I never liked determinants, and I consider them not easy to understand, especially for a beginner, as I was when my professors forced them down my throat (definitely not gut–level mathematics, especially the n × n case, with n arbitrarily large). To avoid determinants, here the theorems of existence and uniqueness are presented in terms of a technique known as the Gaussian elimination procedure, which is the primary objective of this chapter (Subsection 2.2.2 for the 2 × 2 case, later extended to the 3 × 3 and n × n cases).²
2.2 The 2 × 2 linear algebraic systems
Consider again Eq. 1.1. We have already verified that the solution is given by x = 8 and y = 2 (Eq. 1.2). How did I come up with this solution? Is this solution unique? Is there a general method to do this? In order to discuss this, let us consider a generic system of two linear algebraic equations with two unknowns, as given by Eq. 2.1. As stated above, here we introduce two methods of solution for such a problem. The first method (by substitution, Subsection 2.2.1) is the most gut–level of the two and probably what you were taught in high school. [Indeed, this is the main reason for including it here, that is, to give you a gut–level introduction to the problem and connect the material with notions that are already familiar to you from high school. If this is not the case, just skip Subsection 2.2.1.] The method by substitution becomes cumbersome for large systems of equations. In this case, the preferred method is the Gaussian elimination method, which is widely used in practice (Subsection 2.2.2).
² The Gaussian elimination procedure is named after Carl Friedrich Gauss (1777–1855), a German mathematician and scientist. Gauss is considered one of the greatest and most influential mathematicians of all time. He contributed significantly to many fields of mathematics, including number theory, statistics, analysis and differential geometry. He has been referred to as Princeps mathematicorum (Latin for “Prince of Mathematicians”) and as the “greatest mathematician since antiquity.” However, his contributions are not limited to mathematics, but span geophysics, geodesy, astronomy, optics, electrostatics and magnetism. In 1796, at the age of 19, Gauss showed how to construct a heptadecagon, namely a regular polygon with 17 edges, only by using a compass–and–straightedge construction. [In fact, Gauss was a child prodigy, as you may gather from the following anecdote (Boyer, Ref. [13], p. 497). Once upon a time, when Gauss was about ten, a grammar school teacher gave his class the following problem: “Find the sum of all the numbers between 1 and 100.” The teacher had hoped to have some time to take a nap in the back of the classroom. Unfortunately for him, Gauss was in his class. After a short while, Gauss came up with the correct answer: 5,050. We don’t know whether the story is true and, if so, how he came up with his answer. We can only speculate that he noted that if one writes the numbers from 1 to 100 on one row, and then from 100 to 1 in the next row, the sum of each column is always 101. There are 100 pairs of 101, for a total of 10,100; however, we have to divide this by 2 because we want the total of a single row. Thus, the sum is given by $\tfrac{1}{2} \cdot 101 \cdot 100 = 5{,}050$.]
2.2.1 Method by substitution
The method by substitution consists in using, say, the first equation to express one of the two variables in terms of the other, and then substituting this in the second equation, so as to obtain one equation with only one unknown, which is easily solved. Then, we can substitute back into the first equation and obtain the other unknown.
• Illustrative examples
Let us consider again the equation for bignè and babà (Eq. 1.1), namely
$$\begin{aligned} x + y &= 10,\\ 2x + y &= 18. \end{aligned} \tag{2.2}$$
We may use the first in Eq. 2.2 to express x in terms of y so as to have
$$x = 10 - y. \tag{2.3}$$
[Expressing y in terms of x is also a legitimate approach and yields the same results, as you may verify.] Substituting this expression for x into the second in Eq. 2.2, one obtains 2 (10 − y) + y = 18, namely −y = −2, that is, y = 2. Substituting back into Eq. 2.3 yields x = 8. Thus, the (unique) solution is indeed x = 8 and y = 2, as stated in Eq. 1.2.
Next, let us consider the second equation for bignè and babà (Eq. 1.3), namely
$$\begin{aligned} x + y &= 10,\\ x + y &= c/2. \end{aligned} \tag{2.4}$$
The first yields again x = 10 − y, as in Eq. 2.3. Substituting x into the second equation, one obtains (10 − y) + y = c/2, or (−1 + 1) y = c/2 − 10, namely 0 y = c/2 − 10.
Consider first the case c/2 ≠ 10. We obtain 0 y = c/2 − 10 ≠ 0. This equation cannot be satisfied, no matter how hard we try, no matter which value we choose for y. [Indeed, the left side is always equal to zero, whereas the right side differs from zero.] We say that, in this case, the solution does not exist.
Next, consider the case c/2 = 10. Now we have: 0 y = 0. This equation is satisfied for any value of y. We may choose for y any value we want, say y = y0. Then, we are left with x = 10 − y0 (with y0 arbitrarily chosen). In other words, one equation may be discarded and we are left with one equation with two unknowns, which only allows us to obtain one in terms of the other.
Remark 17. Let us try to understand what is happening here. If c/2 = 10, the second equation is equal to the first. As in the case of Eq. 1.4, one of the two equations is redundant, in the sense that, from a mathematical point of view (and only from a mathematical point of view), the two equations tell us the same thing, even though, from a man–on–the–street point of view, their meaning may be totally different. [Again, if we use these two equations to address the problem of bignè and babà discussed in Chapter 1, the first equation says that the total number of pastries must be 10, the second that the total cost must be 20 dollars.]
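The two pastry examples above can be replayed step by step. The sketch below is only an illustration under stated assumptions: the function name second_pastry_problem and its return strings are made up, and exact fractions are used so that the test "equal to zero" is not blurred by rounding.

```python
from fractions import Fraction

# First pastry problem (Eq. 2.2): x + y = 10, 2x + y = 18.
# Substituting x = 10 - y (Eq. 2.3) gives 2(10 - y) + y = 18, namely 20 - y = 18.
y = 20 - 18
x = 10 - y
assert (x, y) == (8, 2)

def second_pastry_problem(c):
    """Classify x + y = 10, x + y = c/2 (Eq. 2.4) for a given total cost c.

    Substituting x = 10 - y leaves 0*y = c/2 - 10, so only the right side matters.
    """
    right_side = Fraction(c) / 2 - 10
    if right_side != 0:
        return "no solution"          # 0*y cannot equal a nonzero number
    return "solution not unique"      # 0*y = 0 holds for every y, with x = 10 - y

print(second_pastry_problem(20))
print(second_pastry_problem(18))
```

With c = 20 the second equation repeats the first; with any other value of c the two equations contradict each other.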
• Explicit solution for a generic 2 × 2 system
Let us consider a generic 2 × 2 system, namely (Eq. 2.1)³
$$\begin{aligned} a\,x + b\,y &= f,\\ c\,x + d\,y &= g. \end{aligned} \tag{2.5}$$
Remark 18. The case a = 0 is trivial. [Indeed, if a = 0, instead of solving for x, we can solve the first in Eq. 2.5 for y, as y = f/b, and substitute the result in the second in Eq. 2.5 to obtain x = (bg − df)/(bc). Of course, this solution coincides with that obtained later, namely Eqs. 2.9 and 2.11, with a = 0.]
Accordingly, let us consider the case a ≠ 0. From the first in Eq. 2.5, we obtain
$$x = (f - b\,y)/a. \tag{2.6}$$
Substituting this expression for x into the second in Eq. 2.5, one obtains c (f − b y)/a + d y = g, or
$$\left(d - \frac{b\,c}{a}\right) y = g - \frac{f\,c}{a}. \tag{2.7}$$
Then, multiplying both sides of the equation by a ≠ 0 yields
$$(a\,d - b\,c)\; y = a\,g - f\,c. \tag{2.8}$$
Here, we have two possibilities, namely ad − bc ≠ 0 and ad − bc = 0. These two cases are addressed below.
◦ The case ad − bc ≠ 0. We have
$$y = \frac{a\,g - f\,c}{a\,d - b\,c}. \tag{2.9}$$
Back substituting into Eq. 2.6, one obtains (use A + B/C = (A C + B)/C, Eq. 1.35)
$$x = \frac{1}{a}\left(f - b\,\frac{a\,g - f\,c}{a\,d - b\,c}\right) = \frac{f\,(a\,d - b\,c) - b\,(a\,g - f\,c)}{a\,(a\,d - b\,c)} = \frac{f\,a\,d - b\,a\,g}{a\,(a\,d - b\,c)}, \tag{2.10}$$
or (recall that here a ≠ 0)
$$x = \frac{f\,d - b\,g}{a\,d - b\,c}. \tag{2.11}$$
◦ The case ad − bc = 0. In this case, Eq. 2.8 yields
$$0\; y = a\,g - c\,f. \tag{2.12}$$
Thus, we have again two possibilities. If ag − cf ≠ 0, this equation cannot be satisfied by any value of y, no matter how hard we try. We say that the solution does not exist. On the other hand, if we also have ag − cf = 0, Eq. 2.12 reduces to 0 y = 0, and is satisfied by any value for y. Once y has been chosen (arbitrarily), again say y = y0, substituting into Eq. 2.6, we obtain that x is given by x = (f − b y0)/a. [The comments in Remark 17, p. 61, on redundant equations, apply here as well. A deeper analysis of this situation is presented in Subsubsection “The case ad − bc = 0,” on p. 64, and Subsubsection “A deeper look,” on p. 65.]
³ One of my students was not clear about the use of the term “generic.” As you probably already know, in mathematics the term generic has a specific meaning, namely “the most general, and yet unspecified.” Sometimes it is used as “typical,” namely as “the most general, and yet unspecified, but with possible exceptional cases,” as in the sentence: “A generic natural number does not begin with 1.”
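The case analysis of Eqs. 2.8–2.12 translates almost word for word into a short routine. What follows is a sketch under stated assumptions rather than a definitive implementation: the function name solve_by_substitution and the labels it returns are hypothetical, the case a = 0 (Remark 18) is deliberately left out, and exact fractions are used so that "equal to zero" can be tested reliably.

```python
from fractions import Fraction

def solve_by_substitution(a, b, c, d, f, g):
    """Solve a x + b y = f, c x + d y = g along the lines of Eqs. 2.8-2.12.

    Returns ("unique", x, y), ("none",) or ("not unique",).
    Assumes a != 0, as in the text following Remark 18.
    """
    a, b, c, d, f, g = (Fraction(v) for v in (a, b, c, d, f, g))
    if a == 0:
        raise ValueError("this sketch assumes a != 0 (see Remark 18)")
    denominator = a * d - b * c       # coefficient of y in Eq. 2.8
    right_side = a * g - f * c        # right side of Eq. 2.8
    if denominator != 0:
        y = right_side / denominator              # Eq. 2.9
        x = (f * d - b * g) / denominator         # Eq. 2.11
        return ("unique", x, y)
    if right_side != 0:
        return ("none",)              # Eq. 2.12 reads 0*y = nonzero
    return ("not unique",)            # Eq. 2.12 reads 0*y = 0

# The first pastry problem, Eq. 2.2: a = b = d = 1, c = 2, f = 10, g = 18.
print(solve_by_substitution(1, 1, 2, 1, 10, 18))
```

For the pastry problem the routine recovers x = 8 and y = 2, as in Eq. 1.2.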
2.2.2 Gaussian elimination method
The substitution method becomes cumbersome, to say the least, if we have a relatively large system of equations, namely n equations with n unknowns, where n is larger than, say, 5 or 6. Fortunately, there is a second method (fully equivalent to the method by substitution), which obviates this problem. This method (typically used for relatively large systems of equations, because it is easy to implement with computers) is known as the Gaussian elimination method, and is addressed in this subsection, for the limited 2 × 2 case. [The cases 3 × 3 and n × n are addressed in the sections that follow.]
Remark 19. The Gaussian elimination procedure is usually dealt with only as a numerical technique, namely as a method for solving systems of equations of the type discussed thus far, by using computers. Here, we present it primarily to discuss theoretical issues, namely to explore the existence and uniqueness of the solution.
For the 2 × 2 case, the objective of the Gaussian elimination procedure is to eliminate the first variable from the second equation, specifically by subtracting the first equation multiplied by a suitable constant from the second equation. Specifically, let us consider again the problem in Eq. 2.1, namely
$$\begin{aligned} a\,x + b\,y &= f,\\ c\,x + d\,y &= g. \end{aligned} \tag{2.13}$$
Again, the case a = 0 is trivial: interchange the two equations. Accordingly, without loss of generality, we can assume a ≠ 0. In this case, the method consists in multiplying the first in Eq. 2.13 by c/a and subtracting it from the second so as to eliminate x from the second equation.
◦ Comment. You are going to ask me why we choose to multiply the first in Eq. 2.13 by c/a: “Why c/a and not another number?” The reason is that, after dividing by a and then multiplying by c, the coefficient of x in the first equation becomes equal to c, which is what we want, because subtracting the result from the second equation eliminates the first term. This is the essence of the Gaussian elimination procedure.
The result is (d − bc/a) y = g − f c/a, namely (multiply by a)
$$(a\,d - b\,c)\; y = a\,g - c\,f, \tag{2.14}$$
which coincides with Eq. 2.8.
Remark 20. It should be emphasized that Eq. 2.14, coupled with the first in Eq. 2.13, is fully equivalent to the original system in Eq. 2.13, in the sense that a solution of such a system satisfies also the first in Eq. 2.13 along with Eq. 2.14, and vice versa. [For, the original second equation equals the final second equation (Eq. 2.14 divided by a) plus the first equation multiplied by c/a. If these are satisfied, their sum is satisfied as well (Eq. 1.58, p. 40).] This issue is very important and will be addressed in Theorem 12, p. 77, for a system of 3 equations with 3 unknowns, and in Theorem 15, p. 86, for a system of n equations with n unknowns.
Again, if ad − bc ≠ 0, y is given by Eq. 2.9. Then, back–substitution into the first in Eq. 2.13 yields again that x is given by Eq. 2.11. Thus, the solution is given by
$$x = \frac{f\,d - b\,g}{a\,d - b\,c} \quad\text{and}\quad y = \frac{a\,g - f\,c}{a\,d - b\,c}, \tag{2.15}$$
in agreement with Eqs. 2.9 and 2.11.
◦ Comment. The method of solution by substitution and the Gaussian elimination procedure differ only at skin–deep level. Indeed, if you look at them carefully, you’ll notice that the operations involved are fully equivalent, almost identical.
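To see the elimination step at work, here is a minimal sketch that performs it literally: multiply the first equation by c/a, subtract it from the second, solve for y, and back–substitute. The function name gaussian_elimination_2x2 and the sample coefficients are assumptions chosen for illustration; the sketch covers only the case a ≠ 0 and ad − bc ≠ 0.

```python
from fractions import Fraction

def gaussian_elimination_2x2(a, b, c, d, f, g):
    """Solve a x + b y = f, c x + d y = g, assuming a != 0 and a d - b c != 0."""
    a, b, c, d, f, g = (Fraction(v) for v in (a, b, c, d, f, g))
    if a == 0:
        raise ValueError("interchange the two equations first (a must be nonzero)")
    factor = c / a                 # multiplier applied to the first equation
    d_new = d - factor * b         # coefficient of y once x has been eliminated
    g_new = g - factor * f         # corresponding right-side term
    if d_new == 0:
        raise ValueError("a d - b c = 0: this case must be examined separately")
    y = g_new / d_new              # equivalent to Eq. 2.9
    x = (f - b * y) / a            # back-substitution into the first equation
    return x, y

# Compare with the closed-form expressions of Eq. 2.15 on sample coefficients.
a, b, c, d, f, g = 2, 3, 1, -1, 7, 1
x, y = gaussian_elimination_2x2(a, b, c, d, f, g)
denominator = a * d - b * c
assert x == Fraction(f * d - b * g, denominator)
assert y == Fraction(a * g - f * c, denominator)
```

The two assertions confirm, for these coefficients, that elimination followed by back–substitution gives the same numbers as Eq. 2.15.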
• The case ad − bc = 0
If ad − bc tends to zero, x and y become larger and larger. However, when ad − bc reaches zero, we are in trouble. Indeed, we cannot divide an equality by zero (Remark 7, p. 26). Thus, the case ad − bc = 0 must be examined separately. This occurs when ad = bc, namely (dividing both terms of the equality by ab ≠ 0) when c/a = d/b. [Here, for the sake of simplicity, we assume that none of the coefficients vanishes, unless otherwise stated. You may work out the case in which one of the coefficients vanishes. Hint: Use Remark 18, p. 61.]
We have the following
Definition 32 (The symbols := and =:). The symbol “a := b” means “a equals b, by definition of a” (namely the definition of a is simply to be equal to b). Similarly, a =: b means “a equals b, by definition of b,” namely the definition of b is to be equal to a.
Let us set
$$\sigma := c/a = d/b. \tag{2.16}$$
[The symbol σ denotes the Greek letter sigma (Section 1.8).]
Equation 2.16 implies c = σ a and d = σ b, and hence
$$c\,x + d\,y = \sigma\,(a\,x + b\,y). \tag{2.17}$$
Thus, in the case under consideration, we can say that the left side of the second equation in Eq. 2.13 equals that of the first multiplied by σ, independently of the values of x and y.
Next, consider the expressions for x and y in Eq. 2.15. In order to understand what happens to the solution when ad − bc equals zero, we have to see what happens to the right–side terms. Consider in particular the expression for y in Eq. 2.15. There exist two possibilities, namely ag − cf ≠ 0 and ag − cf = 0. Again, if ag − cf ≠ 0, we have: 0 y = a g − c f ≠ 0, which is mathematically impossible; the solution does not exist.
The last case considered is when ad − bc and ag − cf are both zero. In this case, both the numerator and the denominator vanish in the expression for y in Eq. 2.15. We also have df − bg = 0 (so that the numerator and denominator vanish as well in the expression for x in Eq. 2.15). [For, ad − bc = 0 and ag − cf = 0 imply
$$\frac{c}{a} = \frac{d}{b} \quad\text{and}\quad \frac{c}{a} = \frac{g}{f}. \tag{2.18}$$
Comparing the two, we have the desired result, d/b = g/f, namely df − bg = 0.]
• A deeper look
Expanding upon the considerations in Remark 17, p. 61, let us try to understand what is happening here. The case discussed in the preceding comment occurs when we have (Eq. 2.18)
$$\frac{c}{a} = \frac{d}{b} = \frac{g}{f} =: \sigma. \tag{2.19}$$
What does this mean? Let us examine this issue more carefully. Equation 2.19 implies that
$$c\,x + d\,y - g = \sigma\,(a\,x + b\,y - f), \tag{2.20}$$
independently of x and y. In other words, for the case under consideration, the second in Eq. 2.13 is equal to the first multiplied by a constant σ. As a consequence, the second equation does not tell us anything new with respect to the first equation. Hence, it is redundant and may be discarded. Thus, we are left with one equation and two unknowns. Then, we can assign to one of the two unknowns any arbitrary value, say y = y0, and solve for the other one, as x = (f − b y0)/a. We say that the solution exists, but is not unique.
At the cost of being too repetitive, as an illustrative example let us consider one more time the second problem for bignè and babà (Eq. 1.3), namely
$$\begin{aligned} x + y &= 10,\\ x + y &= c/2. \end{aligned} \tag{2.21}$$
By subtracting the first equation from the second so as to eliminate x, we obtain: 0 y = c/2 − 10, for any value of x and y. As in the method by substitution presented in Subsection 2.2.1, for c/2 ≠ 10, we have: 0 y = c/2 − 10 ≠ 0. This equation cannot be satisfied by any value for y; we say that, if c/2 ≠ 10, the solution does not exist. On the other hand, if c/2 = 10, we have: 0 y = 0. This equation is satisfied for any value of y. We may choose for y any value we want, say y = y0 again. Then, we are left with x = 10 − y0, with y0 arbitrarily chosen. [Stated differently, for c/2 = 10, the second equation coincides with the first, and hence it may be discarded. Then, we are left with only one equation for two unknowns, which only allows us to obtain one of the two unknowns, say x, in terms of the other, namely x = 10 − y0. On the other hand, if c/2 ≠ 10 the two equations are incompatible.]
• Homogeneous systems. Nontrivial solution
We have the following
Definition 33 (Homogeneous system. Trivial solution). Consider the particular case f = g = 0. In this case, the system is called homogeneous (otherwise, it is called nonhomogeneous). Of course, the above considerations apply to this case as well. However, now there exist only two possibilities. Specifically, it is no longer possible for the solution not to exist, because x = y = 0 is always a solution to any homogeneous problem. This solution is trivially true: each term on the left side vanishes, so that the result equals zero, akin to the right side. Accordingly, such a solution is referred to as the trivial solution.
Remark 21. There is still a question, namely whether or not the solution x = y = 0 is unique. It is very important to note that, if ad − bc ≠ 0, this solution is unique, as you can see from Eq. 2.15, by setting f = g = 0. On the other hand, if ad − bc vanishes (namely if c/a = d/b), then the two equations are fully equivalent, namely one of the two is redundant and may be discarded. Therefore, if ad − bc = 0, we always have a nonzero solution, which is obtained from any one of the two equations (they are equivalent), by arbitrarily choosing the value for one of the two unknowns, and solving for the other: this is known as the nontrivial solution. It is apparent that the nontrivial solution may be always multiplied by an arbitrary constant. In plain language, the nontrivial solution is known, except for a multiplicative constant. In particular, if the constant is equal to zero, we recover the trivial solution. In the rest of this book, the collection of all the nontrivial solutions that differ only by the value of the multiplicative constant is considered as a single solution. Thus, if we say that a nontrivial solution is uniquely defined, it is understood “except for a multiplicative constant,” since this is implicitly stated in the term “nontrivial.”
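A small numerical illustration of the nontrivial solution may help. The sketch below is an assumption-laden example, not the author's procedure: the function name nontrivial_solution, the choice y0 = 1 and the sample coefficients (a, b, c, d) = (1, 2, 3, 6), for which ad − bc = 0, are all made up for illustration, and the case a = 0 is excluded.

```python
from fractions import Fraction

def nontrivial_solution(a, b, c, d, y0=1):
    """Nontrivial solution of a x + b y = 0, c x + d y = 0 when a d - b c = 0.

    Chooses y = y0 arbitrarily and solves the first equation for x (a != 0 assumed).
    """
    a, b, c, d, y0 = (Fraction(v) for v in (a, b, c, d, y0))
    if a == 0:
        raise ValueError("this sketch assumes a != 0")
    if a * d - b * c != 0:
        raise ValueError("a d - b c != 0: only the trivial solution x = y = 0 exists")
    x0 = -b * y0 / a               # first equation a x + b y = 0 solved for x
    return x0, y0

a, b, c, d = 1, 2, 3, 6            # here c/a = d/b = 3
x0, y0 = nontrivial_solution(a, b, c, d)

# Any multiple of (x0, y0) satisfies both equations; k = 0 gives the trivial solution.
for k in (0, 1, -3, Fraction(5, 2)):
    x, y = k * x0, k * y0
    assert a * x + b * y == 0 and c * x + d * y == 0
```

The loop is the numerical counterpart of the statement that the nontrivial solution is known except for a multiplicative constant.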
2.2.3 Summary
We can summarize the results from Section 2.2 as follows:
Theorem 10 (Homogeneous 2 × 2 systems). A homogeneous 2 × 2 linear algebraic system admits two possibilities:
1. If c/a ≠ d/b, the solution exists and is unique. This solution is given by x = y = 0 (trivial solution).
2. If c/a = d/b, the solution exists, but is not unique. It is called the nontrivial solution, and is uniquely defined except for an arbitrary multiplicative constant. [If such a constant vanishes, we recover the trivial solution, which always exists.]
In addition, we have
Theorem 11 (Nonhomogeneous 2 × 2 systems). A nonhomogeneous 2 × 2 linear algebraic system admits three possibilities:
1. If c/a ≠ d/b, a unique solution exists and is given by Eq. 2.15.
2. If c/a = d/b ≠ g/f, the solution does not exist.
3. If c/a = d/b = g/f, the solution exists, but is not unique.
◦ Comment. More on this in Subsection 7.3.2, where we provide a geometrical interpretation of this theorem in terms of the intersection of two straight lines on a plane, and also discuss what happens if c/a is very close to d/b (ill–conditioned 2 × 2 systems).
2.3 The 3 × 3 linear algebraic systems
Here, we address 3 × 3 linear algebraic systems. Before we do this, let me add a coda to the story of bignè and babà discussed in the preceding section, specifically to the story when we had ten people at the table (Eq. 1.3). I am telling you this story just to show you that more complicated problems may also arise in everyday life.
One year later, we had again ten people at the table. My mother felt that we should do better than the year before, maybe because the economic situation of my family was considerably better; after all, those were the post–war years. She wanted to add a “diplomatico” for everyone except for herself and my father. She claimed it was because of their diet, but in retrospect I believe it was to save money. They didn’t have a weight issue then, especially my father, who had recently come back from being a prisoner of war and was still underweight. [For the culinary purist, a diplomatico consists of a layer of sponge cake, inside two layers of a special pastry cream (crema pasticcera mixed with chantilly cream), in turn all inside two layers of millefoglie (puff pastry), all topped with powdered sugar.] I asked her to give me the equivalent of 40 dollars. Why? We’ll see in a sec!
Let me first formulate the problem. Let again x denote the number of bignè (2 dollars apiece), y the number of babà (1 dollar apiece), and z the number of diplomatici (3 dollars apiece; delicious but expensive!). The problem may be formulated with three equations and three unknowns, as follows
$$\begin{aligned} x + y + z &= 18,\\ 2x + y + 3z &= 40,\\ x + y - z &= 2. \end{aligned} \tag{2.22}$$
The first tells us that the total number of pastries had to be 18 (2 for 8 people, plus one for each of my parents); the second that the total money available was 40 dollars; the third tells us that the number of diplomatici equals the number of bignè plus the number of babà minus 2, those for my parents. The solution is x = 6, y = 4 and z = 8 (namely six bignè, four babà and eight diplomatici), as you may verify.
How did I come up with the solution? I didn’t. I had worked the other way around. I figured out I’d buy the least expensive pastries, namely the babà, for my family (my parents, my sister and me), for 4 dollars, six bignè for the guests, for 12 dollars, and 8 diplomatici (6 guests, my sister and me) for 24 dollars. That’s why I had asked my mother for 40 dollars.
◦ Comment. Indeed, even if I had not started with the solution, and worked the other way around, I could have easily obtained the solution, because the problem may be formulated in a much simpler way. To begin with, it is apparent that the number of diplomatici had to be 8, again 6 guests, my sister and me. [This is consistent with Eq. 2.22. If you subtract the last equation from the first, you obtain 2z = 16, namely eight diplomatici.] That leaves x + y = 10. In addition, the cost of the 8 diplomatici is 3 · 8 = 24 dollars, with a remaining total of 40 − 24 = 16 dollars so as to have 2x + y = 16. This way, we are back to a system of two equations with two unknowns, namely
$$\begin{aligned} x + y &= 10,\\ 2x + y &= 16, \end{aligned} \tag{2.23}$$
a much easier problem to solve than 3 equations with 3 unknowns. For instance, if you subtract the first equation (x + y = 10) from the second (2x + y = 16), you obtain x = 6, and hence y = 4 (namely six bignè and four babà), in agreement with what was stated above.
Here, however, we pretend that we have not noticed what is stated in the above comment, and address how to solve a generic 3 × 3 system.
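The pastry arithmetic above is easy to replay. The following sketch checks that (x, y, z) = (6, 4, 8) satisfies Eq. 2.22 and then retraces the reduction to the 2 × 2 system of Eq. 2.23. The variable and function names are illustrative assumptions; only the numbers come from the text.

```python
# Each entry holds the coefficients of (x, y, z) and the right-side term of Eq. 2.22.
equations = [
    ((1, 1, 1), 18),    # total number of pastries
    ((2, 1, 3), 40),    # total cost in dollars
    ((1, 1, -1), 2),    # bigne plus baba minus diplomatici
]

def satisfies(solution):
    """Return True if the triple (x, y, z) satisfies all three equations."""
    return all(
        sum(coeff * value for coeff, value in zip(coeffs, solution)) == rhs
        for coeffs, rhs in equations
    )

assert satisfies((6, 4, 8))

# Reduction used in the comment: first equation minus third gives 2z = 16.
z = (18 - 2) // 2
pastries_left = 18 - z          # x + y = 10
money_left = 40 - 3 * z         # 2x + y = 16
x = money_left - pastries_left  # second of Eq. 2.23 minus the first
y = pastries_left - x
print(x, y, z)
```

The reduction works here only because the first and third equations share the same coefficients for x and y; a generic 3 × 3 system offers no such shortcut.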
2.3.1 Gaussian elimination for 3 × 3 systems
In Eq. 2.22, we have seen an example of a 3 × 3 linear algebraic system. Here, we address the Gaussian elimination procedure for a generic system of this type, namely
$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &= b_1,\\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &= b_2,\\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 &= b_3. \end{aligned} \tag{2.24}$$
[Note that the first index in the coefficients $a_{hk}$ indicates the equation number, whereas the second indicates the unknown that multiplies it.] We have the following definitions:
Definition 34 (Homogeneous 3 × 3 system). A 3 × 3 linear algebraic system is called homogeneous iff all the terms on the right side vanish, namely iff $b_1 = b_2 = b_3 = 0$:
$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 &= 0,\\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 &= 0,\\ a_{31} x_1 + a_{32} x_2 + a_{33} x_3 &= 0. \end{aligned} \tag{2.25}$$
Otherwise, it is called nonhomogeneous.
Part I. The beginning – Pre–calculus
Definition 35 (Known terms and unknowns). The coefficients ahk in Eq. 2.24 will be referred to as the left–side coefficients, or simply the coefficients of the system. Moreover, we will use the term unknowns for xk . Finally, the coefficients bh are typically referred to as the right–side terms. [However, one time I wrote the above equation in the following equivalent form a11 x1 + a12 x2 + a13 x3 − b1 = 0, a21 x1 + a22 x2 + a23 x3 − b2 = 0, a31 x1 + a32 x2 + a33 x3 − b3 = 0.
(2.26)
Some of my students, seeing that all the terms on the right side vanish, concluded (wrongly!) that they were dealing with a homogeneous problem. For this reason, whenever there is a possibility of misunderstanding, I will use the terminology known terms, or prescribed terms, instead of the traditional right–side terms.] Definition 36 (Main diagonal). The main diagonal (or simply the diagonal ) of the array of the coefficients ahk is composed of the elements a11 , a22 and a33 . Definition 37 (Equivalent 3 × 3 systems). Two 3 × 3 linear algebraic systems (namely systems of linear algebraic equations) are called equivalent iff any solution of the first is also a solution of the second, and vice versa. [See Remark 20, p. 63, for the 2 × 2 case.] Definition 38 (Upper triangular system). A system of 3 × 3 linear algebraic equations is called upper triangular (or in upper triangular form) iff all the left–side coefficients that are below the main diagonal vanish, as in a11 x1 + a12 x2 + a13 x3 = b1 , a22 x2 + a23 x3 = b2 , a33 x3 = b3 .
(2.27)
Remark 22 (Relevance of upper triangular systems). It is apparent that it is quite easy to find the solution for the upper triangular system in Eq. 2.27. The last equation gives us x3 . Substituting x3 in the second of Eq. 2.27, the resulting equation gives us x2 . Finally, substituting x2 and x3 in the first of Eq. 2.27, the resulting equation gives us the remaining unknown, x1 .
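The three substitutions of Remark 22 can be written out as a short Python sketch. It assumes that none of the diagonal coefficients vanishes; the function name and the numerical example are illustrative and not taken from the text.

```python
from fractions import Fraction


def back_substitute_3x3(U, b):
    """Solve an upper triangular 3 x 3 system (Eq. 2.27).

    U holds the left-side coefficients, b the known terms.
    Assumes a nonzero main diagonal.
    """
    x3 = Fraction(b[2]) / U[2][2]                               # last equation
    x2 = (Fraction(b[1]) - U[1][2] * x3) / U[1][1]              # second equation
    x1 = (Fraction(b[0]) - U[0][1] * x2 - U[0][2] * x3) / U[0][0]  # first equation
    return x1, x2, x3


# Illustrative system built so that the solution is (1, 2, 3).
U = [[2, 1, 1],
     [0, 3, 1],
     [0, 0, 4]]
b = [7, 9, 12]
print(back_substitute_3x3(U, b))
```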
The objective of the Gaussian elimination procedure is to transform the system in Eq. 2.24 into an equivalent upper triangular system, as in Eq. 2.27. In the following, we first examine some illustrative examples. Then, we generalize the results and present some theorems regarding the solvability of such systems. Finally, we use the Gaussian elimination procedure to provide an explicit expression for the solution of a generic 3 × 3 system.
2.3.2 Illustrative examples Before showing how to generalize the Gaussian elimination method to the case of 3 equations with 3 unknowns, let us consider some specific examples that shed some light on the complexities that may arise when dealing with these systems. Let us begin with the following 3 × 3 linear algebraic system: x + 2y + 3z = 10, 4x + 5y + 6z = 13, 7x + 8y + bz = c.
(2.28)
In line with the approach used for the 2 × 2 case, the Gaussian elimination procedure consists of the following steps: (i) subtract the first equation, multiplied by a suitable constant (of course, a different one each time), so as to eliminate the first unknown from the second and third equations, and then (ii) use the resulting second equation, multiplied by a suitable constant, to eliminate the second unknown from the third equation. The resulting system (which is equivalent to the original one, see Remark 23 below) is in upper triangular form (and hence easy to solve, as pointed out in Remark 22 above). In our case, multiply the first equation by 4 and subtract the result from the second equation. Also, multiply the first equation by 7 and subtract the result from the third equation. This yields x + 2y + 3z = 10, −3y − 6z = −27, −6y + (b − 21) z = c − 70.
(2.29)
This way, we have eliminated the unknown x from the second and third equations. Next, we want to eliminate the unknown y from the last equation. Multiplying the second equation in Eq. 2.29 by 2 and subtracting the result from
the last one, we obtain x + 2y + 3z = 10, −3y − 6z = −27, (b − 9)z = c − 16.
(2.30)
Remark 23. As in the 2 × 2 case (Remark 20, p. 63), the new system (Eq. 2.30) is fully equivalent (Definition 37, p. 70) to the original one, in the sense that any (x, y, z) satisfying Eq. 2.30 will satisfy Eq. 2.28 as well. For instance, if x, y and z are such that the first two equations of Eq. 2.30 are satisfied, then the second in Eq. 2.28 (which equals the second in Eq. 2.30 plus four times the first) is necessarily satisfied as well. Similar considerations hold for the third in Eq. 2.30. [More on this in Theorem 12, p. 77.]
• The case b ≠ 9
Here, we assume that b ≠ 9. In this case, we can solve for z from the last equation and obtain z = (c − 16)/(b − 9), then substitute this value in the second one and solve for y, and finally substitute y and z in the first one and solve for x. For instance, consider the case in which b = 10, c = 20. Then, we have z = (c − 16)/(b − 9) = 4, whereas the second equation yields y = (27 − 6 z)/3 = 9 − 2 · 4 = 1. Next, substituting into the first, we obtain x = 10 − 2 · 1 − 3 · 4 = 10 − 2 − 12 = −4. [It is always a good idea to check our results. Substituting x = −4, y = 1 and z = 4 in Eq. 2.28, with b = 10 and c = 20, we have −4 + 2 · 1 + 3 · 4 = 10, 4 · (−4) + 5 · 1 + 6 · 4 = 13, 7 · (−4) + 8 · 1 + 10 · 4 = 20.
(2.31)
Thus, the above values for x, y and z do indeed satisfy Eq. 2.28 as well, with b = 10 and c = 20.]
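The two elimination steps and the back substitution for Eq. 2.28 can be replayed with exact rational arithmetic. The following is a minimal sketch, assuming that no pivot vanishes (so no interchange is needed); the name `solve_3x3` is hypothetical.

```python
from fractions import Fraction


def solve_3x3(A, rhs):
    """Gaussian elimination without interchanges, then back substitution.

    Assumes every pivot met along the way is nonzero.
    Returns the upper triangular coefficients, the new known terms,
    and the solution (x, y, z).
    """
    a = [[Fraction(v) for v in row] for row in A]
    r = [Fraction(v) for v in rhs]
    # Steps (i) and (ii): clear the coefficients below the main diagonal.
    for k in range(2):
        for h in range(k + 1, 3):
            m = a[h][k] / a[k][k]          # the "suitable constant"
            a[h] = [a[h][j] - m * a[k][j] for j in range(3)]
            r[h] -= m * r[k]
    # Back substitution on the upper triangular system.
    z = r[2] / a[2][2]
    y = (r[1] - a[1][2] * z) / a[1][1]
    x = (r[0] - a[0][1] * y - a[0][2] * z) / a[0][0]
    return a, r, (x, y, z)


b, c = 10, 20
U, r, sol = solve_3x3([[1, 2, 3], [4, 5, 6], [7, 8, b]], [10, 13, c])
print(U[2], r[2], sol)
```

With b = 10 and c = 20 the last row of `U` holds the coefficient b − 9 and the last known term is c − 16, as in Eq. 2.30.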
• The case b = 9
Here, we assume that b = 9. In this case, the last equation in Eq. 2.30 reads 0 z = c − 16. Thus, if in addition c ≠ 16, the solution does not exist. Much more interesting is the case b = 9 and c = 16. In this case, the sum of the first and last equations equals the second equation multiplied by two. Indeed,
(x+2y+3z −10)−2 (4x+5y+6z−13)+(7x+8y+9z−16) = 0,
(2.32)
independently of the values of x, y, and z. In other words, one of the three equations may be expressed in terms of the other two. For instance, the second equation equals the semi–sum of the other two, and hence it is redundant, since it does not provide us with any additional information.4 Therefore, it may be discarded, and we are left with only two equations with three unknowns. We can assign an arbitrary value to one of the unknowns, say z = C and solve for the other two. Of course, we can do the same with Eq. 2.30. Here, the last equation is 0 z = 0, which is identically satisfied and therefore may be discarded. Next, we fix z, say z = C. We are left with x + 2y = 10 − 3 C, −3y = −27 + 6 C.
(2.33)
The second gives us y, and then the first one gives us x. The result is x = −8 + C, y = 9 − 2 C, z = C,
(2.34)
with C arbitrarily chosen. [Again, let us check our results. Substituting x = −8 + C, y = 9 − 2 C and z = C into the original problem (Eq. 2.28, with b = 9 and c = 16), we have indeed (−8 + C) + 2 (9 − 2 C) + 3 C = 10, 4 (−8 + C) + 5 (9 − 2 C) + 6 C = 13, 7 (−8 + C) + 8 (9 − 2 C) + 9 C = 16,
(2.35)
as required. Note that C does not appear on the right side.]
• The homogeneous case. Nontrivial solution
Next, consider the corresponding homogeneous case x + 2y + 3z = 0, 4x + 5y + 6z = 0, 7x + 8y + bz = 0.
(2.36)
Footnote 4. For your information, the prefixes half (from German), semi (from Latin), and hemi (from ancient Greek) have the same meaning. Thus, we use the term “positive semiaxis,” to indicate one half of the x-axis (specifically from the origin, in the positive direction); we also use “semi–sum” to mean “one half of the sum,” as well as “semi–difference” to mean “one half of the difference.” Similarly, “hemisphere” stands for “one half of a sphere.” [Even demi (from French) means half, as in “demi–sec” (half–dry champagne), or “demitasse” (half a cup, or espresso cup).]
The system resulting from the Gaussian elimination procedure is given by Eq. 2.30 with right sides equal to zero, namely x + 2y + 3z = 0, −3y − 6z = 0, (b − 9)z = 0.
(2.37)
In this case, you may immediately see that if b ≠ 9 the (unique) solution is the trivial solution x = y = z = 0. [Recall that the zero solution of a homogeneous system is called the trivial solution (Definition 33, p. 66), whereas any nonzero solution is called a nontrivial solution (Remark 21, p. 66).] More interesting is the case b = 9. Now, the third equation may be discarded, and one variable, say z, may be given an arbitrary value, say z = C. Then, the solution is x = C, y = −2C, z = C.
(2.38)
[Again, we say that the nontrivial solution is uniquely defined except for a multiplicative constant, namely C.] Let us verify our results. Substituting x = 1, y = −2 and z = 1 into Eq. 2.36, we have indeed: 1 − 2 · 2 + 3 · 1 = 0, 4 · 1 − 5 · 2 + 6 · 1 = 0, 7 · 1 − 8 · 2 + 9 · 1 = 0.
(2.39)
Remark 24. Note that the non–unique solution in Eq. 2.34 is obtained from the solution with z = 0 (namely x = −8, y = 9 and z = 0), by adding the nontrivial solution of the corresponding homogeneous system (Eq. 2.38), with z = C. [As we will see, this consideration is valid in general (Theorem 36, p. 130).]
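The statement of Remark 24 is easy to check numerically for the example at hand. The sketch below evaluates the residuals of Eq. 2.28 (with b = 9 and c = 16) for the sum of the particular solution (−8, 9, 0) and C times the homogeneous solution (1, −2, 1); the helper names are illustrative.

```python
def residuals(x, y, z, rhs=(10, 13, 16)):
    """Left side minus known term for each equation of Eq. 2.28 with b = 9."""
    return (x + 2 * y + 3 * z - rhs[0],
            4 * x + 5 * y + 6 * z - rhs[1],
            7 * x + 8 * y + 9 * z - rhs[2])


particular = (-8, 9, 0)     # Eq. 2.34 with C = 0
homogeneous = (1, -2, 1)    # Eq. 2.38 with C = 1


def general(C):
    """Particular solution plus C times the homogeneous solution."""
    return tuple(p + C * h for p, h in zip(particular, homogeneous))


for C in (-2, 0, 3):
    print(C, general(C), residuals(*general(C)))
```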
2.3.3 Another illustrative example; ∞^2 solutions
It should be noted that the 3 × 3 case may be richer than the 2 × 2 case, in the sense that not just one, but two equations of the resulting upper triangular
system may have all the left–side coefficients equal to zero. To illustrate this issue, consider the system x + 2y + 3z = c1 , 2x + 4y + 6z = c2 , 3x + 6y + 9z = c3 .
(2.40)
Applying the Gaussian elimination procedure, after the very first step, we have x + 2y + 3z = c1 , 0 = c2 − 2c1 , 0 = c3 − 3c1 .
(2.41)
Here, you have the situation indicated above: two equations of the resulting upper triangular system have all the coefficients on the left side equal to zero. Clearly, the solution exists if, and only if, c2 = 2c1 and c3 = 3c1. This example shows that, if the second and third rows of coefficients are equal to the first one multiplied by suitable constants (namely 2 and 3, in our case), then the same must hold true for the right sides, in order for the solution to exist. Of course, if it exists, the solution is not unique. In this case, 2 equations may be discarded, and correspondingly there exist two unknowns (for instance y and z), to which we may assign arbitrary values. To indicate that we have two arbitrary constants, mathematicians say that there are ∞^2 solutions. [Similarly, we say that for the equation in Eq. 2.34 we have ∞^1 solutions.] Accordingly, in the present case, for Eq. 2.40 with c2 = 2c1 and c3 = 3c1, we say that there are ∞^2 solutions, whereas, for Eq. 2.28 with b = 9 and c = 16, we say that there are ∞^1 solutions. ◦ Warning. Recall that in this book ∞ is not a number (Remark 10, p. 44). Accordingly, you ought to think of the symbol ∞^p not as a number, but rather as a convenient way to state the concept with a short mathematical expression.
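As a small illustration of the two arbitrary constants, the following Python sketch tests the solvability condition of Eq. 2.41 and builds a solution of Eq. 2.40 from freely chosen values of y and z. The function names are hypothetical, and x is obtained from the only surviving equation, x + 2y + 3z = c1.

```python
def is_consistent(c1, c2, c3):
    """Solvability condition read off Eq. 2.41."""
    return c2 == 2 * c1 and c3 == 3 * c1


def solution(c1, y, z):
    """One member of the two-parameter family: y and z are free."""
    return (c1 - 2 * y - 3 * z, y, z)


def residuals(x, y, z, c1):
    """Residuals of Eq. 2.40 when c2 = 2 c1 and c3 = 3 c1."""
    return (x + 2 * y + 3 * z - c1,
            2 * x + 4 * y + 6 * z - 2 * c1,
            3 * x + 6 * y + 9 * z - 3 * c1)


print(is_consistent(5, 10, 15), solution(5, 1, 1))
```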
2.3.4 Interchanging equations and/or unknowns Some comments are in order. Specifically, we must point out that sometimes, in order for us to be able to continue the Gaussian elimination procedure, it is necessary to perform appropriate interchanges of equations and/or unknowns.
Of course, this approach works if, and only if, at least one of the remaining coefficients does not vanish. To illustrate this issue, let us consider a small modification of Eq. 2.40, namely x + 2y + 3z = c1 , 2x + 4y + 7z = c2 , 3x + 7y + 9z = c3 .
(2.42)
Applying the first cycle of the Gaussian elimination procedure, we have x + 2y + 3z = c1 , 0 + 0 + z = c2 − 2c1 , 0 + y + 0 = c3 − 3c1 .
(2.43)
Now, the coefficient a22 (namely the coefficient of the second unknown y in the second equation) vanishes, but (contrary to what we had in Eq. 2.41) not all the coefficients in the last two equations vanish. Accordingly, we cannot proceed in the same way as we did before, when we went from Eq. 2.29 to Eq. 2.30 (specifically, used the second equation to remove the second unknown from the third equation). Nonetheless, the problem may be circumvented by introducing a small modification to the Gaussian elimination procedure. We have two possibilities. The first one consists in interchanging the last two equations, so as to obtain x + 2y + 3z = c1 , 0 + y + 0 = c3 − 3c1 , 0 + 0 + z = c2 − 2c1 ,
(2.44)
which is an upper triangular system (Eq. 2.27), as desired. The second possibility consists in interchanging the variables y and z, so as to have x + 3z + 2y = c1 , 0 + z + 0 = c2 − 2c1 , 0 + 0 + y = c3 − 3c1 .
(2.45)
which is again an upper triangular system. To explore additional possibilities, let us consider another system, namely
x + 2y + 3z = c1 , 2x + 4y + 6z = c2 , 3x + 6y + 10z = c3 .
(2.46)
Applying the Gaussian elimination procedure, we have, after the very first step, x + 2y + 3z = c1 , 0 + 0 + 0 = c2 − 2c1 , 0 + 0 + z = c3 − 3c1 .
(2.47)
Interchanging the last two equations and the last two unknowns, we have x + 3z + 2y = c1 , 0 + z + 0 = c3 − 3c1 , 0 + 0 + 0 = c2 − 2c1 ,
(2.48)
which is again an upper triangular system, with the left–side coefficients of the last equation all equal to zero. In the form provided in Eq. 2.48, we can easily see that the solution does not exist if c2 − 2c1 ≠ 0. On the other hand, if c2 − 2c1 = 0, the solution exists, but is not unique. We can choose y arbitrarily, namely we have ∞^1 solutions.
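The first possibility discussed in this subsection (interchanging equations) can be sketched in Python as follows. This is only an illustration: it interchanges equations, never unknowns, and therefore stops with an error on a system such as Eq. 2.46, where an interchange of unknowns is required. The function name and the values chosen for c1, c2, c3 are arbitrary.

```python
from fractions import Fraction


def eliminate_with_row_swaps(A, rhs):
    """Forward elimination that swaps in a later equation when a pivot vanishes.

    Returns (U, r, swaps). Raises ValueError when no later equation has a
    nonzero coefficient in the pivot column (an interchange of unknowns
    would be needed, which this sketch does not attempt).
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    r = [Fraction(v) for v in rhs]
    swaps = []
    for k in range(n - 1):
        if U[k][k] == 0:
            for h in range(k + 1, n):
                if U[h][k] != 0:
                    U[k], U[h] = U[h], U[k]
                    r[k], r[h] = r[h], r[k]
                    swaps.append((k, h))
                    break
            else:
                raise ValueError("pivot column vanishes; interchange unknowns")
        for h in range(k + 1, n):
            m = U[h][k] / U[k][k]
            U[h] = [U[h][j] - m * U[k][j] for j in range(n)]
            r[h] -= m * r[k]
    return U, r, swaps


c1, c2, c3 = 1, 3, 5   # arbitrary illustrative known terms
U, r, swaps = eliminate_with_row_swaps([[1, 2, 3], [2, 4, 7], [3, 7, 9]],
                                       [c1, c2, c3])
print(U, r, swaps)
```

For Eq. 2.42 the sketch performs exactly one interchange (second and third equations, recorded with 0-based indices) and the known terms come out as c1, c3 − 3c1, c2 − 2c1.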
2.3.5 Existence and uniqueness theorems
Let us be more formal and present some theorems that summarize the above results for 3 × 3 linear algebraic systems. Before doing this, it is convenient to show that the Gaussian elimination procedure yields equivalent linear algebraic systems (Definition 37, p. 70). Indeed, we have the following generalization of Remark 23, p. 72:
Theorem 12 (Equivalence of Gaussian elimination systems). Given a linear algebraic 3 × 3 system, the Gaussian elimination procedure yields a 3 × 3 upper triangular system equivalent to the original one, in the sense that the solution to the final system satisfies also the original one and vice versa.
◦ Proof : Indeed, the only operations performed to go from the original system to the upper triangular one consist in adding to an equation an earlier one multiplied by a suitable constant. Once we substitute the solution into the original system, adding an equation multiplied by a constant to another equation corresponds to adding the same number to both sides of the second equation. [For, adding the same number to the two sides of an equation is a legitimate operation, in the sense that the equality remains valid (Eq. 1.27).] Each such operation may be undone by subtracting the same multiple of the earlier equation, so that the converse holds as well.
Next, consider the following
Lemma 2. Given a 3 × 3 system, consider the equivalent upper triangular system, as obtained through the Gaussian elimination procedure, inclusive of possible interchanges of equations and/or unknowns. Only two possibilities can occur, namely:
1. All the main–diagonal terms of the equivalent upper triangular system differ from zero.
2. All the left–side coefficients of the last p equations (p = 1 or p = 2) of the equivalent upper triangular system vanish.
◦ Proof : Indeed, the Gaussian elimination procedure is interrupted if, and only if, all the coefficients of the remaining equations vanish. For, if at least one of the coefficients does not vanish, through suitable interchanges of equations and/or unknowns, this coefficient may be moved into the diagonal location under consideration, thereby allowing the procedure to be continued, as we did in Subsection 2.3.4.
Remark 25. In order to avoid lengthy sentences, here (and typically in the rest of the book) by x, y and z we denote, respectively, the first, second and third unknown after the Gaussian elimination procedure has been completed, including any possible interchange of equations and/or unknowns. To be specific, we assume that first we perform the Gaussian elimination in order to identify the necessary interchanges. Then, we repeat the Gaussian elimination with the interchanges already performed. When I say “including any possible interchange of equations and/or unknowns,” I am referring to the second time we follow the Gaussian elimination procedure.
Consider a homogeneous 3 × 3 linear algebraic system and the equivalent upper triangular system obtained via Gaussian elimination. We have the following
Theorem 13 (Homogeneous 3 × 3 systems). For the homogeneous 3 × 3 linear algebraic system, defined in Eq. 2.25, consider the upper triangular system obtained via Gaussian elimination. Then, there exist only two possibilities, namely:
1. If none of the diagonal coefficients of the equivalent upper triangular system vanishes, the original system admits the solution x = y = z = 0. This solution is unique and is called the trivial solution.
2. If all the left–side coefficients of the last p = 1 or 2 equations of the equivalent upper triangular system vanish, the original system admits non–zero solutions. These are not unique and are called nontrivial solutions. Specifically, we can obtain ∞^p (p = 1, 2) distinct nontrivial solutions, which are defined except for multiplicative constants.
◦ Proof : Lemma 2 above assures us that there exist only two possibilities. Consider the first one, namely that all the main–diagonal terms of the equivalent upper triangular system differ from zero. In this case, it is apparent that the only solution for the last equation is z = 0; then, the second equation yields only y = 0, and finally the first yields only x = 0. There are no arbitrary values to assign to the variables. In other words, the solution of our equivalent upper triangular system exists, is unique, and is given by x = y = z = 0 (trivial solution). As a consequence, the same is true for the original system, again because the two are equivalent systems (Theorem 12, p. 77). Next, consider the second possibility, namely that all the coefficients of the p last equations (p = 1 or 2) of the equivalent upper triangular system vanish. In this case, we can assign arbitrary values to p variables. For instance, if p = 2, we can assign the value 1 to one of them, and 0 to the other; then we can do the reverse. This way, we have ∞^2 distinct solutions for the original system, because the upper triangular system and the original one are equivalent systems (Theorem 12, p. 77).
Next, consider a nonhomogeneous 3 × 3 linear algebraic system. We have the following
Theorem 14 (Nonhomogeneous 3 × 3 systems). Consider the nonhomogeneous 3 × 3 linear algebraic system in Eq. 2.24. There exist three possibilities, namely:
1. Assume that none of the diagonal coefficients of the equivalent upper triangular system vanishes. Then, the solution exists and is unique.
2. Assume that, in the equivalent upper triangular system, all the left–side coefficients in the last p = 1 or 2 equations vanish, but at least one of the corresponding right–side terms does not. Then, the solution does not exist.
3. Assume that, in the equivalent upper triangular system, all the left–side coefficients in the last p = 1 or 2 equations vanish, and that the same holds true for the right–side terms. Then, the solution exists, but is not unique. There are ∞^p solutions.
◦ Proof : Let us begin with Item 1 (which corresponds to Case 1 of Lemma 2, p. 78). From the last equation, we can solve for z, back substitute this into the second equation and solve for y, and finally back substitute in the first equation and obtain x (recall Remark 25, p. 78, on reordering equations
and/or renaming unknowns). Thus, the solution of the upper triangular system exists and is unique. Therefore, the same is true for the original system, because the two are equivalent systems (Theorem 12, p. 77). Next, consider the last two items (which are subcases of Case 2 of Lemma 2, p. 78). In Item 2, all the left–side coefficients of the last p = 1, 2 equations vanish, but the corresponding right–side terms of the equations are not all zero. In other words, at least in one case, we have: 0 = c ≠ 0, a clear impossibility. Thus, it is apparent that the solution does not exist, and the same holds true for the original system (equivalent because of Theorem 12, p. 77). Finally, consider Item 3, namely that all the left–side coefficients of the last p = 1, 2 equations vanish, and the corresponding right–side terms vanish as well. Then, the last p equations may be discarded. Hence, we can choose arbitrary values for p unknowns and solve for the others. Thus, in this case, the solution exists, but is not unique. We have ∞^p solutions. [Again this is true because of Theorem 12, p. 77.] ◦ Comment. More on this in Subsection 7.4.1, where we provide a geometrical interpretation of this theorem in terms of the intersection of three planes, and discuss what happens when the diagonal coefficient of the last p equations (p = 1 or 2) of the upper triangular system nearly vanishes (ill–conditioned 3 × 3 systems).
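The three cases of Theorem 14 can be told apart mechanically. The following Python sketch carries out the elimination with interchanges of equations and/or unknowns (in the spirit of Lemma 2) and then inspects the last p equations. It is one possible arrangement under the stated assumptions, not a prescribed algorithm: exact rational arithmetic is used so that "vanishes" can be tested exactly, and the function name `classify` is hypothetical.

```python
from fractions import Fraction


def classify(A, rhs):
    """Return 'unique', 'none', or ('infinite', p) for a square system."""
    n = len(A)
    # Augmented rows: left-side coefficients followed by the known term.
    M = [[Fraction(v) for v in row] + [Fraction(c)] for row, c in zip(A, rhs)]
    k = 0
    while k < n:
        # Look for any nonzero left-side coefficient in the remaining block.
        pivot = next(((h, j) for h in range(k, n) for j in range(k, n)
                      if M[h][j] != 0), None)
        if pivot is None:
            break                        # all remaining left sides vanish
        h, j = pivot
        M[k], M[h] = M[h], M[k]          # interchange equations
        for row in M:                    # interchange unknowns (columns)
            row[k], row[j] = row[j], row[k]
        for h in range(k + 1, n):
            m = M[h][k] / M[k][k]
            M[h] = [M[h][c] - m * M[k][c] for c in range(n + 1)]
        k += 1
    p = n - k                            # equations with vanishing left side
    if p == 0:
        return "unique"
    if any(M[h][n] != 0 for h in range(k, n)):
        return "none"
    return ("infinite", p)


print(classify([[1, 2, 3], [4, 5, 6], [7, 8, 10]], [10, 13, 20]))
print(classify([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [10, 13, 16]))
print(classify([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [10, 13, 17]))
```

The three calls correspond to Eq. 2.28 with (b, c) equal to (10, 20), (9, 16) and (9, 17), respectively.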
2.3.6 Explicit solution for a generic 3 × 3 system
♠
Here, for the sake of completeness, we present an explicit expression for the solution of the generic 3 × 3 linear algebraic system, as given in Eq. 2.24. Let us consider first the case in which none of the diagonal terms of the resulting upper triangular system vanishes. Here, we apply a slight modification of the Gaussian elimination procedure: multiply the first equation by a21 and subtract it from the second equation multiplied by a11 (nonstandard Gaussian elimination procedure). [In the standard Gaussian elimination, the second is not multiplied by a constant. For the difference between the two, see Remark 27, p. 82.] Similarly, multiply the first equation by a31 and subtract it from the third equation multiplied by a11 . Then, for the last two equations, we have (a11 a22 − a21 a12 ) x2 + (a11 a23 − a21 a13 ) x3 = a11 b2 − a21 b1 , (a11 a32 − a31 a12 ) x2 + (a11 a33 − a31 a13 ) x3 = a11 b3 − a31 b1 .
(2.49)
Similarly, if we apply again the nonstandard Gaussian elimination procedure (namely multiply the first in Eq. 2.49 by a11 a32 − a31 a12 , and subtract the
result from the second in Eq. 2.49 multiplied by a11 a22 −a21 a12 ), one obtains, for the last equation of the upper triangular system, a x3 = b,
(2.50)
with a := (a11 a22 − a21 a12 ) (a11 a33 − a31 a13 ) − (a11 a32 − a31 a12 ) (a11 a23 − a21 a13 ), b := (a11 a22 − a21 a12 ) (a11 b3 − a31 b1 ) − (a11 a32 − a31 a12 ) (a11 b2 − a21 b1 ).
(2.51)
If we perform the multiplications in the first of the two expressions above, we obtain eight terms on the right side of a. Two of these terms (namely those that do not contain a11 ) equal ± a21 a12 a31 a13 and offset each other. Similarly, two of the right–side terms of b (again those that do not contain a11 ) equal ± a21 a12 a31 b1 and offset each other. Then, we are left with a = a11 A and b = a11 A3 ,
(2.52)
where A := a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31
(2.53)
and A3 := a11 a22 b3 + a12 b2 a31 + b1 a21 a32 − a11 b2 a32 − a12 a21 b3 − b1 a22 a31 .
(2.54)
Finally, combining Eqs. 2.50 and 2.52, dividing both sides of the resulting equation by a11 ≠ 0, and solving for x3 , we have x3 = A3 /A,
(2.55)
with A and A3 given respectively by Eqs. 2.53 and 2.54. Next, back–substitution into the upper triangular system gives us x2 and then x1 . With a lot of patience, you may verify that this yields xk = Ak /A (k = 1, 2),
(2.56)
where A1 := b1 a22 a33 + a12 a23 b3 + a13 b2 a32 − b1 a23 a32 − a12 b2 a33 − a13 a22 b3 , A2 := a11 b2 a33 + b1 a23 a31 + a13 a21 b3 − a11 a23 b3 − b1 a21 a33 − a13 b2 a31 .
(2.57)
Remark 26 (Rotating indices). A simple way to obtain this result is by “cyclic rotation of the indices,” namely by replacing 1 with 2, as well as 2 with 3, and 3 with 1. This is justified by the fact that the numbering system used for the variables and for the equations is completely arbitrary and the results apply if we move the last variable to become the first and the last equation to become the first (equivalent systems!). This indeed corresponds to “rotating indices,” in the sense indicated above. Note that rotating indices does not affect the expression for A (Eq. 2.53), as you may verify, or take my word for it. We have assumed that A, as given in Eq. 2.53, does not vanish. It is apparent that only in this case the solution exists and is unique. On the other hand, if A = 0, we have again the last two of the possibilities presented in Theorem 14, p. 79. [Take my word for it.] Remark 27. The following comments may be of interest. Note that in the Gaussian elimination procedure, as presented thus far, only one equation is multiplied by a suitable constant, in the sense that the equation from which a preceding one is subtracted is not altered. [As we will see in Section 2.4, this applies to the n × n case as well.] In the nonstandard Gaussian elimination procedure, on the contrary, both equations are multiplied by suitable constants. [As we will see, this difference impacts only the evaluation of the determinant, a notion barely touched upon in the next chapter (Section 3.8). This issue is addressed in full in Vol. II.]
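The explicit expressions of Eqs. 2.53 to 2.57 lend themselves to a direct numerical check. The sketch below relies on an observation that can be read off those equations: each Ak coincides with the expression for A in which the coefficients of the k-th unknown (a1k, a2k, a3k) are replaced by b1, b2, b3. The function names are hypothetical, and indices are 0-based in the code.

```python
from fractions import Fraction


def A_of(a):
    """Six-term expression of Eq. 2.53 for a 3 x 3 array of coefficients."""
    return (a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1] - a[0][0] * a[1][2] * a[2][1]
            - a[0][1] * a[1][0] * a[2][2] - a[0][2] * a[1][1] * a[2][0])


def explicit_solution(a, b):
    """Return (x1, x2, x3) = (A1/A, A2/A, A3/A); requires A != 0."""
    A = A_of(a)
    if A == 0:
        raise ValueError("A = 0: the solution is not unique or does not exist")
    sol = []
    for k in range(3):
        # A_k: same expression, with column k replaced by the known terms.
        ak = [[b[h] if j == k else a[h][j] for j in range(3)] for h in range(3)]
        sol.append(Fraction(A_of(ak), A))
    return tuple(sol)


# Eq. 2.28 with b = 10 and c = 20.
coeffs = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
known = [10, 13, 20]
print(A_of(coeffs), explicit_solution(coeffs, known))
```

For this example the result agrees with the values x = −4, y = 1, z = 4 obtained earlier by elimination.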
2.4 The n × n linear algebraic systems
In this section, we consider the generic n × n linear algebraic system a11 x1 + a12 x2 + · · · + a1n xn = b1 , a21 x1 + a22 x2 + · · · + a2n xn = b2 , ..., an1 x1 + an2 x2 + · · · + ann xn = bn .
(2.58)
We have the following definitions:
Definition 39 (Homogeneous n × n system). Akin to Definition 34, p. 69, for the 3 × 3 case, an n × n linear algebraic system is called homogeneous iff all the right–side terms vanish, namely iff b1 = · · · = bn = 0: a11 x1 + a12 x2 + · · · + a1n xn = 0, a21 x1 + a22 x2 + · · · + a2n xn = 0, ..., an1 x1 + an2 x2 + · · · + ann xn = 0.
(2.59)
Otherwise, it is called nonhomogeneous. Definition 40 (Main diagonal). Consider the array of the coefficients ahk in Eq. 2.58. Its main diagonal (or simply the diagonal ) is composed of the elements a11 , a22 , . . . , ann . [This generalizes Definition 36, p. 70, limited to the 3 × 3 case.] Definition 41 (Upper and lower triangular systems). An n × n linear algebraic system is called upper triangular iff all the coefficients below the main diagonal vanish, as in a11 x1 + a12 x2 + · · · + a1n xn = b1 , a22 x2 + · · · + a2n xn = b2 , ..., ann xn = bn .
(2.60)
An n×n linear algebraic system is called lower triangular iff all the coefficients above the diagonal vanish. [This generalizes Definition 38, p. 70, again limited to the 3 × 3 case.]
2.4.1 Gaussian elimination procedure We have the following extension of Definition 37, p. 70, of equivalent 3 × 3 linear algebraic systems: Definition 42 (Equivalent n × n systems). Two n × n linear algebraic systems are called equivalent iff any solution of the first is also a solution of the second, and vice versa. The objective of the Gaussian elimination procedure is to transform the linear algebraic system in Eq. 2.58 into an equivalent upper triangular system,
as in Eq. 2.60, whose solution is easy to obtain (Remark 22, p. 70). The procedure is conceptually identical to that used for the 3 × 3 systems (of course, we use the convention presented in Remark 25, p. 78, on reordering equations and/or renaming unknowns). Specifically, the Gaussian elimination procedure consists of two phases. The first phase, in turn, consists at most of n − 1 cycles. In the first cycle, we subtract the first equation, multiplied by a suitable constant, from all the other equations, so as to eliminate the first unknown from each one of them (of course, a different constant is used for each equation). In the second cycle, we subtract the second equation of the resulting system, multiplied by a suitable constant, from the h-th equation (h > 2), so as to eliminate the second unknown from all of them. If we continue this way, after the last cycle the resulting system is upper triangular. In the second phase of the Gaussian elimination procedure, we obtain the solution of the resulting upper triangular system (starting from xn and going backwards). ◦ Comment. The solution we obtain coincides with that of the original system, because the two systems are equivalent. [The proof of the equivalence of the original system to the upper triangular system obtained via Gaussian elimination will be provided by Theorem 15, p. 86, an extension to n × n systems of Theorem 12, p. 77, limited to 3 × 3 systems.] The details of the procedure and the corresponding existence/uniqueness theorems are provided in the two subsections that follow.
2.4.2 Details of the Gaussian elimination procedure
♠
Let us go back to the Gaussian elimination procedure for the system in Eq. 2.58. Remark 28. Again, here we use the convention in Remark 25, p. 78, on reordering equations and/or unknowns. Specifically, without loss of generality, we assume that no interchange of rows and/or columns is required. Indeed, we may always perform all the necessary interchanges before we begin the elimination procedure. In other words, we presume that all the necessary interchanges (albeit not known a priori ) have been performed before we begin the elimination procedure. [In practice, this requires that first we use a Gaussian elimination procedure, simply to determine the interchanges necessary. Next, we perform the required interchanges. Then, we go through the Gaussian elimination procedure a second time, this time with no need for interchanges. Of course, we would never do this. The important fact is that it can be done if we want to.]
We proceed as stated in the preceding section: in order to eliminate the variable x1 from the second to the last equation, we multiply the first equation by ah1 /a11 and subtract the result from the h-th equation (h = 2, . . . , n), to obtain
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n &= b_1, \\
0 + a'_{22} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n &= b'_2, \\
0 + a'_{32} x_2 + a'_{33} x_3 + \cdots + a'_{3n} x_n &= b'_3, \\
&\;\;\vdots \\
0 + a'_{n2} x_2 + a'_{n3} x_3 + \cdots + a'_{nn} x_n &= b'_n,
\end{aligned}
\tag{2.61}
\]
where
\[
a'_{hk} = a_{hk} - \frac{a_{h1}}{a_{11}}\, a_{1k} \quad (k = 1, \dots, n),
\qquad
b'_h = b_h - \frac{a_{h1}}{a_{11}}\, b_1,
\tag{2.62}
\]
with h = 2, . . . , n. [Note that for k = 1, the first in Eq. 2.62 yields $a'_{h1} = 0$ (h = 2, . . . , n), as shown in Eq. 2.61.]
Remark 29. Here, the symbol $a'_{hk}$ (to be read “ahk prime”) is used to indicate that the value of the coefficient in the location hk has been altered with respect to $a_{hk}$. The same holds true for the coefficients $b'_h$. Note also that the coefficients of the first equation have not been changed with respect to Eq. 2.58. [Similarly, in Eq. 2.63 the symbol $a''_{hk}$ (to be read “ahk double prime”) will be used to indicate that the value of the coefficient in the location hk has been altered with respect to $a'_{hk}$. The same holds true for the coefficients $b''_h$.]
Next, we want to eliminate the variable x2 from the third to the last equation. To accomplish this, we multiply the second equation by $a'_{h2}/a'_{22}$ and subtract the result from the h-th equation (h = 3, . . . , n), to obtain
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n &= b_1, \\
0 + a'_{22} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n &= b'_2, \\
0 + 0 + a''_{33} x_3 + \cdots + a''_{3n} x_n &= b''_3, \\
&\;\;\vdots \\
0 + 0 + a''_{n3} x_3 + \cdots + a''_{nn} x_n &= b''_n,
\end{aligned}
\tag{2.63}
\]
where
86
Part I. The beginning – Pre–calculus
ahk = ahk − bh = bh −
ah2 a a22 2k
(k = 2, . . . , n),
ah2 b , a22 2
(2.64)
with h = 3, . . . , n. The first two rows remain unchanged with respect to Eq. 2.61. [Note that this time, Eq. 2.64 with k = 2 yields ah2 = 0 (h = 3, . . . , n), in agreement with Eq. 2.63.] If we continue this way, in the end we obtain that for sure the first n − 1 coefficients of the last equation are equal to zero, and the system is upper triangular.
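To make the procedure concrete, here is a minimal Python sketch of the elimination steps in Eqs. 2.62 and 2.64, followed by back substitution. It is only an illustration under the assumption of Remark 28 (no interchange is ever needed). The function names `forward_eliminate` and `back_substitute`, the 0-based indices, the use of exact fractions and the sample 3 × 3 system are my own choices, not part of the text.

```python
from fractions import Fraction


def forward_eliminate(a, b):
    """Reduce the n x n system a x = b to upper triangular form.

    Assumes (as in Remark 28) that every pivot met along the way is
    nonzero, so that no interchange is required. The inputs are copied.
    """
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    b = [Fraction(v) for v in b]
    for p in range(n - 1):                  # pivot row (0-based)
        if a[p][p] == 0:
            raise ZeroDivisionError("zero pivot: an interchange would be needed")
        for h in range(p + 1, n):           # rows below the pivot row
            factor = a[h][p] / a[p][p]      # a_hp / a_pp, as in Eqs. 2.62 and 2.64
            for k in range(p, n):
                a[h][k] -= factor * a[p][k]
            b[h] -= factor * b[p]
    return a, b


def back_substitute(u, c):
    """Solve the upper triangular system u x = c (nonzero diagonal assumed)."""
    n = len(u)
    x = [Fraction(0)] * n
    for h in range(n - 1, -1, -1):          # last equation first
        known = sum(u[h][k] * x[k] for k in range(h + 1, n))
        x[h] = (c[h] - known) / u[h][h]
    return x


# Hypothetical sample system, chosen only for illustration.
a = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
b = [4, 10, 24]
u, c = forward_eliminate(a, b)
x = back_substitute(u, c)
```

Exact fractions are used so that a coefficient that should vanish is exactly zero; with floating–point numbers one would have to compare against a tolerance instead.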
2.4.3 Existence and uniqueness theorems ♠
In this subsection, included only for the sake of completeness, we present some theorems (including those of existence and uniqueness) that extend to $n \times n$ linear algebraic systems the $3 \times 3$ results. [The material here is essentially a repeat of Subsection 2.3.5, on existence and uniqueness theorems for $3 \times 3$ systems. Accordingly, you may skip it and take my word for every statement in this subsection.]

Before proceeding, it is convenient to show that the Gaussian elimination procedure yields equivalent linear algebraic systems (Definition 42, p. 83), thereby generalizing Theorem 12, p. 77, limited to $3 \times 3$ systems. We have the following

Theorem 15. The Gaussian elimination procedure yields an upper triangular system equivalent to the original one.

◦ Proof : The proof is virtually identical to that for the $3 \times 3$ case (Theorem 12, p. 77). Specifically, once we substitute the solution into the original system, adding an equation multiplied by a constant to another equation corresponds to adding the same number to the two sides of the second equation, which is a legitimate operation (Eq. 1.27).

Generalizing Lemma 2, p. 78 (limited to $3 \times 3$ systems), we also have the following

Lemma 3. Given an $n \times n$ system, consider the equivalent upper triangular system obtained through the Gaussian elimination procedure. Then, only two cases may occur, namely:
1. All the diagonal terms of the equivalent upper triangular system differ from zero.
2. All the left–side coefficients of the last $p$ equations of the equivalent upper triangular system vanish, where $p$ can take any value between 1 and $n - 1$, namely $p \in [1, n - 1]$.

◦ Proof : The proof is similar to that for the $3 \times 3$ case (Lemma 2, p. 78). Specifically, the Gaussian elimination procedure is interrupted if, and only if, all the coefficients of the remaining equations vanish. [For, if at least one of the coefficients does not vanish, through suitable interchanges of equations and/or unknowns, this coefficient may be moved into the diagonal location under consideration, thereby allowing the procedure to be continued (Subsection 2.3.4).]

Then, generalizing Theorem 13, p. 78 (limited to $3 \times 3$ problems), we have the following

Theorem 16 (Homogeneous $n \times n$ systems). Let us consider a homogeneous $n \times n$ linear algebraic system and the equivalent upper triangular system obtained via Gaussian elimination. We have only two possibilities, namely:
1. If all the diagonal coefficients of the equivalent upper triangular system are different from zero, only the solution $x_1 = \cdots = x_n = 0$ exists. This solution is unique and is called the trivial solution.
2. If all the coefficients of the last $p \in [1, n - 1]$ equations of the equivalent upper triangular system vanish, the solution exists, but is not unique. We can find $\infty^p$ distinct nontrivial solutions, which are defined except for multiplicative constants.

◦ Proof : Lemma 3 above assures us that there are only two possibilities. Consider the first one, namely that all the main–diagonal terms of the equivalent upper triangular system differ from zero. In this case, the only solution for the last equation is $x_n = 0$. Then, we can back substitute this into the preceding equation and obtain $x_{n-1} = 0$, and so on. Thus, in this case, only the trivial solution of the upper triangular system exists. Therefore, the same is true for the original system, because the two are equivalent systems (Theorem 15, p. 86).

Next, consider Item 2, namely that all the left–side coefficients of the last $p \in [1, n - 1]$ equations vanish (Case 2 of Lemma 3 above). These equations may be discarded. Then, we have $n - p$ equations with $n$ unknowns. Thus, we can choose arbitrary values for the last $p$ unknowns and solve for the others. [In particular, if $p > 1$, we can assign the value 1 to any one of them, and 0 to the remaining ones, for a total of $p$ distinct solutions.]

Next, consider the nonhomogeneous $n \times n$ problem, namely the case in which not all the coefficients $b_j$ vanish. Generalizing Theorem 14, p. 79, to the $n \times n$ problem, we have the following
Theorem 17 (Nonhomogeneous $n \times n$ systems). Consider a nonhomogeneous $n \times n$ linear algebraic system and the equivalent upper triangular system obtained via Gaussian elimination. Then, we have three possibilities, namely:
1. If all the diagonal coefficients of the equivalent upper triangular system are different from zero, the solution exists and is unique.
2. If the equivalent upper triangular system has $p \in [1, n - 1]$ equations with left–side coefficients that are all equal to zero, but the same does not hold true for the right–side terms, the solution does not exist.
3. If the equivalent upper triangular system has $p \in [1, n - 1]$ equations with left–side coefficients that are all equal to zero, and the same holds true for the right–side terms, the solution exists, but is not unique.

◦ Proof : Item 1 corresponds to Case 1 of Lemma 3, p. 86. The other two are subcases of Case 2 of such a lemma. Let us consider first Item 1. Specifically, assume that all the diagonal terms of the resulting upper triangular system differ from zero. In this case, from the last equation we can obtain $x_n$, then we can back substitute this into the preceding one and solve for $x_{n-1}$, and so on. Thus, the solution of the upper triangular system exists and is unique. The same is true for the original system, because the two are equivalent systems (Theorem 15, p. 86).

Next, consider Item 2. We have, at least for one equation, $0 = c \neq 0$, again a clear impossibility! The solution does not exist.

Finally, consider Item 3, namely assume that all the left–side coefficients of the last $p$ equations vanish, and the corresponding right–side terms vanish as well. The last $p$ equations may be discarded, and we are left with $n - p$ equations with $n$ unknowns. Thus, we can choose arbitrary values for $p$ unknowns and solve for the others. In this case, the solution exists, but is not unique. By setting one of the unknowns equal to 1 and the others to 0, we obtain $p$ distinct solutions, which are known except for $p$ arbitrary multiplicative constants.
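The three possibilities of Theorem 17 can be told apart mechanically. The Python sketch below is one way to do so, under assumptions of my own: it interchanges rows when a pivot vanishes and simply moves on to the next column when a whole column has no usable pivot (instead of interchanging unknowns, as done in the text), and it works with exact fractions. The name `classify` and the sample systems are hypothetical.

```python
from fractions import Fraction


def classify(a, b):
    """Return 'unique', 'none' or 'infinite' for the n x n system a x = b."""
    n = len(a)
    # Augmented rows [a_h1, ..., a_hn | b_h], in exact arithmetic.
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)]
            for row, rhs in zip(a, b)]
    used = 0                                # number of pivot rows found so far
    for col in range(n):
        # Look for a nonzero coefficient in this column among the unused rows.
        pivot = next((r for r in range(used, n) if rows[r][col] != 0), None)
        if pivot is None:
            continue                        # nothing to eliminate in this column
        rows[used], rows[pivot] = rows[pivot], rows[used]
        for r in range(used + 1, n):
            factor = rows[r][col] / rows[used][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[used])]
        used += 1
    # The rows from index `used` on have left-side coefficients all equal to zero.
    if any(row[n] != 0 for row in rows[used:]):
        return "none"                       # 0 = c with c != 0 (Item 2)
    return "unique" if used == n else "infinite"   # Item 1 or Item 3
```

For instance, `classify([[1, 1], [2, 2]], [1, 3])` falls under Item 2, whereas replacing the right side with `[1, 2]` gives Item 3.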
Chapter 3
Matrices and vectors
In this chapter, I raise the ante, in the sense that I will increase the level of mathematical sophistication. However, I maintain, such an increase is for you a blessing in disguise. Let me clarify this statement. To begin with, I will cover pretty much the same material covered in Chapter 2. So, what’s new? The main difference is that I will use a more modern and sophisticated mathematical jargon and notation, which in the end will make your life easier. Indeed, this new jargon has two advantages. First of all, it is more concise and will allow us to communicate without using lengthy expressions. I am referring to the matrix notation. [Another step toward conciseness stems from the introduction of the terms “linear combination,” “linear dependence” and “linear independence.”] The second (and more important) advantage of this new language is that the concepts introduced here pave the way to the introduction of even more powerful generalizations, as extensions of those presented here. I am thinking of theorems on the superposition for linear algebraic systems and their later extension to the so–called linear operators. In summary, the approach introduced here may appear to be an unnecessary elaboration of what we have seen in the preceding chapter. However, I promise, your patience will be rewarded later.
• Overview of this chapter

We begin this chapter by introducing a new mathematical entity, called the matrix. Just to give you a simple idea of a matrix, let us consider the coefficients in Eq. 2.1, which reads

$$
\begin{aligned}
a\,x + b\,y &= f, \\
c\,x + d\,y &= g.
\end{aligned}
\tag{3.1}
$$

© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_3
It is convenient (as will be apparent at the end of this chapter) to conceive of the coefficients $a$, $b$, $c$ and $d$ as an array of numbers, namely

$$
\mathsf{A} := \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
\tag{3.2}
$$

This array is called a $2 \times 2$ matrix.¹

In Section 3.1, we extend Eq. 3.2 to $m \times n$ matrices (namely arrays having $m$ rows and $n$ columns) and also consider an $m \times 1$ matrix, which is also called a column matrix, or an $m$-dimensional vector. For instance, the terms on the right sides of Eq. 2.1 may be considered to form a two–dimensional vector, which is written as

$$
\mathsf{f} := \begin{Bmatrix} f \\ g \end{Bmatrix}.
\tag{3.3}
$$

Similarly,

$$
\mathsf{x} := \begin{Bmatrix} x \\ y \end{Bmatrix}
\tag{3.4}
$$

is called the vector of the unknowns.²

Still in Section 3.1, we present two basic operations that govern the use of matrices, namely the product of a matrix (in particular, a vector) by a number, and the sum of two matrices (or two vectors) having the same dimension. These two operations are essential in Section 3.2, to introduce the ubiquitous summation symbol, as well as the definition of linear combinations of matrices and vectors. This, in turn, is used in Section 3.3, to introduce two definitions (extremely important for the material in this book), namely those of linear dependence and linear independence of matrices and vectors.

¹ The first to see that the coefficients of a linear algebraic system could be arranged into an array (namely a matrix) was Gottfried Wilhelm Leibniz (1646–1716), an eminent German mathematician and philosopher (yes, the one of the monads and metaphysical optimism). In mathematics, he is best known for introducing infinitesimal calculus, independently of Isaac Newton. [Considerations on the respective contributions were presented in Footnote 4, p. 12.] Moreover, Leibniz worked on determinants, linear equations and what is now called Gaussian elimination (Chapter 2). The Leibniz chain rule (Eq. 9.24) is named after him. In addition, Leibniz made significant contributions to analytic geometry.

² On notations. In this book, upper–case sans serif letters, like $\mathsf{A}, \mathsf{B}, \dots$ always represent matrices, as in Eq. 3.2, whereas lower–case sans serif letters, like $\mathsf{a}, \mathsf{b}, \dots$, always represent vectors, as in Eq. 3.3. The notation that uses upper–case or lower–case sans serif letters will be referred to as matrix notation. Brackets, namely $[\,\dots]$, are used for matrices; braces (also known as curly brackets), namely $\{\dots\}$, are used for vectors.

Next, in Section 3.4, we introduce additional operations on matrices. We begin with the definition of products between matrices, in particular, between a matrix and a vector. The usefulness of introducing these rules may be appreciated by the fact that the definition of the product of a matrix $\mathsf{A}$ by a vector will be such that a system of two equations with two unknowns, namely a system of the type in Eq. 3.1, may be written as

$$
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{Bmatrix} x \\ y \end{Bmatrix}
=
\begin{Bmatrix} f \\ g \end{Bmatrix},
\tag{3.5}
$$

or, in a more compact form,

$$
\mathsf{A}\,\mathsf{x} = \mathsf{f},
\tag{3.6}
$$
where $\mathsf{A}$, $\mathsf{f}$ and $\mathsf{x}$ are defined by Eqs. 3.2, 3.3 and 3.4, respectively. We also introduce symmetric and antisymmetric matrices and the matrix operation $\mathsf{A} : \mathsf{B}$ (Eq. 3.95).

Then, in Section 3.5, we revisit the Gaussian elimination method for the solution of linear algebraic systems and restate some of the theorems introduced in the preceding chapter in terms of the newly acquired terminology. This will enable us to use expressions that are much more concise than those in Chapter 2. Still in Section 3.5, we introduce some special matrices, namely positive definite matrices, and study the solvability of the corresponding systems. We also extend Eq. 3.6 to $m \times n$ matrices, with $m \neq n$.

Next, in Section 3.6, we finally clarify what is meant by “linearity.” We will also show why the property of linearity is so important, by introducing certain theorems (known as superposition theorems) that hold for the so–called linear systems.

We also have three appendices. In the first, Section 3.7, we introduce the so–called partitioned matrices. In the second, Section 3.8, we introduce the definition of the determinant of a matrix, which is a number that depends upon all the elements of the matrix. For the sake of simplicity, we do this only for the limited cases of $2 \times 2$ and $3 \times 3$ matrices. [For, the formulation for $n \times n$ matrices is quite complicated and unnecessary for the objective of this volume. As already stated in Remark 16, p. 58, when I was a student I always hated determinants of $n \times n$ matrices, and I still do, as they are complicated and not gut–level at all. They will be addressed when needed, namely in Vol. II.] Finally, in the third appendix (Section 3.9), we discuss the Gaussian elimination procedure for the so–called tridiagonal matrices.
3.1 Matrices

Here, we formally introduce matrices and vectors and some of their basic operations. Consider the left–side coefficients in Eq. 2.58. Akin to the $2 \times 2$ case (see Eq. 3.2), these coefficients form an $n \times n$ array. We have the following definitions:

Definition 43 (Matrix). Expanding upon the definition in Eq. 3.2, an $m \times n$ matrix $\mathsf{A}$ is an array of numbers, with $m$ rows and $n$ columns. Specifically, we have

$$
\mathsf{A} = [a_{hk}] :=
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\dots & \dots & \dots & \dots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}.
\tag{3.7}
$$

The positive integers $m$ and $n$ are called the dimensions, or sizes, of the matrix $\mathsf{A}$. The coefficients $a_{hk}$, where $h = 1, \dots, m$ and $k = 1, \dots, n$, are called the elements, or entries, of the matrix. The elements $a_{h1}, \dots, a_{hn}$ form the $h$-th row of the matrix $\mathsf{A}$. The elements $a_{1k}, \dots, a_{mk}$ form the $k$-th column of the matrix $\mathsf{A}$. The element $a_{hk}$ belongs to the $h$-th row and the $k$-th column. [Again, matrices are denoted by sans serif capital letters, such as $\mathsf{A}, \mathsf{B}, \dots$ (Footnote 2, p. 90).]

Definition 44 (Square matrix). An $n \times n$ matrix $\mathsf{A}$ is called a square matrix of order $n$. The main diagonal of a square matrix consists of the elements $a_{kk}$ ($k = 1, \dots, n$), which are called the diagonal elements, as in Definition 40, p. 83. The other elements are called the off–diagonal elements. The superdiagonal (subdiagonal) consists of the elements just above (just below) the main diagonal. The corresponding elements are called superdiagonal (subdiagonal) elements. A matrix is called diagonal iff all the off–diagonal elements vanish; correspondingly, we will use the notation

$$
\mathsf{A} =
\begin{bmatrix}
a_{11} & 0 & \dots & 0 \\
0 & a_{22} & \dots & 0 \\
\dots & \dots & \dots & \dots \\
0 & 0 & \dots & a_{nn}
\end{bmatrix}
= \mathrm{Diag}\,[a_{kk}].
\tag{3.8}
$$

A matrix is called tridiagonal iff its nonzero elements are only diagonal, subdiagonal and superdiagonal.

Definition 45 (Transpose matrix). The transpose of an $m \times n$ matrix $\mathsf{A}$, denoted by $\mathsf{A}^{\mathsf{T}}$, is an $n \times m$ matrix, which is obtained by interchanging rows with columns, namely $a_{hk}$ with $a_{kh}$:

$$
\mathsf{A}^{\mathsf{T}} =
\begin{bmatrix}
a_{11} & a_{21} & \dots & a_{m1} \\
a_{12} & a_{22} & \dots & a_{m2} \\
\dots & \dots & \dots & \dots \\
a_{1n} & a_{2n} & \dots & a_{mn}
\end{bmatrix}.
\tag{3.9}
$$
In simple terms, given an $m \times n$ matrix $\mathsf{A}$, its transpose $\mathsf{A}^{\mathsf{T}}$ is obtained by “flipping” $\mathsf{A}$ around the line $a_{11}, a_{22}, \dots$. Note the difference between Eqs. 3.7 and 3.9, in that rows and columns are interchanged. [The term “transpose” is used also as a verb, specifically “to transpose $\mathsf{A}$” means “to consider the transpose of $\mathsf{A}$.”]

Definition 46 (Upper and lower triangular matrices). Consistently with the definition of upper and lower triangular linear algebraic systems (Definition 41, p. 83), a square matrix is called upper triangular, or in upper triangular form, iff all the elements below its main diagonal vanish. A square matrix is called lower triangular, or in lower triangular form, iff all the elements above its main diagonal vanish.
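The definitions of transpose and of triangular matrices translate almost word for word into code. The following is a small Python sketch, assuming (my choice, not the book's) that a matrix is stored as a list of rows; the function names are hypothetical.

```python
def transpose(a):
    """Return the n x m transpose of the m x n matrix a (a list of rows)."""
    m, n = len(a), len(a[0])
    # Element (k, h) of the transpose is element (h, k) of a, as in Eq. 3.9.
    return [[a[h][k] for h in range(m)] for k in range(n)]


def is_upper_triangular(a):
    """True iff a is square and all elements below the main diagonal vanish."""
    n = len(a)
    is_square = all(len(row) == n for row in a)
    return is_square and all(a[h][k] == 0 for h in range(n) for k in range(h))


def is_lower_triangular(a):
    """True iff a is square and all elements above the main diagonal vanish."""
    # Flipping around the main diagonal turns "above" into "below".
    return is_upper_triangular(transpose(a))
```

Note that transposing twice gives back the original matrix, and that the transpose of an upper triangular matrix is lower triangular.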
3.1.1 Column matrices and row matrices

A matrix consisting of a single column, as in Eq. 3.3, deserves a special name. Accordingly, let us introduce the following definitions:

Definition 47 (Vector, or column matrix). A matrix with a single column and $n$ rows is called an $n$-dimensional column matrix, or an $n$-dimensional vector. Vectors are denoted by any of the following notations

$$
\mathsf{b} = \{ b_h \} =
\begin{Bmatrix} b_1 \\ b_2 \\ \dots \\ b_n \end{Bmatrix},
\tag{3.10}
$$

where $b_h$ ($h = 1, \dots, n$) is called the $h$-th element of $\mathsf{b}$.

Definition 48 (Row matrix). A $1 \times n$ matrix is called a row matrix, or simply an $n$-dimensional horizontal vector. Note that a row matrix is the transpose of a column matrix. Accordingly, we use the following notations

$$
\mathsf{c}^{\mathsf{T}} = \lfloor c_k \rfloor = \lfloor c_1, c_2, \dots, c_n \rfloor.
\tag{3.11}
$$
◦ Comment. It might be useful to emphasize that: (i) brackets, namely $[\ ]$, are used only for matrices, (ii) braces (curly brackets), namely $\{\ \}$, are used only for column matrices, namely vectors, and (iii) “funny–looking brackets,” namely $\lfloor\ \rfloor$, are used only for row matrices. [The last two notations are widely used in structural analysis, and I have adopted them because, in my opinion, they are visually helpful.]

◦ Warning. Sometimes, for the sake of notational compactness, we take advantage of Eq. 3.11, and use, instead of Eq. 3.10, the convenient notation

$$
\mathsf{b} = \lfloor b_1, b_2, \dots, b_n \rfloor^{\mathsf{T}}.
\tag{3.12}
$$

This is particularly convenient when the expression is used within the text.

◦ Warning. The $k$-th column of an $m \times n$ matrix $\mathsf{A} = [a_{hk}]$ is denoted by

$$
\mathsf{a}_k = \lfloor a_{1k}, a_{2k}, \dots, a_{mk} \rfloor^{\mathsf{T}}.
\tag{3.13}
$$

Similarly, the $h$-th row of the $m \times n$ matrix $\mathsf{A} = [a_{hk}]$ is denoted by

$$
\breve{\mathsf{a}}_h^{\mathsf{T}} = \lfloor a_{h1}, a_{h2}, \dots, a_{hn} \rfloor.
\tag{3.14}
$$
Remark 30 (A word of caution. Vector vs vector). In Chapter 15, we will introduce vectors of another kind (such as the velocity in two or three dimensions). In other words, the term “vector,” which mathematicians traditionally use to refer to a column matrix, has a completely different meaning for physicists, who by the term “vector ” mean an oriented segment (such as an arrow), which satisfies specific rules and operations. This terminology may be really confusing. For example, one could encounter the expression: “The components of a vector form a vector,” to be understood as “the components of a vector (with the meaning given by physicists) form a vector (with the meaning given by mathematicians).” However, I have not been able to come up with a different terminology, so as to avoid any possible source of confusion. Indeed, there exists no other name for a vector, with the meaning given by physicists. On the other hand, there are special column matrices called eigenvectors (Remark 166, p. 961), for which no other name is used either. Thus, I will follow the tradition and use the term vector for both entities. However, I will use the terms mathematicians’ vector, or physicists’ vector, any time the meaning does not emerge clearly from the context. This way, we can say “The components of a physicists’ vector form a mathematicians’ vector,” without fear of causing any confusion. ◦ Warning. For the time being (namely until we arrive at Chapter 15) the term “vector” will be used only in reference to column matrices, namely mathematicians’ vectors.
3.1.2 Basic operations on matrices and vectors

We have the following

Definition 49 (Scalar). Within a context that includes vectors, numbers are referred to as scalar quantities, or simply as scalars, just to emphasize their different nature when compared to vectors.

Here, we introduce two basic operations for matrices: (i) multiplication of a matrix by a scalar, and (ii) sum of two matrices. Specifically, we have the following definitions:

Definition 50 (Multiplication of a matrix by a scalar). Given a scalar $\beta$ and an $m \times n$ matrix $\mathsf{A}$, the multiplication of $\mathsf{A}$ by the scalar $\beta$ is defined by

$$
\beta\,\mathsf{A} = \beta\,[a_{hk}] := [\beta\, a_{hk}] =
\begin{bmatrix}
\beta a_{11} & \beta a_{12} & \dots & \beta a_{1n} \\
\beta a_{21} & \beta a_{22} & \dots & \beta a_{2n} \\
\dots & \dots & \dots & \dots \\
\beta a_{m1} & \beta a_{m2} & \dots & \beta a_{mn}
\end{bmatrix}.
\tag{3.15}
$$

In plain words, multiplying a matrix $\mathsf{A}$ by a scalar $\beta$ means to multiply each element of $\mathsf{A}$ by $\beta$.

Definition 51 (Sum of two matrices). Given two $m \times n$ matrices $\mathsf{B}$ and $\mathsf{C}$, their sum is defined by

$$
\mathsf{B} + \mathsf{C} = [b_{hk}] + [c_{hk}] = [b_{hk} + c_{hk}].
\tag{3.16}
$$

In plain words, the sum of two matrices corresponds to an element–by–element addition.

A vector is a particular case of a matrix (column matrix). Thus, we have, as particular cases, the following definitions:

Definition 52 (Multiplication of a vector by a scalar). Given a scalar $\alpha$ and an $n$-dimensional vector $\mathsf{b}$, their multiplication is defined by

$$
\alpha\,\mathsf{b} = \alpha\,\{ b_h \} := \{ \alpha\, b_h \} =
\begin{Bmatrix} \alpha b_1 \\ \alpha b_2 \\ \dots \\ \alpha b_n \end{Bmatrix}.
\tag{3.17}
$$

In plain words, multiplying a vector by a scalar $\alpha$ means to multiply each element by $\alpha$.
Definition 53 (Sum of two vectors). Given two $n$-dimensional vectors $\mathsf{b}$ and $\mathsf{c}$, their sum is defined by

$$
\mathsf{b} + \mathsf{c} = \{ b_h \} + \{ c_h \} = \{ b_h + c_h \}.
\tag{3.18}
$$

In plain words, the sum of two vectors having the same dimension corresponds to an element–by–element addition.

Of course, similar definitions hold for row matrices. Just transpose the last two equations.
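The two basic operations are simple enough to be written in a few lines. Here is a minimal Python sketch of Eqs. 3.15 and 3.16, again assuming that a matrix is stored as a list of rows, so that an $n$-dimensional vector is just an $n \times 1$ matrix. The names `scale`, `add` and `column` are hypothetical helpers of mine.

```python
def scale(beta, a):
    """Multiply every element of the matrix a by the scalar beta (Eq. 3.15)."""
    return [[beta * x for x in row] for row in a]


def add(b, c):
    """Element-by-element sum of two matrices of equal dimensions (Eq. 3.16)."""
    same_rows = len(b) == len(c)
    if not same_rows or any(len(rb) != len(rc) for rb, rc in zip(b, c)):
        raise ValueError("the two matrices must have the same dimensions")
    return [[x + y for x, y in zip(rb, rc)] for rb, rc in zip(b, c)]


def column(values):
    """Store an n-dimensional vector as an n x 1 (column) matrix."""
    return [[v] for v in values]
```

With this storage choice, Eqs. 3.17 and 3.18 need no separate code: `scale(3, column([1, 2]))` and `add(column([1, 2]), column([10, 20]))` are just particular cases.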
3.2 Summation symbol. Linear combinations

In this section, we introduce a very important notion, that of linear combination. Before doing this, it is convenient to introduce the summation symbol, which allows us to shorten the corresponding expressions.
3.2.1 Summation symbol. Kronecker delta

We have the following

Definition 54 (Summation symbol). Given a sequence of numbers, $c_p, c_{p+1}, \dots, c_{q-1}, c_q$, the expression $\sum_{k=p}^{q} c_k$ denotes a sum that includes all the terms $c_k$, with the integer $k$ varying, sequentially, from $k = p$ to $k = q \ge p$, namely

$$
\sum_{k=p}^{q} c_k := c_p + c_{p+1} + \cdots + c_{q-1} + c_q.
\tag{3.19}
$$

The symbol $\sum_{k=p}^{q}$ is known as the summation symbol.³ The subscript $k$ is called the index of summation, whereas $p$ and $q$ are called the end values of the index of summation. Note that, if $q = p$, only one term is included, namely

$$
\sum_{k=p}^{p} c_k := c_p.
\tag{3.20}
$$

³ The symbol coincides with the Greek capital letter sigma, Σ, which stands for “S,” the initial letter of the word “Summa” (Latin for “the highest,” and hence “the sum”).
The definition may be broadened to include the case $q < p$, by adopting the convention that if $q < p$ none of the terms is included, namely

$$
\sum_{k=p}^{q} c_k := 0, \qquad \text{for } q < p.
\tag{3.21}
$$
It is apparent that the use of the summation symbol shortens considerably certain expressions, and hence it is quite useful. For instance, using the summation symbol in Eq. 3.19, the system of $n$ equations with $n$ unknowns in Eq. 2.58 may be written in a more compact way as

$$
\sum_{k=1}^{n} a_{hk}\, x_k = b_h \qquad (h = 1, 2, \dots, n).
\tag{3.22}
$$
• Rules in the use of the summation symbol

There are certain rules for the use of the summation symbol that make it even more useful. The first rule is that the index of summation $k$ may be renamed as we wish, namely we may replace $k$ with any other letter. For instance, we have

$$
\sum_{k=p}^{q} c_k = \sum_{h=p}^{q} c_h.
\tag{3.23}
$$
Both sides equal $c_p + c_{p+1} + \cdots + c_{q-1} + c_q$. For this reason, the index of summation is often referred to as the dummy index of summation.

Even more useful is the consideration that we may replace $k$ with $h + r$, with $r$ fixed, but arbitrary. In this case however, we have to change the end values of the summation index accordingly. Specifically, if we set $k = h + r$, we note that: (i) when $k = p$, we have $h = p - r$, whereas (ii) when $k = q$, we have $h = q - r$. Thus, we have

$$
\sum_{k=p}^{q} c_k = \sum_{h=p-r}^{q-r} c_{h+r}.
\tag{3.24}
$$
Again, both sides equal $c_p + c_{p+1} + \cdots + c_{q-1} + c_q$.

Sometimes we want to extend the summation to all the terms $c_k$, with $p \le k \le q$, but we want to exclude one specific term, say the term $c_r$, with $p \le r \le q$. Then, we use the following notation

$$
\sum_{\substack{k=p \\ (k \neq r)}}^{q} c_k := c_p + c_{p+1} + \cdots + c_{r-1} + c_{r+1} + \cdots + c_{q-1} + c_q.
\tag{3.25}
$$
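If you like to check rules of this kind by brute force, the short Python sketch below does so for Eqs. 3.21, 3.24 and 3.25. The helper `summation`, its `skip` argument and the sample terms $c_k = k^2$ are hypothetical, introduced here only for illustration.

```python
def summation(c, p, q, skip=None):
    """Sum c(k) for k = p, ..., q, leaving out the term with k == skip.

    If q < p the range is empty and the result is 0, as in Eq. 3.21.
    """
    return sum(c(k) for k in range(p, q + 1) if k != skip)


def c(k):
    """Sample terms, chosen arbitrarily: c_k = k squared."""
    return k * k


p, q = 2, 6
shift = 3          # plays the role of r in Eq. 3.24
excluded = 4       # plays the role of r in Eq. 3.25

plain = summation(c, p, q)                                        # Eq. 3.19
shifted = summation(lambda h: c(h + shift), p - shift, q - shift)  # Eq. 3.24
punctured = summation(c, p, q, skip=excluded)                      # Eq. 3.25
empty = summation(c, q, p)                                         # Eq. 3.21
```

Here `plain` and `shifted` coincide, `punctured` differs from `plain` exactly by the excluded term, and `empty` vanishes.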
Also, note that, if we have a double summation and the end values of the second index of summation do not depend upon those of the first, then the two symbols may be interchanged

$$
\sum_{h=n}^{q} \sum_{k=r}^{p} c_{hk} = \sum_{k=r}^{p} \sum_{h=n}^{q} c_{hk}.
\tag{3.26}
$$
Finally, sometimes in the case of the double–summation symbol with indices of summation spanning over the same range (say $h$ and $k$, both spanning from $p$ to $q$), we will use the concise notation

$$
\sum_{h,k=p}^{q} c_{hk} := \sum_{h=p}^{q} \sum_{k=p}^{q} c_{hk}.
\tag{3.27}
$$
The self–evident extension to more than two summation symbols is also used.

◦ Comment.♣ For the expression $\sum_{h=0}^{n} \sum_{k=h}^{n} c_{hk}$, the interchange rule of the two summation symbols is a bit more complicated and not of interest at this point. However, for future reference, we have

$$
\sum_{k=0}^{n} \sum_{h=0}^{k} c_{hk} = \sum_{h=0}^{n} \sum_{k=h}^{n} c_{hk},
\tag{3.28}
$$

as you may verify (use Fig. 3.1).

Fig. 3.1 Double summation
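Equation 3.28 says that both double sums run over the same pairs $(h, k)$, namely those with $0 \le h \le k \le n$. A quick numerical check in Python makes this visible; the value of `n` and the entries of the array `c` are arbitrary choices of mine.

```python
n = 4
# Arbitrary sample terms c_hk, stored so that c[h][k] is the term with indices h, k.
c = [[10 * h + k for k in range(n + 1)] for h in range(n + 1)]

# Left side of Eq. 3.28: outer index k = 0..n, inner index h = 0..k.
left_pairs = [(h, k) for k in range(n + 1) for h in range(k + 1)]
left = sum(c[h][k] for h, k in left_pairs)

# Right side of Eq. 3.28: outer index h = 0..n, inner index k = h..n.
right_pairs = [(h, k) for h in range(n + 1) for k in range(h, n + 1)]
right = sum(c[h][k] for h, k in right_pairs)
```

The two lists of pairs contain the same elements in a different order (column by column versus row by row of the triangular region), which is why `left` and `right` agree.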
• The Kronecker delta, $\delta_{hk}$

It is convenient at this point to introduce the Kronecker delta, which is defined by⁴

$$
\delta_{hk} =
\begin{cases}
1, & \text{for } h = k; \\
0, & \text{for } h \neq k.
\end{cases}
\tag{3.29}
$$
[The symbol $\delta$ denotes the Greek letter delta (Section 1.8). The use of the letter $\delta$ for the Kronecker delta is universally accepted; hence its name.] An important rule that governs the use of the Kronecker delta is that, for any $a_k$ ($k = 1, \dots, n$), we have

$$
\sum_{k=1}^{n} \delta_{hk}\, a_k = a_h,
\tag{3.30}
$$

since, according to Eq. 3.29, all the terms in the sum vanish whenever $k \neq h$, leaving only the $h$-th term, which indeed equals $a_h$.
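The rule in Eq. 3.30 can be checked in a couple of lines of Python. In this sketch the function name `delta` and the sample values are hypothetical; the elements $a_1, \dots, a_n$ are stored in a 0-based list, hence the `k - 1` in the code.

```python
def delta(h, k):
    """Kronecker delta of Eq. 3.29: 1 if the two indices coincide, 0 otherwise."""
    return 1 if h == k else 0


a = [7, -2, 5, 11]          # sample values of a_1, ..., a_4
n = len(a)

# For each h, evaluate the sum over k = 1, ..., n of delta_hk * a_k.
sifted = [sum(delta(h, k) * a[k - 1] for k in range(1, n + 1))
          for h in range(1, n + 1)]
```

Each sum picks out exactly one term, so that `sifted` reproduces the list `a`.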
3.2.2 Linear combinations

The two matrix operations introduced in Subsection 3.1.2 (multiplication by a scalar and addition) allow us to introduce the definition of linear combination. Specifically, we have the following

Definition 55 (Linear combination of matrices). By linear combination of $q$ given $m \times n$ matrices, $\mathsf{A}_1, \dots, \mathsf{A}_q$, with $q$ given scalars $c_1, \dots, c_q$ (known as the coefficients of the linear combination), we mean the sum of the products of each matrix by the corresponding coefficient, namely the following expression

$$
\sum_{k=1}^{q} c_k\, \mathsf{A}_k = c_1 \mathsf{A}_1 + \cdots + c_q \mathsf{A}_q.
\tag{3.31}
$$
In particular, for vectors (namely column matrices), we have the following

Definition 56 (Linear combination of vectors). By linear combination of $q$ given $n$-dimensional vectors, $\mathsf{b}_1, \dots, \mathsf{b}_q$, with $q$ given scalars $c_1, \dots, c_q$, we mean the sum of the products of each vector by the corresponding scalar, namely the following expression:

$$
\sum_{k=1}^{q} c_k\, \mathsf{b}_k = c_1 \mathsf{b}_1 + \cdots + c_q \mathsf{b}_q.
\tag{3.32}
$$

⁴ Named after the German mathematician Leopold Kronecker (1823–1891). He proposed that mathematics limits itself to “mathematical objects which could be constructed with a finite number of operations from the integers” (Ref. [50]). He had major disputes with other mathematicians of his time, in particular with Georg Cantor. [For more on this subject, see Hellman, Ref. [30], Chapter 6.]
◦ Comment. The above definition may be extended to linear combinations of row matrices, which is simply given by the transpose of the above equation. Also, we can speak of linear combinations of linear algebraic equations. Specifically, by linear combination of $m$ equations with $n$ unknowns (namely of the expressions $\sum_{k=1}^{n} a_{hk} x_k = b_h$, where $h = 1, \dots, m$, Eq. 3.22), with $m$ arbitrarily prescribed coefficients $c_1, \dots, c_m$, we mean the following expression:

$$
\sum_{h=1}^{m} c_h \left( \sum_{k=1}^{n} a_{hk}\, x_k \right) = \sum_{h=1}^{m} c_h\, b_h,
\tag{3.33}
$$

which may be written as

$$
\mathsf{c}^{\mathsf{T}} \mathsf{A}\, \mathsf{x} = \mathsf{c}^{\mathsf{T}} \mathsf{b}.
\tag{3.34}
$$

[We can also have linear combinations of other mathematical entities, such as real functions of a real variable (introduced in Section 6.1), operators (introduced in Subsection 6.8.1) and sequences (introduced in Section 8.2).]
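As an illustration of Eq. 3.32, here is a minimal Python sketch that forms a linear combination of vectors, each stored as a plain list of its elements. The function name `linear_combination` and the sample data are hypothetical.

```python
def linear_combination(coefficients, vectors):
    """Return c_1 b_1 + ... + c_q b_q for vectors of equal dimension (Eq. 3.32)."""
    if len(coefficients) != len(vectors):
        raise ValueError("one coefficient per vector is required")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all the vectors must have the same dimension")
    # The h-th element of the result is the sum over k of c_k times (b_k)_h.
    return [sum(c * v[h] for c, v in zip(coefficients, vectors))
            for h in range(n)]


b1 = [1, 0, 3]
b2 = [4, 1, 1]
combo = linear_combination([2, -1], [b1, b2])   # 2 b1 - b2
```

The same function works for matrices as well, provided each matrix is first unrolled into a single list of its elements.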
3.3 Linear dependence of vectors and matrices

In this section, we introduce two very important definitions, specifically those of linear dependence and linear independence of vectors and matrices. As we will see, these notions are crucial in reformulating the results of the preceding chapter, namely those regarding the existence and uniqueness of the solution of an $n \times n$ linear algebraic system. [The importance of these notions stems from the fact that they have a much broader applicability. Indeed, they also apply to other mathematical entities. For instance, linear dependence/independence of functions is addressed in Section 6.2.]

◦ Comment. As mentioned in the introductory remarks for this chapter, these notions represent a considerable increase in the level of abstraction of the material. Indeed, the concepts might be hard to grasp at first. However, once you have assimilated them, you will find out that it was worth the effort; you have to trust me on that.

Let us begin by introducing the following
Definition 57 (The zero vector $\mathsf{0}$ and the zero matrix $\mathsf{O}$). The symbol $\mathsf{0}$ denotes the zero vector, namely a vector whose elements are all equal to zero. A nonzero vector is any vector different from $\mathsf{0}$. Similarly, the symbol $\mathsf{O}$ denotes the zero matrix, namely a matrix whose elements are all equal to zero. A nonzero matrix is any matrix different from $\mathsf{O}$.

Then, we have the following

Definition 58 (Linearly dependent vectors). Consider $n$ nonzero $m$-dimensional vectors $\mathsf{b}_1, \dots, \mathsf{b}_n$. They are called linearly dependent iff there exists a nontrivial linear combination (namely one in which not all the coefficients $c_k$ vanish) such that

$$
\sum_{k=1}^{n} c_k\, \mathsf{b}_k = \mathsf{0}.
\tag{3.35}
$$
Remark 31. The above definition is equivalent to saying that one of the vectors may be expressed as a nontrivial linear combination of the others. Indeed, in view of the fact that at least one of the coefficients, say $c_j$, differs from zero, then, $\mathsf{b}_j$ may be expressed in terms of the others, as (for the summation symbol in the equation below, recall Eq. 3.25)

$$
\mathsf{b}_j = - \sum_{\substack{k=1 \\ (k \neq j)}}^{n} \frac{c_k}{c_j}\, \mathsf{b}_k.
\tag{3.36}
$$
◦ Comment. In other words, given $n$ nonzero $p$-dimensional vectors, the following three expressions are equivalent:
1. The vectors are linearly dependent;
2. There exists a nontrivial linear combination of the vectors equal to the zero vector;
3. One of the vectors may be expressed as a linear combination of the others.

In particular, two linearly dependent vectors are necessarily proportional to one another, as you may verify.

Next, consider the following

Definition 59 (Linearly independent vectors). The $n$ nonzero vectors $\mathsf{b}_1, \dots, \mathsf{b}_n$ are called linearly independent iff they are not linearly dependent, namely iff the fact that

$$
\sum_{k=1}^{n} c_k\, \mathsf{b}_k = \mathsf{0}
\tag{3.37}
$$
necessarily implies that all the coefficients $c_k$ vanish. [In particular, two vectors are linearly independent iff they are not proportional to one another.]

We have the following

Theorem 18 (Equating coefficients). Given a collection of $n$ linearly independent vectors $\mathsf{b}_k$ ($k = 1, \dots, n$), if we have that

$$
\sum_{k=1}^{n} \alpha_k\, \mathsf{b}_k = \sum_{k=1}^{n} \beta_k\, \mathsf{b}_k,
\tag{3.38}
$$

then, necessarily,

$$
\alpha_k = \beta_k \qquad (k = 1, \dots, n).
\tag{3.39}
$$

◦ Proof : Equation 3.38 may be written as $\sum_{k=1}^{n} (\alpha_k - \beta_k)\, \mathsf{b}_k = \mathsf{0}$, which implies $\alpha_k - \beta_k = 0$ (namely Eq. 3.39), because of the linear independence of the vectors $\mathsf{b}_k$.

Similarly, if we have that

$$
\sum_{k=1}^{n} \alpha_k\, \mathsf{b}_k + \sum_{k=1}^{n} \beta_k\, \mathsf{b}_k = \sum_{k=1}^{n} \gamma_k\, \mathsf{b}_k,
\tag{3.40}
$$

then, necessarily,

$$
\alpha_k + \beta_k = \gamma_k \qquad (k = 1, \dots, n).
\tag{3.41}
$$
The proof is conceptually identical to that of the above theorem. Indeed, you can have as many linear combinations of the same vectors as you wish.

Of course, similar considerations apply to matrices. After all, vectors are particular matrices, namely column matrices. Thus, we have the following

Definition 60 (Linearly dependent, linearly independent matrices). Consider $q$ nonzero $m \times n$ matrices $\mathsf{A}_1, \dots, \mathsf{A}_q$. They are called linearly dependent iff there exists $c_k$ ($k = 1, \dots, q$), not all equal to zero, such that

$$
\sum_{k=1}^{q} c_k\, \mathsf{A}_k = \mathsf{O}.
\tag{3.42}
$$
The $q$ nonzero matrices $\mathsf{A}_1, \dots, \mathsf{A}_q$ are called linearly independent iff they are not linearly dependent, namely iff the fact that

$$
\sum_{k=1}^{q} c_k\, \mathsf{A}_k = \mathsf{O}
\tag{3.43}
$$
necessarily implies that all the coefficients $c_k$ vanish.

◦ Comment. Similar definitions apply to equations. [Note that related concepts were already used in connection with Eq. 2.32, where we were discussing equations, instead of vectors (see also Eq. 3.33). Indeed, the situation in Eq. 2.32 may be restated as: “The three equations in Eq. 2.28, with $b = 9$ and $c = 16$ are linearly dependent, since a nontrivial linear combination of these equations, with coefficients $1$, $-2$ and $1$, vanishes.” The comment that follows Eq. 2.32 may be restated as: “The second equation is equal to a linear combination of the other two, with coefficients equal to $\tfrac{1}{2}$.”]
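Definition 58 does not tell us how to find out whether given vectors are linearly dependent. One possible test, sketched below in Python, borrows the elimination idea of Chapter 2: write the vectors as the rows of an array and eliminate; the vectors are dependent iff some row is reduced to all zeros, since every row produced by elimination is a linear combination of the original ones. This is my own illustration (with exact fractions and a hypothetical function name), not a procedure stated in the text at this point.

```python
from fractions import Fraction


def are_linearly_dependent(vectors):
    """True iff a nontrivial linear combination of the vectors equals the zero vector."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    used = 0                                 # rows that received a pivot so far
    for col in range(len(rows[0])):
        pivot = next((r for r in range(used, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue                         # no usable pivot in this column
        rows[used], rows[pivot] = rows[pivot], rows[used]
        for r in range(used + 1, len(rows)):
            factor = rows[r][col] / rows[used][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[used])]
        used += 1
    # Any row without a pivot has been reduced to all zeros.
    return used < len(rows)
```

For example, the three vectors $\lfloor 1, 2, 3 \rfloor^{\mathsf{T}}$, $\lfloor 4, 5, 6 \rfloor^{\mathsf{T}}$ and $\lfloor 7, 8, 9 \rfloor^{\mathsf{T}}$ are linearly dependent: their linear combination with coefficients $1$, $-2$ and $1$ is the zero vector.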
3.4 Additional operation on matrices

Thus far, we have introduced the following operations for matrices: products by a scalar, sums, and their combined use, namely linear combinations. Here, we introduce an additional operation for matrices, namely the product between two matrices and that between a matrix and a vector. Then, we introduce related material, which includes symmetric and antisymmetric matrices and the operation A : B.

3.4.1 Product between two matrices

We have the following

Definition 61 (Product between matrices). Given an m × q matrix A and a q × n matrix B, the product between A and B, denoted by
\[ \mathbf{C} = \mathbf{A}\, \mathbf{B}, \tag{3.44} \]
is an operation that produces an m × n matrix C, whose elements are given by
\[ c_{hk} = \sum_{j=1}^{q} a_{hj}\, b_{jk} \qquad (h = 1, \dots, m;\ k = 1, \dots, n). \tag{3.45} \]
We will also use the notation
\[ c_{hk} = a_{hj}\, b_{jk}, \tag{3.46} \]
with the same meaning. For instance, we have
\[ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} (1 \cdot 1 + 2 \cdot 2) & (1 \cdot 0 + 2 \cdot 3) \\ (0 \cdot 1 + 1 \cdot 2) & (0 \cdot 0 + 1 \cdot 3) \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}. \tag{3.47} \]
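The row–by–column rule of Eq. 3.45 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function name `matmul` and the list-of-rows storage are assumptions made here for the example.

```python
def matmul(A, B):
    """Row-by-column product C = A B (Eq. 3.45); matrices are lists of rows."""
    m, q, n = len(A), len(B), len(B[0])
    # The number of columns of A must equal the number of rows of B.
    if any(len(row) != q for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # c_hk is the sum over j of a_hj * b_jk.
    return [[sum(A[h][j] * B[j][k] for j in range(q)) for k in range(n)]
            for h in range(m)]


A = [[1, 2], [0, 1]]
B = [[1, 0], [2, 3]]
print(matmul(A, B))  # [[5, 6], [2, 3]], the result of Eq. 3.47
```

The summation index j runs over the adjacent subscripts, exactly as in the definition; a 5 × 3 matrix times a 3 × 4 matrix yields a 5 × 4 matrix.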
Remark 32. Note that the two subscripts j involved in the sum in Eq. 3.45 are the second subscript of the first matrix and the first subscript of the second matrix. This is a general rule: “In matrix products, the summation index always corresponds to the subscripts in the middle of the expression (adjacent subscripts).” Note also that the element $c_{hk}$ is obtained by adding all the products of the j-th element of the h-th row of A by the j-th element of the k-th column of B. For this reason, the matrix product defined above is sometimes referred to as a “row–by–column product.” [This is the only type of matrix–by–matrix product of interest in this book, and hence it is referred to simply as the matrix product.]

Accordingly, in Eq. 3.44, the dimension of the rows of A (namely the number of columns of A) and the dimension of the columns of B (namely the number of rows of B) must be equal. Moreover, A and C have the same number of rows, whereas B and C have the same number of columns. This is shown in the following scheme, where A is a 5 × 3 matrix, B is 3 × 4, C is 5 × 4:
\[ \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix} = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix}. \tag{3.48} \]

◦ Warning. These facts will be often tacitly assumed in the rest of this book.
• Matrix algebra is non–commutative

We have to emphasize that, in general, in a matrix product the order of the two matrices cannot be changed:
\[ \mathbf{A}\, \mathbf{B} \ne \mathbf{B}\, \mathbf{A}. \tag{3.49} \]
[However, exceptions are possible (see Eq. 3.52).] To express Eq. 3.49 concisely, mathematicians say: “In general, matrices do not commute.” An equivalent statement is: “Matrix algebra is non–commutative.”

◦ Comment. Let us examine this issue in detail. Consider a matrix algebra limited, for the time being, to the matrix operations of linear combinations (Subsection 3.2.2) and products (Eq. 3.44). [Matrix inversion will be introduced in Section 16.2.] This algebra is similar to that of real numbers. However, . . . there is a huge however. For the m × q matrix A and the q × n matrix B we have defined the product A B (Eq. 3.44). However, if m ≠ n the product B A does not even exist, because of the above rule that the number of columns of the first matrix must be equal to the number of rows of the second. Moreover, if m = n, both A B and B A exist, but if n ≠ q, namely if the matrices A and B are not square, we have again A B ≠ B A. [For, A B is an n × n matrix, whereas B A is a q × q matrix.] Therefore, necessary conditions to have A B = B A are that the matrices: (i) be square and (ii) have the same dimensions. However, these conditions are not sufficient. For instance, we have
\[ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}, \quad \text{whereas} \quad \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix}, \tag{3.50} \]
as you may verify (follow Eq. 3.47). In summary, we can say that matrix algebra is equal to real–number algebra, except for Eq. 3.49 (again, non–commutative algebra). Accordingly, we have to introduce the following

Definition 62 (Left– and right–multiplication). Consider the product A B. We read this as follows: “The matrix A is right–multiplied by the matrix B.” Alternatively, we say: “The matrix B is left–multiplied by the matrix A.”

Finally, recall that, immediately below Eq. 3.49, I wrote: “However, exceptions are possible.” Indeed, we have, for instance,
\[ \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} ac & 0 \\ 0 & bd \end{bmatrix} = \begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}. \tag{3.51} \]
Accordingly, let us introduce the following

Definition 63 (Commuting matrices). Two matrices, A and B, are said to commute, iff
\[ \mathbf{A}\, \mathbf{B} = \mathbf{B}\, \mathbf{A}. \tag{3.52} \]
[In particular, any two n-dimensional diagonal matrices commute, as you may verify. More on this in Theorem 157, p. 628.] You might like to know that matrices that commute are the exception, not the rule. [The general necessary and sufficient conditions for two matrices to commute are addressed in Vol. II.]
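The three situations discussed above (different shapes, same shapes but different products, and commuting diagonal matrices) may be checked numerically. The following is a minimal sketch; the helper `matmul` and the sample matrices `R`, `S`, `D1`, `D2` are assumptions introduced only for this illustration.

```python
def matmul(A, B):
    """Row-by-column product; matrices are lists of rows with compatible shapes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Square matrices of Eq. 3.50: both products exist, but they differ.
A = [[1, 2], [0, 1]]
B = [[1, 0], [2, 3]]
AB, BA = matmul(A, B), matmul(B, A)
print(AB)  # [[5, 6], [2, 3]]
print(BA)  # [[1, 2], [2, 7]]

# Non-square case: R is 2 x 3 and S is 3 x 2, so R S is 2 x 2 and S R is 3 x 3.
R = [[1, 2, 3], [4, 5, 6]]
S = [[1, 0], [0, 1], [1, 1]]
RS, SR = matmul(R, S), matmul(S, R)
print(len(RS), len(SR))  # 2 3

# Diagonal matrices commute (Eq. 3.51).
D1 = [[2, 0], [0, 3]]
D2 = [[5, 0], [0, 7]]
print(matmul(D1, D2) == matmul(D2, D1))  # True
```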
• Transpose of a matrix product

We have the following

Theorem 19 (Transpose of matrix product). Given an m × q matrix A and a q × n matrix B, the transpose of the matrix product A B is given by
\[ (\mathbf{A}\, \mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T, \tag{3.53} \]
namely the transpose of A B is obtained by taking the transpose of each factor and interchanging the order of multiplication.

◦ Proof : Consider the matrix C = A B, which has elements $c_{hk} = \sum_{j=1}^{q} a_{hj}\, b_{jk}$, by definition of matrix product (Eq. 3.45). Then, setting, for any matrix Q, $\mathbf{Q}^T =: [q^T_{hk}]$, we have $c^T_{hk} = c_{kh} = \sum_{j=1}^{q} a_{kj}\, b_{jh} = \sum_{j=1}^{q} b^T_{hj}\, a^T_{jk}$, in agreement with Eq. 3.53.

Similarly, we have
\[ (\mathbf{A}\, \mathbf{B}\, \mathbf{C})^T = \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T, \tag{3.54} \]
as you may verify. [Apply Eq. 3.53 twice.]
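Equation 3.53 is easy to check on a small non-square example. The sketch below assumes list-of-rows storage and two hypothetical helpers, `matmul` and `transpose`, introduced only for this illustration.

```python
def matmul(A, B):
    """Row-by-column product; matrices are lists of rows with compatible shapes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]  # 3 x 2

lhs = transpose(matmul(A, B))             # (A B)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
print(lhs)         # [[5, 14], [11, 23]]
print(lhs == rhs)  # True
```

Note that the order must be interchanged: here A^T B^T is a 3 × 3 matrix, so it cannot even be compared with the 2 × 2 matrix (A B)^T.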
3.4.2 Products involving column and row matrices

The product operation between matrices is still valid when a matrix is right–multiplied by a column matrix (namely a vector), or left–multiplied by a row matrix, or both. These cases are examined below.

• Matrix–by–vector multiplication

A particular case of the matrix–by–matrix multiplication C = A B (Eq. 3.44) occurs when B is a vector (column matrix). In this case, the result is itself a column matrix. Specifically, the multiplication of an m × n matrix A by an n-dimensional vector b yields an m-dimensional vector c, namely
\[ \mathbf{c} = \mathbf{A}\, \mathbf{b}. \tag{3.55} \]
In terms of elements, we have
\[ c_h = \sum_{k=1}^{n} a_{hk}\, b_k \qquad (h = 1, \dots, m). \tag{3.56} \]
• Multiplications involving row matrices

Similarly, the operation
\[ \mathbf{c}^T = \mathbf{b}^T \mathbf{A} \tag{3.57} \]
means, in terms of elements,
\[ c_k = \sum_{h=1}^{m} b_h\, a_{hk} \qquad (k = 1, \dots, n). \tag{3.58} \]
We also have
\[ \mathbf{c}^T \mathbf{b} = \sum_{k=1}^{n} c_k\, b_k. \tag{3.59} \]
In particular, if c = b, we have, for any n-dimensional real vector b,
\[ \mathbf{b}^T \mathbf{b} = \sum_{k=1}^{n} b_k^2 \ge 0, \tag{3.60} \]
where the equal sign applies if, and only if, b = 0. The fact that, for any b real, we have $\mathbf{b}^T \mathbf{b} \ge 0$ allows us to introduce the following

Definition 64 (Magnitude of a vector). Given any real n-dimensional vector b, its magnitude, denoted by the symbol $\|\mathbf{b}\|$, is defined by
\[ \|\mathbf{b}\| := \sqrt{\mathbf{b}^T \mathbf{b}} = \sqrt{\sum_{k=1}^{n} b_k^2} \ge 0, \tag{3.61} \]
where the equal sign applies if, and only if, b = 0.

◦ Food for thought. You may conceive of this definition as a way to extend to n dimensions the Pythagorean theorems in two and three dimensions (Eqs. 5.6 and 5.8).

Finally, if A is an m × n matrix, b is an n-dimensional vector and c is an m-dimensional vector, we have
\[ \mathbf{c}^T \mathbf{A}\, \mathbf{b} = \sum_{h=1}^{m} \sum_{k=1}^{n} c_h\, a_{hk}\, b_k. \tag{3.62} \]
For instance, for m = n = 2, we have
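\[ \mathbf{c}^T \mathbf{A}\, \mathbf{b} = \begin{bmatrix} c_1 & c_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = c_1 a_{11} b_1 + c_1 a_{12} b_2 + c_2 a_{21} b_1 + c_2 a_{22} b_2. \tag{3.63} \]
Similarly, for m = n = 3, we have
\[ \begin{aligned} \mathbf{c}^T \mathbf{A}\, \mathbf{b} = {} & c_1 a_{11} b_1 + c_1 a_{12} b_2 + c_1 a_{13} b_3 \\ & + c_2 a_{21} b_1 + c_2 a_{22} b_2 + c_2 a_{23} b_3 \\ & + c_3 a_{31} b_1 + c_3 a_{32} b_2 + c_3 a_{33} b_3. \end{aligned} \tag{3.64} \]
In particular, if c = b we have
\[ \mathbf{b}^T \mathbf{A}\, \mathbf{b} = \sum_{h,k=1}^{n} b_h\, a_{hk}\, b_k. \tag{3.65} \]
For instance, for m = n = 2, we have
\[ \mathbf{b}^T \mathbf{A}\, \mathbf{b} = \begin{bmatrix} b_1 & b_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = a_{11} b_1^2 + a_{12} b_1 b_2 + a_{21} b_2 b_1 + a_{22} b_2^2. \tag{3.66} \]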
Similarly, for m = n = 3, we have
\[ \begin{aligned} \mathbf{b}^T \mathbf{A}\, \mathbf{b} = {} & a_{11} b_1^2 + a_{12} b_1 b_2 + a_{13} b_1 b_3 \\ & + a_{21} b_2 b_1 + a_{22} b_2^2 + a_{23} b_2 b_3 \\ & + a_{31} b_3 b_1 + a_{32} b_3 b_2 + a_{33} b_3^2. \end{aligned} \tag{3.67} \]

• The best–kept secret in matrix algebra ♣

I would like to point out that $c_h = \sum_{k=1}^{n} a_{hk}\, b_k$ (Eq. 3.56) implies
\[ \mathbf{c} = \mathbf{A}\, \mathbf{b} = \sum_{k=1}^{n} b_k\, \mathbf{a}_k, \tag{3.68} \]
where $\mathbf{a}_k = [a_{1k}, \dots, a_{mk}]^T$ denotes the k-th column of A (Eq. 3.13). [This equivalence will turn out to be quite useful in applications.] Typically, in my opinion, this fact is not adequately emphasized. Accordingly, in my classes, I refer to it as the “best–kept secret in matrix algebra.” Similarly, we have
\[ \mathbf{c}^T \mathbf{A} = \sum_{h=1}^{m} c_h\, \breve{\mathbf{a}}_h^T, \tag{3.69} \]
where $\breve{\mathbf{a}}_h^T = [a_{h1}, \dots, a_{hn}]$ is the h-th row of A (Eq. 3.14).
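The equivalence between the element-wise rule (Eq. 3.56) and the column-combination view (Eq. 3.68) can be seen side by side in a short sketch. The variable names below (`c_by_rows`, `c_by_columns`, and so on) are assumptions chosen for this illustration only.

```python
A = [[1, 2, 3],
     [4, 5, 6]]   # m = 2 rows, n = 3 columns
b = [1, 0, -1]
m, n = len(A), len(A[0])

# Eq. 3.56: each element c_h is the h-th row of A times b.
c_by_rows = [sum(A[h][k] * b[k] for k in range(n)) for h in range(m)]

# Eq. 3.68: c is the linear combination of the columns a_k with coefficients b_k.
columns = [[A[h][k] for h in range(m)] for k in range(n)]
c_by_columns = [0] * m
for k in range(n):
    for h in range(m):
        c_by_columns[h] += b[k] * columns[k][h]

print(c_by_rows)     # [-2, -2]
print(c_by_columns)  # [-2, -2]
```

Here c = 1 · [1, 4] + 0 · [2, 5] − 1 · [3, 6], which is the same vector obtained row by row.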
3.4.3 Symmetric and antisymmetric matrices

We have the following

Definition 65 (Symmetric and antisymmetric matrices). A square matrix $\mathbf{A} = [a_{hk}]$ is called symmetric iff it is equal to its transpose, $\mathbf{A} = \mathbf{A}^T$, namely iff
\[ a_{hk} = a_{kh}. \tag{3.70} \]
A square matrix $\mathbf{A} = [a_{hk}]$ is called antisymmetric (or skew–symmetric), iff it is equal to the opposite of its transpose, $\mathbf{A} = -\mathbf{A}^T$, namely iff
\[ a_{hk} = -a_{kh}. \tag{3.71} \]
[Equation 3.71 implies that, for all the antisymmetric matrices, we have
\[ a_{hh} = 0, \tag{3.72} \]
since 0 is the only number equal to its opposite.]
• Distinct elements in symmetric and antisymmetric matrices

A symmetric 2 × 2 matrix is given by
\[ \mathbf{A} = \begin{bmatrix} a & b \\ b & d \end{bmatrix} \tag{3.73} \]
(namely Eq. 3.2, with c = b), where the values of a, b and d are arbitrary. Thus, these matrices have only three distinct elements. On the other hand, an antisymmetric 2 × 2 matrix has only one distinct element, as it is given by
\[ \mathbf{A} = \begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix}. \tag{3.74} \]
Similarly, a symmetric 3 × 3 matrix has only six distinct elements, and is given by
\[ \mathbf{A} = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}, \tag{3.75} \]
where the values of a, b, c, d, e and f are arbitrary. On the other hand, an antisymmetric 3 × 3 matrix has only three distinct elements, and is given by
\[ \mathbf{A} = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix}, \tag{3.76} \]
with arbitrary values for a, b and c.

The generic n × n symmetric matrix is given by
\[ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{12} & a_{22} & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ a_{1n} & a_{2n} & \dots & a_{nn} \end{bmatrix}. \tag{3.77} \]
The number of distinct elements is $\tfrac{1}{2}\, n\, (n + 1)$. Indeed, we have n elements along the diagonal. In addition, only half of the $n^2 - n$ remaining elements are distinct. Accordingly, in total we have
\[ n + \frac{1}{2}\, (n^2 - n) = \frac{1}{2}\, n\, (n + 1). \tag{3.78} \]
◦ Comment. Alternatively, we can use the following approach. We have n distinct elements in the first row, n − 1 elements in the second row distinct from those in the first row, n − 2 elements in the third row distinct from those in the first two rows, and so on, and only one element in the last row distinct from those in the preceding rows. The number of distinct elements is the sum of n elements from the first row, n − 1 from the second, and so on, namely
\[ n + (n - 1) + (n - 2) + \dots + 2 + 1 = \sum_{k=1}^{n} k = \frac{1}{2}\, n\, (n + 1). \tag{3.79} \]
[Hint: Following what Gauss arguably did when he was about ten (Footnote 2, p. 59), we can place the numbers 1, 2, . . . , n, in increasing order on one line and, in decreasing order from n to 1, in the line below. We have n columns, with the sum of two numbers in each column always equal to n + 1. Thus, the sum of all the numbers in the two lines is n (n + 1), with one half of this in each line.]

Similarly, the generic n × n antisymmetric matrix is given by
\[ \mathbf{A} = \begin{bmatrix} 0 & a_{12} & \dots & a_{1n} \\ -a_{12} & 0 & \dots & a_{2n} \\ \dots & \dots & \dots & \dots \\ -a_{1n} & -a_{2n} & \dots & 0 \end{bmatrix}. \tag{3.80} \]
The number of distinct elements is $\tfrac{1}{2}\, n\, (n - 1)$, as you may verify. [Hint: Subtract n from Eq. 3.78.]

◦ Comment. The sum of the distinct elements of a symmetric matrix and of an antisymmetric matrix equals
\[ \frac{1}{2}\, n\, (n + 1) + \frac{1}{2}\, n\, (n - 1) = n^2, \tag{3.81} \]
as it should.
• Properties of symmetric and antisymmetric matrices

If y = x, for a 2 × 2 symmetric matrix $\mathbf{A} = [a_{hk}]$, Eq. 3.66 simplifies into
\[ \mathbf{x}^T \mathbf{A}\, \mathbf{x} = a_{11} x_1^2 + a_{22} x_2^2 + 2\, a_{12} x_1 x_2, \tag{3.82} \]
as you may verify. Similarly, for a 3 × 3 symmetric matrix, we have (use Eq. 3.67)
\[ \mathbf{x}^T \mathbf{A}\, \mathbf{x} = a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + 2\, a_{12} x_1 x_2 + 2\, a_{13} x_1 x_3 + 2\, a_{23} x_2 x_3. \tag{3.83} \]
On the other hand, if A is antisymmetric, we have, by definition, $\mathbf{A}^T = -\mathbf{A}$, and hence Eq. 3.86 yields, for any x and y,
\[ \mathbf{y}^T \mathbf{A}\, \mathbf{x} = -\mathbf{x}^T \mathbf{A}\, \mathbf{y}. \tag{3.84} \]
In particular, if y = x, we have
\[ \mathbf{x}^T \mathbf{A}\, \mathbf{x} = 0, \tag{3.85} \]
since 0 is the only number opposite of itself. For instance, for a 3 × 3 antisymmetric matrix $\mathbf{A} = [a_{hk}]$, using the fact that $a_{kk} = 0$ (Eq. 3.72), we have $\mathbf{x}^T \mathbf{A}\, \mathbf{x} = (a_{12} + a_{21})\, x_1 x_2 + (a_{13} + a_{31})\, x_3 x_1 + (a_{23} + a_{32})\, x_2 x_3$, which vanishes because $a_{kh} = -a_{hk}$.

◦ Comment. For future reference, note that $\mathbf{y}^T \mathbf{A}\, \mathbf{x}$ is a scalar, and hence equal to its transpose. Thus, using $(\mathbf{A}\, \mathbf{B}\, \mathbf{C})^T = \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T$ (Eq. 3.54), we have (for any matrix)
\[ \mathbf{y}^T \mathbf{A}\, \mathbf{x} = \left( \mathbf{y}^T \mathbf{A}\, \mathbf{x} \right)^T = \mathbf{x}^T \mathbf{A}^T \mathbf{y}. \tag{3.86} \]
If the matrix A is symmetric, we have $\mathbf{A}^T = \mathbf{A}$ by definition, and hence Eq. 3.86 yields, for any x and y,
\[ \mathbf{y}^T \mathbf{A}\, \mathbf{x} = \mathbf{x}^T \mathbf{A}\, \mathbf{y}. \tag{3.87} \]
[Vice versa, if the above equation is valid for any x and y, then the matrix A is symmetric, as you may verify.]

Remark 33. Also for future reference, note that, for any Q, we have (use $(\mathbf{A}\, \mathbf{B}\, \mathbf{C})^T = \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T$, Eq. 3.54),
\[ \left( \mathbf{Q}^T \mathbf{A}\, \mathbf{Q} \right)^T = \mathbf{Q}^T \mathbf{A}^T \mathbf{Q}. \tag{3.88} \]
Therefore, if A is symmetric, $\mathbf{Q}^T \mathbf{A}\, \mathbf{Q}$ is also symmetric.

Finally, we have the following

Theorem 20. The product of two symmetric matrices is symmetric if, and only if, the two matrices commute.

◦ Proof : Note that, for symmetric matrices, taking the transpose of the product of two symmetric matrices is equivalent to reversing the order of the product. Indeed, we have (for the first equality, use Eq. 3.53)
\[ (\mathbf{A}\, \mathbf{B})^T = \mathbf{B}^T \mathbf{A}^T = \mathbf{B}\, \mathbf{A}. \tag{3.89} \]
Therefore, iff they commute (Eq. 3.52), we have $(\mathbf{A}\, \mathbf{B})^T = \mathbf{A}\, \mathbf{B}$.
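• Decomposition into symmetric and antisymmetric parts ♣

The above definitions of symmetric and antisymmetric matrices allow us to introduce the following

Definition 66 (Symmetric and antisymmetric parts of a matrix). The symmetric part of any square matrix $\mathbf{A} = [a_{hj}]$ is defined as
\[ \mathbf{A}_S := \frac{1}{2} \left( \mathbf{A} + \mathbf{A}^T \right). \tag{3.90} \]
The antisymmetric part of any square matrix $\mathbf{A} = [a_{hj}]$ is defined as
\[ \mathbf{A}_A := \frac{1}{2} \left( \mathbf{A} - \mathbf{A}^T \right). \tag{3.91} \]
[You may verify that indeed $\mathbf{A}_S := \tfrac{1}{2} (\mathbf{A} + \mathbf{A}^T)$ is a symmetric matrix, namely $\mathbf{A}_S^T = \mathbf{A}_S$, and $\mathbf{A}_A := \tfrac{1}{2} (\mathbf{A} - \mathbf{A}^T)$ is antisymmetric, namely $\mathbf{A}_A^T = -\mathbf{A}_A$.]

We have the following

Theorem 21. Any square matrix may be uniquely decomposed into its symmetric and antisymmetric parts, as
\[ \mathbf{A} = \mathbf{A}_S + \mathbf{A}_A, \tag{3.92} \]
with $\mathbf{A}_S$ and $\mathbf{A}_A$ given by Eqs. 3.90 and 3.91.

◦ Proof : Indeed, we have
\[ \mathbf{A}_S + \mathbf{A}_A = \frac{1}{2} \left( \mathbf{A} + \mathbf{A}^T \right) + \frac{1}{2} \left( \mathbf{A} - \mathbf{A}^T \right) = \mathbf{A}. \tag{3.93} \]
On the other hand, assume that $\mathbf{A} = \mathbf{A}_S + \mathbf{A}_A$, with $\mathbf{A}_S$ and $\mathbf{A}_A$ symmetric and antisymmetric, respectively (not necessarily given by Eqs. 3.90 and 3.91). This implies $\mathbf{A}^T = \mathbf{A}_S^T + \mathbf{A}_A^T = \mathbf{A}_S - \mathbf{A}_A$. Combining (namely adding and subtracting the expressions for $\mathbf{A}$ and $\mathbf{A}^T$), we obtain the unique expressions $\mathbf{A}_S = \tfrac{1}{2} (\mathbf{A} + \mathbf{A}^T)$ and $\mathbf{A}_A = \tfrac{1}{2} (\mathbf{A} - \mathbf{A}^T)$, in agreement with Eqs. 3.90 and 3.91.

For future reference, using $\mathbf{A} = \mathbf{A}_S + \mathbf{A}_A$ (Eq. 3.92) and $\mathbf{x}^T \mathbf{A}_A\, \mathbf{x} = 0$ (Eq. 3.85), we have, for any square matrix A and any vector x,
\[ \mathbf{x}^T \mathbf{A}\, \mathbf{x} = \mathbf{x}^T \left( \mathbf{A}_S + \mathbf{A}_A \right) \mathbf{x} = \mathbf{x}^T \mathbf{A}_S\, \mathbf{x}. \tag{3.94} \]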
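The decomposition of Eq. 3.92 and the identity of Eq. 3.94 can be checked on a concrete matrix. The following is a minimal sketch under stated assumptions: the helper names (`sym_part`, `antisym_part`, `quad_form`) and the sample data are introduced here only for illustration, and exact rational arithmetic (`fractions.Fraction`) is used so that the factor 1/2 introduces no rounding.

```python
from fractions import Fraction


def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]


def sym_part(A):
    """Symmetric part, (A + A^T) / 2 (Eq. 3.90)."""
    n = len(A)
    return [[Fraction(A[h][k] + A[k][h], 2) for k in range(n)] for h in range(n)]


def antisym_part(A):
    """Antisymmetric part, (A - A^T) / 2 (Eq. 3.91)."""
    n = len(A)
    return [[Fraction(A[h][k] - A[k][h], 2) for k in range(n)] for h in range(n)]


def quad_form(A, x):
    """The scalar x^T A x (Eq. 3.65)."""
    n = len(x)
    return sum(x[h] * A[h][k] * x[k] for h in range(n) for k in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
AS, AA = sym_part(A), antisym_part(A)
x = [1, -2, 3]

print(AS == transpose(AS))                  # True: AS is symmetric
print(quad_form(AA, x))                     # 0, as in Eq. 3.85
print(quad_form(A, x) == quad_form(AS, x))  # True, as in Eq. 3.94
```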
3.4.4 The operation A : B ♣

Here, we introduce one more matrix operation, with applications to symmetric and antisymmetric matrices. Specifically, we have the following (recall Eq. 3.27 for the definition of double summation symbols)

Definition 67 (The operation A : B). Given two n × n matrices, $\mathbf{A} = [a_{hk}]$ and $\mathbf{B} = [b_{hk}]$, the operation A : B is defined by
\[ \mathbf{A} : \mathbf{B} = \sum_{h,k=1}^{n} a_{hk}\, b_{hk}. \tag{3.95} \]
[Note that the result of the operation is a scalar. Also, recall that in this book the symbol “:” is not used for division (Definition 5, p. 20).]

In particular, for any A, we have
\[ \mathbf{A} : \mathbf{A} = \sum_{h,k=1}^{n} a_{hk}^2 \ge 0, \tag{3.96} \]
where the equal sign holds only for A = O. From the definition, we have
\[ \mathbf{A} : \mathbf{B} = \mathbf{A}^T : \mathbf{B}^T \qquad \text{and} \qquad \mathbf{A} : \mathbf{B} = \mathbf{B} : \mathbf{A}. \tag{3.97} \]
If A is an n × n symmetric matrix and B is an n × n antisymmetric matrix, we have
\[ \mathbf{A} : \mathbf{B} = 0 \qquad \left( \text{if } \mathbf{A}^T = \mathbf{A} \text{ and } \mathbf{B}^T = -\mathbf{B} \right). \tag{3.98} \]
Indeed, the first in Eq. 3.97 yields $\mathbf{A} : \mathbf{B} = \mathbf{A}^T : \mathbf{B}^T = -\mathbf{A} : \mathbf{B}$, which implies Eq. 3.98. [For instance, for 3 × 3 matrices, using $a_{hk} = a_{kh}$ and $b_{kk} = 0$ (Eq. 3.72), we have
\[ \mathbf{A} : \mathbf{B} = a_{12}\, (b_{12} + b_{21}) + a_{13}\, (b_{13} + b_{31}) + a_{23}\, (b_{23} + b_{32}), \tag{3.99} \]
which vanishes, because $b_{hk} = -b_{kh}$.]

Moreover, if A is a symmetric matrix and B an arbitrary matrix, we have that
\[ \mathbf{A} : \mathbf{B} = \mathbf{A} : \mathbf{B}_S \qquad \left( \text{if } \mathbf{A}^T = \mathbf{A} \right). \tag{3.100} \]
Indeed, using $\mathbf{B} = \mathbf{B}_S + \mathbf{B}_A$ (Eq. 3.92), as well as Eq. 3.98, we have
\[ \mathbf{A} : \mathbf{B} = \mathbf{A} : \left( \mathbf{B}_S + \mathbf{B}_A \right) = \mathbf{A} : \mathbf{B}_S. \tag{3.101} \]
Similarly, if A is an antisymmetric matrix and B an arbitrary matrix, then
\[ \mathbf{A} : \mathbf{B} = \mathbf{A} : \mathbf{B}_A \qquad \left( \text{if } \mathbf{A}^T = -\mathbf{A} \right), \tag{3.102} \]
as you may verify.
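Equations 3.98 and 3.100 lend themselves to a quick numerical check. The sketch below is illustrative only: the function name `ddot` and the sample matrices `S` (symmetric), `W` (antisymmetric) and `B` (arbitrary) are assumptions introduced for this example.

```python
from fractions import Fraction


def ddot(A, B):
    """The operation A : B, namely the sum of a_hk * b_hk over h and k (Eq. 3.95)."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


S = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]     # symmetric
W = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]  # antisymmetric
B = [[1, 2, 0], [4, 3, 1], [2, 5, 7]]     # arbitrary

# Symmetric part of B (Eq. 3.90), in exact rational arithmetic.
BS = [[Fraction(B[h][k] + B[k][h], 2) for k in range(3)] for h in range(3)]

print(ddot(S, W))                 # 0, as in Eq. 3.98
print(ddot(S, B) == ddot(S, BS))  # True, as in Eq. 3.100
print(ddot(S, B) == ddot(B, S))   # True, second relation in Eq. 3.97
```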
3.5 A x = b revisited. Rank of A

As anticipated in Eq. 3.6, the notation for matrix–by–vector product introduced above (namely Eq. 3.55 to mean Eq. 3.56), allows us to rewrite the expression $\sum_{k=1}^{n} a_{hk}\, x_k = b_h$ (with h = 1, . . . , n; Eq. 3.22) in a more compact form, as
\[ \mathbf{A}\, \mathbf{x} = \mathbf{b}, \tag{3.103} \]
where $\mathbf{A} = [a_{hk}]$ is an n × n matrix, whereas $\mathbf{x} = \{x_k\}$ and $\mathbf{b} = \{b_h\}$ are n-dimensional vectors.

Here, we take advantage of Eq. 3.103 to reformulate, in terms of linear independence of the rows of A, the results on Gaussian elimination obtained in the preceding chapter, and introduce the notions of: (i) rank of a matrix and (ii) singular and nonsingular matrices.
3.5.1 Linear independence and Gaussian elimination

Let us begin by showing that the rows of an upper triangular matrix with no zero diagonal elements are linearly independent. Specifically, we have the following

Lemma 4. Consider an n × n upper triangular matrix U. If all the diagonal elements differ from zero, namely if
\[ u_{kk} \ne 0 \qquad (k = 1, \dots, n), \tag{3.104} \]
then the rows of U are linearly independent.

◦ Proof : Consider the following problem:
\[ \sum_{h=1}^{n} c_h\, u_{hk} = \begin{bmatrix} c_1, & c_2, & \dots, & c_n \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1n} \\ 0 & u_{22} & \dots & u_{2n} \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & u_{nn} \end{bmatrix} = 0 \qquad (k = 1, \dots, n), \tag{3.105} \]
namely a linear combination of the rows of U equated to zero. In terms of scalar equations, we have
\[ \begin{aligned} c_1 u_{11} &= 0, \\ c_1 u_{12} + c_2 u_{22} &= 0, \\ &\ \,\vdots \\ c_1 u_{1n} + c_2 u_{2n} + \dots + c_n u_{nn} &= 0. \end{aligned} \tag{3.106} \]
Recall that $u_{kk} \ne 0$, by hypothesis. Thus, the first in Eq. 3.106 yields $c_1 = 0$. Then, the second in Eq. 3.106 yields $c_2 = 0$. Continuing this way, we obtain, successively, $c_3 = 0$, . . . , $c_n = 0$. In summary, to satisfy Eq. 3.105 all the coefficients must vanish. Hence, the rows of U are linearly independent, by definition of linearly independent row matrices (transpose Definition 59, p. 101).

Next, consider the upper triangular matrix U, as obtained through the Gaussian elimination procedure. According to Lemma 3, p. 86, there exist only two possibilities: either all the diagonal elements of U differ from zero, or all the elements of the last p ∈ [1, n − 1] rows of U vanish. Then, we have the following

Theorem 22. Consider an n × n matrix A and the corresponding upper triangular matrix U, as obtained through the Gaussian elimination procedure. If all the diagonal elements of U differ from zero, the rows of A are linearly independent. On the other hand, if all the elements of the last p rows of U vanish, then only r := n − p rows of A are linearly independent.

◦ Proof : Again, for simplicity, and without loss of generality, here we assume that no interchange of rows and/or columns is required (Remark 28, p. 84). Let us consider the first possibility, namely that all the diagonal elements of U differ from zero. Then, according to Lemma 4 above, the rows of U are linearly independent. Therefore, the rows of the original matrix A are also linearly independent. [For, the k-th row of U equals the k-th row of A with the addition of a linear combination of the preceding rows of A. Hence, a linear combination of the rows of U is equal to a linear combination of the rows of A. Thus, we have that there exists no nontrivial linear combination of the rows of A which vanishes.] This completes the first part of the theorem.

Next consider the other possibility, namely that all the elements of the last p rows of U vanish (p = 1, 2 . . . ). This means that each of the last p rows of A is a linear combination of the preceding rows of A (which are linearly independent, as you may verify). In other words, only the first n − p rows of A are linearly independent (Definition 58, p. 101; see also Eq. 3.36, and the comment that follows).

◦ Comment. If you have had a course on matrix algebra based upon the use of determinants, you will appreciate the advantage of introducing the notions of linear dependence/independence through the Gaussian elimination procedure, instead of using determinants.
[Let me say it one more time: I always hated determinants as a student, and I still do — they are formally elegant, but abstract — not gut–level at all.]
3.5.2 Rank. Singular matrices

Next, it is convenient to introduce the following

Definition 68 (Rank). The rank r of a square n × n matrix A is the largest number of linearly independent rows of A. [Of course, we have r = n − p, where p is the number of rows that are linearly dependent upon the preceding ones, as determined by the Gaussian elimination procedure.]

Accordingly, we can restate Lemma 3, p. 86, as follows

Lemma 5. Given an n × n system A x = b, only two possibilities may occur:
1. The rank of A equals n. [This corresponds to: “All the diagonal elements of the equivalent U differ from zero.”]
2. The rank of A differs from n, say r = n − p, with p > 0. [This corresponds to: “All the elements of the last p rows of the equivalent U vanish.”]

◦ Proof : This is simply a way to restate Lemma 3, p. 86 in terms of rank (Definition 68 above). [See also Theorem 22, p. 116.]

We also have

Definition 69 (Singular and nonsingular matrices). An n × n matrix A is called nonsingular iff its rows are linearly independent, namely iff its rank is equal to n. An n × n matrix A is called singular iff its rows are linearly dependent, namely iff its rank is less than n.
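The link between Gaussian elimination and rank can be illustrated with a short routine. This is only a sketch under stated assumptions, not the procedure of the preceding chapter verbatim: the function name `rank` is hypothetical, rows are interchanged whenever a zero pivot is met (the text assumes that no interchange is needed), and exact rational arithmetic is used so that “zero” is unambiguous.

```python
from fractions import Fraction


def rank(A):
    """Number of nonzero rows left by Gaussian elimination (cf. Definition 68)."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row, at or below position r, with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        # Subtract multiples of the pivot row from the rows below it.
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


# Row 1 - 2 * row 2 + row 3 vanishes, so only two rows are independent.
print(rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 2 (singular)
print(rank([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 3 (nonsingular)
```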
3.5.3 Linear dependence/independence of columns

We have the following theorems:

Theorem 23. Given a square n × n matrix A, the following statements are equivalent:
1. its columns $\mathbf{a}_k$ (k = 1, . . . , n) are linearly independent,
2. its rows $\breve{\mathbf{a}}_h^T$ (h = 1, . . . , n) are linearly independent,
3. A is nonsingular (rank equals n).

◦ Proof : Indeed, by definition of linear independence, to say that the columns $\mathbf{a}_k$ are linearly independent is equivalent to saying that their linear combination $\sum_{k=1}^{n} c_k\, \mathbf{a}_k$ equals 0 if, and only if, all the coefficients $c_k$ vanish. In other words, using the “best–kept secret in matrix algebra” ($\mathbf{A}\, \mathbf{c} = \sum_{k=1}^{n} c_k\, \mathbf{a}_k$, Eq. 3.68), the columns are linearly independent iff the only solution of A c = 0 is c = 0. This is true iff A is nonsingular, namely iff its rows are linearly independent (Definition 69 above).

Theorem 24. Given a square n × n matrix A, with rank r < n, we have that the number of linearly independent columns equals the number of linearly independent rows, namely r.

◦ Proof :♣ The fact that the number of linearly independent rows equals r stems directly from the definition of rank (Definition 68, p. 116). Next, consider the homogeneous system A x = 0, and perform the Gaussian elimination procedure. [Without loss of generality, we assume again that no interchange of rows and/or columns is required (Remark 28, p. 84).] The fact that there are p = n − r rows linearly dependent means that the last p rows of the resulting system may be discarded. Accordingly, using the partitioned–matrix notation in Appendix A (Section 3.7), we have obtained that the original n × n system A x = 0 is equivalent to an r × n system, given by
\[ \begin{bmatrix} \mathbf{U} & \mathbf{Q} \end{bmatrix} \begin{Bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{Bmatrix} = \mathbf{0}, \tag{3.107} \]
where U is a nonsingular upper triangular r × r matrix, whereas Q is an r × p matrix, and $\mathbf{x}_2$ is arbitrary. Setting $\mathbf{x}_2 = -\{\alpha_k\}$ (namely $\alpha_k = -x_{r+k}$, with k = 1, . . . , p), and using the “best–kept secret in matrix algebra” ($\mathbf{Q}\, \mathbf{c} = \sum_{k=1}^{p} c_k\, \mathbf{q}_k$, where $\mathbf{q}_k$ is the k-th column of Q, Eq. 3.68), we obtain $\mathbf{Q}\, \mathbf{x}_2 = -\sum_{k=1}^{p} \alpha_k\, \mathbf{q}_k$. Accordingly, Eq. 3.107 is equivalent to
\[ \mathbf{U}\, \mathbf{x}_1 = -\mathbf{Q}\, \mathbf{x}_2 = \sum_{k=1}^{p} \alpha_k\, \mathbf{q}_k, \tag{3.108} \]
where $\alpha_k$ are arbitrary. Next, choosing $\alpha_1 = 1$ and $\alpha_h = 0$ (h = 2, . . . , p), we obtain $\mathbf{q}_1 = \mathbf{U}\, \mathbf{x}_1$, for which a unique solution $\mathbf{x}_1$ exists. Therefore, $\mathbf{q}_1$ equals a linear combination of the (r-dimensional) columns of U, namely is linearly dependent upon them. Correspondingly, denoting by $\mathbf{a}_h$ the h-th (n-dimensional) column of the original matrix A, we have that the column $\mathbf{a}_{r+1}$ is a linear combination of the first r columns $\mathbf{a}_h$ (h = 1, . . . , r). [For, adding a row multiplied by a constant to another row does not affect the fact that a column of a matrix is a linear combination of r other ones, as you may verify.] Of course, the same result applies to the other columns as well.
3.5.4 Existence and uniqueness theorem for A x = 0 ♣

Here, we address at a greater depth the theorems of existence and uniqueness for the n × n homogeneous problem A x = 0. We may restate Theorem 16, p. 87, on the solution of a homogeneous n × n linear algebraic system, as follows

Theorem 25 (Homogeneous n × n systems). Consider a homogeneous n × n system of linear algebraic equations and an equivalent upper triangular system. Then, there exist only two possibilities:
1. The rank of the matrix is n (nonsingular matrix). Then, there exists only the trivial solution.
2. The rank of the matrix is r = n − p < n (singular matrix). Then, there exist $\infty^p$ linearly independent nontrivial solutions, which are uniquely defined except for multiplicative constants (Remark 21, p. 66).

◦ Comment. The theorem is simply a more precise restatement of Theorem 16, p. 87. Specifically, “distinct” is replaced by “linearly independent,” a concept that at that point had not yet been introduced. [Indeed, one could have distinct but not linearly independent solutions.]

◦ Proof : Item 1 (r = n) presents no changes with respect to Theorem 16, p. 87. For Item 2 (r < n), let us consider the nontrivial solutions
\[ \mathbf{x}_k := \begin{Bmatrix} \mathbf{x}_1^{(k)} \\ \mathbf{x}_2^{(k)} \end{Bmatrix} \tag{3.109} \]
of Eq. 3.107, where $\mathbf{x}_2^{(k)}$ is arbitrary. Next, let us choose $\mathbf{x}_2 = \mathbf{x}_2^{(k)} = -\{\delta_{jk}\}$ (j, k = 1, . . . , p). These are the desired linearly independent solutions, because the vectors $\mathbf{x}_2^{(k)} = -\{\delta_{jk}\}$ are linearly independent, as you may verify. Any other solution may be expressed as a linear combination of these, as you may verify. [This fact will be further addressed in Theorem 35, p. 129, on the superposition of the solutions of A x = 0.]
3.5.5 Existence and uniqueness theorem for A x = b ♣

In this subsection, we address nonhomogeneous n × n linear algebraic systems.

• Rouché–Capelli theorem

Theorem 17, p. 88, on nonhomogeneous n × n linear algebraic systems, may be restated as follows:⁵

⁵ Named after the French mathematician Eugène Rouché (1832–1910) and the Italian mathematician Alfredo Capelli (1855–1910). [This name for the theorem is used in English speaking countries and in Italy. This theorem is also known as the Rouché–Fontené theorem (France), the Rouché–Frobenius theorem (Spain and many countries in Latin America), the Kronecker–Capelli theorem (Russia, Poland, Romania), and the Frobenius theorem (Czechia and Slovakia). Apparently, each country has its own preference. There might even be more names for this theorem. These are only those I have been able to find by surfing the web.]

Theorem 26 (Rouché–Capelli theorem). Consider a nonhomogeneous n × n linear algebraic system. Introduce the so–called augmented n × (n + 1) matrix
\[ \mathbf{A}^{+} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \dots & \dots & \dots & \dots & \dots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_n \end{bmatrix} = [\mathbf{A}, \mathbf{b}], \tag{3.110} \]
namely the matrix obtained by augmenting the matrix A, by including the vector b as the last column of $\mathbf{A}^{+}$. Let us define the rank of $\mathbf{A}^{+}$ as the greatest of the ranks of the n matrices obtained from $\mathbf{A}^{+}$ by removing one of the columns of A. Then, we have the following three possibilities:
1. The rank of A is n (nonsingular matrix). Then, the solution exists and is unique.
2. The rank of A is r = n − p < n, but the rank of $\mathbf{A}^{+}$ is greater than r. Then, the solution does not exist.
3. The rank of both A and $\mathbf{A}^{+}$ is r = n − p < n. Then, the solution exists, but it is not unique. For, p of the unknowns may be chosen arbitrarily and we have $\infty^p$ linearly independent solutions.

◦ Proof : Again, the theorem is simply a more concise way to state Theorem 17, p. 88. Indeed, if the rank of $\mathbf{A}^{+}$ is greater than r (Item 2), it means that there exists at least one equation that is linearly independent of the r equations corresponding to the r linearly independent rows of A, as you may verify. [Hint: Use Lemma 5, p. 117, along with the definitions of rank and singular matrix (Definitions 68 and 69 above).]
• Conditions for the solution to exist

Let us examine the problem at a greater depth. We have the following

Theorem 27. Consider the problem A x = b, where A is a square n × n matrix, with rank r < n. Then, the necessary condition for the solution of this problem to exist is that the following p = n − r equations are satisfied:
\[ \mathbf{c}_k^T \mathbf{b} = 0 \qquad (k = 1, \dots, p), \tag{3.111} \]
where $\mathbf{c}_k$ are p linearly independent solutions of
\[ \mathbf{A}^T \mathbf{c}_k = \mathbf{0} \qquad (k = 1, \dots, p). \tag{3.112} \]

◦ Comment. Note that, for r < n, the problem $\mathbf{A}^T \mathbf{x} = \mathbf{0}$ always admits p = n − r > 0 linearly independent solutions, say $\mathbf{c}_k$ (Theorem 25, p. 118).

◦ Proof : Left–multiply A x = b by $\mathbf{c}_k^T$ (k = 1, . . . , p), and use $\mathbf{c}_k^T \mathbf{A} = \mathbf{0}^T$ (namely the transpose of Eq. 3.112). This yields Eq. 3.111. Thus, the conditions in Eq. 3.111 are necessary for the solution to exist.

We can improve upon the above theorem and obtain that the above conditions are also sufficient. Indeed, we have the following

Theorem 28. Consider a nonhomogeneous n × n linear algebraic system, A x = b. Let r = n − p < n denote the rank of A. This means that the last p rows of A are linearly dependent upon the first r = n − p, namely that there exist p × r coefficients $q_{jh}$ (j = r + 1, . . . , n; h = 1, . . . , r) such that
\[ a_{jk} = \sum_{h=1}^{r} q_{jh}\, a_{hk} \qquad (j > r). \tag{3.113} \]
[Again, without loss of generality, we have assumed that no interchange of rows and/or columns is required (Remark 28, p. 84).] The theorem states that, correspondingly, the solution of A x = b exists, provided that
\[ b_j = \sum_{h=1}^{r} q_{jh}\, b_h \qquad (j > r). \tag{3.114} \]
[In other words, the fact that we have the same coefficients $q_{jh}$ in Eqs. 3.113 and 3.114 is a sufficient condition for the solution to exist.]

◦ Proof : Note that Eqs. 3.113 and 3.114 imply that the j-th equation (j > r) may be written as
\[ \sum_{k=1}^{n} a_{jk}\, x_k - b_j = \sum_{h=1}^{r} q_{jh} \left( \sum_{k=1}^{n} a_{hk}\, x_k - b_h \right) \qquad (j > r), \tag{3.115} \]
as you may verify. [Hint: Recall that two summation symbols, with fixed limits, may be interchanged (Eq. 3.26).] Thus, the j-th equation (j = r + 1, . . . , n) is a linear combination of the preceding ones. Accordingly, the last p equations are redundant, in the sense that if the first r equations are satisfied, the last p equations are automatically satisfied as well. Hence, they may be discarded, leaving r equations with n unknowns.

◦ An alternate proof.♥ A more elegant and concise proof (albeit less gut–level) is presented here. Let $\mathbf{A}_1$ and $\mathbf{A}_2$ denote respectively the top portion of A (specifically its first r rows) and its bottom portion (specifically the last p rows). Similarly, let $\mathbf{b}_1$ and $\mathbf{b}_2$ denote the corresponding top and bottom portion of b. Then, Eqs. 3.113 and 3.114 may be written as
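\[ \mathbf{A}_2 = \mathbf{Q}\, \mathbf{A}_1 \qquad \text{and} \qquad \mathbf{b}_2 = \mathbf{Q}\, \mathbf{b}_1, \tag{3.116} \]
where $\mathbf{Q} = [q_{jh}]$. Then, for any x such that $\mathbf{A}_1 \mathbf{x} = \mathbf{b}_1$, we have
\[ \mathbf{A}_2\, \mathbf{x} - \mathbf{b}_2 = \mathbf{Q} \left( \mathbf{A}_1 \mathbf{x} - \mathbf{b}_1 \right) = \mathbf{0}, \tag{3.117} \]
in agreement with Eq. 3.115. Accordingly, the conclusions are the same as in the proof above.

◦ Comment. One could have the impression that the necessary conditions differ from the sufficient conditions. Here, we show that the two are equivalent. To this end, let us introduce the n × n identity matrix given by
\[ \mathbf{I} := [\delta_{hk}] = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} \tag{3.118} \]
(where $\delta_{hk}$ is the Kronecker delta), which is such that, for any b, we have I b = b, as you may verify. Then, using the partitioned–matrix notation in Appendix A (Section 3.7), Eq. 3.116 may be written as
\[ \begin{bmatrix} \mathbf{Q}, & -\mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix} = \mathbf{O} \qquad \text{and} \qquad \begin{bmatrix} \mathbf{Q}, & -\mathbf{I} \end{bmatrix} \begin{Bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{Bmatrix} = \mathbf{0}, \tag{3.119} \]
or $\mathbf{C}^T \mathbf{A} = \mathbf{O}$ and $\mathbf{C}^T \mathbf{b} = \mathbf{0}$ (where we have set $[\mathbf{Q}, -\mathbf{I}] =: \mathbf{C}^T = [\mathbf{c}_k]^T$), in agreement with Eqs. 3.112 and 3.111, respectively, as you may verify.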
• Fredholm alternative
Here, we consider a square n × n matrix A. We have the following⁶
Theorem 29 (Fredholm alternative). Consider the problem A x = b, where A is a square n × n matrix. For the solution to exist, there are two alternatives: (i) for r = n, the solution exists and is unique; (ii) for r < n, the solution exists iff Eq. 3.111 holds. In this case, there exist $\infty^{p}$ linearly independent solutions.
◦ Proof : For Item (i), the solution exists and is unique (Item 1 of the Rouché–Capelli theorem, Theorem 26, p. 120). For Item (ii), use Theorems 27 and 28 above. The last p unknowns are arbitrary. [This fact may be used to obtain p linearly independent solutions: the k-th of these may be obtained, for instance, by setting the k-th element of $\mathbf{x}_2$ equal to one, and the others equal to zero, which guarantees their linear independence (Theorem 25, p. 118).]
⁶ Named after the Swedish mathematician Erik Ivar Fredholm (1866–1927). The Fredholm integral equations are also named after him. He made seminal contributions to the theory of integral equations and operator theory, which anticipated Hilbert’s work. The Fredholm alternative applies, in essence, to the Fredholm integral equations as well.
3.5.6 Positive definite matrices
♣
In the following, we introduce an important class of real symmetric matrices, namely the so–called positive definite matrices, which play a relevant role in mechanics, as we will see. Recalling that symmetric matrices are necessarily square, we have the following
Definition 70 (Real positive definite and related matrices). A real positive definite matrix A is a symmetric matrix that satisfies the following inequality
$$\mathbf{x}^{\mathsf T} \mathbf{A}\, \mathbf{x} > 0, \qquad \text{for all } \mathbf{x} \neq \mathbf{0}. \tag{3.120}$$
A real positive semidefinite matrix A is a symmetric matrix that satisfies the following inequality
$$\mathbf{x}^{\mathsf T} \mathbf{A}\, \mathbf{x} \geq 0, \qquad \text{for all } \mathbf{x} \neq \mathbf{0}. \tag{3.121}$$
The definitions of real negative definite and real negative semidefinite matrices are analogous, specifically with the > sign replaced by the < sign (and the ≥ sign by the ≤ sign).
You might wonder whether such matrices exist. In order to reassure you, I will give you an example of such a matrix. Let D denote a 3 × 3 matrix in which the off–diagonal elements vanish, namely
$$\mathbf{D} = \begin{bmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{bmatrix}. \tag{3.122}$$
Then, we have (see Eq. 3.83)
$$\mathbf{x}^{\mathsf T} \mathbf{D}\, \mathbf{x} = d_{11}\, x_1^2 + d_{22}\, x_2^2 + d_{33}\, x_3^2. \tag{3.123}$$
If $d_{kk} > 0$ (k = 1, 2, 3), we have $\mathbf{x}^{\mathsf T} \mathbf{D}\, \mathbf{x} > 0$, for all $\mathbf{x} \neq \mathbf{0}$. Hence, the matrix is real positive definite. On the other hand, if one or two coefficients $d_{kk}$ vanish, while the others are positive, we have $\mathbf{x}^{\mathsf T} \mathbf{D}\, \mathbf{x} \geq 0$, for all $\mathbf{x} \neq \mathbf{0}$. [For instance, if $d_{11} = 0$ (and the others are positive), we have $\mathbf{x}^{\mathsf T} \mathbf{D}\, \mathbf{x} = 0$, for $\mathbf{x} = [c, 0, 0]^{\mathsf T}$, and $\mathbf{x}^{\mathsf T} \mathbf{D}\, \mathbf{x} > 0$ otherwise. Hence, such a matrix is positive semidefinite.]
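The diagonal example above can be checked numerically. The following is a minimal Python sketch, not taken from the book: the helper name `quadratic_form` and the sample entries of D are assumptions chosen only for illustration. It evaluates the quadratic form of Eq. 3.123 for a definite and a semidefinite diagonal matrix.

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[h] * A[h][k] * x[k] for h in range(n) for k in range(n))

# Hypothetical sample matrices: all d_kk > 0, and d_11 = 0 with the others positive.
D_definite = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
D_semidefinite = [[0, 0, 0], [0, 3, 0], [0, 0, 5]]

x = [1, -2, 4]
print(quadratic_form(D_definite, x))              # 2*1 + 3*4 + 5*16 = 94
print(quadratic_form(D_semidefinite, x))          # 3*4 + 5*16 = 92
print(quadratic_form(D_semidefinite, [7, 0, 0]))  # 0, for x = [c, 0, 0]^T
```

A check of this kind on a few vectors is only a sanity test; it does not prove definiteness, which requires the inequality for all nonzero x.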
We have the following
Theorem 30. The columns and rows of a real positive definite matrix A are linearly independent.
◦ Proof : We use the proof–by–contradiction approach. Assume, to obtain a contradiction, that the columns $\mathbf{a}_k$ are linearly dependent, namely that there exist coefficients $c_k$, not all zero, such that $\sum_{k=1}^{n} c_k\, \mathbf{a}_k = \mathbf{A}\, \mathbf{c} = \mathbf{0}$ (use the “best–kept secret in matrix algebra,” Eq. 3.68). This implies $\mathbf{c}^{\mathsf T} \mathbf{A}\, \mathbf{c} = 0$ with $\mathbf{c} \neq \mathbf{0}$, in contradiction to the assumption that A is real positive definite. Of course, the rows are also linearly independent, since for a symmetric matrix the h-th element of the k-th row is equal to the h-th element of the k-th column.
We have the following
Corollary 1. Consider a real positive definite n × n matrix A. We have that A is nonsingular, namely that its rank is n, and that the solution of A x = b exists and is unique.
◦ Proof : Theorem 30 above implies that, by definition, the rank of A is n (Definition 68, p. 116), namely that the matrix A is nonsingular (Definition 69, p. 117). Then, the existence and uniqueness of the solution of A x = b is an immediate consequence of the Rouché–Capelli theorem (Item 1 of Theorem 26, p. 120).
3.5.7 A x = b, with A rectangular
♣♣♣
In Subsection 3.5.5, we have addressed the theorems of existence and uniqueness for the n × n case, with rank r ≤ n. Here, we address the problem A x = b in greater depth. Specifically, we extend our analysis to the case in which A is an m × n matrix, with m ≠ n.
◦ Warning. Again, without loss of generality, we assume that the Gaussian elimination procedure requires no interchange of rows and/or columns (Remark 28, p. 84).
Depending upon the structure of the system resulting from the Gaussian elimination procedure, we may identify three possible cases, in addition to the square n × n matrix: (i) a rectangular m × n matrix, with m < n and rank r = m; (ii) a rectangular m × n matrix, with n < m and rank r = n; (iii) an m × n matrix, with rank r < m and r < n. These cases are addressed below, after we present some prerequisite results on the linear independence of rows and columns of m × n matrices.
• Linear independence of rows and columns of m × n matrices
In Theorems 23, p. 117, and 24, p. 117, we showed that the number of linearly independent columns in an n × n square matrix with rank r ≤ n equals the number of linearly independent rows. Here, we generalize such a theorem to rectangular m × n matrices, with m ≠ n. We have the following
Theorem 31. Given an m × n matrix A, the number of linearly independent columns equals the number of linearly independent rows.
◦ Proof : The case m = n has already been addressed in Subsection 3.5.3. The case in which the number of rows is less than the number of columns, namely m < n, may be easily recast in terms of those in Theorems 23, p. 117, and 24, p. 117, by adding n − m rows all composed of zeros. Thus, we only need to address the case in which the number of columns is less than the number of rows, namely n < m. Let us apply the standard Gaussian elimination procedure; after n cycles, we obtain that the last m − n rows are all necessarily composed of zeros. Thus, the last m − n rows are necessarily linearly dependent upon the preceding ones and may be discarded, thereby generating a square matrix. Therefore, we have again recast our problem in terms of that in Theorem 24, p. 117.
• Case I: An m × n matrix, with rank r = m < n
In the remainder of this subsection, we address the existence and uniqueness theorems for an m × n system A x = b, with m ≠ n. Here, we consider the case m < n, with the rank r equal to m. This implies that all the rows are linearly independent, namely that the system resulting from the Gaussian elimination procedure is of the type
$$\begin{bmatrix} \mathbf{U} & \mathbf{A}_2 \end{bmatrix} \begin{Bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{Bmatrix} = \mathbf{b}, \tag{3.124}$$
where U is an m × m nonsingular upper triangular matrix. Of course, there are only m < n linearly independent columns (Theorem 31 above). In view of the fact that the n − m elements of $\mathbf{x}_2$ are arbitrary, there exist n − m linearly independent solutions. [Again, the k-th of these may be obtained, for instance, by setting the k-th element of $\mathbf{x}_2$ equal to one, and the others equal to zero, which guarantees their linear independence (Theorem 25, p. 118).]
• Case II: An m × n matrix, with rank r = n < m
Here, we consider the case m > n, with rank r = n. This implies that all the n columns are linearly independent, namely that the system resulting from the Gaussian elimination procedure is of the type
$$\begin{bmatrix} \mathbf{U} \\ \mathbf{O} \end{bmatrix} \mathbf{x} = \begin{Bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{Bmatrix}, \tag{3.125}$$
where U is an r × r nonsingular upper triangular matrix. There are two possibilities. The first is that $\mathbf{b}_2 \neq \mathbf{0}$, in which case the solution does not exist. The second is that $\mathbf{b}_2 = \mathbf{0}$, in which case the last m − r equations may be discarded, and we are back to the n × n case: the solution exists and is unique.
• Case III: An m × n matrix, with r < m and r < n
Finally, consider the general case, in which the rank is lower than the number of both rows, m, and columns, n. This implies that the system resulting from the Gaussian elimination procedure is of the type
$$\begin{bmatrix} \mathbf{U} & \mathbf{A}_2 \\ \mathbf{O}_1 & \mathbf{O}_2 \end{bmatrix} \begin{Bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{Bmatrix} = \begin{Bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{Bmatrix}, \tag{3.126}$$
where U is an r × r nonsingular upper triangular matrix. As in Case II, there are two possibilities. The first is that $\mathbf{b}_2 \neq \mathbf{0}$, in which case the solution does not exist. The second is that $\mathbf{b}_2 = \mathbf{0}$, in which case the last m − r equations may be discarded and we are back to Case I (the solution exists, but is not unique).
• Generalization of Rouché–Capelli theorem
We may summarize the results of this subsection as follows. For rectangular matrices, we have the following (a generalization of the Rouché–Capelli theorem, Theorem 26, p. 120, valid for square matrices)
Theorem 32. Consider a rectangular m × n matrix A, with rank r, with r ≤ m and r ≤ n. Consider the augmented m × (n + 1) matrix $\mathbf{A}^{+}$, defined by
$$\mathbf{A}^{+} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \dots & \dots & \dots & \dots & \dots \\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{bmatrix} = [\mathbf{A}, \mathbf{b}]. \tag{3.127}$$
Let $r^{+}$ denote the rank of $\mathbf{A}^{+}$. [As in the Rouché–Capelli theorem (Theorem 26, p. 120), the rank of $\mathbf{A}^{+}$ is defined as the greatest of the ranks of the n matrices obtained from $\mathbf{A}^{+}$ by removing one of the columns of A.] There are two alternatives: (i) $r^{+} > r$ and (ii) $r^{+} = r$. If $r^{+} > r$, no solution exists. If $r^{+} = r$, a solution exists; such a solution is not unique ($\infty^{n-r}$ solutions), unless r = n, in which case the solution is unique.
◦ Proof : Assume, for the time being, that r is less than both m and n (Case III). This implies that, after Gaussian elimination, the resulting system is of the type (use Eq. 3.126)
$$\begin{bmatrix} \mathbf{U} & \mathbf{A}_2 \\ \mathbf{O}_1 & \mathbf{O}_2 \end{bmatrix} \begin{Bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{Bmatrix} = \begin{Bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{Bmatrix}, \tag{3.128}$$
where U is an r × r upper triangular matrix and $\mathbf{A}_2$ is an r × (n − r) matrix. In addition, the bottom matrices (namely $\mathbf{O}_1$, $\mathbf{O}_2$ and $\mathbf{b}_2$) have m − r rows.
Next, let us consider the first alternative, namely that $r^{+} > r$. This fact implies necessarily that $\mathbf{b}_2 \neq \mathbf{0}$, as you may verify. In this case, no solution exists. On the other hand, turning to the second alternative, the fact that $r^{+} = r$ implies necessarily that $\mathbf{b}_2 = \mathbf{0}$. Hence, the bottom matrices may be discarded and we have $\infty^{n-r}$ solutions.
To complete the proof, note that, if r = n (Case II), $\mathbf{A}_2$ and $\mathbf{O}_2$ do not appear in the equation. This fact does not affect the results, except for the fact that in this case the solution is unique. On the other hand, if r = m (Case I), the bottom (zero) matrices do not appear in the equation and we can only have $r^{+} = r$. The solution exists but is not unique.
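The alternatives of Theorem 32 can be explored with a short Python sketch. This is only an illustration under stated assumptions: the function names `rank` and `classify` are hypothetical, exact fractions are used to avoid round-off, and, unlike the text, the elimination interchanges rows whenever a pivot vanishes.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0                                    # number of pivots found so far
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        # look for a nonzero pivot in this column, at or below row r
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]      # row interchange (not needed in the text)
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def classify(A, b):
    """Compare the rank r of A with the rank r+ of the augmented matrix [A, b]."""
    n = len(A[0])
    r = rank(A)
    r_plus = rank([row + [bi] for row, bi in zip(A, b)])
    if r_plus > r:
        return "no solution"
    if r == n:
        return "unique solution"
    return f"infinitely many solutions (n - r = {n - r})"

A = [[1, 2], [2, 4], [3, 6]]                       # m = 3, n = 2, rank 1
print(classify(A, [1, 2, 3]))                      # infinitely many solutions (n - r = 1)
print(classify(A, [1, 2, 4]))                      # no solution
print(classify([[1, 0], [0, 1], [1, 1]], [1, 2, 3]))  # unique solution
```

Exact rational arithmetic is chosen here because a rank test with floating-point numbers would need a tolerance, which is outside the scope of this sketch.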
3.6 Linearity and superposition theorems
In this section, we consider a very important concept: the linearity of the operation of multiplication by a matrix. We also address some important superposition theorems that descend from the linearity of this operation.
3.6.1 Linearity of matrix–by–vector multiplication
Consider the following
Definition 71 (Linear operations on vectors). Consider an operation $\mathcal{L}$, which, performed on a vector, produces another vector. Iff we have
$$\mathcal{L}\,(\alpha\, \mathbf{a} + \beta\, \mathbf{b}) = \alpha\, \mathcal{L}\mathbf{a} + \beta\, \mathcal{L}\mathbf{b}, \tag{3.129}$$
the operation is called linear. We have the following
Theorem 33 (Linearity of matrix–by–vector multiplication). The operation of matrix–by–vector multiplication is linear, in the sense that
$$\mathbf{A}\,(\beta\, \mathbf{b} + \gamma\, \mathbf{c}) = \beta\, \mathbf{A}\, \mathbf{b} + \gamma\, \mathbf{A}\, \mathbf{c}. \tag{3.130}$$
◦ Proof : Indeed, we have (use Eq. 3.56)
$$\sum_{k=1}^{n} a_{hk}\, (\beta\, b_k + \gamma\, c_k) = \beta \sum_{k=1}^{n} a_{hk}\, b_k + \gamma \sum_{k=1}^{n} a_{hk}\, c_k, \tag{3.131}$$
in agreement with Eq. 3.130.
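As a quick numerical illustration of Eq. 3.130, the following Python sketch compares the two sides for one sample matrix and two sample vectors. The helpers `matvec` and `combine`, as well as the numbers, are assumptions introduced here for illustration only.

```python
def matvec(A, x):
    """Row-by-column product of a matrix (list of rows) and a vector."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def combine(beta, b, gamma, c):
    """Linear combination beta*b + gamma*c of two vectors."""
    return [beta * bk + gamma * ck for bk, ck in zip(b, c)]

A = [[1, 2, 0], [0, -1, 3], [4, 0, 1]]
b = [1, 0, 2]
c = [-1, 3, 1]
beta, gamma = 2, -3

left = matvec(A, combine(beta, b, gamma, c))               # A (beta b + gamma c)
right = combine(beta, matvec(A, b), gamma, matvec(A, c))   # beta A b + gamma A c
print(left, right)   # [-13, 12, 21] [-13, 12, 21]
```

Integer entries are used so that the two sides agree exactly; with floating-point data the comparison would hold only up to round-off.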
Remark 34. Incidentally, the fact that the operation of matrix–by–vector multiplication is linear (Theorem 33 above) motivates the terminology linear algebraic equations. [This is the motivation referred to in Footnote 1, p. 57.]
◦ Comment. Note that Eq. 3.130 is equivalent to
$$\mathbf{A}\,(\beta\, \mathbf{b}) = \beta\, \mathbf{A}\, \mathbf{b}, \tag{3.132}$$
$$\mathbf{A}\,(\mathbf{b} + \mathbf{c}) = \mathbf{A}\, \mathbf{b} + \mathbf{A}\, \mathbf{c}. \tag{3.133}$$
These equations provide an equivalent definition of linearity for the operation of matrix–by–vector multiplication. The equivalence is established as follows: (i) Eqs. 3.132 and 3.133 are consequences of Eq. 3.130. [For, setting γ = 0 in Eq. 3.130, one obtains Eq. 3.132, whereas setting β = γ = 1 in Eq. 3.130, one obtains Eq. 3.133]; (ii) vice versa, Eq. 3.130 is a consequence of Eqs. 3.132 and 3.133. [For, applying first Eq. 3.133 and then Eq. 3.132 to the left side of Eq. 3.130, one obtains the right side.]
3.6.2 Superposition in linear algebraic systems
In this subsection, we present some superposition theorems valid for any linear algebraic system. For simplicity, we limit ourselves to square n × n matrices. We have the following
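Theorem 34 (Superposition theorem for $\mathbf{A}\, \mathbf{x} = \sum_{k=1}^{p} \mathbf{b}_k$). Consider a linear algebraic system, given by A x = b (Eq. 3.103), where A is nonsingular and
$$\mathbf{b} = \sum_{k=1}^{p} c_k\, \mathbf{b}_k. \tag{3.134}$$
The solution to Eq. 3.103 is given by
$$\mathbf{x} = \sum_{k=1}^{p} c_k\, \mathbf{x}_k, \tag{3.135}$$
where $\mathbf{x}_k$ satisfies the equation
$$\mathbf{A}\, \mathbf{x}_k = \mathbf{b}_k. \tag{3.136}$$
◦ Proof : We have
$$\mathbf{A}\, \mathbf{x} = \mathbf{A} \sum_{k=1}^{p} c_k\, \mathbf{x}_k = \sum_{k=1}^{p} c_k\, \mathbf{A}\, \mathbf{x}_k = \sum_{k=1}^{p} c_k\, \mathbf{b}_k = \mathbf{b}, \tag{3.137}$$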
as you may verify. [Hint: Use Eq. 3.135 for the first equality, Eq. 3.130 (linearity) for the second, Eq. 3.136 for the third, Eq. 3.134 for the last.]
Recall that, if the rank r of an n × n matrix A is less than n, then there always exist p = n − r linearly independent solutions $\mathbf{x}_k$ (k = 1, . . . , p) to the homogeneous problem A x = 0 (Theorem 25, p. 118). Then, we have the following
Theorem 35 (Superposition of solutions of A x = 0). Let $\mathbf{x}_k^{\mathrm H}$ (with k = 1, . . . , p = n − r) denote p linearly independent nontrivial solutions of the homogeneous system $\mathbf{A}\, \mathbf{x}^{\mathrm H} = \mathbf{0}$, where the rank of A is r < n. Then, any linear combination of $\mathbf{x}_k^{\mathrm H}$, namely
$$\mathbf{x} = \sum_{k=1}^{p} c_k\, \mathbf{x}_k^{\mathrm H}, \tag{3.138}$$
where $c_k$ are arbitrary constants, is also a solution.
◦ Proof : We have
$$\mathbf{A}\, \mathbf{x} = \mathbf{A} \sum_{k=1}^{p} c_k\, \mathbf{x}_k^{\mathrm H} = \sum_{k=1}^{p} c_k\, \mathbf{A}\, \mathbf{x}_k^{\mathrm H} = \mathbf{0}, \tag{3.139}$$
in agreement with the theorem.
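The superposition of homogeneous solutions (Eq. 3.138) can be illustrated with a small Python sketch. The matrix, the two homogeneous solutions and the helper names `matvec` and `superpose` are assumptions chosen for illustration; they are not taken from the book.

```python
def matvec(A, x):
    """Row-by-column product of a matrix (list of rows) and a vector."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

A = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]   # rank r = 1, hence p = n - r = 2
x1_H = [1, -1, 0]                       # first nontrivial solution of A x = 0
x2_H = [1, 0, -1]                       # second one, linearly independent of the first

def superpose(c1, c2):
    """Linear combination c1 x1_H + c2 x2_H, as in Eq. 3.138 with p = 2."""
    return [c1 * u + c2 * v for u, v in zip(x1_H, x2_H)]

print(matvec(A, x1_H), matvec(A, x2_H))   # [0, 0, 0] [0, 0, 0]
print(matvec(A, superpose(5, -2)))        # [0, 0, 0]
```

Any other choice of the constants gives the same result, in agreement with Eq. 3.139.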
◦ Comment. Point of clarification on notation. In this book, only lower–case superscripts denote powers. Hence, $\mathbf{x}_k^{\mathrm H}$ in Eq. 3.138 does not mean $\mathbf{x}_k$ to the power H. It simply denotes the k-th solution of the Homogeneous system $\mathbf{A}\, \mathbf{x}^{\mathrm H} = \mathbf{0}$. Similarly, given any vector with a subscript, say $\mathbf{a}_{Q}$, I typically use for its components the notation $a_h^{Q}$, with Q as a superscript. My students, after an initial shock, end up appreciating this notation.
Consider the following
Definition 72 (Associated homogeneous system). Given an n×n linear algebraic system of the type A x = b, the associated homogeneous system of equations is obtained by setting b = 0, to yield A x = 0.
Recall that in Remark 24, p. 74, we noted that the difference between two distinct solutions of the nonhomogeneous problem equals a solution of the homogeneous problem. We can now generalize such an observation and state the following very important
Theorem 36 (On the uniqueness of nonhomogeneous solution). If the associated homogeneous system of equations $\mathbf{A}\, \mathbf{x}^{\mathrm H} = \mathbf{0}$ admits only the trivial solution $\mathbf{x}^{\mathrm H} = \mathbf{0}$, then the solution of the nonhomogeneous equation A x = b is unique. On the other hand, if the associated homogeneous equation admits p = n − r nontrivial linearly independent solutions $\mathbf{x}_k^{\mathrm H}$ (k = 1, . . . , p), then any solution of A x = b may be written as
$$\mathbf{x} = \mathbf{x}_{\mathrm N} + \sum_{k=1}^{p} c_k\, \mathbf{x}_k^{\mathrm H}, \tag{3.140}$$
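where the coefficients $c_k$ are arbitrary, whereas $\mathbf{x}_{\mathrm N}$ is any solution of the nonhomogeneous problem.
◦ Proof : The expression in Eq. 3.140 satisfies the nonhomogeneous equation A x = b, as you may verify. Next, let us assume that the nonhomogeneous system has two solutions, namely that there exist $\mathbf{x}_1$ and $\mathbf{x}_2$ such that $\mathbf{A}\, \mathbf{x}_1 = \mathbf{b}$ and $\mathbf{A}\, \mathbf{x}_2 = \mathbf{b}$. Then, we have
$$\mathbf{A}\,(\mathbf{x}_1 - \mathbf{x}_2) = \mathbf{A}\, \mathbf{x}_1 - \mathbf{A}\, \mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}. \tag{3.141}$$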
Assume first that the associated homogeneous system, A x = 0, admits only the trivial solution. Then, Eq. 3.141 implies x1 = x2 , and hence the solution is unique, as stated in the first part of the theorem. On the other hand, for p = n − r > 0, the associated homogeneous system admits p linearly independent
nontrivial solutions (Theorem 25, p. 118). Then, Eq. 3.141 implies $\mathbf{x}_1 - \mathbf{x}_2 = \sum_{k=1}^{p} c_k\, \mathbf{x}_k^{\mathrm H}$ (Theorem 35 above), in agreement with Eq. 3.140.
◦ Comment. The definition of linearity in Eq. 3.129 applies, virtually unchanged, to a much broader class of linear problems, and all the superposition theorems introduced above for matrix–by–vector multiplications apply to these linear problems as well (Section 9.8). The usefulness of the superposition theorems is the reason why much of the book is limited to the solutions of linear problems. [The fact that we deal primarily with the solution of linear problems might give you the wrong impression that linearity is the norm. On the contrary, in nature, linear problems are the exception, not the norm. The truth is that we know how to obtain exact solutions for linear problems more easily than for nonlinear ones. For the latter, we typically obtain approximate solutions by using computational techniques. In engineering, the designers choose systems that behave linearly, whenever they can.]
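Returning to Theorem 36, the structure of the general solution in Eq. 3.140 can be checked on a singular 2 × 2 system. The Python sketch below is only an illustration: the system, the particular solution `x_N`, the homogeneous solution `x_H` and the helper names are assumptions, not material from the book.

```python
def matvec(A, x):
    """Row-by-column product of a matrix (list of rows) and a vector."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

A = [[1, 1], [2, 2]]   # rank r = 1, so p = n - r = 1
b = [3, 6]             # compatible right side (second equation = 2 x first)
x_N = [3, 0]           # one particular solution of A x = b
x_H = [1, -1]          # nontrivial solution of A x = 0

def general_solution(c):
    """x = x_N + c x_H, as in Eq. 3.140 with p = 1."""
    return [xn + c * xh for xn, xh in zip(x_N, x_H)]

for c in (-2, 0, 5):
    print(c, matvec(A, general_solution(c)))   # the right side [3, 6] for every c

# The difference of two solutions solves the homogeneous system (Eq. 3.141).
x1, x2 = general_solution(4), general_solution(-1)
difference = [u - v for u, v in zip(x1, x2)]
print(matvec(A, difference))                   # [0, 0]
```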
3.7 Appendix A. Partitioned matrices
♣
Given a generic m × p matrix B and a generic m × q matrix C, consider the m × n matrix A (with n = p + q), defined as follows: the first p columns of A coincide with the columns of B, whereas the last q columns of A coincide with the columns of C. In other words, we have
$$\mathbf{A} = \begin{bmatrix} a_{11} & \dots & a_{1p} & a_{1,p+1} & \dots & a_{1n} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ a_{m1} & \dots & a_{mp} & a_{m,p+1} & \dots & a_{mn} \end{bmatrix} := \begin{bmatrix} b_{11} & \dots & b_{1p} & c_{11} & \dots & c_{1q} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ b_{m1} & \dots & b_{mp} & c_{m1} & \dots & c_{mq} \end{bmatrix}. \tag{3.142}$$
This may be written also as
$$\begin{aligned} a_{hk} &= b_{hk} & (k = 1, \dots, p); \\ &= c_{h,k-p} & (k = p+1, \dots, n). \end{aligned} \tag{3.143}$$
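We may also look at it from a different angle, namely that the matrix A has been split up into two parts (partitioned), the first being denoted by B, the second by C. For this reason, the matrix A is called the partitioned matrix that is obtained by including the columns of B followed by the columns of C. To refer to a partitioned matrix, we use the notation
$$\mathbf{A} = \begin{bmatrix} \mathbf{B} & \mathbf{C} \end{bmatrix}. \tag{3.144}$$
If we have more than one matrix, we write
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & \dots & \mathbf{A}_n \end{bmatrix}. \tag{3.145}$$
[For clarity, the following notation may also be used
$$\mathbf{A} = [\mathbf{A}_1, \dots, \mathbf{A}_n], \tag{3.146}$$
namely with a comma separating the various matrices.] Of course, the partitioning can be done also by rows, or by both rows and columns. Accordingly, analogous definitions are used for
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix} \tag{3.147}$$
and
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}. \tag{3.148}$$
The same is true for column and row matrices, namely
$$\mathbf{b} = \begin{Bmatrix} \mathbf{b}_1 \\ \dots \\ \mathbf{b}_n \end{Bmatrix} \tag{3.149}$$
and
$$\mathbf{c}^{\mathsf T} = \begin{bmatrix} \mathbf{c}_1^{\mathsf T} & \dots & \mathbf{c}_n^{\mathsf T} \end{bmatrix}. \tag{3.150}$$
[If needed for clarity, the following notation may also be used
$$\mathbf{c}^{\mathsf T} = [\mathbf{c}_1^{\mathsf T}, \dots, \mathbf{c}_n^{\mathsf T}], \tag{3.151}$$
namely with a comma separating the various row matrices.] In general, we will use the concise notation
$$\mathbf{A} = [\mathbf{A}_{hj}] := \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} & \dots & \mathbf{A}_{1n} \\ \mathbf{A}_{21} & \mathbf{A}_{22} & \dots & \mathbf{A}_{2n} \\ \dots & \dots & \dots & \dots \\ \mathbf{A}_{m1} & \mathbf{A}_{m2} & \dots & \mathbf{A}_{mn} \end{bmatrix}, \tag{3.152}$$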
where the compatibility of the dimensions of the various matrices Ahj is tacitly understood. The above notation (as well as similar ones presented below) will be referred to as partitioned–matrix notation. For future reference, in analogy with the definition of diagonal matrices (Definition 44, p. 92), we have the following
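Definition 73 (Block–diagonal matrix). A square matrix A is called block–diagonal iff all of its diagonal blocks $\mathbf{A}_{kk}$ are square, and all the off–diagonal blocks vanish. We use the notation
$$\mathbf{A} = [\mathbf{A}_{hj}] := \begin{bmatrix} \mathbf{A}_{11} & \mathbf{O} & \dots & \mathbf{O} \\ \mathbf{O} & \mathbf{A}_{22} & \dots & \mathbf{O} \\ \dots & \dots & \dots & \dots \\ \mathbf{O} & \mathbf{O} & \dots & \mathbf{A}_{nn} \end{bmatrix} = \mathrm{Diag}\,[\mathbf{A}_{kk}]. \tag{3.153}$$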
• Operations with partitioned matrices
♣
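We have the following rules:
$$\begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 \end{bmatrix} \begin{Bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{Bmatrix} = \mathbf{A}_1\, \mathbf{b}_1 + \mathbf{A}_2\, \mathbf{b}_2 \tag{3.154}$$
(of course, the number of columns of $\mathbf{A}_k$ equals the number of elements of $\mathbf{b}_k$) and
$$\begin{bmatrix} \mathbf{b}_1^{\mathsf T} & \mathbf{b}_2^{\mathsf T} \end{bmatrix} \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix} = \mathbf{b}_1^{\mathsf T} \mathbf{A}_1 + \mathbf{b}_2^{\mathsf T} \mathbf{A}_2 \tag{3.155}$$
(of course, the number of rows of $\mathbf{A}_k$ equals the number of elements of $\mathbf{b}_k$). [I strongly recommend that you verify the two equations above. Recall the row–by–column rule, Remark 32, p. 104.] In general, we have
$$\begin{bmatrix} \mathbf{b}_1^{\mathsf T} & \mathbf{b}_2^{\mathsf T} \end{bmatrix} \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix} \begin{Bmatrix} \mathbf{c}_1 \\ \mathbf{c}_2 \end{Bmatrix} = \mathbf{b}_1^{\mathsf T} \mathbf{A}_{11}\, \mathbf{c}_1 + \mathbf{b}_1^{\mathsf T} \mathbf{A}_{12}\, \mathbf{c}_2 + \mathbf{b}_2^{\mathsf T} \mathbf{A}_{21}\, \mathbf{c}_1 + \mathbf{b}_2^{\mathsf T} \mathbf{A}_{22}\, \mathbf{c}_2, \tag{3.156}$$
as you may verify. [Note the analogy with the rule for non–partitioned matrices
$$\begin{bmatrix} b_1 & b_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{Bmatrix} c_1 \\ c_2 \end{Bmatrix} = b_1\, a_{11}\, c_1 + b_1\, a_{12}\, c_2 + b_2\, a_{21}\, c_1 + b_2\, a_{22}\, c_2, \tag{3.157}$$
presented in Eq. 3.63.] Similarly, if A, B and C are respectively m × n, n × p and p × q matrices, we have
$$\mathbf{A}\, \mathbf{B}\, \mathbf{C} = \begin{bmatrix} \breve{\mathbf{a}}_1^{\mathsf T} \\ \dots \\ \breve{\mathbf{a}}_m^{\mathsf T} \end{bmatrix} \mathbf{B}\, [\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_q] = \begin{bmatrix} \breve{\mathbf{a}}_1^{\mathsf T} \mathbf{B}\, \mathbf{c}_1 & \breve{\mathbf{a}}_1^{\mathsf T} \mathbf{B}\, \mathbf{c}_2 & \dots & \breve{\mathbf{a}}_1^{\mathsf T} \mathbf{B}\, \mathbf{c}_q \\ \dots & \dots & \dots & \dots \\ \breve{\mathbf{a}}_m^{\mathsf T} \mathbf{B}\, \mathbf{c}_1 & \breve{\mathbf{a}}_m^{\mathsf T} \mathbf{B}\, \mathbf{c}_2 & \dots & \breve{\mathbf{a}}_m^{\mathsf T} \mathbf{B}\, \mathbf{c}_q \end{bmatrix}, \tag{3.158}$$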
where $\breve{\mathbf{a}}_h^{\mathsf T}$ (h = 1, . . . , m) are the rows of A and $\mathbf{c}_k$ (k = 1, . . . , q) are the columns of C. Thus, the generic element $f_{hk}$ of the m × q matrix F := A B C is given by
$$f_{hk} = \breve{\mathbf{a}}_h^{\mathsf T}\, \mathbf{B}\, \mathbf{c}_k. \tag{3.159}$$
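As a general rule for the multiplication of partitioned matrices, consider two partitioned matrices, $\mathbf{A} = [\mathbf{A}_{hj}]$ and $\mathbf{B} = [\mathbf{B}_{jk}]$ (with h = 1, . . . , m; j = 1, . . . , q; k = 1, . . . , n). Assume that $\mathbf{A}_{hj}$ and $\mathbf{B}_{jk}$ are multipliable, namely that the number of columns of $\mathbf{A}_{hj}$ is equal to the number of rows of $\mathbf{B}_{jk}$. We have
$$\mathbf{C} := \mathbf{A}\, \mathbf{B} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} & \dots & \mathbf{A}_{1q} \\ \mathbf{A}_{21} & \mathbf{A}_{22} & \dots & \mathbf{A}_{2q} \\ \dots & \dots & \dots & \dots \\ \mathbf{A}_{m1} & \mathbf{A}_{m2} & \dots & \mathbf{A}_{mq} \end{bmatrix} \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} & \dots & \mathbf{B}_{1n} \\ \mathbf{B}_{21} & \mathbf{B}_{22} & \dots & \mathbf{B}_{2n} \\ \dots & \dots & \dots & \dots \\ \mathbf{B}_{q1} & \mathbf{B}_{q2} & \dots & \mathbf{B}_{qn} \end{bmatrix} = [\mathbf{C}_{hk}], \tag{3.160}$$
with
$$\mathbf{C}_{hk} = \sum_{j=1}^{q} \mathbf{A}_{hj}\, \mathbf{B}_{jk}. \tag{3.161}$$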
◦ Comment. In analogy with Eq. 3.157, in multiplying partitioned matrices, we can treat the “internal” matrices as if they were (scalar) elements of ordinary matrices. Specifically, note the full analogy of Eq. 3.161 with $c_{hk} = \sum_{j=1}^{n} a_{hj}\, b_{jk}$ (Eq. 3.45). Thus, the rule for the multiplication of partitioned matrices is formally equal to that of matrices, except for the fact that the order of the matrices $\mathbf{A}_{hj}$ and $\mathbf{B}_{jk}$ in Eq. 3.161 cannot be interchanged, because matrix algebra is non–commutative (Eq. 3.49).
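The block rule of Eq. 3.161 can be verified on a small case. The following Python sketch partitions two 3 × 3 matrices into 2 × 2 arrays of blocks, multiplies them block by block, and compares the result with the ordinary product. The helper names (`matmul`, `matadd`, `assemble`) and the sample blocks are assumptions introduced only for this illustration.

```python
def matmul(A, B):
    """Ordinary row-by-column product of two matrices (lists of rows)."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    """Element-by-element sum of two matrices of equal size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def assemble(blocks):
    """Assemble a rectangular array of compatible blocks into one matrix."""
    rows = []
    for block_row in blocks:
        for i in range(len(block_row[0])):
            rows.append([v for blk in block_row for v in blk[i]])
    return rows

A_blocks = [[[[1, 2], [3, 4]], [[5], [6]]],
            [[[7, 8]],         [[9]]]]
B_blocks = [[[[1, 0], [2, 1]], [[3], [1]]],
            [[[0, 2]],         [[4]]]]

# C_hk = sum over j of A_hj B_jk (Eq. 3.161), with the order of the factors preserved
C_blocks = [[matadd(matmul(A_blocks[h][0], B_blocks[0][k]),
                    matmul(A_blocks[h][1], B_blocks[1][k]))
             for k in range(2)] for h in range(2)]

full_product = matmul(assemble(A_blocks), assemble(B_blocks))
print(assemble(C_blocks) == full_product)   # True
```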
3.8 Appendix B. A tiny bit on determinants
♣
As stated above, all the results that pertain to linear algebraic systems, in particular the Rouché–Capelli theorem (Theorem 26, p. 120, on systems of n linear algebraic equations with n unknowns) are traditionally presented in terms of determinants, instead of the Gaussian elimination procedure and linear independence. [A notable exception is the already cited book by Sheldon Axler (Ref. [5]), which as stated above is much more advanced than what we are doing here.]
In my experience, first as a student and then as a teacher, the general theory of determinants is complicated, cumbersome, boring, and hence unwarranted and undesirable at this point, since it has invariably a negative impact on the interest and motivation of the reader. Accordingly, determinants will be addressed in full when their use will make our life simpler, namely in Vol. II. However, determinants of 2 × 2 and 3 × 3 matrices are used before then. [For instance, the 3 × 3 case is conveniently used in Section 15.3, on the cross product between two physicists’ vectors (Eq. 15.78).] Accordingly, in this appendix, we present an elementary introduction to determinants, limited to 2 × 2 and 3 × 3 matrices.
3.8.1 Determinants of 2 × 2 matrices
♣
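Consider the following
Definition 74 (Determinant of 2×2 matrix). The determinant of a 2×2 matrix A, denoted by the symbols $|\mathbf{A}|$ or $\mathrm{Det}\,[a_{hk}]$, is defined by
$$|\mathbf{A}| = \mathrm{Det}\,[a_{hk}] = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} := a_{11}\, a_{22} - a_{12}\, a_{21}. \tag{3.162}$$
Using this definition, the solution of a 2 × 2 system of linear algebraic equations may be written in a more compact form. Specifically, Eq. 2.15 may be written as
$$x = \frac{A_1}{A} \qquad \text{and} \qquad y = \frac{A_2}{A}, \tag{3.163}$$
where (see Eq. 3.162)
$$A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = |\mathbf{A}|, \qquad A_1 = \begin{vmatrix} f & b \\ g & d \end{vmatrix}, \qquad A_2 = \begin{vmatrix} a & f \\ c & g \end{vmatrix}. \tag{3.164}$$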
In other words, we have obtained the following⁷
Theorem 37 (Cramer rule for 2 × 2 systems). For a nonhomogeneous system of 2 equations with 2 unknowns, with $A = |\mathbf{A}| \neq 0$, the solution may be written as the ratio of two determinants: the denominator is the determinant of the matrix A, whereas the numerator for the first (second) unknown is the determinant of the matrix that is obtained by replacing the first (second) column of A with the right side of the equation.
⁷ Named after the Swiss mathematician Gabriel Cramer (1704–1752), who introduced it in 1750, for the general n × n case. According to Boyer (Ref. [13], p. 431), this is one of several cases of inaccurate attributions. Indeed, Colin Maclaurin published the rule in 1748, in his posthumous work Treatise on Algebra, and appears to have known it much earlier than that.
Also, Theorem 10, p. 67, may be restated as follows
Theorem 38. For a homogeneous 2 × 2 linear algebraic system, there exist two possibilities:
1. If $A = |\mathbf{A}| \neq 0$, a unique solution exists and is given by x = y = 0 (trivial solution).
2. If $A = |\mathbf{A}| = 0$, the solution exists, but is not unique; this is called the nontrivial solution, and is uniquely defined except for a multiplicative constant.
On the other hand, recall that if $A = a\,d - b\,c = 0$ and $A_1 = d\,f - b\,g = 0$, we also have $A_2 = a\,g - c\,f = 0$ (Eq. 2.18). Then, Theorem 11, p. 67, for nonhomogeneous problems may be restated as follows
Theorem 39. For a nonhomogeneous 2 × 2 linear algebraic system, there exist three possibilities:
1. If $A = |\mathbf{A}| \neq 0$, a unique solution exists and is given by Eq. 3.163.
2. If $A = |\mathbf{A}| = 0$, but $A_1 \neq 0$ (and hence $A_2 \neq 0$), the solution does not exist.
3. If $A = |\mathbf{A}| = 0$ and $A_1 = 0$ (and hence $A_2 = 0$), the solution exists, but is not unique ($\infty^{1}$ solutions).
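The three possibilities of Theorem 39 can be sketched in Python as follows. This is an illustration under explicit assumptions: the system is taken in the form a x + b y = f, c x + d y = g (inferred from the determinants in Eq. 3.164), the coefficients are integers, the matrix is not identically zero, and the function names `det2` and `solve_2x2` are hypothetical.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Determinant of a 2x2 matrix, Eq. 3.162."""
    return a11 * a22 - a12 * a21

def solve_2x2(a, b, c, d, f, g):
    """Classify and, when possible, solve a x + b y = f, c x + d y = g.

    Assumes integer coefficients and a matrix that is not identically zero.
    """
    A = det2(a, b, c, d)
    A1 = det2(f, b, g, d)     # first column replaced by the right side
    A2 = det2(a, f, c, g)     # second column replaced by the right side
    if A != 0:
        return ("unique", Fraction(A1, A), Fraction(A2, A))   # Cramer rule, Eq. 3.163
    if A1 != 0 or A2 != 0:
        return ("no solution",)
    return ("not unique",)

print(solve_2x2(2, 1, 1, 3, 5, 10))   # ('unique', Fraction(1, 1), Fraction(3, 1))
print(solve_2x2(1, 2, 2, 4, 3, 7))    # ('no solution',)
print(solve_2x2(1, 2, 2, 4, 3, 6))    # ('not unique',)
```

Both numerators are tested in the singular case so that systems with a vanishing column are handled as well.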
3.8.2 Determinants of 3 × 3 matrices
♣
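Consider the following
Definition 75 (Determinant of 3×3 matrix). The determinant of a 3×3 matrix A, denoted by the symbol $|\mathbf{A}|$, is defined by
$$\begin{aligned} |\mathbf{A}| = \mathrm{Det}\,[a_{hk}] = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} := {} & a_{11}\, a_{22}\, a_{33} + a_{12}\, a_{23}\, a_{31} + a_{13}\, a_{21}\, a_{32} \\ & - a_{11}\, a_{23}\, a_{32} - a_{12}\, a_{21}\, a_{33} - a_{13}\, a_{22}\, a_{31}. \end{aligned} \tag{3.165}$$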
• The Sarrus rule
A useful rule to remember the expression for the determinant of a 3×3 matrix is the so–called Sarrus rule.⁸ This consists of the following: repeat the first two columns to obtain
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{bmatrix}. \tag{3.166}$$
Each of the first three terms on the right side of Eq. 3.165 equals the product of the three terms along each of the three straight lines descending from left to right (see Eq. 3.166), whereas each of the last three terms equals minus the product of the three terms along each of the three straight lines ascending from left to right, as you may verify.
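• Determinant of transpose
Note that A and $\mathbf{A}^{\mathsf T}$ have the same determinant:
$$|\mathbf{A}| = |\mathbf{A}^{\mathsf T}|. \tag{3.167}$$
Indeed, applying Eq. 3.165, we have
$$\begin{aligned} |\mathbf{A}^{\mathsf T}| &= a_{11}\, a_{22}\, a_{33} + a_{21}\, a_{32}\, a_{13} + a_{31}\, a_{12}\, a_{23} - a_{11}\, a_{32}\, a_{23} - a_{21}\, a_{12}\, a_{33} - a_{31}\, a_{22}\, a_{13} \\ &= a_{11}\, a_{22}\, a_{33} + a_{12}\, a_{23}\, a_{31} + a_{13}\, a_{21}\, a_{32} - a_{11}\, a_{23}\, a_{32} - a_{12}\, a_{21}\, a_{33} - a_{13}\, a_{22}\, a_{31} = |\mathbf{A}|. \end{aligned} \tag{3.168}$$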
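The definition in Eq. 3.165, the Sarrus rule and the transpose property of Eq. 3.167 can be compared numerically. The Python sketch below is only an illustration; the function names `det3` and `det3_sarrus` and the sample matrix are assumptions, not material from the book.

```python
def det3(a):
    """Determinant of a 3x3 matrix (list of rows), term by term as in Eq. 3.165."""
    return (a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1] - a[0][0] * a[1][2] * a[2][1]
            - a[0][1] * a[1][0] * a[2][2] - a[0][2] * a[1][1] * a[2][0])

def det3_sarrus(a):
    """Same determinant, computed by repeating the first two columns (Eq. 3.166)."""
    ext = [row + row[:2] for row in a]
    descending = sum(ext[0][s] * ext[1][s + 1] * ext[2][s + 2] for s in range(3))
    ascending = sum(ext[2][s] * ext[1][s + 1] * ext[0][s + 2] for s in range(3))
    return descending - ascending

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
A_transpose = [list(column) for column in zip(*A)]

print(det3(A), det3_sarrus(A), det3(A_transpose))   # 39 39 39
```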
• Permutation symbol and 3 × 3 determinants
For future reference, Eq. 3.165 may be written in a more compact form by introducing the so–called permutation symbol:⁹
$$\begin{aligned} \epsilon_{hjk} &= 1, & &\text{if } (h, j, k) = (1, 2, 3), \text{ or } (2, 3, 1), \text{ or } (3, 1, 2); \\ &= -1, & &\text{if } (h, j, k) = (1, 3, 2), \text{ or } (2, 1, 3), \text{ or } (3, 2, 1); \\ &= 0, & &\text{otherwise.} \end{aligned} \tag{3.169}$$
⁸ Named after the French mathematician Pierre Frédéric Sarrus (1798–1861).
⁹ This is also known as the Levi-Civita symbol, named after Tullio Levi-Civita (1873–1941), an Italian physicist and mathematician, best known for his contributions to tensor analysis, also known as “absolute differential calculus,” which he co–developed with his mentor, the Italian mathematician Gregorio Ricci-Curbastro (1853–1925).
[As stated in Section 1.8, there exist two symbols for the Greek letter epsilon, namely $\epsilon$ and $\varepsilon$. Again, in this book the two symbols are not used interchangeably.]
Remark 35. Note that each of the addends in the expression for the determinant contains one and only one term from each row, as well as one and only one term from each column. [There are only six terms that satisfy such conditions, as you may verify.] In addition, the signs are different for the “ascending” and the “descending” lines. This corresponds to the sign of $\epsilon_{hjk}$ (Eq. 3.169). [This is just a sneak preview of a similar definition for the determinant for an n × n matrix.]
Equation 3.169 implies
$$|\mathbf{A}| = \sum_{h,j,k=1}^{3} \epsilon_{hjk}\, a_{1h}\, a_{2j}\, a_{3k}, \tag{3.170}$$
as you may verify.
◦ Comment. For future reference, note that
$$\epsilon_{hjk} = -\epsilon_{hkj}. \tag{3.171}$$
Then, if A is a symmetric matrix, namely if $a_{jk} = a_{kj}$, we have
$$\sum_{j,k=1}^{3} \epsilon_{hjk}\, a_{jk} = 0 \qquad (h = 1, 2, 3). \tag{3.172}$$
This is an immediate consequence of the fact that A : B = 0 whenever one matrix is symmetric and the other antisymmetric (Eq. 3.98). [Indeed, for a fixed value of h, we have that $\epsilon_{hjk}$ are the elements of an antisymmetric matrix (Eq. 3.171).]
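Equations 3.169, 3.170 and 3.172 translate almost literally into code. The Python sketch below is an illustration only: the function names `epsilon` and `det3_eps` and the sample matrices are assumptions. It evaluates a 3 × 3 determinant through the permutation symbol and checks that the contraction with a symmetric matrix vanishes.

```python
from itertools import product

def epsilon(h, j, k):
    """Permutation symbol of Eq. 3.169 for indices in {1, 2, 3}."""
    if (h, j, k) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)}:
        return 1
    if (h, j, k) in {(1, 3, 2), (2, 1, 3), (3, 2, 1)}:
        return -1
    return 0

def det3_eps(a):
    """Determinant of a 3x3 matrix (list of rows) via Eq. 3.170."""
    return sum(epsilon(h, j, k) * a[0][h - 1] * a[1][j - 1] * a[2][k - 1]
               for h, j, k in product((1, 2, 3), repeat=3))

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det3_eps(A))   # 39

# Eq. 3.172: for a symmetric matrix S, the contraction with epsilon vanishes for each h.
S = [[1, 2, 3], [2, 5, 6], [3, 6, 9]]
contractions = [sum(epsilon(h, j, k) * S[j - 1][k - 1]
                    for j, k in product((1, 2, 3), repeat=2)) for h in (1, 2, 3)]
print(contractions)  # [0, 0, 0]
```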
• Cramer rule for 3 × 3 system
Note that, akin to the 2×2 case (Eq. 3.163), by using the above definitions of 3 × 3 determinants, the general solution of a 3 × 3 linear algebraic system, as given by Eqs. 2.55 and 2.56, may be written in a more compact form, as $x_k = A_k / A$, where $A = |\mathbf{A}|$ (Eq. 3.165), whereas
$$A_1 = \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}, \qquad A_2 = \begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}, \qquad A_3 = \begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}. \tag{3.173}$$
In other words, akin to the 2×2 case, the solution may be written as the ratio of two determinants (Eq. 3.163). The denominator is the determinant of the matrix A, whereas the numerator for the k-th unknown is the determinant of the matrix obtained by replacing the k-th column of A with the right side of the equation. This is known as the Cramer rule for 3 × 3 systems.
Similarly, Theorem 13, p. 78, for homogeneous problems, may be restated as follows
Theorem 40. For a homogeneous system of 3 equations with 3 unknowns, there exist two possibilities:
1. If $A = |\mathbf{A}| \neq 0$, a unique solution exists and is given by x = y = z = 0 (trivial solution).
2. If $A = |\mathbf{A}| = 0$, there exists at least one nontrivial solution, which is uniquely defined except for a multiplicative constant.
On the other hand, Theorem 14, p. 79, for nonhomogeneous problems, may be restated as follows
Theorem 41. For a nonhomogeneous system of 3 equations with 3 unknowns, there exist three possibilities:
1. If $A = |\mathbf{A}| \neq 0$, a unique solution exists and is given by $x_k = A_k / A$, with $A_k$ as in Eq. 3.173.
2. If $A = |\mathbf{A}| = 0$, but $A_k \neq 0$ (where k = 1, or 2, or 3), the solution does not exist.
3. If $A = |\mathbf{A}| = 0$ and $A_k = 0$ (where k = 1, 2 and 3), the solution exists, but is not unique.
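The Cramer rule for 3 × 3 systems can be sketched in Python as follows. The sketch covers only the case $|\mathbf{A}| \neq 0$ and assumes integer coefficients; the function names `det3` and `cramer_3x3` and the sample system are assumptions introduced for illustration.

```python
from fractions import Fraction

def det3(a):
    """Determinant of a 3x3 matrix (list of rows), as in Eq. 3.165."""
    return (a[0][0] * a[1][1] * a[2][2] + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1] - a[0][0] * a[1][2] * a[2][1]
            - a[0][1] * a[1][0] * a[2][2] - a[0][2] * a[1][1] * a[2][0])

def cramer_3x3(A, b):
    """Return [x1, x2, x3] by the Cramer rule, assuming integer data and |A| != 0."""
    det_A = det3(A)
    if det_A == 0:
        raise ValueError("|A| = 0: the Cramer rule does not apply")
    solution = []
    for k in range(3):
        # A_k: replace the k-th column of A with the right side b (Eq. 3.173)
        A_k = [row[:k] + [b[h]] + row[k + 1:] for h, row in enumerate(A)]
        solution.append(Fraction(det3(A_k), det_A))
    return solution

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
b = [5, 4, 22]                 # right side built from x = [1, 2, 3]
print(cramer_3x3(A, b))        # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

For larger systems the Gaussian elimination procedure is the practical choice; the Cramer rule is shown here only because it mirrors the formulas above.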
3.9 Appendix C. Tridiagonal matrices
♠
The Gaussian elimination procedure is especially simple for tridiagonal matrices (Definition 44, p. 92). Specifically, let $a_2, \dots, a_n$ denote the subdiagonal elements, $d_1, \dots, d_n$ the diagonal ones, and $c_1, \dots, c_{n-1}$ the superdiagonal ones. Consider a system of equations A x = b, with
$$\mathbf{A} = \begin{bmatrix} d_1 & c_1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ a_2 & d_2 & c_2 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & a_k & d_k & c_k & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & a_{n-1} & d_{n-1} & c_{n-1} \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & a_n & d_n \end{bmatrix} \tag{3.174}$$
and $\mathbf{b} = \{b_k\}$. In this case, at the end of the first phase of the Gaussian elimination procedure (namely when the equivalent matrix is in upper triangular form), the system is replaced by
$$\mathbf{A}'\, \mathbf{x} = \begin{bmatrix} d'_1 & c_1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ 0 & d'_2 & c_2 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & d'_k & c_k & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & d'_{n-1} & c_{n-1} \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & d'_n \end{bmatrix} \mathbf{x} = \mathbf{b}', \tag{3.175}$$
where $d'_1 = d_1$ and
$$d'_k = d_k - c_{k-1}\, a_k / d'_{k-1} \qquad (k = 2, \dots, n), \tag{3.176}$$
as you may verify. In addition, regarding the term $\mathbf{b}' = \{b'_k\}$ on the right side of the equation, we have $b'_1 = b_1$ and
$$b'_k = b_k - b'_{k-1}\, a_k / d'_{k-1} \qquad (k = 2, \dots, n). \tag{3.177}$$
Then, the solution to Eq. 3.175 is given by $x_n = b'_n / d'_n$ and
$$x_k = \bigl( b'_k - c_k\, x_{k+1} \bigr) / d'_k \qquad (k = n - 1, \dots, 1). \tag{3.178}$$
[Note that k goes from n − 1 to 1. In other words, the order of the operations is reversed, namely by k decreasing.]
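The forward elimination and back substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_tridiagonal` and the list layout of the three diagonals are choices made here, exact fractions are used, and no pivot is allowed to vanish (no row interchanges).

```python
from fractions import Fraction

def solve_tridiagonal(a, d, c, b):
    """Solve a tridiagonal system following Eqs. 3.176-3.178.

    a: subdiagonal a_2..a_n (length n-1), d: diagonal d_1..d_n (length n),
    c: superdiagonal c_1..c_{n-1} (length n-1), b: right side (length n).
    Assumes that no modified diagonal element d'_k vanishes.
    """
    n = len(d)
    dp = [Fraction(d[0])]                 # d'_1 = d_1
    bp = [Fraction(b[0])]                 # b'_1 = b_1
    for k in range(1, n):                 # first phase: forward elimination
        factor = Fraction(a[k - 1]) / dp[k - 1]     # a_k / d'_{k-1}
        dp.append(d[k] - c[k - 1] * factor)         # Eq. 3.176
        bp.append(b[k] - bp[k - 1] * factor)        # Eq. 3.177
    x = [Fraction(0)] * n
    x[n - 1] = bp[n - 1] / dp[n - 1]
    for k in range(n - 2, -1, -1):        # second phase: k decreasing (Eq. 3.178)
        x[k] = (bp[k] - c[k] * x[k + 1]) / dp[k]
    return x

# Sample system with diagonal 2 and off-diagonals -1; right side built from x = [1, 2, 3].
x = solve_tridiagonal(a=[-1, -1], d=[2, 2, 2], c=[-1, -1], b=[0, 0, 4])
print(x)   # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

The number of operations grows only linearly with n, which is what makes the procedure especially simple for tridiagonal matrices.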
Chapter 4
Statics of particles in one dimension
Having introduced the basics of linear algebraic systems, we can now begin our second journey, a journey parallel to that on mathematics, a journey that will bring us to the discovery of the innermost secrets of the world of classical mechanics, in some ways still mysterious to many of us. [As stated in the preface, I use the term classical mechanics because mechanics includes quantum mechanics and relativistic mechanics, which are not dealt with in this book. It should be pointed out that an important branch of classical mechanics, namely statistical mechanics, is also not addressed in this book.]
• Overview of this chapter
In Section 4.1, we introduce the notions of particles and forces (Subsections 4.1.1 and 4.1.2, respectively), along with what I call the Newton equilibrium law for a single particle in one dimension.1 Specifically, we deal with single–particle statics, namely with the conditions under which a single particle, initially at rest, continues to remain so at all times. As we will see, a particle is in equilibrium if, and only if, the sum of all the forces acting on it vanishes. Next, in Section 4.2, we extend the formulation to the equilibrium of systems of particles, after introducing the Newton third law of action and reaction, still for the limited case of one–dimensional problems. In Section 4.3, we focus on the so–called spring–particle chains, and discuss their longitudinal equilibrium. We do this for three different configurations. Specifically, in Section 4.4, we consider spring–particle chains that are anchored to the ground at both endpoints. Next, in Section 4.5, we consider chains anchored at only one endpoint. Finally, in Section 4.6 we deal with unanchored chains.
1 The term “one–dimensional” is often abbreviated as 1D. Similarly, the terms “two–dimensional” and “three–dimensional” are abbreviated as 2D and 3D, respectively. In this book, this usage is limited to the Index and to the caption of some figures.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_4
The material in these three sections provides us with a first example of the interplay between mathematics and mechanics. Specifically, we will uncover a correspondence of the (mechanics) spring–chain problem with the three possibilities encountered for the (mathematics) problem of linear algebraic systems addressed in the Rouché–Capelli theorem (Theorem 26, p. 120), on the existence and uniqueness of the solution of linear algebraic equations. Specifically, in the first two cases (chain anchored at both or only one endpoint, Sections 4.4 and 4.5), the solution exists and is unique. On the other hand, for the unanchored case (Section 4.6), the solution may not exist, whereas, if it does, it is not unique.
We also have three appendices. In the first (Section 4.7), we introduce the notion of the so–called “influence matrix.” In the second appendix (Section 4.8), we generalize the spring–particle chain formulation to the case of n particles, each of which is spring–connected to any other particle. Finally, in the third appendix (Section 4.9) we present an introduction to the International System of Units.
• Preliminary remarks
The material of this chapter is relatively simple, and it’s the best we can do with the mathematics we have introduced thus far. Nonetheless, it is quite interesting because it provides us with a first example of the interplay between mathematics and mechanics. Specifically, it provides illustrative examples of how the theorems of existence and uniqueness arise in mechanics as well. Moreover, it is essential for more interesting mechanics material that comes later. [Hold your horses, because the next mechanics chapter (namely Chapter 11) requires that we introduce the appropriate background on calculus.]
Isaac Newton is unquestionably the father of classical mechanics, as he introduced the famous Newton laws of dynamics, as well as the law of gravitation. Therefore, classical mechanics is appropriately referred to as Newtonian mechanics.2 [It should be noted that there exist special classical–mechanics formulations, known as Lagrange mechanics3 and Hamilton mechanics.4 However, both of them may be derived from Newtonian mechanics. In other words, they are consequences of the Newtonian mechanics and are to be considered as its subsets.]
2 Isaac Newton (1643–1727) was an English mathematician, physicist and astronomer, often considered to be the greatest scientist who ever lived. Three of his most important contributions addressed in this book are: (i) in mathematics, infinitesimal calculus (Chapters 9 and 10), which was introduced independently by Newton and Gottfried Wilhelm Leibniz (we have seen their dispute in Footnote 4, p. 12); (ii) in mechanics, the laws that govern dynamics (Chapters 11 and 20 in one and three dimensions, respectively); and (iii) in astronomy, the law of gravitational attraction between celestial bodies (Chapter 20). He is one of the scientists that contributed the most to address the interplay of mathematics and mechanics. Indeed, he developed the mathematics that he needed for his work on mechanics.
3 Joseph–Louis Lagrange (1736–1813), a mathematician and astronomer. He is usually considered French, although he was Italian by birth and scientific upbringing. He was born Giuseppe Luigi Lagrangia, in Turin, Piedmont (now Italy), and studied at the University of Turin. In 1766 he succeeded Euler as the director of mathematics at the Prussian Academy of Sciences, in Berlin, Prussia. In 1787 he moved to Paris, became a member of the French Academy and remained in France to the end of his life. He introduced the Lagrange equations of motion (addressed in Vol. II), which are the basis of analytical mechanics. He also provided important contributions to the calculus of variations. The Lagrange multipliers in constrained minimization (Eq. 18.115), the Lagrange form of the remainder of a Taylor polynomial (Eq. 13.65) and the Lagrange interpolation polynomials (addressed in Vol. II) are also named after him. His 1788 treatise Mécanique Analytique (Ref. [39]) is a milestone in the illustration of the interplay between mathematics and mechanics and was the basis of future developments in classical mechanics. Another major contributor to the interplay of mathematics and mechanics!
4 William Rowan Hamilton (1805–1865) was an Irish physicist, astronomer and mathematician. He introduced the Hamilton equations of motion, addressed in Vol. II. The Cayley–Hamilton theorem on matrices (also in Vol. II) is also named after him.
Remark 36. Mechanics is a branch of physics and comprises: (i) statics, which deals with the equilibrium of objects that do not move, (ii) kinematics, which deals with their motion, independently of its causes (namely the forces, a notion introduced in Subsection 4.1.2), and (iii) dynamics, which deals with the relationship between motion and forces.
◦ Warning. There is no uniformity in the literature regarding the terminology used for the classification of the field of classical mechanics. Specifically, some authors divide mechanics into two branches: statics and dynamics; then, dynamics is subdivided into kinematics and the rest of dynamics, which is referred to as kinetics. Other authors divide mechanics into three branches: statics, kinematics, and dynamics (dynamics in the second classification corresponds to kinetics in the first). I follow the second type of classification (namely statics, kinematics and dynamics), because the etymology of dynamics involves the concept of force.5
5 Mechanics comes from the ancient Greek word μηχανη (mekhane, ancient Greek for “machine”). Statics comes from στατικος (statikos, ancient Greek for “causing to stand”). Kinematics comes from κινημα (kinema, ancient Greek for “motion”). Dynamics comes from δυναμις (dunamis, ancient Greek for “power,” “force”).
◦ Comment. In this chapter, we consider only statics. [Kinematics and dynamics require a precise definition of velocity (Eq. 11.1) and acceleration (Eq. 11.2). These, in turn, require the notion of derivative introduced in Eq. 9.2. Thus, kinematics and dynamics are postponed to Chapter 11, after introducing calculus (Chapters 8–10).] To be precise, in this chapter we limit ourselves to one–dimensional problems. As we will see, this implies that we assume all the forces to be coaligned, namely to lie on the same straight line. [In real life, forces can act in any direction. Therefore, forces are represented by physicists’ vectors (Remark 30, p. 94), which are introduced in Chapter 15. Accordingly, the three–dimensional formulation for statics is postponed to Chapter 17.]
Remark 37 (Principles and laws). Throughout the book, I will use the term principle to indicate those mechanics statements based upon non–contradicted experimental evidence. Specifically, if there exists adequate experimental evidence (a lot of it!), one formulates a principle, which is assumed to be “true ad interim,” namely true until some new experimental evidence points to the contrary. “Principle” has some similarity with “postulate” and “axiom,” in that I will avoid using the term “principle” for a statement that may be obtained from other principles. However, we will encounter some rare exceptions, typically whenever the terminology is universally accepted and there exists no possibility of misunderstanding. [When I do this, you’ll be warned.] Most of the time, in these cases I will use the term “law” instead of “principle.” The term “law” is not used as strictly as “principle”: it is used even when the statement is a consequence of other results and/or principles. For instance, as we will see in Subsection 11.4.1, the so–called Newton first law is a consequence of the Newton second law. Hence, strictly speaking, it is not appropriate to refer to it as the Newton first principle. On the other hand, the Newton second law is a principle. Nonetheless, I will follow the tradition; indeed, it would sound weird calling the Newton second law anything else. [Let us be precise. Are we sure that the Newton second law is a principle? It depends! It was a principle until it was superseded by the Einstein theory of relativity.
In fact, the former may be obtained as a consequence of the latter, for speeds much less than the speed of light. To put it in different terms, in classical mechanics (and hence in this book), the Newton second law is a principle. However, in relativistic mechanics (which must be used for speeds relatively close to the speed of light), it is not.]
4.1 Newton equilibrium law for single particles
The equilibrium of a particle is governed by the following
Law 1 (Newton equilibrium law for a single particle) As mentioned above, here we: (i) limit ourselves to a single particle P that does not move (statics), and (ii) assume that all the forces acting on it have the same direction (one–dimensional problem). For this limited case, the equilibrium (namely the fact that a particle at rest does not start to move and remains at rest) is attained if, and only if, all the forces are balanced, in the sense that their sum vanishes, namely
$$
F := \sum_{k=1}^{n} F_k = 0,
\tag{4.1}
$$
where $F_k$ ($k = 1, \dots, n$) include all the forces acting on P, with the appropriate signs (as will be discussed in Remark 40, p. 147). In the above statement, I have used two terms (particle and force), which have not yet been defined. Hence, we can ask ourselves the following questions: “What is a particle? What is a force?” These are discussed in the rest of this section. Remark 38. I would like to emphasize that statics may be seen as a particular case of dynamics, namely the specific case in which the particle has zero velocity. As a matter of fact, the velocity of the particle depends upon the frame of reference that we are using: if we use a frame of reference connected with the particle, the particle is not moving. Oops! I am getting ahead of myself. The notion of frame of reference in mechanics is first addressed in Chapter 11.
4.1.1 Particles and material points Let’s start with a discussion of material points. Paraphrasing Euclid’s definition of a point (“A point is that which has no parts,” Book I, Definition 1; Ref. [22], Vol. 1, p. 153), the definition could be stated as: “A material point is an object having no extension.” In practice, things are a bit more subtle. [This is a first example of the fact that theory needs to be properly “interpreted,” when applying it to practical problems. “Modeling” is the term that comes to mind.] Let us consider an example. Think of yourself as being on the top of a mountain and looking at a highway down below, way down. Each car on the highway looks like a tiny dot, and its actual dimensions appear irrelevant to its motion. Any time this happens, we speak of the object as a material point, which may therefore be defined as the limit case of a body, whose dimensions are becoming smaller and smaller, until they become negligible. [De facto, it’s the observer that gets further and further away.] We may state that an object may often be treated as a material point if its dimensions are negligible for the problem under consideration. For instance,
in studying their motions, stars, planets and satellites are typically treated as material points. [As we will see, a body cannot always be treated as a material point. This is the case, for instance, whenever the body rotation may be relevant, in the sense that it affects the forces acting on the body (see Chapter 23, on rigid bodies).] In fact, to emphasize that the body may have finite dimensions, albeit very small, I often use the term particle. Indeed, I will use the terms material point and particle interchangeably. The former is more rigorous, but the latter is more gut–level. This is the reason why I often prefer the use of the latter. If you want to be precise, you may think of a particle as a body that is so small that it may be approximated with a material point.
4.1.2 Forces What is a force? The easiest way to get you acquainted with the notion of force is through a series of examples. [Recall that here, we deal only with forces acting in a single direction; forces in three dimensions are dealt with in Chapter 17.]
• Weights. Balance scales The first example of force that comes to mind is weight. You know what I mean by weight, don’t you? [Weight is also known as the force of gravity. I will use the terms weight and force of gravity interchangeably.] As you know, a balance scale is a weight–measuring instrument, composed of a beam, hinged on its center, and two pans hanging from its endpoints (Fig. 4.1).
Fig. 4.1 Balance scale
It is intuitively true that the beam is in equilibrium if, and only if, the weights in the two pans are equal. [A precise theory behind a balance scale, based upon the notion of “moment of a force,” is presented in Subsection 17.7.1.] Consider two bodies $B_1$ and $B_1'$. A balance scale may be used to decide whether the weights $W_1$ and $W_1'$, of $B_1$ and $B_1'$, are equal. Also, we say that the weight $W_2$ of a body $B_2$ is twice that of $B_1$ if the body $B_2$, placed on one pan, balances the bodies $B_1$ and $B_1'$, with $W_1 = W_1'$, placed on the other: $W_2 = W_1 + W_1' = 2 W_1$. Of course, we may extend this approach and use three equal weights on one pan and one on the other, so as to have $W_3 = W_1 + W_1' + W_1'' = 3 W_1$. This way, we can define $W_n = n W_1$, with n arbitrarily large. The smaller $W_1$ is, the more accurate is our measuring system. Remark 39. Let me be picky here. Weights act along the vertical (by definition of the term “vertical,” I must add). Indeed, in physics, the terms vertical and horizontal have a very precise meaning, namely parallel and perpendicular to the force of gravity, respectively. Thus, in the absence of gravity (for instance, at some points in space) it does not make any sense to speak of horizontal and vertical directions, from a physicist’s point of view. However, in everyday language, horizontal and vertical have also a second meaning, as in “a horizontal line on a piece of paper,” or in horizontal and vertical axes. The two definitions are somewhat connected, because, if you are standing straight, a horizontal line on a sheet of paper happens to be parallel to the straight line connecting your eyes (which lie in a horizontal plane, namely a plane normal to gravity).
• Forces due to springs. Spring scales Another type of force that we will deal with is that generated by a spring. In the example of Fig. 4.2, the body B is subject to its weight (downward) and to the equal and opposite force FS that the spring exerts on B (upwards). Remark 40 (A word of caution). Even for the limited one–dimensional case considered here, forces must be added with an appropriate sign. In view of the fact that the weight points downward and the spring force points upward, the weight and the spring force are assigned opposite signs. [Which one of the two is positive depends upon the convention that we decide to choose. For instance, if we adopt the convention of considering the upward forces to be positive, then W is negative and FS is positive.] This approach is typically used throughout this book.
Fig. 4.2 Equilibrium
Fig. 4.3 Spring scale
With the above convention, the Newton law of equilibrium (Eq. 4.1) requires that
$$
W + F_S = 0,
\tag{4.2}
$$
in order for the body not to move. [We could follow an alternate strategy. We could denote by W and FS the absolute value of the two forces (in this case, we always have W > 0 and FS > 0). Hence, to take into account that the weight has a direction (downward) opposite to that of the spring force (upward), the equilibrium law would have to be written as −W + FS = 0, namely as W = FS. However, I prefer the approach illustrated above, namely to assign a sign to each force, although harder to digest in the beginning, because it prepares you to go through the transition to three–dimensional problems.] Of course, Eq. 4.2 may be used to measure weights. Specifically, consider the apparatus depicted in Fig. 4.3, which is known as a spring scale. This may be used as a way to measure forces in a continuous fashion. For, one may calibrate the spring scale by placing bodies of known weight over the dish of the spring scale and marking the corresponding displacement. Then, we can interpolate and obtain the in–between values. In general, it is desirable to avoid the process of interpolation. Accordingly, it is convenient to use springs for which the displacement is proportional to the force applied. Then, in marking the displacements, one may use a subdivision into equal segments, each corresponding to an equal increase in weight. Specifically, we have the following Definition 76 (Linear spring). Consider a spring that is constrained at one endpoint and has a force F applied at the other endpoint. Let us denote
by $u$ the spring elongation (which equals the displacement of the free endpoint of the spring, from the unloaded location), obtained by applying a force $F$. A spring is called linear if, and only if, the elongation $u$ is proportional to the force $F$ applied, namely
$$
F = -\kappa\, u,
\tag{4.3}
$$
where
$$
\kappa > 0
\tag{4.4}
$$
is a positive constant, known as the spring stiffness constant, or simply as the spring constant.6 The minus sign is due to the fact that F and u have opposite directions. Accordingly, they have opposite signs, provided of course that we use for both of them the same sign convention, as we typically do in this book (Remark 40, p. 147). In other words, F = −κ u is a restoring force, namely a force in the direction opposite to that of the displacement that causes it. To verify that a spring is linear, one may use n bodies Bk (k = 1, . . . , n), all with the same weight W . If the spring is linear, the elongation obtained by adding p ≤ n of these bodies equals p times that due to W . Remark 41. Because of its convenience, the use of linear springs is typically adopted in practice. Of course, in real life springs satisfy Eq. 4.3 only approximately, definitely only within a certain range of elongation. After all, if you keep on stretching a spring, sooner or later it is going to undergo permanent plastic deformations (yield), or even to break. Shortly before the yield/break occurs, the spring behavior is no longer governed by the simple expression in Eq. 4.3. However, for small elongations, Eq. 4.3 gives a very good representation of the actual relationship — the smaller the elongation, the better the approximation. [Idealizations of this type are often used in practice, for instance in engineering design, where the engineers are free to choose the range to be sufficiently small so as to allow them to use Eq. 4.3.] This is just an example of a procedure called linearization, addressed in detail in Section 13.8. These types of limitations are tacitly assumed in the remainder of this book. Indeed, most of the book deals with linear (or linearized) problems. In particular, throughout the book, the springs are assumed to be linear unless otherwise stated. 6
The symbol κ denotes the Greek letter kappa (Section 1.8). [It is also used, for instance, for the curvature of a line (Eq. 9.145) and for the thermal conductivity coefficient in the heat conduction law (addressed in Vol. III).]
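As a small numerical illustration of Eqs. 4.2 and 4.3 taken together, the sketch below computes the displacement at which a linear spring balances a given weight, and tests whether a set of measured (load, elongation) pairs is consistent with a linear spring. The function names and all the numbers are hypothetical, chosen only for illustration; the upward direction is taken as positive, so that the weight is negative.

```python
def spring_force(kappa, u):
    """Restoring force of a linear spring, F = -kappa * u (Eq. 4.3)."""
    if kappa <= 0:
        raise ValueError("the spring constant must be positive (Eq. 4.4)")
    return -kappa * u


def equilibrium_elongation(kappa, weight):
    """Displacement u for which weight + spring_force(kappa, u) = 0 (Eq. 4.2)."""
    if kappa <= 0:
        raise ValueError("the spring constant must be positive (Eq. 4.4)")
    return weight / kappa


def is_proportional(loads, elongations, rel_tol=1e-9):
    """True if every elongation/load ratio matches that of the first pair.

    Assumption: the first load and the first elongation are both nonzero.
    """
    w0, u0 = loads[0], elongations[0]
    return all(
        abs(u * w0 - u0 * w) <= rel_tol * abs(u0 * w0)
        for w, u in zip(loads, elongations)
    )


if __name__ == "__main__":
    kappa = 200.0    # hypothetical spring constant
    weight = -9.81   # hypothetical weight, negative because it points downward
    u = equilibrium_elongation(kappa, weight)
    print(u, weight + spring_force(kappa, u))  # residual is (nearly) zero

    # Linearity test with p = 1, 2, 3, 4 equal weights placed on the scale.
    loads = [p * weight for p in (1, 2, 3, 4)]
    readings = [equilibrium_elongation(kappa, w) for w in loads]
    print(is_proportional(loads, readings))
```

With the sign convention of Remark 40, the computed displacement is negative (downward), and the spring force is positive (upward).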
• Other forces
There exist other forces familiar to you, such as the force that you produce when you exert pressure on an object, for instance when you push a piece of furniture, or press on the pedals of your bicycle. [Of course, a spring scale may be used to measure these types of forces. For instance, isometric–exercise devices are used to measure the strength of your muscles. These devices consist of two cylinders sliding one inside the other; a spring inside the two provides the restoring force. A numbered scale tells you the force exerted by your muscles.] Another force that may be familiar to you is the force that the Sun exerts on the Earth. This is known as gravitation and is discussed in Subsection 20.8.1 on the Newton law of gravitational attraction. [As we will see, the force of gravity (weight) is closely related to the force of gravitation, but does not coincide with it. The difference will be addressed in Subsections 20.8.1 and 22.4.2.] Other familiar forces include: the Archimedes buoyancy, namely the force that allows a boat to float; the lift of an airplane, namely the force that makes an airplane (and birds) fly; the resistance that the air exerts on your motorcycle or car, known as the aerodynamic drag; the thrust of a rocket. These will be introduced as needed. Weight and spring forces, along with the force you produce by pushing/pulling something, will suffice for the time being.7
7 Archimedes (c. 287–212 BC) was a Greek mathematician, physicist, and engineer. He was born in the Mediterranean island of Sicily (part of “Magna Graecia,” Latin for “Great Greece,” roughly speaking much of the coastal area of southern Italy), in the city of Syracuse, and died there, during the sack of Syracuse in the Second Punic War (218–201 BC), killed by a Roman soldier, who ignored orders from the Roman general Marcellus that Archimedes was not to be harmed. Archimedes is considered one of the greatest scientists ever, by some simply the greatest. He discovered the laws of equilibrium of fluids (hydrostatics), in particular, the laws of density and buoyancy, specifically the so–called Archimedes principle. You might have learned that, when he came up with the idea, he was in a bathtub and got so excited by the result that he jumped out of the tub and started running naked in the streets, yelling “eureka” (εὕρηκα, ancient Greek for “I found it”). He is credited with introducing the field of statics by enunciating the law of the lever (Subsection 17.7.1; when I was in grammar school, I was told that he said “Give me a place to stand upon, and I shall lift the Earth”). Also, he is said to have used parabolic reflectors to burn ancient Roman ships, during the Syracuse siege (Subsubsection “High beams, Archimedes, and the Walkie–Scorchie,” on p. 621). In addition to being a great physicist, he is also considered one of the greatest mathematicians in antiquity. He used the method of exhaustion, which is closely related to the limit in calculus, introduced by Newton and Leibniz about two millennia later. He proved that the volume and surface area of the sphere are two thirds of those of the cylinder with the same height and the same diameter.
4.2 Equilibrium of systems of particles
In this section, we extend the static–equilibrium formulation from a single particle to a system of particles. In this case, it is convenient to distinguish between external and internal forces. We call internal those forces that the individual particles exert on each other. The other forces are called external. As we will see later, this distinction between internal and external forces is essential to formulate the Newton third law, which provides a relationship between: (i) the force $F_{hk}$ that particle k exerts on particle h, and (ii) the force $F_{kh}$ that particle h exerts on particle k. [This is discussed in Subsection 4.2.1, for the one–dimensional case. For three–dimensional problems, see Subsection 17.6.1.] To reduce the complexity of the formulation (and remain within the one–dimensional–formulation limitations of this chapter), we assume the particles to be coaligned (namely lying on the same straight line), and all the forces to be directed along the same line. This allows us to fully describe a force simply by a real number. Then, we have the following (recall the sign convention in Remark 40, p. 147)
Law 2 (Newton equilibrium law for n particles) For the limited case of n-particle equilibrium in one dimension, the equilibrium is attained if, and only if, all the forces acting on each particle are balanced, in the sense that
$$
\sum_{k=1}^{n} F_{hk} + f_h = 0 \qquad (h = 1, \dots, n),
\tag{4.5}
$$
where $F_{hk}$ (with $h, k = 1, \dots, n$, which in analogy with Eq. 3.27, means $h = 1, \dots, n$ and $k = 1, \dots, n$) denotes the force that the k-th particle exerts on the h-th particle, whereas $f_h$ denotes the sum of all the external forces acting on the h-th particle.
4.2.1 The Newton third law of action and reaction
In order to discuss the equilibrium of several particles, consider the following
Law 3 (The Newton third law in one dimension) The force $F_{kh}$ that particle h exerts on particle k is equal and opposite to the force $F_{hk}$ that particle k exerts on particle h, namely
$$
F_{kh} = -F_{hk}.
\tag{4.6}
$$
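Equations 4.5 and 4.6 may be checked numerically on a small example. The following sketch is only an illustration: the table `F` of internal forces, the list `f` of external forces and the two function names are hypothetical. Note that Eq. 4.6 with h = k gives $F_{hh} = 0$, so that the diagonal of the table vanishes.

```python
def satisfies_third_law(F, tol=1e-12):
    """Check F[k][h] = -F[h][k] for every pair of particles (Eq. 4.6)."""
    n = len(F)
    return all(
        abs(F[k][h] + F[h][k]) <= tol
        for h in range(n)
        for k in range(n)
    )


def equilibrium_residuals(F, f):
    """Left side of Eq. 4.5 for each particle h; all zero at equilibrium."""
    return [sum(F[h]) + f[h] for h in range(len(f))]


if __name__ == "__main__":
    # Hypothetical internal forces for three coaligned particles:
    # F[h][k] is the force that particle k exerts on particle h.
    F = [
        [0, 2, -1],
        [-2, 0, 3],
        [1, -3, 0],
    ]
    # Hypothetical external forces that balance the internal ones.
    f = [-1, -1, 2]

    print(satisfies_third_law(F))
    print(equilibrium_residuals(F, f))
    # Summing Eq. 4.5 over h, the internal forces cancel in pairs because
    # of Eq. 4.6; hence, at equilibrium the external forces sum to zero.
    print(sum(f))
```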
◦ Comment. This law is almost self–evident. To see this, consider a rigid ball over a flat rigid surface, such as a marble tabletop. This is a one–dimensional problem, since all the forces acting on the ball (namely the weight and the reaction of the table) are coaligned. Thus, the above Newton third law of action and reaction applies. Now, place a thin sheet of aluminum foil between the ball and the table. At a gut level, the forces are not altered by the presence of the sheet. Thus, the sheet is subject to the weight of the ball and the reaction of the table. For the equilibrium of the sheet, these must be equal and opposite, in agreement with Eq. 4.6. Remark 42. This is really a simplified version of the three–dimensional Newton third law, or the law of action and reaction (Eqs. 17.59 and 17.61), which states that the internal forces exchanged between two particles are (i) equal and opposite and (ii) lying along the straight line connecting the two particles. In the present one–dimensional case, the second part of the third law is irrelevant because here all the particles and all the forces have been assumed to be coaligned.
4.3 Longitudinal equilibrium of spring–particle chains
We have the following definitions:
Definition 77 (Longitudinal and lateral). Given a line, quantities such as force, displacement, velocity, and acceleration that are parallel to the line are called longitudinal, whereas quantities that are perpendicular to the line are called lateral (or transversal).
Definition 78 (Spring–particle chain). A spring–particle chain is a collection of n coaligned particles, sequentially numbered, in which particle h is spring–connected only to particles h − 1 and h + 1. [To prevent the chain from buckling, the particles are forced to remain coaligned, as in the case of beads on a taut wire.]
In this section and the three that follow, we address the longitudinal equilibrium of spring–particle chains. This problem is particularly interesting not only per se, but also because, as mentioned above, it illustrates what we have learned in the preceding chapters on linear algebraic systems. Specifically, we will see that, for the statics problem under consideration, we may encounter all three cases addressed in Theorem 26, p. 120, for n × n linear algebraic systems: (i) existence and uniqueness, (ii) non–existence, and (iii) existence, but not uniqueness.
Consider a chain of particles aligned along a given straight line, and subject to known (but arbitrarily prescribed) forces that are acting on the individual particles (see Figs. 4.4 and 4.5, where the forces are not shown for the sake of graphical clarity). [Since for one–dimensional problems the forces have to be coaligned with the chain, we either assume that there is no gravity (as in space), or that the chain is vertical. However, as we will see in Chapter 17, the formulation applies as well if the particles slide over a virtually frictionless surface, such as ice.]
Fig. 4.4 Spring–particle chain, anchored at both endpoints
Fig. 4.5 Spring–particle chain, unanchored
To be specific, assume that, in the reference configuration (namely in the location that they have in the absence of applied forces), the particles are coaligned. Let $x_k^{(0)}$ ($k = 1, \dots, n$) identify the location of $P_k$ (the k-th particle along the line), as measured starting from an arbitrary origin O on that line, so as to have $x_k^{(0)} < x_{k+1}^{(0)}$. [Here, $x_k^{(0)}$ denotes the value of $x_k$ when all the forces vanish.] Then, we apply a force of prescribed intensity to each particle and observe how the particles are displaced. Let the displacement of the particle $P_k$ be denoted by
$$
u_k := x_k - x_k^{(0)},
\tag{4.7}
$$
where $x_k$ denotes the location of $P_k$ in the equilibrium configuration (namely the configuration attained after we apply the forces). We refer to $u_k$ as the longitudinal displacement of $P_k$. Our problem is finding the displacements $u_k$ ($k = 1, \dots, n$) as functions of the forces $f_h$ ($h = 1, \dots, n$), where $f_h$ denotes the sum of all the forces applied to the particle $P_h$, excluding the spring forces due to the adjacent particles (Eq. 4.5). This will be referred to as the longitudinal equilibrium problem of a spring–particle chain. In the following sections, we examine three separate problems: (i) in Section 4.4, the particles are spring–anchored at both endpoints (as in Fig. 4.4 above), (ii) in Section 4.5, they are anchored only at one endpoint, whereas (iii) in Section 4.6, none of the particles is anchored (as in Fig. 4.5 above). As already pointed out, we will see that in the first two cases, the solution exists and is unique. On the other hand, for the unanchored case, the solution may not exist, whereas, if it does, it is not unique.
◦ Warning. In the following three sections, for the sake of simplicity, we assume the springs to be linear and to have the same stiffness constant κ.
4.4 Spring–particle chains anchored at both ends
Here, we consider the longitudinal equilibrium of a chain of n spring–connected particles that is anchored at both endpoints (Fig. 4.4 above).
◦ Warning. To make our life simple, the situation in which the springs are pre–stretched (namely stretched before we apply the external forces) is not considered here. [This problem will be addressed in Section 21.2.]
Recall that in defining a linear spring (Definition 76, p. 148), we assumed the spring to be anchored at one endpoint and free to move at the other. Then, we had $F = -\kappa\,u$ (Eq. 4.3), with u denoting the spring elongation, namely the displacement of the free endpoint from the reference location, that is, the location attained when F = 0. In the problem under consideration, however, the generic spring is free to move at both endpoints. Thus, we have to generalize Eq. 4.3. Such a generalization is obtained by replacing u with the change in length of the spring, $\ell - \ell_0$, where $\ell$ is the length of the spring in the current (equilibrium) configuration, whereas $\ell_0$ is the length of the spring in the reference (unstretched) configuration. Specifically, consider the spring connecting $P_h$ to the particle $P_{h-1}$. For this spring, we have $\ell - \ell_0 = u_h - u_{h-1}$. Accordingly, the desired generalization is given by
$$F_{h,h-1} = -\kappa\,(u_h - u_{h-1}) \qquad (h = 2, \dots, n), \qquad (4.8)$$
where $F_{h,h-1}$ (positive when pointed in the x-direction) denotes the force that the spring exerts on the particle $P_h$. Next, consider the force $F_{h,h+1}$ that the spring connecting the particle $P_h$ to the particle $P_{h+1}$ exerts on the particle $P_h$. Note that the one–dimensional Newton third law of action and reaction (Eq. 4.6) yields
$$F_{h,h+1} = -F_{h+1,h}, \qquad (4.9)$$
4. Statics of particles in one dimension
where $F_{h+1,h} = -\kappa\,(u_{h+1} - u_h)$. [Hint: Use Eq. 4.8 with h replaced by h + 1.] Accordingly, we have
$$F_{h,h+1} = -\kappa\,(u_h - u_{h+1}) \qquad (h = 1, \dots, n-1). \qquad (4.10)$$
All the other forces $F_{hk}$ vanish. Thus, the equation for the equilibrium of the h-th particle, namely $\sum_{k=1}^{n} F_{hk} + f_h = 0$ (Eq. 4.5), yields $\kappa\,(u_{h-1} - 2u_h + u_{h+1}) + f_h = 0$, i.e.,
$$\kappa\,(-u_{h-1} + 2u_h - u_{h+1}) = f_h \qquad (h = 2, \dots, n-1). \qquad (4.11)$$
Equation 4.11 defines a system of n − 2 equations, with n unknowns. We need two more equations so as to have a system of n equations, with n unknowns. Note that in Eq. 4.11 we have h ≠ 1 and h ≠ n. Accordingly, the two additional equations needed correspond to the equilibrium of the particles h = 1 and h = n. These equations are examined below.
4.4.1 Chains of three particles
To avoid confusing you with too many subtleties, before addressing the n-particle case, let us consider a simple system of three particles, connected by springs (again, having the same stiffness constant κ), as in Fig. 4.4, p. 153. The first and third particles are anchored, namely connected by springs (also having a stiffness constant equal to κ) to the points $P_0$ and $P_4$, which do not move. Therefore, we have $F_{1,0} = -\kappa\,u_1$ and $F_{3,4} = -\kappa\,u_3$. Then, proceeding as we did for the other particles, we obtain that the governing equations for the first and last particles are, respectively,
$$\kappa\,(2u_1 - u_2) = f_1 \qquad \text{and} \qquad \kappa\,(-u_2 + 2u_3) = f_3. \qquad (4.12)$$
Remark 43. The equation for the first particle (first in the above equation) may be obtained by using a little trick. Let us pretend that at P0 there is a physical particle and that this particle is constrained so that u0 = 0. Thus, we can still use Eq. 4.11, with h = 1 and u0 = 0, and obtain κ (2 u1 − u2 ) = f1 . We can operate in a similar fashion for the particle P3 , and include a particle P4 , with u4 = 0, to obtain κ (−u2 + 2 u3 ) = f3 . This way of arriving at the end–particle equations unifies the formulation for all the particles and will be used from now on. Combining Eq. 4.11 (with h = 2) and Eq. 4.12, we have that the problem is governed by the following system of three linear algebraic equations, with three unknowns
Part I. The beginning – Pre–calculus
$$\begin{aligned} \kappa\,(2u_1 - u_2) &= f_1, \\ \kappa\,(-u_1 + 2u_2 - u_3) &= f_2, \\ \kappa\,(-u_2 + 2u_3) &= f_3. \end{aligned} \qquad (4.13)$$
Using the matrix notation introduced in Eqs. 3.7 and 3.10, this equation may be written as
$$K\,u = f, \qquad (4.14)$$
where we have $u = [u_1, u_2, u_3]^T$ and $f = [f_1, f_2, f_3]^T$, whereas
$$K = [k_{hj}] = \kappa \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}. \qquad (4.15)$$
This matrix is called the stiffness matrix for the problem under consideration. As a matter of fact, we have the following Definition 79 (Stiffness matrix). For any problem in which the spring– force vector fS is related to the displacement vector u through an expression of the type fS = −K u, the matrix K is called the stiffness matrix for the problem under consideration.
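If you like to experiment, here is a minimal Python sketch (not part of the original text) that solves the three–particle system K u = f of Eq. 4.13 with exact fractions. The values κ = 2 and f₁ = f₂ = f₃ = 1 are arbitrary choices made only for illustration, and `solve` is a hypothetical helper that uses ordinary Gauss–Jordan elimination, not the nonstandard procedure of Subsection 2.3.6.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination, using exact fractions."""
    n = len(A)
    # Augmented matrix [A | b]
    M = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


kappa = 2  # illustrative stiffness (an assumption)
K = [[2 * kappa, -kappa, 0],
     [-kappa, 2 * kappa, -kappa],
     [0, -kappa, 2 * kappa]]          # Eq. 4.15
f = [1, 1, 1]                         # illustrative forces (an assumption)
u = solve(K, f)
print(u)
```

Because the arithmetic is exact, the result can be substituted back into Eq. 4.13 without any round–off.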
• Uniqueness of the solution
Next, to prove that the solution exists and is unique, we want to show that the rows of the above stiffness matrix are linearly independent. To this end, it is convenient to use the nonstandard Gaussian elimination procedure introduced in Subsection 2.3.6. [The considerations in Remark 27, p. 82, apply here as well.] Specifically, let us add the first equation to the second multiplied by 2. Similarly, we add the resulting second equation to the third multiplied by 3. The upper triangular matrix $U_K$ of the resulting equivalent system is given by
$$U_K = \kappa \begin{bmatrix} 2 & -1 & 0 \\ 0 & 3 & -2 \\ 0 & 0 & 4 \end{bmatrix}, \qquad (4.16)$$
as you may verify. This clearly shows that the equations are linearly independent (Lemma 4, p. 115). Thus, the solution exists and is unique.
◦ Comment. An alternate proof that the solution exists and is unique may be obtained by showing that the stiffness matrix in Eq. 4.15 is real positive definite, namely $u^T K\,u > 0$, for any $u \ne 0$ (Eq. 3.120). Indeed, for any 3 × 3
symmetric matrix, we have (use Eq. 3.83)
$$u^T K\,u = k_{11} u_1^2 + k_{22} u_2^2 + k_{33} u_3^2 + 2\,(k_{12} u_1 u_2 + k_{13} u_1 u_3 + k_{23} u_2 u_3). \qquad (4.17)$$
In our case, using Eq. 4.15, we have $k_{hh} = 2\kappa$, $k_{12} = k_{23} = -\kappa$ and $k_{13} = 0$, and hence
$$u^T K\,u = 2\kappa\,(u_1^2 + u_2^2 + u_3^2 - u_1 u_2 - u_2 u_3) = \kappa\,\big[(u_1 - u_2)^2 + (u_2 - u_3)^2 + u_1^2 + u_3^2\big] > 0, \qquad (4.18)$$
namely $u^T K\,u > 0$, unless $u_1 = u_2 = u_3 = 0$. Thus, K is positive definite, and hence the solution exists and is unique (Corollary 1, p. 124, on the solvability of systems with real positive definite matrices).
4.4.2 Chains of n particles
Here, we generalize the results of the preceding subsection to the n-particle problem. For every particle, we may use $\kappa\,(-u_{h-1} + 2u_h - u_{h+1}) = f_h$ (Eq. 4.11), with h = 1, . . . , n and $u_0 = u_{n+1} = 0$ (Remark 43, p. 155). These equations form a system of n linear algebraic equations, with n unknowns. Again, these equations may be written as K u = f (as in Eq. 4.14), where $u = \{u_j\}$ and $f = \{f_h\}$, whereas the n × n stiffness matrix K is given by
$$K = [k_{hj}] = \kappa\,Q, \qquad (4.19)$$
with
$$Q = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & -1 & 2 \end{bmatrix}. \qquad (4.20)$$
Note that the stiffness matrix K is once again symmetric. [Of course, if n = 3 we recover Eq. 4.15.]
• Existence and uniqueness of the solution
♠
Here, extending the results for the 3 × 3 case, we show that the rows of the above stiffness matrix are linearly independent. To verify this, it is convenient to use again the nonstandard Gaussian elimination procedure introduced in Subsection 2.3.6. Specifically, let us add the first equation to the second multiplied by 2. Similarly, we add the resulting equation to the third multiplied by 3. In general, we add the resulting equation h − 1 to the h-th one multiplied by h. The upper triangular matrix $U_K$ of the resulting equivalent system is given by
$$U_K = \kappa \begin{bmatrix} 2 & -1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 3 & -2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 4 & -3 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 5 & -4 & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & n & -n+1 \\ 0 & 0 & 0 & 0 & 0 & \cdots & 0 & n+1 \end{bmatrix}, \qquad (4.21)$$
as you may verify. The last element of the last row differs from zero. This indicates that the equations are linearly independent (Lemma 4, p. 115). Thus, the solution exists and is unique.
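The elimination steps described above can be sketched in Python as follows. This is only an illustrative check, under the assumption that the common factor κ is dropped (so that we work with Q of Eq. 4.20); the function names `chain_matrix` and `eliminate` are hypothetical, and n = 6 is an arbitrary choice.

```python
def chain_matrix(n):
    """Matrix Q of Eq. 4.20: 2 on the diagonal, -1 on the two adjacent diagonals."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]


def eliminate(Q):
    """Replace row h by (h times row h) plus the already reduced row h - 1."""
    U = [row[:] for row in Q]
    for i in range(1, len(U)):   # i is 0-based; the row number h in the text is i + 1
        h = i + 1
        U[i] = [h * a + b for a, b in zip(U[i], U[i - 1])]
    return U


n = 6
U = eliminate(chain_matrix(n))
diagonal = [U[i][i] for i in range(n)]
superdiagonal = [U[i][i + 1] for i in range(n - 1)]
below = [U[i][j] for i in range(n) for j in range(i)]
print(diagonal, superdiagonal)
```

The diagonal and the superdiagonal of `U` may be compared directly with the pattern displayed in Eq. 4.21.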
• An explicit solution for $f_h = F$
♥
We can obtain an explicit expression for the solution for a particularly simple case, namely when all the forces have the same value: $f_h = F$. This represents, for instance, the case of n same–weight particles aligned along a vertical line and subject only to gravity. [For the case in which the forces have arbitrary values, the explicit expression for the solution is presented in Section 4.7.] I claim that, in this case, the solution is given by
$$u_h = \frac{F}{2\kappa}\,\big[(n+1)\,h - h^2\big]. \qquad (4.22)$$
Indeed, the above expression satisfies all the equations. Specifically, it satisfies Eq. 4.11 with fh = F (h = 1, . . . , n), as well as u0 = un+1 = 0, as you may verify. [Hint: Just, substitute Eq. 4.22 into κ(−uh+1 + 2uh − uh−1 ), and simplify. The result equals F , in agreement with Eq. 4.11 (with h = 1, . . . , n, and fh = F ). Also, Eq. 4.22 yields u0 = 0 and un+1 = 0.] As shown above, the solution is unique. Hence, this is the solution.
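The verification suggested in the hint can also be carried out numerically. The following is a minimal sketch, assuming the arbitrary integer values n = 7, F = 3 and κ = 5 (chosen only for illustration); `u_uniform` is a hypothetical name for the right side of Eq. 4.22.

```python
from fractions import Fraction


def u_uniform(h, n, F, kappa):
    """Eq. 4.22: displacement of particle h when every applied force equals F."""
    return Fraction(F, 2 * kappa) * ((n + 1) * h - h * h)


n, F, kappa = 7, 3, 5  # illustrative integer values (assumptions)

# Residual of Eq. 4.11 for h = 1, ..., n; the end conditions u_0 = u_{n+1} = 0
# are supplied by the formula itself.
residuals = [kappa * (-u_uniform(h - 1, n, F, kappa)
                      + 2 * u_uniform(h, n, F, kappa)
                      - u_uniform(h + 1, n, F, kappa)) - F
             for h in range(1, n + 1)]
print(residuals)
```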
• Alternate proof of existence and uniqueness
♠
In analogy with the approach used in Eq. 4.18 for the 3 × 3 case, here we present an alternate proof that the solution of the problem in the preceding subsubsection exists and is unique. Specifically, we show that the matrix Q is positive definite, and hence the solution exists and is unique (Corollary 1, p. 124, on the solvability of systems with real positive definite matrices). Indeed, we have
$$\begin{aligned} \sum_{k=1}^{n+1} (u_k - u_{k-1})^2 &= \sum_{k=1}^{n+1} u_k\,(u_k - u_{k-1}) - \sum_{k=1}^{n+1} u_{k-1}\,(u_k - u_{k-1}) \\ &= \sum_{k=1}^{n+1} u_k\,(u_k - u_{k-1}) - \sum_{j=0}^{n} u_j\,(u_{j+1} - u_j) \\ &= \sum_{k=1}^{n} u_k\,(-u_{k+1} + 2u_k - u_{k-1}). \end{aligned} \qquad (4.23)$$
[Hint: For the second equality, set k = j + 1 in the second sum (Eq. 3.24). For the third equality, simply replace j with k (which is legitimate because the dummy indices of summation may be renamed arbitrarily, Eq. 3.23), and use u0 = un+1 = 0 (Remark 43, p. 155).] The left side of Eq. 4.23 is positive, unless all the displacements vanish (recall again that u0 = un+1 = 0), whereas the right side coincides with uT Q u. Thus, we have that uT Q u > 0 (unless u = 0), and hence Q is real positive definite, by definition (Eq. 3.120).
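As a sanity check of Eq. 4.23, the two sides may be compared for an arbitrary set of displacements. The sketch below is only illustrative: the function names are hypothetical, and the random integer displacements (with a fixed seed) are an assumption made so that the comparison is exact.

```python
import random


def quadratic_form(u):
    """u^T Q u for the matrix Q of Eq. 4.20, written out term by term."""
    n = len(u)
    padded = [0] + list(u) + [0]                    # u_0 = u_{n+1} = 0
    return sum(padded[k] * (-padded[k - 1] + 2 * padded[k] - padded[k + 1])
               for k in range(1, n + 1))


def sum_of_squared_stretches(u):
    """Left side of Eq. 4.23: sum over k = 1, ..., n+1 of (u_k - u_{k-1})^2."""
    padded = [0] + list(u) + [0]
    return sum((padded[k] - padded[k - 1]) ** 2 for k in range(1, len(padded)))


random.seed(0)
u = [random.randint(-9, 9) for _ in range(8)]       # arbitrary displacements
print(quadratic_form(u), sum_of_squared_stretches(u))
```

Since the left side is a sum of squares, the check also makes it plain why the quadratic form cannot be negative.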
4.5 Particle chains anchored at one endpoint Here, we assume that the chain is anchored only at one endpoint. Specifically, consider the case in which the last particle is not anchored. Then, we have that the last particle is connected only to the particle n − 1. Hence, instead of Eq. 4.11, we obtain that the corresponding equation is given by κ (−un−1 + un ) = fn .
(4.24)
The complete set of equations may still be written as K u = f (Eq. 4.14), where $u = \{u_j\}$ and $f = \{f_h\}$, whereas K is an n × n matrix still given by K = κ Q (Eq. 4.19), where now
$$Q = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{bmatrix}. \qquad (4.25)$$
Note that Q is again symmetric and that the matrices in Eqs. 4.20 and 4.25 differ only because of the element $k_{nn}$.
• Existence and uniqueness of the solution
Once again, the solution exists and is unique because the equations are linearly independent, as you may verify, following the same approach used in the preceding subsection. [Alternatively, you might like to show that Q is again positive definite. Hint: Use Eq. 4.23 suitably modified.]
◦ An explicit solution for $f_h = F$
♥
Again, we can obtain an explicit expression for the solution for a particularly simple problem, namely for the case in which all the forces have the same value: $f_h = F$. I claim that, in this case, the solution is given by
$$u_h = \frac{F}{2\kappa}\,\big[(2n+1)\,h - h^2\big] = \frac{F}{2\kappa}\,h\,(2n+1-h). \qquad (4.26)$$
Indeed, the above expression satisfies all the equations for h = 1, . . . , n − 1, as you may verify. Note that now the procedure proposed in Remark 43, p. 155 (namely introducing non–moving particles at $x_0$ and $x_{n+1}$), applies only to the particle located at $x_0$. [For, for h = 0, Eq. 4.26 yields $u_0 = 0$.] On the other end, we have to verify also that $\kappa\,(-u_{n-1} + u_n) = F$ (Eq. 4.24 with $f_n = F$) is satisfied. To this end, using $u_h = \tfrac{1}{2} F\,h\,(2n+1-h)/\kappa$, we obtain $u_{n-1} = \tfrac{1}{2} F\,(n-1)(n+2)/\kappa$ and $u_n = \tfrac{1}{2} F\,n\,(n+1)/\kappa$. This yields
$$\kappa\,(-u_{n-1} + u_n) = -\tfrac{1}{2}\,F\,(n-1)\,(n+2) + \tfrac{1}{2}\,F\,n\,(n+1) = F, \qquad (4.27)$$
in agreement with Eq. 4.24 with fn = F . In view of the fact that the solution is unique, Eq. 4.26 is the solution.
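Here is a short numerical counterpart of the above verification, offered only as a sketch: it checks Eq. 4.26 against the interior equations (Eq. 4.11, with $u_0 = 0$ supplied by the formula) and against the equation for the free end (Eq. 4.24). The values n = 6, F = 2 and κ = 3 are arbitrary assumptions, and `u_one_end` is a hypothetical name.

```python
from fractions import Fraction


def u_one_end(h, n, F, kappa):
    """Eq. 4.26: chain anchored only at its first end, all forces equal to F."""
    return Fraction(F, 2 * kappa) * h * (2 * n + 1 - h)


n, F, kappa = 6, 2, 3  # illustrative integer values (assumptions)

# Left sides of Eq. 4.11 for h = 1, ..., n - 1
interior = [kappa * (-u_one_end(h - 1, n, F, kappa)
                     + 2 * u_one_end(h, n, F, kappa)
                     - u_one_end(h + 1, n, F, kappa))
            for h in range(1, n)]

# Left side of Eq. 4.24 for the last (unanchored) particle
last = kappa * (-u_one_end(n - 1, n, F, kappa) + u_one_end(n, n, F, kappa))
print(interior, last)
```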
4.6 Unanchored spring–particle chains
Much more interesting for the analysis of the interplay between mathematics and mechanics is the problem in which none of the springs are anchored, as in Fig. 4.5, p. 153. For this case, Eq. 4.11 (h = 2, . . . , n − 1) and Eq. 4.24 (h = n) still apply, whereas for the first particle we have (in analogy with Eq. 4.24)
$$\kappa\,(u_1 - u_2) = f_1. \qquad (4.28)$$
This set of equations may still be expressed as K u = f (Eq. 4.14), where as usual $u = \{u_j\}$ and $f = \{f_h\}$, whereas now the n × n matrix K is given by
$$K = \kappa \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 2 & -1 & 0 \\ 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{bmatrix}. \qquad (4.29)$$
It should be noted that now the rows of K are linearly dependent. For, adding the rows of Eq. 4.29, one obtains
$$\sum_{h=1}^{n} k_{hj} = 0, \qquad (4.30)$$
as you may verify. The left side of Eq. 4.30 is a linear combination of the rows of the matrix in Eq. 4.29, with $c_h = 1$ for all h, and Eq. 4.30 holds for all j. Hence, on the basis of the Fredholm alternative (Theorem 29, p. 122), we have that the solution does not exist, unless the same linear combination of the right sides of the equations also vanishes, namely unless
$$\sum_{h=1}^{n} f_h = 0. \qquad (4.31)$$
On the other hand, if Eq. 4.31 is satisfied, then the solution exists, but is not unique (Theorem 26, p. 120). Specifically, if $u_h$ is a solution, we have that $u_h + C$ (where C is an arbitrary constant) is also a solution. For, the matrix K is symmetric, and hence, using Eq. 4.30, the sum of the columns also vanishes: $\sum_{j=1}^{n} k_{hj} = 0$. Thus, we have that if all the particle displacements are equal (namely if, for any j, we have $u_j = C$, where C is an arbitrary constant), all the equations of the associated homogeneous system are satisfied. For,
$$\sum_{j=1}^{n} k_{hj}\,u_j = C \sum_{j=1}^{n} k_{hj} = 0. \qquad (4.32)$$
Thus, uj = C is a nontrivial solution of the associated homogeneous system, which may always be added to the solution (Theorem 36, p. 130). ◦ Comment. The physical interpretation of Eq. 4.31 is that the solution does not exist unless the sum of all the forces vanishes. On the other hand, if the chain is not anchored, the solution, if it exists, is not unique. For, the particles may be translated by a constant amount without affecting the equilibrium, since this causes no spring force to arise.
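The two facts just discussed (no solution unless the forces sum to zero; non–uniqueness up to a rigid translation) can be illustrated with a small Python sketch. It is not taken from the text: the helper `solve_free_chain` is hypothetical and rests on the observation that adding the first h equations of the system gives $\kappa\,(u_h - u_{h+1}) = f_1 + \dots + f_h$, so that the displacements follow one after the other once $u_1$ is chosen. The values κ = 2 and f = (1, −3, 2) are arbitrary assumptions.

```python
from fractions import Fraction


def free_chain_matrix(n, kappa):
    """Stiffness matrix of Eq. 4.29, assembled one spring at a time."""
    K = [[0] * n for _ in range(n)]
    for i in range(n - 1):            # spring between particles i and i + 1 (0-based)
        K[i][i] += kappa
        K[i + 1][i + 1] += kappa
        K[i][i + 1] -= kappa
        K[i + 1][i] -= kappa
    return K


def solve_free_chain(f, kappa, u1=0):
    """One equilibrium solution with u_1 prescribed; requires sum(f) == 0 (Eq. 4.31)."""
    if sum(f) != 0:
        raise ValueError("no equilibrium: the forces do not sum to zero")
    u, running = [Fraction(u1)], 0
    for fh in f[:-1]:
        running += fh                 # f_1 + ... + f_h
        u.append(u[-1] - Fraction(running, kappa))
    return u


def apply(K, u):
    """Matrix-vector product K u."""
    return [sum(k * x for k, x in zip(row, u)) for row in K]


kappa, f = 2, [1, -3, 2]              # illustrative integer values (assumptions)
K = free_chain_matrix(len(f), kappa)
u = solve_free_chain(f, kappa)
shifted = [x + 5 for x in u]          # rigid translation by C = 5
print(apply(K, u), apply(K, shifted))
```

Both `u` and `shifted` reproduce the same forces, which is the numerical face of the arbitrary constant C.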
4.7 Appendix A. Influence matrix
♥
In this section, we introduce the influence matrix. For the sake of simplicity, the presentation is limited to the case of chains of n particles anchored at both endpoints, addressed in Subsection 4.4.2. Specifically, consider the problem K u = f (Eq. 4.14), where K = κ Q, with Q given by Eq. 4.20. Assume that we apply a single force to only one particle, say, to the particle k. Thus, we have $f_h = 1$ for h = k, and $f_h = 0$ otherwise, namely
$$f_h = \delta_{hk}, \qquad (4.33)$$
where $\delta_{hk}$ is the Kronecker delta (Eq. 3.29). Hence, combining Eqs. 4.11 and 4.33, the governing equations are
$$\kappa\,\big(-u_{h-1}^{(k)} + 2\,u_h^{(k)} - u_{h+1}^{(k)}\big) = \delta_{hk} \qquad (h = 1, \dots, n), \qquad (4.34)$$
with (use Remark 43, p. 155)
$$u_0^{(k)} = u_{n+1}^{(k)} = 0. \qquad (4.35)$$
I claim that the solution is given by
$$u_h^{(k)} = \begin{cases} \dfrac{1}{\kappa}\,\dfrac{n+1-k}{n+1}\,h, & \text{for } h \le k; \\[2ex] \dfrac{1}{\kappa}\,\dfrac{n+1-h}{n+1}\,k, & \text{for } h \ge k. \end{cases} \qquad (4.36)$$
[This is verified in the subsubsection that follows.] The solution is depicted in Fig. 4.6, for n = 15 and k = 8.
Fig. 4.6 Solution in Eq. 4.36
• Verifying Eq. 4.36
♠
Let us verify that Eq. 4.36 satisfies Eq. 4.34. For h = 1, . . . , k − 1 and h = k + 1, . . . , n, the solution varies linearly with respect to h (Eq. 4.36; see also Fig. 4.6). This fact implies that the value of $u_h^{(k)}$ is the semi–sum of the values at the two surrounding points, namely
$$u_h^{(k)} = \frac{1}{2}\,\big(u_{h-1}^{(k)} + u_{h+1}^{(k)}\big). \qquad (4.37)$$
This satisfies Eq. 4.34, since for the above values of h we have $\delta_{hk} = 0$. Next, let us consider the cases h = 0 and h = n + 1. The first in Eq. 4.36 yields $u_0^{(k)} = 0$, whereas the second yields $u_{n+1}^{(k)} = 0$, in agreement with Eq. 4.35. Thus, we are left with verifying the case h = k. Substituting Eq. 4.36 into the left side of Eq. 4.34, with h = k, we have
$$\begin{aligned} \kappa\,\big(-u_{k-1}^{(k)} + 2\,u_k^{(k)} - u_{k+1}^{(k)}\big) &= \frac{1}{n+1}\,\big[-(n+1-k)\,(k-1) + 2\,(n+1-k)\,k - (n+1-k-1)\,k\big] \\ &= \frac{1}{n+1}\,\Big\{(n+1)\,\big[-(k-1) + 2k - k\big] + k\,\big[(k-1) - 2k + (k+1)\big]\Big\} \\ &= 1, \end{aligned} \qquad (4.38)$$
in agreement with Eq. 4.34. In view of the fact that the solution is unique (Subsection 4.4.2), Eq. 4.36 is the solution.
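The same verification can be performed by brute force for every pair (h, k). The sketch below is only illustrative; it assumes n = 15 (as in Fig. 4.6) and the arbitrary value κ = 4, and `influence` is a hypothetical name for the right side of Eq. 4.36, whose two branches are merged by using the smaller and the larger of h and k.

```python
from fractions import Fraction


def influence(h, k, n, kappa):
    """Eq. 4.36: displacement of particle h due to a unit force on particle k."""
    lo, hi = min(h, k), max(h, k)
    return Fraction(lo * (n + 1 - hi), kappa * (n + 1))


def left_side(h, k, n, kappa):
    """Left side of Eq. 4.34 evaluated with the claimed solution."""
    return kappa * (-influence(h - 1, k, n, kappa)
                    + 2 * influence(h, k, n, kappa)
                    - influence(h + 1, k, n, kappa))


n, kappa = 15, 4  # n as in Fig. 4.6; kappa is an arbitrary assumption
ok = all(left_side(h, k, n, kappa) == (1 if h == k else 0)
         for h in range(1, n + 1) for k in range(1, n + 1))
print(ok)
```

Note that the merged expression is symmetric in h and k, so that the displacement of particle h due to a unit force on particle k equals that of particle k due to a unit force on particle h.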
• What can we do with it?
♥
Here comes the interesting point! Why is this material relevant? What can we do with it? To answer this question, note that we can always write
$$f_h = \sum_{k=1}^{n} \delta_{hk}\,f_k. \qquad (4.39)$$
[Hint: Use the Kronecker delta rule $\sum_{k=1}^{n} \delta_{hk}\,a_k = a_h$ (Eq. 3.30).] This allows us to obtain the general solution as a linear combination of the solutions given in Eq. 4.36, with k = 1, . . . , n. Specifically, using the superposition rule for nonhomogeneous linear algebraic systems in Theorem 34, p. 129, we have that the solution of K u = f (Eq. 4.14) is given by
$$u_h = \sum_{k=1}^{n} u_h^{(k)}\,f_k, \qquad (4.40)$$
or
$$u = G\,f, \qquad (4.41)$$
where the matrix $G = [g_{hk}] = [u_h^{(k)}]$ is known as the influence matrix.
• Maximum of $u_h^{(k)}$
♥ ♥
Here is a little exercise. For a given n, find the maximum value that $u_h^{(k)}$ in Eq. 4.36 may possibly attain. To obtain this, note that, for a given k, the maximum is attained for h = k. [Hint: The first expression in Eq. 4.36 grows with h, whereas the second decreases with h; see also Fig. 4.6.] Hence, for a given k, the maximum is given by (use Eq. 4.45)
$$u_k^{(k)} = \frac{k}{\kappa}\,\Big(1 - \frac{k}{n+1}\Big), \qquad (4.42)$$
or
$$u_k^{(k)} = \frac{n+1}{\kappa}\,\xi\,(1 - \xi) = \frac{n+1}{\kappa}\,\Big(\frac{1}{4} - \eta^2\Big), \qquad (4.43)$$
where $\xi := k/(n+1) =: \tfrac{1}{2} + \eta$. [The Greek letters ξ and η (xi and eta) are widely used as alternatives to x and y to denote abscissas.] Hence, the maximum occurs for η = 0. This does not necessarily correspond to the location of one of the particles, as η = 0 could fall between two particles. Here, to make our life simple, I assume that n is odd. Then, the maximum occurs for η = 0, which corresponds to $k = \tfrac{1}{2}\,(n+1)$. Setting η = 0 in Eq. 4.43, one obtains $\big[u_h^{(k)}\big]_{\rm Max} = \tfrac{1}{4}\,(n+1)/\kappa$. [What happens if n is even?]
• Illustrative example
♥
As an illustrative example, consider the case in which all the forces are equal, say $f_h = F$. This problem was addressed in Subsection 4.4.2, for which the solution is given in Eq. 4.22, namely
$$u_h = \frac{F}{2\kappa}\,\big[(n+1)\,h - h^2\big], \qquad (4.44)$$
which may be used for comparison. Consider Eq. 4.36, in its equivalent form:
$$u_h^{(k)} = \begin{cases} \dfrac{1}{\kappa}\,\Big(h - \dfrac{h\,k}{n+1}\Big), & \text{for } h \le k; \\[2ex] \dfrac{1}{\kappa}\,\Big(k - \dfrac{h\,k}{n+1}\Big), & \text{for } h \ge k. \end{cases} \qquad (4.45)$$
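Then, one obtains (use Eq. 4.40 with $f_k = F$)
$$u_h = F \sum_{k=1}^{n} u_h^{(k)} = \frac{F}{\kappa}\,\bigg[\sum_{k=h+1}^{n} h + \sum_{k=1}^{h} k - \frac{h}{n+1} \sum_{k=1}^{n} k\bigg], \qquad (4.46)$$
or, using $\sum_{k=1}^{q} k = \tfrac{1}{2}\,q\,(q+1)$ (Eq. 3.79),
$$u_h = \frac{F}{\kappa}\,\Big[h\,(n-h) + \frac{1}{2}\,h\,(h+1) - \frac{1}{2}\,h\,n\Big] = \frac{F}{2\kappa}\,\big[(n+1)\,h - h^2\big], \qquad (4.47)$$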
in agreement with Eq. 4.44.
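The superposition u = G f of Eq. 4.41 can be sketched in Python as follows. This is an illustration only: the names `influence_matrix` and `displacements` are hypothetical, and the values n = 9, κ = 2, F = 3, as well as the second (non–uniform) load, are arbitrary assumptions. The first load reproduces the comparison with Eq. 4.44 made above; the second shows that the same matrix G handles forces that are not all equal.

```python
from fractions import Fraction


def influence_matrix(n, kappa):
    """G = [u_h^(k)] from Eq. 4.36, for h, k = 1, ..., n."""
    return [[Fraction(min(h, k) * (n + 1 - max(h, k)), kappa * (n + 1))
             for k in range(1, n + 1)] for h in range(1, n + 1)]


def displacements(G, f):
    """Eq. 4.40: u_h = sum over k of u_h^(k) f_k."""
    return [sum(g * fk for g, fk in zip(row, f)) for row in G]


n, kappa, F = 9, 2, 3                       # illustrative integer values (assumptions)
G = influence_matrix(n, kappa)

# Uniform load: compare with the closed form of Eq. 4.44
u = displacements(G, [F] * n)
closed_form = [Fraction(F, 2 * kappa) * ((n + 1) * h - h * h) for h in range(1, n + 1)]

# Arbitrary load: the result must satisfy Eq. 4.11 with u_0 = u_{n+1} = 0
f = [1, 0, -2, 5, 0, 0, 3, -1, 4]
v = [0] + displacements(G, f) + [0]
recovered = [kappa * (-v[h - 1] + 2 * v[h] - v[h + 1]) for h in range(1, n + 1)]
print(u == closed_form, recovered == f)
```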
• Generality of the method
♣
The results presented here have broad validity. Consider the equation
$$\sum_{k=1}^{n} k_{jk}\,u_k = f_j \qquad (j = 1, \dots, n), \qquad (4.48)$$
namely K u = f, where K denotes any stiffness matrix. Assume K to be nonsingular. Consider the (unique) solutions to the following n problems of n linear algebraic equations with n unknowns, namely
$$\sum_{j=1}^{n} g_{hj}\,k_{jk} = \delta_{hk} \qquad (h, k = 1, \dots, n), \qquad (4.49)$$
where $\delta_{hk}$ is the Kronecker delta (Eq. 3.29). Then, I claim that the (unique) solution of K u = f is given by u = G f, with $G = [g_{hj}]$. Indeed, we have
$$\sum_{j=1}^{n} g_{hj}\,f_j = \sum_{j,k=1}^{n} g_{hj}\,k_{jk}\,u_k = \sum_{k=1}^{n} \delta_{hk}\,u_k = u_h \qquad (h = 1, \dots, n). \qquad (4.50)$$
[Hint: Use, in this order, Eqs. 4.48 and 4.49, as well as $\sum_{k=1}^{n} \delta_{hk}\,u_k = u_h$ (Eq. 3.30).]
◦ Comment. Using matrix notations, Eq. 4.49 reads
$$G\,K = I, \qquad (4.51)$$
where $I = [\delta_{hk}]$ (see Eq. 3.118) is such that I b = b (see below Eq. 3.118). Then, left–multiplying K u = f by G, one obtains
$$u = G\,f, \qquad (4.52)$$
in agreement with Eq. 4.50. [More on G K = I in Section 16.2, on the inverse matrix.]
4.8 Appendix B. General spring–particle systems
♠
Here, we consider a very general case (still one–dimensional though), namely one in which each of the n particles (still coaligned) is spring–connected to all the other particles, and, moreover, each particle may be anchored to the ground by an additional spring. [Of course, this includes any spring–particle chain as a particular case, including spring–particle chains with spring constants that are not all equal.] The force $F_{hj}$ (acting on the particle h, and due to the spring connecting it to the particle j) is given by
$$F_{hj} = -\kappa_{hj}\,(u_h - u_j) \qquad (j \ne h), \qquad (4.53)$$
where, of course,
$$\kappa_{hj} = \kappa_{jh} \qquad (j \ne h). \qquad (4.54)$$
[Accordingly, the one–dimensional Newton third law of action and reaction (Eq. 4.6) is satisfied.] In addition, the particle h is subject to the force $F_h$ due to the spring anchoring each particle to the ground, namely
$$F_h = -\kappa_h\,u_h, \qquad (4.55)$$
as well as any additional external force $f_h$ that is not due to a spring, such as its own weight (if the chain is vertical), or a force due to the fact that you are pushing it. Then, the equilibrium of the particle h is given by (use Eq. 4.5)
$$\sum_{j=1}^{n} F_{hj} + F_h + f_h = -\sum_{j=1}^{n} \kappa_{hj}\,(u_h - u_j) - \kappa_h\,u_h + f_h = 0, \qquad (4.56)$$
where, for the sake of notational simplicity, we have set
$$F_{hh} := 0 \qquad \text{and} \qquad \kappa_{hh} := 0, \qquad (4.57)$$
and included the corresponding term in the appropriate sum in the above equation. The system in Eq. 4.56 may still be expressed as K u = f (as in Eq. 4.14), where now $K = [k_{hj}]$, with
$$k_{hj} = \begin{cases} \kappa_h + \displaystyle\sum_{j=1}^{n} \kappa_{hj} & (h = j); \\[2ex] -\kappa_{hj} & (h \ne j). \end{cases} \qquad (4.58)$$
4.8.1 The matrix K is positive definite/semidefinite
♥
Here, we want to extend to the matrix in Eq. 4.58 the results of Subsubsection “Alternate proof of existence and uniqueness” (p. 159), namely the fact that the stiffness matrix for our problem is positive definite, or positive semidefinite. Indeed, we have
$$\begin{aligned} u^T K\,u = \sum_{h,j=1}^{n} u_h\,k_{hj}\,u_j &= \sum_{h=1}^{n} \Big(\kappa_h + \sum_{j=1}^{n} \kappa_{hj}\Big)\,u_h^2 - \sum_{h,j=1}^{n} u_h\,\kappa_{hj}\,u_j \\ &= \sum_{h=1}^{n} \kappa_h\,u_h^2 + \sum_{h,j=1}^{n} u_h\,\kappa_{hj}\,(u_h - u_j). \end{aligned} \qquad (4.59)$$
[Hint: For the second equality, use the definition for khj (Eq. 4.58), and the fact that κhh = 0 (Eq. 4.57).] Next, we want to obtain a convenient expression for the last term in the above equation. To this end, note that
$$\sum_{h,j=1}^{n} u_j\,\kappa_{hj}\,(u_h - u_j) = \sum_{h,j=1}^{n} u_h\,\kappa_{jh}\,(u_j - u_h) = -\sum_{h,j=1}^{n} u_h\,\kappa_{hj}\,(u_h - u_j). \qquad (4.60)$$
[Hint: The first equality is obtained by interchanging h and j, which is legitimate because the dummy indices of summation may be renamed arbitrarily (Eq. 3.23). The second one is obtained by recalling that $\kappa_{hj} = \kappa_{jh}$ (Eq. 4.54).] Accordingly, we have
$$\begin{aligned} \sum_{h,j=1}^{n} \kappa_{hj}\,(u_h - u_j)^2 &= \sum_{h,j=1}^{n} u_h\,\kappa_{hj}\,(u_h - u_j) - \sum_{h,j=1}^{n} u_j\,\kappa_{hj}\,(u_h - u_j) \\ &= 2 \sum_{h,j=1}^{n} u_h\,\kappa_{hj}\,(u_h - u_j), \end{aligned} \qquad (4.61)$$
which gives us the desired expression for the last term in Eq. 4.59. Finally, combining Eqs. 4.59 and 4.61, we have
$$u^T K\,u = \sum_{h=1}^{n} \kappa_h\,u_h^2 + \frac{1}{2} \sum_{h,j=1}^{n} \kappa_{hj}\,(u_h - u_j)^2. \qquad (4.62)$$
[The factor $\tfrac{1}{2}$ in front of the last (double) summation symbol corresponds to the fact that each spring is included twice.] Here, we have two possibilities. The first is that at least one of the particles is anchored, namely that at least one of the coefficients $\kappa_h$ differs from zero. Then, the matrix K is positive definite, by definition (Definition 70, p. 123), with an exception addressed in the subsubsection that follows. Thus, by Corollary 1, p. 124, for the case under consideration the solution of K u = f exists and is unique. The second possibility is that all the coefficients $\kappa_h$ vanish. Then, the matrix K is positive semidefinite, since in this case $u^T K\,u = 0$, for $u_j = C$, where C is an arbitrary constant. Indeed, in this case, the associated homogeneous problem has a nontrivial solution $u_j = C$. [You might want to revisit the problem addressed in Section 4.6, which is a particular case of that addressed here.]
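To make Eqs. 4.58 and 4.62 more tangible, here is a minimal Python sketch that assembles K for a small system and compares the two sides of Eq. 4.62. All names (`assemble`, `quadratic_form`, `energy_form`) are hypothetical, and the spring constants and displacements are arbitrary integer values assumed only for illustration; the table of $\kappa_{hj}$ is taken symmetric with zero diagonal, as required by Eqs. 4.54 and 4.57.

```python
from fractions import Fraction


def assemble(kappa_pair, kappa_ground):
    """Eq. 4.58: diagonal = kappa_h + sum_j kappa_hj; off-diagonal = -kappa_hj."""
    n = len(kappa_ground)
    K = [[-kappa_pair[h][j] for j in range(n)] for h in range(n)]
    for h in range(n):
        K[h][h] = kappa_ground[h] + sum(kappa_pair[h])   # uses kappa_hh = 0
    return K


def quadratic_form(K, u):
    """u^T K u."""
    n = len(u)
    return sum(u[h] * K[h][j] * u[j] for h in range(n) for j in range(n))


def energy_form(kappa_pair, kappa_ground, u):
    """Right side of Eq. 4.62."""
    n = len(u)
    ground = sum(kappa_ground[h] * u[h] ** 2 for h in range(n))
    pairs = sum(kappa_pair[h][j] * (u[h] - u[j]) ** 2
                for h in range(n) for j in range(n))
    return ground + Fraction(1, 2) * pairs


kappa_pair = [[0, 3, 1],
              [3, 0, 2],
              [1, 2, 0]]          # symmetric, zero diagonal (assumed values)
kappa_ground = [4, 0, 0]          # only the first particle is anchored
u = [1, -2, 5]

K = assemble(kappa_pair, kappa_ground)
print(quadratic_form(K, u), energy_form(kappa_pair, kappa_ground, u))

# With no anchoring springs, a rigid translation costs nothing
K_free = assemble(kappa_pair, [0, 0, 0])
print(quadratic_form(K_free, [7, 7, 7]))
```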
• Picky as a good mathematician should be
♥
The situation is actually more subtle than that stated in the above subsection. Indeed, I have tacitly excluded the possibility that the particles can be divided into two or more groups that are not connected to each other. Under such a tacit assumption, the condition $\sum_{h,j=1}^{n} \kappa_{hj}\,(u_h - u_j)^2 = 0$ implies $u_h = C$ for all h, where C is an arbitrary constant. The above tacit assumption is not necessarily valid in all cases, although it is valid in all the cases of interest to a physicist, but as mathematicians, you know, we have to be careful. To see when the assumption fails from a physical point of view, let us assume that the particles may indeed be divided into two groups and that none of the particles of the first group is spring–connected to any of the particles of the second group (a case of no interest in this book). Then, if only one of the two groups is anchored to the ground, the other one is free to move. In order to state this fact from a mathematical point of view, let us number the particles so that all the particles of the first group come before those of the second group. Then, using the partitioned–matrix notation of Section 3.7 (see in particular, Eq. 3.148), the fact that there are no springs connecting particles of the first group to particles of the second group is expressed by
$$K = \begin{bmatrix} K_{11} & O \\ O & K_{22} \end{bmatrix}, \qquad (4.63)$$
where O is the zero matrix (Definition 57, p. 101). In such a case, the two groups decouple and the conditions discussed above must be applied to each group separately. [Similar reasoning applies if we have more than two groups.]
4.9 Appendix C. International System of Units (SI)
♣
In this appendix, I briefly review the measurement units of interest to us. Here, you will encounter terms that have not yet been defined. However, it seems preferable to group together all the material in a single section, rather than presenting it piecemeal, whenever the corresponding physical quantity is being introduced. Accordingly, I suggest that you postpone reading portions of this section for the time being and/or pay not much attention to the fact that you do not know some of the terms. Throughout the book, we use meters, kilometers, feet and miles to measure distances, as well as seconds, minutes, hours, days and years to measure time, along with grams, kilograms and pounds to measure mass, a physical quantity closely related to weight, which will be introduced in Eq. 11.11. [The relationship between mass and weight is addressed in Subsection 20.8.1.] Accordingly, here we take a quick look at the International System of Units, limited to the units of interest in mechanics and temperature. As you might already know, the International System of Units, abbreviated SI from
French Le Système International (d’unités), is the world’s most widely used system of measurement units.⁸ The basic mechanical units of interest to us are the meter, the kilogram and the second. The corresponding system is also known as the MKS (meter, kilogram, second) system, and is the basis of the International System of Units. In addition, we have the kelvin for the absolute temperature. This is the system used in this book. A brief description of the System is presented below. [Additional units are the ampere (electric current), the mole (amount of substance) and the candela (luminous intensity). They are of little interest here. For details, see Ref. [64].]
◦ Meter.⁹ The meter was originally defined as one ten–millionth of the distance from the Earth’s equator to the North Pole, at sea level. Its definition has been modified over the years, in order to be more and more accurate. Since 1983 (17th Conférence Générale des Poids et Mesures), it has been defined as “the length of the path traveled by light in vacuum during a time interval of 1/299,792,458 seconds.”
◦ Kilogram. The kilogram is the unit of mass. The kilogram was originally defined in 1795 as the mass of one liter (that is, 1/1,000 cubic meters) of water, at water’s maximum density (mass per unit volume), namely at 4 degrees Celsius. In 1799, a prototype was chosen to be the official mass of a kilogram. [With this definition, the mass of one liter of water at 4 degrees Celsius is not exactly 1 kg, but 0.999 972 0 kg.] A new prototype of the kilogram, an artifact made of platinum–iridium, was manufactured in 1889. However, more accurate measurements have shown that, in reality, this prototype has a mass equal to the mass of 1.000 025 liters of water, at 4 degrees Celsius. On Friday, November 16, 2018, a new definition of the kilogram, in terms of the Planck constant h, was voted unanimously by representatives from 58 countries. [Three other standard units of measure were also redefined at that time: (i) the ampere, for electrical current, (ii) the kelvin, for temperature, and (iii) the mole, which describes the amount of a chemical substance. The four new definitions (kilogram, ampere, kelvin and mole) officially took effect on May 20, 2019.]
◦ Second. The second is the unit of time. There are 60 seconds in a minute, 60 minutes in an hour and 24 hours in a mean solar day. Accordingly, one has to know exactly the duration of the mean solar day. Too vague a definition! Nowadays, this is totally inadequate for the level of accuracy desired! Thus, the second is now defined very accurately in atomic terms (namely the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium–133 atom).
⁸ The International System of Units is the modern version of the Metric System, which was first introduced in France in the 1790s, during the French Revolution.
⁹ The term meter comes from ancient Greek: μέτρον (metron, measure).
Remark 44. As you might know, one decimeter corresponds to one tenth of a meter, one centimeter to one hundredth of a meter, one millimeter to one thousandth of a meter. [The roots are Latin: deci comes from decem, Latin for ten, centi from centum, Latin for one hundred, milli from mille, Latin for one thousand.] On the other side of the scale, we have decameters, namely ten meters, hectometers, namely one hundred meters, and kilometers, namely one thousand meters. [If it sounds Greek to you, it is because all these terms come from ancient Greek: deca for ten, hecto for one hundred, kilo for one thousand.] You might also know that micro, nano and pico stand respectively for one millionth, one billionth and one trillionth. [For instance, one micrometer (also known as micron) equals one millionth of a meter, one nanogram equals one billionth of a gram, and one picosecond equals one trillionth of a second. Additional prefixes are: femto, atto, zepto, yocto (also in multiples of 0.001).] On the other hand, mega, giga and tera stand respectively for one million, one billion and one trillion. [For instance, one megahertz equals one million hertz,¹⁰ one gigaflop equals one billion FLOPS (floating–point operations per second), one terabyte corresponds to one trillion bytes. Additional prefixes are: peta, exa, zetta and yotta (also in multiples of 1,000).]
• Other mechanical units Other mechanical units are defined in terms of these three basic units. For instance, the units of areas and volumes are the squares and cubes of the unit of length. Similarly, the speed (namely the distance covered per unit time) is measured in m/s, whereas the acceleration (specifically, the change of speed per unit time) is measured in m/s2 . Moreover, we have • The force (mass times acceleration) is measured in newtons (symbol N, named after Isaac Newton, of course): 1 N =1 Kg m s−2 .
(4.64)
• The pressure (force per unit area) is measured in pascals (symbol Pa):11
$1\,\mathrm{Pa} = 1\,\mathrm{kg\,m^{-1}\,s^{-2}}.$ (4.65)
Sometimes, we use the bar or the millibar (1 mbar = 0.001 bar = 100 Pa), or the atmosphere (symbol atm), namely the standard pressure of the air at sea level:
$1\,\mathrm{atm} = 101\,325\,\mathrm{Pa} = 1.013\,25\,\mathrm{bar},$ (4.66)
namely 1 bar = 0.986 9 atm.
• The energy (force times displacement, like work) is measured in joules (symbol J):12
$1\,\mathrm{J} = 1\,\mathrm{kg\,m^{2}\,s^{-2}}.$ (4.67)
• The power (energy per unit time) is measured in watts (symbol W):13
$1\,\mathrm{W} = 1\,\mathrm{kg\,m^{2}\,s^{-3}}.$ (4.68)
◦ Comment. There are also some widely used units that, strictly speaking, are not SI units. For instance, a liter is one thousandth of a cubic meter. On the other hand, a milliliter (symbol ml) is one thousandth of a liter. Thus, a milliliter is one millionth of a cubic meter, namely a cubic centimeter (symbol: cm³). The non–SI symbol cc is often used. Thus,
$1\,\mathrm{ml} = 1\,\mathrm{cc}.$ (4.69)
Also, one hectare equals 10,000 square meters. One acre is about 0.405 hectares.
◦ Warning. The physicists' use of the term kilogram to indicate mass is contrary to the everyday usage of the term, namely to indicate weight. [In the International System of Units, the kilogram is the unit to measure mass. Weight is a force and is measured in newtons.] You could go to a bakery and ask “I would like a loaf of bread that weighs one kilogram.” The baker would understand perfectly well what you want. However, if you were talking to a scientific audience, the correct expression would be: “I bought a loaf of bread with a mass of one kilogram,” or “I bought a loaf of bread that weighs about 9.806 newtons.” [Sometimes we distinguish between kilogram–mass and kilogram–weight (namely 9.806 newtons). The same situation occurs with pounds: we have pound–mass and pound–weight.]
10 One hertz (symbol Hz) equals one cycle per second. It is named after the German physicist Heinrich Rudolf Hertz (1857–94).
11 Named after the French physicist, engineer, mathematician, and philosopher Blaise Pascal (1623–1662). As a physicist, he contributed to fluid dynamics, in particular to the notion of pressure. As an engineer, he built the earliest practical mechanical calculator (known as the Pascaline). As a mathematician, he introduced important contributions in projective geometry and probability theory. The Pascal triangle (a list of the binomial coefficients, Eq. 13.35) is named after him. His philosophical book Pensées is widely considered a masterpiece of French literature.
12 Named after the English physicist James Prescott Joule (1818–1889), who, among other things, discovered the relationship between mechanical energy and heat, crucial in the formulation of the conservation of energy. [Incidentally, we typically use different units to measure heat (calories) and energy (joules), with 1 joule = 0.239 005 74 calories.]
13 Named after the Scottish engineer James Watt (1736–1819), who, among other things, introduced the first practical steam engine, through major improvements in its design.
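To make the distinction between mass and weight concrete, and to check the pressure conversions quoted above, here is a minimal sketch. The function names are illustrative assumptions; the value 9.806 m/s² is the rounded gravitational acceleration implied by the text, and the pressure factors use 1 atm = 101 325 Pa and 1 bar = 100 000 Pa.

```python
G_ROUNDED = 9.806        # m/s^2, rounded value implied by "9.806 newtons" per kilogram
PA_PER_BAR = 100_000.0   # 1 mbar = 100 Pa, hence 1 bar = 10^5 Pa
PA_PER_ATM = 101_325.0   # standard atmosphere in pascals


def weight_in_newtons(mass_kg, g=G_ROUNDED):
    """Weight (a force, in newtons) of a body with the given mass (in kilograms)."""
    return mass_kg * g


def atm_to_bar(p_atm):
    """Convert a pressure from atmospheres to bar, going through pascals."""
    return p_atm * PA_PER_ATM / PA_PER_BAR


def bar_to_atm(p_bar):
    """Convert a pressure from bar to atmospheres, going through pascals."""
    return p_bar * PA_PER_BAR / PA_PER_ATM


print(weight_in_newtons(1.0))  # weight of the one-kilogram loaf, in newtons
print(atm_to_bar(1.0))         # one atmosphere expressed in bar
print(bar_to_atm(1.0))         # one bar expressed in atmospheres
```

Routing every conversion through the SI unit (here the pascal) means that each non-SI unit needs only one factor, rather than one factor per pair of units.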
• Temperature
♥
In addition to the mechanical units, when we deal with heat conduction, or the equation of state for ideal gases, or the second principle of thermodynamics, we formulate the problem in terms of the temperature ϑ. [In mechanics, temperature is encountered in the theory of the so–called thermal stresses, related to the expansion typically connected with increases in temperature.] In everyday life, the temperature is officially measured in degrees Fahrenheit (symbol °F) in the United States, or in degrees Celsius (symbol °C) pretty much in the rest of the world (Liberia, the Bahamas and the Cayman Islands being the other most notable exceptions). Roughly speaking, in the Celsius scale, 0 °C is the freezing point of water under the standard pressure of the air at sea level (one atmosphere), whereas 100 °C is the temperature of boiling water, also under the pressure of one atmosphere. On the other hand, in the Fahrenheit scale, the water freezes at 32 °F and boils at 212 °F.14 Accordingly, the Fahrenheit temperature scale, ϑF, is related to the Celsius temperature scale, ϑC, by the formula
$\vartheta_F = 1.8\,\vartheta_C + 32.$ (4.70)
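Equation 4.70 and its inverse can be checked against the freezing and boiling points quoted above. This is a minimal sketch; the function names are illustrative assumptions.

```python
def celsius_to_fahrenheit(theta_c):
    """Eq. 4.70: Fahrenheit temperature from Celsius temperature."""
    return 1.8 * theta_c + 32.0


def fahrenheit_to_celsius(theta_f):
    """Inverse of Eq. 4.70, obtained by solving for the Celsius temperature."""
    return (theta_f - 32.0) / 1.8


for theta_c in (0.0, 100.0, -40.0):
    print(theta_c, "C corresponds to", celsius_to_fahrenheit(theta_c), "F")
```

The factor 1.8 is the ratio between the 180 Fahrenheit degrees and the 100 Celsius degrees that separate freezing from boiling; the two scales cross at −40.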
However, in the scientific literature, we use the absolute temperature. For instance, the equation of state for ideal gases is given by p/ρ = R ϑ, where p is the pressure, ρ is the density and R is a constant. In this case, ϑ denotes the absolute temperature. Roughly speaking, the absolute temperature is related to the Celsius temperature scale by ϑ = ϑC + 273.15. In the International System of Units, the absolute temperature is measured in kelvins (symbol K, and not °K), named after Lord Kelvin.15 [The absolute zero (0 K, or −273.15 °C) is when any motion ceases to exist. Accordingly, we always have ϑ ≥ 0.]
If we want to be precise, in 1954 the SI governing body gave the Kelvin scale its modern definition by designating its first defining point to be the absolute zero, 0 K, whereas its second defining point is the triple point of water, which is assigned a temperature exactly equal to 273.160 0 kelvins. [The triple point is defined as the combination of pressure and temperature at which liquid water, solid ice, and water vapor can coexist in a stable equilibrium. This occurs, by definition, at exactly 273.160 0 K (0.010 0 °C; 32.018 0 °F), with a partial vapor pressure of 611.657 pascals (6.116 57 mbar; 0.006 036 59 atm).] With the 2018 resolution (when, as stated above, the kilogram was redefined in terms of the Planck constant), the kelvin was redefined in terms of a fixed value for the Boltzmann constant of 1.380 649 × 10⁻²³ J/K.
• The magic of .9
The following might help you (it helps me!) in remembering how to go from the International to the American System of Units. We have16
1 yard ≃ .9 m.
1 quart (one fourth of a gallon) ≃ .9 liters.
2 lb (mass) ≃ .9 kg (mass).
half a degree Celsius ≃ .9 degrees Fahrenheit.
half a nautical mile ≃ .9 km (but half a mile is only .8047 km).
2 acres ≃ .81 hectares = .9² hectares (after all, they measure “squares” of lengths).
14 Anders Celsius (1701–1744) was a Swedish astronomer, physicist and mathematician, whereas Daniel Gabriel Fahrenheit (1686–1736) was a German physicist.
15 William Thomson, first Baron Kelvin (1824–1907), often referred to as Lord Kelvin, was a Scots–Irish mathematical physicist and engineer. The title Baron Kelvin, bestowed to honor his achievements, is named after the River Kelvin, which flowed next to his laboratory at the University of Glasgow. He made important contributions to different branches of mathematical physics and engineering. Of particular interest are his contributions to thermodynamics, in particular on the absolute temperature (absolute temperatures are measured in units of kelvin in his honor) and in fluid dynamics (the Kelvin theorem is named after him). Also, according to Truesdell (Ref. [71], Footnote 2, p. 12–13), the Stokes theorem was discovered by Lord Kelvin, one of several cases of inaccurate attribution. Also according to Truesdell (Ref. [71], p. 23), he introduced the term lamellar (Footnote 3, p. 762).
16 The symbol ≃ stands for “approximately equal to.”
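How good is the ".9" rule of thumb? The sketch below compares each approximation with a standard conversion factor. The factors are not taken from this book: they are the commonly tabulated values (international yard, pound and acre, US liquid quart, international nautical mile), and the names `EXACT_VALUES` and `relative_error` are illustrative assumptions.

```python
# (description, more precise value, rule-of-thumb value)
EXACT_VALUES = [
    ("1 yard in meters", 0.9144, 0.9),
    ("1 US liquid quart in liters", 0.946352946, 0.9),
    ("2 lb in kilograms", 2 * 0.45359237, 0.9),
    ("half a Celsius degree in Fahrenheit degrees", 0.5 * 1.8, 0.9),
    ("half a nautical mile in kilometers", 0.5 * 1.852, 0.9),
    ("2 acres in hectares", 2 * 0.40468564224, 0.81),
]


def relative_error(exact, approx):
    """Relative error of the approximation with respect to the precise value."""
    return abs(exact - approx) / exact


for description, exact, approx in EXACT_VALUES:
    print(f"{description}: {100 * relative_error(exact, approx):.2f} % off")
```

Under these assumed factors, every entry stays within about five percent of the rule of thumb, with the quart being the loosest fit.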
Chapter 5
Basic Euclidean geometry
Here, we present some basic elements of Euclidean geometry. In agreement with the gut–level approach adopted in this book, the level of sophistication I am using here is pretty much that used by Euclid. If you wish to see a more formal approach, take a look at the book Grundlagen der Geometrie, by David Hilbert (Ref. [31]).1
• Overview of this chapter
After some preliminaries (Section 5.1), we deal with circles, arclengths, angles and the constant π (Section 5.2). Then, we present the definitions of secants, tangents and normals to a curve (Section 5.3). Next, we consider polygons (Section 5.4), triangles, in particular congruent and similar triangles (Section 5.5), and quadrilaterals, in particular rectangles and squares (Section 5.6), along with their properties, whereas in Section 5.7 we address how to evaluate their areas. Finally, we have all the ingredients to tackle the Pythagorean theorem (Section 5.8). We give two proofs of this theorem. The first is very simple, but it is based on algebra. This is contrary to the rules of ancient Greek geometers, who didn’t know algebra and used only compass and straightedge. Accordingly, a second proof is provided to remedy that. We conclude the chapter with some useful additional material (Section 5.9).
1 The German mathematician David Hilbert (1862–1943) was one of the most influential mathematicians of his time. In addition to his axiomatization of geometry, he is famous, among other things, for the seminal work on functional analysis and what is now known as “Hilbert spaces” (addressed in Volume III).
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_5
5.1 Preliminaries
To begin with, I presume that you are familiar with the following terms, which I take as primitive:
1. Point;
2. Line;
3. Plane;
4. Space (three–dimensional).
[These terms have been discussed in Subsection 1.6.5, on arithmetic with real numbers, where we also addressed the distance between two points on a line.]
Remark 45 (Line). Just to make sure, a line is always understood to be continuous (namely without interruptions) and not to have bifurcations. It may connect two points, or close on itself, or go to infinity (at one or both endpoints). It is not necessarily non–self–intersecting.
In addition, we have the following definitions:
Definition 80 (Segment. Straight line. Ray. Curve). Consider, among all the possible continuous lines through two points A and B, the one that has the smallest length. Such a line is called the segment connecting A and B, and is denoted by AB; A and B are called the endpoints of the segment. [For a segment, think of a taut string connecting any two points.] The length of a segment is the distance between A and B and is denoted by AB. [Note that a segment, by definition, is not oriented, in the sense that for a segment we have AB = BA. Otherwise, we use the term oriented segment. An arrow represents an oriented segment.] A straight line is a line of unlimited length, with the property that the segment connecting any two of its points belongs to the line itself. Any point on a straight line divides it into two portions, each of which is called a ray. A line that is not straight is called a curve.
Definition 81 (Congruent, equal, mirror image). Two planar geometrical figures (such as triangles or quadrilaterals) are called congruent if they can be brought to fully overlap one another by a sequence of rigid motions, specifically: (i) a translation, (ii) a rotation, and (iii) possibly one reflection. They are called equal iff no reflection is involved, and mirror images of each other iff one reflection is required. [Two reflections yield no reflection at all, as you may verify. Accordingly, congruent figures are either equal, or mirror images of each other.]
5.2 Circles, arclengths, angles and π
We have the following definitions:
Definition 82 (Locus). Locus (namely “place” in Latin) is a term used in mathematics to indicate the collection of all the points satisfying a specific condition.
Definition 83 (Circle. Disk). A circle is the locus of the points that are equidistant from a point O (namely have the same distance R from O; Fig. 5.1). The common distance R is called the radius of the circle, the point O its center. The diameter is the length of a segment through O, with endpoints on the circle. [It equals 2R, as you may verify.] A disk is the locus of all the points that are inside a circle (interior points), possibly along with some of those on the circle (boundary points). If all the boundary points are included, the disk is called closed. If none of them is included, the disk is called open. [Note the difference with everyday language, where the term “circle” may be used to mean either “circle” or “disk” as defined above.]
Fig. 5.1 Circle, arclength s, and angle θ
Definition 84 (Arclength along a circle. Circumference). Consider a circle C of radius R and center O, as well as two rays emanating from O and intersecting the circle in the points A and B, respectively (see again Fig. 5.1). The distance covered by moving along the circle, from the point A to the point B (both on the circle), is called the arclength along the circle from A to B. In this book, the arclength is denoted by s. [To be precise, the arclength may have a sign. For, s is typically considered positive if the motion is counterclockwise, negative if clockwise. However, unless otherwise stated, the arclength is tacitly understood to be positive, namely that s is covered counterclockwise.] The (positive) arclength of the whole circle is called the circumference.
Definition 85 (Angle in radians). The angle θ formed by the segments OA and OB in Fig. 5.1 is the ratio between the arclength along the circle from A to B and the radius of the circle, namely
$\theta = \dfrac{s}{R}.$ (5.1)
[The symbol θ denotes the Greek letter theta (Section 1.8). In this book, it is typically used to denote angles. The other symbol for theta, namely ϑ, is used exclusively to denote temperature.] The angle as given by Eq. 5.1 is said to be measured in radians.2 An equivalent definition is that the angle equals the corresponding arclength covered on the unit circle (namely a circle with radius R = 1), with the appropriate sign (see again Fig. 5.1). [Equation 5.1 implies that, strictly speaking, the angle is considered positive if the arclength is covered counterclockwise. However, unless otherwise stated, the angle is tacitly understood to be positive (Definition 84 above).] Typically, the point from which the angles are measured is the point Q in Fig. 5.1 above.
Definition 86 (The constant π). The constant π denotes the ratio between the circumference and the diameter of a circle. Its value is approximately given by π = 3.141 592 653 589 793 238 462 643 383 279 502 88 ... . Accordingly, the angle corresponding to one complete counterclockwise revolution around O measures 2π. [The symbol π denotes the Greek letter pi (Section 1.8). In this book, it is used exclusively to denote the constant defined above.]
Definition 87 (Right, acute and obtuse angles). Two straight lines that intersect each other form four angles. Iff the four angles are equal to each other (namely iff θ = π/2), they are called right angles. An angle θ ∈ (0, π/2), namely a positive angle smaller than a right angle, is called acute, whereas an angle θ ∈ (π/2, π), namely an angle greater than a right angle, but less than π, is called obtuse.
Definition 88 (Perpendicular and parallel lines). Two straight lines are perpendicular (or orthogonal) to each other iff they form right angles. Two distinct straight lines are called parallel iff both of them are orthogonal to a third one. [More in Remark 46, p. 182.]
Finally, consider the relationship between the angle measured in radians, as defined above, and the angle measured in degrees, used in everyday language. We have the following
Definition 89 (Angle in degrees). The angle measured in degrees (denoted by °), here typically denoted by α, is given by
$\alpha = 180\,\theta/\pi.$ (5.2)
[This definition indicates that one degree corresponds to 2π divided into 360 equal angles. Indeed, we have that α = 360° corresponds to θ = 2π (full circumference).]
2 The angle measured in radians gives the length of an arc of a circle in numbers of radii, hence its name. It is not to be confused with the angle measured in degrees, which is defined in Eq. 5.2, and is commonly used in everyday language.
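Equations 5.1 and 5.2 translate directly into code. The following is a minimal sketch; the function names are illustrative assumptions, and Python's standard library already offers equivalent helpers (`math.degrees` and `math.radians`).

```python
import math


def angle_in_radians(s, R):
    """Eq. 5.1: angle subtended by an arc of length s on a circle of radius R."""
    return s / R


def radians_to_degrees(theta):
    """Eq. 5.2: angle in degrees from the angle in radians."""
    return 180.0 * theta / math.pi


def degrees_to_radians(alpha):
    """Inverse of Eq. 5.2."""
    return math.pi * alpha / 180.0


R = 3.0
quarter_arc = 2.0 * math.pi * R / 4.0      # one fourth of the circumference
theta = angle_in_radians(quarter_arc, R)   # a right angle, whatever the radius
print(theta, radians_to_degrees(theta))
```

Note that the radius cancels out in the quarter-circle example: the angle depends only on the fraction of the circumference that is covered, which is why the unit circle can be used in the equivalent definition.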
5.3 Secants, tangents and normals
Here, we introduce some notions that you are probably familiar with, namely those of secant, tangent and normal to a curve. We have the following definitions (see Fig. 5.2):
Fig. 5.2 Secants, tangent and normal
Definition 90 (Secant). Consider two distinct points, P and P1, on a given curve L, as shown in Fig. 5.2. The straight line that connects P and P1 is called the secant of L through P and P1.3
3 From linea secans, “the line that cuts” (from secare, Latin for “to cut”).
Definition 91 (Tangent). Consider the two points, P and P1, that define the secant L1 on a given curve L, as shown in Fig. 5.2. Let the point P1 approach (namely become closer and closer to) the point P, which remains fixed. In the process, if L is sufficiently smooth, the secant becomes closer and closer to a specific straight line, say L′. We can repeat the same process by letting the point P2 approach the point P, thereby identifying a second straight line, say L″. Iff L′ and L″ exist and coincide, the line LT = L′ = L″ is called the tangent line, or simply the tangent, to L at the point P.4 In this case, we say that the tangent is continuous at P. If, as P1 and P2 approach P, the lines L′ and L″ do not exist or do not coincide, we say that the tangent at P does not exist. Iff L′ and L″ exist but do not coincide, we say that the tangent is discontinuous at P. [Note that there exist continuous curves that have no tangent at any of their points. An example of this is the Koch curve, which will be introduced in Subsection 8.2.4 (see Fig. 8.5, p. 318).]
Definition 92 (Normal). The line LN (see Fig. 5.2), perpendicular to the tangent LT to a given curve L, at the point of tangency P, is called the normal to the line L at P.5 The normal is said to be continuous (discontinuous) at P iff the tangent is continuous (discontinuous) at P.
4 From linea tangens, “the line that touches” (from tangere, Latin for “to touch”). The word “tangent” is used also as an adjective, meaning behaving like the tangent line, as in the sentence: “The straight line L is tangent to the curve C at the point P.” As we will see, the term tangent is also used to indicate a physicists’ vector (Remark 30, p. 94) that is parallel to the tangent line. [An unrelated meaning of the term tangent will be encountered in connection with the trigonometric functions (namely sine, cosine and tangent), discussed in Section 6.4. That may be a source of confusion, as discussed further in Remark 76, p. 365.]
5 The term “normal” may also be used as an adjective, meaning behaving like the normal line, as in the sentence: “The straight line LN is normal to a given curve L at a given point P.” Of course, if the line L1 is normal to the line L2 at the intersection point, we also say that the lines L1 and L2 are mutually perpendicular, or that they are mutually orthogonal. In other words, the terms normal, perpendicular and orthogonal are interchangeable. A precise rule as to when one of them is preferred to the others is not essential here. [As we will see, the term normal is also used to indicate a physicists’ vector (Remark 30, p. 94) that is parallel to the normal line.]
5.4 Polygons
We have the following
Definition 93 (Polygon). A polygon is the locus of the points bounded by a line composed of a chain of non–intersecting segments (sides) that are connected at their endpoints. To make our life simple, we use the restrictive assumption that, unless otherwise stated, given any side, all the points of the polygon lie on one side of the infinite line through it (Fig. 5.3), so that such a line does not have points in common with the polygon besides those of the side itself. If necessary for clarity, I will refer to this as a typical polygon. [The correct technical term is “convex polygon.”] If the restriction does not apply (Fig. 5.4), I will use the term atypical polygon. The collection of all the sides of a polygon is referred to as its boundary. The term perimeter is used to refer to the corresponding length. A point common to two adjacent sides is called a vertex. The interior angle of a polygon at a given vertex P is the angle (inside the perimeter!) formed by the two sides that have P in common. [Because of the restriction imposed above, all the interior angles of a typical polygon measure less than π.] A polygon is called regular iff it has equal–length sides and equal interior angles.6
Fig. 5.3 A typical polygon
Fig. 5.4 An atypical polygon
A polygon with n sides is called an n-gon. Triangles and quadrilaterals are n-gons with n = 3 and n = 4 respectively.
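The "one side of the line" restriction that distinguishes typical (convex) from atypical polygons can be tested numerically. The sketch below is only an illustration under stated assumptions: the polygon is given as a list of (x, y) vertices listed in order along the boundary, its sides do not intersect, and Cartesian coordinates (not yet introduced at this point) are used. The function name `is_typical_polygon` is hypothetical.

```python
def is_typical_polygon(vertices):
    """Return True if, for every side, all vertices lie on one side of the
    infinite line through that side (vertices on the line are tolerated)."""
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        signs = set()
        for (x, y) in vertices:
            # The sign of this expression tells on which side of the line
            # through (x1, y1) and (x2, y2) the point (x, y) lies.
            side = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
            if side > 0:
                signs.add(1)
            elif side < 0:
                signs.add(-1)
        if len(signs) > 1:   # vertices on both sides of this line
            return False
    return True


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
dart = [(0, 0), (2, 0), (2, 2), (1, 0.5), (0, 2)]   # has a re-entrant vertex
print(is_typical_polygon(square), is_typical_polygon(dart))
```

Checking only the vertices is enough here, because every other point of the polygon lies between vertices. The sketch is slightly more lenient than the definition above, in that it accepts vertices lying exactly on the line through a side.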
5.5 Triangles
We have the following
Definition 94 (Triangle). A triangle is a polygon with three sides. A right triangle is a triangle with a right angle; the side opposite the right angle is called the hypotenuse, the others the legs (or catheti). A triangle is called equilateral iff the three sides have equal length, isosceles iff two (and only two) sides have equal length, scalene iff the three sides have different lengths.7
6 The term polygon comes from ancient Greek: πολύς (polys, many) and γωνία (gonia, angle). Also, perimeter comes from ancient Greek: περί (peri, around) and μέτρον (metron, measure).
5.5.1 Triangle postulate
We have the following paramount
Postulate 1 (Triangle postulate) The sum of the three interior angles of a triangle equals two right angles.
Remark 46. In Definition 88, p. 178, we said that two distinct straight lines are called parallel iff both of them are orthogonal to a third one. I claim that, to establish if two lines are parallel, it is sufficient that they intersect a third line with the same angle, akin to the lines L1 and L2 in Fig. 5.5, which intersect the line L with the same angle, say α. To this end, let us draw the line LN normal to L1, which implies that α + β = π/2. Then, the lines L1 and L2 are parallel, because γ = π/2, as you may verify. [Hint: Use the triangle postulate above.]
Fig. 5.5 Parallel lines
Remark 47. ♥ It should be emphasized that the triangle postulate is fully equivalent to the Euclid fifth postulate (also known as the parallel postulate), which states that: “If a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than two right angles” (Book I, Postulate 5; Ref. [22], Vol. 1, p. 202).8 You might know that, in Euclidean geometry, the above triangle postulate is presented as a theorem, based upon the Euclid fifth postulate. [I will not even think of giving you a proof of the equivalence. However, take a look at Fig. 5.6.] Afterwards, in Euclidean geometry, one would use primarily the fact that the interior angles of a triangle add up to two right angles. Here, I take a shortcut and postulate directly such a fact. Indeed, this is all we will need. The approach used here seems to me to be more direct (gut–level mathematics) and easier to use.
7 Isosceles (ἰσοσκελές) comes from ἴσος (isos, ancient Greek for equal) and σκέλος (skelos, ancient Greek for leg), and scalene from σκαληνός (skalenos, uneven, unequal).
Fig. 5.6 Euclid fifth postulate vs triangle postulate
Let us now introduce the following
Definition 95 (Non–Euclidean geometries). ♥ One may introduce geometries in which one uses a postulate different from the Euclid fifth postulate. These are called non–Euclidean geometries. For instance, let us replace the plane with a spherical surface. In this case, spherical segments are, by definition, portions of great circles. [A great circle is the intersection of a sphere with a plane through its center, like the equator. A meridian is half a great circle. Recall that segments are the shortest lines between two points and straight lines contain their segments (Definition 80, p. 176). Since great circles contain lines of minimum distance between two points on the sphere (as you may verify), they are appropriately called spherical straight lines. Accordingly, a spherical segment is a portion of a great circle.] Therefore, we can introduce spherical triangles, as composed of three spherical segments. Here, we emphasize that the sum of the three interior angles of a spherical triangle does not equal two right angles. For instance, consider the spherical triangle identified by the equator plus two meridians (say, their northern portions), 90° apart in longitude. Each interior angle measures π/2, for a total of 3π/2.
◦ Food for thought. ♥ Let us take a deeper look at the issue. We have introduced a plane as a two–dimensional space. There exist other two–dimensional spaces, such as the surface of a sphere, as discussed above. What distinguishes the plane from other two–dimensional spaces is its “flatness.” On a plane, we have postulated that the sum of the three angles of a triangle equals two right angles. We could reverse the process and use this fact not as a postulate, but as a definition of planes. Specifically, we might define a plane as a two–dimensional space in which the sum of the three angles of any triangle equals two right angles. This way, we replace the triangle postulate with the definition of a plane. Accordingly, we would need only the first four Euclid postulates, which are (Ref. [22], Vol. 1, p. 154): (i) to draw a straight line from any point to any point; (ii) to produce a finite straight line continuously in a straight line; (iii) to describe a circle with any center and distance; (iv) that all the right angles are equal to one another. [Incidentally, they have been snuck in, inconspicuously, above.]
8 Euclid (c. 325–265 BC) was a Greek mathematician, born in Alexandria, Egypt. He is best known for his treatise on geometry “The Elements” (Ref. [22]). For this, he is considered by many as the Father of Geometry and the most prominent mathematician of antiquity, although according to others he primarily reorganized material produced by his predecessors (Boyer, Ref. [13], p. 104). The Euclid lemma (Lemma 1, p. 22) is also named after him.
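The spherical triangle of Definition 95 (equator plus two meridians 90° apart) can be checked numerically. The sketch below relies on tools that are not available at this point of the book and are assumed here purely for illustration: points on the unit sphere are represented by three Cartesian components, and the angle at a vertex is measured between the directions in which the two great–circle sides leave that vertex. All function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _departure_direction(at, toward):
    """Unit direction, at the point `at`, of the great-circle arc heading to `toward`."""
    d = _dot(at, toward)
    t = [b - d * a for a, b in zip(at, toward)]   # remove the radial component
    norm = math.sqrt(_dot(t, t))
    return [c / norm for c in t]


def interior_angle(A, B, C):
    """Interior angle at vertex A of the spherical triangle ABC (unit vectors)."""
    c = _dot(_departure_direction(A, B), _departure_direction(A, C))
    return math.acos(max(-1.0, min(1.0, c)))     # clamp against round-off


def angle_sum(A, B, C):
    return interior_angle(A, B, C) + interior_angle(B, C, A) + interior_angle(C, A, B)


north_pole = (0.0, 0.0, 1.0)
on_equator_0 = (1.0, 0.0, 0.0)    # longitude 0 degrees
on_equator_90 = (0.0, 1.0, 0.0)   # longitude 90 degrees
print(angle_sum(north_pole, on_equator_0, on_equator_90) / math.pi)  # in units of pi
```

For small spherical triangles the same function returns sums only slightly above π, which is one way to see that a sphere looks "flat" locally.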
5.5.2 Similar triangles
Let us introduce the following
Definition 96 (Similar triangles). Triangles with equal interior angles are called similar.
We have the following theorems:
Theorem 42 (Similar triangles). Two triangles are similar iff the ratios of corresponding sides are equal.
◦ Proof: ♠ Let us start with similar right triangles. Consider the right triangles AB1C1 and AB2C2 shown in Fig. 5.7. They have the same (interior) angles, and hence they are similar, by definition. Indeed, note that the triangle AB2C2 may be obtained by placing together four triangles (as shown in Fig. 5.7), with each of the four triangles congruent to AB1C1. It is apparent that AB2 = 2 AB1, AC2 = 2 AC1 and B2C2 = 2 B1C1. The same approach may be used for any pair of right triangles, whenever the ratio of their corresponding sides is a rational number. [For, given two natural numbers, say p and q, we may construct two right triangles similar to AB1C1, namely ABkCk, with ABk = k AB1, ACk = k AC1, and BkCk = k B1C1 (k = p, q). This implies that ABp / ABq = p/q, ACp / ACq = p/q, and BpCp / BqCq = p/q.] This is valid for any p/q. Thus, in the limit, the result holds if p/q is replaced by a real number. [Hint: Use the approach in Section 1.6, to go from rational to real numbers (see in particular Eq. 1.76).] For non–right triangles, draw from one of the vertices the normal to the opposite side and apply the preceding results. [If one of the interior angles is greater than π/2, split that angle, as it makes your life easier.]
Fig. 5.7 Similar right triangles
Theorem 43. Two straight lines have at most one point in common.
◦ Proof: Indeed, let O denote an intersection point (Figure 5.8), if such a point exists (parallel lines have no points in common; Theorem 45, p. 186). The right triangles OA1B1 and OA2B2 are similar (Definition 96 above). This means that the distance AkBk of the point Bk from the horizontal line is proportional to the distance OAk, which differs from 0 unless Ak = O.
Fig. 5.8 At most one point in common
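The equal–ratio property of Theorem 42 can be illustrated numerically. In the sketch below, a right triangle is enlarged by stretching it about the vertex A with a factor k that is not rational–looking on purpose; it is assumed (not proven here) that such a stretching preserves the angles, so that the two triangles are similar. Cartesian coordinates and the function names are assumptions made only for this illustration.

```python
import math


def side_lengths(A, B, C):
    """Lengths of the sides AB, BC and CA of the triangle ABC."""
    return (math.dist(A, B), math.dist(B, C), math.dist(C, A))


def stretched(P, k, center):
    """Image of the point P when every distance from `center` is multiplied by k."""
    return (center[0] + k * (P[0] - center[0]), center[1] + k * (P[1] - center[1]))


A, B1, C1 = (0.0, 0.0), (4.0, 0.0), (4.0, 3.0)   # right angle at B1
k = math.sqrt(2.0)
B2, C2 = stretched(B1, k, A), stretched(C1, k, A)

ratios = [big / small
          for big, small in zip(side_lengths(A, B2, C2), side_lengths(A, B1, C1))]
print(ratios)   # the three ratios of corresponding sides
```

The three printed ratios agree with each other (and with k) up to round–off, as the theorem requires.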
5.5.3 Congruent triangles
We have the following
Theorem 44 (Congruent triangles). Two triangles are congruent if: (i) the three sides are equal, or (ii) two sides and the included angle are equal, or (iii) two angles and a side are equal. In addition, two right triangles are congruent if: (iv) two legs are equal, or (v) one leg and the hypotenuse are equal. ◦ Proof : Indeed, from such data you may construct a triangle, which is unique with the possible exception of one reflection. [For Item (iii), note that the three angles are equal because of the triangle postulate (Postulate 1, p. 182). Hence, the triangles are similar (Definition 96, p. 184), and one equal side is all we need to complete the proof.] ◦ Comment. You might wonder why an additional case (namely that of two sides, say a and b, and an angle, say θ, that is not in between the two) is not included in the list. The reason is that, if b ∈ (h, a) (where h = a sin θ, with θ < π/2), we have two distinct triangles that may be obtained with these data (Fig. 5.9; see (i) the light grey triangle and (ii) that obtained by combining the light and dark grey regions). [However, there is no problem if b = h. The triangles ABC and EBC would coincide (see also Item (v)).]
Fig. 5.9 Two distinct triangles from the same data
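The ambiguity described in the Comment above can be made quantitative. The sketch below assumes that θ is the angle adjacent to the side a and opposite to the side b, with 0 < θ < π/2, and it borrows the trigonometric functions and the law of cosines, neither of which has been introduced yet in this book. Under these assumptions the third side c satisfies b² = a² + c² − 2ac cos θ, so that c = a cos θ ± √(b² − h²), with h = a sin θ. The function name `third_sides` is hypothetical.

```python
import math


def third_sides(a, b, theta):
    """Possible lengths of the third side, given the sides a and b and the
    angle theta (adjacent to a, opposite to b), with 0 < theta < pi/2."""
    h = a * math.sin(theta)
    if b < h:
        return []                       # side b is too short to close the triangle
    root = math.sqrt(b * b - h * h)
    base = a * math.cos(theta)
    candidates = {base + root, base - root}
    return sorted(c for c in candidates if c > 0.0)


a, theta = 2.0, math.pi / 6.0           # here h is (very nearly) equal to 1
print(third_sides(a, 1.5, theta))       # h < b < a: two distinct triangles
print(third_sides(a, 3.0, theta))       # b > a: only one triangle
print(third_sides(a, 0.5, theta))       # b < h: no triangle at all
```

The two roots are both positive exactly when h < b < a, which is the interval singled out in the Comment; for b ≥ a the smaller root is no longer positive and the ambiguity disappears.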
We have the following
Theorem 45 (Parallel lines). The distance between two parallel lines is constant, and hence they don’t have any point in common. Vice versa, if the distance between two lines is constant, the two lines are parallel. [The distance between two lines, say L1 and L2, at a given point P of L1 is here defined as the distance of P from the closest point of L2.]
◦ Proof: Let us begin with the first part of the theorem. Consider Fig. 5.10, which is obtained from Fig. 5.5, p. 182, by adding the segment CD, perpendicular to L2 at D. The line LN is assumed to be perpendicular to both L1 and L2 so as to make these two lines parallel by definition. We have αB + βB = π/2 and αD + βD = π/2 by construction, and αD + βB = π/2 (triangle postulate, with γ = π/2). These yield αB = αD and βB = βD. Hence, the triangles ABD and BCD are congruent. [Hint: Use Item (iii) in Theorem 44; two equal angles and an equal (shared) side.] Hence CD = AB, in agreement with the first part of the theorem.
Regarding the second part of the theorem, consider any two points on L2, say A and D, and draw the normals to L2 through these two points. We have AB = CD, by hypothesis. [For, AB and CD are the corresponding distances between the two lines, equal by hypothesis.] Consider the triangle ABD, and denote by αD and βB its interior angles at D and B, respectively, as shown in the figure. We have that, necessarily, βB = βD. [Indeed, we have βD = π/2 − αD, because DC is perpendicular to L2, and βB = π/2 − αD, because of the triangle postulate.] Thus, the triangles ABD and BCD are congruent. [Hint: Use Item (ii) in Theorem 44 above, with (i) AB = CD, (ii) BD shared, and (iii) βB = βD.] Accordingly, LN is normal to L1 and L2. Therefore, L1 and L2 are parallel, by definition.
Fig. 5.10 Parallel lines
5.5.4 Isosceles triangles. Pons Asinorum
We have the following
Lemma 6. Consider an isosceles triangle. A segment through the vertex V, perpendicular to the base AB, intersects AB at its midpoint M. Vice versa, the line connecting the vertex V and the midpoint M is perpendicular to AB.
◦ Proof: Consider Fig. 5.11. If MV is perpendicular to AB, the triangles AMV and BMV are congruent. [For, they are right triangles, with: (i) hypotenuses of equal length, namely AV and BV, and (ii) one leg in common, namely MV; Theorem 44, p. 186, Item (v).] Therefore, we have AM = BM. Vice versa, if AM = BM, the triangles are again congruent. [Hint: Three equal sides; Theorem 44, p. 186, Item (i).] Thus, the corresponding angles are equal. This implies that the two angles at M, which are equal and add up to π, are necessarily equal to π/2.
Fig. 5.11 Isosceles triangles
As a consequence, we have the following9 Theorem 46 (Pons Asinorum). The base angles of an isosceles triangle are equal. ◦ Proof : In the proof of the preceding lemma, we have already obtained that an isosceles triangle may be divided into two congruent right triangles, as in Fig. 5.11. Thus, the base angles are equal. We have also the following Theorem 47. Consider a segment AB and the line L perpendicular to AB through its midpoint M . Let P denote any point on L distinct from M . The triangle ABP is isosceles. ◦ Proof : The triangles AM P and BM P are congruent, as you may verify. [Hint: Use Theorem 44, p. 186, on congruent rectangular triangles (Item (iv), right triangle, with equal legs).] Therefore, AP = BP and hence the triangle ABP is isosceles.
5.6 Quadrilaterals We have the following Definition 97 (Quadrilateral). A quadrilateral is a four–sided polygon. 9
In Latin, Pons asinorum (often translated as “bridge of asses”) has a double meaning, namely “bridge of donkeys” and “bridge of idiots.” It is often stated that the name refers to the fact that not so bright students were not able to progress beyond such a theorem.
We have the following Theorem 48. The sum of the four interior angles of a quadrilateral equals four right angles. ◦ Proof : Split the quadrilateral into two triangles and use the triangle postulate (Postulate 1, p. 182). [Let us consider an atypical quadrilateral (namely in which one of the interior angles is larger than π), as shown in Figure 5.12. In this case, to make your life easier, it is convenient to split that angle.]
Fig. 5.12 Sum of angles in a quadrilateral
5.6.1 Rectangles and squares We have the following Definition 98 (Rectangle and square). A rectangle is a quadrilateral obtained by joining two equal right triangles (such as ABC and ACD in Fig. 5.13) along their hypotenuses, to obtain the figure in Fig. 5.14. The segments that join two opposite vertices (namely AC and BD, in Fig. 5.14) are called the diagonals. A square is a rectangle with four equal sides.
Fig. 5.13 Joining two right triangles
We have the following
Fig. 5.14 Rectangle
Theorem 49. Adjacent sides of a rectangle form right angles. Opposite sides are equal and parallel. The two diagonals are equal and bisect each other. ◦ Proof : ♥ All three propositions stem directly from the construction used in the definition of rectangles (Fig. 5.13), as you may verify. [Hint: For the first proposition, use the triangle postulate (Postulate 1, p. 182). For the second, opposing sides are equal by construction, and parallel by Definition 88, p. 178, of parallel lines. For the third, use Theorem 44, p. 186, Item (iv), on congruent rectangular triangles. Specifically, consider the two pairs of triangles (namely ABC and ACD, as well as ABD and BCD) obtained by dividing the rectangle along the two diagonals (Fig. 5.14).]
5.6.2 Parallelograms ♣
We have the following Definition 99 (Parallelogram. Base and height). A parallelogram is obtained by connecting two equal triangles (e.g., the triangles ABC and ACD in Fig. 5.15), along the corresponding sides. Any side of the parallelogram may be chosen as its base. [Typically, a parallelogram is drawn with two horizontal sides; then, the base is usually the bottom one.] The distance of the base from the opposite side is called the height h of the parallelogram. [For instance, in Fig. 5.15, the base of ABCD is BC; its length is a = BC. The height is h = AO.]
Fig. 5.15 Parallelogram
We have the following theorems: Theorem 50. Opposite sides of a parallelogram are parallel, and vice versa. [Hence the name.]
◦ Proof : Opposite sides of a parallelogram are parallel by construction (use Remark 46, p. 182). Vice versa, if the sides are parallel, one of the diagonals, say AC, divides the quadrilateral into two triangles ABC and CDA. Note that αA = αC and βA = βC (use again Remark 46, p. 182, on parallel lines). Therefore, ABC and CDA are congruent. [Hint: Use Item (iii) of Theorem 44, p. 186 (two equal angles and a common side).] They are not just congruent: they are actually equal, as you may verify.
Theorem 51. Opposing sides of a parallelogram are equal. Vice versa, quadrilaterals with equal opposing sides are parallelograms.
◦ Proof : The first part of the theorem stems directly from the definition. Regarding the second part of the theorem, consider the diagonal AC. This divides the quadrilateral into two triangles, which are congruent. [Hint: Use Item (i) of Theorem 44, p. 186 (three equal sides).] They are actually equal, as you may verify.
Theorem 52. Consider a parallelogram. If the diagonals are equal, the parallelogram is a rectangle.
◦ Proof : Each diagonal splits the parallelogram into a pair of triangles. This yields two sets of two triangles each, namely: (i) ABC and CDA by cutting with the diagonal AC in Fig. 5.15, and (ii) ABD and CDB by cutting with the diagonal BD (not shown in the figure). I claim that these four triangles are congruent. Indeed, the sides of each one of them include the opposite sides of the parallelogram, which are equal (Theorem 51, above), and one of the two diagonals, which are equal by hypothesis. Accordingly, the angles at A, B, C and D (Fig. 5.15) are equal, and hence equal to π/2 (use Theorem 48, p. 189), which shows that the parallelogram is a rectangle.
5.6.3 Rhombuses ♣
We have the following Definition 100 (Rhombus). A rhombus is a four–sided quadrilateral that may be obtained by joining four congruent right triangles as shown in Fig. 5.16. We have the following Theorem 53. The diagonals of a rhombus are mutually perpendicular, bisect each other (namely divide each other into equal parts), and bisect the interior angles.
Fig. 5.16 Rhombus
◦ Proof : Consider the four generating triangles, namely AOC, BOC, AOD and BOD, which are congruent right triangles, by definition. The four angles at O are equal to π/2. Accordingly, the diagonals are given by the segments AOB and COD, which are mutually perpendicular. Moreover, the diagonals bisect each other, as well as the angles, as you may verify.
5.7 Areas In Eq. 1.78, we introduced areas for the limited case of rectangles (see also Remark 14, p. 49). Those results are summarized in the following Definition 101 (Area of rectangles). The area of a rectangle is given by A = a b, where a and b are the sides of the rectangle.
• Additivity of areas In Remark 14, p. 49, we also pointed out that areas — in the case addressed there — are additive (namely areas have the property that they may be added). Here, we incorporate this property in the definition of areas that will be used in the subsubsections that follow. Specifically, we have that the overall area of two geometric figures that have no interior points in common equals the sum of the individual areas.
• Area of triangles In analogy with Definition 99, p. 190, on parallelograms, we have the following
Definition 102 (Base and height of a triangle). Given a triangle ABC, choose any of its sides, say BC, and refer to it as the base of the triangle. Consider the perpendicular to BC through A. Let O be the intersection point (Fig. 5.17, below). The length h of the segment AO is called the height of the triangle relative to the base BC.
Let us consider a rectangle having the same base BC and the same height h as the triangle under consideration, as shown in Fig. 5.17. The rectangle has an area that is double that of the triangle, as you may verify. Accordingly, the area of a triangle is
$$A = \frac{1}{2}\, a\, h, \tag{5.3}$$
where a denotes the length of the base.
Fig. 5.17 Triangle area
• Area of parallelograms
Recall that a parallelogram is obtained by putting together two triangles, as in Fig. 5.15, p. 190 (see Definition 99, p. 190, where we defined also its base length and its height). Accordingly, its area is given by the product of the base length times its height, that is,
$$A = a\, h. \tag{5.4}$$
• Area of rhombuses ♠
The area of a rhombus is simply given by
$$A = \frac{1}{2}\, a\, b, \tag{5.5}$$
where a and b denote the lengths of the two diagonals. For, the area of each of the four triangles that form the rhombus is A = a b/8, as you may verify.
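The area formulas of this section (Definition 101 and Eqs. 5.3, 5.4 and 5.5) may be checked numerically. The following is only a minimal sketch: the function names and the sample numbers are illustrative assumptions, not part of the text. The parallelogram and the rhombus are deliberately built from triangles, to mirror the additivity of areas used above.

```python
# Area formulas of Section 5.7; names and sample values are illustrative only.

def rectangle_area(a, b):
    """Area of a rectangle with sides a and b (Definition 101)."""
    return a * b

def triangle_area(base, height):
    """Half the area of the rectangle with the same base and height (Eq. 5.3)."""
    return 0.5 * base * height

def parallelogram_area(base, height):
    """Two equal triangles joined along a common side (Eq. 5.4)."""
    return 2 * triangle_area(base, height)

def rhombus_area(diag_1, diag_2):
    """Four right triangles whose legs are the half-diagonals (Eq. 5.5)."""
    return 4 * triangle_area(diag_1 / 2, diag_2 / 2)

# A rhombus with diagonals 6 and 8 consists of four right triangles of area 6.
print(rhombus_area(6, 8))        # 24.0, which equals (1/2) * 6 * 8
print(parallelogram_area(5, 3))  # 15.0, which equals 5 * 3
```

Adding the areas of the pieces reproduces the closed-form expressions, as you may verify with other values.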
5.8 Pythagorean theorem
We have the following
Theorem 54 (Pythagorean theorem). In a right triangle, the square of the hypotenuse c equals the sum of the squares of the two legs a and b:
$$a^2 + b^2 = c^2. \tag{5.6}$$
Fig. 5.18 Pythagorean theorem
◦ Proof : From Fig. 5.18, we have that the area of the outer square, which has sides equal to a + b, is given by $A = (a + b)^2$. This area equals: (i) the area of the inner square, which is equal to $c^2$, plus (ii) the areas of the four triangles, each of which has an area equal to $\frac{1}{2}\, a\, b$. This yields
$$c^2 + 4\, \frac{a\, b}{2} = A = (a + b)^2 = a^2 + 2\, a\, b + b^2. \tag{5.7}$$
Simplifying, one obtains Eq. 5.6.
◦ Warning. As pointed out in Remark 12, p. 45, ancient Greeks didn’t know algebra. Accordingly, all the proofs were necessarily obtained strictly by using
compasses and straight edges instead. Here, in contrast, I departed from standard Euclidean geometry and used algebra to prove geometry, specifically the Pythagorean theorem, as this approach simplifies our lives considerably. For the sake of completeness, a standard Euclidean–geometry proof is presented in the subsubsection that follows.
• A Euclidean proof of the Pythagorean theorem ♥♥
As mentioned above, here for the sake of completeness I present an alternate Euclidean–geometry proof of the Pythagorean theorem, the way I learned it in high school, in a course on Euclidean geometry. Consider Fig. 5.19.
Fig. 5.19 Pythagorean theorem (Euclidean geometry)
I claim that the area of the square BCDE (namely $a^2$) equals the area of the rectangle BNML. Indeed, both areas are equal to that of the parallelogram BCFG. In order to show this, let us first compare the square BCDE and the parallelogram BCFG. They have the same area because they have equal base length a = BC, and equal height h = BE. Next, consider the parallelograms BCFG and BNML. They also have the same area, because they have equal height h = BL, and equal base length c, since BG = BA = c, because the right triangles BGE and BAC are equal. [Hint: Use Theorem 44, p. 186, Item (iii), on congruent triangles. Indeed, they have one equal leg (BE = BC = a) and two equal angles (the right angles and those at B, as you may verify).] Of course, the same considerations yield that the area of the square APRC (namely $b^2$) equals the area of the rectangle ALMS. Adding the results, and noting that the sum of the areas of BNML and ALMS equals $c^2$, one obtains $a^2 + b^2 = c^2$, as in Eq. 5.6.
5.8.1 The Pythagorean theorem in three dimensions
Here, we want to extend to three dimensions the Pythagorean theorem (Eq. 5.6). Before we do this, it is appropriate to introduce some terminology used in three–dimensional geometry, also known as solid geometry. We have the following
Definition 103 (Polyhedron). A polyhedron is the locus of the points bounded by planar (namely flat) faces. The intersections of two faces are called the edges. The common points of three or more edges are called the vertices. To make our life simple, we use the restrictive assumption that, given any face, all the points of the polyhedron lie on one side of it, namely that the plane of the face does not have points in common with the polyhedron, besides those of the face itself. [If we want to be precise, in analogy with what we said for polygons (Definition 93, p. 180), we should use the standard term for this restriction, namely convex polyhedron.]10
In particular, we have
Definition 104 (Hexahedron. Parallelepiped. Cuboid. Cube). A hexahedron is a polyhedron with six faces.11 A parallelepiped is a hexahedron whose opposing faces are mutually parallel. A cuboid is a parallelepiped, with six rectangular faces.12 A cube is a cuboid whose faces are square.
Then, we have the following
Theorem 55 (Pythagorean theorem in three dimensions). Consider a cuboid. Let a, b and c denote the lengths of its edges and d the length of its diagonals. We have
$$a^2 + b^2 + c^2 = d^2. \tag{5.8}$$
10 The term polyhedron comes from the Classical Greek πολύεδρον, from πολύς (polys, “many”) and ἕδρα (hedra, “base” or “seat”).
11 From ancient Greek: ἕξ (hex, six) and ἕδρα (hedra, “base” or “seat”).
12 The equivalent terms right cuboid, rectangular box, rectangular hexahedron, rectangular prism, and rectangular parallelepiped are also used.
Fig. 5.20 Cuboid and Pythagorean theorem in three dimensions
◦ Proof : Consider the rectangle with edges a and b, namely the base of the cuboid in Fig. 5.20, and let $\bar{d}$ denote the length of its diagonals. Using the Pythagorean theorem (Eq. 5.6), we have $\bar{d}^{\,2} = a^2 + b^2$. Next, applying the Pythagorean theorem to the right triangle with legs $\bar{d}$ and c, and hypotenuse d (gray triangle in Fig. 5.20), we have $d^2 = \bar{d}^{\,2} + c^2 = a^2 + b^2 + c^2$, in agreement with Eq. 5.8.
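The two-step argument of the proof translates directly into a short computation. What follows is a sketch under stated assumptions: the function name `cuboid_diagonal` and the sample edges 2, 3 and 6 are chosen here for illustration (they give $2^2 + 3^2 + 6^2 = 49$, so that d = 7).

```python
import math

def cuboid_diagonal(a, b, c):
    """Diagonal of a cuboid with edges a, b, c, obtained in two steps."""
    # Step 1: Eq. 5.6 applied to the base rectangle with edges a and b.
    base_diagonal = math.hypot(a, b)
    # Step 2: Eq. 5.6 applied to the right triangle with legs base_diagonal and c.
    return math.hypot(base_diagonal, c)

d = cuboid_diagonal(2.0, 3.0, 6.0)
print(d)                                    # close to 7, up to rounding
print(math.isclose(d**2, 2.0**2 + 3.0**2 + 6.0**2))
```

The second printed line compares the two-step result with the right side of Eq. 5.8 directly.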
5.8.2 Triangle inequality ♣
We have the following
Definition 105 (Projection). Consider a segment AB and a straight line L. Consider the normals to L through the points A and B, as shown in Fig. 5.21. Let A′ and B′ denote the points where the two normals intersect L. The segment A′B′ is called the projection of the segment AB onto the straight line L. [If you want a practical example of projection, think of the shadow of a rod onto a horizontal plane, when the Sun is at its zenith.13 The shadow is generated by vertical lines through the points of the rod.]
We have the following
Lemma 7. The projection is never greater than the original segment, and equals it only when the segment is parallel to L.
13 The root of the term zenith is the Arabic word samt (abbreviation for the expression “samt ar–ra’s,” which means “direction of the head,” or “path above the head”). The term samt was miswritten as cenit by the Medieval Latin scribes (the “m” being misread as “ni”). The spelling “zenith” first appeared in the 17th century.
Fig. 5.21 Projection
◦ Proof : You may use the Pythagorean theorem applied to the triangle ABB″ in Fig. 5.21 to show that AB″ (which is equal to A′B′) is shorter than AB.
Then, we have the following
Theorem 56 (Triangle inequality). One side of any triangle is always shorter than the sum of the other two.
◦ Proof : Consider the side c in Fig. 5.22. Note that the projection of the vertex C falls within the base AB. Hence, we have c = a′ + b′ < a + b, where a′ and b′ denote the projections of the sides a and b onto the base. [Hint: Use Lemma 7 above.] On the other hand, if the projection of the vertex C lies outside of the base (as in Fig. 5.23), then c < b′ < b < a + b.
Fig. 5.22 Triangle inequality (I)
Fig. 5.23 Triangle inequality (II)
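Lemma 7 and the triangle inequality may be illustrated with a few lines of code. This is only a sketch, and it relies on assumptions that go beyond this chapter: points are given by Cartesian coordinates, and the projection is computed with a scalar product rather than by a geometric construction. The names `projection_length` and `is_triangle` are hypothetical.

```python
import math

def projection_length(A, B, P, Q):
    """Length of the projection of segment AB onto the line through P and Q."""
    ux, uy = Q[0] - P[0], Q[1] - P[1]
    norm = math.hypot(ux, uy)
    if norm == 0:
        raise ValueError("P and Q must be distinct")
    ux, uy = ux / norm, uy / norm          # unit vector along the line
    return abs((B[0] - A[0]) * ux + (B[1] - A[1]) * uy)

def is_triangle(a, b, c):
    """True if each length is shorter than the sum of the other two."""
    return a < b + c and b < a + c and c < a + b

# Segment from (0, 0) to (3, 4) has length 5; its projection onto the x-axis is 3.
print(projection_length((0, 0), (3, 4), (0, 0), (1, 0)))   # 3.0
print(is_triangle(3, 4, 5))   # True
print(is_triangle(1, 2, 3))   # False: the three points would be aligned
```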
5.9 Additional results ♥
Here, we present some additional results, namely: (i) the points needed to identify a circle; (ii) a simple procedure to obtain the midpoint of a segment and the corresponding normal, and (iii) an unexpected connection between circles and right triangles.
5.9.1 Three non–coaligned points and a circle ♥
Let us consider the following Theorem 57 (Three non–coaligned points define a circle). In a plane, three distinct non–coaligned points define a circle uniquely. ◦ Proof : Consider the points A, B and C in Fig. 5.24. Draw the normal to the segment AB, through its midpoint P , and the normal to the segment BC through its midpoint Q. [A simple procedure to accomplish this is presented in the subsection that follows.] Let O denote the point where these two lines intersect each other. By construction, the segments OA, OB and OC have the same length, say R (Theorem 47, p. 188). Thus, we may draw a circle, with center O and radius R. Regarding the uniqueness of the solution, the center O of the circle lies at the intersection of the two normals to the segments AB and BC, through their midpoints, and hence is unique (intersection of two straight lines, Theorem 43, p. 185). Accordingly, the circle we obtained is also unique.
Fig. 5.24 Circle through three points
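The construction used in the proof of Theorem 57 (intersect the normals through the midpoints of AB and BC) can be sketched numerically. The code below is an illustration under assumptions not made in the text: points are given by Cartesian coordinates, and each normal is written as the set of points equidistant from the two endpoints, which leads to two linear equations for the center O. The function name `circle_through` is hypothetical.

```python
import math

def circle_through(A, B, C):
    """Center and radius of the circle through three non-aligned points."""
    ax, ay = A
    bx, by = B
    cx, cy = C
    # Points equidistant from A and B (normal to AB through its midpoint):
    #   2 (bx - ax) x + 2 (by - ay) y = bx^2 + by^2 - ax^2 - ay^2
    a11, a12 = 2 * (bx - ax), 2 * (by - ay)
    b1 = bx**2 + by**2 - ax**2 - ay**2
    # Points equidistant from B and C (normal to BC through its midpoint).
    a21, a22 = 2 * (cx - bx), 2 * (cy - by)
    b2 = cx**2 + cy**2 - bx**2 - by**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        # Exact test only; nearly aligned points give a huge, ill-conditioned circle.
        raise ValueError("the three points are aligned")
    ox = (b1 * a22 - a12 * b2) / det       # Cramer's rule
    oy = (a11 * b2 - a21 * b1) / det
    return (ox, oy), math.hypot(ax - ox, ay - oy)

# Right triangle with legs 4 and 3: the center is the midpoint of the hypotenuse.
print(circle_through((0, 0), (4, 0), (0, 3)))   # ((2.0, 1.5), 2.5)
```

The uniqueness part of the theorem corresponds to the fact that the two linear equations have exactly one solution whenever the determinant differs from zero.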
◦ Comment. The theorem is still applicable when the plane is embedded in a three–dimensional space. Indeed, three points identify a triangle, and hence a plane that contains it. Then, we can apply the preceding theorem and obtain that these same points identify a circle within such a plane. We have the following Theorem 58. In a plane, two distinct circles have at most two points in common. ◦ Proof : If they had more than two points in common, the two circles would coincide, because of the theorem above.
5.9.2 Easy way to find midpoint and normal of a segment ♥
Several times, in particular in the preceding subsection, we have used the midpoint of a segment and the corresponding normal, which may be obtained whichever way you wish. A simple construction to find them is presented in the following
Theorem 59 (Finding midpoint and normal of a segment). Consider Fig. 5.25. Draw two circles, with centers A and B and radius $R > \frac{1}{2} AB$. The two circles meet at P and Q. The intersection (say M) of AB and PQ is the midpoint of AB, whereas PQ is normal to AB through M.
◦ Proof : By construction, we have AP = AQ = BP = BQ = R. Thus, the quadrilateral AQBP is a rhombus. Therefore, the diagonals AB and PQ are perpendicular and bisect each other (Theorem 53, p. 191). Hence, M is the midpoint of AB and PQ is the normal to AB through M.
Fig. 5.25 Midpoint of a segment
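The compass construction of Theorem 59 may be mimicked numerically. This is a sketch under assumptions of my own: points have Cartesian coordinates, the two intersection points are obtained with the usual formula for the common chord of two circles, and M is then taken as the midpoint of PQ (legitimate because the diagonals of the rhombus AQBP bisect each other). The function names are hypothetical.

```python
import math

def circle_intersections(c0, r0, c1, r1):
    """The two points common to the circles (c0, r0) and (c1, r1)."""
    x0, y0 = c0
    x1, y1 = c1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d >= r0 + r1 or d <= abs(r0 - r1):
        raise ValueError("the circles do not meet at two distinct points")
    a = (r0**2 - r1**2 + d**2) / (2 * d)   # distance from c0 to the common chord
    h = math.sqrt(r0**2 - a**2)            # half-length of the common chord
    ux, uy = (x1 - x0) / d, (y1 - y0) / d  # unit vector from c0 to c1
    fx, fy = x0 + a * ux, y0 + a * uy      # foot of the chord on the line c0-c1
    return (fx - h * uy, fy + h * ux), (fx + h * uy, fy - h * ux)

def midpoint_by_construction(A, B, R):
    """Midpoint M of AB from two circles of radius R > AB/2, plus P and Q."""
    P, Q = circle_intersections(A, R, B, R)
    M = ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)
    return M, P, Q

A, B = (0.0, 0.0), (4.0, 0.0)
M, P, Q = midpoint_by_construction(A, B, 3.0)
print(M)                                   # (2.0, 0.0)
# PQ is normal to AB: the scalar product of the two directions vanishes.
print((Q[0] - P[0]) * (B[0] - A[0]) + (Q[1] - P[1]) * (B[1] - A[1]))
```

Any radius larger than half of AB yields the same M, as you may verify by changing the value 3.0.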
◦ Comment. Given a line L and a point P ∉ L, draw a circle that has center P and intersects the straight line L (Fig. 5.26) at A and B. Then, using Theorem 59 above, we can obtain the normal to a given straight line L through a given point P, as you may verify.
Fig. 5.26 Normal to L through a given point P
5.9.3 Circles and right triangles ♥
We have the following
Theorem 60. Let BC be a diameter of a circle, and P any other point on the circle. The triangle BCP has a right angle at P.
◦ Proof : Consider the triangle BCP in Fig. 5.27. Rotate it by 180° around the center O of the circle, to generate the triangle CBQ. By construction, the triangles BCP and CBQ are equal, and hence we have BP = CQ and BQ = CP. Therefore, the quadrilateral BQCP is a parallelogram (Theorem 51, p. 191). In addition, OQ is obtained through a 180° rotation of OP around O, with OP = R (R being the radius of the circle). Hence, OP = OQ = R, with P, O and Q coaligned. Therefore, PQ = 2R = BC. As a consequence, the parallelogram BQCP is a rectangle (Theorem 52, p. 191, on parallelograms with equal diagonals). Accordingly, the angle at P is a right angle.
The reverse is also true. Indeed, we have the following
Theorem 61. Consider the triangle BCP in Fig. 5.27, and assume the angle at P to be a right angle. Then, P is on the circle whose diameter is BC.
◦ Proof : Again, rotate the right triangle BCP, by 180°, around the point O (midpoint of the segment BC), as in Fig. 5.27 above. By definition of
rectangle, the quadrilateral BQCP is a rectangle. Hence, the diagonals PQ and BC have equal length and bisect each other (Theorem 49, p. 190), say at O. Accordingly, we have OP = OB = OC. In other words, the points B, C and P are equidistant from O, in agreement with the theorem.
Fig. 5.27 Angle at P is right
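Theorem 60 lends itself to a quick numerical check. The sketch below assumes a unit circle centered at the origin, with the diameter BC along the horizontal axis, and a few sample points P with simple rational coordinates; all of these choices, as well as the function name `angle_at_P`, are illustrative assumptions. The angle is computed from a scalar product, a tool not used in this chapter.

```python
import math

def angle_at_P(B, C, P):
    """Angle BPC, in radians, from the scalar product of PB and PC."""
    ux, uy = B[0] - P[0], B[1] - P[1]
    vx, vy = C[0] - P[0], C[1] - P[1]
    cos_angle = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

B, C = (-1.0, 0.0), (1.0, 0.0)             # endpoints of a diameter
points_on_circle = [(0.6, 0.8), (-5 / 13, 12 / 13), (0.0, 1.0), (0.28, -0.96)]
angles = [angle_at_P(B, C, P) for P in points_on_circle]
for angle in angles:
    print(math.isclose(angle, math.pi / 2))   # True for every sample point
```

Moving P off the circle, for instance to (0.6, 0.5), gives an angle that differs from π/2, in agreement with Theorem 61.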
Chapter 6
Real functions of a real variable
In this chapter, we introduce a new notion, that of a real function of a real variable, and introduce specific functions of interest in this book, such as polynomials, trigonometric functions and powers of positive real numbers with real exponents.
• Overview of this chapter
We begin by presenting some basic concepts regarding functions and their graphical representations (Section 6.1). Then, in Section 6.2, we introduce linear dependence and linear independence of functions, which are analogous to those for vectors introduced in Section 3.3. As an illustrative example, we show that 1, x and $x^2$ are linearly independent. Next, we introduce specific functions of interest. In particular, we discuss polynomials, namely linear combinations of the natural powers of x (Section 6.3), and the traditional trigonometric functions, namely sine, cosine and tangent of x (Section 6.4). Then, in Section 6.5, we introduce the operation of exponentiation, $a^\alpha$, with a real and positive, and α real. This allows us to discuss additional functions, such as $y(x) = x^\alpha$ (power with real exponent) and $y(x) = a^x$, as well as $y(x) = x^x$ and $y(x) = x^{1/x}$. Finally, in Section 6.6 we introduce composite functions and inverse functions, whereas in Section 6.7 we discuss single–valued and multi–valued functions. We also have an appendix (Section 6.8), where we delve more deeply into the definition of functions. In particular, we specify the meaning adopted in this book for the terms “functions,” “operators” and “mappings,” and clarify the relationship between functions and vectors.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_6
6.1 Introducing functions
We have the following
Definition 106 (Real function of a real variable). A real function of a real variable is a relationship between two real numbers, called variables (because their values may vary, contrary to those of the constants, which cannot vary). Specifically, assume that we have an algorithm that, for given values of the real number x, provides us with a value for y, which is called the dependent variable, because its value depends upon the value of x, which in turn is called the independent variable, or the argument of the function. The function f (x) is the algorithm used to arrive from the value x to the value y.1 We denote this by
$$y = f(x). \tag{6.1}$$
[Unless required for clarity, I will use the term “function,” instead of the more specific “real functions of a real variable.”] ◦ Comment. The notation y = y(x) is also widely used. Strictly speaking, such a notation is improper, because the same symbol y is used to denote: (i) the function and (ii) the value that the function generates. Nonetheless, simply because it is widely used in practice, it is often adopted here, unless otherwise needed for the sake of clarity.
• Graph of a function
As they say: “A picture is worth a thousand words.” So, let us turn to a graphical representation of a function. To do this, we introduce a pair of orthogonal axes, known as the Cartesian axes, which you probably have encountered in high school.2 Specifically, we have the following3
Definition 107 (Cartesian coordinates). Consider a horizontal straight line and a vertical straight line through a point O, called the origin, as in Fig. 6.1. These straight lines are called respectively the x- and the y-axes. The positive directions are respectively rightwards and upwards. The horizontal distance from the vertical axis is called the abscissa (positive rightwards), whereas the vertical distance from the horizontal line (positive upwards) is called the ordinate. The two (abscissa and ordinate) are referred to as Cartesian coordinates. The corresponding plane is called the Cartesian plane. [I will also use the terms x- and y-axes to refer to abscissas and ordinates.]4
The notion of a function is easily understood through its graph (Fig. 6.1), for which we have the following
Definition 108 (Graph of a function). The graphical representation of y = f (x), in terms of abscissas (the values of x) and ordinates (the values of y) is called the graph of the function f (x).
Fig. 6.1 The graph of a generic function y = f (x)
1 In this book, an algorithm is a well–defined sequence of operations that allows one to perform a mathematical task (as you would if you were to write a computer program). The term algorithm has an interesting origin, which includes some mixup in the translation from Latin. Like the term algebra, this also has to do with al–Khwārizmī and his book (Ref. [3]). Specifically, its twelfth–century Latin translation was titled Algoritmi de numero Indorum, which means On Indian numbers by Algoritmus, Algoritmus being the Latinization of al–Khwārizmī. Instead, it was later understood as Algorithms on Indian numbers. [If you are familiar with Latin declensions, you will note that in the original translation Algoritmi is the genitive singular of Algoritmus, whereas in the second Algoritmi is the nominative plural of Algoritmus.]
2 Named after Cartesius, the Latin name of the French philosopher and mathematician René Descartes (1596–1650). Yes, Descartes is the one of the “Cogito ergo sum” fame (Latin for I think, therefore I am), who is widely recognized as the father of modern philosophy. He is also recognized as the father of analytic geometry, the subject matter of the next chapter. Such a field deals with coupling mathematical analysis and geometry, to describe geometrical figures in terms of analytical expressions, hence the name analytic geometry. [For instance, in two–dimensional analytic geometry we will deal with straight lines, circles, parabolas, ellipses, and hyperbolas.] To accomplish this, Descartes made use of the x- and y-axes (abscissas and ordinates) addressed here.
3 Although abscissa and ordinate are generally known as the Cartesian coordinates, it appears that neither term was used by René Descartes. Apparently, the mathematical term abscissa occurs for the first time in a 1659 work, written in Latin by the Italian mathematician Stefano degli Angeli (1623–1697). The terms abscissa and ordinate were extensively used by Leibniz and became the standard mathematical terms after that. [Incidentally, Nicolas d’Oresme (1323–1382), a French mathematician, physicist, astronomer, economist and philosopher, used constructions similar to that used for the Cartesian coordinates, well before the time of Descartes. He used the terms latitude and longitude.]
4 The term abscissa (the feminine past participle of abscindere, Latin for “to cut off”) comes from abscissa linea, which literally means cut–off line, what we now call a segment. It refers to the segment that gives the horizontal distance from the origin. The term ordinate (from ordinatae, the feminine plural past participle of ordinare, Latin for “to place in order”) comes from the expression lineae ordinatae (Latin for “ordered lines”), a term utilized by Roman surveyors for “parallel lines.” The term refers to the lines perpendicular to the abscissas, which are indeed parallel lines. For, the value of the ordinate was measured not along the y-axis, but along a vertical line through the abscissa x under consideration.
6.1.1 Powers
Some examples should help. So, let me start with a function that is familiar to all of us, namely the square of x (natural powers of real numbers were introduced in Definition 16, p. 37):
$$f(x) = x^2, \quad \text{where } x \in (-\infty, \infty). \tag{6.2}$$
This means that, in this case, the algorithm indicated by f (x) consists in taking the square of the real number x. For instance, for x = 0.2 we obtain y = 0.04, for x = 0.3 we obtain y = 0.09, and so on. The graph of the function $y = x^2$ (Eq. 6.2) is presented in Fig. 6.2. Another example that may be familiar to you is $y = x^3$, depicted in Fig. 6.3. For simplicity, both graphs are limited to x ∈ [−1, 1].
Fig. 6.2 The function $y = x^2$
Fig. 6.3 The function $f(x) = x^3$
In general, for any of the so–called n-th natural powers of x
$$f(x) = x^n, \quad \text{where } x \in (-\infty, \infty), \tag{6.3}$$
where n = 1, 2, . . . , we use Definition 16, p. 37.
6.1.2 Roots
Another function that is familiar to all of you is the positive square root of a positive real number x, namely (see Eq. 1.55)
$$f(x) = \sqrt{x} > 0, \quad \text{where } x \in (0, \infty). \tag{6.4}$$
The graph of the function $y = \sqrt{x}$ is presented in Fig. 6.4. For simplicity, the graph is limited to x ∈ [0, 1].
Fig. 6.4 The function $y = \sqrt{x}$
We also have the following
Definition 109 (n-th root of a). Consider a real number a. The symbol
$$b = \sqrt[n]{a} \tag{6.5}$$
denotes an n-th root of a, namely the number b such that $b^n = a$ (as in Definition 17, p. 38, of $\sqrt{a}$). To be specific, if n is odd, we have only one root, which has the same sign as a. On the other hand, if n is even, $a = b^n$ is necessarily positive (in the real field). Moreover, we have two numbers, having opposite signs, whose n-th power equals a. The positive one is denoted by $\sqrt[n]{a}$, the negative one by $-\sqrt[n]{a}$. Both of these numbers are considered as roots of a.
For instance, the cubic root of a = −125 is b = −5. Indeed, $(-5)^3 = -125$. On the other hand, the two (real) fourth roots of 81 are ±3. Indeed, $(\pm 3)^4 = 81$. [However, $\sqrt[4]{81} = 3 > 0$ (Definition 109 above).]
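The case distinction in Definition 109 (n odd versus n even) may be sketched as a small routine. The function name `real_nth_roots` is hypothetical, and the treatment of a = 0 (a single root equal to zero) and of negative a with n even (no real root) is my own reading of the definition rather than something stated explicitly in it. Results are floating-point approximations.

```python
def real_nth_roots(a, n):
    """List of the real n-th roots of a, for a natural number n >= 1."""
    if n < 1:
        raise ValueError("n must be a natural number")
    if a == 0:
        return [0.0]
    if n % 2 == 1:
        # n odd: exactly one real root, with the same sign as a.
        root = abs(a) ** (1.0 / n)
        return [root if a > 0 else -root]
    if a < 0:
        # n even: b**n is never negative, hence no real root.
        return []
    # n even and a > 0: two roots with opposite signs, the positive one first.
    root = a ** (1.0 / n)
    return [root, -root]

print(real_nth_roots(-125, 3))   # one root, approximately -5
print(real_nth_roots(81, 4))     # two roots, approximately 3 and -3
print(real_nth_roots(-16, 2))    # []
```

The first element of the list returned for n even corresponds to the symbol $\sqrt[n]{a}$, which by definition denotes the positive root.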
6.1.3 Range and domain
I wrote, on the right side of Eq. 6.2, “where x ∈ (−∞, ∞).” Similarly, on the right side of Eq. 6.4, I wrote “where x ∈ (0, ∞).” Let us examine this issue in detail. To this end, let us introduce the following definitions:
Definition 110 (Set). Any collection of points is called a set of points, or simply a set. [For instance, the collection of all the points corresponding to integers forms a set.]
Definition 111 (Domain of definition and range). The domain of definition of a function f (x) is the set of points x where f (x) is defined. In this book, the domain of definition typically consists of an interval. In this case, we speak of the interval of definition. The range of a function f (x) is the set of points corresponding to all the values that f (x) attains in the domain.
For instance, the domain of definition of $y = x^2$ is (−∞, ∞), whereas that of $y = \sqrt{x}$ is [0, ∞). On the other hand, for both $y = x^2$ and $y = \sqrt{x}$, the range is [0, ∞), since in both cases the function is never negative (Figs. 6.2 and 6.4).
6.1.4 The one–hundred meter dash ♥
In order to obtain a clearer understanding of the term function (and of the corresponding graph), let us consider a more subtle example. Specifically, consider the graph that gives the speed of a runner in the one–hundred meter dash, in terms of the distance from the starting line. Let us say that you use a measuring instrument, which, at some given points xk , gives you the speed vk of the runner at those points. Next, you plot these points on a Cartesian plane and then simply connect them with segments. [This process is referred to as linear interpolation, and simply consists in connecting adjacent dots with a segment. The analytical expression for a straight line, irrelevant here, is presented in Subsection 6.3.2, on linear interpolation (Eq. 6.31).] Note that the function is defined piecewise (namely piece by piece), as a sequence of segments. A specific example is depicted in Fig. 6.5. The abscissas represent the distance x from the starting line. The ordinates y represent the measured/interpolated speed y = v(x) that the runner has at the location x. [Note that I shifted the notation from y = f (x) into the more descriptive y = v(x). For, the letter used to denote a function is arbitrary, and the letter typically used to denote velocity is v.]
Fig. 6.5 Velocity vs space
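The linear interpolation described above (connect adjacent measured points with segments) is itself an algorithm, and hence a function in the sense of Definition 106. The sketch below is one possible way to write it. The names `make_interpolant`, `xs` and `vs`, as well as the sample measurements, are hypothetical and are not data from the text.

```python
from bisect import bisect_right

def make_interpolant(xs, vs):
    """Piecewise-linear function through the points (xs[k], vs[k]).

    xs must be strictly increasing; the function is defined on [xs[0], xs[-1]].
    """
    if len(xs) != len(vs) or len(xs) < 2:
        raise ValueError("need at least two points, with matching lengths")

    def v(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the interval of definition")
        # Index k of the segment [xs[k], xs[k+1]] that contains x.
        k = min(bisect_right(xs, x) - 1, len(xs) - 2)
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        return vs[k] + t * (vs[k + 1] - vs[k])

    return v

# Hypothetical measurements: distance in meters, speed in meters per second.
xs = [0.0, 1.0, 2.0, 3.0]
vs = [0.0, 4.0, 6.0, 7.0]
v = make_interpolant(xs, vs)
print(v(1.5))   # 5.0, halfway between the measured values 4.0 and 6.0
print(v(3.0))   # 7.0, a measured point is reproduced exactly
```

The graph is what the ruler draws; the function is the rule implemented inside `v`.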
Assume that the speed is measured in meters per second and the distance from the starting line is measured in meters. In addition, let us assume that the distance between the measuring points, xk , happens to be one meter. As shown in Fig. 6.5, during the first ten meters, our runner reaches the speed of 10 m/s (namely 36 km/h, or approximately 22.37 mph), whereas, after that, the speed remains constant and equal to 10 m/s. The interval of definition of this function, expressed in meters, is [0, 100], whereas the range, expressed in meters per second, is [0, 10]. Again, the figure represents the graph of the function, not the function itself. So, what is the function? Recalling that the function is the algorithm used to obtain the graph, and that the graph was generated by connecting the experimental points by linear interpolation, we conclude that the function is the algorithm that one uses to interpolate the values vk that you measured.
◦ Comment. Note that, here, to draw the graph of the function we do not even need the mathematical expression for the function, we only need to know that it is given by a sequence of segments. Hence, a pencil and a ruler are all we need. However, a graph is a graph and a function is a function! For the latter, a ruler won’t do. One needs the algorithm that does exactly what the ruler does approximately, namely, again, linear interpolation (Eq. 6.31). [More on the subject in Subsection 6.8.2.]
To further elaborate on this example, let us assume that you happen to observe that the points you measured lie approximately on a curve described by the following equation
$$v(x) = \begin{cases} 10\, \sqrt{1 - (1 - x/10)^2}, & \text{for } x \in [0, 10]; \\ 10, & \text{for } x \in [10, 100]. \end{cases} \tag{6.6}$$
This relationship between x and v, denoted by v(x), is now the function defined in Eq. 6.6. Again the algorithm that defines the function is provided piecewise, in the sense that the specific expression used for x ∈ [0, 10] differs from that for x ∈ [10, 100].
Fig. 6.6 Velocity vs space
The graph of such a function is presented in Fig. 6.6. Again the abscissas represent the distance x from the starting line; the ordinates represent the speed v(x) that the runner has at the location x. Note again that the representation of v(x), as presented in Fig. 6.6, is not the function itself, it is only the graph of the function v(x). The function is the algorithm given in Eq. 6.6. [Recall one more time that the expression y = v(x) indicates that, when x takes a specific value, say x = x∗ , the use of the algorithm corresponding to the function v(x) yields the corresponding value for y, namely y∗ = v(x∗ ). For the specific function in Eq. 6.6, the value x∗ = 2 yields v∗ = 6 (namely, after 2 meters, the speed is 6 meters per second), as you may verify. Similarly, the value x∗ = 4 yields v∗ = 8, whereas for x∗ ≥ 10 we have v∗ = 10.] Remark 48. If the speed of our runner was constant and equal to v = 10 m/s, he/she would cover the one–hundred–meter distance in 10 s, slightly above the absolute world record. However, it will take him/her a bit longer, because of the lower speed during the first ten meters. The exact time it takes this runner to cover the one hundred meter distance will be discussed in Subsection 11.1.2 (see in particular Remark 91, p. 452), after introducing additional mathematical know–how, covered in Chapters 9 and 10.
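To make the idea that "the function is the algorithm" concrete, here is a minimal Python sketch of Eq. 6.6. It assumes the square-root form of the first branch, which is the form that reproduces the values quoted above (6 m/s at 2 m, 8 m/s at 4 m). The name `v` follows the text; the error handling outside [0, 100] is an illustrative choice, not part of the original.

```python
import math


def v(x: float) -> float:
    """Speed (m/s) of the runner at distance x (m) from the start, per Eq. 6.6."""
    if not 0.0 <= x <= 100.0:
        raise ValueError("x must lie in the interval of definition [0, 100]")
    if x <= 10.0:
        # Acceleration phase (assumed square-root form of the first branch).
        return 10.0 * math.sqrt(1.0 - (1.0 - x / 10.0) ** 2)
    # Constant-speed phase.
    return 10.0


# Evaluate the algorithm at a few sample points x*.
for x_star in (0.0, 2.0, 4.0, 10.0, 50.0):
    print(x_star, round(v(x_star), 6))
```

Both branches give 10 m/s at x = 10, so it does not matter which one is used at the junction point.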
6.1.5 Even and odd functions
♣
For future reference, consider the following Definition 112 (Even and odd functions). A function f (x), defined over a symmetric interval (−a, a), is called even (or symmetric), iff f (−x) = f (x).
(6.7)
A function f (x), defined over a symmetric interval (−a, a), is called odd (or antisymmetric), iff f (−x) = −f (x).
(6.8)
Some examples might help. The function $y = x^2$ (see Fig. 6.2, p. 206) is even and so are all the even powers of x, whereas the function $y = x^3$ (see Fig. 6.3, p. 206) is odd and so are all the odd powers of x, as you may verify.

◦ Comment. It is easy to see from the definition that the graph of any even function is symmetric with respect to the y-axis (namely the branch on the left side of the graph is the mirror image of that on the right side). On the other hand, for any odd function, to obtain the left branch of the graph from the right one, one has to perform a double reflection: the first around the y-axis (akin to the even functions), and then a second one around the x-axis, so as to change the sign of the function for x < 0. [Alternatively, one may perform a rotation of 180° around the origin (axisymmetry, namely symmetry with respect to the origin).]

If f(x) is odd, Eq. 6.8 implies
$$ f(0) = -f(0) = 0, \tag{6.9} $$
since 0 is the only number equal to its opposite.

Following what we did in Subsection 3.4.3 for symmetric and antisymmetric matrices, we have that any function f(x), defined over a symmetric interval (−a, a), may be decomposed into its even and odd portions, as
$$ f(x) = f_E(x) + f_O(x), \tag{6.10} $$
where
$$ f_E(x) = \tfrac{1}{2}\,\big[f(x) + f(-x)\big] \quad\text{and}\quad f_O(x) = \tfrac{1}{2}\,\big[f(x) - f(-x)\big]. \tag{6.11} $$
We have the following
Theorem 62. The product of two even functions is an even function. The product of two odd functions is an even function. The product of an even function by an odd function is an odd function.

◦ Proof: This theorem is an immediate consequence of Eqs. 6.7 and 6.8. Indeed, if f and g are both odd, then $f(-x)\,g(-x) = [-f(x)]\,[-g(x)] = f(x)\,g(x)$; the other two cases are analogous. For instance, $x^2 x^3$ (even times odd) equals $x^5$ (odd), whereas $x^3 x^5$ (odd times odd) equals $x^8$ (even). [If you are not clear about the rule for the product of powers, just wait. We’ll address it in Eq. 6.91.]
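The even/odd decomposition of Eqs. 6.10 and 6.11 can be tried out numerically. The sketch below is only an illustration: the helper names `even_part` and `odd_part` and the sample cubic `f` are assumptions introduced here, not notation from the text.

```python
from typing import Callable

Func = Callable[[float], float]


def even_part(f: Func) -> Func:
    """Return f_E(x) = [f(x) + f(-x)] / 2 (first of Eq. 6.11)."""
    return lambda x: 0.5 * (f(x) + f(-x))


def odd_part(f: Func) -> Func:
    """Return f_O(x) = [f(x) - f(-x)] / 2 (second of Eq. 6.11)."""
    return lambda x: 0.5 * (f(x) - f(-x))


def f(x: float) -> float:
    """Sample function: 1 + 2x + 3x^2 + 4x^3."""
    return 1.0 + 2.0 * x + 3.0 * x**2 + 4.0 * x**3


f_E = even_part(f)  # expected to behave like 1 + 3x^2
f_O = odd_part(f)   # expected to behave like 2x + 4x^3

x = 0.5
print(f_E(x), f_O(x), f_E(x) + f_O(x), f(x))
```

For this sample, the even portion collects the even powers and the odd portion the odd powers, and f_O(0) = 0, in agreement with Eq. 6.9.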
6.2 Linear independence of functions In this section, we present the definitions of linear combination and linear dependence/independence of functions. As stated in the introductory remarks of this chapter, these definitions are virtually identical to those for vectors and matrices, which were introduced in Section 3.3. Nonetheless, for future reference, it won’t hurt to restate the same concepts for real functions of a real variable. In addition, to illustrate the meaning of linear independence, we show that the first few natural powers of x (namely 1, x and x2 ) are linearly independent (Subsection 6.2.1). ◦ Warning. It should be emphasized that what we say here for functions may be applied to more general mathematical entities, provided that, for these entities, we have: (i) defined the product by a scalar and the sum, and (ii) introduced the zero entity. In other words, the considerations in this section are applicable not only to functions (as well as vectors and matrices); they could also be used for other mathematical entities as well, entities that we have not yet introduced. [Indeed, later in this volume, we will introduce related concepts for operators (Subsection 9.8.2), physicists’ vectors (Chapter 15) and tensors (Section 23.9).]
• Linear combinations of functions

Akin to what we did in Subsection 3.2.2 for vectors and matrices, we have the following

Definition 113 (Linear combination of functions). By linear combination of n given functions, $f_1(x), \dots, f_n(x)$, with n given scalars $c_1, \dots, c_n$, we mean the following expression
$$ \sum_{k=1}^{n} c_k f_k(x) = c_1 f_1(x) + \cdots + c_n f_n(x), \tag{6.12} $$
namely the sum of the products of each function by a coefficient.
• Linear dependence and independence of functions

Consider the following

Definition 114 (Zero function). Consider the function that vanishes at every point of its domain of definition R. This is called the zero function and is denoted by 0(x). Specifically, we have
$$ 0(x) = 0, \quad \text{for all } x \in R. \tag{6.13} $$
A nonzero function is any function that differs from the zero function. Then, akin to what we did for vectors and matrices in Section 3.3, we have the following (compare to the equivalent Eq. 3.35 for vectors)

Definition 115 (Linear dependence of functions). Consider the nonzero functions $f_1(x), \dots, f_n(x)$, which have a common domain of definition R. These functions are called linearly dependent iff there exists a nontrivial linear combination (namely a linear combination with coefficients $c_k$ not all equal to zero) such that
$$ \sum_{k=1}^{n} c_k f_k(x) = 0, \tag{6.14} $$
for every x ∈ R.

In analogy with Eq. 3.36 for vectors, at least one of the coefficients, say $c_p$, must differ from zero. Then, Eq. 6.14 may be used to express $f_p(x)$ as a linear combination of the others, as
$$ f_p(x) = -\sum_{\substack{k=1 \\ (k \ne p)}}^{n} \frac{c_k}{c_p}\, f_k(x). \tag{6.15} $$

We also have the following

Definition 116 (Linear independence of functions). The nonzero functions $f_k(x)$ (k = 1, . . . , n), with a common domain of definition R, are called linearly independent iff they are not linearly dependent, namely iff
$$ \sum_{k=1}^{n} c_k f_k(x) = 0, \tag{6.16} $$
for every x ∈ R, implies necessarily $c_k = 0$ (k = 1, . . . , n).

Then, we have the following

Theorem 63 (Equating coefficients). Given a set of n nonzero linearly independent functions $f_k(x)$ (k = 1, . . . , n), with a common domain of definition R, if we have that
$$ \sum_{k=1}^{n} a_k f_k(x) = \sum_{k=1}^{n} b_k f_k(x), \tag{6.17} $$
for every x ∈ R, then we have, necessarily,
$$ a_k = b_k \quad (k = 1, \dots, n). \tag{6.18} $$

◦ Proof: Equation 6.17 may be written as
$$ \sum_{k=1}^{n} (a_k - b_k)\, f_k(x) = 0, \tag{6.19} $$
which implies $a_k - b_k = 0$ (in agreement with Eq. 6.18), because of the linear independence of the $f_k(x)$.

Similarly, you may show that, if
$$ \sum_{k=1}^{n} a_k f_k(x) + \sum_{k=1}^{n} b_k f_k(x) = \sum_{k=1}^{n} c_k f_k(x), \tag{6.20} $$
for all values of x in the common domain of definition, then necessarily
$$ a_k + b_k = c_k \quad (k = 1, \dots, n), \tag{6.21} $$
provided of course that the functions $f_k(x)$ are linearly independent.
6.2.1 Linear independence of 1, x and x2
♥
As an illustrative example, consider the linear independence of the first three natural powers of x, namely 1, x and x2 . ◦ Comment. I am not addressing the linear independence of the first n powers of x, with n > 2, because its proof with the know–how available here is complicated even for n = 3. For, we have to resort to the Gaussian elimination technique, because we have not introduced the determinants on n × n matrices, for n > 3. [You’ll be glad to know that a simple proof is
presented in Subsection 6.3.5. However, as we will see, such proof is indirect, whereas that in this section is direct (namely based upon the definition), and as such it is presented primarily to give you a better understanding of the meaning of linear independence of functions.]

We have the following

Theorem 64. The powers 1, x, and $x^2$ are linearly independent.

◦ Proof: By definition of linear independence of functions (Definition 116, p. 213), this is equivalent to showing that, if $c_0 + c_1 x + c_2 x^2 = 0$ for all the values of x, then, necessarily, we must have $c_0 = c_1 = c_2 = 0$. Indeed, it is sufficient to show this for just three different values of x. [For, if $c_0 = c_1 = c_2 = 0$ is true for just three values, it will be automatically true if we have more than three values, even for an infinite number of values.] Accordingly, consider $c_0 + c_1 x_k + c_2 x_k^2 = 0$ (k = 1, 2, 3), where $x_1$, $x_2$ and $x_3$ differ from each other. These three equations may be written as
$$ \mathsf{X}\,\mathsf{c} := \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix} \begin{Bmatrix} c_0 \\ c_1 \\ c_2 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 0 \end{Bmatrix}. \tag{6.22} $$
To show that c = 0, it is sufficient to show that the determinant of X does not vanish (Theorem 40, p. 139, Item 1). Indeed, using the Sarrus rule (Eq. 3.166), we have
$$ |\mathsf{X}| = x_1^2 x_3 + x_2^2 x_1 + x_3^2 x_2 - x_1^2 x_2 - x_2^2 x_3 - x_3^2 x_1. \tag{6.23} $$
On the other hand, we have
$$ (x_1 - x_2)(x_2 - x_3)(x_3 - x_1) = x_1 x_2 x_3 + x_1^2 x_3 + x_2^2 x_1 + x_3^2 x_2 - x_1^2 x_2 - x_2^2 x_3 - x_3^2 x_1 - x_1 x_2 x_3 = |\mathsf{X}|, \tag{6.24} $$
which shows that the determinant does not vanish, since $x_1$, $x_2$ and $x_3$ differ from each other.
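The identity between the Sarrus expansion (Eq. 6.23) and the product of differences (Eq. 6.24) is easy to spot-check with exact rational arithmetic. The following is a small sketch; the function names and the sample points are illustrative assumptions.

```python
from fractions import Fraction


def sarrus_det(x1, x2, x3):
    """Determinant of X in Eq. 6.22, expanded with the Sarrus rule (Eq. 6.23)."""
    return (x1**2 * x3 + x2**2 * x1 + x3**2 * x2
            - x1**2 * x2 - x2**2 * x3 - x3**2 * x1)


def product_form(x1, x2, x3):
    """Left side of Eq. 6.24: the product of the three differences."""
    return (x1 - x2) * (x2 - x3) * (x3 - x1)


# Three distinct sample points, kept as exact fractions.
points = (Fraction(1), Fraction(3, 2), Fraction(-2))
print(sarrus_det(*points), product_form(*points))

# With a repeated point the determinant vanishes, as the product form predicts.
print(sarrus_det(1, 1, 2), product_form(1, 1, 2))
```

A numerical check on a few points is of course not a proof; it only confirms that the two expressions agree where they are sampled.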
6.3 Polynomials In this section, we consider polynomials, namely arbitrary linear combinations of natural powers. Specifically, we have the following (recall Definition 16, p. 37, of natural powers of x, namely 1, x, x2 , . . . )
Definition 117 (Polynomial). A linear combination of the first n + 1 natural powers of x (namely of $x^k$, with k = 0, . . . , n), is called a polynomial of degree n. Specifically, the polynomial $p_n(x)$ is given by
$$ p_n(x) := a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{k=0}^{n} a_k x^k, \tag{6.25} $$
where $a_n \ne 0$ and n ∈ [0, ∞). [In the above summation, we have $x^0 = 1$ for any $x \ne 0$, and hence we assume $0^0 = 1$ (see Remark 9, p. 37).] The highest–degree term, $a_n x^n$, is called the leading term of the polynomial, whereas $a_n$ is called its leading coefficient. The polynomial with n = 0 (a constant, $a_0$) is called the zeroth–degree polynomial. The zero function 0(x) is a zeroth–degree polynomial with $a_0 = 0$, and is also referred to as the zero polynomial.

◦ Warning. I avoid the symbol $P_n(x)$, because in this book it is used for the Legendre polynomials.

◦ Comment. If the sum includes only one term, the expression in Eq. 6.25 is also referred to as a monomial. [This is true even for polynomials with more than one independent variable. For instance, $C x^p y^q$ is a monomial.] Hence, a polynomial consists of the sum of one or more monomials.

In the remainder of this section, we address the relationship between polynomials and geometric figures, specifically, that between: (i) the zeroth–degree polynomial and a constant function (Subsection 6.3.1), (ii) first–degree polynomials and straight lines (Subsection 6.3.2), and (iii) second–degree polynomials and parabolas (Subsection 6.3.3). Then, we consider some general properties of polynomials and their roots (Subsection 6.3.4). Finally, we introduce the Vandermonde matrix and use this to address the problem of the linear independence of the natural powers $x^n$, for any value of n ≥ 0 (Subsection 6.3.5).
6.3.1 Zeroth–degree polynomials The zeroth–degree polynomial consists simply of a constant. Its graph consists of a horizontal line (Fig. 6.7).
6.3.2 First–degree polynomials Consider a first–degree polynomial, which is given by y(x) = a0 + a1 x and is typically written as
Fig. 6.7 The function y = 7.5
Fig. 6.8 The function y = .7 x + 2
y(x) = m x + b,
(6.26)
where b is the y-intercept of the line (namely the value that y takes when x = 0). The graph of this type of function is presented in Fig. 6.8. ◦ Warning. As you can see, the graph represents a straight line. Indeed, in analytic geometry, a straight line is defined as the graph of a first–order polynomial (Definition 139, p. 262). [There is an exception, though. Vertical lines correspond to the equation x = constant!] On the other hand, in Euclidean geometry, a straight line is defined in Definition 80, p. 176. The two definitions are reconciled in Subsection 7.3.1.
• Root of a linear equation You might wonder under what conditions the straight line in Eq. 6.26 intersects the x-axis, namely the line y = 0. The solution to this problem is obtained by satisfying simultaneously the following two equations y = m x+b and y = 0. Combining, we obtain the following equation m x + b = 0.
(6.27)
This is a linear algebraic equation, whose solution, provided that m ≠ 0, is (Subsection 1.1.1)
x = −b/m.
(6.28)
• Slope and average slope

Here, we introduce the notion of slope, which is used later on. If you are driving your car along a highway and read a sign that says “Slope = 15%,” you might want to know what that means. “Slope = 15%” means that the highway altitude above sea level increases (or decreases) by 15% of the horizontal distance covered. With this standard definition, a slope of 100% means that the hill makes a 45° angle with respect to the horizontal plane. Indeed, for a line at a 45° angle, the vertical displacement equals the horizontal one.

Remark 49. This is contrary to the (incorrect) belief held by many people that the slope is the increase in altitude divided by the length of the road covered. Had we used this (nonstandard) definition for the slope, a 100% slope would have meant an angle of 90° (a vertical cliff!), since, on a vertical cliff (and only on a vertical cliff), the increase in altitude equals the length of the path covered. With the standard definition, a vertical cliff corresponds to an infinite slope.

Accordingly, for future reference, consider the following

Definition 118 (Average slope of the graph of a function). The average slope, $m_{1,2}$, of the graph of any function y = f(x) between two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ is given by the increase in y divided by the increase in x, namely (Fig. 6.9)
$$ m_{1,2} = \frac{y_2 - y_1}{x_2 - x_1}. \tag{6.29} $$
Fig. 6.9 Average slope
The average slope of the function y = m x + b (see Eq. 6.26) equals
$$ \frac{y_2 - y_1}{x_2 - x_1} = \frac{(m\,x_2 + b) - (m\,x_1 + b)}{x_2 - x_1} = m, \tag{6.30} $$
independently of the values of x1 and x2 . Thus, the average slope of a first– degree polynomial is independent of x1 and x2 .
• Linear interpolation/extrapolation

Consider the function
$$ f(x) = \frac{x - x_2}{x_1 - x_2}\, y_1 + \frac{x - x_1}{x_2 - x_1}\, y_2 = \frac{y_2 - y_1}{x_2 - x_1}\, x + \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}. \tag{6.31} $$
This equation represents a first–degree polynomial. Indeed, it corresponds to Eq. 6.26, with $m = (y_2 - y_1)/(x_2 - x_1)$ and $b = (x_2 y_1 - x_1 y_2)/(x_2 - x_1)$. In addition, the first equality in the above equation shows that $f(x_1) = y_1$ and $f(x_2) = y_2$. Thus, Eq. 6.31 represents a linearly varying function, whose graph is the straight line through the points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$. Accordingly, if $x \in (x_1, x_2)$, Eq. 6.31 is called a linear interpolation between $P_1$ and $P_2$. It is called a linear extrapolation if $x \notin (x_1, x_2)$.
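Equation 6.31 is the algorithm behind the ruler-and-pencil construction used for the runner example. A minimal sketch follows; the function names `lerp` and `piecewise_linear`, and the use of the standard `bisect` module to locate the segment, are choices made here for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def lerp(x: float, x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight line through (x1, y1) and (x2, y2): first form of Eq. 6.31."""
    return (x - x2) / (x1 - x2) * y1 + (x - x1) / (x2 - x1) * y2


def piecewise_linear(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Interpolate tabulated points segment by segment (xs strictly increasing)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated interval")
    # Index i of the segment [xs[i], xs[i+1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    return lerp(x, xs[i], ys[i], xs[i + 1], ys[i + 1])


# Interpolation inside (x1, x2) and extrapolation outside it, same formula.
print(lerp(2.0, 1.0, 2.0, 3.0, 6.0))   # between the two points
print(lerp(5.0, 1.0, 2.0, 3.0, 6.0))   # beyond the second point

# A table of measured values connected by segments.
print(piecewise_linear(2.5, [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]))
```

The same `lerp` expression serves for interpolation and extrapolation; only the location of x relative to the interval changes.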
6.3.3 Second–degree polynomials. Quadratic equations The second–degree polynomial y = a0 + a1 x + a2 x2 , typically written as y = a x2 + b x + c,
(6.32)
is also frequently encountered in this book. The corresponding graph is known as a vertical parabola. [More on parabolas in Subsection 7.7.2.] As an example, the graph of the function y(x) = 2 x2 − 4 x + 1 is shown in Fig. 6.10. Note that the parabola is symmetric with respect to the vertical line x = 1. Indeed, it may be written as y(x) = 2 (x − 1)2 − 1. ◦ Comment. Note also that the parabola in Fig. 6.10 curves upwards. [The proper terminology is convex (or concave upward), a term that is introduced later (see Definition 192, p. 399).] This is due to the fact that a > 0. For a < 0, the parabola curves downwards.
Fig. 6.10 The function y(x) = 2 x2 − 4 x + 1 = 2 (x − 1)2 − 1
• Quadratic equations

As in the case of a straight line, you might wonder under what conditions the parabola in Eq. 6.32 intersects the x-axis, namely the line y = 0. The solution to this problem is obtained by satisfying simultaneously the following two equations $y = a x^2 + b x + c$ and y = 0. Combining, we obtain $a x^2 + b x + c = 0$. As you might have seen in high school, this corresponds to the following⁵

Definition 119 (Quadratic equation). An equation of the type
$$ a z^2 + b z + c = 0 \tag{6.33} $$
is called a quadratic algebraic equation, or simply a quadratic equation. ◦ Warning. Here, I am using z instead of x, because in general the solutions to Eq. 6.33 (called the roots of the quadratic equation) may be complex, and z is the letter typically used to denote complex numbers. Indeed, in this entire section, we never state that the roots must be real. In fact, all the material presented here is valid for real as well as complex roots. [This is a good time to go back and read Section 1.7, on imaginary and complex numbers.]
• Roots of a quadratic equation

In order to find the solutions to Eq. 6.33, let us rewrite it as
$$ a z^2 + b z + c = a \left( z + \frac{b}{2a} \right)^{2} + c - \frac{b^2}{4a} = 0, \tag{6.34} $$
⁵ Quadratic equations have been around for over a millennium. The al-Khwārizmī book (Ref. [3]) already includes the analysis of quadratic equations (Footnote 2, p. 12).
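or
$$ \left( z + \frac{b}{2a} \right)^{2} = \frac{b^2 - 4ac}{4a^2}. \tag{6.35} $$
This yields the two solutions (roots) of Eq. 6.33, namely (use Definition 17, p. 38, of square roots)
$$ z = z_\pm := \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{6.36} $$

◦ Comment. Let us verify that our result is correct. Note that, for any pair of complex numbers A and B, we have
$$ (A + B)(A - B) = A^2 - B^2, \tag{6.37} $$
as you may verify. Therefore, we have
$$ \begin{aligned} a\,(z - z_+)(z - z_-) &= a \left[ z + \frac{b}{2a} - \sqrt{\frac{b^2}{4a^2} - \frac{c}{a}} \right] \left[ z + \frac{b}{2a} + \sqrt{\frac{b^2}{4a^2} - \frac{c}{a}} \right] \\ &= a \left[ \left( z + \frac{b}{2a} \right)^{2} - \frac{b^2}{4a^2} + \frac{c}{a} \right], \end{aligned} \tag{6.38} $$
or, expanding and simplifying,
$$ a\,(z - z_+)(z - z_-) = a z^2 + b z + c. \tag{6.39} $$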
Accordingly, if z takes the values $z_+$ or $z_-$, the left side of the above equation vanishes, and hence the quadratic equation is satisfied.

From Eq. 6.39, we uncover an important relationship between the coefficients of a quadratic equation and its roots. Specifically, expanding Eq. 6.39 and equating the coefficients on the two sides of the equation, we obtain
$$ z_+ + z_- = -b/a \quad\text{and}\quad z_+ z_- = c/a. \tag{6.40} $$
[The first is also an immediate consequence of Eq. 6.36.]
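The root formula (Eq. 6.36) and the sum/product relations (Eq. 6.40) can be checked numerically. The sketch below uses Python's `cmath.sqrt` so that complex roots are handled as well; the function name `quadratic_roots` is an assumption made for illustration, and the sample coefficients are those of the parabola in Fig. 6.10.

```python
import cmath


def quadratic_roots(a: complex, b: complex, c: complex):
    """Return (z_plus, z_minus) from Eq. 6.36; the roots may be complex."""
    if a == 0:
        raise ValueError("a must differ from zero for a quadratic equation")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


a, b, c = 2.0, -4.0, 1.0  # y = 2 x^2 - 4 x + 1
z_plus, z_minus = quadratic_roots(a, b, c)

print(z_plus, z_minus)
print(z_plus + z_minus, -b / a)  # sum of the roots vs. -b/a
print(z_plus * z_minus, c / a)   # product of the roots vs. c/a
```

The values are returned as Python complex numbers even when the roots are real; their imaginary parts are then zero.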
• Alternate expression for the two roots Exploiting the second in Eq. 6.40, we may obtain another expression for the roots, namely
$$ z_\pm = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}. \tag{6.41} $$
[This expression may be obtained also by noting that Eq. 6.33 implies
$$ c\,w^2 + b\,w + a = 0, \tag{6.42} $$
with w = 1/z. Using Eq. 6.36 with a and c interchanged, one obtains that the two roots of this equation are given by
$$ w = w_\pm := \frac{-b \pm \sqrt{b^2 - 4ac}}{2c} = \frac{1}{z_\mp}, \tag{6.43} $$
in agreement with Eq. 6.41.]
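As a quick consistency check, the alternate expression (Eq. 6.41) can be compared with the standard one (Eq. 6.36). This is only a sketch: the function names are hypothetical, and the alternate form is assumed to be used with c ≠ 0 so that the roots differ from zero.

```python
import cmath


def roots_standard(a, b, c):
    """(z_plus, z_minus) from Eq. 6.36."""
    s = cmath.sqrt(b * b - 4 * a * c)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)


def roots_alternate(a, b, c):
    """(z_plus, z_minus) from Eq. 6.41; assumes c != 0."""
    s = cmath.sqrt(b * b - 4 * a * c)
    # Note the swapped signs: z_plus pairs with the minus sign in the denominator.
    return 2 * c / (-b - s), 2 * c / (-b + s)


std = roots_standard(2.0, -4.0, 1.0)
alt = roots_alternate(2.0, -4.0, 1.0)
print(std)
print(alt)
```

As a numerical aside that goes beyond the text: in floating-point arithmetic, the two forms are often combined, choosing for each root the expression that avoids subtracting nearly equal numbers.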
• Discriminant. Its role

Let us consider the quantity
$$ \Delta := b^2 - 4ac, \tag{6.44} $$
which is called the discriminant of the quadratic equation, since it discriminates between roots of a different type. Indeed, for real coefficients a, b and c, if Δ > 0, the two roots are real and distinct. On the other hand, if Δ < 0 the two roots form a pair of complex conjugate numbers. Finally, if Δ = 0, we only have one root, namely x = −b/(2a). Indeed, Δ = 0 implies $c = b^2/(4a)$, and hence
$$ a x^2 + b x + c = a x^2 + b x + \frac{b^2}{4a} = a \left( x + \frac{b}{2a} \right)^{2} = 0. \tag{6.45} $$
As mathematicians prefer to say, if Δ = 0 the two roots coincide, namely we have a double root. [Indeed, if we divide $a x^2 + b x + c = 0$ by x + b/(2a), we are left with a[x + b/(2a)] = 0, which indicates that there is a second root which is again x = −b/(2a). This issue is addressed in greater depth in Definition 121, p. 223, on repeated roots.]

◦ Comment. The above considerations have a geometrical interpretation. If Δ > 0, the graph of the parabola $y = a x^2 + b x + c$ intersects the real axis, namely the x-axis is a secant (Section 5.3). According to Eq. 6.36, the two intersection points are given by $x = x_\pm := (-b \pm \sqrt{b^2 - 4ac}\,)/(2a)$. On the other hand, if Δ = 0, the real axis is tangent to the parabola $y = a x^2 + b x + c$, with x = −b/(2a) being the tangency point, which may be considered as a double intersection point (namely two intersection points that have merged
into one). Finally, if Δ < 0, the graph of the parabola does not intersect the x-axis, or, as mathematicians prefer to say, the two intersections are complex (in the sense of not being real points).

◦ Comment. Here, we begin to see the advantage of using: (i) complex numbers and (ii) repeated roots. By doing this, for a quadratic equation we always have two roots. [As we will see in Vol. II, polynomials of degree n always have n roots (see also Corollary 2, p. 225, and Remark 50, p. 225).]
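The role of the discriminant can be summarized as a three-way test. The sketch below assumes real coefficients with a ≠ 0; the function name and the wording of the returned labels are illustrative. The exact comparison with zero is reliable for integers or fractions, whereas with floating-point coefficients a tolerance would be a safer choice.

```python
def classify_roots(a, b, c) -> str:
    """Classify the roots of a x^2 + b x + c = 0 for real a, b, c (a != 0)."""
    if a == 0:
        raise ValueError("a must differ from zero for a quadratic equation")
    delta = b * b - 4 * a * c  # discriminant, Eq. 6.44
    if delta > 0:
        return "two distinct real roots (x-axis is a secant)"
    if delta == 0:
        return "one double real root (x-axis is tangent)"
    return "two complex conjugate roots (no real intersection)"


print(classify_roots(2, -4, 1))  # the parabola of Fig. 6.10
print(classify_roots(1, 2, 1))   # (x + 1)^2
print(classify_roots(1, 0, 1))   # x^2 + 1
```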
6.3.4 Polynomials of degree n
♣
Having introduced the roots of a polynomial of degree two, namely the roots of a quadratic equation, we can broaden our results and discuss the roots of a polynomial of degree n. We have the following Definition 120 (Roots of a polynomial). A polynomial of degree n is called divisible by x − c, iff it may be written as pn (x) = (x − c) pn−1 (x),
(6.46)
where $p_{n-1}(x)$ denotes a suitable polynomial of degree n − 1. Correspondingly, c is called a root of the polynomial $p_n(x)$.

We have the following

Definition 121 (Multiple roots of a polynomial). We say that c is a multiple (or repeated) root of $p_n(x)$ with multiplicity m iff $p_n(x)$ is divisible by $(x - c)^m$, namely iff we can write
$$ p_n(x) = (x - c)^m\, p_{n-m}(x), \tag{6.47} $$
where pn−m (x) is a suitable polynomial of degree n − m. [You may revisit the case of double roots for a quadratic equation with discriminant equal to zero (Eq. 6.45).] A root with multiplicity m = 1 is called a simple root.
• Properties of roots

To prove the lemma that follows, we need a nice trick I learned in high school, namely that
$$ A^p - B^p = \left( A^{p-1} + A^{p-2} B + A^{p-3} B^2 + \cdots + B^{p-1} \right) (A - B), \tag{6.48} $$
where A and B are any two complex numbers. [For p = 2 we recover Eq. 6.37, namely $A^2 - B^2 = (A + B)(A - B)$.] Indeed, we have
$$ \begin{aligned} \left( A^{p-1} + A^{p-2} B + A^{p-3} B^2 + \cdots + A B^{p-2} + B^{p-1} \right) (A - B) &= \left( A^{p} + A^{p-1} B + A^{p-2} B^2 + \cdots + A^2 B^{p-2} + A B^{p-1} \right) \\ &\quad - \left( A^{p-1} B + A^{p-2} B^2 + \cdots + A^2 B^{p-2} + A B^{p-1} + B^{p} \right) \\ &= A^p - B^p. \end{aligned} \tag{6.49} $$
We have the following

Lemma 8. The polynomial $p_n(x)$ is divisible by x − c (namely c is a root of the polynomial) iff $p_n(c) = 0$.

◦ Proof: ♠ If $p_n(x)$ is divisible by x − c, then Eq. 6.46 yields immediately $p_n(c) = 0$. Vice versa, if $p_n(c) = 0$, we have
$$ p_n(x) = p_n(x) - p_n(c) = \sum_{k=0}^{n} a_k \left( x^k - c^k \right) = (x - c) \sum_{k=1}^{n} a_k\, q_k(x), \tag{6.50} $$
where $q_1(x) = 1$, whereas (use Eq. 6.48)
$$ q_k(x) = x^{k-1} + x^{k-2} c + x^{k-3} c^2 + \cdots + x\, c^{k-2} + c^{k-1} \quad (k > 1), \tag{6.51} $$
is a polynomial of degree k − 1. Equation 6.50 may be written as
$$ p_n(x) = (x - c)\, p_{n-1}(x), \tag{6.52} $$
where $p_{n-1}(x) := \sum_{k=1}^{n} a_k\, q_k(x)$. Hence, $p_n(x)$ is divisible by x − c.
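Lemma 8 can be illustrated computationally. The sketch below does not build the polynomials $q_k(x)$ of Eq. 6.51; instead it uses synthetic division (Horner's scheme), a standard technique that returns the quotient and the remainder of the division by x − c, where the remainder equals $p_n(c)$. The function name and the sample cubic are assumptions made for illustration.

```python
def divide_by_linear(coeffs, c):
    """Divide p(x) = sum(coeffs[k] * x**k) by (x - c).

    Returns (quotient coefficients in ascending powers, remainder),
    where the remainder equals p(c).
    """
    acc = 0
    partial = []
    for a in reversed(coeffs):   # process from the leading coefficient down
        acc = acc * c + a
        partial.append(acc)
    remainder = partial.pop()    # the last partial value is p(c)
    partial.reverse()            # back to ascending powers
    return partial, remainder


# p(x) = x^3 - 6 x^2 + 11 x - 6 = (x - 1)(x - 2)(x - 3)
p = [-6, 11, -6, 1]

print(divide_by_linear(p, 1))  # c = 1 is a root: zero remainder
print(divide_by_linear(p, 4))  # c = 4 is not a root: nonzero remainder p(4)
```

When the remainder vanishes, the returned quotient plays the role of $p_{n-1}(x)$ in Eq. 6.52.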
◦ Comment. Given the importance of the lemma, I’ll give you an alternate proof of its second part. Set x = c + y to obtain
$$ p_n(x) = \sum_{k=0}^{n} a_k\, (c + y)^k = \sum_{k=0}^{n} b_k\, y^k =: \breve{p}_n(y), \tag{6.53} $$
where the coefficients $b_k$ may be obtained in terms of the coefficients $a_k$. In particular, we have that $b_0 = \breve{p}_n(0) = \sum_{k=0}^{n} a_k c^k = p_n(c)$, which vanishes by hypothesis. Hence, $\breve{p}_n(y) = y\, \breve{p}_{n-1}(y)$, for some $\breve{p}_{n-1}(y)$. Back substituting y = x − c, we have $p_n(x) = (x - c)\, p_{n-1}(x)$, in agreement with Eq. 6.52.

We have the following

Theorem 65. If $p_n(x_k) = 0$ (k = 1, . . . , q), where
$$ p_n(x) := a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{6.54} $$
and the $x_k$ are not necessarily distinct, then
$$ p_n(x) = (x - x_1)\,(x - x_2) \cdots (x - x_q)\; p_{n-q}(x), \tag{6.55} $$
where $p_{n-q}(x)$ denotes a suitable polynomial of degree n − q.

◦ Proof: Since $p_n(x_1) = 0$, $p_n(x)$ is divisible by $x - x_1$ (Lemma 8), and Eq. 6.52 yields $p_n(x) = (x - x_1)\, p_{n-1}(x)$, for some suitable polynomial $p_{n-1}(x)$ of degree n − 1. Assume first that we have only distinct roots. Hence, $x_2 \ne x_1$. Then, the fact that $p_n(x_2) = 0$ implies $p_{n-1}(x_2) = 0$. Hence, still according to Lemma 8 above, we have $p_{n-1}(x) = (x - x_2)\, p_{n-2}(x)$, for some suitable polynomial $p_{n-2}(x)$ of degree n − 2. We can continue this way, until we arrive at $p_{n-q+1}(x) = (x - x_q)\, p_{n-q}(x)$, for some polynomial $p_{n-q}(x)$ of degree n − q. Combining, we obtain Eq. 6.55. In order to complete the proof, assume that $x_1$ is a root of multiplicity m. Then, we have, by definition, $p_n(x) = (x - x_1)^m\, p_{n-m}(x)$, and the above procedure may be repeated with minor modifications. [You are encouraged to fill in the details.]

We have the following (this generalizes Eq. 6.39, limited to quadratic polynomials)

Theorem 66. If the polynomial
$$ p_n(x) := a_n x^n + \cdots + a_2 x^2 + a_1 x + a_0 \tag{6.56} $$
has n roots, not necessarily distinct, say $x_k$ with k = 1, . . . , n, then
$$ p_n(x) = a_n\, (x - x_1)\,(x - x_2) \cdots (x - x_n). \tag{6.57} $$

◦ Proof: We have
$$ p_n(x) = (x - x_1)\,(x - x_2) \cdots (x - x_n)\; p_0(x), \tag{6.58} $$
where the polynomial $p_0(x)$ is a polynomial of degree zero (Theorem 65 above), namely a constant, say $p_0(x) = A$. Then, equating the leading coefficients of the right sides of Eqs. 6.56 and 6.58 (that is, $a_n$ and A, respectively), one obtains $A = a_n$, in agreement with Eq. 6.57.

We have the following

Corollary 2. A polynomial $p_n(x)$ of degree n has at most n roots.

◦ Proof: Indeed, if $p_n(x)$ had n + 1 roots, it would be of degree n + 1. [Use Eq. 6.57.]

Remark 50. For your information, the convenience of counting roots with their multiplicity is further supported by the fact that in the complex field a
polynomial of degree n always has n roots (real or complex), provided that we count them with their multiplicity. [This result, known as the fundamental theorem of algebra, is addressed in Vol. II. Its proof stems directly from Eq. 6.46 with a subtle addition: we have to prove that any polynomial has at least one root.]

◦ Comment. In view of Lemma 8, I could have defined c to be a root of $p_n(x)$ iff $p_n(c) = 0$. However, Definition 120, p. 223, is preferable because it simplifies the discussion of multiple roots (Definition 121, p. 223).
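The factored form of Eq. 6.57 can be expanded mechanically, one factor (x − x_k) at a time. The sketch below is illustrative only: the function names and the sample roots (with a double root at 1) are assumptions, and coefficients are stored in ascending powers.

```python
def poly_from_roots(leading, roots):
    """Coefficients (ascending powers) of leading * prod(x - r), as in Eq. 6.57."""
    coeffs = [leading]
    for r in roots:
        shifted = [0] + coeffs                   # x * p(x)
        scaled = [-r * a for a in coeffs] + [0]  # -r * p(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's scheme."""
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value


# 2 (x - 1)^2 (x - 3): a double root at 1 and a simple root at 3.
p = poly_from_roots(2, [1, 1, 3])
print(p)
print([evaluate(p, r) for r in (1, 3)])  # the polynomial vanishes at its roots
print(len(p) - 1, p[-1])                 # degree and leading coefficient
```

Three roots (counted with multiplicity) yield a polynomial of degree three, consistent with Corollary 2.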
6.3.5 Linear independence of 1, x,. . . , xn
♣
We conclude this section by addressing the issue regarding the linear independence of the natural powers $x^k$ (k = 0, 1, . . . , n), for an arbitrarily large value of n ≥ 0. [We have seen this for the limited case n = 2 in Subsection 6.2.1.] Consider the following⁶

Definition 122 (Vandermonde matrix). The (n + 1) × (n + 1) matrix
$$ \mathsf{V} := \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \tag{6.59} $$
is called the Vandermonde matrix.

⁶ Named after Alexandre-Théophile Vandermonde (1735–1796), a French mathematician, musician and chemist.

We have the following

Lemma 9. The rows of the Vandermonde matrix are linearly independent, provided that $x_0, \dots, x_n$ differ from each other.

◦ Proof: Note that the equations $\sum_{k=0}^{n} a_k x_j^k = 0$ (j = 0, 1, . . . , n), with the $a_k$ arbitrary, imply necessarily that $a_k = 0$, with k = 0, 1, . . . , n. [For, if this were not the case, the n-th degree polynomial $\sum_{k=0}^{n} a_k x^k$ would have n + 1 roots, namely $x_0, x_1, \dots, x_n$, in contradiction to Corollary 2, p. 225.] We can state this fact as follows: the only solution to the system
$$ \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \\ \cdots \\ a_n \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 0 \\ \cdots \\ 0 \end{Bmatrix} \tag{6.60} $$
is the trivial one, $a_0 = a_1 = \cdots = a_n = 0$. Hence, using Theorem 25, p. 118, on homogeneous n × n systems, we obtain that the rows of the Vandermonde matrix are linearly independent.

Finally, we are now in a position to prove the linear independence of 1, x, . . . , $x^n$, with n arbitrarily large. We have the following

Theorem 67 (Linear independence of the natural powers of x). The natural powers 1, x, . . . , $x^n$, with n arbitrarily large, are linearly independent.

◦ Proof: By definition of linear independence of functions (Definition 116, p. 213), this theorem is fully equivalent to stating that the coefficients $a_k$ (k = 0, 1, . . . , n) necessarily vanish whenever $\sum_{k=0}^{n} a_k x^k = 0$ for all the values of x. It should be noted that it is sufficient to ascertain that this is true just for any n + 1 distinct values of x. [For, if $a_k = 0$ (k = 0, 1, . . . , n) for any distinct n + 1 values, it will be all the more true for more than n + 1 values, even for infinitely many values, since the additional conditions would be automatically satisfied (redundant equations).] Accordingly, this theorem is a direct consequence of the preceding lemma.

Remark 51. In Subsection 6.2.1, the definition of linear independence of functions was illustrated by showing that the first three natural powers of x, namely 1, x and $x^2$, are linearly independent. The proof presented here is not only less cumbersome and more elegant, but also valid for n arbitrarily large. The proof in Subsection 6.2.1, however, is more gut–level, since it is directly based upon the definition of linear independence of functions. Indeed, as stated there, the primary objective there was to help you understand the meaning of such a definition. Here, we have learned another important lesson regarding mathematics: “Sometimes a proof may be obtained more easily if we approach the problem indirectly.”
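Lemma 9 can be spot-checked for specific points by computing the determinant of the Vandermonde matrix exactly: a nonzero determinant means that the homogeneous system of Eq. 6.60 admits only the trivial solution. The sketch below compares Gaussian elimination over exact fractions with the classical product-of-differences formula for the Vandermonde determinant, a standard result that is not derived in this section. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde(xs):
    """Rows (1, x_j, x_j^2, ..., x_j^n), as in Eq. 6.59."""
    n = len(xs) - 1
    return [[Fraction(x) ** k for k in range(n + 1)] for x in xs]


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap changes the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size):
                m[r][k] -= factor * m[col][k]
    return det


def product_of_differences(xs):
    """Product of (x_j - x_i) over all pairs i < j."""
    result = Fraction(1)
    for i, j in combinations(range(len(xs)), 2):
        result *= Fraction(xs[j]) - Fraction(xs[i])
    return result


distinct = [0, 1, 2, 4]
print(determinant(vandermonde(distinct)), product_of_differences(distinct))

repeated = [1, 2, 1]  # two coinciding points: the rows become dependent
print(determinant(vandermonde(repeated)), product_of_differences(repeated))
```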
6.4 Trigonometric functions

In this section, we introduce the traditional trigonometric functions, namely sin x, cos x and tan x, which you may have encountered in high school.⁷ Specifically, here we define these functions (Subsection 6.4.1), and then present their properties (Subsection 6.4.2). Next, we discuss ways to estimate π (Subsection 6.4.3), as well as a closely related important inequality, namely Eq. 6.89 (Subsection 6.4.4). Finally, we show how to evaluate the area of a circle (Subsection 6.4.5).

⁷ Trigonometry denotes the branch of geometry dedicated to the study of the relationships between sides and angles of a triangle. The term comes from ancient Greek: τρίγωνον (trigonon, triangle) and μέτρον (metron, measure). The term sine comes from Latin sinus (bosom, fold, bay), a Latin mistranslation of the Arabic word jiba, which is a transliteration of the Sanskrit word jya–ardha (half the chord), or jya; jiba was mistakenly understood as jayb, indeed bosom, fold, bay. The term cosine was originally written co–sine, short for sinus complementi, Latin for sine of the complementary angle (namely the complement to π/2). The term tangent stems from the fact that the trigonometric tangent is measured along the line tangent to the circle at Q (see Fig. 6.11).

6.4.1 Sine, cosine and tangent

Consider Fig. 6.11 below, specifically the horizontal ray that emanates from the center O of a circle C, and goes through a point Q ∈ C, along with a second ray, from O through P. Let θ denote the angle (in radians) between the two rays. [Here, we use θ, instead of x, as the argument of the trigonometric functions in order to avoid confusion between the angle θ and the abscissa x in Fig. 6.11.] Let A be a point of the segment OQ such that the segment AP is perpendicular to the segment OQ. For
$$ \theta \in [0, \pi/2], \tag{6.61} $$
we have the following definitions (see again Fig. 6.11)
$$ \sin\theta := \frac{AP}{OP}, \qquad \cos\theta := \frac{OA}{OP}, \qquad \tan\theta := \frac{\sin\theta}{\cos\theta} = \frac{AP}{OA} = \frac{QB}{OQ}. \tag{6.62} $$

Fig. 6.11 Trigonometric functions
◦ Comment. Note that the considerations on similar triangles presented in Subsection 5.5.2 are essential for the definitions of the trigonometric functions, in that they show that the definitions are independent of the size of the triangle under consideration, and depend only upon the angle θ. In particular, for the second equality in the definition of tan θ, we use the fact that the triangles OAP and OQB are similar, and hence the ratios of the corresponding sides are equal.
The definitions in Eq. 6.62 are limited to θ ∈ [0, π/2] (Eq. 6.61). Next, we extend the definitions to θ ∈ (−π, π]. To this end, let us introduce the x- and y-coordinates (Fig. 6.11). Then, instead of Eq. 6.62, we use the following
Definition 123 (Trigonometric functions). The trigonometric functions, also called circular functions, namely sin θ (sine of θ), cos θ (cosine of θ), and tan θ (tangent of θ), are defined as follows:
\[ \sin\theta = y_P/R, \qquad \cos\theta = x_P/R, \qquad \tan\theta = y_P/x_P = \sin\theta/\cos\theta \quad (x_P \neq 0), \tag{6.63} \]
where $x_P$ and $y_P$ denote abscissa and ordinate of the point P on the circle C of radius R, centered at O (Fig. 6.11). The graphs of these functions are presented in Fig. 6.12. [For future reference the line y = x is also depicted in the figure (thin slanted straight line through the origin).]
As we wanted, these new definitions are not limited to θ ∈ [0, π/2], but are valid for θ ∈ (−π, π]. The difference with respect to Eq. 6.62 is that, contrary to AP and OA, the quantities $x_P$ and $y_P$ can be negative. Specifically, let us introduce the following
Definition 124 (Quadrant). The first, second, third and fourth quadrants denote the following angular intervals, respectively:
(1) θ ∈ (0, π/2);
(2) θ ∈ (π/2, π);
(3) θ ∈ (π, 3π/2), namely θ ∈ (−π, −π/2);
(4) θ ∈ (3π/2, 2π), namely θ ∈ (−π/2, 0).
Fig. 6.12 The functions: (1) y1 = sin x, (2) y2 = cos x and (3) y3 = tan x
Accordingly, the definitions in Eq. 6.63 imply
• sin θ > 0 in the first and second quadrants; sin θ < 0 in the third and fourth quadrants.
• cos θ > 0 in the first and fourth quadrants; cos θ < 0 in the second and third quadrants.
• tan θ > 0 in the first and third quadrants; tan θ < 0 in the second and fourth quadrants.
For future reference, note that the first two in Eq. 6.63 may be written as
\[ x_P = R\cos\theta \quad\text{and}\quad y_P = R\sin\theta. \tag{6.64} \]
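Definition 123 and Eq. 6.64 can be tried out numerically. The following is only a minimal sketch, assuming Python and its standard `math` module; the function names `trig_from_point` and `point_from_angle` are hypothetical and do not come from the book.

```python
import math

def trig_from_point(x_p, y_p):
    """Return (sin, cos, tan) of the angle of the ray from O through P = (x_p, y_p)."""
    radius = math.hypot(x_p, y_p)      # R, the distance of P from the center O
    if radius == 0.0:
        raise ValueError("P must differ from the center O")
    sin_t = y_p / radius               # first of Eq. 6.63
    cos_t = x_p / radius               # second of Eq. 6.63
    tan_t = y_p / x_p if x_p != 0 else None   # third of Eq. 6.63, undefined for x_P = 0
    return sin_t, cos_t, tan_t

def point_from_angle(theta, radius=1.0):
    """Coordinates of P on the circle of the given radius (x_P = R cos θ, y_P = R sin θ)."""
    return radius * math.cos(theta), radius * math.sin(theta)

# A point in the second quadrant, on a circle of radius 5:
sin_t, cos_t, tan_t = trig_from_point(-3.0, 4.0)
print(sin_t, cos_t, tan_t)
```

For this point the sine is positive, whereas the cosine and the tangent are negative, in agreement with the quadrant rules listed above.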
• Average slope vs tan x Here, we discuss the relationship between trigonometric tangent and average slope (Eq. 6.29). Consider the straight line between two points, say $P_1$ and $P_2$ (Fig. 6.9, p. 218). We have
\[ \tan\theta_{1,2} = \frac{y_2 - y_1}{x_2 - x_1} = m_{1,2}. \tag{6.65} \]
[Hint: See Eq. 6.29 and the third in Eq. 6.63.] Had we used the nonstandard definition of slope (namely change in altitude per distance covered, Remark 49, p. 218), the average slope would correspond to sin θ1,2 , as you may verify.
6.4.2 Properties of trigonometric functions Thus far, we have defined the trigonometric functions for θ ∈ (−π, π]. Here, we want to extend the definitions to any value of θ ∈ (−∞, ∞). To this end, note that, if we increase (or decrease) the angle θ by k times 2π, where k denotes any positive or negative integer, we are back at the same angular location we started from. Correspondingly, the trigonometric functions take the same value, namely
\[ \sin(\theta + 2k\pi) = \sin\theta, \qquad \cos(\theta + 2k\pi) = \cos\theta, \qquad \tan(\theta + 2k\pi) = \tan\theta, \tag{6.66} \]
with k = ±1, ±2, . . . . This way, the trigonometric functions are now defined for any value of θ ∈ (−∞, ∞), with the exception of tan θ, which is not defined for θ = (k + 1/2)π (Eq. 6.63). In order to indicate this, let us introduce the following
Definition 125 (Periodic function). A function is called periodic, with period Θ, iff we have, for any θ,
\[ f(\theta + \Theta) = f(\theta). \tag{6.67} \]
[Of course, Eq. 6.67 implies
\[ f(\theta + k\,\Theta) = f(\theta), \quad \text{where } k = \pm 1, \pm 2, \ldots, \tag{6.68} \]
namely the function repeats itself if the argument of the function is increased (or decreased) by k times Θ.]
Accordingly, we can say that the trigonometric functions are periodic with period 2π. We can do better! If we increase/decrease the angle θ simply by π, we see from the definition (see also Fig. 6.11) that
\[ \sin(\theta \pm \pi) = -\sin\theta, \qquad \cos(\theta \pm \pi) = -\cos\theta, \qquad \tan(\theta \pm \pi) = \tan\theta. \tag{6.69} \]
◦ Comment. The last of the above equations indicates that the function tan θ is periodic with period π, whereas sin θ and cos θ are periodic with period 2π (Eq. 6.66). Another consequence of the definitions of sin θ, cos θ and tan θ is
\[ \sin(-\theta) = -\sin\theta, \qquad \cos(-\theta) = \cos\theta, \qquad \tan(-\theta) = -\tan\theta. \tag{6.70} \]
In other words, sin θ and tan θ are odd functions of θ, whereas cos θ is an even function, as apparent from Fig. 6.12. One more immediate consequence of the definitions of sin θ, cos θ and tan θ is
\[ \sin(\theta \pm \pi/2) = \pm\cos\theta, \qquad \cos(\theta \pm \pi/2) = \mp\sin\theta, \qquad \tan(\theta \pm \pi/2) = \frac{-1}{\tan\theta}. \tag{6.71} \]
Indeed, assume θ ∈ (0, π/2). From Fig. 6.11, p. 228, we see that by going from θ to π/2 − θ (namely complementary angles) the roles of sine and cosine are interchanged, namely that
\[ \sin(\pi/2 - \theta) = \cos\theta \quad\text{and}\quad \cos(\pi/2 - \theta) = \sin\theta. \tag{6.72} \]
These are equivalent to the first two in Eq. 6.71 (lower signs), because sin θ is an odd function of θ, whereas cos θ is an even one. [For the upper sign, use Eq. 6.69.] These identities are valid as well for θ ∉ (0, π/2), as you may verify. [Hint: Use Fig. 6.11, p. 228, or Fig. 6.12, p. 230. Also, if you don’t mind waiting, you will be able to verify these results by using Eqs. 7.47 and 7.48, with β = ±π/2.] The third relationship is an immediate consequence of the first two.
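The periodicity, parity and quarter–turn identities (Eqs. 6.66 and 6.69–6.71) lend themselves to a quick numerical spot check. What follows is just a sketch, assuming Python's `math` module; the helper name `identities_hold`, the sample angles and the tolerance are illustrative choices, not part of the book. A spot check of this kind is of course not a proof.

```python
import math

def identities_hold(theta, k=3, tol=1e-9):
    """Check Eqs. 6.66, 6.69, 6.70 and 6.71 at one angle (away from the poles of tan)."""
    def close(u, v):
        return math.isclose(u, v, rel_tol=tol, abs_tol=tol)

    sin_t, cos_t, tan_t = math.sin(theta), math.cos(theta), math.tan(theta)
    checks = [
        # Eq. 6.66: period 2π
        close(math.sin(theta + 2 * k * math.pi), sin_t),
        close(math.cos(theta + 2 * k * math.pi), cos_t),
        close(math.tan(theta + 2 * k * math.pi), tan_t),
        # Eq. 6.70: sine and tangent are odd, cosine is even
        close(math.sin(-theta), -sin_t),
        close(math.cos(-theta), cos_t),
        close(math.tan(-theta), -tan_t),
    ]
    for sign in (+1, -1):
        # Eq. 6.69: shift by ±π
        checks.append(close(math.sin(theta + sign * math.pi), -sin_t))
        checks.append(close(math.cos(theta + sign * math.pi), -cos_t))
        checks.append(close(math.tan(theta + sign * math.pi), tan_t))
        # Eq. 6.71: shift by ±π/2 (upper and lower signs)
        checks.append(close(math.sin(theta + sign * math.pi / 2), sign * cos_t))
        checks.append(close(math.cos(theta + sign * math.pi / 2), -sign * sin_t))
        checks.append(close(math.tan(theta + sign * math.pi / 2), -1.0 / tan_t))
    return all(checks)

sample_angles = [0.3, 1.1, 2.0, -0.7, -2.5]
print(all(identities_hold(theta) for theta in sample_angles))
```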
• Convention for trigonometric functions notations We have the following
Definition 126 (A convention). By universal convention (no matter how confusing it may be, especially in the beginning), $\sin^2\theta$ stands for $(\sin\theta)^2$. The same rule holds of course for $\cos^2\theta$ and $\tan^2\theta$. Similarly, we use $\sin^n\theta = (\sin\theta)^n$, $\cos^n\theta = (\cos\theta)^n$ and $\tan^n\theta = (\tan\theta)^n$, for any positive integer n. Indeed, often we write $g^n(x)$ to mean $[g(x)]^n$. [It should be emphasized that the above notations cannot be used for n = −1. “Why?” you are going to ask me! Because there is another important and universally accepted convention, namely that $g^{-1}(\theta)$ denotes the inverse function of g(θ), a notion that will be introduced in Section 6.6 (see Eq. 6.134).]
◦ Warning. I will put the argument of a trigonometric function in parentheses if the argument involves a + or a − sign, as in sin(α ± β), or a / sign, as in sin(kπx/ℓ). [No parentheses for sin kπx, or for $\sin\frac{k\pi x}{\ell}$.]
◦ Comment. The following functions are never used in this book:
the cosecant (reciprocal of sine): csc x = 1/ sin x,
the secant (reciprocal of cosine): sec x = 1/ cos x,
the cotangent (reciprocal of tangent): cot x = 1/ tan x. (6.73)
• An important relationship According to the Pythagorean theorem (Eq. 5.6), the equation of the circle is given by
\[ x^2 + y^2 = R^2. \tag{6.74} \]
Combining with Eq. 6.63, one obtains the very important relationship
\[ \sin^2\theta + \cos^2\theta = 1. \tag{6.75} \]
This implies
\[ 1 + \tan^2\theta = 1 + \frac{\sin^2\theta}{\cos^2\theta} = \frac{\cos^2\theta + \sin^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta}. \tag{6.76} \]
◦ Comment.♥ To obtain a geometrical interpretation of this equation, at least for θ ∈ (0, π/2), note that the triangles OAP and OQB in Fig. 6.11, p. 228, are similar. Thus, we have OB/OQ = OP/OA. This implies, for a unit circle (for which we have OP = OQ = 1), OB = 1/OA = 1/cos θ. On the other hand, applying the Pythagorean theorem to the right triangle OQB, still for OQ = 1, we have $OB^2 = 1 + \tan^2\theta$. Comparing the two results, one obtains Eq. 6.76.
• Trigonometric functions for specific values of θ For certain specific values of the argument θ, we are able to give the corresponding values that the trigonometric functions attain. For instance, the definitions imply
\[ \sin 0 = 0, \qquad \cos 0 = 1, \qquad \tan 0 = 0, \tag{6.77} \]
and
\[ \sin(\pm\pi/2) = \pm 1 \quad\text{and}\quad \cos(\pm\pi/2) = 0. \tag{6.78} \]
On the other hand, as θ approaches ±π/2, we have that |tan θ| grows beyond any limit. In other words, as pointed out above, tan θ is not defined for θ = ±π/2. Moreover, we have (note that π and −π denote the same angular location)
\[ \sin(\pm\pi) = 0, \qquad \cos(\pm\pi) = -1, \qquad \tan(\pm\pi) = 0. \tag{6.79} \]
Next, consider the values that the trigonometric functions take for θ = π/4. From the definition (tan θ = QB/OQ, Eq. 6.62), we have
\[ \tan(\pi/4) = 1. \tag{6.80} \]
Accordingly, we have
\[ \sin(\pi/4) = \cos(\pi/4) = 1/\sqrt{2} = \sqrt{2}/2. \tag{6.81} \]
[Hint: For the first equality, the fact that sin(π/4) equals cos(π/4) stems from geometrical considerations. For the second equality, the value $1/\sqrt{2}$ stems from $\sin^2\theta + \cos^2\theta = 1$ (Eq. 6.75).]
Next, consider the values for θ = π/6 and θ = π/3 (namely for 30° and 60°, respectively). These are obtained from the Pythagorean theorem, $a^2 + b^2 = c^2$ (Eq. 5.6), applied to equilateral triangles. [Recall that the interior angles of an equilateral triangle are equal to π/3 (namely 60°).] Specifically, we have (see Fig. 6.13, where the sides of the triangle have length ℓ = 1)
\[ \sin(\pi/6) = \frac{\ell/2}{\ell} = \frac{1}{2}. \tag{6.82} \]
Then, using $\cos\theta = \sqrt{1 - \sin^2\theta}$ (Eq. 6.75), we have $\cos(\pi/6) = \sqrt{3}/2$. On the other hand, for the complementary angle θ = π/3 (namely 60°) the values of sine and cosine are interchanged (Eq. 6.72). This yields
\[ \sin(\pi/3) = \sqrt{3}/2 \quad\text{and}\quad \cos(\pi/3) = 1/2. \tag{6.83} \]
The values obtained above are presented in Table 6.1. For future reference, note that we also have (use Eqs. 6.66, 6.78 and 6.79)
\[ \sin k\pi = 0, \qquad \cos\left[\left(k + \tfrac{1}{2}\right)\pi\right] = 0, \qquad \tan k\pi = 0, \tag{6.84} \]
where k = 0, ±1, ±2, . . . .
Fig. 6.13 The angle θ = π/6
θ      α°     sin θ    cos θ    tan θ
0      0°     0        1        0
π/6    30°    1/2      √3/2     1/√3
π/4    45°    √2/2     √2/2     1
π/3    60°    √3/2     1/2      √3
π/2    90°    1        0        –
Table 6.1 The values of cos θ, sin θ and tan θ, for certain specific values of θ
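The entries of Table 6.1 can be compared with the values returned by a numerical library. The snippet below is a minimal sketch assuming Python's `math` module; the names `TABLE_6_1` and `table_matches_library` are hypothetical, and the tolerance is an arbitrary choice that absorbs round-off (for instance, the computed cos(π/2) is tiny but not exactly zero).

```python
import math

# (label, angle in radians, sin, cos, tan); None marks the undefined tan(π/2)
TABLE_6_1 = [
    ("0",   0.0,         0.0,              1.0,              0.0),
    ("π/6", math.pi / 6, 1 / 2,            math.sqrt(3) / 2, 1 / math.sqrt(3)),
    ("π/4", math.pi / 4, math.sqrt(2) / 2, math.sqrt(2) / 2, 1.0),
    ("π/3", math.pi / 3, math.sqrt(3) / 2, 1 / 2,            math.sqrt(3)),
    ("π/2", math.pi / 2, 1.0,              0.0,              None),
]

def table_matches_library(rows, tol=1e-12):
    """True if every tabulated value agrees with math.sin/cos/tan within tol."""
    for _label, theta, sin_v, cos_v, tan_v in rows:
        if not math.isclose(math.sin(theta), sin_v, abs_tol=tol):
            return False
        if not math.isclose(math.cos(theta), cos_v, abs_tol=tol):
            return False
        if tan_v is not None and not math.isclose(math.tan(theta), tan_v, abs_tol=tol):
            return False
    return True

print(table_matches_library(TABLE_6_1))
```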
6.4.3 Bounds for π ♥ Here, we show how to obtain better and better estimates for π. To this end, we begin by showing that the circumference is larger than the perimeter of any inscribed polygon (namely a polygon inside the circle, with its vertices on the circle; see Fig. 6.14) and smaller than the perimeter of any circumscribed polygon (namely a polygon outside the circle, with each of its sides tangent to the circle; see again Fig. 6.14). Then, we use these facts to obtain bounds for π, as accurate as we wish.
• Inscribed and circumscribed polygons Here, we present some basic facts regarding polygons inscribed in, or circumscribed about, a circle. Specifically, we show that the perimeter of the inscribed (circumscribed) polygon is smaller (larger) than the circumference of the circle.
Fig. 6.14 Bounds for π
Remark 52. Gut–level approach. To visualize the fact that the circumference is greater than the perimeter of any inscribed polygon, replace the circle with a thread ending in a slip–knot. Then, pull the thread through the slip–knot: the thread gets shorter and shorter, until it is fully in contact with the boundary of the inscribed polygon. Similarly, to visualize the fact that the circumference is smaller than the perimeter of any circumscribed polygon, replace the circumscribed polygon with the same thread and pull the thread through the slip–knot: the thread gets shorter and shorter, until it is fully in contact with the boundary of the circle. [Admittedly, the argument used here is quite simplistic from a mathematical point of view. It is not really a proof, but I believe that we can live with it. After all, you have to admit that it is much clearer, and simpler, than a rigorous geometrical proof. If you wish to be a bit more sophisticated, a brief outline of the proof is provided in the comment below.]
◦ Comment. Let us be a bit more sophisticated. For the sake of clarity, let us begin with an inscribed square as shown in Fig. 6.15. The fact that PQ is shorter than the arc of the circle from P to Q stems from the fact that PQ is a segment, namely the shortest line between two points. [Alternatively, divide the arc of the circle from B to P (Fig. 6.15) into portions so small that they may be approximated by segments. Consider their projection onto PQ (horizontal lines). The fact that the projection of a segment onto a straight line is always shorter than the segment itself (Lemma 7, p. 197) is all we need to confirm that the inscribed polygon is shorter than the circle.] Turning to the circumscribed square, let us divide again the arc of the circle between P and B into portions so small that they may be approximated by segments. Then, consider the corresponding segments on the circumscribed polygon. Each of these segments is longer than the corresponding arc, because of Lemma 7, p. 197, and because of the fact that the rays diverge (light gray area in Fig. 6.16). [Specifically, consider the unit circle in Fig. 6.16, so that OR = OB = 1. Note that $\delta = \Delta y \cos\theta_*$ and
\[ \frac{\Delta\theta}{\delta} = \frac{OR}{OQ} = \frac{OB}{OQ} = \cos\theta_*. \tag{6.85} \]
Combining, we obtain $\Delta\theta = \delta \cos\theta_* = \Delta y \cos^2\theta_* < \Delta y$, with any $\theta_* \in (0, \pi/2)$.] This confirms that the circumscribed polygon is longer than the circle. [Of course, these considerations are not limited to a square and may be repeated, nearly verbatim, for any polygon, regular or not (provided of course that $\theta_* < \pi/2$).]
Fig. 6.15 On sin θn < θn < tan θn
Fig. 6.16 On proof that θn < tan θn
• Bounds for π Consider a circle with radius R = 1 and the circumscribed n-sided regular polygon. Let
\[ \theta_n = \pi/n \tag{6.86} \]
denote half the angle between two rays from the center of the circle through two contiguous vertices of the polygon (such as the rays through H and A in Fig. 6.14, p. 236, where n = 8). Accordingly, the perimeter of the circumscribed regular polygon equals $2n \tan\theta_n$. Similarly, the perimeter of the inscribed n-sided regular polygon equals $2n \sin\theta_n$. Therefore, the fact that the circumference is shorter than the perimeter of any circumscribed polygon and longer than the perimeter of any inscribed polygon may be expressed as $2n \sin\theta_n < 2\pi < 2n \tan\theta_n$, or
\[ n \sin\theta_n < \pi < n \tan\theta_n, \tag{6.87} \]
with θn = π/n. This provides us with bounds for π. For instance, if we consider a square, we have θn = π/4. Then, Eq. 6.87 yields $2\sqrt{2} < \pi < 4$ (use Table 6.1, p. 235), namely 2.8284 · · · < π < 4. Next, to obtain a better estimate consider a regular hexagon (again, a polygon with six equal sides); we have θn = π/6 (namely α° = 30°), and Eq. 6.87 yields (use Table 6.1, p. 235) $3 < \pi < 6/\sqrt{3}$, namely 3 < π < 3.4641 . . . .
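The two bounds in Eq. 6.87 can be tabulated for a few values of n. The sketch below assumes Python's `math` module, and the function name `pi_bounds` is hypothetical. Note that it is only an illustration of the inequality: the code uses `math.pi` to form θn = π/n, so it is circular as a way of actually computing π.

```python
import math

def pi_bounds(n):
    """Lower and upper bounds of Eq. 6.87 for a regular polygon with n sides."""
    theta_n = math.pi / n                  # Eq. 6.86 (uses the library value of π)
    lower = n * math.sin(theta_n)          # half-perimeter of the inscribed polygon
    upper = n * math.tan(theta_n)          # half-perimeter of the circumscribed polygon
    return lower, upper

for n in (4, 6, 8, 96):
    lower, upper = pi_bounds(n)
    print(f"n = {n:3d}:  {lower:.6f} < pi < {upper:.6f}")
```

For n = 4 and n = 6 the bounds reproduce the values quoted above for the square and the hexagon, and the bracket narrows as n grows.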
• Better and better ♥ Can we do any better? Yes, we can! In fact, according to Boyer (Ref. [13], p. 202): “The Pythagorean theorem alone suffices to give as accurate an approximation as may be desired,” because, if we know the perimeter for a polygon with n sides, we can easily obtain that for polygons with 2n sides. Indeed, consider a polygon, say a square. Set a = AQ, b = AP, c = QP, d = OA (Fig. 6.17). To obtain c from b, with R = OP = 1, we can use the following identities: $d^2 = 1 - b^2$, $a = 1 - d$, $c^2 = a^2 + b^2$. [Of course, the procedure is not limited to squares. For, we do not take advantage of the fact that here d = b. We may start from any polygon, as you may verify.]
Fig. 6.17 Approximation of π from polygons
Specifically, we may apply it to a succession of regular polygons, each with double the number of sides of the preceding one, and obtain a highly accurate estimate for π. In particular, still according to Boyer (Ref. [13], p. 202), in China “in the third century Liu Hui [. . . ] derived the figure 3.14 by use of a regular polygon of 96 sides and the approximation 3.141 59 by considering a polygon of 3,072 sides.” [Note that $96 = 3 \times 32 = 3 \times 2^5$, whereas $3{,}072 = 3 \times 1{,}024 = 3 \times 2^{10}$.]
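The doubling procedure described above (from b to d, then a, then c) can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python, the function name `pi_by_doubling` is hypothetical, and it starts from the inscribed regular hexagon (whose side equals the radius of the unit circle) rather than from the square, so that four and nine doublings reach the 96-sided and 3,072-sided polygons mentioned in the quotation. Only square roots are used, so no prior value of π is needed.

```python
import math

def pi_by_doubling(doublings):
    """Estimate π from inscribed regular polygons in the unit circle.

    Starts from the hexagon (n = 6, side = 1) and doubles the number of
    sides `doublings` times. Returns (number of sides, estimate of π).
    """
    n = 6
    side = 1.0                       # side of the inscribed hexagon, R = 1
    for _ in range(doublings):
        b = side / 2                 # half of the current side (b = AP)
        d = math.sqrt(1 - b * b)     # d^2 = 1 - b^2 (d = OA)
        a = 1 - d                    # a = 1 - d (a = AQ)
        side = math.sqrt(a * a + b * b)   # c^2 = a^2 + b^2: side of the 2n-sided polygon
        n *= 2
    return n, n * side / 2           # half-perimeter approximates π from below

for steps in (0, 4, 9):
    sides, estimate = pi_by_doubling(steps)
    print(f"{sides:5d} sides: pi is approximately {estimate:.7f}")
```

Rounded suitably, the 96-sided and 3,072-sided polygons give 3.14 and 3.14159, consistent with the figures attributed to Liu Hui.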
6.4.4 An important inequality ♣ Note that dividing Eq. 6.87 by n we obtain an interesting expression, namely
\[ \sin\theta_n < \theta_n < \tan\theta_n, \quad \text{for } \theta_n \in (0, \pi/2), \tag{6.88} \]
with θn = π/n. In this subsection, we extend the above equation to an arbitrary value of θ ∈ (0, π/2), thereby obtaining an important inequality, namely
\[ \sin\theta < \theta < \tan\theta, \quad \text{for } \theta \in (0, \pi/2). \tag{6.89} \]
[This is shown in Fig. 6.12, p. 230, where the straight line y = x is depicted as a thin black line.] To this end, note that the visualization process used to illustrate Eq. 6.87 (Remark 52, p. 235, or the comment that follows it) need not be applied to a regular polygon. Any polygon will do. Accordingly, we may conclude that the construction in Fig. 6.15, p. 237, holds true for any irregular polygon, in which case the angle 2θ at O may differ from 2π/n. Thus, the first inequality in Eq. 6.89 is valid for any θ ∈ (0, π/2). Similarly, for the circumscribed polygon, we may use the construction in Fig. 6.16, p. 237. The final result holds true even if the angle θ at O differs from π/n (provided of course that θ < π/2). Thus, also the second inequality in Eq. 6.89 is valid for any θ ∈ (0, π/2).
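As a sanity check (not a proof), the inequality in Eq. 6.89 can be sampled on a grid of angles in (0, π/2). The sketch assumes Python's `math` module; `inequality_holds` and the grid size are illustrative choices.

```python
import math

def inequality_holds(theta):
    """True if sin θ < θ < tan θ (Eq. 6.89) at the given angle."""
    return math.sin(theta) < theta < math.tan(theta)

# 999 equally spaced angles strictly inside (0, π/2)
grid = [k * (math.pi / 2) / 1000 for k in range(1, 1000)]
print(all(inequality_holds(theta) for theta in grid))
```

Outside the interval the chain fails; for instance, for a negative θ both inequalities are reversed, since all three quantities are odd functions of θ.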
6.4.5 Area of a circle ♥ Here, we introduce a procedure to obtain the area of a circle having a radius equal to R. Let us begin with the area of the circumscribed polygons. Consider, for instance, the circumscribed octagon, like that depicted in Fig. 6.14, p. 236. We can obtain this area by adding the areas of the eight triangles OAB, OBC, . . . , OHA. The area of each of these isosceles triangles equals $\frac{1}{2} R\, b$, where b is the common length of the bases. Adding the contributions of the eight triangles, we have that the area of the circumscribed octagon equals $A = \frac{1}{2} R\, p$, where p = 8 b denotes the perimeter of the octagon. In general, you may obtain that the area of any circumscribed polygon with n vertices equals $A = \frac{1}{2} R\, p$, where p = n b denotes the perimeter of the circumscribed polygon, b being the base length of each of the triangles that form the polygon. Similar results are obtained for the inscribed polygon, as you may verify.
As we increase the number n, the circumscribed polygon becomes closer and closer to the inscribed polygon. On the other hand, the two perimeters bracket the circumference, which equals 2πR (by definition of π, Definition 86, p. 178). Thus, recalling Remark 11, p. 44, on “bracketing something, with a bracket that may be made as small as we wish,” we obtain that the area of a disk having a radius equal to R is given by
\[ A = \pi R^2, \tag{6.90} \]
as you probably remember from high school.
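The bracketing argument for the area can be illustrated numerically for regular polygons. The sketch below assumes Python's `math` module and a hypothetical function name, `polygon_areas`. For the circumscribed polygon it uses A = (1/2) R p as in the text. For the inscribed polygon it assumes the analogous formula with the height of each triangle equal to R cos θn (the distance from the center to a side), which is one way to read the "similar results" left to the reader above.

```python
import math

def polygon_areas(n, radius=1.0):
    """Areas of the inscribed and circumscribed regular n-sided polygons."""
    theta_n = math.pi / n
    perimeter_circ = 2 * n * radius * math.tan(theta_n)
    perimeter_insc = 2 * n * radius * math.sin(theta_n)
    area_circ = 0.5 * radius * perimeter_circ             # A = (1/2) R p
    height_insc = radius * math.cos(theta_n)               # assumed height of the inscribed triangles
    area_insc = 0.5 * height_insc * perimeter_insc
    return area_insc, area_circ

radius = 2.0
for n in (8, 64, 1024):
    inner, outer = polygon_areas(n, radius)
    print(f"n = {n:4d}:  {inner:.6f} < pi R^2 < {outer:.6f}")
print("pi R^2 =", math.pi * radius ** 2)
```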
6.5 Exponentiation. The functions $x^\alpha$ and $a^x$ The operation of exponentiation $a^\alpha$ consists in raising a number a, called the base, to the (not necessarily natural) power α, called the exponent. This is the basis to introduce two important functions, namely $x^\alpha$ (the so–called real power of x, namely a power of a real number x with a real exponent α) and $a^x$ (a close cousin of the exponential function, y = exp x, which will be introduced in Eq. 12.23). Specifically, we have already defined $a^n$, where a is real and n is a natural number, as the number 1 multiplied n times by the number a (Eq. 1.53). Here, we want to extend this to the case in which the exponent is real. Our plan of action is as follows: in Subsections 6.5.1–6.5.4, we start with the properties of $a^n$, with n being a natural number, and go all the way to $a^\alpha$, with α being a real number. Then, having clarified the operation $a^\alpha$ (exponentiation), in Subsection 6.5.5 we use this to introduce the above–mentioned functions of interest, namely $y = x^\alpha$ and $y = a^x$, as well as $y = x^x$ and $y = x^{1/x}$.
◦ Warning. The material in this section is possibly the most meticulous and “detailish” portion of the entire book. Accordingly, “you are hereby authorized to skip it!” Just take my word for any equation in this section. However, if you have an inquisitive mind and want to understand where these rules come from, I am not going to stop you. Go for it!
6.5.1 Exponentiation with natural exponent ♦ In this subsection, we define the operation $a^n$, where n is a natural number, and address some of its properties. After reviewing the properties of $p^n$, with p and n natural numbers, we address $a^n$ with n being a natural number, and a a real one.
• The operation $p^n$, with p and n natural numbers ♦ You might have seen exponentiation $a^\alpha$, with a and α real, in high school. However, in my view (based primarily upon my experience as a parent), what is typically taught in high school is widely inadequate for the level of sophistication attained in this book (itself still a far cry from the level of sophistication deemed adequate by my mathematician friends). Thus, we might as well start from scratch. Accordingly, here we take a step back and review the operation $p^n$, where p and n are both natural numbers. [The case $a^n$ (with a real) is addressed in the subsubsection that follows, where all the proofs missing here are presented. Here, I just want to give you a gut–level presentation of the same ideas.] It should be relatively easy to convince yourself that
\[ p^m p^n = p^{m+n}. \tag{6.91} \]
If you are not convinced, consider the case with p = 2, m = 3, and n = 4. On the left side of Eq. 6.91, we have $2^3 \cdot 2^4 = 8 \cdot 16 = 128$, whereas on the right side we have $2^7$, which indeed equals 128. In fact, on the left side you have (2 · 2 · 2) (2 · 2 · 2 · 2), which equals 2 · 2 · 2 · 2 · 2 · 2 · 2 on the right side. [As mentioned above, this is just a gut–level presentation; a simple proof is given in the subsubsection that follows.] Similarly, you should also be able to convince yourself that
\[ (p\,q)^n = p^n q^n. \tag{6.92} \]
If you are not convinced, let us consider the case with p = 2, q = 5, and n = 3. On the left side of Eq. 6.92, we have $10^3$, namely 1,000, whereas on the right side we have $2^3 \cdot 5^3 = 8 \cdot 125 = 1{,}000$. This is no surprise, since (2·5) (2·5) (2·5) = (2·2·2) (5·5·5). [Again, this is just a gut–level presentation. More in the subsubsection that follows.] Similarly, we have
\[ (p^m)^n = p^{mn}. \tag{6.93} \]
If you are not convinced, consider the case with p = 3, m = 2, and n = 2. On the left side of Eq. 6.93, we have $(3^2)^2 = 9^2$, namely 81, whereas on the right side we have $3^4$, which indeed equals 81. This is not surprising, since $(3^2)^2 = 3^2 \cdot 3^2 = 3 \cdot 3 \cdot 3 \cdot 3 = 81$. [One more time, this is just a gut–level presentation. More in the subsubsection that follows.]
• Exponents that are natural numbers Here, we show that the basic properties introduced in the preceding subsubsection hold for $a^n$, where a is a real positive number, whereas n is still a natural number. Let us begin with verifying that Eq. 6.91 may be extended to
\[ a^m a^n = a^{m+n}, \tag{6.94} \]
where a is real, whereas m and n are both natural numbers. Indeed, we have
\[ a^m a^n = \underbrace{a \cdots a}_{m\ \text{terms}}\;\underbrace{a \cdots a}_{n\ \text{terms}} = \underbrace{a \cdots a}_{m+n\ \text{terms}} = a^{m+n}. \tag{6.95} \]
[The result applies even if m and/or n are equal to zero, as you may verify (see Eq. 1.54).] Next, let us verify that Eq. 6.92 may be extended to
\[ (a\,b)^n = a^n b^n, \tag{6.96} \]
where a and b are real. Indeed, using ab = ba (Eq. 1.6), we have
\[ (a\,b)^n = \underbrace{(ab) \cdots (ab)}_{n\ \text{terms}} = \underbrace{a \cdots a}_{n\ \text{terms}}\;\underbrace{b \cdots b}_{n\ \text{terms}} = a^n b^n. \tag{6.97} \]
Finally, let us verify that Eq. 6.93 may be extended to
\[ (a^m)^n = a^{mn}, \tag{6.98} \]
where a is real. Indeed, we have
\[ (a^m)^n = \Big(\underbrace{a \cdots a}_{m\ \text{terms}}\Big)^n = \underbrace{a \cdots a}_{mn\ \text{terms}} = a^{mn}. \tag{6.99} \]
For future reference, note that Eq. 6.98 implies
\[ (a^m)^n = a^{mn} = (a^n)^m. \tag{6.100} \]
Equations 6.94, 6.96 and 6.98 will be referred to as the three basic rules of exponentiation.
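The three basic rules can be checked exactly on a computer by implementing $a^n$ literally as "the number 1 multiplied n times by a". The following is a minimal sketch in Python; the names `power` and `basic_rules_hold` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used here only to avoid round-off, as a stand-in for real numbers.

```python
from fractions import Fraction

def power(a, n):
    """a**n for a natural number n: the number 1 multiplied n times by a."""
    result = 1
    for _ in range(n):
        result = result * a
    return result

def basic_rules_hold(a, b, m, n):
    """Check the three basic rules of exponentiation for natural m and n."""
    rule_sum = power(a, m) * power(a, n) == power(a, m + n)       # Eq. 6.94
    rule_product = power(a * b, n) == power(a, n) * power(b, n)   # Eq. 6.96
    rule_power = power(power(a, m), n) == power(a, m * n)         # Eq. 6.98
    return rule_sum and rule_product and rule_power

# The numerical examples used in the text for Eqs. 6.91-6.93:
print(power(2, 3) * power(2, 4), power(2, 7))     # both equal 128
print(power(2 * 5, 3), power(2, 3) * power(5, 3)) # both equal 1000
print(power(power(3, 2), 2), power(3, 4))         # both equal 81

# A check with rational bases, including a negative one:
print(basic_rules_hold(Fraction(3, 2), Fraction(-5, 7), 3, 4))
```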
6.5.2 Exponentiation with integer exponent Here, we consider $a^n$, where a is real, whereas n is a negative integer. [Nonnegative integers are natural numbers, and have been addressed in the preceding subsection.] Let us begin by defining
\[ a^{-1} := \frac{1}{a}. \tag{6.101} \]
With this definition, we make sure that the rule $a^m a^n = a^{m+n}$ (Eq. 6.94) is valid for m = 1 and n = −1. [For, $a \cdot a^{-1} = a^{1-1} = a^0 = 1$ (use $a^0 = 1$, Eq. 1.54).] Similarly, we define
\[ a^{-n} := \frac{1}{a^n}, \tag{6.102} \]
so that the rule $a^m a^n = a^{m+n}$ (Eq. 6.94) is valid for m = −n < 0. Accordingly, we have (use 1/(ab) = (1/a) (1/b), Eq. 1.38)
\[ a^{-n} := \frac{1}{a^n} = \frac{1}{\underbrace{a \cdots a}_{n\ \text{terms}}} = \underbrace{\frac{1}{a} \cdots \frac{1}{a}}_{n\ \text{terms}} = \left(\frac{1}{a}\right)^n. \tag{6.103} \]
Thus far, we have defined $a^{-n}$ by imposing Eq. 6.94 to be valid for n = −m < 0. [By definition! No proof required!] Then, using Eqs. 6.101–6.103, you may verify that the proofs in Eqs. 6.95, 6.97 and 6.99 apply as well for integers. Accordingly, the three basic rules of exponentiation (namely Eqs. 6.94, 6.96 and 6.98) are valid as well when the exponents are integers.
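The extension to negative integer exponents can be checked in the same exact fashion. This is again only a sketch, assuming Python's `fractions.Fraction` for exact arithmetic and a nonzero base; `int_power` and `rules_hold_for_integers` are hypothetical names.

```python
from fractions import Fraction

def int_power(a, n):
    """a**n for a nonzero rational a and any integer n (Eqs. 6.101-6.103)."""
    a = Fraction(a)
    result = Fraction(1)
    for _ in range(abs(n)):
        result *= a
    # For a negative exponent, use the definition a**(-n) := 1 / a**n (Eq. 6.102)
    return result if n >= 0 else 1 / result

def rules_hold_for_integers(a, b, exponents=range(-3, 4)):
    """Check Eqs. 6.94, 6.96 and 6.98 for all pairs of exponents in the given range."""
    for m in exponents:
        for n in exponents:
            if int_power(a, m) * int_power(a, n) != int_power(a, m + n):
                return False
            if int_power(a * b, n) != int_power(a, n) * int_power(b, n):
                return False
            if int_power(int_power(a, m), n) != int_power(a, m * n):
                return False
    return True

print(int_power(2, -3))                      # 1/8
print(int_power(Fraction(1, 2), 3))          # also 1/8, as in Eq. 6.103
print(rules_hold_for_integers(Fraction(3, 2), Fraction(-5, 7)))
```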
6.5.3 Exponentiation with rational exponent ♥ Here, we extend the definition of exponentiation to the cases in which the exponent is a rational number.
Remark 53. Here, we assume that the base a is a real positive number. We limit ourselves to positive numbers in order to avoid issues like taking the square root of a negative number, which would require the use of complex numbers (Definition 109, p. 207). [The n-th power of a complex number and the n-th root of a complex number have been addressed at the end of Section 1.7, on imaginary and complex numbers.]
• Definition of $a^r$ Here, we introduce the definition of $a^r$, with r = p/q, where p and q are integers, so that r is a rational number. Again, without loss of generality, we may assume q > 0 (see Eq. 1.31).
◦ The case $a^{1/q}$, with q > 0. Let us begin with $a^{1/q}$, where a ≥ 0 is real (see Remark 53, p. 243), and q is a positive integer. This is a new operation, not yet defined. To define it, we impose that $(a^p)^q = a^{pq}$ (Eq. 6.98) remains valid for p = 1/q, so as to have
\[ \left(a^{1/q}\right)^q = a. \tag{6.104} \]
In other words, we have defined $a^{1/q}$ to be that number that raised to the q-th power gives back a. But this is exactly the definition of the q-th root of a. Thus, the two coincide, namely
\[ a^{1/q} := \sqrt[q]{a}. \tag{6.105} \]
◦ The case $a^{p/q}$, with p/q > 0. Consider $a^{p/q}$, where a > 0 is real, and p and q are integers, with p/q > 0. [This implies that, without loss of generality, we may assume p > 0 and q > 0 (Eq. 1.31).] Again, we want to define this so as to extend the validity of $(a^m)^n = a^{mn}$ (Eq. 6.98). Accordingly, we can think of two possible definitions
\[ a^{p/q} = \left(a^{1/q}\right)^p = \left(\sqrt[q]{a}\right)^p \quad\text{and}\quad a^{p/q} = \left(a^p\right)^{1/q} = \sqrt[q]{a^p}. \tag{6.106} \]
I claim that these two definitions are equivalent, namely that the p-th power of the q-th root of a coincides with the q-th root of $a^p$, that is,
\[ \left(\sqrt[q]{a}\right)^p = \sqrt[q]{a^p}. \tag{6.107} \]
Indeed, taking the q-th power of this relationship, we obtain $\left[\left(\sqrt[q]{a}\right)^p\right]^q = a^p$, which is true, because we have
\[ \left[\left(\sqrt[q]{a}\right)^p\right]^q = \left[\left(\sqrt[q]{a}\right)^q\right]^p = a^p. \tag{6.108} \]
[Hint: Use $(b^p)^q = (b^q)^p$ (Eq. 6.100), as well as Eqs. 6.104 and 6.105.] For future reference, note that the above equation implies
\[ \left(a^{p/q}\right)^q = \left(\sqrt[q]{a^p}\right)^q = a^p. \tag{6.109} \]
◦ The case $a^{-p/q}$, with p/q > 0. Next, consider the case $a^{-p/q}$, with p and q positive integers. Following a procedure similar to that used for integers, we obtain that, if we want to extend the rule $a^m a^n = a^{m+n}$ (Eq. 6.94), we have to have $a^{-p/q}\, a^{p/q} = 1$ (in analogy with Eq. 6.101), and hence
\[ a^{-p/q} = \frac{1}{a^{p/q}}. \tag{6.110} \]
We also have
\[ a^{-p/q} = \left(\frac{1}{a}\right)^{p/q}. \tag{6.111} \]
Indeed, taking the q-th power of Eq. 6.110, we obtain
\[ \left(a^{-p/q}\right)^q = \left(\frac{1}{a^{p/q}}\right)^q = \left[\frac{1}{\left(\sqrt[q]{a}\right)^p}\right]^q = \frac{1}{\left[\left(\sqrt[q]{a}\right)^p\right]^q} = \frac{1}{a^p} = \left(\frac{1}{a}\right)^p, \tag{6.112} \]
which is equivalent to Eq. 6.111. [Hint: For the third and fifth equalities, use Eq. 6.103. For the fourth, use Eq. 6.109.]
• Verifying Eqs. 6.94, 6.96 and 6.98 for rational exponents ♠ Thus far, we have completed the definition of $a^r$, where a is a real positive number, and r is rational. Here, we verify that the three basic rules of exponentiation (namely Eqs. 6.94, 6.96 and 6.98) apply to rational exponents as well. First, we show that
\[ a^{p/q}\, b^{p/q} = (a\,b)^{p/q}. \tag{6.113} \]
Indeed, we have
\[ \left(a^{p/q}\, b^{p/q}\right)^q = \left(a^{p/q}\right)^q \left(b^{p/q}\right)^q = a^p\, b^p = (a\,b)^p = \left[(a\,b)^{p/q}\right]^q. \tag{6.114} \]
[Hint: For the first equality, use Eq. 6.96, which we may legitimately use because $\tilde a := a^{p/q}$ and $\tilde b := b^{p/q}$ are real numbers, and q > 0. Ditto for the third equality. For the second and fourth equalities, use Eq. 6.109.] Then, taking the q-th root of the above equation, one obtains Eq. 6.113. Next, we show that
\[ a^{r_1} a^{r_2} = a^{r_1 + r_2}. \tag{6.115} \]
Indeed, we have
\[ \left(a^{p_1/q_1}\, a^{p_2/q_2}\right)^{q_1 q_2} = \left(a^{p_1/q_1}\right)^{q_1 q_2} \left(a^{p_2/q_2}\right)^{q_1 q_2} = a^{p_1 q_2}\, a^{p_2 q_1} = a^{p_1 q_2 + p_2 q_1}. \tag{6.116} \]
[Hint: Use Eq. 6.113 for the first equality, Eq. 6.109 for the second equality, and Eq. 6.94 for the third equality.] Then, taking the $(q_1 q_2)$-th root of Eq. 6.116, one obtains $a^{p_1/q_1}\, a^{p_2/q_2} = a^{p_1/q_1 + p_2/q_2}$, namely Eq. 6.115 with $r_1 = p_1/q_1$ and $r_2 = p_2/q_2$. [Use $p_1/q_1 + p_2/q_2 = (p_1 q_2 + p_2 q_1)/(q_1 q_2)$ (Eq. 1.34).] Finally, we show that, for any rational $r_1$ and $r_2$, we have
\[ b := \left(a^{r_1}\right)^{r_2} = a^{r_1 r_2}. \tag{6.117} \]
Indeed, setting $r_1 = p_1/q_1$ and $r_2 = p_2/q_2$, we have
\[ b^{q_1 q_2} = \left[\left(a^{p_1/q_1}\right)^{p_2/q_2}\right]^{q_1 q_2} = \left[\left(a^{p_1/q_1}\right)^{p_2}\right]^{q_1} = \left[\left(a^{p_1/q_1}\right)^{q_1}\right]^{p_2} = a^{p_1 p_2}. \tag{6.118} \]
[Hint: For the second and last equalities use Eq. 6.109, for the third use Eq. 6.100.] Hence, $b = a^{p_1 p_2/(q_1 q_2)}$, in agreement with Eq. 6.117.
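For rational exponents, the equivalence of the two definitions (Eq. 6.107) and the three rules (Eqs. 6.113, 6.115 and 6.117) can be spot-checked in floating-point arithmetic. The following is only a sketch: it assumes Python, computes the q-th root as `a ** (1.0 / q)` (a library shortcut, not the construction used in the text), and compares results up to a small tolerance. The names `root`, `rat_power` and `close`, and the sample values, are illustrative.

```python
import math

def root(a, q):
    """q-th root of a positive real a (computed with the built-in float power)."""
    return a ** (1.0 / q)

def rat_power(a, p, q):
    """a**(p/q) as the p-th power of the q-th root (first form in Eq. 6.106)."""
    return root(a, q) ** p

def close(u, v):
    return math.isclose(u, v, rel_tol=1e-12)

a, b = 2.5, 0.7
p1, q1 = 3, 4        # r1 = 3/4
p2, q2 = -5, 6       # r2 = -5/6

# Eq. 6.107: power of the root equals root of the power
print(close(rat_power(a, p1, q1), root(a ** p1, q1)))
# Eq. 6.113: a^(p/q) b^(p/q) = (a b)^(p/q)
print(close(rat_power(a, p1, q1) * rat_power(b, p1, q1), rat_power(a * b, p1, q1)))
# Eq. 6.115: a^r1 a^r2 = a^(r1 + r2), with r1 + r2 = (p1 q2 + p2 q1)/(q1 q2)
print(close(rat_power(a, p1, q1) * rat_power(a, p2, q2),
            rat_power(a, p1 * q2 + p2 * q1, q1 * q2)))
# Eq. 6.117: (a^r1)^r2 = a^(r1 r2)
print(close(rat_power(rat_power(a, p1, q1), p2, q2), rat_power(a, p1 * p2, q1 * q2)))
```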
• An inequality for $a^r$ Let us consider an inequality for $a^r$. To begin with, I claim that
\[ a > b \quad\text{is equivalent to}\quad a^{1/q} > b^{1/q} \qquad (a, b, q > 0). \tag{6.119} \]
This is a consequence of Eq. 1.68, which states that $a_1 > b_1$ is equivalent to $a_1^q > b_1^q$. [For, setting $a_1 = a^{1/q}$ and $b_1 = b^{1/q}$, we obtain Eq. 6.119.] Moreover, Eq. 6.119 yields (use Eq. 1.68 again)
\[ a > b \quad\text{is equivalent to}\quad a^{p/q} > b^{p/q} \qquad (a, b, p, q > 0). \tag{6.120} \]
6.5.4 Exponentiation with real exponents ♣ Here, we want to extend the definition of $a^{p/q}$ (where p/q is rational) to $a^\alpha$ (where α is irrational). [Again, here we assume that a is a real positive number (Remark 53, p. 243).] To this end, let us go back to the approach used to go from rational to real numbers (Remark 11, p. 44). Specifically, recall that, for any irrational α and for any positive integer q, there exists a unique integer p such that (Eq. 1.76)
\[ p/q < \alpha < (p + 1)/q, \tag{6.121} \]
and that, as q becomes larger and larger, the bracket (which equals 1/q) becomes smaller and smaller, thereby refining more and more the definition of α. Let us see how we can proceed to define $a^\alpha$. To streamline the flow of the presentation, let us limit ourselves to a > 1. [You are encouraged to consider the case a < 1.] Then, we have
\[ a^p < a\, a^p = a^{p+1} \qquad (a > 1). \tag{6.122} \]
This implies
\[ a^{p/q} < a^{(p+1)/q} \qquad (a > 1). \tag{6.123} \]
[Hint: Use Eq. 6.119.] Next, note that every time we double the value of q, the bracket 1/q in Eq. 6.121 is halved. Accordingly, $a^{p/q}$ and $a^{(p+1)/q}$ approach each other, and the difference $a^{(p+1)/q} - a^{p/q}$ becomes smaller and smaller. Then, we have the following
Definition 127 ($a^\alpha$ for α real). As q becomes larger and larger, the above limit process identifies a number. We define $a^\alpha$ to be equal to such a number. [Again, this is really another sneak preview of the notion of limit, which will be formally introduced in Chapter 8.]
Using the same approach, we obtain (start from Eqs. 6.110 and 6.111)
\[ a^{-\alpha} = 1/a^{\alpha} = (1/a)^{\alpha}. \tag{6.124} \]
Similarly, we have
\[ (a\,b)^{\gamma} = a^{\gamma}\, b^{\gamma}, \tag{6.125} \]
\[ a^{\beta}\, a^{\gamma} = a^{\beta + \gamma}, \tag{6.126} \]
\[ \left(a^{\beta}\right)^{\gamma} = a^{\beta\gamma}. \tag{6.127} \]
Finally, let us consider an inequality for powers in the real field. Definition 127 above yields
\[ a > b \quad\text{is equivalent to}\quad a^{\alpha} > b^{\alpha} \qquad (a, b, \alpha > 0), \tag{6.128} \]
where α is any positive real number. [Hint: Apply the above limit process to Eq. 6.120.]
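The bracketing process behind Definition 127 can be watched at work numerically. The sketch below assumes Python's `math` module and a hypothetical function name, `bracket`. It takes a = 2 and α = √2 as an example and doubles q repeatedly. Two caveats apply: the floating-point value of √2 only stands in for the irrational number, and the rational powers are evaluated with the built-in `**` operator rather than through roots, so this is an illustration of the idea and not a construction of $a^\alpha$.

```python
import math

def bracket(a, alpha, q):
    """Return (a**(p/q), a**((p+1)/q)), with p the integer such that p/q < alpha < (p+1)/q."""
    p = math.floor(alpha * q)
    return a ** (p / q), a ** ((p + 1) / q)

a, alpha = 2.0, math.sqrt(2)
widths = []
for k in range(0, 17):
    q = 2 ** k                       # every step doubles q, halving the bracket 1/q
    lower, upper = bracket(a, alpha, q)
    widths.append(upper - lower)
    if k % 4 == 0:
        print(f"q = {q:6d}:  {lower:.8f} < a^alpha < {upper:.8f}")

print("library value:", a ** alpha)
```

The printed lower and upper values close in on the library value of $2^{\sqrt{2}}$ from both sides as q grows.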
6.5.5 The functions $x^\alpha$ and $a^x$, and then more ♣ Having defined $a^\alpha$, with a real positive and α real, we can now introduce several functions y = f(x). For instance, if we replace a with the variable x, the operation of exponentiation produces a function known as the real–exponent power of x, or simply the real power of x, namely
\[ y = x^{\alpha} \qquad (x > 0). \tag{6.129} \]
This is an extension (from natural exponents to real exponents) of the natural powers of x (Eq. 6.3). On the other hand, if we replace α with the variable x, the operation of exponentiation produces the function
\[ y = a^{x} \qquad (a > 0). \tag{6.130} \]
This function is very closely related to the exponential function, which will be introduced in Section 12.2 (see, in particular, Eq. 12.23). Hence, its analysis will be presented there. We can also assume that both a and α are equal to x, thereby introducing the function
\[ y = x^{x}. \tag{6.131} \]
One more function may be introduced by assuming that a is equal to x, whereas α is equal to 1/x. Then, we have
\[ y = x^{1/x}, \tag{6.132} \]
which will be used in Eq. 12.21.
6.6 Composite and inverse functions
♣
Here, we deal with the so–called composite functions and inverse functions. To begin with, let us consider the following Definition 128 (Composite function). Given two functions, say y = f (x) and u = g(y), one can use the output of the first as the input for the second. The resulting function, u = g[f (x)], is called the corresponding composite function. In other words, in the composite function u = g[f (x)], first we use y = f (x) to evaluate y from x, and then use y as the argument of u = g(y), to evaluate u.
For instance, given u = sin y and $y = x^2 + c$, the composite function is $u = \sin(x^2 + c)$. Inverse functions are defined in terms of composite functions. Specifically, we have the following
Definition 129 (Inverse function). Consider two functions y = f(x) and u = g(y). Iff the composite function, u = g[f(x)], coincides with the identity function, namely iff, for all values of x, the final result u equals the value of x we started with, then we call
x = g(y) (6.133)
the inverse of y = f(x).
Note that, if g[f(x)] = x, we have f{g[f(x)]} = f(x), namely f[g(y)] = y. In other words, y = f(x) is the inverse of x = g(y). We denote the inverse function of y = f(x) by $x = f^{-1}(y)$; similarly, we write $y = g^{-1}(x)$.
For instance, the inverse of $y = x^2 > 0$, with x ∈ [0, ∞), is $x = \sqrt{y} > 0$. Indeed, $\sqrt{x^2} = x$, for all the values of x > 0 (vice versa, $y = (\sqrt{y})^2$ implies that the inverse of $x = \sqrt{y} > 0$ is $y = x^2 > 0$). In particular, we have that
$y = \sin^{-1} x$ denotes the inverse function of x = sin y;
$y = \cos^{-1} x$ denotes the inverse function of x = cos y;
$y = \tan^{-1} x$ denotes the inverse function of x = tan y. (6.134)
◦ Comment. You might like to revisit Definition 126, p. 232, on $f^n(x) = [f(x)]^n$. Indeed, as pointed out there, $y = f^{-1}(x)$ should not be confused with the reciprocal of f(x), which is written as 1/f(x), or as $[f(x)]^{-1}$. [Again, $y = \sin^{-1} x$ always denotes the inverse function of x = sin y, whereas $y = [\sin x]^{-1}$ denotes the reciprocal of sin x, namely y = 1/sin x. On the contrary, for any positive integer n, $y = \sin^n x$ and $y = [\sin x]^n$ are fully equivalent.]
6.6.1 Graphs of inverse functions
♣
Note that, given the graph of a function f(x), the graph of its inverse, $f^{-1}(x)$, is obtained from that of f(x) through a half–turn rotation of the plane around the line that forms a 45° angle with the x-axis, namely through a reflection about the line y = x. [The graphs of the functions $y = \sin^{-1} x$, $y = \cos^{-1} x$ and $y = \tan^{-1} x$ are shown in Fig. 6.18, p. 252, which depicts only the so–called principal branch of these functions, as given in Eq. 6.135.]
This yields an interchange of the abscissas and ordinates. Accordingly, the range and domain of a function and those of its inverse function are also interchanged. This is apparent for the functions $y = x^2$, with x > 0, and $x = \sqrt{y}$ (see Figs. 6.2 and 6.4, as well as Eqs. 6.2 and 6.4). Let us introduce the following
Definition 130 (Monotonic function). A function y = f(x) is called monotonically increasing in the interval [a, b] iff $f(x_2) > f(x_1)$, whenever $x_2 > x_1$, for any $x_1$ and $x_2$ in [a, b]. [To put it in a nutshell, a monotonically increasing function is an ever growing function.] A function y = f(x) is called monotonically decreasing in the interval [a, b] iff $f(x_2) < f(x_1)$, whenever $x_2 > x_1$, for any $x_1$ and $x_2$ in [a, b]. A function y = f(x) is called monotonic in the interval [a, b] iff it is either monotonically increasing, or monotonically decreasing in [a, b]. [The term monotone is also used.]
Note that, if the function y = f(x) is monotonic, then the function $x = f^{-1}(y)$ is “well behaved,” in the sense that to any value of y corresponds a single value of x (single–valued function). In short, we say that monotonic functions may be uniquely inverted. On the other hand, if the function y = f(x) is not monotonic, then the function $x = f^{-1}(y)$ is not “well behaved,” in the sense that to a value of y may correspond multiple values of x. [Functions of this type are called multi–valued. The distinction between single–valued and multi–valued functions is addressed in detail in the next section.]
◦ Comment. Just to make sure that it’s clear to you what I am talking about, you may note that a function like $y = x^3$, which has a horizontal tangent at x = 0, is a monotonically increasing function. The graph of its inverse $x = \sqrt[3]{y}$ has a vertical tangent at y = 0, but is single–valued.
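The statement that monotonic functions may be uniquely inverted can be illustrated numerically. The sketch below is an assumption-laden illustration, not a method taken from the text: it inverts a monotonically increasing function on a given interval by repeated halving of the interval (bisection), and the name invert_monotonic is hypothetical.

```python
def invert_monotonic(f, y, lo, hi, tol=1e-12):
    """Find the single x in [lo, hi] with f(x) = y.

    Assumes f is monotonically increasing (and continuous) in [lo, hi];
    illustrative helper, not from the text.
    """
    if not (f(lo) <= y <= f(hi)):
        raise ValueError("y is outside the range of f on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:             # the solution lies to the right of mid
            lo = mid
        else:                      # the solution lies at mid or to its left
            hi = mid
    return 0.5 * (lo + hi)

cube = lambda x: x ** 3
x = invert_monotonic(cube, 8.0, 0.0, 3.0)   # numerical cube root of 8
print(x, cube(x))
```

Monotonicity is what guarantees that, at each step, exactly one half of the interval contains the solution; for a non-monotonic function the same procedure could silently discard other values of x.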
6.7 Single–valued and multi–valued functions
♣
As we have seen in the preceding section, there exist functions such that, to a given value for the argument x, corresponds more than one value for y. For instance, for the inverse of $x = y^2$, we have $y = \pm\sqrt{x}$. Similarly, $y = \sin^{-1} x$ has infinitely many values. [For, there are infinitely many values of y that yield the same value x = sin y.] To be specific, an inverse function $y = f^{-1}(x)$ takes more than one value any time the original function x = f(y) attains a given value for at least two different values of y. Only functions that are monotonically increasing or decreasing in the domain of definition have single–valued inverses.
◦ Comment. Were I to be consistent with the notations of the preceding section, for any multi–valued function I would have to use y for the argument and x for the dependent variable, namely x = F(y). [Indeed, for inverse functions we would have $F(y) = f^{-1}(y)$, consistently with the preceding section.] However, multi–valued functions are not necessarily connected with inverse functions. Accordingly, here I have used the notation y = f(x), even though this is in contrast with that used in the preceding section, where we were dealing with inverse functions.
Let us broaden our discussion of multi–valued functions outside the realm of inverse functions. Consider an algorithm y = f(x). If for any x we have only one value for y, the algorithm relating a given value of x to a value of y defines the type of functions y = f(x) that we have been addressing thus far (preceding section excluded). If we have more than one value for y, it is still convenient to consider such an algorithm to be a function. However, to emphasize this difference, we have the following
Definition 131 (Single–valued and multi–valued functions). Consider an algorithm that for any value of x yields one or more values of y. If, for some values of x, there exists more than one value for y, the algorithm y = f(x) from x to y is called a multi–valued function. On the contrary, if, for any value of x, there exists only one value for y, the algorithm y = f(x) from x to y is called a single–valued function.
Remark 54. As we will see in Section 7.2, we may encounter multi–valued functions when we go from an implicit representation of a curve, namely one of the type F(x, y) = 0, to an explicit one, namely one of the type y = f(x). For instance, the equation $x^2 + y^2 = R^2$ (Eq. 5.6) defines a circle with the center at the origin and radius R. This is the implicit representation of a circle. The explicit representation is $y = \pm\sqrt{R^2 - x^2}$ (a multi–valued function with two values for the same value of x). [More in Section 7.5.]
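The circle of Remark 54 gives a concrete multi–valued function. A minimal Python sketch follows; the name circle_branches and the choice of returning a list of values are assumptions made only for illustration.

```python
import math

def circle_branches(x, R):
    """Values of y satisfying x**2 + y**2 = R**2 (hypothetical helper).

    Returns two values inside the circle's span, one at x = +R or -R,
    and none where the explicit representation has no real value.
    """
    if abs(x) > R:
        return []
    root = math.sqrt(R * R - x * x)
    if root == 0.0:
        return [0.0]
    return [root, -root]

print(circle_branches(3.0, 5.0))   # both branches at x = 3 for R = 5
```

Returning a list makes the multi–valued character explicit: the number of values depends on x.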
• Principal branch
In order to avoid having to deal with multi–valued functions, mathematicians sometimes use the notion of the principal branch, which is best explained with an example. The principal branch of $y = \sin^{-1} x$, denoted by $y = \sin_P^{-1} x$, coincides with the inverse function of x = sin y, provided that the range of $y = \sin_P^{-1} x$ is limited to y ∈ [−π/2, π/2]. Specifically, for the inverse trigonometric functions, we have
$y = \sin_P^{-1} x$ is the inverse of x = sin y, with y ∈ [−π/2, π/2];
$y = \cos_P^{-1} x$ is the inverse of x = cos y, with y ∈ [0, π];
$y = \tan_P^{-1} x$ is the inverse of x = tan y, with y ∈ (−π/2, π/2). (6.135)
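For what it is worth, the functions math.asin, math.acos and math.atan of the Python standard library return values in the ranges listed in Eq. 6.135, so that they may be used to experiment with principal branches. The sketch below (the helper all_arcsin is hypothetical) rebuilds some of the infinitely many values of the multi–valued inverse sine from the principal one.

```python
import math

def all_arcsin(x, k_values=(-1, 0, 1)):
    """Some of the values y with sin(y) = x (hypothetical helper).

    Built from the principal value y0 in [-pi/2, pi/2]: the solutions
    are y0 + 2*pi*k and pi - y0 + 2*pi*k, for integer k.
    """
    y0 = math.asin(x)              # principal branch
    values = []
    for k in k_values:
        values.append(y0 + 2.0 * math.pi * k)
        values.append(math.pi - y0 + 2.0 * math.pi * k)
    return sorted(values)

for y in all_arcsin(0.5):
    print(y, math.sin(y))          # every y reproduces x = 0.5 (up to round-off)
```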
The principal branches of the inverse trigonometric functions are depicted in Fig. 6.18. [Compare to the graphs of the trigonometric functions, Fig. 6.12, p. 230.]
Fig. 6.18 Principal branch of $y_1(x) = \sin^{-1} x$, $y_2(x) = \cos^{-1} x$ and $y_3(x) = \tan^{-1} x$
For future reference, note that
$\sin_P^{-1} 0 = 0$, $\cos_P^{-1} 0 = \pi/2$, $\tan_P^{-1} 0 = 0$. (6.136)
6.8 Appendix. More on functions
♣
In this appendix, we clarify two aspects of the definition of functions. The first is the meaning used in this book for the term “function,” in comparison to the terms “mapping,” and “operator.” The second deals with the relationship between vectors and functions, namely between discrete and continuous formulations.
6.8.1 Functions, mappings and operators
♣
Here, we want to point out that, in this book, the term “function” is understood in a more limited sense than that used by some other authors. Specifically, in this book, the term “function” is used exclusively for an algorithm that, given a number, produces a number (or even more than one number in the case of multi–valued functions).
Let us generalize the concept. Consider an algorithm that, given some mathematical entity (such as a number, or an ordered collection of n numbers, or a function), produces another mathematical entity, not necessarily of the same type. In this book, to refer to such a generalized operation, I use exclusively the term “mapping.” Specifically, we have the following
Definition 132 (Mapping). The term mapping denotes any algorithm that, applied to some mathematical entity, generates another mathematical entity, not necessarily of the same type. [Thus, a mapping is an extension of the term function, and includes functions as particular cases. It includes any correspondence between much wider mathematical entities than those addressed in this chapter. For instance, left–multiplication of a vector by a matrix, A b = c, is a mapping, in the sense that, given a vector b, it produces a vector c.]
Particular types of mapping are given in the following definitions:
Definition 133 (Operator). An operator is a particular case of mapping, namely one that applied to a function produces a function. [For instance, the derivative, which will be introduced in Chapter 9, is an operator.]
Definition 134 (Functional). A functional is a particular case of a mapping, namely one that applied to a function produces a number. [For instance, the integral, which will be introduced in Chapter 10, is a functional.]
◦ Warning. Some authors use the terms mapping, function and operator interchangeably, with the meaning that here is reserved for mapping.
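The distinction between operators and functionals may be made tangible with a short Python sketch. Everything in it is an assumption for illustration only: derivatives and integrals are formally introduced later (Chapters 9 and 10), and here they are merely approximated by a central difference and by the trapezoidal rule; the names derivative and integral are hypothetical.

```python
def derivative(f, h=1e-5):
    """Operator: applied to a function, it produces a function.

    Central-difference approximation (an illustrative assumption).
    """
    def df(x):
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return df

def integral(f, a, b, n=1000):
    """Functional: applied to a function, it produces a number.

    Trapezoidal approximation over [a, b] with n equal subintervals.
    """
    dx = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * dx)
    return total * dx

cube = lambda x: x ** 3
d_cube = derivative(cube)          # a function again
print(d_cube(2.0))                 # close to 12
print(integral(cube, 0.0, 1.0))    # a number, close to 0.25
```

Note the types: derivative returns something that can still be evaluated at any x, whereas integral returns a plain number.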
6.8.2 Functions vs vectors
♣
You might object that an n-dimensional vector may be considered as a function defined only for x equal to the first n natural numbers, namely a function whose domain of definition is restricted to the points x = 1, 2, . . . , n. For the sake of clarity, I prefer to avoid this extended definition of the term function. Accordingly, I must add a qualifier to the definition of functions, stating that: “A function y = f(x) is the algorithm used to arrive at the value y, from the value x (in the real field), with values of x spanning over a domain that consists of a collection of nonzero intervals.” Accordingly, returning to the problem of the one–hundred meter dash runner (Section 6.1), the measured values $v_k$ (namely the dots in Fig. 6.5, p. 209) do not represent a function. They are simply the input parameters used to arrive at the function, via linear interpolation.
Here, we want to explore another aspect of this issue. Specifically, we want to study the relationship between vectors and functions (in particular, real vectors and real functions of a real variable), in order to emphasize the interconnection of the two concepts. Indeed, vectors and functions are closely related. In the case of a real vector b, we have an algorithm that, given an integer, namely the index k of the vector’s element, yields a real number $b_k$. In the case of real functions of a real variable, we have an algorithm that, given a real number in the domain of definition, yields a real number.
Note that it is easy to go from a function to a vector. Just set $f_k = f(k\,\Delta x)$. For instance, from the function $x^3$, with x ∈ [1, n], we can obtain the n-dimensional vector $v = \{k^3\}$, with k = 1, . . . , n. This process is called discretization, because we go from a continuous representation of x to a discrete one, namely to that with $x_k = k\,\Delta x$ (k = 1, . . . , n).
The other way around, namely from vectors to functions, may be obtained by connecting the points with straight lines. Of course, this does not yield the original function; the process of interpolation introduces some inaccuracies (compare Figs. 6.5, p. 209, and 6.6, p. 210). However, we may improve the accuracy of the approximation by increasing the number of points (the more points we have in a given interval, the better is the approximation), or by using more sophisticated interpolation procedures, addressed in Vol. II.
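The two directions discussed above, discretization and linear interpolation, may be sketched in a few lines of Python. The names discretize and interpolate, and the convention that the vector covers the interval from Δx to n Δx, are assumptions made for this illustration.

```python
def discretize(f, n, dx=1.0):
    """From a function to a vector: f_k = f(k*dx), k = 1, ..., n."""
    return [f(k * dx) for k in range(1, n + 1)]

def interpolate(v, x, dx=1.0):
    """From a vector to a function: connect the points (k*dx, v_k)
    with straight lines and evaluate the result at x."""
    n = len(v)
    if n < 2:
        raise ValueError("at least two points are needed")
    if not (dx <= x <= n * dx):
        raise ValueError("x outside the interval covered by the vector")
    k = min(int(x / dx), n - 1)    # x lies in [k*dx, (k+1)*dx]
    t = x / dx - k                 # local coordinate, between 0 and 1
    return (1.0 - t) * v[k - 1] + t * v[k]

v = discretize(lambda x: x ** 3, 4)    # the vector {k**3}, k = 1, ..., 4
print(v)
print(interpolate(v, 2.5), 2.5 ** 3)   # interpolated vs original value
```

At the points $x_k$ the interpolated function reproduces the vector exactly; in between, it differs from the original function, which is the inaccuracy mentioned in the text.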
Chapter 7
Basic analytic geometry
In this chapter, we introduce some elementary notions of a branch of mathematics called analytic geometry (or Cartesian geometry), in which, roughly speaking, one uses mathematical analysis to represent geometrical figures on a plane or in space, and to study their properties.
• Overview of this chapter
In Section 7.1, we discuss Cartesian frames of reference within the context of analytic geometry. In analogy to what we did for the graph of a function (Section 6.1), a Cartesian frame of reference is composed of an origin and two orthogonal axes. However, contrary to those for functions, in analytic geometry, the two axes have equivalent roles. We also address what happens if we translate and/or rotate the axes. Next, in Section 7.2, we present three ways with which lines and surfaces may be represented, namely their explicit, implicit and parametric representations. Then, we introduce straight lines on a plane (Section 7.3), as well as straight lines and planes in space (Section 7.4). In Section 7.5, we address circles and disks, as well as the formulas of addition and subtraction for sine and cosine, whereas in Section 7.6 we discuss spheres and balls. Finally, we introduce the so–called conics, namely ellipses, parabolas and hyperbolas (Section 7.7), along with their canonical and polar representations (Sections 7.8 and 7.9, respectively). For the inquisitive minds among you, we also have an appendix (Section 7.10), where we address the question: “Why are ellipses, parabolas and hyperbolas called conics?”
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_7
7.1 Cartesian frames
In Definition 107, p. 205, we introduced Cartesian axes, which consist of two straight lines, mutually orthogonal. Here, we extend the formulation, and have the following
Definition 135 (Cartesian frame). For the two–dimensional case (planar geometry), let us introduce an origin O, and a pair of Cartesian axes, that is: (i) the x-axis, namely a straight line through O (typically, but not necessarily, in the horizontal direction, pointing to the right), and (ii) the y-axis, namely a straight line through O perpendicular to the x-axis (Fig. 7.1; as stated in Footnote 1, p. 141, the terms “two–dimensional” and “three–dimensional” are sometimes abbreviated as 2D and 3D, respectively). The system composed of the origin and the x- and y-axes is called a Cartesian frame of reference. The corresponding plane is the Cartesian plane. In this plane, each point is identified by two real numbers, x and y, which are known as the Cartesian coordinates of the point. In the rest of this book, we use the notation $P = (x_P, y_P)$ to identify a point P on the (x, y)-plane. [As mentioned above, contrary to what we did in Chapter 6, where the x-axis is related to the independent variable and the y-axis to the dependent one, in analytic geometry the two axes have equivalent roles. Indeed, they will not be referred to as abscissas and ordinates.] A straight line parallel to an axis is called a coordinate line. [Specifically, a line parallel to the x-axis (y-axis) is called an x-line (y-line).]
Fig. 7.1 Cartesian axes (2D)
Fig. 7.2 Cartesian axes (3D)
◦ Warning. It may be noted that the choice of the origin, as well as that of the x-axis direction, are arbitrary. On the other hand, the y-axis is perpendicular to the x-axis. More precisely, in this book, for an observer traveling in the positive direction of the x-axis, the y-axis typically points towards the left.
In other words, it is obtained from the x-axis via a counterclockwise rotation of a π/2 angle.
◦ Comment. The extension to three dimensions is simple. We have three mutually orthogonal Cartesian axes: x, y, and z, as in Fig. 7.2. Correspondingly, we use the notation $P = (x_P, y_P, z_P)$. In this book, pointing the thumb of the right hand like the x-axis, and the index finger like the y-axis, the z-axis will be directed like the middle finger (right–handed frame of reference).
7.1.1 Changes of axes
Let us explore what happens when we change the axes. For the sake of clarity, we limit ourselves to two–dimensional problems. [Three–dimensional problems are a bit more complicated, as they require the use of orthogonal matrices, which are introduced in Chapter 16.] Specifically, here we obtain explicit expressions for the coordinate transformations corresponding to a general change of axes. We begin by examining two simple cases. In the first, we shift the origin O to a new point Q, without rotating the axes; this corresponds to a pure translation of our frame of reference. In the second case, we rotate the axes without moving the origin; this corresponds to a pure rotation around the origin of our frame of reference. Finally, we consider general transformations (called rigid–body transformations), which are obtained by combining the two preceding ones.
• Pure translations
Consider a pure translation. This is obtained by moving the origin from O to Q without rotating the axes, as shown in Fig. 7.3. Let ξ and η denote the coordinates in the new frame of reference. It is apparent from the figure that
$x = x_Q + \xi$, $y = y_Q + \eta$. (7.1)
The inverse transformation is
$\xi = x - x_Q$, $\eta = y - y_Q$. (7.2)
Fig. 7.3 Translation
• Pure rotations
Here, we consider a pure rotation. This is obtained by rotating the axes while keeping the origin fixed. Again, let ξ and η denote the coordinates in the new frame of reference. This is shown in Fig. 7.4, from which we obtain
$x = \xi\cos\theta - \eta\sin\theta$, $y = \xi\sin\theta + \eta\cos\theta$, (7.3)
as you may verify. [θ > 0 corresponds to a counterclockwise rotation.] The inverse transformation is
$\xi = x\cos\theta + y\sin\theta$, $\eta = -x\sin\theta + y\cos\theta$, (7.4)
as you may verify. [Hint: Combine Eqs. 7.3 and 7.4. Alternatively, use Fig. 7.5.]
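The hint above (combine Eqs. 7.3 and 7.4) may also be checked numerically. In the following Python sketch, the function names to_old_frame and to_new_frame are hypothetical labels for Eqs. 7.3 and 7.4, respectively.

```python
import math

def to_old_frame(xi, eta, theta):
    """Eq. 7.3: from the rotated coordinates (xi, eta) to (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return xi * c - eta * s, xi * s + eta * c

def to_new_frame(x, y, theta):
    """Eq. 7.4: from (x, y) to the rotated coordinates (xi, eta)."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c + y * s, -x * s + y * c

theta = 0.3
x, y = to_old_frame(1.0, 2.0, theta)
print(to_new_frame(x, y, theta))   # recovers (1, 2), up to round-off
```

Applying one transformation after the other returns the starting coordinates, which is what it means for Eq. 7.4 to be the inverse of Eq. 7.3.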
Fig. 7.4 Rotation
Fig. 7.5 Rotation (inverse)
• Rigid–body transformations
Any transformation that may be obtained as an arbitrary sequence of translations and rotations around the origin (in arbitrary order) is called a rigid–body transformation. You can easily convince yourself that any rigid–body transformation may be obtained as a sequence of one single rotation followed by one single translation, to obtain
$x = \xi\cos\theta - \eta\sin\theta + x_Q$, $y = \xi\sin\theta + \eta\cos\theta + y_Q$, (7.5)
where $x_Q$ and $y_Q$ identify the origin Q of the (ξ, η)-frame after the transformation, as seen from the (x, y) frame of reference. Alternatively, we can perform a sequence of one single translation followed by one single rotation, namely
$x = (\xi + \xi_Q)\cos\theta - (\eta + \eta_Q)\sin\theta$, $y = (\xi + \xi_Q)\sin\theta + (\eta + \eta_Q)\cos\theta$, (7.6)
where $(\xi_Q, \eta_Q)$ identify the origin of the (ξ, η)-frame after the translation, as seen in the (ξ, η) frame of reference.
◦ Comment. Note that, if we interchange the order of the two transformations, the overall transformation changes in a formal way, but not in any substantial one! Indeed, they are fully equivalent, because $(x_Q, y_Q)$ in Eq. 7.5 and $(\xi_Q, \eta_Q)$ in Eq. 7.6 are related by Eq. 7.3.
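The equivalence of Eqs. 7.5 and 7.6 stated in the comment above may be verified with a small numerical experiment. The Python sketch below uses hypothetical function names; the only ingredient beyond the two equations is Eq. 7.3, used to relate the two pairs of offsets.

```python
import math

def rotate_then_translate(xi, eta, theta, x_q, y_q):
    """Eq. 7.5."""
    c, s = math.cos(theta), math.sin(theta)
    return xi * c - eta * s + x_q, xi * s + eta * c + y_q

def translate_then_rotate(xi, eta, theta, xi_q, eta_q):
    """Eq. 7.6."""
    c, s = math.cos(theta), math.sin(theta)
    return ((xi + xi_q) * c - (eta + eta_q) * s,
            (xi + xi_q) * s + (eta + eta_q) * c)

def offsets_in_old_frame(theta, xi_q, eta_q):
    """(x_Q, y_Q) obtained from (xi_Q, eta_Q) through Eq. 7.3."""
    c, s = math.cos(theta), math.sin(theta)
    return xi_q * c - eta_q * s, xi_q * s + eta_q * c

theta, xi_q, eta_q = 0.7, 2.0, -1.0
x_q, y_q = offsets_in_old_frame(theta, xi_q, eta_q)
print(rotate_then_translate(1.5, 0.5, theta, x_q, y_q))
print(translate_then_rotate(1.5, 0.5, theta, xi_q, eta_q))
```

The two printed pairs agree up to round-off, for any choice of θ and of the point (ξ, η).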
7.2 Types of representations
In Remark 54, p. 251, we touched upon the notion of explicit and implicit representations of lines and surfaces. Here, before embarking on an analysis of specific lines and surfaces, we want to introduce a terminology that allows one to distinguish the different kinds of representations in general. Indeed, in this book we do not limit ourselves to those lines and surfaces that are discussed in this chapter, but include more general geometrical figures. Specifically, in this section, we introduce the definitions of explicit, implicit and parametric representations for lines in two dimensions (Subsection 7.2.1), and for lines and surfaces in three dimensions (Subsection 7.2.2). [In this section, I do not provide you with illustrative examples. These are presented throughout the rest of the chapter.]
7.2.1 Two dimensions
We have the following
Definition 136 (Representation of lines in two dimensions). A representation of a two–dimensional line of the type
y = f(x) (7.7)
is called explicit, whereas one of the type
F(x, y) = 0 (7.8)
is called implicit, and finally one of the type
x = X(u) and y = Y(u) (7.9)
is called parametric, u being the parameter that describes the line. Note that Eq. 7.7 is identical to the expression used to represent functions (Eq. 6.1). [A representation of the type x = g(y) (akin to the expression used to represent inverse functions, Eq. 6.133) is an alternate explicit representation of the line, as the axes have equivalent roles.] ◦ Comment. The explicit representation may be easily recast as both: (i) an implicit one, by rewriting as F (x, y) := y − f (x) = 0, and (ii) a parametric one, by writing x = u, y = f (u). [On the other hand, while it is always possible to go from the explicit to the implicit, it is not always possible to go the other way around, within the realm of the functions available.]
7.2.2 Three dimensions
In three dimensions, we have two types of “nice” geometrical entities, namely: (i) lines, namely one–dimensional entities, embedded in a three–dimensional space, and (ii) surfaces, namely two–dimensional entities also embedded in a three–dimensional space. [Surfaces are formally defined later (Definition 169, p. 350); see Footnote 1. For the time being, surfaces are those entities described by Eqs. 7.10, or 7.11, or 7.12.]
Footnote 1. To be precise, here I call a geometrical entity “nice,” if it locally behaves like a Euclidean space having the same dimensions. Mathematicians call it a manifold.
Here, we address the types of representations for both cases. We have the following
Definition 137 (Representation of surfaces). Akin to lines in two dimensions, we have three types of representations for surfaces. Specifically, a representation of a surface of the type z = f (x, y)
(7.10)
is called explicit, one of the type F (x, y, z) = 0
(7.11)
is called implicit, whereas one of the type
x = X(u, v), y = Y(u, v), z = Z(u, v) (7.12)
is called parametric, u and v being the parameters that describe the surface.
◦ Comment. Akin to lines in a two–dimensional space, the explicit representation may be easily recast as both: (i) an implicit one, by rewriting as F(x, y, z) := z − f(x, y) = 0, and (ii) a parametric one, by writing x = u, y = v, and z = f(u, v).
We have the following
Definition 138 (Representation of lines in three dimensions). Given a line in three dimensions, a representation of the type
y = f(x) and z = g(x) (7.13)
is called explicit, one of the type
F(x, y, z) = 0 and G(x, y, z) = 0 (7.14)
(namely the intersection of two surfaces) is called implicit, whereas one of the type
x = X(u), y = Y(u), z = Z(u) (7.15)
is called parametric; u is called the parameter describing the line.
◦ Comment. Note that, if in Eq. 7.12 we set v = constant and vary only u, we obtain a line on the surface. We will refer to this as a u-line. Thus, on a surface given in parametric form, we can identify two families of lines: the u-lines and the v-lines. More generally, if we set in Eq. 7.12 v = v(u), we obtain a parametric representation of a line on the surface (use Eq. 7.15).
7.3 Straight lines on a plane
Here, we consider straight lines in two dimensions. [More on this subject in Subsection 15.2.5, on lines and planes revisited, after we introduce physicists’ vectors.] In Subsection 6.3.2, we have introduced first–degree polynomials, such as in Eq. 6.26, namely
y = m x + b. (7.16)
In this form, it is apparent that when x = 0 we have y = b. Thus, Eq. 7.16 represents a line that has a constant slope equal to m and crosses the y-axis (namely the line x = 0) at y = b. ◦ Comment. Given the fact that, as already pointed out, the x- and y-axes have interchangeable roles, we also consider the equation x = n y + c.
(7.17)
This allows us to consider vertical lines (n = 0), something not possible with Eq. 7.16. On the other hand, this last equation cannot represent horizontal lines, something that Eq. 7.16 can (m = 0).
◦ Warning. In Definition 80, p. 176, I have introduced a straight line as an infinitely long line, with the property that a segment connecting any two of its points belongs to the line itself (where a segment, in turn, is defined as the shortest line connecting two given points). As pointed out in Subsection 6.3.2 on first–degree polynomials, here in the middle of the game I change the rules. Indeed, we have the following
Definition 139 (Straight line). A straight line is the graph of the functions in Eqs. 7.16 and 7.17 (first–degree polynomials). [Equations 7.16 and 7.17 are explicit representations of a straight line. In other words, now a straight line is a line with a constant slope. The equivalence of this new definition of straight line with that given in Euclidean geometry is addressed in Subsection 7.3.1.]
Other explicit representations for a straight line are
$y - y_0 = m\,(x - x_0)$ and $x - x_0 = n\,(y - y_0)$. (7.18)
In this form, it is apparent that when x = x0 , we have y = y0 . Hence, Eq. 7.18 represents a straight line through the point P0 = (x0 , y0 ), with a slope equal to m (or 1/n). In other words, P0 = (x0 , y0 ) is a point of the line. [Of course, the first in Eq. 7.18 is fully equivalent to Eq. 7.16, with b = y0 − m x0 , namely y0 = m x0 + b, whereas the second in Eq. 7.18 is fully equivalent to Eq. 7.17, with c = x0 − n y0 , namely x0 = n y0 + c.]
An implicit representation for a straight line is
$a_x (x - x_0) + a_y (y - y_0) = 0$, (7.19)
which is fully equivalent to Eq. 7.18 (with $m = -a_x/a_y$, or $n = -a_y/a_x$). The advantage of this form of representation with respect to the preceding ones is that it covers both horizontal and vertical lines. [For future reference, Eq. 7.19 may be written as
a x + b y = c, (7.20)
with $a = a_x$, $b = a_y$ and $c = a_x x_0 + a_y y_0$.]
A parametric representation of a straight line may be obtained as follows. Note that a straight line through $P_0 = (x_0, y_0)$ is the locus of the points P = (x, y) such that $P - P_0$ is proportional to a given vector $\mathbf{b} = [b_x, b_y]^T$. In terms of coordinates, we have that such a straight line may be represented as
$x - x_0 = \lambda\, b_x$, $y - y_0 = \lambda\, b_y$, (7.21)
with λ ∈ (−∞, ∞). This equation is equivalent to
$\frac{x - x_0}{b_x} = \frac{y - y_0}{b_y}$, (7.22)
which may be recast as
$b_y (x - x_0) - b_x (y - y_0) = 0$. (7.23)
For future reference, this equation is equivalent to Eq. 7.19, with
$a_x = \gamma\, b_y$ and $a_y = -\gamma\, b_x$, (7.24)
namely $a_x/b_y = -a_y/b_x$, or
$a_x b_x + a_y b_y = 0$. (7.25)
[More on this relationship in Subsection 15.2.5 (see Eq. 15.51).]
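The passage from the parametric representation (Eq. 7.21) to the implicit one (Eq. 7.19) may be checked with a few lines of Python. The function names below are hypothetical, and γ = 1 is merely a convenient default.

```python
def implicit_coefficients(bx, by, gamma=1.0):
    """Eq. 7.24: coefficients (a_x, a_y) of Eq. 7.19 from the direction (b_x, b_y)."""
    return gamma * by, -gamma * bx

def point_on_line(x0, y0, bx, by, lam):
    """Eq. 7.21: point of the line through (x0, y0) for a given lambda."""
    return x0 + lam * bx, y0 + lam * by

x0, y0, bx, by = 1.0, 2.0, 3.0, 4.0
ax, ay = implicit_coefficients(bx, by)
print(ax * bx + ay * by)                       # Eq. 7.25
for lam in (-2.0, 0.0, 1.5):
    x, y = point_on_line(x0, y0, bx, by, lam)
    print(ax * (x - x0) + ay * (y - y0))       # left side of Eq. 7.19
```

Every point generated by Eq. 7.21 makes the left side of Eq. 7.19 vanish, whatever the value of λ.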
7.3.1 Reconciling two definitions of straight lines
♥
Here, we address the fact that we, like the rest of the world, have now two definitions for a straight line. Specifically, in Euclidean geometry we use Definition 80, p. 176, namely as an infinite line with the property that a segment connecting any two of its points belongs to the line, where a segment in turn is defined as the shortest line connecting two points. On the other hand, in analytic geometry we use Definition 139, p. 262, namely as the graph of a first–order polynomial. That puzzled me when I was a student. Here, we want to reconcile the two definitions. Note that in going from Euclidean geometry to analytic geometry, we have to introduce the Cartesian axes. Assume that having chosen the x- and y-axes (arbitrarily), we have obtained that the line is described by y = mx + b (Eq. 7.16). Next, let us perform a rigid–body transformation of the axes (Eq. 7.5), so as to obtain ξ sin θ + η cos θ + yQ = m (ξ cos θ − η sin θ + xQ) + b,
(7.26)
where θ, xQ and yQ are arbitrary. The above equation may be recast as (cos θ + m sin θ) η = (− sin θ + m cos θ) ξ + (b + m xQ − yQ).
(7.27)
Next, let us choose tan θ = m. This yields
$\eta = \frac{b + m\,x_Q - y_Q}{\cos\theta + m\sin\theta} = (b + m\,x_Q - y_Q)\cos\theta =: C$, (7.28)
namely a line parallel to the ξ-axis. Finally, let us perform a rigid–body translation (Eq. 7.1), such that $b + m\,x_Q - y_Q = 0$, so as to obtain C = 0. Then, the above equation reduces to η = 0. In other words, the original line is now represented by the ξ-axis of the new frame of reference. But the ξ-axis is a straight line (see Definition 107, p. 205, which was adopted for analytical geometry in Section 7.1).
◦ Comment. For the critical minds, let us show that any other continuous line, say L, connecting the points A and B on the ξ-axis is longer than the segment AB. For simplicity, let us assume that the line L may be represented by a single–valued function η = η(ξ). Divide the line L into small elements $L_k$. Replace each element with a straight element (segment), say $\check{L}_k$. The result is that L is approximated with a continuous line, say $\check{L}$, composed of all the segments $\check{L}_k$. The projection of each segment $\check{L}_k$ onto the ξ-axis is shorter than the segment itself (Lemma 7, p. 197). Therefore, the line $\check{L}$ is longer than the portion AB of the ξ-axis. As the elements $L_k$ become smaller and smaller, $\check{L}$ becomes closer and closer to L and we obtain that L is longer than AB. [For simplicity, we have assumed that the line L may be represented by a single–valued function η = η(ξ). I’ll let you show that the result holds all the more when the function is multi–valued. Also, a refined analysis is presented in Subsection 10.5.3, after we introduce a method to evaluate the length of a line.]
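The argument of this subsection may be checked numerically: with tan θ = m and with Q chosen on the line, every point of y = m x + b should have η = 0 in the new frame. The Python sketch below is an illustration under these assumptions; the function name is hypothetical, and the new coordinates are obtained by inverting Eq. 7.5, namely by applying Eq. 7.4 to $(x - x_Q, y - y_Q)$.

```python
import math

def eta_of_line_points(m, b, x_q, xs):
    """eta-coordinates of points of y = m*x + b in a frame rotated by
    theta = atan(m), with origin Q = (x_q, m*x_q + b) on the line."""
    theta = math.atan(m)               # so that tan(theta) = m
    c, s = math.cos(theta), math.sin(theta)
    y_q = m * x_q + b                  # enforces b + m*x_q - y_q = 0, i.e. C = 0
    etas = []
    for x in xs:
        y = m * x + b
        # Inverse of Eq. 7.5: Eq. 7.4 applied to (x - x_q, y - y_q).
        etas.append(-(x - x_q) * s + (y - y_q) * c)
    return etas

print(eta_of_line_points(2.0, 1.0, 0.5, [-3.0, 0.0, 4.0]))  # all zero, up to round-off
```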
7.3.2 The 2 × 2 linear algebraic system revisited
We are now in a position to give a geometrical interpretation of the 2 × 2 linear algebraic system (Eq. 2.1), namely
a x + b y = f, c x + d y = g. (7.29)
In analytic geometry, each of the two equations represents a straight line in the (x, y)-plane (Eq. 7.20). Thus, the above 2 × 2 linear algebraic system has the following geometrical interpretation: find the intersection of the two straight lines, namely the point P = (x, y) that the two lines have in common. Let us examine the three possibilities encountered in Theorem 11, p. 67. Assume first that b/a ≠ d/c (namely that a d − b c ≠ 0). In this case, the two lines have different slopes; this guarantees the existence of a unique solution, namely a single intersection point. Next, assume that b/a = d/c. In this case, the two lines have the same slope. In addition, if c/a = d/b ≠ g/f, the two lines are parallel and distinct, and hence the solution does not exist. On the other hand, if c/a = d/b = g/f, the two lines coincide, and hence the solution exists, but is not unique: every point of the line is a solution to the problem.
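The three possibilities above may be turned into a small classification routine. The Python sketch below is only an illustration: the name classify_2x2, the tolerance, and the use of cross–products instead of ratios (so as to avoid divisions by zero) are all assumptions, and rows with a = b = 0 or c = d = 0 are not treated.

```python
def classify_2x2(a, b, c, d, f, g, tol=1e-12):
    """Classify the system a x + b y = f, c x + d y = g (Eq. 7.29)."""
    det = a * d - b * c
    if abs(det) > tol:
        # Different slopes: a single intersection point.
        x = (f * d - b * g) / det
        y = (a * g - c * f) / det
        return "unique", (x, y)
    # Same slope: the lines coincide iff the right-hand sides are in the
    # same proportion as the coefficients.
    if abs(a * g - c * f) <= tol and abs(b * g - d * f) <= tol:
        return "infinitely many", None
    return "none", None

print(classify_2x2(1.0, 1.0, 2.0, -1.0, 3.0, 0.0))   # intersecting lines
print(classify_2x2(1.0, 1.0, 2.0, 2.0, 1.0, 3.0))    # parallel and distinct
print(classify_2x2(1.0, 1.0, 2.0, 2.0, 1.0, 2.0))    # coincident lines
```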
• Ill–conditioned systems
Let us consider an even deeper analysis. Consider the case
$$a\,d - b\,c \simeq 0. \tag{7.30}$$
[Recall that the symbol “≃” stands for “approximately equal to” (Footnote 16, p. 174).] In this case, the lines are almost parallel. Thus, a unique intersection point exists. However, ..., there is a “however”: if the second line is translated by a small amount (which corresponds to changing g by a small amount, while leaving the other coefficients unchanged), the intersection point changes considerably. Similar considerations hold for small variations of c or d. We can say that the system is sensitive to small variations of the coefficients. The corresponding mathematical terminology is that the system is “ill conditioned.” [The extension to three dimensions is addressed in Section 7.4.]
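To see this sensitivity at work, here is a minimal numerical sketch. The coefficients are chosen purely for illustration (they do not appear in the text), and `solve_2x2` is a hypothetical helper based on Cramer's rule. The two systems differ only in g, by 0.001, yet the intersection point moves by a distance of order one.

```python
def solve_2x2(a, b, c, d, f, g):
    """Solve a x + b y = f, c x + d y = g by Cramer's rule (a d - b c != 0)."""
    det = a * d - b * c
    return (f * d - b * g) / det, (a * g - f * c) / det


# Two almost parallel lines: x + y = 2 and x + 1.001 y = g.
# Here a d - b c = 0.001, namely the situation of Eq. 7.30.
p_first = solve_2x2(1.0, 1.0, 1.0, 1.001, 2.0, 2.001)   # close to (1, 1)
p_second = solve_2x2(1.0, 1.0, 1.0, 1.001, 2.0, 2.002)  # close to (0, 2)

if __name__ == "__main__":
    print("g = 2.001:", p_first)
    print("g = 2.002:", p_second)
```

A change of one part in two thousand in g is amplified roughly a thousandfold in the solution, the amplification being governed by 1/(a d − b c).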
7.4 Planes and straight lines in space
Here, we consider planes and straight lines in three dimensions. [More on this subject in Subsection 15.2.5, on lines and planes revisited, after we introduce physicists’ vectors.]
7.4.1 Planes in space
Equation 7.18 for a straight line on a plane may be generalized to yield
$$z = z_0 + m\,(x - x_0) + n\,(y - y_0). \tag{7.31}$$
This is the explicit representation of a plane through the point $P_0 = (x_0, y_0, z_0)$, having slope equal to m along the x-direction, and equal to n along the y-direction. An implicit representation for a plane in a three–dimensional space is the generalization of Eq. 7.19 for a straight line on a plane, namely
$$a_x (x - x_0) + a_y (y - y_0) + a_z (z - z_0) = 0, \tag{7.32}$$
which, for $a_z \neq 0$, is fully equivalent to Eq. 7.31, with $m = -a_x/a_z$ and $n = -a_y/a_z$.
◦ Comment. The advantage of the representation in Eq. 7.32, with respect to that in Eq. 7.31, is that it includes vertical planes, namely planes parallel to the z-axis. These correspond to $a_z = 0$, namely $a_x (x - x_0) + a_y (y - y_0) = 0$. This equation represents a vertical plane through the line $a_x (x - x_0) + a_y (y - y_0) = 0$ of the (x, y)-plane, since z is arbitrary. This could not be obtained with Eq. 7.31, since $m = -a_x/0$ and $n = -a_y/0$ do not exist.
Equation 7.32 may be written as (compare to Eq. 7.20 for a straight line on a plane)
$$a\,x + b\,y + c\,z = d, \tag{7.33}$$
with $a = a_x$, $b = a_y$, $c = a_z$ and $d = a_x x_0 + a_y y_0 + a_z z_0$.
Finally, Eq. 7.21 suggests a way to introduce a parametric representation for a plane, namely
$$x - x_0 = \lambda\,a_x + \mu\,b_x, \qquad y - y_0 = \lambda\,a_y + \mu\,b_y, \qquad z - z_0 = \lambda\,a_z + \mu\,b_z, \tag{7.34}$$
where λ, μ ∈ (−∞, ∞), whereas a = $[a_x, a_y, a_z]^T$ and b = $[b_x, b_y, b_z]^T$ are any two linearly independent (namely not proportional to each other) vectors, to which the plane under consideration is parallel. You may verify that the above equation indeed defines a plane through $P_0$ that is parallel to both a and b. [Hint: Vary λ with μ fixed, and vice versa.]
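The link between the parametric representation (Eq. 7.34) and the implicit one (Eq. 7.32) may be sketched numerically. The sketch below rests on an assumption that goes beyond the text at this point: the coefficients of Eq. 7.32 are taken as the components of the cross product of the two vectors a and b of Eq. 7.34, a tool introduced only later with physicists' vectors. All function names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-component vectors (assumed tool, see lead-in)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_point(p0, a, b, lam, mu):
    """Point of the plane through p0 parallel to a and b (Eq. 7.34)."""
    return tuple(p0[i] + lam * a[i] + mu * b[i] for i in range(3))


def implicit_residual(p, p0, n):
    """Left side of Eq. 7.32, with the coefficients given by n."""
    return sum(n[i] * (p[i] - p0[i]) for i in range(3))


if __name__ == "__main__":
    p0, a, b = (1, 2, 3), (1, 0, 2), (0, 1, -1)
    n = cross(a, b)  # (-2, 1, 1), orthogonal to both a and b
    for lam, mu in [(0, 0), (1, 0), (0, 1), (2, -3)]:
        p = plane_point(p0, a, b, lam, mu)
        print(p, implicit_residual(p, p0, n))  # the residual vanishes
```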
• 3 × 3 linear algebraic systems revisited. Ill–conditioned systems
The remarks on 2 × 2 linear algebraic systems, presented in Subsection 7.3.2, apply as well to the 3 × 3 case. Specifically, consider three planes. If they are not parallel, then the solution (intersection of the three planes) exists and is unique. If at least two planes are parallel but do not coincide, the solution does not exist. If two planes coincide (and the third is not parallel to them), we have ∞¹ solutions. If the three planes coincide, we have ∞² solutions. These considerations provide a geometrical interpretation of Theorem 14, p. 79. Moreover, the remarks on the sensitivity of the solution to ill–conditioned systems, also presented in the preceding section for the two–dimensional case, apply as well to the three–dimensional case.
7.4.2 Straight lines in space
♣
Next question: “How do we represent a line in three dimensions?” A standard approach is to represent a line as the intersection of two planes, namely
$$a_1 x + b_1 y + c_1 z + d_1 = 0, \qquad a_2 x + b_2 y + c_2 z + d_2 = 0. \tag{7.35}$$
Another representation (a parametric one) is to consider a line through $P_0$ as the locus of the points P such that $P - P_0$ is parallel (namely proportional) to a given vector a = $[a_x, a_y, a_z]^T$ (compare to Eq. 7.21 for a straight line on a plane):
$$x - x_0 = \lambda\,a_x, \qquad y - y_0 = \lambda\,a_y, \qquad z - z_0 = \lambda\,a_z, \tag{7.36}$$
with λ ∈ (−∞, ∞). This is equivalent to (compare to Eq. 7.22 for a straight line on a plane)
$$\frac{x - x_0}{a_x} = \frac{y - y_0}{a_y} = \frac{z - z_0}{a_z}, \tag{7.37}$$
which may be recast as (compare to Eq. 7.23 for a straight line)
$$a_y (x - x_0) = a_x (y - y_0) \qquad \text{and} \qquad a_z (x - x_0) = a_x (z - z_0). \tag{7.38}$$
[Note that the above representation is of the type given in Eq. 7.35. Moreover, the first equation is satisfied for an arbitrary value of z; thus, it represents a plane parallel to the z-axis. Similarly, the second equation represents a plane parallel to the y-axis.]
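As a quick check of the passage from the parametric form (Eq. 7.36) to the two planes of Eq. 7.38, the following sketch generates points of the line and evaluates both plane equations. The function names are hypothetical. Note the assumption $a_x \neq 0$: otherwise the two equations of Eq. 7.38 no longer single out a line.

```python
def line_point(p0, a, lam):
    """Point of the line through p0 parallel to a (Eq. 7.36)."""
    return tuple(p0[i] + lam * a[i] for i in range(3))


def plane_residuals(p, p0, a):
    """Residuals (left side minus right side) of the two equations in Eq. 7.38."""
    x, y, z = p
    x0, y0, z0 = p0
    ax, ay, az = a
    first = ay * (x - x0) - ax * (y - y0)    # plane parallel to the z-axis
    second = az * (x - x0) - ax * (z - z0)   # plane parallel to the y-axis
    return first, second


if __name__ == "__main__":
    p0, a = (1, -2, 4), (2, 3, -1)  # illustrative data, with a_x != 0
    for lam in (-2, 0, 1, 5):
        p = line_point(p0, a, lam)
        print(p, plane_residuals(p, p0, a))  # both residuals vanish
```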
7.5 Circles and disks
The definition of circle and disk in analytical geometry is essentially the same as that in Euclidean geometry (Definition 83, p. 177). Specifically, we have the following
Definition 140 (Circle and disk). A circle $\mathcal{C}$ is the locus of the points in the (x, y)-plane that are equidistant from a given point C. The common distance R is called the radius of the circle, the point C its center. The diameter is the length of a segment through C, with endpoints on the circle. [It equals 2R, as you may verify.] The length of the circle is called the circumference. A disk is the locus of all the points that are inside $\mathcal{C}$ (interior points), possibly including boundary points (namely points on $\mathcal{C}$). In particular, we use the term open disk iff all the boundary points are excluded, and closed disk iff all the boundary points are included.
Thus, according to the Pythagorean theorem ($a^2 + b^2 = c^2$, Eq. 5.6), the equation for the circle is (use Fig. 7.6)
$$(x - x_C)^2 + (y - y_C)^2 = R^2, \tag{7.39}$$
or (implicit representation)
$$F(x, y) := (x - x_C)^2 + (y - y_C)^2 - R^2 = 0. \tag{7.40}$$
Fig. 7.6 Circle $(x - x_C)^2 + (y - y_C)^2 = R^2$
It is easy to obtain an explicit representation. Specifically, the function y = y(x) is multi–valued, and hence we need two distinct representations, one for the upper portion of the circle (plus sign), one for the lower (minus sign), namely
$$y = y_C \pm \sqrt{R^2 - (x - x_C)^2}. \tag{7.41}$$
Similarly, we have $x = x_C \pm \sqrt{R^2 - (y - y_C)^2}$.
A parametric representation of a circle of radius R and center $C = (x_C, y_C)$ is given by
$$x = x_C + R\cos u \qquad \text{and} \qquad y = y_C + R\sin u, \tag{7.42}$$
where u denotes the angle between the segments CQ and CP, as you may verify. [Hint: Substitute the above expressions into Eq. 7.39. See also Eq. 6.64.] On the other hand, the equation for an open disk is
$$(x - x_C)^2 + (y - y_C)^2 < R^2, \tag{7.43}$$
whereas for a closed disk we have
$$(x - x_C)^2 + (y - y_C)^2 \le R^2. \tag{7.44}$$
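The representations above lend themselves to a short numerical check. In the sketch below, which is only an illustration with hypothetical function names and an arbitrary tolerance, the parametric form (Eq. 7.42) is substituted into the implicit function F of Eq. 7.40, and the sign of F is used to tell interior points from exterior ones (Eqs. 7.43 and 7.44).

```python
import math


def circle_point(xc, yc, R, u):
    """Parametric representation of the circle (Eq. 7.42)."""
    return xc + R * math.cos(u), yc + R * math.sin(u)


def F(x, y, xc, yc, R):
    """Implicit representation of the circle (Eq. 7.40)."""
    return (x - xc) ** 2 + (y - yc) ** 2 - R ** 2


def locate(x, y, xc, yc, R, tol=1e-12):
    """Classify a point with respect to the circle (tolerance is an assumption)."""
    value = F(x, y, xc, yc, R)
    if abs(value) <= tol:
        return "on circle"
    return "inside" if value < 0 else "outside"


if __name__ == "__main__":
    x, y = circle_point(1.0, 2.0, 3.0, 0.7)
    print(F(x, y, 1.0, 2.0, 3.0))            # zero, up to rounding
    print(locate(1.0, 2.0, 1.0, 2.0, 3.0))   # the center is an interior point
    print(locate(5.0, 2.0, 1.0, 2.0, 3.0))   # a point beyond the radius
```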
7.5.1 Polar coordinates
In Eq. 7.42 R is a constant. If the radius is not kept constant, we use the symbols r and ϕ in place of R and u, and set $x_C = y_C = 0$. The variables r and ϕ are called polar coordinates. Specifically, we have (see Fig. 7.7)
$$x = r\cos\varphi \qquad \text{and} \qquad y = r\sin\varphi \qquad \big(r \in [0, \infty);\ \varphi \in (-\pi, \pi]\big). \tag{7.45}$$
If we keep r constant (as in Eq. 7.42), and vary ϕ, Eq. 7.45 describes a circle centered on the origin. On the other hand, if we keep ϕ constant and vary r, we describe a ray from the origin (Eq. 7.21, for the parametric representation of a straight line). The lines r = constant (circles) and ϕ = constant (rays) are called coordinate lines (ϕ-lines and r-lines, respectively). The inverse relationships are
$$r = \sqrt{x^2 + y^2} \qquad \text{and} \qquad \varphi = \tan^{-1}\frac{y}{x}. \tag{7.46}$$
Thus, r is the distance of P = (x, y) from the origin, whereas ϕ is the angle that the segment OP forms with the x-axis.
Fig. 7.7 Polar coordinates
Remark 55. Note that the definition of ϕ in Eq. 7.46 does not yield a unique value for ϕ. For, we can always add 2kπ (k = ±1, ±2, . . . ), and obtain the same result; this non–uniqueness, intrinsic in the definition of an angle, is removed through the limitation ϕ ∈ (−π, π] included in Eq. 7.45. However, because the period of tan ϕ is π (and not 2π, third in Eq. 6.69), this still leaves ϕ non–uniquely defined, because if we change the signs of both x and y their ratio y/x remains unchanged. Thus, the definition must be completed by imposing that, upon substitution of ϕ into Eq. 7.45, we obtain the correct signs for x and y. This assumption is tacitly used throughout the book.
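The completion of the definition of ϕ described in Remark 55 is what the two–argument arctangent of most programming languages provides. The sketch below uses Python's `math.atan2` for this purpose; the function names `to_polar` and `naive_phi` are hypothetical. It compares the result with the bare arctangent of y/x, which cannot distinguish (x, y) from (−x, −y).

```python
import math


def to_polar(x, y):
    """Polar coordinates (r, phi) of the point (x, y), as in Eq. 7.46.

    math.atan2 looks at the signs of both x and y, so that substituting
    the result into Eq. 7.45 gives back x and y with the correct signs.
    """
    return math.hypot(x, y), math.atan2(y, x)


def naive_phi(x, y):
    """Bare arctangent of y/x: right only when x > 0."""
    return math.atan(y / x)


if __name__ == "__main__":
    x, y = -1.0, -1.0                      # a point in the third quadrant
    r, phi = to_polar(x, y)
    print(r, phi)                          # phi is -3 pi / 4
    print(naive_phi(x, y))                 # pi / 4: the wrong quadrant
    print(r * math.cos(phi), r * math.sin(phi))  # recovers (-1, -1)
```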
7.5.2 Addition formulas for sine and cosine
♣
Here, I present a set of formulas that were among the most hated by my students. I am referring to the formulas of addition and subtraction for sin θ, cos θ and tan θ. [Actually, I include myself in the haters’ group, although I have to admit that they are often very useful. Indeed, I will often use them in this book.] We have the following
Theorem 68 (Addition and subtraction formulas). The functions sin θ, cos θ and tan θ satisfy the following relationships, known as the formulas of addition and subtraction for sines, cosines and tangents:
$$\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta, \tag{7.47}$$
$$\cos(\alpha \pm \beta) = \cos\alpha\cos\beta \mp \sin\alpha\sin\beta, \tag{7.48}$$
$$\tan(\alpha \pm \beta) = \frac{\tan\alpha \pm \tan\beta}{1 \mp \tan\alpha\tan\beta}. \tag{7.49}$$
◦ Proof : Consider Fig. 7.8, where the points P and Q lie on the unit circle centered on the origin O. The point Q has coordinates
$$x_Q = 1 \qquad \text{and} \qquad y_Q = 0. \tag{7.50}$$
On the other hand, the segment OP forms an angle θ = α + β with the x-axis. Hence, we have
$$x_P = \cos(\alpha + \beta) \qquad \text{and} \qquad y_P = \sin(\alpha + \beta). \tag{7.51}$$
Fig. 7.8 Addition formula (A)
Fig. 7.9 Addition formula (B)
In addition, let us consider a different frame of reference, where the x′-axis forms an angle α with the x-axis (Fig. 7.9). In this second frame of reference, we use, for the same point P considered above, the coordinates $x'_P$ and $y'_P$, which may be expressed in terms of β, as
$$x'_P = \cos\beta \qquad \text{and} \qquad y'_P = \sin\beta. \tag{7.52}$$
Similarly, the coordinates of the point Q may be expressed in terms of α > 0, as
$$x'_Q = \cos\alpha \qquad \text{and} \qquad y'_Q = -\sin\alpha. \tag{7.53}$$
Next, we want to take advantage of the fact that the distance $\overline{PQ}$ is independent of the frame of reference chosen. Specifically, in the (x, y) frame, we have
$$\overline{PQ}^2 = (x_P - x_Q)^2 + (y_P - y_Q)^2 = [\cos(\alpha + \beta) - 1]^2 + \sin^2(\alpha + \beta) = 2 - 2\cos(\alpha + \beta). \tag{7.54}$$
[Hint: Use Eqs. 7.50 and 7.51, as well as $\sin^2 x + \cos^2 x = 1$ (Eq. 6.75).] On the other hand, in the (x′, y′) frame we have
$$\overline{PQ}^2 = (x'_P - x'_Q)^2 + (y'_P - y'_Q)^2 = (\cos\beta - \cos\alpha)^2 + (\sin\beta + \sin\alpha)^2 = 2 - 2\cos\alpha\cos\beta + 2\sin\alpha\sin\beta. \tag{7.55}$$
[Hint: Use Eqs. 7.52 and 7.53, and again $\sin^2 x + \cos^2 x = 1$ (Eq. 6.75).] Comparing the expressions in Eqs. 7.54 and 7.55, we obtain
$$\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta, \tag{7.56}$$
which coincides with Eq. 7.48, with the upper sign. [To obtain the expressions with the lower sign, you may replace β with −β, and use the fact that sin β is odd, whereas cos β is even (Eq. 6.70).] Next, let us consider the formulas for sin(α + β). We have
$$\sin(\alpha + \beta) = \cos(\alpha + \beta - \pi/2) = \cos(\alpha - \pi/2)\cos\beta - \sin(\alpha - \pi/2)\sin\beta = \sin\alpha\cos\beta + \cos\alpha\sin\beta. \tag{7.57}$$
[Hint: For the first and third equalities, recall that cos(γ − π/2) = sin γ and sin(γ − π/2) = − cos γ (Eq. 6.71) and use γ = α + β for the first equality, and γ = α for the third. For the second, use Eq. 7.56.] The above equation coincides with Eq. 7.47, with the upper sign. [For the lower sign use again Eq. 6.70.] Finally, combining Eqs. 7.47 and 7.48, we have
$$\tan(\alpha \pm \beta) = \frac{\sin(\alpha \pm \beta)}{\cos(\alpha \pm \beta)} = \frac{\sin\alpha\cos\beta \pm \cos\alpha\sin\beta}{\cos\alpha\cos\beta \mp \sin\alpha\sin\beta}, \tag{7.58}$$
which is equivalent to Eq. 7.49, as one sees by dividing numerator and denominator by cos α cos β.
Remark 56. In the above proof, we have used Figs. 7.8 and 7.9, where α > 0 and β > 0, with α + β ≤ π/2. You can verify that such restrictions are not used in Eqs. 7.51–7.53. Thus, the proof is valid for any value of the angles.
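A numerical spot check is of course no substitute for the proof, but it may reassure you that no restriction on the angles is hiding in Eqs. 7.47 and 7.48. The sketch below (the function name and the sample angles are illustrative choices) evaluates both sides for angles well outside the range used in Figs. 7.8 and 7.9, negative values included, and returns the largest discrepancy found.

```python
import math


def largest_residual(angles):
    """Largest mismatch between the two sides of Eqs. 7.47 and 7.48."""
    worst = 0.0
    for alpha in angles:
        for beta in angles:
            for sign in (+1, -1):  # upper and lower signs
                res_sin = math.sin(alpha + sign * beta) - (
                    math.sin(alpha) * math.cos(beta)
                    + sign * math.cos(alpha) * math.sin(beta))
                res_cos = math.cos(alpha + sign * beta) - (
                    math.cos(alpha) * math.cos(beta)
                    - sign * math.sin(alpha) * math.sin(beta))
                worst = max(worst, abs(res_sin), abs(res_cos))
    return worst


if __name__ == "__main__":
    # Rounding noise only, many orders of magnitude below one.
    print(largest_residual([-2.5, -0.3, 0.0, 1.1, 2.9, 4.0]))
```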
7.5.3 Prosthaphaeresis formulas
♣
Here, we introduce the so–called formulas of prosthaphaeresis (hated even more than those of addition and subtraction of trigonometric functions, at least by me).²
² The term “prosthaphaeresis” comes from πρόσθεσις and ἀφαίρεσις (prosthesis and aphaeresis, ancient Greek for addition and subtraction).
◦ Comment. From a historical point of view, Eqs. 7.59, 7.60 and 7.61 were important towards the end of the 16th century, because they were used to replace multiplications with sums and subtractions, so as to expedite calculations involving multiplication. After John Napier introduced the logarithm in 1614, their role was taken over by the more convenient formula ln xy = ln x + ln y (Eq. 12.6).
The formulas of prosthaphaeresis are
$$\sin\alpha\sin\beta = \frac{1}{2}\big[\cos(\alpha - \beta) - \cos(\alpha + \beta)\big], \tag{7.59}$$
$$\cos\alpha\cos\beta = \frac{1}{2}\big[\cos(\alpha - \beta) + \cos(\alpha + \beta)\big], \tag{7.60}$$
$$\sin\alpha\cos\beta = \frac{1}{2}\big[\sin(\alpha - \beta) + \sin(\alpha + \beta)\big]. \tag{7.61}$$
They are obtained by suitable combinations of Eqs. 7.47 and 7.48, as you may verify. [Hint: Start from the right sides of the above equations and use Eqs. 7.47 and 7.48.] In particular, for α = β, we have
$$\sin^2\alpha = \frac{1}{2}\,(1 - \cos 2\alpha), \tag{7.62}$$
$$\cos^2\alpha = \frac{1}{2}\,(1 + \cos 2\alpha), \tag{7.63}$$
$$\sin\alpha\cos\alpha = \frac{1}{2}\sin 2\alpha. \tag{7.64}$$
Note that adding Eqs. 7.62 and 7.63 yields $\sin^2\alpha + \cos^2\alpha = 1$ (Eq. 6.75). On the other hand, subtracting Eq. 7.62 from Eq. 7.63, we obtain
$$\cos^2\alpha - \sin^2\alpha = \cos 2\alpha. \tag{7.65}$$
Moreover, we have
$$\cos^3\alpha = \frac{3}{4}\cos\alpha + \frac{1}{4}\cos 3\alpha. \tag{7.66}$$
Indeed,
$$\cos^3\alpha = \frac{1}{2}\cos\alpha\,(1 + \cos 2\alpha) = \frac{1}{2}\cos\alpha + \frac{1}{4}\,(\cos\alpha + \cos 3\alpha), \tag{7.67}$$
in agreement with Eq. 7.66. [Hint: For the first equality, use Eq. 7.63. For the second, note that $\cos\alpha\cos 2\alpha = \frac{1}{2}[\cos\alpha + \cos 3\alpha]$ (Eq. 7.60, with β = 2α).] Similarly, one obtains
$$\sin^3\alpha = \frac{3}{4}\sin\alpha - \frac{1}{4}\sin 3\alpha, \tag{7.68}$$
as you may verify. [Hint: Use Eq. 7.62, as well as Eq. 7.61 with β = 2α, namely $\sin\alpha\cos 2\alpha = \frac{1}{2}(-\sin\alpha + \sin 3\alpha)$.]
Finally, for future reference, let us consider Eqs. 7.59, 7.60 and 7.61, and set α = (α′ + β′)/2 as well as β = (β′ − α′)/2, so that α − β = α′ and α + β = β′, and then drop the primes, which are no longer needed. This yields (interchange left and right sides)
$$\cos\alpha - \cos\beta = 2\sin\frac{\alpha + \beta}{2}\,\sin\frac{\beta - \alpha}{2}, \tag{7.69}$$
$$\cos\alpha + \cos\beta = 2\cos\frac{\alpha + \beta}{2}\,\cos\frac{\beta - \alpha}{2}, \tag{7.70}$$
$$\sin\alpha + \sin\beta = 2\sin\frac{\alpha + \beta}{2}\,\cos\frac{\beta - \alpha}{2}. \tag{7.71}$$
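To give a feeling for how the formulas of prosthaphaeresis turned a multiplication into additions and table look–ups, here is a small sketch. It assumes that the two factors have already been scaled into the interval [−1, 1], so that each can be read as a cosine; the calls to `math.acos` and `math.cos` play the role of the trigonometric tables. The function names are hypothetical. Two further helpers spot–check Eqs. 7.66 and 7.69.

```python
import math


def product_by_prosthaphaeresis(x, y):
    """Product x y obtained from Eq. 7.60, for x and y in [-1, 1]."""
    alpha = math.acos(x)  # table look-up: the angle whose cosine is x
    beta = math.acos(y)   # table look-up: the angle whose cosine is y
    # One subtraction, one addition, two look-ups and a halving.
    return 0.5 * (math.cos(alpha - beta) + math.cos(alpha + beta))


def cos_cubed_residual(alpha):
    """Left side minus right side of Eq. 7.66."""
    return math.cos(alpha) ** 3 - (0.75 * math.cos(alpha) + 0.25 * math.cos(3 * alpha))


def cos_difference_residual(alpha, beta):
    """Left side minus right side of Eq. 7.69."""
    left = math.cos(alpha) - math.cos(beta)
    right = 2 * math.sin((alpha + beta) / 2) * math.sin((beta - alpha) / 2)
    return left - right


if __name__ == "__main__":
    print(product_by_prosthaphaeresis(0.6, 0.25))  # close to 0.15
    print(cos_cubed_residual(1.3), cos_difference_residual(0.4, 2.2))
```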
7.5.4 Properties of triangles and circles
♥
Here, as applications of the preceding results, we consider an extension of the Pythagorean theorem (Eq. 5.6) valid for any triangle. We have the following
Theorem 69. Consider a generic triangle. We have (Fig. 7.10)
$$c^2 = a^2 + b^2 - 2\,a\,b\cos\gamma. \tag{7.72}$$
Fig. 7.10 Generalization of the Pythagorean theorem
◦ Comment. This is a well–known and useful equation that generalizes the Pythagorean theorem. [Indeed, if γ = π/2, Eq. 7.72 reduces to the Pythagorean theorem.] Its proof is quite simple if one uses physicists’ vectors, which will be introduced in Chapter 15 (see Eq. 15.41). However, we will need this equation before then. In order to avoid using an equation that has not yet been proven, here we present a proof that, albeit more cumbersome and less gut–level, uses only the tools of analytic geometry.
◦ Proof : We start with the fact that $c = a' + b' := a\cos\alpha + b\cos\beta$, where a′ and b′ denote the projections of the sides a and b onto the side c. [If you want to make your life simple, limit yourself to Fig. 7.10.a. If you so wish, you may verify that the results are valid as well for Fig. 7.10.b, provided that you take into account that $a' := a\cos\alpha$ is negative.] Hence,
$$c^2 = a^2\cos^2\alpha + b^2\cos^2\beta + 2\,a\,b\cos\alpha\cos\beta. \tag{7.73}$$
I claim that Eqs. 7.72 and 7.73 are fully equivalent. To this end, we show that the difference of their right sides, denoted by Δ, vanishes. Indeed, we have
$$\Delta = a^2\sin^2\alpha + b^2\sin^2\beta - 2\,a\,b\,(\cos\gamma + \cos\alpha\cos\beta). \tag{7.74}$$
Next, use α + β + γ = π (triangle postulate, Postulate 1, p. 182), as well as cos(x − π) = − cos x (Eq. 6.69) and the cosine addition formula (Eq. 7.48), to obtain
$$\cos\gamma = -\cos(\alpha + \beta) = -\cos\alpha\cos\beta + \sin\alpha\sin\beta. \tag{7.75}$$
Therefore, we have
$$\Delta = a^2\sin^2\alpha + b^2\sin^2\beta - 2\,a\,b\sin\alpha\sin\beta = (a\sin\alpha - b\sin\beta)^2, \tag{7.76}$$
which vanishes because a sin α and b sin β both equal the height of the triangle relative to the side c (Fig. 7.10).
We are now in a position to obtain the following
Theorem 70 (Inverse Pythagorean theorem). The Pythagorean theorem ($a^2 + b^2 = c^2$, Eq. 5.6) holds only for right triangles.
◦ Proof : From Eq. 7.72 we have that $c^2 = a^2 + b^2$ iff cos γ = 0, namely (since γ ∈ (0, π)) iff γ = π/2.
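If you wish to see Eq. 7.72 at work on numbers, the sketch below compares it with a direct computation by coordinates. The placement of the triangle (the vertex with angle γ at the origin, the side a along the x-axis) and the function names are choices made here for illustration only.

```python
import math


def third_side(a, b, gamma):
    """Side c from Eq. 7.72, given the sides a, b and the angle gamma between them."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(gamma))


def third_side_by_coordinates(a, b, gamma):
    """Same side, as the distance between the two remaining vertices."""
    bx, by = a, 0.0                                    # end of the side a
    ax, ay = b * math.cos(gamma), b * math.sin(gamma)  # end of the side b
    return math.hypot(ax - bx, ay - by)


if __name__ == "__main__":
    print(third_side(3.0, 4.0, math.pi / 2))   # right angle: close to 5
    print(third_side(2.0, 3.0, 2.0),           # obtuse angle
          third_side_by_coordinates(2.0, 3.0, 2.0))
```

For γ = π/2 the cosine term drops out and the result is the Pythagorean one, in line with Theorem 70.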
In closing this section, we want to give an analytic–geometry proof (more elegant, but less gut–level) of two theorems presented in Chapter 5 regarding circles and right triangles, namely Theorem 60, p. 201, and its inverse, Theorem 61, p. 201. We have the following
Theorem 71. Consider Fig. 7.11. Let AB be a diameter of a circle, and P any other point on the circle. Then, the triangle APB has a right angle at P. Vice versa, if the triangle APB has a right angle at P, then the distance of P from the midpoint O of the segment AB equals R := ½ AB, namely P belongs to a circle with center O and radius R.
Fig. 7.11 Angle APB = π/2
◦ Proof : To begin with, let us consider the triangle ABP in Fig. 7.11, and the midpoint O of the segment AB. Set ρ := OP and R := ½ AB. Using a frame of reference with the origin at O, we have $x_P = OD = \rho\cos\theta$ and $y_P = DP = \rho\sin\theta$, for θ ∈ (0, π). Accordingly, we have
$$a^2 := \overline{BP}^2 = (R - x_P)^2 + y_P^2 = (R - \rho\cos\theta)^2 + \rho^2\sin^2\theta,$$
$$b^2 := \overline{AP}^2 = (R + x_P)^2 + y_P^2 = (R + \rho\cos\theta)^2 + \rho^2\sin^2\theta. \tag{7.77}$$
Therefore, we obtain the identity
$$a^2 + b^2 = 2R^2 + 2\rho^2\,(\cos^2\theta + \sin^2\theta) = 2R^2 + 2\rho^2. \tag{7.78}$$
Now, let us consider the first part of the theorem, where P lies on the circle, namely ρ = R. Then, Eq. 7.78 yields $a^2 + b^2 = (2R)^2 = c^2$. This implies that ABP is a right triangle (inverse Pythagorean theorem, Theorem 70, above). For the second part, if the angle at P is right (by hypothesis, now), using the Pythagorean theorem (Eq. 5.6), we have $a^2 + b^2 = 4R^2$, and Eq. 7.78 yields ρ = R.
7.6 Spheres and balls
We have the following
Definition 141 (Sphere and ball). A sphere is the locus of the points (x, y, z) that are equidistant from a given point $C = (x_C, y_C, z_C)$. The common distance R is called the radius of the sphere, the point C its center. The solid figure formed by all the points that are inside the sphere, possibly including boundary points (namely points of the sphere itself) is called a ball. We use the term open ball iff all the boundary points are excluded, and closed ball iff all the boundary points are included. [Here again, note the difference with everyday language, where the term sphere, more often than not, is understood to mean also what here we call a ball.]
◦ Comment. Note that the term “closed” (“open”) is used also for intervals (Definition 26, p. 43) and disks (Definition 83, p. 177) pretty much with the same meaning, namely to indicate that all (none of) the boundary points are included in the set.
Using the above definition, in analogy with Eq. 7.40 for circles, the implicit representation of a sphere with center C and radius R is
$$(x - x_C)^2 + (y - y_C)^2 + (z - z_C)^2 - R^2 = 0. \tag{7.79}$$
In analogy with Eq. 7.41 for circles, an explicit representation of a sphere is given by
$$z = z_C \pm \sqrt{R^2 - (x - x_C)^2 - (y - y_C)^2}. \tag{7.80}$$
Finally, in analogy with Eq. 7.42 for a circle, a parametric representation is given by
$$x = x_C + R\cos u\cos v, \qquad y = y_C + R\cos u\sin v, \qquad z = z_C + R\sin u, \tag{7.81}$$
with u ∈ [−π/2, π/2] and v ∈ (−π, π]. [Indeed, substituting these expressions into Eq. 7.79, you may verify that such an equation is satisfied.] On the other hand, the equation for an open ball is
$$(x - x_C)^2 + (y - y_C)^2 + (z - z_C)^2 < R^2, \tag{7.82}$$
whereas that for a closed ball is
$$(x - x_C)^2 + (y - y_C)^2 + (z - z_C)^2 \le R^2. \tag{7.83}$$
◦ Comment (Spherical coordinates). In Eq. 7.81 R is a constant. If the radius is not kept constant, we use the symbol r. The variables r, u and v are called spherical coordinates. Note that these coordinates are quite familiar to you: on the Earth, u coincides with the latitude, v with the longitude, whereas r represents the distance from the center of the Earth (which is where we place the origin, so that $x_C = y_C = z_C = 0$).
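The latitude–longitude reading of Eq. 7.81 can be made concrete with a round trip between Cartesian and spherical coordinates. The sketch below places the origin at the center, as in the comment above. The inverse formulas (an arcsine for u, a two–argument arctangent for v) are not given in the text and are stated here as an assumption, valid for r > 0; the function names are hypothetical.

```python
import math


def from_spherical(r, u, v):
    """Cartesian coordinates from Eq. 7.81, with center at the origin."""
    return (r * math.cos(u) * math.cos(v),
            r * math.cos(u) * math.sin(v),
            r * math.sin(u))


def to_spherical(x, y, z):
    """Distance r, latitude u in [-pi/2, pi/2] and longitude v (assumes r > 0)."""
    r = math.sqrt(x * x + y * y + z * z)
    u = math.asin(z / r)
    v = math.atan2(y, x)
    return r, u, v


if __name__ == "__main__":
    point = from_spherical(2.0, 0.4, -2.0)
    print(point)
    print(to_spherical(*point))  # recovers r = 2, u = 0.4, v = -2, up to rounding
```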
7.7 Conics: the basics
In this section, we introduce the so–called conics, namely ellipses, parabolas, and hyperbolas. [The etymology of these terms is best discussed in connection with the material in Section 7.10, on conic sections (see Footnote 5, p. 294).]
◦ Comment. What do conics have in common? Their representations are given below, in Eqs. 7.84, 7.87 and 7.90 respectively. As you may note, these equations may be written as quadratic polynomials in x and y equated to zero. This is the common thread among ellipses, parabolas, and hyperbolas. Indeed, it will be shown in the next section, on the canonical representation of conics, that, through a suitable translation and rotation of the x- and y-axes, any real quadratic polynomial in x and y equated to zero may be reduced to either Eq. 7.84 (ellipse), or Eq. 7.87 (parabola), or Eq. 7.90 (hyperbola), plus some special (degenerate) cases, also addressed there.
In this section, we limit ourselves to the standard (implicit, explicit, parametric) representations, using particularly convenient frames of reference. [Here, the representations are in terms of Cartesian coordinates. Those in polar coordinates are addressed in Section 7.9.]
7.7.1 Ellipses
Closely related to the circle is the ellipse, whose standard implicit representation is
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{7.84}$$
Equation 7.84 is depicted in Fig. 7.12, where, without loss of generality, we assume a > b (horizontal ellipse). Note that ±a are the values that x takes when y = 0, whereas ±b are the values that y takes when x = 0. Accordingly, the parameters a and b are called the semi–major and semi–minor axes.
Fig. 7.12 The ellipse $x^2/a^2 + y^2/b^2 = 1$
◦ Comment. If a = b = R, Eq. 7.84 reduces to $x^2 + y^2 = R^2$ (Eq. 7.39), namely the equation of a circle centered on the origin, with radius R. [Even more interesting, if x/a and y/b were replaced, respectively, by x/R and y/R, the ellipse would be replaced by a circle of radius R. Thus, an ellipse may be obtained from a circle by a suitable scaling in the x- or y-directions.]
The explicit representation for the above ellipse is
$$y = \pm\, b\,\sqrt{1 - (x/a)^2}, \tag{7.85}$$
whereas the parametric one is
$$x = a\cos u \qquad \text{and} \qquad y = b\sin u, \tag{7.86}$$
as you may verify. [Substitute Eq. 7.86 into Eq. 7.84.]
7.7.2 Parabolas
Another geometrical figure of interest is the parabola. Specifically, we have the vertical parabola (Fig. 7.13)
$$y = a\,x^2. \tag{7.87}$$
[This parabola curves upwards (downwards) if a > 0 (a < 0).] Similarly, we have $x = b\,y^2$ for the horizontal parabola.
Fig. 7.13 The vertical parabola $y = a\,x^2$
Remark 57. Sometimes the geometrical figure addressed here is referred to as a second–order parabola, to distinguish it from n-th order parabolas, such as $y = a\,x^n$ and $x = b\,y^n$.
◦ Comment. In Eq. 6.32, we have encountered a different expression for the vertical parabola, namely
$$y = a\,x^2 + b\,x + c. \tag{7.88}$$
Let us show that the two expressions are equivalent. To this end, note that, setting $x = \xi + x_0$ and $y = \eta + y_0$ in the above equation, and then choosing
$$x_0 = -\frac{b}{2a} \qquad \text{and} \qquad y_0 = a\,x_0^2 + b\,x_0 + c, \tag{7.89}$$
we obtain $\eta = a\,\xi^2$ (as you may verify), in agreement with Eq. 7.87. [This sequence of operations corresponds to using a different origin, namely to a simple translation of the axes (Eq. 7.1).]
◦ Comment. In Subsection 6.3.3, we have studied the roots of the quadratic equation $a\,x^2 + b\,x + c = 0$. Here, we can present similar considerations. The intersections of the parabola $y = a\,x^2$ (a > 0) with the horizontal line $y = y_*$ are real and distinct if $y_* > 0$, coincident (a double root) if $y_* = 0$, and imaginary if $y_* < 0$.
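The translation of Eq. 7.89 is easily tried on numbers. In the sketch below (the function names and the coefficients 2, −4, 5 are illustrative choices) the new origin is computed from Eq. 7.89, and the translated ordinate η is compared with a ξ² for a few values of ξ.

```python
def new_origin(a, b, c):
    """Point (x0, y0) of Eq. 7.89 for the parabola y = a x^2 + b x + c."""
    x0 = -b / (2 * a)
    y0 = a * x0 ** 2 + b * x0 + c
    return x0, y0


def eta(a, b, c, xi):
    """Ordinate in the translated frame, for the abscissa xi in that frame."""
    x0, y0 = new_origin(a, b, c)
    x = xi + x0
    return a * x ** 2 + b * x + c - y0


if __name__ == "__main__":
    a, b, c = 2.0, -4.0, 5.0
    print(new_origin(a, b, c))           # (1.0, 3.0)
    for xi in (-1.5, 0.0, 0.5, 2.0):
        print(eta(a, b, c, xi), a * xi ** 2)  # the two columns agree
```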
7.7.3 Hyperbolas
A third geometrical figure of interest to us is the hyperbola, whose implicit representation is (horizontal hyperbola)
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \tag{7.90}$$
If a = b, the figure is called an equilateral hyperbola. [The hyperbola corresponding to Eq. 7.90 is shown in Fig. 7.14, where a = b.]
Fig. 7.14 Hyperbola $x^2/a^2 - y^2/b^2 = 1$
Fig. 7.15 Hyperbola $y^2/a^2 - x^2/b^2 = 1$
Similarly, the equation
$$\frac{y^2}{a^2} - \frac{x^2}{b^2} = 1, \tag{7.91}$$
which is obtained from Eq. 7.90 by interchanging the x- and y-axes, represents the same figure rotated by 90° (vertical hyperbola). [The hyperbola corresponding to Eq. 7.91 is shown in Fig. 7.15, where again a = b.]
The explicit representation for Eq. 7.90 is given by the double–valued function
$$y = \pm\, b\,\sqrt{\frac{x^2}{a^2} - 1}. \tag{7.92}$$
◦ Comment. For the sake of completeness, let me anticipate the parametric representation for a hyperbola, namely
$$x = a\cosh u \qquad \text{and} \qquad y = b\sinh u, \tag{7.93}$$
where cosh x and sinh x are the so–called hyperbolic functions, which will be introduced in Eqs. 12.42 and 12.43, and satisfy the relationship $\cosh^2 x - \sinh^2 x = 1$ (Eq. 12.46). [Be reassured, Eq. 7.93 will not be used before then.]
◦ Asymptotes. As |x| and |y| tend to infinity, the right side of Eqs. 7.90 and 7.91 becomes negligible, and the hyperbolas approach the lines
$$\frac{y}{b} = \pm\,\frac{x}{a}, \tag{7.94}$$
which are called the asymptotes of the hyperbola. [For Eq. 7.91, the roles of a and b are interchanged.]
◦ An alternate representation of hyperbolas. Note that the equation
$$x\,y = c, \tag{7.95}$$
or its equivalent y = c/x, also represents an equilateral hyperbola, with asymptotes x = 0 and y = 0. [See Fig. 7.16 (Fig. 7.17) for c > 0 (c < 0).]
Fig. 7.16 The function y = 1/x
Fig. 7.17 The function y = −1/x
Indeed, let us perform a 45° rotation of the axes. [Use Eq. 7.3, with θ = π/4, namely $\sin(\pi/4) = \cos(\pi/4) = 1/\sqrt{2}$ (Eq. 6.81).] Accordingly, set in Eq. 7.95 $x = (u - v)/\sqrt{2}$ and $y = (u + v)/\sqrt{2}$. This yields $(u - v)(u + v) = u^2 - v^2 = 2c$, which is equivalent to Eq. 7.90, with $a^2 = b^2 = 2c$ if c > 0 (or to Eq. 7.91, with $a^2 = b^2 = -2c$ if c < 0).
7.8 Conics. General and canonical representations
♠
In this section, for the sake of completeness, we want to show that the expression
$$A\,x^2 + 2\,B\,x\,y + C\,y^2 + D\,x + E\,y + F = 0, \tag{7.96}$$
with A, B and C not all zero, always represents a conic (either an ellipse, or a parabola, or a hyperbola), except for some special cases, known as the degenerate conics. Specifically, we want to show that, performing a rotation and a translation of the axes (Section 7.1), Eq. 7.96 may be transformed into either Eq. 7.84 (ellipse), or Eq. 7.87 (parabola), or Eq. 7.90 (hyperbola), or one of the above mentioned degenerate cases. To this end, let us set
$$x = u\cos\theta - v\sin\theta \qquad \text{and} \qquad y = u\sin\theta + v\cos\theta, \tag{7.97}$$
which corresponds to a rotation of the axes about the origin (Eq. 7.3). Combining with Eq. 7.96, one obtains
$$A_1 u^2 + 2\,B_1 u\,v + C_1 v^2 + D_1 u + E_1 v + F_1 = 0, \tag{7.98}$$
with
$$\begin{aligned}
A_1 &= A\cos^2\theta + 2\,B\cos\theta\sin\theta + C\sin^2\theta, \\
B_1 &= (C - A)\cos\theta\sin\theta + B\,(\cos^2\theta - \sin^2\theta), \\
C_1 &= A\sin^2\theta - 2\,B\cos\theta\sin\theta + C\cos^2\theta, \\
D_1 &= D\cos\theta + E\sin\theta, \\
E_1 &= -D\sin\theta + E\cos\theta, \\
F_1 &= F,
\end{aligned} \tag{7.99}$$
as you may verify. Next, we choose θ so as to have $B_1 = 0$. Specifically, if B ≠ 0, θ is obtained by imposing
$$\tan^2\theta - 2\,H\tan\theta - 1 = 0, \tag{7.100}$$
with 2H = (C − A)/B. This yields (use Eqs. 6.33 and 6.36)
$$\tan\theta_\pm = H \pm \sqrt{H^2 + 1}. \tag{7.101}$$
Note that both roots are real, since $H^2 + 1 > 0$. [On the other hand, if B = 0, we may use θ such that cos θ sin θ = 0, which yields $B_1 = 0$, as desired.]
◦ Comment. Let us address this result in greater depth. Recall Eq. 6.40, namely the relationship $z_+ z_- = c/a$ between the product of the two roots $z_+$ and $z_-$ of the quadratic equation, and the coefficients a and c. In the case of Eq. 7.100, we have c/a = −1; this yields $\tan\theta_- = -1/\tan\theta_+$. This implies that $\theta_+$ and $\theta_-$ differ by π/2, because tan(x − π/2) = −1/tan x (Eq. 6.71). Moreover, we have tan(θ + π) = tan θ (Eq. 6.69). These two observations lead to the fact that there are four possible choices for the axes, each obtained from the preceding one by a 90° rotation. This is to be expected. A rotation of 90° means replacing the x-axis with the y-axis, and the y-axis with minus the x-axis. Similarly, a rotation of 180° means replacing the x-axis with minus the x-axis, and the y-axis with minus the y-axis.
Remark 58. Next, note that the assumption that A, B and C do not all vanish (introduced just below Eq. 7.96) implies that $A_1$, $B_1$ and $C_1$ cannot all vanish. Indeed, had we had $A_1 = B_1 = C_1 = 0$, Eq. 7.98 would represent a straight line, whereas Eq. 7.96 does not, since we assumed A, B and C not to be all zero. This is impossible, because a simple change of the axes cannot change the nature of the line. Alternatively, you may consider the first three in Eq. 7.99 as a system of equations for the unknowns A, B and C, with determinant different from zero, as you may verify, with some patience. [Hint: Use the Sarrus rule (Eq. 3.166). You’ll get that the determinant is $\cos^6\theta + 3\cos^4\theta\sin^2\theta + 3\cos^2\theta\sin^4\theta + \sin^6\theta = (\cos^2\theta + \sin^2\theta)^3 = 1$.] This implies that $A_1 = B_1 = C_1 = 0$ requires, necessarily, A = B = C = 0.
Thus, having shown that we cannot have $A_1 = B_1 = C_1 = 0$, and having chosen θ so as to have $B_1 = 0$, we are left with the following possibilities:
1. Either $A_1$ or $C_1$ equals zero,
2. Neither $A_1$ nor $C_1$ equals zero.
Let us consider first Item 1, say that $C_1 = 0$. [The case $A_1 = 0$ is analogous and corresponds to a horizontal parabola.] In this case, we are left with
$$A_1 u^2 + D_1 u + E_1 v + F_1 = 0. \tag{7.102}$$
If $E_1 \neq 0$, we have a vertical axis parabola, namely $v = a\,u^2 + b\,u + c$ (Eq. 7.88), with $a = -A_1/E_1$, $b = -D_1/E_1$ and $c = -F_1/E_1$. On the other hand, if $E_1 = 0$, we have $A_1 u^2 + D_1 u + F_1 = 0$. This yields $u = u_\pm$, with v arbitrary, namely a degenerate case: two vertical straight lines, not necessarily real and distinct, on the plane (u, v).
Next, let us consider Item 2, namely that $A_1 \neq 0$ and $C_1 \neq 0$. In this case, Eq. 7.98 (again with $B_1 = 0$) may be written as
$$A_1 \left(u + \frac{D_1}{2 A_1}\right)^2 + C_1 \left(v + \frac{E_1}{2 C_1}\right)^2 + F_1 - \frac{D_1^2}{4 A_1} - \frac{E_1^2}{4 C_1} = 0. \tag{7.103}$$
Next, set $\xi := u + D_1/(2 A_1)$ and $\eta := v + E_1/(2 C_1)$, which corresponds to a translation of the axes (Eq. 7.1). Then, we have
$$A_1 \xi^2 + C_1 \eta^2 + \breve{F}_1 = 0, \tag{7.104}$$
with $\breve{F}_1 := F_1 - \frac{1}{4} D_1^2/A_1 - \frac{1}{4} E_1^2/C_1$. Now, we have two subcases: $\breve{F}_1 \neq 0$ and $\breve{F}_1 = 0$. If $\breve{F}_1 \neq 0$, Eq. 7.104 may be written as
$$A_2 \xi^2 + C_2 \eta^2 = 1, \tag{7.105}$$
with $A_2 = -A_1/\breve{F}_1$ and $C_2 = -C_1/\breve{F}_1$. Here, we have three possibilities. If $A_2$ and $C_2$ are both positive, Eq. 7.105 corresponds to an ellipse (see Eq. 7.84, with $a^2 = 1/A_2$ and $b^2 = 1/C_2$). On the other hand, if $A_2$ and $C_2$ have opposite signs, we have a hyperbola (see either Eq. 7.90 with $a^2 = 1/|A_2|$ and $b^2 = 1/|C_2|$, or Eq. 7.91). The third possibility is that $A_2$ and $C_2$ are both negative. Then, there are no real points (x, y) that satisfy Eq. 7.105 (and hence Eq. 7.96): another degenerate case!
The second subcase is $\breve{F}_1 = 0$. Then, there are two possibilities: if $A_1$ and $C_1$ have the same sign, then Eq. 7.104 yields ξ = η = 0: a degenerate case of a circle with radius R = 0. If $A_1$ and $C_1$ have opposite signs, then recalling that $(a + b)(a - b) = a^2 - b^2$ (Eq. 6.37), Eq. 7.104 may be rewritten as
$$(\alpha\,\xi + \beta\,\eta)(\alpha\,\xi - \beta\,\eta) = 0, \tag{7.106}$$
with $\alpha^2 := |A_1|$ and $\beta^2 := |C_1|$. We have another degenerate case, namely two straight lines through the origin: the limit case of a hyperbola approaching its asymptotes!
In summary, we have obtained that, if we exclude all the degenerate cases, only the following possibilities exist: (i) if $C_1 = 0$ we have a vertical parabola, whereas if $A_1 = 0$ we have a horizontal parabola; (ii) if $A_1$ and $C_1$ have the same sign and $\breve{F}_1$ has the opposite one, we have an ellipse; (iii) if $A_1$ and $C_1$ have opposite signs and $\breve{F}_1 \neq 0$, we have a hyperbola.
◦ Comment. The analysis presented above is quite cumbersome and limited to two dimensions. You are going to ask me: “Is there a simpler way to obtain these results? Can we generalize the results to more than two dimensions?” The answer to both questions is: “Yes!” A much simpler derivation of the results presented here, along with their extension to n-dimensional problems, may be obtained by using a more sophisticated mathematics than that available at this point, specifically by addressing the so–called eigenvalue problem in linear algebra, which is barely touched upon in this volume (Subsection 16.3.4). Accordingly, such a problem is addressed in Vol. II, after the eigenvalue problem in linear algebra is addressed in detail.
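The procedure of this section (rotate so that $B_1 = 0$, then inspect $A_1$, $C_1$ and $\breve{F}_1$) may be sketched as a small classification routine. This is only an illustration under stated assumptions: the function name, the labels returned and the fixed tolerance `tol`, used to decide when a floating–point quantity counts as zero, are not in the text; and all the degenerate cases are lumped together under a single label.

```python
import math


def classify_conic(A, B, C, D, E, F, tol=1e-12):
    """Classify A x^2 + 2 B x y + C y^2 + D x + E y + F = 0 (Eq. 7.96)."""
    # Rotation angle that makes B1 vanish (Eqs. 7.100 and 7.101).
    if abs(B) > tol:
        H = (C - A) / (2 * B)
        theta = math.atan(H + math.sqrt(H * H + 1))
    else:
        theta = 0.0
    c, s = math.cos(theta), math.sin(theta)
    # Rotated coefficients (Eq. 7.99); B1 is zero by construction.
    A1 = A * c * c + 2 * B * c * s + C * s * s
    C1 = A * s * s - 2 * B * c * s + C * c * c
    D1 = D * c + E * s
    E1 = -D * s + E * c
    # Item 1: one of A1, C1 vanishes (Eq. 7.102 and its analogue).
    if abs(C1) <= tol:
        return "parabola" if abs(E1) > tol else "degenerate"
    if abs(A1) <= tol:
        return "parabola" if abs(D1) > tol else "degenerate"
    # Item 2: translate the axes (Eqs. 7.103 and 7.104).
    F_breve = F - D1 * D1 / (4 * A1) - E1 * E1 / (4 * C1)
    if abs(F_breve) <= tol:
        return "degenerate"            # a point, or two straight lines
    if A1 * C1 < 0:
        return "hyperbola"
    # Same sign: an ellipse only if F_breve has the opposite sign.
    return "ellipse" if A1 * F_breve < 0 else "degenerate"


if __name__ == "__main__":
    print(classify_conic(0.25, 0, 1, 0, 0, -1))   # x^2/4 + y^2 = 1
    print(classify_conic(0, 0.5, 0, 0, 0, -1))    # x y = 1 (Eq. 7.95)
    print(classify_conic(1, 1, 1, -1, 1, 0))      # a parabola with rotated axes
```

Note that x y = 1 enters with B = 1/2, because of the factor 2 in front of B in Eq. 7.96.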
7.9 Conics. Polar representation
♥
In this section, we discuss another type of representation for the conics, the so–called polar representation, namely a representation based upon the use of polar coordinates, which were introduced in Subsection 7.5.1. In this book, these representations will be used exclusively in the discussion of the Kepler laws on the motion of the planets (Subsection 20.8.3). Thus, you may postpone the reading of this section until we reach that point (or even skip it altogether if you are not interested in the Kepler laws).
7.9.1 Ellipses
♥
Consider the equation for the ellipse (Eq. 7.84), namely
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{7.107}$$
Here, without loss of generality, we assume a > b (horizontal ellipse). This equation is depicted in Fig. 7.18, for a = 5, b = 3, c = 4.
Fig. 7.18 Polar representation for ellipse
Let us introduce the points $F_\pm := (\pm c, 0)$, with
$$c := \sqrt{a^2 - b^2} > 0. \tag{7.108}$$
These points are called the foci of the ellipse.³ In addition, let us introduce the semi–latus rectum for the ellipse.⁴ This is defined as the length $\ell$ of the vertical segment from $F_+ = (c, 0)$ to the point $Q := (x_Q, y_Q)$ on the ellipse, with $x_Q = c$ and $y_Q = \ell$. Using $x_Q^2/a^2 + y_Q^2/b^2 = 1$ (Eq. 7.84), we have
$$\ell := y_Q = b\,\sqrt{1 - x_Q^2/a^2} = b\,\sqrt{1 - c^2/a^2} = b\,\sqrt{b^2/a^2}, \tag{7.109}$$
namely
$$\ell = b^2/a = (a^2 - c^2)/a. \tag{7.110}$$
Next, let us introduce a system of polar coordinates, $r$ and $\phi$, with origin at the left focus, namely at $F_- = (-c, 0)$. This yields
$$x = r\cos\phi - c \quad\text{and}\quad y = r\sin\phi, \tag{7.111}$$
as shown in Fig. 7.18. Combining with Eq. 7.107, one obtains
$$\frac{(r\cos\phi - c)^2}{a^2} + \frac{(r\sin\phi)^2}{b^2} = 1. \tag{7.112}$$
Multiplying both sides of the equation by $a^2 b^2$, and using $\sin^2\phi = 1 - \cos^2\phi$ (Eq. 6.75), we have
$$b^2\left(r^2\cos^2\phi - 2\,c\,r\cos\phi + c^2\right) + a^2 r^2\left(1 - \cos^2\phi\right) = a^2 b^2, \tag{7.113}$$
or, rearranging terms,
$$a^2 r^2 + \left(b^2 - a^2\right) r^2\cos^2\phi - 2\,b^2 c\,r\cos\phi + b^2\left(c^2 - a^2\right) = 0. \tag{7.114}$$
Next, use $a^2 = b^2 + c^2$ (Eq. 7.108), to obtain
$$a^2 r^2 = c^2 r^2\cos^2\phi + 2\,b^2 c\,r\cos\phi + b^4 = \left(c\,r\cos\phi + b^2\right)^2, \tag{7.115}$$
namely $a\,r = b^2 + c\,r\cos\phi$. This is equivalent to $r\,(a - c\cos\phi) = b^2$, or
$$r = \frac{\ell}{1 - \epsilon\cos\phi}, \tag{7.116}$$
where $\ell = b^2/a$ (Eq. 7.110) is the semi–latus rectum, whereas (use Eq. 7.108)
$$\epsilon := c/a = \sqrt{1 - b^2/a^2} \in (0, 1) \tag{7.117}$$
is known as the eccentricity of the ellipse. Equation 7.116 is the desired representation of the ellipse in polar coordinates.
³ Focus is Latin for hearth, fireplace.
⁴ Latus rectum is Latin for upright side.
• How to draw an ellipse As an application, let us assume that you want to draw an ellipse, say on the wall between the living room and the dining room of your apartment, so as to create a doorway with an elliptic top connecting the two. To this end, you may use the following procedure: (i) hammer two nails on the wall, at the same height from the floor; let 2c denote the distance between the two nails; (ii) get a string, of length 2a > 2c, and tie the two ends of the string to the nails; (iii) hold a pencil against the wall and use it to keep the string taut (so as to consist of two segments) and then move the pencil around. The mark left on the wall will have the desired (horizontal) elliptical shape, whose foci F± coincide with the locations of the two nails (Fig. 7.18, p. 286). To show this, let P denote the location of the pencil, as we move it around. Set r := P F−.
(7.118)
In the original frame of reference, we have $x_P = r\cos\phi - c$ and $y_P = r\sin\phi$, as well as $x_{F_+} = c$ and $y_{F_+} = 0$, and hence
$$\overline{PF_+} = \sqrt{\left(x_P - x_{F_+}\right)^2 + \left(y_P - y_{F_+}\right)^2} = \sqrt{(2c - r\cos\phi)^2 + r^2\sin^2\phi}. \tag{7.119}$$
Accordingly, the length of the string is given by
$$2a := \overline{PF_-} + \overline{PF_+} = r + \sqrt{(2c - r\cos\phi)^2 + r^2\sin^2\phi}. \tag{7.120}$$
This implies $(2a - r)^2 = (2c - r\cos\phi)^2 + r^2\sin^2\phi$, namely
$$4a^2 - 4\,a\,r + r^2 = 4c^2 - 4\,r\,c\cos\phi + r^2\cos^2\phi + r^2\sin^2\phi. \tag{7.121}$$
The three terms that contain $r^2$ offset each other. Accordingly, we are left with $4a^2 - 4\,a\,r = 4c^2 - 4\,r\,c\cos\phi$. Then, dividing by $4a$ and using $\ell = (a^2 - c^2)/a$ (Eq. 7.110) as well as $\epsilon := c/a$ (Eq. 7.117), the above expression simplifies into $r\,(1 - \epsilon\cos\phi) = \ell$, in agreement with the polar representation for the ellipse (Eq. 7.116). This shows that the procedure presented at the beginning of this subsubsection does indeed yield an ellipse.
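The string construction can also be checked numerically. The following is only a minimal sketch in Python (the function names are illustrative and not part of the book): it takes points from the polar representation (Eq. 7.116), converts them to Cartesian coordinates through Eq. 7.111, and adds the two distances to the foci. The values a = 5 and b = 3 are those of Fig. 7.18.

```python
import math

def ellipse_polar_radius(phi, a, b):
    """Distance r from the left focus to the ellipse (Eq. 7.116)."""
    c = math.sqrt(a * a - b * b)   # focal distance (Eq. 7.108)
    ell = b * b / a                # semi-latus rectum (Eq. 7.110)
    eps = c / a                    # eccentricity (Eq. 7.117)
    return ell / (1.0 - eps * math.cos(phi))

def string_length(phi, a, b):
    """Sum of the distances from the point at angle phi to the two foci."""
    c = math.sqrt(a * a - b * b)
    r = ellipse_polar_radius(phi, a, b)
    x, y = r * math.cos(phi) - c, r * math.sin(phi)      # Eq. 7.111
    return math.hypot(x + c, y) + math.hypot(x - c, y)   # PF- + PF+

a, b = 5.0, 3.0
lengths = [string_length(2 * math.pi * k / 12, a, b) for k in range(12)]
print(lengths)   # every entry should be close to 2a
```

If the derivation above is correct, each entry of the list differs from 2a only by rounding error.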
7.9.2 Parabolas
♥
Consider the horizontal parabola described by
$$y^2 = 4\,\mathring{a}\,x \qquad (\mathring{a} > 0). \tag{7.122}$$
Remark 59. Equation 7.122 is typically written as $y^2 = 4\,a\,x$. This would facilitate the comparison with the results for ellipses and hyperbolas. However, I want to be able to distinguish the symbol $a$ used for the vertical axis–parabola (namely $y = a\,x^2$, Eq. 7.87, also a standard notation), from the symbol in Eq. 7.122. This is the reason why I use $\mathring{a}$ in Eq. 7.122.
Consider the point $F = (\mathring{a}, 0)$, called the focus of the parabola. In addition, let us introduce the semi–latus rectum, $\ell$, for the parabola. This is defined as the length of the vertical segment from $F$ to $Q$ (Fig. 7.19). Using $y^2 = 4\,\mathring{a}\,x$ (Eq. 7.122), we have $\ell^2 := y_Q^2 = 4\,\mathring{a}^2$, namely
$$\ell := 2\,\mathring{a}. \tag{7.123}$$
Fig. 7.19 Polar representation of parabola
Next, let us introduce a system of polar coordinates, $r$ and $\phi$, with origin at the focus $F$. This yields, in the original frame of reference (use Fig. 7.19)
$$x = r\cos\phi + \mathring{a} \quad\text{and}\quad y = r\sin\phi, \quad\text{with } \phi \in [0, \pi]. \tag{7.124}$$
Combining Eqs. 7.122 and 7.124, one obtains
$$r^2\sin^2\phi - 4\,\mathring{a}\,r\cos\phi - 4\,\mathring{a}^2 = 0. \tag{7.125}$$
This is a quadratic equation for $r$, whose roots are given by (Eq. 6.36)
$$r = \frac{2\,\mathring{a}\cos\phi \pm \sqrt{4\,\mathring{a}^2\cos^2\phi + 4\,\mathring{a}^2\sin^2\phi}}{\sin^2\phi} = 2\,\mathring{a}\,\frac{\cos\phi \pm 1}{\sin^2\phi}. \tag{7.126}$$
The root with the minus sign is negative, because $\mathring{a} > 0$ (Eq. 7.122), and hence it must be discarded. [This is further discussed in the subsubsection that follows.] Thus, noting that $\sin^2\phi = 1 - \cos^2\phi = (1 + \cos\phi)(1 - \cos\phi)$, we are left with
$$r = \frac{\ell}{1 - \cos\phi}, \tag{7.127}$$
where $\ell = 2\,\mathring{a}$ is the semi–latus rectum (Eq. 7.123). Equation 7.127 is the desired representation of the parabola in polar coordinates. [Note that $r(\pi) = \mathring{a}$, $r(\pi/2) = \ell$ and $r(0) = \infty$. For $\phi < 0$ we obtain the lower branch of the parabola.]
◦ Comment. It may be noted that Eq. 7.127 equals Eq. 7.116 with $\epsilon = 1$. Thus, we can say that the eccentricity for a parabola is $\epsilon = 1$.
• Spurious roots Regarding the reason for ignoring the second (negative) root, some comments are in order. When a solution appears that is not a valid solution to the original problem, the solution is called a spurious (or extraneous) solution to the problem. On the other hand, when a solution to the original problem does not appear among the solutions obtained, such a solution is called a missing solution to the problem. [For instance, consider the equations $x - 2 = 0$ and $x^2 - 4 = 0$. Both yield a root $x = 2$. However, the second has also a root $x = -2$. If the original problem is given by the first equation, the root $x = -2$ is called a spurious (or extraneous) solution to the problem. On the other hand, if the original problem is given by the second equation, the root $x = -2$ is called a missing solution to the problem.] The problem may occur whenever we perform an operation that is not uniquely invertible. [In the above example, the first equation may be written as $x = 2$. Taking the square of both sides we have the second equation, $x^2 = 4$, thereby introducing a new root, $x = -2$!]
In the case under consideration however, the situation is more subtle. The second root yields $r = -2\,\mathring{a}/(1 + \cos\phi)$. This is the correct expression if $\mathring{a} < 0$. This corresponds to the horizontal parabola that lies on the negative half–plane, namely $x < 0$. Specifically, now we have $r(0) = -\mathring{a} > 0$, $r(\pi/2) = -2\,\mathring{a} =: \ell > 0$ and $r(\pi) = \infty$.
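To see the two roots side by side, here is a small Python sketch (the function name `parabola_roots` and the sample values are assumptions made for illustration only). It evaluates both roots of the quadratic equation for r (Eq. 7.125) at a given angle and compares them with the closed forms discussed above.

```python
import math

def parabola_roots(phi, a_ring):
    """Both roots r of Eq. 7.125, in the form given by Eq. 7.126.

    phi must differ from 0 and pi, where sin(phi) vanishes.
    """
    s2 = math.sin(phi) ** 2
    root = math.sqrt(4 * a_ring**2 * math.cos(phi)**2 + 4 * a_ring**2 * s2)
    plus = (2 * a_ring * math.cos(phi) + root) / s2
    minus = (2 * a_ring * math.cos(phi) - root) / s2
    return plus, minus

a_ring, phi = 1.5, math.pi / 3
plus, minus = parabola_roots(phi, a_ring)

# Closed forms: Eq. 7.127 for the positive root, and the expression
# -2 a_ring / (1 + cos phi) for the negative one.
print(plus, 2 * a_ring / (1 - math.cos(phi)))
print(minus, -2 * a_ring / (1 + math.cos(phi)))

# The point obtained from the positive root lies on y^2 = 4 a_ring x.
x, y = plus * math.cos(phi) + a_ring, plus * math.sin(phi)
print(y * y - 4 * a_ring * x)   # close to zero
```

Only the positive root is a distance in the usual sense, which is why it is the one retained in Eq. 7.127.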
7.9.3 Hyperbolas
♥
Consider a horizontal hyperbola, given by Eq. 7.90, namely
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \tag{7.128}$$
which is depicted in Fig. 7.20 (where a = b = 1), along with its asymptotes x/a = ± y/b (Eq. 7.94).
Fig. 7.20 Polar representation for hyperbola
Introduce the points $F_\pm = (\pm c, 0)$, with
$$c = \sqrt{a^2 + b^2} > 0. \tag{7.129}$$
The points $F_\pm$ are called the foci of the hyperbola. In addition, let us introduce the semi–latus rectum, $\ell$, for the hyperbola, which is defined as the length of the vertical segment from $F_+ = (c, 0)$ to the point $Q$ on the hyperbola, namely to $Q = (x_Q, y_Q)$, with $x_Q = c$ and $y_Q = \ell$. Using $x^2/a^2 - y^2/b^2 = 1$ (Eq. 7.128), we have
$$\ell := y_Q = b\,\sqrt{x_Q^2/a^2 - 1} = b\,\sqrt{c^2/a^2 - 1} = b\,\sqrt{b^2/a^2}, \tag{7.130}$$
namely
$$\ell = b^2/a = (c^2 - a^2)/a. \tag{7.131}$$
Next, let us introduce a system of polar coordinates, $r$ and $\phi$, with origin at the right focus, $F_+ = (c, 0)$ (see again Fig. 7.20). This yields the following representation for the right branch of the hyperbola
$$x = r\cos\phi + c > 0 \quad\text{and}\quad y = r\sin\phi. \tag{7.132}$$
Combining with $x^2/a^2 - y^2/b^2 = 1$ (Eq. 7.128), one obtains
$$\frac{(r\cos\phi + c)^2}{a^2} - \frac{(r\sin\phi)^2}{b^2} = 1. \tag{7.133}$$
Multiplying both sides of the equation by $a^2 b^2$ and using $\sin^2\phi = 1 - \cos^2\phi$ (Eq. 6.75), we have
$$b^2\left(r^2\cos^2\phi + 2\,c\,r\cos\phi + c^2\right) - a^2 r^2\left(1 - \cos^2\phi\right) = a^2 b^2, \tag{7.134}$$
or, rearranging terms,
$$-a^2 r^2 + \left(b^2 + a^2\right) r^2\cos^2\phi + 2\,b^2 c\,r\cos\phi + b^2\left(c^2 - a^2\right) = 0. \tag{7.135}$$
Next, using $c^2 = a^2 + b^2$ (Eq. 7.129), one obtains
$$a^2 r^2 = c^2 r^2\cos^2\phi + 2\,b^2 c\,r\cos\phi + b^4 = \left(c\,r\cos\phi + b^2\right)^2, \tag{7.136}$$
namely $a\,r = b^2 + c\,r\cos\phi$. This is equivalent to $r\,(a - c\cos\phi) = b^2$, or
$$r = \frac{\ell}{1 - \epsilon\cos\phi}, \tag{7.137}$$
where $\ell = b^2/a$ is the semi–latus rectum (Eq. 7.131), whereas
$$\epsilon := c/a = \sqrt{1 + b^2/a^2} > 1 \tag{7.138}$$
is known as the eccentricity of the hyperbola. Equation 7.137 is the desired representation of the upper portion of the right branch of the hyperbola in polar coordinates.
◦ Comment. ♥ Equation 7.137 requires some clarifications regarding the sign of $r$. No problem arises if $\phi \in (\phi_\infty, \pi]$, with $\phi_\infty = \cos^{-1}(1/\epsilon) \in (0, \pi/2)$. Specifically, for $\phi = \pi$, we obtain (as expected)
$$r = \frac{\ell}{1 + \epsilon} = \frac{b^2/a}{1 + c/a} = \frac{c^2 - a^2}{a + c} = c - a > 0, \tag{7.139}$$
as you may verify. On the other hand, for $\phi = \pi/2$, Eq. 7.137 yields $r = \ell$, in agreement with the definition of $y_Q = \ell$ (Fig. 7.20). Finally, as $\phi$ decreases to $\phi_\infty$, $r(\phi)$ becomes infinitely large. [This corresponds to the angle of the asymptote $x/a = y/b$ (Eq. 7.94). Indeed, $\cos\phi = 1/\epsilon = a/c$ corresponds to $\tan\phi = b/a$, as you may verify.] In the comment above, we have dealt with $\phi \in (\phi_\infty, \pi]$, which corresponds to the right branch of the hyperbola under consideration. What happens when $\phi \in [0, \phi_\infty)$? For $\phi = 0$, Eq. 7.137 yields
$$r = \frac{\ell}{1 - \epsilon} = \frac{b^2/a}{1 - c/a} = \frac{c^2 - a^2}{a - c} = -(c + a) < 0. \tag{7.140}$$
Let us see how this may be interpreted. From Fig. 7.20 we see that on the x-axis, the distance between $F_+$ and the left branch of the hyperbola is $c + a$. However, the direction is opposite to that of the ray from $F_+$ with $\phi = 0$. Accordingly, can we state that Eq. 7.137 holds as well for the left branch, provided that we interpret $r < 0$ to mean that the distance $|r|$ is measured in the direction opposite to that of the ray? Could this cover the left branch of the hyperbola under consideration? I’ll let you explore this possibility.
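One way to begin such an exploration is numerical. The sketch below is in Python, with a = b = 1 as in Fig. 7.20; the function name and the sample angles are assumptions made for illustration. It evaluates Eq. 7.137 for angles in [0, ϕ∞), where r is negative, maps the result to Cartesian coordinates through Eq. 7.132, and prints how well the point satisfies Eq. 7.128. It is a spot check at a few angles, not a proof.

```python
import math

def hyperbola_point(phi, a, b):
    """Return (r, x, y) from Eqs. 7.137 and 7.132; r may be negative."""
    c = math.sqrt(a * a + b * b)            # Eq. 7.129
    ell, eps = b * b / a, c / a             # Eqs. 7.131 and 7.138
    r = ell / (1.0 - eps * math.cos(phi))
    return r, r * math.cos(phi) + c, r * math.sin(phi)

a, b = 1.0, 1.0
phi_inf = math.acos(a / math.sqrt(a * a + b * b))   # angle of the asymptote

samples = []
for phi in (0.0, 0.25 * phi_inf, 0.5 * phi_inf, 0.9 * phi_inf):
    r, x, y = hyperbola_point(phi, a, b)
    residual = x * x / a**2 - y * y / b**2 - 1.0    # zero on the hyperbola
    samples.append((r, x, residual))
    print(f"phi={phi:.3f}  r={r:.3f}  x={x:.3f}  y={y:.3f}  residual={residual:.1e}")
```

For these sample angles the residual is at rounding level and x is at most −a, which is consistent with the interpretation suggested above; the general statement still deserves the algebraic check.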
7.9.4 Summary
♥
Let us summarize our results. [Here we use $a$, in place of $\mathring{a}$ (Remark 59, p. 289), so as to provide a uniform notation for all three conics.] That said, all three conics are expressed as (Eqs. 7.116, 7.127 and 7.137)
$$r = \frac{\ell}{1 - \epsilon\cos\phi}, \tag{7.141}$$
where the eccentricity $\epsilon$, and the semi–latus rectum $\ell$ are given by Table 7.1, along with the variable $c$.
Conic        $\epsilon$                           $\ell$                          $c^2$
Ellipse      $c/a = \sqrt{1 - b^2/a^2} < 1$       $b^2/a = a(1 - \epsilon^2)$     $a^2 - b^2$
Parabola     $1$                                  $2a$                            NA
Hyperbola    $c/a = \sqrt{1 + b^2/a^2} > 1$       $b^2/a = a(\epsilon^2 - 1)$     $a^2 + b^2$
Table 7.1 Eccentricity $\epsilon$, semi–latus rectum $\ell$, and $c$
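The entries of Table 7.1 can be exercised with a short Python sketch (the function names and the string labels for the three conics are assumptions introduced here, not notation from the book). It builds the eccentricity and the semi–latus rectum for each conic and evaluates the common polar form of Eq. 7.141 at ϕ = π, where the expected values are a − c for the ellipse, a for the parabola and c − a for the hyperbola.

```python
import math

def conic_parameters(kind, a, b=None):
    """Return (eps, ell) as listed in Table 7.1; b is unused for the parabola."""
    if kind == "ellipse":
        return math.sqrt(1 - b * b / (a * a)), b * b / a
    if kind == "parabola":
        return 1.0, 2 * a
    if kind == "hyperbola":
        return math.sqrt(1 + b * b / (a * a)), b * b / a
    raise ValueError(f"unknown conic: {kind}")

def polar_radius(phi, eps, ell):
    """Common polar representation of the three conics (Eq. 7.141)."""
    return ell / (1 - eps * math.cos(phi))

r_ellipse = polar_radius(math.pi, *conic_parameters("ellipse", 5.0, 3.0))
r_parabola = polar_radius(math.pi, *conic_parameters("parabola", 1.0))
r_hyperbola = polar_radius(math.pi, *conic_parameters("hyperbola", 1.0, 1.0))
print(r_ellipse, r_parabola, r_hyperbola)
```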
7.10 Appendix. “Why conic sections?”
♥
As I was told in college, the term “conics” (originally known as conic sections) stems from the fact that these figures may be obtained as the intersection of a circular cone with a plane that does not contain the vertex of the cone.⁵ However, I had never seen a proof for it. If you are as curious as I was then (and still am now), here is the missing proof; I finally got it. [This section is provided primarily to satisfy any curiosity that you might have about this matter. It satisfied mine. In addition, it will help you to familiarize yourself with the methods of analytical geometry.] Consider a circular cone with the vertex in the origin $O = (0, 0, 0)$, and the axis directed like the z-axis (gray region in Fig. 7.21). [I hope I do not have to define a cone. You know what a cone is! It looks like an ice cream cone! A cone is called circular iff its cross section is a circle.] The equation for this figure is obtained by requiring that the radius $r = \sqrt{x^2 + y^2}$ of the cone’s intersection with a horizontal plane be proportional to the altitude $z$ of the plane itself. This yields $\sqrt{x^2 + y^2} = a\,z$, as you may verify, or
$$x^2 + y^2 = z^2, \tag{7.142}$$
⁵ On etymology. Specifically, consider a vertical–axis circular cone, in which the lines through the origin make an angle π/4 (45°) with the horizontal plane (Fig. 7.21). As we will see, the conics are the intersection of such a cone with a plane that forms an angle θ with the horizontal. If θ ∈ (0, π/4) we have an ellipse, if θ = π/4 we have a parabola, if θ ∈ (π/4, π/2) we have a hyperbola. Accordingly, ellipse comes from ελλειψις (ellipsis), ancient Greek for a falling short (in the sense that it does not intersect both branches of the cone); parabola comes from παραβολη (parabole), ancient Greek for a throwing over, a placing side by side (in the sense that the plane runs parallel to a straight line of the cone); hyperbola comes from υπερβολη (uperbole), ancient Greek for a throwing beyond (in the sense that the plane goes over, to the other branch of the cone).
where, for simplicity, I chose $a = 1$, so that the cone makes a 45° angle with its axis.
Fig. 7.21 Conic section
Consider Fig. 7.21. The point O is the vertex of the circular cone and the z-axis is its axis of revolution. The y-axis is perpendicular to the plane of the figure. The lines OA and OD are the intersections of the cone with the plane of the figure (namely the plane y = 0). Let us introduce a new set of axes (ξ, η, ζ), obtained by performing a rigid–body rotation θ of the x- and z-axes around the y-axis. Accordingly, for a generic point (x, y, z) we have (Eq. 7.3) x = ξ cos θ − ζ sin θ, y = η, z = ξ sin θ + ζ cos θ.
(7.143)
For simplicity, we limit ourselves to the case $\theta \in [0, \pi/2]$. [If $\theta$ is in any of the other three quadrants, the conclusions are similar, as you may verify.] Combining with Eq. 7.142, one obtains
$$\xi^2\cos^2\theta - 2\,\xi\,\zeta\cos\theta\sin\theta + \zeta^2\sin^2\theta + \eta^2 = \xi^2\sin^2\theta + 2\,\xi\,\zeta\sin\theta\cos\theta + \zeta^2\cos^2\theta. \tag{7.144}$$
Next, consider the plane $\zeta = 1$ (see again Fig. 7.21). This plane forms an angle $\theta$ with the (horizontal) $z = 0$ plane. The points A, B, C and D lie on such a plane. Of course, $\overline{OB} = 1$, by definition. To obtain the intersection of the cone with such a plane, we set $\zeta = 1$ in the above equation and obtain
$$\xi^2\left(\cos^2\theta - \sin^2\theta\right) - 4\,\xi\cos\theta\sin\theta + \eta^2 = \cos^2\theta - \sin^2\theta. \tag{7.145}$$
Recall that $2\cos\theta\sin\theta = \sin 2\theta$ and $\cos^2\theta - \sin^2\theta = \cos 2\theta$ (Eqs. 7.64 and 7.65). Thus, Eq. 7.145 yields $\xi^2\cos 2\theta - 2\,\xi\sin 2\theta + \eta^2 = \cos 2\theta$, or
$$\xi^2 - 2\,\xi\tan 2\theta + \frac{\eta^2}{\cos 2\theta} = 1. \tag{7.146}$$
This may be rewritten as $(\xi - \tan 2\theta)^2 + \eta^2/\cos 2\theta = 1 + \tan^2 2\theta = 1/\cos^2 2\theta$ (use $1 + \tan^2\alpha = 1/\cos^2\alpha$, Eq. 6.76), or, multiplying both sides by $\cos^2 2\theta$,
$$(\xi - \xi_0)^2\cos^2 2\theta + \eta^2\cos 2\theta = 1, \tag{7.147}$$
where $\xi_0 := \tan 2\theta$. Equation 7.147 is a particular case of Eq. 7.96, and hence it represents a conic in the $(\xi, \eta)$-plane. Finally, we want to show that, depending upon the angle $\theta$, Eq. 7.147 represents either a circle, or an ellipse, or a parabola, or a hyperbola. Indeed, if $\theta = 0$, we obtain the circle $\xi^2 + \eta^2 = 1$. Next, consider the case $\cos 2\theta > 0$, namely $\theta \in (0, \pi/4)$, in which case Eq. 7.147 may be written as
$$\frac{(\xi - \xi_0)^2}{a^2} + \frac{\eta^2}{b^2} = 1, \tag{7.148}$$
where $a := 1/\cos 2\theta > 1$ and $b^2 := 1/\cos 2\theta = a < a^2$. Hence, it represents a horizontal ellipse (Eq. 7.84). Next, consider the case $\cos 2\theta < 0$, namely $\theta \in (\pi/4, \pi/2]$. Now, Eq. 7.147 may be written as
$$\frac{(\xi - \xi_0)^2}{a^2} - \frac{\eta^2}{b^2} = 1 \tag{7.149}$$
(where $a = |1/\cos 2\theta| \ge 1$, with equality only for $\theta = \pi/2$, and $b^2 = |1/\cos 2\theta|$), and hence it represents a horizontal hyperbola (Eq. 7.90). For the last case (namely for $\theta = \pi/4$), first we multiply Eq. 7.146 by $\cos 2\theta$, to obtain $\xi^2\cos 2\theta - 2\,\xi\sin 2\theta + \eta^2 = \cos 2\theta$. Then, we can set $\theta = \pi/4$, which yields $\cos 2\theta = 0$ and $\sin 2\theta = 1$, and obtain
$$\xi = \eta^2/2, \tag{7.150}$$
namely a horizontal parabola. In summary, a conic section corresponds to: (i) a circle if θ = 0, (ii) an ellipse if θ ∈ (0, π/4), (iii) a parabola if θ = π/4 and (iv) a hyperbola if θ ∈ (π/4, π/2].
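The classification above lends itself to a quick numerical spot check. The Python sketch below uses illustrative names that are not from the book and samples only a handful of points. It classifies the section from the sign of cos 2θ, then takes points on the ellipse of Eq. 7.148 (for θ = π/6) and on the parabola of Eq. 7.150, maps them back to (x, y, z) through Eq. 7.143 with ζ = 1, and evaluates x² + y² − z², which should vanish on the cone of Eq. 7.142.

```python
import math

def classify_section(theta, tol=1e-12):
    """Type of conic cut by the plane zeta = 1, for theta in [0, pi/2]."""
    cos2 = math.cos(2 * theta)
    if abs(theta) < tol:
        return "circle"
    if abs(cos2) < tol:
        return "parabola"
    return "ellipse" if cos2 > 0 else "hyperbola"

def cone_residual(xi, eta, theta):
    """x^2 + y^2 - z^2 for the point (xi, eta, zeta = 1), using Eq. 7.143."""
    x = xi * math.cos(theta) - math.sin(theta)
    z = xi * math.sin(theta) + math.cos(theta)
    return x * x + eta * eta - z * z

# Points on the ellipse of Eq. 7.148, for theta = pi/6.
theta = math.pi / 6
a = 1 / math.cos(2 * theta)
b = math.sqrt(a)
xi0 = math.tan(2 * theta)
ellipse_residuals = [
    cone_residual(xi0 + a * math.cos(t), b * math.sin(t), theta)
    for t in (0.0, 1.0, 2.5, 4.0)
]

# Points on the parabola of Eq. 7.150, for theta = pi/4.
parabola_residuals = [
    cone_residual(eta * eta / 2, eta, math.pi / 4) for eta in (-2.0, 0.5, 3.0)
]

print(classify_section(theta), ellipse_residuals)
print(classify_section(math.pi / 4), parabola_residuals)
```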
Part II Calculus and dynamics of a particle in one dimension
Chapter 8
Limits. Continuity of functions
Here, we begin the second leg of our journey. Hold on to your hats, because the acceleration is going to be exhilarating — it will be like going from high school to college.
• Overview of Part II Let me start by giving you a brief overview of Part II. We begin with infinitesimal calculus (in particular, limits, differential calculus and integral calculus, in Chapters 8, 9 and 10, respectively), thereby introducing you to the third major area of mathematics (after algebra and geometry), namely that of mathematical analysis, of which calculus is the most elementary portion. This is the branch of mathematics that was introduced by Isaac Newton and Gottfried Wilhelm Leibniz, as outlined in Footnote 4, p. 12. Calculus has been instrumental in almost all further developments in mathematics. If you never had calculus in high school, these three chapters are definitely a major step, probably the crucial step in the entire book. Most likely, this step is going to be challenging, but, believe me, it is worth the effort. From now on, it’s gonna be a whole new ball game. After these three chapters, we will go back to the parallel journey of mechanics. Specifically, in Chapter 4 we have dealt with statics. In Chapter 11, we will address the foundations of dynamics, which deals with the motion of particles, as caused by forces. We will limit ourselves essentially to the case of single–particle dynamics in one dimension, as this is the best we can do with the mathematical tools available to us at this point. [Also, it is easier to introduce the field of dynamics within this limited framework.] Nonetheless, we will be able to obtain some explicit solutions for relatively simple problems.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_8
Next, in Chapter 12, we will go back to mathematics and introduce two new important functions, the logarithm and its inverse, the exponential. Then, in Chapter 13 we will show how to use the notion of derivative to obtain a polynomial approximation of any suitably smooth function f (x) in the neighborhood of a point within its interval of definition, via the so–called Taylor polynomials. The material covered in Chapters 12 and 13 will allow us to obtain, in Chapter 14, some explicit solutions for some more complicated mechanics problems, specifically one–dimensional single–particle dynamics in the presence of damping and/or aerodynamic drag. This is quite a mouthful to start with. Indeed, the most important conceptual notions used in this book are presented in embryo in Part II.
• Overview of this chapter Let us turn now to the specifics of this chapter, where we deal with the notion of limit. To provide you with a gut–level introduction to the limit, I will begin by addressing the well–known paradox of Achilles, who can never reach the tortoise (Section 8.1). This allows us to introduce the definition of the limit for a sequence of numbers, addressed in full in Section 8.2. Then, we shift gears and address the limit that a function, say f (x), attains, as the variable x approaches a given value (Section 8.3). We use this to introduce a formal definition of continuous functions (Section 8.4) and address some related notions, namely boundedness and uniform continuity (Section 8.5). Next, we shift to two or more variables. To this end, we present first some preliminary material on smooth lines, contours and regions (Section 8.6). Then, in Section 8.7, we consider functions of two variables and address the corresponding issues of continuity, boundedness and uniform continuity. In addition, we have an appendix (Section 8.8), where, to give a rigorous proof of a theorem presented in Section 8.2 (Theorem 73, p. 308, on the convergence of bounded increasing sequences), we introduce some important related material: accumulation points, the Bolzano–Weierstrass theorem and Cauchy sequences. This is a bit above and beyond the call of duty, but it will put your understanding of the mathematical foundations on a firmer ground.
• Introductory remarks As one of my professors used to say, in mathematical analysis there are five operations: the four standard algebraic operations (addition and multiplication, along with their inverses, subtraction and division) and the limit. In this chapter, we address the fifth operation — the limit. This will get you
acquainted with the field of infinitesimal calculus, which is the wondrous kingdom of limits. As we will see, the limit is a key ingredient for differential and integral calculus (Chapters 9 and 10), and indeed for much of this book. It is the quintessence of mathematical analysis. It allows us to accomplish things otherwise inconceivable, and helps us in many ways, when otherwise we would get stuck, as did people in the days when the concept of limit had not yet been introduced. What is the limit? Roughly speaking, the limit is a refined way of introducing a notion that we have already been using, namely the idea of “getting closer and closer to a certain value without ever reaching it.” [You remember that we have already used such a notion. For instance, rational numbers get closer and closer to an irrational number, with a bracket getting smaller and smaller (Eq. 1.76). We used it also in Section 6.5, on exponentiation, again to go from rational to irrational exponents. Finally, we used it in Subsection 6.4.5, to determine the area of a disk (Eq. 6.90).] ◦ Warning. From now on, we will modify our jargon and say instead “getting closer and closer to a certain value, and reaching it only in the limit.”
• Preliminary material Here, we present some preliminary material. Specifically, we recast the material in Section 1.6 in terms of points, instead of numbers. Recall the definition of a set of points (Definition 110, p. 208). We have the following definitions:
Definition 142 (Interval). Let us consider all the real numbers $x$ between $a$ and $b > a$. The corresponding set forms an interval. Specifically, we use
$$\begin{aligned}
x \in (a, b) &\quad\text{to mean}\quad a < x < b,\\
x \in [a, b] &\quad\text{to mean}\quad a \le x \le b,\\
x \in (a, b] &\quad\text{to mean}\quad a < x \le b,\\
x \in [a, b) &\quad\text{to mean}\quad a \le x < b.
\end{aligned} \tag{8.1}$$
The above intervals are called bounded. [Déjà vu? See Eq. 1.73!] The interval $(a, b)$, which excludes the end values, is called open, whereas the interval $[a, b]$, which includes the end values, is called closed.
Definition 143 (Length of interval). The length of a bounded interval equals $|a - b|$. This applies to all four types of intervals introduced above.
Definition 144 (Neighborhood). A neighborhood of a point $x_0$ of the real line is the collection of all the points of the real line in the interval $(x_0 - c, x_0 + c)$, where $c$ is an arbitrary positive number. [Note that a neighborhood is an open interval.] We use also the terms upper neighborhood to mean $[x_0, x_0 + c)$, and lower neighborhood to mean $(x_0 - c, x_0]$. These will be referred to as one–sided neighborhoods.
If the interval is not bounded, we have, as in Eq. 1.74,
$$\begin{aligned}
x \in (a, \infty) &\quad\text{to mean}\quad x > a,\\
x \in [a, \infty) &\quad\text{to mean}\quad x \ge a,\\
x \in (-\infty, b) &\quad\text{to mean}\quad x < b,\\
x \in (-\infty, b] &\quad\text{to mean}\quad x \le b,\\
x \in (-\infty, \infty) &\quad\text{to mean any point.}
\end{aligned} \tag{8.2}$$
The above intervals are called unbounded.
8.1 Paradox of Achilles and the tortoise
Here, we present some gut–level remarks regarding the fifth operation, just to give you an introduction of our quintessential instrument, the limit. Consider the paradox of Achilles and the tortoise, which goes pretty much like this. Assume that the speed of the tortoise is a tenth of that of Achilles. [I know! This is a very fast tortoise, or Achilles that day was running very, very slow. The ratio 1 to 10 is only used to make the math simpler.] Assume also that Achilles starts one mile behind the tortoise. Then, once he covers one mile, the tortoise has moved a tenth of a mile. As he moves by an additional tenth of a mile, the tortoise has moved by an additional one hundredth of a mile. Thus, as Zeno of Elea claimed, Achilles would never reach the tortoise. He (Zeno) called this a paradox.¹ Following tradition, it is still called a paradox, although, by now, we know that this is not the case, in the sense that we know how to address the problem, as shown below.
To this end, once you introduce the notion of approaching (namely “getting closer and closer, as close as you wish, and reaching only in the limit”), this is no longer a paradox. For, adding the various distances, we obtain the following collection of numbers: $1$, then $1 + 1/10 = 1.1$, then $1 + 1/10 + 1/10^2 = 1.11$, then $1 + 1/10 + 1/10^2 + 1/10^3 = 1.111$. If we keep going, we obtain $1 + 1/10 + 1/10^2 + \cdots + 1/10^n$, which tends to (namely gets closer and closer to) $1.11111\ldots = 1.\overline{1}$. Note that $1.\overline{1} = 1 + 1/9 = 10/9$. [Just get your calculator and evaluate 10/9. And what did you get? Didn’t I tell you? Alternatively, use Eq. 1.49.] Thus, Achilles reaches the tortoise after one mile and one ninth of a mile. And he does so in a finite amount of time, namely 10/9 miles divided by Achilles’ own speed.
Ancient Greeks (specifically, those in the fifth century BC) had a hard time with this whole process, because their ideas about limits were not crystal clear. That is why in those days the story of Achilles and the tortoise was presented as a paradox.
¹ This so–called paradox was introduced by the Greek philosopher Zeno of Elea (c. 495–430 BC), a member of the Eleatic School, founded by Parmenides (c. 515–450 BC). [Zeno of Elea is not to be confused with Zeno of Citium (c. 334–262 BC), the founder of Stoicism.]
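The partial sums in the story of Achilles can be tabulated exactly. The following Python sketch (the function name is an assumption made for illustration) uses exact fractions, so that the gap between each partial sum and 10/9 is not blurred by rounding.

```python
from fractions import Fraction

def achilles_distance(n):
    """Partial sum 1 + 1/10 + ... + 1/10**n, as an exact fraction."""
    return sum(Fraction(1, 10**k) for k in range(n + 1))

limit = Fraction(10, 9)
for n in (0, 1, 2, 3, 10):
    covered = achilles_distance(n)
    gap = limit - covered          # distance still missing to 10/9
    print(n, float(covered), gap)
```

Each additional term shrinks the gap by a factor of ten, which is the sense in which the partial sums get as close as we wish to 10/9.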
8.2 Sequences and limits
Here, we want to formalize the notion of limit and make it rigorous. Consider the numbers 1, 1/2, 1/3, 1/4, 1/5, . . . . To be more formal, let me write the above collection of numbers as $s_1 = 1$, $s_2 = 1/2$, $s_3 = 1/3$, $s_4 = 1/4$, $s_5 = 1/5$, . . . , or, more concisely, as $s_n = 1/n$, where $n = 1, 2, 3, 4, \ldots$ . We say that this forms an ordered collection of infinitely many numbers. [I said “ordered” because the order in which the numbers are listed is important and should not be altered.] Then, we have the following
Definition 145 (Sequence. Bounded sequence). A sequence is an ordered collection of infinitely many numbers, not necessarily distinct, and is denoted by $\{s_n\}_{n=1}^{\infty}$, or simply as $\{s_n\}$ if the range is clear from the context. The elements $s_1, s_2, s_3, s_4, \ldots$ are called the terms of the sequence. We say that $s_n$, with an unspecified value for $n$, denotes the generic term of the sequence, specifically the n-th term of the sequence. [Note the difference between $\{s_n\}$ (sequence) and $s_n$ (generic term of the sequence).] A sequence is called bounded iff there exists an integer $M$ such that
$$|s_n| < M, \quad\text{for all } n. \tag{8.3}$$
Otherwise, it is called unbounded. For instance, the sequence $\{n\}$ is unbounded, whereas the sequence $\{1/n\}$ is bounded. ◦ Warning. Some authors distinguish between finite sequences to denote ordered collections of $n < \infty$ numbers (for which I use the term ordered n-tuples), and infinite sequences to denote ordered collections of infinitely many numbers (for which I simply use the term sequences).
Remark 60. Note that the correspondence of points and numbers (Subsection 1.6.5) implies that a sequence of numbers generates a set of points. However, as stated above, the definition of a sequence of numbers does not exclude the possibility of repeated numbers. Thus, a sequence of numbers does not necessarily correspond to a set of infinitely many points. For instance, the sequence of numbers $\{b_n\} = \{1, -1, 1, -1, \ldots\}$ corresponds to a set of only two points of the real line, namely 1 and −1. Now, we are in a position to formalize the concept of getting smaller and smaller. Specifically, let us turn our attention to the notion of the limit of a sequence. Let us begin with the case $\{s_n\} = \{1/n\}$. The key observation is that, in this case, if you choose an arbitrarily tiny positive number, say ε > 0, I can find $N$ such that $s_n$ is smaller than ε, whenever $n > N$. [Like the rest of the world, I will use the Greek letter ε (epsilon) to denote the arbitrarily small real positive number that we are free to choose.] Remark 61 (ε vs ϵ). Note that we have again encountered two different symbols, namely ϵ (Eq. 3.169) and now ε, both of which denote the same Greek letter epsilon. Throughout this book, I will use ε exclusively to denote a real positive number that is either arbitrarily small, or simply very small, specifically ε ≪ 1, where the symbol ≪ stands for “much less than”. Otherwise, I use the symbol ϵ. ◦ Point of clarification. It should be emphasized that ε can be as small as we wish, but cannot be “evanescent,” in the sense that its value cannot be considered vanishing. On the contrary, it must be considered as fixed, albeit as small as you wish. [A historical note on this issue will be presented in Subsection 9.1.3.] ◦ Warning. I will typically use the symbol $N$ to denote an arbitrarily large natural number, and the symbol $M$ to denote an arbitrarily large real number. Here is the crucial turning point! Let us go back to the example above, that is, to the case $s_n = 1/n$.
In this case, we have that, given ε > 0 as small as you wish, I can choose any integer $N > 1/ε$, and obtain $1/n < ε$, whenever $n > N$. [Hint: Dividing both sides of the inequality $n > N$ by the positive integer $n\,N$ yields $1/N > 1/n$ (Eq. 1.67). On the other hand, multiplying the inequality $N > 1/ε$ by the positive real number $ε/N$, we have $ε > 1/N$ (again Eq. 1.67). Combining yields $ε > 1/N > 1/n$ (Eq. 1.66).] For instance, in the above example, if you were to ask me to make $1/n$ smaller than ε = 1/100, I would choose $n > N = 100$, because, whenever $n > 100$, we have $1/n < 1/100$ (for instance, 1/101 < 1/100). If you want $1/n$ to be smaller than 1/1,000,000, I would choose $N$ = 1,000,000, because for $n$ > 1,000,000 we have $1/n$ < 1/1,000,000.
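The choice of N for a given ε can be turned into a small computation. The Python sketch below is only an illustration (the function name `threshold` is an assumption). It returns the smallest integer strictly larger than 1/ε, which is one admissible choice among many, and then spot–checks the inequality 1/n < ε for a finite range of n beyond N. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
import math

def threshold(eps):
    """An integer N > 1/eps, so that 1/n < eps whenever n > N."""
    return math.floor(1 / Fraction(eps)) + 1

for eps in (Fraction(1, 100), Fraction(1, 1_000_000), Fraction(3, 7)):
    N = threshold(eps)
    # Finite spot check only; the hint above covers every n > N.
    assert all(Fraction(1, n) < eps for n in range(N + 1, N + 1000))
    print(eps, N)
```

Any larger N works equally well: the definition only asks that some N exists for each ε.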
Mathematicians like to shorten their sentences. Thus, in order to indicate the fact that the generic term $1/n$ approaches zero (in the sense that given ε > 0 as small as you wish, we can find a number $N$ such that $1/n < ε$, whenever $n > N$), they say: “The sequence $\{1/n\}$ tends to zero as $n$ tends to infinity.” [Again, as in Remark 10, p. 44, here infinity does not represent a number; it is a short convenient way to say that it grows beyond any limit.] As a second example, consider the sequence $s_1 = 1/10$, $s_2 = 1/10^2 = 1/100$, $s_3 = 1/10^3 = 1/1{,}000$, $s_4 = 1/10^4 = 1/10{,}000$, . . . , or, more concisely, $s_n = 1/10^n$ ($n = 1, 2, 3, 4, \ldots$). Again, we can say: “The sequence $\{1/10^n\}$ tends to zero as $n$ tends to infinity.” The statement is apparent, since the generic term of the sequence $s_1 = 1/10$, $s_2 = 1/10^2 = 1/100$, $s_3 = 1/10^3 = 1/1{,}000$, $s_4 = 1/10^4 = 1/10{,}000$, . . . , may be made smaller than any number we wish. [If you are familiar with the floating–point notation that is used in computer programming, you would have for instance, $s_{77}$ = 1.0E−77. Accordingly, you can write ε in floating–point notation and compare the exponents.] The examples used above are particularly simple. In order for you to have a less limited picture of the range of possibilities, note that, in general, a sequence might tend to a number $s$, not necessarily zero. For instance, consider, as a third example, $s_n = (n - 1)/n$, namely $s_1 = 0$, $s_2 = 1/2$, $s_3 = 2/3$, $s_4 = 3/4$, . . . . Note that
$$s_n = \frac{n - 1}{n} = 1 - \frac{1}{n}. \tag{8.4}$$
As n tends to infinity, we have that 1/n tends to zero, and hence $s_n$ approaches one. To formalize and generalize the ideas above, we introduce the following
Definition 146 (Limit of a sequence). Given a sequence of numbers, $\{s_n\}$, we say that the limit of $s_n$, as n tends to infinity, equals s iff, for any given ε > 0 arbitrarily small, there exists an integer N > 0 such that
$$ |s_n - s| < \varepsilon, \quad \text{whenever } n > N. \tag{8.5} $$
Mathematicians like to shorten their notation as well and, to indicate that “s is the limit of $s_n$, as n tends to infinity,” they write
$$ \lim_{n\to\infty} s_n = s. \tag{8.6} $$
Equation 8.6 is equivalent to
$$ \lim_{n\to\infty} (s_n - s) = 0, \tag{8.7} $$
since both Eq. 8.6 and Eq. 8.7 indicate that, given any ε > 0 arbitrarily small, there exists a number N such that $|s_n - s| < \varepsilon$, whenever n > N. [For instance, in the first example above, namely $s_n = 1/n$, we would write
$$ \lim_{n\to\infty} \frac{1}{n} = 0 \tag{8.8} $$
to mean: “for any ε > 0 arbitrarily small, there exists a number N such that |1/n − 0| < ε, whenever n > N.”]
For the third example above, namely for $s_n = (n - 1)/n = 1 - 1/n$, I claimed the limit to be 1. Now, we can simply use the definition and verify whether my claim is correct. Indeed, using Eq. 8.7, we have (use Eq. 8.8)
$$ \lim_{n\to\infty} |s_n - s| = \lim_{n\to\infty} \left| 1 - \frac{1}{n} - 1 \right| = \lim_{n\to\infty} \frac{1}{n} = 0. \tag{8.9} $$
Often, if Eq. 8.6 is true, we use an expression even shorter. Instead of “the limit of $s_n$ for n that tends to infinity is s,” we simply say “the sequence $s_n$ converges to s.”
• Divergent sequences
We also have the following
Definition 147 (Divergent sequence). If a sequence does not converge, we say that it diverges, or that it is divergent.
◦ Warning. Contrary to what you would expect from the everyday use of the English language, a divergent sequence is not necessarily unbounded, namely it does not necessarily tend to ∞, or to −∞, as n goes to infinity. For instance, the sequence 1, −1, 1, −1, . . . , does not converge, and hence, by definition, is divergent. However, it is bounded.
At times, we encounter sequences whose value becomes larger and larger (namely those that, in the everyday use of the English language, one would simply call divergent). Again, we use the symbol ∞ (see its use in Definition 27, p. 43) and write
$$ \lim_{n\to\infty} s_n = \infty, \tag{8.10} $$
iff, given a real number M > 0, as large as you wish, we can find an integer N > 0 such that, for any n > N, we have $s_n > M$. Similarly, we write
$$ \lim_{n\to\infty} s_n = -\infty, \tag{8.11} $$
iff, given a real number M > 0, as large as you wish, we can find an integer N > 0 such that, for any n > N , we have sn < −M . In this case, we say that the sequence tends to minus infinity, or negative infinity. ◦ Comment. In view of the fact that ∞ is not a number, in the above cases, instead of saying that the sequence tends to plus or minus infinity, we say also that the limit does not exist.
• Linearity of limit of sequences
Here, we address some properties of sequences of linear combinations. We have the following
Theorem 72. Given two convergent sequences, with
$$ \lim_{n\to\infty} s_n = s \quad \text{and} \quad \lim_{n\to\infty} t_n = t, \tag{8.12} $$
and two constants α and β, we have that
$$ \lim_{n\to\infty} (\alpha s_n + \beta t_n) = \alpha s + \beta t. \tag{8.13} $$
◦ Proof: Assume first that α ≠ 0 and β ≠ 0. Given any ε > 0 arbitrarily small, let us set $\varepsilon_1 := \tfrac{1}{2}\,\varepsilon/|\alpha|$ and $\varepsilon_2 := \tfrac{1}{2}\,\varepsilon/|\beta|$. Then, Eq. 8.12 implies that there exist $N_1$ and $N_2$ such that $|s_n - s| < \varepsilon_1$ for any $n > N_1$, and $|t_n - t| < \varepsilon_2$ for any $n > N_2$. Accordingly, for any n > N, where N denotes the greater of $N_1$ and $N_2$, we have that
$$ \left| (\alpha s_n + \beta t_n) - (\alpha s + \beta t) \right| = \left| \alpha (s_n - s) + \beta (t_n - t) \right| \le |\alpha|\,|s_n - s| + |\beta|\,|t_n - t| < |\alpha|\,\varepsilon_1 + |\beta|\,\varepsilon_2 = \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon, \tag{8.14} $$
which is equivalent to Eq. 8.13. [Hint: Use |a ± b| ≤ |a| + |b| (Eq. 1.70) and |a b| = |a| |b| (Eq. 1.60).] [If α = 0 or β = 0, the corresponding term drops out, and the same argument applies to the remaining one.]
◦ Comment. Of course, the theorem is easily extendable to linear combinations of more than two sequences, as you may verify.
In analogy with Definition 71, p. 127, for linear operations on vectors, we have the following
Definition 148 (Linear operations on sequences). Consider an operation Q which, performed on a sequence, produces another sequence. Iff we have
$$ Q(\alpha s_n + \beta t_n) = \alpha\, Q(s_n) + \beta\, Q(t_n), \tag{8.15} $$
the operation is called linear. Therefore, the above theorem states that the operation of taking the limit of sequences is linear.
8.2.1 Monotonic sequences
Consider the following
Definition 149 (Monotonic sequence). Given a sequence of numbers, $s_1, s_2, s_3, s_4, \dots, s_n, \dots$, we say that the sequence is: (i) monotonically increasing iff
$$ s_1 < s_2 < s_3 < \dots, \tag{8.16} $$
(ii) monotonically decreasing iff
$$ s_1 > s_2 > s_3 > \dots, \tag{8.17} $$
(iii) non–decreasing iff
$$ s_1 \le s_2 \le s_3 \le \dots, \tag{8.18} $$
(iv) non–increasing iff
$$ s_1 \ge s_2 \ge s_3 \ge \dots. \tag{8.19} $$
Any sequence that satisfies one of the four above conditions is called monotonic. By changing the signs of the terms sk , we obtain that increasing sequences are transformed into decreasing sequences and vice versa. Thus, without loss of generality, in the following we focus on increasing and non–decreasing sequences. We have the following Theorem 73 (Bounded monotonically increasing sequence). A bounded increasing sequence converges. ◦ Proof : The theorem is intuitively true. The sequence keeps on growing monotonically, without oscillations. Being bounded, it has nowhere to go, except to a limit value. [We can do better. See Remark 62 below.]
Remark 62. The proof of Theorem 73 above is really based on intuition, not on mathematical rigor: a truly “gut–level proof.” You might like to know that I have seen widely used textbooks that treat the issue pretty much as I did above. Nonetheless, for the sake of completeness, a rigorous proof is given in Section 8.8. [The same holds true for Theorem 74 below.]
Next, consider non–decreasing sequences. In this case, the numbers $s_n$ are not necessarily distinct. [For instance, from an increasing sequence we may generate a non–decreasing sequence by repeating twice each number of the first sequence.] Therefore, as pointed out in Remark 60, p. 304, a sequence, namely an ordered collection of infinitely many numbers, might correspond to a finite number of points on the real line. Therefore, we want to broaden the above theorem. Accordingly, we have the following
Theorem 74 (Bounded non–decreasing sequence). A bounded non–decreasing sequence converges.
◦ Proof: Let M denote any upper bound of the sequence, whereas $s_0$ is clearly a lower bound (indeed the minimum). Thus, the sequence $\{s_n\}_{n=0}^{\infty}$ defines a set of points in the interval $[s_0, M]$. In comparison with the preceding theorem, the sequence now is non–decreasing, rather than strictly increasing. Therefore, we have to allow for repeated values of $s_n$, and even the possibility that the infinitely many numbers of the sequence might correspond to a set with a finite number of distinct points. Thus, we have to consider both possibilities. The proof for an infinite number of points is identical to that of the preceding theorem. Therefore, here we examine only the case in which the number of points in the set is finite. In this case, the proof is actually trivial. Indeed, for a non–decreasing sequence with a finite number of points there exists, necessarily, a number n∗ with $s_n$ = s∗ for any n > n∗, since the sequence cannot oscillate, because it is non–decreasing.
In other words, the only way that a non–decreasing sequence can produce only a finite number of points is when sn remains constant from a certain point on. Accordingly, the sequence converges.
8.2.2 Limits of important sequences
Here, we discuss the limits of important sequences. We have the following
Theorem 75. Consider the sequence $n^{\alpha}$, with n = 1, 2, . . . , and α real. We have
$$ \lim_{n\to\infty} n^{\alpha} = 0 \ \ (\alpha < 0); \qquad = 1 \ \ (\alpha = 0); \qquad = \infty \ \ (\alpha > 0). \tag{8.20} $$
◦ Proof: The second case is trivial, since we have $n^0 = 1$ (Eq. 1.54), for all values of n ≥ 1. Next, consider the third case, namely α > 0. Given an N > 0 arbitrarily large, for any n such that $n > N^{1/\alpha}$, we have $n^{\alpha} > N$ (use Eq. 6.128), in agreement with the third equality in Eq. 8.20. Finally, for the case α < 0, set β = −α > 0 (so as to have $n^{\alpha} = 1/n^{\beta}$) and use the third equality in Eq. 8.20. This yields the desired result, as you may verify.
Next, we want to consider $a^n$. We will study such a sequence in Theorem 76 below. To this end, it is convenient to introduce the following
Lemma 10. We have that
$$ (1 + r)^n > 1 + n r, \quad \text{for } r \in (0, \infty), \tag{8.21} $$
where n is any integer greater than 1.
◦ Proof: We can proceed as follows. Verify that Eq. 8.21 is valid for n = 2, then for n = 3, and so on. Indeed, for n = 2, we have $(1 + r)^2 = 1 + 2r + r^2 > 1 + 2r$, because $r^2 > 0$. Then, using this result, we have $(1 + r)^3 = (1 + r)(1 + r)^2 > (1 + r)(1 + 2r) = 1 + 3r + 2r^2 > 1 + 3r$, again because $r^2 > 0$. Next, using the last result, obtain $(1 + r)^4 = (1 + r)(1 + r)^3 > (1 + r)(1 + 3r) = 1 + 4r + 3r^2 > 1 + 4r$, once more because $r^2 > 0$. We may continue this way and obtain $(1 + r)^n > 1 + nr$, for any n we choose. Thus, the lemma is valid for any n. [An alternate, rigorous and less cumbersome proof will be provided in Subsection 8.2.5, on the principle of mathematical induction. Also, a simpler proof may be obtained by using the binomial theorem (Eq. 13.39); see Remark 113, p. 530.]
Now, we are in a position to prove the following
Theorem 76. Consider the sequence $a^n$, where a is a real positive number. We have
$$ \lim_{n\to\infty} a^n = 0 \ \ (0 < a < 1); \qquad = 1 \ \ (a = 1); \qquad = \infty \ \ (a > 1). \tag{8.22} $$
◦ Proof: The second equality is trivial, since for a = 1 we have $a^n = 1$ for all values of n. Next, consider the first case, namely a ∈ (0, 1). In order to prove this case, set a = 1/(1 + r) < 1. [This implies r = (1 − a)/a > 0.] According to Eq. 8.21, we have $1/a^n = (1 + r)^n > 1 + nr > nr > 0$, namely $a^n < 1/(nr)$ (use Eq. 1.67). Thus, for 0 < a < 1, $a^n$ tends to zero as n tends to infinity (use Eq. 8.8), in agreement with the first equality in Eq. 8.22. Finally, the third case is proven by taking the reciprocal of the first, as you may verify.
Remark 63. An alternate much simpler proof is obtained with the use of logarithms (Remark 111, p. 500). However, the logarithms will be introduced only in Chapter 12 and, for this reason, we used the proof presented above. [If you are familiar with logarithms, such a proof consists in taking the logarithm of $a^n$, to obtain $\ln a^n = n \ln a$. For a ≷ 1, we have ln a ≷ 0, and hence $\ln a^n$ tends to ±∞. Correspondingly, $a^n$ tends to 0 for a < 1, and to ∞ for a > 1.]
◦ Comment. If a is negative, we have
$$ \lim_{n\to\infty} a^n = 0, \quad \text{for } a \in (-1, 0), \tag{8.23} $$
as you may verify. On the other hand, for a = −1, the sequence takes alternately the values −1 and 1, whereas for a < −1 the sequence oscillates with $|a|^n$ that tends to infinity, as you may verify.
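Lemma 10 and the bound used in the proof of Theorem 76 lend themselves to a quick numerical check. The sketch below is an illustration added here under stated assumptions: the function names `bernoulli_holds` and `power_bound_holds` and the sample values a = 9/10 and r = 1/10 are hypothetical choices, and exact rational arithmetic is used to avoid round-off.

```python
from fractions import Fraction

def bernoulli_holds(r, n):
    """Eq. 8.21: (1 + r)**n > 1 + n*r, for r > 0 and integer n > 1."""
    return (1 + r) ** n > 1 + n * r

def power_bound_holds(a, n):
    """Bound from the proof of Theorem 76: a**n < 1/(n*r), with r = (1 - a)/a."""
    r = (1 - a) / a
    return a ** n < 1 / (n * r)

a = Fraction(9, 10)
r = (1 - a) / a
for n in (1, 10, 100, 1000):
    # a**n is squeezed between 0 and 1/(n*r), and the latter tends to zero.
    print(n, float(a ** n), float(1 / (n * r)), power_bound_holds(a, n))
```

For n = 1 the two sides of Eq. 8.21 coincide, which is why the lemma is stated for n greater than 1.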
8.2.3 Fibonacci sequence. The golden ratio
♥
As an illustrative example, let us address the relationship between the Fibonacci sequence (defined below) and the golden ratio $\Phi := (1 + \sqrt{5})/2 = 1.6180339887\ldots$. Specifically, we want to show that the ratio of two consecutive terms of the Fibonacci sequence tends to the golden ratio. [If you would like to know more on the subject (there is much more, in particular in relationship with the regular pentagon), you may consult the enjoyable book The Golden Ratio: The Story of Phi, the World’s Most Astonishing Number by Mario Livio (Ref. [43]).]
• Fibonacci sequence
Let us consider the Fibonacci sequence, defined as follows: (i) the first two numbers are $F_1 = F_2 = 1$, and (ii) each subsequent number of the sequence is the sum of the preceding two numbers:2
$$ F_{k+2} = F_{k+1} + F_k. \tag{8.24} $$
The numbers $F_k$ (k = 1, 2, . . . ) are called the Fibonacci numbers. The first few Fibonacci numbers are
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, . . . (8.25)
2 Fibonacci introduced the sequence in his book Liber Abaci (Ref. [23]), as the solution to some weird rabbit population: each couple produces a new couple every month, beginning two months after their birth. These rabbits never die. Thus, each month we have: (i) the existing rabbits $F_{k-1}$, plus (ii) the newly born rabbits $F_{k-2}$. [It should be pointed out that the sequence was already known to Indian mathematicians.]
Fig. 8.1 Fibonacci numbers. Golden spiral
In Fig. 8.1, we present an interesting composition obtained by sticking together squares whose sides equal the Fibonacci numbers. Superimposed on this, we show a spiral, known as the golden spiral (or the Fibonacci spiral), which is composed of circular arcs inscribed in each square. [The numbers occur in nature, such as in arrangements of leaves and branches in plants (a Romanesco broccoli is a great example), as well as in the spirals of shells. Search the web for “Fibonacci sequence in nature.”]
• Golden ratio
The origin of the golden ratio, Φ, may be placed with the beginning of civilization. Euclid (Vol. II, Prop. 11; Ref. [22], Vol. 1, pp. 402–403) supposedly gave the first known definition of Φ. Artists used it because such a proportion is considered to be aesthetically pleasing. Indeed, the letter Φ (Phi) stems from the initial letter of Phidias (Φιδιας), the famous ancient Greek sculptor, who is said to have used it for the “ideal proportions” of a human body. His statues on the Parthenon appear to be based upon the golden ratio. The ratio between the slant height and one half of the base length of the pyramid of Cheops (the great pyramid of Giza) is close to Φ. The Vitruvian man, by
Leonardo da Vinci, might be based upon Φ. The golden ratio is addressed by Pacioli (Ref. [51]), where he refers to it as the “Divine Proportion.”3
The golden ratio may be introduced as follows. Consider Fig. 8.2.
Fig. 8.2 The golden ratio
By definition of x, the dark gray rectangle and the big rectangle (namely the union of the dark gray rectangle and light gray square) are similar. In other words, we impose (1 + x)/1 = 1/x, namely
$$ x^2 + x = 1. \tag{8.26} $$
Next, setting
$$ \Phi = 1/x = 1 + x, \tag{8.27} $$
one obtains
$$ \Phi^2 - \Phi - 1 = 0. \tag{8.28} $$
This is a quadratic equation that has only one positive root, namely
$$ \Phi = \frac{1 + \sqrt{5}}{2} = 1.618\,033\,988\ldots. \tag{8.29} $$
◦ Comment. A compass–and–straightedge construction is shown in Fig. 8.2. Consider the black circle in the dark gray region, along with its radius along the bottom horizontal line (namely R = 0.5 + x) and that along the slanted black straight line in the light gray region (namely $R = \sqrt{1 + 0.5^2}$). Equate $R^2$ from the two expressions and compare the result to Eq. 8.26.
3 Fra Luca Bartolomeo de Pacioli (c. 1447–1517), was an Italian Franciscan friar with interests in mathematics. He is considered the “Father of Accounting and Bookkeeping,” as he introduced the double–entry system of bookkeeping.
• Limit of Fk+1 /Fk vs the golden ratio Finally, we can address the problem of interest here: “What is the limit of the ratio of two consecutive terms of the Fibonacci sequence, Fk+1 /Fk , as k tends to infinity?”
Fig. 8.3 Ratios Fk+1 /Fk vs k
Let us begin with a simple gut–level approach. If the sequence of the ratios of the Fibonacci numbers converges (as it appears to do in Fig. 8.3), then $F_{k+1}/F_k$ tends to a constant, say r > 0, and hence $F_{k+2}/F_k = (F_{k+2}/F_{k+1})\,(F_{k+1}/F_k)$ tends to $r^2$. Thus, dividing $F_{k+2} = F_{k+1} + F_k$ (Eq. 8.24) by $F_k$ and letting k tend to infinity, one obtains
$$ \lim_{k\to\infty} \left( \frac{F_{k+2}}{F_k} - \frac{F_{k+1}}{F_k} - 1 \right) = r^2 - r - 1 = 0. \tag{8.30} $$
This shows that $r = \tfrac{1}{2}\,(1 + \sqrt{5}) = \Phi$ (Eqs. 8.28 and 8.29). Thus, we have obtained that the limit of $F_{k+1}/F_k$ as k tends to infinity, if it exists, is necessarily equal to the golden ratio.
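The behavior of the ratios can also be observed directly. The following minimal Python sketch, an illustration added here and not part of the original text, generates the Fibonacci numbers from Eq. 8.24 and compares consecutive ratios with Φ; the names `fibonacci`, `ratios` and `PHI` are assumptions of this sketch.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, Eq. 8.29

def fibonacci(count):
    """First `count` Fibonacci numbers, with F_1 = F_2 = 1 (Eq. 8.24)."""
    F = [1, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]

F = fibonacci(30)
# ratios[k - 1] holds F_{k+1} / F_k, because the list F is 0-based.
ratios = [F[k + 1] / F[k] for k in range(len(F) - 1)]
for k in (1, 2, 5, 10, 20):
    r = ratios[k - 1]
    print(f"F_{k + 1}/F_{k} = {r:.10f}   difference from Phi = {r - PHI:+.2e}")
```

The printed differences alternate in sign and shrink rapidly, which is consistent with the ratios approaching Φ from both sides.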
• Transient of Fibonacci sequence to the golden ratio
♥♥
Here, we up the ante and show how this limit is reached. Equation 8.24 may be rewritten as
$$ F_{k+2} - F_{k+1} - F_k = 0. \tag{8.31} $$
The solution to this equation may be obtained by trying
$$ F_k = \rho^k. \tag{8.32} $$
Combining with Eq. 8.31 yields $\rho^{k+2} - \rho^{k+1} - \rho^k = 0$, or, dividing by $\rho^k$,
$$ \rho^2 - \rho - 1 = 0. \tag{8.33} $$
This is a quadratic equation whose solutions are
$$ \rho_{\pm} = \frac{1 \pm \sqrt{5}}{2}. \tag{8.34} $$
Equation 8.31 is linear. Therefore, we can apply the superposition theorem for the solutions of linear homogeneous equations (Eq. 3.138) to obtain
$$ F_k = A_+ \rho_+^k + A_- \rho_-^k. \tag{8.35} $$
The constants $A_+$ and $A_-$ are obtained by imposing $F_1 = F_2 = 1$. This yields
$$ F_k = \frac{1}{\sqrt{5}} \left( \rho_+^k - \rho_-^k \right) = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{k} - \left( \frac{1 - \sqrt{5}}{2} \right)^{k} \right], \tag{8.36} $$
as you may verify. The above expression is known as the Binet formula.4 Next, note that
$$ \frac{F_{k+1}}{F_k} = \rho_+\, \frac{1 - (\rho_-/\rho_+)^{k+1}}{1 - (\rho_-/\rho_+)^{k}}. \tag{8.37} $$
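The Binet formula (Eq. 8.36) can be compared with the recursion of Eq. 8.24. The sketch below is an added illustration; the names `binet` and `fib` are hypothetical, and the formula is evaluated in floating point and rounded to the nearest integer, which is reliable only for moderate k (the comparison here stops at k = 40).

```python
import math

SQRT5 = math.sqrt(5)
RHO_PLUS = (1 + SQRT5) / 2   # rho_+ in Eq. 8.34
RHO_MINUS = (1 - SQRT5) / 2  # rho_- in Eq. 8.34

def binet(k):
    """F_k from Eq. 8.36, rounded to the nearest integer."""
    return round((RHO_PLUS ** k - RHO_MINUS ** k) / SQRT5)

def fib(k):
    """F_k from the recursion F_{k+2} = F_{k+1} + F_k, with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

for k in (1, 2, 3, 10, 20, 40):
    print(k, fib(k), binet(k))
```

Agreement for these values is a sanity check on the constants $A_+$ and $A_-$, not a substitute for verifying them by imposing $F_1 = F_2 = 1$.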
Also, $\rho_+ + \rho_- = 1$ (Eq. 8.34), which yields $\rho_- = \tfrac{1}{2}\,(1 - \sqrt{5}) \approx -0.618\,033\,988$ (Eq. 8.29) and hence $\rho_-/\rho_+ \in (-1, 0)$. Accordingly, $(\rho_-/\rho_+)^k$ tends to zero (oscillating), as k tends to infinity (Eq. 8.23). Therefore, $F_{k+1}/F_k$ tends to $\rho_+$. On the other hand, $\rho_+$ coincides with the golden ratio Φ (Eqs. 8.29 and 8.34). Thus, we obtain
$$ \lim_{k\to\infty} \frac{F_{k+1}}{F_k} = \Phi. \tag{8.38} $$
8.2.4 The coast of Tuscany.
♥
Let us consider another illustrative example. Some time back, while my wife and I were discussing a naval disaster that occurred off the coast of Tuscany, she asked me: “How long is the coast of Tuscany?” I didn’t know the answer, but I thought I could figure it out, by measuring it on a map, which I did. She objected that the coast is a jagged line, which is considerably longer than the straight line I had measured. So, I told her I was going to get a correction factor, as the ratio between a straight segment and a jagged one. I constructed the jagged one as follows. I started from a segment from 0 to 1 (black bottom line in Fig. 8.4). Then, I removed the segment from 1/3 to 2/3 and replaced it with two segments of equal length (sides of the big triangle in Fig. 8.4). This way, the original segment, of length 1, was replaced by a jagged line of length $\ell_1 = 4/3$.
4 Named after the French mathematician and physicist Jacques Philippe Marie Binet (1786–1856). The Binet–Cauchy identity (Eq. 15.101) and the Binet theorem on the determinant of the product of two matrices are also named after him.
Fig. 8.4 Model (wrong) of the coastline of Tuscany
She still persisted, because the coast of Tuscany, she said, is more jagged than that. Therefore, I considered each of the four segments of length 1/3 and applied the same procedure (mid–size triangles in Fig. 8.4). Each one of them was replaced by a jagged line equal to 4/3 of the original one. Thus, the total length of the new jagged line (composed of 16 segments) is obtained from the preceding one by multiplying it by 4/3. Accordingly, the length of the 16–segment line is equal to $\ell_2 = (4/3)^2$. I kept on going so as to obtain smaller triangles (shown in Fig. 8.4), and then even smaller ones (not shown in the figure). Every time, we multiply the length by a factor 4/3, so that $\ell_n = (4/3)^n$. As n tends to infinity, we find out that, in the limit, we have obtained a jagged line of infinite length! [Use the third equality in Eq. 8.22.]
◦ Food for thought. Is there something wrong with my model? Yes! Something appears to be definitely wrong with my model: mathematics and real life seem to be in disagreement. However, an interesting observation (my wife again) is that an infinite length is reached only if we keep on going forever. What happens if we stop at a final value of n? What happens if we stop at the dimension of a grain of sand (or even that of an atomic radius)? What then? Then, . . . the length $(4/3)^n$ is finite. Something for us to think about. Theoretical models have to be taken with a grain of salt, or a grain of sand in this case!
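The “food for thought” question can be explored with a few lines of Python. This is a sketch under stated assumptions, not part of the original text: the function names are hypothetical, the initial segment has length 1, and the cut-off sizes (for instance $10^{-6}$ of the initial segment as a stand-in for a grain of sand) are arbitrary illustrative values.

```python
from fractions import Fraction

def koch_length(n):
    """Length of the jagged line after n steps, starting from length 1: (4/3)**n."""
    return Fraction(4, 3) ** n

def steps_until(segment_size):
    """Smallest n for which each segment, of length 3**(-n), is <= segment_size."""
    n = 0
    while 3.0 ** (-n) > segment_size:
        n += 1
    return n

for size in (1e-3, 1e-6, 1e-10):
    n = steps_until(size)
    print(f"segment size <= {size:g}: n = {n}, length = {float(koch_length(n)):.2f}")
```

With any finite cut-off the length stays finite, although it grows by the factor 4/3 at every step.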
• An alternate approach
♥
To make sure that I did not have any error in my approach, I repeated the calculation in a different way. The initial length is 1. At the first step, I replaced the middle segment, of length 1/3, with two of the same length; thus, the length was increased by 1/3, to a total of $\ell_1 = 1 + 1/3 = 4/3$. At the second step, I replaced each of the four middle segments, each of length $1/3^2$, with two of the same length; thus, the length was increased by $4/3^2$, to a total of $\ell_2 = 1 + 1/3 + 4/3^2 = 16/9$. At the third step, I replaced each of the 16 middle segments, each of length $1/3^3$, with two of the same length; thus, the length was further increased by $16/3^3$, to a total of $\ell_3 = 1 + 1/3 + 4/3^2 + 4^2/3^3 = 64/27$. At the n-th step, the total length is
$$ \ell_n = 1 + \frac{1}{3} + \frac{4}{3^2} + \frac{4^2}{3^3} + \frac{4^3}{3^4} + \cdots + \frac{4^{n-1}}{3^n}. \tag{8.39} $$
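The sum in Eq. 8.39 can be evaluated exactly and compared with the value $(4/3)^n$ found with the first approach. The sketch below is an added illustration using rational arithmetic; the function name `length_by_increments` is an assumption.

```python
from fractions import Fraction

def length_by_increments(n):
    """Eq. 8.39: 1 + sum over k = 1..n of 4**(k - 1) / 3**k."""
    return 1 + sum(Fraction(4 ** (k - 1), 3 ** k) for k in range(1, n + 1))

for n in range(1, 6):
    total = length_by_increments(n)
    print(n, total, total == Fraction(4, 3) ** n)
```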
◦ A nice trick. I wanted an explicit value of $\ell_n$, for n arbitrary. At this point, I remembered the nice trick I had learned in high school (Eq. 6.48, with A = 1 and B = r). In our case, we have that
$$ (1 - r)\,(1 + r + r^2 + r^3 + \cdots + r^{n-1}) = (1 + r + r^2 + r^3 + \cdots + r^{n-1}) - (r + r^2 + r^3 + r^4 + \cdots + r^n) = 1 - r^n, \tag{8.40} $$
or, dividing by 1 − r,
$$ 1 + r + r^2 + r^3 + \cdots + r^{n-1} = \frac{1 - r^n}{1 - r}. \tag{8.41} $$
Accordingly, we have (in agreement with the earlier result)
$$ \ell_n = 1 + \frac{1}{3} \left( 1 + \frac{4}{3} + \frac{4^2}{3^2} + \cdots + \frac{4^{n-1}}{3^{n-1}} \right) = 1 + \frac{1}{3}\, \frac{1 - (4/3)^n}{1 - 4/3} = 1 + \frac{1}{3}\, \frac{1 - (4/3)^n}{-1/3} = (4/3)^n. \tag{8.42} $$
• What about the area?
♥
I gave up on that issue! However, I got curious and started to inquire about the area. What happens to the area of an equilateral triangle if we keep on applying the above procedure to each of its three sides?
Let us choose to study the problem under the condition that at each step the line is modified so as to increase the area, namely that the new segments are always placed outside the boundary, as shown in Fig. 8.5.
Fig. 8.5 Area inside the jagged line
Let us denote by A the original area of the triangle. With the first step, this area is increased by three equilateral triangles, each of which has the sides equal to one third of the original one. Correspondingly, the area of each of the additional triangles equals $A/3^2$ (the area of a triangle is proportional to the square of its sides). Hence, after the first step, the total area inside the resulting polygon equals A + 3A/9. At the second step, for each of the three original sides we add four equilateral triangles, each of which has sides equal to $1/3^2$ of the original one. Correspondingly, the area of each of the 3 × 4 = 12 additional triangles equals $A/9^2$. Therefore, after the second step, the total area inside the resulting polygon equals $A + 3A/9 + 3A \cdot 4/9^2 = A\,[1 + (1 + 4/9)/3]$. At the third step, for each of the three original sides, we add $4^2 = 16$ equilateral triangles, each of which has sides equal to $1/3^3$ of the original one. Correspondingly, the area of each of the $3 \times 4^2$ additional triangles equals $A/9^3$. Hence, after the third step, the total area of the resulting polygon equals $A + 3A/9 + 3A \cdot 4/9^2 + 3A \cdot 4^2/9^3 = A\,[1 + (1 + 4/9 + 4^2/9^2)/3]$. If we continue this way, we have that after the n-th step the total area of the resulting (atypical) polygon equals
$$ A_n = A \left\{ 1 + \frac{1}{3} \left[ 1 + \frac{4}{9} + \left( \frac{4}{9} \right)^{2} + \cdots + \left( \frac{4}{9} \right)^{n-1} \right] \right\}, \tag{8.43} $$
as you may verify. Hence (use Eq. 8.40)
$$ A_n = A \left[ 1 + \frac{1}{3}\, \frac{1 - (4/9)^n}{1 - 4/9} \right]. \tag{8.44} $$
In the limit, as n tends to infinity, $(4/9)^n$ tends to 0 (Eq. 8.22), and we have
$$ \lim_{n\to\infty} A_n = A \left[ 1 + \frac{1}{3}\, \frac{1}{1 - 4/9} \right] = A \left[ 1 + \frac{3}{9 - 4} \right] = \frac{8}{5}\, A. \tag{8.45} $$
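The area computation can be checked in the same spirit. In the following added sketch (the function names are assumptions, and areas are measured in units of the original area A), the triangles added at each step are summed directly and compared with the closed form of Eq. 8.44 and with the limit 8/5 of Eq. 8.45.

```python
from fractions import Fraction

def snowflake_area(n):
    """Area after n steps, in units of A: step k adds 3 * 4**(k-1) triangles of area 1/9**k."""
    return 1 + sum(Fraction(3 * 4 ** (k - 1), 9 ** k) for k in range(1, n + 1))

def closed_form(n):
    """Eq. 8.44 divided by A."""
    q = Fraction(4, 9)
    return 1 + Fraction(1, 3) * (1 - q ** n) / (1 - q)

for n in (1, 2, 3, 10, 50):
    area = snowflake_area(n)
    print(n, float(area), area == closed_form(n), float(Fraction(8, 5) - area))
```

The last printed column is the gap between 8/5 and the area after n steps; it stays positive and shrinks by the factor 4/9 at each step.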
What do you know about that! We have a geometrical figure with a finite area and an infinitely long boundary!
Remark 64. Post scriptum: Some time later, my wife was asking me about fractals. She wanted to buy me a book on fractals for my birthday. I went on the web, looked for “fractal” and showed her beautiful pictures of fractals. I also found out that the issue of the coastline having an infinite length had been addressed by Mandelbrot (Ref. [44]), in 1967.5 Moreover, I discovered that the limit of the line shown in Fig. 8.4 is called the Koch curve, and the limit of the line shown in Fig. 8.5 is called the Koch snowflake.6 [The Koch snowflake has the property of being a continuous line that does not have a tangent at any of its points (Definition 91, p. 180), as you may verify.]
5 Benoît B. Mandelbrot (1924–2010) was a Polish–born French–American mathematician, who introduced the term fractal. The Mandelbrot set is named after him.
6 Named after the Swedish mathematician Niels Fabian Helge von Koch (1870–1924).
8.2.5 Principle of mathematical induction
♣
Here, we want to improve the rigor of the proof of Eq. 8.21. To this end, let us introduce an important principle of mathematics, the principle of mathematical induction. Indeed, the proof given above, albeit gut–level, is cumbersome, repetitive, definitely not elegant, and not satisfactory because of the extrapolation of the result used in the last step, where we suggested that one should repeat the process n times. Such an extrapolation, however, may be made rigorous by introducing the following
Principle 1 (Principle of mathematical induction) For a given property, say $A_n$, to be valid for any integer n not smaller than an initial value, it is sufficient that: 1. $A_n$ is valid for the initial value, say n = 1, and 2. if property $A_n$ is valid for n = k, it is valid for n = k + 1 as well. [The fifth Peano postulate (Footnote 12, p. 43) is a particular case of the above.]
Being a principle, no proof is required. Its meaning will be apparent by applying it to the proof of Eq. 8.21. Here, the initial value is n = 2, for which we have $(1 + r)^2 > 1 + 2r$, and hence Item 1 is satisfied. Next, consider Item 2. For k ≥ 2, we have
$$ (1 + r)^{k+1} = (1 + r)\,(1 + r)^k > (1 + r)\,(1 + k r) > 1 + (k + 1)\, r. \tag{8.46} $$
[Hint: Use $(1 + r)^k > 1 + kr$ (premise in Item 2 of the above principle) for the first inequality, and $(1 + r)(1 + kr) = 1 + (k + 1)\,r + k r^2$, with $r^2 > 0$, for the second.] Thus, $(1 + r)^n > 1 + nr$ for all n ≥ 2.
◦ Warning. The principle of mathematical induction, as stated in Principle 1 above, may be used not only for the proof of a given statement, but also to give a recursive definition. Specifically, to define the entity $B_n$ for n = 1, 2, 3, . . . , it is sufficient to: (i) provide the definition for an initial value, say n = 1, and (ii) define $B_{n+1}$ in terms of $B_n$.
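A recursive definition of this kind translates almost literally into code. As a hypothetical illustration (the choice of example and the name `power` are assumptions of this sketch, not taken from the text), the powers $a^n$ discussed earlier may be defined by $B_1 = a$ and $B_{n+1} = a\,B_n$:

```python
from fractions import Fraction

def power(a, n):
    """a**n for n = 1, 2, 3, ..., defined recursively."""
    if n == 1:
        return a                  # (i) initial value: B_1 = a
    return a * power(a, n - 1)    # (ii) B_n defined in terms of B_{n-1}

print(power(2, 10))
print(power(Fraction(4, 3), 3))
```

The two branches of the function mirror Items (i) and (ii) of the recursive definition: the first stops the recursion, and the second reduces the case n to the case n − 1.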
8.3 Limits of functions
In the preceding sections, we have examined the limit of a sequence $\{s_n\}$, as n goes to infinity. In this section, we extend the concept by introducing the notion of the limit of a function f(x), as x approaches a given value, say c. We have the following
Definition 150 (Limit of a function). We say that $f_c$ is the limit of the function f(x) for x that tends to c iff, given any ε > 0 arbitrarily small, there exists a number δ > 0 such that, whenever x satisfies the inequality |x − c| < δ, we have $|f(x) - f_c| < \varepsilon$. In mathematical symbols, this whole sentence is succinctly expressed by
$$ \lim_{x\to c} f(x) = f_c. \tag{8.47} $$
Note that c and/or $f_c$ may be replaced with ∞. Specifically, we use the convenient expression
$$ \lim_{x\to\infty} f(x) = f_{\infty} \tag{8.48} $$
to mean that, given an ε > 0 arbitrarily small, there exists a positive number M such that, whenever x > M, we have $|f(x) - f_{\infty}| < \varepsilon$. Similarly, we use
$$ \lim_{x\to c} f(x) = \infty \tag{8.49} $$
to mean that, given an M > 0 arbitrarily large, there exists a number δ > 0 such that, whenever |x − c| < δ, we have f (x) > M . Similar considerations apply if ∞ is replaced with −∞ in Eqs. 8.48 and 8.49.
It is also convenient to introduce unilateral limits, as in the following7
Definition 151 (Limit from the left and the right). We say that $f_-$ is the limit from the left of a function f(x) as x tends to $c^-$ iff, for any ε > 0 arbitrarily small, there exists a number δ > 0 such that, for any x ∈ (c − δ, c), we have $|f(x) - f_-| < \varepsilon$. We denote this by
$$ \lim_{x\to c^-} f(x) = f_-. \tag{8.50} $$
Similarly, we say that $f_+$ is the limit from the right of a function f(x) as x tends to $c^+$ iff, for any ε > 0 arbitrarily small, there exists a number δ > 0 such that, for any x ∈ (c, c + δ), we have $|f(x) - f_+| < \varepsilon$. We denote this by
$$ \lim_{x\to c^+} f(x) = f_+. \tag{8.51} $$
[This definition (often implied and not stated explicitly) is particularly important when c is one of the endpoints of the interval of definition of f(x).]
◦ Comment. The limit of a function defined in Eq. 8.47 makes sense only if $f_- = f_+$ (Eqs. 8.50 and 8.51). [More on this in Section 8.4, on continuous and discontinuous functions.]
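Definition 150 may be made concrete with a small numerical experiment. The sketch below is an added illustration under stated assumptions: the function f(x) = x², the point c = 2 and the recipe δ = min(1, ε/5) are hypothetical choices, not taken from the text. The recipe works because |x² − 4| = |x − 2| |x + 2| < 5δ whenever |x − 2| < δ ≤ 1.

```python
def delta_for(eps):
    """A delta that works for f(x) = x**2 at c = 2 (see the bound in the lead-in)."""
    return min(1.0, eps / 5.0)

def check(eps, samples=10001):
    """Spot-check |f(x) - 4| < eps on points strictly inside (2 - delta, 2 + delta)."""
    c, fc = 2.0, 4.0
    delta = delta_for(eps)
    xs = [c - delta + 2.0 * delta * (i + 0.5) / samples for i in range(samples)]
    return all(abs(x * x - fc) < eps for x in xs)

for eps in (10.0, 0.1, 1e-6):
    print(f"eps = {eps:g}: delta = {delta_for(eps):g}, check passed: {check(eps)}")
```

As with sequences, sampling finitely many points only illustrates the definition; the inequality in the lead-in is what guarantees the result for every x in the interval.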
8.3.1 Rules for limits of functions
We have the following
Theorem 77 (Limit of a linear combination). Consider a linear combination of the functions f(x) and g(x). Assume that the limits of f(x) and g(x) for x that tends to c exist, with $\lim_{x\to c} f(x) = f_c$ and $\lim_{x\to c} g(x) = g_c$. Then, we have
$$ \lim_{x\to c} \left[ \alpha f(x) + \beta g(x) \right] = \alpha f_c + \beta g_c. \tag{8.52} $$
◦ Proof: Assume first that α ≠ 0 and β ≠ 0. The fact that the limits of f(x) and g(x) for x that tends to c exist, implies that, given an ε > 0 arbitrarily small: (i) there exists a $\delta_1 > 0$ such that, for $|x - c| < \delta_1$, we have $|f(x) - f_c| < \varepsilon_1 := \tfrac{1}{2}\,\varepsilon/|\alpha|$, and (ii) there exists a $\delta_2 > 0$ such that, for $|x - c| < \delta_2$, we have $|g(x) - g_c| < \varepsilon_2 := \tfrac{1}{2}\,\varepsilon/|\beta|$. Let δ > 0 denote the smaller of the two ($\delta_1$ and $\delta_2$). Then, for any x with |x − c| < δ, we have
$$ \left| \left[ \alpha f(x) + \beta g(x) \right] - \left[ \alpha f_c + \beta g_c \right] \right| = \left| \alpha \left[ f(x) - f_c \right] + \beta \left[ g(x) - g_c \right] \right| \le |\alpha|\, |f(x) - f_c| + |\beta|\, |g(x) - g_c| < |\alpha|\,\varepsilon_1 + |\beta|\,\varepsilon_2 = \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon, \tag{8.53} $$
which is equivalent to Eq. 8.52. [Hint: Use |a ± b| ≤ |a| + |b| (Eq. 1.70) and |a b| = |a| |b| (Eq. 1.60).] [If α = 0 or β = 0, the corresponding term drops out, and the same argument applies to the remaining one.]
7 By a universally accepted convention, $c^-$ ($c^+$) denotes the point c as it is approached from the left (right). In gut–level terms, it’s like c consists of three overlapping points, with c in the center, $c^-$ on the left, and $c^+$ on the right.
We have the following
Theorem 78 (Limit of a product of functions). Consider the functions f(x) and g(x). Assume that the limits of f(x) and g(x) for x that tends to c exist, with $\lim_{x\to c} f(x) =: f_c$ and $\lim_{x\to c} g(x) =: g_c$. Then, we have
$$ \lim_{x\to c} f(x)\, g(x) = f_c\, g_c. \tag{8.54} $$
◦ Proof: The proof is obtained by “splitting” the variation $f(x)\,g(x) - f_c\,g_c$ into the sum of two variations, namely
$$ f(x)\, g(x) - f_c\, g_c = \left[ f(x)\, g(x) - f_c\, g(x) \right] + \left[ f_c\, g(x) - f_c\, g_c \right]. \tag{8.55} $$
The fact that the limits of f(x) and g(x) for x that tends to c exist (and equal $f_c$ and $g_c$, respectively), implies that, given an ε > 0 arbitrarily small: (i) there exists a $\delta_1 > 0$ such that, for $|x - c| < \delta_1$, we have $|f(x) - f_c| < \varepsilon_1 := \tfrac{1}{2}\,\varepsilon/|g|_{\rm Sup}$, where $|g|_{\rm Sup}$ is the supremum of |g(x)| in the interval $(c - \delta_1, c + \delta_1)$, and (ii) there exists a $\delta_2 > 0$ such that, for $|x - c| < \delta_2$, we have $|g(x) - g_c| < \varepsilon_2 := \tfrac{1}{2}\,\varepsilon/|f_c|$. Next, let δ > 0 denote the smaller of the two ($\delta_1$ and $\delta_2$). Then, for any x with |x − c| < δ, we have
$$ \left| f(x)\, g(x) - f_c\, g_c \right| \le \left| f(x)\, g(x) - f_c\, g(x) \right| + \left| f_c\, g(x) - f_c\, g_c \right| \le \left| f(x) - f_c \right| |g(x)| + \left| g(x) - g_c \right| |f_c| < \varepsilon_1\, |g|_{\rm Sup} + \varepsilon_2\, |f_c| = \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon, \tag{8.56} $$
in agreement with Eq. 8.54. [Hint: Use |a b| = |a| |b| (Eq. 1.60) and |a ± b| ≤ |a| + |b| (Eq. 1.70).]
Remark 65 (Splitting trick). Here, we have learned another important tool, one that will sharpen your ability in using mathematical tricks to prove theorems. Indeed, the approach used in Eq. 8.55 may be generalized as follows: when you encounter a difference that is due to more than one variation of the relevant variables, break it down into the sum of increments, each of which is due to only one variation. I will refer to this approach as the splitting trick.
We also have
Theorem 79 (Limit of the reciprocal of a function). Consider the function g(x) = 1/h(x), with h(x) ≠ 0 in a neighborhood N of c. Assume that the limit of h(x), for x that tends to c, exists, with $\lim_{x\to c} h(x) =: h_c \ne 0$. Then, we have
$$ \lim_{x\to c} g(x) = \lim_{x\to c} \frac{1}{h(x)} = \frac{1}{h_c}. \tag{8.57} $$
◦ Proof: By hypothesis, we know that, given an $\varepsilon_1 > 0$ arbitrarily small, there exists a δ > 0 such that $|h(x) - h_c| < \varepsilon_1$ for any x ∈ N with |x − c| < δ. Set $\varepsilon = \varepsilon_1/(|h|_{\rm Inf}\, |h_c|)$ (also arbitrarily small), where $|h|_{\rm Inf} > 0$ denotes the infimum of |h(x)| in N. Then, we have, for any such x (use |a b| = |a| |b|, Eq. 1.60),
$$ \left| \frac{1}{h(x)} - \frac{1}{h_c} \right| = \frac{|h_c - h(x)|}{|h(x)|\, |h_c|} < \frac{\varepsilon_1}{|h|_{\rm Inf}\, |h_c|} = \varepsilon, \tag{8.58} $$
which is fully equivalent to Eq. 8.57.
◦ Comment. For the last three theorems, the limit could even be unilateral, either from the right or from the left, as you may verify (use Definition 151, p. 321).
For future reference, setting $g(x) = 1/h(x)$, if the limits of $f(x)$ and $h(x)$ are $f_c$ and $h_c$, Eqs. 8.54 and 8.57 yield (provided of course that $h_c \ne 0$)
\[ \lim_{x\to c} \frac{f(x)}{h(x)} = \frac{f_c}{h_c}. \tag{8.59} \]
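The rules for the limit of a product and of a quotient may be checked numerically. The following Python sketch is only an illustration, not a proof. The functions f(x) = 2 + x and g(x) = 3 + x², the point c = 0, and the helper name `split_bound` are choices made here for the example, not taken from the text. The sketch evaluates both sides of the splitting inequality (first and second lines of Eq. 8.56) and the error of the quotient in Eq. 8.59.

```python
# Illustrative choices (assumptions, not from the text): f, g and c = 0.
def f(x):
    return 2.0 + x          # f_c = 2 as x -> 0


def g(x):
    return 3.0 + x * x      # g_c = 3 as x -> 0


F_C, G_C = 2.0, 3.0


def split_bound(x):
    """Return (lhs, rhs): |f g - f_c g_c| and the bound given by the splitting trick."""
    lhs = abs(f(x) * g(x) - F_C * G_C)
    rhs = abs(f(x) - F_C) * abs(g(x)) + abs(g(x) - G_C) * abs(F_C)
    return lhs, rhs


# Sample points approaching c = 0 from both sides.
xs = [s * 10.0 ** (-k) for k in range(1, 6) for s in (1.0, -1.0)]
bounds = [split_bound(x) for x in xs]

# Error of the quotient f/g with respect to f_c/g_c, for x -> 0 from the right.
quotient_errors = [abs(f(10.0 ** (-k)) / g(10.0 ** (-k)) - F_C / G_C)
                   for k in range(1, 6)]

for x, (lhs, rhs) in zip(xs, bounds):
    print(f"x = {x:+.0e}   |f g - fc gc| = {lhs:.3e}   bound = {rhs:.3e}")
```

In this example both sides shrink as x approaches 0, and the quotient error shrinks with them, which is the behavior that Eqs. 8.54 and 8.59 lead us to expect.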
8.3.2 Important limits of functions
It should be emphasized that what makes limits really important is the fact that they are paramount in introducing continuous functions (Section 8.4), derivatives (Chapter 9) and integrals (Chapter 10). Nonetheless, the limits of some specific functions are worth addressing. These are considered here.
• Trigonometric functions
Here, we present some useful approximations for the trigonometric functions, cos x, sin x and tan x, when the angle x is very small. Note that Eq. 6.89, namely sin x < x < tan x for x ∈ (0, π/2), implies
\[ \frac{\sin x}{x} < 1 \quad\text{and}\quad \frac{\tan x}{x} > 1, \qquad \text{for } x \in (0, \pi/2). \tag{8.60} \]
On the other hand, we have tan x = sin x/cos x. Moreover, it is apparent, from the geometric definition of cos x (Eq. 6.63), that, as x approaches zero, cos x approaches one (use Eq. 6.77), namely
\[ \lim_{x\to 0} \cos x = 1. \tag{8.61} \]
Thus, the left sides of the two inequalities in Eq. 8.60 tend to each other. This is possible if, and only if,
\[ \lim_{x\to 0} \frac{\sin x}{x} = 1 \quad\text{and}\quad \lim_{x\to 0} \frac{\tan x}{x} = 1. \tag{8.62} \]
[This may also be seen from the graphs of these functions (see Fig. 6.12, p. 230; as stated there, the function y = x is presented as a thin slanted straight line through the origin).]
Equations 8.61 and 8.62 may be interpreted as follows: if x is very close to zero, we may introduce the approximations
\[ \sin x = x + \text{h.o.t.} \simeq x, \tag{8.63} \]
\[ \cos x = 1 + \text{h.o.t.} \simeq 1, \tag{8.64} \]
\[ \tan x = x + \text{h.o.t.} \simeq x, \tag{8.65} \]
where h.o.t. stands for “higher–order terms,” namely terms that are smaller than those that are included, as x tends to zero.
It may be of interest to pursue the issue a bit further, and obtain a better approximation for cos x. This is obtained by noting that $\sin^2 x = 1 - \cos^2 x = (1 + \cos x)(1 - \cos x)$ (use Eqs. 6.75 and 6.37, respectively). Thus, we have
\[ \frac{\sin^2 x}{1 + \cos x} = 1 - \cos x. \tag{8.66} \]
Next, using Eqs. 8.63 and 8.64, we have
\[ \lim_{x\to 0} \frac{\sin^2 x}{x^2} = 1 \quad\text{and}\quad \lim_{x\to 0}\,(1 + \cos x) = 2. \tag{8.67} \]
Therefore, combining with Eq. 8.66, one obtains (use Eqs. 8.54 and 8.57)
\[ \lim_{x\to 0} \frac{1 - \cos x}{x^2} = \lim_{x\to 0} \frac{1}{1 + \cos x}\,\frac{\sin^2 x}{x^2} = \frac{1}{2}, \tag{8.68} \]
namely
\[ \cos x = 1 - \tfrac12\,x^2 + \text{h.o.t.} \simeq 1 - \tfrac12\,x^2, \tag{8.69} \]
a considerable improvement over Eq. 8.64.
Remark 66. Note that, for $x$ considerably smaller than 1, the term $\tfrac12 x^2$ is considerably smaller than $x$, and hence much much smaller than 1. Therefore, sometimes it may be disregarded. To give an example, assume that $x = 10^{-6} = 0.000\,001$; then, $x^2 = 10^{-12} = 0.000\,000\,000\,001$. Thus, we could approximate $\cos x \simeq 1 - \tfrac12 x^2 = 0.999\,999\,999\,999\,5$ with $\cos x \simeq 1$, thereby recovering Eq. 8.64. [If you know a bit about computers, you would know that they work with a finite number of digits (round–off error), and hence there exists a value of $x$ (which depends upon the computer accuracy, and whether you work in single or double precision), below which your computer will not be able to pick up the difference between 1 and $1 - \tfrac12 x^2$.]
Remark 67. Let us address the geometrical interpretation of Eq. 8.69. Consider Fig. 8.6. We have that cos x equals the length OA. Next, we approximate the angle x with the length AP, the error being of higher order (use Eq. 8.63). Then, note that, according to Eq. 8.69, for x very small, PQ varies like $x^2$, namely the curve between B and P is approximated by a horizontal parabola.
Fig. 8.6 $\cos x \simeq 1 - \tfrac12 x^2$
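The limit in Eq. 8.68 and the round–off observation in Remark 66 can be tried out on a computer. The following Python sketch assumes standard double precision (IEEE 754), which is what Python floats use on common platforms. The helper name `cos_second_order` and the variable `threshold` are introduced here for illustration only.

```python
import math
import sys


def cos_second_order(x):
    """Approximation of cos x from Eq. 8.69, with the higher-order terms dropped."""
    return 1.0 - 0.5 * x * x


# Ratio (1 - cos x)/x^2, which should approach 1/2 as x -> 0 (Eq. 8.68).
sample_points = (0.1, 0.01, 0.001)
ratios = [(1.0 - math.cos(x)) / (x * x) for x in sample_points]
for x, r in zip(sample_points, ratios):
    print(f"x = {x:<6}  (1 - cos x)/x^2 = {r:.8f}")

# Round-off: for x small enough, 1 - x^2/2 is stored as exactly 1.
# Assumption: double precision, where the correction is lost roughly
# below x = sqrt(epsilon/2), i.e. about 1e-8.
threshold = math.sqrt(sys.float_info.epsilon / 2.0)
print("approximate threshold:", threshold)
print("x = 1e-6:", cos_second_order(1e-6) == 1.0)   # correction still visible
print("x = 1e-9:", cos_second_order(1e-9) == 1.0)   # correction lost
```

In single precision the same effect would appear at a much larger value of x, since fewer digits are carried.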
• The function $x^\alpha$
We have introduced the function $x^\alpha$ in Subsection 6.5.5. We have the following (Déjà vu? See Eq. 8.20)
Theorem 80. Consider the function $x^\alpha$, with $x > 0$. We have
\[ \lim_{x\to 0} x^\alpha = 0 \;\; (\alpha > 0); \qquad = 1 \;\; (\alpha = 0); \qquad = \infty \;\; (\alpha < 0). \tag{8.70} \]
◦ Proof : The second of these equations is trivial. To prove the first one, we use the definition: given an $\varepsilon > 0$ arbitrarily small, consider $\delta = \varepsilon^{1/\alpha} > 0$. Then, for any $x < \delta$, we have $|x^\alpha - 0| = x^\alpha < \delta^\alpha = \varepsilon$ (use Eq. 1.68). [The third one is a direct consequence of the first one. For, if $\alpha < 0$, we have $x^\alpha = 1/x^\beta$, with $\beta = -\alpha > 0$.]
◦ Comment. In Remark 9, p. 37, I stated that I chose to examine the value of $0^0$ case by case. From the second in Eq. 8.70, we have $\lim_{x\to 0} x^0 = 1$. This is a reason some authors define $0^0 = 1$.
As an immediate consequence, we also have
\[ \lim_{x\to\infty} x^\alpha = \infty \;\; (\alpha > 0); \qquad = 1 \;\; (\alpha = 0); \qquad = 0 \;\; (\alpha < 0). \tag{8.71} \]
[Hint: Use Eq. 8.70 with x replaced by 1/x.] Note the analogy with Eq. 8.20.
8.3.3 The O and o (“big O” and “little o”) notation
♣
For future reference, it is useful to introduce the symbols O[x] and o[x], often referred to as the “big O” and “little o” notation, which are defined as follows:
Definition 152 (Notation O and o). Iff
\[ \lim_{x\to c} \frac{f(x)}{g(x)} = A, \tag{8.72} \]
where A is any constant in (−∞, ∞), we say that, as x tends to c, f(x) is of order g(x), and write
\[ f(x) = O[g(x)], \qquad \text{as } x \to c. \tag{8.73} \]
On the other hand, iff
\[ \lim_{x\to c} \frac{f(x)}{g(x)} = 0, \tag{8.74} \]
we say that, as x tends to c, f(x) is of order smaller than g(x) and write
\[ f(x) = o[g(x)], \qquad \text{as } x \to c. \tag{8.75} \]
Note that
\[ f(x) = g(x) + O(\dots) \quad\text{is equivalent to}\quad g(x) = f(x) + O(\dots). \tag{8.76} \]
In other words, O(...) has no sign. The same holds true for o(...).
◦ Comment. We do not exclude the possibility that the constant A in Eq. 8.72 is equal to zero. Because of this, as x tends to c, we have that f(x) = O[g(x)] does not imply g(x) = O[f(x)], since the latter means
\[ \lim_{x\to c} \frac{g(x)}{f(x)} = 1/A, \tag{8.77} \]
which is not finite if A = 0.
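Often, g(x) coincides with $(x - c)^\alpha$, so as to have
\[ f(x) = O\big[(x - c)^\alpha\big], \text{ as } x \to c, \quad\text{iff}\quad \lim_{x\to c} \frac{f(x)}{(x - c)^\alpha} = A, \tag{8.78} \]
for some A ∈ (−∞, ∞), and
\[ f(x) = o\big[(x - c)^\alpha\big], \text{ as } x \to c, \quad\text{iff}\quad \lim_{x\to c} \frac{f(x)}{(x - c)^\alpha} = 0. \tag{8.79} \]
In particular, for α = 0, we have
\[ f(x) = O\big[(x - c)^0\big] = O[1], \text{ as } x \to c, \quad\text{iff}\quad \lim_{x\to c} f(x) = A, \tag{8.80} \]
for some A ∈ (−∞, ∞), and
\[ f(x) = o\big[(x - c)^0\big] = o[1], \text{ as } x \to c, \quad\text{iff}\quad \lim_{x\to c} f(x) = 0. \tag{8.81} \]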
◦ Comment. Note that, for ε > 0 arbitrarily small, we have
\[ f(x) = O[x^\alpha] \quad\text{is equivalent to}\quad f(x) = o[x^{\alpha - \varepsilon}], \tag{8.82} \]
as x tends to 0, as you may verify.
◦ Warning. If we want to be rigorous, the use of the equal sign in the above equations is inconsistent with Remark 3, p. 17, on the properties of the equal sign. Indeed, f(x) = O[g(x)] does not imply O[g(x)] = f(x) (a meaningless expression), in contrast with the so–called transitive property of equalities (Remark 3, p. 17). Like most of the mathematical world, I will
use Eq. 8.73 with the understanding that it is simply a convenient symbol to mean exclusively what is stated in Definition 152 above.
• Illustrative examples
Using this notation, Eq. 8.21 may be replaced with
\[ (1 + x)^n = 1 + n\,x + O[x^2], \tag{8.83} \]
since all the terms neglected in Eq. 8.46 are of order $x^2$. For future reference, setting x = ε/a, the above equation yields
\[ (a + \varepsilon)^n = a^n\,(1 + \varepsilon/a)^n = a^n \big(1 + n\,\varepsilon/a + O[\varepsilon^2/a^2]\big) = a^n + n\,a^{n-1}\,\varepsilon + a^{n-2}\,O[\varepsilon^2]. \tag{8.84} \]
Also, Eqs. 8.63, 8.69 and 8.65 may be written, respectively, as
\[ \sin x = x + o[x], \qquad \text{as } x \to 0, \tag{8.85} \]
\[ \cos x = 1 - \tfrac12\,x^2 + o[x^2], \qquad \text{as } x \to 0, \tag{8.86} \]
\[ \tan x = x + o[x], \qquad \text{as } x \to 0. \tag{8.87} \]
Finally, note that from the definition we have
\[ x\,O[1] = O[x], \qquad \text{as } x \to 0, \tag{8.88} \]
\[ x\,o[1] = o[x], \qquad \text{as } x \to 0, \tag{8.89} \]
as you may verify. In general, we have
\[ x\,O[x^\alpha] = O[x^{\alpha+1}], \qquad \text{as } x \to 0, \tag{8.90} \]
\[ x\,o[x^\alpha] = o[x^{\alpha+1}], \qquad \text{as } x \to 0. \tag{8.91} \]
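The O and o statements above are statements about ratios, so they lend themselves to a quick numerical check. The Python sketch below is only an illustration under assumptions made here: the exponent n = 5 and the sample points are arbitrary, and the helper name `remainder_ratios` is not from the text. It evaluates the neglected remainders of Eqs. 8.83 and 8.85 to 8.87, divided by the function that appears inside O[...] or o[...].

```python
import math


def remainder_ratios(x, n=5):
    """Ratios that should tend to a constant (big O) or to zero (little o) as x -> 0."""
    return {
        # (1 + x)^n - 1 - n x = O[x^2]: the ratio tends to a finite constant
        "binomial / x^2": ((1.0 + x) ** n - 1.0 - n * x) / (x * x),
        # sin x - x = o[x]: the ratio tends to 0
        "sin / x": (math.sin(x) - x) / x,
        # cos x - 1 + x^2/2 = o[x^2]: the ratio tends to 0
        "cos / x^2": (math.cos(x) - 1.0 + 0.5 * x * x) / (x * x),
        # tan x - x = o[x]: the ratio tends to 0
        "tan / x": (math.tan(x) - x) / x,
    }


for x in (1e-1, 1e-2, 1e-3):
    print(f"x = {x:g}")
    for name, value in remainder_ratios(x).items():
        print(f"    {name:<16} {value:+.6e}")

at_small_x = remainder_ratios(1e-3)
```

For n = 5 the first ratio settles near 10, namely n(n − 1)/2, while the other three shrink with x. Much smaller values of x are best avoided here, because round–off error eventually dominates the subtraction in the numerators.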
8.4 Continuous and discontinuous functions
In this section, we define continuous real functions of a real variable. [Their properties are addressed in Section 8.5.]
From an intuitive point of view, and in everyday jargon, a function is called continuous iff its graph has no jumps, otherwise it’s called discontinuous. However, there exist different types of discontinuities. As mathematicians, we have to be precise and rigorous. We accomplish this by relating continuity to the notion of limit introduced above. Specifically, let $f_c$ denote the limit that the function approaches as x tends to c, as in Eq. 8.47. Note that $f_c$ is not necessarily equal to the value f(c) that the function attains for x = c. In other words, f(x) can get closer and closer to a value $f_c$ as x approaches c, without necessarily having such a value when x = c. For instance, consider the function
\[ f_0(x) = \begin{cases} \dfrac{\sin x}{x}, & \text{for } |x| \in (0, \pi]; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.92} \]
We have $\lim_{x\to 0} f_0(x) = 1$ (Eq. 8.62), whereas $f_0(0) = 0$, by definition of $f_0(x)$.
Accordingly, let us introduce a formal definition of continuity. We have the following
Definition 153 (Continuity of f(x) at a point). A function f(x) is said to be continuous at x = c iff its limits, from both the left and the right (Definition 151, p. 321), exist and are both equal to f(c), namely iff (recall Eqs. 8.50 and 8.51)
\[ \lim_{x\to c^-} f(x) = \lim_{x\to c^+} f(x) = f(c). \tag{8.93} \]
A function that is not continuous at x = c is called discontinuous there.
◦ Comment. Note that in the above definition of continuity, it is explicitly assumed that the two limits of f(x) as x tends to c exist. If even just one of the limits does not exist, the function is discontinuous by definition. Also, the above definition implies that f(c) exists, namely that |f(c)| < ∞. Therefore, a function for which we have
\[ \lim_{x\to c^-} f(x) = \infty \quad\text{and/or}\quad \lim_{x\to c^+} f(x) = \infty \tag{8.94} \]
is not continuous at the point c.
For future reference, note that Eq. 8.93 is equivalent to saying that a function f(x) is continuous at a point c ∈ (a, b) iff (see Eq. 8.81)
\[ f(x) - f(c) = o[1], \qquad \text{as } |x - c| \to 0, \tag{8.95} \]
for x on both sides of c.
We also need the definition of continuity over an interval. We have the following (recall Definition 26, p. 43, of open and closed intervals)
Definition 154 (Continuity of f(x) over an interval). Let us start with open intervals. If a function is continuous for all the points of the open interval (a, b), we say that f(x) is continuous in (a, b). If the interval is not open, at the end point(s) we take only the pertinent of the two limits in Eq. 8.93. For instance, we say that f(x) is continuous in [a, b) iff: (i) f(x) is continuous in (a, b), and (ii) (see Eq. 8.51)
\[ \lim_{x\to a^+} f(x) = f(a). \tag{8.96} \]
It is apparent from their definitions that the functions $x^n$, sin x and cos x are continuous in (−∞, ∞), whereas tan x is discontinuous at $x = (\tfrac12 \pm k)\,\pi$ (k = 0, 1, . . . ).
8.4.1 Discontinuous functions
We distinguish four different types of discontinuities, namely: (i) removable discontinuities; (ii) jump discontinuities; (iii) infinite discontinuities; (iv) essential discontinuities. These are addressed in the subsubsections that follow.
• Removable discontinuities
On the basis of Definition 153 above, we see that the function $f_0(x)$ defined in Eq. 8.92 is discontinuous at x = 0. However, let us make a small change, namely introduce a slightly different function, to which we assign the value 1 at x = 0:
\[ f_1(x) = \begin{cases} \dfrac{\sin x}{x}, & \text{for } |x| \in (0, \pi]; \\[1ex] 1, & \text{for } x = 0. \end{cases} \tag{8.97} \]
Contrary to the function $f_0(x)$ in Eq. 8.92, the function $f_1(x)$, which differs from $f_0(x)$ only at x = 0, is continuous everywhere. In other words, in this case, the discontinuity may be eliminated simply by changing the value of the function at the point of discontinuity from 0 to 1.
The same considerations apply to any f(x) for which $f^- = f^+ = f_c$, with $f_c \ne f(c)$, where $f^-$ and $f^+$ denote the limits from the left and from the right. The discontinuity may be removed by redefining $f(c) := f_c$. Accordingly, discontinuities of this type will be referred to as removable discontinuities.
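The difference between the function of Eq. 8.92 and that of Eq. 8.97 can be seen numerically. The Python sketch below is an illustration only; the sample points and the names `gap_f0` and `gap_f1` are chosen here. It compares the value at x = 0 with values at points approaching 0 from the right.

```python
import math


def f0(x):
    """Eq. 8.92: (sin x)/x away from the origin, with the value 0 assigned at x = 0."""
    return math.sin(x) / x if x != 0.0 else 0.0


def f1(x):
    """Eq. 8.97: same as f0, but with the value 1 assigned at x = 0."""
    return math.sin(x) / x if x != 0.0 else 1.0


xs = [10.0 ** (-k) for k in range(1, 8)]
gap_f0 = [abs(f0(x) - f0(0.0)) for x in xs]   # stays close to 1
gap_f1 = [abs(f1(x) - f1(0.0)) for x in xs]   # shrinks toward 0

for x, g0, g1 in zip(xs, gap_f0, gap_f1):
    print(f"x = {x:.0e}   |f0(x) - f0(0)| = {g0:.6f}   |f1(x) - f1(0)| = {g1:.2e}")
```

The gap for the first function does not close, whereas for the second it does, in agreement with Eq. 8.95.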
• Jump discontinuities
The example par excellence of a function with a jump discontinuity is the Heaviside function, or step function, H(x), which is defined by⁸
\[ H(x) = \begin{cases} 0, & \text{for } x < 0; \\ 1/2, & \text{for } x = 0; \\ 1, & \text{for } x > 0. \end{cases} \tag{8.98} \]
This function is discontinuous at x = 0, because the limit from the left is 0, whereas the limit from the right is 1. Its graph, presented in Fig. 8.7, shows a jump equal to one, between $x = 0^-$ and $x = 0^+$.
Fig. 8.7 The Heaviside function
As another example, consider the function sgn(x), namely
\[ \operatorname{sgn}(x) = \begin{cases} 1, & \text{for } x > 0; \\ 0, & \text{for } x = 0; \\ -1, & \text{for } x < 0, \end{cases} \tag{8.99} \]
which gives us the sign of x. The graph of the function, presented in Fig. 8.8, shows a jump equal to two, between $x = 0^-$ and $x = 0^+$. Note that sgn(x) = 2 H(x) − 1.⁹
⁸ Named after Oliver Heaviside (1850–1925), an English electrical engineer, mathematician and physicist. He used complex numbers to study electrical circuits, developed mathematical techniques for the solution of differential equations, and introduced vector calculus the way we know it, independently (and nearly simultaneously) from Willard Gibbs. He also reformulated the Maxwell field equations in terms of electric and magnetic forces and energy flux.
⁹ The symbol sgn(x) (to be read: sign of x, or signum of x) denotes the signum function (after the Latin term for “sign”).
Fig. 8.8 The function sgn(x)
In the above cases, the limits from left and right (Eq. 8.93) exist, but they are different. In mathematical terms, we have
\[ \lim_{x\to c^-} f(x) = f^- \quad\text{and}\quad \lim_{x\to c^+} f(x) = f^+, \qquad \text{with } f^- \ne f^+. \tag{8.100} \]
The graph of the function presents a finite jump. Accordingly, discontinuities of this type will be referred to as jump discontinuities.
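The two examples of jump discontinuities translate directly into code. The following Python sketch implements Eqs. 8.98 and 8.99 as written; the helper `jump`, which estimates the difference between the right and left limits by sampling at a small distance h on either side, is an assumption introduced here for illustration and is not a substitute for the limits themselves.

```python
def heaviside(x):
    """Heaviside step function, Eq. 8.98, with the value 1/2 at x = 0."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 0.5


def sgn(x):
    """Signum function, Eq. 8.99."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def jump(f, c, h=1e-9):
    """Rough estimate of f(c+) - f(c-), sampling at distance h from c (hypothetical helper)."""
    return f(c + h) - f(c - h)


test_points = [-2.0, -0.5, 0.0, 0.5, 2.0]
identity_holds = all(sgn(x) == 2.0 * heaviside(x) - 1.0 for x in test_points)

print("sgn(x) = 2 H(x) - 1 at the test points:", identity_holds)
print("jump of H at 0:  ", jump(heaviside, 0.0))
print("jump of sgn at 0:", jump(sgn, 0.0))
```

Note that the identity sgn(x) = 2 H(x) − 1 holds at x = 0 as well, precisely because H(0) is defined as 1/2.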
• Infinite discontinuities
♣
Consider the function
\[ f(x) = \begin{cases} \dfrac{1}{x}, & \text{for } x \ne 0; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.101} \]
This function is discontinuous because
\[ \lim_{x\to 0^\pm} f(x) = \pm\infty, \tag{8.102} \]
namely the limit does not exist. The graph of the function presents an infinitely large jump. However, this is not relevant for deciding that a function is discontinuous. For, consider the function
\[ f(x) = \begin{cases} \dfrac{1}{x^2}, & \text{for } x \ne 0; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.103} \]
In this case, the function is even, and hence we have “no jump.” For,
\[ \lim_{x\to 0} \left[ \frac{1}{x^2} - \frac{1}{(-x)^2} \right] = 0. \tag{8.104} \]
Nonetheless, this function is discontinuous. In fact, this is the case addressed in Eq. 8.94. Indeed, the above equation, while correct, is misleading, because we have
\[ \lim_{x\to 0^\pm} \frac{1}{x^2} = \infty. \tag{8.105} \]
Thus, the function is discontinuous, again because the limit of $f(x) = 1/x^2$, as x tends to $0^\pm$, does not exist.
Discontinuities of the type discussed above will be referred to as infinite discontinuities.
• Essential discontinuities
♣
We are not done yet! The notion of discontinuous functions (Definition 153, p. 329) is broader than one might infer from the preceding examples. We have one last type of discontinuity, which is more subtle than the preceding ones. Let us consider the function
\[ f_A(x) = \begin{cases} \sin\dfrac{1}{x}, & \text{for } x \ne 0; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.106} \]
For this function, the limit for x that tends to 0 does not exist; the function oscillates infinitely many times between −1 and 1, as x tends to 0. Thus, on the basis of Definition 153, p. 329, this function is to be considered discontinuous. Discontinuities of this type, namely those in which the limit of the function does not exist, not even in the sense that f(x) tends to ±∞, as x tends to $x_0$, will be referred to as essential discontinuities.
◦ Comment. To further clarify the situation, consider the function
\[ f_B(x) = \begin{cases} x\,\sin\dfrac{1}{x}, & \text{for } x \ne 0; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.107} \]
For any ε > 0, we can find a δ > 0 such that $|f_B(x) - 0| < \varepsilon$ for any |x − 0| < δ (use any δ ≤ ε). Thus, the limit exists and equals 0. The function is continuous!
◦ Comment. For future reference, consider also the function
\[ f_C(x) = \begin{cases} x^2\,\sin\dfrac{1}{x}, & \text{for } x \ne 0; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.108} \]
For any ε > 0 we can find a δ > 0 such that $|f_C(x) - 0| < \varepsilon$ for any |x − 0| < δ (use any $\delta \le \sqrt{\varepsilon}$). This function is also continuous.
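The contrast between the three functions of Eqs. 8.106 to 8.108 can be illustrated numerically. The Python sketch below rests on choices made here, not in the text: it samples sin(1/x) at the points x_k = 1/((k + ½)π), where the sine equals (−1)^k, and it checks the suggested choices δ = ε and δ = √ε on a finite set of sample points, with ε = 10⁻³. A finite sample is of course no proof.

```python
import math


def f_A(x):
    return math.sin(1.0 / x) if x != 0.0 else 0.0          # Eq. 8.106


def f_B(x):
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0      # Eq. 8.107


def f_C(x):
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0  # Eq. 8.108


# f_A keeps alternating between +1 and -1 arbitrarily close to x = 0.
ks = range(1000, 1006)
x_k = [1.0 / ((k + 0.5) * math.pi) for k in ks]
values_A = [f_A(x) for x in x_k]
print("f_A at x_k:", [round(v, 6) for v in values_A])

# f_B and f_C: check |f(x)| < eps for sample points with |x| < delta.
eps = 1e-3
fractions = [s * t / 1000.0 for t in range(1, 1000) for s in (1.0, -1.0)]
ok_B = all(abs(f_B(eps * t)) < eps for t in fractions)             # delta = eps
ok_C = all(abs(f_C(math.sqrt(eps) * t)) < eps for t in fractions)  # delta = sqrt(eps)
print("f_B within eps for |x| < eps:      ", ok_B)
print("f_C within eps for |x| < sqrt(eps):", ok_C)
```

The factor x (or x²) in front of the sine squeezes the oscillations, which is why the last two functions are continuous at the origin whereas the first is not.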
• Discontinuities of the first and second kind
If the limit from the left and that from the right (Eqs. 8.50 and 8.51) exist, we speak of discontinuity of the first kind, otherwise we speak of discontinuity of the second kind. In other words, removable and jump discontinuities are of the first kind, infinite and essential discontinuities are of the second kind.
8.5 Properties of continuous functions
♣
Here, we address a few key properties of continuous functions, such as boundedness and uniform continuity. As a preliminary, we introduce the so–called successive subdivision procedure.
8.5.1 Successive subdivision procedure
In the next subsection, we will show that a function that is continuous over a closed interval is also bounded there (Theorem 82, p. 336). To prove this, we will use a procedure that will be repeatedly encountered to address similar problems. In order to avoid repetitions, such a procedure is described here in its full generality. Specifically, we have the following
Definition 155 (Successive subdivision procedure). Let us assume that in the neighborhood of one or more points of a bounded closed interval, say [a, b], we have a certain property, say Property P, which is such that, if we divide the interval into two, at least one of the two closed subintervals contains one or more points that have Property P. We want to identify at least one point, say c, in an arbitrarily small neighborhood of which such a property holds. To this end, we can use the following procedure, which will be referred to as the successive subdivision procedure. Let us subdivide the interval [a, b] into two equal subintervals $[a, c_0]$ and $[c_0, b]$, with $c_0 = \tfrac12 (a + b)$. At least one of the two subintervals will have Property P. [If both have Property P, just pick one of the two.] Let us denote this subinterval as $J_1 = [a_1, b_1]$, with $b_1 - a_1 = (b - a)/2$. [Specifically, we either have $a_1 = a$ and $b_1 = c_0$, or $a_1 = c_0$ and $b_1 = b$.] Next, let us subdivide the interval $J_1$ into two equal ones. Again, at least one of the two subintervals will have Property P. We denote this as $J_2 = [a_2, b_2]$, with $b_2 - a_2 = (b - a)/4$. Repeating the procedure k times, we generate a sequence of smaller and smaller nested closed intervals $J_k := [a_k, b_k]$, with $b_k - a_k = (b - a)/2^k$.
We still have to prove that such a procedure identifies at least a point. For this, we have the following
Theorem 81 (Successive subdivision procedure). The sequence $c_k = \tfrac12 (a_k + b_k)$ generated by the successive subdivision procedure, presented above, converges to at least one point c ∈ [a, b].
◦ Proof : The successive subdivision procedure generates two monotonic sequences of numbers, $a_k$ and $b_k$, which are, respectively, non–decreasing and bounded above, and non–increasing and bounded below. In the limit, these two sequences converge to $a_*$ and $b_*$, respectively (Theorem 74, p. 309). In addition, we have that necessarily $a_* = b_*$, because $b_k - a_k = (b - a)/2^k$ tends to zero as k tends to infinity. Thus, in the limit, we have identified a point $c := a_* = b_*$ (with c ∈ [a, b] by construction), to which the sequence $c_k = \tfrac12 (a_k + b_k)$ converges.
◦ Comment. The point is unique if, and only if, for each subdivision only one of the two subintervals has Property P.
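The successive subdivision procedure is, in essence, an algorithm, and may be sketched in a few lines of Python. In the sketch below, `has_property(lo, hi)` is a hypothetical user–supplied test that reports whether the closed interval [lo, hi] has Property P; the function name `successive_subdivision`, the number of steps and the example property (the interval contains a number whose square is 2) are all assumptions made here for illustration.

```python
import math


def successive_subdivision(a, b, has_property, steps=50):
    """Halve [a, b] `steps` times, always keeping a closed half that has Property P.

    Returns the midpoint c_k of the last interval J_k, together with a_k and b_k.
    """
    for _ in range(steps):
        mid = 0.5 * (a + b)
        if has_property(a, mid):
            b = mid            # keep the left half [a_k, c_k]
        elif has_property(mid, b):
            a = mid            # keep the right half [c_k, b_k]
        else:
            raise ValueError("neither half has Property P")
    return 0.5 * (a + b), a, b


def contains_sqrt2(lo, hi):
    """Example Property P (for lo, hi >= 0): [lo, hi] contains a number whose square is 2."""
    return lo * lo <= 2.0 <= hi * hi


c, a_k, b_k = successive_subdivision(0.0, 2.0, contains_sqrt2)
print("c_k      =", c)
print("sqrt(2)  =", math.sqrt(2.0))
print("b_k - a_k =", b_k - a_k)   # (b - a)/2^k with k = 50
```

When both halves have Property P, this sketch simply picks the left one, which is one of the admissible choices mentioned in Definition 155.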
8.5.2 Boundedness vs continuity We have the following Definition 156 (Bounded function). A function is said to be bounded at a point x iff there exists a real number M > 0 such that |f (x)| < M . A function is said to be bounded in an interval iff there exists an M > 0 such that |f (x)| < M , for all the points of the interval. A function that is not bounded is called unbounded. I claim that the definition of continuity at a point x = c implies that a function that is continuous at a given point is also bounded there. Indeed,
the continuity implies that, for x = c, there exists a value f(c); hence, there is an M such that |f(c)| < M. [Remember that ∞ is not a real number. Hence, in the definition of continuity (Eq. 8.93), we cannot use f(c) = ∞.]
Next, the question arises whether or not continuity in an interval implies that the function is bounded in that interval. The question is subtle: for instance, f(x) = 1/x is continuous for x ∈ (0, 1], but is not bounded there, since it tends to infinity as x goes to zero. In other words, there is no M such that f(x) < M, for any x ∈ (0, 1). Thus, functions that are continuous in a non–closed interval are not necessarily bounded there. However, things are different for closed intervals. For, we have the following
Theorem 82. If a function f(x) is continuous in a closed interval [a, b], then f(x) is bounded in [a, b].
◦ Proof : ♠ Assume, to obtain a contradiction, that the function f(x) is continuous in a closed interval [a, b], but not bounded. In other words, we assume that there exists at least one point where the function is unbounded. We can use the successive subdivision procedure introduced in Definition 155, p. 334, with Property P being that f(x) is unbounded. Then, the sequence $c_k = \tfrac12 (a_k + b_k)$ converges to a point c ∈ [a, b], where the function is unbounded (Theorem 81, p. 335). However, from the continuity of f(x), we have that $\lim_{x\to c} f(x) = f(c) < \infty$. Thus, we have reached a contradiction. Hence, f(x) is bounded in [a, b].
We can ask ourselves the following question: “What happens when f(x) is continuous in the open interval (a, b)?” We have the following
Theorem 83. If a function f(x) is continuous in an open interval (a, b), with −∞ < a < b < ∞, then f(x) may be unbounded only in arbitrarily small neighborhoods of a and/or b.
◦ Proof : Indeed, f(x) is bounded in the closed interval [a + ε, b − ε] (with an arbitrarily small ε > 0), because of Theorem 82 above.
8.5.3 Supremum/infimum, maximum/minimum Here, we present some material needed in the next subsection, namely the definitions of maximum, supremum and upper bound, and the corresponding ones, minimum, infimum and lower bound. Given a finite number of points Pk (k = 1, 2, . . . , n) and their corresponding real numbers xk (k = 1, 2, . . . , n), the maximum (minimum) denotes the
highest (lowest) of these numbers. Things get more subtle when we have an infinite (countable or uncountable) number of points. For instance, consider the numbers 1, 1/2, 1/3, 1/4, . . . , 1/n, . . . , or the points with x ∈ (0, 1]. In both cases, the maximum is clearly 1. Is there a minimum? Can we say that 0 is the minimum? How do maximum, supremum and upper bound differ from each other? Let us begin with a gut–level introduction. Roughly speaking, if you have a collection of points, the maximum is the one corresponding to the largest number (iff it exists). For instance, for the set of points in [0, 1], the maximum is x = 1. However, for the points in [0, 1), x = 1 is no longer the maximum because it does not belong to the given set. In this case, we use the term supremum. In general, x∗ is called the supremum of a set of points S iff any point with x > x∗ does not belong to S, and, at same time, there exists at least one point x of S in any lower neighborhood of x∗ , no matter how small. [The point x can be x∗ itself, if x∗ belongs to the set.] Finally, a point x∗ is called an upper bound if any point above x∗ does not belong to the set. [To be clear, the supremum is an upper bound, but the converse is not true – any number greater than (or equal to) the supremum is an upper bound.] Analogous definitions hold for minimum, infimum and lower bound. Let us be more specific and introduce the appropriate terminology that allows us to be absolutely clear on this matter (I would even say, unequivocally clear). Let us begin with the following definitions: Definition 157 (Upper and lower bounds). Consider a set of infinitely many (countable or uncountable) points on the real line, not necessarily an interval. Iff there exists a number X such that x ≤ X for all the points x of the set, this is said to be bounded above and X is called an upper bound. Iff such an X does not exist, the set is called unbounded above. 
The definitions of bounded below, lower bound and unbounded below are analogous, with ≥ replacing ≤. A set of points of the real line is called bounded iff it is bounded both above and below. [As stated there, the intervals in Eq. 8.1 are bounded, whereas those in Eq. 8.2 are unbounded.] Definition 158 (Supremum and infimum). Consider a set of points on the real line, bounded above. Consider the smallest of the upper bounds, namely the number x∗ such that any x > x∗ is an upper bound, whereas any x < x∗ is not an upper bound. The point x∗ is called the supremum (or the least upper bound). The definition of the infimum (or greatest lower bound) is analogous. For instance, in all four bounded intervals in Eq. 8.1, a is the infimum, b is the supremum, and any number x ≥ b (x ≤ a) is an upper (lower) bound.
Remark 68 (Supremum always exists). Here is a proof that for sets bounded above, the supremum always exists. Consider: (i) any point of the set, and (ii) any upper bound (along with the corresponding interval). They define an interval. Use the successive subdivision procedure (Definition 155, p. 334). After each subdivision, remove the upper subinterval iff it contains no point of the set, but use it if it contains at least one point. The sequence $c_k = \tfrac12 (a_k + b_k)$ converges (Theorem 81, p. 335), and its limit is the supremum.
Finally, we can address maxima and minima. We have the following
Definition 159 (Maximum and minimum). Consider a set of points on the real line. If the supremum is a point of the set, we call it the maximum. Otherwise, we say that the set has no maximum. The definition of the minimum is analogous.
For instance, for the four intervals presented in Eq. 8.1, b is a maximum (and hence a supremum) only for (a, b] and [a, b]. The other two intervals have no maximum; b is just the supremum. Similarly, a is a minimum (and hence an infimum) only for [a, b) and [a, b]. The other two intervals have no minimum; a is just the infimum. Accordingly, closed intervals have a maximum and a minimum, open intervals have no maximum and no minimum.
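The procedure of Remark 68 may be sketched as follows. The Python code below is an illustration under assumptions made here: the set is the half–open interval S = [0, 1), described by the hypothetical predicate `interval_meets_S(lo, hi)`, which reports whether [lo, hi] contains at least one point of S; the starting point 0 and the upper bound 3 are arbitrary choices.

```python
def in_S(x):
    """Membership test for the example set S = [0, 1)."""
    return 0.0 <= x < 1.0


def interval_meets_S(lo, hi):
    """True iff the closed interval [lo, hi] contains at least one point of S = [0, 1)."""
    return lo < 1.0 and hi >= 0.0


def supremum(point, upper_bound, interval_meets_set, steps=60):
    """Successive subdivision between a point of the set and an upper bound."""
    lo, hi = point, upper_bound
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if interval_meets_set(mid, hi):
            lo = mid      # the upper half contains a point of the set: keep it
        else:
            hi = mid      # otherwise remove it and keep the lower half
    return 0.5 * (lo + hi)


s = supremum(0.0, 3.0, interval_meets_S)
print("estimated supremum:", s)
print("is 1 a point of S? ", in_S(1.0))
```

The procedure homes in on 1, which is the supremum of S but not a point of S; hence, according to Definition 159, this set has no maximum.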
8.5.4 Maxima and minima of continuous functions
In this subsection, we deal with maxima and minima of continuous functions. Specifically, we draw a distinction between global and local maxima/minima of continuous functions.
◦ Warning. Some authors use the term absolute (relative) maximum/minimum, instead of global (local) maximum/minimum.
We have the following definitions:
Definition 160 (Global maximum and minimum). The terms global maximum and global minimum of a function f(x) in a given interval I denote the maximum and minimum of the points of its range (see Fig. 8.9). Specifically, we say that $x_0$ is a point of global maximum iff $f(x_0) \ge f(x)$ for all x ∈ I. [Analogous for minimum.]
Definition 161 (Local maxima and minima). A function f(x) defined on an interval bounded by a and b is said to attain a local minimum at x = c ∈ (a, b) iff there exists an ε > 0, with a < c − ε and c + ε < b, such that
\[ f(x) > f(c), \tag{8.109} \]
for all x ∈ (c − ε, c + ε), with x ≠ c (see again Fig. 8.9). The definition for local maximum is analogous: just replace f(x) > f(c) with f(x) < f(c). [Note that the definition requires c ∈ (a, b). Therefore, the definition excludes the possibility that local minima (or maxima) might occur at the endpoints of the interval, even when these belong to the domain of definition of the function. For instance, at the endpoint on the far left of Fig. 8.9, we have a global minimum, but not a local minimum.]
Fig. 8.9 Local/global maxima/minima
◦ Warning. Some authors use the term strict local minimum if Eq. 8.109 is satisfied. For them, the term local minimum requires only f(x) ≥ f(c).
• Comments
♥
As an illustration, note that the function $f_1(x) = x^2$ has no global minimum and no global maximum in the open interval (0, 2). Indeed, in (0, 2), the infimum of the range of $f_1(x)$ is 0, which is the value the function attains at x = 0. However, the point x = 0 does not belong to the interval under consideration. In other words, for such an interval, the value $f_1 = 0$ does not belong to the range of the function. The same considerations apply to the maximum.
Moreover, the same function $f_1(x) = x^2$, in the closed interval [0, 2], has a global minimum $f_1(x) = 0$ at x = 0, and a global maximum $f_1(x) = 4$ at x = 2. [These are not local maxima and minima, because the above definition of local maximum and minimum excludes the endpoints of the interval of definition.]
On the other hand, for x ∈ [−ε, 2] (ε > 0), the same function $f_1(x) = x^2$ has a global minimum $f_1(x) = 0$ at x = 0, which is also a local minimum, and a global maximum $f_1(x) = 4$ at x = 2, but no local maximum.
In order for you to have a better appreciation of the implications of the definition, let us examine some additional examples. Consider a function that is not continuous in a closed interval. Specifically, consider
\[ f_2(x) = \begin{cases} 1 + \sin\dfrac{1}{x}, & \text{for } x \in (0, 1]; \\[1ex] 1, & \text{for } x = 0. \end{cases} \tag{8.110} \]
This function is not continuous at x = 0, since $\lim_{x\to 0} \sin(1/x)$ does not even exist. It is apparent, however, that this function reaches a local minimum $f_2(x) = 0$, whenever sin(1/x) = −1, namely for $1/x_k = (-\tfrac12 + 2k)\,\pi$ (with k = 1, 2, . . . , so that $x_k$ lies in (0, 1]). Moreover, $f_2 = 0$ is also a (multiple) global minimum.
On the other hand, consider the function
\[ f_3(x) = 1 + x^2 + \sin\frac{1}{x}, \qquad \text{for } x \in (0, 1]. \tag{8.111} \]
If $x \ll 1$, the second term is very small and we can obtain the approximate values of x at which the function reaches local minima. This occurs when $\sin(1/x_k) \simeq -1$, namely at $1/x_k \simeq (-\tfrac12 + 2k)\,\pi$ (k = 1, 2, . . . ). Correspondingly, we have $f_3(x_k) \simeq x_k^2$. [The exact values of the abscissa $x_k$ where the function attains the minimum are not of special interest here.] Thus, as x tends to zero, the values of the local minima decrease and tend to zero. However, the function never attains the value 0. Thus, the function does not have a global minimum in the interval (0, 1].
In addition, consider the function
\[ f_4(x) = \begin{cases} x \left( 1 + x^2 + \sin\dfrac{1}{x} \right), & \text{for } x \in (0, 1]; \\[1ex] 0, & \text{for } x = 0. \end{cases} \tag{8.112} \]
This function has a single global minimum, namely f4 (0) = 0. For, we have f4 (x) > 0 for any x ∈ (0, 1]. [Indeed, the limit of f4 (x) as x tends to zero exists, and equals zero. Thus, f4 (x) is continuous over the closed interval [0, 1], and Theorem 84 below guarantees that a global minimum exists in the interval.]
• Weierstrass theorem for continuous functions Recall the difference between maximum/minimum and supremum/infimum (Subsection 8.5.3). In general, if we are dealing with an open interval, continuous functions do not necessarily have a maximum (minimum), in the sense that the supremum (infimum) do not necessarily belong to the range
8. Limits. Continuity of functions
341
of the function for the interval under consideration. However, we have the following10 Theorem 84 (Weierstrass theorem for continuous functions). Any function f (x) that is continuous in the closed interval [a, b] attains a global maximum (possibly multiple) for some xMax in [a, b] and a global minimum (possibly multiple) for some xMin, also in [a, b]. ◦ Proof : ♠ Let us focus on the maximum (analogous considerations hold for the minimum). The function f (x) is continuous in a closed interval [a, b], and hence is bounded in [a, b] (Theorem 82, p. 336). Accordingly, there exists a supremum of f (x) in [a, b] (Remark 68, p. 338), which is denoted by f+. The only question that we have to address is whether or not there exists at least one point, say x+, where the function attains the value f+. We can use the successive subdivision procedure (Definition 155, p. 334), with Property P being that the supremum equals f+. The sequence ck = 12 (ak + bk ) converges to at least one point c ∈ [a, b] (Theorem 81, p. 335). Finally, observe that the continuity of f (x) yields that at x = c the function necessarily attains the value f (c) = f+, namely its supremum, that is, its maximum since c belongs to [a, b]. Remark 69. In the above theorem, we assumed that the function f (x) is continuous in the closed interval [a, b]. This hypothesis is essential to the validity of the theorem, in the sense that, if f (x) has neither a global maximum nor a global minimum, then we necessarily have that the function is either discontinuous, or is continuous over an interval that is not closed. In order to illustrate this, let us present some examples. Consider first a function that is not continuous in a closed interval. Specifically, consider, in the closed interval [−π/2, π/2], the function in Eq. 8.97, namely f0 (x) = (sin x)/x for x = 0 and f0 (0) = 0. This function is continuous in (0, π], but discontinuous at x = 0. Its supremum is clearly f (x) = 1 (limit for x that tends to 0; Eq. 8.62). 
However, at x = x+ = 0, we have f0 (0) = 0. Hence, the value 1 is just the supremum, but not the maximum! Next, consider a function that is continuous in an open interval. For instance, consider again the function f (x) = (sin x)/x, this time in the open interval (0, π), where it is continuous. The supremum is again equal to 1 (the limit as x tends to 0). However, the point x = 0 is not part of the interval (0, π). Thus, here as well the value 1 is just the supremum, not the maximum!

On the other hand, consider the function f1 (x) := (sin x)/x (x ≠ 0), with f1 (0) = 1 (Eq. 8.92). This function attains the maximum f1 (0) = 1 in the interval [0, π). The fact that the interval is open at x = π does not affect the result regarding the maximum. However, if we want it to have also a minimum, we have to consider the closed interval [0, π].

¹⁰ Named after Karl Weierstrass (1815–1897), a German mathematician considered by many as the father of rigorous modern analysis. The Bolzano–Weierstrass theorem (Subsection 8.8.2) and the Weierstrass M–test for absolutely and uniformly convergent series (Vol. II) are also named after him.
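The examples of Remark 69 can be checked numerically. What follows is only a sketch in Python, with names of my own choosing (`f0`, `f1`, `grid`); sampling a finite grid illustrates the statement but of course proves nothing. The grid is built so that it contains x = 0 exactly.

```python
import math

def f0(x):
    # (sin x)/x away from the origin, with the "wrong" value f0(0) = 0
    return math.sin(x) / x if x != 0.0 else 0.0

def f1(x):
    # Same function with the discontinuity removed: f1(0) = 1
    return math.sin(x) / x if x != 0.0 else 1.0

# Uniform grid on the closed interval [-pi/2, pi/2]; index k = n gives x = 0.
n = 1000
grid = [(k - n) * (math.pi / 2) / n for k in range(2 * n + 1)]

largest_f0 = max(f0(x) for x in grid)  # gets close to 1, never reaches it
largest_f1 = max(f1(x) for x in grid)  # reaches 1 at x = 0

print("largest sampled value of f0:", largest_f0)
print("largest sampled value of f1:", largest_f1)
```

The sampled values of f0 stay strictly below 1 however fine the grid is, whereas f1 attains the value 1 at x = 0.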
8.5.5 Intermediate value properties

Here, we address some important properties of continuous functions. We have the following

Lemma 11. Let f (x) denote a continuous function in [a, b]. Assume that f (a) and f (b) have opposite signs. Then, there exists at least one point c ∈ (a, b) such that f (c) = 0.

◦ Proof : ♠ I could say that the lemma is intuitively clear, and leave it at that. However, we can do better. We can use again the successive subdivision procedure (Definition 155, p. 334), with Property P being that the function has opposite signs at the two endpoints. In the process, we have two possibilities: either we encounter a ck , with f (ck ) = 0 (end of the search), or we obtain a sequence ck = ½ (ak + bk ) that converges to a point c ∈ [a, b] (Theorem 81, p. 335). I claim that f (c) = 0. Assume, to obtain a contradiction, that f (c) ≠ 0, say f (c) > 0. In this case, the continuity of f (x) implies that, given any ε > 0, there exists a δ > 0 such that, for any x with |x − c| < δ, we have |f (x) − f (c)| < ε, namely f (c) − ε < f (x) < f (c) + ε. Let us choose ε < f (c). Then, we have obtained that f (x) > 0 for any x with |x − c| < δ. This contradicts the fact that, by construction, any of the intervals (ak , bk ) contains c and the function has opposite signs at its endpoints. [Choose |bk − ak | < δ.] Thus, f (c) = 0.

We have the following

Theorem 85 (Intermediate value theorem). If a function f (x) is continuous in [a, b], then there exists at least one point c ∈ (a, b) such that f (c) = fc , where fc is any prescribed number between f (a) and f (b).

◦ Proof : Apply Lemma 11 above to the function f (x) − fc .
We also have the following Corollary 3. If a function f (x) is continuous in [a, b], then it attains any value between its global maximum and minimum.
◦ Proof : Apply Theorem 85 above to a closed interval having, as endpoints, the points where f (x) attains its global maximum and minimum.
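The successive subdivision used in the proof of Lemma 11 is, in practice, the familiar bisection method. Here is a minimal Python sketch of it, under my own assumptions: the name `bisect_root`, the tolerance `tol` and the iteration cap `max_iter` are illustrative choices, not part of the text. The second call follows the proof of Theorem 85, applying the same routine to f (x) − fc .

```python
import math

def bisect_root(f, a, b, tol=1e-12, max_iter=200):
    """Halve [a, b] repeatedly, keeping the half with opposite signs at its ends."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = 0.5 * (a + b)          # midpoint c_k of the current interval
        fc = f(c)
        if fc == 0.0 or (b - a) < tol:
            return c               # exact zero found, or interval small enough
        if fa * fc < 0:
            b, fb = c, fc          # sign change lies in the left half
        else:
            a, fa = c, fc          # sign change lies in the right half
    return 0.5 * (a + b)

# Lemma 11: x**3 - 2 changes sign on [1, 2].
root = bisect_root(lambda x: x**3 - 2.0, 1.0, 2.0)

# Theorem 85: find x in [0, 2] with cos(x) = 0.5 by shifting the function.
target = 0.5
x_half = bisect_root(lambda x: math.cos(x) - target, 0.0, 2.0)

print(root, x_half)
```

The routine returns an approximation of one point c with f (c) = 0; the lemma guarantees at least one such point, not uniqueness.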
8.5.6 Uniform continuity
♣
We have said that a function is continuous at x = c iff, given any ε > 0 arbitrarily small, there exists a δ > 0 such that |f (x) − f (c)| < ε, for any x with |x − c| < δ. It is important to note that δ in general depends upon c, namely δ = δ(c). To delve deeper into this issue, let us consider the function f (x) = 1/x, which is continuous, for x ∈ (0, 1). This means that for any c, given an ε > 0 as small as you wish, but fixed, there exists a δ(c) such that |f (x) − f (c)| ≤ ε, for any x with |x − c| ≤ δ(c). Let us concentrate on the point x = c − δ (namely the lower boundary of the interval) and impose
\[ \frac{1}{c - \delta(c)} - \frac{1}{c} = \varepsilon, \tag{8.113} \]
with δ < c. This yields δ(c) = ε c²/(1 + ε c) > 0. Here, we have uncovered an important fact, namely that δ(c) ∈ (0, c) tends to zero as c tends to zero. Accordingly, it is impossible to have a fixed value δ0 > 0 that is valid for all values of c. It is important to distinguish between those functions for which the infimum of δ(c) is greater than zero and those for which such an infimum equals zero. Indeed, for the first group, there exists a δ0 > 0, namely the infimum of δ(x), which is good for any x. For the second group, such a δ0 > 0 does not exist, simply because the infimum of δ(x) equals zero. For the functions for which there exists a δ0 > 0 that is valid for all the values of x, we are guaranteed some very interesting properties, some of which are presented in the remainder of this section. For this reason, it is convenient to introduce the following

Definition 162 (Uniformly continuous function). A function is said to be uniformly continuous in a given interval N iff, given an ε > 0 arbitrarily small, there exists a δ > 0 such that |f (x1 ) − f (x2 )| < ε, for all of the values of x1 and x2 in N , with |x1 − x2 | < δ.

We have the following

Theorem 86 (Uniform continuity). If a function f (x) is continuous in a bounded closed interval [a, b], then f (x) is uniformly continuous in [a, b].
◦ Proof : Assume, to obtain a contradiction, that f (x) is continuous, but non–uniformly continuous in [a, b]. Again, let us use the successive subdivision procedure (Definition 155, p. 334), with Property P being that f (x) is non–uniformly continuous. Let c ∈ [a, b] denote a point to which the sequence ck = ½ (ak + bk ) converges (Theorem 81, p. 335). The function is continuous at c by hypothesis, because c ∈ [a, b]. Accordingly, for any ε1 = ½ ε > 0 arbitrarily small, there exists a δ1 such that |f (x) − f (c)| < ε1 , for any x with |x − c| < δ1 . Next, consider Jk := [ak , bk ], with k such that bk − ak = (b − a)/2^k < δ1 ; for any two points, say x1 and x2 in Jk , we have |xh − c| < δ1 (h = 1, 2), and hence |f (xh ) − f (c)| < ε1 . Therefore, we have (use |A − B| ≤ |A − C| + |B − C|, Eq. 1.71)
\[ |f(x_2) - f(x_1)| \le |f(x_2) - f(c)| + |f(x_1) - f(c)| < 2\,\varepsilon_1 =: \varepsilon. \tag{8.114} \]
This contradicts the assumption that f (x) is non–uniformly continuous in Jk := [ak , bk ]. Therefore, f (x) is uniformly continuous in [a, b]. We can ask ourselves the following question: “What happens when f (x) is continuous in the open bounded interval (a, b)?” We have the following Theorem 87. If a function f (x) is continuous in an open interval (a, b), with −∞ < a < b < ∞, then f (x) may be non–uniformly continuous only in arbitrarily small neighborhoods of a and/or b. ◦ Proof : Indeed, f (x) is uniformly continuous in any closed interval [a + ε, b − ε], because of Theorem 86 above.
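Going back to the function f (x) = 1/x of Eq. 8.113, the shrinking of δ(c) can be seen in a few lines of Python. This is only an illustrative sketch: the helper name `delta_for_reciprocal`, the value ε = 0.1 and the sample values of c are my own choices. The code evaluates δ(c) = ε c²/(1 + ε c) and checks that it does satisfy Eq. 8.113.

```python
def delta_for_reciprocal(c, eps):
    """delta(c) = eps * c**2 / (1 + eps * c), the solution of Eq. 8.113."""
    return eps * c**2 / (1.0 + eps * c)

eps = 0.1
centers = [0.5, 0.1, 0.01, 0.001]   # values of c approaching zero
deltas = [delta_for_reciprocal(c, eps) for c in centers]

# Left-hand side of Eq. 8.113 evaluated with the computed delta(c)
gaps = [1.0 / (c - d) - 1.0 / c for c, d in zip(centers, deltas)]

for c, d, g in zip(centers, deltas, gaps):
    print(f"c = {c:<6} delta(c) = {d:.3e}   1/(c - delta) - 1/c = {g:.6f}")
```

For a fixed ε, the admissible δ(c) decreases roughly like ε c², so that no single δ0 > 0 serves every c in (0, 1).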
• What causes non–uniform continuity?

We have already seen that f (x) = 1/x is continuous in (0, 1], but non–uniformly continuous in the same interval. [It is not continuous in [0, 1], since it is not even defined for x = 0.] Questions: “What makes f (x) = 1/x non–uniformly continuous? Is it the fact that the slope is vertical at x = 0? Or, is it the fact that the function is unbounded at x = 0?” In order to shed some light on this issue, here we consider a few examples of uniformly or non–uniformly continuous functions.

Let us consider the function f (x) = ∛x, for x ∈ [0, 1] (Fig. 8.10). This function is continuous over the closed interval [0, 1], and hence is uniformly continuous (Theorem 86 above), even though the slope is vertical for x = 0. Thus, what makes f (x) = 1/x non–uniformly continuous in (0, 1] is not the fact that the slope is vertical for x = 0.

Fig. 8.10 The function f (x) = ∛x

◦ Comment. Do you want a more direct and gut–level proof? Here it is! Let us consider first the endpoint x + δ. We want to find δ > 0 such that ∛(x + δ) − ∛x < ε. This implies 0 < δ < (∛x + ε)³ − x, which yields 0 < δMin = δ(0) < ε³ ≠ 0. Next, let us address the endpoint x − δ. We want to find δ > 0 such that ∛x − ∛(x − δ) < ε. This implies 0 < δ < x − (∛x − ε)³, which yields 0 < δMin = δ(0) < ε³ ≠ 0 again. This confirms the results obtained in the preceding paragraph.

Next, let us check whether what makes the function f (x) = 1/x non–uniformly continuous is the fact that, as x tends to zero, the function tends to infinity. Not even this is the case! To see this, consider the function f (x) = sin(1/x): as x tends to zero, this function remains bounded. Nonetheless, the function is non–uniformly continuous in (0, 1], as you may verify.

◦ Comment. These examples were presented simply to show you that uniform continuity is more subtle than it appears. Accordingly, whenever the situation is not clear, the only suggestion I have is for you to go back to the definition and see if it applies. Fortunately for us, we will not encounter any such cases.
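If you want a hint for the verification regarding sin(1/x), here is one possible route, sketched in Python. The choice of points is my own: x = 1/(2πn + π/2), where the function equals 1, and x = 1/(2πn + 3π/2), where it equals −1. The helper names `peak` and `trough` are hypothetical.

```python
import math

def f(x):
    return math.sin(1.0 / x)

def peak(n):
    # Point in (0, 1] where sin(1/x) = 1
    return 1.0 / (2.0 * math.pi * n + 0.5 * math.pi)

def trough(n):
    # Neighboring point where sin(1/x) = -1
    return 1.0 / (2.0 * math.pi * n + 1.5 * math.pi)

pairs = [(peak(n), trough(n)) for n in (1, 10, 100, 1000)]
distances = [abs(x1 - x2) for x1, x2 in pairs]   # shrink as n grows
jumps = [abs(f(x1) - f(x2)) for x1, x2 in pairs]  # stay close to 2

for d, j in zip(distances, jumps):
    print(f"|x1 - x2| = {d:.3e}   |f(x1) - f(x2)| = {j:.6f}")
```

The two points of each pair get arbitrarily close to each other as n grows, yet the values of the function stay a distance 2 apart. Hence, for any ε < 2, no δ > 0 can satisfy Definition 162 over the whole interval (0, 1].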
8.5.7 Piecewise–continuous functions We have the following Definition 163 (Piecewise–continuous function). A function f (x) is said to be piecewise–continuous in a given closed interval [x0 , xn ] iff the interval may be divided into a finite number of subintervals (xk−1 , xk ) (k = 1, . . . , n), inside each of which f (x) is continuous, and in addition f (x) has only removable and/or jump discontinuities at x = xk (k = 0, . . . , n).
Remark 70 (Convention on piecewise–continuous functions). In the remainder of this book, we use the following convention. [This convention is used because it streamlines some of the proofs considerably.] Given any piecewise–continuous function, its values at the boundaries xk of the subintervals will be conveniently modified by removing any removable discontinuity. With this correction, a piecewise–continuous function can only have jump discontinuities (a finite number of them). Then, at any jump discontinuity, we impose that the function has two values (see Section 6.7, on multi–valued functions), so as to make the function continuous in any of the closed subintervals [xk−1 , xk ] (k = 1, . . . , n). [For instance, with this convention, the Heaviside function (Eq. 8.98) will be redefined so as to have H(0⁻) = 0 and H(0⁺) = 1.]

Let us introduce the following

Definition 164 (The symbols mink and maxk ). Given n numbers ak with k = 1, . . . , n, the symbols
\[ \min_k\,[a_k] \qquad \text{and} \qquad \min[a_1, a_2, \dots, a_n] \tag{8.115} \]
denote the lowest of the values a1 , a2 , . . . , an . Similarly, the symbols maxk [ak ] and max[a1 , a2 , . . . , an ] denote the highest of the values a1 , . . . , an .

Then, we have the following

Theorem 88. A function f (x) that is piecewise–continuous in an interval [x0 , xn ] is: (i) bounded in [x0 , xn ] and (ii) uniformly continuous in [x0 , xn ], with the exception of the points x1 , . . . , xn−1 where f (x) is discontinuous. [Of course, here we adopt the convention in Remark 70 above, so that there are only jump discontinuities (no removable ones).]

◦ Proof : According to Theorem 82, p. 336, f (x) is bounded in any of the closed subintervals Jk = [xk−1 , xk ] (k = 1, . . . , n), and hence is bounded in [x0 , xn ]. In addition, f (x) is uniformly continuous in [xk−1 , xk ] (Theorem 86, p. 343, on uniform continuity). Accordingly, given an arbitrarily small ε, it is possible to find for each Jk a δk such that |f (x) − f (y)| < ε, for any x and y in Jk with |x − y| < δk . Next, choose δ := mink δk . This value may be used to obtain that the function is uniformly continuous in [x0 , xn ], except for the points xk , with k = 1, . . . , n − 1 (jump discontinuities).
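The choice δ := mink δk in the proof above may be illustrated with a small Python sketch. Everything in it is a hypothetical example of mine: a function made of two linear pieces on [0, 1] and [1, 2], with a jump at x = 1, for which a bound m on the magnitude of the slope of each piece gives δk = ε/m.

```python
# Each entry: (left endpoint, right endpoint, function on the closed piece, slope bound).
pieces = [
    (0.0, 1.0, lambda x: 2.0 * x, 2.0),
    (1.0, 2.0, lambda x: 5.0 - 0.5 * x, 0.5),
]

eps = 0.01
# For a linear piece with |slope| <= m, delta_k = eps / m satisfies Definition 162.
delta_k = [eps / m for (_, _, _, m) in pieces]
delta = min(delta_k)   # the single delta used for all the subintervals

def worst_gap(left, right, g, step, samples=1000):
    """Largest |g(x + step) - g(x)| over sampled x, with both points in [left, right]."""
    worst = 0.0
    for i in range(samples + 1):
        x = left + (right - left - step) * i / samples
        worst = max(worst, abs(g(x + step) - g(x)))
    return worst

# Pairs of points closer than delta, both inside the same closed subinterval
gaps = [worst_gap(a, b, g, 0.99 * delta) for (a, b, g, _) in pieces]
print("delta_k =", delta_k, " delta =", delta, " worst gaps =", gaps)
```

The smallest δk works inside every subinterval. Across the jump at x = 1, no choice of δ can help, which is why the theorem excludes the points of discontinuity.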
8.6 Contours and regions

In the next section, we extend the results of the preceding sections to functions of two or three variables. To this end, we need some basic material on two– and three–dimensional spaces, which is introduced here. Specifically, here we present the definitions of lines, contours, surfaces and regions. We first address the issue in two dimensions. Then, we extend it to three dimensions. In addition, we define simply connected and multiply connected regions. We will need the following

Definition 165 (Union, intersection and complement of sets). Consider two sets, A and B, in ℝ^k (k = 1, 2, 3). The union of A and B, denoted by S = A ∪ B, is the set composed of all the points that belong to either A or B (or to both). On the other hand, the intersection of A and B, denoted by S = A ∩ B, is the set composed of all the points that belong to both A and B. Moreover, the relative complement of A with respect to B, denoted by S = B\A, is the set of elements of B that do not belong to A. Finally, the absolute complement, A^C, of A in ℝ^k is the set composed of all the points that do not belong to A, namely A^C = ℝ^k \A.
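For finite sets of points, the operations of Definition 165 are available directly in Python. The sketch below is only an illustration of the definitions: the sets `A` and `B` are made up, and a small finite grid, called `universe`, stands in for ℝ², since the absolute complement of a set in the whole plane cannot be listed.

```python
# Two made-up sets of points with integer coordinates
A = {(0, 0), (1, 0), (1, 1)}
B = {(1, 1), (2, 1)}

union = A | B                  # points in A or B (or both)
intersection = A & B           # points in both A and B
relative_complement = B - A    # B \ A: points of B that are not in A

# A finite 3-by-3 grid playing the role of the whole space
universe = {(i, j) for i in range(3) for j in range(3)}
absolute_complement = universe - A   # complement of A within the grid

print(sorted(union))
print(sorted(intersection))
print(sorted(relative_complement))
print(len(absolute_complement))
```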
8.6.1 Two dimensions

We have the following definitions:

Definition 166 (Smoothness of a line). A line, which by definition is continuous and does not present bifurcations (Remark 45, p. 176), is called smooth at one of its points P iff its unit tangent is continuous at P . A line is called smooth iff its unit tangent is continuous at all of its points (Definition 91, p. 180). A line is called piecewise–smooth iff its unit tangent is discontinuous at a finite number of points.

Definition 167 (Open and closed lines. Contour). A line may close on itself, or connect two points, or go to infinity, at one or both endpoints (Remark 45, p. 176). A piecewise–smooth finite–length line that does not close on itself is called an open line. [The fact that it is finite–length implies that open lines cannot go to infinity. They necessarily have endpoints.] A finite–length piecewise–smooth line that closes on itself will be referred to as a closed line. A non–self–intersecting closed line (piecewise–smooth, by definition) will be referred to as a contour. [Iff the line is smooth, the contour is called smooth.]
Remark 71. (Property of contours). A contour C divides the plane into two sets of points, one composed of all the points inside C, and the other of all the points outside C. For a gut–level proof, note that for interior points the angle swept by a segment P Q, with Q ∈ C, as Q travels along C counterclockwise, equals 2π, whereas it vanishes for exterior points. For a more sophisticated proof, let C denote a contour (piecewise smooth, by definition). The outward piecewise normal is uniquely defined as a ray that: (i) is perpendicular to the piecewise–smooth tangent, and (ii) is placed to the right for an observer traveling along C counterclockwise. This clearly distinguishes between the inner and outer regions.

Remark 72. (Jordan curves). By adding the restriction that a contour is, at a minimum, piecewise–smooth, I intentionally distinguish contours from Jordan curves, for which such a restriction does not apply.¹¹ [The Koch snowflake (Remark 64, p. 319; Fig. 8.5, p. 318) does not qualify as a contour, because it does not have a tangent at any of its points and also because it has an infinite length.] This is done for the sake of simplicity, since it is more than adequate for our needs. Indeed, if we replace the term “contour” with the term “Jordan curve,” the property stated in the remark above is the subject of the Jordan curve theorem. [The Jordan curve theorem is not easy to prove. As a matter of fact, the proof is very complicated. If you would like to have an idea of how complicated it gets, see the paper by Th. C. Hales (Ref. [29]).]

As stated in Remark 71, p. 348, a contour C divides R2 into two sets of points (one inside and one outside C), which are called regions. However, in this book, the term “region” is not limited to the sets of points inside/outside a contour. For instance, consider the interior region R0 defined by a contour C0 . Then, consider a second contour, say C1 , which is fully contained in R0 , and let R1 denote the region inside C1 .
If we remove from R0 the points that belong to R1 , we obtain a set of points, which we still call a region. All this is to justify the following

Definition 168 (Region on a plane). Consider a contour C0 on a plane, along with n separate contours Ck (k = 1, . . . , n, with n ≥ 0) that are completely inside C0 , and do not intersect each other. [Warning: The limit case in which some of the Ck shrink to a single point is included.] The set R composed of points that are located inside C0 and outside Ck (k = 1, . . . , n), plus possibly points on the boundaries (namely points on C0 and Ck , k = 1, . . . , n), is called a region. If none of the boundary points (namely those on C0 and Ck , k = 1, . . . , n) belong to R, then R is called an open region. If all the boundary points belong to R, then R is called a closed region. [Recall that a contour has a finite length (Definition 167, p. 347). Therefore, the boundary of a region, being the union of a finite number of contours, always has a finite length. For instance, the geometrical figure inside the Koch snowflake does not qualify as a region, because it has an infinitely long boundary.] Any region bounded by a contour C0 is called an interior region; C0 is called the outer boundary, Ck (k = 1, . . . , n, with n ≥ 0) the inner boundaries. If we take the limit as C0 tends to infinity, the resulting region is referred to as an exterior region.

◦ Comment. The definition implies that a region is path–connected, that is, any two points P and Q in R may be connected by a line in R.

◦ Warning. I intentionally use the above definition of region to distinguish a region from a domain, namely a path–connected open set in a finite–dimensional vector space (not to be confused with the domain of definition of a function). The motivation is the same as that in Remark 72, p. 348, regarding the difference between a contour and a Jordan curve. The Koch flake domain (the set of points inside a Koch snowflake curve) does not qualify as a region. Using domains instead of regions (as defined above) would make our life much more complicated. The above definition of regions is all we need.

¹¹ Named after the French mathematician Marie Ennemond Camille Jordan (1838–1922), who is especially known for his work in group theory, not addressed in this book. In Vol. II, we will deal with the Jordan form of non–diagonalizable matrices and the Jordan lemma on complex variables, also named after him. [Camille Jordan is not to be confused with Wilhelm Jordan, of Gauss–Jordan elimination fame.]
8.6.2 Three dimensions
♣
In Definition 167, p. 347, we defined a contour on a plane as a non–self–intersecting line that closes on itself. Then, in Definition 168 above, we defined an open region on a plane as the collection of all the points inside C0 and outside Ck (k = 1, . . . , n), with the Ck inside C0 and not intersecting each other. Here, we want to extend these concepts to three dimensions. Accordingly, we begin with the boundary of a region in space, namely surfaces instead of contours.
• Surfaces In Subsection 7.2.2, we discussed the different types of representation for surfaces (Definition 137, p. 261). Here, we want to give a more precise definition of surfaces. Specifically, we have the following definitions:
Definition 169 (Patch. Surface). A smooth patch is a collection of points obtained by a one–to–one smooth mapping of a polygon that preserves the number of sides. [The term smooth is used to mean that any segment is transformed into a smooth line, and vice versa. Accordingly, the normal of a patch is continuous. The contour of the polygon is mapped into the contour of the patch. Hence, the contour of the patch is piecewise smooth. The mapping of a polygon side is called a side of the patch.] A surface S consists of the union of a finite number of smooth patches, obtained by joining compatible patches side–by–side, with three restrictions: (1) any side of each patch is connected, at most, to one other patch (no bifurcations), (2) each patch is connected to at least one other patch (no separations), and (3) the patches do not intersect each other. Accordingly, the normal is at a minimum piecewise–continuous. Therefore, any surface is, by definition, piecewise–smooth. [Recall the difference between contours and Jordan curves (Remark 72, p. 348).] If the normal is continuous everywhere, the surface is called smooth. Moreover, again by definition, surfaces are path–connected, namely any two points P and Q on a surface S may be connected by a line all contained in S.

Definition 170 (Contour on S). The definition of a contour over a surface is analogous to that on a plane.

Note that a sphere and a hemisphere (namely one half of a sphere) are both surfaces. It is convenient to distinguish between these two types of surfaces. Accordingly, let us introduce the following

Definition 171 (Open and closed surfaces). A surface is called closed iff it does not have a boundary. [In other words, a surface is called closed iff there are no unmatched patch sides.] On the other hand, a surface is called open iff it has a boundary, consisting of one or more contours.

Let us consider some examples.
A sphere and a torus (e.g., the surface of a doughnut) are closed surfaces.¹² On the other hand, a hemisphere, or the lateral surface of a truncated cylinder, are open surfaces. Also, any surface that is obtained by continuous deformation of a sphere is closed. Note that a contour on a closed surface divides it into two open subsurfaces.

¹² The abbreviation “e.g.” stands for “exempli gratia,” the Latin equivalent of “for example.”

Finally, in analogy with Definition 168, p. 348, of regions in two dimensions, we have the following

Definition 172 (Region in space). Let S0 denote a closed surface in space, whereas Sk (k = 1, . . . , n, with n ≥ 0) are closed surfaces completely inside S0 that are external to each other and typically have no points in common. [Warning: The limit case in which some of the Sk shrink to a single point is included.] The set R of points inside S0 and outside Sk (k = 1, . . . , n), plus possibly points on the boundaries (namely points on S0 and Sk , k = 1, . . . , n), is called a region in space. If none of the boundary points (namely the points of S0 and Sk ) belong to R, then R is called an open region. If all the boundary points belong to R, then R is called a closed region. Any region inside a closed surface S0 is called an interior region. On the other hand, as S0 tends to infinity, the resulting region is referred to as an exterior region.

◦ Warning. Note that the terms “open” and “closed ” have totally different meanings when they are referred to: (i) a line (Definition 167, p. 347) or surface (Definition 171 above), and (ii) an interval (Definition 142, p. 301), or a region on a plane or in space (Definition 168, p. 348, or Definition 172 above, respectively).
8.6.3 Simply– and multiply–connected regions
♣
In Chapter 19, we will need some additional notions regarding regions. For the sake of organizational clarity, these notions are presented here. We have defined a contour on a plane (Definition 167, p. 347). We have the following

Definition 173 (Contour in space). Any closed piecewise–smooth line that may be continuously deformed into a circle, without forming knots, is called a contour.

◦ Comment. A circle may be shrunk to a point. Therefore, a contour can always be shrunk to a point (without forming knots). In doing this, the contour sweeps a surface. Thus, we can also define a contour as a closed piecewise–smooth line that may be conceived as the boundary of an open surface that has no holes.

We have the following definitions:

Definition 174 (Shrinkable contour. Shrinkable surface). A contour C in an open two– or three–dimensional region R is called shrinkable in R iff it may be continuously shrunk to any point in R without ever leaving R (namely without ever crossing points that do not belong to R). A closed surface S in an open three–dimensional region R is called shrinkable in R
iff it may be continuously shrunk to any point in R without ever leaving R. [The term reducible is also used (Truesdell, Ref. [71], p. 11).]

We will need the following

Definition 175 (Simply connected region). A region R (in two or three dimensions) is called simply connected iff any contour C in R is shrinkable. [Note that the definition implies that, if R is simply connected, then for any C in R there always exists a surface S in R, which has C as a boundary. For, as C is shrunk to zero, it sweeps a surface with such a property. Indeed, this could be an alternate definition of simply connected regions.]

Definition 176 (Multiply connected region). Regions that are not simply connected are called multiply connected. Specifically, a region R (in two or three dimensions) is called n-connected iff we have to perform n − 1 cuts to transform it into a simply connected region. Accordingly, a simply connected region is 1-connected. An example of how one cut of a 2-connected region yields a 1-connected region is shown in Figs. 8.11 and 8.12.
Fig. 8.11 A 2-connected region
Fig. 8.12 A 1-connected region
Remark 73. On a plane, an annulus (namely the region between two concentric circles) is not simply connected. Even a disk with its center point removed is not simply connected. In general, if we remove a single point from a simply connected two–dimensional region, the resulting region is multiply connected. In three dimensions, the region inside a torus (namely a doughnut–shaped region) is 2-connected. The region outside a torus is also 2-connected. Even the region obtained by removing a straight line from R3 is 2-connected, and so is a coffee cup with one handle. A cup with two handles
is 3-connected. On the other hand, the region between two concentric spheres is simply connected, and so is the region outside a sphere, as you may verify.
8.7 Functions of two variables
♣
Here, we consider functions of more than one variable. For simplicity, we limit ourselves to functions of two variables.

◦ Comment. I’ll let you verify that much of the section may be easily modified to make it applicable to functions of three variables. Indeed, this comment holds true even for more than three variables. In this case, we use the terms hyperspace, hyperplane and hypersurface.

Recall the explicit representation of a surface, namely z = f (x, y) (Eq. 7.10). More generally, we have the following

Definition 177 (Function of two variables). A function of two variables is an algorithm, which, given the values of two variables, x and y, yields the value of u. It is denoted as
\[ u = f(x, y). \tag{8.116} \]
The graph of a function of two variables is a surface in a three–dimensional space. It is obtained by plotting the (orthogonal) x- and y-axes on the horizontal plane and the value of u along the vertical direction. For instance, the function u(x, y) = √(R² − x² − y²) corresponds to the upper hemisphere having center at the origin and radius R (Eq. 7.80). To give you another example of a function of two variables, the above–the–sea–level height of a point on a mountain is a function of latitude and longitude; a scale model of the mountain is the graph of such a function. [In general, the variables x and y are not necessarily spatial coordinates. For instance, the density of the air is a function of two variables, such as pressure and temperature.] In this section, we address the issues of limits, continuity, boundedness and uniform continuity for functions of two variables.
8.7.1 Non–interchangeability of limits
♣
Limits are not always interchangeable. For instance, consider the function
\[ f(x, y) = \frac{x^2 - y^2}{x^2 + y^2} \qquad \big( (x, y) \neq (0, 0) \big). \tag{8.117} \]
We have limx→0 (x² − y²)/(x² + y²) = −1, and hence
\[ \lim_{y \to 0} \, \lim_{x \to 0} \frac{x^2 - y^2}{x^2 + y^2} = -1. \tag{8.118} \]
On the other hand, we have limy→0 (x² − y²)/(x² + y²) = 1, and hence
\[ \lim_{x \to 0} \, \lim_{y \to 0} \frac{x^2 - y^2}{x^2 + y^2} = 1. \tag{8.119} \]
We can better understand the behavior of this function by using polar coordinates, x = r cos ϕ and y = r sin ϕ (Eq. 7.45). Combining with Eq. 8.117, and using sin² ϕ + cos² ϕ = 1 (Eq. 6.75), as well as cos² ϕ − sin² ϕ = cos 2ϕ (Eq. 7.65), we obtain
\[ f(\varphi) = \frac{\cos^2 \varphi - \sin^2 \varphi}{\cos^2 \varphi + \sin^2 \varphi} = \cos 2\varphi \qquad (r \neq 0). \tag{8.120} \]
In other words, f (x, y) is constant along the lines ϕ = constant, whereas, along the lines r = constant, the function varies like cos 2ϕ, independently of r. At r = 0 the function is not defined and presents a discontinuity as r tends to zero. [The graph of the function f (x, y) reminds me of a funny spiral staircase. Indeed, it is constant along r. On the other hand, it goes up and down twice along ϕ. Specifically, as ϕ goes from −π/2 to 0, the value of the function grows from −1 to 1. Then it decreases from 1 to −1, as ϕ goes from 0 to π/2, and then it repeats itself.]
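The behavior of the function in Eq. 8.117 near the origin is easy to explore numerically. The Python sketch below is an illustration under my own choice of names and sample points. It uses the fact that, away from the origin, the function is continuous, so that the inner limit as x tends to 0 at a fixed y ≠ 0 is simply f (0, y), and similarly for the inner limit as y tends to 0.

```python
import math

def f(x, y):
    # Function of Eq. 8.117, defined for (x, y) != (0, 0)
    return (x * x - y * y) / (x * x + y * y)

scales = [10.0 ** (-k) for k in range(1, 8)]

# Inner limit x -> 0 first (y fixed, nonzero), then let y shrink
x_first = [f(0.0, y) for y in scales]
# Inner limit y -> 0 first (x fixed, nonzero), then let x shrink
y_first = [f(x, 0.0) for x in scales]

def f_polar(r, phi):
    # Same function evaluated at the point with polar coordinates (r, phi)
    return f(r * math.cos(phi), r * math.sin(phi))

print("x -> 0 first:", x_first[-1])
print("y -> 0 first:", y_first[-1])
for phi in (0.3, 1.0, 2.5):
    print(phi, [round(f_polar(r, phi), 12) for r in (1.0, 1e-3, 1e-6)],
          math.cos(2.0 * phi))
```

The two iterated limits disagree, and along any ray ϕ = constant the value does not change with r, in agreement with Eq. 8.120.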
8.7.2 Continuity for functions of two variables
♣
In this subsection, we extend the definition of continuity to functions of two variables. Let us begin with the following definitions:

Definition 178 (Neighborhood of a point on a plane). A circular neighborhood of a point P0 = (x0 , y0 ) of the (x, y)-plane is an open disk, centered on P0 , namely the collection of all the points P = (x, y) whose distance from P0 , namely √[(x − x0 )² + (y − y0 )²], is less than δ, where δ is an arbitrary positive real number. A rectangular neighborhood of a point P0 = (x0 , y0 ) of the (x, y)-plane is the collection of all the points P = (x, y), with |x − x0 | < δx and |y − y0 | < δy , where δx and δy are arbitrary positive real numbers.

The above definition implies that a neighborhood is an open region. [In this book, we deal primarily with neighborhoods that are either circular or rectangular.
If the nature of the neighborhood is not relevant, we simply use the term neighborhood.]

Definition 179 (Continuity of f (x, y) at a point). A function f (x, y) is said to be continuous at a point (x0 , y0 ) of the (x, y)-plane iff, given an ε > 0 arbitrarily small, there exists a neighborhood N0 of P0 = (x0 , y0 ) such that |f (x, y) − f (x0 , y0 )| < ε, for any point P = (x, y) of the neighborhood. In other words, akin to the one–dimensional case (Eq. 8.95), continuity of f (x, y) at (x0 , y0 ) requires
\[ f(x, y) - f(x_0, y_0) = o[1], \qquad \text{as } (x, y) \to (x_0, y_0). \tag{8.121} \]
A function that is not continuous at a point (x0 , y0 ) is called discontinuous there.

Definition 180 (Continuity of f (x, y) over a region). Let us begin with an open region R. Iff a function is continuous for all the points of the region, we say that f (x, y) is continuous in R. If the region is not open, we extend the definition akin to what we did for the one–dimensional case, in Definition 154, p. 330.
8.7.3 Piecewise continuity for two variables
♣
In analogy with Definition 163, p. 345, we have the following Definition 181 (Piecewise–continuous function). A function of two variables, f (x, y), is called piecewise continuous in a two–dimensional region R iff this region may be subdivided into a finite number of subregions Rk , within each of which f (x, y) is continuous with at most removable and/or jump discontinuities at the boundaries of Rk . [Recall that the length of the boundary of each Rk is finite (Definition 168, p. 348).] Remark 74 (Convention on piecewise–continuous function). In analogy with Remark 70, p. 346, for the one–dimensional case, we adopt the following convention. Given any piecewise–continuous function, f (x, y), its values at the boundaries of the subregions Rk are conveniently modified so that the resulting function is continuous in any of the closed subregions Rk . Of course, this implies that removable singularities are removed, and that the function has two values at any jump discontinuity point. Again, this convention is used throughout this book, because it streamlines some of the proofs considerably.
Part II. Calculus and dynamics of a particle in one dimension
8.7.4 Boundedness for functions of two variables
♣
In analogy with Definition 156, p. 335, for bounded functions of one variable, we have the following

Definition 182 (Bounded function of two variables). A function f(x, y) is said to be bounded in a region R iff there exists M ∈ (0, ∞) such that |f(x, y)| < M, for all the points in R.

We have the following

Theorem 89. If a function f(x, y) is continuous in a closed region R, then f(x, y) is bounded in R.

◦ Proof: ♣ The proof is similar to that of the corresponding theorem for functions of one variable. I’ll let you take care of the details.

Remark 75. Indeed, the restriction that the function f(x, y) be continuous is not essential. The theorem is valid as well if the function is piecewise–continuous, as you may verify. [Hint: Apply the above theorem to each of the closed subregions where the function is continuous. Of course, here again we adopt the convention in Remark 74, p. 355.]
8.7.5 Uniform continuity in two variables
♣
We can also extend to two dimensions the notion of uniform continuity with the following

Definition 183 (Uniformly continuous). A function f(x, y) is said to be uniformly continuous in a given rectangular region R iff, given an arbitrarily small ε > 0, there exist δx > 0 and δy > 0 such that, for any (x₁, y₁) ∈ R and (x₂, y₂) ∈ R, with |x₁ − x₂| < δx and |y₁ − y₂| < δy, we have that |f(x₁, y₁) − f(x₂, y₂)| < ε.

We have the following theorems:

Theorem 90 (Uniform continuity). If a function f(x, y) is continuous over a bounded closed region R ⊂ ℝ², then it is also uniformly continuous in R.

◦ Proof: The proof is conceptually identical to that used in Theorem 86, p. 343. Again, I’ll let you take care of the details.
8. Limits. Continuity of functions
Theorem 91. A function f (x, y) that is piecewise–continuous in a region R is bounded in R and uniformly continuous for any x ∈ R, with the exception of the boundaries of Rk in the interior of R, where f (x, y) may have jump discontinuities. [Of course, here again we adopt the convention in Remark 74, p. 355.] ◦ Proof : ♣ The proof is conceptually identical to that of Theorem 88, p. 346. One more time, I’ll let you take care of the details.
8.8 Appendix. Bolzano–Weierstrass theorem
♣
In Subsection 8.2.1, we studied monotonic sequences. In particular, we introduced Theorem 73, p. 308, which states that a bounded increasing sequence converges. The main objective of this appendix is to provide a rigorous proof for it. The proof requires some material that is at the foundation of mathematics. Such material (e.g., accumulation points, the Bolzano–Weierstrass theorem, Cauchy sequences) is definitely more sophisticated than what we have seen thus far. Accordingly, I recommend that, for the time being, you skip this appendix. You may wish to come back to it if you want to refine your mathematical skills, as I recommend you do; you’ll be more self–confident in your mathematical foundations.

◦ Warning. For simplicity, this appendix is limited to one dimension.
8.8.1 Accumulation points
♣
Consider the following

Definition 184 (Accumulation point). Consider a set of points in ℝ. A point xA (not necessarily belonging to the set) is called an accumulation point of the set iff any arbitrarily small neighborhood of xA contains at least one point of the set different from xA.

We have the following theorems:

Theorem 92. Any neighborhood of an accumulation point xA of a given set of points must necessarily contain an infinite number of points of the set.

◦ Proof: By definition, the neighborhood of xA must contain at least one point of the set different from xA. Assume, to obtain a contradiction, that it contains only a finite number (say n) of points x = xₖ (k = 1, …, n) of the set different from xA. Let d > 0 be smaller than any dₖ = |xₖ − xA|. Consider the neighborhood (xA − d, xA + d). This neighborhood does not include any of the points of the set different from xA, in violation of the definition of accumulation points, thereby proving the theorem.

Theorem 93. Any closed interval contains all of its accumulation points.

◦ Proof: The theorem is easily verified: all the points of a closed interval (including the endpoints) are clearly accumulation points of the set, in that any of their neighborhoods contains infinitely many points of the interval. On the other hand, all the points outside the interval are not accumulation points, in that it is possible to choose the neighborhood sufficiently small, so as not to contain any point of the interval, as you may verify.
8.8.2 Bolzano–Weierstrass theorem
♣
Here, we address the Bolzano–Weierstrass theorem. This is a theorem that takes you to a definitely higher level of mathematical sophistication. We have the following¹³

Theorem 94 (Bolzano–Weierstrass theorem over a closed interval). A set S, composed of an infinite number of points that belong to a closed bounded interval [a, b], admits at least one accumulation point that belongs to [a, b], albeit not necessarily to S.

◦ Proof: Let us use the successive subdivision procedure (Definition 155, p. 334). Divide the interval J₀ := [a, b] into two equal subintervals, [a, c₀] and [c₀, b], with c₀ = ½(a + b) being the midpoint of [a, b]. There will be an infinite number of points belonging to S in at least one of the two subintervals, say in the interval J₁ = [a₁, b₁], where [a₁, b₁] is equal either to [a, c₀], or to [c₀, b]. [If both contain infinitely many points, just pick one of the two. The same results would be obtained also for the other one. Indeed, the theorem does not exclude the possibility of more than one accumulation point.] Repeating the procedure n times, we generate a sequence of smaller and smaller intervals Jₖ := [aₖ, bₖ], where bₖ − aₖ = (b − a)/2^k (k = 1, …, n). As n tends to infinity, we identify a point, say c (Theorem 81, p. 335). Any neighborhood of c, say N := (c − ε, c + ε), no matter how small ε is, contains all the intervals Jₖ := [aₖ, bₖ], with bₖ − aₖ < ε, where k = 1, …, n. [Of course, k is such that 2^k > (b − a)/ε.] These intervals in turn contain infinitely many points of the original set of points S, by construction of Jₖ. Thus, c is an accumulation point of S. Finally, note that c does not necessarily belong to S. However, c necessarily belongs to the interval [a, b], because all the intervals [aₖ, bₖ] belong to [a, b].

◦ Comment. For instance, 0 is an accumulation point of the sequence {1/n} (n < ∞). It does not belong to the sequence, but it belongs to the closed interval [0, 1]. On the other hand, if the interval is not closed, the theorem does not necessarily apply. For instance, in the half–open interval (0, 1], consider again the sequence of points {1/n}. This sequence has an accumulation point x = 0, which however does not belong to the interval under consideration.

¹³ Named after Bernhard Bolzano and Karl Weierstrass. Bernhard Placidus Johann Nepomuk Bolzano (1781–1848), from Bohemia (now part of Czechia, also known as the Czech Republic), was a Catholic priest who contributed substantially to the foundations of mathematics, logic and philosophy.
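The successive subdivision used in the proof can be mimicked numerically. The sketch below is only an illustration under two assumptions of mine: a finite sample of the sequence {1/n} stands in for the infinite set S, and "contains infinitely many points" is replaced by "contains at least as many sample points as the other half." The function name `bisect_towards_accumulation` is hypothetical, and the result is meaningful only as long as the number of steps is small compared with the size of the sample.

```python
def bisect_towards_accumulation(points, a, b, steps):
    """Successive subdivision of [a, b] on a finite sample of the set S.

    At each step we keep the closed half that holds at least as many sample
    points as the other one (a finite stand-in for "infinitely many points").
    """
    for _ in range(steps):
        c = 0.5 * (a + b)  # midpoint of the current interval
        left = sum(1 for p in points if a <= p <= c)
        right = sum(1 for p in points if c <= p <= b)
        if left >= right:
            b = c  # keep [a, c]
        else:
            a = c  # keep [c, b]
    return a, b


# Finite sample of the sequence {1/n}, whose accumulation point is 0.
sample = [1.0 / n for n in range(1, 100_001)]
a_k, b_k = bisect_towards_accumulation(sample, 0.0, 1.0, steps=10)
print(a_k, b_k)  # the interval shrinks around 0, with length (b - a)/2**10
```

After ten steps the retained interval is [0, 1/2^10], consistent with bₖ − aₖ = (b − a)/2^k and with 0 being the accumulation point, even though 0 does not belong to the sample.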
8.8.3 Cauchy sequence
♣
The accumulation point is not necessarily unique. [For instance, the sequence (−1)^n + 1/n has two accumulation points, 1 and −1.] However, it is of interest to know when the accumulation point of a sequence is unique. One such possibility is addressed here. We have the following¹⁴

Definition 185 (Cauchy sequence). A sequence xₕ is said to be Cauchy iff, for any ε > 0 arbitrarily small, there exists an n such that

\[ |x_h - x_k| < \varepsilon, \quad \text{for all } h, k > n. \tag{8.122} \]

[Note that the term “Cauchy” is here used as an adjective. This is a universal usage of the term.] Then, we have the following

Theorem 95 (Convergence of Cauchy sequences). Given a closed interval [a, b], any convergent real sequence xₕ in [a, b] is Cauchy. Vice versa, any real Cauchy sequence in [a, b] has a unique accumulation point, namely the sequence converges.

¹⁴ Named after Augustin–Louis Cauchy (1789–1857), a French mathematician, mathematical physicist and engineer. He made major contributions to mathematics and mechanics. Much of the formulation of analytic functions of a complex variable is connected with his name (for instance, Cauchy–Riemann conditions, Cauchy integral theorem, Cauchy integral formula). Also the Binet–Cauchy identity (Eq. 15.101) is named after him. Other results (addressed in Vol. III) that are named after Cauchy include the Lagrange–Cauchy theorem for irrotational flows, the Cauchy–Green deformation tensor used in the definition of the strain tensor, the Cauchy tetrahedron theorem on the existence of the stress tensor, the Cauchy equation of motion which governs the dynamics of continua, the Cauchy data in partial differential equations, along with a beautiful result regarding the vorticity evolution in inviscid flows.

◦ Proof: To prove the first part of the theorem, note that, if the sequence xₕ converges, say to x, then, by definition, given an ε arbitrarily small, there exists n such that |xₕ − x| < ε/2, for all h > n. Therefore, we have, for all h, k > n (use |A − B| ≤ |A − C| + |B − C|, Eq. 1.71),

\[ |x_h - x_k| \le |x_h - x| + |x_k - x| < \varepsilon, \tag{8.123} \]
and hence the sequence is Cauchy. Vice versa, assume the sequence to be Cauchy. The case in which xₖ remains constant for k > k₀ is trivial, as you may verify. If this is not the case, the sequence contains infinitely many points, as you may verify. Therefore, the Bolzano–Weierstrass theorem (Theorem 94, p. 358) guarantees that there exists at least one accumulation point. On the other hand, the fact that the sequence is Cauchy guarantees that this accumulation point is unique. [Indeed, let us assume, to obtain a contradiction, that there exist two accumulation points, say c₁ and c₂; then, for any arbitrarily small ε and for h > n, we would have that |xₕ − c₁| < ε and |xₕ − c₂| < ε. I claim that these two inequalities are incompatible. To this end, let us choose ε < ½|c₁ − c₂|. Then, using again |A − B| ≤ |A − C| + |B − C| (Eq. 1.71), we would have |c₂ − c₁| ≤ |c₂ − xₕ| + |c₁ − xₕ| < 2ε < |c₂ − c₁|, an obvious impossibility.]
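If you wish to get a feeling for Eq. 8.122, the following sketch compares a convergent sequence with the two–accumulation–point sequence (−1)^n + 1/n mentioned above. It is only a finite check, not a proof: the helper `tail_spread` (a name of my own choosing) inspects a finite window of terms beyond n, and the window length is an arbitrary assumption.

```python
def tail_spread(x, n, window):
    """Largest |x_h - x_k| for n < h, k <= n + window.

    The largest pairwise distance within a finite list of reals equals
    max - min, so this is a finite proxy for the condition in Eq. 8.122.
    """
    tail = [x(h) for h in range(n + 1, n + window + 1)]
    return max(tail) - min(tail)


def convergent(h):
    return 1.0 / h  # converges to 0


def oscillating(h):
    return (-1) ** h + 1.0 / h  # accumulation points 1 and -1


for n in (10, 100, 1000):
    print(n, tail_spread(convergent, n, 500), tail_spread(oscillating, n, 500))
```

For the first sequence the spread of the tail shrinks as n grows, whereas for the second it stays close to 2, no matter how large n is.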
8.8.4 A rigorous proof of Theorem 73, p. 308
♣
Finally, we present a rigorous proof of Theorem 73, p. 308, on the convergence of monotonic sequences. I claim that a bounded increasing sequence is Cauchy. Indeed, Eq. 8.122 is satisfied. [For, if it were not, xₖ₊₁ − xₖ ≥ 0 would have a lower bound greater than zero, and hence the sequence could not be bounded.] Then, the theorem on the convergence of Cauchy sequences (Theorem 95 above) guarantees that there exists a unique accumulation point, namely that the sequence converges.
Chapter 9
Differential calculus
In this chapter, we introduce one of the most important notions in mathematical analysis. Specifically, we use the definition of limit, presented in Chapter 8, to introduce the derivative of a function, a notion that corresponds to the slope of the graph of the function. The material in this chapter is known as differential calculus, a branch of mathematical analysis.
• Overview of this chapter

In Section 9.1, we introduce the definition of derivative of a function, along with its rules. Then, we discuss the derivatives of the functions that we have introduced thus far. Specifically, in Sections 9.2–9.4 we address, respectively, the derivatives of: (i) x^α, where α is a real number, (ii) polynomials and other algebraic functions, and (iii) trigonometric functions and their inverse functions. Next, in Section 9.5, we present some additional results on differentiable functions. Finally, in Section 9.6, we discuss derivatives of a function of several variables, namely the so–called partial derivatives, including the Schwarz theorem on invertibility of the order of second–order mixed derivatives. We also have three appendices. In Appendix A (Section 9.7), we discuss issues related to the second ordinary derivative, such as the curvature of the graph of f(x) and osculating circles. In Appendix B (Section 9.8), we present some general comments regarding operators, in particular, linear operators and linear equations. Finally, in Appendix C (Section 9.9), we introduce a way to obtain convenient numerical approximations for derivatives of various order, namely the so–called finite difference approach, which is used in computations.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_9
9.1 Derivative of a function

As stated above, the concept of the derivative of a function is closely related to that of the slope of the graph of a function, a notion introduced in Subsection 6.3.2 for a straight line. Thus, let us begin with the average slope, namely the slope of the secant (Definition 118, p. 218). This is given by the difference quotient, namely the ratio Δf/Δx, where: (i) Δx := xQ − xP is the horizontal difference, and (ii) Δf := f(xQ) − f(xP) = f(xP + Δx) − f(xP) is the vertical difference. Note the analogy between the difference quotient and the average slope (Eq. 6.29). Therefore, we encounter here a problem similar to that addressed there: if Δx is too large, we can obtain results that are totally unrelated to the notion of local slope (see the secant line L₂, namely the segment PQ₂ in Fig. 9.1). Hence, we have the requirement that xQ be close to xP, namely that Δx be small. [For instance, the secant line L₁, namely the segment PQ₁ in Fig. 9.1, is noticeably more representative than L₂.] Again, the problem is removed if we consider the “local slope,” namely the slope of a portion of the function around a given point xP, with Δx so small that the corresponding portion of the graph may be approximated with a segment. But how small should we choose Δx = xQ − xP?
Fig. 9.1 Local slope
As mathematicians we must be precise, you know. For, to say that Δx is small, even very small, is not adequate enough, because the value that we would obtain for the derivative would not be defined in an absolutely incontrovertible manner, since it changes with Δx, albeit imperceptibly. Unless we arbitrarily choose a specific value for Δx, we would not have a 100% precise definition. It is apparent that the smaller Δx is, the better off we are. Hence, here again (and it will not be the last time), we can get some help from the
quintessential instrument of mathematical analysis: our friend the limit! “Can we take Δx smaller and smaller, namely in the limit as Δx approaches zero?” Answer: “Why not!” Hence, we define the local slope as the limit of the difference quotient, as Δx tends to zero. [After all, recall that the tangent is the limit of a secant, the line PQ in our case, as the point Q approaches P (Section 5.3).] The term used by mathematicians to denote the function corresponding to the local slope of the graph of f(x) is the derivative of f(x). Specifically, we have the following

Definition 186 (Derivative). Consider a continuous function f(x). Assume that

\[ \lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c}. \tag{9.1} \]

[Again, the first limit is called limit from the left, the second, limit from the right (Definition 151, p. 321; Eqs. 8.50 and 8.51).] Then, we say that f(x) is differentiable at c, and the derivative of f(x) with respect to x, evaluated at x = c, denoted by f′(c), is defined as

\[ f'(c) := \lim_{x \to c} \frac{f(x) - f(c)}{x - c}. \tag{9.2} \]

If there exist a and b such that the derivative of f(x) exists for all x ∈ (a, b), we say that f(x) is differentiable in the interval (a, b). If the interval is not open and we are dealing with one of the two endpoints of the interval, Eq. 9.1 is ignored, and only the pertinent limit in Eq. 9.2 is considered there. For instance, if we are dealing with the interval [a, b), at x = a we consider only the limit from the right. Similar considerations apply to (a, b] and [a, b]. [We also use the term “to differentiate f(x)” to mean “to take the derivative of f(x),” and the term “differentiation” to mean “the operation of taking the derivative.”]

◦ Comment. The notation f′(x) was introduced by Newton, who used the term fluxion to refer to the derivatives. [Of course, the symbol prime in Eq. 9.2 has nothing to do with the same symbol used in Remark 29, p. 85. There, we could have used any symbol to distinguish a′ₕₖ from aₕₖ (for instance a*ₕₖ). This is not the case here. In the Newton notation, the derivative of f(x) is denoted exclusively by f′(x).]

◦ Warning. I will often use the more compact notation

\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x}, \tag{9.3} \]
with the understanding that Δf = f(x + Δx) − f(x) and that Eq. 9.1 is satisfied (namely Δx ≷ 0), unless otherwise stated.

◦ Comment. Let us go back to the issue of how small Δx should be. We said that the secant L₁ is noticeably more representative than L₂ (Fig. 9.1 above). However, if you look at it carefully, the slope of L₁ is half as large as that of the tangent L. [We also said that the graph between x and x + Δx (P and Q₁ in our case) should be “close” to a segment. This is not the case here, and this will provide you with a better idea of what we mean by “close.”]
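To see how the difference quotient behaves as Δx shrinks, you may try the short sketch below. The sample function f(x) = x² and the point x = 1 are my own choices for illustration; for this function the difference quotient works out algebraically to 2 + Δx, so that you can check the printed values by hand.

```python
def difference_quotient(f, x, dx):
    """Slope of the secant through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def f(x):
    return x * x  # sample function, chosen only for illustration


# Positive and negative increments, in line with Eq. 9.1.
for dx in (1.0, 0.1, 0.01, 0.001, -0.001):
    print(dx, difference_quotient(f, 1.0, dx))
```

With Δx = 1 the secant slope is 3, quite far from the limiting value 2, whereas with |Δx| = 0.001 the two agree to about three digits, from either side.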
• Implications of the definition

Here, we address some of the implications of Eq. 9.1. The first one is that a function with a slope discontinuity at x = c is not considered differentiable at x = c. For instance, consider the function

\[ \mathrm{abs}(x) := |x|, \qquad x \in (-\infty, \infty), \tag{9.4} \]

depicted in Fig. 9.2. This function is not differentiable at x = 0, because at such a point Eq. 9.1 is not satisfied (the limits of the difference quotient from the left and from the right differ). Therefore, the derivative is not defined for x = 0. [For x ≠ 0, the derivative of |x| coincides with the function sgn(x), defined in Eq. 8.99.] However, things change if we modify the interval of interest into [0, ∞). Now, x = 0 is an endpoint of the interval, and hence only the limit from the right is to be considered. Therefore, in this case the function is differentiable for all the points of the interval [0, ∞), including the point x = 0, where the derivative is equal to 1.

Fig. 9.2 The function abs(x) := |x|
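The disagreement between the two one–sided limits for |x| at x = 0 is easy to observe numerically. In the sketch below, the helper `one_sided_quotients` is a hypothetical name of mine, and the values of h are arbitrary.

```python
def one_sided_quotients(f, c, h):
    """Difference quotients of f at c from the left and from the right (h > 0)."""
    left = (f(c - h) - f(c)) / (-h)   # x = c - h approaches c from the left
    right = (f(c + h) - f(c)) / h     # x = c + h approaches c from the right
    return left, right


for h in (0.1, 0.001, 1e-6):
    print(h, one_sided_quotients(abs, 0.0, h))
```

For every h the pair is (−1, 1): the two quotients do not approach a common value, so Eq. 9.1 is violated at x = 0, whereas the quotient from the right alone gives the value 1 relevant for the interval [0, ∞).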
◦ Comment. We have an important consequence of Eq. 9.2 (use the definition of o[Δx], Eq. 8.75)

\[ f'(x) = \frac{\Delta f}{\Delta x} + o[1], \tag{9.5} \]

which is equivalent to (use x o[1] = o[x], Eq. 8.89)

\[ f(x) - f(c) = \Delta f = f'(c)\, \Delta x + o[\Delta x]. \tag{9.6} \]
• Local slope and derivative

It may be worth reconsidering some of the material in the introductory remarks of this section pertaining to the average slope of a curve, and state in more specific terms the geometrical interpretation of a derivative. The difference quotient equals the trigonometric tangent of the angle that the secant through the points P = (c, f(c)) and Q = (x, f(x)) makes with the x-axis. In the limit, as x tends to c, the secant tends to the tangent line to the curve y = f(x) at x = c, namely

\[ f'(c) = \tan \theta_c, \tag{9.7} \]

where θc is the angle that the local tangent to the curve at x = c makes with the x-axis (Fig. 18.1).
Fig. 9.3 Tangent line vs tan θ
Remark 76. We can state the above result by saying that the derivative of the function f(x) with respect to x, evaluated at x = c, equals the trigonometric tangent of the angle that the geometric tangent to f(x) evaluated at x = c makes with the x-axis. [Here, the term “tangent” takes two completely unrelated meanings in the same sentence: (i) the trigonometric tangent (namely the function tan x), and (ii) the geometric tangent (namely the tangent line to a curve). As pointed out in Footnote 4, p. 180, this may be really confusing. Unfortunately, this terminology is universally accepted and the best we can do is to be careful in distinguishing between the two. The difference will be emphasized, unless it is apparent from the context. Accordingly, I will avoid sentences like “the derivative equals the tangent of the angle that the tangent makes with the x-axis.”]
9.1.1 Differentiability implies continuity
♥
We have seen that continuous functions with slope discontinuities are not differentiable at the points where the slope is discontinuous (Eq. 9.4). We have also seen that the graph of the continuous function f(x) = ∛x has a vertical slope at x = 0, and hence it is not differentiable there. In other words, continuity does not guarantee differentiability. The reverse is a whole different story! Specifically, we have the following

Theorem 96 (Differentiability implies continuity). If a function f(x) is differentiable at x = c, then it is also continuous there.

◦ Proof: Equation 9.6 implies

\[ \lim_{x \to c} f(x) = f(c), \tag{9.8} \]

namely continuity (Eq. 8.93).
Remark 77. On the other hand, in Subsection 9.5.4 we will see that the existence of the derivative f′(x) does not exclude the possibility that f′(x) be discontinuous, in spite of what one would expect from Eq. 9.1. In other words, there exist functions whose derivative exists everywhere in a given interval, but is discontinuous at some points of the same interval.
9.1.2 Differentials. Leibniz notation

Consider the following

Definition 187 (Differential). The differential of a function f(x), denoted by df, is defined as the product of the derivative of f(x) and an increment Δx, namely

\[ df := f'(x)\, \Delta x. \tag{9.9} \]
◦ Comment. Note that the difference Δf is measured on the curve y = f(x), whereas the differential df is measured on the tangent line. Comparing Eqs. 9.6 and 9.9, we see that

\[ df = \Delta f + o[\Delta x]. \tag{9.10} \]

In particular, we may apply Eq. 9.9 to the function f(x) = x, for which we have

\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta x}{\Delta x} = 1. \tag{9.11} \]

Thus, we have, exactly (note the difference with respect to Eq. 9.10),

\[ dx = \Delta x. \tag{9.12} \]

Combining Eqs. 9.9 and 9.12, one obtains

\[ df = f'(x)\, dx. \tag{9.13} \]

[By definition of df, this coincides with the equation of the tangent line.] Dividing by dx yields

\[ f'(x) = \frac{df}{dx}. \tag{9.14} \]
The expression on the right side of Eq. 9.14 is known as the Leibniz notation for a derivative.¹

¹ You may note that the Leibniz notation is more gut–level than that of Newton, in that it reminds us of the limit of the difference quotient. It is believed that this is the reason why, in subsequent years, mathematical analysis flourished more on the continent than in England. [See Boyer (Ref. [13], p. 414) and Hellman (Ref. [30], p. 72).]

◦ Warning. There is an important conceptual difference between df/dx, namely the ratio of the differentials df and dx (no limits involved!), and f′(x), as the limit of the difference quotient (Eq. 9.2). In fact, f′(x) as given in Eq. 9.2 is the definition of derivative, whereas df/dx (Eq. 9.14) is not a definition; it is a consequence of the definition of differentials (Eq. 9.9), and it required a proof (Eqs. 9.11–9.14). Nonetheless, given their equivalence, in the rest of this book, I will use f′(x) and df/dx interchangeably.

◦ Comment on notations. It is important to emphasize a subtle point on notations. We have (the notation $\cdots|_{x=x_0}$ denotes evaluation at x = x₀)

\[ f'(c) := \left. \frac{df(x)}{dx} \right|_{x=c}, \tag{9.15} \]

whereas df(c)/dx = 0 and [f(c)]′ = 0, because f(c) is a constant. [To be one hundred percent clear, for the expressions on either side of Eq. 9.15 we first take the derivative, and then set x = c.]

Remark 78. Note that d/dx stands for an operation that, given a function f(x), gives another function, namely its derivative f′(x). Accordingly, we can conceive of d/dx as an operator, in the sense introduced in Definition 133, p. 253. [More on this in Subsection 9.8.1.] This operator is called the derivative operator, and, more often than not, simply the derivative. The use of the same term to indicate both (i) the function f′(x), and (ii) the operator d/dx, should not cause you any problem. The difference will always emerge clearly from the context. If the meaning is not apparent from the context, I will use the expanded terminology “derivative operator.”
9.1.3 A historical note on infinitesimals
In the original formulation by Leibniz, the quantities df and dx were considered as infinitesimals, namely quantities that are infinitely small, something really ambiguous, because it could mean “as small as you wish (albeit not really zero),” as well as “just plain zero.” Accordingly, in the Leibniz definition, the derivative was considered as a ratio of infinitesimals, df and dx. However, such a use of the concept of infinitesimals was harshly criticized, in particular, by Bishop George Berkeley.² For, infinitesimals were considered as vanishing quantities, namely simultaneously: (i) finite (albeit very small), and (ii) zero. Also, d’Alembert recommended that the idea of infinitesimals be abandoned in favor of the notion of limit (Boyer, Ref. [13], p. 450).³

² George Berkeley (1685–1753), Church of Ireland Bishop of Cloyne, was an Irish philosopher (empiricism), who also had interests in mathematics and physics. The University of California, Berkeley, and the city that developed around it after the university moved to its new location, are named after him. In his book, The Analyst. A Discourse Addressed to the Infidel Mathematician (Ref. [9]), Berkeley refers to infinitesimals as the “ghosts of departed quantities” (Boyer, Ref. [13], pp. 429–431).

³ Jean le Rond d’Alembert (1717–1783) was a French mathematician. He co-edited (responsible for the scientific aspects) the “Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers,” a 28-volume work edited by the Enlightenment French philosopher Denis Diderot (1713–1784), published beginning in 1751. Also named after him is the so–called d’Alembert paradox in fluid dynamics (namely that lift and drag vanish for a body in uniform motion in a steady incompressible potential flow) and the d’Alembert principle of inertial forces (Section 20.4), as well as some results addressed in Vol. III (the d’Alembert traveling wave solution and the d’Alembert–Euler acceleration formula).
I have intentionally avoided defining differentials as infinitesimals, and I will deal with differentials as finite quantities (proportional to Δx, as in Eq. 9.9) without using the word infinitesimal (as a substantive). Indeed, I intend to ban the use of the term infinitesimal (as a substantive), since it is quite controversial.⁴ However, I will retain this term as an adjective, to mean “as small as you wish, but greater than zero.” For instance, the term “an infinitesimal variation of f(x)” refers to Δf ≈ df.
9.1.4 Basic rules for derivatives

In this subsection, we present some of the basic rules for handling derivatives. Specifically, we address:

1. the derivative of a linear combination (linearity);
2. the derivative of the product of two functions (product rule);
3. the derivative of the reciprocal function;
4. the derivative of the ratio of two functions (quotient rule);
5. the derivative of a composite function (Leibniz chain rule);
6. the derivative of an inverse function;
7. the derivative of an even or an odd function.

Illustrative examples of these rules will be presented in Sections 9.2–9.4.
• Linearity of the derivative operator

It should be relatively easy for you to verify, from the definition of derivative, that for any two constants, α and β, we have

\[ \frac{d}{dx} \big[ \alpha f(x) + \beta g(x) \big] = \alpha f'(x) + \beta g'(x). \tag{9.16} \]

[Hint: Use the linearity of the limit operator, Eq. 8.52.] Successive applications of the above equation yield

\[ \frac{d}{dx} \sum_{k=1}^{n} c_k f_k(x) = \sum_{k=1}^{n} c_k f_k'(x). \tag{9.17} \]

⁴ To be precise, some 20th–century developments have made it rigorous to deal with infinitesimals. I am referring to an approach known as nonstandard analysis. This is not adopted here, not only for its level of sophistication, way beyond the objectives of this book, but also because it seems appropriate for you to learn the classical approach before venturing into more recent developments. [If you want to know more about the subject, nonstandard analysis was introduced by Abraham Robinson (Ref. [55]), in 1966. The historical development of this approach and its precursors throughout the centuries is well documented by Philip Ehrlich (Ref. [17]).]
Remark 79. Recall that the operation of differentiation is an operator (Remark 78, p. 368). Next, note that (as indicated in Appendix B to this chapter, on linear operators, specifically in Subsection 9.8.2) an operator is called linear if, and only if, L(αf + βg) = α Lf + β Lg (see Eq. 9.152). Thus, the derivative operator is linear.
• Product rule

Consider a function y(x) that is defined as the product of two functions, say y(x) = f(x) g(x). Its derivative is given by

\[ \frac{d}{dx} \big[ f(x)\, g(x) \big] = f'(x)\, g(x) + f(x)\, g'(x). \tag{9.18} \]

Indeed, using the definition of derivative, we have (use the “splitting trick,” Remark 65, p. 322)

\[
\begin{aligned}
\big[ f(x)\, g(x) \big]'
&= \lim_{x_1 \to x} \frac{1}{x_1 - x} \Big[ f(x_1)\, g(x_1) - f(x)\, g(x) \Big] \\
&= \lim_{x_1 \to x} \frac{1}{x_1 - x} \Big[ f(x_1)\, g(x_1) - f(x)\, g(x_1) + f(x)\, g(x_1) - f(x)\, g(x) \Big] \\
&= \lim_{x_1 \to x} \left[ \frac{f(x_1) - f(x)}{x_1 - x}\, g(x_1) + f(x)\, \frac{g(x_1) - g(x)}{x_1 - x} \right],
\end{aligned}
\tag{9.19}
\]

which yields Eq. 9.18. [Hint: Use the linearity of the limit operator (Eq. 8.52) and the limit of a product (Eq. 8.54).]
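A quick numerical sanity check of Eq. 9.18 may help. In the sketch below, all derivatives are replaced by a symmetric difference quotient with a small fixed step; this stand-in, the helper name `numerical_derivative`, the step size, and the two sample functions are assumptions of mine, not part of the derivation above.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Symmetric difference quotient, used here as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


f = math.sin  # sample functions, chosen only for illustration


def g(x):
    return x * x + 1.0


def product(x):
    return f(x) * g(x)


x = 0.7
lhs = numerical_derivative(product, x)  # derivative of the product
rhs = numerical_derivative(f, x) * g(x) + f(x) * numerical_derivative(g, x)
print(lhs, rhs)
```

The two printed numbers agree to within the accuracy of the difference quotients, as Eq. 9.18 predicts. Of course, such an agreement at one point is an illustration, not a proof.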
• Derivative of reciprocal functions

The derivative of 1/f(x) is given by

\[ \frac{d}{dx} \left[ \frac{1}{f(x)} \right] = \frac{-1}{f^2(x)}\, f'(x). \tag{9.20} \]

Indeed, using the definition of derivative, we have

\[
\begin{aligned}
\frac{d}{dx} \left[ \frac{1}{f(x)} \right]
&= \lim_{x_1 \to x} \frac{1}{x_1 - x} \left[ \frac{1}{f(x_1)} - \frac{1}{f(x)} \right] \\
&= \lim_{x_1 \to x} \frac{-1}{f(x)\, f(x_1)}\, \frac{f(x_1) - f(x)}{x_1 - x},
\end{aligned}
\tag{9.21}
\]

which yields Eq. 9.20 (use the limit of a product, Eq. 8.54).
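• Quotient rule

Consider y(x) = f(x)/g(x). Its derivative is given by

\[ \frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] = \frac{f'(x)\, g(x) - f(x)\, g'(x)}{g^2(x)}. \tag{9.22} \]

Indeed, using the product rule (Eq. 9.18), as well as Eq. 9.20, we have

\[
\begin{aligned}
\frac{d}{dx} \left[ f(x)\, \frac{1}{g(x)} \right]
&= f'(x)\, \frac{1}{g(x)} + f(x) \left[ \frac{1}{g(x)} \right]' \\
&= \frac{f'(x)}{g(x)} + f(x)\, \frac{-g'(x)}{g^2(x)},
\end{aligned}
\tag{9.23}
\]

which is equivalent to Eq. 9.22.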
• Leibniz chain rule

Recall Definition 128, p. 248, of composite functions, namely, given two functions y = f(x) and u = g(y), we can form the composite function u = g[f(x)], where we use the first one, namely y = f(x), to evaluate y, and then use this value as the argument of the second one, namely u = g(y), to evaluate u. The derivative of u = g[f(x)] with respect to x is given by the Leibniz chain rule, namely

\[ \frac{d}{dx}\, g\big[ f(x) \big] = \frac{dg}{dy}\, \frac{df}{dx}. \tag{9.24} \]

Indeed, using Δy = Δf, we have

\[ \Big\{ g\big[ f(x) \big] \Big\}' = \lim_{\Delta x \to 0} \frac{\Delta g}{\Delta x} = \lim_{\Delta x \to 0} \frac{\Delta g}{\Delta y}\, \frac{\Delta f}{\Delta x}, \tag{9.25} \]

which is equivalent to Eq. 9.24 (use the limit of a product, Eq. 8.54).
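The chain rule can be checked numerically along the same lines as its proof: evaluate dg/dy at y = f(x), evaluate df/dx at x, and compare their product with the derivative of the composite function. In this sketch the symmetric difference quotient, its step size, and the sample functions (an inner quadratic and an outer square root) are illustrative assumptions of mine.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Symmetric difference quotient, used here as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def f(x):
    return x * x + 1.0  # inner function, y = f(x)


g = math.sqrt  # outer function, u = g(y)


def composite(x):
    return g(f(x))  # u = g[f(x)]


x = 1.5
y = f(x)  # dg/dy must be evaluated at y = f(x), not at x
lhs = numerical_derivative(composite, x)
rhs = numerical_derivative(g, y) * numerical_derivative(f, x)
print(lhs, rhs)
```

Note the point at which each factor is evaluated: forgetting that dg/dy is taken at y = f(x) is the most common slip when applying Eq. 9.24.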
• Derivative of inverse functions

Consider a function y = f(x), along with its inverse x = f⁻¹(y) = g(y). We have:

\[ \frac{df}{dx} = \frac{1}{g'(y)} = \frac{1}{dg/dy}. \tag{9.26} \]

Indeed, using Δy = Δf and Δx = Δg, we obtain

\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{1}{\Delta x / \Delta f} = \lim_{\Delta x \to 0} \frac{1}{\Delta g / \Delta y}, \tag{9.27} \]

which yields Eq. 9.26.
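Here is a small numerical illustration of Eq. 9.26, using the pair y = x³ and x = ∛y for x > 0. The choice of functions, the point x = 2, and the symmetric difference quotient used in place of the exact derivatives are all assumptions of this sketch.

```python
def numerical_derivative(f, x, h=1e-6):
    """Symmetric difference quotient, used here as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def f(x):
    return x ** 3  # y = f(x)


def g(y):
    return y ** (1.0 / 3.0)  # x = g(y), the inverse of f for y > 0


x = 2.0
y = f(x)  # the inverse function is differentiated at y = f(x)
lhs = numerical_derivative(f, x)
rhs = 1.0 / numerical_derivative(g, y)
print(lhs, rhs)
```

Both numbers are close to 12: the slope of the cube at x = 2 is the reciprocal of the slope of the cube root at y = 8.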
• Derivatives of even and odd functions

Recall that a function fE(x) is called even iff fE(x) = fE(−x) (Eq. 6.7), whereas a function fO(x) is called odd iff fO(x) = −fO(−x) (Eq. 6.8). These imply, if the above functions are differentiable (use the Leibniz chain rule, Eq. 9.24),

\[ f_E'(x) = -f_E'(-x) \quad \text{and} \quad f_O'(x) = f_O'(-x). \tag{9.28} \]

In other words, the derivative of an even differentiable function is odd, whereas that of an odd one is even. If fE(x) is differentiable at x = 0, we have necessarily (Eq. 6.9)

\[ f_E'(0) = 0. \tag{9.29} \]
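Equations 9.28 and 9.29 can be observed on simple examples. In the sketch below, the even function x⁴ − 3x², the odd function x³ − x, and the symmetric difference quotient standing in for the derivative are my own illustrative choices.

```python
def numerical_derivative(f, x, h=1e-6):
    """Symmetric difference quotient, used here as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def f_even(x):
    return x ** 4 - 3.0 * x ** 2  # f_even(x) = f_even(-x)


def f_odd(x):
    return x ** 3 - x  # f_odd(x) = -f_odd(-x)


for x in (0.5, 1.2):
    # Derivative of the even function: opposite values at x and -x.
    # Derivative of the odd function: equal values at x and -x.
    print(x,
          numerical_derivative(f_even, x), numerical_derivative(f_even, -x),
          numerical_derivative(f_odd, x), numerical_derivative(f_odd, -x))

print(numerical_derivative(f_even, 0.0))  # vanishes, in line with Eq. 9.29
```

At x = 0 the symmetric quotient of the even function vanishes identically, because f_even(h) and f_even(−h) coincide for every h.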
9.1.5 Higher–order derivatives
The derivative of $f'(x)$ is called the derivative of $f(x)$ of order two, or more directly the second derivative of $f(x)$, and is denoted by
$$f''(x) := \frac{d}{dx}\left(\frac{df}{dx}\right) = \lim_{\Delta x\to 0}\frac{f(x+\Delta x) - 2 f(x) + f(x-\Delta x)}{\Delta x^2}. \tag{9.30}$$
We will also adopt the universally used notation
$$f''(x) = \frac{d^2 f(x)}{dx^2}. \tag{9.31}$$
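The limit in Eq. 9.30 can be explored numerically. The sketch below is only an illustration: the test function $\sin x$, the point $x_0 = 1$ and the helper name `second_difference` are assumptions chosen for the example. It evaluates the ratio in Eq. 9.30 for decreasing values of $\Delta x$ and compares it with the known second derivative $-\sin x_0$.

```python
import math

def second_difference(func, x, dx):
    """Ratio appearing in Eq. 9.30, evaluated for a finite dx."""
    return (func(x + dx) - 2.0 * func(x) + func(x - dx)) / (dx * dx)

x0 = 1.0
exact = -math.sin(x0)   # second derivative of sin x

for dx in (1e-1, 1e-2, 1e-3):
    approx = second_difference(math.sin, x0, dx)
    print(dx, approx, abs(approx - exact))
```

The printed error should shrink as $\Delta x$ decreases. For much smaller steps, round-off in the numerator eventually spoils the result, which is a limitation of floating-point arithmetic and not of Eq. 9.30.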
◦ Comment. The second derivative of $f(x)$ represents the rate of change of the slope of $f(x)$ and is therefore somewhat related to the curvature of the graph of the function. [The exact relationship between second derivative and curvature is addressed in Subsection 9.7.3.]
Recall that mathematical induction may be used to give a recursive definition (Subsection 8.2.5). Specifically, in order to define the entity $A_n$ for $n = 1, 2, 3, \dots$, it is sufficient: (i) to provide the definition for an initial value, say $n = 1$, and (ii) to define $A_n$ in terms of $A_{n-1}$. Accordingly, the derivative of order $n$ (where $n = 1, 2, \dots$), denoted by $d^n f/dx^n$, may be defined in terms of that of order $n - 1$, as follows
$$f^{(n)}(x) := \left[f^{(n-1)}(x)\right]', \tag{9.32}$$
where we have introduced the convenient and widely used notation
$$f^{(n)}(x) := \frac{d^n f(x)}{dx^n}, \qquad\text{with } f^{(0)}(x) := f(x). \tag{9.33}$$
We have the following
Definition 188 (Function of class $C^n$). A function is said to be of class $C^n$ iff it is continuous with its derivatives up to that of order $n$ (included), in its domain of definition. Accordingly, a function of class $C^0$ is continuous and a function of class $C^1$ is continuously differentiable, namely differentiable with a continuous derivative. [As pointed out in Remark 77, p. 366 (and as we will see in Subsection 9.5.4), it is possible for a derivative to exist but to be discontinuous.] A function is said to be smooth iff it is continuously differentiable. [Note the analogy with the definitions of smooth lines (Definition 166, p. 347) and smooth surfaces (Definition 169, p. 350), since the normal is related to the derivative, as we will see later (see for instance Subsection 18.2.1).] A function is said to be infinitely smooth iff it may be differentiated infinitely many times. I will use the term suitably smooth to mean that it is differentiable as many times as necessary for the issue under consideration.
Warning: Some authors use the term smooth function for what here is called an infinitely smooth function.
9.1.6 Notations for time derivatives
If the variable of differentiation is time, the following notation is widely used and is adopted here:
$$\dot u(t) := \frac{du(t)}{dt}. \tag{9.34}$$
Also used is the notation
$$\ddot u(t) := \frac{d^2 u(t)}{dt^2}, \tag{9.35}$$
as well as
$$[u(t)]^{\cdot} := \frac{du(t)}{dt}. \tag{9.36}$$
◦ Comment. Note that, in analogy with Eq. 9.15 for x-derivatives, we have
$$\dot u(t_0) = \left.\frac{du(t)}{dt}\right|_{t=t_0}, \tag{9.37}$$
whereas $du(t_0)/dt = 0$ and $[u(t_0)]^{\cdot} = 0$, because $u(t_0)$ is a constant.
9.2 Derivative of $x^\alpha$
At this point, we are ready to obtain the derivatives of all the functions that we have introduced thus far, namely $x^\alpha$ (this section), polynomials and other algebraic functions (Section 9.3), and traditional trigonometric functions with their inverses (Section 9.4).
◦ Warning. In the following, for the sake of graphical clarity, I will use the symbol
$$h := \Delta x. \tag{9.38}$$
I claim that the derivative of $x^\alpha$ is given by
$$\frac{d}{dx}\, x^\alpha = \alpha\, x^{\alpha-1}, \tag{9.39}$$
for any real number $\alpha$.
◦ Warning. We have to add the limitation $x \neq 0$ whenever the derivative is not defined at $x = 0$, namely, as we will see, for $\alpha \in (-\infty, 1)$.
We obtain the proof of Eq. 9.39 by assuming α, successively, to be equal to: (i) zero, (ii) a natural number, (iii) an integer, (iv) the reciprocal of an integer, (v) a rational number, and (vi) a real number.
• Motivating the approach chosen
♥
Admittedly, the material in this subsection is lengthy and cumbersome, albeit relatively simple. However, this is the price to pay in order to simplify the overall presentation of the book. Indeed, in the standard approach (at least, the one I was taught when I was a student), the proof is based upon the use of logarithms. Such a proof is much simpler than that presented here (albeit not as gut–level). [Incidentally, in order to satisfy your possible curiosity, such a proof is presented in Eq. 12.13.] However, such an approach requires introducing the logarithm before introducing Eq. 9.39. Again, the way I was taught, the logarithm is defined as the inverse of the exponential function, which therefore must be introduced before the logarithm. This in turn requires the definition of the so–called Napier number $e$ (here introduced in Eq. 12.25), through either of the following expressions (see Rudin, Ref. [56], who presents also a proof of the equivalence of these two definitions, in his Theorem 3.31, p. 64)
$$e := \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n} \qquad\text{or}\qquad e := \lim_{n\to\infty}\sum_{k=0}^{n}\frac{1}{k!}, \tag{9.40}$$
where we have used the following
Definition 189 (n factorial). The number
$$n! := 1\cdot 2\cdot 3\cdot 4 \cdots (n-1)\cdot n \tag{9.41}$$
is known as the factorial of $n$, or simply as $n$ factorial. Note that Eq. 9.41 implies
$$n! = (n-1)!\; n. \tag{9.42}$$
Accordingly, we have by definition
$$0! := 1. \tag{9.43}$$
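For a feeling of how the two expressions in Eq. 9.40 behave, here is a small numerical sketch. The function names `e_from_limit` and `e_from_series` are hypothetical, introduced only for this illustration; the series is accumulated with the recursion $1/k! = [1/(k-1)!]/k$, which follows from Eq. 9.42.

```python
import math

def e_from_limit(n):
    """First expression in Eq. 9.40, evaluated for a finite n."""
    return (1.0 + 1.0 / n) ** n

def e_from_series(n):
    """Second expression in Eq. 9.40: sum of 1/k! for k = 0, ..., n."""
    total = 0.0
    term = 1.0            # holds 1/k!, starting from 1/0! = 1 (Eq. 9.43)
    for k in range(n + 1):
        total += term
        term /= (k + 1)   # 1/(k+1)! = (1/k!)/(k+1), from Eq. 9.42
    return total

for n in (5, 10, 15):
    print(n, e_from_limit(n), e_from_series(n), math.e)
```

In this experiment the partial sums of the series approach $e$ much faster than the powers $(1 + 1/n)^n$, although both expressions share the same limit.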
When I was a student, I had a hard time with such an approach to introduce exponentials and logarithms. Accordingly, here I follow a different path. Specifically, I introduce the logarithm as the function whose derivative equals $1/x$ (Eq. 12.1), and then the exponential function as the inverse of the logarithm. However, the proof of a key property of logarithms ($\ln x^y = y \ln x$, Eq. 12.8) requires the use of Eq. 9.39. Thus, we are trading the lengthy and cumbersome proof of Eq. 9.39, as presented in this section, against a maybe more elegant introduction of logarithms and exponentials than that used here. On the other hand, our approach has a very important advantage, namely it is a good example of gut–level mathematics. And, as stated in Remark 2, p. 10: “Between a long proof that is intuitive, and a short elegant proof that is not easy to grasp at a gut level, I will always choose the first!” Indeed, the proof based upon the use of the logarithms is straightforward, but one does not get a good grasp of “what’s it all about.” In addition, this section provides you with a wealth of illustrations of the rules for the use of derivatives, which were introduced in Subsection 9.1.4. At the end of the day, you’ll come out ahead.
9.2.1 The case α = 0
It is apparent from the graph that Eq. 9.46 holds true for $n = 0$, since $x^0 = 1$ (Eq. 1.54) and the slope of a constant function is zero. [Still not convinced? See Theorem 97 below.] More important is to note that the reverse statement is also true. Indeed, we have the following
Theorem 97. If $f(x)$ is constant, then $f'(x) = 0$, and vice versa.
◦ Proof: For, if the function $f(x)$ is equal to a constant, say $C$, we have
$$\frac{df}{dx} = \lim_{\Delta x\to 0}\frac{C - C}{\Delta x} = 0. \tag{9.44}$$
Vice versa, if a function has a zero derivative, the slope of its graph vanishes and therefore the function equals a constant.
As a consequence, we have the following
Corollary 4. If two functions, $f_1(x)$ and $f_2(x)$, have the same derivative, namely if $f_1'(x) = f_2'(x) = g(x)$, then they differ at most by an arbitrary additive constant $C$:
$$f_1(x) = f_2(x) + C. \tag{9.45}$$
In other words, we can say that a function $f(x)$ is fully determined by its derivative $f'(x)$, except for an arbitrary additive constant.
◦ Proof: Indeed, we have $[f_1(x) - f_2(x)]' = f_1'(x) - f_2'(x) = g(x) - g(x) = 0$, and hence Theorem 97 above yields Eq. 9.45.
9.2.2 The case α = n = 1, 2, . . .
The derivative of $f(x) = x^n$, where $n$ is a natural number, is
$$\frac{dx^n}{dx} = n\, x^{n-1}. \tag{9.46}$$
We can prove this by mathematical induction. Equation 9.46 is trivially true for $n = 0$, for which we have $df/dx = 0$ (Eq. 9.44). Moreover, we have
$$\frac{dx^n}{dx} = \frac{d}{dx}\left(x\, x^{n-1}\right) = x^{n-1} + x\,(n-1)\, x^{n-2} = n\, x^{n-1}, \tag{9.47}$$
in agreement with Eq. 9.46. [Hint: Use $dx^{n-1}/dx = (n-1)\, x^{n-2}$ (premise in Item 2 of the principle of mathematical induction).] Thus, Eq. 9.39 is valid as well for any $\alpha$ equal to a natural number.
◦ Comment. Alternatively, here is a more gut–level and maybe easier–to–remember proof. Using $(x+h)^n = x^n + n\, x^{n-1} h + O[h^2]$ (Eq. 8.84), with $h := \Delta x$ (Eq. 9.38) and $n = 1, 2, \dots$, one obtains
$$\frac{dx^n}{dx} = \lim_{h\to 0}\frac{1}{h}\left[(x+h)^n - x^n\right] = \lim_{h\to 0}\frac{1}{h}\left[n\, x^{n-1} h + O\!\left[h^2\right]\right], \tag{9.48}$$
which yields Eq. 9.46. [The proof holds for $x = 0$ as well, as you may verify.]
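The role of the $O[h^2]$ term in Eq. 9.48 may be appreciated through a short numerical experiment. In the sketch below, the values $n = 5$ and $x = 1.5$ and the helper name `difference_quotient` are assumptions made for illustration. The error of the difference quotient is divided by $h$; if the neglected terms are indeed of order $h^2$ in the numerator, this ratio should settle to a constant (here the binomial coefficient term $n(n-1)x^{n-2}/2$).

```python
def difference_quotient(n, x, h):
    """Difference quotient of x**n, as in Eq. 9.48, for a finite h."""
    return ((x + h) ** n - x ** n) / h

n, x = 5, 1.5
exact = n * x ** (n - 1)                      # Eq. 9.46
leading = 0.5 * n * (n - 1) * x ** (n - 2)    # coefficient of the first neglected term

errors = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    err = abs(difference_quotient(n, x, h) - exact)
    errors.append(err)
    print(h, err, err / h, leading)
```

As $h$ decreases, the third printed column should approach the fourth one, which is consistent with an error proportional to $h$ that vanishes in the limit.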
9.2.3 The case α = −1
We have (for $x \neq 0$)
$$\frac{d}{dx}\,\frac{1}{x} = \frac{-1}{x^2}\, x' = \frac{-1}{x^2} = -x^{-2}. \tag{9.49}$$
[Hint: Use the rule for the derivative of $y(x) = 1/f(x)$, namely $y' = -f'/f^2$ (Eq. 9.20), where $f(x) = x$, along with $x' = 1$ (Eq. 9.46 with $n = 1$).] Thus, Eq. 9.39 is valid for $\alpha = -1$ as well.
9.2.4 The case α = −2, −3, . . .
Setting $\alpha = -m$, with $m > 0$, we have (again for $x \neq 0$)
$$\frac{dx^{-m}}{dx} = \frac{d}{dx}\left(\frac{1}{x}\right)^{m} = m\left(\frac{1}{x}\right)^{m-1}\frac{-1}{x^2} = -m\, x^{-m-1}. \tag{9.50}$$
[Hint: Recall that $x^{-m} = (1/x)^m$ (Eq. 6.102), and use the Leibniz chain rule (Eq. 9.24), along with $(1/x)' = -1/x^2$ (Eq. 9.49).] Thus, Eq. 9.39 is valid for any integer $\alpha$.
9.2.5 The case α = 1/q
The function $y = x^{1/q}$, with $q$ a nonzero integer, was introduced in Subsection 6.5.3, on exponentiation with a rational exponent, as the inverse function of $x = y^q$. Using the rule for the derivatives of inverse functions (Eq. 9.26), one obtains (for $x \neq 0$, whenever $(1/q) - 1 < 0$)
$$\frac{dx^{1/q}}{dx} = \frac{1}{dy^q/dy} = \frac{1}{q\, y^{q-1}} = \frac{1}{q\, x^{(q-1)/q}} = \frac{1}{q\, x^{1-1/q}} = \frac{1}{q}\, x^{(1/q)-1}. \tag{9.51}$$
Thus, Eq. 9.39 is valid as well for $\alpha = 1/q$, with $q$ integer. In particular, for $\alpha = \tfrac{1}{2}$, Eq. 9.39 yields
$$\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}}. \tag{9.52}$$

9.2.6 The case α rational
♣
Let $\alpha = p/q$ be rational, with $p \neq 0$. We have seen that $x^{p/q} = (x^p)^{1/q}$ (Eq. 6.106). We have
$$\frac{dx^{p/q}}{dx} = \frac{d}{dx}\left(x^p\right)^{1/q} = \frac{1}{q}\left(x^p\right)^{(1/q)-1}\frac{dx^p}{dx} = \frac{1}{q}\, x^{(p/q)-p}\; p\, x^{p-1} = \frac{p}{q}\, x^{(p/q)-1}, \tag{9.53}$$
provided that $x \neq 0$, whenever $(p/q) - 1 < 0$. [Hint: Use the Leibniz chain rule (Eq. 9.24), as well as $x^b x^c = x^{b+c}$ (Eq. 6.126), and $(x^b)^c = x^{bc}$ (Eq. 6.127).] Thus, the rule given in Eq. 9.39 is valid for any rational $\alpha$ as well.
9.2.7 The case α real
♥
The function $x^\alpha$, with $\alpha \neq 0$ real, was defined in Section 6.5. Now that we have learned what a limit is, we can say that $x^\alpha$ was defined, essentially, as the limit of $x^{p/q}$ for $p/q$ that tends to a real number, $\alpha$. The derivative of the function $x^\alpha$, with $\alpha$ real, is defined using a similar approach. We could say (again for $x \neq 0$, when $\alpha - 1 < 0$)
$$\frac{dx^\alpha}{dx} = \lim_{p/q\to\alpha}\;\lim_{x_1\to x}\frac{x_1^{p/q} - x^{p/q}}{x_1 - x} = \lim_{p/q\to\alpha}\frac{p}{q}\, x^{(p/q)-1} = \alpha\, x^{\alpha-1}, \tag{9.54}$$
in agreement with Eq. 9.39. In summary, we have obtained that Eq. 9.39 is valid for any real $\alpha$.
• Being picky like any good mathematician should be
♠
Note that, to make our lives simple, in Eq. 9.54 the correct order of the limits has been interchanged. Indeed, the correct order of the limits is
$$\frac{dx^\alpha}{dx} = \lim_{x_1\to x}\;\lim_{p/q\to\alpha}\frac{x_1^{p/q} - x^{p/q}}{x_1 - x}, \tag{9.55}$$
and not the reverse, as we did in Eq. 9.54. Accordingly, if we want to be picky like any good mathematician should be, we have to be careful here. Indeed, as shown in Subsection 8.7.1, it is not always possible to interchange the order in which we take the limits. Thus, we cannot interchange the limits, namely assume that the derivative (limit of the difference quotient) of $x^\alpha$ (limit of $x^{p/q}$ as $p/q$ tends to $\alpha$) necessarily equals the limit, as $p/q$ tends to $\alpha$, of the derivative of $x^{p/q}$. We should (and can) do better than that. Indeed, to circumvent the problem, note that
$$\frac{x_1^{\alpha} - x^{\alpha}}{x_1 - x} \;\approx\; \frac{x_1^{p/q} - x^{p/q}}{x_1 - x} \;\approx\; \frac{p}{q}\, x^{p/q-1} \;\approx\; \alpha\, x^{\alpha-1}. \tag{9.56}$$
Specifically, we have that: (i) in the limit, as p/q tends to α, the second term tends to the first one, so that their difference may be made as small as we wish; (ii) as x1 tends to x, the second term tends to the third one, so that, again, their difference may be made as small as we want; (iii) in the limit, as p/q tends to α, the third term tends to the fourth one, so that, in this case as well, their difference may be made as small as we want. To exploit these facts, we note that, using the splitting trick (Remark 65, p. 322), as well as |A + B| ≤ |A| + |B| (Eq. 1.70), we have
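$$\left|\frac{x_1^{\alpha} - x^{\alpha}}{x_1 - x} - \alpha\, x^{\alpha-1}\right| \le \left|\frac{x_1^{\alpha} - x^{\alpha}}{x_1 - x} - \frac{x_1^{p/q} - x^{p/q}}{x_1 - x}\right| + \left|\frac{x_1^{p/q} - x^{p/q}}{x_1 - x} - \frac{p}{q}\, x^{p/q-1}\right| + \left|\frac{p}{q}\, x^{p/q-1} - \alpha\, x^{\alpha-1}\right|, \tag{9.57}$$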
so as to have, on the right side, three terms each of which may be made as small as we wish, so that the left side may be made smaller than any arbitrarily small $\varepsilon > 0$. Specifically, let us begin by considering the first term on the right side of Eq. 9.57. Note that here the values of $x$ and $x_1$ are fixed; what changes is $p/q$. Thus, by definition of $x^\alpha$ (Definition 127, p. 247), given any $\varepsilon_1 > 0$ arbitrarily small, there exists a $\delta_1$ such that
$$\left|x^{\alpha} - x^{p/q}\right| < \varepsilon_1 \qquad\text{and}\qquad \left|x_1^{\alpha} - x_1^{p/q}\right| < \varepsilon_1, \tag{9.58}$$
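whenever $|\alpha - p/q| < \delta_1$. Hence, for the first term on the right side of Eq. 9.57, choose $\varepsilon_1 := \varepsilon\, |x_1 - x|/6$ to get (use again $|A + B| \le |A| + |B|$, Eq. 1.70)
$$\left|\frac{x_1^{\alpha} - x^{\alpha}}{x_1 - x} - \frac{x_1^{p/q} - x^{p/q}}{x_1 - x}\right| \le \frac{\left|x_1^{\alpha} - x_1^{p/q}\right|}{|x_1 - x|} + \frac{\left|x^{\alpha} - x^{p/q}\right|}{|x_1 - x|} < \frac{2\,\varepsilon_1}{|x_1 - x|} = \frac{\varepsilon}{3}, \tag{9.59}$$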
whenever $|\alpha - p/q| < \delta_1$. Next, consider the second term on the right side of Eq. 9.57. By definition of the derivative of $x^{p/q}$, we have that given any $\varepsilon_2 := \varepsilon/3 > 0$, there exists a $\delta_2$ such that (use Eq. 9.53)
$$\left|\frac{x_1^{p/q} - x^{p/q}}{x_1 - x} - \frac{p}{q}\, x^{p/q-1}\right| < \varepsilon_2 = \frac{\varepsilon}{3}, \tag{9.60}$$
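whenever $|x_1 - x| < \delta_2$. Then, consider the last term on the right side of Eq. 9.57. By definition of $\alpha$ and $x^\alpha$, we know that, given any $\varepsilon_3 = \varepsilon/3 > 0$, there exists a $\delta_3$ such that, whenever $|p/q - \alpha| < \delta_3$, we have (use again Eq. 1.70, and the splitting trick, Remark 65, p. 322)
$$\left|\frac{p}{q}\, x^{p/q-1} - \alpha\, x^{\alpha-1}\right| \le \left|\frac{p}{q}\left[x^{p/q-1} - x^{\alpha-1}\right]\right| + \left|\left(\frac{p}{q} - \alpha\right) x^{\alpha-1}\right| < \varepsilon_3 = \frac{\varepsilon}{3}. \tag{9.61}$$
Finally, combining Eqs. 9.57, 9.59, 9.60 and 9.61, we obtain that
$$\left|\frac{x_1^{\alpha} - x^{\alpha}}{x_1 - x} - \alpha\, x^{\alpha-1}\right| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, \tag{9.62}$$
whenever $|p/q - \alpha| < \min[\delta_1, \delta_3]$ and $|x_1 - x| < \delta_2$.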
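The third term on the right side of Eq. 9.57, namely the gap between $(p/q)\, x^{p/q-1}$ and $\alpha\, x^{\alpha-1}$, can be watched numerically as $p/q$ approaches $\alpha$. The sketch below is only an illustration under stated assumptions: it picks $\alpha = \sqrt{2}$ and $x = 3$, and it generates the rational approximations $p/q$ with the standard-library call `Fraction.limit_denominator`, which returns the closest fraction with a bounded denominator. None of these choices comes from the text.

```python
from fractions import Fraction
import math

alpha = math.sqrt(2.0)                    # real exponent (illustrative choice)
x = 3.0                                   # point of evaluation (illustrative choice)
target = alpha * x ** (alpha - 1.0)       # right side of Eq. 9.39

gaps = []
for max_q in (1, 10, 100, 1000):
    r = Fraction(alpha).limit_denominator(max_q)   # rational p/q close to alpha
    value = float(r) * x ** (float(r) - 1.0)       # derivative of x**(p/q), Eq. 9.53
    gaps.append(abs(value - target))
    print(r, value, target)
```

As the bound on the denominator grows, $p/q$ moves closer to $\alpha$ and the printed values should approach the target, in line with the statement that this term may be made as small as we wish.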
9.3 Additional algebraic derivatives
In this section, we present the derivatives of some algebraic functions, which may be obtained using the results obtained thus far. Let us begin with a simple application. Applying the rule for a linear combination (Eq. 9.16), as well as $x' = 1$ (Eq. 9.46, with $n = 1$), we have
$$\frac{d}{dx}(a\, x + b) = a, \tag{9.63}$$
in agreement with the fact that straight lines have a constant average slope, and hence their derivative equals the coefficient of $x$ (see Eq. 6.30). Next, consider the Leibniz chain rule (Eq. 9.24), namely: given $y = g(u)$ and $u = f(x)$, we have $y'(x) = \{g[f(x)]\}' = g'(u)\, f'(x)$. In particular, if $f(x) = a\, x + b$, for which $f'(x) = a$ (Eq. 9.63), we have
$$\frac{d}{dx}\, g(a\, x + b) = a\,\frac{dg}{du}. \tag{9.64}$$
• Polynomials
Here, we consider the derivative of polynomials. Recall that polynomials are linear combinations of the natural powers of $x$ (see Eq. 6.25). Hence, their derivatives are easily obtained from those of $x^k$, namely $dx^k/dx = k\, x^{k-1}$ (Eq. 9.46), by using the rule for the derivative of a linear combination (Eq. 9.17). Specifically, we have
$$\frac{d}{dx}\, p_n(x) = \sum_{k=0}^{n} a_k\,\frac{dx^k}{dx} = \sum_{k=1}^{n} k\, a_k\, x^{k-1}. \tag{9.65}$$
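Equation 9.65 translates almost literally into an operation on the list of coefficients. The following sketch assumes that a polynomial is stored as the list $[a_0, a_1, \dots, a_n]$; the function names `poly_derivative` and `poly_eval` and the sample polynomial are hypothetical, chosen only for this illustration.

```python
def poly_derivative(coeffs):
    """Given [a_0, a_1, ..., a_n], return [1*a_1, 2*a_2, ..., n*a_n] (Eq. 9.65)."""
    # The k = 0 term is dropped, because the derivative of a constant vanishes.
    return [k * a_k for k, a_k in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the sum of a_k * x**k."""
    return sum(a_k * x ** k for k, a_k in enumerate(coeffs))

p = [4.0, -3.0, 0.0, 2.0]    # p(x) = 4 - 3 x + 2 x^3
dp = poly_derivative(p)      # coefficients of -3 + 6 x^2

print(dp, poly_eval(dp, 2.0))
```

Storing the coefficients in order of increasing power keeps the index of each entry equal to the exponent $k$, so that the factor $k\, a_k$ in Eq. 9.65 is obtained directly from the position in the list.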
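• Derivative of $f(x) = (1 \pm x^\alpha)^\beta$
Here, we consider the derivative of $f(x) = (1 \pm x^\alpha)^\beta$. Set $u = 1 \pm x^\alpha$ and use the Leibniz chain rule (Eq. 9.24) and $dx^\alpha/dx = \alpha\, x^{\alpha-1}$ (Eq. 9.39) to obtain
$$\frac{d}{dx}\left(1 \pm x^\alpha\right)^\beta = \frac{du^\beta}{du}\,\frac{d}{dx}\left(1 \pm x^\alpha\right) = \pm\,\beta\, u^{\beta-1}\,\alpha\, x^{\alpha-1}, \tag{9.66}$$
or
$$\frac{d}{dx}\left(1 \pm x^\alpha\right)^\beta = \pm\,\alpha\,\beta\left(1 \pm x^\alpha\right)^{\beta-1} x^{\alpha-1}. \tag{9.67}$$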
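In particular, if we choose $\alpha = 2$ and $\beta = \tfrac{1}{2}$, we obtain that the derivative of the function $f(x) = (1 \pm x^2)^{1/2} = \sqrt{1 \pm x^2}$ is given by
$$\frac{d}{dx}\sqrt{1 \pm x^2} = \frac{\pm x}{\sqrt{1 \pm x^2}}. \tag{9.68}$$
Similarly, for $\alpha = 2$ and $\beta = -\tfrac{1}{2}$, we obtain that the derivative of the function $f(x) = (1 \pm x^2)^{-1/2} = 1/\sqrt{1 \pm x^2}$ is given by
$$\frac{d}{dx}\,\frac{1}{\sqrt{1 \pm x^2}} = \frac{\mp x}{(1 \pm x^2)^{3/2}}. \tag{9.69}$$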
9.4 Derivatives of trigonometric and inverse functions
Here, we present the derivatives of the traditional trigonometric functions (namely sine, cosine and tangent), and their inverse functions (Eq. 6.134). We have the following
Theorem 98 (Derivatives of trigonometric functions and inverse). The derivatives of sine, cosine, tangent, and their inverse functions are given by
$$\frac{d}{dx}\sin x = \cos x, \tag{9.70}$$
$$\frac{d}{dx}\cos x = -\sin x, \tag{9.71}$$
$$\frac{d}{dx}\tan x = 1 + \tan^2 x = \frac{1}{\cos^2 x}, \tag{9.72}$$
$$\frac{d}{dx}\sin^{-1} x = \frac{\pm 1}{\sqrt{1 - x^2}}, \tag{9.73}$$
$$\frac{d}{dx}\cos^{-1} x = \frac{\mp 1}{\sqrt{1 - x^2}}, \tag{9.74}$$
$$\frac{d}{dx}\tan^{-1} x = \frac{1}{1 + x^2}. \tag{9.75}$$
◦ Proof of Eq. 9.70. Setting $h := \Delta x$ (Eq. 9.38), and recalling the addition formula of the sine function, namely $\sin(x + h) = \sin x \cos h + \cos x \sin h$ (Eq. 7.47), one obtains
$$\frac{1}{h}\left[\sin(x + h) - \sin x\right] = \sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h}{h}. \tag{9.76}$$
Next, recall Eqs. 8.85 and 8.86, namely
$$\lim_{h\to 0}\frac{\cos h - 1}{h} = 0 \qquad\text{and}\qquad \lim_{h\to 0}\frac{\sin h}{h} = 1. \tag{9.77}$$
Then, taking the limit as $h$ tends to zero, Eq. 9.76 yields the desired result.
◦ Proof of Eq. 9.71. Recall the addition formula of the cosine function, namely $\cos(x + h) = \cos x \cos h - \sin x \sin h$ (Eq. 7.48). This implies
$$\frac{1}{h}\left[\cos(x + h) - \cos x\right] = \cos x\,\frac{\cos h - 1}{h} - \sin x\,\frac{\sin h}{h}, \tag{9.78}$$
which in turn, taking the limit as $h$ tends to zero and using again Eq. 9.77, yields the desired result.
◦ Proof of Eq. 9.72. Using the quotient rule, namely $[f/g]' = [f' g - g' f]/g^2$ (Eq. 9.22), as well as Eqs. 9.70 and 9.71, and recalling that $\sin^2 x + \cos^2 x = 1$ (Eq. 6.75) and $1/\cos^2 x = 1 + \tan^2 x$ (Eq. 6.76), we have
$$\frac{d}{dx}\tan x = \frac{1}{\cos^2 x}\left[\cos x\,\frac{d}{dx}\sin x - \sin x\,\frac{d}{dx}\cos x\right] = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = 1 + \tan^2 x. \tag{9.79}$$
◦ Proof of Eq. 9.73. Setting $y = \sin^{-1} x$, and using the rule for the inverse function derivative, namely $f'(x) = 1/g'(y)$, where $x = g(y)$ is the inverse of $y = f(x)$ (Eq. 9.26), we have (use Eq. 9.70)
$$\frac{d}{dx}\sin^{-1} x = \frac{1}{d(\sin y)/dy} = \frac{1}{\cos y} = \frac{\pm 1}{\sqrt{1 - \sin^2 y}} = \frac{\pm 1}{\sqrt{1 - x^2}}. \tag{9.80}$$
[For the sign ambiguity, see Remark 80 below.]
◦ Proof of Eq. 9.74. Similarly, setting $y = \cos^{-1} x$, we have (use Eq. 9.71)
$$\frac{d}{dx}\cos^{-1} x = \frac{1}{d(\cos y)/dy} = \frac{1}{-\sin y} = \frac{\mp 1}{\sqrt{1 - \cos^2 y}} = \frac{\mp 1}{\sqrt{1 - x^2}}. \tag{9.81}$$
[See again Remark 80 below.]
◦ Proof of Eq. 9.75. Similarly, setting $y = \tan^{-1} x$, we have
$$\frac{d}{dx}\tan^{-1} x = \frac{1}{d(\tan y)/dy} = \frac{1}{1 + \tan^2 y} = \frac{1}{1 + x^2}. \tag{9.82}$$
[Hint: Use $[\tan y]' = 1 + \tan^2 y$ (Eq. 9.72).]
Remark 80. The sign ambiguity in Eqs. 9.73 and 9.74 depends upon the fact that $\sin^{-1} x$ and $\cos^{-1} x$ are multi–valued (see Section 6.7, on single–valued and multi–valued functions). However, if we restrict ourselves to the principal branch of $\sin^{-1} x$, namely $y = \sin_P^{-1} x$ with $y \in [-\pi/2, \pi/2]$ (Eq. 6.135), then the ambiguity disappears; the plus sign holds, as you may verify (see Fig. 6.18). [Similar considerations apply to $\cos_P^{-1} x$; the minus sign holds.]
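The six formulas of Theorem 98 may be spot–checked with a symmetric difference quotient. This is a rough sketch under stated assumptions: the point $x_0 = 0.3$, the step size and the helper name `central_diff` are arbitrary choices made for the example. Python's `math.asin` and `math.acos` return the principal branches, so the plus sign is used in Eq. 9.73 and the minus sign in Eq. 9.74, as in Remark 80.

```python
import math

def central_diff(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 0.3
root = math.sqrt(1.0 - x0 * x0)

# Each entry pairs a numerical derivative with the formula from Theorem 98.
checks = {
    "sin":  (central_diff(math.sin, x0),  math.cos(x0)),                  # Eq. 9.70
    "cos":  (central_diff(math.cos, x0), -math.sin(x0)),                  # Eq. 9.71
    "tan":  (central_diff(math.tan, x0),  1.0 + math.tan(x0) ** 2),       # Eq. 9.72
    "asin": (central_diff(math.asin, x0), 1.0 / root),                    # Eq. 9.73, plus sign
    "acos": (central_diff(math.acos, x0), -1.0 / root),                   # Eq. 9.74, minus sign
    "atan": (central_diff(math.atan, x0), 1.0 / (1.0 + x0 * x0)),         # Eq. 9.75
}

for name, (numerical, formula) in checks.items():
    print(name, numerical, formula)
```

Each pair of printed numbers should agree to many digits; the small residual difference is due to the finite step of the difference quotient.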
9.5 More topics on differentiable functions
In this section, we address a variety of important topics regarding differentiable functions. Specifically, in Subsection 9.5.1, we discuss local maxima/minima of differentiable functions (an issue touched upon in Subsection 8.5.4 for continuous functions). In Subsection 9.5.2, we present the Rolle theorem and the mean value theorem in differential calculus. In Subsection 9.5.3, we introduce the l’Hôpital rule. Finally, in Subsection 9.5.4, we show that, in spite of Eq. 9.1, it is possible to have functions whose derivative exists but is discontinuous.
9.5.1 Local maxima/minima of differentiable functions
Here, we examine issues related to what happens when a differentiable function attains a local maximum or a local minimum. [We may limit ourselves to local minima. For, the considerations for a local maximum are virtually identical. Indeed, multiplying $f(x)$ by $-1$, a maximum becomes a minimum, and vice versa.] We have the following
Theorem 99 (Local minima of differentiable functions). If a function $f(x)$ attains a local minimum at a point $c$ where it is differentiable, then necessarily $f'(c) = 0$.
◦ Proof: Recall Definition 161, p. 338, of local minimum, namely: “A function $f(x)$ defined on $(a, b)$ is said to attain a local minimum at $x = c \in (a, b)$ iff there exists an $\varepsilon$ arbitrarily small, with $a < c - \varepsilon < c < c + \varepsilon < b$, such that $f(x) > f(c)$, for all $x \in (c - \varepsilon, c + \varepsilon)$ that are different from $c$.” Thus, by definition of minimum, we have
$$\lim_{x\to c^-}\frac{f(x) - f(c)}{x - c} \le 0 \qquad\text{and}\qquad \lim_{x\to c^+}\frac{f(x) - f(c)}{x - c} \ge 0. \tag{9.83}$$
Still by hypothesis, $f(x)$ is differentiable at $x = c$, and hence the two limits must be equal (Eq. 9.1). The only possibility for this to occur is to have $f'(c) = 0$.
◦ Comment. Note that the above theorem does not require the continuity of $f'(x)$, only its existence. [Accordingly, the same holds true for Theorems 100 (Rolle) and 101, p. 386.]
Note that, if $f'(c) = 0$, then the tangent to the graph of $y = f(x)$ at $c$ is a horizontal line. Then, you might have a question for me: “If $f'(c) = 0$, do we necessarily have a maximum or a minimum?” The answer is: “Not necessarily!” To see this, let us first introduce the following
Definition 190 (Stationary point). A point $x = c$ where $f'(c) = 0$ is called a stationary point. [We can also say that $f(x)$ is stationary at $x = c$.]
At a stationary point, say at $x = c$, a function does not necessarily have a local maximum, or a local minimum. It is possible to have $f'(c) = 0$, with $f(x) > f(c)$ on one side and $f(x) < f(c)$ on the other. As an example, consider the function $f(x) = x^3$, depicted in Fig. 9.4.
Fig. 9.4 The function $f(x) = x^3$

We have the following
Definition 191 (Horizontal inflection point). Consider a differentiable function $f(x)$. Assume that $f'(c) = 0$, and that, in an arbitrarily small neighborhood of $c$, we have $f(x) > f(c)$ on one side of $c$, and $f(x) < f(c)$ on the other. Then, the point $x = c$ is called a horizontal inflection point.
Going back to the example above, the function $f(x) = x^3$ has a horizontal inflection point at $x = 0$.
9.5.2 Rolle theorem and mean value theorem
Consider a differentiable function that, starting from zero, initially goes upwards, and eventually returns to zero. Remember that “what goes up, must come down,” as an old song says (Spinning Wheel, by Blood, Sweat & Tears, 1969). If we apply this idea to the graph of such a function, at the point where we shift from “go up” to “go down” the slope is necessarily horizontal. These remarks are the basis for the following⁵
Theorem 100 (Rolle). Consider a real function $f(x)$ that is continuous in $[a, b]$ and differentiable in $(a, b)$, with $f(a) = f(b)$. Then, there exists at least one point $c \in (a, b)$ with $f'(c) = 0$.
◦ Proof: By hypothesis, $f(x)$ is continuous on $[a, b]$, and hence attains both its global maximum and its global minimum at some points of $[a, b]$ (Theorem 84, p. 341). First, consider the case in which both the maximum and minimum occur at the endpoints. This case is trivial; for, recalling that $f(a) = f(b)$, we have that $f(x)$ is necessarily constant over $[a, b]$, and hence $f'(x) = 0$ at every point of $(a, b)$. Next, consider the case in which this is not true. Then, we are guaranteed that at least one of the two (either a global maximum or a global minimum) occurs at an interior point of the interval, say at $x = c \in (a, b)$ (Weierstrass theorem, Theorem 84, p. 341). Let us assume that we have a minimum, namely that there exists a $c$ such that $f(c)$ is a global minimum. [If we have only a global maximum, the formulation is analogous.] If $f(x)$ is constant in a whole (possibly one–sided) neighborhood $J$ of $c$, then $f'(x) = 0$ in the whole $J$. If this is not the case, there exists a neighborhood of $c$ with $f(x) > f(c)$, and hence the global minimum is also a local minimum. Then, using Theorem 99, p. 384, we have that $f'(c) = 0$.
We also have the following
Theorem 101 (Mean value theorem in differential calculus). If a real function $f(x)$ is continuous in $[a, b]$ and differentiable in $(a, b)$, then there exists a $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{9.84}$$
◦ Proof: Consider the function $F(x) = (b - a)\, f(x) - (b - x)\, f(a) - (x - a)\, f(b)$. Note that $F(a) = F(b) = 0$. Hence, the Rolle theorem (Theorem 100 above) applies. Therefore, there exists a point $x = c$ such that $F'(c) = (b - a)\, f'(c) + f(a) - f(b) = 0$, in agreement with Eq. 9.84.
For a gut–level appreciation, note that $F(x)/(b - a) = f(x) - f_L(x)$ (where $f_L(x) = f(a)\,(b - x)/(b - a) + f(b)\,(x - a)/(b - a)$ is the linear interpolation of the endpoint values) vanishes at the endpoints. This means that there exists a point $x$ where $f(x)$ and $f_L(x)$ have the same derivative (Rolle theorem, Theorem 100 above), in agreement with Eq. 9.84.
It is interesting to interpret the theorem in terms of the speed of your car. Assume that you cover 80 km (that is, about 50 miles) in one hour. Let $x = f(t)$ describe the motion of your car. By placing the origin at the starting point we have $f(0) = 0$, whereas $f(1) = 80$. [Here, time is measured in hours, space in kilometers.] Accordingly, your average speed is 80 km/h (or about 50 mph). This means that at times you were driving faster and at times slower than 80 km/h (unless you were traveling at exactly 80 km/h the entire time). The velocity is a differentiable function (unless your car is capable of an infinitely large acceleration, and this is crucial), and hence it is a continuous function. Therefore, there must be at least one instant when the speed of your car was exactly equal to 80 km/h.

⁵ The theorem is named after Michel Rolle (1652–1719), a French mathematician, who published it in 1691. He also introduced the notation $\sqrt[n]{a}$ to denote the n-th root of $a$.
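The car interpretation can be tried out on a made–up trip. In the sketch below, the position law $f(t) = 80\, t^2 (3 - 2t)$ is purely hypothetical: it is chosen only because it satisfies $f(0) = 0$ and $f(1) = 80$, with the car starting and ending at rest. The instant $c$ of Eq. 9.84 is then located by bisection, a standard root–finding technique that is not part of the text; the function names are likewise assumptions.

```python
import math

def position(t):
    """Hypothetical position law (km), with position(0) = 0 and position(1) = 80."""
    return 80.0 * t * t * (3.0 - 2.0 * t)

def velocity(t):
    """Derivative of the hypothetical position law (km/h)."""
    return 480.0 * t * (1.0 - t)

def bisect(func, lo, hi, tol=1e-12):
    """Find a zero of func in [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # the sign change lies in the upper half
        else:
            hi = mid                # the sign change lies in the lower half
    return 0.5 * (lo + hi)

a, b = 0.0, 1.0
mean_slope = (position(b) - position(a)) / (b - a)   # right side of Eq. 9.84

# The velocity is below the average at t = 0 and above it at t = 0.5.
c = bisect(lambda t: velocity(t) - mean_slope, 0.0, 0.5)
print(c, velocity(c), mean_slope)
```

For this particular trip there is a second such instant in the interval $(0.5, 1)$, by symmetry; the theorem guarantees only that at least one exists.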
9.5.3 The l’Hôpital rule
♣
Consider the limit of the function $y(x) := f(x)/g(x)$. We have seen that, when $x$ tends to $c$, if $f(x)$ tends to $f_c$ and $g(x)$ tends to $g_c$, then the function $y(x)$ tends to $f_c/g_c$, provided that $g_c \neq 0$ (Eq. 8.59). If $g_c = 0$, Eq. 8.49 (namely $\lim_{x\to c} f(x)/g(x) = \infty$) applies, provided of course that $f_c \neq 0$. However, if both functions, $f(x)$ and $g(x)$, tend to zero, we do not know what the limit is without additional information, as in the case $(\sin x)/x$, addressed in Eq. 8.62. The same considerations hold if both $f(x)$ and $g(x)$ tend to infinity. Fortunately, in these cases, we may use the following⁶

⁶ Named after the French mathematician Guillaume François Antoine, Marquis de l’Hôpital (1661–1704), the author of the first (and highly influential) textbook on differential calculus to appear in print, namely Analyse des infiniment petits pour l’intelligence des lignes courbes, 1696, Ref. [42]. This book is based upon the private lectures given to him by the Swiss mathematician Johann Bernoulli (1667–1748), and includes the l’Hôpital rule. It appears that the l’Hôpital rule is actually due to Johann Bernoulli (Boyer, Ref. [13], pp. 420–421; Kline, Ref. [38], p. 383). Since we are at it, let me tell you that, in the case of Bernoulli, the use of the full name is necessary. Indeed, there are at least seven members of the Bernoulli family who contributed to mathematics and mechanics. Therefore, in order to avoid confusion, a short synopsis of the Bernoulli family is presented here, for the limited cases of those family members that we will encounter in this book. Johann was introduced to calculus by his older brother Jacob Bernoulli (1655–1705), who had studied it directly on Leibniz’s papers. Later on, these two brothers didn’t quite get along. Indeed, they ended up being in a fierce competition. Among other things, their name is connected with the problem of the catenary (the shape of a chain under its weight): Jacob proposed the problem and Johann was among those who found the solution. Also, Johann had a son, Daniel Bernoulli (1700–1782). Johann didn’t get along with his own son either, as he felt threatened by him. The Bernoulli theorem in fluid dynamics is named after Daniel. The Euler–Bernoulli beam is based upon the discoveries by Jacob Bernoulli, and was completed by Euler and Daniel. [You might want to know more about the feud between the Bernoulli brothers (Jacob and Johann), which is very nicely presented by Hellman (Ref. [30], Ch. 4). You may consult also Boyer (Ref. [13], pp. 415–424), as well as Kline (Ref. [38], pp. 381–383).]

Theorem 102 (l’Hôpital rule). Consider two real functions $f(x)$ and $g(x)$ that are both differentiable in a neighborhood $N$ of $x = c$. Assume that we have $f(c) = g(c) = 0$, or that both functions tend to infinity, as $x$ tends to $c$. Then, we have
$$\lim_{x\to c}\frac{f(x)}{g(x)} = \lim_{x\to c}\frac{f'(x)}{g'(x)}, \tag{9.85}$$
provided of course that the second limit exists. [The neighborhood may be one–sided (Definition 144, p. 302).]
◦ Proof: Consider first the case in which $f(c) = g(c) = 0$. Note that, in general, $y(x) - y(c) = y'(c_*)\,(x - c)$ (Eq. 9.84), with $c_* \in (c, x)$. Accordingly, for any $x \in N$, we have (use $f(c) = g(c) = 0$)
$$f(x) = f'(c_1)\,(x - c) \qquad\text{and}\qquad g(x) = g'(c_2)\,(x - c), \tag{9.86}$$
where $c_1 \in (c, x)$ and $c_2 \in (c, x)$ are not necessarily equal. Hence, for any $x \in N$, we have
$$\frac{f(x)}{g(x)} = \frac{f'(c_1)}{g'(c_2)}. \tag{9.87}$$
Taking the limit of the above equation as $x$ tends to $c$, we obtain Eq. 9.85. [For, as $x$ tends to $c$, both $c_1$ and $c_2$ also tend to $c$.]
Next, consider the second part of the theorem, namely the case in which both $f(x)$ and $g(x)$ tend to infinity, as $x$ tends to $c$. In this case, let us introduce the functions $F(x) = 1/f(x)$ and $G(x) = 1/g(x)$, for which we have $F(c) = G(c) = 0$. Thus, we have, in analogy with Eq. 9.87,
$$\frac{F(x)}{G(x)} = \frac{F'(c_1)}{G'(c_2)}, \tag{9.88}$$
which is equivalent to (use the derivative of the reciprocal function, Eq. 9.20)
$$\frac{1/f(x)}{1/g(x)} = \frac{f'(c_1)/f^2(c_1)}{g'(c_2)/g^2(c_2)}, \tag{9.89}$$
or
$$\frac{f^2(c_1)/f(x)}{g^2(c_2)/g(x)} = \frac{f'(c_1)}{g'(c_2)}. \tag{9.90}$$
Again, the limit of the above equation, as $x$ tends to $c$, yields Eq. 9.85.
◦ Comment. If the result is still an undetermined form, one may use the limit of the ratio of the second derivatives (and so on).
• Illustrative examples
In order for you to get a better grasp of the l’Hôpital rule, let us consider some illustrative examples. Let us begin with the function $(\sin x)/x$ (Eq. 8.62). Using $(\sin x)' = \cos x$ (Eq. 9.70), Eq. 9.85 yields
$$\lim_{x\to 0}\frac{\sin x}{x} = \lim_{x\to 0}\frac{\cos x}{1} = 1, \tag{9.91}$$
in agreement with Eq. 8.62. Next, consider the limit of $(\tan x)/x$ (again Eq. 8.62). Using $(\tan x)' = 1/\cos^2 x$ (Eq. 9.72), the l’Hôpital rule (Eq. 9.85) yields
$$\lim_{x\to 0}\frac{\tan x}{x} = \lim_{x\to 0}\frac{1/\cos^2 x}{1} = 1, \tag{9.92}$$
again in agreement with Eq. 8.62. Finally, consider the limit of $(1 - \cos x)/x^2$ (Eq. 8.68). Using Eqs. 9.85 and 9.91 and $(\cos x)' = -\sin x$ (Eq. 9.71) yields
$$\lim_{x\to 0}\frac{1 - \cos x}{x^2} = \lim_{x\to 0}\frac{\sin x}{2\, x} = \frac{1}{2}, \tag{9.93}$$
in agreement with Eq. 8.68.
One more example. Consider the ratio of two polynomials of degree $n$. The limit as $x$ tends to infinity may be obtained by dividing numerator and denominator by $x^n$ and then taking the limit. This yields
$$\lim_{x\to\infty}\frac{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0}{b_n x^n + b_{n-1} x^{n-1} + \cdots + b_0} = \lim_{x\to\infty}\frac{a_n + a_{n-1}/x + \cdots + a_0/x^n}{b_n + b_{n-1}/x + \cdots + b_0/x^n} = \frac{a_n}{b_n}. \tag{9.94}$$
Here, we want to obtain the same result as an application of the l’Hôpital rule (Eq. 9.85). Indeed, differentiating numerator and denominator $n$ times (which leaves the constants $n!\, a_n$ and $n!\, b_n$, respectively) yields
$$\lim_{x\to\infty}\frac{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0}{b_n x^n + b_{n-1} x^{n-1} + \cdots + b_0} = \frac{a_n}{b_n}. \tag{9.95}$$
[You may want to sharpen your skills and obtain the limit of ratios of polynomials having different degrees.]
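The three limits in Eqs. 9.91–9.93 lend themselves to a simple numerical check. The sketch below, an illustration and not a proof, evaluates both the original ratio $f/g$ and the ratio of derivatives $f'/g'$ at a small value of $x$ and compares them with the limit stated in the text. The dictionary layout and the value $x = 10^{-3}$ are assumptions made for this example.

```python
import math

# For each example: (ratio f/g, ratio f'/g', limit stated in the text).
examples = {
    "sin(x)/x": (
        lambda x: math.sin(x) / x,
        lambda x: math.cos(x) / 1.0,
        1.0,                                   # Eq. 9.91
    ),
    "tan(x)/x": (
        lambda x: math.tan(x) / x,
        lambda x: (1.0 / math.cos(x) ** 2) / 1.0,
        1.0,                                   # Eq. 9.92
    ),
    "(1 - cos(x))/x^2": (
        lambda x: (1.0 - math.cos(x)) / x ** 2,
        lambda x: math.sin(x) / (2.0 * x),
        0.5,                                   # Eq. 9.93
    ),
}

x = 1e-3
results = {}
for name, (ratio, derivative_ratio, limit) in examples.items():
    results[name] = (ratio(x), derivative_ratio(x), limit)
    print(name, results[name])
```

Both ratios should be close to the stated limit. Taking $x$ much smaller is not advisable for the last example, because the subtraction $1 - \cos x$ loses digits in floating-point arithmetic.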
• Applying the l’Hôpital rule backwards
♣
As an intriguing application of the l’Hôpital rule, we have the following
Theorem 103. Assume that the function $f(x)$ has the property
$$f(x) = o\!\left[x^n\right], \qquad\text{as } x \text{ tends to } 0. \tag{9.96}$$
Then, we have
$$\frac{d^k f(x)}{dx^k} = o\!\left[x^{n-k}\right], \qquad\text{as } x \text{ tends to } 0 \qquad (k = 1, \dots, n). \tag{9.97}$$
◦ Comment. Here, we will apply the l'Hôpital rule backwards. Specifically, the way the theorem was presented, it looks like one can evaluate the limit of the ratio of the functions from the limit of the ratio of the derivatives. Here, on the other hand, the ratio of the functions is assumed to be known, and we use the theorem to evaluate the limit of the ratio of the derivatives. Indeed, the theorem on the l'Hôpital rule (Theorem 102, p. 388) allows for both uses of Eq. 9.85, as you may verify.

◦ Proof: By definition of $o[x]$ (Eq. 8.79), we have
$$\lim_{x\to 0} \frac{f(x)}{x^n} = 0. \tag{9.98}$$
Clearly, the denominator tends to zero with $x$. Hence, the numerator must tend to zero as well. Thus, we can apply the l'Hôpital rule, and obtain
$$\lim_{x\to 0} \frac{f'(x)}{n\, x^{n-1}} = 0. \tag{9.99}$$
This implies, by definition of $o[x]$ (use again Eq. 8.79),
$$f'(x) = o[x^{n-1}], \quad \text{as } x \text{ tends to } 0. \tag{9.100}$$
If $n > 1$, we have $\lim_{x\to 0} f'(x) = 0$, again a zero-over-zero situation, as in Eq. 9.99. Thus, we may apply the l'Hôpital rule a second time, and obtain
$$f''(x) = o[x^{n-2}], \quad \text{as } x \text{ tends to } 0. \tag{9.101}$$
Of course, we may use the procedure as long as the exponent in the denominator in Eq. 9.98 is greater than zero, as you may verify. Hence, we can repeat this for a total of $n$ times, thereby proving Eq. 9.97.

◦ The inverse theorem. ♥ The reverse is not necessarily true, namely Eq. 9.97 does not guarantee Eq. 9.96, because $f(x)$ does not necessarily vanish at $x = 0$. [For instance, $\sin x = o[x]$, as $x$ tends to 0. However, $\cos 0 = 1$.] Nonetheless, we have the following

Theorem 104. Consider a differentiable function $f(x)$ and assume that
$$\frac{df}{dx} = o[x^n], \quad \text{as } x \text{ tends to } 0, \tag{9.102}$$
with $n \ge 1$, and that $\lim_{x\to 0} f(x) = f(0)$. Then, necessarily,
$$f(x) - f(0) = o[x^{n+1}], \quad \text{as } x \text{ tends to } 0. \tag{9.103}$$
◦ Proof: Indeed, the left side of Eq. 9.103 tends to zero, as $x$ tends to zero. Thus, we may assume that $f(x) - f(0) = o[x^\alpha]$, for some $\alpha \ge 0$. Then, the preceding theorem tells us that $f'(x) = o[x^{\alpha-1}]$. Comparing with Eq. 9.102, we obtain that necessarily $\alpha = n + 1$, in agreement with Eq. 9.103.

You may want to sharpen your skills and extend this theorem to $f''(x) = o[x^n]$ and higher-order derivatives. [Hint: Use the principle of mathematical induction (Subsection 8.2.5).]
9.5.4 Existence of discontinuous derivative
♥
To conclude this section, let us consider a subtle point. In Theorem 96, p. 366, we have shown that differentiability implies continuity of the function being differentiated. Does it also imply that $f'(x)$ is continuous? By definition, Eq. 9.1 (namely that the limit from the right equals that from the left) is a necessary condition for the derivative of $f(x)$ to exist. This implies that $f'(x)$ cannot have jump or infinite discontinuities (see Subsection 8.4.1 on discontinuous functions). Here is the question again: "Does the fact that $f(x)$ is differentiable imply that $f'(x)$ is continuous?" The answer is: "No! The fact that, in order for the derivative to exist, Eq. 9.1 must be satisfied does not imply that the derivative is continuous."

Remark 81. The answer (namely the sentence in quotation marks) is an example of a negative theorem, namely a theorem that negates a statement.
In order to prove any negative theorem, namely to prove that the opposite theorem is not valid, it is sufficient to provide a single example (called a counterexample) that negates the statement, as you may verify. Accordingly, as a counterexample, consider the continuous function $f_C(x) = x^2 \sin(1/x)$ for $x \ne 0$, with $f_C(0) = 0$ (Eq. 8.108). Using the product rule (Eq. 9.18) and the Leibniz chain rule (Eq. 9.24), we have
$$f_C'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}, \qquad \text{for } x \ne 0. \tag{9.104}$$
On the other hand, for $x = 0$ we have
$$f_C'(0) = \lim_{h\to 0} \frac{f_C(h) - f_C(0)}{h} = \lim_{h\to 0} \frac{1}{h}\, h^2 \sin\frac{1}{h} = 0. \tag{9.105}$$
This shows that the derivative exists for all values of x ∈ (−∞, ∞). However, it is not continuous — it has an essential discontinuity at x = 0. Indeed, as x tends to zero, the limit of the function in Eq. 9.104 does not exist. ◦ Comment. You will be glad to know that, from now on, we will not encounter any function whose derivative exists but is not continuous.
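A short numerical experiment makes the counterexample tangible. This is a sketch only; the function names `f_c` and `df_c` and the sampling points $x_m = 1/(m\pi)$ are choices made here for illustration. The difference quotient at the origin shrinks with $h$, whereas the derivative of Eq. 9.104 keeps jumping between values close to $-1$ and $+1$ arbitrarily near $x = 0$.

```python
import math

def f_c(x):
    """f_C(x) = x^2 sin(1/x) for x != 0, with f_C(0) = 0 (Eq. 8.108)."""
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def df_c(x):
    """Derivative of f_C for x != 0 (Eq. 9.104)."""
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotient at x = 0 (Eq. 9.105): it equals h sin(1/h), so |quotient| <= |h|.
for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h:g}: [f_C(h) - f_C(0)]/h = {(f_c(h) - f_c(0.0)) / h:+.3e}")

# Along x_m = 1/(m*pi), sin(1/x_m) = 0 and cos(1/x_m) = (-1)^m, so the derivative
# alternates between values close to -1 and +1 while x_m tends to zero.
for m in (10, 11, 1000, 1001):
    x_m = 1.0 / (m * math.pi)
    print(f"m = {m}: x = {x_m:.3e}, f_C'(x) = {df_c(x_m):+.6f}")
```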
9.6 Partial derivatives

In the next chapter (specifically, in expressing the derivative of an integral, Section 10.6), we will need the notion of partial derivative, which is briefly introduced here. In Section 8.7, we have addressed functions of two variables, $u = f(x, y)$. For a fixed value of $y$, say $y = y_c$, we have that $u = f(x, y_c)$ becomes a function only of $x$. The graph of $u = f(x, y_c)$ is obtained, from the graph of $u = f(x, y)$, by taking its vertical cross section parallel to the $x$-axis (namely its intersection with the vertical plane $y = y_c$). Then, we can take the derivative as usual. However, in this case, to avoid confusion, we use a special terminology: such a derivative is called the partial derivative of $f(x, y)$ with respect to $x$ (it is understood "with $y$ constant"). We also use a special notation: for a generic $y$, we write
$$\frac{\partial f(x, y)}{\partial x} := \lim_{x_1 \to x} \frac{f(x_1, y) - f(x, y)}{x_1 - x}. \tag{9.106}$$
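Similar considerations hold for $\partial f/\partial y$. We will also use the standard convenient notation
$$\left.\frac{\partial f}{\partial x}\right|_{y} := \frac{\partial f(x, y)}{\partial x}, \qquad \left.\frac{\partial f}{\partial y}\right|_{x} := \frac{\partial f(x, y)}{\partial y}, \tag{9.107}$$
or
$$f_x(x, y) := \frac{\partial f(x, y)}{\partial x}, \qquad f_y(x, y) := \frac{\partial f(x, y)}{\partial y}. \tag{9.108}$$

9.6.1 Multiple chain rule, total derivative
♣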
We have the following

Theorem 105 (Multiple chain rule). Consider the function $f = f(x, y)$, with $x = x(t)$ and $y = y(t)$. Then, we have
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{9.109}$$
[The above derivative is called the total derivative, in order to distinguish it from the ordinary derivatives (Eq. 9.14).]

◦ Comment. The fact that we use the same symbol, namely $df/dt$, for both ordinary and total derivatives should not be a source of confusion. Indeed, the total derivative coincides with the ordinary derivative of $F(t) := f[x(t), y(t)]$.

◦ Proof: Indeed, we have (use the splitting trick, Remark 65, p. 322)
$$\begin{aligned} \frac{df}{dt} &= \lim_{\Delta t\to 0} \frac{1}{\Delta t}\Big\{ f\big[x(t+\Delta t), y(t+\Delta t)\big] - f\big[x(t), y(t)\big] \Big\} \\ &= \lim_{\Delta t\to 0} \frac{f\big[x(t+\Delta t), y(t+\Delta t)\big] - f\big[x(t), y(t+\Delta t)\big]}{\Delta x}\,\frac{\Delta x}{\Delta t} \\ &\quad + \lim_{\Delta t\to 0} \frac{f\big[x(t), y(t+\Delta t)\big] - f\big[x(t), y(t)\big]}{\Delta y}\,\frac{\Delta y}{\Delta t}, \end{aligned} \tag{9.110}$$
which yields Eq. 9.109.
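The multiple chain rule lends itself to a quick numerical sanity check. The sketch below is illustrative only: the particular functions $f(x, y) = x^2 y + \sin y$, $x(t) = \cos t$ and $y(t) = t^2$ are assumptions chosen here, not taken from the text. It compares the right side of Eq. 9.109 with a centered difference quotient of $F(t) = f[x(t), y(t)]$.

```python
import math

# Illustrative choices (assumptions made for this sketch).
def f(x, y):
    return x * x * y + math.sin(y)

def f_x(x, y):          # partial derivative of f with respect to x
    return 2.0 * x * y

def f_y(x, y):          # partial derivative of f with respect to y
    return x * x + math.cos(y)

def x_of(t):
    return math.cos(t)

def y_of(t):
    return t * t

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return 2.0 * t

def total_derivative(t):
    """Right side of Eq. 9.109."""
    x, y = x_of(t), y_of(t)
    return f_x(x, y) * dx_dt(t) + f_y(x, y) * dy_dt(t)

def difference_quotient(t, dt=1e-6):
    """Centered difference quotient of F(t) = f[x(t), y(t)]."""
    return (f(x_of(t + dt), y_of(t + dt)) - f(x_of(t - dt), y_of(t - dt))) / (2.0 * dt)

for t in (0.0, 0.7, 1.9):
    print(f"t = {t}: chain rule = {total_derivative(t):+.8f}, "
          f"difference quotient = {difference_quotient(t):+.8f}")
```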
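Similarly, for the function $f = f(u, v)$, with $u = u(x, y)$ and $v = v(x, y)$, we have
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}, \tag{9.111}$$
as you may verify. The expression for $\partial f/\partial y$ is analogous, namely
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}. \tag{9.112}$$

9.6.2 Second– and higher–order derivatives
♣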
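The second-order derivatives are defined as the derivatives of the first derivatives. For instance, we have
$$\frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \lim_{h_x\to 0} \frac{1}{h_x}\Big[ f_x(x + h_x, y) - f_x(x, y) \Big]. \tag{9.113}$$
A similar definition holds for $\partial^2 f/\partial y^2$. Also, we have
$$\frac{\partial^2 f}{\partial x\, \partial y} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \lim_{h_y\to 0} \frac{1}{h_y}\Big[ f_x(x, y + h_y) - f_x(x, y) \Big]. \tag{9.114}$$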
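In other words, first we take the derivative of $f(x, y)$ with respect to $x$ and then that of the result with respect to $y$. On the other hand, we have
$$\frac{\partial^2 f}{\partial y\, \partial x} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \lim_{h_x\to 0} \frac{1}{h_x}\Big[ f_y(x + h_x, y) - f_y(x, y) \Big]. \tag{9.115}$$
We will also use the convenient notation
$$f_{xx}(x, y) := \frac{\partial^2 f(x, y)}{\partial x^2}, \quad f_{xy}(x, y) := \frac{\partial^2 f(x, y)}{\partial x\, \partial y}, \quad f_{yx}(x, y) := \frac{\partial^2 f(x, y)}{\partial y\, \partial x}, \quad f_{yy}(x, y) := \frac{\partial^2 f(x, y)}{\partial y^2}. \tag{9.116}$$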
◦ Warning. In this book, I will only use the above notations. However, it should be emphasized that the notations used by some authors (specifically, which derivative comes first) are the opposite of those used here. [This subtlety is often irrelevant because of the Schwarz theorem on the interchangeability of the order of differentiation addressed in the following subsection.] Note that Eq. 9.115 tacitly implies
$$f_{yx}(x_0, y_0) = \left[\frac{\partial}{\partial x} f_y(x, y_0)\right]_{x = x_0}. \tag{9.117}$$
[Recall that the notation $[\cdots]_{x=x_0}$ denotes evaluation at $x = x_0$ (Eq. 9.15).]

◦ Comment. Derivatives of order $n > 2$ are defined in terms of those of order $n - 1$.
9.6.3 Schwarz theorem on second mixed derivative
♣
Note that often the second mixed derivatives are equal: $f_{xy}(x, y) = f_{yx}(x, y)$. For instance, for any polynomial $f(x, y) := \sum_{h,k=1}^{n} c_{hk}\, x^h y^k$, we have
$$f_{xy} = f_{yx} = \sum_{h,k=1}^{n} h\, k\, c_{hk}\, x^{h-1} y^{k-1}, \tag{9.118}$$
independently of the order of differentiation. We wonder, however, whether this is always true, namely whether the order of differentiation in a second mixed derivative may be interchanged. The answer is: "Most of the time, but not always!" To back up this claim, let me give you a counterexample. Consider the function
$$f(x, y) = x^2 \tan^{-1}\frac{y}{x} - y^2 \tan^{-1}\frac{x}{y}. \tag{9.119}$$
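Using the Leibniz chain rule (Eq. 9.24), as well as $[\tan^{-1} u]' = 1/(1 + u^2)$ (Eq. 9.75), and $(1/x)' = -1/x^2$ (Eq. 9.49), we have
$$\frac{\partial}{\partial x} \tan^{-1}\frac{y}{x} = \frac{1}{1 + y^2/x^2}\,\frac{-y}{x^2} = \frac{-y}{x^2 + y^2}, \qquad \frac{\partial}{\partial x} \tan^{-1}\frac{x}{y} = \frac{1}{1 + x^2/y^2}\,\frac{1}{y} = \frac{y}{x^2 + y^2}. \tag{9.120}$$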
Then, using the product rule (Eq. 9.18), and the multiple chain rule (Eq. 9.109), Eqs. 9.119 and 9.120 yield
$$\frac{\partial f}{\partial x} = 2x \tan^{-1}\frac{y}{x} - \frac{x^2 y}{x^2 + y^2} - \frac{y^3}{x^2 + y^2} = 2x \tan^{-1}\frac{y}{x} - y. \tag{9.121}$$
Therefore, $f_x(0, y) = \big[\partial f(x, y)/\partial x\big]_{x=0} = -y$ (Eq. 9.117), and hence
$$f_{xy}(0, 0) := \left[\frac{\partial f_x(0, y)}{\partial y}\right]_{y=0} = -1. \tag{9.122}$$
Similarly, we obtain $f_y(x, 0) = \big[\partial f(x, y)/\partial y\big]_{y=0} = x$, as you may verify. Hence,
$$f_{yx}(0, 0) := \left[\frac{\partial f_y(x, 0)}{\partial x}\right]_{x=0} = 1. \tag{9.123}$$
Thus, at $x = y = 0$, we have that $f_{xy}(x, y) \ne f_{yx}(x, y)$.
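The two values in Eqs. 9.122 and 9.123 can be reproduced with nested centered differences. The following sketch rests on assumptions made here: $f$ is extended to the axes by its limit value (each term tends to zero there), and the inner step `h` is taken much smaller than the outer step `k`, so that the inner quotient approximates the first derivative before the outer one is formed.

```python
import math

def f(x, y):
    """f = x^2 atan(y/x) - y^2 atan(x/y) (Eq. 9.119), extended by its limit
    values where x = 0 or y = 0 (each term tends to zero there)."""
    first = x * x * math.atan(y / x) if x != 0.0 else 0.0
    second = y * y * math.atan(x / y) if y != 0.0 else 0.0
    return first - second

def f_x(x, y, h=1e-7):
    """Centered difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

def f_y(x, y, h=1e-7):
    """Centered difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

def f_xy_at_origin(k=1e-2):
    """Derivative with respect to y of f_x(0, y), at y = 0 (Eq. 9.122)."""
    return (f_x(0.0, k) - f_x(0.0, -k)) / (2.0 * k)

def f_yx_at_origin(k=1e-2):
    """Derivative with respect to x of f_y(x, 0), at x = 0 (Eq. 9.123)."""
    return (f_y(k, 0.0) - f_y(-k, 0.0)) / (2.0 * k)

print(f"f_xy(0, 0) is approximately {f_xy_at_origin():+.5f}")
print(f"f_yx(0, 0) is approximately {f_yx_at_origin():+.5f}")
```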
• A deeper analysis
♠
In order to clarify the issue, we can look at this result from a different angle. For any point $(x, y) \ne (0, 0)$, I claim that we have
$$\frac{\partial^2 f}{\partial x\, \partial y} = \frac{\partial^2 f}{\partial y\, \partial x} = \frac{x^2 - y^2}{x^2 + y^2}. \tag{9.124}$$
To this end, set $g(x, y) := x^2 \tan^{-1}(y/x)$ and obtain $g_y = x^3/(x^2 + y^2)$. [Use the second in Eq. 9.120, with the variables $x$ and $y$ interchanged.] Then, we obtain (use the quotient rule, Eq. 9.22)
$$\frac{\partial g_y}{\partial x} = \frac{3x^2}{x^2 + y^2} - x^3\, \frac{2x}{(x^2 + y^2)^2} = \frac{x^4 + 3x^2 y^2}{(x^2 + y^2)^2}. \tag{9.125}$$
On the other hand, consider the function $h(x, y) := y^2 \tan^{-1}(x/y)$. We have $h_y = 2y \tan^{-1}(x/y) - x y^2/(x^2 + y^2)$, and $\partial h_y/\partial x = (y^4 + 3x^2 y^2)/(x^2 + y^2)^2$, as you may verify. Next, note that $f(x, y) = g(x, y) - h(x, y)$. Thus, subtracting the two results and noting that $x^4 - y^4 = (x^2 - y^2)(x^2 + y^2)$ (Eq. 6.37), one obtains Eq. 9.124. [You may want to sharpen your skills and verify that the same result is obtained by interchanging the order of differentiation.]

Recall that we have already studied the function $(x^2 - y^2)/(x^2 + y^2)$ (Eq. 9.124) in Subsection 8.7.1 (see Eq. 8.117). As shown there, such a function has a regular behavior for any $(x, y) \ne (0, 0)$, but is discontinuous as $(x, y)$ tends to $(0, 0)$, where it is not even defined! [However, $(x^2 - y^2)/(x^2 + y^2)$ is continuous along the lines $x = 0$ and $y = 0$ (where it takes the values $-1$ and $1$, respectively). Accordingly, the two second mixed derivatives under consideration are properly defined.]
• Schwarz theorem
♣
The example above shows that, for $f(x, y) = x^2 \tan^{-1}(y/x) - y^2 \tan^{-1}(x/y)$, we have $f_{xy}(x, y) = f_{yx}(x, y)$ (Eq. 9.124), except for $x = y = 0$. Thus, the order of differentiation cannot always be interchanged. We ask ourselves whether there is a general rule to decide when the order of the derivatives may be interchanged. This rule is provided by the following⁷

Theorem 106 (Schwarz theorem). Assume that the function $f(x, y)$ admits: (i) the derivatives $f_x = \partial f/\partial x$ and $f_{xy} = \partial^2 f/\partial x\, \partial y$, in a rectangular neighborhood $N$ of a point $(x_0, y_0)$, with $f_{xy}$ continuous at $(x_0, y_0)$, and (ii) the derivative $f_y = \partial f/\partial y$, along the line $y = y_0$. Then, at $x = x_0$ and $y = y_0$, $f_{yx} = \partial^2 f/\partial y\, \partial x$ also exists, with
$$f_{yx}(x_0, y_0) = f_{xy}(x_0, y_0). \tag{9.126}$$

⁷ Named after the German mathematician Karl Hermann Amandus Schwarz (1843–1921). The important Schwarz inequality is also named after him.

◦ Proof: ♠ Let us introduce the function
$$F_0(h_x) := \frac{1}{h_x}\Big[ f_y(x_0 + h_x, y_0) - f_y(x_0, y_0) \Big]. \tag{9.127}$$
Note that, iff the limit as $h_x$ tends to zero exists, then (Eq. 9.115)
$$f_{yx}(x_0, y_0) := \lim_{h_x\to 0} F_0(h_x). \tag{9.128}$$
Accordingly, the proof of the theorem consists in showing that the limit exists and is equal to $f_{xy}(x_0, y_0)$, thereby proving that $f_{yx}(x_0, y_0) = f_{xy}(x_0, y_0)$ (Eq. 9.126). The above statement (namely that the limit exists and is equal to $f_{xy}(x_0, y_0)$) is equivalent to saying that, for any $\varepsilon > 0$ arbitrarily small, there exists a $\delta_x > 0$ such that, for $|h_x| < \delta_x$, we have
$$\big| F_0(h_x) - f_{xy}(x_0, y_0) \big| < \varepsilon. \tag{9.129}$$
In order to ascertain the validity of Eq. 9.129 (thereby proving the theorem), let us introduce the function
$$g(x, h_y) := f(x, y_0 + h_y) - f(x, y_0), \tag{9.130}$$
which is such that, for $(x, y_0) \in N$,
$$f_y(x, y_0) = \lim_{h_y\to 0} \frac{g(x, h_y)}{h_y}, \tag{9.131}$$
since the limit exists in $N$, by hypothesis. Accordingly, we have that
$$F_0(h_x) = \lim_{h_y\to 0} \frac{g(x_0 + h_x, h_y) - g(x_0, h_y)}{h_x\, h_y}. \tag{9.132}$$
Note that, by hypothesis, $g(x, h_y)$ is differentiable with respect to $x$, with
$$g_x(x, h_y) = f_x(x, y_0 + h_y) - f_x(x, y_0), \tag{9.133}$$
provided of course that $|x - x_0|$ and $|h_y|$ are adequately small, so as to have $(x, y_0 + h_y) \in N$. Hence, the mean value theorem in differential calculus (Eq. 9.84) yields that there exists an $x_1 \in (x_0, x_0 + h_x)$ such that $g(x_0 + h_x, h_y) - g(x_0, h_y) = h_x\, g_x(x_1, h_y)$, with $x_1 = x_1(h_x)$. Accordingly, we have
$$F_0(h_x) = \lim_{h_y\to 0} \frac{g_x(x_1, h_y)}{h_y} = \lim_{h_y\to 0} \frac{f_x(x_1, y_0 + h_y) - f_x(x_1, y_0)}{h_y}. \tag{9.134}$$
Next, we apply again the mean value theorem in differential calculus (Eq. 9.84), this time exploiting the fact that $f_{xy}$ exists in $N$, by hypothesis. This yields that there exists a $y_1 \in (y_0, y_0 + h_y)$, with $y_1 = y_1(h_y)$, such that
$$F_0(h_x) = \lim_{y_1\to y_0} f_{xy}(x_1, y_1). \tag{9.135}$$
This implies that, for $|h_y|$ adequately small, we have
$$\big| F_0(h_x) - f_{xy}(x_1, y_1) \big| < \frac{\varepsilon}{2}. \tag{9.136}$$
On the other hand, for $|h_x|$ and $|h_y|$ adequately small, we have
$$\big| f_{xy}(x_1, y_1) - f_{xy}(x_0, y_0) \big| < \frac{\varepsilon}{2}, \tag{9.137}$$
because $f_{xy}$ is continuous at $(x_0, y_0)$, by hypothesis. Thus, again for $|h_x|$ and $|h_y|$ adequately small, we have (splitting trick again, Remark 65, p. 322)
$$\big| F_0(h_x) - f_{xy}(x_0, y_0) \big| \le \big| F_0(h_x) - f_{xy}(x_1, y_1) \big| + \big| f_{xy}(x_1, y_1) - f_{xy}(x_0, y_0) \big| < \varepsilon, \tag{9.138}$$
in agreement with Eq. 9.129, thereby proving the theorem (Eq. 9.128).
◦ Warning. In the rest of the book, whenever the Schwarz theorem is used, I will tacitly assume that the hypotheses of the theorem are satisfied.
9.6.4 Partial derivatives for n variables
♣
We have already introduced the partial derivatives for a function of two variables (Eq. 9.106). Here, we generalize this to a function of $n$ variables, namely $f = f(x_1, x_2, \dots, x_n)$, as
$$\frac{\partial f}{\partial x_k} := \lim_{\tilde{x}_k \to x_k} \frac{f(x_1, \dots, \tilde{x}_k, \dots, x_n) - f(x_1, \dots, x_k, \dots, x_n)}{\tilde{x}_k - x_k}, \tag{9.139}$$
with $k = 1, \dots, n$. The multiple chain rule in Eq. 9.109 may be trivially extended to three or more arguments: given the function $f = f(u_k)$, with $u_k = u_k(x_1, \dots, x_n)$ ($k = 1, \dots, m$), we have
$$\frac{\partial}{\partial x_j} f\big[u_1(x_1, \dots, x_n), \dots, u_m(x_1, \dots, x_n)\big] = \sum_{k=1}^{m} \frac{\partial f}{\partial u_k}\frac{\partial u_k}{\partial x_j}, \tag{9.140}$$
where $j = 1, \dots, n$. Also, the second-order derivatives are defined as
$$\frac{\partial^2 f}{\partial x_h\, \partial x_k} := \frac{\partial}{\partial x_k}\left(\frac{\partial f}{\partial x_h}\right) \qquad (h, k = 1, \dots, n). \tag{9.141}$$
Finally, the Schwarz theorem on the symmetry of the second partial derivatives (Theorem 106, p. 396) applies to this case as well, since the independent variables that are not involved are kept constant, thereby reducing the problem to one with only two variables.
9.7 Appendix A. Curvature of the graph of f(x)

Here, we define the curvature of the graph of a function $f(x)$ and give an expression that relates it to the second derivative of $f(x)$. Specifically, we begin with concepts of convexity and concavity (Subsection 9.7.1). Then, we introduce the osculating circle (Subsection 9.7.2), and finally we use it to define the curvature of the graph of a function $f(x)$ and its relationship to the second derivative of the function (Subsection 9.7.3).
9.7.1 Convex and concave functions
♣
Consider Figs. 9.5 and 9.6. The graph in Fig. 9.5 is "curved upward," namely the value of its slope (taken with its sign) increases with $x$ (positive second derivative). On the other hand, the graph in Fig. 9.6 is "curved downward," namely its slope decreases with $x$ (negative second derivative). Here, we address these concepts in greater depth. We have the following⁸

Definition 192 (Convex and concave functions). A real function of a real variable $f(x)$ is called convex (or concave upward) in the interval $[a, b]$ iff, given any two points $x_1$ and $x_2$ in the interval $[a, b]$, no point of its graph between $x_1$ and $x_2$ lies above the segment connecting $\big(x_1, f(x_1)\big)$ and $\big(x_2, f(x_2)\big)$. In mathematical terms, we have
$$f(x) \le \alpha\, f(x_1) + (1 - \alpha)\, f(x_2), \qquad x := \alpha\, x_1 + (1 - \alpha)\, x_2 \in (x_1, x_2), \tag{9.142}$$
where $\alpha \in [0, 1]$. The function $f(x)$ is called strictly convex iff its graph lies completely below the segment connecting $\big(x_1, f(x_1)\big)$ and $\big(x_2, f(x_2)\big)$ (namely iff the sign $\le$ in Eq. 9.142 is replaced by $<$).

⁸ The terms convex and concave come from Latin: convexus (arched) and concavus, namely con (together) plus cavus (hollow).

Fig. 9.5 Strictly convex function

Fig. 9.6 Strictly concave function

◦ Comment. Consider a function $f(x)$ that possesses a second derivative. If $d^2 f/dx^2 > 0$ in $[a, b]$, $f(x)$ is strictly convex there (see Fig. 9.5). On the other hand, if $d^2 f/dx^2$ is negative, $f(x)$ is strictly concave (see Fig. 9.6). [Note that Definition 192 above applies even to discontinuous functions.]
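The inequality in Eq. 9.142 can be tested directly on sample points. The sketch below is illustrative: the function name `satisfies_convexity`, the grid sizes and the tolerance are assumptions made here. A sampled test of this kind can expose a violation of convexity, but passing it does not prove convexity on the whole interval.

```python
import math

def satisfies_convexity(f, a, b, n=20, tol=1e-12):
    """Check Eq. 9.142 on sample points x1 < x2 in [a, b] and weights alpha in (0, 1)."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    alphas = [j / 10.0 for j in range(1, 10)]
    for i, x1 in enumerate(points):
        for x2 in points[i + 1:]:
            for alpha in alphas:
                x = alpha * x1 + (1.0 - alpha) * x2
                chord = alpha * f(x1) + (1.0 - alpha) * f(x2)
                if f(x) > chord + tol:
                    return False    # a graph point lies above the segment
    return True

print(satisfies_convexity(lambda x: x * x, -1.0, 1.0))   # x^2 has f'' = 2 > 0
print(satisfies_convexity(math.sin, 0.0, math.pi))       # sin x has f'' < 0 in (0, pi)
```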
9.7.2 Osculating circle. Osculating parabola
♣
We have the following⁹

⁹ To osculate comes from osculum, Latin for "kiss" (etymologically, "small mouth"). Thus, the osculating circle is the circle that "kisses" the curve. [Os, oris is Latin for mouth. Oral and osculating have the same root!]

Definition 193 (Osculating circle. Osculating parabola). Consider a function $y = f(x)$ that is twice differentiable at $x = c$. The osculating circle to the graph of $y = f(x)$ is the circle described by the function
$$y = g(x) := y_C \mp \sqrt{R^2 - (x - x_C)^2}, \tag{9.144}$$
where the coordinates of its center $C = (x_C, y_C)$ and its radius $R$ are such that $f(x)$ and $g(x)$ take the same value at $x = c$, and so do their first and second derivatives. [The upper (lower) sign corresponds to the osculating semicircle for functions that are concave upwards (downwards), as you may verify. More on this in Subsection 9.7.3.] Similarly, given a function $f(x)$, the osculating parabola at $x = c$ is given by $y = q(x) = a_2 x^2 + a_1 x + a_0$, with $a_k$ such that the functions $f(x)$ and $q(x)$ take the same value at $x = c$, and so do their first and second derivatives.
9.7.3 Curvature vs f''(x)
♣
Here, we define the curvature of the graph of $y = f(x)$. We have the following

Definition 194 (Curvature). The signed curvature $\kappa_S$ of the graph of $y = f(x)$ is given by
$$\kappa_S := \frac{\pm 1}{R}, \tag{9.145}$$
where $R$ is the radius of its osculating circle, whereas, as in Eq. 9.144, the upper (lower) sign applies for functions that are concave upwards (downwards). [The curvature $\kappa$ is $1/R > 0$.]

Using this definition, we have the following

Theorem 107 (Expression for signed curvature). The signed curvature of the graph of a twice differentiable function $y = f(x)$, at $x = c$, is given by
$$\kappa_S(c) = \frac{f''(c)}{\big\{ 1 + [f'(c)]^2 \big\}^{3/2}}. \tag{9.146}$$
◦ Proof: Setting $x - x_C = \Delta x$, Eq. 9.144 yields (use Eq. 9.68)
$$g'(x) = \frac{\pm \Delta x}{\sqrt{R^2 - \Delta x^2}}. \tag{9.147}$$
This implies
$$1 + \big[g'(x)\big]^2 = \frac{R^2}{R^2 - \Delta x^2}. \tag{9.148}$$
In addition, Eq. 9.147 yields (use Eq. 9.69)
$$g''(x) = \frac{\pm 1}{\sqrt{R^2 - \Delta x^2}} \pm \frac{\Delta x^2}{\big(R^2 - \Delta x^2\big)^{3/2}} = \frac{\pm R^2}{\big(R^2 - \Delta x^2\big)^{3/2}}, \tag{9.149}$$
or (use Eq. 9.148)
$$g''(x) = \frac{\pm 1}{R} \Big\{ 1 + \big[g'(x)\big]^2 \Big\}^{3/2}. \tag{9.150}$$
Thus, combining the above equation with Eq. 9.145 and recalling that we impose $f'(c) = g'(c)$ and $f''(c) = g''(c)$ (by Definition 193, p. 401, of osculating circle), one obtains Eq. 9.146.

If $[f'(c)]^2 \ll 1$, neglecting higher-order terms, one obtains an approximate expression for the curvature of the graph of $y = f(x)$ at $x = c$, namely
$$\kappa_S(c) \simeq f''(c). \tag{9.151}$$
This expression is often used throughout this book.

◦ Comment. In Remark 67, p. 325, we presented a geometrical interpretation of the equation $\cos x \simeq 1 - \tfrac{1}{2} x^2$ (Eq. 8.69). Here, we address the issue in greater depth. To this end, I claim that $y = 1 - \tfrac{1}{2} x^2$ is the osculating parabola to the unit circle, centered at the origin, at the point $(0, 1)$. Indeed, at $x = 0$ (where $y' = 0$ and $y'' = -1$), the radius of curvature of the parabola is $R = 1/\kappa = 1$ (use Eq. 9.146), as it should since the radius of the unit circle equals 1.
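As a numerical cross-check of Eq. 9.146, one may feed it the derivatives of a semicircle and of the parabola $y = 1 - \tfrac{1}{2} x^2$. The sketch below is illustrative only; the function names and the radius $R = 2$ are assumptions made here. For the lower semicircle (upper sign in Eq. 9.144) the formula should return $1/R$ at every point, and for the parabola at $x = 0$ it should return $-1$, namely $\kappa = 1$ with the graph concave downward.

```python
import math

def signed_curvature(d1, d2):
    """Eq. 9.146, with d1 = f'(c) and d2 = f''(c)."""
    return d2 / (1.0 + d1 * d1) ** 1.5

def lower_semicircle_derivatives(R, dx):
    """g'(x) and g''(x) of y = y_C - sqrt(R^2 - dx^2), with dx = x - x_C
    (Eqs. 9.147 and 9.149 with the upper sign)."""
    root = math.sqrt(R * R - dx * dx)
    return dx / root, R * R / root ** 3

R = 2.0
for dx in (0.0, 0.5, 1.5):
    d1, d2 = lower_semicircle_derivatives(R, dx)
    print(f"dx = {dx}: kappa_S = {signed_curvature(d1, d2):.12f}, 1/R = {1.0 / R}")

# Parabola y = 1 - x^2/2 at x = 0: y' = 0 and y'' = -1.
print(f"parabola at x = 0: kappa_S = {signed_curvature(0.0, -1.0)}")
```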
9.8 Appendix B. Linear operators. Linear mappings
♣
In this appendix, we discuss operators (Subsection 9.8.1), linear operators (Subsection 9.8.2), linear–operator equations (Subsection 9.8.3), superposition theorems (Subsection 9.8.4) and linear mappings (Subsection 9.8.5).
9.8.1 Operators
♣
Recall that in this book, the term operator is used in a restrictive sense. Indeed, as stated in Definition 133, p. 253, an operator is a particular case of mapping, specifically one that applied to a function produces a function.
Thus far, the derivative is the most important operator that we have encountered (see Remark 78, p. 368). Other operators acting on $y = f(x)$ that we have seen are: (i) multiplication by a scalar, namely $c\, f(x)$, (ii) taking the $k$-th power, namely $f^k(x)$, and (iii) their linear combinations, namely $\sum_{k=0}^{n} c_k f^k(x)$, that is, polynomials of the function $f(x)$. We will encounter other types of operators, such as linear combinations of derivatives of different orders (see Eq. 9.162), and primitives (see Chapter 10).

◦ Comment. "Repetita iuvant" (Latin for "repetitions help"), as my high-school Latin teacher used to say when he was drilling his subject matter into our reluctant brains. Accordingly, remember that, as already pointed out in Subsection 6.8.1, some authors use interchangeably the terms "function," "operator" and "mapping" with the meaning that here is reserved for the term mapping, namely as an algorithm that applied to some mathematical entity (such as numbers, n-tuples of numbers, functions) generates another mathematical entity (Definition 132, p. 253). According to their definition, matrix times vector multiplication is an example of an operator. Here, the term "operator" is used only with the meaning indicated in Definition 133, p. 253, namely for an algorithm that, given a function, produces a function (real or complex, as the case may be).
9.8.2 Linear operators
♣
You are already familiar with the linearity of the matrix-by-vector multiplication (see Eq. 3.130). This notion, however, has a much broader validity and applies to several operators (and mappings) that will be encountered in this book. It applies, in particular, to the derivative and related operators, such as those in Eq. 9.162, which are the only operators of interest in this chapter. Accordingly, here we can generalize to operators the results obtained in Section 3.6, on linearity and superposition theorems for algebraic systems of equations.

To begin with, we ask ourselves: "What is a linear operator?" In analogy with the definition of linearity for sequences (Definition 148, p. 307), we have the following

Definition 195 (Linear operator). An operator $L$ is called linear iff, given two functions $u_1(x)$ and $u_2(x)$, along with two scalars $\alpha_1$ and $\alpha_2$, we have
$$L\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] = \alpha_1\, L u_1(x) + \alpha_2\, L u_2(x). \tag{9.152}$$
An equivalent definition is:

Definition 196 (Equivalent definition of linear operator). An operator $L$ is called linear iff, for any two functions $u_1(x)$ and $u_2(x)$ and any scalar $\alpha$, we have
$$L\big[\alpha\, u(x)\big] = \alpha\, L u(x), \tag{9.153}$$
$$L\big[u_1(x) + u_2(x)\big] = L u_1(x) + L u_2(x). \tag{9.154}$$
The equivalence stems from the following facts. First, Eqs. 9.153 and 9.154 are consequences of Eq. 9.152. [Indeed, by setting $\alpha_2 = 0$ in Eq. 9.152, one obtains Eq. 9.153, whereas by setting $\alpha_1 = \alpha_2 = 1$ in Eq. 9.152, one obtains Eq. 9.154.] Vice versa, Eq. 9.152 is a consequence of Eqs. 9.153 and 9.154. [Indeed, by applying first Eq. 9.154 and then 9.153 to the left side of Eq. 9.152, one obtains the right side.]

Repeated applications of Eq. 9.152 yield that, for $\alpha_k$ constant,
$$L \sum_{k=1}^{n} \alpha_k u_k(x) = \sum_{k=1}^{n} \alpha_k\, L u_k(x), \tag{9.155}$$
as you may verify. In other words, if we apply a linear operator to a linear combination of functions, the result is a linear combination of the operator applied to the individual functions.

Multiplication by a function, say $c(x)$, is a linear operator. Indeed,
$$c(x)\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] = \alpha_1\, c(x)\, u_1(x) + \alpha_2\, c(x)\, u_2(x). \tag{9.156}$$
Also, we have the following

Theorem 108. Given $n$ linear operators, $L_k$, we have that their linear combination, with coefficients $c_k(x)$ that are functions of $x$, namely
$$L = \sum_{k=1}^{n} c_k(x)\, L_k, \tag{9.157}$$
is also a linear operator.

◦ Proof: Indeed, we have
$$\begin{aligned} L\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] &= \sum_{k=1}^{n} c_k(x)\, L_k\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] \\ &= \alpha_1 \sum_{k=1}^{n} c_k(x)\, L_k u_1(x) + \alpha_2 \sum_{k=1}^{n} c_k(x)\, L_k u_2(x) \\ &= \alpha_1\, L u_1(x) + \alpha_2\, L u_2(x), \end{aligned} \tag{9.158}$$
in agreement with the theorem.
We also have the following

Theorem 109. The operator consisting in the application of a linear operator repeated $k$ times, $L^k := L \cdots L$, is also a linear operator. In particular,
$$L^2 := L\, L \tag{9.159}$$
is a linear operator.

◦ Proof: Consider first Eq. 9.159. We have
$$L^2\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] = L\Big\{ L\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] \Big\} = L\big[\alpha_1\, L u_1(x) + \alpha_2\, L u_2(x)\big] = \alpha_1\, L^2 u_1(x) + \alpha_2\, L^2 u_2(x). \tag{9.160}$$
The case with $n > 2$ repeated applications is a simple extension of the proof for $n = 2$, as you may verify, first for $n = 3$, then for $n = 4$, and so on. [♠ Strictly speaking, we should apply the principle of mathematical induction (Subsection 8.2.5), which, we recall, says in essence that a statement is true for all values of $n$, provided that: (i) it is true for an initial value of $n$ ($n = 2$, in our case), and (ii) we can prove that, if true for $n = k$, it is true for $n = k + 1$ as well. In our case, we have
$$L^{k+1}\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] = L\Big\{ L^k\big[\alpha_1 u_1(x) + \alpha_2 u_2(x)\big] \Big\} = L\big[\alpha_1\, L^k u_1(x) + \alpha_2\, L^k u_2(x)\big] = \alpha_1\, L^{k+1} u_1(x) + \alpha_2\, L^{k+1} u_2(x). \tag{9.161}$$
The second equality is valid by the above Item (ii) of the principle of mathematical induction.]

◦ Comment. Similarly, we may prove that successive applications of not-necessarily-equal linear operators still yield a linear operator, as you may verify.

Finally, we have the following

Theorem 110. Given a linear operator, $L$, we have that a linear combination of repeated applications $L^k$ of this operator is still a linear operator.

◦ Proof: Combine the last two theorems, using $L_k := L^k$.
In particular, we have that (note: if $L := d/dx$, then $L^k := d^k/dx^k$)
$$\mathcal{L} := \sum_{k=0}^{n} c_k(x)\, \frac{d^k}{dx^k} = c_0(x) + c_1(x)\, \frac{d}{dx} + \cdots + c_n(x)\, \frac{d^n}{dx^n} \tag{9.162}$$
is a linear operator, called a linear differential operator of order $n$. [It is addressed in detail in Vol. II.]
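The linearity of a differential operator of the form of Eq. 9.162 can be checked on polynomials, whose derivatives are available exactly. The sketch below is illustrative: the coefficient-list representation, the helper names and the chosen $c_k(x)$ are assumptions made here. It evaluates both sides of Eq. 9.152 at a point for an operator of order 2.

```python
def poly_derivative(coeffs):
    """Coefficients (a_0, a_1, ...) of the derivative of sum_k a_k x^k."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    return sum(a * x ** k for k, a in enumerate(coeffs))

def apply_operator(c_funcs, coeffs, x):
    """Evaluate (L u)(x) = sum_k c_k(x) d^k u/dx^k for a polynomial u (Eq. 9.162)."""
    total, current = 0.0, list(coeffs)
    for c in c_funcs:
        total += c(x) * poly_eval(current, x)
        current = poly_derivative(current)
    return total

def combine(alpha1, u1, alpha2, u2):
    """Coefficients of alpha1 u1 + alpha2 u2."""
    n = max(len(u1), len(u2))
    p1 = list(u1) + [0.0] * (n - len(u1))
    p2 = list(u2) + [0.0] * (n - len(u2))
    return [alpha1 * a + alpha2 * b for a, b in zip(p1, p2)]

# Illustrative operator: L = (1 + x) + x^2 d/dx + 2 d^2/dx^2.
c_funcs = [lambda x: 1.0 + x, lambda x: x * x, lambda x: 2.0]
u1 = [1.0, 2.0, 3.0]          # 1 + 2 x + 3 x^2
u2 = [0.0, -1.0, 0.0, 4.0]    # -x + 4 x^3
x, a1, a2 = 0.5, 2.0, -3.0

lhs = apply_operator(c_funcs, combine(a1, u1, a2, u2), x)
rhs = a1 * apply_operator(c_funcs, u1, x) + a2 * apply_operator(c_funcs, u2, x)
print(f"L[a1 u1 + a2 u2] = {lhs:.12f}, a1 L u1 + a2 L u2 = {rhs:.12f}")
```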
9.8.3 Linear–operator equations
♣
An equation of the type
$$L u = f(x), \tag{9.163}$$
where $L$ is a linear operator, is called linear. If $f(x) = 0$, the equation is called homogeneous, otherwise it is called nonhomogeneous. The term $f(x)$ is called the input function (forcing function in dynamics).

Remark 82. Important linear-operator equations addressed in this book are:
1. Linear differential equations. An equation is called differential iff the corresponding expression contains the unknown function $f(t)$, and at least one of its derivatives. The order of the equation is that of the highest derivative.
2. Systems of linear differential equations.
3. Partial differential equations. They involve partial derivatives.
4. Integral equations. They involve integrals (introduced in the next chapter). No derivatives!
5. Integro-differential equations. They involve integrals and derivatives!
9.8.4 Uniqueness and superposition theorems
♣
We have the following (note the analogy with Definition 72, p. 130, for matrices)

Definition 197 (Associated homogeneous equation). Given a linear equation of the type $Lu = f(x)$, the associated homogeneous equation is obtained by setting $f(x) = 0$, to yield $Lu = 0$.

Then, we may state the following theorems:

Theorem 111 (Uniqueness of the solution of $Lu = f$). If the associated homogeneous equation, $Lu = 0$, admits only the trivial solution $u(x) = 0$, then the solution to the nonhomogeneous equation $Lu = f(x)$, if it exists, is unique.

◦ Comment. Note the analogy with the first part of Theorem 36, p. 130, for linear algebraic systems.
◦ Proof: Let us assume that the nonhomogeneous equation has two solutions, namely that there exist $u_1(x)$ and $u_2(x)$ such that $Lu_1 = f(x)$ and $Lu_2 = f(x)$. Then, we have
$$L(u_1 - u_2) = L u_1 - L u_2 = f(x) - f(x) = 0, \tag{9.164}$$
which implies $u_1(x) - u_2(x) = 0$, by hypothesis. Hence, the solution is unique.

Theorem 112 (Superposition for linear homogeneous equations). If $u_k(x)$ ($k = 1, \dots, n$) are $n$ linearly independent solutions of the homogeneous equation, namely if $L u_k = 0$, then $u(x) = \sum_{k=1}^{n} c_k u_k(x)$, with $c_k$ arbitrary constants, is also a solution to $Lu = 0$.

◦ Proof: Indeed, using Eq. 9.155, we have
$$L u = L \sum_{k=1}^{n} c_k u_k = \sum_{k=1}^{n} c_k\, L u_k = 0, \tag{9.165}$$
in agreement with the theorem.
Theorem 113 (Superposition of input functions). If the function $u_k(x)$ ($k = 1, \dots, n$) satisfies the equation $L u_k = f_k(x)$, then $u(x) = \sum_{k=1}^{n} \alpha_k u_k(x)$, where $\alpha_k$ are prescribed constants, is a solution to Eq. 9.163 with
$$f(x) = \sum_{k=1}^{n} \alpha_k f_k(x). \tag{9.166}$$
◦ Proof: Indeed, using Eqs. 9.155 and 9.166, we have
$$L u = L \sum_{k=1}^{n} \alpha_k u_k = \sum_{k=1}^{n} \alpha_k\, L u_k = \sum_{k=1}^{n} \alpha_k f_k(x) = f(x), \tag{9.167}$$
in agreement with the theorem.
Theorem 114 (Superposition for linear nonhomogeneous equations). If the functions $u_k^H(x)$, with $k = 1, \dots, n$, are solutions of the associated homogeneous equation, namely if $L u_k^H(x) = 0$, and if $u_N(x)$ is any solution to the nonhomogeneous equation $L u_N(x) = f(x)$, then
$$u(x) = u_N(x) + \sum_{k=1}^{n} c_k u_k^H(x), \tag{9.168}$$
where $c_k$ are arbitrary constants, is also a solution to $Lu = f(x)$.

◦ Proof: Since $L$ is linear, we have
$$L u = L\Big[ u_N(x) + \sum_{k=1}^{n} c_k u_k^H(x) \Big] = L u_N(x) + \sum_{k=1}^{n} c_k\, L u_k^H(x) = f(x), \tag{9.169}$$
in agreement with the theorem.
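Theorem 114 can be illustrated on a concrete case. The sketch below uses an operator chosen here purely as an example, $L = d^2/dx^2 + 1$, with input $f(x) = x$; then $u_N(x) = x$ is a particular solution, and $\cos x$ and $\sin x$ solve the associated homogeneous equation. The second derivative is approximated by a centered difference (the formula of Eq. 9.175 in Appendix C), so the residual is small but not exactly zero.

```python
import math

def apply_L(u, x, h=1e-4):
    """(L u)(x) for the illustrative operator L = d^2/dx^2 + 1.
    The second derivative is approximated by a centered difference (Eq. 9.175)."""
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return second + u(x)

def u_N(x):
    """A particular solution of L u = x (since u_N'' = 0)."""
    return x

def general_solution(c1, c2):
    """u = u_N + c1 cos x + c2 sin x, in the form of Eq. 9.168."""
    return lambda x: u_N(x) + c1 * math.cos(x) + c2 * math.sin(x)

# Whatever the constants, the residual L u - f stays at the level of the
# finite difference error.
for c1, c2 in ((0.0, 0.0), (2.0, -3.0), (-1.5, 0.25)):
    u = general_solution(c1, c2)
    residual = max(abs(apply_L(u, x) - x) for x in (0.3, 0.8, 1.7))
    print(f"c1 = {c1:+.2f}, c2 = {c2:+.2f}: max |L u - f| = {residual:.2e}")
```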
Remark 83. For a differential equation of order $n$, the expression in Eq. 9.168 is called the general solution to the nonhomogeneous equation, provided that the $n$ functions $u_k^H(x)$ are linearly independent. [The expression in Eq. 9.168 with $u_N(x) = 0$ is called the general solution to the homogeneous equation, provided again that the $n$ functions $u_k^H(x)$ are linearly independent.]

Combining the last two theorems, we have that if $u_k^H(x)$ ($k = 1, \dots, n$) are solutions of the associated homogeneous equation (namely if $L u_k^H(x) = 0$), whereas the function $u_j(x)$ ($j = 1, \dots, m$) satisfies the equation $L u_j = f_j(x)$, then the solution to Eq. 9.163 with $f(x) = \sum_{j=1}^{m} \alpha_j f_j(x)$ (with $\alpha_j$ prescribed constants) is given by
$$u(x) = \sum_{j=1}^{m} \alpha_j u_j(x) + \sum_{k=1}^{n} c_k u_k^H(x), \tag{9.170}$$
where $c_k$ are arbitrary constants.

◦ A point of clarification on superposition theorems. The above superposition theorems, valid exclusively for linear equations, greatly simplify the study of these equations. For this reason, the solutions of linear equations are much simpler to obtain than those of nonlinear equations. Again, this is particularly important in engineering, where engineers can, to a large extent, choose the type of physical object to be used in their work. If at all possible, engineers design their object so that it behaves like a linear system (namely so that the mathematical approximation obtained after linearization offers a good description of the actual system), because linear equations are much simpler to study than nonlinear ones. [The advantage is largely due to the fact that the superposition theorems are not available for nonlinear operators.] For instance, a linear spring is always preferable to a nonlinear one. Thus, during the design, the engineer typically makes sure that the deformation of the spring is limited to a range such that the linear model is adequate.
9.8.5 Linear mappings
♠
As stated in Definition 132, p. 253, the term mapping includes any algorithm that, applied to some mathematical entity (such as numbers, n-tuples of numbers, or functions), generates another mathematical entity, not necessarily of the same type. [Recall the mappings that we have already introduced, which include: functions (which applied to a number produce another number); matrices (which applied to a vector produce another vector); operators (which applied to a function produce another function); and functionals (which applied to a function produce a number, Subsection 6.8.1).] We have the following

Definition 198 (Linear mapping). A mapping $L_M$ is called linear iff, given two pertinent mathematical entities $u_1$ and $u_2$, along with two scalars $\alpha_1$ and $\alpha_2$, we have
$$L_M\big[\alpha_1 u_1 + \alpha_2 u_2\big] = \alpha_1\, L_M u_1 + \alpha_2\, L_M u_2. \tag{9.171}$$
The considerations regarding linear operators in the preceding subsections apply, of course, to linear mappings as well.
9.9 Appendix C. Finite difference approximations
♣
In this chapter, we have introduced the derivative of a function as the limit of the difference quotient. The reverse process is widely used for computer calculations. [For, computers cannot perform limits. Accordingly, we typically approximate a derivative with the corresponding difference quotient.] Specifically, consider a suitably smooth function, which is defined for all x ∈ [a, b], but is addressed only at given points, here assumed for simplicity to be uniformly spaced, with $x_k = k\,\Delta x$. Assume that you want to have an approximate evaluation of the derivative of f(x) at the point $x_{k+\frac12}$ that lies between $x_k$ and $x_{k+1}$. This is given by
$$ f'(x_{k+\frac12}) \approx \frac{f(x_{k+1}) - f(x_k)}{h}, \qquad \text{where } h = \Delta x. \tag{9.172} $$
Indeed, if we take the limit as h = Δx tends to zero, the right side approaches the left one. Similarly, we have
$$ f'(x_{k-\frac12}) \approx \frac{f(x_k) - f(x_{k-1})}{h}. \tag{9.173} $$
If you want the value at $x_k$, you can use the average of the two and obtain
$$ f'(x_k) \approx \frac{f(x_{k+1}) - f(x_{k-1})}{2h}. \tag{9.174} $$
On the other hand, if you want an approximation for the second–order derivative, you can use $f''(x_k) \approx [f'(x_{k+\frac12}) - f'(x_{k-\frac12})]/h$, namely
$$ f''(x_k) \approx \frac{f(x_{k+1}) - 2 f(x_k) + f(x_{k-1})}{h^2}. \tag{9.175} $$
[Compare to Eq. 9.30.] We may also obtain approximations for higher–order derivatives. For instance, for the third–order derivative, we have (use the first derivative of the second derivative) $f'''(x_k) \approx [f''(x_{k+\frac12}) - f''(x_{k-\frac12})]/h$, namely
$$ f'''(x_k) \approx \frac{f(x_{k+\frac32}) - 3 f(x_{k+\frac12}) + 3 f(x_{k-\frac12}) - f(x_{k-\frac32})}{h^3}. \tag{9.176} $$
Similarly, for the fourth–order derivative, we have (use the second derivative of the second derivative) $f^{(4)}(x_k) \approx [f''(x_{k+1}) - 2 f''(x_k) + f''(x_{k-1})]/h^2$, namely
$$ f^{(4)}(x_k) \approx \frac{f(x_{k+2}) - 4 f(x_{k+1}) + 6 f(x_k) - 4 f(x_{k-1}) + f(x_{k-2})}{h^4}. \tag{9.177} $$
[An error analysis of the above finite difference approximation is presented in Section 13.7, after we introduce the appropriate tools to address this issue, namely the so–called Taylor polynomials, which are the subject of Chapter 13.] ◦ Comment. Note that the coefficients in Eqs. 9.175, 9.176 and 9.177 are the same as those for $(1 - x)^n$, with n = 2, 3 and 4, as you may verify. [Except for the sign, these are equal to the binomial coefficients, which will be introduced in Subsection 13.4.2 (Eq. 13.35).]
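The difference formulas of this appendix translate almost literally into code. The following is a minimal Python sketch, not part of the original text: the function names and the test functions are illustrative assumptions, and the step sizes are chosen only to keep round-off small.

```python
import math

def central_first(f, x, h):
    """Eq. 9.174: central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def central_second(f, x, h):
    """Eq. 9.175: approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def central_third(f, x, h):
    """Eq. 9.176: approximation of f'''(x), using the half-points x +/- h/2, x +/- 3h/2."""
    return (f(x + 1.5 * h) - 3.0 * f(x + 0.5 * h)
            + 3.0 * f(x - 0.5 * h) - f(x - 1.5 * h)) / h**3

def central_fourth(f, x, h):
    """Eq. 9.177: approximation of the fourth-order derivative of f at x."""
    return (f(x + 2.0 * h) - 4.0 * f(x + h) + 6.0 * f(x)
            - 4.0 * f(x - h) + f(x - 2.0 * h)) / h**4

if __name__ == "__main__":
    x = 0.7
    for h in (1e-1, 1e-2, 1e-3):
        err1 = abs(central_first(math.sin, x, h) - math.cos(x))
        err2 = abs(central_second(math.sin, x, h) + math.sin(x))
        print(f"h = {h:g}: error in f' = {err1:.2e}, error in f'' = {err2:.2e}")
```

Running the loop shows the errors shrinking as h decreases, which is the numerical counterpart of taking the limit in the difference quotient. For polynomials of low enough degree (for instance x³ in Eq. 9.176 and x⁴ in Eq. 9.177) the formulas are exact up to round-off.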
Chapter 10
Integral calculus
In this chapter, we introduce two important notions: (i) the so–called integral of a function, and (ii) the so–called primitive of a function. We also show the relationship between integrals, primitives and derivatives. ◦ Warning. What I call the “primitive” is almost universally referred to as the “indefinite integral,” whereas what I call the “integral” is typically referred to as the “definite integral.” [More on this later.]
• Overview of this chapter
In Section 10.1, we define the integral of a continuous function over a bounded closed interval [a, b], as the area under the graph of a function, between the vertical lines x = a and x = b. We also present some properties of integrals, in particular the mean value theorems, the rule for changing the variable of integration, and integrals of even and odd functions. In addition, we extend the definition to piecewise–continuous functions. Then, in Section 10.2, we define an apparently unrelated operation: the primitive (or antiderivative) of a function, namely, the inverse of the derivative. In Section 10.3, we show the relationship between the two. In particular, we show how an integral of a function may be evaluated from the corresponding primitive. In Section 10.4, we introduce two rules that may be used for the evaluation of an integral, namely the integration by substitution and the integration by parts. Next, in Section 10.5, we address some interesting applications. In particular, we present: (i) the so–called method of separation of variables for ordinary differential equations, (ii) a proof that sin x and cos x are linearly independent, and (iii) the expression for the length of a line. Finally, in Section 10.6, we show how to evaluate the derivative of an integral.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_10
We also have two appendices. In the first (Section 10.7), we extend the definition of an integral given in Section 10.1, which is limited to functions that are bounded over a bounded closed interval [a, b], so as to remove some limitations. [In particular, we address the so–called improper integrals of the first and second kind, namely integrals for: (i) unbounded integrand and (ii) unbounded intervals.] In the second appendix (Section 10.8), we present a more refined definition of an integral (the so–called Darboux integral). This allows us to identify sufficient regularity conditions for a function to be Darboux integrable (specifically, that the function is piecewise–continuous).
10.1 Integrals
Consider a function f(x) > 0 that is continuous in the bounded interval [a, b], with −∞ < a < b < ∞. Let A(a, b) denote the area underneath the graph of f(x), between the lines x = a and x = b (Fig. 10.1). To evaluate A(a, b), let us subdivide the interval [a, b] into n uniformly spaced subintervals $[x_{k-1}, x_k]$ (k = 1, . . . , n), where $x_k = a + k\,\Delta x$, with $\Delta x = (b - a)/n$, so that $x_0 = a$ and $x_n = b$.
Fig. 10.1 Integral of f (x)
Fig. 10.2 Histogram from f (x)
Next, let us approximate the function f(x) with a histogram, $\breve f(x)$. [A histogram is the graph of a function that is piecewise constant.] Specifically, by definition, in the k-th interval $(x_{k-1}, x_k)$, $\breve f(x)$ takes the value that f(x) has in the corresponding middle point
$$ \breve x_k = \tfrac12\,(x_{k-1} + x_k), \tag{10.1} $$
namely (see Fig. 10.2):
$$ \breve f(x) = f(\breve x_k), \qquad \text{for } x \in (x_{k-1}, x_k). \tag{10.2} $$
The area A(a, b) may be evaluated by approximating the area under the curve with the area of the histogram, and then taking the limit as n tends to infinity and Δx = (b − a)/n tends to zero. This yields
$$ A(a, b) = \lim_{n\to\infty} \sum_{k=1}^{n} f(\breve x_k)\,\Delta x. \tag{10.3} $$
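The histogram construction of Eqs. 10.1–10.3 can be tried out numerically by stopping at a finite n. The following Python sketch is an illustration under that assumption; the name `midpoint_sum` is hypothetical and does not come from the text.

```python
import math

def midpoint_sum(f, a, b, n):
    """Finite-n version of Eq. 10.3: area of the histogram built on n
    uniform subintervals, with f sampled at the middle points (Eq. 10.1)."""
    dx = (b - a) / n
    # The k-th middle point is a + (k - 1/2) dx, for k = 1, ..., n.
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

if __name__ == "__main__":
    for n in (10, 100, 1000):
        approx = midpoint_sum(lambda x: x * x, 0.0, 1.0, n)
        print(f"n = {n:5d}: sum = {approx:.8f}, error = {abs(approx - 1.0 / 3.0):.2e}")
```

As n grows, the sum for f(x) = x² over [0, 1] approaches 1/3. Nothing in the code requires f(x) > 0: where f(x) is negative, the corresponding terms simply contribute negative areas.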
◦ Comment. Note that the assumption that f(x) > 0 made in the beginning is irrelevant in Eq. 10.3. Indeed, the interpretation of A(a, b) in terms of areas still applies, provided that we interpret as negatives the areas when f(x) < 0 (namely the areas below the x-axis). The operation defined by the right side of Eq. 10.3 is referred to as the integral of the function f(x) between a and b, even when f(x) is not necessarily positive. Specifically, we have the following¹ Definition 199 (Riemann integral of a continuous function). Consider a function f(x) that is continuous in the bounded closed interval [a, b], so that f(x) is bounded there (Theorem 82, p. 336). The quantity
$$ \int_a^b f(x)\,dx := \lim_{n\to\infty} \sum_{k=1}^{n} f(\breve x_k)\,\Delta x \tag{10.4} $$
is called the Riemann integral of the function f(x) between a and b. The function f(x) is called the integrand.² The variable x is called the dummy variable of integration. The endpoints of the interval, a and b, are called respectively lower and upper limits of integration. A function for which the limit in Eq. 10.4 exists is called Riemann integrable. [A refinement is given in Definition 202, p. 443.] ◦ Comment. Akin to the dummy index of summation (Eq. 3.23), here dummy means that the symbol used for it is irrelevant, in the sense that any symbol may be used instead of x, without affecting the result. For instance, we have
$$ \int_a^b f(x)\,dx = \int_a^b f(u)\,du. \tag{10.5} $$
¹ The Riemann integral is named after Georg Friedrich Bernhard Riemann (1826–1866), an important German mathematician, who contributed to analysis, theory of complex variables (the Cauchy–Riemann conditions are also named after him), and differential geometry (Riemannian geometry), with implications for the Einstein formulation of general relativity.
² From integrandum, Latin for “that which is to be integrated.”
Remark 84. As mentioned in the warning at the beginning of this chapter, the operation defined by the right side of Eq. 10.3 is almost universally referred to as the definite integral. This is done in order to distinguish it from the indefinite integral (or primitive), which is introduced in Section 10.2. Such terminology has always been a source of confusion for my students (and for me, when I was a student). In my experience, the confusion is eliminated by adopting the terminology used here, namely: (i) the term integral (instead of “definite integral ”) is used for the quantity defined in Eq. 10.4; (ii) the term primitive, or antiderivative (instead of “indefinite integral ”) is used for the quantity introduced in Section 10.2. [More on this in Remark 85, p. 420.]
10.1.1 Properties of integrals
In this subsection, we examine the basic properties of integrals. Note that an integral is a functional (Definition 134, p. 253).
• Absolute value of an integral
The definition of integral (Eq. 10.4) implies (use |A ± B| ≤ |A| + |B|, Eq. 1.70)
$$ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{10.6} $$
• Linearity of integrals
Using the definition of integral, as well as the linearity of limits (Theorem 8.13, p. 307), one obtains that, given any two constants α and β, we have
$$ \int_a^b \big[\alpha\, f(x) + \beta\, g(x)\big]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx. \tag{10.7} $$
Hence, the integral is a linear functional. [The above rule often helps in breaking down complicated integrals into simpler ones.]
• Additivity of integration intervals
It is apparent from the definition that, for a < b < c,
$$ \int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx. \tag{10.8} $$
As we will find out, it would be very convenient for the above equation to be valid even when we do not have a < b < c. Accordingly, we impose such a rule to be valid (by definition) for all values of a, b and c! In particular, applying Eq. 10.8 to the case c = a, we have
$$ \int_a^b f(x)\,dx + \int_b^a f(x)\,dx = \int_a^a f(x)\,dx = 0, \tag{10.9} $$
or
$$ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \tag{10.10} $$
• Mean value theorems in integral calculus
We have the following Theorem 115 (Mean value theorem in integral calculus). Consider a function f(x), continuous in the bounded closed interval [a, b]. There exists a point ξ ∈ [a, b] such that
$$ \int_a^b f(x)\,dx = (b - a)\, f(\xi). \tag{10.11} $$
◦ Proof : Let $f_{\mathrm{Max}}$ and $f_{\mathrm{Min}}$ denote respectively the maximum and minimum of f(x) in [a, b]. Then, the definition of integral implies
$$ (b - a)\, f_{\mathrm{Min}} \le \int_a^b f(x)\,dx \le (b - a)\, f_{\mathrm{Max}}. \tag{10.12} $$
Hence, the integral equals $(b - a)\, f_\xi$, for some $f_\xi \in [f_{\mathrm{Min}}, f_{\mathrm{Max}}]$. Then, Corollary 3, p. 342, to the intermediate value theorem for continuous functions, guarantees that there exists a ξ between $x_{\mathrm{Min}}$ and $x_{\mathrm{Max}}$ (the points where f(x) attains $f_{\mathrm{Min}}$ and $f_{\mathrm{Max}}$) such that $f(\xi) = f_\xi$, since f(x) is assumed to be continuous in [a, b]. This implies Eq. 10.11. We have the following
Corollary 5 (Corollary to mean value theorem). Consider a function f(x), continuous in the bounded closed interval [a, b]. Then,
$$ \lim_{b\to a} \frac{1}{b - a} \int_a^b f(x)\,dx = f(a). \tag{10.13} $$
◦ Proof : Use Eq. 10.11, as well as the fact that $\lim_{b\to a} f(\xi) = f(a)$, since ξ ∈ [a, b]. For future reference, consider also the following Theorem 116 (Extended mean value theorem in integral calculus). Assume that f(x) is continuous in [a, b], and that g(x) is integrable in [a, b]. In addition, assume that g(x) has always the same sign in [a, b]. Then, there exists a point ξ ∈ [a, b] such that
$$ \int_a^b f(x)\, g(x)\,dx = f(\xi) \int_a^b g(x)\,dx. \tag{10.14} $$
◦ Proof : ♠ Let us assume for simplicity that g(x) > 0 in [a, b]. [This is without loss of generality. For, if g(x) < 0 in [a, b], simply use −g(x).] Let $f_{\mathrm{Max}}$ and $f_{\mathrm{Min}}$ denote respectively the maximum and minimum of f(x) in [a, b]. Then, we have
$$ f_{\mathrm{Min}}\, g(x) \le f(x)\, g(x) \le f_{\mathrm{Max}}\, g(x). \tag{10.15} $$
Integrating from a to b, we have
$$ f_{\mathrm{Min}} \int_a^b g(x)\,dx \le \int_a^b f(x)\, g(x)\,dx \le f_{\mathrm{Max}} \int_a^b g(x)\,dx. \tag{10.16} $$
Hence, Eq. 10.14 follows, as in the proof of the preceding theorem.
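The mean value theorem (Eq. 10.11) asserts that a point ξ exists, without saying where it is. As a numerical illustration only, the Python sketch below locates ξ by bisection under the additional assumption, not made in the theorem, that f(x) is monotonic in [a, b], so that f(x) minus its mean value changes sign between the endpoints. The helper names are hypothetical, and the integral is approximated with a finite sum of the type appearing in Eq. 10.4.

```python
import math

def midpoint_sum(f, a, b, n=2000):
    """Finite-n approximation of the integral in Eq. 10.4."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def mean_value_point(f, a, b, tol=1e-12):
    """Return xi in [a, b] with (b - a) f(xi) equal to the (approximate) integral.

    Assumption: f is continuous and monotonic in [a, b], so that bisection
    on f(x) - mean brackets a root."""
    mean = midpoint_sum(f, a, b) / (b - a)
    lo, hi = a, b
    g_lo = f(lo) - mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - mean
        if g_lo * g_mid <= 0.0:
            hi = mid              # sign change in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # sign change in [mid, hi]
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    xi = mean_value_point(lambda x: x * x, 0.0, 3.0)
    print("xi =", xi, " sqrt(3) =", math.sqrt(3.0))
```

For f(x) = x² in [0, 3] the integral equals 9, the mean value of f is 3, and the computed ξ agrees with √3 to within the accuracy of the finite sum.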
• Substitution of the variable of integration
Here, we want to see what happens to the operation of integration if we substitute the dummy variable of integration, say x, with a different one, say u, which is related to x through a function, namely as x = g(u). For the sake of simplicity, let us assume g(u) to be a continuously differentiable function that is monotonic (Definition 130, p. 250), which is typically the case. Accordingly, x = g(u) may be uniquely inverted to yield a single–valued function $u = g^{-1}(x)$ (Section 6.6).
It is apparent that the new integrand will be obtained as follows: given a value of u, we can evaluate x = g(u). Then, we have that the integrand will be given by the composite function f[g(u)]. Also, the limits of integration will change: instead of integrating from a to b, we will have to integrate from $u_a := g^{-1}(a)$ to $u_b := g^{-1}(b)$. However, these changes (namely those of integrand and limits of integration) are not all there is to it! Indeed, the abscissa for the graph of y = f(x) between a and b differs from that for the graph of y = f[g(u)] between $u_a$ and $u_b$. To see what to do about it, it is convenient to return to the definition of integral (Eq. 10.4). Assume, for the time being, $g'(u) > 0$. Accordingly, in addition to the above substitutions, we have to use the fact that (use Eq. 9.6.1)
$$ \Delta x = g'(u)\,\Delta u + o[\Delta u]. \tag{10.17} $$
Thus, for the term on the right side of Eq. 10.4, we have
$$ \sum_{k=1}^{n} f(\breve x_k)\,\Delta x = \sum_{k=1}^{n} f[g(\breve u_k)]\, g'(\breve u_k)\,\Delta u + o[1]. \tag{10.18} $$
Then, in the limit, one obtains
$$ \int_a^b f(x)\,dx = \int_{u_a}^{u_b} f[g(u)]\, g'(u)\,du, \tag{10.19} $$
where $u_a := g^{-1}(a)$ and $u_b := g^{-1}(b)$. Equation 10.19 is the desired integration–by–substitution rule to be used to change the dummy variable of integration. In particular, let us make the substitution x = α u + β, which implies dx = α du and u = (x − β)/α. Accordingly, we have
$$ \int_a^b f(x)\,dx = \alpha \int_{(a-\beta)/\alpha}^{(b-\beta)/\alpha} f(\alpha u + \beta)\,du. \tag{10.20} $$
Less elementary examples are presented in Subsection 10.4.1. ◦ Comment. Above, we have assumed $g'(u) > 0$. Let us see what happens if $g'(u) < 0$. First of all, Eq. 10.19 is still valid, as you may verify. In this case, however, we have $u_a > u_b$, and hence we typically use
$$ \int_a^b f(x)\,dx = \int_{u_a}^{u_b} f[g(u)]\, g'(u)\,du = \int_{u_b}^{u_a} f[g(u)]\, |g'(u)|\,du, \tag{10.21} $$
which is a more interesting way to present Eq. 10.19.
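The substitution rule can be checked numerically by evaluating both sides of Eqs. 10.19 and 10.21 with a finite sum. The Python sketch below is an illustration only: the choices f(x) = cos x, g(u) = u² (increasing) and g(u) = 5 − u (decreasing), as well as all names, are assumptions made for the example.

```python
import math

def midpoint_sum(f, a, b, n=2000):
    """Finite-n approximation of the integral in Eq. 10.4.
    If b < a, dx is negative, consistently with Eq. 10.10."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

f = math.cos
a, b = 1.0, 4.0

# Left side of Eq. 10.19: integral of f(x) from a to b.
direct = midpoint_sum(f, a, b)

# Right side of Eq. 10.19 with x = g(u) = u**2, g'(u) = 2u > 0,
# so that u_a = sqrt(a) = 1 and u_b = sqrt(b) = 2.
substituted = midpoint_sum(lambda u: f(u * u) * 2.0 * u, 1.0, 2.0)

# Decreasing case, x = g(u) = 5 - u, g'(u) = -1 < 0,
# so that u_a = 4 > u_b = 1 (middle expression of Eq. 10.21).
substituted_decreasing = midpoint_sum(lambda u: f(5.0 - u) * (-1.0), 4.0, 1.0)

if __name__ == "__main__":
    print(direct, substituted, substituted_decreasing, math.sin(b) - math.sin(a))
```

All three numbers agree with sin 4 − sin 1 to within the accuracy of the sums. Dropping the factor g'(u) in the second or third expression spoils the agreement, which is the point made in the text about the change of abscissa.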
• Integrals of even and odd functions
Recall the definition of even functions $f_E(-x) = f_E(x)$ (Eq. 6.7), and odd functions $f_O(-x) = -f_O(x)$ (Eq. 6.8). We have the following Theorem 117. If $f_O(x)$ is an odd function, then its integral over a symmetric interval [−a, a] vanishes:
$$ \int_{-a}^{a} f_O(x)\,dx = 0. \tag{10.22} $$
If $f_E(x)$ is an even function, then its integral over a symmetric interval [−a, a] equals twice that over the interval (0, a):
$$ \int_{-a}^{a} f_E(x)\,dx = 2 \int_0^a f_E(x)\,dx. \tag{10.23} $$
◦ Proof : The theorem is an immediate consequence of the definitions of integral (Eq. 10.4), and of even and odd functions (Eqs. 6.7 and 6.8), as you can see by using the graphs of the functions, as well as the definition of integrals in terms of areas. To address this in a more formal way, consider the two subintervals (−a, 0) and (0, a) and note that, for any f(x), we have
$$ \int_{-a}^{0} f(x)\,dx = -\int_a^0 f(-u)\,du = \int_0^a f(-x)\,dx, \tag{10.24} $$
as you may verify. [Hint: For the first equality, use Eq. 10.21 with x = −u. For the second equality, simply rename the dummy variable from u into x (Eq. 10.5) and use Eqs. 10.10 and 10.20.] Then, if the function is odd, the contributions of the two subintervals in Eq. 10.4 offset each other (in agreement with Eq. 10.22), whereas, if the function is even, the contributions of the two subintervals in Eq. 10.4 are equal, and their sum may be replaced with twice the contribution from the right subinterval, in agreement with Eq. 10.23.
10.1.2 Integrals of piecewise–continuous functions
Note that the values of f(x) at the endpoints x = a and x = b are never used in Eq. 10.4. Thus, using Eq. 10.8 we could broaden the definition and say that f(x) is integrable even if f(x) is piecewise–continuous (Definition 163, p. 345). [Alternatively, use the convention on piecewise–continuous functions (Remark 70, p. 346).]
Here, we address the issue in detail. Assume that f(x) is bounded, but only piecewise–continuous, with removable or jump discontinuities at $x = X_k \in (a, b)$ (k = 1, . . . , n − 1), with $X_k < X_{k+1}$. Thus, setting $X_0 = a$ and $X_n = b$, f(x) is bounded and continuous within $(X_{k-1}, X_k)$ (k = 1, . . . , n). Then, extending the additivity property of integration intervals for continuous functions (Eq. 10.8), we can define
$$ \int_a^b f(x)\,dx := \sum_{k=1}^{n} \int_{X_{k-1}}^{X_k} f(x)\,dx, \tag{10.25} $$
where for each of the n integrals on the right side we assume the function to be Riemann integrable in $[X_{k-1}, X_k]$ (Eq. 10.4).
10.2 Primitives
In this section, we introduce an important new notion, namely the inverse of the derivative operator. Specifically, given a function f(x), we are interested in any function F(x) whose derivative coincides with f(x). As we will see in the next section (specifically in Subsection 10.3.2), this operation is essential for the evaluation of integrals. Thus, this operation is very important and F(x) deserves a name. Accordingly, we have the following Definition 200 (Primitive). Let f(x) denote a function of x, continuous in [a, b]. Any function F(x) such that
$$ \frac{dF}{dx} = f(x) \tag{10.26} $$
for all x ∈ [a, b] is called a primitive (or an antiderivative) of f(x). The universally used symbol to denote the primitive is³
$$ F(x) = \int f(u)\,du. \tag{10.27} $$
[Of course, the term “primitive” has nothing to do with the term “primitive concept” (Subsubsection “Primitive terms,” p. 14).]
³ Both Newton and Leibniz used antidifferentiation to obtain integrals, such as areas and volumes (Kline, Ref. [38], p. 379). The symbol $\int$ for primitives (an elongated S for “sum”) was introduced by Leibniz (Kline, Ref. [38], p. 373). The symbol $\int_a^b$ for integrals was introduced by Joseph Fourier (Grattan-Guinness, Ref. [27], p. 359). Jean-Baptiste Joseph Fourier (1768–1830) was a French mathematician and physicist. The Fourier series, the Fourier transform, and the Fourier law in heat conduction are named after him.
◦ Comment. Note that akin to the derivative operator, the primitive operator is linear, as you may verify. ◦ Comment. Note that the primitive is a mapping that, given a function, f(u), produces a function, F(x). Thus, this mapping is an operator (Definition 133, p. 253). This operator is called the primitive operator or, more often than not, simply the primitive. [Akin to the derivative (Remark 79, p. 370), the use of the same term to indicate both the function and the operator should not cause any problem.] Remark 85 (On notation and terminology). The symbol used for primitives (Eq. 10.27) is the same as that for integrals, except for the fact that the limits of integration are missing. As mentioned above, in the next section we will see how the primitive allows one to evaluate integrals. Thus, the symbol used to denote the primitive somehow makes sense, in that it emphasizes the relationship between integrals and the instrument used to evaluate them. However, the two concepts are considerably different. Indeed, the integral is a functional, whereas the primitive is an operator. As mentioned in Remark 84, p. 414, often this difference is not properly emphasized, and this may be confusing. To make things worse, in the literature primitives are often referred to as “indefinite integrals.” In contrast, the integrals as introduced above are referred to as “definite integrals.” Indeed, the term “indefinite integrals” is at least as common as is the term “primitive.” [The term “antiderivative,” which is the one that most emphasizes the concept in the definition, is used much less.] In this book, I will use exclusively the term “primitive.” If I want to remind you of the definition of primitive, I will add the term “antiderivative,” which I will not use by itself, since it is not widely used.
In any event, I will stay away from the terms “definite integral” and “indefinite integral,” which in my experience, as a student and as a teacher, are major sources of confusion. We have the following Theorem 118. The primitive of the zero function 0(x) (Eq. 6.13) is a constant. ◦ Proof : This is simply a different way to state Theorem 97, p. 376.
In general, the definition of primitive implies that the primitive is not unique, since adding a constant does not alter the result. Indeed, we have the following Theorem 119 (Distinct primitives of a given function). Let F1 (x) and F2 (x) denote any two primitives of f (x). We have that F1 (x) and F2 (x) differ at most by an additive constant:
$$ F_2(x) = F_1(x) + C, \tag{10.28} $$
where C is an arbitrary constant. ◦ Proof : Indeed, if $F_1(x)$ and $F_2(x)$ are two primitives of f(x), we have
$$ \frac{d}{dx}\big[F_2(x) - F_1(x)\big] = f(x) - f(x) = 0, \tag{10.29} $$
which implies Eq. 10.28 (Theorem 118 above).
◦ Comment (Primitives of even and odd functions). Recall the definition of even functions, fE(−x) = fE(x), and odd functions, fO(−x) = −fO(x) (Eqs. 6.7 and 6.8, respectively). The facts that the derivative of an even function is odd, whereas that of an odd function is even (Eq. 9.28), imply that the primitive of an even function is odd, whereas that of an odd one is even.
10.2.1 Primitives of elementary functions
Here, we present a few illustrative examples of primitives which are obtained from what we know about derivatives. [Extensive lists of primitives are available for instance in Abramowitz and Stegun (Ref. [2]), Gradshteyn and Ryzhik (Ref. [25]) and Jeffrey (Ref. [33]).] For instance, if n is an integer, with n ≠ −1, we have (use Eq. 9.46)
$$ \int x^n\,dx = \frac{x^{n+1}}{n+1} \qquad (n \ne -1). \tag{10.30} $$
[The case n = −1 will be addressed in connection with the definition of logarithms (Eq. 12.1).] If the exponent is real, we have (use Eq. 9.39)
$$ \int x^\alpha\,dx = \frac{x^{\alpha+1}}{\alpha+1} \qquad (\alpha \ne -1). \tag{10.31} $$
[Recall that in the real field, $x^\alpha$ is defined only for x ≠ 0 when α is a negative integer, and only for x > 0 in general (Remark 53, p. 243).] In particular, for α = −1/2, we have (use Eq. 9.52)
$$ \int \frac{1}{\sqrt{x}}\,dx = 2\sqrt{x}. \tag{10.32} $$
Similarly, we have (use Eqs. 9.70–9.72)
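$$ \int \cos x\,dx = \sin x; \qquad \int \sin x\,dx = -\cos x; \qquad \int \frac{1}{\cos^2 x}\,dx = \tan x. \tag{10.33} $$
For primitives that involve the inverse trigonometric functions, we have (use Eqs. 9.73–9.75)
$$ \int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin_P^{-1} x = -\cos_P^{-1} x; \qquad \int \frac{1}{1 + x^2}\,dx = \tan^{-1} x. \tag{10.34} $$
Moreover, we have (use Eqs. 9.68 and 9.69)
$$ \int \frac{x}{\sqrt{1 \pm x^2}}\,dx = \pm\sqrt{1 \pm x^2} \tag{10.35} $$
and
$$ \int \frac{x}{(1 \pm x^2)^{3/2}}\,dx = \frac{\mp 1}{\sqrt{1 \pm x^2}}. \tag{10.36} $$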
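Each primitive listed above can be checked against Definition 200 by differentiating it numerically and comparing with the integrand. The following Python sketch does so with a central difference; the table `PAIRS`, the sample point x = 0.4 and the step h are illustrative assumptions, not part of the text.

```python
import math

def derivative(F, x, h=1e-5):
    """Central difference approximation of dF/dx."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# (label, primitive F, integrand f), from Eqs. 10.30 and 10.32-10.36.
PAIRS = [
    ("x^3",             lambda x: x**4 / 4.0,                lambda x: x**3),
    ("1/sqrt(x)",       lambda x: 2.0 * math.sqrt(x),        lambda x: 1.0 / math.sqrt(x)),
    ("cos x",           math.sin,                            math.cos),
    ("sin x",           lambda x: -math.cos(x),              math.sin),
    ("1/cos^2 x",       math.tan,                            lambda x: 1.0 / math.cos(x)**2),
    ("1/sqrt(1-x^2)",   math.asin,                           lambda x: 1.0 / math.sqrt(1.0 - x * x)),
    ("1/(1+x^2)",       math.atan,                           lambda x: 1.0 / (1.0 + x * x)),
    ("x/sqrt(1+x^2)",   lambda x: math.sqrt(1.0 + x * x),    lambda x: x / math.sqrt(1.0 + x * x)),
    ("x/sqrt(1-x^2)",   lambda x: -math.sqrt(1.0 - x * x),   lambda x: x / math.sqrt(1.0 - x * x)),
    ("x/(1+x^2)^(3/2)", lambda x: -1.0 / math.sqrt(1.0 + x * x), lambda x: x / (1.0 + x * x)**1.5),
    ("x/(1-x^2)^(3/2)", lambda x: 1.0 / math.sqrt(1.0 - x * x),  lambda x: x / (1.0 - x * x)**1.5),
]

def max_error(x=0.4):
    """Largest mismatch between dF/dx and f at the sample point x."""
    return max(abs(derivative(F, x) - f(x)) for _, F, f in PAIRS)

if __name__ == "__main__":
    for label, F, f in PAIRS:
        print(f"{label:18s} dF/dx = {derivative(F, 0.4):.8f}   f = {f(0.4):.8f}")
```

The sample point must lie where all the functions are defined, here 0 < x < 1. The last four rows also confirm the sign conventions of the ± and ∓ symbols in Eqs. 10.35 and 10.36.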
10.3 Use of primitives to evaluate integrals
As stated above, the primitives, introduced as the inverse operator of a derivative, are the main tool to evaluate an integral. Indeed, this is the reason for the similarity of the symbols used for the two of them (Remark 85, p. 420). In this section, we show why this is the case.
10.3.1 First fundamental theorem of calculus
In order to accomplish the stated objective, consider the function f(x) defined in (a, b), as well as the function $F_c(x)$ defined by
$$ F_c(x) = \int_c^x f(u)\,du, \tag{10.37} $$
where x ∈ (a, b) is varying, whereas c ∈ (a, b) is a fixed number, arbitrarily prescribed. In other words, $F_c(x)$ denotes the integral of f(x) seen as a function of the upper limit of integration. [In the expression in Eq. 10.37, instead of u, we could have used any symbol except for x, since x here has a specific meaning, namely the value of the upper limit of integration.] ◦ Comment. By replacing the upper limit of the integral from b to x, we have replaced a functional with an operator.
Then, we have the following Theorem 120 (First fundamental theorem of calculus). Assume the function f(x) to be continuous in the bounded closed interval [a, b]. We have that the derivative of $F_c(x)$ in Eq. 10.37 coincides with f(x), namely with the integrand evaluated at the upper limit of integration, that is,
$$ \frac{dF_c}{dx} = \frac{d}{dx} \int_c^x f(u)\,du = f(x). \tag{10.38} $$
In other words, the function $F_c(x)$ is a primitive of f(x). ◦ Proof : Indeed, we have
$$ \frac{dF_c}{dx} = \lim_{h\to 0} \frac{1}{h} \left[ \int_c^{x+h} f(u)\,du - \int_c^{x} f(u)\,du \right] = \lim_{h\to 0} \frac{1}{h} \int_x^{x+h} f(u)\,du = f(x), \tag{10.39} $$
in agreement with Eq. 10.38. [Hint: For the first equality use the definition of derivative (Eq. 9.2), for the second the additivity of the integration intervals (Eq. 10.8), and for the third the corollary to the mean value theorem in integral calculus (Eq. 10.13).]
Fig. 10.3 First fundamental theorem
A geometrical illustration of the theorem is provided in Fig. 10.3. The dark gray area corresponds to the integral in the third term of Eq. 10.39, namely the difference between the two integrals in the second term. For h adequately small, this area is approximately equal to that of a rectangle with height equal to y = f (x) and base length equal to h (the smaller is h the truer this is). Dividing the area of this rectangle by its base length h (as indicated in the third term of Eq. 10.39), one obtains the height, namely the value of the function at the upper limit of integration, y = f (x).
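The first fundamental theorem can also be observed numerically: build $F_c(x)$ from a finite sum of the type in Eq. 10.4, differentiate it with a difference quotient, and compare with f(x). The Python sketch below does this under those approximations; the names `F_c` and `dF_c_dx` and the choice f = cos are assumptions made for the illustration.

```python
import math

def midpoint_sum(f, a, b, n=2000):
    """Finite-n approximation of the integral in Eq. 10.4 (dx < 0 if b < a)."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def F_c(f, c, x):
    """Eq. 10.37: integral of f from the fixed point c to the varying point x."""
    return midpoint_sum(f, c, x)

def dF_c_dx(f, c, x, h=1e-3):
    """Central difference approximation of the derivative of F_c at x."""
    return (F_c(f, c, x + h) - F_c(f, c, x - h)) / (2.0 * h)

if __name__ == "__main__":
    x = 1.0
    for c in (0.0, -2.0, 0.5):
        print(f"c = {c:5.1f}: dF_c/dx = {dF_c_dx(math.cos, c, x):.8f}, "
              f"f(x) = {math.cos(x):.8f}")
```

The computed derivative matches f(x) = cos x for every choice of c, in agreement with Eq. 10.38: changing c only shifts $F_c(x)$ by a constant.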
10.3.2 Second fundamental theorem of calculus
Theorem 120 above is crucial in integral calculus, in that it provides the paramount link between primitives (antiderivatives) and integrals. Note that, $F_c(x)$ being a primitive, Eq. 10.28 implies that any other primitive differs from $F_c(x)$ by a constant (Theorem 119, p. 420):
$$ F(x) = F_c(x) + C, \tag{10.40} $$
where C is an arbitrary constant. Then, we have the following Theorem 121 (Second fundamental theorem of calculus). Let F(x) denote any primitive of the function f(x), continuous in [a, b]. Then
$$ \int_a^b f(u)\,du = F(b) - F(a) =: \Big[F(x)\Big]_a^b, \tag{10.41} $$
where we have introduced the standard notation
$$ \Big[g(x)\Big]_a^b := g(b) - g(a), \tag{10.42} $$
valid for any function g(x). The notation $g(x)\big|_a^b$ is also used. ◦ Proof : Indeed, recalling Eqs. 10.8, 10.10 and 10.37, we have
$$ \int_a^b f(u)\,du = \int_c^b f(u)\,du - \int_c^a f(u)\,du = F_c(b) - F_c(a), \tag{10.43} $$
in agreement with Eq. 10.41, because $F(x) = F_c(x) + C$ (Eq. 10.40).
Accordingly, we do not need to evaluate Fc (x). As long as we know any primitive of f (x), we can easily evaluate its integral from a to b.
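As a concrete check of Eq. 10.41, the Python sketch below compares F(b) − F(a) with a finite-sum approximation of the integral, and shows that adding a constant C to the primitive does not change the result. The function names and the example f(x) = x³ are illustrative assumptions.

```python
def midpoint_sum(f, a, b, n=2000):
    """Finite-n approximation of the integral in Eq. 10.4."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def integral_from_primitive(F, a, b):
    """Eq. 10.41: the integral of f from a to b is F(b) - F(a)."""
    return F(b) - F(a)

f = lambda x: x**3
F = lambda x: x**4 / 4.0              # a primitive of f (Eq. 10.30 with n = 3)
F_shifted = lambda x: x**4 / 4.0 + 7  # another primitive, with C = 7 (Eq. 10.28)

if __name__ == "__main__":
    print("F(b) - F(a)      :", integral_from_primitive(F, 1.0, 2.0))
    print("shifted primitive:", integral_from_primitive(F_shifted, 1.0, 2.0))
    print("finite sum       :", midpoint_sum(f, 1.0, 2.0))
```

The two evaluations from primitives coincide (15/4 for this example), and the finite sum approaches the same value as n grows.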
• Relating the two (differential and integral) mean value theorems
We are now in a position to relate the mean value theorem in integral calculus (Eq. 10.11) to the mean value theorem in differential calculus (Eq. 9.84). Let F(x) be a primitive of f(x). Then, we have
$$ \int_a^b f(x)\,dx = F(b) - F(a) = (b - a)\, F'(\xi) = (b - a)\, f(\xi), \tag{10.44} $$
for some ξ ∈ (a, b), in agreement with Eq. 10.11. [Hint: For the first equality, use Eq. 10.41. For the second one, use the mean value theorem in differential calculus, namely $F(b) - F(a) = (b - a)\, F'(\xi)$ (Eq. 9.84).]
• Illustrative examples
A few illustrative examples? Why not! Let us start with a simple one. Using Eq. 10.30, we have
$$ \int_a^b x^n\,dx = \left[\frac{x^{n+1}}{n+1}\right]_a^b = \frac{1}{n+1}\left(b^{n+1} - a^{n+1}\right) \qquad (n \ne -1) \tag{10.45} $$
(provided of course that 0 ∉ (a, b), whenever n = −2, −3, . . . ). If α is not an integer, assuming a and b to be positive (see Remark 53, p. 243), we have (use Eq. 10.31)
$$ \int_a^b x^\alpha\,dx = \left[\frac{x^{\alpha+1}}{\alpha+1}\right]_a^b = \frac{1}{\alpha+1}\left(b^{\alpha+1} - a^{\alpha+1}\right). \tag{10.46} $$
We also have
$$ \int_{-k\pi/2}^{k\pi/2} \cos^2 x\,dx = \frac12 \int_{-k\pi/2}^{k\pi/2} (1 + \cos 2x)\,dx = \frac12 \left[x + \frac12 \sin 2x\right]_{-k\pi/2}^{k\pi/2} = \frac{k\pi}{2}. \tag{10.47} $$
[Hint: Use $\cos^2 x = \frac12 (1 + \cos 2x)$ (Eq. 7.63) for the first equality, $\int \cos x\,dx = \sin x$ (Eq. 10.33) for the second, and sin(±kπ) = 0 (Eq. 6.84) for the third.] Similarly, we have
$$ \int_{-k\pi/2}^{k\pi/2} \sin^2 x\,dx = \frac12 \int_{-k\pi/2}^{k\pi/2} (1 - \cos 2x)\,dx = \frac12 \left[x - \frac12 \sin 2x\right]_{-k\pi/2}^{k\pi/2} = \frac{k\pi}{2}. \tag{10.48} $$
◦ Comment. For an easy way to remember Eqs. 10.47 and 10.48, note that the average value of the integrand is 1/2 (Eqs. 7.62 and 7.63), whereas the length of the interval of integration is kπ. Furthermore, using $\cos^3 x = \frac34 \cos x + \frac14 \cos 3x$ (Eq. 7.66), as well as the first in Eq. 10.33 (namely $\int \cos nx\,dx = (1/n) \sin nx$), one obtains
$$ \int_0^{\pi/2} \cos^3 x\,dx = \frac14 \int_0^{\pi/2} (3 \cos x + \cos 3x)\,dx = \frac14 \left[3 \sin x + \frac13 \sin 3x\right]_0^{\pi/2} = \frac14 \left(3 - \frac13\right) = \frac23. \tag{10.49} $$
Similarly, using $\sin^3 x = \frac34 \sin x - \frac14 \sin 3x$ (Eq. 7.68), as well as the second in Eq. 10.33 (namely $\int \sin nx\,dx = -(1/n) \cos nx$), one obtains
$$ \int_0^{\pi/2} \sin^3 x\,dx = \frac14 \int_0^{\pi/2} (3 \sin x - \sin 3x)\,dx = -\frac14 \left[3 \cos x - \frac13 \cos 3x\right]_0^{\pi/2} = \frac14 \left(3 - \frac13\right) = \frac23. \tag{10.50} $$
◦ Comment. The fact that we obtain the same result in Eqs. 10.47 and 10.48 (and again in Eqs. 10.49 and 10.50) should not surprise you. Indeed, in the interval [0, π/2], the graphs of sin x and cos x are obtained from each other by flipping the abscissas around the midpoint of the interval. [See Fig. 6.11, p. 228, or use cos(π/2 − x) = sin x (Eq. 6.71).] Hence, the areas underneath the graphs are equal. Finally, we have
$$ \int_{-\pi}^{\pi} \sin kx\,dx = \left[\frac{-\cos kx}{k}\right]_{-\pi}^{\pi} = 0, \tag{10.51} $$
because cos kπ = cos(−kπ). [This result is expected because the integrand is an odd function (use Eq. 10.22).] Of course, we also have
$$ \int_0^{2\pi} \sin kx\,dx = 0, \tag{10.52} $$
because of the periodicity of the integrand. Similarly, we have
$$ \int_{-\pi}^{\pi} \cos kx\,dx = \left[\frac{\sin kx}{k}\right]_{-\pi}^{\pi} = 0, \tag{10.53} $$
because sin(±kπ) = 0. [Indeed, in the last three equations, the number of positive half–waves equals that of the negative ones, which therefore offset each other.] On the other hand, we have
$$ \int_{-\pi/2}^{\pi/2} \cos x\,dx = \Big[\sin x\Big]_{-\pi/2}^{\pi/2} = 2, \qquad \text{whereas} \qquad \int_{-\pi/2}^{\pi/2} \sin x\,dx = \Big[-\cos x\Big]_{-\pi/2}^{\pi/2} = 0 \ \text{(odd function again)}. \tag{10.54} $$
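The closed-form results of Eqs. 10.47–10.54 can be compared with finite sums of the type in Eq. 10.4. The Python sketch below is only a numerical cross-check; the value k = 3, the number of subintervals and the dictionary `checks` are assumptions made for the illustration.

```python
import math

def midpoint_sum(f, a, b, n=2000):
    """Finite-n approximation of the integral in Eq. 10.4."""
    dx = (b - a) / n
    return sum(f(a + (j - 0.5) * dx) for j in range(1, n + 1)) * dx

k = 3                      # any positive integer; chosen only for illustration
half = k * math.pi / 2.0

# label -> (finite-sum value, closed-form value from the text)
checks = {
    "Eq. 10.47": (midpoint_sum(lambda x: math.cos(x)**2, -half, half), half),
    "Eq. 10.48": (midpoint_sum(lambda x: math.sin(x)**2, -half, half), half),
    "Eq. 10.49": (midpoint_sum(lambda x: math.cos(x)**3, 0.0, math.pi / 2.0), 2.0 / 3.0),
    "Eq. 10.50": (midpoint_sum(lambda x: math.sin(x)**3, 0.0, math.pi / 2.0), 2.0 / 3.0),
    "Eq. 10.51": (midpoint_sum(lambda x: math.sin(k * x), -math.pi, math.pi), 0.0),
    "Eq. 10.53": (midpoint_sum(lambda x: math.cos(k * x), -math.pi, math.pi), 0.0),
    "Eq. 10.54": (midpoint_sum(math.cos, -math.pi / 2.0, math.pi / 2.0), 2.0),
}

if __name__ == "__main__":
    for label, (numeric, expected) in checks.items():
        print(f"{label}: finite sum = {numeric: .8f}, closed form = {expected: .8f}")
```

Each finite sum agrees with the corresponding closed form to within the accuracy of the approximation.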
10.4 Rules for evaluating integrals
In this section, we present some techniques for evaluating integrals, which requires finding the primitive of a given function. ◦ Warning. From now on, the process of evaluating an integral will be considered accomplished if we find the primitive of the integrand. ◦ Comment. The rules for integration are not as systematic as the rules for differentiation. Indeed, at times the primitive of the function that we wish to integrate is not on the list of known functions. Sometimes this might even yield the definition of a new function, as in the case of the logarithm introduced in Section 12.1. Nonetheless, there are some procedures that might make it easier to perform the task. One rule we have already seen consists in exploiting the linearity of the integral operator, which allows us to break an integral into simpler ones. A second one is based upon the rule for changing the dummy variable of integration (Eq. 10.19), and is addressed in Subsection 10.4.1. The third one is the so–called integration by parts and is addressed in Subsection 10.4.2. ◦ Comment. Here, these rules are only briefly outlined, in a somewhat cursory fashion. Some applications are provided. [As mentioned in the Preface, the objective of this book is to introduce you to the understanding of the worlds of mathematics and mechanics, without any pretense of making you self–sufficient in the material covered. At this point, I do not expect you to learn how to apply the rules on your own, although you probably will by the time you have finished reading this book. For instance, I fully understood the material in the course Mathematical Analysis I only when I had Mathematical Analysis II.]
10.4.1 Integration by substitution
Consider the rule of integration by substitution (namely the transformation in Eq. 10.19), which gives us the rule we have to follow if we replace the dummy variable of integration. This rule may be used to transform an integral that we don’t know how to evaluate into one that we do. This use of Eq. 10.19 is known as the technique of integration by substitution.
• Area inside an ellipse
As an illustrative example, consider an ellipse described by $(x/a)^2 + (y/b)^2 = 1$ (Eq. 7.84). Its area is equal to twice that under the graph of the function $y = b\sqrt{1 - (x/a)^2}$, with $x \in [-a, a]$ (Eq. 7.85), and is given by

$$A = 2b \int_{-a}^{a} \sqrt{1 - (x/a)^2}\, dx. \tag{10.55}$$

Consider the substitution $x = g(u) = a \sin u$. Then, using the rule of integration by substitution (Eq. 10.19), we have

$$A = 2ab \int_{-\pi/2}^{\pi/2} \sqrt{1 - \sin^2 u}\, \cos u\, du = 2ab \int_{-\pi/2}^{\pi/2} \cos^2 u\, du = \pi a b, \tag{10.56}$$

as you may verify. [Hint: For the first equality, use $dx = g'(u)\, du = a \cos u\, du$ (Eqs. 9.70 and 10.17), as well as $\sin(\pm\pi/2) = \pm 1$ (Eq. 6.78). For the second equality, use $\cos^2 u + \sin^2 u = 1$ (Eq. 6.75). For the third equality, use Eq. 10.47.] If $a = b = R$, we recover the area of a disk of radius $R$, namely $A = \pi R^2$, in agreement with Eq. 6.90.
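The result A = π a b can also be checked numerically. The following is only a minimal sketch, not part of the book's formulation: the helper name midpoint_integral and the choice of the midpoint rule are assumptions made for illustration. It evaluates the integral both before the substitution (Eq. 10.55) and after it (Eq. 10.56).

```python
import math

def midpoint_integral(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def ellipse_area_direct(a, b):
    # Eq. 10.55: A = 2b * integral over [-a, a] of sqrt(1 - (x/a)^2).
    # max(0, .) only guards against tiny negative round-off near x = +/- a.
    return 2.0 * b * midpoint_integral(
        lambda x: math.sqrt(max(0.0, 1.0 - (x / a) ** 2)), -a, a)

def ellipse_area_substituted(a, b):
    # Eq. 10.56, after x = a sin u: A = 2ab * integral over [-pi/2, pi/2] of cos^2 u.
    return 2.0 * a * b * midpoint_integral(
        lambda u: math.cos(u) ** 2, -math.pi / 2, math.pi / 2)
```

Both functions should return values close to math.pi * a * b. The substituted form has a smooth integrand and tends to converge much faster, which is one practical reason the substitution is useful.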
• Another illustrative example
Equation 10.19 may also be used in the other direction. For instance, consider

$$I = \int_{0}^{\pi/2} \sin^{\alpha} u\, \cos u\, du \qquad (\alpha \ge 0). \tag{10.57}$$

[For α ∈ (−1, 0), see Subsection 10.7.1 (Eq. 10.105).] Consider the substitution x = g(u) = sin u. Then, using Eq. 10.19, we have

$$I = \int_{0}^{1} x^{\alpha}\, dx = \left[ \frac{x^{\alpha+1}}{\alpha+1} \right]_{0}^{1} = \frac{1}{\alpha+1}. \tag{10.58}$$

[Hint: For the first equality, use $dx = g'(u)\, du = \cos u\, du$ (Eq. 9.70) as well as $\sin(0) = 0$ and $\sin(\pi/2) = 1$ (Eqs. 6.77 and 6.78). For the second equality, use Eq. 10.46.]
10.4.2 Integration by parts
A powerful and widely used technique for evaluating integrals is integration by parts. This is based upon the product rule for derivatives, namely $[f(x)\, g(x)]' = f'(x)\, g(x) + f(x)\, g'(x)$ (Eq. 9.18). This rule implies that

$$\int f(x)\, g'(x)\, dx + \int f'(x)\, g(x)\, dx = \int [f(x)\, g(x)]'\, dx = f(x)\, g(x). \tag{10.59}$$

Hence, we have

$$\int f(x)\, g'(x)\, dx = f(x)\, g(x) - \int f'(x)\, g(x)\, dx. \tag{10.60}$$

The above equation, known as the rule of integration by parts, is often used to obtain the primitive.
• Illustrative examples
To illustrate the method, consider an application. Noting that $(x \sin x)' = x \cos x + \sin x$, and using Eq. 10.60 with $f(x) = x$ and $g(x) = \sin x$, and hence $f'(x) = 1$ and $g'(x) = \cos x$, we have

$$\int x \cos x\, dx = x \sin x - \int \sin x\, dx = x \sin x + \cos x. \tag{10.61}$$

Let us verify the result. We have

$$(x \sin x + \cos x)' = x \cos x + \sin x - \sin x = x \cos x. \tag{10.62}$$

You may follow the same procedure and obtain

$$\int x \sin x\, dx = -x \cos x + \sin x, \tag{10.63}$$

as you may verify. [This approach may be used, iteratively, to find the primitives of $x^n \sin x$ and $x^n \cos x$. You might like to try, at least for $n = 2$.]

Another primitive that may be obtained by integration by parts is that of $\sin_{\rm P}^{-1} x$. Indeed, we have, for $|x| < 1$,

$$\int \sin_{\rm P}^{-1} x\, dx = x \sin_{\rm P}^{-1} x - \int \frac{x}{\sqrt{1 - x^2}}\, dx = x \sin_{\rm P}^{-1} x + \sqrt{1 - x^2}. \tag{10.64}$$

[Hint: For the first equality, use Eq. 10.60 with $f(x) = \sin_{\rm P}^{-1} x$ and $g(x) = x$, so that $f'(x) = (\sin_{\rm P}^{-1} x)' = 1/\sqrt{1 - x^2}$ (Eq. 9.73), and $g'(x) = 1$. For the second equality, use Eq. 10.35 on the primitive of $x/\sqrt{1 - x^2}$.] Let us verify the result. We have

$$\left[ x \sin_{\rm P}^{-1} x + \sqrt{1 - x^2} \right]' = \sin_{\rm P}^{-1} x + \frac{x}{\sqrt{1 - x^2}} - \frac{x}{\sqrt{1 - x^2}} = \sin_{\rm P}^{-1} x. \tag{10.65}$$

A similar approach may be used to find the primitive of $\cos_{\rm P}^{-1} x$. [The primitive of $\tan_{\rm P}^{-1} x$ requires the use of logarithms and it will be addressed in Section 12.5.] More illustrative examples will be presented later, after the introduction of the logarithm and the exponential function (Section 12.5).
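A quick way to test a candidate primitive is to differentiate it numerically and compare with the integrand. The sketch below is an illustration only: the names derivative, CASES and max_mismatch are hypothetical, and the central-difference formula is an assumption chosen for simplicity. It does this for the primitives in Eqs. 10.61, 10.63 and 10.64.

```python
import math

def derivative(F, x, h=1e-5):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# Candidate primitives (Eqs. 10.61, 10.63, 10.64), each paired with its integrand.
CASES = [
    (lambda x: x * math.sin(x) + math.cos(x), lambda x: x * math.cos(x)),
    (lambda x: -x * math.cos(x) + math.sin(x), lambda x: x * math.sin(x)),
    (lambda x: x * math.asin(x) + math.sqrt(1.0 - x * x), math.asin),
]

def max_mismatch(points=(-0.8, -0.3, 0.1, 0.5, 0.9)):
    """Largest |F'(x) - f(x)| over all cases and sample points with |x| < 1."""
    return max(abs(derivative(F, x) - f(x)) for F, f in CASES for x in points)
```

If the primitives are correct, max_mismatch() should be tiny, limited only by the finite-difference error. The sample points are kept inside (-1, 1) because the primitive of the inverse sine is defined only there.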
10.5 Applications of integrals and primitives Here, we present three interesting — and totally unrelated — applications of integrals and primitives, just to show their usefulness and the broadness of their applicability. Specifically, first we present the method of separation of variables for ordinary differential equations (Subsection 10.5.1), then we show that sin x and cos x are linearly independent (Subsection 10.5.2), and finally we obtain the integral expression for the length of a line described by the equation y = f (x) (Subsection 10.5.3, where we will address again the consistency of two definitions of a straight line, which have been discussed in Subsection 7.3.1).
10.5.1 Separation of variables
♣
Much of the book is devoted to the solution of linear problems, namely problems involving linear operators, for which there exist general methodologies to obtain the solution. Here, however, we consider a class of problems that are not governed by a linear operator, for which nonetheless the solution is easily obtained. Specifically, consider the problem

$$\frac{dy}{dx} = \frac{f(x)}{g(y)} \qquad [y(x_0) = y_0]. \tag{10.66}$$

In general, this equation is nonlinear. However, if g(y) = 1/y, the equation is linear.
Remark 86. The method of solution for these types of problems, illustrated below, is known as the method of separation of variables for a first–order ordinary differential equation. The name stems from the fact that the above equation is an example of a first–order ordinary differential equation. [This method is not to be confused with the method of separation of variables for partial differential equations, which will be addressed in Vol. III.]

The objective here is to find the function $y = y(x)$ that satisfies the differential equation as well as the condition $y(x_0) = y_0$, as given in Eq. 10.66. The method consists in the following: multiply Eq. 10.66 by $g(y)$ and integrate from $x_0$ to $x$, to obtain

$$\int_{x_0}^{x} g[y(x_1)]\, \frac{dy}{dx_1}\, dx_1 = \int_{x_0}^{x} f(x_1)\, dx_1, \tag{10.67}$$

or, using the method of integration by substitution (set $y = y(x)$ in the first integral, see Subsection 10.4.1, in particular, Eq. 10.19)

$$\int_{y_0}^{y} g(y_1)\, dy_1 = \int_{x_0}^{x} f(x_1)\, dx_1. \tag{10.68}$$

This yields $G(y) - G(y_0) = F(x) - F(x_0)$, where $F(x)$ and $G(y)$ are the primitives of $f(x)$ and $g(y)$, respectively. Accordingly, the solution to the original problem (Eq. 10.66) is given by

$$y(x) = G^{-1}\big[ F(x) - F(x_0) + G(y_0) \big], \tag{10.69}$$

provided of course that $G(y)$ has an inverse.
• An illustrative example
As an illustrative example, consider the problem

$$\frac{dy}{dx} = x\, (1 + y^2), \qquad \text{with } y(0) = 0. \tag{10.70}$$

Dividing by $1 + y^2$, integrating, and using the condition $y(0) = 0$, we have (use Eq. 10.68)

$$\int_{0}^{y} \frac{1}{1 + y_1^2}\, dy_1 = \int_{0}^{x} x_1\, dx_1. \tag{10.71}$$

Then, using $(\tan^{-1} y)' = 1/(1 + y^2)$ (Eq. 9.75), Eq. 10.71 yields $\tan^{-1} y = \tfrac{1}{2}\, x^2$, or

$$y = \tan \frac{x^2}{2}. \tag{10.72}$$

Let us verify that the result is correct. Using the Leibniz chain rule (Eq. 9.24), as well as $(\tan u)' = 1 + \tan^2 u$ (Eq. 9.72), we have

$$\frac{dy}{dx} = \frac{d}{dx} \tan \frac{x^2}{2} = \left( 1 + \tan^2 \frac{x^2}{2} \right) x = (1 + y^2)\, x. \tag{10.73}$$

Thus, Eq. 10.70 is satisfied, since we also have $y(0) = 0$.
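The closed-form solution can also be compared with a direct numerical integration of the differential equation. The sketch below is an illustration under stated assumptions: the classical fourth-order Runge-Kutta scheme is not discussed in the text and is swapped in here only as a convenient integrator, and the names rk4, rhs and closed_form are hypothetical.

```python
import math

def rk4(rhs, x0, y0, x_end, steps=1000):
    """Integrate dy/dx = rhs(x, y) from x0 to x_end with classical Runge-Kutta."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def rhs(x, y):
    # Right-hand side of Eq. 10.70.
    return x * (1.0 + y * y)

def closed_form(x):
    # Eq. 10.72, meaningful while x**2 / 2 < pi / 2.
    return math.tan(x * x / 2.0)
```

For instance, rk4(rhs, 0.0, 0.0, 1.0) and closed_form(1.0) should agree to many digits. The comparison is meaningful only for x below the value where x²/2 reaches π/2, since the solution blows up there.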
• Another illustrative example
♠
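As another example, consider the following equation

$$\frac{dy}{dx} = \frac{x^{\alpha-1}}{y^{\beta-1}}, \qquad \text{with } y(0) = y_0 = 0, \tag{10.74}$$

where $x \in (0, \infty)$, whereas $\alpha$ and $\beta$ are positive real numbers. The above equation is of the type in Eq. 10.66, with $f(x) = x^{\alpha-1}$, $g(y) = y^{\beta-1}$, and $x_0 = y_0 = 0$. In this case, Eq. 10.68 yields

$$\int_{0}^{y} y_1^{\beta-1}\, dy_1 = \int_{0}^{x} x_1^{\alpha-1}\, dx_1, \tag{10.75}$$

or

$$\frac{1}{\beta}\, y^{\beta} = \frac{1}{\alpha}\, x^{\alpha}. \tag{10.76}$$

Thus, the solution to the original problem is

$$y(x) = \left( \frac{\beta}{\alpha} \right)^{1/\beta} x^{\alpha/\beta}. \tag{10.77}$$

Let us verify that this expression is the solution to Eq. 10.74. Indeed, the conditions at $x = 0$ (Eq. 10.74) are satisfied. In addition, one obtains

$$\frac{dy}{dx} = \left( \frac{\beta}{\alpha} \right)^{(1/\beta)-1} x^{(\alpha/\beta)-1} = \left( \frac{\beta}{\alpha} \right)^{(1-\beta)/\beta} x^{(\alpha-1)+(1-\beta)\,\alpha/\beta} = x^{\alpha-1} \left[ \left( \frac{\beta}{\alpha} \right)^{1/\beta} x^{\alpha/\beta} \right]^{1-\beta} = \frac{x^{\alpha-1}}{y^{\beta-1}}, \tag{10.78}$$
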
in agreement with Eq. 10.74. [Hint: Use $dx^{\gamma}/dx = \gamma\, x^{\gamma-1}$ (Eq. 9.39) for the first equality, as well as $(\alpha/\beta) - 1 = (\alpha - 1) + (1 - \beta)\, \alpha/\beta$ for the second, along with basic arithmetic for the third and Eq. 10.77 for the fourth.] In particular, for $\alpha = \beta = \tfrac{1}{2}$, Eq. 10.74 reduces to

$$\frac{dy}{dx} = \sqrt{y/x}, \qquad \text{with } y(0) = y_0 = 0. \tag{10.79}$$

Equation 10.77 yields that the solution is $y = x$, as you may verify. [Hint: We have $dy/dx = 1$, as well as $\sqrt{y/x} = 1$.]
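The verification in Eq. 10.78 can be repeated numerically for specific exponents. This is only a sketch: the function names solution and residual are hypothetical, and the central-difference derivative is an assumption. A residual close to zero indicates that Eq. 10.77 satisfies Eq. 10.74 at the chosen point.

```python
def solution(x, alpha, beta):
    # Eq. 10.77: y = (beta/alpha)**(1/beta) * x**(alpha/beta), for x > 0.
    return (beta / alpha) ** (1.0 / beta) * x ** (alpha / beta)

def residual(x, alpha, beta, h=1e-6):
    """dy/dx (central difference) minus the right-hand side of Eq. 10.74."""
    dydx = (solution(x + h, alpha, beta) - solution(x - h, alpha, beta)) / (2.0 * h)
    y = solution(x, alpha, beta)
    return dydx - x ** (alpha - 1.0) / y ** (beta - 1.0)
```

With alpha = beta = 0.5, solution(x, 0.5, 0.5) reduces to x, as stated for Eq. 10.79.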
10.5.2 Linear independence of sin x and cos x
Here, just to show you how powerful are the tools we have acquired thus far, let us consider the following.

Theorem 122. The trigonometric functions sin x and cos x, with x ∈ [−π, π], are linearly independent.

◦ Proof : The definition of linear independence of functions (Definition 116, p. 213) states that sin x and cos x are linearly independent iff the fact that

$$A \sin x + B \cos x = 0, \tag{10.80}$$

for any x ∈ [−π, π], yields A = B = 0. To show that this is the case, let us multiply Eq. 10.80 by sin x and integrate from −π to π. Note that the integral of a continuous nonnegative integrand that does not vanish identically is positive, and hence

$$\int_{-\pi}^{\pi} \sin^2 x\, dx > 0. \tag{10.81}$$

In addition, we have

$$\int_{-\pi}^{\pi} \cos x\, \sin x\, dx = 0, \tag{10.82}$$

because sin x is odd and cos x is even (Eq. 6.70), so that the integrand is odd (Theorem 62, p. 211, on the product of even and odd functions), and integrals of odd functions over a symmetric interval vanish (Eq. 10.22). Equations 10.80, 10.81 and 10.82 imply that A = 0. Similarly, multiplying Eq. 10.80 by
cos x and integrating from −π to π, we obtain that B = 0, as you may verify. Hence, sin x and cos x are linearly independent. ◦ Comment. The same approach may be used to show that the functions sin hx (h = 1, 2, . . . ) and cos kx (k = 0, 1, . . . ) form a set of mutually linearly independent functions, as you may verify. [This fact will be further addressed (and exploited) in Vol. II, in connection with the Fourier series.]
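The three integrals used in the proof are easy to evaluate numerically. The sketch below is an illustration, with hypothetical names midpoint_integral and projections and the midpoint rule as an assumed quadrature. The two "diagonal" integrals should be positive (in fact equal to π) and the "mixed" one should vanish.

```python
import math

def midpoint_integral(f, lo, hi, n=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def projections():
    """Integrals of sin^2, sin*cos and cos^2 over [-pi, pi]."""
    ss = midpoint_integral(lambda x: math.sin(x) ** 2, -math.pi, math.pi)
    sc = midpoint_integral(lambda x: math.sin(x) * math.cos(x), -math.pi, math.pi)
    cc = midpoint_integral(lambda x: math.cos(x) ** 2, -math.pi, math.pi)
    return ss, sc, cc
```

Multiplying Eq. 10.80 by sin x and integrating then gives A * ss + B * sc = 0, that is, A * ss = 0 with ss > 0, so that A = 0. The same reasoning with cos x gives B = 0.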
10.5.3 Length of a line. Arclength Consider a line L connecting the points A and B. Assume that L is represented in a single–valued explicit form as y = y(x) (Fig. 10.4). [Otherwise, subdivide the line into portions where the function is represented in a single– valued explicit form.]
Fig. 10.4 Length of line
The length of an infinitesimal line element ds is given by

$$ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + (dy/dx)^2}\; dx. \tag{10.83}$$

Hence, the length of the line between A and B is given by

$$\ell = \int_{x_A}^{x_B} \sqrt{1 + (dy/dx)^2}\; dx. \tag{10.84}$$

Then, we may introduce the following

Definition 201 (Arclength along a line). The length of a line from $(x_0, y_0)$ to $(x_1, y_1)$ is called its arclength

$$s = \int_{x_0}^{x_1} \sqrt{1 + (dy/dx)^2}\; dx. \tag{10.85}$$

◦ Comment. This is an extension to a line in two dimensions of the notion of arclength of a circle introduced in Definition 84, p. 177, $s = R\theta$. Assume for simplicity $R = 1$. Then, for the upper unit semicircle, we have $y(x) = \sqrt{1 - x^2}$ and hence $y'(x) = -x/\sqrt{1 - x^2}$. Thus, using $\int (1/\sqrt{1 - x^2})\, dx = -\cos_{\rm P}^{-1} x$ (Eq. 10.34) and $x = \cos\theta$ yields

$$s = \int_{x}^{1} \sqrt{1 + \frac{x_1^2}{1 - x_1^2}}\; dx_1 = \int_{x}^{1} \frac{1}{\sqrt{1 - x_1^2}}\, dx_1 = \cos_{\rm P}^{-1} x = \theta. \tag{10.86}$$

For a circle of radius $R$, we obtain $s = R\theta$ (Eq. 5.1), as you may verify.
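The arclength integral of Eq. 10.85 can be approximated numerically for any curve whose slope is known. The following sketch uses hypothetical names (arclength, unit_circle_slope) and an assumed midpoint rule. To avoid the singular slope at x = 1, the circle check uses the arc between x = 0 and x = 0.5, whose length is the angle between the two radii, namely π/2 − π/3 = π/6.

```python
import math

def arclength(dydx, x0, x1, n=100_000):
    """Midpoint-rule approximation of Eq. 10.85 for a curve y = y(x)."""
    h = (x1 - x0) / n
    return h * sum(math.sqrt(1.0 + dydx(x0 + (k + 0.5) * h) ** 2) for k in range(n))

def unit_circle_slope(x):
    # y = sqrt(1 - x^2) on the upper unit semicircle, so y' = -x / sqrt(1 - x^2).
    return -x / math.sqrt(1.0 - x * x)
```

As a second check, a straight line of slope 3/4 over x ∈ [0, 4] has length 5 (the hypotenuse of a 3-4-5 triangle), which arclength(lambda x: 0.75, 0.0, 4.0) should reproduce.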
• Straight lines again
♣
In Definition 80, p. 176, we introduced a straight line as the extrapolation of a segment, which is the shortest line between two points (Euclidean geometry). Then, in Definition 139, p. 262, we identified a straight line as a line with constant slope (analytic geometry). In Subsection 7.3.1, we reconciled the two definitions. We also gave a gut–level proof, as follows. Given two points A and B, we chose the x-axis so as to have $y_A = y_B = 0$, and argued that a constant–slope line is indeed the shortest line between two points. I also promised that I would address the issue in greater depth as the appropriate tools became available. Here, we are in a position to address the problem properly. Let us choose again the x-axis to go through the points A and B, so as to have $y_A = y_B = 0$. Again assume the line to be described by a single–valued function, say $y = f(x)$. Using Eq. 10.85, the length of the line is given by

$$\ell = \int_{x_A}^{x_B} \sqrt{1 + (df/dx)^2}\; dx \ge x_B - x_A. \tag{10.87}$$

The minimum is clearly attained when $df/dx = 0$, namely for $f(x) = 0$ for all $x \in [x_A, x_B]$, to yield $\ell = x_B - x_A$. Thus, the portion of the x-axis connecting A and B is a line of minimum length (segment). If the x-axis is not through the points A and B, we obtain a line with a constant slope. [Use the formulation in Section 7.1, on rigid–body transformations (Eq. 7.5).] Thus, a constant–slope line is indeed the shortest line between two points (straight line). ◦ Comment. As in Subsection 7.3.1, for simplicity we have assumed that the line L may be represented by a single–valued function. If $y = f(x)$ is a multi–valued function, namely if the line “bends backwards,” the situation
Fig. 10.5 Length of line
is only slightly more complicated. To this end, let us consider a branch of y = f (x), say y = f˘(x), with f˘(x) single–valued, albeit discontinuous at x = xD (Fig. 10.5). The above results apply to y = f˘(x), and hence are all the more valid for y = f (x), as you may verify.
10.6 Derivative of an integral
According to the first fundamental theorem of calculus (Eq. 10.38), we have that the x-derivative of $\int_c^x f(u)\, du$ (namely an integral considered as a function of x solely through its upper limit of integration) coincides with $f(x)$, namely the integrand evaluated at $u = x$. Here, we want to extend such a result and show how to evaluate the x-derivative of an integral of the type

$$I(x) = \int_{a(x)}^{b(x)} f(u, x)\, du, \tag{10.88}$$

namely an integral that depends upon the variable of differentiation x in three different ways:
1. The upper limit of integration is a function of x, namely $b = b(x)$.
2. The lower limit of integration is a function of x, namely $a = a(x)$.
3. The integrand itself is a function of x, namely $f = f(u, x)$.
In this section, we examine the three cases separately, and then extend the results to the general case, where all three possibilities occur at the same time.
10.6.1 The cases a = a(x) and b = b(x)
Consider first the case in which the integral is a function of x through its upper limit of integration, namely $b = b(x)$. Assume $b(x)$ to be differentiable. Then, we have

$$\frac{d}{dx} \int_{a}^{b(x)} f(u)\, du = \left[ \frac{d}{db} \int_{a}^{b(x)} f(u)\, du \right] \frac{db}{dx} = f[b(x)]\, \frac{db}{dx}. \tag{10.89}$$

[Hint: Use the Leibniz chain rule (Eq. 9.24), as well as the first fundamental theorem of calculus (Eq. 10.38).] In plain words, if the upper limit of integration (and only the upper limit of integration) is a function of x, then the x-derivative of the integral equals the integrand evaluated at the upper limit of integration, multiplied by the x-derivative of the upper limit of integration.

Next, assume that the integral is a function of x only through its lower limit of integration $a = a(x)$, also differentiable. In this case, Eq. 10.89 yields (use Eq. 10.10 for the first equality)

$$\frac{d}{dx} \int_{a(x)}^{b} f(u)\, du = -\frac{d}{dx} \int_{b}^{a(x)} f(u)\, du = -f[a(x)]\, \frac{da}{dx}. \tag{10.90}$$

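Equations 10.89 and 10.90 can be combined and tested on a concrete case. The sketch below is an illustration only: the particular choices f(u) = cos u, a(x) = sin x and b(x) = x², as well as all function names, are assumptions made for the example. It compares the formula f[b(x)] db/dx − f[a(x)] da/dx with a finite-difference derivative of the numerically evaluated integral.

```python
import math

def midpoint_integral(f, lo, hi, n=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def integrand(u):
    return math.cos(u)

def lower(x):            # a(x), hypothetical choice
    return math.sin(x)

def d_lower(x):          # da/dx
    return math.cos(x)

def upper(x):            # b(x), hypothetical choice
    return x * x

def d_upper(x):          # db/dx
    return 2.0 * x

def integral(x):
    return midpoint_integral(integrand, lower(x), upper(x))

def dI_formula(x):
    # Eq. 10.89 plus Eq. 10.90: f[b(x)] b'(x) - f[a(x)] a'(x).
    return integrand(upper(x)) * d_upper(x) - integrand(lower(x)) * d_lower(x)

def dI_numeric(x, h=1e-4):
    # Central difference of the numerically evaluated integral.
    return (integral(x + h) - integral(x - h)) / (2.0 * h)
```

The two functions dI_formula and dI_numeric should agree to within the accuracy of the finite difference.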
10.6.2 The case f = f (u, x); the Leibniz rule
Next assume that the integrand itself is a function not only of u, but also of x, namely $f = f(u, x)$. In this case, we want to show that, under suitable conditions (see Theorem 123 below), we have

$$\frac{d}{dx} \int_{a}^{b} f(u, x)\, du = \int_{a}^{b} \frac{\partial}{\partial x} f(u, x)\, du, \tag{10.91}$$

where ∂/∂x denotes the partial derivative with respect to x (Eq. 9.106). The above equation is known as the Leibniz rule for differentiating under the integral sign. [Note that the symbol d/dx outside the integral turned into ∂/∂x inside the integral, because the integrand is a function of x and u, whereas the integral is only a function of x.] At a gut level, we can say that
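
$$\frac{d}{dx} \int_{a}^{b} f(u, x)\, du = \lim_{h \to 0} \left[ \frac{1}{h} \int_{a}^{b} f(u, x + h)\, du - \frac{1}{h} \int_{a}^{b} f(u, x)\, du \right] = \lim_{h \to 0} \int_{a}^{b} \frac{1}{h} \big[ f(u, x + h) - f(u, x) \big]\, du, \tag{10.92}$$

in agreement with Eq. 10.91.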
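The Leibniz rule can be tried out on a smooth integrand, for which no difficulty with the interchange of limits arises. The sketch below is an illustration under assumptions: f(u, x) = sin(u x) on u ∈ [0, 1] is a hypothetical choice, and the function names are made up for the example. It computes both sides of Eq. 10.91 independently.

```python
import math

def midpoint_integral(f, lo, hi, n=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def f(u, x):
    return math.sin(u * x)

def df_dx(u, x):
    # Partial derivative of f with respect to x.
    return u * math.cos(u * x)

def derivative_of_integral(x, h=1e-4):
    """Left-hand side of Eq. 10.91, by central differences."""
    def integral(z):
        return midpoint_integral(lambda u: f(u, z), 0.0, 1.0)
    return (integral(x + h) - integral(x - h)) / (2.0 * h)

def integral_of_derivative(x):
    """Right-hand side of Eq. 10.91."""
    return midpoint_integral(lambda u: df_dx(u, x), 0.0, 1.0)
```

For this integrand the integral is (1 − cos x)/x, so both sides should approach (x sin x − 1 + cos x)/x².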
• Being as picky as a good mathematician should be
♠
However, the integral has been defined as a limit. Thus, in the expression above, we have interchanged the order of the limits and this is not necessarily legitimate. [Indeed, there are cases where Eq. 10.91 does not apply. For a counterexample, see Rudin, Ref. [56], p. 242–243, Exercise 28.] Therefore, if we want to be rigorous, we have to proceed more cautiously, so as to identify under what conditions Eq. 10.91 is valid. To this end, let us begin with the following

Lemma 12 (Interchangeability of limit and integral). Assume that there exists an h such that $f(u, x)$ is continuous in the closed rectangular region R of the (u, x) plane, defined by $u \in [a, b]$ and $x \in [c - h, c + h]$. Then,

$$\lim_{x \to c} \int_{a}^{b} f(u, x)\, du = \int_{a}^{b} f(u, c)\, du. \tag{10.93}$$

◦ Proof : The continuity of f (u, x) in the closed region R implies its uniform continuity (Theorem 90, p. 356). This in turn means that for all u ∈ [a, b] and any ε > 0 arbitrarily small, there exists a δ ∈ (0, h) such that
$f(u, x) - f(u, c)$ […]

[…] $\varepsilon > 0$ tends to zero, exists and equals zero (Eq. 8.71). Hence, by definition of improper integral of the first kind, we have

$$I = \lim_{\varepsilon \to 0} \int_{\varepsilon}^{1} \frac{1}{\sqrt{x}}\, dx = 2. \tag{10.102}$$

To address a more general case, consider $f(x) = x^{\alpha}$, with $\alpha < 0$ and

$$I = \int_{0}^{b} x^{\alpha}\, dx \qquad (\alpha \ne -1). \tag{10.103}$$

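[Note that $\alpha < 0$ implies that $x^{\alpha}$ tends to infinity as x tends to zero.] Recalling that, for $\alpha \ne -1$, the primitive of $x^{\alpha}$ is $x^{\alpha+1}/(\alpha + 1)$ (Eq. 10.31), we have

$$\int_{\varepsilon}^{b} x^{\alpha}\, dx = \frac{1}{\alpha + 1} \left( b^{\alpha+1} - \varepsilon^{\alpha+1} \right) \qquad (\alpha \ne -1). \tag{10.104}$$
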
For $\alpha \in (-1, 0)$, the limit exists, since in this case $\alpha + 1 > 0$, and hence $\varepsilon^{\alpha+1}$ goes to zero with $\varepsilon > 0$ (Eq. 8.71). Therefore, we have

$$\int_{0}^{b} x^{\alpha}\, dx := \lim_{\varepsilon \to 0} \int_{\varepsilon}^{b} x^{\alpha}\, dx = \frac{b^{\alpha+1}}{\alpha + 1} \qquad [\alpha \in (-1, 0)]. \tag{10.105}$$

On the other hand, for α < −1, the limit tends to infinity and the integral does not exist.
• An illustrative example for an essential discontinuity
♥
Here, we present an illustrative example for an integrand with an essential discontinuity. Let us consider the function

$$g(x) := 2x \sin \frac{1}{x} - \cos \frac{1}{x}, \tag{10.106}$$

which we have encountered in Eq. 9.104, as the derivative of the continuous function $f_C(x) = x^2 \sin(1/x)$ for $x \ne 0$ (Eq. 8.108). Applying the above definition, one obtains

$$\int_{0}^{2/\pi} g(x)\, dx := \lim_{\varepsilon \to 0} \Big[ f_C(x) \Big]_{\varepsilon}^{2/\pi} = \lim_{\varepsilon \to 0} \Big[ x^2 \sin(1/x) \Big]_{\varepsilon}^{2/\pi} = \frac{4}{\pi^2}. \tag{10.107}$$

10.7.2 Improper integrals of the second kind
♣
In this subsection, we address the second of the possible exceptions introduced at the beginning of the section, namely the case in which the interval is unbounded, that is, $a = -\infty$ and/or $b = \infty$. Let us assume the interval of integration to be $[a, \infty)$. [The other cases are similar.] In this case, the definition of the integral is extended as follows:

$$\int_{a}^{\infty} f(x)\, dx := \lim_{b \to \infty} \int_{a}^{b} f(x)\, dx, \tag{10.108}$$

provided of course that the limit exists. Otherwise, we say that the integral does not exist.
• An illustrative example
♥
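As an illustrative example, consider

$$I = \int_{1}^{\infty} \frac{1}{\sqrt{x}}\, dx. \tag{10.109}$$

Again, the primitive of $1/\sqrt{x}$ is $2\sqrt{x}$ (Eq. 10.32). Hence, denoting by $I_b$ the integral from 1 to b, we have

$$I_b = \int_{1}^{b} \frac{1}{\sqrt{x}}\, dx = 2\sqrt{b} - 2. \tag{10.110}$$

We have that $\lim_{b \to \infty} I_b = \infty$. Thus, we say that the integral in Eq. 10.109 does not exist. To address a more general case, consider again $f(x) = x^{\alpha}$, and

$$I = \int_{1}^{\infty} x^{\alpha}\, dx \qquad (\alpha \ne -1). \tag{10.111}$$

Again denoting by $I_b$ the integral from 1 to b, we have

$$I_b = \int_{1}^{b} x^{\alpha}\, dx = \frac{1}{\alpha + 1} \left( b^{\alpha+1} - 1 \right) \qquad (\alpha \ne -1). \tag{10.112}$$

For $\alpha + 1 < 0$, we have that the limit as b goes to infinity exists since, in this case, $b^{\alpha+1}$ goes to zero (Eq. 8.71). Thus, we have

$$I = \lim_{b \to \infty} \int_{1}^{b} x^{\alpha}\, dx = \frac{-1}{\alpha + 1} \qquad (\alpha < -1). \tag{10.113}$$

On the other hand, for $\alpha > -1$, $I_b$ tends to infinity with b (Eq. 8.71 again), and the integral does not exist.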
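The behavior of the two kinds of improper integrals of a power can be observed by evaluating the closed-form expressions of Eqs. 10.104 and 10.112 for smaller and smaller ε, or larger and larger b. This is a minimal sketch; the function names near_zero and far_out are hypothetical.

```python
def near_zero(alpha, eps, b=1.0):
    # Eq. 10.104: integral of x**alpha from eps to b (alpha != -1).
    return (b ** (alpha + 1.0) - eps ** (alpha + 1.0)) / (alpha + 1.0)

def far_out(alpha, b):
    # Eq. 10.112: integral of x**alpha from 1 to b (alpha != -1).
    return (b ** (alpha + 1.0) - 1.0) / (alpha + 1.0)
```

For alpha = -0.5, near_zero(-0.5, eps) approaches 2 as eps shrinks (Eq. 10.105 with b = 1), whereas far_out(-0.5, b) keeps growing with b. For alpha = -1.5 the roles are reversed: far_out(-1.5, b) approaches −1/(α + 1) = 2 (Eq. 10.113), whereas near_zero(-1.5, eps) grows without bound.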
10.7.3 Riemann integrable functions
♣
We have the following refinement of Definition 199, p. 413: Definition 202 (Riemann integrable function). A function f (x) that satisfies the requirements given above for improper integrals of the first and second kind will also be referred to as Riemann integrable. [See Definition 199, p. 413.] The integrals introduced in this section will also be referred to as the Riemann integrals, like that in Eq. 10.4. [These include the functions discussed in Remark 88, p. 440.]
10.8 Appendix B. Darboux integral
♣
Here, we present an alternate definition of integrals, which allows us to prove that the sequence in Eq. 10.4 indeed converges, provided that the function
is piecewise–continuous in [a, b]. This definition of integral generalizes the Riemann integral and is known as the Darboux integral.⁴ Specifically, let $f(x)$ denote a bounded function in the bounded closed interval [a, b]. By definition of Darboux integral, the increase of the number of subintervals is obtained by dividing the interval [a, b] into two equal–size subintervals, then into four, and so on. In other words, by going from p to p + 1, the number of equal–size subintervals is doubled. After p steps, we have $n = 2^p$ equal–size subintervals $[x_{k-1}, x_k]$, with $x_k = a + k\, \Delta x$, where $\Delta x = (b - a)/n$, and $k = 0, 1, \ldots, n$, so that $x_0 = a$ and $x_n = b$. Let $f_k^{\rm Sup}$ and $f_k^{\rm Inf}$ denote respectively the supremum and infimum of $f(x)$ in $[x_{k-1}, x_k]$, whose existence is guaranteed (Remark 68, p. 338). Then, let us consider the sequences

$$s_p^{\rm Inf} = \sum_{k=1}^{2^p} f_k^{\rm Inf}\, \Delta x \qquad \text{and} \qquad s_p^{\rm Sup} = \sum_{k=1}^{2^p} f_k^{\rm Sup}\, \Delta x. \tag{10.114}$$

Note that $s_p^{\rm Inf} \le s_p^{\rm Sup}$. Also, the fact that, by going from p to p + 1, each element is divided into two equal subelements guarantees that

$$s_p^{\rm Inf} \le s_{p+1}^{\rm Inf} \le \cdots \le s_{p+1}^{\rm Sup} \le s_p^{\rm Sup}. \tag{10.115}$$

In other words, $s_p^{\rm Inf}$ is a bounded non–decreasing sequence, whereas $s_p^{\rm Sup}$ is a bounded non–increasing sequence. Thus, according to Theorem 74, p. 309, both sequences converge, and we can define

$$s^{\rm Inf} := \lim_{p \to \infty} \sum_{k=1}^{2^p} f_k^{\rm Inf}\, \Delta x \qquad \text{and} \qquad s^{\rm Sup} := \lim_{p \to \infty} \sum_{k=1}^{2^p} f_k^{\rm Sup}\, \Delta x. \tag{10.116}$$

Then, we have the following

Definition 203 (Darboux integral of a function). If $s^{\rm Inf}$ and $s^{\rm Sup}$ (defined in Eq. 10.116) coincide, the expression

$$\int_{a}^{b} f(x)\, dx := s^{\rm Inf} = s^{\rm Sup} \tag{10.117}$$

is called the Darboux integral of the function $f(x)$ between a and b.

⁴ Named after the French mathematician Jean–Gaston Darboux (1842–1917).
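The two sequences of Eq. 10.114 are easy to compute when the infimum and supremum over each subinterval are known. The sketch below is an illustration under a stated assumption: the function is taken to be non-decreasing on [a, b], so that the infimum is the value at the left end of each subinterval and the supremum is the value at the right end. The names darboux_sums and square are hypothetical.

```python
def darboux_sums(f, a, b, p):
    """Lower and upper sums of Eq. 10.114 with n = 2**p equal subintervals.

    Assumption: f is non-decreasing on [a, b], so that on [x_{k-1}, x_k] the
    infimum is f(x_{k-1}) and the supremum is f(x_k).
    """
    n = 2 ** p
    dx = (b - a) / n
    lower = sum(f(a + (k - 1) * dx) for k in range(1, n + 1)) * dx
    upper = sum(f(a + k * dx) for k in range(1, n + 1)) * dx
    return lower, upper

def square(x):
    return x * x
```

For square on [0, 1], the lower sums should increase and the upper sums should decrease as p grows, with both bracketing 1/3 and their difference equal to 1/2**p, in line with Eq. 10.115.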
• Convergence for continuous functions
You might wonder whether the Riemann integral (Eq. 10.4) and the Darboux integral (Eq. 10.117) coincide whenever the function $f(x)$ is continuous in the bounded closed interval [a, b]. [Incidentally, in this case, $f_k^{\rm Sup}$ and $f_k^{\rm Inf}$ are replaced by $f_k^{\rm Max}$ and $f_k^{\rm Min}$, whose existence is guaranteed by the Weierstrass theorem for continuous functions (Theorem 84, p. 341).] To ascertain this, we have the following

Theorem 125. If the function $f(x)$ is continuous in the bounded closed interval [a, b], then $s^{\rm Inf}$ and $s^{\rm Sup}$ are equal and coincide with the definition given for the Riemann integral (Eq. 10.4), provided that this is also evaluated with the interval subdivision procedure introduced in Eq. 10.114.

◦ Proof : We have

$$\sum_{k=1}^{2^p} \left[ f(\breve{x}_k) - f_k^{\rm Min} \right] \Delta x \le \max_k \left[ f(\breve{x}_k) - f_k^{\rm Min} \right] \sum_{k=1}^{2^p} \Delta x = (b - a)\, \max_k \left[ f(\breve{x}_k) - f_k^{\rm Min} \right], \tag{10.118}$$

where the symbol $\max_k [a_k]$ denotes the maximum of the values $a_1, \ldots, a_n$ (Definition 164, p. 346). Note that the function $f(x)$ is continuous in [a, b] by hypothesis. Accordingly, we obtain that $\max_k [f(\breve{x}_k) - f_k^{\rm Min}] = o[1]$, as $\Delta x$ tends to zero (Eq. 8.95). Thus,

$$\lim_{p \to \infty} \sum_{k=1}^{2^p} f(\breve{x}_k)\, \Delta x = s^{\rm Min} := \lim_{p \to \infty} \sum_{k=1}^{2^p} f_k^{\rm Min}\, \Delta x. \tag{10.119}$$

Similarly, you may prove that

$$\lim_{p \to \infty} \sum_{k=1}^{2^p} f(\breve{x}_k)\, \Delta x = s^{\rm Max} := \lim_{p \to \infty} \sum_{k=1}^{2^p} f_k^{\rm Max}\, \Delta x. \tag{10.120}$$

Thus, we have that $s^{\rm Min}$ and $s^{\rm Max}$ coincide and are equal to the Riemann integral.
• Convergence for piecewise–continuous functions Next, let us apply the Darboux definition of integrals to piecewise–continuous functions, a problem that was addressed in Subsection 10.1.2. In this case,
the (bounded) values of $f_k^{\rm Min}$ and $f_k^{\rm Max}$ at the finite number of discontinuity points (where the difference between $f_k^{\rm Min}$ and $f_k^{\rm Max}$ does not tend to zero) are multiplied by a vanishing $\Delta x$. Therefore, $(s^{\rm Max} - s^{\rm Min})$ tends to zero, and hence, with the Darboux definition, integrals of piecewise–continuous functions exist as well. [The approach used here will be useful in extending the formulation to multidimensional integrals in Chapter 19.]
Chapter 11
Rectilinear single–particle dynamics
At last, here we get to talk a bit more about classical mechanics (after statics in Chapter 4), still at a very elementary level though, specifically for a single particle moving in a rectilinear motion, namely a motion that occurs along a straight line.¹

¹ The term rectilinear stems from recta linea (Latin for “straight line”).

• Overview of this chapter
As stated in Remark 36, p. 143, the field of classical mechanics can be broken down into three branches, namely: (i) statics, which deals with the equilibrium of objects that do not move; (ii) kinematics, which deals with the motion of objects, independently of the forces that cause it, and (iii) dynamics, which deals with the relationship between motion and forces. In Chapter 4, we had just a very first glimpse of mechanics, namely statics. Here, after a brief detour on kinematics, we begin to look at the real stuff: we extend to dynamics the know–how acquired in the field of statics. As apparent from the title, in this chapter we deal primarily with a single particle in rectilinear motion. [The only exception is in Subsubsection “Elastic collision between two particles,” p. 488.] The reason is that the notions of mass and acceleration are more easily introduced within this context.

Specifically, in Section 11.1 we deal with kinematics. This includes in particular the relationship between the location of a point as a function of time on one hand, and its velocity and acceleration on the other. Dynamics begins in Section 11.2, where we introduce the Newton second law for rectilinear motion, which states that, in an inertial frame of reference (defined in Subsection 11.2.1), the acceleration of a particle is proportional to the force, with the constant of proportionality being the mass of the particle. In the two sections that follow, we consider the solutions of particularly simple problems in dynamics, namely a single particle not subject to any force (Section 11.4), and a single particle subject to a constant force (Section 11.5). Then, we present the formulation for a so–called undamped (or simple) harmonic oscillator, which consists of a particle subject to a restoring force proportional to the displacement (linear mass–spring system). In the absence of additional forces, such a system oscillates without damping. In particular, we address undamped harmonic oscillators for forces of a different type, namely
1. Free oscillations, namely oscillations in the absence of additional forces (Section 11.6).
2. Forced oscillations with periodic forcing, namely with forces that vary periodically, in particular sinusoidally, in time (Section 11.7).
3. Forced oscillations with arbitrary forcing (Section 11.8).
We conclude the chapter with Section 11.9, which is devoted to mechanical energy. This includes the definitions of kinetic energy and work (along with the specific case of conservative forces and the corresponding potential energy), as well as a method of solution via energy considerations.

© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_11
11.1 Kinematics In this section, we address the field of kinematics, again for the limited case of a single particle in rectilinear motion. Specifically, in this section, we introduce the notions of rectilinear motion, velocity, acceleration, and frame of reference. We also present an illustrative example.
11.1.1 Rectilinear motion, velocity and acceleration Let us begin by introducing a formal definition of velocity, a notion already used several times above, as an everyday term. For instance, at the end of Subsection 9.5.2, we used the notion of average velocity (namely, if you cover 50 miles in an hour, your average speed over that hour is 50 mph). To arrive at a formal definition, let us consider shorter and shorter time–lapses, so as to have an average velocity that is more and more “local.” Accordingly, it makes sense to define the instantaneous velocity to be the limit — as the time–lapse Δt tends to zero — of the distance Δx covered, divided by the time Δt required. Thus, we have the following
Definition 204 (Velocity in one–dimensional motion). Let $x = x(t)$ denote the location of a point at time t. Then, the velocity, $v(t)$, is the time derivative of the function $x(t)$

$$v(t) := \lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t} = \frac{dx}{dt} = \dot{x}(t), \tag{11.1}$$

where we used the notation introduced in Subsection 9.1.6, namely $\dot{u} = du/dt$ (Eq. 9.34). [Note that the value of the velocity depends upon the frame of reference, as discussed in Subsection 11.1.3.]

◦ Warning. It should be emphasized that locations and velocities may be positive or negative depending upon their orientation and the convention that we choose for the positive orientation. Thus, in order for Eq. 11.1 to be valid, it is essential that the orientation chosen for the positive velocity be the same as that for the positive location. In other words, once we decide the positive direction for the x-axis, Eq. 11.1 yields that the velocity is positive if the particle moves in the positive direction for the x-axis.

Remark 89. Also, as we will see in Section 15.1, in three–dimensional motion the velocity is a vector quantity (namely representable by an arrow). The speed is its amplitude with the appropriate sign (for instance, it is convenient to assign a negative value to the speed, when your car is in reverse). In one–dimensional motion, the terms speed and velocity are interchangeable. The difference will be clearer when we address the three–dimensional case (Definition 324, p. 793).

Next, consider the acceleration. As you already know, the acceleration measures the rate of change of the velocity. The average acceleration over a certain time is the change in velocity divided by the time required. Again, if we consider shorter and shorter time lapses, we obtain an average acceleration that is more and more “local.” Thus, it makes sense to define the (instantaneous) acceleration as the limit of the change in velocity divided by the time Δt it takes to obtain the change, as Δt tends to zero. Thus, we have the following

Definition 205 (Acceleration in one–dimensional motion). The acceleration, $a(t)$, is the time derivative of the velocity, namely (use $\ddot{u} = d^2u/dt^2$, Eq. 9.35)

$$a(t) := \lim_{\Delta t \to 0} \frac{\Delta v}{\Delta t} = \frac{dv}{dt} = \frac{d^2 x}{dt^2} = \ddot{x}(t). \tag{11.2}$$

◦ Warning. It should be emphasized that, akin to locations and velocities, acceleration may be positive or negative depending upon its direction. Again,
in order for Eq. 11.2 to be valid, it is essential that the orientation chosen for the positive acceleration be the same as that for the positive velocity. In other words, once we have decided the positive direction for locations and velocity (namely that of the x-axis), Eq. 11.2 requires that the acceleration is considered positive iff it is itself directed like the x-axis (for instance, iff the velocity increases as the particle moves in the positive direction of the x-axis). ◦ My dream car. To give you an example, I used to tell my class that my dream car (one of those fancy Italian sports cars, a Ferrari Maranello), had pretty good “pickup” (acceleration). Indeed, it went from 0 to 100 km/h in about 4.8 seconds. [“I didn’t know you owned a Ferrari!” said facetiously my wife. “Yes, I own it in all of my dreams! That’s why it’s my dream car,” I replied. “Very funny!” said she, “Dream on!” My students knew I was referring to my dreams, as often they saw me passing them in the streets of Rome with my sub–sub–compact car, whenever I was running late for class.] As you might have guessed, I always had a passion for racing cars. Sure enough, I used to go racing at the Vallelunga racing track, near Rome. But that was with my Fiat 600, in the early sixties. No dream car then either! Note that 100 km/h = 100 · (1, 000/3, 600) m/s 27.777 m/s, since there are 1,000 meters in one kilometer and 3,600 seconds in one hour. Hence, going from 0 to 100 km/h in about 4.8 seconds means that the average acceleration is a (27.777/4.8) m/s2 5.787 m/s2 . To have a comparison, the acceleration due to gravity (namely the acceleration of a particle subject only to its own weight) is g 9.806 m/s2 , as we will see later (Definition 207, p. 462, and Eq. 11.27). Accordingly, we can say that the acceleration of my dream car is equal to about 0.590 g. A pretty good dream car, huh! Dreaming big is the secret to happiness in life! Remark 90. 
As we will see in Section 15.1, in three–dimensional motion, the acceleration is a vector quantity. One could say that the pickup of a car is the derivative of the speed. [In the one–dimensional motion, the terms pickup and acceleration are interchangeable. In three dimensions they are not, as we will see (Definition 324, p. 793).]
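The dream–car arithmetic above can be retraced with a few lines of Python. This is only a sketch: the names `KMH_TO_MS`, `G_ROUNDED` and `average_acceleration` are my own (hypothetical), and the value of g is the rounded one quoted in the text.

```python
# Sketch of the dream-car arithmetic; all names are hypothetical.
KMH_TO_MS = 1000.0 / 3600.0   # 1,000 m per km, 3,600 s per hour
G_ROUNDED = 9.806             # m/s^2, rounded value quoted in the text

def average_acceleration(v_start_kmh, v_end_kmh, elapsed_s):
    """Average acceleration in m/s^2 for a speed change given in km/h."""
    return (v_end_kmh - v_start_kmh) * KMH_TO_MS / elapsed_s

a_avg = average_acceleration(0.0, 100.0, 4.8)
print(f"a = {a_avg:.3f} m/s^2 = {a_avg / G_ROUNDED:.3f} g")
```

Feeding in other 0 to 100 km/h times gives a quick way to compare the pickup of different cars in units of g.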
11.1.2 The one–hundred meter dash again
♥
In Section 6.1, after discussing functions and their graphs, as an illustrative example, we introduced a function corresponding to the speed of a runner of
11. Rectilinear single–particle dynamics
the one–hundred meters dash as a function of space (Eq. 6.6). Specifically, we assumed that, for the first ten meters, the runner’s speed may be approximated by $v(x) = 10 \sqrt{1 - (1 - x/10)^2}$ m/s, whereas after the first ten meters the (constant) speed is equal to 10 m/s. [In this subsection, distances are expressed in meters, time in seconds, and velocities in meters per second.] Here, just to show that even in kinematics one may encounter challenging problems, we discuss how to obtain the function v = v(t) from the function v = v(x). This is achieved as follows. Using the method of separation of variables for ordinary differential equations (Subsection 10.5.1), from dx/dt = v(x), with t0 = 0 and x(0) = 0, we have (use Eq. 10.68)
\[ t(x) = \int_0^x \frac{1}{v(x_1)} \, dx_1 = \frac{1}{10} \int_0^x \frac{1}{\sqrt{1 - (1 - x_1/10)^2}} \, dx_1. \tag{11.3} \]
To evaluate the integral in Eq. 11.3, note that
\[ \frac{d}{dx} \cos_{\mathrm P}^{-1}\!\left( 1 - \frac{x}{10} \right) = \frac{-1}{\sqrt{1 - (1 - x/10)^2}} \; \frac{-1}{10}. \tag{11.4} \]
[Hint: Use the chain rule as given in Eq. 9.64, as well as the derivative of $\cos^{-1} u$ (Eq. 9.74), and Remark 80, p. 383, on the related sign ambiguity.] Accordingly, we have
\[ t(x) = \Big[ \cos_{\mathrm P}^{-1}(1 - x_1/10) \Big]_0^x = \cos_{\mathrm P}^{-1}(1 - x/10), \qquad x \in [0, 10]. \tag{11.5} \]
For x = 10 meters, we have t = π/2 seconds. The above equation is equivalent to
\[ x(t) = 10 \, (1 - \cos t), \qquad t \in [0, \pi/2]. \tag{11.6} \]
After that, the speed equals 10 m/s, and hence x(t) = 10 + 10 (t − π/2). Next, differentiating Eq. 11.6, we have
\[ v(t) = 10 \sin t, \qquad t \in [0, \pi/2]. \tag{11.7} \]
The functions x = x(t) and v = v(t) are depicted, respectively, in Figs. 11.1 and 11.2, for x ∈ [0, 10]. Let us verify that our result is correct. Indeed, combining Eqs. 11.5 and 11.7, so as to eliminate t, we have (for x ∈ [0, 10])
\[ v(x) = 10 \sqrt{1 - \cos^2 t(x)} = 10 \sqrt{1 - (1 - x/10)^2}, \tag{11.8} \]
in agreement with the original function v(x), as given in Eq. 6.6. In addition, the acceleration is given by
Fig. 11.1 Space vs time
Fig. 11.2 Velocity vs time
\[ a = 10 \cos t, \qquad t \in [0, \pi/2], \tag{11.9} \]
and a = 0 after that. Note that the acceleration at t = 0 is finite and equal to 10 m/s² ≃ 1 g.
◦ Comment. The fact that, in Fig. 6.5, p. 209, the slope is vertical stems from
\[ a = \frac{dv}{dt} = \frac{dv}{dx} \, \frac{dx}{dt} = \frac{dv}{dx} \, v, \tag{11.10} \]
or dv/dx = a/v. Thus, for x = 0, where v = 0 and a = 10, we have that the slope of the graph of v(x) is vertical.
Remark 91. As promised in Remark 48, p. 210, we can now evaluate the time used by this runner to cover the 100 meters: it takes him/her t0 = cos⁻¹ 0 = π/2 ≃ 1.57 seconds to cover the first 10 m; then 9 more seconds are needed to cover the other 90 m at a speed of 10 m/s, for a total time of (π/2 + 9) s ≃ 10.57 s. Note that this time is excellent, although not quite a world record. [At the time of this writing, the men’s world record is below 9.60 s; the women’s world record is below 10.50 s.]
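For those who like to double–check with a computer, here is a minimal Python sketch that compares v(x(t)), built from Eqs. 6.6 and 11.6, with v(t) from Eq. 11.7 on a grid of sample times, and then adds up the total time of Remark 91. The function names and the grid spacing are my own choices (hypothetical, not taken from the text).

```python
import math

T_ACCEL = math.pi / 2.0   # time (s) needed to cover the first 10 m (Eq. 11.5)

def v_of_x(x):
    """Speed (m/s) as a function of distance x (m), as assumed in Eq. 6.6."""
    if x >= 10.0:
        return 10.0
    # max() guards against a tiny negative argument caused by rounding.
    return 10.0 * math.sqrt(max(0.0, 1.0 - (1.0 - x / 10.0) ** 2))

def x_of_t(t):
    """Distance (m) as a function of time t (s): Eq. 11.6, then constant speed."""
    if t <= T_ACCEL:
        return 10.0 * (1.0 - math.cos(t))
    return 10.0 + 10.0 * (t - T_ACCEL)

def v_of_t(t):
    """Speed (m/s) as a function of time t (s): Eq. 11.7, then constant speed."""
    return 10.0 * math.sin(t) if t <= T_ACCEL else 10.0

# Largest mismatch between v(x(t)) and v(t) over sample times from 0 to 10 s.
samples = [0.01 * i for i in range(1001)]
max_error = max(abs(v_of_x(x_of_t(t)) - v_of_t(t)) for t in samples)

total_time = T_ACCEL + 90.0 / 10.0   # first 10 m, then 90 m at 10 m/s
print(f"max |v(x(t)) - v(t)| = {max_error:.2e}")
print(f"time for 100 m = {total_time:.2f} s")
```

The mismatch should be at the level of rounding errors, which is the numerical counterpart of the verification in Eq. 11.8.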
11.1.3 What is a frame of reference?
Here, we clarify what a frame of reference is. For the man on the street, the term “frame of reference” indicates the point of perspective that gives us the proper context in which to examine a fact. In mathematics and mechanics, a frame of reference has a much more specific meaning. We have already explored the notion from a mathematical point of view: we have introduced frames of reference in analytic geometry, as determined by the Cartesian axes. Here, we present the notion of frames of reference for kinematics and dynamics, a notion barely touched upon in statics (Remark 38, p. 145). This notion is closely related to that in analytic geometry, but it is not quite the same, because now we have to establish the connection with time, an aspect missing in analytic geometry.
The notion of frame of reference for kinematics and dynamics is better understood through an example. Consider a ship that is leaving a port, and assume that there are people on the boat playing a game of billiards. For our purpose, the billiard ball may be treated as a point. We want to evaluate its velocity. The velocity of the ball may be defined “with respect to an observer rigidly connected to the billiard table.” [This means that it is measured by using, for instance, a video camera rigidly connected to the billiard table.] Just to be specific, assume that the velocity VB measured turns out to be equal to 10 m/s (namely VB = 36 km/h, since there are 3,600 seconds in an hour and 1,000 meters in 1 kilometer). [This corresponds to 100 m in 10 s, the speed of our runner in the last 90 meters of the 100 m dash, close to the world record.]
However, the velocity of the ball may be evaluated also with respect to a person on the dock who is observing the ship passing by. Let us assume, for simplicity, that the ball and the ship move in parallel rectilinear motions, and that the velocity VS of the ship is also equal to 10 m/s, and that the ball moves in the direction opposite to that of the ship. Then, the observer on the dock would say that the speed of the ball is VB − VS = (10 − 10) m/s = 0 m/s. In this case, as far as the observer on the dock is concerned, the ball is not moving at all. [On the other hand, were the ball to move in the same direction as the ship, the observer on the dock would say that the speed is equal to VB + VS = (10 + 10) m/s = 20 m/s.]
Accordingly, we must indicate the observer with respect to which the velocity is measured, indeed the observer with respect to which the location is measured. In this chapter, we deal only with rectilinear motion. Typically, the origin of the frame of reference coincides with the location of the observer. For instance, to measure the motion of the billiard ball for an observer on the ship, the point in front of the video camera is an adequate choice for the origin of the frame of reference.
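The billiard–ball example may be condensed into a one–line rule, provided we treat the velocities as signed quantities along a common x-axis fixed to the dock. The sketch below does this in Python; the signed convention and the function name `velocity_seen_from_dock` are my own assumptions, introduced only for illustration.

```python
def velocity_seen_from_dock(v_ball_rel_ship, v_ship_rel_dock):
    """Signed velocity (m/s) of the ball for the observer on the dock.

    Both arguments are signed along one common x-axis fixed to the dock
    (an assumed convention, positive in the direction of the ship's motion).
    """
    return v_ball_rel_ship + v_ship_rel_dock

V_S = 10.0  # velocity of the ship relative to the dock, m/s
print(velocity_seen_from_dock(-10.0, V_S))  # ball moving against the ship
print(velocity_seen_from_dock(+10.0, V_S))  # ball moving with the ship
```

With signed velocities, the two cases of the text (VB − VS and VB + VS) become a single sum, which is why a sign convention is worth the trouble.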
11.2 The Newton second law in one dimension
In this section, we finally start to address dynamics. Our journey in the field of mechanics begins in earnest, at least for a single particle in rectilinear motion! Specifically, consider a particle P which is subject to a force (or sum of forces), acting along the same line at all times and assume that the initial velocity is also parallel to such a line. We have the following
Principle 2 For the limited case stated above, the Newton law of motion states that, in an inertial frame of reference (which will be defined in Subsection 11.2.1), the acceleration a of the particle P is proportional to the sum F of all the forces acting on P (all acting in the direction of the rectilinear motion). The coefficient of proportionality, m = F/a, is called the mass of the particle. In mathematical terms, we have
\[ m \, a = F. \tag{11.11} \]
Experimental evidence indicates that the mass of a particle is always positive,
\[ m > 0. \tag{11.12} \]
Even more important is the fact (also experimental) that the mass is a constant, specifically independent of time, location, velocity, acceleration and forces acting on the particle. Remark 92. It should be emphasized that the comment regarding the sign conventions for locations, velocities, and accelerations applies to forces as well. Specifically, in order for Eq. 11.11 to be valid, it is essential that the orientation chosen for positive forces be the same as that for the positive acceleration (which, in turn, we have assumed to coincide with that for positive locations and velocities). In other words, once we decide the positive direction for the x-axis, Eq. 11.11 tells us that F is positive iff it is pointed like the x-axis. As an example, consider a ball that is subject only to its weight and is being thrown upwards: if we assume displacements, velocities and accelerations to be positive upwards, then weight must be treated as a negative force. ◦ A point of clarification. The forces that we have encountered in statics (such as weight or spring forces) are not affected by the velocity or acceleration of the particle. To be specific, experimental evidence shows that the force due to a massless spring with a given elongation is independent of whether or not the particle is moving (or even accelerating). Similarly, assume that you have measured the weight of a particle when it is at rest. Then, at that location, the weight of the particle will be the same, independently of its motion. [It goes without saying that, when the velocity differs from zero, new forces may appear (such as aerodynamic/hydrodynamic lift and drag, or viscous friction), which are necessarily functions of the velocity v, since they vanish when v = 0.] The above considerations are important because the fact that we can measure the acceleration and the forces (at least some of them),
independently of the velocity, makes it possible for us to evaluate the mass, through its definition, namely as the ratio F/a. Remark 93. In any case, none of the forces is a function of the acceleration of the particle.
11.2.1 What is an inertial frame of reference?
By now, you should have complained! I have used the term inertial frame of reference, without even defining it. How remiss of me! Having already provided a description of what is meant by a frame of reference (Subsection 11.1.3), we can now ask ourselves: “What is an inertial frame of reference?” This is probably the most difficult mechanics question that I have to address in the entire book. The definition — a bit lengthy, albeit still quite cursory — is presented in this section. A fully satisfactory answer is provided by the general theory of relativity by Albert Einstein, which is way beyond the objective of this book.²
So, to make our life simpler, let us start with the days of Aristotle.³ He had no problem whatsoever. To him, the so–called distant stars were fixed to a sphere that rotated around the Earth.⁴ Any motion was measured with respect to these stars. For centuries, the theories of Aristotle were accepted as an act of faith, especially during the dark ages: “Ipse dixit!” (Latin for “He himself said it!”) was the expression used to cut off any discussion.
In fact, things didn’t change that much for almost two millennia, until Galileo Galilei appeared on the scene, towards the end of the sixteenth century.⁵ On the basis of his experimental evidence (the equations of motion had not yet been formulated — that came about with Newton, towards the end of the seventeenth century), he stated that it is not possible to distinguish between two frames of reference that are in uniform (namely constant speed) rectilinear motion one with respect to the other. This principle is now known as Galilean relativity. Thus, any frame of reference in uniform rectilinear motion with respect to the apparently fixed distant stars was also considered an acceptable frame of reference.
The implications of the Galilean relativity are important: it means that m and F are the same in the two frames of reference. Indeed, a is the same in any two different inertial frames of reference, since the two speeds differ by a constant, which disappears upon differentiation. Thus, to introduce inertial frames of reference, we could start with any frame in uniform rectilinear motion with respect to the “distant stars.”
However, nowadays we know that the “distant stars” are not fixed after all; we know that they are moving even with respect to each other (albeit imperceptibly, at least so it was with the information available in the days of Galileo). Thus, we have to choose one of them. Even so, we have no guarantee that a frame connected to one of the distant stars is an inertial frame of reference. On the contrary, they move with respect to each other not necessarily in uniform rectilinear motion.
It is worth noting that Newton himself had a hard time with the issue, in his Principia (Ref. [49]). According to Penrose (Ref. [53], p. 388), he “was, at least initially, as much of a Galilean relativist as was Galileo himself. This is made clear from the fact that in his original formulation of his laws of motion, he explicitly stated the Galilean principle of relativity as a fundamental law (this being the principle that physical action should be blind to a change from one uniformly moving reference frame to another, the notion of time being absolute [...]).”
² Albert Einstein (1879–1955) was a German–born theoretical physicist. He is a household name, especially because of the equation E = m c², included in the development of the special theory of relativity, a turning point in physics. In 1916 he extended the principle of relativity to gravitation (general theory of relativity). However, he made also major contributions outside the field of relativity, in particular in quantum mechanics. Indeed, he was awarded a Nobel Prize in 1921, not for the theory of relativity, but for the discovery of the photoelectric effect. He is often regarded as the father of modern physics.
³ Aristotle (384–322 BC) was a Greek philosopher who dealt with many aspects of human endeavor: metaphysics, logic, and ethics; physics and biology; politics and government; poetry, theater and music. [Incidentally, several of these terms were introduced by Aristotle himself. For instance, etymologically speaking, the term metaphysics (ancient Greek, for “after physics”) stems from the fact that, in his writings, the section on metaphysics came after that on physics.]
⁴ Things would have been easier if people had listened to Aristarchus of Samos (c. 310–c. 230 BC), an ancient Greek astronomer and mathematician. He was a Copernicus precursor, as he presented the first known heliocentric model. In his system, the distant stars were fixed.
⁵ Galileo Galilei (1564–1642) was an Italian mechanician, mathematician, astronomer and philosopher. He is considered by some the father of modern science. He improved the telescope and used it to make important astronomical observations, such as the discovery of the four largest satellites of Jupiter (now known as the Galilean moons). Using inclined planes, he was able to obtain a good estimate of the acceleration of gravity (Definition 207, p. 462). His support of Copernicus’ heliocentrism (namely the fact that the Earth rotates around the Sun, and not vice versa) got him into big trouble — a major confrontation with the Church. He abjured his beliefs, but still spent the rest of his life under house arrest. As he was leaving the courtroom, immediately after the abjuration, he supposedly muttered “Eppur si muove” (Italian for “And yet it moves”), referring to the Earth that was then widely believed to be fixed. [For the sake of completeness, Nicolaus Copernicus (1473–1543) was the Polish mathematician and astronomer, who introduced heliocentrism in the modern era, apparently independently of Aristarchus of Samos. He was also a physician, a classics scholar, and a politician.]
Still according to Penrose (Ref. [53], p. 388), “He had originally proposed five (or six) laws, law 4 of which was indeed the Galilean principle,⁶ but later he simplified them, in his published Principia, to the three ‘Newton’s laws’ that we are now familiar with. For, he had realized that they were sufficient for deriving all the others. In order to make the framework for his laws precise, he needed to adopt an ‘absolute space’ with respect to which his motions were to be described.”
Accordingly, I believe the best course of action, at least for the time being, is to settle for a non–rotating frame of reference rigidly connected to the center of the Milky Way galaxy, as well as any frame of reference in uniform rectilinear motion with respect to it. Although rejected (correctly, I must add) by Penrose (Ref. [53], pp. 386–387), as being too inadequate for his objectives, this approach is more than acceptable for the kind of applications that are of interest in this book.
◦ Comment. In fact, for most of the applications presented here, even a frame of reference rigidly connected to the Earth is an adequate inertial frame (and so is any frame of reference in uniform rectilinear motion with respect to it).
Remark 94. The best way to appreciate the reason for using inertial frames of reference is to discuss what happens when we don’t do this. Such an approach, premature at this point, is addressed in Chapter 22, where we discuss non–inertial frames of reference, which give rise to the so–called “apparent forces,” such as the centrifugal force and the Coriolis force. Just to give you an idea of what we are talking about, we have all experienced the sensation of a side force when our car is running in circles, or the sensation of a change in weight when the elevator is subject to a sudden start/stop.
These apparent forces are a consequence of the fact that, in the car or in the elevator, we are in a non–inertial frame of reference and we formulate the problem in such a frame. These forces disappear if we formulate the same problem in an inertial frame of reference.
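Going back to Galilean relativity for a moment, the statement of Subsection 11.2.1 that the acceleration is the same in two frames in uniform rectilinear motion, one with respect to the other, may be checked numerically. The Python sketch below is only an illustration under my own assumptions: the motion, the frame velocity, and the finite–difference estimate of the second derivative are all hypothetical choices.

```python
def second_difference(x, t, h=1e-3):
    """Central finite-difference estimate of d2x/dt2 at time t."""
    return (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2

A_TRUE = 2.0    # m/s^2, hypothetical acceleration in the first frame
V_FRAME = 7.0   # m/s, hypothetical constant velocity of the second frame

def x_frame_1(t):
    """Location in the first frame: constant acceleration, starting at rest."""
    return 0.5 * A_TRUE * t**2

def x_frame_2(t):
    """Same particle, seen from a frame moving at the constant velocity V_FRAME."""
    return x_frame_1(t) - V_FRAME * t

a1 = second_difference(x_frame_1, 3.0)
a2 = second_difference(x_frame_2, 3.0)
print(a1, a2)
```

The two estimates agree up to rounding errors, because the term V_FRAME · t is linear in t and disappears when we differentiate twice.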
⁶ “This was in his manuscript fragment De motu corporum in mediis regulariter cedentibus — a precursor of Principia written in 1684.” [Penrose, Ref. [53], Note 17.5, p. 410.]
11.3 Momentum
It is sometimes useful to see Eq. 11.11 from a different angle. Consider the following
Definition 206 (Momentum). Consider a particle having mass m and velocity v. The quantity
\[ p = m v \tag{11.13} \]
is called the momentum of the particle.
We have the following
Theorem 126 (Theorem of momentum balance). Consider a particle that has mass m and is subject to a force F(x, t), or several forces that add up to F(x, t). We have that the difference between the momentum at time tb and that at time ta equals the time integral of the force between ta and tb:
\[ p_b - p_a = \int_{t_a}^{t_b} F[x(t), t] \, dt, \tag{11.14} \]
where pa = p(ta) = m v(ta) and pb = p(tb) = m v(tb).
◦ Proof: Integrating Eq. 11.11 over time from ta to tb, we have (use the fact that m is constant and that a = dv/dt, Eq. 11.2)
\[ m \int_{t_a}^{t_b} \frac{dv}{dt} \, dt = \Big[ m v \Big]_{t_a}^{t_b} = \int_{t_a}^{t_b} F[x(t), t] \, dt, \tag{11.15} \]
which is fully equivalent to Eq. 11.14.
Remark 95 (Point of clarification). The integral of a force with respect to time (namely the right side of Eq. 11.14) is commonly referred to as the impulse. On the other hand, an impulsive function (or simply impulse) is a time function with infinitely large amplitude and infinitesimal duration. To make things more complicated, the solution for a system subject to an impulse function is called the impulse response. This may be confusing, because the term impulse in Eq. 11.14 does not imply infinitesimal duration, which is instead implied in the expression “impulse response.” In order to avoid generating unnecessary confusion, I will restrain myself and will not use the term impulse to refer to the integral in Eq. 11.14. In this book, an impulse describes an infinitely large force that has an infinitesimal duration, with a finite value of the integral over t (an idealization of the force that a cue stick exerts on a billiard ball, for example).
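To see the momentum balance of Theorem 126 at work, consider the following Python sketch. It uses a force that depends only on time, F(t) = 3 t², for which the velocity is known in closed form, and compares the two sides of Eq. 11.14. The force, the mass, the initial velocity and the trapezoidal rule used for the integral are all my own (hypothetical) choices, made just for this illustration.

```python
def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

M = 2.0    # kg, hypothetical mass
V0 = 1.0   # m/s, hypothetical velocity at t = 0

def force(t):
    """Hypothetical force depending only on time: F(t) = 3 t^2 (newtons)."""
    return 3.0 * t**2

def velocity(t):
    """Exact velocity for this force: m dv/dt = 3 t^2 gives v = V0 + t^3 / M."""
    return V0 + t**3 / M

t_a, t_b = 0.5, 2.0
momentum_change = M * velocity(t_b) - M * velocity(t_a)   # left side of Eq. 11.14
force_integral = trapezoid(force, t_a, t_b)               # right side of Eq. 11.14
print(momentum_change, force_integral)
```

The two numbers agree to within the error of the trapezoidal rule. Note that the initial velocity V0 drops out of the difference pb − pa, as it should.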
11.4 A particle subject to no forces
In this and in the four sections that follow, we present some illustrative examples. In these sections, the objective is to use the tools introduced in Chapters 9 and 10 to find the solutions to some simple problems in dynamics, namely to obtain the function x = x(t) that satisfies the equation governing the problem.
Here, we consider the motion of a particle, or mass point, not subject to any force, or to several forces that add up to zero. [In dynamics, the term “mass point” is used to denote a material point (again a particle with very small dimensions), with a non–negligible mass.] The equation that governs the motion in this case is given by m a = 0 (Eq. 11.11 with F = 0), with a = dv/dt = d²x/dt² (Eq. 11.2), namely
\[ a = \frac{dv}{dt} = \frac{d^2 x}{dt^2} = 0. \tag{11.16} \]
Integrating Eq. 11.16 between 0 and t, we have (recall that the primitive of the zero function is a constant, Theorem 118, p. 420)
\[ v(t) = \frac{dx}{dt} = \text{constant} = v_0, \tag{11.17} \]
where we have imposed v(0) = v0, with v0 denoting the initial velocity of the particle. Integrating again we have
\[ x(t) = x_0 + v_0 \, t, \tag{11.18} \]
where we have used the fact that, for t = 0, we have x(0) = x0, with x0 denoting the initial location of the particle. Equation 11.18 gives the mathematical representation of the fact that the particle moves in uniform rectilinear motion.
Remark 96 (Initial conditions). It is worth noting the presence of two arbitrary constants of integration, x0 and v0. It is important to observe that if x0 and v0 are known, the solution in Eq. 11.18 is unique. Thus, the solution to Eq. 11.16 depends uniquely upon these two arbitrary constants of integration, x0 and v0. In other words, if we consider only Eq. 11.16, we have that the initial location and the initial velocity are arbitrary; they must be prescribed by additional conditions, otherwise the solution is not unique. [These are called the initial conditions and play an important role within the theory of initial–value problems in ordinary differential equations, addressed in Vol. II. This issue is addressed in greater depth in Remark 98, p. 465.]
11.4.1 The Newton first law in one dimension
♥
The result expressed in Eq. 11.18 embodies the Newton first law in one dimension, which states that, in an inertial frame of reference, a particle subject to no forces either remains at rest or moves in rectilinear motion with a constant velocity. [As pointed out in Remark 37, p. 144, the Newton first law is a consequence of the Newton second law.] Note that, for an observer that has a velocity exactly equal to that of the particle, the particle appears to be at rest. Note also that, if the original frame of reference was inertial, the frame of reference of the moving observer is also inertial. In other words, in that very special frame of reference, the Newton first law reduces to the Newton equilibrium law (Eq. 4.1). Vice versa, we can say that the Newton first law is a consequence of the Newton equilibrium law, and of the definition of inertial frame of reference.
11.5 A particle subject to a constant force
In this section, we consider a particle subject to a constant force (or forces that add up to a constant). We address first the problem in general. Then, to connect this to everyday life, we interpret the results in terms of a ball moving along the vertical and subject only to its own weight (Subsection 11.5.1).
Consider a particle subject to a force F that is constant. The equation that governs the motion is still given by ma = F (Eq. 11.11), where F is constant and a = dv/dt = d²x/dt² (Eq. 11.2), namely
\[ a = \frac{dv}{dt} = \frac{d^2 x}{dt^2} = \frac{1}{m} F = \text{constant}. \tag{11.19} \]
Integrating Eq. 11.19 from 0 to t, we obtain
\[ \int_0^t a(t_1) \, dt_1 = v(t) - v(0) = \int_0^t \frac{F}{m} \, dt_1 = \frac{F}{m} \, t, \tag{11.20} \]
or
\[ v(t) = \frac{F}{m} \, t + v_0, \tag{11.21} \]
where, as in the preceding section, to obtain the constant of integration we have used the fact that v(0) = v0, with v0 denoting the initial velocity of the particle. Integrating again from 0 to t, and recalling that the primitive of t is t²/2 (Eq. 10.30), we have
\[ x(t) = \frac{1}{2} \frac{F}{m} \, t^2 + v_0 \, t + x_0, \tag{11.22} \]
where we have used the fact that, for t = 0, we have x(0) = x0, with x0 denoting the initial location of the particle. Equation 11.22 is the desired solution for a particle subject to a constant force. It is the mathematical representation of the fact that the particle moves with constant acceleration. [Note again the presence of two arbitrary constants of integration x0 and v0. The initial position and the initial velocity may be prescribed arbitrarily.]
In the following, we simplify the initial conditions, so as to obtain a solution that is easier to understand at a gut level. Let us place the origin at the starting point so as to have x0 = 0, and assume that the particle is initially at rest, so that v0 = 0. Then, the solution is simply given by
\[ v(t) = \frac{F}{m} \, t \qquad \text{and} \qquad x(t) = \frac{1}{2} \frac{F}{m} \, t^2. \tag{11.23} \]
Sometimes, it is of interest to have the velocity as a function of the space covered. In this case, eliminating t between the above expressions, we have
\[ v(x) = \sqrt{\frac{2 F x}{m}}. \tag{11.24} \]
[More on this in Section 11.9, on energy, where we introduce a more direct way to obtain this result.]
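The formulas of this section translate directly into code. The following Python sketch evaluates Eqs. 11.21 and 11.22 and checks that Eq. 11.24 gives back the same velocity when the particle starts from rest at the origin. The function names and the numerical values of F and m are hypothetical, chosen only for illustration; Eq. 11.24 is used here with F > 0 and x ≥ 0, so that the square root is real.

```python
import math

def velocity(t, force, mass, v0=0.0):
    """Eq. 11.21: v(t) = (F/m) t + v0."""
    return force / mass * t + v0

def location(t, force, mass, v0=0.0, x0=0.0):
    """Eq. 11.22: x(t) = (1/2)(F/m) t^2 + v0 t + x0."""
    return 0.5 * force / mass * t**2 + v0 * t + x0

def velocity_from_location(x, force, mass):
    """Eq. 11.24 (start from rest at the origin): v(x) = sqrt(2 F x / m)."""
    return math.sqrt(2.0 * force * x / mass)

F, M = 6.0, 3.0   # hypothetical values, in newtons and kilograms
for t in (0.5, 1.0, 2.0):
    x = location(t, F, M)
    print(t, x, velocity(t, F, M), velocity_from_location(x, F, M))
```

In each printed row, the last two columns coincide: the velocity obtained from the time and that obtained from the space covered are one and the same.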
11.5.1 A particle subject only to gravity
In order to relate the above results to everyday life, consider the motion of a ball moving along the vertical direction. To make our life simpler, I assume the resistance of the air (known as the aerodynamic drag) to be negligible. [The validity of this assumption depends upon the weight of the particle. For instance, let us compare a ball of lead and a feather. For a ball of lead, the drag is typically negligible, whereas for a feather the drag becomes immediately preponderant. More on this subject in Section 14.6, where we study the effect of the aerodynamic drag.] Then, the Newton second law gives us (use the conventions that x and the forces are both positive if directed upwards, so that the weight is given by −W, with W > 0)
\[ a = \frac{dv}{dt} = \frac{d^2 x}{dt^2} = -\frac{W}{m}. \tag{11.25} \]
Next, let us introduce the following
Definition 207 (Acceleration of gravity). The ratio W/m is known as the acceleration of gravity and it is denoted by the symbol g:
\[ g = \frac{W}{m}. \tag{11.26} \]
What is startling is that experimental evidence shows that, at a given location on the Earth, the value of g is the same for any particle, independently of its material. In other words, the weight W is proportional to the mass m. [More on this in Subsection 20.8.1.] Comparing Eqs. 11.25 and 11.26, we see that the acceleration of gravity is the downwards (i.e., negative) acceleration that a particle has when it is subject solely to its own weight.⁷
⁷ The abbreviation “i.e.” stands for “id est,” Latin for “that is.”
◦ Memories. In order to obtain a clearer appreciation of the fact that the acceleration of gravity is independent of the material composing the particle, let us consider the problem of a body in a vertical tube from which the air has been virtually completely removed, so as to minimize the relevance of the aerodynamic drag. This is referred to as the free fall in a vacuum. The phenomenon was convincingly illustrated by an exhibit, shown at the Museum of Science in Boston, a long time ago, when my daughters were kids and I used to take them there. The setup consisted of two vertical ceiling–to–floor transparent tubes, placed next to each other: the first contained a feather, the second a lead ball, both in a vacuum. A mechanism released the two objects simultaneously, and one could observe that they reached the bottoms of the tubes exactly at the same time. [Recall that, if the length of the tube is h, we have h = ½ g t² in absolute value (Eq. 11.28, with v0 = 0). Thus, for h = 5 m, we have t ≃ √(10/9.806) s ≃ 1 s; it takes approximately one second for any object to fall from the height of 5 meters (in a vacuum, of course).] To the best of my knowledge, the exhibit — unfortunately — is no longer available there. However, you can see equivalent clips on the web (search for “free fall in a vacuum”).
Remark 97. Experimental evidence indicates that, for a given body, the mass m is constant, whereas the weight W is independent of velocity and acceleration, although it is a function of the location. Thus, from the fact that m is independent of the location, but W is not, we deduce that the acceleration of gravity changes with the location. [On the other hand, at a given location on the Earth, the acceleration of gravity g = W/m may be treated as a constant, as we did in the study of Eq. 11.25, provided of course that we assume an adequately small variation of W (which is due to the change of the location, which in turn is due to the motion).]
The standard value for g is
\[ g = 9.806\,65 \ \text{m/s}^2. \tag{11.27} \]
To be precise, it varies from g = 9.823 m/s2 at the poles to g = 9.789 m/s2 at the equator. [All the astronautical data used in this volume are collected on p. 1003.] You might be interested in knowing that the value of the acceleration of gravity on the Moon is g = 1.625 m/s2 , about 16.7% of that on the Earth (p. 1003). Thus, on the Moon the weight of a given body is about one sixth of that on the Earth. If you weigh 150 pounds on the Earth, your weight on the Moon would be a mere 25 pounds, whereas of course your mass remains the same.
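The numbers quoted above (the one–second fall from 5 meters and the weight on the Moon) can be reproduced with the following Python sketch. The function names are hypothetical; the values of g are those given in the text.

```python
import math

G_EARTH = 9.80665   # m/s^2, standard value (Eq. 11.27)
G_MOON = 1.625      # m/s^2, value quoted in the text

def fall_time(height, g):
    """Time to fall from rest through `height` in a vacuum: h = g t^2 / 2."""
    return math.sqrt(2.0 * height / g)

def weight_elsewhere(weight_on_earth, g_local):
    """Weight where gravity is g_local; the mass W/g is the same everywhere."""
    return weight_on_earth * g_local / G_EARTH

print(f"fall time from 5 m on the Earth: {fall_time(5.0, G_EARTH):.2f} s")
print(f"150 lb on the Earth -> {weight_elsewhere(150.0, G_MOON):.1f} lb on the Moon")
```

Note that the mass never appears in `fall_time`: the feather and the lead ball of the exhibit take the same time, as long as there is no air.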
• Solution
Equation 11.21 gives v(t) = −g t + v0, whereas, placing the origin at the starting point, Eq. 11.22 yields
\[ x(t) = -\frac{1}{2} \, g \, t^2 + v_0 \, t. \tag{11.28} \]
If v0 is positive (directed upwards), the ball goes initially upwards, reaches a maximum and then starts to fall. The resulting graph of the function x = x(t) is of the type shown in Fig. 11.3. This corresponds to a vertical parabola (Eq. 6.32), with downwards concavity.
Fig. 11.3 x(t)/xMax vs t/tM
◦ Maximum height. With the know–how developed in Chapter 9, we are able to determine the maximum height reached by the ball. Indeed, denoting with tM the instant of time when the ball reaches the maximum height, we have that the graph of x vs t has a zero slope at t = tM. For a mathematician, the derivative of x(t) at t = tM vanishes, that is, $\left. dx/dt \right|_{t = t_M} = 0$. For a physicist, the velocity at t = tM changes sign, that is, v(tM) = 0. Of course, the two points of view are equivalent, because v = dx/dt. Imposing v(tM) = −g tM + v0 = 0, we have tM = v0/g. Substituting into Eq. 11.28, we obtain that the maximum height reached by the ball is given by
\[ x_{\mathrm{Max}} = \frac{1}{2 g} \, v_0^2. \tag{11.29} \]
◦ Returning to the original location. For t = t0 := 2tM, the two terms on the right side of Eq. 11.28 cancel out, and x attains again the value x = 0, namely the ball is returned to its original location. It takes the ball the same amount of time to go up to the top height, as it takes to come back down from there to the original location. [This is apparent from the fact that the graph of a vertical parabola is symmetric with respect to its axis.]
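The results on the maximum height and on the return to the original location can be checked with a short Python sketch. The launch speed below is a hypothetical value, and the function names are my own.

```python
G = 9.80665  # m/s^2, standard value (Eq. 11.27)

def height(t, v0, g=G):
    """Eq. 11.28: x(t) = -g t^2 / 2 + v0 t, with the origin at the starting point."""
    return -0.5 * g * t**2 + v0 * t

def time_of_max_height(v0, g=G):
    """t_M, obtained by imposing v(t_M) = -g t_M + v0 = 0."""
    return v0 / g

def max_height(v0, g=G):
    """Eq. 11.29: x_Max = v0^2 / (2 g)."""
    return v0**2 / (2.0 * g)

v0 = 12.0                       # m/s, hypothetical launch speed (upwards)
t_m = time_of_max_height(v0)
print(f"t_M = {t_m:.3f} s, x_Max = {max_height(v0):.3f} m")
print(f"x(t_M) = {height(t_m, v0):.3f} m")
print(f"|x(2 t_M)| = {abs(height(2.0 * t_m, v0)):.3f} m")
```

The height at t = tM coincides with xMax, and the height at t = 2 tM vanishes (up to rounding errors), namely the ball is back where it started.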
11.6 Undamped harmonic oscillators. Free oscillations
Consider the setup shown in Fig. 11.4: a particle connected to an anchored spring. We will refer to this setup as an undamped harmonic oscillator. [Why? See Subsection 11.6.1.] In this section, we assume that the only force acting on the particle is that due to the spring (free–oscillation problem). You may think of the setup to be in a weightless environment, such as the International Space Station.
Fig. 11.4 Free oscillations of an undamped harmonic oscillator
◦ Warning. In this book, the springs are always considered “massless.” In other words, we assume that the mass of the spring is negligible when compared to the other masses under consideration.
Let x = 0 denote the location where the particle is when the spring is in the unstretched condition. Therefore, at x = 0 the particle is subject to no force, and hence if v0 = 0 it remains at rest (equilibrium location). Assume the same sign convention for FS and x, for instance, to be positive towards the right. Also, we assume that the spring is linear, namely that (Eq. 4.3) FS = −k x,
(11.30)
where k > 0 (Eq. 4.4). Thus, the Newton second law (Eq. 11.11) gives us m d²x/dt² = FS = −k x, or
$$\frac{d^2 x}{dt^2} + \frac{k}{m}\, x = 0. \tag{11.31}$$
◦ Comment. Recall that an equation is called differential iff the corresponding expression contains the unknown function and at least one of its derivatives (no integrals), and that the order of the equation equals the order of the highest derivative (Remark 82, p. 406). In addition, note that the operator in Eq. 11.31, namely the operator
$$L\,\ldots = \frac{d^2 \ldots}{dt^2} + \frac{k}{m}\, \ldots, \tag{11.32}$$
is linear. Thus, Eq. 11.31 is a linear homogeneous second–order differential equation. Remark 98 (Initial conditions again). Akin to what we have done in all the examples presented thus far, to complete the problem, we need to add two initial conditions (Remark 96, p. 459). Here, we address the issue at a bit greater depth. We can begin by noting that experimental evidence tells us that the solution changes if we change the initial location or the initial velocity. To convince yourself of this, think of a mass–spring system at rest in the equilibrium location, x = 0. You may note that the mass will continue to remain in the equilibrium location. Next, change the initial location by moving it away from x = 0 and then release it. You will observe that the mass will oscillate around x = 0. Alternatively, leave the mass in the equilibrium location, but give it an initial velocity, such as an initial push (e.g., hit the mass in Fig. 11.4 with a cue stick, or kick it, as in a soccer penalty kick). Again, you will observe that the mass will oscillate around x = 0. Thus, the initial location and the initial velocity affect the solution. If these are not known the solution cannot be fully determined. Hence, Eq. 11.31 must be “completed” by adding the following initial conditions, namely x(0) = x0
and $\dot{x}(0) = v_0$, (11.33)
Part II. Calculus and dynamics of a particle in one dimension
with x0 and v0 available for us to impose as we wish. As mentioned above, here the initial conditions are introduced on the basis of experimental evidence. [However, as it will be shown in Vol. II, when dealing with the initial–value problem in ordinary differential equations, this issue may be addressed from a mathematical point of view. This yields that (under very broad conditions) the solution to an n-th order differential equation, with conditions on the values of the function and its first n − 1 derivatives, is unique. This implies that, if we find a solution that satisfies such a problem, this is the solution to our problem. These considerations apply to all the problems of this kind that we will consider.]
• Solution of differential equations

Note that
$$\frac{d^2}{dt^2}\left(A \sin\omega t + B \cos\omega t\right) = -\omega^2 \left(A \sin\omega t + B \cos\omega t\right). \tag{11.34}$$
Hence,
$$x(t) = A \cos\omega t + B \sin\omega t \tag{11.35}$$
is a solution to our problem, in the sense that it satisfies the Newton second law, as expressed by Eq. 11.31, provided that
$$\omega = \sqrt{\frac{k}{m}} > 0, \tag{11.36}$$
where ω (which is real, because m > 0 and k > 0) is called the natural frequency of oscillation of the mass–spring system. Correspondingly, the velocity v(t) = dx/dt is given by
$$v(t) = \omega \left(-A \sin\omega t + B \cos\omega t\right). \tag{11.37}$$
◦ Comment. Recall that the trigonometric functions are periodic with period 2π (Eq. 6.66). Hence, when t is increased by an amount
$$T = \frac{2\pi}{\omega}, \tag{11.38}$$
the solution takes the same value that it had at time t. [The above relationship between period of oscillation and frequency of oscillation in Eq. 11.38 is quite important and will be encountered several times in this book.] Accordingly, the solution in Eq. 11.35 is periodic in time, with period
$$T = 2\pi \sqrt{\frac{m}{k}}. \tag{11.39}$$
• Imposing the initial conditions

As in the preceding sections, in Eq. 11.35 we have two arbitrary constants, A and B. These may be chosen so as to satisfy the initial conditions (Eq. 11.33), where x0 and v0 are available for us to choose as we wish (Remark 98, p. 465). Specifically, note that, setting t = 0 in Eqs. 11.35 and 11.37, we have x(0) = A and v(0) = ωB. Thus, the conditions in Eq. 11.33 yield A = x0 and B = v0/ω. Accordingly, we have that
$$x(t) = x_0 \cos\omega t + \frac{v_0}{\omega} \sin\omega t \tag{11.40}$$
satisfies the Newton second law, as well as the initial conditions in Eq. 11.33. In particular, we see that, if x0 = 0 and v0 = 0 (namely if the particle is initially at rest, at the location x = 0), then x(t) = 0 at all times: the particle does not move! We will refer to this as the equilibrium location. If the particle is released with zero velocity at time t = 0 (namely v(0) = v0 = 0) from the location x0 ≠ 0, we have x(t) = x0 cos ωt. On the other hand, if at t = 0 the particle is in the equilibrium position (namely x(0) = x0 = 0) and is hit impulsively, so as to provide an initial velocity v(0) = v0 ≠ 0, we have x(t) = (v0/ω) sin ωt. These two types of motion are shown in Figs. 11.5 and 11.6, with x0 = 1 and v0/ω = 1, respectively (with ω = 1). The general motion described by Eq. 11.40 is a linear combination of these two types of motion.
Fig. 11.5 Initial position response
Fig. 11.6 Initial velocity response
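As a numerical companion to Eqs. 11.39 and 11.40, here is a minimal sketch that evaluates the free response and its period. The function names and the sample values (k = 4, m = 1, x0 = 0.3, v0 = −1.2) are hypothetical choices made only for illustration.

```python
import math


def free_response(t, x0, v0, k, m):
    """Position and velocity of the undamped oscillator (Eq. 11.40 and its derivative)."""
    omega = math.sqrt(k / m)  # natural frequency, Eq. 11.36
    x = x0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)
    v = -x0 * omega * math.sin(omega * t) + v0 * math.cos(omega * t)
    return x, v


def period(k, m):
    """Period of oscillation, Eq. 11.39."""
    return 2.0 * math.pi * math.sqrt(m / k)


T = period(k=4.0, m=1.0)
for t in (0.0, T / 4, T / 2, T):
    x, v = free_response(t, x0=0.3, v0=-1.2, k=4.0, m=1.0)
    print(f"t = {t:.4f}: x = {x:+.4f}, v = {v:+.4f}")
```

At t = 0 the function returns the prescribed x0 and v0, and after one period T the state repeats, as expected from Eq. 11.38.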
• Equivalent representations for the solution

We have seen that the solution to the problem under consideration may be expressed as x(t) = A cos ωt + B sin ωt (Eq. 11.35). Similarly, one may verify that
$$x(t) = C \sin(\omega t + \chi_S) \tag{11.41}$$
and
$$x(t) = D \cos(\omega t + \chi_C) \tag{11.42}$$
are also solutions to our problem. Here, we show that the expressions in Eqs. 11.35, 11.41 and 11.42 are equivalent, thereby clarifying the relationship between the three expressions. Let us begin with the equivalence of Eqs. 11.35 and 11.41. To this end, let us combine Eq. 11.41 with sin(x + y) = sin x cos y + cos x sin y (Eq. 7.47), to obtain C sin(ωt + χS) = C (cos ωt sin χS + sin ωt cos χS), namely
$$C \sin(\omega t + \chi_S) = A \cos\omega t + B \sin\omega t, \tag{11.43}$$
where
$$A = C \sin\chi_S \quad\text{and}\quad B = C \cos\chi_S, \tag{11.44}$$
so that
$$C = \sqrt{A^2 + B^2} \quad\text{and}\quad \chi_S = \tan^{-1}(A/B). \tag{11.45}$$
[Recall Remark 55, p. 270, on the ambiguity of χS.] This shows the equivalence of Eqs. 11.35 and 11.41. Similarly, to show the equivalence of Eqs. 11.35 and 11.42, we use cos(x + y) = cos x cos y − sin x sin y (Eq. 7.48). Combining this with Eq. 11.42, one obtains D cos(ωt + χC) = D (cos ωt cos χC − sin ωt sin χC), namely
$$D \cos(\omega t + \chi_C) = A \cos\omega t + B \sin\omega t, \tag{11.46}$$
where A = D cos χC and B = −D sin χC. These imply that
$$D = \sqrt{A^2 + B^2} \quad\text{and}\quad \chi_C = -\tan^{-1}(B/A). \tag{11.47}$$
[Remark 55, p. 270, regarding the value of χC applies here as well.] This shows the equivalence of Eqs. 11.35 and 11.42. In conclusion, the three expressions given in Eqs. 11.35, 11.41 and 11.42 are fully equivalent.
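The conversion between the three representations can be sketched in a few lines. The sketch below uses the two-argument arctangent `math.atan2`, which is one common way to settle the quadrant ambiguity of the inverse tangent; this is a choice made here and not necessarily the convention of Remark 55. The function name and the test values are hypothetical.

```python
import math


def to_amplitude_phase(A, B):
    """Convert x = A cos(wt) + B sin(wt) into amplitude and phase shifts.

    Returns (C, chi_S, chi_C) such that
        x = C sin(wt + chi_S) = C cos(wt + chi_C).
    """
    C = math.hypot(A, B)         # amplitude, sqrt(A^2 + B^2); here D = C
    chi_S = math.atan2(A, B)     # A = C sin(chi_S), B = C cos(chi_S)
    chi_C = math.atan2(-B, A)    # A = C cos(chi_C), B = -C sin(chi_C)
    return C, chi_S, chi_C


A, B, omega = -1.0, 2.0, 3.0
C, chi_S, chi_C = to_amplitude_phase(A, B)
for t in (0.0, 0.4, 1.7):
    direct = A * math.cos(omega * t) + B * math.sin(omega * t)
    as_sine = C * math.sin(omega * t + chi_S)
    as_cosine = C * math.cos(omega * t + chi_C)
    print(f"{direct:+.6f} {as_sine:+.6f} {as_cosine:+.6f}")
```

The three printed columns coincide for every t, which is the equivalence stated above.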
11.6.1 Harmonic motions and sinusoidal functions

The type of motion described by these equations (see also Eq. 11.35) is very important in mechanics and is referred to as harmonic motion.

Remark 99 (A word of caution). The term harmonic is used in mathematics and mechanics with several different meanings. For instance, by harmonic force, we mean a force of the type F(t) = A sin(Ωt + χS). The term harmonic input is also used if f(t) is referred to the right side of a differential equation, whereas the term harmonic forcing function is used in dynamics, when f(t) is referred to a force. An unrelated use of the term harmonic is in connection with 1 + 1/2 + 1/3 + 1/4 + . . . , the so–called harmonic series. However, a function u(x, y, z) is called harmonic iff it satisfies the so–called Laplace equation, namely
$$\nabla^2 u := \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0. \tag{11.48}$$
Thus, we do not have a name for functions of the type f(t) = A sin(Ωt + χ), as we cannot call these harmonic functions. To avoid any possible confusion, we introduce the following

Definition 208 (Sinusoidal function. Phase shift). Given the equivalence of the three expressions in Eqs. 11.41, 11.42 and 11.35, the corresponding functions are all referred to as sinusoidal functions: $C = \sqrt{A^2 + B^2}$ is called the amplitude of the oscillation, whereas χS in Eq. 11.43 is called the phase shift (and so is χC in Eq. 11.46).

◦ Comment. Now we can answer the question: “Why the term undamped harmonic oscillator?” It is called harmonic oscillator because, if the initial conditions are not homogeneous and no other force is applied, the mass oscillates harmonically (constant amplitude sinusoidal oscillations). [This implies that we have assumed the restoring force to be linear.] It is called undamped in order to distinguish it from a damped harmonic oscillator, which is the combination of a harmonic oscillator and a damper, namely a mechanism that produces a force opposite to the velocity, that damps the motion. [As we will see in Chapter 14, the resulting motion consists of free oscillations of decreasing amplitude.]
11.7 Undamped harmonic oscillators. Periodic forcing

In this section, we consider the forced oscillation of an undamped harmonic oscillator (a mass–spring system), due to a periodic force. Specifically, in Subsection 11.7.1 we address harmonic forcing, and present the solution along with its properties, for the limited case in which the forcing frequency Ω is not equal to the natural frequency ω of the system. The analysis is completed in Subsection 11.7.2, where we address the phenomenon of resonance, which occurs when Ω = ω. Next, in Subsection 11.7.3, we study what happens if the forcing frequency is not equal but very close to the resonance frequency. Then, in Subsection 11.7.4, we extend the results to a force that may be written as a linear combination of harmonic ones having a period multiple of the basic one (the corresponding force is periodic, but not harmonic). Finally, in Subsection 11.7.5, I’ll share with you another personal anecdote regarding my bad driving habits in my youth, which will provide you with a fairly detailed analysis of the response of the car wheels to a bumpy road, thereby connecting the material of this section to automobile mechanics. This way you can appreciate how, with some modeling and approximations, we may go from theory to practice.
11.7.1 Harmonic forcing with Ω ≠ ω

Here, we consider the forced oscillation of an undamped harmonic oscillator subject to an external harmonically–varying force F(t) (Fig. 11.7). To avoid cumbersome expressions, we consider the limited case of
$$F(t) = \breve{F} \cos\Omega t. \tag{11.49}$$

Fig. 11.7 Forced oscillations of undamped harmonic oscillators
◦ Comment. The more general case with $F(t) = \breve{F} \cos(\Omega t + \chi)$ is virtually identical: simply replace Ωt with Ωt + χ in all the pertinent equations of this section.

The force in the above equation must be added to the force due to the spring, FS = −k x. Thus, generalizing Eq. 11.31 for free oscillations, the Newton second law (Eq. 11.11) gives us
$$\frac{d^2 x}{dt^2} + \omega^2 x = \frac{\breve{F}}{m} \cos\Omega t, \tag{11.50}$$
where
$$\omega = \sqrt{k/m}. \tag{11.51}$$
Again, the problem must be completed by stating the initial location and the initial velocity, as in Eq. 11.33. In this subsection, we limit ourselves to the case
$$\Omega \neq \omega. \tag{11.52}$$
[The case Ω = ω is addressed in the following one.] I claim that, in such a case, a particular solution to Eq. 11.50 is given by
$$x_P(t) = \frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t. \tag{11.53}$$
Indeed, we have
$$\frac{d^2 x_P}{dt^2} + \omega^2 x_P = \frac{d^2}{dt^2}\left[\frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t\right] + \omega^2\, \frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t = \left(-\Omega^2 + \omega^2\right) \frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t = \frac{\breve{F}}{m} \cos\Omega t, \tag{11.54}$$
in agreement with Eq. 11.53.
• Initial conditions Recall that the operator d2 /dt2 + k/m (Eq. 11.32) is linear. Specifically, it’s a particular case of the linear differential operator in Eq. 9.162. Thus, we can exploit the superposition theorem for linear nonhomogeneous equations (Theorem 114, p. 407). Accordingly, we can add to the particular solution xP(t) (Eq. 11.53) the solution of the free–oscillation problem (Eq. 11.35), since doing this does not alter the contribution to the right side of Eq. 11.50.
Hence,
$$x(t) = A \cos\omega t + B \sin\omega t + \frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t \tag{11.55}$$
is the general solution to Eq. 11.50, as you may verify (Remark 83, p. 408). Again, we have two arbitrary constants, A and B, which allow us to impose the initial conditions, as given in Eq. 11.33. This yields (set t = 0 in Eq. 11.55 and in its derivative)
$$x(0) = A + \frac{\breve{F}/m}{\omega^2 - \Omega^2} = x_0, \qquad \dot{x}(0) = B\,\omega = v_0. \tag{11.56}$$
Finally, solving for A and B and substituting into Eq. 11.55 yields
$$x(t) = x_0 \cos\omega t + \frac{v_0}{\omega} \sin\omega t + \frac{\breve{F}}{m}\, \frac{\cos\Omega t - \cos\omega t}{\omega^2 - \Omega^2}. \tag{11.57}$$
Remark 100. Note that the three terms on the right side of the equation correspond to the effects of the three non–homogeneities, namely: (i) x0, which appears on the right side of the initial condition for x; (ii) v0, which appears on the right side of the initial condition for $\dot{x}$; and (iii) $(\breve{F}/m) \cos\Omega t$, which appears on the right side of the differential equation. Indeed, because of linearity, we can solve independently the following three problems:
$$\begin{aligned}
\ddot{x} + \omega^2 x &= 0, & &\text{with } x(0) = x_0 \text{ and } \dot{x}(0) = 0,\\
\ddot{x} + \omega^2 x &= 0, & &\text{with } x(0) = 0 \text{ and } \dot{x}(0) = v_0,\\
\ddot{x} + \omega^2 x &= (\breve{F}/m) \cos\Omega t, & &\text{with } x(0) = 0 \text{ and } \dot{x}(0) = 0.
\end{aligned} \tag{11.58}$$
They yield three solutions, which correspond to the three terms in Eq. 11.57, as you may verify. Using the superposition principle (valid for any linear problem, Subsection 9.8.2), these may be added to obtain Eq. 11.57.
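One way to "verify" Eq. 11.57, as suggested above, is to check numerically that it satisfies both the differential equation and the initial conditions. The following is a minimal sketch; the function names, the sample parameter values, and the use of a central finite difference for the second derivative are all choices made here for illustration.

```python
import math


def forced_response(t, x0, v0, omega, Omega, F_over_m):
    """Eq. 11.57: response to F = F_breve cos(Omega t), valid only for Omega != omega."""
    if math.isclose(Omega, omega):
        raise ValueError("Omega equals omega: use the resonance solution instead")
    free = x0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)
    forced = F_over_m * (math.cos(Omega * t) - math.cos(omega * t)) / (omega**2 - Omega**2)
    return free + forced


def ode_residual(t, x0, v0, omega, Omega, F_over_m, h=1e-4):
    """Residual of Eq. 11.50, with the second derivative approximated by central differences."""
    def x(s):
        return forced_response(s, x0, v0, omega, Omega, F_over_m)

    x_ddot = (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2
    return x_ddot + omega**2 * x(t) - F_over_m * math.cos(Omega * t)


params = dict(x0=1.0, v0=0.5, omega=2.0, Omega=3.0, F_over_m=1.5)
print("x(0)     =", forced_response(0.0, **params))
print("residual =", ode_residual(0.7, **params))
```

The residual is not exactly zero because of the finite-difference approximation, but it should be several orders of magnitude smaller than the individual terms of the equation.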
• Amplification factor

Let us take a deeper look at the term
$$x_P(t) = \frac{\breve{F}/m}{\omega^2 - \Omega^2} \cos\Omega t. \tag{11.59}$$
The particular solution xP(t) equals the force $F(t) = \breve{F} \cos\Omega t$ divided by $m(\omega^2 - \Omega^2)$.
◦ Comment. The factor $\breve{F}/[m(\omega^2 - \Omega^2)]$ is positive for Ω < ω. This is expressed by saying that, for Ω < ω, the output xP(t) is “in phase” with the input $\breve{F} \cos\Omega t$ (namely they reach their maximum values simultaneously). On the contrary, the factor is negative for Ω > ω. This is expressed by saying that, for Ω > ω, the output xP(t) is “out of phase” with the input $\breve{F} \cos\Omega t$ (namely one reaches its maximum when the other reaches its minimum).

Note that for Ω = 0, Eq. 11.59 yields $x_P = \breve{F}/(m\omega^2) = \breve{F}/k$. This corresponds to the (constant) static displacement $x_{P0} := \breve{F}/k$, due to a constant force (in which case $\ddot{x} = 0$). Consider the ratio
$$\frac{x_P(t)}{x_{P0}} = \frac{1}{1 - (\Omega/\omega)^2} \cos\Omega t. \tag{11.60}$$
The coefficient
$$\Xi(\sigma) = \frac{1}{\left|1 - \sigma^2\right|} \qquad \left(\sigma := \Omega/\omega\right), \tag{11.61}$$
namely the ratio between the amplitude of the oscillation and the static displacement is known as the amplification factor (or as the frequency response) for the specific case under consideration, and is depicted in Fig. 11.8 as a function of σ.
Fig. 11.8 Ξ vs σ = Ω/ω
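The behavior shown in Fig. 11.8 can be tabulated with a short sketch. Two functions are given: the signed coefficient of Eq. 11.60, whose sign distinguishes the in-phase and out-of-phase regimes, and its absolute value, which is the amplitude ratio. Taking the absolute value for Ξ is an assumption made here, chosen because an amplitude is non-negative; the function names and the sample values of σ are hypothetical.

```python
def signed_ratio(sigma):
    """Coefficient of cos(Omega t) in Eq. 11.60, with sigma = Omega / omega."""
    if sigma == 1.0:
        raise ValueError("sigma = 1 is the resonance case; the ratio is unbounded")
    return 1.0 / (1.0 - sigma**2)


def amplification_factor(sigma):
    """Amplitude of oscillation divided by the static displacement (assumed |.| of Eq. 11.60)."""
    return abs(signed_ratio(sigma))


for sigma in (0.0, 0.5, 0.9, 1.1, 1.5, 3.0):
    phase = "in phase" if signed_ratio(sigma) > 0 else "out of phase"
    print(f"sigma = {sigma:3.1f}: Xi = {amplification_factor(sigma):7.3f} ({phase})")
```

The table shows Ξ = 1 in the static limit σ = 0, rapid growth as σ approaches 1 from either side, and decay toward zero for large σ.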
Remark 101 (Relevance of Ξ(σ)). The term corresponding to the amplification factor Ξ(σ) in Eq. 11.61 (namely the term in Eq. 11.59) dominates the solution (Eq. 11.55), when the frequency of excitation Ω approaches the natural frequency ω. This fact, which is important in the experimental techniques used to determine the natural modes of oscillation, will become apparent in Chapter 14. Therefore, to support this statement, I have to anticipate some of the results of such a chapter that are relevant here. In any practical application, we always have some energy dissipation (damping), whose effect is that the terms corresponding to the solution to the homogeneous equation (that is, A cos ωt + B sin ωt) are replaced by an expression that tends to zero as t tends to infinity, possibly ever so slowly, as we will see in Eq. 14.51. Thus, we are left only with the term in Eq. 11.53. [In order to avoid giving the wrong impression regarding physical reality, I must add that the situation is even more subtle than what I just said, because the term xP(t) is itself modified by the presence of damping. Specifically, we will see that, contrary to what appears in Eq. 11.59, in the presence of damping the amplitude of oscillation is always bounded, namely Ξ(Ω) < ∞ (Eq. 14.55). For very small damping, the modification is significant only in the neighborhood of Ω = ω.]
11.7.2 Harmonic forcing with Ω = ω. Resonance

In the preceding analysis, we have assumed Ω ≠ ω (Eq. 11.52). Here, we consider the case Ω = ω. Note that when the frequency of excitation Ω approaches the natural frequency ω (namely σ = Ω/ω approaches one), the amplification factor tends to infinity (Eq. 11.61). Thus, the solution we have found is not valid for Ω = ω and you might wonder what happens in this case. Assume, for the time being, the initial conditions to be homogeneous:
$$x_R(0) = x_0 = 0 \quad\text{and}\quad \dot{x}_R(0) = v_0 = 0. \tag{11.62}$$
I claim that for Ω = ω the solution is
$$x_R(t) = \frac{\breve{F}}{2 m \omega}\, t \sin\omega t. \tag{11.63}$$
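◦ Comment. Let us verify that Eq. 11.63 is indeed the solution. We have
$$\frac{dx_R}{dt} = \frac{\breve{F}}{2m}\, t \cos\omega t + \frac{\breve{F}}{2 m \omega} \sin\omega t. \tag{11.64}$$
[Use the product rule for derivatives (Eq. 9.18).] Also
$$\frac{d^2 x_R}{dt^2} = -\omega\, \frac{\breve{F}}{2m}\, t \sin\omega t + 2\, \frac{\breve{F}}{2m} \cos\omega t = -\omega^2 x_R + \frac{\breve{F}}{m} \cos\omega t, \tag{11.65}$$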
in agreement with Eq. 11.50. The homogeneous initial conditions, xR(0) = x˙ R(0) = 0 (Eq. 11.62), are also satisfied, as you may verify. [Hint: Use Eqs. 11.63 and 11.64.] Hence, Eq. 11.63 gives indeed the solution when Ω = ω.
The function xR(t) (Eq. 11.63) is plotted in Fig. 11.9, for F˘ /(2 m ω) = 1 and ω = 2π, so that the period is T = 1 (Eq. 11.38).
Fig. 11.9 Resonance solution: xR (t) vs t
Note that, in the present case (Ω = ω), the particular solution xR(t) grows without bound as t tends to infinity. The corresponding phenomenon is known as resonance. Let us introduce the following

Definition 209 (Envelope). Given the function f(t) = Q(t) cos(ωt + χ), where Q(t) is an arbitrary continuous positive function, the term envelope refers to the graph of the function Q(t).

Accordingly, for the case under consideration, we can say that the envelope grows linearly with time. [See Eq. 11.63, where we have $Q(t) = \breve{F}\, t/(2 m \omega)$.]

◦ Comment. Again, as we did for Eq. 11.55, we can add the free–oscillation solution and obtain
$$x(t) = A \cos\omega t + B \sin\omega t + \frac{\breve{F}}{2 m \omega}\, t \sin\omega t. \tag{11.66}$$
Imposing the initial conditions, and noting that $x_R(0) = \dot{x}_R(0) = 0$, we obtain
$$x(t) = x_0 \cos\omega t + \frac{v_0}{\omega} \sin\omega t + \frac{\breve{F}}{2 m \omega}\, t \sin\omega t. \tag{11.67}$$
◦ Comment. Note that the solution in Eq. 11.67 coincides with the limit, as Ω tends to ω, of the expression in Eq. 11.57. Indeed, using the l’Hôpital rule (Eq. 9.85), we have
$$\lim_{\Omega \to \omega} \frac{\cos\Omega t - \cos\omega t}{\omega^2 - \Omega^2} = \left.\frac{d(\cos\Omega t)/d\Omega}{d(-\Omega^2)/d\Omega}\right|_{\Omega = \omega} = \frac{1}{2\omega}\, t \sin\omega t, \tag{11.68}$$
in agreement with Eq. 11.63.
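The limit in Eq. 11.68 can also be observed numerically by evaluating the non-resonant term for forcing frequencies progressively closer to ω. The sketch below is only an illustration; the function names, the value ω = 2π (as in Fig. 11.9), and the sample time t = 0.3 are choices made here.

```python
import math


def detuned_term(t, omega, Omega):
    """Left side of Eq. 11.68 before taking the limit (requires Omega != omega)."""
    return (math.cos(Omega * t) - math.cos(omega * t)) / (omega**2 - Omega**2)


def resonant_term(t, omega):
    """Right side of Eq. 11.68."""
    return t * math.sin(omega * t) / (2.0 * omega)


omega, t = 2.0 * math.pi, 0.3
for rel_detuning in (1e-1, 1e-2, 1e-4, 1e-6):
    Omega = omega * (1.0 - rel_detuning)
    gap = detuned_term(t, omega, Omega) - resonant_term(t, omega)
    print(f"Omega/omega = {1.0 - rel_detuning:.6f}: difference = {gap:+.3e}")
```

The printed difference shrinks as Ω approaches ω, in line with the limit above. For extremely small detuning the quotient suffers from floating-point cancellation, so the sketch stops at a relative detuning of 1e-6.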
11.7.3 Near resonance and beats (Ω ≈ ω) ♥
The fact that resonance was obtained as the limit of the non–resonance case indicates that there is some sort of continuity in going from one case to the other. But how can that be possible? How can we go from a solution that remains bounded to one that grows without limits? To address these questions, let us explore the case Ω ≈ ω, which I will refer to as near resonance. This will help us in clarifying the issue.

To do this, let us consider again the term $(\cos\Omega t - \cos\omega t)/(\omega^2 - \Omega^2)$ that we studied in Eq. 11.68. This time, however, we do not consider the limit as ω tends to Ω; we limit ourselves to studying the solution when Ω ≈ ω. To this end, recall that $\cos x - \cos y = 2 \sin\tfrac{1}{2}(y - x)\, \sin\tfrac{1}{2}(x + y)$ (Eq. 7.69). Accordingly, setting x = Ωt and y = ωt, we have
$$\cos\Omega t - \cos\omega t = 2 \sin\left(\frac{\omega - \Omega}{2}\, t\right) \sin\left(\frac{\omega + \Omega}{2}\, t\right). \tag{11.69}$$
This yields, for the homogeneous–initial–condition solution of the near–resonance problem,
$$x_{NR}(t) := \frac{\breve{F}}{m}\, \frac{\cos\Omega t - \cos\omega t}{\omega^2 - \Omega^2} = \frac{\breve{F}/m}{\omega + \Omega}\, \breve{Q}(t) \sin\left(\frac{\omega + \Omega}{2}\, t\right), \tag{11.70}$$
where
$$\breve{Q}(t) := \frac{2}{\omega - \Omega} \sin\left(\frac{\omega - \Omega}{2}\, t\right). \tag{11.71}$$
The function xNR(t) (Eq. 11.70) is depicted in Fig. 11.10, which is obtained for $\breve{F}/[m(\omega + \Omega)] = 1$, (ω + Ω)/2 = 2π, and (ω − Ω)/2 = 0.7 π. The resonance solution is also shown for comparison.

Fig. 11.10 Near–resonance solution xNR(t), compared to xR(t)

As you can see from Eq. 11.70, the solution may be described as a sinusoidal function, with an envelope $Q(t) = \breve{F}\, \breve{Q}(t)/[m(\omega + \Omega)]$. Specifically, if Ω ≈ ω, the envelope Q(t) varies also in a sinusoidal way, but with a very low frequency.

Of course, as Ω tends to ω, $\breve{Q}(t)$ tends to t. [Use $\lim_{x \to 0} [(\sin x)/x] = 1$, Eq. 8.62.] This yields the resonance solution (Eq. 11.68), when Ω = ω. On the other hand, when Ω is close to ω, the two solutions (resonance and near resonance) appear very close, albeit only for $\tfrac{1}{2}(\omega - \Omega)\, t \ll \pi/2$.

• Beats

The phenomenon is closely related to that of beats. This may be best explained at a gut level as follows. Consider two equal–amplitude sinusoidal signals that have a frequency very close to each other. Assume the two signals to be initially in phase (namely to reach the peaks simultaneously). In this case, the signals add up, namely the signals reinforce each other (constructive interference). After a while, because of the different frequencies, the two signals become out of phase (namely when one reaches the positive peak, the other reaches the negative peak). Then, the signals subtract, namely they offset each other (destructive interference).

Remark 102. It should be emphasized that this behavior is independent of the fact that one function is the particular solution to the nonhomogeneous equation and the other is a solution to the homogeneous equation. The phenomenon occurs whenever we have two signals that have almost the same frequency, and is known as beats. [More on this topic in Subsection 21.2.1, on two weakly–coupled harmonic oscillators. See in particular Eq. 21.23.]

Remark 103 (Beats in music). The phenomenon of beats is very well known in music, where it is referred to as beat tones, or simply as beats. Indeed, our ears perceive the corresponding sound as a note of frequency $\tfrac{1}{2}(\omega + \Omega)$, with an intensity that varies like $\sin\left[\tfrac{1}{2}(\omega - \Omega)\, t\right]$. [You might have used this phenomenon to tune your guitar. Tibetan bells are based on the same idea. What you hear is a note with loudness changing sinusoidally. Indeed, the loudness of the sound is modulated.]
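The factorization behind the beat phenomenon (Eqs. 11.69 to 11.71) can be checked with a short sketch that compares the "difference of cosines" form of the near-resonance solution with its "slow envelope times fast carrier" form. The function names and the two sample frequencies, which differ by about ten percent, are hypothetical choices made for illustration.

```python
import math


def near_resonance_direct(t, omega, Omega, F_over_m):
    """First form in Eq. 11.70: difference of two cosines."""
    return F_over_m * (math.cos(Omega * t) - math.cos(omega * t)) / (omega**2 - Omega**2)


def envelope(t, omega, Omega):
    """Slowly varying factor Q_breve(t) of Eq. 11.71."""
    return 2.0 / (omega - Omega) * math.sin(0.5 * (omega - Omega) * t)


def near_resonance_beats(t, omega, Omega, F_over_m):
    """Second form in Eq. 11.70: envelope times a carrier at the mean frequency."""
    carrier = math.sin(0.5 * (omega + Omega) * t)
    return F_over_m / (omega + Omega) * envelope(t, omega, Omega) * carrier


omega, Omega = 2.0 * math.pi * 1.05, 2.0 * math.pi * 0.95
for t in (0.25, 1.0, 2.5, 5.0):
    a = near_resonance_direct(t, omega, Omega, 1.0)
    b = near_resonance_beats(t, omega, Omega, 1.0)
    print(f"t = {t:4.2f}: direct = {a:+.6f}, beats form = {b:+.6f}")
```

The two columns agree to rounding error. For small t the envelope is close to t itself, which is the linear growth of the resonance solution.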
11.7.4 Periodic, but not harmonic, forcing ♣
You might object to the limitation that the forcing function is purely sinusoidal, because it is quite unlikely that we encounter purely harmonic signals. To avoid giving you the wrong impression, here we show that the results remain conceptually valid whenever the forcing function is of the type
$$F(t) = \sum_{h=1}^{n} \left(\breve{P}_h \cos h\Omega t + \breve{Q}_h \sin h\Omega t\right), \tag{11.72}$$
with n arbitrarily large. The above function is still periodic (albeit not harmonic), with period T = 2π/Ω, as you may verify. Correspondingly, the differential equation is
$$m\, \frac{d^2 x}{dt^2} + k\, x = \sum_{h=1}^{n} \left(\breve{P}_h \cos h\Omega t + \breve{Q}_h \sin h\Omega t\right). \tag{11.73}$$
Recall that the operator in Eq. 11.32 is linear (see again Eq. 9.162). Hence, we can use Theorem 113, p. 407, on the superposition of forcing functions, and obtain that a particular solution of Eq. 11.73 is
$$x_P(t) = \sum_{h=1}^{n} \frac{1}{m\left(\omega^2 - h^2 \Omega^2\right)} \left(\breve{P}_h \cos h\Omega t + \breve{Q}_h \sin h\Omega t\right), \tag{11.74}$$
with $\omega = \sqrt{k/m}$ (Eq. 11.51), provided of course that
$$h\,\Omega \neq \omega, \quad\text{with } h = 1, 2, \ldots, n. \tag{11.75}$$
[If this is not true, we have to modify Eq. 11.74 by replacing the term with h Ω = ω with the resonance solution.] Then, akin to what we did for Eqs. 11.55–11.57, we have to add the solution to the homogeneous equation. This provides us with two arbitrary constants, which allow us to impose the initial conditions. Remark 104. As we will see in Vol. II, the applicability of the result presented above is very general, much broader than what we can show with the mathematical know–how available at this point. For, using the so–called Fourier series, one obtains that any periodic piecewise–continuous function with period T = 2π/Ω may be expressed as in Eq. 11.72 with n that tends to ∞.
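The superposition in Eq. 11.74 translates directly into a loop over the harmonics. The sketch below is an illustration only: the function names, the coefficient lists `P` and `Q` (holding the values of P̆_h and Q̆_h for h = 1, ..., n), and the sample numbers are hypothetical. A harmonic that hits the natural frequency is rejected rather than replaced by the resonance solution.

```python
import math


def force(t, P, Q, Omega):
    """Periodic force of Eq. 11.72; P[h-1] and Q[h-1] are the h-th coefficients."""
    return sum(P_h * math.cos(h * Omega * t) + Q_h * math.sin(h * Omega * t)
               for h, (P_h, Q_h) in enumerate(zip(P, Q), start=1))


def particular_solution(t, P, Q, Omega, k, m):
    """Particular solution of Eq. 11.73, built term by term as in Eq. 11.74."""
    omega_sq = k / m
    total = 0.0
    for h, (P_h, Q_h) in enumerate(zip(P, Q), start=1):
        denom = m * (omega_sq - (h * Omega) ** 2)
        if denom == 0.0:
            raise ValueError(f"harmonic h = {h} is resonant (h * Omega = omega)")
        total += (P_h * math.cos(h * Omega * t) + Q_h * math.sin(h * Omega * t)) / denom
    return total


P, Q = [1.0, 0.5, 0.2], [0.0, 0.3, 0.1]
for t in (0.0, 0.5, 1.0):
    print(f"t = {t:.1f}: F = {force(t, P, Q, 1.0):+.4f}, "
          f"x_P = {particular_solution(t, P, Q, 1.0, k=10.0, m=1.0):+.4f}")
```

With these sample values the third harmonic (3Ω = 3) is closest to ω = √10 ≈ 3.16, so it receives the largest amplification even though its coefficients are the smallest.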
• Illustrative examples

The phenomenon of resonance explains several common phenomena. Let us start with an example familiar to all of us: what happens when we push a toddler on a swing. To begin with, note that the force we push with is periodic (not harmonic; indeed it equals zero except for when we push). Also, its frequency is equal to the natural frequency of the swing. Thus, we can use the formulation presented above.

Remark 105. An objection that you might come up with is the fact that the swing never goes to infinity. This is due to the fact that in the actual situation there always exists some dissipation of energy that limits the amplitude of the swing (Remark 101, p. 473). [For a quantitative analysis, you have to wait until we get to Subsection 14.5.3, where we will see how to take damping into account.]

This phenomenon is also the reason why groups of soldiers are ordered not to march in step when they cross a bridge. Indeed, it has been the cause of several disasters. A sadly famous one occurred at a disco that had a structure overarching the dance floor. When people started dancing on the structure itself and the rhythm of the dancing steps came in sync with the frequency of oscillation of the arch, the structure “went into resonance” and collapsed, causing the death of several people.

◦ Comment. Contrary to a widely held belief (and contrary to what is stated in some mathematics textbooks), according to the U.S. federal committee that investigated the case, the Tacoma Narrows bridge disaster (Tacoma, Washington, November 7, 1940) was not caused by the phenomenon of resonance, but by that of aerodynamic flutter, namely an aeroelastic phenomenon of self–induced excitations, which is addressed in Vol. II.
11.7.5 An anecdote: “Irrational mechanics” ♥
In order to shed some light on what we have seen on undamped harmonic oscillators, let me tell you another personal anecdote, from the days when I was a young man. After I had completed the first two years of engineering studies (within the standard time limit, may I add — not an easy task in those days in Italian universities), my parents rewarded me by buying my first car, a Fiat 600. In hindsight, I think they were afraid I’d kill myself with the motorbike I had been using until then. They regretted their generosity shortly after. My driving habits didn’t improve appreciably by shifting from motorcycle to car. My friends used to
say that I drove the car like a daredevil. According to some (not my friends, who drove pretty much like me), that was grossly understated; to them, I was driving like a maniac. From their point of view, I was practicing “Irrational Mechanics.” [In order for you to appreciate that, I must add that the course “Analytical Mechanics” has an Italian equivalent called “Meccanica Razionale,” namely Rational Mechanics.]

Every day, returning from the School of Engineering in San Pietro in Vincoli, close to the Colosseum, I had to drive through Via dei Cerchi, a street that runs parallel to the Circus Maximus. The street pavement was very wavy (they had done a poor job in tarring over the tramway tracks). In addition, by then my car’s shock absorbers (or simply shocks) were virtually gone. As a consequence, if I drove my car even at a relatively low speed, say 20 mph, the car started to shake. That was way too slow for my tastes! But, at 40 mph the car would shake so badly that it felt as if the steering wheel was going to fly off. However, ... at 60 mph? No problem! The car would stop shaking; it was just buzzing, namely vibrating a bit at high frequency. That was the right speed for me!

◦ Warning. I am not suggesting that you follow my footsteps, to confirm the experimental evidence presented above. Don’t even think about it! That might be dangerous to your health!

Let us try to use the material that we have learned thus far to make some sense out of this experimental evidence.
• Model of car suspension system in test facility (no shocks) Before modeling the problem of interest here, let me start with a phenomenon that is simpler to model. Consider a car wheel in a test facility. To this end, let us consider Fig. 11.11 (similar to that of Fig. 11.7 with an additional spring).
Fig. 11.11 Model of a car wheel in a test facility (no shocks)
On the left of the figure, we have a sketch of the experimental setup. This consists of a wheel that is connected with a fixed structure on the top, while on the bottom is in contact with a support that oscillates periodically. [For simplicity, the support shown in the figure consists of an oscillating plate. A more realistic support consists of a rotating eccentric, that is, a cylinder rotating around a line different from its axis.8 This forces the wheel to move up and down and to rotate at the same time.] On the center of the figure, we have an idealized car–wheel system. The mass idealizes the wheel itself (the axis of the wheel is represented by the point P ). The point Q is where the wheel is anchored to the support of the facility, whereas S1 is the spring that connects the fixed point Q to the wheel axis P . In addition, the spring S2 is used to model the elasticity of the tire. [The points P and C move with respect to each other, and their relative motion generates a spring–like reaction, whose effect is simulated by the spring S2 .] The plate (point C) oscillates up and down around C0 , to simulate the waviness of the road. The setup is idealized by a mass (free to move up and down) with two springs. Somewhat oversimplifying the situation, both springs are here assumed to be linear. Finally, on the right side of the figure, we have the x-axis (vertical, for a change), along which the motion occurs. The origin O is the location of P in the equilibrium configuration, whereas x(t) denotes the location of P , as a function of time. On the other hand, the locations of the points C and C0 are denoted by xC and xC0 , respectively. For the sake of simplicity, we assume the springs to be unstretched when xC = xC0 (equilibrium configuration). We assume that a frame of reference rigidly connected to Q (namely to the Earth) may be considered inertial. This is sufficiently accurate for the problem under consideration (Remark 94, p. 457). 
Turning to the mathematical model, the force that the first spring exerts on the particle is given by F1(t) = −κ1 x(t), whereas that exerted by the second is given by
$$F_2(t) = -\kappa_2 \left[x(t) - u_C(t)\right], \tag{11.76}$$
where uC(t) = xC(t) − xC0 denotes the displacement of C from xC0. The displacement uC(t) (which simulates the effect of the waviness of the street) is prescribed by the designer of the setup. Assume, for the sake of simplicity, uC(t) = UC cos Ωt. The Newton second law (Eq. 11.11) now reads $m\,\ddot{x} = -\kappa_1 x - \kappa_2 (x - u_C)$, namely
$$m\, \frac{d^2 x}{dt^2} + k\, x = F(t), \tag{11.77}$$

8 From ancient Greek εκκεντρος, namely εκ (ek, out) and κεντρον (kentron, center).
where k = κ1 + κ2, and
$$F(t) = \breve{F} \cos\Omega t, \quad\text{with}\quad \breve{F} = \kappa_2 U_C. \tag{11.78}$$
The above equation coincides with Eq. 11.50. Therefore, the analysis of Section 11.7 applies. Accordingly, if Ω ≠ ω, we have (use Eq. 11.57)
$$x(t) = x_0 \cos\omega t + \frac{v_0}{\omega} \sin\omega t + \frac{\kappa_2 U_C}{m}\, \frac{\cos\Omega t - \cos\omega t}{\omega^2 - \Omega^2}. \tag{11.79}$$
However, if Ω = ω, we have resonance, and we use Eq. 11.67 instead.
• Model of a car suspension system on the road (no shocks)

Let us see how we can extend the results obtained in the preceding subsubsection to a car that moves at a constant speed over a wavy road.

Remark 106. Let us introduce a simplifying assumption, namely that the mass M of the car is much larger than that of the wheel (infinite–mass car hypothesis), so that the vertical motion of the body of the car is negligible with respect to that of the wheel axis. [Of course, this may be invoked if, and only if, the frequency of oscillation is sufficiently large, so as to make the inertial term $m\,\ddot{x}_Q$ relevant. Why? What happens if Ω approaches zero?]

Let us use a frame of reference (assumed inertial) connected to the car, so that the car body is fixed (Remark 106 above), akin to the point Q in the preceding subsubsection (Fig. 11.11). Next, we need to relate uC(t) to the waviness of the street. Assume, for the sake of simplicity, that the level h of the street pavement is sinusoidally shaped. Specifically, we assume that
$$h(s) = U_C \cos\left(2\pi s/\ell\right), \tag{11.80}$$
where s denotes the abscissa along the street, whereas is the wavelength of the oscillation, namely the distance between two successive peaks. [Indeed, when s = the argument equals 2π.] We have assumed that the car is traveling at a constant speed V , so that s = V t, and hence uC (t) = UC cos Ωt, where Ω = 2πV /.
(11.81)
Thus, we have again F (t) = F˘ cos Ωt (Eq. 11.78), with F˘ = κ2 UC . Therefore, the analysis of Section 11.7 applies again. Accordingly, if Ω = ω, the solution is given by Eq. 11.79.
However, if the car is traveling at a speed $V$ such that $\Omega = \omega$ (namely, if $V = \omega \ell/(2\pi)$, where $\ell$ is the wavelength of the pavement), then we have resonance, and we use Eq. 11.67 instead. This is what I experienced by driving the car on Via dei Cerchi at 40 mph. This is the speed at which resonance occurred. [Recall Remark 105, p. 479, on the fact that the amplitude of oscillation remains finite.] On the other hand, at a speed of 60 mph, we have $\sigma = \Omega/\omega = 1.5$, and the amplification factor $\Xi(\Omega) = 1/|1 - \sigma^2|$ (Eq. 11.61) takes the value $\Xi = 0.8$, lower than the static value.
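The numbers quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes that the amplification factor of Eq. 11.61 has the form $1/|1 - \sigma^2|$, and it uses the fact that, for a fixed pavement wavelength, $\Omega$ is proportional to $V$, so that $\sigma$ equals the ratio of the driving speed to the resonant speed. The function and variable names are illustrative.

```python
def amplification(sigma):
    """Amplification factor 1/|1 - sigma^2| of an undamped oscillator (assumed form of Eq. 11.61)."""
    return 1.0 / abs(1.0 - sigma**2)

v_resonance = 40.0  # mph, speed at which Omega = omega
v_cruise = 60.0     # mph

# Omega is proportional to V for a fixed wavelength, hence sigma = Omega/omega = V/V_resonance
sigma = v_cruise / v_resonance
xi = amplification(sigma)
print(sigma, xi)  # 1.5 and approximately 0.8
```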
• What about an accelerating car? ♠
You might object that in order to drive the car at 60 mph, one has to go through the speed of 40 mph. True! I could sweep the issue under the rug, and tell you that the accelerating portion was at the beginning of Via dei Cerchi and that the pavement was smooth in that portion. [Not true then, not true now.] Instead, let us address the problem at a deeper level.

◦ Comment. The first issue that we have to address is the fact that the frame of reference is no longer inertial. Indeed, in the above analysis, we have chosen to use a frame of reference connected to the car. This is OK if the car travels at a constant speed. “But the car is accelerating!” you are going to tell me. Indeed, in this case, the frame of reference connected to the car can no longer be considered as inertial. However, anticipating a few results obtained later (Ch. 22), I inform you that this affects only the horizontal forces, of no interest here. [This is similar to being in an accelerating elevator. In such a case, one experiences a fictitious force in the direction of the acceleration (vertical). However, no lateral (namely horizontal) force is experienced.] Thus, for the vertical component, we can use such a frame as if it were inertial.

Having clarified this, let us return to the problem and assume simply $s = s(t)$. Then, the force is no longer given by Eq. 11.78. Instead, we should use $F(t) = \kappa_2 U_C \cos[2\pi s(t)/\ell]$. Correspondingly, the solution is given by (use Eq. 11.84 below)

$$x(t) = x_0 \cos \omega t + \frac{v_0}{\omega} \sin \omega t + \frac{\kappa_2 U_C}{m\, \omega} \int_0^t \sin \omega(t - \tau)\, \cos[a\, s(\tau)]\, d\tau, \tag{11.82}$$

with $a = 2\pi/\ell$.
In order to continue, we need to know the function s(t), and then evaluate the integral. However, this is a moot point, because with the analytical tools that we have available at this point I am not able to find an explicit expression for the above integral, not even for the simple case of uniform acceleration. And it’s not even worth the effort. For, we can settle for a qualitative analysis of the phenomenon. To this end, let us apply an old trick. Replace
the graph of the speed with a histogram. Then, a varying velocity (and Ω) is replaced by a piecewise constant velocity. For each constant–velocity time interval, we can apply the results of Section 11.7, on forced oscillations with harmonic force. Accordingly, we see that if the time interval with Ω = ω is sufficiently brief, the response does not have the time to reach dangerous levels. [It should be recognized that the trick can be applied here because the car speed varies much more slowly than the oscillations do.]
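Although the integral is not available in closed form, the qualitative claim can be probed numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes unit mass, a natural frequency of 1 Hz, a unit forcing amplitude, and a uniformly accelerating car starting from rest, so that the forcing phase is $\alpha t^2/2$ with a hypothetical sweep rate $\alpha$ (the instantaneous forcing frequency is then $\alpha t$). The equation of motion is integrated with a classical fourth–order Runge–Kutta step.

```python
import math

def peak_response(alpha, omega=2.0 * math.pi, m=1.0, f0=1.0, dt=1e-3):
    """Peak |x| for m x'' + k x = f0 cos(alpha t^2 / 2), starting from rest (all values hypothetical)."""
    k = m * omega**2

    def acc(t, x):
        # acceleration from the Newton second law with the swept-frequency force
        return (f0 * math.cos(0.5 * alpha * t * t) - k * x) / m

    t, x, v, peak = 0.0, 0.0, 0.0, 0.0
    t_end = 3.0 * omega / alpha  # resonance (alpha t = omega) is crossed at t = omega/alpha
    while t < t_end:
        # classical RK4 step for the pair (x, v)
        k1x, k1v = v, acc(t, x)
        k2x, k2v = v + 0.5 * dt * k1v, acc(t + 0.5 * dt, x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, acc(t + 0.5 * dt, x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, acc(t + dt, x + dt * k3x)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
        peak = max(peak, abs(x))
    return peak

peak_slow = peak_response(alpha=1.0)   # slow passage through resonance
peak_fast = peak_response(alpha=10.0)  # fast passage through resonance
print(peak_slow, peak_fast)
```

With these illustrative values, the faster passage through resonance should leave a smaller peak displacement, and both peaks remain finite, consistent with the histogram argument.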
11.8 Undamped harmonic oscillators. Arbitrary forcing

The results presented above are limited to a periodic force. In this section, we consider the general forced–oscillation problem

$$m\, \frac{d^2 x}{dt^2} + k\, x = F(t), \tag{11.83}$$
where $F(t)$ is an arbitrary input force. The initial conditions are still arbitrarily prescribed: $x(0) = x_0$ and $\dot{x}(0) = v_0$ (Eq. 11.33). I claim that the solution is given by

$$x(t) = x_0 \cos \omega t + \frac{1}{\omega}\, v_0 \sin \omega t + \frac{1}{m} \int_0^t \frac{1}{\omega} \sin \omega(t - \tau)\, F(\tau)\, d\tau. \tag{11.84}$$

Indeed, using the rule for the derivative of the type of integral in the equation above (derivative under the integral sign plus integrand evaluated at the upper limit, Eq. 10.98), one obtains (use $[\sin \omega(t - \tau)\, F(\tau)]_{\tau = t} = 0$)

$$\dot{x}(t) = -\omega\, x_0 \sin \omega t + v_0 \cos \omega t + \frac{1}{m} \int_0^t \cos \omega(t - \tau)\, F(\tau)\, d\tau. \tag{11.85}$$
Time differentiating this equation and using again Eq. 10.98, we have

$$\begin{aligned}
\ddot{x}(t) &= -\omega^2 x_0 \cos \omega t - \omega\, v_0 \sin \omega t - \frac{\omega}{m} \int_0^t \sin \omega(t - \tau)\, F(\tau)\, d\tau + \frac{1}{m} \Big[ \cos \omega(t - \tau)\, F(\tau) \Big]_{\tau = t} \\
&= -\omega^2 x(t) + \frac{1}{m}\, F(t).
\end{aligned} \tag{11.86}$$

Thus, the expression given in Eq. 11.84 satisfies the differential equation in Eq. 11.83. In addition, we have $x(0) = x_0$ and $\dot{x}(0) = v_0$, as you may verify. Thus, the initial conditions are satisfied as well. Hence, Eq. 11.84 indeed provides a solution to Eq. 11.83, with its initial conditions. Recalling that the solution is unique (Remark 96, p. 459), we conclude that this is the solution.
You may have noted that the expression given for the solution when $F(t)$ is arbitrary (namely Eq. 11.84) does not look at all like the expression obtained when $F(t) = \breve{F} \cos \Omega t$ (namely Eq. 11.57 for $\Omega \neq \omega$, or Eq. 11.67 for $\Omega = \omega$). In the two subsubsections that follow, we show that, when $F(t) = \breve{F} \cos \Omega t$ (as in Eq. 11.50), Eq. 11.84 properly yields the solutions obtained above (namely Eq. 11.57 when $\Omega \neq \omega$, and Eq. 11.67 when $\Omega = \omega$).

• The case F(t) = F̆ cos Ωt, for Ω ≠ ω ♥

Here, we consider the case $F(t) = \breve{F} \cos \Omega t$, with $\Omega \neq \omega$. Specifically, we want to show that in this case the solution in Eq. 11.84 coincides with that in Eq. 11.57. For simplicity, we assume $x_0 = v_0 = 0$, since the corresponding terms are equal in the two equations. Using $\sin x \cos y = \frac{1}{2} [\sin(x + y) + \sin(x - y)]$ (Eq. 7.61), we have (note that the integration is with respect to $\tau$, not $t$)

$$\begin{aligned}
\frac{1}{m} \int_0^t \frac{1}{\omega} \sin \omega(t - \tau)\, \breve{F} \cos \Omega\tau\, d\tau
&= \frac{\breve{F}}{2 m \omega} \int_0^t \Big\{ \sin[\omega(t - \tau) + \Omega\tau] + \sin[\omega(t - \tau) - \Omega\tau] \Big\}\, d\tau \\
&= \frac{\breve{F}}{2 m \omega} \left[ \frac{\cos[\omega(t - \tau) + \Omega\tau]}{\omega - \Omega} + \frac{\cos[\omega(t - \tau) - \Omega\tau]}{\omega + \Omega} \right]_0^t \\
&= \frac{\breve{F}}{m}\, \frac{\cos \Omega t - \cos \omega t}{\omega^2 - \Omega^2},
\end{aligned} \tag{11.87}$$
in agreement with Eq. 11.57.
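• The case F(t) = F̆ cos ωt, for Ω = ω ♥

On the other hand, if $\Omega = \omega$, using again $\sin x \cos y = \frac{1}{2} [\sin(x + y) + \sin(x - y)]$ (Eq. 7.61), we have (again, the integration is with respect to $\tau$, not $t$)

$$\begin{aligned}
\frac{1}{m} \int_0^t \frac{1}{\omega} \sin \omega(t - \tau)\, \breve{F} \cos \omega\tau\, d\tau
&= \frac{\breve{F}}{2 m \omega} \int_0^t \big[ \sin \omega t + \sin \omega(t - 2\tau) \big]\, d\tau \\
&= \frac{\breve{F}}{2 m \omega} \left[ \tau \sin \omega t + \frac{\cos \omega(t - 2\tau)}{2 \omega} \right]_0^t = \frac{\breve{F}}{2 m \omega}\, t \sin \omega t,
\end{aligned} \tag{11.88}$$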
in agreement with Eq. 11.63.
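Both closed–form results can be cross–checked by evaluating the integral in Eq. 11.84 numerically. The following Python sketch uses a plain trapezoid rule with $x_0 = v_0 = 0$; the parameter values and the helper name `duhamel` are hypothetical and chosen only for illustration.

```python
import math

def duhamel(force, t, omega, m, n=20000):
    """Trapezoid-rule estimate of (1/(m omega)) * integral_0^t sin(omega (t - tau)) F(tau) dtau."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        weight = 0.5 if i in (0, n) else 1.0  # end points carry half weight
        total += weight * math.sin(omega * (t - tau)) * force(tau)
    return total * h / (m * omega)

m, omega, f_amp, t = 2.0, 3.0, 1.5, 4.0  # illustrative values

# Non-resonant case (Omega != omega), compared with the last line of Eq. 11.87
Omega = 1.0
x_nonres = duhamel(lambda tau: f_amp * math.cos(Omega * tau), t, omega, m)
x_nonres_exact = f_amp / m * (math.cos(Omega * t) - math.cos(omega * t)) / (omega**2 - Omega**2)

# Resonant case (Omega = omega), compared with the last term of Eq. 11.88
x_res = duhamel(lambda tau: f_amp * math.cos(omega * tau), t, omega, m)
x_res_exact = f_amp / (2.0 * m * omega) * t * math.sin(omega * t)

print(x_nonres, x_nonres_exact)
print(x_res, x_res_exact)
```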
11.9 Mechanical energy

In this section, we introduce one of the most important notions in mechanics, that of mechanical energy. Here, we address it at its most elementary level, namely again for the limited case of a single particle in rectilinear motion. [You need not be concerned about these limitations! The concept is further elaborated for three dimensions in Section 20.6 (single–particle dynamics), in Section 21.6 (n–particle dynamics), and in Section 23.3 (rigid body dynamics).] Specifically, in Subsection 11.9.1, we introduce the definitions of kinetic energy and work and present the relationship between the two. Next, we introduce the definition of conservative force and potential energy, for time–independent forces (Subsection 11.9.2). A deeper look is presented in Subsection 11.9.3, where we show that time–dependent potential force fields are not conservative. Furthermore, in Subsection 11.9.4 we show that, for a particle subject to conservative forces, the sum of kinetic and potential energies remains constant in time. Illustrative examples of the above material are presented in Subsection 11.9.5. Then, in Subsection 11.9.6, we make a short detour back to statics and introduce the theorem of minimum energy for the equilibrium of a particle subject to conservative forces. Finally, in Subsection 11.9.7, we discuss a method that allows us to obtain, via energy considerations, the solution for the dynamics problem of a particle subject only to conservative forces.
11.9.1 Kinetic energy and work

Let us start from the Newton second law for a single particle $P$, namely $m\, dv/dt = F$ (Eq. 11.11). In general, the force $F$ may be a function of $t$ and $x$, and even of $v$. [This is the case of the force produced by the air (the air resistance, or aerodynamic drag), which according to the experimental evidence is proportional to $v^2$. More on the subject in Section 14.6.] However, as pointed out in Remark 93, p. 455, none of the forces is a function of the acceleration of the particle. Thus, in general, we have $F = F(x, v, t)$. Multiplying both sides of the equation by $v$, we have $m\, v\, \dot{v} = v\, F(x, v, t)$, or

$$\frac{d}{dt} \left( \frac{1}{2}\, m\, v^2 \right) = v\, F(x, v, t). \tag{11.89}$$

Integrating with respect to time, between $t_a$ and $t_b$, one obtains

$$\left[ \frac{1}{2}\, m\, v^2 \right]_{t_a}^{t_b} = \int_{t_a}^{t_b} F[x(t), v(t), t]\, v(t)\, dt. \tag{11.90}$$
To express the above relationship in a more interesting way, let us introduce the following definitions:

Definition 210 (Kinetic energy in one dimension). The quantity

$$T = \frac{1}{2}\, m\, v^2 \geq 0 \tag{11.91}$$

is called the kinetic energy of a particle that has a mass $m$ and velocity $v$.

Definition 211 (Work in one dimension). The quantity

$$W(t_a, t_b) := \int_{t_a}^{t_b} F[x(t), v(t), t]\, v(t)\, dt \tag{11.92}$$

is called the work performed by the force $F$, as the particle moves from $x_a = x(t_a)$ to $x_b = x(t_b)$.

Then, we have the following

Theorem 127 (Kinetic energy and work). The change in kinetic energy between $t = t_a$ and $t = t_b$ equals the work performed by the force $F$, as $P$ moves from $x_a = x(t_a)$ to $x_b = x(t_b)$:

$$T_b - T_a = W(t_a, t_b). \tag{11.93}$$

The above equation will be referred to as the one–dimensional energy equation.

◦ Proof : Combine Eqs. 11.90, 11.91 and 11.92, to yield

$$T_b - T_a = \frac{1}{2}\, m \left( v_{t_b}^2 - v_{t_a}^2 \right) = \int_{t_a}^{t_b} F[x(t), v(t), t]\, v(t)\, dt = W(t_a, t_b), \tag{11.94}$$

in agreement with Eq. 11.93.
• Use momentum or energy equation?

My students often are perplexed and say: “The equation of energy is a consequence of the equation for momentum. So, why do we have to study the energy equation?” For particle dynamics, the answer is: “Because for some problems the energy equation yields the solution more directly.”
Then, another question arises: “When is it more convenient to use the energy equation, instead of the momentum equation?” To give you a simple example, consider what happens when you use your car’s brakes. Assume that the braking force $F$ is constant and that you know the initial speed $v_0$. If you want to know how long it will take to get the car to stop, use the equation of momentum, $v(t) = -F t/m + v_0$ (Eq. 11.21, with $F > 0$ since the force is pointed against the velocity), with $v(t) = 0$, and obtain $t = m\, v_0/F$. On the other hand, if you want to know the distance $\ell$ necessary to stop the car, you can obtain this much more directly by using the equation of energy, $\frac{1}{2} m\, v^2(t) = \frac{1}{2} m\, v_0^2 + \int_0^t F[x(t), v(t), t]\, v(t)\, dt$ (Eq. 11.94), which, for a constant force and $v(t) = 0$, yields $\frac{1}{2} m\, v_0^2 = F \ell$. This yields

$$\ell = \tfrac{1}{2}\, m\, v_0^2/F. \tag{11.95}$$

• Elastic collision between two particles ♥
There are cases in which we use both equations (momentum and energy). To give you an example, I will make an exception to the fact that this chapter is primarily devoted to single particle dynamics. Let us consider the one–dimensional collision between two particles, such as two billiard balls. The equations are obtained by adding those for each of the two particles. [The forces due to the contact are equal and opposite and their contributions to the momentum equation cancel out.] In addition, we assume the collision to be “elastic,” in the sense that no energy dissipation occurs. Then, both momentum and energy are conserved and we have

$$m_1 v_1' + m_2 v_2' = m_1 v_1'' + m_2 v_2'', \qquad
\tfrac{1}{2} m_1 (v_1')^2 + \tfrac{1}{2} m_2 (v_2')^2 = \tfrac{1}{2} m_1 (v_1'')^2 + \tfrac{1}{2} m_2 (v_2'')^2, \tag{11.96}$$

where $'$ and $''$ denote the values before and after the collision. [The velocities are positive if directed along the positive direction.] These yield

$$v_1'' = \frac{m_1 - m_2}{m_1 + m_2}\, v_1' + \frac{2\, m_2}{m_1 + m_2}\, v_2', \qquad
v_2'' = \frac{2\, m_1}{m_1 + m_2}\, v_1' + \frac{m_2 - m_1}{m_1 + m_2}\, v_2', \tag{11.97}$$

as you may verify. If $m_1 = m_2$, we have $v_1'' = v_2'$ and $v_2'' = v_1'$, namely the particles exchange their velocities. In particular, if $v_2' = 0$ (particle $P_2$ not moving before the collision), then $v_1'' = 0$ (particle $P_1$ not moving after the collision).
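The collision formulas lend themselves to a quick numerical check. The Python sketch below implements the standard result for a one–dimensional elastic collision (the content of Eq. 11.97) and verifies that momentum and kinetic energy are indeed conserved; the masses and velocities are arbitrary illustrative values, and the function name is hypothetical.

```python
def elastic_collision(m1, m2, v1, v2):
    """Velocities after a one-dimensional elastic collision (standard result, cf. Eq. 11.97)."""
    total = m1 + m2
    w1 = ((m1 - m2) * v1 + 2.0 * m2 * v2) / total
    w2 = (2.0 * m1 * v1 + (m2 - m1) * v2) / total
    return w1, w2

m1, m2, v1, v2 = 2.0, 3.0, 1.5, -0.5  # illustrative values
w1, w2 = elastic_collision(m1, m2, v1, v2)

momentum_before = m1 * v1 + m2 * v2
momentum_after = m1 * w1 + m2 * w2
energy_before = 0.5 * m1 * v1**2 + 0.5 * m2 * v2**2
energy_after = 0.5 * m1 * w1**2 + 0.5 * m2 * w2**2
print(w1, w2)

# Equal masses, second particle initially at rest: the velocities are exchanged
print(elastic_collision(1.0, 1.0, 2.0, 0.0))
```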
11.9.2 Conservative forces and potential energy

We have seen that in general $F = F(x, v, t)$. Thus, in general, the evaluation of the work performed requires that we know the time history $x = x(t)$, describing how the particle moves from $x_a = x(t_a)$ to $x_b = x(t_b)$. However, if the force $F$ is only a function of $x$, the formulation in terms of energy simplifies considerably and the energy approach becomes very useful. This is the case addressed in this subsection.
• Conservative forces

We have the following

Definition 212 (Conservative force). For one–dimensional problems, a force $F$ acting on a particle located at $x = x(t)$ is called conservative iff the work done by the force, as the particle moves from $x_a$ to $x_b$, depends only upon $x_a$ and $x_b$, and not on the time history $x = x(t)$.

We have the following

Theorem 128. In one–dimensional problems, a force $F$ that is only a function of the location $x$, namely

$$F = F(x), \tag{11.98}$$

is conservative.

◦ Proof : Noting that $v\, dt = dx$, the work $W(t_a, t_b)$ performed by the force $F(x)$, as the particle moves from $x_a = x(t_a)$ to $x_b = x(t_b)$, is given by

$$W(t_a, t_b) := \int_{t_a}^{t_b} F[x(t)]\, v(t)\, dt = \int_{x_a}^{x_b} F(x)\, dx = W(x_a, x_b). \tag{11.99}$$

Thus, the work depends only upon $x_a$ and $x_b$, and not on the time history $x = x(t)$. In other words, the force $F(x)$ is conservative, in agreement with the theorem.
• Potential energy

Let us introduce the following

Definition 213 (Potential energy, potential force). Assume that there exists a function $U(x, t)$ such that (at any given $t$ and any given $x$)

$$\frac{dU}{dx} = -F(x, t). \tag{11.100}$$
Then, the force $F$ is said to be potential. The function $U$ is called the potential energy of the force $F(x, t)$.

We have the following

Theorem 129 (Potential energy for conservative forces). Consider a conservative force, $F = F(x)$. Except for an arbitrary additive constant, the potential energy of the force $F(x)$ equals minus the work done by the force as the particle goes from $x_0$ to $x$:

$$U(x) = -\int_{x_0}^{x} F(x)\, dx + C = -W(x_0, x) + C, \tag{11.101}$$

where $x_0$ is an arbitrarily prescribed value and $C$ is an arbitrary constant.

◦ Proof : Indeed, Eq. 11.99 yields

$$W(x_0, x) = \int_{x_0}^{x} F(x)\, dx = -\int_{x_0}^{x} \frac{dU}{dx}\, dx = -U(x) + U(x_0), \tag{11.102}$$

with $U(x_0) = C$,
in agreement with Eq. 11.101. [In mathematical terms, $U(x)$ is a primitive of $-F(x)$. Accordingly, an arbitrary additive constant may be included.]

Remark 107. Why did I add the constant $C$? Because, for instance, as we will see in the next chapter, $U(x) = \cosh^{-1}(x)$ (primitive of $1/\sqrt{x^2 - 1}$, Eq. 12.65) cannot be written as in Eq. 11.101 with $C = 0$, because $\cosh^{-1}(x) > 1$ (see Fig. 12.5, p. 511).

Equation 11.101 implies

$$W(x_a, x_b) = U(x_a) - U(x_b). \tag{11.103}$$
[Note again that Eq. 11.103 does not require one to know the history $x = x(t)$. In other words, for $F = F(x)$ only the values $x_a$ and $x_b$ are needed to evaluate the work.]

Remark 108. We emphasize that, in the definition of potential energy, $x_0$ is assumed to be fixed once and for all. This is clearly stated by the notation $U = U(x)$: $U$ is considered only a function of the upper limit of integration. By changing $x_0$ into $x_0'$, the value of the potential energy is altered by a constant

$$C_0 = \int_{x_0}^{x_0'} F(x)\, dx = W(x_0, x_0'), \tag{11.104}$$

consistently with the fact that the potential energy is known except for an arbitrary additive constant. [Recall that two distinct primitives of a given function differ only by a constant.] However, the value of this constant is irrelevant in view of Eq. 11.100.
11.9.3 A deeper look: U = U(x, t) ♥

Here, we address this issue in greater depth. To this end, let us obtain Eq. 11.102 in a slightly different way. If $F$ is only a function of $x$, we have $dU[x(t)]/dt = (dU/dx)\, (dx/dt) = -F[x(t)]\, v(t)$. Therefore,

$$W(t_a, t_b) = \int_{t_a}^{t_b} F[x(t)]\, v(t)\, dt = -\int_{t_a}^{t_b} \frac{dU}{dt}\, dt = U[x(t_a)] - U[x(t_b)], \tag{11.105}$$
in agreement with Eq. 11.102. This way of obtaining the result allows us to examine what happens if $F$ is not only a function of $x$, but also a function of $t$. Specifically, let us go back to Definition 213, p. 489, and assume again that there exists a function $U(x, t)$, such that $\partial U/\partial x = -F(x, t)$. This force is potential, in the sense that it admits a potential function. But, is it conservative, in the sense that the work depends only upon the initial and final location? Let’s find that out. We have

$$\frac{dU}{dt} = \frac{\partial U}{\partial t} + v\, \frac{\partial U}{\partial x} = \frac{\partial U}{\partial t} - F[x(t), t]\, v(t). \tag{11.106}$$

Therefore, we have

$$\begin{aligned}
W(x_a, x_b) := \int_{t_a}^{t_b} F[x(t), t]\, v(t)\, dt &= \int_{t_a}^{t_b} \left( \frac{\partial U}{\partial t} - \frac{dU}{dt} \right) dt \\
&= U[x(t_a), t_a] - U[x(t_b), t_b] + \int_{t_a}^{t_b} \frac{\partial U}{\partial t}\, dt,
\end{aligned} \tag{11.107}$$
which generalizes Eq. 11.103. From Eq. 11.107, we see that the force F is conservative only if U is a function of x, but not of t. In other words, the condition that F is only a function of x (namely that ∂F /∂t = 0) is not only sufficient for the force F to be conservative (Theorem 128 above), but it is also necessary. In summary, for time–independent fields, conservative and potential are equivalent terms (as in Theorem 203, p. 764), whereas for time–dependent ones they are not.
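The distinction between conservative and merely potential forces can be seen in a small numerical experiment. The sketch below is illustrative only: it assumes a hypothetical spring–like force $F = -k x$ and a hypothetical time–dependent variant $F = -k x (1 + t)$, with potential $U = \frac{1}{2} k x^2 (1 + t)$, and evaluates the work of Eq. 11.92 along two different time histories that both go from $x = 0$ at $t = 0$ to $x = 1$ at $t = 1$.

```python
def work(force, x_of_t, v_of_t, ta=0.0, tb=1.0, n=20000):
    """Midpoint-rule estimate of W = integral of F[x(t), t] v(t) dt between ta and tb."""
    h = (tb - ta) / n
    total = 0.0
    for i in range(n):
        t = ta + (i + 0.5) * h
        total += force(x_of_t(t), t) * v_of_t(t)
    return total * h

k = 3.0  # illustrative stiffness

# Two histories with the same end points: x(0) = 0 and x(1) = 1
histories = {
    "linear": (lambda t: t, lambda t: 1.0),
    "quadratic": (lambda t: t * t, lambda t: 2.0 * t),
}

f_static = lambda x, t: -k * x               # F = F(x): conservative
f_timedep = lambda x, t: -k * x * (1.0 + t)  # F = F(x, t): potential, but not conservative

w_static = {name: work(f_static, x, v) for name, (x, v) in histories.items()}
w_timedep = {name: work(f_timedep, x, v) for name, (x, v) in histories.items()}
print(w_static)   # both close to -k/2 = -1.5
print(w_timedep)  # close to -5k/6 = -2.5 and -0.9k = -2.7
```

For the time–dependent force, each value agrees with Eq. 11.107: the end–point contribution is $-k$ in both cases, while the integral of $\partial U/\partial t = \frac{1}{2} k x^2$ equals $k/6$ for the linear history and $k/10$ for the quadratic one.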
11.9.4 Conservation of mechanical energy

The usefulness of the potential energy becomes even more evident if we introduce the following

Definition 214 (Mechanical energy in one dimension). Given a conservative force $F(x)$, the sum of kinetic and potential energies,

$$E(x, v) = T(v) + U(x), \tag{11.108}$$

is called the (total) mechanical energy of the material point $P$.

Then, we have the following

Theorem 130 (Conservation of mechanical energy). Consider a single particle in one–dimensional motion. Iff $F$ is a conservative force, namely iff $F = F(x)$, the (total) mechanical energy $E$ remains constant in time and equal to its initial value, that is,

$$E(x, v) = T(v) + U(x) = E_0. \tag{11.109}$$

◦ Proof : Combining $T_b - T_a = W(t_a, t_b)$ (Eq. 11.93) and $W(x_a, x_b) = U(x_a) - U(x_b)$ (Eq. 11.103) yields $T_b - T_a = U_a - U_b$, or

$$T_b + U_b = T_a + U_a, \tag{11.110}$$

which is equivalent to Eq. 11.109, since the subscripts $a$ and $b$ denote arbitrary events.

For a deeper look at what’s going on, let us introduce the following

Definition 215 (First integral). Any expression $G(\dots) = \text{constant}$ that after differentiation yields a differential equation (and vice versa) is called a first integral of the equation.

Note that by differentiating Eq. 11.109 with respect to $t$ one obtains (use $dU/dx = -F(x)$)

$$\frac{dE}{dt} = m\, v\, \frac{dv}{dt} + \frac{dU}{dx}\, \frac{dx}{dt} = v \left[ m\, \frac{dv}{dt} - F(x) \right] = 0. \tag{11.111}$$

In other words, differentiating Eq. 11.109 with respect to $t$, one obtains the Newton second law, $m\, \dot{v} = F(x)$ (Eq. 11.11), multiplied by $v(t)$. Vice versa, Eq. 11.109 is satisfied by any solution to the Newton second law with $F(x) = -dU/dx$. Hence, the expression in Eq. 11.109 is a first integral of the Newton second law for conservative forces, multiplied by $v(t)$.
11.9.5 Illustrative examples

In this subsection, we consider two illustrative examples, namely: (i) a constant force, and (ii) a spring.

• Energy conservation for constant force (weight, brakes)

For a constant force $F$ (such as gravity), let us choose $U(0) = 0$ (arbitrarily, but legitimately, since $U$ is defined up to an additive constant; Remark 108, p. 490). Then, we have

$$U(x) = -F x, \tag{11.112}$$

where $x$ is the abscissa in the direction of the force, from an arbitrarily chosen point. [Of course, here we assume again that the same sign convention applies for the signs of both $x$ and $F$ (Remark 92, p. 454).] Combining Eqs. 11.109 and 11.112 yields

$$\frac{1}{2}\, m\, v^2 - F x = E_0, \tag{11.113}$$

where $E_0 := \frac{1}{2} m\, v_0^2 - F x_0$. If $x_0 = v_0 = 0$, we have

$$\frac{1}{2}\, m\, v^2 - F x = 0, \tag{11.114}$$

a result fully equivalent to $v = \sqrt{2 F x/m}$ (Eq. 11.24). [If you examine the procedure used to obtain Eq. 11.24, you can really appreciate how much simpler it is to obtain the relationship $v = v(x)$ via energy considerations.] In particular, in the case of gravity, denoting by $x$ the distance along the vertical, pointed upwards, we have $F = -m g$, and Eq. 11.113 yields

$$\frac{1}{2}\, m\, v^2 + m\, g\, x = E_0, \tag{11.115}$$
where $E_0 := \frac{1}{2} m\, v_0^2 + m\, g\, x_0$. The above equation yields $x_{\mathrm{Max}} = v_0^2/(2 g)$ for the maximum height reached by a ball thrown upwards, from $x_0 = 0$, with velocity $v_0$, in agreement with Eq. 11.29. Of course, the above considerations hold for any other constant force. For instance, if you apply a constant braking force $F_B$, the distance to stop your car is given by $\ell = \frac{1}{2} m\, v_0^2/|F_B|$ (as in Eq. 11.95). [Surprisingly enough, Eq. 11.113 applies in this case as well. I said “surprisingly enough” because this sounds like an oxymoron (a contradiction in terms), since friction is a
dissipative force (namely a force that dissipates energy), whereas in our minds a conservative force is associated with the idea that it conserves energy. The conundrum is easily clarified if we recall that the friction force changes its sign with the velocity. Nonetheless, as long as the velocity does not change sign, friction satisfies Eq. 11.98, and hence it may be treated as if it were a conservative force. Thus, in this case, the results obtained apply.]
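As a worked instance of the braking example, the following Python sketch evaluates the stopping time from the momentum equation and the stopping distance from the energy equation (Eq. 11.95), and cross–checks the two with the constant–deceleration kinematics. The mass, speed and braking force are hypothetical values in SI units.

```python
def stopping_time(m, v0, f_brake):
    """Time to stop under a constant braking force (momentum equation): t = m v0 / F."""
    return m * v0 / f_brake

def stopping_distance(m, v0, f_brake):
    """Distance to stop under a constant braking force (energy equation): l = (1/2) m v0^2 / F."""
    return 0.5 * m * v0**2 / f_brake

m, v0, f_brake = 1200.0, 20.0, 6000.0  # kg, m/s, N (illustrative values)
t_stop = stopping_time(m, v0, f_brake)
l_stop = stopping_distance(m, v0, f_brake)

# Cross-check with x(t) = v0 t - (F/m) t^2 / 2 evaluated at the stopping time
l_check = v0 * t_stop - 0.5 * (f_brake / m) * t_stop**2
print(t_stop, l_stop, l_check)  # approximately 4 s, 40 m, 40 m
```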
• Energy conservation for linear spring

Similarly, in the case of a linear spring, we have $F = -k x$, and hence, choosing (arbitrarily, but legitimately) $U(0) = 0$, we have

$$U(x) = \frac{1}{2}\, k\, x^2. \tag{11.116}$$

In this case, Eq. 11.109 yields (see Eq. 11.116)

$$\frac{1}{2}\, m\, v^2 + \frac{1}{2}\, k\, x^2 = E_0, \tag{11.117}$$

where $E_0 := \frac{1}{2} m\, v_0^2 + \frac{1}{2} k\, x_0^2$.

◦ Comment. Recall that the solution for an undamped harmonic oscillator is given by $x(t) = C \sin(\omega t + \chi)$ (Eq. 11.41), with $\omega^2 = k/m$ (Eq. 11.36). You may verify, by substitution into Eq. 11.117, that this solution satisfies the above energy equation, with $E_0 = \frac{1}{2} k\, C^2 = \frac{1}{2} m\, \omega^2 C^2$.
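The substitution suggested in the comment can also be carried out numerically. The Python sketch below samples the harmonic solution at a few instants and evaluates the left–hand side of Eq. 11.117; the values of $m$, $k$, $C$ and $\chi$ are arbitrary illustrative choices.

```python
import math

m, k = 2.0, 8.0           # illustrative mass and stiffness
omega = math.sqrt(k / m)  # natural frequency, omega^2 = k/m
C, chi = 0.5, 0.3         # illustrative amplitude and phase

def energy(t):
    """Mechanical energy (1/2) m v^2 + (1/2) k x^2 along x(t) = C sin(omega t + chi)."""
    x = C * math.sin(omega * t + chi)
    v = C * omega * math.cos(omega * t + chi)
    return 0.5 * m * v**2 + 0.5 * k * x**2

e0 = 0.5 * k * C**2  # expected constant value, here 1.0
samples = [energy(t) for t in (0.0, 0.4, 1.3, 2.7, 10.0)]
print(e0, samples)
```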
11.9.6 Minimum–energy theorem for statics

In this subsection, we return to statics. Specifically, here we exploit the above results on energy, not available in Chapter 4, to address static equilibrium, along with the corresponding issue of stability. We have the following

Theorem 131 (Minimum–energy theorem for a single particle). Consider a particle subject to a conservative force $F(x)$, and let $U(x)$ denote its potential energy. If $U(x)$ is stationary at $x = x_0$, then $x_0$ is an equilibrium point. In addition, a minimum at $x = x_0$ implies that the equilibrium is stable, whereas a maximum or a horizontal inflection implies that the equilibrium is unstable. Finally, if $U(x)$ is constant in $N = (x_0 - \delta, x_0 + \delta)$, then we have a so–called neutral equilibrium, for any $x \in N$.

◦ Proof : Consider the first part. The point $x_0$ is an equilibrium point iff we have $F(x_0) = -[dU/dx]_{x = x_0} = 0$. Thus, at $x = x_0$ the energy is stationary (Definition 190, p. 385, of stationary function), and vice versa. Next, regarding the second part of the statement, assume that $U(x)$ has a minimum at $x = x_0$. In this case, there exists a neighborhood of $x_0$ where the sign of $F(x)$ is opposite to that of $x - x_0$. This means that, if the particle is moved away from the equilibrium position, the resulting force tends to move the particle back to the equilibrium point (restoring force). In other words, the equilibrium is stable. Vice versa, if $U(x)$ has a maximum, there exists a neighborhood of $x_0$ where the sign of $F(x)$ is equal to that of $x - x_0$. Thus, if the particle is moved away from the equilibrium position, the resulting force tends to move the particle further away from the equilibrium point (repulsive force). In other words, the equilibrium is unstable. The same holds true in the case of a horizontal inflection point, because on one side of $x_0$ the behavior is equal to that of a maximum. Finally, if $U(x)$ is constant for any $x \in N$, then $F(x) = 0$, for any $x \in N$. The motion of the particle causes no force to arise, for any $x \in N$. We will refer to this as neutral equilibrium.

Remark 109. It is of interest to revisit the above minimum–energy theorem from a dynamics point of view. Assume that at $x = x_0$ the potential energy $U$ has a minimum. Consider a particle that is located at $x_0$, with zero velocity. Then, $E(x_0) = 0$ (Eq. 11.109). Any infinitesimal displacement away from $x_0$ would cause the potential energy to increase, and hence the kinetic energy to decrease, a clear impossibility, since $T$ (which is initially zero) cannot be negative (Eq. 11.91). Thus, the equilibrium is stable. On the contrary, if the potential energy $U$ has a maximum at $x = x_0$, any infinitesimal displacement away from $x_0$ would cause the potential energy to decrease, and hence the kinetic energy to increase, so as to move the particle further away from $x_0$. Thus, the equilibrium is unstable.
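The classification in the theorem can be mimicked by comparing the potential energy on the two sides of a stationary point. The Python sketch below is a rough numerical illustration, not a rigorous test: it assumes that $x_0$ is already known to be a stationary point of $U$, and the step `eps`, the tolerance `tol`, and the sample potentials (a double well, a cubic, and a constant) are hypothetical choices.

```python
def classify_equilibrium(U, x0, eps=1e-3, tol=1e-12):
    """Classify a stationary point x0 of U by comparing U slightly to the left and right of x0."""
    left = U(x0 - eps) - U(x0)
    right = U(x0 + eps) - U(x0)
    if abs(left) <= tol and abs(right) <= tol:
        return "neutral"   # U locally constant
    if left > 0.0 and right > 0.0:
        return "stable"    # local minimum: restoring force on both sides
    return "unstable"      # maximum or horizontal inflection

double_well = lambda x: 0.25 * x**4 - 0.5 * x**2  # stationary at x = -1, 0, 1
cubic = lambda x: x**3                            # horizontal inflection at x = 0
flat = lambda x: 2.0                              # constant potential

print(classify_equilibrium(double_well, 1.0))   # stable
print(classify_equilibrium(double_well, 0.0))   # unstable
print(classify_equilibrium(cubic, 0.0))         # unstable
print(classify_equilibrium(flat, 0.0))          # neutral
```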
11.9.7 Dynamics solution via energy considerations

It should be noted that Eq. 11.109 provides a very direct and convenient way to obtain a relationship between $x$ and $v$. As shown below, this in turn may be exploited to obtain the solution to the Newton second law via energy considerations, at least in implicit form, namely as $t = t(x)$, with $t(x)$ given in the form of an integral. Indeed, Eq. 11.109 may be written as

$$\frac{dx}{dt} = v = \left\{ \frac{2}{m} \big[ E_0 - U(x) \big] \right\}^{1/2}. \tag{11.118}$$

Remark 110. In the above equation, we have tacitly assumed $v > 0$. [A minus sign on the far–right term should be included whenever $v < 0$.]
The solution may be obtained with the method of separation of variables for ordinary differential equations (Subsection 10.5.1), to yield (use $t_0 = 0$)

$$t(x) = \sqrt{\frac{m}{2}} \int_{x_0}^{x} \frac{1}{\sqrt{E_0 - U(x_1)}}\, dx_1, \tag{11.119}$$

which provides us with an implicit expression of the function $x = x(t)$, through its inverse $t = t(x)$.
• Illustrative examples

In order to clarify the method of solution introduced above, we present three simple illustrative examples.

◦ Constant force. If the force $F$ is constant, we have $U(x) = -F x$ (Eq. 11.112). Thus, for $v > 0$ (Remark 110, p. 495), Eq. 11.119 yields

$$t = \sqrt{\frac{m}{2 F}} \int_0^x \frac{1}{\sqrt{x_1}}\, dx_1 = \sqrt{\frac{2\, m\, x}{F}}, \tag{11.120}$$

which is equivalent to $x(t) = \frac{1}{2} F t^2/m$, in agreement with Eq. 11.23. [Hint: Use $x_0 = v_0 = 0$ and hence $E_0 = 0$, along with $\int (1/\sqrt{x})\, dx = 2 \sqrt{x}$ (Eq. 10.32).]

◦ Linear spring. Consider the case of a linear spring, as in the example in Section 11.6 (Eq. 11.31). If $x(0) = 0$, Eq. 11.117 may be written as

$$\frac{1}{2}\, m\, v^2 + \frac{1}{2}\, k\, x^2 = \frac{1}{2}\, m\, v_0^2, \tag{11.121}$$

where $v_0 = v(0)$. Then, we have (assume $v > 0$, Remark 110, p. 495)

$$\begin{aligned}
t(x) &= \int_0^x \left( v_0^2 - k\, x_1^2/m \right)^{-1/2} dx_1 = \frac{1}{v_0} \int_0^x \left( 1 - \omega^2 x_1^2/v_0^2 \right)^{-1/2} dx_1 \\
&= \frac{1}{\omega} \int_0^{\omega x/v_0} \frac{1}{\sqrt{1 - u^2}}\, du = \frac{1}{\omega}\, \sin^{-1} \frac{\omega x}{v_0},
\end{aligned} \tag{11.122}$$

as you may verify. [Hint: Use Eq. 11.119 with $t_0 = x_0 = 0$, along with the transformation $u = \omega x_1/v_0$, with $\omega^2 = k/m$.] This yields $x(t) = (v_0/\omega) \sin \omega t$, in agreement with Eq. 11.40, with $x_0 = 0$.

◦ Comment. You might like to sharpen your skills and extend this to nonlinear springs with $F_S = -(k\, x + k\, x^3)$. [Note that the integral that you would obtain is not one that you would be able to solve with what you have learned thus far.]
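When the integral in Eq. 11.119 cannot be evaluated in closed form, it can still be approximated numerically. The Python sketch below applies a midpoint rule to the linear spring, for which the exact result of Eq. 11.122 is available for comparison. The helper name `time_to_reach` and the parameter values are hypothetical; the sketch is valid only while $v > 0$, that is, while $E_0 > U$ along the path.

```python
import math

def time_to_reach(U, e0, m, x0, x, n=20000):
    """Midpoint-rule estimate of t(x) = sqrt(m/2) * integral_{x0}^{x} dx1 / sqrt(E0 - U(x1))."""
    h = (x - x0) / n
    total = 0.0
    for i in range(n):
        x1 = x0 + (i + 0.5) * h
        total += 1.0 / math.sqrt(e0 - U(x1))
    return math.sqrt(m / 2.0) * total * h

m, k, v0 = 1.0, 4.0, 1.0  # illustrative values
omega = math.sqrt(k / m)
e0 = 0.5 * m * v0**2      # x(0) = 0, so the initial energy is purely kinetic
x_target = 0.3            # must stay below the turning point v0/omega = 0.5

t_numeric = time_to_reach(lambda s: 0.5 * k * s * s, e0, m, 0.0, x_target)
t_exact = math.asin(omega * x_target / v0) / omega  # Eq. 11.122
print(t_numeric, t_exact)
```

The same function may be called with a different potential energy, for instance that of a nonlinear spring, as long as the target point lies before the turning point where $E_0 = U$.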
Chapter 12
Logarithms and exponentials
Now that we know what an integral is, we are in a position to introduce some new functions that are very important in this book, namely logarithm, exponential and hyperbolic functions, along with their inverses.
• Overview of this chapter

In Section 12.1 we introduce the natural logarithm, as a specific primitive of 1/x. Its inverse, the exponential function, is addressed in Section 12.2. We also define the hyperbolic functions (Section 12.3), and their inverses (Section 12.4). Finally, we consider primitives involving logarithms and exponentials (Section 12.5). We conclude the chapter with an appendix (Section 12.6), where we address the issue: “Why the name hyperbolic functions.”
12.1 Natural logarithm

In this section, we introduce a very important function, the natural logarithm. Recall that we know how to evaluate the integral of $x^\alpha$ except for $\alpha = -1$ (Eq. 10.46). In order to remedy the situation, so as to have a rule that includes the case $\alpha = -1$, we introduce a new function through the following¹

¹ The Scottish mathematician and physicist John Napier of Merchiston (1550–1617), also known as Neper, introduced logarithms in 1614. He came up with this name, from λογος (namely logos, ancient Greek for word, reason) and αριθμος (arithmos, ancient Greek for number).
Definition 216 (Natural logarithm). The natural logarithm (or Napier logarithm), denoted by $\ln x$, is defined by

$$\ln x := \int_1^x \frac{1}{u}\, du, \qquad x \in (0, \infty). \tag{12.1}$$

[They are called natural logarithms in order to distinguish them from the closely related common logarithms (also known as base–10 logarithms, namely the exponent to which 10 must be raised to obtain a given number, as we all have learned in high school), for which the symbol $\log x$ is used. The base–10 logarithms are not used in this book; therefore the term “logarithm” may be used without fear of generating confusion.] The definition implies

$$\ln 1 = 0. \tag{12.2}$$

The function $\ln x$ is depicted in Fig. 12.1.

Fig. 12.1 The function y = ln x

12.1.1 Properties of ln x

Here, we explore the most important properties of logarithms.

• Derivative of ln x

The derivative of the logarithm is given by

$$\frac{d}{dx} \ln x = \frac{1}{x}. \tag{12.3}$$
This stems directly from its definition (Eq. 12.1) and the first fundamental theorem of calculus, which states that the derivative of an integral, considered as a function solely of its upper limit, equals the integrand evaluated at the upper limit (Eq. 10.38). We have assumed $x > 0$ (Eq. 12.1). Thus, the derivative of $\ln x$ is always positive, so that the function $\ln x$ is monotonically increasing, as apparent from Fig. 12.1. In other words,

$$a > b \quad \text{implies} \quad \ln a > \ln b. \tag{12.4}$$

In particular, we have (use Eq. 12.2)

$$\ln x \gtrless 0, \qquad \text{for } x \gtrless 1. \tag{12.5}$$
• Logarithms of products and powers

We have the following theorems

Theorem 132 (Logarithm of product). Given any two real positive numbers, $x$ and $y$, we have

$$\ln xy = \ln x + \ln y. \tag{12.6}$$

◦ Proof : Indeed, from the definition, we have

$$\ln xy := \int_1^{xy} \frac{1}{u}\, du = \int_1^x \frac{1}{u}\, du + \int_x^{xy} \frac{1}{u}\, du = \int_1^x \frac{1}{u}\, du + \int_1^y \frac{1}{v}\, dv = \ln x + \ln y. \tag{12.7}$$

[Hint: For the second equality, use the additivity of integration intervals (Eq. 10.8). For the third equality, set $u = x v$ in the second integral, so that $du = x\, dv$; regarding the new limits, note that $u = xy$ when $v = y$, whereas $u = x$ when $v = 1$. For the fourth equality, recall that the symbol used for the dummy variable of integration may be changed as we please (Eq. 10.5).]

Theorem 133 (Logarithm of $x^y$). Given two real numbers, $x > 0$ and $y$, we have

$$\ln x^y = y \ln x. \tag{12.8}$$
◦ Proof : Indeed, from the definition we have

$$\ln x^y := \int_1^{x^y} \frac{1}{u}\, du = \int_1^x \frac{1}{v^y}\, y\, v^{y-1}\, dv = y \int_1^x \frac{1}{v}\, dv = y \ln x. \tag{12.9}$$

[Hint: For the second equality, set $u = v^y$, so that $du = y\, v^{y-1}\, dv$ (use $du/dv = y\, v^{y-1}$, Eq. 9.39); regarding the new limits, we have $u = x^y$ when $v = x$, whereas $u = 1$ when $v = 1$.] In particular, we have

$$\ln (1/x) = \ln x^{-1} = -\ln x. \tag{12.10}$$
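The integral definition in Eq. 12.1 can be evaluated directly, which gives a concrete way to see the product and power rules at work. The Python sketch below approximates the integral with a midpoint rule and compares the results with the library function `math.log`; the helper name `ln_quad` and the sample arguments are illustrative.

```python
import math

def ln_quad(x, n=100000):
    """Midpoint-rule estimate of ln x = integral from 1 to x of du/u (Eq. 12.1), for x > 0."""
    h = (x - 1.0) / n  # negative when x < 1, which yields a negative logarithm
    return h * sum(1.0 / (1.0 + (i + 0.5) * h) for i in range(n))

ln2, ln3, ln6 = ln_quad(2.0), ln_quad(3.0), ln_quad(6.0)
print(ln6, ln2 + ln3)                     # product rule, Eq. 12.6
print(ln_quad(2.0**1.5), 1.5 * ln2)       # power rule, Eq. 12.8
print(ln_quad(0.5), -ln2)                 # reciprocal, Eq. 12.10
print(ln6, math.log(6.0))                 # comparison with the library logarithm
```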
• Logarithmic derivatives

It should be noted that we have, for any function $f(x)$ (use Eq. 12.3, as well as the Leibniz chain rule, Eq. 9.24)
$$\frac{d}{dx}\ln f(x) = \frac{1}{f(x)}\,\frac{df}{dx}, \tag{12.11}$$
or
$$\frac{df}{dx} = f(x)\,\frac{d}{dx}\ln f(x). \tag{12.12}$$
This approach, referred to as taking the logarithmic derivative, may be useful to simplify some mathematical operations. For instance, using Eq. 12.12 with $f(x) = x^\alpha$, we have, using Eq. 12.8,
$$\frac{dx^\alpha}{dx} = x^\alpha\,\frac{d}{dx}\ln x^\alpha = x^\alpha\,\frac{d}{dx}(\alpha \ln x) = x^\alpha\,\alpha\,\frac{1}{x} = \alpha\,x^{\alpha-1}, \tag{12.13}$$
in agreement with Eq. 9.39.

Remark 111. This is the proof of Eq. 9.39 alluded to in Remark 63, p. 311, on the derivative of $x^\alpha$ (Eq. 9.39). However, as mentioned there, Eq. 9.39 was used in the proof of Eq. 12.8. Hence, Eq. 12.8 could not have been used in the proof of Eq. 9.39. This would have been a circular-proof misdeed!
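As a further illustration (my own, not taken from the original text), the logarithmic derivative also handles the function $x^x$ (for $x > 0$), to which neither the power rule nor the rule for $a^x$ applies directly. Using Eq. 12.12, together with $\ln x^x = x \ln x$ (Eq. 12.8) and the product rule, one would expect:

```latex
\frac{d x^x}{dx}
  = x^x \, \frac{d}{dx} \ln x^x
  = x^x \, \frac{d}{dx} \left( x \ln x \right)
  = x^x \left( \ln x + 1 \right)
  \qquad (x > 0).
```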
• Asymptotic behavior of logarithm

Here, we want to obtain the asymptotic behavior of the function $\ln x$, namely its behavior as $x$ tends to $\infty$. Specifically, we want to show that the logarithm goes to infinity with $x$, but it does so more slowly than any positive real power of $x$. First, we show that
$$\lim_{x\to\infty} \ln x = \infty. \tag{12.14}$$
Indeed, consider Fig. 12.2.
Fig. 12.2 Limit of ln x, as x tends to ∞
Specifically, compare the area under the graph of the function $1/u$ and the area under the histogram corresponding to the function defined by $f(x) := 1/n$ for $x \in (n-1, n)$, with $n = 1, 2, \dots$. The dark gray area corresponds to $\sum_{n=2}^{\infty} 1/n$, whereas the light gray area represents the difference between the two areas. We have
$$\begin{aligned}
\lim_{x\to\infty} \ln x = \lim_{x\to\infty} \int_1^x \frac{1}{u}\,du
 &> \frac12 + \frac13 + \frac14 + \frac15 + \frac16 + \frac17 + \frac18 + \dots \\
 &> \frac12 + \frac14 + \frac14 + \frac18 + \frac18 + \frac18 + \frac18 + \dots \\
 &= \frac12 + \frac12 + \frac12 + \dots = \infty.
\end{aligned} \tag{12.15}$$
Thus, the logarithm tends to infinity as $x$ tends to infinity. However, it does so more slowly than any positive real power of $x$ (namely $x^\alpha$, with $\alpha > 0$ real). Specifically, we have
$$\lim_{x\to\infty} \frac{\ln x}{x^\alpha} = 0 \qquad (\alpha > 0). \tag{12.16}$$
Indeed, both numerator and denominator tend to infinity. Then, we can apply the l'Hôpital rule (Eq. 9.85) and obtain
$$\lim_{x\to\infty} \frac{\ln x}{x^\alpha} = \lim_{x\to\infty} \frac{1/x}{\alpha\,x^{\alpha-1}} = \lim_{x\to\infty} \frac{1}{\alpha\,x^\alpha} = 0, \tag{12.17}$$
since, for $\alpha > 0$, $x^\alpha$ goes to infinity with $x$ (Eq. 8.71).

Finally, consider the asymptotic behavior as $x$ tends to $0^+$ (defined in Footnote 7, p. 321). As can also be seen from Fig. 12.1, we have
$$\lim_{x\to 0^+} \ln x = -\infty, \tag{12.18}$$
as you may verify. [Set $x = 1/u$ and use $\ln(1/u) = -\ln u$ (Eq. 12.10), as well as Eq. 12.14.]
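To get a feeling for how slowly the ratio in Eq. 12.16 decays when $\alpha$ is small, here is a short numerical sketch. It is only an illustration under my own assumptions: the value $\alpha = 0.1$, the sample points and the helper name `log_over_power` are not from the original text.

```python
import math

def log_over_power(x, alpha):
    """Ratio ln x / x**alpha appearing in Eq. 12.16 (illustrative helper)."""
    return math.log(x) / x ** alpha

alpha = 0.1  # a small positive power, chosen only for illustration
for exponent in (10, 50, 100, 200):
    x = 10.0 ** exponent
    print(f"x = 1e{exponent}: ln x / x^alpha = {log_over_power(x, alpha):.3e}")
```

With such a small $\alpha$ the ratio is still of order one at $x = 10^{10}$, and becomes negligible only for astronomically large $x$. The limit is zero nonetheless.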
• Evaluation of $\lim_{x\to 0^+} x^x$ and $\lim_{x\to\infty} x^{1/x}$ ♣

In Remark 9, p. 37, we said that the value of $0^0$ is best discussed case by case. Here, we show how to evaluate $\lim_{x\to 0^+} x^x$, where the function $x^x$ is defined in Eq. 6.131. Before, we were missing a key ingredient, namely the logarithm. Accordingly, now it's the right time to tackle the issue. To evaluate the limit, we consider first the logarithm of $x^x$. We have
$$\lim_{x\to 0^+} \left[\ln x^x\right] = \lim_{x\to 0^+} \left[x \ln x\right] = \lim_{x\to 0^+} \frac{\ln x}{1/x} = \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = 0, \tag{12.19}$$
as you may verify. [Hint: Note that both $1/x$ and $|\ln x|$ tend to infinity as $x$ tends to $0^+$. Hence, for the third equality, we can use the l'Hôpital rule (Eq. 9.85).] Accordingly, we have (use $\ln 1 = 0$, Eq. 12.2)
$$\lim_{x\to 0^+} x^x = 1. \tag{12.20}$$
Since we are at it, let us consider the function $x^{1/x}$ (introduced in Eq. 6.132) and show that
$$\lim_{x\to\infty} x^{1/x} = 1. \tag{12.21}$$
To this end, let us begin with
$$\lim_{x\to\infty} \ln x^{1/x} = \lim_{x\to\infty} \frac{\ln x}{x} = 0, \tag{12.22}$$
as you may verify. [Hint: For the second equality use Eq. 12.16, with $\alpha = 1$.] This in turn yields Eq. 12.21 (use again $\ln 1 = 0$, Eq. 12.2).
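The two limits in Eqs. 12.20 and 12.21 are easy to observe numerically. The sketch below is only an illustration: the helper names and the sample points are my own choices, and floating-point evaluation obviously proves nothing by itself.

```python
def x_to_x(x):
    """x**x for x > 0 (illustrative helper)."""
    return x ** x

def x_to_inv_x(x):
    """x**(1/x) for x > 0 (illustrative helper)."""
    return x ** (1.0 / x)

for k in (1, 3, 6, 9):
    small, large = 10.0 ** (-k), 10.0 ** k
    print(f"x = 1e-{k}: x^x = {x_to_x(small):.10f}   "
          f"x = 1e{k}: x^(1/x) = {x_to_inv_x(large):.10f}")
```

The first column approaches 1 from below, the second from above, in agreement with the sign of $x \ln x$ near $0^+$ and of $(\ln x)/x$ for large $x$.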
12.2 Exponential

Finally, we may introduce the exponential function. We have the following

Definition 217 (Exponential). The exponential function (or simply the exponential) $y = \exp x$ is defined as the inverse of the logarithm, namely
$$y = \exp x, \quad\text{iff}\quad x = \ln y. \tag{12.23}$$
In particular, we have
$$\exp 0 = 1, \tag{12.24}$$
because $\ln 1 = 0$ (Eq. 12.2). Next, let us introduce the Napier number $e$, namely$^2$

Definition 218 (Napier number). The Napier number $e$ is defined by
$$e := \exp 1, \tag{12.25}$$
which is equivalent to (use Eq. 12.23)
$$\ln e := \int_1^e \frac{1}{u}\,du = 1. \tag{12.26}$$
• Exponential function vs $e^x$

Note that, for any real number $a > 0$, $y = a^x$ implies $\ln y = x \ln a$ (Eq. 12.8). In particular, for $a = e$ we have that $y = e^x$ implies $x = \ln y$ (use Eq. 12.26). In other words, the exponential function $\exp x$ coincides with $e^x$:
$$\exp x = e^x. \tag{12.27}$$
Accordingly, from now on, we will denote the exponential function by $e^x$. In particular, we have (use Eq. 12.24)
$$e^0 = 1. \tag{12.28}$$
Also, we have (use Eq. 12.23)
$$\ln e^x = x \quad\text{and}\quad e^{\ln x} = x. \tag{12.29}$$

$^2$ Named after John Napier. Apparently, $e$ was introduced by Euler. Indeed, some authors refer to $e$ as the Euler number.
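Definition 217 can be turned into a computation: to evaluate $\exp x$, solve $\ln y = x$ for $y$. The sketch below does this with Newton's method, which is my own choice of root finder and not something prescribed by the text. The function name and tolerances are hypothetical, and `math.log` stands in for the integral of Eq. 12.1. For $x < 0$ the code uses $\exp(-x) = 1/\exp x$, which follows from $\ln(1/y) = -\ln y$ (Eq. 12.10).

```python
import math

def exp_by_inversion(x, tol=1e-14, max_iter=200):
    """Solve ln y = x for y (Eq. 12.23) by Newton's method.

    Illustrative sketch: Newton's method and the tolerances are assumptions.
    """
    if x < 0:
        # ln(1/y) = -ln y (Eq. 12.10), hence exp(-x) = 1 / exp(x)
        return 1.0 / exp_by_inversion(-x, tol, max_iter)
    y = 1.0  # ln 1 = 0 <= x, so the iteration starts at or below the root
    for _ in range(max_iter):
        # Newton step for g(y) = ln y - x, using g'(y) = 1/y (Eq. 12.3)
        y_new = y * (1.0 + x - math.log(y))
        if abs(y_new - y) <= tol * y_new:
            return y_new
        y = y_new
    return y

print("e          ~", exp_by_inversion(1.0))
print("exp(-1.5)  ~", exp_by_inversion(-1.5))
```

Starting from $y = 1$ keeps every iterate positive when $x \ge 0$, because the logarithm is concave and increasing; this is the reason for treating negative $x$ separately.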
The graph of $e^x$ is presented in Fig. 12.3. Note that the graph of $e^x$ is obtained from that of $\ln x$, through a rotation around the line that forms a 45° angle with the $x$-axis, as it should, since one is the inverse of the other (Subsection 6.6.1). [The graph of $e^{-x}$ is also presented in the same figure. Note that the graph of $e^{-x}$ is obtained from that of $e^x$, through a reflection around the $y$-axis, since it involves simply a change in the pointing of the $x$-axis.]

Fig. 12.3 The functions $y = e^x$ and $y = e^{-x}$

12.2.1 Properties of exponential

Here, we explore the most important properties of the exponential function.
• Derivative of $e^x$

We have the following

Theorem 134 (Derivative of $e^x$). The derivative and the primitive of the exponential function $e^x$ are given by
$$\frac{de^x}{dx} = e^x \quad\text{and}\quad \int e^x\,dx = e^x. \tag{12.30}$$
◦ Proof: The second in Eq. 12.30 is equivalent to the first. So, let's consider the first. Setting $y := e^x$, we have
$$\frac{de^x}{dx} = \frac{1}{d(\ln y)/dy} = \frac{1}{1/y} = y = e^x. \tag{12.31}$$
[Hint: For the first equality, use the rule for the derivatives of inverse functions (Eq. 9.26). Then, use $d(\ln y)/dy = 1/y$ (Eq. 12.3).]

The fact that the derivative of $e^x$ equals $e^x$ itself makes the exponential function very useful. For the material dealt with in this book, we can say that the exponential is the queen of the functions. [This is even more true in the complex field, addressed in Vol. II.]
• Exponentials of sum and product

We have the following

Theorem 135. Given two arbitrary real numbers, $x$ and $y$, we have
$$e^{x+y} = e^x\,e^y. \tag{12.32}$$
◦ Proof: Starting from $\ln uv = \ln u + \ln v$ (Eq. 12.6), and setting $u = e^x$ and $v = e^y$ (namely $x = \ln u$ and $y = \ln v$, Eq. 12.29), one obtains $\ln(e^x e^y) = x + y$, which is fully equivalent to Eq. 12.32.

In particular, we have $e^x e^{-x} = e^0$, namely
$$e^{-x} = 1/e^x, \tag{12.33}$$
since $e^0 = 1$ (Eq. 12.28). In addition, we have the following

Theorem 136. Given two arbitrary real numbers, $x$ and $y$, we have
$$\left(e^x\right)^y = e^{xy}. \tag{12.34}$$
◦ Proof: Using $\ln a^y = y \ln a$ (Eq. 12.8) and $\ln e^x = x$ (Eq. 12.29), we have $\ln[(e^x)^y] = y \ln e^x = x\,y$, which is fully equivalent to Eq. 12.34.

◦ Comment. Equations 12.32, 12.33 and 12.34 may also be obtained from $a^{x+y} = a^x a^y$ (Eq. 6.126), $a^{-x} = 1/a^x$ (Eq. 6.101) and $(a^x)^y = a^{xy}$ (Eq. 6.127), by setting $a = e$. However, to me the approach used here appears preferable, as it emphasizes the relationship with the corresponding ones for the logarithms (Eqs. 12.6, 12.8 and 12.10).
• Asymptotic behavior of exponential

Here, we study the asymptotic behavior of the exponential function, that is, its behavior as $x$ tends to $\infty$. Specifically, we want to show that $e^x$ goes to infinity faster than any positive power of $x$.

As pointed out in Section 6.6, the range of the inverse function coincides with the domain of the original function (and vice versa). Since the domain of definition of the logarithm is the positive $x$-semiaxis, we have that the range of $e^x$ is the positive $y$-semiaxis. In addition, the range of the logarithm is the whole $y$-axis, and hence the domain of definition of $e^x$ is the whole $x$-axis. In other words, we have
$$e^x > 0 \qquad \left[x \in (-\infty, \infty)\right]. \tag{12.35}$$
Therefore, the derivative of $e^x$, which coincides with $e^x$, is also always positive. Hence, $e^x$ is monotonically increasing. [This is consistent with the same result for the logarithm, since their derivatives are the reciprocal of each other.] Also, $\lim_{x\to\infty} \ln x = \infty$ (Eq. 12.14) implies
$$\lim_{x\to\infty} e^x = \infty. \tag{12.36}$$
Thus, the exponential goes to infinity with $x$. To be precise, it does so faster than any positive real power of $x$, namely
$$\lim_{x\to\infty} \frac{x^\alpha}{e^x} = 0 \qquad (\alpha > 0). \tag{12.37}$$
Indeed, applying the l'Hôpital rule (Eq. 9.85) $n$ times, with $n \in (\alpha, \alpha + 1]$, we have
$$\lim_{x\to\infty} \frac{x^\alpha}{e^x} = \alpha(\alpha - 1) \cdots (\alpha - n + 1) \lim_{x\to\infty} \frac{x^{\alpha-n}}{e^x} = 0, \tag{12.38}$$
since $\lim_{x\to\infty} x^{\alpha-n} = 0$, whenever $\alpha - n < 0$ (Eq. 8.71). Finally, $e^{-x} = 1/e^x$ (Eq. 12.33) implies
$$\lim_{x\to\infty} e^{-x} = \lim_{x\to-\infty} e^x = 0. \tag{12.39}$$
[See also Fig. 12.3.] This is consistent with the fact that $\lim_{x\to 0^+} \ln x = -\infty$ (Eq. 12.18).
• The function $a^x$ in terms of the exponential

We conclude this subsection by showing how the function $a^x$ can be expressed in terms of the exponential function, even when $a \neq e$. Indeed, given $y = a^x$, setting $a = e^\alpha$ (namely $\alpha := \ln a$), and using Eq. 12.34, yields
$$a^x = e^{\alpha x} \qquad (\alpha = \ln a). \tag{12.40}$$
For this reason, the function $a^x$ is rarely used in practice; $e^{\alpha x}$ is widely used in its place. [Indeed, I don't remember ever using $a^x$ after my freshman year in college.] Thus, for the derivative of $a^x$, we have (use Eq. 12.40 and the chain rule as given in Eq. 9.64)
$$\frac{da^x}{dx} = \frac{de^{\alpha x}}{dx} = \alpha\,e^{\alpha x} = a^x \ln a. \tag{12.41}$$
The same result may be obtained from $[\ln a^x]' = [x \ln a]'$, as you may verify. [Hint: Use the logarithmic derivative $f'(x) = f(x)\,[\ln f(x)]'$ (Eq. 12.12), with $\ln a^x = x \ln a$ (Eq. 12.8).]
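Equations 12.40 and 12.41 can be spot-checked in a few lines. This is a minimal sketch: the helper names, the central-difference step and the sample values $a = 2$, $x = 1.5$ are my own illustrative assumptions.

```python
import math

def power_via_exp(a, x):
    """a**x computed as exp(alpha * x), with alpha = ln a (Eq. 12.40)."""
    if a <= 0:
        raise ValueError("a must be positive")
    return math.exp(x * math.log(a))

def central_difference(f, x, h=1e-5):
    """Symmetric finite-difference estimate of f'(x) (illustrative step h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

a, x = 2.0, 1.5
numeric = central_difference(lambda t: power_via_exp(a, t), x)
exact = power_via_exp(a, x) * math.log(a)  # right-hand side of Eq. 12.41

print("2**10 via exp:      ", power_via_exp(2.0, 10.0))
print("numerical derivative:", numeric)
print("a**x * ln a:         ", exact)
```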
12.3 Hyperbolic functions

Here, we introduce the so-called hyperbolic functions: $\sinh x$, $\cosh x$ and $\tanh x$. [The reason for the name hyperbolic functions is addressed in Section 12.6.] We have the following

Definition 219 (Hyperbolic functions). The hyperbolic functions, $\sinh x$ (hyperbolic sine), $\cosh x$ (hyperbolic cosine), and $\tanh x$ (hyperbolic tangent) are defined as follows:
$$\sinh x := \tfrac12 \left(e^x - e^{-x}\right), \tag{12.42}$$
$$\cosh x := \tfrac12 \left(e^x + e^{-x}\right), \tag{12.43}$$
$$\tanh x := \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{1 - e^{-2x}}{1 + e^{-2x}}. \tag{12.44}$$
Note that
$$\cosh x \pm \sinh x = e^{\pm x}, \tag{12.45}$$
and
$$\cosh^2 x - \sinh^2 x = 1, \tag{12.46}$$
$$\cosh^2 x + \sinh^2 x = \cosh 2x, \tag{12.47}$$
as you may verify.

◦ Comment. Note the analogy of: (i) Eq. 12.46 with $\cos^2 x + \sin^2 x = 1$ (Eq. 6.75), and (ii) of Eq. 12.47 with $\cos^2 x - \sin^2 x = \cos 2x$ (Eq. 7.65). [A deeper analysis of this analogy will be obtained within the complex field, addressed in Vol. II.]

The hyperbolic functions are depicted in Fig. 12.4.

Fig. 12.4 Hyperbolic functions: (1) $y_1(x) = \sinh x$, (2) $y_2(x) = \cosh x$, (3) $y_3(x) = \tanh x$
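If you prefer to let the computer do the "as you may verify" part, Definition 219 translates directly into code. The sketch below is an illustration only: it deliberately shadows the library names `sinh`, `cosh` and `tanh` with versions built from `math.exp`, and the use of oddness in `tanh` (to avoid overflow for large negative arguments) is my own implementation choice.

```python
import math

def sinh(x):
    return 0.5 * (math.exp(x) - math.exp(-x))   # Eq. 12.42

def cosh(x):
    return 0.5 * (math.exp(x) + math.exp(-x))   # Eq. 12.43

def tanh(x):
    # Last expression in Eq. 12.44, evaluated at |x| and extended by oddness,
    # so that exp() is never called with a large positive argument.
    e = math.exp(-2.0 * abs(x))
    return math.copysign((1.0 - e) / (1.0 + e), x)

x = 0.7
print("cosh + sinh - e^x     :", cosh(x) + sinh(x) - math.exp(x))       # Eq. 12.45
print("cosh - sinh - e^(-x)  :", cosh(x) - sinh(x) - math.exp(-x))      # Eq. 12.45
print("cosh^2 - sinh^2 - 1   :", cosh(x) ** 2 - sinh(x) ** 2 - 1.0)     # Eq. 12.46
print("cosh^2 + sinh^2 - c2x :", cosh(x) ** 2 + sinh(x) ** 2 - cosh(2 * x))  # Eq. 12.47
```

Each printed residual should vanish to within rounding error.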
• Derivatives of hyperbolic functions

We have the following

Theorem 137 (Derivatives of hyperbolic functions). The derivatives of hyperbolic sine, cosine, tangent are given by
$$\frac{d}{dx} \sinh x = \cosh x, \tag{12.48}$$
$$\frac{d}{dx} \cosh x = \sinh x, \tag{12.49}$$
$$\frac{d}{dx} \tanh x = 1 - \tanh^2 x = \frac{1}{\cosh^2 x}. \tag{12.50}$$
◦ Proof of Eq. 12.48. Using $de^{\pm x}/dx = \pm e^{\pm x}$ (Eq. 12.30) and the definitions of hyperbolic sine and cosine (Eqs. 12.42 and 12.43), we obtain
$$\frac{d}{dx} \sinh x = \tfrac12 \left(e^x + e^{-x}\right) = \cosh x. \tag{12.51}$$
◦ Proof of Eq. 12.49. Similarly, using again Eqs. 12.30, 12.42 and 12.43, we obtain
$$\frac{d}{dx} \cosh x = \tfrac12 \left(e^x - e^{-x}\right) = \sinh x. \tag{12.52}$$
◦ Proof of Eq. 12.50. We have
$$\begin{aligned}
\frac{d}{dx} \tanh x &= \frac{1}{\cosh^2 x} \left[\cosh x\,\frac{d}{dx} \sinh x - \sinh x\,\frac{d}{dx} \cosh x\right] \\
&= \frac{1}{\cosh^2 x} \left(\cosh^2 x - \sinh^2 x\right) = 1 - \tanh^2 x = \frac{1}{\cosh^2 x}.
\end{aligned} \tag{12.53}$$
[Hint: Use the quotient rule (Eq. 9.22), as well as Eqs. 12.48 and 12.49, and recall that $\cosh^2 x - \sinh^2 x = 1$ (Eq. 12.46).]

◦ Comment. Note the analogy of Eqs. 12.48–12.50, with the derivatives of $\sin x$, $\cos x$ and $\tan x$ (Eqs. 9.70–9.72). [Again, a deeper analysis of this analogy will be obtained within the complex field, addressed in Vol. II.]
• More properties of hyperbolic functions

Note that the definitions imply that $\sinh x$ and $\tanh x$ are odd functions, whereas $\cosh x$ is an even function, as you may verify. In addition, still from the definition, we have (recall that $e^0 = 1$, Eq. 12.28)
$$\sinh 0 = 0, \qquad \cosh 0 = 1, \qquad \tanh 0 = 0. \tag{12.54}$$
Also, we have $\cosh x > 0$, because $e^{\pm x} > 0$, for $x \in (-\infty, \infty)$ (Eq. 12.35). Therefore, for $x \in (-\infty, \infty)$, $\sinh x$ has a positive derivative, and hence it is a monotonically growing function. In view of the fact that $\sinh 0 = 0$ (Eq. 12.54), we have
$$\sinh x \gtrless 0, \quad\text{for } x \gtrless 0, \tag{12.55}$$
as shown in Fig. 12.4, p. 508. Accordingly, $\cosh x$ is decreasing for $x < 0$ and increasing for $x > 0$. In view of the fact that $\cosh 0 = 1$ (Eq. 12.54), we have
$$\cosh x > 1, \quad\text{for } x \neq 0, \tag{12.56}$$
as shown in Fig. 12.4, p. 508.
• Formulas of addition for hyperbolic functions ♣

Moreover, we have
$$\begin{aligned}
\sinh(x + y) &= \sinh x \cosh y + \cosh x \sinh y, \\
\cosh(x + y) &= \cosh x \cosh y + \sinh x \sinh y, \\
\tanh(x + y) &= (\tanh x + \tanh y)/(1 + \tanh x \tanh y).
\end{aligned} \tag{12.57}$$
Indeed, using Eq. 12.32, we have
$$\begin{aligned}
\sinh x \cosh y + \cosh x \sinh y
&= \tfrac14 \left(e^x - e^{-x}\right)\left(e^y + e^{-y}\right) + \tfrac14 \left(e^x + e^{-x}\right)\left(e^y - e^{-y}\right) \\
&= \tfrac12 \left(e^{x+y} - e^{-(x+y)}\right) = \sinh(x + y),
\end{aligned} \tag{12.58}$$
$$\begin{aligned}
\cosh x \cosh y + \sinh x \sinh y
&= \tfrac14 \left(e^x + e^{-x}\right)\left(e^y + e^{-y}\right) + \tfrac14 \left(e^x - e^{-x}\right)\left(e^y - e^{-y}\right) \\
&= \tfrac12 \left(e^{x+y} + e^{-(x+y)}\right) = \cosh(x + y).
\end{aligned} \tag{12.59}$$
On the other hand, using the last two equations, one obtains
$$\tanh(x + y) = \frac{\sinh x \cosh y + \cosh x \sinh y}{\cosh x \cosh y + \sinh x \sinh y} = \frac{\tanh x + \tanh y}{1 + \tanh x \tanh y}. \tag{12.60}$$
◦ Comment. Note the analogy of Eq. 12.57 with the addition formulas for the trigonometric functions (Eqs. 7.47–7.49). [Again, a deeper analysis of this analogy will be obtained within the complex field, addressed in Vol. II.]
• Asymptotic behavior of hyperbolic functions ♣

From the definition of the hyperbolic functions (Eqs. 12.42–12.44) and from the asymptotic behavior of the exponential, namely $\lim_{x\to\infty} e^x = \infty$ (Eq. 12.36) and $\lim_{x\to\infty} e^{-x} = 0$ (Eq. 12.39), one obtains
$$\lim_{x\to\pm\infty} \sinh x = \pm\infty, \tag{12.61}$$
$$\lim_{x\to\pm\infty} \cosh x = \infty, \tag{12.62}$$
$$\lim_{x\to\pm\infty} \tanh x = \pm 1, \tag{12.63}$$
as shown in Fig. 12.4, p. 508. [For Eq. 12.63, use the last expression in Eq. 12.44.]
12.4 Inverse hyperbolic functions ♣

Here, we introduce the inverse functions of the hyperbolic functions $\sinh x$, $\cosh x$ and $\tanh x$, denoted respectively by $\sinh^{-1} x$, $\cosh^{-1} x$ and $\tanh^{-1} x$. The inverse hyperbolic functions are depicted in Fig. 12.5. [Compare to Fig. 12.4, p. 508, of the hyperbolic functions.]

Fig. 12.5 Inverse hyperbolic functions: (1) $\sinh^{-1} x$, (2) $\cosh^{-1} x$, (3) $\tanh^{-1} x$
12.4.1 Derivatives of inverse hyperbolic functions ♣

We have the following

Theorem 138 (Derivatives of inverse hyperbolic functions). The derivatives of the inverse hyperbolic functions are given by
$$\frac{d}{dx} \sinh^{-1} x = \frac{1}{\sqrt{1 + x^2}}, \tag{12.64}$$
$$\frac{d}{dx} \cosh^{-1} x = \frac{\pm 1}{\sqrt{x^2 - 1}}, \tag{12.65}$$
$$\frac{d}{dx} \tanh^{-1} x = \frac{1}{1 - x^2}. \tag{12.66}$$
◦ Proof of Eq. 12.64. Setting $y = \sinh^{-1} x$, and using the rule for the derivative of the inverse function (Eq. 9.26), along with Eq. 12.48, we have
$$\frac{d}{dx} \sinh^{-1} x = \frac{1}{d \sinh y/dy} = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}} = \frac{1}{\sqrt{1 + x^2}}. \tag{12.67}$$
◦ Proof of Eq. 12.65. Setting $y = \cosh^{-1} x$ ($x \ge 1$), and using again Eq. 9.26 along with Eq. 12.49, we have (the function $\operatorname{sgn} y$ is defined in Eq. 8.99)
$$\frac{d}{dx} \cosh^{-1} x = \frac{1}{d \cosh y/dy} = \frac{1}{\sinh y} = \frac{\operatorname{sgn} y}{\sqrt{\cosh^2 y - 1}} = \frac{\pm 1}{\sqrt{x^2 - 1}}. \tag{12.68}$$
◦ Proof of Eq. 12.66. Similarly, we have
$$\frac{d}{dx} \tanh^{-1} x = \frac{1}{d \tanh y/dy} = \frac{1}{1 - \tanh^2 y} = \frac{1}{1 - x^2}. \tag{12.69}$$
[Hint: Set $y = \tanh^{-1} x$, and use again Eq. 9.26 along with Eq. 12.50.]
12.4.2 Explicit expressions
♠
Contrary to the inverse trigonometric functions, the inverse hyperbolic functions may be expressed in terms of elementary functions.
• Inverse hyperbolic sine

Consider first $y = \sinh^{-1} x$, namely the inverse of $x = \sinh y = \tfrac12 (e^y - e^{-y})$ (Eq. 12.42), or
$$x = \tfrac12 (\sigma - 1/\sigma), \quad\text{where } \sigma = e^y > 0 \quad (\text{namely } y = \ln \sigma). \tag{12.70}$$
Thus, we have obtained a quadratic equation, $\sigma^2 - 2x\sigma - 1 = 0$, whose roots are $\sigma_\pm = x \pm \sqrt{1 + x^2}$ (Eq. 6.36). The root $\sigma_+ = x + \sqrt{x^2 + 1}$ is the only positive one, as required by Eq. 12.70. Indeed, we have $\sigma_- = -1/\sigma_+$, because here (use Eq. 6.40)
$$\sigma_- \sigma_+ = c/a = -1. \tag{12.71}$$
Accordingly, using $y = \ln \sigma$ (Eq. 12.70), we obtain $\sinh^{-1} x = y = \ln \sigma_+$, or
$$\sinh^{-1} x = \ln \left(x + \sqrt{1 + x^2}\right). \tag{12.72}$$
◦ Comment. Note that, although not immediately apparent, $\ln\left(x + \sqrt{1 + x^2}\right)$ is an odd function (as it should, since it's the inverse of an odd function). Indeed,
$$\ln \left(-x + \sqrt{1 + x^2}\right) = \ln \frac{1}{x + \sqrt{1 + x^2}} = -\ln \left(x + \sqrt{1 + x^2}\right). \tag{12.73}$$
[Hint: For the first equality, use $-\sigma_- \sigma_+ = 1$ (Eq. 12.71); for the second, use $\ln(1/x) = -\ln x$ (Eq. 12.10).]

◦ Comment. The reason for ignoring the negative root deserves additional comments. The root $\sigma_-$ is spurious, since it is incompatible with Eq. 12.70. [We have encountered spurious roots in Subsubsection "Spurious roots," on p. 290.] Nonetheless $\sigma_-$ is meaningful. We can set $\sigma = -e^{-y} < 0$ (instead of $\sigma = e^y > 0$, Eq. 12.70), and obtain again $x = \tfrac12 (\sigma - 1/\sigma)$, which gives us the same roots $\sigma_\pm$. In this case, we would choose the root $\sigma_- = x - \sqrt{x^2 + 1} < 0$. Then, we would obtain again (use $-\sigma_- \sigma_+ = 1$, Eq. 12.71)
$$\sinh^{-1} x = y = -\ln(-\sigma_-) = \ln(\sigma_+) = \ln \left(x + \sqrt{1 + x^2}\right). \tag{12.74}$$
• Inverse hyperbolic cosine

Next, let us consider $y = \cosh^{-1} x$ ($x \ge 1$), which corresponds to $x = \cosh y = \tfrac12 (e^y + e^{-y}) \ge 1$ (Eq. 12.43). Note that the function $\cosh^{-1} x$ is double-valued, because $\cosh x$ is an even function. For simplicity, let us limit the analysis to the branch with $y \ge 0$ (positive branch). Setting $\sigma = e^y \ge 1$ (namely $y = \ln \sigma$), we obtain $x = \tfrac12 (\sigma + 1/\sigma) \ge 1$, or $\sigma^2 - 2x\sigma + 1 = 0$. Recalling again the formula for the roots of a quadratic equation (Eq. 6.36), one obtains $\sigma_\pm = x \pm \sqrt{x^2 - 1}$. We have $\sigma_+ = x + \sqrt{x^2 - 1} \ge 1$ (since $x \ge 1$), whereas $\sigma_- = x - \sqrt{x^2 - 1} \le 1$, because now (use Eq. 6.40)
$$\sigma_- \sigma_+ = c/a = 1. \tag{12.75}$$
Accordingly, we have $\cosh^{-1} x = y = \ln \sigma_+$, or
$$\cosh^{-1} x = \ln \left(x + \sqrt{x^2 - 1}\right) \ge 0 \qquad (x > 1). \tag{12.76}$$
◦ Comment. Recall that in the above analysis we have limited ourselves to the positive branch of $y = \cosh^{-1} x$. Accordingly, we have set $\sigma = e^y \ge 1$ and used only the root $\sigma_+ = x + \sqrt{x^2 - 1} \ge 1$. What happens if we use instead $\sigma_- = x - \sqrt{x^2 - 1}$? Recall that $\sigma_- = 1/\sigma_+$ (Eq. 12.75). Accordingly, we have $\ln \sigma_- = -\ln \sigma_+$, namely the negative branch.

• Inverse hyperbolic tangent

Finally, let us consider $y = \tanh^{-1} x$, which corresponds to $x = \tanh y = (e^y - e^{-y})/(e^y + e^{-y}) = (e^{2y} - 1)/(e^{2y} + 1)$ (Eq. 12.44). Setting $\sigma = e^y > 0$ (namely $y = \ln \sigma$), we obtain $x = (\sigma^2 - 1)/(\sigma^2 + 1)$, or $\sigma = \sqrt{(1 + x)/(1 - x)}$. Accordingly, we have $\tanh^{-1} x = y = \ln \sigma = \ln \sqrt{(1 + x)/(1 - x)}$, or
$$\tanh^{-1} x = \frac12 \ln \frac{1 + x}{1 - x}. \tag{12.77}$$
Note that $\tanh^{-1} x$ is an odd function, as it should, since it is the inverse of an odd function.
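The explicit expressions in Eqs. 12.72, 12.76 and 12.77 are short enough to code directly and compare against the library routines `math.asinh`, `math.acosh` and `math.atanh`. This is a minimal sketch; the function names with the `_log` suffix are hypothetical, and for the inverse hyperbolic cosine only the positive branch is returned.

```python
import math

def asinh_log(x):
    """Eq. 12.72: ln(x + sqrt(1 + x^2)), valid for every real x."""
    return math.log(x + math.sqrt(1.0 + x * x))

def acosh_log(x):
    """Eq. 12.76: positive branch only, defined for x >= 1."""
    if x < 1.0:
        raise ValueError("cosh^-1 x requires x >= 1")
    return math.log(x + math.sqrt(x * x - 1.0))

def atanh_log(x):
    """Eq. 12.77: defined for |x| < 1."""
    if abs(x) >= 1.0:
        raise ValueError("tanh^-1 x requires |x| < 1")
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

print(asinh_log(0.8), math.asinh(0.8))
print(acosh_log(2.0), math.acosh(2.0))
print(atanh_log(0.3), math.atanh(0.3))
print("oddness check:", asinh_log(-0.8) + asinh_log(0.8))  # Eq. 12.73
```

A practical caveat, outside the scope of the text: for large negative $x$ the sum $x + \sqrt{1 + x^2}$ suffers from cancellation, which is one reason library implementations do not use Eq. 12.72 verbatim.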
• Derivatives again ♠

Note that the same result for the derivatives of $\sinh^{-1} x$, $\cosh^{-1} x$ and $\tanh^{-1} x$ may be obtained by differentiating Eqs. 12.72, 12.76 and 12.77. Specifically, we have
$$\frac{d}{dx} \sinh^{-1} x = \frac{d}{dx} \ln \left(x + \sqrt{1 + x^2}\right) = \frac{1}{x + \sqrt{1 + x^2}} \left(1 + \frac{x}{\sqrt{1 + x^2}}\right) = \frac{1}{\sqrt{1 + x^2}}, \tag{12.78}$$
in agreement with Eq. 12.64. Similarly,
$$\frac{d}{dx} \cosh^{-1} x = \pm \frac{d}{dx} \ln \left(x + \sqrt{x^2 - 1}\right) = \frac{\pm 1}{x + \sqrt{x^2 - 1}} \left(1 + \frac{x}{\sqrt{x^2 - 1}}\right) = \frac{\pm 1}{\sqrt{x^2 - 1}} \qquad (x > 1), \tag{12.79}$$
in agreement with Eq. 12.65. Finally (use the quotient rule for derivatives, Eq. 9.22)
$$\frac{d}{dx} \tanh^{-1} x = \frac12 \frac{d}{dx} \ln \frac{1 + x}{1 - x} = \frac12\,\frac{1 - x}{1 + x}\,\frac{(1 - x) + (1 + x)}{(1 - x)^2} = \frac{1}{1 - x^2} \qquad (|x| < 1), \tag{12.80}$$
in agreement with Eq. 12.66.
12.5 Primitives involving logarithm and exponentials

We conclude this chapter by considering some primitives that involve the logarithm and the exponential, specifically, the primitives of $\ln x$ itself, as well as those of $\tan x$, $\tan^{-1} x$, $\tanh x$, $\tanh^{-1} x$, $e^{\alpha x} \cos x$, $e^{\alpha x} \sin x$, $x^n e^x$ and $1/(a x^2 + b x + c)$.
• Primitive of $\ln x$

Note that $(x \ln x)' = \ln x + 1$ (use Eqs. 9.18 and 12.3). Hence, using an integration by parts (Eq. 10.59) with $f(x) = \ln x$ and $g(x) = x$, we have
$$\int \ln x\,dx = x \ln x - \int dx = x \ln x - x. \tag{12.81}$$
As usual, it is a good idea to verify that our result is correct. Indeed, we have $(x \ln x - x)' = \ln x + 1 - 1 = \ln x$.

• Primitive of $\tan x$

Using the rule of integration by substitution introduced in Subsection 10.4.1 (Eq. 10.19), and setting $\cos x = u$ yields $du = -\sin x\,dx$. Hence,
$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{1}{u}\,du = -\ln |u| = -\ln |\cos x|. \tag{12.82}$$
Indeed, we have (use Eqs. 12.3 and 9.71, as well as the Leibniz chain rule, Eq. 9.24)
$$\frac{d}{dx} \ln |\cos x| = \frac{1}{\cos x}\,\frac{d}{dx} \cos x = -\frac{\sin x}{\cos x} = -\tan x. \tag{12.83}$$
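• Primitive of $\tan^{-1} x$

Using the rule of integration by parts (Eq. 10.60), we have
$$\int \tan^{-1} x\,dx = x \tan^{-1} x - \int x\,\frac{1}{1 + x^2}\,dx = x \tan^{-1} x - \frac12 \ln \left(1 + x^2\right). \tag{12.84}$$
Indeed,
$$\left[x \tan^{-1} x - \frac12 \ln \left(1 + x^2\right)\right]' = \tan^{-1} x + \frac{x}{1 + x^2} - \frac12\,\frac{2x}{1 + x^2} = \tan^{-1} x. \tag{12.85}$$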
• Primitive of $\tanh x$

Setting $\cosh x =: u > 0$, we have
$$\int \tanh x\,dx = \int \frac{\sinh x}{\cosh x}\,dx = \int \frac{1}{u}\,du = \ln |u| = \ln \cosh x. \tag{12.86}$$
Indeed, we have (use Eqs. 12.3 and 12.49, as well as the Leibniz chain rule, Eq. 9.24)
$$\frac{d}{dx} \ln \cosh x = \frac{1}{\cosh x}\,\frac{d}{dx} \cosh x = \frac{\sinh x}{\cosh x} = \tanh x. \tag{12.87}$$
• Primitive of $\tanh^{-1} x$

Using the rule of integration by parts (Eq. 10.60), we have, for $|x| < 1$,
$$\int \tanh^{-1} x\,dx = x \tanh^{-1} x - \int x\,\frac{1}{1 - x^2}\,dx = x \tanh^{-1} x + \frac12 \ln \left(1 - x^2\right). \tag{12.88}$$
Indeed,
$$\left[x \tanh^{-1} x + \frac12 \ln \left(1 - x^2\right)\right]' = \tanh^{-1} x + \frac{x}{1 - x^2} + \frac12\,\frac{-2x}{1 - x^2} = \tanh^{-1} x. \tag{12.89}$$
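The habit of differentiating a primitive to check it (as done above for Eqs. 12.81, 12.82, 12.84, 12.86 and 12.88) can also be carried out numerically. The sketch below is an illustration under my own assumptions: the central-difference step, the sample point $x_0 = 0.4$ and the dictionary layout are arbitrary choices, and `math.atan`, `math.atanh` play the roles of $\tan^{-1}$ and $\tanh^{-1}$.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x) (illustrative step h)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# name -> (claimed primitive, integrand)
pairs = {
    "ln x":      (lambda x: x * math.log(x) - x, math.log),                      # Eq. 12.81
    "tan x":     (lambda x: -math.log(abs(math.cos(x))), math.tan),              # Eq. 12.82
    "tan^-1 x":  (lambda x: x * math.atan(x) - 0.5 * math.log(1.0 + x * x),
                  math.atan),                                                    # Eq. 12.84
    "tanh x":    (lambda x: math.log(math.cosh(x)), math.tanh),                  # Eq. 12.86
    "tanh^-1 x": (lambda x: x * math.atanh(x) + 0.5 * math.log(1.0 - x * x),
                  math.atanh),                                                   # Eq. 12.88
}

x0 = 0.4  # a point where all five integrands are defined
errors = {name: abs(derivative(F, x0) - f(x0)) for name, (F, f) in pairs.items()}
for name, err in errors.items():
    print(f"{name:10s} |F'(x0) - f(x0)| = {err:.2e}")
```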
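• Primitive of $e^{\alpha x} \cos x$ and $e^{\alpha x} \sin x$

In order to obtain the primitives of $e^{\alpha x} \cos x$ and $e^{\alpha x} \sin x$, note that
$$\begin{aligned}
\left[e^{\alpha x} \left(A \cos x + B \sin x\right)\right]'
&= e^{\alpha x} \left[\alpha \left(A \cos x + B \sin x\right) + \left(-A \sin x + B \cos x\right)\right] \\
&= e^{\alpha x} \left[(\alpha A + B) \cos x + (\alpha B - A) \sin x\right].
\end{aligned} \tag{12.90}$$
Thus, the primitive of $e^{\alpha x} \cos x$ is obtained by imposing $\alpha A + B = 1$ and $\alpha B - A = 0$, namely $A = \alpha/(1 + \alpha^2)$ and $B = 1/(1 + \alpha^2)$. This yields
$$\int e^{\alpha x} \cos x\,dx = \frac{1}{1 + \alpha^2}\,e^{\alpha x} \left(\alpha \cos x + \sin x\right). \tag{12.91}$$
Indeed, we have
$$\left[e^{\alpha x} \left(\alpha \cos x + \sin x\right)\right]' = \alpha\,e^{\alpha x} \left(\alpha \cos x + \sin x\right) + e^{\alpha x} \left(-\alpha \sin x + \cos x\right) = \left(1 + \alpha^2\right) e^{\alpha x} \cos x. \tag{12.92}$$
Similarly, the primitive of $e^{\alpha x} \sin x$ is obtained by imposing $\alpha A + B = 0$ and $\alpha B - A = 1$, namely $A = -1/(1 + \alpha^2)$ and $B = \alpha/(1 + \alpha^2)$. This yields
$$\int e^{\alpha x} \sin x\,dx = \frac{1}{1 + \alpha^2}\,e^{\alpha x} \left(\alpha \sin x - \cos x\right). \tag{12.93}$$
I won't deprive you of the opportunity to verify this.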
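Without spoiling the analytical verification, Eqs. 12.91 and 12.93 can be compared with a direct numerical integration. The following is a minimal sketch: the composite Simpson rule, the interval $[0, 2]$ and the value $\alpha = 0.5$ are my own illustrative assumptions.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even (illustrative choice)."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4.0 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    total += 2.0 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return total * h / 3.0

def primitive_exp_cos(x, alpha):
    """Right-hand side of Eq. 12.91."""
    return math.exp(alpha * x) * (alpha * math.cos(x) + math.sin(x)) / (1.0 + alpha ** 2)

def primitive_exp_sin(x, alpha):
    """Right-hand side of Eq. 12.93."""
    return math.exp(alpha * x) * (alpha * math.sin(x) - math.cos(x)) / (1.0 + alpha ** 2)

alpha, a, b = 0.5, 0.0, 2.0
gap_cos = simpson(lambda t: math.exp(alpha * t) * math.cos(t), a, b) \
    - (primitive_exp_cos(b, alpha) - primitive_exp_cos(a, alpha))
gap_sin = simpson(lambda t: math.exp(alpha * t) * math.sin(t), a, b) \
    - (primitive_exp_sin(b, alpha) - primitive_exp_sin(a, alpha))
print(f"cos case gap: {gap_cos:.2e}")
print(f"sin case gap: {gap_sin:.2e}")
```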
• Primitive of $x^n e^x$

Note that $[x^n e^x]' = x^n e^x + n\,x^{n-1} e^x$. Thus, using an integration by parts, we have (use Eqs. 9.18 and 12.30)
$$\int x^n e^x\,dx = x^n e^x - n \int x^{n-1} e^x\,dx. \tag{12.94}$$
The above equation may be used recursively to obtain an explicit expression for the desired integral. I claim that the result is
$$\int x^n e^x\,dx = e^x \sum_{k=0}^{n} (-1)^{n-k}\,\frac{n!}{k!}\,x^k, \tag{12.95}$$
where $n! = 1 \cdot 2 \cdot 3 \cdot 4 \cdots (n - 1) \cdot n$ and $0! = 1$ (Eqs. 9.41 and 9.43). Let us verify my claim. We have
$$\begin{aligned}
\left[e^x \sum_{k=0}^{n} (-1)^{n-k}\,\frac{n!}{k!}\,x^k\right]'
&= e^x \sum_{k=0}^{n} (-1)^{n-k}\,\frac{n!}{k!}\,x^k + e^x \sum_{k=1}^{n} (-1)^{n-k}\,\frac{n!}{k!}\,k\,x^{k-1} \\
&= e^x \sum_{k=0}^{n} (-1)^{n-k}\,\frac{n!}{k!}\,x^k - e^x \sum_{h=0}^{n-1} (-1)^{n-h}\,\frac{n!}{h!}\,x^h = x^n e^x.
\end{aligned} \tag{12.96}$$
[Hint: For the second equality, set $h = k - 1$ in the second sum. For the last equality eliminate equal terms.] In particular, we have
$$\int x\,e^x\,dx = (x - 1)\,e^x. \tag{12.97}$$
Indeed, we have $[(x - 1)\,e^x]' = e^x + (x - 1)\,e^x = x\,e^x$.

• Primitive of $1/(a x^2 + b x + c)$

In closing this section, consider the primitive of $1/(a x^2 + b x + c)$. Using the notation $\Delta = b^2 - 4ac$ (Eq. 6.44), we have
$$\int \frac{1}{a x^2 + b x + c}\,dx =
\begin{cases}
\dfrac{-2}{\sqrt{\Delta}} \tanh^{-1} \dfrac{b + 2ax}{\sqrt{\Delta}}, & \text{for } \Delta > 0; \\[2ex]
\dfrac{-2}{b + 2ax}, & \text{for } \Delta = 0; \\[2ex]
\dfrac{2}{\sqrt{-\Delta}} \tan^{-1} \dfrac{b + 2ax}{\sqrt{-\Delta}}, & \text{for } \Delta < 0.
\end{cases} \tag{12.98}$$
I'll let you do some serious work this time, and verify that the above equation is correct.
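Before (not instead of) doing the serious work by hand, a numerical spot check of Eq. 12.98 may be reassuring. The sketch below differentiates the right-hand side by central differences and compares it with the integrand, for one sample quadratic in each of the three cases. The function names, the step size and the sample coefficients are my own assumptions. Note also that, since $\tanh^{-1}$ is defined only for arguments of magnitude less than one, the first expression can be evaluated only where $|b + 2ax| < \sqrt{\Delta}$, namely between the two real roots of the quadratic.

```python
import math

def primitive_inverse_quadratic(x, a, b, c):
    """Right-hand side of Eq. 12.98 (constant of integration omitted)."""
    delta = b * b - 4.0 * a * c
    u = b + 2.0 * a * x
    if delta > 0:
        r = math.sqrt(delta)
        return -2.0 / r * math.atanh(u / r)   # needs |u| < sqrt(delta)
    if delta == 0:
        return -2.0 / u
    r = math.sqrt(-delta)
    return 2.0 / r * math.atan(u / r)

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x) (illustrative step h)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

x0 = 0.3
cases = {                      # sample coefficients (a, b, c), one per case
    "delta > 0": (1.0, 0.0, -1.0),   # x^2 - 1
    "delta = 0": (1.0, 2.0, 1.0),    # (x + 1)^2
    "delta < 0": (1.0, 0.0, 1.0),    # x^2 + 1
}
errors = {}
for label, (a, b, c) in cases.items():
    slope = derivative(lambda t: primitive_inverse_quadratic(t, a, b, c), x0)
    errors[label] = abs(slope - 1.0 / (a * x0 * x0 + b * x0 + c))
    print(f"{label}: |F'(x0) - 1/(a x0^2 + b x0 + c)| = {errors[label]:.2e}")
```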
12.6 Appendix. Why "hyperbolic" functions? ♥

Why are $\sinh x$ and $\cosh x$ called hyperbolic functions? To address this question, let us begin by going back to Eq. 7.93, where we stated that the parametric representation for a hyperbola is
$$x = a \cosh u, \qquad y = b \sinh u. \tag{12.99}$$
Now we have the tools to prove that Eq. 12.99 represents a hyperbola. Indeed, using $\cosh^2 u - \sinh^2 u = 1$ (Eq. 12.46), you can verify that the above equation is equivalent to the horizontal hyperbola $x^2/a^2 - y^2/b^2 = 1$ (Eq. 7.90). [Note the analogy with the parametric representation of a unit circle, namely $x = \cos u$ and $y = \sin u$. The similarity is even stronger if you recall that the trigonometric functions are also called circular functions (Definition 123, p. 229).]
12.6.1 What is u?
♥
As I was writing this section, I asked myself: “In the equation for a circle, u denotes the angle that the segment OP in Fig. 12.6 forms with the x-axis, namely the arclength s on the unit circle. Is this true for the hyperbola as well? If not, what is the meaning of the parameter u for the hyperbola?”
Fig. 12.6 Area swept by segment OP
• Two failed attempts ♠

To answer these questions, I considered the so-called unit hyperbola (the equivalent of a unit circle), for which $a = b = 1$:
$$x = \cosh u, \qquad y = \sinh u. \tag{12.100}$$
The relationship between the parameter $u$ and the angle $\theta$ that the segment $OP$ forms with the $x$-axis is
$$u = \tanh^{-1} (y/x) = \tanh^{-1} (\tan \theta). \tag{12.101}$$
Thus, the parameter $u$ does not coincide with the angle $\theta$. Then, I remembered that the angle is defined as the arclength along the unit circle. Thus, I considered the relationship between the parameter $u$ and the arclength along the hyperbola. We have
$$ds = \sqrt{dx^2 + dy^2} = \sqrt{\sinh^2 u + \cosh^2 u}\;du, \tag{12.102}$$
or
$$s(u) = \int_0^u \sqrt{\sinh^2 u_1 + \cosh^2 u_1}\;du_1. \tag{12.103}$$
Again, the parameter $u$ does not coincide with the arclength along the hyperbola either.
• Here we have it!

Really out of desperation, I considered the area swept by the segment $OP$, as $P$ moves along the hyperbola (Fig. 12.6). Let $x$ and $y$ denote the coordinates of the point $P$, and let us identify the differential $(dx, dy)$ with the segment $PQ$. The area $dA$ swept by $OP$, as $P$ moves along the hyperbola, is approximately equal to that of the triangle $OPQ$ (in dark gray in the figure). The area $dA$ is equal to the area of the triangle $OCQ$, minus the area under the segments $OP$ and $PQ$, namely the sum of the areas of the triangles $ODP$ and $PBQ$ (in light gray in the figure) and that of the rectangle $PDCB$ (in medium gray in the figure). Accordingly, we have
$$2\,dA = (x + dx)(y + dy) - x\,y - dx\,dy - 2\,y\,dx = x\,dy - y\,dx. \tag{12.104}$$
Next, note that we have $x = \cosh u$ and $y = \sinh u$ (Eq. 12.100), and
$$dx = \frac{d \cosh u}{du}\,du = \sinh u\,du, \qquad dy = \frac{d \sinh u}{du}\,du = \cosh u\,du. \tag{12.105}$$
Therefore, we have
$$2\,dA = \left(\cosh^2 u - \sinh^2 u\right) du = du. \tag{12.106}$$
Integrating, we have $u = 2A$. Thus, we have that the parameter $u$ coincides with twice the area swept by the segment $OP$. It's never too late to learn something new!

◦ Comment. To no surprise, this is true for a circle as well. In this case, note that Eq. 12.104 is valid as well for $dx < 0$, as you may verify (as in the case for a circle, where $u$ equals the angle $\theta$). Accordingly, we have
$$2\,dA = x\,dy - y\,dx = R^2 \left(\cos^2 \theta + \sin^2 \theta\right) d\theta = R^2\,d\theta, \tag{12.107}$$
which yields $A = \pi R^2$, in agreement with Eq. 6.90.
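The result $u = 2A$ can be observed numerically by adding up the areas of the thin triangles $OPQ$ of Eq. 12.104 along the unit hyperbola. The sketch below is only an illustration: the function name `swept_area`, the number of steps and the end value $u = 1.5$ are my own assumptions.

```python
import math

def swept_area(u_end, n=10_000):
    """Area swept by OP from u = 0 to u = u_end on the unit hyperbola.

    Sums the triangle areas dA = (x dy - y dx) / 2 of Eq. 12.104, with
    (dx, dy) replaced by the finite chord PQ between consecutive points.
    """
    area = 0.0
    du = u_end / n
    x, y = 1.0, 0.0  # the point P at u = 0, namely (cosh 0, sinh 0)
    for i in range(1, n + 1):
        u = i * du
        x_new, y_new = math.cosh(u), math.sinh(u)
        area += 0.5 * (x * (y_new - y) - y * (x_new - x))
        x, y = x_new, y_new
    return area

u_end = 1.5
print("2 A =", 2.0 * swept_area(u_end), "   u =", u_end)
```

Replacing `math.cosh` and `math.sinh` with `math.cos` and `math.sin` gives the corresponding check for the unit circle, where the parameter is the angle.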
Chapter 13
Taylor polynomials
In this chapter, we consider a function, say $f(x)$, which is known at $x = c$, along with its derivatives up to order $n$, and discuss how we can approximate $f(x)$ with a polynomial of order $n$ that best fits it in a suitable neighborhood of $c$. Such a polynomial is called the Taylor polynomial of degree $n$ of the function $f(x)$ about $x = c$. [Before you raise your hand, the limit of a Taylor polynomial as $n$ tends to infinity is called a Taylor series and is discussed in Vol. II.]

Motivation. In Chapter 8 we have shown that, in the neighborhood of $x = 0$, we have $\cos x = 1 - \tfrac12 x^2 + o[x^2]$ (Eq. 8.69). This equation allows us to say that, if we are close enough to the point $x = 0$, the higher-order terms are negligible and we can approximate the function $\cos x$ with the polynomial $1 - \tfrac12 x^2$. You are going to ask me: "What's the big deal about that?" Primarily, the advantage is that the approximating polynomial is much simpler to use than $\cos x$. For instance, we see that $\cos x$ takes the value 1 at $x = 0$. Also, we can immediately visualize the graph of $\cos x$ in a neighborhood of $x = 0$. Similarly, from the expression of the polynomial, it looks like the first and second derivatives equal, respectively, 0 and $-1$, which are the exact values for $f(x) = \cos x$. Accordingly, the curvature $\kappa$ at $x = 0$ is exactly equal to 1 (use Eq. 9.146), and hence the unit circle centered at the origin is the osculating circle. This gives a good approximation of the graph of $\cos x$ around $x = 0$ (provided of course that we use the same scale for abscissas and ordinates).

The fact that polynomials are easier to use than other functions is true in general. So we wonder: "Is there a systematic procedure to obtain a polynomial approximation, in the neighborhood of a point, say $x = c$, for any of the functions that we have introduced thus far? Can we use higher-degree polynomials and, if so, do we get a better approximation? Is there a relationship between the coefficients of the polynomials and the derivatives of the function, or is the fact that it occurred for the function $\cos x$ due to a sheer coincidence?"

© Springer-Verlag GmbH Germany, part of Springer Nature 2021. L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_13
• Overview of this chapter

In order to arrive at the criterion used to construct them, we examine the Taylor polynomials by gradually increasing the degree $n$. It is apparent that, for a polynomial of degree zero (namely for a constant), the best we can do in a neighborhood of $x = c$ is $f(x) \approx f(c)$. Then, in Section 13.1 we examine a polynomial of degree one, which corresponds to a straight line. The straight line that best approximates a given curve in a neighborhood of a given point, say $x = c$, is the tangent line to $f(x)$ at $x = c$ (Section 9.1, on the derivative of a function). [We can say that the graph of the function and the tangent line have two infinitesimally close points in common.] In other words, the best first-degree polynomial in a suitable neighborhood of $x = c$ is the one that, at $x = c$, has the same value as $f(x)$, and the same slope, namely the same derivative. The polynomial so constructed is known as the Taylor polynomial of degree one for $f(x)$.

Next, in Section 13.2 we consider polynomials of degree two. We may start from the first-order Taylor polynomial and add a term of the type $a x^2$, and obtain a parabola (Eq. 6.32). Of all the possible values for $a$, we choose the one that gives us the osculating parabola, whose curvature is given by $\kappa = f''(c)/(1 + [f'(c)]^2)^{3/2}$ (Eq. 9.146). Thus, at $x = c$, the values of function, first and second derivative of the osculating parabola are identical to those of $f(x)$ (Definition 193, p. 401). The polynomial so constructed is known as the Taylor polynomial of degree two for the function under consideration.

Proceeding this way, in Section 13.3 we will show that the best polynomial approximation of degree $n$ is obtained by imposing that the polynomial and $f(x)$ attain the same value at $x = c$, along with their derivatives up to order $n$. The polynomial so constructed is known as the Taylor polynomial of degree $n$ for the function under consideration.
Then, in Section 13.4, we apply the material developed and obtain the Taylor polynomials for most of the functions that we have introduced thus far. Finally, in Section 13.5, we introduce a useful estimate for the Taylor remainder, namely the error introduced by replacing the function f (x) with its Taylor polynomial. We also have three appendices, where we present applications of this newly introduced tool. Specifically, in the first (Section 13.6), we discuss the conditions for a function f (x) to attain its (local) maximum and minimum values. In the second (Section 13.7), we present an analysis of the error introduced by replacing derivatives with their finite differences approximations (introduced in Section 9.9). Finally, in the third (Section 13.8) we use the Taylor polynomials to address the process of linearization.
13. Taylor polynomials
13.1 Polynomials of degree one. Tangent line
As indicated above, the Taylor polynomial of degree one, denoted by $t_1(x)$, coincides with the tangent line to $f(x)$ at $x = c$, namely the straight line that, at $x = c$, attains the same value and the same slope as the graph of $f(x)$ (Section 7.3). Thus, for a function differentiable at $c$, the tangent line is represented by a first–degree polynomial (Subsection 6.3.2), which is given by

$$t_1(x) := f(c) + f'(c)\,(x - c). \tag{13.1}$$
Indeed, $t_1(x)$ and $f(x)$ have the same value and the same slope (namely derivative) at $x = c$, as you may verify.¹
◦ Warning. There is no uniformity for the symbol used to denote the Taylor polynomials. Sometimes no symbol is used (as in $f(x) \simeq \dots$). A symbol sometimes used is $T_n(x)$, which I avoid because it’s the symbol used for the Tchebychev polynomials. Similarly, I avoid the symbol $P_n(x)$ because it’s used for the Legendre polynomials. The symbol I use, namely $t_n(x)$, should not be a source of confusion.
Consider, for example, the parabola $y = a\,x^2$ (Eq. 7.87). Using $y(c) = a\,c^2$ and $y'(c) = 2\,a\,c$, we obtain that its tangent at $x = c$ is given by

$$y(x) = y(c) + y'(c)\,(x - c) = a\,c^2 + 2\,a\,c\,(x - c). \tag{13.2}$$
¹ Taylor polynomials are named after the British mathematician Brook Taylor (1685–1731). If $c = 0$, the polynomials are typically referred to as the Maclaurin polynomials, after the Scottish mathematician Colin Maclaurin (1698–1746). Nonetheless, I will use the term “Taylor polynomials,” even when $c = 0$. You might want to know the motivation for this choice. When I was a student, I was led to believe that Taylor introduced a trivial extension of Maclaurin’s results, by replacing $x$ with $x - c$. However, just the opposite is true: Maclaurin introduced an even more trivial variation of Taylor’s work, simply setting $c = 0$ (giving however full credit to Taylor). Nonetheless, his name is historically associated with these series, simply because of the wide use he made of them. [On the other hand, speaking of cases of inaccurate attributions, Maclaurin should be credited for introducing the Cramer rule for systems of linear algebraic equations (Footnote 7, p. 135).]
• Error analysis for $t_1(x)$
We have said that $t_1(x)$ provides a good first–degree approximation for $f(x)$ in a suitable neighborhood of $x = c$. To be more precise, I claim that $t_1(x)$ is the first–degree polynomial that best approximates $f(x)$ in a suitable neighborhood of $x = c$, in the sense that it’s the only one with a remainder, $R_1(x) := f(x) - t_1(x)$, of order higher than $h$, where $h = x - c$:
Part II. Calculus and dynamics of a particle in one dimension
$$R_1(x) := f(x) - t_1(x) = o[h]. \tag{13.3}$$

[Recall that, in the “little o” notation introduced in Eqs. 8.75, $o[h]$ stands for “of order higher than $h$.”] In other words, $f(x)$ and $t_1(x)$ differ by terms of order higher than the last term in $t_1(x)$. [A more interesting estimate for the remainder is presented in Section 13.5.]
To prove Eq. 13.3, note that the definition

$$f'(c) := \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \tag{13.4}$$

implies

$$f'(c) = \frac{f(x) - f(c)}{x - c} + o[1], \tag{13.5}$$

as $x - c$ tends to zero. Thus, multiplying by $h = x - c$ and recalling that $h\,o[1] = o[h]$ (Eq. 8.89), we have

$$f(x) = f(c) + f'(c)\,(x - c) + o[h], \tag{13.6}$$

which is fully equivalent to Eq. 13.3.
◦ Comment. We still have to prove that $t_1(x)$ is the only first–degree polynomial with an error of $o[h]$. Any other first–degree polynomial would have a larger error. Indeed, we have (use Eq. 13.6)

$$f(x) - \big[f(c) + m\,(x - c)\big] = f(c) + f'(c)\,(x - c) + o[h] - \big[f(c) + m\,(x - c)\big] = \big[f'(c) - m\big]\,(x - c) + o[h] = O[h], \tag{13.7}$$

whenever $m \neq f'(c)$. In this sense (and only in this sense), the Taylor polynomial $t_1(x)$ provides the most accurate first–degree approximation of the function $f(x)$ in a suitable neighborhood of $c$.
◦ Comment. Note that for the existence of $t_1(x)$ (and for the error analysis above) it is sufficient that $f(x)$ admits the derivative $f'(c)$.
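The distinction between an $o[h]$ and an $O[h]$ error can be seen numerically. The following is only an illustrative sketch: the test function $\sin x$, the point $c = 0.5$, the perturbed slope and the helper name `first_degree_error_ratio` are all assumptions made for the example, not part of the text.

```python
import math

def first_degree_error_ratio(f, c, slope, h):
    """Return [f(c + h) - (f(c) + slope * h)] / h for a line through (c, f(c))."""
    return (f(c + h) - (f(c) + slope * h)) / h

c = 0.5
tangent_slope = math.cos(c)          # f'(c) for f = sin
other_slope = tangent_slope + 0.1    # an arbitrary different slope m (assumed)

for h in (1e-1, 1e-2, 1e-3):
    r_tangent = first_degree_error_ratio(math.sin, c, tangent_slope, h)
    r_other = first_degree_error_ratio(math.sin, c, other_slope, h)
    print(f"h = {h:g}: tangent {r_tangent:+.2e}, other line {r_other:+.2e}")
```

For the tangent line the ratio error/$h$ shrinks with $h$, whereas for the other line it approaches the constant $f'(c) - m$, as Eq. 13.7 indicates.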
13.2 Polynomials of degree two; osculating parabola
For a function that is twice differentiable at $c$, we can introduce the Taylor polynomial of degree two, $t_2(x)$. This is such that at $x = c$ the polynomial $t_2(x)$ and $f(x)$ have the same value, as do their first two derivatives, namely $t_2(c) = f(c)$, $t_2'(c) = f'(c)$, and $t_2''(c) = f''(c)$. Thus, we have

$$t_2(x) = f(c) + f'(c)\,(x - c) + \frac{1}{2}\,f''(c)\,(x - c)^2. \tag{13.8}$$

The second–degree Taylor polynomial describes a parabola (Eq. 6.32), namely the osculating parabola for the function $y = f(x)$ at $x = c$ (Definition 193, p. 401).
• Error analysis for $t_2(x)$ ♠
We have said that $t_2(x)$ provides a good second–degree polynomial approximation for $f(x)$ in a suitable neighborhood of $x = c$. To be more precise, I claim that $t_2(x)$ is the second–degree polynomial that best approximates $f(x)$ in a suitable neighborhood of $x = c$, in the sense that the remainder, $R_2(x) := f(x) - t_2(x)$, is of order higher than $h^2$, where $h = x - c$. Specifically, we have

$$f(x) = f(c) + f'(c)\,(x - c) + \frac{1}{2}\,f''(c)\,(x - c)^2 + R_2(x), \tag{13.9}$$

with

$$R_2(x) := f(x) - t_2(x) = o[h^2]. \tag{13.10}$$

In other words, $f(x)$ and $t_2(x)$ differ by terms of order higher than the last term in $t_2(x)$. [Again, a more interesting expression for $R_2(x)$ is obtained in Section 13.5.] Indeed,

$$f''(c) = \lim_{x \to c} \frac{f'(x) - f'(c)}{x - c} \tag{13.11}$$

implies

$$f''(c) = \frac{f'(x) - f'(c)}{x - c} + o[1]. \tag{13.12}$$

Recalling again that $h\,o[1] = o[h]$ (Eq. 8.89), we have

$$f'(x) = f'(c) + f''(c)\,(x - c) + o[h], \tag{13.13}$$

which is fully equivalent to Eq. 13.10. [Hint: Integrate the above equation from $c$ to $x$ (akin to what we did in Theorem 104, p. 391).]
◦ Comment. It should be emphasized that, as in the preceding case, $t_2(x)$ is the only second–degree polynomial with an error of $o[h^2]$. Any other second–degree polynomial would have a larger error, as you may verify (use the approach followed in Eq. 13.7). In this sense (and only in this sense), the Taylor polynomial $t_2(x)$ provides the most accurate second–degree approximation of the function $f(x)$ in a suitable neighborhood of $c$.
◦ Comment. Note that for the existence of $t_2(x)$ (and for the error analysis above) it is sufficient that $f(x)$ admits the derivative $f''(c)$.
13.3 Polynomials of degree n
The results of the preceding sections make us wonder whether (and how) the process may be continued with higher degree polynomials. To this end, assume that $f(x)$ is differentiable $n$ times at $x = c$. Then, its Taylor polynomial of degree $n$ is defined by

$$t_n(x) := f(c) + f'(c)\,(x - c) + f''(c)\,\frac{(x - c)^2}{2!} + \cdots + f^{(n)}(c)\,\frac{(x - c)^n}{n!} = \sum_{k=0}^{n} f^{(k)}(c)\,\frac{(x - c)^k}{k!}, \tag{13.14}$$

where $f^{(k)}(x) := \mathrm{d}^k f/\mathrm{d}x^k$ (Eq. 9.33), whereas $n! = 1 \cdot 2 \cdot 3 \cdot 4 \cdots (n - 1) \cdot n$ and $0! = 1$ (Eqs. 9.41 and 9.43). Note that

$$\left.\frac{\mathrm{d}^h}{\mathrm{d}x^h}\,(x - c)^k\right|_{x=c} = \begin{cases} 0, & \text{for } 0 \le h < k, \\ k!, & \text{for } h = k, \\ 0, & \text{for } h > k. \end{cases} \tag{13.15}$$

[Hint: Use $\mathrm{d}x^k/\mathrm{d}x = k\,x^{k-1}$ (Eq. 9.46).] Accordingly, the function $f(x)$ and the corresponding Taylor polynomial have the same value at $x = c$, as do their first $n$ derivatives.
• Error analysis for $t_n(x)$ ♠
We have said that $t_n(x)$ provides a good $n$-th degree approximation for $f(x)$ in a suitable neighborhood of $x = c$. Again, to be more precise, I claim that $t_n(x)$ is the polynomial of degree $n$ that best approximates $f(x)$ in a suitable neighborhood of $x = c$, in the sense that the remainder, $R_n(x) := f(x) - t_n(x)$, is of order higher than $h^n$, where $h = x - c$, that is,²

$$R_n(x) := f(x) - t_n(x) = o[h^n]. \tag{13.16}$$

[Again, a more interesting expression for $R_n(x)$ is obtained in Section 13.5.] Indeed, the definition of $f^{(n)}(x)$ at $x = c$ implies

$$f^{(n)}(c) = \frac{f^{(n-1)}(x) - f^{(n-1)}(c)}{x - c} + o[1]. \tag{13.17}$$

Multiplying by $h$, and recalling again that $h\,o[1] = o[h]$ (Eq. 8.89), we obtain

$$f^{(n-1)}(x) = f^{(n-1)}(c) + f^{(n)}(c)\,(x - c) + o[h]. \tag{13.18}$$

Then, integrating this $n - 1$ times from $c$ to $x$, we obtain Eq. 13.16. [Hint: Apply Theorem 104, p. 391, at every step.]
◦ Comment. It should be emphasized that, as in the preceding cases, $t_n(x)$ is the only polynomial of degree $n$ with an error of $o[h^n]$. Any other polynomial of degree $n$ would have a larger error, as you may verify. [Hint: Use the approach followed in Eq. 13.7.] In this sense (and only in this sense), the Taylor polynomial $t_n(x)$ provides the most accurate $n$-th degree approximation of the function $f(x)$ in a suitable neighborhood of $c$.
Remark 112. Note that for the existence of $t_n(x)$ (and for the error analysis above) it is sufficient that $f(x)$ admits the derivative $f^{(n)}(c)$.
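As a minimal sketch of Eqs. 13.14 and 13.16, one can build $t_n(x)$ from a list of derivatives at $c$ and watch the ratio $R_n/h^n$ decrease with $h$. The helper name `taylor_poly`, the choice $f(x) = e^x$ with $c = 0$ (for which every derivative at $c$ equals 1), and the step sizes are assumptions made for illustration only.

```python
import math

def taylor_poly(derivs_at_c, c):
    """Return t_n as a callable, given derivs_at_c[k] = f^(k)(c) for k = 0..n (Eq. 13.14)."""
    def t_n(x):
        h = x - c
        return sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs_at_c))
    return t_n

# Example (assumed): f(x) = exp(x) about c = 0, where every derivative equals 1.
n = 4
t4 = taylor_poly([1.0] * (n + 1), 0.0)

# Ratio R_n(x) / h^n for decreasing h; Eq. 13.16 says it tends to zero.
ratios = [(math.exp(h) - t4(h)) / h**n for h in (0.4, 0.2, 0.1)]
print(ratios)
```

The steps are kept moderately large on purpose: for much smaller $h$ the remainder falls below floating-point resolution and the ratio is dominated by rounding.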
13.4 Taylor polynomials of elementary functions
Here, we introduce the Taylor polynomials for the so–called elementary functions (namely all of the functions that we have introduced thus far). For all the Taylor polynomials considered in this section, we use $c = 0$. [See Footnote 1, p. 523.]

13.4.1 Algebraic functions
Here, we consider the Taylor polynomials of some algebraic functions. Specifically, we begin with $1/(1 + x)$ and $1/(1 + x^2)$. Then, we address $(1 + x)^\alpha$ (in particular, $\sqrt{1 + x}$ and $1/\sqrt{1 + x}$), as well as $\sqrt{1 \pm x^2}$ and $1/\sqrt{1 \pm x^2}$. We conclude this subsection with the analysis of the binomial expansion, namely the expression for the $n$-th power of a binomial, $(a + b)^n$.

² This form of the Taylor polynomial remainder is named after Giuseppe Peano.
• Taylor polynomials of $1/(1 + x)$ and $1/(1 + x^2)$
Here, we consider the function $1/(1 + x)$. Note that

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\frac{1}{1 + x} = \frac{-1}{(1 + x)^2} \tag{13.19}$$

and, in general,

$$\frac{\mathrm{d}^n}{\mathrm{d}x^n}\left(\frac{1}{1 + x}\right) = (-1)^n\,\frac{n!}{(1 + x)^{n+1}}, \tag{13.20}$$

as you may verify. Hence, for $t_7(x)$, we have

$$\frac{1}{1 + x} = 1 - x + x^2 - x^3 + x^4 - x^5 + x^6 - x^7 + o[x^7]. \tag{13.21}$$

[This could have been obtained also from Eq. 8.41, namely

$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + \frac{x^8}{1 - x}. \tag{13.22}$$

This includes Eq. 13.21 as a particular case.]
What about $1/(1 \pm x^2)$? Here is an easy trick: simply replace $x$ with $\pm x^2$ in Eq. 13.21 to obtain the corresponding Taylor polynomials $t_{14}(x)$, as

$$\frac{1}{1 \pm x^2} = 1 \mp x^2 + x^4 \mp x^6 + x^8 \mp x^{10} + x^{12} \mp x^{14} + o[x^{14}]. \tag{13.23}$$
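• Taylor polynomials of $(1 + x)^\alpha$ and similar cases
Let us consider the function $(1 + x)^\alpha$. Recalling that $\mathrm{d}x^\alpha/\mathrm{d}x = \alpha\,x^{\alpha-1}$ (Eq. 9.39), we have

$$\frac{\mathrm{d}}{\mathrm{d}x}\,(1 + x)^\alpha = \alpha\,(1 + x)^{\alpha-1}. \tag{13.24}$$

Similarly, we have

$$\frac{\mathrm{d}^2}{\mathrm{d}x^2}\,(1 + x)^\alpha = \alpha\,(\alpha - 1)\,(1 + x)^{\alpha-2}, \qquad \frac{\mathrm{d}^3}{\mathrm{d}x^3}\,(1 + x)^\alpha = \alpha\,(\alpha - 1)\,(\alpha - 2)\,(1 + x)^{\alpha-3}, \tag{13.25}$$

and so on. Accordingly, for the Taylor polynomial $t_n(x)$ of $(1 + x)^\alpha$ we have

$$t_n(x) = 1 + \alpha\,x + \alpha\,(\alpha - 1)\,\frac{x^2}{2!} + \alpha\,(\alpha - 1)\,(\alpha - 2)\,\frac{x^3}{3!} + \cdots + \alpha\,(\alpha - 1) \cdots (\alpha + 1 - n)\,\frac{x^n}{n!} = \sum_{k=0}^{n} B_k^\alpha\,x^k, \tag{13.26}$$

where

$$B_0^\alpha = 1 \quad\text{and}\quad B_k^\alpha = \frac{1}{k!}\,\alpha\,(\alpha - 1) \cdots (\alpha - k + 1), \tag{13.27}$$

with $k = 1, \dots, n$. In particular, for $\alpha = -1$ we recover Eq. 13.21. On the other hand, for the function $\sqrt{1 + x} = (1 + x)^{1/2}$ (namely for $\alpha = 1/2$), for the Taylor polynomial $t_3(x)$, we have

$$\sqrt{1 + x} = 1 + \frac{1}{2}\,x - \frac{1}{4}\,\frac{x^2}{2!} + \frac{3}{8}\,\frac{x^3}{3!} + o[x^3], \tag{13.28}$$

whereas, for $1/\sqrt{1 + x} = (1 + x)^{-1/2}$ (namely for $\alpha = -1/2$), again for the Taylor polynomial $t_3(x)$, we have

$$\frac{1}{\sqrt{1 + x}} = 1 - \frac{1}{2}\,x + \frac{3}{4}\,\frac{x^2}{2!} - \frac{3 \cdot 5}{8}\,\frac{x^3}{3!} + o[x^3]. \tag{13.29}$$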
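[You might want to sharpen your skills, and obtain the expressions of $t_n(x)$, for values of $n$ higher than 3.]
What about $(1 \pm x^2)^\alpha$? Again, simply replace $x$ with $\pm x^2$ in Eq. 13.26, and obtain

$$(1 \pm x^2)^\alpha = 1 \pm \alpha\,x^2 + \alpha\,(\alpha - 1)\,\frac{x^4}{2!} \pm \alpha\,(\alpha - 1)\,(\alpha - 2)\,\frac{x^6}{3!} + o[x^6]. \tag{13.30}$$

In particular, for $\alpha = \tfrac{1}{2}$ and $\alpha = -\tfrac{1}{2}$, we have, respectively,

$$\sqrt{1 \pm x^2} = 1 \pm \frac{1}{2}\,x^2 - \frac{1}{4}\,\frac{x^4}{2!} \pm \frac{3}{8}\,\frac{x^6}{3!} + o[x^6], \tag{13.31}$$

$$\frac{1}{\sqrt{1 \pm x^2}} = 1 \mp \frac{1}{2}\,x^2 + \frac{3}{2 \cdot 4}\,x^4 \mp \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,x^6 + o[x^6]. \tag{13.32}$$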
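The coefficients $B_k^\alpha$ of Eq. 13.27 can be generated recursively, since $B_k^\alpha = B_{k-1}^\alpha\,(\alpha - k + 1)/k$. The sketch below uses exact rational arithmetic; the function name `binomial_series_coeffs` is a hypothetical choice for this illustration.

```python
from fractions import Fraction

def binomial_series_coeffs(alpha, n):
    """Return [B_0, ..., B_n] with B_k = alpha (alpha-1) ... (alpha-k+1) / k!  (Eq. 13.27)."""
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        # Recurrence: B_k = B_{k-1} * (alpha - k + 1) / k
        coeffs.append(coeffs[-1] * (alpha - (k - 1)) / k)
    return coeffs

sqrt_coeffs = binomial_series_coeffs(Fraction(1, 2), 3)       # sqrt(1 + x), Eq. 13.28
inv_sqrt_coeffs = binomial_series_coeffs(Fraction(-1, 2), 3)  # 1/sqrt(1 + x), Eq. 13.29
geom_coeffs = binomial_series_coeffs(Fraction(-1), 7)         # 1/(1 + x), Eq. 13.21

print(sqrt_coeffs, inv_sqrt_coeffs, geom_coeffs)
```

The values agree with Eqs. 13.28 and 13.29 once the factorials there are carried out, for instance $-\tfrac{1}{4}/2! = -\tfrac{1}{8}$ and $\tfrac{3}{8}/3! = \tfrac{1}{16}$.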
13.4.2 Binomial theorem
Equations 13.26 and 13.27 are particularly interesting when $\alpha$ equals a natural number, say $n$. In this case, the coefficients $B_k^\alpha$ (Eq. 13.27) are given by

$$B_k^n = \begin{cases} \dfrac{1}{k!}\,n\,(n - 1) \cdots (n - k + 1) = \dfrac{n!}{k!\,(n - k)!}, & \text{for } k \le n; \\[2mm] 0, & \text{for } k > n, \end{cases} \tag{13.33}$$

where $n! = 1 \cdot 2 \cdot 3 \cdot 4 \cdots (n - 1) \cdot n$ denotes the $n$ factorial (Eq. 9.41). Next, let us introduce the universally accepted notation

$$\binom{n}{k} := B_k^n \qquad (k \le n), \tag{13.34}$$

where

$$\binom{n}{k} = \binom{n}{n - k} = \frac{n!}{k!\,(n - k)!} \tag{13.35}$$

is called the binomial coefficient. Note that (use $0! = 1$, Eq. 9.43)

$$\binom{n}{0} = \binom{n}{n} = \frac{n!}{0!\,n!} = 1 \tag{13.36}$$

and

$$\binom{n}{1} = \binom{n}{n - 1} = \frac{n!}{1!\,(n - 1)!} = n. \tag{13.37}$$

Let us now go back to Eq. 13.26, which using the above notation reads

$$(1 + x)^n = 1 + n\,x + \frac{1}{2}\,n\,(n - 1)\,x^2 + \frac{1}{3!}\,n\,(n - 1)\,(n - 2)\,x^3 + \cdots + n\,x^{n-1} + x^n = \sum_{k=0}^{n} \binom{n}{k}\,x^k. \tag{13.38}$$

◦ Comment. It should be noted that, contrary to Eq. 13.26, this is an exact expression, not an approximate one, and is valid for any value of $x$.
Remark 113. Let us go back to Eq. 8.21, namely $(1 + x)^n > 1 + n\,x$, for $x > 0$. We can now use Eq. 13.38 for the proof.
Let us make things a bit more general. What about $(a + b)^n$? We have $(a + b)^n = a^n\,(1 + b/a)^n$. Then, replacing $x$ with $b/a$, and multiplying the result by $a^n$, Eq. 13.38 yields

$$(a + b)^n = a^n + n\,a^{n-1}\,b + \frac{1}{2}\,n\,(n - 1)\,a^{n-2}\,b^2 + \frac{1}{3!}\,n\,(n - 1)\,(n - 2)\,a^{n-3}\,b^3 + \cdots + n\,a\,b^{n-1} + b^n = \sum_{k=0}^{n} \binom{n}{k}\,a^{n-k}\,b^k. \tag{13.39}$$

The above equation is known as the binomial theorem.
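Because Eq. 13.39 is exact, it can be checked with integer arithmetic. The snippet below is a small sketch; `binomial_expansion` is a hypothetical helper name, and `math.comb` (Python 3.8 or later) supplies the binomial coefficient of Eq. 13.35.

```python
import math

def binomial_expansion(a, b, n):
    """Right-hand side of Eq. 13.39: sum over k = 0..n of C(n, k) a^(n-k) b^k."""
    return sum(math.comb(n, k) * a**(n - k) * b**k for k in range(n + 1))

# Compare with the left-hand side (a + b)^n for a couple of integer cases.
print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)
print(binomial_expansion(-1, 4, 6), (-1 + 4) ** 6)
```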
13.4.3 Trigonometric functions and their inverses
Here, we first introduce the Taylor polynomials of the trigonometric functions $\sin x$ and $\cos x$. Then, we consider $\tan x$. We conclude this subsection with the Taylor polynomials for the inverse trigonometric functions, $\sin^{-1} x$, $\cos^{-1} x$ and $\tan^{-1} x$.
• Taylor polynomials of $\sin x$ and $\cos x$
Let us begin with the functions $\sin x$ and $\cos x$. Recall that the derivatives of the functions $\sin x$ and $\cos x$ are given by $(\sin x)' = \cos x$ (Eq. 9.70) and $(\cos x)' = -\sin x$ (Eq. 9.71), and that $\sin 0 = 0$ and $\cos 0 = 1$ (Eq. 6.77). Accordingly, the Taylor polynomials $t_7(x)$ are (use Table 13.1)

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + o[x^7], \tag{13.40}$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + o[x^7]. \tag{13.41}$$
| $f(x)$ | $f_0$ | $f'_0$ | $f''_0$ | $f'''_0$ | $f^{(4)}_0$ | $f^{(5)}_0$ | $f^{(6)}_0$ | $f^{(7)}_0$ |
| $\sin x$ | 0 | 1 | 0 | −1 | 0 | 1 | 0 | −1 |
| $\cos x$ | 1 | 0 | −1 | 0 | 1 | 0 | −1 | 0 |

Table 13.1 Derivatives of trigonometric functions, at $x = 0$
The above equations provide a generalization of $\sin x = x + o(x)$ (Eq. 8.85), and $\cos x = 1 - \tfrac{1}{2}\,x^2 + o(x^2)$ (Eq. 8.86). The graphs for the Taylor polynomials $t_k(x)$ in Eqs. 13.40 and 13.41 are presented in Figs. 13.1 and 13.2, for different values of $k$.

Fig. 13.1 $t_k(x)$ for $\sin x$
Fig. 13.2 $t_k(x)$ for $\cos x$

• Taylor polynomial of $\tan x$ ♣
To obtain the Taylor polynomial of $\tan x$, we could evaluate all the required derivatives of $\tan x$. You may try that, but this is too cumbersome for my tastes. So, I will proceed in a different way. We know that $\tan x$ is an odd function. Accordingly, all the coefficients of the even powers vanish, and hence we have to worry only about the coefficients of the odd powers. We also know that $(\tan x)' = 1$ for $x = 0$ (Eq. 9.72). Thus, the Taylor polynomial $t_7(x)$ for $\tan x$ is necessarily such that

$$\tan x = x + A\,x^3 + B\,x^5 + C\,x^7 + o[x^7]. \tag{13.42}$$

We also know that $\tan x \cos x = \sin x$, by definition of $\tan x$, which implies

$$\left(x + A\,x^3 + B\,x^5 + C\,x^7\right)\left(1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!}\right) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + o[x^7]. \tag{13.43}$$

Equating the coefficients of $x^3$ yields $-1/2 + A = -1/6$, namely $A = 1/3$. Next, equating the coefficients of $x^5$ yields $1/4! - A/2 + B = 1/5!$, namely

$$B = \frac{1}{5!} - \frac{1}{4!} + \frac{1}{3!} = \frac{1 - 5 + 20}{5!} = \frac{16}{2 \cdot 3 \cdot 4 \cdot 5} = \frac{2}{15}. \tag{13.44}$$
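The coefficient matching in Eq. 13.43 amounts to a power-series division, which can be carried out mechanically order by order. What follows is a sketch with exact fractions; the variable names (`sin_c`, `cos_c`, `tan_c`) and the truncation order `N` are assumptions for this illustration.

```python
from fractions import Fraction
from math import factorial

N = 7  # highest power retained

# Taylor coefficients of sin x and cos x about 0 (Eqs. 13.40 and 13.41)
sin_c = [Fraction(0)] * (N + 1)
cos_c = [Fraction(0)] * (N + 1)
for k in range(N + 1):
    if k % 2 == 1:
        sin_c[k] = Fraction((-1) ** (k // 2), factorial(k))
    else:
        cos_c[k] = Fraction((-1) ** (k // 2), factorial(k))

# Solve tan x * cos x = sin x order by order:
# sum_{j=0}^{k} tan_c[j] * cos_c[k-j] = sin_c[k]
tan_c = []
for k in range(N + 1):
    known = sum(tan_c[j] * cos_c[k - j] for j in range(k))
    tan_c.append((sin_c[k] - known) / cos_c[0])

print(tan_c)
```

The entries of `tan_c` at the odd positions give the coefficients $A$, $B$ and $C$ of Eq. 13.42, while the even positions vanish, consistent with $\tan x$ being odd.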
You might want to continue, so as to practice your skills. Then, equating the coefficients of $x^7$ you would obtain $C = 17/315$. Thus, for the Taylor polynomial $t_7(x)$, we have

$$\tan x = x + \frac{1}{3}\,x^3 + \frac{2}{15}\,x^5 + \frac{17}{315}\,x^7 + o[x^7]. \tag{13.45}$$

• Taylor polynomials for $\sin^{-1} x$, $\cos^{-1} x$ and $\tan^{-1} x$ ♠
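Note that (use Eqs. 9.73 and 13.32)

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\sin_{\rm P}^{-1} x = \frac{1}{\sqrt{1 - x^2}} = 1 + \frac{1}{2}\,x^2 + \frac{3}{2 \cdot 4}\,x^4 + \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,x^6 + o[x^6]. \tag{13.46}$$

Integrating this equation yields, for the Taylor polynomial $t_7(x)$,

$$\sin_{\rm P}^{-1} x = x + \frac{1}{2}\,\frac{x^3}{3} + \frac{3}{2 \cdot 4}\,\frac{x^5}{5} + \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,\frac{x^7}{7} + o[x^7]. \tag{13.47}$$

[Hint: The constant of integration, $C = 0$, is obtained from the values of the functions at $x = 0$.]
The Taylor polynomial for $\cos_{\rm P}^{-1} x$ is obtained from the fact that $\cos_{\rm P}^{-1} x = \pi/2 - \sin_{\rm P}^{-1} x$. [Hint: Use Eq. 10.34 and $\cos_{\rm P}^{-1} 0 = \pi/2$ (Eq. 6.136).] This yields, for the Taylor polynomial $t_7(x)$,

$$\cos_{\rm P}^{-1} x = \frac{\pi}{2} - x - \frac{1}{2}\,\frac{x^3}{3} - \frac{3}{2 \cdot 4}\,\frac{x^5}{5} - \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,\frac{x^7}{7} + o[x^7]. \tag{13.48}$$

Finally, note that (use Eqs. 9.75 and 13.23)

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\tan^{-1} x = \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + o[x^6]. \tag{13.49}$$

Integrating this equation yields, for the Taylor polynomial $t_7(x)$,

$$\tan^{-1} x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + o[x^7]. \tag{13.50}$$

[Again, the constant of integration, $C = 0$, is obtained from the values of the functions at $x = 0$.]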
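The term-by-term integration used for Eqs. 13.47 and 13.50 can be sketched as follows. The helper names `integrate_series` and `evaluate` are hypothetical, and the coefficient lists are simply those of Eqs. 13.46 and 13.49 written out as fractions.

```python
from fractions import Fraction
import math

def integrate_series(coeffs, constant=Fraction(0)):
    """Integrate sum_k coeffs[k] x^k term by term; the coefficient of x^(k+1) is coeffs[k]/(k+1)."""
    return [constant] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def evaluate(coeffs, x):
    """Evaluate the polynomial sum_k coeffs[k] x^k in floating point."""
    return sum(float(c) * x**k for k, c in enumerate(coeffs))

# 1/sqrt(1 - x^2) up to x^6 (Eq. 13.46): 1, 1/2, 3/(2*4), (3*5)/(2*4*6)
d_asin = [1, 0, Fraction(1, 2), 0, Fraction(3, 8), 0, Fraction(5, 16)]
asin_c = integrate_series(d_asin)                    # Eq. 13.47

# 1/(1 + x^2) up to x^6 (Eq. 13.49)
atan_c = integrate_series([1, 0, -1, 0, 1, 0, -1])   # Eq. 13.50

x = 0.1
print(evaluate(asin_c, x) - math.asin(x), evaluate(atan_c, x) - math.atan(x))
```

The integration constant is zero in both cases because the functions vanish at $x = 0$, as noted in the text.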
13.4.4 Logarithms, exponentials, and related functions
Here, we consider the Taylor polynomials of the functions that we have introduced in Chapter 12. Specifically, we begin with the Taylor polynomials of $\ln(1 + x)$, $e^x$, $\sinh x$ and $\cosh x$. Next, we consider $\tanh x$. Then, we address the Taylor polynomials for the inverse hyperbolic functions $\sinh^{-1} x$ and $\tanh^{-1} x$. We conclude this subsection with the analysis of $\cosh^{-1} x$, which requires a special treatment, since the Taylor polynomial for $\cosh^{-1} x$ around $x = 0$ does not exist.
• Taylor polynomials of $\ln(1 + x)$, $e^x$, $\sinh x$ and $\cosh x$
We are now in a position to obtain the Taylor polynomials of the functions $\ln(1 + x)$, $e^x$, $\sinh x$ and $\cosh x$. [We cannot expand $\ln u$ around $u = 0$, where it is not defined. For this reason, we expand it around $u = 1$.] Note that (use Eq. 12.3, as well as the chain rule as given in Eq. 9.64)

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\ln(1 + x) = \frac{1}{1 + x}, \tag{13.51}$$

and hence $\mathrm{d}^2 \ln(1 + x)/\mathrm{d}x^2 = -1/(1 + x)^2$, and in general (use Eq. 13.20)

$$\frac{\mathrm{d}^n}{\mathrm{d}x^n}\,\ln(1 + x) = (-1)^{n-1}\,\frac{(n - 1)!}{(1 + x)^n}. \tag{13.52}$$

In addition, recall that $\mathrm{d}e^x/\mathrm{d}x = e^x$ (Eq. 12.30), $\mathrm{d}(\sinh x)/\mathrm{d}x = \cosh x$ (Eq. 12.48), $\mathrm{d}(\cosh x)/\mathrm{d}x = \sinh x$ (Eq. 12.49), and that $\ln 1 = 0$ (Eq. 12.2), $e^0 = 1$ (Eq. 12.28), $\sinh 0 = 0$ and $\cosh 0 = 1$ (Eq. 12.54).
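| $f(x)$ | $f_0$ | $f'_0$ | $f''_0$ | $f'''_0$ | $f^{(4)}_0$ | $f^{(5)}_0$ | $f^{(6)}_0$ | $f^{(7)}_0$ |
| $\ln(1 + x)$ | 0 | 1 | −1 | 2! | −3! | 4! | −5! | 6! |
| $e^x$ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| $\sinh x$ | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $\cosh x$ | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |

Table 13.2 Derivatives of $\ln(1 + x)$, $e^x$, $\sinh x$ and $\cosh x$, at $x = 0$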
Therefore, the corresponding approximations with the Taylor polynomials $t_7(x)$ are given by (use Table 13.2)

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \frac{x^7}{7} + o[x^7], \tag{13.53}$$

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \frac{x^6}{6!} + \frac{x^7}{7!} + o[x^7], \tag{13.54}$$

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + o[x^7], \tag{13.55}$$

$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + o[x^7]. \tag{13.56}$$

◦ Comment. Note that, by differentiating Eq. 13.53, one obtains the Taylor polynomial $t_6(x)$ for $1/(1 + x)$, as $1/(1 + x) \simeq 1 - x + x^2 - x^3 + x^4 - x^5 + x^6$, in agreement with Eq. 13.21. Thus, Eq. 13.53 can also be obtained by integrating Eq. 13.21. [The integration constant vanishes because $\ln 1 = 0$, Eq. 12.2.]
Remark 114. Note that, except for the signs of the second and fourth terms, the expressions for $\sinh x$ and $\cosh x$ are equal to those for $\sin x$ and $\cos x$ (Eqs. 13.40 and 13.41). [A deeper analysis of this issue will be obtained within the complex field, addressed in Vol. II.]
• Taylor polynomial of $\tanh x$ ♠
Here, you might like to sharpen your skills. Follow the procedure used to obtain Eq. 13.45 and show that

$$\tanh x = x - \frac{1}{3}\,x^3 + \frac{2}{15}\,x^5 - \frac{17}{315}\,x^7 + o[x^7]. \tag{13.57}$$

• Taylor polynomials of $\sinh^{-1} x$ and $\tanh^{-1} x$ ♣
Note that (use Eqs. 12.64 and 13.32)

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\sinh^{-1} x = \frac{1}{\sqrt{1 + x^2}} = 1 - \frac{1}{2}\,x^2 + \frac{3}{2 \cdot 4}\,x^4 - \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,x^6 + o[x^6]. \tag{13.58}$$

Integrating yields (again, the integration constant vanishes, because $\sinh^{-1} 0 = 0$, Eq. 12.54)

$$\sinh^{-1} x = x - \frac{1}{2}\,\frac{x^3}{3} + \frac{3}{2 \cdot 4}\,\frac{x^5}{5} - \frac{3 \cdot 5}{2 \cdot 4 \cdot 6}\,\frac{x^7}{7} + o[x^7]. \tag{13.59}$$

Similarly, note that (Eqs. 12.66 and 13.23)

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\tanh^{-1} x = \frac{1}{1 - x^2} = 1 + x^2 + x^4 + x^6 + o[x^6]. \tag{13.60}$$

Integrating, one obtains

$$\tanh^{-1} x = x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + o[x^7]. \tag{13.61}$$

• No Taylor polynomial for $\cosh^{-1} x$ around $x = 1$! ♠
What about $\cosh^{-1} x$? This requires a special treatment. To begin with, $\cosh^{-1} x$ is defined (in the real field) only for $x \ge 1$. So, we could consider the Taylor polynomial for $\cosh^{-1}(1 + x)$, with $x > 0$. Let us try. We know that $(\cosh^{-1} u)' = \pm 1/\sqrt{u^2 - 1}$ (Eq. 12.65). Hence, we have

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\cosh^{-1}(1 + x) = \frac{\pm 1}{\sqrt{(1 + x)^2 - 1}} = \frac{\pm 1}{\sqrt{2x + x^2}} = \frac{\pm 1}{\sqrt{2x}}\,\frac{1}{\sqrt{1 + x/2}}. \tag{13.62}$$

Next, recall that $1/\sqrt{1 + u} \simeq 1 - u/2 + 3\,u^2/8 - 5\,u^3/16$ (Eq. 13.29). Hence, we have

$$\frac{\mathrm{d}}{\mathrm{d}x}\,\cosh^{-1}(1 + x) \simeq \frac{\pm 1}{\sqrt{2x}}\left(1 - \frac{1}{4}\,x + \frac{3}{32}\,x^2 - \frac{5}{128}\,x^3\right) = \frac{\pm 1}{\sqrt{2}}\left(x^{-1/2} - \frac{1}{4}\,x^{1/2} + \frac{3}{32}\,x^{3/2} - \frac{5}{128}\,x^{5/2}\right). \tag{13.63}$$

Integrating, we have

$$\cosh^{-1}(1 + x) \simeq \frac{\pm 1}{\sqrt{2}}\left(2\,x^{1/2} - \frac{1}{6}\,x^{3/2} + \frac{3}{80}\,x^{5/2} - \frac{5}{448}\,x^{7/2}\right) = \pm \sqrt{2x}\left(1 - \frac{1}{12}\,x + \frac{3}{160}\,x^2 - \frac{5}{896}\,x^3\right). \tag{13.64}$$

It is not quite a Taylor polynomial (it is not even a polynomial), but it is as good as it gets. The $\sqrt{x}$ behavior for $x$ close to zero is unavoidable because its inverse, $x = \cosh y - 1$, has a $y^2$ behavior in a neighborhood of $y = 0$.
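A quick numerical comparison may help to appreciate how well Eq. 13.64 works for small $x$. This is only a sketch: the function name `acosh_near_one` and the sample values of $x$ are assumptions, and only the positive branch is considered, since that is the one returned by `math.acosh`.

```python
import math

def acosh_near_one(x):
    """Approximation of acosh(1 + x) for small x > 0 (positive branch of Eq. 13.64)."""
    return math.sqrt(2 * x) * (1 - x / 12 + 3 * x**2 / 160 - 5 * x**3 / 896)

for x in (0.5, 0.1, 0.01):
    exact = math.acosh(1 + x)
    approx = acosh_near_one(x)
    print(f"x = {x:g}: exact {exact:.10f}, approx {approx:.10f}, diff {exact - approx:+.2e}")
```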
13.5 Lagrange form of Taylor polynomial remainder ♣
In Section 13.3, we have shown that $f(x) = t_n(x) + o[h^n]$, where $h = x - c$ (Eq. 13.16). However, we can be more precise. Specifically, let us assume that $f^{(n+1)}(x) := \mathrm{d}^{n+1} f/\mathrm{d}x^{n+1}$ is continuous in $[c, x]$. Then, I claim that the remainder $R_n(x)$ of the Taylor polynomial of degree $n$ is given by

$$R_n(x) := f(x) - t_n(x) = f^{(n+1)}(\xi)\,\frac{(x - c)^{n+1}}{(n + 1)!}, \tag{13.65}$$
for some $\xi \in (c, x)$. This is known as the Lagrange form of the Taylor polynomial remainder. For the sake of clarity, we present the proof of Eq. 13.65 first for $R_1(x)$, then for $R_2(x)$, and finally for $R_n(x)$.
• Remainder for polynomials of degree one
Here, we address the remainder of Taylor polynomials of degree one. Specifically, we want to improve upon Eq. 13.3. We have the following
Theorem 139. If $f''(x)$ is continuous in $[c, x]$, we have

$$R_1(x) := f(x) - t_1(x) = f''(\xi)\,\frac{(x - c)^2}{2}, \tag{13.66}$$

for some $\xi = \xi(x) \in (c, x)$.
◦ Proof: We can apply the mean value theorem in differential calculus (Eq. 9.84) to the function $f'(x)$ to obtain $f''(\eta) = [f'(x) - f'(c)]/(x - c)$, where $\eta$ is a suitable number in $(c, x)$, which of course is a function of $x$: $\eta = \eta(x)$. This is equivalent to

$$f'(x) - f'(c) = f''(\eta)\,(x - c), \qquad \text{with } \eta = \eta(x) \in (c, x). \tag{13.67}$$

Integrating both sides of the above equation between $c$ and $x$, one obtains

$$f(x) - f(c) - f'(c)\,(x - c) = R_1(x), \tag{13.68}$$

with

$$R_1(x) = \int_c^x f''[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1. \tag{13.69}$$

Next, since $f''(x)$ is continuous by hypothesis, we can use the extended mean value theorem in integral calculus (Eq. 10.14), which guarantees that there exists a $\xi \in (c, x)$ such that (use $\int u\,\mathrm{d}u = \tfrac{1}{2}\,u^2$, Eq. 10.30)

$$R_1(x) = \int_c^x f''[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1 = f''(\xi) \int_c^x (x_1 - c)\,\mathrm{d}x_1 = f''(\xi)\,\frac{(x - c)^2}{2}, \tag{13.70}$$

in agreement with Eq. 13.66.
Remark 115. Note that Eq. 13.66 has a more stringent requirement than Eq. 13.3. Specifically, the proof of Eq. 13.3 only requires $f'(x)$ to exist at $x = c$, whereas that of Eq. 13.66 requires $f''(x)$ to be continuous in $[c, x]$. [It should be pointed out that using a different (more elegant, but less gut–level) proof, the requirement is only that $f'(x)$ be continuous in $[c, x]$ and $f''(x)$ exist in $(c, x)$ (see Remark 116, p. 540).]
• Remainder for polynomials of degree two
Here, we improve upon Eq. 13.10. Specifically, we have the following
Theorem 140. If $f''(x)$ is continuously differentiable in $[c, x]$, we have

$$R_2(x) := f(x) - t_2(x) = \frac{1}{3!}\,(x - c)^3\,f'''(\xi), \tag{13.71}$$

for some $\xi = \xi(x) \in (c, x)$.
◦ Proof: ♣ ♣ The mean value theorem in differential calculus (Eq. 9.84) yields $f'''(\eta) = [f''(x) - f''(c)]/(x - c)$, where $\eta$ is a suitable number in $(c, x)$, which of course is a function of $x$: $\eta = \eta(x)$. This is equivalent to

$$f''(x) - f''(c) = f'''(\eta)\,(x - c), \qquad \text{with } \eta = \eta(x) \in (c, x). \tag{13.72}$$

Integrating both sides of this equation between $c$ and $x$, one obtains

$$f'(x) - f'(c) - f''(c)\,(x - c) = \int_c^x f'''[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1. \tag{13.73}$$

Integrating again both sides of this new equation between $c$ and $x$, one obtains

$$f(x) - f(c) - f'(c)\,(x - c) - f''(c)\,\frac{(x - c)^2}{2} = \int_c^x \int_c^{x_2} f'''[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1\,\mathrm{d}x_2, \tag{13.74}$$

namely

$$R_2(x) = \int_c^x \int_c^{x_2} f'''[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1\,\mathrm{d}x_2. \tag{13.75}$$

Next, we follow the technique used for the extended mean value theorem in integral calculus (Theorem 116, p. 416). Let us start from

$$f'''_{\rm Min} \le f'''[\eta(x_1)] \le f'''_{\rm Max}, \tag{13.76}$$

where $f'''_{\rm Min}$ and $f'''_{\rm Max}$ denote the minimum and maximum of $f'''(x)$ in $[c, x]$. For simplicity, assume that $x > c$. [You might like to address the proof for $x < c$. The main difference is that the inequalities in Eq. 13.77 are reversed.] Then, multiplying by $(x_1 - c)$ and integrating twice as in Eq. 13.75, one obtains

$$f'''_{\rm Min}\,\frac{(x - c)^3}{3!} \le R_2(x) \le f'''_{\rm Max}\,\frac{(x - c)^3}{3!}. \tag{13.77}$$

Finally, since $f'''(x)$ is continuous by hypothesis, the corollary to the intermediate value theorem for continuous functions (Corollary 3, p. 342) guarantees that there exists a $\xi \in (c, x)$ such that $R_2(x) = f'''(\xi)\,(x - c)^3/3!$, in agreement with Eq. 13.71.
◦ Comment. Akin to Remark 115, p. 538, the proof of Eq. 13.9 only requires $f''(x)$ to exist at $x = c$ (Remark 112, p. 527), whereas that of Eq. 13.71 requires $f'''(x)$ to be continuous in $[c, x]$ (or at least $f''(x)$ to be continuous in $[c, x]$ and $f'''(x)$ to exist in $(c, x)$, Remark 116, p. 540).
• Remainder for polynomials of degree n
Here, we improve upon Eq. 13.16. Specifically, we have the following
Theorem 141. If $f^{(n+1)}(x) := \mathrm{d}^{n+1} f/\mathrm{d}x^{n+1}$ is continuous in $[c, x]$, then

$$R_n(x) := f(x) - t_n(x) = f^{(n+1)}(\xi)\,\frac{(x - c)^{n+1}}{(n + 1)!}, \tag{13.78}$$

for some $\xi = \xi(x) \in (c, x)$.
◦ Proof: ♣ ♣ The proof is a simple extension of that given for $R_2(x)$. Specifically, let us start from

$$f^{(n+1)}(\eta) = \frac{f^{(n)}(x) - f^{(n)}(c)}{x - c}, \tag{13.79}$$

where $\eta = \eta(x) \in (c, x)$ is obtained by applying the mean value theorem in differential calculus (Eq. 9.84) to the function $f^{(n)}(x)$. Multiplying by $x - c$ and integrating $n$ times both sides of this equation between $c$ and $x$, one obtains

$$R_n(x) = \int_c^x \cdots \int_c^{x_2} f^{(n+1)}[\eta(x_1)]\,(x_1 - c)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n. \tag{13.80}$$

Next, assume $x > c$. [Again you might like to address the proof for the case $x < c$.] Then, following essentially the same procedure used to obtain Eq. 13.77, we have

$$f^{(n+1)}_{\rm Min}\,\frac{(x - c)^{n+1}}{(n + 1)!} \le R_n(x) \le f^{(n+1)}_{\rm Max}\,\frac{(x - c)^{n+1}}{(n + 1)!}. \tag{13.81}$$

Finally, since $f^{(n+1)}(x)$ is continuous by hypothesis, the corollary to the intermediate value theorem for continuous functions (Corollary 3, p. 342) guarantees that there exists a $\xi \in (c, x)$ such that $R_n(x)$ is given by Eq. 13.78.
◦ Comment. Akin to Remark 115, p. 538, the proof of Eq. 13.16 only requires $f^{(n)}(x)$ to exist at $x = c$, whereas Eq. 13.78 requires $f^{(n+1)}(x)$ to be continuous in $[c, x]$ (or at least $f^{(n)}(x)$ to be continuous in $[c, x]$ and $f^{(n+1)}(x)$ to exist in $(c, x)$, see Remark 116 below).
Remark 116. It should be noted that the theorem holds under less restrictive hypotheses, specifically that: (i) $f^{(n)}(x)$ is continuous in $[c, x]$, and (ii) $f^{(n+1)}(x)$ exists in $(c, x)$. [For a proof, see Rudin (Ref. [56], p. 110–111).]
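The bounds in Eq. 13.81, and the existence of $\xi$ in Eq. 13.78, can be illustrated on a case where everything is known in closed form. In the sketch below the choice $f(x) = e^x$, $c = 0$, $x = 0.5$, $n = 3$ and the helper names are assumptions; for this $f$ we have $f^{(n+1)} = e^x$, so its minimum and maximum on $[0, x]$ are $1$ and $e^x$, and $\xi$ can be solved for explicitly.

```python
import math

def taylor_exp(x, n):
    """t_n(x) for exp about c = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def lagrange_bounds_exp(x, n):
    """Bounds of Eq. 13.81 for f = exp, c = 0, x > 0 (f^(n+1) ranges over [1, e^x])."""
    scale = x**(n + 1) / math.factorial(n + 1)
    return 1.0 * scale, math.exp(x) * scale

x, n = 0.5, 3
remainder = math.exp(x) - taylor_exp(x, n)
lower, upper = lagrange_bounds_exp(x, n)

# Solve R_n = exp(xi) x^(n+1)/(n+1)! for xi (possible here because f^(n+1) = exp).
xi = math.log(remainder / (x**(n + 1) / math.factorial(n + 1)))
print(lower, remainder, upper, xi)
```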
13.6 Appendix A. On local maxima and minima again ♣
In this appendix, first we present the sufficient conditions for a function $f(x)$ to attain a local minimum at a given point. Then, we up the ante, and address the necessary and sufficient conditions for local minima. [It is sufficient to address the problem for the limited case of local minima. For, a change in sign allows one to deal with the problem of maxima of $f(x)$.]
We have seen that $f'(c) = 0$ is a necessary condition for a local minimum of a differentiable $f(x)$ (Theorem 99, p. 384). Here, we consider the following
Theorem 142 (Sufficient conditions for a local minimum). Assume that $f(x)$ is twice differentiable at $x = c$. Sufficient conditions for $f(x)$ to have a local minimum at $x = c$ are $f'(c) = 0$ and $f''(c) > 0$.
◦ Proof: The existence of $f''(c)$ guarantees (see Remark 112, p. 527) the existence of the second–degree Taylor polynomial (Eq. 13.8). Recall that $f'(c) = 0$ is a necessary condition for the function to have a local minimum at $x = c$ (Theorem 99, p. 384). Then, setting $f'(c) = 0$ in Eq. 13.9 yields

$$f(x) = f(c) + \frac{1}{2}\,f''(c)\,(x - c)^2 + \varepsilon_2, \tag{13.82}$$

where $\varepsilon_2 = o[h^2]$, with $h = x - c$. Note that the definition of the symbol $o[h^2]$ (Definition 152, p. 326) implies that, for $h$ sufficiently small, $\varepsilon_2$ is negligible with respect to $\tfrac{1}{2}\,f''(c)\,(x - c)^2$. Specifically, there exists a $\delta > 0$ such that, for any $0 < |h| < \delta$, we have $|\varepsilon_2| < |\tfrac{1}{2}\,f''(c)\,(x - c)^2|$. Hence, for any such $h$, we have that $f''(c) > 0$ implies that $f(x) > f(c)$. Thus, we have a local minimum.
• General rule for local minima/maxima ♠
Theorem 142 above does not give us any information whenever we have both $f'(c) = 0$ and $f''(c) = 0$. In this case, one needs to look at the higher derivatives. For instance, at the end of Subsection 9.5.1, we have seen that the function $f(x) = x^3$, for which (with $c = 0$) we have $f'(c) = f''(c) = 0$ and $f'''(c) > 0$, has a horizontal inflection point at $x = c$ (Fig. 9.4, p. 385). On the other hand, for $f(x) = x^4$, which has $f'(c) = f''(c) = f'''(c) = 0$ and $f^{(4)}(c) = 4! > 0$, we have a minimum at $x = c$. Is there a general rule to decide on this matter? The answer is: “Yes!” For, we have the following
Theorem 143 (General rule for local minima/maxima). Assume that $\mathrm{d}^n f/\mathrm{d}x^n$ is the lowest–order nonzero derivative at $x = c$, namely that $f'(c) = \cdots = f^{(n-1)}(c) = 0$, whereas $f^{(n)}(c) \neq 0$. Then, if $n$ is odd, we have a horizontal inflection point, whereas if $n$ is even and $f^{(n)}(c) > 0$ ($f^{(n)}(c) < 0$), we have a minimum (maximum).
◦ Proof: Since $f'(c) = \cdots = f^{(n-1)}(c) = 0$, the Taylor polynomial of degree $n$ approximating $f(x)$ is such that (Eq. 13.16)

$$f(x) = f(c) + \frac{1}{n!}\,f^{(n)}(c)\,(x - c)^n + \varepsilon_n, \tag{13.83}$$

where $\varepsilon_n = o[h^n]$, with $h = x - c$. Note that the definition of the symbol $o[h^n]$ (Definition 152, p. 326) implies that, for $h$ sufficiently small, $\varepsilon_n$ is negligible with respect to $f^{(n)}(c)\,(x - c)^n/n!$. Specifically, there exists a $\delta > 0$ such that, for any $0 < |h| < \delta$, we have $|\varepsilon_n| < |f^{(n)}(c)\,(x - c)^n/n!|$.
Consider the case of $n$ even. Then, for any such $h$, we have that $f^{(n)}(c) > 0$ implies that $f(x) > f(c)$, and we have a local minimum. [Of course, if $f^{(n)}(c) < 0$ we have a maximum.]
Next, consider the case in which $n$ is odd. Then, Eq. 13.83 still applies, as does the fact that for $|h|$ sufficiently small we have $|\varepsilon_n| < |f^{(n)}(c)\,(x - c)^n/n!|$. Now however, $n$ being odd implies that $(x - c)^n$ changes sign as $x - c$ changes sign. Thus, we have a horizontal inflection point!
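The rule of Theorem 143 translates directly into a short decision procedure. The sketch below assumes the derivatives at $c$ are supplied as a list; the function name `classify_critical_point`, the returned labels, and the handling of the cases outside the theorem's hypotheses (nonzero first derivative, or all supplied derivatives zero) are choices made for this illustration.

```python
def classify_critical_point(derivs):
    """Apply the rule of Theorem 143; derivs[k-1] = f^(k)(c) for k = 1, 2, ..."""
    for order, value in enumerate(derivs, start=1):
        if value != 0:
            if order == 1:
                # f'(c) != 0: the hypothesis of the theorem is not met
                return "not a critical point"
            if order % 2 == 1:
                return "horizontal inflection point"
            return "local minimum" if value > 0 else "local maximum"
    # Every supplied derivative vanishes: higher derivatives would be needed
    return "undetermined"

print(classify_critical_point([0, 0, 6]))       # f(x) = x^3 at c = 0
print(classify_critical_point([0, 0, 0, 24]))   # f(x) = x^4 at c = 0
```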
13.7 Appendix B. Finite differences revisited
♣
We now have the tool necessary to assess the level of accuracy of the finite– difference approximations introduced in Section 9.9. To begin with, note that, if f (x) admits continuous derivatives, up to order n + 1, for x ∈ (c − δ, c + δ), then Eqs. 13.14, 13.16 and 13.65 imply f (x) = f (c) + f (c) h + f (c)
h2 hn + · · · + f (n) (c) + O hn+1 , 2! n!
(13.84)
where h = x − c, with |h| < δ. Next, assume that we know the function f (x) at a discrete set of points that are uniformly spaced, with xk+1 − xk = h (k = 1, . . . , n). Let k denote the generic point. Then, setting fk±1 = f (xk ± h), for n = 3 the above equation yields fk±1 = fk ± fk h + fk
h2 h3 ± fk + O h4 . 2! 3!
(13.85)
Thus, we have fk+1 − fk−1 = fk 2 h + O[h3 ]. Then, using h O[hn ] = O[hn+1 ] (Eq. 8.90), we obtain the central finite–difference approximation for the first derivative fk =
fk+1 − fk−1 + O h2 , 2h
(13.86)
an improvement upon Eq. 9.174. Similarly, we have fk+1 − 2fk + fk−1 = fk h2 + O[h4 ] (use again Eq. 13.85). Accordingly, we obtain the central finite–difference approximation for the second derivative (use h2 O[hn ] = O[hn+2 ], Eq. 8.90) fk =
fk+1 − 2 fk + fk−1 + O h2 , h2
(13.87)
an improvement upon Eq. 9.175. You may obtain the higher–order derivatives. For instance, the central finite–difference approximation for the fourth–order derivative (use Eq. 13.84 with $n = 5$, $x_k = c$ and $x_{k+j} = c + j\,h$, an extension of Eq. 13.85) is
$$f^{(4)}_k = \frac{f_{k+2} - 4 f_{k+1} + 6 f_k - 4 f_{k-1} + f_{k-2}}{h^4} + O[h^2], \tag{13.88}$$
an improvement upon Eq. 9.177.
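The orders of accuracy stated in Eqs. 13.86 and 13.87 can be checked numerically. The following is a minimal sketch, assuming the test function $\sin x$ and the point $x = 1$ (both arbitrary choices, not taken from the text); the function names are hypothetical. If the error is $O[h^2]$, halving $h$ should reduce it by a factor of about four.

```python
import math

def central_first(f, x, h):
    """Central difference for f'(x), Eq. 13.86; error O[h^2]."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def central_second(f, x, h):
    """Central difference for f''(x), Eq. 13.87; error O[h^2]."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

x = 1.0  # test point (arbitrary choice)
for h in (0.1, 0.05, 0.025):
    err1 = abs(central_first(math.sin, x, h) - math.cos(x))    # exact f' = cos x
    err2 = abs(central_second(math.sin, x, h) + math.sin(x))   # exact f'' = -sin x
    print(f"h = {h:<6} error in f' = {err1:.3e}   error in f'' = {err2:.3e}")
```

Each printed error should be roughly one quarter of the one on the preceding line.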
13. Taylor polynomials
• Forward and backward finite differences
The expressions given above are symmetric with respect to the point $x_k$. For this reason, they are referred to as central finite–difference approximations. Sometimes, we prefer to use points that lie only on one side of $x_k$. For instance, at the boundary of the interval under consideration, we cannot use the central finite–difference expression in Eq. 13.86. In such a case, we use
$$f'_k = \frac{f_{k+1} - f_k}{h} + O[h] \qquad\text{and}\qquad f'_k = \frac{f_k - f_{k-1}}{h} + O[h], \tag{13.89}$$
which are known respectively as the forward and backward finite–difference approximations for the first derivative. You may verify this, by using the same procedure employed to obtain Eq. 13.86. However, now the accuracy is lower than that in Eq. 13.86. If we want to obtain an accuracy of order $h^2$, for the forward finite–difference approximation we can use
$$f'_k = \frac{-3 f_k + 4 f_{k+1} - f_{k+2}}{2h} + O[h^2], \tag{13.90}$$
as you may verify. [Similar expressions may be obtained for the backward finite difference approximation. Ditto for higher–order derivatives.]
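The difference between the $O[h]$ accuracy of Eq. 13.89 and the $O[h^2]$ accuracy of Eq. 13.90 can be seen by estimating the observed order from two step sizes. This is a sketch under stated assumptions: the test function $e^x$, the point $x = 0.5$ and the helper `observed_order` are illustrative choices, not part of the text.

```python
import math

def forward_first_order(f, x, h):
    """Two-point forward difference, first expression in Eq. 13.89; error O[h]."""
    return (f(x + h) - f(x)) / h

def forward_second_order(f, x, h):
    """Three-point forward difference, Eq. 13.90; error O[h^2]."""
    return (-3.0 * f(x) + 4.0 * f(x + h) - f(x + 2.0 * h)) / (2.0 * h)

def observed_order(scheme, f, exact, x, h):
    """Estimate p in error ~ h^p by comparing the errors for h and h/2."""
    err_h = abs(scheme(f, x, h) - exact)
    err_half = abs(scheme(f, x, h / 2.0) - exact)
    return math.log2(err_h / err_half)

x = 0.5  # test point (arbitrary); the derivative of exp is exp itself
p1 = observed_order(forward_first_order, math.exp, math.exp(x), x, 1e-2)
p2 = observed_order(forward_second_order, math.exp, math.exp(x), x, 1e-2)
print(f"observed order, Eq. 13.89: {p1:.3f}")
print(f"observed order, Eq. 13.90: {p2:.3f}")
```

The two printed values should be close to 1 and 2, respectively.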
13.8 Appendix C. Linearization
♣
Thus far, regarding the solution of systems of algebraic equations, we have addressed in some depth solely the linear ones. In other words, the algebraic systems that we know how to solve at this point in our journey are linear. Unfortunately, for most problems, the governing equations are nonlinear. [This is true in particular for spring–connected particles in two or three dimensions, even when the springs are assumed to be linear.] The same is even more true for differential equations, with the exception of Subsection 10.5.1 on separation of variables. One wonders whether one can approximate the nonlinear equations with linear ones. To this end, one needs to introduce some simplifying assumptions. In this section, we introduce a procedure that allows us to eliminate all the nonlinear terms, so that the final equations are linear. Such a procedure is known as linearization. The basis for linearization consists in assuming the unknowns to be suitably small, akin to what we did for linear springs (Remark 41, p. 149). [Once more, linearization is of primary importance in engineering, since linear systems are much easier to work with than nonlinear ones, because of the superposition theorems (Subsection 9.8.3).]
Specifically, on the basis of the theory on Taylor polynomials presented in this chapter, in a suitable neighborhood of a given point, say $c$, we may approximate a suitably smooth function $f(x)$ with a polynomial. The smaller the interval under consideration, the better the approximation. Accordingly, if $|x - c|$ is small, all the nonlinear terms of the polynomial (quadratic, cubic, etc.) are much smaller than the linear ones, and hence may be neglected. For instance, if $x = 10^{-3} = 0.001$, we have $x^2 = 10^{-6} = 0.000\,001$. The other powers are even smaller (see again Remark 66, p. 325). It should be apparent that, if we disregard all the nonlinear terms, the resulting equations contain only linear terms (and possibly terms that are independent of the unknown and therefore may be moved to the right side of the equation, as a forcing term). The final equations are called linearized. This means that the original equations have been approximated with some linear equations that are very close to the original ones, provided that the unknowns are very small. You are going to ask me: “How small should the variables be?” Answer: “Suitably small!” This means that the answer depends upon the desired level of accuracy for the problem under consideration. In other words, there are no clear–cut rules in this respect. For sure, the smaller the value of the unknown, the better is the approximation of the description obtained after linearization.
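To get a feeling for what "suitably small" means, one may compare a nonlinear function with its linearization. The sketch below assumes the example $f(x) = \sin x$ linearized about $c = 0$ (an illustrative choice, not one made in the text); the helper `linearize` is hypothetical. The relative error should shrink roughly like $x^2/6$.

```python
import math

def linearize(f, dfdx, c):
    """Return the first-order Taylor polynomial of f about the point c."""
    f_c, slope = f(c), dfdx(c)
    return lambda x: f_c + slope * (x - c)

sin_lin = linearize(math.sin, math.cos, 0.0)   # sin x is replaced by x near x = 0

for x in (0.5, 0.1, 0.01, 0.001):
    rel_err = abs(sin_lin(x) - math.sin(x)) / abs(math.sin(x))
    print(f"x = {x:<6} relative error of the linearization = {rel_err:.2e}")
```

Whether an error of, say, 0.2% (at $x = 0.1$) is acceptable depends on the problem at hand, which is the point made above.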
Chapter 14
Damping and aerodynamic drag
In this chapter, we exploit the newly acquired knowledge on the exponential function that we introduced in Chapter 12. Much of the chapter is devoted to extending the results for undamped harmonic oscillators that were obtained in Chapter 11 to the corresponding problems in which damping is present. Specifically, in addressing undamped harmonic oscillators, we considered a system composed of a mass and a spring. Here, we extend the problem to include the presence of damping. Aerodynamic drag is also addressed.
• Overview of this chapter
We begin with the definition of a damper and a dashpot (a linear damper) and the corresponding mathematical models. We use these to obtain the equation that governs the motion of a damped harmonic oscillator, namely a mass–spring–dashpot system (Section 14.1). This is accomplished by adding the force due to the presence of a dashpot, to obtain the equation for a damped harmonic oscillator, $m\,\ddot{x} + c\,\dot{x} + k\,x = F(t)$ (Eq. 14.4). The solution to this equation is addressed in Sections 14.2, 14.3 and 14.4, which address, respectively: (i) free oscillations, (ii) periodic (in particular harmonic) forcing and (iii) arbitrary forcing. [These are extensions to damped harmonic oscillators of the undamped harmonic oscillator results of Sections 11.6–11.8.] Illustrative examples are presented in Section 14.5, and include how to find out when it’s time to change the shock absorbers of your car. Next, in Section 14.6, we consider a specific problem that involves nonlinear damping, specifically damping that is proportional to the square of the velocity, as in the case of aerodynamic drag. In particular, we address the free fall of a heavy particle (namely a particle subject to gravity), in the presence of air. Then, in Section 14.7, we shift gears and address how the presence of damping alters the energy conservation theorem.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021
L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_14
We also have two appendices. In the first (Section 14.8), we extend the stability analysis to a related problem, namely $\ddot{x} + \breve{p}\,\dot{x} + \breve{q}\,x = 0$, which is fully equivalent to $m\,\ddot{x} + c\,\dot{x} + k\,x = 0$ (Eq. 14.7), except that this time no constraint is imposed on the signs of $\breve{p}$ and $\breve{q}$, which had been assumed to be positive before. Specifically, we will obtain the conditions that $\breve{p}$ and $\breve{q}$ must satisfy to guarantee stability. In the second appendix (Section 14.9), we address a particularly interesting problem, a damped harmonic oscillator in which the mass is so small that the contribution of the inertia term is negligible.
14.1 Dampers. Damped harmonic oscillators
In this section, we introduce dampers, in particular a dashpot, namely a linear damper. Then, we include this in the equation of motion and obtain the equations that govern free and forced oscillations of a damped harmonic oscillator (namely a linear mass–spring–dashpot system).
14.1.1 Dampers and dashpots
We have the following
Definition 220 (Damper and dashpot). A dashpot is a mechanical device that produces a force proportional to the velocity, acting in the direction opposite to that of the velocity:
$$F_D = -c\,v \qquad (c > 0), \tag{14.1}$$
where $c$ is a positive constant, called the damping coefficient. If the damping coefficient $c$ is a positive function of the absolute value of the velocity,
$$c = c(|v|) > 0, \tag{14.2}$$
we use the more generic term damper. ◦ Warning. Some authors use the terms damper and dashpot interchangeably (with the meaning here used for damper). In this case, a damper subject to Eq. 14.1, with c constant, is referred to as an ideal dashpot.
14.1.2 Modeling damped harmonic oscillators
Consider a mass–spring–dashpot system, as depicted in Fig. 14.1. We will refer to this system as a damped harmonic oscillator.
Fig. 14.1 Forced oscillation of a damped harmonic oscillator
In the case of a damped harmonic oscillator, Newton's second law, which governs its motion, takes the form
$$m\,a = F_S + F_D + F(t), \tag{14.3}$$
where the force of the (linear) spring is given by $F_S = -k\,x$, with $k > 0$ (Eq. 4.3), the dashpot force is given by Eq. 14.1, and $F(t)$ denotes the sum of all the additional forces, if any. Combining yields
$$m\,\frac{d^2x}{dt^2} + c\,\frac{dx}{dt} + k\,x = F(t), \tag{14.4}$$
where
$$m > 0, \qquad c > 0, \qquad k > 0. \tag{14.5}$$
The equation in Eq. 14.4 will be referred to as the equation of the forced oscillations of a damped harmonic oscillator. ◦ A word of caution. The term damped harmonic oscillator may appear to you like an oxymoron, because harmonic oscillator implies constant–amplitude oscillations in the absence of a forcing function, whereas damped implies decreasing amplitude motion. However, this term is widely used and I see no reason to change it. Think of it as a harmonic oscillator, with the addition of a damper. The situation in fact is even worse! The expression damped harmonic oscillator may be misleading, because, if the damping coefficient is sufficiently high, the system does not oscillate at all (see Subsection 14.2.2).
However, following tradition, we adopt this terminology even in such a case, namely when the mass does not oscillate in the absence of a forcing function. Of particular interest here is the case in which the forcing is harmonic, namely $F(t) = \breve{F}\cos\Omega t + \breve{G}\sin\Omega t$:
$$m\,\frac{d^2x}{dt^2} + c\,\frac{dx}{dt} + k\,x = \breve{F}\cos\Omega t + \breve{G}\sin\Omega t. \tag{14.6}$$
The above equation will be referred to as the equation of the harmonically forced oscillations of a damped harmonic oscillator. Also of interest is the case in which the applied force vanishes: $F(t) = 0$. In this case, we simply have
$$m\,\frac{d^2x}{dt^2} + c\,\frac{dx}{dt} + k\,x = 0. \tag{14.7}$$
The above equation will be referred to as the equation of the free oscillations of a damped harmonic oscillator. As in the case of the oscillations of an undamped harmonic oscillator (Remark 98, p. 465), experimental evidence indicates that the equation of motion must be “completed” by adding two initial conditions, typically those which state that the initial location and the initial velocity are assigned, namely (Eq. 11.33)
$$x(0) = x_0 \qquad\text{and}\qquad \dot{x}(0) = v_0, \tag{14.8}$$
where $x_0$ and $v_0$ are arbitrarily prescribed constants.
14.2 Damped harmonic oscillator. Free oscillations
In this section, we address the unforced problem, namely the mass–spring–dashpot system depicted in Fig. 14.2. This will be referred to as the free–oscillation problem for a damped harmonic oscillator, and is governed by Eq. 14.7, along with the two initial conditions in Eq. 14.8. To solve this problem, let us try solutions of the type
$$x(t) = e^{\beta t}\cos\omega t \qquad\text{and}\qquad x(t) = e^{\beta t}\sin\omega t. \tag{14.9}$$
Consider the function $x(t) = e^{\beta t}\cos\omega t$. We have
$$\frac{dx}{dt} = \beta\,e^{\beta t}\cos\omega t - \omega\,e^{\beta t}\sin\omega t \tag{14.10}$$
Fig. 14.2 Free oscillations of a damped harmonic oscillator
and
$$\frac{d^2x}{dt^2} = (\beta^2 - \omega^2)\,e^{\beta t}\cos\omega t - 2\,\beta\,\omega\,e^{\beta t}\sin\omega t. \tag{14.11}$$
[Hint: Use the product rule (Eq. 9.18), as well as the derivatives of the functions $\sin\omega t$, $\cos\omega t$, and $e^{\beta t}$, given by $\omega\cos\omega t$, $-\omega\sin\omega t$, and $\beta\,e^{\beta t}$ (Eqs. 9.70, 9.71, and 12.30, respectively).] Combining with Eq. 14.7 yields
$$\left[m\,(\beta^2 - \omega^2) + c\,\beta + k\right] e^{\beta t}\cos\omega t - \omega\,(2\,m\,\beta + c)\,e^{\beta t}\sin\omega t = 0. \tag{14.12}$$
Equation 14.12 implies necessarily (Theorem 122, p. 433, on the linear independence of $\sin x$ and $\cos x$)
$$m\,(\beta^2 - \omega^2) + c\,\beta + k = 0 \qquad\text{and}\qquad \omega\,(2\,m\,\beta + c) = 0. \tag{14.13}$$
The second equality in the above equation indicates that we have two options: either (i) $2\,\beta\,m + c = 0$ or (ii) $\omega = 0$.
◦ Option (i). Consider the first option, which yields $\beta = -c/(2m) < 0$. Then, the first equality in Eq. 14.13 is used to obtain $\omega^2$, as
$$\omega^2 = \frac{k + c\,\beta}{m} + \beta^2 = \frac{k}{m} - \frac{c^2}{4\,m^2} = \frac{4\,m\,k - c^2}{4\,m^2}, \tag{14.14}$$
namely $\omega = \sqrt{4\,m\,k - c^2}/(2m)$. Similar results are obtained for $x(t) = e^{\beta t}\sin\omega t$, as you may verify. Summarizing, for the first option we have that both $e^{\beta t}\cos\omega t$ and $e^{\beta t}\sin\omega t$ are solutions to Eq. 14.7, provided that
$$\text{Option (i)}: \qquad \omega = \frac{\sqrt{4\,m\,k - c^2}}{2m} \qquad\text{and}\qquad \beta = \frac{-c}{2m}. \tag{14.15}$$
Note that the above expression for $\omega$ is real if, and only if, $c \le c_{cr}$, where
$$c_{cr} := 2\sqrt{m\,k} = 2\,m\,\mathring{\omega} \tag{14.16}$$
is called the critical damping. [In the above equation,
$$\mathring{\omega} := \sqrt{k/m} \tag{14.17}$$
denotes the natural frequency in the absence of damping.] However, if $c = c_{cr}$ we have $\omega = 0$. Hence $e^{\beta t}\sin\omega t = 0$ (trivial solution) and $e^{\beta t}\cos\omega t$ is the only nontrivial solution. Accordingly, we can use this option only for $c < c_{cr}$.
◦ Option (ii). Next, consider the second option, namely $\omega = 0$. Then, $x(t) = e^{\beta t}\sin\omega t = 0$ (Eq. 14.9) corresponds to the trivial solution. Nonetheless, the first equality in Eq. 14.13 reduces to $m\,\beta^2 + c\,\beta + k = 0$ and it gives us two roots for $\beta$, namely $\beta_\pm = (-c \pm \sqrt{c^2 - 4mk})/(2m)$ (roots of a quadratic equation, Eq. 6.36). Summarizing, for the second option we have
$$\text{Option (ii)}: \qquad \omega = 0 \qquad\text{and}\qquad \beta_\pm = \frac{-c \pm \sqrt{c^2 - 4\,m\,k}}{2m}, \tag{14.18}$$
namely two solutions, $e^{\beta_+ t}$ and $e^{\beta_- t}$. Note that the expression for $\beta_\pm$ in the second option is real if, and only if, $c \ge c_{cr}$. However, if $c = c_{cr}$ we have $\beta_+ = \beta_-$, and again we have only one nontrivial solution. Accordingly, we can use this option only for $c > c_{cr}$.
◦ Option (iii). The above considerations indicate that we have to address a third possibility, namely $c = c_{cr} = 2\sqrt{mk}$. In this case, we have
$$\text{Option (iii)}: \qquad \omega = 0 \qquad\text{and}\qquad \beta = \beta_{cr}, \tag{14.19}$$
with (use $\mathring{\omega} := \sqrt{k/m}$, Eq. 14.17)
$$\beta_{cr} = \frac{-c_{cr}}{2m} = -\sqrt{\frac{k}{m}} = -\mathring{\omega}. \tag{14.20}$$
[More on this in Subsection 14.2.3.]
◦ Comment. The solution for $c < c_{cr}$ will be referred to as the small–damping solution and will be denoted by $x_{SD}(t)$. The solution for $c > c_{cr}$ will be referred to as the large–damping solution and will be denoted by $x_{LD}(t)$. The solution for $c = c_{cr}$ will be referred to as the critical–damping solution and will be denoted by $x_{CD}(t)$. The three solutions are addressed separately in the three subsections that follow.
Remark 117. Note that in all three cases, we have
$$\beta < 0, \tag{14.21}$$
since $m$, $c$ and $k$ are positive (Eq. 14.5). We say that, for all three options, the solution is asymptotically stable, to mean that it tends to zero as $t$ tends to infinity.
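The three options can be summarized in a few lines of code. The following is only a sketch: the function name `free_response_parameters` and the numerical values are hypothetical, and the regime is selected from the sign of $c^2 - 4mk$, which is equivalent to comparing $c$ with $c_{cr}$. In floating–point arithmetic the critical case is detected only when $c^2 = 4mk$ holds exactly, so the third branch is mostly of conceptual interest.

```python
import math

def free_response_parameters(m, c, k):
    """Return (regime, c_cr, exponents) for m x'' + c x' + k x = 0 (Eq. 14.7)."""
    if min(m, c, k) <= 0.0:
        raise ValueError("Eq. 14.5 requires m > 0, c > 0 and k > 0")
    c_cr = 2.0 * math.sqrt(m * k)        # critical damping, Eq. 14.16
    disc = c * c - 4.0 * m * k           # its sign selects the option
    if disc < 0.0:                       # Option (i), c < c_cr: (beta, omega), Eq. 14.15
        return "small", c_cr, (-c / (2.0 * m), math.sqrt(-disc) / (2.0 * m))
    if disc > 0.0:                       # Option (ii), c > c_cr: (beta+, beta-), Eq. 14.18
        root = math.sqrt(disc)
        return "large", c_cr, ((-c + root) / (2.0 * m), (-c - root) / (2.0 * m))
    return "critical", c_cr, (-c / (2.0 * m),)   # Option (iii): (beta_cr,), Eq. 14.20

print(free_response_parameters(1.0, 0.2, 1.01))  # small damping
print(free_response_parameters(1.0, 0.3, 0.02))  # large damping
print(free_response_parameters(1.0, 2.0, 1.0))   # critical damping
```

In all three cases the returned exponents are negative, in agreement with Eq. 14.21.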
14.2.1 Small–damping solution
Let us consider Option (i), namely assume that the damping is smaller than the critical one (small–damping solution):
$$c < c_{cr} := 2\sqrt{m\,k} = 2\,m\,\mathring{\omega}. \tag{14.22}$$
In this case, the functions in Eq. 14.9 (namely $e^{\beta t}\cos\omega t$ and $e^{\beta t}\sin\omega t$) are linearly independent solutions to Eq. 14.7, provided of course that $\beta$ and $\omega$ are those given in Eq. 14.15, namely $\omega = \sqrt{4\,m\,k - c^2}/(2m)$ and $\beta = -c/(2m)$. Thus, we can exploit the superposition theorem for linear homogeneous equations (Theorem 112, p. 407) and use a linear combination of the two solutions to obtain
$$x_{SD}(t) = A\,e^{\beta t}\cos\omega t + B\,e^{\beta t}\sin\omega t. \tag{14.23}$$
Next, let us consider the initial conditions. We have two arbitrary constants, $A$ and $B$, which may be used to satisfy the two initial conditions in Eq. 14.8. These yield
$$x_{SD}(0) = A = x_0, \qquad \dot{x}_{SD}(0) = \beta A + \omega B = v_0. \tag{14.24}$$
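[Hint: Use Eq. 14.10, and the corresponding one for $e^{\beta t}\sin\omega t$.] Thus, we have
$$A = x_0 \qquad\text{and}\qquad B = \frac{v_0 - \beta\,x_0}{\omega}. \tag{14.25}$$
Hence, the solution is
$$x_{SD}(t) = x_0\,e^{\beta t}\cos\omega t + \frac{v_0 - \beta\,x_0}{\omega}\,e^{\beta t}\sin\omega t, \tag{14.26}$$
with $\beta$ and $\omega$ given by Eq. 14.15. This is equivalent to
$$x_{SD}(t) = x_0\,e^{\beta t}\left(\cos\omega t - \frac{\beta}{\omega}\sin\omega t\right) + v_0\,\frac{1}{\omega}\,e^{\beta t}\sin\omega t. \tag{14.27}$$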
Indeed, $x_{SD}(t)$ satisfies Eq. 14.7, as well as the two initial conditions in Eq. 14.8. [Specifically, the first term in Eq. 14.27 satisfies the conditions $x(0) = x_0$ and $\dot{x}(0) = 0$, whereas for the second term we have $x(0) = 0$ and $\dot{x}(0) = v_0$, as you may verify.] Recall that $\beta$ is negative (Eq. 14.21). Thus, the particle tends to the equilibrium position, $x = 0$, while oscillating. From a physical point of view, this solution corresponds to damped oscillations. [Of course, if $c = 0$, we have $\beta = 0$ and $\omega = \sqrt{k/m}$, and we recover the solution of the undamped system obtained in Chapter 11 (Eq. 11.40).] The solution for $x_0 = 1$ and $v_0 = 0$, and that for $x_0 = 0$ and $v_0 = 1$ are depicted respectively in Figs. 14.3 and 14.4, which are obtained with $\omega = 1$ and $\beta = -0.1$.
Fig. 14.3 xSD (t) for x0 = 1, v0 = 0
Fig. 14.4 xSD (t) for x0 = 0, v0 = 1
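The "as you may verify" can also be done numerically. The sketch below assumes the illustrative values $m = 1$, $c = 0.2$, $k = 1.01$ (chosen here because they give $\beta = -0.1$ and $\omega \approx 1$, the values quoted for the figures; the text does not state which $m$, $c$, $k$ were used). It evaluates Eq. 14.27 and checks the residual of Eq. 14.7 with the central differences of Eqs. 13.86 and 13.87.

```python
import math

m, c, k = 1.0, 0.2, 1.01       # illustrative values (assumption): beta = -0.1, omega ~ 1
beta = -c / (2.0 * m)                                   # Eq. 14.15
omega = math.sqrt(4.0 * m * k - c * c) / (2.0 * m)      # Eq. 14.15

def x_sd(t, x0, v0):
    """Small-damping free response, Eq. 14.27."""
    e = math.exp(beta * t)
    return (x0 * e * (math.cos(omega * t) - beta / omega * math.sin(omega * t))
            + v0 * e * math.sin(omega * t) / omega)

def residual(t, x0, v0, h=1e-3):
    """m x'' + c x' + k x, with the derivatives from central differences."""
    xm, x, xp = x_sd(t - h, x0, v0), x_sd(t, x0, v0), x_sd(t + h, x0, v0)
    return m * (xp - 2.0 * x + xm) / h**2 + c * (xp - xm) / (2.0 * h) + k * x

print("x(0) for x0 = 1, v0 = 0:", x_sd(0.0, 1.0, 0.0))
print("largest residual sampled:",
      max(abs(residual(t, 1.0, 0.0)) for t in (0.5, 2.0, 10.0)))
```

The residual is not exactly zero because the finite differences carry an $O[h^2]$ error; it should be small compared with the individual terms of the equation.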
14.2.2 Large–damping solution
Consider Option (ii), namely assume that the damping is larger than the critical one (large–damping solution):
$$c > c_{cr} := 2\sqrt{m\,k} = 2\,m\,\mathring{\omega}. \tag{14.28}$$
As shown above, in this case we have two linearly independent solutions of the type in Eq. 14.9 (with $\omega = 0$), namely $e^{\beta_+ t}$ and $e^{\beta_- t}$, with $\beta_\pm = (-c \pm \sqrt{c^2 - 4mk})/(2m)$ (Eq. 14.18). [Alternatively, and more directly, we can try $x(t) = e^{\beta t}$, combine with Eq. 14.7, and obtain the same result.] Thus, again, we can exploit the superposition theorem for linear homogeneous equations (Theorem 112, p. 407) and use a linear combination of the two solutions to obtain
$$x_{LD}(t) = A_+\,e^{\beta_+ t} + A_-\,e^{\beta_- t}. \tag{14.29}$$
Again, we have two arbitrary constants, $A_+$ and $A_-$, which may be used to impose the two initial conditions in Eq. 14.8. These yield
$$x(0) = A_+ + A_- = x_0, \qquad \dot{x}(0) = \beta_+ A_+ + \beta_- A_- = v_0. \tag{14.30}$$
This is a system of two linear algebraic equations with two unknowns, $A_+$ and $A_-$, whose solution is given by $A_\pm = (v_0 - \beta_\mp\,x_0)/(\beta_\pm - \beta_\mp)$, as you may verify. Thus, in this case the solution is
$$x_{LD}(t) = \frac{x_0}{\beta_+ - \beta_-}\left(\beta_+\,e^{\beta_- t} - \beta_-\,e^{\beta_+ t}\right) + \frac{v_0}{\beta_+ - \beta_-}\left(e^{\beta_+ t} - e^{\beta_- t}\right). \tag{14.31}$$
Indeed, $x_{LD}(t)$ satisfies Eq. 14.7, as well as the two initial conditions in Eq. 14.8. [Specifically, the first term satisfies the conditions $x(0) = x_0$ and $\dot{x}(0) = 0$, whereas for the second term we have $x(0) = 0$ and $\dot{x}(0) = v_0$, as you may verify.] Both $\beta_+$ and $\beta_-$ are negative (Eq. 14.18). From a physical point of view, the particle tends to the equilibrium position $x = 0$, without oscillating. [As mentioned above, the system is still called a damped harmonic oscillator, even though there is no oscillation involved.] The solution for $x_0 = 1$ and $v_0 = 0$, and that for $x_0 = 0$ and $v_0 = 1$ are depicted respectively in Figs. 14.5 and 14.6, which were obtained with $\beta_+ = -0.1$ and $\beta_- = -0.2$.
Fig. 14.5 xLD (t) for x0 = 1, v0 = 0
Fig. 14.6 xLD (t) for x0 = 0, v0 = 1
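As a check on the algebra, one may compute $A_\pm$ from Eq. 14.30 and compare Eq. 14.29 with the closed form of Eq. 14.31. This is a minimal sketch using the values $\beta_+ = -0.1$ and $\beta_- = -0.2$ quoted for the figures; the function names are hypothetical.

```python
import math

def large_damping_constants(x0, v0, beta_plus, beta_minus):
    """Solve the two-by-two system of Eq. 14.30 for A+ and A-."""
    a_plus = (v0 - beta_minus * x0) / (beta_plus - beta_minus)
    a_minus = (v0 - beta_plus * x0) / (beta_minus - beta_plus)
    return a_plus, a_minus

def x_ld(t, x0, v0, beta_plus, beta_minus):
    """Large-damping free response, written as in Eq. 14.31."""
    d = beta_plus - beta_minus
    return (x0 / d * (beta_plus * math.exp(beta_minus * t)
                      - beta_minus * math.exp(beta_plus * t))
            + v0 / d * (math.exp(beta_plus * t) - math.exp(beta_minus * t)))

bp, bm = -0.1, -0.2                     # values quoted for Figs. 14.5 and 14.6
a_p, a_m = large_damping_constants(1.0, 0.0, bp, bm)
t = 3.0
via_constants = a_p * math.exp(bp * t) + a_m * math.exp(bm * t)   # Eq. 14.29
print(a_p, a_m)
print(via_constants, x_ld(t, 1.0, 0.0, bp, bm))
```

The two values on the last line should coincide to rounding error.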
14.2.3 Critical–damping solution
♣
Here, we consider the third option, namely that the damping is equal to the critical one (critical–damping solution):
$$c = c_{cr} := 2\sqrt{m\,k} = 2\,m\,\mathring{\omega}. \tag{14.32}$$
In this case, we have (Eq. 14.19)
$$\beta = \beta_{cr} := -\frac{c}{2m} = -\mathring{\omega} = -\sqrt{k/m} \tag{14.33}$$
and $\omega = 0$. Again, the second function in Option (i), namely $e^{\beta t}\sin\omega t$ (Eq. 14.23), vanishes (trivial solution). Moreover, the two solutions in Option (ii) merge, because we have only one value for $\beta$. Therefore, we have encountered a new situation: we have only one nontrivial solution of the type in Eq. 14.9, namely $x_1(t) = e^{\beta t}$. In order for us to be able to impose two initial conditions, we need a second nontrivial solution. I claim that one such solution is $x_2(t) = t\,e^{\beta t}$. Let us verify that this is indeed the case. We have
$$\frac{dx_2}{dt} = \beta\,t\,e^{\beta t} + e^{\beta t} \qquad\text{and}\qquad \frac{d^2x_2}{dt^2} = \beta^2\,t\,e^{\beta t} + 2\,\beta\,e^{\beta t}. \tag{14.34}$$
Substituting into $m\,\ddot{x} + c\,\dot{x} + k\,x = 0$ (Eq. 14.7), one obtains
$$(m\,\beta^2 + c\,\beta + k)\,t\,e^{\beta t} + (2\,m\,\beta + c)\,e^{\beta t} = 0. \tag{14.35}$$
Next, note that the coefficient of the second term vanishes, because $\beta = \beta_{cr} := -c/(2m)$ (Eq. 14.33), whereas the coefficient of the first term equals $m\,\beta^2 + c\,\beta + k = m\,(k/m) - 2\sqrt{mk}\,\sqrt{k/m} + k = 0$. [Use $c = c_{cr} := 2\sqrt{mk}$ (Eq. 14.32) and $\beta = \beta_{cr} = -\sqrt{k/m}$ (Eq. 14.33).] Thus, we have verified that $x_2(t) = t\,e^{\beta t}$ is indeed a nontrivial solution to Eq. 14.7. The two solutions are linearly independent, since $1$ and $t$ are linearly independent (Subsection 6.3.5). Thus, again, we can exploit the superposition theorem for linear homogeneous equations (Theorem 112, p. 407) and use a linear combination of the two solutions to obtain
$$x_{CD}(t) = A\,e^{\beta t} + B\,t\,e^{\beta t}. \tag{14.36}$$
Again, we have two arbitrary constants, $A$ and $B$, which may be used to impose the two initial conditions in Eq. 14.8. These yield
$$x_{CD}(0) = A = x_0, \qquad \dot{x}_{CD}(0) = \beta A + B = v_0, \tag{14.37}$$
namely $A = x_0$ and $B = v_0 - \beta\,x_0$. Hence, the solution is
$$x_{CD}(t) = x_0\,(1 - \beta\,t)\,e^{\beta t} + v_0\,t\,e^{\beta t}. \tag{14.38}$$
Indeed, $x_{CD}(t)$ satisfies Eq. 14.7, as well as the two initial conditions in Eq. 14.8. [Specifically, the first term satisfies the conditions $x(0) = x_0$ and $\dot{x}(0) = 0$, whereas for the second one we have $x(0) = 0$ and $\dot{x}(0) = v_0$, as you may verify.] Note that $\beta$ is negative. From a physical point of view, the particle tends to the equilibrium location $x = 0$, without oscillating. The solutions for $x_0 = 1$ and $v_0 = 0$, and for $x_0 = 0$ and $v_0 = 1$ are depicted respectively in Figs. 14.7 and 14.8, obtained with $\beta = -0.1$.
Fig. 14.7 xCD (t) for x0 = 1, v0 = 0
Fig. 14.8 xCD (t) for x0 = 0, v0 = 1
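The two coefficients appearing in Eq. 14.35 can be evaluated directly to confirm that both vanish at critical damping. The values $m = 2$ and $k = 8$ below are illustrative assumptions (they were picked so that the square root is exact); `x_cd` is a hypothetical name for Eq. 14.38.

```python
import math

m, k = 2.0, 8.0                           # illustrative values (assumption)
c = 2.0 * math.sqrt(m * k)                # critical damping, Eq. 14.32
beta = -c / (2.0 * m)                     # Eq. 14.33; equals -sqrt(k/m)

coef_t_exp = m * beta**2 + c * beta + k   # multiplies t e^{beta t} in Eq. 14.35
coef_exp = 2.0 * m * beta + c             # multiplies e^{beta t} in Eq. 14.35

def x_cd(t, x0, v0):
    """Critical-damping free response, Eq. 14.38."""
    return (x0 * (1.0 - beta * t) + v0 * t) * math.exp(beta * t)

print("beta =", beta, " -sqrt(k/m) =", -math.sqrt(k / m))
print("coefficients in Eq. 14.35:", coef_t_exp, coef_exp)
print("x_CD(0) for x0 = 1, v0 = 0:", x_cd(0.0, 1.0, 0.0))
```

For other values of $m$ and $k$ the coefficients may differ from zero by rounding error only.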
14.2.4 Continuity of solution in terms of c
♥
From the solutions presented above (Eqs. 14.27, 14.31 and 14.38), you might get the impression that, as we vary $c$, the solution of the original problem, as a function of $c$, presents a discontinuity at $c = c_{cr}$. Here, for those of you who might be curious about it, we show that this is not the case, in the sense that, as $c$ tends to $c_{cr}$ (from above or from below), both the small–damping and the large–damping solutions tend to the critical–damping solution.
◦ Limit of small–damping solution. Here, we consider the limit of the small–damping solution, $x_{SD}(t)$ (Eq. 14.27), where $\omega = \sqrt{4\,m\,k - c^2}/(2m)$ and $\beta = -c/(2m)$ (Eq. 14.15). Note that $\lim_{c\to c_{cr}} \omega = 0$ and $\lim_{c\to c_{cr}} \beta = \beta_{cr}$ (Eq. 14.20). Therefore, using $\lim_{\omega\to 0} \cos\omega t = 1$ (Eq. 8.61) and $\lim_{\omega\to 0} [(\sin\omega t)/\omega] = t$ (Eq. 8.62), we obtain (for the last equality, use Eq. 14.38)
$$\lim_{c\to c_{cr}} x_{SD}(t) = \lim_{\omega\to 0}\left[x_0\,e^{\beta t}\left(\cos\omega t - \frac{\beta}{\omega}\sin\omega t\right) + v_0\,e^{\beta t}\,\frac{1}{\omega}\sin\omega t\right] = x_0\,(1 - \beta_{cr}\,t)\,e^{\beta_{cr} t} + v_0\,t\,e^{\beta_{cr} t} = x_{CD}(t). \tag{14.39}$$
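◦ Limit of large–damping solution. Next, consider the limit of the large–damping solution (Eq. 14.31), where we have $\beta_\pm = (-c \pm \sqrt{c^2 - 4mk})/(2m)$ (Eq. 14.18). We may set $\beta_\pm = \breve{\beta} \pm \delta$, with $\breve{\beta} = -c/(2m)$ and $\delta = \sqrt{\breve{\beta}^2 - \beta_{cr}^2}$, where $\beta_{cr} = -\sqrt{k/m}$ (Eq. 14.33). Then, Eq. 14.31 yields
$$x_{LD}(t) = \frac{x_0}{2\delta}\left[(\breve{\beta} + \delta)\,e^{(\breve{\beta}-\delta)t} - (\breve{\beta} - \delta)\,e^{(\breve{\beta}+\delta)t}\right] + \frac{v_0}{2\delta}\left[e^{(\breve{\beta}+\delta)t} - e^{(\breve{\beta}-\delta)t}\right]$$
$$\phantom{x_{LD}(t)} = x_0\left[\frac{e^{\delta t} + e^{-\delta t}}{2} - \breve{\beta}\,\frac{e^{\delta t} - e^{-\delta t}}{2\delta}\right] e^{\breve{\beta} t} + v_0\,\frac{e^{\delta t} - e^{-\delta t}}{2\delta}\,e^{\breve{\beta} t}. \tag{14.40}$$
As $c > c_{cr}$ approaches $c_{cr}$, $\breve{\beta}$ tends to $\beta_{cr}$ and $\delta$ tends to zero. Accordingly, we obtain
$$\lim_{c\to c_{cr}} x_{LD}(t) = x_0\,(1 - \beta_{cr}\,t)\,e^{\beta_{cr} t} + v_0\,t\,e^{\beta_{cr} t} = x_{CD}(t). \tag{14.41}$$
[Hint: Use $\left[d e^{\delta t}/d\delta\right]_{\delta=0} = t$ for the first equality and Eq. 14.38 for the second.]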
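The two limits can be illustrated numerically by evaluating the small– and large–damping solutions for $c$ slightly below and slightly above $c_{cr}$. This is a sketch under stated assumptions: $m = k = 1$ (so that $c_{cr} = 2$), the time $t = 1.5$ and the initial data are arbitrary illustrative choices.

```python
import math

m, k = 1.0, 1.0                       # illustrative values (assumption); c_cr = 2
c_cr = 2.0 * math.sqrt(m * k)
beta_cr = -math.sqrt(k / m)

def x_cd(t, x0, v0):                  # Eq. 14.38
    return (x0 * (1.0 - beta_cr * t) + v0 * t) * math.exp(beta_cr * t)

def x_sd(t, x0, v0, c):               # Eq. 14.27, valid for c < c_cr
    beta = -c / (2.0 * m)
    omega = math.sqrt(4.0 * m * k - c * c) / (2.0 * m)
    e = math.exp(beta * t)
    return (x0 * e * (math.cos(omega * t) - beta / omega * math.sin(omega * t))
            + v0 * e * math.sin(omega * t) / omega)

def x_ld(t, x0, v0, c):               # Eq. 14.31, valid for c > c_cr
    root = math.sqrt(c * c - 4.0 * m * k)
    bp, bm = (-c + root) / (2.0 * m), (-c - root) / (2.0 * m)
    return (x0 * (bp * math.exp(bm * t) - bm * math.exp(bp * t))
            + v0 * (math.exp(bp * t) - math.exp(bm * t))) / (bp - bm)

t, x0, v0 = 1.5, 1.0, 0.5
for eps in (1e-2, 1e-4, 1e-6):
    below = abs(x_sd(t, x0, v0, c_cr * (1.0 - eps)) - x_cd(t, x0, v0))
    above = abs(x_ld(t, x0, v0, c_cr * (1.0 + eps)) - x_cd(t, x0, v0))
    print(f"eps = {eps:.0e}   |x_SD - x_CD| = {below:.2e}   |x_LD - x_CD| = {above:.2e}")
```

Both differences should decrease as `eps` decreases. For extremely small `eps` the quotients $(\sin\omega t)/\omega$ and $(e^{\delta t} - e^{-\delta t})/(2\delta)$ become sensitive to rounding, which is why the sketch stops at $10^{-6}$.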
14.3 Damped harmonic oscillator. Harmonic forcing
Here, we limit ourselves to the case of harmonically forced oscillations. [Periodic, but not harmonic, forcing may be addressed using the same approach used in Subsection 11.7.4.] To avoid cumbersome expressions, we address the solution to Eq. 14.4 for the limited case of
$$F(t) = \breve{F}\cos\Omega t. \tag{14.42}$$
[The more general case $F(t) = \check{F}\cos\Omega t + \check{G}\sin\Omega t$ (Eq. 14.6) is equivalent to $F(t) = \breve{F}\cos(\Omega t + \chi)$ (see Eq. 11.46). The corresponding formulation is virtually identical to that of this section: simply replace $\Omega t$ with $\Omega t + \chi$ in all the pertinent equations of this section.]
• A particular solution. Response to a harmonic input
Let $x_{PN}(t)$ denote a Particular solution to the Nonhomogeneous problem. This may be obtained by setting
$$x_{PN}(t) = C\cos\Omega t + D\sin\Omega t. \tag{14.43}$$
This yields
$$\dot{x}_{PN} = \Omega\,(-C\sin\Omega t + D\cos\Omega t), \qquad \ddot{x}_{PN} = -\Omega^2\,(C\cos\Omega t + D\sin\Omega t) = -\Omega^2\,x_{PN}(t). \tag{14.44}$$
Combining with Eq. 14.4 yields
$$\left[(-m\,\Omega^2 + k)\,C + c\,\Omega\,D\right]\cos\Omega t + \left[(-m\,\Omega^2 + k)\,D - c\,\Omega\,C\right]\sin\Omega t = \breve{F}\cos\Omega t. \tag{14.45}$$
This equation implies necessarily (Theorem 122, p. 433, on the linear independence of $\sin x$ and $\cos x$)
$$(k - m\,\Omega^2)\,C + c\,\Omega\,D = \breve{F}, \qquad (k - m\,\Omega^2)\,D - c\,\Omega\,C = 0. \tag{14.46}$$
This is a system of two linear algebraic equations with two unknowns (namely $C$ and $D$), whose solution is given by (use Eq. 2.15)
$$C = \frac{k - m\,\Omega^2}{(k - m\,\Omega^2)^2 + c^2\,\Omega^2}\,\breve{F}, \qquad D = \frac{c\,\Omega}{(k - m\,\Omega^2)^2 + c^2\,\Omega^2}\,\breve{F}, \tag{14.47}$$
as you may verify. Combining with Eq. 14.43, one obtains the desired particular solution of the nonhomogeneous problem, namely
$$x_{PN}(t) = \frac{(k - m\,\Omega^2)\cos\Omega t + c\,\Omega\sin\Omega t}{(k - m\,\Omega^2)^2 + c^2\,\Omega^2}\,\breve{F}. \tag{14.48}$$
Here, for later convenience, we introduce a different expression for the solution $x_{PN}(t)$. Specifically, using $C\cos\omega t + D\sin\omega t = X\cos(\omega t - \varphi)$ (Eq. 11.46), we have that Eq. 14.43 is equivalent to
$$x_{PN}(t) = X\cos(\Omega t - \varphi), \tag{14.49}$$
where $X = \sqrt{C^2 + D^2}$ and $\varphi = \tan^{-1}(D/C)$ (Eq. 11.47), or (use Eq. 14.47)
$$X = \frac{\breve{F}}{\sqrt{(k - m\,\Omega^2)^2 + c^2\,\Omega^2}} \qquad\text{and}\qquad \varphi = \tan^{-1}\frac{c\,\Omega}{k - m\,\Omega^2}. \tag{14.50}$$
[Recall again Remark 55, p. 270, regarding the value of $\tan^{-1} x$.]
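A quick numerical check of Eqs. 14.47 to 14.50 is sketched below, under the assumption of illustrative values for $m$, $c$, $k$, $\breve{F}$ and $\Omega$ (none are taken from the text). The residual of Eq. 14.4 is evaluated with the exact derivatives of Eq. 14.44, and the phase is computed with `math.atan2`, which returns a value in $[0, \pi]$ when its first argument (here $D$) is non–negative.

```python
import math

m, c, k = 1.0, 0.2, 1.01          # illustrative values (assumption)
F_amp, Omega = 1.0, 0.7           # F(t) = F_amp cos(Omega t), Eq. 14.42

den = (k - m * Omega**2) ** 2 + (c * Omega) ** 2
C = (k - m * Omega**2) * F_amp / den      # Eq. 14.47
D = c * Omega * F_amp / den               # Eq. 14.47

def residual(t):
    """m x'' + c x' + k x - F(t) for x = C cos(Omega t) + D sin(Omega t)."""
    x = C * math.cos(Omega * t) + D * math.sin(Omega * t)
    x_dot = Omega * (-C * math.sin(Omega * t) + D * math.cos(Omega * t))  # Eq. 14.44
    x_ddot = -Omega**2 * x                                                # Eq. 14.44
    return m * x_ddot + c * x_dot + k * x - F_amp * math.cos(Omega * t)

X = math.hypot(C, D)              # amplitude, Eq. 14.50
phi = math.atan2(D, C)            # phase shift, Eq. 14.50, in [0, pi]

t = 1.3
print("largest residual sampled:", max(abs(residual(s)) for s in (0.0, 0.3, 1.7, 12.0)))
print("Eq. 14.43:", C * math.cos(Omega * t) + D * math.sin(Omega * t))
print("Eq. 14.49:", X * math.cos(Omega * t - phi))
```

The last two lines should agree, which confirms that the phase enters Eq. 14.49 with a minus sign.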
• The complete solution
The expression for $x_{PN}(t)$ in Eq. 14.49, referred to as the steady–state solution, or the response to a harmonic input, is examined in the subsubsection that follows. Here we address the initial conditions. Indeed, the above solution does not necessarily satisfy the initial conditions. In order for us to be able to impose these, we need to introduce two arbitrary constants in the solution. According to the superposition theorem for linear nonhomogeneous equations (Theorem 114, p. 407), this is obtained by adding to the particular solution to the nonhomogeneous equation the general solution to the associated homogeneous equation (Eq. 9.168).
◦ Comment. Note that the solution in Eq. 14.48 is valid independently of the amount of damping, namely for small, critical and large damping. For simplicity, here we limit our analysis to the small–damping case. The treatment of the critical– and large–damping cases is similar.
Adding to $x_{PN}(t)$ (Eq. 14.49) the general solution of the associated homogeneous problem, namely $x_{SD}(t) = A\,e^{\beta t}\cos\omega t + B\,e^{\beta t}\sin\omega t$ (Eq. 14.23), one obtains
$$x(t) = A\,e^{\beta t}\cos\omega t + B\,e^{\beta t}\sin\omega t + X\cos(\Omega t - \varphi), \tag{14.51}$$
which still satisfies Eq. 14.6 (with $\breve{G} = 0$), as you may verify. The constants $A$ and $B$ are no longer given by Eq. 14.25. Indeed, the initial conditions have to be imposed only after adding the term $X\cos(\Omega t - \varphi)$. This yields the following equations for the constants $A$ and $B$:
$$x(0) = A + X\cos\varphi = x_0, \qquad \dot{x}(0) = \beta A + \omega B + X\,\Omega\sin\varphi = v_0, \tag{14.52}$$
from which one obtains the constants $A$ and $B$, as
$$A = x_0 - X\cos\varphi, \qquad B = \frac{1}{\omega}\left(v_0 - \beta A - X\,\Omega\sin\varphi\right). \tag{14.53}$$
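The complete solution can be compared with a direct numerical integration of Eq. 14.4. The sketch below writes the steady–state part with the coefficients $C$ and $D$ of Eq. 14.47 (so that $x(0) = A + C$ and $\dot{x}(0) = \beta A + \omega B + \Omega D$), and uses a classical fourth–order Runge–Kutta scheme, which is a technique swapped in here for the comparison and not one discussed in this section. All numerical values are illustrative assumptions.

```python
import math

m, c, k = 1.0, 0.2, 1.01          # illustrative small-damping values (assumption)
F_amp, Omega = 1.0, 0.7           # forcing F(t) = F_amp cos(Omega t), Eq. 14.42
x0, v0 = 1.0, 0.0                 # initial conditions, Eq. 14.8

beta = -c / (2.0 * m)
omega = math.sqrt(4.0 * m * k - c * c) / (2.0 * m)
den = (k - m * Omega**2) ** 2 + (c * Omega) ** 2
C = (k - m * Omega**2) * F_amp / den      # Eq. 14.47
D = c * Omega * F_amp / den

# Initial conditions imposed on the complete solution:
#   x(0)  = A + C                        = x0
#   x'(0) = beta A + omega B + Omega D   = v0
A = x0 - C
B = (v0 - beta * A - Omega * D) / omega

def x_exact(t):
    """Homogeneous part (Eq. 14.23) plus particular part (Eq. 14.43)."""
    e = math.exp(beta * t)
    return (e * (A * math.cos(omega * t) + B * math.sin(omega * t))
            + C * math.cos(Omega * t) + D * math.sin(Omega * t))

def rhs(t, x, v):
    """First-order form of Eq. 14.4: x' = v, v' = (F - c v - k x)/m."""
    return v, (F_amp * math.cos(Omega * t) - c * v - k * x) / m

def rk4(t_end, n):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h, t, x, v = t_end / n, 0.0, x0, v0
    for _ in range(n):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
    return x

print("closed form at t = 10:", x_exact(10.0))
print("Runge-Kutta at t = 10:", rk4(10.0, 2000))
```

The two values should agree to many digits; the remaining difference is the discretization error of the integrator.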
• Amplification factor. Frequency response
The general solution to the associated homogeneous equation in Eq. 14.23, $x_{GH}(t) = A\,e^{\beta t}\cos\omega t + B\,e^{\beta t}\sin\omega t$, is damped and hence after a while it becomes negligible (Remark 117, p. 550). Here, we are interested in the portion of the solution that remains after the transient, namely $x_{PN}(t) = X\cos(\Omega t - \varphi)$ (Eq. 14.49).
◦ Comment. As pointed out above, this expression for $x_{PN}(t)$ is valid independently of the amount of damping, namely it is valid for small, critical and large damping.
Let $X_0$ denote the value of $X$ for $\Omega = 0$, namely the static response, that is, $X_0 = \breve{F}/k$. Setting
$$\Xi = \frac{X}{X_0} = \frac{X}{\breve{F}/k}, \tag{14.54}$$
we have $\Xi = 1/\sqrt{(1 - m\,\Omega^2/k)^2 + c^2\,\Omega^2/k^2}$, or
$$\Xi(\sigma, \zeta) = \frac{1}{\sqrt{(1 - \sigma^2)^2 + 4\,\zeta^2\,\sigma^2}}, \tag{14.55}$$
with
$$\sigma := \frac{\Omega}{\mathring{\omega}} = \frac{\Omega}{\sqrt{k/m}} \qquad\text{and}\qquad \zeta := \frac{c}{2\sqrt{m\,k}} = \frac{c}{2\,m\,\mathring{\omega}} = \frac{c}{c_{cr}}, \tag{14.56}$$
as you may verify. [Recall that $\mathring{\omega} = \sqrt{k/m}$ (Eq. 14.17) is the natural frequency of oscillation in the absence of damping and $c_{cr} = 2\sqrt{mk}$ (Eq. 14.16).] Akin to $\Xi(\sigma)$ in Eq. 11.61, the factor $\Xi(\sigma, \zeta)$ is also known as the amplification factor (or as the frequency response for the specific case under consideration) and is depicted in Fig. 14.9 as a function of $\sigma$, for several values of $\zeta$, specifically, for $\zeta = 0.0$, $0.1$, $0.2$, $0.3$, $1/\sqrt{2}$, $1.0$ and $2.0$. The light gray area spans from $\zeta = 0$ (undamped harmonic oscillator) to $\zeta = 1/\sqrt{2}$. The dark gray area spans from $\zeta = 1$ (critical damping) to $\zeta = \infty$. [The dashed line is the locus of the point where $\Xi(\sigma)$ reaches its maximum, $\Xi_{Max}$, as a function of $\zeta$. This is addressed in the subsubsection that follows, where it is also shown that $\Xi_{Max}$ attains its minimum for $\zeta = 1/\sqrt{2}$.]
Turning to the phase shift $\varphi$, we have $\varphi = \tan^{-1}[c\,\Omega/(k - m\,\Omega^2)]$ (Eq. 14.50), or
$$\varphi = \tan^{-1}\frac{2\,\zeta\,\sigma}{1 - \sigma^2} \qquad \left(\varphi \in [0, \pi]\right), \tag{14.57}$$
Fig. 14.9 Ξ vs σ, for various ζ
Fig. 14.10 ϕ/π vs σ, for various ζ
as you may verify. Figure 14.10 depicts the phase shift $\varphi$, also as a function of $\sigma$, for several values of $\zeta$ (same values as in Fig. 14.9).
◦ Comment. Note that $m\,\ddot{x} + c\,\dot{x} + k\,x = F(t)$ (Eq. 14.4), divided by $m$, may be written as
$$\frac{d^2x}{dt^2} + 2\,\zeta\,\mathring{\omega}\,\frac{dx}{dt} + \mathring{\omega}^2\,x = F(t)/m. \tag{14.58}$$
[You might like to repeat the above analysis starting from this equation, instead of Eq. 14.4. Were you to do this, you would notice that the math simplifies considerably. However, the gut feeling might be lost.]
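The dimensionless expressions in Eqs. 14.55 to 14.57 are easy to evaluate. The following minimal sketch (function names and numerical values are illustrative assumptions) also checks that the dimensionless form agrees with the dimensional ratio $X/X_0$ of Eq. 14.54. The phase is computed with `math.atan2`, which selects the branch $\varphi \in [0, \pi]$ for $\zeta\,\sigma \ge 0$.

```python
import math

def amplification(sigma, zeta):
    """Amplification factor Xi(sigma, zeta), Eq. 14.55."""
    return 1.0 / math.sqrt((1.0 - sigma**2) ** 2 + 4.0 * zeta**2 * sigma**2)

def phase(sigma, zeta):
    """Phase shift of Eq. 14.57, on the branch [0, pi]."""
    return math.atan2(2.0 * zeta * sigma, 1.0 - sigma**2)

m, c, k, Omega = 2.0, 0.6, 8.0, 1.5          # illustrative values (assumption)
sigma = Omega / math.sqrt(k / m)             # Eq. 14.56
zeta = c / (2.0 * math.sqrt(m * k))          # Eq. 14.56
xi_dimensional = k / math.sqrt((k - m * Omega**2) ** 2 + (c * Omega) ** 2)  # X / X0

print("Xi from Eq. 14.55:", amplification(sigma, zeta))
print("Xi from X / X0   :", xi_dimensional)
print("phase at resonance (sigma = 1), in units of pi:", phase(1.0, zeta) / math.pi)
```

At $\sigma = 1$ the phase shift equals $\pi/2$ and the amplification equals $1/(2\zeta)$, whatever the value of $\zeta > 0$.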
• Locus of maxima
♠
The amplification factor $\Xi(\sigma, \zeta)$, as a function of $\sigma$, reaches its maximum when the denominator reaches its minimum, namely when the term under the square root reaches its minimum. This occurs when
$$\frac{d}{d\sigma}\left[(1 - \sigma^2)^2 + 4\,\zeta^2\,\sigma^2\right] = \left[-2\,(1 - \sigma^2) + 4\,\zeta^2\right] 2\,\sigma = 0. \tag{14.59}$$
This yields two solutions. The first, namely $\sigma = 0$, corresponds to a local minimum of $\Xi$ when $\zeta < 1/\sqrt{2}$ (Fig. 14.9). [At $\Omega = 0$, we have $\Xi = 1$ and the slope of the graph is horizontal.] The second solution, given by
$$\sigma = \sigma_M := \sqrt{1 - 2\,\zeta^2}, \tag{14.60}$$
corresponds to the local and global maximum value for $\Xi(\sigma, \zeta)$ as a function of $\sigma$, for a given value of $\zeta$. The function $\Xi_{Max}(\zeta)$ is given by
$$\Xi_{Max}(\zeta) = \left[\frac{1}{\sqrt{(1 - \sigma^2)^2 + 4\,\zeta^2\,\sigma^2}}\right]_{\sigma = \sqrt{1 - 2\zeta^2}} = \frac{1}{\sqrt{4\,\zeta^4 + 4\,\zeta^2\,(1 - 2\,\zeta^2)}} = \frac{1}{2\,\zeta\,\sqrt{1 - \zeta^2}}. \tag{14.61}$$
◦ Comment. The quantity $\Xi_{Max}$ as a function of $\sigma$ is obtained by eliminating $\zeta$ between the last two equations. This yields (set $2\,\zeta^2 = 1 - \sigma^2$ in the term after the first equality in Eq. 14.61)
$$\Xi_{Max}(\sigma) = \frac{1}{\sqrt{(1 - \sigma^2)^2 + 2\,(1 - \sigma^2)\,\sigma^2}} = \frac{1}{\sqrt{1 - \sigma^4}}. \tag{14.62}$$
The above function is included in Fig. 14.9, as a dashed line. The corresponding Taylor polynomial is $\Xi_{Max}(\sigma) = 1 + \tfrac{1}{2}\,\sigma^4 + O[\sigma^8]$ (use $1/\sqrt{1 + x} = 1 - \tfrac{1}{2}\,x + O[x^2]$, Eq. 13.29, with $x = -\sigma^4$). Its graph exhibits the behavior of a fourth–order parabola (defined in Remark 57, p. 280), as noticeable from Fig. 14.9.
◦ The case $\zeta = 1/\sqrt{2}$. Note that, if
$$\zeta = c/c_{cr} = c/\sqrt{4\,m\,k} = 1/\sqrt{2}, \tag{14.63}$$
we have $\sigma_M = 0$ (Eq. 14.60). Let us examine this case in detail. If $\zeta = 1/\sqrt{2}$, Eq. 14.55 yields (use again $1/\sqrt{1 + x} = 1 - \tfrac{1}{2}\,x + O[x^2]$, Eq. 13.29)
$$\Xi(\sigma, 1/\sqrt{2}) = \frac{1}{\sqrt{1 + \sigma^4}} = 1 - \frac{1}{2}\,\sigma^4 + O[\sigma^8]. \tag{14.64}$$
Accordingly, in the neighborhood of $\sigma = 0$, the graph behaves again like a fourth–order parabola. [For all the other values of $\zeta$, the graphs behave like a second–order parabola. The difference is noticeable in Fig. 14.9.]
◦ The case $\zeta > 1/\sqrt{2}$. For $\zeta > 1/\sqrt{2}$, $\sigma_M = \sqrt{1 - 2\,\zeta^2}$ becomes imaginary. Accordingly, $\Xi_{Max}(\zeta)$ reaches its minimum for $\zeta = 1/\sqrt{2}$. In this case, the peak is located at $\sigma = 0$, namely $\Omega = 0$. From this point on, namely for $\zeta > 1/\sqrt{2}$ (i.e., for $c > c_{cr}/\sqrt{2} = \sqrt{2\,m\,k} = \sqrt{2}\,m\,\mathring{\omega}$), the function $\Xi(\sigma, \zeta)$ has no local maximum for $\sigma > 0$ and decreases monotonically with $\sigma$. The higher the frequency, the lower the amplitude of the oscillations.
◦ The case $\zeta \ll 1$. For $\zeta \ll 1$ (namely for $c \ll c_{cr}$), Eq. 14.60 yields $\sigma_M \approx 1$, namely $\Omega_M \approx \mathring{\omega}$. In other words, for very small damping, the peak occurs near the resonant frequency of the undamped system, and is given by (use Eq. 14.61)
$$\Xi_{Max} = X_{Max}/X_0 \approx 1/(2\,\zeta) = \sqrt{m\,k}/c \gg 1. \tag{14.65}$$
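The location and height of the peak (Eqs. 14.60 and 14.61) can be compared with a brute–force search over $\sigma$. The sketch below assumes a uniform grid on $[0, 2]$ and a handful of illustrative values of $\zeta$; `peak_by_scan` is a hypothetical helper. For $\zeta > 1/\sqrt{2}$ the search should return $\sigma = 0$, since no interior peak exists.

```python
import math

def amplification(sigma, zeta):
    """Amplification factor Xi(sigma, zeta), Eq. 14.55."""
    return 1.0 / math.sqrt((1.0 - sigma**2) ** 2 + 4.0 * zeta**2 * sigma**2)

def peak_by_scan(zeta, sigma_max=2.0, n=20000):
    """Brute-force search for the maximum of Xi on a uniform grid in sigma."""
    best_sigma, best_xi = 0.0, amplification(0.0, zeta)
    for i in range(1, n + 1):
        sigma = sigma_max * i / n
        xi = amplification(sigma, zeta)
        if xi > best_xi:
            best_sigma, best_xi = sigma, xi
    return best_sigma, best_xi

for zeta in (0.1, 0.3, 0.5, 0.8):
    s_scan, xi_scan = peak_by_scan(zeta)
    if zeta < 1.0 / math.sqrt(2.0):
        s_m = math.sqrt(1.0 - 2.0 * zeta**2)                    # Eq. 14.60
        xi_m = 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta**2))    # Eq. 14.61
        print(f"zeta = {zeta}: scan ({s_scan:.4f}, {xi_scan:.5f})   "
              f"formulas ({s_m:.4f}, {xi_m:.5f})")
    else:
        print(f"zeta = {zeta}: scan ({s_scan:.4f}, {xi_scan:.5f})   no interior peak")
```

The agreement in $\sigma$ is limited by the grid spacing (here $10^{-4}$).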
14.4 Damped harmonic oscillator. Arbitrary forcing
♣
The preceding section was limited to harmonic forcing. Here, we consider the general forced–oscillation problem (Eq. 14.4), namely
$$m\,\frac{d^2x}{dt^2} + c\,\frac{dx}{dt} + k\,x = F(t), \tag{14.66}$$
where $F(t)$ is an arbitrary input force. The initial conditions are still arbitrarily prescribed: $x(0) = x_0$ and $\dot{x}(0) = v_0$ (Eq. 14.8). Again, for simplicity, we limit ourselves to the small–damping case. I claim that in this case the solution is given by
$$x(t) = x_0\,e^{\beta t}\left(\cos\omega t - \frac{\beta}{\omega}\sin\omega t\right) + v_0\,\frac{1}{\omega}\,e^{\beta t}\sin\omega t + \frac{1}{m\,\omega}\int_0^t e^{\beta(t-\tau)}\sin\omega(t - \tau)\,F(\tau)\,d\tau, \tag{14.67}$$
with $\omega = \sqrt{4\,m\,k - c^2}/(2m)$ and $\beta = -c/(2m)$ (Eq. 14.15). Let us verify that my claim is correct. We know that the first two terms of Eq. 14.67 are the solution to the homogeneous equation with initial conditions in Eq. 14.8, namely $x(0) = x_0$ and $\dot{x}(0) = v_0$. Thus, on the basis of the superposition theorem for linear nonhomogeneous equations (Theorem 114, p. 407), we simply have to verify that the third term,
$$x_{PN}(t) := \frac{1}{m\,\omega}\int_0^t e^{\beta(t-\tau)}\sin\omega(t - \tau)\,F(\tau)\,d\tau, \tag{14.68}$$
satisfies Eq. 14.66, with homogeneous initial conditions, namely
$$x_{PN}(0) = 0 \qquad\text{and}\qquad \dot{x}_{PN}(0) = 0. \tag{14.69}$$
[The reasoning is conceptually identical to that in Remark 100, p. 472, on the superposition of the effects of the three non–homogeneities (differential equations and two initial conditions).] Indeed, using the rule for the derivative of an integral as given in Eq. 10.98 (derivative under the integral sign, plus integrand evaluated at the upper limit), one obtains, for the problem under consideration,

$$\dot{x}_{\rm PN}(t) = \frac{1}{m \omega} \int_0^t \frac{\partial}{\partial t} \left[ e^{\beta (t-\tau)} \sin \omega (t-\tau) \right] F(\tau)\, d\tau. \tag{14.70}$$

[Hint: The contribution from the integrand evaluated at the upper limit of integration vanishes. Indeed, $[e^{\beta (t-\tau)} \sin \omega (t-\tau)\, F(\tau)]_{\tau = t} = 0$.]
14. Damping and aerodynamic drag
Differentiating the above equation and using again Eq. 10.98, we have

$$\ddot{x}_{\rm PN}(t) = \frac{1}{m \omega} \int_0^t \frac{\partial^2}{\partial t^2} \left[ e^{\beta (t-\tau)} \sin \omega (t-\tau) \right] F(\tau)\, d\tau + \frac{1}{m} F(t). \tag{14.71}$$

[Hint: Use $[d(e^{\beta u} \sin \omega u)/du]_{u=0} = [e^{\beta u} (\omega \cos \omega u + \beta \sin \omega u)]_{u=0} = \omega$.]

If we combine Eqs. 14.68, 14.70 and 14.71 we see that $x_{\rm PN}(t)$ satisfies Eq. 14.66. [Hint: For, $(m\, d^2/dt^2 + c\, d/dt + k)(e^{\beta t} \sin \omega t) = 0$, as you may verify. In other words, $e^{\beta t} \sin \omega t$ is a solution to the associated homogeneous equation (use Eq. 14.23). Indeed, the three terms under the integral sign (obtained after combining Eqs. 14.68, 14.70 and 14.71) offset each other.] In addition, $x_{\rm PN}(t)$ satisfies the homogeneous initial conditions, $x_{\rm PN}(0) = 0$ and $\dot{x}_{\rm PN}(0) = 0$, as required by Eq. 14.69. [Indeed, for $t = 0$ the integrals in Eqs. 14.68 and 14.70 vanish.]

In summary, the expression in Eq. 14.68 satisfies Eq. 14.66, with homogeneous initial conditions, as desired. Recalling that the solution is unique (Remark 96, p. 459), the expression in Eq. 14.67 provides the solution to Eq. 14.66, with its initial conditions $x(0) = x_0$ and $\dot{x}(0) = v_0$.

◦ Comment. You may have noted that the expression given for the solution when $F(t)$ is arbitrary (Eq. 14.67) does not look at all like the expression obtained in the preceding section, when $F(t) = F \cos \Omega t$, namely Eq. 14.51, with $A$ and $B$ given by Eq. 14.53. You may show that, when $F(t) = F \cos \Omega t$, Eq. 14.67 properly reduces to the solution obtained in the preceding section, namely Eq. 14.51, with $A$ and $B$ given by Eq. 14.53. [Hint: Use the approach presented in Section 11.8.]
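The convolution integral in Eq. 14.68 is easy to evaluate numerically. The sketch below is only an illustration, under assumptions that are not in the text: the integral is approximated with the trapezoidal rule, the forcing is a constant force `f0` applied from $t = 0$, and the numerical values of `m`, `c`, `k` and `f0` are arbitrary. For this forcing the integral can be done by hand, which gives the closed form coded in `step_response` and provides a check.

```python
import math

def duhamel_response(force, t, m, c, k, n=2000):
    """x_PN(t) of Eq. 14.68 (small damping), by the trapezoidal rule with n intervals."""
    beta = -c / (2.0 * m)
    omega = math.sqrt(4.0 * m * k - c**2) / (2.0 * m)
    if t == 0.0:
        return 0.0
    h = t / n
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        g = math.exp(beta * (t - tau)) * math.sin(omega * (t - tau)) * force(tau)
        total += g if 0 < i < n else 0.5 * g   # end points carry half weight
    return total * h / (m * omega)

def step_response(t, f0, m, c, k):
    """Closed form of Eq. 14.68 for F(t) = f0 (constant), with x(0) = 0 and v(0) = 0."""
    beta = -c / (2.0 * m)
    omega = math.sqrt(4.0 * m * k - c**2) / (2.0 * m)
    decay = math.exp(beta * t) * (math.cos(omega * t) - (beta / omega) * math.sin(omega * t))
    return (f0 / k) * (1.0 - decay)

m, c, k, f0 = 1.0, 0.4, 4.0, 2.0   # illustrative values (small damping: c*c < 4*m*k)
for t in (0.5, 2.0, 5.0):
    print(t, duhamel_response(lambda tau: f0, t, m, c, k), step_response(t, f0, m, c, k))
```

The two columns should agree to within the quadrature error; replacing the lambda with any other function of `tau` gives the response to that forcing.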
14.5 Illustrative examples
In this section, we provide a few illustrative examples, just to give you a flavor of how the above models may be applied to daily life.
14.5.1 Model of a car suspension system
Let us begin by relating the damped harmonic oscillator to the suspension system of a car or motorcycle. A simple analysis of the frequency response of the suspension system of a car in the absence of shock absorbers was presented in Subsection 11.7.5, on “irrational mechanics.” There, the shock absorbers of the car were assumed to be totally gone, and hence we related the suspension system to an undamped harmonic oscillator. However, doing this in general is not legitimate, because the suspensions of a car are composed of springs, as well as dampers (or shock absorbers, or simply shocks). Thus, we are now in a position to address this issue at a depth greater than that presented in Subsection 11.7.5. This is a considerable improvement with respect to what we did then. In fact, it is a fairly adequate representation for the test of the suspension system within a manufacturing facility (as opposed to a test “on the road”), provided of course that we can approximate the equation governing the shock absorber with that given in Eq. 14.2. [In fact, for simplicity, throughout this section I am assuming that the shocks may be approximated with dashpots (linear dampers, Eq. 14.1). In addition, we assume that the tire may be represented by a linear spring–dashpot system. For a detailed analysis of these assumptions see Ref. [28].]
• Model of a car suspension system, in a test facility (with shocks)
To model a car suspension system in a test facility, let us consider the damped harmonic oscillator shown in Fig. 14.11. [This system is obtained from that in Fig. 11.11, by adding two dashpots.]
Fig. 14.11 Model of a car suspension system
The mass (treated as a particle placed at P ) represents the wheel, whereas the point Q represents the point of the car where the suspension is connected to the body, and is fixed in the problem under consideration (test facility). On the other hand, the point C is the point of contact of the tire with the pavement, and is assumed to have a prescribed motion. The spring S1 and the dashpot D1 represent the suspension system, whereas the spring S2 and the dashpot D2 model the tire.
Let us then consider the equilibrium configuration with unloaded springs, and use this as the reference configuration, namely the configuration from which we measure the displacements of $P$ and $C$. [If the suspension system is preloaded, we use this as the reference configuration. In this case, the forces represent the variation due to the displacements from such a configuration.] The forces acting on the particle $P$ are due to: (i) the first spring, $S_1$, which connects the mass point $P$ to the fixed point $Q$; the corresponding force is $F_{S_1} = -k_1 x$, where $x$ denotes the displacement of $P$ from the reference configuration, namely from the origin $O$; (ii) the second spring, $S_2$, which connects $P$ to $C$; the corresponding force is $F_{S_2} = -k_2 (x - u)$, where $u$ denotes the displacement of $C$ from the reference configuration, namely from the point $C_0$; (iii) the first dashpot, $D_1$, which connects $P$ to $Q$; the corresponding force is $F_{D_1} = -c_1 \dot{x}$; and (iv) the second dashpot, $D_2$, which connects $P$ to $C$; the corresponding force is $F_{D_2} = -c_2 (\dot{x} - \dot{u})$. Thus, the Newton second law gives us

$$m \frac{d^2 x}{dt^2} + (c_1 + c_2) \frac{dx}{dt} + (k_1 + k_2)\, x = k_2\, u(t) + c_2\, \dot{u}(t), \tag{14.72}$$

namely $m \ddot{x} + c \dot{x} + k x = F(t)$ (as in Eq. 14.4, forced oscillations of a damped harmonic oscillator), with $k = k_1 + k_2$, $c = c_1 + c_2$, and $F(t) = k_2\, u(t) + c_2\, \dot{u}(t)$.

In order to obtain the equation of the damped harmonic oscillator subject to a harmonic forcing (Eq. 14.6), we assume that the point $C$ moves according to the equation $u(t) = \breve{U} \cos \Omega t$. Then,

$$F(t) = \breve{F} \cos \Omega t + \breve{G} \sin \Omega t, \tag{14.73}$$

with $\breve{F} = k_2 \breve{U}$ and $\breve{G} = -c_2\, \Omega\, \breve{U}$. Combining with Eq. 14.4, one obtains Eq. 14.6. Accordingly, the formulation of the preceding section applies.
• Model of a car suspension system, on the road
If we stretch it a bit, the above model may be used even if the car is not in the test facility but “on the road.” The main difference is that in the test facility the point $Q$ is fixed, whereas on the road the point $Q$ moves with the car body. The motion of $Q$ is negligible, provided that the mass of the car is much greater than that of the wheel, and that the frequency of the forced oscillations is sufficiently large (Remark 106, p. 482).

So, let us examine how to use the formulation to address the dynamics of a wheel of your car, as the car travels at a constant velocity $V$ over a street with a sinusoidally shaped pavement. Then, we have $u(s) = \breve{U} \cos (2 \pi s/\ell)$, where $s$ denotes the abscissa along the street, whereas $\ell$ is the wavelength of the oscillation, namely the distance between two successive peaks. Having assumed the speed of the car to be constant, we have $s = V t$, which yields $u(t) = \breve{U} \cos \Omega t$, where $\Omega = 2 \pi V/\ell$. Then, Eq. 14.6 applies, along with the corresponding formulation. [The assumptions that the car is moving at a constant velocity and that the pavement is sinusoidally shaped are introduced only for the sake of simplicity. For arbitrary motion, see the discussion in Subsection 11.7.5.]
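To make the forcing term concrete, here is a small sketch that builds the coefficients of Eq. 14.73 from the road and tire data. It is only an illustration: the function names and all numerical values (speed, wavelength, road amplitude, tire stiffness and damping) are hypothetical, chosen only to have something to evaluate.

```python
import math

def road_forcing(speed, wavelength, u_amp, k2, c2):
    """Return (Omega, F_amp, G_amp) for u(t) = u_amp * cos(Omega * t), as in Eq. 14.73."""
    omega_f = 2.0 * math.pi * speed / wavelength   # Omega = 2 pi V / wavelength
    f_amp = k2 * u_amp                             # coefficient of cos(Omega t)
    g_amp = -c2 * omega_f * u_amp                  # coefficient of sin(Omega t)
    return omega_f, f_amp, g_amp

def forcing(t, omega_f, f_amp, g_amp):
    """F(t) = F_amp cos(Omega t) + G_amp sin(Omega t)."""
    return f_amp * math.cos(omega_f * t) + g_amp * math.sin(omega_f * t)

# Hypothetical data: 20 m/s over 5 m waves of 1 cm amplitude; tire k2 = 2e5 N/m, c2 = 150 N s/m.
omega_f, f_amp, g_amp = road_forcing(20.0, 5.0, 0.01, 2.0e5, 150.0)
print(omega_f, f_amp, g_amp)
```

Note that, for a given pavement, the forcing frequency grows linearly with the speed of the car, so that the same road may excite the suspension below, near, or above its peak frequency.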
14.5.2 Time to change the shocks of your car!
♥
Here, we use the results obtained above in order to understand a simple test that auto mechanics perform to check the shocks of your car, say the front shocks. To do that, they typically press down the front–end of your car and then suddenly release it. If the car overshoots too much (namely it goes considerably above the final equilibrium configuration), then they tell you that it is about time to change your shocks. Let us try to make some sense out of this test. To this end, for simplicity, I assume that the shocks may be approximated with dashpots. [For a detailed analysis of this assumption, see again Ref. [28].] Correspondingly, the analysis presented here is only conceptual. As you use the car, the shocks progressively lose their damping capacity. Thus, here we worry about such a situation and discuss what happens when the shocks are too weak. [On the other hand, if the damping is too high the car is too rigid. However, we do not have to worry about this — the car manufacturer takes care of that.] As my auto mechanic tells me, if the car overshoots, it is a clear sign that the passenger would experience an uncomfortable ride. Also, as you may have noted if you drive the way I used to when I was young, when you take a turn a bit too fast with weak shocks, if the pavement is not smooth, the back wheels of the car may lose their grip, producing oversteering in a choppy way — a dangerous situation. Another issue is the level of oscillation under periodic forcing (as in the example of “irrational mechanics,” in Subsection 11.7.5), which is evaluated through the amplification factor Ξ = XM/X0 (Eq. 14.55). Let us look at the mechanics’ test from a mathematical point of view. Specifically, let us connect: (i) the results for forced oscillations under periodic forcing (Section 14.3), which affect the ride comfort, with (ii) those for free oscillations (Section 14.2), which is what occurs during the test by your auto mechanic.
We start from the fact that we have $c < c_{\rm cr}$ (small–damping case), as we infer from the overshooting of the car. [This cannot occur in the large–damping case. Why? Hint: Recall that $v_0 = 0$, and use Eq. 14.31. Alternatively, see Fig. 14.5, p. 553.] We can do better. We are able to assess the level of damping from the test that the auto mechanic performs, as described above. To do this, we may use the free–oscillation small–damping solution, which is given by (use Eq. 14.27, with $v_0 = 0$)

$$x(t) = x_0 \left( \cos \omega t - \frac{\beta}{\omega} \sin \omega t \right) e^{\beta t}, \tag{14.74}$$

where $x_0 = x(0) < 0$, since the auto mechanic pushes the front–end downwards. [Here, the displacements are treated as positive when upwards.] The time of maximum overshoot, $t_{\rm OS}$, is obtained by imposing $v(t_{\rm OS}) = 0$. Differentiating the above equation, we have

$$\dot{x}(t) = x_0\, e^{\beta t} \left[ -\omega \sin \omega t + \beta \cos \omega t - \beta \cos \omega t - \frac{\beta^2}{\omega} \sin \omega t \right] = -x_0\, \frac{\omega^2 + \beta^2}{\omega}\, e^{\beta t} \sin \omega t. \tag{14.75}$$

Hence, $\dot{x}(t) = 0$ yields $\omega t = k \pi$ ($k = 0, 1, \dots$). The overshoot value $x_{\rm OS}$ is obtained when $k = 1$, namely for $\omega t_{\rm OS} = \pi$, and is given by

$$x_{\rm OS} = x(t) \big|_{\omega t = \pi} = -x_0\, e^{\pi \beta/\omega}. \tag{14.76}$$

What is of interest here is the function $|x_{\rm OS}/x_0| = e^{\pi \beta/\omega}$. Recalling that $\beta = -c/(2m)$ and $\omega = \sqrt{4mk - c^2}/(2m)$ (Eq. 14.15), we have

$$\frac{\beta}{\omega} = \frac{-c}{\sqrt{4mk - c^2}} = \frac{-\zeta}{\sqrt{1 - \zeta^2}}, \tag{14.77}$$

where $\zeta = c/c_{\rm cr} = c/\sqrt{4mk}$ (Eq. 14.56). Hence,

$$\frac{x_{\rm OS}}{|x_0|} = e^{-\pi \zeta/\sqrt{1 - \zeta^2}}. \tag{14.78}$$

On the other hand, turning to the forced oscillation analysis, we have (use Eqs. 14.54 and 14.61)

$$\Xi_{\rm Max} = \frac{X_{\rm Max}}{X_0} = \frac{1}{2 \zeta \sqrt{1 - \zeta^2}}. \tag{14.79}$$
In other words, both $x_{\rm OS}/|x_0|$ and $X_{\rm Max}/X_0$ are functions of $\zeta = c/c_{\rm cr}$ only. Therefore, from a given $\zeta$ we can obtain both $x_{\rm OS}/|x_0|$ and $\Xi_{\rm Max} = X_{\rm Max}/X_0$.

Let us be specific. We can start with $\zeta = 1/\sqrt{2}$, for which the maximum oscillation occurs for $\Omega = 0$ (static value). In this case, we have $|x_{\rm OS}/x_0| = e^{-\pi} \simeq 0.043$, namely an overshoot of about 4%. If this is what you get, you do not have to worry about changing your shocks. On the other hand, for $\zeta = 1/2$ we have $|x_{\rm OS}/x_0| = e^{-\pi/\sqrt{3}} \simeq 0.163$, namely an overshoot of about 16%. Correspondingly, we have $\Xi_{\rm Max} = X_{\rm Max}/X_0 \simeq 1.155$, which is barely acceptable. You may choose not to change your shocks, provided you drive like a “responsible adult” and the pavement is smooth, especially on the turns. However, for $\zeta = 1/3$, we have $|x_{\rm OS}/x_0| = e^{-\pi/\sqrt{8}} \simeq 0.329$, namely an overshoot of about 33%. Correspondingly, we have $\Xi_{\rm Max} = X_{\rm Max}/X_0 \simeq 1.591$, too high a value. Don’t take these numbers literally. As stated above, this is only a conceptual analysis.
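The numbers quoted above follow directly from Eqs. 14.78 and 14.79, and may be reproduced with a few lines of code. The sketch below also inverts Eq. 14.78, so as to estimate $\zeta$ from a measured overshoot; this inversion and the function names are additions for illustration only, and (like the rest of this subsection) rest on the dashpot idealization.

```python
import math

def overshoot_ratio(zeta):
    """|x_OS / x_0| from Eq. 14.78 (requires 0 < zeta < 1)."""
    return math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta**2))

def peak_amplification(zeta):
    """Xi_Max from Eq. 14.79 (a peak exists only for zeta <= 1/sqrt(2))."""
    return 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta**2))

def zeta_from_overshoot(ratio):
    """Invert Eq. 14.78: damping ratio from a measured overshoot ratio, 0 < ratio < 1."""
    a = -math.log(ratio) / math.pi       # a = zeta / sqrt(1 - zeta^2)
    return a / math.sqrt(1.0 + a**2)

for zeta in (1.0 / math.sqrt(2.0), 1.0 / 2.0, 1.0 / 3.0):
    print(f"zeta = {zeta:.3f}: overshoot = {overshoot_ratio(zeta):.3f}, "
          f"Xi_Max = {peak_amplification(zeta):.3f}")
```

For instance, under these assumptions a measured overshoot of 25% would correspond to `zeta_from_overshoot(0.25)`, a value between $1/3$ and $1/2$.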
14.5.3 Other problems governed by same equations
♥
Are there other mechanical problems that are governed by Eqs. 14.6 and 14.7? The answer is: “Definitely yes!” Indeed, these equations may be encountered in a wide variety of mechanical problems. For instance, in the presence of damping, the motion of a free pendulum (namely when a forcing function is absent) is governed by Eq. 14.7, at least for small amplitudes (linearization). Also, the motion of a swing, where a forcing function is present, may be analyzed (at least from a qualitative point of view) with Eq. 14.4. [This is what I was referring to in Remark 105, p. 479, on the finite amplitude of a swing.] However, deriving the corresponding formulations requires the equations governing the motion along a prescribed (non–rectilinear) path (which are covered in Section 20.3), or the equations governing rigid–body planar motions (Section 23.7). Thus, for these applications, you have to wait a bit longer... and then, you’ll be in a position to solve them yourself. Also, the forcing function may be due to different types of phenomena. For instance, it might be due to an eccentric (rotating unbalanced mass, such as you have in the wheel of your car before your auto mechanic “balances it”). However, this requires notions that will be covered in Chapter 22, on rigid body motion. On the other hand, at this stage, it is not really important to know which problem gave rise to Eq. 14.4, or Eq. 14.6, or Eq. 14.7. The solution to these equations applies to all the problems that are governed by them. Thus, the
model used in the derivation of the above formulations ought to be considered important not per se, but only as representative of an entire class of problems governed by the same type of equations.
14.6 Aerodynamic drag
♥
Here, we consider a body moving in the presence of air. Specifically, we replace the dashpot force with one due to the presence of the air. Experimental evidence shows that an object moving through the air is subject to a force $D$, called the aerodynamic drag, which acts in the direction opposite to the velocity and is proportional to the square of the velocity. Specifically, we have

$$D = \frac{1}{2}\, C_D\, \rho_A\, A\, v^2, \tag{14.80}$$

where $\rho_A$ denotes the air density and $A$ the cross–section area of the object, whereas the (dimensionless) drag coefficient $C_D > 0$, typically obtained experimentally, is a function of the shape of the body and may be treated as a constant in all the cases of interest in this book.

◦ Warning. Similar considerations apply if the particle moves through water. In this case we speak of hydrodynamic drag. This is not considered any further in this book.
14.6.1 Free fall of a body in the air
♥
In Subsection 11.5.1, we considered a heavy particle (again, a particle subject to gravity) in the absence of aerodynamic drag, namely we assumed the aerodynamic drag to be negligible, for instance in a vacuum tube. Here, we remove this limitation, namely we consider a particle subject to its weight and to the aerodynamic drag $D = \tfrac{1}{2} C_D\, \rho_A\, A\, v^2 > 0$ (Eq. 14.80). The Newton second law (Eq. 11.11) gives

$$m \frac{dv}{dt} = W - D = W - C\, v^2, \tag{14.81}$$

where $W > 0$ and

$$C = \frac{1}{2}\, C_D\, \rho_A\, A > 0 \tag{14.82}$$

are constant.

Remark 118. For a change, the convention adopted here is that displacements, velocities and forces are considered positive if directed downwards. Since the particle is falling, the drag is pointed upwards, and hence the corresponding force is negative. It should be noted that the expression in Eq. 14.81 is valid only because the velocity is assumed to be always pointing in the same direction (as in Eq. 14.81), namely downwards. In general, we should write that the force due to drag is given by $F_D = -\tfrac{1}{2} C_D\, \rho_A\, A\, |v|\, v$, so as to take into account that the force is always opposite to the velocity. [Note that this expression is a particular case of Eq. 14.1, namely $F_D = -c\, v$, with $c = \tfrac{1}{2} C_D\, \rho_A\, A\, |v| > 0$ (nonlinear damping).]

For simplicity, we limit our analysis to homogeneous initial conditions

$$x(0) = v(0) = 0. \tag{14.83}$$

In other words, the body is let go from the origin with zero velocity.
• Adimensionalization technique
Before discussing how to solve the problem, we want to simplify our equation. To this end, let us introduce a technique which consists in introducing some parameters, so as to make the variables dimensionless. This is referred to as the adimensionalization technique.¹ The parameters thereby introduced are then used to simplify the equation. This is the main objective of the adimensionalization technique.

The technique is better understood by showing a specific application, such as the problem under consideration. Accordingly, let us introduce the following dimensionless variables

$$\check{x} = \frac{x}{x_{\rm R}}, \qquad \check{t} = \frac{t}{t_{\rm R}}, \qquad \check{v} = \frac{v}{x_{\rm R}/t_{\rm R}}, \tag{14.84}$$

where $x_{\rm R}$ is a suitable reference length and $t_{\rm R}$ is a suitable reference time. [Note that, numerically, this is equivalent to using a different unit length and unit time to express lengths and times, namely changing from meters and seconds, used in the metric system (International System of Units), into different units, which depend upon the specific problem under consideration.] Combining with Eq. 14.81 and dividing by $m\, x_{\rm R}/t_{\rm R}^2$, one obtains

$$\frac{d\check{v}}{d\check{t}} = \frac{t_{\rm R}^2\, W}{m\, x_{\rm R}} - \frac{x_{\rm R}\, C}{m}\, \check{v}^2. \tag{14.85}$$

Next, we choose $x_{\rm R}$ and $t_{\rm R}$, so as to have

$$\frac{d\check{v}}{d\check{t}} = 1 - \check{v}^2. \tag{14.86}$$

This requires $t_{\rm R}^2 W/(m\, x_{\rm R}) = 1$ and $x_{\rm R} C/m = 1$, namely

$$x_{\rm R} = \frac{m}{C}, \qquad t_{\rm R} = \frac{m}{\sqrt{C W}}, \qquad v_{\rm R} = \frac{x_{\rm R}}{t_{\rm R}} = \sqrt{\frac{W}{C}}. \tag{14.87}$$

Combining with Eq. 14.84, we have

$$\check{x} = \frac{C}{m}\, x, \qquad \check{t} = \frac{\sqrt{C W}}{m}\, t, \qquad \check{v} = \sqrt{\frac{C}{W}}\, v. \tag{14.88}$$

¹ The term adimensionalization stands for the process of making the variable dimensionless. The technique is widely used in aerodynamics, yielding the introduction of two dimensionless numbers: the Reynolds number and the Mach number, which we will encounter in Vol. III. A similar terminology is common in the Romance languages (adimensionnalisation in French, adimensionalización in Spanish, adimensionalizzazione in Italian). In the English literature the term, rarely encountered in the past, recently has been used more frequently. Thus, I am adopting it, especially because I am not aware of any other term that expresses the concept so concisely. One alternative terminology would be the dimensionless–variable technique, but it is not as expressive as adimensionalization, which emphasizes the use of a process (illustrated in this subsection), in order to arrive at the dimensionless–variable formulation.

• Solution
♥
Equation 14.86 may be solved by using the method of separation of variables for first–order ordinary differential equations, covered in Subsection 10.5.1. Specifically, Eq. 10.68 yields (use Eq. 12.66, and recall that $\tanh^{-1} 0 = 0$)

$$\check{t} = \int_0^{\check{v}} \frac{1}{1 - \check{v}_1^2}\, d\check{v}_1 = \tanh^{-1} \check{v}, \tag{14.89}$$

which yields

$$\check{v} = \tanh \check{t}. \tag{14.90}$$

This is the solution to Eq. 14.86, as you may verify. [Hint: Use $(\tanh x)' = 1 - \tanh^2 x$ (Eq. 12.50).] To complete the solution of the problem, we have to find the function $\check{x} = \check{x}(\check{t})$, which is obtained by integrating Eq. 14.90 to yield

$$\check{x}(\check{t}) = \int_0^{\check{t}} \tanh \check{t}_1\, d\check{t}_1 = \ln (\cosh \check{t}). \tag{14.91}$$

[Hint: The primitive of $\tanh x$ is $\ln (\cosh x)$ (Eq. 12.86).]
• Asymptotic behavior of the solution
♠
Let us see how the solution behaves for very large values of $\check{t}$. To do this, note that

$$\check{x}(\check{t}) = \ln \frac{e^{\check{t}} + e^{-\check{t}}}{2} = \ln \frac{(1 + e^{-2\check{t}})\, e^{\check{t}}}{2} = -\ln 2 + \check{t} + \ln (1 + e^{-2\check{t}}). \tag{14.92}$$

[Hint: Use $\cosh u = \tfrac{1}{2}(e^u + e^{-u}) = \tfrac{1}{2} e^u (1 + e^{-2u})$ (Eq. 12.43), as well as $\ln (uv) = \ln u + \ln v$ (Eq. 12.6), along with $\ln e^u = u$ (Eq. 12.29).] The last term tends to zero, as $\check{t}$ tends to infinity. Thus, asymptotically the solution $\check{x}(\check{t})$ grows linearly with $\check{t}$, in agreement with the fact that the dimensionless velocity $\check{v}$ tends to one, as $\check{t}$ tends to infinity (Eq. 14.90).

On the other hand, for very small values of $\check{t}$, we can use the Taylor polynomials of the solution given in Eq. 14.91, namely

$$\check{x}(\check{t}) = \ln (\cosh \check{t}) = \frac{1}{2}\, \check{t}^2 + o[\check{t}^2]. \tag{14.93}$$

[Hint: Recall that $\cosh u = 1 + \tfrac{1}{2} u^2 + o[u^2]$ (Eq. 13.56) and $\ln (1 + u) = u + o[u]$ (Eq. 13.53).]
• Back to the original variables
♥
Finally, we can go back to the original variables. If we use Eq. 14.88, Eqs. 14.91 and 14.90 yield, respectively,

$$x(t) = \frac{m}{C} \ln \cosh \left( \frac{\sqrt{C W}}{m}\, t \right) \tag{14.94}$$

and

$$v(t) = \sqrt{\frac{W}{C}}\, \tanh \left( \frac{\sqrt{C W}}{m}\, t \right). \tag{14.95}$$

Even better, using $W = m g$ and $C = \tfrac{1}{2} C_D\, \rho_A\, A$, Eqs. 14.94 and 14.95 yield

$$x(t) = \frac{2m}{C_D\, \rho_A\, A} \ln \cosh \left( \sqrt{\frac{C_D\, \rho_A\, A\, g}{2m}}\; t \right) \tag{14.96}$$

and

$$v(t) = \sqrt{\frac{2 m g}{C_D\, \rho_A\, A}}\, \tanh \left( \sqrt{\frac{C_D\, \rho_A\, A\, g}{2m}}\; t \right). \tag{14.97}$$

Remark 119. As $t$ tends to infinity, we have $v = \sqrt{2 m g/(C_D\, \rho_A\, A)}$. This may be obtained directly from Eq. 14.81 with $\dot{v} = 0$. [To an observer who travels downwards with the particle, the particle appears to be in equilibrium (namely not moving), and the aerodynamic drag is constant. Therefore, this solution may be obtained with the equations of statics.] Also, for $\check{t} \ll 1$, Eq. 14.93 yields $x(t) = \tfrac{1}{2} g\, t^2$, in agreement with $x(t) = \tfrac{1}{2} F t^2/m$ (Eq. 11.22), with $F = mg$. This indicates that for very small $t$, the aerodynamic drag is negligible.
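The closed–form solution in Eqs. 14.96 and 14.97 may be evaluated as follows. This is only a sketch: the function name is illustrative, and the numerical values used below (a body of 80 kg, with $C_D = 1$, air density 1.2 kg/m³ and cross–section area 0.7 m²) are hypothetical data, chosen only to obtain plausible magnitudes.

```python
import math

def free_fall_with_drag(t, m, g, c_d, rho, area):
    """Return (x, v) from Eqs. 14.96 and 14.97 (positive downwards, released at rest)."""
    c = 0.5 * c_d * rho * area         # Eq. 14.82
    w = m * g                          # weight
    x_ref = m / c                      # reference length, Eq. 14.87
    t_ref = m / math.sqrt(c * w)       # reference time, Eq. 14.87
    v_ref = math.sqrt(w / c)           # reference (and terminal) velocity
    t_hat = t / t_ref                  # dimensionless time, Eq. 14.88
    # Note: math.cosh overflows for very large t_hat (roughly above 700).
    return x_ref * math.log(math.cosh(t_hat)), v_ref * math.tanh(t_hat)

m, g, c_d, rho, area = 80.0, 9.81, 1.0, 1.2, 0.7   # hypothetical data
for t in (0.1, 1.0, 5.0, 20.0):
    x, v = free_fall_with_drag(t, m, g, c_d, rho, area)
    print(f"t = {t:5.1f} s: x = {x:8.2f} m, v = {v:6.2f} m/s, no drag: v = {g * t:6.2f} m/s")
```

The last column shows the drag–free velocity $g\,t$: the two velocities agree for small $t$ and depart from each other as $v$ approaches the limit value of Remark 119.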
14.6.2 A deeper analysis. Upward initial velocity
♥
In the preceding subsections, we have assumed the particle velocity to be always pointed downwards. Here, we briefly outline how the formulation is modified when the velocity is pointing upwards, namely for $v_0 < 0$, given the convention used here (positive $x$-axis pointing downwards). [For instance, in the problem of a ball addressed in Subsection 11.5.1, in the ascending portion of the motion, drag and weight are both pointing downwards.] In this case, Eq. 14.81 would have to be replaced by $m\, dv/dt = W + D$, with $W > 0$ and $D > 0$ (Remark 118, p. 570). Equation 14.86 would be replaced with $d\check{v}/d\check{t} = 1 + \check{v}^2$, which would yield, instead of Eq. 14.90, $\check{v} = \tan (\check{t} + \tan^{-1} \check{v}_0) = (\check{v}_0 + \tan \check{t})/(1 - \check{v}_0 \tan \check{t})$ (Eq. 9.75), where $\check{v}_0 < 0$ (see again Remark 118, p. 570). The function $\check{x}(\check{t})$ is then obtained by using $\int \tan x\, dx = -\ln |\cos x|$ (Eq. 12.82). After the ball reaches its peak, we can shift to the formulation of the preceding subsection (Eqs. 14.81–14.97).
14.7 Energy theorems in the presence of damping
In this section, we want to discuss how the energy approach is affected by the presence of damping. Specifically, we want to broaden Theorem 130, p. 492, on the conservation of mechanical energy for conservative forces, to include the effect of the presence of damping. Assume that $F$ is given by (see Eq. 14.2)

$$F(t, x, v) = F(x) - c(|v|)\, v, \qquad \text{where } c(|v|) > 0. \tag{14.98}$$

The first term denotes a conservative force that possesses a potential energy, whereas the second represents a generic damper (Eq. 14.1) and/or the aerodynamic drag (Eq. 14.80). Then, generalizing the procedure used to obtain Eq. 11.110, we have

$$T_b + U_b = T_a + U_a - \int_{t_a}^{t_b} c(|v|)\, v^2\, dt. \tag{14.99}$$

Note that the integrand in the last term of the above equation is positive as long as $v \neq 0$. Thus, we may state the above result as follows:

Theorem 144 (Dissipation of mechanical energy). If $F(t, x, v)$ is given by Eq. 14.98, we have that the total mechanical energy $E = T(v) + U(x)$ decreases in time, as

$$E(t) = T[v(t)] + U[x(t)] = E_0 - \int_0^t c(|v|)\, v^2\, dt. \tag{14.100}$$

This implies that a time–independent solution is attained only when $v = 0$. For, the last term in Eq. 14.100 keeps on subtracting energy as long as $v \neq 0$. As a consequence, we have the following

Theorem 145. Assume that $U(x)$ has only one minimum, say at $x = x_m$, and neither inflection points, nor maxima. Then, the steady–state solution (namely the time–independent solution, reached in the limit as time tends to infinity) is given by $x = x_m$.

◦ Proof: Once we have reached a time–independent solution (namely $v = 0$), we are back into statics, which requires that $U$ is minimum (use the minimum–energy theorem, Subsection 11.9.6).
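The energy balance in Eq. 14.100 may be checked numerically on the simplest case. The sketch below assumes a linear spring, $F(x) = -k x$ (so that $U = \tfrac{1}{2} k x^2$), and a constant damping coefficient $c$; the time integration (a classical fourth–order Runge–Kutta scheme), the trapezoidal evaluation of the dissipation integral and the numerical values are all choices made here for illustration.

```python
def simulate_energy(m, c, k, x0, v0, t_end, n=20000):
    """Integrate m x'' + c x' + k x = 0 by RK4; return (E0, E_end, dissipated energy)."""
    def accel(x, v):
        return -(c * v + k * x) / m

    h = t_end / n
    x, v = x0, v0
    e0 = 0.5 * m * v**2 + 0.5 * k * x**2          # E = T + U at t = 0
    dissipated = 0.0
    for _ in range(n):
        power_old = c * v**2                      # integrand of Eq. 14.100
        k1x, k1v = v, accel(x, v)
        k2x, k2v = v + 0.5 * h * k1v, accel(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = v + 0.5 * h * k2v, accel(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = v + h * k3v, accel(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        dissipated += 0.5 * h * (power_old + c * v**2)   # trapezoidal rule
    e_end = 0.5 * m * v**2 + 0.5 * k * x**2
    return e0, e_end, dissipated

e0, e_end, dissipated = simulate_energy(m=1.0, c=0.4, k=4.0, x0=1.0, v0=0.0, t_end=10.0)
print(e0, e_end, dissipated, e0 - dissipated - e_end)   # last value should be close to zero
```

The final energy is smaller than the initial one, and the difference matches the accumulated integral of $c\, v^2$, as stated by Theorem 144.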
14.8 Appendix A. Stability of $\ddot{x} + \breve{p}\, \dot{x} + \breve{q}\, x = 0$
In the preceding sections, we have examined the stability of the operator in Eq. 14.7 for the limited case of $m$, $c$ and $k$ positive (Eq. 14.5). We have pointed out that such a solution is asymptotically stable, in the sense that it goes to zero as time tends to infinity (Remark 117, p. 550). For, $e^{\beta t}$ tends to zero as $t$ tends to infinity, because $\beta < 0$ whenever $m$, $c$ and $k$ are positive (see Eqs. 14.15, 14.18 and 14.19, for small–, large– and critical–damping solutions).

In this section, we discuss the stability of the solution to Eq. 14.7 in general, namely we want to explore the conditions that the coefficients $m$, $c$ and $k$ must satisfy, to make sure that the solution is asymptotically stable. We want to do this from a purely mathematical point of view, independently of the physical phenomenon that the equation represents. To this end, let us write Eq. 14.7 as

$$\frac{d^2 x}{dt^2} + \breve{p}\, \frac{dx}{dt} + \breve{q}\, x = 0, \tag{14.101}$$

where $\breve{p}$ and $\breve{q}$ are not necessarily positive. [This includes the case modeled by Eq. 14.7, with $\breve{p} = c/m$ and $\breve{q} = k/m$, provided of course that $m \neq 0$.]

Remark 120. Phenomena governed by such models exist, but you have to know more about mechanics before I can present them to you. Indeed, unstable solutions that occur in mechanics may be due to several causes. For instance, the motion of a particle in the absence of friction on the top of a hill (think of yourself on skis, on the top of an icy hill) is governed by an equation equal to that in Eq. 14.101, with $\breve{p} = 0$ and $\breve{q} < 0$. The same situation occurs when we have an inverted pendulum, namely a pendulum in its highest location, an equilibrium position; in this case, the force due to a displacement $u$ from the equilibrium location is not restoring, but repulsive, namely tends to move the pendulum further away (Subsubsection “Small disturbances around the top,” p. 809). The (apparent) centrifugal force acting on a pendulum when the hinge itself rotates around a vertical axis has a similar effect (a problem addressed in Subsection 22.3.2). More complicated problems occur when coupling of more than one equation occurs (see, for instance, Subsection 23.2.5, where we study the stability of rotating bodies, such as frisbees and oval balls).

In order to be clear about the meaning given here to the term stability, let us introduce the following

Definition 221 (Stability of the operator in Eq. 14.101). Consider the operator in Eq. 14.101, which admits an equilibrium solution $x(t) = 0$. Consider the General solution $x_{\rm GH}(t)$ of the Homogeneous equation in Eq. 14.101, when both initial conditions (location and velocity) are nonhomogeneous. We say that the operator is unstable iff $|x_{\rm GH}(t)|$ is unbounded, namely iff there exists no $M$ such that $|x_{\rm GH}(t)| < M$ at all times. We say that the operator is stable iff $x_{\rm GH}(t)$ stays bounded for $t \in (0, \infty)$. We say that such an operator is asymptotically stable, iff

$$\lim_{t \to \infty} x_{\rm GH}(t) = 0. \tag{14.102}$$
Finally, we say that the operator is at the stability boundary iff it is stable, but not asymptotically stable.

◦ An important point of clarification. It should be pointed out that the stability of a linear differential equation is always addressed for the homogeneous equation. This means that the results obtained for Eq. 14.101 can be applied also to $\ddot{x} + \breve{p}\, \dot{x} + \breve{q}\, x = g(t)$. In other words, for us the stability is an intrinsic property of the operator. The fact that the solution might tend to infinity because of the type of forcing function is not relevant in determining whether an operator is declared stable or not. In this sense, we have to distinguish between resonance and instability. The phenomenon of resonance is not “due to the instability of the operator.” [An appealing way to appreciate this choice is by looking at the “propagation of the error,” as one of my profs used to say. Specifically, let $x_1(t)$ and $x_2(t)$ be two solutions to $\ddot{x} + \breve{p}\, \dot{x} + \breve{q}\, x = g(t)$, with the difference due for instance to an error in the initial conditions. Then, their difference $\varepsilon(t) := x_1(t) - x_2(t)$ satisfies the associated homogeneous equation. Accordingly, we would say that stability is related to the “propagation of the error.”]

As stated above, the solutions presented in Eqs. 14.103, 14.104 and 14.105 are asymptotically stable, provided that $m$, $c$ and $k$ are positive (Eq. 14.5), namely that $\breve{p} > 0$ and $\breve{q} > 0$. It should be apparent, however, that in deriving the corresponding solutions, we never had to use the assumptions that $\breve{p} > 0$ and $\breve{q} > 0$. These were used only in describing the behavior of the solutions. In other words, the solutions presented above are valid even if $\breve{p} \ngtr 0$ and $\breve{q} \ngtr 0$, whereas the conclusion that the operator is asymptotically stable is contingent upon the fact that $\breve{p} > 0$ and $\breve{q} > 0$. Let us examine the problem without these restrictions. Consider the analysis presented in Section 14.2.
Setting $\breve{p} = c/m$ and $\breve{q} = k/m$, we recover the three possibilities obtained there. [See, in particular, the three options in Eqs. 14.15, 14.18 and 14.19, and the corresponding solutions, in Eqs. 14.23, 14.29 and 14.36, respectively.] Specifically, even if we remove the hypothesis that $\breve{p} > 0$ and $\breve{q} > 0$, we still have:

1. For $\breve{p}^2 < 4\breve{q}$,

$$x(t) = A\, e^{\beta t} \cos \omega t + B\, e^{\beta t} \sin \omega t, \tag{14.103}$$

with $\beta = -\tfrac{1}{2} \breve{p}$ and $\omega = \tfrac{1}{2} \sqrt{4\breve{q} - \breve{p}^2}$ (small–damping solution; Eqs. 14.15 and 14.23).

2. For $\breve{p}^2 > 4\breve{q}$,

$$x(t) = A_+\, e^{\beta_+ t} + A_-\, e^{\beta_- t}, \tag{14.104}$$

with $\beta_\pm = \tfrac{1}{2} \big( -\breve{p} \pm \sqrt{\breve{p}^2 - 4\breve{q}} \big)$ (large damping solution; Eqs. 14.18 and 14.29).

3. Finally, for $\breve{p}^2 = 4\breve{q}$,

$$x(t) = A\, e^{\beta_{\rm cr} t} + B\, t\, e^{\beta_{\rm cr} t}, \tag{14.105}$$

with $\beta_{\rm cr} = -\tfrac{1}{2} \breve{p}$ (critical damping solution; Eqs. 14.19 and 14.36).

We have seen that the solution is asymptotically stable if $\breve{p} > 0$ and $\breve{q} > 0$. In the following, we want to study what happens if we remove these hypotheses.
• The cases $\breve{q} < 0$ and/or $\breve{p} < 0$
First, for any value of $\breve{q}$, let us consider the case in which $\breve{p} < 0$, a case of negative damping, a situation not addressed thus far. From the above equations, we see that if $\breve{p} < 0$, we have $\beta > 0$ in Eqs. 14.103 and 14.105 (small and critical damping), as well as $\beta_+ > 0$ in Eq. 14.104 (large damping). [For $\breve{q} > 0$, we have that even $\beta_-$ is positive, but this does not affect our conclusions.] Therefore, the solution is unstable for all three cases, because at least one term becomes unbounded as $t$ tends to infinity.

Next, for any value of $\breve{p}$, let us consider the case $\breve{q} < 0$, a case of non–restoring (repulsive) force (Remark 120, p. 575), again a situation not addressed thus far. In this case, we have $\breve{p}^2 > 4\breve{q}$. Accordingly, only the second option (namely the large damping solution, Eq. 14.104) can occur. That said, we have that $\beta_+ = \tfrac{1}{2} \big( -\breve{p} + \sqrt{\breve{p}^2 - 4\breve{q}} \big)$ is always positive, independently of the value of $\breve{p}$ (namely no matter how much damping we have). Therefore, the solution is always unstable.

In summary, if $\breve{p}$ and/or $\breve{q}$ is negative, the solution is unstable, whereas if both $\breve{p}$ and $\breve{q}$ are positive the solution is asymptotically stable, as we have seen in Section 14.2.
• The cases $\breve{q} = 0$ and/or $\breve{p} = 0$
The case $\breve{p} = \breve{q} = 0$ was addressed in Section 11.4. The solution is given by $x(t) = A + B\, t$ (Eq. 11.18, particle subject to no force). [It may be noted that this is a particular case of Eq. 14.105, with $\beta_{\rm cr} = -\tfrac{1}{2} \breve{p} = 0$.] This corresponds for instance to a billiard ball, with a given initial location and a given nonzero initial velocity. This solution is clearly unstable, in the sense that the solution tends to infinity with time. The particle moves further and further away from its initial location, with a constant velocity, provided of course that $v_0 \neq 0$. [Recall that in studying the stability of the operator under consideration we assume both initial conditions to be nonhomogeneous (Definition 221, p. 575).]

Next, let us assume that $\breve{q} = 0$. We can limit our analysis to $\breve{q} = 0$ and $\breve{p} > 0$ (namely the case of no spring). [For, the case $\breve{p} < 0$ was addressed in the preceding subsubsection for any value of $\breve{q}$, whereas the case $\breve{p} = \breve{q} = 0$ was addressed in the preceding paragraph.] Then, the solution is necessarily given by Eq. 14.104, with $\beta_\pm = \tfrac{1}{2} (-\breve{p} \pm |\breve{p}|)$. Thus, we have that $\beta_+ = 0$, since $\breve{p} > 0$ (whereas $\beta_- < 0$). Then, $e^{\beta_+ t} = e^0 = 1$ corresponds to a solution constant in time. This yields $x(t) = A + B\, e^{\beta_- t}$. Hence, the solution does not tend to zero as $t$ tends to infinity (even though $\beta_- < 0$, and hence the second term goes to zero). But it does not go to infinity either: the solution is at the stability boundary (stable, but not asymptotically stable).

Next, let us assume that $\breve{p} = 0$. The case $\breve{p} = 0$ and $\breve{q} > 0$ was addressed in Chapter 11 (see Eq. 11.35; solution at the stability boundary). Moreover, the case $\breve{p} = \breve{q} = 0$ was addressed at the beginning of this subsubsection. Finally, the case $\breve{p} = 0$ and $\breve{q} < 0$ was included in the analysis of the preceding subsubsection (unstable solution). Nonetheless, it may be worth addressing this case in detail. Specifically, the solution is given by

$$x(t) = C_+\, e^{\alpha t} + C_-\, e^{-\alpha t} = A \cosh \alpha t + B \sinh \alpha t, \tag{14.106}$$

with $\alpha = \sqrt{-\breve{q}}$ and $C_+ = \tfrac{1}{2}(A + B)$ and $C_- = \tfrac{1}{2}(A - B)$, as you may verify. Imposing the initial conditions, we have

$$x(t) = x_0 \cosh \alpha t + \frac{v_0}{\alpha} \sinh \alpha t. \tag{14.107}$$

This confirms that the solution is unstable (Eqs. 12.61 and 12.62).
• Summary The results are summarized in Table 14.1, where AS stands for “Asymptotically Stable,” SB for “at Stability Boundary,” U for “Unstable.”
Table 14.1 Stability criteria

|                 | $\breve{q} > 0$ | $\breve{q} = 0$ | $\breve{q} < 0$ |
| $\breve{p} > 0$ | AS              | SB              | U               |
| $\breve{p} = 0$ | SB              | U               | U               |
| $\breve{p} < 0$ | U               | U               | U               |
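The entries of Table 14.1 can be cross-checked numerically from the roots $\beta_\pm = \tfrac{1}{2}\left(-\breve{p} \pm \sqrt{\breve{p}^2 - 4\breve{q}}\,\right)$. What follows is only a minimal Python sketch and is not part of the original text: the function name `classify` and the tolerance `tol` are illustrative assumptions, and the governing equation is assumed to have the form $\ddot{x} + \breve{p}\,\dot{x} + \breve{q}\,x = 0$, consistent with the expression for $\beta_\pm$.

```python
import cmath

def classify(p, q, tol=1e-12):
    """Return 'AS', 'SB' or 'U' for x'' + p x' + q x = 0 (hypothetical helper)."""
    root = cmath.sqrt(p * p - 4.0 * q)           # complex square root covers p^2 < 4q
    betas = ((-p + root) / 2.0, (-p - root) / 2.0)
    largest = max(b.real for b in betas)         # growth rate of the dominant term
    if largest > tol:
        return "U"                               # at least one term is unbounded
    if largest < -tol:
        return "AS"                              # every term decays
    # The largest real part vanishes. A double root at zero (p = q = 0) gives
    # x(t) = A + B t, which is unbounded; otherwise the solution stays bounded.
    if abs(p) <= tol and abs(q) <= tol:
        return "U"
    return "SB"

# Print one row of the table for each sign of p (columns: q > 0, q = 0, q < 0).
for p in (1.0, 0.0, -1.0):
    print(p, [classify(p, q) for q in (1.0, 0.0, -1.0)])
```

The special treatment of the double root at zero mirrors the remark in the text that $x(t) = A + B\,t$ is unstable even though neither root has a positive real part.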
14.9 Appendix B. Negligible–mass systems

In this section, we study what happens when the term $m\ddot{x}$ (in Eqs. 14.4, 14.6 and 14.7) is very small compared to the other ones and may be neglected. In other words, here we present an analysis of first–order differential equations. [Recall that an equation is called differential iff the corresponding expression contains the unknown function and at least one of its derivatives (Remark 82, p. 406). A first–order differential equation contains only the function and its first–order derivative.]

◦ What’s the point? You might wonder why anyone should care about such a peculiar problem. To begin with, in the preceding sections, we spent a lot of time studying second–order differential equations. Therefore, it makes sense to complete our work and include first–order differential equations as well, which incidentally are the simplest differential equations that we can consider. But that is not the only reason to study such equations. To clarify the basis for this broader interest, you might like to know that a very broad class of (linear) dynamical systems may be reduced (within the real field) to a collection of uncoupled equations of two types: (i) second–order differential equations (such as those governing damped or undamped harmonic oscillators), and (ii) first–order differential equations (such as those examined here). Thus, the analysis presented in this section facilitates the introduction of basic concepts that will be used in a more complicated context.

Accordingly, here we address this very special case, namely the limit of Eqs. 14.4, 14.6 and 14.7, whenever the term $m\ddot{x}$ is negligible. We limit ourselves to the case $c > 0$ and $k > 0$. Setting $k/c = a > 0$ and $F(t)/c = f(t)$, Eq. 14.4 with $m = 0$ yields
$$\dot{x} + a\,x = f(t). \tag{14.108}$$
[The results obtained here are valid even if $a < 0$, as you may verify.] To complete the problem, we need to add an initial condition, namely that the initial location is assigned:
$$x(0) = x_0, \tag{14.109}$$
where $x_0$ is a prescribed constant.

Remark 121. As already hinted in Remark 98, p. 465, you might wonder what is the correct number of conditions to impose to obtain a unique solution. Why do we now add only one condition? Again, at this point, you might want to consider this choice as “based upon experimental evidence.” [The issue of the initial conditions, and the corresponding theorems of existence and uniqueness, will be addressed in Vol. II.]
14.9.1 The problem with no forcing

Assume that $f(t) = 0$. This is called the unforced problem in mechanics, or the homogeneous problem in mathematics. Then, it is easy to verify that
$$x(t) = C\,e^{-at} \tag{14.110}$$
satisfies $\dot{x} + a\,x = 0$ (Eq. 14.108, with $f(t) = 0$). Next, let us impose the initial condition, namely $x(0) = x_0$ (Eq. 14.109). This yields $C = x_0$. Therefore,
$$x(t) = x_0\,e^{-at} \tag{14.111}$$
satisfies the differential equation and the initial condition. Thus, this is the solution to the above problem (Remark 96, p. 459, on the uniqueness of the solution).
• A comparison with the large–damping solution ♠

It may be of interest to compare this solution with the limit, as the mass tends to zero, of the unforced damped harmonic oscillator, addressed in Section 14.2. In the limit, as $m$ tends to zero, we always have that $c > c_{cr} := 2\sqrt{mk}$ (large–damping solution). The large–damping solution is given by Eq. 14.31, namely
$$x_{LD}(t) = \frac{x_0}{\beta_+ - \beta_-}\left(\beta_+\,e^{\beta_- t} - \beta_-\,e^{\beta_+ t}\right) + \frac{v_0}{\beta_+ - \beta_-}\left(e^{\beta_+ t} - e^{\beta_- t}\right), \tag{14.112}$$
with $\beta_\pm = \left(-c \pm \sqrt{c^2 - 4mk}\,\right)/(2m)$ (Eq. 14.18). Therefore, we obtain that $\beta_-$ tends to $-\infty$ as $m$ tends to zero. Accordingly, the second term on the right side of Eq. 14.112 tends to zero, whereas the limit of the first term is finite. Specifically, we have
$$\lim_{\beta_- \to -\infty} x_{LD}(t) = x_0\,e^{\beta_+ t}, \tag{14.113}$$
as you may verify, with $\beta_+$ understood in the limit as $m$ tends to zero. To obtain such a limit, let us use the alternate expression for the roots (namely Eq. 6.41, instead of Eq. 6.36), which gives $\beta_\pm = 2k/\left(-c \mp \sqrt{c^2 - 4mk}\,\right)$. Therefore, we have
$$\lim_{m \to 0} \beta_+ = \lim_{m \to 0} \frac{2k}{-c - \sqrt{c^2 - 4mk}} = \frac{-k}{c}. \tag{14.114}$$
Combining with Eq. 14.113, we obtain
$$\lim_{m \to 0} x_{LD}(t) = x_0\,e^{-kt/c}, \tag{14.115}$$
in agreement with Eq. 14.111, since $a = k/c$.
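The limit in Eq. 14.115 can also be observed numerically. The sketch below is an illustration only, not the author's procedure: the function names and the sample values of $c$, $k$, $x_0$, $v_0$ and $t$ are assumptions. It evaluates Eq. 14.112 for decreasing values of $m$, computing $\beta_+$ from the alternate expression for the roots so as to avoid the subtraction of nearly equal numbers, and compares the result with Eq. 14.111.

```python
import math

def x_large_damping(t, m, c, k, x0, v0):
    """Eq. 14.112, with beta_+ from the alternate form of the roots (as in Eq. 14.114)."""
    s = math.sqrt(c * c - 4.0 * m * k)    # real only if c > c_cr = 2 sqrt(m k)
    beta_plus = 2.0 * k / (-c - s)        # stays finite as m -> 0
    beta_minus = (-c - s) / (2.0 * m)     # tends to minus infinity as m -> 0
    d = beta_plus - beta_minus
    return (x0 * (beta_plus * math.exp(beta_minus * t) - beta_minus * math.exp(beta_plus * t))
            + v0 * (math.exp(beta_plus * t) - math.exp(beta_minus * t))) / d

def x_massless(t, c, k, x0):
    """Eq. 14.111 with a = k/c."""
    return x0 * math.exp(-k * t / c)

c, k, x0, v0, t = 1.0, 2.0, 1.0, 0.5, 1.0     # arbitrary sample values (assumptions)
for m in (1e-2, 1e-4, 1e-6):
    # The difference should shrink as the mass decreases.
    print(m, x_large_damping(t, m, c, k, x0, v0) - x_massless(t, c, k, x0))
```

Note that the initial velocity $v_0$ enters Eq. 14.112 but not Eq. 14.111; for any fixed $t > 0$ its contribution fades as $m$ tends to zero, in line with the fact that the first–order problem admits only one initial condition.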
14.9.2 The problem with harmonic forcing

Here, we assume that $f(t) = \breve{F} \cos \Omega t$ (harmonic–forcing problem, Eq. 14.42). Let us try a solution of the type
$$x(t) = C \cos \Omega t + D \sin \Omega t. \tag{14.116}$$
Combining with Eq. 14.108 yields
$$\Omega\,(-C \sin \Omega t + D \cos \Omega t) + a\,(C \cos \Omega t + D \sin \Omega t) = \breve{F} \cos \Omega t, \tag{14.117}$$
or
$$(a\,C + \Omega\,D - \breve{F}) \cos \Omega t + (-\Omega\,C + a\,D) \sin \Omega t = 0. \tag{14.118}$$
This necessarily implies that (Theorem 122, p. 433, on the linear independence of $\sin x$ and $\cos x$)
$$a\,C + \Omega\,D = \breve{F}, \qquad -\Omega\,C + a\,D = 0. \tag{14.119}$$
This is a system of two linear algebraic equations with two unknowns, $C$ and $D$, whose solution is given by
$$C = \frac{a\,\breve{F}}{a^2 + \Omega^2} \quad\text{and}\quad D = \frac{\Omega\,\breve{F}}{a^2 + \Omega^2}, \tag{14.120}$$
as you may verify. Combining with Eq. 14.116, we have that $x_{PN}(t)$ (Particular solution to the Non–homogeneous equation under consideration) is given by
$$x_{PN}(t) = \frac{a\,\breve{F}}{a^2 + \Omega^2} \cos \Omega t + \frac{\Omega\,\breve{F}}{a^2 + \Omega^2} \sin \Omega t. \tag{14.121}$$
Next, recall that the operator in Eq. 14.108 (namely $L$, with $L\,x = \dot{x} + a\,x$) is linear (Eq. 9.162). Thus, we can exploit the superposition theorem for linear nonhomogeneous equations (Theorem 114, p. 407). Accordingly, we can add to $x_{PN}(t)$ the general solution of the homogeneous problem, given by Eq. 14.110, since doing this does not alter the contribution to the right side of Eq. 14.108. This yields
$$x(t) = C \cos \Omega t + D \sin \Omega t + A\,e^{-at}, \tag{14.122}$$
where $C$ and $D$ are known (Eq. 14.120), whereas $A$ is an arbitrary constant. Let me emphasize again that now we impose only one initial condition, Eq. 14.109. This allows us to determine the value of the arbitrary constant $A$. Specifically, by combining Eqs. 14.109 and 14.122, one obtains $x(0) = C + A = x_0$, namely $A = x_0 - C$. Thus, the solution to our problem is
$$x(t) = C \cos \Omega t + D \sin \Omega t + (x_0 - C)\,e^{-at}, \tag{14.123}$$
where $x_0$, $C$ and $D$ are known constants (Eqs. 14.109 and 14.120), since $\breve{F}$ is prescribed (see Eq. 14.73). [Indeed, this expression satisfies Eq. 14.108 with $f(t) = \breve{F} \cos \Omega t$, as well as the initial condition (Eq. 14.109), and hence is the solution to the above problem (again Remark 96, p. 459, on the uniqueness of the solution).]
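The claim that Eq. 14.123 satisfies both Eq. 14.108 and Eq. 14.109 can be spot-checked numerically. The following Python sketch is only an illustration: the function names, the sample parameter values and the finite-difference step `h` are assumptions introduced here, not part of the original text.

```python
import math

def forced_response(t, a, F, Omega, x0):
    """Eq. 14.123: x(t) = C cos(Omega t) + D sin(Omega t) + (x0 - C) exp(-a t)."""
    C = a * F / (a * a + Omega * Omega)        # Eq. 14.120
    D = Omega * F / (a * a + Omega * Omega)    # Eq. 14.120
    return C * math.cos(Omega * t) + D * math.sin(Omega * t) + (x0 - C) * math.exp(-a * t)

def residual(t, a, F, Omega, x0, h=1e-5):
    """x' + a x - F cos(Omega t), with x' approximated by a central difference."""
    dxdt = (forced_response(t + h, a, F, Omega, x0)
            - forced_response(t - h, a, F, Omega, x0)) / (2.0 * h)
    return dxdt + a * forced_response(t, a, F, Omega, x0) - F * math.cos(Omega * t)

a, F, Omega, x0 = 2.0, 3.0, 5.0, 1.0    # arbitrary sample values (assumptions)
print(forced_response(0.0, a, F, Omega, x0))                              # initial condition
print(max(abs(residual(0.1 * n, a, F, Omega, x0)) for n in range(1, 50)))  # ODE residual
```

The residual is not exactly zero because the derivative is approximated; it should be small compared with the size of the individual terms.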
14.9.3 The problem with arbitrary forcing ♣

Finally, consider the general problem, namely Eq. 14.108, with $f(t)$ arbitrary. I claim that the solution is given by
$$x(t) = x_0\,e^{-at} + \int_0^t e^{-a(t-\tau)}\,f(\tau)\,d\tau. \tag{14.124}$$
Indeed, using Eqs. 10.97, 12.28 and 12.30, we have
$$\frac{dx}{dt} = -a\,x_0\,e^{-at} - a \int_0^t e^{-a(t-\tau)}\,f(\tau)\,d\tau + \left[e^{-a(t-\tau)}\,f(\tau)\right]_{\tau=t} = -a\,x + f(t), \tag{14.125}$$
in agreement with Eq. 14.108. In addition, we have $x(0) = x_0$, as you may verify. Thus, Eq. 14.124 satisfies both the differential equation and the initial condition. Hence, it is the solution to the above problem (see once again Remark 96, p. 459, on the uniqueness of the solution).
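As a numerical illustration of Eq. 14.124, the integral can be approximated by a quadrature rule and compared with a case that can be integrated by hand. The sketch below makes several assumptions not found in the original text: the function name `solve_first_order`, the use of the composite trapezoidal rule with `n` intervals, and the choice of a constant forcing $f(\tau) = f_0$, for which Eq. 14.124 gives $x(t) = x_0\,e^{-at} + (f_0/a)\,(1 - e^{-at})$.

```python
import math

def solve_first_order(t, a, x0, f, n=2000):
    """Approximate Eq. 14.124 with the composite trapezoidal rule on n intervals."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        weight = 0.5 if i in (0, n) else 1.0     # end points carry half weight
        total += weight * math.exp(-a * (t - tau)) * f(tau)
    return x0 * math.exp(-a * t) + h * total

a, x0, f0, t = 2.0, 1.0, 3.0, 1.5                # arbitrary sample values (assumptions)
numeric = solve_first_order(t, a, x0, lambda tau: f0)
exact = x0 * math.exp(-a * t) + (f0 / a) * (1.0 - math.exp(-a * t))
print(numeric, exact)
```

Any other integrable forcing can be passed in place of the constant one; only the hand-computed comparison value is specific to this choice.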
Part III Multivariate calculus and mechanics in three dimensions
Chapter 15
Physicists’ vectors
Thus far, we did the best we could within the limitations due to the one–dimensional motion assumption. With this chapter, we begin Part III, where we move from one–dimensional formulations of mathematics and mechanics into three–dimensional ones, thereby completing the objectives of this volume.
• Overview of Part III

Let me begin with an overview of Part III, so as to give you an idea of its overall aim. In order to perform the transition from one to three dimensions, it is necessary to introduce physicists’ vectors, those that look like arrows. They are introduced in this chapter. Next, before continuing, we need to strengthen our background on matrices and mathematicians’ vectors, by introducing new notions, such as inverse and orthogonal matrices, as well as change of bases (Chapter 16). In Chapter 17, we apply this new know–how to study statics in three dimensions. Then, before tackling dynamics, we have to expand our background in infinitesimal calculus from one to several dimensions, into the so–called multivariate differential and integral calculus (Chapters 18 and 19, respectively). With this, we will be in a position to study single–particle and n–particle dynamics in three dimensions (Chapters 20 and 21, respectively). Next, we address (apparent) forces, which mysteriously show up in non–inertial frames of reference, such as centrifugal and Coriolis forces (Chapter 22). We conclude this volume with an introduction to rigid–body dynamics (Chapter 23).
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_15
• Overview of this chapter

Turning to this chapter, let me say that, just as numbers and functions are the key building blocks of mathematical analysis, so vectors are the key building blocks of mechanics. Of course, I am referring to physicists’ vectors, which were already announced in Remark 30, p. 94. They are formally introduced in this chapter, along with all the rules that govern their use. Specifically, in Section 15.1, we define physicists’ vectors and introduce their basic properties. [In particular, we introduce the following basic operations: multiplication by a scalar and addition. These correspond fully to the analogous operations on mathematicians’ vectors, which have been presented in Subsection 3.1.2. In addition, we discuss projections, as well as orthogonal and orthonormal physicists’ vectors, bases and vector components in a given basis.]

And that’s not all there is! We also need some special products between vectors. We begin with the so–called dot product between two vectors (Section 15.2). In particular, we will present its properties, as well as its expression in terms of components. We then use this to clarify the relationship between components and projections and also to give a geometrical interpretation of the equations for lines and planes. Next, in Section 15.3, we introduce the so–called cross product between two vectors, along with its properties, whereas in Section 15.4, we introduce various combinations of dot and cross products. Then, in Section 15.5 we discuss how to perform changes of base vectors. Finally, in Section 15.6, we present the Gram–Schmidt procedure to generate a so–called orthonormal basis for physicists’ vectors.

We also have an appendix (Section 15.7), on a property of parabolas, which is related to the fact that a light bulb placed at the focus of a paraboloid of revolution produces a beam of light rays parallel to the axis of the paraboloid.
This is a coda to the material in Subsection 7.9 on conics, which is more easily dealt with by using physicists’ vectors.
15.1 Physicists’ vectors

In the formulation of Newtonian mechanics, we assume that we live in a Euclidean three–dimensional space, namely a space in which the three–dimensional Pythagorean theorem holds. Accordingly, here the discussion of vectors is limited to Euclidean spaces. Specifically, we avoid non–Euclidean spaces, which are needed in Einstein’s relativistic mechanics (general relativity), but not in Newtonian mechanics, the only one of interest here.
What is a physicists’ vector? Why do we need to introduce them? The importance of physicists’ vectors may be appreciated by considering a displacement in our ordinary space. Let us carry an object from point A to point B. To represent this displacement, we need to know the direction of the displacement, as well as its magnitude. Thus, an arrow would be adequate to describe this, whereas the segment AB would not, because it does not tell us whether we are coming or going.

Another good example of a physical entity to be represented by a vector is the velocity of a particle at a given time t, such as the velocity of an artificial satellite. We need to identify first of all the magnitude of the velocity, namely the absolute value of the speed. [Recall Remark 89, p. 449, on the distinction made between velocity, a vector quantity, and speed, a scalar quantity. The speedometer gives you the speed of your car, not its velocity.] However, we also need its direction, namely the tangent to the trajectory and which way the satellite is going. Again, an arrow can do that. Similar considerations apply to the acceleration, which is the rate of change of the velocity (and not of the speed, as you might be inclined to think). [As already pointed out in Remark 90, p. 450, the rate of change of the speed of a car is colloquially referred to as the car pickup. As we will see in Section 20.3, the acceleration includes also a term that is related to the change of direction of the velocity (Eq. 20.32).]

◦ Warning. We have implicitly assumed that physicists’ vectors are real. Indeed, in this book, we have no use for complex physicists’ vectors.

◦ Comment. It looks as though the use of arrows is adequate to describe all these physical entities, namely displacements, velocities and accelerations. Why, then, do we need a whole chapter to introduce physicists’ vectors, instead of being content with the use of arrows?
The reason is that identifying a vector simply with an arrow would be misleading, because, when we talk about physicists’ vectors, there is much more than what is describable by an arrow. Specifically, there are important vector operations that make a vector much more complicated than an arrow. To clarify this statement, assume that I move an object from point A to point B and then from B to C; this is equivalent to moving the object from A to C. Thus, we also need a rule that tells us how to add the two displacements in the correct way. We also need a rule for multiplication by a scalar. And then, . . . , then, there is much much more. We have the following Definition 222 (Physicists’ vector). A physicists’ vector is a geometrical entity characterized by magnitude and direction. Direction is understood as inclusive of alignment and pointing. [Sometimes, instead of pointing, we use orientation, as in “oriented segment,” Definition 80, p. 176.] Two vectors are
said to have the same direction iff they are mutually parallel and point in the same way.1 ◦ Warning. Some authors use the term “direction” with the meaning given here to the term “alignment” (namely excluding the pointing) and “sense” with the meaning given here to the term “pointing.” I prefer the terminology adopted here. The other one has always been very confusing to me and to my students, as too remote from everyday use of the term direction. Remark 122. In an attempt to reduce the confusion, I will adopt the following notation. Physicists’ vectors are denoted by lower–case boldface letters, like a, b, . . . (and α, β, . . . ). This will be referred to as vector notation. [Note the difference with respect to lower–case sans serif letters, like a, b, . . . (and α, β, . . . ), which represent mathematicians’ vectors (Footnote 2, p. 90, on matrix notation).] ◦ Warning. For the sake of conciseness, in the rest of this chapter the term “vector” is understood to mean “physicists’ vector.” Graphically, we represent a vector with an oriented segment, namely an arrow: the length of the segment represents the magnitude, whereas the direction of the vector is identified by the alignment of the line that contains the segment, along with the pointing of the arrowhead (Fig. 15.1).
Fig. 15.1 Representation of a vector
Remark 123. A vector may be translated to any place you want. In other words, two vectors that are parallel, point the same way and have the same magnitude are considered equal. [Some authors distinguish between applied vectors, for which you have to add to Definition 222 above that they have a point of application, and free vectors, which can be freely moved in the space (no point of application). I will need to include the point of application only for forces. Accordingly, I will introduce applied vectors only in the next chapter. Free vectors are simply referred to as (physicists’) vectors; the qualifier “free” is tacitly understood.]

Definition 223 (Magnitude (or norm) of a vector). The magnitude of a vector $\mathbf{b}$ (sometimes referred to as norm) is denoted by the symbol $\|\mathbf{b}\|$, or, if there is no possibility of confusion, by the same letter in italics:
$$b = \|\mathbf{b}\|. \tag{15.1}$$

1 Vector calculus, pretty much the way we know it, was introduced independently and nearly simultaneously by Oliver Heaviside and Willard Gibbs. Josiah Willard Gibbs (1839–1903) was an American scientist, who made important contributions to several fields. In addition to his work on modern vector analysis, he is one of the founding fathers of thermodynamics and statistical mechanics, and their interrelation. The Gibbs phenomenon in Fourier analysis is also named after him. [If you would like to know more about the development of vector calculus, you might like to consult “A History of Vector Analysis,” by Michael J. Crowe (Ref. [15]), which presents a detailed analysis of its history, from Leibniz to Gibbs and Heaviside, and then more.]
As an illustrative example of a vector quantity, we can discuss again the velocity of a particle. To identify this, we need to indicate the magnitude of the velocity, $v = \|\mathbf{v}\|$. [I repeat, this is typically called the speed. I said “typically” because of our choice that speed may have a negative sign, as when your car is in reverse (Remark 89, p. 449).] We also need the direction of the motion (that is, the line tangent to the trajectory at time t, and the pointing of the motion along the trajectory, namely whether the particle is coming or going).

Another example of a vector (“applied,” this time) is a force in three dimensions. To visualize this, assume that you are pulling an object with a rope; the magnitude of the force that you are applying to the object is given by the tension in the rope (as in the one–dimensional case), whereas the direction of the force is parallel to the rope, pointing away from the object.

Remark 124. For future reference, a string (or a rope) has the property that the forces acting on its endpoints are always aligned with the string itself, and they are necessarily such that the string is in tension and not in compression. You may consider this as the definition of a string. In addition, comparing one– and three–dimensional forces, we see that the only really new aspect is the alignment of the force; the pointing replaces the sign of the one–dimensional force. [More in Remark 139, p. 695.]
15.1.1 Multiplication by scalar and addition

In analogy with mathematicians’ vectors, a physicists’ vector may be multiplied by a scalar, according to the following

Definition 224 (Multiplication of a vector by a scalar). Let $\mathbf{b}$ be a vector and $\alpha$ a scalar. Then, $\mathbf{c} := \alpha\,\mathbf{b}$ is a vector that is parallel to $\mathbf{b}$ and has magnitude $c = |\alpha|\,b$. Iff $\alpha > 0$, the pointing is the same as that of $\mathbf{b}$; iff $\alpha < 0$ it’s the opposite one. [Multiplication by $-1$ yields simply a change in the way the vector is pointing.]

Two vectors may be added to each other according to the following

Definition 225 (Addition of vectors). Let $\mathbf{b}_1$ and $\mathbf{b}_2$ be two vectors. The vector $\mathbf{b} = \mathbf{b}_1 + \mathbf{b}_2$ is the vector obtained by using the so–called parallelogram rule, illustrated in Fig. 15.2.
Fig. 15.2 Parallelogram rule
Fig. 15.3 Parallelepiped rule
Note that the parallelogram rule for the sum of two vectors applies to both two and three dimensions. [For the three–dimensional case, the rule is applied on the plane determined by the two vectors.] For the sum of three non–coplanar three–dimensional vectors, we use the parallelogram rule twice and obtain the parallelepiped rule (Fig. 15.3). Note that
$$\mathbf{b}_1 + \mathbf{b}_2 = \mathbf{b}_2 + \mathbf{b}_1 \quad\text{and}\quad (\mathbf{b}_1 + \mathbf{b}_2) + \mathbf{b}_3 = \mathbf{b}_1 + (\mathbf{b}_2 + \mathbf{b}_3), \tag{15.2}$$
as apparent from the parallelogram and parallelepiped rules. We also have
$$(\alpha + \beta)\,\mathbf{b} = \alpha\,\mathbf{b} + \beta\,\mathbf{b} \tag{15.3}$$
and
$$\alpha\,(\mathbf{a} + \mathbf{b}) = \alpha\,\mathbf{a} + \alpha\,\mathbf{b}, \tag{15.4}$$
as you may verify on the basis of Definitions 224 and 225 above.

Next, consider the following

Lemma 13 (Uniqueness of decomposition). Given a vector $\mathbf{b}$ and two arbitrary distinct directions, we have $\mathbf{b} = \mathbf{b}_1 + \mathbf{b}_2$, with $\mathbf{b}_1$ and $\mathbf{b}_2$ parallel respectively to the two directions under consideration. This decomposition is unique. [This holds true even if we are dealing with a vector $\mathbf{b}$ in space, provided that the two directions and $\mathbf{b}$ are coplanar.] Similarly, in three dimensions, given a vector $\mathbf{b}$ and three distinct non–coplanar directions, we have $\mathbf{b} = \mathbf{b}_1 + \mathbf{b}_2 + \mathbf{b}_3$, with $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ parallel respectively to the three given directions. This decomposition is also unique.

◦ Proof : Use the parallelogram rule for the first part, and the parallelepiped rule for the second. [Note that the process used here is the inverse of that in Fig. 15.3, p. 590. Indeed, in the figure we use the parallelepiped rule to compose the vectors $\mathbf{b}_1$, $\mathbf{b}_2$ and $\mathbf{b}_3$ and obtain the vector $\mathbf{b}$, whereas here we use the same rule to decompose $\mathbf{b}$ into the vectors $\mathbf{b}_1$, $\mathbf{b}_2$ and $\mathbf{b}_3$ along the given directions.]
15.1.2 Linear independence of physicists’ vectors

This subsection deals with the linear independence of physicists’ vectors. We have the following definitions (analogous to those for mathematicians’ vectors):

Definition 226 (The physicists’ vector 0). The symbol $\mathbf{0}$ denotes the zero vector, namely a physicists’ vector whose magnitude equals zero. Any vector different from $\mathbf{0}$ is called a nonzero vector.

Definition 227 (Linear combination of vectors). Consider $n$ nonzero physicists’ vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$. Their linear combination denotes the following operation
$$\sum_{k=1}^{n} c_k\,\mathbf{b}_k. \tag{15.5}$$

Definition 228 (Linearly dependent/independent vectors). Consider $n$ nonzero physicists’ vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$. They are called linearly dependent iff there exists a nontrivial linear combination (namely one in which not all the coefficients $c_k$ vanish) such that
$$\sum_{k=1}^{n} c_k\,\mathbf{b}_k = \mathbf{0}. \tag{15.6}$$
They are called linearly independent iff they are not linearly dependent.

As an immediate consequence, we have the following
Theorem 146 (Linear dependence of coplanar vectors). Consider three vectors in $\mathbb{R}^3$, say $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$. If they are coplanar, they are linearly dependent. On the other hand, three non–coplanar vectors are linearly independent.

◦ Proof : Indeed, the fact that in two dimensions any vector may be expressed as $\mathbf{b} = c_1\,\mathbf{b}_1 + c_2\,\mathbf{b}_2$, with $\mathbf{b}_1$ and $\mathbf{b}_2$ parallel to two distinct directions, implies that three coplanar vectors are necessarily linearly dependent, as you may verify. This completes the first part of the theorem. On the other hand, recall that any vector may be uniquely written as $\mathbf{b} = c_1\,\mathbf{b}_1 + c_2\,\mathbf{b}_2 + c_3\,\mathbf{b}_3$, with $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ parallel to three non–coplanar directions (see Lemma 13 above). [This implies that, if $\mathbf{b} = \mathbf{0}$, then necessarily $c_1 = c_2 = c_3 = 0$, so that no nontrivial linear combination of three non–coplanar vectors vanishes.]

◦ Comment. Similarly, we have that we cannot have more than three linearly independent physicists’ vectors. [Hint: Given three linearly independent ones, use the parallelepiped rule to express any other one in terms of these three.]
15.1.3 Projections

Consider the following

Definition 229 (Unit vector). A vector whose magnitude equals one is called a unit vector.

◦ Warning. Many authors use the symbol $\hat{\mathbf{b}}$ to denote the unit vector in the direction of the vector $\mathbf{b}$, namely $\hat{\mathbf{b}} = \mathbf{b}/b$. In my experience, this does not sufficiently emphasize the difference between the two and has been a source of confusion for my students. Accordingly, I typically use the symbol $\mathbf{e}$ to denote unit vectors. Specifically, I use for instance $\mathbf{e}_b$ to denote a unit vector in the direction of the vector $\mathbf{b}$, namely $\mathbf{e}_b = \mathbf{b}/b$. Without a subscript, $\mathbf{e}$ denotes a generic unit vector. The only other symbols used to denote unit vectors are $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, with or without subscripts (see for instance Eqs. 15.9, 15.10 and 22.1), as well as $\mathbf{t}$ and $\mathbf{n}$ for unit tangent and unit normal. [Accordingly, I will use the symbols $\breve{\mathbf{t}}$ and $\breve{\mathbf{n}}$ for tangents and normals, when they are not necessarily unit vectors.]

Consider the following

Definition 230 (Projection of a vector into a direction). Given a vector $\mathbf{b}$ and a direction defined by a unit vector $\mathbf{e}$, the projection of $\mathbf{b}$ in such a direction is given by
$$b_{eP} = b \cos \theta, \tag{15.7}$$
Fig. 15.4 Projection
where $\theta$ is the angle between $\mathbf{b}$ and $\mathbf{e}$, as in Fig. 15.4. This is analogous to Definition 105, p. 197, regarding the projection of segments. Now, however, the projection has a sign (contrary to the projections of segments, since segments have no orientation). In particular, the projection coincides with $b$ if $\theta = 0$, and with $-b$ if $\theta = \pi$. Moreover, the projection vanishes if, and only if, the two vectors are orthogonal, namely iff $\theta = \pi/2$ (by definition). Finally, if $\theta$ is acute (obtuse), the projection is positive (negative). [Recall Definition 87, p. 178, of acute and obtuse angles.]
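The sign behavior of the projection in Eq. 15.7 can be illustrated with a short computation. The Python sketch below is only an illustration under stated assumptions: vectors are represented by triples of numbers, understood as components in an orthonormal basis (a notion introduced in the next subsection), and the angle $\theta$ is obtained from the law of cosines applied to the triangle with sides $\mathbf{b}$, $\mathbf{e}$ and $\mathbf{b} - \mathbf{e}$, so that only magnitudes are needed. The function names are hypothetical.

```python
import math

def norm(v):
    """Magnitude of a vector, from the three-dimensional Pythagorean theorem."""
    return math.sqrt(sum(x * x for x in v))

def projection(b, e):
    """Projection b cos(theta) of a nonzero vector b onto the direction of e (Eq. 15.7)."""
    nb, ne = norm(b), norm(e)
    diff = [bi - ei for bi, ei in zip(b, e)]
    # Law of cosines: |b - e|^2 = |b|^2 + |e|^2 - 2 |b| |e| cos(theta).
    cos_theta = (nb * nb + ne * ne - norm(diff) ** 2) / (2.0 * nb * ne)
    return nb * cos_theta

e = (1.0, 0.0, 0.0)                        # unit vector defining the direction
print(projection((3.0, 4.0, 0.0), e))      # acute angle: positive projection
print(projection((-2.0, 1.0, 0.0), e))     # obtuse angle: negative projection
print(projection((0.0, 5.0, 0.0), e))      # orthogonal vectors: zero, up to rounding
```

Once the dot product is available (Section 15.2), the same quantity can be computed more directly; the law of cosines is used here only to stay within the notions introduced so far.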
15.1.4 Base vectors and components In this subsection, we introduce base vectors and components. For brevity, we present only the formulation for vectors in a three–dimensional space. For two–dimensional vectors, the definitions are analogous.
Fig. 15.5 Basis and components
Fig. 15.6 Orthonormal basis
Consider Fig. 15.5 and Lemma 13, p. 590, on the uniqueness of decomposition. We have the following

Definition 231 (Basis, base vector, and component). We know that in three dimensions, given an ordered set of three non–coplanar vectors $\mathbf{g}_1$, $\mathbf{g}_2$, $\mathbf{g}_3$, using the parallelepiped rule we have that any vector $\mathbf{b}$ may be decomposed uniquely as
$$\mathbf{b} = b_1\,\mathbf{g}_1 + b_2\,\mathbf{g}_2 + b_3\,\mathbf{g}_3 = \sum_{k=1}^{3} b_k\,\mathbf{g}_k. \tag{15.8}$$
The set composed of the three non–coplanar vectors, say $\mathbf{g}_1$, $\mathbf{g}_2$, $\mathbf{g}_3$, is called a basis. The vectors $\mathbf{g}_k$ are called the base vectors. The coefficients $b_k$ are called the components of the vector $\mathbf{b}$ in the basis $\mathbf{g}_1$, $\mathbf{g}_2$, $\mathbf{g}_3$.

Note that, as in the proof of Lemma 13, p. 590, the process illustrated in Fig. 15.5 is the inverse of that in Fig. 15.3, p. 590. Specifically, in the latter we use the parallelepiped rule to compose the vectors $\mathbf{b}_1$, $\mathbf{b}_2$ and $\mathbf{b}_3$ to obtain the vector $\mathbf{b}$, whereas in the former we use the same rule to decompose the vector $\mathbf{b}$ along the directions defined by the vectors $\mathbf{g}_1$, $\mathbf{g}_2$ and $\mathbf{g}_3$ and obtain the components $b_k$ of the vector $\mathbf{b}$ in such a frame of reference.

In addition, we have the following definitions:

Definition 232 (Normalized vectors). A vector may be normalized by imposing the so–called normalization condition that its norm (Definition 223, p. 589) be equal to one. This is achieved by dividing a vector by its magnitude. [It should be emphasized that here “normalized” does not mean “made normal” (namely orthogonal, Definition 92, p. 180). Rather it refers to the condition on the norm, a term used to indicate the magnitude of a physicists’ vector (Definition 223, p. 589).]

Definition 233 (Orthogonal and orthonormal vectors). Vectors that are perpendicular to each other are called orthogonal. Mutually orthogonal unit vectors are called orthonormal. [The term “orthonormal” is a contraction of the terms “orthogonal” and “normalized,” indicating that the orthogonal vectors have been normalized by imposing the normalization condition that their norm is equal to one.]

Definition 234 (Orthonormal basis). A basis is called orthonormal iff its base vectors are orthonormal. [In this volume, we deal primarily with orthonormal bases. Non–orthonormal bases are addressed in Vol. III.]
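To make the decomposition in Eq. 15.8 concrete, the components $b_k$ in a basis that is not orthonormal can be computed by solving three linear equations. The sketch below is only an illustration under assumptions not made in the text: each vector is represented by a triple of numbers (its components in some underlying orthonormal basis), the system is solved by Cramer's rule, and the names `det3` and `components`, as well as the tolerance `tol`, are hypothetical.

```python
def det3(u, v, w):
    """Determinant of the 3x3 array whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

def components(b, g1, g2, g3, tol=1e-12):
    """Components (b1, b2, b3) such that b = b1 g1 + b2 g2 + b3 g3 (Eq. 15.8)."""
    d = det3(g1, g2, g3)
    if abs(d) < tol:
        # A vanishing determinant signals coplanar vectors, which do not form a basis.
        raise ValueError("g1, g2, g3 are (nearly) coplanar: not a basis")
    # Cramer's rule: replace one column at a time by b.
    return (det3(b, g2, g3) / d, det3(g1, b, g3) / d, det3(g1, g2, b) / d)

g1, g2, g3 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)   # non-orthogonal basis
b = (2.0, 3.0, 4.0)
b1, b2, b3 = components(b, g1, g2, g3)
rebuilt = tuple(b1 * p + b2 * q + b3 * r for p, q, r in zip(g1, g2, g3))
print((b1, b2, b3), rebuilt)    # rebuilt should reproduce b
```

The uniqueness stated in Lemma 13 corresponds here to the determinant being nonzero, in which case the linear system has exactly one solution.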
Definition 235 (Right–handed and left–handed ordered vectors). A set of three distinct ordered non–coplanar vectors is called right–handed iff, pointing the thumb of the right hand like the first vector and the index finger like the second vector, the middle finger is directed like the third vector, as in Figs. 15.5 and 15.6. Otherwise, it is called left–handed.

Definition 236 (Right–handed and left–handed bases). An orthonormal basis is called right–handed (left–handed) iff the (ordered) base vectors form a right–handed (left–handed) set.

◦ Warning. In the rest of this book, all the bases are assumed to be right–handed orthonormal bases, unless otherwise specified.

Orthonormal bases are typically denoted by the standard notation
$$\mathbf{i},\ \mathbf{j},\ \mathbf{k}. \tag{15.9}$$
However, if we want to use summation symbols, it is more convenient to use instead (see Fig. 15.6)
$$\mathbf{i}_1,\ \mathbf{i}_2,\ \mathbf{i}_3. \tag{15.10}$$
Correspondingly, Eq. 15.8 becomes, respectively,
$$\mathbf{b} = b_x\,\mathbf{i} + b_y\,\mathbf{j} + b_z\,\mathbf{k} \tag{15.11}$$
and
$$\mathbf{b} = b_1\,\mathbf{i}_1 + b_2\,\mathbf{i}_2 + b_3\,\mathbf{i}_3 = \sum_{k=1}^{3} b_k\,\mathbf{i}_k. \tag{15.12}$$
Of particular interest is the location vector $\mathbf{x}$ for the point $P$, namely the vector that corresponds to the oriented segment $OP$. In terms of components, we use the notations
$$\mathbf{x} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}, \tag{15.13}$$
or
$$\mathbf{x} = x_1\,\mathbf{i}_1 + x_2\,\mathbf{i}_2 + x_3\,\mathbf{i}_3 = \sum_{k=1}^{3} x_k\,\mathbf{i}_k. \tag{15.14}$$

◦ Comment. We consider $\mathbf{x}$ as a free vector (Remark 123, p. 588), specifically as a vector that has the tip placed at $P$ whenever the other end is placed at the origin $O$.
Remark 125. We will refer to the notation used in Eqs. 15.9, 15.11 and 15.13 as the expanded notation, whereas those used in Eqs. 15.10, 15.12 and 15.14 are referred to as indicial notation. [The term index notation is also used.]

Next, consider the operation of multiplication of a vector by a scalar $\alpha$, in terms of components. We have (use Eq. 15.4)
$$\alpha\,\mathbf{b} = \alpha\,(b_x\,\mathbf{i} + b_y\,\mathbf{j} + b_z\,\mathbf{k}) = \alpha\,b_x\,\mathbf{i} + \alpha\,b_y\,\mathbf{j} + \alpha\,b_z\,\mathbf{k}, \tag{15.15}$$
or, in indicial notation,
$$\alpha\,\mathbf{b} = \alpha \sum_{h=1}^{3} b_h\,\mathbf{i}_h = \sum_{h=1}^{3} \alpha\,b_h\,\mathbf{i}_h. \tag{15.16}$$
In other words, if we multiply a vector by a scalar $\alpha$, each component is multiplied by $\alpha$. [This rule applies as well if the base vectors are not orthonormal, as you may verify.]

Similarly, for the operation of addition, we have (use Eqs. 15.2 and 15.3)
$$\mathbf{a} + \mathbf{b} = (a_1\,\mathbf{i}_1 + a_2\,\mathbf{i}_2 + a_3\,\mathbf{i}_3) + (b_1\,\mathbf{i}_1 + b_2\,\mathbf{i}_2 + b_3\,\mathbf{i}_3) = (a_1 + b_1)\,\mathbf{i}_1 + (a_2 + b_2)\,\mathbf{i}_2 + (a_3 + b_3)\,\mathbf{i}_3. \tag{15.17}$$
In other words, if we add two vectors, the corresponding components are added. [Again, this rule applies as well if the base vectors are not orthonormal, as you may verify.]

◦ An important consideration. It should be emphasized that in this book none of the definitions regarding physicists’ vectors is given in terms of components. This is true, in particular, not only for the definition of physicists’ vectors (Definition 222, p. 587), but also for those that follow, such as dot product (Eq. 15.18) and cross product (Eq. 15.58). This is done to emphasize the “invariance” of the definitions, namely their independence of the frame of reference.
• Notations again

We have already introduced different types of notation, in particular, matrix notation (Footnote 2, p. 90), vector notation (Remark 122, p. 588), expanded and indicial notation (Remark 125, p. 596). At this point, it seems like a good idea to summarize these definitions. Accordingly, we have the following

Definition 237 (Notation). We have:
15. Physicists’ vectors
1. Vector notation denotes the use of lower–case boldface letters to represent physicists’ vectors. [As we will see, in this book capital boldface letters are reserved for second–order tensors, mathematical entities that will be introduced in Section 23.9.]
2. Matrix notation denotes the use of sans serif letters to represent matrices (upper–case letters, Definition 43, p. 92) and mathematicians’ vectors (lower–case letters, Definition 47, p. 93).
3. Indicial notation denotes the use of subscripted notation to represent the elements of matrices, in particular mathematicians’ vectors (one index for the latter, and two for the former). Indicial notations are also used for the components of physicists’ vectors (and later for the components of tensors). [As pointed out in Remark 30, p. 94, the components of the physicists’ vector may be used as the elements of a mathematicians’ vector.]
4. Expanded notation, also used for the components of physicists’ vectors, refers to the use of the symbols $\mathbf{i}, \mathbf{j}, \mathbf{k}$ for the base vectors, the letters $x, y, z$ for the components of the location vector $\mathbf{x}$, the letters $u, v, w$ for the components of the velocity vector $\mathbf{v}$ (sometimes for the displacement vector $\mathbf{u}$), as well as $b_x, b_y, b_z$ for a generic vector $\mathbf{b}$.
15.2 Dot (or scalar) product

In this section, we introduce the dot product (also known as the scalar product) between two (real) vectors. Specifically, we have the following (recall the definition of the magnitude of a physicists’ vector, Eq. 15.1)

Fig. 15.7 Dot product

Definition 238 (Dot product). Consider two vectors $\mathbf{a}$ and $\mathbf{b}$ (Figure 15.7). The dot product $\mathbf{a} \cdot \mathbf{b}$ (to be read as: “Vector a dot vector b”) is a scalar given by
$$\mathbf{a} \cdot \mathbf{b} := \|\mathbf{a}\|\, \|\mathbf{b}\| \cos\theta \le \|\mathbf{a}\|\, \|\mathbf{b}\|, \tag{15.18}$$
where $\theta \in [0, \pi]$ is the angle between $\mathbf{a}$ and $\mathbf{b}$. Note that the result is a scalar, hence the name scalar product. [The verb “to dot” is used to mean “to take the dot product.” For instance, we use the expression “dotting a with b” to mean “taking the dot product between a and b.”]

Note that, given two vectors $\mathbf{a}$ and $\mathbf{b}$, setting $\mathbf{e}_a = \mathbf{a}/\|\mathbf{a}\|$ (unit vector with the same direction as $\mathbf{a}$), we have (recall the notation for projections introduced in Eq. 15.7)
$$b^{\rm P}_{\mathbf{e}_a} := \|\mathbf{b}\| \cos\theta = \mathbf{b} \cdot \mathbf{e}_a. \tag{15.19}$$
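• Properties of the dot product

Note that, from the above definition, we have
$$\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}. \tag{15.20}$$
Thus, all the properties that we will show for $\mathbf{b}$ apply to $\mathbf{a}$ as well. If $\mathbf{b}$ is parallel to $\mathbf{a}$, we have $\cos\theta = \pm 1$, and hence
$$\mathbf{a} \cdot \mathbf{b} = \pm \|\mathbf{a}\|\, \|\mathbf{b}\|, \tag{15.21}$$
with the plus (minus) sign if $\mathbf{a}$ and $\mathbf{b}$ have the same (opposite) pointing. On the other hand, if $\mathbf{a}$ and $\mathbf{b}$ are orthogonal, then $\cos\theta = 0$ and
$$\mathbf{a} \cdot \mathbf{b} = 0 \qquad \text{(orthogonality condition)}. \tag{15.22}$$

◦ Comment. Recall that $\mathbf{i}_1$, $\mathbf{i}_2$, and $\mathbf{i}_3$ denote orthonormal vectors (Eq. 15.10). We can express this by using the dot product to yield
$$\mathbf{i}_h \cdot \mathbf{i}_k = \delta_{hk}, \tag{15.23}$$
where $\delta_{hk}$ denotes the Kronecker delta (Eq. 3.29).

In addition, we have that
$$\|\mathbf{a}\| = \sqrt{\mathbf{a} \cdot \mathbf{a}}, \tag{15.24}$$
$$\mathbf{a} \cdot (\alpha\, \mathbf{b}) = \alpha\, (\mathbf{a} \cdot \mathbf{b}), \tag{15.25}$$
which are immediate consequences of the definition (Eq. 15.18), as you may verify. Moreover, we have the following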
Theorem 147. We have
$$\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}. \tag{15.26}$$

Fig. 15.8 Proof that $\mathbf{e} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{e} \cdot \mathbf{b} + \mathbf{e} \cdot \mathbf{c}$

◦ Proof: To prove this, let us begin by assuming that the vectors $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{e}$ are coplanar, where $\mathbf{e}$ is an arbitrary unit vector. Then, we have
$$\mathbf{e} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{e} \cdot \mathbf{b} + \mathbf{e} \cdot \mathbf{c}. \tag{15.27}$$
[Hint: The projection along $\mathbf{e}$ of the sum $\mathbf{b} + \mathbf{c}$ is equal to the sum of the projections along $\mathbf{e}$ of $\mathbf{b}$ and $\mathbf{c}$, as apparent from Fig. 15.8.] On the other hand, for any $\mathbf{a}$, with $\mathbf{a} \ne \mathbf{0}$ (otherwise Eq. 15.26 is trivial), set $\mathbf{e} = \mathbf{a}/\|\mathbf{a}\|$ in Eq. 15.27 and then multiply by $\|\mathbf{a}\|$, and obtain Eq. 15.26.

Next, consider the case in which the vectors $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{e}$ are not coplanar. In this case, we would have to modify Fig. 15.8 as follows. Let the plane of the figure contain $\mathbf{e}$ and $\mathbf{b}$. Then replace (in the figure) $\mathbf{c}$ with $\mathbf{c}' := \mathbf{c} - \mathbf{c}_{\rm N}$, where $\mathbf{c}_{\rm N} = (\mathbf{c} \cdot \mathbf{n})\, \mathbf{n}$ ($\mathbf{n}$ being the normal to the plane of the figure) denotes the portion of $\mathbf{c}$ normal to the plane of the figure, whereas $\mathbf{c}' \cdot \mathbf{n} = 0$, as you may verify, so that $\mathbf{c}'$ is on the plane of the figure. Next, note that the projection of $\mathbf{c}_{\rm N}$ into any vector on the plane of the figure vanishes, as you may verify. Accordingly, we have $\mathbf{c} \cdot \mathbf{e} = \mathbf{c}' \cdot \mathbf{e}$. Of course, this aspect affects the figure, but not the final result, namely the validity of Eq. 15.26.

◦ Comment. Note that Eqs. 15.25 and 15.26 indicate that the dot product is linear with respect to $\mathbf{b}$, namely that
$$\mathbf{a} \cdot (c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2) = c_1\, \mathbf{a} \cdot \mathbf{b}_1 + c_2\, \mathbf{a} \cdot \mathbf{b}_2. \tag{15.28}$$
Thus, recalling that all the properties for $\mathbf{b}$ apply to $\mathbf{a}$ as well (see the comment below Eq. 15.20), we have that the dot product is linear with respect to both terms, or, as mathematicians say, the dot product is bilinear.
15.2.1 Projections vs components in $\mathbb{R}^3$

We have defined the components of a vector in an orthonormal basis as the scalars $b_k$ such that $\mathbf{b} = b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2 + b_3 \mathbf{i}_3$ (Eq. 15.12). On the other hand, the projection of $\mathbf{b}$ in the direction of a unit vector $\mathbf{e}$ is given by $\mathbf{b} \cdot \mathbf{e}$ (Eq. 15.19). What is the relationship between components and projections? We have the following

Theorem 148 (Components vs projections). Components and projections in the direction of the base vectors coincide, namely
$$b_k = \mathbf{b} \cdot \mathbf{i}_k, \tag{15.29}$$
if, and only if, the base vectors are orthonormal.

◦ Proof: Indeed, in general, using Eq. 15.8 and introducing the unit vector $\mathbf{e}_k = \mathbf{g}_k/\|\mathbf{g}_k\|$, we have that the projection in direction $\mathbf{e}_k$ is given by
$$\mathbf{b} \cdot \mathbf{e}_k = \big( b_1 \mathbf{g}_1 + b_2 \mathbf{g}_2 + b_3 \mathbf{g}_3 \big) \cdot \frac{\mathbf{g}_k}{\|\mathbf{g}_k\|} = \sum_{j=1}^{3} b_j\, \mathbf{g}_j \cdot \frac{\mathbf{g}_k}{\|\mathbf{g}_k\|}. \tag{15.30}$$
If, and only if, the vectors are orthonormal, namely iff $\mathbf{g}_k = \mathbf{i}_k$, with $\mathbf{i}_k$ satisfying $\mathbf{i}_h \cdot \mathbf{i}_k = \delta_{hk}$ (Eq. 15.23), Eq. 15.30 reduces to the desired Eq. 15.29.

As a consequence, we have, for any vector $\mathbf{b}$,
$$\mathbf{b} = \sum_{k=1}^{n} (\mathbf{b} \cdot \mathbf{i}_k)\, \mathbf{i}_k, \tag{15.31}$$
provided of course that the vectors $\mathbf{i}_k$ form an orthonormal basis (Eq. 15.23).

◦ Comment. The relationship between components and projections in non–orthonormal bases is addressed in Vol. III.
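15.2.2 Parallel– and normal–vector decomposition

Consider the following

Definition 239 (Parallel– and normal–vector decomposition). In two and three dimensions, given a vector $\mathbf{b}$ and a direction defined by a unit vector $\mathbf{e}$, by using the parallelogram rule we can always decompose $\mathbf{b}$ as
$$\mathbf{b} = \mathbf{b}_{\rm P} + \mathbf{b}_{\rm N}, \tag{15.32}$$
where $\mathbf{b}_{\rm P}$ and $\mathbf{b}_{\rm N}$ are parallel and normal to $\mathbf{e}$, respectively (see Fig. 15.9).

Fig. 15.9 Parallel and normal vectors

We have the following

Theorem 149 (Parallel– and normal–vector decomposition). For the terms $\mathbf{b}_{\rm P}$ and $\mathbf{b}_{\rm N}$ in Eq. 15.32, we have
$$\mathbf{b}_{\rm P} = (\mathbf{b} \cdot \mathbf{e})\, \mathbf{e} \qquad \text{and} \qquad \mathbf{b}_{\rm N} = \mathbf{b} - \mathbf{b}_{\rm P}. \tag{15.33}$$

◦ Proof: Indeed, the second in Eq. 15.33 is fully equivalent to Eq. 15.32. In addition, $\mathbf{b}_{\rm P}$ is clearly parallel to $\mathbf{e}$. On the other hand, $\mathbf{b}_{\rm N}$ is normal to $\mathbf{e}$. For,
$$\mathbf{b}_{\rm N} \cdot \mathbf{e} = (\mathbf{b} - \mathbf{b}_{\rm P}) \cdot \mathbf{e} = \mathbf{b} \cdot \mathbf{e} - (\mathbf{b} \cdot \mathbf{e})\, (\mathbf{e} \cdot \mathbf{e}) = 0, \tag{15.34}$$
since $\mathbf{e} \cdot \mathbf{e} = \|\mathbf{e}\|^2 = 1$.

◦ Comment. It should be emphasized that for any vector $\mathbf{c}$, the term $\mathbf{b}_{\rm N}$ (normal to the direction defined by $\mathbf{c}$) does not contribute to the dot product $\mathbf{b} \cdot \mathbf{c}$, in the sense that
$$\mathbf{b} \cdot \mathbf{c} = \mathbf{b}_{\rm P} \cdot \mathbf{c}, \tag{15.35}$$
where $\mathbf{b}_{\rm P}$ is parallel to $\mathbf{c}$. [Hint: Use, in this order, Eqs. 15.32, 15.26 and 15.34.]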
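The decomposition of Eq. 15.33 can be checked numerically. The following is a minimal sketch, assuming Cartesian components in an orthonormal basis; the helper names `dot` and `decompose` and the sample vectors are illustrative choices, not notation from the text.

```python
def dot(a, b):
    """Dot product of two vectors given by their orthonormal components."""
    return sum(ak * bk for ak, bk in zip(a, b))

def decompose(b, e):
    """Split b into parts parallel and normal to the unit vector e (Eq. 15.33)."""
    proj = dot(b, e)                                # projection of b along e
    b_par = tuple(proj * ek for ek in e)            # b_P = (b . e) e
    b_nor = tuple(bk - pk for bk, pk in zip(b, b_par))  # b_N = b - b_P
    return b_par, b_nor

# Illustrative data (assumption): b generic, e the unit vector along i3.
b = (3.0, 4.0, 5.0)
e = (0.0, 0.0, 1.0)
b_par, b_nor = decompose(b, e)
print(b_par)          # (0.0, 0.0, 5.0)
print(b_nor)          # (3.0, 4.0, 0.0)
print(dot(b_nor, e))  # 0.0, as required by Eq. 15.34
```

Note that `decompose` assumes that `e` already has unit magnitude; for a generic direction one would first divide by its magnitude.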
◦ Comment. Note the relationship between $b^{\rm P}_{\mathbf{e}}$ and $\mathbf{b}_{\rm P}$. We have (compare Eq. 15.7, where the superscript P stands for Projection, and the first in Eq. 15.33, where the subscript P stands for parallel)
$$\mathbf{b}_{\rm P} = b^{\rm P}_{\mathbf{e}}\, \mathbf{e}. \tag{15.36}$$
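15.2.3 Applications

Note that
$$\|\mathbf{a} \pm \mathbf{b}\|^2 = (\mathbf{a} \pm \mathbf{b}) \cdot (\mathbf{a} \pm \mathbf{b}) = \|\mathbf{a}\|^2 \pm 2\, \mathbf{a} \cdot \mathbf{b} + \|\mathbf{b}\|^2 = \|\mathbf{a}\|^2 \pm 2\, \|\mathbf{a}\|\, \|\mathbf{b}\| \cos\theta + \|\mathbf{b}\|^2. \tag{15.37}$$

◦ Triangle inequality. For $\theta \ne 0$ (namely if the two vectors are not co–directed), Eq. 15.37 implies (use $\cos\theta < 1$ for $\theta \ne 0$)
$$\|\mathbf{a} + \mathbf{b}\|^2 < \|\mathbf{a}\|^2 + 2\, \|\mathbf{a}\|\, \|\mathbf{b}\| + \|\mathbf{b}\|^2 = \big( \|\mathbf{a}\| + \|\mathbf{b}\| \big)^2, \tag{15.38}$$
namely
$$\|\mathbf{a} + \mathbf{b}\| < \|\mathbf{a}\| + \|\mathbf{b}\|, \tag{15.39}$$
a generalization to physicists’ vectors of the triangle inequality (Theorem 56, p. 198), which states that one side of a triangle is always smaller than the sum of the other two. [Iff $\theta = 0$ we have $\|\mathbf{a} + \mathbf{b}\| = \|\mathbf{a}\| + \|\mathbf{b}\|$.]

Fig. 15.10 Diagonals of a parallelogram

◦ Generalization of the Pythagorean theorem. Using $a := \|\mathbf{a}\|$ and $b := \|\mathbf{b}\|$ to denote the sides of the parallelogram, and $c_\pm := \|\mathbf{a} \pm \mathbf{b}\|$ to denote the two diagonals (Fig. 15.10), we have
$$c_\pm^2 = a^2 + b^2 \pm 2\, a\, b \cos\theta. \tag{15.40}$$
Hence, for the sides of the triangle formed by $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c} := \mathbf{a} - \mathbf{b}$, we have (with $c := \|\mathbf{c}\|$)
$$c^2 = a^2 + b^2 - 2\, a\, b \cos\theta, \tag{15.41}$$
in agreement with the generalization of the Pythagorean theorem (Eq. 7.72). [You might appreciate how much simpler is the proof here, compared to that given in Section 7.5. Again, often the more sophisticated is the math, the simpler it is to obtain the results.]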
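15.2.4 Dot product in terms of components

Using $\mathbf{i}_h \cdot \mathbf{i}_k = \delta_{hk}$ (Eq. 15.23), and Eqs. 15.25 and 15.26, we have
$$\mathbf{a} \cdot \mathbf{b} = \big( a_1 \mathbf{i}_1 + a_2 \mathbf{i}_2 + a_3 \mathbf{i}_3 \big) \cdot \big( b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2 + b_3 \mathbf{i}_3 \big) = a_1 b_1 + a_2 b_2 + a_3 b_3. \tag{15.42}$$
This is the desired expression of the dot product in terms of the components of the vectors $\mathbf{a}$ and $\mathbf{b}$. We have
$$\mathbf{a} \cdot \mathbf{b} = \sum_{k=1}^{3} a_k b_k = \mathsf{a}^{\mathsf{T}} \mathsf{b}, \tag{15.43}$$
where $\mathsf{a} = \lfloor a_k \rfloor^{\mathsf{T}}$ and $\mathsf{b} = \lfloor b_k \rfloor^{\mathsf{T}}$ are the mathematicians’ vectors whose elements equal the components of the physicists’ vectors $\mathbf{a}$ and $\mathbf{b}$. [Note that the three terms in Eq. 15.43 express the same quantity by using three different types of notation (Definition 237, p. 596).]

Accordingly, Eq. 15.43 yields (use $\|\mathbf{a}\| = \sqrt{\mathbf{a} \cdot \mathbf{a}}$, Eq. 15.24)
$$\|\mathbf{a}\|^2 = \mathbf{a} \cdot \mathbf{a} = a_1^2 + a_2^2 + a_3^2 = \mathsf{a}^{\mathsf{T}} \mathsf{a}. \tag{15.44}$$
Also, the orthogonality condition (Eq. 15.22) may be written as
$$\mathbf{a} \cdot \mathbf{b} = \sum_{k=1}^{3} a_k b_k = \mathsf{a}^{\mathsf{T}} \mathsf{b} = 0. \tag{15.45}$$
For future reference, the definition of dot product (Eq. 15.18) may be inverted to yield the angle between two vectors as
$$\theta = \cos_{\rm P}^{-1} \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\, \|\mathbf{b}\|} \in [0, \pi], \tag{15.46}$$
where $y = \cos_{\rm P}^{-1} x$ denotes the principal branch of $y = \cos^{-1} x$ (Eq. 6.135).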
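As an illustration of Eqs. 15.43, 15.44 and 15.46, here is a minimal sketch in plain Python. The function names (`dot`, `norm`, `angle`) and the sample vectors are illustrative assumptions; `math.acos` returns values in $[0, \pi]$, which matches the principal branch used in Eq. 15.46.

```python
import math

def dot(a, b):
    """a . b = sum_k a_k b_k (Eq. 15.43), orthonormal components assumed."""
    return sum(ak * bk for ak, bk in zip(a, b))

def norm(a):
    """Magnitude ||a|| = sqrt(a . a) (Eqs. 15.24 and 15.44)."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle in [0, pi] between two nonzero vectors (Eq. 15.46)."""
    c = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    return math.acos(max(-1.0, min(1.0, c)))

a = (1.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)
theta = angle(a, b)   # approximately pi/4 for these two vectors
```

The clamping step is a numerical precaution only; it is not part of Eq. 15.46.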
• Orthogonal vectors in two dimensions

We have the following

Theorem 150. Consider two two–dimensional vectors $\mathbf{a} = a_1 \mathbf{i}_1 + a_2 \mathbf{i}_2$ and $\mathbf{b} = b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2$. Their orthogonality condition, $\mathbf{a} \cdot \mathbf{b} = 0$ (Eq. 15.22), implies necessarily
$$b_1 = \lambda\, a_2 \qquad \text{and} \qquad b_2 = -\lambda\, a_1. \tag{15.47}$$
If they have the same magnitude, namely if $\|\mathbf{a}\| = \|\mathbf{b}\|$, then
$$b_1 = \pm a_2 \qquad \text{and} \qquad b_2 = \mp a_1. \tag{15.48}$$

◦ Proof: In two dimensions, the orthogonality condition (Eq. 15.45) reduces to $\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 = 0$, which is equivalent to Eq. 15.47, as you may verify. The fact that the vectors have the same magnitude implies that $\lambda = \pm 1$, in agreement with Eq. 15.48.
15.2.5 Lines and planes revisited

The introduction of the dot product allows us to provide a geometrical interpretation of the equations for straight lines and planes, which were introduced in Chapter 7.

• Two dimensions

Consider a straight line on a plane (Section 7.3). Using vector notation, the implicit representation of a straight line (Eq. 7.19) may be written as
$$a_x (x - x_0) + a_y (y - y_0) = \mathbf{a} \cdot (\mathbf{x} - \mathbf{x}_0) = 0. \tag{15.49}$$
The geometrical interpretation is that Eq. 15.49 includes all the points such that $\mathbf{x} - \mathbf{x}_0$ is perpendicular to $\mathbf{a}$. Hence, $\mathbf{a}$ is a (not necessarily unit) normal to the line. If, instead of a generic vector $\mathbf{a}$, we use a unit normal $\mathbf{n} = \mathbf{a}/\|\mathbf{a}\|$, Eq. 15.49 may be written as $\mathbf{x} \cdot \mathbf{n} = \mathbf{x}_0 \cdot \mathbf{n}$. [In other words, all the vectors $\mathbf{x}$ that represent points of a straight line have the same projection along $\mathbf{n}$.]

Remark 126. As you might have seen in high school, the distance of a point $O$ from a straight line $L$ is defined as the distance $\overline{OQ}$, where $Q$ is the point of $L$ that is closest to $O$. The line normal to $L$ through $O$ intersects $L$ at $Q$, as you may verify (use the Pythagorean theorem, Eq. 5.6). Accordingly, $|\mathbf{n} \cdot \mathbf{x}_0|$ equals the distance $\overline{OQ}$ of the line from the origin. In other words, we can say that Eq. 15.49 identifies all the points $\mathbf{x}$ of a straight line $L$ (perpendicular to $\mathbf{n}$ with distance $|\mathbf{n} \cdot \mathbf{x}_0| = \overline{OQ}$ from the origin). [Of course, there exist two lines with this property, since $\mathbf{x}_0$ and $-\mathbf{x}_0$ yield the same value for $|\mathbf{n} \cdot \mathbf{x}_0|$.]

Next, consider the parametric representation (Eq. 7.21). This may be written as
$$\mathbf{x} - \mathbf{x}_0 = \lambda\, \mathbf{b}, \tag{15.50}$$
with $\lambda \in (-\infty, \infty)$. The above equation identifies all the points $\mathbf{x}$ of a straight line such that $\mathbf{x} - \mathbf{x}_0$ is proportional to a given vector $\mathbf{b}$, which is its tangent (not necessarily unit).

◦ Comment. The vectors $\mathbf{a}$ and $\mathbf{b}$ are related by the condition $a_x b_x + a_y b_y = 0$ (Eq. 7.25), which may be written as
$$\mathbf{a} \cdot \mathbf{b} = 0. \tag{15.51}$$
In other words, $\mathbf{a}$ and $\mathbf{b}$ are mutually orthogonal. Indeed, $\mathbf{a}$ and $\mathbf{b}$ are, respectively, normal and tangent to the line under consideration.
• Three dimensions

Here, we consider straight lines and planes in three dimensions, which were introduced in Section 7.4. The implicit representation of a plane in three dimensions (Eq. 7.32) may be written as
$$a_x (x - x_0) + a_y (y - y_0) + a_z (z - z_0) = \mathbf{a} \cdot (\mathbf{x} - \mathbf{x}_0) = 0. \tag{15.52}$$
The geometrical interpretation is that Eq. 15.52 includes all the points such that $\mathbf{x} - \mathbf{x}_0$ is perpendicular to $\mathbf{a}$. In other words, $\mathbf{a}$ is the normal to the plane in Eq. 15.52. [Again, $|\mathbf{x}_0 \cdot \mathbf{n}|$, with $\mathbf{n} := \mathbf{a}/\|\mathbf{a}\|$, is the distance of the origin from the plane.]

Similarly, the parametric representation of a straight line in three dimensions (Eq. 7.36) may be written as
$$\mathbf{x} - \mathbf{x}_0 = \lambda\, \mathbf{b}, \qquad \text{with } \lambda \in (-\infty, \infty). \tag{15.53}$$
The geometrical interpretation is that Eq. 15.53 includes all the points $\mathbf{x}$ such that $\mathbf{x} - \mathbf{x}_0$ is proportional to a given vector $\mathbf{b}$. Therefore, $\mathbf{b}$ is the tangent to the line in Eq. 15.53. [If $\|\mathbf{b}\| = 1$, then $\lambda$ is an arclength, since $|\lambda| = \|\mathbf{x} - \mathbf{x}_0\|$.]

Next, consider the parametric representation of a plane in three dimensions (Eq. 7.34), which may be written as
$$\mathbf{x} - \mathbf{x}_0 = \lambda_1 \mathbf{b}_1 + \lambda_2 \mathbf{b}_2, \qquad \text{with } \lambda_1, \lambda_2 \in (-\infty, \infty). \tag{15.54}$$
The geometrical interpretation is that the plane includes all the points $\mathbf{x}$ such that $\mathbf{x} - \mathbf{x}_0$ is parallel to the plane defined by the vectors $\mathbf{b}_1$ and $\mathbf{b}_2$.
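The two representations of a plane (Eqs. 15.52 and 15.54) and the distance of the origin from it can be illustrated with a short numerical sketch. The function names and the sample data below are hypothetical; the vectors `b1` and `b2` are chosen (as an assumption) to be orthogonal to the normal `a`, so that both equations describe the same plane.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance_from_origin(a, x0):
    """|n . x0| with n = a/||a||, for the plane a . (x - x0) = 0 (Eq. 15.52)."""
    return abs(dot(a, x0)) / math.sqrt(dot(a, a))

def plane_point(x0, b1, b2, lam1, lam2):
    """Point x = x0 + lam1 b1 + lam2 b2 of the plane (Eq. 15.54)."""
    return tuple(p + lam1 * u + lam2 * v for p, u, v in zip(x0, b1, b2))

a = (0.0, 0.0, 2.0)                          # normal to the plane (not unit)
x0 = (1.0, 2.0, 3.0)                         # a point of the plane
b1, b2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)    # both orthogonal to a

x = plane_point(x0, b1, b2, 2.0, -1.0)
# Residual of the implicit equation a . (x - x0) at the parametric point x.
residual = dot(a, tuple(xi - qi for xi, qi in zip(x, x0)))
d = distance_from_origin(a, x0)

print(x)         # (3.0, 1.0, 3.0)
print(residual)  # 0.0
print(d)         # 3.0
```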
15.2.6 Tangent and normal to a curve on a plane

Here, we present a general expression for the tangent and normal vectors to a curve $L$ described by $y = f(x)$. According to Eq. 13.1, the straight line
$$y - y_0 = f'_0\, (x - x_0), \tag{15.55}$$
with $y_0 = f(x_0)$ and $f'_0 = f'(x_0)$, represents the tangent line to $L$ at $x = x_0$. This may be written as $-f'_0\, (x - x_0) + (y - y_0) = 0$, which coincides with Eq. 15.49, with $a_x = -f'_0$ and $a_y = 1$. Accordingly,
$$\breve{\mathbf{n}} = -f'_0\, \mathbf{i} + \mathbf{j} \tag{15.56}$$
is the (not necessarily unit) normal to $L$ at $x = x_0$. Correspondingly,
$$\breve{\mathbf{t}} = \mathbf{i} + f'_0\, \mathbf{j} \tag{15.57}$$
is the (not necessarily unit) tangent to $L$ at $x = x_0$, since $\breve{\mathbf{n}} \cdot \breve{\mathbf{t}} = 0$.
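As a quick check of Eqs. 15.56 and 15.57, consider the following sketch. The curve $y = x^2$ at $x_0 = 1$ (so that the derivative equals 2) is an illustrative assumption, and `normal_and_tangent` is a hypothetical helper name.

```python
def normal_and_tangent(fprime0):
    """Return (n, t) of Eqs. 15.56 and 15.57 as (i, j) component pairs."""
    n = (-fprime0, 1.0)   # normal:  -f'(x0) i + j
    t = (1.0, fprime0)    # tangent:  i + f'(x0) j
    return n, t

# Illustrative curve (assumption): y = x**2 at x0 = 1, so f'(x0) = 2.
n, t = normal_and_tangent(2.0)
n_dot_t = n[0] * t[0] + n[1] * t[1]
print(n, t, n_dot_t)  # (-2.0, 1.0) (1.0, 2.0) 0.0
```

Neither vector has unit magnitude here; dividing each by its magnitude would give the unit normal and the unit tangent.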
15.3 Cross (or vector) product

In this section, we introduce the cross product (also known as the vector product) between two vectors. Specifically, we have the following

Definition 240 (Cross product). Given any two vectors $\mathbf{a}$ and $\mathbf{b}$, the cross product $\mathbf{a} \times \mathbf{b}$ (to be read as: “Vector a cross vector b”) is a vector given by
$$\mathbf{a} \times \mathbf{b} := A\, \mathbf{n}, \tag{15.58}$$
where $A$ denotes the area of the parallelogram defined by $\mathbf{a}$ and $\mathbf{b}$ (gray area in Fig. 15.11), whereas $\mathbf{n}$ is the unit normal to the plane defined by $\mathbf{a}$ and $\mathbf{b}$, and oriented in such a way that the ordered set $\mathbf{a}, \mathbf{b}, \mathbf{n}$ is right–handed (see Figs. 15.11 and 15.12).

Fig. 15.11 Cross product (top view)

Fig. 15.12 Cross product (3D view)

Note that the result is a vector, hence the name vector product. [The verb “to cross” is used to mean “to take the cross product.” For instance, we use the expression “crossing a with b” to mean “taking the cross product between a and b.”]

Note that the magnitude of $\mathbf{a} \times \mathbf{b}$ is given by
$$\|\mathbf{a} \times \mathbf{b}\| = A = \|\mathbf{a}\|\, \|\mathbf{b}\| \sin\theta, \tag{15.59}$$
where $\theta \in [0, \pi]$ is the angle between the two vectors.

• Properties of the cross product

Note that, by definition, we have that
$$\mathbf{a} \times \mathbf{b} = \mathbf{0}, \tag{15.60}$$
iff $\mathbf{a}$ and $\mathbf{b}$ are parallel, since in this case $\sin\theta = 0$. In particular, we have
$$\mathbf{a} \times \mathbf{a} = \mathbf{0}, \qquad \text{for any } \mathbf{a}. \tag{15.61}$$

Remark 127. The trivial cases in which $\mathbf{a} = \mathbf{0}$ and/or $\mathbf{b} = \mathbf{0}$ imply that $\mathbf{a} \times \mathbf{b} = \mathbf{0}$, as you may verify. [Hint: Use Eqs. 15.58 and 15.59.] To avoid lengthy sentences, we say that the vector $\mathbf{0}$ is parallel to any vector.

Also from the definition, we have
$$\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}. \tag{15.62}$$
Thus, all the properties that we prove for $\mathbf{b}$ apply to $\mathbf{a}$ as well.
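Note also that, if we decompose $\mathbf{b}$ as $\mathbf{b} = \mathbf{b}_{\rm N} + \mathbf{b}_{\rm P}$, with $\mathbf{b}_{\rm N}$ normal to $\mathbf{a}$ and $\mathbf{b}_{\rm P}$ parallel to $\mathbf{a}$ (Eq. 15.32), the definition of cross product implies
$$\mathbf{a} \times \mathbf{b} = \mathbf{a} \times \mathbf{b}_{\rm N}, \tag{15.63}$$
since neither the normal $\mathbf{n}$, nor the area between the two vectors is modified when we substitute $\mathbf{b}$ with $\mathbf{b}_{\rm N}$ (see Figs. 15.13 and 15.14). In other words, the portion of $\mathbf{b}$ that is parallel to $\mathbf{a}$ does not affect $\mathbf{a} \times \mathbf{b}$.

Fig. 15.13 $\mathbf{a} \times \mathbf{b} = \mathbf{a} \times \mathbf{b}_{\rm N}$ (side view)

Fig. 15.14 $\mathbf{a} \times \mathbf{b} = \mathbf{a} \times \mathbf{b}_{\rm N}$ (top view)

Next, note that for a generic unit vector $\mathbf{e}$, we have that the vector $\mathbf{e} \times \mathbf{b} = \mathbf{e} \times \mathbf{b}_{\rm N}$ equals the vector $\mathbf{b}_{\rm N}$ rotated by 90° counterclockwise with respect to $\mathbf{e}$. [“Clockwise with respect to a vector” is universally understood as “in the clockwise direction as seen by a person, with toes–to–head direction equal to that of the vector”. The same applies to counterclockwise.] In other words, we have
$$\mathbf{e} \times \mathbf{b} = \big[ \mathbf{b}_{\rm N} \big]_{\rm rotated}, \tag{15.64}$$
where “rotated” denotes a 90° counterclockwise rotation. Therefore,
$$\mathbf{e} \times (\mathbf{e} \times \mathbf{b}) = -\mathbf{b}_{\rm N}, \tag{15.65}$$
since two 90° rotations yield the same vector in the opposite direction. As an immediate consequence, we have
$$\mathbf{a} \times (\mathbf{a} \times \mathbf{b}) = -a^2\, \mathbf{b}_{\rm N}, \tag{15.66}$$
where $a = \|\mathbf{a}\|$. Similarly, using $\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}$ (Eq. 15.62), we have
$$\mathbf{a} \times (\mathbf{b} \times \mathbf{a}) = a^2\, \mathbf{b}_{\rm N}. \tag{15.67}$$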
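We have the following

Theorem 151. The cross product has the following properties
$$\mathbf{a} \times (\alpha\, \mathbf{b}) = \alpha\, (\mathbf{a} \times \mathbf{b}), \tag{15.68}$$
$$\mathbf{a} \times (\mathbf{b}_1 + \mathbf{b}_2) = \mathbf{a} \times \mathbf{b}_1 + \mathbf{a} \times \mathbf{b}_2. \tag{15.69}$$

Fig. 15.15 $\mathbf{e} \times (\mathbf{b}_1 + \mathbf{b}_2)$

◦ Proof: Equation 15.68 is an immediate consequence of the definition. To prove Eq. 15.69, let us begin with an arbitrary unit vector $\mathbf{e}$. Then, using Eqs. 15.63 and 15.64, we have (see Fig. 15.15)
$$\mathbf{e} \times (\mathbf{b}_1 + \mathbf{b}_2) = \big[ \mathbf{b}_{1\rm N} + \mathbf{b}_{2\rm N} \big]_{\rm rotated} = \big[ \mathbf{b}_{1\rm N} \big]_{\rm rotated} + \big[ \mathbf{b}_{2\rm N} \big]_{\rm rotated} = \mathbf{e} \times \mathbf{b}_{1\rm N} + \mathbf{e} \times \mathbf{b}_{2\rm N} = \mathbf{e} \times \mathbf{b}_1 + \mathbf{e} \times \mathbf{b}_2. \tag{15.70}$$
Next, consider the case in which $\mathbf{a}$ is not a unit vector. Assume $\mathbf{a} \ne \mathbf{0}$ (otherwise, Eq. 15.69 is trivial). Setting in the above equation $\mathbf{e} = \mathbf{a}/\|\mathbf{a}\|$ and multiplying by $\|\mathbf{a}\|$, one obtains Eq. 15.69.

◦ Comment. Note that Eqs. 15.68 and 15.69 indicate that the cross product is linear with respect to $\mathbf{b}$, namely that
$$\mathbf{a} \times (c_1 \mathbf{b}_1 + c_2 \mathbf{b}_2) = c_1\, \mathbf{a} \times \mathbf{b}_1 + c_2\, \mathbf{a} \times \mathbf{b}_2. \tag{15.71}$$
Thus, recalling that all the properties for $\mathbf{b}$ apply to $\mathbf{a}$ as well (see the comment below Eq. 15.62), we have that the cross product is linear with respect to both terms, or, akin to the dot product, the cross product is bilinear.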
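15.3.1 Cross product in terms of components

Recall that $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ form a right–handed orthonormal basis. Hence, from the definition of cross product, we have
$$\mathbf{i} \times \mathbf{j} = -\mathbf{j} \times \mathbf{i} = \mathbf{k}, \qquad \mathbf{j} \times \mathbf{k} = -\mathbf{k} \times \mathbf{j} = \mathbf{i}, \qquad \mathbf{k} \times \mathbf{i} = -\mathbf{i} \times \mathbf{k} = \mathbf{j}. \tag{15.72}$$
Similarly, using indicial notation, we have that $\mathbf{i}_1$, $\mathbf{i}_2$ and $\mathbf{i}_3$ form a right–handed orthonormal basis. Hence, from the definition of cross product, we have
$$\mathbf{i}_1 \times \mathbf{i}_2 = -\mathbf{i}_2 \times \mathbf{i}_1 = \mathbf{i}_3, \qquad \mathbf{i}_2 \times \mathbf{i}_3 = -\mathbf{i}_3 \times \mathbf{i}_2 = \mathbf{i}_1, \qquad \mathbf{i}_3 \times \mathbf{i}_1 = -\mathbf{i}_1 \times \mathbf{i}_3 = \mathbf{i}_2. \tag{15.73}$$
Next, using Eqs. 15.68 and 15.69, from
$$\mathbf{c} := \mathbf{a} \times \mathbf{b} = (a_1 \mathbf{i}_1 + a_2 \mathbf{i}_2 + a_3 \mathbf{i}_3) \times (b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2 + b_3 \mathbf{i}_3), \tag{15.74}$$
we obtain
$$\mathbf{c} = (a_2 b_3 - a_3 b_2)\, \mathbf{i}_1 + (a_3 b_1 - a_1 b_3)\, \mathbf{i}_2 + (a_1 b_2 - a_2 b_1)\, \mathbf{i}_3. \tag{15.75}$$

◦ Comment. The second and third expressions in Eq. 15.73 may be obtained by cyclic rotation of the indices. [As stated in Remark 26, p. 82, cyclic rotation means (1 → 2); (2 → 3); (3 → 1).] Similarly, the second and third components in Eq. 15.75 may be obtained from the first, also by cyclic rotation of the indices.

For future reference, note that Eq. 15.73 may be written in a compact form as
$$\mathbf{i}_j \times \mathbf{i}_k = \sum_{h=1}^{3} \epsilon_{hjk}\, \mathbf{i}_h, \tag{15.76}$$
where the permutation symbol $\epsilon_{hjk}$ was defined in Eq. 3.169, namely
$$\epsilon_{hjk} = \begin{cases} \phantom{-}1, & \text{if } (h, j, k) = (1, 2, 3), \text{ or } (2, 3, 1), \text{ or } (3, 1, 2); \\ -1, & \text{if } (h, j, k) = (3, 2, 1), \text{ or } (2, 1, 3), \text{ or } (1, 3, 2); \\ \phantom{-}0, & \text{otherwise.} \end{cases} \tag{15.77}$$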
• Cross product in determinant notation

Equation 15.75 may be expressed in determinant notation, as
$$\mathbf{c} = \mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i}_1 & \mathbf{i}_2 & \mathbf{i}_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}, \tag{15.78}$$
where we (nonchalantly) use the rule for the determinant of a 3 × 3 matrix (Eq. 3.165, or the Sarrus rule, Eq. 3.166), even though some elements are physicists’ vectors and some are scalars.
• Cross product in indicial notation ♣

Next, we want to express the cross product in terms of indicial notation (see Definition 237, p. 596). Note that Eq. 15.75 implies
$$c_1 = a_2 b_3 - a_3 b_2, \qquad c_2 = a_3 b_1 - a_1 b_3, \qquad c_3 = a_1 b_2 - a_2 b_1. \tag{15.79}$$
Using the definition of the permutation symbol $\epsilon_{hjk}$ (Eq. 15.77), the above equation may be written as
$$c_h = \sum_{j,k=1}^{3} \epsilon_{hjk}\, a_j b_k. \tag{15.80}$$

◦ Comment. The same result is obtained by expressing $\mathbf{a}$ and $\mathbf{b}$ in terms of their components (as in Eq. 15.12) and then using Eq. 15.76, and equating components, as you may verify.
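The equivalence between the explicit components (Eq. 15.79) and the indicial form (Eq. 15.80) can be verified numerically. The sketch below is illustrative: the closed-form expression used in `eps` is one convenient way (an assumption of this example, not taken from the text) to reproduce the values listed in Eq. 15.77 for indices in {1, 2, 3}.

```python
def eps(h, j, k):
    """Permutation symbol of Eq. 15.77 for indices h, j, k in {1, 2, 3}."""
    # The product is +2 for even permutations, -2 for odd ones, 0 otherwise.
    return (h - j) * (j - k) * (k - h) // 2

def cross_components(a, b):
    """Cross product from the explicit components (Eq. 15.79)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def cross_indicial(a, b):
    """Cross product from c_h = sum_{j,k} eps_hjk a_j b_k (Eq. 15.80)."""
    idx = (1, 2, 3)
    return tuple(
        sum(eps(h, j, k) * a[j - 1] * b[k - 1] for j in idx for k in idx)
        for h in idx
    )

a, b = (1, 2, 3), (4, 5, 6)
print(cross_components(a, b))  # (-3, 6, -3)
print(cross_indicial(a, b))    # (-3, 6, -3)
```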
• Cross product in matrix notation ♣

Finally, for future reference, we express the cross product in terms of matrix notation (see Definition 237, p. 596). Let us introduce the matrix $\mathsf{A} = [a_{hk}]$, with
$$a_{hk} = \sum_{j=1}^{3} \epsilon_{hjk}\, a_j. \tag{15.81}$$
Then, Eqs. 15.80 and 15.81 yield
$$c_h = \sum_{j,k=1}^{3} \epsilon_{hjk}\, a_j b_k = \sum_{k=1}^{3} a_{hk}\, b_k. \tag{15.82}$$
In matrix notation, we have
$$\mathsf{c} = \mathsf{A}\, \mathsf{b}, \tag{15.83}$$
with
$$\mathsf{A} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}. \tag{15.84}$$
Indeed, performing the matrix by vector multiplication one obtains
$$\mathsf{A}\, \mathsf{b} = \begin{Bmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{Bmatrix}, \tag{15.85}$$
in agreement with Eq. 15.79. In other words, in terms of components, $\mathbf{c} = \mathbf{a} \times \mathbf{b}$ may be written as $\mathsf{c} = \mathsf{A}\, \mathsf{b}$, with $\mathsf{A}$ given by Eq. 15.84, and vice versa.
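The matrix form of the cross product can be sketched as follows, using nested lists for the matrix of Eq. 15.84. The helper names `skew` and `matvec` are hypothetical, and the sample vectors are illustrative.

```python
def skew(a):
    """Matrix A of Eq. 15.84 built from the components of a."""
    a1, a2, a3 = a
    return [[0, -a3, a2],
            [a3, 0, -a1],
            [-a2, a1, 0]]

def matvec(A, b):
    """Matrix by vector multiplication, c_h = sum_k a_hk b_k (Eq. 15.82)."""
    return tuple(sum(A[h][k] * b[k] for k in range(3)) for h in range(3))

a, b = (1, 2, 3), (4, 5, 6)
c = matvec(skew(a), b)
print(c)  # (-3, 6, -3), the components of a x b from Eq. 15.79
```

Note that the matrix returned by `skew` equals minus its own transpose, which reflects the property a × b = −b × a (Eq. 15.62).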
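15.4 Composite products

There are other products of interest, which are combinations of dot and cross products. These are presented here.

15.4.1 Scalar triple product

We have the following

Definition 241 (Scalar triple product). Consider a set of three vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. The product $\mathbf{a} \times \mathbf{b} \cdot \mathbf{c}$ is called the scalar triple product. [We could have written $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$. However, this is not necessary (and is rarely used in practice), because no other interpretation is possible. Specifically, $\mathbf{a} \times (\mathbf{b} \cdot \mathbf{c})$ is meaningless.]

We have the following

Theorem 152. Consider a set of three vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. The scalar triple product equals $\pm V$, where $V$ is the volume of the parallelepiped defined by the three vectors, and the plus (minus) sign holds if the set $\mathbf{a}, \mathbf{b}, \mathbf{c}$ is right–handed (left–handed), namely
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = \pm V. \tag{15.86}$$

◦ Proof: Indeed, by definition of cross product (Eq. 15.58), we have
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = A\, \mathbf{n} \cdot \mathbf{c} = A\, h = \pm V, \tag{15.87}$$
where $h := \mathbf{c} \cdot \mathbf{n}$ is the projection of $\mathbf{c}$ along the normal to the plane that contains $\mathbf{a}$ and $\mathbf{b}$ (height of the parallelepiped). [Hint: Note that $V = |h|\, A$. Moreover, $h > 0$ ($h < 0$) iff $\mathbf{c}$ and $\mathbf{n}$ form an angle less (greater) than $\pi/2$, namely iff the set $\mathbf{a}, \mathbf{b}, \mathbf{c}$ is right–handed (left–handed), in agreement with Eq. 15.86.]

Consider the following

Corollary 6. We have
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = 0, \tag{15.88}$$
if, and only if, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are coplanar, in particular, if two of them are parallel.

◦ Proof: If, and only if, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are coplanar, the volume $V$ vanishes. [Consistently with Remark 127, p. 607, we call three vectors coplanar even when one (or more) of them vanishes.]

We have the following

Theorem 153 (Equivalent expressions). The following expressions are equivalent
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = \mathbf{b} \times \mathbf{c} \cdot \mathbf{a} = \mathbf{c} \times \mathbf{a} \cdot \mathbf{b} = \mathbf{c} \cdot \mathbf{a} \times \mathbf{b} = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \mathbf{b} \cdot \mathbf{c} \times \mathbf{a} = \pm V \tag{15.89}$$
and
$$\mathbf{b} \times \mathbf{a} \cdot \mathbf{c} = \mathbf{c} \times \mathbf{b} \cdot \mathbf{a} = \mathbf{a} \times \mathbf{c} \cdot \mathbf{b} = \mathbf{c} \cdot \mathbf{b} \times \mathbf{a} = \mathbf{a} \cdot \mathbf{c} \times \mathbf{b} = \mathbf{b} \cdot \mathbf{a} \times \mathbf{c} = \mp V. \tag{15.90}$$
The upper (lower) sign holds if $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are right–handed (left–handed).

◦ Proof: Indeed, note that the parallelepipeds defined by $(\mathbf{a}, \mathbf{b}, \mathbf{c})$, $(\mathbf{b}, \mathbf{c}, \mathbf{a})$ and $(\mathbf{c}, \mathbf{a}, \mathbf{b})$ have the same volume. In addition, if the set $\mathbf{a}, \mathbf{b}, \mathbf{c}$ is right–handed (or left–handed), so are the sets $\mathbf{b}, \mathbf{c}, \mathbf{a}$ and $\mathbf{c}, \mathbf{a}, \mathbf{b}$. Hence, Eq. 15.86 implies
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = \mathbf{b} \times \mathbf{c} \cdot \mathbf{a} = \mathbf{c} \times \mathbf{a} \cdot \mathbf{b} = \pm V, \tag{15.91}$$
with the plus (minus) sign if the set $\mathbf{a}, \mathbf{b}, \mathbf{c}$ is right–handed (left–handed). In addition, using $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ (Eq. 15.20), Eq. 15.91 yields
$$\mathbf{c} \cdot \mathbf{a} \times \mathbf{b} = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \mathbf{b} \cdot \mathbf{c} \times \mathbf{a} = \pm V. \tag{15.92}$$
This completes the proof of Eq. 15.89. On the other hand, using $\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}$ (Eq. 15.62), Eq. 15.89 yields Eq. 15.90.

◦ Comment. The above theorem shows that the scalar triple product does not change if the sequence of the vectors is rotated cyclically. Also, comparing Eqs. 15.91 and 15.92, the dot–product and cross–product signs may be interchanged, provided that the sequence of the vectors is maintained. For instance, we have
$$\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c}. \tag{15.93}$$
In addition, comparing Eqs. 15.89 and 15.90, we notice that the scalar triple product changes sign if we interchange the order of any two of the three vectors.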
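• Scalar triple product in terms of components

Note that, combining $\mathbf{a} \cdot \mathbf{d} = \sum_{k=1}^{3} a_k d_k = a_1 d_1 + a_2 d_2 + a_3 d_3$ (Eq. 15.43) with $\mathbf{d} := \mathbf{b} \times \mathbf{c} = (b_2 c_3 - b_3 c_2)\, \mathbf{i}_1 + (b_3 c_1 - b_1 c_3)\, \mathbf{i}_2 + (b_1 c_2 - b_2 c_1)\, \mathbf{i}_3$ (Eq. 15.75), one obtains
$$\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = a_1 (b_2 c_3 - b_3 c_2) + a_2 (b_3 c_1 - b_1 c_3) + a_3 (b_1 c_2 - b_2 c_1). \tag{15.94}$$
Accordingly, the expression for the triple scalar product may be written in terms of Cartesian components as
$$\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}. \tag{15.95}$$
[Hint: Use the definition of the 3 × 3 determinant (Eq. 3.165) and recall that $|\mathsf{A}^{\mathsf{T}}| = |\mathsf{A}|$ (Eq. 3.168).]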
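The properties of the scalar triple product (Eqs. 15.86, 15.88, 15.89 and 15.90) can be explored with a small sketch. The function names and the sample vectors (the edges of a 1 × 2 × 3 box aligned with the base vectors) are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product in terms of components (Eq. 15.75)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def triple(a, b, c):
    """Scalar triple product a x b . c."""
    return dot(cross(a, b), c)

# Edges of a 1 x 2 x 3 box aligned with i1, i2, i3 (right-handed set).
a, b, c = (1, 0, 0), (0, 2, 0), (0, 0, 3)

print(triple(a, b, c))          # 6: the volume V, with the plus sign
print(triple(b, c, a))          # 6: unchanged under cyclic rotation
print(triple(b, a, c))          # -6: sign changes when two vectors are swapped
print(triple(a, b, (1, 1, 0)))  # 0: the three vectors are coplanar
```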
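15.4.2 Vector triple product

Another product of interest is the vector triple product $\mathbf{a} \times (\mathbf{b} \times \mathbf{c})$, for which we have
$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = \mathbf{b}\, (\mathbf{a} \cdot \mathbf{c}) - \mathbf{c}\, (\mathbf{a} \cdot \mathbf{b}). \tag{15.96}$$
Indeed (use again Eq. 15.75)
$$\begin{aligned} \mathbf{b}\, (\mathbf{a} \cdot \mathbf{c}) - \mathbf{c}\, (\mathbf{a} \cdot \mathbf{b}) &= (a_1 c_1 + a_2 c_2 + a_3 c_3)\, (b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2 + b_3 \mathbf{i}_3) - (a_1 b_1 + a_2 b_2 + a_3 b_3)\, (c_1 \mathbf{i}_1 + c_2 \mathbf{i}_2 + c_3 \mathbf{i}_3) \\ &= \big[ a_2 (b_1 c_2 - b_2 c_1) - a_3 (b_3 c_1 - b_1 c_3) \big]\, \mathbf{i}_1 + \big[ a_3 (b_2 c_3 - b_3 c_2) - a_1 (b_1 c_2 - b_2 c_1) \big]\, \mathbf{i}_2 \\ &\quad + \big[ a_1 (b_3 c_1 - b_1 c_3) - a_2 (b_2 c_3 - b_3 c_2) \big]\, \mathbf{i}_3 = \mathbf{a} \times (\mathbf{b} \times \mathbf{c}). \end{aligned} \tag{15.97}$$
Equation 15.96 is known as the Lagrange formula (named after Joseph–Louis Lagrange). However, my students referred to Eq. 15.96 as the “BAC–minus–CAB rule,” and so will I. [This will avoid any possible confusion with Eq. 15.103, which is known as the Lagrange identity.]

◦ Comment. The vector triple product was already used in Eqs. 15.65 and 15.66, which are particular cases of Eq. 15.96. Indeed, if $\mathbf{a} = \mathbf{b}$, we have
$$\mathbf{a} \times (\mathbf{a} \times \mathbf{c}) = \mathbf{a} \times (\mathbf{a} \times \mathbf{c}_{\rm N}) = -a^2\, \mathbf{c}_{\rm N}, \tag{15.98}$$
in agreement with Eq. 15.66. [Hint: Use $\mathbf{a} \times \mathbf{c} = \mathbf{a} \times \mathbf{c}_{\rm N}$ (Eq. 15.63) for the first equality, and Eq. 15.96 as well as $a := \|\mathbf{a}\|$ for the second.] Moreover, if $\mathbf{a} = \mathbf{c} = \mathbf{e}$, where $\mathbf{e}$ is any unit vector, Eq. 15.96 yields $\mathbf{e} \times (\mathbf{b} \times \mathbf{e}) = \mathbf{b} - \mathbf{e}\, (\mathbf{b} \cdot \mathbf{e}) = \mathbf{b}_{\rm N}$ (Eq. 15.33).

◦ Warning. Note that, in general, $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) \ne (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$, as you may verify using Eq. 15.96.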
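The BAC–minus–CAB rule (Eq. 15.96) and the warning on the placement of the parentheses can be spot-checked with integer components, for which the arithmetic is exact. The sample vectors below are arbitrary illustrative choices, and a check on one triple is of course not a proof.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product in terms of components (Eq. 15.75)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def bac_minus_cab(a, b, c):
    """Right-hand side of Eq. 15.96: b (a . c) - c (a . b)."""
    ac, ab = dot(a, c), dot(a, b)
    return tuple(bk * ac - ck * ab for bk, ck in zip(b, c))

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)

lhs = cross(a, cross(b, c))      # a x (b x c)
rhs = bac_minus_cab(a, b, c)
other = cross(cross(a, b), c)    # (a x b) x c

print(lhs)    # (-12, 9, -2)
print(rhs)    # (-12, 9, -2)
print(other)  # (84, 9, -66): the cross product is not associative
```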
• An alternate proof of the BAC–minus–CAB rule

Another way to arrive at Eq. 15.96 may be obtained by using the following very useful identity
$$\sum_{k=1}^{3} \epsilon_{hjk}\, \epsilon_{klm} = \delta_{hl}\, \delta_{jm} - \delta_{hm}\, \delta_{jl}, \tag{15.99}$$
that you might like to verify (again $\delta_{hl}$ is the Kronecker delta). [Hint: Consider only the terms for which the result differs from zero. To have this, $h$ and $j$ are necessarily different, say $h = 1$ and $j = 2$. In this case, the only nonzero contribution occurs when $k = 3$. Therefore, we have necessarily either $l = 1$ and $m = 2$ (with result equal to 1, on both sides of the equation), or $l = 2$ and $m = 1$ (with result equal to −1, again on both sides of the equation).]

Accordingly, in indicial notation, $\mathbf{d} = \mathbf{a} \times (\mathbf{b} \times \mathbf{c})$ may be written as (use Eq. 15.80)
$$d_h = \sum_{j,k=1}^{3} \epsilon_{hjk}\, a_j \sum_{l,m=1}^{3} \epsilon_{klm}\, b_l c_m = \sum_{j,l,m=1}^{3} \big( \delta_{hl}\, \delta_{jm} - \delta_{hm}\, \delta_{jl} \big)\, a_j b_l c_m = \sum_{j=1}^{3} \big( b_h c_j - b_j c_h \big)\, a_j, \tag{15.100}$$
which is fully equivalent to Eq. 15.96.
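Since the indices in Eq. 15.99 take only the values 1, 2, 3, the identity can also be verified by brute force over all 81 combinations of the free indices. The following sketch does so; the closed-form expression in `eps` is an assumption of this example that reproduces the values of Eq. 15.77.

```python
from itertools import product

def eps(h, j, k):
    """Permutation symbol of Eq. 15.77 for indices in {1, 2, 3}."""
    return (h - j) * (j - k) * (k - h) // 2

def delta(h, k):
    """Kronecker delta."""
    return 1 if h == k else 0

idx = (1, 2, 3)
# Compare both sides of Eq. 15.99 for every choice of the free indices.
identity_holds = all(
    sum(eps(h, j, k) * eps(k, l, m) for k in idx)
    == delta(h, l) * delta(j, m) - delta(h, m) * delta(j, l)
    for h, j, l, m in product(idx, repeat=4)
)
print(identity_holds)  # True
```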
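15.4.3 Another important vector product

Another important vector product is $(\mathbf{a} \times \mathbf{b}) \cdot (\mathbf{c} \times \mathbf{d})$, for which we have the following vector identity (the parentheses are redundant and used only for clarity)
$$(\mathbf{a} \times \mathbf{b}) \cdot (\mathbf{c} \times \mathbf{d}) = (\mathbf{a} \cdot \mathbf{c})\, (\mathbf{b} \cdot \mathbf{d}) - (\mathbf{a} \cdot \mathbf{d})\, (\mathbf{b} \cdot \mathbf{c}). \tag{15.101}$$
[Hint: Using $\mathbf{a} \times \mathbf{b} \cdot \mathbf{f} = \mathbf{a} \cdot \mathbf{b} \times \mathbf{f}$ (Eq. 15.93) and setting $\mathbf{f} := \mathbf{c} \times \mathbf{d}$, we have
$$(\mathbf{a} \times \mathbf{b}) \cdot (\mathbf{c} \times \mathbf{d}) = \mathbf{a} \cdot \mathbf{b} \times (\mathbf{c} \times \mathbf{d}). \tag{15.102}$$
Then, using $\mathbf{b} \times (\mathbf{c} \times \mathbf{d}) = \mathbf{c}\, (\mathbf{b} \cdot \mathbf{d}) - \mathbf{d}\, (\mathbf{b} \cdot \mathbf{c})$ (BAC–minus–CAB rule, Eq. 15.96), we obtain Eq. 15.101.] Equation 15.101 is known as the Binet–Cauchy identity.² Equation 15.101 implies the following identity
$$\|\mathbf{a} \times \mathbf{b}\|^2 = \|\mathbf{a}\|^2\, \|\mathbf{b}\|^2 - (\mathbf{a} \cdot \mathbf{b})^2, \tag{15.103}$$
known as the Lagrange identity, after Joseph–Louis Lagrange.

² Named after the French mathematician and physicist Jacques Philippe Marie Binet and Augustin–Louis Cauchy.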
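Both the Binet–Cauchy identity (Eq. 15.101) and the Lagrange identity (Eq. 15.103) can be spot-checked with integer components. This is a sketch on arbitrary sample vectors (an illustrative assumption), not a proof.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product in terms of components (Eq. 15.75)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

a, b, c, d = (1, 2, 3), (4, 5, 6), (7, 8, 10), (1, 0, -1)

# Binet-Cauchy identity, Eq. 15.101.
bc_lhs = dot(cross(a, b), cross(c, d))
bc_rhs = dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c)
print(bc_lhs, bc_rhs)  # 150 150

# Lagrange identity, Eq. 15.103 (the case c = a, d = b of Eq. 15.101).
lag_lhs = dot(cross(a, b), cross(a, b))
lag_rhs = dot(a, a) * dot(b, b) - dot(a, b) ** 2
print(lag_lhs, lag_rhs)  # 54 54
```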
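15.5 Change of basis ♣

In this section, we discuss the effect of a change of basis on the components of a vector. Consider two orthonormal bases, say $(\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3)$ and $(\mathbf{i}'_1, \mathbf{i}'_2, \mathbf{i}'_3)$. Denoting by $r_{hk}$ the components of the vector $\mathbf{i}_h$ in the basis $\mathbf{i}'_k$, we have
$$\mathbf{i}_h = \sum_{j=1}^{3} r_{hj}\, \mathbf{i}'_j. \tag{15.104}$$
This implies (dot Eq. 15.104 with $\mathbf{i}'_k$, and use $\mathbf{i}'_j \cdot \mathbf{i}'_k = \delta_{jk}$, Eq. 15.23)
$$r_{hk} = \mathbf{i}_h \cdot \mathbf{i}'_k, \tag{15.105}$$
in agreement with the fact that, in an orthonormal basis, components and projections coincide (Eq. 15.29). It is convenient to introduce the matrix $\mathsf{R}$, whose elements are the quantities $r_{hk}$, namely
$$\mathsf{R} := \big[ r_{hk} \big] = \big[ \mathbf{i}_h \cdot \mathbf{i}'_k \big] = \begin{bmatrix} \mathbf{i}_1 \cdot \mathbf{i}'_1 & \mathbf{i}_1 \cdot \mathbf{i}'_2 & \mathbf{i}_1 \cdot \mathbf{i}'_3 \\ \mathbf{i}_2 \cdot \mathbf{i}'_1 & \mathbf{i}_2 \cdot \mathbf{i}'_2 & \mathbf{i}_2 \cdot \mathbf{i}'_3 \\ \mathbf{i}_3 \cdot \mathbf{i}'_1 & \mathbf{i}_3 \cdot \mathbf{i}'_2 & \mathbf{i}_3 \cdot \mathbf{i}'_3 \end{bmatrix}. \tag{15.106}$$
Similarly, we can write
$$\mathbf{i}'_k = \sum_{j=1}^{3} r'_{kj}\, \mathbf{i}_j, \tag{15.107}$$
where
$$r'_{kj} = \mathbf{i}'_k \cdot \mathbf{i}_j = r_{jk} \tag{15.108}$$
are the components of $\mathbf{i}'_k$ in the basis $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$. In other words, the matrix $\mathsf{R}' := \big[ r'_{kh} \big]$ equals the transpose of the matrix $\mathsf{R}$, namely
$$\mathsf{R}' = \begin{bmatrix} \mathbf{i}'_1 \cdot \mathbf{i}_1 & \mathbf{i}'_1 \cdot \mathbf{i}_2 & \mathbf{i}'_1 \cdot \mathbf{i}_3 \\ \mathbf{i}'_2 \cdot \mathbf{i}_1 & \mathbf{i}'_2 \cdot \mathbf{i}_2 & \mathbf{i}'_2 \cdot \mathbf{i}_3 \\ \mathbf{i}'_3 \cdot \mathbf{i}_1 & \mathbf{i}'_3 \cdot \mathbf{i}_2 & \mathbf{i}'_3 \cdot \mathbf{i}_3 \end{bmatrix} = \mathsf{R}^{\mathsf{T}}. \tag{15.109}$$
Next, consider a vector $\mathbf{a}$:
$$\mathbf{a} = \sum_{k=1}^{3} a_k\, \mathbf{i}_k = \sum_{k=1}^{3} a'_k\, \mathbf{i}'_k. \tag{15.110}$$
Dotting this equation with $\mathbf{i}_h$, we obtain
$$a_h = \sum_{k=1}^{3} \mathbf{i}_h \cdot \mathbf{i}'_k\; a'_k = \sum_{k=1}^{3} r_{hk}\, a'_k, \tag{15.111}$$
or, in matrix notation,
$$\mathsf{a} = \mathsf{R}\, \mathsf{a}'. \tag{15.112}$$
Similarly, dotting Eq. 15.110 with $\mathbf{i}'_h$, we obtain
$$a'_h = \sum_{k=1}^{3} \mathbf{i}'_h \cdot \mathbf{i}_k\; a_k = \sum_{k=1}^{3} r'_{hk}\, a_k, \tag{15.113}$$
or, in matrix notation (use Eq. 15.109),
$$\mathsf{a}' = \mathsf{R}'\, \mathsf{a} = \mathsf{R}^{\mathsf{T}} \mathsf{a}. \tag{15.114}$$

◦ Comment. It is worth noting that, combining Eqs. 15.112 and 15.114, and using Eq. 15.109, we have, for any $\mathsf{a}$,
$$\mathsf{a} = \mathsf{R}\, \mathsf{a}' = \mathsf{R}\, \mathsf{R}^{\mathsf{T}} \mathsf{a}. \tag{15.115}$$
In other words, if we multiply a vector $\mathsf{a}$ by $\mathsf{R}' = \mathsf{R}^{\mathsf{T}}$ and then by $\mathsf{R}$, we return to the original mathematicians’ vector (as expected, since we are back to the original frame). Similarly, for any $\mathsf{a}'$,
$$\mathsf{a}' = \mathsf{R}'\, \mathsf{a} = \mathsf{R}^{\mathsf{T}} \mathsf{R}\, \mathsf{a}'. \tag{15.116}$$
[This issue will be further addressed in Chapter 16, after we introduce the inverse of a matrix. Specifically, we will show that Eqs. 15.116 and 15.115 imply that
$$\mathsf{R}^{\mathsf{T}} \mathsf{R} = \mathsf{R}\, \mathsf{R}^{\mathsf{T}} = \big[ \delta_{hk} \big], \tag{15.117}$$
as well as $\mathsf{R}'^{\,\mathsf{T}} \mathsf{R}' = \mathsf{R}'\, \mathsf{R}'^{\,\mathsf{T}} = \big[ \delta_{hk} \big]$.] Accordingly, we have (combine Eqs. 15.107, 15.110, 15.113 and 15.117)
$$\mathbf{a} = \sum_{h=1}^{3} a'_h\, \mathbf{i}'_h = \sum_{h=1}^{3} \bigg( \sum_{j=1}^{3} r'_{hj}\, a_j \bigg) \bigg( \sum_{k=1}^{3} r'_{hk}\, \mathbf{i}_k \bigg) = \sum_{j,k=1}^{3} a_j\, \delta_{jk}\, \mathbf{i}_k = \sum_{k=1}^{3} a_k\, \mathbf{i}_k = \mathbf{a}. \tag{15.118}$$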
In other words, by going from one frame to the other and then back, we obtain the original vector, as expected.]
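The round trip between the two bases can be illustrated numerically. In the sketch below, the primed basis is obtained (as an illustrative assumption) by rotating the unprimed one by an angle `phi` about the third base vector; the matrix `R` collects the dot products of an unprimed base vector with a primed one, as in Eq. 15.105. All helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transpose(M):
    return [list(row) for row in zip(*M)]

def matvec(M, v):
    return tuple(dot(row, v) for row in M)

phi = math.pi / 6   # illustrative rotation angle (assumption)
# Both bases are listed through their components in the unprimed basis.
unprimed = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
primed = [(math.cos(phi), math.sin(phi), 0.0),
          (-math.sin(phi), math.cos(phi), 0.0),
          (0.0, 0.0, 1.0)]

# r_hk = (unprimed base vector h) . (primed base vector k), Eq. 15.105.
R = [[dot(ih, ipk) for ipk in primed] for ih in unprimed]

a = (1.0, 2.0, 3.0)                  # components in the unprimed basis
a_primed = matvec(transpose(R), a)   # components in the primed basis (Eq. 15.114)
a_back = matvec(R, a_primed)         # back to the unprimed basis (Eq. 15.112)
```

Up to round-off, `a_back` coincides with `a`, and the product of `R` with its transpose is the identity matrix, as stated in Eq. 15.117.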
15. Physicists’ vectors
619
15.6 Gram–Schmidt procedure for physicists’ vectors
♣
Consider three vectors, $\mathbf{b}_k$ ($k = 1, 2, 3$), that are linearly independent, namely non–coplanar (Theorem 146, p. 592). Here, we want to show how from these vectors one can obtain a set of orthonormal vectors, $\mathbf{e}_k$. The procedure used to accomplish this, presented below, is known as the Gram–Schmidt procedure.³
To begin with, set
$$\mathbf{e}_1 := \mathbf{b}_1/\|\mathbf{b}_1\|, \tag{15.119}$$
which implies $\|\mathbf{e}_1\| = 1$. Next, use the normal/parallel decomposition $\mathbf{b}_2 = \mathbf{b}_{2\mathrm{N}} + \mathbf{b}_{2\mathrm{P}}$, with $\mathbf{b}_{2\mathrm{N}}$ normal to $\mathbf{e}_1$ and $\mathbf{b}_{2\mathrm{P}}$ parallel to $\mathbf{e}_1$, so that (Eq. 15.33)
$$\mathbf{b}_{2\mathrm{N}} = \mathbf{b}_2 - (\mathbf{b}_2\cdot\mathbf{e}_1)\,\mathbf{e}_1. \tag{15.120}$$
Note that $\mathbf{b}_{2\mathrm{N}}\cdot\mathbf{e}_1 = \mathbf{b}_2\cdot\mathbf{e}_1 - (\mathbf{b}_2\cdot\mathbf{e}_1)\,(\mathbf{e}_1\cdot\mathbf{e}_1) = 0$. In other words, $\mathbf{b}_{2\mathrm{N}}$ is orthogonal to $\mathbf{e}_1$. Then, set
$$\mathbf{e}_2 = \mathbf{b}_{2\mathrm{N}}/\|\mathbf{b}_{2\mathrm{N}}\|, \tag{15.121}$$
which implies $\|\mathbf{e}_2\| = 1$. Thus, we have obtained two mutually orthonormal vectors, namely $\mathbf{e}_1$ and $\mathbf{e}_2$. Next, we use a similar procedure on $\mathbf{b}_3$. Specifically, set
$$\mathbf{b}_{3\mathrm{N}} = \mathbf{b}_3 - (\mathbf{b}_3\cdot\mathbf{e}_1)\,\mathbf{e}_1 - (\mathbf{b}_3\cdot\mathbf{e}_2)\,\mathbf{e}_2, \tag{15.122}$$
which implies $\mathbf{b}_{3\mathrm{N}}\cdot\mathbf{e}_h = 0$, for $h = 1, 2$, as you may verify. Then, set
$$\mathbf{e}_3 = \mathbf{b}_{3\mathrm{N}}/\|\mathbf{b}_{3\mathrm{N}}\|, \tag{15.123}$$
which implies $\|\mathbf{e}_3\| = 1$. In conclusion, we have obtained three orthonormal vectors, namely $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$, which is the desired objective.
◦ Comment. It is worth noting that in Eqs. 15.121 and 15.123 we have tacitly assumed that $\mathbf{b}_{k\mathrm{N}} \neq \mathbf{0}$ ($k = 2, 3$). This is guaranteed by the fact that, by hypothesis, the three vectors $\mathbf{b}_k$ are linearly independent, as you may verify.
◦ Comment. If we change the sign of one of the vectors $\mathbf{e}_k$, the set is still orthonormal. This fact may be used to impose that the set is right–handed.
³ Named after the Danish mathematician Jørgen Pedersen Gram (1850–1916) and the German mathematician Erhard Schmidt (1876–1959), Refs. [26] and [59], respectively.
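The steps of Eqs. 15.119 to 15.123 can be sketched in a few lines of code. This is only an illustration: the function name `gram_schmidt`, the tolerance used to detect a vanishing normal part, and the three sample vectors are assumptions made here, and floating–point arithmetic is used because of the square roots.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(b):
    """Orthonormalize the vectors in b, one at a time (Eqs. 15.119-15.123)."""
    e = []
    for bk in b:
        normal_part = list(bk)
        # Remove the components of bk parallel to the unit vectors found so far.
        for eh in e:
            coefficient = dot(bk, eh)
            normal_part = [n - coefficient * x for n, x in zip(normal_part, eh)]
        length = norm(normal_part)
        if length < 1e-12:  # assumed tolerance: the normal part vanishes
            raise ValueError("the vectors are linearly dependent")
        e.append([n / length for n in normal_part])
    return e

# Three non-coplanar sample vectors (chosen arbitrarily).
b = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
e = gram_schmidt(b)
```

The resulting vectors satisfy `dot(e[h], e[k])` equal to one for `h == k` and to zero otherwise, up to round–off. Three coplanar input vectors trigger the `ValueError`, which mirrors the tacit assumption discussed in the first comment above.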
15.7 Appendix. A property of parabolas
♥
Here, as an application, we address a property of parabolas, related to the fact that a light bulb placed at the focus of a paraboloid of revolution produces a beam of light rays parallel to the axis of the paraboloid, along with other interesting phenomena. To this end, consider the following
Theorem 154 (Parabolic reflectors). Consider the horizontal parabola represented by the equation $y^2 = 4\,\mathring{a}\,x$ (Eq. 7.122), and depicted in Fig. 15.16. Let $P = (x, y)$ denote a generic point of the parabola, $F = (\mathring{a}, 0)$ its focus, $PN$ the normal to the parabola at $P$, $\alpha$ the angle between $PF$ and the normal $PN$, and $\beta$ the angle between the normal $PN$ and a horizontal segment $PH$. The parabola has the property that $\alpha = \beta$.
◦ Proof : ♠ Consider the vector $\mathbf{x}_P = \tfrac{1}{4}\,(y^2/\mathring{a})\,\mathbf{i} + y\,\mathbf{j}$ corresponding to a generic point $P$ of the parabola, and the vector $\mathbf{x}_F = \mathring{a}\,\mathbf{i}$ corresponding to the point $F$ in Fig. 15.16, where $\mathring{a} = 0.25$. The vector connecting these points is given by
$$\mathbf{b} := \mathbf{x}_F - \mathbf{x}_P = \left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)\mathbf{i} - y\,\mathbf{j}. \tag{15.124}$$
Fig. 15.16 Parabolic reflector
In addition, consider a vector $\mathbf{c}$ that: (i) is parallel to the $x$-axis, namely proportional to $\mathbf{i}$, and (ii) has the same magnitude as the vector $\mathbf{b}$. These conditions yield that the vector $\mathbf{c}$ is given by
$$\mathbf{c} := \sqrt{\left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)^{2} + y^2}\;\,\mathbf{i}. \tag{15.125}$$
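Next, we want a vector normal to the parabola at $P$. To this end, let us rewrite Eq. 7.122 as $y = 2\sqrt{\mathring{a}\,x} = f(x)$ (upper portion of the parabola, as in Fig. 15.16). This yields $f'(x) = \sqrt{\mathring{a}/x} = 2\,\mathring{a}/y$. Then, using $\breve{\mathbf{n}} = -f'_0\,\mathbf{i} + \mathbf{j}$ (Eq. 15.56), or its equivalent $\breve{\mathbf{n}} = \mathbf{i} - (1/f'_0)\,\mathbf{j}$, we have that the (non–unit) normal to the parabola is given by
$$\breve{\mathbf{n}} = \mathbf{i} - \frac{y}{2\,\mathring{a}}\,\mathbf{j}. \tag{15.126}$$
The theorem claims that $\alpha = \beta$, where $\alpha$ and $\beta$ are the angles that $\mathbf{b}$ and $\mathbf{c}$ form with the normal $\breve{\mathbf{n}}$ (see Fig. 15.16). This is equivalent to saying that $\mathbf{b}\cdot\breve{\mathbf{n}} = \mathbf{c}\cdot\breve{\mathbf{n}}$, namely
$$\left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right) + \frac{y^2}{2\,\mathring{a}} = \sqrt{\left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)^{2} + y^2}, \tag{15.127}$$
or
$$\left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)^{2} + \frac{y^4}{4\,\mathring{a}^2} + 2\left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)\frac{y^2}{2\,\mathring{a}} = \left(\mathring{a} - \frac{y^2}{4\,\mathring{a}}\right)^{2} + y^2, \tag{15.128}$$
which is satisfied, as you may verify. [Hint: Eliminate the two terms $\pm y^4/(4\,\mathring{a}^2)$ on the left side.]
This proves the validity of Eq. 15.127, namely that $\mathbf{b}\cdot\breve{\mathbf{n}} = \mathbf{c}\cdot\breve{\mathbf{n}}$. This, in turn, means that $\mathbf{b}$ and $\mathbf{c}$ form the same angle with $\breve{\mathbf{n}}$ (because $\|\mathbf{b}\| = \|\mathbf{c}\|$), in agreement with the theorem.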
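The equality of the two angles can also be checked numerically. The sketch below builds the vectors of Eqs. 15.124 to 15.126 for a given ordinate y and compares the angles they form with the normal. The function name `reflector_angles` and the sample ordinates are assumptions made for illustration; the focal parameter defaults to the value 0.25 used in Fig. 15.16.

```python
import math

def reflector_angles(y, a=0.25):
    """Return (alpha, beta) at the point of the parabola y**2 = 4*a*x with ordinate y."""
    x = y * y / (4.0 * a)
    b = (a - x, -y)                  # from P to the focus F (Eq. 15.124)
    c = (math.hypot(*b), 0.0)        # horizontal, same magnitude as b (Eq. 15.125)
    n = (1.0, -y / (2.0 * a))        # non-unit normal (Eq. 15.126)

    def angle(u, v):
        cosine = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
        return math.acos(max(-1.0, min(1.0, cosine)))  # clamp against round-off

    return angle(b, n), angle(c, n)

# Assumed sample ordinates on the upper portion of the parabola.
samples = [reflector_angles(y) for y in (0.1, 0.5, 1.0, 3.0)]
```

For each sample, the two angles agree to within round–off. For y = 0.5 (where P lies directly above the focus), both equal π/4.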
• High beams, Archimedes, and the Walkie–Scorchie
♥
The high beams of your car work on the basis of the property stated in Theorem 154 above. Indeed, they consist of a mirror that is axisymmetric (i.e., a surface of revolution), with a cross–section in the shape of a parabola (paraboloid of revolution), with the light bulb located at the focus of the parabola (parabolic reflector). Since the light rays bounce off a reflecting surface at the same angle at which they arrive (take my word for it), we can apply the preceding theorem and obtain that a source of light, placed at the focus of the parabola, yields a beam of light with rays parallel to the axis of the paraboloid.
Vice versa, using a parabolic reflector, parallel rays coming from the Sun are directed towards the focus of the parabola, thereby creating a concentration of radiation energy. Archimedes is said to have used parabolic reflectors (ustoria specula, Latin for “burning mirrors”) as a weapon to burn ancient Roman ships during the siege of Syracuse by the Romans, during the Second Punic War (218 to 201 BC). These heat rays were obtained by using a collection of mirrors, placed so as to approximate a parabola.
One more example. In September 2013, a skyscraper located at 20 Fenchurch Street in London was under construction. It was nicknamed the Walkie–Talkie because of its shape. What is of interest here is that its facade consisted of a curved wall with outward concavity. Because of the phenomenon described above, the light reflected by the windows on the curved wall was concentrated, and melted parts of cars parked nearby and also caused other damage. Accordingly, the skyscraper was re–nicknamed the Walkie–Scorchie. [More on this subject, for instance, in the article by Angela Monaghan (Ref. [47]).]
Chapter 16
Inverse matrices and bases
Before we can tackle statics in three dimensions, we need additional material on matrices and mathematicians’ vectors. Accordingly, the main objective of this chapter is to further develop the material covered in Chapters 2 and 3. Here, we introduce two new important notions. The first is that of inverse matrices. In matrix algebra, the inverse matrix plays the same role that the inverse (namely the reciprocal) of a number plays in ordinary algebra. The second is the notion of a change in the frame of reference (basis) in $\mathbb{R}^n$ and $\mathbb{C}^n$. These lead us to the so–called similar matrices and orthogonal matrices.
◦ Comment. In this chapter, the matrices can be either real or complex, unless otherwise stated. Recall that $\mathbb{R}$ denotes the collection of all the real numbers (Definition 21, p. 39), whereas $\mathbb{C}$ denotes the collection of all the complex numbers (Definition 29, p. 53). Moreover, $\mathbb{R}^2$ denotes the collection of all the ordered pairs of real numbers (Definition 177, p. 353). Similarly, $\mathbb{C}^2$ denotes the collection of all the ordered pairs of complex numbers. Here, we deal with the collections of all $n$-tuples of real or complex numbers, denoted respectively by $\mathbb{R}^n$ and $\mathbb{C}^n$ (Definition 246, p. 633).
◦ Warning. Unless otherwise stated, in this chapter the term “vector” denotes “mathematicians’ vectors,” which are the ones primarily addressed here.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_16
• Overview of this chapter
In Section 16.1, we go back to diagonal matrices, namely matrices whose off–diagonal elements are all equal to zero (Definition 44, p. 92). We show in particular that, when the matrix is diagonal, the matrix–by–vector product consists in multiplying each element of the vector by the corresponding diagonal element of the matrix. Similar results are valid for matrix–by–matrix
multiplications. Diagonal matrices include, in particular, the identity matrix, namely a diagonal matrix whose diagonal elements are all equal to one.
Next, in Section 16.2 we present one of the most important and useful notions in matrix algebra, that of the inverse matrix. In particular, we introduce the definition of the inverse matrix, along with its properties, and we reexamine the problem $\mathsf{A}\,\mathsf{x} = \mathsf{b}$, extending it to $\mathsf{A}\,\mathsf{X} = \mathsf{B}$.
As stated above, matrix–by–vector and matrix–by–matrix multiplications are particularly simple in the presence of a diagonal matrix. Wouldn’t it be nice if all the matrices were diagonal? Well, this is not as far from the truth as one might think. In order to address this possibility, in Section 16.3 we introduce the notions of bases and similar matrices. In particular, we first extend the definition of basis, from three–dimensional physicists’ vectors to $n$-dimensional mathematicians’ vectors. Then, we address the issues related to a change of basis and introduce the definition of similar matrices. Finally, we introduce the so–called diagonalizable matrices, namely those matrices that may be transformed, through a change of basis, into a diagonal form.
The next item of business is the definition of orthogonality for mathematicians’ vectors. This is dealt with in Section 16.4, where we introduce orthogonal vectors and related concepts. In particular, we introduce orthogonal bases and orthogonal matrices in $\mathbb{R}^n$. In addition, we extend the Gram–Schmidt procedure for physicists’ vectors to that for mathematicians’ real vectors. In the following section (Section 16.5), we shift to the complex field, and discuss the properties of complex matrices, in particular of the so–called Hermitian matrices, which have a role similar to that of real symmetric matrices when we move to the complex field.
We conclude the chapter with two appendices.
In the first (Section 16.6), the primary objective is to connect the material on the inverse matrix to that of Chapters 2 and 3 on the Gaussian elimination procedure. To this end, we introduce a modification of the Gaussian elimination procedure (namely the so–called Gauss–Jordan procedure), which allows us to connect the two subjects. In the second appendix (Section 16.7), we introduce triangular matrices (namely matrices whose elements vanish either above or below the main diagonal) and the L-U decomposition (namely a decomposition of a matrix into the product of lower and upper triangular matrices), and show its relationship with the Gaussian elimination procedure. We also discuss the relationship between triangular matrices and the so–called nilpotent matrices.
16.1 Zero, identity and diagonal matrices
In this section, we address in greater depth diagonal matrices (Definition 44, p. 92). In particular, we introduce the identity matrix $\mathsf{I}$ (Subsection 16.1.2), and diagonal matrices in general (Subsection 16.1.3). Before doing this, in Subsection 16.1.1, we discuss some properties of the zero matrix $\mathsf{O}$ introduced in Definition 57, p. 101.
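16.1.1 The zero matrix
As stated in Definition 57, p. 101, the symbol $\mathsf{O}$ denotes the (not necessarily square) zero matrix, namely a matrix whose elements are all equal to zero:
$$\mathsf{O} = \begin{bmatrix} 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}. \tag{16.1}$$
The reason for the name given to this matrix is related to the fact that, for any vector $\mathsf{x}$, we have
$$\mathsf{O}\,\mathsf{x} = \mathsf{0}. \tag{16.2}$$
Similarly, we have
$$\mathsf{A}\,\mathsf{O} = \mathsf{O} \quad\text{and}\quad \mathsf{O}\,\mathsf{B} = \mathsf{O}. \tag{16.3}$$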
◦ Warning. Of course, here we have assumed that $\mathsf{x}$, $\mathsf{A}$, $\mathsf{B}$, and $\mathsf{O}$ are multipliable, in the sense that the number of columns of whichever comes first equals the number of rows of whichever comes second. This assumption is tacitly made from now on.
Vice versa, we have the following
Theorem 155. For any matrix $\mathsf{A}$, if $\mathsf{A}\,\mathsf{x} = \mathsf{0}$ for every vector $\mathsf{x}$, then necessarily $\mathsf{A} = \mathsf{O}$.
◦ Proof : Indeed, choosing $\mathsf{x} = [1, 0, 0, \dots, 0]^{\mathsf T}$, we obtain that the first column of $\mathsf{A}$ equals $\mathsf{0} = [0, 0, 0, \dots, 0]^{\mathsf T}$. Similarly, choosing $\mathsf{x} = [0, 1, 0, \dots, 0]^{\mathsf T}$, we obtain that the second column of $\mathsf{A}$ equals $\mathsf{0}$. Proceeding this way, we obtain that indeed $\mathsf{A}$ coincides with $\mathsf{O}$.
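The proof rests on the fact that multiplying a matrix by a vector that has a single element equal to one (and zeros elsewhere) returns one column of the matrix. A minimal sketch of this fact follows; the helper names and the sample 2 × 3 matrix are assumptions made for illustration.

```python
def matvec(A, x):
    """Matrix-by-vector product, with A stored as a list of rows."""
    return [sum(a_hk * x_k for a_hk, x_k in zip(row, x)) for row in A]

def unit_vector(k, n):
    """Vector of n elements, all zero except element k (0-based), which equals one."""
    return [1 if j == k else 0 for j in range(n)]

A = [[2, -1, 0],
     [4, 3, 5]]   # arbitrary 2 x 3 example

# Multiplying by the k-th unit vector extracts the k-th column of A.
columns = [matvec(A, unit_vector(k, 3)) for k in range(3)]
```

Here `columns` is the list of the three columns of `A`. If every such product vanishes, every column vanishes, which is the argument used above.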
16.1.2 The identity matrix
We have the following
Definition 242 (Identity matrix). Consider the $n \times n$ matrix with all the diagonal elements equal to one and all the off–diagonal elements equal to zero, namely (recall Eq. 3.118)
$$\mathsf{I} := [\delta_{hk}] = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}, \tag{16.4}$$
where $\delta_{hk}$ is the Kronecker delta. The matrix $\mathsf{I}$ is called the $n \times n$ identity matrix. [I will use the notation
$$\mathsf{I}_n = [\delta_{hk}] \qquad (h, k = 1, \dots, n), \tag{16.5}$$
whenever I wish to emphasize the dimensions of the matrix.]
The reason for the name given to this matrix is related to the fact that, for any vector $\mathsf{x}$, we have
$$\mathsf{I}\,\mathsf{x} = \Bigg\{\sum_{k=1}^{n} \delta_{hk}\,x_k\Bigg\} = \{x_h\} = \mathsf{x}. \tag{16.6}$$
◦ Warning. Above, we have used the rule for the Kronecker delta, namely $\sum_{k=1}^{n} \delta_{hk}\,a_k = a_h$ (Eq. 3.30). This fact is tacitly assumed for several equations in this section.
Similarly, for any $m \times n$ matrix $\mathsf{A}$, we have
$$\mathsf{I}\,\mathsf{A} = \Bigg[\sum_{j=1}^{m} \delta_{hj}\,a_{jk}\Bigg] = [a_{hk}] = \mathsf{A}, \tag{16.7}$$
and
$$\mathsf{A}\,\mathsf{I} = \Bigg[\sum_{j=1}^{n} a_{hj}\,\delta_{jk}\Bigg] = [a_{hk}] = \mathsf{A}. \tag{16.8}$$
In summary, multiplication by the identity matrix I leaves a vector (or a matrix) unchanged, akin to multiplication by 1 in the fields of real and complex numbers. Vice versa, we have the following
Theorem 156. For any square $n \times n$ matrix $\mathsf{A}$, if we have $\mathsf{A}\,\mathsf{x} = \mathsf{x}$ for every $\mathsf{x} \in \mathbb{R}^n$, then necessarily $\mathsf{A} = \mathsf{I}$.
◦ Proof : Indeed, choosing $\mathsf{x} = [1, 0, 0, \dots, 0]^{\mathsf T}$, we obtain that the first column of $\mathsf{A}$ is $[1, 0, 0, \dots, 0]^{\mathsf T}$. Similarly, choosing $\mathsf{x} = [0, 1, 0, \dots, 0]^{\mathsf T}$, we obtain that the second column of $\mathsf{A}$ is $[0, 1, 0, \dots, 0]^{\mathsf T}$. Proceeding this way, we obtain that indeed $\mathsf{A}$ coincides with $\mathsf{I}$.
16.1.3 Diagonal matrices
The identity matrix is a particular case of a diagonal matrix, namely a square matrix whose off–diagonal terms vanish (Definition 44, p. 92). For matrices of this type, we use the notation in Eq. 3.8, namely
$$\mathsf{D} = \mathrm{Diag}\,[d_{hh}] := [d_{hh}\,\delta_{hk}], \tag{16.9}$$
where $d_{hh}$ denotes the $h$-th diagonal term of $\mathsf{D}$. If a matrix is diagonal, the operation of matrix–by–vector multiplication is particularly simple. Indeed, in such a case we have that $\mathsf{b} = \mathsf{D}\,\mathsf{c}$ may be written in terms of elements as
$$b_h = \sum_{k=1}^{n} d_{hh}\,\delta_{hk}\,c_k = d_{hh}\,c_h. \tag{16.10}$$
In other words, as mentioned above, multiplying a vector $\mathsf{c}$ by the diagonal matrix $\mathsf{D} = \mathrm{Diag}\,[d_{hh}]$ is equivalent to multiplying the $h$-th element of $\mathsf{c}$ by $d_{hh}$. Therefore, multiplication by a diagonal matrix is not only conveniently simple, but also easy to visualize in our minds (gut–level mathematics again).
In addition, if $\mathsf{D}$ is an $n \times n$ diagonal matrix, then for any $n \times m$ matrix $\mathsf{A}$, we have that $\mathsf{B} = \mathsf{D}\,\mathsf{A}$ may be written in terms of elements as
$$b_{hk} = \sum_{j=1}^{n} d_{hh}\,\delta_{hj}\,a_{jk} = d_{hh}\,a_{hk}. \tag{16.11}$$
In other words, left–multiplying an arbitrary $n \times m$ matrix $\mathsf{A}$ by the $n \times n$ diagonal matrix $\mathsf{D} = \mathrm{Diag}\,[d_{hh}]$ is equivalent to multiplying the $h$-th row of $\mathsf{A}$ by $d_{hh}$. Similarly, $\mathsf{B} = \mathsf{A}\,\mathsf{D}$ may be written in terms of elements as (use again Eq. 3.30)
$$b_{hk} = \sum_{j=1}^{n} a_{hj}\,d_{jj}\,\delta_{jk} = a_{hk}\,d_{kk}. \tag{16.12}$$
In other words, right–multiplying an arbitrary $m \times n$ matrix $\mathsf{A}$ by the diagonal $n \times n$ matrix $\mathsf{D} = \mathrm{Diag}\,[d_{hh}]$ is equivalent to multiplying the $k$-th column by $d_{kk}$.
Moreover, the product of two $n \times n$ diagonal matrices, $\mathsf{A} = \mathrm{Diag}\,[a_{kk}]$ and $\mathsf{B} = \mathrm{Diag}\,[b_{kk}]$, is a diagonal $n \times n$ matrix with diagonal elements given by $a_{kk}\,b_{kk}$, namely
$$\mathrm{Diag}\,[a_{kk}]\;\mathrm{Diag}\,[b_{kk}] = \mathrm{Diag}\,[a_{kk}\,b_{kk}] = \mathrm{Diag}\,[b_{kk}]\;\mathrm{Diag}\,[a_{kk}]. \tag{16.13}$$
[Hint: For the first equality, use $\sum_{j=1}^{n} (a_{hh}\,\delta_{hj})(b_{kk}\,\delta_{jk}) = a_{hh}\,b_{kk}\,\delta_{hk}$. For the second, use the symmetry of the middle term of the equation with respect to $\mathsf{A}$ and $\mathsf{B}$.] Accordingly, we have the following
Theorem 157. Any two $n \times n$ diagonal matrices, say $\mathsf{A}$ and $\mathsf{B}$, commute.
◦ Proof : Indeed, using matrix notation, Eq. 16.13 states $\mathsf{A}\,\mathsf{B} = \mathsf{B}\,\mathsf{A}$ (commuting matrices, Eq. 3.52).
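The row–scaling, column–scaling and commutation properties of Eqs. 16.11 to 16.13 are easy to see on a small example. The sketch below uses hypothetical helper names (`matmul`, `diag`) and arbitrary integer data.

```python
def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def diag(d):
    """Square diagonal matrix with the elements of d on the main diagonal."""
    return [[d[h] if h == k else 0 for k in range(len(d))] for h in range(len(d))]

A = [[1, 2, 3],
     [4, 5, 6]]                            # 2 x 3

left = matmul(diag([2, 3]), A)             # each row h of A is multiplied by d_hh
right = matmul(A, diag([1, 10, 100]))      # each column k of A is multiplied by d_kk

P, Q = diag([2, 3, 5]), diag([7, 11, 13])
PQ, QP = matmul(P, Q), matmul(Q, P)        # diagonal matrices commute
```

Here `left` is `[[2, 4, 6], [12, 15, 18]]`, `right` is `[[1, 20, 300], [4, 50, 600]]`, and `PQ` equals `QP`, with diagonal elements 14, 33 and 65.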
16.2 The inverse matrix
In Subsection 16.2.1, we reexamine the problem $\mathsf{A}\,\mathsf{x} = \mathsf{b}$ and use this to introduce a very important notion, that of the inverse of a given matrix. [As stated above, in matrix algebra, the inverse matrix plays the same role that the inverse (namely the reciprocal) of a number plays in ordinary algebra: if we want to divide $a$ by $b$, we multiply $a$ by the reciprocal of $b$.] Then, in Subsection 16.2.2, we address the problem $\mathsf{A}\,\mathsf{X} = \mathsf{B}$. [You will note that the material in this section is closely related to that on the Gaussian elimination presented in Chapters 2 and 3. This relationship is explored at greater length in the appendices of this chapter, Sections 16.6 and 16.7.]
16.2.1 Definition and properties
The study of matrices that satisfy the relationship
$$\mathsf{A}\,\mathsf{B} = \mathsf{I} \tag{16.14}$$
is the main objective of this section, and arguably yields the most important result of this entire chapter. As a preliminary step, consider the following
Lemma 14 ($\mathsf{B}\,\mathsf{A} = \mathsf{I}$ implies $\mathsf{A}\,\mathsf{B} = \mathsf{I}$). Consider any two $n \times n$ matrices, $\mathsf{A}$ and $\mathsf{B}$, that are such that
$$\mathsf{B}\,\mathsf{A} = \mathsf{I}. \tag{16.15}$$
Then, we have, necessarily,
$$\mathsf{A}\,\mathsf{B} = \mathsf{I}. \tag{16.16}$$
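◦ Proof : Consider the equation
$$\mathsf{A}\,\mathsf{g} = \mathsf{f}, \qquad\text{with } \mathsf{f} \text{ arbitrary}. \tag{16.17}$$
Left–multiplying this equation by $\mathsf{B}$ yields $\mathsf{B}\,\mathsf{A}\,\mathsf{g} = \mathsf{B}\,\mathsf{f}$, or (use Eq. 16.15)
$$\mathsf{g} = \mathsf{B}\,\mathsf{f}, \qquad\text{with } \mathsf{f} \text{ arbitrary}. \tag{16.18}$$
Combining Eqs. 16.17 and 16.18, one obtains
$$\mathsf{A}\,\mathsf{B}\,\mathsf{f} = \mathsf{f}, \qquad\text{with } \mathsf{f} \text{ arbitrary}. \tag{16.19}$$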
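This, in turn, implies $\mathsf{A}\,\mathsf{B} = \mathsf{I}$ (Theorem 156, p. 626), in agreement with Eq. 16.16.
Taking into account the above lemma, we may introduce the following
Definition 243 (Inverse matrix). Consider two $n \times n$ matrices $\mathsf{A}$ and $\mathsf{B}$. Iff
$$\mathsf{A}\,\mathsf{B} = \mathsf{B}\,\mathsf{A} = \mathsf{I}, \tag{16.20}$$
then $\mathsf{A}$ and $\mathsf{B}$ are called invertible, $\mathsf{B}$ is called the inverse of $\mathsf{A}$, and vice versa. To indicate this, we use the notation
$$\mathsf{B} = \mathsf{A}^{-1} \quad\text{and}\quad \mathsf{A} = \mathsf{B}^{-1}. \tag{16.21}$$
Then, Eq. 16.20 reads
$$\mathsf{A}\,\mathsf{A}^{-1} = \mathsf{A}^{-1}\mathsf{A} = \mathsf{I}. \tag{16.22}$$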
◦ Comment. You might like to revisit the influence matrix G (Eq. 4.51). Specifically, now we can say that the influence matrix G is the inverse of the stiffness matrix K.
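For future reference, given an invertible matrix $\mathsf{A}$ and $\mathsf{B} = \mathsf{A}^{-1}$, we have
$$\breve{\mathsf{b}}_h^{\mathsf T}\,\mathsf{a}_k = \delta_{hk}, \tag{16.23}$$
where $\mathsf{a}_k$ and $\breve{\mathsf{b}}_h^{\mathsf T}$ are, respectively, the columns of $\mathsf{A}$ and the rows of $\mathsf{B}$. Indeed, $\mathsf{B}\,\mathsf{A} = \mathsf{I}$ (Eq. 16.20) is equivalent to Eq. 16.23. Similarly, $\mathsf{A}\,\mathsf{B} = \mathsf{I}$ (Eq. 16.20) is equivalent to $\breve{\mathsf{a}}_h^{\mathsf T}\,\mathsf{b}_k = \delta_{hk}$.
We have the following
Theorem 158 (Uniqueness of the inverse). The inverse of $\mathsf{A}$, if it exists, is unique.
◦ Proof : Assume that there exist $\mathsf{B}_1$ and $\mathsf{B}_2$ such that $\mathsf{A}\,\mathsf{B}_1 = \mathsf{B}_1\mathsf{A} = \mathsf{I}$ and $\mathsf{A}\,\mathsf{B}_2 = \mathsf{B}_2\mathsf{A} = \mathsf{I}$. Then, we would have
$$\mathsf{B}_1 = (\mathsf{B}_2\,\mathsf{A})\,\mathsf{B}_1 = \mathsf{B}_2\,(\mathsf{A}\,\mathsf{B}_1) = \mathsf{B}_2, \tag{16.24}$$
in agreement with the theorem.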
Remark 128. If we have $\mathsf{A}\,\mathsf{x} = \mathsf{b}$ and $\mathsf{x} = \mathsf{C}\,\mathsf{b}$ independently of $\mathsf{x}$, then necessarily $\mathsf{C} = \mathsf{A}^{-1}$. Indeed, left–multiplying the first by $\mathsf{C}$, we have $\mathsf{C}\,\mathsf{A}\,\mathsf{x} = \mathsf{C}\,\mathsf{b} = \mathsf{x}$, which yields $\mathsf{C}\,\mathsf{A} = \mathsf{I}$ (Theorem 156, p. 626), and hence $\mathsf{A}\,\mathsf{C} = \mathsf{I}$ as well (Lemma 14).
Recall the definition of nonsingular matrix, namely that the rows of the matrix are linearly independent. We have the following
Theorem 159 (Nonsingular matrices are invertible). A nonsingular square matrix is invertible.
◦ Proof : If you go back to Subsection 2.4.2 on the details of the Gaussian elimination procedure, you will note that all the operations performed during each of the $n - 1$ cycles of the first phase of the Gaussian elimination procedure consist in left–multiplying the system of equations resulting from the preceding cycle by a suitable matrix (see, in particular, Eqs. 2.62 and 2.64). The same is true for the back–substitution performed during the second phase of the Gaussian elimination procedure. In other words, we started from $\mathsf{A}\,\mathsf{x} = \mathsf{b}$ and, through a left–multiplication by a suitable matrix $\mathsf{C}$, we have obtained $\mathsf{x} = \mathsf{C}\,\mathsf{b}$. Therefore, we have $\mathsf{C} = \mathsf{A}^{-1}$ (Remark 128 above). Thus, we can say that the inverse exists, provided that the rows of $\mathsf{A}$ are linearly independent, namely that $\mathsf{A}$ is nonsingular.
In addition, we have the following
Theorem 160 (Inverse of a symmetric matrix). If $\mathsf{A}$ is symmetric, its inverse is symmetric as well.
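◦ Proof : Taking the transpose of $\mathsf{A}\,\mathsf{A}^{-1} = \mathsf{I}$ (Eq. 16.22), and using $(\mathsf{A}\,\mathsf{B})^{\mathsf T} = \mathsf{B}^{\mathsf T}\mathsf{A}^{\mathsf T}$ (Eq. 3.53) and $\mathsf{A}^{\mathsf T} = \mathsf{A}$ (symmetry of $\mathsf{A}$), we have necessarily
$$\mathsf{I} = \big(\mathsf{A}\,\mathsf{A}^{-1}\big)^{\mathsf T} = (\mathsf{A}^{-1})^{\mathsf T}\mathsf{A}^{\mathsf T} = (\mathsf{A}^{-1})^{\mathsf T}\mathsf{A}, \tag{16.25}$$
which shows that $(\mathsf{A}^{-1})^{\mathsf T}$ is the inverse of $\mathsf{A}$, which is unique (Theorem 158, p. 630). In other words,
$$(\mathsf{A}^{-1})^{\mathsf T} = \mathsf{A}^{-1}, \tag{16.26}$$
in agreement with the theorem.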
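Moreover, we have the following
Theorem 161 (Inverse of matrix product). If two $n \times n$ matrices $\mathsf{A}$ and $\mathsf{B}$ are invertible, then their product is invertible, with the inverse given by
$$\big(\mathsf{A}\,\mathsf{B}\big)^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}. \tag{16.27}$$
◦ Proof : Indeed, we have $(\mathsf{A}\,\mathsf{B})\,(\mathsf{B}^{-1}\mathsf{A}^{-1}) = \mathsf{I}$, and $(\mathsf{B}^{-1}\mathsf{A}^{-1})\,(\mathsf{A}\,\mathsf{B}) = \mathsf{I}$, which imply Eq. 16.27.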
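Theorems 160 and 161 can be checked on 2 × 2 examples. The sketch below is only an illustration: the helper `inverse_2x2` uses the standard closed–form expression for the inverse of a 2 × 2 matrix (not a formula taken from this section), and the two sample matrices are arbitrary. Rational arithmetic keeps the comparisons exact.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix from its closed-form expression (hypothetical helper)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 1]]     # symmetric, determinant 1
B = [[1, 2], [3, 4]]     # determinant -2

A_inv = inverse_2x2(A)
B_inv = inverse_2x2(B)
AB_inv = inverse_2x2(matmul(A, B))
```

Here `AB_inv` coincides with `matmul(B_inv, A_inv)` but not with `matmul(A_inv, B_inv)`: the order of the factors is reversed, as stated in Eq. 16.27. In addition, `A_inv` equals its own transpose, in agreement with Theorem 160.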
• Powers of a matrix
We have the following
Definition 244 (Integer powers of $\mathsf{A}$). The $p$-th power of $\mathsf{A}$ (where $p$ is a natural number), denoted by $\mathsf{A}^p$, equals the matrix $\mathsf{I}$ multiplied $p$ times by $\mathsf{A}$. In addition, for $p = -q$ (where $q$ is a natural number), we define
$$\mathsf{A}^{-q} := \big(\mathsf{A}^{-1}\big)^{q}. \tag{16.28}$$
We have
$$\mathsf{A}^{p}\,\mathsf{A}^{\pm q} = \mathsf{A}^{p \pm q}, \tag{16.29}$$
with $p$ and $q$ integers. [The proof is analogous to that of the corresponding equation for real numbers (Eqs. 6.94 and 6.103).] Of course, this is still true even if the order in which the matrices $\mathsf{A}$ and $\mathsf{A}^{-1}$ appear is altered arbitrarily, since they commute (Eq. 16.20).
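Definition 244 and Eq. 16.29 translate into a short routine. This is a sketch under stated assumptions: the function name `power` is hypothetical, the inverse is obtained from the standard closed–form expression for 2 × 2 matrices, and the sample matrix is arbitrary.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def identity(n):
    return [[Fraction(int(h == k)) for k in range(n)] for h in range(n)]

def inverse_2x2(A):
    """Closed-form inverse of a 2 x 2 matrix (hypothetical helper)."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def power(A, p):
    """Integer power of a 2 x 2 matrix: I multiplied |p| times by A (or by its inverse if p < 0)."""
    base = A if p >= 0 else inverse_2x2(A)
    result = identity(len(A))
    for _ in range(abs(p)):
        result = matmul(result, base)
    return result

A = [[2, 1], [1, 1]]
lhs = matmul(power(A, 3), power(A, -2))   # A^3 A^(-2)
```

Here `lhs` coincides with `A`, and the same result is obtained with the two factors interchanged, since powers of a matrix and of its inverse commute.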
16.2.2 The problem A X = B. Inverse matrix again
♠
In Theorem 159, p. 630, we have used the Gaussian elimination procedure to prove the existence of the inverse matrix. Here, we use the same procedure to obtain the inverse matrix $\mathsf{A}^{-1}$. Given a nonsingular $n \times n$ matrix $\mathsf{A}$, consider $m$ problems of the type addressed in Eq. 3.103, namely
$$\mathsf{A}\,\mathsf{x}_p = \mathsf{b}_p \qquad (p = 1, \dots, m). \tag{16.30}$$
Next, let us introduce the partitioned $n \times m$ matrices $\mathsf{X} = [\mathsf{x}_1 \;\dots\; \mathsf{x}_m]$ and $\mathsf{B} = [\mathsf{b}_1 \;\dots\; \mathsf{b}_m]$. Then, the equations in Eq. 16.30 may be rewritten in a more compact form as
$$\mathsf{A}\,\mathsf{X} = \mathsf{B}. \tag{16.31}$$
Left–multiplying this equation by $\mathsf{A}^{-1}$ and using $\mathsf{A}^{-1}\mathsf{A} = \mathsf{I}$, we have that the solution of the above problem is given by
$$\mathsf{X} = \mathsf{A}^{-1}\,\mathsf{B}. \tag{16.32}$$
In particular, consider the case $m = n$ and $\mathsf{B} = \mathsf{I}$. In this case, from Eq. 16.32 we have $\mathsf{X} = \mathsf{A}^{-1}$. In other words, in this case, the solution $\mathsf{X}$ coincides with $\mathsf{A}^{-1}$. This implies that a conceptual method (not necessarily the most convenient one in practice) to obtain $\mathsf{A}^{-1}$ is to solve (for instance by Gaussian elimination) a collection of systems of equations, in which the right sides are, respectively, $\mathsf{b}_1 = [1, 0, 0, \dots, 0]^{\mathsf T}$, $\mathsf{b}_2 = [0, 1, 0, \dots, 0]^{\mathsf T}$, …, $\mathsf{b}_n = [0, 0, 0, \dots, 1]^{\mathsf T}$.
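The conceptual method just described can be sketched as follows. Note the assumptions: instead of Gaussian elimination followed by back–substitution, the sketch uses the closely related Gauss–Jordan variant (elimination both below and above each pivot), applied to all the right sides at once; the function name `solve_AX_equals_B` and the sample matrix are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def identity(n):
    return [[Fraction(int(h == k)) for k in range(n)] for h in range(n)]

def solve_AX_equals_B(A, B):
    """Solve A X = B for a nonsingular square A by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | B], stored with exact rational entries.
    M = [[Fraction(x) for x in A[h]] + [Fraction(x) for x in B[h]] for h in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot and bring it to the current position.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate the current unknown from all the other rows.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]
A_inv = solve_AX_equals_B(A, identity(3))   # right sides b_1, ..., b_n form I
```

With the right sides collected in the identity matrix, the solution `A_inv` satisfies both `matmul(A, A_inv)` and `matmul(A_inv, A)` equal to the identity.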
16.3 Bases and similar matrices
In this section, we introduce the notion of basis for mathematicians’ vectors (Subsection 16.3.1), in full analogy with that formed by $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, introduced in Chapter 15 for physicists’ vectors. Then, we explore what happens if we use a different basis, and, as a result, we introduce the so–called similar matrices, as those matrices that represent the same linear transformation (namely a transformation that consists of a matrix multiplication) in different bases (Subsection 16.3.2). Finally, we define diagonalizable matrices, as those matrices that have a similar matrix that is diagonal, and discuss their properties (Subsection 16.3.3).
16.3.1 Linear vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ and their bases
We have the following
Definition 245 (Linear vector space). A collection of real (complex) mathematicians’ vectors $\mathsf{v}_k$ is called a linear vector space iff, for any given $\mathsf{v}_1$ and $\mathsf{v}_2$ in the collection, and any given $c_1$ and $c_2$ in $\mathbb{R}$ (in $\mathbb{C}$), we have that $\mathsf{v} = c_1\,\mathsf{v}_1 + c_2\,\mathsf{v}_2$ is also an element of the same collection.
As particular cases of linear vector spaces, we have the following
Definition 246 (The linear vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$). The collection of all the $n$-dimensional real (complex) mathematicians’ vectors is a linear vector space, denoted by $\mathbb{R}^n$ (by $\mathbb{C}^n$), provided of course that $c_1$ and $c_2$ are real (complex).
◦ Comment. In order to appreciate the difference between Definition 245 and Definition 246 above, consider as an example the collection of all the $n$-dimensional real mathematicians’ vectors whose last element vanishes. This satisfies the conditions in Definition 245 above, and hence is itself a linear space. However, it does not satisfy Definition 246. Nonetheless, any element of such a collection belongs to $\mathbb{R}^n$. Accordingly, we will refer to this as a subspace of $\mathbb{R}^n$. [To provide you with an example that you can visualize, consider the space of the components of the physicists’ vectors, namely $\mathbb{R}^3$. The corresponding subspace with the last element equal to zero corresponds to those three–dimensional vectors that lie in the $(x_1, x_2)$-plane.]
Typically, when we consider a linear $n$-dimensional vector space, we limit ourselves either to $\mathbb{R}^n$ (real $n$-dimensional vector space), or to $\mathbb{C}^n$ (complex $n$-dimensional vector space), or their subspaces. However, there exist other linear vector spaces. Consider, for instance, a collection of vectors whose elements are rational numbers. These form a linear vector space, provided that $c_1$ and $c_2$ in the above definition are rational. Of course, we would also have to broaden Definition 245, p. 633, of linear vector spaces, to include vectors composed of rational numbers.
[Incidentally, similar definitions apply to other mathematical entities, such as functions.] That said, in this chapter, the only spaces we are interested in are those of (real or complex) $n$-dimensional vectors, $\mathbb{R}^n$ and $\mathbb{C}^n$.
• Bases in $\mathbb{R}^n$ or $\mathbb{C}^n$
Here, we want to extend to mathematicians’ vectors the notions of bases and base vectors, introduced in Subsection 15.1.4 for physicists’ vectors. We have the following
Definition 247 (Basis). In the linear vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$, any collection of $n$ linearly independent vectors $\mathsf{q}_k$ ($k = 1, \dots, n$) is called a basis. The basis is denoted by $\{\mathsf{q}_k\}_{k=1}^{n}$, or simply by $\{\mathsf{q}_k\}$. The vectors $\mathsf{q}_k$ are called the base vectors.
For instance, consider an $n$-dimensional real (complex) vector $\mathsf{v}$ in $\mathbb{R}^n$ (in $\mathbb{C}^n$). Recall the “best–kept secret in matrix algebra” (Eq. 3.68). Then, the relationship $\mathsf{v} = \mathsf{I}\,\mathsf{v}$ may be written as
$$\mathsf{v} = \sum_{k=1}^{n} v_k\,\mathsf{e}_k, \tag{16.33}$$
where $v_k$ are real (complex) numbers, whereas $\mathsf{e}_k = \{\delta_{jk}\}$, namely
$$\mathsf{e}_1 = \begin{Bmatrix} 1 \\ 0 \\ \dots \\ 0 \end{Bmatrix}, \qquad \mathsf{e}_2 = \begin{Bmatrix} 0 \\ 1 \\ \dots \\ 0 \end{Bmatrix}, \qquad \dots, \qquad \mathsf{e}_n = \begin{Bmatrix} 0 \\ 0 \\ \dots \\ 1 \end{Bmatrix}. \tag{16.34}$$
Thus, the vectors $\mathsf{e}_k$ are a basis for $\mathbb{R}^n$ (and for $\mathbb{C}^n$). Indeed, the vectors $\mathsf{e}_k$ satisfy the definition of linearly independent mathematicians’ vectors (Definition 59, p. 101). [The vectors $\mathsf{e}_k$ are in full analogy with the physicists’ base vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, introduced in Eq. 15.10.]
Here, we want to extend to mathematicians’ vectors the notion of components introduced in Definition 231, p. 594, for physicists’ vectors. We begin with the following
Theorem 162. Given a basis $\{\mathsf{q}_k\}_{k=1}^{n}$, any vector $\mathsf{v}$ in $\mathbb{R}^n$ or $\mathbb{C}^n$ may be uniquely represented as
$$\mathsf{v} = \sum_{k=1}^{n} v'_k\,\mathsf{q}_k. \tag{16.35}$$
◦ Proof : Recalling again the “best–kept secret in matrix algebra” (Eq. 3.68), Eq. 16.35 may be written as
$$\mathsf{v} = \sum_{k=1}^{n} v'_k\,\mathsf{q}_k = \mathsf{Q}\,\mathsf{v}', \tag{16.36}$$
where $\mathsf{v}' := \{v'_h\}$, whereas
$$\mathsf{Q} := [\mathsf{q}_1, \mathsf{q}_2, \dots, \mathsf{q}_n]. \tag{16.37}$$
The columns, $\mathsf{q}_h$, of $\mathsf{Q}$ are linearly independent, by definition of basis. Hence, $\mathsf{Q}$ is nonsingular (Theorem 23, p. 117). Thus, $\mathsf{Q}\,\mathsf{v}' = \mathsf{v}$ (Eq. 16.36) has a unique solution for $\mathsf{v}'$.
Accordingly, we have the following
Definition 248 (Components in $\mathbb{R}^n$). The coefficients $v'_h$ in Eq. 16.35 are called the components of $\mathsf{v}$ in the basis formed by the linearly independent vectors $\mathsf{q}_h$.
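Finding the components of a vector in a given basis amounts to solving the linear system of Eq. 16.36 for the unknown components. A minimal two–dimensional sketch follows. The function name `components_2d`, the use of Cramer's rule (a standard formula for small systems, not introduced in this section), and the sample basis are assumptions made for illustration.

```python
from fractions import Fraction

def components_2d(q1, q2, v):
    """Components of v in the basis {q1, q2}: solve Q v' = v, with Q = [q1, q2]."""
    det = Fraction(q1[0] * q2[1] - q2[0] * q1[1])
    if det == 0:
        raise ValueError("q1 and q2 are linearly dependent, hence not a basis")
    v1 = (v[0] * q2[1] - q2[0] * v[1]) / det
    v2 = (q1[0] * v[1] - v[0] * q1[1]) / det
    return [v1, v2]

q1, q2 = [1, 1], [1, -1]    # arbitrary basis of the plane
v = [3, 1]

v_prime = components_2d(q1, q2, v)
# Rebuild v as the linear combination of the base vectors (Eq. 16.35).
rebuilt = [v_prime[0] * q1[h] + v_prime[1] * q2[h] for h in range(2)]
```

Here `v_prime` is `[2, 1]`, and `rebuilt` coincides with `v`. Two proportional vectors, such as `[1, 1]` and `[2, 2]`, are rejected, since they do not form a basis.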
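16.3.2 Similar matrices
Here, we want to address changes of basis. Specifically, we want to learn how to express in a different basis the linear transformation generated by an $n \times n$ matrix $\mathsf{A}$ in the original basis $\mathsf{e}_1, \dots, \mathsf{e}_n$ (introduced in Eq. 16.34). To this end, consider two $n$-dimensional vectors $\mathsf{b}$ and $\mathsf{c}$ that are related by a linear transformation of the type
$$\mathsf{c} = \mathsf{A}\,\mathsf{b}, \tag{16.38}$$
as well as $n$ linearly independent $n$-dimensional vectors $\mathsf{q}_h$ ($h = 1, \dots, n$). If we apply $\mathsf{v} = \mathsf{Q}\,\mathsf{v}'$, with $\mathsf{Q} = [\mathsf{q}_1, \dots, \mathsf{q}_n]$ (Eq. 16.36), to both $\mathsf{b}$ and $\mathsf{c}$, so as to have $\mathsf{b} = \mathsf{Q}\,\mathsf{b}'$ and $\mathsf{c} = \mathsf{Q}\,\mathsf{c}'$, we obtain $\mathsf{Q}\,\mathsf{c}' = \mathsf{A}\,\mathsf{Q}\,\mathsf{b}'$, namely
$$\mathsf{c}' = \mathsf{A}'\,\mathsf{b}', \tag{16.39}$$
where
$$\mathsf{A}' = \mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}. \tag{16.40}$$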
[Hint: As stated in the definition above, the fact that the columns $\mathsf{q}_h$ of $\mathsf{Q}$ form a basis implies that they are linearly independent vectors. Therefore, the matrix $\mathsf{Q}$ is nonsingular (Theorem 23, p. 117), and hence invertible (Theorem 159, p. 630).]
Matrices that are connected by a transformation of the type given in Eq. 16.40 deserve a special name. Accordingly, we have the following
Definition 249 (Similar matrices). Two square matrices $\mathsf{A}$ and $\mathsf{A}'$ are called similar iff there exists a nonsingular matrix $\mathsf{Q}$ such that $\mathsf{A}' = \mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}$ (Eq. 16.40).
Remark 129. From Eq. 16.39, we see that the matrix $\mathsf{A}'$ has the same role in the new basis $\{\mathsf{q}_h\}$, as $\mathsf{A}$ had in the original basis $\{\mathsf{e}_h\}$. In other words, we can say that similar matrices represent the same linear transformation within different bases.
We have the following
Theorem 163. Consider two similar matrices $\mathsf{A}$ and $\mathsf{A}'$, with $\mathsf{A}' = \mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}$. We have
$$(\mathsf{A}')^{k} = \mathsf{Q}^{-1}\mathsf{A}^{k}\,\mathsf{Q} \quad\text{and}\quad \mathsf{A}^{k} = \mathsf{Q}\,(\mathsf{A}')^{k}\,\mathsf{Q}^{-1}. \tag{16.41}$$
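◦ Proof : Consider the first equation above (the second equation is fully equivalent to the first, as you may verify). Let us use the principle of mathematical induction (Subsection 8.2.5). For the case $k = 2$, from $\mathsf{A}' = \mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}$, we have
$$(\mathsf{A}')^{2} = (\mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q})\,(\mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}) = \mathsf{Q}^{-1}\mathsf{A}^{2}\,\mathsf{Q}. \tag{16.42}$$
On the other hand, for a generic $k$, we have
$$(\mathsf{A}')^{k} = (\mathsf{A}')^{k-1}\,\mathsf{A}' = (\mathsf{Q}^{-1}\mathsf{A}^{k-1}\,\mathsf{Q})\,(\mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q}) = \mathsf{Q}^{-1}\mathsf{A}^{k}\,\mathsf{Q}. \tag{16.43}$$
[Hint: For the second equality, use the premise in Item 2 of the principle of mathematical induction.]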
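The change of basis of Eqs. 16.39 and 16.40, and the result of Theorem 163, can be illustrated with integer matrices. The sketch below makes several assumptions: the matrix `Q` is chosen with unit determinant, so that its inverse (written out by hand) has integer elements; the helper names are hypothetical; and the data are arbitrary.

```python
def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def matvec(A, x):
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def power(A, k):
    """k-th power of A for a natural number k."""
    result = [[int(h == j) for j in range(len(A))] for h in range(len(A))]
    for _ in range(k):
        result = matmul(result, A)
    return result

Q = [[1, 1], [1, 2]]          # columns are the new base vectors q1, q2
Q_inv = [[2, -1], [-1, 1]]    # inverse of Q, written out by hand
A = [[1, 2], [3, 4]]

A_prime = matmul(matmul(Q_inv, A), Q)      # Eq. 16.40

b_prime = [1, -2]                          # components of b in the new basis
b = matvec(Q, b_prime)                     # b = Q b'
c = matvec(A, b)                           # c = A b
c_prime = matvec(Q_inv, c)                 # components of c in the new basis
```

Here `c_prime` coincides with `matvec(A_prime, b_prime)`, namely the similar matrix acts on the new components exactly as `A` acts on the original ones. In addition, `power(A_prime, 3)` equals `matmul(matmul(Q_inv, power(A, 3)), Q)`, in agreement with Eq. 16.41.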
16.3.3 Diagonalizable matrices
We have introduced diagonal matrices $\mathsf{D} = \mathrm{Diag}\,[d_{kk}] = [d_{kk}\,\delta_{hk}]$ (Eq. 16.9). Here, we want to introduce a terminology to identify those matrices that may be transformed into a diagonal one through a change of basis. Specifically, we have the following
Definition 250 (Diagonalizable matrix). A square matrix $\mathsf{A}$ is called diagonalizable iff there exists a matrix $\mathsf{Q}$ such that
$$\mathsf{Q}^{-1}\mathsf{A}\,\mathsf{Q} = \mathsf{D}, \tag{16.44}$$
where
$$\mathsf{D} = \mathrm{Diag}\,[d_{kk}] = [d_{kk}\,\delta_{hk}] \tag{16.45}$$
is a diagonal matrix, similar to $\mathsf{A}$. The matrix $\mathsf{Q}$ will be referred to as the diagonalizing matrix. The basis $\{\mathsf{q}_h\}$ (columns of $\mathsf{Q}$) will be referred to as the diagonalizing basis. The matrix $\mathsf{D}$ will be referred to as the diagonal form of the matrix $\mathsf{A}$ (or $\mathsf{A}$ in diagonal form).
In this case, Eq. 16.39 reduces to
$$\mathsf{c}' = \mathsf{D}\,\mathsf{b}', \tag{16.46}$$
or (use Eq. 16.10)
$$c'_k = d_{kk}\,b'_k. \tag{16.47}$$
In plain language, if $\mathsf{A}$ is diagonalizable, in the diagonalizing basis the matrix–by–vector multiplication $\mathsf{c} = \mathsf{A}\,\mathsf{b}$ consists in multiplying the $k$-th component of the vector by $d_{kk}$ (see Eq. 16.10 and the paragraph that follows). In other words, we have extended from diagonal matrices to diagonalizable matrices the fact that the operation of matrix–by–vector multiplication is particularly simple to visualize.
16.3.4 Eigenvalue problem in linear algebra
♥
Diagonalizable matrices are a very powerful tool and are broadly used in this book. Accordingly, it seems appropriate to spend a few words on the solvability of Eq. 16.44. This may be written as $\mathsf{A}\,\mathsf{Q} = \mathsf{Q}\,\mathsf{D}$, where $\mathsf{D}$ is diagonal. In indicial notation, we have (set $\lambda_j = d_{jj}$)
$$\sum_{j=1}^{n} a_{hj}\,q_{jk} = \sum_{j=1}^{n} q_{hj}\,\lambda_j\,\delta_{jk} = \lambda_k\,q_{hk}, \tag{16.48}$$
namely $\mathsf{A}\,\mathsf{q}_k = \lambda_k\,\mathsf{q}_k$, with $\mathsf{q}_k = \{q_{hk}\}$. Let us consider the related problem
$$\mathsf{A}\,\mathsf{q} = \lambda\,\mathsf{q}. \tag{16.49}$$
This is a homogeneous system of linear algebraic equations, which admits one or more nontrivial solutions $\mathsf{q}_k$ for any $\lambda = \lambda_k$ such that $\mathsf{A} - \lambda_k\,\mathsf{I}$ is singular. If we find $n$ linearly independent vectors $\mathsf{q}_k$, then the matrix $\mathsf{Q} = [\mathsf{q}_1, \mathsf{q}_2, \dots, \mathsf{q}_n]$ (Eq. 16.37) is nonsingular and the matrix $\mathsf{A}$ is diagonalizable. Equation 16.49 is known as the eigenvalue problem in linear algebra. The $\lambda_k$’s and the $\mathsf{q}_k$’s are called respectively the eigenvalues and the eigenvectors of $\mathsf{A}$. [An illustrative example of matrix diagonalization is presented in Subsection 23.2.4, on the principal axes of inertia, for the limited case of a symmetric 3 × 3 matrix. The general theory behind the eigenvalue problem in linear algebra is addressed in Vol. II.]
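A small example may help to connect Eqs. 16.44 and 16.49. In the sketch below, the symmetric 2 × 2 matrix and its eigenvalue–eigenvector pairs are stated up front (they are an assumed example, not computed by the code), and the code only verifies them and assembles the diagonalizing matrix. The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix-by-matrix product, with matrices stored as lists of rows."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def matvec(A, x):
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

A = [[2, 1], [1, 2]]

# Assumed eigenvalue/eigenvector pairs of A (checked below, not derived here).
eigenpairs = [(1, [1, -1]), (3, [1, 1])]

# Diagonalizing matrix: its columns are the eigenvectors.
Q = [[q[h] for (_, q) in eigenpairs] for h in range(2)]
half = Fraction(1, 2)
Q_inv = [[half, -half], [half, half]]      # inverse of Q, written out by hand

D = matmul(matmul(Q_inv, A), Q)            # diagonal form of A (Eq. 16.44)
```

Each pair satisfies `matvec(A, q) == [lam * x for x in q]`, and `D` turns out to be the diagonal matrix with the eigenvalues 1 and 3 on its main diagonal.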
16.4 Orthogonality in $\mathbb{R}^n$
In this section, we introduce orthogonal and orthonormal mathematicians’ vectors, as well as orthogonal matrices, in the real field. [The extension to $\mathbb{C}^n$ is presented in the next section.]
• Orthogonal vectors. Orthogonal basis Recall the definition of magnitude of a real mathematicians’ vector (Eq. 3.61), namely Definition 251 (Magnitude of a real vector. Unit vector). Given any real n-dimensional vector b, its magnitude (or norm), denoted by the symbol b, is defined by $ % n √ %# T b := b b = & |bk |2 ≥ 0, (16.50) k=1
where the equal sign holds if, and only if, b = 0. A real vector with magnitude equal to one will be referred to as a unit vector. Recall that two physicists’ vectors, say a and b, are called orthogonal "3 iff a · b = k=1 ak bk = 0 (Eq. 15.45). Here, we exploit the second equality to extrapolate to mathematicians’ vectors the definition of orthogonality for physicists’ vector. Accordingly, we have the following Definition 252 (Orthogonality for real n-dimensional vectors). Two real n-dimensional vectors, a and b in Rn , are said to be orthogonal iff n #
aT b =
ah bh = 0.
(16.51)
h=1
Remark 130. Do not even think of visualizing two orthogonal mathematicians’ vectors as being perpendicular to each other. Here, the term “orthogonal” simply means that the vectors satisfy Eq. 16.51. The term is just “borrowed” from the field of physicists’ vectors. Orthogonality for mathematicians’ vectors is only a convenient extrapolation of orthogonality for physicists’ vectors (Eq. 15.45, namely Eq. 16.51 with $n = 3$). Similar considerations apply when we define

$$\theta := \cos_{\mathrm P}^{-1} \frac{\mathbf a^{\mathsf T}\mathbf b}{\|\mathbf a\|\,\|\mathbf b\|} \in [0, \pi], \tag{16.52}$$

by “borrowing” Eq. 15.46 on the angle between two physicists’ vectors.

We have the following

Theorem 164 (Orthogonality implies linear independence). If $p \le n$ nonzero real vectors $\mathbf b_h$ ($h = 1, \dots, p$) are mutually orthogonal, they are also linearly independent.

◦ Proof: To say that the $p$ nonzero vectors $\mathbf b_h$ are linearly independent is equivalent to saying that, if

$$\sum_{k=1}^{p} \alpha_k \mathbf b_k = \mathbf 0, \tag{16.53}$$

then, necessarily, $\alpha_k = 0$ ($k = 1, \dots, p \le n$). In our case, left–multiplying Eq. 16.53 by $\mathbf b_h^{\mathsf T}$, and using $\mathbf b_h^{\mathsf T}\mathbf b_k = 0$ for $h \ne k$ (orthogonality of the $\mathbf b_k$), one obtains

$$\mathbf b_h^{\mathsf T} \sum_{k=1}^{p} \alpha_k \mathbf b_k = \sum_{k=1}^{p} \alpha_k\, \mathbf b_h^{\mathsf T}\mathbf b_k = \alpha_h\, \mathbf b_h^{\mathsf T}\mathbf b_h = \alpha_h \|\mathbf b_h\|^2 = 0. \tag{16.54}$$

This implies $\alpha_h = 0$, because $\|\mathbf b_h\|^2 > 0$, since $\mathbf b_h \ne \mathbf 0$ by hypothesis.
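The definitions of magnitude (Eq. 16.50), orthogonality (Eq. 16.51), and angle (Eq. 16.52) translate directly into a few lines of code. The sketch below is only an illustration in $\mathbb R^4$, where no geometric picture is available; the two vectors and the function names are assumptions made for the example.

```python
import math

def dot(a, b):
    """a^T b for two real n-dimensional vectors (Eq. 16.51)."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Magnitude of a real vector (Eq. 16.50)."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle in [0, pi] between two nonzero vectors (Eq. 16.52)."""
    c = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against round-off before taking the arccosine.
    return math.acos(max(-1.0, min(1.0, c)))

# Two vectors in R^4, assumed for the example.
a = [1.0, 1.0, 1.0, 1.0]
b = [1.0, -1.0, 1.0, -1.0]

print(dot(a, b))     # vanishes: a and b are orthogonal in the sense of Eq. 16.51
print(norm(a))
print(angle(a, b))
```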
We have the following

Definition 253 (Orthonormal vectors). Two or more real vectors $\mathbf b_h$ are called orthonormal iff

$$\mathbf b_h^{\mathsf T}\mathbf b_k = \delta_{hk}, \tag{16.55}$$

namely iff (i) they are mutually orthogonal, and (ii) they have been “normalized” by the condition $\|\mathbf b_h\| = 1$ (unit vector). [The comments in Definition 233, p. 594, on the meaning of the term “normalized,” apply here as well.]

We have the following

Definition 254 (Orthonormal basis). A basis $\{\mathbf r_h\}$ is called orthonormal iff its base vectors are orthonormal, namely iff

$$\mathbf r_h^{\mathsf T}\mathbf r_k = \delta_{hk}. \tag{16.56}$$

For instance, the base vectors $\mathbf e_h$ in Eq. 16.34 are orthonormal, as you may verify.
16.4.1 Projections vs components in $\mathbb R^n$

Let us introduce the following

Definition 255 (Projection in $\mathbb R^n$). Consider an $n$-dimensional real vector $\mathbf b$. Let $\mathbf e$ be a real $n$-dimensional unit vector. The scalar

$$b_e^{\mathrm P} := \mathbf b^{\mathsf T}\mathbf e \tag{16.57}$$

is called the projection of the vector $\mathbf b$ in the direction of the unit vector $\mathbf e$. [Recall the projection for physicists’ vectors, $b_e^{\mathrm P} = \mathbf b \cdot \mathbf e$ (Eq. 15.19), and Remark 130, p. 638, on “borrowing” the term from physicists’ vectors.]

Recall that the coefficients $b_k$ in the expression

$$\mathbf b = \sum_{k=1}^{n} b_k\, \mathbf r_k \tag{16.58}$$

are called the components of $\mathbf b$ with respect to the basis $\{\mathbf r_k\}$ (Definition 248, p. 635). Then, we have the following

Theorem 165 (Projections vs components in orthonormal bases). For a real orthonormal basis $\{\mathbf r_h\}$, the components $b_h$ of a vector $\mathbf b$ coincide with the projections $b_h^{\mathrm P} := \mathbf b^{\mathsf T}\mathbf r_h$ of $\mathbf b$ in the direction of $\mathbf r_h$ (Eq. 16.57), namely

$$b_h = \mathbf b^{\mathsf T}\mathbf r_h. \tag{16.59}$$

◦ Proof: Indeed,

$$\mathbf b^{\mathsf T}\mathbf r_h = \sum_{k=1}^{n} b_k\, \mathbf r_k^{\mathsf T}\mathbf r_h = \sum_{k=1}^{n} b_k\, \delta_{kh} = b_h, \tag{16.60}$$

in agreement with Eq. 16.59.

◦ Comment. This is an extension of Theorem 148, p. 600, on the expression for the components of a three–dimensional physicists’ vector $\mathbf v$ with respect to an orthonormal basis, $\mathbf i_1$, $\mathbf i_2$, $\mathbf i_3$. Of course (as in Theorem 148, p. 600), this equality holds only for orthonormal bases. For future reference, note that Eq. 16.58 may be written as (use Eq. 16.59)

$$\mathbf b = \sum_{k=1}^{n} (\mathbf r_k^{\mathsf T}\mathbf b)\, \mathbf r_k, \tag{16.61}$$

in full analogy with Eq. 15.31 for physicists’ vectors.
From now on, we will use the same symbol $b_h$ for components and projections, provided of course that we are dealing with an orthonormal basis.
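The statement that projections coincide with components only for orthonormal bases (Theorem 165 and Eq. 16.61) can be checked numerically. In the sketch below, the vector `b`, the orthonormal basis (built from a 3–4–5 triangle), and the non–orthonormal basis `skewed` are all assumptions chosen for the example.

```python
def dot(a, b):
    """a^T b for two real vectors."""
    return sum(x * y for x, y in zip(a, b))

def recombine(coeffs, basis):
    """Return sum_k coeffs[k] * basis[k], as in Eq. 16.58."""
    n = len(basis[0])
    return [sum(c * r[i] for c, r in zip(coeffs, basis)) for i in range(n)]

b = [2.0, 1.0]
orthonormal = [[0.6, 0.8], [-0.8, 0.6]]   # r_1, r_2 with r_h^T r_k = delta_hk
skewed = [[1.0, 0.0], [1.0, 1.0]]         # a basis that is not orthonormal

# Projections b^T r_h (Eq. 16.57) onto each basis.
proj_on = [dot(b, r) for r in orthonormal]
proj_sk = [dot(b, r) for r in skewed]

# Eq. 16.61 rebuilds b only when the basis is orthonormal.
print(recombine(proj_on, orthonormal))
print(recombine(proj_sk, skewed))
```

With the orthonormal basis the recombination returns `b` (up to round-off); with the skewed basis it does not, so there the projections cannot be used as components.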
16.4.2 Orthogonal matrices in $\mathbb R^n$

Consider the matrix

$$\mathbf R := [\mathbf r_1\ \mathbf r_2\ \dots\ \mathbf r_n], \tag{16.62}$$

with $\mathbf r_h$ satisfying Eq. 16.56. Equation 16.56 may be expressed as $\mathbf R^{\mathsf T}\mathbf R = \mathbf I$. Therefore, we have (recall that $\mathbf B\,\mathbf A = \mathbf I$ implies $\mathbf A\,\mathbf B = \mathbf I$, Lemma 14, p. 629)

$$\mathbf R^{\mathsf T}\mathbf R = \mathbf R\,\mathbf R^{\mathsf T} = \mathbf I. \tag{16.63}$$

Matrices that satisfy the above relationship deserve a name. Accordingly, let us introduce the following

Definition 256 (Orthogonal matrix). A real matrix that satisfies Eq. 16.63 is called an orthogonal matrix.

◦ Comment. Note that the use of the term “orthogonal matrix” is standard. Even though the term “orthonormal matrix” would make more sense to me, I will stick with the standard terminology.

We have the following theorems:

Theorem 166 (Inverse of orthogonal matrices). The inverse of an orthogonal matrix coincides with its transpose:

$$\mathbf R^{-1} = \mathbf R^{\mathsf T}. \tag{16.64}$$

Vice versa, any matrix satisfying the above equation is orthogonal.

◦ Proof: The first part of the theorem is an immediate consequence of Eq. 16.63 and of the definition of inverse matrix (Definition 243, p. 629). For the second part, combining $\mathbf R\,\mathbf R^{-1} = \mathbf R^{-1}\mathbf R = \mathbf I$ (Eqs. 16.20 and 16.21) with Eq. 16.64, one obtains Eq. 16.63.

Theorem 167. The product of orthogonal matrices is still an orthogonal matrix.

◦ Proof: Indeed, we have

$$(\mathbf A\,\mathbf B)^{-1} = \mathbf B^{-1}\mathbf A^{-1} = \mathbf B^{\mathsf T}\mathbf A^{\mathsf T} = (\mathbf A\,\mathbf B)^{\mathsf T}. \tag{16.65}$$
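[Hint: Use $(\mathbf A\,\mathbf B)^{-1} = \mathbf B^{-1}\mathbf A^{-1}$ (Eq. 16.27) and $(\mathbf A\,\mathbf B)^{\mathsf T} = \mathbf B^{\mathsf T}\mathbf A^{\mathsf T}$ (Eq. 3.53), as well as $\mathbf A^{-1} = \mathbf A^{\mathsf T}$ (Eq. 16.64).]

Theorem 168 (Columns and rows of orthogonal matrices). A matrix is orthogonal if, and only if, its columns are mutually orthonormal. The same holds true for the rows.

◦ Proof: For the columns use $\mathbf r_h^{\mathsf T}\mathbf r_k = \delta_{hk}$ (Eq. 16.56), namely $\mathbf R^{\mathsf T}\mathbf R = \mathbf I$ (Eq. 16.63). For the rows use $\mathbf R\,\mathbf R^{\mathsf T} = \mathbf I$ (Eq. 16.63 again), which implies $\breve{\mathbf r}_h^{\mathsf T}\breve{\mathbf r}_k = \delta_{hk}$ (with $\breve{\mathbf r}_h^{\mathsf T}$ denoting the $h$-th row of $\mathbf R$), and vice versa.

◦ Comment. Let us go back to the matrix $\mathbf R$, for orthogonal transformations of physicists’ vectors, namely (Eq. 15.106)

$$\mathbf R := [r_{hk}] = [\mathbf i_h \cdot \mathbf i'_k] = \begin{bmatrix} \mathbf i_1 \cdot \mathbf i'_1 & \mathbf i_1 \cdot \mathbf i'_2 & \mathbf i_1 \cdot \mathbf i'_3 \\ \mathbf i_2 \cdot \mathbf i'_1 & \mathbf i_2 \cdot \mathbf i'_2 & \mathbf i_2 \cdot \mathbf i'_3 \\ \mathbf i_3 \cdot \mathbf i'_1 & \mathbf i_3 \cdot \mathbf i'_2 & \mathbf i_3 \cdot \mathbf i'_3 \end{bmatrix}. \tag{16.66}$$

Its columns are the components of the vectors $\mathbf i'_k$, which are mutually orthogonal. Similarly, its rows are the components of the vectors $\mathbf i_h$, which are also mutually orthogonal. Therefore, the matrix $\mathbf R$ in Eq. 16.66 is orthogonal (Theorem 168 above), namely $\mathbf R^{\mathsf T}\mathbf R = \mathbf R\,\mathbf R^{\mathsf T} = \mathbf I$ (Eq. 16.63).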
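Definition 256 and Theorem 167 lend themselves to a quick numerical check. The following sketch tests Eq. 16.63 directly; the matrices `R1` (a rotation built from a 3–4–5 triangle), `R2` (an interchange of the two axes), and `S` (a shear, included as a counterexample) are assumptions made for the example.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix-by-matrix product A B."""
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def is_orthogonal(R, tol=1e-12):
    """Check R^T R = R R^T = I (Eq. 16.63) within a tolerance."""
    n = len(R)
    for P in (matmul(transpose(R), R), matmul(R, transpose(R))):
        for h in range(n):
            for k in range(n):
                if abs(P[h][k] - (1.0 if h == k else 0.0)) > tol:
                    return False
    return True

R1 = [[0.6, -0.8], [0.8, 0.6]]   # rotation
R2 = [[0.0, 1.0], [1.0, 0.0]]    # interchange of the two axes
S = [[1.0, 1.0], [0.0, 1.0]]     # shear: nonsingular, but not orthogonal

print(is_orthogonal(R1), is_orthogonal(R2))
print(is_orthogonal(matmul(R1, R2)))   # product of orthogonal matrices
print(is_orthogonal(S))
```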
16.4.3 Orthogonal transformations

We have the following

Definition 257 (Orthogonal transformation). A transformation obtained through an orthogonal matrix is called an orthogonal transformation.

We have the following theorems:

Theorem 169. The magnitude of a mathematicians’ real vector is invariant under orthogonal transformations.

◦ Proof: Recall the definition of magnitude of a real vector, $\|\mathbf a\| = \sqrt{\mathbf a^{\mathsf T}\mathbf a}$ (Eq. 16.50). Given $\mathbf a$, as well as (see Eq. 16.36)

$$\mathbf a' = \mathbf R^{\mathsf T}\mathbf a, \tag{16.67}$$

where $\mathbf R$ is an orthogonal matrix, we have

$$\|\mathbf a'\|^2 = \mathbf a'^{\mathsf T}\mathbf a' = \mathbf a^{\mathsf T}\mathbf R\,\mathbf R^{\mathsf T}\mathbf a = \mathbf a^{\mathsf T}\mathbf a = \|\mathbf a\|^2, \tag{16.68}$$

in agreement with the theorem.

Theorem 170. Orthogonal vectors subject to an orthogonal transformation remain orthogonal.

◦ Proof: Given two real vectors $\mathbf a$ and $\mathbf b$, consider also the vectors $\mathbf a' = \mathbf R^{\mathsf T}\mathbf a$ and $\mathbf b' = \mathbf R^{\mathsf T}\mathbf b$ (Eq. 16.67). By hypothesis, $\mathbf a$ and $\mathbf b$ are orthogonal vectors, so that $\mathbf a^{\mathsf T}\mathbf b = 0$ (Eq. 16.51). Then, we have

$$\mathbf a'^{\mathsf T}\mathbf b' = \mathbf a^{\mathsf T}\mathbf R\,\mathbf R^{\mathsf T}\mathbf b = \mathbf a^{\mathsf T}\mathbf b = 0, \tag{16.69}$$

in agreement with the theorem.
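Of course, the last two theorems hold if, and only if, the transformation is orthogonal. [Hint: Use Eq. 16.68, with $\mathbf R$ replaced by $\mathbf Q$, where $\mathbf Q^{\mathsf T}\mathbf Q \ne \mathbf I$ (non–orthogonal matrix).] Orthogonal transformations are not only magnitude invariant. The angle $\theta$ between two vectors is also invariant. For, Eq. 16.52 combined with Eq. 16.68 and $\mathbf a'^{\mathsf T}\mathbf b' = \mathbf a^{\mathsf T}\mathbf b$ (Eq. 16.69) yields

$$\theta' = \cos_{\mathrm P}^{-1} \frac{\mathbf a'^{\mathsf T}\mathbf b'}{\|\mathbf a'\|\,\|\mathbf b'\|} = \cos_{\mathrm P}^{-1} \frac{\mathbf a^{\mathsf T}\mathbf b}{\|\mathbf a\|\,\|\mathbf b\|} = \theta. \tag{16.70}$$

Finally, for future reference, for orthogonal transformations, the relationship between similar matrices (Eq. 16.40) reduces to

$$\mathbf A' = \mathbf R^{\mathsf T}\mathbf A\,\mathbf R. \tag{16.71}$$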
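The invariance of magnitude and angle under $\mathbf a' = \mathbf R^{\mathsf T}\mathbf a$ (Eqs. 16.67 to 16.70) may be verified numerically. The sketch below is illustrative only: the orthogonal matrix `R`, the non–orthogonal matrix `Q`, and the vectors `a` and `b` are assumptions chosen for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle of Eq. 16.52, with clamping against round-off."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))

def transform(R, a):
    """Return a' = R^T a (Eq. 16.67)."""
    n = len(a)
    return [sum(R[j][h] * a[j] for j in range(n)) for h in range(n)]

R = [[0.6, -0.8], [0.8, 0.6]]    # orthogonal
Q = [[1.0, 1.0], [0.0, 1.0]]     # not orthogonal
a = [1.0, 2.0]
b = [3.0, -1.0]

a1, b1 = transform(R, a), transform(R, b)
print(norm(a), norm(a1))              # equal magnitudes (Eq. 16.68)
print(angle(a, b), angle(a1, b1))     # equal angles (Eq. 16.70)
print(norm(transform(Q, a)))          # magnitude changed by the non-orthogonal Q
```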
16.4.4 Gram–Schmidt procedure in $\mathbb R^n$

In Theorem 164, p. 639, we have shown that orthogonal vectors are always linearly independent. Of course, the reverse is not true: linearly independent vectors are not necessarily orthogonal. However, akin to physicists’ vectors, given a set of $p \le n$ linearly independent $n$-dimensional vectors, it is always possible to construct from them a set of $p$ orthonormal vectors, by using an extension to $\mathbb R^n$ of the Gram–Schmidt procedure used for physicists’ vectors in Section 15.6. The algorithm introduced here will be referred to as the Gram–Schmidt procedure in $\mathbb R^n$. Specifically, consider a set of $p \le n$ linearly independent $n$-dimensional real vectors, $\mathbf a_k$. Then, set

$$\mathbf e_1 = \mathbf a_1 / \|\mathbf a_1\| \tag{16.72}$$

and

$$
\begin{aligned}
\mathbf b_2 &= \mathbf a_2 - (\mathbf a_2^{\mathsf T}\mathbf e_1)\, \mathbf e_1; &\qquad \mathbf e_2 &= \mathbf b_2 / \|\mathbf b_2\|; \\
&\;\;\vdots & &\;\;\vdots \\
\mathbf b_h &= \mathbf a_h - \sum_{k=1}^{h-1} (\mathbf a_h^{\mathsf T}\mathbf e_k)\, \mathbf e_k; &\qquad \mathbf e_h &= \mathbf b_h / \|\mathbf b_h\|; \\
&\;\;\vdots & &\;\;\vdots \\
\mathbf b_p &= \mathbf a_p - \sum_{k=1}^{p-1} (\mathbf a_p^{\mathsf T}\mathbf e_k)\, \mathbf e_k; &\qquad \mathbf e_p &= \mathbf b_p / \|\mathbf b_p\|.
\end{aligned}
\tag{16.73}
$$

In analogy with the results in Section 15.6, for every $\mathbf e_k$ we have $\|\mathbf e_k\| = 1$. In addition, every $\mathbf b_h$ (and hence every $\mathbf e_h$) is orthogonal to all the $\mathbf e_k$’s with $k < h$, as you may verify. [Hint: For instance, right–multiplying $\mathbf b_2^{\mathsf T}$ by $\mathbf e_1$ yields $\mathbf b_2^{\mathsf T}\mathbf e_1 = [\mathbf a_2 - (\mathbf a_2^{\mathsf T}\mathbf e_1)\, \mathbf e_1]^{\mathsf T}\mathbf e_1 = \mathbf a_2^{\mathsf T}\mathbf e_1 - \mathbf a_2^{\mathsf T}\mathbf e_1 = 0$.] Therefore, the above procedure yields a set of $p$ orthonormal vectors. Note that, in Eq. 16.73, we have necessarily $\mathbf b_h \ne \mathbf 0$ ($h = 1, \dots, p \le n$), because, if we didn’t, the vectors $\mathbf a_1, \dots, \mathbf a_h$ would be linearly dependent, contrary to our assumption, as you may verify. Finally, if the number $p$ of linearly independent vectors equals the dimension $n$ of the space, we have generated an orthonormal basis for the space $\mathbb R^n$.

◦ Comment. Note that, akin to physicists’ vectors, if we change the sign of one of the vectors $\mathbf e_k$, the set is still orthonormal. In other words, the orthonormal set is not uniquely defined.
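The procedure of Eqs. 16.72 and 16.73 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `gram_schmidt`, the tolerance `tol` used to detect linear dependence, and the three input vectors are all hypothetical choices made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize linearly independent real vectors (Eqs. 16.72-16.73)."""
    basis = []
    for a in vectors:
        b = list(a)
        for e in basis:
            coeff = dot(a, e)                       # a_h^T e_k
            b = [bi - coeff * ei for bi, ei in zip(b, e)]
        nb = math.sqrt(dot(b, b))
        if nb <= tol:
            # b_h = 0 would mean a_1, ..., a_h are linearly dependent.
            raise ValueError("input vectors are linearly dependent")
        basis.append([bi / nb for bi in b])         # e_h = b_h / ||b_h||
    return basis

vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
e = gram_schmidt(vectors)
for h in range(3):
    print([round(dot(e[h], e[k]), 12) for k in range(3)])
```

The printed rows approximate the rows of the identity matrix, namely $\mathbf e_h^{\mathsf T}\mathbf e_k = \delta_{hk}$. In floating-point arithmetic, subtracting the projections from the running vector `b` one at a time (the so-called modified Gram–Schmidt variant) is often preferred for accuracy; the version above follows Eq. 16.73 literally.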
16.5 Complex matrices
♣
In this section, we deal with complex matrices, namely matrices that operate in $\mathbb C^n$. This section parallels, almost word by word, the preceding one. For complex matrices and vectors, we have the following

Definition 258 (Conjugate transpose matrices and vectors). Given any $m \times n$ matrix $\mathbf C$, its conjugate transpose (or adjoint), denoted by $\mathbf C^\dagger$, is the transpose of the complex–conjugate of $\mathbf C$, namely $\mathbf C^\dagger := \overline{\mathbf C}^{\mathsf T}$. In other words, $\mathbf C^\dagger$ is the $n \times m$ matrix obtained by taking the transpose of $\mathbf C$, and then the complex conjugate of the result. In terms of the elements $c^\dagger_{kh}$ of $\mathbf C^\dagger$, we have $c^\dagger_{kh} = \overline{c}_{hk}$. Similarly, given an $n$-dimensional vector $\mathbf c$, its conjugate transpose, denoted by $\mathbf c^\dagger$, is the transpose of the complex conjugate of $\mathbf c$, namely

$$\mathbf c^\dagger := \overline{\mathbf c}^{\mathsf T}. \tag{16.74}$$
16.5.1 Orthonormality in $\mathbb C^n$
♣
Here, we extend to $\mathbb C^n$ the results of the preceding section, which are valid only in the real field. For a real vector, its magnitude is defined by $\|\mathbf b\| := \sqrt{\mathbf b^{\mathsf T}\mathbf b}$ (Eq. 16.50). For complex vectors, the definition is slightly different. We have the following definitions:

Definition 259 (Magnitude of a complex vector. Unit vector). Given any complex vector $\mathbf c$, its magnitude (or norm) is defined by (see Eq. 1.90)

$$\|\mathbf c\| := \sqrt{\mathbf c^\dagger \mathbf c} = \sqrt{\sum_{k=1}^{n} |c_k|^2} \;\ge\; 0, \tag{16.75}$$

where the equal sign holds iff $\mathbf c = \mathbf 0$. [Of course, Eq. 16.75 includes the equivalent for real vectors (Eq. 16.50), as a particular case.] A complex vector with magnitude equal to one will be referred to as a unit vector.

Definition 260 (Orthogonality for complex n-dimensional vectors). We say that two complex $n$-dimensional vectors, $\mathbf b$ and $\mathbf c$ in $\mathbb C^n$, are orthogonal iff

$$\mathbf b^\dagger \mathbf c = 0. \tag{16.76}$$

Remark 131. In Eq. 16.75, we take the conjugate transpose of the first vector because we want the result of the operation to be positive for any $\mathbf c \ne \mathbf 0$, as expected in the definition of the magnitude of a complex vector. [Just to give you an example, consider the vector $\mathbf c := [\imath, \imath]^{\mathsf T}$. In this case, had we used $\|\mathbf c\|^2 := \mathbf c^{\mathsf T}\mathbf c$, we would have had $\mathbf c^{\mathsf T}\mathbf c = \imath^2 + \imath^2 = -1 - 1 < 0$.] Similarly, had we used $\mathbf b^{\mathsf T}\mathbf c = 0$ instead of Eq. 16.76 for the orthogonality condition, we would have had the undesirable consequence that we could have nonzero vectors orthogonal to themselves. [Just to give you an example, consider the vector $\mathbf c := [1, \imath]^{\mathsf T}$. In this case, we would have had $\mathbf c^{\mathsf T}\mathbf c = 1 + \imath^2 = 1 - 1 = 0$.]

Then, we have the following

Theorem 171 (Orthogonality implies linear independence). If the nonzero complex vectors $\mathbf c_h$ ($h = 1, \dots, p \le n$) are orthogonal, they are also linearly independent.

◦ Proof: To say that the nonzero vectors $\mathbf c_h$ are linearly independent is equivalent to saying that, if

$$\sum_{k=1}^{p} \gamma_k \mathbf c_k = \mathbf 0, \tag{16.77}$$

then, necessarily, $\gamma_k = 0$ ($k = 1, \dots, p \le n$). Indeed, right–multiplying the conjugate transpose of Eq. 16.77 by $\mathbf c_h$, and using $\mathbf c_k^\dagger \mathbf c_h = 0$ for $h \ne k$ (orthogonality of the $\mathbf c_h$), one obtains

$$\sum_{k=1}^{p} \overline{\gamma}_k\, \mathbf c_k^\dagger \mathbf c_h = \overline{\gamma}_h\, \mathbf c_h^\dagger \mathbf c_h = \overline{\gamma}_h \|\mathbf c_h\|^2 = 0, \tag{16.78}$$

which implies $\gamma_h = 0$, since $\|\mathbf c_h\|^2 > 0$ (Eq. 16.75).
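The two examples in Remark 131 are easy to reproduce with Python's built-in complex numbers. In the sketch below, the function names `bilinear`, `inner`, and `magnitude` are assumptions introduced for the example; `inner` implements $\mathbf b^\dagger \mathbf c$ (Eq. 16.76), whereas `bilinear` implements $\mathbf b^{\mathsf T}\mathbf c$ without conjugation.

```python
import math

def bilinear(b, c):
    """b^T c, with no complex conjugation."""
    return sum(x * y for x, y in zip(b, c))

def inner(b, c):
    """b^dagger c (Eq. 16.76): the first vector is conjugated."""
    return sum(x.conjugate() * y for x, y in zip(b, c))

def magnitude(c):
    """Magnitude of a complex vector (Eq. 16.75)."""
    return math.sqrt(inner(c, c).real)

c1 = [1j, 1j]        # first example of Remark 131
c2 = [1 + 0j, 1j]    # second example of Remark 131

print(bilinear(c1, c1), inner(c1, c1))   # c^T c is negative, c^dagger c is positive
print(bilinear(c2, c2), inner(c2, c2))   # c^T c vanishes although c2 is nonzero
print(magnitude(c1), magnitude(c2))
```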
In general, we have the following

Definition 261 (Orthonormal complex vectors). Two or more complex vectors $\mathbf c_h$ are called orthonormal iff

$$\mathbf c_h^\dagger \mathbf c_k = \delta_{hk}, \tag{16.79}$$

namely iff (i) they are mutually orthogonal, and (ii) they have been “normalized” with the condition $\|\mathbf c_h\| = 1$. A basis in $\mathbb C^n$, formed by $n$ orthonormal vectors $\mathbf c_h$ ($h = 1, \dots, n$), is called an orthonormal basis in $\mathbb C^n$.
16.5.2 Projections vs components in $\mathbb C^n$
♣
The definition of components in $\mathbb C^n$ is analogous to that in $\mathbb R^n$ (Eq. 16.58). Specifically, the coefficients $b_k$ in the expression

$$\mathbf b = \sum_{k=1}^{n} b_k\, \mathbf c_k \tag{16.80}$$

are called the components of $\mathbf b$ with respect to the (not necessarily orthonormal) basis $\{\mathbf c_k\}$. On the other hand, the projection in $\mathbb C^n$ of $\mathbf b$ in the direction of the orthonormal base vector $\mathbf c_h$ is given by

$$b_h^{\mathrm P} := \mathbf c_h^\dagger \mathbf b. \tag{16.81}$$

[Note the order of the factors: $\mathbf b^\dagger \mathbf c_h$ is the complex conjugate of $\mathbf c_h^\dagger \mathbf b$.]

Then, we have the following extension of Theorem 165, p. 640, for real vectors:

Theorem 172 (Projections vs components in orthonormal bases). In a complex orthonormal basis $\{\mathbf c_h\}$, the components $b_h$ of a complex vector $\mathbf b$ coincide with the projections $b_h^{\mathrm P}$ in the direction of $\mathbf c_h$ (Eq. 16.81).

◦ Proof: Indeed, Eqs. 16.79–16.81 yield

$$b_h^{\mathrm P} := \mathbf c_h^\dagger \mathbf b = \sum_{k=1}^{n} b_k\, \mathbf c_h^\dagger \mathbf c_k = \sum_{k=1}^{n} b_k\, \delta_{hk} = b_h, \tag{16.82}$$

in agreement with the theorem.

From now on, we will use the same symbol $b_h$ for components and projections, provided of course that we are dealing with orthonormal bases.
16.5.3 Hermitian matrices
♣
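Recall that $\mathbf C^\dagger := \overline{\mathbf C}^{\mathsf T}$ (Definition 258, p. 644, of the conjugate transpose matrix). We have the following¹

Definition 262 (Hermitian matrix). An $n \times n$ matrix $\mathbf C = [c_{hk}]$ is called Hermitian (or self–adjoint) iff it coincides with its conjugate transpose:

$$\mathbf C = \mathbf C^\dagger, \tag{16.83}$$

namely iff $c_{hk} = \overline{c}_{kh}$. [It is apparent that all the diagonal elements of a Hermitian matrix are necessarily real. It is also apparent that the notion of Hermitian matrices is the extension to the complex field of the notion of symmetric matrices in the real field. Indeed, a real Hermitian matrix is symmetric.]

Note that

$$(\mathbf B\,\mathbf C)^\dagger = \mathbf C^\dagger \mathbf B^\dagger. \tag{16.84}$$

For, using $(\mathbf B\,\mathbf C)^{\mathsf T} = \mathbf C^{\mathsf T}\mathbf B^{\mathsf T}$ (Eq. 3.53), we have (use $\overline{\mathbf B\,\mathbf C} = \overline{\mathbf B}\;\overline{\mathbf C}$, Eq. 1.88)

$$(\mathbf B\,\mathbf C)^\dagger = \overline{(\mathbf B\,\mathbf C)}^{\mathsf T} = \overline{\mathbf C}^{\mathsf T}\, \overline{\mathbf B}^{\mathsf T} = \mathbf C^\dagger \mathbf B^\dagger. \tag{16.85}$$

Accordingly, we have

$$(\mathbf a^\dagger \mathbf B\,\mathbf c)^\dagger = \mathbf c^\dagger \mathbf B^\dagger \mathbf a. \tag{16.86}$$

For Hermitian matrices, this yields

$$(\mathbf a^\dagger \mathbf B\,\mathbf c)^\dagger = \mathbf c^\dagger \mathbf B\,\mathbf a. \tag{16.87}$$

¹ Named after the French mathematician Charles Hermite (1822–1901). The Hermite interpolation polynomials and the Hermite orthogonal polynomials are also named after him.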
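16.5.4 Positive definite matrices in $\mathbb C^n$
♣
We have the following

Definition 263 (Positive definite matrix in $\mathbb C^n$). An $n \times n$ Hermitian matrix $\mathbf C = [c_{hk}]$ is called positive definite iff

$$\mathbf x^\dagger \mathbf C\,\mathbf x > 0, \tag{16.88}$$

for any nonzero $\mathbf x \in \mathbb C^n$.

◦ Comment. You might wonder whether such a matrix exists. To this end, consider the Hermitian matrix $\mathbf C = \mathbf Z^\dagger \boldsymbol\Lambda\,\mathbf Z$, where $\boldsymbol\Lambda = \mathrm{Diag}[\lambda_p]$, with $\lambda_p$ all real and positive, whereas $\mathbf Z$ is any nonsingular complex matrix. The matrix $\mathbf C$ is clearly positive definite, because for any nonzero $\mathbf x \in \mathbb C^n$, we have

$$\mathbf x^\dagger \mathbf C\,\mathbf x = \mathbf x^\dagger \mathbf Z^\dagger \boldsymbol\Lambda\,\mathbf Z\,\mathbf x = \mathbf y^\dagger \boldsymbol\Lambda\,\mathbf y = \sum_{p=1}^{n} \lambda_p |y_p|^2 > 0. \tag{16.89}$$

[Hint: We necessarily have that $\mathbf y := \mathbf Z\,\mathbf x \ne \mathbf 0$, because $\mathbf Z$ is nonsingular, and hence it is possible to have $\mathbf y = \mathbf Z\,\mathbf x = \mathbf 0$ only if $\mathbf x = \mathbf 0$.] Therefore, $\mathbf C$ is a positive definite matrix in $\mathbb C^n$.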
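The construction $\mathbf C = \mathbf Z^\dagger \boldsymbol\Lambda\,\mathbf Z$ in the comment above can be checked numerically. The following sketch is illustrative: the nonsingular matrix `Z`, the positive values `lam`, the test vector `x`, and the helper names are assumptions made for the example.

```python
def dagger(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[h][k].conjugate() for h in range(len(M))]
            for k in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def quad_form(C, x):
    """x^dagger C x (Eq. 16.88)."""
    n = len(x)
    Cx = [sum(C[h][k] * x[k] for k in range(n)) for h in range(n)]
    return sum(x[h].conjugate() * Cx[h] for h in range(n))

# Z is upper triangular with nonzero diagonal, hence nonsingular.
Z = [[1 + 1j, 2 + 0j], [0j, 1 - 1j]]
lam = [2.0, 3.0]                                   # real and positive
Lam = [[lam[0] + 0j, 0j], [0j, lam[1] + 0j]]
C = matmul(matmul(dagger(Z), Lam), Z)

x = [1 - 2j, 3j]
y = [sum(Z[h][k] * x[k] for k in range(2)) for h in range(2)]   # y = Z x

lhs = quad_form(C, x)                                           # x^dagger C x
rhs = sum(l * abs(yp) ** 2 for l, yp in zip(lam, y))            # sum of lam_p |y_p|^2
print(lhs, rhs)
```

A single test vector cannot prove positive definiteness, of course; the sketch only illustrates the identity in Eq. 16.89 for one choice of $\mathbf x$.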
16.5.5 Unitary matrices
♠
We also have the following

Definition 264 (Unitary matrix and transformation). A complex square matrix that satisfies

$$\mathbf R^\dagger \mathbf R = \mathbf R\,\mathbf R^\dagger = \mathbf I \tag{16.90}$$

is called a unitary matrix. The corresponding transformation is called a unitary transformation. [Note the analogy with $\mathbf R^{\mathsf T}\mathbf R = \mathbf R\,\mathbf R^{\mathsf T} = \mathbf I$ (Eq. 16.63), for real orthogonal matrices. In plain words, unitary matrices are the complex equivalent of real orthogonal matrices.]

Thus, we have the following

Theorem 173 (Inverse of unitary matrices). The inverse of a unitary matrix coincides with its conjugate transpose:

$$\mathbf R^{-1} = \mathbf R^\dagger. \tag{16.91}$$

◦ Proof: It stems from the definitions of unitary matrices (Eq. 16.90) and inverse matrices (Definition 243, p. 629).
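16.5.6 Gram–Schmidt procedure in $\mathbb C^n$
♠
Here, we extend to the complex field the Gram–Schmidt procedure in $\mathbb R^n$ presented in Subsection 16.4.4. The algorithm introduced here will be referred to as the Gram–Schmidt procedure in $\mathbb C^n$. Consider a set of $p \le n$ linearly independent complex vectors $\mathbf a_h$. Set $\mathbf c_1 = \mathbf a_1 / \|\mathbf a_1\|$ and

$$
\begin{aligned}
\mathbf b_2 &= \mathbf a_2 - (\mathbf c_1^\dagger \mathbf a_2)\, \mathbf c_1; &\qquad \mathbf c_2 &= \mathbf b_2 / \|\mathbf b_2\|; \\
&\;\;\vdots & &\;\;\vdots \\
\mathbf b_h &= \mathbf a_h - \sum_{k=1}^{h-1} (\mathbf c_k^\dagger \mathbf a_h)\, \mathbf c_k; &\qquad \mathbf c_h &= \mathbf b_h / \|\mathbf b_h\|; \\
&\;\;\vdots & &\;\;\vdots \\
\mathbf b_p &= \mathbf a_p - \sum_{k=1}^{p-1} (\mathbf c_k^\dagger \mathbf a_p)\, \mathbf c_k; &\qquad \mathbf c_p &= \mathbf b_p / \|\mathbf b_p\|.
\end{aligned}
\tag{16.92}
$$

In analogy with what we did for the real case in Subsection 16.4.4, for every $\mathbf c_h$ we have $\|\mathbf c_h\| = 1$. In addition, every $\mathbf b_h$ (and hence every $\mathbf c_h$) is orthogonal to all the vectors $\mathbf c_k$ with $k < h$, as you may verify. [Hint: For instance, left–multiplying $\mathbf b_2$ by $\mathbf c_1^\dagger$, we have $\mathbf c_1^\dagger \mathbf b_2 = \mathbf c_1^\dagger [\mathbf a_2 - (\mathbf c_1^\dagger \mathbf a_2)\, \mathbf c_1] = \mathbf c_1^\dagger \mathbf a_2 - \mathbf c_1^\dagger \mathbf a_2 = 0$, since $\mathbf c_1^\dagger \mathbf c_1 = 1$. Note that the coefficient must be $\mathbf c_k^\dagger \mathbf a_h$, which is the complex conjugate of $\mathbf a_h^\dagger \mathbf c_k$.] Note that, as in the real case (Eq. 16.73), in Eq. 16.92 we have necessarily $\mathbf b_h \ne \mathbf 0$ ($h = 1, \dots, p \le n$), because, if we didn’t, the vectors $\mathbf a_1, \dots, \mathbf a_h$ would be linearly dependent (contrary to our assumption), as you may verify. Finally, if the number $p$ of linearly independent vectors equals the dimension $n$ of the space, we have generated a (complex) orthonormal basis for the space $\mathbb C^n$.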
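A minimal Python sketch of the Gram–Schmidt procedure in $\mathbb C^n$ follows. The coefficient of each already-built unit vector is computed as $\mathbf c_k^\dagger \mathbf a_h$ (the unit vector is the one that gets conjugated), which is the ordering that makes $\mathbf b_h$ orthogonal to $\mathbf c_k$ in the sense of Eq. 16.76. The function names, the tolerance, and the input vectors are assumptions made for the example.

```python
import math

def inner(b, c):
    """b^dagger c: the first vector is conjugated (Eq. 16.76)."""
    return sum(x.conjugate() * y for x, y in zip(b, c))

def gram_schmidt_complex(vectors, tol=1e-12):
    """Orthonormalize linearly independent complex vectors."""
    basis = []
    for a in vectors:
        b = list(a)
        for c in basis:
            coeff = inner(c, a)                      # c_k^dagger a_h
            b = [bi - coeff * ci for bi, ci in zip(b, c)]
        nb = math.sqrt(inner(b, b).real)
        if nb <= tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([bi / nb for bi in b])          # c_h = b_h / ||b_h||
    return basis

vectors = [[1 + 0j, 1j], [1 + 0j, 1 + 0j]]
c = gram_schmidt_complex(vectors)
for h in range(2):
    print([inner(c[h], c[k]) for k in range(2)])
```

The printed values approximate $\delta_{hk}$, in agreement with Eq. 16.79.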
16.6 Appendix A. Gauss–Jordan elimination
♣
As mentioned at the beginning of Section 16.2, the material on the inverse matrix (Subsection 16.2.2) is closely related to that on the Gaussian elimination presented in Chapters 2 and 3. This relationship is explored at great length in this appendix, where we present the Gauss–Jordan elimination procedure, a modification of the Gaussian elimination procedure introduced in Section 2.4. This new procedure might not be as intuitive and as easy to follow as the standard one, but it is useful to connect the Gaussian elimination procedure to the notion of inverse matrix.²

² The Gauss–Jordan elimination is named after Carl Friedrich Gauss and the German geodesist Wilhelm Jordan (1842–1899). [Not to be confused with the French mathematician Camille Jordan (of Jordan curve and Jordan matrix fame).]

Specifically, consider Eq. 16.30, namely

$$\mathbf A\,\mathbf x = \mathbf b, \tag{16.93}$$

where $\mathbf A$ is assumed to be nonsingular. [Without loss of generality, here again we assume that no interchange of rows and/or columns is required (Remark 28, p. 84).] We begin by recalling that the first cycle in the Gaussian elimination procedure of Section 2.4 consists in using the first equation to eliminate the variable $x_1$ from all the equations that follow, namely from the second to the last (see Eqs. 2.61 and 2.62). Note that the operations produce a matrix of the type

$$\mathbf A' = \begin{bmatrix} a'_{11} & a'_{12} & a'_{13} & \dots & a'_{1n} \\ 0 & a'_{22} & a'_{23} & \dots & a'_{2n} \\ 0 & a'_{32} & a'_{33} & \dots & a'_{3n} \\ \dots & \dots & \dots & \dots & \dots \\ 0 & a'_{n2} & a'_{n3} & \dots & a'_{nn} \end{bmatrix}, \tag{16.94}$$

where $a'_{1k} = a_{1k}$. Specifically, in terms of components, we have

$$a'_{hk} = a_{hk} - (1 - \delta_{h1}) \frac{a_{h1}}{a_{11}}\, a_{1k} = \sum_{j=1}^{n} \left[ \delta_{hj} - (1 - \delta_{h1}) \frac{a_{h1}}{a_{11}}\, \delta_{1j} \right] a_{jk} \tag{16.95}$$

(with $h, k = 1, \dots, n$), thereby extending Eq. 2.62 to include the case $h = 1$. In other words, we have

$$\mathbf A' = \mathbf C_1 \mathbf A, \tag{16.96}$$

where

$$\mathbf C_1 = \left[ \delta_{hj} - (1 - \delta_{h1}) \frac{a_{h1}}{a_{11}}\, \delta_{1j} \right] = \mathbf I - \mathbf Q_1, \tag{16.97}$$

with

$$\mathbf Q_1 = \frac{1}{a_{11}} \big[ (1 - \delta_{h1})\, a_{h1}\, \delta_{1j} \big] = \frac{1}{a_{11}} \begin{bmatrix} 0 & 0 & 0 & \dots & 0 \\ a_{21} & 0 & 0 & \dots & 0 \\ a_{31} & 0 & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ a_{n1} & 0 & 0 & \dots & 0 \end{bmatrix}, \tag{16.98}$$
as you may verify. Of course, the same operations are performed on the right sides of the equations. Thus, the result of the first cycle of the Gaussian elimination procedure (Eq. 2.61) may be expressed, in a more compact form, as

$$\mathbf A'\,\mathbf x = \mathbf b', \tag{16.99}$$

where $\mathbf A' = \mathbf C_1 \mathbf A$ (Eq. 16.96), whereas $\mathbf b' = \mathbf C_1 \mathbf b$. The next cycle in the Gaussian elimination procedure of Section 2.4 consists in using the second equation to eliminate the variable $x_2$ from all the equations that follow, namely from the third to the last (Eqs. 2.63 and 2.64). Here instead, we use the Gauss–Jordan procedure, which consists in using the second equation to eliminate the variable $x_2$ from all the equations except for the second. Specifically, in the modified procedure we remove the variable $x_2$ from the first equation as well. This produces a matrix of the type

$$\mathbf A'' = \begin{bmatrix} a''_{11} & 0 & a''_{13} & \dots & a''_{1n} \\ 0 & a''_{22} & a''_{23} & \dots & a''_{2n} \\ 0 & 0 & a''_{33} & \dots & a''_{3n} \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & a''_{n3} & \dots & a''_{nn} \end{bmatrix}, \tag{16.100}$$

where $a''_{2k} = a'_{2k}$. [Compare the above equation to Eq. 2.63, obtained with the Gaussian elimination procedure.] The operations involved are analogous to those used to obtain $\mathbf A'$. Specifically, we have (compare to Eq. 16.96)

$$\mathbf A'' = \mathbf C_2 \mathbf A', \tag{16.101}$$

where (compare to Eq. 16.97)

$$\mathbf C_2 = \left[ \delta_{hj} - (1 - \delta_{h2}) \frac{a'_{h2}}{a'_{22}}\, \delta_{2j} \right] = \mathbf I - \mathbf Q_2, \tag{16.102}$$

with (compare to Eq. 16.98)

$$\mathbf Q_2 = \frac{1}{a'_{22}} \big[ (1 - \delta_{h2})\, a'_{h2}\, \delta_{2j} \big] = \frac{1}{a'_{22}} \begin{bmatrix} 0 & a'_{12} & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & a'_{32} & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & a'_{n2} & 0 & \dots & 0 \end{bmatrix}, \tag{16.103}$$
as you may verify. We may proceed this way for a total of $n$ cycles (not $n - 1$ cycles, as in the standard Gaussian elimination procedure, which does not require any modification of the last column after $n - 1$ cycles). This yields

$$\mathbf C^{*} \mathbf A\,\mathbf x = \mathbf C^{*} \mathbf b =: \mathbf g, \tag{16.104}$$

with

$$\mathbf C^{*} := \mathbf C_n \dots \mathbf C_2\, \mathbf C_1, \tag{16.105}$$

where the matrices $\mathbf C_k$ are all defined as

$$\mathbf C_k = \mathbf I - \mathbf Q_k \qquad (k = 1, \dots, n), \tag{16.106}$$

with

$$\mathbf Q_k = \frac{1}{a_{kk}^{(k-1)}} \Big[ (1 - \delta_{hk})\, a_{hk}^{(k-1)}\, \delta_{kj} \Big] \qquad (k = 1, \dots, n), \tag{16.107}$$

where $a_{hk}^{(k-1)}$ is the element $hk$ of the matrix obtained after $k - 1$ cycles. [Again, the multiplication by $\mathbf C_k$ eliminates the variable $x_k$ from all the equations, except for the $k$-th, as you may verify.] The end result of the above Gauss–Jordan elimination procedure is that, by construction, the matrix $\mathbf C^{*} \mathbf A$ is diagonal:

$$\mathbf C^{*} \mathbf A =: \mathbf D = \mathrm{Diag}[d_{hh}]. \tag{16.108}$$

Remark 132. Contrary to the Gaussian elimination procedure, which yields an upper triangular matrix, the Gauss–Jordan procedure yields a diagonal matrix and hence does not require the back–substitution phase. [Incidentally, if you are familiar with computing, you will easily realize that this modified procedure is also more convenient to use within such a context. Indeed, the Gauss–Jordan elimination technique virtually eliminates the second phase of the Gaussian elimination procedure, that is, the solution by back–substitution of the upper triangular system resulting from the first phase of the Gaussian elimination procedure. Whatever remains is highly parallelizable, since the operations on the rows are independent of each other.]

Combining Eqs. 16.104 and 16.108 we see that, through left–multiplications by appropriate matrices, the original problem, $\mathbf A\,\mathbf x = \mathbf b$ (Eq. 16.93), has been transformed into diagonal form, as

$$\mathbf D\,\mathbf x = \mathbf g, \tag{16.109}$$
namely

$$d_{hh}\, x_h = g_h \qquad (h = 1, \dots, n). \tag{16.110}$$

Next, recall that, in Eq. 16.93, we have assumed $\mathbf A$ to be nonsingular. This guarantees that none of the $d_{hh}$’s vanishes (Lemma 4, p. 115). Therefore, it is possible to divide the $h$-th equation by $d_{hh}$, so as to obtain $x_h = g_h / d_{hh}$. This is fully equivalent to multiplying Eq. 16.109 by the diagonal matrix

$$\mathbf D^{-1} := \mathrm{Diag}[1/d_{hh}], \tag{16.111}$$

so as to obtain $\mathbf x = \mathbf D^{-1} \mathbf C^{*} \mathbf b$. Thus, we have

$$\mathbf x = \mathbf C\,\mathbf b, \tag{16.112}$$

where

$$\mathbf C := \mathbf D^{-1} \mathbf C^{*} = \mathbf D^{-1} \mathbf C_n \dots \mathbf C_2\, \mathbf C_1. \tag{16.113}$$

Equation 16.112 indicates that the matrix $\mathbf C$ introduced in Eq. 16.113 coincides with the inverse of $\mathbf A$ (Remark 128, p. 630), namely

$$\mathbf C = \mathbf A^{-1}. \tag{16.114}$$

Remark 133. In full analogy with Lemma 3, p. 86, Eq. 16.110 indicates that the condition $d_{hh} \ne 0$ for all the values of $h$ (namely that $\mathbf A$ is nonsingular) is necessary and sufficient for the existence and uniqueness of the solution. If, in Eq. 16.110, the last $p$ diagonal coefficients $d_{hh}$ vanish, then the solution does not exist, unless all the corresponding $g_h$ are also equal to zero. In this case, the corresponding unknowns $x_h$ are arbitrary, and the solution is not unique.
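The construction of Eqs. 16.104 to 16.114 can be sketched in Python with exact rational arithmetic. This is an illustration under stated assumptions, not a production routine: as in the text, no interchange of rows is performed, so a zero pivot simply raises an error. The function names and the test matrix are hypothetical.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(h == j)) for j in range(n)] for h in range(n)]

def matmul(A, B):
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

def gauss_jordan_inverse(A):
    """Build C = D^-1 C_n ... C_2 C_1 (Eq. 16.113), with C_k = I - Q_k."""
    n = len(A)
    work = [[Fraction(x) for x in row] for row in A]   # matrix after k - 1 cycles
    c_star = identity(n)                               # running product C_k ... C_1
    for k in range(n):
        pivot = work[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: a row interchange would be needed")
        C_k = identity(n)
        for h in range(n):
            if h != k:
                C_k[h][k] = -work[h][k] / pivot        # minus the entries of Q_k (Eq. 16.107)
        work = matmul(C_k, work)                       # eliminates x_k from all rows but the k-th
        c_star = matmul(C_k, c_star)
    # work is now the diagonal matrix D (Eq. 16.108); left-multiply C* by D^-1.
    return [[c_star[h][j] / work[h][h] for j in range(n)] for h in range(n)]

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]
A_inv = gauss_jordan_inverse(A)
print(matmul(A_inv, A) == identity(3))
```

Using `Fraction` avoids round-off, so the comparison with the identity matrix is exact.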
• Simply a curiosity
♠
Consider the matrix $\mathbf Q_1$, as given by Eq. 16.98. Note that $\mathbf Q_1^2 = \mathbf O$. [Hint: Use the expression for $\mathbf Q_1$ given on the far right of Eq. 16.98.] Similarly, we have $\mathbf Q_k^2 = \mathbf O$ ($k = 1, \dots, n$), with $\mathbf Q_k$ given by Eq. 16.107, as you may verify. [Hint: Use Eq. 16.107, along with the identity $(1 - \delta_{pr})\, \delta_{pr} = 0$.] Hence, setting $\mathbf C^{*}_k := \mathbf I + \mathbf Q_k$, one obtains

$$\mathbf C^{*}_k \mathbf C_k = (\mathbf I + \mathbf Q_k)(\mathbf I - \mathbf Q_k) = \mathbf I, \tag{16.115}$$

namely $\mathbf C^{*}_k = \mathbf C_k^{-1}$. Therefore, using $(\mathbf A\,\mathbf B)^{-1} = \mathbf B^{-1}\mathbf A^{-1}$ (Eq. 16.27), as well as Eqs. 16.113 and 16.114, we have that $\mathbf A$ is given by

$$\mathbf A = \mathbf C^{-1} = \mathbf C_1^{-1} \mathbf C_2^{-1} \dots \mathbf C_n^{-1}\, \mathbf D = \mathbf C^{*}_1 \mathbf C^{*}_2 \dots \mathbf C^{*}_n\, \mathbf D. \tag{16.116}$$
16.7 Appendix B. The L-U decomposition, and more
♣
In Section 16.6, we have introduced an expression for the inverse matrix, by using the Gauss–Jordan elimination procedure. In this section, we address the relationship at a deeper level. Specifically, here we address the inverse matrix, directly in terms of the Gaussian elimination procedure. To do this, we begin with some introductory material on upper and lower triangular matrices, including some material on the so–called nilpotent matrices. Then, we introduce the L-U decomposition and discuss its relationship with the Gaussian elimination procedure. [Here again, without loss of generality, we assume that no interchange of rows and/or columns is required (Remark 28, p. 84).]
16.7.1 Triangular matrices. Nilpotent matrices
♣
In Definition 46, p. 93, we have already introduced the definition of upper (lower) triangular matrix, as a matrix whose elements below (above) the main diagonal vanish. Here, we introduce an expanded definition. Specifically, we have the following

Definition 265 (Upper and lower triangular matrices). A square matrix $\mathbf U$ is called upper (lower) triangular iff all the terms below (above) its main diagonal vanish. An upper (lower) triangular matrix is called strictly upper (lower) triangular iff all the diagonal terms also vanish. Finally, an upper (lower) triangular matrix is called upper (lower) unitriangular iff all its diagonal terms equal one.

The relevance of upper triangular matrices stems from the fact that the corresponding system of equations may be easily solved, namely by following the back–substitution portion of the Gaussian elimination procedure (Chapters 2 and 3). However, they have additional important properties that are addressed here.

• Properties of triangular matrices

We have the following
Theorem 174 (Product of triangular matrices). The product of two upper (lower) triangular matrices is an upper (lower) triangular matrix.

◦ Proof: Consider two upper triangular matrices, say $\mathbf A$ and $\mathbf B$. [The proof for lower triangular matrices is analogous.] We have

$$a_{hj} = 0 \;\; (j < h) \qquad \text{and} \qquad b_{jk} = 0 \;\; (j > k). \tag{16.117}$$

Accordingly, the elements $c_{hk} = \sum_{j=1}^{n} a_{hj} b_{jk}$ (Eq. 3.45) of $\mathbf C = \mathbf A\,\mathbf B$ are given by

$$c_{hk} = \begin{cases} \displaystyle\sum_{j=h}^{k} a_{hj}\, b_{jk} & \text{if } h \le k; \\[2ex] 0 & \text{if } h > k, \end{cases} \tag{16.118}$$

in agreement with the theorem. [Hint: The second line of the above equation stems from the fact that all the terms in $\sum_{j=1}^{n} a_{hj} b_{jk}$ vanish if $h > k$. This is consistent with the convention $\sum_{j=h}^{k} f_j = 0$, whenever $h > k$, independently of $f_j$ (Eq. 3.21).]

◦ Warning. Recall that, if all the diagonal elements of an upper triangular matrix $\mathbf U$ differ from zero, then the rows of the matrix are linearly independent (Lemma 4, p. 115), and hence the matrix $\mathbf U$ is nonsingular (Definition 69, p. 117). Thus, for any upper triangular matrix $\mathbf U = [u_{hk}]$, we have that

$$\text{“}u_{hh} \ne 0 \text{ for all } h\text{” is equivalent to “}\mathbf U \text{ is nonsingular.”} \tag{16.119}$$

The last statement is more direct and therefore it will be used in the rest of the book. [The same holds true for lower triangular matrices.]
• Inverse of triangular matrices

Here, we want to show that the inverse of a lower (upper) triangular matrix is itself lower (upper) triangular. We have the following

Theorem 175 (Inverse of triangular matrices). The inverse of any nonsingular lower triangular matrix $\mathbf A = [a_{hk}]$ is a nonsingular lower triangular matrix, with diagonal elements equal to $1/a_{hh}$. In particular, the inverse of a lower unitriangular matrix is lower unitriangular. The same holds true for upper triangular matrices.

◦ Proof: Let us consider a nonsingular lower triangular matrix $\mathbf A$. [The proof for an upper triangular matrix is analogous.] This implies that

$$a_{hj} = 0, \qquad \text{for } j > h. \tag{16.120}$$

Let $\mathbf B$ be the inverse of $\mathbf A$, so that $\mathbf A\,\mathbf B = \mathbf I$, namely (use the above equation)

$$\sum_{j=1}^{n} a_{hj}\, b_{jk} = \sum_{j=1}^{h} a_{hj}\, b_{jk} = \delta_{hk} \qquad (h, k = 1, \dots, n). \tag{16.121}$$

The above equation yields

$$b_{hk} = \frac{1}{a_{hh}} \left( \delta_{hk} - \sum_{j=1}^{h-1} a_{hj}\, b_{jk} \right), \tag{16.122}$$
because $a_{hh} \ne 0$ (Eq. 16.119), since the triangular matrix $\mathbf A$ is nonsingular by hypothesis. Next, we want to show that the above equation yields

$$b_{hk} = 0, \qquad \text{for } k > h. \tag{16.123}$$

We do this recursively. Consider first $h = 1$. In this case, Eq. 16.121 reduces to

$$a_{11}\, b_{1k} = \delta_{1k}, \tag{16.124}$$

namely $b_{11} = 1/a_{11}$ and

$$b_{1k} = 0, \qquad \text{for } k > 1. \tag{16.125}$$

Next, consider the case $h = 2$. In this case, Eq. 16.121 reduces to

$$a_{21}\, b_{1k} + a_{22}\, b_{2k} = \delta_{2k}, \tag{16.126}$$

namely $b_{22} = 1/a_{22}$ and

$$b_{2k} = 0, \qquad \text{for } k > 2. \tag{16.127}$$

Similarly, for $h = 3$ we obtain

$$b_{3k} = \frac{-1}{a_{33}} \sum_{j=1}^{2} a_{3j}\, b_{jk} = \frac{-1}{a_{33}} \big( a_{31}\, b_{1k} + a_{32}\, b_{2k} \big) = 0, \qquad \text{for } k > 3, \tag{16.128}$$

because $b_{1k} = 0$ for $k > 1$, and $b_{2k} = 0$ for $k > 2$ (Eqs. 16.125 and 16.127). You may continue this way $n$ times, and show that $b_{hk} = 0$ for $k > h$, in agreement with Eq. 16.123. [If you want to be more elegant, you may use the principle of mathematical induction.] Equation 16.123 indicates that the matrix $\mathbf B$ is lower triangular as well. Also, consider the case $h = k$, for which Eq. 16.122 yields $b_{hh} = 1/a_{hh}$, since the sum in Eq. 16.122 vanishes (use $b_{jk} = 0$ for $j < h = k$, Eq. 16.123). Finally, for unitriangular matrices, we have $a_{hh} = b_{hh} = 1$. Hence, the inverse of a lower unitriangular matrix is lower unitriangular.
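The recursion of Eq. 16.122 is itself an algorithm: row $h$ of $\mathbf B$ depends only on the rows above it. The sketch below applies it, with exact rational arithmetic, to a lower triangular matrix assumed for the example; the function names are hypothetical.

```python
from fractions import Fraction

def lower_triangular_inverse(A):
    """Invert a nonsingular lower triangular matrix via Eq. 16.122."""
    n = len(A)
    B = [[Fraction(0)] * n for _ in range(n)]
    for h in range(n):
        if A[h][h] == 0:
            raise ZeroDivisionError("a zero diagonal element: A is singular")
        for k in range(n):
            # Sum over j = 1, ..., h - 1 (rows of B already computed).
            s = sum(Fraction(A[h][j]) * B[j][k] for j in range(h))
            B[h][k] = (Fraction(int(h == k)) - s) / Fraction(A[h][h])
    return B

def matmul(A, B):
    return [[sum(A[h][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for h in range(len(A))]

A = [[2, 0, 0],
     [1, 1, 0],
     [4, 3, 5]]
B = lower_triangular_inverse(A)
for row in B:
    print([str(x) for x in row])
```

The printed matrix is lower triangular, and its diagonal elements are $1/a_{hh}$, in agreement with Theorem 175.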
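• Strictly triangular matrices are nilpotent

In closing this subsection, we want to show an interesting property of strictly triangular matrices, for which $a_{kk} = 0$. Consider a generic 4 × 4 strictly lower triangular matrix, namely

$$\mathbf A = \begin{bmatrix} 0 & 0 & 0 & 0 \\ a_{21} & 0 & 0 & 0 \\ a_{31} & a_{32} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & 0 \end{bmatrix}. \tag{16.129}$$

Note that

$$\mathbf A^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ a_{21} & 0 & 0 & 0 \\ a_{31} & a_{32} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 0 \\ a_{21} & 0 & 0 & 0 \\ a_{31} & a_{32} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ a_{32} a_{21} & 0 & 0 & 0 \\ a'_{41} & a_{43} a_{32} & 0 & 0 \end{bmatrix}, \tag{16.130}$$

where $a'_{41} = a_{42} a_{21} + a_{43} a_{31}$. Moreover, we have that

$$\mathbf A^3 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ a_{32} a_{21} & 0 & 0 & 0 \\ a'_{41} & a_{43} a_{32} & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 0 \\ a_{21} & 0 & 0 & 0 \\ a_{31} & a_{32} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ a_{43} a_{32} a_{21} & 0 & 0 & 0 \end{bmatrix} \tag{16.131}$$

and

$$\mathbf A^4 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ a_{43} a_{32} a_{21} & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 & 0 \\ a_{21} & 0 & 0 & 0 \\ a_{31} & a_{32} & 0 & 0 \\ a_{41} & a_{42} & a_{43} & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{16.132}$$

What is relevant here is not so much the expression of the coefficients, but rather the fact that $\mathbf A^4 = \mathbf O$. Specifically, by going from $\mathbf A$ to $\mathbf A^4$, the nonzero portion becomes a smaller and smaller lower triangular matrix, until the entire matrix is filled up with zeros. To formulate a generalization of this result, let us introduce the following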
Definition 266 (Nilpotent matrix). An n × n matrix $\mathsf{A}$ is called nilpotent iff there exists a natural number k such that
$$\mathsf{A}^k = \mathsf{O}. \tag{16.133}$$
The smallest k for which the above equation holds is called the degree of $\mathsf{A}$. We have the following
Theorem 176. Strictly triangular n × n matrices are nilpotent, with degree at most equal to n.
◦ Proof: Let $\mathsf{A} = [a_{hk}]$ denote a strictly lower triangular n × n matrix, namely a matrix with $a_{hk} = 0$ for $k \ge h$. The generic element $a^{(2)}_{hk}$ of $\mathsf{A}^2$ is given by
$$a^{(2)}_{hk} = \sum_{j=1}^{n} a_{hj}\, a_{jk} = \sum_{j=k+1}^{h-1} a_{hj}\, a_{jk}, \tag{16.134}$$
because $a_{hj} = 0$ for $j \ge h$, and $a_{jk} = 0$ for $k \ge j$. Hence, we have
$$a^{(2)}_{hk} = 0, \qquad \text{for } k \ge h - 1. \tag{16.135}$$
Similarly, by mathematical induction, we have that
$$a^{(p)}_{hk} = 0, \qquad \text{for } k \ge h - p + 1, \tag{16.136}$$
as you may verify. Thus, when p = n all the elements of $\mathsf{A}^n$ vanish.
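If you want to check Theorem 176 numerically, the following is a minimal sketch in plain Python (it is not part of the formulation above). The helper names `matmul` and `nilpotency_degree`, as well as the sample entries of the matrix, are hypothetical and chosen only for illustration. Integer entries are used so that the arithmetic is exact.

```python
def matmul(a, b):
    """Row-by-column product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[h][j] * b[j][k] for j in range(n)) for k in range(n)]
            for h in range(n)]


def nilpotency_degree(a):
    """Return the smallest p <= n such that a^p = O, or None if there is none."""
    n = len(a)
    power = a                                  # holds a^p at the start of cycle p
    for p in range(1, n + 1):
        if all(x == 0 for row in power for x in row):
            return p
        power = matmul(power, a)
    return None


# Hypothetical 4 x 4 strictly lower triangular matrix (same pattern as Eq. 16.129).
A = [[0, 0, 0, 0],
     [2, 0, 0, 0],
     [3, 5, 0, 0],
     [7, 11, 13, 0]]

A3 = matmul(matmul(A, A), A)
print("element (4,1) of A^3:", A3[3][0])       # equals a43 * a32 * a21
print("degree of A:", nilpotency_degree(A))
```

With nonzero entries on the subdiagonal, as in this sample, the degree is exactly n = 4, the upper bound stated by the theorem.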
◦ Comment. Although most of the nilpotent matrices that we will encounter in this book have a lot of zero elements, you should not have the impression that this is a necessary condition. For example, we have
$$\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}^2 = \mathsf{O}, \tag{16.137}$$
as you may verify. [Note that the determinant of the above matrix vanishes. This is always the case for any nilpotent matrix, an issue addressed in Vol. II, in connection with the Binet theorem on the determinant of the product of two n × n matrices.]
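For the 2 × 2 example in Eq. 16.137, the verification takes one line with the row-by-column rule; it is written out here only as a worked check:
$$\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 1 + (-1) \cdot 1 & 1 \cdot (-1) + (-1) \cdot (-1) \\ 1 \cdot 1 + (-1) \cdot 1 & 1 \cdot (-1) + (-1) \cdot (-1) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Similarly, the determinant is $1 \cdot (-1) - (-1) \cdot 1 = 0$.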
16.7.2 The L-U decomposition
♠
Here, we use the Gaussian elimination procedure to show that a nonsingular matrix may be decomposed into the product of a lower triangular matrix times an upper one (L-U decomposition). [The presentation is quite cursory. If you want to know more about L-U decompositions, you may consult for instance Stoer and Bulirsch (Ref. [62], pp. 159 ff.).] Specifically, we have the following
Theorem 177 (L-U decomposition). Any nonsingular square matrix $\mathsf{A}$ may be written as
$$\mathsf{A} = \mathsf{L}\, \mathsf{U}, \tag{16.138}$$
where $\mathsf{L}$ is a lower unitriangular matrix, and $\mathsf{U}$ is an upper triangular matrix.
◦ Proof: Here, we reformulate the Gaussian elimination procedure (Subsection 2.4.2) in terms of matrix multiplications (akin to what we did in Section 16.6, in connection with the Gauss–Jordan elimination). As noted in the preceding section, the first step is the same in the two approaches. Accordingly, we have $\mathsf{A}' = \mathsf{L}_1 \mathsf{A}$, where $\mathsf{L}_1 = \mathsf{I} - \mathsf{P}_1$, with $\mathsf{P}_1 = \mathsf{Q}_1$, given in Eq. 16.98. [This is fully equivalent to $\mathsf{A}' = \mathsf{C}_1 \mathsf{A}$ (Eq. 16.96), with $\mathsf{C}_1 = \mathsf{I} - \mathsf{Q}_1$ (Eq. 16.97), with $\mathsf{Q}_1$ given by Eq. 16.98. Of course, $\mathsf{L}_1$ coincides with $\mathsf{C}_1$.] Note that $\mathsf{P}_1 = \mathsf{Q}_1$ (Eq. 16.98) is strictly lower triangular, and hence $\mathsf{L}_1 = \mathsf{C}_1$ is a lower unitriangular matrix.
Next, let us consider the second cycle of the procedures. Here, we have $\mathsf{A}'' = \mathsf{L}_2 \mathsf{A}'$, where $\mathsf{L}_2 = \mathsf{I} - \mathsf{P}_2$, with
$$\mathsf{P}_2 = \frac{1}{a'_{22}} \begin{bmatrix} 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & a'_{32} & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & a'_{n2} & 0 & \dots & 0 \end{bmatrix}. \tag{16.139}$$
[Note the difference with respect to the Gauss–Jordan procedure, when we had $\mathsf{A}'' = \mathsf{C}_2 \mathsf{A}'$ (Eq. 16.101), where $\mathsf{C}_2 = \mathsf{I} - \mathsf{Q}_2$ (Eq. 16.102), with $\mathsf{Q}_2 \ne \mathsf{P}_2$, because in $\mathsf{Q}_2$ the first coefficient of the second column does not vanish (Eq. 16.103).] Again, $\mathsf{P}_2$ is strictly lower triangular, and hence $\mathsf{L}_2 = \mathsf{I} - \mathsf{P}_2$ is a lower unitriangular matrix.
For all the other steps, all the equations in Section 16.6 (Gauss–Jordan elimination) apply, provided that $\mathsf{C}_k = \mathsf{I} - \mathsf{Q}_k$ is replaced by $\mathsf{L}_k = \mathsf{I} - \mathsf{P}_k$, where the matrices $\mathsf{P}_k$ are strictly lower triangular matrices. [They are obtained from $\mathsf{Q}_k$ by setting equal to zero all the coefficients above the main diagonal.] Accordingly, after completing all the cycles of the Gaussian elimination procedure the matrix $\mathsf{A}$ has been modified into an upper triangular matrix $\mathsf{U}$, namely
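$$\mathsf{L}_{n-1} \cdots \mathsf{L}_2\, \mathsf{L}_1\, \mathsf{A} = \mathsf{U}, \tag{16.140}$$
where, as noted above, all the matrices $\mathsf{L}_k$ are lower unitriangular and so is their product (Theorem 174, p. 655). In summary, the above equation is equivalent to Eq. 16.138, with
$$\mathsf{L} = \big( \mathsf{L}_{n-1} \cdots \mathsf{L}_2\, \mathsf{L}_1 \big)^{-1} = \mathsf{L}_1^{-1}\, \mathsf{L}_2^{-1} \cdots \mathsf{L}_{n-1}^{-1}, \tag{16.141}$$
which is itself lower unitriangular (Theorem 175, p. 655).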
◦ Comment. The L-U decomposition is quite useful when we have to solve a problem iteratively (or sequentially, as in the so–called step–by–step solutions). For these types of problems, we have to solve a sequence of linear algebraic systems whose right sides depend upon the results of the preceding solution. In this case, the initial effort to obtain the decomposition pays off, because U and L are easily invertible.
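To make the above comment concrete, here is a minimal sketch in plain Python of the factorization and of its reuse for a sequence of right sides. It is only an illustration under stated assumptions: no row exchanges are performed, so every pivot encountered is assumed to be nonzero (otherwise a `ZeroDivisionError` is raised). The function names `lu_decompose` and `lu_solve`, the sample matrix, and the rule used to generate the next right side are all hypothetical.

```python
def lu_decompose(a):
    """Factor A = L U by Gaussian elimination without row exchanges.

    Assumption: every pivot u[k][k] is nonzero.
    L collects the multipliers of each elimination cycle (unit diagonal).
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]          # working copy, becomes U
    l = [[1.0 if h == k else 0.0 for k in range(n)] for h in range(n)]
    for k in range(n - 1):                              # k-th elimination cycle
        for h in range(k + 1, n):
            m = u[h][k] / u[k][k]                       # multiplier for row h
            l[h][k] = m
            for j in range(k, n):
                u[h][j] -= m * u[k][j]
    return l, u


def lu_solve(l, u, b):
    """Solve L U x = b by forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for h in range(n):                                  # L y = b
        y[h] = b[h] - sum(l[h][j] * y[j] for j in range(h))
    x = [0.0] * n
    for h in reversed(range(n)):                        # U x = y
        x[h] = (y[h] - sum(u[h][j] * x[j] for j in range(h + 1, n))) / u[h][h]
    return x


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]                   # hypothetical sample matrix
L, U = lu_decompose(A)                                  # done only once

b = [4.0, 10.0, 24.0]
for step in range(3):
    x = lu_solve(L, U, b)                               # cheap: two triangular solves
    print(step, x)
    b = x                                               # hypothetical rule for the next right side
```

The point of the design is that the elimination is performed once, whereas each new right side requires only the two triangular substitutions.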
16.7.3 Tridiagonal matrices
♠
The L-U decomposition is particularly interesting for tridiagonal matrices, introduced in Section 3.9. To make our life simpler, I will simply state the results and then verify their correctness. Consider again Eq. 3.174, namely
$$\mathsf{A} = \left[ \begin{array}{ccccccccccc} d_1 & c_1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ a_2 & d_2 & c_2 & \dots & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & a_k & d_k & c_k & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & a_{n-1} & d_{n-1} & c_{n-1} \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & a_n & d_n \end{array} \right]. \tag{16.142}$$
For this matrix, I claim that we have $\mathsf{A} = \mathsf{L}\, \mathsf{U}$ (Eq. 16.138), with
$$\mathsf{L} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 & 0 & \dots & 0 & 0 & 0 \\ a'_2 & 1 & 0 & \dots & 0 & 0 & \dots & 0 & 0 & 0 \\ 0 & a'_3 & 1 & \dots & 0 & 0 & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & a'_k & 1 & \dots & 0 & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & \dots & 0 & a'_n & 1 \end{bmatrix}, \tag{16.143}$$
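where
$$a'_k = a_k / d'_{k-1}, \tag{16.144}$$
with
$$d'_1 = d_1 \tag{16.145}$$
and
$$d'_k = d_k - a'_k\, c_{k-1} \qquad (k = 2, \dots, n). \tag{16.146}$$
On the other hand,
$$\mathsf{U} = \begin{bmatrix} d'_1 & c_1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & d'_2 & c_2 & \dots & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & d'_3 & \dots & 0 & 0 & 0 & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & d'_k & c_k & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & d'_n \end{bmatrix}. \tag{16.147}$$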
Let us verify that my claim is correct. We have
$$\mathsf{L}\, \mathsf{U} = \begin{bmatrix} d'_1 & c_1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & 0 \\ a_2 & \hat{d}_2 & c_2 & \dots & 0 & 0 & 0 & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & a_k & \hat{d}_k & c_k & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & 0 & 0 & \dots & a_n & \hat{d}_n \end{bmatrix}, \tag{16.148}$$
with
$$d'_1 = d_1 \tag{16.149}$$
and
$$\hat{d}_k = a'_k\, c_{k-1} + d'_k = a'_k\, c_{k-1} + \big( d_k - a'_k\, c_{k-1} \big) = d_k, \tag{16.150}$$
as you may verify. [The subdiagonal elements of the product are $a'_k\, d'_{k-1} = a_k$ (Eq. 16.144).] Therefore, we have $\mathsf{L}\, \mathsf{U} = \mathsf{A}$. Let us introduce the following
Definition 267 (Bidiagonal matrix). A matrix is called lower (upper) bidiagonal iff its nonzero elements are only those on the main diagonal and on the subdiagonal (superdiagonal). A matrix is called lower (upper) unibidiagonal iff its nonzero elements are only those on the main diagonal, which equal one, and on the subdiagonal (superdiagonal).
Accordingly, $\mathsf{L}$ is a lower unibidiagonal matrix (Eq. 16.143), whereas $\mathsf{U}$ is an upper bidiagonal matrix (Eq. 16.147).
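The recursion in Eqs. 16.144–16.146 translates almost literally into code. The following plain-Python sketch is only an illustration: the function name `tridiagonal_lu`, the storage of the three diagonals as lists, and the sample values are hypothetical, and it is assumed that none of the modified diagonal elements vanishes (no row exchanges are performed).

```python
def tridiagonal_lu(a, d, c):
    """Factor a tridiagonal matrix as in Eqs. 16.144-16.146.

    a[k], d[k], c[k] are the sub-, main- and super-diagonal elements of
    row k (0-based); a[0] and c[n-1] are unused placeholders.
    Returns the subdiagonal of L and the main diagonal of U.
    Assumption: no modified diagonal element is zero.
    """
    n = len(d)
    a_l = [0.0] * n                               # subdiagonal of L
    d_u = [0.0] * n                               # main diagonal of U
    d_u[0] = d[0]                                 # Eq. 16.145
    for k in range(1, n):
        a_l[k] = a[k] / d_u[k - 1]                # Eq. 16.144
        d_u[k] = d[k] - a_l[k] * c[k - 1]         # Eq. 16.146
    return a_l, d_u


# Hypothetical 4 x 4 sample, chosen so that all operations are exact.
a = [0.0, 2.0, 2.0, 2.0]
d = [2.0, 3.0, 3.0, 3.0]
c = [1.0, 1.0, 1.0, 0.0]
a_l, d_u = tridiagonal_lu(a, d, c)

# Rebuild the sub- and main-diagonal elements of L U and compare with A.
for k in range(1, len(d)):
    sub = a_l[k] * d_u[k - 1]                     # should reproduce a[k]
    diag = a_l[k] * c[k - 1] + d_u[k]             # should reproduce d[k], Eq. 16.150
    print(k + 1, sub, a[k], diag, d[k])
```

Only the two lists `a_l` and `d_u` need to be stored, because the superdiagonal of U coincides with that of A.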
Chapter 17
Statics in three dimensions
According to Newtonian mechanics, we live in a Euclidean three–dimensional space. Here, we broaden the principles of statics, by extending the formulation from one to three dimensions. This chapter provides us with essential tools (such as the definitions of moment, rigid–body, and barycenter) that are used in dynamics as well. [The two–dimensional case is seen as a particular case of the three–dimensional one, when all the forces lie in the same plane. Accordingly, their components in the direction normal to such a plane vanish.] ◦ Warning. Unless otherwise stated, in this chapter the term “vector” denotes “physicists’ vectors,” which are the ones primarily used here.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_17
• Overview of this chapter
In Section 17.1, we introduce the definition of applied vectors, which allows us to properly introduce the notion of force. Indeed, in a three–dimensional space, forces are vector quantities, in the sense not only that they have magnitude and direction, but also that the rule for their addition is precisely that introduced for vectors, namely the parallelogram and parallelepiped rules. However, forces are special vectors. For, in addition to the properties stated in the definition of vectors (and their operations, such as dot and cross products), the so–called line of application (namely the straight line parallel to the force, through the point of application) is also relevant. This fact was inessential in the one–dimensional formulation, in which case the forces are acting along the same line. Vectors for which the line of application cannot be changed without altering the essence of the problem are called applied vectors, and are introduced in Definition 268, p. 664.
Next, in Section 17.2, we present the three–dimensional extension for the Newton equilibrium law of a single particle, with illustrative examples. Then, we apply this law to study the equilibrium of systems of n particles, in particular spring–connected particles (Section 17.3). Chains of spring–connected particles are addressed in Section 17.4, where we expand the analysis to include lateral motion.
Next, we shift gears and address some very general (and very important) properties of systems that consist of n particles. Specifically, in Section 17.5, we define the moment of a force, a concept for which it is essential that a force be defined as an applied vector. We use this definition in Section 17.6 to study some general properties of n-particle systems. In particular, we extend to three dimensions the Newton third law of action and reaction, which was introduced in Subsection 4.2.1 for the limited one–dimensional case, and introduce resultant forces and resultant moments, as well as torques and wrenches. Then, still in Section 17.6, we present some general considerations on n-particle system constraints, specifically on isostatic (statically determinate) systems. If the particles are rigidly connected, we obtain the so–called rigid systems. These are addressed in Section 17.7, which includes applications to rigid–system statics, in particular levers and related matters.
We also have two appendices. In the first (Section 17.8), we introduce the laws of static and dynamic friction. In the second (Section 17.9), we present some illustrative examples of static and dynamic friction.
17.1 Applied vectors and forces
As pointed out above, to extend to three dimensions the definition of a force and the Newton laws (in particular the third), we need to introduce the notion of applied vectors. We have the following (compare to Definition 222, p. 587, of vectors)
Definition 268 (Applied vector). An applied vector is a vector with a point of application. Specifically, an applied vector is a geometrical entity characterized by magnitude, direction (alignment and pointing), and point of application. [Of course, all the rules for vector operations introduced in Chapter 15 hold for applied vectors as well.]
Next, consider forces. Thus far, we have considered forces only within a one–dimensional context. If we deal with forces in two or three dimensions, the physical definition remains unchanged. However, we have to be careful. Indeed, we have to add the following
Definition 269 (Force). In two and three dimensions forces are applied vectors. Indeed, experimental evidence shows that they follow the corresponding rules.¹ In other words, a force is subject to all the rules introduced for vectors in the preceding chapter, but, in addition, it is essential to know its point of application. As we will see, this is paramount for the three–dimensional extension of the Newton third law.
¹ The German physicist Arnold Sommerfeld (1868–1951) refers to the rule of addition of forces as the fourth law. Specifically, he states: “We shall call the rule of the parallelogram of forces our Fourth law, even though in Newton it appears merely as an addition or corollary to the other laws of motion” (Ref. [61], p. 6).
◦ Comment. Forces are the only applied vectors encountered in this book (see the comment below Eq. 15.14, regarding the location vector).
17.2 Newton equilibrium law of a particle
In this section, we formulate the problem of the static equilibrium of a particle (material point) in three dimensions. As in Section 4.1, here we still limit ourselves to a single particle P that does not move. However, we remove the one–dimensional limitation. In particular, we assume that all the forces acting on the particle P have the same point of application (that is, P), but not necessarily the same direction. In this case, the equilibrium may be attained only if all the forces are balanced. Specifically, we have the following
Law 4 (Newton equilibrium law. Single particle) For a single particle P that does not move (statics), the equilibrium (namely the fact that the particle at rest does not start to move, but remains at rest) is attained if, and only if, all the forces are balanced, in the sense that their sum vanishes, namely
$$\mathbf{f} := \sum_{h=1}^{n} \mathbf{f}_h = \mathbf{0}, \tag{17.1}$$
where $\mathbf{f}_h$ (h = 1, . . . , n) include all the n forces acting on P (point of application).
Remark 134. As we will see, Eq. 17.1 is a particular case of the Newton first law, which states that, if the sum of all the forces acting on the material point vanishes, then the point moves with a constant velocity. All the comments presented in Subsection 11.4.1 apply to the three–dimensional case as well. Specifically, the Newton first law is a particular case of the Newton second law, which governs the dynamics of a particle (Eq. 20.3). On the other hand, the Newton equilibrium law for a single particle is obtained from the Newton first law, by using a frame of reference that travels with the particle. Vice versa, the Newton first law may be obtained from the Newton equilibrium law for a single particle, by using any inertial frame of reference different from that in which one observes equilibrium.
Remark 135. If you have had a course in physics, you might wonder why I am using the symbol $\mathbf{f}$ (lower–case boldface letter) to indicate forces, instead of the more common $\mathbf{F}$ (capital boldface letter), in analogy with what we did for the one–dimensional case, where we used F. The reason is that, as mentioned in Definition 237, p. 596, in this book capital boldface letters are reserved for tensors, a notion introduced in Section 23.9.
17.2.1 Illustrative examples
In the formulation presented in Chapter 4, we barely addressed those forces that arise when the system of particles is constrained (see, for instance, Section 4.4, on spring–particle chains anchored at both ends). These forces, known as constraint reactions, are particularly important when we deal with the equilibrium of a rigid n-particle system (Section 17.7). Thus, we start discussing them here, when it is easier to present the notion per se. Specifically, we introduce them simply by presenting some elementary examples of how such forces may arise.
• Heavy particle on a horizontal surface
Consider first a heavy particle (again, a particle subject to gravity) on a horizontal plane, such as a steel ball on a marble table (Fig. 17.1). The weight is given by $\mathbf{w} = -W \mathbf{n}$, where $\mathbf{n}$ denotes the upward unit normal to the table, and is balanced by the reaction of the constraint, the table in our case, so that $\mathbf{w} + \mathbf{f} = -W \mathbf{n} + N \mathbf{n} = \mathbf{0}$, which implies
$$W = N. \tag{17.2}$$
Some comments are in order. The force N is the reaction of the horizontal surface, so as to prevent penetration. In other words, the horizontal surface acts as a constraint against a further motion of the particle. To be specific, we have the following
Fig. 17.1 Weight on a table
Definition 270 (Unilateral contact. Unilateral constraint). The term unilateral contact refers to two objects pressed against each other, and able to separate if the force that pushes the two against each other changes sign. A constraint is called unilateral iff it is caused by a unilateral contact. In our case, if W < 0 (think of the figure turned upside down), the particle would not be prevented from moving away from the table.
◦ Comment. The horizontal equilibrium is assured by the fact that none of the forces involved has a horizontal component.
• A particle hanging by two strings
Here, we consider a heavy particle P, hanging by two strings, which in turn are anchored to a ceiling, at $Q_1$ and $Q_2$ (Fig. 17.2). [Again, a string has, by definition (Remark 124, p. 589), the property that the forces acting on its endpoints are always aligned with the string itself, and they are necessarily such that the string is in tension and not in compression.]
Fig. 17.2 Weight suspended
The static equilibrium equation for the particle, in this case, is given by
$$-W \mathbf{j} + T_1\, \mathbf{e}_1 + T_2\, \mathbf{e}_2 = \mathbf{0}, \tag{17.3}$$
where $\mathbf{e}_h = (\mathbf{x}_{Q_h} - \mathbf{x}_P) / \|\mathbf{x}_{Q_h} - \mathbf{x}_P\|$ (h = 1, 2) are two unit vectors pointing from P to $Q_h$, whereas $T_1$ and $T_2$ denote the tension in each of the two strings, with $T_h \ge 0$, by definition of string.
◦ Comment. The above problem is two–dimensional, in the sense that all the forces lie on the same plane. [If you prefer, the component of Eq. 17.3 in the direction normal to the plane defined by P, $Q_1$ and $Q_2$ is identically satisfied, since 0 + 0 + 0 = 0.]
The solution may be obtained by writing the horizontal and vertical components of Eq. 17.3, namely
$$T_1 \cos\theta_1 - T_2 \cos\theta_2 = 0, \qquad T_1 \sin\theta_1 + T_2 \sin\theta_2 = W, \tag{17.4}$$
where $\theta_h \in (0, \pi/2)$ are the (acute) angles that each of the two strings forms with the ceiling, as shown in Fig. 17.2. The above equation may be written as
$$\begin{bmatrix} \cos\theta_1 & -\cos\theta_2 \\ \sin\theta_1 & \sin\theta_2 \end{bmatrix} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix} = \begin{bmatrix} 0 \\ W \end{bmatrix}. \tag{17.5}$$
This is a system of two linear algebraic equations with two unknowns, $T_1$ and $T_2$, whose determinant, $D = \sin\theta_1 \cos\theta_2 + \sin\theta_2 \cos\theta_1$, is positive, because all four trigonometric factors are positive. Hence, there exists a unique solution. [You may obtain an explicit expression for the solution of the above system, and compare it with the solution given below.]
◦ Comment. A more direct way to obtain the solution is to introduce the unit vectors $\mathbf{n}_h$ (h = 1, 2), which are respectively normal to $\mathbf{e}_h$, so that $\mathbf{e}_h \cdot \mathbf{n}_h = 0$ (h = 1, 2). [Accordingly, the components of the vectors $\mathbf{n}_h$ are given by $n_{hx} = e_{hy}$ and $n_{hy} = -e_{hx}$ (Eq. 15.48).] Dotting Eq. 17.3 with $\mathbf{n}_2$ and with $\mathbf{n}_1$, one obtains, respectively,
$$T_1 = \frac{\mathbf{j} \cdot \mathbf{n}_2}{\mathbf{e}_1 \cdot \mathbf{n}_2}\, W \qquad \text{and} \qquad T_2 = \frac{\mathbf{j} \cdot \mathbf{n}_1}{\mathbf{e}_2 \cdot \mathbf{n}_1}\, W. \tag{17.6}$$
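As a numerical check of Eq. 17.6, you may use the following plain-Python sketch. It is an illustration only: the function name `string_tensions` and the sample values are hypothetical, and the components of $\mathbf{e}_1$ and $\mathbf{e}_2$ are an assumption inferred from the signs in Eq. 17.4 (the first string leaning towards positive x, the second towards negative x), since the figure itself is not reproduced here.

```python
import math


def string_tensions(W, theta1, theta2):
    """Tensions T1, T2 from Eq. 17.6 for a weight W hanging by two strings.

    Assumption (from the signs in Eq. 17.4): e1 = (cos theta1, sin theta1)
    and e2 = (-cos theta2, sin theta2), with the y-axis pointing upwards.
    """
    e1 = (math.cos(theta1), math.sin(theta1))      # unit vector from P to Q1
    e2 = (-math.cos(theta2), math.sin(theta2))     # unit vector from P to Q2
    n1 = (e1[1], -e1[0])                           # normal to e1 (Eq. 15.48)
    n2 = (e2[1], -e2[0])                           # normal to e2
    j = (0.0, 1.0)

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    T1 = W * dot(j, n2) / dot(e1, n2)
    T2 = W * dot(j, n1) / dot(e2, n1)
    return T1, T2


# Hypothetical sample: W = 10, unequal angles.
W, th1, th2 = 10.0, 0.4, 1.1
T1, T2 = string_tensions(W, th1, th2)

# Residuals of the two component equations in Eq. 17.4 (both should vanish).
print(T1 * math.cos(th1) - T2 * math.cos(th2))
print(T1 * math.sin(th1) + T2 * math.sin(th2) - W)
```

In the symmetric case $\theta_1 = \theta_2 = \theta$, the expressions reduce to $T_1 = T_2 = W / (2 \sin\theta)$, which provides a simple sanity check.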
• Particle on a frictionless slope in the presence of drag
What about the tangential component of the contact force? You experience this whenever you try to move a piece of furniture by pushing it on its side. When such a component exists, we say that there exists “friction” between the two solid objects that are in contact with each other. [Friction is addressed in detail in the appendix (Section 17.8).] Accordingly, we have the following
Definition 271 (Frictionless contact). A contact is called frictionless iff it is incapable of producing a tangential force. [A sufficiently close example of a frictionless contact is provided by an object on an icy surface, such as a puck in ice hockey.]
Consider a particle moving, at a constant velocity, down an inclined frictionless slope, in the presence of drag. The configuration is shown in Fig. 17.3. [The particle is depicted as rectangular to emphasize that here we do not include rolling (a problem addressed in Section 23.5).]
Fig. 17.3 Particle on a slope
◦ Comment. You might object: “Why are you addressing a particle in motion within a chapter on statics?” The issue is similar to that addressed in Remark 119, p. 573, on the fall of a particle through the air. Indeed, as already pointed out in Remark 134, p. 665, the Newton equilibrium law, which governs statics, is a particular case of the Newton first law, in the sense that in a frame of reference that moves with the particle, the Newton first law reduces to the Newton equilibrium law. Here, we have a similar situation: if the forces are perfectly balanced, the particle moves with constant speed down the slope. Then, to an observer that has the same velocity (inertial frame of reference), the particle appears to be in equilibrium (namely not moving), and the laws of statics apply.
The forces that act on the particle are: (i) the weight $\mathbf{w} = -W \mathbf{j}$, (ii) the constraint reaction $N \mathbf{n}$, where $\mathbf{n}$ is pointed upwards, and (iii) the drag $D\, \mathbf{t}$, where $D = \tfrac{1}{2}\, C_D\, \varrho_A\, A\, v^2$ (Eq. 14.80), whereas $\mathbf{t}$ is a unit vector tangent to the slope and pointing upwards.
◦ Comment. As shown in Fig. 17.3, the forces are not applied at the same point. For the time being, think of the body as a material point; then, the three points of application coincide. [A more satisfactory justification may be obtained by using Theorem 178, p. 684, which indicates that a force $\mathbf{f}$ may be moved arbitrarily along the line of action, namely a line through P and parallel to $\mathbf{f}$ (Definition 274, p. 684). Accordingly, it is as if all the forces were applied at the center of the particle.]
The longitudinal and normal components (namely in the direction of $\mathbf{t}$ and $\mathbf{n}$) are, respectively,
$$-W \sin\alpha + \tfrac{1}{2}\, C_D\, \varrho_A\, A\, v^2 = 0 \qquad \text{and} \qquad N - W \cos\alpha = 0. \tag{17.7}$$
The first equation yields the speed as
$$v = \sqrt{\frac{2\, W \sin\alpha}{C_D\, \varrho_A\, A}}. \tag{17.8}$$
The second equation yields the constraint reaction as $N = W \cos\alpha$.
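For a quick numerical feel of Eqs. 17.7 and 17.8, here is a minimal Python sketch. The function name `slope_equilibrium` and the sample values are hypothetical; in particular, `rho_A` is assumed to denote the air density appearing in the drag expression of Eq. 14.80, and `A` the reference area.

```python
import math


def slope_equilibrium(W, alpha, C_D, rho_A, A):
    """Constant speed v (Eq. 17.8) and constraint reaction N (Eq. 17.7).

    W: weight, alpha: slope angle in radians, C_D: drag coefficient,
    rho_A: air density (assumed meaning), A: reference area.
    """
    v = math.sqrt(2.0 * W * math.sin(alpha) / (C_D * rho_A * A))
    N = W * math.cos(alpha)
    return v, N


# Hypothetical sample values in SI units.
W, alpha, C_D, rho_A, A = 100.0, math.radians(30.0), 1.0, 1.2, 0.5
v, N = slope_equilibrium(W, alpha, C_D, rho_A, A)

# Residual of the longitudinal equation in Eq. 17.7 (should vanish).
print(-W * math.sin(alpha) + 0.5 * C_D * rho_A * A * v ** 2)
print(v, N)
```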
17.3 Statics of n particles
In this section, we extend the formulation to a system of n particles. Specifically, we first consider the general case (Subsection 17.3.1). Then, we apply the general formulation to the specific case of spring–connected particles (Subsection 17.3.2). We will use the following
Definition 272. Consider the forces that two particles in an n-particle system exert on one another. The forces may be grouped into internal forces (which depend upon the relative location of the two particles, such as those due to the springs that connect them) and external forces, namely (prescribed) applied forces, as well as (non–prescribed) constraint reactions. [The nature of the forces is not relevant at this point. The important fact is that the external ones are prescribed and that the mathematical expression for the internal ones in terms of the particle locations is known.]
17.3.1 General formulation
Consider a general system of particles that are connected by forces (for instance, due to springs). For the sake of generality, the problem is formulated in three dimensions, so that one vector equation corresponds to three scalar equations. [Again, two–dimensional problems are considered as particular cases of the three–dimensional one, in which particle and forces are all contained in one plane.]
◦ Warning. For the sake of clarity and simplicity, at this point we assume that the particles are either (i) fully constrained, or (ii) completely free to move. In other words, we do not address more complicated situations in which a particle is partially constrained, for instance, free to move over a given plane or along a given direction, but constrained along the other directions, such as the bead of an abacus. [You might want to sharpen your skills and see how the formulation may be modified to include these types of problems. If you do, I suggest that you begin by considering the case in which a particle is free to move along a coordinate direction (say along the x-axis) and constrained otherwise.]
Akin to the formulation in Section 4.4, the points to which a spring chain is anchored are replaced by particles that are assumed to be constrained, so as not to move at all. This allows us to use the same type of equation for all the points. Let n > 1 denote the number of particles that are free to move and p ≥ 0 the number of the fixed particles, for a total number of particles equal to n + p. The unknowns of the problem are the locations of the n particles that are free to move, and the p forces (constraint reactions) that are necessary to keep the p fixed particles in their prescribed locations, for a total of n + p vector unknowns, namely 3 (n + p) scalar unknowns in three dimensions. On the other hand, the equations that allow one to solve the problem are the equilibrium equations of all the n + p particles.
Each one of these equations is conceptually identical to that for a single particle (Eq. 17.1). Let $\mathbf{x}_h$ identify the location of the h-th particle, with h = 1, . . . , n + p (where h = 1, . . . , n corresponds to the particles that are free to move, whereas h = n + 1, n + 2, . . . , n + p corresponds to the constrained ones). Also, let $\mathbf{f}_h^E$ denote the external force applied to particle h (including the constraint reaction for h = n + 1, . . . , n + p), and $\mathbf{f}_{hj}$ the internal force that particle j exerts on particle h. Then, we have
$$\mathbf{f}_h := \mathbf{f}_h^E + \sum_{j=1}^{n+p} \mathbf{f}_{hj} = \mathbf{0} \qquad (h = 1, \dots, n + p), \tag{17.9}$$
for a total of n + p vector equations, namely 3 (n + p) scalar equations in three dimensions.
◦ Comment. For notational simplicity, I have included all the possible forces, with the understanding that $\mathbf{f}_{hj} = \mathbf{0}$ if the particles h and j do not exert forces on one another. In addition, again for notational simplicity, akin to what we did in Eq. 4.57, I have included $\mathbf{f}_{hh}$ in the sum in Eq. 17.9, with the understanding that
$$\mathbf{f}_{hh} = \mathbf{0}. \tag{17.10}$$
◦ Comment. The first n equations (corresponding to the equilibrium of the free particles) do not contain the constraint reactions, namely the forces that keep the fixed particles in their place. These forces appear only in the last p equations. Thus, the first n equations may be solved independently from the last p. Specifically, the set of the first n equations includes the locations of all the n + p particles. However, the locations of the p fixed particles, $\mathbf{x}_j$ (j = n + 1, . . . , n + p), are known. Thus, these n vector equations form a system of 3n scalar equations with 3n scalar unknowns (the components of $\mathbf{x}_h$, with h = 1, . . . , n). Once these equations have been solved, the remaining 3p scalar equations may be used to obtain the 3p components of the constraint reactions.
Remark 136. Note that, in general, the converse is not true, in the sense that the last p equations cannot be solved unless we have already solved the first n. In other words, in general the constraint reactions cannot be obtained without finding first the location of all the free particles. [This is not always true. There is a notable exception: as shown later, under special circumstances (namely when, in three dimensions, we have exactly a total of six linearly independent scalar constraint reactions), all the constraint reactions may be evaluated directly (Subsection 17.6.5).]
17.3.2 Statics of spring–connected particles
Here, we assume that the internal forces are due to springs that connect the n + p particles. Note that, contrary to the one–dimensional case, the direction of the force $\mathbf{f}_{hj}$ (in particular, the alignment) is not known a priori (namely before solving the problem). However, we know for sure that the force is parallel to the line that connects the particle $P_h$ to the particle $P_j$, whose locations are identified by $\mathbf{x}_h$ and $\mathbf{x}_j$, respectively. For the time being, you may consider this fact as part of the definition of a spring. [This is analogous to the definition of a string (Remark 124, p. 589). Contrary to a string, a spring has no fixed length and, by definition, is assumed to withstand compression, even though a spring chain may not (see the warning in Subsection 17.4.2).]
◦ Comment. It should be emphasized that, as we will see, the alignment of the force with the vector $\mathbf{x}_h - \mathbf{x}_j$ is true as well for other forces (such as gravity, and that exchanged by atoms). This will be clarified by the introduction of the Newton third law in Subsection 17.6.1.
To be specific, the force $\mathbf{f}_{hj}$ is parallel to the vector $\mathbf{x}_h - \mathbf{x}_j$, which becomes known only after we have solved the problem. Nonetheless, we can write
$$\mathbf{f}_{hj} = -T_{hj}\, \mathbf{e}_{hj}, \tag{17.11}$$
where
$$\mathbf{e}_{hj} = \frac{\mathbf{x}_h - \mathbf{x}_j}{\ell_{hj}} = -\mathbf{e}_{jh}, \qquad \ell_{hj} = \|\mathbf{x}_h - \mathbf{x}_j\| \tag{17.12}$$
is a unit vector in the direction of $\mathbf{x}_h - \mathbf{x}_j$ (which points towards $\mathbf{x}_h$), whereas $T_{hj} = \pm \|\mathbf{f}_{hj}\|$ represents the magnitude of the force $\mathbf{f}_{hj}$, as well as the pointing. Specifically, we have $T_{hj} > 0$, if $\mathbf{f}_{hj}$ points away from $P_h$ (tension).
• Linear springs
Next, note that the intensity $T_{hj}$ of the spring force is only a function of the distance $\ell_{hj} = \|\mathbf{x}_h - \mathbf{x}_j\|$ between the two particles. Accordingly, $T_{hj}$ is not altered by rigid–body translations or rotations of the spring itself (use Eq. 16.68). To express $\mathbf{f}_{hj}$ in terms of the locations $\mathbf{x}_h$ and $\mathbf{x}_j$, we assume for simplicity that all the springs are linear, as in Eq. 4.8. Thus, the expression for $T_{hj}$ is similar to that for the one–dimensional case (Eq. 4.8). Specifically, we have
$$T_{hj} = \kappa_{hj} \big( \ell_{hj} - \ell_{hj}^{U} \big) = \kappa_{hj} \big( \|\mathbf{x}_h - \mathbf{x}_j\| - \ell_{hj}^{U} \big), \tag{17.13}$$
where $\kappa_{hj} > 0$, whereas $\ell_{hj}$ (Eq. 17.12) denotes the length of the individual spring in the equilibrium configuration and the superscript U denotes evaluation when the spring is unloaded. [Note that if $\ell_{hj} > \ell_{hj}^{U}$, Eq. 17.13 gives $T_{hj} > 0$, as it should. Indeed, in this case, the spring is in tension, and hence the force is directed away from $P_h$.]
If we combine Eqs. 17.11 and 17.13, we have the final expression for the relationship between forces and locations for linear springs, namely
$$\mathbf{f}_{hj} = -\kappa_{hj} \big( \ell_{hj} - \ell_{hj}^{U} \big)\, \mathbf{e}_{hj} = -\mathbf{f}_{jh}, \tag{17.14}$$
since $\kappa_{hj} = \kappa_{jh}$ and $\mathbf{e}_{hj} = -\mathbf{e}_{jh}$ (Eq. 17.12). Substituting this into Eq. 17.9 yields the desired system of 3 (n + p) algebraic equations in 3 (n + p) unknowns, namely: (i) the locations of the particles free to move ($\mathbf{x}_h$, with h = 1, . . . , n), and (ii) the constraint reactions for the particles that are fixed ($\mathbf{f}_h^E$, with h = n + 1, n + 2, . . . , n + p). Note that the equations are algebraic, in that they involve only algebraic operations. Even though we have assumed the springs to be linear, the equations are not necessarily linear, because $\ell_{hj}$ and $\mathbf{e}_{hj}$ are nonlinear functions of $\mathbf{x}_j$ (Eq. 17.12). [A notable exception is provided by the equations for the longitudinal equilibrium of a spring chain, discussed in Sections 4.3–4.6.]
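The relationship in Eq. 17.14 may be sketched in a few lines of plain Python. The function name `spring_force` and the sample values are hypothetical; the sketch assumes that the two particles do not coincide (otherwise the unit vector of Eq. 17.12 is undefined).

```python
import math


def spring_force(x_h, x_j, kappa, ell_U):
    """Force f_hj that particle j exerts on particle h through a linear spring.

    x_h, x_j: locations (sequences of three components),
    kappa: spring stiffness, ell_U: unloaded length.
    Assumption: x_h differs from x_j, so that the length is not zero.
    """
    diff = [p - q for p, q in zip(x_h, x_j)]
    ell = math.sqrt(sum(comp * comp for comp in diff))   # Eq. 17.12
    T = kappa * (ell - ell_U)                            # Eq. 17.13
    return [-T * comp / ell for comp in diff]            # Eqs. 17.11 and 17.14


# Hypothetical sample: stretched spring (length 5, unloaded length 4).
x_h, x_j = (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)
f_hj = spring_force(x_h, x_j, kappa=2.0, ell_U=4.0)
f_jh = spring_force(x_j, x_h, kappa=2.0, ell_U=4.0)
print(f_hj)   # points from P_h towards P_j, since the spring is in tension
print(f_jh)   # opposite of f_hj
```

Note that the force depends on the locations only through the difference of the two location vectors, so that a rigid translation of both particles leaves it unchanged.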
17.4 Chains of spring–connected particles
In this section, we consider a system composed of a chain of spring–connected particles that are anchored at both endpoints. The problem is analogous to that addressed in Section 4.3 (longitudinal equilibrium of spring–particle chains). However, the formulation presented here is more general, because now the forces are not necessarily longitudinal. Specifically, they may have components that are longitudinal, as well as lateral (i.e., transversal, namely perpendicular to the undeformed chain). Throughout the section, the springs are assumed to be linear.
A brief overview of the material covered in this rather long section is in order. We begin with the general three–dimensional formulation (Subsection 17.4.1). Next, we introduce a simplifying approach that will allow us to have a better understanding of the behavior of this system. Specifically, we split the overall problem into a two–part sequence. In the first part of the problem, we anchor one endpoint and then stretch the chain so that the other endpoint is anchored as well. Then, we address the equilibrium in this configuration, referred to as the stretched configuration (Subsection 17.4.2). In the second part, we apply the forces $\mathbf{f}_h$ to the free particles, and consider the exact equilibrium in the resulting configuration, referred to as the loaded configuration (Subsection 17.4.3, which is limited to planar configurations). Finally, in Subsection 17.4.4 (also limited to planar configurations), we present the linearized formulation for the problem under consideration, and show that in this case the two problems of longitudinal and lateral equilibrium decouple, in the sense that the solution for longitudinal and that for lateral equilibrium may be obtained independently from each other. Of course, they have to be superimposed to obtain the overall solution.
17. Statics in three dimensions
17.4.1 Exact formulation for spring–particle chains
♥
It is apparent that the problem under consideration is a particular case of the general one addressed in the preceding section. Now we have $n$ particles free to move, and two end particles that are fixed: $p = 2$. Contrary to the preceding section, where the fixed particles correspond to the numbers $n+1, \dots, n+p$, here we use the more convenient numbering system introduced in Sections 4.3–4.6. Specifically, $\mathbf{x}_0$ and $\mathbf{x}_{n+1}$ are the two (fixed) end particles, whereas $\mathbf{x}_h$, with $h = 1, \dots, n$, are the free particles. Of course, $\mathbf{x}_h$ and $\mathbf{x}_{h+1}$ ($h = 0, \dots, n$) denote two consecutive particles, so that the particle $h$ is spring–connected only to the particles $h-1$ and $h+1$ ($h = 1, \dots, n$). Accordingly, Eq. 17.9 reduces to

$$\mathbf{f}_{h,h+1} + \mathbf{f}_{h,h-1} + \mathbf{f}_h^E = \mathbf{0} \qquad (h = 1, \dots, n), \tag{17.15}$$
namely (use Eq. 17.11)

$$-T_{h,h+1}\,\mathbf{e}_{h,h+1} - T_{h,h-1}\,\mathbf{e}_{h,h-1} + \mathbf{f}_h^E = \mathbf{0} \qquad (h = 1, \dots, n), \tag{17.16}$$
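where $\mathbf{e}_{h,h\pm1} = (\mathbf{x}_h - \mathbf{x}_{h\pm1})/\ell_{h,h\pm1}$ (Eq. 17.12) and $T_{hj} = \kappa_{hj}\,(\ell_{hj} - \ell_{hj}^U)$ (Eq. 17.13), so that $T_{hj} > 0$ corresponds to a spring under tension. For the end particles, the locations $\mathbf{x}_0$ and $\mathbf{x}_{n+1}$ are given, whereas the constraint reactions, $\mathbf{f}_0^E$ and $\mathbf{f}_{n+1}^E$, are unknown. The corresponding equilibrium equations are

$$\mathbf{f}_{0,1} + \mathbf{f}_0^E = \mathbf{0} \qquad\text{and}\qquad \mathbf{f}_{n+1,n} + \mathbf{f}_{n+1}^E = \mathbf{0}. \tag{17.17}$$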
Note again that the equations for the particle locations may be solved without addressing the equations for the constraint reactions. Indeed, Eq. 17.16 corresponds to a total of $3n$ scalar equations. The corresponding unknowns are the three components of the locations of the $n$ free particles, for a total of $3n$ scalar unknowns. After solving these equations, we may use Eq. 17.17 to obtain the constraint reactions $\mathbf{f}_0^E$ and $\mathbf{f}_{n+1}^E$.

Equation 17.16 presents the exact formulation for the static equilibrium of a chain of spring–connected particles. As in the preceding section, these equations are, in general, algebraic and nonlinear. In the following subsections, we address the linearized equations corresponding to small displacements with respect to a stretched configuration.
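To make the structure of Eq. 17.16 concrete, the following is a minimal sketch (not part of the original text) that evaluates its left-hand side for every free particle. The function name `chain_residuals` and the list-based indexing (spring `h` joins particles `h` and `h+1`) are assumptions made for illustration only.

```python
import math

def chain_residuals(x, kappa, ell_u, f_ext):
    """Left-hand side of Eq. 17.16 for each free particle h = 1..n.

    x      : list of n+2 points (3-tuples); x[0] and x[n+1] are the anchors
    kappa  : list of n+1 stiffnesses; kappa[h] belongs to the spring h,h+1
    ell_u  : list of n+1 unstretched lengths, indexed like kappa
    f_ext  : list of n+2 applied forces (entries 0 and n+1 are not used)
    """
    n = len(x) - 2

    def spring_force(h, j, s):
        # Force on particle h due to spring s joining h and j: -T_hj * e_hj,
        # with e_hj = (x_h - x_j)/ell_hj and T_hj = kappa*(ell_hj - ell_hj^U).
        d = [a - b for a, b in zip(x[h], x[j])]
        ell = math.sqrt(sum(c * c for c in d))
        tension = kappa[s] * (ell - ell_u[s])
        return [-tension * c / ell for c in d]

    residuals = []
    for h in range(1, n + 1):
        right = spring_force(h, h + 1, h)
        left = spring_force(h, h - 1, h - 1)
        residuals.append([r + l + fe for r, l, fe in zip(right, left, f_ext[h])])
    return residuals

# Two free particles, three equal springs stretched from length 1.0 to 1.5, no loads.
x_straight = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
no_load = [(0.0, 0.0, 0.0)] * 4
res_straight = chain_residuals(x_straight, [10.0] * 3, [1.0] * 3, no_load)

# Same chain with particle 1 pushed sideways: the springs pull it back.
x_bent = [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
res_bent = chain_residuals(x_bent, [10.0] * 3, [1.0] * 3, no_load)
```

A configuration is an equilibrium exactly when all residuals vanish, so a function of this kind could be handed to any nonlinear root finder; the choice of solver is left open here.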
Part III. Multivariate calculus and mechanics in three dimensions
17.4.2 Equilibrium in the stretched configuration
♥
As mentioned above, we split the problem into a two–part sequence. In the first, we stretch the chain and anchor the two endpoints. In the second part, we introduce the applied forces $\mathbf{f}_h^E$. In this subsection, we address the first part, namely the equilibrium attained when the chain has been stretched and both endpoints have been anchored, while the forces $\mathbf{f}_h^E$ ($h = 1, \dots, n$) have not yet been applied.

In the original (unstretched) configuration, the particles are assumed to lie on a straight line that connects the endpoints. Let us choose the $x$-axis to be directed along such a line, with the origin coinciding with the zeroth particle (so that $x_0 = 0$), and all the other particles located along the positive side of the $x$-axis, so that

$$x_h > x_{h-1} \qquad (h = 1, \dots, n). \tag{17.18}$$
◦ Warning. I will use the subscript or superscript $U$ (for Unstretched) to refer to the original (unstretched) configuration, and the subscript or superscript $S$ (for Stretched, unloaded) to refer to the configuration attained after the chain has been stretched and both endpoints have been anchored.

The term “stretch” implies that the length of the chain gets longer. This in turn implies that the chain is in tension. It seems appropriate to make this assumption explicit. [The reason for it is similar to that used for strings, which, when we try to compress them by bringing the endpoints closer, become slack, namely can take any shape (different from a straight line) compatible with the string length.] In our case, if a mass–spring chain is under compression, the straight–line solution exists but is unstable. This statement is apparent from a physical point of view. [Hint: Add constraints that allow the particles to have only longitudinal motion. Then remove the constraints. What happens? It buckles! The corresponding mathematical formulation will be addressed in Vol. II.]

Consider the stretched configuration. This is obtained by moving the point $P_{n+1}$ along the $x$-axis, from $(\ell_U, 0)$ to $(\ell_S, 0)$ (where $\ell_S > \ell_U$), with $\mathbf{f}_h^E = \mathbf{0}$ ($h = 1, \dots, n$). Accordingly, Eq. 17.9 yields

$$\mathbf{f}^S_{h,h+1} = -\,\mathbf{f}^S_{h,h-1} \qquad (h = 1, \dots, n). \tag{17.19}$$
Next, let us consider the three–dimensional extension of the Newton third law of action and reaction. [In one–dimensional problems, we had $F_{kh} = -F_{hk}$, Eq. 4.6.] In this particularly simple case, Eq. 4.6 is extended by

$$\mathbf{f}^S_{h,h-1} = -\,\mathbf{f}^S_{h-1,h}. \tag{17.20}$$

[The general three–dimensional Newton third law of action and reaction includes Eq. 17.20 as a particular case, but has additional requirements (Remark 42, p. 152). More on this in Eqs. 17.59 and 17.61, after we introduce the definition of moment.] Combining Eqs. 17.19 and 17.20, we have

$$\mathbf{f}^S_{h,h+1} = \mathbf{f}^S_{h-1,h} =: \mathbf{f}_S \qquad (h = 1, \dots, n), \tag{17.21}$$
where $\mathbf{f}_S$ is a constant force. In other words, in the stretched configuration, the force that a spring exerts on the preceding particle is the same for all the particles. This in turn implies that the forces are coaligned, and so are the vectors $\mathbf{x}_h - \mathbf{x}_j$ (use Eqs. 17.11 and 17.12). Accordingly, in the stretched configuration, the particles lie along the $x$-axis, as intuitively expected. The fact that $\mathbf{f}_S$ is aligned with the $x$-axis yields (Eqs. 17.11 and 17.12)

$$\mathbf{e}_{h,h+1} = -\,\mathbf{i} \qquad\text{and}\qquad \mathbf{f}_S = \mathbf{f}^S_{h,h+1} = T_S\,\mathbf{i} \qquad (h = 0, \dots, n), \tag{17.22}$$

where, according to Eq. 17.13, we have

$$T_S = \kappa_{h,h+1}\,\bigl(\ell^S_{h,h+1} - \ell^U_{h,h+1}\bigr) > 0 \qquad (h = 0, \dots, n), \tag{17.23}$$

with

$$\ell^S_{h,h+1} = x^S_{h+1} - x^S_h \qquad\text{and}\qquad \ell^U_{h,h+1} = x^U_{h+1} - x^U_h. \tag{17.24}$$

[Note that $T_S > 0$, since $\ell^S_{h,h+1} > \ell^U_{h,h+1}$, because we have assumed the chain to be stretched. Accordingly, the force $\mathbf{f}_S$ exerted by a particle on the preceding one is directed like $\mathbf{i}$, namely away from $\mathbf{x}_h$.]

Next, we want to evaluate $T_S$. To this end, write Eq. 17.23 as
$$\frac{1}{\kappa_{h,h+1}}\,T_S = \ell^S_{h,h+1} - \ell^U_{h,h+1} \qquad (h = 0, 1, \dots, n), \tag{17.25}$$

and note that the sum of the lengths of individual springs equals the total length of the spring chain, in both the unstretched and stretched configurations, namely

$$\sum_{h=0}^{n} \ell^U_{h,h+1} = \ell_U \qquad\text{and}\qquad \sum_{h=0}^{n} \ell^S_{h,h+1} = \ell_S. \tag{17.26}$$

Combining the last two equations, we have

$$T_S = \kappa_C\,(\ell_S - \ell_U), \tag{17.27}$$

where $\kappa_C$ denotes the overall stiffness of the spring–particle chain, and is given by

$$\frac{1}{\kappa_C} = \sum_{h=0}^{n} \frac{1}{\kappa_{h,h+1}}. \tag{17.28}$$

◦ Comment. If all the springs have the same individual stiffness constant, $\kappa$, the above equation yields $1/\kappa_C = n/\kappa$, so that

$$\kappa_C = \frac{\kappa}{n} \qquad\text{and}\qquad T_S = \kappa\,\frac{\ell_S - \ell_U}{n}. \tag{17.29}$$

Indeed, the elongation of a single spring equals $1/n$ of the overall elongation of $n$ springs.
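As a numerical illustration of Eqs. 17.25, 17.27 and 17.28, here is a small sketch that takes one stiffness and one unstretched length per spring (however many springs the chain has) and returns the tension and the stretched length of each spring. The function names and the sample numbers are hypothetical, chosen only for this example.

```python
def chain_stiffness(kappas):
    """Overall stiffness kappa_C of springs in series (Eq. 17.28)."""
    return 1.0 / sum(1.0 / k for k in kappas)

def stretched_state(kappas, ell_u, ell_s_total):
    """Tension T_S (Eq. 17.27) and stretched length of each spring (Eq. 17.25).

    kappas      : stiffness of each spring
    ell_u       : unstretched length of each spring
    ell_s_total : distance between the two anchors after stretching
    """
    t_s = chain_stiffness(kappas) * (ell_s_total - sum(ell_u))
    lengths = [lu + t_s / k for k, lu in zip(kappas, ell_u)]
    return t_s, lengths

# Three springs, the middle one softer; the chain is stretched from 3.0 to 3.4.
t_s, lengths = stretched_state([100.0, 50.0, 100.0], [1.0, 1.0, 1.0], 3.4)
```

In this example the softer spring takes twice the elongation of the others, while the tension is the same in all three, as Eq. 17.21 requires.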
17.4.3 The loaded configuration (planar case)
♥
For the sake of simplicity, in the rest of this section, we limit ourselves to the planar problem. Specifically, we assume that all the forces lie within a plane that includes the x-axis. Let the y-axis denote the axis perpendicular to the x-axis within such a plane, with origin at x = x0 .
Fig. 17.4 Spring–particle chain
The scalar equations may be obtained by taking the $x$- and $y$-components of the equilibrium equation of each particle (Eq. 17.16). Let $\theta_{h,h+1} \in (-\pi/2, \pi/2)$ denote the angle that the segment $P_h P_{h+1}$ forms with the $x$-axis, so that (see Fig. 17.4)

$$\tan\theta_{h,h+1} = \frac{y_{h+1} - y_h}{x_{h+1} - x_h} \qquad (h = 0, 1, \dots, n), \tag{17.30}$$

whereas

$$\mathbf{e}_{h,h+1} = -\cos\theta_{h,h+1}\,\mathbf{i} - \sin\theta_{h,h+1}\,\mathbf{j} = -\,\mathbf{e}_{h+1,h} \qquad (h = 0, 1, \dots, n). \tag{17.31}$$
Hence, Eq. 17.16 (namely $-T_{h,h+1}\,\mathbf{e}_{h,h+1} - T_{h,h-1}\,\mathbf{e}_{h,h-1} + \mathbf{f}_h^E = \mathbf{0}$) yields

$$T_{h,h+1}\cos\theta_{h,h+1} - T_{h-1,h}\cos\theta_{h-1,h} + f_h^{(x)} = 0, \tag{17.32}$$

$$T_{h,h+1}\sin\theta_{h,h+1} - T_{h-1,h}\sin\theta_{h-1,h} + f_h^{(y)} = 0, \tag{17.33}$$

where $h = 1, \dots, n$, and we have set

$$\mathbf{f}_h^E = f_h^{(x)}\,\mathbf{i} + f_h^{(y)}\,\mathbf{j}. \tag{17.34}$$

Finally, noting that $\ell_{h,h+1} - \ell^U_{h,h+1} = (\ell_{h,h+1} - \ell^S_{h,h+1}) + (\ell^S_{h,h+1} - \ell^U_{h,h+1})$, and using $T_{h,h+1} = \kappa_{h,h+1}\,(\ell_{h,h+1} - \ell^U_{h,h+1})$ and $T_S = \kappa_{h,h+1}\,(\ell^S_{h,h+1} - \ell^U_{h,h+1})$ (Eqs. 17.13 and 17.23), one obtains

$$T_{h,h+1} = \kappa_{h,h+1}\,\bigl(\ell_{h,h+1} - \ell^S_{h,h+1}\bigr) + T_S. \tag{17.35}$$

Substituting the above equations into Eqs. 17.32 and 17.33 and using Eq. 17.30 to express $\cos\theta_{h,h+1}$ and $\sin\theta_{h,h+1}$, one obtains the desired system of $2n$ equations in $2n$ unknowns, namely the two components ($x_h$ and $y_h$) of $\mathbf{x}_h$ ($h = 1, \dots, n$).
17.4.4 Linearized formulation (planar case)
♥
Here, we apply the considerations on linearization (Section 13.8) to the case of interest. Let $\mathbf{u}_h$ denote the displacement of the particle $h$ from the stretched configuration:

$$\mathbf{u}_h := \mathbf{x}_h - \mathbf{x}_h^S = u_h\,\mathbf{i} + v_h\,\mathbf{j} \qquad (h = 1, 2, \dots, n), \tag{17.36}$$

where $\mathbf{x}_h$ and $\mathbf{x}_h^S$ identify the location of the particle $h$ in the fully loaded and in the stretched (unloaded) configurations, respectively. Next, for the sake of simplicity, assume that all the springs are equal, namely that they have the same constant, $\kappa$, and the same unstretched length, $\ell_U/n$. This implies that they have the same length

$$\breve{\ell}_S := \ell_S/n, \tag{17.37}$$

in the stretched configuration as well, as expected (use Eq. 17.23). The small–disturbance hypothesis may be introduced by assuming that the displacements are small compared to the length of the springs in the stretched configuration:

$$\|\mathbf{u}_h\| \ll \breve{\ell}_S. \tag{17.38}$$
In this case, for negligible nonlinear terms, we have

$$\begin{aligned} x_{h+1} - x_h &= (x^S_{h+1} + u_{h+1}) - (x^S_h + u_h) = \ell^S_{h,h+1} + u_{h+1} - u_h, \\ y_{h+1} - y_h &= (y^S_{h+1} + v_{h+1}) - (y^S_h + v_h) = v_{h+1} - v_h, \end{aligned} \tag{17.39}$$

since $y_h^S = 0$ ($h = 1, \dots, n$). Hence, neglecting nonlinear terms, we have (recall that $\ell^S_{h,h+1} = \breve{\ell}_S := \ell_S/n$, Eq. 17.37)

$$\ell_{h,h+1} \simeq x_{h+1} - x_h = \breve{\ell}_S + u_{h+1} - u_h, \tag{17.40}$$

as you may verify. In addition, recalling that, neglecting nonlinear terms, $\sin\alpha \simeq \alpha$, $\cos\alpha \simeq 1$ and $\tan\alpha \simeq \alpha$ (Eqs. 8.85, 8.86 and 8.87), we have

$$\sin\theta_{h,h+1} \simeq \tan\theta_{h,h+1} = \frac{v_{h+1} - v_h}{\breve{\ell}_S + u_{h+1} - u_h} \simeq \frac{1}{\breve{\ell}_S}\,(v_{h+1} - v_h) \tag{17.41}$$

and $\cos\theta_{h,h+1} \simeq 1$. Combining with Eqs. 17.32 and 17.33, and neglecting higher–order terms, one obtains

$$T_{h,h+1} - T_{h-1,h} + f_h^{(x)} = 0, \tag{17.42}$$

$$T_{h,h+1}\,\frac{v_{h+1} - v_h}{\breve{\ell}_S} - T_{h-1,h}\,\frac{v_h - v_{h-1}}{\breve{\ell}_S} + f_h^{(y)} = 0, \tag{17.43}$$

where $T_{h,h+1}$ is given by Eq. 17.35, whereas $f_h^{(x)}$ and $f_h^{(y)}$ denote the $x$– and $y$–components of $\mathbf{f}_h^E$.

Next, consider the corresponding small–displacement approximation of Eqs. 17.42 and 17.43. Note that $\ell_{h,h+1} - \breve{\ell}_S \simeq u_{h+1} - u_h$ (Eq. 17.40), combined with $T_{h,h+1} = \kappa\,(\ell_{h,h+1} - \breve{\ell}_S) + T_S$ (Eq. 17.35), yields

$$T_{h,h+1} \simeq \kappa\,(u_{h+1} - u_h) + T_S. \tag{17.44}$$
The next question is: “Can we neglect the terms $\kappa\,(u_{h+1} - u_h)$ when compared to $T_S$?” Here, a curious thing happens. The answer is: “It depends!” Indeed, if we substitute Eq. 17.44 into Eq. 17.42, we are left with

$$\kappa\,(u_{h+1} - 2u_h + u_{h-1}) + f_h^{(x)} = 0 \qquad (h = 1, \dots, n), \tag{17.45}$$

with $u_0 = u_{n+1} = 0$. Therefore, in this case, clearly the terms under consideration (namely $\kappa u_j$ with $j = h, h\pm1$ in Eq. 17.45) cannot be neglected, since they are linear! [Indeed, $T_S$ has disappeared from the equation, and hence the linear terms are the largest spring–related ones left in the equation.]

On the other hand, if we substitute Eq. 17.44 into Eq. 17.43, the terms under consideration yield quadratic terms, and hence may be neglected. [Indeed, terms of the type $u_h v_h$ are much smaller than the other spring–related terms, which are linear.] Accordingly, we are left with

$$\frac{T_S}{\breve{\ell}_S}\,(v_{h+1} - 2v_h + v_{h-1}) + f_h^{(y)} = 0 \qquad (h = 1, \dots, n), \tag{17.46}$$

with $v_0 = v_{n+1} = 0$. Equations 17.45 and 17.46 correspond to two uncoupled $n \times n$ linear algebraic systems, as desired. Let us analyze the two separately.
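• Longitudinal equilibrium

In matrix notation, Eq. 17.45 may be written as

$$\mathsf{K}\,\mathsf{u} = \mathsf{f}_x, \tag{17.47}$$

where $\mathsf{u} = [u_j]$ and $\mathsf{f}_x = [f_h^{(x)}]$, whereas $\mathsf{K}$ is given by

$$\mathsf{K} = [k_{hj}] = \kappa\,\mathsf{Q}, \tag{17.48}$$

with $\mathsf{Q}$ given by Eq. 4.20, namely

$$\mathsf{Q} = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-1 & 2 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 2 & -1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & -1 & 2 & -1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & -1 & 2
\end{bmatrix}. \tag{17.49}$$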
Note that Eq. 17.45 (and hence Eq. 17.47) does not contain any of the lateral components of the displacements. Moreover, the matrix Q in Eq. 17.49 coincides with that in Eq. 4.20. Therefore, Eq. 17.47 is fully equivalent to Eq. 4.14, which was obtained under the assumption of purely longitudinal motion. In other words, within the small–displacement approximation, the longitudinal equilibrium is not affected by the presence of lateral motion. We may add that all the considerations regarding the existence and uniqueness of the solution (Subsubsections “Existence and uniqueness of the solution,” on p. 158, and “Alternate proof of existence and uniqueness,” on p. 159) apply here as well.
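Because the matrix of Eq. 17.49 is tridiagonal, the system of Eq. 17.47 can be solved in a single forward and backward sweep. The sketch below uses the standard tridiagonal elimination (often called the Thomas algorithm), which is a choice made here for illustration and not a method prescribed by the text; the function name `solve_chain` and the sample load are likewise assumptions.

```python
def solve_chain(coeff, rhs):
    """Solve coeff * Q * w = rhs, where Q is the matrix of Eq. 17.49
    (2 on the diagonal, -1 on the two adjacent diagonals)."""
    n = len(rhs)
    diag = [2.0 * coeff] * n
    off = -1.0 * coeff
    b = list(rhs)
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        b[i] -= m * b[i - 1]
    # Back substitution.
    w = [0.0] * n
    w[-1] = b[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        w[i] = (b[i] - off * w[i + 1]) / diag[i]
    return w

# Longitudinal displacements (Eq. 17.45) for three free particles,
# each loaded by the same longitudinal force.
kappa = 200.0
f_x = [1.0, 1.0, 1.0]
u = solve_chain(kappa, f_x)
```

The end conditions $u_0 = u_{n+1} = 0$ are built into the matrix, so only the displacements of the free particles appear in `u`.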
• Lateral equilibrium

Similarly, Eq. 17.46 may be written as

$$\mathsf{K}\,\mathsf{v} = \mathsf{f}_y, \tag{17.50}$$

where $\mathsf{v} = [v_j]$ and $\mathsf{f}_y = [f_h^{(y)}]$, whereas $\mathsf{K}$ is given by

$$\mathsf{K} = [k_{hj}] = \frac{T_S}{\breve{\ell}_S}\,\mathsf{Q}, \tag{17.51}$$

with $\mathsf{Q}$ given again by Eq. 17.49 and

$$\frac{T_S}{\breve{\ell}_S} = \kappa\,\frac{\ell_S - \ell_U}{\ell_S}. \tag{17.52}$$

[Use Eqs. 17.27 and 17.29, as well as $\breve{\ell}_S := \ell_S/n$, Eq. 17.37.]

Note that Eq. 17.46 (and hence Eq. 17.50) does not contain any of the longitudinal components of the displacements. Thus, within the small–displacement approximation, the lateral equilibrium is not affected by the presence of longitudinal motion. Note also that, for a given value of $T_S$, the governing equations for lateral motion are independent of $\kappa$, which however determines $T_S$, for a given value of the elongation applied. Moreover, the above matrix differs from that for longitudinal motion (Eq. 4.20) only because of the factor $T_S/\breve{\ell}_S$, instead of $\kappa$. Hence, all the considerations on the solution existence and uniqueness apply in this case as well.
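To see how good the linearization is, consider the simplest case of a single free particle ($n = 1$ in Eq. 17.46) between two anchors, loaded laterally. The sketch below compares the linearized displacement with the exact lateral equation (Eq. 17.33 with the tension of Eq. 17.13). The function names and numerical values are hypothetical; by symmetry the longitudinal displacement is taken to be zero, which is an assumption of this example.

```python
import math

def lateral_linear(f_y, t_s, ell_breve):
    """Linearized lateral displacement of one free particle, from
    Eq. 17.46 with v_0 = v_2 = 0: (T_S/ell_breve) * (-2 v) + f_y = 0."""
    return f_y * ell_breve / (2.0 * t_s)

def lateral_exact_residual(v, f_y, kappa, ell_u, ell_breve):
    """Left-hand side of Eq. 17.33 for the same particle, not linearized.
    Both springs have length sqrt(ell_breve**2 + v**2)."""
    ell = math.hypot(ell_breve, v)
    tension = kappa * (ell - ell_u)          # Eq. 17.13
    return -2.0 * tension * v / ell + f_y

kappa, ell_u, ell_breve = 1000.0, 1.0, 1.1
t_s = kappa * (ell_breve - ell_u)            # tension in the stretched state
f_y = 0.1                                    # small lateral load

v_lin = lateral_linear(f_y, t_s, ell_breve)
residual = lateral_exact_residual(v_lin, f_y, kappa, ell_u, ell_breve)
```

For a load this small, the exact equation is satisfied by the linearized displacement to within a tiny fraction of the applied force; increasing `f_y` makes the discrepancy grow, as expected of a small–displacement approximation.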
• Summary

We may conclude this subsection by emphasizing that we have obtained the result anticipated in the beginning: under the small–displacement assumption, the two problems of longitudinal and lateral equilibrium decouple. The overall solution is obtained as the sum of the two. [We may use the superposition theorem (Theorem 34, p. 129), because of the linearity of the problem.]
17.5 Moment of a force

In the next three sections, we shift gears and address new types of problems. Specifically, in this section, we introduce a new tool (namely the so–called moment of a force), which is used to take into account the point of application of a force. Next, in Section 17.6, we use the definition of moment to introduce the Newton third law in three dimensions. This, in turn, allows us to uncover some properties of a collection of particles, such as the vanishing of the sum of all the forces that the particles exert on each other, and of all the corresponding moments. These results are then used in Section 17.7 to address the equilibrium of rigid systems of particles (rigid bodies), namely systems in which the mutual distance of the various particles (atoms and molecules, if you wish) remains virtually unchanged when the forces $\mathbf{f}_h$ are applied.

As stated in Definition 269, p. 665, forces are applied vectors. To identify the point of application of a force, it is convenient to introduce the following

Definition 273 (Moment of a force). Consider a force $\mathbf{f}$ with its point of application $P$, and an arbitrary point $O$. The moment $\mathbf{m}_O(\mathbf{f}, P)$ of the force $\mathbf{f}$, applied at the point $P$, with respect to the point $O$ is a vector defined by

$$\mathbf{m}_O(\mathbf{f}, P) := (\mathbf{x}_P - \mathbf{x}_O) \times \mathbf{f}, \tag{17.53}$$
where the vector xO identifies the location of the point O with respect to an arbitrary origin, whereas xP identifies the location, also with respect to the origin, of the point P where f is applied. The point O with respect to which we evaluate the moment will be referred to as the pivot of the moment, or more briefly as the pivot. [Note that while forces are applied vectors, moments are not.] ◦ Comment. The comments in Remark 135, p. 666, apply here as well. Specifically, you might wonder why I am using the symbol m (lower–case boldface letter) to indicate moments, instead of the more common M (capital boldface letter). The reason is again that, as stated in Definition 237, p. 596, in this book capital boldface letters are reserved for tensors, a notion introduced in Section 23.9. ◦ Warning. The above definition of moment — while correct from a theoretical point of view — does not help in clarifying the physical meaning of this notion, nor its relevance. These will emerge from the material presented later in this chapter (specifically, see Section 17.7, in particular Subsection 17.7.1 on levers and balance scales). However, before we do this, a few comments are in order if we want to have a better understanding of the implication of the above definition.
Recall that in the equilibrium of an individual particle, we have tacitly assumed that f is applied at the location P of the particle. Here, we analyze this aspect at a deeper level. To this end, consider the following Definition 274 (Line of action). Consider a force f applied at a point P . The point P and the alignment of f determine a line. Such a line is called the line of action of the force f applied at a point P . We have the following Theorem 178. Given two arbitrary points, P and Q, both on the line of action of the force f , we have mO (f , P ) = mO (f , Q).
(17.54)
◦ Proof : Indeed, mO (f , P ) − mO (f , Q) = (xP − xO ) × f − (xQ − xO ) × f = (xP − xQ) × f = 0,
(17.55)
because xP − xQ and f are parallel (by definition of line of action), and this implies that their cross product vanishes (Eq. 15.60). ◦ Comment. It should be emphasized that the above theorem indicates that the moment of the force f with respect to the point O depends upon the line of action, but not upon the specific point of the line of action where the force is applied. In other words, a force may be moved arbitrarily along the line of action without affecting its moment with respect to an arbitrary point O. As we will see, what really matters is not the point of application but the line of action. ◦ Warning. From now on, xP is replaced by x (which is understood to be any point on the line of action), and the notation mO (f , P ) will be replaced with the less cumbersome mO , whenever the force f and the corresponding line of action are apparent from the context.
17.5.1 Properties of moments Here, we address some of the consequences of the definition of moments. To begin with, let us examine how the moment of a given force f is affected by the pivot O with respect to which it is evaluated. We have the following Theorem 179. Consider a force f . The difference between the moments, mQ and mO (with pivots Q and O respectively) is given by
mQ − mO = (xO − xQ) × f .
(17.56)
◦ Proof : Let x denote any point on the line of action. We have mQ − mO = (x − xQ) × f − (x − xO ) × f = (xO − xQ) × f , in agreement with Eq. 17.56.
(17.57)
◦ Comment. Equation 17.56 tells us that the moment does not change if the pivot O is moved along a line parallel to f . This is expected, since a force may be moved along its line of action.
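The two results above (Theorems 178 and 179) are easy to check numerically. The following sketch defines the moment of Eq. 17.53 with plain tuples; the helper names `cross`, `sub` and `moment`, and the sample vectors, are assumptions made for this illustration.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def moment(f, x_p, x_o):
    """m_O(f, P) = (x_P - x_O) x f  (Eq. 17.53)."""
    return cross(sub(x_p, x_o), f)

f = (2.0, 1.0, 0.0)
x_p = (1.0, 3.0, 0.0)
x_o = (0.0, 0.0, 0.0)

# Theorem 178: slide the point of application along the line of action.
x_slid = tuple(p + 2.5 * c for p, c in zip(x_p, f))
m_o = moment(f, x_p, x_o)
m_o_slid = moment(f, x_slid, x_o)

# Theorem 179: change the pivot from O to Q.
x_q = (4.0, -1.0, 2.0)
m_q = moment(f, x_p, x_q)
pivot_shift = cross(sub(x_o, x_q), f)     # right-hand side of Eq. 17.56
```

Here `m_o` and `m_o_slid` coincide, whereas `m_q` differs from `m_o` by exactly `pivot_shift`.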
• Arm of moment

Let us introduce the following

Definition 275 (Arm of the moment). The arm $b \ge 0$ of the moment of a force $\mathbf{f}$ is the distance between the line of action of $\mathbf{f}$ and the pivot. [Remember that such a distance is measured along the normal to the line of action (Remark 126, p. 604).]

Remark 137. Note that $b = \|(\mathbf{x} - \mathbf{x}_O)_N\|$, where $\mathbf{x}$ denotes any point on the line of action of the force and $\mathbf{x}_O$ denotes any point on the axis of the moment, whereas the subscript $N$ denotes the portion of $\mathbf{x} - \mathbf{x}_O$ normal to $\mathbf{f}$ (Eq. 15.32). Then, using $\mathbf{a} \times \mathbf{b} = \mathbf{a} \times \mathbf{b}_N$ (Eq. 15.63), we have that the moment $\mathbf{m}_O = (\mathbf{x} - \mathbf{x}_O) \times \mathbf{f}$ (where $\mathbf{x}$ is any point on the line of action of $\mathbf{f}$, Eq. 17.53) is a vector, whose amplitude is given by

$$\|\mathbf{m}_O\| = \|\mathbf{f}\|\, b = A, \tag{17.58}$$

where $A$ is the area of the rectangle with sides equal to $\|\mathbf{f}\|$ and $b$. Moreover, the direction is normal to the plane defined by $\mathbf{f}$ and $O$, with pointing such that the right–hand rule is satisfied, namely that the moment sees the force as pointing counterclockwise (Definition 240, p. 606).

◦ Comment. The arm of a moment is particularly useful for a planar problem. In such a case, from a three–dimensional viewpoint, the moment has only one nonzero component, namely that along the normal to the plane, and hence it is fully identified by a scalar. Its sign is positive (negative) if the force is pointed counterclockwise (clockwise), with respect to the pivot.
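For the planar case, the relation between the arm and the scalar moment can be illustrated as follows. The sketch computes the arm geometrically (as the length of the part of $\mathbf{x} - \mathbf{x}_O$ normal to the force) and, independently, the signed scalar moment; the function names and sample values are hypothetical.

```python
import math

def arm(f, x, x_o):
    """Distance b between the pivot x_o and the line of action of f
    through x, i.e. the length of the part of x - x_o normal to f."""
    r = (x[0] - x_o[0], x[1] - x_o[1])
    f_norm = math.hypot(f[0], f[1])
    along = (r[0] * f[0] + r[1] * f[1]) / f_norm
    normal = (r[0] - along * f[0] / f_norm, r[1] - along * f[1] / f_norm)
    return math.hypot(normal[0], normal[1])

def scalar_moment(f, x, x_o):
    """Component of (x - x_o) x f normal to the plane; positive when the
    force points counterclockwise with respect to the pivot."""
    return (x[0] - x_o[0]) * f[1] - (x[1] - x_o[1]) * f[0]

f = (3.0, 4.0)
x = (5.0, 0.0)
pivot = (0.0, 0.0)

b = arm(f, x, pivot)
m = scalar_moment(f, x, pivot)
```

With these numbers the force has magnitude 5 and the arm is 4, so the magnitude of the moment is 20, in agreement with Eq. 17.58; the sign of `m` is positive because the force turns counterclockwise about the pivot.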
17.6 Particle systems In this section, we uncover new properties of systems of n particles in equilibrium. First, we exploit the definition of moment of a force to introduce the three–dimensional Newton third law. Then, we introduce some consequences of the Newton third law, which will be particularly useful in addressing rigid systems of particles, introduced in Section 17.7.
17.6.1 The Newton third law Here, we introduce the following Principle 3 (The Newton third law) The Newton third law (or law of action and reaction) states that the internal forces that the particles exert on one another 1. are equal and opposite vectors, and 2. their line of action coincides with the line connecting the two particles. Consider two particles located respectively at xh and xk . Let fhk denote the force that the particle k exerts on the particle h, whose point of application is by definition xh . Ditto for fkh . The first portion of the Newton third law may be expressed in mathematical terms as fhk + fkh = 0,
for all h, k.
(17.59)
On the other hand, the second portion of the Newton third law may be expressed as (xh − xk ) × fhk = 0,
for all h, k,
(17.60)
namely that fhk is parallel to xh −xk , since the forces have xh and xk as their point of application (use Eq. 15.60). In addition, let xO identify the location of an arbitrary point O in R3 . Then, Eq. 17.60 is equivalent to mhkO + mkhO = 0,
for all h, k,
(17.61)
where mhkO := (xh − xO ) × fhk
(17.62)
denotes the moment with respect to xO of the force fhk that the particle k exerts on the particle h.
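Indeed, we have (use Eqs. 15.60, 17.59 and 17.60)

$$\mathbf{m}_{hkO} + \mathbf{m}_{khO} = (\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_{hk} + (\mathbf{x}_k - \mathbf{x}_O) \times \mathbf{f}_{kh} = (\mathbf{x}_h - \mathbf{x}_k) \times \mathbf{f}_{hk} = \mathbf{0}. \tag{17.63}$$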
◦ Comment. This result may be obtained also in a more intuitive (gut–level) way, by recalling that in evaluating the moments the forces may be moved at our convenience along their line of action. Thus, moving them to the same point, and recalling that they have opposite direction, we obtain that the moments are equal and opposite, which is equivalent to Eq. 17.63. ◦ An important consideration. It should be emphasized that the notion of moment arises naturally in connection with the Newton third law. Indeed, its huge relevance in the field of mechanics is due exclusively to this law. In the absence of this law, we would have little use for the notion of moment.
17.6.2 Resultant force

Here, we examine some of the consequences of the Newton third law. Specifically, here we introduce the definition of resultant force. [Resultant moments are addressed in the next subsection.]

Consider a system of $n$ particles in $\mathbb{R}^3$. Recall that the total force $\mathbf{f}_h$ acting on the particle $h$ (sum of external and internal forces) is given by (Eq. 17.9)

$$\mathbf{f}_h = \mathbf{f}_h^E + \mathbf{f}_h^I, \qquad\text{where}\qquad \mathbf{f}_h^I = \sum_{k=1}^{n} \mathbf{f}_{hk}, \tag{17.64}$$
where fhI is the sum of all the internal forces fhk acting on the particle h, with fhk denoting the force that the particle k exerts on the particle h. [Note again that fhh is meaningless. Nonetheless, for the sake of notational simplicity, we have included it in Eq. 17.64, with the convention that fhh = 0 (Eq. 17.10).] ◦ Comment. It should be recognized that the identification of a force as either internal or external depends upon which particles we choose to include in the system. Consider, for instance, the system of particles examined in Section 17.3, where each particle is allowed to be spring–connected to any other particle. We can make two choices. First, we can include all the particles, be they free or constrained. In this case, all the spring forces are considered internal, whereas the constraint reactions are included in our system of external forces. In other words, with this choice, the external forces comprise both the applied forces as well as the constraint reactions. We have the following
Definition 276 (Resultant force). Given the forces $\mathbf{f}_h$ ($h = 1, \dots, n$) acting on $n$ particles, the vector

$$\mathbf{f} := \sum_{h=1}^{n} \mathbf{f}_h \tag{17.65}$$

is called the resultant force, or the resultant of the forces.

◦ Warning. It may be noted that some authors use the term net force to refer to what we call the resultant force. These authors use the term “resultant force” as inclusive of what we call resultant moment (Definition 277, p. 689).

We have the following

Theorem 180 (Internal resultant force theorem). Given a system of $n$ particles in equilibrium, we have that the resultant of all the internal forces acting on these $n$ particles vanishes, namely (use Eq. 17.64)

$$\sum_{h=1}^{n} \mathbf{f}_h^I = \sum_{h,k=1}^{n} \mathbf{f}_{hk} = \mathbf{0}. \tag{17.66}$$
◦ Proof: Summing $\mathbf{f}_{hk}$ over $h$ and $k$ (with $h, k = 1, \dots, n$), we have that the resulting sum contains both $\mathbf{f}_{hk}$ and $\mathbf{f}_{kh}$. Thus, recalling that $\mathbf{f}_{hh} = \mathbf{0}$ (Eq. 17.10) and using $\mathbf{f}_{hk} + \mathbf{f}_{kh} = \mathbf{0}$ (Eq. 17.59), one obtains Eq. 17.66.

As a consequence, we have the following

Theorem 181 (Resultant force theorem). Given a system of $n$ particles in equilibrium, we have that the resultant of all the forces applied to the $n$ particles equals the resultant force $\mathbf{f}_E$ due solely to the external forces. In other words, we have

$$\mathbf{f} := \sum_{h=1}^{n} \mathbf{f}_h = \mathbf{f}_E, \tag{17.67}$$

where

$$\mathbf{f}_E := \sum_{h=1}^{n} \mathbf{f}_h^E. \tag{17.68}$$
◦ Proof: Indeed, we have

$$\mathbf{f} := \sum_{h=1}^{n} \mathbf{f}_h = \sum_{h=1}^{n} \mathbf{f}_h^E + \sum_{h,k=1}^{n} \mathbf{f}_{hk} = \sum_{h=1}^{n} \mathbf{f}_h^E =: \mathbf{f}_E, \tag{17.69}$$

in agreement with Eq. 17.67. [Hint: Start from $\mathbf{f} := \sum_{h=1}^{n} \mathbf{f}_h$ (Eq. 17.65), with $\mathbf{f}_h = \mathbf{f}_h^E + \sum_{k=1}^{n} \mathbf{f}_{hk}$ (Eq. 17.64), and use Eqs. 17.66 and 17.68.]
17.6.3 Resultant moment

Next, in order to exploit the second part of the Newton third law (Eq. 17.61), it is useful to introduce the following

Definition 277 (Resultant moment). Given the forces $\mathbf{f}_h$ ($h = 1, \dots, n$) acting on $n$ particles located at $\mathbf{x}_h$, the vector

$$\mathbf{m}_O := \sum_{h=1}^{n} \bigl[(\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h\bigr] \tag{17.70}$$

is called the resultant moment (or the resultant of the moments) with respect to the point $O$.

We have the following$^2$

Theorem 182 (Varignon theorem). For a given pivot $O$, the sum of the moments of several forces $\mathbf{f}_h$, all applied at the same point, $P$, equals the moment of the sum of the forces $\mathbf{f} = \sum_{h=1}^{n} \mathbf{f}_h$, also applied at $P$.

◦ Proof: Indeed, we have

$$\mathbf{m}_O = \sum_{h=1}^{n} \bigl[(\mathbf{x}_P - \mathbf{x}_O) \times \mathbf{f}_h\bigr] = (\mathbf{x}_P - \mathbf{x}_O) \times \mathbf{f}, \tag{17.71}$$

in agreement with the theorem.
We also have the following

Theorem 183 (Internal resultant moment theorem). Given a system of $n$ particles $P_h$, we have that the resultant moment, with respect to an arbitrary pivot $O$, due solely to the internal forces vanishes, namely

$$\sum_{h,k=1}^{n} \mathbf{m}_{hkO} = \mathbf{0}, \tag{17.72}$$

where

$$\mathbf{m}_{hkO} := (\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_{hk}. \tag{17.73}$$

$^2$ Named after the French mathematician Pierre Varignon (1654–1722).
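◦ Proof: Note that the sum in Eq. 17.72 includes both $\mathbf{m}_{hkO}$ and $\mathbf{m}_{khO}$. Thus, using the fact that $\mathbf{m}_{hkO} + \mathbf{m}_{khO} = \mathbf{0}$ (Eq. 17.61) and recalling that $\mathbf{f}_{hh} = \mathbf{0}$ (Eq. 17.10), one obtains Eq. 17.72.

As a consequence, we have the following

Theorem 184 (Resultant moment theorem). Given a system of $n$ particles, we have that the resultant of all the moments applied to the $n$ particles equals the resultant of the moments due solely to the external forces, namely

$$\mathbf{m}_O := \sum_{h=1}^{n} \bigl[(\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h\bigr] = \mathbf{m}_O^E, \tag{17.74}$$

where

$$\mathbf{m}_O^E := \sum_{h=1}^{n} \bigl[(\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h^E\bigr] \tag{17.75}$$

is the external resultant moment.

◦ Proof: We have

$$\begin{aligned} \mathbf{m}_O &= \sum_{h=1}^{n} \bigl[(\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h\bigr] = \sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h^E + \sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_O) \times \sum_{k=1}^{n} \mathbf{f}_{hk} \\ &= \sum_{h=1}^{n} \bigl[(\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h^E\bigr] = \mathbf{m}_O^E, \end{aligned} \tag{17.76}$$

in agreement with Eq. 17.74. [Hint: Start from $\mathbf{m}_O := \sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h$ (Eq. 17.70), where again $\mathbf{f}_h := \mathbf{f}_h^E + \sum_{k=1}^{n} \mathbf{f}_{hk}$ (Eq. 17.64), and use Eqs. 17.72, 17.73, and 17.75.]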
17.6.4 Equipollent systems. Torques and wrenches For future reference, it is convenient to introduce the following Definition 278 (Equipollent systems). Two systems of forces and moments are called equipollent, iff they have the same resultant force and the same resultant moment.
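The definition of equipollence lends itself to a direct numerical check. The sketch below computes the resultant force (Eq. 17.65) and the resultant moment (Eq. 17.70) of a list of applied forces and compares two systems; it also illustrates Theorems 180 and 183, in that adding an action–reaction pair directed along the line joining two particles leaves both resultants unchanged. The function names, the tolerance and the sample data are assumptions made for this example.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def resultants(system, x_o):
    """Resultant force and resultant moment about x_o of a list of
    (point of application, force) pairs."""
    force = [0.0, 0.0, 0.0]
    moment = [0.0, 0.0, 0.0]
    for x, f in system:
        m = cross([a - b for a, b in zip(x, x_o)], f)
        for i in range(3):
            force[i] += f[i]
            moment[i] += m[i]
    return force, moment

def equipollent(sys_a, sys_b, x_o=(0.0, 0.0, 0.0), tol=1e-9):
    """True if the two systems share resultant force and resultant moment."""
    fa, ma = resultants(sys_a, x_o)
    fb, mb = resultants(sys_b, x_o)
    return all(abs(p - q) <= tol for p, q in zip(fa + ma, fb + mb))

# External forces on two particles.
x1, x2 = (0.0, 0.0, 0.0), (2.0, 1.0, 0.0)
external = [(x1, (1.0, 0.0, 3.0)), (x2, (0.0, -2.0, 1.0))]

# Add an internal pair obeying both parts of the Newton third law:
# equal and opposite (Eq. 17.59) and directed along x1 - x2 (Eq. 17.60).
pull = tuple(0.7 * (a - b) for a, b in zip(x1, x2))
with_internal = external + [(x1, pull), (x2, tuple(-c for c in pull))]

# A pair that is equal and opposite but NOT along the connecting line.
skewed = external + [(x1, (0.0, 0.0, 1.0)), (x2, (0.0, 0.0, -1.0))]
```

In this example `external` and `with_internal` are equipollent, whereas `skewed` is not: its extra pair has zero resultant force but a nonzero resultant moment.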
• Pure moments. Torques

We have the following

Definition 279 (Pure moment). A pure moment denotes a system of forces with zero resultant force (but not zero resultant moment).

We have the following

Definition 280 (Torque). A torque is a system of two equal and opposite forces, having different lines of action.

◦ Warning. Some authors use the term “torque” to indicate what we call the “moment due to a torque.” I prefer to avoid the use of the term torque with this alternate meaning, so as to avoid any possible source of confusion. [Also, the term couple is sometimes used as equivalent to torque. I prefer to avoid the term “couple” since it has multiple meanings, and use instead only the term “torque” with the meaning in Definition 280 above.]

It is apparent that a torque has a zero resultant force. Thus, a torque corresponds to a pure moment. Also, given a torque, the corresponding resultant moment is such that: (i) its magnitude equals the common magnitude of the two forces multiplied by the distance of their (parallel) lines of action, and (ii) its direction is normal to the plane that contains the two forces, with pointing such that the moment sees the torque as a counterclockwise system (Remark 137, p. 685). We have the following theorems:

Theorem 185 (Pivot for a torque). For a torque, there is no need to specify the pivot.

◦ Proof: Indeed, consider two forces $\mathbf{f}_1$ and $\mathbf{f}_2$, with $\mathbf{f}_1 + \mathbf{f}_2 = \mathbf{0}$. For any two pivots $O$ and $Q$ arbitrarily chosen, we have, in analogy with Eq. 17.56, $\mathbf{m}_Q - \mathbf{m}_O = (\mathbf{x}_O - \mathbf{x}_Q) \times (\mathbf{f}_1 + \mathbf{f}_2) = \mathbf{0}$.

Theorem 186 (Torque equipollent to a pure moment). We can always construct a torque equipollent to any prescribed pure moment.

◦ Proof: Given a pure moment, $\mathbf{m}$, consider any plane normal to $\mathbf{m}$. On this plane, introduce an arbitrary force $\mathbf{f}$ (arbitrary magnitude and arbitrary direction). Add to this a second force, $-\mathbf{f}$, with a line of action located on the same plane, and distant from the first by a quantity $d = \|\mathbf{m}\|/\|\mathbf{f}\|$. [There exist two lines of action satisfying the above condition. The correct one is that for which the moment sees the torque as a counterclockwise system (Remark 137, p. 685).] The torque thereby constructed is equipollent to $\mathbf{m}$, as you may verify.
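The statement of Theorem 185 and the magnitude rule for a torque can be checked with a few lines of code. In the sketch below, the helper names and the particular torque (two forces of magnitude 3 whose lines of action are a distance 2 apart) are assumptions chosen for illustration.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def resultant_moment(system, x_o):
    """Resultant moment (Eq. 17.70) of (point, force) pairs about x_o."""
    total = [0.0, 0.0, 0.0]
    for x, f in system:
        m = cross([a - b for a, b in zip(x, x_o)], f)
        total = [t + c for t, c in zip(total, m)]
    return tuple(total)

# A torque: f applied at (1, 0, 0) and -f applied at (-1, 0, 0).
torque = [((1.0, 0.0, 0.0), (0.0, 3.0, 0.0)),
          ((-1.0, 0.0, 0.0), (0.0, -3.0, 0.0))]

m_origin = resultant_moment(torque, (0.0, 0.0, 0.0))
m_other = resultant_moment(torque, (7.0, -2.0, 5.0))
```

Both pivots give the same moment, of magnitude $3 \times 2 = 6$, directed along the positive $z$-axis, from which the pair of forces is seen as a counterclockwise system.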
Theorem 187 (Change of line of action). A force $\mathbf{f}$ applied at a point $\mathbf{x}_O$ is equipollent to the same force applied at a different point $\mathbf{x}_Q$, plus a moment equal to $\mathbf{m} = (\mathbf{x}_O - \mathbf{x}_Q) \times \mathbf{f}$.

◦ Proof: To the original force $\mathbf{f}$ applied at $\mathbf{x}_O$, add two forces, $\mathbf{f}$ and $-\mathbf{f}$, both applied at $\mathbf{x}_Q$. It is apparent that this addition alters neither the resultant force nor the resultant moment. Thus, the new system is equipollent to the original one and it may be conceived as follows: a force $\mathbf{f}$ applied at $\mathbf{x}_Q$ plus a torque composed of $\mathbf{f}$ applied at $\mathbf{x}_O$ and $-\mathbf{f}$ applied at $\mathbf{x}_Q$, whose moment is indeed $\mathbf{m} = (\mathbf{x}_O - \mathbf{x}_Q) \times \mathbf{f}$, as you may verify.
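A short numerical check of Theorem 187 follows. The pivot used for the comparison is arbitrary, and the function name `moment_about` with its `free_moment` argument is a hypothetical helper introduced only for this sketch.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def moment_about(pivot, x_app, f, free_moment=(0.0, 0.0, 0.0)):
    """Moment about `pivot` of a force f applied at x_app, plus a pure
    moment (which, by Theorem 185, needs no pivot)."""
    arm = [a - b for a, b in zip(x_app, pivot)]
    return tuple(c + m for c, m in zip(cross(arm, f), free_moment))

f = (1.0, 2.0, -1.0)
x_o = (0.0, 1.0, 0.0)       # original point of application
x_q = (3.0, -1.0, 2.0)      # new point of application

# Compensating moment of Theorem 187: m = (x_O - x_Q) x f.
m = cross([a - b for a, b in zip(x_o, x_q)], f)

pivot = (5.0, 5.0, 5.0)
original = moment_about(pivot, x_o, f)
moved = moment_about(pivot, x_q, f, free_moment=m)
```

The resultant force is the same in both systems by construction, and `original` equals `moved`, so the two systems are equipollent.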
• Wrenches We have the following Definition 281 (Wrench). A system composed of a force and a moment parallel to the force is called a wrench. ◦ Comment. The name stems from the fact that this system produces a combined effect of push and twist, akin to that obtained with a mechanical wrench. [To be precise, such an effect is obtained with a specific wrench, the so–called lug wrench (sometimes referred to as a tire iron), the one shaped like a cross, which you might have used to remove the lug nuts (bolts) of your car wheels. Indeed, any double–handled wrench (not to mention a screwdriver) produces a similar effect.] We have the following Theorem 188 (Equipollence of any system to a wrench). Any system of forces and moments is equipollent to a wrench. ◦ Proof : Consider the resultant force, f , and the resultant moment, mO . Using the parallel– and normal–vector decomposition (Subsection 15.2.2), let us decompose the moment as mO = mOP +mON , with mOP and mON respectively parallel and normal to f . Then, following the construction introduced in the proof of Theorem 186 above, we can replace mON with a torque, with one of the two forces being equal and opposite to f , and having the same line of action. The force −f and the original force f cancel out, and we are left with a system composed of a force equal to the original one, but with a different line of action, and of a moment mOP , parallel by definition to f .
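The construction in the proof of Theorem 188 can be carried out explicitly. The sketch below uses a standard closed-form choice for a point on the new line of action, $\mathbf{x}_O + (\mathbf{f} \times \mathbf{m}_O)/\|\mathbf{f}\|^2$; this formula is not stated in the text and is introduced here as an assumption, which may be verified with Eq. 17.56. The function name `to_wrench` is hypothetical, and the resultant force is assumed to be nonzero.

```python
def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def to_wrench(f, m_o, x_o):
    """Reduce a resultant force f (assumed nonzero) and a resultant moment
    m_o about x_o to a wrench: returns a point on the new line of action
    of f and the moment about that point, which is parallel to f."""
    ff = dot(f, f)
    shift = tuple(c / ff for c in cross(f, m_o))
    x_axis = tuple(a + s for a, s in zip(x_o, shift))
    m_parallel = tuple(dot(f, m_o) / ff * c for c in f)
    return x_axis, m_parallel

f = (0.0, 0.0, 2.0)
m_o = (4.0, 0.0, 6.0)
x_o = (0.0, 0.0, 0.0)
x_axis, m_parallel = to_wrench(f, m_o, x_o)

# Independent check via Eq. 17.56: moment about the new point.
m_check = tuple(a + b for a, b in
                zip(m_o, cross([p - q for p, q in zip(x_o, x_axis)], f)))
```

Here the normal part of the moment, $(4, 0, 0)$, is absorbed by moving the line of action to pass through `x_axis`, and only the part parallel to the force remains.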
17.6.5 Considerations on constrained particle systems The considerations presented in this subsection are valid for a generic particle system. However, they will turn out to be of particular interest for the statics of rigid systems, which are introduced in Section 17.7. Remark 138. It goes without saying that the particles are assumed to be “path–connected,” in the sense that from any particle we can reach any other particle through a path composed of a sequence of springs and particles. Consider an n-particle system in static equilibrium. Recall that we have $\sum_{h=1}^{n} \mathbf{f}_h = \mathbf{f}^E$ (Eq. 17.69) and $\sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_O) \times \mathbf{f}_h = \mathbf{m}_O^E$ (Eq. 17.74). For the equilibrium of the h-th particle, the force $\mathbf{f}_h$ must vanish (Eq. 17.9), namely $\mathbf{f}_h = \mathbf{0}$. Combining these equations, we obtain that, if the system of all n particles is in static equilibrium, then we necessarily have
$$\mathbf{f}^E = \mathbf{0} \quad\text{and}\quad \mathbf{m}_O^E = \mathbf{0}. \qquad (17.77)$$
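In plain language, a system can be in equilibrium only if the resultant force and the resultant moment vanish. Thus, Eq. 17.77 yields
$$\mathbf{f}^E = \sum_{h=1}^{n} \mathbf{f}_h^A + \sum_{h=1}^{p} \mathbf{f}_h^R = \mathbf{0} \quad\text{and}\quad \mathbf{m}_O^E = \sum_{h=1}^{n} \mathbf{m}_h^A + \sum_{h=1}^{p} \mathbf{m}_h^R = \mathbf{0}. \qquad (17.78)$$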
On the other hand, we may choose to include only the free particles: in this case, the constraint reactions are not included in our system of forces, but the forces of those springs that are connected to the constrained particles are now included with the external forces. [Of course, the two formulations are equivalent, as you may verify.]
• Statically determinate particle systems We have the following Definition 282 (Statically determinate system). In three dimensions, an n-particle system is called statically determinate (or isostatic), iff (i) there exist exactly six scalar constraint reactions, and (ii) the six scalar equations in Eq. 17.78 are linearly independent with respect to these six unknowns. In this case, the six constraint reactions may be determined independently of the deformations. [For two–dimensional problems, just replace six with three.]
◦ Warning. It should be noted that two fixed particles do not provide us with six linearly independent constraint reactions. This is apparent from the fact that the corresponding system can rotate around the line through the two particles. To have a statically determinate system we must consider partially constrained particles. To give you an example, consider: (i) a first particle fully constrained, placed at the origin, (ii) a second particle placed on the x-axis and constrained along the y- and z-directions, but free to move in the x-direction, and (iii) a third particle placed on the y-axis and free to move in the (x, y)-plane, but constrained in the z-direction. We have six constraint reactions that are sufficient to constrain the system. ◦ Comment. For a system that is not statically determinate, we have two possibilities. One occurs, for instance, if there exist fewer than six scalar constraint reactions (fewer than three, in two dimensions). This is an under–constrained system: the solution to the overall problem may not exist (unbalanced system), or, if it exists, it is not unique. [This is analogous to the one–dimensional unanchored spring–particle chains (Section 4.6).] A second possibility occurs when there exist more than six linearly independent constraint reactions. In this case, the constraint reactions may be obtained only after we have evaluated the deformations (Remark 136, p. 672).
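To see how the six reactions of the three-particle example in the Warning follow from the six scalar equations alone, consider the following Python sketch. It assumes a single applied force f acting at a point x, with the second particle at (a, 0, 0) and the third at (0, c, 0) in the equilibrium configuration. The function name, the argument names and the sample data are hypothetical.

```python
# Six constraint reactions for the support described in the Warning (a sketch).

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def reactions(f, x, a, c):
    """Solve resultant force = 0 and resultant moment about the origin = 0.

    f: applied force; x: its point of application;
    a: abscissa of the second particle (a != 0);
    c: ordinate of the third particle (c != 0).
    """
    mx, my, mz = cross(x, f)   # moment of the applied force about the origin
    R3z = -mx / c              # x-component of the moment equation
    R2z = my / a               # y-component of the moment equation
    R2y = -mz / a              # z-component of the moment equation
    R1x = -f[0]                # x-component of the force equation
    R1y = -f[1] - R2y          # y-component of the force equation
    R1z = -f[2] - R2z - R3z    # z-component of the force equation
    return {"R1x": R1x, "R1y": R1y, "R1z": R1z,
            "R2y": R2y, "R2z": R2z, "R3z": R3z}

# A downward force of magnitude 10 applied at (1, 1, 0), with a = c = 2.
R = reactions((0.0, 0.0, -10.0), (1.0, 1.0, 0.0), 2.0, 2.0)
print(R)
```

The unknowns can be found one at a time because each moment equation contains a single reaction. No information on the deformations is needed.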
17.7 Rigid systems. Rigid bodies Here, we study what happens to a system of n spring–connected particles in equilibrium, when the stiffness κhj of each spring becomes larger and larger. We can see, from Eq. 17.14, that, if fhj remains constant as we vary κhj, then $\ell_{hj} - \ell_{hj}^{U}$ becomes smaller and smaller. In other words, the distance between the various particles tends to remain unchanged. In general, even when the forces are not due to springs (for instance gravity, or the force exchanged by atoms), we have the following definitions: Definition 283 (Rigid n-particle system). A system of n particles is called a rigid n-particle system iff the particles are rigidly connected to each other, namely iff the distance of each particle from all the others remains unchanged. [Sometimes a system that is not rigid may be approximated with a rigid one, provided that the relative motion of the particles is negligible, as in the case of a system of n spring–connected particles with very stiff springs, subject to relatively small forces.] Definition 284 (Rigid body). Consider a single piece of solid material in which the deformations are negligible. Note that this piece may be considered
as composed of particles (namely atoms) that are rigidly connected to each other. Hence, it may be treated as a rigid n-particle system. Systems of this type are called rigid bodies. Of course, a rigid system (in particular a rigid body) may attain equilibrium only if the resultant forces and moments vanish, namely fE = 0 and mOE = 0, since these conditions are necessary for any system (Eq. 17.77). This is particularly important in the case of statically determinate systems, namely systems with six linearly independent scalar constraint reactions due to the constraints (Definition 282, p. 693). In this case, as mentioned above, the reactions may be obtained directly from Eq. 17.77. Remark 139. Let us go back to the definition of a string (Remark 124, p. 589), namely that the forces acting on its endpoints are always aligned with the string itself. Note that as long as the string is in tension, nothing changes if we replace it with a bar (namely an elongated rigid body) hinged at the endpoints. The equilibrium of the bar implies that indeed such a bar is aligned with the forces acting on its endpoints, as you may verify (Fig. 17.5).
Fig. 17.5 Equilibrium of a bar
17.7.1 Levers and balance scales As anticipated in Section 17.5, here we consider some classical applications of the above considerations, which help in clarifying the physical meaning of the notion of moment, as well as its relevance in mechanics. Specifically, we consider levers and balance scales (as distinct from spring scales). We also introduce the definition of barycenter, or center of gravity. These examples should help in grasping the physical meaning and significance of moments. The example that is best suited to understand the physical meaning of the moment at an intuitive level is that of the lever, which you are most likely quite familiar with, at least from a practical point of view. [As noted in Footnote 7, p. 150, the principle of the lever was introduced by Archimedes.] To be specific, a lever consists of a bar, hinged at a point called the fulcrum, which is acted upon by two applied forces: the first one, the input force, is the force that we apply so as to balance the second one, the output
force. A third force is the constraint reaction at the fulcrum. Traditionally, levers are classified into three kinds, which are addressed below. ◦ Comment. In dealing with levers, we typically choose the pivot (namely the point about which we measure the moments) to coincide with the fulcrum. In this way, the constraint reaction at the fulcrum need not be evaluated.
• Levers of the first kind In levers of the first kind, the fulcrum is located between the two forces. This is the kind of lever that most people think about when they hear the word lever. The most common example is the balance scale. [Other examples include crowbars, oars, seesaws, pliers, scissors — they all operate according to the same principle, and are classified as levers of the first kind.] There are two types of balance scales. In one case, you have two pans, hanging from a horizontal bar, with hanging points that are equally distant from the fulcrum. You place the object that you want to weigh in one of the two pans, and put enough known weights in the other pan until you reach a balance. This type of scale was already encountered in Subsection 4.1.2. Now, we can address it more appropriately. Let WB be the total balancing weight placed in the second pan when the equilibrium is reached. Consider the equation of the moments. As mentioned above, for levers it is convenient to choose the pivot to coincide with the fulcrum. Accordingly, we have W b = WBb, where b is the common distance from the fulcrum of the two vertical lines through the two points from which the pans hang. This implies W = WB. Since WB is known, we know the desired weight W of our object. The other type of balance scale has a single pan plus a horizontal bar. In addition, there is a body B, of a known weight WB, which can slide along the bar. You move WB until the equilibrium is reached. Accordingly, we have W b = WBbB, where b and WB are known fixed quantities, whereas bB is the balancing distance of B from the fulcrum when the equilibrium is reached, and W is the weight that we desire to evaluate. This is given by W = WBbB/b. [In practice, the horizontal bar of the scale has marks that convert directly the balancing distance bB into the weight W on the pan of the scale. The marks on the horizontal bar are such that the weight of the pan is not included.]
• Levers of the second kind In levers of the second kind, the output force is located between the fulcrum and the input force. Examples of levers of this kind include the wheelbarrow and the nutcracker. In this case, from the equation of the moments about the fulcrum, we have that FIbI = FObO, where FI > 0 (FO > 0) is the magnitude of the input (output) force. [Indeed, in this case, the force vectors have opposite directions.] Since bI > bO, we have FO > FI. Hence, levers of this kind are used to reduce the input force FI needed to generate the desired output force.
• Levers of the third kind Levers of the third kind operate on the reverse principle: the input force is located between the fulcrum and the output force. Since now bI < bO, we have FI > FO. Examples include the catapult, as well as brooms. The objective, in this case, is not so much to increase the output force, as to reduce the input displacement.
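The moment balance about the fulcrum, FI bI = FO bO, together with the sliding-weight relation W = WB bB/b, can be tried out with a few numbers. The Python sketch below uses hypothetical function names and made-up data; it only restates the formulas given above.

```python
# Lever relations from the moment balance about the fulcrum (illustrative names).

def output_force(F_I, b_I, b_O):
    """Output force balanced by an input force F_I, from F_I * b_I = F_O * b_O."""
    return F_I * b_I / b_O

def sliding_weight_reading(W_B, b_B, b):
    """Weight W on the pan of a sliding-weight balance, from W * b = W_B * b_B."""
    return W_B * b_B / b

# Second kind (b_I > b_O): the output force exceeds the input force.
second_kind = output_force(100.0, 1.5, 0.5)

# Third kind (b_I < b_O): the output force is smaller than the input force.
third_kind = output_force(100.0, 0.25, 1.0)

# Sliding-weight balance: known weight 2 at distance 0.75, pan at distance 0.25.
reading = sliding_weight_reading(2.0, 0.75, 0.25)

print(second_kind, third_kind, reading)
```

For the equal-arm balance, bB = b and the reading reduces to W = WB.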
17.7.2 Rigid system subject to weight. Barycenter Consider a rigid system of particles that are subject to their own weight. Assume that the system is suspended by a string at a specific point, say Q. The force acting on the particle h is
$$\mathbf{w}_h = -W_h\, \mathbf{k} \qquad (h = 1, \dots, n), \qquad (17.79)$$
where $\mathbf{k}$ is directed upwards (namely opposite to the direction of gravity). Thus, according to the first in Eq. 17.78, the reaction is $\sum_{h=1}^{n} W_h\, \mathbf{k}$. Hence, the string must necessarily assume a vertical direction, since the string is always aligned with the forces acting on its endpoints (Remark 139, p. 695). Next, consider $\mathbf{m}_O^E = \mathbf{0}$ (second in Eq. 17.77). The discussion of this equation becomes easier if we choose the pivot (that is, the point with respect to which one evaluates the moments) to coincide with the point Q, since, in this way, the reaction is eliminated from the equation. Then, we see that equilibrium is possible only if
$$\mathbf{m}_O^E = \sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_Q) \times \mathbf{w}_h = \mathbf{0}. \qquad (17.80)$$
To make the analysis more gut–level, it is convenient to introduce the following³ Definition 285 (Barycenter). Consider a rigid system of n particles in equilibrium, subject to their own weight, $\mathbf{w}_h = -W_h\, \mathbf{k}$ (Eq. 17.79). The barycenter (or center of gravity), G, is identified by the vector (use $W_k = m_k g$, Eq. 11.26)
$$\mathbf{x}_G = \frac{1}{W} \sum_{k=1}^{n} W_k\, \mathbf{x}_k = \frac{1}{m} \sum_{k=1}^{n} m_k\, \mathbf{x}_k, \qquad (17.81)$$
where
$$W = \sum_{k=1}^{n} W_k \qquad (17.82)$$
is the total weight of the system, and
$$m = \sum_{k=1}^{n} m_k \qquad (17.83)$$
is the total mass of the system. The moment of the weights around the barycenter vanishes. Indeed, using Eqs. 17.81 and 17.82, and setting $\mathbf{w}_h = -W_h\, \mathbf{k}$ (Eq. 17.79), we have
$$\sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_G) \times \mathbf{w}_h = -\left[ \sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_G)\, W_h \right] \times \mathbf{k} = \mathbf{0}, \qquad (17.84)$$
since $\sum_{h=1}^{n} W_h\, \mathbf{x}_h = W \mathbf{x}_G$ by Eq. 17.81. This means that a rigid system of particles suspended by the barycenter satisfies the equilibrium condition (Eq. 17.80).
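Equations 17.81 and 17.84 can be checked on a small example. In the Python sketch below, the three weights and positions are made-up data and the function names are hypothetical. The code evaluates the barycenter and then the moment of the weights, first about the barycenter and then about the origin.

```python
# Barycenter (Eq. 17.81) and moment of the weights about a point (illustrative data).

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def barycenter(weights, positions):
    """x_G = (1/W) * sum of W_k x_k, with W the total weight."""
    W = sum(weights)
    return tuple(sum(Wk * xk[i] for Wk, xk in zip(weights, positions)) / W
                 for i in range(3))

def weight_moment_about(point, weights, positions):
    """Sum of (x_h - point) x w_h, with w_h = -W_h k and k the upward unit vector."""
    total = [0.0, 0.0, 0.0]
    for Wh, xh in zip(weights, positions):
        arm = tuple(xi - pi for xi, pi in zip(xh, point))
        contribution = cross(arm, (0.0, 0.0, -Wh))
        total = [t + c for t, c in zip(total, contribution)]
    return tuple(total)

weights = [1.0, 2.0, 1.0]
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

x_G = barycenter(weights, positions)
moment_G = weight_moment_about(x_G, weights, positions)
moment_origin = weight_moment_about((0.0, 0.0, 0.0), weights, positions)
print(x_G, moment_G, moment_origin)
```

The moment about the barycenter vanishes, whereas the moment about the origin, which does not lie on the vertical through the barycenter, does not.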
• Equilibrium of rigid body subject to weight
♥
Are there any other points for which this property is valid? To answer this question, note that the equilibrium condition (Eq. 17.80) may be written as
$$\sum_{h=1}^{n} (\mathbf{x}_h - \mathbf{x}_Q) \times \mathbf{w}_h = (\mathbf{x}_G - \mathbf{x}_Q) \times \mathbf{w} = \mathbf{0}, \qquad (17.85)$$
where $\mathbf{w} = \sum_{h=1}^{n} \mathbf{w}_h = -W \mathbf{k}$ is the total weight.
³ The term barycenter comes from the ancient Greek word βαρύκεντρον (barykentron), from βαρύς (barus, heavy) and κέντρον (kentron, center). Indeed, it means “center of weights.”
This implies that the equilibrium can occur only if xG − xQ is parallel to k, that is, directed along the vertical. We can distinguish three cases: (i) xG is located directly above xQ, or (ii) xG is located directly below xQ, or (iii) xG coincides with xQ. As we will see in Subsection 20.3.2 (see in particular Eq. 20.46), the configuration with xG above xQ is in equilibrium, but a small disturbance causes the system to move away from the equilibrium configuration. [It’s like trying to hold a cue stick above the tip of your finger.] We refer to this case as unstable equilibrium. Thus, the only possibilities of practical interest are that xG is located directly below xQ (stable equilibrium), or that xG coincides with xQ. In the latter case, any rotation of the rigid particle system around xG = xQ yields again an equilibrium configuration; we refer to this case as neutral equilibrium.
• Is the condition that wh (h = 1, . . . , n) be parallel necessary?
♠
In Definition 285, p. 698, we used the condition $\mathbf{w}_h = -W_h\, \mathbf{k}$, for h = 1, . . . , n. We wonder whether such a condition (namely that the $\mathbf{w}_h$ be parallel) is necessary. To answer this, we should ask ourselves if there exists a unique point around which the moment vanishes, even when the forces are not parallel. In other words, we want to find out whether the condition
$$\mathbf{x}_G \times \mathbf{w} = \sum_{h=1}^{n} \mathbf{x}_h \times \mathbf{w}_h, \quad\text{where}\quad \mathbf{w} = \sum_{h=1}^{n} \mathbf{w}_h, \qquad (17.86)$$
defines a unique point. The answer is no! To see this, consider, to begin with, a two–particle rigid body (specifically, a body composed of two equal particles rigidly connected by a weightless bar). Assume that the weights are directed towards a single point O (such as the center of the Earth), and that their intensity is a (decreasing) function of the distance from O. Let C denote the midpoint of the bar. The moment vanishes only if the bar is perpendicular to the line OC, or parallel to it. If this is not the case, in general the moment around the bar midpoint does not vanish, as you may verify. This indicates that the center of gravity is not uniquely defined.
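The two–particle example can be explored numerically. The Python sketch below is based on assumptions that go beyond the text: the weights follow an inverse-square law k/r² directed towards O (the text only requires a decreasing function of the distance), O is placed at the origin, C lies on the z-axis at distance R from O, and the bar has half-length a and forms an angle theta with the plane perpendicular to OC. All names and numbers are illustrative.

```python
import math

# Moment about the bar midpoint C for two equal particles attracted towards O.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def moment_about_midpoint(R, a, theta, k=1.0):
    """Sum of (x_h - x_C) x w_h, with w_h = -k x_h / |x_h|^3 (assumed law)."""
    C = (0.0, 0.0, R)
    d = (a * math.cos(theta), 0.0, a * math.sin(theta))  # half-bar vector
    total = [0.0, 0.0, 0.0]
    for sign in (1.0, -1.0):
        arm = tuple(sign * di for di in d)
        x = tuple(ci + ai for ci, ai in zip(C, arm))
        r = math.sqrt(sum(xi * xi for xi in x))
        w = tuple(-k * xi / r**3 for xi in x)   # points towards O, magnitude k/r^2
        total = [t + c for t, c in zip(total, cross(arm, w))]
    return tuple(total)

perpendicular = moment_about_midpoint(10.0, 1.0, 0.0)           # bar normal to OC
parallel = moment_about_midpoint(10.0, 1.0, math.pi / 2.0)      # bar along OC
inclined = moment_about_midpoint(10.0, 1.0, math.pi / 4.0)      # generic orientation
print(perpendicular, parallel, inclined)
```

Under these assumptions, the moment about C vanishes (up to rounding) in the first two orientations and not in the third, in agreement with the statement above.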
17.8 Appendix A. Friction We conclude this chapter by introducing an important issue that has been ignored thus far — the friction between two solid objects that are in contact
with each other. In this section, we examine both static and dynamic friction. Illustrative examples are included.
17.8.1 Static friction Let us consider static friction, namely friction between two solid objects that, while in contact with each other, are not in motion one with respect to the other. To tackle this problem, let us consider some experimental evidence. Specifically, consider a heavy particle P on a horizontal plane. We assume the particle to be subject to its (vertical) weight W and to a horizontal force F. The question that we ask ourselves is: “For the problem stated above, are the equilibrium equations satisfied?” The weight W is balanced by the reaction, N, of the horizontal plane. What balances the horizontal forces? Experimental evidence indicates that a force, called friction, arises from the contact between the particle and the plane. Specifically, friction is capable of generating a tangential force T, equal and opposite to F, provided however that F does not exceed a certain limit value, say $F_{\mathrm{Max}}$. In addition, experimental evidence also indicates that this maximum value of the friction force is proportional to the weight: $F_{\mathrm{Max}} = \mu_S W$, where the coefficient of proportionality $\mu_S$, known as the coefficient of static friction, depends upon the properties of the two surfaces in contact, such as surface roughness, temperature, chemical composition, presence of lubricants, and so on. Of course, the process is applicable even if the plane is not horizontal and the normal force is not due to gravity. Indeed, what matters are the tangential and the normal components of the resultant force of all the forces applied to the body. Thus, we can state the experimental results as follows: for any particle in contact with a surface, we have that equilibrium exists, provided that
$$T < \mu_S N, \qquad (17.87)$$
where T > 0 and N > 0 denote respectively the friction force and the normal constraint reaction resulting from the equilibrium equations. In other words, $\mu_S N$ represents the maximum value $T_{\mathrm{Max}}$ of the tangential force that the surface may exert on the body if the contact normal force is N, namely
$$T_{\mathrm{Max}} = \mu_S N. \qquad (17.88)$$
◦ Comment. For the illustrative cases addressed here, the contact is typically unilateral. However, the unilaterality of the contact is not a requirement —
the formulation applies as well if the two objects cannot separate from each other. [To convince yourself, consider pulling a piece of material from a vise, or uncorking a wine bottle.]
17.8.2 Dynamic friction Next, consider what happens when an object slides with respect to a surface. The force that develops is known as dynamic friction. This occurs, for instance, when you want to bring your old car to a full stop and push too hard on the brake pedal: the wheels get locked (unless, of course, the car is not that old and has an ABS, namely an anti–lock braking system). The force that develops between the tires and the pavement is the dynamic friction. [Incidentally, if the wheels are not locked, the force between the brake shoes and the brake drum is also a dynamic friction force.] For dynamic friction, still on the basis of experimental evidence, we have
$$T = \mu_D N, \qquad (17.89)$$
where $\mu_D$ also depends upon the properties of the two surfaces in contact. Finally, experimental evidence indicates that
$$\mu_D < \mu_S. \qquad (17.90)$$
◦ Comment. Accordingly, if an object is subject to a force resultant equal to μSN and starts slipping, then to keep it moving at a constant velocity, we need to reduce the resultant to μD N. ◦ Warning. Note the conceptual difference between Eqs. 17.87 and 17.89: the first is an inequality, the second an equality. In other words, μSN in Eq. 17.87 represents the maximum value of T in the absence of slip, whereas μD N in Eq. 17.89 represents the value of T in the presence of slip.
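The difference between the inequality in Eq. 17.87 and the equality in Eq. 17.89 may be made concrete with a small function. The Python sketch below assumes a particle initially at rest, pushed by a tangential force F while pressed against the surface by a normal force N. The function name and the numerical values are hypothetical.

```python
# Friction force on a particle initially at rest (a sketch of Eqs. 17.87-17.90).

def friction_force(F, N, mu_s, mu_d):
    """Return (T, slipping) for an applied tangential force F and normal force N."""
    if not mu_d < mu_s:
        raise ValueError("Eq. 17.90 requires mu_d < mu_s")
    if F < mu_s * N:
        # No slip: friction simply balances the applied force (Eq. 17.87 holds).
        return F, False
    # Slip: the friction force takes the value given by Eq. 17.89.
    return mu_d * N, True

held = friction_force(3.0, 10.0, 0.5, 0.25)      # below the limit mu_s * N = 5
slipping = friction_force(6.0, 10.0, 0.5, 0.25)  # above the limit
print(held, slipping)
```

In the second case the friction force drops to μD N, so that an applied force equal to μD N suffices to keep the particle moving at constant velocity.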
17.9 Appendix B. Illustrative examples of friction In this section, we clarify the implications of the laws of friction by providing you with some illustrative examples. Let us start with a simple example. Let us consider a flat pan on the top of the kitchen counter and load it with weights. The more weights you put inside the pan, the higher is the force you have to apply to be able to move the pan. If you could repeat the same
experiment on the Moon, you’d find out that the force you have to apply is reduced to about 1/6 of that on the Earth (Remark 97, p. 462). In the rest of this section, we examine four more complicated (and more interesting) examples. First, we address what happens to a chair (or a piece of furniture) if you push it on its side. Does it slide or does it fall over? Then, we examine what happens if you place a ruler on your two fingers and slowly move the two fingers towards each other. Finally, we consider a particle sliding on a slope in the presence of friction, first by assuming the effect of the air drag to be negligible, and then by removing such an assumption.
17.9.1 Pushing a tall stool Let us assume that you are applying a horizontal force F to an object, such as a tall stool as shown in Fig. 17.6.
Fig. 17.6 Pushing a tall stool
We may ask ourselves: “What is going to happen? Is it going to slide, or is it going to tip over, namely lift on one side, and eventually fall down?” The answer is: “It depends!” (as I often replied to the asking prof during my oral exams, whenever I needed more time to come up with an answer). Indeed, as you may have experienced, if you press the stool near its top, it tips over. On the contrary, if you press the stool near its bottom, it slides. Let us try to make some sense out of this experimental fact. Let us assume, for the sake of discussion, that you press the stool at its top, so that it is going to tip over. Then, just as it starts to tip over, we have that the force at P1 vanishes. In this case, the equilibrium of the vertical forces yields N = W , where W is the magnitude of the weight, applied at the barycenter G, and N is the magnitude of the constraint reaction at P2 .
Also, the balance of the horizontal forces yields T = F, where F is the force that you apply at P3, and T is the friction force applied at P2. Finally, the balance of moments about P2 yields F h = W b, where b is the horizontal distance between the barycenter and the force $N\mathbf{n}$, whereas h is the vertical distance between the force $F\mathbf{i}$ and the ground. Combining, one obtains
$$\frac{T}{N} = \frac{F}{W} = \frac{b}{h}. \qquad (17.91)$$
Note that the above behavior occurs only if the static friction condition is satisfied, namely if $T < \mu_S N$. Combining with the above equation yields
$$h > h^* := \frac{b}{\mu_S}. \qquad (17.92)$$
If this condition is not satisfied, namely if h < h∗ = b/μS, the stool will not tip over. It will slide. You can use this result in case you want to slide a piece of furniture: make sure you push it at a height below h∗ = b/μS. If you don’t, the piece of furniture will tip over and fall down, instead of sliding. It makes no difference how low the pushing point is. What really matters is solely that h < h∗. ◦ Comment. Of course, tipping may occur only if h∗ < hS, where hS is the height of the stool. From Eq. 17.92, we can see that this condition is satisfied if μS > b/hS.
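The criterion in Eq. 17.92 is easily turned into a small calculation. The following Python sketch uses hypothetical function names and made-up dimensions (in meters). It evaluates the critical height h∗ = b/μS and reports whether a push at height h makes the stool slide or tip.

```python
# Slide-or-tip criterion from Eq. 17.92 (names and data are illustrative).

def critical_height(b, mu_s):
    """h* = b / mu_s, the pushing height that separates sliding from tipping."""
    return b / mu_s

def push_outcome(h, b, mu_s, h_S):
    """Outcome of a horizontal push at height h on a stool of height h_S."""
    if not 0.0 <= h <= h_S:
        raise ValueError("the pushing height must lie between 0 and h_S")
    h_star = critical_height(b, mu_s)
    if h > h_star:
        return "tips"
    if h < h_star:
        return "slides"
    return "limit case"

b, mu_s, h_S = 0.25, 0.5, 1.0
print(critical_height(b, mu_s))
print(push_outcome(0.9, b, mu_s, h_S), push_outcome(0.2, b, mu_s, h_S))
```

With a lower friction coefficient, say μS = 0.2, the critical height b/μS = 1.25 exceeds the height of the stool, and the stool slides wherever it is pushed.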
17.9.2 Ruler on two fingers
♥
Consider a ruler, which we assume to be uniform, just to make our life simpler. Place your two index fingers under the ruler, close to its endpoints (Fig. 17.7). As you move your fingers closer to each other, you will observe that the ruler will begin to slide over one of the two fingers, say over that at P1. As you continue to move your fingers closer to each other, the sliding finger will move closer to the center of the ruler. At some point, however, you will note a role reversal, in the sense that the ruler will begin to slide over the other finger, namely over that at P2. Then, after a while, as the fingers get even closer to each other, the ruler will return to slide over that at P1. This will continue with the sliding alternating between the two points P1 and P2, until the two fingers touch each other. It is relatively easy to explain this phenomenon with the laws of friction presented in the preceding subsections. Let us introduce an abscissa on the ruler, with the origin placed at the barycenter P0 of the ruler. Then, the
Fig. 17.7 Ruler on two fingers
balance of the vertical forces yields $N_1 + N_2 = W$, whereas the balance of moments with respect to the barycenter yields $N_1 b_1 = N_2 b_2$, where $b_1 = P_0 P_1$ and $b_2 = P_0 P_2$ denote the distances of $N_1$ and $N_2$ from the barycenter. Therefore, we have
$$N_1 = \frac{b_2}{b_1 + b_2}\, W \quad\text{and}\quad N_2 = \frac{b_1}{b_1 + b_2}\, W. \qquad (17.93)$$
On the other hand, the balance of the horizontal forces will give us T2 = T1, where Tk (k = 1, 2) denotes the magnitude of the k-th tangential force and is always positive. Next, just to be specific, assume that initially b2 < b1 (namely that P2 is closer to the barycenter than P1). Therefore, N1 < N2, and hence the static friction condition (Eq. 17.87) will be first violated at P1, and P1 will be the point where the sliding occurs first. Then, using the dynamic friction equation (Eq. 17.89), we have T1 = μDN1. As the sliding continues, b1 decreases (with b2 fixed). Accordingly, N1 increases and N2 decreases (Eq. 17.93). On the other hand, we have T2 = T1 = μDN1, and hence T2 increases. In other words, N2 becomes smaller and smaller as T2 becomes larger and larger, until eventually we reach the condition T2 > μSN2, and Eq. 17.87 is violated. At this point, we have a role reversal: the sliding occurs at P2. Then, we may repeat the considerations above, until a second role reversal occurs. Throughout the process, P1 and P2 continue to approach each other. In the very end, when their distance vanishes, both of their locations necessarily coincide with that of the barycenter of the ruler.
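The alternation described above can be reproduced with a simple quasi-static simulation. The Python sketch below rests on stated assumptions: the fingers approach each other in small steps, only one contact slides at a time, and a role reversal takes place as soon as μD times the normal force at the sliding contact reaches μS times that at the other contact. The function name, the step size and the friction coefficients are hypothetical.

```python
# Quasi-static simulation of the ruler on two fingers (a sketch, illustrative data).

def simulate_ruler(b1, b2, mu_s, mu_d, step=1e-3, stop=1e-2):
    """Return the list of (b1, b2) pairs at which a role reversal occurs."""
    if not mu_d < mu_s:
        raise ValueError("Eq. 17.90 requires mu_d < mu_s")
    b = [b1, b2]                        # distances of the fingers from the barycenter
    sliding = 0 if b[0] > b[1] else 1   # the farther finger carries less load
    reversals = []
    while b[0] + b[1] > stop:
        other = 1 - sliding
        total = b[0] + b[1]
        N_sliding = b[other] / total    # Eq. 17.93, per unit weight
        N_other = b[sliding] / total
        if mu_d * N_sliding >= mu_s * N_other:
            # Static friction at the other contact is exhausted: role reversal.
            reversals.append((b[0], b[1]))
            sliding = other
            continue
        b[sliding] -= step              # the sliding finger approaches the barycenter
    return reversals

reversals = simulate_ruler(0.5, 0.3, mu_s=0.5, mu_d=0.3)
for pair in reversals:
    print(pair)
```

At each reversal, the distance of the finger that has just stopped sliding is about μD/μS times that of the other finger, so the two distances shrink geometrically towards the barycenter.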
17.9.3 Particle on a slope with friction, no drag
♥
Consider a particle on a slope, such as a brick on an inclined plane (Fig. 17.3, p. 669). Assume that initially the plane is not inclined and let us gradually increase the angle that the slope makes with the horizontal plane. The only forces acting on the particle are its weight, the constraint reaction, and the friction. [In this subsection we assume that the aerodynamic drag is negligible.] The balance of forces gives us, respectively in terms of the normal and tangential components, N = W cos α
and
T = W sin α.
(17.94)
Note that, from Eq. 17.94, we have $\tan\alpha = T/N$, so that the static friction condition (Eq. 17.87) reads $\tan\alpha < \mu_S$. This is all we need to know as long as this condition holds. For, in this case, the static friction is adequate to keep the brick from sliding. As we increase α, T grows with α, whereas N decreases, until α reaches the maximum value it can take without sliding occurring. This is given by
$$\alpha_{\mathrm{Max}} = \tan^{-1} \mu_S \in (0, \pi/2). \qquad (17.95)$$
For α > αMax, the particle starts to slide, and the solution of the problem must be obtained by using the equations of dynamics. ◦ Comment. There is a case, however, in which we can use the equation of statics, even if the particle is sliding. Assume that the angle is such that tan α = μD.
(17.96)
In this case, if we give the particle an initial push, the particle starts to move with an initial speed, say v0 . Then, for an observer having speed v0 , the brick is not moving (Remark 134, p. 665) and we can apply the laws of statics. The forces are perfectly balanced, and the particle will continue to slide with constant speed v0 . [Note that the speed v0 is not affected by the phenomenon. It solely depends upon the initial conditions. If you feel that this is contrary to your intuition, you are correct. The conundrum is explained by the fact that we have neglected the air drag, which is included in the subsection that follows.]
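The two angles that matter in this subsection, namely αMax = tan⁻¹ μS (Eq. 17.95) and the constant-speed angle defined by tan α = μD (Eq. 17.96), can be evaluated as follows. The Python sketch uses hypothetical function names and made-up friction coefficients; the borderline case tan α = μS is treated as sliding, which is a choice of this sketch.

```python
import math

# Characteristic angles for a particle on a slope with friction, no drag.

def alpha_max(mu_s):
    """Largest incline angle (radians) at which static friction holds, Eq. 17.95."""
    return math.atan(mu_s)

def constant_speed_angle(mu_d):
    """Incline angle (radians) at which sliding proceeds at constant speed, Eq. 17.96."""
    return math.atan(mu_d)

def slope_state(alpha, mu_s):
    """'rest' if tan(alpha) < mu_s, otherwise 'slides' (for 0 <= alpha < pi/2)."""
    return "rest" if math.tan(alpha) < mu_s else "slides"

mu_s, mu_d = 0.5, 0.3
print(math.degrees(alpha_max(mu_s)), math.degrees(constant_speed_angle(mu_d)))
print(slope_state(0.1, mu_s), slope_state(0.6, mu_s))
```

Since μD < μS, the constant-speed angle is smaller than αMax: at that angle the brick stays at rest unless it is given an initial push.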
17.9.4 Particle on a slope with friction and drag
♥
In the preceding subsection, we have addressed a particle on a slope with friction, without drag. In Subsection 17.2.1, we have addressed a particle on
a frictionless slope in the presence of drag. Here, we consider a particle on a slope with friction in the presence of drag. Assume the particle to be descending, with constant velocity, say v0 . Then, again for an observer moving at a constant speed with the particle, we can use the equations of statics (Remark 134, p. 665). This allows us to find out the value of v0 as a function of the incline angle. Specifically, the balance of forces gives us, respectively in terms of the normal and tangential components (compare to Eq. 17.7 for zero friction, and Eq. 17.94 for zero drag) N = W cos α
and
T + D = W sin α.
(17.97)
[Note that the drag D has the same direction as the friction force T .] Using T = μDN (Eq. 17.89) and combining with the equations above, so as to eliminate N , we have (compare to Eq. 17.96 for zero drag) D(α) = W (sin α − μD cos α).
(17.98)
Next, recall that the drag is given by $D = \frac{1}{2}\, C_D\, \rho_A\, A\, v_0^2$ (Eq. 14.80), where $\rho_A$ is the air density and $A$ is the cross–section area, whereas $C_D > 0$ is the drag coefficient, which may be treated as a constant. Thus, we have (compare to Eq. 17.8 for zero friction)
$$v_0(\alpha) = \sqrt{\frac{2W}{C_D\, \rho_A\, A}\, (\sin\alpha - \mu_D \cos\alpha)}. \qquad (17.99)$$
[If μD = 0 (frictionless slope), we recover Eq. 17.8.] ◦ Comment. Let me say it one more time. If the particle has the above constant velocity v0, then in the (inertial) frame of reference that follows the particle we have static equilibrium, and hence the statics formulation is appropriate. [As we will see in Subsection 20.2.4, if the particle has initially a lower velocity, even a zero velocity, it will accelerate until it reaches the situation v = v0. Similarly, if initially v > v0, the particle will decelerate until it reaches the situation v = v0.]
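Equation 17.99 can be evaluated directly, and the result can be checked against the force balance in Eq. 17.97. In the Python sketch below, the function name and all numerical values (weight in newtons, air density in kg/m³, area in m²) are made-up illustrations. The function refuses angles with tan α < μD, for which no steady descent exists.

```python
import math

# Steady sliding speed on a slope with friction and drag (a sketch of Eq. 17.99).

def steady_speed(alpha, W, mu_d, C_D, rho_A, A):
    """Speed v0 at which drag plus dynamic friction balance the weight component."""
    drive = math.sin(alpha) - mu_d * math.cos(alpha)
    if drive < 0.0:
        raise ValueError("tan(alpha) < mu_d: no steady descent is possible")
    return math.sqrt(2.0 * W * drive / (C_D * rho_A * A))

alpha, W, mu_d = 0.5, 20.0, 0.3
C_D, rho_A, A = 1.0, 1.2, 0.05

v0 = steady_speed(alpha, W, mu_d, C_D, rho_A, A)

# Check the tangential balance T + D = W sin(alpha), with T = mu_d * N.
N = W * math.cos(alpha)
D = 0.5 * C_D * rho_A * A * v0**2
residual = mu_d * N + D - W * math.sin(alpha)
print(v0, residual)
```

Setting mu_d to zero gives the frictionless case, and at tan α = μD the steady speed given by the formula is zero, consistent with Eq. 17.96.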
Chapter 18
Multivariate differential calculus
In Chapters 15 and 16, we covered physicists’ vector algebra along with more material on matrices, which is all we needed to cover statics. For dynamics (Chapter 20), we will need multivariate calculus. This includes: (i) the extension from scalar functions of one variable to scalar functions of several variables, and (ii) the extension from scalar functions to vector functions (of one or more variables). Accordingly, in this chapter we present a fairly complete extension of differential calculus to multivariate differential calculus. [Multivariate integral calculus is addressed in the next chapter.] For much of the material, we deal with the two or three spatial coordinates. However, some of the concepts presented (such as the multidimensional extension of the Taylor polynomials) are more general and are valid for n variables, with n arbitrary.
• Overview of this chapter In Section 18.1, we deal with differentials and differential forms, in two and n dimensions. Then, in Section 18.2, we introduce some topics related to lines in two and three dimensions. Specifically, we address properties of a line on a plane and introduce the mathematical expressions for its tangent, its normal and its curvature. Then, we extend the formulation to three dimensions and add the mathematical expressions for the so–called binormal to a curve and its twist. Next, in Section 18.3, we consider vector functions in two and three dimensions and introduce the definition of gradient of a scalar function, as well as those of divergence and curl of a vector field (namely a vector function in R2 or R3). Then, in Section 18.4, we extend the formulation for the Taylor polynomials to scalar functions of n variables. Finally, we use the results obtained to discuss the minimum of a scalar function of several variables, without and with constraints (Section 18.5).
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_18
18.1 Differential forms and exact differential forms In Subsection 9.1.2, we introduced the differential of a function of a single variable, as $df = f'(x)\, dx$ (Eq. 9.13). Akin to what we did there, we have the following Definition 286 (Differential in two dimensions). The differential of a function f = f(x, y) is given by
$$df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy. \qquad (18.1)$$
However, the situation now is a bit more complicated than it was in one dimension. To address this issue, let us introduce the following Definition 287 (Differential form in two dimensions). In two dimensions, an expression of the type M (x, y) dx + N (x, y) dy
(18.2)
is called a differential form. The novel aspect, in comparison to the one–dimensional case, is that, in general, we have no guarantee that there exists a function f(x, y) such that df equals M(x, y) dx + N(x, y) dy. Thus, we introduce the following Definition 288 (Exact differential form, two dimensions). The differential form M(x, y) dx + N(x, y) dy is said to be exact iff there exists a function (not necessarily single–valued) f(x, y) such that
$$M(x, y) = \frac{\partial f}{\partial x} \quad\text{and}\quad N(x, y) = \frac{\partial f}{\partial y}, \qquad (18.3)$$
so that
$$M(x, y)\, dx + N(x, y)\, dy = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy = df. \qquad (18.4)$$
We have the following

Theorem 189 (Necessary conditions for exact differential form). Consider the differential form $M(x, y)\,dx + N(x, y)\,dy$ (Eq. 18.2), and assume $\partial N/\partial x$ and $\partial M/\partial y$ to be continuous. A necessary condition for the differential form to be exact is that

$$\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y}. \tag{18.5}$$
18. Multivariate differential calculus
◦ Proof: A differential form $M(x, y)\,dx + N(x, y)\,dy$ is exact iff Eq. 18.3 is satisfied. Thus, using the Schwarz theorem on the interchangeability of the order of the second mixed derivative (Theorem 106, p. 396), which applies because $\partial N/\partial x$ and $\partial M/\partial y$ are continuous by hypothesis, we must necessarily have

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}, \tag{18.6}$$

which is equivalent to Eq. 18.5.
◦ Comment. The sufficient conditions are discussed in Subsection 19.5.4.
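◦ Numerical illustration. The necessary condition of Eq. 18.5 can be checked numerically at a few sample points. The following is only a minimal sketch: the helper names (`partial`, `satisfies_necessary_condition`), the sample points, and the two test forms are illustrative assumptions, and the derivatives are approximated by central differences rather than evaluated exactly.

```python
def partial(g, x, y, var, h=1e-5):
    """Central-difference estimate of dg/dx (var='x') or dg/dy (var='y')."""
    if var == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


def satisfies_necessary_condition(M, N, points, tol=1e-6):
    """True if dN/dx and dM/dy agree (Eq. 18.5) at every sample point."""
    return all(
        abs(partial(N, x, y, "x") - partial(M, x, y, "y")) < tol
        for x, y in points
    )


# Hypothetical sample points, chosen only for illustration.
sample_points = [(0.3, -1.2), (1.0, 2.0), (-0.7, 0.4)]

# 2xy dx + x^2 dy is the differential of f = x^2 y, so Eq. 18.5 holds.
exact_ok = satisfies_necessary_condition(
    lambda x, y: 2 * x * y, lambda x, y: x * x, sample_points
)

# -y dx + x dy: dN/dx = 1 whereas dM/dy = -1, so Eq. 18.5 fails.
not_exact_ok = satisfies_necessary_condition(
    lambda x, y: -y, lambda x, y: x, sample_points
)

print(exact_ok, not_exact_ok)
```

Passing such a check at sample points does not prove exactness; it only fails to contradict the necessary condition there.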
• Extension to n dimensions ♣

Turning to n dimensions, we have the following definitions:

Definition 289 (Differential in n dimensions). The differential of a function $f = f(x_1, \dots, x_n)$ is defined by

$$df = \sum_{h=1}^{n} \frac{\partial f}{\partial x_h}\,dx_h. \tag{18.7}$$
Definition 290 (Differential form in n dimensions). In n dimensions, an expression of the type

$$\sum_{h=1}^{n} v_h(x_1, \dots, x_n)\,dx_h \tag{18.8}$$
is called a differential form.

Definition 291 (Exact differential form in n dimensions). The differential form $\sum_{h=1}^{n} v_h(x_1, \dots, x_n)\,dx_h$ is said to be exact iff there exists a function $f(x_1, \dots, x_n)$ such that

$$v_h(x_1, \dots, x_n) = \frac{\partial f}{\partial x_h}. \tag{18.9}$$
In analogy with Eq. 18.5, if $v_h(x_1, \dots, x_n)$ $(h = 1, \dots, n)$ are continuous, the necessary conditions for the differential form to be exact are

$$\frac{\partial v_h}{\partial x_j} = \frac{\partial v_j}{\partial x_h} \qquad (h, j = 1, \dots, n), \tag{18.10}$$

as you may verify.
18.2 Lines in two and three dimensions revisited

In this section, we address again the mathematical representation of lines and their properties. In Chapter 7, we have seen that a line may be represented in parametric form as: (i) $x = x(u)$ and $y = y(u)$ in two dimensions (Eq. 7.9), and (ii) $x = x(u)$, $y = y(u)$, and $z = z(u)$ in three dimensions (Eq. 7.15). Using the vector notation introduced in Chapter 15, both of these equations may be written as

$$\mathbf{x} = \mathbf{x}(u). \tag{18.11}$$
The material presented in this section may be simplified considerably if we use, instead of a generic abscissa u, the arclength s of a line in space, as given in the following extension to three dimensions of the two–dimensional Definition 201, p. 434,

Definition 292 (Arclength). The arclength s of a line L is the length of L from a given point $\mathbf{x}(u_0)$. Specifically, consider a line L described by the continuously differentiable vector function $\mathbf{x} = \mathbf{x}(u)$. We have

$$ds = \|d\mathbf{x}\|. \tag{18.12}$$

Next, assume that $d\mathbf{x}/du$ does not change orientation, to avoid reversing the way L is covered. Then, we have

$$s = \int_0^s ds = \int_0^s \|d\mathbf{x}\| = \int_{u_0}^{u} \left\| \frac{d\mathbf{x}}{du} \right\| du. \tag{18.13}$$

[The material on integrals in one dimension (Chapter 10) applies to the last integral in Eq. 18.13, because the integrand is a scalar function of a single variable.]

◦ Comment. In particular, consider a three–dimensional curve explicitly defined by $y = y(x)$ and $z = z(x)$ (Eq. 7.13), namely $\mathbf{x} = x\,\mathbf{i} + y(x)\,\mathbf{j} + z(x)\,\mathbf{k}$. Choosing $u = x$, we have $\|d\mathbf{x}/du\| = \sqrt{1 + (dy/dx)^2 + (dz/dx)^2}$. Thus, Eq. 18.13 yields

$$s = \int_{x_0}^{x} \sqrt{1 + \left(\frac{dy}{dx}\right)^2 + \left(\frac{dz}{dx}\right)^2}\; dx. \tag{18.14}$$

[This is an extension to three dimensions of Eq. 10.85, which is limited to two–dimensional lines. In fact, the above equation reduces to Eq. 10.85 for z = constant, namely for a two–dimensional curve.]
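◦ Numerical illustration. The last integral in Eq. 18.13 is an ordinary one–dimensional integral, so any standard quadrature rule applies to it. The sketch below uses the composite Simpson rule, which is a choice made here for illustration and not one prescribed by the text. The function name `arclength` and the two test curves (the parabola $y = x^2$ and the curve $\mathbf{x}(u) = \cos u\,\mathbf{i} + \sin u\,\mathbf{j} + u\,\mathbf{k}$) are likewise assumptions.

```python
import math


def arclength(dx_du, u0, u1, n=1000):
    """Approximate Eq. 18.13 with the composite Simpson rule (n must be even).

    dx_du(u) returns the components of dx/du as a tuple.
    """
    def speed(u):
        # Magnitude of dx/du, i.e., the integrand of Eq. 18.13.
        return math.sqrt(sum(c * c for c in dx_du(u)))

    h = (u1 - u0) / n
    total = speed(u0) + speed(u1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(u0 + i * h)
    return total * h / 3


# Planar parabola y = x^2 with u = x (Eq. 18.14 with z = constant).
parabola_length = arclength(lambda x: (1.0, 2.0 * x), 0.0, 1.0)
parabola_closed_form = (2.0 * math.sqrt(5.0) + math.asinh(2.0)) / 4.0

# Spatial curve x = (cos u, sin u, u): the integrand is the constant sqrt(2).
spatial_length = arclength(
    lambda u: (-math.sin(u), math.cos(u), 1.0), 0.0, 2.0 * math.pi
)

print(parabola_length, parabola_closed_form, spatial_length)
```

For the parabola, the closed form follows from integrating $\sqrt{1 + 4x^2}$ between 0 and 1.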
In the rest of this section, we obtain the expressions for tangent and normal to a line that is described by $\mathbf{x} = \mathbf{x}(u)$, along with the so–called binormal in three dimensions. We assume that the functions $x(u)$, $y(u)$, and $z(u)$ are twice differentiable. As stated above, the formulation simplifies considerably if u coincides with the arclength s, namely if the line is defined by

$$\mathbf{x} = \mathbf{x}(s). \tag{18.15}$$
18.2.1 Lines in two dimensions To make our life simpler, let us begin with lines in two dimensions. [The three–dimensional extension is addressed in Subsection 18.2.2.] Accordingly, here we introduce the mathematical representations of unit tangent and unit normal for a line on a plane.
• Unit tangent

Consider the vector $\Delta\mathbf{x}$ connecting two points of the curve, say, those corresponding to the abscissas $s + \Delta s$ and $s$:

$$\Delta\mathbf{x} := \mathbf{x}(s + \Delta s) - \mathbf{x}(s). \tag{18.16}$$

Recall that: (i) the straight line that contains two points of a curve is called the secant, and (ii) in the limit, as the two points tend to each other, the secant tends to the tangent evaluated at the limit point (Section 5.3). Thus, in the limit, as $\Delta s$ tends to zero, the direction of $\Delta\mathbf{x}$ (namely that of the secant) approaches that of the tangent to the line at $\mathbf{x} = \mathbf{x}(s)$. However, as $\Delta s$ tends to zero, the magnitude of $\Delta\mathbf{x}$ approaches zero as well. To circumvent this problem, let us consider the unit vector $\Delta\mathbf{x}/\|\Delta\mathbf{x}\|$, where $\|\Delta\mathbf{x}\| = \Delta s + o[\Delta s]$, as implicit in Eq. 18.12.
Fig. 18.1 Unit tangent, t = dx/ds
Hence, taking the limit as $\Delta s$ tends to zero, we have that the unit vector parallel to the secant, namely $\Delta\mathbf{x}/\|\Delta\mathbf{x}\|$, tends to the unit tangent. In other words,

$$\mathbf{t}(s) := \lim_{\Delta s \to 0} \frac{\Delta\mathbf{x}}{\Delta s} = \frac{d\mathbf{x}}{ds} \qquad \big(\|\mathbf{t}(s)\| = 1\big) \tag{18.17}$$

is a unit vector, which by construction is tangent to the line $\mathbf{x} = \mathbf{x}(s)$. Therefore, it is referred to as the unit tangent to the line $\mathbf{x} = \mathbf{x}(s)$.

◦ Comment.♣ For the sake of completeness, let us consider what happens if, instead of the arclength s, we use a generic abscissa u. In this case, we have to assume

$$\frac{d\mathbf{x}}{du} \neq \mathbf{0}. \tag{18.18}$$

[This assumption is only slightly stronger than that made in Definition 292, p. 710, namely that $d\mathbf{x}/du$ does not change orientation.] Then, we have

$$\mathbf{t}(u) := \frac{d\mathbf{x}}{ds} = \frac{d\mathbf{x}}{du}\,\frac{du}{ds} = \frac{d\mathbf{x}/du}{ds/du} = \frac{d\mathbf{x}/du}{\|d\mathbf{x}/du\|}. \tag{18.19}$$

The last expression clearly indicates that this is a unit vector. On the other hand, the vector

$$\breve{\mathbf{t}}(u) := \frac{d\mathbf{x}}{du} = \lim_{\Delta u \to 0} \frac{\Delta\mathbf{x}}{\Delta u} \tag{18.20}$$

is different from zero (Eq. 18.18), and by construction is tangent to the line $\mathbf{x} = \mathbf{x}(u)$, albeit not necessarily a unit tangent. [The assumption that $d\mathbf{x}/du \neq \mathbf{0}$ (Eq. 18.18) is needed to avoid a 0/0 situation in Eq. 18.19. For instance, if $u = \sqrt[3]{s}$, namely $s = u^3$, we have $ds/du = 3u^2$, which vanishes for $s = 0$. In this case, Eq. 18.19 cannot be used, whereas Eq. 18.17 can.]
• Unit normal. Curvature Here, we address in greater depth the material in Section 9.7, on the curvature of the graph of a function. To this end, we move from an explicit representation of the curve to a parametric one. Let us introduce the following (still in two dimensions) Definition 293 (Unit normal). The unit normal n is the unit vector that: (i) is normal to the curve, and (ii) lies on the plane of the curve. In addition, here n is pointed towards the center of the osculating circle.
◦ Comment. Note the difference with respect to the definition of outer normal for a contour (Remark 71, p. 348), which is always pointing on the same side.

Next, consider the derivative of $\mathbf{t}$ with respect to s. We have

$$\frac{d\mathbf{t}}{ds} = \frac{d^2\mathbf{x}}{ds^2} = \lim_{\Delta s \to 0} \frac{\mathbf{t}(s + \Delta s) - \mathbf{t}(s)}{\Delta s} = \lim_{\Delta s \to 0} \frac{\mathbf{x}(s + 2\Delta s) - 2\,\mathbf{x}(s + \Delta s) + \mathbf{x}(s)}{(\Delta s)^2}. \tag{18.21}$$

Assume initially that the points $\mathbf{x}(s)$, $\mathbf{x}(s + \Delta s)$ and $\mathbf{x}(s + 2\Delta s)$ are not coaligned. Hence, they identify a circle (Theorem 57, p. 199). The circle obtained in the limit as $\Delta s$ tends to zero is the osculating circle (Definition 193, p. 401). The curvature of the line at $\mathbf{x} = \mathbf{x}(s)$ coincides, by definition, with the curvature of the osculating circle there.

Differentiating $\|\mathbf{t}\|^2 = \mathbf{t} \cdot \mathbf{t} = 1$, we have $\mathbf{t} \cdot d\mathbf{t}/ds = 0$. As a consequence, we have $d\mathbf{t}/ds = \breve{\kappa}\,\mathbf{n}$, where $\mathbf{n}$ is the unit normal pointed toward the center of curvature of the line, whereas $\breve{\kappa}$ is a proportionality coefficient. To obtain an expression for $\breve{\kappa}$, note that the triangles $C_1 P P_1$ and $P Q Q_1$ are similar (isosceles triangles with the same angle $\Delta\theta$ at the apex, Fig. 18.2). Therefore, using Theorem 42, p. 184, on the ratios of the sides of similar triangles, we have $\|\Delta\mathbf{t}\|/1 = \|\Delta\mathbf{x}\|/R_1$, or $\|\Delta\mathbf{t}\|/\|\Delta\mathbf{x}\| = 1/R_1$, where $R_1 := \overline{C_1 P}$. [Note that in the figure the vector $\mathbf{t}_1$ appears in two locations. This is legitimate because $\mathbf{t}$ is not an applied vector, and hence its location is irrelevant.]
Fig. 18.2 On the equation dt/ds = κ n
As Δθ tends to zero, the point C1 tends to the center C of the osculating circle, namely R1 tends to the radius of curvature of the osculating circle R := CP . In the limit, we obtain
$$\breve{\kappa} = \left\| \frac{d\mathbf{t}}{ds} \right\| = \frac{1}{R} =: \kappa > 0, \tag{18.22}$$

where $\kappa$ is the curvature of the line at $\mathbf{x} = \mathbf{x}(s)$. [Note the analogy with the definition of “signed curvature” given in Eq. 9.145, where the curvature may be negative. We have $\kappa = |\kappa_S|$. Here, the role of the sign is taken up by the pointing of the normal $\mathbf{n}$ (see Eq. 18.23).] In summary, we have obtained

$$\frac{d\mathbf{t}}{ds} = \kappa\,\mathbf{n}. \tag{18.23}$$
◦ Comment. If the points, in the limit, are coaligned, we obtain $d\mathbf{t}/ds = \mathbf{0}$. The center of the circle goes to infinity. In other words, the radius of curvature R is infinitely large and the curvature vanishes. This occurs when the center of curvature moves from one side of the curve to the other. [Try the curve $\mathbf{x}(u) = u\,\mathbf{i} + u^3\,\mathbf{j}$, which corresponds to the function $y = x^3$.]
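• What about dn/ds?

Differentiating $\mathbf{n} \cdot \mathbf{t} = 0$ and using Eq. 18.23, one obtains

$$\frac{d}{ds}(\mathbf{n} \cdot \mathbf{t}) = \frac{d\mathbf{n}}{ds} \cdot \mathbf{t} + \frac{d\mathbf{t}}{ds} \cdot \mathbf{n} = \frac{d\mathbf{n}}{ds} \cdot \mathbf{t} + \kappa = 0, \tag{18.24}$$

namely the component of $d\mathbf{n}/ds$ along $\mathbf{t}$ equals $-\kappa$. On the other hand, differentiating $\mathbf{n} \cdot \mathbf{n} = 1$, we have

$$\frac{d}{ds}(\mathbf{n} \cdot \mathbf{n}) = 2\,\frac{d\mathbf{n}}{ds} \cdot \mathbf{n} = 0, \tag{18.25}$$

namely the component of $d\mathbf{n}/ds$ along $\mathbf{n}$ equals zero. Accordingly, we have

$$\frac{d\mathbf{n}}{ds} = -\kappa\,\mathbf{t}. \tag{18.26}$$

18.2.2 Lines in three dimensions ♣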
Here, we extend the results to a curve in three dimensions, described by the equation x = x(s). All the considerations regarding the unit tangent in two dimensions apply to the three–dimensional case equally well. Thus, we have again t = dx/ds, as in Eq. 18.17.
Next, consider the derivative of the unit normal. Most of the considerations regarding the derivative of the unit vector in two dimensions are still valid. However, not all of them are. Specifically, Eq. 18.26 is no longer valid, as we will see (Eq. 18.33). Accordingly, additional comments are in order, because of the three–dimensionality of the problem. Specifically, recall that, in three dimensions, three non–coaligned points identify a plane and a circle within this plane (Theorem 57, p. 199). Let P1 , P2 and P3 denote three points on L, which we assume not to be coaligned. If we introduce the origin O, and identify x0 , b1 and b2 respectively with the oriented segments OP1 , P1 P2 and P1 P3 , the parametric representation of the plane through P1 , P2 and P3 may be written as x − x0 = λ1 b1 + λ2 b2 (Eq. 15.54). Accordingly, in the limit as Δs tends to zero, the plane identified by the points x0 := x(s), x1 := x(s+Δs) and x2 := x(s+2Δs) tends to a specific plane that is called the osculating plane, whereas the corresponding circle tends to the osculating circle, which of course is contained in the osculating plane. Thus, Eq. 18.23 determines both the curvature, κ = 1/R, and the unit normal n to the curve, which is, by definition, the unit vector normal to the curve x = x(s) that lies in the osculating plane. Looking at it the other way around, the osculating plane is the plane that contains both the tangent and the normal to the line (Eqs. 18.17 and 18.23, respectively).
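• Unit binormal. Frenet–Serret formulas ♠

Contrary to the two–dimensional case, here it is convenient to introduce a third vector, namely

$$\mathbf{b} = \mathbf{t} \times \mathbf{n}, \tag{18.27}$$

which is known as the unit binormal of the line at $\mathbf{x} = \mathbf{x}(s)$. From the definition of cross product, we have that the vectors $\mathbf{t}$, $\mathbf{n}$ and $\mathbf{b}$ form a right–handed orthonormal basis. Hence, in analogy with Eq. 15.72, we have also

$$\mathbf{t} = \mathbf{n} \times \mathbf{b} \tag{18.28}$$

and

$$\mathbf{n} = \mathbf{b} \times \mathbf{t}. \tag{18.29}$$

Next, consider the derivative of $\mathbf{b}$. Differentiating Eq. 18.27, we obtain

$$\frac{d\mathbf{b}}{ds} = \frac{d\mathbf{t}}{ds} \times \mathbf{n} + \mathbf{t} \times \frac{d\mathbf{n}}{ds} = \mathbf{t} \times \frac{d\mathbf{n}}{ds}, \tag{18.30}$$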
because $d\mathbf{t}/ds = \kappa\,\mathbf{n}$ (Eq. 18.23) and $\mathbf{a} \times \mathbf{a} = \mathbf{0}$ for any $\mathbf{a}$ (Eq. 15.61). Equation 18.30 indicates that $d\mathbf{b}/ds$ is orthogonal to $\mathbf{t}$. In addition, differentiating $\|\mathbf{b}\|^2 = \mathbf{b} \cdot \mathbf{b} = 1$, we have $\mathbf{b} \cdot d\mathbf{b}/ds = 0$. Thus, $d\mathbf{b}/ds$ is also orthogonal to $\mathbf{b}$ itself. Hence, $d\mathbf{b}/ds$, being orthogonal to both $\mathbf{t}$ and $\mathbf{b}$, is necessarily parallel to $\mathbf{n}$, and it may be expressed as

$$\frac{d\mathbf{b}}{ds} = \tau\,\mathbf{n}. \tag{18.31}$$

The coefficient of proportionality, $\tau$, is called the torsion of the line at $\mathbf{x} = \mathbf{x}(s)$ and is a measure of the rate of twist of the curve out of the osculating plane. [Of course, $\tau = 0$ indicates that, at that point, the curve remains in the osculating plane.]

Finally, differentiating Eq. 18.29, we have

$$\frac{d\mathbf{n}}{ds} = \mathbf{b} \times \frac{d\mathbf{t}}{ds} + \frac{d\mathbf{b}}{ds} \times \mathbf{t}, \tag{18.32}$$

namely (use Eqs. 18.23 and 18.31, as well as Eqs. 18.27 and 18.28)

$$\frac{d\mathbf{n}}{ds} = -\kappa\,\mathbf{t} - \tau\,\mathbf{b}. \tag{18.33}$$

[Of course, if $\tau = 0$ (in particular, in the planar case), Eq. 18.33 reduces to Eq. 18.26.] Equations 18.23, 18.31 and 18.33 are called the Frenet–Serret formulas.¹
• An illustrative example: a helix ♥

A helix is a spatial curve that has a spiral shape and is defined by

$$x = R_H \cos\theta, \qquad y = R_H \sin\theta, \qquad z = a\,\theta, \tag{18.34}$$

namely

$$\mathbf{x}(\theta) = R_H \cos\theta\,\mathbf{i} + R_H \sin\theta\,\mathbf{j} + a\,\theta\,\mathbf{k}, \tag{18.35}$$

where $R_H$ is the radius of the helix and $2\pi a$ is called its pitch (Fig. 18.3). Therefore, the (non–unit) tangent is given by (use $\breve{\mathbf{t}} := d\mathbf{x}/d\theta$, Eq. 18.20)

$$\breve{\mathbf{t}} = \frac{d\mathbf{x}}{d\theta} = -R_H \sin\theta\,\mathbf{i} + R_H \cos\theta\,\mathbf{j} + a\,\mathbf{k}. \tag{18.36}$$

¹ Named after the French mathematicians Jean Frédéric Frenet (1818–1900) and Joseph Alfred Serret (1819–1885), who discovered them independently.
Fig. 18.3 Helix
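Dividing it by $\|d\mathbf{x}/d\theta\| = \sqrt{R_H^2 + a^2}$, we obtain the unit tangent

$$\mathbf{t} = \frac{1}{\sqrt{R_H^2 + a^2}} \left( -R_H \sin\theta\,\mathbf{i} + R_H \cos\theta\,\mathbf{j} + a\,\mathbf{k} \right). \tag{18.37}$$

Next, recall that $ds/d\theta = \|d\mathbf{x}/d\theta\| = \sqrt{R_H^2 + a^2}$. Therefore, we have (use Eq. 18.23)

$$\kappa\,\mathbf{n} = \frac{d\mathbf{t}}{ds} = \frac{d\mathbf{t}}{d\theta}\,\frac{d\theta}{ds} = \frac{-R_H}{R_H^2 + a^2} \left( \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j} \right), \tag{18.38}$$

namely

$$\kappa = \frac{1}{R} = \frac{R_H}{R_H^2 + a^2} \qquad\text{and}\qquad \mathbf{n} = -\cos\theta\,\mathbf{i} - \sin\theta\,\mathbf{j}. \tag{18.39}$$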
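◦ Comment. You may verify that $\mathbf{t}$ and $\mathbf{n}$ are indeed mutually orthonormal. Also, the normal is horizontal, but the tangent is not. Therefore, the osculating plane (which contains both $\mathbf{t}$ and $\mathbf{n}$) is not horizontal.

Next, using $\mathbf{b} = \mathbf{t} \times \mathbf{n}$ (Eq. 18.27), we have (use Eqs. 18.37 and 18.39 along with Eq. 15.72)

$$\mathbf{b} = \frac{1}{\sqrt{R_H^2 + a^2}} \left[ a \left( \sin\theta\,\mathbf{i} - \cos\theta\,\mathbf{j} \right) + R_H\,\mathbf{k} \right], \tag{18.40}$$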
which is a unit vector orthogonal to t and n, as you may verify. Finally, you may enjoy yourself and differentiate Eq. 18.40 to obtain db/ds, and hence the torsion τ . [Use τ = n · db/ds (Eq. 18.31).]
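◦ Numerical illustration. The exercise above can be cross–checked numerically. The sketch below evaluates $\mathbf{t}$ and $\mathbf{n}$ from Eqs. 18.37 and 18.39, forms $\mathbf{b} = \mathbf{t} \times \mathbf{n}$, and differentiates by central differences in $\theta$, dividing by $ds/d\theta$. The values of `R_H`, `a` and `theta`, and all helper names, are illustrative assumptions. With the sign convention of Eq. 18.31 ($d\mathbf{b}/ds = \tau\,\mathbf{n}$), the computed torsion comes out as $\tau = -a/(R_H^2 + a^2)$; many references adopt the opposite sign convention.

```python
import math

R_H, a = 2.0, 0.5                   # hypothetical helix radius and pitch/(2 pi)
c = math.sqrt(R_H**2 + a**2)        # ds/dtheta, constant along the helix


def t(theta):
    """Unit tangent, Eq. 18.37."""
    return (-R_H * math.sin(theta) / c, R_H * math.cos(theta) / c, a / c)


def n(theta):
    """Unit normal, Eq. 18.39."""
    return (-math.cos(theta), -math.sin(theta), 0.0)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def b(theta):
    """Unit binormal, Eq. 18.27."""
    return cross(t(theta), n(theta))


def d_ds(vec, theta, h=1e-5):
    """Central difference in theta, converted to d/ds via ds/dtheta = c."""
    plus, minus = vec(theta + h), vec(theta - h)
    return tuple((p - m) / (2 * h * c) for p, m in zip(plus, minus))


theta = 0.7
kappa = dot(d_ds(t, theta), n(theta))   # from dt/ds = kappa n (Eq. 18.23)
tau = dot(d_ds(b, theta), n(theta))     # from db/ds = tau n   (Eq. 18.31)

print(kappa, R_H / (R_H**2 + a**2))
print(tau, -a / (R_H**2 + a**2))
```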
18.3 Gradient, divergence, and curl

In this section, we introduce the operators gradient, divergence and curl. To begin with, let us introduce the following

Definition 294 (Scalar and vector fields). A scalar function of $\mathbf{x}$, namely

$$f = f(\mathbf{x}), \tag{18.41}$$

is called a scalar field. In contrast, a vector $\mathbf{v}$, whose components are functions of $\mathbf{x}$, namely $v_k = v_k(\mathbf{x})$, i.e.,

$$\mathbf{v} = \mathbf{v}(\mathbf{x}), \tag{18.42}$$

is called a vector field.

As an example of a scalar field, you may consider the density or the temperature of a fluid, as functions of $\mathbf{x}$. As an example of a vector field, you may consider the velocity of a fluid, such as water or air, around an object, such as an automobile model in a wind tunnel. Other vector fields include the gravitational field (addressed in Chapter 20, in connection with the Kepler laws), as well as the electric and magnetic fields (not addressed in this book).
18.3.1 Gradient and directional derivative

We have the following²

Definition 295 (Gradient). The gradient of a function $f(\mathbf{x})$, denoted by $\operatorname{grad} f$, is a vector defined by

$$\operatorname{grad} f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}. \tag{18.43}$$

In two dimensions, we have

$$\operatorname{grad} f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j}. \tag{18.44}$$

This means, for instance, that the x-component of the vector $\mathbf{g} := \operatorname{grad} f$ is $g_x = \partial f/\partial x$. Note that the above definition implies that in three dimensions we have

$$\operatorname{grad} f \cdot d\mathbf{x} = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz = df. \tag{18.45}$$

[Hint: Use the expression for the dot product in terms of components (Eq. 15.43), as well as Eq. 18.7.] A similar expression holds in two dimensions (use Eq. 18.1), namely

$$\operatorname{grad} f \cdot d\mathbf{x} = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = df. \tag{18.46}$$

² Gradient comes from gradiens (present participle of gradī, Latin for “to walk, to go forward, to advance”). [Indeed, as we will see later, the direction of grad f is that of the maximum increase of $f(\mathbf{x})$.]
• Directional and normal derivatives

Here, we introduce the so–called directional derivative. Recalling the definition of composite functions (Definition 128, p. 248), we have the following

Definition 296 (Directional derivative). Consider a function $f = f(\mathbf{x})$ and a line $\mathbf{x} = \mathbf{x}(s)$, where s is the arclength along the curve. The directional derivative $df/ds$ is defined as the derivative with respect to s of the composite function $f(s) = f[\mathbf{x}(s)]$.

Recalling that $\mathbf{t} = d\mathbf{x}/ds$ (Eq. 18.17) and using the multiple chain rule (Eq. 9.140), we obtain

$$\frac{df}{ds} = \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} + \frac{\partial f}{\partial z}\frac{dz}{ds} = \operatorname{grad} f \cdot \frac{d\mathbf{x}}{ds} = \operatorname{grad} f \cdot \mathbf{t}. \tag{18.47}$$

Therefore, we have

$$\frac{df}{ds} = \|\operatorname{grad} f\| \cos\theta, \tag{18.48}$$

where $\theta$ is the angle between the vectors $\mathbf{t}$ and $\operatorname{grad} f$. [Use the definition of dot product (Eq. 15.18), as well as $\|\mathbf{t}\| = 1$ (Eq. 18.17).] Hence, at a given point $\mathbf{x}$, $df/ds$ attains its maximum value when $\cos\theta = 1$, namely for $\theta = 0$. In other words, the directional derivative reaches its maximum in the direction of $\operatorname{grad} f$. Looking at it from a different angle:

The direction of $\operatorname{grad} f$ is that of the fastest increase of $f(\mathbf{x})$. (18.49)
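◦ Numerical illustration. Equations 18.47 and 18.48 can be checked on a sample function. The sketch below is only illustrative: the function $f = x^2 y + \sin z$, the point, and the helper names are assumptions, and all derivatives are central–difference approximations. It compares the derivative of the composite function along a unit direction with $\operatorname{grad} f \cdot \mathbf{t}$, and verifies that the direction of $\operatorname{grad} f$ yields the value $\|\operatorname{grad} f\|$.

```python
import math


def grad(f, p, h=1e-6):
    """Central-difference approximation of grad f at the point p (Eq. 18.43)."""
    g = []
    for k in range(len(p)):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        g.append((f(*plus) - f(*minus)) / (2 * h))
    return g


def directional_derivative(f, p, t, h=1e-6):
    """df/ds along the unit vector t, from the composite function f[x(s)]."""
    plus = [pi + h * ti for pi, ti in zip(p, t)]
    minus = [pi - h * ti for pi, ti in zip(p, t)]
    return (f(*plus) - f(*minus)) / (2 * h)


def f(x, y, z):
    # Hypothetical scalar field used only for this check.
    return x * x * y + math.sin(z)


p = [1.0, 2.0, 0.5]
g = grad(f, p)                               # analytically (2xy, x^2, cos z)
g_norm = math.sqrt(sum(c * c for c in g))

t_max = [c / g_norm for c in g]              # unit vector along grad f
t_other = [1.0, 0.0, 0.0]                    # unit vector along the x-axis

df_ds_max = directional_derivative(f, p, t_max)
df_ds_other = directional_derivative(f, p, t_other)

print(df_ds_max, g_norm)      # agree, as in Eq. 18.48 with theta = 0
print(df_ds_other, g[0])      # agree, as in Eq. 18.47 with t = i
```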
On the other hand, df /ds = 0 iff θ = ±π/2, namely iff at x the line x(s) is orthogonal to gradf . [The implications of this fact are very interesting, as we will see in Theorem 190 below.] To state such a theorem, we have to introduce the following definitions:
Definition 297 (Smooth point, tangent plane, normal of S). Consider a surface S, along with all the lines L (through a given point of S, say $\mathbf{x}_*$) that are obtained as intersections of S and a plane through $\mathbf{x}_*$. Assume that at $\mathbf{x}_*$ all these lines are smooth (Definition 166, p. 347, of smooth line) and that the tangents of all these lines lie on a single plane, say $P_*$. Then, the point $\mathbf{x}_*$ is called a smooth point of S, and the plane $P_*$ is called the tangent plane of S at $\mathbf{x}_*$. The normal to S at $\mathbf{x}_*$ coincides with the normal to $P_*$. [The normal is defined except for the sign.]

Definition 298 (Normal derivative). Consider the function $f = f(\mathbf{x})$ and a surface S. The directional derivative along the normal at a smooth point of S is called the normal derivative. Accordingly, we have (use Eq. 18.47)

$$\frac{\partial f}{\partial n} := \operatorname{grad} f \cdot \mathbf{n}. \tag{18.50}$$
◦ Comment. Note that, almost universally, the ordinary–derivative sign is used for the directional derivative, whereas the partial–derivative sign is used for the normal derivative. [I avoid the symbol $\partial f/\partial\mathbf{n}$ used by some authors, because, in the notation adopted in this book, this symbol denotes a vector (see Eq. 18.87).]

Then, we have the following

Theorem 190. Consider the surface S, defined by $f(\mathbf{x}) = 0$. Assume that the function $f(\mathbf{x})$ is continuous in a layer surrounding S, with $f(\mathbf{x}) > 0$ on one side of S and $f(\mathbf{x}) < 0$ on the other. Consider any smooth point of S, say $\mathbf{x}_*$, where $f(\mathbf{x})$ is differentiable. Then, at $\mathbf{x}_*$, we have: (i) $\operatorname{grad} f$ is normal to S at $\mathbf{x}_*$, (ii) $\operatorname{grad} f$ points from the region where $f(\mathbf{x}) < 0$ to that where $f(\mathbf{x}) > 0$, and (iii)

$$\operatorname{grad} f = \frac{\partial f}{\partial n}\,\mathbf{n}. \tag{18.51}$$

◦ Proof: To begin with, the hypotheses guarantee that, at $\mathbf{x}_*$, we have that $\operatorname{grad} f$ exists, and $\operatorname{grad} f \neq \mathbf{0}$. Next, consider any line L lying on S and passing through $\mathbf{x}_*$. Let L be described by $\mathbf{x} = \mathbf{x}(s)$. For each point of L we have $df/ds = 0$, since $f(\mathbf{x}) = 0$ for any $\mathbf{x} \in L$. Accordingly, $df/ds = \operatorname{grad} f \cdot \mathbf{t}$ (Eq. 18.47) shows that $\operatorname{grad} f$ is normal to the tangents to all the lines on S through $\mathbf{x}_*$. This proves Item (i). Item (ii) is necessarily true, because of Eq. 18.49. Regarding Item (iii), note that Item (i) implies $\operatorname{grad} f = C\,\mathbf{n}$, where $C = \operatorname{grad} f \cdot \mathbf{n} = \partial f/\partial n$ (Eq. 18.50), in agreement with Eq. 18.51.

◦ Comment. Note that Eq. 18.51 indicates that the gradient is invariant, namely one obtains the same result from Eq. 18.43, by using different coordinate systems. [More on this in Vol. III.]
• Illustrative examples

As an illustrative example, consider the three–dimensional surface described by the function

$$f(x, y, z) = \tfrac{1}{2} \left( x^2 + y^2 + z^2 - R^2 \right) = 0. \tag{18.52}$$

The above equation is an implicit representation of a sphere. The normal to this surface is proportional to

$$\operatorname{grad} f = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}. \tag{18.53}$$

Equations 18.52 and 18.53 show that $\|\operatorname{grad} f\| \neq 0$, and hence $\operatorname{grad} f \neq \mathbf{0}$. Accordingly, a sphere has a normal at any of its points (and it is outwardly directed).

Next, let us address a surface with $\operatorname{grad} f = \mathbf{0}$, at least at one point. Consider the three–dimensional surface described by the function

$$f(x, y, z) = \tfrac{1}{2} \left( x^2 + y^2 - z^2 \right) = 0. \tag{18.54}$$

The above equation is an implicit representation of a circular cone. [Indeed, a horizontal cross–section consists of a circle of radius $r = |z|$. The tangent line through the origin has a slope equal to 45°. See also Eq. 7.142.] The (non–unit) normal to this surface is given by

$$\operatorname{grad} f = x\,\mathbf{i} + y\,\mathbf{j} - z\,\mathbf{k}. \tag{18.55}$$

The origin is a point of the surface and $\operatorname{grad} f$ vanishes there. In fact, the normal and the two tangents are not even defined there.

◦ Comment. Of course, the results apply to the two–dimensional case as well. Specifically, the two–dimensional gradient, $\operatorname{grad} f = (\partial f/\partial x)\,\mathbf{i} + (\partial f/\partial y)\,\mathbf{j}$, points in the direction of the highest increase of $f(x, y)$, which is normal to the curves $f(x, y) = 0$. [This property is important in numerical techniques for the search of a minimum.] As an illustrative example, consider a surface described by $z = f(x, y)$, such as a mountain. Consider lines $f(x, y) = z(x, y) - C = 0$, namely the so–called contour lines (i.e., constant–altitude lines) on a map of the mountain. The two–dimensional gradient $\operatorname{grad} f$ points in the direction where the directional derivative attains its maximum value. [Such a direction is known as that of steepest ascent. Its opposite is the direction of steepest descent. The above comments indicate that the direction of steepest descent is normal to the lines z = constant, namely the contour lines on a map.]
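◦ Numerical illustration. The remark on numerical techniques for the search of a minimum can be made concrete with a bare–bones steepest–descent iteration. This is a minimal sketch under stated assumptions, not a method taken from the text: the quadratic function $f = (x - 1)^2 + 2\,(y + \tfrac{1}{2})^2$, the fixed step size, the starting point, and the iteration count are all hypothetical choices. At each step, the point moves opposite to $\operatorname{grad} f$, namely normally to the contour line through it.

```python
def grad_f(x, y):
    """Gradient of the hypothetical function f = (x - 1)^2 + 2 (y + 0.5)^2."""
    return (2.0 * (x - 1.0), 4.0 * (y + 0.5))


def steepest_descent(x, y, step=0.1, iterations=200):
    """Repeatedly move against grad f with a fixed step size."""
    for _ in range(iterations):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Start away from the minimum, which is located at (1, -0.5).
x_min, y_min = steepest_descent(3.0, 2.0)
print(x_min, y_min)
```

A fixed step works here only because the function is a well–conditioned quadratic; practical implementations typically choose the step adaptively.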
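18.3.2 Divergence ♣

We have the following³

Definition 299 (Divergence). Consider the vector $\mathbf{b} = b_x\,\mathbf{i} + b_y\,\mathbf{j} + b_z\,\mathbf{k}$, and assume its components to be differentiable. The divergence of $\mathbf{b}$, denoted by $\operatorname{div}\mathbf{b}$, is a scalar defined by

$$\operatorname{div}\mathbf{b} = \frac{\partial b_x}{\partial x} + \frac{\partial b_y}{\partial y} + \frac{\partial b_z}{\partial z}. \tag{18.56}$$

In two dimensions, the divergence of $\mathbf{b}$ is given by

$$\operatorname{div}\mathbf{b} = \frac{\partial b_x}{\partial x} + \frac{\partial b_y}{\partial y}. \tag{18.57}$$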
◦ Comment. It may be of interest to note that the divergence is invariant, namely one obtains the same result by using different coordinate systems. [For simplicity, the proof of this fact is postponed to Vol. III, since it is inessential until then.]
• A relationship involving divergence

Note that, for any differentiable $\alpha(\mathbf{x})$ and $\mathbf{b}(\mathbf{x})$, we have

$$\operatorname{div}(\alpha\,\mathbf{b}) = \mathbf{b} \cdot \operatorname{grad}\alpha + \alpha \operatorname{div}\mathbf{b}. \tag{18.58}$$

Indeed, in terms of components, we have

$$\frac{\partial(\alpha\,b_x)}{\partial x} + \frac{\partial(\alpha\,b_y)}{\partial y} + \frac{\partial(\alpha\,b_z)}{\partial z} = b_x \frac{\partial\alpha}{\partial x} + b_y \frac{\partial\alpha}{\partial y} + b_z \frac{\partial\alpha}{\partial z} + \alpha \left( \frac{\partial b_x}{\partial x} + \frac{\partial b_y}{\partial y} + \frac{\partial b_z}{\partial z} \right), \tag{18.59}$$

in agreement with Eq. 18.58.

³ Where does the term “divergence” come from? As we will see, for the velocity flow field, the divergence is a measure of how much the flow “diverges,” namely expands. Specifically, limiting ourselves to two–dimensional flows, we will see that the Gauss theorem (Eq. 19.79) states that the integral of the divergence over a region R equals the flux through the contour of R. If the divergence is positive (and hence the flux is positive), the flow expands, namely diverges. [The three–dimensional formulation, addressed in Vol. III, is similar.]
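18.3.3 The Laplacian ♣

We have the following

Definition 300 (Laplacian). The Laplace operator, or Laplacian, denoted by $\nabla^2$, is defined by

$$\nabla^2 f := \operatorname{div}\operatorname{grad} f, \tag{18.60}$$

namely (as in Eq. 11.48)

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \sum_{k=1}^{3} \frac{\partial^2 f}{\partial x_k^2}. \tag{18.61}$$

In two dimensions, we use

$$\nabla^2 f := \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \sum_{k=1}^{2} \frac{\partial^2 f}{\partial x_k^2}. \tag{18.62}$$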
◦ Warning. The symbol $\nabla^2$ for the Laplacian is widely used in applied mathematics, physics, and engineering. An alternate symbol for the Laplacian, Δ, is preferred by some mathematicians. Such a symbol is not used in this book to denote the Laplacian, in order to avoid too many meanings for the same symbol. Indeed, the symbol Δ is typically used with other meanings, such as: (i) the discriminant $\Delta := b^2 - 4ac$ of a quadratic equation (Eq. 6.44), (ii) difference of (dependent and independent) variables (Eq. 9.3), and (iii) discontinuities in the field. [The motivation for the symbol $\nabla^2$ used to indicate the Laplacian is discussed in Subsection 18.3.6.]
18.3.4 Curl ♣

We have the following⁴

Definition 301 (Curl). Consider the vector $\mathbf{b} = b_x\,\mathbf{i} + b_y\,\mathbf{j} + b_z\,\mathbf{k}$, and assume its components to be differentiable. The curl of a vector $\mathbf{b}$, denoted by $\operatorname{curl}\mathbf{b}$, is a vector defined by

$$\operatorname{curl}\mathbf{b} = \left( \frac{\partial b_z}{\partial y} - \frac{\partial b_y}{\partial z} \right) \mathbf{i} + \left( \frac{\partial b_x}{\partial z} - \frac{\partial b_z}{\partial x} \right) \mathbf{j} + \left( \frac{\partial b_y}{\partial x} - \frac{\partial b_x}{\partial y} \right) \mathbf{k}. \tag{18.63}$$

For two–dimensional fields, the first two components in the above equation vanish and we use the scalar

$$\mathbf{k} \cdot \operatorname{curl}\mathbf{b} = \frac{\partial b_y}{\partial x} - \frac{\partial b_x}{\partial y} \tag{18.64}$$

and refer to it as the two–dimensional curl.

⁴ The curl operator is also known as the rotor. This alternate term (along with the alternate notation rot b) is used primarily in some European countries. The term rotor is more descriptive, in that it is reminiscent of rotation. Indeed, the curl of a vector at a given point $\mathbf{x}_*$ is related to the presence of rotation of the field around $\mathbf{x}_*$. [See in particular Eq. 18.67.] Also, the term rotor is particularly convenient to justify the term irrotational vector field (literally, vector field without rotation), which refers to a vector field whose curl equals zero (Definition 310, p. 762). [Indeed, I noted that my American students had more problems in grasping the connection than my Italian ones.] Nonetheless, I will use the term curl, simply because it seems to be the standard.
and refer to it as the two–dimensional curl. Remark 140. In two and three dimensions, the condition for a differential form to be exact (∂vh /∂xj = ∂vj /∂xh , Eq. 18.10) may be expressed as curlv = 0.
• An illustrative example. Curl in fluid dynamics ♥

In fluid dynamics, the curl of the velocity field is called the vorticity and is denoted with $\zeta$. For planar velocity fields, the vorticity is a scalar, given by

$$\zeta := \mathbf{k} \cdot \operatorname{curl}\mathbf{v} = \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}. \tag{18.65}$$

◦ Comment. Consider a velocity field given by

$$\mathbf{v} = \omega\,\mathbf{k} \times \mathbf{x} = \omega \left( -y\,\mathbf{i} + x\,\mathbf{j} \right), \tag{18.66}$$

with angular speed $\omega$, namely the angle covered in unit time. [This corresponds to a rigid–body rotation around the origin. Hint: Take the time derivative of Eq. 7.3 (planar rigid–body rotation) and set $\dot{\theta} = \omega$.] Combining Eqs. 18.65 and 18.66, we have

$$\zeta = 2\,\omega. \tag{18.67}$$

[The fact that this relationship is valid in general (even in three dimensions for the corresponding vector quantities) is addressed in Vol. III.]
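◦ Numerical illustration. Equation 18.67 can be verified directly from Eq. 18.65. In the sketch below, the value of `omega`, the evaluation point, and the helper names are illustrative assumptions; the partial derivatives are approximated by central differences, which incur only rounding errors for a velocity field that is linear in x and y. The two–dimensional divergence (Eq. 18.57) of the same field is also evaluated, and comes out as zero.

```python
omega = 1.5   # hypothetical angular speed


def v(x, y):
    """Rigid-body rotation around the origin, Eq. 18.66: (v_x, v_y)."""
    return (-omega * y, omega * x)


def vorticity(field, x, y, h=1e-5):
    """Two-dimensional curl, Eq. 18.65: dv_y/dx - dv_x/dy."""
    dvy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dvx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dvy_dx - dvx_dy


def divergence(field, x, y, h=1e-5):
    """Two-dimensional divergence, Eq. 18.57: dv_x/dx + dv_y/dy."""
    dvx_dx = (field(x + h, y)[0] - field(x - h, y)[0]) / (2 * h)
    dvy_dy = (field(x, y + h)[1] - field(x, y - h)[1]) / (2 * h)
    return dvx_dx + dvy_dy


zeta = vorticity(v, 0.4, -1.1)
div_v = divergence(v, 0.4, -1.1)
print(zeta, 2 * omega, div_v)
```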
• Curl in determinant notation

In analogy with Eq. 15.78 (cross product), Eq. 18.63 may be conveniently expressed in determinant notation, as

$$\operatorname{curl}\mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ b_x & b_y & b_z \end{vmatrix}, \tag{18.68}$$

where we (nonchalantly) use the rule for the determinant of a 3 × 3 matrix (Eq. 3.165, or the Sarrus rule, Eq. 3.166), even though the elements of the determinant include base vectors, partial derivative operators, and functions.

◦ Warning. In addition to disregarding the fact that the elements of the matrix in Eq. 18.68 are a mixture of base vectors, partial derivative operators, and functions, we have to be careful in using Eq. 18.68 and make sure that the derivatives appear in front of the functions.
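• Curl in indicial notation ♣

Setting $\mathbf{c} := \operatorname{curl}\mathbf{b}$ and using indicial notations, we have

$$c_1 = \frac{\partial b_3}{\partial x_2} - \frac{\partial b_2}{\partial x_3}, \qquad c_2 = \frac{\partial b_1}{\partial x_3} - \frac{\partial b_3}{\partial x_1}, \qquad c_3 = \frac{\partial b_2}{\partial x_1} - \frac{\partial b_1}{\partial x_2}, \tag{18.69}$$

which may be written as (use the definition of $\epsilon_{hjk}$, Eq. 15.77)

$$c_h = \sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial b_k}{\partial x_j}. \tag{18.70}$$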
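• Relationships involving curl

Note that, for any differentiable $\alpha(\mathbf{x})$ and $\mathbf{b}(\mathbf{x})$, we have

$$\operatorname{curl}(\alpha\,\mathbf{b}) = \operatorname{grad}\alpha \times \mathbf{b} + \alpha \operatorname{curl}\mathbf{b}. \tag{18.71}$$

[Indeed, in terms of components, we have (use Eq. 18.70, as well as the cross product in terms of components, Eq. 15.80, namely $c_h = \sum_{j,k=1}^{3} \epsilon_{hjk}\,a_j b_k$)

$$\sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial}{\partial x_j} \big( \alpha\,b_k \big) = \sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial\alpha}{\partial x_j}\,b_k + \alpha \sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial b_k}{\partial x_j}, \tag{18.72}$$

which is fully equivalent to Eq. 18.71.]

In addition, we have, for any twice differentiable vector $\mathbf{b}(\mathbf{x})$,

$$\operatorname{curl}(\operatorname{curl}\mathbf{b}) = \operatorname{grad}(\operatorname{div}\mathbf{b}) - \nabla^2\mathbf{b}, \tag{18.73}$$

where $\nabla^2$ is defined in Eq. 18.61. [Indeed, setting $\mathbf{c} = \operatorname{curl}(\operatorname{curl}\mathbf{b})$ and using $\sum_{k=1}^{3} \epsilon_{hjk}\,\epsilon_{klm} = \delta_{hl}\delta_{jm} - \delta_{hm}\delta_{jl}$ (Eq. 15.99), we have

$$\sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial}{\partial x_j} \left( \sum_{l,m=1}^{3} \epsilon_{klm}\,\frac{\partial b_m}{\partial x_l} \right) = \sum_{j,l,m=1}^{3} \big( \delta_{hl}\delta_{jm} - \delta_{hm}\delta_{jl} \big) \frac{\partial^2 b_m}{\partial x_j\,\partial x_l} = \sum_{j=1}^{3} \frac{\partial^2 b_j}{\partial x_j\,\partial x_h} - \sum_{j=1}^{3} \frac{\partial^2 b_h}{\partial x_j^2}, \tag{18.74}$$

in agreement with Eq. 18.73.]

Furthermore, for any differentiable $\mathbf{a}(\mathbf{x})$ and $\mathbf{b}(\mathbf{x})$, we have

$$\operatorname{curl}(\mathbf{a} \times \mathbf{b}) = (\mathbf{b} \cdot \operatorname{grad})\,\mathbf{a} + \mathbf{a} \operatorname{div}\mathbf{b} - (\mathbf{a} \cdot \operatorname{grad})\,\mathbf{b} - \mathbf{b} \operatorname{div}\mathbf{a}. \tag{18.75}$$

[Indeed, in terms of components we have (use Eq. 15.99 again)

$$\sum_{j,k=1}^{3} \epsilon_{hjk}\,\frac{\partial}{\partial x_j} \left( \sum_{l,m=1}^{3} \epsilon_{klm}\,a_l b_m \right) = \sum_{j,l,m=1}^{3} \big( \delta_{hl}\delta_{jm} - \delta_{hm}\delta_{jl} \big) \frac{\partial}{\partial x_j} \big( a_l b_m \big) = \sum_{j=1}^{3} \frac{\partial}{\partial x_j} \big( a_h b_j \big) - \sum_{j=1}^{3} \frac{\partial}{\partial x_j} \big( a_j b_h \big), \tag{18.76}$$

which is fully equivalent to Eq. 18.75.]

Another useful formula is

$$\mathbf{a} \times \operatorname{curl}\mathbf{b} + \mathbf{b} \times \operatorname{curl}\mathbf{a} = \operatorname{grad}(\mathbf{a} \cdot \mathbf{b}) - (\mathbf{a} \cdot \operatorname{grad})\,\mathbf{b} - (\mathbf{b} \cdot \operatorname{grad})\,\mathbf{a}. \tag{18.77}$$

[Indeed, in terms of components we have (use Eq. 15.99 again)

$$\begin{aligned} \sum_{j,k=1}^{3} \epsilon_{hjk} \left( a_j \sum_{l,m=1}^{3} \epsilon_{klm}\,\frac{\partial b_m}{\partial x_l} + b_j \sum_{l,m=1}^{3} \epsilon_{klm}\,\frac{\partial a_m}{\partial x_l} \right) &= \sum_{j,l,m=1}^{3} \big( \delta_{hl}\delta_{jm} - \delta_{hm}\delta_{jl} \big) \left( a_j \frac{\partial b_m}{\partial x_l} + b_j \frac{\partial a_m}{\partial x_l} \right) \\ &= \sum_{j=1}^{3} \left( a_j \frac{\partial b_j}{\partial x_h} + b_j \frac{\partial a_j}{\partial x_h} - a_j \frac{\partial b_h}{\partial x_j} - b_j \frac{\partial a_h}{\partial x_j} \right), \end{aligned} \tag{18.78}$$

which is equivalent to Eq. 18.77.]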
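18.3.5 Combinations of grad, div, and curl ♣

◦ Warning. In this section, the term “suitably smooth” indicates that we assume that the hypotheses of the Schwarz theorem on the interchangeability of the order of the second mixed derivative (Eq. 9.126) are satisfied.

We have the following important relationship between the operators curl and grad:

$$\operatorname{curl}\operatorname{grad} u(\mathbf{x}) = \mathbf{0}, \tag{18.79}$$

for any suitably smooth $u(\mathbf{x})$. For, using Eqs. 18.43 and 18.68, as well as the Schwarz theorem (Eq. 9.126), we have

$$\operatorname{curl}\operatorname{grad} u(\mathbf{x}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \partial u/\partial x & \partial u/\partial y & \partial u/\partial z \end{vmatrix} = \mathbf{0}. \tag{18.80}$$

In addition, we have the following relationship between the operators div and curl:

$$\operatorname{div}\operatorname{curl}\mathbf{b}(\mathbf{x}) = 0, \tag{18.81}$$

for any suitably smooth $\mathbf{b}(\mathbf{x})$. For, using the definitions of divergence and curl (Eqs. 18.56 and 18.68), as well as the Schwarz theorem (Eq. 9.126), we have

$$\operatorname{div}\operatorname{curl}\mathbf{b}(\mathbf{x}) = \frac{\partial}{\partial x} \left( \frac{\partial b_z}{\partial y} - \frac{\partial b_y}{\partial z} \right) + \frac{\partial}{\partial y} \left( \frac{\partial b_x}{\partial z} - \frac{\partial b_z}{\partial x} \right) + \frac{\partial}{\partial z} \left( \frac{\partial b_y}{\partial x} - \frac{\partial b_x}{\partial y} \right) = 0. \tag{18.82}$$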
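◦ Numerical illustration. Equations 18.79 and 18.81 can be spot–checked numerically. The sketch below builds grad, curl and div from one central–difference operator `d`; the step `H`, the sample fields `u` and `b`, the evaluation point, and all names are illustrative assumptions. Since the same discrete operator is used for every partial derivative, the discrete mixed derivatives commute (mirroring the role of the Schwarz theorem), and the results vanish up to rounding errors.

```python
import math

H = 1e-3   # finite-difference step (assumed; not a tuned value)


def d(g, k, h=H):
    """Return the function p -> dg/dx_k, approximated by central differences."""
    def dg(p):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        return (g(plus) - g(minus)) / (2 * h)
    return dg


def grad(f):
    """Components of grad f (Eq. 18.43) as a list of functions."""
    return [d(f, 0), d(f, 1), d(f, 2)]


def curl(b):
    """Components of curl b (Eq. 18.63) as a list of functions."""
    bx, by, bz = b
    return [
        lambda p: d(bz, 1)(p) - d(by, 2)(p),
        lambda p: d(bx, 2)(p) - d(bz, 0)(p),
        lambda p: d(by, 0)(p) - d(bx, 1)(p),
    ]


def div(b):
    """div b (Eq. 18.56) as a function of the point p."""
    bx, by, bz = b
    return lambda p: d(bx, 0)(p) + d(by, 1)(p) + d(bz, 2)(p)


# Hypothetical smooth fields, with p = [x, y, z].
u = lambda p: math.sin(p[0]) * p[1] ** 2 + math.exp(p[2]) * p[0]
b = [
    lambda p: p[1] * p[2] ** 2,
    lambda p: math.cos(p[0] * p[2]),
    lambda p: p[0] * p[1] + p[2] ** 3,
]

point = [0.3, -0.8, 1.1]
curl_grad_u = [component(point) for component in curl(grad(u))]
div_curl_b = div(curl(b))(point)

print(curl_grad_u, div_curl_b)
```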
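18.3.6 The del operator, and more ♣

Many authors find it convenient to introduce the vector differential operator del, which is represented by the symbol $\nabla$, and is defined by

$$\nabla = \frac{\partial}{\partial x}\,\mathbf{i} + \frac{\partial}{\partial y}\,\mathbf{j} + \frac{\partial}{\partial z}\,\mathbf{k}. \tag{18.83}$$

Using this operator, we can write

$$\operatorname{grad} f = \nabla f, \qquad \operatorname{div}\mathbf{a} = \nabla \cdot \mathbf{a}, \qquad \operatorname{curl}\mathbf{a} = \nabla \times \mathbf{a}. \tag{18.84}$$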
◦ Comment. It may be noted that del is just a mathematical symbol, whose main advantage is that it makes it faster for the lecturer to write down the equations on the blackboard, and makes it easier for the student to work out and/or “remember” the various formulas. However, in my experience, most of the students end up with quite a vague understanding of the equations, solely visual, while confusing the three different operators, grad, div, and curl. For this reason, the use of the del symbol $\nabla$ to refer to the operators grad, div, and curl is banned in this introductory text.

However, I will use the del operator $\nabla$ in the two following instances. First, $\nabla^2$ is used to indicate the Laplacian operator, introduced in Eq. 18.60. The motivation for such a symbol would be apparent if we were to use Eq. 18.84. In this case, we would have $\nabla^2 f = \operatorname{div}[\operatorname{grad} f] = \nabla \cdot \nabla f$.

I will also make use of the symbol $\nabla$ to indicate $(k = 1, \dots, n)$:

$$\nabla f(x_1, \dots, x_n) := \partial f/\partial x_k, \tag{18.85}$$

namely the (mathematicians’) vector of the derivatives of a function with respect to $x_k$. Mathematicians often refer to this quantity as the gradient of the function $f(x_1, \dots, x_n)$. [The situation is similar to that encountered for the term vector (see Remark 30, p. 94, on physicists’ vectors vs mathematicians’ vectors). Accordingly, I will follow the tradition and use the term gradient for both entities, specifying what I am referring to, whenever the meaning does not clearly emerge from the context.] Sometimes, I will use the mnemonically convenient notation

$$\frac{\partial f}{\partial \mathsf{x}} := \nabla f = \frac{\partial f}{\partial x_k}. \tag{18.86}$$

Similar considerations apply for the notation

$$\frac{\partial f}{\partial \mathbf{x}} := \operatorname{grad} f. \tag{18.87}$$
18.4 Taylor polynomials in several variables
In this section, we extend to functions of several variables the Taylor polynomials introduced in Chapter 13 for a function of a single variable. We begin with functions of two variables and then extend the results to functions of more than two variables.
18. Multivariate differential calculus
18.4.1 Taylor polynomials in two variables
Here, we consider a function of two variables, say $f = f(x, y)$. First, we address the general theory. Then, we present a more meaningful, more elegant and more convenient expression. Finally, we address in detail the formulation for two–variable Taylor polynomials of degree two.
• General formulation
For the sake of notational simplicity, here we limit ourselves primarily to Taylor polynomials around the origin, $x_0 = y_0 = 0$, since the generalization requires simply a shift of the origin (Eqs. 18.94 and 18.100). We also assume $f(x, y)$ to be suitably smooth. For any fixed value of $y$, we have that $f$ is only a function of $x$ and the corresponding approximation with a Taylor polynomial of order $n$ around the point $x = 0$ is given by (use Eq. 13.16 with $x = O[h]$)
\[
f(x, y) = f(0, y) + \left.\frac{\partial f}{\partial x}\right|_{(0,y)} x + \cdots + \left.\frac{\partial^n f}{\partial x^n}\right|_{(0,y)} \frac{x^n}{n!} + o[h^n],
\tag{18.88}
\]
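where we use the notation $g|_{(0,y)} := g(0, y)$, for any function $g(x, y)$. Next, note that $f(0, y)$ and $\partial^p f/\partial x^p|_{(0,y)}$ ($p = 1, 2, \dots, n$) are functions of $y$. Hence, we can expand them around $y = 0$. Thus, using again Eq. 13.16, and assuming that $y$ is of the same order of magnitude as $x$, namely $y = O[h]$, one obtains
\[
\begin{aligned}
f(x, y) = {} & f_0 + \left.\frac{\partial f}{\partial y}\right|_0 y + \cdots + \left.\frac{\partial^n f}{\partial y^n}\right|_0 \frac{y^n}{n!} + o[h^n] \\
& + \left[ \left.\frac{\partial f}{\partial x}\right|_0 + \left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_0 y + \cdots + \left.\frac{\partial^{n+1} f}{\partial x\,\partial y^n}\right|_0 \frac{y^n}{n!} + o[h^n] \right] x + \dots \\
& + \left[ \left.\frac{\partial^n f}{\partial x^n}\right|_0 + \left.\frac{\partial^{n+1} f}{\partial x^n\,\partial y}\right|_0 y + \cdots + \left.\frac{\partial^{2n} f}{\partial x^n\,\partial y^n}\right|_0 \frac{y^n}{n!} + o[h^n] \right] \frac{x^n}{n!} + o[h^n],
\end{aligned}
\tag{18.89}
\]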
where we use the notation $g|_0 := g(0, 0)$, for any function $g(x, y)$. The above equation may be written as
\[
f(x, y) = \sum_{h,k=0}^{n} \left.\frac{\partial^{h+k} f}{\partial x^h\,\partial y^k}\right|_0 \frac{x^h\, y^k}{h!\, k!} + o[h^n].
\tag{18.90}
\]
• A more meaningful expression
Here, we want to obtain a more meaningful (and more elegant and more convenient) expression for the above Taylor expansion. Indeed, the above equation is somewhat inconsistent. To see this, note that the presence of the term with $x^n y^n$ implies that the above expression is a polynomial of order $2n$, although it contains powers of $x$ no larger than $n$, as well as powers of $y$, also no larger than $n$. Now, some of the terms included explicitly in Eq. 18.90 are of an order higher than $h^n$. [For instance, the term $x^n y^n$ is of order $h^{2n}$.] On the other hand, the error is $o[h^n]$. In other words, we have included terms of order equal to or higher than that of terms that we have neglected. Adding these terms does not improve the overall accuracy of the approximation. Accordingly, we might as well neglect all the terms of order higher than $n$. This way, we end up with a general polynomial of order $n$, namely a polynomial that contains all the possible powers of the type $x^{n-k} y^k$ ($k = 0, 1, \dots, n$) along with all the lower–order ones, but none of the higher–order ones. This may be obtained by collecting equal–order terms in Eq. 18.89, to yield
\[
\begin{aligned}
f(x, y) = {} & f_0 + \left.\frac{\partial f}{\partial x}\right|_0 x + \left.\frac{\partial f}{\partial y}\right|_0 y + \dots \\
& + \left.\frac{\partial^n f}{\partial x^n}\right|_0 \frac{x^n}{n!} + \cdots + \left.\frac{\partial^n f}{\partial x^k\,\partial y^{n-k}}\right|_0 \frac{x^k\, y^{n-k}}{k!\,(n-k)!} + \cdots + \left.\frac{\partial^n f}{\partial y^n}\right|_0 \frac{y^n}{n!} + o[h^n].
\end{aligned}
\tag{18.91}
\]
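This may be written as
\[
f(x, y) = t_n(x, y) + o[h^n],
\tag{18.92}
\]
where $t_n(x, y)$ denotes the Taylor polynomial of degree $n$ for a function of two variables, given by
\[
\begin{aligned}
t_n(x, y) &= \sum_{p=0}^{n} \sum_{k=0}^{p} \left.\frac{\partial^p f}{\partial x^k\,\partial y^{p-k}}\right|_0 \frac{x^k\, y^{p-k}}{k!\,(p-k)!} \\
&= \sum_{p=0}^{n} \frac{1}{p!} \sum_{k=0}^{p} \binom{p}{k} \left.\frac{\partial^p f}{\partial x^k\,\partial y^{p-k}}\right|_0 x^k\, y^{p-k},
\end{aligned}
\tag{18.93}
\]
where $\binom{p}{k} = p!/[k!\,(p-k)!]$ denotes the binomial coefficient (Eq. 13.35).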
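The double sum in Eq. 18.93 translates directly into two nested loops. The following is a minimal Python sketch, not part of the original text: the function name `taylor_2d` and the callback `derivs` are hypothetical, and the test function $f = e^{x+y}$ (all of whose partial derivatives equal 1 at the origin) is chosen only because its derivatives are trivial.

```python
from math import comb, exp, factorial


def taylor_2d(derivs, x, y, n):
    """Evaluate t_n(x, y) of Eq. 18.93 around the origin.

    derivs(k, m) must return the mixed partial d^(k+m) f / dx^k dy^m at (0, 0).
    """
    total = 0.0
    for p in range(n + 1):
        inner = 0.0
        for k in range(p + 1):
            # Binomial coefficient times the derivative of order p, times x^k y^(p-k).
            inner += comb(p, k) * derivs(k, p - k) * x**k * y**(p - k)
        total += inner / factorial(p)
    return total


# Assumed test function: f(x, y) = exp(x + y); every partial derivative at the origin is 1.
all_ones = lambda k, m: 1.0
x, y = 0.1, 0.2
errors = [abs(exp(x + y) - taylor_2d(all_ones, x, y, n)) for n in range(1, 5)]
```

For this choice, $t_n(x, y)$ reduces to $\sum_{p=0}^{n} (x+y)^p/p!$, so the entries of `errors` should shrink rapidly as $n$ grows.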
◦ Comment. As mentioned at the beginning of this subsection, the extension to the case in which we want an expansion around $x_0 \neq 0$ and $y_0 \neq 0$ is obtained simply by shifting the origin, namely by replacing (in Eq. 18.93) $x$ and $y$ with $x - x_0$ and $y - y_0$, respectively. This yields
\[
t_n(x, y) = \sum_{p=0}^{n} \frac{1}{p!} \sum_{h=0}^{p} \binom{p}{h} \left.\frac{\partial^p f}{\partial x^h\,\partial y^{p-h}}\right|_0 (x - x_0)^h\, (y - y_0)^{p-h},
\tag{18.94}
\]
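where now the notation $|_0$ denotes evaluation at $x = x_0$ and $y = y_0$.

• Taylor polynomials of degree two
For $n = 2$, Eq. 18.91 yields
\[
f(x, y) = f_0 + \left.\frac{\partial f}{\partial x}\right|_0 x + \left.\frac{\partial f}{\partial y}\right|_0 y
+ \frac{1}{2} \left( \left.\frac{\partial^2 f}{\partial x^2}\right|_0 x^2 + 2 \left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_0 x\, y + \left.\frac{\partial^2 f}{\partial y^2}\right|_0 y^2 \right) + o[h^2],
\tag{18.95}
\]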
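which may be written as
\[
f(x, y) = t_2(x, y) + o[h^2],
\tag{18.96}
\]
with
\[
t_2(\mathbf{x}) = f_0 + \mathbf{g}_0^T\, \mathbf{x} + \frac{1}{2}\, \mathbf{x}^T \mathbf{H}_0\, \mathbf{x},
\tag{18.97}
\]
where $\mathbf{x} = [x, y]^T$, whereas
\[
\mathbf{g} := \nabla f(x, y) = \begin{Bmatrix} \partial f/\partial x \\ \partial f/\partial y \end{Bmatrix}
\tag{18.98}
\]
is the (mathematicians’) gradient of $f(x, y)$ (Eq. 18.85). Moreover, the matrix $\mathbf{H}(x, y)$, given by
\[
\mathbf{H}(x, y) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[2ex]
\dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2}
\end{bmatrix},
\tag{18.99}
\]
is known as the Hessian matrix of $f(x, y)$.⁵

⁵ Named after the German mathematician Ludwig Otto Hesse (1811–1874).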
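◦ Comment. A very simple generalization of the above results is obtained by expanding the function around the point $\mathbf{x}_0$, instead of the origin. In this case, we have
\[
f(\mathbf{x}) = f_0 + \mathbf{g}_0^T (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}\, (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_0\, (\mathbf{x} - \mathbf{x}_0) + o\!\left[ \|\mathbf{x} - \mathbf{x}_0\|^2 \right],
\tag{18.100}
\]
where now the subscript 0 denotes evaluation at $\mathbf{x} = \mathbf{x}_0$.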
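Equation 18.100 can be tried out numerically. The sketch below is only an illustration under stated assumptions: the function $f = \sin x \cos y$, the expansion point, and the helper names `t2` and `error` are hypothetical, and the gradient and Hessian are entered by hand from the analytic derivatives of this particular $f$.

```python
from math import cos, sin


def t2(f0, g0, H0, x, x0):
    """Second-degree Taylor polynomial of Eq. 18.100 (remainder omitted)."""
    d = [xi - x0i for xi, x0i in zip(x, x0)]
    linear = sum(gi * di for gi, di in zip(g0, d))
    quad = sum(d[i] * H0[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return f0 + linear + 0.5 * quad


# Assumed example: f(x, y) = sin(x) cos(y), expanded around x0 = (0.5, 0.3).
f = lambda x, y: sin(x) * cos(y)
a, b = 0.5, 0.3
f0 = f(a, b)
g0 = [cos(a) * cos(b), -sin(a) * sin(b)]          # gradient, Eq. 18.98
H0 = [[-sin(a) * cos(b), -cos(a) * sin(b)],       # Hessian, Eq. 18.99
      [-cos(a) * sin(b), -sin(a) * cos(b)]]


def error(h):
    """Distance between f and t2 at the point (a + h, b + h)."""
    return abs(f(a + h, b + h) - t2(f0, g0, H0, (a + h, b + h), (a, b)))
```

Since the neglected terms start at third order, halving `h` should reduce `error(h)` by a factor close to 8, in line with the remainder $o[\|\mathbf{x} - \mathbf{x}_0\|^2]$.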
18.4.2 Taylor polynomials for n > 2 variables
♣
It should be noted that the above process (expanding first in $x$, then in $y$) may be repeated in the case of $n$ variables $x_1, x_2, \dots, x_n$. The result is that Eq. 18.97 is still valid with $\mathbf{x} = \{x_k\} = [x_1, \dots, x_n]^T$, and
\[
\mathbf{g} := \nabla f(x_1, \dots, x_n) = \left\{ \frac{\partial f}{\partial x_k} \right\}
\qquad\text{and}\qquad
\mathbf{H} = [h_{hk}] := \left[ \frac{\partial^2 f}{\partial x_h\,\partial x_k} \right],
\tag{18.101}
\]
as you may verify. Even better, you may continue the procedure used in obtaining the first equality in Eq. 18.93 and show that, for a function $f = f(x, y, z)$, we have $f(x, y, z) = t_n(x, y, z) + o[h^n]$, with
\[
t_n(x, y, z) = \sum_{p=0}^{n}\; \sum_{h+j+k=p} \left.\frac{\partial^p f}{\partial x^h\,\partial y^j\,\partial z^k}\right|_0 \frac{x^h\, y^j\, z^k}{h!\, j!\, k!},
\tag{18.102}
\]
where the summation symbol $\sum_{h+j+k=p}$ is used to indicate that we include all the values of $h$, $j$, $k$ such that $h + j + k = p$.
18.5 Maxima and minima for functions of n variables
In this section, we extend to n variables the conditions for maxima and minima introduced in Subsection 9.5.1 and in Section 13.6 for a function of one variable. [Again, for the sake of simplicity, we present only the conditions for minimum, since those for maximum may be obtained simply by changing the sign of the function.] We have the following
Definition 302 (Local minimum of a function). A function $f(\mathbf{x})$ is said to attain a local minimum at $\mathbf{x} = \mathbf{x}_0$ iff there exists an $\varepsilon > 0$, arbitrarily small, such that $f(\mathbf{x}) - f(\mathbf{x}_0) > 0$ for any $\mathbf{x} \neq \mathbf{x}_0$ with $\|\mathbf{x} - \mathbf{x}_0\| < \varepsilon$.

We have the following

Theorem 191. Consider a function $f(\mathbf{x})$ that is twice differentiable and assume that $\mathbf{H}_0 := \mathbf{H}(\mathbf{x}_0) \neq \mathbf{O}$. Then, the necessary and sufficient conditions for $f(\mathbf{x})$ to attain a local minimum at $\mathbf{x}_0$ are that $\mathbf{g}(\mathbf{x}_0)$ vanishes and that $\mathbf{H}_0$ is positive definite.

◦ Proof : For simplicity, assume that $\mathbf{x}_0 = \mathbf{0}$. [If not, set $\mathbf{x} - \mathbf{x}_0 = \mathbf{y}$.] Consider first the necessary condition. Assume that $f(\mathbf{x})$ attains a minimum at $\mathbf{x} = \mathbf{0}$. Then, consider Eq. 18.97, which as stated above is also valid in $n$ dimensions, with $\mathbf{g}$ and $\mathbf{H}$ given by Eq. 18.101. Let us begin by showing that $\mathbf{g}_0 = \mathbf{0}$. Let us consider $\mathbf{x}^T = [x_1, 0, \dots, 0]$. Then, the function $f(\mathbf{x})$ becomes a function of a single variable, whose Taylor expansion is
\[
f(x_1, 0, \dots, 0) = f_0 + g_{01}\, x_1 + \frac{1}{2}\, h^0_{11}\, x_1^2 + o[x_1^2].
\tag{18.103}
\]
According to Theorem 99, p. 384, we have that $g_{01} = 0$ is a necessary condition for the function to have a local minimum in $x_1 = 0$. Similarly, we prove that $g_{0k} = 0$ ($k = 2, \dots, n$), so that $\mathbf{g}_0 = \mathbf{0}$. Thus, we are left with $f(\mathbf{x}) = f_0 + \frac{1}{2}\mathbf{x}^T \mathbf{H}_0\, \mathbf{x} + o[\|\mathbf{x}\|^2]$. The definition of minimum requires that $f(\mathbf{x}) - f(\mathbf{0}) = \frac{1}{2}\mathbf{x}^T \mathbf{H}_0\, \mathbf{x} + o[\|\mathbf{x}\|^2] > 0$, with $\|\mathbf{x}\|$ suitably small. This requires that $\mathbf{H}_0$ be positive definite (Definition 70, p. 123), as you may verify. [Hint: Note that in the condition $f(\mathbf{x}) - f(\mathbf{0}) = \frac{1}{2}\mathbf{x}^T \mathbf{H}_0\, \mathbf{x} + o[\|\mathbf{x}\|^2] > 0$, the term $o[\|\mathbf{x}\|^2]$ may be made smaller than $|\frac{1}{2}\mathbf{x}^T \mathbf{H}_0\, \mathbf{x}|$, by choosing $\|\mathbf{x}\|$ suitably small (see Subsection 8.3.3, on the “big O” and “little o” notation, in particular Eq. 8.79, and the comment that follows).] This completes the proof of the necessary condition.

Next, let us address the sufficient condition. If $\mathbf{g}_0 = \mathbf{0}$ and $\mathbf{H}_0$ is positive definite, then for $\|\mathbf{x}\|$ suitably small we have $f(\mathbf{x}) - f(\mathbf{0}) = \frac{1}{2}\mathbf{x}^T \mathbf{H}_0\, \mathbf{x} + o[\|\mathbf{x}\|^2] > 0$, which proves that the above condition is sufficient as well.
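Once $\mathbf{g}_0$ and $\mathbf{H}_0$ are known, the condition of Theorem 191 can be checked numerically. The sketch below tests positive definiteness by attempting a Cholesky factorization. This is a standard test that the text itself does not discuss: for a symmetric matrix, a factorization $\mathbf{H} = \mathbf{L}\mathbf{L}^T$ with positive diagonal exists if and only if the matrix is positive definite. The function name and the two sample Hessians are hypothetical.

```python
def is_positive_definite(H):
    """Return True if the symmetric matrix H admits a Cholesky factorization."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:          # a non-positive pivot rules out positive definiteness
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


# Assumed examples, both with vanishing gradient at the origin:
# f = x^2 + x y + y^2 has H0 = [[2, 1], [1, 2]] (local minimum);
# f = x^2 - y^2     has H0 = [[2, 0], [0, -2]] (saddle point).
bowl = is_positive_definite([[2.0, 1.0], [1.0, 2.0]])
saddle = is_positive_definite([[2.0, 0.0], [0.0, -2.0]])
```

With these inputs, `bowl` is `True` and `saddle` is `False`, matching the fact that only the first function attains a local minimum at the origin.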
18.5.1 Constrained problems. Lagrange multipliers
♣
Here, we address the problem in the presence of constraints. Specifically, consider the problem
\[
f(x_k) = \underset{x_h}{\text{stationary}} \qquad (h = 1, \dots, n),
\tag{18.104}
\]
with equality constraints
\[
g_r(x_k) = 0 \qquad (r = 1, \dots, n_C),
\tag{18.105}
\]
with $n_C < n$. To address this problem, let us divide the variables $x_h$ into two groups, say $y_h = x_h$ ($h = 1, \dots, n_F$, with $n_F = n - n_C$) and $z_j = x_{j+n_F}$ ($j = 1, \dots, n_C$), and assume (only for the time being) that the constraints in Eq. 18.105 may be used to express the variables $z_j$ explicitly in terms of the variables $y_k$, as
\[
z_j = z_j(y_k).
\tag{18.106}
\]
Then, the problem may be formulated as an unconstrained problem:
\[
f[y_k, z_j(y_k)] = \underset{y_h}{\text{stationary}} \qquad (h = 1, \dots, n_F).
\tag{18.107}
\]
The solution to this problem is obtained by imposing the condition
\[
\frac{\partial f}{\partial y_h} + \sum_{j=1}^{n_C} \frac{\partial f}{\partial z_j}\, \frac{\partial z_j}{\partial y_h} = 0 \qquad (h = 1, \dots, n_F).
\tag{18.108}
\]
[Hint: Use Theorem 191, p. 733.] In matrix notation, we have
\[
\mathbf{f}_y + \mathbf{Z}\, \mathbf{f}_z = \mathbf{0},
\tag{18.109}
\]
where $\mathbf{f}_y = \{\partial f/\partial y_h\}$, $\mathbf{f}_z = \{\partial f/\partial z_j\}$, and $\mathbf{Z} = [z_{hj}] = [\partial z_j/\partial y_h]$.

To improve the generality of the result, let us remove the assumption that the variables $z_j$ may be expressed explicitly in terms of the variables $y_h$. In other words, let the constraints be given in implicit form, as in Eq. 18.105, i.e.,
\[
g_r(y_k, z_j) = 0 \qquad (r = 1, \dots, n_C).
\tag{18.110}
\]
Note that the only difference with respect to Eq. 18.108 is that $\partial z_j/\partial y_h$ cannot be obtained from Eq. 18.106, because in Eq. 18.110 the functions $z_j = z_j(y_h)$ are only defined implicitly. This, however, is not going to prevent us from dealing with the problem. For, note that Eq. 18.110 yields
\[
\frac{\partial g_r}{\partial y_h} + \sum_{j=1}^{n_C} \frac{\partial g_r}{\partial z_j}\, \frac{\partial z_j}{\partial y_h} = 0,
\tag{18.111}
\]
or, in matrix notation,
\[
\mathbf{G}_y + \mathbf{Z}\, \mathbf{G}_z = \mathbf{O},
\tag{18.112}
\]
with $\mathbf{G}_y = [g^{(y)}_{hr}] = [\partial g_r/\partial y_h]$, $\mathbf{G}_z = [g^{(z)}_{jr}] = [\partial g_r/\partial z_j]$, $\mathbf{Z} = [z_{hj}] = [\partial z_j/\partial y_h]$. Next, assume $\mathbf{G}_z$ to be invertible. Then, right–multiplying Eq. 18.112 by $\mathbf{G}_z^{-1}$, we obtain the desired expression for $\mathbf{Z}$, namely (use $\mathbf{A}\, \mathbf{A}^{-1} = \mathbf{I}$, Eq. 16.22)
\[
\mathbf{Z} = -\mathbf{G}_y\, \mathbf{G}_z^{-1}.
\tag{18.113}
\]
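Combining Eqs. 18.109 and 18.113, one obtains $\mathbf{f}_y - \mathbf{G}_y\, \mathbf{G}_z^{-1}\, \mathbf{f}_z = \mathbf{0}$, or
\[
\mathbf{f}_y + \mathbf{G}_y\, \boldsymbol{\lambda} = \mathbf{0},
\tag{18.114}
\]
where we have introduced the vector $\boldsymbol{\lambda} = \{\lambda_r\} := -\mathbf{G}_z^{-1}\, \mathbf{f}_z$, namely
\[
\mathbf{f}_z + \mathbf{G}_z\, \boldsymbol{\lambda} = \mathbf{0}.
\tag{18.115}
\]
Finally, recalling that the vector $\mathbf{x}$ was partitioned into the vectors $\mathbf{y}$ and $\mathbf{z}$, Eqs. 18.114 and 18.115 may be combined to yield
\[
\nabla f + \mathbf{G}\, \boldsymbol{\lambda} = \mathbf{0},
\tag{18.116}
\]
with $\nabla f = \{\partial f/\partial x_k\} = [\mathbf{f}_y^T, \mathbf{f}_z^T]^T$ and $\mathbf{G} = [G_{kr}] = [\partial g_r/\partial x_k] = [\mathbf{G}_y^T, \mathbf{G}_z^T]^T$. Equation 18.116 may be written in a more expressive way as
\[
\frac{\partial f}{\partial x_k} + \boldsymbol{\lambda}^T \frac{\partial \mathbf{g}}{\partial x_k} = 0 \qquad (k = 1, \dots, n).
\tag{18.117}
\]
Interestingly, we have obtained that the solution of $f(\mathbf{x}) = \text{stationary}$ (with respect to $\mathbf{x}$), subject to the constraint $\mathbf{g}(\mathbf{x}) = \mathbf{0}$, is the same as that of
\[
J(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\lambda}^T \mathbf{g}(\mathbf{x}) = \underset{\mathbf{x},\, \boldsymbol{\lambda}}{\text{stationary}}.
\tag{18.118}
\]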
[Indeed, this yields ∂J/∂xh = ∂f /∂xh + λT ∂g/∂xh = 0 (in agreement with Eq. 18.117), and ∂J/∂λr = gr (x) = 0 (namely the constraints).] The components λr of λ are known as the Lagrange multipliers, and the procedure outlined above is known as the method of Lagrange multipliers (after Joseph–Louis Lagrange).
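As a small worked illustration of the method (the function and the constraint are hypothetical, chosen only for ease of computation), take $f = x^2 + y^2$ with the single constraint $g = x + y - 1 = 0$:

```latex
\begin{aligned}
J(x, y, \lambda) &= x^2 + y^2 + \lambda\,(x + y - 1), \\
\frac{\partial J}{\partial x} &= 2x + \lambda = 0, \qquad
\frac{\partial J}{\partial y} = 2y + \lambda = 0, \qquad
\frac{\partial J}{\partial \lambda} = x + y - 1 = 0, \\
\Longrightarrow \quad x &= y = \tfrac{1}{2}, \qquad \lambda = -1.
\end{aligned}
```

The same point is obtained by eliminating $y = 1 - x$ explicitly: $f = x^2 + (1 - x)^2$ is stationary for $4x - 2 = 0$, namely for $x = \frac{1}{2}$.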
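• The case with f(x) quadratic and g(x) linear
Here, we consider the case in which the function $f(\mathbf{x})$ is quadratic and the functions $g_k(\mathbf{x})$ are linear, namely the problem
\[
f(\mathbf{x}) = \frac{1}{2}\, \mathbf{x}^T \mathbf{A}\, \mathbf{x} - \mathbf{b}^T \mathbf{x} = \underset{\mathbf{x}}{\text{stationary}},
\qquad\text{with}\qquad
\mathbf{g}(\mathbf{x}) = \mathbf{C}\, \mathbf{x} - \mathbf{d} = \mathbf{0}.
\tag{18.119}
\]
The method of Lagrange multipliers consists in replacing the above problem with the following equivalent one
\[
J(\mathbf{x}, \boldsymbol{\lambda}) = \frac{1}{2}\, \mathbf{x}^T \mathbf{A}\, \mathbf{x} - \mathbf{b}^T \mathbf{x} + \boldsymbol{\lambda}^T (\mathbf{C}\, \mathbf{x} - \mathbf{d}) = \underset{\mathbf{x},\, \boldsymbol{\lambda}}{\text{stationary}}.
\tag{18.120}
\]
Using Theorem 191, p. 733, the solution is obtained by imposing
\[
\mathbf{A}\, \mathbf{x} - \mathbf{b} + \mathbf{C}^T \boldsymbol{\lambda} = \mathbf{0},
\qquad
\mathbf{C}\, \mathbf{x} - \mathbf{d} = \mathbf{0}.
\tag{18.121}
\]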
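The two conditions in Eq. 18.121 form a single linear system in the unknowns $\mathbf{x}$ and $\boldsymbol{\lambda}$. A minimal Python sketch of assembling and solving it follows. The helper names `solve` and `constrained_stationary_point`, the use of plain Gaussian elimination with partial pivoting, and the numerical data are all assumptions made for illustration, and $\mathbf{A}$ is taken to be symmetric.

```python
def solve(M, rhs):
    """Solve M u = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (aug[r][n] - s) / aug[r][r]
    return u


def constrained_stationary_point(A, b, C, d):
    """Assemble and solve Eq. 18.121: A x - b + C^T lam = 0 and C x - d = 0."""
    n, m = len(A), len(C)
    K = [A[i] + [C[r][i] for r in range(m)] for i in range(n)]   # rows [A, C^T]
    K += [C[r] + [0.0] * m for r in range(m)]                    # rows [C, O]
    sol = solve(K, b + d)
    return sol[:n], sol[n:]


# Assumed data: f = (x^2 + y^2)/2 - x - 2 y, with the constraint x + y = 1.
x, lam = constrained_stationary_point(
    A=[[1.0, 0.0], [0.0, 1.0]], b=[1.0, 2.0], C=[[1.0, 1.0]], d=[1.0])
```

For these data the conditions read $x - 1 + \lambda = 0$, $y - 2 + \lambda = 0$, $x + y = 1$, whose solution is $x = 0$, $y = 1$, $\lambda = 1$.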
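◦ Comment. The above equations may be recast as $\hat{\mathbf{A}}\, \hat{\mathbf{x}} = \hat{\mathbf{b}}$, with
\[
\hat{\mathbf{A}} = \begin{bmatrix} \mathbf{A} & \mathbf{C}^T \\ \mathbf{C} & \mathbf{O} \end{bmatrix},
\qquad
\hat{\mathbf{x}} = \begin{Bmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{Bmatrix},
\qquad
\hat{\mathbf{b}} = \begin{Bmatrix} \mathbf{b} \\ \mathbf{d} \end{Bmatrix}.
\tag{18.122}
\]
[You may verify that $\hat{\mathbf{A}}\, \hat{\mathbf{x}} = \hat{\mathbf{b}}$ is equivalent to $\frac{1}{2}\, \hat{\mathbf{x}}^T \hat{\mathbf{A}}\, \hat{\mathbf{x}} - \hat{\mathbf{x}}^T \hat{\mathbf{b}} = \text{stationary}$ with respect to $\hat{\mathbf{x}}$.]

◦ Comment. Note that $\hat{\mathbf{A}}$ is not necessarily positive definite, even when $\mathbf{A}$ is. For instance, consider the 2 × 2 matrix $\mathbf{A} = \mathbf{I}$, with $\mathbf{C} = [c_1, c_2]$. Then,
\[
\hat{\mathbf{A}} = \begin{bmatrix} 1 & 0 & c_1 \\ 0 & 1 & c_2 \\ c_1 & c_2 & 0 \end{bmatrix}.
\tag{18.123}
\]
I claim that $\hat{\mathbf{A}}$ is not positive definite (Eq. 3.120), namely that there exists some $\mathbf{y} \neq \mathbf{0}$ with $\mathbf{y}^T \hat{\mathbf{A}}\, \mathbf{y} < 0$. To this end, note that for $\mathbf{y} = [1, 1, \alpha]^T$, we have
\[
\mathbf{y}^T \hat{\mathbf{A}}\, \mathbf{y} = 2 + 2\, \alpha\, (c_1 + c_2),
\tag{18.124}
\]
as you may verify. The right side can be positive or negative, depending upon the (arbitrary) value of $\alpha$, whatever the (nonzero) value of $c_1 + c_2$. This proves that, in this case, $\hat{\mathbf{A}}$ is not positive definite, even though $\mathbf{A}$ is. [These considerations indicate that some powerful numerical techniques for solving positive definite linear algebraic systems (such as the conjugate–gradient method addressed in Vol. II) cannot be used in this case.]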
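Equation 18.124 is easy to confirm numerically. In the sketch below, the values $c_1 = 1$ and $c_2 = 2$ and the helper names are hypothetical. Any pair with $c_1 + c_2 \neq 0$ would serve.

```python
def quadratic_form(M, y):
    """Return y^T M y for a square matrix M given as a list of rows."""
    return sum(y[i] * M[i][j] * y[j] for i in range(len(y)) for j in range(len(y)))


c1, c2 = 1.0, 2.0  # assumed constraint coefficients, with c1 + c2 != 0
A_hat = [[1.0, 0.0, c1],
         [0.0, 1.0, c2],
         [c1, c2, 0.0]]


def form_for(alpha):
    """Evaluate y^T A_hat y for y = [1, 1, alpha]^T, as in Eq. 18.124."""
    return quadratic_form(A_hat, [1.0, 1.0, alpha])


positive_case = form_for(1.0)    # 2 + 2 * 1 * 3
negative_case = form_for(-1.0)   # 2 - 2 * 1 * 3
```

The two values have opposite signs, so the quadratic form associated with $\hat{\mathbf{A}}$ is indefinite for this choice of $c_1$ and $c_2$.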
Chapter 19
Multivariate integral calculus
In the preceding chapter, we covered multivariate differential calculus. To tackle particle dynamics in space in the next chapter, we also need some background material on multivariate integral calculus. Accordingly, in this chapter, we extend to two or three dimensions the definition and properties of one–dimensional integrals introduced in Chapter 10.
• Overview of this chapter
The first five sections are limited to two-dimensional spaces. Specifically, in Section 19.1, we introduce line integrals, namely integrals over a line on a plane. Then, in Section 19.2, we show that under special conditions, a line integral can be path–independent, namely that its value depends upon the endpoints of the line, but not on the line itself. Next, we address double integrals, namely integrals over a planar region (Section 19.3), and show how to evaluate them by iterated one–dimensional integrals (Section 19.4). At this point, we are in a position to introduce one of the most useful theorems, namely the Green theorem, which allows us to see that, under certain restrictive conditions, a double integral over a two-dimensional region R equals a line integral over the contour of R (Section 19.5). The last two sections deal with three–dimensional problems. Specifically, in Section 19.6, we first introduce the three types of integrals in space, namely line integrals, surface integrals, and volume integrals. Then, in Section 19.7, we discuss the time derivative of a volume integral. We also have an appendix (Section 19.8), where we show how to perform double and triple integrals in cylindrical and spherical coordinates, and in curvilinear coordinates in general.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_19
19.1 Line integrals in two dimensions
As stated above, this section and the following four are limited to $\mathbb{R}^2$. Accordingly, a point $P = (x, y) \in \mathbb{R}^2$ is identified by the vector $\mathbf{x} = x\, \mathbf{i} + y\, \mathbf{j}$. Here, we introduce line integrals, namely integrals over a line $\mathcal{L} \in \mathbb{R}^2$. Specifically, consider a continuous function of two variables, $f(x, y)$, defined in Eq. 8.116. Again, it is convenient to conceive of this as a scalar field (function of a point), namely $f = f(\mathbf{x})$ (Eq. 18.41). Next, consider a smooth line $\mathcal{L} \in \mathbb{R}^2$ that is defined in parametric form by $\mathbf{x} = \mathbf{x}(s)$ (Eq. 18.15), where $s$ is the arclength along $\mathcal{L}$, so that $ds = \|d\mathbf{x}\|$ (Eq. 18.12). Combining these two functions, we obtain the composite function
\[
f = f[\mathbf{x}(s)] =: \hat{f}(s).
\tag{19.1}
\]
Note that $\hat{f}(s)$ is a function of a single variable and its integral from $s = a$ to $s = b$ is simply a one–dimensional integral, namely an integral of the type introduced in Eq. 10.4. Hence, we divide the line into $n$ elements and have
\[
G_{a,b} = \int_a^b \hat{f}(s)\, ds = \int_a^b f[\mathbf{x}(s)]\, ds := \lim_{\Delta s_k \to 0} \sum_{k=1}^{n} \hat{f}(s_k)\, \Delta s_k,
\tag{19.2}
\]
where $s_k$ is at the $k$-th element center, $\Delta s_k$ being its length. Alternatively, we write
\[
G_{a,b} = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds
\tag{19.3}
\]
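and refer to it as the line integral of the function $f(\mathbf{x})$ over $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)$, namely over the line $\mathcal{L}$ defined by $\mathbf{x} = \mathbf{x}(s)$, from the point $\mathbf{x}_a = \mathbf{x}(a)$ to $\mathbf{x}_b = \mathbf{x}(b)$. In view of the fact that line integrals have been expressed as one–dimensional integrals, we have (use Eqs. 10.8)
\[
\int_{\mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds + \int_{\mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_c)} f(\mathbf{x})\, ds = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_c)} f(\mathbf{x})\, ds,
\tag{19.4}
\]
where $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_c) = \mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b) \cup \mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_c)$, with the symbol ∪ denoting union (Definition 165, p. 347). As a consequence, we have (in analogy with Eq. 10.10)
\[
\int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds = -\int_{\mathcal{L}(\mathbf{x}_b, \mathbf{x}_a)} f(\mathbf{x})\, ds.
\tag{19.5}
\]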
◦ Comment. Using Eq. 19.4, we can extend the definition of line integral to piecewise–smooth lines and piecewise–continuous functions.
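The sum in Eq. 19.2 can be coded in a few lines. The following is a minimal sketch: the function name `line_integral_scalar`, the use of equal-length elements, and the sample line and field are assumptions made for illustration only.

```python
def line_integral_scalar(f, curve, a, b, n=1000):
    """Midpoint sum of Eq. 19.2: f[x(s_k)] * ds_k summed over n equal elements.

    curve(s) must return the point (x, y) at arclength s.
    """
    ds = (b - a) / n
    total = 0.0
    for k in range(n):
        s_mid = a + (k + 0.5) * ds     # centre of the k-th element
        x, y = curve(s_mid)
        total += f(x, y) * ds
    return total


# Assumed data: the straight segment from (0, 0) to (3, 4), of length 5,
# parametrized by its arclength, and the field f(x, y) = x + y.
segment = lambda s: (0.6 * s, 0.8 * s)
G = line_integral_scalar(lambda x, y: x + y, segment, 0.0, 5.0)
G_reversed = line_integral_scalar(lambda x, y: x + y, segment, 5.0, 0.0)
```

Along this segment $\hat{f}(s) = 1.4\, s$, so the exact value is $1.4 \cdot 25/2 = 17.5$. Interchanging the limits changes the sign of every $\Delta s_k$, which reproduces Eq. 19.5.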
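• Some special cases
Of particular interest is the case in which $f(\mathbf{x})$ is the projection of a vector field $\mathbf{v}(\mathbf{x})$ along the tangent $\mathbf{t}(s) = d\mathbf{x}/ds$ to the line $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)$ (Eq. 18.17), namely
\[
f(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \cdot \mathbf{t}(\mathbf{x}) = \mathbf{v} \cdot \frac{d\mathbf{x}}{ds}.
\tag{19.6}
\]
In this case, Eq. 19.3 takes the form
\[
G_{a,b} = \int_a^b \mathbf{v}[\mathbf{x}(s)] \cdot \frac{d\mathbf{x}}{ds}\, ds = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x},
\tag{19.7}
\]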
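where $\mathbf{x}_a = \mathbf{x}(a)$ and $\mathbf{x}_b = \mathbf{x}(b)$. The integral in Eq. 19.7 is called the line integral from $\mathbf{x}_a$ to $\mathbf{x}_b$ of the vector field $\mathbf{v}(\mathbf{x})$. Note that, in analogy with Eq. 19.2, Eq. 19.7 implies
\[
G_{a,b} = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} := \lim_{\Delta \mathbf{x}_k \to 0} \sum_{k=1}^{n} \mathbf{v}(\mathbf{x}_k) \cdot \Delta \mathbf{x}_k.
\tag{19.8}
\]
In terms of components, Eq. 19.8 reads
\[
\begin{aligned}
G_{a,b} &= \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} \left[ v_x(x, y)\, dx + v_y(x, y)\, dy \right] \\
&= \lim_{\Delta \mathbf{x}_k \to 0} \sum_{k=1}^{n} \left[ v_x(x_k, y_k)\, \Delta x_k + v_y(x_k, y_k)\, \Delta y_k \right].
\end{aligned}
\tag{19.9}
\]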
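Other line integrals of interest are obtained when $f(\mathbf{x})$ is the Cartesian component of a vector, namely when $f(\mathbf{x}) = w_h(\mathbf{x})$. Then, in vector form we have, in analogy with Eqs. 19.2 and 19.3,
\[
\mathbf{g}_{a,b} = \int_a^b \mathbf{w}[\mathbf{x}(s)]\, ds = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{w}(\mathbf{x})\, ds.
\tag{19.10}
\]
In particular, if $\mathbf{w}(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \times \mathbf{t}(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \times d\mathbf{x}/ds$, we have another line integral of interest
\[
\mathbf{g}_{a,b} = \int_a^b \mathbf{v}[\mathbf{x}(s)] \times \frac{d\mathbf{x}}{ds}\, ds = \int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{v}(\mathbf{x}) \times d\mathbf{x}.
\tag{19.11}
\]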
19.2 Path–independent line integrals
Here, we up the ante and turn our attention to special line integrals, called path–independent integrals, namely integrals that depend upon the endpoints, say $\mathbf{x}_a$ and $\mathbf{x}_b$, but not upon the line that connects them. For the sake of clarity (and in order to make our life simpler), it is convenient to discuss path–independent integrals separately for simply connected (Definition 175, p. 352) and multiply connected regions (Definition 176, p. 352). These are addressed in Subsections 19.2.1 and 19.2.2, respectively.
19.2.1 Simply connected regions
Here, we deal with simply connected regions. We have the following

Definition 303 (Path–independent line integral). Let $\mathcal{R}$ denote an open simply connected region and $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)$ a line in $\mathcal{R}$, described by $\mathbf{x} = \mathbf{x}(s)$ and connecting two points, say $\mathbf{x}_a = \mathbf{x}(a)$ and $\mathbf{x}_b = \mathbf{x}(b)$ in $\mathcal{R}$. A line integral is called path–independent, iff it is independent of the specific $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)$ and depends only upon the endpoints of the line, $\mathbf{x}_a$ and $\mathbf{x}_b$.

Next, we want to introduce a condition for a line integral of $f(\mathbf{x})$ to be path–independent. To this end, let us introduce the following (recall that a contour is a finite–length piecewise–smooth non–self–intersecting line that closes on itself, Definition 167, p. 347)

Definition 304 (Contour integral. Circulation). Let $\mathcal{R}$ denote an open simply connected region. A line integral over a contour $\mathcal{C}$ in $\mathcal{R}$, covered counterclockwise, is called a contour integral, and is denoted by
\[
I_{\mathcal{C}} = \oint_{\mathcal{C}} f(\mathbf{x})\, ds.
\tag{19.12}
\]
[The integral symbol indicates that $\mathcal{C}$ is covered counterclockwise.] Iff the integrand is of the type in Eq. 19.7, we use the term circulation, and we denote it by
\[
\Gamma := \oint_{\mathcal{C}} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x}.
\tag{19.13}
\]
[The term circulation and the symbol Γ to indicate these types of integrals are borrowed from fluid dynamics.] We have the following
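Theorem 192. Let $\mathcal{R}$ denote an open simply connected region. Consider a function $f(\mathbf{x})$, defined on $\mathcal{R}$, and a piecewise–smooth line $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)$ in $\mathcal{R}$, described by $\mathbf{x} = \mathbf{x}(s)$ and connecting two arbitrary points $\mathbf{x}_a = \mathbf{x}(a)$ and $\mathbf{x}_b = \mathbf{x}(b)$ in $\mathcal{R}$. A necessary and sufficient condition for the line integral $\int_{\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds$ to be path–independent for any $\mathcal{L}(\mathbf{x}_a, \mathbf{x}_b) \in \mathcal{R}$ is that the corresponding contour integral over any arbitrary piecewise–smooth contour $\mathcal{C}$ in $\mathcal{R}$ vanishes, namely
\[
\oint_{\mathcal{C}} f(\mathbf{x})\, ds = 0, \qquad \text{for all piecewise–smooth } \mathcal{C} \text{ in } \mathcal{R}.
\tag{19.14}
\]
◦ Proof : Indeed, assume Eq. 19.14 to be valid. Then, any two lines $\mathcal{L}_1$ and $\mathcal{L}_2$, both in $\mathcal{R}$, that connect $\mathbf{x}_a$ to $\mathbf{x}_b$ may be combined to form a contour $\mathcal{C} := \mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b) \cup \mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_a)$ in $\mathcal{R}$. [Note that if $\mathcal{L}_1$ goes from $\mathbf{x}_a$ to $\mathbf{x}_b$, then $\mathcal{L}_2$ must go from $\mathbf{x}_b$ to $\mathbf{x}_a$.] Hence, we have
\[
0 = \oint_{\mathcal{C}} f(\mathbf{x})\, ds = \int_{\mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds + \int_{\mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_a)} f(\mathbf{x})\, ds,
\tag{19.15}
\]
or (use Eq. 19.5, on the interchange of the limits of integration)
\[
\int_{\mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds = -\int_{\mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_a)} f(\mathbf{x})\, ds = \int_{\mathcal{L}_2(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds.
\tag{19.16}
\]
Thus, the integral is path–independent (sufficient condition). Vice versa, assume the line integral to be path–independent. Then, we have
\[
\begin{aligned}
\oint_{\mathcal{C}} f(\mathbf{x})\, ds &= \int_{\mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds + \int_{\mathcal{L}_2(\mathbf{x}_b, \mathbf{x}_a)} f(\mathbf{x})\, ds \\
&= \int_{\mathcal{L}_1(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds - \int_{\mathcal{L}_2(\mathbf{x}_a, \mathbf{x}_b)} f(\mathbf{x})\, ds = 0.
\end{aligned}
\tag{19.17}
\]
In other words, Eq. 19.14 is also a necessary condition.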
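Path independence can be explored numerically for integrands of the type in Eq. 19.6, using the sum in Eq. 19.8. The sketch below is illustrative only: the function name `line_integral_vector`, the evaluation of $\mathbf{v}$ at the midpoint of each chord, the two paths, and the two sample fields are all assumptions. The first field is the gradient of $u = x^2 y$, the second one is not a gradient.

```python
def line_integral_vector(v, path, n=2000):
    """Sum of Eq. 19.8: v(x_k) . dx_k, with v evaluated at each chord midpoint.

    path(t) maps t in [0, 1] to a point (x, y); any smooth parametrization works.
    """
    total = 0.0
    x_prev, y_prev = path(0.0)
    for k in range(1, n + 1):
        x_next, y_next = path(k / n)
        vx, vy = v(0.5 * (x_prev + x_next), 0.5 * (y_prev + y_next))
        total += vx * (x_next - x_prev) + vy * (y_next - y_prev)
        x_prev, y_prev = x_next, y_next
    return total


# Two assumed paths, both running from (0, 0) to (1, 1).
straight = lambda t: (t, t)
parabola = lambda t: (t, t * t)

# Assumed gradient field: v = grad(x^2 y) = (2 x y, x^2).
v = lambda x, y: (2.0 * x * y, x * x)
I_straight = line_integral_vector(v, straight)
I_parabola = line_integral_vector(v, parabola)

# Assumed non-gradient field, for contrast: w = (-y, x).
w = lambda x, y: (-y, x)
J_straight = line_integral_vector(w, straight)
J_parabola = line_integral_vector(w, parabola)
```

For the gradient field both paths give (to within the discretization error) $u(1, 1) - u(0, 0) = 1$, whereas for the second field the two paths give 0 and 1/3, so that the contour formed by the two paths carries a nonzero integral.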
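• Contour integrals of grad u, for single–valued u(x)
Recall that $du = \operatorname{grad} u \cdot d\mathbf{x}$ (Eq. 18.45). We have the following

Theorem 193 (Contour integral of grad u, single–valued u(x)). Let $\mathcal{R}$ denote an open simply connected region. Consider any suitably smooth single–valued function $u(\mathbf{x})$ defined on $\mathcal{R}$ and the integral
\[
I_{\mathcal{C}} = \oint_{\mathcal{C}} \operatorname{grad} u(\mathbf{x}) \cdot d\mathbf{x} = \oint_{\mathcal{C}} du,
\tag{19.18}
\]
where $\mathcal{C}$ is a contour in $\mathcal{R}$. We have
\[
I_{\mathcal{C}} = 0.
\tag{19.19}
\]
◦ Proof : Consider an arbitrary point $\mathbf{x}_0$ on $\mathcal{C}$, and an open line $\mathcal{L}(\mathbf{x}_1, \mathbf{x}_2)$ that goes from $\mathbf{x}_1$ to $\mathbf{x}_2$ (without encountering $\mathbf{x}_0$), where $\mathbf{x}_1$ and $\mathbf{x}_2$ are close to $\mathbf{x}_0$ (Fig. 19.1). Next, let $\mathbf{x}_1$ and $\mathbf{x}_2$ tend to $\mathbf{x}_0$. We have
\[
I_{\mathcal{C}} = \lim_{\mathbf{x}_1,\, \mathbf{x}_2 \to \mathbf{x}_0} \int_{\mathcal{L}(\mathbf{x}_1, \mathbf{x}_2)} du = \lim_{\mathbf{x}_1,\, \mathbf{x}_2 \to \mathbf{x}_0} \left[ u(\mathbf{x}_2) - u(\mathbf{x}_1) \right] = 0,
\tag{19.20}
\]
because $u(\mathbf{x})$ is single–valued.

Fig. 19.1 Line integral of grad u

19.2.2 Multiply connected regions
♣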
In this subsection, the region R is assumed to be open, but not simply connected. For the sake of simplicity, we assume that the region R has a single “hole,” namely that R is 2-connected (Definition 176, p. 352). Let C0 and C1 denote, respectively, the outer and inner contours of R (Definition 168, p. 348).
• Winding number
Recall the difference between a closed line and a contour (a non–self–intersecting closed line). We have the following

Definition 305 (Winding number in two dimensions). Consider a 2D oriented closed line $\mathcal{L}_C$ and a point $\mathbf{x}_0$ in $\mathbb{R}^2$. Assume that $2\pi\, n_W$ is the angle swept by the segment connecting $\mathbf{x}_0$ to $\mathbf{x} \in \mathcal{L}_C$, as $\mathbf{x}$ travels along $\mathcal{L}_C$. The number $n_W$ is called the winding number of $\mathcal{L}_C$ around $\mathbf{x}_0$. A positive (negative) winding number corresponds to the total number of revolutions in the counterclockwise (clockwise) direction.

Remark 141. From Definition 167, p. 347, of closed lines and contours, we may infer that a contour around a “forbidden region” $\mathcal{R}_0$ is a closed line with a winding number (around any $\mathbf{x}_0 \in \mathcal{R}_0$) equal to 1 iff it is counterclockwise oriented (and −1 iff it is clockwise oriented). It is apparent that a continuous deformation of a closed line that does not require crossing $\mathbf{x}_0$ does not affect the winding number. In particular, we have that shrinkable contours (Definition 174, p. 351) have a winding number equal to zero. As an illustration, consider the line $\mathcal{L}_C$ (traveled in the orientation shown in Fig. 19.2). Its winding number around $\mathbf{x}_k$ equals $k$, as you may verify. [Hint: Shrink the line $\mathcal{L}_C$ without crossing the point $\mathbf{x}_k$ under consideration.]
Fig. 19.2 Winding number
• Contour integrals and winding number
Recall Definition 176, p. 352, of p-connected regions. We have the following

Theorem 194. Let $\mathcal{R}$ denote an open 2-connected region. Consider a vector field $\mathbf{v}(\mathbf{x})$, defined on $\mathcal{R}$. Assume that
\[
\oint_{\mathcal{C}_S} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = 0,
\tag{19.21}
\]
for any shrinkable contour $\mathcal{C}_S \in \mathcal{R}$. Next, consider any contour $\mathcal{C}$ that surrounds the hole. The value of the contour integral is independent of the contour $\mathcal{C}$, and is denoted by
\[
\Gamma := \oint_{\mathcal{C}} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x},
\tag{19.22}
\]
where again $\mathcal{C}$ is covered in the counterclockwise direction. [Of course, the possibility that Γ = 0 is not to be excluded.] Moreover, for any closed line $\mathcal{L}_C^{(n)}$ with a winding number equal to $n_W$, we have
\[
\oint_{\mathcal{L}_C^{(n)}} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = n_W\, \Gamma.
\tag{19.23}
\]
◦ Proof : Consider first the case $n_W = 1$. In this case, the closed line is a contour and we can set $\mathcal{L}_C^{(1)} = \mathcal{C}$. Then, given two different counterclockwise contours, say $\mathcal{C}_1$ and $\mathcal{C}_2$ (Fig. 19.3.a), we have to show that
\[
\oint_{\mathcal{C}_1} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = \oint_{\mathcal{C}_2} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x}.
\tag{19.24}
\]
Fig. 19.3 Contour integral (multiply connected region)

To this end, let us connect $\mathcal{C}_1$ and $\mathcal{C}_2$ with two lines, say $\mathcal{L}_3$ and $\mathcal{L}_4$, that are close to each other, as in Fig. 19.3.b. Let us denote by $\mathcal{L}_k$ ($k = 1, 2$) the line obtained from $\mathcal{C}_k$ by removing from the light gray region the narrow strip between $\mathcal{L}_3$ and $\mathcal{L}_4$. [In the limit, as the line $\mathcal{L}_3$ tends to $\mathcal{L}_4$, we have that $\mathcal{L}_k$ tends to $\mathcal{C}_k$ ($k = 1, 2$).] Note that the lines $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ and $\mathcal{L}_4$ form a contour $\mathcal{C} := \mathcal{L}_1 \cup \mathcal{L}_2 \cup \mathcal{L}_3 \cup \mathcal{L}_4$ that is shrinkable. Hence,
\[
\oint_{\mathcal{C}} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = 0,
\tag{19.25}
\]
by hypothesis. On the other hand, as $\mathcal{L}_3$ and $\mathcal{L}_4$ become infinitesimally close to each other, the contributions for these two lines offset one another, since they are covered in opposite directions (use Eq. 19.5, on the interchange of the limits of integration). Then, we have
\[
\lim_{\mathcal{L}_3 \to \mathcal{L}_4} \oint_{\mathcal{C}} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = \oint_{\mathcal{C}_1} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} - \oint_{\mathcal{C}_2} \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = 0,
\tag{19.26}
\]
in agreement with Eq. 19.24. [Hint: The minus sign in front of the last integral is due to the fact that we assume $\mathcal{C}$ to be covered counterclockwise, and hence $\mathcal{L}_1$ is covered counterclockwise, whereas $\mathcal{L}_2$ is covered clockwise.] This completes the proof that Γ is independent of $\mathcal{C}_k$ (Eq. 19.22, for $n_W = 1$). To prove Eq. 19.23, it is sufficient to use the fact that $\mathcal{L}_C^{(n)}$ may be deformed into the union of $n$ contours $\mathcal{C}_1$.
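A numerical experiment may help to visualize Eqs. 19.21 to 19.23. The sketch below rests on several assumptions: the field $\mathbf{v} = (-y, x)/(x^2 + y^2)$, which is singular at the origin (playing the role of the hole), the circular closed lines, and the helper names `circulation` and `circle` are all hypothetical choices, not taken from the text.

```python
from math import cos, pi, sin


def circulation(v, closed_line, n=4000):
    """Sum of Eq. 19.8 over a closed line; closed_line(t) has period 1 in t."""
    total = 0.0
    x_prev, y_prev = closed_line(0.0)
    for k in range(1, n + 1):
        x_next, y_next = closed_line(k / n)
        vx, vy = v(0.5 * (x_prev + x_next), 0.5 * (y_prev + y_next))
        total += vx * (x_next - x_prev) + vy * (y_next - y_prev)
        x_prev, y_prev = x_next, y_next
    return total


def v(x, y):
    """Assumed field, singular at the origin: v = (-y, x) / (x^2 + y^2)."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def circle(center_x, turns):
    """Unit circle centred at (center_x, 0), covered `turns` times counterclockwise."""
    return lambda t: (center_x + cos(2 * pi * turns * t), sin(2 * pi * turns * t))


gamma = circulation(v, circle(0.0, 1))        # contour around the hole
gamma_twice = circulation(v, circle(0.0, 2))  # closed line with winding number 2
gamma_away = circulation(v, circle(3.0, 1))   # shrinkable contour, hole left outside
```

For this field one finds $\Gamma \approx 2\pi$, while the closed line that winds twice around the origin gives approximately $2\Gamma$ and the contour that leaves the origin outside gives approximately zero.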
• Contour integrals of grad u
In particular, if $\mathbf{v} = \operatorname{grad} u$, we have the following

Theorem 195 (Integrals of grad u, for 2-connected regions). Let $\mathcal{C}$ denote a closed line in an open 2-connected region $\mathcal{R}$. Consider the multi–valued function $u(\mathbf{x})$ defined on $\mathcal{R}$ by Eq. 19.95. Then,
\[
I_{\mathcal{L}_C^{(n)}} = \oint_{\mathcal{L}_C^{(n)}} \operatorname{grad} u(\mathbf{x}) \cdot d\mathbf{x} = \oint_{\mathcal{L}_C^{(n)}} du = n_W\, \Gamma,
\tag{19.27}
\]
where $n_W$ is the winding number of the closed line $\mathcal{L}_C^{(n)}$ around the hole of the 2-connected region $\mathcal{R}$.

◦ Proof : The theorem is a direct consequence of Theorem 193, p. 741 (on integrals of exact differentials) and of Theorem 194, p. 743.

◦ Important observation. If the function $u(\mathbf{x})$ is single–valued, then Γ = 0, as you may verify. [Hint: Use Theorem 193, p. 741.]

◦ Comment. The formulation for n-connected regions with n > 2 is similar, as you may verify.
19.3 Double integrals
In this section, still limited to 2D problems, we address double integrals, namely integrals over a planar region. We begin with double integrals of continuous functions over rectangular regions (Subsection 19.3.1). Then, in Subsection 19.3.2, we refine the definition of integrals, along the lines of the material on the Darboux integral (Section 10.8), so as to make it applicable to piecewise–continuous functions. This in turn is used to extend the definition to arbitrary regions (Subsection 19.3.3).
19.3.1 Double integrals over rectangular regions
♣
We have the following

Definition 306 (Double integral over a rectangular region). Let $\mathcal{R}$ denote a closed rectangular (and hence bounded) region on the plane $(x, y)$, specifically a region with $x \in [a, b]$ and $y \in [c, d]$. Consider a continuous function $f(x, y)$, defined on $\mathcal{R}$. Let us divide $\mathcal{R}$ into $n \times n$ equal rectangular subregions $\mathcal{R}_{hk}$, with boundaries given by coordinate lines. Accordingly, any point $P = (x, y)$ in $\mathcal{R}_{hk}$ satisfies the following conditions: (i) $x \in [x_{h-1}, x_h]$, where $x_h = a + h\, \Delta x$ ($h = 0, \dots, n$), with $\Delta x = (b - a)/n$, and (ii) $y \in [y_{k-1}, y_k]$, where $y_k = c + k\, \Delta y$ ($k = 0, \dots, n$), with $\Delta y = (d - c)/n$ (Fig. 19.4). Let $\breve{f}_{hk}$ denote the value of $f(x, y)$ at the midpoint of the rectangular subregion $\mathcal{R}_{hk}$, namely $\breve{f}_{hk} = f(\breve{x}_h, \breve{y}_k)$, where $\breve{x}_h = \frac{1}{2}(x_{h-1} + x_h)$ and $\breve{y}_k = \frac{1}{2}(y_{k-1} + y_k)$. Then, the double integral $I$ of the function $f(\mathbf{x})$ over the region $\mathcal{R}$, denoted by $I = \iint_{\mathcal{R}} f(x, y)\, dx\, dy$ or $I = \iint_{\mathcal{R}} f(\mathbf{x})\, d\mathcal{R}$, is defined by
\[
I := \lim_{n \to \infty} \sum_{h,k=1}^{n} \breve{f}_{hk}\, \Delta x\, \Delta y,
\tag{19.28}
\]
provided of course that the limit exists.

Fig. 19.4 Double integral (rectangular region)

◦ Warning. If $f = f(\mathbf{x}, \mathbf{y}, \dots)$, we use the notation
\[
\iint_{\mathcal{R}} f(\mathbf{x}, \mathbf{y}, \dots)\, d\mathcal{R}(\mathbf{x})
\tag{19.29}
\]
to indicate that $\mathbf{x}$ is the dummy variable of integration.
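The sum in Eq. 19.28 can be evaluated for finite $n$ to see the limit emerge. A minimal sketch follows. The function name `double_integral_midpoint` and the sample integrand $f = x^2 + y$ on $[0, 1] \times [0, 2]$ are assumptions made for illustration.

```python
def double_integral_midpoint(f, a, b, c, d, n):
    """Midpoint sum of Eq. 19.28 over [a, b] x [c, d] with n x n equal subregions."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for h in range(1, n + 1):
        x_mid = a + (h - 0.5) * dx          # midpoint of [x_{h-1}, x_h]
        for k in range(1, n + 1):
            y_mid = c + (k - 0.5) * dy      # midpoint of [y_{k-1}, y_k]
            total += f(x_mid, y_mid) * dx * dy
    return total


# Assumed integrand: f(x, y) = x^2 + y on [0, 1] x [0, 2]; the exact value is 8/3.
approximations = [
    double_integral_midpoint(lambda x, y: x * x + y, 0.0, 1.0, 0.0, 2.0, n)
    for n in (4, 8, 16, 32)
]
```

Each doubling of $n$ reduces the distance from the exact value 8/3 by a factor of about four, as expected for a midpoint rule applied to a smooth integrand.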
19.3.2 Darboux double integrals
♣
In this subsection, we extend to two dimensions the definition of Darboux integrals (introduced in Section 10.8 for one–dimensional integrals) and show that, if we use it, the double integral defined in Eq. 19.28 exists if the function $f(x, y)$ is piecewise–continuous in $\mathcal{R}$.

Consider a function $f(x, y)$, defined on the closed rectangular region $\mathcal{R}$, with $x \in [a, b]$ and $y \in [c, d]$. Here, we assume that the increase of the number of subregions is obtained by dividing the intervals $[a, b]$ and $[c, d]$ into two equal subintervals, then into four, then into eight, and so on. Specifically, as in the preceding subsection, the sizes of the region $\mathcal{R}_{hk}$ are $\Delta x = (b - a)/n$ and $\Delta y = (d - c)/n$. However, now we have $n = 2^p$. In other words, by increasing $p$ by 1, we divide each rectangular subregion into four, equal to each other.

Next, let $f^{\mathrm{Sup}}_{hk}$ and $f^{\mathrm{Inf}}_{hk}$ denote respectively the supremum and infimum of $f(x, y)$ for $x \in [x_{h-1}, x_h]$, where $x_h = a + h\, \Delta x$ ($h = 0, \dots, n$) and $y \in [y_{k-1}, y_k]$, where $y_k = c + k\, \Delta y$ ($k = 0, \dots, n$). Let us consider the sequences
\[
s^{\mathrm{Inf}}_p = \sum_{h,k=1}^{n=2^p} f^{\mathrm{Inf}}_{hk}\, \Delta x\, \Delta y
\qquad\text{and}\qquad
s^{\mathrm{Sup}}_p = \sum_{h,k=1}^{n=2^p} f^{\mathrm{Sup}}_{hk}\, \Delta x\, \Delta y.
\tag{19.30}
\]
It is apparent that $s^{\mathrm{Inf}}_p \le s^{\mathrm{Sup}}_p$. In addition, the fact that, by increasing $p$ by 1, we divide each subregion into four, guarantees that
\[
s^{\mathrm{Inf}}_p \le s^{\mathrm{Inf}}_{p+1} \le \cdots \le s^{\mathrm{Sup}}_{p+1} \le s^{\mathrm{Sup}}_p.
\tag{19.31}
\]
Thus, $\{s^{\mathrm{Inf}}_p\}$ is a bounded non–decreasing sequence, whereas $\{s^{\mathrm{Sup}}_p\}$ is a bounded non–increasing sequence. Thus, both sequences converge (Theorem 74, p. 309), and we can define
\[
s^{\mathrm{Inf}} = \lim_{p \to \infty} s^{\mathrm{Inf}}_p
\qquad\text{and}\qquad
s^{\mathrm{Sup}} = \lim_{p \to \infty} s^{\mathrm{Sup}}_p.
\tag{19.32}
\]
Then, we have the following

Definition 307 (Darboux double integral of f(x, y)). Iff $s^{\mathrm{Inf}}$ and $s^{\mathrm{Sup}}$, as defined in Eq. 19.32, coincide, the expression
\[
\iint_{\mathcal{R}} f(x, y)\, d\mathcal{R} = s^{\mathrm{Inf}} = s^{\mathrm{Sup}}
\tag{19.33}
\]
is called the double integral of the function $f(x, y)$ over the region $\mathcal{R}$.
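The two sequences in Eq. 19.30 can be computed explicitly when the infimum and supremum over each subregion are known. The sketch below makes a simplifying assumption that is not in the text: the integrand is non-decreasing in both $x$ and $y$, so that the infimum over each subregion is attained at its lower-left corner and the supremum at its upper-right corner. The function name `darboux_sums` and the integrand $f = x + y$ on the unit square are hypothetical.

```python
def darboux_sums(f, a, b, c, d, p):
    """Lower and upper sums of Eq. 19.30 with n = 2**p subdivisions per side.

    Assumes f is non-decreasing in both x and y, so that on each subregion the
    infimum sits at the lower-left corner and the supremum at the upper-right one.
    """
    n = 2 ** p
    dx = (b - a) / n
    dy = (d - c) / n
    lower = upper = 0.0
    for h in range(1, n + 1):
        for k in range(1, n + 1):
            lower += f(a + (h - 1) * dx, c + (k - 1) * dy) * dx * dy
            upper += f(a + h * dx, c + k * dy) * dx * dy
    return lower, upper


# Assumed integrand: f(x, y) = x + y on the unit square (exact integral: 1).
sums = [darboux_sums(lambda x, y: x + y, 0.0, 1.0, 0.0, 1.0, p) for p in range(1, 7)]
```

For this integrand the lower and upper sums equal $1 - 1/n$ and $1 + 1/n$, respectively, so that the chain of inequalities in Eq. 19.31 can be read off directly from the entries of `sums`.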
• Piecewise–continuous functions You might wonder whether from the above definition we are able to prove that Eq. 19.28 is valid whenever the function f (x, y) is piecewise–continuous. To this end, we have the following Theorem 196. If the function f (x, y) is piecewise–continuous in the closed rectangular region R, then sInf and sSup coincide, and are equal to the limit in Eq. 19.28, provided of course that we use the same subdivision scheme. ◦ Proof : We begin by assuming the function f (x, y) to be continuous in R. We have
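\[
\sum_{h,k=1}^{n} \left( \breve f_{hk} - f_{hk}^{\mathrm{Inf}} \right) \Delta x\,\Delta y
\le (b - a)\,(d - c)\,\max_{h,k}\left[ \breve f_{hk} - f_{hk}^{\mathrm{Inf}} \right],
\tag{19.34}
\]
where the symbol $\max_k[a_k]$ denotes the maximum of the values $a_1, \dots, a_n$ (Definition 164, p. 346). If f(x, y) is continuous (and hence, uniformly continuous, Theorem 86, p. 343), we have that, given an arbitrarily small ε > 0, there exists an N such that $\breve f_{hk} - f_{hk}^{\mathrm{Inf}} < \varepsilon$, for any $n = 2^p > N$. Therefore,
\[
\lim_{p\to\infty} \sum_{h,k=1}^{n=2^p} \breve f_{hk}\,\Delta x\,\Delta y = s^{\mathrm{Inf}}.
\tag{19.35}
\]
Similarly, we prove that
\[
\lim_{p\to\infty} \sum_{h,k=1}^{n=2^p} \breve f_{hk}\,\Delta x\,\Delta y = s^{\mathrm{Sup}}.
\tag{19.36}
\]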
Thus, we have proven that, if f(x, y) is continuous, then
\[
s^{\mathrm{Inf}} = s^{\mathrm{Sup}} = \lim_{p\to\infty} \sum_{h,k=1}^{n=2^p} \breve f_{hk}\,\Delta x\,\Delta y.
\tag{19.37}
\]
In other words, for f(x, y) continuous, the two limits in Eq. 19.32 coincide and the limit in Eq. 19.28 always exists.

Next, consider the case of a piecewise–continuous function. The considerations in Section 10.8 (on the convergence of integrals of bounded piecewise–continuous functions) may be extended to double integrals as well. Indeed, as pointed out in Definition 181, p. 355, of 2D piecewise–continuous function, the total length of the lines formed by the points of discontinuity is finite. Thus, the area of the union of all the rectangles that contain a discontinuity vanishes as n tends to infinity. Hence, its contribution tends to zero, and $s^{\mathrm{Sup}}$ and $s^{\mathrm{Inf}}$ coincide as n tends to infinity.
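◦ Numerical sketch. The behavior of the two Darboux sequences in Eqs. 19.30–19.32 may be observed on a simple case. The Python fragment below is only an illustration under stated assumptions: the test function f(x, y) = x y on the unit square and the name darboux_sums are my choices, not part of the text. Because this f is non–decreasing in both variables, the infimum and supremum on each subrectangle sit at opposite corners, so they can be evaluated exactly.

```python
# Sketch: lower and upper Darboux sums (Eq. 19.30) for f(x, y) = x*y on [0, 1] x [0, 1].
# The name darboux_sums and the test function are illustrative assumptions.

def darboux_sums(p):
    """Return (s_inf, s_sup) for n = 2**p subdivisions per side of the unit square."""
    n = 2 ** p
    dx = 1.0 / n
    dy = 1.0 / n
    s_inf = 0.0
    s_sup = 0.0
    for h in range(1, n + 1):
        for k in range(1, n + 1):
            # f = x*y is non-decreasing in x and in y on the unit square, so the
            # infimum is at (x_{h-1}, y_{k-1}) and the supremum at (x_h, y_k).
            s_inf += ((h - 1) * dx) * ((k - 1) * dy) * dx * dy
            s_sup += (h * dx) * (k * dy) * dx * dy
    return s_inf, s_sup


sums = [darboux_sums(p) for p in range(1, 7)]
for p, (s_inf, s_sup) in enumerate(sums, start=1):
    print(f"p = {p}: s_inf = {s_inf:.6f}, s_sup = {s_sup:.6f}")
```

The lower sums increase and the upper sums decrease with p, and both bracket the value 1/4 of the double integral, as Eq. 19.31 requires.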
19.3.3 Double integrals over arbitrary regions

Consider a piecewise–continuous function f(x, y), defined on the closed convex region R, having contour C (Fig. 19.5).
Fig. 19.5 Double integral (arbitrary region)
We can extend the domain of definition of f(x, y) to the closed rectangular region R0 (which includes R), with x ∈ [a, b] and y ∈ [c, d], by introducing the function f̂(x) defined on R0 as
\[
\hat f(\mathbf{x}) =
\begin{cases}
f(\mathbf{x}), & \text{if } \mathbf{x} \in R; \\
0, & \text{if } \mathbf{x} \notin R.
\end{cases}
\tag{19.38}
\]
This function is piecewise–continuous in R0 . [Hint: Recall that the contour of a region has a finite length, Definition 168, p. 348.] Therefore, we can use all the results presented in the preceding subsection, in particular the definition of double integral, in Eq. 19.33. [Note that in Fig. 19.5 the region R is composed of all the dark gray rectangles and portions of the light gray ones. As the subdivision gets smaller and smaller, the contribution of the light gray rectangles tends to zero, as in the proof of Theorem 196, p. 748.]
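◦ Numerical sketch. The zero extension of Eq. 19.38 is easy to try out. In the fragment below (an illustration only; the unit disk, the integrand f = 1, and the names f_hat and double_integral are assumptions of mine), the region R is the unit disk, R0 is the square [−1, 1] × [−1, 1], and the sum of Eq. 19.28 is evaluated with midpoints and n = 2^p. The result should approach the area of the disk, namely π.

```python
import math

# Sketch: double integral over a non-rectangular region via the zero extension
# of Eq. 19.38. All names here are illustrative assumptions.

def f_hat(x, y):
    """Zero extension of f = 1 from the unit disk R to the square R0 = [-1, 1]^2."""
    return 1.0 if x * x + y * y <= 1.0 else 0.0


def double_integral(func, a, b, c, d, p):
    """Midpoint sum over n x n equal subrectangles of [a, b] x [c, d], with n = 2**p."""
    n = 2 ** p
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for h in range(n):
        x = a + (h + 0.5) * dx
        for k in range(n):
            y = c + (k + 0.5) * dy
            total += func(x, y) * dx * dy
    return total


area = double_integral(f_hat, -1.0, 1.0, -1.0, 1.0, p=8)
print(area, math.pi)
```

Only the subrectangles cut by the contour contribute to the error, which is the numerical counterpart of the remark on the light gray rectangles in Fig. 19.5.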
• Properties of double integrals

In analogy with Eq. 10.7, we have
\[
\iint_R \left[ \alpha\, f(x, y) + \beta\, g(x, y) \right] dR
= \alpha \iint_R f(x, y)\,dR + \beta \iint_R g(x, y)\,dR,
\tag{19.39}
\]
as you may verify. [Hint: Use the definition, Eq. 19.33.] In addition, consider a closed region R, along with a finite–length line L that divides R into two closed regions R1 and R2, so that R = R1 ∪ R2. Then, in analogy with Eq. 10.8, you may show that
\[
\iint_{R_1} f(x, y)\,dR + \iint_{R_2} f(x, y)\,dR = \iint_R f(x, y)\,dR.
\tag{19.40}
\]
◦ Comment. In analogy with Eq. 10.3 for the one–dimensional integrals, we have that the double integral gives the volume of the region bounded by the graph of the function and the region R of the (x, y)-plane, with the volume understood as being negative when f(x, y) < 0 (Fig. 19.6).

Fig. 19.6 Double integrals and volumes
19.4 Evaluation of double integrals by iterated integrals

Here, we address the following question. Can we replace a double integral with two iterated integrals, first one in the x-direction and then one in the y-direction, or vice versa? To be specific, let us consider a closed rectangular region R. This time, however, we have to use a different number of subdivisions in the two directions (say m and n), so as to allow for the possibility that Δx = (b − a)/m and Δy = (d − c)/n go to zero independently of each other. Accordingly, had we defined
\[
\iint_R f(x, y)\,dx\,dy := \lim_{n\to\infty} \sum_{k=0}^{n}\, \lim_{m\to\infty} \sum_{h=0}^{m} f(\breve x_h, \breve y_k)\,\Delta x\,\Delta y,
\tag{19.41}
\]
we would have had
\[
\iint_R f(x, y)\,dR = \int_c^d \int_a^b f(x, y)\,dx\,dy.
\tag{19.42}
\]
On the other hand, had we defined
\[
\iint_R f(x, y)\,dx\,dy := \lim_{\Delta x\to 0} \sum_{h=0}^{m}\, \lim_{\Delta y\to 0} \sum_{k=0}^{n} f(\breve x_h, \breve y_k)\,\Delta y\,\Delta x,
\tag{19.43}
\]
we would have had
\[
\iint_R f(x, y)\,dR = \int_a^b \int_c^d f(x, y)\,dy\,dx.
\tag{19.44}
\]
The expressions on the right side of Eqs. 19.42 and 19.44 are called iterated integrals. Accordingly, we ask ourselves whether a double integral may be evaluated via iterated integrals. Remark 142. Note the notational convention typically used for iterated integrals. The limits on the first integral sign refer to the last differential. Typically, the parentheses are not used. However, I will use them, whenever deemed necessary, in order to avoid possible misunderstandings. Similar conventions are used for triple integrals, and — in general — for multiple integrals. [Another notation often used is
\[
\iint_R f(x, y)\,dR = \int_a^b dx \int_c^d f(x, y)\,dy.
\tag{19.45}
\]
I avoid this because it was confusing to me when I was a student.]
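• A case of failure of iterated integrals♥

Note that interchanging the order of integration does not always yield the same result. For instance, consider the function
\[
f(x, y) = \tan^{-1}\frac{y}{x}, \qquad \text{for } (x, y) \ne (0, 0).
\tag{19.46}
\]
We have
\[
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = \frac{y^2 - x^2}{(x^2 + y^2)^2}, \qquad \text{for } (x, y) \ne (0, 0),
\tag{19.47}
\]
as you may verify. [Hint: For (x, y) ≠ (0, 0), we have
\[
\frac{\partial f}{\partial x} = \frac{1}{1 + y^2/x^2}\,\frac{-y}{x^2} = \frac{-y}{x^2 + y^2},
\qquad
\frac{\partial f}{\partial y} = \frac{1}{1 + y^2/x^2}\,\frac{1}{x} = \frac{x}{x^2 + y^2},
\tag{19.48}
\]
and hence
\[
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{-1}{x^2 + y^2} + \frac{2\,y^2}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2},
\qquad
\frac{\partial^2 f}{\partial y\,\partial x} = \frac{1}{x^2 + y^2} - \frac{2\,x^2}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2},
\tag{19.49}
\]
in agreement with Eq. 19.47.] Accordingly, we have
\[
\int_0^1 \int_0^1 \frac{y^2 - x^2}{(x^2 + y^2)^2}\,dy\,dx
= \int_0^1 \left[ \frac{-y}{x^2 + y^2} \right]_{y=0}^{y=1} dx
= \int_0^1 \frac{-1}{x^2 + 1}\,dx
= \left[ -\tan^{-1} x \right]_0^1 = -\frac{\pi}{4}
\tag{19.50}
\]
and
\[
\int_0^1 \int_0^1 \frac{y^2 - x^2}{(x^2 + y^2)^2}\,dx\,dy
= \int_0^1 \left[ \frac{x}{x^2 + y^2} \right]_{x=0}^{x=1} dy
= \int_0^1 \frac{1}{1 + y^2}\,dy
= \left[ \tan^{-1} y \right]_0^1 = \frac{\pi}{4}.
\tag{19.51}
\]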
In other words, interchanging the order of integration yields different results. It would be nice to know when the double integral may be evaluated by using iterated integrals, which requires that interchanging the order of the iterated integrals does not affect the result. The issue is addressed in the rest of this section.
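◦ Numerical sketch. The two values in Eqs. 19.50 and 19.51 may be checked with a few lines of Python. The fragment is only a sketch under stated assumptions: the helper midpoint and the sample abscissa x_test are mine, and the inner integrals are taken in the closed form obtained above, because a naive double sum on a common grid is a finite sum and cannot reveal the dependence on the order of integration.

```python
import math

# Sketch: the two iterated integrals of Eqs. 19.50 and 19.51.
# Helper names (midpoint, inner_dy, inner_dx) are illustrative assumptions.

def g(x, y):
    """Integrand (y^2 - x^2) / (x^2 + y^2)^2, unbounded near the origin."""
    return (y * y - x * x) / (x * x + y * y) ** 2


def midpoint(func, lo, hi, n):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def inner_dy(x):
    """Closed form of the inner integral over y in [0, 1], from Eq. 19.50."""
    return -1.0 / (x * x + 1.0)


def inner_dx(y):
    """Closed form of the inner integral over x in [0, 1], from Eq. 19.51."""
    return 1.0 / (1.0 + y * y)


# Spot check of the closed-form inner integral at one x away from the origin.
x_test = 0.5
numeric_inner = midpoint(lambda y: g(x_test, y), 0.0, 1.0, 4000)

dy_then_dx = midpoint(inner_dy, 0.0, 1.0, 2000)
dx_then_dy = midpoint(inner_dx, 0.0, 1.0, 2000)
print(numeric_inner, inner_dy(x_test))
print(dy_then_dx, dx_then_dy, math.pi / 4)
```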
• Fubini theorem for rectangular regions
♣
Here, we present a preliminary result that will be generalized in the subsubsection that follows. We have the following¹

Theorem 197 (Fubini theorem for rectangular regions). If the function f(x, y) is piecewise–continuous in a closed rectangular region R, we have
\[
\iint_R f(x, y)\,dR = \int_c^d \int_a^b f(x, y)\,dx\,dy = \int_a^b \int_c^d f(x, y)\,dy\,dx.
\tag{19.52}
\]
In other words, the double integral may be evaluated by using iterated integrals, and the order of integration may be interchanged. [Here, the integrals are understood à la Darboux (namely given by Eq. 19.28, with $n = 2^p$).]

◦ Proof : ♠ Let us limit ourselves to the first equality. [The proof of the equality of the first and third terms is virtually identical.] Let us begin with the restriction that f(x, y) be continuous. Consider the function
\[
F(y) := \int_a^b f(x, y)\,dx.
\tag{19.53}
\]
Dividing the interval [a, b] into $n = 2^p$ equal subintervals $I_j = [x_{j-1}, x_j]$, with $j = 1, \dots, 2^p$, and using the mean value theorem for integral calculus (Eq. 10.11), we have
\[
F(y) = \sum_{j=1}^{n=2^p} \int_{x_{j-1}}^{x_j} f(x, y)\,dx = \sum_{j=1}^{n=2^p} f[\xi_j(y), y]\,\Delta x,
\tag{19.54}
\]
where $\xi_j(y) \in I_j$. Similarly, dividing also the interval [c, d] into $n = 2^p$ equal subintervals $J_k = [y_{k-1}, y_k]$, with $k = 1, \dots, 2^p$, and using again the mean value theorem for integral calculus (Eq. 10.11), we have
\[
\int_c^d F(y)\,dy = \sum_{k=1}^{n=2^p} \int_{y_{k-1}}^{y_k} F(y)\,dy = \sum_{k=1}^{n=2^p} F(\eta_k)\,\Delta y,
\tag{19.55}
\]
where $\eta_k \in J_k$. Combining, we have
\[
\int_c^d \int_a^b f(x, y)\,dx\,dy = \sum_{j,k=1}^{n=2^p} f[\xi_j(\eta_k), \eta_k]\,\Delta x\,\Delta y.
\tag{19.56}
\]

¹ Named after the Italian mathematician Guido Fubini (1879–1943).
On the other hand, the convergence of Eq. 19.28 with $n = 2^p$ (guaranteed by Theorem 196, p. 748, valid for piecewise–continuous functions) implies that
\[
\iint_R f(x, y)\,dR = \sum_{j,k=1}^{n=2^p} \breve f_{jk}\,\Delta x\,\Delta y + E_p,
\tag{19.57}
\]
where, given an ε1 > 0 arbitrarily small, there exists an N1 such that $|E_p| < (b - a)(d - c)\,\varepsilon_1$, for any $n = 2^p > N_1$. Then, subtracting Eq. 19.56 from Eq. 19.57, one obtains
\[
\iint_R f(x, y)\,dR - \int_c^d \int_a^b f(x, y)\,dx\,dy
= \sum_{j,k=1}^{n=2^p} \left( \breve f_{jk} - f[\xi_j(\eta_k), \eta_k] \right) \Delta x\,\Delta y + E_p.
\tag{19.58}
\]
Next, recall that, for the time being, we have assumed the function f(x, y) to be continuous in the closed rectangular region R, and hence uniformly continuous there (Theorem 90, p. 356). Accordingly, for any $x \in I_j$ and any $y \in J_k$, given an ε2 > 0 arbitrarily small, there exists an N2 such that
\[
\left| \breve f_{jk} - f(x, y) \right| < \varepsilon_2,
\tag{19.59}
\]
for any $n = 2^p > N_2$. This implies, for n > N1 and n > N2,
\[
\left| \iint_R f(x, y)\,dR - \int_c^d \int_a^b f(x, y)\,dx\,dy \right| < (b - a)\,(d - c)\,\varepsilon,
\tag{19.60}
\]
where ε = ε1 + ε2 (apply the triangle inequality to Eq. 19.58), which is again arbitrarily small. This completes the proof of the theorem for f(x, y) continuous in R.

Finally, let us remove the restriction that f(x, y) be continuous, and assume instead that f(x, y) is simply piecewise–continuous. In this case, we can use the strategy adopted in the proof of Theorem 196, p. 748, and obtain that the contribution of the rectangles that include discontinuities vanishes as n tends to infinity.

◦ Comment. In the example in Eqs. 19.46–19.51, the integrand (y² − x²)/(x² + y²)² is not bounded (and hence not continuous) in the closed region of interest; it is not even defined at the origin.
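◦ Numerical sketch. For a continuous integrand the two iterated integrals in Eq. 19.52 do agree, and this may be observed numerically. The fragment below is a minimal sketch under assumptions of mine: the test function f(x, y) = x y² on [0, 2] × [0, 3], the helper midpoint, and the deliberately different numbers of subdivisions in the two orders. The exact value of the double integral is 18.

```python
# Sketch: Fubini theorem on a rectangle (Eq. 19.52) for f(x, y) = x*y^2.
# The test function and helper names are illustrative assumptions.

def midpoint(func, lo, hi, n):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def f(x, y):
    return x * y * y


a, b, c, d = 0.0, 2.0, 0.0, 3.0


def F(y):
    """Inner integral over x, as in Eq. 19.53."""
    return midpoint(lambda x: f(x, y), a, b, 200)


# First x, then y (second term of Eq. 19.52).
dx_then_dy = midpoint(F, c, d, 300)

# First y, then x (third term of Eq. 19.52), with a different subdivision.
dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), c, d, 400), a, b, 150)

print(dx_then_dy, dy_then_dx)
```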
• Fubini theorem for convex regions
♣
Here, we extend the Fubini theorem to convex regions, which are defined by the following

Definition 308 (Convex region). A region R (in ℝ² or ℝ³) is called convex iff, given any two points A and B of R, all the points of the segment AB belong to R. Examples of convex and non–convex two–dimensional regions are shown in Figs. 19.7 and 19.8, respectively.
Fig. 19.7 A convex region
Fig. 19.8 A non–convex region
Consider a convex region, R. Let C denote the contour of R, and let P1 and P2, with x1 < x2, denote the two points of C, where the tangent to C is parallel to the y-axis. The contour C is divided by P1 and P2 into two portions, described respectively by the equations (Fig. 19.9.a)
\[
y = Y_1(x) \qquad\text{and}\qquad y = Y_2(x),
\tag{19.61}
\]
with Y2(x) ≥ Y1(x) and x ∈ [x1, x2]. Similarly, let Q1 and Q2 (with y1 < y2) denote the two points of C, where the tangent to C is parallel to the x-axis. The contour C is divided by Q1 and Q2 into two portions, described respectively by the equations (Fig. 19.9.b)
\[
x = X_1(y) \qquad\text{and}\qquad x = X_2(y),
\tag{19.62}
\]
with X2(y) ≥ X1(y) and y ∈ [y1, y2].

Fig. 19.9 Iterated integral

Then, we have the following

Theorem 198 (Fubini theorem for convex regions). If the function f(x, y) is piecewise–continuous in the closed bounded convex region R, we have
\[
\iint_R f(x, y)\,dR = \int_{x_1}^{x_2} \int_{Y_1(x)}^{Y_2(x)} f(x, y)\,dy\,dx
= \int_{y_1}^{y_2} \int_{X_1(y)}^{X_2(y)} f(x, y)\,dx\,dy,
\tag{19.63}
\]
where the symbols x1, x2, Y1(x) and Y2(x), as well as the symbols y1, y2, X1(y) and X2(y), are defined in Fig. 19.9 above.

◦ Proof : Consider the function in Eq. 19.38, namely f̂(x) = f(x) in R, and f̂(x) = 0 otherwise. This function is piecewise–continuous, since the boundary of a region has a finite length (Definition 168, p. 348). Therefore, the Fubini theorem for rectangular regions (Eq. 19.52) applies.

◦ Comment. The above results are valid for the limited case of a convex region. However, the condition that the region R be convex is not essential to the proof of the above theorem. It is sufficient that the contour of R be representable either by Eq. 19.61, or by Eq. 19.62, for one of the two expressions in Eq. 19.63 to be valid. For instance, the region in Fig. 19.8 may be described by Eq. 19.61, and hence the first equality in Eq. 19.63 may be used.
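◦ Numerical sketch. As an illustration of Eq. 19.63, the fragment below evaluates the integral of f(x, y) = x² over the unit disk in both orders. Everything specific here is an assumption of mine: the choice of region and integrand, and the names Y1, Y2, X1, X2 (which mirror the symbols of Eqs. 19.61 and 19.62 for the disk) and midpoint. The exact value is π/4.

```python
import math

# Sketch: Fubini theorem for a convex region (Eq. 19.63), unit disk, f = x^2.
# Region, integrand and helper names are illustrative assumptions.

def midpoint(func, lo, hi, n):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def f(x, y):
    return x * x


def Y2(x):
    """Upper portion of the contour, y = Y2(x)."""
    return math.sqrt(max(0.0, 1.0 - x * x))


def Y1(x):
    """Lower portion of the contour, y = Y1(x)."""
    return -Y2(x)


def X2(y):
    """Right portion of the contour, x = X2(y)."""
    return math.sqrt(max(0.0, 1.0 - y * y))


def X1(y):
    """Left portion of the contour, x = X1(y)."""
    return -X2(y)


n = 400
dy_then_dx = midpoint(
    lambda x: midpoint(lambda y: f(x, y), Y1(x), Y2(x), n), -1.0, 1.0, n)
dx_then_dy = midpoint(
    lambda y: midpoint(lambda x: f(x, y), X1(y), X2(y), n), -1.0, 1.0, n)

print(dy_then_dx, dx_then_dy, math.pi / 4)
```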
19.5 Green theorem

We have the following²

Theorem 199 (Green theorem). Consider two functions M(x, y) and N(x, y) defined on the open region R0 of the (x, y)-plane. Assume ∂M/∂y and ∂N/∂x to be piecewise–continuous in R0. The Green theorem states that, for any closed, bounded, simply connected R ⊂ R0, we have
\[
\iint_R \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dx\,dy
= \int_C \left[ M(x, y)\,dx + N(x, y)\,dy \right],
\tag{19.64}
\]
where the counterclockwise contour C = ∂R is the boundary of R.

◦ Proof : Let us begin with the restriction that R is convex, so that the hypotheses for the Fubini theorem for convex regions (Eq. 19.63) are satisfied. Thus, we have (for the symbols x1, x2, Y1(x) and Y2(x), see Fig. 19.9.a, p. 756)
\[
\iint_R \frac{\partial M}{\partial y}\,dR
= \int_{x_1}^{x_2} \int_{Y_1(x)}^{Y_2(x)} \frac{\partial M}{\partial y}\,dy\,dx
= \int_{x_1}^{x_2} M[x, Y_2(x)]\,dx - \int_{x_1}^{x_2} M[x, Y_1(x)]\,dx
= -\int_C M(x, y)\,dx,
\tag{19.65}
\]
where again the contour is understood to be covered in the counterclockwise direction. Similarly (for the symbols y1, y2, X1(y) and X2(y), see Fig. 19.9.b, p. 756),
\[
\iint_R \frac{\partial N}{\partial x}\,dR
= \int_{y_1}^{y_2} \int_{X_1(y)}^{X_2(y)} \frac{\partial N}{\partial x}\,dx\,dy
= \int_{y_1}^{y_2} N[X_2(y), y]\,dy - \int_{y_1}^{y_2} N[X_1(y), y]\,dy
= \int_C N(x, y)\,dy,
\tag{19.66}
\]
where again the contour is understood to be covered counterclockwise. [The difference in sign in Eqs. 19.65 and 19.66 is due to the fact that, if we cover C counterclockwise, the line y = Y2(x) goes from x2 to x1 (see Eq. 19.65 and Fig. 19.9.a), whereas the line x = X2(y) goes from y1 to y2 (see Eq. 19.66 and Fig. 19.9.b).] Combining Eqs. 19.65 and 19.66 yields Eq. 19.64.

Next, let us remove the restriction that R be convex and consider an arbitrary region. In this case, R may always be divided into smaller regions, each of which is convex, so that Eq. 19.64 applies. Next, let us add Eq. 19.64 for all the subregions and use Eq. 19.40. Noting that the contributions of the internal lines offset each other, we obtain the desired result, namely that the Green theorem (Eq. 19.64) is valid for any region.

² Named after the British mathematical physicist George Green (1793–1841). The three Green identities and the Green function are also named after him, and so is the Cauchy–Green deformation tensor.
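◦ Numerical sketch. Both sides of the Green theorem (Eq. 19.64) may be compared numerically. The fragment below is only an illustration under assumptions of mine: R is the unit disk, M = −x² y and N = x y² (so that ∂N/∂x − ∂M/∂y = x² + y²), and the helper names are hypothetical. Both sides should be close to π/2.

```python
import math

# Sketch: Green theorem (Eq. 19.64) on the unit disk with M = -x^2 y, N = x y^2.
# Region, fields and helper names are illustrative assumptions.

def midpoint(func, lo, hi, n):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def M(x, y):
    return -x * x * y


def N(x, y):
    return x * y * y


def area_integrand(x, y):
    """dN/dx - dM/dy, worked out by hand: y^2 + x^2."""
    return x * x + y * y


def half_chord(x):
    """Y2(x) = -Y1(x) for the unit disk."""
    return math.sqrt(max(0.0, 1.0 - x * x))


n = 400
area_side = midpoint(
    lambda x: midpoint(lambda y: area_integrand(x, y), -half_chord(x), half_chord(x), n),
    -1.0, 1.0, n)


def contour_integrand(t):
    """M dx/dt + N dy/dt on the counterclockwise unit circle x = cos t, y = sin t."""
    x, y = math.cos(t), math.sin(t)
    return M(x, y) * (-math.sin(t)) + N(x, y) * math.cos(t)


contour_side = midpoint(contour_integrand, 0.0, 2.0 * math.pi, 2000)
print(area_side, contour_side, math.pi / 2)
```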
19.5.1 Consequences of the Green theorem

In this subsection, we address some consequences of the Green theorem. They all stem from the following

Theorem 200. Consider two functions M(x, y) and N(x, y) defined on the open region R0 of the (x, y)-plane. Assume ∂M/∂y and ∂N/∂x to be piecewise–continuous in R0. If
\[
\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y}, \qquad \text{for every } \mathbf{x} \in R_0
\tag{19.67}
\]
(in the sense given in Remark 70, p. 346, on the convention of piecewise–continuous functions), then
\[
\oint_C \left( M\,dx + N\,dy \right) = 0, \qquad \text{for all } C \text{ in } R_0.
\tag{19.68}
\]
In addition, denoting with L(xa, xb) a line from xa to xb and fully contained in R0, we have that
\[
\int_{L(\mathbf{x}_a, \mathbf{x}_b)} \left( M\,dx + N\,dy \right)
\tag{19.69}
\]
is path–independent.

◦ Proof : The first part of the theorem (Eq. 19.68) is an immediate consequence of the Green theorem (Eq. 19.64). The second part of the theorem (Eq. 19.69) is a consequence of Theorem 192, p. 741, on the conditions for the path–independence of line integrals.

Accordingly, under the hypotheses of the above theorem, we need not specify which line we are using and we can use the less cumbersome notation
\[
\int_{\mathbf{x}_a}^{\mathbf{x}_b} \left( M\,dx + N\,dy \right).
\tag{19.70}
\]
Therefore, still under the hypotheses of the above theorem, if we keep x0 fixed and allow x to vary, the integral
\[
\int_{\mathbf{x}_0}^{\mathbf{x}} \left( M\,dx + N\,dy \right)
\tag{19.71}
\]
defines a function of x. Then, we have the following

Theorem 201 (Existence of u(x, y), with ∂u/∂x = M and ∂u/∂y = N). Consider two functions M(x, y) and N(x, y) defined on the open bounded simply connected region R0 of the (x, y)-plane. Assume ∂M/∂y and ∂N/∂x to be piecewise–continuous in R0. If ∂N/∂x = ∂M/∂y for all points of R0 (use again Remark 70, p. 346, as in Eq. 19.67), then the function u(x) defined by
\[
u(\mathbf{x}) := \int_{\mathbf{x}_0}^{\mathbf{x}} \left( M\,dx + N\,dy \right)
\tag{19.72}
\]
(where x0 is fixed and x variable, and the line L(x0, x) is fully contained in R0) is such that
\[
\frac{\partial u}{\partial x} = M(x, y)
\qquad\text{and}\qquad
\frac{\partial u}{\partial y} = N(x, y),
\tag{19.73}
\]
so that du = M dx + N dy.

◦ Proof : By definition, we have
\[
\frac{\partial u}{\partial x}
= \lim_{h\to 0} \frac{1}{h} \left[ \int_{\mathbf{x}_0}^{\mathbf{x} + h\mathbf{i}} \left( M\,dx + N\,dy \right) - \int_{\mathbf{x}_0}^{\mathbf{x}} \left( M\,dx + N\,dy \right) \right]
= \lim_{h\to 0} \frac{1}{h} \int_{\mathbf{x}}^{\mathbf{x} + h\mathbf{i}} \left( M\,dx + N\,dy \right).
\tag{19.74}
\]
Since the integral is path–independent, we can use a straight line connecting x to x + h i, namely a line y = constant, so that dy = 0. Then, we have
\[
\frac{\partial u}{\partial x} = \lim_{h\to 0} \frac{1}{h} \int_{x}^{x+h} M(x', y)\,dx' = M(x, y).
\tag{19.75}
\]
[Hint: For the second equality, use the corollary to the mean value theorem in integral calculus, Eq. 10.13.] The proof of ∂u/∂y = N (x, y) is analogous, as you may verify. ◦ Comment. Recall the first fundamental theorem of calculus (Eq. 10.38), namely that the derivative of an integral (considered as a function of its upper limit), with respect to the upper limit, equals the integrand evaluated at the upper limit itself. Note that Eq. 19.87 is simply an extension to multidimensional path–independent line integrals of such a theorem.
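◦ Numerical sketch. The construction of u(x) in Eq. 19.72 may be tried out on a form that satisfies Eq. 19.67. In the fragment below, which is a sketch under assumptions of mine, M = 2xy and N = x² (so that ∂N/∂x = ∂M/∂y = 2x), the base point x0 is the origin, and the names segment_integral, u_straight and u_broken are hypothetical. Two different paths give the same value of u, and central differences of u recover M and N, as Eq. 19.73 states.

```python
# Sketch: potential u of Eq. 19.72 for M = 2xy, N = x^2, and a check of Eq. 19.73.
# Fields, base point and helper names are illustrative assumptions.

def segment_integral(M, N, p0, p1, n=2000):
    """Midpoint approximation of the line integral of M dx + N dy along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x, y = x0 + t * dx, y0 + t * dy
        total += (M(x, y) * dx + N(x, y) * dy) / n
    return total


def M(x, y):
    return 2.0 * x * y


def N(x, y):
    return x * x


origin = (0.0, 0.0)


def u_straight(x, y):
    """u via the straight line from the origin to (x, y)."""
    return segment_integral(M, N, origin, (x, y))


def u_broken(x, y):
    """u via the broken line origin -> (x, 0) -> (x, y)."""
    return (segment_integral(M, N, origin, (x, 0.0))
            + segment_integral(M, N, (x, 0.0), (x, y)))


x, y, h = 1.5, 2.0, 1e-3
du_dx = (u_straight(x + h, y) - u_straight(x - h, y)) / (2 * h)
du_dy = (u_straight(x, y + h) - u_straight(x, y - h)) / (2 * h)
print(u_straight(x, y), u_broken(x, y))
print(du_dx, M(x, y), du_dy, N(x, y))
```

For this particular form the potential is u = x² y, which you may verify by differentiation.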
19.5.2 Green theorem vs Gauss and Stokes theorems
♣
In this subsection, we present the relationship between the Green theorem and the so–called 2D versions of: (i) Gauss theorem (Eq. 19.79) and (ii) Stokes theorem (Eq. 19.82).

◦ Warning. The terms “Gauss theorem” and “Stokes theorem” are typically used only in connection with their three–dimensional versions (Subsection 19.6.4).

Let us begin by expressing the Green theorem in a slightly different form. To this end, introduce the counterclockwise unit tangent and the outward unit normal, namely $\mathbf{t} = t_x\,\mathbf{i} + t_y\,\mathbf{j}$ and $\mathbf{n} = n_x\,\mathbf{i} + n_y\,\mathbf{j}$. We have
\[
\frac{dx}{ds} = t_x = -n_y
\qquad\text{and}\qquad
\frac{dy}{ds} = t_y = n_x.
\tag{19.76}
\]
[Hint: Use t = dx/ds (Eq. 18.17). Also, note that t and n are orthonormal, and hence satisfy Eq. 15.48. The sign ambiguity is resolved by the directions chosen for n and t (Fig. 19.10).]
Fig. 19.10 Outward unit normal and counterclockwise unit tangent to C
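Accordingly, Eq. 19.64 may be rewritten as
\[
\iint_R \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dx\,dy
= \int_C \left( N\,n_x - M\,n_y \right) ds
= \int_C \left( M\,t_x + N\,t_y \right) ds.
\tag{19.77}
\]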
The above forms of the Green theorem are useful to discuss the so–called 2D Gauss and Stokes theorems, addressed in the two subsubsections that follow.
• The planar Gauss theorem

Consider the Green theorem in the form given by the first equality in Eq. 19.77 and set $N = v_x$ and $M = -v_y$. This yields
\[
\iint_R \left( \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} \right) dx\,dy
= \int_C \left( v_x\,n_x + v_y\,n_y \right) ds.
\tag{19.78}
\]
Recalling that, in two dimensions, the divergence of v is given by Eq. 18.57, namely div v = ∂v_x/∂x + ∂v_y/∂y, Eq. 19.78 may be written in a more compact form as
\[
\iint_R \operatorname{div} \mathbf{v}\,dR = \int_C \mathbf{v} \cdot \mathbf{n}\,ds.
\tag{19.79}
\]
The above equation will be referred to as the planar Gauss theorem. We have the following

Definition 309 (Flux in two dimensions). Consider a generic planar vector field v(x), with x ∈ ℝ², and an arbitrary contour C, with its outward unit normal n. The flux F(v, C) of the vector v through C is defined by
\[
\mathcal{F}(\mathbf{v}, C) := \oint_C \mathbf{v} \cdot \mathbf{n}\,ds.
\tag{19.80}
\]
In plain words, the flux is a measure of the amount of the vector field v “going through” C. [The term flux comes from fluid dynamics, with v denoting the velocity.] Accordingly, the two–dimensional Gauss theorem tells us that the double integral over R of the two–dimensional divergence (Eq. 18.57) of any vector field v equals the flux of v through C = ∂R.
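◦ Numerical sketch. The planar Gauss theorem (Eq. 19.79) may be checked on a rectangle, where the outward normal is constant along each side. The fragment below is a sketch under assumptions of mine: the field v = (x², x y), the rectangle [0, 2] × [0, 1], and the helper names. Both the integral of the divergence and the flux should equal 6.

```python
# Sketch: planar Gauss theorem (Eq. 19.79) on the rectangle [0, 2] x [0, 1]
# for v = (x^2, x*y). Field, region and helper names are illustrative assumptions.

def midpoint(func, lo, hi, n=1000):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def vx(x, y):
    return x * x


def vy(x, y):
    return x * y


def div_v(x, y):
    """d(vx)/dx + d(vy)/dy, worked out by hand: 2x + x."""
    return 2.0 * x + x


a, b, c, d = 0.0, 2.0, 0.0, 1.0

divergence_integral = midpoint(
    lambda x: midpoint(lambda y: div_v(x, y), c, d, 200), a, b, 200)

flux = (
    midpoint(lambda y: vx(b, y), c, d)      # right side,  n = (+1, 0)
    - midpoint(lambda y: vx(a, y), c, d)    # left side,   n = (-1, 0)
    + midpoint(lambda x: vy(x, d), a, b)    # top side,    n = (0, +1)
    - midpoint(lambda x: vy(x, c), a, b)    # bottom side, n = (0, -1)
)
print(divergence_integral, flux)
```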
• The planar Stokes theorem

Consider the Green theorem, in the form obtained by equating the first and last terms in Eq. 19.77, and set $M = v_x$ and $N = v_y$. Then, one obtains
\[
\iint_R \left( \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y} \right) dx\,dy
= \int_C \left( v_x\,t_x + v_y\,t_y \right) ds.
\tag{19.81}
\]
Note that, according to Eq. 18.63, the integrand in the left–side integral is the component of the vector curl v in the direction of k = i × j, which is normal to the (x, y)-plane. Thus, we may write ∂v_y/∂x − ∂v_x/∂y = k · curl v (two–dimensional curl, Eq. 18.64). Then, Eq. 19.81 may be written as
\[
\iint_R \mathbf{k} \cdot \operatorname{curl} \mathbf{v}\,dR = \int_C \mathbf{v} \cdot d\mathbf{x}.
\tag{19.82}
\]
The above equation will be referred to as the planar Stokes theorem.

◦ Planar Stokes theorem in fluid dynamics.♥ In fluid dynamics, Eq. 19.82 reads
\[
\iint_R \zeta\,dR = \int_C \mathbf{v} \cdot d\mathbf{x},
\tag{19.83}
\]
where v denotes the velocity, whereas ζ = ∂v_y/∂x − ∂v_x/∂y (Eq. 18.65) denotes the two–dimensional vorticity. In plain terms, for planar flows the integral of the vorticity over the region R equals the circulation of the velocity over C = ∂R.
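◦ Numerical sketch. Equation 19.83 may be illustrated on the unit square. The fragment below is a sketch under assumptions of mine: the velocity field v = (−y², x²), the use of central differences for the vorticity, and the helper names. The integral of the vorticity and the counterclockwise circulation should both equal 2.

```python
# Sketch: planar Stokes theorem (Eq. 19.83) on the unit square for v = (-y^2, x^2).
# Field, region and helper names are illustrative assumptions.

def midpoint(func, lo, hi, n=400):
    """Composite midpoint rule for a function of one variable."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def vx(x, y):
    return -y * y


def vy(x, y):
    return x * x


def vorticity(x, y, h=1e-5):
    """zeta = d(vy)/dx - d(vx)/dy, approximated by central differences."""
    dvy_dx = (vy(x + h, y) - vy(x - h, y)) / (2 * h)
    dvx_dy = (vx(x, y + h) - vx(x, y - h)) / (2 * h)
    return dvy_dx - dvx_dy


vorticity_integral = midpoint(
    lambda x: midpoint(lambda y: vorticity(x, y), 0.0, 1.0, 100), 0.0, 1.0, 100)

# Counterclockwise circulation: bottom (x: 0 -> 1), right (y: 0 -> 1),
# top (x: 1 -> 0), left (y: 1 -> 0). Reversed sides enter with a minus sign.
circulation = (
    midpoint(lambda x: vx(x, 0.0), 0.0, 1.0)
    + midpoint(lambda y: vy(1.0, y), 0.0, 1.0)
    - midpoint(lambda x: vx(x, 1.0), 0.0, 1.0)
    - midpoint(lambda y: vy(0.0, y), 0.0, 1.0)
)
print(vorticity_integral, circulation)
```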
19.5.3 Irrotational, lamellar, conservative, potential
♣
In this subsection, we express the above results in a more interesting way. We consider separately simply and multiply connected regions.
• Simply connected regions

Here, R denotes a (two–dimensional) simply connected region. Let us introduce the following definitions:

Definition 310 (Irrotational field). A planar vector field v(x), defined on a two–dimensional region R, is called irrotational in R iff
\[
\mathbf{k} \cdot \operatorname{curl} \mathbf{v} = \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y} = 0,
\tag{19.84}
\]
at every point of R. [Recall Footnote 4, p. 723, on the terms irrotational, rotor and curl.]

Definition 311 (Lamellar field). A planar vector field v(x), defined on a two–dimensional region R, is called lamellar iff
\[
\oint_C \mathbf{v} \cdot d\mathbf{x} = 0,
\tag{19.85}
\]
for any contour C in R.³

³ According to Truesdell (Ref. [71], p. 23), the term lamellar, with the meaning given here, was introduced by William Thomson, Lord Kelvin.
Definition 312 (Conservative field). A planar vector field v(x), defined on a two–dimensional region R, is called conservative iff its line integral is path–independent, namely iff, for any L(xa, xb) ∈ R, we have that
\[
\int_{L(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{v} \cdot d\mathbf{x} = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{v} \cdot d\mathbf{x}
\tag{19.86}
\]
depends upon the endpoints xa and xb, and not upon the line itself.⁴

Definition 313 (Potential field). A planar vector field v(x), defined on a two–dimensional region R, is called potential in R iff there exists a single–valued function u(x) such that
\[
\mathbf{v}(\mathbf{x}) = \operatorname{grad} u(\mathbf{x}).
\tag{19.87}
\]
The function u(x) is called the potential of the vector field v(x). We have the following

Theorem 202. Consider a conservative vector field v(x) defined in an open simply connected region R, along with the function
\[
u(\mathbf{x}) := \int_{\mathbf{x}_0}^{\mathbf{x}} \left( v_x\,dx + v_y\,dy \right) + C,
\tag{19.88}
\]
where x0 ∈ R is arbitrarily fixed and C is an arbitrary constant, whereas x ∈ R is variable, and the line L(x0, x) is fully contained in R. The function u(x) is called the potential of v(x), in the sense that grad u(x) = v(x), as in Eq. 19.87. In addition,
\[
du = \mathbf{v} \cdot d\mathbf{x}.
\tag{19.89}
\]
◦ Comment. Note that changing the point x0 or the constant C in Eq. 19.88 changes u(x) only by an arbitrary additive constant. However, this constant is typically inessential, since we are interested primarily in the gradient of u(x), and the constant disappears upon differentiation. [The reason for adding the constant C is the same as that discussed in Remark 107, p. 490, where we showed that not all the primitives may be expressed as an integral.]

⁴ I find it convenient to distinguish between lamellar fields and fields with path–independent integrals. In Definition 333, p. 827, we will introduce the standard definition that a force field f(x) is called conservative iff its work (namely its line integral, Eq. 20.91) from xa to xb is independent of the path connecting xa and xb. Thus, for want of a better name, I use the same term (conservative) for any vector field with the same property. However, I have to warn you that many authors use the term conservative only in connection with force fields.
◦ Proof : The theorem differs from Theorem 201, p. 759, only because the symbols are changed (from M and N to v_x and v_y). The proof is quite similar, but it will not hurt going through it again, using the new symbols. Consider ∂u/∂x. [The proof for ∂u/∂y is virtually identical.] By definition, we have
\[
\frac{\partial u}{\partial x}
= \lim_{h\to 0} \frac{1}{h} \left[ \int_{\mathbf{x}_0}^{\mathbf{x} + h\mathbf{i}} \mathbf{v}(\mathbf{x}') \cdot d\mathbf{x}' - \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v}(\mathbf{x}') \cdot d\mathbf{x}' \right]
= \lim_{h\to 0} \frac{1}{h} \int_{\mathbf{x}}^{\mathbf{x} + h\mathbf{i}} \mathbf{v}(\mathbf{x}') \cdot d\mathbf{x}'.
\tag{19.90}
\]
Since the integrals are path–independent, we can use a straight line connecting x to x + h i, namely a line with y constant, so that dy = 0. Then, we have
\[
\frac{\partial u}{\partial x} = \lim_{h\to 0} \frac{1}{h} \int_{x}^{x+h} v_x(x', y)\,dx' = v_x(x, y).
\tag{19.91}
\]
[Hint: For the second equality, use the corollary to the mean value theorem for integral calculus (Eq. 10.13).] Hence, Eq. 19.89 is an immediate consequence of Eq. 19.87. [Hint: Dot Eq. 19.87 with dx, and use du = grad u · dx (Eq. 18.46).]

Finally, we have the following

Theorem 203 (Irrotational, lamellar, conservative, potential fields). For any suitably smooth vector field v(x), defined in an open bounded simply connected planar region R, we have that irrotational, lamellar, conservative, and potential fields are equivalent terms.

◦ Proof : Irrotational fields are lamellar and lamellar fields are conservative (Theorem 200, p. 758). Moreover, conservative fields are potential (Theorem 202 above). Finally, potential fields are irrotational, because
\[
\frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y} = \frac{\partial^2 u}{\partial y\,\partial x} - \frac{\partial^2 u}{\partial x\,\partial y} = 0
\tag{19.92}
\]
(i.e., k · curl[grad u(x)] = 0, Eq. 18.79). [Of course, the assumption that the vector field v(x) is suitably smooth implies that the hypotheses of the Schwarz theorem on the invertibility of the order of the second mixed derivative (Eq. 9.126) are satisfied.]

◦ Comment. Interestingly, the above result is valid in three dimensions as well. For the sake of simplicity, this fact is addressed in Vol. III, after introducing the three–dimensional Stokes theorem.
• Multiply connected regions
♠
Here, we extend to multiply connected regions (Definition 168, p. 348; see also Subsection 19.2.2) the results of the preceding subsubsection, limited to simply connected regions. We have the following definitions, limited for simplicity to 2-connected regions:

Definition 314 (Quasi–lamellar fields for 2-connected regions). Consider a vector field v(x), defined on the open 2-connected region R. Assume that for any shrinkable contour C ∈ R we have
\[
\int_C \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = 0,
\tag{19.93}
\]
whereas for any counterclockwise contour C ∈ R surrounding the hole in R, the contour integral is path–independent, and equal, say, to Γ:
\[
\int_C \mathbf{v}(\mathbf{x}) \cdot d\mathbf{x} = \Gamma.
\tag{19.94}
\]
Then, the vector field is called quasi–lamellar. [Of course, if Γ = 0 the field is lamellar.]

Definition 315 (Multi–valued potential for 2-connected regions). Consider a vector field v(x), defined on the open 2-connected region R. Assume that, for any counterclockwise contour C ∈ R, the contour integral is path–independent, and equal, say, to Γ. Then, the integral $\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v}(\mathbf{x}') \cdot d\mathbf{x}'$ over any line L ∈ R connecting x0 to x (where x0 is kept fixed and x is allowed to vary) defines a multi–valued function of the upper limit of integration, given by
\[
u(\mathbf{x}) := \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v}(\mathbf{x}') \cdot d\mathbf{x}' + n_W\,\Gamma.
\tag{19.95}
\]
This function is called the multi–valued potential of the vector field v(x). Of course, if Γ = 0, the potential is single–valued. [The comment to Theorem 202, p. 763, regarding the arbitrary additive constant, applies here as well.] ◦ Comment. The results of the preceding subsubsection are essentially valid for multiply connected regions, the main difference being that in such a case the potential function may be multi–valued. In this case, I typically use the terms quasi–potential, quasi–lamellar and quasi–conservative.
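◦ Numerical sketch. A standard illustration of Definitions 314 and 315 (my choice of example, not taken from the text) is the field v = (−y, x)/(x² + y²) on the plane with the origin removed, which is a 2-connected region. In the sketch below, the helper arc_integral is hypothetical. The circulation around the hole is Γ = 2π, the circulation around a shrinkable contour vanishes, and two paths from (1, 0) to (−1, 0) passing on opposite sides of the hole give values of the potential that differ by Γ.

```python
import math

# Sketch: quasi-lamellar field v = (-y, x) / (x^2 + y^2) on the punctured plane.
# The example field and helper names are illustrative assumptions.

def v(x, y):
    r2 = x * x + y * y
    return -y / r2, x / r2


def arc_integral(cx, cy, r, t0, t1, n=2000):
    """Line integral of v . dx along the circular arc centered at (cx, cy), angle t0 -> t1."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        x = cx + r * math.cos(t)
        y = cy + r * math.sin(t)
        v_x, v_y = v(x, y)
        total += (v_x * (-r * math.sin(t)) + v_y * (r * math.cos(t))) * dt
    return total


# Counterclockwise contour surrounding the hole (Eq. 19.94).
gamma = arc_integral(0.0, 0.0, 1.0, 0.0, 2.0 * math.pi)

# Shrinkable contour: circle of radius 1 centered at (3, 0) (Eq. 19.93).
shrinkable = arc_integral(3.0, 0.0, 1.0, 0.0, 2.0 * math.pi)

# Two paths from (1, 0) to (-1, 0): above and below the hole.
u_upper = arc_integral(0.0, 0.0, 1.0, 0.0, math.pi)
u_lower = arc_integral(0.0, 0.0, 1.0, 0.0, -math.pi)
print(gamma, shrinkable, u_upper - u_lower)
```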
19.5.4 Exact differential forms revisited

As stated in Definition 288, p. 708, a differential form M(x, y) dx + N(x, y) dy is called exact iff there exists a function u(x, y) such that M(x, y) = ∂u/∂x and N(x, y) = ∂u/∂y. Then, in Theorem 189, p. 708, we obtained the necessary conditions for the differential form to be exact. We now have the tools to obtain the corresponding sufficient conditions. Specifically, we have the following

Theorem 204 (Sufficient condition for exact differential form). Let R denote an open simply connected planar region. Assume ∂N/∂x and ∂M/∂y to be continuous for every point of R, with
\[
\frac{\partial N}{\partial x} = \frac{\partial M}{\partial y},
\tag{19.96}
\]
for all the points of R. The above condition is sufficient for the differential form M dx + N dy to be exact in R.

◦ Proof : The condition is sufficient for the existence of u(x, y), with M = ∂u/∂x and N = ∂u/∂y (Theorem 203, p. 764, with v_x = M and v_y = N). Therefore, the form is exact, by definition (Eq. 18.3).
19.5.5 Integrating factor
♣
Consider a differential form M(x, y) dx + N(x, y) dy that is not exact, namely ∂N/∂x ≠ ∂M/∂y. We wonder if there exists a function P(x, y) such that the form P(x, y) M(x, y) dx + P(x, y) N(x, y) dy is an exact differential form. We have the following

Definition 316 (Integrating factor for exact differential forms). A function P(x, y) is called an integrating factor for the differential form M dx + N dy iff multiplication by P(x, y) generates a new form, P M dx + P N dy, that is exact. We have the following

Theorem 205 (Equation for integrating factor). The integrating factor, if it exists, must satisfy the equation
\[
N\,\frac{\partial P}{\partial x} - M\,\frac{\partial P}{\partial y} + P \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) = 0.
\tag{19.97}
\]

◦ Proof : Indeed, according to Theorem 204 above, in order for the new form to be exact, it must satisfy the condition (use Eq. 19.96)
\[
\frac{\partial (P N)}{\partial x} = \frac{\partial (P M)}{\partial y},
\tag{19.98}
\]
which is fully equivalent to Eq. 19.97.
◦ Comment. Note that Eq. 19.97 may be written as
\[
a(x, y)\,\frac{\partial P(x, y)}{\partial x} + b(x, y)\,\frac{\partial P(x, y)}{\partial y} + c(x, y)\,P(x, y) = 0,
\tag{19.99}
\]
with a(x, y) = N(x, y), b(x, y) = −M(x, y), and c(x, y) = ∂N/∂x − ∂M/∂y. This equation is an example of a linear first–order partial differential equation, whose analysis (namely the theorems of existence and uniqueness, and the corresponding boundary conditions) is beyond the scope of this book. Here, it suffices to say that the original form may be transformed into an exact one, provided that we can find a nontrivial solution to Eq. 19.99. [Note that the equation is homogeneous, and hence there always exists a trivial (and useless) solution P(x, y) = 0.]

For future reference, let us introduce the following

Theorem 206. If there exists a second integrating factor, say P1[P(x, y)], then necessarily
\[
P_1(x, y) = C\,P(x, y),
\tag{19.100}
\]
where C > 0 is an arbitrary constant.

◦ Proof : ♣ Indeed, according to Theorem 204 above, in order for the new form (that is, P1 M dx + P1 N dy) to be exact, it must satisfy the condition ∂(P1 N)/∂x = ∂(P1 M)/∂y (Eq. 19.96), namely
\[
\frac{dP_1}{dP}\,\frac{\partial P}{\partial x}\,N + P_1\,\frac{\partial N}{\partial x}
= \frac{dP_1}{dP}\,\frac{\partial P}{\partial y}\,M + P_1\,\frac{\partial M}{\partial y}.
\tag{19.101}
\]
Combining with Eq. 19.97, we obtain, necessarily,
\[
\frac{dP_1}{dP} = \frac{P_1}{P},
\tag{19.102}
\]
as you may verify. The solution to this equation is ln |P1| = ln |P| + ln C, where ln C (with C > 0) is an arbitrary constant of integration. This implies ln |P1/(P C)| = 0, namely |P1/P| = C, in agreement with Eq. 19.100.
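◦ Numerical sketch. A simple example (my choice, not taken from the text) is the form y dx − x dy, which is not exact, since ∂N/∂x − ∂M/∂y = −2. The function P = 1/y² is an integrating factor for y ≠ 0, since P M dx + P N dy = d(x/y). The fragment below, whose helper names are hypothetical, uses central differences to check Eq. 19.96 for the original and the scaled forms, and the residual of Eq. 19.97.

```python
# Sketch: integrating factor P = 1/y^2 for the form y dx - x dy (Eqs. 19.96-19.98).
# The example form and helper names are illustrative assumptions.

def M(x, y):
    return y


def N(x, y):
    return -x


def P(x, y):
    return 1.0 / (y * y)


def ddx(func, x, y, h=1e-5):
    """Central-difference approximation of the partial derivative in x."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def ddy(func, x, y, h=1e-5):
    """Central-difference approximation of the partial derivative in y."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def exactness_defect(A, B, x, y):
    """dB/dx - dA/dy for the form A dx + B dy; it vanishes when Eq. 19.96 holds."""
    return ddx(B, x, y) - ddy(A, x, y)


def residual_19_97(x, y):
    """Left side of Eq. 19.97 evaluated with finite differences."""
    return (N(x, y) * ddx(P, x, y) - M(x, y) * ddy(P, x, y)
            + P(x, y) * (ddx(N, x, y) - ddy(M, x, y)))


def PM(x, y):
    return P(x, y) * M(x, y)


def PN(x, y):
    return P(x, y) * N(x, y)


points = [(0.3, 0.7), (1.2, 2.0), (-0.8, 1.5)]
defects_original = [exactness_defect(M, N, x, y) for x, y in points]
defects_scaled = [exactness_defect(PM, PN, x, y) for x, y in points]
residuals = [residual_19_97(x, y) for x, y in points]
print(defects_original, defects_scaled, residuals)
```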
19.6 Integrals in space Here, we extend the above considerations to three–dimensional spaces. Accordingly, here we introduce three different types of integrals that we may encounter in space. Specifically, line integrals are covered in Subsection 19.6.1, surface integrals (namely integrals over a surface) are addressed in Subsection 19.6.2, and triple integrals (namely volume integrals) are presented in Subsection 19.6.3.
19.6.1 Line integrals in space

Line integrals in two dimensions have been addressed in Section 19.1. You may review that material and verify that it applies as well to line integrals in three dimensions, with suitable self–evident modifications. These include in particular the material in Section 19.2, on path–independent line integrals.

◦ Comment. The definitions of lamellar, conservative and potential fields on a plane (Definitions 311, 312 and 313, pp. 762–763) require only the definition of line integrals and therefore may be used in space as well. [The analysis of irrotational fields requires the use of the Stokes theorem in space, not addressed in this volume (see Subsection 19.6.4), and hence it is postponed to Vol. III.] Then, we have the following (analogous to Theorem 201, p. 759)

Theorem 207 (Potential, conservative and lamellar fields). Let R denote an open bounded simply connected region. Consider a vector field v(x) defined on R. If its contour integral vanishes for any C ∈ R (lamellar field), then v(x) is conservative and potential: we have a single–valued u(x) with
\[
\operatorname{grad} u(\mathbf{x}) = \mathbf{v}(\mathbf{x}).
\tag{19.103}
\]
Vice versa, if v(x) admits a single–valued potential u(x), its contour integral vanishes for any C ∈ R (lamellar), with path–independent line integral (conservative).

◦ Proof : If the contour integral vanishes (lamellar field), then the line integral from x0 to x is path–independent (conservative field), as you may verify. If x0 ∈ R is fixed and x ∈ R is allowed to vary, then
\[
u(\mathbf{x}) := \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v} \cdot d\mathbf{x} + C,
\tag{19.104}
\]
is single–valued. I claim that it satisfies Eq. 19.103 (potential field). Indeed, extending Theorem 202, p. 763, to three dimensions, we have
19. Multivariate integral calculus
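\[ \frac{\partial u}{\partial x} = \lim_{h\to 0}\frac{1}{h}\left[\int_{\mathbf{x}_0}^{\mathbf{x}+h\mathbf{i}} \mathbf{v}\cdot d\mathbf{x} - \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v}\cdot d\mathbf{x}\right] = \lim_{h\to 0}\frac{1}{h}\int_{\mathbf{x}}^{\mathbf{x}+h\mathbf{i}} \mathbf{v}\cdot d\mathbf{x}, \tag{19.105} \]
or, using the x-line connecting x to x + hi (use path–independence),
\[ \frac{\partial u}{\partial x} = \lim_{h\to 0}\frac{1}{h}\int_{x}^{x+h} v_x(x', y, z)\, dx' = v_x(x, y, z). \tag{19.106} \]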
[Hint: For the second equality, use the corollary to the mean value theorem in integral calculus (Eq. 10.13).] Similar proofs hold for the other components. Vice versa, if v(x) = grad u(x) (Eq. 19.103), with u(x) single–valued, then
\[ \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{v}\cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}} \operatorname{grad} u\cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}} du = u(\mathbf{x}) - u(\mathbf{x}_0) \tag{19.107} \]
is path–independent and its contour integral over any C ∈ R vanishes.
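A quick numerical check of Theorem 207 may help. The sketch below assumes a sample potential u = x²y + sin z (chosen here only for illustration), builds v = grad u, and integrates v · dx along two different paths with the same end points, using a simple midpoint rule. The function names and the two paths are hypothetical.

```python
import math

def u(x, y, z):
    # assumed sample potential (not from the text)
    return x * x * y + math.sin(z)

def v(x, y, z):
    # v = grad u for the potential above
    return (2.0 * x * y, x * x, math.cos(z))

x0 = (0.0, 0.0, 0.0)
x1 = (1.0, 2.0, 0.5)

def straight(s):
    # straight segment from x0 to x1, with s in [0, 1]
    return tuple(a + s * (b - a) for a, b in zip(x0, x1))

def curved(s):
    # a different route with the same end points
    return (s ** 2, 2.0 * s + math.sin(math.pi * s), 0.5 * s ** 3)

def line_integral(path, n=2000):
    """Midpoint-rule approximation of the line integral of v . dx along path."""
    total = 0.0
    for i in range(n):
        p0 = path(i / n)
        p1 = path((i + 1) / n)
        vm = v(*path((i + 0.5) / n))
        total += sum(c * (b - a) for c, a, b in zip(vm, p0, p1))
    return total

I_straight = line_integral(straight)
I_curved = line_integral(curved)
exact = u(*x1) - u(*x0)   # right-hand side of Eq. 19.107
print(I_straight, I_curved, exact)
```

Both approximations should agree with u(x) − u(x0) up to the discretization error, as Eq. 19.107 requires.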
19.6.2 Surface integrals
Recall that, by definition, surfaces are connected and not self–intersecting (Definition 169, p. 350). Recall also the definitions of closed surfaces (Definition 171, p. 350) and of regions in three dimensions (Definition 172, p. 351). Then, in analogy with what we did for double integrals, we have the following
Definition 317 (Surface integral). Consider a function f = f(x), and an open surface $\mathcal{S}$ embedded in a three–dimensional space. Let us divide $\mathcal{S}$ into small subsurfaces $\mathcal{S}_k$ (with area $S_k$), and let $\mathbf{x}_k$ denote a point inside $\mathcal{S}_k$. The surface integral of the function f(x) over the surface $\mathcal{S}$, denoted by $\iint_{\mathcal{S}} f(\mathbf{x})\, dS$, is defined by
\[ \iint_{\mathcal{S}} f(\mathbf{x})\, dS := \lim_{S_k\to 0} \sum_{k=0}^{n} f(\mathbf{x}_k)\, S_k, \tag{19.108} \]
provided of course that the limit exists. If the surface $\mathcal{S}$ is closed, the definition is analogous. However, in this case the surface integral of the function f(x) over the closed surface $\mathcal{S}$ is denoted by
\[ \oiint_{\mathcal{S}} f(\mathbf{x})\, dS. \tag{19.109} \]
• Transforming a surface integral into a double integral
♣
Here, we want to show how to evaluate a surface integral. Specifically, we want to show how a surface integral may be expressed as a double integral, which we already know how to evaluate. [For an additional way to evaluate a surface integral, namely by using curvilinear coordinates, see Subsection 19.8.4.]
Fig. 19.11 Projected surface element
To this end, let us consider the normal n(x) to a surface element dS, at a given point x of S. Assume, to make our life simpler, that nz has the same sign for any point of S, say nz > 0. Accordingly, we have n · k = cos α > 0, where α ∈ [0, π/2] is the angle between n and k, which equals the angle between the surface element dS and the (x, y)-plane (Fig. 19.11). Then, we have n · k dS = cos α dS = dSP,
(19.110)
where $dS_P$ denotes the projection of dS into the (x, y)-plane, namely its “shadow” (Definition 105, p. 197). Thus, we have (use |n · k| to cover also the case $n_z < 0$)
\[ \iint_{\mathcal{S}} f(\mathbf{x})\, dS = \iint_{\mathcal{S}_P} f(\mathbf{x})\, \frac{1}{|\mathbf{n}\cdot\mathbf{k}|}\, dx\, dy, \tag{19.111} \]
where $\mathcal{S}_P$ denotes the projection of $\mathcal{S}$ into the (x, y)-plane. [Similar considerations hold if we project into the (y, z) plane, or into the (x, z) plane.]
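To see Eq. 19.111 at work, here is a minimal sketch for an assumed surface: the portion of the circular cylinder z = √(R² − x²) above the rectangle x ∈ [−a, a], y ∈ [0, L], for which n = (x, 0, z)/R and hence n · k = z/R. The geometry, the names and the grid sizes are illustrative choices, not taken from the text.

```python
import math

R, a, L = 1.0, 0.5, 2.0   # assumed geometry, with a < R so that n.k > 0

def z_of(x, y):
    return math.sqrt(R * R - x * x)

def n_dot_k(x, y):
    # unit normal n = (x, 0, z)/R, hence n . k = z/R
    return z_of(x, y) / R

def surface_integral(f, nx=400, ny=4):
    """Midpoint rule on the projected rectangle, with weight 1/|n.k| (Eq. 19.111)."""
    dx, dy = 2.0 * a / nx, L / ny
    total = 0.0
    for i in range(nx):
        x = -a + (i + 0.5) * dx
        for j in range(ny):
            y = (j + 0.5) * dy
            total += f(x, y, z_of(x, y)) / abs(n_dot_k(x, y)) * dx * dy
    return total

area = surface_integral(lambda x, y, z: 1.0)
area_exact = 2.0 * R * math.asin(a / R) * L   # arc length times L
z_integral = surface_integral(lambda x, y, z: z)
z_exact = 2.0 * a * L * R                     # integrand z/(z/R) = R is constant
print(area, area_exact, z_integral, z_exact)
```

The first check compares the computed area with the arc length of the cross–section multiplied by L. The second uses f = z, for which the projected integrand reduces to the constant R.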
19.6.3 Triple (or volume) integrals
Recall Definition 306, p. 746, of double integrals, as well as Fig. 19.4, p. 746. Recall also the definition of a cuboid (Definition 104, p. 196). Then, akin to what we did for double integrals in $\mathbb{R}^2$, we have the following
Definition 318 (Triple integral). Let $\mathcal{V}$ denote a cuboid in the (x, y, z)-space, specifically a closed bounded region with x ∈ [a, b], y ∈ [c, d] and z ∈ [e, f]. Consider a function f(x) = f(x, y, z) that is defined on $\mathcal{V}$. Let us divide the region into n × n × n equal subregions $\mathcal{V}_{hjk}$, with boundaries given by coordinate surfaces. [Specifically, any point P = (x, y, z) of $\mathcal{V}_{hjk}$ is such that (i) $x \in [x_{h-1}, x_h]$, where $x_h = a + h\,\Delta x$ (h = 0, . . . , n), with Δx = (b − a)/n; (ii) $y \in [y_{j-1}, y_j]$, where $y_j = c + j\,\Delta y$ (j = 0, . . . , n), with Δy = (d − c)/n; and (iii) $z \in [z_{k-1}, z_k]$, where $z_k = e + k\,\Delta z$ (k = 0, . . . , n), with Δz = (f − e)/n.] Let $\breve f_{hjk}$ denote the value of f(x, y, z) at the midpoint of the subregion $\mathcal{V}_{hjk}$, namely $\breve f_{hjk} = f(\breve x_h, \breve y_j, \breve z_k)$, where $\breve x_h = \tfrac12 (x_{h-1} + x_h)$, $\breve y_j = \tfrac12 (y_{j-1} + y_j)$ and $\breve z_k = \tfrac12 (z_{k-1} + z_k)$. Then, the triple (or volume) integral of the function f(x) over the region $\mathcal{V}$, denoted by
\[ \iiint_{\mathcal{V}} f(x, y, z)\, dx\, dy\, dz \qquad\text{or}\qquad \iiint_{\mathcal{V}} f(\mathbf{x})\, dV, \tag{19.112} \]
is defined by
\[ \iiint_{\mathcal{V}} f(x, y, z)\, dx\, dy\, dz := \lim_{n\to\infty} \sum_{h,j,k=1}^{n} \breve f_{hjk}\, \Delta x\, \Delta y\, \Delta z, \tag{19.113} \]
provided of course that the limits exist.
◦ Warning. Note the difference between $\mathcal{V}$ (which denotes a region in space) and $V$ (which denotes the volume of $\mathcal{V}$). If f = f(x, y, . . . ), we use the notation
\[ \iiint_{\mathcal{V}} f(\mathbf{x}, \mathbf{y}, \dots)\, dV(\mathbf{x}) \tag{19.114} \]
to indicate that x is the dummy variable of integration.
◦ Comment. The consideration in Subsection 19.3.2 on the Darboux integral may be extended to triple integrals, as you may verify, thereby guaranteeing the existence of the limit for piecewise–continuous functions. As a consequence, we are able to define integrals over arbitrary volumes, following the guidelines used in Subsection 19.3.3, on double integrals over arbitrary regions.
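The midpoint sum in Definition 318 translates almost literally into code. The following is a minimal sketch for a finite n (the limit n → ∞ is of course not taken). The test integrand x² + yz and the cuboid [0, 1] × [0, 2] × [0, 3] are assumptions made for illustration, and the upper z-limit is called g to avoid a clash with the function f.

```python
def triple_integral(f, a, b, c, d, e, g, n=40):
    """Midpoint sum of Definition 318 over the cuboid [a,b] x [c,d] x [e,g]."""
    dx, dy, dz = (b - a) / n, (d - c) / n, (g - e) / n
    total = 0.0
    for h in range(1, n + 1):
        xm = a + (h - 0.5) * dx          # midpoint of [x_{h-1}, x_h]
        for j in range(1, n + 1):
            ym = c + (j - 0.5) * dy      # midpoint of [y_{j-1}, y_j]
            for k in range(1, n + 1):
                zm = e + (k - 0.5) * dz  # midpoint of [z_{k-1}, z_k]
                total += f(xm, ym, zm)
    return total * dx * dy * dz

approx = triple_integral(lambda x, y, z: x * x + y * z, 0.0, 1.0, 0.0, 2.0, 0.0, 3.0)
exact = 11.0   # (1/3)*2*3 from the x^2 term, plus 1*2*(9/2) from the y*z term
print(approx, exact)
```

Increasing n reduces the difference between the sum and the exact value, in line with the limit in Eq. 19.113.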
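• Properties of triple integrals
Akin to Eq. 10.7 (linear integrals) and Eq. 19.39 (double integrals), we have
\[ \iiint_{\mathcal{V}} \big[\alpha\, f(\mathbf{x}) + \beta\, g(\mathbf{x})\big]\, dV = \alpha \iiint_{\mathcal{V}} f(\mathbf{x})\, dV + \beta \iiint_{\mathcal{V}} g(\mathbf{x})\, dV. \tag{19.115} \]
In addition, consider a closed region $\mathcal{V}$, along with a finite–area surface $\mathcal{S}$ that divides $\mathcal{V}$ into two closed volumes $\mathcal{V}_1$ and $\mathcal{V}_2$, so that $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$. Then, akin to Eq. 10.8 (linear integrals) and Eq. 19.40 (double integrals), we have
\[ \iiint_{\mathcal{V}_1} f(\mathbf{x})\, dV + \iiint_{\mathcal{V}_2} f(\mathbf{x})\, dV = \iiint_{\mathcal{V}} f(\mathbf{x})\, dV. \tag{19.116} \]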
• Iterated integrals
In analogy with what was done for double integrals in Section 19.4, under similar conditions we may use iterated integrals for volume integrals as well. If the region is a cuboid, with x ∈ [0, a], y ∈ [0, b], z ∈ [0, c], we have
\[ \iiint_{\mathcal{V}} f(x, y, z)\, dV = \int_0^a \int_0^b \int_0^c f(x, y, z)\, dz\, dy\, dx. \tag{19.117} \]
It should be noted that if f(x, y, z) = X(x) Y(y) Z(z), we have
\[ \iiint_{\mathcal{V}} f(x, y, z)\, dV = \int_0^a X(x)\, dx \int_0^b Y(y)\, dy \int_0^c Z(z)\, dz, \tag{19.118} \]
as you may verify.
Next, consider more general regions. Assume that any line parallel to the x-axis through $\mathcal{V}$ intersects the boundary $\mathcal{S}$ of $\mathcal{V}$ only twice. Let $P_1$ and $P_2$ (with $x_1 < x_2$) denote the two points of $\mathcal{S}$ intersected by such a line. Let $\mathcal{S}_k$ (k = 1, 2) denote the surface defined as the locus of all the points $P_k$. Let $x = X_1(y, z)$ and $x = X_2(y, z)$ be the explicit representations for the surfaces $\mathcal{S}_1$ and $\mathcal{S}_2$. Let $\mathcal{R}$ denote the region corresponding to the projection of the surfaces $\mathcal{S}_1$ and $\mathcal{S}_2$ into the (y, z) plane. Then, in analogy with the Fubini theorem for convex planar regions (Eq. 19.63), we have
\[ \begin{aligned} \iiint_{\mathcal{V}} f(x, y, z)\, dV &= \iint_{\mathcal{R}} \left[\int_{X_1(y,z)}^{X_2(y,z)} f(x, y, z)\, dx\right] dy\, dz \\ &= \iint_{\mathcal{R}} \Big\{ F[X_2(y, z), y, z] - F[X_1(y, z), y, z] \Big\}\, dy\, dz, \end{aligned} \tag{19.119} \]
where F (x, y, z) is the primitive of f (x, y, z) with respect to the variable x, namely Fx (x, y, z) = f (x, y, z). Then, we can use again the Fubini theorem for convex regions (Eq. 19.63), and obtain that the volume integral may be expressed as three iterated integrals.
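As a sketch of Eq. 19.119, consider an assumed region: the tetrahedron x, y, z ≥ 0 with x + y + z ≤ 1, so that X1 = 0 and X2 = 1 − y − z. For the illustrative integrand f = x, the primitive is F = x²/2, and the remaining double integral over the triangle in the (y, z) plane is evaluated with two nested midpoint rules. The helper names are hypothetical.

```python
def midpoint(g, lo, hi, n=200):
    """One-dimensional midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def X1(y, z):
    return 0.0                 # lower surface x = X1(y, z)

def X2(y, z):
    return 1.0 - y - z         # upper surface x = X2(y, z)

def F(x, y, z):
    return 0.5 * x * x         # primitive of the assumed integrand f = x

def inner(y, z):
    # integrand of the double integral in Eq. 19.119
    return F(X2(y, z), y, z) - F(X1(y, z), y, z)

# projection R: y in [0, 1 - z], z in [0, 1]
result = midpoint(lambda z: midpoint(lambda y: inner(y, z), 0.0, 1.0 - z), 0.0, 1.0)
exact = 1.0 / 24.0
print(result, exact)
```

The same pattern applies to other regions, provided that each line parallel to the x-axis crosses the boundary only twice, as assumed in the text.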
19.6.4 Three–dimensional Gauss and Stokes theorems
Using the preceding results, we could at this point introduce the Gauss and Stokes theorems in three dimensions. However, they will be needed only in Vol. III. Therefore, they will be addressed there.
19.7 Time derivative of volume integrals
♣
In this section, we consider the time derivative of a volume integral in which both the integrand and the region of integration vary with time, namely
\[ F(t) = \iiint_{\mathcal{V}(t)} f(\mathbf{x}, t)\, dV. \tag{19.120} \]
Note that F(t) is only a function of t. [Note also the analogy with the derivative of one–dimensional integrals (Eq. 10.97).]
◦ Comment. Incidentally, here t denotes time. However, in general it may represent any independent variable.
We will need the following
Definition 319 (Surface speed). For a surface $\mathcal{S}(t)$, we use the term “speed” and not “velocity,” because for surfaces it does not make any sense to speak of “tangential velocity.” [For instance, if a body of revolution (namely an axisymmetric body) is spinning around its axis, its boundary (a surface!) does not move. The speed of the surface is zero, although the body points on the surface have a nonzero velocity. Specifically, if a body point x on $\mathcal{S}$ has a velocity v, we have that the surface speed is given by $v_S := \mathbf{v}\cdot\mathbf{n}$, where v(x) and n(x) denote, respectively, the velocity of x and the unit normal there.]
We have the following
Theorem 208 (Time derivative of a volume integral). The time derivative of F(t) is given by
\[ \frac{dF}{dt} = \iiint_{\mathcal{V}(t)} \frac{\partial f}{\partial t}\, dV + \iint_{\mathcal{S}(t)} f\, v_S\, dS, \tag{19.121} \]
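where $v_S$ is the speed of $\mathcal{S}(t)$ at x, positive if the surface moves in the direction of its outer normal n(x).
◦ Proof : To prove Eq. 19.121, note that, by definition of derivative,
\[ \frac{dF}{dt} = \lim_{\Delta t\to 0} \frac{1}{\Delta t} \left[ \iiint_{\mathcal{V}(t')} f(\mathbf{x}, t')\, dV - \iiint_{\mathcal{V}(t)} f(\mathbf{x}, t)\, dV \right], \tag{19.122} \]
where Δt = t′ − t. Next, set $\mathcal{V}_0 = \mathcal{V}(t) \cap \mathcal{V}(t')$, $\mathcal{V}_1 := \mathcal{V}(t) \backslash \mathcal{V}_0$ and $\mathcal{V}_2 := \mathcal{V}(t') \backslash \mathcal{V}_0$ (Fig. 19.12). [Recall Definition 165, p. 347: (i) $\mathcal{V}_A = \mathcal{V}_B \cap \mathcal{V}_C$ is composed of the points common to $\mathcal{V}_B$ and $\mathcal{V}_C$ (intersection), whereas (ii) $\mathcal{V}_A = \mathcal{V}_B \backslash \mathcal{V}_C$ is composed of the points of $\mathcal{V}_B$ that do not belong to $\mathcal{V}_C$ (relative complement).] Hence, Eq. 19.122 may be written as
\[ \frac{dF}{dt} = \lim_{\Delta t\to 0} \frac{1}{\Delta t} \left[ \iiint_{\mathcal{V}_0 \cup \mathcal{V}_2} f(\mathbf{x}, t')\, dV - \iiint_{\mathcal{V}_0 \cup \mathcal{V}_1} f(\mathbf{x}, t)\, dV \right]. \tag{19.123} \]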
Fig. 19.12 Derivative of integral
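Next, consider
\[ \begin{aligned} I_V &:= \lim_{\Delta t\to 0} \frac{1}{\Delta t} \left[ \iiint_{\mathcal{V}_0} f(\mathbf{x}, t')\, dV - \iiint_{\mathcal{V}_0} f(\mathbf{x}, t)\, dV \right] \\ &= \lim_{\Delta t\to 0} \frac{1}{\Delta t} \iiint_{\mathcal{V}_0} \big[ f(\mathbf{x}, t') - f(\mathbf{x}, t) \big]\, dV = \iiint_{\mathcal{V}} \frac{\partial f}{\partial t}\, dV. \end{aligned} \tag{19.124} \]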
[For a rigorous approach, you might want to extend to three dimensions the formulation used in Subsection 10.6.2 for one dimension.] On the other hand, for small values of Δt, the volumes V1 and V2 consist of thin layers, respectively with thicknesses δ1 and δ2 , given by
\[ \delta_1 := -v_S\, \Delta t > 0 \qquad\text{and}\qquad \delta_2 := v_S\, \Delta t > 0. \tag{19.125} \]
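[Hint: Regarding the signs, note that, if $\mathbf{v}_2$ is the velocity of the point $\mathbf{x}_2$, then $\delta_2 = \mathbf{n}\cdot\Delta\mathbf{x}_2 = \mathbf{n}\cdot\mathbf{v}_2\, \Delta t = v_{S_2}\, \Delta t > 0$, because $v_{S_2} = \mathbf{v}_2\cdot\mathbf{n}$ is positive for any point on $\mathcal{S}_2$, where the surface moves in the direction of the outer normal n. Vice versa, $\delta_1 = -\mathbf{n}\cdot\Delta\mathbf{x}_1 = -\mathbf{n}\cdot\mathbf{v}_1\, \Delta t = -v_{S_1}\, \Delta t > 0$. Hence, the expressions given for the thicknesses are both positive.] Then, denoting with $\mathcal{S}_k$ the portion of the boundary $\mathcal{S}$ in contact with $\mathcal{V}_k$ (k = 1, 2), we have (using $dV_k = \delta_k\, dS_k$)
\[ \begin{aligned} I_B &:= \lim_{\Delta t\to 0} \frac{1}{\Delta t} \left[ \iiint_{\mathcal{V}_2} f(\mathbf{x}, t')\, dV_2 - \iiint_{\mathcal{V}_1} f(\mathbf{x}, t)\, dV_1 \right] \\ &= \lim_{\Delta t\to 0} \frac{1}{\Delta t} \left[ \iint_{\mathcal{S}_2} f(\mathbf{x}, t')\, \delta_2\, dS_2 - \iint_{\mathcal{S}_1} f(\mathbf{x}, t)\, \delta_1\, dS_1 \right] \\ &= \lim_{\Delta t\to 0} \frac{1}{\Delta t} \iint_{\mathcal{S}_1 \cup \mathcal{S}_2} f(\mathbf{x}, t)\, v_S\, \Delta t\, dS = \iint_{\mathcal{S}} f(\mathbf{x}, t)\, v_S\, dS. \end{aligned} \tag{19.126} \]
Finally, using $dF/dt = I_V + I_B$ (Eq. 19.123), one obtains Eq. 19.121.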
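Theorem 208 can be checked numerically on a simple case. The sketch below assumes a ball whose radius grows as R(t) = 1 + t/2 (so that v_S = 1/2 everywhere on the boundary) and a radially symmetric integrand f = e^(−t) r². Both choices are illustrative. Integrals over the ball are computed by summing thin spherical shells of volume 4πr² dr.

```python
import math

def R(t):
    return 1.0 + 0.5 * t        # assumed radius history

def dRdt(t):
    return 0.5                  # surface speed v_S, uniform on the sphere

def f(r, t):
    return math.exp(-t) * r * r   # assumed integrand, function of r and t only

def dfdt(r, t):
    return -math.exp(-t) * r * r  # partial derivative of f with respect to t

def ball_integral(g, radius, n=2000):
    """Integral of a radial function g(r) over a ball, as a sum of thin shells."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += g(r) * 4.0 * math.pi * r * r * dr
    return total

def F(t):
    return ball_integral(lambda r: f(r, t), R(t))

t, dt = 0.3, 1e-4
lhs = (F(t + dt) - F(t - dt)) / (2.0 * dt)          # dF/dt by central difference
volume_term = ball_integral(lambda r: dfdt(r, t), R(t))
surface_term = f(R(t), t) * dRdt(t) * 4.0 * math.pi * R(t) ** 2
rhs = volume_term + surface_term                     # right-hand side of Eq. 19.121
print(lhs, rhs)
```

The two numbers should agree up to discretization error. Dropping the surface term makes the discrepancy obvious, which illustrates why the moving boundary cannot be ignored.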
19.8 Appendix. Integrals in curvilinear coordinates
♣
In Subsection 19.6.2, we have seen how a surface integral may be evaluated as a double integral, via a projection onto a coordinate surface (Eq. 19.111). However, sometimes, surface integrals are more easily evaluated by introducing curvilinear coordinates over the surface. Similar considerations hold true for double and triple integrals as well. To give you an example, in Subsection 19.8.1, we show that a double integral over a disk is conveniently obtained by using polar coordinates (Eq. 7.45). Similarly, in Subsections 19.8.2 and 19.8.3, we show that when the region of integration is either cylindrical or spherical, we are better off by using cylindrical or spherical coordinates. Are we limited to cylindrical and spherical coordinates? No! We can use arbitrary curvilinear coordinates! These are introduced in Subsection 19.8.4, along with their use for evaluating volume and surface integrals. The procedure is based upon the use of the Jacobian, also introduced there. What happens if the curvilinear coordinates are orthogonal? In this case, the expressions simplify a bit, as shown in Subsection 19.8.5, where we present the general formulation for orthogonal curvilinear coordinates, and also show how this yields, as particular cases, the results for cylindrical and spherical coordinates.
19.8.1 Polar coordinates. Double integrals
♣
In two dimensions, you are already familiar with polar coordinates r and θ, with r ∈ [0, ∞) and θ ∈ (−π, π], introduced in Eq. 7.45, namely (see Fig. 7.7)
\[ x = r \cos\theta \qquad\text{and}\qquad y = r \sin\theta. \tag{19.127} \]
If we keep r constant, then Eq. 19.127 describes a circle. On the other hand, if we keep θ fixed and vary r, we describe a ray through the origin. The lines r = constant (circles) and θ = constant (rays) are called coordinate lines. Before considering an illustrative example, let us introduce the following
Definition 320 (Image). Given a one–to–one (Definition 12, p. 32) transformation $\mathbf{x}(\xi^1, \xi^2)$, and a region $\mathcal{R}$ in the $(x_1, x_2)$ plane, the image of $\mathcal{R}$ in the $(\xi^1, \xi^2)$ plane, denoted by $\mathring{\mathcal{R}}$, is the locus of all the points of the $(\xi^1, \xi^2)$ plane that correspond to the (x, y) points of the region $\mathcal{R}$.
In order to address a double integral, consider the region $\mathcal{R}$, with $r \in [R_1, R_2]$ and $\theta \in [\Theta_1, \Theta_2]$ (see Fig. 19.13.a, where $R_1 = 1$ and $R_2 = 5$, whereas $\Theta_1 = \pi/4$ and $\Theta_2 = 3\pi/4$). Consider the image $\mathring{\mathcal{R}}$ of the region $\mathcal{R}$ in the (r, θ) plane, namely the locus of all the points of the (r, θ) plane that correspond to the (x, y) points of the region $\mathcal{R}$ (Definition 320 of image, above), in our case those with $r \in [R_1, R_2]$ and $\theta \in [\Theta_1, \Theta_2]$ (Fig. 19.13.b). Let us subdivide the image $\mathring{\mathcal{R}}$ into $n^2$ equal rectangles, say $\mathring{\mathcal{R}}_{hk}$, with sides defined by coordinate lines. Specifically, for any point in $\mathring{\mathcal{R}}_{hk}$ we have: (i) $r \in (r_{h-1}, r_h)$, where $r_h = R_1 + h\,\Delta r$ (h = 0, . . . , n), with $\Delta r = (R_2 - R_1)/n$, and (ii) $\theta \in (\theta_{k-1}, \theta_k)$, where $\theta_k = \Theta_1 + k\,\Delta\theta$ (k = 0, . . . , n), with $\Delta\theta = (\Theta_2 - \Theta_1)/n$.
Fig. 19.13 Region of integration in (x, y)- and (θ, r)-planes
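Let $\breve r_h = \tfrac12 (r_h + r_{h-1})$ and $\breve\theta_k = \tfrac12 (\theta_k + \theta_{k-1})$ denote the midpoint coordinates of $\mathring{\mathcal{R}}_{hk}$. Neglecting higher–order terms, the area $A_{hk}$ of the region $\mathcal{R}_{hk}$ is given by $A_{hk} = \breve r_h\, \Delta\theta_k\, \Delta r_h$. Then, as in Eq. 19.28, we take the limit, and define
\[ \iint_{\mathcal{R}} f(x, y)\, dA := \lim_{n\to\infty} \sum_{h,k=1}^{n} f(\breve r_h, \breve\theta_k)\, \breve r_h\, \Delta r_h\, \Delta\theta_k, \tag{19.128} \]
or
\[ \iint_{\mathcal{R}} f(x, y)\, dA = \iint_{\mathring{\mathcal{R}}} f(r, \theta)\, r\, d\theta\, dr, \tag{19.129} \]
where, again, $\mathring{\mathcal{R}}$ is the image of $\mathcal{R}$ in the (r, θ) plane.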
• An illustrative example: area of a disk
As a first illustrative example, we want to use our newly acquired know–how to obtain the area A of a disk. Consider a disk with center at the origin and radius R. Its boundaries are defined by r ∈ [0, R] and θ ∈ [−π, π]. Recall the convention used for double integrals, namely that the limits on the first integral sign refer to the last differential (Remark 142, p. 751). Accordingly, we have
\[ A = \int_{-\pi}^{\pi} \int_0^R r\, dr\, d\theta = \int_{-\pi}^{\pi} d\theta \int_0^R r\, dr = 2\pi \left[ \frac{r^2}{2} \right]_0^R = \pi R^2, \tag{19.130} \]
in agreement with Eq. 6.90.
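The polar midpoint sum of Eq. 19.128 can be sketched in a few lines. The function below is a hypothetical helper that evaluates the sum for a finite n over r ∈ [R1, R2], θ ∈ [Θ1, Θ2]. It is applied to a disk of radius 2 (an assumed value) and to the annular sector with R1 = 1, R2 = 5, Θ1 = π/4, Θ2 = 3π/4 used in Fig. 19.13.

```python
import math

def polar_integral(f, R1, R2, T1, T2, n=200):
    """Midpoint sum of Eq. 19.128: f evaluated at cell midpoints, weight r dr dtheta."""
    dr = (R2 - R1) / n
    dth = (T2 - T1) / n
    total = 0.0
    for h in range(1, n + 1):
        r = R1 + (h - 0.5) * dr
        for k in range(1, n + 1):
            th = T1 + (k - 0.5) * dth
            total += f(r, th) * r * dr * dth
    return total

disk_area = polar_integral(lambda r, th: 1.0, 0.0, 2.0, -math.pi, math.pi)
sector_area = polar_integral(lambda r, th: 1.0, 1.0, 5.0, math.pi / 4, 3 * math.pi / 4)
print(disk_area, 4.0 * math.pi)     # pi R^2 with R = 2
print(sector_area, 6.0 * math.pi)   # (R2^2 - R1^2)/2 times (Theta2 - Theta1)
```

For f = 1 the integrand r is linear in r, so the midpoint sum reproduces the exact areas up to rounding.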
• A second illustrative example: volume of a ball
As a second illustrative example, let us evaluate the volume of a ball of radius R. Consider the boundary of the ball, namely a sphere of radius R and center at the origin. The explicit representation for the top hemisphere (namely for z ≥ 0) is given by
\[ z(x, y) = \sqrt{R^2 - x^2 - y^2}. \tag{19.131} \]
The volume of the ball inside the sphere is twice that of the region inside the top hemisphere, namely
\[ V = 2 \iint_{\mathcal{R}} z(x, y)\, dx\, dy = 2 \iint_{\mathcal{R}} \sqrt{R^2 - x^2 - y^2}\, dx\, dy, \tag{19.132} \]
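where $\mathcal{R}$ denotes the disk of radius R on the plane z = 0. Shifting to polar coordinates (Eq. 19.127), we have
\[ \begin{aligned} V &= 2 \int_{-\pi}^{\pi} \int_0^R \sqrt{R^2 - r^2}\; r\, dr\, d\theta = 4\pi \int_0^R \sqrt{R^2 - r^2}\; r\, dr \\ &= -\frac{4}{3}\, \pi \Big[ \big(R^2 - r^2\big)^{3/2} \Big]_0^R = \frac{4}{3}\, \pi R^3. \end{aligned} \tag{19.133} \]
[Hint: Use dA = r dr dθ (Eq. 19.129) and r ∈ [0, R] and θ ∈ [−π, π], as well as $\big[(R^2 - r^2)^{3/2}\big]' = -3\, r \sqrt{R^2 - r^2}$.]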
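• A third illustrative example. The Gauss error function
As a third illustrative example, we want to show that
\[ I = \int_{-\infty}^{\infty} e^{-a x^2}\, dx = \sqrt{\frac{\pi}{a}} \qquad (a > 0). \tag{19.134} \]
Indeed, we have
\[ \begin{aligned} I^2 &= \int_{-\infty}^{\infty} e^{-a x^2}\, dx \int_{-\infty}^{\infty} e^{-a y^2}\, dy = \iint_{\mathbb{R}^2} e^{-a (x^2 + y^2)}\, dx\, dy \\ &= \int_{-\pi}^{\pi} \int_0^{\infty} e^{-a r^2}\, r\, dr\, d\theta = 2\pi \int_0^{\infty} e^{-a r^2}\, r\, dr = \pi \int_0^{\infty} e^{-a u}\, du \\ &= \pi \left[ \frac{e^{-a u}}{-a} \right]_0^{\infty} = \frac{\pi}{a}, \end{aligned} \tag{19.135} \]
in agreement with Eq. 19.134. [Hint: Set $u := r^2$.]
◦ Comment. As an application of the above result, consider the Gauss error function, erf(x), which is defined by
\[ \operatorname{erf}(x) := \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\, du. \tag{19.136} \]
Using Eq. 19.134 we obtain erf(∞) = 1.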
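Equation 19.134 is easy to confirm numerically. The sketch below assumes a = 2.5 and truncates the infinite interval to [−10, 10], where the neglected tails are far below rounding error for this value of a. It also evaluates the standard-library function math.erf at a large argument as a stand-in for erf(∞).

```python
import math

def gauss_integral(a, half_width=10.0, n=20000):
    """Midpoint rule for the integral of exp(-a x^2) over [-half_width, half_width]."""
    dx = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += math.exp(-a * x * x)
    return total * dx

a = 2.5                                  # assumed sample value, a > 0
I_num = gauss_integral(a)
I_exact = math.sqrt(math.pi / a)         # Eq. 19.134
erf_large = math.erf(10.0)               # numerically indistinguishable from erf(infinity)
print(I_num, I_exact, erf_large)
```

The truncation half-width and the number of points are arbitrary choices. For much smaller a the interval would have to be widened accordingly.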
19.8.2 Cylindrical coordinates. Volume integrals
♣
In three dimensions, you might be familiar with cylindrical coordinates, close cousins of polar coordinates, namely
\[ x = r \cos\theta, \qquad y = r \sin\theta, \qquad z = z, \tag{19.137} \]
where r ∈ [0, ∞), θ ∈ [−π, π], and z ∈ (−∞, ∞) (Fig. 19.14). ◦ Comment. Note that z is a coordinate in both systems. Indeed, it refers to the same geometrical entity, namely the vertical distance from the (x, y)plane. Hence, we use the same symbol for both systems.
Fig. 19.14 Cylindrical coordinates
If we keep both r = r0 and z = z0 fixed and vary only θ, we describe the circle with radius r0 , on the plane z = z0 . If we keep both θ = θ0 and z = z0 fixed and vary only r, we describe a ray on the plane z = z0 , forming an angle θ0 with respect to the x-axis. Finally, if we keep both θ = θ0 and r = r0 fixed and vary only z, we describe a vertical straight line through the point (x0 , y0 , 0), with x0 = r0 cos θ0 , y0 = r0 sin θ0 . These lines are referred to as cylindrical coordinate lines. On the other hand, if we keep only r = r0 fixed and vary both θ and z, we describe the cylindrical surface with radius r0 , centered around the z-axis. If we keep only θ = θ0 fixed and vary both r and z, we describe a vertical half–plane from the z-axis, forming an angle θ0 with the x-axis. Finally, if we keep only z = z0 fixed and vary both r and θ, we describe a horizontal plane through the point (0, 0, z0 ). These surfaces are referred to as cylindrical coordinate surfaces.
• Evaluation of volume integrals in cylindrical coordinates
In order to evaluate a triple integral using cylindrical coordinates, consider the image $\mathring{\mathcal{V}}$ of the region $\mathcal{V}$ in the (r, θ, z)-space (namely the locus of the points r, θ, z corresponding to the points x, y, z that belong to the region $\mathcal{V}$). Let us subdivide the image $\mathring{\mathcal{V}}$ of $\mathcal{V}$ into n × n × n equal cuboids $\mathring{\mathcal{V}}_k$, with sides
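defined by coordinate planes. Neglecting higher–order terms, the volume $V_k$ of $\mathcal{V}_k$ is given by $\Delta r\, (\breve r_k\, \Delta\theta)\, \Delta z$. Then, as in Eq. 19.28, we take the limit and obtain
\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = \iiint_{\mathring{\mathcal{V}}} f(r, \theta, z)\, r\, dr\, d\theta\, dz, \tag{19.138} \]
where, again, $\mathring{\mathcal{V}}$ is the image of $\mathcal{V}$.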
• An illustrative example: volume of a circular cylinder
As an illustrative example, let us obtain the volume of a circular cylindrical solid figure, with circular cross–section of radius R, and length ℓ. Let us choose the z-axis to coincide with the cylindrical axis. Accordingly, using cylindrical coordinates, the boundaries are defined by r ∈ [0, R], θ ∈ [−π, π], and z ∈ [0, ℓ]. Thus, recalling again the convention used for iterated triple integrals (namely that the limits on the first integral sign refer to the last differential, Remark 142, p. 751), we have (use Eq. 19.118)
\[ \begin{aligned} V &= \iiint_{\mathcal{V}} dV = \iiint_{\mathring{\mathcal{V}}} r\, dr\, d\theta\, dz = \int_0^{\ell} \int_{-\pi}^{\pi} \int_0^R r\, dr\, d\theta\, dz \\ &= \int_0^{\ell} dz \int_{-\pi}^{\pi} d\theta \int_0^R r\, dr = \pi R^2\, \ell, \end{aligned} \tag{19.139} \]
namely the area of its base multiplied by its height.
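A direct numerical evaluation of Eq. 19.138 over a circular cylinder might look as follows. The radius, the length and the second integrand f = z r² are assumptions chosen so that closed-form values are available for comparison.

```python
import math

def cylindrical_integral(f, R, length, n=40):
    """Midpoint sum for Eq. 19.138 over r in [0,R], theta in [-pi,pi], z in [0,length]."""
    dr, dth, dz = R / n, 2.0 * math.pi / n, length / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = -math.pi + (j + 0.5) * dth
            for k in range(n):
                z = (k + 0.5) * dz
                total += f(r, th, z) * r * dr * dth * dz   # note the factor r
    return total

R, length = 1.5, 2.0   # assumed dimensions
volume = cylindrical_integral(lambda r, th, z: 1.0, R, length)
volume_exact = math.pi * R ** 2 * length                 # Eq. 19.139
moment = cylindrical_integral(lambda r, th, z: z * r * r, R, length)
moment_exact = math.pi * length ** 2 * R ** 4 / 4.0      # product form, Eq. 19.118
print(volume, volume_exact, moment, moment_exact)
```

Omitting the factor r in the sum would give the volume of the image cuboid in the (r, θ, z)-space rather than that of the cylinder.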
19.8.3 Spherical coordinates. Volume integrals
♣
Also of interest are the spherical coordinates, namely
\[ x = r \cos\psi \cos\varphi, \qquad y = r \cos\psi \sin\varphi, \qquad z = r \sin\psi, \tag{19.140} \]
where r ∈ [0, ∞), ϕ ∈ [−π, π], and ψ ∈ [−π/2, π/2] (Fig. 19.15). ◦ Comment. We have already encountered spherical coordinates in Eq. 7.81, where we used u and v, in place of ψ and ϕ. As noted there, we have a correspondence between ψ and ϕ on the one hand, and on the other the standard definition of latitude and longitude used to determine the location of a point on the surface of the Earth. If we keep both ϕ = ϕ0 and ψ = ψ0 fixed and vary only r, we describe a ray in the vertical half–plane ϕ = ϕ0 and forming an angle ψ0 with the
Fig. 19.15 Spherical coordinates
plane z = 0. If we keep both r = r0 and ψ = ψ0 fixed and vary only ϕ, we describe the circle with radius r0 cos ψ0 on the plane z = r0 sin ψ0 and center on the z-axis (like a parallel on the Earth). Finally, if we keep both ϕ = ϕ0 and r = r0 fixed and vary only ψ, we describe a semicircle, with radius r0 , center at the origin, in the vertical half–plane ϕ = ϕ0 (like a meridian on the Earth). These lines are referred to as spherical coordinate lines. On the other hand, if we keep only r = r0 fixed and vary both ϕ and ψ, we describe the sphere with radius r0 and center at the origin. If we keep only ϕ = ϕ0 fixed and vary both r and ψ, we describe a vertical half–plane from the z-axis, forming an angle ϕ0 with the x-axis. Finally, if we keep only ψ = ψ0 fixed and vary both r and ϕ, we describe a conical surface, with vertex in the origin and axis equal to the z-axis, forming an angle ψ0 with respect to the horizontal plane z = 0. These surfaces are referred to as spherical coordinate surfaces.
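• Evaluation of volume integrals in spherical coordinates
For evaluating a triple integral using spherical coordinates, we subdivide the image $\mathring{\mathcal{V}}$ into cuboids $\mathring{\mathcal{V}}_k$, with sides defined by coordinate planes. The volume $V_k$ of $\mathcal{V}_k$ is given by $\Delta r\, (r_k \cos\psi_k\, \Delta\varphi)\, (r_k\, \Delta\psi)$, as you may verify. Then, as in Eq. 19.28, we take the limit and obtain, for a ball of radius R centered at the origin,
\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = \int_{-\pi/2}^{\pi/2} \int_{-\pi}^{\pi} \int_0^R f(r, \varphi, \psi)\, r^2 \cos\psi\, dr\, d\varphi\, d\psi. \tag{19.141} \]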
◦ Comment. If f(r, ϕ, ψ) = f(r), we simply have
\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = 4\pi \int_0^R f(r)\, r^2\, dr, \tag{19.142} \]
as you may verify. [Hint: Use $\int_{-\pi/2}^{\pi/2} \cos\psi\, d\psi = 2$ (Eq. 10.54).]
• An illustrative example: volume of a ball, again
As an illustrative example, we want to use our newly acquired know–how to obtain again the volume of a ball of radius R, a problem already addressed in Subsection 19.8.1, on double integrals in polar coordinates (see Eq. 19.133). The boundaries of a ball of radius R, centered at the origin, are defined by r ∈ [0, R], ϕ ∈ [−π, π], and ψ ∈ [−π/2, π/2]. We have (use Eq. 19.142)
\[ V = 4\pi \int_0^R r^2\, dr = \frac{4}{3}\, \pi R^3, \tag{19.143} \]
in agreement with Eq. 19.133.
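The spherical-coordinate formula, Eq. 19.141, can be exercised on a non-radial integrand as well. The sketch below uses the latitude-type angle ψ ∈ [−π/2, π/2] of Eq. 19.140 and compares a midpoint sum with closed-form values for f = 1 and for the assumed test function f = z² = r² sin² ψ, whose integral over a ball is 4πR⁵/15.

```python
import math

def spherical_integral(f, R, n=40):
    """Midpoint sum for Eq. 19.141, with weight r^2 cos(psi)."""
    dr, dphi, dpsi = R / n, 2.0 * math.pi / n, math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            phi = -math.pi + (j + 0.5) * dphi
            for k in range(n):
                psi = -0.5 * math.pi + (k + 0.5) * dpsi
                total += f(r, phi, psi) * r * r * math.cos(psi) * dr * dphi * dpsi
    return total

R = 1.0   # assumed radius
volume = spherical_integral(lambda r, phi, psi: 1.0, R)
volume_exact = 4.0 * math.pi * R ** 3 / 3.0              # Eq. 19.143
z2 = spherical_integral(lambda r, phi, psi: (r * math.sin(psi)) ** 2, R)
z2_exact = 4.0 * math.pi * R ** 5 / 15.0
print(volume, volume_exact, z2, z2_exact)
```

With the more common colatitude convention the weight would be r² sin of the polar angle instead. The cosine appears here only because ψ is measured from the plane z = 0.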
19.8.4 Integration using general coordinates
♣
You might ask: “Are cylindrical and spherical the only types of coordinates for which we have a rule for evaluating volume integrals? Is there a method to address more general types of coordinates?” The answer: “Yes! There is a general rule for arbitrary curvilinear coordinates, even non–orthogonal!” Such a rule is presented in this subsection, where we extend to general curvilinear coordinates the procedure for evaluating volume integrals and introduce also surface integrals in curvilinear coordinates.
• Curvilinear coordinates
In order to introduce a system of arbitrary curvilinear coordinates, consider a suitably smooth transformation of the type (Fig. 19.16)
\[ \mathbf{x} = \mathbf{x}(\xi^1, \xi^2, \xi^3). \tag{19.144} \]
This equation assigns to each set of $\xi^1, \xi^2, \xi^3$ a location x. The variables $\xi^\alpha$ (α = 1, 2, 3) are called the curvilinear coordinates. [The cylindrical and spherical coordinates are particular cases, as you may verify.] To shed some light on the significance of this transformation, let us consider the equation $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2_0, \xi^3_0)$, where it is understood that $\xi^1$ is allowed to vary, whereas $\xi^2_0$ and $\xi^3_0$ are considered as fixed. This is formally equivalent to Eq. 18.11 and hence it describes a line in $\mathbb{R}^3$, which will be referred to as
Fig. 19.16 Curvilinear coordinates
a coordinate line, specifically as a $\xi^1$-line. Similarly, consider the equation $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2, \xi^3_0)$, which is understood with $\xi^1$ and $\xi^2$ varying, whereas $\xi^3_0$ is kept fixed. This describes a surface in $\mathbb{R}^3$, which will be referred to as a coordinate surface, specifically as a $\xi^3$-constant surface. Assume that each of the coordinates $\xi^\alpha$ is increased by a quantity $d\xi^\alpha$ (α = 1, 2, 3). This corresponds to a variation of x given by
\[ d\mathbf{x} = \sum_{\alpha=1}^{3} \frac{\partial \mathbf{x}}{\partial \xi^\alpha}\, d\xi^\alpha = \sum_{\alpha=1}^{3} \mathbf{g}_\alpha\, d\xi^\alpha, \tag{19.145} \]
where $\mathbf{g}_\alpha$ are given by the following⁵
Definition 321 (Covariant base vectors. Jacobian). Consider the transformation $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2, \xi^3)$ (Eq. 19.144). Introduce the vectors
\[ \mathbf{g}_\alpha = \frac{\partial \mathbf{x}}{\partial \xi^\alpha} \qquad (\alpha = 1, 2, 3), \tag{19.146} \]
which are tangent to the $\xi^\alpha$-line (Eq. 18.20), and assume that, at any of the points of a region R, none of the vectors $\mathbf{g}_\alpha$ vanishes and that they are not coplanar. Accordingly, we have
\[ \mathbf{g}_1 \times \mathbf{g}_2 \cdot \mathbf{g}_3 \neq 0. \tag{19.147} \]
5 The Jacobian is named after the German mathematician Carl Gustav Jacob Jacobi (1804–1851), one of the greatest of his generation. He made an important contribution to the theory of Hamilton equations (named after William Rowan Hamilton). Jacobi’s contribution is known as the theory of canonical transformations (Ref. [40], p. 348). [For an extensive history of the respective contributions of Hamilton and Jacobi, see also Dugas (Ref. [16], p. 401–404, on the Jacobi criticism of the Hamilton formulation), as well as Kline (Ref. [38], pp. 735–745), Hirsch and Smale (Ref. [32], pp. 290–294), and Arnold (Ref. [4], p. 63).]
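Thus, these vectors may be used as a local basis at any of the points of R. They are known as the covariant base vectors. The function
\[ J(\xi^1, \xi^2, \xi^3) := \mathbf{g}_1 \times \mathbf{g}_2 \cdot \mathbf{g}_3 = \begin{vmatrix} \partial x_1/\partial \xi^1 & \partial x_1/\partial \xi^2 & \partial x_1/\partial \xi^3 \\ \partial x_2/\partial \xi^1 & \partial x_2/\partial \xi^2 & \partial x_2/\partial \xi^3 \\ \partial x_3/\partial \xi^1 & \partial x_3/\partial \xi^2 & \partial x_3/\partial \xi^3 \end{vmatrix} \tag{19.148} \]
is called the Jacobian of the transformation $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2, \xi^3)$. The last expression for the Jacobian in Eq. 19.148 is conveniently shortened into
\[ J(\xi^\alpha) =: \frac{\partial(x_1, x_2, x_3)}{\partial(\xi^1, \xi^2, \xi^3)}. \tag{19.149} \]
In this book, we assume the base vectors to be right–handed, unless otherwise stated. Accordingly, we have (use Eq. 15.86)
\[ J(\xi^\alpha) = \mathbf{g}_1 \cdot \mathbf{g}_2 \times \mathbf{g}_3 > 0. \tag{19.150} \]
Finally, note that (use Eq. 19.145)
\[ ds^2 = \| d\mathbf{x} \|^2 = \sum_{\alpha,\beta=1}^{3} \mathbf{g}_\alpha \cdot \mathbf{g}_\beta\, d\xi^\alpha\, d\xi^\beta. \tag{19.151} \]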
• Volume integrals
Let us consider an infinitesimal vector in the direction of the first coordinate line, namely $d\mathbf{x}_1 := \mathbf{g}_1\, d\xi^1$. [This is obtained from Eq. 19.145, by setting $d\xi^2 = d\xi^3 = 0$.] Similarly, we can define $d\mathbf{x}_2 := \mathbf{g}_2\, d\xi^2$ and $d\mathbf{x}_3 := \mathbf{g}_3\, d\xi^3$ (see Fig. 19.17). Next, consider the parallelepiped defined by $d\mathbf{x}_1$, $d\mathbf{x}_2$, and $d\mathbf{x}_3$, which we will refer to as a coordinate volume element. Neglecting higher–order terms, its volume is given by
\[ dV = d\mathbf{x}_1 \cdot d\mathbf{x}_2 \times d\mathbf{x}_3 = \mathbf{g}_1 \cdot \mathbf{g}_2 \times \mathbf{g}_3\, d\xi^1\, d\xi^2\, d\xi^3, \tag{19.152} \]
or (use $\mathbf{g}_1 \cdot \mathbf{g}_2 \times \mathbf{g}_3 = J(\xi^\alpha) > 0$, Eq. 19.150)
\[ dV = J(\xi^\alpha)\, d\xi^1\, d\xi^2\, d\xi^3. \tag{19.153} \]
In defining a volume integral, we can subdivide the region V into small coordinate volume elements and take the limit. We obtain
Fig. 19.17 Volume integral
Fig. 19.18 Surface integral
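\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = \iiint_{\mathring{\mathcal{V}}} f(\xi^\alpha)\, J(\xi^\alpha)\, d\xi^1\, d\xi^2\, d\xi^3, \tag{19.154} \]
where $\mathring{\mathcal{V}}$ is the image of $\mathcal{V}$ in the $\xi^\alpha$–space.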
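The Jacobian of Eq. 19.148 can be approximated for any smooth transformation by differentiating numerically and forming the triple product g1 · g2 × g3. The sketch below is one possible way to do this, using central differences with an assumed step size. The helper names are hypothetical, and the two transformations are those of Eqs. 19.137 and 19.140, with the coordinates ordered as (r, θ, z) and (r, ϕ, ψ).

```python
import math

def cylindrical(xi):
    r, th, z = xi
    return (r * math.cos(th), r * math.sin(th), z)

def spherical(xi):
    r, phi, psi = xi
    return (r * math.cos(psi) * math.cos(phi),
            r * math.cos(psi) * math.sin(phi),
            r * math.sin(psi))

def base_vectors(x_of_xi, xi, eps=1e-6):
    """Central-difference approximation of g_alpha = dx/dxi^alpha (Eq. 19.146)."""
    g = []
    for a in range(3):
        up, dn = list(xi), list(xi)
        up[a] += eps
        dn[a] -= eps
        g.append(tuple((p - m) / (2.0 * eps)
                       for p, m in zip(x_of_xi(up), x_of_xi(dn))))
    return g

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def jacobian(x_of_xi, xi):
    g1, g2, g3 = base_vectors(x_of_xi, xi)
    return dot(g1, cross(g2, g3))      # J = g1 . g2 x g3 (Eq. 19.150)

J_cyl = jacobian(cylindrical, (1.7, 0.4, -0.3))
J_sph = jacobian(spherical, (1.7, 0.4, 0.6))
print(J_cyl, 1.7)                          # expected: r
print(J_sph, 1.7 ** 2 * math.cos(0.6))     # expected: r^2 cos(psi)
```

Both values come out positive, which is consistent with the right–handed ordering assumed in Eq. 19.150.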
• Integrals over coordinate volumes
Let us introduce the following
Definition 322 (Coordinate volume). The term coordinate volume denotes a distorted hexahedral region whose image in the $\xi^\alpha$-space is a cuboid, bounded by six planes defined by $\xi^\alpha = \xi^\alpha_\pm$ (α = 1, 2, 3).
The volume integrals for coordinate volumes are particularly simple because they may be expressed in terms of repeated integrals, as
\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = \int_{\xi^3_-}^{\xi^3_+} \int_{\xi^2_-}^{\xi^2_+} \int_{\xi^1_-}^{\xi^1_+} f(\xi^\alpha)\, J(\xi^\alpha)\, d\xi^1\, d\xi^2\, d\xi^3, \tag{19.155} \]
provided of course that the hypotheses of the Fubini theorem (Theorem 197, p. 753) are satisfied.
Remark 143. Question: “Does a circular cylinder qualify as a coordinate volume?” Answer: “Yes!” After all, its boundaries are coordinate surfaces in the case of cylindrical coordinates! Indeed, it may be conceived as the limit of the coordinate volume in Fig. 19.19. [Similar considerations hold for a spherical volume.]
Fig. 19.19 Circular cylinders and coordinate volumes
• Surface integrals
Next, consider a surface integral, introduced in Definition 317, p. 769. Assume the surface to be described by an equation of the type $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2)$ (parametric representation, Eq. 7.12). Introduce the surface covariant base vectors
\[ \mathbf{a}_\alpha := \frac{\partial \mathbf{x}}{\partial \xi^\alpha} \qquad (\alpha = 1, 2). \tag{19.156} \]
Consider an infinitesimal vector in the direction of the first coordinate line, given by $d\mathbf{x}_1 = \mathbf{a}_1\, d\xi^1$. Similarly, we define $d\mathbf{x}_2 = \mathbf{a}_2\, d\xi^2$. Next, consider the parallelogram defined by $d\mathbf{x}_1$ and $d\mathbf{x}_2$, which we will refer to as a coordinate surface element (see Fig. 19.18). Its area is given by $dS = \| d\mathbf{x}_1 \times d\mathbf{x}_2 \|$ (use the definition of the cross product between two vectors, namely $\mathbf{a} \times \mathbf{b} = A\, \mathbf{n}$, Eq. 15.58), or
\[ dS = \| \mathbf{a}_1 \times \mathbf{a}_2 \|\, d\xi^1\, d\xi^2. \tag{19.157} \]
In defining a surface integral, we can subdivide the surface $\mathcal{S}$ into small coordinate surface elements and take the limit, to obtain
\[ \iint_{\mathcal{S}} f(\mathbf{x})\, dS = \iint_{\mathring{\mathcal{S}}} f(\xi^1, \xi^2)\, \| \mathbf{a}_1 \times \mathbf{a}_2 \|\, d\xi^1\, d\xi^2, \tag{19.158} \]
where $\mathring{\mathcal{S}}$ is the image of $\mathcal{S}$ in the $\xi^\alpha$-plane.
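As an illustration of Eq. 19.158 on a surface not treated in the text, the sketch below computes the area of a torus from its parametric representation. The torus radii, the parameter names u and w (standing for ξ¹ and ξ²), and the finite-difference step are assumptions. The known result 4π²Ab is used only as a check.

```python
import math

A, b = 2.0, 0.5   # assumed torus radii (A > b)

def x_of(u, w):
    # parametric representation x = x(xi^1, xi^2), with u = xi^1 and w = xi^2
    rho = A + b * math.cos(w)
    return (rho * math.cos(u), rho * math.sin(u), b * math.sin(w))

def base_vectors(u, w, eps=1e-6):
    """Central-difference approximation of a_1 and a_2 (Eq. 19.156)."""
    a1 = tuple((p - m) / (2.0 * eps) for p, m in zip(x_of(u + eps, w), x_of(u - eps, w)))
    a2 = tuple((p - m) / (2.0 * eps) for p, m in zip(x_of(u, w + eps), x_of(u, w - eps)))
    return a1, a2

def cross_norm(a, c):
    cx = a[1] * c[2] - a[2] * c[1]
    cy = a[2] * c[0] - a[0] * c[2]
    cz = a[0] * c[1] - a[1] * c[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)

def surface_integral(f, n=60):
    """Midpoint sum for Eq. 19.158 over u, w in [0, 2 pi]."""
    du = dw = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            w = (j + 0.5) * dw
            a1, a2 = base_vectors(u, w)
            total += f(*x_of(u, w)) * cross_norm(a1, a2) * du * dw
    return total

area = surface_integral(lambda x, y, z: 1.0)
area_exact = 4.0 * math.pi ** 2 * A * b
print(area, area_exact)
```

Nothing here requires the coordinate lines to be orthogonal, since the element of area is obtained from the norm of a1 × a2.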
19.8.5 Orthogonal curvilinear coordinates
♣
We have the following Definition 323 (Orthogonal curvilinear coordinates). The term orthogonal curvilinear coordinates will be used to refer to a set of curvilinear coordinates for which the covariant base vectors are mutually orthogonal, so that
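\[ \mathbf{g}_\alpha \cdot \mathbf{g}_\beta = 0 \qquad (\alpha \neq \beta). \tag{19.159} \]
Equation 19.159 implies (use Eq. 19.151)
\[ ds^2 = \mathbf{g}_1 \cdot \mathbf{g}_1\, \big(d\xi^1\big)^2 + \mathbf{g}_2 \cdot \mathbf{g}_2\, \big(d\xi^2\big)^2 + \mathbf{g}_3 \cdot \mathbf{g}_3\, \big(d\xi^3\big)^2. \tag{19.160} \]
Following tradition, when dealing with orthogonal curvilinear coordinates we set⁶
\[ h_\alpha = \| \mathbf{g}_\alpha \| \tag{19.161} \]
and obtain
\[ ds^2 = \big(h_1\, d\xi^1\big)^2 + \big(h_2\, d\xi^2\big)^2 + \big(h_3\, d\xi^3\big)^2. \tag{19.162} \]
In addition, because of the orthogonality, we have
\[ J := \mathbf{g}_1 \times \mathbf{g}_2 \cdot \mathbf{g}_3 = \| \mathbf{g}_1 \|\, \| \mathbf{g}_2 \|\, \| \mathbf{g}_3 \| = h_1(\xi^\alpha)\, h_2(\xi^\alpha)\, h_3(\xi^\alpha). \tag{19.163} \]
Thus,
\[ dV = h_1(\xi^\alpha)\, h_2(\xi^\alpha)\, h_3(\xi^\alpha)\, d\xi^1\, d\xi^2\, d\xi^3, \tag{19.164} \]
and hence
\[ \iiint_{\mathcal{V}} f(\mathbf{x})\, dV = \iiint_{\mathring{\mathcal{V}}} f(\xi^\alpha)\, h_1(\xi^\alpha)\, h_2(\xi^\alpha)\, h_3(\xi^\alpha)\, d\xi^1\, d\xi^2\, d\xi^3, \tag{19.165} \]
where $\mathring{\mathcal{V}}$ is the image of $\mathcal{V}$ in the $\xi^\alpha$-space. Similarly, consider a surface $\mathcal{S}$ defined by $\mathbf{x} = \mathbf{x}(\xi^1, \xi^2, \xi^3_*)$. We have
\[ dS = h_1(\xi^\alpha)\, h_2(\xi^\alpha)\, d\xi^1\, d\xi^2, \tag{19.166} \]
and hence
\[ \iint_{\mathcal{S}} f(\mathbf{x})\, dS = \iint_{\mathring{\mathcal{S}}} f(\xi^\alpha)\, h_1(\xi^\alpha)\, h_2(\xi^\alpha)\, d\xi^1\, d\xi^2, \tag{19.167} \]
where $\mathring{\mathcal{S}}$ is the image of $\mathcal{S}$ in the $\xi^\alpha$-space.
6 The coefficients $h_\alpha$ are known as the Lamé coefficients, after the French mathematician Gabriel Lamé (1795–1870), who introduced them. He also introduced the Lamé parameters for linear isotropic elastic solids.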
19.8.6 Back to cylindrical and spherical coordinates
♣
Here, we apply the results of the preceding subsection to the cases of cylindrical and spherical coordinates, which were presented in Subsections 19.8.2 and 19.8.3. Specifically, we show how the general formulation for orthogonal coordinates yields those results, as particular cases.
• Cylindrical coordinates
For cylindrical coordinates, we have (use Eq. 19.137)
\[ dx = \cos\theta\, dr - r \sin\theta\, d\theta, \qquad dy = \sin\theta\, dr + r \cos\theta\, d\theta, \qquad dz = dz. \tag{19.168} \]
This yields
\[ ds^2 = dr^2 + r^2\, d\theta^2 + dz^2, \tag{19.169} \]
as you may verify. Comparing to $ds^2 = (h_1\, d\xi^1)^2 + (h_2\, d\xi^2)^2 + (h_3\, d\xi^3)^2$ (Eq. 19.162), which holds only for orthogonal coordinates, we obtain that cylindrical coordinates are orthogonal, and in addition that
\[ h_r = 1, \qquad h_\theta = r, \qquad h_z = 1. \tag{19.170} \]
Therefore, using $J = h_r\, h_\theta\, h_z$ (Eq. 19.163), one obtains
\[ J(r, \theta, z) = r. \tag{19.171} \]
[Alternatively, we may use the definition of Jacobian (Eq. 19.148), namely
\[ J(r, \theta, z) = \frac{\partial(x, y, z)}{\partial(r, \theta, z)} = \begin{vmatrix} \cos\theta & \sin\theta & 0 \\ -r \sin\theta & r \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix}, \tag{19.172} \]
and obtain the same result.] Therefore, the cylindrical volume element is given by dV = r dr dθ dz, in agreement with Eq. 19.138. Accordingly, the volume of a circular cylinder having radius R and height ℓ is given by
\[ V = \int_0^R \int_{-\pi}^{\pi} \int_0^{\ell} J\, dz\, d\theta\, dr = \int_0^R r\, dr \int_{-\pi}^{\pi} d\theta \int_0^{\ell} dz = \pi R^2\, \ell, \tag{19.173} \]
in agreement with Eq. 19.139.
Similarly, the area of the lateral surface of the circular cylinder of radius R and height ℓ is given by (use r = R)
\[ S = \int_{-\pi}^{\pi} \int_0^{\ell} h_\theta\, h_z\, dz\, d\theta = R \int_{-\pi}^{\pi} d\theta \int_0^{\ell} dz = 2\pi R\, \ell. \tag{19.174} \]
◦ Comment. Similar considerations hold true for polar coordinates, for which hr = 1 and hθ = r, as you may verify. [As an exercise, define the Jacobian for two dimensions, and show that J = r (Eq. 19.129).]
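The cylindrical results above (scale factors, orthogonality and Jacobian) may also be checked numerically. What follows is only a minimal sketch, assuming the standard map $x = r\cos\theta$, $y = r\sin\theta$, $z = z$ (consistent with Eq. 19.168); the function names, the sample point and the finite-difference step are illustrative choices, not part of the text.

```python
import math

def cylindrical_to_cartesian(r, theta, z):
    """Assumed map (r, theta, z) -> (x, y, z), consistent with Eq. 19.168."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def tangent_vectors(mapping, xi, step=1e-6):
    """Central-difference approximation of the three vectors dx/dxi_k."""
    vectors = []
    for k in range(3):
        plus = list(xi)
        minus = list(xi)
        plus[k] += step
        minus[k] -= step
        xp = mapping(*plus)
        xm = mapping(*minus)
        vectors.append(tuple((a - b) / (2.0 * step) for a, b in zip(xp, xm)))
    return vectors

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(rows):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

point = (2.0, 0.7, -1.3)  # arbitrary sample values of (r, theta, z)
g1, g2, g3 = tangent_vectors(cylindrical_to_cartesian, point)
scale_factors = [math.sqrt(dot(g, g)) for g in (g1, g2, g3)]
off_diagonal = [dot(g1, g2), dot(g1, g3), dot(g2, g3)]
jacobian = det3([g1, g2, g3])

print(scale_factors)  # expected to be close to [1, r, 1]
print(off_diagonal)   # expected to be close to zero (orthogonality)
print(jacobian)       # expected to be close to r
```

The rows passed to `det3` are the derivatives with respect to $r$, $\theta$ and $z$, the same layout used in Eq. 19.172. Replacing the map with the spherical one should allow the same check for Eqs. 19.177 and 19.178.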
• Spherical coordinates

Similarly, for spherical coordinates (Eq. 19.140) we have
\[ \begin{aligned} dx &= \cos\psi \cos\varphi\, dr - r \cos\psi \sin\varphi\, d\varphi - r \sin\psi \cos\varphi\, d\psi, \\ dy &= \cos\psi \sin\varphi\, dr + r \cos\psi \cos\varphi\, d\varphi - r \sin\psi \sin\varphi\, d\psi, \\ dz &= \sin\psi\, dr + r \cos\psi\, d\psi. \end{aligned} \tag{19.175} \]
This yields
\[ ds^2 = dx^2 + dy^2 + dz^2 = dr^2 + r^2 \cos^2\psi\, d\varphi^2 + r^2\, d\psi^2, \tag{19.176} \]
as you may verify. Comparing the above equation to $ds^2 = (h_1\, d\xi^1)^2 + (h_2\, d\xi^2)^2 + (h_3\, d\xi^3)^2$ (Eq. 19.162), which holds only for orthogonal coordinates, we obtain that spherical coordinates are orthogonal, and in addition that
\[ h_r = 1, \qquad h_\varphi = r \cos\psi, \qquad h_\psi = r. \tag{19.177} \]
Therefore, using $J = h_1 h_2 h_3$ (Eq. 19.163), one obtains that the Jacobian of the transformation is
\[ J = r^2 \cos\psi. \tag{19.178} \]
[Alternatively, we may use the definition of Jacobian (Eq. 19.148), namely
\[ J = \frac{\partial(x, y, z)}{\partial(r, \varphi, \psi)} = \begin{vmatrix} \cos\psi \cos\varphi & \cos\psi \sin\varphi & \sin\psi \\ -r \cos\psi \sin\varphi & r \cos\psi \cos\varphi & 0 \\ -r \sin\psi \cos\varphi & -r \sin\psi \sin\varphi & r \cos\psi \end{vmatrix}, \tag{19.179} \]
and obtain the same result.] Hence, the spherical volume element is given by $dV = r^2 \cos\psi\, dr\, d\varphi\, d\psi$, in agreement with Eq. 19.138. Accordingly, the volume of a ball of radius $R$ is given by
\[ V = \int_0^R \int_{-\pi}^{\pi} \int_{-\pi/2}^{\pi/2} J\, d\psi\, d\varphi\, dr = \int_0^R r^2\, dr \int_{-\pi}^{\pi} d\varphi \int_{-\pi/2}^{\pi/2} \cos\psi\, d\psi = \frac{4}{3}\, \pi R^3 \tag{19.180} \]
(in agreement with Eq. 19.133), whereas its area is given by
\[ S = \int_{-\pi}^{\pi} \int_{-\pi/2}^{\pi/2} h_\varphi\, h_\psi\, d\psi\, d\varphi = R^2 \int_{-\pi}^{\pi} d\varphi \int_{-\pi/2}^{\pi/2} \cos\psi\, d\psi = 4 \pi R^2, \tag{19.181} \]
as you may verify. [Hint: Use $\int_{-\pi/2}^{\pi/2} \cos x\, dx = 2$ (Eq. 10.54).]
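The volume and the area of the ball may also be recovered by numerical quadrature. The sketch below is only illustrative: it assumes a composite midpoint rule (my choice, not a method used in the text) and exploits the fact that the integrals of $J = r^2 \cos\psi$ and of $h_\varphi h_\psi = R^2 \cos\psi$ factorize into one–dimensional integrals. The function names and the sample radius are hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def ball_volume(R):
    """Volume from J = r^2 cos(psi); the triple integral factorizes."""
    radial = midpoint_integral(lambda r: r * r, 0.0, R)
    azimuth = midpoint_integral(lambda phi: 1.0, -math.pi, math.pi)
    latitude = midpoint_integral(math.cos, -math.pi / 2, math.pi / 2)
    return radial * azimuth * latitude

def sphere_area(R):
    """Area from h_phi * h_psi = R^2 cos(psi) on the surface r = R."""
    azimuth = midpoint_integral(lambda phi: 1.0, -math.pi, math.pi)
    latitude = midpoint_integral(math.cos, -math.pi / 2, math.pi / 2)
    return R * R * azimuth * latitude

R = 1.5  # arbitrary sample radius
print(ball_volume(R), 4.0 / 3.0 * math.pi * R**3)  # the two values should nearly agree
print(sphere_area(R), 4.0 * math.pi * R**2)        # the two values should nearly agree
```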
Chapter 20
Single particle dynamics in space
Finally, we begin to deal with dynamics in earnest, albeit just for a single particle. Indeed, in this chapter we exploit the know–how on multivariate calculus, acquired in the preceding chapters, to remove the simplifying assumption of one–dimensional motion used in Chapter 11, and extend the formulation of dynamics to two and three dimensions. [The problems are typically formulated in three dimensions. Two–dimensional problems are seen as particular cases in which the third dimension is irrelevant, in that the equations of motion in such a direction are automatically satisfied.]
• Overview of this chapter

In Section 20.1, we address the Newton second law in three dimensions, whereas in Section 20.2 we present some illustrative examples. These include the motion of: (i) a particle not subject to any force, (ii) a particle subject only to a constant force, such as a cannonball subject solely to gravity, and (iii) a sled on an icy slope in the presence of friction, without and with air drag. We conclude this section by introducing the Newton law of gravitation and presenting a historical perspective of Newton’s explanation of the Kepler laws. [To smooth out the presentation, the mathematical aspects are covered in Appendix A (Section 20.8).]

Then, we up the ante. In Section 20.3, we discuss longitudinal and lateral accelerations, with applications to the analysis of: (i) a bead moving along a spiral, (ii) a circular pendulum, and (iii) the so–called Huygens isochronous pendulum, namely a pendulum in which the period of oscillation is, in theory, independent of the oscillation amplitude. In Section 20.4, we introduce the so–called d’Alembert principle of inertial forces, with applications to a car on: (i) a level turn, and (ii) a banked turn. In Section 20.5, we define angular momentum and introduce the corresponding equation, with an application to spherical pendulums.

Next, we address energy. In Section 20.6, we extend to three dimensions the one–dimensional formulation of energy. In particular, first we present the energy equation. Then, we discuss conservative forces and potential energy, and derive the expression for the potential energy of the forces of interest. We conclude this section by presenting the minimum–energy theorem for a single particle. Then, in Section 20.7, we show how to obtain, via energy considerations, the solution for the problem of a particle subject to conservative forces, an extension to three dimensions of the one–dimensional formulation that was presented in Subsection 11.9.7.

We also have two appendices. As stated above, in the first (Section 20.8), we present a beautiful example of the interplay between mathematics and classical mechanics, namely the derivation of the three Kepler laws on the motion of the planets around a star, starting from the Newton second law and the Newton law of gravitation, along with related matters. In particular, first we address again the law of gravitation. Then, we study planets on circular orbits, and finally, we derive the three Kepler laws, for the motion of planets and comets on non–circular orbits. We also discuss the relationship between gravitation and gravity. In the second appendix (Section 20.9), for completeness we address certain curves (namely cycloids, tautochrones and isochrones), which are relevant for the analysis of the Huygens pendulum.

© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_20
20.1 Velocity, acceleration and the Newton second law

The first item we address in this chapter is the extension to three dimensions of the Newton second law for one–dimensional motion (Eq. 11.11). Some preliminaries on kinematics are included.

• Kinematics. Trajectory, velocity, acceleration

In three dimensions, locations, displacements, velocities, accelerations and forces are vector quantities. The term frame of reference refers to a set of three linearly independent vectors (typically, orthonormal). For one–dimensional motion, we have been using the terms “velocity” and “speed” interchangeably. From now on, the velocity is a vector quantity, whereas the speed is a scalar one (e.g., what you read from the speedometer in your car).
Similar considerations apply to the acceleration. Is acceleration a scalar quantity or a vector quantity? In everyday life, when you say “My car has excellent acceleration,” you are talking about what is colloquially referred to as its pick–up, namely the rate of change of your car’s speed, a scalar quantity. On the other hand, in classical mechanics, acceleration is the rate of change of the velocity, not the speed. Hence, acceleration is a vector quantity. What is its direction? We will discuss this later. Here, it suffices to say that the direction of the acceleration does not necessarily coincide with the direction of the velocity. Indeed, if the car is turning (e.g., moves along a circle), the acceleration is different from zero, even when your car has a constant speed. For, the speed might be constant, but the velocity changes! Accordingly, we have the following

Definition 324 (Trajectory, velocity and acceleration). Let $\mathbf{x} = \mathbf{x}(t)$ denote the location of a point $P$, at time $t$. The curve described by the function $\mathbf{x} = \mathbf{x}(t)$ is called the trajectory covered by $P$ during its motion. The velocity of $P$, denoted by $\mathbf{v}(t)$, is defined as the time derivative of $\mathbf{x}(t)$:
\[ \mathbf{v}(t) := \frac{d\mathbf{x}}{dt}. \tag{20.1} \]
The acceleration, denoted by $\mathbf{a}(t)$, is the time derivative of the velocity:
\[ \mathbf{a}(t) := \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{x}}{dt^2}. \tag{20.2} \]
◦ Comment. The considerations regarding frames of reference (Subsection 11.1.3) and inertial frames of reference (Subsection 11.2.1) apply here as well. The only difference is that, given an inertial frame of reference, a second one has a constant velocity v (vector quantity!) with respect to the first.
• The Newton second law

A major step in the development of classical mechanics came with the introduction of the following (extension of Eq. 11.11)

Principle 4 (The Newton second law) The second law states that, for any particle, we have
\[ m\, \mathbf{a} = \mathbf{f}, \tag{20.3} \]
where
\[ \mathbf{f} = \sum_{k=1}^{n} \mathbf{f}_k \tag{20.4} \]
is the sum of all the $n$ forces acting on the particle. In other words, the acceleration is proportional to the resultant force $\mathbf{f}$ acting on it. The constant of proportionality $m > 0$ is called the mass of the particle.

◦ Comment. Note that the wording of the above principle is virtually identical to that presented in connection with Eq. 11.11, for the one–dimensional case, the only difference being the use of the vector symbols $\mathbf{a}$ and $\mathbf{f}$, in place of $a$ and $F$. [One more time: $\mathbf{f}$ (and not $\mathbf{F}$) is the three–dimensional equivalent of $F$, because in this book capital boldface letters are reserved for tensors, a notion introduced in Section 23.9 (Remark 135, p. 666, on notations).]
• Momentum

Recall that, for one–dimensional problems, the quantity $p = m v$ (Eq. 11.13) is called the momentum of a particle having mass $m$ and velocity $v$. For future reference, we extend this to three dimensions. We have the following

Definition 325 (Momentum of a particle). Consider a particle, with mass $m$ and velocity $\mathbf{v}$. The quantity
\[ \mathbf{p} = m\, \mathbf{v} \tag{20.5} \]
is called the momentum of the particle. Accordingly, the Newton second law (Eq. 20.3) is fully equivalent to the following momentum equation:
\[ \frac{d\mathbf{p}}{dt} = \mathbf{f}. \tag{20.6} \]

◦ Warning. Some authors use the term “linear momentum” (instead of momentum), to distinguish it from the “angular momentum” (Section 20.5).
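As a small illustration of Eqs. 20.3 to 20.5, the sketch below sums a list of forces componentwise and divides the resultant by the mass. The helper names and the sample data are hypothetical, chosen only to show the bookkeeping.

```python
def resultant(forces):
    """Sum a list of 3-component force vectors (Eq. 20.4)."""
    return tuple(sum(component) for component in zip(*forces))

def acceleration(mass, forces):
    """Solve m a = f for a (Eq. 20.3); the mass must be positive."""
    if mass <= 0.0:
        raise ValueError("mass must be positive")
    f = resultant(forces)
    return tuple(component / mass for component in f)

def momentum(mass, velocity):
    """p = m v (Eq. 20.5)."""
    return tuple(mass * component for component in velocity)

# Hypothetical sample data: a 2 kg particle acted upon by three forces (in newtons).
mass = 2.0
forces = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (1.0, 1.0, -4.0)]
a = acceleration(mass, forces)
print(a)  # (1.0, 2.0, -2.0)
```

Note that the acceleration depends only on the resultant, not on how many forces contribute to it.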
20.2 Illustrative examples

In this section, we present the formulations for some elementary problems and the corresponding solutions. We begin with a particle subject to no forces (Subsection 20.2.1). Next, we consider a particle subject to a constant force, such as the weight (Subsection 20.2.2). Then, we consider a sled on a slope in the presence of friction, without and with aerodynamic drag (respectively, in Subsections 20.2.3 and 20.2.4). We conclude this section by introducing the Newton law of gravitation, along with some historical comments on the Kepler laws and the corresponding formulation by Newton (Subsection 20.2.5).
20.2.1 Particle subject to no forces. Newton first law

As the most elementary illustrative example, let us consider the case in which $\mathbf{f} = \mathbf{0}$ (the corresponding one–dimensional problem was addressed in Section 11.4). The equation that governs the motion is given by $m\, \mathbf{a} = \mathbf{0}$ (Eq. 20.3, with $\mathbf{f} = \mathbf{0}$), namely (use Eq. 20.2)
\[ \mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{x}}{dt^2} = \mathbf{0}. \tag{20.7} \]
We complete the problem with the following initial conditions:
\[ \mathbf{x}(0) = \mathbf{x}_0, \tag{20.8} \]
with $\mathbf{x}_0$ denoting the initial location of the particle, and
\[ \mathbf{v}(0) = \mathbf{v}_0, \tag{20.9} \]
with $\mathbf{v}_0$ denoting the initial velocity of the particle. Integrating each component of Eq. 20.7 between $0$ and $t$, we have (recall that the primitive of zero is a constant, Theorem 118, p. 420)
\[ \mathbf{v}(t) = \frac{d\mathbf{x}}{dt} = \mathbf{v}_0, \tag{20.10} \]
where we have used Eq. 20.9. Integrating again, we have
\[ \mathbf{x}(t) = \mathbf{v}_0\, t + \mathbf{x}_0, \tag{20.11} \]
where we have used Eq. 20.8. Equation 20.11 tells us that the particle moves in (rectilinear) uniform motion. We have discovered the following

Law 5 (The Newton first law) The Newton first law states that, in an inertial frame of reference, a particle subject to no force either remains at rest or moves with a constant velocity.
◦ Comment. It will not hurt to observe that the considerations presented in Subsection 11.4.1 for the one–dimensional Newton first law apply to the three–dimensional case as well. Specifically, the Newton first law is a consequence of the Newton second law. And that’s not all there is! In a frame of reference that moves with the particle, the Newton first law reduces to the Newton equilibrium law for a particle, introduced in Section 17.2. [We have exploited this several times in Chapter 17, on statics.] Vice versa, the Newton equilibrium law turns into the Newton first law if we use a second frame of reference that moves in uniform motion with respect to the first one (both inertial, of course).
20.2.2 Particle subject to a constant force

Consider the motion of a particle in space subject to a constant force, say a cannonball subject solely to its weight. [Here, we assume that we can neglect the force produced by the air, such as the aerodynamic drag (Eq. 14.80). Of course, the aerodynamic drag is not negligible in the case of a feather and similar cases.] The same formulation applies to similar problems, such as the trajectory of the weights in a shot put competition. In these two cases, the air resistance (air drag) may be neglected. [In other similar cases this is not true. For instance, the presence of the air affects considerably the ball trajectory in volleyball.] Experimental evidence shows that the weight is proportional to the mass $m$ (introduced in Principle 4, p. 793), namely
\[ \mathbf{w} = m\, \mathbf{g} = -m\, g\, \mathbf{k}, \tag{20.12} \]
where $\mathbf{k}$ is pointed upwards. [That is, the $z$-axis is directed upwards (by definition of “upwards”, I may add), namely in the direction opposite to $\mathbf{g}$.] Then, dividing by $m$, the Newton second law gives us
\[ \mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{x}}{dt^2} = \mathbf{g}. \tag{20.13} \]
In addition, in the following, we assume that: (i) the origin coincides with the initial location of the ball ($\mathbf{x}_0 = \mathbf{0}$), (ii) the initial velocity of the ball is given by $\mathbf{v}(0) = \mathbf{v}_0 \neq \mathbf{0}$ (Eq. 20.9), and (iii) the $(x, z)$ plane is that determined by $\mathbf{g}$ and $\mathbf{v}_0$ (Fig. 20.1). In terms of components we have
\[ \frac{d^2x}{dt^2} = 0, \qquad \frac{d^2y}{dt^2} = 0, \qquad \frac{d^2z}{dt^2} = -g := -\|\mathbf{g}\|. \tag{20.14} \]
The initial conditions for the location are $x_0 = y_0 = z_0 = 0$, whereas for the initial velocity we have $\dot{x}_0 = u_0$, $\dot{y}_0 = 0$ and $\dot{z}_0 = w_0$. Integrating Eq. 20.14 once, and using the above initial conditions, we have
\[ u(t) = u_0, \qquad v(t) = 0, \qquad w(t) = w_0 - g\, t. \tag{20.15} \]
Integrating Eq. 20.14 a second time, we have (use $\mathbf{x}(0) = \mathbf{0}$)
\[ x(t) = u_0\, t, \qquad y(t) = 0, \qquad z(t) = w_0\, t - \frac{1}{2}\, g\, t^2. \tag{20.16} \]

◦ Comment. In analogy with Remark 98, p. 465, for one–dimensional motion, note that in arriving at this solution, we have to know the values of six constants: the three components of the initial location and those of the initial velocity (see also Remark 96, p. 459).

Eliminating $t$ between the first and the third of Eq. 20.16, we have
\[ z(x) = \frac{w_0}{u_0}\, x - \frac{1}{2}\, \frac{g}{u_0^2}\, x^2. \tag{20.17} \]
The resulting graph of the function $z(x)$ is of the type shown in Fig. 20.1, namely a parabola (Eq. 6.32).

Fig. 20.1 $z/z_M$ as a function of $x/x_M$
In the following, we assume that $\mathbf{v}_0$ forms an angle $\theta \in (0, \pi/2)$ with the $x$-axis (as in Fig. 20.1), so as to have $u_0 = v_0 \cos\theta > 0$ and $w_0 = v_0 \sin\theta > 0$ (directed upwards), where $v_0 = \|\mathbf{v}_0\|$. Since we have assumed $w_0$ to be positive, the ball goes initially upward, reaches a maximum and then goes downwards. Note that, for the case under consideration, the two motions (namely horizontal and vertical) are decoupled (that is, they may be solved independently of each other). Note also that the treatment of the horizontal component of the motion is analogous to the one–dimensional material presented in Section 11.4 (particle subject to no forces), and that of the vertical one is analogous to the one–dimensional one presented in Section 11.5 (particle subject to a constant force).

◦ Comment. Note that, were we to use a frame of reference that travels with speed $u_0$ along the $x$-axis, we would see only the vertical motion and the present formulation would be indistinguishable from that in Subsection 11.5.1.
• Maximum height

With the know–how developed in Subsection 9.5.1 and Section 13.6, on local maxima and minima of differentiable functions, we are able to determine the maximum height reached by the ball. Indeed, let $x_M$ denote the abscissa where the ball reaches its maximum height. At $x = x_M$, the slope of $z(x)$ (as given in Eq. 20.17) vanishes, that is, $dz/dx|_{x = x_M} = w_0/u_0 - g\, x_M/u_0^2 = 0$. Therefore, we have
\[ x_M = u_0\, w_0/g. \tag{20.18} \]
Combining with Eq. 20.17, we obtain $z_M = z(x_M) = \frac{1}{2}\, w_0^2/g$, in agreement with Eq. 11.29. [Looking at it in a different way, the maximum height is reached when the vertical component of the velocity vanishes: $w(t_M) = w_0 - g\, t_M = 0$ (third of Eq. 20.15). This implies $t_M = w_0/g$, which combined with $x = u_0\, t$ (first of Eq. 20.16) yields the same result as in Eq. 20.18.]

• Range

Also worth noting is the fact that, for $x = 2\, x_M$, the two terms on the right side of Eq. 20.17 offset each other and $z$ attains again the value $z_0 = 0$, namely the ball is returned to its original height. Correspondingly, we have that the range, $x_R$, namely the value of $x$ when the ball returns to its original height, is given by
\[ x_R = 2\, x_M = 2\, u_0\, w_0/g. \tag{20.19} \]
[Another way to look at it is the fact that, for $t = 2\, w_0/g = 2\, t_M$, the two terms on the right side of the third of Eq. 20.16 offset each other and $z$ attains again the value $z_0 = 0$. This was already observed in Subsection 11.5.1, in connection with the vertical motion of a particle subject to gravity. Also, this could have been anticipated from the fact that the graph of a vertical parabola is symmetric with respect to its axis.]
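The formulas for the trajectory, the maximum height and the range (Eqs. 20.16 to 20.19) may be collected in a few lines of code. This is a minimal sketch: the value of $g$, the function names and the sample initial velocity are assumptions made here only for illustration.

```python
import math

G_ACCEL = 9.81  # assumed magnitude of g, in m/s^2

def position(t, u0, w0, g=G_ACCEL):
    """Eq. 20.16: x(t) = u0 t, z(t) = w0 t - g t^2 / 2."""
    return u0 * t, w0 * t - 0.5 * g * t * t

def height_along_path(x, u0, w0, g=G_ACCEL):
    """Eq. 20.17: z as a function of x (requires u0 != 0)."""
    return (w0 / u0) * x - 0.5 * g * (x / u0) ** 2

def apex(u0, w0, g=G_ACCEL):
    """Eq. 20.18 for x_M, together with z_M = w0^2 / (2 g)."""
    return u0 * w0 / g, 0.5 * w0 * w0 / g

def flight_range(u0, w0, g=G_ACCEL):
    """Eq. 20.19: x_R = 2 u0 w0 / g."""
    return 2.0 * u0 * w0 / g

u0, w0 = 12.0, 9.0  # hypothetical initial velocity components, in m/s
x_M, z_M = apex(u0, w0)
x_R = flight_range(u0, w0)
t_M = w0 / G_ACCEL  # time at which the vertical velocity vanishes

print(x_M, z_M, x_R)
print(position(t_M, u0, w0))           # should be close to (x_M, z_M)
print(height_along_path(x_R, u0, w0))  # should be close to zero
```

Evaluating `position` at $t_M = w_0/g$ reproduces the alternative argument given in the text for the maximum height.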
• Maximum range, by changing θ

Finally, consider the following problem. Assume that the cannonball (or the shot put weight) has a prescribed initial speed, say $\|\mathbf{v}_0\| = V > 0$, and that in addition we are free to choose the inclination of the cannon, namely the angle $\theta \in (0, \pi/2)$ that the velocity forms with the horizontal plane. We ask ourselves which angle yields the maximum value for the range $x_R$. It is apparent that, for $\theta = 0$ (horizontal start), the ball starts to go down immediately, and hence the range is zero. As we increase $\theta$, the range gradually increases. On the other hand, if $\theta = \pi/2$ (vertical start), the ball does not move from the vertical (as in Subsection 11.5.1), and hence the range is again equal to zero. As we decrease $\theta$ from $\pi/2$, the range gradually increases. Thus, we expect that, somewhere between $\theta = 0$ and $\theta = \pi/2$, the range attains its maximum value. To find this value of $\theta$, say $\theta_M$, set $V := \|\mathbf{v}_0\|$, so that
\[ u_0 = V \cos\theta \qquad \text{and} \qquad w_0 = V \sin\theta. \tag{20.20} \]
Combining Eqs. 20.19 and 20.20 and using $\sin\theta \cos\theta = \frac{1}{2} \sin 2\theta$ (Eq. 7.64), we have
\[ x_R(\theta) = 2\, \frac{V^2}{g}\, \cos\theta\, \sin\theta = \frac{V^2}{g}\, \sin 2\theta. \tag{20.21} \]
Clearly, the maximum range is obtained when $\sin 2\theta = 1$, that is, for $\theta = \pi/4$. [Alternatively, this is also obtained by imposing $dx_R/d\theta = 2\, (V^2/g) \cos 2\theta = 0$.] The angle for the maximum range is $\theta_M = 45^\circ$. The corresponding maximum range is $x_{R_{\rm Max}} = V^2/g$ (use Eq. 20.21). [If you include air drag, numerical techniques give you less than $45^\circ$, as confirmed by experimental results.]
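The statement on the optimal angle may be explored numerically. The sketch below scans whole degrees, first with Eq. 20.21 and then with a quadratic drag added. The drag model (a deceleration $k\, \|\mathbf{v}\|\, \mathbf{v}$ with a constant $k$), the midpoint time-stepping scheme and all the numerical values are assumptions of mine, introduced only to illustrate the trend; they are not taken from the text.

```python
import math

def range_no_drag(theta, V, g=9.81):
    """Eq. 20.21: x_R = (V^2 / g) sin(2 theta)."""
    return V * V / g * math.sin(2.0 * theta)

def range_with_drag(theta, V, k, g=9.81, dt=2e-3):
    """Range for the assumed model dv/dt = g - k |v| v (k in 1/m).

    Integrated with a midpoint (second-order Runge-Kutta) scheme; the
    landing point is located by linear interpolation within the last step.
    """
    def accel(u, w):
        speed = math.hypot(u, w)
        return -k * speed * u, -g - k * speed * w

    x, z = 0.0, 0.0
    u, w = V * math.cos(theta), V * math.sin(theta)
    while True:
        au, aw = accel(u, w)
        um, wm = u + 0.5 * dt * au, w + 0.5 * dt * aw  # midpoint velocity
        aum, awm = accel(um, wm)
        x_new, z_new = x + dt * um, z + dt * wm
        u, w = u + dt * aum, w + dt * awm
        if z_new < 0.0:
            fraction = z / (z - z_new)
            return x + fraction * (x_new - x)
        x, z = x_new, z_new

def best_angle_deg(range_function):
    """Scan whole degrees from 1 to 89 and return the one with the largest range."""
    return max(range(1, 90), key=lambda d: range_function(math.radians(d)))

V = 30.0  # hypothetical initial speed, in m/s
K = 0.01  # hypothetical drag parameter, in 1/m
angle_no_drag = best_angle_deg(lambda th: range_no_drag(th, V))
angle_with_drag = best_angle_deg(lambda th: range_with_drag(th, V, K))
print(angle_no_drag)    # 45
print(angle_with_drag)  # expected to be below 45 for this drag model
```

Setting `K = 0.0` in `range_with_drag` should reproduce Eq. 20.21 to within the accuracy of the time stepping, which offers a simple consistency check of the integrator.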
20.2.3 Sled on snowy slope with friction (no drag)

Consider a particle on an inclined plane, with a constant angle $\alpha$ with respect to a horizontal plane. Here, we include the effect of the presence of friction. However, for the sake of simplicity, we assume the aerodynamic drag to be negligible: $D \simeq 0$. [The limitation $D \simeq 0$ is removed in Subsection 20.2.4.] As a practical example, you may consider this as a crude model of a sled on a smooth snowy slope, with velocity relatively small so that the aerodynamic drag is negligible (Fig. 20.2).

Fig. 20.2 Sled on a slope

The sled stays in contact with the slope at all times. Hence, the motion is really one–dimensional: the location of the particle may be defined by an abscissa along the slope. To be precise, the forces that affect the acceleration are those in the direction of the slope. However, the forces that are normal to the slope are also relevant to the motion, because they determine the value of the dynamic friction (Subsection 17.8.2). The balance of the normal forces gives us
\[ N = W \cos\alpha, \tag{20.22} \]
as you may verify. [Hint: The acceleration along the normal vanishes.] On the other hand, the tangential forces are: (i) the component of the weight $W \sin\alpha$, pointed downwards, and (ii) the dynamic friction $T = \mu_D N = \mu_D W \cos\alpha$, pointed upwards (use Eqs. 17.89 and 20.22). Therefore, if $x$ denotes the abscissa along the slope (positive downwards), the equation that governs the tangential motion is given by
\[ m\, \frac{d^2x}{dt^2} = F = W \sin\alpha - \mu_D W \cos\alpha. \tag{20.23} \]
Next, we assume that $F > 0$, namely
\[ \tan\alpha > \mu_D. \tag{20.24} \]
Remark 144. Actually, strictly speaking, if the sled is initially at rest, we need $\tan\alpha > \mu_S > \mu_D$ in order for the sled to start moving (Eq. 17.95 and the paragraph that follows). For $\mu_D < \tan\alpha < \mu_S$, we need an initial velocity $v(0) \neq 0$ (a slight push would do). [If $\tan\alpha = \mu_D$ (Eq. 17.96), we are back to Subsection 17.9.3, on a particle on a slope with friction and no drag.]

The force $F$ in Eq. 20.23 is constant. Hence, Eqs. 11.21 and 11.22 apply in this case as well. For simplicity, let the origin coincide with the initial location of the sled, namely $x_0 = 0$. Also, assume the initial velocity to be zero, namely $v_0 = 0$. [As pointed out in the above remark, this requires $\tan\alpha > \mu_S$.] Thus, the solution is
\[ v(t) = \frac{F}{m}\, t \qquad \text{and} \qquad x(t) = \frac{1}{2}\, \frac{F}{m}\, t^2, \tag{20.25} \]
where $F/m = g\, (\sin\alpha - \mu_D \cos\alpha) > 0$ (Eq. 20.24). [Note the analogy with the vertical fall in a vacuum (Subsection 11.5.1), to which the present analysis reduces when $\alpha = \pi/2$. The only difference is the expression for $F$.] Note that the solution is independent of $m$. Indeed, $m$ becomes relevant only if we take into account the aerodynamic drag, as shown in the next subsection.
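The solution in Eq. 20.25 is easily turned into code. What follows is just a sketch; the function names, the value of $g$, and the sample slope angle and friction coefficient are hypothetical.

```python
import math

def sled_acceleration(alpha, mu_d, g=9.81):
    """F/m = g (sin(alpha) - mu_D cos(alpha)); requires tan(alpha) > mu_D (Eq. 20.24)."""
    if math.tan(alpha) <= mu_d:
        raise ValueError("Eq. 20.24 is violated: tan(alpha) must exceed mu_D")
    return g * (math.sin(alpha) - mu_d * math.cos(alpha))

def sled_state(t, alpha, mu_d, g=9.81):
    """Eq. 20.25 for a start from rest at the origin: returns (v, x)."""
    a = sled_acceleration(alpha, mu_d, g)
    return a * t, 0.5 * a * t * t

alpha = math.radians(20.0)  # hypothetical slope angle
mu_d = 0.05                 # hypothetical dynamic friction coefficient
print(sled_state(3.0, alpha, mu_d))
# For alpha = pi/2 the friction term is negligible and the value g of the free fall is recovered.
print(sled_acceleration(math.pi / 2, mu_d))
```

The mass does not appear among the arguments, in agreement with the observation that the solution is independent of $m$.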
20.2.4 Sled on snowy slope with friction and drag
♥
In the preceding subsection, we have considered a sled on a smooth snowy slope, in the presence of friction, but in the absence of aerodynamic drag. Specifically, we have assumed the friction to be relevant, but the aerodynamic drag to be negligible. Such an assumption is acceptable for the early phase of the motion, when the velocity is still small, and hence the drag is negligible. Here, we remove this limitation, namely we consider a sled on a slope, in the presence of both friction and aerodynamic drag (see Fig. 20.3, where now $D \neq 0$).

Fig. 20.3 Sled on a slope

◦ Comment. Déjà vu? The problem was already addressed in connection with a particle on a slope with friction and drag (Subsection 705), for the limited case of constant velocity, say $v_*$, as given in Eq. 17.99. Moreover, the problem is closely related to the one–dimensional one addressed in Section 14.6 (vertical fall of a particle in the air). Indeed, after all, the motion is one–dimensional. [However, we could not cover the problem under consideration then, because the vector formulation is needed, since the longitudinal force due to friction depends upon the normal force.]
The governing equation is (add the drag $D$ to Eq. 20.23)
\[ m\, \frac{dv}{dt} = W \sin\alpha - \mu_D W \cos\alpha - D, \tag{20.26} \]
where $D$ denotes the drag caused by the presence of the air. Recall that experimental evidence shows that the drag is given by $D = \frac{1}{2}\, C_D\, \varrho_A\, A\, v^2$ (Eq. 14.80), where $\varrho_A$ is the air density and $A$ the cross–section area of the object under consideration, whereas the drag coefficient $C_D > 0$ is obtained experimentally and typically may be treated as a constant. The initial conditions are $x(0) = v(0) = 0$. [Of course, here we assume that $\alpha$ is such that $\tan\alpha > \mu_S$ (see Eq. 17.95). However, for $\tan\alpha \in (\mu_D, \mu_S)$, Remark 144, p. 800, applies here as well.] Note that the formulation is formally identical to that of Section 14.6, on the fall of a heavy particle in the presence of aerodynamic drag. Thus, the solution presented in Subsection 14.6.1 is applicable. Hence, the desired solution is given by $x(t) = (m/C) \ln \cosh\big(\sqrt{C F}\; t/m\big)$ (Eq. 14.94), where $C$ is still given by $C = \frac{1}{2}\, C_D\, \varrho_A\, A$ (Eq. 14.82), whereas now
\[ F = m\, g\, \big(\sin\alpha - \mu_D \cos\alpha\big). \tag{20.27} \]
[Of course, if $\alpha = \pi/2$, we recover the solution of Subsection 14.6.1, on free fall in the air, namely Eq. 14.96.]
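The closed-form solution quoted above may be compared with a direct numerical integration of $m\, dv/dt = F - C\, v^2$. The sketch below is only indicative: the classical fourth-order Runge–Kutta scheme is my choice, and the mass, slope, friction coefficient and drag constant $C$ are hypothetical values.

```python
import math

def sled_with_drag_position(t, m, F, C):
    """x(t) = (m / C) ln cosh(sqrt(C F) t / m), for x(0) = v(0) = 0."""
    return (m / C) * math.log(math.cosh(math.sqrt(C * F) * t / m))

def sled_with_drag_numeric(t_end, m, F, C, steps=20000):
    """Integrate dx/dt = v, m dv/dt = F - C v^2 with the classical Runge-Kutta scheme."""
    dt = t_end / steps
    x, v = 0.0, 0.0

    def rate(speed):
        return (F - C * speed * speed) / m

    for _ in range(steps):
        k1 = rate(v)
        k2 = rate(v + 0.5 * dt * k1)
        k3 = rate(v + 0.5 * dt * k2)
        k4 = rate(v + dt * k3)
        # The position update uses the velocities of the same four stages.
        x += dt * (v + dt * (k1 + k2 + k3) / 6.0)
        v += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

# Hypothetical data: mass in kg, angle in radians, C in kg/m.
m = 80.0
alpha = math.radians(20.0)
mu_d = 0.05
C = 0.3
F = m * 9.81 * (math.sin(alpha) - mu_d * math.cos(alpha))  # Eq. 20.27

x_closed = sled_with_drag_position(10.0, m, F, C)
x_numeric = sled_with_drag_numeric(10.0, m, F, C)
print(x_closed, x_numeric)  # the two values should nearly coincide

# For a very small C, the drag-free result of Eq. 20.25 should be approached.
print(sled_with_drag_position(10.0, m, F, 1e-6), 0.5 * F / m * 10.0**2)
```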
20.2.5 Newton gravitation and Kepler laws
A major step forward in the development of celestial mechanics came with the introduction of the Newton law of gravitation, which may be stated as follows

Principle 5 (Newton law of gravitation) Let $m_1$ and $m_2$ be the masses of two material points, $P_1$ and $P_2$, located at $\mathbf{x}_1$ and $\mathbf{x}_2$ respectively. The force of gravitational attraction that the first particle exerts on the second is given by
\[ \mathbf{f}_G(\mathbf{x}_1, \mathbf{x}_2) = -\mathsf{G}\, \frac{m_1\, m_2}{r^3}\, \mathbf{r}, \tag{20.28} \]
where $\mathbf{r} = \mathbf{x}_2 - \mathbf{x}_1$ is the vector from the first to the second particle, and $r = \|\mathbf{r}\|$, whereas
\[ \mathsf{G} = (6.674\,30 \pm 0.000\,15) \cdot 10^{-11}\ \mathrm{m^3\, kg^{-1}\, s^{-2}} \tag{20.29} \]
is known as the gravitational constant.$^{1}$ [All the astronautical data used in this volume are collected on p. 1003.] The relative uncertainty is $2.2 \cdot 10^{-5}$.$^{2}$

◦ Comment. You might get curious and ask whether the terms “gravitation” and “gravity” have the same meaning. Close, but not quite! The relationship between the two is addressed in Subsection 20.8.1.

With the introduction of this law along with the Newton second law (Eq. 20.3), Newton was able to predict (“postdict,” really) the Kepler laws of the motion of the planets.$^{3}$ This is arguably the most beautiful example of the interplay between mathematics and classical mechanics. Newton not only introduced his second law and his law of gravitation. He had to develop infinitesimal calculus in order to be able to provide a solution of the governing equations, so as to compare them with the empirical laws obtained by Kepler, which in turn are based upon the experimental data by Tycho Brahe.$^{4}$

◦ Comment. While by now we have all the know–how required for the derivation of the Kepler laws from the Newton second law and gravitational law, I prefer to group together in Appendix A (Section 20.8) all the material of this chapter pertaining to celestial mechanics. This is done for the sake of organizational clarity, in the sense that I deem it more appropriate to present the material as a whole, rather than in bits and pieces, as new mathematical tools are introduced. Indeed, some portions of Appendix A (Section 20.8) require know–how that has not yet been covered. [Specifically, the analysis of planets in circular motion is based upon the notion of longitudinal and lateral acceleration, addressed in the following section. Also, the material on angular momentum, introduced in Section 20.5, is useful to provide a physical interpretation of the Kepler second law.] If these motivations were not enough for you, I may add that covering all this material requires about ten pages, too long a digression at this point.

$^{1}$ Note that the gravitational constant is typically indicated by $G$. I chose to use $\mathsf{G}$ instead, in order to avoid confusion with the center of gravity (Eq. 17.81) and the center of mass (which will be introduced in Eq. 21.46), both denoted by $G$, as will be discussed in Remark 158, p. 868. [The need to distinguish between the two stems from the fact that both symbols may appear in the same equation (see for instance Eq. 21.167).]

$^{2}$ Note that the mass (gravitational mass) in the gravitational law (Eq. 20.28) coincides with the mass (inertial mass) in the Newton second law (Eqs. 11.11 and 20.3). This fact is the basis of Einstein’s theory of general relativity.

$^{3}$ Johannes Kepler (1571–1630) was a German mathematician and astronomer. He is best known for his laws, which provided Isaac Newton with the appropriate input for the theory of gravitation. After teaching mathematics in Graz, Austria, he became an assistant to the astronomer Tycho Brahe, from whom he received the experimental data that allowed him to formulate his own laws.

$^{4}$ Tycho Brahe (1546–1601) was a Danish astronomer, known for his extremely accurate (for those days) experimental observations on the motion of the planets. Kepler used the experimental data by Brahe to formulate his own laws.
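For the reader who wishes to experiment, the law of gravitation in Eq. 20.28 may be coded as follows. This is a minimal sketch: the function name and the sample data (two particles of 1 kg, 2 m apart) are hypothetical, and the value of the gravitational constant is the central value of Eq. 20.29.

```python
import math

G_CONST = 6.67430e-11  # central value of Eq. 20.29, in m^3 kg^-1 s^-2

def gravitational_force(m1, m2, x1, x2):
    """Force that particle 1 (at x1) exerts on particle 2 (at x2), as in Eq. 20.28."""
    r = tuple(b - a for a, b in zip(x1, x2))  # vector from the first to the second particle
    dist = math.sqrt(sum(c * c for c in r))   # its magnitude
    if dist == 0.0:
        raise ValueError("the two particles must not coincide")
    factor = -G_CONST * m1 * m2 / dist**3
    return tuple(factor * c for c in r)

# Hypothetical sample: two 1 kg particles, 2 m apart along the x-axis.
x1 = (0.0, 0.0, 0.0)
x2 = (2.0, 0.0, 0.0)
force_on_2 = gravitational_force(1.0, 1.0, x1, x2)
force_on_1 = gravitational_force(1.0, 1.0, x2, x1)
print(force_on_2)  # points from particle 2 toward particle 1 (attraction)
print(force_on_1)  # equal and opposite to the force on particle 2
```

The minus sign in Eq. 20.28 is what makes the force attractive: the force on the second particle is directed opposite to $\mathbf{r}$, namely toward the first particle.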
20.3 Intrinsic velocity and acceleration components

In Section 20.1, we pointed out that the acceleration vector is not necessarily parallel to the velocity, since it may have a normal component as well. To address this issue, here the acceleration vector is expressed in terms of the so-called intrinsic components, namely the acceleration components that are tangential and normal to the velocity. Consider, for the sake of clarity, a particle that moves along a prescribed line $L$, described by $\mathbf{x} = \mathbf{x}(s)$. [A good example of a physical problem that can be approximated as the motion of a particle on a prescribed trajectory is a roller–coaster cart.] In this case, the motion is defined by
\[ \mathbf{x} = \mathbf{x}[s(t)]. \tag{20.30} \]
Then, we have the following definitions:

Definition 326 (Intrinsic velocity components). Using $\mathbf{v} = d\mathbf{x}/dt$ (Eq. 20.1), we have $\mathbf{v} = (d\mathbf{x}/ds)\, (ds/dt)$, or
\[ \mathbf{v} = \dot{s}\, \mathbf{t}, \tag{20.31} \]
where $\mathbf{t} = d\mathbf{x}/ds$ (Eq. 18.20) is the unit tangent to the trajectory, whereas $\dot{s}$ is called the speed of the particle along the line and is positive if $s$ increases with time. Of course, the velocity is parallel to the trajectory $L$.

Definition 327 (Intrinsic acceleration components). Using $\mathbf{a} = d\mathbf{v}/dt$ (Eq. 20.2), Eq. 20.31 yields $\mathbf{a} = d\mathbf{v}/dt = \ddot{s}\, \mathbf{t} + \dot{s}^2\, d\mathbf{t}/ds$. Recall that $d\mathbf{t}/ds = \mathbf{n}/R$ (Eq. 18.23), where $R$ is the radius of curvature of the line and $\mathbf{n}$ its normal in the osculating plane, pointed toward the center of curvature. Thus, using $v = \|\mathbf{v}\| = |\dot{s}|$, we have
\[ \mathbf{a} = \ddot{s}\, \mathbf{t} + \frac{v^2}{R}\, \mathbf{n}. \tag{20.32} \]
The vector $\ddot{s}\, \mathbf{t}$ ($v^2\, \mathbf{n}/R$) is called the longitudinal (lateral) acceleration, whereas $\ddot{s}$ ($v^2/R$) is called the longitudinal (lateral) component of the acceleration. For want of a better name, $\ddot{s}$ will be referred to as the particle pick–up (positive if $\dot{s}$ is increasing, e.g., if your car is accelerating).

◦ Warning. It is apparent that the longitudinal acceleration is due to the rate of change of the speed $\dot{s}$, whereas the lateral acceleration is due to the change in the velocity direction. [Note that, contrary to what we did in Eq. 18.18, here we do not require $d\mathbf{x}/dt \neq \mathbf{0}$. Indeed, a change in sign of $\dot{s}$ indicates a reversal of the direction of the motion along the line $L$.]
◦ Comment. It should be emphasized that intrinsic acceleration components have broader use than studying the motion of a particle along a given trajectory. [For instance, the study of the motion of an airplane is typically based upon the use of intrinsic acceleration components, even though the trajectory is obtained only after solving the equations of motion.] Indeed, the prescribed–trajectory assumption has been introduced here only for the sake of convenience and clarity, but is not essential for any of the mathematical derivations used above. That said, the equations are particularly useful in the case of motion along a prescribed trajectory.
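When the velocity and the acceleration vectors are known at a given instant, the intrinsic components of Eq. 20.32 may be extracted without any knowledge of the trajectory. The sketch below does this with the projections $\mathbf{a} \cdot \mathbf{v}/\|\mathbf{v}\|$ and $\|\mathbf{v} \times \mathbf{a}\|/\|\mathbf{v}\|$, which are standard vector identities rather than formulas from the text. It assumes $\dot{s} > 0$, so that the tangent $\mathbf{t}$ has the direction of $\mathbf{v}$. The function names and the sample data (uniform circular motion) are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def intrinsic_components(velocity, acceleration):
    """Return (longitudinal component, lateral component, radius of curvature).

    longitudinal = a . v / |v|    (rate of change of the speed |v|)
    lateral      = |v x a| / |v|  (equal to v^2 / R in Eq. 20.32)
    """
    speed = math.sqrt(dot(velocity, velocity))
    if speed == 0.0:
        raise ValueError("the tangent is undefined when the velocity vanishes")
    longitudinal = dot(acceleration, velocity) / speed
    w = cross(velocity, acceleration)
    lateral = math.sqrt(dot(w, w)) / speed
    radius = math.inf if lateral == 0.0 else speed * speed / lateral
    return longitudinal, lateral, radius

# Sample check: uniform circular motion, radius 3 and angular speed 2 (hypothetical values).
radius_c, omega, angle = 3.0, 2.0, 0.8
v = (-radius_c * omega * math.sin(angle), radius_c * omega * math.cos(angle), 0.0)
a = (-radius_c * omega**2 * math.cos(angle), -radius_c * omega**2 * math.sin(angle), 0.0)
components = intrinsic_components(v, a)
print(components)  # longitudinal close to 0, lateral close to 12, radius close to 3
```

For the circular motion used here the speed is constant, so the whole acceleration is lateral, in line with the remark of Section 20.1 on a car turning at constant speed.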
20.3.1 A heavy particle on a frictionless wire

As an illustrative example, consider a heavy particle, moving without friction along a given trajectory, such as a bead moving along a frictionless wire. Assume also the aerodynamic drag to be negligible. [You may also think of a particle sliding down an icy slope with prescribed trajectory, such as a bobsled. However, we have to impose that there is no separation when the contact is unilateral. No ski jumping, just to make sure that we understand each other.]

• A bead moving along a helicoidal spiral

Consider a bead moving along a helicoidal spiral. The objective here is to relate the acceleration to the velocity. Accordingly, in this subsection, the speed $\dot{s}(t)$ is assumed to be known. [To give you an example of how this may occur, assume the bead to be only subject to: (i) the force of gravity and (ii) the reaction of a frictionless constraint. As we will see in Section 20.7, in this case (in analogy with the one–dimensional case, Eq. 11.114), the speed may be obtained from the conservation of energy, and is given by
\[ v = \sqrt{2\, g\, h}, \tag{20.33} \]
with $h = z_0 - z > 0$, where $z_0$ corresponds to the point where the velocity vanishes, as in a start from rest (see Eq. 20.127).] Specifically, recall that a helicoidal spiral is described by Eq. 18.34, namely
\[ x = R_H \cos\theta, \qquad y = R_H \sin\theta, \qquad z = a\, \theta, \tag{20.34} \]
where $R_H$ is the radius of the helix, and $2 \pi a$ is called its pitch.
Part III. Multivariate calculus and mechanics in three dimensions
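As stated above, here we assume that the speed $\dot{s}(t)$ is prescribed. Then, using $\mathbf{a} = \ddot{s}\,\mathbf{t} + (v^2/R)\,\mathbf{n}$ (Eq. 20.32), and recalling that for a helix we have $1/R = \kappa = R_H/(R_H^2 + a^2)$ (first in Eq. 18.39), the acceleration is given by
$$\mathbf{a} = \ddot{s}\,\mathbf{t} + \frac{R_H\, v^2}{R_H^2 + a^2}\,\mathbf{n}, \tag{20.35}$$
with $\mathbf{t} = (-R_H \sin\theta\,\mathbf{i} + R_H \cos\theta\,\mathbf{j} + a\,\mathbf{k})/\sqrt{R_H^2 + a^2}$ and $\mathbf{n} = -\cos\theta\,\mathbf{i} - \sin\theta\,\mathbf{j}$ (Eq. 18.37 and second in Eq. 18.39, respectively).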
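The curvature used in Eq. 20.35 can be checked numerically. The following is only a minimal sketch, not part of the original formulation: it differentiates the helix of Eq. 20.34 by central differences and uses the standard expression $\kappa = \|\mathbf{r}' \times \mathbf{r}''\| / \|\mathbf{r}'\|^3$. The function names, the step size h and the sample values of $R_H$, $a$ and $v$ are illustrative assumptions.

```python
import math

def helix(theta, R_H, a):
    """Point on the helix of Eq. 20.34."""
    return (R_H * math.cos(theta), R_H * math.sin(theta), a * theta)

def curvature_numeric(theta, R_H, a, h=1e-4):
    """Curvature |r' x r''| / |r'|^3, with r', r'' from central differences in theta."""
    pm = helix(theta - h, R_H, a)
    p0 = helix(theta, R_H, a)
    pp = helix(theta + h, R_H, a)
    r1 = [(u - w) / (2.0 * h) for u, w in zip(pp, pm)]
    r2 = [(u - 2.0 * v + w) / h**2 for u, v, w in zip(pp, p0, pm)]
    cross = (r1[1] * r2[2] - r1[2] * r2[1],
             r1[2] * r2[0] - r1[0] * r2[2],
             r1[0] * r2[1] - r1[1] * r2[0])
    norm_cross = math.sqrt(sum(c * c for c in cross))
    norm_r1 = math.sqrt(sum(c * c for c in r1))
    return norm_cross / norm_r1**3

def curvature_exact(R_H, a):
    """Closed form quoted in the text: kappa = R_H / (R_H^2 + a^2)."""
    return R_H / (R_H**2 + a**2)

def lateral_acceleration(v, R_H, a):
    """Normal component of the acceleration in Eq. 20.35: v^2 * kappa."""
    return v**2 * curvature_exact(R_H, a)

# Illustrative values (assumed): R_H = 2, a = 0.5, v = 3
print(curvature_numeric(0.7, 2.0, 0.5), curvature_exact(2.0, 0.5))
print(lateral_acceleration(3.0, 2.0, 0.5))
```

The numerical value does not depend on the chosen θ, consistent with the fact that a helix has constant curvature.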
• A heavy particle on a planar vertical path (no friction)
Here, we consider a heavy particle, moving without friction along a given planar vertical trajectory, e.g., along a wire that lies on a vertical plane. Again, we assume the aerodynamic drag to be negligible. The only forces acting on the particle are the gravity $m\,\mathbf{g}$, and the reaction of the frictionless wire $N\,\mathbf{n}$, where $\mathbf{n}$ is the normal to the wire, pointing toward the center of curvature of the trajectory. Accordingly, the Newton second law states that $m\,\mathbf{a} = m\,\mathbf{g} + N\,\mathbf{n}$. Since the trajectory is fixed, it is convenient to express the acceleration in terms of its intrinsic components. Thus, using Eq. 20.32, the equation governing the motion is
$$m\,\frac{d^2 s}{dt^2}\,\mathbf{t} + m\,\frac{v^2}{R}\,\mathbf{n} = m\,\mathbf{g} + N\,\mathbf{n}. \tag{20.36}$$
[Of course, $N$ is positive if the constraint reaction is pointed in the same direction as $\mathbf{n}$, and negative otherwise.] Setting $\mathbf{g} = -g\,\mathbf{j}$, with $\mathbf{j}$ pointing upwards, and dotting Eq. 20.36 with $\mathbf{t}$, one obtains the longitudinal component of the equation, namely
$$\frac{d^2 s}{dt^2} = -g \sin\alpha, \tag{20.37}$$
where $\sin\alpha = \mathbf{j} \cdot \mathbf{t}$, $\alpha$ being the angle between $\mathbf{t}$ and the x-axis (see Fig. 20.4, where $\alpha > 0$). This is the desired (exact) equation for the dynamics of any heavy particle on a prescribed trajectory on a vertical plane. On the other hand, dotting Eq. 20.36 with $\mathbf{n}$, one obtains the reaction $N$, namely
$$N = m\,\frac{v^2}{R} + m\,g \cos\alpha, \tag{20.38}$$
Fig. 20.4 Particle on a prescribed planar vertical path
with cos α = j · n. [Again, n points towards the center of curvature of the trajectory (Definition 293, p. 712), namely upwards in Fig. 20.4.]
• Ideal pendulum
Sometimes it is convenient to obtain a desired trajectory by connecting the particle P to a fixed point O through a massless rod, hinged at O and having a length $\ell$ that may be a function of θ, namely $\ell = \ell(\theta)$, where θ denotes the angle between the segment OP and a vertical line. Accordingly, the formulation in the preceding subsubsection may be used to study, in particular, the dynamics of a pendulum, understood as a mass point with a distance from a fixed point given by $\ell = \ell(\theta)$ (Fig. 20.5).
Fig. 20.5 Pendulum
Fig. 20.6 Circular pendulum
Two types of pendulums are considered in the subsections that follow. In Subsection 20.3.2 we assume $\ell$ to be constant (ideal circular pendulum). For small–amplitude oscillation, the frequency is nearly constant. The assumption that $\ell$ is constant is removed in Subsection 20.3.3, where we will consider the so–called Huygens isochronous pendulum, namely a pendulum with a length given as a prescribed function of θ, in such a way that the frequency is independent of the amplitude of oscillation even for large amplitudes.
20.3.2 The circular pendulum
Here, we assume $\ell$ to be independent of θ, so that the curve followed by the particle is an arc of a circle, with radius $\ell$, as depicted in Fig. 20.6. In this case, the pendulum is called circular. [It will be referred to as ideal to distinguish it from the realistic pendulum, addressed in Subsection 23.7.1.] Let us choose the origin of the reference system to coincide with the center O of the circle (Fig. 20.6). Then, using Eq. 7.42, the trajectory may be expressed in parametric form as $x = \ell \sin\theta$ and $y = -\ell \cos\theta$. [Here, θ = 0 coincides with the bottom point of the circle, namely the point A in Fig. 20.6, where $y = -\ell$.] Recalling that $s = \ell\,\theta$ and noting that in Eq. 20.37 α may be replaced with θ (since for a circular pendulum the two coincide, Fig. 20.6) yields $\ell\,\ddot{\theta} + g \sin\theta = 0$, namely
$$\frac{d^2\theta}{dt^2} + C \sin\theta = 0, \tag{20.39}$$
where
$$C = g/\ell > 0. \tag{20.40}$$
Equation 20.39 is the exact governing equation for a circular pendulum. The solution to Eq. 20.39 is addressed in the subsubsections that follow, for the limited small–amplitude case, around either the bottom or the top of the circle. [Large amplitude oscillations are addressed in Subsection 20.7.2.]
• Small oscillations around the bottom
Here, we assume the amplitude of oscillation around θ = 0 (bottom location) to be small. Accordingly, we can use $\sin\theta = \theta + O[\theta^3]$ (Taylor polynomial, Eq. 13.40). Neglecting higher–order terms, we obtain the linearized circular pendulum equation:
$$\frac{d^2\theta}{dt^2} + C\,\theta = 0. \tag{20.41}$$
Consider, for instance, the following initial conditions: $\theta(0) = \theta_0 \neq 0$ and $\dot{\theta}(0) = 0$. Then, using $x(t) = x_0 \cos\omega t + (v_0/\omega) \sin\omega t$ (Eq. 11.40), the solution is
$$\theta = \theta_0 \cos\omega t, \tag{20.42}$$
where (Eq. 20.40)
$$\omega = \sqrt{C} = \sqrt{g/\ell} \tag{20.43}$$
is the frequency of oscillation. The corresponding period of oscillation is given by $T = 2\pi/\omega$ (Eq. 11.38), namely
$$T = 2\pi \sqrt{\ell/g}. \tag{20.44}$$
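To get a feeling for how good the linearization is, one may integrate the exact equation (Eq. 20.39) numerically and compare the result with Eq. 20.42. The following is a minimal sketch under stated assumptions: the classical fourth–order Runge–Kutta scheme is used (it is not introduced in the text), and the values g = 9.81, ℓ = 1 and the number of steps are illustrative choices.

```python
import math

def max_deviation(theta0, g=9.81, ell=1.0, steps=20000):
    """Largest |theta_exact(t) - theta0*cos(omega*t)| over one linearized period.

    theta_exact is obtained by integrating theta'' = -C sin(theta) (Eq. 20.39)
    with theta(0) = theta0, theta'(0) = 0, using a classical RK4 scheme.
    """
    C = g / ell                      # Eq. 20.40
    omega = math.sqrt(C)             # Eq. 20.43
    dt = (2.0 * math.pi / omega) / steps
    theta, rate = theta0, 0.0
    worst = 0.0
    for n in range(1, steps + 1):
        k1t, k1r = rate, -C * math.sin(theta)
        k2t, k2r = rate + 0.5 * dt * k1r, -C * math.sin(theta + 0.5 * dt * k1t)
        k3t, k3r = rate + 0.5 * dt * k2r, -C * math.sin(theta + 0.5 * dt * k2t)
        k4t, k4r = rate + dt * k3r, -C * math.sin(theta + dt * k3t)
        theta += dt * (k1t + 2.0 * k2t + 2.0 * k3t + k4t) / 6.0
        rate += dt * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        linear = theta0 * math.cos(omega * n * dt)   # Eq. 20.42
        worst = max(worst, abs(theta - linear))
    return worst

print(max_deviation(0.05))  # small amplitude: the two solutions nearly coincide
print(max_deviation(1.0))   # large amplitude: the linearized solution drifts away
```

The growing discrepancy at large amplitude is mostly a phase error, since the exact period increases with the amplitude.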
• Small disturbances around the top
Here, we study the motion of the particle in a small neighborhood of the top of the circle, where θ = π. Accordingly, set $\theta = \pi + \varphi$. Then, using $\sin\theta = \sin(\pi + \varphi) = -\sin\varphi$ (Eq. 6.69), Eq. 20.39 yields
$$\frac{d^2\varphi}{dt^2} - C \sin\varphi = 0. \tag{20.45}$$
For suitably small disturbances, we obtain $\ddot{\varphi} - C\,\varphi = 0$, with $C := g/\ell > 0$. Accordingly, using similar initial conditions, namely $\varphi(0) = \varphi_0 \neq 0$ and $\dot{\varphi}(0) = 0$, the solution is (Eq. 14.107)
$$\varphi(t) = \varphi_0 \cosh\!\left(\sqrt{C}\, t\right). \tag{20.46}$$
In this case, the solution is unstable.
◦ Comment. As the solution grows, the small–disturbance assumption is no longer valid! In other words, the above small–disturbance solution is valid only at the beginning of the motion.
20.3.3 The Huygens isochronous pendulum
♥
Here, we present the formulation for the so–called Huygens pendulum.5 At least in theory, such a pendulum is isochronous, namely with a period independent of the amplitude.6 ◦ Comment. The formulation makes use of certain types of curves, known as cycloids and tautochrones. In particular, at the end of this subsection we show that the curve described by the isochronous pendulum (Eq. 20.54) is a tautochrone. In order not to interrupt the flow of the presentation, these curves are addressed in Appendix B (Section 20.9). Note that the period of oscillation T = 2π/ω (Eq. 11.38) of the linearized pendulum equation (Eq. 20.42) is independent of θ0 (the value of θ when θ˙ = 0). On the other hand, as we will see in Subsection 20.7.2, for the exact circular pendulum equation, the period of oscillation depends upon the amplitude of oscillation, as apparent by comparing Eq. 20.133 (exact) and Eq. 20.141 (approximate). Thus, the circular pendulum is not isochronous (or, if you prefer, is isochronous only for infinitesimally small oscillations). It was very important, in the Huygens days, to have an isochronous pendulum. Indeed, in those days, on ships traveling over the oceans an accurate evaluation of the longitude, then based upon the locations of the stars, required an equally accurate measure of the time of the day. Thus, isochronicity was a paramount feature for an accurate pendulum clock to have.
• Mathematical formulation
♥♥
To decide what to do, recall the exact equation governing the motion of a particle on a prescribed vertical trajectory, namely $\ddot{s} = -g \sin\alpha$ (Eq. 20.37), where α is the angle that the local tangent to the curve makes with the x-axis (the y-axis being aligned like gravity). From this stems the idea of obtaining an isochronous pendulum by choosing a trajectory such that sin α is proportional to s, so as to make the governing equation identical to the linearized pendulum equation.
5 Christiaan Huygens (1629–1695) was a Dutch physicist, astronomer, mathematician and horologist. Huygens is one of the greatest scientists of all time. As an inventor, he got the first patent on the pendulum clock. As an experimental physicist, he improved the design of the telescope, with which he studied the rings of Saturn and discovered its moon Titan. As a theoretician, he is considered the father of mathematical physics, because of his extensive use of mathematics to formulate physics. In particular, he proposed the wave theory of light (in contrast to the later theory by Newton, based upon light particles), and obtained the mathematical expression for the centrifugal force (Section 22.4, on apparent inertial forces). He even wrote the first book on probability.
6 The term isochronous is from ancient Greek, namely ισoς (isos, equal) and χρoνoς (chronos, time).
In order to identify the equation of the trajectory that the particle has to follow to provide isochronicity, let the curve of the trajectory be described in the following unusual but convenient form
$$y = y(s), \tag{20.47}$$
where y denotes the distance from a given horizontal line, whereas as usual s is the arclength along the curve. Given $y = y(s)$, we can obtain sin α as $\sin\alpha = dy/ds$, where $\alpha \in (-\pi/2, \pi/2)$. [Indeed, this stems from the geometrical definition of the function sin α (Eq. 6.63). Alternatively, recall that $dy/dx = \tan\alpha$ (Eq. 9.7), so that
$$\frac{dy}{ds} = \frac{dy}{\sqrt{dx^2 + dy^2}} = \frac{\tan\alpha}{\sqrt{1 + \tan^2\alpha}} = \sin\alpha, \tag{20.48}$$
as you may verify.] Thus, combining with $\ddot{s} = -g \sin\alpha$ (Eq. 20.37), we have
$$\frac{d^2 s}{dt^2} + g\,\frac{dy}{ds} = 0. \tag{20.49}$$
This equation reduces to the linearized equation of the pendulum if, and only if, dy/ds is proportional to s, namely
$$\frac{dy}{ds} = 2 Q s \qquad (Q > 0), \tag{20.50}$$
whose solution is discussed in the subsubsection that follows. Accordingly, combining Eqs. 20.49 and 20.50, one obtains
$$\frac{d^2 s}{dt^2} + C s = 0 \qquad (C = 2 g Q > 0). \tag{20.51}$$
Given the fact that C > 0, for the initial conditions $s(0) = s_0$ and $\dot{s}(0) = 0$ the solution is given by
$$s = s_0 \cos\omega t, \tag{20.52}$$
where $s_0$ is the initial value of s, whereas
$$\omega = \sqrt{C} = \sqrt{2 g Q} \tag{20.53}$$
is the frequency of oscillation, which is independent of $s_0$, as desired.
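The isochronism implied by Eqs. 20.51 to 20.53 can be illustrated numerically by integrating Eq. 20.49 for two choices of dy/ds: the Huygens choice 2Qs (Eq. 20.50) and the circular–pendulum choice sin(s/ℓ). The following is a minimal sketch, not the author's procedure; it assumes a classical Runge–Kutta integrator, a linear interpolation to locate the first zero crossing of s, and the illustrative values g = 9.81, ℓ = 1 and Q = 1/(2ℓ), chosen so that the two curves agree near the bottom. Since dy/ds = sin α cannot exceed one, the Huygens case is meaningful only for |s| ≤ 1/(2Q).

```python
import math

G = 9.81  # assumed value of g, in m/s^2

def period(slope, s0, dt=1e-4):
    """Period of s'' = -G * slope(s) (Eq. 20.49), started at rest from s0 > 0.

    The motion is integrated with RK4 until s first crosses zero;
    by symmetry, the period is four times that time.
    """
    def acc(s):
        return -G * slope(s)
    s, v, t = s0, 0.0, 0.0
    while True:
        k1s, k1v = v, acc(s)
        k2s, k2v = v + 0.5 * dt * k1v, acc(s + 0.5 * dt * k1s)
        k3s, k3v = v + 0.5 * dt * k2v, acc(s + 0.5 * dt * k2s)
        k4s, k4v = v + dt * k3v, acc(s + dt * k3s)
        s_new = s + dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if s_new <= 0.0:
            frac = s / (s - s_new)          # linear interpolation of the crossing
            return 4.0 * (t + frac * dt)
        s, v, t = s_new, v_new, t + dt

ELL = 1.0
Q = 1.0 / (2.0 * ELL)

def huygens_slope(s):
    return 2.0 * Q * s                      # Eq. 20.50

def circle_slope(s):
    return math.sin(s / ELL)                # circular pendulum, s = ell * theta

for s0 in (0.1, 0.9):
    print(s0, period(huygens_slope, s0), period(circle_slope, s0))
```

For the Huygens curve the two periods coincide with 2π/√(2gQ), whereas for the circle the period grows with the amplitude.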
• Isochronous pendulum and tautochrones
♥♥♥
Here, we discuss the relationship between the trajectory of an isochronous pendulum and a tautochrone. [Tautochrones are introduced in Appendix B, Subsection 20.9.2.] We have the following
Theorem 209. The curve defined by Eq. 20.54 is a tautochrone with radius R = 1/(8Q).
Proof. ♥ To begin with, Eq. 20.50 yields (choose the additive constant of integration, B = 0, so as to have y = 0 when s = 0)
$$y = Q s^2 \qquad (Q > 0), \tag{20.54}$$
as you may verify. This equation implies $dy/ds = 2 Q s = 2 Q \sqrt{y/Q} = 2 \sqrt{Q y}$, namely
$$\left(\frac{dy}{ds}\right)^2 = 4 Q y. \tag{20.55}$$
On the other hand, as we will see in Subsection 20.9.2, for a tautochrone we have $(dx/dy)^2 = (2R - y)/y$ (Eq. 20.187). Hence, using $(ds)^2 = (dx)^2 + (dy)^2$, we have
$$\left(\frac{ds}{dy}\right)^2 = 1 + \left(\frac{dx}{dy}\right)^2 = 1 + \frac{2R - y}{y} = \frac{2R}{y}, \tag{20.56}$$
in agreement with Eq. 20.55, provided that 8Q = 1/R.
◦ Comment. Note that, if s is small, we have $s \simeq x$, and hence $y \simeq Q x^2$. Thus, in a small neighborhood of the origin, the curve in Eq. 20.54 may be approximated with a vertical parabola, with the apex being its bottom point. Accordingly, Eq. 20.49 reduces to the linearized pendulum equation.
• Historical comments
♥
The above results are exploited in the Huygens isochronous pendulum. Specifically, Huygens proposed an isochronous pendulum, which is obtained by suspending a particle in such a way that its trajectory satisfies Eq. 20.54. To implement this, let L denote the string that connects the mass point to the fixed point. If the string L, as it swings, wraps itself around a curve C, the trajectory of the particle is no longer circular. It is possible to choose the curve C so that the trajectory satisfies Eq. 20.54. [For your information,
the curve C is itself an upside–down cycloid, a curve closely related to the tautochrone, and discussed in Subsection 20.9.1.] In reality, when implemented in Huygens’ days, the results were not as good as hoped for. This is attributed to the fact that the friction due to L rolling over C causes a lack of isochronism larger than the one Huygens was trying to correct. [More on the subject in Boyer, Ref. [13], p. 374–378.]
20.4 The d’Alembert principle of inertial forces
Sometimes, the motion is prescribed (for instance, as a desirable objective in the design of a roller coaster, or for an airplane landing approach) and one wants to find out what forces are necessary to obtain such a motion. In this case, it may be convenient to combine the Newton second law, $m\,\mathbf{a} = \mathbf{f}$ (Eq. 20.3), with $\mathbf{f} = \sum_{k=1}^{n} \mathbf{f}_k$ (Eq. 20.4). Specifically, set
$$\mathbf{f}_0 := -m\,\mathbf{a} = -m\,\ddot{s}\,\mathbf{t} - m\,\frac{v^2}{R}\,\mathbf{n}, \tag{20.57}$$
to obtain
$$\sum_{k=0}^{n} \mathbf{f}_k = \mathbf{0}. \tag{20.58}$$
[Note that, contrary to Eq. 20.4, in Eq. 20.58 the sum starts from k = 0.] The vector $\mathbf{f}_0$ is called the inertial force. Equation 20.58 is known as the d’Alembert principle of inertial forces, named after d’Alembert who introduced it. Equation 20.58 allows one to treat dynamics problems as if they were statics ones. Indeed, Eq. 20.58 is formally equal to the Newton law of the static equilibrium of a particle, Eq. 17.1.
◦ Comment. Of course, Eq. 20.58 is not really a principle; it is simply a definition of $\mathbf{f}_0$. Indeed, it is a consequence of the Newton second law (Eq. 20.3). However, historically, it was introduced as a principle, and I prefer to maintain the terminology.
Remark 145. In Eq. 20.32, the terms $-m\,\ddot{s}\,\mathbf{t}$ and $-m\,v^2\,\mathbf{n}/R$ will be referred to as the longitudinal and lateral inertial forces, respectively. Some authors use the term “centrifugal force” to refer to the lateral inertial force, namely to the (inertial) force due to the lateral acceleration. I believe that this is misleading and can be a source of confusion, since the centrifugal force is an (apparent) inertial force field that is experienced in rotating frames of reference (Chapter 22). For this reason, I will distinguish between the two and use only the term
“lateral inertial force” for the term $-m\,v^2\,\mathbf{n}/R$. [It should be acknowledged, however, that in some cases (e.g., a particle in uniform rigid–body rotation around an axis), both points of view are valid, and indeed indistinguishable from each other.]
20.4.1 Illustrative example: a car on a turn
♥
In order to illustrate the d’Alembert principle of inertial forces, consider a race car on a turn. First, we present the formulation for a turn that we assume to be level, then we consider a banked turn. Finally, we present some criticism of these formulations.
• A car on a level turn Consider a car on a level turn, specifically a car that turns counterclockwise (namely, so that the center of rotation is placed to the left in Fig. 20.7, which shows the back of the car).
Fig. 20.7 Car on a level turn
At the risk of oversimplifying the model, we treat the car as if it were a point particle. [A critique of this assumption is presented in Subsubsection “In search of a more sophisticated analysis,” on p. 818.] Let us introduce the right–handed set of vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, where $\mathbf{j}$ is the horizontal outwardly–directed normal to the trajectory, $\mathbf{k}$ is directed vertically upwards, whereas $\mathbf{i}$ is normal to the plane of the figure, pointing towards your eyes (Figure 20.7). Accordingly, the velocity is given by $\mathbf{v} = -v\,\mathbf{i}$, with
$v = \|\mathbf{v}\| > 0$. Assume that the car travels at a constant speed. Hence, in using Eq. 20.32, we can set $\ddot{s} = 0$, so that the inertial force includes only the lateral portion. The normal $\mathbf{n}$ to the trajectory coincides with $-\mathbf{j}$. Hence, the acceleration is given by $\mathbf{a} = -v^2\,\mathbf{j}/R$, where R > 0 is the local radius of curvature of the turn. [It makes no difference if R is constant or not.] Accordingly, the lateral inertial force is given by $\mathbf{f}_0 = m\,v^2\,\mathbf{j}/R$. The other forces acting on the car are: (i) the weight $\mathbf{f}_W = -m\,g\,\mathbf{k}$ (Eq. 20.12); (ii) the constraint reaction $N\,\mathbf{n}_P$ (N > 0), where $\mathbf{n}_P$ is the upward unit normal to the (level) pavement; (iii) the aerodynamic drag $\mathbf{f}_D = D\,\mathbf{i}$; (iv) the propulsive force $\mathbf{f}_P = -P\,\mathbf{i}$; and (v) the sidewise friction force $\mathbf{f}_F = T\,\mathbf{t}_P$, where $\mathbf{t}_P$ is the unit tangent to the pavement on the plane of the figure, pointing towards the center of curvature of the trajectory.
Remark 146. It may be noted that the propulsive force $\mathbf{f}_P$ is also a friction force. [No friction, no propulsion! Try to accelerate suddenly on a patch of ice!] Indeed, for the sake of simplicity, we define the propulsive force $\mathbf{f}_P$ to be the difference between the propulsive friction of the drive wheels and the drag friction of the other wheels. Specifically, by definition, the propulsive force acts in the direction of the motion, whereas the sidewise friction force acts in the direction perpendicular to the motion.
According to the d’Alembert principle of inertial forces, the equilibrium requires that
$$m\,\frac{v^2}{R}\,\mathbf{j} - m\,g\,\mathbf{k} + N\,\mathbf{n}_P + D\,\mathbf{i} - P\,\mathbf{i} + T\,\mathbf{t}_P = \mathbf{0}, \tag{20.59}$$
as if we were dealing with statics. Note that $\mathbf{t}_P = -\mathbf{j}$ and $\mathbf{n}_P = \mathbf{k}$. Hence, equating the three Cartesian components, we have
$$P = D, \qquad T = m\,\frac{v^2}{R}, \qquad N = m\,g. \tag{20.60}$$
The first equation, P = D, indicates that to keep the speed constant the propulsive force (inclusive of the longitudinal friction force, Remark 146 above) must equal the aerodynamic drag. On the other hand, the third equation gives us N. So, let us concentrate on the second equation. Note that T is limited by the static friction law. This yields (use Eq. 17.87, as well as Eq. 20.60) $m\,v^2/R = T < T_{Max} = \mu_S\,N = \mu_S\,m\,g$, namely $v < \sqrt{\mu_S\,g\,R}$.
0 and $\theta_S > 0$ contribute towards increasing $v_M$ (Eq. 20.64).
◦ Comment. If $v^2 < R\,g \tan\theta$, we have T < 0 (second in Eq. 20.62). Then, we should replace $\theta_S$ with $-\theta_S < 0$, and hence $v_{Min} = \sqrt{\tan(\theta - \theta_S)\,g\,R}$ gives the lower speed limit of the static friction range. For lower speeds, the car would slide downwards: gravity would be overwhelming!
• In search of a more sophisticated analysis
♥♥♥
At this point, you might be complaining, because the above formulation is oversimplified. For instance, still treating the car as a point particle, the in–plane force for assessing if the static friction law is violated should not be limited to friction, but should also include the propulsion $-P\,\mathbf{i}$ (also in–plane), where $P = D = \tfrac{1}{2}\,C_D\,\varrho_A\,A\,v^2$ (first in Eq. 20.62 and Eq. 14.80). Accordingly, it would be more appropriate to replace Eq. 20.63 with
$$\left[\left(m\,\frac{v_M^2}{R} \cos\theta - m\,g \sin\theta\right)^2 + \left(\frac{1}{2}\,C_D\,\varrho_A\,A\,v_M^2\right)^2\right]^{1/2} = \mu_S\,N = \mu_S \left(m\,\frac{v_M^2}{R} \sin\theta + m\,g \cos\theta\right). \tag{20.65}$$
In addition, race cars use wings to provide a downforce (also called negative lift). This should be added to N in Eq. 20.65.
An even more serious objection stems from the fact that we have treated the car as a point particle and lumped the four reaction forces acting on the four wheels into a single one (OK for a unicycle, not a car), namely $N\,\mathbf{k}$ (Remark 147, p. 816). In view of the fact that the d’Alembert principle of inertial forces has transformed the problem into a static one, we can apply the formulation of Subsection 17.9.1, on pushing a tall stool, and determine the reaction forces $N_j$ acting on each of the four wheels. We would not be able, however, to determine the friction force acting on each wheel. Even for the reaction forces, the situation becomes more complicated if we try to distinguish between front and back wheels. With this, we would be able to address oversteering (which occurs when both back wheels are sideslipping) and understeering (which occurs when both front wheels are sideslipping). However, to fully understand that, we should treat the car as a collection of rigid bodies (e.g., frame and wheels) and apply the angular momentum equation (see next section), in order to determine the normal load on front and back wheels. This, in turn, requires that we distinguish between front– and back–wheel–drive cars. Moreover, we should include the fact that the condition $T < T_{Max} = \mu_S\,N$ (Eq. 17.87) should be applied to each individual tire. This requires an analysis of the elasticity of the suspensions, and even of the tires themselves. [Indeed, the system is not statically determinate, as you may verify.]
Clearly, we are not in a position to include all of these considerations at this point in our journey. In any event, if we take all the facts into account the final expression would be so complicated that it would not be easy to draw any simple conclusion about what is happening.
20.5 Angular momentum
In Section 17.5, we introduced the moment of a force and used it for statics. Here, we use it for dynamics. Recall that, for one–dimensional problems, the quantity p = m v is called the momentum of a particle having mass m and velocity v (Eq. 20.5). Here, we introduce the following
Definition 328 (Angular momentum of a particle). Consider a particle, with mass m and velocity $\mathbf{v}$. The quantity
$$\mathbf{h}_O = (\mathbf{x} - \mathbf{x}_O) \times \mathbf{p} = m\,(\mathbf{x} - \mathbf{x}_O) \times \mathbf{v} \tag{20.66}$$
is called the angular momentum of the particle, with respect to the point O.
Next, let us introduce the angular–momentum equation. This is obtained by taking the cross product between $\mathbf{x} - \mathbf{x}_O$ and the Newton second law, $m\,\mathbf{a} = \mathbf{f}$ (Eq. 20.3). Recalling that $\mathbf{m}_O = (\mathbf{x} - \mathbf{x}_O) \times \mathbf{f}$ is the moment of the force $\mathbf{f}$ with respect to the fixed point O (Eq. 17.53), we obtain
$$m\,(\mathbf{x} - \mathbf{x}_O) \times \mathbf{a} = m\,\frac{d}{dt}\Big[(\mathbf{x} - \mathbf{x}_O) \times \mathbf{v}\Big] = \mathbf{m}_O. \tag{20.67}$$
[Hint: Use $\frac{d}{dt}[(\mathbf{x} - \mathbf{x}_O) \times \mathbf{v}] = \mathbf{v} \times \mathbf{v} + (\mathbf{x} - \mathbf{x}_O) \times \mathbf{a} = (\mathbf{x} - \mathbf{x}_O) \times \mathbf{a}$, since $\mathbf{b} \times \mathbf{b} = \mathbf{0}$, for any $\mathbf{b}$ (Eq. 15.61).] If $\mathbf{m}_O = \mathbf{0}$, we can integrate the above equation and obtain the equation for the angular momentum conservation:
$$(\mathbf{x} - \mathbf{x}_O) \times \mathbf{p} = (\mathbf{x}_0 - \mathbf{x}_O) \times \mathbf{p}_0. \tag{20.68}$$
20.5.1 Spherical pendulum
♥
◦ Comment. The true relevance of the momentum equation will appear more evident in connection with multiple particles (Chapter 21), and even more for rigid bodies (Chapter 23) and deformable continua (Vol. III). However, the angular–momentum equation is useful for a particle that is constrained at a point (pivot), but free to swing in any direction, as in the case of the so–called spherical pendulum. Specifically, consider a particle that has mass m and is placed at one endpoint of a massless rigid bar, having length $\ell$. [Recall that a massless bar behaves by definition like a string, except that contrary to the string it can withstand both tension and compression (Remark 139, p. 695).] The bar in turn is connected, at the other end to
a fixed point Q, through a universal joint, also known as a Cardano joint (point hinge), so that the bar is free to swing in any direction.7 The forces acting on the particle are the weight $m\,\mathbf{g}$ and the force exerted by the massless bar, which is directed like the bar itself. For convenience, we place the pivot point O at Q, so that the force exerted by the bar does not contribute to the angular momentum equation; only the weight $m\,\mathbf{g}$ does. In addition, we place the origin at Q, so as to have $\mathbf{x}_O = \mathbf{x}_Q = \mathbf{0}$. Let $\mathbf{x}$ denote the location of the particle. Then, Eq. 20.67 yields
$$m\,\mathbf{x} \times \ddot{\mathbf{x}} = m\,\mathbf{x} \times \mathbf{g}. \tag{20.69}$$
[Of course, here we are assuming that a frame connected to the Earth may be considered as an inertial frame of reference. The effects of the rotation of the Earth on a spherical pendulum are dealt with in connection with the so–called Foucault pendulum (Subsection 22.4.6).] Next, let us introduce a Cartesian right–handed frame of reference $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, with $\mathbf{k}$ directed vertically upwards, so that $\mathbf{g} = -g\,\mathbf{k}$. Dividing Eq. 20.69 by m we have, in terms of components,
$$y\,\ddot{z} - z\,\ddot{y} = -y\,g, \qquad z\,\ddot{x} - x\,\ddot{z} = x\,g, \qquad x\,\ddot{y} - y\,\ddot{x} = 0. \tag{20.70}$$
◦ Comment. Note that the third equation is linearly dependent upon the other two. [Hint: Adding the first multiplied by x to the second multiplied by y yields the third multiplied by −z.] Note also that the third equation may be written as $\frac{d}{dt}(x\,\dot{y} - y\,\dot{x}) = 0$, which may be integrated to yield
$$x\,\dot{y} - y\,\dot{x} = \text{constant}. \tag{20.71}$$
This is the z-component of the angular momentum conservation equation (Eq. 20.68). [Note that the conservation applies to the z-component, because the z-component of the moment due to $\mathbf{w} = m\,\mathbf{g}$ vanishes.]
7 Gerolamo Cardano (1501–1576) was an Italian Renaissance mathematician, physician, astrologer and inventor. He is the author of Ars Magna (Ref. [14]), a highly influential book on algebra, which includes the solution to algebraic equations of third and fourth degree. Because of his interests in gambling, he formulated elementary rules in probability, making him one of the founders of the field. [For his feud with the Italian mathematician Niccolò Fontana (c. 1499–1557, better known as Niccolò Tartaglia, namely Niccolò the “Stammerer”), on the discovery of the solutions for cubic algebraic equations, see for instance Boyer (Ref. [13], pp. 282–286), Hellman (Ref. [30], pp. 7–25), and Kline (Ref. [38], pp. 263–270); see also Bewersdorff (Ref. [11]), which includes mathematical formulas for the roots of third and fourth order equations, and Galois theory on higher order equations.]
• Circular (planar) pendulum as a particular case
Let us go back to the first two in Eq. 20.70. Assume first that the initial conditions are given by $y(0) = \dot{y}(0) = 0$. Then, y(t) = 0 for all t satisfies the first in Eq. 20.70 along with its initial conditions (see Remark 96, p. 459, on the uniqueness of the solution). In other words, in this case the motion is planar and occurs in the (x, z) plane. Accordingly, we expect that the remaining equation ($z\,\ddot{x} - x\,\ddot{z} = x\,g$, second in Eq. 20.70) describes a circular pendulum. You might be surprised, because this equation does not look at all like the governing equation of a pendulum, namely $\ddot{\theta} + C \sin\theta = 0$ (Eq. 20.39), as one would expect. In order to make some sense out of this discrepancy, note that if y = 0 we have (see Fig. 20.6, p. 807)
$$x = \ell \sin\theta \quad \text{and} \quad z = -\ell \cos\theta, \tag{20.72}$$
with $\ell$ constant. [Note that θ = 0 corresponds to $z = -\ell$. The minus sign is consistent with the fact that at the bottom point $z = -\ell$ is negative, because $\mathbf{k}$ is directed upwards.] Accordingly, we have
$$\dot{x} = \ell\,\dot{\theta} \cos\theta \quad \text{and} \quad \dot{z} = \ell\,\dot{\theta} \sin\theta, \tag{20.73}$$
as well as
$$\ddot{x} = \ell \left(\ddot{\theta} \cos\theta - \dot{\theta}^2 \sin\theta\right) \quad \text{and} \quad \ddot{z} = \ell \left(\ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta\right). \tag{20.74}$$
Combining with $z\,\ddot{x} - x\,\ddot{z} = x\,g$ (second in Eq. 20.70), one obtains
$$-\ell \cos\theta \; \ell \left(\ddot{\theta} \cos\theta - \dot{\theta}^2 \sin\theta\right) - \ell \sin\theta \; \ell \left(\ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta\right) = g\,\ell \sin\theta, \tag{20.75}$$
namely $-\ell^2\,\ddot{\theta} = g\,\ell \sin\theta$, as you may verify, in agreement with $\ddot{\theta} + C \sin\theta = 0$, with $C = g/\ell$ (Eqs. 20.39 and 20.40).
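The algebra leading from Eq. 20.74 to $-\ell^2\,\ddot{\theta}$ may also be spot–checked numerically. The following is a small sketch, with an assumed function name and randomly chosen sample values (none of which appear in the text): it evaluates $z\,\ddot{x} - x\,\ddot{z}$ from Eqs. 20.72 and 20.74 and compares it with $-\ell^2\,\ddot{\theta}$.

```python
import math
import random

def lhs_minus_reduced(theta, theta_dot, theta_ddot, ell):
    """Return (z*x_dd - x*z_dd) - (-ell^2 * theta_ddot); it should vanish identically."""
    s, c = math.sin(theta), math.cos(theta)
    x, z = ell * s, -ell * c                                   # Eq. 20.72
    x_dd = ell * (theta_ddot * c - theta_dot**2 * s)           # Eq. 20.74
    z_dd = ell * (theta_ddot * s + theta_dot**2 * c)           # Eq. 20.74
    return (z * x_dd - x * z_dd) - (-ell**2 * theta_ddot)

random.seed(0)  # reproducible sample points
residuals = [
    abs(lhs_minus_reduced(random.uniform(-math.pi, math.pi),
                          random.uniform(-3.0, 3.0),
                          random.uniform(-3.0, 3.0),
                          random.uniform(0.5, 2.0)))
    for _ in range(1000)
]
print(max(residuals))  # of the order of the rounding error
```

The terms in $\dot{\theta}^2$ cancel for any θ, which is why only $\ddot{\theta}$ survives in the reduced equation.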
• Spherical pendulum. Linearized formulation
Here, we consider the linearized formulation in a small neighborhood of the bottom point of the sphere, where $z = -\ell$. Note that z is not an independent quantity. Indeed, we have the constraint that the pendulum mass point moves on the sphere $x^2 + y^2 + z^2 = \ell^2$. Accordingly, we have (use z < 0, Eq. 20.72)
$$z = -\ell \sqrt{1 - \frac{x^2 + y^2}{\ell^2}} = -\ell \left[1 - \frac{x^2 + y^2}{2\,\ell^2} + \cdots \right]. \tag{20.76}$$
Since we are dealing with the linearized formulation, we can neglect the nonlinear terms, and set $z = -\ell$. Similarly, the terms that contain $\ddot{z}$ are nonlinear, and hence may be neglected. Thus, the first two in Eq. 20.70 yield the linearized equations for a spherical pendulum, namely
$$\ddot{x} + \omega^2 x = 0, \qquad \ddot{y} + \omega^2 y = 0, \tag{20.77}$$
where $\omega^2 = g/\ell$. The two equations are decoupled and easy to solve. I’ll let you fill in the details.
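As one possible way of filling in the details, each of the two equations in Eq. 20.77 has a solution of the form $x(t) = x_0 \cos\omega t + (v_0/\omega) \sin\omega t$ (Eq. 11.40). The following sketch evaluates this solution for x and y and checks that $x\,\dot{y} - y\,\dot{x}$ stays constant, as required by Eq. 20.71. The function name and the initial values are illustrative assumptions.

```python
import math

def linear_spherical(t, x0, y0, vx0, vy0, omega):
    """Solution of Eq. 20.77: position (x, y) and velocity (vx, vy) at time t."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    x = x0 * c + (vx0 / omega) * s
    y = y0 * c + (vy0 / omega) * s
    vx = -x0 * omega * s + vx0 * c
    vy = -y0 * omega * s + vy0 * c
    return x, y, vx, vy

omega = math.sqrt(9.81 / 1.0)            # assumed g = 9.81, ell = 1
x0, y0, vx0, vy0 = 0.05, 0.0, 0.0, 0.10  # assumed small initial conditions

h0 = x0 * vy0 - y0 * vx0                 # initial value of x*vy - y*vx
drift = 0.0
for n in range(1001):
    x, y, vx, vy = linear_spherical(0.01 * n, x0, y0, vx0, vy0, omega)
    drift = max(drift, abs((x * vy - y * vx) - h0))
print(h0, drift)
```

With these initial conditions the mass point describes an ellipse in the (x, y) plane; it degenerates into a straight segment when $x_0\,v_{y0} - y_0\,v_{x0} = 0$.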
• Spherical pendulum. Exact formulation
♠
In order to obtain the exact governing equations for x(t) and y(t), we could time differentiate Eq. 20.76 twice, and obtain $\ddot{z}$ in terms of $x, \dot{x}, \ddot{x}$ and $y, \dot{y}, \ddot{y}$. However, this does not particularly clarify the phenomenon. Instead, to facilitate the comparison with the pendulum equation, $\ddot{\theta} + C \sin\theta = 0$ (Eq. 20.39), we use the following transformation
$$x/\ell = \sin\theta \cos\varphi, \qquad y/\ell = \sin\theta \sin\varphi, \qquad z/\ell = -\cos\theta, \tag{20.78}$$
with $\theta \in [0, \pi]$ and $\varphi \in [-\pi, \pi]$.
◦ Comment. The coordinate system in Eq. 20.78 is a close cousin of the spherical coordinates in Eq. 19.140. Indeed, the variable $\psi \in [-\pi/2, \pi/2]$ in Eq. 19.140 has been changed into $\theta \in [0, \pi]$ in Eq. 20.78, because of the different range. [The symbol θ here was conveniently chosen to facilitate the comparison with the circular pendulum.] Specifically, to obtain Eq. 20.78 from Eq. 19.140, set $\psi = \theta - \pi/2$, and use $\cos(\theta - \pi/2) = \sin\theta$ and $\sin(\theta - \pi/2) = -\cos\theta$ (Eq. 6.71).
Time differentiating Eq. 20.78, we have
$$\begin{aligned}
\dot{x}/\ell &= \dot{\theta} \cos\theta \cos\varphi - \dot{\varphi} \sin\theta \sin\varphi, \\
\dot{y}/\ell &= \dot{\theta} \cos\theta \sin\varphi + \dot{\varphi} \sin\theta \cos\varphi, \\
\dot{z}/\ell &= \dot{\theta} \sin\theta,
\end{aligned} \tag{20.79}$$
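and
$$\begin{aligned}
\ddot{x}/\ell &= \ddot{\theta} \cos\theta \cos\varphi - \ddot{\varphi} \sin\theta \sin\varphi - (\dot{\theta}^2 + \dot{\varphi}^2) \sin\theta \cos\varphi - 2\,\dot{\theta}\,\dot{\varphi} \cos\theta \sin\varphi, \\
\ddot{y}/\ell &= \ddot{\theta} \cos\theta \sin\varphi + \ddot{\varphi} \sin\theta \cos\varphi - (\dot{\theta}^2 + \dot{\varphi}^2) \sin\theta \sin\varphi + 2\,\dot{\theta}\,\dot{\varphi} \cos\theta \cos\varphi, \\
\ddot{z}/\ell &= \ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta.
\end{aligned} \tag{20.80}$$
Substituting into $y\,\ddot{z} - z\,\ddot{y} = -y\,g$ and $z\,\ddot{x} - x\,\ddot{z} = x\,g$ (first two in Eq. 20.70), and multiplying the second by −1, one obtains
$$\begin{aligned}
\ddot{\theta} \sin\varphi + \ddot{\varphi} \sin\theta \cos\theta \cos\varphi - \dot{\varphi}^2 \sin\theta \cos\theta \sin\varphi + 2\,\dot{\theta}\,\dot{\varphi} \cos^2\theta \cos\varphi &= -\frac{g}{\ell} \sin\theta \sin\varphi, \\
\ddot{\theta} \cos\varphi - \ddot{\varphi} \sin\theta \cos\theta \sin\varphi - \dot{\varphi}^2 \sin\theta \cos\theta \cos\varphi - 2\,\dot{\theta}\,\dot{\varphi} \cos^2\theta \sin\varphi &= -\frac{g}{\ell} \sin\theta \cos\varphi,
\end{aligned} \tag{20.81}$$
as you may verify. This does not look much better than what we started from, does it? However, ... hold a sec! Let us multiply the first by cos ϕ and subtract it from the second multiplied by sin ϕ. [This corresponds to taking the component of the equation along ϕ, as you may verify.] This yields $\ddot{\varphi} \sin\theta \cos\theta + 2\,\dot{\theta}\,\dot{\varphi} \cos^2\theta = 0$, namely (multiplying by tan θ)
$$\ddot{\varphi} \sin^2\theta + 2\,\dot{\theta}\,\dot{\varphi} \cos\theta \sin\theta = \frac{d}{dt}\left(\dot{\varphi} \sin^2\theta\right) = 0. \tag{20.82}$$
This may be integrated to yield
$$\dot{\varphi} \sin^2\theta = \text{constant}. \tag{20.83}$$
◦ Comment. This equation, multiplied by $\ell^2$, may be written as $r\,v_\varphi = \text{constant}$, where $r = \ell \sin\theta$ is the distance of the pendulum mass point from the vertical axis through the pivot, whereas $v_\varphi := \dot{\varphi}\,r = \dot{\varphi}\,\ell \sin\theta$ is the circumferential component of its velocity. This result coincides with the z-component of the conservation of the angular momentum, and the result could have been obtained directly from Eq. 20.71, as you may verify. [Hint: Combine the third in Eq. 20.70 with Eqs. 20.78 and 20.80.]
On the other hand, let us multiply the first by sin ϕ and add it to the second multiplied by cos ϕ. [This corresponds to taking the component of the equation along θ, as you may verify.] This yields
$$\ddot{\theta} - \dot{\varphi}^2 \sin\theta \cos\theta = -\frac{g}{\ell} \sin\theta. \tag{20.84}$$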
Equations 20.83 and 20.84 are the governing equations of a spherical pendulum.
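The two governing equations lend themselves to a simple numerical experiment. The following is only a sketch under stated assumptions: Eq. 20.84 is integrated together with Eq. 20.82 (solved for $\ddot{\varphi}$, which requires sin θ ≠ 0) using a classical Runge–Kutta scheme, and the quantity $\dot{\varphi} \sin^2\theta$ of Eq. 20.83 is monitored. The integrator, the function names and the numerical values are illustrative choices, not part of the text. As a second check, setting $\ddot{\theta} = 0$ in Eq. 20.84 gives the steady circular motion $\dot{\varphi}^2 \cos\theta = g/\ell$ (a conical pendulum), for which θ should remain constant.

```python
import math

def derivs(state, g_over_ell):
    """Right-hand sides for (theta, phi, theta_dot, phi_dot)."""
    theta, phi, theta_dot, phi_dot = state
    s, c = math.sin(theta), math.cos(theta)
    theta_ddot = phi_dot**2 * s * c - g_over_ell * s       # Eq. 20.84
    phi_ddot = -2.0 * theta_dot * phi_dot * c / s          # Eq. 20.82, sin(theta) != 0
    return (theta_dot, phi_dot, theta_ddot, phi_ddot)

def rk4_step(state, dt, g_over_ell):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, f):
        return tuple(x + f * dt * kx for x, kx in zip(state, k))
    k1 = derivs(state, g_over_ell)
    k2 = derivs(shifted(k1, 0.5), g_over_ell)
    k3 = derivs(shifted(k2, 0.5), g_over_ell)
    k4 = derivs(shifted(k3, 1.0), g_over_ell)
    return tuple(x + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(theta0, phi_dot0, g_over_ell=9.81, t_end=5.0, dt=1e-3):
    """Start with theta_dot = 0; return the largest drifts of phi_dot*sin^2(theta) and of theta."""
    state = (theta0, 0.0, 0.0, phi_dot0)
    h0 = phi_dot0 * math.sin(theta0)**2                    # Eq. 20.83
    h_drift = theta_drift = 0.0
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, g_over_ell)
        h_drift = max(h_drift, abs(state[3] * math.sin(state[0])**2 - h0))
        theta_drift = max(theta_drift, abs(state[0] - theta0))
    return h_drift, theta_drift

print(simulate(0.5, 2.0))                                  # generic motion
print(simulate(0.5, math.sqrt(9.81 / math.cos(0.5))))      # steady circular motion
```

In the generic case θ oscillates between two values while $\dot{\varphi} \sin^2\theta$ stays constant to within the integration error.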
Remark 148. Equation 20.84 looks a bit like $\ddot{\theta} + (g/\ell) \sin\theta = 0$ (Eq. 20.39). Question: “Can we explain where the extra term comes from?” Also, we have seen that Eq. 20.83 may be obtained more directly from Eq. 20.71. Then, a second question arises: “Is there a simple way to obtain Eq. 20.84?” Answer to both questions: “Yes, but we need a more sophisticated formulation of mechanics.” Specifically, one may obtain Eqs. 20.82 and 20.84, by using a formulation in a non–inertial frame of reference that rotates around the vertical axis, so as to keep the pendulum on a plane rigidly connected with the new frame of reference. Accordingly, this subject matter will be further addressed in Subsection 22.4.5. What we can say here is that the approach used yields the correct equations, but a gut–level interpretation is missing.
20.6 Energy In this section, we generalize to three dimensions the results obtained in Section 11.9 for one–dimensional problems. Specifically, in Subsection 20.6.1 we address the energy equation. Then, in Subsection 20.6.2, we address the formulation for conservative forces and introduce the potential energy. Applications to springs, weight and gravitation (as well as central forces in general) are addressed in Subsection 20.6.3. Finally, in Subsection 20.6.4 we discuss a minimum–energy theorem for a single particle.
20.6.1 Energy equation Here, we extend to three–dimensional motion the results of Subsection 11.9.1, on kinetic energy and work for one–dimensional motion.
• Kinetic energy and power
We have the following definitions:
Definition 329 (Kinetic energy for a single particle). The quantity
$$T = \frac{1}{2}\, m\, v^2 = \frac{1}{2}\, m\, \|\mathbf{v}\|^2 \tag{20.85}$$
is called the kinetic energy of a particle with velocity v and mass m. [The definition is virtually identical to that for rectilinear motion (Eq. 11.91).]
20. Single particle dynamics in space
Definition 330 (Power for a single particle). The quantity
$$P = \mathbf{f} \cdot \mathbf{v} \tag{20.86}$$
is called the power generated by the force $\mathbf{f}$ applied to a particle having velocity $\mathbf{v}$. Let us introduce the following
Theorem 210. We have
$$\frac{dT}{dt} = \mathbf{f}(\mathbf{x}, \mathbf{v}, t) \cdot \mathbf{v}(t). \tag{20.87}$$
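In plain words, the power generated by the force $\mathbf{f}$ through the motion of the particle equals the time derivative of the kinetic energy.
Proof. ♥ Consider the Newton second law, $m \dot{\mathbf{v}} = \mathbf{f}$ (Eq. 20.3). Dotting this with $\mathbf{v}$, we have
$$m\, \dot{\mathbf{v}}(t) \cdot \mathbf{v}(t) = \mathbf{f}(\mathbf{x}, \mathbf{v}, t) \cdot \mathbf{v}(t). \tag{20.88}$$
Next, note that
$$\frac{1}{2} \frac{dv^2}{dt} = \frac{1}{2} \frac{d}{dt} \left( v_x^2 + v_y^2 + v_z^2 \right) = v_x \dot{v}_x + v_y \dot{v}_y + v_z \dot{v}_z = \mathbf{v} \cdot \dot{\mathbf{v}}. \tag{20.89}$$
Hence, the left side of Eq. 20.88 equals $dT/dt$ (Eq. 20.85). Thus, Eq. 20.88 is fully equivalent to Eq. 20.87.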
• Kinetic energy and work
Here, we consider the extension to three dimensions of the one–dimensional definition of work (Eq. 11.92). We have the following
Definition 331 (Work in three dimensions). Consider a particle that follows a trajectory $\mathbf{x} = \mathbf{x}(t)$, with $t \in (t_a, t_b)$, and is subject to a resultant force $\mathbf{f}(\mathbf{x}, \mathbf{v}, t)$. The quantity
$$W_{L(t_a, t_b)} := \int_{t_a}^{t_b} \mathbf{f}(\mathbf{x}, \mathbf{v}, t) \cdot \mathbf{v}(t)\, dt \tag{20.90}$$
is called the work performed by the force $\mathbf{f}$ between $t_a$ and $t_b$. If $\mathbf{f}$ is only a function of $\mathbf{x}$, we set $\mathbf{v}\, dt = d\mathbf{x}$ and say that
$$W_{L(\mathbf{x}_a, \mathbf{x}_b)} := \int_{L(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} \tag{20.91}$$
is the work performed by the force $\mathbf{f}(\mathbf{x})$ along a specific path $L(\mathbf{x}_a, \mathbf{x}_b)$, covered by the particle by going from $\mathbf{x}_a = \mathbf{x}(t_a)$ to $\mathbf{x}_b = \mathbf{x}(t_b)$. [The work, in general, is path–dependent even for $\mathbf{f} = \mathbf{f}(\mathbf{x})$ (contrary to the one–dimensional case). Path–independence is addressed in the next subsection.] Let us recast Eq. 20.87 in terms of work. We have the following
Theorem 211. Consider a particle that follows a trajectory $\mathbf{x} = \mathbf{x}(t)$, with $t \in (t_a, t_b)$, and is subject to a resultant force $\mathbf{f}(\mathbf{x}, \mathbf{v}, t)$. We have
$$T(t_b) - T(t_a) = W_{L(t_a, t_b)}, \tag{20.92}$$
where $W_{L(t_a, t_b)}$ is given by Eq. 20.90. In other words, the work done by the resultant force equals the difference in kinetic energy at time $t_b$ and $t_a$.
Proof. ♥ Integrating Eq. 20.87 with respect to time, we have
$$T(t_b) - T(t_a) = \int_{t_a}^{t_b} \dot{T}(t)\, dt = \int_{t_a}^{t_b} \mathbf{f}(\mathbf{x}, \mathbf{v}, t) \cdot \mathbf{v}(t)\, dt, \tag{20.93}$$
which is fully equivalent to Eq. 20.92.
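To see Eq. 20.92 at work numerically, here is a minimal sketch that compares the change in kinetic energy with the work integral of Eq. 20.90. The trajectory (a helix with increasing pitch), the mass and the time interval are hypothetical choices made for this example, and the integral is approximated with the composite trapezoidal rule.

```python
import math

def work_energy_check(m=2.0, ta=0.0, tb=3.0, n=20000):
    """Return (T(tb) - T(ta), W) for the assumed trajectory
    x(t) = (cos t, sin t, t^2 / 2), with W from Eq. 20.90."""
    def v(t):
        # velocity of the assumed trajectory
        return (-math.sin(t), math.cos(t), t)

    def f(t):
        # resultant force from the Newton second law, f = m dv/dt
        return (-m * math.cos(t), -m * math.sin(t), m)

    def power(t):
        # P = f . v  (Eq. 20.86)
        return sum(fi * vi for fi, vi in zip(f(t), v(t)))

    def kinetic(t):
        # T = m |v|^2 / 2  (Eq. 20.85)
        return 0.5 * m * sum(vi * vi for vi in v(t))

    # composite trapezoidal rule for the work integral
    h = (tb - ta) / n
    work = 0.5 * (power(ta) + power(tb))
    for i in range(1, n):
        work += power(ta + i * h)
    work *= h
    return kinetic(tb) - kinetic(ta), work

delta_T, W = work_energy_check()
print(delta_T, W)
```

For this particular trajectory the power reduces to m t, so that both numbers can also be checked by hand.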
20.6.2 Potential vs conservative force fields
In this subsection, we examine in greater depth the notion of potential and conservative force fields, in relation to the discussion of path–independent integrals, and the corresponding potential energy. [Much of the material that follows is an application of the general results on two–dimensional and three–dimensional path–independent line integrals (Section 19.2 and Subsection 19.6.1).]
◦ Warning. In this subsection, we consider only forces that depend upon $\mathbf{x}$ and $t$, but not on $\mathbf{v}$ (no velocity dependence). Accordingly, here “time–independent” is equivalent to “dependent solely upon $\mathbf{x}$.”
In Theorem 207, p. 768, we addressed potential, conservative and lamellar vector fields in three dimensions, and showed their equivalence. In mechanics, the situation is a bit different, because the vector field may be a function of time (an issue not addressed in the preceding chapter). This muddies the waters considerably, even if the vector field is not a function of the velocity $\mathbf{v}$, as assumed for this subsection. [We limit ourselves to force fields, which are the only ones for which the considerations presented here are relevant.] Accordingly, here we show how the situation is modified when a force field is a function of time. We have the following
Definition 332 (Potential force field). A force field $\mathbf{f}(\mathbf{x}, t)$ is called potential, iff there exists a function $u(\mathbf{x}, t)$ such that
$$\mathbf{f}(\mathbf{x}, t) = \operatorname{grad} u(\mathbf{x}, t). \tag{20.94}$$
The function $u(\mathbf{x}, t)$ is called the potential of $\mathbf{f}(\mathbf{x}, t)$.
• Time–independent potential force fields
As preliminary material, we consider a time–independent force field $\mathbf{f} = \mathbf{f}(\mathbf{x})$, so as to facilitate the comparison with the results of the preceding chapter. We have the following (recall Footnote 4, p. 763, on conservative vector fields)
Definition 333 (Time–independent conservative force field). A time–independent force field $\mathbf{f}(\mathbf{x})$ defined in a simply connected region is called conservative iff its work
$$W_{L(\mathbf{x}_a, \mathbf{x}_b)} = \int_{L(\mathbf{x}_a, \mathbf{x}_b)} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} \tag{20.95}$$
is independent of the path $L(\mathbf{x}_a, \mathbf{x}_b)$ connecting $\mathbf{x}_a$ to $\mathbf{x}_b$, and depends only upon $\mathbf{x}_a$ and $\mathbf{x}_b$. In this case, we write
$$W(\mathbf{x}_a, \mathbf{x}_b) = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x}. \tag{20.96}$$
◦ Comment. The above definition applies as well to multiply connected regions, provided that $\Gamma_k = 0$ for any contour, even non–shrinkable (Definition 314, p. 765, of quasi–lamellar fields for n-connected regions).
We have the following
Theorem 212. A conservative force field is also potential. Vice versa, a time–independent single–valued potential force field is also conservative.
Proof. ♥ Consider the first part of the theorem. If $\mathbf{x}_0$ is considered to be fixed, then $W(\mathbf{x}_0, \mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{f}(\mathbf{x}') \cdot d\mathbf{x}'$ defines a (time–independent) function of $\mathbf{x}$,
$$u(\mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{f}(\mathbf{x}') \cdot d\mathbf{x}', \tag{20.97}$$
which is such that $\operatorname{grad} u(\mathbf{x}) = \mathbf{f}(\mathbf{x})$ (use Theorem 207, p. 768). Vice versa, if
$$\mathbf{f}(\mathbf{x}) = \operatorname{grad} u(\mathbf{x}), \tag{20.98}$$
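then
$$\int_{\mathbf{x}_0}^{\mathbf{x}_1} \mathbf{f} \cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}_1} \operatorname{grad} u \cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}_1} du = u(\mathbf{x}_1) - u(\mathbf{x}_0), \tag{20.99}$$
where $u(\mathbf{x})$ is single–valued. Hence, the field is conservative.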
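That said, we have the following
Definition 334 (Potential energy). For any time–independent potential force field, the potential energy is defined by
$$U(\mathbf{x}) := -u(\mathbf{x}) = -\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{f}(\mathbf{x}') \cdot d\mathbf{x}' + C = -W(\mathbf{x}_0, \mathbf{x}) + C, \tag{20.100}$$
where $\mathbf{x}_0$ is an arbitrarily prescribed point and $C$ is an arbitrary constant. Accordingly, we have (use Eq. 20.98)
$$\mathbf{f}(\mathbf{x}) = -\operatorname{grad} U(\mathbf{x}). \tag{20.101}$$
[The reason for adding the constant $C$ was discussed in Remark 107, p. 490, where we showed that not all the primitives may be expressed as an integral.] We have the following
Theorem 213. For a time–independent potential force field $\mathbf{f}(\mathbf{x})$, we have
$$W(\mathbf{x}_a, \mathbf{x}_b) = U(\mathbf{x}_a) - U(\mathbf{x}_b). \tag{20.102}$$
Proof. ♥ We have (use Eq. 20.100)
$$W(\mathbf{x}_a, \mathbf{x}_b) = \int_{\mathbf{x}_a}^{\mathbf{x}_b} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}_b} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} - \int_{\mathbf{x}_0}^{\mathbf{x}_a} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = -U(\mathbf{x}_b) + U(\mathbf{x}_a), \tag{20.103}$$
in agreement with Eq. 20.102.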
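The path–independence stated in Theorem 213 may be checked numerically. The sketch below is only an illustration under assumed data: it uses a spring–like field of the form $\mathbf{f} = -\operatorname{grad} U$ with $U = \frac{1}{2} \kappa (\ell - \ell_0)^2$, hypothetical values for the stiffness, the unloaded length and the endpoints, and a midpoint rule on short chords to approximate the line integral along two different paths.

```python
import math

KAPPA, ELL0 = 3.0, 1.0          # hypothetical stiffness and unloaded length
ANCHOR = (0.0, 0.0, 0.0)        # hypothetical fixed endpoint
XA, XB = (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)   # hypothetical endpoints of the path

def potential(x):
    ell = math.dist(x, ANCHOR)
    return 0.5 * KAPPA * (ell - ELL0) ** 2

def force(x):
    # f = -grad U for the potential above
    ell = math.dist(x, ANCHOR)
    return tuple(-KAPPA * (ell - ELL0) * (xi - ai) / ell
                 for xi, ai in zip(x, ANCHOR))

def work_along(path, n=20000):
    """Approximate the line integral of f . dx along path(s), s in [0, 1]."""
    total = 0.0
    for i in range(n):
        p0, p1 = path(i / n), path((i + 1) / n)
        mid = tuple(0.5 * (a + b) for a, b in zip(p0, p1))
        total += sum(fi * (b - a) for fi, a, b in zip(force(mid), p0, p1))
    return total

def straight(s):
    return tuple(a + s * (b - a) for a, b in zip(XA, XB))

def detour(s):
    # same endpoints, plus a bump that vanishes at s = 0 and s = 1
    bump = math.sin(math.pi * s)
    x, y, z = straight(s)
    return (x + bump, y, z + 2.0 * bump)

print(work_along(straight), work_along(detour), potential(XA) - potential(XB))
```

The three printed numbers should agree to within the accuracy of the quadrature; neither path passes through the anchor, where the unit vector would be undefined.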
◦ Comment. Note the analogy with Theorem 193, p. 741, on the integrals of $\operatorname{grad} u$ for single–valued $u(\mathbf{x})$. Note also that the theorem is limited to time–independent potential fields. Indeed, time–dependent potential fields are not necessarily conservative. [This issue is addressed in the subsubsection that follows. See Remark 149, p. 830, to find out what happens when we have $\mathbf{f} = \mathbf{f}(\mathbf{x}, t)$.]
Finally, in analogy with the one–dimensional case (Definition 214, p. 492), we have the following
Definition 335 (Total energy). The sum of kinetic and potential energies
$$E(t) = T[\mathbf{v}(t)] + U[\mathbf{x}(t)] \tag{20.104}$$
will be referred to as the total energy. Then, we have the following
Theorem 214 (Total energy conservation for conservative $\mathbf{f}(\mathbf{x})$). For a single particle subject to a time–independent conservative force field $\mathbf{f}(\mathbf{x})$, we have
$$T[\mathbf{v}(t)] + U[\mathbf{x}(t)] = E_0, \tag{20.105}$$
where $E_0$ is the total energy at any given time.
Proof. ♥
Combining Eqs. 20.92 and 20.102, we have $T(\mathbf{v}_b) - T(\mathbf{v}_a) = W(\mathbf{x}_a, \mathbf{x}_b) = -U(\mathbf{x}_b) + U(\mathbf{x}_a)$, or
$$T(\mathbf{v}_b) + U(\mathbf{x}_b) = T(\mathbf{v}_a) + U(\mathbf{x}_a), \tag{20.106}$$
in agreement with Eq. 20.105.
• Time–dependent potential force fields ♠
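The considerations in Subsection 11.9.3, regarding one–dimensional potential force fields that are not time–independent, apply to three–dimensional force fields as well. Indeed, the material in Eqs. 11.106–11.107 may be extended to three dimensions to yield the desired result. Specifically, let us start from the fact that for any $f = f[t, \mathbf{x}(t)]$, we have
$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_{k=1}^{3} \frac{\partial f}{\partial x_k} \frac{dx_k}{dt} = \frac{\partial f}{\partial t} + \mathbf{v} \cdot \operatorname{grad} f. \tag{20.107}$$
Accordingly, we have
$$\frac{dU}{dt} = \frac{\partial U}{\partial t} + \mathbf{v} \cdot \operatorname{grad} U = \frac{\partial U}{\partial t} - \mathbf{v} \cdot \mathbf{f}. \tag{20.108}$$
Therefore, integrating in time between $t_a$ and $t_b$, we obtain
$$W(\mathbf{x}_a, \mathbf{x}_b) := \int_{t_a}^{t_b} \mathbf{f}[\mathbf{x}(t), t] \cdot \mathbf{v}\, dt = U[\mathbf{x}_a, t_a] - U[\mathbf{x}_b, t_b] + \int_{t_a}^{t_b} \frac{\partial U}{\partial t}\, dt, \tag{20.109}$$
which generalizes Eq. 11.107.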
Remark 149. Thus, the fact that a force field admits a single–valued potential does not imply that the work done is path–independent (as it does for conservative fields, namely time–independent potential fields). Accordingly, we have to be careful with time–dependent potential force fields. Indeed, it is apparent that even if we follow the same trajectory, but with a different time–history, the result will be different (see Section 11.9, for the corresponding one–dimensional problem). For instance, consider the combined force field acting on an artificial satellite due to the Earth, the Sun, the Moon, and the other celestial bodies. This force field is potential, with the potential energy given by the sum of the individual contributions. [The expression is given in the following subsection (Eq. 20.120).] However, it is not conservative. Indeed, the work done to move a spaceship from the Earth to the Moon depends upon the location of the various celestial bodies during the motion.
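The dependence of the work upon the time–history may be illustrated with a small numerical experiment. In the sketch below, all the data are assumptions introduced for the example: the potential energy is $U(\mathbf{x}, t) = \frac{1}{2} \kappa(t) \|\mathbf{x}\|^2$ with a stiffness that grows linearly in time, and the same straight path is covered at constant speed in two different durations, starting at $t = 0$. The function returns both the work and the right side of Eq. 20.109, evaluated with the midpoint rule.

```python
KAPPA0 = 2.0   # hypothetical stiffness scale

def kappa(t):
    # assumed time law for the stiffness
    return KAPPA0 * (1.0 + t)

def U(x, t):
    return 0.5 * kappa(t) * sum(c * c for c in x)

def dU_dt(x, t):
    # partial derivative of U with respect to t, at fixed x
    return 0.5 * KAPPA0 * sum(c * c for c in x)

def force(x, t):
    # f = -grad U
    return tuple(-kappa(t) * c for c in x)

def work_and_rhs(xa, xb, duration, n=20000):
    """Return (W, right side of Eq. 20.109) for a straight path from xa to xb,
    covered at constant speed between t = 0 and t = duration."""
    v = tuple((b - a) / duration for a, b in zip(xa, xb))
    h = duration / n
    work = 0.0
    dU_dt_integral = 0.0
    for i in range(n):
        t = (i + 0.5) * h                       # midpoint of the time step
        x = tuple(a + vi * t for a, vi in zip(xa, v))
        work += sum(fi * vi for fi, vi in zip(force(x, t), v)) * h
        dU_dt_integral += dU_dt(x, t) * h
    rhs = U(xa, 0.0) - U(xb, duration) + dU_dt_integral
    return work, rhs

xa, xb = (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)
print(work_and_rhs(xa, xb, duration=1.0))
print(work_and_rhs(xa, xb, duration=4.0))
```

For each duration the two returned numbers agree (Eq. 20.109), whereas the work itself changes with the duration, even though the geometric path is the same.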
20.6.3 Potential energy for forces of interest
Here, we present the expressions for the potential energy for forces that are of interest in this book, specifically those due to: (i) springs, (ii) gravity, (iii) gravitation, and (iv) central forces in general.
• Springs
Let us consider again a linear spring in three dimensions. We have addressed this in Section 17.3 on the equilibrium of spring–connected particles. Here, we limit ourselves to springs that are fixed at one endpoint, say at $\mathbf{x}_*$. Specifically, let us consider the force $\mathbf{f}_S$ that a spring exerts on a particle connected to its free endpoint, say $\mathbf{x}$. [This is equal and opposite to the force necessary to stretch/compress the spring.] Hence, we may still use the expression for the corresponding force $\mathbf{f}_S$ as given in Eq. 17.14, with minor modifications. The force is aligned with the unit vector in direction $\mathbf{x} - \mathbf{x}_*$, given by
$$\mathbf{e}_S = \frac{1}{\ell}\, (\mathbf{x} - \mathbf{x}_*), \qquad \ell = \|\mathbf{x} - \mathbf{x}_*\|. \tag{20.110}$$
Thus, in three dimensions the force $\mathbf{f}_S$ that a linear spring exerts on a particle that is connected at the spring’s free endpoint $\mathbf{x}$ is given by
$$\mathbf{f}_S = -\kappa\, (\ell - \ell_0)\, \mathbf{e}_S, \tag{20.111}$$
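where $\ell_0 = \|\mathbf{x}_0 - \mathbf{x}_*\|$ is the length of the unloaded spring. [Indeed, the direction is that of $\mathbf{x} - \mathbf{x}_*$ if $\ell < \ell_0$ (compression), and the opposite one if $\ell > \ell_0$ (tension).] The corresponding potential energy is given by
$$U_S = \frac{1}{2}\, \kappa\, (\ell - \ell_0)^2. \tag{20.112}$$
Indeed, using $\ell = \sqrt{(x_1 - x_{1*})^2 + (x_2 - x_{2*})^2 + (x_3 - x_{3*})^2}$, we have
$$\operatorname{grad} \ell = \sum_{k=1}^{3} \frac{\partial \ell}{\partial x_k}\, \mathbf{i}_k = \sum_{k=1}^{3} \frac{x_k - x_{k*}}{\ell}\, \mathbf{i}_k = \frac{\mathbf{x} - \mathbf{x}_*}{\ell} = \mathbf{e}_S, \tag{20.113}$$
and hence
$$\operatorname{grad} U_S = \kappa\, (\ell - \ell_0) \operatorname{grad} \ell = \kappa\, (\ell - \ell_0)\, \mathbf{e}_S = -\mathbf{f}_S. \tag{20.114}$$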
◦ Comment. We can even consider a nonlinear spring with a force given by
$$\mathbf{f}_S = f(\ell - \ell_0)\, \mathbf{e}_S, \tag{20.115}$$
where $f(x)$ gives the intensity and the sign of the force. Then, denoting by $F(x)$ the primitive of $-f(x)$, so that $F'(x) = -f(x)$, the potential energy is given by
$$U_S = F(\ell - \ell_0). \tag{20.116}$$
Indeed, we have (use Eq. 20.113)
$$\operatorname{grad} U_S = F'(\ell - \ell_0) \operatorname{grad} \ell = -f(\ell - \ell_0)\, \mathbf{e}_S = -\mathbf{f}_S. \tag{20.117}$$
◦ Comment. It should be pointed out that there is an important difference between the force due to a spring on the one hand, and gravitation on the other. Such a difference arises not so much from a mathematical point of view, but rather from a physical one. Indeed, in the case of gravitation, we formulate the problem in terms of a vector field that exists in the entire space. On the other hand, in the case of a spring that is fixed at one endpoint, we are not dealing with a vector field, but rather with a force that depends upon the location of the free endpoint. Accordingly, for the case of springs, instead of speaking of potential energy, we often use the term elastic energy. However, from the mathematical point of view, there is no difference between the two cases, and hence here we may use the term potential energy for springs as well. Specifically, we have the following
Definition 336 (Elastic energy). In the case of springs (linear or not), the potential energy is typically referred to as the elastic energy.
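• Weight
The expression for the weight is given by $\mathbf{f}_W = -m\, g\, \mathbf{k}$, where $\mathbf{k}$ is pointed upwards (Eq. 20.12). The corresponding potential energy is given by
$$U_W = m\, g\, z. \tag{20.118}$$
Indeed, noting that $\operatorname{grad} z = \mathbf{k}$, we have
$$\operatorname{grad} U_W = m\, g\, \mathbf{k} = -\mathbf{f}_W. \tag{20.119}$$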
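• Gravitation
The gravitational force of attraction that a particle with mass $M$ located at $\mathbf{x}_0$ exerts on a particle with mass $m$ located at $\mathbf{x}$ is given by $\mathbf{f}_G = -G\, M\, m\, \mathbf{r}/r^3$, with $\mathbf{r} = \mathbf{x} - \mathbf{x}_0$ and $r = \|\mathbf{r}\|$ (Eq. 20.28). The corresponding potential energy is given by
$$U_G = -\frac{G\, M\, m}{r}, \qquad r = \|\mathbf{x} - \mathbf{x}_0\|. \tag{20.120}$$
Indeed, noting that (in analogy with Eq. 20.113)
$$\operatorname{grad} r = \frac{\mathbf{r}}{r} = \frac{\mathbf{x} - \mathbf{x}_0}{\|\mathbf{x} - \mathbf{x}_0\|} =: \mathbf{e}_r, \tag{20.121}$$
we have
$$\operatorname{grad} U_G = \frac{G\, M\, m}{r^2}\, \mathbf{e}_r = -\mathbf{f}_G. \tag{20.122}$$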
• Central forces
♥
Gravitational forces are a particular case of central forces, which are defined by the following Definition 337 (Central force). A potential force field is called central iff: (i) is aligned with the radial direction, towards or away from a given point (say xC, the center), and (ii) has a magnitude that depends only upon the distance from xC, namely iff
$$\mathbf{f}_C = f(r)\, \mathbf{e}_r. \tag{20.123}$$
Remark 150. Are central forces potential? We will see this in the next theorem. However, one thing we know for sure. Since a central force is aligned with the radial direction, if a potential energy exists, then the constant–potential–energy surfaces are necessarily spheres with the center at $\mathbf{x}_C$. [For, $\mathbf{f} = -\operatorname{grad} U$ is normal to the surfaces $U = \text{constant}$ (Theorem 190, p. 720).] In other words, $U = U(r)$. Neat, isn’t it?
That said, we have the following
Theorem 215 (Potential energy for central forces). A potential energy of the type $U = U(r)$, where $r = \|\mathbf{x} - \mathbf{x}_C\|$ denotes the distance from the center $\mathbf{x}_C$, corresponds to a central force. Vice versa, any central force is potential, with $U = U(r)$.
Proof. ♥ For the first part, recalling again that $\mathbf{e}_r := \mathbf{r}/r = \operatorname{grad} r$ (Eq. 20.121), we have
$$\mathbf{f}_C = -\operatorname{grad} U(r) = -\frac{dU}{dr}\, \mathbf{e}_r. \tag{20.124}$$
Hence, $\mathbf{f}_C$ is central. Vice versa, for any central force $\mathbf{f}_C = f(r)\, \mathbf{e}_r$ (Eq. 20.123), we have that $U$ exists. Indeed, it coincides with the primitive of $-f(r)$. [Hint: Use $dU/dr = -f(r)$ and $\operatorname{grad} r = \mathbf{e}_r$ (Eq. 20.121).]
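The relation $\mathbf{f} = -\operatorname{grad} U$ for a central force may be verified numerically. The following sketch compares a central–difference approximation of the gradient of the gravitational potential energy (Eq. 20.120) with the closed–form force. The helper `numerical_gradient`, the value of the product $G M m$, the center and the test point are assumptions introduced for this example.

```python
import math

def numerical_gradient(U, x, h=1e-6):
    """Central-difference approximation of grad U at the point x."""
    grad = []
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((U(xp) - U(xm)) / (2.0 * h))
    return grad

GM_m = 4.0                   # hypothetical value of the product G M m
CENTER = (1.0, -2.0, 0.5)    # hypothetical location of the attracting mass

def U_G(x):
    # Eq. 20.120
    return -GM_m / math.dist(x, CENTER)

def f_G(x):
    # f_G = -G M m r / r^3  (Eq. 20.28)
    r_vec = [xi - ci for xi, ci in zip(x, CENTER)]
    r = math.sqrt(sum(c * c for c in r_vec))
    return [-GM_m * c / r ** 3 for c in r_vec]

x = (2.0, 0.0, 1.5)
print(numerical_gradient(U_G, x))   # approximates grad U_G
print(f_G(x))                       # equals -grad U_G, up to the discretization error
```

The same helper may be applied to any $U(r)$, for instance the elastic energy of Eq. 20.112, to check the corresponding force expression.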
20.6.4 Minimum–energy theorem for a single particle
In this subsection, we return to statics and extend to three dimensions the one–dimensional minimum–energy theorem for a single particle (Theorem 131, p. 494). Specifically, we have the following
Theorem 216 (Minimum–energy theorem for a single particle). A particle subject to a potential force $\mathbf{f}(\mathbf{x})$ is in equilibrium at $\mathbf{x} = \mathbf{x}_0$ iff the potential energy $U(\mathbf{x})$ is stationary there. If $U(\mathbf{x})$ has a minimum (maximum) at $\mathbf{x} = \mathbf{x}_0$, the equilibrium is stable (unstable) there. If the energy is constant in a whole (open) neighborhood $N$ of $\mathbf{x} = \mathbf{x}_0$, no matter how small $N$ is, then we have a neutral equilibrium for all the points of $N$.
Proof. ♥ Consider the first part of the statement. At an equilibrium point, say $\mathbf{x}_0$, we have $\mathbf{f}(\mathbf{x}_0) = \mathbf{0}$. This implies that $\operatorname{grad} U(\mathbf{x})\big|_{\mathbf{x} = \mathbf{x}_0} = -\mathbf{f}(\mathbf{x}_0) = \mathbf{0}$ (Eq. 20.101), namely that the energy is stationary at $\mathbf{x} = \mathbf{x}_0$, and vice versa.
Next, consider the stability of the equilibrium. Assume that U(x) has a local minimum at x = x0 . Then, there is a neighborhood N of x0 , where we have f (x) · dx = −grad U · dx = − d U < 0, with x = x0 + dx ∈ N . Accordingly, a displacement dx yields a force with a component in the opposite direction (restoring force). In other words, if the particle is moved away from the equilibrium position, the resulting force tends to move the particle back to the equilibrium point (stable equilibrium). The opposite occurs if U (x) has a maximum (unstable equilibrium). Finally, if U (x) is constant in a neighborhood of x0 , then f (x) = 0 in that entire neighborhood (neutral equilibrium). ◦ Comment. In analogy with the one–dimensional case, it is also interesting to see the same problem from a dynamical point of view. Assume that at x = x0 the potential energy U has a minimum. [For the sake of simplicity and without loss of generality, we may set U (x0 ) = 0, since U is defined up to an additive constant (Definition 313, p. 763).] Consider a particle that is located at x0 , with zero velocity. Then, according to Eq. 20.105, we have T + U = 0. Any displacement away from x0 causes the potential energy to increase, and hence the kinetic energy to decrease, a clear impossibility since T , which is initially zero, cannot be negative.
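As an illustration of the minimum–energy theorem, consider a particle hanging from a linear spring under its own weight, so that $U = U_S + U_W$ (Eqs. 20.112 and 20.118). The sketch below uses hypothetical data and a spring anchored at the origin; it checks that the resultant force vanishes at the hanging configuration and that the potential energy is larger at sampled nearby points. The sampling test `is_local_minimum` is a crude numerical stand–in for the minimum condition, not a proof.

```python
import math

M, G, KAPPA, ELL0 = 0.5, 9.80665, 40.0, 0.3   # hypothetical data, SI units

def U(x):
    """Elastic plus weight potential energy (spring anchored at the origin)."""
    ell = math.sqrt(sum(c * c for c in x))
    return 0.5 * KAPPA * (ell - ELL0) ** 2 + M * G * x[2]

def force(x):
    """Resultant force f = f_S + f_W (Eqs. 20.111 and 20.12)."""
    ell = math.sqrt(sum(c * c for c in x))
    f = [-KAPPA * (ell - ELL0) * c / ell for c in x]
    f[2] -= M * G
    return f

# stretched spring hanging straight down, and compressed spring standing straight up
X_DOWN = (0.0, 0.0, -(ELL0 + M * G / KAPPA))
X_UP = (0.0, 0.0, ELL0 - M * G / KAPPA)

def is_local_minimum(x0, radius=1e-2):
    """Sample U on a small sphere around x0; True if every sample exceeds U(x0)."""
    u0 = U(x0)
    for i in range(1, 12):
        for j in range(24):
            th, ph = math.pi * i / 12, 2.0 * math.pi * j / 24
            d = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
            if U([a + radius * b for a, b in zip(x0, d)]) <= u0:
                return False
    return True

print(force(X_DOWN), is_local_minimum(X_DOWN))
print(force(X_UP), is_local_minimum(X_UP))
```

Both configurations are equilibria (the force vanishes), but only the hanging one passes the minimum test. In the upright one the energy decreases for lateral displacements, so that it is not a minimum.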
20.7 Dynamics solutions, via energy equation
In this section, we present a methodology to obtain, via energy considerations, the solution to problems involving only time–independent conservative force fields $\mathbf{f}(\mathbf{x})$ (a generalization to three dimensions of the material presented in Subsection 11.9.7 for one–dimensional problems). To begin with, we can use the energy equation (Eq. 20.105) to obtain $v = \|\mathbf{v}\|$. Indeed, Eq. 20.105 may be written as (akin to Eq. 11.118)
$$v(t) = \left\{ \frac{2}{m} \Big( E_0 - U[\mathbf{x}(t)] \Big) \right\}^{1/2} > 0. \tag{20.125}$$
◦ Warning. Here, just for the sake of clarity, we limit ourselves to the case $\dot{s} = v > 0$.
◦ Comment. This method provides us only with the speed $v = \|\mathbf{v}\|$, and not with the velocity vector $\mathbf{v}$. Therefore the method is primarily useful for problems in which the trajectory $\mathbf{x} = \mathbf{x}(s)$ is prescribed. For, in this case, we may evaluate the function $s = s(t)$, in full analogy with what we did for one–dimensional problems (Subsection 11.9.7). Indeed, we have $v = ds/dt$. The solution to this equation may be obtained with the method of separation of variables for ordinary differential equations (Subsection 10.5.1), to yield (akin to Eq. 11.119)
$$t = \sqrt{\frac{m}{2}} \int_{s_0}^{s} \frac{1}{\sqrt{E_0 - U(s)}}\, ds, \tag{20.126}$$
which provides us with an implicit expression of the function $s = s(t)$, through its inverse $t = t(s)$. [Recall that we have assumed $v > 0$. You might want to practice your skill and modify the formulation when we deal with $\dot{s} = -v < 0$ (see Remark 110, p. 495).]
20.7.1 A bead moving along a wire, again
As an illustrative example, consider a bead moving along a wire, with the wire having an arbitrarily prescribed shape, for instance a bead moving along a spiral, a problem addressed in Subsection 20.3.1, where I claimed that the velocity is given by $v = \sqrt{2 g h}$ (Eq. 20.33). Here, we show that this is indeed the case. Assume that the bead starts with zero velocity, from the horizontal plane $z = z_0 = 0$. For simplicity, we assume the friction and the aerodynamic drag to be negligible. Then, using Eqs. 20.118 and 20.125, we obtain that the speed of the bead may be obtained from $\frac{1}{2} m v^2 + m g z = \frac{1}{2} m v_0^2 + m g z_0 = 0$, since $z_0 = v_0 = 0$. Setting $h = -z > 0$, we have
$$v = \sqrt{2\, g\, h}, \tag{20.127}$$
as in Eq. 20.33. In general, the bead may have an arbitrary initial velocity. In this case, there might be no point where the velocity vanishes. Then, $h > 0$ measures the distance along a vertical axis (oriented downwards), from the height where the speed would vanish. [For example, consider a roller coaster: neglecting friction and aerodynamic drag, $h$ is measured from the altitude where the velocity would vanish.]
◦ Comment. In particular, if the bead moves downwards along the vertical, then we have $v = dh/dt > 0$, and
$$t = \int_0^h \frac{1}{\sqrt{2\, g\, h}}\, dh = \sqrt{\frac{2 h}{g}}, \tag{20.128}$$
namely $h = \frac{1}{2} g\, t^2$, in agreement with Eq. 11.23. [Compare to the analogous formulation for one–dimensional problems (Eq. 11.120).]
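The quadrature of Eq. 20.126 is easy to carry out numerically. The sketch below is one possible implementation, under stated assumptions: the function name `travel_time` and the numerical data are hypothetical, $s_1 > s_0$ with $v > 0$ throughout, and the substitution $s = s_0 + w^2$ is introduced here (it is not part of the text) to remove the inverse–square–root singularity that arises when the particle starts from rest. The result is compared with the free–fall expressions.

```python
import math

def travel_time(U, E0, m, s0, s1, n=20000):
    """Eq. 20.126 by the midpoint rule, after substituting s = s0 + w**2.

    Assumes s1 > s0 and E0 > U(s) for s0 < s <= s1.
    """
    w_max = math.sqrt(s1 - s0)
    dw = w_max / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * dw               # midpoint in the variable w
        s = s0 + w * w
        total += 2.0 * w / math.sqrt(E0 - U(s)) * dw   # ds = 2 w dw
    return math.sqrt(0.5 * m) * total

m, g, h = 0.2, 9.80665, 5.0              # hypothetical data, SI units

def U_fall(s):
    # s measured downwards, so that U = -m g s (Eq. 20.118 with z = -s)
    return -m * g * s

# start from rest: compare with Eq. 20.128
print(travel_time(U_fall, 0.0, m, 0.0, h), math.sqrt(2.0 * h / g))

# start with a downward speed v0: compare with h = v0 t + g t^2 / 2 solved for t
v0 = 3.0
print(travel_time(U_fall, 0.5 * m * v0 ** 2, m, 0.0, h),
      (-v0 + math.sqrt(v0 ** 2 + 2.0 * g * h)) / g)
```

For a bead on a prescribed wire, one would replace `U_fall` with $m g z(s)$, where $z(s)$ is the elevation as a function of the arclength.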
20.7.2 Exact circular pendulum equation
♥
Here, we consider another example, namely the exact large–amplitude solution of the pendulum equation, $\ddot\theta + C \sin\theta = 0$, where $C = g/\ell$ (Eqs. 20.39 and 20.40). Multiplying this equation by $\dot\theta$ and integrating, we have
$$\frac{1}{2}\, \dot\theta^2 - C \cos\theta = \breve{E}_0, \tag{20.129}$$
where
$$\breve{E}_0 := \frac{1}{2}\, \dot\theta_0^2 - C \cos\theta_0 \tag{20.130}$$
is the initial energy $E_0$ divided by $m \ell^2$. [Indeed, the kinetic energy is given by $\frac{1}{2} m v^2 = \frac{1}{2} m \ell^2 \dot\theta^2$, whereas the potential energy is given by $-m g \ell \cos\theta = -m \ell^2 C \cos\theta$. Hence, $\breve{E} = E/(m \ell^2) = \frac{1}{2} \dot\theta^2 - C \cos\theta$.] Using Eq. 20.126 with $s = \theta$, one obtains
$$t = t(\theta) = \int_0^{\theta} \frac{d\theta}{\sqrt{2 \breve{E}_0 + 2\, C \cos\theta}}. \tag{20.131}$$
[As in Remark 110, p. 495, here we chose $\theta \in (0, \theta_0)$, so that $\dot\theta > 0$.]
Let us examine the various possibilities that may occur, depending upon the initial energy $\breve{E}_0$. Let us begin with the case $\breve{E}_0 = C = g/\ell > 0$ (Eq. 20.40). Then, for $\theta = \pi$ (namely at the top of the circle), we have $\cos\theta = -1$, and Eq. 20.129 gives us $\dot\theta = 0$. The pendulum stops at the top of the circle, which is a situation of unstable equilibrium (Eq. 20.46).
If $\breve{E}_0 > C$, we have that $\frac{1}{2} \dot\theta^2 = \breve{E}_0 + C \cos\theta$ is positive for all values of $\theta$. Thus, the velocity never vanishes, namely the pendulum never stops, not even at the top of the circle. It keeps on rotating in the same angular direction.
Finally, consider the case $\breve{E}_0 < C$. For the sake of clarity, let us assume that the pendulum starts, with $\dot\theta_0 = 0$, from the angle $\theta(0) = -\theta_0 \in (-\pi, 0)$, where $\theta_0$ denotes the angle where the pendulum stops again, namely the largest value that $\theta$ may take. [The choice $\theta(0) < 0$ is necessary to ensure that $\dot\theta > 0$, as we assumed in Eq. 20.131.] Then Eq. 20.129 yields $\breve{E}_0 = -C \cos\theta_0$ (where $C = g/\ell$), and Eq. 20.131 reduces to
$$t = t(\theta) := \sqrt{\frac{\ell}{2 g}} \int_{-\theta_0}^{\theta} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{20.132}$$
The period of oscillation $T$ is four times the time to go from $0$ to $\theta_0$, or
$$T = 4 \sqrt{\frac{\ell}{2 g}} \int_0^{\theta_0} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{20.133}$$
• A more convenient expression ♠
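To obtain a more convenient expression, set
$$\sin\frac{\theta}{2} = k \sin u, \qquad \text{with } k = \sin\frac{\theta_0}{2}. \tag{20.134}$$
Thus, we have
$$\frac{1}{2} \cos\frac{\theta}{2}\, d\theta = k \cos u\, du. \tag{20.135}$$
Next, note that, using Eq. 20.134, we have
$$\cos\frac{\theta}{2} = \sqrt{1 - \sin^2\frac{\theta}{2}} = \sqrt{1 - k^2 \sin^2 u}. \tag{20.136}$$
On the other hand, using Eq. 20.134 as well as $\sin^2\alpha = \frac{1}{2} (1 - \cos 2\alpha)$ (Eq. 7.62), one obtains
$$k \cos u = k \sqrt{1 - \sin^2 u} = \sqrt{k^2 - \sin^2\frac{\theta}{2}} = \sqrt{\sin^2\frac{\theta_0}{2} - \sin^2\frac{\theta}{2}} = \sqrt{\frac{\cos\theta - \cos\theta_0}{2}}. \tag{20.137}$$
Thus, combining the last three equations, one obtains
$$\frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}} = \sqrt{2}\, \frac{du}{\sqrt{1 - k^2 \sin^2 u}}, \tag{20.138}$$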
as you may verify. Then, substituting into Eq. 20.133, one obtains
$$T = 4 \sqrt{\frac{\ell}{g}} \int_0^{\pi/2} \frac{1}{\sqrt{1 - k^2 \sin^2 u}}\, du. \tag{20.139}$$
[For your information,
$$K(k) := \int_0^{\pi/2} \frac{1}{\sqrt{1 - k^2 \sin^2 u}}\, du \tag{20.140}$$
is called the complete elliptic integral of the first kind (Ref. [2], p. 590).]
• Period for small–amplitude oscillations
For small–amplitude oscillations, we may obtain an approximate expression for $T$. To this end, recall that $1/\sqrt{1 - x} = 1 + \frac{1}{2} x + O(x^2)$ (Eq. 13.29), and $\int_0^{\pi/2} \sin^2 x\, dx = \pi/4$ (Eq. 10.48). Accordingly, we have
$$T = 4 \sqrt{\frac{\ell}{g}} \int_0^{\pi/2} \Big( 1 + \tfrac{1}{2}\, k^2 \sin^2 u + \cdots \Big)\, du = 2 \pi \sqrt{\frac{\ell}{g}}\, \Big( 1 + \tfrac{1}{4}\, k^2 + \cdots \Big), \tag{20.141}$$
which gives $T = T(k)$ for small values of $k = \sin(\theta_0/2)$. Of course, for very small values of $k$ (namely for $\theta_0 \ll 1$), we recover $T = 2 \pi \sqrt{\ell/g}$ (Eq. 20.44).
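To get a feeling for Eqs. 20.139 and 20.141, one may evaluate the period numerically. The sketch below computes $K(k)$ through the arithmetic–geometric mean, using the classical identity $K(k) = \pi / [2\, \mathrm{agm}(1, \sqrt{1 - k^2})]$, which is a standard result not derived in this section. The function names and the default value of $g$ are assumptions made for the example, and $0 \le k < 1$ is required.

```python
import math

def ellip_K(k):
    """Complete elliptic integral of the first kind (Eq. 20.140), for 0 <= k < 1,
    via the arithmetic-geometric mean of 1 and sqrt(1 - k^2)."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(40):                  # quadratic convergence: 40 steps are ample
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def period_exact(theta0, ell, g=9.80665):
    """Eq. 20.139: exact period for amplitude theta0 (radians)."""
    k = math.sin(0.5 * theta0)
    return 4.0 * math.sqrt(ell / g) * ellip_K(k)

def period_series(theta0, ell, g=9.80665):
    """Eq. 20.141 truncated after the k^2 term."""
    k = math.sin(0.5 * theta0)
    return 2.0 * math.pi * math.sqrt(ell / g) * (1.0 + 0.25 * k * k)

for theta0_deg in (5.0, 30.0, 90.0, 170.0):
    theta0 = math.radians(theta0_deg)
    print(theta0_deg, period_exact(theta0, 1.0), period_series(theta0, 1.0))
```

For small amplitudes the two columns are practically identical, whereas for amplitudes approaching 180 degrees the exact period grows without bound and the truncated series is no longer adequate.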
20.8 Appendix A. Newton and the Kepler laws
♥
Arguably, the most beautiful example of the interplay between mathematics and classical mechanics is the Newton derivation of the Kepler laws on the motion of the planets, starting from the Newton law of gravitation. Specifically, as anticipated in Subsection 20.2.5, here we present all the material of this chapter that relates to gravitation and celestial mechanics. In particular, in Subsection 20.8.1 we further discuss the Newton law of gravitation. Next, in Subsection 20.8.2 we use the formulation in terms of longitudinal and lateral acceleration, presented in Section 20.3, to discuss the motion of planets on circular orbits. Then, we get to the main topic of this appendix, namely the Kepler laws (Subsection 20.8.3). Finally, in Subsection 20.8.4, we address the relationship between gravity and gravitation.
All we need here is the material covered up to Subsection 20.2.5 (included), with the exception of Subsections 20.8.2 (circular orbits) and 20.8.4 (gravity vs gravitation), where we use Eq. 20.32 (intrinsic acceleration components). In addition, the material on angular momentum (Section 20.5) is useful to provide a physical interpretation of the Kepler second law.⁸
Remark 151. In this appendix, we assume that the star may be considered as fixed, while the planet orbits around it. The hypothesis that the star is fixed is removed in Subsection 21.7.3, where the Kepler laws are revisited by using a two–particle analysis, which allows us to show that the fixed–star hypothesis is equivalent to assuming that the mass of the star is much larger than that of the planet. [This assumption is closely related to that in Remark 106, p. 482, on the infinite–mass assumption for the car–wheel–suspension system.]
Additional material regarding gravitation and celestial mechanics is presented later, because some items require additional mathematical know–how. In particular, in Section 21.8 we consider the Sun–Earth–Moon system, whereas in Section 22.6 we present an analysis of the tides. Also relevant is the Gauss law of gravitation, addressed in Vol. III.
⁸ Let me reiterate that here the Kepler laws are presented as a mathematical consequence of the Newton second law and the Newton law of gravitation. However, as pointed out in Subsection 20.2.5, history tells us a whole different story. Specifically, from a historical point of view, the story goes the other way around. The German astronomer and mathematician Johannes Kepler discovered his three laws (published between 1609 and 1619), by using experimental data provided to him by Tycho Brahe. The Kepler laws in turn were the basis for the subsequent introduction by Newton of the laws on dynamics and gravitation, as a way to explain them. The Newton laws were introduced in 1687, in his Philosophiae Naturalis Principia Mathematica (Ref. [49]).
20.8.1 Newton law of gravitation
♥
As stated in Principle 5, p. 802, the Newton law of gravitation states that the attraction that a material point $P_1$ (having mass $m_1$) exerts on a material point $P_2$ (having mass $m_2$) is given by Eq. 20.28, namely
$$\mathbf{f}_G(\mathbf{x}_1, \mathbf{x}_2) = -G\, \frac{m_1 m_2}{r^3}\, \mathbf{r}, \tag{20.142}$$
where $\mathbf{r} := \mathbf{x}_2 - \mathbf{x}_1$ is the vector from $P_1$ to $P_2$, whereas the gravitational constant $G$ is given by Eq. 20.29, namely $G \simeq 6.674\,30 \cdot 10^{-11}$ m³ kg⁻¹ s⁻².
◦ Comment. Note that, according to Eq. 20.142, this force is equal and opposite to the force that the second exerts on the first (just interchange $\mathbf{x}_1$ and $\mathbf{x}_2$ in Eq. 20.142), in agreement with the Newton third law of action and reaction (Eqs. 17.59 and 17.61). Also, Eq. 20.142 may be written as
$$\mathbf{f}_G(\mathbf{x}_1, \mathbf{x}_2) = -G\, \frac{m_1 m_2}{r^2}\, \mathbf{e}_r, \tag{20.143}$$
where
$$\mathbf{e}_r := \frac{\mathbf{r}}{r} \tag{20.144}$$
is a unit vector having direction equal to $\mathbf{r}$. Equation 20.143 justifies the name “inverse square law,” often used for it.
20.8.2 Planets on circular orbits
♥
Consider two celestial bodies, say a star and a planet, in the absence of any other celestial body. [Similar considerations apply to the motion of a satellite around a planet.] In this subsection, we assume the orbit to be circular.
Remark 152. We know that in reality the orbits of the planets are elliptical (an extensive analysis of this issue is presented in the subsection that follows). Nonetheless, for the Sun–Earth system, the circular–orbit assumption is reasonable, since the eccentricity of the Earth orbit is $\epsilon_E = \sqrt{1 - b^2/a^2} \simeq 0.016\,709$, namely $(a - b)/a \simeq 0.000\,140$. [The symbols $a$ and $b$ denote the semi–major and semi–minor axes of the elliptical orbit as in Eq. 7.117.] Similarly, the mean eccentricity of the Moon orbit is given by $\epsilon_M = \sqrt{1 - b^2/a^2} \simeq 0.054\,901$, namely $(a - b)/a \simeq 0.001\,508$.
Specifically, here we want to find the relationship between the planet speed and the star–planet distance $R$, such that $R$ remains constant. To this end, let $M$ denote the mass of the star (in particular, of the Sun), located at $P_S$, and $m$ the mass of a planet (in particular, of the Earth), located at $P_P$. As stated in Remark 151, p. 838, here we assume $m \ll M$ (infinite–mass assumption). Treating again the star and the planet as material points, the force of gravitation that the star exerts on the planet may be written as (Eq. 20.142)
$$\mathbf{f}_G(\mathbf{x}_S, \mathbf{x}_P) = -G\, \frac{m M}{r^2}\, \mathbf{e}_r, \tag{20.145}$$
with $\mathbf{e}_r = \mathbf{r}/r$ (Eq. 20.144), where $\mathbf{r} = \mathbf{x}_P - \mathbf{x}_S$ is the vector from the star to the planet. Next, let us combine this with the Newton second law, $m\, \mathbf{a} = \mathbf{f}$ (Eq. 20.3), and use the longitudinal/lateral decomposition of the acceleration, namely $\mathbf{a} = \dot{v}\, \mathbf{t} + (v^2/R)\, \mathbf{n}$ (Eq. 20.32). This yields
$$\dot{v}\, \mathbf{t} + \frac{v^2}{R}\, \mathbf{n} = -\frac{G M}{r^2}\, \mathbf{e}_r. \tag{20.146}$$
For circular orbits, we have $R = r$ and $\mathbf{n} = -\mathbf{e}_r$. Therefore, the circumferential component of the above equation yields
$$\dot{v} = 0, \tag{20.147}$$
namely the planet speed remains constant. On the other hand, the lateral (namely radial) component of the above equation gives us $v^2/R = G M/R^2$,
or
$$v^2 = \frac{G M}{R}. \tag{20.148}$$
In other words, in order for the planet to have a circular orbit, its speed must be constant, with its square inversely proportional to the constant distance $R$, the constant of proportionality being given by $G M$. Finally, note that the orbital period, namely the time it takes for a complete orbit, may be obtained from $v\, T = 2 \pi R$, or (use Eq. 20.148)
$$T^2 = 4 \pi^2\, \frac{R^3}{G M}. \tag{20.149}$$
In plain words, we have that the square of the orbital period is proportional to the cube of the orbital radius.
• The Earth around the Sun
♥
Let us apply these results to find the approximate values for the orbital periods of the Earth around the Sun, and of the Moon around the Earth, of course within the approximation that the corresponding orbits are circular (see Remark 152, p. 840). We already know that $G \simeq 6.674\,30 \cdot 10^{-11}$ m³ kg⁻¹ s⁻² (Eq. 20.29). In addition, the average distance Sun–Earth is $r_{SE} \simeq 149.598 \cdot 10^6$ km. Finally, for the mass of the Sun, we have $M_S = (1.988\,47 \pm 0.000\,07) \cdot 10^{30}$ kg (p. 1003). Therefore, the orbital period of the Earth is approximately given by
$$T_E \simeq 2 \pi \sqrt{\frac{(149.598\,262 \cdot 10^9)^3}{6.674\,30 \cdot 10^{-11} \times 1.988\,47 \cdot 10^{30}}}\ \text{s} \simeq 365.252\ \text{days}. \tag{20.150}$$
◦ Comment. If we use $G \simeq 6.674\,45 \cdot 10^{-11}$ m³ kg⁻¹ s⁻² and $M_S = 1.988\,54 \cdot 10^{30}$ kg (highest values for $G$ and $M_S$ within their accuracy bracket, p. 1003), we obtain $T \simeq 365.242\,3$ days, in excellent agreement with the actual orbital period of $T \simeq 365.242\,2$ solar days (p. 1003).
◦ Comment. Incidentally, had we had $T = 365.25$ solar days, we would only need a leap year every four years. The Gregorian calendar (no leap day for the years divisible by 100, except for those divisible by 400) produces an average year (over a 400–year period) of 365.242 5 days.
◦ Comment. As we will see in Subsection 20.8.3, if we remove the assumption that the orbit is circular, the third Kepler law (corrected for a two–body analysis, namely Eq. 21.159) states that $T^2 = 4 \pi^2 a_{SE}^3/[G (M_S + M_E)]$, where $a_{SE}$ is the semi–major axis of the elliptical orbit. Note that the distance Sun–Earth, $r_{SE}$, is evaluated as the arithmetic average of perihelion (147,098,291 km) and aphelion (152,098,233 km), so that $r_{SE} = a_{SE}$ (p. 1003).⁹ On the other hand, $M_E \simeq 0.6 \cdot 10^{25}$ kg is smaller than the uncertainty of $M_S$, namely $7 \cdot 10^{25}$ kg (p. 1003). Hence, Eq. 20.150 is exact within the limitations of Subsection 20.8.3.
• The Moon around the Earth
♥
For the Moon–Earth system, assuming again a circular orbit (Remark 152, p. 840), the distance is $r_{EM} \simeq 384.4 \cdot 10^6$ m, average of perigee (363,104 km) and apogee (405,696 km).¹⁰ Also, $G \simeq 6.674\,30 \cdot 10^{-11}$ m³ kg⁻¹ s⁻² (Eq. 20.29) and $M_E \simeq 5.972\,19 \cdot 10^{24}$ kg (p. 1003). Therefore, the orbital period of the Moon is given by
$$T_M \simeq 2 \pi \sqrt{\frac{(384.4 \cdot 10^6)^3}{6.674\,30 \cdot 10^{-11} \times 5.972\,19 \cdot 10^{24}}}\ \text{s} \simeq 27.452\ \text{days}. \tag{20.151}$$
For the Moon, the actual orbital period is $T_M \simeq 27.321\,661$ days.
• Geostationary satellites

We have the following

Definition 338 (Geostationary satellite). A satellite is called geostationary iff it always appears at the same point in the sky to an observer on the Earth. [The orbit of a geostationary satellite lies necessarily on the equatorial plane. Why?]

Let us introduce the following

Definition 339 (Sidereal rotation period). The sidereal rotation period is the time necessary for a celestial body to complete one revolution around its axis, in an inertial frame. [The Earth sidereal rotation period is $T_{ESid} \simeq 0.997\,269\,68$ solar days (namely $T_{ESid} \simeq 86{,}164$ s). A solar day, namely 86,400 seconds, includes the extra time needed to take into account the portion of the orbit around the Sun covered by the Earth during one day. More on this later (Eqs. 22.106–22.109).]

9 Perihelion and aphelion denote, respectively, the closest and farthest Earth locations of the orbit around the Sun. They come from Ancient Greek: περιηλιον (perihelion) and αφηλιον (aphelion), from περι (peri, around), απο (apo, away), and ηλιος (helios, Sun).

10 Perigee and apogee denote, respectively, the closest and farthest locations of the Moon (or of any other satellite) in its orbit around the Earth. They come from Ancient Greek: περιγειον (perigeion) and απογειον (apogeion), from περι (peri, around), απο (apo, away), and γη (ge, Earth).
Regarding the geostationary satellite, we want: (i) its orbit to be circular and on the plane of the equator and (ii) its orbital period to be equal to the sidereal rotation period, $T_{ESid}$, of the Earth. Accordingly, Eq. 20.149 yields that the (circular) geostationary orbital radius $R_{GS}$ is obtained from (use $G \simeq 6.674\,30 \cdot 10^{-11}$ m$^3$ kg$^{-1}$ s$^{-2}$, Eq. 20.29, and $M_E \simeq 5.972\,19 \cdot 10^{24}$ kg, p. 1003)

$$R_{GS} = \sqrt[3]{\frac{T_{ESid}^2\, G\, M_E}{4\pi^2}} = 42{,}164 \text{ km}, \tag{20.152}$$

namely 35,786 km above sea level (since the Earth's equatorial radius is $R_E \simeq 6{,}378$ km), in good agreement with available data.
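Equation 20.152 is simply the circular–orbit relation between period and radius, solved for the radius. A minimal sketch follows; `circular_radius` is a hypothetical helper name, and the constants are the ones quoted above.

```python
import math

G = 6.67430e-11          # m^3 kg^-1 s^-2 (Eq. 20.29)
M_EARTH = 5.97219e24     # kg
T_SIDEREAL = 86164.0     # Earth sidereal rotation period, s
R_EQUATOR = 6378e3       # Earth equatorial radius, m


def circular_radius(period, central_mass, grav_const=G):
    """Radius of the circular orbit with the given period (Eq. 20.152)."""
    return (period**2 * grav_const * central_mass / (4.0 * math.pi**2)) ** (1.0 / 3.0)


r_gs = circular_radius(T_SIDEREAL, M_EARTH)
altitude = r_gs - R_EQUATOR

# Round trip: the period recomputed from r_gs should give back T_SIDEREAL.
period_back = 2.0 * math.pi * math.sqrt(r_gs**3 / (G * M_EARTH))

print(f"R_GS     = {r_gs / 1e3:,.0f} km")
print(f"altitude = {altitude / 1e3:,.0f} km above the equator")
```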
20.8.3 The Kepler laws
♥
In the preceding subsection, we have studied circular orbits. In reality, we know that the orbits of the planets are elliptical. This fact is the first of the three Kepler laws, which are

1. The orbit described by a single planet in its motion around an isolated star is an ellipse, of which the star occupies one focus (1608).
2. The segment connecting the star to the planet sweeps equal areas in equal times (1609).
3. The square of the orbital period of a single planet is proportional to the cube of the semi–major axis of its orbit (1619).

Here, we show how the three Kepler laws can be derived from the Newton second law (Eq. 20.3) and the Newton law of gravitation (Eq. 20.142). These may be combined to yield

$$\frac{d^2 \mathbf{r}}{dt^2} = -\frac{GM}{r^2}\, \mathbf{e}_r, \tag{20.153}$$

with $\mathbf{r} = \mathbf{x} - \mathbf{x}_S$, where $\mathbf{x}_S$ denotes the location of the star.
• Formulation of the problem
♥
Here, we consider a system composed of a single star $S$ with a single planet $P$ and assume that the star and its planet may be treated as mass points.

◦ The orbit is planar. We first show that the orbit is planar. To this end, assume the initial velocity $\mathbf{v}_0$ and the initial location vector $\mathbf{x}_0$ not to be parallel. [If $\mathbf{v}_0$ and $\mathbf{x}_0$ were parallel, the motion would be rectilinear and the planet would crash onto the star, or move away, at least initially. In fact, we have to assume that the angle between the two vectors is such that, in its trajectory, the planet does not get excessively close to the center of the star.] Let us choose a frame of reference with the origin coinciding with $S$, and the $z$-axis directed like $\mathbf{x}_0 \times \mathbf{v}_0$, so that the initial motion is counterclockwise with respect to the $z$-axis (see Fig. 20.9, where the $z$-axis, perpendicular to the plane of the figure, points towards your eyes). [Thus, the plane $z = 0$ contains $S$ and is parallel to $\mathbf{x}_0$ and $\mathbf{v}_0$. The $x$- and $y$-axes are chosen so as to form with the $z$-axis a right–handed system.]

Fig. 20.9 Anomaly $\varphi$ and unit vectors $\mathbf{e}_r$ and $\mathbf{e}_\varphi$

That said, the component of Eq. 20.153 in the $z$-direction is

$$\frac{d^2 z}{dt^2} = 0, \tag{20.154}$$
whereas the initial conditions are $z(0) = \dot z(0) = 0$. Hence, the (unique) solution is $z(t) = 0$, namely the motion is restricted to the plane $z = 0$.

◦ Convenient components of the Newton second law. To derive the Kepler laws, it is convenient to use (at any given time), instead of the Cartesian components, the components of the Newton second law (Eq. 20.3) along two orthogonal vectors, defined as follows (see again Fig. 20.9): (i) the first, $\mathbf{e}_r$, is given by Eq. 20.144, with $\mathbf{r} = \mathbf{x} - \mathbf{x}_S$, and points in the direction opposite to the force of gravitation acting on the planet (Eq. 20.145), (ii) the second, $\mathbf{e}_\varphi$, is orthogonal to $\mathbf{e}_r$ and points in the counterclockwise direction, so that $\mathbf{v}_0 \cdot \mathbf{e}_\varphi > 0$. Accordingly, we have

$$\mathbf{e}_r = \cos\varphi\, \mathbf{i} + \sin\varphi\, \mathbf{j} \quad\text{and}\quad \mathbf{e}_\varphi = -\sin\varphi\, \mathbf{i} + \cos\varphi\, \mathbf{j}, \tag{20.155}$$

where $\varphi$, known as the anomaly, is the angle that $\mathbf{r}(t)$ makes with the $x$-axis. [The direction of the $x$-axis will be conveniently chosen after Eq. 20.168.]

◦ Kinematics. These equations imply
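$$\frac{d\mathbf{e}_r}{dt} = \dot\varphi\, \mathbf{e}_\varphi \quad\text{and}\quad \frac{d\mathbf{e}_\varphi}{dt} = -\dot\varphi\, \mathbf{e}_r. \tag{20.156}$$

Hence, the velocity is given by (set $\mathbf{r} = r\, \mathbf{e}_r$ and use Eq. 20.156)

$$\mathbf{v} = \dot{\mathbf{r}} = \dot r\, \mathbf{e}_r + r \dot\varphi\, \mathbf{e}_\varphi, \tag{20.157}$$

whereas for the acceleration we have (use Eqs. 20.156 and 20.157)

$$\mathbf{a} = \dot{\mathbf{v}} = \left(\ddot r - r \dot\varphi^2\right) \mathbf{e}_r + \left(2\, \dot r \dot\varphi + r \ddot\varphi\right) \mathbf{e}_\varphi. \tag{20.158}$$

For future reference, note that

$$\dot\varphi_0 > 0, \tag{20.159}$$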
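because we have chosen the $z$-axis so as to have an initial counterclockwise motion.

◦ Equations of motion. Combining the Newton second law (Eq. 20.153) with Eq. 20.158, one obtains

$$\left(\ddot r - r \dot\varphi^2\right) \mathbf{e}_r + \left(2\, \dot r \dot\varphi + r \ddot\varphi\right) \mathbf{e}_\varphi = -\frac{GM}{r^2}\, \mathbf{e}_r. \tag{20.160}$$

The components of the above equation, in directions $\mathbf{e}_r$ and $\mathbf{e}_\varphi$, are

$$\ddot r - r \dot\varphi^2 = -\frac{GM}{r^2}, \tag{20.161}$$

$$2\, \dot r \dot\varphi + r \ddot\varphi = 0. \tag{20.162}$$

• Kepler second law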
♥
It is convenient to begin by deriving the second law, because this simplifies the proof of the first law. Equation 20.162, multiplied by $r$, may be written as $d(r^2 \dot\varphi)/dt = 0$, which may be integrated to yield

$$r^2 \dot\varphi = r_0^2\, \dot\varphi_0 =: C > 0. \tag{20.163}$$

The constant $C$ is called the areal velocity constant, and is positive, because $\dot\varphi_0 > 0$ (Eq. 20.159). [Of course, for circular orbits, the Kepler second law reduces to $\dot v = 0$, in agreement with Eq. 20.147.]

Equation 20.163 is equivalent to the Kepler second law. Indeed, note that the area $dA$ swept by $\mathbf{r}$ during any time interval $dt$ is given by (see the light gray region shown in Fig. 20.9)
$$dA = \frac{1}{2}\, r^2\, d\varphi = \frac{1}{2}\, r^2 \dot\varphi\, dt = \frac{1}{2}\, C\, dt. \tag{20.164}$$

Equation 20.164 agrees with the Kepler second law, namely: The line connecting the star to the planet sweeps equal areas in equal times.

◦ Comment. Equation 20.163 may also be obtained from the conservation equation of the angular momentum (Eq. 20.68). Indeed, recalling that $\mathbf{v} = \dot r\, \mathbf{e}_r + r \dot\varphi\, \mathbf{e}_\varphi$ (Eq. 20.157), we have that the angular momentum (per unit mass) is given by (use $\mathbf{e}_r \times \mathbf{e}_\varphi = \mathbf{k}$) $\mathbf{r} \times \mathbf{v} = r\, \mathbf{e}_r \times (\dot r\, \mathbf{e}_r + r \dot\varphi\, \mathbf{e}_\varphi) = r^2 \dot\varphi\, \mathbf{k}$. In addition, the moment of the gravitational force with respect to the origin vanishes. Then, we may use the angular–momentum conservation equation (Eq. 20.68) and obtain that $\mathbf{r} \times \mathbf{v} = r^2 \dot\varphi\, \mathbf{k}$ remains constant, as in Eq. 20.163.
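The Kepler second law lends itself to a simple numerical experiment. The sketch below integrates Eq. 20.153 in Cartesian components with a classical Runge–Kutta scheme (a technique chosen here for illustration, not one used in the text), in nondimensional units with $GM = 1$ and arbitrarily chosen initial conditions. It then checks that $C = r^2 \dot\varphi$ stays constant and that the areas swept in two time windows of equal duration coincide. All function names are hypothetical.

```python
import math

GM = 1.0                        # nondimensional units (assumption)
STATE0 = (1.0, 0.0, 0.0, 0.8)   # x, y, vx, vy, with x0 and v0 not parallel
DT = 1.0e-3                     # time step


def derivative(state):
    """Right-hand side of Eq. 20.153 in Cartesian components (planar motion)."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -GM * x / r3, -GM * y / r3)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))

    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, 0.5 * dt))
    k3 = derivative(shifted(state, k2, 0.5 * dt))
    k4 = derivative(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def areal_constant(state):
    """C = r^2 dphi/dt, namely the z-component of r x v (Eq. 20.163)."""
    x, y, vx, vy = state
    return x * vy - y * vx


def swept_area(states):
    """Area swept by r along consecutive states (sum of thin triangles)."""
    return sum(0.5 * (p[0] * q[1] - q[0] * p[1]) for p, q in zip(states, states[1:]))


trajectory = [STATE0]
for _ in range(4000):
    trajectory.append(rk4_step(trajectory[-1], DT))

c0 = areal_constant(STATE0)
max_drift = max(abs(areal_constant(s) - c0) for s in trajectory)

# Two windows of equal duration (1000 steps each): far from and close to the star.
area_far = swept_area(trajectory[0:1001])
area_near = swept_area(trajectory[1500:2501])

print(f"C = {c0:.3f}, largest drift of C along the orbit = {max_drift:.1e}")
print(f"areas swept in equal times: {area_far:.6f} and {area_near:.6f}")
```

Both areas should be close to $\frac{1}{2} C\, \Delta t$, with $\Delta t = 1$ here, up to the discretization error of the triangle sum.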
• Kepler first law
♥
Here, we show that the orbit is an ellipse, with the star occupying one focus. To this end, we will use the polar representation formulation presented in Section 7.9. Specifically, consider Eq. 20.161. To integrate this, set $r[\varphi(t)] = 1/u[\varphi(t)]$. Then, we have

$$\frac{dr}{dt} = \frac{d}{dt}\left(\frac{1}{u}\right) = \frac{-1}{u^2}\, \frac{du}{dt} = \frac{-1}{u^2}\, \frac{du}{d\varphi}\, \dot\varphi = -C\, \frac{du}{d\varphi}. \tag{20.165}$$

[Hint: Use the Kepler second law, $r^2 \dot\varphi = C$ (Eq. 20.163) and $(1/u)\dot{} = -\dot u/u^2$ (Eq. 9.49).] Hence, $\ddot r = -C\, \dot\varphi\, d^2u/d\varphi^2$. Combining with $\ddot r - r \dot\varphi^2 = -GM/r^2$ (Eq. 20.161) yields $-C\, \dot\varphi\, d^2u/d\varphi^2 - r \dot\varphi^2 = -GM/r^2$, namely

$$\frac{d^2 u}{d\varphi^2} + u = \frac{GM}{C^2}, \tag{20.166}$$

as you may verify. [Hint: Divide the whole equation by $-C \dot\varphi$, use again the Kepler second law, $r^2 \dot\varphi = C$ (Eq. 20.163), and recall that $u = 1/r$. For the second term, use $r \dot\varphi^2/(C \dot\varphi) = 1/r = u$.] Equation 20.166 is satisfied by

$$u(\varphi) = \frac{GM}{C^2} \left[1 + \varepsilon \cos(\varphi - \varphi_*)\right], \tag{20.167}$$

where $\varepsilon$ and $\varphi_*$ are arbitrary constants. [Let us verify that Eq. 20.167 indeed satisfies Eq. 20.166. We have

$$\frac{d^2 u}{d\varphi^2} + u = \frac{-GM}{C^2}\, \varepsilon \cos(\varphi - \varphi_*) + \frac{GM}{C^2} \left[1 + \varepsilon \cos(\varphi - \varphi_*)\right] = \frac{GM}{C^2}, \tag{20.168}$$
in agreement with Eq. 20.166.] The arbitrary constants $\varepsilon$ and $\varphi_*$ are to be determined by imposing the initial conditions (see again Eq. 11.42). Specifically, recall that $\varphi$ is the angle that $\mathbf{r}(t)$ makes with the $x$-axis, and note that the direction of the $x$-axis has not yet been chosen. Here, we choose it so as to have $\varphi_* = 0$. Thus, using $u = 1/r$, we have

$$r \left(1 + \varepsilon \cos\varphi\right) = \ell, \tag{20.169}$$

where

$$\ell = \frac{C^2}{GM}. \tag{20.170}$$

Equation 20.169 shows that, in general, the trajectory is a conic, specifically, an ellipse if $\varepsilon < 1$ (Eq. 7.116), a parabola if $\varepsilon = 1$ (Eq. 7.127), and a hyperbola if $\varepsilon > 1$ (Eq. 7.137). [Recall that, for an ellipse, the eccentricity is given by $\varepsilon = \sqrt{a^2 - b^2}/a$ (Eq. 7.117), whereas the semi–latus rectum $\ell$ (Eq. 20.170) is given by $\ell = b^2/a$ (Eq. 7.110), where $a$ and $b$ denote respectively the semi–major and semi–minor axes of the ellipse. Similar expressions hold for parabolas and hyperbolas (Table 7.1, p. 294).] Parabolic and hyperbolic trajectories are not periodic, whereas the orbit of a planet is (by definition) periodic, and hence is necessarily elliptical, in agreement with the Kepler first law, namely: The orbit described by a single planet in its motion around an isolated star is an ellipse, of which the star occupies one focus. [The same holds true for comets as well.]
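To connect these formulas with numbers, the sketch below recovers the eccentricity and the semi–latus rectum of the Earth's orbit from the perihelion and aphelion distances quoted earlier, using $r = \ell/(1 + \varepsilon \cos\varphi)$ at $\varphi = 0$ and $\varphi = \pi$. It also checks, by central differences, that the solution in Eq. 20.167 satisfies Eq. 20.166 (scaled so that $GM/C^2 = 1$). Function names are illustrative assumptions.

```python
import math

R_PERIHELION = 147_098_291e3   # m, value quoted in the text
R_APHELION = 152_098_233e3     # m, value quoted in the text

# With phi = 0 at perihelion: r_min = ell / (1 + eps), r_max = ell / (1 - eps).
eccentricity = (R_APHELION - R_PERIHELION) / (R_APHELION + R_PERIHELION)
semi_latus_rectum = R_PERIHELION * (1.0 + eccentricity)

# Semi-axes from ell = b^2 / a and eps = sqrt(a^2 - b^2) / a.
semi_major = semi_latus_rectum / (1.0 - eccentricity**2)
semi_minor = semi_latus_rectum / math.sqrt(1.0 - eccentricity**2)


def orbit_radius(phi):
    """r(phi) from Eq. 20.169, with phi_* = 0."""
    return semi_latus_rectum / (1.0 + eccentricity * math.cos(phi))


def u_scaled(phi, eps, phi_star=0.0):
    """u(phi) of Eq. 20.167 multiplied by C^2 / (G M)."""
    return 1.0 + eps * math.cos(phi - phi_star)


def ode_residual(phi, eps, phi_star=0.0, h=1.0e-3):
    """Central-difference estimate of u'' + u - 1 (Eq. 20.166, scaled)."""
    u_minus = u_scaled(phi - h, eps, phi_star)
    u_mid = u_scaled(phi, eps, phi_star)
    u_plus = u_scaled(phi + h, eps, phi_star)
    return (u_plus - 2.0 * u_mid + u_minus) / h**2 + u_mid - 1.0


def conic_type(eps):
    """Classification of the trajectory according to the eccentricity."""
    if eps < 1.0:
        return "ellipse"
    if eps == 1.0:
        return "parabola"
    return "hyperbola"


print(f"eccentricity    = {eccentricity:.5f} ({conic_type(eccentricity)})")
print(f"semi-major axis = {semi_major / 1e3:,.0f} km")
print(f"semi-minor axis = {semi_minor / 1e3:,.0f} km")
```

Note that the semi–major axis obtained this way coincides with the arithmetic average of perihelion and aphelion distances.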
• Kepler third law
♥
In the case of elliptical orbits, we have that Eq. 20.164 may be integrated over a complete period to yield $A = \frac{1}{2}\, C\, T$, namely

$$T = \frac{2A}{C}, \tag{20.171}$$

where $A = \pi a b$ is the area of the ellipse (Eq. 10.56). Using $C^2/(GM) = \ell = b^2/a$ (Eqs. 7.110 and 20.170), we have $C^2 = GM\, \ell = GM\, b^2/a$. Accordingly, Eq. 20.171 yields $T^2 = 4 A^2/C^2 = 4 \pi^2 a^2 b^2/(GM\, b^2/a)$, namely

$$T^2 = \frac{4 \pi^2 a^3}{GM}, \tag{20.172}$$

in agreement with the Kepler third law, namely: The square of the orbital period of a single planet is proportional to the cube of the semi–major axis of
its orbit. [If the orbit is circular, Eq. 20.172 reduces to $T^2 = 4 \pi^2 R^3/(GM)$, in agreement with Eq. 20.149.]
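The Kepler third law may also be checked without using the area of the ellipse. Since $dt = d\varphi/\dot\varphi = r^2\, d\varphi/C$ (Eq. 20.163), the period equals the integral of $r^2/C$ over one revolution, with $r(\varphi)$ given by Eq. 20.169. The sketch below evaluates this integral numerically (a simple equally spaced sum, chosen here for illustration) and compares it with Eq. 20.172. The function names and the sample eccentricities are assumptions.

```python
import math

G = 6.67430e-11          # m^3 kg^-1 s^-2 (Eq. 20.29)
M_SUN = 1.98847e30       # kg
SOLAR_DAY = 86400.0      # s


def period_by_quadrature(semi_major, eccentricity, gm, n=4000):
    """T = (1/C) * integral over one revolution of r(phi)^2 dphi."""
    ell = semi_major * (1.0 - eccentricity**2)   # semi-latus rectum, ell = b^2 / a
    areal_c = math.sqrt(gm * ell)                # from ell = C^2 / (G M), Eq. 20.170
    dphi = 2.0 * math.pi / n
    # Equally spaced sum over a full period of a smooth periodic integrand.
    total = sum(
        (ell / (1.0 + eccentricity * math.cos(i * dphi))) ** 2 for i in range(n)
    )
    return total * dphi / areal_c


def period_kepler(semi_major, gm):
    """Kepler third law, Eq. 20.172."""
    return 2.0 * math.pi * math.sqrt(semi_major**3 / gm)


# Nondimensional check (a = 1, GM = 1) for several eccentricities.
for eps in (0.0, 0.0167, 0.5, 0.9):
    gap = period_by_quadrature(1.0, eps, 1.0) - period_kepler(1.0, 1.0)
    print(f"eps = {eps:6.4f}: quadrature minus Eq. 20.172 = {gap:+.1e}")

# Earth-like orbit, with the semi-major axis quoted in the text.
earth_period_days = period_by_quadrature(149.598262e9, 0.0167, G * M_SUN) / SOLAR_DAY
print(f"Earth-like orbit: T = {earth_period_days:.3f} days")
```

The period depends on the semi–major axis only: changing the eccentricity at fixed $a$ leaves the result unchanged, as the law states.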
20.8.4 Gravity vs gravitation
♥
Here, at last, we address the difference between gravity and gravitation. It should be noted that the Newton law of gravitation is not limited to celestial bodies. It is absolutely general and is valid for any two particles. In particular, it is the basis for the force that we call gravity, or weight, as discussed in this subsection. However, gravity does not coincide with the gravitational field, even though it is closely related to it. Specifically, if we assume the Earth to be a perfect ball, with spherically symmetric density distribution, we have that the force of gravity (weight) $\mathbf{f}_W$, acting on a particle of mass $m$ located at a point $P$, is given by (use $\mathbf{f}_G(\mathbf{x}) = -G\, m\, M_E\, \mathbf{r}/r^3$, Eq. 20.142)

$$\mathbf{f}_W(\mathbf{x}) := -G\, \frac{m\, M_E}{r^3}\, \mathbf{r} + m\, \Omega^2 R_*\, \mathbf{e}_{R_*}, \tag{20.173}$$
where $M_E$ is the mass of the Earth, $\Omega = \dot\theta$ is the Earth angular speed ($\theta$ being the angle of rotation around its axis), $R_*$ is the distance of $P$ from the axis of the Earth, and $\mathbf{e}_{R_*}$ is the unit vector through $P$ normal to the axis of the Earth (directed away from the axis). [The motivation for the additional term in Eq. 20.173 is presented in the comment below. In addition, the apparently unacceptable fact that we apply to a point on the surface of the Earth the law of the gravitational force exchanged between two mass points is addressed in Remark 154, below.]

Remark 153. To be consistent, I use the term angular speed to refer to $\Omega$. The term angular velocity is used for the vector quantity $\boldsymbol{\omega} = \Omega\, \mathbf{e}_\omega$, where $\mathbf{e}_\omega$ is a unit vector parallel to the axis around which the rotation occurs (Eq. 21.121).

◦ Comment. Where does the additional term in Eq. 20.173 come from? It is due to the rotation of the Earth. To see this, the term may be written as $(m\, v^2/R_*)\, \mathbf{e}_{R_*}$, where $v = \Omega R_*$ is the speed of the point $P$ of the Earth. This equals the lateral inertial force (Subsection 20.4.1), acting on a point that moves on a circular trajectory, with speed equal to $v = \Omega R_*$, as the point $P$ indeed does. [As an alternative point of view, in Chapter 22 we will see that the additional term may also be considered as due to the so–called centrifugal force, an apparent force that arises because the fact that the Earth is rotating implies that a frame of reference rigidly connected to the Earth is not inertial. These issues are addressed in Section 22.4, where it is shown
that the apparent force in the Earth frame of reference is essentially given by the additional term in Eq. 20.173. In this case, both viewpoints discussed in Remark 145, p. 813 (namely lateral inertial force and centrifugal force) are valid and give the same result.]

Remark 154. You might feel confused because the first term in Eq. 20.173 was obtained under the assumption that the two bodies may be conceived as mass points, which implies that their distance is considerably greater than their sizes, whereas in dealing with gravity we are on the surface of the Earth. However, I promise you, this is not a problem. Indeed, as we will see, such a result is exact if we assume that the Earth is a perfect ball, with spherically symmetric density distribution. Specifically, using the Gauss law of gravitation (addressed in Vol. III), the gravitational force that a perfect ball (with radius $R$ and center $O$, and having spherically symmetric density distribution) exerts on any point with distance from $O$ greater than (or equal to) $R$ is exactly equal to that produced by a material point with the same mass placed at $O$. For the time being, you have to take my word for it.

Equation 20.173 may be written as

$$\mathbf{f}_W(\mathbf{x}) := m\, \mathbf{g}(\mathbf{x}), \tag{20.174}$$

where the vector $\mathbf{g}$, which determines the direction of the vertical at $P$, by definition (Remark 39, p. 147), is given by

$$\mathbf{g}(\mathbf{x}) := -G\, \frac{M_E}{r^2}\, \mathbf{e}_r + \Omega^2 R_*\, \mathbf{e}_{R_*}, \tag{20.175}$$

where $M_E$ is the mass of the Earth, $\mathbf{r} = \mathbf{x} - \mathbf{x}_C$ ($\mathbf{x}_C$ being the location of the center of the Earth, assumed to be spherically symmetric), and $r = \|\mathbf{r}\| > R_E$, $R_E$ being the radius of the Earth. [This generalizes the definition for the local acceleration of gravity, given in Eq. 20.12, as $\mathbf{g} = \mathbf{w}/m = g\, \mathbf{k}$, where $\mathbf{k}$ is locally aligned with gravity, by definition. Specifically, up to this point, gravity (weight) was treated as a constant, the underlying assumption being that the motion was local (namely that the displacement was adequately small for us to neglect the variations of $\mathbf{g}$). Now we can see how $\mathbf{g}$ is a function of $\mathbf{x}$.]
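To get a feeling for the size of the two terms in Eq. 20.175, the sketch below evaluates $\mathbf{g}$ on the surface of a spherical Earth, as a function of latitude. The assumptions are: the equatorial radius quoted earlier is used as the radius of the sphere, $\Omega = 2\pi/T_{ESid}$, and the vector is expressed in the meridian plane through $P$, with components along $\mathbf{e}_{R_*}$ and along the Earth's axis. Function names are hypothetical.

```python
import math

G = 6.67430e-11            # m^3 kg^-1 s^-2 (Eq. 20.29)
M_EARTH = 5.97219e24       # kg
R_EARTH = 6378e3           # m, equatorial radius used for the whole sphere (assumption)
T_SIDEREAL = 86164.0       # s
OMEGA = 2.0 * math.pi / T_SIDEREAL   # Earth angular speed, rad/s


def gravity_vector(latitude_deg, radius=R_EARTH):
    """g of Eq. 20.175: (component along e_R*, component along the Earth's axis)."""
    lat = math.radians(latitude_deg)
    gravitation = G * M_EARTH / radius**2        # magnitude of the first term
    axis_distance = radius * math.cos(lat)       # R_*
    g_outward = -gravitation * math.cos(lat) + OMEGA**2 * axis_distance
    g_axial = -gravitation * math.sin(lat)
    return g_outward, g_axial


def gravity_magnitude(latitude_deg):
    """Norm of g at the given latitude."""
    return math.hypot(*gravity_vector(latitude_deg))


def vertical_deflection_deg(latitude_deg):
    """Angle between g and the direction -e_r towards the center of the Earth."""
    lat = math.radians(latitude_deg)
    g_out, g_ax = gravity_vector(latitude_deg)
    toward_center = (-math.cos(lat), -math.sin(lat))
    cross = g_out * toward_center[1] - g_ax * toward_center[0]
    dot = g_out * toward_center[0] + g_ax * toward_center[1]
    return math.degrees(math.atan2(abs(cross), dot))


for latitude in (0.0, 45.0, 90.0):
    print(
        f"latitude {latitude:4.1f} deg: |g| = {gravity_magnitude(latitude):.4f} m/s^2, "
        f"deflection = {vertical_deflection_deg(latitude):.4f} deg"
    )
```

In this model the additional term lowers $|\mathbf{g}|$ at the equator by about $\Omega^2 R_E$ and has no effect at the poles, where $R_* = 0$. At intermediate latitudes it tilts the vertical slightly away from the line through the center of the Earth.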
20.9 Appendix B. Cycloids, tautochrones, isochrones
♣
As mentioned in Theorem 209, p. 812, the curve described by Eq. 20.54 is a tautochrone, a close cousin of a cycloid. For the sake of completeness, both of these curves are addressed in this appendix. Specifically, in Subsection 20.9.1,
we address cycloids, whereas in Subsection 20.9.2 we consider tautochrones. In the last subsection, we discuss the relationship between the two.
20.9.1 Cycloids
♣
We have the following

Definition 340 (Cycloid). The cycloid is the curve generated by a point $P$ on the boundary of a disk having radius $R$, as this rolls (clockwise), without sliding, over the $x$-axis. As initial conditions, we assume that the origin belongs to the cycloid. Specifically, we have

$$x(\theta) = y(\theta) = 0, \quad \text{for } \theta = 0, \tag{20.176}$$

where $\theta$ denotes the angle of rotation of the disk from its initial location.

Remark 155. Strictly speaking, I should use the term cycloid through the origin. For, the assumption that the origin belongs to the cycloid is here introduced for the sake of convenience, as it simplifies the mathematical expressions considerably. The same holds true for the tautochrone, which is obtained from the upside–down cycloid by a horizontal translation, so that the origin belongs to the curve.

To visualize a cycloid, consider the trajectory of a little pebble stuck in the tread of the tire, as your car moves over a flat road. Even better, place a small light next to the tread of the tire of your bicycle (the closer, the better), and pedal. At night, what people would see is a cycloid.

To obtain the equation that describes the cycloid, note that the fact that there is no sliding implies that, as the disk rolls (clockwise) by an angle $\theta > 0$, its center advances along the positive $x$-semiaxis by a quantity $R\, \theta$. On the other hand, for an observer placed at the center of the disk, $O = (R\theta, R)$, the point $P$ describes a circle with center $O$, so that its location relative to $O$ is given by $\breve x = -R \sin\theta$ and $\breve y = -R \cos\theta$. Hence, adding the two contributions (location of $O$ and location of $P$ relative to $O$), we obtain the parametric representation of a cycloid, namely

$$x(\theta) = R\, (\theta - \sin\theta) \quad\text{and}\quad y(\theta) = R\, (1 - \cos\theta). \tag{20.177}$$
The cycloid with $R = 1$ is depicted in Fig. 20.10. [Note that the origin of the $(x, y)$-plane belongs to the curve. Indeed, we have $x(0) = y(0) = 0$ (Eq. 20.176).]

Fig. 20.10 Cycloid, with $R = 1$

◦ Comment. The function $y = y(\theta)$ repeats itself whenever $\theta$ is increased by $2\pi$. Correspondingly, $x(\theta)$ is increased by $2\pi R$. In other words, the function $y = y(x)$ is periodic, with period $2\pi R$. It is also an even function of $x$, as you may verify. [Hint: We have $x(-\theta) = -x(\theta)$, whereas $y(-\theta) = y(\theta)$.]

Next, we want to obtain the differential equation for the explicit representation, $y = y(x)$, of the curve. This may be obtained by noting that

$$\frac{dx}{d\theta} = R\, (1 - \cos\theta) \quad\text{and}\quad \frac{dy}{d\theta} = R \sin\theta. \tag{20.178}$$

Therefore, we have

$$\frac{dy}{dx} = \frac{\sin\theta}{1 - \cos\theta}. \tag{20.179}$$

Accordingly, using $A^2 - B^2 = (A + B)(A - B)$ (Eq. 6.37), one obtains

$$\left(\frac{dy}{dx}\right)^2 = \frac{\sin^2\theta}{(1 - \cos\theta)^2} = \frac{1 - \cos^2\theta}{(1 - \cos\theta)^2} = \frac{1 + \cos\theta}{1 - \cos\theta} = \frac{2R - y}{y}, \tag{20.180}$$

that is,

$$\frac{dy}{dx} = \pm \sqrt{\frac{2R}{y} - 1}. \tag{20.181}$$

[The ambiguity of the sign will be resolved at the end of this subsection (Remark 156, p. 852).]

Note that as $y$ approaches $0^+$, $dy/dx$ tends to infinity. [Here, we use the $+$ sign in Eq. 20.181, because $y > 0$ for $\theta > 0$, and $y(0) = 0$.] Thus, at the origin of the $(x, y)$-plane, the curve has a cusp, namely a vertical slope on both sides of the origin. [For, $y(x)$ is an even function of $x$ (see the comment above).] This occurs also for $\theta = \pm 2k\pi$ ($k = 1, 2, \dots$), namely for $x = \pm 2k\pi R$. [For, $y(x)$ is a periodic function of $x$ (see the comment above).]
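The properties of the cycloid derived in this subsection are easy to check numerically. The sketch below evaluates Eq. 20.177 for a sample radius, and verifies the identity in Eq. 20.180, the shift of $x$ by $2\pi R$ when $\theta$ increases by $2\pi$, and the symmetry of the parametric representation under $\theta \to -\theta$. The function names and the sample radius are illustrative assumptions.

```python
import math


def cycloid_point(theta, radius=1.0):
    """Parametric representation of the cycloid through the origin (Eq. 20.177)."""
    return radius * (theta - math.sin(theta)), radius * (1.0 - math.cos(theta))


def cycloid_slope(theta):
    """dy/dx from Eq. 20.179 (undefined at the cusps, theta = 2 k pi)."""
    return math.sin(theta) / (1.0 - math.cos(theta))


def slope_identity_gap(theta, radius=1.0):
    """Difference between the two sides of Eq. 20.180."""
    _, y = cycloid_point(theta, radius)
    return cycloid_slope(theta) ** 2 - (2.0 * radius - y) / y


RADIUS = 2.5
sample_angles = [0.1 * k for k in range(1, 62)]   # 0.1, ..., 6.1, inside (0, 2 pi)

# Largest violation of (dy/dx)^2 = (2R - y)/y over the sample.
worst_gap = max(abs(slope_identity_gap(t, RADIUS)) for t in sample_angles)

# Increasing theta by 2 pi shifts x by 2 pi R and leaves y unchanged.
x_a, y_a = cycloid_point(1.0, RADIUS)
x_b, y_b = cycloid_point(1.0 + 2.0 * math.pi, RADIUS)

# Changing the sign of theta changes the sign of x and leaves y unchanged.
x_c, y_c = cycloid_point(-1.0, RADIUS)

print(f"largest gap in Eq. 20.180: {worst_gap:.1e}")
print(f"shift in x over one turn:  {x_b - x_a:.6f} (2 pi R = {2 * math.pi * RADIUS:.6f})")
print(f"steep slope near the cusp: dy/dx = {cycloid_slope(1.0e-3):.1f} at theta = 0.001")
```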
We have the following

Theorem 217. Through any given point $(x_*, y_*)$, with $x_* > 0$ and $y_* > 0$, there exists only one cycloid described by Eq. 20.177, with $\theta \in (0, 2\pi)$.

Proof. ♥ Figure 20.10 shows that $y/x$ decreases as we move along the curve away from the origin, with $y/x = (1 - \cos\theta)/(\theta - \sin\theta)$ (Eq. 20.177), where $\theta \in (0, 2\pi)$. [For an analytical proof, see the comment below.] In addition, $y/x$ takes all positive values, since it tends to $+\infty$ as $\theta$ tends to $0^+$ and vanishes for $\theta = 2\pi$. On the other hand, for any given point $P_* = (x_*, y_*)$, with $x_* > 0$ and $y_* > 0$, there exists only one ray from the origin through $P_*$, namely only one value of $y_*/x_*$. Correspondingly, we have only one value $\theta_* \in (0, 2\pi)$. Once we have determined $\theta_*$, there is only one value of $R$ such that the cycloid contains the point $P_*$.

◦ Comment.♥ If you prefer to use analytical proofs instead of “visual thinking” and want me to be picky, I'll be picky. We have (use Eq. 20.177)

$$\frac{y}{x} = \frac{1 - \cos\theta}{\theta - \sin\theta}. \tag{20.182}$$

Differentiating yields (use the quotient rule, Eq. 9.22)

$$\frac{d}{d\theta}\left(\frac{y}{x}\right) = \frac{\sin\theta\, (\theta - \sin\theta) - (1 - \cos\theta)^2}{(\theta - \sin\theta)^2} = \frac{-2 + 2\cos\theta + \theta \sin\theta}{(\theta - \sin\theta)^2}. \tag{20.183}$$

Next, set $\theta = 2\alpha \in [0, 2\pi]$, so as to have $1 - \cos\theta = 2 \sin^2\alpha$ and $\sin\theta = 2 \cos\alpha \sin\alpha$ (Eqs. 7.62 and 7.64 respectively). For the numerator $N$, we have

$$N = -2\, (1 - \cos\theta) + \theta \sin\theta = -4 \sin\alpha\, (\sin\alpha - \alpha \cos\alpha). \tag{20.184}$$

Note that, for $\alpha = \theta/2 \in (0, \pi)$, the numerator is always negative, because $\sin\alpha > 0$ and $\sin\alpha > \alpha \cos\alpha$. [Hint: To show that $\sin\alpha > \alpha \cos\alpha$: (i) for $\alpha \in (0, \pi/2)$, use $\tan\alpha > \alpha$ (Eq. 6.89), and (ii) for $\alpha \in [\pi/2, \pi)$, use $\cos\alpha \le 0$.] On the other hand, the denominator is always positive. Therefore, the derivative of $y/x$ with respect to $\theta$ is negative in the interval $\theta \in (0, 2\pi)$, and hence $y/x$ is monotonically decreasing there.

Remark 156. For $\theta = \pi$, we have $x = \pi R$ and $y(\theta)$ reaches its maximum value, namely $y = 2R$ (Eq. 20.177), and we have $y/x = 2/\pi$. For values of $y_*/x_*$ greater than $2/\pi$, namely for $\theta_* < \pi$, the point $(x_*, y_*)$ will be located on the ascending portion of the curve (plus sign in Eq. 20.181). For values of $y_*/x_*$ less than $2/\pi$, namely for $\theta_* > \pi$, the point $(x_*, y_*)$ will be located on the descending portion of the curve (minus sign in Eq. 20.181). This resolves the ambiguity of the sign in Eq. 20.181.
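The proof of Theorem 217 is constructive, and may be turned into a small procedure: since $y/x$ decreases monotonically with $\theta$, the value $\theta_*$ can be found by bisection, after which $R$ follows from $y_* = R\, (1 - \cos\theta_*)$. The sketch below does this. Bisection is a choice made here for illustration, the function names are hypothetical, and the search bracket deliberately stops a small distance short of the cusps, where the ratio is ill–conditioned.

```python
import math


def slope_ratio(theta):
    """y/x along the cycloid (Eq. 20.182), with 1 - cos(theta) = 2 sin^2(theta/2)."""
    return 2.0 * math.sin(0.5 * theta) ** 2 / (theta - math.sin(theta))


def cycloid_through(x_star, y_star, tol=1.0e-14):
    """Return (theta_star, R) for the cycloid arc through (x_star, y_star)."""
    if x_star <= 0.0 or y_star <= 0.0:
        raise ValueError("the theorem assumes x_star > 0 and y_star > 0")
    target = y_star / x_star
    lo, hi = 1.0e-3, 2.0 * math.pi - 1.0e-3
    if not slope_ratio(hi) < target < slope_ratio(lo):
        raise ValueError("point too close to one of the axes for this bracket")
    # y/x decreases with theta, so the root lies to the right of mid
    # whenever the ratio at mid is still larger than the target.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope_ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    theta_star = 0.5 * (lo + hi)
    radius = y_star / (1.0 - math.cos(theta_star))
    return theta_star, radius


for point in ((math.pi, 2.0), (1.0, 1.0), (5.0, 0.5)):
    theta_star, radius = cycloid_through(*point)
    branch = "ascending" if theta_star < math.pi else "descending"
    print(f"{point}: theta* = {theta_star:.6f}, R = {radius:.6f} ({branch} portion)")
```

The last column reproduces the criterion of Remark 156: the point lies on the ascending portion of the curve when $y_*/x_* > 2/\pi$.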
20.9.2 Tautochrones
♣
Next, consider the curve introduced with Eq. 20.54, namely the tautochrone, which as mentioned at the beginning of this appendix is a close cousin of a cycloid. To be specific, we have the following$^{11}$

Definition 341 (Tautochrone). Consider a point $P$ located on the boundary of a disk of radius $R$ that is placed just below the line $y = 2R$. The tautochrone is the curve generated by the point $P$, as the disk rolls (counterclockwise) without sliding along such a line. The origin belongs to the tautochrone, so that $x(\theta) = y(\theta) = 0$, for $\theta = 0$, as in Eq. 20.176. [Remark 155, p. 850, applies here as well. Indeed, the assumption regarding the origin is again introduced only for the sake of convenience. If we didn't, tautochrones and upside–down cycloids (see the subsubsection that follows) would refer to the same type of curve.]

Proceeding as we did for the cycloid, taking into account that $\theta$ is now positive when counterclockwise, and imposing the conditions $x(0) = y(0) = 0$ (Eq. 20.176), we obtain that the tautochrone is described by the equation

$$x = R\, (\theta + \sin\theta) \quad\text{and}\quad y = R\, (1 - \cos\theta), \tag{20.185}$$

where again $R$ denotes the radius of the disk. The function $y(x)$ is even and periodic, with period $2\pi R$. The tautochrone with $R = 1$ is depicted in Fig. 20.11.

Fig. 20.11 Tautochrone

11 The term tautochrone (namely the curve that yields “equal times”) stems from (what else!) ancient Greek: ταυτος (tautos, same) and χρονος (chronos, time). It is sometimes referred to as an isochrone, also from ancient Greek: ισος (isos, equal).

Next, consider the differential equation for the explicit representation of the curve $y = y(x)$. Noting that

$$\frac{dx}{d\theta} = R\, (1 + \cos\theta) \quad\text{and}\quad \frac{dy}{d\theta} = R \sin\theta, \tag{20.186}$$
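we have $dy/dx = \sin\theta/(1 + \cos\theta)$, namely (use $A^2 - B^2 = (A + B)(A - B)$, Eq. 6.37)

$$\left(\frac{dy}{dx}\right)^2 = \frac{\sin^2\theta}{(1 + \cos\theta)^2} = \frac{1 - \cos^2\theta}{(1 + \cos\theta)^2} = \frac{1 - \cos\theta}{1 + \cos\theta} = \frac{y}{2R - y}, \tag{20.187}$$

that is,

$$\frac{dy}{dx} = \pm \sqrt{\frac{1}{2R/y - 1}}. \tag{20.188}$$

20.9.3 From cycloids to tautochrones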
♠
Here, for the sake of completeness, we discuss the relationship between cycloids and tautochrones. Specifically, we first address the relationship between cycloids and upside–down cycloids, and then that between upside–down cycloids and tautochrones.

◦ From cycloids to upside–down cycloids. The curve obtained from a cycloid through a reflection about the line $y = R$ is called an upside–down cycloid. Its equation is obtained from the equation of the cycloid, namely $x = R\, (\theta - \sin\theta)$ and $y = R\, (1 - \cos\theta)$ (Eq. 20.177), by applying such a reflection. This operation corresponds to changing the sign of $y$ and then adding $2R$, so as to obtain $y = R\, (1 + \cos\theta)$.

◦ From upside–down cycloids to tautochrones. To go from an upside–down cycloid to a tautochrone, we take two steps. First, we replace $\theta$ with $\theta + \pi$, so as to have $\theta = 0$ where the curve reaches its minimum (Fig. 20.11). This yields $x = R\, (\theta + \pi + \sin\theta)$ and $y = R\, (1 - \cos\theta)$. [Use $\sin(\theta + \pi) = -\sin\theta$ and $\cos(\theta + \pi) = -\cos\theta$ (Eq. 6.69).] Next, we subtract $\pi R$ from $x$, so that the point of minimum (i.e., $\theta = 0$) coincides with the origin (recall that the origin is a point of the tautochrone, by definition). The final result is

$$x = R\, (\theta + \sin\theta) \quad\text{and}\quad y = R\, (1 - \cos\theta), \tag{20.189}$$

in full agreement with Eq. 20.185.
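The chain of transformations just described can be verified numerically. The sketch below builds the tautochrone from the cycloid of Eq. 20.177 (changing the sign of $y$ and adding $2R$, then shifting the parameter by $\pi$, and finally translating by $-\pi R$ along $x$), and compares the result with Eq. 20.185. It also checks the identity in Eq. 20.187. Function names and the sample radius are illustrative assumptions.

```python
import math


def cycloid(theta, radius):
    """Cycloid through the origin, Eq. 20.177."""
    return radius * (theta - math.sin(theta)), radius * (1.0 - math.cos(theta))


def upside_down_cycloid(theta, radius):
    """Change the sign of y and add 2R, namely y -> 2R - y."""
    x, y = cycloid(theta, radius)
    return x, 2.0 * radius - y


def tautochrone_from_cycloid(theta, radius):
    """Replace theta with theta + pi, then subtract pi R from x."""
    x, y = upside_down_cycloid(theta + math.pi, radius)
    return x - math.pi * radius, y


def tautochrone(theta, radius):
    """Tautochrone through the origin, Eq. 20.185."""
    return radius * (theta + math.sin(theta)), radius * (1.0 - math.cos(theta))


def slope_identity_gap(theta, radius):
    """Difference between the two sides of Eq. 20.187."""
    _, y = tautochrone(theta, radius)
    slope = math.sin(theta) / (1.0 + math.cos(theta))
    return slope**2 - y / (2.0 * radius - y)


RADIUS = 1.5
angles = [0.25 * k for k in range(-12, 13)]   # -3.0, ..., 3.0, inside (-pi, pi)

# Largest distance between the transformed cycloid and Eq. 20.185.
worst_mismatch = max(
    max(abs(a - b) for a, b in zip(tautochrone_from_cycloid(t, RADIUS), tautochrone(t, RADIUS)))
    for t in angles
)
worst_slope_gap = max(abs(slope_identity_gap(t, RADIUS)) for t in angles)

print(f"largest mismatch with Eq. 20.185: {worst_mismatch:.1e}")
print(f"largest gap in Eq. 20.187:        {worst_slope_gap:.1e}")
```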
Chapter 21
Dynamics of n-particle systems
In Chapter 11, we addressed the dynamics of n particles, in particular of mass–spring systems, but only for the limited case of one–dimensional motion. On the other hand, in Chapter 17, we have addressed n particles in space, but only for the limited case of statics. Then, in Chapter 20, we addressed three–dimensional dynamics for the limited case of a single particle. In this chapter, we address the dynamics of n-particle systems in space. This consists in using the single–particle formulation for each of the n particles involved and extending to dynamics the validity of the Newton law of action and reaction introduced for statics (Eqs. 17.59 and 17.61).
• Overview of this chapter

In Section 21.1, we extend to n-particle systems the single–particle formulation of Chapter 20. Next, in Section 21.2 we consider an application to the dynamics of mass–spring chains, in particular to a two–particle chain, where we encounter again the phenomenon of beats and uncover a new phenomenon, that of energy transfer. Next, we shift gears, as if taking a U–turn. Specifically, after a brief section to justify the sudden change of subject matter (Section 21.3), we get into the bulk of this chapter. This is done in Sections 21.4, 21.5 and 21.6, where we exploit the Newton third law (which, let me emphasize, applies to statics as well as dynamics) to obtain the theorems regarding, respectively, the evolution equations of momentum, angular momentum, and energy of n-particle systems. In particular, we introduce the so–called center of mass and the equation governing its motion. We also show how the equations of angular momentum and energy may be decomposed into those regarding the motion of the center of mass and those for the motion around the center of mass, namely the motion perceived by an observer in a translating frame of reference with origin at the center of mass (Definition 345, p. 872). Applications of these equations are presented in Section 21.7. [In particular, first we discuss why cats always land on their feet, then we present the dynamics of two unanchored masses connected by a spring, and finally we revisit the Kepler laws, by removing the assumption that one of the two celestial bodies is much heavier than the other.] We conclude the chapter with two appendices. In the first (Section 21.8) we address the dynamics of the Sun–Earth–Moon system. The second (Section 21.9) deals with the potential energy for conservative force fields, for n particles.

© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_21
21.1 Dynamics of n-particle systems

Consider a system of $n$ particles in space. According to the Newton second law (Eq. 20.3), for each particle we have

$$m_h\, \frac{d\mathbf{v}_h}{dt} = \mathbf{f}_h(t, \mathbf{x}_1, \dots, \mathbf{x}_n, \mathbf{v}_1, \dots, \mathbf{v}_n) \qquad (h = 1, \dots, n), \tag{21.1}$$

where $m_h$ is the (constant) mass of the $h$-th particle, $\mathbf{v}_h$ is its velocity, whereas $\mathbf{f}_h$ is given by (Eq. 17.64)

$$\mathbf{f}_h = \mathbf{f}_h^{E} + \mathbf{f}_h^{I}, \quad\text{with}\quad \mathbf{f}_h^{I} = \sum_{k=1}^{n} \mathbf{f}_{hk}, \tag{21.2}$$

where $\mathbf{f}_h^{E}$ is the resultant of all the external forces acting on the $h$-th particle, whereas $\mathbf{f}_h^{I}$ is the resultant of all the internal forces $\mathbf{f}_{hk}$ acting on the $h$-th particle, with $\mathbf{f}_{hk}$ denoting the force that the $k$-th particle exerts on the $h$-th one. [Recall that for simplicity we have set $\mathbf{f}_{hh} = \mathbf{0}$ (Eq. 17.10) and included it in the sum.]

In Eq. 21.1, we state explicitly the fact that, as in the single–particle case, the force vectors $\mathbf{f}_h$ in general depend upon time, as well as the location and velocity of each particle. However, as pointed out in Remark 93, p. 455, none of the forces depends upon the particles' acceleration.
21.2 Dynamics of mass–spring chains

Here, we apply the above formulation for n-particle systems to the specific case of mass–spring chains. In Subsection 17.4.4, we have examined a small–displacement formulation for the statics of spring–particle chains constrained at the endpoints, where the springs have the same stiffness constant $\kappa$ and the same reference length $\ell_R$. For both longitudinal and lateral equilibrium, the (linearized) governing equation is of the type $\mathsf{K}\, \mathsf{x} = \mathsf{f}$ (Eqs. 17.47 and 17.50, respectively), where $\mathsf{x} = \{x_h\}$, with $x_h$ denoting either the longitudinal or the lateral displacement of the $h$-th particle. On the other hand, for both cases $\mathsf{K}$ may be written as $\mathsf{K} = \kappa\, \mathsf{Q}$, where $\mathsf{Q} = \mathsf{Q}^{T}$ is given by Eq. 17.49. [Specifically, for the longitudinal case, the constant $\kappa$ coincides with the spring stiffness (Eq. 17.48), whereas for the lateral case we have $\kappa = T_S/\ell_S$ (Eq. 17.51), where $\ell_S$ and $T_S$ are the spring length and tension in the stretched configuration.]

Here, we extend such a formulation to dynamics. Let us consider a mass–spring chain, where the masses and the spring stiffness coefficients are not all necessarily equal. [In Subsection 17.4.4 we obtained the linearized formulation for the planar case, under the assumption that the masses and the spring stiffness coefficients are all equal.] Using the d'Alembert principle of inertial forces (Eq. 20.58), we obtain immediately that the system of linearized governing equations for the dynamics of $n$ particles, connected by springs and having masses $m_h$ ($h = 1, \dots, n$), is given by

$$m_h\, \ddot x_h + \sum_{j=1}^{n} k_{hj}\, x_j = f_h(t), \tag{21.3}$$

namely

$$\mathsf{M}\, \ddot{\mathsf{x}} + \mathsf{K}\, \mathsf{x} = \mathsf{f}(t). \tag{21.4}$$

If the motion is planar and the masses and the spring stiffness coefficients are all equal, we have that $\mathsf{M} = m\, \mathsf{I}$, whereas $\mathsf{K} = \kappa\, \mathsf{Q}$, with $\mathsf{Q} = \mathsf{Q}^{T}$ given by Eq. 17.49, where, for the longitudinal case, $\kappa$ coincides with the spring stiffness (Eq. 17.48), whereas for the lateral case we have $\kappa = T_S/\ell_S$ (Eq. 17.51), $\ell_S$ and $T_S$ being the spring length and tension in the stretched configuration.

◦ Comment. On the other hand, if we remove the above assumptions, we have that $\mathsf{M} := \mathrm{Diag}[m_h]$ is positive definite, whereas $\mathsf{K} = [k_{hj}]$ is symmetric, as you may verify. Indeed, $\mathsf{K}$ is typically positive definite. [Cases in which $\mathsf{K}$ is not positive definite are addressed in Vol. II, when we study elastic instability.] In addition, the force vector $\mathsf{f}(t) = \{f_h(t)\}$ includes the forces acting on the $h$-th particle that are not due to the springs.
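As an illustration of the statement that $\mathsf{K}$ is symmetric and typically positive definite, the sketch below assembles the stiffness matrix for the longitudinal motion of a chain anchored at both ends, with springs of different stiffness. The assembly rule used here (diagonal entries equal to the sum of the two adjacent spring constants, off–diagonal entries equal to minus the constant of the connecting spring) is an assumption, chosen so that a two–particle chain gives $k_{11} = \kappa_1 + \kappa_C$, $k_{22} = \kappa_2 + \kappa_C$ and $k_{12} = k_{21} = -\kappa_C$. It is not a transcription of Eq. 17.49, and the function names are hypothetical.

```python
def chain_stiffness(spring_constants):
    """Stiffness matrix for n particles held by n + 1 springs in a row.

    spring_constants[j] is the spring between particle j - 1 and particle j,
    where the first and last springs are anchored to the ground.
    """
    n = len(spring_constants) - 1
    k = [[0.0] * n for _ in range(n)]
    for h in range(n):
        k[h][h] = spring_constants[h] + spring_constants[h + 1]
        if h + 1 < n:
            k[h][h + 1] = k[h + 1][h] = -spring_constants[h + 1]
    return k


def quadratic_form(matrix, x):
    """x^T K x."""
    size = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(size) for j in range(size))


def twice_elastic_energy(spring_constants, x):
    """Sum over the springs of (stiffness) * (elongation)^2, with fixed endpoints."""
    padded = [0.0] + list(x) + [0.0]
    return sum(
        kappa * (padded[j + 1] - padded[j]) ** 2
        for j, kappa in enumerate(spring_constants)
    )


springs = [2.0, 0.5, 1.5, 3.0, 1.0]          # five springs, four particles
K = chain_stiffness(springs)
is_symmetric = all(K[i][j] == K[j][i] for i in range(4) for j in range(4))

# x^T K x equals twice the elastic energy, which is positive for any nonzero x.
trial_vectors = ([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.3, -0.7, 0.2, 0.9])
forms = [quadratic_form(K, x) for x in trial_vectors]
energies = [twice_elastic_energy(springs, x) for x in trial_vectors]

print("symmetric:", is_symmetric)
for form, energy in zip(forms, energies):
    print(f"x^T K x = {form:.4f}, twice the elastic energy = {energy:.4f}")
```

Writing $\mathsf{x}^{T} \mathsf{K}\, \mathsf{x}$ as a sum of squares of spring elongations makes the positive definiteness apparent, as long as all the spring constants are positive.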
21.2.1 Two–particle chain. Beats. Energy transfer
♥
Here, we restrict the preceding linearized mass–spring chain formulation to the particular case of the longitudinal motion of two particles connected to each other by a spring having constant $\kappa_C$, and anchored to the ground by springs having constants $\kappa_h$ ($h = 1, 2$). [The system representation is similar to that in Fig. 4.4, p. 153, with only two masses instead of three.] The forces $f_h$ (namely additional applied forces not due to the springs) are assumed to be prescribed functions of time. The governing equations are given by

$$\begin{aligned} m_1\, \ddot u_1 + \kappa_1 u_1 + \kappa_C\, (u_1 - u_2) &= f_1(t), \\ m_2\, \ddot u_2 + \kappa_2 u_2 + \kappa_C\, (u_2 - u_1) &= f_2(t), \end{aligned} \tag{21.5}$$

or

$$\begin{aligned} m_1\, \ddot u_1 + k_1 u_1 - \kappa_C u_2 &= f_1(t), \\ m_2\, \ddot u_2 + k_2 u_2 - \kappa_C u_1 &= f_2(t), \end{aligned} \tag{21.6}$$

where $k_r = \kappa_r + \kappa_C$ ($r = 1, 2$). In matrix form, we have

$$\begin{bmatrix} m_1 & 0 \\ 0 & m_2 \end{bmatrix} \begin{Bmatrix} \ddot u_1 \\ \ddot u_2 \end{Bmatrix} + \begin{bmatrix} k_1 & -\kappa_C \\ -\kappa_C & k_2 \end{bmatrix} \begin{Bmatrix} u_1 \\ u_2 \end{Bmatrix} = \begin{Bmatrix} f_1(t) \\ f_2(t) \end{Bmatrix}. \tag{21.7}$$

[Of course, the above equation is a particular case of Eq. 21.4.]
• Simplified problem

The problem introduced above is of particular interest here, because it allows us to introduce an interesting new phenomenon, that of energy transfer. This is easier to understand when $f_1(t) = f_2(t) = 0$. Accordingly, here we assume this to be the case. In addition, to further simplify our analysis we assume that $m_1 = m_2 =: m$ and $\kappa_1 = \kappa_2 =: \kappa$, so that $k_1 = k_2 = k = \kappa + \kappa_C$. [These restrictions are removed in Subsection 21.2.2.] Then, Eq. 21.6, divided by $m$, yields

$$\begin{aligned} \ddot u_1 + \breve\omega^2 u_1 - \mu\, u_2 &= 0, \\ \ddot u_2 + \breve\omega^2 u_2 - \mu\, u_1 &= 0, \end{aligned} \tag{21.8}$$

where

$$\breve\omega^2 = \frac{k}{m} = \frac{\kappa}{m} + \frac{\kappa_C}{m} = \breve\omega_0^2 + \mu, \tag{21.9}$$
with ω ˘ 02 =
κ m
and
μ=
κC . m
(21.10)
◦ Comment. Note that $\breve\omega_0 = \sqrt{\kappa/m}$ is the frequency of oscillation of the particles in the absence of coupling. [Set $\mu = 0$ in Eqs. 21.8 and 21.9.] On the other hand,

$$
\breve\omega = \sqrt{k/m}
\tag{21.11}
$$

is the frequency of oscillation of one particle when the other is fixed (set $u_2 = 0$ in the first in Eq. 21.8).

To solve the problem in Eq. 21.8, let us introduce an ad hoc approach. Let us introduce two new variables, namely the mean displacement and the semi–difference:

$$
x_1(t) = \tfrac{1}{2}\,[u_1(t) + u_2(t)]
\qquad\text{and}\qquad
x_2(t) = \tfrac{1}{2}\,[u_1(t) - u_2(t)],
\tag{21.12}
$$

so that

$$
u_1(t) = x_1(t) + x_2(t)
\qquad\text{and}\qquad
u_2(t) = x_1(t) - x_2(t).
\tag{21.13}
$$
◦ Comment. It is apparent from Eq. 21.13 that, if the semi–difference vanishes (namely if $x_2(t) = 0$), we have $u_1(t) = u_2(t) = x_1(t)$, and the particles have the same motion. On the other hand, if the mean displacement vanishes (namely if $x_1(t) = 0$), we have $u_1(t) = -u_2(t) = x_2(t)$ and the particles have equal and opposite motion.

Next, consider the semi–sum and the semi–difference of the equations in Eq. 21.8. This yields (use Eq. 21.12)

$$
\begin{aligned}
\ddot x_1 + \omega_1^2\, x_1 &= 0,\\
\ddot x_2 + \omega_2^2\, x_2 &= 0,
\end{aligned}
\tag{21.14}
$$

where (see Eqs. 21.9 and 21.10)

$$
\begin{aligned}
\omega_1^2 &= \breve\omega^2 - \mu = \breve\omega_0^2 = \kappa/m,\\
\omega_2^2 &= \breve\omega^2 + \mu = \breve\omega_0^2 + 2\mu = (\kappa + 2\kappa_C)/m.
\end{aligned}
\tag{21.15}
$$
◦ Comment. This result could have been anticipated from the comment below Eq. 21.13. Indeed, when the particles move with the same motion, namely for $x_2(t) = 0$, the coupling spring is not active, and hence it is as if we had $\mu = 0$, so as to yield $\omega_1 = \breve\omega_0$. On the other hand, when the particles move with equal and opposite motion, namely for $x_1(t) = 0$, the coupling spring produces a force equal to twice that due to the motion of a single particle, with the other fixed (compare Eqs. 21.9 and 21.15).

Note that the two equations in Eq. 21.14 are decoupled. Their solution is given by

$$
x_r(t) = A_r \cos \omega_r t + B_r \sin \omega_r t
\qquad (r = 1, 2).
\tag{21.16}
$$

Finally, substituting Eq. 21.16 into Eq. 21.13, one obtains

$$
\begin{aligned}
u_1(t) &= A_1 \cos \omega_1 t + B_1 \sin \omega_1 t + A_2 \cos \omega_2 t + B_2 \sin \omega_2 t,\\
u_2(t) &= A_1 \cos \omega_1 t + B_1 \sin \omega_1 t - A_2 \cos \omega_2 t - B_2 \sin \omega_2 t.
\end{aligned}
\tag{21.17}
$$
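The decoupling expressed by Eq. 21.14 can be spot-checked numerically. The following is only a minimal sketch: the values of `m`, `kappa` and `kappa_c`, the sample displacements, and all function and variable names are assumptions introduced here for illustration. It evaluates the accelerations from Eq. 21.8 and checks that the semi–sum and the semi–difference satisfy Eq. 21.14, with the frequencies in Eq. 21.15.

```python
import math

def accelerations(u1, u2, omega_breve_sq, mu):
    """Accelerations of the two particles, from Eq. 21.8 solved for the second derivatives."""
    a1 = -omega_breve_sq * u1 + mu * u2
    a2 = -omega_breve_sq * u2 + mu * u1
    return a1, a2

# Assumed illustrative parameters (not taken from the text).
m, kappa, kappa_c = 2.0, 8.0, 1.0
omega0_sq = kappa / m                 # Eq. 21.10
mu = kappa_c / m                      # Eq. 21.10
omega_breve_sq = omega0_sq + mu       # Eq. 21.9
omega1_sq = omega_breve_sq - mu       # Eq. 21.15
omega2_sq = omega_breve_sq + mu       # Eq. 21.15
omega1, omega2 = math.sqrt(omega1_sq), math.sqrt(omega2_sq)

# Arbitrary sample displacements at some instant.
u1, u2 = 0.3, -0.7
a1, a2 = accelerations(u1, u2, omega_breve_sq, mu)

# Mean displacement and semi-difference (Eq. 21.12), and their accelerations.
x1, x2 = 0.5 * (u1 + u2), 0.5 * (u1 - u2)
x1_ddot, x2_ddot = 0.5 * (a1 + a2), 0.5 * (a1 - a2)

# Residuals of Eq. 21.14; both should vanish up to rounding.
residual1 = x1_ddot + omega1_sq * x1
residual2 = x2_ddot + omega2_sq * x2
print(omega1, omega2, residual1, residual2)
```

Any other choice of displacements should give vanishing residuals as well, since the identity is algebraic.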
• Imposing the initial conditions

Next, to obtain $A_1$, $B_1$, $A_2$ and $B_2$, we need to impose the initial conditions

$$
\begin{aligned}
u_1(0) &= u_{10} \quad\text{and}\quad \dot u_1(0) = \dot u_{10},\\
u_2(0) &= u_{20} \quad\text{and}\quad \dot u_2(0) = \dot u_{20},
\end{aligned}
\tag{21.18}
$$

where $u_{10}$, $\dot u_{10}$, $u_{20}$ and $\dot u_{20}$ are prescribed constants. To simplify the discussion of the solution, let us consider a specific set of initial conditions, namely

$$
\begin{aligned}
u_1(0) &= 1 \quad\text{and}\quad \dot u_1(0) = 0,\\
u_2(0) &= 0 \quad\text{and}\quad \dot u_2(0) = 0.
\end{aligned}
\tag{21.19}
$$

The two conditions $\dot u_1(0) = 0$ and $\dot u_2(0) = 0$ yield $\omega_1 B_1 + \omega_2 B_2 = 0$ and $\omega_1 B_1 - \omega_2 B_2 = 0$, namely $B_1 = B_2 = 0$. On the other hand, the conditions $u_1(0) = 1$ and $u_2(0) = 0$ yield

$$
A_1 + A_2 = 1
\qquad\text{and}\qquad
A_1 - A_2 = 0,
\tag{21.20}
$$

namely $A_1 = A_2 = \tfrac{1}{2}$. Accordingly, Eq. 21.17 reduces to

$$
\begin{aligned}
u_1(t) &= \tfrac{1}{2}\,(\cos \omega_1 t + \cos \omega_2 t),\\
u_2(t) &= \tfrac{1}{2}\,(\cos \omega_1 t - \cos \omega_2 t).
\end{aligned}
\tag{21.21}
$$

This solution is discussed below.
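As a cross-check of Eq. 21.21, one may integrate Eq. 21.8 numerically from the initial conditions in Eq. 21.19 and compare with the closed-form expressions. The sketch below uses a classical fourth-order Runge–Kutta scheme, which is a choice made here and not a method used in the text; the parameter values, the step size, and all names are likewise illustrative assumptions.

```python
import math

def closed_form(t, omega1, omega2):
    """Displacements from Eq. 21.21."""
    u1 = 0.5 * (math.cos(omega1 * t) + math.cos(omega2 * t))
    u2 = 0.5 * (math.cos(omega1 * t) - math.cos(omega2 * t))
    return u1, u2

def rhs(state, omega_breve_sq, mu):
    """First-order form of Eq. 21.8; state = (u1, u2, du1/dt, du2/dt)."""
    u1, u2, v1, v2 = state
    return (v1, v2,
            -omega_breve_sq * u1 + mu * u2,
            -omega_breve_sq * u2 + mu * u1)

def rk4_step(state, dt, omega_breve_sq, mu):
    """One classical Runge-Kutta step (numerical scheme assumed for this sketch)."""
    def advance(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = rhs(state, omega_breve_sq, mu)
    k2 = rhs(advance(state, k1, 0.5 * dt), omega_breve_sq, mu)
    k3 = rhs(advance(state, k2, 0.5 * dt), omega_breve_sq, mu)
    k4 = rhs(advance(state, k3, dt), omega_breve_sq, mu)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Assumed illustrative parameters: omega0_sq = kappa/m, mu = kappa_c/m.
omega0_sq, mu = 4.0, 0.5
omega_breve_sq = omega0_sq + mu               # Eq. 21.9
omega1 = math.sqrt(omega_breve_sq - mu)       # Eq. 21.15
omega2 = math.sqrt(omega_breve_sq + mu)       # Eq. 21.15

state = (1.0, 0.0, 0.0, 0.0)                  # initial conditions, Eq. 21.19
dt, n_steps = 1.0e-3, 5000
max_error = 0.0
for step in range(1, n_steps + 1):
    state = rk4_step(state, dt, omega_breve_sq, mu)
    u1_exact, u2_exact = closed_form(step * dt, omega1, omega2)
    max_error = max(max_error,
                    abs(state[0] - u1_exact),
                    abs(state[1] - u2_exact))
print(max_error)
```

With a step this small compared with the periods involved, the discrepancy is expected to be dominated by the truncation error of the integrator, not by any difference between the two descriptions.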
• Beats again

You might have noted that we have expressions similar to those encountered in Section 11.8, for the case of forced oscillations of an undamped harmonic oscillator, when discussing the case of a harmonic force, with forcing frequency close to the natural frequency: $\Omega \approx \omega$. [See, in particular, Eq. 11.70, as well as Fig. 11.10, p. 476.] We saw then that, in such a case, we have a phenomenon that we called “near resonance,” which produces beats. As stated in Remark 102, p. 477, the phenomenon of beats occurs whenever we have two signals that have almost the same frequency. Thus, we have all the reasons to believe that a similar phenomenon may occur here as well, whenever $\kappa_C$ is suitably small (weak coupling). To address this, let us consider the prosthaphaeresis equations (Eqs. 7.69 and 7.70), namely

$$
\begin{aligned}
\cos\alpha + \cos\beta &= 2 \cos\frac{\beta - \alpha}{2}\, \cos\frac{\alpha + \beta}{2},\\
\cos\alpha - \cos\beta &= 2 \sin\frac{\alpha + \beta}{2}\, \sin\frac{\beta - \alpha}{2}.
\end{aligned}
\tag{21.22}
$$

Setting $\alpha = \omega_1 t$ and $\beta = \omega_2 t$, and combining with Eq. 21.21, we have

$$
\begin{aligned}
u_1(t) &= \cos\!\left(\frac{\omega_1 + \omega_2}{2}\, t\right) \cos\!\left(\frac{\omega_2 - \omega_1}{2}\, t\right),\\
u_2(t) &= \sin\!\left(\frac{\omega_1 + \omega_2}{2}\, t\right) \sin\!\left(\frac{\omega_2 - \omega_1}{2}\, t\right).
\end{aligned}
\tag{21.23}
$$

We can interpret Eq. 21.23 as saying that both $u_1(t)$ and $u_2(t)$ oscillate with frequency $\tfrac{1}{2}(\omega_1 + \omega_2)$, and an envelope that varies harmonically, with frequency $\tfrac{1}{2}(\omega_2 - \omega_1)$.

Of particular interest is the case of weak coupling, in which $\mu = \kappa_C/m$ is very small, so as to have $\omega_1 \approx \omega_2$. According to Eq. 21.15, we have $\omega_1 = \breve\omega_0$, whereas (use $\sqrt{1 + x} = 1 + \tfrac{1}{2} x + O[x^2]$, Eq. 13.28)

$$
\omega_2 = \sqrt{\breve\omega_0^2 + 2\mu} = \breve\omega_0 \sqrt{1 + 2\mu/\breve\omega_0^2} \approx \breve\omega_0 + \frac{\mu}{\breve\omega_0}.
\tag{21.24}
$$

Thus, we obtain

$$
\omega_M := \frac{\omega_1 + \omega_2}{2} \approx \breve\omega_0 + \frac{\mu}{2\,\breve\omega_0}
\qquad\text{and}\qquad
\varepsilon := \frac{\omega_2 - \omega_1}{2} \approx \frac{\mu}{2\,\breve\omega_0}.
\tag{21.25}
$$
Therefore, in the case of weak coupling, we can say that u1 (t) and u2 (t) behave like the solution in Section 11.8, where we discussed “near resonance,” when the solution has an envelope amplitude that varies harmonically very slowly. We have again beats.
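To get a feeling for the quality of the weak-coupling approximations in Eqs. 21.24 and 21.25, and to confirm that the product form in Eq. 21.23 coincides with the sum form in Eq. 21.21, one may use a short numerical sketch such as the following. The values of `omega0` and `mu`, the sample times, and all names are assumptions chosen here for illustration.

```python
import math

# Assumed illustrative values: uncoupled frequency and (small) coupling parameter.
omega0 = 2.0
mu = 0.04

omega1 = omega0                               # Eq. 21.15
omega2 = math.sqrt(omega0**2 + 2.0 * mu)      # Eq. 21.15

omega_m = 0.5 * (omega1 + omega2)             # carrier frequency, Eq. 21.25
eps = 0.5 * (omega2 - omega1)                 # envelope frequency, Eq. 21.25

# First-order (weak coupling) approximations, Eq. 21.25.
omega_m_approx = omega0 + mu / (2.0 * omega0)
eps_approx = mu / (2.0 * omega0)

def sum_form(t):
    """Eq. 21.21."""
    return (0.5 * (math.cos(omega1 * t) + math.cos(omega2 * t)),
            0.5 * (math.cos(omega1 * t) - math.cos(omega2 * t)))

def product_form(t):
    """Eq. 21.23."""
    return (math.cos(omega_m * t) * math.cos(eps * t),
            math.sin(omega_m * t) * math.sin(eps * t))

max_mismatch = 0.0
for t in (0.0, 1.3, 17.0, 250.0):
    (s1, s2), (p1, p2) = sum_form(t), product_form(t)
    max_mismatch = max(max_mismatch, abs(s1 - p1), abs(s2 - p2))

# Time between successive instants at which the envelope of u2 vanishes (pi / eps).
envelope_spacing = math.pi / eps
print(eps, eps_approx, omega_m, omega_m_approx, max_mismatch, envelope_spacing)
```

The approximation errors are of second order in $\mu/\breve\omega_0^2$, so they shrink rapidly as the coupling is weakened.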
• Energy transfer

Here, however, we encounter a new phenomenon, i.e., the energy moves back and forth from one particle to the other. In order to see this, note that when $\varepsilon t = k\pi$ ($k = 0, \pm 1, \pm 2, \dots$), we have $\cos \varepsilon t = \pm 1$ and $\sin \varepsilon t = 0$. Therefore, according to Eq. 21.23, for $t_k := k\pi/\varepsilon$ the amplitude of the envelope for $u_1(t)$ is maximum, whereas that for $u_2(t)$ vanishes. Therefore, let us concentrate on the neighborhood of $t_k$, where, neglecting terms of order $(t - t_k)^2$, we have $|\sin \varepsilon t| \approx \varepsilon\, |t - t_k|$ and $|\cos \varepsilon t| \approx 1$, where $\varepsilon = \tfrac{1}{2}(\omega_2 - \omega_1)$ (Eq. 21.25). Hence, Eq. 21.23 yields

$$
|u_2(t)| = |\sin \varepsilon t\, \sin \omega_M t| \approx \varepsilon\, |t - t_k|\, |\sin(\omega_M t)| \le \varepsilon\, |t - t_k|,
\tag{21.26}
$$

where $\omega_M = \tfrac{1}{2}(\omega_1 + \omega_2)$ (Eq. 21.25). Similarly, for the velocity $\dot u_2(t)$, we have

$$
|\dot u_2(t)| = |\varepsilon \cos \varepsilon t\, \sin \omega_M t + \omega_M \sin \varepsilon t\, \cos \omega_M t| \le \varepsilon\, \big(1 + \omega_M |t - t_k|\big).
\tag{21.27}
$$

Correspondingly, in the neighborhood of $t_k = 2k\pi/(\omega_2 - \omega_1)$, the mechanical energy connected with the particle 2, namely

$$
T + U\,(t) := \tfrac{1}{2}\, m\, \dot u_2^2 + \tfrac{1}{2}\, k\, u_2^2,
\tag{21.28}
$$

is of order $\varepsilon^2$. [Similar considerations hold for the particle 1 in the neighborhood of $t_{k+\frac{1}{2}} = (k + \tfrac{1}{2})\pi/\varepsilon$, as you may verify.] What we have discovered is that there is a transfer of energy, from one particle to the other.
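The energy transfer can be made concrete with a small numerical sketch. Under the assumed parameter values below (chosen here for illustration, as are all names), the code evaluates the energy attributed to each particle in the sense of Eq. 21.28 at $t = 0$ and at $t = \pi/(2\varepsilon)$, using Eq. 21.23 and its time derivative. Note that, as in Eq. 21.28, the constant $k = \kappa + \kappa_C$ is used for each particle, so that the two values are not meant to add up exactly to the total energy.

```python
import math

# Assumed illustrative values (weak coupling: kappa_c much smaller than kappa).
m, kappa, kappa_c = 1.0, 4.0, 0.04
k = kappa + kappa_c
omega1 = math.sqrt(kappa / m)                    # Eq. 21.15
omega2 = math.sqrt((kappa + 2.0 * kappa_c) / m)  # Eq. 21.15
omega_m = 0.5 * (omega1 + omega2)                # Eq. 21.25
eps = 0.5 * (omega2 - omega1)                    # Eq. 21.25

def particle_energies(t):
    """Energy attributed to each particle (in the sense of Eq. 21.28), from Eq. 21.23."""
    c_m, s_m = math.cos(omega_m * t), math.sin(omega_m * t)
    c_e, s_e = math.cos(eps * t), math.sin(eps * t)
    u1 = c_m * c_e
    u2 = s_m * s_e
    v1 = -omega_m * s_m * c_e - eps * c_m * s_e   # time derivative of u1
    v2 = omega_m * c_m * s_e + eps * s_m * c_e    # time derivative of u2
    e1 = 0.5 * m * v1**2 + 0.5 * k * u1**2
    e2 = 0.5 * m * v2**2 + 0.5 * k * u2**2
    return e1, e2

e1_start, e2_start = particle_energies(0.0)      # t_0 = 0
t_half = math.pi / (2.0 * eps)                   # t_{1/2} = pi / (2 eps)
e1_half, e2_half = particle_energies(t_half)
print(e1_start, e2_start, e1_half, e2_half)
```

At the initial time all the energy sits with particle 1; half an envelope period later, particle 1 retains only a contribution of order $\varepsilon^2$, whereas particle 2 carries essentially everything.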
• Conservation of energy

Don’t be misled by the above results. It’s true that there is an energy transfer. However, I claim that the total energy remains constant, as we have seen for a single particle (Eq. 20.105). Here, we verify this fact only for the specific problem considered above. [This result is addressed in its generality in Eq. 21.111 for two particles, and in Eq. 21.208 for n particles.] For the conservative system under consideration (Eq. 21.8 multiplied by $m$), the total energy is the sum of the kinetic and elastic energies of the overall system, namely

$$
E(t) := T(t) + U(t) = \tfrac{1}{2}\, m \left(\dot u_1^2 + \dot u_2^2\right) + \tfrac{1}{2}\, \kappa \left(u_1^2 + u_2^2\right) + \tfrac{1}{2}\, \kappa_C \left(u_1 - u_2\right)^2.
\tag{21.29}
$$

◦ Specific initial conditions. Let us begin with the specific initial conditions used above (Eq. 21.19) to obtain the solution in Eq. 21.21. We have

$$
\begin{aligned}
E(t) &= \tfrac{1}{4}\, m \left[\omega_1^2 \sin^2 \omega_1 t + \omega_2^2 \sin^2 \omega_2 t\right] + \tfrac{1}{4}\, \kappa \left[\cos^2 \omega_1 t + \cos^2 \omega_2 t\right] + \tfrac{1}{2}\, \kappa_C \cos^2 \omega_2 t\\
&= \tfrac{1}{4}\, \kappa + \tfrac{1}{4}\, (\kappa + 2 \kappa_C),
\end{aligned}
\tag{21.30}
$$

as you may verify. [Hint: For the first equality in Eq. 21.30, substitute Eq. 21.21 into Eq. 21.29. For the second equality, use $m\, \omega_1^2 = \kappa$ and $m\, \omega_2^2 = \kappa + 2 \kappa_C$ (Eq. 21.15).] Hence, we have

$$
E(t) = \tfrac{1}{2}\, (\kappa + \kappa_C).
\tag{21.31}
$$

In other words, $E(t)$ remains constant in time (in agreement with the above claim), and equal to its value at time $t = 0$, namely $E(0) = \tfrac{1}{2}(\kappa + \kappa_C)$, which is obtained by combining the initial conditions (Eq. 21.19) with Eq. 21.29.

◦ Arbitrary initial conditions. The above result uses the solution corresponding to the specific initial conditions in Eq. 21.19. It should be noted, however, that this result is valid independently of the initial conditions. To show this, we can time differentiate Eq. 21.29, and then use Eq. 21.5, with $m_1 = m_2 = m$ and $\kappa_1 = \kappa_2 = \kappa$, as well as $f_1(t) = f_2(t) = 0$. This yields $dE/dt = 0$, in agreement with Eq. 21.31. [Alternatively, one may multiply the first in Eq. 21.5 by $\dot u_1$ and the second by $\dot u_2$, add, set $f_1(t) = f_2(t) = 0$, and obtain again $dE/dt = 0$.] As we will see, this second approach is nothing but a particular case of the energy conservation theorem for conservative fields, which states that, if all the forces (external and internal) are conservative, the sum of kinetic energy and potential energy remains constant in time. [For two particles, see Eq. 21.111; for n particles, see Eq. 21.208.]
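The constancy of the total energy may also be checked numerically. The sketch below evaluates Eq. 21.29 along the general solution in Eq. 21.17 for one arbitrary set of coefficients, and along the specific solution in Eq. 21.21. The parameter values, the coefficients, the sample times, and all names are assumptions made here for illustration.

```python
import math

# Assumed illustrative parameters (not taken from the text).
m, kappa, kappa_c = 1.5, 3.0, 0.6
omega1 = math.sqrt(kappa / m)                    # Eq. 21.15
omega2 = math.sqrt((kappa + 2.0 * kappa_c) / m)  # Eq. 21.15

def total_energy(t, a1, b1, a2, b2):
    """Total energy (Eq. 21.29) along the general solution (Eq. 21.17)."""
    x1 = a1 * math.cos(omega1 * t) + b1 * math.sin(omega1 * t)
    x2 = a2 * math.cos(omega2 * t) + b2 * math.sin(omega2 * t)
    x1_dot = omega1 * (-a1 * math.sin(omega1 * t) + b1 * math.cos(omega1 * t))
    x2_dot = omega2 * (-a2 * math.sin(omega2 * t) + b2 * math.cos(omega2 * t))
    u1, u2 = x1 + x2, x1 - x2                    # Eq. 21.13
    v1, v2 = x1_dot + x2_dot, x1_dot - x2_dot
    return (0.5 * m * (v1**2 + v2**2)
            + 0.5 * kappa * (u1**2 + u2**2)
            + 0.5 * kappa_c * (u1 - u2)**2)

times = [0.37 * i for i in range(50)]

# Arbitrary coefficients, standing for arbitrary initial conditions.
energies_general = [total_energy(t, 0.4, -0.2, 0.1, 0.3) for t in times]
spread_general = max(energies_general) - min(energies_general)

# Specific initial conditions of Eq. 21.19: A1 = A2 = 1/2, B1 = B2 = 0.
energies_specific = [total_energy(t, 0.5, 0.0, 0.5, 0.0) for t in times]
spread_specific = max(energies_specific) - min(energies_specific)
expected_specific = 0.5 * (kappa + kappa_c)      # Eq. 21.31
print(spread_general, spread_specific, energies_specific[0], expected_specific)
```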
21.2.2 A more general problem
♥
I do not want to mislead you into believing that the assumptions that the masses of the particles and the corresponding spring constants are equal (namely $m_1 = m_2$ and $\kappa_1 = \kappa_2$) are essential for us to be able to find the solution to the problem in Eq. 21.6. They are not. In order to show you this, let me take this opportunity to introduce another ad hoc approach. Specifically, we replace the system of two equations in two unknowns with a single equation in a single unknown, so as to reduce the problem to one that we know how to solve.
To accomplish this, let us begin by adding: (i) the first in Eq. 21.6 multiplied by $k_2$, and (ii) the second derivative of the same equation, this time multiplied by $m_2$. This yields

$$
m_2 \left( m_1 \frac{d^4 u_1}{dt^4} + k_1 \frac{d^2 u_1}{dt^2} - \kappa_C \frac{d^2 u_2}{dt^2} - \frac{d^2 f_1}{dt^2} \right)
+ k_2 \left( m_1 \frac{d^2 u_1}{dt^2} + k_1 u_1 - \kappa_C u_2 - f_1 \right) = 0.
\tag{21.32}
$$

Next, we use $m_2 \ddot u_2 + k_2 u_2 = \kappa_C u_1 + f_2(t)$ (second in Eq. 21.6), so as to eliminate $u_2$ from the above equation. This yields

$$
m_1 m_2 \frac{d^4 u_1}{dt^4} + \left( m_2 k_1 + m_1 k_2 \right) \frac{d^2 u_1}{dt^2} + \left( k_1 k_2 - \kappa_C^2 \right) u_1 = F(t),
\tag{21.33}
$$

where

$$
F(t) := m_2 \ddot f_1(t) + k_2 f_1(t) + \kappa_C f_2(t).
\tag{21.34}
$$

Then, dividing the result by $m_1 m_2$ one obtains

$$
\frac{d^4 u_1}{dt^4} + \left( \breve\omega_1^2 + \breve\omega_2^2 \right) \frac{d^2 u_1}{dt^2} + \left( \breve\omega_1^2\, \breve\omega_2^2 - \breve\mu^2 \right) u_1 = f(t),
\tag{21.35}
$$

where $\breve\omega_r = \sqrt{k_r/m_r}$ ($r = 1, 2$) is the natural frequency of the particle $r$ when the other is fixed (Eq. 21.11), whereas $\breve\mu^2 := \kappa_C^2/(m_1 m_2)$ and

$$
f(t) := \frac{1}{m_1 m_2}\, F(t) = \frac{1}{m_1} \left[ \ddot f_1(t) + \breve\omega_2^2\, f_1(t) + \kappa_C\, f_2(t)/m_2 \right].
\tag{21.36}
$$
In the following, we discuss the solution to Eq. 21.35. For simplicity, we limit again our analysis to the homogeneous problem, namely for $f(t) = 0$. Let us try a solution of the type

$$
u_1(t) = A \cos \omega t + B \sin \omega t.
\tag{21.37}
$$

Combining with Eq. 21.35, with $f(t) = 0$, we obtain that this is indeed a solution to the problem, provided that

$$
\omega^4 - \left( \breve\omega_1^2 + \breve\omega_2^2 \right) \omega^2 + \left( \breve\omega_1^2\, \breve\omega_2^2 - \breve\mu^2 \right) = 0.
\tag{21.38}
$$

This yields (recall the roots of a quadratic equation, Eq. 6.36)

$$
\omega_{1,2}^2 = \frac{\breve\omega_1^2 + \breve\omega_2^2}{2} \pm \sqrt{ \left( \frac{\breve\omega_1^2 + \breve\omega_2^2}{2} \right)^{\!2} - \left( \breve\omega_1^2\, \breve\omega_2^2 - \breve\mu^2 \right) },
\tag{21.39}
$$

namely

$$
\omega_{1,2}^2 = \frac{\breve\omega_1^2 + \breve\omega_2^2}{2} \pm \sqrt{ \left( \frac{\breve\omega_1^2 - \breve\omega_2^2}{2} \right)^{\!2} + \breve\mu^2 }.
\tag{21.40}
$$

[Of course, if $\breve\omega_1 = \breve\omega_2 = \breve\omega$, we recover $\omega_{1,2}^2 = \breve\omega^2 \pm \breve\mu$ (Eq. 21.15).] We have two (real) values for $\omega$, and hence two sets of particular solutions. Thus, the solution is a linear combination of these, namely
$$
u_1(t) = A_1 \cos \omega_1 t + B_1 \sin \omega_1 t + A_2 \cos \omega_2 t + B_2 \sin \omega_2 t,
\tag{21.41}
$$

akin to the first in Eq. 21.17. Next, we may obtain $u_2(t)$ by using the first in Eq. 21.6, with $f_1(t) = 0$. [This time the result will not coincide with the second in Eq. 21.17. We have the same functions, but different coefficients, linear functions nonetheless of $A_k$ and $B_k$.] Then, imposing the initial conditions in Eq. 21.18, we have four equations with four unknowns, which allow us to obtain the solution. [You may try and work out the details. If you do that, you will appreciate the fact that I limited myself to the simpler problem addressed in the preceding subsection. Also, you may verify that if $m_1 = m_2 =: m$ and $\kappa_1 = \kappa_2 =: \kappa$, the solution reduces to that of the preceding subsection.]

◦ Comment. Of course, the considerations regarding beats and energy transfer apply again whenever $\omega_1 \approx \omega_2$, namely when the term under the square root in Eq. 21.40 is small. This requires both: $\breve\mu \approx 0$ (weak coupling) and $\breve\omega_1 \approx \breve\omega_2$.
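For given masses and spring constants, the frequencies in Eq. 21.40 are easily evaluated. The sketch below does this, checks that they satisfy the characteristic equation (Eq. 21.38), and verifies that for equal masses and equal anchoring springs one recovers Eq. 21.15. The function names, the numerical values, and the convention of returning the lower frequency first are assumptions made here for illustration.

```python
import math

def reduced_parameters(m1, m2, kappa1, kappa2, kappa_c):
    """Squared single-particle frequencies and squared coupling parameter of Eq. 21.35."""
    k1, k2 = kappa1 + kappa_c, kappa2 + kappa_c
    return k1 / m1, k2 / m2, kappa_c**2 / (m1 * m2)

def natural_frequencies(m1, m2, kappa1, kappa2, kappa_c):
    """Frequencies from Eq. 21.40, lower one first (ordering chosen for this sketch)."""
    w1_sq, w2_sq, mu_sq = reduced_parameters(m1, m2, kappa1, kappa2, kappa_c)
    mean = 0.5 * (w1_sq + w2_sq)
    root = math.sqrt((0.5 * (w1_sq - w2_sq))**2 + mu_sq)
    return math.sqrt(mean - root), math.sqrt(mean + root)

def characteristic(omega, m1, m2, kappa1, kappa2, kappa_c):
    """Left-hand side of Eq. 21.38."""
    w1_sq, w2_sq, mu_sq = reduced_parameters(m1, m2, kappa1, kappa2, kappa_c)
    return omega**4 - (w1_sq + w2_sq) * omega**2 + (w1_sq * w2_sq - mu_sq)

# Assumed illustrative data for the general case.
general = (1.0, 2.0, 3.0, 5.0, 0.5)   # m1, m2, kappa1, kappa2, kappa_c
omega_low, omega_high = natural_frequencies(*general)
residuals = [characteristic(w, *general) for w in (omega_low, omega_high)]

# Equal masses and equal anchoring springs: compare with Eq. 21.15.
m, kappa, kappa_c = 1.0, 4.0, 0.5
omega_low_eq, omega_high_eq = natural_frequencies(m, m, kappa, kappa, kappa_c)
print(omega_low, omega_high, residuals, omega_low_eq, omega_high_eq)
```

For positive spring constants one has $k_1 k_2 > \kappa_C^2$, so that both roots for $\omega^2$ are positive and the square roots in the sketch are well defined.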
21.3 What comes next?

Equation 21.4 is a specific example of systems of linear ordinary second–order differential equations with constant coefficients. We have seen two methods (namely those based upon the use of Eqs. 21.14 and 21.35, respectively) to find the solution of a specific system, namely that in Eq. 21.7. Question: “Can we apply either methodology to the general (linearized) problem in Eq. 21.4, namely to dynamical systems with an arbitrary number of particles?” Answer: “No!” Question: “Are there methods of solution for this problem?” Answer: “Yes!” Can I present them to you now? “No!” We have not yet developed the mathematical tools for solving equations of the type given in Eq. 21.4. This requires some basic notions from linear algebra, specifically those regarding eigenvalues and eigenvectors, namely the eigenvalue problem in linear algebra, which is barely addressed in this volume (Subsection 16.3.4). [The general methodology for solving systems of second–order linear ordinary differential equations with constant coefficients is addressed in Vol. II.]

The above comments are presented expressly to give you a reason why in the rest of this chapter we do not address general methods of solution for systems of differential equations. Nonetheless, there are some additional problems that we can address even without introducing such a general methodology. However, before doing this, we have to develop additional know–how regarding some important properties of n-particle systems. In particular, in Sections 21.4, 21.5 and 21.6, we introduce, respectively, the equations of momentum, angular momentum, and energy for n particles, along with some of their consequences. Specifically, we introduce the definitions of momentum, angular momentum and energy for the overall system (rather than those for an individual particle), and then obtain the governing equations for these quantities. [These results will be especially useful in dealing with rigid bodies (Chapter 23).] Then, we will be in a position to discuss some applications, presented in Section 21.7, along with the dynamics of the Sun–Earth–Moon system (Section 21.8).
21.4 Momentum

Here, we define the momentum for n-particle systems and introduce some of its properties including its governing equation.

Remark 157 (The Newton third law in dynamics). It should be emphasized that, as noted at the beginning of this chapter, crucial for the results of this section (and the two that follow) is the fact that, as experimental evidence indicates, the Newton third law (action and reaction, Eqs. 17.59 and 17.61), introduced in Chapter 17 for statics (used in particular in introducing resultant forces and resultant moments, Eqs. 17.67 and 17.74), is still valid for dynamics. Accordingly, many of the results presented here are based upon the theorems regarding resultant forces and resultant moments, presented in Chapter 17, on statics, since these were obtained by using solely the Newton law of action and reaction.

We have the following

Definition 342 (Momentum of particle systems). Consider a system of $n$ particles, having mass $m_1, \dots, m_n$. The quantity $\mathbf p_k = m_k \mathbf v_k$ is called the momentum of the particle $k$. The quantity

$$
\mathbf p := \sum_{k=1}^{n} \mathbf p_k = \sum_{k=1}^{n} m_k \mathbf v_k
\tag{21.42}
$$

is called the momentum of the n-particle system.

We have the following (see Remark 157 above)

Theorem 218 (Momentum equation). The time derivative of the momentum of the n-particle system equals the resultant force of the external forces:

$$
\frac{d\mathbf p}{dt} = \mathbf f_E.
\tag{21.43}
$$
Equation 21.43 will be referred to as the momentum equation for an n-particle system.

◦ Proof : Recalling that $m_k$ is time–independent, summing $m_k \dot{\mathbf v}_k = \mathbf f_k$ (Eq. 21.1) over $k$ and using Eq. 21.42, as well as the resultant force theorem, $\sum_{k=1}^{n} \mathbf f_k = \mathbf f_E$ (Eq. 17.67), one obtains

$$
\frac{d\mathbf p}{dt} = \sum_{k=1}^{n} m_k \frac{d\mathbf v_k}{dt} = \sum_{k=1}^{n} \mathbf f_k = \mathbf f_E,
\tag{21.44}
$$

in agreement with Eq. 21.43.
Finally, we have the following (a generalization, from one particle to an n-particle system, of the Newton first law, Principle 5, p. 795)

Theorem 219 (Conservation of momentum). In the absence of external forces, namely if $\mathbf f_E = \mathbf 0$, the momentum of the n-particle system is conserved, namely

$$
\mathbf p(t) = \mathbf p(t_0).
\tag{21.45}
$$
◦ Proof : If fE = 0, Eq. 21.43 reduces to dp/dt = 0, which integrated between t0 and t yields p(t) − p(t0 ) = 0, namely Eq. 21.45.
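Conservation of momentum can be illustrated with a simple numerical experiment: two particles moving along a line, subject only to the internal force of a spring connecting them. The sketch below is an illustration under stated assumptions: the linear spring model, the semi-implicit Euler time stepping, the parameter values, and all names are choices made here, not taken from the text. Since the two internal forces are equal and opposite at every step, the discrete momentum stays constant up to rounding.

```python
def simulate(m1, m2, kappa_c, rest_length, x1, x2, v1, v2, dt, n_steps):
    """Two particles on a line coupled by a spring; no external forces (assumed model)."""
    for _ in range(n_steps):
        stretch = (x2 - x1) - rest_length
        f12 = kappa_c * stretch        # force on particle 1 due to particle 2
        f21 = -f12                     # action and reaction
        v1 += dt * f12 / m1            # velocities first (semi-implicit Euler)
        v2 += dt * f21 / m2
        x1 += dt * v1                  # then positions, with the updated velocities
        x2 += dt * v2
    return x1, x2, v1, v2

# Assumed illustrative data.
m1, m2 = 1.0, 3.0
v1_start, v2_start = 0.5, -0.1
p_start = m1 * v1_start + m2 * v2_start          # Eq. 21.42 at the initial time

x1_end, x2_end, v1_end, v2_end = simulate(
    m1, m2, kappa_c=4.0, rest_length=1.0,
    x1=0.0, x2=1.3, v1=v1_start, v2=v2_start,
    dt=1.0e-3, n_steps=10000)
p_end = m1 * v1_end + m2 * v2_end                # Eq. 21.42 at the final time
print(p_start, p_end)
```

The individual velocities change considerably during the motion; only their mass-weighted sum is preserved.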
21.4.1 Center of mass

In order to give a gut–level interpretation of the momentum equation, it is convenient to introduce the following

Definition 343 (Center of mass). Consider $n$ particles located at $\mathbf x_k$ and having mass $m_k$ ($k = 1, \dots, n$). The center of mass is a point $G$ identified by the vector

$$
\mathbf x_G = \frac{1}{m} \sum_{k=1}^{n} m_k \mathbf x_k,
\tag{21.46}
$$

where

$$
m = \sum_{k=1}^{n} m_k
\tag{21.47}
$$
is the total mass of the system.

◦ Warning. A symbol often used to denote the center of mass is $\mathbf R$ (see for instance Refs. [37] and [61]). This is not possible here, because, as mentioned in Definition 237, p. 596, in this book capital boldface letters are reserved for tensors, a notion that will be introduced in Section 23.9. In its place, I use the symbol $\mathbf x_G$. On the other hand, the point identified by the vector $\mathbf x_G$ is denoted by $G$.

Remark 158 (Barycenter vs center of mass. G vs G). In view of the fact that the weight of a particle equals the mass times the acceleration of gravity, $w_k = m_k\, g$ (Eq. 20.175), comparing the definition of center of mass (Eq. 21.46), with the definition of barycenter, $\mathbf x_G = \sum_{k=1}^{n} m_k \mathbf x_k/m$ (Eq. 17.81), we have that the center of mass coincides with the barycenter (Definition 285, p. 698, where we assumed $g$ to be constant). Thus, the moment, with respect to the center of mass, of the weights of an n-particle system vanishes (Eq. 17.84), provided of course that $g$ may be treated as constant. Nonetheless, it is important to keep the two notions distinguished. Indeed, they correspond to conceptually different phenomena. Again, the barycenter is the point with respect to which the moment of the gravity forces vanishes, provided that $g$ may be treated as constant. The center of mass, on the other hand, is crucial in uncovering some secrets hidden in the equations of momentum, angular momentum and energy. That said, in order to emphasize the fact that the corresponding mathematical expressions coincide, I will use the symbol $G$ for both barycenter and center of mass, and $\mathbf x_G$ for their location. [On the other hand, as mentioned in Footnote 1, p. 803, the gravitational constant is denoted by a differently typeset G, to avoid confusion with the symbol $G$ used for the center of mass.]

Differentiating Eq. 21.46, noting that $m$ and $m_k$ are constant, we have

$$
\mathbf v_G := \frac{d\mathbf x_G}{dt} = \frac{1}{m} \sum_{k=1}^{n} m_k \mathbf v_k.
\tag{21.48}
$$

Accordingly, the definition of $\mathbf p$ (Eq. 21.42) may be written as

$$
\mathbf p = \sum_{k=1}^{n} m_k \mathbf v_k = m\, \mathbf v_G.
\tag{21.49}
$$
Thus, we have the following

Theorem 220 (Dynamics of the center of mass). For an n-particle system, we have

$$
m\, \frac{d\mathbf v_G}{dt} = \mathbf f_E,
\tag{21.50}
$$

namely the center of mass $G$ moves like a particle of mass $m$ subject to the resultant of the external forces.

◦ Proof : Combine $\dot{\mathbf p} = \mathbf f_E$ (Eq. 21.43) with Eq. 21.49.
Accordingly, if the resultant of the external forces vanishes, we have

$$
\mathbf v_G(t) = \mathbf v_G(t_0),
\tag{21.51}
$$

namely the velocity of the center of mass remains constant with time. This is simply a different way to express the theorem of conservation of momentum (Eq. 21.45).

◦ Comment. For future reference, note that Eqs. 21.46 and 21.47 are equivalent to

$$
\sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) = \mathbf 0.
\tag{21.52}
$$

This equation implies

$$
\sum_{k=1}^{n} m_k\, (\mathbf v_k - \mathbf v_G) = \mathbf 0.
\tag{21.53}
$$

In other words, for an observer in a translating (not necessarily uniformly) frame of reference with origin at the center of mass $\mathbf x_G$, the corresponding momentum of the n-particle system vanishes.

Remark 159. Note that, even if in pure translation (i.e., no rotation), a frame of reference with origin at the center of mass is not necessarily inertial. It is inertial if, and only if, the center of mass is in uniform rectilinear motion, namely when the resultant force vanishes.
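The relations involving the center of mass (Eqs. 21.46–21.49, 21.52 and 21.53) are purely algebraic, and may be checked on any set of data. The following sketch does so for three particles; the masses, positions, velocities, and all names are arbitrary assumptions made here for illustration.

```python
# Assumed illustrative data: three particles in three dimensions.
masses = [1.0, 2.0, 3.5]
positions = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.0), (-2.0, 0.3, 1.0)]
velocities = [(0.2, 0.0, -1.0), (-0.4, 0.7, 0.1), (0.0, -0.3, 0.5)]

def mass_weighted_mean(masses, vectors):
    """Mass-weighted average of a list of vectors (Eqs. 21.46 and 21.48)."""
    total = sum(masses)
    return tuple(sum(m * v[i] for m, v in zip(masses, vectors)) / total
                 for i in range(3))

total_mass = sum(masses)                          # Eq. 21.47
x_g = mass_weighted_mean(masses, positions)       # Eq. 21.46
v_g = mass_weighted_mean(masses, velocities)      # Eq. 21.48

# Momentum of the system (Eq. 21.42), to be compared with m v_G (Eq. 21.49).
momentum = tuple(sum(m * v[i] for m, v in zip(masses, velocities))
                 for i in range(3))

# Left-hand sides of Eqs. 21.52 and 21.53; both should vanish up to rounding.
first_moment = tuple(sum(m * (x[i] - x_g[i]) for m, x in zip(masses, positions))
                     for i in range(3))
relative_momentum = tuple(sum(m * (v[i] - v_g[i]) for m, v in zip(masses, velocities))
                          for i in range(3))
print(x_g, v_g, momentum, first_moment, relative_momentum)
```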
21.5 Angular momentum

Consider a particle placed at $\mathbf x_k$, and having mass $m_k$ and velocity $\mathbf v_k$. Recall that its angular momentum, $\mathbf h_{kO}$, with respect to a point $\mathbf x_O$, was defined by $\mathbf h_{kO} := m_k\, (\mathbf x_k - \mathbf x_O) \times \mathbf v_k$ (Eq. 20.66). Accordingly, we have the following

Definition 344 (Angular momentum of an n-particle system). Consider an n-particle system. Its angular momentum, $\mathbf h_O$, with respect to a point $\mathbf x_O$, is defined as the sum of the individual angular momenta with respect to the same point $\mathbf x_O$, namely

$$
\mathbf h_O := \sum_{k=1}^{n} \mathbf h_{kO} = \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \mathbf v_k.
\tag{21.54}
$$

In this section, we want to obtain the governing equation for the angular momentum $\mathbf h_O$ of a particle system. To this end, we exploit the Newton third law, namely that $\mathbf f_{hk}$ and $\mathbf f_{kh}$ are equal and opposite and have the same line of action, which coincides with the line connecting the two particles. Correspondingly, we have $\mathbf m_{hkO} + \mathbf m_{khO} = \mathbf 0$ (Eq. 17.61). Accordingly, we have the following

Theorem 221 (Angular momentum equation). For an n-particle system, we have

$$
\frac{d\mathbf h_O}{dt} = \mathbf m_O^E + m\, \mathbf v_G \times \mathbf v_O,
\tag{21.55}
$$

where (Eq. 17.75)

$$
\mathbf m_O^E := \sum_{k=1}^{n} (\mathbf x_k - \mathbf x_O) \times \mathbf f_k^E
\tag{21.56}
$$
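is the resultant of the individual external moments with respect to $\mathbf x_O$. Equation 21.55 will be referred to as the angular momentum theorem.

◦ Proof : Using Eq. 21.54, one obtains

$$
\begin{aligned}
\frac{d\mathbf h_O}{dt} &= \frac{d}{dt} \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \mathbf v_k\\
&= \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \frac{d\mathbf v_k}{dt} + \sum_{k=1}^{n} m_k\, (\mathbf v_k - \mathbf v_O) \times \mathbf v_k\\
&= \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \frac{d\mathbf v_k}{dt} + m\, \mathbf v_G \times \mathbf v_O.
\end{aligned}
\tag{21.57}
$$

[Hint: For the last equality, use $\mathbf a \times \mathbf a = \mathbf 0$ (Eq. 15.61), as well as $m\, \mathbf v_G = \sum_{k=1}^{n} m_k \mathbf v_k$ (Eq. 21.48).] On the other hand, using $m_k \dot{\mathbf v}_k = \mathbf f_k$ (Eq. 21.1) and $\mathbf m_O^E = \sum_{k=1}^{n} (\mathbf x_k - \mathbf x_O) \times \mathbf f_k$ (Eq. 17.74), one obtains

$$
\sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \frac{d\mathbf v_k}{dt} = \sum_{k=1}^{n} (\mathbf x_k - \mathbf x_O) \times \mathbf f_k = \mathbf m_O^E.
\tag{21.58}
$$

Finally, combining Eqs. 21.58 and 21.57 yields Eq. 21.55.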
◦ Comment. As pointed out in Remark 157, p. 866, the Newton law of action and reaction is essential for the last equality in Eq. 21.58.

Note that, if $\mathbf v_G \times \mathbf v_O = \mathbf 0$, Eq. 21.55 reduces to

$$
\frac{d\mathbf h_O}{dt} = \mathbf m_O^E,
\tag{21.59}
$$

which is the form more frequently used in the applications. The possibility that $\mathbf v_G \times \mathbf v_O = \mathbf 0$ occurs whenever $\mathbf v_G$ and $\mathbf v_O$ are parallel (in particular, when one vanishes, Remark 127, p. 607). Accordingly, Eq. 21.59 applies:

1. If $\mathbf v_O = \mathbf 0$, namely if $\mathbf x_O$ is fixed, or
2. If $\mathbf x_O$ coincides with $\mathbf x_G$.

In the latter case, we write Eq. 21.59 as

$$
\frac{d\mathbf h_G}{dt} = \mathbf m_G^E,
\tag{21.60}
$$

where

$$
\mathbf h_G := \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times \mathbf v_k
\tag{21.61}
$$

is the angular momentum with respect to the center of mass $\mathbf x_G$, whereas

$$
\mathbf m_G^E := \sum_{k=1}^{n} (\mathbf x_k - \mathbf x_G) \times \mathbf f_k^E
\tag{21.62}
$$

is the resultant moment of the external forces, also with respect to $\mathbf x_G$.

Finally, we have the following

Theorem 222 (Conservation of angular momentum). In the absence of an applied resultant moment, namely for $\mathbf m_O^E = \mathbf 0$, if $\mathbf v_G \times \mathbf v_O = \mathbf 0$ we have that the angular momentum is conserved, namely

$$
\mathbf h_O(t) = \mathbf h_O(t_0).
\tag{21.63}
$$
◦ Proof : Equation 21.59 reduces to dhO /dt = 0, which integrated between t0 and t yields hO (t) − hO (t0 ) = 0, in agreement with Eq. 21.63.
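Conservation of angular momentum may be illustrated with two particles moving in a plane, connected by a spring and free of external forces, with $\mathbf x_O$ taken as a fixed point (so that $\mathbf v_O = \mathbf 0$ and Eq. 21.59 applies). The sketch below is an illustration under stated assumptions: the linear spring model, the semi-implicit Euler time stepping, the numerical values, and all names are choices made here. Because the internal forces are equal and opposite and act along the line connecting the particles, the discrete angular momentum is preserved up to rounding.

```python
import math

def cross_z(a, b):
    """z-component of the cross product of two planar vectors."""
    return a[0] * b[1] - a[1] * b[0]

def angular_momentum_z(masses, positions, velocities, origin=(0.0, 0.0)):
    """Planar version of Eq. 21.54, with respect to a fixed point."""
    return sum(m * cross_z((x[0] - origin[0], x[1] - origin[1]), v)
               for m, x, v in zip(masses, positions, velocities))

def simulate(masses, positions, velocities, kappa_c, rest_length, dt, n_steps):
    """Two particles coupled by a spring; no external forces (assumed model)."""
    m1, m2 = masses
    x1, x2 = list(positions[0]), list(positions[1])
    v1, v2 = list(velocities[0]), list(velocities[1])
    for _ in range(n_steps):
        dx, dy = x2[0] - x1[0], x2[1] - x1[1]
        distance = math.hypot(dx, dy)
        magnitude = kappa_c * (distance - rest_length)
        # Force on particle 1 due to particle 2, along the connecting line.
        f12 = (magnitude * dx / distance, magnitude * dy / distance)
        for i in range(2):
            v1[i] += dt * f12[i] / m1
            v2[i] -= dt * f12[i] / m2      # action and reaction
        for i in range(2):
            x1[i] += dt * v1[i]
            x2[i] += dt * v2[i]
    return [tuple(x1), tuple(x2)], [tuple(v1), tuple(v2)]

# Assumed illustrative data.
masses = (1.0, 2.0)
positions_start = [(0.0, 0.0), (1.0, 0.0)]
velocities_start = [(0.0, -0.3), (0.0, 0.4)]

h_start = angular_momentum_z(masses, positions_start, velocities_start)
positions_end, velocities_end = simulate(
    masses, positions_start, velocities_start,
    kappa_c=5.0, rest_length=1.0, dt=1.0e-3, n_steps=5000)
h_end = angular_momentum_z(masses, positions_end, velocities_end)
print(h_start, h_end)
```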
21.5.1 Useful expressions for angular momentum

Let us introduce the following definitions:

Definition 345 (Motion around the center of mass). The motion around the center of mass is, by definition, the motion of the $n$ particles, as seen by an observer in a translating frame of reference with origin at the center of mass $\mathbf x_G$. [The translation is not necessarily uniform. Hence, the frame of reference is not necessarily inertial (Remark 159, p. 869).]

Definition 346 (Angular momentum of the motion around $\mathbf x_G$). Given an n-particle system, the angular momentum with respect to $\mathbf x_G$ of the motion around $\mathbf x_G$, denoted by $\check{\mathbf h}_G$, is defined by

$$
\check{\mathbf h}_G := \sum_{k=1}^{n} m_k\, \check{\mathbf x}_k \times \check{\mathbf v}_k,
\tag{21.64}
$$

where

$$
\check{\mathbf x}_k := \mathbf x_k - \mathbf x_G
\qquad\text{and}\qquad
\check{\mathbf v}_k = \mathbf v_k - \mathbf v_G.
\tag{21.65}
$$

Then, one obtains the following

Theorem 223. We have

$$
\mathbf h_G = \check{\mathbf h}_G.
\tag{21.66}
$$

◦ Proof : Recall the definitions $\mathbf h_G := \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times \mathbf v_k$ (Eq. 21.61) and $\check{\mathbf h}_G := \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times (\mathbf v_k - \mathbf v_G)$ (Eq. 21.64). Then, we have

$$
\mathbf h_G - \check{\mathbf h}_G = \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times \mathbf v_G = \mathbf 0,
\tag{21.67}
$$

as in Eq. 21.66. [Hint: Use $\sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) = \mathbf 0$ (Eq. 21.52).]

Next, combining $d\mathbf h_G/dt = \mathbf m_G^E$ (Eq. 21.60) with Eq. 21.66, we have

$$
\frac{d\check{\mathbf h}_G}{dt} = \mathbf m_G^E.
\tag{21.68}
$$
In plain language, we have that the time derivative of the angular momentum with respect to $\mathbf x_G$ of the motion around $\mathbf x_G$ equals the resultant of the external moments with respect to $\mathbf x_G$.

Finally, we want to show the following

Theorem 224 (From $\mathbf h_G$ to $\mathbf h_O$). The relationship between: (i) the angular momentum with respect to any point $\mathbf x_O$ and (ii) that with respect to $\mathbf x_G$ is given by

$$
\mathbf h_O = \mathbf h_G + m\, (\mathbf x_G - \mathbf x_O) \times \mathbf v_G.
\tag{21.69}
$$

In plain language, the angular momentum with respect to $\mathbf x_O$ is the sum of two terms: (i) the angular momentum with respect to $\mathbf x_G$ and (ii) the angular momentum with respect to $\mathbf x_O$ of the mass $m$, placed at $\mathbf x_G$ with velocity $\mathbf v_G$.

◦ Proof : Indeed, recalling that $\mathbf h_O = \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_O) \times \mathbf v_k$ (Eq. 21.54), we have

$$
\mathbf h_O = \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times \mathbf v_k + (\mathbf x_G - \mathbf x_O) \times \sum_{k=1}^{n} m_k \mathbf v_k.
\tag{21.70}
$$

Then, using $m\, \mathbf v_G = \sum_{k=1}^{n} m_k \mathbf v_k$ (Eq. 21.48) and $\mathbf h_G = \sum_{k=1}^{n} m_k\, (\mathbf x_k - \mathbf x_G) \times \mathbf v_k$ (Eq. 21.61), one obtains Eq. 21.69.
21.6 Energy

In analogy with what we did in Subsection 11.9.1 for the one–dimensional case (Eq. 20.85), we have the following definitions:

Definition 347 (Kinetic energy of n particles). The kinetic energy of an n-particle system is defined as the sum of the individual kinetic energies $\tfrac{1}{2} m_k v_k^2$, namely

$$
T := \sum_{h=1}^{n} \tfrac{1}{2}\, m_h v_h^2.
\tag{21.71}
$$

Definition 348 (Power for n particles). The power for an n-particle system is the sum of the powers generated by each particle through its motion (Eq. 20.86), namely

$$
P = \sum_{k=1}^{n} \mathbf f_k \cdot \mathbf v_k.
\tag{21.72}
$$
Then, we have the following

Theorem 225. Consider an n-particle system. The time derivative of the kinetic energy $T$ equals the power generated by all the forces, be they external or internal, namely

$$
\frac{dT}{dt} = P_E + P_I,
\tag{21.73}
$$

where $P_E$ and $P_I$ denote the power generated, respectively, by the external and internal forces, namely

$$
P_E := \sum_{h=1}^{n} \mathbf f_h^E \cdot \mathbf v_h,
\tag{21.74}
$$

$$
P_I := \sum_{h=1}^{n} \mathbf f_h^I \cdot \mathbf v_h = \sum_{h,k=1}^{n} \mathbf f_{hk} \cdot \mathbf v_h = \frac{1}{2} \sum_{h,k=1}^{n} \mathbf f_{hk} \cdot (\mathbf v_h - \mathbf v_k),
\tag{21.75}
$$

where $\mathbf f_h^I = \sum_{k=1}^{n} \mathbf f_{hk}$ is the resultant force of the internal forces acting on the $h$-th particle.

◦ Proof : Dotting $m_h \dot{\mathbf v}_h = \mathbf f_h = \mathbf f_h^E + \mathbf f_h^I$ (Eqs. 21.1 and 21.2) with $\mathbf v_h$, and summing over all the particles, one obtains

$$
\sum_{h=1}^{n} m_h \frac{d\mathbf v_h}{dt} \cdot \mathbf v_h = \sum_{h=1}^{n} \mathbf f_h^E \cdot \mathbf v_h + \sum_{h=1}^{n} \mathbf f_h^I \cdot \mathbf v_h.
\tag{21.76}
$$

Next, note that (use Eq. 21.71)

$$
\sum_{h=1}^{n} m_h \mathbf v_h \cdot \frac{d\mathbf v_h}{dt} = \frac{dT}{dt}.
\tag{21.77}
$$

Combining Eqs. 21.74–21.77 yields Eq. 21.73. [Hint: For the second equality in Eq. 21.75, use $\mathbf f_h^I = \sum_{k=1}^{n} \mathbf f_{hk}$ (Eq. 21.2). For the third equality in Eq. 21.75, let us interchange the indices $h$ and $k$, which is legitimate, because the dummy indices of summation may be renamed arbitrarily (Eq. 3.23). Then, take the semi–sum of the two resulting equivalent expressions, and use $\mathbf f_{kh} = -\mathbf f_{hk}$ (Eq. 17.59).]
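The third equality in Eq. 21.75 relies only on $\mathbf f_{kh} = -\mathbf f_{hk}$, and may be checked numerically on arbitrary data. In the sketch below, the internal forces are generated by linear attractive springs with a symmetric stiffness table; this force model, the numerical values, and all names are assumptions introduced here for illustration.

```python
def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))

def sub(a, b):
    """Difference of two vectors given as tuples."""
    return tuple(p - q for p, q in zip(a, b))

# Assumed illustrative data: three particles in three dimensions.
positions = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (-0.5, 1.0, 2.0)]
velocities = [(0.1, 0.0, -0.2), (0.0, 0.3, 0.4), (-0.6, 0.2, 0.0)]
# Symmetric stiffness table, so that f_kh = -f_hk; zero diagonal (no self-force).
stiffness = [[0.0, 2.0, 0.5],
             [2.0, 0.0, 1.5],
             [0.5, 1.5, 0.0]]
n = len(positions)

def internal_force(h, k):
    """Force on particle h due to particle k (attractive linear spring, assumed model)."""
    return tuple(-stiffness[h][k] * c for c in sub(positions[h], positions[k]))

# Second expression in Eq. 21.75: sum over h and k of f_hk . v_h.
power_direct = sum(dot(internal_force(h, k), velocities[h])
                   for h in range(n) for k in range(n))

# Third expression in Eq. 21.75: half the sum of f_hk . (v_h - v_k).
power_symmetrized = 0.5 * sum(
    dot(internal_force(h, k), sub(velocities[h], velocities[k]))
    for h in range(n) for k in range(n))
print(power_direct, power_symmetrized)
```

The symmetrized form makes it apparent that the internal forces generate power only through the relative motion of the particles.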
• Work

Integrating Eq. 21.73 with respect to time, one obtains

$$
T(t_b) - T(t_a) = W_E(t_a, t_b) + W_I(t_a, t_b),
\tag{21.78}
$$

where (use Eqs. 21.74 and 21.75)

$$
W_E(t_a, t_b) = \int_{t_a}^{t_b} P_E\, dt = \sum_{h=1}^{n} \int_{L_h(t_a, t_b)} \mathbf f_h^E \cdot d\mathbf x_h
\tag{21.79}
$$

and

$$
W_I(t_a, t_b) = \int_{t_a}^{t_b} P_I\, dt = \sum_{h=1}^{n} \int_{L_h(t_a, t_b)} \mathbf f_h^I \cdot d\mathbf x_h
\tag{21.80}
$$

are respectively the work performed by the external and internal forces between $t_a$ and $t_b$. [Here, $L_h(t_a, t_b)$ denotes the path covered by the $h$-th particle between $t_a$ and $t_b$, namely to go from $\mathbf x_h(t_a)$ to $\mathbf x_h(t_b)$.] In plain language, the work done by all the forces (external and internal) equals the change in kinetic energy.
21.6.1 Kinetic energy. The König theorem

Recall that the motion around the center of mass of an n-particle system is, by definition, the motion of the particles as seen by an observer in a translating frame of reference (no rotation) with origin at the center of mass (Definition 345, p. 872). Correspondingly, let us introduce the following definitions:

Definition 349 (Kinetic energy of the center of mass). Given an n-particle system, consider a (fictitious) particle that has the total mass of the system and is placed at the center of mass, so that its velocity is equal to that of the center of mass. The kinetic energy of such a (fictitious) particle, namely

$$
T_G := \tfrac{1}{2}\, m\, v_G^2,
\tag{21.81}
$$

is called the kinetic energy of the center of mass.

Definition 350 (Kinetic energy around the center of mass). Given an n-particle system, consider its kinetic energy, as seen in a translating (not necessarily inertial) frame of reference with origin at the center of mass, namely

$$
\check T = \sum_{k=1}^{n} \tfrac{1}{2}\, m_k \check v_k^2,
\tag{21.82}
$$

where $\check v_k = \|\check{\mathbf v}_k\|$, with $\check{\mathbf v}_k = \mathbf v_k - \mathbf v_G$ (Eq. 21.65). The quantity $\check T$ is called the kinetic energy around the center of mass. [Recall Definition 345, p. 872, of motion around the center of mass.]
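• The König theorem

We have the following¹

Theorem 226 (König theorem). The kinetic energy of an n-particle system may be expressed as

$$
T = T_G + \check T,
\tag{21.83}
$$

where $T_G = \tfrac{1}{2} m\, v_G^2$ is the kinetic energy of the center of mass (Eq. 21.81), whereas $\check T = \sum_{h=1}^{n} \tfrac{1}{2} m_h \check v_h^2$ is the kinetic energy around the center of mass (Eq. 21.82).

◦ Proof : Using $\check{\mathbf v}_h := \mathbf v_h - \mathbf v_G$, we have

$$
T = \frac{1}{2} \sum_{h=1}^{n} m_h v_h^2 = \frac{1}{2} \sum_{h=1}^{n} m_h \left\| \mathbf v_G + \check{\mathbf v}_h \right\|^2
= \frac{1}{2}\, m\, v_G^2 + \mathbf v_G \cdot \sum_{h=1}^{n} m_h \check{\mathbf v}_h + \frac{1}{2} \sum_{h=1}^{n} m_h \check v_h^2,
\tag{21.84}
$$

which is equivalent to Eq. 21.83, since $\sum_{h=1}^{n} m_h \check{\mathbf v}_h = \sum_{h=1}^{n} m_h\, (\mathbf v_h - \mathbf v_G) = \mathbf 0$ (Eq. 21.53).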
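The König theorem is an algebraic identity, and a quick numerical check may help in getting familiar with it. The sketch below evaluates the three kinetic energies in Eq. 21.83 for an arbitrary set of masses and velocities; the data and all names are assumptions introduced here for illustration.

```python
# Assumed illustrative data: three particles in three dimensions.
masses = [1.0, 2.0, 3.5]
velocities = [(0.2, 0.0, -1.0), (-0.4, 0.7, 0.1), (0.0, -0.3, 0.5)]

def speed_sq(v):
    """Square of the magnitude of a vector given as a tuple."""
    return sum(c * c for c in v)

total_mass = sum(masses)
# Velocity of the center of mass (Eq. 21.48).
v_g = tuple(sum(m * v[i] for m, v in zip(masses, velocities)) / total_mass
            for i in range(3))

# Kinetic energy of the system (Eq. 21.71).
kinetic_total = sum(0.5 * m * speed_sq(v) for m, v in zip(masses, velocities))

# Kinetic energy of the center of mass (Eq. 21.81).
kinetic_center = 0.5 * total_mass * speed_sq(v_g)

# Kinetic energy around the center of mass (Eq. 21.82).
kinetic_around = sum(
    0.5 * m * speed_sq(tuple(v[i] - v_g[i] for i in range(3)))
    for m, v in zip(masses, velocities))
print(kinetic_total, kinetic_center + kinetic_around)
```

Since the kinetic energy around the center of mass is never negative, the check also shows that the kinetic energy of the center of mass cannot exceed the total kinetic energy.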
21.6.2 Decomposition of the energy equation

In this subsection, we show that the energy equation for an n-particle system may be decomposed into the sum of two parts: (i) energy equation of the center of mass, namely for the mass concentrated in the center of mass and subject only to the external forces, and (ii) energy equation around the center of mass, namely as perceived by an observer in a (not necessarily inertial) translating frame of reference with origin at the center of mass.

¹ Named after Johann Samuel König (1712–1757), a German mathematician, who published it in 1751.
• Energy equation of the center of mass

We have the following

Theorem 227 (Energy equation of the center of mass). The time derivative of the kinetic energy of the center of mass equals the power generated by the resultant force of the external forces, all applied to the center of mass, namely

$$
\frac{dT_G}{dt} = P_G^E(\mathbf x_G),
\tag{21.85}
$$

where

$$
P_G^E(\mathbf x_G) = \mathbf f_E \cdot \mathbf v_G
\tag{21.86}
$$

is the power generated by the resultant force $\mathbf f_E$, applied to $\mathbf x_G$, through the motion of $\mathbf x_G$.

◦ Proof : Dotting $m\, \dot{\mathbf v}_G = \mathbf f_E$ (Eq. 21.50) with $\mathbf v_G$, and using $T_G := \tfrac{1}{2} m\, v_G^2$ (Eq. 21.81), which implies $\dot T_G = m\, \mathbf v_G \cdot \dot{\mathbf v}_G$, one obtains Eq. 21.85.
• Energy equation around the center of mass

We have the following

Theorem 228 (Energy equation around the center of mass). We have

$$
\frac{d\check T}{dt} = \check P_E + \check P_I,
\tag{21.87}
$$

where $\check T = \tfrac{1}{2} \sum_{h=1}^{n} m_h \check v_h^2$ is the kinetic energy around the center of mass (Eq. 21.82), whereas

$$
\check P_E := \sum_{h=1}^{n} \mathbf f_h^E \cdot \check{\mathbf v}_h,
\tag{21.88}
$$

$$
\check P_I := \sum_{h=1}^{n} \mathbf f_h^I \cdot \check{\mathbf v}_h = \sum_{h,k=1}^{n} \mathbf f_{hk} \cdot \check{\mathbf v}_h = \frac{1}{2} \sum_{h,k=1}^{n} \mathbf f_{hk} \cdot (\check{\mathbf v}_h - \check{\mathbf v}_k)
\tag{21.89}
$$

are the powers generated respectively by the external and internal forces, through the motion around the center of mass. In plain language, the time derivative of the kinetic energy around the center of mass equals the sum of the powers generated by all the forces (external and internal) through the motion around the center of mass.
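◦ Proof : Subtracting the energy equation of the center of mass, $\dot T_G = \mathbf f_E \cdot \mathbf v_G$ (Eq. 21.85), from the equation of energy $\dot T = P_E + P_I$ (Eq. 21.73) and using the König theorem, $T = T_G + \check T$ (Eq. 21.83), one obtains

$$
\frac{d\check T}{dt} = P_E + P_I - \mathbf f_E \cdot \mathbf v_G.
\tag{21.90}
$$

Next, note that, using $P_E = \sum_{h=1}^{n} \mathbf f_h^E \cdot \mathbf v_h$ (Eq. 21.74), as well as the definition of $\check P_E$ (Eq. 21.88), we have

$$
\check P_E = \sum_{h=1}^{n} \mathbf f_h^E \cdot \check{\mathbf v}_h = \sum_{h=1}^{n} \mathbf f_h^E \cdot (\mathbf v_h - \mathbf v_G) = P_E - \mathbf f_E \cdot \mathbf v_G.
\tag{21.91}
$$

On the other hand, we have

$$
\check P_I = \sum_{h=1}^{n} \mathbf f_h^I \cdot \check{\mathbf v}_h = \sum_{h=1}^{n} \mathbf f_h^I \cdot (\mathbf v_h - \mathbf v_G) = \sum_{h=1}^{n} \mathbf f_h^I \cdot \mathbf v_h = P_I.
\tag{21.92}
$$

[Hint: For the first equality use $\check P_I = \sum_{h=1}^{n} \mathbf f_h^I \cdot \check{\mathbf v}_h$ (Eq. 21.89), for the second $\check{\mathbf v}_k = \mathbf v_k - \mathbf v_G$ (Eq. 21.65), for the third $\sum_{h=1}^{n} \mathbf f_h^I = \mathbf 0$ (Eq. 17.66, internal resultant force theorem), and for the fourth $P_I = \sum_{h=1}^{n} \mathbf f_h^I \cdot \mathbf v_h$ (Eq. 21.75).] Then, combining Eqs. 21.90, 21.91 and 21.92, one obtains Eq. 21.87. [For the second and third equality in Eq. 21.89, see the paragraph in brackets at the end of the proof of Theorem 225, p. 874.]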
• Work
Integrating Eq. 21.87 with respect to time, one obtains
\[ \check T(t_b) - \check T(t_a) = \check W_E(t_a,t_b) + \check W_I(t_a,t_b), \tag{21.93} \]
where the works done by the external and internal forces, in the motion around the center of mass, are given by
\[ \check W_E(t_a,t_b) = \int_{t_a}^{t_b} \check P_E\, dt = \sum_{h=1}^{n} \int_{\hat{\mathcal L}_h(t_a,t_b)} \mathbf f_h^E\cdot\check{\mathbf v}_h\, dt, \qquad \check W_I(t_a,t_b) = \int_{t_a}^{t_b} \check P_I\, dt = \sum_{h=1}^{n} \int_{\hat{\mathcal L}_h(t_a,t_b)} \mathbf f_h^I\cdot\check{\mathbf v}_h\, dt, \tag{21.94} \]
where $\hat{\mathcal L}_h(t_a,t_b)$ denotes the line covered by $\mathbf x_h$ through its motion around the center of mass.
21. Dynamics of n-particle systems
In plain language, the change in kinetic energy around the center of mass equals the sum of the work done by the external and internal forces through the motion around the center of mass.
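The two identities at the heart of the proof of Theorem 228 (Eqs. 21.91 and 21.92) lend themselves to a quick numerical check. The following is only a minimal sketch, not part of the original text: the helper names (`random_system`, `powers`) and the random data are hypothetical. The internal forces are built pairwise with $\mathbf f_{kh} = -\mathbf f_{hk}$, so that their resultant vanishes.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_system(n, seed=0):
    """Hypothetical data: masses, velocities, external and internal forces."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(3)]

    masses = [rng.uniform(1.0, 2.0) for _ in range(n)]
    velocities = [vec() for _ in range(n)]
    f_ext = [vec() for _ in range(n)]
    f_int = [[0.0, 0.0, 0.0] for _ in range(n)]
    for h in range(n):
        for k in range(h + 1, n):
            f_hk = vec()                    # force on particle h due to k
            for i in range(3):
                f_int[h][i] += f_hk[i]
                f_int[k][i] -= f_hk[i]      # action and reaction: f_kh = -f_hk
    return masses, velocities, f_ext, f_int


def powers(masses, velocities, f_ext, f_int):
    """Powers through the absolute motion and through the motion around G."""
    m = sum(masses)
    v_G = [sum(mh * v[i] for mh, v in zip(masses, velocities)) / m
           for i in range(3)]
    v_rel = [[v[i] - v_G[i] for i in range(3)] for v in velocities]
    f_E = [sum(f[i] for f in f_ext) for i in range(3)]
    return {
        "P_E": sum(dot(f, v) for f, v in zip(f_ext, velocities)),
        "P_I": sum(dot(f, v) for f, v in zip(f_int, velocities)),
        "P_E_check": sum(dot(f, v) for f, v in zip(f_ext, v_rel)),
        "P_I_check": sum(dot(f, v) for f, v in zip(f_int, v_rel)),
        "fE_dot_vG": dot(f_E, v_G),
    }
```

For any such system, `P_E_check` should agree with `P_E - fE_dot_vG` (Eq. 21.91) and `P_I_check` with `P_I` (Eq. 21.92), up to round-off.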
21.6.3 Potential energy for two particles
♣
Here, we want to address what happens if the force, $\mathbf f_k^E(\mathbf x_k)$, admits a time–independent potential energy, $U_k^E(\mathbf x_k)$. For the sake of simplicity, in this subsection we limit ourselves to problems with two particles, say $\mathbf x_1$ and $\mathbf x_2$, since this is all we need to address the problems in Section 21.7. [For the sake of completeness, the generalization to n particles is addressed in Appendix B to this chapter (Section 21.9).]
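• Conservative external forces
For any differentiable function $g(\mathbf x_1, \mathbf x_2)$, we use the notation
\[ \mathrm{grad}_h\, g(\mathbf x_1,\mathbf x_2) := \frac{\partial g}{\partial x_h}\,\mathbf i + \frac{\partial g}{\partial y_h}\,\mathbf j + \frac{\partial g}{\partial z_h}\,\mathbf k =: \frac{\partial g}{\partial \mathbf x_h}, \tag{21.95} \]
with h = 1, 2. [Recall the definition of $\partial g/\partial\mathbf x$ (Eq. 18.87).] Next, assume that for the external force acting on the first particle, $\mathbf f_1^E(\mathbf x_1)$, there exists a (time–independent) potential energy $U_1^E(\mathbf x_1)$ such that
\[ \mathbf f_1^E(\mathbf x_1) = -\,\mathrm{grad}_1 U_1^E(\mathbf x_1) = -\frac{d U_1^E}{d\mathbf x_1}, \tag{21.96} \]
where we have stated explicitly that $U_1^E$ is not a function of $\mathbf x_2$ and t, but only a function of $\mathbf x_1$. [Accordingly, the symbol of ordinary derivatives is used instead of that of partial derivatives.] Then, for $W_1^E(t_a,t_b)$, defined in Eq. 21.79, we have
\[ W_1^E(t_a,t_b) := \int_{\mathcal L_1(t_a,t_b)} \mathbf f_1^E\cdot d\mathbf x_1 = -\int_{\mathcal L_1(t_a,t_b)} \frac{d U_1^E}{d\mathbf x_1}\cdot d\mathbf x_1 = U_1^E[\mathbf x_1(t_a)] - U_1^E[\mathbf x_1(t_b)]. \tag{21.97} \]
Of course, similar considerations hold for the second particle, and yield
\[ W_2^E(t_a,t_b) = U_2^E[\mathbf x_2(t_a)] - U_2^E[\mathbf x_2(t_b)]. \tag{21.98} \]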
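This shows that time–independent potential fields are conservative for 2–particle systems as well. [This generalizes Theorem 213, p. 828, for a single particle.] Finally, note that, setting
\[ U_E = U_1^E + U_2^E, \tag{21.99} \]
we have
\[ \frac{d U_E}{dt} = -P_E, \tag{21.100} \]
where
\[ P_E := \mathbf f_1^E\cdot\mathbf v_1 + \mathbf f_2^E\cdot\mathbf v_2 \tag{21.101} \]
denotes the power generated by the forces $\mathbf f_1^E$ and $\mathbf f_2^E$. Indeed, we have
\[ \frac{d U_E}{dt} = \frac{d U_1^E}{d\mathbf x_1}\cdot\frac{d\mathbf x_1}{dt} + \frac{d U_2^E}{d\mathbf x_2}\cdot\frac{d\mathbf x_2}{dt} = -\,\mathbf f_1^E\cdot\mathbf v_1 - \mathbf f_2^E\cdot\mathbf v_2 = -P_E. \tag{21.102} \]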
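• Conservative internal forces
Next, let us extend the formulation to internal forces, again for the limited case of two particles. We have the following
Theorem 229. Consider two particles, placed respectively at $\mathbf x_1$ and $\mathbf x_2$. Assume that there exists a potential energy, which is only a function of the distance between the two particles, namely
\[ U_{12} = U_{21} =: U_I(r) \qquad (r = \|\mathbf x_2 - \mathbf x_1\|). \tag{21.103} \]
Then, we have
\[ \mathbf f_{12} = F(r)\,\mathbf e_{12}, \quad\text{where } \mathbf e_{12} := \frac{\mathbf x_1 - \mathbf x_2}{r}, \qquad \mathbf f_{21} = F(r)\,\mathbf e_{21}, \quad\text{where } \mathbf e_{21} := \frac{\mathbf x_2 - \mathbf x_1}{r}, \tag{21.104} \]
with
\[ F(r) := -\frac{d U_I}{dr}. \tag{21.105} \]
◦ Proof: Note that (in analogy with Eq. 20.121)
\[ \mathrm{grad}_1\, r = \frac{\mathbf x_1 - \mathbf x_2}{r} = \mathbf e_{12} = -\mathbf e_{21} = -\frac{\mathbf x_2 - \mathbf x_1}{r} = -\,\mathrm{grad}_2\, r. \tag{21.106} \]
Thus, we have
\[ \mathbf f_{12} := -\,\mathrm{grad}_1 U_I(r) = -\frac{d U_I}{dr}\,\mathbf e_{12} = \frac{d U_I}{dr}\,\mathbf e_{21} = \mathrm{grad}_2 U_I(r) =: -\mathbf f_{21}, \tag{21.107} \]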
in agreement with Eq. 21.104.
Equation 21.104 states that the resulting internal forces satisfy both items in the Newton law of action and reaction (Eqs. 17.59 and 17.61). Indeed: (i) the forces $\mathbf f_{12}$ and $\mathbf f_{21}$ are equal and opposite, and (ii) they have the same line of action, which is parallel to $\mathbf x_1 - \mathbf x_2$.
The converse is also true! To be specific, we have the following
Theorem 230. Consider a pair of conservative internal forces, which are such that $\mathbf f_{12} = F(r)\,\mathbf e_{12} = -\mathbf f_{21}$ (Eq. 20.123). The corresponding potential energy is only a function of the distance between the two particles. Specifically, it is given by any $U_I(r)$ with $dU_I/dr = -F(r)$.
◦ Proof: Indeed, $\mathrm{grad}_1 U_I = -F(r)\,\mathbf e_{12} = -\mathbf f_{12}$ (Eq. 21.107). [Note the analogy with the proof of Theorem 215, p. 833, on the potential energy for central forces.]
• Work
Next, consider the work, $W_{12}$, done by the two internal forces, $\mathbf f_{12}$ and $\mathbf f_{21}$, as $\mathbf x_1$ moves along the trajectory $\mathcal L_1(t_a,t_b)$, traveling from $\mathbf x_1(t_a)$ to $\mathbf x_1(t_b)$, whereas $\mathbf x_2$ moves along the trajectory $\mathcal L_2(t_a,t_b)$, traveling from $\mathbf x_2(t_a)$ to $\mathbf x_2(t_b)$. Consider (use $\mathbf e_{21} = \mathbf r/r$, with $\mathbf r = \mathbf x_2 - \mathbf x_1$, Eq. 21.106)
\[ dr = d\,(\mathbf r\cdot\mathbf r)^{1/2} = \frac{1}{r}\,\mathbf r\cdot d\mathbf r = (d\mathbf x_2 - d\mathbf x_1)\cdot\mathbf e_{21}. \tag{21.108} \]
[Note that dr is equal to the projection of $d\mathbf r = d(\mathbf x_2 - \mathbf x_1)$ in the direction of $\mathbf e_{21}$.] Then, using $\mathbf f_{12} = F(r)\,\mathbf e_{12} = -\mathbf f_{21}$ (Eq. 21.104), with $F(r) = -dU_I/dr$ (Eq. 21.105), we obtain
\[ W_{12} = \int_{\mathcal L_1(t_a,t_b)} \mathbf f_{12}\cdot d\mathbf x_1 + \int_{\mathcal L_2(t_a,t_b)} \mathbf f_{21}\cdot d\mathbf x_2 = \int_{\mathcal L_1(t_a,t_b)} F(r)\,\mathbf e_{12}\cdot d\mathbf x_1 + \int_{\mathcal L_2(t_a,t_b)} F(r)\,\mathbf e_{21}\cdot d\mathbf x_2 = \int_{r_a}^{r_b} F(r)\,dr = -\int_{r_a}^{r_b} \frac{d U_I}{dr}\,dr = U_I(r_a) - U_I(r_b), \tag{21.109} \]
where $r_a = r(t_a)$ and $r_b = r(t_b)$, whereas (use Eq. 21.105)
\[ U_I(r) := -\int_{r_0}^{r} F(r')\,dr' \tag{21.110} \]
equals minus the work done by the two forces starting from an arbitrary distance $r_0$.
• Energy conservation
We have the following
Theorem 231 (Energy conservation for potential fields; 2 bodies). Consider a system composed of two particles. If all the force fields (external and internal) are conservative, the sum of kinetic energy and potential energy remains constant in time, namely
\[ T(t) + U_E(t) + U_I(t) = T(0) + U_E(0) + U_I(0) = E(0), \tag{21.111} \]
where $U_E(t) = U_1^E[\mathbf x_1(t)] + U_2^E[\mathbf x_2(t)]$ is the sum of the potential energies of the two external forces, whereas $U_I(t) = U_I[r(t)]$ denotes the potential energy of the internal force.
◦ Comment. In plain language, Eq. 21.111 states that the total energy of a two–particle system, namely
\[ E := \frac12\, m_1 v_1^2 + \frac12\, m_2 v_2^2 + U_1^E(\mathbf x_1) + U_2^E(\mathbf x_2) + U_I(r), \tag{21.112} \]
remains constant in time.
◦ Proof: Indeed, using Eqs. 21.78, 21.97, 21.98, 21.99 and 21.109, we have
\[ T(t_b) - T(t_a) = U_E(t_a) - U_E(t_b) + U_I(t_a) - U_I(t_b), \tag{21.113} \]
which is equivalent to Eq. 21.111.
Similarly, for the motion around G, we have
\[ \check T(t) + \check U_E(t) + \check U_I(t) = \check E(0), \tag{21.114} \]
as you may verify. [Hint: Use Eq. 21.93; $\check U_E(t)$ and $\check U_I(t)$ denote the potential energies corresponding to the works done (by external and internal forces, respectively), through the motion around G.]
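Theorem 231 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the author's method: the two particles are subject to uniform gravity along $-z$ (with the assumed potential $U_k^E = m_k\, g\, z_k$) and to a linear spring of stiffness `kappa` and unstretched length `l0`; the time stepping uses the velocity Verlet scheme, and all numerical values are hypothetical.

```python
import math


def spring_force_on_1(x1, x2, kappa, l0):
    """Internal force on particle 1: -kappa (l - l0) e12, e12 = (x1 - x2)/l."""
    l = math.dist(x1, x2)
    return [-kappa * (l - l0) * (a - b) / l for a, b in zip(x1, x2)]


def total_energy(x1, x2, v1, v2, m1, m2, kappa, l0, g):
    """E = T + U_E + U_I (Eq. 21.112), with assumed U_E = g (m1 z1 + m2 z2)."""
    kinetic = 0.5 * m1 * sum(c * c for c in v1) + 0.5 * m2 * sum(c * c for c in v2)
    u_ext = g * (m1 * x1[2] + m2 * x2[2])
    u_int = 0.5 * kappa * (math.dist(x1, x2) - l0) ** 2
    return kinetic + u_ext + u_int


def accelerations(x1, x2, m1, m2, kappa, l0, g):
    f12 = spring_force_on_1(x1, x2, kappa, l0)
    a1 = [f / m1 for f in f12]
    a2 = [-f / m2 for f in f12]     # f21 = -f12
    a1[2] -= g
    a2[2] -= g
    return a1, a2


def simulate(steps=5000, dt=2e-4, m1=1.0, m2=2.0, kappa=10.0, l0=1.0, g=9.81):
    """Return the total energy at the start and at the end of the run."""
    x1, x2 = [0.0, 0.0, 0.0], [1.5, 0.0, 0.2]     # hypothetical initial data
    v1, v2 = [0.0, 0.3, 0.0], [0.1, -0.2, 0.5]
    e_start = total_energy(x1, x2, v1, v2, m1, m2, kappa, l0, g)
    a1, a2 = accelerations(x1, x2, m1, m2, kappa, l0, g)
    for _ in range(steps):
        # velocity Verlet: positions first, then velocities with averaged acceleration
        x1 = [x + v * dt + 0.5 * a * dt * dt for x, v, a in zip(x1, v1, a1)]
        x2 = [x + v * dt + 0.5 * a * dt * dt for x, v, a in zip(x2, v2, a2)]
        b1, b2 = accelerations(x1, x2, m1, m2, kappa, l0, g)
        v1 = [v + 0.5 * (a + b) * dt for v, a, b in zip(v1, a1, b1)]
        v2 = [v + 0.5 * (a + b) * dt for v, a, b in zip(v2, a2, b2)]
        a1, a2 = b1, b2
    return e_start, total_energy(x1, x2, v1, v2, m1, m2, kappa, l0, g)
```

Calling `simulate()` returns two values that should agree to within the discretization error of the scheme, even though the kinetic and potential contributions individually change considerably during the fall.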
21.6.4 Potential energy for fields of interest
♣
This subsection presents an extension to internal forces of the single–particle formulation presented in Subsection 20.6.3, on the potential energy for fields of interest. Specifically, here we address the forces exchanged between two particles that are due to springs and gravitation, which are used in Section 21.7.
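• Gravitation
Consider the potential energy due to gravitation. In order to emphasize the fact that the two particles play an equivalent role, Eq. 20.120 may be rewritten as
\[ U_{12}^G = U_{21}^G = -G\,\frac{m_1 m_2}{r}, \tag{21.115} \]
with $r = \|\mathbf x_2 - \mathbf x_1\|$. Recalling that $\mathrm{grad}_1\, r = \mathbf e_{12}$ (Eq. 21.106), the corresponding forces are
\[ \mathbf f_{12} = -\,\mathrm{grad}_1 U_{12}^G = -\frac{d U_{12}^G}{dr}\,\mathrm{grad}_1\, r = -G\,\frac{m_1 m_2}{r^2}\,\mathbf e_{12}, \qquad \mathbf f_{21} = -\,\mathrm{grad}_2 U_{21}^G = -\frac{d U_{12}^G}{dr}\,\mathrm{grad}_2\, r = -G\,\frac{m_1 m_2}{r^2}\,\mathbf e_{21} = -\mathbf f_{12}. \tag{21.116} \]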
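As a sanity check on Eq. 21.116, one may compare the closed-form force with a central finite difference of the potential of Eq. 21.115. This is only a sketch: the function names and the numerical values of G, m1, m2 (set to convenient numbers, not physical ones) are hypothetical.

```python
import math


def potential(x1, x2, G=1.0, m1=2.0, m2=3.0):
    """U = -G m1 m2 / r (Eq. 21.115), with r the distance between the particles."""
    return -G * m1 * m2 / math.dist(x1, x2)


def grad_potential(which, x1, x2, h=1e-6):
    """Central-difference gradient of U with respect to x1 (which=0) or x2 (which=1)."""
    grad = []
    for i in range(3):
        plus = [list(x1), list(x2)]
        minus = [list(x1), list(x2)]
        plus[which][i] += h
        minus[which][i] -= h
        grad.append((potential(plus[0], plus[1]) - potential(minus[0], minus[1])) / (2.0 * h))
    return grad


def closed_form_f12(x1, x2, G=1.0, m1=2.0, m2=3.0):
    """f12 = -G m1 m2 / r^2 e12, with e12 = (x1 - x2)/r (Eq. 21.116)."""
    r = math.dist(x1, x2)
    return [-G * m1 * m2 / r ** 2 * (a - b) / r for a, b in zip(x1, x2)]
```

With, say, `x1 = [0.3, -0.2, 0.5]` and `x2 = [1.0, 0.4, -0.1]`, the negative of `grad_potential(0, x1, x2)` should match `closed_form_f12(x1, x2)`, and the negative of `grad_potential(1, x1, x2)` should be its opposite, in agreement with the law of action and reaction.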
• Linear springs
Next, consider the potential energy due to a spring (namely its elastic energy, Definition 336, p. 832). Limiting ourselves to linear springs, Eq. 20.112 may be rewritten as
\[ U_{12}^S = \frac12\,\kappa\,(\ell - \ell_0)^2, \tag{21.117} \]
where $\ell = \|\mathbf x_1 - \mathbf x_2\|$, whereas $\ell_0$ is the unstretched spring length. Using $dU_{12}^S/d\ell = dU_{21}^S/d\ell = \kappa\,(\ell - \ell_0)$, we obtain the corresponding forces as (use Eq. 20.113)
\[ \mathbf f_{12} = -\,\mathrm{grad}_1 U_{12}^S = -\frac{d U_{12}^S}{d\ell}\,\mathrm{grad}_1\,\ell = -\kappa\,(\ell - \ell_0)\,\mathbf e_{12}, \qquad \mathbf f_{21} = -\,\mathrm{grad}_2 U_{12}^S = -\frac{d U_{21}^S}{d\ell}\,\mathrm{grad}_2\,\ell = -\kappa\,(\ell - \ell_0)\,\mathbf e_{21} = -\mathbf f_{12}. \tag{21.118} \]
21.7 Applications
In this section, we present three illustrative examples of the material covered in Sections 21.4–21.6. Specifically, we address the following problems: (i) the cat righting reflex (Subsection 21.7.1), namely why cats always land on their feet; (ii) the motion of two unanchored masses connected by a spring (Subsection 21.7.2); (iii) the motion of two masses subject to gravitational attraction (Subsection 21.7.3, where we revisit the Kepler laws and remove the limitation that one mass is much larger than the other one). [Another application is presented in Appendix A (Section 21.8), where we present a fairly detailed analysis of the dynamics of the Sun–Earth–Moon system.]
• Time derivative of a unit vector
Before we can proceed, we need the time derivative of a unit vector. Consider an arbitrary unit vector and its positions at times t and t + Δt, namely $\mathbf e = \mathbf e(t)$ and $\mathbf e' = \mathbf e(t + \Delta t)$ (Fig. 21.1, where Δθ > 0). The vectors $\mathbf e$ and $\Delta\mathbf e := \mathbf e' - \mathbf e$, in the limit as Δt tends to zero, determine a plane (limit of the plane of Fig. 21.1). [Of course, the plane determined, in the limit, by $\mathbf e$ and $\Delta\mathbf e$ is a function of time.] Next, note that $\Delta\mathbf e$, in the limit, takes the direction perpendicular to $\mathbf e$, in the plane of the figure. On the other hand, the magnitude of $\Delta\mathbf e$ approaches |Δθ|, since $\|\mathbf e\| = 1$. Accordingly, we have
\[ \frac{d\mathbf e}{dt} = \lim_{\Delta t\to 0} \frac{\Delta\mathbf e}{\Delta t} = \lim_{\Delta t\to 0} \frac{\Delta\theta}{\Delta t}\,\mathbf n\times\mathbf e, \tag{21.119} \]
where $\mathbf n$ denotes the normal to the plane of the figure, whereas $\mathbf n\times\mathbf e$ is the unit vector obtained by rotating $\mathbf e$ by π/2 counterclockwise, on the plane of Fig. 21.1. Thus, we have
\[ \frac{d\mathbf e}{dt} = \boldsymbol\omega\times\mathbf e, \tag{21.120} \]
where the angular velocity $\boldsymbol\omega$ is defined by
\[ \boldsymbol\omega := \frac{d\theta}{dt}\,\mathbf n, \tag{21.121} \]
with $\dot\theta$ being the angular speed (Remark 153, p. 848). [Equation 21.120 is a three–dimensional extension of $\dot{\mathbf e}_r = \dot\varphi\,\mathbf e_\varphi$ (first in Eq. 20.156).] For future reference, note that
\[ \boldsymbol\omega\cdot\mathbf e = 0, \tag{21.122} \]
as you may verify.
Fig. 21.1 Time derivative of a unit vector
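Equations 21.120 and 21.122 can be checked numerically in a simple planar case. The sketch below assumes a fixed normal $\mathbf n = \mathbf k$ and a hypothetical angle law θ(t); neither choice comes from the text. It compares a central-difference derivative of $\mathbf e(t)$ with $\boldsymbol\omega\times\mathbf e$.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def theta(t):
    """Hypothetical angle law, chosen only for illustration."""
    return 0.5 * t * t + math.sin(t)


def theta_dot(t):
    return t + math.cos(t)


def e(t):
    """Unit vector rotating in the x-y plane (normal n = k)."""
    return [math.cos(theta(t)), math.sin(theta(t)), 0.0]


def omega(t):
    """Angular velocity, omega = theta_dot n (Eq. 21.121)."""
    return [0.0, 0.0, theta_dot(t)]


def de_dt_numeric(t, h=1e-6):
    """Central-difference approximation of de/dt."""
    return [(a - b) / (2.0 * h) for a, b in zip(e(t + h), e(t - h))]
```

At any time t, `de_dt_numeric(t)` should agree with `cross(omega(t), e(t))` up to the finite-difference error, and the dot product of `omega(t)` with `e(t)` vanishes by construction.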
21.7.1 How come cats always land on their feet?
♥
In this subsection, we discuss the so–called cat righting reflex, namely the ability that cats have to control themselves as they fall, so as to always land on their feet. To this end, we limit ourselves to an extremely simplified model of the dynamics of a cat as it falls, which however illustrates the phenomenon, at least for the simplest of all possible falls, namely a rotation in a vertical plane.
• A simple model of a falling cat
To understand the essence of the phenomenon, it is sufficient to picture a cat as two particles, whose distance may be changed at our pleasure (really, at the cat’s pleasure), by changing (with a prescribed law) the length $\ell = \ell(t)$ of a massless rod connecting the two masses. According to the Newton second law (Eq. 20.3), assuming the air drag to be negligible, the governing equations are
\[ m_k\,\frac{d^2\mathbf x_k}{dt^2} = m_k\,\mathbf g + \mathbf f_k(t) \qquad (k = 1, 2), \tag{21.123} \]
where $m_k\,\mathbf g$ denotes the force of gravity acting on each mass, whereas $\mathbf f_k(t)$ (k = 1, 2) denotes the force that the rod exerts on the mass $m_k$. [This is the force necessary to obtain a given time history for the distance between the two masses, $\ell = \ell(t)$ (Subsubsection “The force necessary to obtain r(t),” on p. 888).] Recall Eq. 21.46, which for n = 2 reduces to
\[ \mathbf x_G = \frac{m_1}{m}\,\mathbf x_1 + \frac{m_2}{m}\,\mathbf x_2, \tag{21.124} \]
where $m = m_1 + m_2$ is the total mass of the system. Adding the equations in Eq. 21.123 for k = 1 and k = 2 and using Eq. 21.124 as well as $\mathbf f_1 = -\mathbf f_2$ (Newton law of action and reaction, Eq. 17.59), we have
\[ m\,\frac{d^2\mathbf x_G}{dt^2} = m\,\mathbf g. \tag{21.125} \]
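This is simply a particular case of the equation governing the dynamics of the center of mass, $m\,\ddot{\mathbf x}_G = \mathbf f_E$ (Eq. 21.50). The above equation implies that the center of mass of the system follows a parabola as the two masses fall (see Subsection 20.2.2, on the trajectory followed by a cannonball subject only to gravity).
Next, we want to study the motion around the center of mass. To do this, we may use the results of Section 21.5, on the angular momentum. Specifically, consider the angular momentum with respect to $\mathbf x_G$, of the motion around $\mathbf x_G$. In our case, Eq. 21.64 simply gives
\[ \check{\mathbf h}_G = m_1\,\check{\mathbf x}_1\times\check{\mathbf v}_1 + m_2\,\check{\mathbf x}_2\times\check{\mathbf v}_2, \tag{21.126} \]
where $\check{\mathbf x}_k = \mathbf x_k - \mathbf x_G$ and $\check{\mathbf v}_k = \mathbf v_k - \mathbf v_G$ (k = 1, 2; Eq. 21.65).
Then, to obtain a cleaner expression, note that (use Eq. 21.124) $\check{\mathbf x}_1 := \mathbf x_1 - \mathbf x_G = \mathbf x_1 - (m_1\mathbf x_1 + m_2\mathbf x_2)/m$, with a similar expression for $\check{\mathbf x}_2$. Therefore, we have
\[ \check{\mathbf x}_1 = \frac{m_2}{m}\,(\mathbf x_1 - \mathbf x_2) \qquad\text{and}\qquad \check{\mathbf x}_2 = \frac{m_1}{m}\,(\mathbf x_2 - \mathbf x_1), \tag{21.127} \]
namely
\[ -m_1\,\check{\mathbf x}_1 = m_2\,\check{\mathbf x}_2 = m_{12}\,\mathbf r, \tag{21.128} \]
where
\[ \mathbf r = \mathbf x_2 - \mathbf x_1 = \check{\mathbf x}_2 - \check{\mathbf x}_1 \qquad\text{and}\qquad m_{12} = \frac{m_1 m_2}{m}. \tag{21.129} \]
Therefore, substituting into Eq. 21.126, one obtains
\[ \check{\mathbf h}_G = m_{12}\,\mathbf r\times(\check{\mathbf v}_2 - \check{\mathbf v}_1). \tag{21.130} \]
◦ Comment. For future reference, Eq. 21.127 yields
\[ \check{\mathbf v}_1 = \frac{m_2}{m}\,(\mathbf v_1 - \mathbf v_2) \qquad\text{and}\qquad \check{\mathbf v}_2 = \frac{m_1}{m}\,(\mathbf v_2 - \mathbf v_1). \tag{21.131} \]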
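• Back to our falling cat
Here, we apply the above results to the problem under consideration. Set
\[ \mathbf e_r = \mathbf r/r \qquad\text{where}\qquad r = \|\mathbf r\|, \tag{21.132} \]
and $\mathbf r = \check{\mathbf x}_2 - \check{\mathbf x}_1$ (Eq. 21.129). Thus, we have (use $\dot{\mathbf e}_r = \boldsymbol\omega\times\mathbf e_r$, Eq. 21.120)
\[ \check{\mathbf v}_2 - \check{\mathbf v}_1 = (\check{\mathbf x}_2 - \check{\mathbf x}_1)\dot{} = \dot{\mathbf r} = \dot r\,\mathbf e_r + r\,\dot{\mathbf e}_r = \dot r\,\mathbf e_r + \boldsymbol\omega\times\mathbf r. \tag{21.133} \]
Combining Eqs. 21.130 and 21.133, one obtains
\[ \check{\mathbf h}_G(t) = m_{12}\,\mathbf r\times(\boldsymbol\omega\times\mathbf r). \tag{21.134} \]
[Hint: Use $\mathbf r\times\mathbf r = \mathbf 0$ (Eq. 15.61), so that $\mathbf r\times\mathbf e_r = \mathbf 0$.] Next, using the BAC–minus–CAB rule, $\mathbf a\times(\mathbf b\times\mathbf c) = \mathbf b\,(\mathbf a\cdot\mathbf c) - \mathbf c\,(\mathbf a\cdot\mathbf b)$ (Eq. 15.96), as well as $\boldsymbol\omega\cdot\mathbf e_r = 0$ (Eq. 21.122), we have
\[ \check{\mathbf h}_G(t) = m_{12}\,r^2\,\boldsymbol\omega. \tag{21.135} \]
Finally, recall that $\mathbf m_G^E = \mathbf 0$. [Indeed, the resultant moment of the weights with respect to G vanishes (see Remark 158, p. 868, as well as Eq. 17.84).] Therefore, we may use the theorem of conservation of angular momentum (Eq. 21.63) and obtain $\check{\mathbf h}_G(t) = \check{\mathbf h}_G(0)$, or
\[ r^2\,\boldsymbol\omega = r_0^2\,\boldsymbol\omega_0. \tag{21.136} \]
As a consequence, the motion around $\mathbf x_G$ is planar, and the angular speed ω is inversely proportional to $r^2$, namely
\[ \omega\,r^2 = C = \omega_0\,r_0^2. \tag{21.137} \]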
◦ Warning. In the following, we assume that the plane of rotation coincides with the plane of the parabola covered by G (simplest of all possible falls).
• The force necessary to obtain r(t)
In the subsubsection that follows, we show how to choose the function r(t), to obtain how the cat rotates. Here, we evaluate the force necessary to obtain a prescribed r(t). Thus far, for the motion under consideration, we have used the two scalar equations for the motion of G (Eq. 21.125), and the angular momentum equation for the motion around G, which gave us Eq. 21.136. To complete the solution of the problem, we will use the component along $\mathbf r$ of the momentum equation around G to obtain the force necessary to impose the desired variation of the distance r. Specifically, consider the equations $m_k\,\ddot{\mathbf x}_k = m_k\,\mathbf g + \mathbf f_k$, where k = 1, 2 (Eq. 21.123). Dividing by $m_k$ and subtracting, one obtains
\[ \ddot{\mathbf r} = \frac{1}{m_{12}}\,\mathbf f_2, \tag{21.138} \]
where $\mathbf r = \mathbf x_2 - \mathbf x_1$. [Hint: Recalling that $\mathbf f_1 = -\mathbf f_2$, we have
\[ \frac{\mathbf f_2}{m_2} - \frac{\mathbf f_1}{m_1} = \left(\frac{1}{m_2} + \frac{1}{m_1}\right)\mathbf f_2 = \frac{1}{m_{12}}\,\mathbf f_2, \tag{21.139} \]
with $m_{12} = m_1 m_2/m$ (Eq. 21.129).]
Setting $\mathbf f_2 = f_2\,\mathbf e_r$ and using $\ddot{\mathbf r} = (\ddot r - r\,\dot\varphi^2)\,\mathbf e_r + (2\,\dot r\,\dot\varphi + r\,\ddot\varphi)\,\mathbf e_\varphi$ (Eq. 20.158), the component along $\mathbf r$ of Eq. 21.138 yields
\[ f_2(t) := \mathbf f_2\cdot\mathbf e_r = m_{12}\,(\ddot r - r\,\dot\varphi^2), \tag{21.140} \]
where r(t) is prescribed. This equation allows us to evaluate the desired $f_2(t)$.
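To make Eqs. 21.137 and 21.140 concrete, the following sketch prescribes a hypothetical length law r(t) (a contraction that is largest at mid-fall), integrates $\dot\varphi = C/r^2$ with the trapezoidal rule to obtain the total rotation, and evaluates the rod force. The law `r_of_t` and all parameter values are assumptions made for illustration only.

```python
import math


def r_of_t(t, r0, a, T):
    """Hypothetical length law: r = r0 (1 - a sin^2(pi t / T)), with 0 <= a < 1."""
    return r0 * (1.0 - a * math.sin(math.pi * t / T) ** 2)


def r_ddot(t, r0, a, T):
    """Second time derivative of the length law above."""
    return -2.0 * r0 * a * (math.pi / T) ** 2 * math.cos(2.0 * math.pi * t / T)


def rotation_angle(r0, omega0, a, T, n=2000):
    """Integrate phi_dot = C / r^2 (Eq. 21.137) over [0, T], trapezoidal rule."""
    C = omega0 * r0 ** 2
    dt = T / n
    rates = [C / r_of_t(i * dt, r0, a, T) ** 2 for i in range(n + 1)]
    return dt * (sum(rates) - 0.5 * (rates[0] + rates[-1]))


def rod_force(t, m1, m2, r0, omega0, a, T):
    """f2 = m12 (r_ddot - r phi_dot^2), Eq. 21.140, with phi_dot = C / r^2."""
    m12 = m1 * m2 / (m1 + m2)
    r = r_of_t(t, r0, a, T)
    phi_dot = omega0 * r0 ** 2 / r ** 2
    return m12 * (r_ddot(t, r0, a, T) - r * phi_dot ** 2)
```

With `a = 0` (rigid rod) the rotation over the fall is simply `omega0 * T`, and the rod force reduces to the centripetal value `-m12 * omega0**2 * r0`; any contraction (`a > 0`) increases the total rotation, which is the mechanism the model attributes to the cat.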
• The cat righting reflex The above model is adequately representative of the dynamics of a falling cat. Recall that we have assumed that the cat rotates within the plane of the parabola followed by xG. By changing ω (obtained by changing r), the cat can make sure that, when it lands, its feet are facing the ground. So, how do cats perform their trick? Do they have a supercomputer implanted in their brains that allows them to coordinate the parabolic trajectory with the rotation around the center of mass, so that, when they land, their feet face the ground? Surprise, surprise, surprise! They do not have a supercomputer implanted in their brain. Nonetheless, in essence, they manage to change the length of their body, so that, at landing, their feet always face the ground. What makes them do that? Practice? Survival of the fittest? Your guess is as good as mine! After all, the same trick is used by trapeze acrobats, when they move from one trapeze to another, to make sure to be in the correct angular position for grabbing the trapeze bar (they do not have a supercomputer implanted in their brains either). And so do competitive divers, to avoid splashes when entering the water.
• A better model
♥
The formulation presented above is really oversimplified, in that we assumed that the plane of rotation coincides with the plane of the parabola (see the warning on p. 888). Here is a brief review of more sophisticated analyses of the problem. The cat falling problem was first investigated experimentally by Étienne–Jules Marey² (Ref. [45]). He also proposed a simple explanation for the falling cat problem. A better modeling of the falling cat problem was first proposed by Kane and Scher (Ref. [34]). Their model consists of a pair of rotating cylinders connected by a universal joint (two halves of the cat, hinged in the middle). For a more sophisticated model, see the 2018 paper by Essén and Nordmark (Ref. [20]), which includes a brief historical review. [For interesting visualizations of the phenomenon and its modeling, search the web under “falling cat problem.”]
² Étienne–Jules Marey (1830–1904) was a French inventor, scientist, physiologist and chronophotographer. He is most famous for his invention and use of the chronophotographic gun (1882), an apparatus of his own creation that could take twelve consecutive frames a second (all recorded on the same photo), which he used to record animals and human beings in motion (including falling cats), making him a precursor of cinematography. [His photos are available on the web.]
21.7.2 Two masses connected by a spring
♥
Consider two masses connected by a linear spring and subject to gravity and no other external force. In other words, consider the simplified model presented in the preceding subsection, and replace the variable–length rod with a massless spring. Thus, the first part of the formulation is virtually identical to that of the preceding subsection. All the equations are still valid, in particular $\omega\,r^2 = C = \omega_0\,r_0^2$ (Eq. 21.137). Thus, the only issue we have to deal with is the variation of the distance between the two masses, which in the preceding subsection was assumed to be prescribed.
In the preceding subsection, I have used the radial component of Eq. 21.138 to evaluate the force necessary to obtain a prescribed time history ω = ω(t). We could use the same equation here (namely the radial component of the momentum equation around G), to obtain r(t). Instead, just to show you that there is more than one way to peel an apple and to expose you to a broader range of methodologies, I’d like to use a different approach, namely one based on energy considerations. [The equivalence of the two methods is presented in the subsubsection that follows.]
Specifically, I will use the equation of energy around the center of mass, given by Eq. 21.93. Consider the kinetic energy $\check T$ of the system around the center of mass (Eq. 21.82). For the case under consideration, if we combine $-m\,\check{\mathbf v}_1/m_2 = m\,\check{\mathbf v}_2/m_1 = \mathbf v_2 - \mathbf v_1$ (Eq. 21.131) and $\mathbf v_2 - \mathbf v_1 = \dot r\,\mathbf e_r + \boldsymbol\omega\times\mathbf r$ (Eq. 21.133), we obtain $-m\,\check{\mathbf v}_1/m_2 = m\,\check{\mathbf v}_2/m_1 = \dot r\,\mathbf e_r + \boldsymbol\omega\times\mathbf r$. This implies $\check v_1^2 = m_2^2\,(\dot r^2 + \omega^2 r^2)/m^2$ and $\check v_2^2 = m_1^2\,(\dot r^2 + \omega^2 r^2)/m^2$, with $m = m_1 + m_2$. Accordingly, we have
\[ \check T := \frac12\, m_1\,\check v_1^2 + \frac12\, m_2\,\check v_2^2 = \frac12\, m_{12}\,\big(\dot r^2 + \omega^2 r^2\big), \tag{21.141} \]
where $m_{12} = m_1 m_2/m$ (Eq. 21.129). [Hint: Use $(m_1 m_2^2 + m_2 m_1^2)/m^2 = m_1 m_2/m = m_{12}$.]
Next, recall that the work performed by the internal forces due to a spring may be expressed in terms of their potential energy (Eq. 21.109). In our case, we have $\mathbf f_{12} = -\kappa\,(r - r_U)\,\mathbf e_{12}$, where r denotes the current length of the spring, and $r_U$ the length in the unloaded configuration (Eq. 20.111). The corresponding potential energy is (use Eq. 21.117)
\[ \check U_I = \frac12\,\kappa\,(r - r_U)^2. \tag{21.142} \]
Combining the last two equations with $\check T(t) + \check U_I(t) = \check E_0$ (namely Eq. 21.114, with $\check U_E = 0$, as you may verify), we have
\[ \frac12\, m_{12}\,\big(\dot r^2 + \omega^2 r^2\big) + \frac12\,\kappa\,(r - r_U)^2 = E_0. \tag{21.143} \]
Finally, ω may be obtained from $r^2\,\omega = C$ (see Eq. 21.137), to obtain
\[ \frac12\, m_{12}\,\big(\dot r^2 + C^2/r^2\big) + \frac12\,\kappa\,(r - r_U)^2 = E_0. \tag{21.144} \]
Equation 21.144 is a first–order ordinary differential equation for the unknown r(t), whose solution may be obtained by using the method of separation of variables for ordinary differential equations (Subsection 10.5.1). This yields the function t = t(r) as
\[ t = \int_{r_0}^{r} \left[\frac{2\,E_0}{m_{12}} - \frac{\kappa}{m_{12}}\,(r - r_U)^2 - \frac{C^2}{r^2}\right]^{-1/2} dr. \tag{21.145} \]
This may be inverted to yield r = r(t).
• Comparison with the approach of Subsection 21.7.1
♥
In order to verify that the present approach is fully equivalent to the formulation of Subsection 21.7.1, let us start from Eq. 21.140 and replace $f_2$ with $-\kappa\,(r - r_U)$, to obtain $m_{12}\,(\ddot r - \omega^2 r) + \kappa\,(r - r_U) = 0$. Next, using $C = \omega\,r^2$ (Eq. 21.137), we have
\[ m_{12}\,\big(\ddot r - C^2/r^3\big) + \kappa\,(r - r_U) = 0, \tag{21.146} \]
namely the time derivative of Eq. 21.144 divided by $\dot r$.
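The equivalence between the momentum form (Eq. 21.146) and the energy form (Eq. 21.144) may also be seen numerically: integrating Eq. 21.146 in time, the left-hand side of Eq. 21.144 should stay constant. The sketch below uses a classical fourth-order Runge–Kutta step, which is a choice made here and not one prescribed by the text; the parameter values used in the check are hypothetical.

```python
def r_ddot(r, C, kappa, m12, r_U):
    """Eq. 21.146 solved for the radial acceleration."""
    return C * C / r ** 3 - (kappa / m12) * (r - r_U)


def energy(r, r_dot, C, kappa, m12, r_U):
    """Left-hand side of Eq. 21.144."""
    return 0.5 * m12 * (r_dot ** 2 + C * C / r ** 2) + 0.5 * kappa * (r - r_U) ** 2


def integrate(r, r_dot, C, kappa, m12, r_U, dt, steps):
    """Classical RK4 integration of Eq. 21.146; returns the list of (r, r_dot)."""
    def acc(x):
        return r_ddot(x, C, kappa, m12, r_U)

    history = [(r, r_dot)]
    for _ in range(steps):
        k1r, k1v = r_dot, acc(r)
        k2r, k2v = r_dot + 0.5 * dt * k1v, acc(r + 0.5 * dt * k1r)
        k3r, k3v = r_dot + 0.5 * dt * k2v, acc(r + 0.5 * dt * k2r)
        k4r, k4v = r_dot + dt * k3v, acc(r + dt * k3r)
        r += dt * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        r_dot += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        history.append((r, r_dot))
    return history
```

Evaluating `energy` along the returned history gives values that differ from the initial one only by the (small) integration error.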
• Small perturbation solution
♠
Here, we want to understand the essence of the phenomenon, by finding an approximate solution to Eq. 21.146, under the small–perturbation assumption. Specifically, we want to study the problem as a small perturbation around the steady–state solution (namely that with $\dot r = 0$).
Let us begin with the steady–state solution. The steady–state value for r, denoted by $r_S$, is obtained from Eq. 21.146, with $\ddot r = 0$. This gives
\[ \frac{C_S^2}{r_S^3} = \omega_V^2\,(r_S - r_U), \tag{21.147} \]
where
\[ \omega_V^2 = \kappa/m_{12} \tag{21.148} \]
is the square of the frequency of vibration of the mass–spring system when C = 0 (namely when $\omega_S = 0$), whereas (use Eq. 21.137)
\[ C_S = \omega_S\,r_S^2. \tag{21.149} \]
[In order for the phenomenon to be stationary, we have necessarily $r_0 = r_S$.] Combining Eqs. 21.149 and 21.147 yields $\omega_S^2\,r_S = \omega_V^2\,(r_S - r_U)$. Thus, for a given value $\omega_S$, we have
\[ r_S = \frac{r_U}{1 - \omega_S^2/\omega_V^2}. \tag{21.150} \]
Next, let us perturb Eq. 21.146 around this solution, by setting
\[ r(t) = r_S + u(t). \tag{21.151} \]
To make our life simple, we choose to have $C = C_S$, and $\omega_0 = \omega_S$. This implies $r_0 = r_S$, and hence u(0) = 0. Accordingly, the perturbation is obtained by imposing $\dot u(0) \neq 0$.
Using $(1 + x)^{-3} = 1 - 3x + O[x^2]$ (Eq. 13.26), one obtains
\[ \frac{1}{(r_S + u)^3} = \frac{1}{r_S^3}\,\big(1 + u/r_S\big)^{-3} = \frac{1}{r_S^3}\,\Big[1 - 3\,u/r_S + O\big(u^2/r_S^2\big)\Big]. \tag{21.152} \]
Then, combining with Eq. 21.146, using Eq. 21.147 (so that the constant terms offset each other), and assuming $u/r_S \ll 1$ (so as to eliminate the nonlinear terms), we have
\[ \frac{d^2 u}{dt^2} + \omega^2\,u = 0, \tag{21.153} \]
where $\omega^2 := \omega_V^2 + 3\,C_S^2/r_S^4$, or (use Eq. 21.149 to eliminate $C_S^2$)
\[ \omega^2 = \omega_V^2 + 3\,\omega_S^2. \tag{21.154} \]
Equation 21.153 tells us that the two particles undergo harmonic oscillations, with frequency $\omega = \sqrt{\omega_V^2 + 3\,\omega_S^2}$ (recall that $\omega_0 = \omega_S$), around the steady–state distance $r_S$, provided of course that $u(t) \ll r_S$.
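The linearized frequency of Eq. 21.154 can be compared with the period of the full nonlinear equation (Eq. 21.146). The sketch below, offered only as an illustration, starts from the steady state (Eqs. 21.149 and 21.150) with a small initial radial velocity, integrates with a fourth-order Runge–Kutta step (a choice made here), and measures the time of the first upward zero crossing of u(t). The function name and the default parameter values are hypothetical.

```python
import math


def small_oscillation_period(omega_S, omega_V, r_U, u_dot0=1e-3, dt=1e-3, t_max=100.0):
    """Period of u(t) = r(t) - r_S for Eq. 21.146 (divided by m12), found numerically."""
    r_S = r_U / (1.0 - omega_S ** 2 / omega_V ** 2)    # Eq. 21.150
    C = omega_S * r_S ** 2                              # Eq. 21.149

    def acc(r):
        return C * C / r ** 3 - omega_V ** 2 * (r - r_U)

    r, v, t = r_S, u_dot0, 0.0
    while t < t_max:
        k1r, k1v = v, acc(r)
        k2r, k2v = v + 0.5 * dt * k1v, acc(r + 0.5 * dt * k1r)
        k3r, k3v = v + 0.5 * dt * k2v, acc(r + 0.5 * dt * k2r)
        k4r, k4v = v + dt * k3v, acc(r + dt * k3r)
        r_new = r + dt * (k1r + 2.0 * k2r + 2.0 * k3r + k4r) / 6.0
        v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        u_old, u_new = r - r_S, r_new - r_S
        if u_old < 0.0 <= u_new:
            # linear interpolation of the upward zero crossing, reached after one period
            return t + dt * (-u_old) / (u_new - u_old)
        r, v, t = r_new, v_new, t + dt
    raise RuntimeError("no complete oscillation found before t_max")
```

For a small initial velocity the measured period should be close to $2\pi/\sqrt{\omega_V^2 + 3\,\omega_S^2}$; the agreement degrades as `u_dot0` grows and the neglected nonlinear terms become important.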
21.7.3 The Kepler laws revisited
♥
In Subsection 20.8.3, we studied the Kepler laws, under the assumption that the star may be considered to be fixed (Remark 151, p. 838). In this subsection, we revisit that analysis and remove such an assumption.
◦ Comment. Here, we consider two celestial bodies, having respectively masses m and M, without making any assumption on the ratio m/M. However, we are again assuming that the two bodies may be treated as mass points, so that the gravitational field is spherically symmetric. [The considerations in Remark 154, p. 849, apply here as well.]
We may follow two approaches. In the first, we could extend the approach used in Subsection 20.8.3 to obtain the Kepler laws. Alternatively, we could proceed as we did in the preceding subsection, and modify the corresponding analysis of two masses connected by a spring. We have shown that the two approaches yield the same result (Eq. 21.146). For completeness, both of them are presented in the two subsubsections that follow.
• First approach (momentum equation)
Let us begin with the extension of the analysis used in studying the Kepler laws (Subsection 20.8.3). Combining the Newton second law (Eq. 20.3), with the Newton law of gravitation (Eq. 20.142), and denoting by M and m, respectively, the mass of the first and the second particle, we have
\[ M\,\frac{d^2\mathbf x_1}{dt^2} = \frac{G\,m\,M}{r^3}\,\mathbf r \qquad\text{and}\qquad m\,\frac{d^2\mathbf x_2}{dt^2} = -\frac{G\,m\,M}{r^3}\,\mathbf r, \tag{21.155} \]
where $\mathbf r = \mathbf x_2 - \mathbf x_1$.
Adding these equations yields $M\,\ddot{\mathbf x}_1 + m\,\ddot{\mathbf x}_2 = (M + m)\,\ddot{\mathbf x}_G = \mathbf 0$. Therefore, the center of mass moves in uniform rectilinear motion. On the other hand, dividing the first in Eq. 21.155 by M and the second by m, and subtracting the results, we have
\[ \frac{d^2\mathbf r}{dt^2} = -\frac{G\,(M + m)}{r^3}\,\mathbf r. \tag{21.156} \]
What a nice surprise! Except for the fact that M is replaced by M + m, Eq. 21.156 is equal to Eq. 20.153, which is the basis of the whole analysis that leads to the Kepler laws. Therefore, the Kepler laws may be generalized to the case under consideration simply by replacing M with M + m. [It should be emphasized that here however r is measured from the center of gravity of the two bodies, instead of the center of the star.] This implies that the second law (Eq. 20.163) remains unchanged and may be written as
\[ r^2\,\omega = C, \tag{21.157} \]
with $\omega = \dot\varphi$. The first law, $r\,(1 + \epsilon\cos\varphi) = \ell$ (Eq. 20.169, with $\epsilon$ the eccentricity and $\ell$ the semi–latus rectum), remains formally unchanged. However, instead of $\ell = C^2/(GM)$ (Eq. 20.170), now we have
\[ \ell := \frac{b^2}{a} = \frac{C^2}{G\,(M + m)}. \tag{21.158} \]
Finally, consider the third law. In the case of elliptical orbits, we still have that Eq. 20.164 may be integrated over a complete period to yield 2A = CT, with A = πab, so that T = 2πab/C. However, using Eq. 21.158 to eliminate b/C, we have
\[ T^2 = \frac{4\,\pi^2 a^3}{G\,(M + m)}. \tag{21.159} \]
This is still in agreement with the Kepler third law, which says that the square of the orbital period of a planet is proportional to the cube of the semi–major axis of its orbit (Subsection 20.8.3). Now however, for each planet we have a (slightly) different constant of proportionality.
Remark 160. It may be noted that the solar mass is $M_S \simeq (1.988\,47 \pm 0.000\,07)\cdot 10^{30}$ kg, and is 333,030 times the mass of the Earth, $M_E \simeq 5.972\,19\cdot 10^{24}$ kg (p. 1003). In other words, the uncertainty on the mass of the Sun is much larger than the mass of the Earth. Accordingly, the more correct expression for the period does not yield any appreciable difference in the case of the Earth. In this case, the assumption that the Sun is fixed is very reasonable. On the other hand, the new expression makes a difference, albeit ever so small, in the case of Jupiter, since the solar mass is “only” 1,047 times the mass of Jupiter (p. 1003). Even worse, we have $M_M = 73.48\cdot 10^{21}$ kg, so that the Moon to Earth mass ratio is $M_M/M_E \simeq 0.012\,3$ (p. 1003).
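The size of the correction discussed in Remark 160 can be estimated directly from Eq. 21.159. In the sketch below, the value of the gravitational constant G is an assumption (it is not quoted in this section), the solar mass and the mass ratios are those of Remark 160, and the function names are hypothetical. Since $T \propto (M + m)^{-1/2}$, the relative change of the period with respect to the fixed–star result is approximately $-m/(2M)$.

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2; assumed value, not quoted in the text
M_SUN = 1.98847e30   # kg, from Remark 160


def orbital_period(a, M, m=0.0):
    """Eq. 21.159: T = 2 pi sqrt(a^3 / (G (M + m))), with a in meters."""
    return 2.0 * math.pi * math.sqrt(a ** 3 / (G * (M + m)))


def relative_period_change(mass_ratio):
    """(T with M + m) / (T with M alone) - 1, for a given ratio M/m."""
    return 1.0 / math.sqrt(1.0 + 1.0 / mass_ratio) - 1.0
```

For instance, `relative_period_change(333030)` (Earth) is of the order of $10^{-6}$ in magnitude, whereas `relative_period_change(1047)` (Jupiter) is a few parts in $10^{4}$. As a plausibility check, `orbital_period(149.60e9, M_SUN, M_SUN / 333030) / 86400` should come out close to one year in days, if the average Sun–Earth distance quoted in Appendix A is used as the semi–major axis (an approximation).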
• Second approach (energy equation)
Next, let us address the same problem by using the approach employed to study two masses connected by a spring (Subsection 21.7.2). Using the angular momentum equation, we had obtained that $r^2\,\omega = r_0^2\,\omega_0$ (Eq. 21.137). This result is valid here as well and is equal to Eq. 21.157 (Kepler second law).
On the other hand, using the energy equation, we had obtained Eq. 21.143. Now, the potential energy of the spring, $\tfrac12\,\kappa\,(r - r_U)^2$, must be replaced with the potential energy of the gravitational field, namely $\check U_I(r) = -G\,m\,M/r$ (Eq. 20.120). Accordingly, the energy equation that we had in Subsection 21.7.2 (Eq. 21.143) must be replaced with
\[ \check T + \check U_I = \frac12\, m_{12}\,\big(\dot r^2 + \omega^2 r^2\big) - \frac{G\,m\,M}{r} = E_0, \tag{21.160} \]
where $m_{12} = m\,M/(m + M)$ (Eq. 21.129).
Differentiating Eq. 21.160 with respect to t (with $\omega = C/r^2$, so that $\omega^2 r^2 = C^2/r^2$) and dividing the result by $\dot r$, one obtains $m_{12}\,(\ddot r - \omega^2 r) + G\,m\,M/r^2 = 0$, namely
\[ \ddot r - \omega^2\,r + G\,(M + m)\,\frac{1}{r^2} = 0, \tag{21.161} \]
with $\omega = C/r^2$ (Eq. 21.157). The above equation is equal to Eq. 20.161 (namely $\ddot r - \dot\varphi^2\,r = -GM/r^2$), provided of course that M is replaced by M + m. Hence, the conclusions of the preceding subsubsection are confirmed.
21.8 Appendix A. Sun–Earth–Moon dynamics
♥♥♥
In this appendix, we up the ante, and present an analysis of the motion of a planet and its satellite around a star, with emphasis on the Sun–Earth–Moon system. Let us start with some simple considerations. We know that:
1. The mass of the Sun is much larger than that of the Earth–Moon system, which allows us to treat the Sun as if it were fixed.
2. The Earth–Moon distance, r, is much smaller than the Sun–Earth distance, $R_G$, namely
\[ \varepsilon := \frac{r}{R_G} \ll 1. \tag{21.162} \]
These facts allow us to use the following gut–level approach. First, we consider the center of mass of the Earth–Moon system and study its motion around the Sun (an ellipse, the Sun being one of the foci, fixed because of Item 1). Then, we can study the motion of the Earth–Moon system around their center of mass. In doing this we can ignore the presence of the Sun, because the moment arising from the non–uniformity of the gravitational field is negligible, since the non–uniformity is small because of Item 2.
One would expect the error introduced by this gut–level approach to go to zero with ε, specifically to be of order ε. This turns out to be too conservative an estimate. The error is even smaller! In this section, we show that the error is of order $\varepsilon^2$.
• Restrictive hypotheses
The following analysis is based upon these restrictive hypotheses:
1. The mass of the star M is much larger than $m_1 + m_2$, where $m_1$ and $m_2$ denote, respectively, the mass of the planet and of its satellite. For the Sun–Earth–Moon system, this assumption is definitely valid. Indeed, we have $(M_E + M_M)/M_S \simeq 1.012\,3\, M_E/M_S \simeq 1.012\,3/333{,}060 \simeq 3.039\cdot 10^{-6}$ (p. 1003).
2. The orbits of the satellite around the planet and that of the planet around the star are coplanar. For the Sun–Earth–Moon system in particular, this assumption is quite reasonable. Indeed, the inclination between the Earth orbit and the Moon orbit is about 5° (p. 1003). [Incidentally, the Moon orbit is closer to the plane of the Earth orbit than to the Earth equatorial plane. This is unusual among satellite–planet configurations.] This hypothesis is introduced only for the sake of simplicity. You may want to see what happens if this hypothesis is removed (see Subsection 21.7.1, on the cats landing on their feet).
3. The orbit, around the star, of the center of mass of the planet–satellite system is assumed to be circular. This hypothesis is also introduced only for the sake of simplicity. For the Sun–Earth system in particular, this assumption is very reasonable. Indeed, denoting by a and b the semi–major and semi–minor axes of the Earth elliptical orbit, we have $(a - b)/a \simeq 0.000\,140$ (p. 1003; Remark 152, p. 840).
4. The orbit of the satellite around the planet (more precisely around the center of mass of the planet–satellite system, Subsection 21.7.3 on the Kepler laws revisited) is approximately circular. Again, this hypothesis is introduced only for the sake of simplicity. For the Earth–Moon system in particular, this assumption is also quite reasonable. Indeed, denoting by a and b the semi–major and semi–minor axes of the elliptical orbit of the Moon around the Earth, we have $(a - b)/a \simeq 0.001\,508$ (p. 1003; Remark 152, p. 840, again).
5. The distance planet–satellite is much smaller than the distance star–planet. This hypothesis is essential for the small–perturbation analysis presented in this appendix. For the Sun–Earth–Moon system in particular, this assumption is quite reasonable. Indeed, for the average distance Sun–Earth we have $r_{SE} \simeq 149.60\cdot 10^{6}$ km, whereas for the average distance Earth–Moon we have $r_{EM} \simeq 384{,}400$ km (p. 1003). Thus,
\[ \varepsilon := r_{EM}/r_{SE} \simeq 2.57\cdot 10^{-3}. \tag{21.163} \]
6. As in all the preceding analyses, the three celestial bodies are treated as point particles, so that their gravitational fields are spherically symmetric.
◦ Comment. We do not assume $m_2 \ll m_1$. Hence, the formulation is applicable for instance to a double planet.
In view of Hypothesis (1) above, we can assume the star to be fixed (Subsection 21.7.3). Specifically, we can place the origin of our frame of reference at the center of the star, so that
\[ \mathbf x_S = \mathbf 0. \tag{21.164} \]
As a consequence, the complexity of the problem is considerably reduced: we have a two–body problem (the planet and the satellite), in the presence of a prescribed force field, namely the (time–independent) potential gravitational field that the star exerts on a celestial body having mass $m_k$ (be it the planet or the satellite). For the problem under consideration, the forces are given by (see Eq. 20.145, as well as Fig. 21.2)
\[ \mathbf{f}_k^S = -G M\, \frac{m_k}{R_k^3}\, \mathbf{x}_k \qquad (k = 1, 2), \tag{21.165} \]
where $\mathbf{x}_k$ is the vector from the star (origin) to the celestial body under consideration, and $R_k = \|\mathbf{x}_k\|$. [In order to distinguish them from the planet–satellite distance, which is denoted by $r$, the distances from the star are denoted by $R_k$.]
Fig. 21.2 Star, planet and satellite
Moreover, in view of Hypothesis (5) above, we hope to be able to assume that, at any given time, planet and satellite (in particular, Earth and Moon) are subject to parallel gravitational forces so that the resultant moment around their center of mass may be considered small.
As mentioned above, our hope is that we can apply the theory of Section 20.8 (one body in a prescribed central force field) to the motion of the center of mass of the planet–satellite system, whereas for the motion around the corresponding center of mass we expect to be able to apply the formulation of the Subsection 21.7.3, on the Kepler laws revisited. Specifically, we expect that, neglecting higher–order terms, the center of gravity of the planet–satellite system moves in a circular orbit around the star, whereas the planet and the satellite move in circular orbits around their combined center of mass. The objective of this appendix is to find out to what extent this is true. Specifically, in the analysis that follows, we modify the analysis of Subsection 21.7.3, so as to take into account the fact that the presence of the external forces due to the star affects all the equations governing the phenomenon. In particular, we address: (i) equation of the motion of the center of mass of the planet–satellite system, (ii) angular momentum equation, and (iii) the energy equation.
• Motion of $\mathbf{x}_G$
Let us begin with the formulation for the motion of the center of mass of the planet–satellite system. For this problem, we can use $m\,\dot{\mathbf{v}}_G = \mathbf{f}_E$ (Eq. 21.50), which in our case reads
\[ m\, \frac{d\mathbf{v}_G}{dt} = \mathbf{f}_1^S + \mathbf{f}_2^S \qquad (m = m_1 + m_2), \tag{21.166} \]
where $\mathbf{f}_k^S$ is given by Eq. 21.165. Next, let us consider what would happen if the right side of the equation were to be equal to
\[ \mathbf{f}_G^S := -\frac{G M m}{R_G^3}\, \mathbf{x}_G \qquad (R_G := \|\mathbf{x}_G\|), \tag{21.167} \]
that is, to the gravitational force acting on a body having mass $m = m_1 + m_2$, placed at the center of mass $\mathbf{x}_G$. In this case, Eq. 21.50 would be equivalent to Eq. 20.153 and we could conclude that the center of mass moves around the star like a body having a mass equal to $m = m_1 + m_2$. [In general, such a motion is elliptical. However, by Hypothesis (3), p. 896, for the sake of simplicity we assume the motion to be circular.] Unfortunately, $\mathbf{f}_1^S + \mathbf{f}_2^S \neq \mathbf{f}_G^S$. However, in view of Hypothesis (5), p. 896, we expect this difference to be small. So, let us find out how far we are from the situation $\mathbf{f}_1^S + \mathbf{f}_2^S - \mathbf{f}_G^S = \mathbf{0}$. We have (use Eq. 21.165)
\[ \mathbf{f}_1^S + \mathbf{f}_2^S = -\frac{G M m_1}{R_1^3}\, \mathbf{x}_1 - \frac{G M m_2}{R_2^3}\, \mathbf{x}_2 \qquad (R_k := \|\mathbf{x}_k\|). \tag{21.168} \]
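Here, we want to evaluate the difference between Eqs. 21.167 and 21.168. Setting
\[ \Delta\mathbf{f} := \mathbf{f}_1^S + \mathbf{f}_2^S - \mathbf{f}_G^S, \tag{21.169} \]
and using $m\,\mathbf{x}_G = m_1 \mathbf{x}_1 + m_2 \mathbf{x}_2$ (Eq. 21.46), we have
\[
\begin{aligned}
\frac{-\Delta\mathbf{f}}{G M} &= \frac{m_1 \mathbf{x}_1}{R_1^3} + \frac{m_2 \mathbf{x}_2}{R_2^3} - \frac{m\, \mathbf{x}_G}{R_G^3} \\
&= m_1 \mathbf{x}_1 \left( \frac{1}{R_1^3} - \frac{1}{R_G^3} \right) + m_2 \mathbf{x}_2 \left( \frac{1}{R_2^3} - \frac{1}{R_G^3} \right).
\end{aligned} \tag{21.170}
\]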
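Next, let us consider the Taylor polynomial for $1/R_1^\alpha$, with $R_1 := \|\mathbf{x}_1\|$ (Eq. 21.168), which we shall need for $\alpha = 1$ and $\alpha = 3$. Recall that we have $\mathbf{x}_1 - \mathbf{x}_G = m_2 (\mathbf{x}_1 - \mathbf{x}_2)/m$ and $\mathbf{x}_2 - \mathbf{x}_G = m_1 (\mathbf{x}_2 - \mathbf{x}_1)/m$ (Eq. 21.127). These may be written as
\[ \mathbf{x}_1 = \mathbf{x}_G - \mu_2\, \mathbf{r} \qquad \text{and} \qquad \mathbf{x}_2 = \mathbf{x}_G + \mu_1\, \mathbf{r}, \tag{21.171} \]
where
\[ \mathbf{r} := \mathbf{x}_2 - \mathbf{x}_1 \qquad \text{and} \qquad \mu_k = \frac{m_k}{m} \quad (k = 1, 2). \tag{21.172} \]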
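Accordingly, setting $r = \|\mathbf{r}\|$, using the first in Eq. 21.171 as well as $(1+x)^\alpha = 1 + \alpha x + O[x^2]$ (Eq. 13.26), we have
\[
\begin{aligned}
\frac{1}{R_1^\alpha} &= \|\mathbf{x}_G - \mu_2 \mathbf{r}\|^{-\alpha} = \left( R_G^2 - 2 \mu_2\, \mathbf{r} \cdot \mathbf{x}_G + \mu_2^2 r^2 \right)^{-\alpha/2} \\
&= \frac{1}{R_G^\alpha} \left[ 1 - 2 \mu_2\, \mathbf{r} \cdot \mathbf{x}_G / R_G^2 + O[\varepsilon^2] \right]^{-\alpha/2} \\
&= \frac{1}{R_G^\alpha} \left( 1 + \varepsilon\, \alpha\, \mu_2 \cos\chi \right) + O[\varepsilon^2],
\end{aligned} \tag{21.173}
\]
with
\[ \varepsilon = \frac{r}{R_G} \qquad \text{and} \qquad \cos\chi := \frac{\mathbf{r}}{r} \cdot \frac{\mathbf{x}_G}{R_G}, \tag{21.174} \]
where $\chi$ is the angle between $\mathbf{r}$ and $\mathbf{x}_G$ (see again Fig. 21.2).
In particular, for $\alpha = 3$, we have $1/R_1^3 = (1 + 3\, \varepsilon\, \mu_2 \cos\chi)/R_G^3 + O[\varepsilon^2]$, namely
\[ \frac{1}{R_1^3} - \frac{1}{R_G^3} = \frac{3 \varepsilon}{R_G^3}\, \mu_2 \cos\chi + O[\varepsilon^2]. \tag{21.175} \]
Similarly, for $1/R_2^3$, one obtains (use the second in Eq. 21.171, instead of the first) $1/R_2^3 = \|\mathbf{x}_G + \mu_1 \mathbf{r}\|^{-3} = (1 - 3\, \mu_1\, \varepsilon \cos\chi)/R_G^3 + O[\varepsilon^2]$, namely
\[ \frac{1}{R_2^3} - \frac{1}{R_G^3} = -\frac{3 \varepsilon}{R_G^3}\, \mu_1 \cos\chi + O[\varepsilon^2]. \tag{21.176} \]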
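Hence, combining Eqs. 21.170, 21.175 and 21.176, one obtains
\[
\begin{aligned}
\frac{-\Delta\mathbf{f}}{G M} &= \frac{3}{R_G^3} \left( m_1 \mu_2\, \mathbf{x}_1 - m_2 \mu_1\, \mathbf{x}_2 \right) \cos\chi\; \varepsilon + O[\varepsilon^2] \\
&= \frac{3\, m_{12}}{R_G^3} \left( \mathbf{x}_1 - \mathbf{x}_2 \right) \cos\chi\; \varepsilon + O[\varepsilon^2] = -\varepsilon^2\, \frac{3\, m_{12}}{R_G^2}\, \frac{\mathbf{r}}{r} \cos\chi + O[\varepsilon^3].
\end{aligned} \tag{21.177}
\]
[Hint: Use $\varepsilon = r/R_G$ (Eq. 21.162), and
\[ m_1 \mu_2 = m_2 \mu_1 = m_{12}, \tag{21.178} \]
which is obtained by combining $m_{12} := m_1 m_2/m$ (Eq. 21.129) and $\mu_k = m_k/m$ (Eq. 21.172).] Thus, if $\varepsilon \ll 1$, the term $\Delta\mathbf{f}$ is of order $\varepsilon^2$, and hence clearly negligible. This is definitely acceptable for the Sun–Earth–Moon system. Indeed, we have $\varepsilon := r_{EM}/r_{SE} \simeq 2.57 \cdot 10^{-3}$ (Eq. 21.163; Hypothesis (5), p. 896), and hence
\[ \varepsilon^2 \simeq 6.60 \cdot 10^{-6}. \tag{21.179} \]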
Then, neglecting terms of order ε2 , the center of mass moves along an elliptical orbit, which we assumed to be circular (Hypothesis (3), p. 896), definitely acceptable for the Sun–Earth–Moon system.
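As a rough numerical illustration of the order–of–magnitude claim above, the following sketch (not part of the original analysis) evaluates $\Delta\mathbf{f}$ directly from Eqs. 21.165, 21.167 and 21.169, without any expansion, and compares $\|\Delta\mathbf{f}\|/\|\mathbf{f}_G^S\|$ with $\varepsilon^2$. The numerical values of $GM$, $m_1$, $m_2$ and the chosen angle $\chi$ are assumed, Sun–Earth–Moon–like inputs, and the function names are hypothetical.

```python
import math

# Assumed, illustrative data in SI units (only the ratios matter here).
GM = 1.327e20                 # G times the mass of the star
m1, m2 = 5.972e24, 7.342e22   # planet and satellite masses
R_G, r = 149.60e9, 384.4e6    # star-barycenter and planet-satellite distances

def star_force(mk, x):
    """Force of the star (at the origin) on a point mass mk at x (Eq. 21.165)."""
    R = math.hypot(x[0], x[1])
    return (-GM * mk * x[0] / R**3, -GM * mk * x[1] / R**3)

def delta_f_ratio(chi):
    """Return |f1 + f2 - fG| / |fG| for an angle chi between r and xG."""
    m = m1 + m2
    mu1, mu2 = m1 / m, m2 / m
    xG = (R_G, 0.0)
    rv = (r * math.cos(chi), r * math.sin(chi))
    x1 = (xG[0] - mu2 * rv[0], xG[1] - mu2 * rv[1])   # Eq. 21.171
    x2 = (xG[0] + mu1 * rv[0], xG[1] + mu1 * rv[1])
    f1, f2, fG = star_force(m1, x1), star_force(m2, x2), star_force(m, xG)
    df = (f1[0] + f2[0] - fG[0], f1[1] + f2[1] - fG[1])   # Eq. 21.169
    return math.hypot(df[0], df[1]) / math.hypot(fG[0], fG[1])

eps = r / R_G
ratio = delta_f_ratio(math.radians(30.0))
print(f"eps^2 = {eps**2:.3e},  |Delta f|/|fG| = {ratio:.3e}")
```

With these inputs the ratio should come out well below $\varepsilon^2$, because the coefficient of $\varepsilon^2$ contains the factor $m_{12}/m = \mu_1 \mu_2$, which is small when one of the two bodies dominates.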
• Angular momentum Here, we consider the angular momentum equation. With respect to the formulation on the Kepler laws revisited (Subsection 21.7.3), we have to take into account the moment mG due to the Sun–generated gravitational forces. As mentioned at the beginning of this section, if the ratio ε = r/RG (Eq. 21.162) tends to zero, the force field acting on the Earth–Moon system tends to become uniform and mG(t) tends to zero. In such a case, for the motion around the center of mass of the Earth–Moon system, we may neglect the presence of the Sun.
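Here, we want to find out how small $\mathbf{m}_G$ is. We have
\[ \mathbf{m}_G = (\mathbf{x}_1 - \mathbf{x}_G) \times \mathbf{f}_1^S + (\mathbf{x}_2 - \mathbf{x}_G) \times \mathbf{f}_2^S = -\mathbf{r} \times \left( \mu_2\, \mathbf{f}_1^S - \mu_1\, \mathbf{f}_2^S \right). \tag{21.180} \]
[Hint: Use $\mathbf{x}_1 - \mathbf{x}_G = -\mu_2 \mathbf{r}$ and $\mathbf{x}_2 - \mathbf{x}_G = \mu_1 \mathbf{r}$ (Eq. 21.171).] Also, we have (use Eqs. 21.165 and 21.175)
\[ \mathbf{f}_1^S = -G M\, \frac{m_1}{R_1^3}\, \mathbf{x}_1 = -G M\, \frac{m_1}{R_G^3}\, \mathbf{x}_1 \left( 1 + 3\, \varepsilon\, \mu_2 \cos\chi \right) + O[\varepsilon^2]. \tag{21.181} \]
Similarly, we have $\mathbf{f}_2^S = -G M m_2\, \mathbf{x}_2\, (1 - 3\, \varepsilon\, \mu_1 \cos\chi)/R_G^3 + O[\varepsilon^2]$ (use Eqs. 21.165 and 21.176). Combining with Eq. 21.180, one obtains
\[
\begin{aligned}
\mathbf{m}_G &= \frac{G M\, m_{12}}{R_G^3}\, \mathbf{r} \times \left[ \mathbf{x}_1 \left( 1 + 3\, \varepsilon\, \mu_2 \cos\chi \right) - \mathbf{x}_2 \left( 1 - 3\, \varepsilon\, \mu_1 \cos\chi \right) \right] + O[\varepsilon^2] \\
&= \varepsilon^2\, 3\, \frac{G M}{R_G^2}\, m_{12}\, \frac{\mathbf{r}}{r} \times \mathbf{x}_G \cos\chi + O[\varepsilon^3],
\end{aligned} \tag{21.182}
\]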
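as you may verify. [Hint: For the first equality, use $m_1 \mu_2 = m_2 \mu_1 = m_{12}$ (Eq. 21.178). For the second equality, use $\mathbf{r} \times \mathbf{r} = \mathbf{0}$ (Eq. 15.61) to show that the low–order terms offset each other. In addition, note that (use $\mathbf{x}_1 = \mathbf{x}_G - \mu_2 \mathbf{r}$ and $\mathbf{x}_2 = \mathbf{x}_G + \mu_1 \mathbf{r}$, Eq. 21.171, and $\mu_1 + \mu_2 = 1$, Eq. 21.172)
\[ \mu_2 \mathbf{x}_1 + \mu_1 \mathbf{x}_2 = \mu_2 (\mathbf{x}_G - \mu_2 \mathbf{r}) + \mu_1 (\mathbf{x}_G + \mu_1 \mathbf{r}) = \mathbf{x}_G - (\mu_2^2 - \mu_1^2)\, \mathbf{r}, \tag{21.183} \]
and then use $\mathbf{r} \times \mathbf{r} = \mathbf{0}$ (Eq. 15.61) and $\varepsilon = r/R_G$ (Eq. 21.162).] Again, if $\varepsilon \ll 1$, the term $\mathbf{m}_G$ is definitely negligible, as we expected (for the Sun–Earth–Moon system, see Eq. 21.179). Then, if we neglect this term, we are back to the formulation of Subsection 21.7.3. Specifically, we have that $r^2 \omega$ remains constant in time (Eq. 21.157).
◦ Comment. Let us get a cleaner expression for $\mathbf{m}_G$. Note that
\[ \frac{\mathbf{r}}{r} \times \frac{\mathbf{x}_G}{R_G} = \sin\chi\; \mathbf{n}, \tag{21.184} \]
where $\mathbf{n}$ is normal to the plane of the motion (Hypothesis (2), p. 896), whereas $\chi$ is the angle between $\mathbf{r}$ and $\mathbf{x}_G$ (as in Eq. 21.174). Then, using $\sin\chi \cos\chi = \tfrac{1}{2} \sin 2\chi$ (Eq. 7.64), we have
\[ \mathbf{m}_G = \varepsilon^2\, \frac{3}{2}\, \frac{G M\, m_{12}}{R_G}\, \sin 2\chi\; \mathbf{n} + O[\varepsilon^3]. \tag{21.185} \]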
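The leading–order expression in Eq. 21.185 can be compared with the moment computed directly from the first expression in Eq. 21.180. The sketch below does this in the orbital plane, with $\mathbf{n}$ taken along the $z$-axis; the data are assumed Sun–Earth–Moon–like values, and all function names are hypothetical. It is meant only as a plausibility check of the formula, not as a replacement for the derivation.

```python
import math

# Assumed, illustrative Sun-Earth-Moon-like data (SI units).
GM = 1.327e20                 # G times the mass of the star
m1, m2 = 5.972e24, 7.342e22   # planet and satellite masses
R_G, r = 149.60e9, 384.4e6    # star-barycenter and planet-satellite distances

def cross_z(a, b):
    """z-component of a x b for two vectors lying in the orbital plane."""
    return a[0] * b[1] - a[1] * b[0]

def star_force(mk, x):
    """Force exerted by the star (at the origin) on mass mk at x (Eq. 21.165)."""
    R = math.hypot(x[0], x[1])
    return (-GM * mk * x[0] / R**3, -GM * mk * x[1] / R**3)

def moment_exact(chi):
    """Moment about the center of mass, first expression in Eq. 21.180."""
    m = m1 + m2
    mu1, mu2 = m1 / m, m2 / m
    xG = (R_G, 0.0)
    # r is placed so that (r/r) x (xG/R_G) = sin(chi) n, with n along +z.
    rv = (r * math.cos(chi), -r * math.sin(chi))
    d1 = (-mu2 * rv[0], -mu2 * rv[1])    # x1 - xG  (Eq. 21.171)
    d2 = (mu1 * rv[0], mu1 * rv[1])      # x2 - xG
    x1 = (xG[0] + d1[0], xG[1] + d1[1])
    x2 = (xG[0] + d2[0], xG[1] + d2[1])
    return cross_z(d1, star_force(m1, x1)) + cross_z(d2, star_force(m2, x2))

def moment_leading_order(chi):
    """Leading-order term of Eq. 21.185 (component along n)."""
    eps = r / R_G
    m12 = m1 * m2 / (m1 + m2)
    return eps**2 * 1.5 * GM * m12 / R_G * math.sin(2.0 * chi)

chi = math.radians(45.0)
exact, approx = moment_exact(chi), moment_leading_order(chi)
rel_err = abs(exact - approx) / abs(approx)
print(f"exact = {exact:.6e}, leading order = {approx:.6e}, rel. error = {rel_err:.2e}")
```

The relative discrepancy is expected to be of order $\varepsilon$, consistent with the $O[\varepsilon^3]$ remainder in Eq. 21.185.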
• Energy
The last item we have to check is how the presence of the star affects the energy equation for the planet–satellite system. Using $d\check{T}/dt = P_I + P_E - \mathbf{f}_E \cdot \mathbf{v}_G$ (Eq. 21.90) with $\mathbf{f}_E = \mathbf{f}_1^S + \mathbf{f}_2^S$, we obtain
\[ \frac{d\check{T}}{dt} = P_I + P_E - (\mathbf{f}_1^S + \mathbf{f}_2^S) \cdot \mathbf{v}_G. \tag{21.186} \]
The term $P_E - (\mathbf{f}_1^S + \mathbf{f}_2^S) \cdot \mathbf{v}_G$ was not present in the analysis presented in the second approach in Subsection 21.7.3 (the Kepler laws revisited), because there we had no external forces (see Eq. 21.160). Here, we want to show that such a term is of order $\varepsilon^2$. Consider $U_E$. We have (Eqs. 21.99 and 20.120)
\[ U_E = -\frac{G M m_1}{R_1} - \frac{G M m_2}{R_2}. \tag{21.187} \]
Using $1/R_1 = (1 + \varepsilon \mu_2 \cos\chi)/R_G + O[\varepsilon^2]$ and $1/R_2 = (1 - \varepsilon \mu_1 \cos\chi)/R_G + O[\varepsilon^2]$ (Eq. 21.173 with $\alpha = 1$), along with $m_1 \mu_2 = m_2 \mu_1 = m_{12}$ (Eq. 21.178), we obtain
\[
\begin{aligned}
U_E &= -\frac{G M}{R_G} \left[ (m_1 + \varepsilon\, m_{12} \cos\chi) + (m_2 - \varepsilon\, m_{12} \cos\chi) \right] + O[\varepsilon^2] \\
&= -\frac{G M m}{R_G} + O[\varepsilon^2].
\end{aligned} \tag{21.188}
\]
Hence, using $P_E = -dU_E/dt$ (Eq. 21.100), and $\mathbf{f}_G^S = -G M m\, \mathbf{x}_G/R_G^3$ (Eq. 21.167), we have
\[ P_E = -\frac{dU_E}{dt} = -\frac{G M m}{R_G^3}\, \mathbf{x}_G \cdot \mathbf{v}_G + O[\varepsilon^2] = \mathbf{f}_G^S \cdot \mathbf{v}_G + O[\varepsilon^2]. \tag{21.189} \]
Thus, using $\Delta\mathbf{f} := \mathbf{f}_1^S + \mathbf{f}_2^S - \mathbf{f}_G^S = O[\varepsilon^2]$ (Eq. 21.177), we obtain
\[ P_E - (\mathbf{f}_1^S + \mathbf{f}_2^S) \cdot \mathbf{v}_G = \left[ \mathbf{f}_G^S - (\mathbf{f}_1^S + \mathbf{f}_2^S) \right] \cdot \mathbf{v}_G + O[\varepsilon^2] = O[\varepsilon^2]. \tag{21.190} \]
Hence, once again, if $\varepsilon \ll 1$, the term $P_E - (\mathbf{f}_1^S + \mathbf{f}_2^S) \cdot \mathbf{v}_G$ is negligible. [This is definitely true for the Sun–Earth–Moon system (Eq. 21.179).] Hence, we are back to the energy formulation of Subsection 21.7.3. Thus, we can say that the orbits of Earth and Moon around the center of mass are elliptical, which we assumed to be circular (definitely acceptable for the Sun–Earth–Moon system, see Hypothesis (4), p. 896). In other words, $r$ remains constant in time, so that $\omega(t) = \omega(0)$ and $v = \omega r = v(0)$. [Use $r^2 \omega = C$ (Eq. 21.157).]
• Concluding remarks In summary, for the Earth–Moon system, the effect of the presence of the Sun on the equations of momentum, angular momentum and energy is of order ε2 = (r/RG)2 , and can therefore be neglected. If we do this, we can say that, within the approximations introduced at the beginning of this appendix, the center of gravity of the planet–satellite system moves in a circular orbit around the star, and that the planet and the satellite move in a circular orbit around their center of gravity.
21.9 Appendix B. Potential energy for n particles ♠
In Subsection 21.6.3, we have presented the formulation for the potential energy of external and internal forces exchanged between two particles. The approach presented there for two particles is basically still valid in the presence of additional particles. Here, for the sake of completeness, we present the extension to n particles. For the sake of simplicity, here we limit ourselves to the time–independent formulation. [For the time–dependent potential fields, see Remark 149, p. 830.]
21.9.1 Conservative external forces ♠
Here, we consider external forces. For any differentiable function $g(\mathbf{x}_1, \dots, \mathbf{x}_n)$ of several variables, we’ll use the notation introduced in Eq. 21.95, namely
\[ \mathrm{grad}_h\, g(\mathbf{x}_1, \dots, \mathbf{x}_n) := \frac{\partial g}{\partial x_h}\, \mathbf{i} + \frac{\partial g}{\partial y_h}\, \mathbf{j} + \frac{\partial g}{\partial z_h}\, \mathbf{k} =: \frac{\partial g}{\partial \mathbf{x}_h}. \tag{21.191} \]
As in Eq. 21.96, we assume that for the external force acting on the h-th particle there exists a time–independent potential energy $U_h^E(\mathbf{x}_h)$ such that (using the notation introduced in Eq. 21.96)
\[ \mathbf{f}_h^E(\mathbf{x}_h) = -\mathrm{grad}_h\, U_h^E(\mathbf{x}_h) = -\frac{dU_h^E}{d\mathbf{x}_h}, \tag{21.192} \]
where we have stated explicitly that $U_h^E$ is only a function of $\mathbf{x}_h$, and not of $\mathbf{x}_j$, with $j \neq h$. Next, we introduce the potential energy for all the external forces as
\[ U_E(\mathbf{x}_1, \dots, \mathbf{x}_n) := \sum_{k=1}^{n} U_k^E(\mathbf{x}_k). \tag{21.193} \]
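Accordingly, we have (use Eqs. 21.192 and 21.193)
\[ \mathrm{grad}_h\, U_E(\mathbf{x}_1, \dots, \mathbf{x}_n) = \frac{\partial U_E}{\partial \mathbf{x}_h} = \frac{dU_h^E}{d\mathbf{x}_h} = -\mathbf{f}_h^E(\mathbf{x}_h). \tag{21.194} \]
In addition, for $W_E(t_a, t_b)$, defined in Eq. 21.79, we have (in analogy with Eq. 21.109)
\[
\begin{aligned}
W_E(t_a, t_b) &:= \sum_{k=1}^{n} \int_{\mathcal{L}_k(t_a, t_b)} \mathbf{f}_k^E \cdot d\mathbf{x}_k = -\sum_{k=1}^{n} \int_{\mathcal{L}_k(t_a, t_b)} \frac{dU_k^E}{d\mathbf{x}_k} \cdot d\mathbf{x}_k \\
&= -\sum_{k=1}^{n} \left[ U_k^E\big(\mathbf{x}_k(t_b)\big) - U_k^E\big(\mathbf{x}_k(t_a)\big) \right] = U_E(t_a) - U_E(t_b),
\end{aligned} \tag{21.195}
\]
where we have set $U_E(t) = U_E[\mathbf{x}_1(t), \dots, \mathbf{x}_n(t)]$. Again, we obtain that time–independent potential fields are conservative. Finally, Eq. 21.194 implies
\[ \frac{dU_E}{dt} = -P_E, \tag{21.196} \]
where
\[ P_E = \sum_{h=1}^{n} \mathbf{f}_h^E \cdot \mathbf{v}_h \tag{21.197} \]
denotes the power generated by the forces $\mathbf{f}_h^E$ ($h = 1, \dots, n$). Indeed, we have
\[ \frac{dU_E}{dt} = \sum_{h=1}^{n} \frac{dU_h^E}{d\mathbf{x}_h} \cdot \frac{d\mathbf{x}_h}{dt} = -\sum_{h=1}^{n} \mathbf{f}_h^E \cdot \mathbf{v}_h = -P_E. \tag{21.198} \]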
21.9.2 Conservative internal forces ♠
Here, we consider internal forces. Note that, given two particles, say $P_h$ and $P_k$, we have (in analogy with Eq. 21.106)
\[ \mathrm{grad}_h\, r_{hk} = \frac{\mathbf{x}_h - \mathbf{x}_k}{r_{hk}} =: \mathbf{e}_{hk} = -\mathbf{e}_{kh} = -\frac{\mathbf{x}_k - \mathbf{x}_h}{r_{hk}} = -\mathrm{grad}_k\, r_{hk}. \tag{21.199} \]
Next, assume again that we are dealing with time–independent potential forces and that there exists a function $U_{hk}(r_{hk})$, with $r_{hk} = \|\mathbf{x}_h - \mathbf{x}_k\|$, such that (use Eq. 21.199)
\[ \mathrm{grad}_h\, U_{hk}(r_{hk}) = \frac{dU_{hk}}{dr_{hk}}\, \mathrm{grad}_h\, r_{hk} = \frac{dU_{hk}}{dr_{hk}}\, \mathbf{e}_{hk} = -\mathbf{f}_{hk}. \tag{21.200} \]
Thus, setting $\mathbf{f}_{hk} = f_{hk}\, \mathbf{e}_{hk}$, the potential energy equals the primitive of $-f_{hk}$. Of course, the Newton law of action and reaction (Eqs. 17.59 and 17.61) requires that
\[ U_{kh}(r_{hk}) := U_{hk}(r_{hk}). \tag{21.201} \]
Indeed, this yields (use Eq. 21.199)
\[ \mathbf{f}_{kh} = -\mathrm{grad}_k\, U_{kh}(r_{hk}) = \frac{dU_{hk}}{dr_{hk}}\, \mathbf{e}_{hk} = -f_{hk}\, \mathbf{e}_{hk} = -\mathbf{f}_{hk}. \tag{21.202} \]
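The internal force $\mathbf{f}_{hk}$ is called potential, whereas $U_{hk}(r_{hk})$ is called the potential energy for the forces exchanged between particles h and k. Next, let us consider the work $W_{hk}$ done by the two internal forces, $\mathbf{f}_{hk}$ and $\mathbf{f}_{kh}$, as $\mathbf{x}_h$ moves along the trajectory $\mathcal{L}_h(t_a, t_b)$, going from $\mathbf{x}_h(t_a)$ to $\mathbf{x}_h(t_b)$, whereas $\mathbf{x}_k$ moves along the trajectory $\mathcal{L}_k(t_a, t_b)$, going from $\mathbf{x}_k(t_a)$ to $\mathbf{x}_k(t_b)$. Using $\mathbf{e}_{hk} \cdot (d\mathbf{x}_h - d\mathbf{x}_k) = dr_{hk} = dr_{kh}$ (analogous to Eq. 21.108 for two particles) and $U_{kh} = U_{hk}$ (Eq. 21.201), we have
\[
\begin{aligned}
W_{hk} &= \int_{\mathcal{L}_h(t_a, t_b)} \mathbf{f}_{hk} \cdot d\mathbf{x}_h + \int_{\mathcal{L}_k(t_a, t_b)} \mathbf{f}_{kh} \cdot d\mathbf{x}_k \\
&= -\int_{\mathcal{L}_h(t_a, t_b)} \frac{dU_{hk}}{dr_{hk}}\, \mathbf{e}_{hk} \cdot d\mathbf{x}_h - \int_{\mathcal{L}_k(t_a, t_b)} \frac{dU_{kh}}{dr_{kh}}\, \mathbf{e}_{kh} \cdot d\mathbf{x}_k \\
&= -\int_{r_{hk}(t_a)}^{r_{hk}(t_b)} \frac{dU_{hk}}{dr_{hk}}\, dr_{hk} = U_{hk}\big[r_{hk}(t_a)\big] - U_{hk}\big[r_{hk}(t_b)\big],
\end{aligned} \tag{21.203}
\]
where
\[ U_{hk}(r_{hk}) = U_{hk}(\|\mathbf{x}_h - \mathbf{x}_k\|) := -\int_{r_{hk0}}^{r_{hk}} f_{hk}\, dr_{hk} = \int_{r_{hk0}}^{r_{hk}} \frac{dU_{hk}}{dr_{hk}}\, dr_{hk} \tag{21.204} \]
equals minus the work done by both forces from a reference location. Finally, let us introduce the definition
\[ U_I(\mathbf{x}_1, \dots, \mathbf{x}_n) := \frac{1}{2} \sum_{h,k=1}^{n} U_{hk}(\|\mathbf{x}_h - \mathbf{x}_k\|), \tag{21.205} \]
where $U_{hk}(\|\mathbf{x}_h - \mathbf{x}_k\|) = U_{kh}(\|\mathbf{x}_h - \mathbf{x}_k\|)$. [The factor $\tfrac{1}{2}$ on the right side must be included, because each pair of particles appears twice in the sum, whereas it should be counted only once. Moreover, the term corresponding to $h = k$ is included for the sake of notational simplicity, but is assumed to vanish (Eq. 17.10).] Then, we have (use Eqs. 21.200 and 21.202 and $\mathbf{f}_h^I = \sum_{k=1}^{n} \mathbf{f}_{hk}$, Eq. 21.2)
\[ \mathrm{grad}_h\, U_I(\mathbf{x}_1, \dots, \mathbf{x}_n) = -\sum_{k=1}^{n} \mathbf{f}_{hk} = -\mathbf{f}_h^I. \tag{21.206} \]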
Moreover, the work done by all the internal forces between ta and tb is given by (use Eq. 21.80, as well as Eqs. 21.203 and 21.205) WI(ta , tb ) = UI(ta ) − UI(tb ),
(21.207)
where we have set UI(t) = UI[x1 (t), . . . , xn (t)].
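The relation $\mathrm{grad}_h\, U_I = -\mathbf{f}_h^I$ (Eq. 21.206) lends itself to a simple numerical check. The following sketch assumes, purely for illustration, a Newtonian pair potential $U_{hk} = -m_h m_k/r_{hk}$ (with $G = 1$) and three particles with made–up masses and positions; it compares a central–difference gradient of $U_I$ (Eq. 21.205) with the sum of the pair forces (Eq. 21.200). The function names are hypothetical.

```python
import math

def pair_potential(mh, mk, dist):
    """Assumed example: Newtonian pair potential U_hk(r_hk), with G = 1."""
    return -mh * mk / dist

def internal_potential(masses, xs):
    """U_I = (1/2) * sum over h != k of U_hk(|x_h - x_k|)  (Eq. 21.205)."""
    total = 0.0
    for h in range(len(xs)):
        for k in range(len(xs)):
            if h != k:
                d = math.dist(xs[h], xs[k])
                total += 0.5 * pair_potential(masses[h], masses[k], d)
    return total

def internal_force(masses, xs, h):
    """f_h^I = sum_k f_hk, with f_hk = -(dU_hk/dr_hk) e_hk  (Eq. 21.200)."""
    f = [0.0, 0.0, 0.0]
    for k in range(len(xs)):
        if k != h:
            d = math.dist(xs[h], xs[k])
            dU_dr = masses[h] * masses[k] / d**2      # derivative of -m_h m_k / r
            for i in range(3):
                f[i] -= dU_dr * (xs[h][i] - xs[k][i]) / d
    return f

def numerical_gradient(masses, xs, h, step=1e-6):
    """Central-difference approximation of grad_h U_I."""
    grad = []
    for i in range(3):
        plus = [list(x) for x in xs]
        minus = [list(x) for x in xs]
        plus[h][i] += step
        minus[h][i] -= step
        dU = internal_potential(masses, plus) - internal_potential(masses, minus)
        grad.append(dU / (2.0 * step))
    return grad

masses = [1.0, 2.0, 0.5]                                      # made-up values
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.3, 0.8, -0.5)]     # made-up positions
for h in range(3):
    print(h, numerical_gradient(masses, xs, h), internal_force(masses, xs, h))
```

For each particle, the printed gradient and force should be opposite to each other, up to the finite–difference error.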
21.9.3 Energy conservation for conservative fields ♠
In analogy with Theorem 231, p. 882, on the conservative–field energy conservation for two particles, we have the following
Theorem 232 (Energy conservation for conservative fields). If all the fields (external and internal) are conservative, the sum of kinetic energy and potential energy remains constant in time
\[ T(t) + U(t) = \text{constant}, \tag{21.208} \]
with
\[ U(t) = U_E(t) + U_I(t), \tag{21.209} \]
where $U_E(t)$ and $U_I(t)$ denote respectively the potential energy of external and internal forces.
◦ Proof : Combining $T(t_b) - T(t_a) = W_E(t_a, t_b) + W_I(t_a, t_b)$ (Eq. 21.78), with Eqs. 21.195 and 21.207, we have
\[ T(t_b) - T(t_a) = U_E(t_a) - U_E(t_b) + U_I(t_a) - U_I(t_b), \tag{21.210} \]
which is equivalent to Eq. 21.208.
Chapter 22
Dynamics in non–inertial frames
In this chapter, we examine how three–dimensional single–particle dynamics (Chapter 20) may be formulated in a frame of reference that is not inertial. We also present a conceptual experiment to determine whether a frame of reference is inertial or not.
• Overview of this chapter In Section 22.1, we consider two frames of reference, with a common point, say the origin O. We assume that one frame is inertial, whereas the other rotates with respect to the first around the origin, and hence is not inertial. Then, we show that in this case, at any given time, in the moving frame there exists an instantaneous axis of rotation, namely a straight line whose points do not move. Next, in Section 22.2, we introduce the Poisson formulas, which give the time derivative of the base vectors of the moving frame of reference, in terms of its angular velocity ω. These allow us to obtain, still for two frames with common origins, the relationship between the velocity of a point P for an observer in the fixed frame and the velocity of P for an observer in the moving frame, as well as the corresponding relationship for the acceleration. Next, in Section 22.3, we remove the assumption that the two origins coincide, thereby extending the formulation of Section 22.2 to a frame of reference in arbitrary motion with respect to an inertial one. Specifically, we obtain the expressions for the acceleration of P for an observer in the fixed frame, in terms of (i) the acceleration of P for an observer in the moving frame, (ii) the acceleration of the moving–frame origin, (iii) the so–called Euler acceleration, (iv) the so–called centripetal acceleration, and (v) the so–called Coriolis acceleration.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_22
Then, in Section 22.4, in the same vein as in the d’Alembert principle of inertial forces (Eq. 20.58), we move some of the mass–times–acceleration terms to the right side of the equation and treat them as (apparent) inertial forces. All of us have already experienced (apparent) inertial forces in daily life. For instance, we have all experienced the sensation of increase (decrease) in weight that occurs when an elevator suddenly starts to move upward (downward), or the (apparent) side force when we are in a car running in circles. As already pointed out in Remark 94, p. 457, these forces are only apparent, in the sense that they do not exist if we formulate the problem in an inertial frame of reference. They are caused by the fact that the frame of reference rigidly connected to the object we are in (elevator, car) is not inertial. Still in Section 22.4, several additional illustrative examples (qualitative and quantitative) are presented, including the Foucault pendulum. Finally, the study of the motion in these frames of reference is used in Section 22.5 to provide some clarification to the discussion presented in Subsection 11.2.1, where we addressed the question “What is an inertial frame of reference?” Indeed, the best way to appreciate the reason for using inertial frames of reference is by discussing what happens if we do not do so. This allows us to determine whether a frame is inertial or not, at least conceptually. We also have two appendices. In the first (Section 22.6), we present a fairly detailed analysis of tides. In the second (Section 22.7), we obtain the time derivative of the rotation matrix R(t).
22.1 Frames in relative motion with a common point
Consider two frames of reference with a common origin O, the first of which is inertial, whereas the second moves in an arbitrary three–dimensional rigid–body rotation around O, making it non–inertial. We will refer to the first as the fixed frame, the second as the moving frame. Let $\mathbf{i}_k$ denote the orthonormal base vectors in the fixed frame, and $\mathbf{j}_k(t)$ the orthonormal base vectors in the moving frame. Then, for an arbitrary vector $\mathbf{b}$, we introduce the following notation
\[ \mathbf{b} = b_1 \mathbf{i}_1 + b_2 \mathbf{i}_2 + b_3 \mathbf{i}_3 = b'_1\, \mathbf{j}_1 + b'_2\, \mathbf{j}_2 + b'_3\, \mathbf{j}_3. \tag{22.1} \]
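In the fixed space, a generic point P is identified by the vector
\[ \mathbf{x}(t) = x_1(t)\, \mathbf{i}_1 + x_2(t)\, \mathbf{i}_2 + x_3(t)\, \mathbf{i}_3, \tag{22.2} \]
whereas in the moving space it is identified by the vector
\[ \mathbf{x}(t) = x'_1\, \mathbf{j}_1(t) + x'_2\, \mathbf{j}_2(t) + x'_3\, \mathbf{j}_3(t), \tag{22.3} \]
with $x'_k$ time–independent. Recall that (Eq. 15.23)
\[ \mathbf{i}_h \cdot \mathbf{i}_k = \delta_{hk} \qquad \text{and} \qquad \mathbf{j}_h \cdot \mathbf{j}_k = \delta_{hk}. \tag{22.4} \]
Accordingly, we have
\[ x_h(t) = \mathbf{i}_h \cdot \mathbf{x}(t) = \mathbf{i}_h \cdot \mathbf{j}_1(t)\, x'_1 + \mathbf{i}_h \cdot \mathbf{j}_2(t)\, x'_2 + \mathbf{i}_h \cdot \mathbf{j}_3(t)\, x'_3. \tag{22.5} \]
Thus, setting $\mathsf{x}(t) = [x_1(t), x_2(t), x_3(t)]^T$ and $\mathsf{x}' = [x'_1, x'_2, x'_3]^T$, we have (as in Eq. 16.67)
\[ \mathsf{x}(t) = \mathsf{R}(t)\, \mathsf{x}', \tag{22.6} \]
where, in analogy with Eq. 15.106, $\mathsf{R}(t)$ is defined by
\[ \mathsf{R}(t) = [r_{hk}] = [\mathbf{i}_h \cdot \mathbf{j}_k] = \begin{bmatrix} \mathbf{i}_1 \cdot \mathbf{j}_1 & \mathbf{i}_1 \cdot \mathbf{j}_2 & \mathbf{i}_1 \cdot \mathbf{j}_3 \\ \mathbf{i}_2 \cdot \mathbf{j}_1 & \mathbf{i}_2 \cdot \mathbf{j}_2 & \mathbf{i}_2 \cdot \mathbf{j}_3 \\ \mathbf{i}_3 \cdot \mathbf{j}_1 & \mathbf{i}_3 \cdot \mathbf{j}_2 & \mathbf{i}_3 \cdot \mathbf{j}_3 \end{bmatrix}. \tag{22.7} \]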
Note that the elements of the h-th row of $\mathsf{R}(t)$ coincide with the components of $\mathbf{i}_h$ in the moving basis. Hence, the rows of $\mathsf{R}(t)$ are mutually orthonormal. Therefore, $\mathsf{R}(t)$ is an orthogonal matrix (Theorem 168, p. 642). [Similarly, the k-th column of $\mathsf{R}(t)$ consists of the components of $\mathbf{j}_k$ in the fixed basis.] Indeed, $\mathsf{R}(t)$ is the most general orthogonal matrix in three dimensions. We will refer to it as a rotation matrix. For future reference, note that, if $\mathbf{j}_k(0) = \mathbf{i}_k$, then Eq. 22.6 implies $\mathsf{R}(0) = \mathsf{I}$, and $\mathsf{x}'$ coincides with the value $\mathsf{x}_0 = \mathsf{x}(0)$.
In the remainder of this section, first we show that, for any finite rotation $\mathsf{R}(t)$, there exists an axis of rotation, namely a line that at time t has the same position it had at time t = 0. [Of course, this line varies with t.] Then, we introduce the notion of instantaneous axis of rotation, which is used in Section 22.2 to obtain the Poisson formulas, which give the expression of the time derivatives of the base vectors $\mathbf{j}_k(t)$.
• Axis of rotation
For a generic rotation, we have the following
Theorem 233 (Axis of rotation). Let $\mathsf{R}$ denote an orthogonal matrix that represents a three–dimensional finite–amplitude rigid–body rotation around a fixed point, say the origin O. Then, there exists a unique line through O (called the axis of rotation) that does not move.
◦ Proof : To prove this, it is sufficient to show that there exists a nontrivial solution $\mathsf{z} \neq \mathsf{0}$ such that
\[ \mathsf{R}\, \mathsf{z} = \mathsf{z}, \tag{22.8} \]
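and that the solution is uniquely defined (except for a multiplicative constant, of course). [For, this implies that any point on the line $\mathsf{x} = \alpha\, \mathsf{z}$ is not affected by the rotation.] We begin by showing that Eq. 22.8 admits a nontrivial solution, namely that $|\mathsf{R} - \mathsf{I}|$ vanishes. To this end, note that
\[
\begin{aligned}
|\mathsf{R} - \mathsf{I}| &= \left| \begin{bmatrix} \mathbf{i}_1 \cdot \mathbf{j}_1 & \mathbf{i}_1 \cdot \mathbf{j}_2 & \mathbf{i}_1 \cdot \mathbf{j}_3 \\ \mathbf{i}_2 \cdot \mathbf{j}_1 & \mathbf{i}_2 \cdot \mathbf{j}_2 & \mathbf{i}_2 \cdot \mathbf{j}_3 \\ \mathbf{i}_3 \cdot \mathbf{j}_1 & \mathbf{i}_3 \cdot \mathbf{j}_2 & \mathbf{i}_3 \cdot \mathbf{j}_3 \end{bmatrix} - \begin{bmatrix} \mathbf{i}_1 \cdot \mathbf{i}_1 & \mathbf{i}_1 \cdot \mathbf{i}_2 & \mathbf{i}_1 \cdot \mathbf{i}_3 \\ \mathbf{i}_2 \cdot \mathbf{i}_1 & \mathbf{i}_2 \cdot \mathbf{i}_2 & \mathbf{i}_2 \cdot \mathbf{i}_3 \\ \mathbf{i}_3 \cdot \mathbf{i}_1 & \mathbf{i}_3 \cdot \mathbf{i}_2 & \mathbf{i}_3 \cdot \mathbf{i}_3 \end{bmatrix} \right| \\
&= \begin{vmatrix} \mathbf{i}_1 \cdot (\mathbf{j}_1 - \mathbf{i}_1) & \mathbf{i}_1 \cdot (\mathbf{j}_2 - \mathbf{i}_2) & \mathbf{i}_1 \cdot (\mathbf{j}_3 - \mathbf{i}_3) \\ \mathbf{i}_2 \cdot (\mathbf{j}_1 - \mathbf{i}_1) & \mathbf{i}_2 \cdot (\mathbf{j}_2 - \mathbf{i}_2) & \mathbf{i}_2 \cdot (\mathbf{j}_3 - \mathbf{i}_3) \\ \mathbf{i}_3 \cdot (\mathbf{j}_1 - \mathbf{i}_1) & \mathbf{i}_3 \cdot (\mathbf{j}_2 - \mathbf{i}_2) & \mathbf{i}_3 \cdot (\mathbf{j}_3 - \mathbf{i}_3) \end{vmatrix}.
\end{aligned} \tag{22.9}
\]
[Hint: For the first equality, use $\mathsf{R} = [\mathbf{i}_h \cdot \mathbf{j}_k]$ (Eq. 22.7), and $\mathsf{I} = [\delta_{hk}] = [\mathbf{i}_h \cdot \mathbf{i}_k]$ (Eq. 22.4). For the second equality, use $[b_{hk}] + [c_{hk}] = [b_{hk} + c_{hk}]$ (Eq. 3.16).] Next, note that the k-th column comprises the components of $\mathbf{j}_k - \mathbf{i}_k$ in the inertial frame. Therefore, we have
\[
\begin{aligned}
|\mathsf{R} - \mathsf{I}| &= (\mathbf{j}_1 - \mathbf{i}_1) \cdot (\mathbf{j}_2 - \mathbf{i}_2) \times (\mathbf{j}_3 - \mathbf{i}_3) \\
&= \mathbf{j}_1 \cdot \mathbf{j}_2 \times \mathbf{j}_3 - \mathbf{i}_1 \cdot \mathbf{j}_2 \times \mathbf{j}_3 - \mathbf{i}_2 \cdot \mathbf{j}_3 \times \mathbf{j}_1 - \mathbf{i}_3 \cdot \mathbf{j}_1 \times \mathbf{j}_2 \\
&\quad + \mathbf{j}_1 \cdot \mathbf{i}_2 \times \mathbf{i}_3 + \mathbf{j}_2 \cdot \mathbf{i}_3 \times \mathbf{i}_1 + \mathbf{j}_3 \cdot \mathbf{i}_1 \times \mathbf{i}_2 - \mathbf{i}_1 \cdot \mathbf{i}_2 \times \mathbf{i}_3 \\
&= 1 - \mathbf{i}_1 \cdot \mathbf{j}_1 - \mathbf{i}_2 \cdot \mathbf{j}_2 - \mathbf{i}_3 \cdot \mathbf{j}_3 + \mathbf{j}_1 \cdot \mathbf{i}_1 + \mathbf{j}_2 \cdot \mathbf{i}_2 + \mathbf{j}_3 \cdot \mathbf{i}_3 - 1 = 0.
\end{aligned} \tag{22.10}
\]
[Hint: For the first equality, use the expression for the triple scalar product in terms of a determinant involving Cartesian components (Eq. 15.95). For the second equality, expand the products. For the third equality, use $\mathbf{i}_1 \times \mathbf{i}_2 = \mathbf{i}_3$, $\mathbf{i}_2 \times \mathbf{i}_3 = \mathbf{i}_1$ and $\mathbf{i}_3 \times \mathbf{i}_1 = \mathbf{i}_2$ (Eq. 15.73) and similar ones for the $\mathbf{j}_k$’s.]
Regarding the uniqueness of the axis, note that, if a space has two lines that do not move, the whole space is not moving. Accordingly, we cannot have more than one non–trivial solution to Eq. 22.8 (axis of rotation), unless $\mathsf{R} = \mathsf{I}$, in which case the moving frame of reference is not moving!
◦ Comment. It may be noted that a three–dimensional rotation around a given axis may be conceived as a two–dimensional rotation, in the sense that, if we introduce a new orthonormal basis with one of the base vectors (say, $\mathbf{i}_1$) in the direction of $\mathsf{z}$, the matrix $\mathsf{R}$ necessarily assumes the form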
\[ \mathsf{R}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}. \tag{22.11} \]
[Hint: Indeed, in such a frame of reference, we have $\mathsf{z} = [1, 0, 0]^T$. Hence the relationship $\mathsf{R}\,\mathsf{z} = \mathsf{z}$ determines the first column; the other two columns follow necessarily from the mutual orthonormality of the columns of an orthogonal matrix (Theorem 168, p. 642). From a kinematic point of view, $\theta$ denotes the angle of rotation around the direction determined by $\mathsf{z}$.]
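The statements of Theorem 233 can be illustrated numerically. The sketch below builds a rotation matrix about an arbitrarily chosen unit vector by means of the Rodrigues formula (a standard construction that is not used in the text above, adopted here only to generate an example), and then checks that $|\mathsf{R} - \mathsf{I}|$ vanishes and that the axis recovered from the antisymmetric part of $\mathsf{R}$ satisfies $\mathsf{R}\,\mathsf{z} = \mathsf{z}$. The axis, the angle and the function names are hypothetical.

```python
import math

def rotation_matrix(axis, theta):
    """Rotation by theta about the unit vector `axis` (Rodrigues formula)."""
    nx, ny, nz = axis
    c, s = math.cos(theta), math.sin(theta)
    K = [[0.0, -nz, ny], [nz, 0.0, -nx], [-ny, nx, 0.0]]   # cross-product matrix
    n = [nx, ny, nz]
    return [[c * (h == k) + s * K[h][k] + (1.0 - c) * n[h] * n[k]
             for k in range(3)] for h in range(3)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def rotation_axis(R):
    """Unit vector along the axis, from the antisymmetric part of R.

    The unnormalized components equal 2 sin(theta) n, so this is
    usable only when sin(theta) is not zero.
    """
    v = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]

axis = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]      # hypothetical unit vector
R = rotation_matrix(axis, 0.7)
R_minus_I = [[R[h][k] - (h == k) for k in range(3)] for h in range(3)]
z = rotation_axis(R)
Rz = [sum(R[h][k] * z[k] for k in range(3)) for h in range(3)]
print("det(R - I) =", det3(R_minus_I))
print("axis z     =", z)
print("R z        =", Rz)
```

For an axis along the first base vector, `rotation_matrix([1.0, 0.0, 0.0], theta)` reduces to the matrix in Eq. 22.11.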
• More on the uniqueness of the axis of rotation ♥
Here, we discuss the uniqueness of the axis of rotation by using an analytical approach. Assume that we have obtained an axis of rotation, say $\mathsf{z}$. We want to address the possibility of a second one. Let us choose the x-axis in the direction of $\mathsf{z}$ (namely, of the axis of rotation that we obtained). Accordingly, we have $\mathsf{z} = [1, 0, 0]^T$, and $\mathsf{R}$ equals $\mathsf{R}_1$ (Eq. 22.11). Next, we want to find out whether
\[ \left( \mathsf{R}_1 - \mathsf{I} \right) \mathsf{z} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \cos\theta - 1 & -\sin\theta \\ 0 & \sin\theta & \cos\theta - 1 \end{bmatrix} \mathsf{z} = \mathsf{0} \tag{22.12} \]
admits a second nontrivial solution. The last two equations in Eq. 22.12 contain only two unknowns and the corresponding determinant,
\[ \begin{vmatrix} \cos\theta - 1 & -\sin\theta \\ \sin\theta & \cos\theta - 1 \end{vmatrix} = (\cos\theta - 1)^2 + \sin^2\theta = 2\, (1 - \cos\theta), \tag{22.13} \]
equals zero only for $\theta = 2k\pi$, in which case $\cos\theta = 1$. Hence, for $\theta \neq 2k\pi$ we have $z_2 = z_3 = 0$ and $\mathsf{z} = [1, 0, 0]^T$ is the only nontrivial solution for $\theta \neq 0$. [Looking at it from a different angle, if $\theta \neq 0$, the rank of $\mathsf{R}_1 - \mathsf{I}$ is two.] On the other hand, if $\theta = 0$, Eq. 22.11 yields $\mathsf{R}_1 = \mathsf{I}$ (no rotation), and hence any vector satisfies Eq. 22.8.
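• Successive rotations
Consider two successive rigid–body rotations. In mathematical terms, this may be expressed as follows: if $\mathsf{x}_k = \mathsf{R}_{(k \leftarrow h)}\, \mathsf{x}_h$ represents the rigid–body rotation from the configuration h to the configuration k, then, combining $\mathsf{x}_1 = \mathsf{R}_{(1 \leftarrow 0)}\, \mathsf{x}_0$ and $\mathsf{x}_2 = \mathsf{R}_{(2 \leftarrow 1)}\, \mathsf{x}_1$, we have $\mathsf{x}_2 = \mathsf{R}_{(2 \leftarrow 0)}\, \mathsf{x}_0$, with
\[ \mathsf{R}_{(2 \leftarrow 0)} = \mathsf{R}_{(2 \leftarrow 1)}\, \mathsf{R}_{(1 \leftarrow 0)}. \tag{22.14} \]
◦ Comment. The matrix $\mathsf{R}_{(2 \leftarrow 0)}$ corresponds to a rigid–body rotation, with its own axis of rotation. Indeed, the product of orthogonal matrices is itself an orthogonal matrix (Theorem 167, p. 641).
Note that, if $\mathsf{x}_2$ coincides with $\mathsf{x}_0$, Eq. 22.14 yields
\[ \mathsf{R}_{(0 \leftarrow 1)}\, \mathsf{R}_{(1 \leftarrow 0)} = \mathsf{I}. \tag{22.15} \]
Therefore, we have (use $\mathsf{R}^{-1} = \mathsf{R}^T$, Eq. 16.64)
\[ \mathsf{R}_{(0 \leftarrow 1)} = \left[ \mathsf{R}_{(1 \leftarrow 0)} \right]^{-1} = \left[ \mathsf{R}_{(1 \leftarrow 0)} \right]^T. \tag{22.16} \]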
In plain language, the “return rotation matrix ” R(0←1) (namely the inverse of R(1←0) ) is the transpose of R(1←0) .
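The composition rule and the role of the transpose may be checked on a small example. The sketch below composes a rotation about the first axis with one about the third axis (the angles are arbitrary, assumed values), verifies that the product is again orthogonal, and verifies that the transpose of the first rotation undoes it. The helper names are hypothetical.

```python
import math

def rot_x(theta):
    """Rotation about the first axis (same form as Eq. 22.11)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(theta):
    """Rotation about the third axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[h][j] * B[j][k] for j in range(3)) for k in range(3)]
            for h in range(3)]

def transpose(A):
    return [[A[k][h] for k in range(3)] for h in range(3)]

def max_deviation_from_identity(A):
    """Largest entry of |A - I|, used as an orthogonality measure."""
    return max(abs(A[h][k] - (h == k)) for h in range(3) for k in range(3))

R10 = rot_x(0.4)            # configuration 0 -> 1 (assumed angle)
R21 = rot_z(1.1)            # configuration 1 -> 2 (assumed angle)
R20 = matmul(R21, R10)      # Eq. 22.14

# The product is again orthogonal, and the transpose acts as the return rotation.
dev_product = max_deviation_from_identity(matmul(R20, transpose(R20)))
dev_return = max_deviation_from_identity(matmul(transpose(R10), R10))   # Eqs. 22.15, 22.16
print(dev_product, dev_return)
```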
• Instantaneous axis of rotation
Consider, in particular, the rotation $\mathsf{R}_{(t' \leftarrow t)}$, between times $t$ and $t' = t + \Delta t$. In this case, Eq. 22.14 may be written as $\mathsf{R}_{(t' \leftarrow t_0)} = \mathsf{R}_{(t' \leftarrow t)}\, \mathsf{R}_{(t \leftarrow t_0)}$, which yields (use Eq. 22.16)
\[ \mathsf{R}_{(t' \leftarrow t)} = \mathsf{R}_{(t' \leftarrow t_0)}\, \mathsf{R}_{(t_0 \leftarrow t)}. \tag{22.17} \]
The matrix $\mathsf{R}_{(t' \leftarrow t)}$ corresponds to a rigid–body rotation, with its own axis of rotation. [Again, the product of orthogonal matrices is itself an orthogonal matrix (Theorem 167, p. 641).] Accordingly, we have the following
Definition 351 (Instantaneous axis of rotation). The limit, as $t'$ tends to $t$, of the axis of rotation of the matrix $\mathsf{R}_{(t' \leftarrow t)}$ is called the instantaneous axis of rotation. [Thus, the motion between times $t$ and $t + dt$ consists of a rigid–body rotation around the instantaneous axis of rotation at time $t$.]
22.2 Poisson formulas
We have the following
Theorem 234. Consider a vector $\mathbf{c}$ which is time–independent for an observer in the moving frame. Let $\mathbf{e}_A$ denote a unit vector directed along the instantaneous axis of rotation, and oriented so that the rotation by an angle $d\theta > 0$ is seen to occur in the counterclockwise direction. Then, we have
\[ \frac{d\mathbf{c}}{dt} = \boldsymbol{\omega} \times \mathbf{c}, \tag{22.18} \]
where
\[ \boldsymbol{\omega} = \omega\, \mathbf{e}_A, \tag{22.19} \]
with
\[ \omega = \frac{d\theta}{dt}. \tag{22.20} \]
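◦ Proof : Let $\mathbf{e}_\theta$ denote a unit vector normal to $\mathbf{e}_A$ and $\mathbf{c}$, pointed like $\mathbf{e}_A \times \mathbf{c}$ (Fig. 22.1.a), and $\chi$ denote the angle between $\mathbf{e}_A$ and $\mathbf{c}$ (Fig. 22.1.b).
Fig. 22.1 Poisson formulas: (a) Top view; (b) Side view
Then, we have
\[ \frac{d\mathbf{c}}{dt} := \lim_{\Delta t \to 0} \frac{\Delta\mathbf{c}}{\Delta t} = \lim_{\Delta t \to 0} \frac{\Delta\theta}{\Delta t}\, c \sin\chi\; \mathbf{e}_\theta = \boldsymbol{\omega} \times \mathbf{c}. \tag{22.21} \]
[Hint: For the second equality, use $\Delta\mathbf{c} \simeq c_N\, \Delta\theta\, \mathbf{e}_\theta$, with $\mathbf{e}_\theta$ normal to the plane determined by $\mathbf{e}_A$ and $\mathbf{c}$ (Fig. 22.1.a, where $\mathbf{e}_A$ is pointing towards your eyes), whereas $c_N = c \sin\chi$ (Fig. 22.1.b, where $\boldsymbol{\omega} \times \mathbf{c}$ is pointing towards your eyes). For the third equality, use Eqs. 22.19 and 22.20, along with the definition of the cross product, namely $\mathbf{e}_A \times \mathbf{c} = c \sin\chi\; \mathbf{e}_\theta$ (Eqs. 15.58 and 15.59).]
◦ Comment. Note that, contrary to Eq. 21.121, here we don’t have $\boldsymbol{\omega} \cdot \mathbf{e} = 0$ (Eq. 21.122).
As particular cases of Eq. 22.18, we have
\[ \frac{d\mathbf{j}_k}{dt} = \boldsymbol{\omega} \times \mathbf{j}_k \qquad (k = 1, 2, 3), \tag{22.22} \]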
which gives the time derivative of the base vectors jk (t) of the moving frame. These are called the Poisson formulas.1
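As an illustration of the Poisson formulas, the following sketch considers the simplest case, a rotation about a fixed axis $\mathbf{e}_A = \mathbf{i}_3$ with an assumed angle law $\theta(t) = t^2/2$ (so that $\omega = t$ is not constant). It compares a central–difference approximation of $d\mathbf{j}_k/dt$ with $\boldsymbol{\omega} \times \mathbf{j}_k$. The angle law and the function names are hypothetical choices made only for this example.

```python
import math

def theta(t):
    """Assumed rotation angle about the fixed axis e_A = i_3."""
    return 0.5 * t * t

def omega(t):
    """Angular velocity vector, omega = (d theta/dt) e_A  (Eqs. 22.19, 22.20)."""
    return (0.0, 0.0, t)

def base_vectors(t):
    """Moving-frame base vectors j_k(t), expressed in the fixed basis."""
    c, s = math.cos(theta(t)), math.sin(theta(t))
    return [(c, s, 0.0), (-s, c, 0.0), (0.0, 0.0, 1.0)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def poisson_residual(t, dt=1e-6):
    """Largest component of dj_k/dt - omega x j_k, using a central difference."""
    worst = 0.0
    before, now, after = base_vectors(t - dt), base_vectors(t), base_vectors(t + dt)
    for k in range(3):
        rhs = cross(omega(t), now[k])
        for i in range(3):
            lhs = (after[k][i] - before[k][i]) / (2.0 * dt)
            worst = max(worst, abs(lhs - rhs[i]))
    return worst

residual = poisson_residual(1.3)
print("max |dj_k/dt - omega x j_k| =", residual)
```

The residual should be at the level of the finite–difference error.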
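22.2.1 Transport theorem
We can now extend the results to a vector that is not seen as constant by an observer in the moving frame of reference. [We still assume O to be a point shared by the two frames.] We have the following
Theorem 235 (Transport theorem). For a generic vector $\mathbf{b}$, we have
\[ \frac{d\mathbf{b}}{dt} = \frac{\delta\mathbf{b}}{\delta t} + \boldsymbol{\omega} \times \mathbf{b}, \tag{22.23} \]
where $\boldsymbol{\omega}$ is defined in Eq. 22.19, whereas
\[ \frac{\delta\mathbf{b}}{\delta t} := \frac{db'_1}{dt}\, \mathbf{j}_1 + \frac{db'_2}{dt}\, \mathbf{j}_2 + \frac{db'_3}{dt}\, \mathbf{j}_3 \tag{22.24} \]
will be referred to as the time derivative in the moving frame. For, $\delta\mathbf{b}/\delta t$ is the variation perceived by observers in the moving frame, as they are not aware of the time–variation of $\mathbf{j}_k$.
◦ Proof : Setting $\mathbf{b} = b'_1 \mathbf{j}_1 + b'_2 \mathbf{j}_2 + b'_3 \mathbf{j}_3$, and using the derivative of a product, as well as Eq. 22.24 and the Poisson formulas (Eq. 22.22), we have
\[
\begin{aligned}
\frac{d\mathbf{b}}{dt} &= \frac{d}{dt} \left( b'_1 \mathbf{j}_1 + b'_2 \mathbf{j}_2 + b'_3 \mathbf{j}_3 \right) \\
&= \frac{db'_1}{dt}\, \mathbf{j}_1 + \frac{db'_2}{dt}\, \mathbf{j}_2 + \frac{db'_3}{dt}\, \mathbf{j}_3 + b'_1 \frac{d\mathbf{j}_1}{dt} + b'_2 \frac{d\mathbf{j}_2}{dt} + b'_3 \frac{d\mathbf{j}_3}{dt} \\
&= \frac{\delta\mathbf{b}}{\delta t} + b'_1\, \boldsymbol{\omega} \times \mathbf{j}_1 + b'_2\, \boldsymbol{\omega} \times \mathbf{j}_2 + b'_3\, \boldsymbol{\omega} \times \mathbf{j}_3,
\end{aligned} \tag{22.25}
\]
in agreement with Eq. 22.23.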
◦ Warning. The notation $\delta\mathbf{b}/\delta t$ is commonly used in the flight–dynamics literature (e.g., in Etkin, Ref. [21], p. 88, Eq. 4.4.1). It is adopted here because of its convenience.
1 Siméon Denis Poisson (1781–1840) was a French mathematician and physicist. His name is also connected with some items addressed in Vol. III, namely the Poisson equation $\nabla^2 u = f(\mathbf{x})$, the Poisson ratio in the Hooke law for three–dimensional elasticity, the Poisson integral formula for the solution of the Dirichlet problem for the Laplace equation over a disk, and the Poisson integrals for the integral form of Helmholtz decomposition.
22.3 Frames of reference in arbitrary motion In this section, we generalize the results of the preceding one, by removing the assumption that the origins of the two frames of reference coincide. In particular, we obtain the expressions for velocity and acceleration for an observer in the fixed frame, in terms of velocity and acceleration for an observer in the moving frame. To begin with, note the difference between: (i) “a point of the moving frame” and (ii) “a point in the moving frame.” Specifically, we have the following Definition 352. A point of the moving frame is a point that is rigidly connected to such a frame. A point in the moving frame is a point that can move with respect to an observer in such a frame. First, we address the issue for a point of the moving frame (Subsection 22.3.1). Then, we address the issue for a point in the moving frame (Subsection 22.3.2).
22.3.1 Point of the moving frame Again, let ik and jk (t) (k = 1, 2, 3) denote respectively the orthonormal bases in the fixed and moving frames. In addition, let O and Q(t) denote the origins of the fixed and moving frames, respectively (Fig. 22.2).
Fig. 22.2 Moving frame
Consider a point P of the moving frame (again, a point that is rigidly connected to such a frame). Let x(t) denote the vector from O to P , xQ(t) the vector from O to Q, and z(t) the vector from Q to P (Fig. 22.2 again).
Accordingly, we have
\[ \mathbf{x}(t) = \mathbf{x}_Q(t) + \mathbf{z}(t). \tag{22.26} \]
Note that, for an observer in the moving frame of reference, $\mathbf{z}$ is a time–independent vector (Eq. 22.24). Thus, we have
\[ \frac{\delta\mathbf{z}}{\delta t} = \mathbf{0}, \tag{22.27} \]
and hence (use the transport theorem, Eq. 22.23)
\[ \frac{d\mathbf{z}}{dt} = \boldsymbol{\omega} \times \mathbf{z}. \tag{22.28} \]
[Alternatively, and more directly, use $d\mathbf{c}/dt = \boldsymbol{\omega} \times \mathbf{c}$ (Eq. 22.21).] We have the following
Definition 353 (Transport velocity). The velocity of a point P of the moving frame with respect to the inertial frame, namely
\[ \mathbf{v}_T := \frac{d\mathbf{x}}{dt}, \tag{22.29} \]
will be referred to as the transport velocity, namely the velocity that the point has because it is being “transported” by the moving frame.
We have (time differentiate Eq. 22.26, and use Eq. 22.28)
\[ \mathbf{v}_T = \mathbf{v}_Q + \boldsymbol{\omega} \times \mathbf{z}, \tag{22.30} \]
where vQ := dxQ/dt denotes the velocity of the point Q with respect to the inertial frame. We have the following Definition 354 (Transport acceleration). The acceleration of a point P of the moving frame with respect to the inertial frame, namely aT :=
dvT , dt
(22.31)
will be referred to as the transport acceleration. Time differentiating Eq. 22.30 and using ω ×(ω ×z) = −ω 2 zN (Eq. 15.98), as well as Eq. 22.28, we obtain aT = aQ +
dω × z + ω × (ω × z) = aQ + ω˙ × z − ω 2 zN, dt
(22.32)
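where $\mathbf{a}_Q := d\mathbf{v}_Q/dt$ is the acceleration of the point Q, whereas the term $\dot{\boldsymbol{\omega}} \times \mathbf{z}$ due to the angular acceleration $\dot{\boldsymbol{\omega}}$ is called the Euler acceleration, and finally $\boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{z}) = -\omega^2 \mathbf{z}_N$ is called the centripetal acceleration.²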
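The following is a minimal sketch of Eqs. 22.30 and 22.32, using only the Python standard library. The helper names (`cross`, `transport_velocity`, `transport_acceleration`, `normal_part`) and the sample numbers are illustrative assumptions, not taken from the text. The last lines check numerically that ω × (ω × z) coincides with −ω² zN for the chosen data.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(*vectors):
    """Component-wise sum of any number of 3-vectors."""
    return tuple(sum(components) for components in zip(*vectors))

def scale(s, a):
    """Product of the scalar s and the 3-vector a."""
    return tuple(s * c for c in a)

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def transport_velocity(v_Q, omega, z):
    """Eq. 22.30: v_T = v_Q + omega x z."""
    return add(v_Q, cross(omega, z))

def transport_acceleration(a_Q, omega, omega_dot, z):
    """Eq. 22.32: a_T = a_Q + omega_dot x z + omega x (omega x z)."""
    return add(a_Q, cross(omega_dot, z), cross(omega, cross(omega, z)))

def normal_part(z, omega):
    """Portion z_N of z normal to omega (assumes omega is not zero)."""
    return add(z, scale(-dot(z, omega) / dot(omega, omega), omega))

# Hypothetical sample data (not from the text).
omega = (0.0, 0.0, 2.0)
z = (1.0, 0.5, 3.0)

centripetal = cross(omega, cross(omega, z))
expected = scale(-dot(omega, omega), normal_part(z, omega))
print(centripetal, expected)
```

Working with plain tuples keeps the sketch dependency-free; an array library would shorten the helpers without changing the formulas.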
22.3.2 Point in the moving frame
Contrary to the preceding subsection, here P(t) denotes a point that moves with respect to the moving frame of reference. Let again ih and jk(t) denote, respectively: (i) the orthonormal basis of the fixed (inertial) frame with origin O, and (ii) the orthonormal basis of the moving frame with origin Q(t). Let: (i) x denote the vector from the origin O to P(t), (ii) xQ the vector from O to Q, (iii) z the vector rigidly connected with the moving frame from Q to P0 = P(0) (location of P at t = 0). If P didn’t move with respect to the moving frame, at time t we would have x(t) = x0(t) = xQ(t) + z(t). Accordingly, its displacement from P0 is given by (Fig. 22.3)
\[ \mathbf{u}(t) := \mathbf{x}(t) - \mathbf{x}_0(t) = \mathbf{x}(t) - \big[\mathbf{x}_Q(t) + \mathbf{z}(t)\big]. \tag{22.33} \]
Fig. 22.3 Point in the moving frame
Equation 22.33 may be written as
\[ \mathbf{x}(t) = \mathbf{x}_Q(t) + \mathbf{z}(t) + \mathbf{u}(t) = \mathbf{x}_Q(t) + \mathbf{y}(t), \tag{22.34} \]
where
\[ \mathbf{y}(t) := \mathbf{z}(t) + \mathbf{u}(t) \tag{22.35} \]
is the vector from Q to the current location of P.
2 Centripetal comes from Latin: centrum (center) and petere (to pursue).
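• Velocity of a point in the moving frame
To obtain the velocity v with respect to the inertial frame (known as the absolute velocity), let us time differentiate Eq. 22.34. Recall that z(t) is constant in the moving frame, so that δz/δt = 0, and hence dz/dt = ω × z (Eq. 22.28), whereas du/dt = δu/δt + ω × u (Eq. 22.23). This yields
\[ \mathbf{v} = \mathbf{v}_Q + \boldsymbol{\omega} \times (\mathbf{z} + \mathbf{u}) + \frac{\delta \mathbf{u}}{\delta t} = \mathbf{v}_Q + \boldsymbol{\omega} \times \mathbf{y} + \mathbf{v}_R, \tag{22.36} \]
where
\[ \mathbf{v}_R := \frac{\delta \mathbf{u}}{\delta t} = \frac{du_1}{dt}\,\mathbf{j}_1 + \frac{du_2}{dt}\,\mathbf{j}_2 + \frac{du_3}{dt}\,\mathbf{j}_3 \tag{22.37} \]
is the so–called relative velocity, namely the velocity perceived by a moving–frame observer, who is not aware of the motion of the moving frame. Equation 22.36 may be written in a more appealing form as
\[ \mathbf{v} = \mathbf{v}_T + \mathbf{v}_R, \tag{22.38} \]
where
\[ \mathbf{v}_T := \mathbf{v}_Q + \boldsymbol{\omega} \times (\mathbf{z} + \mathbf{u}) = \mathbf{v}_Q + \boldsymbol{\omega} \times \mathbf{y}. \tag{22.39} \]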
This is the velocity of the point of the moving frame that coincides with P at time t (Eq. 22.30), namely the velocity that P would have at that time, were it not to move with respect to the observer in the moving frame. Accordingly, we will refer to vT as the transport velocity, namely the velocity that the point P would have if it was merely “transported” by the moving frame. ◦ Comment. You might have noticed that I have given you two definitions of the transport velocity vT, namely vT = vQ + ω × z (Eq. 22.30), and vT = vQ + ω × y, where y = z + u (Eq. 22.39). In both cases, we are talking about the velocity of a point of the moving frame of reference. Hence, in both cases the term “transport velocity” is appropriate. Indeed, in Subsection 22.3.1, z and y coincide because u = 0. Hence, the new definition (Eq. 22.39) includes the old one (Eq. 22.30) as a particular case (namely with u = 0).
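• Acceleration of point in moving frame
To obtain the acceleration a with respect to the inertial frame (known as the absolute acceleration), let us time differentiate Eq. 22.38. Then, we obtain
\[ \begin{aligned} \mathbf{a} = \frac{d\mathbf{v}}{dt} &= \frac{d\mathbf{v}_Q}{dt} + \frac{d}{dt}\big(\boldsymbol{\omega} \times \mathbf{y}\big) + \frac{d\mathbf{v}_R}{dt} \\ &= \mathbf{a}_Q + \frac{d\boldsymbol{\omega}}{dt} \times \mathbf{y} + \boldsymbol{\omega} \times \frac{d\mathbf{y}}{dt} + \boldsymbol{\omega} \times \mathbf{v}_R + \frac{\delta \mathbf{v}_R}{\delta t} \\ &= \mathbf{a}_Q + \frac{d\boldsymbol{\omega}}{dt} \times \mathbf{y} - \omega^2 \mathbf{y}_N + 2\,\boldsymbol{\omega} \times \mathbf{v}_R + \mathbf{a}_R. \end{aligned} \tag{22.40} \]
[Hint: Use $d\mathbf{v}_R/dt = \delta\mathbf{v}_R/\delta t + \boldsymbol{\omega} \times \mathbf{v}_R$ and $d\mathbf{y}/dt = \boldsymbol{\omega} \times \mathbf{y} + \delta\mathbf{y}/\delta t$ (transport theorem, Eq. 22.23), with $\delta\mathbf{y}/\delta t = \delta\mathbf{u}/\delta t = \mathbf{v}_R$ because δz/δt = 0, and then $\boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{y}) = -\omega^2 \mathbf{y}_N$ (Eq. 15.98), where $\mathbf{y}_N$ denotes the portion of y normal to ω.] In the above equation,
\[ \mathbf{a}_R := \frac{\delta \mathbf{v}_R}{\delta t} \tag{22.41} \]
is called the relative acceleration, namely the acceleration perceived by an observer in the moving frame, who is not aware of the moving–frame motion. Equation 22.40 may be written in a more appealing form as
\[ \mathbf{a} = \mathbf{a}_T + \mathbf{a}_{Cor} + \mathbf{a}_R, \tag{22.42} \]
where
\[ \mathbf{a}_T = \mathbf{a}_Q + \dot{\boldsymbol{\omega}} \times \mathbf{y} - \omega^2 \mathbf{y}_N \tag{22.43} \]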
is called the transport acceleration. This is the acceleration of the point of the moving frame that coincides with P at that given time, namely the acceleration that the point P would have if it was merely “transported” by the moving frame. [The considerations regarding the double definition of the transport velocity apply to the transport acceleration as well. Indeed, the difference between the two expressions, namely $\mathbf{a}_T = \mathbf{a}_Q + \dot{\boldsymbol{\omega}} \times \mathbf{z} - \omega^2 \mathbf{z}_N$ (Eq. 22.32) and $\mathbf{a}_T = \mathbf{a}_Q + \dot{\boldsymbol{\omega}} \times \mathbf{y} - \omega^2 \mathbf{y}_N$ (Eq. 22.43), disappears when u = 0.] On the other hand, $\mathbf{a}_{Cor}$, defined by
\[ \mathbf{a}_{Cor} := 2\,\boldsymbol{\omega} \times \mathbf{v}_R, \tag{22.44} \]
is called the Coriolis acceleration.³
3 Named after Gaspard–Gustave de Coriolis (1792–1843), a French mathematician, mechanical engineer and scientist.
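As a numerical sanity check of Eq. 22.42, the sketch below compares a = aT + aCor + aR with a finite–difference second derivative of the absolute position. It is only an illustration under stated assumptions: the moving frame spins about the common 3-axis with a constant (hypothetical) angular speed `OMEGA`, its origin Q coincides with O (so that aQ = 0 and the Euler term vanishes), and the relative motion `rel_position` is an arbitrary made-up polynomial.

```python
import math

OMEGA = 0.7  # hypothetical constant angular speed about the common 3-axis

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Made-up relative motion, expressed by its components in the moving basis j_k.
def rel_position(t):
    return (1.0 + 0.5 * t, 0.2 * t * t, 0.0)

def rel_velocity(t):
    return (0.5, 0.4 * t, 0.0)

def rel_acceleration(t):
    return (0.0, 0.4, 0.0)

def to_fixed(c, t):
    """Components in the fixed basis i_k of a vector with moving-basis components c."""
    cs, sn = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return (cs * c[0] - sn * c[1], sn * c[0] + cs * c[1], c[2])

def abs_position(t):
    """Absolute position x = y, since Q coincides with O in this sketch."""
    return to_fixed(rel_position(t), t)

def abs_acceleration_fd(t, h=1e-3):
    """Second derivative of the absolute position by central finite differences."""
    xm, x0, xp = abs_position(t - h), abs_position(t), abs_position(t + h)
    return tuple((p - 2.0 * c + m) / (h * h) for p, c, m in zip(xp, x0, xm))

def abs_acceleration_formula(t):
    """Eq. 22.42 with a_Q = 0 and omega_dot = 0: a = omega x (omega x y) + 2 omega x v_R + a_R."""
    omega = (0.0, 0.0, OMEGA)
    y = to_fixed(rel_position(t), t)
    v_R = to_fixed(rel_velocity(t), t)
    a_R = to_fixed(rel_acceleration(t), t)
    a_T = cross(omega, cross(omega, y))
    a_Cor = tuple(2.0 * c for c in cross(omega, v_R))
    return tuple(p + q + r for p, q, r in zip(a_T, a_Cor, a_R))

print(abs_acceleration_fd(1.3), abs_acceleration_formula(1.3))
```

The two printed triples should agree to within the finite–difference error; dropping the factor 2 in `a_Cor` makes the mismatch obvious.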
22.4 Apparent inertial forces
Sometimes it is convenient to study a problem in a moving frame of reference that is not inertial. This is accomplished by combining the Newton second law m a = f (Eq. 20.3) with Eq. 22.42 to obtain (in analogy with the d’Alembert principle of inertial forces, Eq. 20.58)
\[ m\,\mathbf{a}_R = \mathbf{f} + \mathbf{f}_T + \mathbf{f}_{Cor}, \tag{22.45} \]
where $\mathbf{f}_T := -m\,\mathbf{a}_T$ and $\mathbf{f}_{Cor} := -m\,\mathbf{a}_{Cor}$, namely (Eqs. 22.43 and 22.44)
\[ \mathbf{f}_T = -m \big(\mathbf{a}_Q + \dot{\boldsymbol{\omega}} \times \mathbf{y} - \omega^2 \mathbf{y}_N\big) \quad \text{and} \quad \mathbf{f}_{Cor} = -2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R, \tag{22.46} \]
are called the (apparent) inertial forces, specifically the transport force and the Coriolis force, respectively. The terms $-m\,\dot{\boldsymbol{\omega}} \times \mathbf{y}$ and $m\,\omega^2 \mathbf{y}_N$ in $\mathbf{f}_T$ are called the Euler force and the centrifugal force, respectively.⁴ That said, $\mathbf{f}_T = -m\,\mathbf{a}_T$ and $\mathbf{f}_{Cor} = -m\,\mathbf{a}_{Cor}$ may be treated as if they were actual forces acting on the particle.
◦ Comment. Note that the Coriolis force is perpendicular to the relative velocity, and hence the corresponding power, $\mathbf{f}_{Cor} \cdot \mathbf{v}_R$, vanishes.
In the following subsections, we present some applications. Specifically, in Subsection 22.4.1 we begin with some qualitative illustrative examples, just to give you a gut–level feeling of the phenomenon. Then, we consider more mathematically sophisticated applications that should help you in understanding and appreciating the new point of view. Specifically, in Subsection 22.4.2, we revisit the relationship between gravity and gravitation, already discussed in Subsection 20.8.1. Next, in Subsection 22.4.3, we study a spring–connected particle constrained to move along a wire that spins around a vertical axis. Then, in Subsection 22.4.4, we address a planar mass–spring system that is spinning around an axis normal to the plane of the system. Finally, in Subsection 22.4.6, we discuss the Foucault pendulum, which provides evidence of the rotation of the Earth. [In this book, however, the most interesting application of the new formulation regards the analysis of the tides. In view of the fact that it is quite lengthy, to streamline the flow of the presentation, this analysis is presented in Appendix A (Section 22.6).]
4 Centrifugal comes from Latin: centrum (center) and fugere (to flee, to run away).
22.4.1 Some qualitative examples
We begin by noting that the moving frame might move in uniform translation. In this case, we have aQ = ω = 0, and hence aT = aCor = 0. This confirms that the Newton second law is invariant to Galilean transformations (namely, it is the same for all inertial frames of reference). Accordingly, here, we assume that the moving frame is non–inertial. Everyday examples of apparent forces (namely transport and Coriolis forces) due to the fact that the frame of reference is non–inertial abound. Some of them were mentioned in the introduction to this chapter, such as the elevator and the car running in circles. Here, we present some additional qualitative illustrative examples that do not require much mathematical modeling.
• Memories. My rotor–ride experience
Here is a great example of an apparent force that you have experienced if you have ever been on a rotor–ride at a carnival. If you have not, a rotor–ride consists of a cylindrical chamber, large enough to accommodate about twenty people. Initially, you are simply standing inside the cylindrical chamber, with your back against its wall. Then, the cylinder starts to rotate around its (vertical) axis. The resulting centrifugal force “flattens” you against its wall, along with all the people inside the chamber. At some point, the centrifugal force becomes so strong that, when the floor is lowered, you remain hanging from the wall, no hook required. Then (just to make things more exciting!), after the cylinder reaches its full speed, the axis around which the cylinder spins is tilted from vertical to horizontal. The centrifugal force is so high that you remain stuck against the wall; you cannot even move. You can find videos on the web: look for “rotor ride”.
When my daughters were still in grammar school, I took them to a carnival, in Lexington, MA. They definitely wanted to go on the carnival rotor. So, I went with them ... and I got sick, really sick. They told me my face was green. I even paid to be subject to such torture! My daughters loved it, though!
• Apparent force due to aQ
Here, we consider apparent forces due solely to aQ. Specifically, we assume ω = 0, but aQ ≠ 0. Hence, the only apparent force is −m aQ. To give you
an example, consider again the apparent change in weight that occurs due to an acceleration (or deceleration) of an elevator. This is due to the apparent force −maQ. Particularly clarifying is the specific case in which the elevator is subject to a downwards acceleration exactly equal to the acceleration of gravity (as in a free fall in a vacuum, Subsection 11.5.1). In this case, if you are inside the elevator, you feel weightless. In order for you to better appreciate this, let’s make things a bit more dramatic. Assume that the cables holding the elevator and the counterweight break and, at the same time, the emergency brakes fail. The only force acting on the elevator is gravity (the friction and the air drag may be assumed to be negligible for the type of analysis we are interested in here). The elevator is in a free fall — the acceleration is equal to the gravity acceleration g. There are people in that elevator (how unlucky can one get), who are subject to gravity and to the (apparent) inertial force, −m a. These two forces are equal and opposite and the corresponding resultant force equals zero. They experience what it means to be weightless! Not for long though! This is similar to the sensation of an astronaut in a space capsule that is moving along a straight line (in the inertial frame) towards the center of the Earth (this never happened, to the best of my knowledge). The capsule is only subject to gravitation (no air). This causes the capsule to have an acceleration equal to the gravitational force divided by the mass of the capsule, including its content. Thus, in a frame of reference connected to the capsule, the gravitational force and the (apparent) inertial force cancel each other and the astronaut feels absolutely weightless. As another example, consider some astronauts in a space station that moves, without rotation, along a circular orbit around the Earth. 
[The fact that there is no rotation implies that a porthole will always face the same distant star.] Let us consider a non–rotating moving frame of reference, with origin at the center of mass of the space station. In this case, the only apparent force is $-m\,\mathbf{a}_Q = m\,v^2\,\mathbf{r}/r^2$ (lateral acceleration, Eq. 20.32). This is balanced exactly by the gravitational force (Eq. 20.28), at the center of mass G of the space station. The astronauts feel weightless, provided that they are very close to G. [If they are not near G, the situation is more complicated, and I suggest that you hold your breath, as a similar problem is addressed in Appendix A (Section 22.6), where we study the tides.]
◦ Comment. If we want to be picky, we should include the solar gravitation and the apparent forces due to the motion of the Earth around the Sun. Again, the solar gravitation offsets the corresponding inertial force, exactly only at the center of mass of the space station. The same holds true for all the other celestial bodies.
• Centrifugal force and Euler force
Here, we present illustrative examples of the centrifugal and Euler forces. Let us consider one more astronaut in a space station, in a geostationary (circular) orbit (of course, on the equatorial plane, Definition 338, p. 842). This time, we assume that the space station is rotating and that the porthole is always facing the Earth. In this case, it is convenient to formulate the problem in a frame of reference rigidly connected to the Earth, with origin at the center of the Earth. The space station does not move with respect to such a frame. In the present case, we have $\mathbf{y}_N = \mathbf{r}$, and hence the centrifugal force may be expressed as $\mathbf{f} = m\,\omega^2\,\mathbf{r}$. At the center of mass G of the space station, this inertial force balances the gravitational force. In such a location, the astronaut feels weightless again.
Want more? Here it is! The side force you experience when your car is running in circles is also due to the term $m\,\omega^2\,\mathbf{r}$. [This is a case where lateral inertial force and centrifugal force are equally applicable, with v = ω r (Remark 145, p. 813). A more detailed analysis of this issue is presented in Remark 161 below.] The same apparent force is involved if you hold a child by the hands and spin him/her around a vertical axis. In addition, if you sharply increase your angular velocity, the child lags behind. This is due to the Euler force $-m\,\dot{\boldsymbol{\omega}} \times \mathbf{r}$, which acts along the circumferential direction. The same considerations hold true if you fill up a beach bucket with water and spin it around, by holding it with a twine. The water stays inside the bucket because of the centrifugal force that shows up in a frame of reference that spins with you.
◦ Comment. Since we are at it, I’d like to ask you: “What happens to the bucket if, at time t = t0, you let go of the twine? (i) Does it go radially? (ii) Does it take off tangentially?
(iii) Does it go somewhere in between the two?” Every time I asked this question to my sophomore class in mechanics, most of the students gave me the wrong answer, typically the last one. The correct answer is the second one. Indeed, as you let go of the twine, the bucket is no longer subject to the force exerted by the twine. If we assume the effects of gravity and air drag to be negligible (which is acceptable for short time intervals), the bucket is not subject to any force, and hence it travels with a constant velocity (namely at a constant speed, along a straight path). The velocity is determined by the condition at the time t0. Accordingly, the bucket will travel along the tangent to the circle described before t0, with the speed it had at t = t0. [The same considerations apply to hammer throwing.]
Remark 161 (An important observation). As pointed out in Remark 145, p. 813, in certain cases the approach based upon centrifugal force and that based upon lateral inertial force are indistinguishable from each other.
For instance, this is true for the analysis of a person in the rotor–ride, and for the astronauts in the space station (in both cases we may use either the lateral acceleration, or the centripetal acceleration). [Another clarifying example is related to the difference between gravity and gravitation, which will be addressed in Subsection 22.4.2.] Similarly, we may compare the present point of view to that in Subsection 20.4.1, regarding a car on a turn. There, we were discussing the lateral force (per unit mass) $v^2/R$, here the centrifugal force (per unit mass) $\omega^2 R$. If the car is moving in circles, we have v = ω R and the two expressions give the same result. Indeed, you can always come up (not easily, sometimes) with a frame of reference rigidly connected to the car, and origin at the (instantaneous) center of the trajectory. This way, you can always use the centrifugal acceleration approach. However, I will avoid this. Indeed, I believe that it is important to distinguish the two types of approach. Specifically, note that the lateral–force approach can be used if, and only if, we are dealing with a point particle. Accordingly, my recommendation is to use the centrifugal–force approach only in the other cases. For instance, in the problem addressed in Subsection 20.4.1, the car is treated as a particle and we are interested in the dynamics of such a point. We found that such a particle is subject to an apparent lateral force (per unit mass) equal to $v^2/R$. There is no force field, only a force applied to a material point; ω does not even come into the picture (it would have to be introduced by setting ω = v/R). On the other hand, the centrifugal–force approach may be more convenient when the problem of the turning car (or the rotor–ride) is not dealt with as a single point, but as a region in space. In this case, we experience an apparent force field $\omega^2 R$, because the frame of reference connected to such a space is not inertial.
Any object, like a mosquito (and even the air itself), is subject to such an apparent force. [The distinction will be important in Subsection 22.4.5 (on the spherical pendulum, again), where the centrifugal force and the longitudinal acceleration are used to address two unrelated aspects of the same problem.]
• Coriolis force Finally, consider the Coriolis force. This force is the reason why, if you drop an iron ball from a high place (say, the tower of Pisa), it will tend to move eastward with respect to the vertical. This may be explained by observing the phenomenon from an inertial frame of reference. Specifically, the transport velocity due to the Earth rotation is higher at the top of the tower and lower at the bottom, and hence the momentum of the ball due to the rotation of the Earth makes it go eastward. [You might like to verify this fact by using the mathematical results presented above.]
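For a rough idea of the size of the eastward deflection, here is a hedged first–order sketch. It assumes no air drag, keeps only terms linear in the Earth angular speed, and takes the relative velocity to be the unperturbed free–fall velocity g t (downward), so that the eastward Coriolis acceleration is 2 Ω cos(latitude) g t. The constants, the function names, and the sample height and latitude (roughly those of the tower of Pisa) are illustrative assumptions, not data from the text.

```python
import math

OMEGA_EARTH = 7.292e-5  # rad/s, approximate rotation rate of the Earth (assumed value)
G = 9.81                # m/s^2, nominal acceleration of gravity (assumed value)

def fall_time(height):
    """Free-fall time from rest, neglecting air drag."""
    return math.sqrt(2.0 * height / G)

def eastward_deflection_closed_form(height, latitude_deg):
    """Integrating 2*Omega*cos(lat)*g*t twice in time: d = Omega*cos(lat)*g*T**3/3."""
    T = fall_time(height)
    return OMEGA_EARTH * math.cos(math.radians(latitude_deg)) * G * T ** 3 / 3.0

def eastward_deflection_numeric(height, latitude_deg, steps=10000):
    """Same quantity by step-by-step integration of the eastward Coriolis acceleration."""
    T = fall_time(height)
    dt = T / steps
    c = 2.0 * OMEGA_EARTH * math.cos(math.radians(latitude_deg)) * G
    east, v_east = 0.0, 0.0
    for k in range(steps):
        a = c * (k + 0.5) * dt              # acceleration at the middle of the step
        east += v_east * dt + 0.5 * a * dt * dt
        v_east += a * dt
    return east

# Illustrative values only: height in meters, latitude in degrees.
print(eastward_deflection_closed_form(56.0, 43.7))
print(eastward_deflection_numeric(56.0, 43.7))
```

With these sample values the deflection comes out at a few millimeters, which is why the effect goes unnoticed in everyday experience.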
◦ Northbound and southbound trains. A similar explanation holds for northbound trains. Let ω denote the angular velocity of the Earth. In the northern hemisphere, they tend to move to the right. Indeed, in this case, the Coriolis force, $-2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R$, points east. [At a gut level, in the northern hemisphere the distance from the axis of rotation of the Earth decreases by traveling north, causing the train to want to move east.] Still in the northern hemisphere, if the train were to travel southbound, it would still tend to move to its right, since in this case the Coriolis force points west. In the southern hemisphere, the effects are opposite to those in the northern hemisphere, since the component of vR normal to ω changes sign. [Note that, for northbound or southbound trains, the Coriolis force vanishes at the equator, where ω and vR are parallel.]
◦ Eötvös effect. On the other hand, at the equator, for an eastbound train, the Coriolis force would be pointing upwards (namely along the vertical on the equator). Its apparent weight would decrease. [Similarly, the apparent weight of a westbound train would increase.] This is known as the Eötvös effect.⁵ Let us look at this from two different points of view. Consider first the Earth frame of reference. The centripetal, Coriolis and relative accelerations are all directed along the vertical (downwards), as you may verify. The sum of their components in that direction is given by
\[ a = \Omega^2 R + 2\,\Omega\,v + v^2/R = (\Omega + v/R)^2 R, \tag{22.47} \]
where Ω is the angular speed of the Earth, R is the distance from the Earth axis (at the equator, the radius of the Earth), and v is the speed of the train (positive because the train is eastbound). [Hint: The relative acceleration is given by $a_R = \|\mathbf{a}_R\| = v^2/R$, where $v = \|\mathbf{v}_R\|$ (recall the expression of the lateral acceleration $v^2\,\mathbf{n}/R$, Eq. 20.32). The second term in the middle expression of Eq. 22.47 is the Coriolis acceleration.] Next, consider a frame of reference that rotates around the Earth axis, with angular speed equal to $\omega_1 = \Omega + v/R$. In this frame of reference, our train does not move. Now we have only the centrifugal acceleration, which is given by $(\Omega + v/R)^2 R$, in agreement with Eq. 22.47. [Of course, such an effect occurs at other latitudes as well. However, the corresponding force is not normal to the Earth surface, but normal to the Earth axis of rotation.]
5 Named after the Hungarian physicist Loránd Eötvös (1848–1919).
◦ Trade winds. As a final example, let us discuss trade winds, the prevailing westbound winds (called easterlies in meteorology, since they blow from the east) of the equatorial and tropical regions, which have been used by sailing ships to cross the oceans for centuries. Let me begin with some background material. At the equator, the high temperature causes the air to rise. After rising, having nowhere else to go, the airflow splits up, with part of it going toward the tropic of Cancer and part towards the tropic of Capricorn. At the tropics, for reasons too complicated to address here, the air moves back to the Earth. [Consider it an experimental fact.] Then, the air returns to the equator. In the northern hemisphere, this implies northbound winds in the higher atmosphere (from the equator to the Tropic of Cancer), and southbound winds in the lower atmosphere. Let us focus on the latter. The Coriolis force in this case is pointed westwards. Hence, these winds bend toward the west (like a southbound train). This effect is compounded by the fact that, as they approach the equator, they encounter the corresponding winds from the southern hemisphere, also westbound. The two streams are forced to move along the equator. [In the northern hemisphere, for an observer in an inertial frame of reference, a wind moving purely southwards with respect to the Earth has an absolute velocity with an eastward component, due to the rotation of the Earth. Also, as a point on the Earth surface moves south, the eastward component of its velocity increases. Correspondingly, the wind “lags behind,” namely acquires a westward component with respect to the Earth.]
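Returning to the Eötvös effect, the two points of view behind Eq. 22.47 can be compared numerically. The sketch below is only illustrative: the Earth angular speed and equatorial radius are approximate assumed values, the function names are made up, and the train speed of 50 m/s is a hypothetical example.

```python
OMEGA_EARTH = 7.292e-5  # rad/s, approximate rotation rate of the Earth (assumed value)
R_EQUATOR = 6.378e6     # m, approximate equatorial radius of the Earth (assumed value)

def accel_earth_frame(v, Omega=OMEGA_EARTH, R=R_EQUATOR):
    """Middle expression of Eq. 22.47: centripetal + Coriolis + relative terms."""
    return Omega ** 2 * R + 2.0 * Omega * v + v ** 2 / R

def accel_train_frame(v, Omega=OMEGA_EARTH, R=R_EQUATOR):
    """Right side of Eq. 22.47: frame rotating with angular speed Omega + v/R."""
    return (Omega + v / R) ** 2 * R

def eotvos_change(v, Omega=OMEGA_EARTH, R=R_EQUATOR):
    """Change of the downward acceleration with respect to a train at rest (v = 0)."""
    return 2.0 * Omega * v + v ** 2 / R

# Hypothetical train speeds: v > 0 eastbound, v < 0 westbound.
for v in (50.0, -50.0):
    print(v, accel_earth_frame(v), accel_train_frame(v), eotvos_change(v))
```

A positive value of `eotvos_change` corresponds to a decrease of the apparent weight (eastbound train), a negative value to an increase (westbound train at moderate speed).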
22.4.2 Gravity vs gravitation revisited
In Subsection 20.8.1, we have defined the force of gravity as given by the expression $\mathbf{f}_W(\mathbf{x}) := \mathbf{f}_G(\mathbf{x}) + m\,\Omega^2 R\,\mathbf{e}_R$ (Eq. 20.173). This equation indicates that gravity is a combination of the force of gravitation, $\mathbf{f}_G$, and the apparent lateral force. Indeed, the explanation given in Subsection 20.8.1 is that a point P of the Earth travels along a circular path for an observer in an inertial frame. However, we can also interpret the term $m\,\Omega^2 R\,\mathbf{e}_R$ as due to the centrifugal force. Indeed, we can also see P as a fixed point for an observer in the Earth frame. [The situation is another good illustrative example for Remark 161, p. 923, regarding the centrifugal vs lateral force approaches.] Within this second approach, a particle at a point P is subject to the centrifugal force at P, given by (Eq. 22.46)
\[ \mathbf{f}_C = m\,\Omega^2\,\mathbf{z}_N, \tag{22.48} \]
where $\Omega = \|\boldsymbol{\omega}\|$ is the angular speed of the Earth, whereas $\mathbf{z}_N$ is the portion of z normal to ω, namely the vector normal to the axis of the Earth that connects the axis of the Earth to P. Using $\mathbf{z}_N = R\,\mathbf{e}_R$ (where $\mathbf{e}_R = \mathbf{z}_N/R$ is the unit vector in direction $\mathbf{z}_N$, and $R = \|\mathbf{z}_N\|$ is the distance of P from the Earth axis), we obtain that the centrifugal force is given by $\mathbf{f}_C = m\,\Omega^2 R\,\mathbf{e}_R$ (Eq. 22.46), so that
\[ \mathbf{f}_W(\mathbf{x}) = \mathbf{f}_G(\mathbf{x}) + m\,\Omega^2 R\,\mathbf{e}_R, \tag{22.49} \]
in agreement with Eq. 20.173.
22.4.3 Mass on a spinning wire
♥
Consider a particle constrained to move along a straight frictionless wire (like an abacus bead), and subject to: (i) its own weight and (ii) a spring force. Assume that the wire is rotating around a vertical axis, with angular speed equal to Ω. We may study the dynamics of the particle in a frame of reference that rotates with the wire, namely in the plane (y1 , y3 ) depicted in Fig. 22.4. In this frame of reference, the particle is subject to its own weight fW = −m g i3 , the spring force fS, the reaction exerted by the wire fR (normal to the frictionless wire) and two apparent forces (the centrifugal fC, the Coriolis fCor).
Fig. 22.4 Mass on a spinning wire: (a) the setup; (b) the force diagram
Let us begin with the forces normal to the plane of Fig. 22.4. Since the relative acceleration component in such a direction vanishes, the Coriolis force, which is normal to the plane of the figure, is balanced by the portion of the wire reaction fR normal to the plane. [Figure 22.4 only shows the portion of fR in the plane of the figure.] The corresponding equation may be used to evaluate the component of the wire reaction normal to the plane of the figure.
Next, consider the balance of the components normal to the wire in the plane of the figure. These include the components of: (i) weight, (ii) centrifugal force, and (iii) reaction of the wire. They balance each other, again because the relative acceleration component in such a direction also vanishes. Again, the corresponding equation may be used to evaluate the other component of the wire reaction, namely that in the plane of the figure.
Finally, consider the components along the wire. Note that the centrifugal force is given by (Eq. 22.46)
\[ \mathbf{f}_C = m\,\Omega^2\,\mathbf{y}_N, \tag{22.50} \]
where y denotes the vector from Q to P, whereas $\mathbf{y}_N$ denotes its portion normal to the vertical axis. We have $y_N = y \cos\alpha$, where y denotes the abscissa along the wire, and α (positive counterclockwise) denotes the angle between y and the horizontal plane. Accordingly, the projection of the centrifugal force along the wire is $m\,\Omega^2 y_N \cos\alpha = m\,\Omega^2 y \cos^2\alpha$. Thus, the governing equation divided by m reads
\[ \frac{d^2 y}{dt^2} = \Omega^2 y \cos^2\alpha - g \sin\alpha - \mathring{\omega}^2 (y - y_U), \tag{22.51} \]
where $y_U$ denotes the value of y where the spring is unstretched, whereas $\mathring{\omega} = \sqrt{\kappa/m}$ is the natural frequency of the system for Ω = 0, namely in the absence of the centrifugal force.
To make our life simple, let us begin by considering what happens if $\alpha = y_U = 0$. Then, Eq. 22.51 yields
\[ \frac{d^2 y}{dt^2} + \big(\mathring{\omega}^2 - \Omega^2\big)\, y = 0. \tag{22.52} \]
The solution is stable for $\Omega < \mathring{\omega}$ and unstable for $\Omega > \mathring{\omega}$ (Section 14.8).
Next, let us remove the assumptions $\alpha = y_U = 0$. Then, it is convenient to find first the equilibrium position, $y_E$, which is obtained from Eq. 22.51 by setting $\ddot{y} = 0$. This yields
\[ \Omega^2 y_E \cos^2\alpha - g \sin\alpha - \mathring{\omega}^2 (y_E - y_U) = 0, \tag{22.53} \]
or $y_E = (\mathring{\omega}^2 y_U - g \sin\alpha)/(\mathring{\omega}^2 - \Omega^2 \cos^2\alpha)$. Next, we set $y(t) = y_E + u(t)$, substitute into Eq. 22.51 and use Eq. 22.53. This yields
\[ \frac{d^2 u}{dt^2} + \big(\mathring{\omega}^2 - \Omega^2 \cos^2\alpha\big)\, u = 0. \tag{22.54} \]
The solution is stable for $\Omega \cos\alpha < \mathring{\omega}$ and unstable for $\Omega \cos\alpha > \mathring{\omega}$ (Section 14.8 again). [Note, however, that at the stability boundary, namely for $\Omega \cos\alpha = \mathring{\omega}$, we have $y_E = \pm\infty$.]
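The equilibrium position and the stability criterion above are easy to evaluate numerically. The sketch below is a minimal illustration, not a definitive implementation: `omega0` stands for the natural frequency (the ringed omega of the text), the function names are hypothetical, and the numerical values are made up.

```python
import math

def wire_rhs(y, omega0, Omega, alpha, y_U, g=9.81):
    """Right side of Eq. 22.51 (acceleration along the wire, per unit mass)."""
    return Omega ** 2 * y * math.cos(alpha) ** 2 - g * math.sin(alpha) - omega0 ** 2 * (y - y_U)

def equilibrium_position(omega0, Omega, alpha, y_U, g=9.81):
    """y_E from Eq. 22.53; not defined at the stability boundary."""
    denom = omega0 ** 2 - (Omega * math.cos(alpha)) ** 2
    if denom == 0.0:
        raise ValueError("Omega*cos(alpha) equals omega0: no finite equilibrium")
    return (omega0 ** 2 * y_U - g * math.sin(alpha)) / denom

def is_stable(omega0, Omega, alpha):
    """Stability criterion following Eq. 22.54 (absolute value used to allow any sign)."""
    return abs(Omega * math.cos(alpha)) < omega0

# Made-up sample values: frequencies in rad/s, angle in radians, length in meters.
omega0, Omega, alpha, y_U = 3.0, 2.0, 0.3, 0.5
y_E = equilibrium_position(omega0, Omega, alpha, y_U)
print(y_E, wire_rhs(y_E, omega0, Omega, alpha, y_U), is_stable(omega0, Omega, alpha))
```

Evaluating `wire_rhs` at `y_E` should return a value that vanishes to within round-off, which confirms that the closed-form expression for the equilibrium is consistent with Eq. 22.51.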
22.4.4 A spinning mass–spring system
♥
Consider a mass subject to a system of springs as shown in Fig. 22.5 and constrained to move, without friction, over a horizontal plane, say the (x1 , x2 ) plane. Assume that such a system is spinning around the vertical x3 -axis, with constant angular velocity ω = Ω j3 . Consider the frame of reference (y1 , y2 ) that is rotating with the mass–spring support.
Fig. 22.5 Spinning system
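Recall that the transport force and the Coriolis force are given, respectively, by $\mathbf{f}_T = -m\,(\mathbf{a}_Q - \Omega^2 \mathbf{y}_N + \dot{\boldsymbol{\omega}} \times \mathbf{y}) = m\,\Omega^2\,\mathbf{y}$ and $\mathbf{f}_{Cor} = -2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R$ (Eq. 22.46). [Here, $\mathbf{a}_Q = \mathbf{0}$ and $\dot{\boldsymbol{\omega}} = \mathbf{0}$, whereas $\mathbf{y}_N = \mathbf{y}$ because y lies in the plane normal to ω.] Then, Eq. 22.45 ($m\,\mathbf{a}_R = \mathbf{f} + \mathbf{f}_T + \mathbf{f}_{Cor}$) yields
\[ m\,\mathbf{a}_R = \mathbf{f}_S - m\,g\,\mathbf{j}_3 + m\,\Omega^2\,\mathbf{y} - 2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R + f_R\,\mathbf{j}_3, \tag{22.55} \]
where $\mathbf{f}_S$ is the spring resultant force and $f_R\,\mathbf{j}_3$ is the constraint reaction. As it could have been anticipated, the $\mathbf{j}_3$-component of the above equation yields the constraint reaction $f_R = m\,g$. Next, consider the in–plane components of the above equation. Let $y_1$ and $y_2$ denote the components of y in the rotating frame, and assume the spring forces to be linear and given by (again in the rotating frame) $f_{S_h} = -k\,y_h$ (h = 1, 2), where k = 2 κ, κ being the stiffness of each spring. Then, dividing Eq. 22.55 by m, and recalling that $\mathbf{v}_R = \delta\mathbf{y}/\delta t$ (Eq. 22.37) and $\mathbf{a}_R = \delta\mathbf{v}_R/\delta t$ (Eq. 22.41), one obtains the two linearized scalar equations governing the dynamics of the system under consideration, namely
\[ \begin{aligned} \frac{d^2 y_1}{dt^2} + \big(\mathring{\omega}^2 - \Omega^2\big)\, y_1 - 2\,\Omega\,\frac{dy_2}{dt} &= 0, \\ \frac{d^2 y_2}{dt^2} + \big(\mathring{\omega}^2 - \Omega^2\big)\, y_2 + 2\,\Omega\,\frac{dy_1}{dt} &= 0, \end{aligned} \tag{22.56} \]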
where $\mathring{\omega} = \sqrt{k/m}$ is the natural frequency of the motion in either direction (j1 and j2), for Ω = 0.
◦ Warning. Why did I say linearized? Because I have not included the contribution to fS due to the lateral motion, namely the type of force discussed in Subsection 17.4.4 on the linearized formulation of the dynamics of chains of spring–connected particles. Accordingly, we can infer that we have tacitly assumed that there is no initial tension in the springs (see Eq. 17.46, with TS = 0). Indeed, in such a case, the contribution due to the lateral motion is nonlinear, and hence is not included in Eq. 22.56, because we are only interested in the linearized equations. [You might like to consider the case TS > 0. How does this affect the linearized equations? Not much, as you may verify.]
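• Solution
Let us try a solution of the type
\[ y_1(t) = A \cos(\omega t + \chi) \quad \text{and} \quad y_2(t) = B \sin(\omega t + \chi). \tag{22.57} \]
Combining with Eq. 22.56, and noting that the first equation contains only terms multiplied by cos(ω t + χ), whereas the second contains only terms multiplied by sin(ω t + χ), one obtains
\[ \begin{bmatrix} -\omega^2 + \mathring{\omega}^2 - \Omega^2 & -2\,\omega\,\Omega \\ -2\,\omega\,\Omega & -\omega^2 + \mathring{\omega}^2 - \Omega^2 \end{bmatrix} \begin{Bmatrix} A \\ B \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}. \tag{22.58} \]
The above is a system of two homogeneous linear algebraic equations with two unknowns, A and B. The solution differs from the trivial solution A = B = 0 if, and only if, the determinant equals zero, that is,
\[ \big[-\omega^2 + (\mathring{\omega}^2 - \Omega^2)\big]^2 - (2\,\omega\,\Omega)^2 = 0. \tag{22.59} \]
Using $-2\,\omega^2 (\mathring{\omega}^2 - \Omega^2) - (2\,\omega\,\Omega)^2 = -2\,(\mathring{\omega}^2 + \Omega^2)\,\omega^2$, the above equation yields
\[ \omega^4 - 2\,\big(\mathring{\omega}^2 + \Omega^2\big)\,\omega^2 + \big(\mathring{\omega}^2 - \Omega^2\big)^2 = 0. \tag{22.60} \]
The above is a quadratic algebraic equation for the unknown $\omega^2$. Using $z_\pm = -b/2 \pm \sqrt{b^2/4 - c}$ (Eq. 6.36, with a = 1), we have that its roots are
\[ \omega_\pm^2 = \mathring{\omega}^2 + \Omega^2 \pm \sqrt{\big(\mathring{\omega}^2 + \Omega^2\big)^2 - \big(\mathring{\omega}^2 - \Omega^2\big)^2}, \tag{22.61} \]
namely $\omega_\pm^2 = \mathring{\omega}^2 + \Omega^2 \pm 2\,\mathring{\omega}\,\Omega$, or
\[ \omega_\pm = \big|\mathring{\omega} \pm \Omega\big|, \tag{22.62} \]
that is,
\[ \omega_\pm = \mathring{\omega} \pm \Omega > 0 \qquad (\Omega < \mathring{\omega}), \tag{22.63} \]
\[ \omega_\pm = \Omega \pm \mathring{\omega} > 0 \qquad (\Omega > \mathring{\omega}). \tag{22.64} \]
Accordingly, the solution proposed in Eq. 22.57 is valid for any $\Omega \neq \mathring{\omega}$. [In the following, we limit ourselves to $\Omega \neq \mathring{\omega}$. The case $\Omega = \mathring{\omega}$ is mathematically more complicated and is addressed in Vol. II.]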
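The roots in Eqs. 22.61 and 22.62 can be cross-checked with a few lines of code. This is a sketch under the notation assumptions that `omega0` stands for the natural frequency (the ringed omega) and `Omega` for the spin rate; the function names are hypothetical.

```python
import math

def frequencies_from_quadratic(omega0, Omega):
    """Solve Eq. 22.60 for omega**2 with z = -b/2 +/- sqrt(b**2/4 - c), then take square roots."""
    half_b = -(omega0 ** 2 + Omega ** 2)              # b/2, with b = -2 (omega0^2 + Omega^2)
    c = (omega0 ** 2 - Omega ** 2) ** 2
    root = math.sqrt(max(0.0, half_b ** 2 - c))       # guard against round-off
    w2_minus = max(0.0, -half_b - root)
    w2_plus = -half_b + root
    return math.sqrt(w2_minus), math.sqrt(w2_plus)

def frequencies_closed_form(omega0, Omega):
    """Eq. 22.62: omega_minus = |omega0 - Omega|, omega_plus = |omega0 + Omega|."""
    return abs(omega0 - Omega), abs(omega0 + Omega)

# Made-up sample values, one for each of the cases in Eqs. 22.63 and 22.64.
for omega0, Omega in ((3.0, 1.0), (1.0, 3.0)):
    print(frequencies_from_quadratic(omega0, Omega), frequencies_closed_form(omega0, Omega))
```

Both sample cases give the same pair of frequencies, as expected from the symmetry of Eq. 22.60 under the exchange of the two parameters.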
• The ratio A/B
Here, we want to find the ratio A/B. This may be obtained from either one of the two equations in Eq. 22.58, since the two equations are linearly dependent, because the determinant equals zero. Using the first, we have
\[ \big(-\omega_\pm^2 + \mathring{\omega}^2 - \Omega^2\big)\, A_\pm = 2\,\omega_\pm\,\Omega\, B_\pm. \tag{22.65} \]
From Eq. 22.59 we see that the two coefficients have the same absolute value. Therefore, we only have to determine their signs. Let us address first the case $\Omega < \mathring{\omega}$, for which we have $\omega_\pm = \mathring{\omega} \pm \Omega$ (Eq. 22.63). Then,
\[ -\omega_\pm^2 + \mathring{\omega}^2 - \Omega^2 = -(\mathring{\omega} \pm \Omega)^2 + \mathring{\omega}^2 - \Omega^2 = \mp 2\,\mathring{\omega}\,\Omega - 2\,\Omega^2 = \mp 2\,\omega_\pm\,\Omega. \tag{22.66} \]
Substituting the above expression into Eq. 22.65, we obtain
\[ B_\pm = \mp A_\pm. \tag{22.67} \]
Therefore, recalling Eq. 22.57 and combining the two particular solutions, we have
\[ \begin{aligned} y_1(t) &= A_- \cos(\omega_- t + \chi_-) + A_+ \cos(\omega_+ t + \chi_+), \\ y_2(t) &= A_- \sin(\omega_- t + \chi_-) - A_+ \sin(\omega_+ t + \chi_+). \end{aligned} \tag{22.68} \]
The four constants $A_+$, $A_-$, $\chi_+$ and $\chi_-$ are to be determined through the four initial conditions $y_1(0) = y_{10}$, $\dot{y}_1(0) = v_{10}$, $y_2(0) = y_{20}$ and $\dot{y}_2(0) = v_{20}$.
Finally, let us consider the case $\Omega > \mathring{\omega}$, for which we have $\omega_\pm = \Omega \pm \mathring{\omega}$ (Eq. 22.64). Then,
\[ -\omega_\pm^2 + \mathring{\omega}^2 - \Omega^2 = -(\Omega \pm \mathring{\omega})^2 + \mathring{\omega}^2 - \Omega^2 = \mp 2\,\mathring{\omega}\,\Omega - 2\,\Omega^2 = -2\,\omega_\pm\,\Omega. \tag{22.69} \]
Substituting the above expression into Eq. 22.65, we obtain $B_\pm = -A_\pm$. The remainder of the analysis is similar to that for $\Omega < \mathring{\omega}$, as you may verify.
932
Part III. Multivariate calculus and mechanics in three dimensions
Remark 162. It should be emphasized that the system is stable, at least as long as $\Omega \ne \mathring{\omega}$. This result is quite different from that for Eq. 22.52, which, you may note, may be obtained from the first of Eq. 22.56, by setting $y_1 = y$ and $y_2 = 0$. Specifically, if the mass is constrained to move along a straight line (Eq. 22.52), the system is stable for $\Omega < \mathring{\omega}$ and unstable for $\Omega > \mathring{\omega}$. On the other hand, if we remove the constraint and allow the mass to move over the whole plane (Eq. 22.56), the system becomes stable for all the values of $\Omega \ne \mathring{\omega}$.
• Analysis of the results

Here, we address a gut–level interpretation of our results. For the sake of simplicity, let us limit our analysis to the case $\Omega < \mathring{\omega}$. To better understand the above equations, assume $A_- = -A_+ = 1$ and $\chi_+ = \chi_- = 0$, so that Eq. 22.68 yields

$$y_1(t) = \cos\omega_- t - \cos\omega_+ t, \qquad y_2(t) = \sin\omega_- t + \sin\omega_+ t. \tag{22.70}$$

Incidentally, the above equations correspond to the following initial conditions:

$$y_1(0) = y_2(0) = \dot{y}_1(0) = 0 \quad \text{and} \quad \dot{y}_2(0) = \omega_- + \omega_+. \tag{22.71}$$

Next, recall the prosthaphaeresis formulas (Eqs. 7.59 and 7.61), namely

$$\cos(\alpha - \beta) - \cos(\alpha + \beta) = 2 \sin\alpha \sin\beta, \qquad \sin(\alpha - \beta) + \sin(\alpha + \beta) = 2 \sin\alpha \cos\beta. \tag{22.72}$$

Then, setting $\alpha = \mathring{\omega} t$ and $\beta = \Omega t$, we have $\alpha \pm \beta = (\mathring{\omega} \pm \Omega)\,t = \omega_\pm t$ (Eq. 22.63), and Eq. 22.70 reduces to

$$y_1(t) = 2 \sin\mathring{\omega} t \, \sin\Omega t, \qquad y_2(t) = 2 \sin\mathring{\omega} t \, \cos\Omega t. \tag{22.73}$$
◦ Comment. To interpret this result, let us consider the equations for rigid–body rotations (Eq. 7.3)

$$y_1 = \xi_1 \cos\theta - \xi_2 \sin\theta, \qquad y_2 = \xi_1 \sin\theta + \xi_2 \cos\theta, \tag{22.74}$$

where $\theta > 0$ corresponds to a counterclockwise rotation. Setting $\xi_1 = 0$, $\xi_2 = 2 \sin\mathring{\omega} t$ and $\theta = -\Omega t$, Eq. 22.74 reduces to Eq. 22.73. Thus, for the initial conditions under consideration, the mass motion consists of oscillations with amplitude $2 \sin\mathring{\omega} t$, along a $\xi_2$-line that rotates around the vertical axis, clockwise if $\Omega > 0$. Next, note that $\boldsymbol{\omega} = \Omega\,\mathbf{j}_3$ implies that the plane $(y_1, y_2)$ rotates counterclockwise if $\Omega > 0$. Therefore, we obtain that: (i) the $\xi_k$-axes coincide with the (fixed) $x_k$-axes, and (ii) the mass oscillates along the (fixed) $x_2$-line, with $x_2(t) = 2 \sin\mathring{\omega} t$. In other words, in an inertial frame of reference, the mass oscillates along the $x_2$-axis, with frequency $\mathring{\omega}$. The rotation of the system does not affect the motion observed in the fixed frame.

◦ Comment. However, we have shown this to be true only for the special initial conditions in Eq. 22.71. For different initial conditions, the motion is more complicated. You might like to study this for other sets of initial conditions.
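The statement that the fixed-frame motion is a pure oscillation along the $x_2$-axis can be checked numerically. The sketch below evaluates Eq. 22.70 in the rotating frame and then rotates the result counterclockwise by $\Omega t$ to obtain fixed-frame components. The function names and the sample values of $\mathring{\omega}$ and $\Omega$ are assumptions made for illustration.

```python
import math

def rotating_frame_position(t: float, omega0: float, Omega: float) -> tuple[float, float]:
    """Eq. 22.70: rotating-frame coordinates for A_- = -A_+ = 1 and zero phases."""
    w_minus, w_plus = omega0 - Omega, omega0 + Omega   # Eq. 22.63 (Omega < omega0)
    y1 = math.cos(w_minus * t) - math.cos(w_plus * t)
    y2 = math.sin(w_minus * t) + math.sin(w_plus * t)
    return y1, y2

def fixed_frame_position(t: float, omega0: float, Omega: float) -> tuple[float, float]:
    """Rotate (y1, y2) counterclockwise by Omega*t to get fixed-frame components."""
    y1, y2 = rotating_frame_position(t, omega0, Omega)
    a = Omega * t
    x1 = y1 * math.cos(a) - y2 * math.sin(a)
    x2 = y1 * math.sin(a) + y2 * math.cos(a)
    return x1, x2

# Hypothetical sample values with Omega < omega0.
samples = [fixed_frame_position(t, omega0=3.0, Omega=1.0) for t in (0.0, 0.3, 1.7, 4.2)]
```

For every sampled time, $x_1$ vanishes to rounding error and $x_2$ equals $2 \sin\mathring{\omega} t$, in agreement with the comment above.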
22.4.5 Spherical pendulum, again
♥
Here, we consider again the exact equations that govern the motion of a spherical pendulum (Eqs. 20.82 and 20.84), which were obtained in Subsection 20.5.1, by using the angular momentum approach. I said, then, that these equations are easier to interpret by using the material in this chapter. To clarify this statement, let us consider the momentum equation for the problem divided by m, namely a = g + fR/m,
(22.75)
where fR is the force exerted by the bar (constraint reaction). Let us choose the moving frame so as to have the problem formulated in the rotating vertical plane, say P, that is hinged along the z-axis and contains the pendulum at all times. Specifically, let the origins O and Q of the two frames coincide with the hinge H (namely O = Q = H), and introduce a non–inertial right–handed orthonormal frame i, j, k, with: (i) k along the vertical (upwardly directed), (ii) i on the plane P, and directed horizontally, rightwards in the figure, and (iii) j = k × i (unit normal to the plane of the figure, pointing away from your eyes). Also, let t denote the trajectory tangent, pointing in the direction of growing θ (θ being the angle that the pendulum forms with the vertical, positive counterclockwise), and n the trajectory normal, which is directed towards H (Fig. 22.6). Using the formulation in the rotating frame i, j, k, we have (use Eq. 22.40, with aQ = 0, since the origin Q coincides with the hinge)
Fig. 22.6 Spherical pendulum
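$$\mathbf{a} = \mathbf{a}_R + \dot{\boldsymbol{\omega}} \times \mathbf{y} - \omega^2\,\mathbf{y}_N + 2\,\boldsymbol{\omega} \times \mathbf{v}_R, \tag{22.76}$$

where $\mathbf{y} = -\ell\,\mathbf{n}$, $\mathbf{v}_R = \ell\,\dot{\theta}\,\mathbf{t}$ (Eq. 20.31) is the relative velocity, whereas $\mathbf{a}_R$ is the relative acceleration. In addition, $\boldsymbol{\omega} = \omega\,\mathbf{k}$ is the angular velocity of the plane P around the z-axis, and $\mathbf{y}_N$ is the portion of $\mathbf{y}$ normal to $\boldsymbol{\omega}$. In the plane P, the pendulum moves along a circular path. Therefore, we may use the intrinsic acceleration components (Eq. 20.32), and obtain

$$\mathbf{a}_R = \ddot{s}\,\mathbf{t} + \frac{v_R^2}{\ell}\,\mathbf{n} = \ell\,\ddot{\theta}\,\mathbf{t} + \ell\,\dot{\theta}^2\,\mathbf{n}. \tag{22.77}$$

Furthermore, the Euler acceleration is given by (use $\mathbf{k} \times \mathbf{n} = -\sin\theta\,\mathbf{j}$)

$$\dot{\boldsymbol{\omega}} \times \mathbf{y} = \dot{\omega}\,\mathbf{k} \times (-\ell\,\mathbf{n}) = \ell\,\dot{\omega}\,\sin\theta\,\mathbf{j}. \tag{22.78}$$

Moreover, the centripetal acceleration is given by (use $\mathbf{i} = \cos\theta\,\mathbf{t} - \sin\theta\,\mathbf{n}$)

$$-\omega^2\,\mathbf{y}_N = -\omega^2\,\ell\,\sin\theta\,\mathbf{i} = -\omega^2\,\ell\,\sin\theta\,(\cos\theta\,\mathbf{t} - \sin\theta\,\mathbf{n}). \tag{22.79}$$

Finally, the Coriolis acceleration is given by (use $\mathbf{k} \times \mathbf{t} = \cos\theta\,\mathbf{j}$)

$$2\,\boldsymbol{\omega} \times \mathbf{v}_R = 2\,\omega\,\mathbf{k} \times \ell\,\dot{\theta}\,\mathbf{t} = 2\,\omega\,\ell\,\dot{\theta}\,\cos\theta\,\mathbf{j}. \tag{22.80}$$

Combining Eqs. 22.75–22.80, one obtains

$$(\ell\,\ddot{\theta}\,\mathbf{t} + \ell\,\dot{\theta}^2\,\mathbf{n}) + \ell\,\dot{\omega}\,\sin\theta\,\mathbf{j} - \omega^2\,\ell\,(\sin\theta\cos\theta\,\mathbf{t} - \sin^2\theta\,\mathbf{n}) + 2\,\omega\,\ell\,\dot{\theta}\,\cos\theta\,\mathbf{j} = -g\,(\sin\theta\,\mathbf{t} + \cos\theta\,\mathbf{n}) + T\,\mathbf{n}/m. \tag{22.81}$$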
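[Hint: Use $\mathbf{g} = -g\,\mathbf{k} = -g\,(\sin\theta\,\mathbf{t} + \cos\theta\,\mathbf{n})$ and set $\mathbf{f}_R = T\,\mathbf{n}$.] Then, taking the components along the orthonormal basis t, j, n yields, respectively,

$$\begin{aligned} \ell\,\ddot{\theta} - \omega^2\,\ell\,\sin\theta\cos\theta &= -g \sin\theta, \\ \ell\,\dot{\omega}\,\sin\theta + 2\,\omega\,\ell\,\dot{\theta}\,\cos\theta &= 0, \\ \ell\,\dot{\theta}^2 + \omega^2\,\ell\,\sin^2\theta &= -g \cos\theta + T/m. \end{aligned} \tag{22.82}$$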
Note that the first of the above equations is in agreement with Eq. 20.84, with $\omega = \dot{\varphi}$, and determines $\theta = \theta(t)$. [The terms on the left are the longitudinal components of the relative and centripetal accelerations. Their sum equals the corresponding component of g. The term $-\omega^2\,\ell\,\sin\theta\cos\theta$ (which was left somewhat unexplained in Remark 148, p. 823) is simply due to the centripetal acceleration.] In the second equation, we have that the sum of Euler and Coriolis accelerations, both directed along j, vanishes. This equation, multiplied by $(\sin\theta)/\ell$, may be integrated to yield $\omega \sin^2\theta = \text{constant}$, in agreement with Eq. 20.83. Finally, in the third equation (component along n, not addressed in Section 20.5), we have that the intensity T of the constraint reaction, divided by m, equals the sum of the n-components of: (i) relative and centrifugal accelerations and (ii) the gravity acceleration component $g \cos\theta$.

◦ Comment. This illustrative example reinforces the considerations in Remark 161, p. 923 on how important it is to distinguish the centripetal acceleration (Eq. 22.79) from the lateral acceleration (Eq. 22.77).
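The integral $\omega \sin^2\theta = \text{constant}$ can be illustrated numerically. The following minimal sketch integrates the first two component equations of the spherical pendulum (written with the pendulum length, here called `ELL`) by a classical fourth-order Runge–Kutta scheme and monitors the drift of $\omega \sin^2\theta$. The values of `G`, `ELL`, the initial state and the step size are assumptions chosen only for illustration, and the scheme requires $\sin\theta \ne 0$ along the motion.

```python
import math

G, ELL = 9.81, 1.0   # assumed gravity (m/s^2) and pendulum length (m)

def rhs(state):
    """Time derivatives of (theta, theta_dot, omega) from the t- and j-components."""
    theta, theta_dot, omega = state
    s, c = math.sin(theta), math.cos(theta)
    theta_ddot = omega**2 * s * c - (G / ELL) * s    # t-component equation
    omega_dot = -2.0 * omega * theta_dot * c / s     # j-component equation
    return (theta_dot, theta_ddot, omega_dot)

def rk4_step(state, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, f):
        return tuple(x + f * kx for x, kx in zip(state, k))
    k1 = rhs(state)
    k2 = rhs(shifted(k1, dt / 2))
    k3 = rhs(shifted(k2, dt / 2))
    k4 = rhs(shifted(k3, dt))
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (0.5, 0.0, 2.0)                       # hypothetical theta, theta_dot, omega
h0 = state[2] * math.sin(state[0]) ** 2       # initial value of omega*sin^2(theta)
for _ in range(5000):                         # 5 seconds with dt = 1e-3
    state = rk4_step(state, 1e-3)
drift = abs(state[2] * math.sin(state[0]) ** 2 - h0)
```

With these sample values the drift stays at the level of the integration error, which is consistent with the conservation statement above.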
22.4.6 Foucault pendulum
♥
Thus far, we have almost always assumed the Earth frame of reference to be inertial. To be precise, we have only seen qualitative examples that include the Coriolis force due to the Earth’s rotation. Here is our opportunity to correct the situation. This subsection is devoted to the Foucault pendulum, where it is paramount to treat an Earth frame of reference as non–inertial.⁶

⁶ Named after the French physicist Jean Bernard Léon Foucault (1819–1868), who introduced it.

In the preceding section, we have discussed the spherical pendulum in an inertial frame. Here, we include the fact that now the Earth frame is not inertial, and hence the corresponding (apparent) inertial forces must be included in Eq. 20.69. However, of the two (transport and Coriolis), only the Coriolis force is relevant here. Indeed, let us consider the transport acceleration $\mathbf{a}_T = \mathbf{a}_Q - \Omega^2\,\mathbf{y}_N + \dot{\boldsymbol{\omega}} \times \mathbf{y}$ (Eq. 22.32). As discussed in Subsection 22.4.2, on gravity vs gravitation, the centrifugal force $m\,\Omega^2\,\mathbf{z}$ is included in gravity. [The variation of the centrifugal force due to the motion of the pendulum may be neglected, as it would give rise to a term proportional to $\Omega_N^2 \ll \mathring{\omega}^2 := g/\ell$, as you may verify.] In addition, the term $\dot{\boldsymbol{\omega}} \times \mathbf{y}$, due to the variation of the angular velocity of the Earth, is definitely negligible. But what about the first term, namely $m\,\mathbf{a}_Q$? This requires a more detailed analysis. At the center of the Earth, such a term is balanced exactly by the gravitational field due to the Sun and the other celestial bodies (momentum equation of the center of mass, $m\,\mathbf{a}_Q - \mathbf{f}_G = \mathbf{0}$, Eq. 20.3). At any point P different from Q this difference no longer vanishes. Nonetheless, for the analysis presented here, this term is small and may be ignored. [On the other hand, such a small difference is responsible for the tides (Section 22.6).]

Thus, all we have to do is to add, to the spherical pendulum equation (Eq. 20.69), the Coriolis force, namely $\mathbf{f}_{Cor} = -2\,m\,\boldsymbol{\omega}_E \times \mathbf{v}_R$, with $\boldsymbol{\omega}_E = \Omega\,\mathbf{e}_E$, where $\mathbf{e}_E$ is the unit vector directed from the South to the North pole, and $\Omega > 0$ is the angular speed of the Earth. Thus, the angular momentum equation for the spherical pendulum ($m\,\mathbf{x} \times \ddot{\mathbf{x}} = m\,\mathbf{x} \times \mathbf{g}$, Eq. 20.69) yields (we use $\mathbf{y}$ because we operate in a moving, non–inertial frame)

$$\mathbf{y} \times \mathbf{a}_R = -\mathbf{y} \times (2\,\boldsymbol{\omega}_E \times \mathbf{v}_R) + \mathbf{y} \times \mathbf{g}. \tag{22.83}$$

Consider an orthonormal Earth frame $\mathbf{j}_k$ ($k = 1, 2, 3$) with the origin placed at H. Let $y_1$ and $y_2$ denote two orthogonal horizontal components of $\mathbf{y}$ in this frame. In addition, let us introduce for simplicity a small–perturbation approximation. Specifically, assume that $y_1$ and $y_2$ are small, say of order $\varepsilon$, and $y_3 = -\ell + O[\varepsilon^2]$. Accordingly, we obtain $\mathbf{y} = y_1\,\mathbf{j}_1 + y_2\,\mathbf{j}_2 - \ell\,\mathbf{j}_3 + O[\varepsilon^2] = -\ell\,\mathbf{j}_3 + O[\varepsilon]$. Also, recalling that $\mathbf{v}_R = \delta\mathbf{y}/\delta t$ (Eq. 22.37) and $\mathbf{a}_R = \delta\mathbf{v}_R/\delta t$ (Eq. 22.41), we have $\mathbf{v}_R = \dot{y}_1\,\mathbf{j}_1 + \dot{y}_2\,\mathbf{j}_2 + O[\varepsilon^2]$ and $\mathbf{a}_R = \ddot{y}_1\,\mathbf{j}_1 + \ddot{y}_2\,\mathbf{j}_2 + O[\varepsilon^2]$. Then, neglecting all the (non–linear) terms of order $O[\varepsilon^2]$, Eq. 22.83 divided by $\ell$ yields

$$\left(\ddot{y}_2 + 2\,\Omega_N\,\dot{y}_1 + \mathring{\omega}^2\,y_2\right) \mathbf{j}_1 - \left(\ddot{y}_1 - 2\,\Omega_N\,\dot{y}_2 + \mathring{\omega}^2\,y_1\right) \mathbf{j}_2 = \mathbf{0}, \tag{22.84}$$

as you may verify. [Hint: Use $\mathbf{j}_3 \times \mathbf{j}_2 = -\mathbf{j}_1$ and $\mathbf{j}_3 \times \mathbf{j}_1 = \mathbf{j}_2$ (Eq. 15.73), as well as the BAC–minus–CAB rule (Eq. 15.96) to obtain $\mathbf{y} \times [2\,\boldsymbol{\omega}_E \times \mathbf{v}_R] = 2\,\ell\,\Omega_N\,(\dot{y}_1\,\mathbf{j}_1 + \dot{y}_2\,\mathbf{j}_2) + O[\varepsilon^2]$, where $\Omega_N := \Omega \sin\psi$ denotes the component of $\boldsymbol{\omega}_E$ in the direction normal to the Earth surface, $\psi$ being the latitude (Eq. 19.140). Also, set $\mathring{\omega}^2 := g/\ell$ (natural frequency of oscillation when $\Omega_N = 0$, e.g., at the equator).] Hence, we have

$$\ddot{y}_1 - 2\,\Omega_N\,\dot{y}_2 + \mathring{\omega}^2\,y_1 = 0, \qquad \ddot{y}_2 + 2\,\Omega_N\,\dot{y}_1 + \mathring{\omega}^2\,y_2 = 0. \tag{22.85}$$
The equation is of the same type as that for the spinning system (Eq. 22.56) and the results of Subsection 22.4.4 (with $\Omega \ll \mathring{\omega}$) apply. In particular, the plane of oscillation rotates with angular speed equal to $\Omega_N$, thereby
demonstrating that the Earth rotates. [In the absolute frame of reference, the plane of oscillation does not rotate.] Indeed, the Foucault pendulum was the first apparatus to provide visual evidence of the rotation of the Earth.
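To give a feel for the numbers, the sketch below evaluates the time needed for the plane of oscillation to complete a full turn, $2\pi/\Omega_N$ with $\Omega_N = \Omega \sin\psi$, which equals the Earth sidereal rotation period divided by $\sin\psi$. The sidereal period of 0.997 269 68 solar days is the value quoted in this chapter; the function name and the sample latitudes are illustrative assumptions.

```python
import math

SIDEREAL_DAY_HOURS = 0.99726968 * 24.0   # Earth sidereal rotation period, in hours

def foucault_turn_hours(latitude_deg: float) -> float:
    """Hours for the oscillation plane to rotate by 2*pi at the given latitude.

    Uses 2*pi / (Omega * sin(psi)) = T_sidereal / sin(psi); undefined at the
    equator, where Omega_N = 0 and the plane does not rotate.
    """
    return SIDEREAL_DAY_HOURS / abs(math.sin(math.radians(latitude_deg)))

turn_pole = foucault_turn_hours(90.0)   # one sidereal day at the poles
turn_45 = foucault_turn_hours(45.0)     # longer by a factor sqrt(2) at 45 degrees
```

The rotation is fastest at the poles and slows down towards the equator, where the formula diverges.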
22.5 Inertial frames revisited
♥
Here, we explore in greater depth the meaning of “inertial frame of reference.” Specifically, we want to devise a conceptual procedure that would allow us to determine whether we are in an inertial frame. [We cannot use the Foucault pendulum, which we have addressed in the preceding section, because we would obtain $\Omega_N$ and not $\boldsymbol{\omega} = \Omega\,\mathbf{e}_E$. In addition, we would not be able to verify whether $\mathbf{a}_Q = \mathbf{0}$.] Here, we only assume that the particle is subject to a field force $\mathbf{f}_F$, about which we know that it tends to zero as $\mathbf{y}$ tends to infinity.

Let us construct a conceptual experimental apparatus, not necessarily “easy to implement.” Consider a mass connected to a structure through a system of springs, which provides a measurable force $\mathbf{f}_S$. The structure is allowed to move in uniform translation with respect to the frame of reference under consideration (e.g., one connected to the Earth), with prescribed relative velocity $\mathbf{v}_R$. Hence, we have (use Eq. 22.40)

$$m \left[\mathbf{a}_Q + \dot{\boldsymbol{\omega}} \times \mathbf{y} - \omega^2\,\mathbf{y}_N + 2\,\boldsymbol{\omega} \times \mathbf{v}_R + \mathbf{a}_R\right] = \mathbf{f}_S(\mathbf{y}, t) + \mathbf{f}_F(\mathbf{y}, t), \tag{22.86}$$

where $m$ denotes the (known) mass of the particle, and $\mathbf{a}_R$ is the relative acceleration, which of course can be measured and is therefore considered as known.

To decide whether the frame is inertial, we may proceed as follows. Let us take two measurements of the force $\mathbf{f}_S$, say $\mathbf{f}_S'$ and $\mathbf{f}_S''$, corresponding to two values of the velocity, say $\mathbf{v}_R'$ and $\mathbf{v}_R'' = 2\,\mathbf{v}_R'$. [Conceptually, the two sets of measurements should be taken in the same place, at the same time.] Subtracting Eq. 22.86 as written for the two cases, we have

$$\Delta\mathbf{f}_S = \mathbf{f}_S''(\mathbf{y}, t) - \mathbf{f}_S'(\mathbf{y}, t) = 2\,m\,\boldsymbol{\omega} \times (\mathbf{v}_R'' - \mathbf{v}_R') = 2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R'. \tag{22.87}$$

Next, note that $\mathbf{v}_R' \times \Delta\mathbf{f}_S = 2\,m\,v_R'^2\,\boldsymbol{\omega}_N$ (use $\mathbf{a} \times (\mathbf{b} \times \mathbf{a}) = a^2\,\mathbf{b}_N$, Eq. 15.67). Accordingly, given an arbitrary frame of reference, say $\mathbf{j}_1, \mathbf{j}_2, \mathbf{j}_3$, if we choose $\mathbf{v}_R' = v_3\,\mathbf{j}_3$, we obtain

$$\boldsymbol{\omega}_N = \omega_1\,\mathbf{j}_1 + \omega_2\,\mathbf{j}_2 = \mathbf{j}_3 \times \Delta\mathbf{f}_S / (2\,m\,v_3), \tag{22.88}$$

which provides us with $\omega_1$ and $\omega_2$. We cannot possibly obtain $\omega_3$ from Eq. 22.88. However, if we make a second experiment with $\mathbf{v}_R' = v_1\,\mathbf{j}_1$, we obtain $\omega_2$ and $\omega_3$. This way we have obtained all three components of $\boldsymbol{\omega}$. [Again, $\mathbf{j}_1$, $\mathbf{j}_2$ and $\mathbf{j}_3$ form an arbitrarily chosen orthonormal basis.] We may repeat the experiment over time and obtain $\boldsymbol{\omega}(t)$. Once we know $\boldsymbol{\omega}(t)$, we can also time–differentiate it and obtain $\dot{\boldsymbol{\omega}}(t)$. Thus, the terms regarding centrifugal, Euler and Coriolis forces are known. [Recall that $\mathbf{y}$ is the vector from Q (origin of our system) to P, the arbitrarily chosen point under consideration.]

Finally, we can even evaluate (again, conceptually) $m\,\mathbf{a}_Q$. Indeed, let us see what happens if $\mathbf{v}_R = \mathbf{0}$. All the terms are known, except for $\mathbf{f}_F$ and $m\,\mathbf{a}_Q$. However, the first tends to zero as $\mathbf{y}$ tends to infinity, whereas the second is independent of $\mathbf{y}$. This allows us to evaluate $\mathbf{a}_Q(t)$. This way, we are able to determine whether our frame of reference is inertial ($\boldsymbol{\omega} = \mathbf{a}_Q = \mathbf{0}$), or not. Voilà.
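The two-experiment procedure for recovering the angular velocity can be mimicked numerically. In the sketch below the "measured" force difference is simulated from a known angular velocity via $\Delta\mathbf{f}_S = 2\,m\,\boldsymbol{\omega} \times \mathbf{v}_R'$, and the components of $\boldsymbol{\omega}$ are then recovered from cross products with the velocity direction. All names (`cross`, `delta_force`, `normal_part`) and numerical values are hypothetical, chosen only to illustrate the algebra.

```python
def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def delta_force(m, omega, v):
    """Simulated force difference 2 m omega x v (stands in for a measurement)."""
    return tuple(2.0 * m * c for c in cross(omega, v))

def normal_part(m, e, speed, df):
    """Part of omega normal to the unit vector e: e x df / (2 m speed)."""
    return tuple(c / (2.0 * m * speed) for c in cross(e, df))

m, speed = 2.0, 5.0                      # assumed mass and speed
omega_true = (0.3, -0.1, 0.7)            # assumed angular velocity to be recovered
j1, j3 = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# First experiment, velocity along j3: yields omega_1 and omega_2.
w_a = normal_part(m, j3, speed, delta_force(m, omega_true, tuple(speed * c for c in j3)))
# Second experiment, velocity along j1: yields omega_2 and omega_3.
w_b = normal_part(m, j1, speed, delta_force(m, omega_true, tuple(speed * c for c in j1)))

omega_recovered = (w_a[0], w_a[1], w_b[2])
```

Each experiment is blind to the component of $\boldsymbol{\omega}$ along the chosen velocity, which is why two different directions are needed.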
22.6 Appendix A. The tides
♥
In this appendix, we address the phenomenon of tides, a problem that has intrigued me for a long time. My interest goes way back, when my high school teacher of mathematics and physics, Prof. Pietro Pagani, discussed them in class. He used the scheme depicted in Fig. 22.7.
Fig. 22.7 On the tides. A simple model
Here, we consider only the Sun and the Earth. The Earth is assumed to consist of a uniform ball subject only to the gravitational force due to the Sun, whose center is O. At the center of the Earth, Q, the gravitational force due to the Sun, $\mathbf{f}_S$, is perfectly balanced by the lateral force due to the orbital motion of the Earth around the Sun, $\mathbf{f}_Q = -m\,\mathbf{a}_Q = m\,\Omega^2\,\mathbf{r}_{SE}$, where $\Omega$ is the angular speed of the Earth in its orbital motion (which for simplicity is assumed to be perfectly circular), and $\mathbf{r}_{SE}$ denotes the vector from the center of the Sun to the center of the Earth. However, the situation is different for any point that does not coincide with Q, in particular for the points on the surface of the Earth, as illustrated in Fig. 22.7. The figure depicts the Earth orbital plane, which is approximated with its equatorial plane, although the angle between the two (axial tilt) is 23.4393° (p. 1003). For instance, the point $A_1$ is closer to the Sun than Q. Therefore, the Sun gravitational force $\mathbf{f}_S$ is stronger, and hence $\mathbf{f}_Q$ and $\mathbf{f}_S$ do not offset each other. Their sum points away from Q. The opposite occurs at $A_2$, but the vector sum still points away from Q. On the other hand, at $B_1$ and $B_2$ the sum points towards Q, because of the difference in the angles. Therefore, the water is pulled upwards at $A_1$ and $A_2$, and downwards at $B_1$ and $B_2$. As the Earth rotates, the point P experiences a periodic force with a peak every 12 hours.

I had an objection, though: the shape of the water in a pot does not change if the force of gravity increases or decreases. Therefore, the formulation proposed worked well for a uniform ball fully covered with water (for, in this case the shape of the water surface is equipotential, namely orthogonal to the resultant potential force), but not for small basins. He acknowledged that the mechanism is a bit more complex, but added that the substance remained essentially that described in Fig. 22.7, and told me I should settle for that explanation, because the complete analysis was way above my mathematical background. [After all, I was only in my twelfth grade. Prof. Pagani was very well known in our school, because he covered much more advanced material than the other physics profs.]
I soon forgot about the issue and that is the way it stayed for about forty years, until one day, in the mid–nineties, I was at MIT, visiting my friend Jerry Milgram, a professor in the Naval Architecture and Ocean Engineering Department. At some point, I don’t remember why, during our conversation, the topic shifted to tides. Suddenly, the issue I had raised with Prof. Pagani popped up in my mind, and I asked Jerry the same question. He pointed out to me that one should take into account the circumferential component of the difference between gravitational and centrifugal forces, and that this was the one responsible for the tides in relatively small basins. He even gave me a figure illustrating the variation of the radial and circumferential components with time. Then, I got curious and decided to examine the problem from a mathematical point of view. I went through all the equations to my full satisfaction and obtained a relatively detailed description of the forces that cause the tides. This process requires only the know–how you have acquired thus far. Accordingly, such an analysis (an oversimplification of a very complex phenomenon) is presented in this appendix.
22.6.1 A relatively simple model
♥
In this subsection, we present a relatively simple model that will help us in shedding some light on the phenomenon of tides.
• Preliminary considerations

Here, we assume that the Earth surface may be approximated with a sphere, and that its center, denoted by Q, coincides with the center of mass of the Earth, $G_E$.

◦ Comment.♥ Maybe you heard that the Earth is as smooth as a billiard ball. Is this true? Let’s see! A billiard ball is supposed to be exactly spherical. However, nobody is perfect! According to the World Pool–Billiard Association, an uncertainty approximately equal to 0.22% of the billiard ball diameter is acceptable (Ref. [72]). A 0.22% of the Earth diameter is approximately 28 km. Now, the smoothed Earth surface is close to being spherical. The difference between the equatorial radius (about 6,378 km) and the polar radius (about 6,357 km) equals about 21 km. Thus, the roundedness of the Earth is lower than the acceptable uncertainty range for the billiard ball. In addition, the roughness on the Earth is given by the difference in height between the highest mountain, Mount Everest (about 8,848 m above sea level), and the deepest point in the ocean, the Mariana Trench (about 11,000 m below sea level), which is less than 20 km. Hence, the smoothness is also lower than the acceptable uncertainty range for the billiard ball. Thus, even adding the two contributions, we can say that the Earth is almost as smooth as a billiard ball.

Let us consider the inertial forces acting on a given point P of the surface of the Earth. In this subsubsection, we consider the point of view of an observer in a non–inertial frame of reference with origin at the center Q of the Earth and rigidly connected with the Earth. Considering P as fixed in the Earth frame makes our work a bit easier, since, for our observer, we have $\mathbf{v}_R = \mathbf{a}_R = \mathbf{0}$, so that the relative and Coriolis accelerations vanish. This leaves only the force due to the transport acceleration $\mathbf{f}_T = -m\,[\mathbf{a}_Q + \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{z}) + \dot{\boldsymbol{\omega}} \times \mathbf{z}]$ (Eq. 22.32), where $\mathbf{z}$ is the vector from the center of the Earth Q to the point of its surface P (Fig. 22.7, p. 938), and is time–independent for our observer. Let us consider the three terms. The second one is the centrifugal force, which is already included in the weight (Eq. 20.173). [Weight is of no interest here, because we are only considering forces that vary with time as the Earth rotates.] The last term is due to the variation of the angular velocity of the Earth and is definitely negligible. But what about the first term, which is due
to the acceleration of the point Q? Don’t we have to include this apparent force? This is the issue addressed in this appendix. In summary, the forces that we have to include in the analysis are the gravitational forces due to all the celestial bodies (Earth excluded, since weight is of no interest here), and the term $-m\,\mathbf{a}_Q$. If our point P were at the center of the Earth, namely for $P = Q = G_E$, these two forces would balance each other (exactly!), because of the law governing the dynamics of the center of mass, $m\,\mathbf{a}_G = \mathbf{f}_E$ (Eq. 21.50). However, these forces do not balance each other exactly at any point $P \ne Q$. The main objective of this section is to present an analysis of such a difference, which is the driving force behind the tides.
• The hypotheses behind the model

In the analysis of gravity vs gravitation presented in Subsections 20.8.1 and 22.4.2, we have not included the gravitational forces due to the Sun, the Moon, the planets and other celestial bodies. We do this in this appendix. As we will see in Subsection 22.6.2, the effect of the Moon is about twice as large as that of the Sun. However, the mechanism is more easily described for the Sun, which may be considered as fixed. For this reason, this is considered first. Accordingly, here we consider an oversimplified model, consisting only of Earth and Sun, as discussed in Subsection 20.8.3, on the Kepler laws. In addition, we use the following simplifying hypotheses: (i) the Earth orbit is circular (recall Remark 152, p. 840); (ii) the Earth equatorial plane coincides with its orbital plane (even though their angle is 23.4393°); (iii) the ratio $\varepsilon = R_E/r_{SE}$ is much smaller than one,

$$\varepsilon := \frac{R_E}{r_{SE}} \ll 1, \tag{22.89}$$

where $R_E$ is the Earth radius and $r_{SE} = \|\mathbf{r}_{SE}\|$ is the distance between the Sun center and the Earth center. [The Earth equatorial radius is $R_E \simeq 6{,}378$ km, whereas the average distance of the center of the Earth, Q, from the center of the Sun, O, is $r_{SE} \simeq 149.60$ million kilometers (Fig. 22.8). Therefore, in this case we have $\varepsilon \simeq 0.000\,042\,6$, very small indeed. In the figures, the ratio $\varepsilon$ is greatly magnified, for graphical clarity.] For future reference, the average distance $r_{EM}$ of the center of the Earth from the center of the Moon is approximately 384,000 km. Thus, for the Moon, we have $\varepsilon_M := R_E/r_{EM} \simeq 0.016\,6$, which is still adequately small for the analysis presented below.
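The two small parameters quoted above follow directly from the stated distances, as the following short sketch shows. The variable names are illustrative; the numerical inputs are the values given in the text.

```python
R_E = 6378.0          # Earth equatorial radius, km (value quoted in the text)
R_SE = 149.60e6       # average Sun-Earth distance, km (value quoted in the text)
R_EM = 384_000.0      # average Earth-Moon distance, km (value quoted in the text)

eps_sun = R_E / R_SE    # ratio for the Sun, Eq. 22.89
eps_moon = R_E / R_EM   # corresponding ratio for the Moon
```

Both ratios are much smaller than one, with the lunar one larger than the solar one by a factor of roughly 390.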
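• The tidal force (exact)

Consider Eq. 21.50, namely

$$M_E\,\mathbf{a}_Q = \mathbf{f}_Q, \tag{22.90}$$

where $M_E$ and $\mathbf{a}_Q$ are the mass of the Earth and the acceleration of its center Q, whereas $\mathbf{f}_Q$ is the gravitational force that the Sun exerts on the Earth, evaluated at Q, namely (use Eq. 20.145)

$$\mathbf{f}_Q = M_E\,\tilde{\mathbf{f}}_Q, \quad \text{with} \quad \tilde{\mathbf{f}}_Q := -G\,\frac{M_S}{r_{SE}^3}\,\mathbf{r}_{SE}, \tag{22.91}$$

where $M_S$ is the mass of the Sun, and $\mathbf{r}_{SE}$ is the vector from the center of the Sun, O, to the center of the Earth, Q (Fig. 22.8). According to Eq. 22.90, at the center of the Earth, the gravitational force, $\mathbf{f}_Q$, is balanced exactly by the (apparent) inertial force $-M_E\,\mathbf{a}_Q$ (Eq. 22.46), due to the acceleration of the origin Q of the Earth frame of reference (Eq. 22.32). However, things are different at any point $P \ne Q$. To address this, let us go back to the explanation given to me by Prof. Pagani. Consider a point P of the surface of the Earth. In this case, in addition to the weight (not of interest here, because it is constant as the Earth rotates), the forces under consideration are: (i) the inertial force $-M_E\,\mathbf{a}_Q$, which equals $-\mathbf{f}_Q = -M_E\,\tilde{\mathbf{f}}_Q$ (Eqs. 22.90 and 22.91), and (ii) the Sun gravitational force acting on P, namely

$$\mathbf{f}_P = M_E\,\tilde{\mathbf{f}}_P, \quad \text{with} \quad \tilde{\mathbf{f}}_P = -G\,\frac{M_S}{r^3}\,\mathbf{r} \qquad (r := \|\mathbf{r}\|), \tag{22.92}$$

where $\mathbf{r}$ is the vector from O (center of the Sun) to P (Fig. 22.8).

Fig. 22.8 On the tides. A better model

The sum of these two forces (namely the inertial and the gravitational) is given by, for $m = 1$,

$$\Delta\mathbf{f} := \tilde{\mathbf{f}}_P - \tilde{\mathbf{f}}_Q = -G\,M_S \left(\frac{\mathbf{r}}{r^3} - \frac{\mathbf{r}_{SE}}{r_{SE}^3}\right). \tag{22.93}$$

Again, at the points $A_1$ and $A_2$ the resultant force is directed away from Q, whereas at the points $B_1$ and $B_2$, which have the same distance from O as Q, the resultant force is directed towards Q.

Next, let us obtain an expression for $\Delta\mathbf{f}$ that is easier to analyze. Consider a frame of reference i, j, k, with origin placed at O (center of the Sun, see again Fig. 22.8) and the x-axis directed like $\mathbf{r}_{SE}$, so that $\mathbf{i} := \mathbf{r}_{SE}/r_{SE}$. Note that $\tilde{\mathbf{f}}_Q/f_Q = -\mathbf{i}$, where $f_Q := \|\tilde{\mathbf{f}}_Q\| = G\,M_S/r_{SE}^2$ (Eq. 22.91). Accordingly, using Eqs. 22.91–22.93, we have

$$\frac{\Delta\mathbf{f}}{f_Q} = \frac{\tilde{\mathbf{f}}_P}{f_Q} + \mathbf{i} = -\frac{\mathbf{r}/r_{SE}}{r^3/r_{SE}^3} + \mathbf{i}, \tag{22.94}$$

with $\mathbf{r} = \mathbf{r}_{SE} + R_E \cos\theta\,\mathbf{i} + R_E \sin\theta\,\mathbf{j}$, or

$$\frac{\mathbf{r}}{r_{SE}} = (1 + \varepsilon \cos\theta)\,\mathbf{i} + \varepsilon \sin\theta\,\mathbf{j}, \tag{22.95}$$

where $\varepsilon = R_E/r_{SE}$ (Eq. 22.89), and

$$\frac{r^3}{r_{SE}^3} = \left[(1 + \varepsilon \cos\theta)^2 + (\varepsilon \sin\theta)^2\right]^{3/2} = \left[1 + 2\,\varepsilon \cos\theta + \varepsilon^2\right]^{3/2}, \tag{22.96}$$

where $\theta$ is the angle (positive counterclockwise) between the x-axis and the segment QP (Fig. 22.8 above). [When $\theta = \pi$ ($\theta = 0$), the Sun is at the zenith (nadir).]

Next, we want to obtain the components of $\Delta\mathbf{f}$, in the radial and circumferential directions. Let us denote by $\mathbf{e}_R = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}$ the unit vector in the radial direction (namely in the direction of the oriented segment from Q to P), and by $\mathbf{e}_\theta = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}$ the unit vector in the counterclockwise circumferential direction. Then, we have

$$\begin{aligned} \frac{\Delta f_\theta}{f_Q} := \frac{\Delta\mathbf{f}}{f_Q} \cdot \mathbf{e}_\theta &= \left\{-\frac{r_{SE}^3}{r^3}\left[(1 + \varepsilon \cos\theta)\,\mathbf{i} + \varepsilon \sin\theta\,\mathbf{j}\right] + \mathbf{i}\right\} \cdot \left(-\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}\right) \\ &= \frac{r_{SE}^3}{r^3}\left[(1 + \varepsilon \cos\theta) \sin\theta - \varepsilon \sin\theta \cos\theta\right] - \sin\theta = \left(\frac{r_{SE}^3}{r^3} - 1\right) \sin\theta \end{aligned} \tag{22.97}$$

and

$$\begin{aligned} \frac{\Delta f_R}{f_Q} := \frac{\Delta\mathbf{f}}{f_Q} \cdot \mathbf{e}_R &= \left\{-\frac{r_{SE}^3}{r^3}\left[(1 + \varepsilon \cos\theta)\,\mathbf{i} + \varepsilon \sin\theta\,\mathbf{j}\right] + \mathbf{i}\right\} \cdot \left(\cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}\right) \\ &= -\frac{r_{SE}^3}{r^3}\left[(1 + \varepsilon \cos\theta) \cos\theta + \varepsilon \sin^2\theta\right] + \cos\theta = -\frac{r_{SE}^3}{r^3}\,(\cos\theta + \varepsilon) + \cos\theta. \end{aligned} \tag{22.98}$$

The two equations above are exact, even for large $\varepsilon$.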
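The closed-form components can be cross-checked against a direct vector projection. The sketch below builds $\Delta\mathbf{f}/f_Q$ from its definition, projects it onto the circumferential and radial unit vectors, and compares with the closed forms of Eqs. 22.97 and 22.98. Function names and the deliberately large sample value of $\varepsilon$ are assumptions for illustration.

```python
import math

def tidal_by_projection(theta: float, eps: float) -> tuple[float, float]:
    """Return (df_theta, df_R), both divided by f_Q, by projecting the vector."""
    rx = 1.0 + eps * math.cos(theta)          # i-component of r / r_SE
    ry = eps * math.sin(theta)                # j-component of r / r_SE
    ratio = (rx * rx + ry * ry) ** -1.5       # r_SE^3 / r^3
    dfx, dfy = -ratio * rx + 1.0, -ratio * ry # components of (Delta f) / f_Q
    f_theta = -dfx * math.sin(theta) + dfy * math.cos(theta)
    f_r = dfx * math.cos(theta) + dfy * math.sin(theta)
    return f_theta, f_r

def tidal_closed_form(theta: float, eps: float) -> tuple[float, float]:
    """Closed forms of the circumferential and radial components."""
    ratio = (1.0 + 2.0 * eps * math.cos(theta) + eps * eps) ** -1.5
    f_theta = (ratio - 1.0) * math.sin(theta)
    f_r = -ratio * (math.cos(theta) + eps) + math.cos(theta)
    return f_theta, f_r

# Hypothetical check at a large eps, where no small-parameter argument applies.
max_gap = max(abs(a - b)
              for k in range(24)
              for a, b in zip(tidal_by_projection(k * math.pi / 12, 0.3),
                              tidal_closed_form(k * math.pi / 12, 0.3)))
```

The two evaluations agree to rounding error even for $\varepsilon = 0.3$, consistent with the remark that the expressions are exact.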
• The tidal force (approximate)

Here, we obtain the approximations when terms of order $\varepsilon^2$ are negligible. To this end, using $(1 + x)^\alpha = 1 + \alpha\,x + O[x^2]$ (Eq. 13.26), Eq. 22.96 yields

$$\frac{r_{SE}^3}{r^3} = \left[1 + 2\,\varepsilon \cos\theta + \varepsilon^2\right]^{-3/2} = 1 - 3\,\varepsilon \cos\theta + O[\varepsilon^2]. \tag{22.99}$$

Combining this with $\Delta f_\theta/f_Q = \left[(r_{SE}^3/r^3) - 1\right] \sin\theta$ (Eq. 22.97), neglecting terms of order $\varepsilon^2$, and using $\sin\theta \cos\theta = \tfrac{1}{2} \sin 2\theta$ (Eq. 7.64), we have

$$\frac{\Delta f_\theta}{f_Q} = -3\,\varepsilon \cos\theta \sin\theta = -\frac{3}{2}\,\varepsilon \sin 2\theta. \tag{22.100}$$

On the other hand, combining $\Delta f_R/f_Q = -(r_{SE}^3/r^3)\,(\cos\theta + \varepsilon) + \cos\theta$ (Eq. 22.98) with Eq. 22.99, using $\cos^2\theta = \tfrac{1}{2}(1 + \cos 2\theta)$ (Eq. 7.63), and neglecting terms of order $\varepsilon^2$, we have

$$\frac{\Delta f_R}{f_Q} = -\left[\cos\theta + \varepsilon - 3\,\varepsilon \cos^2\theta\right] + \cos\theta = \frac{1}{2}\,\varepsilon\,(1 + 3 \cos 2\theta). \tag{22.101}$$

Our results are depicted in Figs. 22.9 and 22.10, where $F_{\theta k} = \Delta f_\theta/(\varepsilon_k f_Q)$ and $F_{R k} = \Delta f_R/(\varepsilon_k f_Q)$, with $\varepsilon_k = 0, 0.05, 0.10$ ($k = 0, 1, 2$). Recall that for the Sun, $\varepsilon \simeq 0.000\,042\,6$, and hence the approximate solution is indistinguishable, within plotting accuracy, from the exact one. [For the Moon $\varepsilon_M := R_E/r_{EM} = 0.016\,6 \in (\varepsilon_0, \varepsilon_1)$, and hence there exists a difference, albeit minor.] Note that the approximate solution misses a feature of the phenomenon, namely the difference in the peaks exhibited by the exact solutions.
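How good the first-order approximation is for the Sun and for the Moon can be estimated numerically. The sketch below compares the exact components (Eqs. 22.97 and 22.98) with the approximate ones (Eqs. 22.100 and 22.101) over a full turn of $\theta$ and reports the largest discrepancy, scaled by $\varepsilon$ as in the figures. The function names and the sampling density are assumptions.

```python
import math

def exact(theta: float, eps: float) -> tuple[float, float]:
    """Exact (circumferential, radial) components divided by f_Q."""
    ratio = (1.0 + 2.0 * eps * math.cos(theta) + eps * eps) ** -1.5
    return ((ratio - 1.0) * math.sin(theta),
            -ratio * (math.cos(theta) + eps) + math.cos(theta))

def approx(theta: float, eps: float) -> tuple[float, float]:
    """First-order (circumferential, radial) components divided by f_Q."""
    return (-1.5 * eps * math.sin(2.0 * theta),
            0.5 * eps * (1.0 + 3.0 * math.cos(2.0 * theta)))

def max_scaled_error(eps: float, n: int = 360) -> float:
    """Largest |exact - approx| / eps over n equally spaced angles."""
    worst = 0.0
    for k in range(n):
        th = 2.0 * math.pi * k / n
        (et, er), (at, ar) = exact(th, eps), approx(th, eps)
        worst = max(worst, abs(et - at) / eps, abs(er - ar) / eps)
    return worst

err_sun = max_scaled_error(4.26e-5)   # eps for the Sun, as quoted in the text
err_moon = max_scaled_error(0.0166)   # eps for the Moon, as quoted in the text
```

The scaled error grows roughly in proportion to $\varepsilon$, so it is negligible for the Sun and small but visible for the Moon.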
Fig. 22.9 Circumferential component
Fig. 22.10 Radial component
• Taking into account the rotation of the Earth Finally, note that, as the Earth rotates, the angle θ changes, so as to have θ = θ0 + Ω t,
(22.102)
where Ω is the angular speed of the Earth. Accordingly, the Sun generates periodic forces in both the radial and circumferential directions. Similar considerations apply to the tidal forces that are generated by the Moon. To be specific, we assume that the Earth–Moon system evolves as described in Section 21.8 (Sun–Earth–Moon system). Then, you may verify that the Earth–Sun tidal analysis presented above is applicable to the Earth– Moon system as well. The total force field is the sum of the two (solar and lunar) effects (not to mention all the other celestial bodies, not included here).
22.6.2 So! Is it the Sun, or is it the Moon?
♥
Here, we show that the effect of the Moon on the tides is more than twice as large as that of the Sun. According to Eq. 22.97, the ratio $\rho_\theta = \Delta f_{\theta S}/\Delta f_{\theta M}$ (where $\Delta f_{\theta S}$ is due to the Sun and $\Delta f_{\theta M}$ is due to the Moon) is given by, for the same value of $\theta$,

$$\rho_\theta := \frac{\Delta f_{\theta S}}{\Delta f_{\theta M}} = \frac{\varepsilon\,f_{QS}}{\varepsilon_M\,f_{QM}} = \frac{M_S\,r_{EM}^3}{M_M\,r_{SE}^3}, \tag{22.103}$$

where $r_{SE}$ and $r_{EM}$ denote the distance of Q from the centers of the Sun and the Moon, respectively. This value holds for the radial component as well, namely (use Eq. 22.98)

$$\rho_R := \frac{\Delta f_{R S}}{\Delta f_{R M}} = \frac{M_S\,r_{EM}^3}{M_M\,r_{SE}^3}. \tag{22.104}$$

Using $M_S/M_M \simeq 27 \cdot 10^6$, and $r_{EM}/r_{SE} \simeq 0.384\,4/149.60 \simeq 2.570 \cdot 10^{-3}$ (p. 1003), we obtain

$$\rho_\theta = \rho_R \simeq 0.458, \tag{22.105}$$

namely the solar tidal force is about 45.8% of the lunar one.
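The arithmetic behind the 45.8% figure is short enough to reproduce directly. In this sketch the mass ratio and the two distances (in millions of kilometers) are the values quoted in the text; the variable names are illustrative.

```python
mass_ratio = 27e6                  # M_S / M_M, as quoted in the text
distance_ratio = 0.3844 / 149.60   # r_EM / r_SE, distances in millions of km

# Tidal forces scale as mass over distance cubed.
rho = mass_ratio * distance_ratio ** 3
```

The cube of the distance ratio is what lets the much lighter but much closer Moon dominate.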
• The period of the solar and lunar tides
♥♥
The period of the tides due to the Sun is, of course, one solar day. Next, let us consider the period of the tides due to the Moon, which is connected with the so–called lunar synodic day $T_{MSyn}$, namely the time it takes the Moon to reappear in the same location to an observer placed on the Earth. This is given by

$$T_{MSyn} = T_{ESid} + \tau, \tag{22.106}$$

where $T_{ESid}$ denotes the Earth sidereal period, namely the time necessary for a complete revolution (Definition 339, p. 842). On the other hand, $\tau$ is obtained as follows. As the Earth makes a full revolution around its axis, the Moon moves forward a bit. Thus, the Earth has to cover an additional angle, say $\theta^*$, to catch up with the Moon. Accordingly, $\tau$ is the time it takes the Earth to cover the additional angle $\theta^*$. Also, it takes the Moon the time $T_{ESid} + \tau$ to cover the same angle. Therefore, denoting by $\omega_E$ the angular speed of the Earth around its axis, and by $\omega_M$ the angular speed of the Moon around the Earth, we have

$$\theta^* = \omega_E\,\tau = \omega_M\,(T_{ESid} + \tau), \tag{22.107}$$

or

$$\frac{\tau}{T_{ESid}} = \frac{1}{\omega_E/\omega_M - 1}. \tag{22.108}$$

Then, recalling that $T = 2\pi/\omega$ (Eq. 11.38), we have $\omega_E/\omega_M = T_M/T_{ESid}$, where $T_M$ is the Moon orbital period, whereas $T_{ESid}$ is the Earth sidereal rotation period. Hence, we finally have

$$\frac{\tau}{T_{ESid}} = \frac{1}{T_M/T_{ESid} - 1}. \tag{22.109}$$

Recall that the Moon orbital period is $T_M \simeq 27.321\,661$ solar days (p. 1003) and that the Earth sidereal rotation period is $T_{ESid} = 0.997\,269\,68$ solar days (Definition 339, p. 842). Thus, we have

$$\tau/T_{ESid} = 0.037\,884. \tag{22.110}$$

Finally, the lunar synodic day, $T_{MSyn} = T_{ESid} + \tau$, equals

$$T_{MSyn} \simeq 0.997\,269\,68 \times (1 + 0.037\,884) \simeq 1.035\,050 \ \text{solar days}, \tag{22.111}$$

namely one solar day plus approximately 50.4 minutes. Correspondingly, the period of the lunar tides (the so–called tidal lunar semi–day) equals 12 hours and 25.2 minutes, in perfect agreement with the experimental value.
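The chain of Eqs. 22.109 to 22.111 is easy to reproduce. The sketch below uses the two periods quoted in the text (in solar days); the variable names are illustrative assumptions.

```python
T_M = 27.321661        # Moon orbital period, solar days (value quoted in the text)
T_E_SID = 0.99726968   # Earth sidereal rotation period, solar days (quoted value)

tau_ratio = 1.0 / (T_M / T_E_SID - 1.0)    # tau / T_ESid, Eq. 22.109
T_M_SYN = T_E_SID * (1.0 + tau_ratio)      # lunar synodic day, Eq. 22.111
semi_day_hours = T_M_SYN * 24.0 / 2.0      # tidal lunar semi-day, in hours
```

The same result follows from subtracting rotation rates, $1/T_{MSyn} = 1/T_{ESid} - 1/T_M$, which is an equivalent way of writing Eq. 22.107.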
22.6.3 Wrapping it up
♥
Here, we apply the results of the preceding subsections to a qualitative analysis of tides. Specifically, first we discuss how the amplitude of oscillation is strongly related to the phenomenon of resonance, or near–resonance (Section 11.7). Then, we discuss the influence of the Sun.
• Resonance and tides Some authors present a static analysis of tides for the limited case of a planet that is fully covered with water. This consists in finding the shape of the free surface that is equipotential (so that the force is normal to the free surface). However, these analyses (which use more sophisticated mathematics than that presented thus far) are based upon a sequence of equilibrium configurations. Therefore, they are clearly inadequate for taking into account the effects of resonance. So, let us address a dynamic analysis of the phenomenon of the tides. For the time being, let us focus on the circumferential component. To have a qualitative understanding of the phenomenon, consider a body of water that oscillates within its basin. Here, I ignore the Coriolis force, well aware that in doing so I am oversimplifying the phenomenon.
To use an analogy, let us consider a bucket full of water. Tilt it on its side and wait until the water stops moving. Then, bring it back to its upright position. You will note that the water oscillates, with decreasing amplitude. In fact (take my word for it), for small amplitudes, these oscillations are analogous to those of a linear damped harmonic oscillator (Section 14.2, on free oscillations of a damped harmonic oscillator). For small amplitudes, they are both governed by an equation closely related to Eq. 14.7. In the case of tides, however, we have a forcing function, given by Eq. 22.97. [Here, we limit ourselves to the Moon, in the absence of the Sun and any other celestial bodies.] Accordingly, the phenomenon is analogous to that of the forced oscillations due to a harmonic input (Section 14.3, on harmonic forcing of a damped harmonic oscillator; see in particular Eq. 14.6). Experimental evidence confirms this result. Indeed, in some places we observe tides with amplitude larger than in other places in the same general area. The larger amplitudes occur when the natural period of oscillation of the basin under consideration is in sync with the period of the tides. Finally, consider the effect of the radial component. If there were no spatial variations of the radial force, we would simply experience a change of the acceleration of gravity. An increase of the weight leaves the level of the waters unchanged (no tidal effect). This is true for relatively small basins. However, the force has a variation along θ (Eq. 22.102). This variation does have an effect on the phenomenon under consideration and should be included in the analysis. [I will let you have fun with it! Think of a seesaw!]
• Combining effects Finally, it is apparent that, to the lunar tidal effects, we have to add those due to the Sun. The period of the tidal forces due to the Sun is clearly half a solar day. Because of the difference in the periods, the lunar and solar tidal forces go continuously from being in phase (namely reach the peak at the same time, in which case they add, so as to yield stronger tides), into being in counterphase (namely when one is at its maximum, the other is at its minimum, in which case they subtract, so as to yield weaker tides). Specifically, when the Sun and the Moon are aligned (full–Moon and new–Moon days), the tides are stronger.
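The alternation between in–phase and counterphase conditions can be quantified with a rough sketch. It assumes that the two tidal forcings are pure harmonics with periods equal to half a solar day and half a lunar synodic day (Eq. 22.111); the function name is illustrative.

```python
T_SUN = 0.5                # period of the solar tidal forcing, in solar days
T_MOON = 1.035050 / 2.0    # period of the lunar tidal forcing (half of T_MSyn)


def beat_period(t1: float, t2: float) -> float:
    """Time between two successive in-phase instants of two harmonics."""
    return 1.0 / abs(1.0 / t1 - 1.0 / t2)


T_BEAT = beat_period(T_SUN, T_MOON)
print(f"in phase every {T_BEAT:.2f} solar days")
print(f"in counterphase {T_BEAT / 2.0:.2f} solar days after each in-phase instant")
```

Under these assumptions the stronger tides recur about every 14.8 solar days, which is consistent with the statement that they occur on both full–Moon and new–Moon days.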
• Concluding remarks The analysis presented above is really much too simple (but is the best we can do at this point in our journey). Many important items have been neglected, in addition to the Coriolis force mentioned above. These include the inclination of the Earth equatorial plane with respect to its orbital plane, and the fact that the shape of the Earth is an oblate spheroid. An issue raised by the German philosopher Immanuel Kant pertains to the effect of friction.7 But even more complicated phenomena come into play, such as the fact that the Earth is not a rigid body, and deforms like an elastic body under the gravitational forces under consideration. [For an analysis of these issues from a historical point of view, you might like to read the paper by Martin Ekman (Ref. [18]), which presents a history of the theories of tides from antiquity to 1950. If you are interested in a deeper analysis of tides, you may also consult, for instance, Ref. [48], pp. 130–188, and Ref. [10], pp. 95–111.]
22.7 Appendix B. Time derivative of rotation matrix
♠
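We conclude this chapter by deriving the expression for the time derivative of the rotation matrix $\mathsf{R} = [r_{hk}] = [\mathbf{i}_h \cdot \mathbf{j}_k(t)]$, which was introduced in Eq. 22.7. Consider the angular velocity ω of the moving frame and set
\[
\boldsymbol{\omega} = \sum_{k=1}^{3} \omega_k\, \mathbf{i}_k = \sum_{k=1}^{3} \omega'_k\, \mathbf{j}_k,
\tag{22.112}
\]
and introduce the angular–velocity matrices Ω and Ω′, given by
\[
\mathsf{\Omega} := [\omega_{jk}] = \Big[ \sum_{p=1}^{3} \epsilon_{jpk}\, \omega_p \Big]
= \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}
\tag{22.113}
\]
and
\[
\mathsf{\Omega}' := [\omega'_{jk}] = \Big[ \sum_{p=1}^{3} \epsilon_{jpk}\, \omega'_p \Big]
= \begin{bmatrix} 0 & -\omega'_3 & \omega'_2 \\ \omega'_3 & 0 & -\omega'_1 \\ -\omega'_2 & \omega'_1 & 0 \end{bmatrix}.
\tag{22.114}
\]
7 The German philosopher Immanuel Kant (1724–1804) contributed not only to philosophy (Transcendental Idealism), but to science as well.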
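[Note the analogy with Eq. 15.84.] We have
\[
\begin{aligned}
\frac{\mathrm{d}r_{hk}}{\mathrm{d}t}
&= \mathbf{i}_h \cdot \boldsymbol{\omega} \times \mathbf{j}_k
= \mathbf{i}_h \cdot \sum_{p=1}^{3} \omega_p\, \mathbf{i}_p \times \mathbf{j}_k
= \sum_{p=1}^{3} \omega_p\, \mathbf{i}_h \times \mathbf{i}_p \cdot \mathbf{j}_k \\
&= \sum_{j,p=1}^{3} \epsilon_{hpj}\, \omega_p\, \mathbf{i}_j \cdot \mathbf{j}_k
= \sum_{j=1}^{3} \omega_{hj}\, r_{jk}.
\end{aligned}
\tag{22.115}
\]
[Hint: For the first equality, use the Poisson formulas (Eq. 22.22). For the second equality, use Eq. 22.112. For the third, use a · b × c = a × b · c (Eq. 15.89). For the fourth equality, use $\mathbf{i}_h \times \mathbf{i}_p = \sum_{j=1}^{3} \epsilon_{hpj}\, \mathbf{i}_j$ (Eq. 15.76). For the fifth equality, use $r_{jk} = \mathbf{i}_j \cdot \mathbf{j}_k$ (Eq. 22.7), as well as the definition of $\omega_{hj}$ (Eq. 22.113).] In matrix notation, we have
\[
\frac{\mathrm{d}\mathsf{R}}{\mathrm{d}t} = \mathsf{\Omega}\, \mathsf{R}.
\tag{22.116}
\]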
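We also have
\[
\begin{aligned}
\frac{\mathrm{d}r_{hk}}{\mathrm{d}t}
&= \frac{\mathrm{d}}{\mathrm{d}t} \big( \mathbf{i}_h \cdot \mathbf{j}_k \big)
= \mathbf{i}_h \cdot \boldsymbol{\omega} \times \mathbf{j}_k
= \mathbf{i}_h \cdot \sum_{p=1}^{3} \omega'_p\, \mathbf{j}_p \times \mathbf{j}_k \\
&= \mathbf{i}_h \cdot \sum_{j,p=1}^{3} \mathbf{j}_j\, \epsilon_{jpk}\, \omega'_p
= \sum_{j=1}^{3} r_{hj}\, \omega'_{jk}.
\end{aligned}
\tag{22.117}
\]
[Hint: For the second equality, use the Poisson formulas (Eq. 22.22). For the third equality, use Eq. 22.112. For the fourth equality, use $\mathbf{j}_p \times \mathbf{j}_k = \sum_{j=1}^{3} \epsilon_{jpk}\, \mathbf{j}_j$ (Eq. 15.76). For the fifth equality, use $r_{hj} = \mathbf{i}_h \cdot \mathbf{j}_j$ (Eq. 22.7), as well as the definition of $\omega'_{jk}$ (Eq. 22.114).] In matrix notation, we have
\[
\frac{\mathrm{d}\mathsf{R}}{\mathrm{d}t} = \mathsf{R}\, \mathsf{\Omega}'.
\tag{22.118}
\]
◦ Comment. Alternatively, Eq. 22.118 may be obtained directly from Eq. 22.116 by using the fact that, for orthogonal transformations, similar matrices are related by $\mathsf{\Omega}' = \mathsf{R}^{\mathsf T}\, \mathsf{\Omega}\, \mathsf{R}$ (Eq. 16.71), so that $\mathsf{R}\, \mathsf{\Omega}' = \mathsf{\Omega}\, \mathsf{R}$.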
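The equivalence of Eqs. 22.116 and 22.118 can be checked numerically. The sketch below is an illustration under stated assumptions: the rotation matrix is built from two arbitrarily chosen elementary rotations, its columns are taken to be the components of the moving base vectors in the fixed basis (so that the components of ω in the moving frame are obtained through the transpose of R), and all helper names are made up for the example.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(3)) for j in range(3)]
            for i in range(3)]


def skew(w):
    """Angular-velocity matrix of Eq. 22.113 built from the components w."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


# Hypothetical orientation: r_hk = i_h . j_k, so column k holds j_k.
R = matmul(rot_z(0.7), rot_x(-0.4))

# Hypothetical angular velocity, components in the fixed basis i_k.
omega = [0.3, -1.2, 2.0]

# Components in the moving basis: omega'_k = omega . j_k = sum_h r_hk omega_h.
omega_prime = [sum(R[h][k] * omega[h] for h in range(3)) for k in range(3)]

lhs = matmul(skew(omega), R)          # Omega R   (Eq. 22.116)
rhs = matmul(R, skew(omega_prime))    # R Omega'  (Eq. 22.118)
MAX_DIFF = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(f"largest entry of Omega R - R Omega' = {MAX_DIFF:.2e}")
```

The difference vanishes to round–off, as expected for a proper rotation matrix.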
Chapter 23
Rigid bodies
Here we are, at the final chapter of this volume, where we explore how the equations of dynamics simplify if the particles may be assumed to be rigidly connected. The assumption that the particles be rigidly connected will be referred to as the rigid–body assumption. The rigidly–connected particles could consist of atoms, thereby providing the basis for a rigid–body formulation for a continuum (namely a body with piecewise–smooth density). We will refer to these systems as rigid particle systems (for discrete systems), or rigid bodies (for continuum systems).
• Overview of this chapter In Section 23.1 we discuss the kinematics of rigid bodies, which is closely related to the material in Chapter 22. Regarding dynamics, note that a rigid–body motion is fully determined by six degrees of freedom. Thus, we need six equations. These are provided by the equations of momentum, $m\, \mathrm{d}\mathbf{v}_G/\mathrm{d}t = \mathbf{f}_E$ (Eq. 21.50), and angular momentum, $\mathrm{d}\check{\mathbf{h}}_G/\mathrm{d}t = \mathbf{m}_{GE}$ (Eq. 21.60). These are reformulated for rigid bodies in Section 23.2. Next, in Section 23.3, we consider the energy equation, dT/dt = PE + PI (Eq. 21.73), again reformulated for rigid bodies. In Section 23.4, we focus on two–dimensional problems and present an application, namely a disk rolling down a piecewise slippery slope. Finally, we consider the dynamics of pivoted rigid bodies in three dimensions, such as gyroscopes (Section 23.6), and hinged rigid bodies in planar motion, such as realistic pendulums (Section 23.7). We also have two appendices. In the first (Section 23.8), we derive explicit expressions of the moments of inertia for some simple geometrical solids, whereas, in the second (Section 23.9), we introduce tensors, which are used in Section 23.2.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9_23
23.1 Kinematics of rigid bodies
In this section, we address the kinematics of rigid bodies. As stated above, this material is closely related to that presented in Chapter 22. Indeed, in the case of rigid bodies, it is convenient to introduce a frame of reference rigidly connected to the body, with origin at an arbitrary point Q of its space. We will refer to this as the body frame. Accordingly, all the material presented in Chapter 22 applies. In particular, we set for the k-th particle
\[
\mathbf{z}_k := \mathbf{x}_k - \mathbf{x}_Q,
\tag{23.1}
\]
as in Eq. 22.34. Then, we can use the formulation of Subsection 22.3.1 on the kinematics of points of the moving frame, where we have δzk/δt = 0 (Eq. 22.27). Thus, we are left with dzk/dt = ω × zk (Eq. 22.28). Then, in analogy with vT = vQ + ω × z (Eq. 22.30), we have
\[
\mathbf{v}_k = \frac{\mathrm{d}\mathbf{x}_k}{\mathrm{d}t}
= \frac{\mathrm{d}\mathbf{x}_Q}{\mathrm{d}t} + \boldsymbol{\omega} \times \mathbf{z}_k
= \mathbf{v}_Q + \boldsymbol{\omega} \times (\mathbf{x}_k - \mathbf{x}_Q).
\tag{23.2}
\]
◦ Comment. Note that Eq. 23.2 can be applied in particular if $\mathbf{x}_k$ is replaced with an arbitrary point $\mathbf{x}_{Q'}$. This yields $\mathbf{v}_{Q'} = \mathbf{v}_Q + \boldsymbol{\omega} \times (\mathbf{x}_{Q'} - \mathbf{x}_Q)$. Subtracting this equation from Eq. 23.2, we have $\mathbf{v}_k = \mathbf{v}_{Q'} + \boldsymbol{\omega} \times (\mathbf{x}_k - \mathbf{x}_{Q'})$. Thus, changing the origin leaves Eq. 23.2 formally unchanged, as it should be, since Q was an arbitrary point to begin with.
23.1.1 Instantaneous axis of rotation
♣
In Definition 351, p. 912, we have introduced the instantaneous axis of rotation, for the limited case of a moving space that has a fixed point. Here, we address the formulation regarding the existence of an instantaneous axis of rotation in the moving frame. This is slightly more complicated, because the origin of the moving frame is not assumed to be fixed. Therefore, we wonder whether the existence of the instantaneous axis of rotation applies here as well. The answer to this question is the objective of this subsection. First, we address the particular case in which we have vQ · ω = 0, as this provides a gut–level understanding of the situation. Then, we extend the result to the case in which vQ · ω ≠ 0.
• Particular case: vQ · ω = 0 Consider first the case in which vQ and ω are perpendicular to each other at all times, so that vQ · ω = 0 (as we have for instance in planar motion). Then, according to Eq. 23.2 we have vk · ω = 0, namely all the velocities vk lie on a plane perpendicular to ω, a plane which may vary with time. At a given time t, let us choose a frame of reference with the origin at Q, the i-axis directed like vQ, and the k-axis directed like ω (so as to have that vQ = vQ i and ω = ω k) and let us consider the point P given by zP = (vQ/ω) j (Fig. 23.1, where k is pointing toward your eyes).
Fig. 23.1 Instantaneous axis of rotation
The velocity of P is given by (use Eq. 23.2, as well as k × j = −i, Eq. 15.72) vP = vQ + ω × zP = vQ i + ω k × (vQ/ω) j = vQ i − vQi = 0.
(23.3)
This is depicted in Fig. 23.1. [The dark gray area to the left of the QP line represents the (horizontal) leftwards speed of the points between Q and P due to rotation. The light gray area to the right of the QP line represents the (horizontal) rightwards speed of the same points due to translation.] The same holds true for all the points on a straight line L given by z = zP + z k with z arbitrary (namely a z-line through P ). Indeed, we have (use Eq. 23.3) v(z) = vQ + ω × (zP + z k) = vP + ω k × z k = 0.
(23.4)
In other words, for all the points that lie on L, the translation velocity is canceled exactly by the velocity due to the rotation, and hence v = 0, as also apparent from the figure. Accordingly, such a line is the instantaneous axis of rotation for the case under consideration.
• General case: vQ · ω ≠ 0 In order to extend this formulation to the general case, consider a rigid body that at a given time t has an angular velocity ω, whereas the point Q has an arbitrary velocity vQ. The presence of a component of vQ parallel to ω (namely normal to the plane of Fig. 23.1) does not alter substantially the considerations presented in the preceding subsubsection. Specifically, let us choose the z-axis to be directed like ω (as before), and the y-axis normal to vQ and ω. Then, we have $\boldsymbol{\omega} = \omega\, \mathbf{k}$ and $\mathbf{v}_Q = v_{Qx}\, \mathbf{i} + v_{Qz}\, \mathbf{k}$. Therefore, in analogy with Eq. 23.4, for all the points on the straight line L defined by $\mathbf{z} = (v_{Qx}/\omega)\, \mathbf{j} + z\, \mathbf{k}$, with z arbitrary, we have
\[
\begin{aligned}
\mathbf{v} = \mathbf{v}_Q + \boldsymbol{\omega} \times \mathbf{z}
&= \big[ v_{Qx}\, \mathbf{i} + v_{Qz}\, \mathbf{k} \big] + \omega\, \mathbf{k} \times \big[ (v_{Qx}/\omega)\, \mathbf{j} + z\, \mathbf{k} \big] \\
&= \big[ v_{Qx}\, \mathbf{i} + v_{Qz}\, \mathbf{k} \big] - v_{Qx}\, \mathbf{i} = v_{Qz}\, \mathbf{k}.
\end{aligned}
\tag{23.5}
\]
Thus, all the points on the line L have a velocity v which is parallel to k, namely to L itself. In other words, all the points of L move along L itself. Accordingly, the line L, seen as a whole (namely as indistinguishable from the individual points) does not move. Therefore, such a line is the instantaneous axis of rotation for the general case under consideration. The motion consists of: (i) a translation parallel to the axis of rotation with velocity $v_{Qz}\, \mathbf{k}$, and (ii) a rotation around the same axis, with angular velocity $\boldsymbol{\omega} = \omega\, \mathbf{k}$.
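The construction above can be tried out numerically. The sketch below uses the frame–independent expression z_P = ω × vQ / ‖ω‖² for a point of the axis relative to Q; this expression does not appear in the text, but it reduces to (vQx/ω) j in the frame used for Eq. 23.5. The function names and the numerical values are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-component lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axis_point(v_q, omega):
    """Point of the instantaneous axis, relative to Q (assumes omega != 0)."""
    w2 = dot(omega, omega)
    if w2 == 0.0:
        raise ValueError("pure translation: no instantaneous axis of rotation")
    return [c / w2 for c in cross(omega, v_q)]


def velocity(v_q, omega, z):
    """Rigid-body velocity v = v_Q + omega x z (Eq. 23.2)."""
    w_cross_z = cross(omega, z)
    return [v_q[i] + w_cross_z[i] for i in range(3)]


# Frame used in the text: omega along k, v_Q in the (i, k) plane.
v_q = [2.0, 0.0, 0.5]
omega = [0.0, 0.0, 4.0]
z_p = axis_point(v_q, omega)          # (v_Qx / omega) j
v_p = velocity(v_q, omega, z_p)       # v_Qz k
print("z_P =", z_p, " v_P =", v_p)

# Generic orientation: the velocity on the axis stays parallel to omega.
v_q2 = [1.0, -2.0, 0.5]
omega2 = [0.3, 0.4, 1.2]
v_p2 = velocity(v_q2, omega2, axis_point(v_q2, omega2))
PARALLEL_DEFECT = max(abs(c) for c in cross(v_p2, omega2))
print(f"|v_P x omega| (generic case) = {PARALLEL_DEFECT:.2e}")
```

In the first case the point is (vQx/ω) j and its velocity is vQz k, as in Eq. 23.5.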
23.2 Momentum and angular momentum equations
From Eq. 23.2 we see that, in the case of rigid particle systems, the motion of all the particles is fully determined by six parameters, namely the components of vQ(t) and ω(t). Indeed, once we have evaluated vQ(t) and ω(t), we can obtain xQ by integrating the vector differential equation $\dot{\mathbf{x}}_Q = \mathbf{v}_Q$, and then the location xk of each point, by integrating the vector differential equations $\dot{\mathbf{x}}_k = \mathbf{v}_Q(t) + \boldsymbol{\omega}(t) \times [\mathbf{x}_k - \mathbf{x}_Q(t)]$ (Eq. 23.2). Accordingly, we need two vector differential equations, for vQ(t) and ω(t) respectively. These are provided by the momentum and angular momentum equations for rigid bodies, which are examined, respectively, in the two subsections that follow. It should be emphasized again that these are the only equations needed to solve completely the motion of a rigid body (two vector equations for two vector unknowns, namely vQ and ω). This is the main reason for our insistence on these equations in Chapter 21.
• Choosing the center of mass as the origin The solution of the problem simplifies considerably if we choose the point Q to coincide with the center of mass G of the rigid body. Indeed, by doing so, we have two effects. First and most important, the equation for the motion of G is very simple. For, the momentum equation for n particles, namely m dvG/dt = fE (Eq. 21.50), is clearly still valid. In addition, we choose the pivot for the angular momentum equation to coincide with G. Then, the angular momentum equation simplifies, from dhO/dt = mOE + m vG × vO (Eq. 21.55), to dhG/dt = mGE (Eq. 21.60). Accordingly, this choice is adopted throughout this chapter, with the exception of Section 23.6 on pivoted bodies and Section 23.7 on hinged bodies. Specifically, instead of vk = vQ + ω × zk (Eq. 23.2), with zk := xk − xQ (Eq. 23.1), we have vk = vG + ω × (xk − xG), namely
\[
\check{\mathbf{v}}_k := \mathbf{v}_k - \mathbf{v}_G = \boldsymbol{\omega} \times \check{\mathbf{x}}_k,
\tag{23.6}
\]
where $\check{\mathbf{x}}_k = \mathbf{x}_k - \mathbf{x}_G$ (Eq. 21.65).
23.2.1 Momentum equation
The momentum equation is given by Eq. 21.50, namely
\[
m\, \frac{\mathrm{d}\mathbf{v}_G}{\mathrm{d}t} = \mathbf{f}_E.
\tag{23.7}
\]
[Note that, for rigid bodies, xG is a point rigidly connected to the body, albeit not necessarily a point of the body (e.g., a constant–density doughnut).]
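23.2.2 Angular momentum equation
Here, we consider the equation of the angular momentum with respect to the center of mass, in the form given by Eq. 21.68, namely
\[
\frac{\mathrm{d}\check{\mathbf{h}}_G}{\mathrm{d}t} = \mathbf{m}_{GE},
\tag{23.8}
\]
where $\check{\mathbf{h}}_G = \sum_{k=1}^{n} m_k\, \check{\mathbf{x}}_k \times \check{\mathbf{v}}_k$ (Eq. 21.64), whereas $\mathbf{m}_{GE} = \sum_{k=1}^{n} \check{\mathbf{x}}_k \times \mathbf{f}_{kE}$ is the resultant moment of the external forces, with respect to the center of mass xG (Eq. 21.56). Combining with $\check{\mathbf{v}}_k = \boldsymbol{\omega} \times \check{\mathbf{x}}_k$ (Eq. 23.6), we obtain
\[
\check{\mathbf{h}}_G = \sum_{k=1}^{n} m_k\, \check{\mathbf{x}}_k \times (\boldsymbol{\omega} \times \check{\mathbf{x}}_k).
\tag{23.9}
\]
Next, using a × (b × c) = b (a · c) − c (a · b) (BAC–minus–CAB rule, Eq. 15.96), we have
\[
\check{\mathbf{h}}_G = \sum_{k=1}^{n} m_k \big[ \boldsymbol{\omega}\, (\check{\mathbf{x}}_k \cdot \check{\mathbf{x}}_k) - \check{\mathbf{x}}_k\, (\check{\mathbf{x}}_k \cdot \boldsymbol{\omega}) \big].
\tag{23.10}
\]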
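• Moment of inertia tensor Finally, the time has come to introduce tensors. To this end, it is convenient to begin by shifting to matrix notation (Definition 237, p. 596). Then, we have
\[
\check{\mathsf{h}}_G = \sum_{k=1}^{n} m_k \big[ \mathsf{\omega}\, (\check{\mathsf{x}}_k^{\mathsf T} \check{\mathsf{x}}_k) - \check{\mathsf{x}}_k\, (\check{\mathsf{x}}_k^{\mathsf T} \mathsf{\omega}) \big],
\tag{23.11}
\]
which may be written as
\[
\check{\mathsf{h}}_G = \mathsf{J}_G\, \mathsf{\omega},
\tag{23.12}
\]
where
\[
\mathsf{J}_G = \sum_{k=1}^{n} m_k \big[ (\check{\mathsf{x}}_k^{\mathsf T} \check{\mathsf{x}}_k)\, \mathsf{I} - \check{\mathsf{x}}_k\, \check{\mathsf{x}}_k^{\mathsf T} \big]
\tag{23.13}
\]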
is called the moment of inertia matrix.
◦ Warning. Moments of inertia have nothing to do with moments of forces (another unfortunate choice of terminology!).
Next, I would like to replace the matrix notation with the more convenient vector notation used in Eq. 23.10, and rewrite Eqs. 23.12 and 23.13 by using vector notation. For $\check{\mathbf{x}}_k$ and ω, we know what to do. Unfortunately, we have not yet introduced a suitable notation for “vectorial” quantities that correspond to matrices. To accomplish this, we have to introduce tensors. For your convenience, to streamline the flow of the presentation, tensors (in particular second–order tensors) are introduced in Appendix B of this chapter (Section 23.9).
Remark 163 (A very strong recommendation). If you are not familiar with tensors and tensor notation, I strongly recommend that you, at this point, read carefully the material in Section 23.9 up to Eq. 23.147 included, in particular: (i) the dyadic product, $\mathbf{A} = \mathbf{b} \otimes \mathbf{c}$ (Eq. 23.138), which corresponds to $\mathsf{A} = \mathsf{b}\, \mathsf{c}^{\mathsf T}$, (ii) the rule $(\mathbf{b} \otimes \mathbf{c})\, \mathbf{d} = \mathbf{b}\, (\mathbf{c} \cdot \mathbf{d})$ (Eq. 23.142), which corresponds to $[\mathsf{b}\, \mathsf{c}^{\mathsf T}]\, \mathsf{d} = \mathsf{b}\, [\mathsf{c}^{\mathsf T} \mathsf{d}]$, (iii) the definition of a second–order tensor, $\mathbf{A} = \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k$ (Eq. 23.144), and (iv) the tensor–by–vector product, $\mathbf{A}\, \mathbf{b} = \mathbf{c}$ (Eq. 23.147), which corresponds to $\mathsf{A}\, \mathsf{b} = \mathsf{c}$.
Using such notation, Eqs. 23.12 and 23.13 may be written as
\[
\check{\mathbf{h}}_G = \mathbf{J}_G\, \boldsymbol{\omega},
\tag{23.14}
\]
where
\[
\mathbf{J}_G = \sum_{k=1}^{n} m_k \big[ (\check{\mathbf{x}}_k \cdot \check{\mathbf{x}}_k)\, \mathbf{I} - \check{\mathbf{x}}_k \otimes \check{\mathbf{x}}_k \big]
\tag{23.15}
\]
is called the moment of inertia tensor. [The identity tensor $\mathbf{I}$ corresponds to the identity matrix $\mathsf{I}$.] Finally, combining Eqs. 23.8 and 23.14, we obtain that, for rigid bodies, the angular momentum equation around the center of mass is given by
\[
\frac{\mathrm{d}}{\mathrm{d}t} \big( \mathbf{J}_G\, \boldsymbol{\omega} \big) = \mathbf{m}_{GE}.
\tag{23.16}
\]
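As an illustration of Eqs. 23.9, 23.12 and 23.13, the following sketch builds the moment of inertia matrix of a small particle system and compares the angular momentum obtained from the matrix with the one obtained by direct summation. The masses, positions and angular velocity are arbitrary numbers chosen for the example, and the helper names are not taken from the text.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def center_of_mass(masses, points):
    total = sum(masses)
    return [sum(m * p[i] for m, p in zip(masses, points)) / total
            for i in range(3)]


def inertia_matrix(masses, rel_points):
    """J_G = sum_k m_k [(x_k^T x_k) I - x_k x_k^T], positions relative to G."""
    J = [[0.0] * 3 for _ in range(3)]
    for m, x in zip(masses, rel_points):
        r2 = x[0] ** 2 + x[1] ** 2 + x[2] ** 2
        for h in range(3):
            for k in range(3):
                J[h][k] += m * ((r2 if h == k else 0.0) - x[h] * x[k])
    return J


# Hypothetical rigid particle system and angular velocity.
masses = [1.0, 2.0, 1.5, 0.5]
points = [[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 1.0, 0.0], [0.5, -1.0, 2.0]]
omega = [0.2, -0.7, 1.1]

x_g = center_of_mass(masses, points)
rel = [[p[i] - x_g[i] for i in range(3)] for p in points]
J_G = inertia_matrix(masses, rel)

# Angular momentum from the matrix (Eq. 23.12) ...
h_matrix = [sum(J_G[h][k] * omega[k] for k in range(3)) for h in range(3)]

# ... and from the direct sum of m_k x_k x (omega x x_k) (Eq. 23.9).
h_direct = [0.0, 0.0, 0.0]
for m, x in zip(masses, rel):
    term = cross(x, cross(omega, x))
    for i in range(3):
        h_direct[i] += m * term[i]

MAX_DIFF = max(abs(a - b) for a, b in zip(h_matrix, h_direct))
print(f"largest difference between the two expressions = {MAX_DIFF:.2e}")
```

Note that the matrix built this way is symmetric, a property used later when the principal axes are discussed.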
• From discrete to continuous representation
Remark 164. If the rigid particle system consists of atoms, it is customary to shift from the discrete representation, namely a body composed of n particles of mass mk (k = 1, . . . , n), into a continuous representation, namely a body having a smooth density (mass per unit volume) ρ(x). In this case, one uses integrals instead of sums. Accordingly, the definition of center of mass (Eq. 21.46) becomes
\[
\mathbf{x}_G = \frac{1}{m} \iiint_V \rho\, \mathbf{x}\; \mathrm{d}V,
\tag{23.17}
\]
where
\[
m = \iiint_V \rho\; \mathrm{d}V
\tag{23.18}
\]
is the total mass of the system. Similarly, Eqs. 23.13 and 23.15 become, respectively,
\[
\mathsf{J}_G = \iiint_V \rho\, \big[ (\check{\mathsf{x}}^{\mathsf T} \check{\mathsf{x}})\, \mathsf{I} - \check{\mathsf{x}}\, \check{\mathsf{x}}^{\mathsf T} \big]\; \mathrm{d}V
\tag{23.19}
\]
and
\[
\mathbf{J}_G = \iiint_V \rho\, \big[ (\check{\mathbf{x}} \cdot \check{\mathbf{x}})\, \mathbf{I} - \check{\mathbf{x}} \otimes \check{\mathbf{x}} \big]\; \mathrm{d}V,
\tag{23.20}
\]
or, in terms of the components $\check{x}_1$, $\check{x}_2$, $\check{x}_3$,
\[
J_{G_{hk}} = \iiint_V \rho\, \big[ (\check{x}_1^2 + \check{x}_2^2 + \check{x}_3^2)\, \delta_{hk} - \check{x}_h\, \check{x}_k \big]\; \mathrm{d}V.
\tag{23.21}
\]
For instance, we have
\[
J_{G_{33}} = \iiint_V \rho\, (\check{x}_1^2 + \check{x}_2^2)\; \mathrm{d}V
\qquad \text{and} \qquad
J_{G_{12}} = - \iiint_V \rho\, \check{x}_1\, \check{x}_2\; \mathrm{d}V.
\tag{23.22}
\]
As an example, consider a truncated circular cylinder of radius R and length ℓ. If ρ is constant, using cylindrical coordinates (body–frame components) and choosing the z-axis to coincide with the axis of the cylinder, we have (use dV = r dr dθ dz, Eq. 19.138)
\[
J_{G_{33}} = \iiint_V \rho\, r^2\; \mathrm{d}V = 2 \pi\, \rho\, \ell \int_0^R r^3\, \mathrm{d}r = \frac{1}{2}\, \pi\, \rho\, \ell\, R^4 = \frac{1}{2}\, m\, R^2,
\tag{23.23}
\]
where $m = \pi \rho R^2 \ell$ denotes the mass of the cylinder. [More examples are presented in Appendix A of this chapter (Section 23.8).]
Remark 165. If the body has a plane of symmetry (in terms of both geometry and density distribution), say the plane $\check{x}_1 = 0$, then $J_{G_{12}} = J_{G_{13}} = 0$, as apparent from Eq. 23.22. If we have two orthogonal planes of symmetry, say the planes $\check{x}_1 = 0$ and $\check{x}_2 = 0$, then all the off–diagonal terms vanish:
\[
J_{G_{12}} = J_{G_{13}} = J_{G_{23}} = 0,
\tag{23.24}
\]
so that $\mathsf{J}_G$ is a diagonal matrix.
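The closed–form result of Eq. 23.23 can be compared with a direct numerical evaluation of the integral. The sketch below sums the contributions of thin cylindrical shells with a midpoint rule; the radius, length and density are made–up values, and the function name is illustrative.

```python
import math


def cylinder_j33(radius: float, length: float, density: float, n: int = 1000) -> float:
    """Midpoint-rule estimate of the integral of rho r^2 dV over the cylinder."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        shell_volume = 2.0 * math.pi * r * length * dr   # thin shell of radius r
        total += r ** 2 * shell_volume
    return density * total


# Hypothetical cylinder (SI units).
R, ELL, RHO = 0.3, 1.2, 7800.0

MASS = RHO * math.pi * R ** 2 * ELL
NUMERIC = cylinder_j33(R, ELL, RHO)
EXACT = 0.5 * MASS * R ** 2              # Eq. 23.23
REL_ERROR = abs(NUMERIC - EXACT) / EXACT
print(f"numeric = {NUMERIC:.6f}, exact = {EXACT:.6f}, relative error = {REL_ERROR:.1e}")
```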
23.2.3 Euler equations for rigid–body dynamics
Here, we introduce an equation that is fully equivalent to $(\mathbf{J}_G\, \boldsymbol{\omega})^{\boldsymbol{\cdot}} = \mathbf{m}_{GE}$ (Eq. 23.16), and yet much simpler to use. Specifically, we want to perform the time derivative in Eq. 23.16 in terms of its components. This may be accomplished in at least two different ways, namely by using the components: (i) in a non–rotating frame with origin xG, or (ii) in a frame that is rigidly connected to the body.
The first approach has a major disadvantage: in such a frame of reference, the components of $\mathbf{J}_G$ are, in general, time–dependent, since the rigid body is rotating with respect to such a frame. Thus, in taking the time derivative of $\check{\mathbf{h}}_G$ we would have to include the time derivative of the components of $\mathbf{J}_G$. On the other hand, using the second approach, we can exploit the fact that $\delta \mathbf{J}_G/\delta t = 0$. For this reason, following Euler, it is preferable to use the second approach.1 Accordingly, using db/dt = δb/δt + ω × b (Eq. 22.23), Eq. 23.16 becomes
\[
\mathbf{J}_G\, \frac{\delta \boldsymbol{\omega}}{\delta t} + \boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega}) = \mathbf{m}_{GE}.
\tag{23.25}
\]
Next, assume that the body has two planes of symmetry. In this case, we have $J_{G_{12}} = J_{G_{13}} = J_{G_{23}} = 0$ (Eq. 23.24). [Strictly speaking, the existence of two orthogonal planes of symmetry is not required. The same result can always be obtained by an appropriate choice of the axes, as it will be shown in Subsection 23.2.4.] Accordingly, we have
\[
\check{\mathbf{h}}_G = J_1\, \omega_1\, \mathbf{j}_1 + J_2\, \omega_2\, \mathbf{j}_2 + J_3\, \omega_3\, \mathbf{j}_3,
\tag{23.26}
\]
where $\omega_k$ are the components of ω in the body frame. [For the sake of notational simplicity and following a nearly universal tradition, we have omitted the symbol ′ that was used in Eq. 22.1. Also, we have set $J_h := J_{G_{hh}}$.] Then, we have
\[
\boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega}) =
\begin{vmatrix} \mathbf{j}_1 & \mathbf{j}_2 & \mathbf{j}_3 \\ \omega_1 & \omega_2 & \omega_3 \\ J_1 \omega_1 & J_2 \omega_2 & J_3 \omega_3 \end{vmatrix}
= (J_3 - J_2)\, \omega_2 \omega_3\, \mathbf{j}_1 + (J_1 - J_3)\, \omega_3 \omega_1\, \mathbf{j}_2 + (J_2 - J_1)\, \omega_1 \omega_2\, \mathbf{j}_3.
\tag{23.27}
\]
Accordingly, Eq. 23.25 may be written as (again, in terms of body–frame components)
\[
\begin{aligned}
J_1\, \frac{\mathrm{d}\omega_1}{\mathrm{d}t} + (J_3 - J_2)\, \omega_2\, \omega_3 &= m_{GE_1}, \\
J_2\, \frac{\mathrm{d}\omega_2}{\mathrm{d}t} + (J_1 - J_3)\, \omega_3\, \omega_1 &= m_{GE_2}, \\
J_3\, \frac{\mathrm{d}\omega_3}{\mathrm{d}t} + (J_2 - J_1)\, \omega_1\, \omega_2 &= m_{GE_3}.
\end{aligned}
\tag{23.28}
\]
The above are known as the Euler equations for rigid–body dynamics.
1 Named after Leonhard Euler (1707–1783), definitely a key contributor to the interplay between mathematics and mechanics. Euler was a Swiss mathematician and physicist, who spent most of his adult life first in Russia and then in Germany. He is considered the most prolific writer of mathematics of all time, the most important mathematician/mechanician of the XVIII century, and one of the most important ever. He is the “scientific grandson” of Leibniz, since he studied mathematics under the supervision of the Swiss mathematician Johann Bernoulli, who in turn had studied calculus directly from Leibniz’s papers. Euler worked in many different fields, and contributed substantially to all of them. There are several items named after him. Just to name the few that are included in this book, in addition to the Euler equations for rigid–body dynamics (Eq. 23.28), we have the Euler acceleration due to an angular acceleration of the body frame of reference (Eq. 22.32), along with the corresponding (apparent) inertial force (Section 22.4), the Euler formula relating imaginary exponentials to sines and cosines, three methods of numerical integration of differential equations, the Euler–Bernoulli beam, the Euler load on a column, the Euler–Lagrange differential equation in calculus of variations, the Eulerian point of view in continuum mechanics, the d’Alembert–Euler acceleration formula and, last but definitely not least, the Euler equations in fluid dynamics.
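To get a feeling for Eq. 23.28, one may integrate it numerically. The following sketch does so for the moment–free case with a classical fourth–order Runge–Kutta scheme (a choice made here, not a method prescribed by the text) and monitors the magnitude of the angular momentum of Eq. 23.26, which must remain constant when the moment vanishes. The moments of inertia, initial conditions and step size are arbitrary illustrative values.

```python
import math


def euler_rhs(w, J, m=(0.0, 0.0, 0.0)):
    """Body-frame angular accelerations obtained from Eq. 23.28."""
    J1, J2, J3 = J
    return (
        (m[0] - (J3 - J2) * w[1] * w[2]) / J1,
        (m[1] - (J1 - J3) * w[2] * w[0]) / J2,
        (m[2] - (J2 - J1) * w[0] * w[1]) / J3,
    )


def rk4_step(w, J, dt):
    """One classical Runge-Kutta step for the moment-free Euler equations."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = euler_rhs(w, J)
    k2 = euler_rhs(shifted(w, k1, dt / 2.0), J)
    k3 = euler_rhs(shifted(w, k2, dt / 2.0), J)
    k4 = euler_rhs(shifted(w, k3, dt), J)
    return tuple(
        wi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for wi, a, b, c, d in zip(w, k1, k2, k3, k4)
    )


def h_norm(w, J):
    """Magnitude of h_G = J1 w1 j1 + J2 w2 j2 + J3 w3 j3 (Eq. 23.26)."""
    return math.sqrt(sum((Ji * wi) ** 2 for Ji, wi in zip(J, w)))


J = (1.0, 2.0, 3.0)          # hypothetical principal moments of inertia
w = (1.0, 0.1, 0.1)          # hypothetical initial angular velocity
H0 = h_norm(w, J)

for _ in range(10_000):      # integrate up to t = 10 with dt = 0.001
    w = rk4_step(w, J, 1.0e-3)

DRIFT = abs(h_norm(w, J) - H0) / H0
print(f"final angular velocity = {w}")
print(f"relative drift of |h_G| = {DRIFT:.1e}")
```

The body–frame components of the angular momentum change in time, while its magnitude is preserved up to the integration error.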
23.2.4 Principal axes of inertia
♥
Here, we finally have an interesting application of the eigenvalue problem theory briefly outlined in Subsection 16.3.4. As stated above, the fact that $\mathsf{J}_G$ is diagonal is not limited to bodies that have two orthogonal planes of symmetry. Specifically, I claim that, for any moment of inertia matrix $\mathsf{J}_G$, there always exists a frame of reference where the moment of inertia matrix is diagonal. [For the sake of simplicity and notational clarity, we limit the analysis to the continuum case; the discrete one is conceptually identical.]
To this end, we begin by noting that Eq. 23.19 depends upon the frame of reference used, while Eq. 23.20 is independent. Thus, let us see what happens if we introduce a new body basis, $\mathbf{j}'_1$, $\mathbf{j}'_2$, $\mathbf{j}'_3$, still with origin at G. Recall that $\mathbf{j}_h = \sum_{k=1}^{3} r_{hk}\, \mathbf{j}'_k$ (Eq. 15.104), where $r_{hk} = \mathbf{j}_h \cdot \mathbf{j}'_k$ (Eq. 15.105) are the components of $\mathbf{j}_h$ in the basis $\mathbf{j}'_1$, $\mathbf{j}'_2$, $\mathbf{j}'_3$. Correspondingly, we have $\check{\mathsf{x}} = \mathsf{R}\, \check{\mathsf{x}}'$ and $\check{\mathsf{x}}' = \mathsf{R}^{\mathsf T} \check{\mathsf{x}}$ (Eqs. 15.112 and 15.114), where $\mathsf{R} := [r_{hk}] = [\mathbf{j}_h \cdot \mathbf{j}'_k]$ (Eq. 15.106). Next, set
\[
\mathsf{J}_G = \iiint_V \rho\, \big[ (\check{\mathsf{x}}^{\mathsf T} \check{\mathsf{x}})\, \mathsf{I} - \check{\mathsf{x}}\, \check{\mathsf{x}}^{\mathsf T} \big]\, \mathrm{d}V
\qquad \text{and} \qquad
\mathsf{J}'_G = \iiint_V \rho\, \big[ (\check{\mathsf{x}}'^{\mathsf T} \check{\mathsf{x}}')\, \mathsf{I} - \check{\mathsf{x}}'\, \check{\mathsf{x}}'^{\mathsf T} \big]\, \mathrm{d}V.
\tag{23.29}
\]
Then, recalling that $\mathsf{R}^{\mathsf T} = \mathsf{R}^{-1}$ (Eq. 16.64), we have
\[
\mathsf{J}'_G = \iiint_V \rho\, \big[ (\check{\mathsf{x}}^{\mathsf T} \check{\mathsf{x}})\, \mathsf{I} - \mathsf{R}^{\mathsf T} \check{\mathsf{x}}\, \check{\mathsf{x}}^{\mathsf T} \mathsf{R} \big]\, \mathrm{d}V = \mathsf{R}^{-1}\, \mathsf{J}_G\, \mathsf{R}.
\tag{23.30}
\]
In plain words, $\mathsf{J}_G$ and $\mathsf{J}'_G$ are similar matrices. In the following, we use Eq. 23.30 to find out whether there exists a frame of reference such that $\mathsf{J}'_G$ is diagonal, namely $\mathsf{J}'_G = [J_h\, \delta_{hk}]$, where $J_h$ denotes the h-th diagonal term of $\mathsf{J}'_G$. [You might like to review Subsection 16.3.4 before continuing.] Equation 23.30 may be written as $\mathsf{J}_G\, \mathsf{R} = \mathsf{R}\, \mathsf{J}'_G$, namely $\sum_{j=1}^{3} J_{G_{hj}}\, r_{jk} = \sum_{j=1}^{3} r_{hj}\, J_j\, \delta_{jk} = r_{hk}\, J_k$. This in turn, using indicial notation, may be written as
\[
\sum_{j=1}^{3} \big( J_{G_{hj}} - \lambda_k\, \delta_{hj} \big)\, z_j^{(k)} = 0,
\tag{23.31}
\]
where
\[
\lambda_k := J_k \qquad \text{and} \qquad z_j^{(k)} := r_{jk}.
\tag{23.32}
\]
In other words, to find the solution to this problem, we have to solve
\[
\begin{bmatrix}
J_{G_{11}} - \lambda_k & J_{G_{12}} & J_{G_{13}} \\
J_{G_{21}} & J_{G_{22}} - \lambda_k & J_{G_{23}} \\
J_{G_{31}} & J_{G_{32}} & J_{G_{33}} - \lambda_k
\end{bmatrix}
\begin{Bmatrix} z_1^{(k)} \\ z_2^{(k)} \\ z_3^{(k)} \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ 0 \\ 0 \end{Bmatrix},
\tag{23.33}
\]
with k = 1, 2, 3. For any given k, we have to solve an eigenvalue problem of the type $[\mathsf{J}_G - \lambda\, \mathsf{I}]\, \mathsf{z} = \mathsf{0}$. As outlined in Subsection 16.3.4, this is a homogeneous linear algebraic system, which admits a nontrivial solution iff the determinant vanishes, namely $|\mathsf{J}_G - \lambda\, \mathsf{I}| = 0$. This condition yields a third–order algebraic equation for the unknown λ (use the Sarrus rule, Eq. 3.166). This implies that there exist three roots $\lambda_k$ (k = 1, 2, 3), and correspondingly three nontrivial solutions $\mathsf{z}_k = \{ z_h^{(k)} \}$. As stated in Subsection 16.3.4, $\lambda_k$ and $\mathsf{z}_k$ (k = 1, 2, 3) are called the eigenvalues and the eigenvectors of $\mathsf{J}_G$, respectively.
Remark 166. At this point in our journey, I prefer not to bore you with the fact that, for any real symmetric matrix (like $\mathsf{J}_G$ is), we have: (i) all the roots $\lambda_k$ are real, and (ii) there always exist three real vectors $\mathsf{z}_k$ that are mutually orthogonal. You have to take my word for it. [The proof is given in Volume II.]
It is apparent that the vectors $\mathsf{z}_k$ are uniquely defined (of course, except for arbitrary multiplicative constants, which may be used to impose the normalization condition $\| \mathsf{z}_k \| = 1$). Therefore, after normalization, the vectors $\mathsf{z}_k$ form an orthonormal basis (Theorem 170, p. 643). Accordingly, the relationship $z_h^{(k)} = r_{hk} = \mathbf{j}_h \cdot \mathbf{j}'_k$ (Eqs. 23.32 and 15.105) implies that $z_h^{(k)}$ is the h-th element of the k-th column of $\mathsf{R}$, and hence $\mathsf{R}$ is indeed an orthogonal matrix. Moreover, the same relationship implies that $z_h^{(k)}$ is the h-th component of $\mathbf{j}'_k$. In other words, the vector $\mathsf{z}_k$ coincides with the base vector $\mathbf{j}'_k$ of the frame that makes the moment of inertia matrix diagonal. Accordingly, we have obtained that, using an orthonormal frame of reference with origin at G and axes parallel to the vectors $\mathsf{z}_k$, the moment of inertia matrix assumes a diagonal form:
\[
\mathsf{J}'_G = \mathrm{Diag}\, [J_p].
\tag{23.34}
\]
This shows that the Euler equations (Eq. 23.28) are not limited to those rigid bodies that have at least two orthogonal planes of symmetry, for both geometry and density distribution (Eq. 23.24). On the contrary, they are valid for any rigid body whatsoever, provided that we use the base vectors $\mathbf{j}'_k = \mathsf{z}_k$. Remark 167. For the sake of simplicity, in this subsection we have assumed that the origin is placed at G. However, the results are valid as well if, in defining the tensor of inertia, the origin is not placed at G, but at a generic point, say O, as you may verify.
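As a small numerical companion to the eigenvalue problem discussed in this subsection, the sketch below computes the three roots of the third–order equation for a real symmetric 3 × 3 matrix. It relies on the standard trigonometric solution of the cubic, which is a technique brought in here for illustration and is not the procedure described in the text; the matrix entries are made–up values.

```python
import math


def det3(a):
    """Determinant of a 3x3 matrix (expansion along the first row)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def shifted(a, lam):
    """Return the matrix a - lam * I."""
    return [[a[i][j] - (lam if i == j else 0.0) for j in range(3)]
            for i in range(3)]


def principal_moments(J):
    """Roots of det(J - lambda I) = 0 for a real symmetric J, in ascending order."""
    p1 = J[0][1] ** 2 + J[0][2] ** 2 + J[1][2] ** 2
    if p1 == 0.0:                       # J is already diagonal
        return sorted(J[i][i] for i in range(3))
    q = (J[0][0] + J[1][1] + J[2][2]) / 3.0
    p2 = sum((J[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[c / p for c in row] for row in shifted(J, q)]
    r = max(-1.0, min(1.0, det3(B) / 2.0))   # guard against round-off
    phi = math.acos(r) / 3.0
    lam_max = q + 2.0 * p * math.cos(phi)
    lam_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    lam_mid = 3.0 * q - lam_max - lam_min    # the trace is preserved
    return [lam_min, lam_mid, lam_max]


# Hypothetical moment of inertia matrix with one non-zero off-diagonal pair.
J_G = [[3.0, -1.0, 0.0],
       [-1.0, 3.0, 0.0],
       [0.0, 0.0, 5.0]]

MOMENTS = principal_moments(J_G)
RESIDUALS = [det3(shifted(J_G, lam)) for lam in MOMENTS]
print("principal moments of inertia:", MOMENTS)
print("det(J_G - lambda I) at the roots:", RESIDUALS)
```

For this matrix the principal moments of inertia are 2, 4 and 5; the corresponding eigenvectors (not computed here) provide the directions of the axes that make the matrix diagonal.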
23.2.5 On the stability of frisbees and oval balls
♥
As stated in the preceding subsections, we have the following
Definition 355 (Principal axes of inertia). Axes that make the matrix J diagonal, either because of the existence of two orthogonal planes of symmetry (Subsection 23.2.3), or because of the considerations in Subsection 23.2.4, are called the principal axes of inertia.
Consider an object that is rotating with angular velocity ω = Ω j1, where j1 is in the direction of one of the principal axes of inertia. Assume that the resultant moment applied to the body vanishes, mGE = 0. Let us perturb the angular velocity, so as to have
\[
\boldsymbol{\omega} = \Omega\, \mathbf{j}_1 + \boldsymbol{\omega}'.
\tag{23.35}
\]
◦ Warning. In this subsection, the symbol ′ (prime) denotes perturbation.
In terms of components, we have
\[
\mathsf{\omega} = \begin{Bmatrix} \Omega + \omega'_1 \\ \omega'_2 \\ \omega'_3 \end{Bmatrix}.
\tag{23.36}
\]
Correspondingly, the Euler equations of rigid–body motion, with mGE = 0, read
\[
\begin{aligned}
J_1\, \frac{\mathrm{d}\omega'_1}{\mathrm{d}t} + (J_3 - J_2)\, \omega'_2\, \omega'_3 &= 0, \\
J_2\, \frac{\mathrm{d}\omega'_2}{\mathrm{d}t} + (J_1 - J_3)\, \omega'_3\, (\Omega + \omega'_1) &= 0, \\
J_3\, \frac{\mathrm{d}\omega'_3}{\mathrm{d}t} + (J_2 - J_1)\, (\Omega + \omega'_1)\, \omega'_2 &= 0.
\end{aligned}
\tag{23.37}
\]
Remark 168. It should be pointed out that, when studying the stability of the solution of a specific equation, linearization does not involve any approximation. For, to study instabilities, we introduce an infinitesimal perturbation in the solution, and want to know whether the disturbance remains finite or grows. Since the disturbance can be made as small as we wish, all the nonlinear terms of the disturbance can always be neglected. Therefore, linearization is a particularly powerful tool to study stability.
Accordingly, in view of the fact that we want to study the stability of the motion of our rigid body, we may assume that the perturbations are infinitesimally small, namely $\| \boldsymbol{\omega}' \| \ll \Omega$, so that the quadratic terms may be neglected. This yields the linearized Euler equations for the case under consideration, namely
\[
\begin{aligned}
\frac{\mathrm{d}\omega'_1}{\mathrm{d}t} &= 0, \\
\frac{\mathrm{d}\omega'_2}{\mathrm{d}t} + \frac{J_1 - J_3}{J_2}\, \Omega\, \omega'_3 &= 0, \\
\frac{\mathrm{d}\omega'_3}{\mathrm{d}t} + \frac{J_2 - J_1}{J_3}\, \Omega\, \omega'_2 &= 0.
\end{aligned}
\tag{23.38}
\]
The first of these equations states that $\omega'_1$ remains constant. On the other hand, time differentiating the second equation and using the third to eliminate $\dot{\omega}'_3$, one obtains
\[
\frac{\mathrm{d}^2 \omega'_2}{\mathrm{d}t^2} + C\, \omega'_2 = 0,
\qquad \text{where} \qquad
C = \frac{(J_1 - J_2)(J_1 - J_3)}{J_2\, J_3}\, \Omega^2.
\tag{23.39}
\]
The same type of equation is obtained for $\omega'_3$, as you may verify. If J1 is either the largest of the three moments of inertia (as in the case of a frisbee), or the smallest (as in the case of oval balls used in rugby or American football), then C > 0. In this case, we may set C = ω² and the solution is $\omega'_2(t) = A \cos \omega t + B \sin \omega t$ (Eq. 11.35): no instability arises in this case! On the other hand, if J1 is neither the largest nor the smallest, then C < 0. In this case, we may set C = −α² and the solution is $\omega'_2 = A\, e^{\alpha t} + B\, e^{-\alpha t}$ (Eq. 14.106): the solution is unstable! Finally, if J1 = J2, the last in Eq. 23.38 yields $\omega'_3$ = constant, say $\omega'_3 = A$. In this case, the second in Eq. 23.38 reads $\mathrm{d}\omega'_2/\mathrm{d}t = A^*$, where we have set $A^* = -\Omega\, A\, (J_1 - J_3)/J_2$. This yields $\omega'_2 = A^* t + B$. The solution is unstable in this case as well (independently of the sign of J1 − J3). [Similar results are obtained when J1 = J3, as you may verify.]
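The case distinction above can be condensed into a few lines of code. The sketch below evaluates the coefficient C of Eq. 23.39 and reports the corresponding behavior of the perturbation; the function names and the sample moments of inertia are illustrative, and the C = 0 branch follows the text under the additional assumption that the three moments of inertia are not all equal.

```python
def stability_coefficient(J1: float, J2: float, J3: float, spin: float = 1.0) -> float:
    """Coefficient C of Eq. 23.39 for a rotation about the principal axis 1."""
    return (J1 - J2) * (J1 - J3) * spin ** 2 / (J2 * J3)


def classify(J1: float, J2: float, J3: float, spin: float = 1.0) -> str:
    """Qualitative behavior of the perturbation, following the text."""
    c = stability_coefficient(J1, J2, J3, spin)
    if c > 0.0:
        return "stable (bounded oscillation)"
    if c < 0.0:
        return "unstable (exponential growth)"
    # C = 0: J1 coincides with J2 or J3 (assumed not all three equal).
    return "unstable (linear growth)"


# Hypothetical bodies, spinning about the axis with moment of inertia J1.
print("J1 largest (frisbee-like):  ", classify(2.0, 1.0, 1.0))
print("J1 smallest (oval ball):    ", classify(1.0, 3.0, 3.0))
print("J1 intermediate:            ", classify(2.0, 1.0, 3.0))
print("J1 = J2, J3 different:      ", classify(2.0, 2.0, 3.0))
```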
23.3 Energy equation
As stated above, we have only two vector unknowns (vG and ω), and hence the equations introduced above, namely the momentum equation for xG and the angular momentum equation around xG, are sufficient to solve the problem. Nonetheless, sometimes the use of the energy equation allows us to obtain the solution more directly. [A fitting example is presented in Section 23.5, where in studying the motion of a disk on a slope we use the energy equation (Eqs. 23.78–23.80).] Here, we address how the energy equation is simplified through the rigid–body assumption. We address separately the energy of the center of mass, the energy around the center of mass, and the total energy (sum of the preceding two).
• Energy of the center of mass Consider the energy equation regarding the motion of the center of mass. We have that Eq. 21.85, namely
\[
\frac{\mathrm{d}T_G}{\mathrm{d}t} = \frac{1}{2}\, m\, \frac{\mathrm{d}v_G^2}{\mathrm{d}t} = \mathbf{f}_E \cdot \mathbf{v}_G,
\tag{23.40}
\]
is still valid. Thus, in this case no simplification or variation is introduced by the rigid–body assumption.
• Energy around the center of mass Next, consider the motion around the center of mass. We have the following Theorem 236. For a rigid body, the energy equation around the center of mass, dTˇ /dt = PˇE + PˇI (Eq. 21.87) reduces to 1 d ω · JGω = mGE · ω. 2 dt
(23.41)
◦ Proof: Let us begin with the kinetic energy expression for the motion around the center of mass. Recall that $\check{T} = \frac{1}{2} \sum_{k=1}^{n} m_k \|\check{\mathbf{v}}_k\|^2$ (Eq. 21.82), where now we have $\check{\mathbf{v}}_k = \boldsymbol{\omega} \times \check{\mathbf{x}}_k$ (rigid–body expression for the velocity, Eq. 21.65). Combining, one obtains
\[ \check{T} = \frac{1}{2} \sum_{k=1}^{n} m_k\, \|\boldsymbol{\omega} \times \check{\mathbf{x}}_k\|^2. \tag{23.42} \]
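We have (use $\|\mathbf{a} \times \mathbf{b}\|^2 = \|\mathbf{a}\|^2 \|\mathbf{b}\|^2 - (\mathbf{a} \cdot \mathbf{b})^2$, Lagrange identity, Eq. 15.103)
\[ \begin{aligned} \|\boldsymbol{\omega} \times \check{\mathbf{x}}_k\|^2 &= (\boldsymbol{\omega} \cdot \boldsymbol{\omega})\, (\check{\mathbf{x}}_k \cdot \check{\mathbf{x}}_k) - (\boldsymbol{\omega} \cdot \check{\mathbf{x}}_k)\, (\check{\mathbf{x}}_k \cdot \boldsymbol{\omega}) \\ &= \boldsymbol{\omega} \cdot \big[ (\check{\mathbf{x}}_k \cdot \check{\mathbf{x}}_k)\, \mathbf{I} - \check{\mathbf{x}}_k \otimes \check{\mathbf{x}}_k \big]\, \boldsymbol{\omega}. \end{aligned} \tag{23.43} \]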
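[Hint: As stated in Remark 163, p. 956, for the dyadic product, $\mathbf{A} = \mathbf{b} \otimes \mathbf{c}$, defined in Eq. 23.138, we have the rule $(\mathbf{b} \otimes \mathbf{c})\, \mathbf{d} = \mathbf{b}\, (\mathbf{c} \cdot \mathbf{d})$ (Eq. 23.142).] Hence, Eq. 23.42 yields
\[ \check{T} = \frac{1}{2}\, \boldsymbol{\omega} \cdot \mathbf{J}_G\, \boldsymbol{\omega}, \tag{23.44} \]
where $\mathbf{J}_G = \sum_{k=1}^{n} m_k \big[ (\check{\mathbf{x}}_k \cdot \check{\mathbf{x}}_k)\, \mathbf{I} - \check{\mathbf{x}}_k \otimes \check{\mathbf{x}}_k \big]$ is the moment–of–inertia tensor introduced in Eq. 23.15. In matrix notation, we have (use $\mathsf{J}_G$, Eq. 23.13)
\[ \check{T} = \frac{1}{2}\, \boldsymbol{\omega}^{\mathsf{T}}\, \mathsf{J}_G\, \boldsymbol{\omega}. \tag{23.45} \]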
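Next, consider the power generated by the external forces in the motion around the center of mass, namely $\check{P}_E = \sum_{k=1}^{n} \mathbf{f}_{kE} \cdot \check{\mathbf{v}}_k$ (Eq. 21.88), where $\check{\mathbf{v}}_k = \boldsymbol{\omega} \times \check{\mathbf{x}}_k$ (Eq. 21.65). Using $\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \mathbf{c} \times \mathbf{a} \cdot \mathbf{b}$ (Eq. 15.89), we have
\[ \check{P}_E := \sum_{k=1}^{n} \mathbf{f}_{kE} \cdot \boldsymbol{\omega} \times \check{\mathbf{x}}_k = \sum_{k=1}^{n} \check{\mathbf{x}}_k \times \mathbf{f}_{kE} \cdot \boldsymbol{\omega} = \mathbf{m}_{GE} \cdot \boldsymbol{\omega}, \tag{23.46} \]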
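where $\mathbf{m}_{GE} = \sum_{k=1}^{n} \check{\mathbf{x}}_k \times \mathbf{f}_{kE}$ is the external–force resultant moment with respect to $\mathbf{x}_G$ (Eq. 21.56). On the other hand, for the power generated by the internal forces in the motion around G, namely $\check{P}_I = \frac{1}{2} \sum_{h,k=1}^{n} \mathbf{f}_{hk} \cdot (\check{\mathbf{v}}_h - \check{\mathbf{v}}_k)$ (Eq. 21.92), with $\check{\mathbf{v}}_h = \boldsymbol{\omega} \times \check{\mathbf{x}}_h$ (Eq. 21.65), we have
\[ \check{P}_I = \frac{1}{2} \sum_{h,k=1}^{n} \mathbf{f}_{hk} \cdot (\check{\mathbf{v}}_h - \check{\mathbf{v}}_k) = \frac{1}{2} \sum_{h,k=1}^{n} \mathbf{f}_{hk} \cdot \boldsymbol{\omega} \times (\check{\mathbf{x}}_h - \check{\mathbf{x}}_k) = 0, \tag{23.47} \]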
because $\mathbf{f}_{hk}$ and $\check{\mathbf{x}}_h - \check{\mathbf{x}}_k$ are parallel (Eq. 17.60). [Recall Eq. 15.88 on the vanishing of the scalar triple product of three coplanar vectors.] Finally, substituting Eqs. 23.44, 23.46 and 23.47 into $d\check{T}/dt = \check{P}_E + \check{P}_I$ (Eq. 21.87), one obtains Eq. 23.41.
Remark 169. Incidentally, note that Eqs. 23.44 and 23.45 imply that $\mathbf{J}_G$ and $\mathsf{J}_G$ are positive definite (see also Eq. 23.43).
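For future reference, note that the energy equation deals with scalars, and hence is independent of the orientation of the frame of reference. In particular, in terms of components, using the principal axes of inertia of $\mathsf{J}_G$ (Definition 355, p. 962), Eq. 23.45 yields
\[ \check{T} = \frac{1}{2} \big( J_1\, \omega_1^2 + J_2\, \omega_2^2 + J_3\, \omega_3^2 \big) \tag{23.48} \]
(where $\omega_k$ are the components of $\boldsymbol{\omega}$ in the body frame), and Eq. 23.41 may be written as
\[ \frac{1}{2}\, \frac{d}{dt} \big( J_1\, \omega_1^2 + J_2\, \omega_2^2 + J_3\, \omega_3^2 \big) = \mathbf{m}_{GE} \cdot \boldsymbol{\omega}. \tag{23.49} \]
• Total energy
Sometimes it is convenient to combine Eqs. 23.40 and 23.41, namely to go back to the total energy equation $dT/dt = P_E + P_I$ (Eq. 21.73). To this end, recall the König theorem, $T = T_G + \check{T}$ (Eq. 21.83), with $T_G = \frac{1}{2} m \|\mathbf{v}_G\|^2$ (Eq. 21.81) and $\check{T} = \frac{1}{2} \boldsymbol{\omega}^{\mathsf{T}} \mathsf{J}_G\, \boldsymbol{\omega}$ (Eq. 23.44). In addition, we have $P_E = \mathbf{f}_E \cdot \mathbf{v}_G + \check{P}_E$ (Eq. 21.91), with $\check{P}_E = \mathbf{m}_{GE} \cdot \boldsymbol{\omega}$ (Eq. 23.46), whereas $P_I = \check{P}_I = 0$ (Eqs. 21.92 and 23.47). Combining, we obtain
\[ \frac{dT}{dt} = \frac{d}{dt} \Big( \frac{1}{2}\, m\, \|\mathbf{v}_G\|^2 + \frac{1}{2}\, \boldsymbol{\omega}^{\mathsf{T}} \mathsf{J}_G\, \boldsymbol{\omega} \Big) = P_E = \mathbf{f}_E \cdot \mathbf{v}_G + \mathbf{m}_{GE} \cdot \boldsymbol{\omega}. \tag{23.50} \]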
23.3.1 Comments Here, we wrap up the material on the equations of momentum, angular momentum and energy with some clarifying considerations.
• Six unknowns, seven equations We have come up with an apparent conundrum. We have more equations than unknowns. For, the equations of momentum, angular momentum and energy provide us with a total of seven scalar equations for only six scalar unknowns, namely the components of vG and ω. Akin to the system of linear algebraic equations, this is acceptable if, and only if, one of the equations may be obtained from the others. Accordingly, here we show that the energy equation is a consequence of the others.
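Indeed, the energy equation of G, namely $\frac{d}{dt} \big( \frac{1}{2} m \|\mathbf{v}_G\|^2 \big) = \mathbf{f}_E \cdot \mathbf{v}_G$ (Eq. 23.40), may be obtained by dotting the momentum equation $m\, d\mathbf{v}_G/dt = \mathbf{f}_E$ (Eq. 23.7) with $\mathbf{v}_G$, to obtain $m\, \dot{\mathbf{v}}_G \cdot \mathbf{v}_G = \mathbf{f}_E \cdot \mathbf{v}_G$, which is equivalent to Eq. 23.40. On the other hand, the energy equation around G, $\frac{1}{2} \frac{d}{dt} (\boldsymbol{\omega} \cdot \mathbf{J}_G\, \boldsymbol{\omega}) = \mathbf{m}_{GE} \cdot \boldsymbol{\omega}$ (Eq. 23.41), may be obtained by dotting $\mathbf{J}_G\, \delta\boldsymbol{\omega}/\delta t + \boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega}) = \mathbf{m}_{GE}$ (Eq. 23.25) with $\boldsymbol{\omega}$. Since $\boldsymbol{\omega} \cdot [\boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega})] = 0$, this yields $\boldsymbol{\omega} \cdot \mathbf{J}_G\, \delta\boldsymbol{\omega}/\delta t = \mathbf{m}_{GE} \cdot \boldsymbol{\omega}$, which is equivalent to Eq. 23.41 because $\mathbf{J}_G$ is symmetric and constant in the body frame.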
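The second of these statements is easy to check numerically. The sketch below assumes an arbitrary symmetric positive definite inertia matrix, angular velocity and external moment (all randomly generated, hence hypothetical), solves the Euler equation (Eq. 23.25) for the body-frame rate of the angular velocity, and compares the two sides of Eq. 23.41.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric positive definite inertia matrix (body axes).
A = rng.normal(size=(3, 3))
J = A @ A.T + 3.0 * np.eye(3)

omega = rng.normal(size=3)  # angular velocity, body components
m_GE = rng.normal(size=3)   # external moment about G, body components

# Euler equation (Eq. 23.25) solved for the body-frame rate of omega.
omega_dot = np.linalg.solve(J, m_GE - np.cross(omega, J @ omega))

# Left side of Eq. 23.41: (1/2) d(omega . J omega)/dt = omega . J omega_dot,
# which holds because J is symmetric and constant in the body frame.
lhs = omega @ J @ omega_dot
rhs = m_GE @ omega
print(lhs, rhs)
```

The two printed numbers should agree to rounding error, whatever the random seed, since the gyroscopic term is orthogonal to the angular velocity.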
• On treating a rigid body as a particle Here, I’d like to go back to the formulation for a single particle in Chapter 20, and reexamine the hypothesis introduced in Subsection 4.1.1, namely that a particle may be treated as a material point, provided that it is adequately small for the problem under consideration. Consider the motion of the center of mass. The pertinent equations are those regarding momentum, namely m dvG/dt = fE (Eq. 23.7) and energy, namely dTG/dt = fE · vG (Eq. 23.40). These are identical to the equations for a single particle, that is, the Newton second law (namely m a = f , Eq. 20.3) and the energy equation (namely dT /dt = f · v, Eq. 20.87). What we have discovered, is that even a large object may be treated as a particle if we are only interested in the motion of its center of mass, under prescribed forces. Specifically, the equation regarding the motion of the center of mass decouples from those regarding the angular momentum, again provided that the forces are prescribed. Let me clarify this last statement. Sometimes the forces acting on the body depend on the angular orientation of the body itself. A clear example is the dynamics of an airplane, where its angular orientation (the so–called trim) determines the forces. In fact, the lift acting on the airplane depends strongly upon its angle of attack (namely the angle between its velocity and its no–lift trim). As a consequence, the angular momentum equation is essential for the analysis. In this case, the equations do not decouple, and hence the airplane cannot be treated as a point particle.
23.4 Planar motion In this section, we simplify the equation for the case in which the motion is planar, as defined in the following
Definition 356 (Planar motion). The motion of a three–dimensional system of particles is called planar iff at all times the velocity of each particle has a zero component along a given direction.
Necessary and sufficient conditions for the motion of a rigid body to be planar are (use a frame of reference with two axes, say x and y, in such a plane)
\[ \mathbf{v}_G = u_G\, \mathbf{i} + v_G\, \mathbf{j} \quad \text{and} \quad \boldsymbol{\omega} = \omega\, \mathbf{k}. \tag{23.51} \]
Indeed, the above expressions correspond to a planar motion and vice versa, as you may verify. [Hint: Use v = vO + ω × x (Eq. 23.2).]
23.4.1 Momentum and angular momentum equations
Here, we consider the rigid–body equations for planar motions, say in the (x, y)-plane. This implies $w_G = \omega_1 = \omega_2 = 0$ (Eq. 23.51). In this case, the equations of rigid–body dynamics simplify considerably. Indeed, for planar motions, only three equations are relevant. These are the first two components of the equation of motion for G (Eq. 23.7), namely
\[ m\, \frac{du_G}{dt} = f_{xE}, \qquad m\, \frac{dv_G}{dt} = f_{yE}, \tag{23.52} \]
along with the third component of the equation of motion around G (Eq. 23.28), which for $\omega_1 = \omega_2 = 0$ reads
\[ J_G\, \frac{d\omega}{dt} = M_{GE}, \tag{23.53} \]
where we have set $\omega := \omega_3$ ($\omega_3$ being the angular speed in the inertial frame), $J_G := J_3$ and $M_{GE} := m_{GE_3}$.
• Conditions for planar motion
♠
If we look at the above problem as a three–dimensional one, we note that three equations have not been included, namely: (i) the third component of the momentum equation, and (ii) the first two components of the angular momentum equation. Here, we address this issue in greater detail. Specifically,
let us assume that we have wG = ω1 = ω2 = 0 at time t = 0 and find out under what conditions we continue to have wG = ω1 = ω2 = 0 at all times. The formulation presented below addresses both: (i) unconstrained bodies, and (ii) bodies that are constrained to move in a planar motion. [Think for instance of a hockey puck sliding on ice, or a circular pendulum, or even a body hinged at the endpoint of a massless frictionless telescopic rod (namely a rod composed of sliding overlapping sections).] Accordingly, in the two subsubsections that follow, we address the two cases separately.
• Unconstrained body
Consider first the third component of Eq. 23.7, namely
\[ m\, \frac{dw_G}{dt} = f_{zE}. \tag{23.54} \]
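Accordingly, the necessary and sufficient condition to continue to have $w_G = 0$ at all times is that $f_{zE} = 0$ (see Remark 98, p. 465). Next, consider the angular momentum equation, and assume that at time $t = 0$, we have $\omega_1 = \omega_2 = 0$. Hence, at time $t = 0$, we have
\[ \boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega}) = \begin{vmatrix} \mathbf{i}_1 & \mathbf{i}_2 & \mathbf{i}_3 \\ 0 & 0 & \omega_3 \\ J_{G_{13}}\, \omega_3 & J_{G_{23}}\, \omega_3 & J_{G_{33}}\, \omega_3 \end{vmatrix} = \big( -J_{G_{23}}\, \mathbf{i}_1 + J_{G_{13}}\, \mathbf{i}_2 \big)\, \omega_3^2. \tag{23.55} \]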
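Therefore, still at time $t = 0$, the first two body–axis components of the angular momentum equation around the center of mass are given by
\[ \begin{aligned} J_{G_{11}}\, \frac{d\omega_1}{dt} + J_{G_{12}}\, \frac{d\omega_2}{dt} + J_{G_{13}}\, \frac{d\omega_3}{dt} - J_{G_{23}}\, \omega_3^2 &= m_{GE_1}, \\ J_{G_{21}}\, \frac{d\omega_1}{dt} + J_{G_{22}}\, \frac{d\omega_2}{dt} + J_{G_{23}}\, \frac{d\omega_3}{dt} + J_{G_{13}}\, \omega_3^2 &= m_{GE_2}. \end{aligned} \tag{23.56} \]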
[Hint: Substitute Eq. 23.55 into $\mathbf{J}_G\, \delta\boldsymbol{\omega}/\delta t + \boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega}) = \mathbf{m}_{GE}$ (Eq. 23.25). Do not use the hypothesis $J_{G_{12}} = J_{G_{13}} = J_{G_{23}} = 0$ (Eq. 23.24).] In order to continue to have $\omega_1 = \omega_2 = 0$ for all $t > 0$, it is necessary and sufficient that, for any $t > 0$, we also have $\dot{\omega}_1 = \dot{\omega}_2 = 0$ (see Remark 98, p. 465). Therefore, necessary and sufficient conditions to have a planar motion for an unconstrained rigid body are (i) $m_{GE_1} = m_{GE_2} = 0$, and (ii) $J_{G_{23}} = J_{G_{13}} = 0$ (since $\omega_3$ is arbitrary), namely that the $x_3$-axis is a principal axis of inertia. [We can always assume $J_{G_{12}} = 0$ without loss of generality, since this requires only a change of axes in the $(x_1, x_2)$-plane (Subsection 23.2.4).]
◦ Comment. Of course, this conclusion is true exclusively from a mathematical point of view, since it requires that the initial conditions are satisfied exactly. From a physical point of view, we have to take into account that the initial conditions may be nonhomogeneous, even if ever so slightly. Then, we have to consider the possibility of instability, which may be triggered by an infinitesimal disturbance, such as the proverbial flapping of a butterfly wing. [A fitting example is the instability of a rigid body rotating around the principal axis corresponding to the intermediate moment of inertia, as discussed in Subsection 23.2.5, on the stability of frisbees and oval balls.]
• Constrained body
Here, we assume that the constraints guarantee that $w_G = \omega_1 = \omega_2 = 0$ at all times. In this case, the three unused equations give us the constraint reactions. Specifically, regarding the angular momentum equation, let us go back to Eq. 23.56, and set $\omega_1 = \omega_2 = 0$, at all times. This yields
\[ \begin{aligned} m_{GE_1} &= J_{G_{13}}\, \frac{d\omega_3}{dt} - J_{G_{23}}\, \omega_3^2, \\ m_{GE_2} &= J_{G_{23}}\, \frac{d\omega_3}{dt} + J_{G_{13}}\, \omega_3^2, \end{aligned} \tag{23.57} \]
where $m_{GE_1}$ and $m_{GE_2}$ are the planar components of the resultant moment due to applied moments (known) and constraint reaction moments (unknown). Similarly, the third component of the momentum equation gives $f_{zE} = 0$ (use Eq. 23.54), where $f_{zE}$ is the resultant due to applied forces and constraint reactions.
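As a quick check of Eq. 23.57, the sketch below evaluates the two moment components for a body with nonzero products of inertia and compares them with the full expression $\mathbf{J}_G\, \delta\boldsymbol{\omega}/\delta t + \boldsymbol{\omega} \times (\mathbf{J}_G\, \boldsymbol{\omega})$ of Eq. 23.25. The inertia matrix, the angular speed and the angular acceleration are hypothetical numbers, and the function name is introduced here only for illustration.

```python
import numpy as np

def planar_reaction_moments(J, omega3, omega3_dot):
    """Moments m_GE1, m_GE2 needed to keep omega1 = omega2 = 0 (Eq. 23.57)."""
    m1 = J[0, 2] * omega3_dot - J[1, 2] * omega3**2
    m2 = J[1, 2] * omega3_dot + J[0, 2] * omega3**2
    return m1, m2

# Hypothetical inertia matrix with nonzero products of inertia J13 and J23.
J = np.array([[2.0, 0.0, 0.3],
              [0.0, 3.0, -0.2],
              [0.3, -0.2, 4.0]])
omega = np.array([0.0, 0.0, 5.0])      # planar motion: only omega3 is nonzero
omega_dot = np.array([0.0, 0.0, 1.5])

# Full left side of Eq. 23.25, for comparison.
full = J @ omega_dot + np.cross(omega, J @ omega)
m1, m2 = planar_reaction_moments(J, omega[2], omega_dot[2])
print(m1, m2, full[:2])
```

Setting both products of inertia to zero makes the two moments vanish, which is the unconstrained condition that the $x_3$-axis be a principal axis of inertia.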
23.4.2 Energy equation
To conclude this section, let us consider the energy equation. The planar–motion assumption does not affect the expressions of the three energy equations addressed in the preceding section, namely: (i) that of the center of mass, $dT_G/dt = \mathbf{f}_E \cdot \mathbf{v}_G$ (Eq. 23.40); (ii) that around the center of mass, $d\check{T}/dt = \check{P}_E$ (Eq. 23.41); and (iii) that of the total energy, sum of the preceding two, $dT/dt = P_E$ (Eq. 23.50). The only simplification occurs in the expressions for $\check{T}$ and $\check{P}_E$, since now $\check{T} = \frac{1}{2} J_G\, \omega^2$ and $\check{P}_E = M_{GE}\, \omega$. Correspondingly, we have
\[ T = \frac{1}{2}\, m\, \|\mathbf{v}_G\|^2 + \frac{1}{2}\, J_G\, \omega^2 \tag{23.58} \]
and
\[ P_E = \mathbf{f}_E \cdot \mathbf{v}_G + M_{GE}\, \omega. \tag{23.59} \]
Hence, the total energy equation (Eq. 23.50) now reads
\[ \frac{d}{dt} \Big( \frac{1}{2}\, m\, \|\mathbf{v}_G\|^2 + \frac{1}{2}\, J_G\, \omega^2 \Big) = P_E = \mathbf{f}_E \cdot \mathbf{v}_G + M_{GE}\, \omega, \tag{23.60} \]
which integrated yields
\[ \Big[ \frac{1}{2}\, m\, \|\mathbf{v}_G\|^2 + \frac{1}{2}\, J_G\, \omega^2 \Big]_{t_0}^{t_1} = W_E(t_0, t_1) = \int_{t_0}^{t_1} \big( \mathbf{f}_E \cdot \mathbf{v}_G + M_{GE}\, \omega \big)\, dt. \tag{23.61} \]
23.5 A disk rolling down a slope
♥
This section is included to provide you with a fairly elaborate illustrative example that we are able to address with the know–how developed up to this point. Specifically, let us consider the motion of a disk, having radius R and thickness δ, rolling down a piecewise frictionless slope (Fig. 23.2). This analysis should clarify several issues, such as the instantaneous axis of rotation, as well as the convenience, sometimes, of using the energy equation instead of the momentum and/or angular momentum equation. [You may review the formulation used to study the motion of a sled (Subsection 20.2.3), and note how much more complicated the problem is now, because we have to include the motion around the center of mass.] Consider a start from rest from the point P0 , and assume that the portion of the slope between P1 and P2 (marked in gray in Fig. 23.2) may be considered as frictionless (think of driving over a patch of ice), whereas friction exists everywhere else (marked in black), with μS and μD < μS denoting, respectively, the static and dynamic friction coefficients, here assumed to be constant.
Fig. 23.2 Disk on a slope
Let us place the origin at P0 , with the x-axis along the slope (pointing down the slope), the y-axis normal to it (also pointing downwards, Fig. 23.2) and the z-axis normal to the plane of the figure, pointing away from your eyes. Let us assume that the center of mass G coincides with the center of the disk (e.g., that the density of the disk is constant), and that the disk is always in contact with the slope. Let us denote by C(t) the point of the disk that is in contact with the slope, at any given time t. One can use two equivalent abscissas to give the location of the disk: the first is the distance x traveled by the contact point, the second is the distance xG traveled by G. Of the two, I prefer to use xG(t), in order to emphasize the applicability of the equations in the preceding subsection. Of course, we have x(t) = xG(t).
(23.62)
23.5.1 Solution from P0 to P1 It is convenient to address separately the three segments: (i) from P0 to P1 , (ii) from P1 to P2 , and (iii) after P2 . Here, we consider the first one. [The other ones are dealt with in the two subsections that follow.] As stated above, we assume that the disk starts from rest. Accordingly, the initial conditions are xG(0) = vG(0) = ω(0) = 0.
(23.63)
• Kinematics. The no–slip constraint A contact point is by definition the point where the two bodies touch each other. For instance, at the location of the light gray disk (Fig. 23.2), the point C of the disk and the point P of the slope coincide. This occurs only for an instant of time, say at t = tP . The point P , being a point of the slope, does not move, so that vP = 0, at all times. On the other hand, the point C, being a point of the disk, has velocity vC = vG + ω × (xC − xG).
(23.64)
The x-component of this equation is vC = vG − ω R,
(23.65)
with ω > 0 (clockwise, for a change, because the z-axis points away from your eyes, Fig. 23.3).
Fig. 23.3 Disk on a slope
At this point, we introduce the no–slip assumption, namely we assume that the friction is such that no slip occurs at the contact point. This means that, at time t = tP , P and C have the same velocity, namely vC = vP = 0. [Incidentally, this implies that C is on the instantaneous axis of rotation.] Combining with Eq. 23.65, we have vG = ω R.
(23.66)
This is the desired relationship between vG and ω, introduced by imposing the no–slip assumption. We will refer to this as the no–slip constraint. [The condition to impose on the static friction coefficient μS for slip not to occur is presented at the end of this subsection (Eq. 23.82).]
• Dynamics
Here, we address dynamics.
◦ Governing equations and solution. The equations of motion for the center of mass (Eq. 23.52) reduce to
\[ m\, \ddot{x}_G = -T + m\, g \sin \alpha, \qquad 0 = -N + m\, g \cos \alpha, \tag{23.67} \]
where $N > 0$ is the reaction, normal to the slope, $T > 0$ is the magnitude of the friction force (tangential to the slope), whereas $\alpha \in (0, \pi/2)$ is the angle that the slope forms with the horizontal axis (Fig. 23.3 again). [Note that $\ddot{y}_G = 0$. There is no acceleration along the normal to the slope, because we assume the disk to remain in contact with the slope at all times.] In addition, we have the Euler equation, $J_G\, \dot{\omega} = M_{GE}$ (Eq. 23.53), which for this specific case reads
\[ J_G\, \frac{d\omega}{dt} = T R, \tag{23.68} \]
where $J_G$ is the moment of inertia of the disk. [If the density is only a function of the radial location, namely if $\varrho = \varrho(r)$, we have (use polar coordinates, so that $dA = r\, d\theta\, dr$, Eq. 19.129)
\[ J_G = \iiint_V \varrho(x, y, z)\, \big( x^2 + y^2 \big)\, dV = \int_0^{\delta} \int_0^{2\pi} \int_0^{R} \varrho(r)\, r^3\, dr\, d\theta\, dz. \tag{23.69} \]
If $\varrho$ is constant, we obtain $J_G = \frac{1}{2} \varrho\, \delta\, \pi R^4 = \frac{1}{2} m R^2$ (Eq. 23.23), where $m = \varrho\, \pi\, \delta R^2$ denotes the mass of the disk.]
Equations 23.67 and 23.68 are the governing equations of motion. Next, we want to obtain the solution to these equations, subject to the homogeneous initial conditions (Eq. 23.63) and the no–slip kinematic constraint (Eq. 23.66). Dividing the first in Eq. 23.67 by $m$, one obtains
\[ \dot{v}_G = g \sin \alpha - T/m. \tag{23.70} \]
Next, let us introduce the following
Definition 357 (Gyradius). The gyradius (or radius of gyration) of the disk, denoted by $\rho_G$, is given by
\[ \rho_G = \sqrt{\frac{J_G}{m}}. \tag{23.71} \]
The gyradius equals the distance from the axis where one should concentrate the mass, in order to have the same moment of inertia. [More on moments of inertia and gyradii in Appendix A (Section 23.8), where we will see that for a disk we always have $\rho_G \le 1$ (Eq. 23.130). In particular, if the density is constant, we have $\rho_G^2 = \frac{1}{2} R^2$ (see below Eq. 23.69).] Hence, dividing Eq. 23.68 by $m R$, one obtains (use $v_G = \omega R$, Eq. 23.66)
\[ \dot{\omega}\, \rho_G^2/R = \dot{v}_G\, \rho_G^2/R^2 = T/m. \tag{23.72} \]
Then, eliminating $T/m$ between Eqs. 23.70 and 23.72, one obtains
\[ \dot{v}_G(t) = \frac{g \sin \alpha}{1 + \rho_G^2/R^2}. \tag{23.73} \]
Thus, integrating and recalling that $v_G(0) = 0$ (Eq. 23.63), we have
\[ v_G(t) = \frac{g \sin \alpha}{1 + \rho_G^2/R^2}\; t. \tag{23.74} \]
Integrating again and recalling that $x_G(0) = 0$ (Eq. 23.63 again), we have
\[ x_G(t) = \frac{g \sin \alpha}{1 + \rho_G^2/R^2}\; \frac{t^2}{2}. \tag{23.75} \]
This implies that the disk reaches the point $P_1$ at time $t = t_1$, with $t_1$ given by
\[ t_1 = \sqrt{2\, \frac{1 + \rho_G^2/R^2}{g \sin \alpha}\; x_1}, \tag{23.76} \]
where $x_1$ denotes the abscissa of $P_1$. Correspondingly, combining Eqs. 23.74 and 23.75, we have that the velocity at $P_1$ is given by
\[ v_1 := v_G(t_1) = \sqrt{\frac{2\, g\, x_1 \sin \alpha}{1 + \rho_G^2/R^2}} = \sqrt{\frac{2\, g\, h_1}{1 + \rho_G^2/R^2}}, \tag{23.77} \]
where $h_1 := x_1 \sin \alpha > 0$ denotes the vertical distance between $P_0$ and $P_1$. The final conditions, $x_1$ and $v_1$, are all we need to address what happens next. They will be used as the initial conditions for the analysis of the solution from $P_1$ to $P_2$, addressed in Subsection 23.5.2.
◦ Comment. Note that, if $\varrho = \varrho(r/R)$, then $\rho_G/R$ is independent of $R$ and $m$ (Remark 170, p. 989), and hence $t_1$ is independent of $R$ and $m$. This means that two circular cylinders, having different mass and/or different size but the same distribution $\varrho = \varrho(r/R)$, will arrive together at $x_1$ (an arbitrary point!), provided of course that friction and aerodynamic drag are negligible.
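To put numbers on Eqs. 23.73, 23.76 and 23.77, here is a short sketch comparing a uniform disk ($\rho_G^2/R^2 = 1/2$) with a thin ring, for which all the mass sits at the rim, so that $\rho_G = R$. The slope angle, the distance $x_1$, the value of $g$ and the function name are hypothetical choices for illustration.

```python
import math

def rolling_from_rest(x1, alpha, rho_over_R, g=9.81):
    """Acceleration, arrival time and speed at x1 for rolling without slip,
    starting from rest (Eqs. 23.73, 23.76 and 23.77)."""
    factor = 1.0 + rho_over_R**2
    a = g * math.sin(alpha) / factor
    t1 = math.sqrt(2.0 * factor * x1 / (g * math.sin(alpha)))
    v1 = math.sqrt(2.0 * g * x1 * math.sin(alpha) / factor)
    return a, t1, v1

alpha = math.radians(30.0)  # hypothetical slope angle
x1 = 2.0                    # hypothetical distance from P0 to P1

a_disk, t_disk, v_disk = rolling_from_rest(x1, alpha, math.sqrt(0.5))  # uniform disk
a_ring, t_ring, v_ring = rolling_from_rest(x1, alpha, 1.0)             # thin ring
print(t_disk, v_disk)
print(t_ring, v_ring)
```

Neither the radius nor the mass appears among the arguments, in line with the comment above: only the ratio $\rho_G/R$ matters, and the ring, having the larger ratio, arrives later than the disk.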
• Solution via energy considerations
As pointed out in Subsection 11.9.2, the relationship $v = v(x)$ may be obtained much more easily by using the energy equation. In the present case, using Eq. 23.61, we have
\[ \Big[ \frac{1}{2}\, m\, v_G^2 + \frac{1}{2}\, J_G\, \omega^2 \Big]_{0}^{t_1} = W_E(0, x_1). \tag{23.78} \]
The forces are $T$, $N$ and the weight $W$. Note that the forces $T$ and $N$ are applied at the contact point $C$, and that $v_C = 0$. Thus, the work performed by these forces vanishes. On the other hand, the work performed by the weight is given by $m g h_1$ (Eq. 20.118), where $h_1 = x_1 \sin \alpha > 0$. Thus, $W_E$ is given by $W_E = m g h_1$. Hence, Eq. 23.78, with $v_0 = \omega_0 = 0$, yields
\[ \frac{1}{2}\, m\, v_1^2 + \frac{1}{2}\, J_G\, \omega_1^2 = m\, g\, h_1. \tag{23.79} \]
Finally, dividing by $m/2$ and using $v_1 = \omega_1 R$ (use Eq. 23.66) and $\rho_G^2 = J_G/m$ (Eq. 23.71), one obtains
\[ \big( 1 + \rho_G^2/R^2 \big)\, v_1^2 = 2\, g\, h_1, \tag{23.80} \]
which is fully equivalent to Eq. 23.77.
• Condition for no slip
As promised above (just below Eq. 23.66), here we address the mathematical expression of the condition for no slip to occur. Eliminating $\dot{v}_G$ between Eqs. 23.70 and 23.72, one obtains $m g \sin \alpha - T = T R^2/\rho_G^2$, or
\[ T = \frac{m\, g \sin \alpha}{1 + R^2/\rho_G^2}. \tag{23.81} \]
On the other hand, we have $N = m g \cos \alpha$ (second in Eq. 23.67). Combining these equations with the law for static friction, $T/N < \mu_S$ (Eq. 17.87), we obtain $T/N = \tan \alpha/(1 + R^2/\rho_G^2) < \mu_S$, or
\[ \tan \alpha < \big( 1 + R^2/\rho_G^2 \big)\, \mu_S. \tag{23.82} \]
This is the desired mathematical expression for the no–slip condition. Such a condition was tacitly assumed in the analysis of this subsection. If such a condition is not satisfied, namely if $\tan \alpha > (1 + R^2/\rho_G^2)\, \mu_S$, the disk will start slipping from the very beginning. Then, the friction force will get it also to rotate (with slip). [You might like to sharpen your skills and perform such an analysis. However, I suggest that you first complete reading this whole section, because a similar problem is addressed in Subsection 23.5.3.]
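For a feeling of what Eq. 23.82 means in practice, consider a uniform disk, for which $R^2/\rho_G^2 = 2$, so that the condition reads $\tan \alpha < 3\, \mu_S$. The sketch below evaluates the limiting angle for a hypothetical static friction coefficient; the function names are introduced here only for illustration.

```python
import math

def max_no_slip_angle(mu_s, rho_over_R):
    """Limiting slope angle (radians) in tan(alpha) < (1 + R^2/rho_G^2) mu_S (Eq. 23.82)."""
    return math.atan((1.0 + 1.0 / rho_over_R**2) * mu_s)

def friction_ratio(alpha, rho_over_R):
    """Ratio T/N required for rolling without slip (Eq. 23.81 and second in Eq. 23.67)."""
    return math.tan(alpha) / (1.0 + 1.0 / rho_over_R**2)

mu_s = 0.3                     # hypothetical static friction coefficient
rho_over_R = math.sqrt(0.5)    # uniform disk

alpha_max = max_no_slip_angle(mu_s, rho_over_R)
print(math.degrees(alpha_max))
print(friction_ratio(alpha_max, rho_over_R))
```

At the limiting angle, the required ratio $T/N$ equals $\mu_S$, as it should. A sled on the same slope would start sliding already at $\tan \alpha = \mu_S$, so the rolling disk tolerates a considerably steeper slope before slipping.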
23.5.2 Solution from P1 to P2
As stated at the beginning of this section, between $P_1$ and $P_2$ we assume the surface to be frictionless. Therefore, here we have $T = 0$. This implies that we have to allow for the presence of slip, and hence the no–slip constraint $v_G = \omega R$ (Eq. 23.66) no longer applies. Accordingly, the first of Eq. 23.67 ($m\, \dot{v}_G = m g \sin \alpha - T$) and Eq. 23.68 ($J_G\, \dot{\omega} = T R$) are no longer interrelated. Instead, Eq. 23.68 becomes $J_G\, \dot{\omega} = 0$, which integrated gives
\[ \omega_2 = \omega_1 = v_1/R. \tag{23.83} \]
On the other hand, for evaluating $v_G(t_2)$ (namely when $C$ is at $P_2$), we can use Eq. 23.70 with $T = 0$, namely $\dot{v}_G = g \sin \alpha$, which integrated yields $v_G(t) = v_1 + g\, (t - t_1) \sin \alpha$, and hence $x_G(t) = x_1 + v_1 (t - t_1) + \frac{1}{2} g\, (t - t_1)^2 \sin \alpha$. This provides us with $t_2$, since $x_2 := x_G(t_2)$ is prescribed. Then, from $t_2$, we can evaluate $v_2 := v_G(t_2) = v_1 + g\, (t_2 - t_1) \sin \alpha$. Again, the final conditions (namely $x_2$, $v_2$ and $\omega_2$) are all we need to address what happens next, as they will be used as the initial conditions for the analysis of the solution after $P_2$, which is addressed in the subsection that follows.
◦ Comment. Alternatively, to obtain $v_2$ much more directly, we can use the center–of–mass energy equation $dT_G/dt = \mathbf{f}_E \cdot \mathbf{v}_G$ (Eq. 23.40), which integrated gives (use the facts that $T = 0$, and that $N$ is normal to $\mathbf{v}_G$, so that its work vanishes)
\[ v_2^2 = v_1^2 + 2\, g\, (h_2 - h_1), \tag{23.84} \]
where $h_k = x_k \sin \alpha$ ($k = 1, 2$). [You might like to verify that we have obtained the same result. Hint: Multiply $x_2 - x_1 = v_1 (t_2 - t_1) + \frac{1}{2} g\, (t_2 - t_1)^2 \sin \alpha$ by $2 g \sin \alpha$, and use $v_2 - v_1 = g\, (t_2 - t_1) \sin \alpha$ to eliminate $t_2 - t_1$.]
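The equivalence of the two routes is also easy to confirm with numbers. The sketch below solves the quadratic for $t_2 - t_1$, evaluates $v_2$ from the time history, and compares it with the energy result (Eq. 23.84). The values of $x_1$, $x_2$, $v_1$, $\alpha$ and $g$ are hypothetical, as is the function name.

```python
import math

def frictionless_segment(x1, x2, v1, alpha, g=9.81):
    """Return (t2 - t1, v2) for the frictionless stretch, using
    x2 - x1 = v1 (t2 - t1) + (1/2) g (t2 - t1)^2 sin(alpha)."""
    a = g * math.sin(alpha)
    # Positive root of (1/2) a dt^2 + v1 dt - (x2 - x1) = 0.
    dt = (-v1 + math.sqrt(v1**2 + 2.0 * a * (x2 - x1))) / a
    return dt, v1 + a * dt

alpha = math.radians(30.0)
x1, x2, v1 = 2.0, 3.0, 3.0     # hypothetical data at P1 and location of P2

dt, v2 = frictionless_segment(x1, x2, v1, alpha)

# Energy route (Eq. 23.84), with h_k = x_k sin(alpha).
h1, h2 = x1 * math.sin(alpha), x2 * math.sin(alpha)
v2_energy = math.sqrt(v1**2 + 2.0 * 9.81 * (h2 - h1))
print(v2, v2_energy)
```

The energy route does not require $t_2$ at all, which is why it is the more direct of the two when only $v_2$ is needed.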
23.5.3 Solution after P2
The solution after $P_2$ is the most interesting one. Note that, when the contact point arrives at $P_2$, we have (use Eqs. 23.83 and 23.84)
\[ v_2 > v_1 = R\, \omega_2. \tag{23.85} \]
Thus, at $P_2$ the no–slip constraint (namely $v_G = R\, \omega$, Eq. 23.66) is not satisfied. Therefore, we necessarily have slip, at least initially. Hence, immediately after $P_2$, the friction force arises from the dynamic (not the static) friction law, namely $T = \mu_D N$ (Eq. 17.89). This yields (use $N = m g \cos \alpha$, Eq. 23.67)
\[ T = \mu_D\, N = \mu_D\, m\, g \cos \alpha. \tag{23.86} \]
Next, note that if $\omega R$ grows faster than $v_G$, there exists a time $t = t_3$ when the disk will reach the no–slip situation $v_G = \omega R$ (Eq. 23.66) and it will stop slipping. So, the question is: “Does $\omega R$ grow faster than $v_G$?” Here, we address this issue in detail. Specifically, note that the presence of friction has two effects. The first is that the resulting moment increases the angular speed, whereas the second is that the force tends to decrease $v_G$ (use the first in Eq. 23.67). Thus, we should consider the possibility that the disk at some point might reach a no–slip situation $v_G = \omega R$ (Eq. 23.66). I said “might,” because gravity has the opposite effect on $v_G$, as it tends to increase it. Specifically, if we reduce the angle $\alpha$, we reduce the effect of gravity, and hence we expect that, for small $\alpha$, after a while, the disk reaches the no–slip situation. But what happens if we increase $\alpha$, even if we respect the no–slip condition, namely $\tan \alpha < \mu_S (1 + R^2/\rho_G^2)$ (Eq. 23.82)? Is it possible that the disk never reaches the no–slip situation $v_G = \omega R$ (Eq. 23.66)? To answer these questions, let us first evaluate the solutions for $v_G = v_G(t)$ and $\omega = \omega(t)$, while the disk is slipping. Then, we will be able to ascertain whether the disk ever reaches the no–slip situation $v_G = \omega R$. Integrating $\dot{v}_G = g \sin \alpha - T/m$ (Eq. 23.70) from $t_2$ to $t$, with $T/m = \mu_D\, g \cos \alpha$ (Eq. 23.86), one obtains
\[ v_G(t) = v_2 + C\, (t - t_2), \quad \text{where} \quad C := g\, (\sin \alpha - \mu_D \cos \alpha). \tag{23.87} \]
On the other hand, combining the angular momentum equation $\dot{\omega}\, \rho_G^2/R = T/m$ (Eq. 23.72) and Eq. 23.86, we obtain $\rho_G^2\, \dot{\omega}/R = \mu_D\, g \cos \alpha$. This integrated yields
\[ \omega(t)\, R = \omega_2 R + D\, (t - t_2), \quad \text{where} \quad D := (\mu_D\, g \cos \alpha)\, R^2/\rho_G^2. \tag{23.88} \]
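Thus, the velocities $v_G$ and $\omega R$ vary linearly with $t$. In other words, the last two equations correspond to two straight lines on the velocity–time plane. Since $v_2 > \omega_2 R$ (Eq. 23.85), the two lines can meet only if $\omega R$ grows faster than $v_G$. Accordingly, if $C < D$, namely if
\[ \sin \alpha - \mu_D \cos \alpha < \mu_D\, \frac{R^2}{\rho_G^2} \cos \alpha, \tag{23.89} \]
that is, if
\[ \tan \alpha < \mu_D \big( 1 + R^2/\rho_G^2 \big), \tag{23.90} \]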
then the two straight lines will intersect at $t = t_3 > t_2$. The corresponding point is denoted by $P_3$. [You might have asked yourself: “How come in Fig. 23.2 the disk at the location $P_3$ is shown in gray?” The reason is that the existence of the point $P_3$ is not guaranteed, since it is subject to the condition in Eq. 23.90.] This means that at $P_3$ we will have $v_G = \omega R$, and the disk will stop sliding. Then, we are back to the formulation regarding the motion from $P_0$ to $P_1$, albeit with nonhomogeneous initial conditions. On the other hand, assume that
\[ \mu_D \big( 1 + R^2/\rho_G^2 \big) < \tan \alpha < \mu_S \big( 1 + R^2/\rho_G^2 \big). \tag{23.91} \]
[Hint: The second inequality corresponds to the no–slip condition (Eq. 23.82).] In this case, the two lines will not intersect for $t > t_2$, even though the no–slip condition is satisfied. This means that the disk will continue to slip at all times. These results are summarized in Fig. 23.4, where in the ordinates we have either $v$ (Eq. 23.87), or $\omega R$ (Eq. 23.88), whereas in the abscissas we have
\[ \tau = D\, t. \tag{23.92} \]
[With this choice, we have only one line for $\omega R$ (that is, the line that starts from $\omega_2 R$), which may be written as $\omega R = \omega_2 R + (\tau - \tau_2)$.]
Fig. 23.4 Existence of P3
In Fig. 23.4, we present five lines for $v(t)$. The first, denoted by (1), corresponds to $\tan \alpha \in (0, \mu_D)$, so that the coefficient $C = g (\sin \alpha - \mu_D \cos \alpha)$ in Eq. 23.87 is negative. The second, denoted by (2), corresponds to $\tan \alpha = \mu_D$, so that $C$ vanishes. The third line, denoted by (3), corresponds to $\tan \alpha \in \big( \mu_D,\, \mu_D (1 + R^2/\rho_G^2) \big)$, so that the coefficient $C$ is positive, and the condition for the existence of the point $P_3$ (Eq. 23.90) is satisfied. The fourth, denoted by (4), corresponds to $\tan \alpha = \mu_D (1 + R^2/\rho_G^2)$, which is the upper bound for the condition for the existence of the point $P_3$ (Eq. 23.90) to be satisfied. The fifth, denoted by (5), corresponds to $\tan \alpha = \mu_S (1 + R^2/\rho_G^2)$, which is the upper bound for $\alpha$, under the no–slip condition (Eq. 23.82). The dark gray region in Fig. 23.4 corresponds to the condition in Eq. 23.91. In this region, for $\tau > \tau_2$, the disk will continue to slip forever, even though it satisfies the no–slip condition. [For, this applies only if initially the disk is not slipping.] Indeed, a line in this region never meets the line $\omega R = \omega_2 R + (\tau - \tau_2)$, for any $\tau > \tau_2$.
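The classification above can be condensed into a few lines of code. The sketch below evaluates the slopes $C$ and $D$ of Eqs. 23.87 and 23.88 and, when the two lines meet, the elapsed time $t_3 - t_2$ obtained by equating $v_2 + C (t_3 - t_2)$ and $\omega_2 R + D (t_3 - t_2)$. The numerical data (a uniform disk with $\mu_D = 0.2$, so that the threshold in Eq. 23.90 is $\tan \alpha = 0.6$) and the function name are hypothetical.

```python
import math

def slip_phase(v2, omega2_R, alpha, mu_d, rho_over_R, g=9.81):
    """Return C, D (Eqs. 23.87 and 23.88) and the time t3 - t2 at which
    v_G = omega R, or None if the disk keeps slipping (Eq. 23.91)."""
    C = g * (math.sin(alpha) - mu_d * math.cos(alpha))
    D = mu_d * g * math.cos(alpha) / rho_over_R**2
    if D <= C:
        return C, D, None      # omega R never catches up with v_G
    return C, D, (v2 - omega2_R) / (D - C)

rho_over_R = math.sqrt(0.5)    # uniform disk
v2, omega2_R = 4.0, 3.0        # hypothetical values at P2, with v2 > omega2 R

# tan(alpha) = 0.5 < 0.6: the point P3 exists.
C, D, dt3 = slip_phase(v2, omega2_R, math.atan(0.5), 0.2, rho_over_R)
print(C, D, dt3)

# tan(alpha) = 0.7 > 0.6: the disk slips at all times.
print(slip_phase(v2, omega2_R, math.atan(0.7), 0.2, rho_over_R)[2])
```

In the first case the slope $C$ is positive, so this is a line of type (3) in Fig. 23.4; the second case falls in the dark gray region, provided that $\mu_S$ is large enough for Eq. 23.82 to hold.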
23.6 Pivoted bodies
♣
Sometimes we are interested in studying the dynamics of a rigid body in which one point of the body, say O, is fixed (universal or Cardano joint), so that vO = 0.
(23.93)
A body that has a fixed point in three dimensions (such as a spherical pendulum) will be referred to as a pivoted body. [On the other hand, a body with a fixed line in three dimensions will be referred to as a hinged body, as will a body with a fixed point in two dimensions (Section 23.7).] For a pivoted body, the motion is fully determined by the angular momentum equation. To be specific, to study this problem it is convenient to use the angular momentum equation around O, so as to eliminate the constraint reactions acting at O. Accordingly, here we modify the angular–momentum formulation used thus far (namely for a motion around the center of mass), by introducing the rigid–body angular–momentum equation around O, instead of G. Let us begin with the definition of the angular momentum with respect to an arbitrary point O. Choosing the origin to coincide with O, so that $\mathbf{x}_O = \mathbf{0}$, Eq. 21.54 reduces to
\[ \mathbf{h}_O = \sum_{k=1}^{n} m_k\, \mathbf{x}_k \times \mathbf{v}_k. \tag{23.94} \]
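Using the fact that $\mathbf{v}_O = \mathbf{0}$, we have that $\mathbf{v}_k = \boldsymbol{\omega} \times \mathbf{x}_k$ (Eq. 23.2), and hence (use the BAC–minus–CAB rule, Eq. 15.96)
\[ \mathbf{h}_O = \sum_{k=1}^{n} m_k \big[ \boldsymbol{\omega}\, (\mathbf{x}_k \cdot \mathbf{x}_k) - \mathbf{x}_k\, (\mathbf{x}_k \cdot \boldsymbol{\omega}) \big]. \tag{23.95} \]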
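Proceeding as we did to show that $\check{\mathbf{h}}_G = \mathbf{J}_G\, \boldsymbol{\omega}$ (Eq. 23.14), we have
\[ \mathbf{h}_O = \mathbf{J}_O\, \boldsymbol{\omega}, \tag{23.96} \]
where
\[ \mathbf{J}_O = \sum_{k=1}^{n} m_k \big[ (\mathbf{x}_k \cdot \mathbf{x}_k)\, \mathbf{I} - \mathbf{x}_k \otimes \mathbf{x}_k \big]. \tag{23.97} \]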
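Since $\mathbf{v}_O = \mathbf{0}$, we can use $\dot{\mathbf{h}}_O = \mathbf{m}_{OE}$ (Eq. 21.59), and obtain
\[ \frac{d}{dt} \big( \mathbf{J}_O\, \boldsymbol{\omega} \big) = \mathbf{m}_{OE}. \tag{23.98} \]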
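Next, let us review the formulation in the body frame, for which we have $\delta \mathbf{J}_O/\delta t = \mathbf{O}$, where $\mathbf{O}$ denotes the zero second–order tensor (Definition 360, p. 993). Accordingly, using $d\mathbf{b}/dt = \delta \mathbf{b}/\delta t + \boldsymbol{\omega} \times \mathbf{b}$ (Eq. 22.23), Eq. 23.98 yields
\[ \mathbf{J}_O\, \frac{\delta \boldsymbol{\omega}}{\delta t} + \boldsymbol{\omega} \times (\mathbf{J}_O\, \boldsymbol{\omega}) = \mathbf{m}_{OE}, \tag{23.99} \]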
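in full analogy with Eq. 23.25. [In other words, using O instead of G does not alter the angular momentum equation, because $\mathbf{v}_O = \mathbf{0}$ (Eq. 21.55).] In general, in terms of body–axis components, Eq. 23.96 yields
\[ \begin{aligned} h_{O_1} &= J_{O_{11}}\, \omega_1 + J_{O_{12}}\, \omega_2 + J_{O_{13}}\, \omega_3, \\ h_{O_2} &= J_{O_{21}}\, \omega_1 + J_{O_{22}}\, \omega_2 + J_{O_{23}}\, \omega_3, \\ h_{O_3} &= J_{O_{31}}\, \omega_1 + J_{O_{32}}\, \omega_2 + J_{O_{33}}\, \omega_3. \end{aligned} \tag{23.100} \]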
[As we did for the Euler equations, we are omitting the symbol (see Eq. 23.26 and the lines that follow).] Next, let us use the principal axes of inertia, so that $\mathsf{J}_O$ is a diagonal matrix, namely
\[ J_{O_{12}} = J_{O_{13}} = J_{O_{23}} = 0. \tag{23.101} \]
[This occurs if we have two orthogonal planes of symmetry (Remark 165, p. 958). Alternatively, use a frame of reference that makes the matrix $\mathsf{J}_O$ diagonal (see Subsection 23.2.4, in particular Remark 167, p. 962).] Then, following the procedure used to obtain the Euler equations for rigid–body dynamics (Eq. 23.28), we have
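\[ \begin{aligned} J_{O_{11}}\, \frac{d\omega_1}{dt} + \big( J_{O_{33}} - J_{O_{22}} \big)\, \omega_2\, \omega_3 &= m_{OE_1}, \\ J_{O_{22}}\, \frac{d\omega_2}{dt} + \big( J_{O_{11}} - J_{O_{33}} \big)\, \omega_3\, \omega_1 &= m_{OE_2}, \\ J_{O_{33}}\, \frac{d\omega_3}{dt} + \big( J_{O_{22}} - J_{O_{11}} \big)\, \omega_1\, \omega_2 &= m_{OE_3}, \end{aligned} \tag{23.102} \]
in full analogy with Eq. 23.28.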
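• Moment of inertia tensor for a continuum
For a continuum, we have (place the origin in O)
\[ \mathbf{J}_O = \iiint_V \varrho\, \big[ (\mathbf{x} \cdot \mathbf{x})\, \mathbf{I} - \mathbf{x} \otimes \mathbf{x} \big]\, dV, \tag{23.103} \]
or, in matrix notation,
\[ \mathsf{J}_O = \iiint_V \varrho\, \big[ (\mathsf{x}^{\mathsf{T}} \mathsf{x})\, \mathsf{I} - \mathsf{x}\, \mathsf{x}^{\mathsf{T}} \big]\, dV, \tag{23.104} \]
or, in terms of components,
\[ J_{O_{hk}} = \iiint_V \varrho\, \big[ \big( x_1^2 + x_2^2 + x_3^2 \big)\, \delta_{hk} - x_h\, x_k \big]\, dV. \tag{23.105} \]
For instance, we have
\[ J_{O_{33}} = \iiint_V \varrho\, \big( x_1^2 + x_2^2 \big)\, dV \quad \text{and} \quad J_{O_{12}} = - \iiint_V \varrho\, x_1\, x_2\, dV. \tag{23.106} \]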
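• Relationship between $\mathbf{J}_O$ and $\mathbf{J}_G$
Here, we want to establish the relationship between $\mathbf{J}_O$ and $\mathbf{J}_G$. Again, let us use a frame of reference with the origin at O. Using $\check{\mathbf{x}} = \mathbf{x} - \mathbf{x}_G$ (Eq. 21.65), namely $\mathbf{x} = \check{\mathbf{x}} + \mathbf{x}_G$, we have
\[ \begin{aligned} \mathbf{J}_O &= \iiint_V \varrho\, \big[ (\check{\mathbf{x}} + \mathbf{x}_G) \cdot (\check{\mathbf{x}} + \mathbf{x}_G)\, \mathbf{I} - (\check{\mathbf{x}} + \mathbf{x}_G) \otimes (\check{\mathbf{x}} + \mathbf{x}_G) \big]\, dV \\ &= \iiint_V \varrho\, \big[ (\check{\mathbf{x}} \cdot \check{\mathbf{x}})\, \mathbf{I} - \check{\mathbf{x}} \otimes \check{\mathbf{x}} \big]\, dV \\ &\quad + \iiint_V \varrho\, \big[ 2\, (\check{\mathbf{x}} \cdot \mathbf{x}_G)\, \mathbf{I} - \check{\mathbf{x}} \otimes \mathbf{x}_G - \mathbf{x}_G \otimes \check{\mathbf{x}} \big]\, dV \\ &\quad + \iiint_V \varrho\, \big[ (\mathbf{x}_G \cdot \mathbf{x}_G)\, \mathbf{I} - \mathbf{x}_G \otimes \mathbf{x}_G \big]\, dV. \end{aligned} \tag{23.107} \]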
Next, recall that the definition of center of mass (Eq. 21.46) yields Eq. 21.52, which for a continuum reads
\[ \iiint_V \varrho\, \check{\mathbf{x}}\, dV = \mathbf{0}. \tag{23.108} \]
This equation tells us that the next to the last integral in Eq. 23.107 vanishes, since $\mathbf{x}_G$ is space–independent. Then, we are left with
\[ \mathbf{J}_O = \mathbf{J}_G + m\, \big[ (\mathbf{x}_G \cdot \mathbf{x}_G)\, \mathbf{I} - \mathbf{x}_G \otimes \mathbf{x}_G \big]. \tag{23.109} \]
This is the desired relationship between $\mathbf{J}_O$ and $\mathbf{J}_G$. [The term in brackets is the moment of inertia with respect to O of a mass $m$ placed at G.] For two–dimensional motion, the only moment of inertia of interest is $J_O := J_{O_{33}}$. In this case, the above equation simply yields
\[ J_O = J_G + m\, \ell^2, \tag{23.110} \]
where $\ell$ is the distance between $\mathbf{x}_G$ and $\mathbf{x}_O$, as you may verify.
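Equation 23.109 may be checked numerically on a discrete set of particles, for which the integrals become sums (Eq. 23.97). The sketch below uses randomly generated masses and positions, which are hypothetical data, and a helper function introduced here only for illustration.

```python
import numpy as np

def inertia_tensor(masses, positions):
    """J = sum_k m_k [(x_k . x_k) I - x_k (x) x_k], about the origin of
    the coordinates in which `positions` are given (Eq. 23.97)."""
    J = np.zeros((3, 3))
    for m_k, x_k in zip(masses, positions):
        J += m_k * (np.dot(x_k, x_k) * np.eye(3) - np.outer(x_k, x_k))
    return J

rng = np.random.default_rng(1)
masses = rng.uniform(0.5, 2.0, size=6)   # hypothetical particle masses
positions = rng.normal(size=(6, 3))      # coordinates with origin at O

m = masses.sum()
x_G = (masses[:, None] * positions).sum(axis=0) / m   # center of mass

J_O = inertia_tensor(masses, positions)
J_G = inertia_tensor(masses, positions - x_G)
shift = m * (np.dot(x_G, x_G) * np.eye(3) - np.outer(x_G, x_G))
print(np.abs(J_O - (J_G + shift)).max())
```

The printed difference should be at rounding-error level. The 33 component of the shift equals $m$ times the squared distance between the two parallel axes through O and G, namely $m (x_{G_1}^2 + x_{G_2}^2)$, which is the planar result of Eq. 23.110.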
23.6.1 Illustrative example: my toy gyroscope
♥
When I was a kid, still in grammar school, my parents gave me a toy gyroscope as a birthday present. I guess they had decided to encourage my interests in the sciences. What foresight on their part! My toy gyroscope consisted of a flywheel surrounded by a frame, along with a pedestal for its support (Fig. 23.5). The axle of the flywheel had a hole, as shown in the figure. It worked like this. You would put a piece of twine through the hole and wrap it around the flywheel axle. Then, you would pull the twine with all the strength you had, so as to get the flywheel to spin as fast as you could. Next, you would put the bottom of the gyroscope on its support (in gray in the figure) and see what happened.
Fig. 23.5 Gyroscope
Fig. 23.6 Horizontal–axle gyroscope
After a while it got to be pretty boring: not much different from the spinning top I had when I was a toddler. Nonetheless, something captured my attention. When I put the gyroscope on the support with a horizontal axle, its axis rotated around the vertical, with its axle remaining pretty much horizontal. This is something I could not observe with my spinning top a few years earlier, since the top would touch the ground long before its axle was horizontal. I wondered then what kept the gyro from falling, like any other object. Now I have the answer and I can present it to you.
• A horizontal–axle gyroscope

I remember that the gyro’s axle rotated at a somewhat constant angular velocity around the vertical axis, and that the angle between the gyro’s axle and the vertical axis increased very slowly with time. I claim that these variations are due to friction and air drag. Then, neglecting them, the gyro’s axle should remain horizontal. To verify this, let us limit ourselves to a very simple model, namely a horizontal–axle gyroscope (Fig. 23.6), in the absence of friction and air drag. Then, the gyro’s frame ought to rotate horizontally, with angular speed $\omega$, the so–called gyroscopic precession (Fig. 23.6). Let us place the origin $O$ at the pivot and introduce the following right–handed base vectors: (i) the unit vector $\mathbf{j}_1(t)$ in the direction of the horizontal flywheel axle, pointing away from the pivot; (ii) the unit vector $\mathbf{j}_3$, in the vertical direction, pointing upwards, and (iii) the unit vector $\mathbf{j}_2 = \mathbf{j}_3 \times \mathbf{j}_1$ (pointing away from your eyes). In addition, assume the gyro to be axisymmetric, as any typical gyro is, and the gyro’s frame to be weightless. Recall the angular momentum equation, $d\mathbf{h}_O/dt = \mathbf{m}_{OE}$ (Eq. 21.59). In our case, the angular velocity of the gyro is given by $\Omega\, \mathbf{j}_1 + \omega\, \mathbf{j}_3$. Correspondingly,
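the angular momentum is given by

$$\mathbf{h}_O = J\, \Omega\, \mathbf{j}_1(t) + J_3\, \omega\, \mathbf{j}_3, \qquad (23.111)$$

where we have set $J := J_1$. In addition, we have emphasized that only $\mathbf{j}_1$ is a function of time, as per our assumption; indeed, $\mathbf{j}_1$ is rigidly connected with the gyro’s frame, which has an angular velocity equal to $\boldsymbol{\omega} = \omega\, \mathbf{j}_3$. Accordingly, using the Poisson formula $d\mathbf{j}_1/dt = \boldsymbol{\omega} \times \mathbf{j}_1$ (Eq. 22.22) and $\mathbf{j}_2 = \mathbf{j}_3 \times \mathbf{j}_1$ (Eq. 15.73), we have

$$\frac{d\mathbf{j}_1}{dt} = \omega\, \mathbf{j}_3 \times \mathbf{j}_1 = \omega\, \mathbf{j}_2, \qquad (23.112)$$

so that $d\mathbf{h}_O/dt = J\, \Omega\, \omega\, \mathbf{j}_2$. Moreover, the moment due to the force of gravity $-W\, \mathbf{j}_3$ (where $W$ denotes the weight) is given by

$$\mathbf{m}_{OE} = -\ell\, \mathbf{j}_1 \times W\, \mathbf{j}_3 = \ell\, W\, \mathbf{j}_2, \qquad (23.113)$$

where $\ell$ is the distance between $O$ and the center of mass of the gyroscope. Accordingly, the angular momentum equation, $d\mathbf{h}_O/dt = \mathbf{m}_{OE}$ (Eq. 21.59) reduces to

$$J\, \Omega\, \omega\, \mathbf{j}_2 = \ell\, W\, \mathbf{j}_2. \qquad (23.114)$$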
This yields

$$\omega = \frac{\ell\, W}{J\, \Omega} = \frac{\ell\, W}{h_{O_1}}, \qquad (23.115)$$

where $h_{O_1} = J\, \Omega$ is the first component of the angular momentum $\mathbf{h}_O$.

◦ Comment. You might want to test your skill and study the problem when the axle of the gyro makes an angle $\theta \neq 0$ with respect to the horizontal plane. [For a very extensive analysis of gyros and tops, you may consult Klein and Sommerfeld, Ref. [36].]
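To get a feeling for the orders of magnitude in Eq. 23.115, here is a minimal sketch with hypothetical data for a toy gyroscope (all numerical values are assumptions chosen for illustration; the flywheel is modeled as a uniform disk, so that $J = m R^2/2$). It evaluates the precession rate and verifies that the two sides of the angular momentum equation agree as vectors.

```python
import math

def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Hypothetical toy-gyroscope data (illustrative values, not from the text)
m_flywheel = 0.10               # kg; the frame is assumed weightless
R = 0.03                        # m, flywheel radius (uniform disk assumed)
ell = 0.04                      # m, distance from the pivot O to the center of mass
g = 9.81                        # m/s^2
Omega = 2.0 * math.pi * 50.0    # rad/s, spin rate (50 revolutions per second)

J = 0.5 * m_flywheel * R ** 2   # axial moment of inertia of a uniform disk
W = m_flywheel * g              # weight

omega = ell * W / (J * Omega)   # precession rate, Eq. 23.115

# Components in the frame j1 (axle), j2, j3 (vertical), with j2 = j3 x j1
dh_dt = cross((0.0, 0.0, omega), (J * Omega, 0.0, 0.0))   # omega j3 x (J Omega j1)
moment = cross((ell, 0.0, 0.0), (0.0, 0.0, -W))           # ell j1 x (-W j3)

print("precession rate [rad/s]:", omega)
print("precession period [s]:  ", 2.0 * math.pi / omega)
print("dh_O/dt:", dh_dt, " m_OE:", moment)
```

Note that the faster the flywheel spins, the slower the precession, since $\omega$ is inversely proportional to $\Omega$.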
23.7 Hinged bodies

Here, we restrict the analysis to a hinged body. The motion is necessarily planar, say in the $(x, y)$-plane. Specifically, we consider a body that is rotating around a frictionless hinge $\mathbf{x}_H$, so as to have only one degree of freedom, the angle of rotation around the hinge. Using $h_{O_3} = J_{O_{31}}\, \omega_1 + J_{O_{32}}\, \omega_2 + J_{O_{33}}\, \omega_3$ (Eq. 23.100), with $\omega_1 = \omega_2 = 0$, we have
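$$h_H = J_H\, \omega, \qquad (23.116)$$

where we have set $\omega := \omega_3$, $h_H := h_{O_3}$, and $J_H := J_{O_{33}}$. Accordingly, using the third component of $d\mathbf{h}_O/dt = \mathbf{m}_{OE}$ (use Eq. 21.59, since $\mathbf{v}_O = \mathbf{v}_H = \mathbf{0}$), as well as Eq. 23.116, we have

$$J_H\, \frac{d\omega}{dt} = M_{HE}, \qquad (23.117)$$

where $M_{HE}$ denotes the component of $\mathbf{m}_{HE}$ normal to the plane of the motion (positive counterclockwise); for a body subject only to gravity, $M_{HE} = -m\, g\, \ell \sin\theta$ (Eq. 23.118).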
23.7.1 Illustrative example: realistic circular pendulum

Thus far, for this type of problem, we have considered only an ideal pendulum, which consists of a mass–point connected to a hinge by a massless rod (Subsection 20.3.2). Actual pendulums are a bit more complex than this. Here, as an illustrative example, we eliminate the ideal–pendulum restriction and consider a realistic circular pendulum, namely a rigid body that is constrained to rotate around a horizontal–axis hinge. The motion is necessarily two–dimensional, with only one degree of freedom, namely the angle of rotation around the hinge. The body is assumed to have arbitrary shape and density distribution. Let us assume friction and air drag to be negligible, so that the only forces acting on the body are gravity and hinge reaction. In this case, it is convenient to use the angular momentum equation around the hinge (Eq. 23.117), so as to eliminate the hinge reaction from the equation. Again, we choose: (i) the origin $O$ to coincide with the hinge point $H$, (ii) $\mathbf{k}$ to be parallel to the hinge line, pointing towards your eyes, (iii) $\mathbf{j}$ to be pointing like $-\mathbf{g}$ (namely upwards), so as to have $\mathbf{g} = -g\, \mathbf{j}$, and (iv) $\mathbf{i} = \mathbf{j} \times \mathbf{k}$ (pointing towards the right), so as to have a right–handed frame. Then, we have

$$M_{HE} = \mathbf{k} \cdot \iiint_V \varrho\, \mathbf{x} \times \mathbf{g}\, dV = -g\, \mathbf{j} \times \mathbf{k} \cdot \iiint_V \varrho\, \mathbf{x}\, dV = -m\, g\, \mathbf{i} \cdot \mathbf{x}_G = -m\, g\, \ell \sin\theta. \qquad (23.118)$$

[Hint: Use $\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \mathbf{c} \times \mathbf{a} \cdot \mathbf{b}$ (Eq. 15.89) and $\mathbf{j} \times \mathbf{k} = \mathbf{i}$ (Eq. 15.72), as well as the definition of the center of mass $m\, \mathbf{x}_G = \iiint_V \varrho\, \mathbf{x}\, dV$ (Eq. 23.17), and $\mathbf{i} \cdot \mathbf{x}_G = \ell \sin\theta$, where $\ell > 0$ is the distance between $G$ and the hinge
line, whereas $\theta$ is the counterclockwise angle measured from the vertical ray (pointing downwards) from $H$.] Combining with $J_H\, \ddot{\theta} = M_{HE}$ (Eq. 23.117), one obtains

$$\frac{d^2\theta}{dt^2} + C \sin\theta = 0, \qquad (23.119)$$

where $C = m\, g\, \ell/J_H = g\, \ell/\rho_H^2 > 0$ ($\rho_H = \sqrt{J_H/m}$ being the gyradius around $H$, Eq. 23.121). This equation is formally identical to Eq. 20.39. The only difference is that $g/\ell$ is replaced by $C$. In summary, the problem addressed in this subsubsection is governed by the same equations used for the analysis of the ideal circular pendulum, in Subsection 20.3.2, provided that we replace $m\, \ell^2$ with $J_H$, namely $g/\ell$ with $g\, \ell/\rho_H^2$. Thus, all the considerations presented there apply to this case as well.

• Comparison with ideal pendulum

To facilitate the comparison with the results for the ideal pendulum (Subsection 20.3.2), note that

$$C = \frac{m\, g\, \ell}{J_H} = \frac{g\, \ell}{\rho_H^2} = \frac{g}{\ell}\, \frac{1}{1 + \rho_G^2/\ell^2}, \qquad (23.120)$$

since $J_H = m\, \ell^2 + J_G$ (Eq. 23.110), and hence

$$\rho_H^2 = J_H/m = \ell^2 + \rho_G^2 \qquad \left( \rho_G^2 = J_G/m \right), \qquad (23.121)$$

where $\rho_H$ is the gyradius around $H$, whereas $\rho_G$ is the gyradius around $G$ (Eq. 23.71). In the ideal–pendulum formulation (Subsection 20.3.2), we were dealing with a point particle connected to the hinge through a massless rod. In this case, we have $J_G = \rho_G = 0$, and hence $\rho_H = \ell$ and $C = g/\ell$, in full agreement with Eq. 20.40. [Of course, this is approximately true whenever $J_G \ll m\, \ell^2$, namely $\rho_G \ll \ell$.]
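The effect of the correction factor in Eq. 23.120 can be illustrated with a short computation. The sketch below assumes a hypothetical body, namely a uniform thin rod of length $L$ hinged at one end (for which $J_G = m L^2/12$, the thin limit of the cuboid formula in Eq. 23.123), and compares it with an ideal pendulum having the same $\ell$. The small–amplitude periods come from the standard linearization $\sin\theta \approx \theta$, which is an assumption of this example and not part of the derivation above.

```python
import math

def pendulum_constant(g, ell, rho_G):
    """C = (g/ell) / (1 + rho_G^2/ell^2), as in Eq. 23.120."""
    return (g / ell) / (1.0 + (rho_G / ell) ** 2)

g = 9.81                       # m/s^2
L = 1.0                        # m, length of the hypothetical thin rod
ell = L / 2.0                  # distance between the hinge and the center of mass
rho_G = L / math.sqrt(12.0)    # gyradius about G for a thin rod (J_G = m L^2 / 12)

C_real = pendulum_constant(g, ell, rho_G)
C_ideal = pendulum_constant(g, ell, 0.0)   # point mass at distance ell (rho_G = 0)

# Small-amplitude periods, from the linearized equation theta'' + C theta = 0
T_real = 2.0 * math.pi / math.sqrt(C_real)
T_ideal = 2.0 * math.pi / math.sqrt(C_ideal)

print("C (rod)   =", C_real, " period =", T_real)
print("C (ideal) =", C_ideal, " period =", T_ideal)
```

Since $\rho_G > 0$ reduces $C$, the realistic pendulum always swings more slowly than the ideal pendulum with the same distance $\ell$.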
23.8 Appendix A. Moments of inertia ♥
Here, we derive explicit expressions of the moments of inertia for some simple solids, namely a cuboid, a truncated circular cylinder and a ball. Explicit results are given only when the density is constant.
In all the cases considered in this section, we consider only the moments of inertia around the center of mass, since the tensor of inertia $\mathbf{J}_O$ around a point $\mathbf{x}_O \neq \mathbf{x}_G$ is easily obtained from $\mathbf{J}_G$ by using Eq. 23.109.
• Cuboid

Consider a homogeneous cuboid (namely a hexahedron with rectangular faces, Definition 104, p. 196), with sides $a$, $b$ and $c$, in the $x$-, $y$- and $z$-directions, respectively. Placing the origin at the center of mass $G$, we have, for the moment of inertia around the $x$-axis,

$$J_1 = \iiint_V \varrho(x, y, z)\, (y^2 + z^2)\, dV. \qquad (23.122)$$

If the cuboid is homogeneous, namely if the density is constant, we obtain

$$J_1 = \varrho\, a\, c \int_{-b/2}^{b/2} y^2\, dy + \varrho\, a\, b \int_{-c/2}^{c/2} z^2\, dz = \frac{1}{12}\, m \left( b^2 + c^2 \right), \qquad (23.123)$$

where $m = \varrho\, a\, b\, c$ denotes the mass of the cuboid. Similarly, we have $J_2 = m\, (a^2 + c^2)/12$, and $J_3 = m\, (a^2 + b^2)/12$. Hence, again if the density is constant, we have

$$\mathsf{J}_G = \frac{m}{12} \begin{bmatrix} b^2 + c^2 & 0 & 0 \\ 0 & a^2 + c^2 & 0 \\ 0 & 0 & a^2 + b^2 \end{bmatrix}. \qquad (23.124)$$
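The closed form in Eq. 23.123 can be compared against a direct numerical evaluation of Eq. 23.122. This is a minimal sketch using a midpoint rule on a uniform grid; the function name, the dimensions and the density are hypothetical choices made only for the example.

```python
def cuboid_J1_numeric(a, b, c, rho, n=40):
    """Midpoint-rule approximation of Eq. 23.122 for constant density rho."""
    dy, dz = b / n, c / n
    total = 0.0
    for iy in range(n):
        y = -b / 2.0 + (iy + 0.5) * dy
        for iz in range(n):
            z = -c / 2.0 + (iz + 0.5) * dz
            total += (y * y + z * z) * dy * dz
    # The integrand does not depend on x, so the x-integration gives a factor a
    return rho * a * total

# Hypothetical cuboid (illustrative values)
a, b, c = 0.2, 0.3, 0.5     # m
rho = 7800.0                # kg/m^3

m = rho * a * b * c
J1_exact = m * (b * b + c * c) / 12.0     # Eq. 23.123
J1_num = cuboid_J1_numeric(a, b, c, rho)

print("J1 exact:", J1_exact, " J1 numeric:", J1_num)
```

The two values differ only by the discretization error of the midpoint rule, which shrinks as `n` is increased.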
• Truncated circular cylinder

Consider a truncated circular cylinder, with radius $R$ and height $h$. Assume the density to be only a function of $r$, namely

$$\varrho = \varrho(r). \qquad (23.125)$$

Let us place the origin at the center of mass $G$. Consider first the moment of inertia around the axis of the cylinder, which we choose to coincide with the $z$-axis. Using cylindrical coordinates (Eq. 19.170), for which the Jacobian is given by $J = r$ (Eq. 19.171), we obtain

$$J_3 = \iiint_V \varrho(r)\, r^2\, dV = 2 \pi\, h \int_0^R \varrho(r)\, r^3\, dr. \qquad (23.126)$$
If $\varrho$ is constant, we have

$$J_3 = \frac{1}{2}\, m\, R^2, \qquad (23.127)$$

where $m = \varrho\, \pi R^2 h$ denotes the mass of the cylinder, in agreement with Eq. 23.23. On the other hand, if the mass is evenly distributed exclusively over the cylindrical surface of radius $R$ (namely at $r = R$), we have

$$J_3 = m\, R^2, \qquad (23.128)$$

whereas, if the mass is all concentrated along the cylinder’s axis, we have

$$J_3 = 0. \qquad (23.129)$$

◦ Comment. If we introduce the gyradius $\rho_3 := \sqrt{J_3/m}$ (Eq. 23.71), even if the cylinder is not homogeneous, we have

$$\frac{J_3}{m\, R^2} = \frac{\rho_3^2}{R^2} = \iiint_V \varrho(x, y, z)\, (r^2/R^2)\, dV \bigg/ \iiint_V \varrho(x, y, z)\, dV \le 1, \qquad (23.130)$$

with $\rho_3^2/R^2 = 1$ only if all the mass is evenly distributed at $r = R$.

Remark 170. If $\varrho = \varrho(r/R)$, the ratio $\rho_3/R$ is independent of $R$ and $m$, as you may verify. [Hint: Set $r/R = u$ in Eq. 23.130. See also Eqs. 23.127 and 23.128.]

Next, consider the moment of inertia around the $y$-axis. For simplicity, consider only the case of constant density. Using again $m = \varrho\, \pi R^2 h$, as well as $x = r \cos\theta$ (Eq. 19.170) and $\int_{-\pi}^{\pi} \cos^2\theta\, d\theta = \pi$ (Eq. 10.47), we have

$$\begin{aligned}
J_2 &= \iiint_V \varrho\, (x^2 + z^2)\, dV \\
&= \varrho \int_{-\pi}^{\pi} \cos^2\theta\, d\theta \int_0^R r^3\, dr \int_{-h/2}^{h/2} dz + \varrho \int_{-\pi}^{\pi} d\theta \int_0^R r\, dr \int_{-h/2}^{h/2} z^2\, dz \\
&= \varrho \left[ \pi\, \frac{R^4}{4}\, h + 2 \pi\, \frac{R^2}{2}\, \frac{2\, (h/2)^3}{3} \right] = \frac{1}{12}\, m \left( 3 R^2 + h^2 \right). \qquad (23.131)
\end{aligned}$$
The same expression holds for $J_1$. Therefore, the moment of inertia tensor for a uniform truncated cylinder is given by

$$\mathsf{J}_G = \frac{m}{12} \begin{bmatrix} 3 R^2 + h^2 & 0 & 0 \\ 0 & 3 R^2 + h^2 & 0 \\ 0 & 0 & 6 R^2 \end{bmatrix}. \qquad (23.132)$$
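For a density that depends on $r$, Eq. 23.126 is easily evaluated numerically. The following is a minimal sketch (the function name and the two density laws are assumptions made for illustration) that computes the ratio $\rho_3^2/R^2 = J_3/(m R^2)$ of Eq. 23.130 for a uniform cylinder and for a hypothetical density growing linearly with $r$.

```python
import math

def cylinder_J3_and_mass(rho_of_r, R, h, n=2000):
    """Midpoint-rule evaluation of J3 = 2 pi h int_0^R rho r^3 dr (Eq. 23.126)
    and of the mass m = 2 pi h int_0^R rho r dr."""
    dr = R / n
    J3 = 0.0
    m = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        J3 += rho_of_r(r) * r ** 3 * dr
        m += rho_of_r(r) * r * dr
    return 2.0 * math.pi * h * J3, 2.0 * math.pi * h * m

R, h = 0.5, 2.0   # m (illustrative values)

# Uniform density: the ratio should approach 1/2 (Eq. 23.127)
J3_u, m_u = cylinder_J3_and_mass(lambda r: 1000.0, R, h)
ratio_uniform = J3_u / (m_u * R ** 2)

# Hypothetical density proportional to r: more mass near the lateral surface
J3_l, m_l = cylinder_J3_and_mass(lambda r: 1000.0 * r / R, R, h)
ratio_linear = J3_l / (m_l * R ** 2)

print("rho_3^2 / R^2, uniform density:", ratio_uniform)
print("rho_3^2 / R^2, density ~ r:    ", ratio_linear)
```

Both ratios stay below the bound of Eq. 23.130, and the second is larger than the first because the mass is shifted towards $r = R$.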
• Ball

Consider a homogeneous ball, with radius $R$. For simplicity, assume again the density to be constant. To evaluate the moment of inertia around the $z$-axis, it is convenient to place the origin at the center of mass $G$ and to use spherical coordinates (Eq. 19.140), for which the Jacobian is $J = r^2 \cos\psi$ (Eq. 19.178). Then, using $x^2 + y^2 = r^2 \cos^2\psi$ (use Eq. 19.140), as well as $\int_{-\pi/2}^{\pi/2} \cos^3\psi\, d\psi = 4/3$ (Eq. 10.49) and $m = 4 \pi \varrho R^3/3$ (Eq. 19.180), we have

$$J_3 = \iiint_V \varrho \left( x^2 + y^2 \right) dV = \varrho \int_{-\pi}^{\pi} d\varphi \int_{-\pi/2}^{\pi/2} \cos^3\psi\, d\psi \int_0^R r^4\, dr = \varrho\, 2 \pi\, \frac{4}{3}\, \frac{R^5}{5} = \frac{2}{5}\, m\, R^2. \qquad (23.133)$$

[Alternatively, if you do not like to use spherical coordinates (Eq. 19.140), you may use the approach used for the integrations performed in Eq. 19.180 (namely using cylindrical coordinates). Accordingly, setting $r^2 = x^2 + y^2$, we have

$$\begin{aligned}
J_3 &= \iiint_V \varrho\, r^2\, dV = 2 \pi\, \varrho \int_{-R}^{R} dz \int_0^{\sqrt{R^2 - z^2}} r^3\, dr = \pi\, \varrho \int_0^R \left( R^2 - z^2 \right)^2 dz \\
&= \pi\, \varrho \left[ R^4 z - 2 R^2\, \frac{z^3}{3} + \frac{z^5}{5} \right]_0^R = \pi\, \varrho\, R^5 \left( 1 - \frac{2}{3} + \frac{1}{5} \right) = \frac{8}{15}\, \pi\, \varrho\, R^5 = \frac{2}{5}\, m\, R^2, \qquad (23.134)
\end{aligned}$$

in agreement with Eq. 23.133.] Of course, the same result holds for $J_1$ and $J_2$. Therefore, the moment of inertia tensor for a uniform ball is given by

$$\mathsf{J}_G = \frac{2}{5}\, m\, R^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (23.135)$$
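The one–dimensional integral appearing in Eq. 23.134 lends itself to a quick numerical cross–check of the result $J_3 = 2 m R^2/5$. This is a minimal sketch, assuming illustrative values for the radius and the density and a midpoint rule for the quadrature.

```python
import math

def ball_J3_numeric(R, rho, n=2000):
    """Midpoint rule for J3 = pi rho int_0^R (R^2 - z^2)^2 dz (see Eq. 23.134)."""
    dz = R / n
    s = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        s += (R * R - z * z) ** 2 * dz
    return math.pi * rho * s

# Hypothetical homogeneous ball (illustrative values)
R = 0.1          # m
rho = 2700.0     # kg/m^3

m = rho * 4.0 * math.pi * R ** 3 / 3.0
J3_exact = 0.4 * m * R ** 2           # Eq. 23.133
J3_num = ball_J3_numeric(R, rho)

print("J3 exact:", J3_exact, " J3 numeric:", J3_num)
```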
23.9 Appendix B. Tensors

A matrix may be conceived as a linear mathematical operator that, applied to a mathematicians’ vector (column matrix), produces another mathematicians’ vector:

$$\mathsf{c} = \mathsf{A}\, \mathsf{b}. \qquad (23.136)$$

[See for instance the relationship between the angular momentum and the angular velocity, $\mathsf{h}_G = \mathsf{J}_G\, \omega$ (Eq. 23.12).] Next, assume that $b_k$ and $c_k$ are the components of the physicists’ vectors $\mathbf{b}$ and $\mathbf{c}$. In this case, it is convenient to introduce a new mathematical entity to perform the equivalent operation between the two physicists’ vectors.

◦ Comment. Of course, what we are aiming at, here, is the mathematical basis for rewriting $\check{\mathsf{h}}_G = \mathsf{J}_G\, \omega$ (Eq. 23.12), in terms of tensor notation (an extension of vector notation), as $\check{\mathbf{h}}_G = \mathbf{J}_G\, \boldsymbol{\omega}$ (Eq. 23.14). [Recall that the mathematicians’ vectors are dependent upon the frame of reference used, whereas the physicists’ vectors are frame–independent. Thus, the new entity would also be independent of the frame of reference.]
23.9.1 Matrices vs tensors

In this section, we introduce such a new entity, which is called a (second–order) tensor. To be specific, in Section 3.1 on matrices, we have introduced the notation $\mathsf{b} = \{b_h\}$ to denote a mathematicians’ vector, namely an ordered $n$-tuple of numbers (elements) $b_h$, as well as the notation $\mathsf{A} = [a_{hk}]$, to denote a matrix, namely an array of numbers (elements) $a_{hk}$. Finally, we have introduced the convention that the notation $\mathsf{c} = \mathsf{A}\, \mathsf{b}$ (Eq. 23.136) denotes the operation of matrix–vector multiplication, namely $c_h = \sum_{k=1}^{n} a_{hk}\, b_k$ (Eq. 3.56). In analogy with this equation, we could define a tensor as a mathematical entity having components $a_{hk}$, which is such that, given a vector $\mathbf{b}$, the product $\mathbf{A}\, \mathbf{b}$ yields a vector

$$\mathbf{c} = \mathbf{A}\, \mathbf{b}. \qquad (23.137)$$

This operation is understood to indicate that, in the frame $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, the components $c_h$ of $\mathbf{c}$ are related to the components $b_h$ of $\mathbf{b}$ through the operation $c_h = \sum_{k=1}^{3} a_{hk}\, b_k$. [As stated in Remark 135, p. 666, I will always use boldface capital letters only to denote tensors.]
◦ A very important consideration. The above definition would not be fully satisfactory, because it does not allow us to express $\mathbf{A}$ in terms of components and base vectors, as we do for vectors, in the sense provided by $\mathbf{b} = \sum_{h=1}^{3} b_h\, \mathbf{i}_h$ (Eq. 15.12). Moreover, such an approach does not explain the symbol $\otimes$ used in Eq. 23.15. Thus, instead of using Eq. 23.137 as a definition, in the following, I will introduce a different one (to me, more gut–level), and then show that this definition allows us, as a consequence, to arrive at Eq. 23.137.
23.9.2 Dyadic product. Dyadic tensor

In this subsection, we introduce the necessary notions that allow us to obtain the desired result stated above. Consider the matrix operation $\mathsf{A} = \mathsf{b}\, \mathsf{c}^T$, namely $\mathsf{A} = [a_{hk}]$, with $a_{hk} = b_h\, c_k$. This provides us with a procedure for introducing tensors starting from (physicists’) vectors. Specifically, to extend the operation $\mathsf{A} = \mathsf{b}\, \mathsf{c}^T$ from mathematicians’ vectors to physicists’ vectors, we use the following

Definition 358 (Dyadic product. Dyadic tensor). Given two vectors $\mathbf{b}$ and $\mathbf{c}$, the dyadic product, denoted by $\mathbf{b} \otimes \mathbf{c}$, is an operation that produces a mathematical entity, called the dyadic tensor, given by

$$\mathbf{A} := \mathbf{b} \otimes \mathbf{c}. \qquad (23.138)$$

Such an operation is understood in the following way: if $b_h$ and $c_k$ denote the components of $\mathbf{b}$ and $\mathbf{c}$ in the frame $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, then the components of $\mathbf{A}$ are $a_{hk} = b_h\, c_k$.

A few comments are in order. First of all, note that the above definition implies the full equivalence of the following three expressions:

$$\mathbf{A} = \mathbf{b} \otimes \mathbf{c}, \qquad \mathsf{A} = \mathsf{b}\, \mathsf{c}^T, \qquad a_{hk} = b_h\, c_k. \qquad (23.139)$$

They represent the same operation in different notation, namely tensor (i.e., extended vector), matrix, and indicial, respectively (Definition 237, p. 596). [Again, note that $a_{hk} = b_h\, c_k$ and $\mathsf{A} = \mathsf{b}\, \mathsf{c}^T$ depend upon the frame of reference, whereas $\mathbf{A} = \mathbf{b} \otimes \mathbf{c}$ does not. This issue is addressed in greater depth in Subsection 23.9.5.] In addition, note that, according to the above definition, the dyadic product has the following property

$$(\beta_1\, \mathbf{b}_1 + \beta_2\, \mathbf{b}_2) \otimes \mathbf{c} = \beta_1\, \mathbf{b}_1 \otimes \mathbf{c} + \beta_2\, \mathbf{b}_2 \otimes \mathbf{c} \qquad (23.140)$$
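and a similar one for $\mathbf{c} = \gamma_1\, \mathbf{c}_1 + \gamma_2\, \mathbf{c}_2$. In other words, the operation $\otimes$ is bilinear (that is, linear with respect to both arguments, namely $\mathbf{b}$ and $\mathbf{c}$). Moreover, note that

$$(\mathsf{b}\, \mathsf{c}^T)\, \mathsf{d} = \mathsf{b}\, \mathsf{c}^T \mathsf{d} = \mathsf{b}\, (\mathsf{c}^T \mathsf{d}). \qquad (23.141)$$

Correspondingly, we have the important rule

$$(\mathbf{b} \otimes \mathbf{c})\, \mathbf{d} = \mathbf{b}\, (\mathbf{c} \cdot \mathbf{d}). \qquad (23.142)$$

Finally, recalling that $\mathbf{b} = \sum_{h=1}^{3} b_h\, \mathbf{i}_h$ and $\mathbf{c} = \sum_{k=1}^{3} c_k\, \mathbf{i}_k$, we have (use Eq. 23.140)

$$\mathbf{A} = \mathbf{b} \otimes \mathbf{c} = \left( \sum_{h=1}^{3} b_h\, \mathbf{i}_h \right) \otimes \left( \sum_{k=1}^{3} c_k\, \mathbf{i}_k \right) = \sum_{h,k=1}^{3} b_h\, c_k\, \mathbf{i}_h \otimes \mathbf{i}_k. \qquad (23.143)$$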
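In components, the dyadic product is just the outer product of the two component arrays, so the rules above can be tried out directly. The sketch below uses plain Python lists in a single fixed frame; the helper names (`dyad`, `apply`, `dot`) and the numerical values are hypothetical, introduced only for this example. It checks the rule of Eq. 23.142 and the bilinearity property of Eq. 23.140.

```python
def dyad(b, c):
    """Components a_hk = b_h c_k of the dyadic tensor b (x) c (Eq. 23.139)."""
    return [[bh * ck for ck in c] for bh in b]

def apply(A, d):
    """Tensor-by-vector product in components: (A d)_h = sum_k a_hk d_k."""
    return [sum(A[h][k] * d[k] for k in range(len(d))) for h in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

b = [1.0, -2.0, 3.0]
c = [0.5, 4.0, -1.0]
d = [2.0, 1.0, -5.0]

# Rule (b (x) c) d = b (c . d), Eq. 23.142
lhs = apply(dyad(b, c), d)
rhs = [bh * dot(c, d) for bh in b]

# Bilinearity in the first argument, Eq. 23.140
b1, b2, beta1, beta2 = [1.0, 0.0, 2.0], [0.0, -1.0, 1.0], 2.0, -3.0
combo = [beta1 * x + beta2 * y for x, y in zip(b1, b2)]
left = dyad(combo, c)
right = [[beta1 * p + beta2 * q for p, q in zip(row1, row2)]
         for row1, row2 in zip(dyad(b1, c), dyad(b2, c))]

print("(b (x) c) d =", lhs)
print("b (c . d)   =", rhs)
```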
23.9.3 Second–order tensors

Equation 23.143 allows us to introduce the definition of a tensor as follows

Definition 359 (Second–order tensor). Given an array of numbers $a_{hk}$ ($h, k = 1, 2, 3$), and a frame of reference $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, the quantity

$$\mathbf{A} = \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k \qquad (23.144)$$

is called a second–order tensor; the quantities $a_{hk}$ are called the components of the tensor $\mathbf{A}$. The quantities $\mathbf{i}_h \otimes \mathbf{i}_k$ will be referred to as the base tensors.

In particular, we have the following

Definition 360 (Identity tensor. Zero tensor). The identity tensor $\mathbf{I}$ is the second–order tensor that corresponds to the unit matrix $\mathsf{I}$, namely the tensor such that $\mathbf{I}\, \mathbf{b} = \mathbf{b}$ for any $\mathbf{b}$. The zero tensor $\mathbf{O}$ is the second–order tensor that corresponds to the zero matrix $\mathsf{O}$, namely the tensor such that $\mathbf{O}\, \mathbf{b} = \mathbf{0}$ for any $\mathbf{b}$.

For future reference, recall the last statement of Subsection 15.3.1, namely that $\mathbf{a} \times \mathbf{b}$, in terms of components, may be written as $\mathsf{A}\, \mathsf{b}$, with $\mathsf{A}$ given by Eq. 15.84. We may now restate that statement as follows

$$\mathbf{a} \times \mathbf{b} = \mathbf{A}\, \mathbf{b}, \qquad (23.145)$$

where $\mathbf{A} := \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k$, with $a_{hk} = \sum_{j=1}^{3} \epsilon_{hjk}\, a_j$ (Eq. 15.81).

Warning: In the rest of this book, unless otherwise specified, the term “tensor” will be used to mean “second–order tensor,” namely a tensor whose components have two indices. If its components have $n$ indices, it will be called an $n$-th order tensor.
• On notation, one more time

Finally, we can broaden Definition 237, p. 596, to include tensor notation:

Definition 361 (Tensor, matrix, indicial and expanded notation). The term tensor notation (namely extended vector notation) denotes the use of boldface letters to represent physicists’ vectors (lower–case letters) and tensors (upper–case letters; Definition 359, p. 993). The term matrix notation denotes the use of sans serif letters, to represent matrices (upper–case letters) and mathematicians’ vectors (lower–case letters). The term indicial notation denotes the use of subscripted notation to represent the elements of matrices (two indices) and mathematicians’ vectors (one index). [Indicial notation is also used for the components of tensors and physicists’ vectors.] The term expanded notation, also used for the components of physicists’ vectors, refers to the use of the symbols $\mathbf{i}, \mathbf{j}, \mathbf{k}$ for the base vectors, the letters $x, y, z$ for the components of the location vector $\mathbf{x}$, the letters $u, v, w$ for the components of the velocity vector $\mathbf{v}$ (sometimes for the displacement vector $\mathbf{u}$), as well as $b_x, b_y, b_z$ for the components of a generic vector $\mathbf{b}$.
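Returning for a moment to Eq. 23.145, the components $a_{hk} = \sum_j \epsilon_{hjk}\, a_j$ can be assembled mechanically from the permutation symbol. Here is a minimal sketch in indicial style (indices run over 0, 1, 2 rather than 1, 2, 3; the closed–form expression used for the permutation symbol and the helper names are choices made for this example, not taken from the text).

```python
def levi_civita(h, j, k):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1 or 0."""
    return (h - j) * (j - k) * (k - h) // 2

def cross_matrix(a):
    """Components a_hk = sum_j eps_hjk a_j of the tensor A with A b = a x b."""
    return [[sum(levi_civita(h, j, k) * a[j] for j in range(3))
             for k in range(3)] for h in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

a = [1.0, 2.0, 3.0]
b = [-4.0, 0.5, 2.0]

A = cross_matrix(a)
A_b = [sum(A[h][k] * b[k] for k in range(3)) for h in range(3)]

print("A     =", A)
print("A b   =", A_b)
print("a x b =", cross(a, b))
```

The resulting array is antisymmetric, a property discussed in Subsection 23.9.7.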
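23.9.4 Tensor–by–vector product

Next, let us go back to the original objective, namely Eq. 23.137. Setting

$$c_h := \sum_{k=1}^{3} a_{hk}\, b_k, \qquad (23.146)$$

we have

$$\begin{aligned}
\mathbf{A}\, \mathbf{b} &= \left( \sum_{h,j=1}^{3} a_{hj}\, \mathbf{i}_h \otimes \mathbf{i}_j \right) \left( \sum_{k=1}^{3} b_k\, \mathbf{i}_k \right) = \sum_{h,j,k=1}^{3} a_{hj}\, b_k\, (\mathbf{i}_h \otimes \mathbf{i}_j)\, \mathbf{i}_k = \sum_{h,j,k=1}^{3} a_{hj}\, b_k\, \mathbf{i}_h\, (\mathbf{i}_j \cdot \mathbf{i}_k) \\
&= \sum_{h,j,k=1}^{3} a_{hj}\, b_k\, \mathbf{i}_h\, \delta_{jk} = \sum_{h,k=1}^{3} a_{hk}\, b_k\, \mathbf{i}_h = \sum_{h=1}^{3} c_h\, \mathbf{i}_h = \mathbf{c}, \qquad (23.147)
\end{aligned}$$

in full agreement with Eq. 23.136. [Hint: For the first equality, use Eq. 23.144 and $\mathbf{b} = \sum_{k=1}^{3} b_k\, \mathbf{i}_k$ (Eq. 15.12). For the third, use $(\mathbf{i}_h \otimes \mathbf{i}_j)\, \mathbf{i}_k = \mathbf{i}_h\, (\mathbf{i}_j \cdot \mathbf{i}_k)$ (Eq. 23.142). For the fourth, use $\mathbf{i}_j \cdot \mathbf{i}_k = \delta_{jk}$ (Eq. 15.23). For the fifth, use the rule that governs the use of the Kronecker delta, namely $\sum_{j=1}^{n} a_{hj}\, \delta_{jk} = a_{hk}$ (Eq. 3.30). For the sixth, use $c_h := \sum_{k=1}^{3} a_{hk}\, b_k$ (Eq. 23.146). For the last, use $\mathbf{c} = \sum_{h=1}^{3} c_h\, \mathbf{i}_h$ (Eq. 15.12).] Thus, the definition of tensor given in Eq. 23.144 yields, as a consequence, the desired objective stated in Eq. 23.137.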
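• Projections and components of tensors

In analogy with $b_k = \mathbf{b} \cdot \mathbf{i}_k$ (Eq. 15.29), which provides the relationship between components and projections for orthonormal base vectors, we have the following

Theorem 237. In orthonormal bases, the components $a_{hk}$ of the tensor $\mathbf{A}$ are given by

$$a_{hk} = \mathbf{i}_h \cdot \mathbf{A}\, \mathbf{i}_k. \qquad (23.148)$$

◦ Proof: Using $(\mathbf{b} \otimes \mathbf{c})\, \mathbf{d} = \mathbf{b}\, (\mathbf{c} \cdot \mathbf{d})$ (Eq. 23.142), as well as $\mathbf{i}_h \cdot \mathbf{i}_k = \delta_{hk}$ (Eq. 15.23), we have

$$\begin{aligned}
\mathbf{i}_h \cdot \mathbf{A}\, \mathbf{i}_k &= \mathbf{i}_h \cdot \left( \sum_{p,q=1}^{3} a_{pq}\, \mathbf{i}_p \otimes \mathbf{i}_q \right) \mathbf{i}_k = \mathbf{i}_h \cdot \sum_{p,q=1}^{3} a_{pq}\, \mathbf{i}_p\, (\mathbf{i}_q \cdot \mathbf{i}_k) \\
&= \mathbf{i}_h \cdot \sum_{p,q=1}^{3} a_{pq}\, \mathbf{i}_p\, \delta_{qk} = \mathbf{i}_h \cdot \sum_{p=1}^{3} a_{pk}\, \mathbf{i}_p = \sum_{p=1}^{3} \delta_{hp}\, a_{pk} = a_{hk}, \qquad (23.149)
\end{aligned}$$

in agreement with Eq. 23.148.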
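Equations 23.147 and 23.148 can be exercised together on a concrete case. In the sketch below (a hypothetical example: the base vectors are obtained by rotating the Cartesian axes by 0.3 rad about the third axis, and all names and values are illustrative), the tensor is assembled from its components and base tensors as in Eq. 23.144, stored through its Cartesian components, and then the components $c_h$ and $a_{hk}$ are recovered by projection.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dyad(b, c):
    return [[bh * ck for ck in c] for bh in b]

def apply(A, d):
    return [sum(A[p][q] * d[q] for q in range(3)) for p in range(3)]

# Orthonormal base vectors i_1, i_2, i_3 (Cartesian axes rotated about the third axis)
cs, sn = math.cos(0.3), math.sin(0.3)
basis = [[cs, sn, 0.0], [-sn, cs, 0.0], [0.0, 0.0, 1.0]]

a = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0], [4.0, 0.0, -2.0]]   # components a_hk
b_comp = [1.0, 2.0, -1.0]                                   # components b_k

# A = sum_hk a_hk i_h (x) i_k  (Eq. 23.144)
A = [[0.0] * 3 for _ in range(3)]
for h in range(3):
    for k in range(3):
        D = dyad(basis[h], basis[k])
        for p in range(3):
            for q in range(3):
                A[p][q] += a[h][k] * D[p][q]

# b = sum_k b_k i_k
b = [sum(b_comp[k] * basis[k][p] for k in range(3)) for p in range(3)]

c_vec = apply(A, b)
c_comp = [dot(basis[h], c_vec) for h in range(3)]                 # c_h = c . i_h
c_expected = [sum(a[h][k] * b_comp[k] for k in range(3)) for h in range(3)]  # Eq. 23.146

a_recovered = [[dot(basis[h], apply(A, basis[k])) for k in range(3)]
               for h in range(3)]                                 # Eq. 23.148

print("c_h from projection:", c_comp)
print("c_h from Eq. 23.146:", c_expected)
```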
23.9.5 Tensor components and similar matrices

As stated above, akin to the relationship between mathematicians’ and physicists’ vectors, the components of a second–order tensor in a given frame of reference form a matrix. We express this fact with the notation $\mathbf{A} = \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k$ (Eq. 23.144). One more time, the components of a second–order tensor obviously depend upon the frame of reference used, whereas the second–order tensor itself is independent of the frame of reference. Indeed, it is interesting to show that the components of the same second–order tensor in different frames of reference correspond to similar matrices (Subsection 16.3.2). For the sake of simplicity, here we limit ourselves to orthogonal bases. Let us introduce two orthogonal bases, namely $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$, and $\mathbf{i}'_1, \mathbf{i}'_2, \mathbf{i}'_3$, that are related by Eqs. 15.104 and 15.105, namely $\mathbf{i}_h = \sum_{p=1}^{3} r_{hp}\, \mathbf{i}'_p$, with $r_{hp} = \mathbf{i}_h \cdot \mathbf{i}'_p$. Then, we have

$$\mathbf{A} = \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k = \sum_{h,k,p,q=1}^{3} a_{hk}\, r_{hp}\, r_{kq}\, \mathbf{i}'_p \otimes \mathbf{i}'_q = \sum_{p,q=1}^{3} a'_{pq}\, \mathbf{i}'_p \otimes \mathbf{i}'_q, \qquad (23.150)$$

where

$$a'_{pq} = \sum_{h,k=1}^{3} a_{hk}\, r_{hp}\, r_{kq}, \qquad (23.151)$$

namely (use $\mathsf{R}^T = \mathsf{R}^{-1}$, Eq. 16.64)

$$\mathsf{A}' = \mathsf{R}^T \mathsf{A}\, \mathsf{R} = \mathsf{R}^{-1} \mathsf{A}\, \mathsf{R}, \qquad (23.152)$$

as anticipated.
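The index form of Eq. 23.151 and the matrix form of Eq. 23.152 can be compared on a small example. This is a minimal sketch, assuming that the second basis is obtained from the first by a rotation of 0.7 rad about the third axis (a hypothetical choice), with $r_{hp}$ taken as the $h$-th component of the $p$-th rotated base vector. The trace is included as an example of a quantity shared by similar matrices.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    return [[sum(P[i][j] * Q[j][k] for j in range(3)) for k in range(3)]
            for i in range(3)]

# Components a_hk in the first frame (illustrative values)
a = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0], [4.0, 0.0, -2.0]]

# Base vectors of the second frame, expressed in the first frame
cs, sn = math.cos(0.7), math.sin(0.7)
second = [[cs, sn, 0.0], [-sn, cs, 0.0], [0.0, 0.0, 1.0]]

# r_hp = (h-th base vector of the first frame) . (p-th base vector of the second frame)
R = [[second[p][h] for p in range(3)] for h in range(3)]

# Index form, Eq. 23.151
a_idx = [[sum(a[h][k] * R[h][p] * R[k][q] for h in range(3) for k in range(3))
          for q in range(3)] for p in range(3)]

# Matrix form, Eq. 23.152
a_mat = matmul(matmul(transpose(R), a), R)

# R is orthogonal, so R^T R should be the unit matrix
RtR = matmul(transpose(R), R)

trace_first = sum(a[h][h] for h in range(3))
trace_second = sum(a_mat[p][p] for p in range(3))

print("components in the second frame:", a_mat)
print("traces:", trace_first, trace_second)
```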
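23.9.6 Transpose of a tensor

In analogy with the transpose of a matrix, we have the following

Definition 362 (Transpose tensor). Given two second–order tensors, $\mathbf{A}$ and $\mathbf{B}$, iff, for any $\mathbf{c}$ and $\mathbf{d}$, we have $\mathbf{c} \cdot \mathbf{A}\, \mathbf{d} = \mathbf{d} \cdot \mathbf{B}\, \mathbf{c}$, then $\mathbf{B}$ is called the transpose of $\mathbf{A}$, and is denoted by $\mathbf{A}^T$. Thus, by definition, we have

$$\mathbf{c} \cdot \mathbf{A}\, \mathbf{d} = \mathbf{d} \cdot \mathbf{A}^T \mathbf{c}, \qquad \text{for any } \mathbf{c} \text{ and } \mathbf{d}. \qquad (23.153)$$

◦ Comment. Let us try to make the definition of the transpose of a tensor more palatable. Let us set

$$\mathbf{A} = \sum_{h,k=1}^{3} a_{hk}\, \mathbf{i}_h \otimes \mathbf{i}_k \qquad \text{and} \qquad \mathbf{B} = \sum_{h,k=1}^{3} b_{kh}\, \mathbf{i}_k \otimes \mathbf{i}_h, \qquad (23.154)$$

as well as $\mathbf{c} = \sum_{p=1}^{3} c_p\, \mathbf{i}_p$ and $\mathbf{d} = \sum_{q=1}^{3} d_q\, \mathbf{i}_q$. Thus, we have

$$\mathbf{c} \cdot \mathbf{A}\, \mathbf{d} = \sum_{h,k=1}^{3} c_h\, a_{hk}\, d_k \qquad \text{and} \qquad \mathbf{d} \cdot \mathbf{B}\, \mathbf{c} = \sum_{h,k=1}^{3} d_k\, b_{kh}\, c_h. \qquad (23.155)$$

Therefore, if $\mathbf{B} = \mathbf{A}^T$, Eq. 23.153 implies

$$a_{hk} = b_{kh}, \qquad (23.156)$$

as you may verify. [Hint: We have $\sum_{h,k=1}^{3} c_h\, (a_{hk} - b_{kh})\, d_k = 0$. Then, if we choose $\mathsf{c} = \mathsf{d} = [1, 0, 0]^T$, we obtain $a_{11} = b_{11}$, and so on.] In other words, let $\mathsf{A}$ and $\mathsf{B}$ denote respectively the matrices of the components of the second–order tensors $\mathbf{A}$ and $\mathbf{B}$, in any given basis. Iff $\mathsf{B} = \mathsf{A}^T$, then $\mathbf{B}$ is called the transpose of $\mathbf{A}$ and is denoted by $\mathbf{B} = \mathbf{A}^T$.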
23.9.7 Symmetric and antisymmetric tensors

As we did for matrices in Subsection 3.4.3 (Eqs. 3.90 and 3.91), any second–order tensor $\mathbf{A}$ can be uniquely written as the sum of a symmetric tensor and an antisymmetric one. Specifically, we have the following

Definition 363 (Symmetric and antisymmetric tensors). A second–order tensor, $\mathbf{A}$, is called symmetric iff it is equal to its transpose, namely iff (use Eq. 23.156)

$$\mathbf{A} = \mathbf{A}^T. \qquad (23.157)$$

A second–order tensor, $\mathbf{A}$, is called antisymmetric (or skew–symmetric) iff it is equal to the opposite of its transpose, namely iff

$$\mathbf{A} = -\mathbf{A}^T. \qquad (23.158)$$

Moreover, in analogy with the corresponding definition for matrices (Eqs. 3.90 and 3.91), we have

Definition 364 (Symmetric and antisymmetric parts of a tensor). For any second–order tensor $\mathbf{A}$, its symmetric part is defined by

$$\mathbf{A}_S := \frac{1}{2} \left( \mathbf{A} + \mathbf{A}^T \right) = \mathbf{A}_S^T, \qquad (23.159)$$

whereas its antisymmetric part is defined by

$$\mathbf{A}_A := \frac{1}{2} \left( \mathbf{A} - \mathbf{A}^T \right) = -\mathbf{A}_A^T. \qquad (23.160)$$

The above definitions imply that any tensor may be decomposed into its symmetric and antisymmetric parts, as

$$\mathbf{A} = \mathbf{A}_S + \mathbf{A}_A, \qquad (23.161)$$

as you may verify. The decomposition is unique. [The proof is identical to that for matrices, Theorem 21, p. 113.]
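The definitions of this subsection and of the preceding one translate directly into operations on the component matrix. The following is a minimal sketch (the component values and the two test vectors are arbitrary choices made for the example) that checks the defining property of the transpose, Eq. 23.153, and the decomposition of Eqs. 23.159 to 23.161.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def apply(M, v):
    return [sum(M[h][k] * v[k] for k in range(3)) for h in range(3)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Components of a generic tensor in a given frame (illustrative values)
A = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0], [4.0, 0.0, -2.0]]
AT = transpose(A)

# Defining property of the transpose: c . A d = d . A^T c  (Eq. 23.153)
c = [1.0, 2.0, 3.0]
d = [-1.0, 0.5, 2.0]
left = dot(c, apply(A, d))
right = dot(d, apply(AT, c))

# Symmetric and antisymmetric parts (Eqs. 23.159 and 23.160)
A_S = [[0.5 * (A[h][k] + AT[h][k]) for k in range(3)] for h in range(3)]
A_A = [[0.5 * (A[h][k] - AT[h][k]) for k in range(3)] for h in range(3)]

# Recomposition (Eq. 23.161)
A_sum = [[A_S[h][k] + A_A[h][k] for k in range(3)] for h in range(3)]

print("c . A d =", left, "  d . A^T c =", right)
print("A_S =", A_S)
print("A_A =", A_A)
```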
23.9.8 An interesting formula ♠
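Just as an application of our newly acquired skills, we obtain an interesting formula (which complements Eq. 18.77), namely

$$\mathbf{a} \times \mathrm{curl}\, \mathbf{b} = \left( \mathbf{Q}^T - \mathbf{Q} \right) \mathbf{a} = \mathbf{Q}^T \mathbf{a} - \mathbf{a} \cdot \mathrm{grad}\, \mathbf{b}, \qquad (23.162)$$

where $\mathbf{Q} = \mathrm{grad}\, \mathbf{b}$ is the tensor with components $q_{hk} = \partial b_h/\partial x_k$. [Indeed, for the first equality, we have, in terms of components,

$$\sum_{j,k=1}^{3} \epsilon_{hjk}\, a_j \sum_{l,m=1}^{3} \epsilon_{klm}\, \frac{\partial b_m}{\partial x_l} = \sum_{j,l,m=1}^{3} a_j \left( \delta_{hl}\, \delta_{jm} - \delta_{hm}\, \delta_{jl} \right) q_{ml} = \sum_{j=1}^{3} \left( q_{jh} - q_{hj} \right) a_j. \qquad (23.163)$$

(Hint: Use $\mathrm{curl}\, \mathbf{b} = \mathbf{c}$, with $c_k = \sum_{l,m=1}^{3} \epsilon_{klm}\, \partial b_m/\partial x_l$ (Eq. 18.70), along with $\sum_{k=1}^{3} \epsilon_{hjk}\, \epsilon_{klm} = \delta_{hl}\, \delta_{jm} - \delta_{hm}\, \delta_{jl}$ (Eq. 15.99).) For the second equality in Eq. 23.162, use $\sum_{j=1}^{3} q_{hj}\, a_j = \sum_{j=1}^{3} (\partial b_h/\partial x_j)\, a_j = (\mathbf{a} \cdot \mathrm{grad})\, b_h$.] Note the analogy with the proof of the expression for $\mathbf{a} \times \mathrm{curl}\, \mathbf{b} + \mathbf{b} \times \mathrm{curl}\, \mathbf{a}$ (Eq. 18.77). Indeed, you may obtain Eq. 18.77 from Eq. 23.162.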
References
1. Abbott, E. A., Flatland: A Romance of Many Dimensions, Seely & Co., London, UK, 1884. [Also available as Dover, New York, NY, 1992.]
2. Abramowitz, M., Stegun, I. A., (Eds.), Handbook of Mathematical Functions, with Formulas, Graphs, and Mathematical Tables, Dover, New York, NY, 1965.
3. al–Khwārizmī, Muhammad bin Mūsā, Al–Kitab al–mukhtaṣar fī hīsāb al–ğabr wa’l–muqābala, IX century.
4. Arnold, V. I., Mathematical Methods of Classical Mechanics, Springer, New York, NY, 1978.
5. Axler, S., Linear Algebra Done Right, Springer, New York, NY, 1997.
6. Behnke, H., Bachmann, F., Fladt, K., Süss, W., (Eds.), Fundamentals of Mathematics, Vol. I, Foundations of Mathematics, The Real Number System and Algebra, The MIT Press, Cambridge, MA, 1974.
7. Behnke, H., Bachmann, F., Fladt, K., Kunle, H., (Eds.), Fundamentals of Mathematics, Vol. II, Geometry, The MIT Press, Cambridge, MA, 1974.
8. Behnke, H., Bachmann, F., Fladt, K., Süss, W., (Eds.), Fundamentals of Mathematics, Vol. III, Analysis, The MIT Press, Cambridge, MA, 1974.
9. Berkeley, G., The Analyst: A Discourse Addressed to the Infidel Mathematician, Printed for J. Tonson in the Strand, London, 1734.
10. Bertotti, B., Farinella, P., Vokrouhlický, D., Physics of the Solar System: Dynamics and Evolution, Space Physics, and Spacetime Structure, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2003.
11. Bewersdorff, J., Galois Theory for Beginners – A Historical Perspective, American Mathematical Society, Providence, RI, 2006.
12. Birkhoff, G., Mac Lane, S., A Survey of Modern Algebra, 3rd Ed., MacMillan, New York, NY, 1977. [Also available as A K Peters, Wellesley, MA, 1997.]
13. Boyer, C. B., A History of Mathematics, 2nd Ed., Revised by U. C. Merzbach, Wiley, New York, NY, 1991.
14. Cardano, G., Artis magnae, sive de regulis algebraicis, 1545. [Also available as Ars Magna or the Rules of Algebra, translated by T. Richard Witmer, MIT Press, 1968. Reprint, Dover, Mineola, NY, 1993.]
15. Crowe, M. J., A History of Vector Analysis. The Evolution of the Idea of a Vectorial System, University of Notre Dame, South Bend, IN, 1967. [Also available as Dover, New York, NY, 1985, 1994.]
16. Dugas, R., A History of Mechanics, Dover, New York, NY, 1988.
17. Ehrlich, Ph., “The rise of non–Archimedean mathematics and the roots of a misconception. I: The emergence of non–Archimedean systems of magnitudes,” Arch. Hist. Exact Sci., Vol. 60, No. 1, 2006, pp. 1–121.
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 L. Morino, Mathematics and Mechanics - The Interplay, https://doi.org/10.1007/978-3-662-63207-9
18. Ekman, M., “A Concise History of the theories of Tides, Precession–Nutation and Polar Motion (from Antiquity to 1950),” Surveys in Geophysics, Vol. 14, 1993, pp. 585–617.
19. Einstein, A., The Meaning of Relativity, Princeton University Press, Princeton, NJ, 1921.
20. Essén, H., and Nordmark, A., “A simple model for the falling cat problem,” European Journal of Physics, Vol. 39, No. 3, pp. 1–8, 2018.
21. Etkin, B., Dynamics of Flight. Stability and Control, 2nd Ed., Wiley, New York, NY, 1982.
22. Euclid, The Thirteen Books of the Elements, with introduction and commentary by Th. L. Heath (Unabridged and unaltered republication of 2nd edition by Cambridge University Press), Dover, New York, NY, 1956.
23. Fibonacci, L., Liber Abaci (Book of Calculation). Originally published in 1202 (no known existing copy; republished in 1228). Available in Laurence E. Sigler, Fibonacci’s Liber Abaci. A translation into Modern English of Leonardo Pisano’s Book of Calculation, Springer, New York, NY, 2003.
24. Gödel, K., On Formally Undecidable Propositions on Principia Mathematica and Related Systems, Dover, New York, NY, 1992. [Republication of the work first published by Basic Books, New York, NY, 1962, which presents a translation of the original paper by Kurt Gödel, “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I,” published in the Monatshefte für Mathematik und Physik, Vol. 38, pp. 173–198 (Leipzig, 1931).]
25. Gradshteyn, I. S., Ryzhik, I. M., Table of Integrals Series and Products, 6th Ed., Academic Press, New York, NY, 2000.
26. Gram, J. P., “Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate,” J. reine ang. Math., Vol. 94, pp. 41–73, 1883.
27. Grattan-Guinness, I., Landmark Writings in Western Mathematics, 1640–1940, Elsevier, Amsterdam, The Netherlands, 2005.
28. Guiggiani, M., The Science of Vehicle Dynamics: Handling, Braking, and Ride of Road and Race Cars, 2nd Ed., Springer, New York, NY, 2018.
29. Hales, Th. C., “Jordan’s proof of the Jordan curve,” Studies in Logic, Grammar and Rhetoric, Vol. 10, No. 23, pp. 45–60, 2007.
30. Hellman, H., Great Feuds in Mathematics: Ten of the Liveliest Disputes Ever, Wiley, New York, NY, 2006.
31. Hilbert, D., Grundlagen der Geometrie, B. G. Teubner, Stuttgart, Germany, 1899. [English translation: The Foundations of Geometry (Reprint Ed.), The Open Court Publishing Co., La Salle, IL, 1950.]
32. Hirsch, M. W., Smale, S., Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, NY, 1974.
33. Jeffrey, A., Dai, H.-H., Handbook of Mathematical Formulas and Integrals, 4th Edition, Elsevier, Amsterdam, 2008.
34. Kane, T. R., Scher, M. P., “A dynamical explanation of the falling cat phenomenon,” Int. J. Solids Structures, Vol. 5, No. 7, pp. 663–670, 1969.
35. Kármán, Th. v., Biot, M. A., Mathematical Methods in Engineering, McGraw-Hill, New York, NY, 1940.
36. Klein, F., Sommerfeld, A., Über die Theorie des Kreisels; Heft I–IV, Teubner, Leipzig, 1897. [English translation by R. J. Nagel and G. Sandri: The Theory of the Top; Vol. I–IV, Springer (Birkhauser), 2008–2014.]
37. Kleppner, D., Kolenkow, R. J., An Introduction to Mechanics, McGraw-Hill, Singapore, 1973.
38. Kline, M., Mathematical Thought from Ancient Days to Modern Times, Vol. 1–3, Oxford University Press, Oxford, UK, 1972.
39. Lagrange, J. L., Mécanique Analytique, Gauthier–Villars fils, Paris, 1788.
40. Lanczos, C., The Variational Principles in Mechanics, 4th Ed., University of Toronto Press, Toronto, Canada, 1970. [Also available as Dover, New York, NY, 1986.]
41. l’Hôpital, G. F. A., Marquis de, Analyse des Infiniment Petits pour l’Intelligence des Lignes Courbes, L’Imprimerie Royale, Paris, 1696.
42. Livio, M., The Golden Ratio: The Story of Phi, the World’s Most Astonishing Number, Broadway Books, New York, NY, 2002.
43. Mandelbrot, B., “How long is the coast of Britain? Statistical self–similarity and fractional dimension,” Science, New Series, Vol. 156, No. 3775, pp. 636–638, 1967.
44. Marey, E.–J. (author of the photographs being reported), “Photographs of a tumbling cat,” Nature, Vol. 52, Issue 1308, pp. 80–81, 1894.
45. Maslow, A. H., The Farther Reaches of Human Nature, Viking Press, New York, NY, 1971.
46. Monaghan, A., “Walkie-Scorchie problems nearly fixed, Land Securities says,” The Guardian, London, UK, Tue., 12 Nov., 2013.
47. Murray, C. D., Dermott, S. F., Solar System Dynamics, Cambridge University Press, Cambridge, UK, 1999.
48. Newton, I., Philosophiae Naturalis Principia Mathematica, London, 1687. [Reprinted by University of California Press, 1999.]
49. O’Connor, J. J., Robertson, E. F. (Creators), MacTutor History of Mathematics Archive; Biographic Index, School of Mathematics and Statistics, University of St Andrews, Scotland, [http://www-history.mcs.st-andrews.ac.uk]
50. Pacioli, Fra Luca Bartolomeo de, De Divina Proportione (On the Divine Proportion), illustrated by Leonardo Da Vinci, Paganini, Republic of Venice, 1509.
51. Peano, G., Arithmetices Principia, Nova Methodo Exposita (Latin for “The principles of arithmetic, presented by a new method.”), Fratelli Bocca, Torino, Italy, 1889.
52. Penrose, R., The Road to Reality: A Complete Guide to the Laws of the Universe, Random House, London, UK, 2004.
53. Persico, E., Introduzione alle Fisica Matematica, redatta da T. Zeuli, 2nd Ed., Nicola Zanichelli, Bologna, Italy, 1943.
54. Ricci-Curbastro, G., Levi-Civita, T., “Methods of Differential Calculus and their Applications,” Math. Ann., Vol. 54, pp. 125–201, 1901. [Superseded by T. Levi-Civita, The Absolute Differential Calculus (Calculus of Tensors), Dover, Mineola, NY, 2013, which includes applications to relativity.]
55. Robinson, A., Non–standard Analysis, Princeton University Press, Princeton, NJ, 1966.
56. Rudin, W., Principles of Mathematical Analysis, 3rd Ed., McGraw-Hill, New York, NY, 1976.
57. Russell, B., A History of Western Philosophy, Simon & Schuster, New York, NY, 1945.
58. Russell, B., Whitehead, A., Principia Mathematica, Vol. I–III, Cambridge University Press, Cambridge, UK, 1910–1913.
59. Schmidt, E., “Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener,” Math. Ann., Vol. 63, pp. 443–476, 1907.
60. Seife, C., Zero. The Biography of a Dangerous Idea, Penguin Books, New York, NY, 2000.
61. Sommerfeld, A., Lectures on Theoretical Physics, Vol. I, Mechanics, Academic Press, New York, NY, 1952.
62. Stoer, J., Bulirsch, R., Introduction to Numerical Analysis, Springer, New York, NY, 1980.
63. Tattersall, J. J., Elementary Number Theory in Nine Chapters, 2nd Ed., Cambridge University Press, Cambridge, UK, 2005.
64. Taylor, B. N., Thompson, A., The International System of Units, NIST Special Publication 330, National Institute of Standards and Technology, Gaithersburg, MD, 2008.
65. Temple, G., 100 Years of Mathematics – A Personal Viewpoint, Springer, New York, NY, 1981.
66. Tricomi, F. G., Lezioni di Analisi Matematica, Volumes I and II (in Italian), 7th Ed., CEDAM (Casa Editrice Dott. Antonio Milani), Padova, Italy, 1956.
1002
References
67. Tricomi, F. G., Istituzioni di Analisi Superiore – Metodi Matematici della Fisica (in Italian), CEDAM (Casa Editrice Dott. Antonio Milani), Padova, Italy, 1964. 68. Tricomi, F. G., Equazioni differenziali, 3rd edition, Boringhieri, Torino, Italy, 1961 [Translated by Elizabeth McHarg into English as Differential Equations, Hafner, New York, NY, 1961. Also available as Dover, New York, NY, 2012.] 69. Tricomi, F. G., Equazioni a Derivate Parziali, Editrice Cremonese, Roma, Italy, 1957. 70. Tricomi, F. G., Funzioni Analitiche, Zanichelli, Bologna, 1961. 71. Truesdell, C., The Kinematics of Vorticity, Indiana University Press, Bloomington, IN, 1954. [Also available as Dover, New York, NY, 2018.] 72. World Pool–Billiard Association, Tournament Table & Equipment Specifications, Item 16. Balls and ball rack, https://web.archive.org/web/20200513110343/https://wpapool.com/equipmentspecifications.
Astronomical data
• Gravitation and gravity
$G = (6.674\,30 \pm 0.000\,15) \cdot 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$
$g_E \simeq 9.823\ \mathrm{m/s^2}$ (poles)
$g_E \simeq 9.789\ \mathrm{m/s^2}$ (equator)
$g_M \simeq 1.624\,9\ \mathrm{m/s^2}$

• Masses and mass ratios
$M_S = (1.988\,47 \pm 0.000\,07) \cdot 10^{30} \simeq 1.990 \cdot 10^{30}\ \mathrm{kg}$
$M_E \simeq 5.972\,19 \cdot 10^{24}\ \mathrm{kg}$
$M_M \simeq 73.476\,730\,924\,573\,5 \cdot 10^{21} \simeq 73.48 \cdot 10^{21}\ \mathrm{kg}$
$M_S/M_E \simeq 333{,}060.402 \simeq 333{,}060$
$M_M/M_E \simeq 0.012\,300 \simeq 0.012\,3$
$M_S/M_M \simeq 27.085 \cdot 10^{6} \simeq 27 \cdot 10^{6}$
$M_S/M_{Jupiter} = 1{,}047.348\,6 \pm 0.000\,8 \simeq 1{,}047$

• Radii and distances
$R_E$ (equator) $\simeq 6{,}378.136\,6\ \mathrm{km} \simeq 6{,}378\ \mathrm{km}$
$R_E$ (pole) $\simeq 6{,}356.752\,3\ \mathrm{km} \simeq 6{,}357\ \mathrm{km}$
$r_{Perihelion} \simeq 147{,}098{,}291\ \mathrm{km}$
$r_{Aphelion} \simeq 152{,}098{,}233\ \mathrm{km}$
$r_{SE} = \tfrac{1}{2}\,[r_{Perihelion} + r_{Aphelion}] \simeq 149{,}598{,}262\ \mathrm{km} \simeq 149.60 \cdot 10^{6}\ \mathrm{km}$
$r_{Perigee} \simeq 363{,}104\ \mathrm{km}$
$r_{Apogee} \simeq 405{,}696\ \mathrm{km}$
$r_{EM} = \tfrac{1}{2}\,[r_{Perigee} + r_{Apogee}] \simeq 384{,}400\ \mathrm{km}$
$r_{GS} \simeq 42{,}164\ \mathrm{km}$ (Geostationary Satellite)

• Times
$T_E \simeq 365.242\,2$ solar days
$T_M \simeq 27.321\,661 \simeq 27.32$ solar days
$T_E^{Sid} \simeq 0.997\,269\,68$ solar days $\simeq 86{,}164\ \mathrm{s}$
$\tfrac{1}{2}\,T_M^{Syn} \simeq 12$ hours $+\ 25.2$ minutes (Tidal Lunar Semi-day)

• Orbit Eccentricities and Semiaxis Relative Differences
E: $0.016\,709$ [$(a - b)/a \simeq 0.000\,140$]
M: $0.054\,901$ [$(a - b)/a \simeq 0.001\,508$]

• Orbit Inclinations
Earth axial tilt $\simeq 23.439\,3^\circ$
Lunar orbit inclination $\simeq 5.145^\circ \simeq 5^\circ$
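Some of the entries above are derived from others, so they can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it recomputes the two mean distances as semi-sums and the semiaxis relative differences from the eccentricities, assuming the standard ellipse relation $b = a\sqrt{1 - \epsilon^2}$ (the symbol $\epsilon$ for eccentricity is an assumption here). All function and variable names are hypothetical and are not taken from the book.

```python
import math

# Values copied from the data list above (distances in km).
R_PERIHELION_KM = 147_098_291
R_APHELION_KM = 152_098_233
R_PERIGEE_KM = 363_104
R_APOGEE_KM = 405_696
ECC_EARTH = 0.016709   # eccentricity of the Earth orbit
ECC_MOON = 0.054901    # eccentricity of the Moon orbit


def mean_distance(r_min, r_max):
    """Semi-sum of the smallest and largest distances."""
    return (r_min + r_max) / 2


def semiaxis_relative_difference(ecc):
    """(a - b)/a for an ellipse, assuming b = a * sqrt(1 - ecc**2)."""
    return 1.0 - math.sqrt(1.0 - ecc**2)


r_se = mean_distance(R_PERIHELION_KM, R_APHELION_KM)   # Sun-Earth
r_em = mean_distance(R_PERIGEE_KM, R_APOGEE_KM)        # Earth-Moon
rel_diff_earth = semiaxis_relative_difference(ECC_EARTH)
rel_diff_moon = semiaxis_relative_difference(ECC_MOON)

print(f"r_SE = {r_se:,.0f} km")
print(f"r_EM = {r_em:,.0f} km")
print(f"(a - b)/a, Earth = {rel_diff_earth:.6f}")
print(f"(a - b)/a, Moon  = {rel_diff_moon:.6f}")
```

The recomputed values agree with the tabulated ones to the digits shown in the list.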