
The Spacetime Jungle
UFO's, Torsion Physics, and the Physical Vacuum
Volume I

Ethan Morris

Kindle Direct Publishing

"Too bad, Isaac Newton, they dimmed your renown And turned your great science upside down Now a long haired crank, Einstein by name, Puts on your high teachings all the blame Says: matter and force are transmutable And wrong the laws you thought immutable I am much too ignorant, my son For grasping schemes so finely spun My followers are of stronger mind And I am content to stay behind Perhaps I failed, but I did my best These masters of mine may do the rest Come, Kelvin, I have finished my cup When is your friend Tesla coming up? Oh, quoth Kelvin, he is always late It would be useless to remonstrate Then silence– shuffle of soft slippered feet I knock and– the bedlam of the street" –Nikola Tesla, Fragments of Olympian Gossip





The Spacetime Jungle Copyright © 2023 by Ethan Morris All rights reserved. No part of this publication may be reproduced without the express permission of the author. ISBN: 9798397100472









Contents

Foreword
Introduction

Volume I

1. Curvature and Torsion
1.1 Arc Length and Time Derivatives: Parametric Calculus. Arc Length I.
1.2 Vectors: Vectors. Vector Algebra. Basis Vectors. The Scalar Product. Angles. Projections. Vector Products.
1.3 Vector Functions: The Position Vector. Limit of a Vector Function. Derivatives of Vector Functions and the Velocity Vector. Acceleration I. Arc Length II: The Metric.
1.4 Curvature and Torsion: Curvature. Torsion. Acceleration II.

2. Rⁿ
2.1 Geometry of R³: Lines. Distance From a Point to a Line. Planes. Line of Intersection Between Two Planes. Distance From a Point to a Plane. Angle Between Planes.
2.2 Partial Derivatives: Functions of n Independent Variables. Contours. Partial Derivatives. The Chain Rule II.
2.3 Tangents, Normals, and Directional Derivatives: Directional Derivatives. Tangents and Normals.

3. The Integral Theorems of Vector Calculus
3.1 Iterated Integrals: Double Integrals. Triple Integrals. Change of Variables.
3.2 Vector Fields: Vector Fields. Line Integrals. Del Operator.
3.3 Integration in Vector Fields: Work, Flow, Circulation, and Flux. Parametrization of Surfaces. Parametric Surface Area and the Surface Differential. Implicit Surface Area. Explicit Surface Area. Surface Integrals. The Integral Theorems of Vector Calculus.

4. Matrix Algebra
4.1 Matrices: Row and Column Vectors. Matrices. Matrix Properties. Matrix Multiplication. The Matrix Transpose. Inner and Outer Products. Types of Matrices.
4.2 Solving Equations: Row Reduction. The Gauss-Jordan Method. Solutions to Linear Equations.
4.3 Vector Spaces: Tiers of Structure in Linear Algebra. Vector Space.
4.4 Linear Maps: Definition. The Nullspace. The Fundamental Theorem of Linear Algebra. Span. Coordinate Transformations.

5. Matrix Properties
5.1 Determinants: Matrix Length. Determinants. The Inverse Formula. Cramer's Rule. Vector Products.
5.2 Eigenvectors: Linear Operators. Eigenvectors and Eigenvalues. The Characteristic Equation. Eigenspaces. Diagonalization. Powers. Sequencing.

6. Real Analysis
6.1 Analysis of Real Numbers: The Idea of Real Analysis. Infinite Sets. Real Numbers. Real Topology.
6.2 Essential Definitions of Real Analysis: Open Balls. Closed Balls. Open Sets. Closed Sets. Convergent Subsequences. Bounded Sets. Convergent Sequence Properties. Subsequence. Closure. Limits of Functions. Limits of Transformations. Limits of Compositions. Continuous Functions. Convergent Vectors. Convergent Matrices. Compact Sets. Least Upper Bound. Maximum. Minimum. Existence of Maxima and Minima. The Mean Value Theorem.
6.3 The Inverse and Implicit Function Theorems: Newton's Method. Lipschitz Conditions. Lipschitz Constants. Kantorovitch's Theorem. Superconvergence. The Inverse Function Theorem. The Inverse Function Theorem: Short Version. The Implicit Function Theorem: Short Version.

Volume II

7. Curved Geometry and Higher Dimensions
7.1 Euclid's Postulates.
7.2 Isometries: Complex Numbers. Euclidean Maps. Quaternions. Euclidean Geometry.
7.3 Higher Dimensions: Lower Analogy and the Fourth Dimension. Higher Dimensions. The Euler Number. Gaussian Curvature.
7.4 Curved Geometries: Spherical Geometry. Hyperbolic Geometry.

8. Manifolds
8.1 Lie Groups
8.2 Manifolds: Manifolds I. Immersions and Embeddings. Jacobian as the Tangent Map. Tangent Spaces on Manifolds. Smooth Curves in R². Tangent Line to a Smooth Curve. Tangent Space to a Smooth Curve. Smooth Surfaces. Tangent Plane to a Smooth Surface. Tangent Space to a Smooth Surface. Smooth Surface in R³. Theorem: Smooth Curves in R³. Manifolds II. Tangent Space to a Manifold.

9. Tensors
9.1 Tensors I: Components and Indices: Index Notation. Scalars. Vectors and Matrices. Coordinate Transformations. Matrix Operations. The Geometry of Rotation Matrices. Scalars and Vectors Redefined. Physical Quantities. Del Operator II.
9.2 Delta and Epsilon: The Substitution Delta I. The Volume Tensor I: The Permutation Epsilon.
9.3 Covectors: Curvilinear Coordinates. General Curvilinear Coordinates. Fiber Bundles. Forms. Covectors and Covariancy.
9.4 Tensors II: Definitions: Tensors I: Geometric Objects. Tensors II: The Stress Tensor. Tensors III: Multilinear Functions. Tensors IV: Tensor Transformations. Tensors V: Tensor Products.
9.5 Tensor Algebra: Contraction. The Metric Tensor I: Raising and Lowering Indices. Tensor Transpose. Tensor Symmetry. The Tensor Trace. Tensor Determinants. Cofactor Tensors.

10. Forms
10.1 Differential 1-Forms: Linear Functionals. Coordinate Functions. The Tangent Space. Directional Derivatives. Differential 1-Forms.
10.2 Exterior Calculus: Volume Forms. p-Forms. p-Vectors and The Dual Map. The Exterior Product. Inner Products. The Exterior Algebra. The Exterior Derivative.
10.3 Geometry of Forms: Geometry of p-Forms. Geometry of p-Vectors. Dimensions of p-Forms and p-Vectors.

11. Integration
11.1 Push-forwards and Pull-backs: Coordinate Changes. Push-Forwards. Pull-Backs. Polar Coordinates. Differential Forms.
11.2 Vector Calculus: The Hodge Star Operator. The Poincaré Lemma. Musical Notation and Stokes' Theorem.

12. Differential Geometry
12.1 Space Curves: The Local Theory of Plane Curves. The Local Theory of Space Curves.
12.2 Surfaces: Surfaces in Rⁿ. Graphs of Functions. Surfaces of Revolution. Surfaces Defined by Equations.
12.3 Curvature: The First Fundamental Form. Smooth Maps. Gaussian Curvature: Negative Curvature Density. The Intrinsic Curvature Density Theorem: The Theorema Egregium. Christoffel Symbols I: Fundamental Forms. Intrinsic Curvature Proof.
12.4 Covariant Derivatives: Arbitrary Speed Curves. The Affine Connection I: Covariant Derivatives. Adapted Frame Fields. Connection Forms. Contour Frames and the Structure Equations.
12.5 Geometric Surfaces: Isometries of R³. The Push-Forward of an Isometry. Euclidean Geometry. Patch Computations. Surface Maps. Manifolds III.
12.6 The Manifold Equations: The Shape Operator. Normal Curvature. Mean Curvature and Curvature Density. Classical Curvature Calculations. Abnormal Curvature Calculations. The Manifold Equations.

13. Tensor Calculus
13.1 The Metric: The Metric Tensor II: Arc Length III. The Metric. Associated Tensors. Vector Length. Angle Between Vectors. Angle Between Coordinate Curves. Volume Forms. The Riemannian Metric.
13.2 Tensor Calculus: The Language of Curved Geometry. Differential Operators. Commutators. Covariant Derivatives. Tensor Derivatives II. Christoffel Symbols II: Connection Coefficients. Christoffel Symbols III: Calculating Connection Coefficients. The Symmetric Connection.
13.3 Curvature and Torsion II: The Geodesic Equation. The Curvature Tensor. Bianchi Identities: Ricci and Einstein Tensors. The Torsion Tensor. Bundles and Connections. ∇²(S): Curvature of the Fiber Bundle.

Volume III

14. Geometric Mechanics
14.1 Introduction: The Geometric Principle.
14.2 Particle Dynamics: Geometric Force and the Laws of Motion. Friction. Models. Periodic Motion.
14.3 Work and Energy: Work. Energy. Conservation of Energy. Collisions and Linear Momentum.
14.4 Rotation: Angular Quantities.

15. Vibration
15.1 Vibrations: The Harmonic Oscillator. Damping. Forced Oscillations I.
15.2 Superposition: Phasors. Forced Oscillation II. Superposition: Two Collinear Vibrations. Superposition: Two Perpendicular Vibrations.
15.3 Coupled Coordinates: Normal Modes and Coupled Vibrations.
15.4 Waves: Plane Sine Waves. The Wave Equation. Incident, Transmitted, and Reflected Waves. Waves of Higher Dimension.

16. Electricity
16.1 Electromagnetism: Electromagnetic Field Equations. Electromagnetic Lorentz Force: F = qE + qv × B.
16.2 Electric Fields: Electric Potential. Conservation of Electric Charge. Conduction and Resistance. Electric Dipoles. Mechanical Potential Energy of an Electric Field. Capacitance.

17. Relativity
17.1 The Lorentz Transformation: The Principle of Relativity. Optics and the Nature of Light. The Speed of Light is Constant. The Michelson-Morley Experiment. Inertial Frames. Time Dilation and Length Contraction. Lorentz Transformations.
17.2 Flat Spacetime: The Minkowski Metric. Relativistic Mechanics. Symmetry Operations. General Boosts. Doppler Shift. World Lines. 4-Vectors. 4-Vector Mechanics. 4-Vector Lorentz Transformations. The Lorentz Force II.

18. Geometry of Electromagnetism
18.1 The Electromagnetic Field: Variational Calculus. Lagrangians. The Electromagnetic Lagrangian I: Connection Potential. Hodge Duals in Spacetime. The Electromagnetic 2-Form. The Electromagnetic Lagrangian II: Force = Curvature.
18.2 Electrodynamics: Empty Space. The Stress Tensor. Stress Tensor Fields. The Electromagnetic Stress Tensor. The Electromagnetic Field Tensor. Electromagnetic Field Equations II. The Stress-Energy Tensor I.

19. Gravity and the Physical Vacuum
19.1 Stress in Spacetime: The Stress-Energy Tensor II. Ideal Fluids.
19.2 Newtonian Gravity: Newtonian Gravitation: Inverse Square Law.
19.3 The Physical Vacuum: The Equivalence Principle. The Gravitational Field Equations. The Yang-Mills Equations. Λ: The Zero-Point Energy. Quantum Mechanical Spin and Torsion of the Spacetime Manifold. 5-Dimensional Spacetime. M-Theory and Spacetime Matter.

20. Quantum Mechanics and Fractal Spacetime
20.1 The Principles of Quantum Mechanics: The Physics of Scales. Alchemy, Chemistry, and Atomic Theory. Wave-Particle Duality. Atomic Structure and the Principles of Quantum Mechanics. Quantization and Uncertainty.
20.2 Quantum Foundations: Fourier Transforms. Expectation Values.
20.3 Quantum Mechanics: Bras and Kets: State Vectors and Covectors. The Wave Function: Change of Position/Momentum Basis Kets. Generators: Lie Groups of Differential Operators. The Canonical Commutation Relation. The Schrödinger Equation.
20.4 Fractal Spacetime: Fractional Dimensions. Scale Relativity and Fractal Spacetime.

21. Cosmic Plasma
21.1 Fluids and Elastics: Atomic Bonding. States of Matter.
21.2 Cosmological Plasma: The Aurora. Cosmic Plasma. Dark Matter and the Cosmic Web.

22. The Spacetime Jungle
22.1 Astrospheric Cosmology: Connecting the Dots. A Truly Relative Relativity of Scales. Torsion Physics, Fractal Spacetime, and Cosmic Plasma. Biodiversity and Biolocomotion. Gravitational Propulsion and the Astrosphere.
22.2 UFO Craft: The Nimitz Incident. Generating Lift: How Airplanes Fly. The Gravitational Spaceflight Envelope. Examples of UFO Craft. The Spacetime Jungle.

Appendix A: Numbers and Equations
A.1 Set Theory
A.2 Numbers: Types of Numbers. Real Numbers. Complex Numbers.
A.3 Expressions and Equations

Appendix B: Functions and Graphs
B.1 Functions
B.2 Linear Functions
B.3 Asymptotes
B.4 Exponential & Logarithmic Functions

Appendix C: Systems of Equations
C.1 Systems of Equations

Appendix D: Trigonometry
D.1 Circles, Squares, and Triangles: Plane Geometry. The Pythagorean Theorem.
D.2 Trigonometric Functions: Trig Functions. Determining Angles. The Law of Cosines. Trigonometric Identities. Periodic Functions.

Appendix E: Derivatives and Integrals
E.1 Derivatives: k-Closeness and the Least Upper Bound. Limits. Derivatives. Derivative Rules. Common Derivatives. Inverse and Implicit Differentiation. Extrema. The Mean Value Theorem.
E.2 Integrals: Definite Integrals. Integral Properties. Common Integrals. The Mean Value Theorem for Integrals. The Fundamental Theorem of Calculus.
E.3 Integration Techniques: By Substitution. By Parts. By Trigonometric Substitution. By Partial Fraction Expansion. Further Techniques.



Foreword

This is a book series about the little-known field of gravitational physics called torsion physics. It is part and parcel of the geometrical interpretation of spacetime as a physical vacuum. The way in which these notions are related to the subject of UFOs, those mysterious, futuristic craft that zip through our skies, is the primary result of these books. As we'll see, this is done totally naturally, by simply applying the laws of evolutionary biology to the setting of a physical vacuum with torsion. Namely, we apply Simpson's law of biodiversity by way of its clear, purely physical relation to animal locomotion, yielding a new, astrospheric geosphere in spacetime.

These books are all about communicating the potential legitimacy of torsion physics and physical vacuum theory, and their implications, not only to those physicists who have so far overlooked the subject, but also to the general public and any other interested researchers. For this reason, these books are fully self-contained, and are readable by anyone off the street with only some arithmetic.

I believe the relevance of torsion physics may become more clear to the American public in the coming years. If it does, then I hope this book is of great value to those wishing to research it and its relation to the UFO phenomenon. Whether or not an astrospheric theory of cosmology is correct, or torsion physics endures as nothing more than a theoretical curiosity, a thought-provoking study of spacetime physics, or any other aspect of the universe we find ourselves in, is always worthwhile.

As for acknowledgements, I'd first like to thank the Oxford scholar Joseph P. Farrell, whose books and research in this field have greatly inspired these. I'd also like to thank Richard Dolan, Daniel Lintz, John Greenwald Jr., and Luis Elizondo for their efforts in exposing the very real subject of UFOs to unaware members of the public, like me, and for making this book possible. Finally, as with every purported work of science, this one owes a great debt to the work that came before it. In particular, I'd like to acknowledge the work of Élie Cartan, Hannes Alfvén, and Laurent Nottale for their inspiring contributions to spacetime physics.

Ethan Morris
July 2023













Introduction

Where We Are Going: Spacetime and the Origins of Higher Species

This is a book about geometrical physics and its relationship to biology, astronomy, and chemistry. However, from the point of view of geometry, these separate fields are but different faces of the same nameless, interdisciplinary physical science. Therefore, more generally, this is a book about science: its origin, its methods, and its future.

Regarding real geometry, its greatest formative minds, in terms of significance of contribution, are unarguably the Greek and German mathematicians Euclid, Carl Friedrich Gauss, and Bernhard Riemann. Regarding their nearly inconceivable invention of the tensor calculus (humanity's greatest analytical tool of geometry), the minds of the Italian and French mathematicians Gregorio Ricci and Élie Cartan must be acknowledged to have occupied a realm of genius in which few have ever shared company. But with regard to proper, geometrical physics, none have been more influential than Isaac Newton, Nikola Tesla, and Albert Einstein.

From their point of view, geometry was the mathematics that preceded quantification and coordination, and thus it must be that the language of universal laws was written in terms of it. This fact is exemplified by tensor transformation, wherein, say, vectors exist independently of whatever coordinates may be attached to them. In other words, beyond some number that describes them, they are actual geometric objects. Whether it's a little arrow on paper or a rigid object in space, the object upon which we do measurements exists before we do any measuring.

In this way, geometry, as sets of isometries, physically precedes the mathematics supplied to it. This is known as the geometrical principle of physics. On this principle, Newton, Tesla, and Einstein agreed. Where they appear to differ is in their application of it. Tesla famously called Einstein's work

"...a mass of error and deceptive ideas violently opposed to the teachings of great men of science of the past and even to common sense... the [relativity] theory wraps all these errors and fallacies...in magnificent mathematical garb which fascinates, dazzles and makes people blind to the underlying errors."

Though this particular quote is of questionable provenance, it is most true in spirit. For although withering quotes from Tesla directed towards Einstein exist in abundance, Einstein only ever described him as "an eminent pioneer of high frequency currents". While Tesla's opinion of relativity as a "beggar wrapped in purple whom ignorant people take for a king" was in part driven by his inability to grasp Riemannian geometry, it was also in part motivated by his belief in an absolute ether. Tesla said:

"They say much about the Einstein theory now. According to Einstein, the ether does not exist, and many people agree with him. But it is a mistake in my opinion. Ether's opponents refer to the experiments of [Michelson and Morley]..."

From these experiments, Hendrik Lorentz and Henri Poincaré would craft the theory of relativity, agreeing with Tesla that their failure to detect a phase shift of light from moving light sources did not necessarily imply that ether did not exist. These scientists based their opinions on the elegance of Newton's wave theory of light and Maxwell's identification of it with the electromagnetic field. (Einstein is popularly known to have said, upon being asked how it felt to be the smartest man alive, "I do not know, you will have to ask Nikola Tesla.")

From their point of view, if waves are the phase propagation of many coupled oscillators, as their differential equations describe, and if light is such a wave, and if light waves are electromagnetic waves– then through what do the waves of electromagnetism propagate? In other words, waves need some medium to propagate, and since light propagates in an apparent vacuum, must there not be something there doing the oscillating? From his 1864 paper, Maxwell said "The electromagnetic field is that part of space which contains and surrounds bodies in electric and magnetic conditions. ... It may contain any kind of matter, or we may render it empty of all gross matter...There is always, however, enough matter to receive and transmit the undulations of light and heat...that we are obliged to admit that the undulations are those of aetherial substance."

Newton's theory of light waves was itself grounded in his laws of motion, the basis of all geometrical physics. It served as the very heart of the Enlightenment, supplanting the magic and superstitions of the time with geometry. Let's illustrate this point with the classic example of a rope. Imagine that a person A and a person B each hold their arms behind a curtain that separates them, and that person A is then yanked towards person B, who seems to be pulling them:

[Figure: person A, behind a curtain, yanked towards person B]

Now, by magic, one could say that there need not clearly be a rope behind this curtain; person B simply casts a spell, and purely by their will do they move person A. Alternatively, but totally equivalently, the superstitious might cast a prayer (a kind of spell) to some higher being, so that, by their will, person A moves to person B. Totally contrary to this is the concept of contiguous action that emerges, in wave theory, from the geometric principle and the laws of motion. Contiguous action is the phenomenon of force transmission by way of adjacency and contact between intermediaries. By this action, there must be some kind of rope behind the curtain:

[Figure: a rope behind the curtain connecting person A to person B]

Force is transmitted, from one adjacent element of the rope, or oscillator, to the next:

The force and momentum, and thus work and energy, entering into one side of an element of the rope leave its other side, so that, as a closed system, at any given time, the total energy of the system is the same: it's conserved.

Now, this reasonable way of thinking is much preferable to the superstitions that led, ironically, to witches being burned alive in the West, and it was partly from Newton's laws of motion that the Enlightenment emerged. It must be said partly because, as everyone knows, Newton harbored a fascination with alchemy. In this, Newton took the same position as other enlightened thinkers of that age, separating the auxiliary bulk of medieval and Hellenistic alchemy from its Hermetic, ancient Egyptian canon. By their reasoning, in the fragments of knowledge, megaliths, and machined artifacts left to us by a previous, industrialized civilization that probably fathered Egypt, there is evidence of a far higher science, one that speaks of the preeminence of geometry and the existence of a universal, material substrate. This is indicated in, for example, the Emerald Tablet. We'll return to this geometrical hypothesis by the end of this series, with fresh eyes, but regardless of whether it is true, it heavily influenced Newton, and, thus, heavily influenced the Enlightenment.

From Newton's inspired geometrical physics, and Maxwell's discovery of electromagnetic light waves, most scientists pre-special relativity firmly believed that space was not truly empty; that although we could not see it, there must be something there. In terms of our rope analogy, even if we removed the curtain, and there was no rope, and, as if by magic, person A were yanked by some invisible force, with no intermediary to transmit it, towards person B,

[Figure: person A yanked towards person B with no visible intermediary]

-the geometric principle declares that some contiguous explanation still exists. But could empty space really be an illusion, like the empty space between person A and B? How can it be that something can be in adjacent proximity to something else, so that force transmission might occur, when it does not exist in its plane of reality? Here, the operative word is plane, because, as Riemann and Einstein showed us, the answer lies in higher dimensions. For just as a creature of two-dimensional nature (Mr. Triangle, below) might be totally unaware of a three-dimensional object in its close proximity,

so too might three-dimensional creatures like us be unaware of some four-dimensional object in our proximity. Notice too that Mr. Triangle thinks he lives in a flat, 2-dimensional plane, when really, it is a cube, and its smallest pieces are themselves cubes. This means that the smallest pieces of Mr. Triangle's world are not really squares, as he expects to see, but cubes that he is incapable of seeing. Yet, if vibrations of the elemental cubes could transmit force, do work, and possess energy, and if the grounding force of the 2-D plane world (gravity) were discovered

to be the result of a 3-dimensional curvature, then Mr. Triangle might be compelled to admit that the smallest pieces of his world were of the same higher dimension as the entire thing. This is precisely the thinking in general relativity. It is true that Einstein would initially declare that, since he could not discern it, and it was not necessary for truly relative frames of inertial motion, the ether was superfluous. But though it is hardly an acknowledged fact in popular academics, it is also true that Einstein only held this view for the brief time between his 1905 and 1915 papers. For the entirety of the rest of his career, he would state that, while the absolute ether of classical physics was an incorrect description, a relativistic ether was not only possible, but was in fact the basis of general relativity: spacetime. On this, Einstein said, in the now buried statement, "according to the general theory of relativity even empty space (empty in the said sense) has physical qualities, which are characterized mathematically by the components of the gravitational potential... Thus, once again 'empty' space appears as endowed with physical properties, i.e., no longer as physically empty, as seemed to be the case according to special relativity. One can thus say that the ether is resurrected in the general theory of relativity, though in a more sublimated form."

To Einstein, the higher dimensional nature of spacetime and its curvature manifested by the force of gravitation offered the ultimate clue as to the actual, physical nature of "empty" space. In 1916 he said prophetically "We may still use the word ether, but only to express the physical properties of space. The word ether has changed its meaning many times in the development of science. At the moment, it no longer stands for a medium built up of particles. Its story, by no means finished, is continued by the relativity theory."

And so, upon a proper accounting, we see that Einstein and Tesla were in agreement on the existence of a universal substrate; only, Tesla failed to appreciate its higher dimensional generalization in spacetime, and its Riemannian foundations. Einstein was no doubt aware of this, and perhaps sympathetically maintained his silence on Tesla, knowing that Tesla's genius lay not in his knowledge, but in his imagination. For it was not only by mathematics, but by ingenuity, that Tesla discovered rotating magnetic fields and devised alternating currents. As the father of electrical engineering, Tesla was the first to exemplify a pattern of contrarian scientific opinion continued within this field to this very day.

As we will demonstrate in this book, electricity, and its relativistic manifestation as magnetism, is just the most obvious and interactive of the curvatures of spacetime geometry, just like gravity. Those who study its systems are therefore studying an aspect of spacetime popularly absent from its usual development, namely, that associated with its non-Riemannian nature, and its torsion. In this book, we will give abundant attention to this more obscure aspect of spacetime geometry. We will then extend it in the style of Einstein, who was himself preoccupied with incorporating the spacetime torsion into a five-dimensional version of relativity theory throughout the final years of his life. We will use modern theoretical developments to formulate a fully geometrical theory that incorporates quantum mechanics, and by many angles of attack establish the contiguous, physical nature of spacetime.

But though this is a book about physics, its conclusions are biological. On the way, using spacetime physics, we'll gain new insight into astronomy and chemistry as well, acquiring a satisfying, interdisciplinary understanding of physical science. Incredibly, by applying spacetime physics to biological systems, we may observe an organic extension of the biosphere into higher dimension, the astrosphere, and be led to a consideration of the gravitationally propelled craft that must exist there. We do this on no other grounds than torsion physics and the crowning jewel of science, Charles Darwin's theory of evolution by natural selection. Namely, Simpson's hypothesis of biodiversity is furthered in the purely physical context of biolocomotion. By a proper application of torsion physics, the general method of propulsion of UFO craft becomes clear, and their general existence emerges as a feature of evolutionary biology and spacetime physics.

In an astrospheric theory of cosmology, the Earth is just a subsystem of a much larger, higher dimensional world, like a lake is here on our own. By the geometrical technique of lower analogy, we may grasp this incomprehensible higher world, and understand it as we understand the atmosphere, the hydrosphere, and the lithosphere: the other geospheres of our world. When we do so, the different UFO hypotheses are unified, as we shall demonstrate.


How We Get There: Mathematics, Geometry, and Physics

This series is written in four basic parts. The first part, from chapters 1 to 5, reviews the mathematical fundamentals of differential geometry. Vector calculus is reviewed, and the metric, the curvature, and the torsion are introduced. Linear algebra then prepares the way for real analysis and manifold theory. For the convenience of the reader, a brief review of elementary mathematics covering algebra, trigonometry, and calculus is given in the Appendix.

In the second part, from chapters 6 to 13, we'll construct the mathematical framework of higher dimensions and differential geometry. Proofs of real analysis are used immediately to establish the theory of manifolds. Tensors are carefully and methodically introduced throughout multiple definitions and demonstrations. Forms and their integration over manifolds are defined, and Hodge theory is thus deployed to unify the integral theorems of vector calculus into Stokes' theorem. A classical presentation of differential geometry precedes a development of Riemannian geometry. Covariant derivatives and connection forms are introduced, and the tensor calculus and structure equations are constructed. Finally, the curvature and torsion tensors are constructed, including their 2-forms.

The third part, from chapters 14 to 21, properly develops the physical nature of spacetime. The probable relation between spacetime torsion and quantum mechanical spin, and the resulting analogy between the spin angular momentum of matter and the torsion of spacetime to the stress-energy of matter and the curvature of spacetime, is then made apparent. The Kaluza theory is subsequently reviewed, and the vector potential is shown to emerge from a five-dimensional spacetime metric. The non-Riemannian geometry of rotating electric and magnetic fields is introduced, and correlated with spin and spacetime torsion. Spacetime and matter are proven by theorem to be equivalent in five dimensions. We then demonstrate by scale relativity that quantum properties emerge from a fractal spacetime, and its self-similar, torsional nature is potentially observed in large scale, filamentary formations of the cosmic web. By a conceptual extension of scale relativity, a new relationship between scale, dimension, and curvature is formulated, establishing the foundation of a new cosmology.

In the fourth part, chapter 22, a properly developed spacetime physics is applied to astronomy and the laws of biology. A new geosphere and its gravitationally propelled life emerge immediately from both the biological considerations of evolutionary biolocomotion and biodiversity, and an informed spacetime physics. UFO craft are established as a real phenomenon and their types are reviewed. The principles and equations of lift and aerodynamics are defined, and a gravitational method of UFO propulsion is thus identified. This method is detailed, in general, in terms of spin and torsion, and the astrosphere is identified as the adaptive zone of UFO craft. Coevolution amongst organisms of the astrosphere, along with its bewildering complexity, finally reveals its jungle-like nature.












































Chapter One Curvature and Torsion

1.1 Arc Length and Time Derivatives: Parametric Calculus. Arc Length I. 1.2 Vectors: Vectors. Vector Algebra. Basis Vectors. The Scalar Product. Angles. Projections. Vector Products. 1.3 Vector Functions: The Position Vector. Limit of a Vector Function. Derivatives of Vector Functions and the Velocity Vector. Acceleration I. Arc Length II: The Metric. 1.4 Curvature and Torsion: Curvature. Torsion. Acceleration II.

1.1 Arc Length and Time Derivatives

Parametric Calculus
Let's quickly review vector calculus in 3-dimensional space, where we first encounter the curvature and torsion, the metric, and the directional derivative. Vector calculus is the mathematics concerned with the derivatives and integrals of vector functions. It'll take place in a 3-dimensional vector space, but first, let's parametrize our equations in time. Now, in physics, the path of a particle is not realistically the graph of a single function y = f(x): such a path fails the vertical line test. The path may still be described, however, by a pair of equations, x = f(t) and y = g(t):

[Figure: the point (f(t), g(t)) traces the position of the particle at time t]

-where f and g are continuous functions of an independent variable t: time, the parameter. If x and y are given as functions x = f(t) and y = g(t) over an interval I of t-values, then the set of points (x, y) = (f(t), g(t)) defined by these equations is a parametric curve, and the equations are parametric equations. Together, the interval and parameter constitute a parametrization of the curve, and when given to describe that curve, the curve is then said to have been parametrized. A parametrized curve x = ƒ(t) and y = g(t) is differentiable at t if ƒ and g are differentiable at t. Their time derivatives dx/dt and dy/dt are related by dy/dx via the chain rule:

$$\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}$$

Then, rearranging, its slope is given by

$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt}$$

Likewise, for a C² parametric function,

$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d(dy/dx)/dt}{dx/dt}$$

For example, consider the parametrized function

$$x = \sec t, \qquad y = \tan t, \qquad -\frac{\pi}{2} < t < \frac{\pi}{2}$$

Using the slope formula,

$$m = \frac{dy/dt}{dx/dt} = \frac{\sec^2 t}{\sec t \tan t} = \frac{\sec t}{\tan t} = \frac{1}{\sin t}$$

Choosing t = π/4,

$$m = \frac{1}{\sin(\pi/4)} = \sqrt{2}$$

Writing slope-intercept form from the point-slope form of the equation, using the point (√2, 1), the tangent line is given by

$$y - 1 = \sqrt{2}\,(x - \sqrt{2}) \quad\Longrightarrow\quad y = \sqrt{2}\,x - 1$$

[Figure: the curve with its tangent line at the point (√2, 1)]
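Since this computation is purely mechanical, it can be checked symbolically. The following is a minimal sketch (assuming Python with SymPy, which the text itself does not use); it reproduces the slope formula and the tangent line at t = π/4:

```python
import sympy as sp

t = sp.symbols('t')
x = sp.sec(t)          # x = sec t
y = sp.tan(t)          # y = tan t

# Parametric slope: dy/dx = (dy/dt)/(dx/dt)
slope = sp.simplify(sp.diff(y, t) / sp.diff(x, t))
print(slope)                    # 1/sin(t)

# Tangent line at t = pi/4, through the point (sqrt(2), 1)
t0 = sp.pi / 4
m = slope.subs(t, t0)           # sqrt(2)
X = sp.symbols('X')
tangent = sp.expand(m * (X - x.subs(t, t0)) + y.subs(t, t0))
print(tangent)                  # sqrt(2)*X - 1
```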

Arc Length I
Let's make a first pass at the metric by deriving the arc length integral. Begin by defining a smooth curve y = f(x) as continuously differentiable (C¹) at every point from x = a to x = b. Partition the interval [a, b] into n subintervals with a = x₀ < x₁ < x₂ < ... < xₙ = b. If yₖ = f(xₖ), then the corresponding points Pₖ(xₖ, yₖ) lie on the curve. Connect successive points Pₖ₋₁ and Pₖ with straight line segments that collectively form a polygonal approximation to the curve:

[Figure: the polygonal path A = P₀, P₁, ..., Pₙ = B approximating the curve over the partition a = x₀ < x₁ < ... < xₙ = b]

Since ∆xₖ = xₖ − xₖ₋₁ and ∆yₖ = yₖ − yₖ₋₁,

[Figure: the segment Lₖ from Pₖ₋₁ to Pₖ as the hypotenuse of legs ∆xₖ and ∆yₖ]

then any line segment along the polygonal path has length

$$L_k = \sqrt{(\Delta x_k)^2 + (\Delta y_k)^2}$$

The length of the curve is approximated by the sum

$$L \approx \sum_{k=1}^{n} L_k = \sum_{k=1}^{n} \sqrt{(\Delta x_k)^2 + (\Delta y_k)^2}$$

By the mean value theorem, there is a point cₖ with xₖ₋₁ < cₖ < xₖ at which a tangent lies parallel to the secant from xₖ₋₁ to xₖ, and such that

$$\Delta y_k = f'(c_k)\,\Delta x_k$$

Substituting,

$$L_k = \sqrt{1 + [f'(c_k)]^2}\;\Delta x_k$$

The limit of the sum approaches the integral as the norm of the partition goes to zero in each subinterval:

$$L = \lim_{n \to \infty} \sum_{k=1}^{n} \sqrt{1 + [f'(c_k)]^2}\;\Delta x_k = \int_a^b \sqrt{1 + [f'(x)]^2}\;dx$$

-the arc length of a curve y = f(x) in R² from the point A = (a, f(a)) to the point B = (b, f(b)). We can use the equivalent notation

$$L = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$


If y = f(x) and if f is C¹ on [a, b], then by the fundamental theorem of calculus, a new function, the distance function, or metric, s(x), may be defined:

$$s(x) = \int_a^x \sqrt{1 + [f'(u)]^2}\,du$$

Then the differential of arc length for y = f(x) is

$$ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$

In this way, every representation of arc length is some special case of the equation

$$L = \int ds$$

For example, by the fundamental theorem of calculus, it must also be true that

$$\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$

So the arc length differential is also

$$ds = \sqrt{dx^2 + dy^2}$$

The arc length differential ds "lives" in the curve, and it measures length intrinsic to it:

[Figure: ds as the hypotenuse of the differential triangle with legs dx and dy]

And, in R³, as we will see later in this chapter,

$$L = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}\;dt$$
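As a concrete check on the limiting process above, the polygonal sum and the arc length integral can be computed side by side. A minimal numerical sketch (Python with NumPy assumed; the curve y = x² on [0, 1] is an illustrative choice, not an example from the text):

```python
import numpy as np

# Arc length of y = x**2 on [0, 1]
f = lambda x: x**2

# Polygonal approximation: sum of segment lengths sqrt(dx^2 + dy^2)
x = np.linspace(0.0, 1.0, 10_001)
y = f(x)
poly = np.sum(np.hypot(np.diff(x), np.diff(y)))

# Riemann sum of the integrand sqrt(1 + f'(x)^2), with f'(x) = 2x,
# over the same partition (midpoint rule)
xm = (x[:-1] + x[1:]) / 2.0
quad = np.sum(np.sqrt(1.0 + (2.0 * xm) ** 2) * np.diff(x))

# Closed form: sqrt(5)/2 + asinh(2)/4
exact = np.sqrt(5) / 2 + np.arcsinh(2.0) / 4
print(poly, quad, exact)    # all ≈ 1.478943
```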

1.2 Vectors

Vectors
Distance in space is a logical extension of distance in the plane. Between points P₁(x₁, y₁, z₁) and P₂(x₂, y₂, z₂),

$$|P_1P_2| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$

where the bars indicate absolute value, making the distance always positive. The distance between points is numerically equivalent to the length of a line segment connecting those points. This length is numerically equivalent to magnitude. Direct the line segment by equating it to an identical line segment that has one endpoint at the origin, the initial point, and its other endpoint at the coordinates (v₁, v₂, v₃), its terminal point: the position vector. A vector v is a directed line segment

[Figure: the directed segment from P(x₁, y₁, z₁) to Q(x₂, y₂, z₂), and its equivalent position vector with terminal point (v₁, v₂, v₃)]

that has both magnitude

$$|\mathbf{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}$$

and direction, both of which are determined by the vector's components

$$\mathbf{v} = (v_1, v_2, v_3)$$

-ordered doubles, triples, or n-tuples of real numbers. A vector is thus a transformable geometric object, with components composed of scalar coefficients attached to basis vectors. Typeset vectors in boldface, or write them with little arrows overhead:

$$\mathbf{v} = \vec{v}$$

Or give them as a single-column matrix:

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

Vector Algebra
Let u = [u₁, u₂, u₃]ᵀ and v = [v₁, v₂, v₃]ᵀ. Define vector addition and subtraction as

$$\mathbf{u} \pm \mathbf{v} = \begin{bmatrix} u_1 \pm v_1 \\ u_2 \pm v_2 \\ u_3 \pm v_3 \end{bmatrix}$$

-operating between like components. The resultant vector is the diagonal of a parallelogram that has vectors u and v as its edges. For four vectors a, b, c, and d:

[Figure: a parallelogram with edges a and b; its diagonals are c = a + b and d = a − b]

A scalar is just a real number. For a scalar k and a vector u, define scalar multiplication as

$$k\mathbf{u} = \begin{bmatrix} ku_1 \\ ku_2 \\ ku_3 \end{bmatrix}$$

The value of the scalar affects the length and orientation of the resulting vector. For a vector a and a scalar λ,

[Figure: scalar multiples λa for λ = 1, λ > 1, 0 < λ < 1, and λ < 0]

1.3 Vector Functions

Limit of a Vector Function
A vector function r(t) defined on a domain D has limit L as t → t₀ if for every ε > 0 there is a δ > 0 such that for all t ∈ D,

$$0 < |t - t_0| < \delta \implies |\mathbf{r}(t) - \mathbf{L}| < \varepsilon$$

A vector function r(t) is continuous at a point t = t₀ in its domain if

$$\lim_{t \to t_0} \mathbf{r}(t) = \mathbf{r}(t_0)$$

Derivatives of Vector Functions and the Velocity Vector
The vector function r(t) = f(t)ê₁ + g(t)ê₂ + h(t)ê₃ has derivative [Dr(t)] at t if f, g, and h have derivatives at t. Then the derivative of vector function r(t) is

$$[D\mathbf{r}(t)] = \frac{df}{dt}\hat{e}_1 + \frac{dg}{dt}\hat{e}_2 + \frac{dh}{dt}\hat{e}_3$$

-the sum of the derivatives of its scalar coefficients. [Dr(t)] is the vector notation for the derivative matrix of r(t). For two points P, Q on a curve C, Q approaches P as ∆t → 0. At the limit, the vector PQ/∆t becomes the tangent vector r'(t):

[Figure: the secant vector r(t + ∆t) − r(t) approaching the tangent vector r'(t) as ∆t → 0]

-the velocity vector, v:

$$\mathbf{v} = \frac{d\mathbf{r}}{dt}$$

The derivative and integral rules for vector functions are the same as the rules for scalar functions, with the distinction that the derivatives and integrals distribute to components. The speed of a particle is the magnitude of its velocity vector, the magnitude of its change in position. It is therefore a scalar. Starting with a change in distance function s(t), by the mean value theorem, we have the speed as |v|:

$$|\mathbf{v}| = \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t} = \frac{ds}{dt} = \left|\frac{d\mathbf{r}}{dt}\right|$$

Acceleration I
The acceleration vector is the time derivative of the velocity vector:

$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{r}}{dt^2}$$

Arc Length II: The Metric
Integrating both sides of the speed definition

$$\int ds = \int \left|\frac{d\mathbf{r}}{dt}\right| dt$$

gives us the arc length of a smooth curve that's traced out by a vector function r(t):

$$L = \int_a^b |\mathbf{v}(t)|\,dt$$

This is just as we defined it at the beginning of the chapter, as the limit of the sum of the arc length differential measured along the curve:

$$L = \int ds = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}\;dt$$
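For a concrete instance of the speed integral, a helix works nicely, since its speed is constant. A small symbolic sketch (SymPy assumed; the helix is an illustrative choice, not an example from the text):

```python
import sympy as sp

t = sp.symbols('t')
# A helix: r(t) = (cos t, sin t, t)
r = sp.Matrix([sp.cos(t), sp.sin(t), t])

v = r.diff(t)                      # velocity vector dr/dt
speed = sp.simplify(v.norm())      # |v| = sqrt(sin^2 + cos^2 + 1) = sqrt(2)

# Arc length over one full turn: L = ∫ |v| dt from 0 to 2π
L = sp.integrate(speed, (t, 0, 2 * sp.pi))
print(speed, L)                    # sqrt(2), 2*sqrt(2)*pi
```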

Now, this is a subtle point, and crucial for developing an intuition of differential geometry. Intrinsic geometry is often flippantly compared to an ant being able to move about on a cable that to us looks one-dimensional:

[Figure: an ant walking along a cable]

But is this ant not able to walk about freely on a surface? Aside from the topological euphemism of a flat cylinder, this immediately raises questions concerning the dimensionality of an object upon scale transformation. In the theory of manifolds, it's important to take seriously the notion that, just as we are embedded in 3-space, there might likewise be similar geometric objects embedded in 2-space (Abbott's Flatland), 1-space (a curve), or spaces of any other dimension. We gain this notion from the metric, and it serves as the mathematical means by which we'll prove curvature to be intrinsic later on. By metric, we mean the same thing as the arc length that's developed intrinsic to the curve:

$$L = \lim_{n\to\infty}\sum_{k=1}^{n} L_k = \int ds$$

Crucially, all distances are then measured intrinsic to, or from within, the curve. But, as we'll see later, since arc length and the scalar product are related by

$$ds^2 = g_{ij}\,du^i\,du^j$$

similar to

$$|\mathbf{v}|^2 = \mathbf{v}\cdot\mathbf{v} = v_1^2 + v_2^2 + v_3^2$$

then not only distances, but angles also, are derived from this object, gᵢⱼ, the metric tensor. But all of geometry may be formed by measures of angles and distances; indeed, we can define a real geometry as those collections of invariant, isometric rotations and translations, or isometries, of a manifold. Then we must face the fact that there is some point of view that exists intrinsic to a curve, surface, volume (obviously), etc. Furthermore, and quite essential to our journey in this book, we must realize that if one were to live bound to a curve, stuck within its singular dimension, the globally curved nature of the space would vanish. As we earthbound surface dwellers know all too well, our globally curved planet seems, from our local point of view, quite flat. We'll seize upon this phenomenon in Riemannian geometry, interlinking it with the derivative of a curve and its locally flat tangent space.


1.4 Curvature and Torsion

Curvature
The instantaneous direction of a vector function v(t) may be given by

$$\mathbf{T} = \frac{\mathbf{v}}{|\mathbf{v}|}$$

-its unit tangent vector, T. The curvature κ (kappa) may be defined in terms of T, as the magnitude of the rate of change of the unit tangent vector with respect to each unit of length along the curve, i.e., with respect to the arc length differential:

$$\kappa = \left|\frac{d\mathbf{T}}{ds}\right|$$

[Figure: the unit tangent T turning as it moves from P₀ to P along the curve]

This form is appropriate for a curve parametrized by arc length. Where a parameter t is given, for a smooth curve r(t), κ may be rearranged to give the formula

$$\kappa = \frac{1}{|\mathbf{v}|}\left|\frac{d\mathbf{T}}{dt}\right|$$

If r is a differentiable vector function of t of constant length, then its derivative is at all times orthogonal to it:

$$\mathbf{r}\cdot\mathbf{r} = c^2 \;\implies\; \frac{d}{dt}(\mathbf{r}\cdot\mathbf{r}) = 2\,\mathbf{r}\cdot\frac{d\mathbf{r}}{dt} = 0$$

[Figure: a vector of constant length |r| = c sweeps a sphere; its derivative dr/dt is everywhere tangent to it, orthogonal to r]

Now, since T is a vector function of constant length (|T| = 1), the derivative dT/ds must be orthogonal to it. Since T is tangent to the curve, its orthogonal derivative must be normal (perpendicular) to the curve:

[Figure: the unit tangent T and unit normal N at points P₀, P₁, P₂ along the curve]

By definition, the length of dT/ds is equal to κ. Therefore, dividing by κ yields a unit vector normal to the unit tangent vector:

$$\mathbf{N} = \frac{1}{\kappa}\frac{d\mathbf{T}}{ds}$$

-the unit normal vector. Vector dT/ds points in the direction in which T turns along the curve as it bends, on the concave side. Its length is proportional to the curvature of the curve it is taken at, and it's by this length that we measure the curvature. For a smooth curve already given in some parameter t,

$$\mathbf{N} = \frac{d\mathbf{T}/dt}{\left|d\mathbf{T}/dt\right|}$$

Curvature can be defined by an osculating circle at a point P on a plane curve because, if the circle and plane curve share the same tangent at P, and the circle has a center lying on the curve's concave side, then the curvature of the circle and the curvature of the curve are equal:

[Figure: the circle of curvature at P(x, y), with its center and radius of curvature, sharing the tangent T and normal N of the curve]
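The definition κ = |dT/dt| / |v| can be tested on the simplest curved example: for a circle of radius R, the osculating circle is the circle itself, so the curvature should come out to 1/R. A minimal sketch (SymPy assumed; this code is illustrative, not part of the original text):

```python
import sympy as sp

t = sp.symbols('t', real=True)
R = sp.symbols('R', positive=True)

# A circle of radius R: its curvature should be 1/R
r = sp.Matrix([R * sp.cos(t), R * sp.sin(t), 0])

v = r.diff(t)                                      # velocity
T = sp.simplify(v / v.norm())                      # unit tangent T = v/|v|
kappa = sp.simplify(T.diff(t).norm() / v.norm())   # κ = |dT/dt| / |v|
print(kappa)                                       # 1/R
```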

Torsion
If T is orthogonal to N, then their vector product must be a unit vector orthogonal to them both:

$$\mathbf{B} = \mathbf{T} \times \mathbf{N}$$

-the unit binormal vector. Together, the three complete a moving frame, a right-handed, mutually orthogonal frame of vectors:

[Figure: the moving frame T, N, B = T × N at a point P₀ of a space curve]

They determine three planes: the osculating plane, determined by T and N; the normal plane, determined by N and B; and the rectifying plane, determined by B and T:

[Figure: the osculating, normal, and rectifying planes spanned by the frame T, N, B]

Now, since the vector B has a constant length, its derivative dB/ds must be orthogonal to it. It is also orthogonal to T, so it must be parallel to N:

[Figure: dB/ds lying along N, in the plane of curvature]

Since N is a unit vector, dB/ds must be a scalar multiple of N,

$$\frac{d\mathbf{B}}{ds} = -\tau\,\mathbf{N}$$

-where τ is the torsion. It measures how sharply the curve twists out of the plane of curvature (the osculating plane), i.e., the rotation of the binormal vector at the given point. Thus, both the curvature and torsion are scalar measurements of these properties of our curve, a 1-dimensional manifold. Take the scalar product of both sides of the binormal derivative with N:

$$\frac{d\mathbf{B}}{ds}\cdot\mathbf{N} = -\tau\,\mathbf{N}\cdot\mathbf{N} = -\tau$$

Then the torsion τ of a space curve is defined as

$$\tau = -\frac{d\mathbf{B}}{ds}\cdot\mathbf{N}$$

Acceleration II
By the chain rule, dr/ds is

$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}/dt}{ds/dt} = \frac{\mathbf{v}}{|\mathbf{v}|} = \mathbf{T}$$

Then

$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{ds}{dt}\,\mathbf{T}$$

Thus

$$\frac{d\mathbf{T}}{dt} = \frac{d\mathbf{T}}{ds}\frac{ds}{dt} = \kappa\,\frac{ds}{dt}\,\mathbf{N}$$

confirming our earlier formula for N. Since v = dr/dt, the total acceleration vector is

$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d^2s}{dt^2}\,\mathbf{T} + \kappa\left(\frac{ds}{dt}\right)^2\mathbf{N} = a_t\,\mathbf{T} + a_n\,\mathbf{N}$$

-the sum of its normal and tangential components:

$$a_n = \kappa\left(\frac{ds}{dt}\right)^2, \qquad a_t = \frac{d^2s}{dt^2}$$

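The tangential and normal components can be checked symbolically, reusing the helix from the earlier sketch (SymPy assumed; this is an illustration, not an example from the text):

```python
import sympy as sp

t = sp.symbols('t', real=True)
r = sp.Matrix([sp.cos(t), sp.sin(t), t])   # the helix again

v = r.diff(t)
a = v.diff(t)

speed = sp.simplify(v.norm())              # ds/dt = sqrt(2)
a_t = sp.diff(speed, t)                    # tangential component d^2s/dt^2 = 0

T = sp.simplify(v / speed)                 # unit tangent
kappa = sp.simplify(T.diff(t).norm() / speed)   # curvature = 1/2
a_n = sp.simplify(kappa * speed**2)        # normal component κ(ds/dt)^2 = 1

# Check that |a|^2 = a_t^2 + a_n^2
print(a_t, a_n, sp.simplify(a.norm()**2 - (a_t**2 + a_n**2)))   # 0, 1, 0
```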











































Chapter Two Rⁿ 2.1 Geometry of R³: Lines. Distance From a Point to a Line. Planes. Line of Intersection Between Two Planes. Distance From a Point to a Plane. Angle Between Planes. 2.2 Partial Derivatives: Functions of n Independent Variables. Contours. Partial Derivatives. The Chain Rule II. 2.3 Tangents, Normals, and Directional Derivatives: Directional Derivatives. Tangents and Normals.

2.1 Geometry of R³ Rⁿ (or, as a vector space, just bold, Rⁿ) is the Cartesian space of real coordinates, and automatically satisfies the conditions of a vector space (§4.3). Let's redefine some important geometrical quantities in the vector space of R³.

Lines Suppose that L is a line in R³ passing through the point P₀ = (x₀, y₀, z₀) parallel to the vector v = x'ê₁ + y'ê₂ + z'ê₃. Then L is the set of all points P(x, y, z) for which P₀P is parallel to v. Thus, for some point on the line, P₀P = tv:

$$\mathbf{r}(t) = \mathbf{r}_0 + t\,\mathbf{v}$$

-the vector equation of the line L through P₀ parallel to v. The points P and P₀ have position vectors r(t) and r₀:

[Figure: the line through P₀(x₀, y₀, z₀) and P(x, y, z) parallel to v]

The vector equation of the line is the origin of the often used standard equation of particle motion:

$$\mathbf{r}(t) = \underbrace{\mathbf{r}_0}_{\text{initial position}} + \; t\,\underbrace{|\mathbf{v}|}_{\text{speed}}\,\underbrace{\hat{\mathbf{v}}}_{\text{direction}}$$

Distance from a Point to a Line
This is just a straightforward use of trigonometry: the distance from a point S to the line through P parallel to v is

$$d = |\vec{PS}|\sin\theta = \frac{|\vec{PS} \times \mathbf{v}|}{|\mathbf{v}|}$$

Planes
The plane through P₀(x₀, y₀, z₀) normal to n = aê₁ + bê₂ + cê₃ is

$$\mathbf{n}\cdot\vec{P_0P} = 0 \quad\Longrightarrow\quad a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$

[Figure: the plane through P₀(x₀, y₀, z₀) with normal n, containing a general point P(x, y, z)]

If a point P and a normal vector n are given, just substitute into the formula. For example, the plane through P₀(–3, 0, 7) perpendicular to n = 5ê₁ + 2ê₂ – ê₃ is

$$5(x + 3) + 2(y - 0) - (z - 7) = 0 \quad\Longrightarrow\quad 5x + 2y - z = -22$$

Any three points will lie in a distinct plane. Find the normal by taking the cross product of two displacement vectors running from one point to the others. The vector normal to the plane through points A(0, 0, 1), B(2, 0, 0), C(0, 3, 0) is

$$\mathbf{n} = \vec{AB} \times \vec{AC} = (2, 0, -1) \times (0, 3, -1) = 3\hat{e}_1 + 2\hat{e}_2 + 6\hat{e}_3$$

Then, choosing any of the given points, the equation for the plane is

$$3x + 2y + 6z = 6$$
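The cross-product construction above is easy to verify numerically. A minimal sketch (NumPy assumed; illustrative, not from the text), reproducing the normal and the plane through A(0, 0, 1), B(2, 0, 0), C(0, 3, 0):

```python
import numpy as np

A = np.array([0.0, 0.0, 1.0])
B = np.array([2.0, 0.0, 0.0])
C = np.array([0.0, 3.0, 0.0])

n = np.cross(B - A, C - A)     # normal vector to the plane through A, B, C
d = np.dot(n, A)               # plane: n . x = d

print(n, d)                    # [3. 2. 6.] 6.0  ->  3x + 2y + 6z = 6
```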

The Line of Intersection of Two Planes
Two planes are parallel if their normals are parallel: n₁ = kn₂. If they are not parallel, then they intersect in a line that is perpendicular to the normals of both planes:

$$\mathbf{v} = \mathbf{n}_1 \times \mathbf{n}_2$$

The normals of each plane are just the real coefficients attached to their unit vectors. For example, the vector parallel to the line of intersection of planes 3x – 6y – 2z = 15 and 2x + y – 2z = 5 is

$$\mathbf{v} = (3, -6, -2) \times (2, 1, -2) = 14\hat{e}_1 + 2\hat{e}_2 + 15\hat{e}_3$$

A parametric equation for this line may be given by solving the system of equations for the two planes, identifying a point on their line of intersection, such as (3, −1, 0). Then the parametric components are

$$x = 3 + 14t, \qquad y = -1 + 2t, \qquad z = 15t$$

Distance from a Point to a Plane
If a point P is on a plane with normal n, then the distance from any point Q to the plane is the length of the vector projection of PQ onto n:

$$d = \left|\vec{PQ}\cdot\frac{\mathbf{n}}{|\mathbf{n}|}\right|$$

For example, let Q be the point Q(1, 1, 3) outside of the plane 3x + 2y + 6z = 6, and P any point in the plane. The easiest points to find are the intercepts the plane makes with the coordinate axes:

[Figure: the plane 3x + 2y + 6z = 6 with normal n = 3ê₁ + 2ê₂ + 6ê₃, intercepts (2, 0, 0), (0, 3, 0), (0, 0, 1), and the distance to Q(1, 1, 3)]

-(2, 0, 0), (0, 3, 0), and (0, 0, 1). If P is taken to be (0, 3, 0), then

$$d = \left|(1, -2, 3)\cdot\frac{(3, 2, 6)}{7}\right| = \frac{|3 - 4 + 18|}{7} = \frac{17}{7}$$

Angle Between Two Planes
The angle between two intersecting planes is defined to be the acute angle between their normal vectors:

$$\theta = \cos^{-1}\!\left(\frac{\mathbf{n}_1\cdot\mathbf{n}_2}{|\mathbf{n}_1||\mathbf{n}_2|}\right)$$

[Figure: the angle θ between two planes equals the angle between their normals n₁ and n₂]

For example, the angle between the planes 3x – 6y – 2z = 15 and 2x + y – 2z = 5 is found by

$$\theta = \cos^{-1}\!\left(\frac{(3, -6, -2)\cdot(2, 1, -2)}{(7)(3)}\right) = \cos^{-1}\!\left(\frac{4}{21}\right) \approx 1.38 \text{ rad} \approx 79^\circ$$
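The three computations of this section (intersection direction, plane angle, and point-to-plane distance) reduce to a few lines of vector arithmetic. A minimal sketch (NumPy assumed; illustrative, not from the text), using the same planes and points as the examples above:

```python
import numpy as np

n1 = np.array([3.0, -6.0, -2.0])   # normal of 3x - 6y - 2z = 15
n2 = np.array([2.0, 1.0, -2.0])    # normal of 2x + y - 2z = 5

# Direction of the planes' line of intersection
v = np.cross(n1, n2)
print(v)                           # [14.  2. 15.]

# Acute angle between the planes
cos_theta = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
print(np.degrees(np.arccos(cos_theta)))   # ~79.0 degrees

# Distance from Q(1, 1, 3) to the plane 3x + 2y + 6z = 6, using P(0, 3, 0)
n = np.array([3.0, 2.0, 6.0])
P = np.array([0.0, 3.0, 0.0])
Q = np.array([1.0, 1.0, 3.0])
d = abs(np.dot(Q - P, n)) / np.linalg.norm(n)
print(d)                           # 17/7 ≈ 2.4286
```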

2.2 Partial Derivatives

Functions of n Independent Variables
If D is a set of real n-tuples (q₁, q₂, ..., qₙ), then f is a real, well-defined multivariable function on D if f is a rule that assigns to each element in D a unique real number

$$w = f(q_1, q_2, \ldots, q_n)$$

where D is the domain of f and the set of w-values taken on is its range.

Contours
The set of points in the plane where a function f(x, y) has a constant value f(x, y) = c is a level curve of f. The set of points (x, y, z) in space where a function of three independent variables has a constant value f(x, y, z) = c is a level surface of f:

[Figure: the graph of z = f(x, y) = 100 − x² − y². The contour curve f(x, y) = 75 is the circle x² + y² = 25 in the plane z = 75; the corresponding level curve is the same circle in the xy-plane. Level curves are shown for f(x, y) = 0, 52, and 75.]

The curve in space in which the plane z = c cuts a surface z = f(x, y) is the contour curve. For example, a geologic map of contour curves is a topographical map.

Partial Derivatives
Partial derivatives are a straightforward generalization of derivatives in a single variable: differentiate with respect to a single variable while treating the other variables as constants. The partial derivative of f(x, y) with respect to x at the point (x₀, y₀) is

$$\frac{\partial f}{\partial x}\bigg|_{(x_0,\, y_0)} = \lim_{h \to 0}\frac{f(x_0 + h,\, y_0) - f(x_0,\, y_0)}{h}$$

The mixed derivative theorem states that, if f(x, y) and its partial derivatives are defined and continuous throughout an open region containing the point (a, b), then

$$f_{xy}(a, b) = f_{yx}(a, b)$$

-i.e., the order in which the partial derivatives are taken does not matter.

The Chain Rule II
In a single variable, the chain rule states that when w = f(x) is a differentiable function of x and x = g(t) is a differentiable function of t, then w is a differentiable function of t by the composite function w(t) = f ∘ g = f(g(t)), and dw/dt may be calculated by the formula

$$\frac{dw}{dt} = \frac{dw}{dx}\frac{dx}{dt}$$

In the composite w(t) = f(g(t)), t is the independent variable, w is the dependent variable, and x = g(t) is the intermediate variable, since t determines the value of x by way of function g, and x then determines the value of w by way of function f. Use a branch diagram:

[Figure: a branch diagram from the dependent variable w through the intermediate variable x down to the independent variable t]

Moving down the branch generates partial derivatives, and their product is the formula. In two variables, we have simply two branches instead of one, and moving down the branches generates products with more intermediate variables, and the formula is now the sum of those products:

If w = f(x, y) is differentiable and if x = x(t), y = y(t) are differentiable functions of t, then the composite w = f(x(t), y(t)) is a differentiable function of t and

$$\frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$

Proof: Let w = f(x, y) and let ∆x, ∆y, and ∆t be the increments that result from changing t from t₀ to t₀ + ∆t. By the mean value theorem,

$$\Delta w = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y, \qquad \varepsilon_1, \varepsilon_2 \to 0 \text{ as } \Delta x, \Delta y \to 0$$

Divide by ∆t and let ∆t → 0:

$$\frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \varepsilon_1\frac{dx}{dt} + \varepsilon_2\frac{dy}{dt} \;\longrightarrow\; \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$

A similar theorem and proof may be given for three or more variables.
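The multivariable chain rule can be verified directly by comparing it against differentiation of the composite. A minimal sketch (SymPy assumed; the particular f, x(t), and y(t) are arbitrary illustrations, not examples from the text):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')

f = x**2 * y            # an arbitrary differentiable f(x, y)
xt = sp.cos(t)          # x = x(t)
yt = sp.sin(t)          # y = y(t)

# Chain rule: dw/dt = f_x * dx/dt + f_y * dy/dt
chain = sp.diff(f, x) * sp.diff(xt, t) + sp.diff(f, y) * sp.diff(yt, t)
chain = chain.subs({x: xt, y: yt})

# Direct differentiation of the composite w(t) = f(x(t), y(t))
direct = sp.diff(f.subs({x: xt, y: yt}), t)

print(sp.simplify(chain - direct))   # 0: the two agree
```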

2.3 Tangents, Normals, and Directional Derivatives

Directional Derivatives
Suppose that the function f(x, y) is defined throughout a region R in the xy-plane, that P₀(x₀, y₀) is a point in R, and that u = u₁ê₁ + u₂ê₂ is a unit vector. Then the equations

$$x = x_0 + s\,u_1, \qquad y = y_0 + s\,u_2$$

parametrize the line through P₀ parallel to u. If parameter s measures arc length from P₀ in the direction of u, then df/ds measures the rate of change of f at P₀ in the direction of u:

The directional derivative of f in the direction of u is the derivative of f at P₀(x₀, y₀) in the direction of the unit vector u = u₁ê₁ + u₂ê₂ and is the number

$$(D_{\mathbf{u}}f)\Big|_{P_0} = \lim_{s \to 0}\frac{f(x_0 + s u_1,\, y_0 + s u_2) - f(x_0,\, y_0)}{s}$$

A definition that also applies in R³:

[Figure: the surface z = f(x, y) above the line through P₀(x₀, y₀) in the direction u = u₁ê₁ + u₂ê₂; the increment f(x₀ + su₁, y₀ + su₂) − f(x₀, y₀) is measured up to the point Q on the surface]

The derivatives of the line through P₀ in the direction of u are

$$\frac{dx}{ds} = u_1, \qquad \frac{dy}{ds} = u_2$$

Then by the chain rule,

$$\frac{df}{ds} = \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} = \underbrace{\left(\frac{\partial f}{\partial x}\hat{e}_1 + \frac{\partial f}{\partial y}\hat{e}_2\right)}_{\text{gradient of } f \text{ at } P_0} \cdot \underbrace{\left(u_1\hat{e}_1 + u_2\hat{e}_2\right)}_{\text{direction } \hat{u}}$$

-the directional derivative is the scalar product of the gradient of a function f and some unit vector u:

$$D_{\mathbf{u}}f = \nabla f \cdot \mathbf{u} = |\nabla f|\cos\theta$$

The direction of a vector v is the unit vector obtained by dividing it by its length. The gradient of f is the vector

$$\nabla f = \frac{\partial f}{\partial x}\hat{e}_1 + \frac{\partial f}{\partial y}\hat{e}_2$$

The function f increases most rapidly in the direction of the gradient vector of f at P, when cos(θ) = 1:

$$D_{\mathbf{u}}f = |\nabla f|$$

It decreases most rapidly in the opposite direction, when cos(θ) = –1:

$$D_{\mathbf{u}}f = -|\nabla f|$$

It does not change in any orthogonal direction, when cos(θ) = 0:

$$D_{\mathbf{u}}f = 0$$

[Figure: at the point (1, 1), ∇f = ê₁ + ê₂ points in the direction of most rapid increase, −∇f in the direction of most rapid decrease, and f does not change along the orthogonal directions]

For example, for the function in the diagram above, f(x, y) = (x²/2) + (y²/2) increases most rapidly in the direction of ∇f. At the point (1, 1),

$$\nabla f = x\hat{e}_1 + y\hat{e}_2\Big|_{(1, 1)} = \hat{e}_1 + \hat{e}_2, \qquad \hat{u} = \frac{\nabla f}{|\nabla f|} = \frac{1}{\sqrt{2}}\hat{e}_1 + \frac{1}{\sqrt{2}}\hat{e}_2$$

It decreases most rapidly in the opposite direction:

$$-\hat{u} = -\frac{1}{\sqrt{2}}\hat{e}_1 - \frac{1}{\sqrt{2}}\hat{e}_2$$
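These steepest ascent and descent directions can be confirmed numerically. A minimal sketch (NumPy assumed; illustrative, not from the text) for the same f(x, y) = x²/2 + y²/2 at the point (1, 1):

```python
import numpy as np

# f(x, y) = x**2/2 + y**2/2, with gradient grad f = (x, y)
grad_f = lambda x, y: np.array([x, y])

g = grad_f(1.0, 1.0)               # gradient at (1, 1): [1. 1.]
u = g / np.linalg.norm(g)          # direction of steepest ascent

# Directional derivative D_u f = grad f . u equals |grad f| along u
print(np.dot(g, u), np.linalg.norm(g))    # both sqrt(2) ≈ 1.4142

# In the orthogonal direction the function does not change
w = np.array([-u[1], u[0]])
print(np.dot(g, w))                       # 0.0
```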

The gradient vector is normal to a level curve of its function: if a differentiable function f(x, y) has a constant value c along a smooth curve r = g(t)ê₁ + h(t)ê₂, then f(g(t), h(t)) = c. Differentiating both sides of this equation with respect to t leads to the equations

$$\frac{d}{dt}f(g(t), h(t)) = \frac{\partial f}{\partial x}\frac{dg}{dt} + \frac{\partial f}{\partial y}\frac{dh}{dt} = \nabla f \cdot \frac{d\mathbf{r}}{dt} = 0$$

-the orthogonal gradient theorem.

Tangents and Normals
A line through a point P₀(x₀, y₀) normal to a vector N = Aê₁ + Bê₂ has the equation

$$A(x - x_0) + B(y - y_0) = 0$$

If N is the gradient, then

$$\mathbf{N} = \nabla f\Big|_{(x_0,\, y_0)} = f_x(x_0, y_0)\,\hat{e}_1 + f_y(x_0, y_0)\,\hat{e}_2$$

and the line normal to it, the tangent line to a level curve, is

$$f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) = 0$$

For example, consider the ellipse

$$f(x, y) = \frac{x^2}{4} + y^2 = 2$$

Suppose the tangent line is at (–2, 1). The gradient of f at (–2, 1) is

$$\nabla f = \frac{x}{2}\hat{e}_1 + 2y\,\hat{e}_2\bigg|_{(-2,\, 1)} = -\hat{e}_1 + 2\hat{e}_2$$

Then the tangent line is

$$-(x + 2) + 2(y - 1) = 0 \quad\Longrightarrow\quad x - 2y = -4$$

[Figure: the ellipse x²/4 + y² = 2 with gradient ∇f(−2, 1) = −ê₁ + 2ê₂ and tangent line x − 2y = −4 at the point (−2, 1)]

If r(t) is a smooth curve on the level surface f(x, y, z) = c of a differentiable function f, then

$$\frac{d}{dt}f(\mathbf{r}(t)) = \nabla f \cdot \frac{d\mathbf{r}}{dt} = 0$$

The tangent plane at the point P₀(x₀, y₀, z₀) on the level surface f(x, y, z) = c of a differentiable function f is the plane through P₀ normal to ∇f at P₀:

$$f_x(P_0)(x - x_0) + f_y(P_0)(y - y_0) + f_z(P_0)(z - z_0) = 0$$

The normal line of the surface at P₀ is the line through P₀ parallel to ∇f at P₀:

$$x = x_0 + f_x(P_0)\,t, \qquad y = y_0 + f_y(P_0)\,t, \qquad z = z_0 + f_z(P_0)\,t$$

For example, the tangent plane to the surface f(x, y, z) = x² + y² + z – 9 = 0 at the point (1, 2, 4) is

$$2(x - 1) + 4(y - 2) + (z - 4) = 0 \quad\Longrightarrow\quad 2x + 4y + z = 14$$

The normal line at P₀ is therefore

$$x = 1 + 2t, \qquad y = 2 + 4t, \qquad z = 4 + t$$
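The gradient, tangent plane, and normal line of this example can be generated mechanically. A minimal sketch (SymPy assumed; illustrative, not from the text), reproducing the computation at (1, 2, 4):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
f = x**2 + y**2 + z - 9            # level surface f(x, y, z) = 0
P0 = {x: 1, y: 2, z: 4}

grad = [sp.diff(f, s).subs(P0) for s in (x, y, z)]
print(grad)                        # [2, 4, 1]

# Tangent plane: grad f . (X - P0) = 0
plane = sum(g * (s - P0[s]) for g, s in zip(grad, (x, y, z)))
print(sp.expand(plane))            # 2*x + 4*y + z - 14

# Normal line: P0 + t * grad f
line = [P0[s] + g * t for g, s in zip(grad, (x, y, z))]
print(line)                        # [2*t + 1, 4*t + 2, t + 4]
```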









































Chapter Three The Integral Theorems of Vector Calculus 3.1 Iterated Integrals: Double Integrals. Triple Integrals. Change of Variables. 3.2 Vector Fields: Vector Fields. Line Integrals. Del Operator. 3.3 Integration in Vector Fields: Work, Flow, Circulation, and Flux. Parametrization of Surfaces. Parametric Surface Area and the Surface Differential. Implicit Surface Area. Explicit Surface Area. Surface Integrals. The Integral Theorems of Vector Calculus.

3.1 Iterated Integrals

Double Integrals
If f(x, y) is a positive function over a rectangular region R in the xy-plane, the double integral of f over R is the volume of the 3-dimensional solid region over the xy-plane bounded below by R and above by the surface z = f(x, y):

$$V = \iint_R f(x, y)\,dA$$

Each term f(xₖ, yₖ)∆Aₖ in the sum Sₙ = ∑ f(xₖ, yₖ)∆Aₖ is the volume of a vertical rectangular box that approximates the volume of the portion of the solid standing directly above the base ∆Aₖ. The sum Sₙ thus approximates the total volume of the solid as ∆Aₖ → 0 and n → ∞:

[Figure: box approximations to the volume under z = f(x, y), refined through n = 16, 64, and 256 boxes]

giving

$$\lim_{n \to \infty} S_n = \iint_R f(x, y)\,dA$$

-Fubini's iterated integral theorem, as it applies to a rectangular region in the plane. If f(x, y) is continuous throughout the region R: a ≤ x ≤ b, c ≤ y ≤ d, then

$$\iint_R f(x, y)\,dA = \int_c^d\!\!\int_a^b f(x, y)\,dx\,dy = \int_a^b\!\!\int_c^d f(x, y)\,dy\,dx$$

In other words, double integrals may be evaluated as iterated integrals. If y is made to be a function of x, g(x), then the shape is less restrictive and, generally, any shape may be cut by a plane and the volume of each resulting half found by a limiting case of slices in the independent variable x:

We can thus rephrase Fubini's theorem: if R is defined by a ≤ x ≤ b, g₁(x) ≤ y ≤ g₂(x) (the circle in the plane lying directly below the half sphere in the diagram above), then

$$\iint_R f(x, y)\,dA = \int_a^b\!\!\int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx$$

Notice that the outside integral with respect to x is integrated last because x is the independent variable: it's needed to first calculate the value of the first, inner integral. The double integral transforms into its polar form by not only a transformation of the x and y bounds to bounds in terms of r and θ, but also by supplying an extra factor of r:

$$\iint_R f(x, y)\,dA = \int_\alpha^\beta\!\!\int_{g_1(\theta)}^{g_2(\theta)} f(r, \theta)\;r\,dr\,d\theta$$
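The order-independence asserted by Fubini's theorem is easy to check on a concrete integrand. A minimal sketch (SymPy assumed; the integrand and rectangle are arbitrary illustrations, not examples from the text):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * y**2 + 1                 # an arbitrary continuous integrand

# Integrate over the rectangle 0 <= x <= 2, 1 <= y <= 3, in both orders
dy_dx = sp.integrate(sp.integrate(f, (y, 1, 3)), (x, 0, 2))
dx_dy = sp.integrate(sp.integrate(f, (x, 0, 2)), (y, 1, 3))

print(dy_dx, dx_dy)              # 64/3 in both orders
```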

We'll explore the origin of quantities like r from a few different perspectives by the end of this book, beginning with the following geometrical argument. Consider a function of two variables f(r, θ), defined over a region R bounded by the rays θ = α and θ = β and by the continuous curves r = g₁(θ) and r = g₂(θ). Suppose that 0 ≤ g₁(θ) ≤ g₂(θ) ≤ a for every value of θ between α and β. Then R lies in a fan-shaped region Q defined by the inequalities 0 ≤ r ≤ a and α ≤ θ ≤ β, inducing a partition of polar rectangles, shapes with constant r and θ values:

[Figure: the region R between rays θ = α and θ = β and curves r = g₁(θ), r = g₂(θ), partitioned into polar rectangles ∆Aₖ of radial width ∆r and angular width ∆θ]

If the polar rectangles have area ∆A₁, ∆A₂,..., ∆Aₙ, and if (rₖ, θₖ) is any point in the kth polar rectangle, then the sum

$$S_n = \sum_{k=1}^{n} f(r_k, \theta_k)\, \Delta A_k$$

converges, as the partition is refined, to the double integral

$$\lim_{n \to \infty} S_n = \iint_R f(r, \theta)\, dA$$

The area of a wedge-shaped circle sector with radius r and angle θ is

$$A = \tfrac{1}{2}\theta r^2$$

Let rₖ be the average of the radii of the inner and outer arcs bounding the kth polar rectangle ∆Aₖ. Then the radii of the inner and outer arcs bounding ∆Aₖ are rₖ – ∆r/2 and rₖ + ∆r/2:

[Figure: the polar rectangle ∆Aₖ between arcs of radii rₖ – ∆r/2 and rₖ + ∆r/2, subtending the angle ∆θ.]

Then

$$\Delta A_k = \tfrac{1}{2}\Delta\theta\left(r_k + \tfrac{\Delta r}{2}\right)^2 - \tfrac{1}{2}\Delta\theta\left(r_k - \tfrac{\Delta r}{2}\right)^2 = r_k\, \Delta r\, \Delta\theta$$

Integrating as the limit of the sum, we have our extra factor of r:

$$\lim_{n \to \infty} \sum_{k=1}^{n} f(r_k, \theta_k)\, r_k\, \Delta r\, \Delta\theta = \int_\alpha^\beta \int_{g_1(\theta)}^{g_2(\theta)} f(r, \theta)\, r\, dr\, d\theta$$
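The extra factor of r is easy to sanity-check with a computer algebra system. A minimal SymPy sketch, integrating the constant function f = 1 over a disk of radius a in polar coordinates:

```python
import sympy as sp

r, theta, a = sp.symbols('r theta a', positive=True)
# With the factor r, the integral recovers the area of the disk, pi*a**2.
area = sp.integrate(r, (r, 0, a), (theta, 0, 2*sp.pi))
print(area)   # pi*a**2
```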

Triple Integrals Let F(x, y, z) be a function defined over a closed, bounded region D in space. Partition a rectangular box-like region containing D into rectangular cells. Order the cells from 1 to n with the kth cell having dimensions ∆xₖ × ∆yₖ × ∆zₖ, and volume ∆Vₖ = ∆xₖ∆yₖ∆zₖ.

Let

$$S_n = \sum_{k=1}^{n} F(x_k, y_k, z_k)\, \Delta V_k$$

Then the rectangular triple integral of F over D is

$$\lim_{n \to \infty} S_n = \iiint_D F(x, y, z)\, dV = \iiint_D F\, dx\, dy\, dz$$

Similarly, let

$$S_n = \sum_{k=1}^{n} f(r_k, \theta_k, z_k)\, \Delta z\; r_k\, \Delta r\, \Delta\theta$$

Then the cylindrical triple integral is the limit of the sum of wedge-shaped partitions:

$$\lim_{n \to \infty} S_n = \iiint_D f(r, \theta, z)\, dz\; r\, dr\, d\theta$$

[Figure: the cylindrical volume element ∆V = r ∆θ ∆r ∆z, a wedge with sides ∆r, r∆θ, and ∆z.]

Finally, let

$$S_n = \sum_{k=1}^{n} F(\rho_k, \phi_k, \theta_k)\, \rho_k^2 \sin\phi_k\, \Delta\rho\, \Delta\phi\, \Delta\theta$$

Then the spherical triple integral is the limit of the sum of doubly wedge-shaped spherical partitions:

$$\lim_{n \to \infty} S_n = \iiint_D F(\rho, \phi, \theta)\, \rho^2 \sin\phi\, d\rho\, d\phi\, d\theta$$

[Figure: the spherical volume element with sides ∆ρ, ρ∆ϕ, and ρ sin ϕ ∆θ.]

Change of Variables Recall in the single variable case that an integral substitution is made by composing two functions and multiplying by the new function's derivative:

$$\int_{g(a)}^{g(b)} f(x)\, dx = \int_a^b f(g(u))\, g'(u)\, du, \qquad x = g(u)$$

The multivariate case replaces the single variable derivative g'(u) in the integrand with the derivative in several variables, the derivative matrix, which maps a little piece of one function to a little piece of another. Define a bijective transformation from a region G in (u, v, w)-space to a region D in (x, y, z)-space by the differentiable equations

$$x = x(u, v, w), \quad y = y(u, v, w), \quad z = z(u, v, w)$$

Then the change of variables formula is

$$\iiint_D F(x, y, z)\, dx\, dy\, dz = \iiint_G F(x(u, v, w),\, y(u, v, w),\, z(u, v, w))\, |J|\, du\, dv\, dw$$

where |J| is the Jacobian, the determinant of the first derivative matrix of coordinate functions. The derivative matrix of a function f is the m × n matrix composed of the partial derivatives of f evaluated at a:

$$[\mathbf{Df}(\mathbf{a})] = \left[ \frac{\partial f_i}{\partial x_j}(\mathbf{a}) \right]$$

where the vector components of the vector function f are f = (f₁, f₂, ..., fₘ).

When we use the first derivative matrix to change variables, we call it a Jacobian matrix. Its signed form takes on the role of pushing forward vectors and pulling back differential forms in the language of differential geometry, and thus plays a key role in higher dimensional mathematics. Because it's just the first derivative matrix, its form is simple: every row is just a 0-form fᵢ, and every column is just a vector basis. The Jacobian relates the two: every column has m rows of 0-forms fₘ, and every row has n columns of vector bases ∂/∂xⁿ. In Rⁿ,

$$J = \frac{\partial(f_1, \ldots, f_m)}{\partial(x_1, \ldots, x_n)} = \begin{bmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n \end{bmatrix}$$

For example, with x = r cos θ, y = r sin θ, z = z, the Jacobian matrix that relates rectangular to cylindrical coordinates is given by

$$J = \frac{\partial(x, y, z)}{\partial(r, \theta, z)} = \begin{bmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

We call the determinant of this matrix the Jacobian. The Jacobian replaces the single derivative g'(u) in the single variable case of a change of variables (substitution, Appendix E). For example, we can relate the rectangular and cylindrical triple integrals by their Jacobian:

$$\iiint_D F(x, y, z)\, dx\, dy\, dz = \iiint_G F(r\cos\theta,\, r\sin\theta,\, z)\, \left| \frac{\partial(x, y, z)}{\partial(r, \theta, z)} \right| dr\, d\theta\, dz$$

From a quick calculation of the determinant |J| we get our extra factor of r in the cylindrical change of coordinates:

$$|J| = \cos\theta\,(r\cos\theta) - (-r\sin\theta)\sin\theta = r\cos^2\theta + r\sin^2\theta = r$$

Similarly, calculation of |∂(x, y, z)/∂(ρ, ϕ, θ)|, the Jacobian relating rectangular and spherical coordinates, gives us the extra factor of ρ²sin ϕ in the spherical integral:

$$\left| \frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} \right| = \rho^2 \sin\phi$$

Thus, the spherical change of coordinates is given by

$$\iiint_D F\, dx\, dy\, dz = \iiint_G F(\rho\sin\phi\cos\theta,\; \rho\sin\phi\sin\theta,\; \rho\cos\phi)\, \rho^2 \sin\phi\, d\rho\, d\phi\, d\theta$$
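Both Jacobian determinants can be checked mechanically. A short SymPy sketch, illustrative rather than part of the text's derivation:

```python
import sympy as sp

r, th, z, rho, phi = sp.symbols('r theta z rho phi', positive=True)

cyl = sp.Matrix([r*sp.cos(th), r*sp.sin(th), z])
print(sp.simplify(cyl.jacobian([r, th, z]).det()))       # r

sph = sp.Matrix([rho*sp.sin(phi)*sp.cos(th),
                 rho*sp.sin(phi)*sp.sin(th),
                 rho*sp.cos(phi)])
print(sp.simplify(sph.jacobian([rho, phi, th]).det()))   # rho**2*sin(phi)
```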

3.2 Vector Fields Vector Fields A vector field is a vector-valued function of scalars that assigns a vector to every point in space:

$$\mathbf{F}(x, y, z) = M(x, y, z)\hat{e}_1 + N(x, y, z)\hat{e}_2 + P(x, y, z)\hat{e}_3$$

For example, consider the vector field F(x, y) = (sin y, sin x):

[Figure: the plane vector field F(x, y) = (sin y, sin x).]

Similarly, a gradient field of a differentiable function f(x, y, z) is the field of gradient vectors

$$\nabla f = \frac{\partial f}{\partial x}\hat{e}_1 + \frac{\partial f}{\partial y}\hat{e}_2 + \frac{\partial f}{\partial z}\hat{e}_3$$

where at each point there is associated a vector that points in the direction of greatest increase in f, and whose magnitude is the value of the directional derivative at that point in the same direction.

Line Integrals A line integral is an integral evaluated along a curve. For a continuous function f defined on a curve C given parametrically by r(t) = g(t)ê₁ + h(t)ê₂ + k(t)ê₃, a ≤ t ≤ b, the line integral of f over C is

$$\int_C f\, ds = \int_a^b f(g(t), h(t), k(t))\, |\mathbf{r}'(t)|\, dt$$

Parametrized in time, the line integral of a scalar field is then

$$\int_C f\, ds = \int_a^b f(\mathbf{r}(t))\, |\mathbf{r}'(t)|\, dt$$

where r : [a, b] → C is any arbitrary bijective parametrization of the curve C such that r(a) and r(b) give the endpoints of C and a < b. The line integral over a scalar field can be thought of as the area under the curve C along

a surface z = f(x, y), described by the field:

[Figure: the line integral of a scalar field as the area of the curtain under z = f(x, y) above the curve C.]

Let F : U ⊂ Rⁿ → Rⁿ be a vector field along a piecewise smooth curve parametrized by r(t), a < t < b. Then the line integral of vector field F over (along) C is

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt$$

The particle (in red) travels from point a to point b along a curve C in a vector field F. Shown below, on the dial at right, is the field's vectors from the perspective of the particle: the red arrow is tangent to it, the blue vector in the vector field is pointing slightly to the left, and the value of their dot product is shown in green. As it changes orientation, the axis arrows rotate to illustrate the changes in reference:


The dot product of the tangent velocity vector r'(t) (in red) and the field vector F(r(t)) (in blue) results in the value represented as a green bar. This bar sweeps out an area under C as the particle travels along the path. This area is equivalent to the line integral of the vector field. Now, if force is a vector, then by the second law of motion any such vector field might also be described as a force field. For example, let the vector field E be the electric field. Then the line integral becomes the work done on a charged particle as it moves through the electric field.

The Del Operator The nabla symbol, ∇, represents del, a vector differential operator that, when applied to a scalar field, denotes the gradient of the scalar field, or, when applied to a vector field, denotes the divergence or curl of the vector field. In R³ with coordinates (x, y, z) and standard basis {ê₁, ê₂, ê₃}, del is

$$\nabla = \hat{e}_1 \frac{\partial}{\partial x} + \hat{e}_2 \frac{\partial}{\partial y} + \hat{e}_3 \frac{\partial}{\partial z}$$

In any Rⁿ with coordinates (x₁, ..., xₙ) and standard basis {ê₁, ..., êₙ}, del is

$$\nabla = \sum_{i=1}^{n} \hat{e}_i \frac{\partial}{\partial x_i}$$

Then the gradient of a scalar field f at the point (x, y, z) is

$$\nabla f = \frac{\partial f}{\partial x}\hat{e}_1 + \frac{\partial f}{\partial y}\hat{e}_2 + \frac{\partial f}{\partial z}\hat{e}_3$$

-the locally steepest slope. The divergence of a vector field F = Mê₁ + Nê₂ + Pê₃ at the point (x, y, z) is

$$\nabla \cdot \mathbf{F} = \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} + \frac{\partial P}{\partial z}$$

-a scalar that gives the value of expansion or compression at each point, a measure of the flux density. It can be derived geometrically in the following way: consider in R² a field of velocity vectors, a velocity field, F(x, y) = M(x, y)ê₁ + N(x, y)ê₂ in a domain containing rectangular region A:

[Figure: a velocity field F = Mê₁ + Nê₂ over a small rectangle A with sides ∆x and ∆y.]

The rate at which a fluid leaves the rectangle across its bottom edge is approximately

$$\mathbf{F}(x, y) \cdot (-\hat{e}_2)\, \Delta x = -N(x, y)\, \Delta x$$

-the scalar component of velocity in the direction of the outward, downward normal, times the length of the segment. The flow across the other edges is similarly calculated:

$$\text{Top: } N(x, y + \Delta y)\, \Delta x, \qquad \text{Right: } M(x + \Delta x, y)\, \Delta y, \qquad \text{Left: } -M(x, y)\, \Delta y$$

Sum the flow rates to get the net flow rate out of the rectangle. Starting with opposite edges,

$$\text{Top and bottom: } \left(N(x, y + \Delta y) - N(x, y)\right)\Delta x \approx \frac{\partial N}{\partial y}\, \Delta y\, \Delta x$$

$$\text{Right and left: } \left(M(x + \Delta x, y) - M(x, y)\right)\Delta y \approx \frac{\partial M}{\partial x}\, \Delta x\, \Delta y$$

Then the flux, the total flow, across the rectangular boundary is approximately

$$\left( \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} \right) \Delta x\, \Delta y$$

Divide by the area of the rectangle ∆x∆y to estimate the flux per unit area of the rectangle, and then let ∆x and ∆y go to zero:

$$\nabla \cdot \mathbf{F} = \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y}$$

-the flux density. A similar technique may be used to derive the circulation density, the k-component of curl, geometrically. Consider in R² a field of velocity vectors, a velocity field, F(x, y) = M(x, y)ê₁ + N(x, y)ê₂, in a domain containing rectangular region A:

The circulation rate of F around the boundary of A is the sum of flow rates along the sides in the tangential direction. For the bottom edge this is approximately

$$\mathbf{F}(x, y) \cdot \hat{e}_1\, \Delta x = M(x, y)\, \Delta x$$

-the scalar component of velocity field F in the tangent direction times the length of the segment, and similarly for the other edges. Sum opposite pairs:

$$\text{Top and bottom: } -\left(M(x, y + \Delta y) - M(x, y)\right)\Delta x \approx -\frac{\partial M}{\partial y}\, \Delta y\, \Delta x$$

$$\text{Right and left: } \left(N(x + \Delta x, y) - N(x, y)\right)\Delta y \approx \frac{\partial N}{\partial x}\, \Delta x\, \Delta y$$

Then the net circulation, in the positive counterclockwise (right-handed) direction, is approximately

$$\left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) \Delta x\, \Delta y$$

Dividing by rectangle area ∆x∆y,

$$\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}$$

-the circulation density of F at (x, y), a scalar. It is positive circulating counterclockwise as seen from above unit vector k = ê₃, k being parallel to the axis of rotation, and this axis being perpendicular to the plane. It is negative circulating clockwise. To picture the spin of circulation acting at some point (x, y), imagine a paddlewheel whose spokes connect there:

[Figure: a paddlewheel at (x, y), spun by the circulation of the field.]

The notation above reveals that the circulation density is the k-component of the more general circulation vector in R³, the curl of F. The curl of a vector field F(x, y, z) = M(x, y, z)ê₁ + N(x, y, z)ê₂ + P(x, y, z)ê₃ at the point (x, y, z) is

$$\nabla \times \mathbf{F} = \left( \frac{\partial P}{\partial y} - \frac{\partial N}{\partial z} \right)\hat{e}_1 + \left( \frac{\partial M}{\partial z} - \frac{\partial P}{\partial x} \right)\hat{e}_2 + \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right)\hat{e}_3$$

Then the k-component is

$$(\nabla \times \mathbf{F}) \cdot \hat{e}_3 = \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}$$

Del has many important properties. The most important is its involvement in the Laplacian. The Laplacian is a scalar operator that may be applied to both scalar and vector fields:

$$\nabla^2 = \nabla \cdot \nabla$$

Then the vector Laplacian applies to a vector field and returns a vector:

$$\nabla^2 \mathbf{F} = \nabla^2 M\, \hat{e}_1 + \nabla^2 N\, \hat{e}_2 + \nabla^2 P\, \hat{e}_3$$

The scalar Laplacian applies to scalar field f, where f is C² and real-valued, and is the divergence of the gradient of f:

$$\nabla^2 f = \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}$$

If f is defined at some point f(p), then the Laplacian can be thought of as a measurement of the extremum of a vector field defining some surface "above" its input space. The gradient field gives the rate of steepest ascent for this surface as seen by the gradient vectors in the input space:

[Figure: gradient vectors in the input plane, diverging from the minimums of the surface z and converging to its maximums.]

-the gradient vectors in the input space of the plane diverge from the minimums of surface z, and converge to z's maximums. Then the scalar Laplacian, the div(grad f), is analogous to the second derivative test: it has a negative value for maximums, and a positive value for minimums. The curl of the curl of a vector field v is

$$\nabla \times (\nabla \times \mathbf{v}) = \nabla(\nabla \cdot \mathbf{v}) - \nabla^2 \mathbf{v}$$

-the gradient of its divergence minus its Laplacian. Always equal to zero are the second derivatives

$$\nabla \cdot (\nabla \times \mathbf{F}) = 0, \qquad \nabla \times (\nabla f) = 0$$

Which is to say that if

$$\nabla \times \mathbf{F} = 0$$

then there is a u such that

$$\mathbf{F} = \nabla u$$
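These identities can be verified symbolically for any particular field. The sketch below, using SymPy's vector module and arbitrarily chosen test fields (assumptions for illustration only), checks div(curl F) = 0, curl(grad f) = 0, and the curl-of-curl identity componentwise:

```python
import sympy as sp
from sympy.vector import CoordSys3D, curl, divergence, gradient

C = CoordSys3D('C')
x, y, z = C.x, C.y, C.z
F = x**2*y*C.i + y*z*C.j + sp.sin(x)*z*C.k     # an arbitrary test field
f = x*y**2*sp.cos(z)                           # an arbitrary scalar field

print(sp.simplify(divergence(curl(F))))        # 0
cg = curl(gradient(f))
print([sp.simplify(cg.dot(e)) for e in (C.i, C.j, C.k)])     # [0, 0, 0]

# curl(curl F) = grad(div F) - vector Laplacian of F
lap = lambda g: sp.diff(g, x, 2) + sp.diff(g, y, 2) + sp.diff(g, z, 2)
lhs = curl(curl(F))
rhs = gradient(divergence(F)) - (lap(x**2*y)*C.i + lap(y*z)*C.j
                                 + lap(sp.sin(x)*z)*C.k)
d = lhs - rhs
print([sp.simplify(d.dot(e)) for e in (C.i, C.j, C.k)])      # [0, 0, 0]
```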

3.3 Integration in Vector Fields Work, Flow, Circulation, and Flux In physics, since force is a vector, the work done by moving a particle from one point to the next is an integral over a vector field. Let C be a smooth curve parametrized by r(t), a ≤ t ≤ b, and F be a continuous force field (vector field) over some region containing C. Then the work done in moving a particle from point A = r(a) to point B = r(b) along C is

$$W = \int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \frac{d\mathbf{r}}{dt}\, dt$$

-the line integral over vector field F. At a point, this is the scalar component F⋅T, in the direction tangent to the curve, integrated along the curve:

[Figure: the components Fₖ · Tₖ of the field along the curve's unit tangents.]

-ie, that component of F which lies in the instantaneous direction of the curve, T. Thus, integration gives the total work along the curve. The work integral has a number of equivalent forms, the work forms:

$$W = \int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_C \mathbf{F} \cdot d\mathbf{r} = \int_a^b \mathbf{F} \cdot \frac{d\mathbf{r}}{dt}\, dt = \int_C M\, dx + N\, dy + P\, dz$$

For example, to find the work done by a force field F = (y – x²)ê₁ + (z – y²)ê₂ + (x – z²)ê₃ along the curve r(t) = tê₁ + t²ê₂ + t³ê₃, 0 ≤ t ≤ 1, from (0,0,0) to (1,1,1), find F and dr/dt, and evaluate the integral. Evaluating the force field F on the curve r gives

$$\mathbf{F}(\mathbf{r}(t)) = (t^2 - t^2)\hat{e}_1 + (t^3 - t^4)\hat{e}_2 + (t - t^6)\hat{e}_3$$

and dr/dt = ê₁ + 2tê₂ + 3t²ê₃. Then the integral is

$$W = \int_0^1 \left( 2t^4 - 2t^5 + 3t^3 - 3t^8 \right) dt = \frac{2}{5} - \frac{1}{3} + \frac{3}{4} - \frac{1}{3} = \frac{29}{60}$$
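A short SymPy check of the arithmetic, a sketch rather than part of the original computation:

```python
import sympy as sp

t = sp.symbols('t')
r = sp.Matrix([t, t**2, t**3])
x, y, z = r
F = sp.Matrix([y - x**2, z - y**2, x - z**2])
print(sp.integrate(F.dot(r.diff(t)), (t, 0, 1)))   # 29/60
```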

Flow and circulation integrals are straightforward interpretations of the work integral. If F is the velocity field of a fluid, then the integral of F⋅T along the curve is considered the flow of the fluid along the curve. If the flow ends where it begins, the flow is a circulation. If r(t) parametrizes a smooth curve C in the domain of a continuous velocity field F, the flow along the curve from A = r(a) to B = r(b) is

$$\text{Flow} = \int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \frac{d\mathbf{r}}{dt}\, dt$$

When A = B, the circulation is

$$\text{Circulation} = \oint_C \mathbf{F} \cdot \mathbf{T}\, ds$$

Now, a curve in R² is simple if it does not cross itself. When a curve starts and ends at the same point, it is a closed curve or loop. Let C be just such a closed, simple curve. The scalar F · N is the scalar component of F in the direction outwardly normal to the curve. The integral of F · N over C is the flux of F across C:

$$\text{Flux} = \oint_C \mathbf{F} \cdot \mathbf{N}\, ds$$

Evaluate the integral by choosing a smooth parametrization x = g(t), y = h(t), a ≤ t ≤ b, that traces the curve exactly once as t increases from a to b. Next, give the outward unit normal vector N as the cross product between T and the unit vector k = ê₃ orthogonal to the plane of T. The motion along C determines whether T × k or k × T points outwards:

[Figure: for counterclockwise motion T × k points outward; for clockwise motion k × T points outward.]

Choosing N = T × ê₃ for counterclockwise motion, with T = (dx/ds)ê₁ + (dy/ds)ê₂,

$$\mathbf{N} = \mathbf{T} \times \hat{e}_3 = \frac{dy}{ds}\hat{e}_1 - \frac{dx}{ds}\hat{e}_2$$

Therefore

$$\mathbf{F} \cdot \mathbf{N} = M\,\frac{dy}{ds} - N\,\frac{dx}{ds}$$

Then the flux form integral is

$$\text{Flux} = \oint_C \mathbf{F} \cdot \mathbf{N}\, ds = \oint_C M\, dy - N\, dx$$

Using this form, the flux may be found by using only M, N, dy, and dx. Neither N nor ds is needed.

Parametrization of Surfaces The parametric vector equation for surfaces in space is analogous to the case of curves in the plane, only now with two parametric variables:

$$\mathbf{r}(u, v) = f(u, v)\hat{e}_1 + g(u, v)\hat{e}_2 + h(u, v)\hat{e}_3$$

This vector equation is equivalent to the parametrizations

$$x = f(u, v), \quad y = g(u, v), \quad z = h(u, v)$$

For example, parametrize the cone z = √(x² + y²), 0 ≤ z ≤ 1, in Cartesian coordinates to a cone projected to polar coordinates in the plane by

$$\mathbf{r}(r, \theta) = r\cos\theta\, \hat{e}_1 + r\sin\theta\, \hat{e}_2 + r\, \hat{e}_3, \qquad 0 \le r \le 1,\ 0 \le \theta \le 2\pi$$

Parametric Surface Area and the Surface Differential A double integral exists that gives the surface area of a curved surface in space based on a parametrization

$$\mathbf{r}(u, v) = f(u, v)\hat{e}_1 + g(u, v)\hat{e}_2 + h(u, v)\hat{e}_3$$

A parametrized surface r(u, v) = f(u, v)ê₁ + g(u, v)ê₂ + h(u, v)ê₃ is smooth if rᵤ and rᵥ are continuous and rᵤ × rᵥ is never zero on the interior of the parameter domain. The partial derivatives rᵤ and rᵥ are equal to

$$\mathbf{r}_u = \frac{\partial f}{\partial u}\hat{e}_1 + \frac{\partial g}{\partial u}\hat{e}_2 + \frac{\partial h}{\partial u}\hat{e}_3, \qquad \mathbf{r}_v = \frac{\partial f}{\partial v}\hat{e}_1 + \frac{\partial g}{\partial v}\hat{e}_2 + \frac{\partial h}{\partial v}\hat{e}_3$$

If rᵤ × rᵥ is never zero within the domain, then rᵤ and rᵥ are not parallel. Consider a small rectangular patch ∆Aᵤᵥ in R² with sides on the lines u = u₀, u = u₀ + ∆u, v = v₀, and v = v₀ + ∆v. Map the rectangular patch ∆Aᵤᵥ to a surface patch ∆σ such that line u = u₀ maps to curve C₂ and line v = v₀ maps to curve C₁:

[Figure: the patch ∆Aᵤᵥ in the uv-plane and its image ∆σ on the surface, with tangent vectors rᵤ and rᵥ along C₁ and C₂.]

Note that C₁: v = v₀ runs in the direction of u, and C₂: u = u₀ runs in the direction of v. The partial derivative vector rᵤ(u₀, v₀) is tangent to C₁ and rᵥ(u₀, v₀) is tangent to C₂. Extend |rᵤ × rᵥ| over the surface patch to approximate it by multiplying each partial derivative vector by a change in that direction, ∆u rᵤ and ∆v rᵥ:

A partition of R in the uv-plane into rectangular patches ∆A induces a partition of S in R³ into surface patches ∆σ. Surface patch ∆σ is approximately equal to

$$\Delta\sigma \approx |\mathbf{r}_u \times \mathbf{r}_v|\, \Delta u\, \Delta v$$

The sum of these surface patches approximates the surface in ever finer detail as the side length of tangent plane approximations goes to 0, or, equivalently, the number of surface patches goes to infinity:

$$\lim_{n \to \infty} \sum_{k=1}^{n} |\mathbf{r}_u \times \mathbf{r}_v|\, \Delta u_k\, \Delta v_k = \iint_R |\mathbf{r}_u \times \mathbf{r}_v|\, du\, dv$$

The parametric surface area differential dσ is analogous to the arc length differential ds:

$$d\sigma = |\mathbf{r}_u \times \mathbf{r}_v|\, du\, dv$$

Then, for a surface parametrized in the way given in the beginning of the section, the parametric surface area is

$$A = \iint_R |\mathbf{r}_u \times \mathbf{r}_v|\, du\, dv$$
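As an illustration, here is a SymPy sketch that computes |rᵤ × rᵥ| and the resulting surface area for the cone parametrized above:

```python
import sympy as sp

u, v = sp.symbols('u v', positive=True)
# the cone r(u, v) = (u cos v, u sin v, u), 0 <= u <= 1, 0 <= v <= 2*pi
r = sp.Matrix([u*sp.cos(v), u*sp.sin(v), u])
ru, rv = r.diff(u), r.diff(v)
dsigma = sp.simplify(ru.cross(rv).norm())                  # sqrt(2)*u
print(sp.integrate(dsigma, (u, 0, 1), (v, 0, 2*sp.pi)))    # sqrt(2)*pi
```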

Implicit Surface Area It isn't always possible to know explicit formulas for f, g, and h in the parametric vector form of a surface:

$$\mathbf{r}(u, v) = f(u, v)\hat{e}_1 + g(u, v)\hat{e}_2 + h(u, v)\hat{e}_3$$

This is the case in implicitly defined functions. As we will see in the next section, surfaces of a constant value are implicitly defined surfaces:

$$F(x, y, z) = c$$

This is equally true of curves of a constant value. By projecting a surface onto an oriented plane region lying beneath it, we can use the implicit function theorem to parametrize the function in such a way that the surface differential may be given for a function defined implicitly. We begin with an implicit surface S lying above a rectangular region R beneath it, with unit vector p normal to it:

[Figure: the implicit surface S over its shadow region R, with unit normal p.]

Orient the surface by making unit normal vector p unit vector ê₃ so that R is in the xy-plane. If certain conditions are met, the implicit function theorem says that S is the graph of a differentiable function z = h(x, y), even though the function h is not explicitly known. These conditions are that the surface is smooth (F is differentiable and ∇F is nonzero and continuous throughout S) and ∇F⋅ê₃ ≠ 0. Define parameters u and v by u = x and v = y. Then z = h(u, v) and

$$\mathbf{r}(u, v) = u\,\hat{e}_1 + v\,\hat{e}_2 + h(u, v)\,\hat{e}_3$$

gives a parametrization of surface S. With this parametrization in mind, find |rᵤ × rᵥ| and substitute into the uv surface differential formula derived in the parametric case from the last section:

$$d\sigma = |\mathbf{r}_u \times \mathbf{r}_v|\, du\, dv, \qquad \mathbf{r}_u = \hat{e}_1 + \frac{\partial h}{\partial u}\hat{e}_3, \quad \mathbf{r}_v = \hat{e}_2 + \frac{\partial h}{\partial v}\hat{e}_3$$

Since z = h(u, v), use the implicit function formulas to give ∂h/∂u and ∂h/∂v:

$$\frac{\partial h}{\partial u} = -\frac{F_x}{F_z}, \qquad \frac{\partial h}{\partial v} = -\frac{F_y}{F_z}$$

Then

$$\mathbf{r}_u \times \mathbf{r}_v = \frac{F_x}{F_z}\hat{e}_1 + \frac{F_y}{F_z}\hat{e}_2 + \hat{e}_3 = \frac{\nabla F}{F_z}$$

And

$$|\mathbf{r}_u \times \mathbf{r}_v| = \frac{|\nabla F|}{|F_z|} = \frac{|\nabla F|}{|\nabla F \cdot \hat{e}_3|}$$

Then the implicit surface area differential is

$$d\sigma = \frac{|\nabla F|}{|\nabla F \cdot \hat{e}_i|}\, dA$$

where the index on the unit vector implies that it may be taken as i, j, or k in elementary terms. Then the implicit area of a surface F(x, y, z) = c over a closed and bounded plane region R is

$$A = \iint_R \frac{|\nabla F|}{|\nabla F \cdot \mathbf{p}|}\, dA$$

Explicit Surface Area For a surface S defined explicitly as the graph of z = f(x, y), with f being a continuously differentiable function over a region R in the xy-plane, the surface area differential may be simplified to

$$d\sigma = \sqrt{f_x^2 + f_y^2 + 1}\; dx\, dy$$

Do this in the parametric case by parametrizing the graph with the coordinates themselves:

$$\mathbf{r}(x, y) = x\,\hat{e}_1 + y\,\hat{e}_2 + f(x, y)\,\hat{e}_3, \qquad \mathbf{r}_x \times \mathbf{r}_y = -f_x\,\hat{e}_1 - f_y\,\hat{e}_2 + \hat{e}_3$$

Substitute for u and v:

$$A = \iint_R |\mathbf{r}_x \times \mathbf{r}_y|\, dx\, dy = \iint_R \sqrt{f_x^2 + f_y^2 + 1}\; dx\, dy$$

In the implicit case, the formula derives the same result when one considers that for F(x, y, z) = f(x, y) – z, the z-component of ∇F is equal to one in absolute value.

Surface Integrals A surface integral is the integral of a function taken over a surface:

$$\iint_S G(x, y, z)\, d\sigma$$

For instance G might be the charge or mass density on a surface. It was shown in the previous section that every implicit function in two or three variables is also a curve or surface of constant value:

$$F(x, y, z) = c$$

Recall that, by the orthogonal gradient theorem, every gradient is orthogonal to a level curve, or surface, of its function:

[Figure: gradient vectors normal to the level sets of f.]

Notice how this fact in the single variable case manifests itself as the derivative being orthogonal to a level curve of its function. We now use it to derive yet another form of the unit normal:

$$\mathbf{N} = \frac{\nabla F}{|\nabla F|}$$

And from the last section we have

$$d\sigma = \frac{|\nabla F|}{|\nabla F \cdot \mathbf{p}|}\, dA$$

Specifying an outward pointing normal field on a surface orients the surface. A smooth surface S is orientable if it is possible to define a field of unit normal vectors N on S that vary continuously with position. From an oriented surface and its normal vector field, the surface integral of a vector field may be defined. Let F be a vector field in three dimensional space with continuous components defined over a smooth surface S having a chosen field of normal unit vectors N orienting S. Then the surface integral of a vector field is

$$\iint_S \mathbf{F} \cdot \mathbf{N}\, d\sigma$$

-the surface flux: the flux of vector field F across surface S in the direction of the surface's unit normal field N. Since dσ may possess both a magnitude and a direction corresponding with unit normal N, it can be vectorized from its product with N:

$$d\boldsymbol{\sigma} = \mathbf{N}\, d\sigma$$

This means the surface differential has components corresponding to the projections of the area element on the three mutually perpendicular planes defined by the rectangular axes:

$$d\boldsymbol{\sigma} = dy\, dz\; \hat{e}_1 + dz\, dx\; \hat{e}_2 + dx\, dy\; \hat{e}_3$$

Then the surface integral of a vector field F over a surface S is equal to the integral of the normal component of F over this surface:

$$\iint_S \mathbf{F} \cdot d\boldsymbol{\sigma} = \iint_S \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iint_S M\, dy\, dz + N\, dz\, dx + P\, dx\, dy$$

The Integral Theorems of Vector Calculus Let C be a piecewise smooth, simple closed curve enclosing region R in the plane. Let F = Mê₁+ Nê₂ be a vector field with M and N having continuous first partial derivatives in an open region containing R. Green's flux-divergence theorem in R² is

$$\oint_C \mathbf{F} \cdot \mathbf{N}\, ds = \oint_C M\, dy - N\, dx = \iint_R \left( \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} \right) dA$$

-the flux of a vector field F across a closed, simple curve is equal to the integral of the divergence of the field over the region enclosed by the curve. Let F be a vector field whose components have continuous first partial derivatives, and let S be a piecewise smooth oriented surface enclosing a region D in space. The flux-divergence theorem in R³ is

$$\iint_S \mathbf{F} \cdot \mathbf{N}\, d\sigma = \iiint_D \nabla \cdot \mathbf{F}\, dV$$

-the flux of a vector field F outwardly normal to and across a closed oriented surface S is equal to the integral of the divergence of the vector field over the enclosed volume.

Let C be a piecewise smooth, simple, closed curve enclosing region R in the plane. Let F = Mê₁+ Nê₂ be a vector field with M and N having continuous first partial derivatives in an open region containing R. Green's curl-circulation theorem in R² is

$$\oint_C \mathbf{F} \cdot \mathbf{T}\, ds = \oint_C M\, dx + N\, dy = \iint_R \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dA$$

-the counterclockwise circulation around a closed, simple curve is equal to the integral of the circulation density (the k-component of curl) over region R.

The flux-divergence and circulation-curl theorems in R² are equivalent. They yield the same answers and can be applied to the same problems. Let S be a piecewise smooth oriented surface having a piecewise smooth boundary curve C. Let F = Mê₁+ Nê₂+ Pê₃ be a vector field whose components have continuous partial derivatives on an open region containing S. The circulation-curl theorem in R³, Stokes' theorem, is

$$\oint_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{N}\, d\sigma$$

-the circulation of a vector field F around a boundary C of an oriented surface S, counterclockwise with respect to the orienting chosen unit normal vector N, is equal to the flux of the curl of F through the surface (the integral of the curl vector field ∇ × F over the surface S).
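Each of these theorems can be spot-checked symbolically. A minimal SymPy sketch, where the field F = (x – y)ê₁ + (x + y)ê₂ and the unit disk are assumptions chosen for illustration, verifying Green's curl-circulation theorem:

```python
import sympy as sp

x, y, t, r = sp.symbols('x y t r')
M, N = x - y, x + y                  # an assumed test field F = (M, N)

# Circulation side: closed line integral of M dx + N dy on the unit circle
xt, yt = sp.cos(t), sp.sin(t)
circ = sp.integrate(M.subs({x: xt, y: yt})*sp.diff(xt, t)
                    + N.subs({x: xt, y: yt})*sp.diff(yt, t), (t, 0, 2*sp.pi))

# Curl side: double integral of dN/dx - dM/dy over the unit disk (polar form)
density = sp.diff(N, x) - sp.diff(M, y)            # = 2
area = sp.integrate(density*r, (r, 0, 1), (t, 0, 2*sp.pi))

print(circ, area)                    # 2*pi 2*pi -- the two sides agree
```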









































Chapter Four Matrix Algebra

4.1 Matrices: Row and Column Vectors. Matrices. Matrix Properties. Matrix Multiplication. The Matrix Transpose. Inner and Outer Products. Types of Matrices. 4.2 Solving Equations: Row Reduction. The Gauss-Jordan Method. Solutions to Linear Equations. 4.3 Vector Spaces: Tiers of Structure in Linear Algebra. Vector Space. 4.4 Linear Maps: Definition. The Nullspace. The Fundamental Theorem of Linear Algebra. Span. Coordinate Transformations.

4.1 Matrices Row and Column Vectors In a flat, Euclidean space, the familiar n-tuple and the row and column representations of a vector are equivalent:

$$\mathbf{v} = (1, 2, 3) \quad\longleftrightarrow\quad \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \text{ (column vector)} \quad\longleftrightarrow\quad \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \text{ ("row vector"/covector)}$$

Thus, df = ∇f (§16.3). Multiplication shorthand is still the scalar product:

$$\begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 1 \cdot 2 + 2 \cdot 1 = 4$$

Matrices A matrix is a two dimensional array:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \qquad m \text{ rows},\ n \text{ columns}$$

Like a cathedral, it has m horizontal rows (pews) and n vertical columns, indexed by their subscripts. A row u or a column v of a matrix is itself a vector, yet the little arrow that would typically indicate a vector is absent from u and v: one subscript or superscript automatically represents a vector, and two represent a matrix. The idea of a tensor generalizes this concept also for 3, 4, or n indices.

Matrix Properties For addition, combine corresponding entries. For two matrices A and B:

$$A + B = [a_{ij} + b_{ij}] = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}$$

Scalars distribute to every entry:

$$cA = [c\,a_{ij}] = \begin{bmatrix} c\,a_{11} & \cdots & c\,a_{1n} \\ \vdots & \ddots & \vdots \\ c\,a_{m1} & \cdots & c\,a_{mn} \end{bmatrix}$$

Matrix addition is associative and commutative, and matrix multiplication is associative and distributes over addition, but matrices do not in general commute under multiplication:

$$A + B = B + A, \qquad (A + B) + C = A + (B + C), \qquad c(A + B) = cA + cB$$

$$A(B + C) = AB + AC, \qquad (AB)C = A(BC), \qquad AB \ne BA \text{ (in general)}$$

Matrix Multiplication

If A = [aᵢₖ] is an m × p matrix and B = [bₖⱼ] is a p × n matrix, then AB = C is the m × n matrix whose ijth entry is obtained by multiplying the ith row of A by the jth column of B:

$$C = AB = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mp} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pn} \end{bmatrix} = [c_{ij}]$$

where

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj} + \cdots + a_{ip}b_{pj} = \sum_{k=1}^{p} a_{ik}b_{kj}$$

Multiplication with mismatched rows and columns is undefined: AB exists only when the number of columns of A equals the number of rows of B. Picture the multiplication like a kind of sieve machine: take a column of B, rotate it positive 90° above A parallel to A's top row, then drag the rotated column over every row of A, letting the numbers build up and shoot into C.
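A tiny NumPy illustration of the row-times-column rule and of the failure of commutativity:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
print(A @ B)   # [[2 1], [4 3]] -- columns of A swapped
print(B @ A)   # [[3 4], [1 2]] -- rows of A swapped: AB != BA
```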

The Matrix Transpose If A = [aᵢⱼ] is an m × n matrix, then its transpose Aᵀ = [bᵢⱼ] is the n × m matrix where bᵢⱼ = aⱼᵢ. Rotate columns counterclockwise about diagonal pivots:

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 0 & 1 & 2 \end{bmatrix} \qquad A^{\mathsf T} = \begin{bmatrix} 1 & 5 & 9 \\ 2 & 6 & 0 \\ 3 & 7 & 1 \\ 4 & 8 & 2 \end{bmatrix}$$

Every row becomes a column, and every column becomes a row. Properties:

$$(A + B)^{\mathsf T} = A^{\mathsf T} + B^{\mathsf T}, \qquad (cA)^{\mathsf T} = cA^{\mathsf T}, \qquad (A^{\mathsf T})^{\mathsf T} = A, \qquad (AB)^{\mathsf T} = B^{\mathsf T} A^{\mathsf T}$$

Inner and Outer Products The inner product is a row times a column:

$$\mathbf{a}^{\mathsf T}\mathbf{b} = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = a_1 b_1 + a_2 b_2 + a_3 b_3$$

For example,

$$\begin{bmatrix} 1 & 4 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = 1 \cdot 2 + 4 \cdot 1 + 3 \cdot 0 = 6$$

The outer product is a column times a row:

$$\mathbf{a}\mathbf{b}^{\mathsf T} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} = \begin{bmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{bmatrix}$$

For example,

$$\begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ 8 & 4 & 0 \\ 6 & 3 & 0 \end{bmatrix}$$

Types of Matrices The trace is the sum of the diagonal entries:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

Any n × n matrix is a square matrix. Make a square matrix out of any matrix by multiplying by its transpose: AᵀA is n × n and AAᵀ is m × m. The zero matrix has every entry equal to zero. The identity matrix has 1's down its diagonal and 0's elsewhere:

$$I = [\delta_{ij}] = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}$$

It reflects the identity property of the n dimensional vector space its basis describes:

$$AI = IA = A$$

The unique matrix that gives the identity matrix under multiplication is the inverse matrix:

$$AA^{-1} = A^{-1}A = I$$

For a diagonal (scalar) matrix, the inverse is just the real inverses of the diagonal entries. The inverse of a matrix product is anticommutative:

$$(AB)^{-1} = B^{-1}A^{-1}, \qquad (ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$

A matrix A is said to be non-singular if it is invertible, i.e., elimination to reduced row echelon form produces n pivots with no free variables and only the zero solution in the nullspace, the unique solution to Ax = b being x = A⁻¹b. A is said to be singular if it is non-invertible. If Ax = 0, then either x = 0 and A is invertible and non-singular, or x ≠ 0 and A is non-invertible and singular with |A| = 0. That the Gauss-Jordan method of finding the inverse fails if A is not reducible to R, and that the cofactor inverse formula fails if |A| = 0, corroborates the preceding facts. These methods, the so-called Gauss-Jordan method and the cofactor inverse formula, are the primary tools for

calculating matrix inversion. Any matrix that is unchanged under transposition is symmetric:

$$A^{\mathsf T} = A, \qquad a_{ij} = a_{ji}$$

When a transposition flips entries to their negatives, that matrix is antisymmetric:

$$A^{\mathsf T} = -A, \qquad a_{ij} = -a_{ji}$$

The orthogonality condition

$$\hat{e}_i \cdot \hat{e}_j = \delta_{ij}$$

says that if two basis vectors have matching indices, their natural scalar product is 1, and if they have mismatched indices, their natural scalar product is 0:

$$\hat{e}_1 \cdot \hat{e}_1 = 1, \qquad \hat{e}_1 \cdot \hat{e}_2 = 0$$

This condition is also true of matrices: a matrix whose columns are orthonormal satisfies

$$A^{\mathsf T} A = I$$

Therefore, when a matrix has a transpose that is equal to its inverse, that matrix is itself orthogonal, i.e., it has an orthonormal basis:

$$A^{\mathsf T} = A^{-1}, \qquad A^{\mathsf T} A = A A^{\mathsf T} = I$$

-a secondary condition of orthogonality for matrices. A matrix that commutes with its transpose is a normal matrix:

$$A^{\mathsf T} A = A A^{\mathsf T}$$

For example,

$$A = \begin{bmatrix} 3 & 6 \\ -6 & 3 \end{bmatrix}, \qquad A^{\mathsf T} A = A A^{\mathsf T} = \begin{bmatrix} 45 & 0 \\ 0 & 45 \end{bmatrix}$$

The off-diagonal part of this matrix is antisymmetric, and in general all symmetric, antisymmetric, and orthogonal matrices are normal. This conforms to geometric sense. A matrix with complex entries is a complex matrix:

$$A = \begin{bmatrix} 2 & 8 - 6i \\ 5 + 3i & 1 - 4i \\ 4 + 7i & 3 - 2i \end{bmatrix}$$

The conjugate transpose is just what it says: conjugate every entry, then transpose:

$$A^{\dagger} = \overline{A}^{\mathsf T} = \begin{bmatrix} 2 & 5 - 3i & 4 - 7i \\ 8 + 6i & 1 + 4i & 3 + 2i \end{bmatrix}$$

When the conjugate transpose is the same as the square matrix it was formed from, eg

$$A = \begin{bmatrix} 2 & 6i \\ -6i & 2 \end{bmatrix} = A^{\dagger}$$

then that matrix is said to be Hermitian, the complex analogue of symmetry. Likewise, a matrix that is equal to its negative conjugate transpose (conjugation affecting imaginary components of complex numbers only) is anti-Hermitian:

$$A = \begin{bmatrix} 3i & 3 + 6i \\ -3 + 6i & 6i \end{bmatrix} = -A^{\dagger}$$

Notice that in both Hermitian and anti-Hermitian matrices the imaginary parts of the off-diagonal entries do not change under the dagger, while the signs switch on their real parts in the anti-Hermitian case. The diagonal of a Hermitian matrix consists of real numbers, since only they are equal to their own complex conjugates; the diagonal of an anti-Hermitian matrix is purely imaginary (or zero), since those entries must equal the negatives of their own conjugates. When a matrix's conjugate transpose is equal to its inverse (found by matrix methods), that matrix is unitary, the complex analogue of orthogonality:

$$A^{\dagger} = A^{-1}, \qquad A A^{\dagger} = A^{\dagger} A = I$$

For example, the following matrix admits an equal conjugate transpose, making it Hermitian, and the conjugate transpose matrix is the same thing as its inverse matrix, and therefore it is also unitary:

$$A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & i \\ -i & -1 \end{bmatrix} = A^{\dagger} = A^{-1}$$

In other words, the matrix, its conjugate transpose, and its inverse are all the exact same matrix. When a matrix commutes with its conjugate transpose, it is also considered normal:

$$A A^{\dagger} = A^{\dagger} A$$

4.2 Solving Equations Row Reduction A system of m equations with n unknown variables x₁, ... , xₙ is solved by the x₁, ... , xₙ that simultaneously satisfy all m equations. In this section, we'll learn how to reduce matrices, continuing our work from §7.1. Reducing a matrix is equivalent to solving a system of equations with coefficient matrix A and unknown vector x. In matrix form, such a problem is posed as

$$A\mathbf{x} = \mathbf{b}$$

-solving for x. In elementary algebra, elimination reduces the equations until one yields a single variable's particular value, and the rest are then quickly solved by back substitution. In matrix form, this translates into creating an upper triangular matrix U by eliminating entries below the pivots. Just as in the elementary algebraic case, we combine like coefficients in the coefficient matrix A so that opposite values cancel and eliminate, with the key understanding that whatever is done to one entry of a row must be done to the entire row, and the entire row is then combined with the affected row, leaving only the affected row changed. This process is repeated until the system yields a solution or solutions (Appendix C). The actual matrix operation of the process is multiplication of the coefficient matrix by an elimination matrix E that has the number –C/A in the position to be eliminated, and is the identity elsewhere:

$$EA = \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix} = U$$

where D – CA⁻¹B is the Schur complement, the affected, left over row entries. For example, to eliminate below the first pivot, entry (2,1), we would apply to that matrix an elimination matrix

$$E_{21} = \begin{bmatrix} 1 & 0 \\ -C/A & 1 \end{bmatrix}$$

Taking matrix A to upper triangular U to reduced-row echelon form R is the matrix form equivalent of explicitly solving systems of equations completely by elimination. Reduce to the diagonal and divide each row by its pivot. For example,

$$A = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} \to U = \begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \to R = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The Gauss-Jordan Method

Place A next to I and take it to U then to R, doing the same to I, leaving I (formerly A) next to A⁻¹ (formerly I):

$$[\,A \mid I\,] \;\to\; [\,U \mid \ast\,] \;\to\; [\,R \mid \ast\,] = [\,I \mid A^{-1}\,]$$
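A SymPy sketch of the Gauss-Jordan method on an assumed 2 × 2 example, row reducing the augmented block [A | I]:

```python
import sympy as sp

A = sp.Matrix([[2, 1],
               [5, 3]])
aug = A.row_join(sp.eye(2))     # [A | I]
R, _ = aug.rref()               # reduce to [I | A^(-1)]
A_inv = R[:, 2:]
print(A_inv)                    # Matrix([[3, -1], [-5, 2]])
print(A * A_inv)                # the identity
```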

Solutions to Linear Equations The existence and uniqueness of solutions of linear equations can be determined from its reduced row echelon form. Augment a matrix by adding to the end of it the solution of its system:

$$A\mathbf{x} = \mathbf{b} \quad\longrightarrow\quad [\,A, \mathbf{b}\,]$$

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad\longrightarrow\quad [\,A, \mathbf{b}\,] = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 2 \\ 7 & 8 & 9 & 3 \end{bmatrix}$$

-m × n to m × (n + 1). Theorem 4.2.1: Existence and Uniqueness of Solutions Represent the system Ax = b, involving m linear equations in n unknowns, by the m × (n + 1) matrix [A, b], with row reduction [Ã, b̃]. Then i) If the row-reduced vector b̃ contains a pivotal 1, the system has no solutions. ii) If b̃ does not contain a pivotal 1, then a) the system has a unique solution if there are no non-pivotal unknowns (i.e. each column of à contains a pivotal 1), or b) there are infinitely many solutions if at least one unknown (column of Ã) is non-pivotal. A system Ax = b has a unique solution for every b if and only if A reduces to the identity, ie, is square and invertible. The best proof of this theorem is by experimentation, and in fact that is what most proofs rely on. By the theorem, one can see what kind of solution a linear system of equations has, and, if it has one, what its unique solution is. For example, a system whose augmented matrix reduces to a pivotal 1 in the augmented column, such as

$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

-has no solutions, since no multiple of 0 will ever produce 1. Next, a system whose augmented matrix reduces to

$$\begin{bmatrix} 1 & 0 & 1 & 2/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

-has infinite solutions. Notice now, with the given b, that z appears irrelevant to the original system: it can be taken to be zero to solve the equation. But another way of saying the same thing is that z is arbitrary, since, in the reduced matrix, x + z = 2/3, y + z = –1/3, giving solution

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2/3 \\ -1/3 \\ 0 \end{bmatrix} + z \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$$

Then the z's may be chosen arbitrarily-

-any z will do. See Span for more examples. As many vectors as desired may be augmented to a matrix, and each augmentation is equal to a system of equations where it is equal to the maximally ranked matrix it has been augmented to:

$$[\,A \mid \mathbf{b}_1\; \mathbf{b}_2\; \cdots\; \mathbf{b}_k\,] \;\to\; [\,R \mid \mathbf{x}_1\; \mathbf{x}_2\; \cdots\; \mathbf{x}_k\,]$$

-the bᵢ are the solutions to their systems of equations, separately. The dimension of the augmented matrix is n + m, where m is the number of augmented non-pivotal columns, which are variables in n equations. More generally, if there are n equations in n + m variables, the m variables can be thought of as known, leaving n unknown variables in n equations. This makes sense because, if there are non-pivotal variables in a matrix, they must be some linear combination of the other vectors of maximal rank that they, by definition, span. For Theorem 4.2.1, the first part is the linear version of the inverse function theorem, and the last part is the linear version of the implicit function theorem. These two nonlinear theorems define pivotal and non-pivotal unknowns as being those of the linearized problems. As in the linear case, the pivotal unknowns are implicit functions of the non-pivotal unknowns. But those implicit functions will be defined only locally, and in this small region, linearity precisely approximates nonlinearity. See §14.3. In a very general sense, some information can be gained by simply observing the number of columns and rows of a matrix. For a square matrix with n equations in n unknowns, there is usually a unique solution. For a matrix with more rows than columns, m > n, more equations than unknowns, there is usually no solution that will satisfy the long list of requirements for the few variables. Likewise, when a matrix has more columns than rows, n > m, more unknowns than equations, there are usually infinitely many solutions.
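The theorem's cases can be read off mechanically from the pivots of the reduced augmented matrix. A SymPy sketch, reusing the earlier augmentation example:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])
b = sp.Matrix([1, 2, 3])
R, pivots = A.row_join(b).rref()
print(R)
print(pivots)   # (0, 1): no pivot in the augmented column (index 3),
                # and column 2 is non-pivotal -- infinitely many solutions
```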


4.3 Vector Spaces Tiers of Structure in Linear Algebra Three levels of structure in linear algebra build in the following way: Set V → Vector Space V → Inner Product Space V. From a set V with arbitrary elements, the structure of the vector space is created by imposing on this set the operations of addition (the mapping +: V × V → V) and scalar multiplication (ω: C × V → V). An inner product (or metric) on this vector space then allows for the constructions of length and angle, making possible the calculation of orthogonality, linear independence, basis, span, and dimension.

Vector Space A vector space is a set V of vectors v ∈ V such that a) any two vectors can be added to form another vector, and b) a vector can be multiplied by a scalar in R to form another vector, and such that its eight axioms are satisfied: i) Additive Associativity: (u + v) + w = u + (v + w) ii) Additive Commutativity: u + v = v + u iii) Additive Identity: There exists a vector 0 ∈ V such that 0 + v = v iv) Additive Inverse: ∀v ∈ V, there exists a vector –v ∈ V such that v + (–v) = 0. v) Multiplicative Identity in R: 1v = v vi) Compatibility of Scalar and Field Multiplication: a(bv) = (ab)v vii) Distributivity, Scalar · over Vector +: a(u + v) = au + av viii) Distributivity, Vector · over Scalar +: (a + b)v = av + bv Rⁿ (or, as a vector space, just bold, Rⁿ) is the Cartesian space of real coordinates, and Rⁿ automatically satisfies the conditions of a vector space. For R² and R³, vectors are easily imagined:

$$\hat{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad \hat{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad \hat{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

If S = {v₁, v₂, ... , vₖ} ⊂ V is a set of k vectors in V, then the span of S, span(S), is the set of all vectors such that

$$\mathbf{v} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k$$

-they are all linear combinations of vᵢ. In other words, the span of vectors v₁, ...., vₙ is the set of all linear combinations a₁v₁ + ... + aₙvₙ. The set of S spanning vectors is linearly independent if and only if i) the only solution to a₁v₁ + ... + aₙvₙ = 0 is aᵢ = 0 (ie there are no nullspace solutions), and ii) none of the vᵢ is a linear combination of any others. A matrix composed of linearly independent columns is nonsingular (invertible). The set of S vectors is linearly dependent if there exist distinct vectors v₁, v₂, ... , vₖ in S and scalars a₁, a₂, ... , aₖ, not all of which are 0, such that ∑aᵢvᵢ = 0.

A matrix composed of linearly dependent columns is singular (non-invertible). If an ordered set of vectors {v} = v₁,...,vₖ ∈ V spans V and is also linearly independent, then it serves as a basis for V, the set of vectors out of which all other vectors may be made. The dimension is the number of elements of any finite basis. When considered as a basis, the notation êᵢ replaces vᵢ. A list of length n is an ordered collection of n objects, which may be quantities or even other lists. A list is given symbolically by parentheses: ( , ). In a set, sequence and repetition are irrelevant; the set {4, 4} is just the set that contains 4, set {4}. In a list, the order matters: the lists (5, 3) and (3, 5) are not equal, yet the sets {5, 3} and {3, 5} are. A subspace U of V is a subset of V that is also a vector space and satisfies the following properties: i) Additive Identity

$$0 \in U$$

ii) Closed Under Addition

$$\mathbf{u}, \mathbf{v} \in U \implies \mathbf{u} + \mathbf{v} \in U$$

iii) Closed Under Scalar Multiplication

$$a \in F,\ \mathbf{u} \in U \implies a\mathbf{u} \in U$$

Where F is taken to be either R or C. For example, the xy-plane in R³, {(x, y, 0): x, y ∈ F}, is a subspace of F³. Suppose V is finite dimensional and U is a subspace of V. Then there is a subspace W of V such that V = U ⊕ W, where ⊕ is the direct sum, the sum of subspaces such that their only shared element is the zero vector. If V is finite dimensional, then dimU ≤ dimV. Using R², and extending by linearity, it is seen that vector addition and scalar multiplication in the vector space are done pointwise, or

$$\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}, \qquad c\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} cv_1 \\ cv_2 \end{bmatrix}$$

4.4 Linear Maps Definition A linear map or transformation from V to W is a function T: V → W with the following properties: i) Additivity

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \qquad \forall\, \mathbf{u}, \mathbf{v} \in V$$

ii) Homogeneity

$$T(a\mathbf{v}) = aT(\mathbf{v}) \qquad \forall\, a \in F,\ \mathbf{v} \in V$$

The first property, T(u + v) = T(u) + T(v), means that performing the transformation on the sum of two vectors in the domain vector space has the same result as performing the transformation on the two vectors separately and then adding the results in the codomain vector space. The second property, T(av) = aT(v), means that performing the transformation on a scalar multiple of a vector has the same result as performing the transformation on a vector and then multiplying the result by the scalar. The set of all linear maps from V to W is denoted L(V, W). The zero map 0 is the function that takes each element of a vector space to the additive identity of another vector space i.e., for 0 ∈ L(V, W), 0 is defined by 0v = 0. The identity map I is the function on some vector space that takes each element to itself i.e., for I ∈ L(V, V), I is defined by Iv = v. Let Pₘ(R) denote the set of all polynomials

with real coefficients of at most degree m. Define the derivative map T ∈ L(Pₘ(R), Pₘ₋₁(R)) by Tp = p'. Integration defines a similar map to R. If v = (v₁, v₂, ..., vₙ) is a basis for V, when a linear map T is applied to v, it creates the matrix of the linear map. In matrix form, it's easy to see that T defines a linear transformation from V to W, or more properly, a vector v in V with basis v to a vector w in W with basis w. For linear map A and vectors x and b,

$$A\mathbf{x} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = \mathbf{b}$$

From row space of dimension n to column space with dimension m: linear transformations map between vector spaces. Proposition 4.4.1 Every linear transformation T: Rⁿ → Rᵐ is given by multiplication by an m × n matrix [T], the ith column of which is T(êᵢ), where all columns combine to give

$$T(\mathbf{v}) = [T]\mathbf{v}$$

Proof: Any vector v ∈ Rⁿ can be written in terms of the orthonormal basis vectors:

$$\mathbf{v} = v_1\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + v_n\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = v_1\hat{e}_1 + v_2\hat{e}_2 + \cdots + v_n\hat{e}_n$$

Due to its definitional linearity, a linear map acts only on the basis:

$$T(\mathbf{v}) = T\left( \sum_i v_i \hat{e}_i \right) = \sum_i v_i\, T(\hat{e}_i)$$

This creates a matrix of the linear map T in the standard orthonormal basis êᵢ, with columns T(êᵢ). Therefore,

$$T(\mathbf{v}) = [T]\mathbf{v}$$

To make a distinction between a vector's components and its basis is instrumental to higher geometry:

$$\mathbf{v} = v^1\hat{e}_1 + v^2\hat{e}_2 + v^3\hat{e}_3 = v^i\hat{e}_i$$

Linear transformations act on basis vectors: T(êᵢ). It's not hard to extend this reasoning to see that if a basis transforms one way, its components, to remain unaffected, must change oppositely, or contravariantly. Then some inverse transformation must act on the components, and they are likewise the basis of some vector that lives in a distinct vector space, the dual space of

V, V*, the space of all linear functionals. If T is a linear function from Rⁿ to R, then T is a linear functional. For example, the linear map T: R² → R³

$$T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x - y \\ x + y \\ 2x \end{bmatrix}$$

acts on a specific element of vector space R²

$$T\begin{bmatrix} 4 \\ 10 \end{bmatrix} = \begin{bmatrix} -6 \\ 14 \\ 8 \end{bmatrix}$$

and thus sends the vector [4, 10]ᵀ ∈ R² to the vector [–6, 14, 8]ᵀ ∈ R³, a linear transformation. Linear maps T: R² → R, such as

$$T\begin{bmatrix} x \\ y \end{bmatrix} = 3x + 7y$$

-are linear functionals. If a vector is represented by a column vector, its dual vector is a row vector, and in unicode we denote the vertical column vector as its transposed row vector with superscript, [v]ᵀ. To access the dual map that truly explains this relationship between forms and vectors, linear transformations must be extended to become multilinear functions in a multilinear algebra: tensors. See Geometry. We now confirm that T is in fact a linear transformation by demonstrating that both properties hold. To confirm that

$$T(\mathbf{a} + \mathbf{b}) = T(\mathbf{a}) + T(\mathbf{b})$$

find an expression for the left hand side, with a = (a₁, a₂) and b = (b₁, b₂):

$$T\left( \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \right) = T\begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \end{bmatrix} = \begin{bmatrix} (a_1 + b_1) - (a_2 + b_2) \\ (a_1 + b_1) + (a_2 + b_2) \\ 2a_1 + 2b_1 \end{bmatrix}$$

Finding an expression for the right hand side,

$$T\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + T\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 - a_2 \\ a_1 + a_2 \\ 2a_1 \end{bmatrix} + \begin{bmatrix} b_1 - b_2 \\ b_1 + b_2 \\ 2b_1 \end{bmatrix} = \begin{bmatrix} (a_1 + b_1) - (a_2 + b_2) \\ (a_1 + b_1) + (a_2 + b_2) \\ 2a_1 + 2b_1 \end{bmatrix}$$

Comparing expressions, the first property holds. To confirm the second property,

$$T(c\mathbf{a}) = cT(\mathbf{a})$$

find an expression for the left hand side:

$$T\begin{bmatrix} ca_1 \\ ca_2 \end{bmatrix} = \begin{bmatrix} ca_1 - ca_2 \\ ca_1 + ca_2 \\ 2ca_1 \end{bmatrix}$$

Then find an expression for the right hand side:

$$cT\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = c\begin{bmatrix} a_1 - a_2 \\ a_1 + a_2 \\ 2a_1 \end{bmatrix} = \begin{bmatrix} ca_1 - ca_2 \\ ca_1 + ca_2 \\ 2ca_1 \end{bmatrix}$$

Comparing expressions confirms property two holds. Then T is a linear transformation.

The Nullspace For T ∈ L(V, W), the nullspace of T, null T, is the subset of V consisting of those vectors that T maps to zero:

$$\text{null}\, T = \{ \mathbf{v} \in V : T\mathbf{v} = 0 \}$$

For the matrix of the linear map, the nullspace of T is given by a matrix N that multiplies with T to give the zero matrix. Just like two vectors, any matrices that satisfy the orthogonality condition are orthogonal i.e., since T and its nullspace multiply to zero, they must be orthogonal. Calculate nullspace solutions by i) reducing the matrix of a map to R, ii) identifying negative free columns, and iii) replacing pivot columns in transposed R with free columns and filling the rest with the identity. For example,

i)

$$T = \begin{bmatrix} 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & 4 & -3 \\ 1 & 3 & 1 & 6 & -4 \end{bmatrix}$$

ii)

$$R = \begin{bmatrix} 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & 4 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

-with m – r zero rows, pivot columns 1 and 3, and free columns 2, 4, and 5, whose pivot-row entries are negated.

iii)

$$N = \begin{bmatrix} -3 & -2 & 1 \\ 1 & 0 & 0 \\ 0 & -4 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Multiplication quickly reveals that indeed

$$TN = 0$$

The transposed pivot columns demarcate where to split the negated free columns and fill in with the identity matrix. When there is only one pivot column, it means there is no split, and the free column is listed normally.
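A SymPy sketch checking the nullspace of the matrix as reconstructed above (SymPy returns the same special solutions, one per free column):

```python
import sympy as sp

T = sp.Matrix([[1, 3, 0, 2, -1],
               [0, 0, 1, 4, -3],
               [1, 3, 1, 6, -4]])
for n in T.nullspace():
    print(n.T)      # (-3, 1, 0, 0, 0), (-2, 0, -4, 1, 0), (1, 0, 3, 0, 1)
print(T.rank())     # 2, so nullity = 5 - 2 = 3
```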

The Fundamental Theorem of Linear Algebra In the previous section, the reduced matrix had m – r zero rows and n – r free columns. This is the so-called rank-nullity theorem: the rank r of a matrix is the number of linearly independent pivot columns it contains, the common dimension of its row and column spaces. The number of special nullspace solutions, one for each free column, is the nullity of the matrix. Then the rank and the nullity add up to n, or

$$\dim \text{null}\, A = n - r$$

And there are m – r zero rows in R. Vector spaces are orthogonal complements when the scalar product of any vector in one with any vector in the other is equal to zero. The dimensions of orthogonal complements add up to the number of dimensions of the space they span. The nullspace is orthogonal to the row space, and the left nullspace to the column space:

$$A\mathbf{x}_n = 0, \qquad A^{\mathsf T}\mathbf{y}_n = 0$$

Understand what this means: the nullspace is always orthogonal to the matrix from which it was made. It is not just orthogonal abstractly: it is actually orthogonal as a matrix i.e., it is made from the transpose of the reduced matrix. This is not unlike the orthogonality condition given between perpendicular row space and column space vectors. These things suggest that row basis vectors are orthogonal to column basis vectors. This is the essence of the Fundamental Theorem of Linear Algebra: the row space and the column space each possess an orthogonal complement consisting of a nullspace:

Row space (dim r) ⟂ Nullspace (dim n – r)

Column space (dim r) ⟂ Left nullspace (dim m – r)

Span Span is used to determine the existence of solutions to linear equations. In Vector Space, it was said that the span of vectors v₁, ...., vₙ is the set of all linear combinations a₁v₁ + ... + aₙvₙ. Check if a vector v is a linear combination of any other vectors wᵢ by augmenting a w matrix with the v. If w spans the space, its matrix will reduce to the identity. If v is within its span, the augmented column will yield the unique solution. If the augmented column contains a pivot, there are no solutions. If at least one unknown column (in the reduced augmented matrix) is non-pivotal, then there are infinitely many solutions:

$$[\,\mathbf{w}_1\; \mathbf{w}_2\; \mathbf{w}_3 \mid \mathbf{v}\,] \to [\,I \mid \mathbf{x}\,] \quad \text{(unique solution)}$$

$$[\,\mathbf{w}_1\; \mathbf{w}_2\; \mathbf{w}_3 \mid \mathbf{v}\,] \to [\,R \mid \text{pivotal } 1\,] \quad \text{(no solutions)}$$

$$[\,\mathbf{w}_1\; \mathbf{w}_2\; \mathbf{w}_3 \mid \mathbf{v}\,] \to [\,R \mid \tilde{\mathbf{v}}\,],\ R \ne I \quad \text{(infinitely many solutions)}$$

Then it can be said that the span of T, Sp T, is

$$\text{Sp}\, T = \{ T\mathbf{v} : \mathbf{v} \in V \}$$

-the set of all those vectors b that T maps some v onto.










































Chapter Five Matrix Properties

5.1 Determinants Matrix Length. Determinants. The Inverse Formula. Cramer's Rule. Vector Products. 5.2 Eigenvectors: Linear Operators. Eigenvectors and Eigenvalues. The Characteristic Equation. Eigenspaces. Diagonalization. Powers. Sequencing.

5.1 Determinants Matrix Length Let A be an n × m matrix. Totally analogous to a vector, matrix A's length |A| is defined by

$$|A|^2 = \sum_{i,j} a_{ij}^2$$

-the length is the square root of the sum of the squares of its entries. For example, for two matrices A and B,

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad |A|^2 = 1 + 4 + 9 + 16 = 30; \qquad B = \begin{bmatrix} 1 & -1 \\ 2 & 0 \end{bmatrix}, \quad |B|^2 = 1 + 1 + 4 + 0 = 6$$

Determinants Matrix determinants are easy once you get the hang of them. The first thing to know is that determinants only exist for square, n × n matrices. In R², a determinant is a real or complex valued function of 2 × 2 matrices; in goes a 2 × 2 matrix, out comes a number. For a 2 × 2 matrix, multiply corners up to the top row and subtract:

$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

In R³, we decompose a 3 × 3 matrix into more manageable 2 × 2 minor determinants. We do this by a simple rule that we'll later justify by deeper mathematical principles. Choose any row or column, choose an entry, and get a 2 × 2 matrix from the row and column that is deleted:

Choose entry a₁ of the first row of

$$\det \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}$$

Delete entry a₁'s row and column to form a 2 × 2 minor:

$$a_1 \begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix} = a_1(b_2 c_3 - c_2 b_3)$$

Choose entry b₁ of the same chosen row, and delete entry b₁'s row and column to form a 2 × 2 minor:

$$b_1 \begin{vmatrix} a_2 & c_2 \\ a_3 & c_3 \end{vmatrix} = b_1(a_2 c_3 - c_2 a_3)$$

And similarly for entry c₁ of row one, column three, giving the number c₁(a₂b₃ – b₂a₃). Now we just sum these three numbers together, giving them a positive or negative sign according to the rule

$$(-1)^{i+j}$$

So a₁, in position (1, 1), is given a positive sign by the multiplication of (–1)², where the exponent is equal to (i + j) = (1 + 1). Similarly, b₁ = (–1)³b₁ = –b₁, and c₁ = (–1)⁴c₁ = c₁. Then our 3 × 3 determinant is equal to

$$\det \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} = a_1(b_2 c_3 - c_2 b_3) - b_1(a_2 c_3 - c_2 a_3) + c_1(a_2 b_3 - b_2 a_3)$$

We could have also used any other row or column to start with. Expanding by the second column, for example,

$$\det \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} = -b_1(a_2 c_3 - c_2 a_3) + b_2(a_1 c_3 - c_1 a_3) - b_3(a_1 c_2 - c_1 a_2)$$

-each coefficient times its signed minor, the cofactor.

In every case we use our row or column to acquire coefficients. For larger matrices, such as a 4 × 4 matrix, we use the method of minors. Using some row or column of the 4 × 4 matrix, develop 3 × 3 submatrices on which we may take determinants as before:

$$\det \begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{bmatrix} = a_1 \begin{vmatrix} b_2 & c_2 & d_2 \\ b_3 & c_3 & d_3 \\ b_4 & c_4 & d_4 \end{vmatrix} - b_1 \begin{vmatrix} a_2 & c_2 & d_2 \\ a_3 & c_3 & d_3 \\ a_4 & c_4 & d_4 \end{vmatrix} + c_1 \begin{vmatrix} a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \\ a_4 & b_4 & d_4 \end{vmatrix} - d_1 \begin{vmatrix} a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \\ a_4 & b_4 & c_4 \end{vmatrix}$$

Call the signed minors of an n × n matrix A the cofactors of A, Aᵢⱼ, defined by the formula

$$A_{ij} = (-1)^{i+j} M_{ij}$$

Call the submatrices the minors of A, namely, of entry aᵢⱼ. They are the determinants Mᵢⱼ of the (n – 1) × (n – 1) submatrices that remain after deletion of the ith row and jth column of A. The determinant det A = |aᵢⱼ| of an n × n matrix A = [aᵢⱼ] is defined as

$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}$$
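The cofactor definition translates directly into a short recursive routine. A plain-Python sketch expanding along the first row (illustrative only; elimination is far faster for large matrices):

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 1, column j
        total += (-1) ** j * A[0][j] * det(minor)        # sign (-1)^(1+j), 0-indexed
    return total

print(det([[3, 1], [2, 4]]))    # 10
print(det([[1, 2, 1],
           [0, 1, 3],
           [2, 0, 1]]))         # 11
```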

Previous elimination of a matrix can greatly simplify its determinant computation by supplying zeros to the chosen row or columns. Because every term involving a zero coefficient has a cofactor term of zero, those terms are automatically zero. The determinant of a reduced matrix is still equal to the determinant of the original matrix. For example, in the matrix below, supply zero entries to the third column by quick addition of twice the first column to it, giving an equivalent, but much simpler, determinant computation:

$$\det \begin{bmatrix} 1 & 2 & -2 \\ 2 & 3 & -4 \\ -1 & 0 & 5 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 0 \\ 2 & 3 & 0 \\ -1 & 0 & 3 \end{bmatrix} = 3 \begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} = 3(3 - 4) = -3$$

This is the beauty of matrix theory: the matrix is linear, such that anything done to some column, row, or entry changes also the rest of the matrix just enough to retain certain immutable properties of the system's structure, regardless of the actual numerical value of the entries. In this way, we use matrix theory to investigate the linear structure between equations represented by matrix entries. Any row or column of a matrix might be multiplied by a nonzero constant k without changing the linear properties of the matrix, though the determinant scales along with it: if a row of a matrix B is got by multiplication by k of a row of A, then detB = k detA. Conversely, this means we may factor k out of any row or column of a difficult matrix:

$$\begin{vmatrix} ka & kb \\ c & d \end{vmatrix} = k \begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

Likewise, if a matrix B is got from A by adding a constant multiple of one row or column to another, then detB = detA. As in the example above, always seek zero entries to help reduce the difficulty of calculations. When given a matrix with zero entries, always expand along the row or column that yields the quickest zeros: for example, expand the determinant by a third column that already has only one nonzero entry. Obviously, then, any matrix with a column or row full of zeros has a determinant of zero. If any two rows or columns of a matrix A are exchanged, the determinant changes sign: it is negative if there are an odd number of exchanges, and positive if there are an even number of exchanges:

$$\begin{vmatrix} c & d \\ a & b \end{vmatrix} = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}$$

If any two rows or columns of a matrix A are identical, then detA = 0. This follows from the last property: let B denote the matrix obtained by interchanging the two identical rows or columns of A. Then B = A, so detB = detA; but by the exchange property detB = –detA. Thus detA = –detA, giving detA = 0. It follows also that detAᵀ = detA. If a matrix is reduced to an upper or lower triangular echelon form, then the determinant is simply the product of the diagonal entries of the matrix:

$$\det A = a_{11}a_{22}\cdots a_{nn}$$

This is easily proved: expanding repeatedly along the first column, every minor is itself triangular, so each step peels off one diagonal entry:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ 0 & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33}$$

The trickiest part of calculating determinants is following the signs through the computation. It is useful to simply recognize the sign pattern of cofactors in a determinant:

$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$

A concluding property of determinants has to do with the matrix inverse, the subject of the next section. As we will see, if a matrix is invertible, it is reducible and non-singular, and its determinant is nonzero. If a matrix is noninvertible, then it is irreducible and singular, and its determinant is zero. This property simply follows from the definition of the matrix inverse, in which the determinant of the matrix plays the role of denominator to the adjugate of the matrix. Finally, we point out the geometric nature of the determinant. Why does the determinant retain its value through factorings, row reductions, and entry exchanges? If we think of a matrix, as we did to open the section, as a function of column vectors, then we can think of the determinant as the signed area of a parallelogram whose legs are the column vectors of the matrix. For example, the area spanned by two vectors a and b is given by det|a, b|:

[Figure: the parallelogram with legs a and b, whose signed area is det|a, b|.]

The sign is given by the orientation of the vectors; the determinant is positive if and only if b lies counterclockwise from a, and is negative only if b lies clockwise from a.

The Matrix Inverse The cofactor matrix C of a matrix A is the matrix composed purely of its cofactors:

$$A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 3 \\ 2 & 0 & 1 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 6 & -2 \\ -2 & -1 & 4 \\ 5 & -3 & 1 \end{bmatrix}$$

The matrix adjoint or adjugate of A, adjA, is the transposed cofactor matrix:

$$\text{adj}\,A = C^{\mathsf T} = \begin{bmatrix} 1 & -2 & 5 \\ 6 & -1 & -3 \\ -2 & 4 & 1 \end{bmatrix}$$

The matrix inverse of matrix A, A⁻¹, is defined as its adjugate over its determinant:

$$A^{-1} = \frac{\text{adj}\,A}{\det A}$$

For example, in R²,

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Multiplication of the adjugate with its matrix gives the determinant down the pivot, ie the product (detA)I:

$$A\, (\text{adj}\,A) = (\det A)\, I$$

AE AI As an example of using the matrix inverse formula, the inverse of the following matrix A is found by

A

0 o

c 96

3

I 0

PETA

I 3

0

3

test

C

3

2

o

I

o

7

3

Cramer's Rule Cramer's rule uses determinants as a sometimes faster way of solving a system of equations Ax = b. It does this by replacing column j in matrix A with the known solution vector b. To see how this works, first reformulate the unknown solution vector x by way of the matrix inverse A⁻¹:

$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{\text{adj}\,A}{\det A}\, \mathbf{b}$$

The jth component of the product of A's adjugate and the solution vector is equal to the determinant of the Cramer matrix Bⱼ, which is defined to be the matrix A with column j replaced by the solution vector b:

$$(\text{adj}\,A\; \mathbf{b})_j = \det B_j$$

Incorporation of solution vector x into the identity matrix replaces any column of A upon multiplication:

$$A\, [\,\hat{e}_1\, \cdots\, \mathbf{x}\, \cdots\, \hat{e}_n\,] = B_j, \qquad \det A \cdot x_j = \det B_j$$

Using these formulae, Cramer's rule states that the unknown solution component xⱼ is equal to detBⱼ/detA:

$$x_j = \frac{\det B_j}{\det A}$$

In the following example, we apply Cramer's rule to the system of equations

$$3x_1 + 4x_2 = 2, \qquad x_1 + 2x_2 = 0$$

first via the adjugate,

$$\det A = \begin{vmatrix} 3 & 4 \\ 1 & 2 \end{vmatrix} = 2, \qquad \text{adj}\,A\; \mathbf{b} = \begin{bmatrix} 2 & -4 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \end{bmatrix} = \begin{bmatrix} \det B_1 \\ \det B_2 \end{bmatrix}$$

then confirm it conforms to the simpler formula detBⱼ/detA:

$$\det B_1 = \begin{vmatrix} 2 & 4 \\ 0 & 2 \end{vmatrix} = 4, \quad \det B_2 = \begin{vmatrix} 3 & 2 \\ 1 & 0 \end{vmatrix} = -2, \qquad x_1 = \frac{4}{2} = 2, \quad x_2 = \frac{-2}{2} = -1$$
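Cramer's rule is equally mechanical in code. A NumPy sketch (the helper name `cramer` is ours, not a library function):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule: x_j = det(B_j)/det(A)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Bj = A.copy()
        Bj[:, j] = b              # replace column j with b
        x[j] = np.linalg.det(Bj) / d
    return x

A = np.array([[3.0, 4.0],
              [1.0, 2.0]])
b = np.array([2.0, 0.0])
print(cramer(A, b))               # [ 2. -1.]
```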

Vector Products The cross or vector product, ×, is the product of the magnitudes of two vectors and the sine of the angle between them, directed along the unit normal n̂ to their plane:

$$\mathbf{a} \times \mathbf{b} = |\mathbf{a}||\mathbf{b}| \sin\theta\; \hat{n} = \mathbf{c}$$

[Figure: c = a × b standing normal to the plane of a and b, with b × a = –c pointing opposite.]

-a vector c satisfying the following four vector product properties: i) Orthogonality: a · (a × b) = 0, b · (a × b) = 0 The vector c is orthogonal to the plane spanned by a and b. For example, for two vectors u = (3, 2, 0) and v = (1, 4, 0), with only x and y components, the new vector w = u × v lies entirely in z:

$$\mathbf{w} = \begin{vmatrix} \hat{e}_x & \hat{e}_y & \hat{e}_z \\ 3 & 2 & 0 \\ 1 & 4 & 0 \end{vmatrix} = 10\,\hat{e}_z$$

ii) Anticommutativity: c = a × b = –b × a This just reverses rows 2 and 3 in the vector product's determinant. iii) Magnitude: |a × b| = |c| = ||a||b| sin θ| The absolute value of the cross product |a × b|, the length of vector c, |c|, is equal to the area of a parallelogram spanned by the vectors a and b:

[Figure: the parallelogram with legs a and b, of area |a × b|.]

To demonstrate this, if we start with an area we know, say 3, we can, using our determinant rules, easily think of two vectors a and b that will result in a cross product c with a single component with absolute value (ie, length) of 3:

$$\mathbf{c} = \begin{vmatrix} \hat{e}_x & \hat{e}_y & \hat{e}_z \\ 1 & 2 & 0 \\ 2 & 1 & 0 \end{vmatrix} = (1 - 4)\,\hat{e}_z = -3\,\hat{e}_z, \qquad |\mathbf{c}| = 3$$

Geometrically, if the two vectors a = (1, 2, 0) and b = (2, 1, 0) span a parallelogram, its area (b × h) is 3:

[Figure: the parallelogram spanned by a = (1, 2, 0) and b = (2, 1, 0) in the plane.]

Our geometric demonstration is convincing, but now let's prove it algebraically. The area of the parallelogram spanned by a and b is |a|·|b| sin θ. Then, since

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}, \qquad \sin\theta = \sqrt{1 - \cos^2\theta}$$

we have

$$|\mathbf{a}||\mathbf{b}|\sin\theta = |\mathbf{a}||\mathbf{b}|\sqrt{1 - \frac{(\mathbf{a} \cdot \mathbf{b})^2}{|\mathbf{a}|^2|\mathbf{b}|^2}} = \sqrt{|\mathbf{a}|^2|\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2}$$

Expanding the terms under the radical in components and cancelling,

$$(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2 = (a_2 b_3 - a_3 b_2)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_1 b_2 - a_2 b_1)^2$$

so that

$$|\mathbf{a}||\mathbf{b}|\sin\theta = \sqrt{(a_2 b_3 - a_3 b_2)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_1 b_2 - a_2 b_1)^2} = |\mathbf{a} \times \mathbf{b}|$$

This concludes our proof. We'll see soon enough that the three terms in the above radical are the components of the vector product, and how to actually compute it with these components.

iv) Right-handedness: a → b → c The vectors of the vector product lie naturally along the thumb, index, and middle finger of the right hand, and only along this orientation is c positive. More simply, when imagining R³, the next axis to be drawn always lies to the left of the previous axis, in a new plane:

[Figure: the right-handed orientation of the axes x → y → z.]

Compute the vector product just like a determinant, but replace the first row (or column) with unit vectors. This leaves our determinant sum with vector components, creating a vector with the components of the vector product:

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \hat{e}_x & \hat{e}_y & \hat{e}_z \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = (a_2 b_3 - a_3 b_2)\,\hat{e}_x + (a_3 b_1 - a_1 b_3)\,\hat{e}_y + (a_1 b_2 - a_2 b_1)\,\hat{e}_z$$

For example, the vector product of vectors a = (2, 1, 2) and b = (2, 4, 1) is the vector c = (–7, –2, 6):

Ex Ey I

462

211

zéyt6Éz

IZ

ICI

343

7Ex

262 4 2147 211

The length of this vector is equal to its magnitude:

E

7

2

6272 1672

89

-the area of the parallelogram spanned by a and b. We could also use a parallelogram determinant to find the area spanned by two vectors in Rⁿ. Take the square root of the determinant of the square matrix of the column matrix of the vectors:

Yz

I

pi p

b

a

12

Arena E

Using the two vectors a = (2, 1, 2) and b = (2, 4, 1) as example,

I 4 -which conforms to our previous result. 100

z

Z

1

4

z

Yz

9 to k 10 2

89

Use the vector product components to confirm the orthogonality property:

I

Ext

u

Guzy

uz uz

WgrzEx wives war é

wirz war

E

If two vectors are orthogonal, then their scalar product is zero. Similarly, if the vector product of two vectors are zero, then they must be parallel. They must have two equal rows:

Ex Ey Ez

Oz

3 I 8

Triple Products If the area of a parallelogram is given by the length of the vector product, then the scalar product of the vector product with some vector c anchored at the origin should give us a volume. The triple scalar product, c · (a × b), is the signed (oriented) volume of a parallelepiped having as edges the three vectors involved:

at

e n

7

y 10

q

Ext

E

The triple scalar product is cyclic and anticommutative:

E E

Exe

Ext

F

E

Ext

Voila I

E

Ext

E

ext

I

axe

Ext

E

This algebraic product corresponds to the volume given geometrically by the absolute value of a determinant that has as columns the three vectors that form the edges of the parallelepiped:

Voila T

E

DETCE T E

For example, using vectors a = (1, 4, 2), b = (2, 1, 3), and c = (2, 3, 1), we have the volume of the parallelepiped they form the edges of as

E

Ext

2

I

It É I 3

to

3133

8 4

1

7 37

20

16

16

101

The triple vector product is the cross product of one vector a with the cross product of another two b and c:

E x FXE The triple vector product must also be a vector, w:

I

E x FXE

The vector w must lie in a plane defined by b and c, in the direction of êq:

aia

The vector w is not required to be orthogonal to b or c, but to a vector d = b × c. Thus, the triple vector product w = a × (b × c) reduces to the vector product w = a × d. The triple product expansion is defined to be

E x FXE

F E

E

Eta f

Prove the above identity using the tensor expression for the triple vector product (see Part V, §6 Tensors). The triple vector product is cyclic and anticommutative:

a x EXE

E

x

Ex a x

exE

a x EXE

6

FxE

Ex

The sum of the vector product with its two cycles is equal to zero:

I x FXE -the Jacobi identity. It follows that 102

E

x

Exa

Ex E x

5

o

-the Jacobi identity corollary. To conclude this section, we give an important identity, the squared identity of the vector product:

Ext

Ca a

F 6

I

E

It's easily proved:

Ext

EXT

EXT

E F sin OF I

2

I SING

I

2

5 41

q

Z

f

2

a

2

I

2

a

2

I

2

a a

cost 2

I

62020

E f cos o E

E E

67 E

I

103

5.2 Eigenvectors Linear Operators A linear operator makes one vector dependent on another in the same way a function makes one scalar dependent on another. In other words if vector C is a function of D, then a linear operator O transforms C into D, ie

OCC

D

Specifically, a linear operator on a vector space V is a linear map T from V to itself. It satisfies the linearity condition:

TCCTtw

CTCF

TCW

When the linearity condition is in this form, it's easy to see that the operator acts on the basis vectors, “bypassing” the scalars:

TCT

Vite

Teviei

Vit

e

-linearity causes operators to act on basis vectors. Denote the set of all linear operators on V by L (V), i.e., L (V) = L (V, V). An invariant subspace is a subspace that gets mapped into itself. For T ∈ L (V) and U being a subspace of V, U is invariant under T if u ∈ U implies Tu ∈ U. Conversely, U is invariant under T if T is an operator on U: T| . u One-dimensional subspaces of V are especially easy to describe (it consists of all the vectors that run along a single line) and likewise it is easy to observe the behavior of operators there. Take any nonzero vector u ∈ V and let U equal the set of all scalar multiples of u:

U

at

a

EF

Then U is a one-dimensional subspace of V, and every one-dimensional subspace of V is of this form. If u ∈ V and the subspace U defined by is invariant under T ∈ L (V), then Tu must be in U, and hence there must be a scalar λ ∈ F such that Tu = λu, or, with vectors emphasized, Tu = λu. Conversely, if u is a non-zero vector in V such that Tu = λu for some λ ∈ F, then the subspace U is a one-dimensional subspace of V invariant under T.

Eigenvectors and Eigenvalues An eigenvector is a vector that does not change direction under a linear transformation. Therefore, it lives in an invariant subspace and that mapping that transforms it must be an operator. With the eigenvector is an associated eigenvalue λ ∈ F that is a scalar multiple of the eigenvector v:

AT

XT

Specifically, the scalar λ ∈ F is an eigenvalue of the n × n matrix A provided there exists a non-zero vector v such that Av = λv. Vector v is required to be nonzero because with v = 0 every scalar λ ∈ F satisfies the equation. The scalar λ creates the state of the eigenvector- stretching it or shrinking it or reversing it or leaving it unchanged. The eigenvectors of a 104

symmetric matrix are always orthogonal. What vectors multiply the matrix A only to give themselves?

Y

A

I

T

E

T

Av

I

X

1

I

I

Similarly, for a non-symmetric matrix,

A

S

6

z

z

T

L

I

Av

s

6

z

z

X

3

I

32

32

z

1 1

The sum of eigenvalues Σλ of a matrix A is equal to Tr A. The product of eigenvalues Πλ is equal to det A. An n × n matrix has n eigenvalues. If λ = 0, then λ is in the nullspace. The same eigenvector may differ by some constant. All antisymmetric matrices admit complex eigenvalues.

The Characteristic Equation Given a matrix, we want to be able to find their eigenvectors and associated eigenvalues. Begin by moving the λv term in the definitional equation to the left hand side and factor out the v:

AT

XIN

CA

XT

O

The logic in doing this is that, if (A – λI)v = 0, then either (A – λI) is an invertible matrix with v = 0, or (A – λI) is singular non-invertible with |A – λI| = 0 and v is an eigenvector in the nullspace that may be calculated along with its associated eigenvalue. The equation

A

XI

O

is the characteristic equation. Assuming its existence for a matrix A allows for eigenvalues to be directly calculated as the zeros of the equation. Substituting these values back into the characteristic equation creates a matrix who's determinant is equal to zero, and therefore eigenvector v can be calculated as the vector that is taken to the zero vector by A, i.e., the vector in the nullspace. For example, for the following matrix A, λ and v are found by first setting up the determinant and solving it: subtract λ from the diagonal, take the determinant and solve for eigenvalues that are the zeros of the equation:

A

A

XI

X

3 I

3

p

3

3

XT 1

42 6

8

X

D 2

X

2

X 4

1 4

The polynomial λ² – 6λ + 8 is called the characteristic polynomial, the unique polynomial associated with the characteristic equation of A. Next, substitute back into the characteristic equation the values of λ, finding the 105

eigenvectors in the nullspace:

A

A

21

I

1

I

l

41

0

IT

I

I

I I

Pilot

IT

Y

o

d

I

I

pilot

That these are the eigenvectors and eigenvalues turns out to be true:

3 I

A

Il

3

z

T

3

IXE

1

I

3

A

v2

II

Te

Te

II 4

T I

I

XE

Sometimes, using the sum and product of eigenvalues equivalencies is faster than using the characteristic equation. For example, on an antisymmetric matrix, it can quickly be determined that the only number with a sum of 0 and a product of 1 is imaginary opposites:

t

Q

EX

TQ

0

X DETQ I Confirmable via its characteristic equation:

t

XI

Q

X

y

I

X

1

Matrices with eigenvalues equal to 0 and 1 are projection matrices. Matrices with eigenvectors equal to 1 or -1 are reflection matrices. Matrices with absolute value of eigenvalues equal to 1 are permutation matrices. Any matrix given as upper triangular has its eigenvalues displayed on its diagonal:

A

106

1

2

3

1

2

34

I

XE X

t

O

I

1 3

Eigenspaces Sometimes, there is more than one eigenvector associated with an eigenvalue. Every eigenvector adds a dimension to the solution space of the characteristic equation: the eigenspace, the subspace of Rⁿ consisting of all n eigenvectors associated with a particular λ, along with the zero vector. It is then said that the n-dimensional eigenspace has basis {v¹, v², ..., vⁿ} associated with the eigenvalue λ. For example, to find the bases for the eigenspaces of the matrix

g

4

2

z

o

I

z

z

3

First find eigenvalues via the characteristic equation, then associated eigenvectors in the nullspace, the number of which gives the dimension of the eigenspace:

2

I

4 X

Y 3

2

266

2X

Z

X

I

21

264

2

z

Y 3

z

4 X

4 X

3

X

4



2

12

X

7

2

2

13 3

8

16

8

2

X

12

7

4

2

4 2

4

27

4

2X

16

27

12

Use the rational zero theorem, the factor theorem, and synthetic division (see Algebra, §IX) to quickly find the roots of the cubic polynomial:

X

IÉ É Theorem

2

I

z

7

I

16 12

X X 2

I

18

7

X

2,2

2

16

7

12

4 11

o

D

0

2

Etten

POLYIELDSTO

16

12

X 2

X

5

61

X 2

x 2

Y 1

5

61 x 3

3

The repeated eigenvalue has more than one eigenvector. The number of extra eigenvectors is the dimension of the eigenspace for this eigenvalue. Solving for the nullspace eigenvectors,

2 2 z z

I I l 42 l z

z

I

z

z

I

A ZI

U

R

107

Entries of N go in the transposed pivot column's spot, the top row. Fill the rest of the matrix with I. The eigenvectors are then clearly displayed as free columns:

I

N

Yz

1

T

I

I

t

I

E

I

o

o

z

t t



That an eigenvector can differ by some constant does not affect its function in solving a system to zero, which is why the latter eigenvector was arbitrarily changed. In general, ratios are not wanted in the eigenvector, and they are always multiplied out. When a matrix only has one equation, it is probably faster to simply ask what vectors will solve the equation, for which the same answer would have been discovered. The two above eigenvectors are associated with repeated eigenvalue λ = 2. Then the 2-dimensional eigenspace of A associated with repeated eigenvalue λ = 2 has basis {v₁, v₂}. For the other eigenvector,

I

2

I

2

3

I

t

t

z

z

O

z

2

1

2

I

1

2

t

I t

l

l t

t

R

A 31

I I

1 I

I v3 I N



I

Then the eigenspace of A associated with eigenvalue λ = 3 is 1-dimensional and has basis {v₃}.



Diagonalization

An n × n matrix A may be diagonally factored into the eigenvector matrix, the eigenvalue matrix, and the inverse eigenvector matrix:

A SAS

Where S is the eigenvector matrix, composed of columns of eigenvectors, and Λ is the diagonal matrix of A, with n eigenvalues of A running down the diagonal. Two n × n matrices A and B are said to be similar if there exists an invertible matrix P such that

B P AP

An n × n matrix A is diagonalizable if it is similar to a diagonal matrix D, i.e., if there exists an invertible matrix P such that A = PDP⁻¹ and so

P AP

D

The process of finding P and D is called diagonalizing A. An n × n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors. If an n × n matrix A has n distinct eigenvalues, then it is diagonalizable. Then the diagonalization of A consists of finding S and Λ. In S and Λ, the order of eigenvector columns in S correspond to the positions of associated eigenvalues λ along the diagonal of Λ. Every symmetric matrix is diagonalizable by an orthogonal matrix.

108



Powers Any matrix that maps to powers of itself is an operator, and this distinguishes them for their usefulness in polynomial and matrix equations. According to the definitional equation of eigenvectors, any power A is raised to, λ must also be raised to:

A AT ACXT



AT XX



XT AT

When A is diagonally factored, Λ shares this same property:

A

SAS

SAS SAS

In general,

SA'S

A

Find a power of an n × n matrix A by raising Λ to that power and multiplying SΛS⁻¹. Therefore, to do so, i) find eigenvalues, ii) find eigenvectors, iii) find the inverse eigenvector matrix, iv) raise Λ and multiply. For example, finding the third power of the matrix A via powers of its diagonalization,

A

XI

A

X

l o

2

z

3

C X G X

X

1 3 2

I

I

I

X

z

s

t

ti

c

l I

A STS

t

o 5

t

t

t

13

t

33

t

2

I

yes

g

1 I

1

I

I

I

I

1

1 26

a

27

Which is confirmed by conventional means:

A

I

3

I

2

3

I

2

3

I

8

I

26 27

109

Sequencing Any sequence that has three determinable consecutive terms can be sequenced by matrix according to the sequencing formula:

Un Anno SNStwo Where u₀ is the vector that has as its entries the first two terms of the sequence, and u₁ is the vector that has as its first two entries the second and third term in the sequence. We then ask what matrix multiplies u₀ to give un . For example, in the Gibonacci sequence, where every new term is the average of the previous two, the vectors would be

no

Anti

Antz

un

G

Ant

for which the Gibonacci rule is

Antz

Entitle

Then the rule matrix is

112 112

Antz

Ant

Now that the rule matrix A is known, the sequencing formula can be used by diagonalizing A, raising to the power of the term vector to be found, and multiplying the diagonalization by u₀. Taking the Gibonacci to its third power:

It

X Yz X

V2 X Yz

X

I

X

112

Th A

I

I

y

1

II

112

I

42

I

I

yz

At YI

Ii

I

se

Xz

12

X 1

Y Y X 112

Yz

S

g g

3

l

2

S

If u₀ contains the first and second term of the Gibonacci sequence (0 and 1), then u₁ contains the second and third term:

31

I

I

I

2

l

2

1

I

I

I

0

42 s

31

I I

Then the third term in the Gibonacci is 1/2, which is, in fact, what it is. 110

1

I

I

2

s

l

z

12

y

I











































Chapter Six Real Analysis

6.1 Analysis of Real Numbers: The Idea of Real Analysis. Infinite Sets. Real Numbers. Real Topology. 6.2 Essential Definitions of Real Analysis: Open Balls. Closed Balls. Open Sets. Closed Sets. Convergent Subsequences. Bounded Sets. Convergent Sequence Properties. Subsequence. Closure. Limits of Functions. Limits of Transformations. Limits of Compositions. Continuous Functions. Convergent Vectors. Convergent Matrices. Compact Sets. Least Upper Bound. Maximum. Minimum. Existence of Maxima and Minima. The Mean Value Theorem. 6.3 The Inverse and Implicit Function Theorems: Newton's Method. Lipschitz Conditions. Lipschitz Constants. Kantorovitch's Theorem. Superconvergence. The Inverse Function Theorem. The Inverse Function Theorem - Short Version. The Implicit Function Theorem Short Version.

14.1 Analysis of Real Numbers The Idea of Real Analysis The notion of continuity is fundamental to the machinery of differential geometry. It's not enough to assume that coordinates change continuously, or that one curve maps to another continuously; we must prove it. That's what real analysis is all about. In this chapter, we'll dig deep to establish a firm foundation for manifold theory. For clarity, we'll summarize the ideas of real analysis, as it concerns differential geometry, here. A real function of a single variable is said to be analytic at x = x₀ if it has a Taylor series expansion about x₀:

fx

fix f Xo

x

thot z

to

x Xo

fix

x

x Xo

to

É

t's



x Xo

x

É yt

to

f4

t

Therefore, functions that are not infinitely differentiable are not analytic. Infinitely differentiable functions exist which are not analytic, however, the standard example being exp(–1/x²), whose value and all of whose derivatives are zero at x = 0, but which is not identically zero in any neighborhood of x = 0. This is explained by the fact that the analytic extension of this function into the complex plane has an essential singularity at z = 0, but is well behaved on the real line. Analytic functions are good approximations to many nonanalytic functions in the following sense. A real valued function g(x₁, ..., xₙ) defined on an open region S of Rⁿ is said to be square integrable if the multiple integral

I

gun

xn

Jx Jx

Jan

exists. It is a theorem of functional analysis that any square integrable function g may be approximated by an analytic function g' in such a way that the integral of (g – g')² over S may be made arbitrarily small. In other words, the difference between the analytic and nonanalytic functions can be made negligible given a positive definite square integrable function. For this reason, a given physical function may be assumed to be analytic. Since a C function need not be analytic, use the notation C for analytic functions. A map f: M → N is continuous at x in M if any open set of N containing f(x) contains the image of an open set of M, where M and N are topological spaces. In other words, when open neighborhoods of one topological space map to open neighborhoods of another topological space, the mapping between them is continuous. When the map f is continuous at all points of M, f is continuous on M. A continuous mathematical mapping of this kind takes a geophysical feature, such as a mountain, and maps it to the plane, a common map: P Q

M

f

f(Q) 112

f(P)

f

N

Obviously, to say a function f(x₁, ..., xₙ) on an open region S of Rⁿ is Cᵏ differentiable is to say that it has continuous partial derivatives of order less than or equal to k on S. If f is a bijective map of an open set M of Rⁿ onto another open set N, it can be expressed as

y

fill

Xn

Xz

J f I

or

or

y

yn

f Xu

where {xᵢ} defines a point x of M and {yᵢ} likewise defines a point y of N. If the functions are all Cᵏ differentiable, then the map is said to be Cᵏ differentiable. The Jacobian matrix of a C¹ map is the matrix of partial derivatives, the first derivative matrix:

Its determinant is the Jacobian, J. If the Jacobian at a point x is nonzero, then the inverse function theorem guarantees that the map is bijective in some neighborhood of x. This dovetails nicely with results of linear algebra concerning nonsingular invertible matrices, in particular, that without a nonzero determinant, the inverse matrix fails to exist by

IT

A

If a function g(x₁, ... , xₙ) is mapped into another function h(y₁, ... , yₙ) by the rule

hlf lx

Xu

fully

xu

get

Xu

(that is, h has the same value at f(x) as g has at x), then the integral of g over M equals the integral of hJ over N:

Smg x

fhly

Xu tx dxz dxn

yn

J dytyz dyn

-the change of variables equation. Since g and h have the same value at appropriate points, it is often said that the volume-element dx₁dx₂...dxₙ has changed to J dy₁dy₂...dyₙ . An operator A on functions defined on Rⁿ is a map which takes one function f into another one, A(f). An obvious operator is the derivative,

Jtf

Df

or the multiplicative operator A(f) that is just gf where g is another function. An important type of operator occurs in integration using a fixed kernel function g:

fly g

Gfcx

x y

dy

and there are also more complicated operators such as

Ef

ft

3

113

Specifying the set of functions on which an operator is allowed to act in fact forms part of its definition; this set is its domain. For example, operator D may not be defined on a function which is not C¹, while operator G may be undefined on, say, functions which give unbounded integrals. The commutator of two operators A and B, [A, B], is another operator defined by

CAB BA f

A B f

AB f

BA f

If two operators have vanishing commutator, they are said to commute. Now, let's review some terms of real analysis so we can prove the inverse function theorem.

Infinite Sets The concept of convergence begins with infinite sets. An infinite set is obviously any set with an infinite number of elements. These sets may be countable or uncountable, depending on their specific properties. The countable number of members of a set is its cardinality. Using cardinality, its easy to see that the infinity of R must be larger than the infinity of the naturals, N. Theorem 1.1.1: The cardinality of infinite set N is smaller than the cardinality of infinite set R. Proof: Take an infinite list of reals, {R}:

9374 2 9 71 3 I 0735 9 6 59

576 06 S S 8 6776 5 3 28 229

Add a new term by taking the diagonal and making an arbitrary rule change for integers 0-9:

1157

9097

ts

-the number .7775 is not on the list: it can't be the nth term because it doesn't share the nth decimal. The infinity of R is greater than that of N. In axiomatic set theory, set theory for which absolute axioms are given, infinite sets are built by was of successors, and essentially rely on the reasoning that for every number one can come up with, a successive number may always be found that forms a new set for which the previous is a countable subset. Axioms are aggressively given, instead of proved, in axiomatic set theory to eliminate paradoxes that arise when dealing with infinite sets. The most famous of these is Russell's paradox, which says that if a set is defined to be the set which contains all sets that have no members like themselves, then that set must belong to itself:

IR 114

X

X

X

R E R e

R ER

Real Numbers Real numbers are introduced in college algebra (Appendix A), and their properties of continuity are studied throughout the calculus of single variable (Appendix B). Real numbers are the set of all infinite decimals, ordered from least to greatest such that all numbers, from naturals to rationals, are included, and such that any number obeys the rounding convention defined by k-closeness. Two points x, y ∈ R are k-close if for each i = (0,..., n), |[x ] – [y ] | ≤ 10 k, where k is the decimal i k i k placement of the numbers. For example,

9998

Cy

x

1.0001

0.0003

AEemEnt

I 154

0001

In other words, k-close is the technical way of explaining the convention that a number ending in all 9's is equal to the rounded up number ending in all 0's. When this is considered, the above agreement between .9998 and 1.0001 is said to be 3-close: they agree to three decimal places. When two numbers are k-close for all k, the two numbers are the same.

Real Topology The space Rⁿ is the usual n-dimensional space of vector algebra. A point in Rⁿ is represented by a sequence of n real numbers, an n-tuple, (x₁, x₂, ..., x ). The real space Rⁿ is continuous, following from the definitional continuity n of R. That R is continuous endows it with a certain physical applicability beyond all other kinds of numbers except the complex numbers. Intuitively, continuity means that there are points of Rⁿ arbitrarily close to any other point, and that a line joining any two points can be subdivided into arbitrarily many pieces. This is opposed to, say, a lattice, such as the set of all n-tuples of integers (i₁, i₂, ... , i ). In practice, we recall from Vector Calculus the n definition of multivariable continuity, namely, that, like functions of a single variable, continuity in multiple variables is defined in terms of limits. A function is differentiable on an open region U if and only if it has continuous partial derivatives on U. A function ƒ(x, y) is continuous at the point (x₀, y₀) if i) f is defined at (x₀, y₀), ii)

lim

f(x, y) exists

lim

f(x, y) = f(x₀, y₀)

cx.gs exoyo

iii)

cx.gs exoyo

When a function can be differentiated n times at each point in its domain, and the nth derivative is continuous, the function is considered Cⁿ-smooth. If a function is Cⁿ-smooth for every positive integer n, it is considered C -smooth. In real analysis, the concept of an open ball was defined analogously to that of an open interval, namely, by usage of the relations ≥, ≤, >, or 0 such that an open ball centered at x of radius r, B (x), still lies within U, i.e., Br (x) ⊂ U. An open neighborhood and an open r ball clearly have the same meaning, but now it would be said that the set of points for which a ≤ x < b is not open because the point x = a does not have a neighborhood entirely contained in the set, since some points of any neighborhood of x = a must be less than a and therefore outside of the set:

Two points are near when they lie in the same neighborhood. In the same way an open interval is analogous to an open ball, a neighborhood is in general any dimensional "chunk" of Rⁿ, and its set is open if it its boundary is not included in the set. In topology, the idea that a line joining any two points in Rⁿ can be infinitely subdivided is made precise by saying that any two points in Rⁿ have neighborhoods which do not intersect when those neighborhoods are chosen small enough, the Hausdorff property of Rⁿ:

o

o

From the continuous structure of Rⁿ, this property describes the discrete structure caused by subdivision. In terms of a perfect fabric, be it any boundary, such openness within it implies its perfect malleability, and its possession of the Hausdorff property means that it is also measurable. 116

14.2 Essential Definitions of Real Analysis We'll need the inverse function theorem to define manifolds rigorously. To make sense of it, let's review the following concepts of real number analysis.

Open Balls Let's define some terms of real analysis. Essential terms of set theory are briefly given in Appendix A.2. For any x ∈ Rⁿ and any r > 0, the open ball of radius r around x is the subset

Brix

yERM

x

r

y

For example, the ball of radius 2 centered at p would be B₂(p).

Closed Balls For any x ∈ Rⁿ and any r > 0, the closed ball of radius r around x is the subset

Bra

yERM

X

y Er

Open Sets A subset U ⊂ Rⁿ is open in Rⁿ if for every point x ∈ U there exists r > 0 such that Bᵣ(x) ⊂ U. In other words, no matter the boundary of U, an x with radius r may always be chosen closer to the boundary.

Closed Sets A closed set of Rⁿ, C ⊂ Rⁿ, is a set whose compliment Rⁿ – C is open. For example, some open and closed sets, using interval and set notation, are a

b

x EIR a

x a

6 b

Ca 6

EXER

asx

x ER

axe 6

6

Convergent Sequences A sequence of points a₁, a₂, ... aₙ in Rⁿ converges to a ∈ Rⁿ if for all ε > 0 there exist M such that when m > M. Then |aₘ – a| < ε. Then a is the limit of the sequence.

Bounded Sets A subset x ∈ Rⁿ is bounded if it is contained in a ball in Rⁿ centered at the origin:

X C Br o forsome r

a 117

Convergent Sequences Properties Let aₘ and bₘ be two sequences of points in Rⁿ, and cₘ be a sequence of numbers. Then i) If aₘ and bₘ both converge, then so does aₘ + bₘ, and

Lingam t mtg6m

mfscantbm

ii) If aₘ and cₘ both converge, then so does aₘcₘ, and

IdeCm mean

mess Emam

iii) If aₘ and bₘ both converge, then so does aₘ · bₘ, and

inseam mustbm

mess2amGm

iv) If aₘ and bₘ both converge, then so does aₘ · bₘ, and

mtg CmAm

0

Subsequences A subsequence of a sequence a₁, a₂,... aₙ is a sequence formed by choosing elements of the originating sequence by some rule such that aᵢ₁, aᵢ₂, ..., aᵢₙ ↔ i(k) > i(j) when k > j If a sequence converges to a limit, so does its subsequence.



Closure If A ⊂ Rⁿ is a subset, the closure of A, A, is the set of all limits of sequences in A that converges in R:



A ISCLOSED

A A

Limits of Functions A function f: U → Rᵐ has the limit a at x₀,

Yot

x

fixo

a

if x₀ ∈ U, and if for all ε > 0 there exist δ > 0 such that when |x – x₀| < δ, and x ∈ U, then |f(x) – a| < U(ε). In other words, when f is evaluated at x arbitrarily close to x₀, f(x₀) is arbitrarily close to a.

118

Limits of Transformations Observe an Rᵐ -valued mapping:

J Th

Mm

-in goes one vector, out comes another: a vector function, f. When f has no arrow, the vector function is treated as a coordinate function. Multivariable functions decompose into their individual coordinate functions. Using vertical vector notation:





5 t th IT



Let

II

E

III be a vector function defined on a domain U ⊂ Rⁿ, and let x₀∈ Rⁿ be a point in U. Then

IX F

a

exists if and only if each of

FIX Ji

ai

exists, ie

HIIIII The properties of limits of functions are similar to the convergent sequence properties; the limit of the sum is the sum of the limit. The limit of the product is the product of limits, and likewise for quotients and scalar products. Limits of Compositions If

ftp.FCX

Y ANDgtfogey

exists, then

II

JFK

yegg y

119

Continuous Functions Let X ⊂ Rⁿ. Then a mapping f: X → Rᵐ is continuous at x₀ ∈ X if and only if for every ε > 0, there exists δ > 0 such that when |x – x₀| < δ, then |f(x) – f(x₀)| < ε. Thus, vector function f is continuous on X if it is continuous at every point on X:

IX f

x

fix

Let U be a subset of Rⁿ, f and g mappings U → Rᵐ, and h a function U → R. Then i) f, g continuous at x₀ → so is f + g ii) f, h continuous at x₀ → so is hf iii) f, h continuous at x₀ → so is f / h, with h(x₀) ≠ 0 iv) f, g continuous at x₀ → so is f · g Therefore, i) Any polynomial function Rⁿ → R is continuous on all of Rⁿ. ii) Any rational function is continuous on the subset of Rⁿ where the denominator does not vanish. Continuous functions are defined for integers and reals.



Convergent Vectors A series



Eia

is convergent if the sequence of partial sums

In Eia is convergent. Then the infinite sum is

Edi

YI In

Convergent Matrices A series of n × m matrices

É Ak

converges if for each position (i, j) of the matrix the series of the entries (Aₙ)(i, j) converges.

Compact Sets A subset C ⊂ Rⁿ is compact if it is closed and bounded. If a compact set C ⊂ Rⁿ contains a sequence x₁, x₂, ..., xₙ, 120

then that sequence has a convergent subsequence xᵢ₁, xᵢ₂, ..., xᵢₙ whose limit is in C: Limit

EBOUNDAT

eat

Boundary

X

Xii

Convergent Subsequence

-a compact set C has continuous first derivatives. Then C² has continuous second derivatives, and so on.

Least Upper Bound A number x is the least upper bound of a vector function f defined on a set C if x is the smallest number such that f(a) ≤ x for all a ∈ C.

Maximum A number x is the maximum, max, of a vector function f defined on a set C if it is the least upper bound of f and there exists b ∈ C such that f(b) = x.

Minimum A number y is the greatest lower bound (inf) of a vector function f defined on a set C if y is the largest number such that f(a) ≥ x for all a ∈ C. A number y is the minimum, min, of f if there exists b ∈ c such that f(b) = y.

Existence of Maxima and Minima Let C ⊂ Rⁿ be a compact subset, and f: C → R be a continuous vector function. Then there exists a point a ∈ C such that f(a) ≥ f(x) for all x ∈ C, and a point b ∈ C such that f(b) ≤ f(x) for all x ∈ C. In other words, the max and min of f exists.

The Mean Value Theorem Roughly speaking, the mean value theorem states that for a given planar arc between two endpoints, there is at least one point at which the tangent to the arc is parallel to the secant adjoining its endpoints on some interval. If f : [a, b] → R is continuous and f is differentiable on (a, b), then there exists c ∈ (a, b) such that



f cc

Fcb fca b

a

See Appendix E for proof.

Derivatives Calculus replace nonlinear mappings with linear ones so that they may be studied. Curves become lines, surfaces become planes, nonlinear equations become linear transformations. Theorem 14.1.1: Let U be an open subset of R, and f: U → R a function. Then f is differentiable at a ∈ U if following the limit exists:

flag

Lfo L flath

f

as

121

14.3 The Inverse Function Theorem Newton's Method The nth roots x of any number b is given by:

x I y

fix

x

6

0

The method of dividing and averaging, in which a first guess a₀ is replaced by a successive guess a₁, every new guess being of the form

Cota

a

and ultimately converging to some root, is Newton's method in disguise. They both identify the same geometric phenomena:

fix

x

6

6

4 do 3

a

do a

2.167

I

2.167

3

n

4

2.167 734 I 2.0006

Az

2

13 6

313

952.0006 -convergence to the root: x = ±2. Interpreting Newton's method geometrically exemplifies the distinctions between a continuous and a linearized function:







Continuous









Cao X do



Linear



Starting at f(a₀), the linearized function's x-intercept converges to the root:



x a

122

f

x

fool

Df

Icao

Dao

8



-a system of n linear equations in n unknowns:

DILA

X

A

A

Icao 6

X

If [Df(a₀)] is a nonsingular, invertible matrix, then

f

ait b

x

x

Ao

x

a

Dfat fa do

Ia

a

6

t Lao I to

DIA

TewtonsMETHD

Newtons method has a distance x – a₀ = –[Df(a₀)]⁻¹f(a₀) that becomes smaller and smaller, while the method of dividing and averaging has an initial guess a₀ that converges to the root x. As in divide and average, take the solution x to be a₁, etc., and hope the function converges to a root.

Lipschitz Conditions The Lipschitz Condition is the condition whereby the distance between a functions derivative at two points is less than or equal to the distance between the two points, thereby ensuring (nearly) that it is twice continuously differentiable: Let f: U → Rᵐ be a differentiable mapping. The derivative [Df(x)] satisfies a Lipschitz condition on a subset V ⊂ U with Lipschitz constant M if for all x₁, x₂ ∈ V



E M X X



LIPSCHITZ CONSTANT







Dfw

Dfa

T

Lipschitz Constants Root out M from the I) normalized distance between derivatives, or II) the normalized second partial derivatives:



I say





123

f

Fite

fly

YEE

Df

1,1











X



Y Z 2 X



Yz



E Z x

z







F







O 0 6 2 D D



0 D 0 D D





i M 6



Kantorovitch’s Theorem Let a₀ be a point in Rⁿ and f: U → Rⁿ a differentiable mapping with invertible derivative [Df(a₀)]. Define

Pff

zgy

DAY

jz

y F Z Xz

y

Dfw

Dey

24392

yzf

X Xz

Xz yay M

y

Dif

IF

f

DID

III

I FI

Dad f f Daf Diff Def Ditz bx 136 7 36 3 Dfw DAY

To

DECA

increment

III

Datz I DzDzf Diff 61 7 3

Icao

ao

Uh

vector

-a vector. A point minus a vector is a point:

do A point plus a vector is a point. Define

124

DFCao

Icao

I

a

If

do tho

a

Uo

ails thot

X H

If the derivative [Df(x)] satisfies the Lipschitz condition

I Fca

Dfa

E M U

Ual

For All U UzE No

then f(x) = 0 has a unique solution in U₀ that Newton's method, with seed a₀, converges to, iff

IFcaal

Dfa

T M Et

If the product of the length of the function at a₀, the length of the squared inverse of the derivative of the function at a₀, and the Lipschitz constant M, is less than or equal to 1/2, then Newton's method is guaranteed to converge to a root in U₀:

a

u

Uo

to

Each iteration of the inequality locates one ball inside another, with at most half the radius of the previous. Lipschitz conditions satisfied on one ball are satisfied on all of them. As the radius of the ball goes to zero, the sequence of a's converge to x -the root. The following example demonstrates the use of Newton's method with Kantorovitch's theorem. Find the roots for

cos x

sin

y

y

exty

Elf

x

With initial guess





cos

y

y

SIN

Xt y

X

I

0907

q

asf

[1 1] is close:

cos

I

I

1

0

SIN I

I

125

Check that Newton's method works to the root by satisfying Kantorovitch’s theorem— a) Find the vector function's length b) Find the derivative, inverse derivative, and inverse squared derivative matrices c) Root M out from the normalized second derivatives d) Check if the theorem is satisfied

2

a cos X 09074.1 6 SIN x y

X cos X Sin 1 COS



X SIN X y 1 Datz SIN Xt y COS



l o



cos



DECADT

i 1

l cos 1.58524 L 2 I

C

Cosa y SIN x DD cos x y Dad

D D

Cosa y Coscx y D SIN Y y

D D

SIN SIN Xt y

D

SINCX C COSH y t COSCX y y



t SIN Ay t t Cos X y COSH y Sin x y

2 2

SIN Xt ZSINCX cos x y t ZSINCX y y



i 2 M



Fao Dif

Dif City

City

Dead

y

y

Def

City Dead

lose

352 8

costs

y

y

Ifad

0525

cosh

27

f

f

f Daf Dfw

fat DoDaf

Daf DzDzfz

City Dey

Ft

V4

C

F

d

Ft

C

L

C

y

FCA I

DFCdo

Y M

E

ty

R E

1 2

2

4 L

S

-Kantorovitch’s theorem is satisfied, and Newton's method converges to a root starting at a₀. The increment within which the root must be is quickly found:



do







O



-the root is within .064 radians of .



126

Wo

DICH

cosh

I

1

1

19

52

F

b sites I



Superconvergence Set x = |a – a |. The sequence a₀, a₁,... superconverges if, when x are given in base 2, each number x starts with i i i i i i 2i - 1 ≈ 2 zeroes.

The Inverse Function Theorem Proving that if the derivative of some function is invertible at some point x₀, then the function is itself locally invertible in some neighborhood of x₀. A function and its inverse have a point of intersection at 0 for f(x) – y = 0: the square and square root are inverse functions to each other. Let U ⊂ Rⁿ be an open neighborhood of x₀, and f: U → Rm a continuously differentiable mapping. Suppose y₀ = f(x₀) and that its derivative L = [Df(x₀)] is invertible. Let R > 0 be a number satisfying the following hypotheses: I) The ball U₀ of radius 2R|L⁻¹| and centered at x₀ is contained in U. II) In U₀, the derivative satisfies the Lipschitz condition: LIPSCHITZRATIO

DFID

DICK

at

t t

p

Then there exists a unique continuously differentiable mapping g from the ball V of radius R centered at y₀ to the ball U₀:

Egly

g

V

j

No

SUCH THAT

Dfg y

Dog y

-inverse vector function g exists iff [Df(x)] is invertible. The image of g contains the ball of radius R₁ around x₀, where

ZR I E R

ILI

t

z 14 I If the second partial derivatives are continuous, then the function is automatically Lipschitz in some neighborhood

R

of x₀. Proof: We want to find f(x) = y = 0 for y ∈ V, using Newton's method to superconverge on x beginning at x₀, given the restatement of the function f(x) = y, so that b = 0 for some x:



o x



L xo Xo



Xo Xo

fix j

Fy

Dfg

holy

DICK Dfy

Exo

Fy

fy

L

j

joy

yo j

-y₀ is the center of ball V, is within V, and has radius equal to R, giving |y₀ – y| ≤ R. The ball U₀ of radius |h₀(y)| centered at x₁ is 127

at most half the radius of V₀:





















Therefore, by Kantorovitch’s theorem,

fy Xo

t 2

Dfa

M E R E

Z z

y

I

-Newton's method on the function f(x) = 0 superconverges beginning at the point x₀, and the function remains invertible to an arbitrary degree. For example, to find where a mapping is invertible

















O VX COT

21 5

-check that the determinant of the derivative matrix is nonzero ie, that its inverse does not fail to exist. The above vector function is invertible everywhere except (0, 0). As another example,



SIN

CoSCxty

2 2 x zx



E X X Xz DET O



0 Iff X ZX COS x 2 2 COS Xt Xz

-the function is invertible for all angles x ≠ y, and all x₁ ≠ x₂.

128

III

fl'd

DECEIT

Exa

EE I

I

DECEIT

E

BZP

XE Xi

Xi

AY

f E

Coscxty

Xi

y

City

ty

y



The Inverse Function Theorem- Short Version If U ⊂ Rⁿ is open, f: U → Rⁿ differentiable, y₀= f(x₀), and [Df(x₀)] is invertible, then there exists neighborhood V of y₀ and inverse function g: V → Rⁿ with g(y₀) = x₀, and f ∘ g(y₀) = f(g(y₀)) = y₀. Near (x₀, y₀), y = f(x) → 0 = y – f(x), and 0 = y – f(x) expresses x implicitly as a function of y.



The Implicit Function Theorem- Short Version Let U ⊂ Rⁿ⁺ᵐ be open, F: U→ Rⁿ be a C¹ mapping such that F(c) = 0, c = ∈ Rⁿ⁺ᵐ and such that its derivative, the linear transformation [DF(c)] is onto. Then the system of linear equations [DF(c)][x] = 0 has n pivotal variables and m non-pivotal variables, and there exists a neighborhood of c for which F = 0 implicitly defines the n pivotal variables in terms of the m non-pivotal variables, and the derivative of implicit function g at implicit point b is

b D FCC DutmFCC Dnt FCC DwF c

This equation is reminiscent of the partial differential equation of an implicit function's derivative:

I

Dg

f Fx

x x7

f

y

x

y

I

0

Fly x2

o

O

Fye

ztI

E

o





-the implicit derivative of x as a function of y is equal to the product of the (negative inverse) derivative of the multivariable function with respect to unknown variable x and the derivative with respect to known variable y. In original equation y = f(x), y is the known, dependent variable, and x is the unknown independent variable for which an implicit function, switching the roles of the variables, is sought. That y is known and x unknown is a consequence of what is being asked by augmentation: does some m vector or vectors (x) exist as a linear combination of n known vectors (y) ie, do they span the space of those column vectors? The idea of the implicit function theorem is only understood in the context of the existence and uniqueness of solutions of linear systems of differential equations, Theorem 4.2.1. In that theorem, it was shown how, if there are non-pivotal variables, they must be some linear combination of pivotal variables. In a general augmentation, if there are n equations in n + m variables, the m variables can be thought of as known, leaving n unknown variables in n equations. For theorem 4.2.1, the first part is the linear version of the inverse function theorem, and the last part is the linear version of the implicit function theorem. These two nonlinear theorems define pivotal and nonpivotal unknowns as being those of the linearized problems. As in the linear case, the pivotal unknowns are implicit functions of the non-pivotal knowns. But those implicit functions will be defined only locally, and in this small region, linearity precisely approximates nonlinearity. Then locally, the mapping behaves like its linearized function– its derivative.

Iz

E

F't

129











































Appendix A Sets and Numbers

A.1 Set Theory A.2 Numbers: Types of Numbers. Real Numbers. Complex Numbers. A.3 Expressions and Equations



A.1 Set Theory The various notations of set theory, along with their meanings and an example of their usage, are described in the following table:

Symbol Meaning Usage



A= {a, b, c}

{Brackets} "The set ∗ with members{...}"



"The set of ∗ such that" {A | p(a)} {∗| }



A∈a "Is a member of the set" ∈



a=b "The same object as" =



A⊂B "Is a subset of the set..." ⊂



"Contains the set..." A⊃B ⊃







Union: "The set of elements of A or B or both" ⋃ A⋃B⋃C







Intersection: "The set of elements of both A and B"

⋂ A⋂B⋂C



A⋂B=∅ The empty set ∅



For example,





B

B

A A

C



AUB AMB

AU BMC AUB NC



131



A.2 Numbers Types of Numbers There are five kinds of numbers, each more general and inclusive than the last: i) Set N, the natural numbers, eg {0, 1, 3, ...} ii) Set Z, the integers, eg {..., –1, 0, 1, ...} iii) Set Q, the rational numbers, eg { p/q|(p, q) ∈ Z, q ≠ 0} iv) Set R, the infinite and infinitesimal real numbers, eg {π, 1/3, [xᵢ]ₖ, ...} v) Set C, the complex numbers, eg {σ + it, (σ, t) ∈ R, with imaginary i = √(–1)}

Real Numbers Real numbers are the set of all infinite decimals, eg 0.999... (repeating), ordered from least to greatest: –n

–5

–4

–3

–1

–2

1

0

2

The real numbers possess the following properties: (a) Commutativity of addition

at

b b ta

and multiplication:

ab

ba

(b) Associativity of addition:

at

b

tc

bt c

at

and multiplication:

ab c

a

bc

(c) Distributivity of addition:

a

bt c

abt

Existence of (d) the identity element for addition:

at o 132

a

a

n

π

0.999...

–3.333...

3

4

5

and multiplication:

all

a

And existence of the (e) inverse element for addition:

att

to

a

and for multiplication:

a

la

l

aa

As an example, using these rules, (a)

13

(b)

3

(c)

7

(d) (e)

7

7 8

13

7

Xt s

Ot Gx Tt C T1

TX

8

3

4

13

20

7

7

263

7

BX

GX

X

G Gx

C 2 3 X

35

Gto

Gx

7

O

I

I

Complex Numbers Consideration of the negative discriminant leads one to the mathematical necessity of complex numbers. The quadratic formula as a solution to the quadratic equation ax² + bx + c = 0 is traceable to at least Babylon. In analogy to the quadratic equation x² = mx + b having two positive solutions for a positive discriminant, one root with an empty discriminant, and no solutions with a negative discriminant, cubic equation x³ = px + q must have a negative solution since x³ runs from –∞ to ∞:

y

pxta

n

y x2

y mXtb

yet

I Geometrically, these solutions are the intersections of y = x², y = mx = b, and y = x³, y = px + q. It was Rafael 133

Bombelli that conceived of treating the negative square root as a number i, the imaginary number

i

1

Then i² = –1. A complex number is the sum of a real and an imaginary part, where the imaginary part is a real number multiple of i:

z

it

at

The adjective complex defines itself, given that it means not only complicated, but also compound, a thing with separate parts. It is undoubtedly in this second meaning that Gauss came to call these numbers complex. Along with Gauss, Caspar Wessel and Jean-Robert Argand contributed to the geometrical interpretation of complex numbers as coordinates in the complex plane: i

r

z

3

L

3

3i

Z

b

Real

ZE 3 3 i

v Imaginary

In so doing they created complex geometry, and it may clearly be interpreted geometrically as a new imaginary coordinate associated with every real coordinate, imagined as new, intrinsic dimension. From the diagram, we see that a complex number is in direct analogy to a vector, and indeed the set of complex numbers C forms a vector space, where each complex number is defined by a length or modulus r,

at 62

Z and a direction given by angle φ, the argument:

Y

TANT

E

For a complex number z = a ± ib, the complex conjugate is

E 134

a

Fib

Since complex addition is just vector addition, eg

Z it Zz

r

Z

Zz then the sum of a complex number and its conjugate is real:

z

tz

IR

Also,

zits

Zi Zz Z Zz z

2

Fizz at 62

ZE

-the sum of conjugates is the conjugate sum; the product of conjugates is the conjugate product; the square of the modulus of a complex number is equal to its product with its conjugate. In elementary terms, complex numbers may be added and multiplied like any polynomial, for example

3 41 3 31

2

i

17

6

20

3i

6i

3i

i

312

9 3i

35 3 But by this strategy even the basic multiplication of complex numbers can be cumbersome. The periodic nature of the imaginary numbers, revealed by exponentiation (eg i⁰ = 1, i² = –1, i³ = 1, etc.), is reminiscent of the trigonometric functions, suggesting a complex polar decomposition. This is proved in an earlier theorem (§23.1).

Ann















135



A.3 Expressions and Equations Let's brush up on the most important thing about college algebra: solving algebraic equations. Remember that algebra is the use and deduction of variables and constants to determine solutions of some numerical relationship not given explicitly. Express unknowns with some italicized letter, say x. A constant c = 3 has a given or determined value, whereas a variable x has a changing or undetermined value:

4

3

15

A coefficient is the numerical value attached to the variable:



3 4 15



The terms are those parts separated by their arithmetic operations:

4

3

15

Like terms share the same variable:

7

SX

Without an equal sign, all of the above quantities involving unknowns are expressions. Like terms may be combined:





9

SX

7

An algebraic equation equates two algebraic expressions:

3

4X

Is

An equation is solved when the variable to be found is isolated



3 3 44 15 3



Ry



3 X

-by applying operations to both sides of the equation simultaneously.

44

136

7

To do so, obey the order of operations, PEMDAS:



PRIORITY OPERATION





ga



X 3

4



t s



6



-please excuse my dear aunt Sally.

If no equality exists, the expression may still be simplified:

8 8 8 8

26

PARENTHESES EXPONENTS MULTIPLICATION

Division ADDITION SUBTRACTION

x 31

265

10

NAME

X

3

6

2

Distribute negative Distribute 2 Combine like terms

16

An equation's solution, for example:



40 20 2



20 2

20 2 12



20 24



24



X



7

7

940 900 900 30 10

s

137











































Appendix B Functions and Graphs

B.1 Functions B.2 Linear Functions B.3 Asymptotes B.4 Exponential & Logarithmic Functions

B.1 Functions A function is a transformation that maps one number to another according to some rule, such that each correspondence is unique. Then every number in the first set has one and only one partner in the second set. The first set is called the domain of f, and the second set is called the range. Think of a function like a machine:

X

fx

y

-in goes some number x, the independent variable, and out comes some number y, the dependent variable. For example, the function f(x) could be the mapping from father to children:



"The father of x is y"



fx y





Domain Range









Children Fathers

f: N → N



-f is a rule that uniquely maps from one set of natural numbers to another: everyone has exactly one father. Yet not all fathers have exactly one child:







RULE









Children Fathers



-therefore, the latter is not a function. It was made a function in the former by restricting the domain to single children. A mapping is onto (surjective) if for every element in the domain there exists at least one element in the range ie, it is everywhere defined. A mapping is one-to-one (injective) if for every element in the domain there exists exactly one element in the range ie, it is well defined.

139

Ii

fact

t



If a mapping is onto, and one-to-one, it is then bijective, and there exists an inverse. The existence of an inverse and the quality of bijection is precisely what defines a function.

A function can be the function of another function:

-a composition. For example,





fog Fom Fo Fom

f g

X

x

Father of the mother of Maternal great grandfather

The diagrammatic representation of a functional relation in which pairs of objects are in some sense related. A graph is an ordered pair G(V, E) comprised of a set of vertices, V, and a two-set of edges, E, such that E ⊆ {{x, y} | (x, y) ∈ V² ∧ x ≠ y} i.e., the vertices x and y are the endpoints of the edge, and the edge joins x and y. The cartesian graph is an equilateral polygonal tessellation of space by the set G(V, E), be that in 1, 2, 3, or more dimensions. The directions are said to be perpendicular or orthogonal, and every point on the graph is describable by an ordered pair of real numbers (x, y), the signed distances to the points from two perpendicular oriented lines, axes x, y, in two dimensions,

I

QUADRANT QUADRANT I I

(–5, 6) C it Ct t

(3, 4)







x



(–4, 2)



(6, –5)

QUADIANI QUADIINT

I

or three mutually perpendicular planes in three dimensions, x, y, z, in three dimensions:

it Ys

z

x

yl The sign of the coordinate, its signed distance from the axis, or plane, bestows orientation on the graph. This is reflected by a graphs four quadrants, in two dimensions, or its eight octants, in three dimensions: infinite regions 140

bounded by the axes or planes. Plot the input x and the output y of a function as points on the graph and connect them to see the graph of that function:

y = 2x



















The values of x and y for which the equation is true is the solution to the equation, and the points of the graph:

y

2x

t I 2

4

3

6

4

8

Forgot

Use the vertical line test to test whether or not a graph is a function:



















-if the line crosses more than one point of the graph, then the graph is not a function. 141

Use the horizontal line test to determine whether or not y = c, and whether or not f is invertible as a bijective function:





y = x²

–∞ < x < ∞



y=3









Use interval notation to describe a graph's domain and range:

y y y



3



x x x

I







2 I x x 312

3,1



1,2 0,3 y y y



The zeros of a polynomial function f(x) are those values of its independent variable that make f(x) = 0:





4 fix



O 4



P 4



X

F



If the function is instead the equivalent polynomial equation, then, when set equal to zero, those zeros are now described as roots to the equation- the solution or solutions to the polynomial that make it equivalent to 0.

142

f

1,5

E

E



A function's extrema, its maximum and minimum values, are best found by examining its first and second derivatives. Determining a function's behavior as increasing, decreasing, or constant, can be done by examining its first derivative just above, and just below the point where its derivative vanishes (§9.1). For an abstract function in higher dimensions, examining the quadratic terms of the function's Taylor polynomials will identify extrema.

Definitionally, a function's value f(a) is a local maximum of f if there exists an open interval containing a such that f(a) > f(x) for all x ≠ a in the open interval. A function's value f(b) is a local minimum of f if there exists an open interval containing b such that f(b) < f(x) for all x ≠ b in the open interval:

a feat

fear tt

x flti

Ez fixed 6,516

fab fixe A function is increasing if on an open interval, I, f(x₁) < f(x₂) whenever x₁ < x₂ for any x₁ and x₂ in the interval. It is decreasing if on an open interval, I, f(x₁) > f(x₂) whenever x₁ > x₂ for any x₁ and x₂ in the interval.

A graph is symmetric with respect to the x-axis if when substituting –y for y the equation remains the same. It is symmetric with respect to the y-axis if when substituting –x for x the equation remains the same. If both x, y are replaceable, then the graph is symmetric with respect to the origin:



y y y



(x, y) (–x, y) (x, y) (x, y)



x X X



(x, –y) (–x, –y)







A piecewise function is one in which more than one equation must be used to describe a graph. Use brackets:









µ v





143

ge gcu

yg



B.2 Linear Functions The slope m of a line is defined to be

II

m

-the ratio of the change in one signed coordinate distance, y, to the change in the other, x. It does not matter which pair is used first or last:

ez e

6 2,7

y

3

X

8

yz a

y

It

x

II

1

to

Yo

-only that the chosen orientation is preserved throughout the graph ie, once

RISE

OR Mt M

RUN is declared, it must remain that way. There are many equations of the line and they are all linear functions: Ax ± b = f(x). They may be in one form, or the other, or a combination of them all. The point-slope form:



no x m Xz yz y y

is just a reorganization of the slope equation. Likewise, the slope-intercept form





is just a reorganization of the point-slope form. It's usually given simplified:

pigg

xp

Yz

metexpty y

mxty y-intercept

To find it, just solve for y. To find the x-intercept, set y equal to 0 and solve for x. Set the x equal to zero and solve for y to find the y-intercept. For the next form, take the coefficient of x in the previous forms, m, multiply out the denominator onto y, and place all terms on one side of the equation



0 m 6

to derive the general form of the linear function, usually given simplified as

ME

Myx my

Ax

By

C

o

where the slope is -A/B. Any straight line can be written in any or all of these forms. For example the general form of the line

3 put into slope-intercept form is equal to

144

29 4

0



3







and the point-slope form is

29 4

0

3

Zy

24

3

y y

4

32

32

x

4 4 2

o

If two lines are perpendicular, then their slope has a product equal to -1:



5 0 me m 4



mm 1 I



-and a dot or scalar product of coordinate terms (vector components) equal to 0:



3 12 12 0 46 37

4

It should be obvious that all parallel lines share the same slope.

Function transformations may be performed according to the following procedures. For a vertical shift



y fix IC

-add a real number to the entire function. For a horizontal shift





y fix

-add a real number to the independent variable. For a reflection about x



tx y

-negate the entire function. For a reflection about y



y fl X

-negate the independent variable. To vertically stretch a function

34

39

Y

3

3

E

1364

49

y

ctx

c

l

-multiply by a factor greater than 1. To vertically shrink a function



O cat y ctx

145



-multiply by a factor less than 1 but greater than 0. And likewise to horizontally stretch or horizontally shrink a function:



y flux c I y flex O cel

-multiply the independent variable by the respective factors. For example, for some function f(x),

















y = x² – 1 y = x²

Horizontal Shift

















y = 1/2(x² – 1) y = –1/2(x² – 1)

Vertical Stretch Vertical Reflection Let f: A → B and g: B → A be functions such that for all x ∈ A, B

f g

X

x

g f x

X

The function g is then the inverse function of the function f, denoted

f

g x

Therefore,

g

f f

X

X

get FX

X

and the domain of f is equal to the range of f ⁻¹ and vice versa. The inverse gᵣ is the right inverse and exists only if f is surjective. The inverse gL is the left inverse and exists only if f is injective. If both are satisfied, the function g is a full inverse. 146

Find the inverse of a single variable function

f(x) = 7x + 5

by solving for independent variable x

y = 7x + 5, x = (y – 5)/7

and exchanging the independent variable for the dependent one:

f⁻¹(x) = (x – 5)/7

Find the inverse of a linear multivariable function that is decomposed and provably bijective by forming a matrix and finding its matrix inverse:

f(x, y) = (3x + y, –5x + 3y), A = [3 1; –5 3], det A = 9 + 5 = 14

A⁻¹ = (1/14)[3 –1; 5 3]

f⁻¹(x, y) = ((3x – y)/14, (5x + 3y)/14)

What if a multivariable function is given composed, ie, z = ux + vy? The vertical and horizontal line tests for bijection reveal the quality of monotonicity. A function is monotone if it is always increasing or always decreasing. This unifies the ideas of the vertical and horizontal line tests, demonstrating that they are really the same tests for bijectivity on different variables. In higher dimensions, it is not always possible to say whether a function is always increasing or always decreasing. The test is instead replaced by the requirement that the derivative of a mapping is invertible: the inverse function theorem, which states that if the derivative of a mapping f is invertible at some point x₀, then the mapping is locally invertible in some neighborhood of the point f(x₀).

In symbols: if the Jacobian matrix of derivatives Df(x₀) has nonzero determinant, then f⁻¹ exists near f(x₀), and the derivative of the inverse is the inverse of the derivative, Df⁻¹(f(x₀)) = [Df(x₀)]⁻¹.
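To make the matrix-inverse route concrete, here is a short numpy sketch; the map f(x, y) = (3x + y, –5x + 3y) is the worked example above, and the test point is arbitrary:

```python
import numpy as np

A = np.array([[3.0, 1.0], [-5.0, 3.0]])   # matrix of the linear map f
A_inv = np.linalg.inv(A)                  # det A = 14, A_inv = (1/14)[[3, -1], [5, 3]]
print(A_inv * 14)                         # [[ 3. -1.] [ 5.  3.]]

p = np.array([2.0, 1.0])                  # an arbitrary test point
print(np.allclose(A_inv @ (A @ p), p))    # True: f_inv(f(p)) = p
```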

B.3 Asymptotes Asymptotes can help us understand derivatives. Rational functions are quotients of polynomial functions:

f(x) = p(x)/q(x)

The most basic of these is the reciprocal function:

f(x) = 1/x

From its graph, and a table of values for x and f(x),

x:    –1   –0.5   –0.1   –0.01   0.01   0.1   0.5   1
f(x): –1   –2     –10    –100    100    10    2     1

we can see that as x approaches 0 from the left, y goes to negative infinity, and as x approaches 0 from the right, y goes to positive infinity. The lines the branches approach are asymptotes. Their distance from the graph goes to zero as x → ∞. In each of the following graphs, the asymptotes are identified. Firstly, we have the graph of f(x) = 1/x, with asymptotes x = 0 and y = 0, where f(x) = 1/x → 0 as x → ±∞.

Next, we have the graph of f(x) = (x² – 3)/(2x – 4), with asymptotes x = 2, and y = x/2 + 1:

The graph of y = (5x² + 8x – 3)/(3x² + 2) has asymptote y = 5/3:

The graph of y = (11x + 2)/(2x³ – 2) has asymptotes x = 1, y = 0:




Finally, the graph of y = 2 + ((sin x)/x) has asymptote y = 2, where the curve seems almost to twist into it:

























Locate vertical asymptotes by solving for the zeros of the denominator of a rational function. If f(x) = p(x)/q(x), where p(x) and q(x) have no common factors, and a is a zero of q(x), then x = a is a vertical asymptote of f:

f(x) = x/(x² – 9) = x/((x + 3)(x – 3))

-the vertical asymptotes are located at x = ±3, the zeros of the denominator.

Now consider the function



f(x) = (x² – 4)/(x – 2)



It seems immediately clear that the vertical asymptote is at x = 2. But if the expression on the right is simplified, obtained instead is

f(x) = x + 2

-the lines are equivalent, yet the latter clearly has no vertical asymptote. How can this be? The answer lies in the fact that, where the previous unsimplified expression made clear there was an asymptote, there is now an infinitesimal hole in the line at (2, 4):









[Figure: the line y = x + 2 with a hole at the point (2, 4).]

Thus, the value of f(x) may be evaluated to be arbitrarily close to 4 as the input x is taken to be arbitrarily close to 2:

x          f(x)
1.9        3.9
2.1        4.1
1.99       3.99
2.01       4.01
1.999      3.999
2.001      4.001
1.9999     3.9999
2.0001     4.0001
1.99999    3.99999
2.00001    4.00001

If f(x) is arbitrarily close to a number L for all x arbitrarily close to c, then the function f approaches the limit L as x approaches c:

lim(x→c) f(x) = L

Therefore, the limit of f as x approaches 2 is 4:

lim(x→2) (x² – 4)/(x – 2) = 4

Find a horizontal asymptote by comparing the degree of the numerator and denominator in a rational polynomial function. If the degree of the numerator is m, and the degree of the denominator is n, then if m < n, the x-axis y = 0 is the horizontal asymptote. If m = n, the line y = a/b is the horizontal asymptote, where a and b are the leading coefficients of the numerator and denominator. If m > n, then there are no horizontal asymptotes. For example,

f(x) = (4x + 2)/(x² + 1): m = 1 < n = 2, so y = 0

f(x) = (4x² + 2)/(2x² + 1): m = n = 2, so y = 4/2 = 2

A diagonal asymptote is equal to the quotient q(x) of the division algorithm, but now with polynomials:

p(x)/d(x) = q(x) + r(x)/d(x)

Divide the polynomials to determine the quotient. For example, dividing

f(x) = (x² – 5x + 9)/(x – 4) = (x – 1) + 5/(x – 4)

shows that the diagonal asymptote of the function is

f(x) = x – 1



B.4 Exponential and Logarithmic Functions The exponential function with base b is

f(x) = bˣ

where b is a positive constant other than 1 (b > 0, b ≠ 1) and x ∈ R. Examples of exponential functions are

f(x) = 2ˣ (base 2), f(x) = 3ˣ (base 3), f(x) = 10ˣ (base 10), f(x) = (1/2)ˣ (base 1/2)

Some non-examples are

f(x) = x², f(x) = 1ˣ, f(x) = (–1)ˣ, f(x) = xˣ

-in the first non-example, the variable is the base and not the exponent, and therefore it's not an exponential function. In the second, the base is 1, and in the third the base is negative. The fourth has the variable as both the base and the exponent. The domain of the exponential function consists of all the reals. The range consists of all positive reals. Every exponential must pass through the point (0, 1), since f(0) = b⁰ = 1. If b > 1, then f is an increasing function. If 0 < b < 1, then f is a decreasing function. The x-axis y = 0 is the horizontal asymptote for any exponential function.

Base e is defined as

e = lim(n→∞) (1 + 1/n)ⁿ ≈ 2.718

-the natural base. The function

f(x) = eˣ

is the natural exponential function, and turns up in many places in the natural sciences. The logarithmic function is defined as the inverse of the exponential function. For x > 0 and b > 0, b ≠ 1,



y = log_b x ⟺ bʸ = x

For example, taking the inverse of the exponential function with base b,

f(x) = bˣ: x = bʸ, so f⁻¹(x) = log_b x = y

-which literally means “to what power must b be raised to equal x?” By the exponential function, that power must be y:

log_b x = y ⟺ bʸ = x

In this way, the functions are inverse:

b^(log_b x) = x, log_b(bˣ) = x

A common logarithm (log) is one that uses a base 10 power. For example, log(100) = 2, and

General Properties       Common Logarithms     Natural Logarithms
1. log_b 1 = 0           1. log 1 = 0          1. ln 1 = 0
2. log_b b = 1           2. log 10 = 1         2. ln e = 1
3. log_b bˣ = x          3. log 10ˣ = x        3. ln eˣ = x
4. b^(log_b x) = x       4. 10^(log x) = x     4. e^(ln x) = x

The natural logarithm (ln x) is one that uses the base e. It is the inverse of the natural exponential eˣ:

e^(ln x) = x, ln(eˣ) = x

For example, e^(ln 7) = 7, and ln(e²) = 2.

Both varieties obey certain rules. Let b, M, N ∈ R+ with b ≠ 1. The logarithmic product rule:



log_b MN = log_b M + log_b N

-the logarithm of a product is the sum of logarithms, not dissimilar to the exponential power rule





bᵐbⁿ = bᵐ⁺ⁿ

Likewise, the logarithmic quotient rule:

log_b(M/N) = log_b M – log_b N

-the logarithm of a quotient is the difference of logarithms. The logarithmic power rule:





log_b Mᵖ = p·log_b M

-the logarithm of a number to a power is the product of the power and the logarithm. For example,



log_b √x = log_b x^(1/2) = (1/2) log_b x



Change the base of any logarithm by the following formula: for any bases a and b, and any positive number M,



log_b M = log_a M / log_a b

-ie, for common logarithms and natural logarithms

log_b M = log M / log b = ln M / ln b

-the logarithm of M with base b is equal to the logarithm of M with any new base divided by the logarithm of b with that new base. For example, change to common base for the logarithm log₅140 is



log₅ 140 = log₁₀ 140 / log₁₀ 5 ≈ 3.07
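The change-of-base formula is easy to spot-check numerically; a minimal sketch using Python's standard math module:

```python
import math

# log_5(140) computed three equivalent ways by change of base
print(math.log(140, 5))                  # ~3.0704
print(math.log10(140) / math.log10(5))   # same value via common logarithms
print(math.log(140) / math.log(5))       # same value via natural logarithms
```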

Solve an equation involving an exponential function (an exponential equation) by the power of the same base technique: rewrite the equation in the form



b^M = b^N

and set M equal to N, since if b^M = b^N, then M = N. For example,

8ˣ = 16
(2³)ˣ = 2⁴
2³ˣ = 2⁴
3x = 4
x = 4/3



Use logarithms with exponentials to solve equations with exponential unknowns. Isolate the exponential and solve for its variable, using a common logarithm for an exponential with base 10, or a natural logarithm for any other base:

4ˣ = 15
ln 4ˣ = ln 15
x ln 4 = ln 15
x = ln 15 / ln 4 ≈ 1.95

10ˣ = 120,000
x = log₁₀ 120,000 ≈ 5.08

For example,



3^(x – 2) = 5^(2x)
(x – 2) ln 3 = 2x ln 5
x ln 3 – 2 ln 3 = 2x ln 5
x (ln 3 – 2 ln 5) = 2 ln 3
x = 2 ln 3 / (ln 3 – 2 ln 5) ≈ –1.04



Likewise, use exponentials to solve equations with logarithmic unknowns:



log₂(x + 3) = 4
2⁴ = x + 3
16 = x + 3
x = 13

For example, log₂ x + log₂ (x – 7) = 3 is solvable by



log₂ x + log₂(x – 7) = 3
log₂[x(x – 7)] = 3
2³ = x(x – 7)
8 = x² – 7x
0 = x² – 7x – 8 = (x – 8)(x + 1)
x = 8

-where the candidate x = –1 is rejected, since the logarithm of a negative number is undefined.



Use the logarithmic equivalence property of logarithms to equate them:

log_b M = log_b N ⟹ M = N

For example,

ln x + ln(x – 2) = ln 3

is solvable by











ln[x(x – 2)] = ln 3
x(x – 2) = 3
x² – 2x – 3 = 0
(x – 3)(x + 1) = 0
x = 3

-equating logarithms. The following exponential function

A(t) = A₀e^(kt)

-models exponential growth or decay- an increasing or decreasing function that changes, over the time t it has been growing or decaying, at a rate proportional to its current amount. If k > 0, the equation models growth, and if k < 0, it models decay:

[Figure: A(t) = A₀e^(kt), increasing for k > 0 (growth) and decreasing for k < 0 (decay).]
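A one-function sketch of the growth/decay model; the values A₀ = 100 and k = ±0.05 are illustrative, not from the text:

```python
import math

def amount(A0, k, t):
    # A(t) = A0 * e^(k t): the amount after time t
    return A0 * math.exp(k * t)

print(amount(100.0, 0.05, 10))    # k > 0, growth: ~164.87
print(amount(100.0, -0.05, 10))   # k < 0, decay:  ~60.65
```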











































Appendix C Systems of Equations

C.1 Systems of Equations



C.1 Systems of Equations Equations of the form

Ax + By = C

of leading degree 1, are linear; they are straight lines when graphed. If two equations are given together

x + y = 7
y – x = 1

then they form a system of linear equations, and the ordered pair (x, y) of their simultaneous solution is constrained; it must satisfy both equations at once. Testing the pairs that sum to 7:

(1, 6): 6 – 1 = 5 ✗   (2, 5): 5 – 2 = 3 ✗   (3, 4): 4 – 3 = 1 ✓

-thus, (3, 4) is the solution to the above system. Finding the solution to this one is easy: what two numbers have a sum of 7 and a difference of ±1? If such a solution isn't quickly obvious, just use algebra on one equation, solving one variable for the other:

x + y = 7 ⟹ y = 7 – x

y – x = 1 ⟹ (7 – x) – x = 1 ⟹ 7 – 2x = 1 ⟹ x = 3

and then back substitute into either equation to solve, with the found variable, for the unknown variable:

3 + y = 7 ⟹ y = 4, or y – 3 = 1 ⟹ y = 4

For any linear system of equations, eg

x + 2y = 2
x – 2y = 6

the solution is their point of intersection:

[Figure: the lines x + 2y = 2 and x – 2y = 6 crossing at the point (4, –1).]

For the equations of our example, this point is (4, –1). To find it algebraically, put y or x in terms of the other and back substitute

x + 2y = 2 ⟹ x = 2 – 2y

x – 2y = 6 ⟹ (2 – 2y) – 2y = 6 ⟹ 2 – 4y = 6 ⟹ y = –1

x + 2(–1) = 2 ⟹ x = 4

Or, take a multiple of the variable of one equation and combine it with another to eliminate variables and solve the equations simultaneously: solution by elimination. Here, to get an equation that can be solved for y, make the x in the first equation a (–x) by multiplying the first equation by (–1),

–1(x + 2y = 2) ⟹ –x – 2y = –2

to combine it with the second for a new second equation:

(–x – 2y = –2) + (x – 2y = 6) ⟹ –4y = 4 ⟹ y = –1

From here we can just back substitute:

x + 2(–1) = 2 ⟹ x = 4

But we only solved the equations simultaneously halfway, until we found y. To solve them fully simultaneously, multiply one equation to make one variable's term the opposite of its counterpart (eg, 2y → –2y) so that upon combining the equations it is eliminated. For example,

x + 2y = 2
x – 2y = 6

Add: 2y + (–2y) = 0, giving

2x + 0y = 8

Leave one equation unchanged and repeat the process with the other variable. Multiply: –1(x + 2y = 2) gives –x, opposite to the x of the second equation:

–x – 2y = –2
x – 2y = 6

Add: (–x) + x = 0, giving

0x – 4y = 4

Notice that we've eliminated variables so that the remaining variables descend a diagonal from the top left of the first equation to the bottom right of the last. This has left zeros (0x and 0y) off the diagonal. Now just divide out:

2x + 0y = 8 ⟹ x = 4
0x – 4y = 4 ⟹ y = –1

As we'll see in §15.2, this whole process of elimination is greatly distilled by the use of matrices, 2-dimensional arrays of numbers:

A = [aᵢⱼ], with m rows (i = 1, …, m) and n columns (j = 1, …, n)

We can easily reframe our system as a matrix equation, reducing it from A to U to R:

x + 2y = 2
x – 2y = 6

Augmentation:

A = [1  2 | 2]  →  U = [1  2 | 2]  →  R = [1 0 |  4]
    [1 –2 | 6]         [0 –4 | 4]        [0 1 | –1]

In eliminating a system of equations by reducing their matrix, one is concerned purely with establishing 1's along the diagonal pivots of the matrix by eliminating entries to zero above and below them. Therefore, there are many routes to a solution. For example, in the above problem, we could have focused instead on first eliminating x in the second equation, automatically making upper-triangular matrix U:

A = [1  2 | 2]  →  U = [1  2 | 2]  →  R = [1 0 |  4]
    [1 –2 | 6]         [0 –4 | 4]        [0 1 | –1]

In forming R, the reduced row echelon matrix, from U, we finish making entries above the pivots zero, and just divide the rows so that the pivots are equal to 1. Notice that we have discovered the vector x = [4, –1]ᵀ that solves the linear system of equations

[1  2][x] = [2]
[1 –2][y]   [6]

As a first example, consider two equations in two unknowns:

3x – 2y = –5
4x + y = 8

In two equations, with two unknowns, using elimination and back substitution is faster than matrix reduction:

4x + y = 8 ⟹ y = 8 – 4x

3x – 2(8 – 4x) = –5 ⟹ 11x – 16 = –5 ⟹ x = 1, y = 4

-the solution is (1, 4).

Using matrix reduction instead,

[3 –2 | –5]  →  [1 0 | 1]
[4  1 |  8]     [0 1 | 4]

As another example, consider the three equations in three unknowns

x + y + z = 8
x + 2y + 2z = 17
2y + z = 16

If we reduce the matrix to the point of isolating a single variable in a single equation,

[1 1 1 |  8]      [1 1 1 |  8]      [1 1  1 |  8]
[1 2 2 | 17]  →   [0 1 1 |  9]  →   [0 1  1 |  9]
[0 2 1 | 16]      [0 2 1 | 16]      [0 0 –1 | –2]

-then the system is solvable by back substitution. There are many routes to the solution. Here is one:

–z = –2 ⟹ z = 2

y + z = 9 ⟹ y = 9 – 2 = 7

x + y + z = 8 ⟹ x = 8 – 7 – 2 = –1

(x, y, z) = (–1, 7, 2)

Or, take A to its reduced row echelon form R:

[1 1  1 |  8]      [1 1 1 | 8]      [1 0 0 | –1]
[0 1  1 |  9]  →   [0 1 1 | 9]  →   [0 1 0 |  7]
[0 0 –1 | –2]      [0 0 1 | 2]      [0 0 1 |  2]

-reading off x = –1, y = 7, z = 2 directly.
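The same answer falls out of a single library call; a sketch solving the 3×3 system above with numpy (the coefficient matrix is the one from the worked example):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [0.0, 2.0, 1.0]])
b = np.array([8.0, 17.0, 16.0])
print(np.linalg.solve(A, b))   # [-1.  7.  2.]
```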











































Appendix D Trigonometry

D.1 Circles, Squares, and Triangles: Plane Geometry. The Pythagorean Theorem. D.2 Trigonometric Functions: Trig Functions. Determining Angles. The Law of Cosines. Trigonometric Identities. Periodic Functions.

D.1 Circles, Squares, and Triangles Plane Geometry We use trigonometry to relate curvilinear, angular coordinates to Cartesian ones. In the plane, triangles act as the intermediary between circles and rectangles by way of the trigonometric functions. Thus, in the study of trigonometry, one must also be concerned with the properties of circles. For the development of spacetime in this series, we'll be primarily concerned with three aspects of the subject: trigonometric functions, trigonometric identities, and trigonometric (periodic) graphs. First, let's review some basic terms of plane geometry. An angle is a figure determined by two rays having a common endpoint. A ray is an infinite extension of a point in one direction. An acute angle is an angle with a measure between 0 and 90°. A right angle is an angle with a measure of 90°, while an obtuse angle has a measure between 90 and 180°. When the sum of the measures of two angles is 90°, the angles are complementary. When the sum of the measures of two angles is 180°, the angles are supplementary. Angles are measured most naturally by radians, the number of radii of a circle that will fit inside its circumference: 2π. The diameter of a circle is the length of the line that halves it. The radius of a circle is the length of a line that originates at its center and terminates at its edge, half that of the diameter, d/2. And the circumference of a circle is the length of the line that wraps its perimeter, dπ, or, equivalently, 2πr. Irrational number π, pi, is the ratio of a circle's circumference to its diameter:

π = C/d, C = dπ = 2πr

A line is an infinite, continuous point extension containing infinite points:





Any two lines lie in the same plane together: they are coplanar. Two coplanar lines either intersect or are parallel. If two lines intersect, they have exactly one point in common. Two lines in a plane are parallel if they have no common point. When two lines intersect to form equal adjacent angles, the lines are perpendicular. Each of the angles formed by two perpendicular lines is a right angle. A transversal is a line that intersects two or more coplanar lines in distinct points. When two lines are cut by a transversal, the angles formed are classified by their locations:





[Figure: two lines cut by a transversal, forming angles numbered 1 through 8.]

-angles 3, 4, 5, and 6 are interior angles. Angles 1, 2, 7, and 8 are exterior angles. 3, 6, and 4, 5 are alternate interior angles. 1, 8 and 7, 2 are alternate exterior angles. Corresponding angles are 1, 5 and 3, 7 and 2, 6 and 4, 8.


Triangles are plane closed shapes formed by line segments that meet at their endpoints- the vertices of the triangles. Triangles that have no two sides with the same length are called scalene (a), those with at least two sides having the same length are called isosceles (b), and those with all three sides having the same length are called equilateral (c). Triangles containing a right angle are right triangles (d):









[Figure: (a) scalene, (b) isosceles, (c) equilateral, and (d) right triangles.]

Oblique triangles are triangles with no 90° angle. They may be acute (all angles less than 90°) or obtuse (one angle greater than 90°):

The 180° rule says that the sum of the angles of any triangle is equal to 180°, or π radians:

[Figure: a triangle with angles 42°, 96°, and 42°, and an equilateral triangle with three 60° angles- each summing to 180°.]

Thus, when any two angles of a triangle are known, the third is apparent. Triangles have area (1/2)bh, where b is the base and h is the height. Circles have area πr². Triangles with the same size and shape are congruent- they have the same edge lengths and angle measures. Triangles of different sizes but the same shape are similar- they share the same angle measures. The median m of a triangle is the line segment connecting a vertex to the midpoint of the opposite edge:









m



The centroid is the intersection of the medians:
















The incenter is the center of the circle inscribed within a triangle, the circle tangent to all its sides:

A polygon is a plane closed figure whose edges are straight noncollinear line segments that each intersect exactly two other line segments at its vertices, forming a closed polygonal chain. The interior, the boundary, or both may be referred to as a polygon. Polygons fall under many classifications. An n-gon has n edges and n angles. A quadrilateral is a polygon having four sides. The sum of interior angles of any quadrilateral is 360°. An equilateral polygon has edges equal in length. An equiangular polygon has angles equal in measure. A regular polygon is an equilateral, equiangular polygon:

[Figure: the regular polygons- triangle, square, pentagon, hexagon, octagon.]

Thus, an equilateral triangle is a regular polygon. A parallelogram is a quadrilateral with opposing parallel edges. A rectangle is a parallelogram with one right angle. A rhombus is a parallelogram with two adjacent edges equal. A square is a rectangle with two adjacent edges equal. A trapezoid is a quadrilateral with exactly one pair of parallel edges. In three dimensions, the shapes obtain new characteristics. They are the polyhedra, three-dimensional shapes that have flat polygonal faces, edges, vertices, and volume between. Both polygons and polyhedra are 2- and 3-dimensional special cases of a more general n-dimensional polytope.

The Pythagorean Theorem Trigonometry is the study of the properties of triangles. It is built upon the fact that in the unit square, with edges equal in length to 1, the diagonal length is not also equal to 1, but to √2 ≈ 1.414. This property is known as the Pythagorean Theorem:





[Figure: a right triangle with legs a and b and hypotenuse c.]

c² = a² + b²

-for any right triangle, the square of the hypotenuse (c) is equal to the sum of the squares of its two legs (a and b). Quickly prove it with a 3:4:5-triangle, a triangle whose hypotenuse must be five units in length if its legs are three and four units in length: 3² + 4² = 9 + 16 = 25 = 5².
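A two-line numeric check of the 3:4:5 triangle, as a sketch:

```python
import math

a, b = 3.0, 4.0
c = math.hypot(a, b)             # sqrt(a**2 + b**2)
print(c, c**2 == a**2 + b**2)    # 5.0 True
```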

Tiling the plane shows us the same thing another way. The plane may be tessellated or tiled by larger and smaller squares of arbitrary size:

Connect the centers to form the vertices of a still larger lattice of squares that fill the plane:




























Translate the lattice so that lower left vertices coincide with the original tessellation:























Clearly, any tilted square has side length equal to the hypotenuse of a triangle with leg lengths equal to the sides of the two squares of the original tessellation, as evident by the orange and dark blue edges:

Furthermore, each tilted square is equal in area to the sum of the two squares found within it, just as we saw before with the 3:4:5-triangle. Then the tilted square and the smaller tiles prove Pythagoras' theorem: the square (area) of the side length of the larger square is the sum of the squares (areas) of the side lengths of the smaller squares. Now, if a triangle's hypotenuse is allowed to coincide with a circle's radius r, then as line r rotates around the circle, its terminal point is describable in terms of coordinates x and y given by the two legs of the triangle, and some angle θ:

[Figure: a right triangle inscribed in a circle, its hypotenuse the radius r and its legs x and y.]

When r is equal to 1, the circle becomes the unit circle. Then the radius r fits into the circumference exactly 2π

times. The angle θ (theta) is the ratio of an arc s, subtended by a central angle A′CB′, to the radius r: θ = s/r:

[Figure: a circle of radius r with central angle A′CB′ subtending the arc s.]

Angle measure is periodic: even multiples of π take us back to where we began, at angle 0, on the unit circle:

Angle 0 = 2π = 4π, etc.

The angle's sign depends on line r's orientation, so that it behaves like a vector, a little arrow:

[Figure: coterminal directed angles 9π/4, 3π, and –5π/2.]

The angle in degrees and radians, and their corresponding (x, y)-coordinates around a unit circle, are given in the following diagram:

[Figure: the unit circle, marked at π/6, π/4, π/3, π/2, 2π/3, 3π/4, 5π/6, π, 7π/6, 5π/4, 4π/3, 3π/2, 5π/3, 7π/4, and 11π/6 (30° through 330°), with their (x, y)-coordinates, from (1, 0) around through (0, 1), (–1, 0), and (0, –1).]

Application of a sinusoidal (sin θ) or cosinusoidal (cos θ) trigonometric function takes an angle and gives an x or a y coordinate. In the case of x, this is the projected length of vector r on the x-axis:

x = cos(π/4) = 0.707, y = sin(π/4) = 0.707: the point (0.707, 0.707)

Thus, if we simply think of the trigonometric functions as buttons on our calculators, we can quickly understand the sine and cosine functions as the projected length of a vector r on the x and y axes given some angle θ.

Since 180° is equal to π radians, their relationship is defined by



π RAD = 180°



For angles measured in degrees, digits following the decimal may be represented by minutes and seconds. A minute (') is 1/60 of a degree. A second ('') is 1/60 of a minute. Angle 57.296° thus has .296(60) = 17.76 minutes, and then .76(60) ≈ 45.6 seconds, so that

1 RAD ≈ 57.296° ≈ 57°17'45.6''

1 DEG = π/180 RAD ≈ 0.017453 RAD

D.2 Trigonometric Functions Trig Functions If the end of r is bounded to a unit circle, then its coordinates are given directly by the leg lengths x and y. We can view r as either a line with slope y/x, or a vector [x, y]. The sinusoidal and cosinusoidal trigonometric functions act on θ to yield y or x, respectively, for r = 1. In other words, sin θ = y and cos θ = x. In general, in the standard, first quadrant orientation of a right triangle inscribed in a circle with radius r, the sine function is defined to be the ratio of y to r, the cosine function is defined to be the ratio of x to r, and the tangent function is defined to be the ratio of y to x:

sin θ = y/r, cos θ = x/r, tan θ = y/x

More generally, for any quadrant, we must use ratios of the hypotenuse and the legs determined either opposite or adjacent to the angle of the function. For example, the sine function is equal to the ratio of the length of the leg opposite the angle to the length of the hypotenuse. Similar ratios define the other trigonometric functions. Use the mnemonic SOH-CAH-TOA:

SOH: sin θ = Opposite/Hypotenuse
CAH: cos θ = Adjacent/Hypotenuse
TOA: tan θ = Opposite/Adjacent

[Figure: a right triangle labeling the hypotenuse and the legs opposite and adjacent to the angle θ.]

Similarly, the sine of an angle b, sin(b), would be

sin b = opposite/hypotenuse

Each trigonometric function has a reciprocal function that simply flips its ratio:

csc θ = Hypotenuse/Opposite, sec θ = Hypotenuse/Adjacent, cot θ = Adjacent/Opposite

-the cosecant, secant, and cotangent functions. Complementary angles also pair the functions into cofunctions, each giving the same answer from the opposite acute angle of the triangle:

tan a = cot b

[Figure: a right triangle with complementary acute angles a and b.]

Use the CAST mnemonic to determine whether a given trigonometric function is positive or negative in its quadrant. For coordinates given by angle 2π/3, (cos(2π/3), sin(2π/3)) = (–1/2, √3/2), in quadrant II, only the sine function is positive:

[Figure: the four quadrants labeled C (cosine positive, IV), A (all positive, I), S (sine positive, II), and T (tangent positive, III).]

Now, by the previous definitions, any point (x, y) may be in terms of r and θ,

x = r cos θ, y = r sin θ, r² = x² + y²

-its polar coordinates, P(r, θ):















Such angular coordinates are not unique:

P(2, π/6) = P(2, –11π/6)

[Figure: the same point at radius r = 2 reached by rotating +π/6 (30°) or –11π/6 (–330°).]

But, as we already know from algebra, if there are trigonometric functions acting on θ to produce some ratio of r, x, and y, there must also exist some arc or inverse trigonometric function acting on this ratio to produce θ. These functions are defined to be





θ = sin⁻¹(y/r), θ = cos⁻¹(x/r), θ = tan⁻¹(y/x)

Given a situation, we might use, say, arcsin(x) instead of sin⁻¹(x) so that we don't confuse the latter with its algebraic inverse:

sin⁻¹(sin θ) = θ, whereas (sin θ)⁻¹ = 1/sin θ

Determining Angles

Use the inverse trigonometric functions to determine the angle. The inverse sine and cosine functions can work,

θ = cos⁻¹(x/r), θ = sin⁻¹(y/r)

but initially they can land us in the wrong quadrant, because their outputs are restricted to the ranges –π/2 ≤ θ ≤ π/2 and 0 ≤ θ ≤ π respectively. Use the inverse tangent function, corrected by quadrant, instead:

θ = tan⁻¹(y/x)

The Law of Cosines The law of cosines generalizes the Pythagorean theorem for non-right triangles such as





[Figure: a triangle with sides a, b, and c, with angle θ between sides a and b.]

by the equation

c² = a² + b² – 2ab cos θ

-which says that one must know two of three sides and the angle between them to determine the third side. To prove it, place the triangle with the angle θ at the origin and one side along the x-axis:

B = (a cos θ, a sin θ), A = (b, 0)

The square of the distance between A and B is

c² = (a cos θ – b)² + (a sin θ)²

= a² cos² θ – 2ab cos θ + b² + a² sin² θ

= a²(cos² θ + sin² θ) + b² – 2ab cos θ

= a² + b² – 2ab cos θ

-where (cos² θ + sin² θ) = 1, by the Pythagorean identity. When θ = π/2, cos θ = 0, the triangle is right, and the equation once again becomes the familiar Pythagorean theorem.
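The law of cosines translates directly into a small function; a sketch, with the included angle given in radians:

```python
import math

def third_side(a, b, theta):
    # side opposite the included angle theta between sides a and b
    return math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(theta))

print(third_side(3, 4, math.pi / 2))   # 5.0: reduces to the Pythagorean theorem
print(third_side(3, 4, math.pi / 3))   # ~3.606 for a 60-degree included angle
```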

Trigonometric Identities A primary aspect of trigonometric functions is in the use of their identities to solve equations where perhaps no other route to a solution may be found. Firstly, we have the reciprocal identities:

sin θ = 1/csc θ, cos θ = 1/sec θ, tan θ = 1/cot θ

csc θ = 1/sin θ, sec θ = 1/cos θ, cot θ = 1/tan θ

tan θ = sin θ/cos θ, cot θ = cos θ/sin θ

Use the cofunctions to extend the Pythagorean identities:

cos² θ + sin² θ = 1

1 + tan² θ = sec² θ

1 + cot² θ = csc² θ

Rotating every trigonometric function by 90° yields their cofunctions, the cofunction identities:

sin θ = cos(π/2 – θ), cos θ = sin(π/2 – θ)

tan θ = cot(π/2 – θ), cot θ = tan(π/2 – θ)

sec θ = csc(π/2 – θ), csc θ = sec(π/2 – θ)

In the case of the sine and cosine functions, some of their most useful identities can be derived one from the other. Starting from the Addition identities (a), double the angle to yield the Double-Angle identities (b), and then replace cos θ and then sin θ in the cosine double angle identity to yield the Half-Angle identities (c):

a. Addition Identities

sin(α ± β) = sin α cos β ± cos α sin β
cos(α ± β) = cos α cos β ∓ sin α sin β

b. Double-Angle Identities

sin 2α = 2 sin α cos α
cos 2α = cos² α – sin² α = 2 cos² α – 1 = 1 – 2 sin² α

c. Half-Angle Identities

sin² α = (1 – cos 2α)/2
cos² α = (1 + cos 2α)/2

Finally, we have the product-to-sum identities

cos θ cos φ = (1/2)[cos(θ – φ) + cos(θ + φ)]
sin θ sin φ = (1/2)[cos(θ – φ) – cos(θ + φ)]
sin θ cos φ = (1/2)[sin(θ + φ) + sin(θ – φ)]
cos θ sin φ = (1/2)[sin(θ + φ) – sin(θ – φ)]

and the sum-to-product identities:



cos θ + cos φ = 2 cos((θ + φ)/2) cos((θ – φ)/2)
cos θ – cos φ = –2 sin((θ + φ)/2) sin((θ – φ)/2)
sin θ + sin φ = 2 sin((θ + φ)/2) cos((θ – φ)/2)
sin θ – sin φ = 2 cos((θ + φ)/2) sin((θ – φ)/2)
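Any of these identities can be spot-checked numerically. A minimal sketch testing one product-to-sum identity at arbitrary angles:

```python
import math

t, p = 0.7, 0.3   # arbitrary test angles
lhs = math.sin(t) * math.cos(p)
rhs = 0.5 * (math.sin(t + p) + math.sin(t - p))
print(abs(lhs - rhs) < 1e-12)   # True: the identity holds numerically
```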

Periodic Functions A function f(x) is periodic if there is a positive number p such that f(x + p) = f(x) for every value of x. The smallest value of p is the period of f. In this way, the trigonometric functions are periodic so that sin(θ + 2π) = sin(θ), cos(θ + 2π) = cos(θ), and tan(θ + 2π) = tan(θ), and similarly for their cofunctions. Graph the periodicity by marking off angular values against the function. For example, in the periodic function cos x, the graph of its function is y = cos x, so that 1 = cos(0):

[Figure: the graph of y = cos x over –3π/2 ≤ x ≤ 3π/2.]

This function's domain is –∞ < x < ∞, its range is –1 ≤ y ≤ 1, and its period is 2π. Similarly, the graph of the periodic function sin x is y = sin x, so that 0 = sin(0):

[Figure: the graph of y = sin x over 0 ≤ x ≤ 2π.]

And the graph of the periodic function tan x is y = tan x, so that 1 = tan(π/4):

[Figure: the graph of y = tan x, with vertical asymptotes at odd multiples of π/2.]











































Appendix E Derivatives and Integrals

E.1 Derivatives: K-Closeness and the Least Upper Bound. Limits. Derivatives. Derivative Rules. Common Derivatives. Inverse and Implicit Differentiation. Extrema. The Mean Value Theorem. E.2 Integrals: Definite Integrals. Integral Properties. Common Integrals. The Mean Value Theorem for Integrals. The Fundamental Theorem of Calculus. E.3 Integration Techniques: By Substitution. By Parts. By Trigonometric Substitution. By Partial Fraction Expansion. Further Techniques.

E.1 Derivatives k-Closeness and the Least Upper Bound A number a is an upper bound for a subset X ⊆ R if for every x ∈ X, x ≤ a. A number b is the least upper bound, sup, for X if b is an upper bound and b ≤ a for every upper bound a of X. Two numbers, or points, x, y ∈ Rⁿ are k-close if for each i = (0, ..., n), |[xᵢ]ₖ – [yᵢ]ₖ| ≤ 10⁻ᵏ, where k is the decimal placement of the numbers. In other words, if two numbers are k-close, then their difference is zero for n decimal places up to k. For example,

x = 1.0001, y = 0.9998: |x – y| = 0.0003 ≤ 10⁻³, but 0.0003 > 10⁻⁴

are 4-close, since their difference is zero for n = 3 decimal places up to 4, where the difference then disagrees with 10⁻⁴. Clearly then, when two numbers are k-close for all k, the two numbers are the same. A quick and dirty way to prove the same thing is

a = 0.999
10a = 9.999
10a – a = 9.999 – 0.999 = 9
9a = 9
a = 1

where .999... (repeating) is represented by .999 for convenience.

Limits A number x is the limit of the sequence xₙ if for each real ε > 0 there exists a real N such that for every n > N, |xₙ – x| < ε. Then the sequence must converge to the limit x:

[Figure: the terms xₙ entering and remaining inside the band x ± ε beyond the index N.]

-ie, for all n > N, |xₙ – x| < ε. In the following diagram, there are a finite number of leftover sequence terms that lie outside of the epsilon box:

[Figure: a sequence plotted against n, with finitely many terms outside the band xₙ ± ε before the index N₁.]

-and there exists an index N for which, for every n ≥ N, |xₙ – x| < ε. In other words, two numbers in a sequence may be arbitrarily close, leaving behind a finite number of leftover terms and, due to convention, such closeness implies equality at the limit. They are therefore sufficiently close. Some functions may not have a particular value at a point, and cannot be evaluated there. For instance, the function y = (x² – 1)/(x – 1) is undefined, 0/0, when x = 1. But when x = 0.999..., y = 1.999..., infinitely close to 2 as denominator x – 1 is infinitely close to 0. Similarly, when x = 1.111..., y = 2.111..., again approaching 2 from the other side:

[Figure: the line y = x + 1 with a hole at the point (1, 2).]

Some irrational numbers can only be evaluated by a close real approximation, such as π. Limits provide a language for evaluating a function as its input, and therefore its output, approaches a number, or is near a number, or is arbitrarily close to a number, without its input or output actually acquiring that value. How does the function

f(x) = (x² – 1)/(x – 1)

behave near x = 1? The graph for the function shows that it is undefined for x = 1. However, the value of f(x) may be evaluated to be arbitrarily close to 2 as the input x is taken to be arbitrarily close to 1:

x:    0.9   1.1   0.99   1.01
f(x): 1.9   2.1   1.99   2.01

Clearly then, if a function f(x) is arbitrarily close to a number L for all x arbitrarily close to c, then f(x) approaches the limit L as x approaches c:

lim(x→c) f(x) = L

Therefore, if f(x) = (x² – 1)/(x – 1), the limit of f is 2:

lim(x→1) (x² – 1)/(x – 1) = 2

Let f(x) be defined on an open interval about c, except possibly at c itself. The limit of the function f(x) as x approaches c is the number L,

L = lim(x→c) f(x)

if, for every number ε > 0, there exists a corresponding number δ > 0 such that for all x,

0 < |x – c| < δ ⟹ |f(x) – L| < ε

[Figure: the ε-δ picture: inputs within δ of c are carried by y = f(x) into outputs within ε of L.]

If P(x) and Q(x) are polynomials and Q(c) ≠ 0, then

lim(x→c) P(x)/Q(x) = P(c)/Q(c)

is the limit of a rational function. Suppose that g(x) ≤ f(x) ≤ h(x) for all x in some open interval containing c, except perhaps at c itself. Suppose also that

lim(x→c) g(x) = L = lim(x→c) h(x)

Then

lim(x→c) f(x) = L

-the squeeze theorem, which says, for example, that any function u(x) that lies between y₁ = 1 + x²/2 and y₂ = 1 – x²/4 must have limit 1 as x → 0.

It may take an increasingly large number for a function to approach its limit- a limit at infinity (∞). The function f(x) has the limit L as x approaches ∞,

lim(x→∞) f(x) = L

if, for every number ε > 0, there exists a corresponding number M such that for all x

x > M ⟹ |f(x) – L| < ε

-for any M, a greater x may be found, and, as M grows, ε diminishes to the limit. A curve that is C¹- once continuously differentiable- is automatically continuous, though a continuous curve need not be differentiable. As we'll see by Volume III, fractal geometry gives examples of continuous yet nowhere-differentiable geometry, a notion harnessed by topology. Let c be a real number on the x-axis. The function f is continuous at c if

lim(x→c) f(x) = f(c)

By this, we'd test a function for continuity at a point in the same way we did with f(x) = (x² – 1)/(x – 1) before. And as we'll see in the next section, for a function to be differentiable at a point, its curve's tangent line must exist at that point. Now, the line secant to a graph is the one formed from an average rate of change- the ratio of the difference between a function's output and input taken for any two inputs:

[Figure: a secant line through the points P(x₁, f(x₁)) and Q(x₂, f(x₂)) on the curve y = f(x).]

-the average rate of change of y = f(x) with respect to x over the interval [x₁, x₂]:

∆y/∆x = (f(x₂) – f(x₁))/(x₂ – x₁) = (f(x + ∆x) – f(x))/∆x

For example, the average rate of change of a fruit fly population, over the course of 22 days, from the 23rd day of the experiment to the 45th day of the experiment, is the ratio ∆p/∆t:

[Figure: fly population p against time t (days), with P(23, 150) and Q(45, 340); ∆p = 190, ∆t = 22.]

-approximately 8.6 fruit flies per day:

∆p/∆t = (340 – 150)/(45 – 23) = 190/22 ≈ 8.6

This ratio is the slope of the secant line through the points P and Q on the graph. But the slope of the graph, and thus the rate of flies per day, is changing. Taking values of Q closer to the day our experiment began thus results in a different slope. Looking at the graph, it's clear that this slope should increase as Q approaches P:

Q           ∆p/∆t
(45, 340)   (340 – 150)/(45 – 23) ≈ 8.6
(40, 330)   (330 – 150)/(40 – 23) ≈ 10.6
(35, 310)   (310 – 150)/(35 – 23) ≈ 13.3
(30, 265)   (265 – 150)/(30 – 23) ≈ 16.4

The line created when Q reaches P is the tangent line. The slope of the tangent line is the instantaneous rate of change of the function at that point. For example, shorter and shorter periods of time since the 23rd day of the experiment may be taken to reach an approximate instantaneous rate for the 23rd day. Thus, the ratio of the difference between two points on the tangent line is its slope, the instantaneous rate of change of fruit flies on the 23rd day of the experiment: approximately 16.7 fruit flies per day. As an example, consider how most problems involving limits may be solved for the limit in question simply by substitution:

lim(x→0) (eˣ + 1) = e⁰ + 1 = 2

Derivatives The derivative of a function is its instantaneous rate of change– the instantaneous rate of change of its output with respect to its input. Geometrically, the derivative linearizes curvilinearity, developing equal-dimensional tangent spaces to curves, surfaces, volumes, or any n-dimensional manifold. The derivative of the function f(x) with respect to the variable x is the function f ', whose value at x is

f'(x) = lim(∆x→0) (f(x + ∆x) – f(x))/∆x

For the curve of our single variable function, it's the slope of the line tangent to a function y = f(x) exactly at the point x that it is evaluated:

[Figure: the secant through (x, f(x)) and (x + ∆x, f(x + ∆x)) becoming the tangent line with slope m = (f(x + ∆x) – f(x))/∆x as ∆x → 0.]

The process of applying the derivative to a function is known as differentiation. Different notations for it mean the 186

same thing:

f'(x) = df/dx = dy/dx = Dₓf(x) = y'

Derivative Rules Let's quickly review our derivative rules. Derivatives obey i) The Power Rule:

d/dx xⁿ = nxⁿ⁻¹

ii) The Constant Rule:

d/dx [c·u(x)] = c·u'(x)

iii) The Sum Rule:

d/dx [u(x) + v(x) + w(x)] = u'(x) + v'(x) + w'(x)

iv) The Product Rule:

d/dx [f(x)g(x)] = f'(x)g(x) + f(x)g'(x)

v) The Quotient Rule:

d/dx [f(x)/g(x)] = (f'(x)g(x) – f(x)g'(x))/g(x)²

vi) The Chain Rule:

d/dx f(g(x)) = f'(g(x))·g'(x)

As examples of using the power rule, consider

d/dx x³ = 3x², d/dx x^(5/3) = (5/3)x^(2/3)

As an example of using the product rule, consider

f(t) = √t·eᵗ

f'(t) = (1/(2√t))·eᵗ + √t·eᵗ

As an example of using the chain rule, consider

y = 2 sin(x²), u = x²

dy/dx = (dy/du)(du/dx) = 2 cos(u)·2x = 4x cos(x²)
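A symbolic check of this chain-rule computation, sketched with sympy:

```python
import sympy as sp

x = sp.symbols('x')
y = 2 * sp.sin(x**2)
print(sp.diff(y, x))   # 4*x*cos(x**2)
```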

Common Derivatives Common derivatives we'll use all the time are the trigonometric derivatives,

d/dx sin x = cos x, d/dx cos x = –sin x, d/dx tan x = sec² x

and the trigonometric cofunction derivatives:

d/dx csc x = –csc x cot x, d/dx sec x = sec x tan x, d/dx cot x = –csc² x

We have then also the derivatives of their inverses, the inverse trigonometric derivatives,

d/dx sin⁻¹x = 1/√(1 – x²), d/dx cos⁻¹x = –1/√(1 – x²), d/dx tan⁻¹x = 1/(1 + x²)

and the inverses of their cofunctions, the inverse trigonometric cofunction derivatives:

d/dx csc⁻¹x = –1/(|x|√(x² – 1)), d/dx sec⁻¹x = 1/(|x|√(x² – 1)), d/dx cot⁻¹x = –1/(1 + x²)

Finally, recall the common logarithmic

d/dx ln x = 1/x

and exponential derivatives:

d/dx aˣ = aˣ ln a, d/dx eˣ = eˣ

As examples of using these derivatives,

f(x) = x² + sin x, f'(x) = 2x + cos x

and

g(t) = tan(5 – sin 2t), g'(t) = –(2 cos 2t)·sec²(5 – sin 2t)

Inverse and Implicit Differentiation In the single variable case, without the use of the partial derivative formula, we can still differentiate implicitly by simply treating the independent variable as a differentiable function of the dependent one. For example, for the rightward opening parabola given by x = y², we just treat y (here, the independent variable) as a function of x, y(x), to differentiate both sides:

d/dx x = d/dx y(x)²

1 = 2y·(dy/dx), dy/dx = 1/(2y)

The inverse derivative is equal to the derivative of the inverse, given by the formula

(f⁻¹)'(x) = 1/f'(f⁻¹(x))

For example, using the function y = x², in which the inverse is clearly x^(1/2), with known derivatives

d/dx x² = 2x, d/dx x^(1/2) = 1/(2√x)

we can verify the formula by

(f⁻¹)'(x) = 1/f'(x^(1/2)) = 1/(2x^(1/2)) = 1/(2√x)

Extrema We can use derivatives to identify the maximum and minimum values of a graph on some interval. We can then determine whether a function is increasing, decreasing, or constant, on that interval, by examining sign and sign change. The intervals are those with zero slope, given by the zeros of the first derivative. If the derivative is positive at some point, then the slope is positive, and the function is increasing. If the derivative is negative, the slope is negative, and the function is decreasing. The roots of the equation of the differentiated function divide the curve into intervals of increase and decrease. For example, consider the graph of the function f(x) = x³ – 12x – 5:

The minimums and maximums of f(x) are given by f '(x) = 0:

f'(x) = 3x² – 12 = 0

x² = 4, x = ±2

Using x₁ = –2, x₂ = 2 in x³ – 12x – 5 gives y₁ = 11 and y₂ = –21. Then for the graph of f(x) = x³ – 12x – 5, the maxima and minima are located at (–2, 11) and (2, –21). But notice that x = ±2 are just the roots of f'(x) = 3x² – 12:

[Figure: f(x) = x³ – 12x – 5 with its maximum at (–2, 11) and minimum at (2, –21), plotted with f'(x) = 3x² – 12.]

We can learn even more about f(x) from f'(x). Everywhere f'(x) is positive, f(x) is increasing, and everywhere f'(x) is negative, f(x) is decreasing. The graph of f''(x) is similarly revealing. For any extremum identified by f'(x) = 0, if f''(x) > 0, the extremum is a minimum; if f''(x) < 0, the extremum is a maximum; and if f''(x) = 0, the point may be an inflection point:

[Figure: f(x) = x³ – 12x – 5 plotted with f'(x) = 3x² – 12 and f''(x) = 6x, marking (–2, –12) and (2, 12) on f''.]

For example, using x₁ = –2, x₂ = 2 for the function f ''(x) = 6x, we have y₁ = –12, y₂ = 12. Thus, since y₁ is negative, it must be a local maximum, and since y₂ is positive, it must be a local minimum.
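The whole first- and second-derivative test can be sketched in a few lines of sympy, using the example function above:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 12*x - 5
fp, fpp = sp.diff(f, x), sp.diff(f, x, 2)
for c in sp.solve(fp, x):            # roots of 3x^2 - 12: x = -2, 2
    kind = 'max' if fpp.subs(x, c) < 0 else 'min'
    print(c, f.subs(x, c), kind)     # -2 11 max, then 2 -21 min
```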


The Mean Value Theorem Roughly speaking, the mean value theorem states that for a given planar arc between two endpoints, there is at least one point at which the tangent to the arc is parallel to the secant adjoining its endpoints on some interval. If f : [a, b] → R is continuous and f is differentiable on (a, b), then there exists c ∈ (a, b) such that

f'(c) = (f(b) – f(a))/(b – a)

Proof:

The distance travelled by the function f in the interval b – a is f(b) – f(a), for which its average speed is

m = (f(b) – f(a))/(b – a)

The function g maintains a constant, average speed:

g(x) = f(a) + m(x – a)

The function h measures the distance between f and g:

h(x) = f(x) – g(x) = f(x) – f(a) – m(x – a)

Function h(x) is continuous on [a, b], with h(a) = h(b) = 0. If h is 0 everywhere, then f has derivative m everywhere, so the theorem is true. If h is not 0 everywhere, then the distance function h(x) must be positive or negative somewhere on the interval. Therefore it must have a positive maximum or, respectively, a negative minimum. Let c ∈ (a, b) be such an extreme value. Then h is differentiable at c, and therefore h'(c) = 0, giving

f'(c) = m = (f(b) – f(a))/(b – a)  ∎

E.2 Integrals Definite Integrals Write sums in compact form with a sigma Σ:

Σ(k=1..n) aₖ = a₁ + a₂ + ⋯ + aₙ₋₁ + aₙ

Consider an arbitrary bounded function f defined on a closed interval [a, b]:

[Figure: a bounded curve over the interval from a to b.]

Subdivide the interval [a, b] into subintervals, not necessarily of equal width, by choosing n – 1 points between a and b satisfying

a < x₁ < x₂ < ⋯ < xₙ₋₁ < b

Setting a = x₀ and b = xₙ,

a = x₀ < x₁ < ⋯ < xₙ₋₁ < xₙ = b

The set

P = {x₀, x₁, x₂, …, xₙ₋₁, xₙ}

is the partition P that divides [a, b] into n closed subintervals:


Now, choose points in every subinterval. The point cₖ is within the kth subinterval. The value f(cₖ) touches the curve. Over every subinterval, there may be stood a rectangle having a width of ∆xₖ and height f(cₖ), and an area that is the absolute value of the product of the width and height, ∆xₖ·f(cₖ):

[Figure: rectangles of width ∆xₖ and height f(cₖ) standing on the subintervals of [a, b], their corners (cₖ, f(cₖ)) touching the curve.]

The sum Sₚ of the rectangles approximates the area bounded by the curve and the axis of incrementation- the Riemann sum of f on the interval [a, b]:

Sₚ = Σ(k=1..n) f(cₖ)·∆xₖ

As n increases, the increments ∆x get shorter. Thus, the rectangles become ever finer, and the approximation of the bounded area by the Riemann sum increases in precision:

[Figure: two partitions of [a, b], the finer one hugging the curve more closely.]

When n → ∞, ∆x → 0, and the area bounded by the curve on the interval is approximated perfectly:

[Figure: the limiting region under the curve between a and b.]

In this way, we have the definite integral of f(x) on the interval [a, b]:

∫ₐᵇ f(x) dx = lim(n→∞) Σ(k=1..n) f(cₖ)·∆xₖ
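A Riemann sum is easy to compute directly. A sketch using midpoints as the sample points cₖ on equal subintervals, applied to a function whose exact integral (1/3 on [0, 1] for f(x) = x²) is known:

```python
def riemann(f, a, b, n):
    # midpoint Riemann sum of f on [a, b] with n equal subintervals
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) * dx for k in range(n))

print(riemann(lambda x: x**2, 0.0, 1.0, 10))      # ~0.3325
print(riemann(lambda x: x**2, 0.0, 1.0, 10000))   # ~0.33333333
```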

Integral Properties A function f is integrable on some interval and the definite integral ∫ f(x) dx exists if it is continuous over the interval [a, b] , or if ƒ has at most finitely many jump discontinuities there. Definite integrals have the following properties: i) Order of Integration (Orientation):

∫ₐᵇ f(x) dx = –∫ᵇₐ f(x) dx

ii) Zero Width:

∫ₐᵃ f(x) dx = 0

iii) Constants:

∫ₐᵇ k·f(x) dx = k ∫ₐᵇ f(x) dx

iv) Sums and Differences:

∫ₐᵇ [f(x) ± g(x)] dx = ∫ₐᵇ f(x) dx ± ∫ₐᵇ g(x) dx

v) Additivity:

∫ₐᵇ f(x) dx + ∫ᵇᶜ f(x) dx = ∫ₐᶜ f(x) dx

Common Integrals An indefinite integral has no boundaries:

∫ f(x) dx

Use it to define the property of the integral as the antiderivative:

∫ f'(x) dx = f(x) + C

This is proven below, in the fundamental theorem of calculus. Note the undetermined new constant C. Call the integrated function F(x). The definite integral, as a rule, integrates f(x) between its boundaries by the formula

∫ₐᵇ f(x) dx = F(b) – F(a)

For example

∫ x dx = x²/2 + C

Thus, just as integrating a variable x¹ = x takes it to x²/2, integrating a constant takes x⁰ = 1 to x¹ = x. For the same reasons, the logarithmic integral is ∫ (1/x) dx = ln|x| + C, and the exponential integral is ∫ eˣ dx = eˣ + C:

∫ eˣ dx = eˣ + C, ∫ (1/x) dx = ln|x| + C

The Mean Value Theorem for Integrals The average integral value on [a, b] is

f_avg = (1/(b – a)) ∫ₐᵇ f(x) dx

For example, consider the function

f(x) = √(4 – x²)

on the interval [–2, 2] given by the graph

[Figure: the upper semicircle of radius 2 centered at the origin.]

We know already that the area of this semicircle with radius 2 is (1/2)πr² = (1/2)π(2)² = 2π. Then the integral measuring its area is equal to 2π, so that the average integral value of f(x) is

f_avg = (1/(2 – (–2))) ∫₋₂² √(4 – x²) dx = 2π/4 = π/2

If we take this average, y = π/2, as the height of a rectangle with base width b – a = 4, then the area of the rectangle is 2π, equal to the area of the semicircle:

[Figure: the rectangle of height π/2 over [–2, 2] alongside the semicircle.]

The mean value theorem for definite integrals says that there is a number c in [a, b] such that the rectangle with height equal to the average value f(c) of the function and base width b – a has exactly the same area as the region beneath the graph of f from a to b. Thus, if f is continuous on [a, b], then at some point c in [a, b],

f(c) = (1/(b – a)) ∫ₐᵇ f(x) dx

Now, to actually integrate f(x) = √(4 – x²), we'll need all the techniques of integration given in the next section.
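Even before those techniques, the average value can be confirmed numerically. A sketch reusing a midpoint Riemann sum:

```python
import math

def average_value(f, a, b, n=100000):
    # (1/(b - a)) times a midpoint Riemann sum approximating the integral
    dx = (b - a) / n
    total = sum(f(a + (k + 0.5) * dx) * dx for k in range(n))
    return total / (b - a)

print(average_value(lambda x: math.sqrt(4 - x*x), -2.0, 2.0))   # ~1.570796
print(math.pi / 2)                                              # 1.570796...
```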

The Fundamental Theorem of Calculus What makes calculus a branch of mathematics is that somehow the operation of differentiation is the inverse of integration. This relation is established by the fundamental theorem of calculus, which says that the derivative of an integral of a function is just the function again. Specifically, let f be continuous on the closed interval [a, b], and for all x ∈ [a, b], let F be defined by

F(x) = ∫ₐˣ f(t) dt

Then F is uniformly continuous on [a, b], differentiable on (a, b), and

F'(x) = f(x)

for all x ∈ (a, b). Proof: Define

F(x) = ∫ₐˣ f(t) dt

For any two numbers x₁, x₁ + ∆x ∈ [a, b],

F(x₁) = ∫ₐ^(x₁) f(t) dt

and

F(x₁ + ∆x) = ∫ₐ^(x₁+∆x) f(t) dt

Subtract for the difference:

F(x₁ + ∆x) – F(x₁) = ∫ₐ^(x₁+∆x) f(t) dt – ∫ₐ^(x₁) f(t) dt

By the additive property,

∫ₐ^(x₁) f(t) dt + ∫_(x₁)^(x₁+∆x) f(t) dt = ∫ₐ^(x₁+∆x) f(t) dt

So, making the appropriate substitution gives

F(x₁ + ∆x) – F(x₁) = ∫_(x₁)^(x₁+∆x) f(t) dt

By the mean value theorem, there exists a real c(∆x) in the interval [x₁, x₁ + ∆x]:

∫_(x₁)^(x₁+∆x) f(t) dt = f(c(∆x))·∆x

Then

F(x₁ + ∆x) – F(x₁) = f(c(∆x))·∆x

Divide by ∆x and take the limit as ∆x → 0:

(F(x₁ + ∆x) – F(x₁))/∆x = f(c(∆x))

Then, by definition,

F'(x₁) = lim(∆x→0) f(c(∆x))

But, by the squeeze theorem,

x₁ ≤ c(∆x) ≤ x₁ + ∆x ⟹ lim(∆x→0) c(∆x) = x₁

Therefore,

lim(∆x→0) f(c(∆x)) = f(x₁)

and

F'(x₁) = f(x₁)  ∎

E.3 Integration Techniques By virtue of the fundamental theorem of calculus, we may treat integrals as antiderivatives, and use its accompanying formula. In practice, though, this is usually not enough. Below, for the convenience of the reader, we have quickly reviewed and given examples of the most essential integration techniques.

By Substitution Sometimes, the integrand function is not clearly recognized as being the derivative of anything:

∫ 2x√(x² + 1) dx

This function might be integrated if it's put into simpler terms. Here, there are functions composed with each other. Substitute u for the inner function, and somewhere in the integrand find its derivative:

u = x² + 1, du = 2x dx, so ∫ 2x√(x² + 1) dx = ∫ √u du

Integrate and substitute back in:

∫ √u du = (2/3)u^(3/2) + C = (2/3)(x² + 1)^(3/2) + C

All substitutions have this flavor. In general, if g' is continuous on the interval [a, b] and f is continuous on the range of g(u) = x, then

∫ f(g(x))·g'(x) dx = ∫ f(u) du

-the substitution formula for indefinite integrals. Sometimes an extra factor may appear in du that does not exist in the integrand:

∫ x cos(x²) dx, u = x², du = 2x dx

Compensate for extra factors after the integral sign by adding the reciprocal in front of the integral sign:

∫ x cos(x²) dx = (1/2) ∫ cos u du = (1/2) sin(x²) + C

If f(u) du contains an extra factor of x, just put x in terms of u:

∫ x√(x + 1) dx, u = x + 1, x = u – 1: ∫ (u – 1)√u du

The substitution formula for definite integrals is the same as for indefinite ones, but now there are bounds in terms of x that will need to be put in terms of u:

∫ₐᵇ f(g(x))·g'(x) dx = ∫_(g(a))^(g(b)) f(u) du

For example,

∫₋₁¹ 3x²(x³ + 1)² dx = ∫₀² u² du = 8/3

-the boundaries changed to g(–1) = 0 and g(1) = 2, since ((1³) + 1) = 2, and (((–1)³) + 1) = 0.

By Parts By integrating the product rule of derivatives,

(fg)' = f'g + fg', so fg = ∫ f'·g dx + ∫ f·g' dx

-we get the formula for integration by parts:

∫ f·g' dx = fg – ∫ f'·g dx

Rephrased as

∫ u dv = uv – ∫ v du

The goal in actually using this formula is to put a non-integrable integrand ∫ u dv in terms of one that is, ∫ v du. Begin by identifying the u and dv in the non-integrable integrand according to its separate integrable parts, then use the formula. For example,

∫ x eˣ dx: u = x, dv = eˣ dx, so du = dx, v = eˣ:

∫ x eˣ dx = x eˣ – ∫ eˣ dx = x eˣ – eˣ + C

It may be needed more than once:

∫ x² eˣ dx = x² eˣ – 2 ∫ x eˣ dx

Integrate by parts again:

∫ x² eˣ dx = x² eˣ – 2(x eˣ – eˣ) + C = eˣ(x² – 2x + 2) + C

By Trigonometric Substitution For all but the secant and tangent functions, the antiderivatives of trigonometric functions are just the opposite of their derivatives:

∫ cos x dx = sin x + C, ∫ sin x dx = –cos x + C

∫ sec² x dx = tan x + C, ∫ csc² x dx = –cot x + C

∫ sec x tan x dx = sec x + C, ∫ csc x cot x dx = –csc x + C

And, for the secant and tangent functions,

∫ tan x dx = –ln|cos x| + C = ln|sec x| + C, ∫ cot x dx = ln|sin x| + C

∫ sec x dx = ln|sec x + tan x| + C, ∫ csc x dx = –ln|csc x + cot x| + C

A trigonometric integral is one involving an algebraic combination of trigonometric functions in the integrand:

ISIN3Xcos Sx JX For products of powers of sines and cosines, take any odd power and form a trigonometric square to use the Pythagorean identity, or take two even powers and use the half angle identities:

I 203

For example, with m odd:

∫ sin³x cos²x dx = ∫ (1 – cos²x) cos²x sin x dx

u = cos x, du = –sin x dx:

∫ (1 – u²)u²(–du) = ∫ (u⁴ – u²) du = cos⁵x/5 – cos³x/3 + C

With n odd:

∫ sin²x cos³x dx = ∫ sin²x(1 – sin²x) cos x dx = sin³x/3 – sin⁵x/5 + C

Use the same technique to split up powers of the tangent and secant functions, and integrate by parts to reduce those powers:

∫ tan²x sec⁴x dx = ∫ tan²x(1 + tan²x) sec²x dx = tan³x/3 + tan⁵x/5 + C

If the derivative of a trigonometric function can be found in the integrand, use substitution instead:

∫ tan x sec²x dx: u = tan x, du = sec²x dx ⟹ ∫ u du = tan²x/2 + C

Use the product-to-sum identities for products of sines and cosines:

∫ sin 3x cos 5x dx = (1/2) ∫ [sin 8x + sin(–2x)] dx = –cos 8x/16 + cos 2x/4 + C

Trigonometric functions may be algebraically combined with other kinds of functions in the integrand:

∫ x sin(x²) dx = –(1/2) cos(x²) + C

Sometimes, we might want to integrate with respect to an angular variable:

∫ csc²θ cot θ dθ = –∫ u du = –cot²θ/2 + C, where u = cot θ, du = –csc²θ dθ, –du = csc²θ dθ.

Integrands of the form

√(a² – x²), √(a² + x²), √(x² – a²)

may be integrated by the substitution of the variable of integration with a trigonometric function: trigonometric substitution. The substitutions make the integrand integrable and are of the form

x = a sin θ for a² – x², x = a tan θ for a² + x², x = a sec θ for x² – a²

Use the reference triangles to put the solution back in terms of the variable of integration. For example,

∫ dx/√(x² + 1): x = tan θ, dx = sec²θ dθ

∫ sec²θ dθ / √(tan²θ + 1) = ∫ sec θ dθ = ln|sec θ + tan θ| + C

Now use the reference triangle to put ln|sec θ + tan θ| back in terms of x:

ln|√(x² + 1) + x| + C

Here is an example using the sine substitution for the form a² – x²:

∫ √(a² – x²) dx: x = a sin θ, dx = a cos θ dθ

∫ a cos θ · a cos θ dθ = a² ∫ cos²θ dθ = (a²/2) ∫ (1 + cos 2θ) dθ = (a²/2)(θ + sin 2θ/2) + C

-where u = 2θ, du = 2 dθ. Similarly, setting a² = 4, we can finally integrate the integrand from the previous section, f(x) = √(4 – x²):

∫ √(4 – x²) dx = 2θ + sin 2θ + C = 2 sin⁻¹(x/2) + (x/2)√(4 – x²) + C

Then the average value fAVG is equal to

f_avg = (1/4) ∫₋₂² √(4 – x²) dx = (1/4)[2 sin⁻¹(x/2) + (x/2)√(4 – x²)]₋₂² = (1/4)(π – (–π)) = π/2

By Partial Fraction Expansion An integrand involving a rational function that has a factorable denominator and a numerator degree less than its denominator degree may require simplification via its partial fraction expansion before it can be integrated. Take for example

∫ (x + 4)/(x² + x – 2) dx = ∫ [A/(x – 1) + B/(x + 2)] dx

x + 4 = A(x + 2) + B(x – 1): x = 1 gives A = 5/3; x = –2 gives B = –2/3

∫ (x + 4)/(x² + x – 2) dx = (5/3) ln|x – 1| – (2/3) ln|x + 2| + C

Use the faster Heaviside cover-up method when available (distinct linear factors of the first degree in the denominator): evaluate the rest of the fraction at each root with that root's factor covered up, as was done above by substituting x = 1 and x = –2, or back substitute from the coefficient equations rather than taking the matrix A all the way to R. And be prepared to long divide the polynomial rational function first, to achieve a numerator degree less than that of the denominator.

Further Techniques Not every integral will yield to the techniques given so far. Some will require the use of the more specialized techniques, and we'll review a few and demonstrate them in this final section. Force substitution on integrands that have inconvenient terms that do not equal any findable du:

∫ (x² + 6)/(x³ + 2x + 4) dx

Algebraically manipulate them into an equivalent form that also involves their derivative. Here, we want to replace the x² + 6 term with 3x² + 2, the derivative of x³ + 2x + 4. Go ahead and put the derivative in the numerator, and compensate with a factor of 1/3. What number can be added to the derivative that will equal the original constant term when divided by 3? Sixteen, since (2 + 16)/3 = 6:

x² + 6 = (1/3)(3x² + 2 + 16)

∫ (x² + 6)/(x³ + 2x + 4) dx = (1/3) ∫ (3x² + 2)/(x³ + 2x + 4) dx + (16/3) ∫ dx/(x³ + 2x + 4)

-the first term integrating at once to (1/3) ln|x³ + 2x + 4|.

Integrals of the form

∫ (ax + b)/(x² + px + q) dx

in which the denominator does not factor, and the numerator is not a constant multiple of the derivative of the denominator, and no algebraic manipulation can force substitution, may be solvable by completing the square. Complete the square of the denominator and make a substitution that integrates to the inverse tangent or logarithmic functions, depending on whether or not the variable exists in the numerator, by the following formulas:

∫ du/(u² + a²) = (1/a) tan⁻¹(u/a) + C, ∫ u du/(u² + a²) = (1/2) ln(u² + a²) + C

For example,

∫ dx/(x² – (3/2)x + 1) = ∫ dx/((x – 3/4)² + 7/16)

Now make a substitution that gives the form of the derivative of either the logarithmic or inverse trigonometric functions: u = x – 3/4, du = dx:

∫ du/(u² + 7/16) = (4/√7) tan⁻¹(4u/√7) + C = (4/√7) tan⁻¹((4x – 3)/√7) + C

As an example of a problem that requires forcing a substitution, consider, using a³ – b³ = (a – b)(a² + ab + b²),

∫ dx/(x³ – 1) = ∫ dx/((x – 1)(x² + x + 1)) = ∫ [A/(x – 1) + (Bx + C)/(x² + x + 1)] dx

1 = A(x² + x + 1) + (Bx + C)(x – 1) gives A = 1/3, B = –1/3, C = –2/3:

∫ dx/(x³ – 1) = (1/3) ∫ dx/(x – 1) – (1/3) ∫ (x + 2)/(x² + x + 1) dx

The last integrand will have to be put in a form that allows substitution: force substitution. If u will be the denominator, the fraction will have to be put in a form where the derivative is in its numerator:

x + 2 = (1/2)(2x + 1) + 3/2

(1/3) ∫ (x + 2)/(x² + x + 1) dx = (1/6) ∫ (2x + 1)/(x² + x + 1) dx + (1/2) ∫ dx/(x² + x + 1)

Now, complete the square:

x² + x + 1 = (x + 1/2)² + 3/4, with u = x + 1/2, du = dx

∫ du/(u² + 3/4) = (2/√3) tan⁻¹(2u/√3), with v = 2u/√3, dv = (2/√3) du

Finally, add back the arbitrary constant C:

∫ dx/(x³ – 1) = (1/3) ln|x – 1| – (1/6) ln(x² + x + 1) – (1/√3) tan⁻¹((2x + 1)/√3) + C
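As a closing cross-check, a computer algebra system reproduces both the partial fraction expansion and the final antiderivative; a sympy sketch:

```python
import sympy as sp

x = sp.symbols('x')
print(sp.apart(1 / (x**3 - 1)))          # 1/(3*(x - 1)) - (x + 2)/(3*(x**2 + x + 1))
print(sp.integrate(1 / (x**3 - 1), x))   # log and atan terms matching the result above
```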