English Pages 475 Year 2019
CALCULUS BLUE MULTIVARIABLE VOLUME 2 : DERIVATIVES ROBERT GHRIST 3rd edition, kindle format Copyright © 2019 Robert Ghrist All rights reserved worldwide Agenbyte Press, Jenkintown PA, USA ISBN 978-1-944655-04-4 1st edition © 2016 Robert Ghrist 2nd edition © 2018 Robert Ghrist
prologue chapter 1: multivariate functions chapter 2: partial derivatives chapter 3: the derivative chapter 4: differentiation chapter 5: the chain rule chapter 6: differentiation rules chapter 7: inverse function theorem chapter 8: implicit function theorem chapter 9: gradients chapter 10: tangent spaces
chapter 11: linearization chapter 12: taylor series chapter 13: computing taylor series chapter 14: critical points chapter 15: optimization: regression chapter 16: optimization: game theory chapter 17: constrained optimization chapter 18: the lagrange multiplier chapter 19: using lagrange’s method epilogue foreshadowing: integrals
enjoy learning! use your full imagination & read joyfully… this material may seem easy, but it’s not! it takes hard work to learn mathematics well… work with a teacher, tutor, or friends and discuss what you are learning. this text is meant to teach you the big ideas and how they are useful in modern applications; it’s not rigorous, and it’s not comprehensive, but it should inspire you to do things with math… exercises at chapter ends are for you to practice. don’t be too discouraged if some are hard… keep working! keep learning!
Waves oer the heavens deep…
What have we learned?
Well…
we’VE LEARNED TO WORK WITH…
&
but
is understanding & using derivatives
how to maximize/minimize a function with multiple inputs?
What lies beyond maxima/minima?
the lagrange multiplier will be one of our most useful optimization methods
it works in the setting of constrained optimization
such constraints arise naturally in economics, engineering, physics, & more
why do multivariate derivatives?
derivatives are critical to so many applications! along the way, we’ll learn…
x0
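$$\begin{pmatrix} m \\ b \end{pmatrix} = A^{-1}\mathbf{b} = \begin{pmatrix} \dfrac{n \sum_i x_i y_i - \left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \\[2ex] \dfrac{-\left(\sum_i x_i\right)\left(\sum_i x_i y_i\right) + \left(\sum_i x_i^2\right)\left(\sum_i y_i\right)}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2} \end{pmatrix}$$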
we’ll learn some applications ranging from statistics to game theory…
most of the calculus you have learned is concerned with the local & global features of functions…
f: _
2
f: _
multivariable calculus deals with functions of the form $f: \mathbb{R}^n \to \mathbb{R}^m$, with $n$ inputs and $m$ outputs
you’ve seen lots of multivariate functions before...
f: _
n
2
f: _
n
f(x,y) = c
these curves are “level sets” where the function is constant: see chapter 9
2
f: _
f(x,y,z ) = c
the corresponding construction in 3-d yields surfaces...
3
f: _
Sometimes, you can use color or density or time to visualize functions with more than two inputs/outputs…
4
f: _
2
often have position and/or time as inputs
meteorological quantities are functions of position
2
f: _
2
f: _
2
3
f: _
3
can “convert” between different units
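functions can switch between coordinate systems (polar and euclidean)
$$\begin{pmatrix} x \\ y \end{pmatrix} = f\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix} \qquad\qquad \begin{pmatrix} r \\ \theta \end{pmatrix} = f^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \sqrt{x^2+y^2} \\ \arctan(y/x) \end{pmatrix}$$
these are typically inverse functions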
can be explicit functions of inputs
Q = quantity
demand : D(P) = D0 - bP
D0
supply = demand at the market equilibrium
Qm S0
Pm
P = price
Qm = S0 + a Pm = D0 – b Pm market quantity
market price
P = price
supply : S(P) = S0 + aP
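$$(a + b)\,P_m = D_0 - S_0 \quad\Rightarrow\quad P_m = \frac{D_0 - S_0}{a+b}$$
$$Q_m = S_0 + a\,P_m = S_0 + a\,\frac{D_0 - S_0}{a+b} = \frac{a D_0 + b S_0}{a+b}$$
$$\begin{pmatrix} P_m \\ Q_m \end{pmatrix} = f\begin{pmatrix} a \\ b \\ S_0 \\ D_0 \end{pmatrix} = \begin{pmatrix} \dfrac{D_0 - S_0}{a+b} \\[2ex] \dfrac{a D_0 + b S_0}{a+b} \end{pmatrix}$$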
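The equilibrium is an explicit function of the four inputs (a, b, S0, D0), so it is easy to sketch in code. The function name and the sample numbers below are illustrative assumptions, not values from the text.

```python
def market_equilibrium(a, b, S0, D0):
    """Return (Pm, Qm) where supply S(P) = S0 + a*P meets demand D(P) = D0 - b*P."""
    if a + b == 0:
        raise ValueError("a + b must be nonzero for a unique equilibrium")
    Pm = (D0 - S0) / (a + b)          # market price
    Qm = (a * D0 + b * S0) / (a + b)  # market quantity
    return Pm, Qm

# hypothetical inputs, chosen only for illustration
Pm, Qm = market_equilibrium(a=2.0, b=3.0, S0=10.0, D0=60.0)
print(Pm, Qm)
```

Plugging Pm back into either the supply line or the demand line should return the same Qm, which is a quick sanity check.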
f
=f
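$$\frac{d}{dt}\begin{pmatrix} S \\ E \\ I \\ R \end{pmatrix} = f\begin{pmatrix} S \\ E \\ I \\ R \end{pmatrix} = \begin{pmatrix} \mu - \beta S I - \mu S \\ \beta S I - (\mu+\alpha) E \\ \alpha E - (\mu+\gamma) I \\ \gamma I - \mu R \end{pmatrix}$$
S(t) = susceptible, E(t) = exposed, I(t) = infected, R(t) = recovered
α, β, γ, μ = rates (positive constants)
the left side holds the rates of change dS/dt, dE/dt, dI/dt, dR/dt: how these cohort sizes evolve
this is just one possible model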
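To see such a model as a function with four inputs and four outputs, here is a minimal sketch that evaluates the rates and takes small forward-Euler time steps. The parameter values, the initial state, and the step size are all assumptions picked for illustration; forward Euler is only the simplest possible integrator, not a recommended one.

```python
def seir_rates(state, alpha, beta, gamma, mu):
    """Map the cohort sizes (S, E, I, R) to their rates of change."""
    S, E, I, R = state
    return (
        mu - beta * S * I - mu * S,
        beta * S * I - (mu + alpha) * E,
        alpha * E - (mu + gamma) * I,
        gamma * I - mu * R,
    )

def euler_step(state, dt, **rates):
    """Advance the state by one small time step dt."""
    derivs = seir_rates(state, **rates)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))

# hypothetical starting fractions and rate constants
state = (0.99, 0.0, 0.01, 0.0)
params = dict(alpha=0.2, beta=0.5, gamma=0.1, mu=0.01)
for _ in range(1000):
    state = euler_step(state, 0.1, **params)
print(state, sum(state))
```

Adding the four rates gives μ(1 − (S+E+I+R)), so when the cohorts start as fractions summing to one, the total stays at one.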
are sometimes “implicitly defined”
is to think about rates of change
the mathematics of change
the big picture
Most “real-world” functions have multiple inputs & multiple outputs… multivariable calculus Is what makes sense of them!
1
Consider functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^k \to \mathbb{R}^n$ and $h : \mathbb{R}^m \to \mathbb{R}^k$ where $m \neq n \neq k$. A) Which pairwise compositions $f \circ g$, $f \circ h$, $g \circ h$, etc. are permitted? B) For each such legal composition, state the number of its inputs & outputs.
2
recall that the domain of a function is the set of all “legal” inputs. What subsets { (x, y) } of 2 comprise the domains of these functions?
x A) f = y
3
xy 1-x2-y2
y x b) f = y x
x c) f = y
2x – 5y 7x + y -x + 4y
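[challenge?] find the inverses of the following change-of-coordinates functions
A) $\begin{pmatrix} u \\ v \end{pmatrix} = f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x+y \\ x-y \end{pmatrix}$
b) $\begin{pmatrix} u \\ v \end{pmatrix} = f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y^3 \\ x-y^3 \end{pmatrix}$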
Can you draw pictures of what is happening in these coordinate changes?
4
if you wanted to model the flow of a fluid (such as wind) in 3-d as a function of particle positions and time, how many inputs and outputs would you need?
5
continuity for multivariate functions is a little more complicated than saying “draw the graph without lifting the pencil”… consider the following function:
$$f(x, y) = \frac{xy}{x^2 + y^2}$$
a) note that this is not well-defined at the origin…
b) take the limit of $f(x,0)$ as $x \to 0$. are you happy?
c) now take the limit of $f(t,t)$ as $t \to 0$. are you still happy?
d) relax! you will probably never see a function like this in “the wild”. maybe.
6
a famous model in economics is the cobb-douglas model for production. it says: P = CLαM 1-α where P, L, & M are the amounts of goods produced, labor used. & materials used, respectively; and C and 0m)
the derivative gives a system of equations to solve for the tangent space
x = f (0)
where
x=
x1 …
n
given a level set in n -1
xn
the tangent space at x0 is given by
[Df] ( x-x0) = 0 evaluated at x0
let’s say you have a “nice” parametrized “manifold” defined by
f: _
(kRn such that (1)Each φα maps Uα onto its image continuously with a continuous inverse φα-1. (2)If Uα intersects Uβ, then the change of coordinates map φβ.φα-1:Rn->Rn is differentiable on its domain φα(Uα ∩ Uβ). This is not an easy definition. Just think “locally Euclidean”.
You should not worry about this too much. One never really constructs all the covers and coordinate maps. Instead, one relies on the Big Tools: the Inverse and Implicit Function Theorems, which hold for manifolds. These can be used to generate n-manifolds as solutions to implicit equations. Manifold theory is used all the time in physics, robotics, dynamical systems, control theory, and (!) Mathematics.
I have read the above completely and agree to abide by these terms
for more than just geometry…
Finding tangent spaces is really a linearization of a function
The linearization varies from point-to-point…
…and provides an approximation that is locally valid
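it’s simplest to use differential notation
for $f(x) = f(x_1, x_2, x_3, \ldots, x_n)$,
$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i$$
this is a linear combination of differentials
if we “pretend” that each differential represents a small change, then…
$$\frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$
this is a linear transformation acting on vectors of rates of change
$$df = \frac{\partial f}{\partial x}\, dx$$
where “dx” is a vector of differentials $dx_i$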
are really helpful when estimating errors
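numerical approximations
let’s estimate $\dfrac{15}{4.9\,\pi\cos(3)}$ by hand
consider the function
$$f(x,y,z) = \frac{15}{xyz}$$
you can check that…
$$df = -\frac{15}{xyz}\left( \frac{dx}{x} + \frac{dy}{y} + \frac{dz}{z} \right)$$
let’s estimate the terms…
$x = 5$, $dx = -0.1$
$y = 3$, $dy \approx 0.14$
the last will take a bit more work…
$z = \cos\pi = -1$, $dz \approx 0.01$, because $\cos 3 = \cos(\pi - 0.14\ldots) \approx -1 + \tfrac{1}{2}(0.14)^2 \approx -0.99$
now we can compute…
$$\frac{15}{4.9\,\pi\cos(3)} \approx \frac{15}{5 \cdot 3 \cdot (-1)} + \frac{-15}{5 \cdot 3 \cdot (-1)}\left( \frac{-0.1}{5} + \frac{0.14}{3} + \frac{0.01}{-1} \right) = -1 + ( -.020 + .047 - .010 ) = -0.983$$
the true value is $-0.98427\ldots$
that’s not bad for by hand…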
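The hand estimate can be checked numerically. This sketch assumes the quantity being estimated is 15/(4.9·π·cos 3), approximated from the base point (x, y, z) = (5, 3, −1) of f = 15/(xyz); the variable names are illustrative.

```python
import math

def f(x, y, z):
    return 15.0 / (x * y * z)

# base point and the small changes used in the hand computation
x0, y0, z0 = 5.0, 3.0, -1.0
dx, dy, dz = -0.1, 0.14, 0.01

f0 = f(x0, y0, z0)
# df = -(15/xyz) * (dx/x + dy/y + dz/z), evaluated at the base point
df = -f0 * (dx / x0 + dy / y0 + dz / z0)
estimate = f0 + df

exact = 15.0 / (4.9 * math.pi * math.cos(3.0))
print(estimate, exact, abs(estimate - exact))
```

The linear estimate and the exact value should agree to roughly two or three decimal places, since the neglected terms are quadratic in the small changes.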
tolerances and error
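beam length: $a = 24 \pm 1/8$ inches
rafter length: $b = 30 \pm 1/4$ inches
angle: $\theta = 57^\circ \pm 1^\circ$
estimate the strut length $L$
$$L^2 = a^2 + b^2 - 2ab\cos\theta \quad\Rightarrow\quad L \approx 26.30 \text{ in}$$
errors: $|da| < 1/8$ ; $|db| < 1/4$ ; $|d\theta| < 1^\circ$ ($= \pi/180$ rad)
$$L\, dL = (a - b\cos\theta)\, da + (b - a\cos\theta)\, db + (ab\sin\theta)\, d\theta$$
$$dL = \frac{(a - b\cos\theta)\, da + (b - a\cos\theta)\, db + (ab\sin\theta)\, d\theta}{L}$$
maximize & compute: $|dL| < 0.60$ in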
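A short numerical sketch of the strut estimate follows. It assumes the law of cosines L² = a² + b² − 2ab·cos θ and converts the angle tolerance as 1° = π/180 radians; under that conversion the worst-case bound comes to roughly 0.60 in. Variable names are illustrative.

```python
import math

a, b = 24.0, 30.0              # beam and rafter lengths, in inches
theta = math.radians(57.0)     # angle between them
da, db = 1 / 8, 1 / 4          # length tolerances, in inches
dtheta = math.radians(1.0)     # angle tolerance: 1 degree = pi/180 rad

L = math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))

# worst case: every term of L dL pushes in the same direction
bound = (
    abs(a - b * math.cos(theta)) * da
    + abs(b - a * math.cos(theta)) * db
    + abs(a * b * math.sin(theta)) * dtheta
) / L
print(round(L, 2), round(bound, 2))
```

The angle term dominates the bound, which is why the unit conversion for dθ matters so much here.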
Are especially important in applications
if you have/need information about percentage errors, you will want to use relative rates
relative rate of change of $u$: $d(\ln u) = \dfrac{du}{u}$ represents percent change
“if each input is known with a 1% accuracy, with what accuracy are the outputs known?”
Beam deflection
The elastic deflection at the midpoint of a beam, loaded at its center, & supported by two simple supports is
3
FL u = 48EI
The cross-sectional moment of inertia of a rectangular beam is
cross-section
3
wh I = 12
h w
if each variable is subject to a 2% error, what is the net impact?
$$u = \frac{FL^3}{48EI} = \frac{FL^3}{4Ewh^3}$$
$$du = \frac{L^3}{4Ewh^3}\, dF + \frac{3FL^2}{4Ewh^3}\, dL - \frac{FL^3}{4Ew^2h^3}\, dw - \frac{3FL^3}{4Ewh^4}\, dh$$
The relative rate of error in deflection:
$$\frac{du}{u} = \frac{dF}{F} + 3\,\frac{dL}{L} - \frac{dw}{w} - 3\,\frac{dh}{h}$$
assume each term is +/- 2% at worst
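$u(x) = u(x_1, x_2, x_3, \ldots, x_n)$: it’s of the form
$$u = K \prod_{i=1}^{n} x_i^{c_i}$$
$$\frac{du}{u} = d(\ln u) = d\left( \ln K + \sum_{i=1}^{n} \ln\left(x_i^{c_i}\right) \right) = \sum_{i=1}^{n} c_i\, d(\ln x_i) = \sum_{i=1}^{n} c_i\, \frac{dx_i}{x_i}$$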
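For a product of powers, the relative error is a weighted sum of the input relative errors, with the exponents as weights. The sketch below takes the worst case (all terms reinforcing) and applies it to the beam deflection u = FL³/(4Ewh³) with E held fixed; the function name and the dictionary of exponents are illustrative assumptions.

```python
def worst_case_relative_error(exponents, relative_errors):
    """Bound |du/u| for u = K * prod(x_i ** c_i) by summing |c_i| * |dx_i / x_i|."""
    return sum(abs(c) * abs(r) for c, r in zip(exponents, relative_errors))

# beam deflection: exponents of F, L, w, h in u = F L^3 / (4 E w h^3)
beam_exponents = {"F": 1, "L": 3, "w": -1, "h": -3}
bound = worst_case_relative_error(beam_exponents.values(), [0.02] * 4)
print(f"{bound:.0%}")
```

With 2% on each input, the length and height each contribute three times as much as the force or width, because they enter with exponent 3.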
is great… but it’s just the first step
in some regions, linear approximation is very accurate... ...but not everywhere!
of course, it’s… did I really need to ask?
you might want to review that before proceeding…
the big picture
the derivative is key to both linearization & approximation of a nonlinear function
1
practice your implicit differentiation on the following functions, writing your answer in terms of differentials A) w = (x 2y2z ) B) a 2b - ab 2 = ab c) u = e vw - e wv
2
use differentials to give a numerical approximation for $1 / ( 0.99 + 0.49^2 )$. try to do it without using a calculator! Hint: use $f(x,y) = 1 / ( x + y^2 )$.
3
if you know that e3 = 20.0855… and that (e/π)-( 3/2) ≅ -0.00077.., then, using differentials, estimate e5/ π2 ≅ 15.0374… how close is your estimate? now let’s say you also know that π3 ≅ 31.0063… then, estimate π/e (without knowing the square root of 3) by using the expression π/e = π3(e/π)2/e3. yes, this is kind of ridiculous.
4
to which variable (height or diameter) is the volume of a cylinder more sensitive? does this impact aspect ratios of canned goods? think of volumes and dimensions of cans of soda/pop, energy drinks, vegetables, juice, etc.
6
consider the area A of a parallelogram in the plane determined by vectors a i + b j and c i + d j. if each of the constants (a, b, c, d) can vary by up to 10%, what percentage variation can arise for A? your answer will depend on (a, b, c, d). are there values of (a, b, c, d) which lead to very large percentage errors in A?
7
assume that you are told the surface area of a sphere with a possible 10% error in measurement. with what confidence do you know its volume? radius? diameter? How does your answer change if the object is a cube?
8
[challenge] consider a solid object, such as a sphere, cone, or cube, in 3-d. if you have a measurement of the length scale (radius, side length, height, etc.) with 1% error, which is more accurate: measurement of volume? or surface area? does it matter what the exact shape is?
9
what is the relative rate of error of an exponential function $f(u) = Ce^u$ ? how does this compare to that of a logarithmic function $f(u) = C \ln u$ ?
you learned taYlor series?
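for $f : \mathbb{R} \to \mathbb{R}$ the taylor series is
about $x = 0$:
$$f(x) = \sum_{i=0}^{\infty} \frac{1}{i!} \left.\frac{d^i f}{dx^i}\right|_0 x^i = f(0) + f'(0)\,x + \frac{1}{2} f''(0)\,x^2 + \ldots$$
if one thinks of differentiation as an operator $D$:
$$f(x) = \sum_{i=0}^{\infty} \frac{1}{i!} \left. D^i f \right|_0 x^i$$
about $x = a$:
$$f(x) = \sum_{i=0}^{\infty} \frac{1}{i!} \left.\frac{d^i f}{dx^i}\right|_a (x-a)^i = \sum_{i=0}^{\infty} \frac{1}{i!} \left. D^i f \right|_a (x-a)^i$$
local variable: $h = x - a$
$$f(a+h) = \sum_{i=0}^{\infty} \frac{1}{i!} \left.\frac{d^i f}{dx^i}\right|_a h^i = \sum_{i=0}^{\infty} \frac{1}{i!} \left. D^i f \right|_a h^i$$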
this converges within a radius of convergence
as you recall… truncations to taylor polynomials provide approximations of increasing fidelity with degree
is simplest in the single-output case
in like manner…
a taylor polynomial is the “best fit” polynomial
in a neighborhood of the expansion point
about $x = 0$:
$$f(x) = \sum_{I} \frac{1}{I!} \left. D^I f \right|_0 x^I \qquad \text{(sum over “multi-indices” } I\text{)}$$
it looks intimidating, but that is to be expected… remember how the single variable case seemed hard at first? we will learn what all these terms mean…
is intimidating-looking, but really useful
attention must be paid to:
multivariate sums work best with a multi-index
the
given n variables…
x = ( x1 , x2 , … , xn )
a multi-index is n ordered indices…
I = ( i1 , i2 , … , in )
of a multi-index I is defined as | I | := i1 + i2 + … + in
for $x = ( x_1 , x_2 , \ldots , x_n )$ and $I = ( i_1 , i_2 , \ldots , i_n )$:
$$x^I = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$$
the degree of the monomial term equals the degree of the multi-index
multi-index monomials
with $x = ( x, y, z )$:
$x^{(1,2,3)} = x^1y^2z^3 = xy^2z^3$
$x^{(0,1,0)} = x^0y^1z^0 = y$
$x^{(1,0,1)} = x^1y^0z^1 = xz$
$x^{(0,0,0)} = x^0y^0z^0 = 1$
with $x = ( a, b, c, d )$:
$x^{(1,2,3,1)} = ab^2c^3d$
$x^{(0,1,1,1)} = bcd$
$x^{(1,0,0,0)} = a$
$x^{(0,0,0,0)} = 1$
“linear” terms have degree 1 ; “quadratic” terms degree 2 ; etc.
for $I = ( i_1, i_2 , \ldots , i_n )$: $I! = i_1!\, i_2! \cdots i_n!$
$(1, 2, 3)! = 1!\, 2!\, 3! = 12$
$(2, 2, 2)! = 2!\, 2!\, 2! = 8$
$(0, 0, 0)! = 0!\, 0!\, 0! = 1$
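The three multi-index operations (degree, monomial, factorial) are each one line of code. The sketch below uses plain tuples for multi-indices; the function names are illustrative assumptions, and `math.prod` needs Python 3.8 or later.

```python
from math import factorial, prod

def degree(I):
    """|I| = i1 + i2 + ... + in"""
    return sum(I)

def multi_factorial(I):
    """I! = i1! * i2! * ... * in!"""
    return prod(factorial(i) for i in I)

def monomial(x, I):
    """x^I = x1^i1 * x2^i2 * ... * xn^in, evaluated at numeric x."""
    return prod(xi ** i for xi, i in zip(x, I))

print(degree((1, 2, 3)), multi_factorial((1, 2, 3)))
print(monomial((2, 3, 5), (1, 2, 3)))  # 2 * 3**2 * 5**3
```

An empty product is 1, which matches the conventions $(0,0,0)! = 1$ and $x^{(0,0,0)} = 1$.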
INTERACT NICELY WITH THIS NOTATION
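for $f = f(x)$ with $x = ( x_1 , x_2 , \ldots , x_n )$ and $I = ( i_1 , i_2 , \ldots , i_n )$, take partials according to the multi-index
$$D^I f = \frac{\partial^{i_1}}{\partial x_1^{i_1}}\, \frac{\partial^{i_2}}{\partial x_2^{i_2}} \cdots \frac{\partial^{i_n}}{\partial x_n^{i_n}}\, f$$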
higher derivatives
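$f(x) = x^3 y z^2$ with $x = ( x, y, z )$
$$D^{(1,2,3)} f = \frac{\partial}{\partial x} \frac{\partial^2}{\partial y^2} \frac{\partial^3}{\partial z^3}\, f = 0$$
$$D^{(1,0,1)} f = \frac{\partial}{\partial x} \frac{\partial}{\partial z}\, f = 6x^2yz$$
$$D^{(0,0,0)} f = f = x^3 y z^2$$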
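For polynomials, multi-index derivatives can be computed mechanically. The sketch below stores a polynomial as a dictionary from exponent tuples to coefficients (a representation chosen here only for illustration) and reproduces the three derivatives of x³yz².

```python
def partial(poly, var):
    """Differentiate once with respect to the variable at position `var`."""
    out = {}
    for exps, coeff in poly.items():
        k = exps[var]
        if k == 0:
            continue  # this term does not depend on the variable
        new_exps = exps[:var] + (k - 1,) + exps[var + 1:]
        out[new_exps] = out.get(new_exps, 0) + coeff * k
    return out

def multi_derivative(poly, I):
    """Apply D^I: differentiate i_k times in the k-th variable."""
    for var, order in enumerate(I):
        for _ in range(order):
            poly = partial(poly, var)
    return poly

f = {(3, 1, 2): 1}                     # x^3 y z^2
print(multi_derivative(f, (1, 0, 1)))  # the single term 6 x^2 y z
print(multi_derivative(f, (1, 2, 3)))  # empty dict: the zero polynomial
print(multi_derivative(f, (0, 0, 0)))  # f itself
```

An empty dictionary stands for the zero polynomial, which is what happens as soon as a derivative order exceeds the corresponding exponent.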
it’s not hard to compute multiple partial derivatives, but…
if f has continuous second partial derivatives, then…
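$$\frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j}\, f = \frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i}\, f \qquad \text{for all } i \text{ and } j$$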
And now After all that hard work… The notation makes more sense… so
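$$f(x) = \sum_{I} \frac{1}{I!} \left. D^I f \right|_0 x^I$$
sum over multi-indices $I = ( i_1 , i_2 , \ldots , i_n )$, with all derivatives evaluated at zero
about $x = a$, letting $h = x - a$:
$$f(a+h) = \sum_{I} \frac{1}{I!} \left. D^I f \right|_a h^I$$
with all derivatives evaluated at $a$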
so much work to compute !
not all series converge! in general, you have to compute the radius of convergence and stay within that distance to the expansion point. that’s not always easy to do – we will not explore this here… we know what the first derivative of a function is, but what are the higher derivatives? are they like matrices? the answer is not simple, as you can see from the complexity of the taylor formula.
the big picture
multi-index notation gives a general formula for taylor series that closely resembles the usual formula we know & love
1
evaluate the following multi-index factorials: A) ( 1, 2 ) ! b) ( 0, 1, 0 ) ! c) ( 3, 0, 1, 2 ) ! d) ( 4, 0 ) ! e) ( 1, 1, 1, 1, 1 ) ! f) ( 2, 2, 2, 2, 1, 0 ) !
2
evaluate the following multi-index monomials xI, where: A) x = ( x, y ) ; I = ( 1, 2 ) b) x = ( x, y, z ) ; I = ( 3, 0, 2 ) c) x = ( x1, x2 , x3, x4 ) ; I = ( 1, 4, 0, 2 ) d) x = ( a, b, c ) ; I = ( 0, 2, 1 ) state the degree of each monomial in each case
3
compute the following multi-index derivatives DIf , where: A) x = ( x, y ) ; I = ( 2, 1 ) ; f(x) = exy b) x = ( x1, x2 , x3, x4 ) ; I = ( 1, 2, 3, 4 ) ; f(x) = ( x1 + x2 - x3 + x4 )3 c) x = ( u, v, w ) ; I = ( 2, 1, 1 ) ; f(x) = u3v2w – (u + v + w)2 d) x = ( x, y, z ) ; I = ( 0, 2, 1 ) ; f(x) = ( xyz )
4
compute the taylor series of f(x, y) = 3 – x + 2y + 5xy – y2 about the point x=1 and y=2. do this step-by-step, computing all derivatives. note: your answer should be a polynomial in the variables (x-1) and (y-2). when finished, be sure to check your work by multiplying everything out and confirming that you get the original polynomial back.
5
fix n the number of variables. there are, clearly, n multi-indices of degree equal to 1. how many multi-indices are there of degree 2? of degree 3? can you guess at (or prove) how many multi-indices have degree K for K>0 ?
6
[challenge] prove that mixed partial derivatives commute for multivariate polynomial functions. if you’re stuck, start with monomials in two variables…
7
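for $f : \mathbb{R}^n \to \mathbb{R}$ the second derivative $[D^2f]$ is defined to be the square matrix whose entries are the second partial derivatives: $[D^2f]_{i,j} = \partial^2 f / \partial x_i \partial x_j = \frac{\partial}{\partial x_i}\left( \partial f / \partial x_j \right)$. what does the fact that mixed partials commute tell you about $[D^2 f]$?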
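in the case of a planar function $f : \mathbb{R}^2 \to \mathbb{R}$
for a planar function about origin…
$$f(x,y) = f(0,0) + \left.\frac{\partial f}{\partial x}\right|_0 x + \left.\frac{\partial f}{\partial y}\right|_0 y$$
$$+\ \frac{1}{2}\left.\frac{\partial^2 f}{\partial x^2}\right|_0 x^2 + \left.\frac{\partial^2 f}{\partial x \partial y}\right|_0 xy + \frac{1}{2}\left.\frac{\partial^2 f}{\partial y^2}\right|_0 y^2$$
$$+\ \frac{1}{6}\left.\frac{\partial^3 f}{\partial x^3}\right|_0 x^3 + \frac{1}{2}\left.\frac{\partial^3 f}{\partial x^2 \partial y}\right|_0 x^2 y + \frac{1}{2}\left.\frac{\partial^3 f}{\partial x \partial y^2}\right|_0 x y^2 + \frac{1}{6}\left.\frac{\partial^3 f}{\partial y^3}\right|_0 y^3 + \ldots$$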
that is still a lot of derivatives
doing things the hard way…
taylor expand $e^{x^2+y}$ about $x = y = 0$ up to terms of order three

I        I!   D^I f                           at (0,0)   x^I
(0,0)    1    e^{x^2+y}                       1          1
(1,0)    1    2x e^{x^2+y}                    0          x
(0,1)    1    e^{x^2+y}                       1          y
(2,0)    2    4x^2 e^{x^2+y} + 2 e^{x^2+y}    2          x^2
(1,1)    1    2x e^{x^2+y}                    0          xy
(0,2)    2    e^{x^2+y}                       1          y^2
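idea: use the taylor series for $e^{©}$
$$e^{©} = 1 + © + \tfrac{1}{2} ©^2 + \tfrac{1}{6} ©^3 + O(©^4)$$
$$e^{x^2+y} = 1 + (x^2+y) + \tfrac{1}{2} (x^2+y)^2 + \tfrac{1}{6} (x^2+y)^3 + O( |x|^4 )$$
$$= 1 + (x^2+y) + \left( \tfrac{1}{2} x^4 + x^2 y + \tfrac{1}{2} y^2 \right) + \left( \tfrac{1}{6} x^6 + \tfrac{1}{2} x^4 y + \tfrac{1}{2} x^2 y^2 + \tfrac{1}{6} y^3 \right) + O( |x|^4 )$$
$$= 1 + y + x^2 + \tfrac{1}{2} y^2 + x^2 y + \tfrac{1}{6} y^3 + O( |x|^4 )$$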
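A quick numerical check of a truncated expansion: if the cubic polynomial 1 + y + x² + y²/2 + x²y + y³/6 really captures e^(x²+y) through order three, the error at the point (s, s) should shrink like s⁴ as s gets small. The sample points are arbitrary choices for illustration.

```python
import math

def f(x, y):
    return math.exp(x * x + y)

def taylor3(x, y):
    # all terms of total degree <= 3 in the expansion about the origin
    return 1 + y + x * x + y * y / 2 + x * x * y + y ** 3 / 6

errors = {}
for s in (0.1, 0.05, 0.025):
    errors[s] = abs(f(s, s) - taylor3(s, s))
    # the ratio error / s^4 should settle toward a constant
    print(s, errors[s], errors[s] / s ** 4)
```

Halving s should cut the error by about a factor of sixteen, which is the signature of an O(|x|⁴) remainder.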
is, as ever, so very helpful
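using the chain rule
taylor expand $\sqrt{1+y+xz}$ about $x = y = z = 0$ up to terms of order three
$$(1+©)^{1/2} = 1 + \tfrac{1}{2} © - \tfrac{1}{8} ©^2 + \tfrac{1}{16} ©^3 + O(©^4)$$
$$\sqrt{1 + y + xz} = 1 + \tfrac{1}{2} (y+xz) - \tfrac{1}{8} (y+xz)^2 + \tfrac{1}{16} (y+xz)^3 + O( |x|^4 )$$
$$= 1 + \tfrac{1}{2} (y+xz) - \tfrac{1}{8} (y^2 + 2xyz) + \tfrac{1}{16} (y^3) + O( |x|^4 )$$
$$= 1 + \tfrac{1}{2} y + \tfrac{1}{2} xz - \tfrac{1}{8} y^2 - \tfrac{1}{4} xyz + \tfrac{1}{16} y^3 + O( |x|^4 )$$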
Why bother with taylor series?
remember, when given a “strange” function, and a particular local region of inputs, taylor expansion reveals its behavior…
local solutions to equations
y
approximate solutions to $\sin( xy ) = e^{x^3} - \cos( x^2y )$ near $( 0, 0 )$
$$xy - O((xy)^3) = \left( 1 + x^3 + O(x^6) \right) - \left( 1 - O((x^2 y)^2) \right)$$
$$xy - x^3 + O( |x|^6 ) = 0$$
ignore higher order terms
$$xy - x^3 = x\, ( y - x^2 ) = 0$$
complicated? yes. Useful? oh yes.
let’s rewrite the second-order expansion to see a patTERN…
$$f(a+h) = f(a) + \frac{1}{1!} [ Df ]_a\, h + \frac{1}{2!}\, h^T [ D^2 f ]_a\, h + O( | h |^3 )$$
we can form the 2nd partial derivatives into a matrix, making this term a quadratic form
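the hessian (or 2nd derivative)
$$[D^2 f]_{i,j} = \frac{\partial^2 f}{\partial x_i \partial x_j} \qquad\qquad [D^2 f] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \dfrac{\partial^2 f}{\partial x_j^2} & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
this is a symmetric matrix: $[D^2 f] = [D^2 f]^T$
$$f(a+h) = f(a) + \frac{1}{1!} [ Df ]_a\, h + \frac{1}{2!}\, h^T [ D^2 f ]_a\, h + O( | h |^3 )$$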
but it’s not a 2-d array… such multidimensional arrays are known as tensors, and they have a rather complex algebraic structure
the big picture
as with single-variable taylor series, multivariate taylor series are often easily computed via the chain rule
1
make sure you remember your taylor series! write out the standard, singlevariable taylor series about zero for the following functions: a) © b) © c) (1+©) d) 1 1-© α e) h © f) h © g) (1+©) be sure to include any bounds on domain of convergence…
2
compute the taylor series of the following functions about the origin, using standard taylor series and composition. a) ( x2 – 2y ) all terms up to & including order 4 b) h ( xz + y3 )
all terms up to & including order 9
c) ( 1 + x ( y – xey ) )
all terms up to & including order 5
d) ( 1 + ( 1 + x ) / ( 1 – y ) ) 1/2
all terms up to & including order 3
you learned how to solve max/min problems ?
in the case of a multi-input function
n
f: _
or two, to find & classify extrema of functions…
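for $f : \mathbb{R}^n \to \mathbb{R}$, a critical point is an input whose derivative is zero…
$$[ Df ] = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} 0 & 0 & \cdots & 0 \end{bmatrix}$$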
of course, this is, equivalently, a point where the gradient is zero
a real-valued function $f : \mathbb{R}^n \to \mathbb{R}$ attains a local maximum or minimum at $a$ only if the input $a$ is a critical point
and just like single-variable calculus
finding the critical points find and classify the extrema of
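$f ( x , y ) = x^3 + y^3 - 3xy$
$$\frac{\partial f}{\partial x} = 3x^2 - 3y = 0 \qquad \frac{\partial f}{\partial y} = 3y^2 - 3x = 0$$
critical points: $( 0, 0 )$ and $( 1 , 1 )$
$f(0,0) = 0$ and $f( 1, 1 ) = -1$
$f \to +\infty$ as $x, y \to +\infty$ and $f \to -\infty$ as $x, y \to -\infty$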
is what we need
$$f ( a + h ) = f ( a ) + [ Df ]_a\, h + \frac{1}{2!}\, h^T [ D^2 f ]_a\, h + O( | h |^3 )$$
at a critical point $[ Df ]_a = 0$, so THE SECOND DERIVATIVE DOMINATES LOCAL BEHAVIOR
$h^T [ D^2 f ]_a\, h > 0$ for all $h \neq 0$ IMPLIES THAT THE CRITICAL POINT IS A LOCAL MINIMUM
$h^T [ D^2 f ]_a\, h < 0$ for all $h \neq 0$ IMPLIES THAT THE CRITICAL POINT IS A LOCAL MAXIMUM
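$$[D^2 f] = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \det = ad - bc \qquad \mathrm{tr} = a + d$$
it’s easy (and sufficient) to consider the case of a diagonal matrix
in this case, the signs of the diagonal terms classify the critical point
$\begin{bmatrix} + & 0 \\ 0 & - \end{bmatrix}$ : det < 0
$\begin{bmatrix} + & 0 \\ 0 & + \end{bmatrix}$ : det > 0, tr > 0
$\begin{bmatrix} - & 0 \\ 0 & - \end{bmatrix}$ : det > 0, tr < 0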
ASSUME $f : \mathbb{R}^2 \to \mathbb{R}$
ASSUME $a$ is a critical point
> if DET$[D^2f]_a < 0$ then $a$ = SADDLE;
> if DET$[D^2f]_a > 0$ then
>     TR$[D^2f]_a > 0$ => $a$ = LOCAL MIN;
>     TR$[D^2f]_a < 0$ => $a$ = LOCAL MAX;
> else $a$ = DEGENERATE : FAIL
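classify $f ( x , y ) = x^3 + y^3 - 3xy$
$$[D^2 f] = \begin{bmatrix} 6x & -3 \\ -3 & 6y \end{bmatrix}$$
at $(1,1)$: $[D^2 f] = \begin{bmatrix} 6 & -3 \\ -3 & 6 \end{bmatrix}$ with $\det [ D^2 f ] = 27 > 0$ and $\mathrm{tr}\, [ D^2 f ] = 12 > 0$, so this is a local min
at $(0,0)$: $[D^2 f] = \begin{bmatrix} 0 & -3 \\ -3 & 0 \end{bmatrix}$ with $\det [ D^2 f ] = -9 < 0$, so this is a saddle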
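The trace/determinant test for planar critical points can be sketched as a small function. It assumes the input is a symmetric 2-by-2 Hessian already evaluated at a critical point; the function names are illustrative.

```python
def classify_critical_point(hessian):
    """Classify a 2-d critical point from its (symmetric) Hessian."""
    (a, b), (c, d) = hessian
    det = a * d - b * c
    tr = a + d
    if det < 0:
        return "saddle"
    if det > 0:
        # for a symmetric matrix, det > 0 forces tr != 0
        return "local min" if tr > 0 else "local max"
    return "degenerate"  # det == 0: the test fails

def hessian_of_example(x, y):
    """Hessian of f(x, y) = x^3 + y^3 - 3xy."""
    return ((6 * x, -3), (-3, 6 * y))

for point in [(0, 0), (1, 1)]:
    print(point, classify_critical_point(hessian_of_example(*point)))
```

The degenerate branch is a reminder that the second derivative alone cannot always decide; higher-order terms are then needed.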
this function vanishes to 2nd order at the origin
this is sometimes called a "monkey saddle"
a classification of n-dimensional critical points
see the epilogue for a few hints…
before dealing with these complications
the big picture
the second derivative classifies critical points into local maxima, local minima, & saddles, (or degenerates)
1
Compute and classify the critical points of the following functions: A) 2x2 + 2y -4y2 + 4x
b) x(x+1)(y-2) – y(x+3)(y-4)
c) x3 + 3xy +2y2 + 3x - 4y
d) x3 - 4xy +2y2
e) ( x2 + 4y2 + 1 )
f) h ( (x-1)2 -(y-2)2 ) h) (x1-1)2 + (x2-2)2 + (x3-3)2 + … + (xn-n)2
g)
2 2 –x –(y+2) e
2
For which values of C will the function f(x,y) = Cx2+4xy+Cy2 have a (local) maximum at (0,0)? What about minimum? Saddle?
3
Compute the second derivative (or “hessian”) of the following functions at 0 : 3x2 + 4xy -4y2 + 4x -3y + 2
A) c) (x-3y) + 2x (y-2x) 2 | x | = x*x e)
2
2
e2x +3y
b) d) x2 + 2xz - y2 + 3xy -2yz + 4z2 f) xTAx for A a square matrix
The treatment of critical point classification given in this chapter only works for 2-d and uses trace and determinant. Both this and the general n-dimensional case are greatly simplified with the use of eigenvalues. From the Taylor expansion of $f: \mathbb{R}^n \to \mathbb{R}$ about a critical point $a$, one sees that the second derivative (or Hessian) is the dominant term: $f(a+h) = f(a) + \tfrac{1}{2}\, h^T [D^2f]\, h + \ldots$ Denote by $\{\lambda_i\}$, $i = 1 \ldots n$, the eigenvalues of $[D^2f]$.
LEMMA: All eigenvalues $\lambda_i$ of $[D^2f]$ are real. (This follows from $[D^2f]$ being a symmetric matrix, due to mixed partial derivatives commuting. From there, well, you should take a linear algebra course. Really!)
CONCLUSION: The signs of the eigenvalues $\{\lambda_i\}$ completely determine the critical point type:
1) ALL $\lambda_i > 0$ ==> MINIMUM (ie, $[D^2f]$ positive definite)
2) ALL $\lambda_i < 0$ ==> MAXIMUM (ie, $[D^2f]$ negative definite)
3) $\lambda_i > 0 > \lambda_j$ ==> SADDLE (some go up, all others down)
4) if any $\lambda_i = 0$ ==> DEGENERATE (need more info to determine)
COROLLARY: The quadratic form in the Taylor series about a critical point $a$ converts after a linear change of coordinates to: $f(a+h) = f(a) + \tfrac{1}{2} \sum_i \lambda_i h_i^2 + O(|h|^3)$
This is the best way to classify extrema.
I have read the above completely and agree to abide by these terms
given a collection of data points in the plane, what is the “best fit line” through them?
y
x what if we don’t want to “take a good guess”?
that’s what you do, right?
y
data presented as n pairs (xi, yi) find a “best fit” line y = mx + b
x
minimize the square sum of vertical differences between points and the line
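$$f ( m , b ) = \sum_i \big( y_i - (m x_i + b) \big)^2$$
here $m$ is the slope, $b$ is the y-intercept, $y_i$ is the y-value of the data point, and $m x_i + b$ is the y-coordinate of the line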
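$$f(m,b) = \sum_i \big( y_i - (m x_i + b) \big)^2$$
take the partial with respect to $m$:
$$\frac{\partial f}{\partial m} = \sum_i \frac{\partial}{\partial m} \big( y_i - (m x_i + b) \big)^2 \qquad \text{(by linearity)}$$
$$= \sum_i 2 \big( y_i - (m x_i + b) \big)\, \frac{\partial}{\partial m} \big( y_i - (m x_i + b) \big) \qquad \text{(chain rule)}$$
$$= \sum_i 2 \big( y_i - (m x_i + b) \big) (-x_i) = \sum_i -2 x_i \big( y_i - (m x_i + b) \big)$$
there, that’s not so bad!
likewise for $b$:
$$\frac{\partial f}{\partial b} = \sum_i 2 \big( y_i - (m x_i + b) \big)\, \frac{\partial}{\partial b} \big( y_i - (m x_i + b) \big) = \sum_i -2 \big( y_i - (m x_i + b) \big)$$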
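$$[ Df ] = \begin{bmatrix} \dfrac{\partial f}{\partial m} & \dfrac{\partial f}{\partial b} \end{bmatrix} = \begin{bmatrix} \sum_i -2 x_i ( y_i - m x_i - b ) & \sum_i -2 ( y_i - m x_i - b ) \end{bmatrix}$$
set both entries to zero:
$$\sum_i -2 x_i ( y_i - m x_i - b ) = 0 \qquad \sum_i -2 ( y_i - m x_i - b ) = 0$$
expand
$$\sum_i x_i y_i - \sum_i m x_i^2 - \sum_i b x_i = 0 \qquad \sum_i y_i - \sum_i m x_i - \sum_i b = 0$$
pull out the variables
$$\sum_i x_i y_i - m \sum_i x_i^2 - b \sum_i x_i = 0 \qquad \sum_i y_i - m \sum_i x_i - b \sum_i 1 = 0$$
rearrange
$$m \sum_i x_i^2 + b \sum_i x_i = \sum_i x_i y_i \qquad m \sum_i x_i + b \sum_i 1 = \sum_i y_i$$
a linear system!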
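$$A\mathbf{x} = \mathbf{b}: \qquad \begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{bmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix} \qquad \left(\text{since } \textstyle\sum 1 = n\right)$$
is $A$ invertible? well…
$$\det A = n \sum x_i^2 - \left( \sum x_i \right)^2$$
which is > 0 so long as the $x_i$ are not all the same
thanks to the standard 2-by-2 formula
$$A^{-1} = \frac{1}{n \sum x_i^2 - \left( \sum x_i \right)^2} \begin{bmatrix} n & -\sum x_i \\ -\sum x_i & \sum x_i^2 \end{bmatrix}$$
$$\begin{pmatrix} m \\ b \end{pmatrix} = A^{-1}\mathbf{b} = \frac{1}{n \sum x_i^2 - \left( \sum x_i \right)^2} \begin{bmatrix} n & -\sum x_i \\ -\sum x_i & \sum x_i^2 \end{bmatrix} \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix}$$
$$\text{slope: } m = \frac{n \sum x_i y_i - \left( \sum x_i \right) \left( \sum y_i \right)}{n \sum x_i^2 - \left( \sum x_i \right)^2} \qquad \text{y-intercept: } b = \frac{-\left( \sum x_i \right) \left( \sum x_i y_i \right) + \left( \sum x_i^2 \right) \left( \sum y_i \right)}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$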
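how do we know this solution minimizes distance to the line?
we know…
$$[ Df ] = \begin{bmatrix} \sum_i -2 x_i ( y_i - m x_i - b ) & \sum_i -2 ( y_i - m x_i - b ) \end{bmatrix}$$
compute the second derivatives…
$$[D^2 f] = \begin{bmatrix} 2 \sum_i x_i^2 & 2 \sum_i x_i \\ 2 \sum_i x_i & 2n \end{bmatrix} = 2A$$
twice the matrix we used earlier!
$\mathrm{tr}\, [D^2 f] > 0$
$$\det [D^2 f] = 4 \det A = 4 \left( n \sum_i x_i^2 - \Big( \sum_i x_i \Big)^2 \right) = 4 \sum_{i<j} ( x_i - x_j )^2 > 0$$
thus, this choice of $m$, $b$ minimizes the least-squares distance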
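The closed-form slope and intercept translate directly into code. This is a minimal sketch using only sums over the data; the function name and the sample points (taken from the line y = 2x + 1) are illustrative assumptions.

```python
def best_fit_line(points):
    """Least-squares slope m and intercept b for a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx  # det A; zero when all x values coincide
    if det == 0:
        raise ValueError("need at least two distinct x values")
    m = (n * sxy - sx * sy) / det
    b = (-sx * sxy + sxx * sy) / det
    return m, b

# points sampled exactly from y = 2x + 1, so the fit should recover that line
print(best_fit_line([(0, 1), (1, 3), (2, 5), (3, 7)]))
```

For larger or noisier data sets, a library routine is usually preferable numerically; the point here is only that the formula is a direct solution of a 2-by-2 linear system.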
there’s a lot more to linear regression than just planar line-fitting… with n dependent and m independent variables, one is looking for a best-fit m-dimensional “subspace” in $\mathbb{R}^{n+m}$. the “goodness” of the fit is crucial & not all relationships are linear!
LINEAR regression is AS INTERESTING AS IT IS USEFUL…
& IS A GREAT REASON TO LEARN MORE LINEAR ALGEBRA
IS THERE NON-LINEAR REGRESSION? OF COURSE!
ONE CAN USE A LOW-ORDER POLYNOMIAL APPROXIMATION & SOLVE FOR THE COEFFICIENTS OF THE TAYLOR SERIES
& BEYOND THIS LIES…
but that’s another story…
the big picture
regression formulae in statistics look intimidating, but are easily derived as solutions to optimization problems
1 2
what happens to the best fit line when you rescale the yi values in the data set by a (nonzero) constant C? [that is, the new yi equals Cyi] consider the following set of points in the plane: (xi , yi) = ((i), (i)) : i = 1…n A) use software to compute the values of m and b for the best fit line, for various values of n. be sure to use radians when computing (xi , yi). any patterns? b) now, if you wish, plot the points (xi , yi) for the same values of n. now what do you notice? does this shed light on your answers for m and b above?
3
prove that If the data set (xi , yi) consists of points that are sampled from a straight line, then the linear regression values (m, b) recover that line.
4
a challenge: given data (xi , yi , zi) try to find the best-fit plane z = ax + by + c by minimizing the sum of square-distances along the z axis. can you do it? what size matrices do you get? can you get explicit formulae for a, b, & c?
is another field in which optimization plays a part
this is a large subject, and we will merely sample it… specifically, we will look at
two person zero sum finite games
encode such games
each player has a finite set of strategies each round, they (privately) pick their strategies & play
P =
1 -7 -1 6 1
-3 5 -1 -1 0 3 5 -1 2 0 -3 -7 2
3
…
-4 6 -1 3
1 2 …
two players, A & B
m
n
player a “wins” and B “loses” the (EQUAL) payout value a and b play repeatedly
Pij = payoff from strategy choices ( wins 3 from )
some classical payofF matrices… rock, scissors, paper
P=
0 -1 1
1 0 -1
-1 1 0
this game is “fair” in that all plays are symmetric. you can “see” the fairness in the “skew-symmetry” $P^T = -P$
some classical payofF matrices… The even-odd game
P=
-2 3 3 -4
Two players: “ODD” AND “EVEN” EACH PUTS OUT 1 OR 2 FINGERS. IF THE SUM TOTAL IS ODD, ODD WINS THAT AMOUNT; ELSE EVEN WINS THE SUM
some classical payofF matrices… A MENDELSOHN GAME
P=
0 -1 2
1 -2 0 1 -1 0
EACH PLAYER CHOOSES 1, 2, OR 3 IF THE SAME NUMBER, NO PAYOFF IF YOU ARE HIGHER BY 1, YOU LOSE 1 POINT IF YOU ARE HIGHER BY 2, YOU WIN 2 POINTS
You played strategies at random ?
this is a very cool idea !
These must add up to 100%
.25 .4 .25 .1
.1
.2
.3
.1
.3
1 -3 5 -1 -7 -1 0 3 -1 5 -1 2 6 0 -3 -7
-4 6 -1 3
if player a plays strategy 2 for 40% of the games
& player b plays strategy 4 for 10% of the games
then this combination happens 4% of the time
P
with net expected payoff for this combination…
0.12
this is a very cool idea !
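player A chooses among m strategies at random using a probability vector $a = ( a_1, a_2, \ldots, a_m )^T$
player B chooses among n strategies at random using a probability vector $b = ( b_1, b_2, \ldots, b_n )^T$
the entries of each vector must add up to one
the average payoff is $a^T P\, b$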
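The average payoff is a double sum over all strategy pairs, each weighted by the probability of that pair. The sketch below uses plain lists; the function name is an illustrative assumption, and the sample matrix is the 3-by-3 skew-symmetric rock, scissors, paper payoff.

```python
def expected_payoff(a, P, b):
    """Average payoff a^T P b for mixed strategies a (rows) and b (columns)."""
    if abs(sum(a) - 1) > 1e-9 or abs(sum(b) - 1) > 1e-9:
        raise ValueError("each strategy vector must add up to one")
    return sum(a[i] * P[i][j] * b[j]
               for i in range(len(a))
               for j in range(len(b)))

rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
print(expected_payoff(uniform, rps, uniform))
```

Against uniformly random play, every row of a skew-symmetric game with zero row sums averages to zero, so neither player has an edge.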
random play leads to predictable average outcome
MIXED STRATEGY FOR THE EVEN-ODD GAME PAYOFF MATRIX
P=
-2 3 3 -4
As you play over & over, you can average the payoff per turn
Let’s say players choose 1 or 2 at random with some probability…
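probability distributions: $a = \begin{pmatrix} a \\ 1-a \end{pmatrix}$ and $b = \begin{pmatrix} b \\ 1-b \end{pmatrix}$ with $0 \le a \le 1$ and $0 \le b \le 1$
the average payout (from P2 to P1) is:
$$f (a,b) = a^T P\, b = -2ab + 3(1-a)b + 3a(1-b) - 4(1-a)(1-b) = -4 + 7a + 7b - 12ab$$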
Each player has a separate random strategy
These are vectors adding up to 1
for the average payout $f (a,b) = -4 + 7a + 7b - 12ab$:
$$\frac{\partial f}{\partial a} = 7 - 12b = 0 \ \Rightarrow\ b = \frac{7}{12} \qquad\qquad \frac{\partial f}{\partial b} = 7 - 12a = 0 \ \Rightarrow\ a = \frac{7}{12}$$
$$a = \begin{pmatrix} 7/12 \\ 5/12 \end{pmatrix} \qquad b = \begin{pmatrix} 7/12 \\ 5/12 \end{pmatrix}$$
But what kind of optimum is this?
MIXED STRATEGY FOR THE EVEN-ODD GAME
$$[D^2 f] = \begin{bmatrix} 0 & -12 \\ -12 & 0 \end{bmatrix}$$
the determinant is $-144 < 0$, so this critical point is a saddle
optimal average payout (from P2 to P1):
$$f ( 7/12 , 7/12 ) = -4 + 49/12 + 49/12 - 49/12 = 1/12$$
PLAYER 1 (“ODD”) HAS A SLIGHT ADVANTAGE & CAN GUARANTEE AN AVERAGE NET WIN
NO MATTER WHAT P2 DOES, P1 WILL WIN ON AVERAGE AT LEAST 1/12
NO MATTER WHAT P1 DOES, P2 CAN LOSE ON AVERAGE NO MORE THAN 1/12
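The guarantee can be checked with exact arithmetic: when one player uses probability 7/12, the average payout f(a, b) = −4 + 7a + 7b − 12ab no longer depends on what the opponent does. The sketch below uses Python's `fractions` module; the sampled opponent strategies are an arbitrary illustrative grid.

```python
from fractions import Fraction

def f(a, b):
    """Average payout (from P2 to P1) in the even-odd game."""
    return -4 + 7 * a + 7 * b - 12 * a * b

a_star = b_star = Fraction(7, 12)
value = f(a_star, b_star)
print(value)

# hold one player at 7/12 and let the other vary over a grid of strategies
grid = [Fraction(k, 10) for k in range(11)]
print({f(a_star, b) for b in grid}, {f(a, b_star) for a in grid})
```

Both sets contain the single value 1/12: at the saddle point, neither player can change the average by deviating alone.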
AT A NASH EQUILIBRIUM, neither PLAYER CAN DO better, GIVEN THE OPPONENT’S STRATEGY
here’s a general (and powerful) theorem about saddle-point equilibria…
minimax theorem: given any payoff matrix $P$, there exists a nash equilibrium $( a, b )$. that is, there is a mixed strategy pair such that
$$\max_x\ x^T ( P b ) = \min_y\ ( a^T P )\, y$$
the left side maximizes gain; the right side minimizes loss
this requires some deeper tools than calculus…
SOME 3-BY-3 nash equilibria With work, you can compute these…
0 -1 1
a=
1/ 3 1/ 3 1/ 3
1 0 -1
-1 1 0
b=
0 -1 2 1/ 3 1/ 3 1/ 3
a=
1/ 4 1/ 2 1/ 4
1 -2 0 1 -1 0
b=
0 1 -2 -2 0 4 3 -2 1 1/ 4 1/ 2 1/ 4
a=
9/ 23 7/ 23 7/ 23
b=
17/ 46 10/ 23 9/ 46
IN GENERAL, YOU HAVE TO BE CAREFUL (OR LUCKY)
SOME GAMES HAVE SADDLE POINTS WITH COORDINATES THAT VIOLATE THE CONSTRAINTS (E.G., NEGATIVE)
FOR NON-SQUARE MATRICES, YOU NEED A DIFFERENT APPROACH FOR FINDING THE NASH EQUILIBRIUM
1) what happens if the payoffs change as you play?
2) what happens if you have more than two players?
3) what happens if the set of strategies is not discrete?
the big picture
payoff matrices, acting on vectors of probability distributions, lead to optimal strategies, which give saddle points
1
what is the payoff matrix for the even-odd game in which players choose numbers in the set { 1 , 2 , 3 , 4 } ?
2
consider the following payoff matrices: 0 2 -1 1 3 -3 A) b) c) -2 0 2 -2 0 1 3 -2 1 -2 0 2 -2 1 compute the payoff functions and the resulting nash equilibrium and expected payoff for each. -1 2
3
recall that a matrix A is skew-symmetric if $A^T = -A$. is the product of two skew-symmetric matrices also skew-symmetric?
4
try to compute the nash equilibrium for the payoff matrix: what goes wrong? why? what is the optimal strategy?
2 -1 3 -2
Often come with constraints
The size of the tumor must be non-negative
rent cannot exceed 60% of after-tax income
a rectangular box can be shipped only if the length plus the girth (perimeter of the cross-section) is below 120cm.
This engine must operate from -20°C to 40°C
total parts per million cannot exceed 350
the total expenditure on materials, capital equipment, & labor must not exceed available investment funds
is something you should remember…
bounded optimization in the 1-d case: variable x with -2 ≤ x ≤ 2 and f(x) = x³ – 2x² + x – 4
f' = 3x² – 4x + 1 = 0, so 0 = ( 3x – 1 )( x – 1 ), giving x = 1/3 or x = 1; f" = 6x – 4
at x = 1/3, f" = -2 (a local max)
at x = 1, f" = 2 (a local min)
at the endpoints… f(-2) = -22 and f(2) = -2
however… f( 1/3 ) = 4/27 – 4 < -2 and f( 1 ) = -4 > -22, so both global extrema sit at the endpoints
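the bookkeeping for this 1-d example can be sketched in a few lines of python: evaluate f at the two critical points and at the two endpoints, then compare. the names `candidates`, `x_min`, and `x_max` are just illustrative.

```python
def f(x):
    return x**3 - 2*x**2 + x - 4

# candidates: both endpoints of [-2, 2] plus the two critical points
candidates = [-2.0, 1/3, 1.0, 2.0]
values = {x: f(x) for x in candidates}

x_min = min(candidates, key=f)   # where the smallest candidate value occurs
x_max = max(candidates, key=f)   # where the largest candidate value occurs
print(values)
print("global min at x =", x_min, "; global max at x =", x_max)
```

neither critical point wins: the comparison picks out the two endpoints.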
Ok, ok, so we have to check endpoints…
What’s the big deal? Just check the boundary points and move on…
the boundaries are not a finite set!
A simple boundary constraint
variables x, y, z with x + y + z = 30 and x, y, z ≥ 0; f( x, y, z ) = xyz
use the constraint z = 30 - x - y, then maximize f( x, y ) = xy (30 - x - y). this is a nice, 2-d problem…
[ Df ] = [ 30y – 2xy – y²   30x – x² – 2xy ]
0 = y(30 – 2x – y) and 0 = x(30 – x – 2y)
two solutions to each equation: ( 0 , 0 ) , ( 0 , 30 ) , ( 30 , 0 ) , ( 10 , 10 )
there are four critical points to be classified. hmmmmm… i wonder which is the answer?
f( 0, 0 ) = 0*0*30 = 0 f( 30, 0 ) = 30*0*0 = 0 f( 0, 30 ) = 0*30*0 = 0 f( 10, 10 ) = 10*10*10 = 1000
[D²f] = [ –2y  30–2x–2y ; 30–2x–2y  –2x ] = [ -20 -10 ; -10 -20 ] at ( 10 , 10 )
det [D²f] = 300 > 0 and tr [D²f] = -40 < 0, so ( 10 , 10 ) is a local max
[figure: the triangular domain in the ( x , y ) plane, bounded by the lines x = 0, y = 0, and z = 0]
since the function vanishes along the entire boundary of the legal domain, we conclude that the interior maximum is, indeed, the global maximum
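here is a small python sketch of that classification, assuming the reduced function f(x, y) = xy(30 - x - y) from above; the helper names `grad` and `hessian` are illustrative. it confirms that the gradient vanishes at all four points and that the 2nd derivative at (10, 10) has positive determinant and negative trace.

```python
def f(x, y):
    return x * y * (30 - x - y)

def grad(x, y):
    # [Df] = [30y - 2xy - y^2, 30x - x^2 - 2xy]
    return (30*y - 2*x*y - y**2, 30*x - x**2 - 2*x*y)

def hessian(x, y):
    off = 30 - 2*x - 2*y
    return ((-2*y, off), (off, -2*x))

critical_points = [(0, 0), (0, 30), (30, 0), (10, 10)]
grads = [grad(x, y) for x, y in critical_points]
values = [f(x, y) for x, y in critical_points]

H = hessian(10, 10)
det_H = H[0][0]*H[1][1] - H[0][1]*H[1][0]
tr_H = H[0][0] + H[1][1]
print(grads, values, det_H, tr_H)
```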
is not a fixed boundary at all…
an optimal box without bounds
variables x, y, z with xyz = 48 and x, y, z ≥ 0
costs: front/back = $1/ft² , top/bottom = $2/ft² , left/right = $3/ft²
f( x, y, z ) = 4xy + 2xz + 6yz
use the constraint z = 48 / xy, then minimize f( x, y ) = 4xy + 96/y + 288/x. this is a nice, 2-d problem…
[ Df ] = [ 4y – 288/x²   4x – 96/y² ]
4y = 288/x² and 4x = 96/y² give x = 6 & y = 2: there is a single critical point
[D²f] = [ 576/x³  4 ; 4  192/y³ ] = [ 8/3  4 ; 4  24 ] at ( 6 , 2 )
det > 0 and tr > 0
f( x, y ) = 4xy + 96/y + 288/x has its critical point at x = 6 & y = 2; this is a local minimum
as x → 0, f → ∞; as y → 0, f → ∞
since the function has a single local minimum and “blows up” to infinity along all boundaries and as you go off “to infinity”, it is a global minimum
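a hedged numerical sanity check in python, assuming the reduced cost f(x, y) = 4xy + 96/y + 288/x: the gradient vanishes at (6, 2), the cost there is 144, and a coarse grid scan (the grid range and spacing are arbitrary choices) finds nothing cheaper.

```python
def cost(x, y):
    return 4*x*y + 96/y + 288/x

def grad(x, y):
    return (4*y - 288/x**2, 4*x - 96/y**2)

best = cost(6, 2)          # cost at the critical point
z = 48 / (6 * 2)           # third side length from the constraint xyz = 48

# coarse scan over 0.25, 0.5, ..., 20 in each variable (includes x = 6, y = 2)
grid = [0.25 * k for k in range(1, 81)]
grid_min = min(cost(x, y) for x in grid for y in grid)
print(grad(6, 2), best, z, grid_min)
```

the three cost terms are equal at the optimum (48 each), and the box has sides 6, 2, and 4.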
that all problems are this simple
a parameterized boundary
variables x, y with -2 ≤ x, y ≤ 3
a stress function: f( x, y ) = x³ - 6xy + 3y²
use the derivative: ∂f/∂x = 3x² - 6y = 0 and ∂f/∂y = -6x + 6y = 0, giving ( 0, 0 ) & ( 2, 2 )
[D²f] = [ 6x -6 ; -6 6 ]
at ( 0, 0 ): [D²f] = [ 0 -6 ; -6 6 ], det < 0
at ( 2, 2 ): [D²f] = [ 12 -6 ; -6 6 ], det > 0, tr > 0
there are two critical points in the interior of the square, and one is a local minimum. one would guess that this is the desired minimum; however, the entire boundary must be examined…
on the top edge y = 3: f( x, 3 ) = x³ - 18x + 27 for -2 ≤ x ≤ 3
f' = 3x² - 18, so f' = 0 at x = √6, where f" = 6x > 0
local min at ( √6 , 3 ); local max at ( -2 , 3 ); local max at ( 3 , 3 )
[figure: the square -2 ≤ x, y ≤ 3, with the boundary points ( -2 , 3 ), ( √6 , 3 ), ( -2 , -2 ), and ( 3 , -2 ) marked]
can you imagine what it would take to do this problem for a 3-d cube? or a more complex shape?
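for this 2-d square the full boundary check is still manageable. the python sketch below is one way to organize it, under these assumptions: the candidate list was worked out by hand (interior critical points, the one edge critical point at x = √6 on y = 3, and the four corners; the other three edges have no critical points strictly inside them), and a grid scan is used only as a cross-check.

```python
import math

def f(x, y):
    return x**3 - 6*x*y + 3*y**2

candidates = [
    (0, 0), (2, 2),                      # interior critical points
    (math.sqrt(6), 3),                   # critical point on the edge y = 3
    (-2, -2), (-2, 3), (3, -2), (3, 3),  # corners of the square
]
lo = min(candidates, key=lambda p: f(*p))
hi = max(candidates, key=lambda p: f(*p))

# cross-check with a 101-by-101 grid on [-2, 3] x [-2, 3]
pts = [-2 + 5*k/100 for k in range(101)]
grid_lo = min(f(x, y) for x in pts for y in pts)
grid_hi = max(f(x, y) for x in pts for y in pts)
print(lo, f(*lo), hi, f(*hi), grid_lo, grid_hi)
```

under these assumptions the interior local minimum (value -4 at (2, 2)) is not the global one: the scan and the candidate list both point to the corner (-2, -2), with the maximum at the corner (3, -2).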
the big picture Constraints are common (& commonly difficult!) Be sure to check the boundary, boundaries, & “infinity” as needed
1
compute the maximal volume of a cone whose radius, r, and height, h, must satisfy the constraint r+h = 10, with r and h positive.
2
find the point on the plane 2x + y - z = 4 which is closest to the origin. hint: compute the minimum of the square of the distance from ( x, y, z ) to ( 0, 0, 0 ). be sure to argue why it’s a minimum, based on what happens as you go to infinity.
3
find the maxima and minima of f(x, y) = e^x e^(-y) on the rectangle given by -1 ≤ x ≤ 1 and -2 ≤ y ≤ 2. hint: don’t forget corners!
4
what are the maximum and minimum of the function f(x, y) = x²y - 2xy on the disc of radius two given by x² + y² ≤ 4? note: you need to compute the extrema in the interior of the disc and then on the boundary. hint: parametrize the boundary using an angular variable θ via x = 2 cos θ, y = 2 sin θ.
5
challenge: maximize f(x, y) = x² - 2y² - 3x + y on the square given by -1 ≤ x, y ≤ 2
Is almost never easy, but…
constrained to a circle… f(x,y) = x - 2y on x² + y² – 4 = 0
hey! no critical points on ℝ²
x(t) = 2 cos t, y(t) = 2 sin t, t = 0…2π
f(t) = x(t) – 2y(t) = 2 cos t - 4 sin t
f'(t) = -2 sin t - 4 cos t = 0, so tan t = -2: t = arctan(-2) or t = arctan(-2) + π
max: ( 2/√5 , -4/√5 )
min: ( -2/√5 , 4/√5 )
the usual single-variable method works if you can parametrize the constraint set
constrained to a circle… level sets of f
look at the level sets of f… it appears that the extrema of f, when restricted to the constraint set, occur precisely where the level set is tangent to the constraint set
f(x,y) = x - 2y, G(x,y) = x² + y² – 4 = 0
the level sets of f are tangent to the constraint set G = 0 precisely where ∇f is parallel to ∇G
∇f = ( 1 , -2 ) and ∇G = ( 2x , 2y )
∇f = λ∇G: 1 = 2λx and -2 = 2λy, so y = -2x
x² + y² = 4 becomes 5x² = 4, so x = ±2/√5
max: ( 2/√5 , -4/√5 )
min: ( -2/√5 , 4/√5 )
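both routes to the answer can be compared numerically. this python sketch (variable names are illustrative) checks that the claimed maximum lies on the circle, that ∇f and ∇G are parallel there, and that a brute-force scan of the parametrization f(t) = 2 cos t - 4 sin t gives the same maximal value, 2√5.

```python
import math

s5 = math.sqrt(5)
x_max, y_max = 2/s5, -4/s5           # claimed constrained maximum

grad_f = (1.0, -2.0)
grad_G = (2*x_max, 2*y_max)
# the 2-d cross product vanishes exactly when the two gradients are parallel
cross = grad_f[0]*grad_G[1] - grad_f[1]*grad_G[0]
lam = grad_f[0] / grad_G[0]          # multiplier from 1 = 2 * lam * x
on_circle = x_max**2 + y_max**2      # should equal 4

# brute-force scan of the parametrized circle, 3600 sample angles
N = 3600
scan_max = max(2*math.cos(t) - 4*math.sin(t)
               for t in (2*math.pi*k/N for k in range(N)))
print(cross, lam, on_circle, scan_max, 2*s5)
```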
ARE THE RIGHT PERSPECTIVE
a constrained max/min is not a max/min of the full function
Is it always the case that the constraint level set and the optimal function level sets are tangent? With parallel gradients?
AT OPTIMA, LEVEL SETS OF THE FUNCTION AND THE CONSTRAINT LEVEL SET ARE TANGENT!
∇f ⊥ { G = 0 }
in 2-d & beyond!
∇f ∥ ∇G
a general method for constrained optimization
given (differentiable) functions f : ℝⁿ → ℝ and G : ℝⁿ → ℝ, any extremum a of f(x) restricted to G(x) = 0 must satisfy
[Df] = λ[DG] at a, for some λ
equivalently, ∇f = λ∇G at a
the fine print: make sure that [DG] ≠ 0 at a
this is equivalent to finding unconstrained optima of the “lagrangian” L(x, λ) = f - λG : ℝⁿ⁺¹ → ℝ
a simple example: f( x , y ) = x² – 2xy + 2y² , G( x , y ) = 2x – 3y – 5 = 0
lagrange: [Df] = λ[DG]
∂/∂x: 2x – 2y = 2λ
∂/∂y: -2x + 4y = -3λ
solve: λ = x - y
substitute: -2x + 4y = -3(x - y)
simplify: x + y = 0, so y = -x
substitute into G = 0: 2x – 3(-x) – 5 = 0
solve: x = 1, y = -1
can you “see” that this is a minimum?
this is certainly solvable by other means; however, the lagrange multiplier method is “automatic” and requires little other than algebra…
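since the lagrange equations of the simple example are linear in (x, y, λ), the “automatic” algebra can be handed to a tiny solver. the sketch below uses cramer's rule with exact fractions; `det3` and `solve3` are hypothetical helper names, not anything from the text.

```python
from fractions import Fraction as F

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def solve3(m, rhs):
    """Solve a 3-by-3 linear system by cramer's rule (exact arithmetic)."""
    d = det3(m)
    sol = []
    for col in range(3):
        # replace one column of m by the right-hand side
        mc = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        sol.append(F(det3(mc), d))
    return sol

# unknowns (x, y, lam):
#   2x - 2y - 2 lam = 0,   -2x + 4y + 3 lam = 0,   2x - 3y = 5
M = [[2, -2, -2], [-2, 4, 3], [2, -3, 0]]
x, y, lam = solve3(M, [0, 0, 5])
f_value = x**2 - 2*x*y + 2*y**2
print(x, y, lam, f_value)
```

this reproduces x = 1, y = -1, along with the multiplier λ = 2 and the constrained value f = 5.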
to solve constrained optimization you can convert to unconstrained optimization with an extra variable
LAGRANGE EQUATIONS
[Df] = λ[DG] and G = 0, to optimize f(x) constrained to the level set of G(x)
means different things to different people
force of constraint
shadow price
the multiplier is the rate of change of the optimal value with respect to the constraint value
I’ll tell you
consider the constraint value as a variable, c (think “cost”): G(x) = c
then use lagrange to optimize f(x) when constrained by G - c = 0:
[Df] = λ[DG] and G - c = 0
this gives n+1 equations on n+2 variables (x, λ, c)
thanks to the implicit function theorem, we can solve for the optima of f as a function of c (locally): x = x(c)
the rate of change of the optimal f-value is:
df/dc = (∂f/∂x)(dx/dc) = λ (∂G/∂x)(dx/dc) = λ dG/dc = λ, since G(x(c)) = c
that is, d/dc f(x(c)) = λ
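a numerical illustration of the multiplier as a rate of change, reusing the simple example f = x² – 2xy + 2y² but with the constraint value freed up: 2x – 3y = c (this variation on the example is an assumption made here for illustration). the lagrange equations still give y = -x and λ = x - y, so the optimum is x = c/5, y = -c/5.

```python
def optimum(c):
    """Optimal value and multiplier for f = x^2 - 2xy + 2y^2 on 2x - 3y = c."""
    x = c / 5        # from y = -x and 2x - 3y = c
    y = -x
    lam = x - y      # from 2x - 2y = 2 * lam
    return x**2 - 2*x*y + 2*y**2, lam

c, h = 5.0, 1e-4
f_star, lam = optimum(c)
# central finite difference of the optimal value with respect to c
slope = (optimum(c + h)[0] - optimum(c - h)[0]) / (2*h)
print(f_star, lam, slope)
```

at c = 5 the finite-difference slope of the optimal value agrees with λ = 2.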
the lagrange multiplier at this local max is larger…
…than the lagrange multiplier at this local max
because of how the critical values change with a small change in constraint value
can we get back to solving some real problems?
the big picture lagrange’s method converts constrained optimization into
unconstrained optimization with an extra variable…
the lagrange multiplier
1
use the lagrange equations to find the point on the plane ax + by + cz = 1 closest to the origin. hint: extremize the square-distance f = x² + y² + z².
2
use the lagrange equations to compute a formula for the minimal distance D from a point (x₀, y₀) in the plane to a line of the form ax + by = 1. hint: compute the minimal square-distance D², then take its square root.
3
challenge: try to generalize the previous problem to n dimensions, finding the minimal (square) distance from the point P = (x1, … , xn) to the hyperplane given by the equation a1x1 + a2x2 + … + anxn = 1. this may require courage.
4
consider the cost function f = xy for x, y > 0, constrained to a level set of the form ax + by = C, for a, b, C > 0. draw the level sets of the cost and constraint functions in the plane. use lagrange to solve for the optima in terms of C. what does the lagrange multiplier λ tell you? as cost C is increased, does the lagrange multiplier λ get larger or smaller? can you “see” the answer?
Is straightforward, but…
A supply-chain problem
commodities are consumed at constant rate & restocked when depleted. stock reorders happen y times per year, each time ordering amount x
a total amount of S is needed per year: G( x , y ) = xy – S = 0
f( x , y ) = ax + by, with a ~ average storage cost ; b ~ order delivery cost
lagrange: [Df] = λ[DG]
∂/∂x: a = λy, so a/y = λ
∂/∂y: b = λx, so b/x = λ
equate: x/y = b/a
use G = 0: S = xy = (b/a) y² = (a/b) x²
x = √( bS/a ): reorder this much
y = √( aS/b ): at this frequency
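to see the formulas in action, here is a python sketch with made-up numbers (a = 2, b = 8, S = 100 are purely illustrative, as is the function name `plan`). it checks the constraint, the two expressions for λ, and compares against a brute-force scan over order sizes.

```python
import math

def plan(a, b, S):
    """Order size x and order frequency y from the lagrange solution."""
    return math.sqrt(b*S/a), math.sqrt(a*S/b)

a, b, S = 2.0, 8.0, 100.0      # illustrative values, not from the text
x, y = plan(a, b, S)
cost = a*x + b*y
lam = a / y                    # should also equal b / x

# brute force: eliminate y = S/x and scan x = 0.1, 0.2, ..., 200
brute = min(a*xs + b*S/xs for xs in (0.1*k for k in range(1, 2001)))
print(x, y, cost, lam, brute)
```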
That “degenerate” optima can be rejected
find the maximal inscribed cone: V( r, h ) = π r² h/3 , G( r, h ) = ( h – R )² + r² – R² = 0
lagrange: [DV] = λ[DG]
∂/∂r: 2πrh/3 = 2λr, so r = 0 or λ = πh/3
∂/∂h: πr²/3 = 2λ(h - R), so πr²/3 = 2 (h - R) πh/3, that is, r² = 2 (h - R) h
substitute into G = 0:
0 = ( h – R )² + r² – R²
0 = ( h – R )² + 2( h – R ) h - R²
0 = 3h² – 4hR = h (3h – 4R)
so h = 0 or h = 4R/3
with h = 4R/3: r² = 2 (h - R) h = 8R²/9
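a short python check of the cone computation, with R = 3 chosen arbitrarily for illustration. on the constraint, r² = R² – (h – R)², so the volume becomes a function of h alone and can be scanned directly.

```python
import math

R = 3.0   # illustrative value of R

def volume(h):
    r2 = R**2 - (h - R)**2          # r^2 from the constraint G = 0
    return math.pi * r2 * h / 3

h_star = 4*R/3                      # nonzero root of h (3h - 4R) = 0
r2_star = 2*(h_star - R)*h_star     # r^2 = 2 (h - R) h

# scan heights strictly between 0 and 2R
N = 6000
h_scan = max((2*R*k/N for k in range(1, N)), key=volume)
print(h_star, r2_star, volume(h_star), h_scan)
```

the scan lands on h = 4R/3, and r² comes out as 8R²/9, giving a maximal volume of 32πR³/81.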
These could have been solved with substitution & “single-variable” methods
now in 3-d! With extra linear algebra!
f( x, y, z ) = xy + yz - x² + y² - z²
G( x, y, z ) = x² + y² + z² - 8 = 0
The lagrange equations are:
∂/∂x: y - 2x = 2λx
∂/∂y: x + z + 2y = 2λy
∂/∂z: y – 2z = 2λz
in matrix form: [ -2-2λ  1  0 ; 1  2-2λ  1 ; 0  1  -2-2λ ] ( x , y , z )ᵀ = ( 0 , 0 , 0 )ᵀ
NEED A NONZERO SOLUTION FOR THE constraint TO HOLD… THIS ONLY HAPPENS IF… DET = 0
compute the determinant… -4 ( 1 + λ ) ( 2λ² – 3 ) = 0
λ = -1 OR λ = ±√(3/2)
for λ = -1, solve [ 0 1 0 ; 1 4 1 ; 0 1 0 ] ( x , y , z )ᵀ = ( 0 , 0 , 0 )ᵀ
this leads to two equations… y = 0 and x = -z
substitute into the constraint equation… x² + y² + z² - 8 = 0 becomes x² + 0 + (-x)² = 8, so x = ±2 ; y = 0 ; z = -x
λ = -1: x = 2 , y = 0 , z = -2 OR x = -2 , y = 0 , z = 2
λ = ±√(3/2): x = z = ±√( 2 / ( 3 ± √6 ) ) and y = ( 2 ± √6 ) x, where the ± attached to √6 matches the sign of λ
is this awful? yes. it is awful.
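awful algebra is a good reason to double-check numerically. the python sketch below plugs one solution per eigenvalue into the three lagrange equations and the constraint, and evaluates f there. the closed forms for the λ = ±√(3/2) branch are taken as x = z = √(2/(3 ± √6)), y = (2 ± √6)x, which is an assumption about how the ± signs pair up; the residual check is what justifies it.

```python
import math

def f(x, y, z):
    return x*y + y*z - x**2 + y**2 - z**2

def residual(x, y, z, lam):
    """Largest violation among the three lagrange equations and the constraint."""
    return max(abs(y - 2*x - 2*lam*x),
               abs(x + z + 2*y - 2*lam*y),
               abs(y - 2*z - 2*lam*z),
               abs(x*x + y*y + z*z - 8))

def det(lam):
    u, v = -2 - 2*lam, 2 - 2*lam
    return u*(v*u - 1) - u           # cofactor expansion along the first row

root = math.sqrt(1.5)
solutions = [(2.0, 0.0, -2.0, -1.0)]           # (x, y, z, lam) for lam = -1
for sign in (1, -1):
    lam = sign * root
    x = math.sqrt(2 / (3 + sign*math.sqrt(6)))
    solutions.append((x, (2 + sign*math.sqrt(6))*x, x, lam))

residuals = [residual(*s) for s in solutions]
values = [f(x, y, z) for x, y, z, _ in solutions]
print(residuals, values)
```

in each case the value of f equals 8λ, so the constrained maximum is 8√(3/2) = 4√6 and the constrained minimum is -4√6.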
You can still get results from lagrange
Economics without numbers
invest in an amount E of equipment at unit rate cost e
invest in an amount L on labor at unit rate cost l
fixed production function P = P( E , L ) : constant output
C( E , L ) = e E + l L
lagrange: [DC] = λ[DP]
∂/∂E: e = λ ∂P/∂E
∂/∂L: l = λ ∂P/∂L
solve for 1/λ & equate: (1/e) ∂P/∂E = (1/l) ∂P/∂L
at the optimal cost, the marginal output per unit cost of equipment equals that of labor
higher dimensional problems are really hard
a high-dimensional maximization
variables xᵢ ≥ 0 for i = 1…n
f = x₁² x₂² … xₙ₋₁² xₙ² = Π xᵢ² (product over i = 1…n)
G = Σ xᵢ² = a² ≥ 0 (sum over i = 1…n)
lagrange: [Df] = λ[DG]
∂/∂xⱼ: 2xⱼ Π xᵢ² = 2xⱼ λ (product over i ≠ j)
either xⱼ = 0 (not maximal) or Π xᵢ² = λ (product over i ≠ j), so xⱼ = constant for all j
by symmetry, we conclude xᵢ² = a²/n, that is, xᵢ = a/√n for all i
this must be the maximum since the function is positive & all other critical points have value zero
the maximal value is fmax = Π xᵢ² = ( a²/n )ⁿ
for xᵢ ≥ 0 and G = Σ xᵢ² = a² ≥ 0 we have
f = Π xᵢ² ≤ fmax = ( a²/n )ⁿ, with equality when xᵢ = a/√n for all i
taking n-th roots and using a²/n = (1/n) Σ xᵢ²:
( Π xᵢ² )^(1/n) ≤ (1/n) Σ xᵢ²
writing yᵢ = xᵢ²:
( Π yᵢ )^(1/n) ≤ (1/n) Σ yᵢ
(all products and sums run over i = 1…n)
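the bound can be spot-checked by random sampling. this python sketch draws random points with xᵢ ≥ 0 on the sphere Σ xᵢ² = a² (n = 5 and a = 2 are arbitrary illustrative choices, as are the helper names) and confirms that the product of squares never exceeds (a²/n)ⁿ, with equality at the symmetric point.

```python
import math
import random

def product_of_squares(xs):
    return math.prod(x*x for x in xs)

def random_point_on_sphere(n, a, rng):
    """Random point with x_i >= 0 and sum of x_i^2 equal to a^2."""
    raw = [abs(rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(r*r for r in raw))
    return [a*r/norm for r in raw]

n, a = 5, 2.0
rng = random.Random(0)               # fixed seed so the run is repeatable
f_max = (a*a/n)**n

samples = [product_of_squares(random_point_on_sphere(n, a, rng))
           for _ in range(2000)]
at_symmetric_point = product_of_squares([a/math.sqrt(n)]*n)
print(max(samples), at_symmetric_point, f_max)
```

a sampling check is not a proof, of course; the lagrange argument above is what establishes the inequality.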
relies on core inequalities many of which are proved via lagrange optimization
the big picture When using lagrange’s method…
Stay calm & Do the algebra
1
recall (from exercises earlier in this text) the cobb-douglas model for P (production) in terms of labor x and materials y is P = C x^α y^β, where α, β, C > 0 are constants and α + β = 1. assuming that labor costs A dollars per unit and materials cost B dollars per unit, use the lagrange method to optimize resource allocation x, y so as to maximize P with a fixed cost K.
2
the lagrange multiplier λ has a definite meaning in load balancing for electric network problems. consider three generators that can output xᵢ megawatts, i = 1…3. each generator costs Cᵢ = 3xᵢ + (i/40)xᵢ². if the total power needed is 1000 MW, what load balance (x₁, x₂, x₃) minimizes cost? in this problem, what is λ and what are the units of λ? if you are operating at the optimal load balance, and a request comes in for additional power at $20/MW, is this a good price? how do you determine this?
3
assume a solid body with three principal normal stresses σ₁ > σ₂ > σ₃ > 0. along a plane with unit normal vector n, the shear stress is given by the function: τ² = σ₁² n₁² + σ₂² n₂² + σ₃² n₃² – ( σ₁ n₁² + σ₂ n₂² + σ₃ n₃² )². write out the lagrange equations for extremizing the shear stress τ as a function of the normal vector n satisfying n₁² + n₂² + n₃² = 1. try to solve! (hint: there are 3 solutions, each of which has one component nᵢ = 0.) what is the maximal shear stress τ as a function of normal stresses?
4
challenge: it’s hard to do a problem in n-dimensions, but here’s an example. minimize a quadratic form f(x) = xᵀ Q x for a positive definite square matrix Q subject to a hyperplane constraint x · b = 1 for a constant nonzero vector b. as a warm-up, you might want to start with a simpler case… a) let Q be a 2-by-2 matrix with determinant and trace both positive b) let Q be an n-by-n diagonal matrix with positive diagonal terms
modern applications of derivatives
a function f: ℝⁿ → ℝ to be optimized while satisfying constraints G(x) = 0 for G: ℝⁿ → ℝᵐ
the solutions to the constrained optimization problem equal those of an unconstrained problem with more variables, one for each constraint…
define L: ℝⁿ⁺ᵐ → ℝ by L( x , λ ) = f(x) – λ·G(x), where λ is a vector of m lagrange multipliers
0 = [DL] = [ ∂L/∂x  ∂L/∂λ ], where ∂L/∂x = [Df] – λ·[DG] and ∂L/∂λ = –( G₁ , G₂ , … , Gₘ )
this gives two sets of equations: [Df] = λ·[DG] and G = 0
classifying maxima/minima/saddles
in the unconstrained case of f: ℝⁿ → ℝ, the eigenvalues of the 2nd derivative [D²f] determine everything!
the 2nd derivative [D²f] has n eigenvalues; they are all real, since [D²f] is symmetric
if the eigenvalues of [D²f] are…
all positive: local min
all negative: local max
mixed +/-: saddle
some zero: ???
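for the 2-by-2 case the eigenvalue test is easy to automate. a minimal python sketch (function names are illustrative), applied to the symmetric 2nd derivatives that appeared in the earlier examples:

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [a b; b d], smallest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def classify(a, b, d):
    lo, hi = eigenvalues_2x2(a, b, d)
    if lo > 0:
        return "local min"
    if hi < 0:
        return "local max"
    if lo < 0 < hi:
        return "saddle"
    return "???"                      # some eigenvalue is zero: test is inconclusive

print(classify(0, -6, 6))             # stress function at (0, 0)
print(classify(12, -6, 6))            # stress function at (2, 2)
print(classify(-20, -10, -20))        # xyz box problem at (10, 10)
print(classify(8/3, 4, 24))           # cost-of-box problem at (6, 2)
```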
this is not so simple…careful!
in the constrained setting, the 2nd derivative [D²L] is used…
yes. oh, yes. oh, yes yes yes yes yes.
three time-dependent variables with linearly related evolution
x' = 3x - 2y + z
y' = x + 4y - 2z
z' = 5x - y + 3z
can be expressed as a simple first-order linear system
( x' , y' , z' )ᵀ = [ 3 -2 1 ; 1 4 -2 ; 5 -1 3 ] ( x , y , z )ᵀ
that is… x' = Ax using matrix-vector notation
you can check that the solution to this is… x(t) = e^(At) x(0)
this is, of course, the old single-variable story rewritten with matrices
for a square matrix, the exponential is, clearly, defined as…
e^(At) = I + At + (1/2!) (At)² + (1/3!) (At)³ + … + (1/n!) (At)ⁿ + …
for a diagonal matrix… A = [ λ₁ 0 0 ; 0 λ₂ 0 ; 0 0 λ₃ ]
the exponential is too: e^(At) = [ e^(λ₁t) 0 0 ; 0 e^(λ₂t) 0 ; 0 0 e^(λ₃t) ]
in the same way that eigenvalues classify critical points in optimization, eigenvalues classify dynamical equilibria…
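the series definition of the matrix exponential can be tried out directly. the python sketch below sums the first 30 terms (an arbitrary cutoff that is ample for small t), checks the diagonal case against ordinary exponentials, and checks by finite differences that x(t) = e^(At) x(0) satisfies x' = Ax for the 3-by-3 system above. the initial condition and the diagonal entries are made-up test values.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k]*Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, terms=30):
    """Partial sum I + M + M^2/2! + ... of the exponential series."""
    n = len(M)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, M)]  # M^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def apply(M, v):
    return [sum(M[i][j]*v[j] for j in range(len(v))) for i in range(len(M))]

def scaled(M, t):
    return [[t*v for v in row] for row in M]

A = [[3, -2, 1], [1, 4, -2], [5, -1, 3]]
x0 = [1.0, 0.0, -1.0]                 # illustrative initial condition

def x(t):
    return apply(mat_exp(scaled(A, t)), x0)

t, h = 0.2, 1e-5
derivative = [(p - m) / (2*h) for p, m in zip(x(t + h), x(t - h))]
rhs = apply(A, x(t))                  # should match the derivative

lams = [1.0, -2.0, 0.5]               # illustrative diagonal entries
D = [[lams[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
exp_D = mat_exp(scaled(D, t))
print(derivative, rhs)
```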
the solutions can be classified into types based on eigenvalues reminiscent of minima/saddles/maxima
notice how these have an equilibrium (constant solution) at zero… for complex pairs of eigenvalues, one gets spiraling behavior: either stable (sink), unstable (source), or “balanced” (center)
when the system is not linear & not solvable…
x' = F(x)
FOR A NONLINEAR SYSTEM, YOU FIND THE EQUILIBRIA (CONSTANT SOLUTIONS) AND LINEARIZE THE DYNAMICS BY COMPUTING THE DERIVATIVE…
THE EIGENVALUES OF THE DERIVATIVE AT AN EQUILIBRIUM TELL YOU ABOUT THE NONLINEAR DYNAMICS… LOCALLY!
more interesting things can happen
don’t panic if that all doesn’t make much sense… the epilogues are meant to be mind-crunching
are, of course, anti-derivatives
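in one variable, d/dx takes f : ℝ → ℝ to f' : ℝ → ℝ, and ∫ – dx takes it back
in several variables, D takes f : ℝⁿ → ℝᵐ to [Df], and the way back is… ?
the derivative [Df] is not the same type of function… so you can’t invert!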
let f : ℝⁿ → ℝ be an “integrable” function on a region R ⊂ ℝⁿ
∫_R f dx is a limit (like everything else in mathematics)
over higher-dimensional domains
let f : ℝⁿ → ℝ be a “sufficiently integrable” function on a region R ⊂ ℝⁿ. then,
∫_R f dx = ∫( ∫ … ( ( ∫ f dx₁ ) dx₂ ) … dxₙ₋₁ ) dxₙ
these “partial integrals” are analogous to undoing partial derivatives…
is not computing the integral
limits on a triple integral
∫_{y=-1}^{1} ∫_{x=y²}^{1} ∫_{z=0}^{1-x} f dz dx dy = ∫_{x=0}^{1} ∫_{z=0}^{1-x} ∫_{y=-√x}^{√x} f dy dz dx
these don’t look the same, but they are!
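one way to convince yourself is to integrate numerically in both orders. the python sketch below assumes the region is -1 ≤ y ≤ 1, y² ≤ x ≤ 1, 0 ≤ z ≤ 1 - x (so that, with x outermost, y runs from -√x to √x), and uses a crude midpoint rule with 40 subintervals per variable; the test integrand g is made up.

```python
import math

def midpoints(lo, hi, n):
    """Midpoints and common width of n equal subintervals of [lo, hi]."""
    w = (hi - lo) / n
    return [lo + (k + 0.5)*w for k in range(n)], w

def integral_dz_dx_dy(f, n=40):
    total = 0.0
    ys, wy = midpoints(-1.0, 1.0, n)
    for y in ys:
        xs, wx = midpoints(y*y, 1.0, n)
        for x in xs:
            zs, wz = midpoints(0.0, 1.0 - x, n)
            total += sum(f(x, y, z) for z in zs) * wz * wx * wy
    return total

def integral_dy_dz_dx(f, n=40):
    total = 0.0
    xs, wx = midpoints(0.0, 1.0, n)
    for x in xs:
        zs, wz = midpoints(0.0, 1.0 - x, n)
        for z in zs:
            ys, wy = midpoints(-math.sqrt(x), math.sqrt(x), n)
            total += sum(f(x, y, z) for y in ys) * wy * wz * wx
    return total

one = lambda x, y, z: 1.0                 # integrand 1 gives the volume
g = lambda x, y, z: 1.0 + x*z + y*y       # an arbitrary test integrand

vol_1, vol_2 = integral_dz_dx_dy(one), integral_dy_dz_dx(one)
g_1, g_2 = integral_dz_dx_dy(g), integral_dy_dz_dx(g)
print(vol_1, vol_2, g_1, g_2)
```

with f = 1 both orders approximate the volume of the region, which works out by hand to 8/15.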
[figure: the region of integration projected onto the ( x , z ), ( y , z ), and ( x , y ) planes]
in all manner of applications
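[figure: moments of inertia for three solids of mass M with radius r (and length l): (2/5) M r², (2/3) M r², and (M/12)( 3r² + l² ), with radii of gyration √(2/5) r, √(2/3) r, and √( ( 3r² + l² )/12 )]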
are mostly limited to one tool
u = F( x ), with [DF] = ∂u/∂x
∫_{F(D)} h(u) du = ∫_D h(F(x)) |det [DF]| dx
difficult & useful integrals
cylindrical: r = √( x² + y² ) , θ = arctan( y/x ) , z = z
spherical: ρ = √( x² + y² + z² ) , θ = arctan( y/x ) , φ = arccos( z/ρ )
for a surface S ⊂ ℝⁿ parametrized by G : R → ℝⁿ, sending ( s , t ) to x = ( x₁ , x₂ , … , xₙ ), the surface area element is
dσ = √( det( [DG]ᵀ [DG] ) ) ds dt
(transpose! square root!)
compute the volume Bₙ and “surface area” Ωₙ of the unit-radius ball in ℝⁿ
take advantage of spherical coordinates… dVₙ = ρⁿ⁻¹ dρ dΩₙ
dVₙ = volume element ; ρ = radial coordinate ; dΩₙ = solid angle element
we will need to use the gamma function…
Γ(x) = ∫_{t=0}^{∞} e^(-t) t^(x-1) dt = (x-1)!
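the factorial identity is easy to test in python, whose standard library provides `math.gamma`. the sketch also estimates the defining integral with a midpoint rule; truncating the upper limit at 60 and using 60000 subintervals are arbitrary choices made here, and `gamma_numeric` is a made-up name.

```python
import math

def gamma_numeric(x, upper=60.0, n=60000):
    """Midpoint-rule estimate of the integral of e^(-t) t^(x-1) over [0, upper]."""
    w = upper / n
    return sum(math.exp(-t) * t**(x - 1)
               for t in ((k + 0.5)*w for k in range(n))) * w

for k in range(1, 8):
    print(k, math.gamma(k), math.factorial(k - 1))

estimate = gamma_numeric(5)      # should be close to 4! = 24
print(estimate)
```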
the big picture in higher dimensions, one needs to integrate definitely & iteratively the difficulty is not integrating… WHAT’S HARD IS setting it up & choosing proper coordinates!
by Robert Ghrist
Robert Ghrist is the Andrea Mitchell Professor of Mathematics and Electrical & Systems Engineering at the University of Pennsylvania
he’s an award-winning researcher, teacher, writer, & speaker; his 1995 ph.d. is in applied mathematics from Cornell
Good textbooks on calculus that use matrices & matrix algebra:
Colley, S. J., Vector Calculus, 4th ed., Pearson, 2011.
Hubbard, J. and Hubbard, B. B., Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach, 5th ed., Matrix Editions, 2015.
Good introduction to game theory:
Ferguson, T., Game Theory, 2nd ed., web site, 2014. https://www.math.ucla.edu/~tom/Game_Theory/Contents.html
Good introduction to least-squares regression & more:
Boyd, S., and Vandenberghe, L., Introduction to Applied Linear Algebra – Vectors, Matrices, & Least Squares, Cambridge, to appear, 2018.
Good graduate level text on optimization theory & applications:
Boyd, S., and Vandenberghe, L., Convex Optimization, Cambridge, 2004.
all writing, design, drawing, & layout by prof/g [Robert Ghrist]. prof/g acknowledges the support of Andrea Mitchell & the fantastic engineering students at the University of Pennsylvania. during the writing of calculus blue, prof/g’s research was generously supported by the United States Department of Defense through the ASDR&E Vannevar Bush Faculty Fellowship.