English Pages 450 Year 2007
Stochastic Economic Dynamics
Bjarne S. Jensen & Tapio Palokangas (Editors)
Copenhagen Business School Press
Stochastic Economic Dynamics
© Copenhagen Business School Press, 2007
Printed in Denmark by Narayana Press, Gylling
Cover design by BUSTO│Graphic Design
First edition 2007
e-ISBN 978-87-630-9982-0

Distribution:

Scandinavia
DBK, Mimersvej 4, DK-4600 Køge, Denmark
Tel +45 3269 7788, Fax +45 3269 7789

North America
International Specialized Book Services
920 NE 58th Ave., Suite 300, Portland, OR 97213, USA
Tel +1 800 944 6190, Fax +1 503 280 8832
Email: [email protected]

Rest of the World
Marston Book Services, P.O. Box 269, Abingdon, Oxfordshire, OX14 4YN, UK
Tel +44 (0) 1235 465500, Fax +44 (0) 1235 465655
E-mail: [email protected]
All rights reserved. No part of this publication may be reproduced or used in any form or by any means graphic, electronic or mechanical including photocopying, recording, taping or information storage or retrieval systems - without permission in writing from Copenhagen Business School Press at www.cbspress.dk
Contents in Brief

Introduction  1

Part I: Developments in Stochastic Dynamics
1. Fractional Brownian Motion in Finance  11
2. Moment Evolution of Gaussian and Geometric Wiener Diffusions  57
3. Two-Dimensional Linear Dynamic Systems with Small Random Terms  101
4. Dynamic Theory of Stochastic Movement of Systems  133

Part II: Stochastic Dynamics in Basic Growth Models and Time Delays
5. Stochastic One-Sector and Two-Sector Growth Models in Continuous Time  167
6. Comparative Dynamics in a Stochastic Growth and Trade Model with a Variable Savings Rate  217
7. Inada Conditions and Global Dynamic Analysis of Basic Growth Models with Time Delays  229
8. Hopf Bifurcation in Growth Models with Time Delays  247

Part III: Intertemporal Optimization in Consumption, Finance, and Growth
9. Optimal Consumption and Investment Strategies in Dynamic Stochastic Economies  271
10. Differential Systems in Finance and Life Insurance  317
11. Uncertain Technological Change and Capital Mobility  361
12. Stochastic Control, Non-Depletion of Renewable Resources, and Intertemporal Substitution  381
13. Capital Accumulation in a Growth Model with Creative Destruction  393
14. Employment Cycles in a Growth Model with Creative Destruction  423
Table of Contents

Introduction  1
Bjarne S. Jensen and Tapio Palokangas

Part I: Developments in Stochastic Dynamics

1. Fractional Brownian Motion in Finance  11
Bernt Øksendal
1.1 Introduction  11
1.2 Framework and definitions  12
1.3 Classical white noise theory and Hida-Malliavin calculus  16
1.4 Fractional stochastic calculus  30
1.5 Summary of results  40
1.6 Concluding remarks  53

2. Moment Evolution of Gaussian and Geometric Wiener Diffusions  57
Bjarne S. Jensen, Chunyan Wang, and Jon Johnsen
2.1 Introduction  57
2.2 Structure of basic diffusion processes  59
2.3 Dynamics of first-order and second-order moments  64
2.4 Expectation vector functions  68
2.5 Covariance matrix functions  71
2.6 Probability density functions  84
2.7 Final comments  92
Appendices  92

3. Two-Dimensional Linear Dynamic Systems with Small Random Terms  101
Nishioka Kunio
3.1 Introduction  101
3.2 Non-random dynamic system  102
3.3 Lyapunov index of the random system  106
3.4 One-dimensional diffusion process in an interval  109
3.5 Spiral point and center  113
3.6 Saddle point  117
3.7 Improper and proper node  127

4. Dynamic Theory of Stochastic Movement of Systems  133
Masao Nagasawa
4.1 Dynamic theory of stochastic processes  133
4.2 Kinematic theory  134
4.3 Sample path equation in kinematic theory  135
4.4 Mechanics and the equation of motion  137
4.5 Evolution function and kinematic equation  140
4.6 Exponent of motion and initial condition  142
4.7 Examples  143
4.8 Schrödinger's wave theory and dynamic theory  146
4.9 Sample paths of motion governed by the Schrödinger equation  147
4.10 Interference phenomena and entangled motion  159

Part II: Stochastic Dynamics in Basic Growth Models and Time Delays

5. Stochastic One-Sector and Two-Sector Growth Models in Continuous Time  167
Bjarne S. Jensen and Martin Richter
5.1 Introduction  167
5.2 Neoclassical technologies and CES forms  170
5.3 Stochastic one-sector growth models  172
5.4 Boundaries, steady state, and convergence  177
5.5 Explicit steady-state distribution with CD technologies  188
5.6 Sample paths and asymptotic densities with CD and CES technologies  190
5.7 General equilibria of two-sector economies  202
5.8 Dynamics of two-sector economies  206
5.9 Sample paths of two-sector models and CES  210

6. Comparative Dynamics in a Stochastic Growth and Trade Model with a Variable Savings Rate  217
Zhu Hongliang and Huang Wenzao
6.1 Introduction  217
6.2 Stochastic dynamic systems for trading economies  218
6.3 Comparative dynamics and policy parameters  221

7. Inada Conditions and Global Dynamic Analysis of Basic Growth Models with Time Delays  229
Zhu Hongliang and Huang Wenzao
7.1 Introduction  229
7.2 Neoclassical growth model with a time delay  231
7.3 Dynamics with delays in production and depreciation  236
7.4 Persistent oscillation in a growth model with delay  240
7.5 Final comments  245

8. Hopf Bifurcation in Growth Models with Time Delays  247
Morten Brøns and Bjarne S. Jensen
8.1 Introduction  247
8.2 Dynamics of growth and cycles  249
8.3 Hopf bifurcation analysis  250
8.4 CD technologies and time delays  257
8.5 CES technologies and time delays  260
8.6 CES and delays with cycles, square waves, and chaos  261
8.7 Final comments  265

Part III: Intertemporal Optimization in Consumption, Finance, and Growth

9. Optimal Consumption and Investment Strategies in Dynamic Stochastic Economies  271
Claus Munk and Carsten Sørensen
9.1 Introduction  271
9.2 Consumption and investment in complete markets  276
9.3 Results for CRRA utility in general markets  281
9.4 Examples  290
9.5 Extensions  301
9.6 Concluding remarks  309
Appendix  310

10. Differential Systems in Finance and Life Insurance  317
Mogens Steffensen
10.1 Introduction  317
10.2 The differential equations of Thiele and Black-Scholes  321
10.3 Surplus and dividends  332
10.4 Intervention  338
10.5 Quadratic optimization  344
10.6 Utility optimization  352

11. Uncertain Technological Change and Capital Mobility  361
Paul A. de Hek
11.1 Introduction  361
11.2 Framework of the model  363
11.3 The effect of uncertainty on growth  369
11.4 Conclusion  374
Appendices  375

12. Stochastic Control, Non-Depletion of Renewable Resources, and Intertemporal Substitution  381
Nils Chr. Framstad
12.1 Introduction  381
12.2 The preferences  382
12.3 The optimal control problem  385
12.4 Non-optimality of immediate total depletion  387
12.5 Concluding remarks  391

13. Capital Accumulation in a Growth Model with Creative Destruction  393
Klaus Wälde
13.1 Introduction  393
13.2 Framework of the model  395
13.3 Solving the model  399
13.4 Cycles and growth  402
13.5 Conclusions  410
Appendices  411

14. Employment Cycles in a Growth Model with Creative Destruction  423
Tapio Palokangas
14.1 Introduction  423
14.2 Technology  425
14.3 R&D and capital accumulation  427
14.4 Capitalists  428
14.5 Wage settlement  430
14.6 Economic growth  432
14.7 Cycles  435
14.8 Conclusions  436
Introduction
Bjarne S. Jensen, University of Southern Denmark and Copenhagen Business School
Tapio Palokangas, University of Helsinki and HECER
A unity of aim and a diversity of topics shaped the contents of this volume. Although each chapter can be read independently as a self-contained presentation, the book is more than a collection of the individual contributions. Difficult subjects are interrelated, juxtaposed, and examined for consistency in various disciplinary, theoretical, and empirical contexts. The major unifying theme of this joint work is the coherent and rigorous treatment of uncertainty and its implications for describing relevant stochastic processes through basic (prototype, core) models and differential equations of stochastic dynamics.
Part I: Developments in Stochastic Dynamics

1. Bernt Øksendal. Fractional Brownian Motion in Finance.

Stochastic processes in continuous time are here described by a fractional Brownian motion (Wiener process) in which the stochastic increments are not necessarily independent. The increments have a covariance function, for which the size of the Hurst parameter (H) is critically important. When H = 1/2, the fractional Brownian motion coincides with the classical Brownian motion. If H > 1/2, the stochastic increments have a positive autocorrelation (motion is persistent). If H < 1/2, increments have a negative autocorrelation (motion is antipersistent). This chapter gives a survey of the theory of stochastic calculus (integrals) with fractional Brownian motion and discusses the applications of fractional stochastic calculus to financial markets. Asset prices are described as solutions of stochastic differential equations that are driven by the generalized stochastic processes.

2. Bjarne S. Jensen, Chunyan Wang, and Jon Johnsen. Moment Evolution of Gaussian and Geometric Wiener Diffusions.

This chapter analyzes two basic stochastic models: the time-homogeneous Gaussian and the geometric Wiener diffusion of two-dimensional vector processes. Using the theory of stochastic processes and Itô's lemma, the probability distributions of the stochastic state vectors are described by the evolution of their moments (expectation vector and covariance matrix as functions of time). These moments satisfy certain systems of ordinary (deterministic) differential equations. By solving these ODEs, the authors present explicit solutions for the first-order and second-order moment functions. Kolmogorov's forward equation is used to derive the results by alternative methods and to gain information on the probability distributions. The general closed-form results for these moment evolutions, hitherto unavailable, have many applications in models of linear dynamics with uncertainty.

3. Nishioka Kunio. Two-Dimensional Linear Dynamic Systems with Small Random Terms.

Chapter 3 links up with chapter 2 and further studies the asymptotic behavior of the time paths of two-dimensional linear dynamic systems that are perturbed by small random terms. Economic growth is traditionally treated as a non-random dynamic system. If the system is linear and two-dimensional, it can be classified as one of five well-known types, according to its long-run (asymptotic) behavior. With uncertainty involved in economic growth, the asymptotic behavior of a system with random perturbations is important to investigate, for example, when analyzing steady state properties of economic growth. If the random perturbations are small, the asymptotic behavior (time paths) of the linear stochastic system is the same as in non-random cases, unless the relevant dynamic system is a center or a proper node.
4. Masao Nagasawa. Dynamic Theory of Stochastic Movement of Systems.

The dynamic theory of stochastic movement of systems contains a general mathematical theory of random motion, consisting of two parts: stochastic kinematics and stochastic mechanics. The stochastic kinematics is analytically described by Kolmogorov's partial differential equation, which, with its drift coefficient and diffusion coefficient, uniquely characterizes the transition probability distribution when an initial distribution is prescribed. The stochastic mechanics contains the mechanical equation of motion, which, in addition to the Kolmogorov equation, includes a potential function of external forces. The potential function determines a so-called induced drift coefficient. This induced drift coefficient in turn enters a new kinematic (Kolmogorov) equation that fully describes the relevant transition probability density of the observed stochastic process. However, Kolmogorov equations are not easy to solve, except in some simple cases. Therefore, to analyze the stochastic processes, it is often better to use Itô's stochastic differential equations (SDE), and in solving them, we have the powerful tools of sample path analysis, in particular, Lévy's formula and Itô's formula.

The dynamic theory of stochastic motion (mechanics) is then applied to Quantum Mechanics (Schrödinger's complex "wave equation"). Sample paths in one and two dimensions of simple motions governed by Schrödinger's equation are illustrated. Finally, the methodology is applied to the Schrödinger equation with Coulomb potential to obtain the sample path of the electron in the hydrogen atom. Here the critical ("attractive") radius of the solved radial motions (sample paths) agrees with the classic Bohr radius expression for the "stationary states" of hydrogen. The conceptual existence of sample paths (stochastic trajectories) has been controversial (even denied) in quantum dynamics. Because sample paths and stochastic differential equations are the natural generalization of deterministic dynamics in Economics, the chapter has devoted a keen effort of calculation to demonstrate some particular sample paths for states of hydrogen motion that are governed by the universal Schrödinger equation.
Part II: Stochastic Dynamics in Basic Growth Models and Time Delays

5. Bjarne S. Jensen and Martin Richter. Stochastic One-Sector and Two-Sector Growth Models in Continuous Time.

This chapter extends the basic deterministic one-sector and two-sector growth models to a stochastic context in continuous time, using Wiener processes for the description of various sources of uncertainty in the growth rate of the labor force, the rate of capital depreciation, and the saving rate. The drift and diffusion coefficients of the stochastic dynamic systems are homogeneous of degree one in the two state variables, labor and capital, which allows a reduction to the one-dimensional stochastic dynamics of the capital-labor ratio. The crucial issue of absorbing boundaries for the stochastic growth models is rigorously examined, and simple criteria (sufficient conditions) for inaccessible boundaries are established, a subject that the literature has not yet adequately addressed. The steady state probability distribution of the capital-labor ratios is derived from Kolmogorov's forward equation. For stochastic one-sector and two-sector growth models, the sample paths (of the transition to steady states or persistent (endogenous) growth) of particular state variables are simulated for many parametric specifications of the CD and CES sector technologies involved. The impacts of technology shocks are similarly demonstrated. All relevant sample paths are exhibited on both shorter and longer time horizons.

6. Zhu Hongliang and Huang Wenzao. Comparative Dynamics in a Stochastic Growth and Trade Model with a Variable Savings Rate.

The authors consider neoclassical two-sector growth models of a small country that is trading in both commodities in a stochastic environment in continuous time, and they use a saving function for which the rate of saving depends on the capital-labor ratio and a policy parameter. The global comparative dynamic properties of the capital accumulation process are studied with respect to changes in the policy parameter. By characterizing the entire time path of the capital accumulation process, the effect of the policy parameter can be determined. The time path of the capital-labor ratio satisfies a monotonicity property if the saving function changes monotonically with respect to a policy parameter. In addition, the impact of the policy parameter on the steady-state distribution of the capital-labor ratio is analyzed.
7. Zhu Hongliang and Huang Wenzao. Inada Conditions and Global Dynamic Analysis of Basic Growth Models with Time Delays.

In economics, time delays are often neglected in continuous time dynamics, no doubt due to the difficulty in solving and analyzing such models. Nevertheless, economic development depends not only on the current state, but also on past states (history), so delay phenomena also influence the dynamic characteristics of economic systems. This chapter introduces time delays in a particular neoclassical growth model. The global conditions for steady-state stability and/or persistent oscillation around a steady state are obtained and analyzed. It is shown that oscillations in growing economies are not rare, but common.

8. Morten Brøns and Bjarne S. Jensen. Hopf Bifurcation in Growth Models with Time Delays.

Chapter 8 complements the global analysis of particular difference-differential equations in chapter 7 by performing a non-linear local analysis of the dynamics of the delay model when the size of the time delay is close to a critical value. For this critical value, a Hopf bifurcation (of a fixed point into a closed orbit in the neighborhood of the equilibrium) occurs, that is, periodic solutions ("limit cycles") are created when the steady state solution of the capital-labor ratio loses its local stability. Analytical criteria are derived to determine the stability types (supercritical or subcritical) of the periodic solutions (limit cycles). Finally, it is shown that the delay model with CES production functions can exhibit dynamics with solutions which have been observed in other delay-differential equations: square waves and chaos (aperiodic waves/cycles). Simulations illustrate the analytical results and theorems.
Part III: Intertemporal Optimization in Consumption, Finance, and Growth

9. Claus Munk and Carsten Sørensen. Optimal Consumption and Investment Strategies in Dynamic Stochastic Economies.

The authors derive optimal consumption and investment strategies of an investor with a CRRA utility of consumption and terminal wealth and with access to trade in a complete, but otherwise very general, financial market. Interest rates, excess expected returns, price volatilities, correlations, and consumer prices may all evolve stochastically over time, even with non-Markovian dynamics. The risks that individuals want to hedge are shown, as well as how to finance a desired real consumption process by investing in a market of nominal securities. The general results are extended to the case of a HARA utility and power-linear habit utility. The chapter also discusses how labor income and undiversifiable shocks should be included in the consumer price index. In the special case where real interest rates are Gaussian and real market prices of risk are deterministic, the chapter shows that CRRA investors hedge with a single real bond, with a utility of terminal wealth (real zero-coupon bond maturing at the horizon), and with a utility of intermediate consumption (a bond with continuous coupons proportional to the expected future real consumption rate under the forward martingale measure). The results are illustrated by two examples: (i) non-Markovian HJM term structure dynamics, (ii) stochastic volatility and excess returns in the stock market.

10. Mogens Steffensen. Differential Systems in Finance and Life Insurance.

Financial and life insurance mathematics share a common problem of valuation of future payment streams. However, the valuation principles (no arbitrage and diversification) differ because risks differ. In both financial and life insurance mathematics, the valuation problem reduces to calculating conditional expected values and the extrema of these values. If the risk process is Markovian, expected values can be characterized by solutions to systems of deterministic differential equations. Deterministic differential systems also appear in financial and life insurance decision-making. They characterize optimal expected values of future utility and optimal decisions. For both valuation and optimization, we derive some classical examples from finance and life insurance and generalize to situations that are relevant in both fields. We study valuation with participation and early exercise options, applications of the linear regulator, and generalized consumption problems. The collection of results and proofs demonstrates both the similarities and the small but important differences in the various problems.

11. Paul A. de Hek. Uncertain Technological Change and Capital Mobility.

Unpredictable variations in economic productivity may have a positive or negative effect on the average growth rate of output. This theoretical ambiguity result is not solely determined by the value of the elasticity of intertemporal substitution. The growth-uncertainty relationship depends on two factors: whether returns to scale in knowledge creation are increasing or non-increasing, and whether the elasticity of intertemporal substitution (of profits) is higher or lower than some critical value. Empirical studies concerning these two factors indicate that unpredictable variations in economic productivity have a negative effect on the average long-run growth rate.

12. Nils Chr. Framstad. Stochastic Control, Non-Depletion of Renewable Resources, and Intertemporal Substitution.

For a wide class of models concerning the optimal extraction of a renewable resource, it is well known that an expected profit maximizer with an infinite horizon does not deplete the resource completely if its relative growth rate is strictly greater than the discount rate. This principle is extended to preferences that have intertemporal substitution in direct utility rates and that exhibit risk aversion (or risk neutrality) sufficiently close to zero. For a CRRA utility, the effect of intertemporal substitution is seen more clearly. The model in this chapter is an Itô process driven by semi-martingales.
13. Klaus Wälde. Capital Accumulation in a Growth Model with Creative Destruction.

Capital accumulation and creative destruction are modeled together with risk-averse households. The novel aspect, risk-averse households, allows the use of well-known models not only for analyzing long-run growth as in the literature, but also short-run fluctuations. The model remains analytically tractable because of a very convenient property of household investment decisions in this stochastic setup.

14. Tapio Palokangas. Employment Cycles in a Growth Model with Creative Destruction.

This chapter constructs a model that would explain economic growth with fluctuations in output and employment. The particular features of the model are the following. There is creative destruction in the sense that a new technology renders an old technology obsolete. There are efficiency wages in R&D. In production, there is union-employer bargaining over wages. The firms can increase the probability of a technological change for themselves by R&D. Learning-by-investment increases the productivity of labor in the consumption-good sector in proportion to the expected accumulation of capital. The main results are: In the long run, the economy follows a balanced-growth path that satisfies Kaldor's stylized facts. Wages in production grow on average in proportion to the level of productivity. The economy generates cycles around this long-run equilibrium. Capital stock swings up and down due to endogenous technological shocks. Because union-firm bargaining keeps real wages in proportion to the level of productivity, the labor-capital ratio is fixed and employment swings in proportion to capital stock. Thus, a stationary state equilibrium is characterized by involuntary unemployment, employment cycles and stable real wages in production.
Part I: Developments in Stochastic Dynamics
Chapter 1 Fractional Brownian Motion in Finance
Bernt Øksendal Center of Mathematics for Applications (CMA) Department of Mathematics, University of Oslo, and Norwegian School of Economics and Business Administration
1.1 Introduction

How can we model (as a function of time)
(i) the levels of a river?
(ii) the characters of solar activity?
(iii) the widths of consecutive annual rings of a tree?
(iv) the outdoor temperature at a given point?
(v) the values of the log returns $h_n$, defined by
$$h_n = \log \frac{S(t_n)}{S(t_{n-1})},$$
where $S(t)$ is the observed price at time $t$ of a given stock?

And how can we model
(vi) the turbulence in an incompressible fluid flow?
(vii) the electricity price in a liberalized electricity market?
The answer in all these cases is: By using a fractional Brownian motion! The examples (i)–(iii), (v) and (vi) are taken from Shiryaev (1999), example (iv) is from Brody, Syroka and Zervos (2002), and example (vii) is from Simonsen (2003). This amazing range of potential applications makes fractional Brownian motion an interesting object to study.

1.2 Framework and definitions
Fractional Brownian motion is defined as follows:

Definition 1. Let $H \in (0, 1)$ be a constant. The (1-parameter) fractional Brownian motion (fBm) with Hurst parameter $H$ is the Gaussian process $B_H(t) = B_H(t, \omega)$, $t \in \mathbb{R}$, $\omega \in \Omega$, satisfying
$$B_H(0) = E[B_H(t)] = 0 \quad \text{for all } t \in \mathbb{R} \tag{1}$$
and
$$E[B_H(s)B_H(t)] = \tfrac{1}{2}\left\{|s|^{2H} + |t|^{2H} - |s - t|^{2H}\right\}; \quad s, t \in \mathbb{R}. \tag{2}$$
Here $E$ denotes the expectation with respect to the probability law $P$ for $\{B_H(t)\}_{t\in\mathbb{R}} = \{B_H(t, \omega);\ t \in \mathbb{R},\ \omega \in \Omega\}$, where $(\Omega, \mathcal{F})$ is a measurable space.

If $H = \tfrac{1}{2}$ then $B_H(t)$ coincides with the classical Brownian motion, denoted by $B(t)$. If $H > \tfrac{1}{2}$ then $B_H(t)$ is persistent, in the sense that
$$\rho_n := E[B_H(1)\cdot(B_H(n+1) - B_H(n))] > 0 \quad \text{for all } n = 1, 2, \ldots \tag{3}$$
and
$$\sum_{n=1}^{\infty} \rho_n = \infty. \tag{4}$$
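The sign of $\rho_n$ in (3) can be checked directly from the covariance (2), since by bilinearity $\rho_n = E[B_H(1)B_H(n+1)] - E[B_H(1)B_H(n)]$. The following is a minimal Python sketch, not part of the original chapter; the function names `fbm_cov` and `rho` are illustrative choices.

```python
def fbm_cov(s, t, H):
    """Covariance E[B_H(s) B_H(t)] from equation (2)."""
    return 0.5 * (abs(s) ** (2 * H) + abs(t) ** (2 * H) - abs(s - t) ** (2 * H))


def rho(n, H):
    """rho_n = E[B_H(1) (B_H(n+1) - B_H(n))], by bilinearity of the covariance."""
    return fbm_cov(1, n + 1, H) - fbm_cov(1, n, H)


if __name__ == "__main__":
    # Print the first few rho_n for an antipersistent, a classical
    # and a persistent choice of the Hurst parameter.
    for H in (0.3, 0.5, 0.7):
        print(H, [round(rho(n, H), 4) for n in range(1, 6)])
```

Written out, $\rho_n = \tfrac12[(n+1)^{2H} - 2n^{2H} + (n-1)^{2H}]$, a second difference of $x \mapsto x^{2H}$. This function is convex for $H > \tfrac12$, linear for $H = \tfrac12$ and concave for $H < \tfrac12$, which explains the three sign patterns the sketch prints.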
If $H < \tfrac{1}{2}$ then $B_H(t)$ is anti-persistent, in the sense that
$$\rho_n < 0 \quad \text{for all } n = 1, 2, \ldots \tag{5}$$
In the examples of Section 1.1, fBm is applied with $H > \tfrac{1}{2}$ in (i)–(v) and with $H < \tfrac{1}{2}$ in (vi) and (vii).

Another important property of fBm is self-similarity: For any $H \in (0, 1)$ and $\alpha > 0$ the law of $\{B_H(\alpha t)\}_{t\in\mathbb{R}}$ is the same as the law of $\{\alpha^H B_H(t)\}_{t\in\mathbb{R}}$.

In order to be able to apply fBm to study the situations above we need a stochastic calculus for fBm. However, if $H \neq \tfrac{1}{2}$ then $B_H(t)$ is not a semimartingale, so one cannot use the general theory of stochastic calculus for semimartingales on $B_H(t)$. For example, it is not a priori clear what a stochastic integral of the form
$$\int_0^T \phi(t, \omega)\,dB_H(t)$$
should mean. The two most common constructions of such a stochastic integral are the following:

1.2.1 The pathwise or forward integral

This integral is denoted by
$$\int_0^T \phi(t, \omega)\,d^-B_H(t).$$
If the integrand $\phi(t, \omega)$ is càglàd (left-continuous with right-sided limits) then this integral can be defined by Riemann sums, as follows: Let $0 = t_0 < t_1 < \cdots < t_N = T$ be a partition of $[0, T]$. Put $\Delta t_k = t_{k+1} - t_k$ and define
$$\int_0^T \phi(t, \omega)\,d^-B_H(t) = \lim_{\Delta t_k \to 0} \sum_{k=0}^{N-1} \phi(t_k)\cdot(B_H(t_{k+1}) - B_H(t_k)), \tag{6}$$
if the limit exists (e.g. in probability). See Theorem 5. Note that with this definition the integration takes place with respect to $t$ for each fixed "path" $\omega \in \Omega$. Therefore this integral is often called the pathwise integral. Using a classical integration theory due to Young one can prove that the pathwise integral (6) exists if the $p$-variation of $t \mapsto \phi(t, \omega)$ is finite for all $p > (1 - H)^{-1}$. See Norvaisa (2000) and the references therein. Since $t \mapsto B_H(t)$ has finite $q$-variation iff $q \geq \tfrac{1}{H}$, we see that if $H < \tfrac{1}{2}$ then this theory does not even include integrals like
$$\int_0^T B_H(t)\,d^-B_H(t).$$
For this reason one often assumes that $H > \tfrac{1}{2}$ when dealing with forward integrals with respect to $B_H(t)$. In general
$$E\left[\int_0^T \phi(t, \omega)\,d^-B_H(t)\right] \neq 0, \tag{7}$$
even if the forward integral belongs to $L^1(P)$. For $H > \tfrac{1}{2}$ the forward integral obeys Stratonovich type of integration rules. For example, if $f \in C^1(\mathbb{R})$ and
$$X_t := \int_0^t \phi(s, \omega)\,d^-B_H(s) \quad \text{exists for all } t > 0$$
then
$$f(X_t) = f(0) + \int_0^t f'(X_s)\,d^-X_s, \tag{8}$$
where $d^-X_s = \phi(s, \omega)\,d^-B_H(s)$. (See e.g. Norvaisa (2000) and also Theorem 22.) For this reason the forward integral is also sometimes called the Stratonovich integral with respect to fBm. As a special case of (8) we note that
$$\int_0^T B_H(t)\,d^-B_H(t) = \tfrac{1}{2}B_H^2(T) \quad \text{for } H > \tfrac{1}{2}. \tag{9}$$
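The Riemann sums in (6) and the special case (9) can be explored numerically. The following is a minimal sketch under stated assumptions, not the chapter's own method: it samples fBm on a uniform grid by a Cholesky factorization of the covariance (2), a standard exact technique for Gaussian vectors, and uses only the Python standard library. All function names are hypothetical.

```python
import math
import random


def fbm_cov(s, t, H):
    """Covariance of fBm from equation (2)."""
    return 0.5 * (abs(s) ** (2 * H) + abs(t) ** (2 * H) - abs(s - t) ** (2 * H))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - acc)
            else:
                low[i][j] = (a[i][j] - acc) / low[j][j]
    return low


def sample_fbm(H, T, N, rng):
    """One sample of (B_H(t_0), ..., B_H(t_N)) on the uniform grid t_k = k T / N."""
    times = [T * (k + 1) / N for k in range(N)]
    cov = [[fbm_cov(s, t, H) for t in times] for s in times]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(N)]
    values = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(N)]
    return [0.0] + values  # B_H(0) = 0 by (1)


def forward_sum(phi, path):
    """Riemann sum from (6): sum_k phi(t_k) (B_H(t_{k+1}) - B_H(t_k))."""
    return sum(phi[k] * (path[k + 1] - path[k]) for k in range(len(path) - 1))


if __name__ == "__main__":
    path = sample_fbm(H=0.75, T=1.0, N=200, rng=random.Random(0))
    approx = forward_sum(path[:-1], path)  # integrand phi = B_H itself
    print(approx, 0.5 * path[-1] ** 2)
```

For any path the two printed numbers differ by exactly half the sum of squared increments. The expectation of that sum is $T^{2H} N^{1-2H}$, which tends to zero as $N$ grows when $H > \tfrac12$, in line with (9).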
Moreover, a slight extension of (8) gives that the unique solution $X_t$ of the fractional forward stochastic differential equation
$$d^-X(t) = \alpha(t, \omega)X(t)\,dt + \beta(t, \omega)X(t)\,d^-B_H(t); \quad X(0) = x > 0 \tag{10}$$
is
$$X(t) = x \exp\left(\int_0^t \alpha(s, \omega)\,ds + \int_0^t \beta(s, \omega)\,d^-B_H(s)\right) \quad \text{for } H > \tfrac{1}{2}, \tag{11}$$
provided that the integrals on the right hand side exist.

1.2.2 The Skorohod (Wick-Itô) integral

This integral is denoted by
$$\int_0^T \phi(t, \omega)\,\delta B_H(t).$$
It may be defined in terms of Riemann sums, as follows:
$$\int_0^T \phi(t, \omega)\,\delta B_H(t) = \lim_{\Delta t_k \to 0} \sum_{k=0}^{N-1} \phi(t_k) \diamond (B_H(t_{k+1}) - B_H(t_k)), \tag{12}$$
where $\diamond$ denotes the Wick product (see Theorem 3). Thus the difference between this integral and the forward integral is the use of the Wick product instead of the ordinary product in the Riemann sums (12) and (6), respectively.

The Skorohod integral behaves in many ways like the Itô integral of classical Brownian motion. For example, we have
$$E\left[\int_0^T \phi(t, \omega)\,\delta B_H(t)\right] = 0 \tag{13}$$
if the integral belongs to $L^2(P)$. Moreover, if $f \in C^2(\mathbb{R})$ then we have the following Itô type formula
$$f(B_H(t)) = f(0) + \int_0^t f'(B_H(s))\,\delta B_H(s) + H\int_0^t f''(B_H(s))\,s^{2H-1}\,ds, \tag{14}$$
valid for all $H \in (0, 1)$, provided that the left hand side and the last term on the right hand side both belong to $L^2(P)$ (see Biagini, Øksendal, Sulem and Wallner 2004).¹

¹ See also Bender (2003a), Elliott and van der Hoek (2003), Hu (2003) and Mishura (2002) for related results. In Duncan, Hu and Pasik-Duncan (2000) and Biagini and Øksendal (2004) Itô formulae for more general processes are proved, but valid only for $H > \tfrac{1}{2}$.
Note that as a special case of (14) we get
$$\int_0^T B_H(t)\,\delta B_H(t) = \tfrac{1}{2}B_H^2(T) - \tfrac{1}{2}T^{2H}, \quad H \in (0, 1). \tag{15}$$
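The step from (14) to (15) is left implicit in the text. A short worked derivation, taking $f(x) = \tfrac12 x^2$ in (14) with $t = T$, might read as follows.

```latex
% Choose f(x) = x^2/2, so that f(0) = 0, f'(x) = x and f''(x) = 1.
\begin{align*}
\tfrac{1}{2} B_H^2(T)
  &= \int_0^T B_H(s)\,\delta B_H(s) + H \int_0^T s^{2H-1}\,ds \\
  &= \int_0^T B_H(s)\,\delta B_H(s) + H \cdot \frac{T^{2H}}{2H} \\
  &= \int_0^T B_H(s)\,\delta B_H(s) + \tfrac{1}{2} T^{2H}.
\end{align*}
% Rearranging gives (15).
```

Compared with the forward integral in (9), the Skorohod integral of $B_H$ against itself is smaller by the deterministic correction $\tfrac12 T^{2H}$.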
The Wick-Skorohod-Itô analogue of (10) is the equation
$$\delta X(t) = \alpha(t, \omega)X(t)\,dt + \beta(t, \omega)X(t)\,\delta B_H(t); \quad X(0) = x > 0. \tag{16}$$
Assume that $\alpha(t, \omega) = \alpha$ and $\beta(t, \omega) = \beta$ are constants. Then by a slight extension of the Itô formula (14) one obtains that the unique solution of (16) is
$$X(t) = x \exp\left(\beta B_H(t) + \alpha t - \tfrac{1}{2}\beta^2 t^{2H}\right); \quad H \in (0, 1). \tag{17}$$
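One way to see what the correction term $-\tfrac12\beta^2 t^{2H}$ in (17) does is to look at the mean of $X(t)$. The sketch below is illustrative only and uses hypothetical function names; it relies on the fact, which follows from (2) with $s = t$, that $B_H(t)$ is Gaussian with mean zero and variance $t^{2H}$.

```python
import math
import random


def wick_ito_solution(x, alpha, beta, H, t, b_h):
    """X(t) from (17), given the value b_h of B_H(t)."""
    return x * math.exp(beta * b_h + alpha * t - 0.5 * beta ** 2 * t ** (2 * H))


def forward_solution(x, alpha, beta, t, b_h):
    """X(t) from (11) with constant coefficients (stated there for H > 1/2)."""
    return x * math.exp(alpha * t + beta * b_h)


def mean_by_simulation(x, alpha, beta, H, t, n_samples, rng):
    """Monte Carlo estimate of E[X(t)] for (17), using B_H(t) ~ N(0, t^{2H})."""
    sd = t ** H  # standard deviation of B_H(t), from (2) with s = t
    total = 0.0
    for _ in range(n_samples):
        total += wick_ito_solution(x, alpha, beta, H, t, rng.gauss(0.0, sd))
    return total / n_samples


if __name__ == "__main__":
    estimate = mean_by_simulation(1.0, 0.1, 0.3, 0.7, 2.0, 20000, random.Random(0))
    print(estimate, math.exp(0.1 * 2.0))  # estimate versus x * exp(alpha * t)
```

By the Gaussian moment generating function, $E[e^{\beta B_H(t)}] = e^{\frac12\beta^2 t^{2H}}$, so the Wick-Itô solution (17) has mean $x e^{\alpha t}$, whereas the forward solution (11) with constant coefficients has the larger mean $x e^{\alpha t + \frac12\beta^2 t^{2H}}$.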
Note that if $H = \tfrac{1}{2}$ then the formulas (15) and (17) reduce to the formulas obtained by the Itô formula for the classical Brownian motion.

Later in this chapter we will give a more detailed discussion about these two types of integration and their use in finance (Section 1.5). But first we recall the mathematical foundation of fractional Brownian motion calculus based on white noise theory (Sections 1.3 and 1.4).

1.3 Classical white noise theory and Hida-Malliavin calculus
In this section we give a brief review of some fundamental concepts and results from classical white noise theory. We refer to Holden, Øksendal, Ubøe and Zhang (1996), Hida, Kuo, Potthoff and Streit (1993), and Kuo (1996) for more information.

Definition 2. Let $\mathcal{S}(\mathbb{R})$ be the Schwartz space of rapidly decreasing smooth functions on $\mathbb{R}$ and let $\Omega := \mathcal{S}'(\mathbb{R})$ be its dual, often called the space of tempered distributions. Then by the Bochner-Minlos theorem there exists a unique probability measure $P$ on the Borel subsets of $\Omega$ such that
$$\int_\Omega e^{i\langle \omega, f\rangle}\,dP(\omega) = e^{-\frac{1}{2}\|f\|^2_{L^2(\mathbb{R})}}; \quad f \in \mathcal{S}(\mathbb{R}) \tag{18}$$
where $i = \sqrt{-1}$, $\|f\|^2_{L^2(\mathbb{R})} = \int_{\mathbb{R}} f(x)^2\,dx$ and $\langle \omega, f\rangle = \omega(f)$ denotes the action of $\omega \in \Omega = \mathcal{S}'(\mathbb{R})$ on $f \in \mathcal{S}(\mathbb{R})$. This measure $P$ is called the white noise probability measure.
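From (18) it follows that
$$E[\langle \omega, f\rangle] = 0 \quad \text{for all } f \in \mathcal{S}(\mathbb{R}), \tag{19}$$
where
$$E[\langle \omega, f\rangle] = E_P[\langle \omega, f\rangle] = \int_\Omega \langle \omega, f\rangle\,dP(\omega)$$
denotes the expectation of $\langle \omega, f\rangle$ with respect to $P$. Moreover, (18) implies the isometry
$$E[\langle \omega, f\rangle^2] = \|f\|^2_{L^2(\mathbb{R})} \quad \text{for all } f \in \mathcal{S}(\mathbb{R}). \tag{20}$$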
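The passage from (18) to (19) and (20) can be made explicit. A brief sketch of the argument, using only the linearity of $f \mapsto \langle \omega, f\rangle$ and the standard form of the Gaussian characteristic function, might read as follows.

```latex
% Apply (18) to \lambda f with \lambda \in \mathbb{R}; note \lambda f \in \mathcal{S}(\mathbb{R})
% and \langle \omega, \lambda f \rangle = \lambda \langle \omega, f \rangle.
\[
E\bigl[e^{i\lambda\langle \omega, f\rangle}\bigr]
  = \exp\Bigl(-\tfrac{1}{2}\lambda^2 \|f\|^2_{L^2(\mathbb{R})}\Bigr),
  \qquad \lambda \in \mathbb{R}.
\]
% The right-hand side is the characteristic function of N(0, \|f\|^2), hence
\[
\langle \omega, f\rangle \sim N\bigl(0, \|f\|^2_{L^2(\mathbb{R})}\bigr),
\qquad
E[\langle \omega, f\rangle] = 0,
\qquad
E[\langle \omega, f\rangle^2] = \|f\|^2_{L^2(\mathbb{R})}.
\]
```

In other words, under $P$ each pairing $\langle \omega, f\rangle$ is a centered Gaussian random variable whose variance is the squared $L^2(\mathbb{R})$ norm of $f$.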
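Using (19) and (20) we can extend the definition of $\langle \omega, f\rangle$ from $\mathcal{S}(\mathbb{R})$ to $L^2(\mathbb{R})$ as follows: If $f \in L^2(\mathbb{R})$ define
$$\langle \omega, f\rangle = \lim_{n\to\infty} \langle \omega, f_n\rangle \quad (\text{limit in } L^2(P)) \tag{21}$$
where $f_n \in \mathcal{S}(\mathbb{R})$ and $f_n \to f$ in $L^2(\mathbb{R})$. (It follows from (20) that the limit in (21) exists in $L^2(P)$ and is independent of the choice of the approximating sequence $\{f_n\}_{n=1}^{\infty} \subset \mathcal{S}(\mathbb{R})$.) In particular, we can for each $t \in \mathbb{R}$ define
$$\tilde{B}(t) := \tilde{B}(t, \omega) := \langle \omega, \chi_{[0,t]}(\cdot)\rangle \tag{22}$$
where
$$\chi_{[0,t]}(s) = \begin{cases} 1 & \text{if } 0 \leq s \leq t \\ -1 & \text{if } t \leq s \leq 0, \text{ except } t = s = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{23}$$
By Kolmogorov's continuity theorem it can be proved that $\tilde{B}(t)$ has a continuous version, which we will denote by $B(t)$. Then we see that $B(t)$ is a continuous Gaussian process with mean
$$B(0) = E[B(t)] = 0 \quad \text{for all } t \tag{24}$$
and covariance
$$E[B(t_1)B(t_2)] = \int_{\mathbb{R}} \chi_{[0,t_1]}(s)\chi_{[0,t_2]}(s)\,ds = \begin{cases} \min(|t_1|, |t_2|) & \text{if } t_1 t_2 > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{25}$$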
Bernt Øksendal Therefore B(t) is a (classical) Brownian motion with respect to P . Suppose f (t) = ak X[tk ,tk+1 ) (t) k
is a step function, where t1 < t2 < · · · < tN and ak ∈ R. Then by (22) and linearity we get ak ω, X[tk ,tk+1 ) (·) = ak (B(tk+1 ) − B(tk ))
ω, f = k
k
f (t)dB(t).
= R
By taking limits of such step functions we obtain that for all f ∈ L2 (R).
ω, f = f (t)dB(t)
(26)
R
In the following we let

$$h_n(x) := (-1)^n e^{\frac{x^2}{2}}\frac{d^n}{dx^n}\,e^{-\frac{x^2}{2}}; \quad n=0,1,2,\dots \tag{27}$$

be the Hermite polynomials and we let

$$\xi_n(x) := \pi^{-\frac14}\,((n-1)!)^{-\frac12}\,h_{n-1}(\sqrt2\,x)\,e^{-\frac{x^2}{2}}; \quad n=1,2,\dots \tag{28}$$

be the Hermite functions. Then $\{\xi_n\}_{n=1}^\infty$ constitutes an orthonormal basis for $L^2(\mathbb{R})$. The first Hermite polynomials are: $h_0(x)=1$, $h_1(x)=x$, $h_2(x)=x^2-1$, $h_3(x)=x^3-3x$, ...

Let $\mathcal{J}$ be the set of all multi-indices $\alpha=(\alpha_1,\alpha_2,\dots)$ of finite length (i.e. $\alpha_k=0$ for all $k$ large enough), with $\alpha_i\in\mathbb{N}\cup\{0\}=\{0,1,2,\dots\}$ for all $i$. For $\alpha=(\alpha_1,\dots,\alpha_m)\in\mathcal{J}$ define

$$H_\alpha(\omega) = h_{\alpha_1}(\langle\omega,\xi_1\rangle)\,h_{\alpha_2}(\langle\omega,\xi_2\rangle)\cdots h_{\alpha_m}(\langle\omega,\xi_m\rangle). \tag{29}$$
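The definitions (27) and (28) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `h`, `xi` and `gram_matrix`, the integration window and the grid size are illustrative choices and not part of the text. It evaluates the Hermite polynomials through NumPy's probabilists' Hermite module and approximates the inner products $(\xi_m,\xi_n)_{L^2(\mathbb{R})}$ by a Riemann sum.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He  # probabilists' Hermite polynomials


def h(n, x):
    """Hermite polynomial h_n of (27), evaluated at x."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select the single basis polynomial He_n
    return He.hermeval(x, coeffs)


def xi(n, x):
    """Hermite function xi_n of (28), for n = 1, 2, ..."""
    norm = math.pi ** -0.25 / math.sqrt(math.factorial(n - 1))
    return norm * h(n - 1, math.sqrt(2.0) * x) * np.exp(-x ** 2 / 2.0)


def gram_matrix(n_max, half_width=12.0, n_points=24001):
    """Approximate the inner products (xi_m, xi_n) in L^2(R) by a Riemann sum.

    The window [-half_width, half_width] and the grid size are assumptions
    chosen so that the Gaussian tails are negligible for small n_max.
    """
    x = np.linspace(-half_width, half_width, n_points)
    dx = x[1] - x[0]
    values = np.array([xi(n, x) for n in range(1, n_max + 1)])
    return values @ values.T * dx
```

Under these assumptions `gram_matrix(5)` should be close to the 5 x 5 identity matrix, which is the orthonormality claimed for $\{\xi_n\}$.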
For example, if we put

$$\varepsilon^{(k)} = (0,0,\dots,1)\in\mathbb{R}^k \quad(\text{the } k\text{'th unit vector}) \tag{30}$$

then we see that

$$H_{\varepsilon^{(k)}}(\omega) = h_1(\langle\omega,\xi_k\rangle) = \langle\omega,\xi_k\rangle = \int_{\mathbb{R}} \xi_k(t)\,dB(t). \tag{31}$$
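It is a fundamental fact that the family $\{H_\alpha\}_{\alpha\in\mathcal{J}}$ constitutes an orthogonal basis for $L^2(P)$. Indeed, we have:

Theorem 1. (The Wiener-Itô chaos expansion (I)). Let $F\in L^2(P)$. Then there exists a unique family $\{c_\alpha\}_{\alpha\in\mathcal{J}}$ of constants $c_\alpha\in\mathbb{R}$ such that

$$F(\omega) = \sum_{\alpha\in\mathcal{J}} c_\alpha H_\alpha(\omega) \quad(\text{convergence in } L^2(P)). \tag{32}$$

Moreover, we have the isometry

$$E[F^2] = \sum_{\alpha\in\mathcal{J}} c_\alpha^2\,\alpha! \tag{33}$$

where $\alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_m!$ if $\alpha=(\alpha_1,\dots,\alpha_m)\in\mathcal{J}$.

Example 1. For each $t\in\mathbb{R}$ the random variable $F(\omega)=B(t,\omega)$ belongs to $L^2(P)$. Its chaos expansion is

$$\begin{aligned} B(t) &= \langle\omega,\mathcal{X}_{[0,t]}(\cdot)\rangle = \Big\langle\omega,\sum_{k=1}^\infty (\mathcal{X}_{[0,t]},\xi_k)_{L^2(\mathbb{R})}\,\xi_k\Big\rangle\\ &= \sum_{k=1}^\infty (\mathcal{X}_{[0,t]},\xi_k)_{L^2(\mathbb{R})}\,\langle\omega,\xi_k\rangle = \sum_{k=1}^\infty \int_0^t \xi_k(s)\,ds\; H_{\varepsilon^{(k)}}(\omega), \end{aligned} \tag{34}$$

where in general

$$(f,g)_{L^2(\mathbb{R})} = \int_{\mathbb{R}} f(t)g(t)\,dt.$$

We now use Theorem 1 to define stochastic test functions and stochastic distributions, as follows: In the following we use the notation

$$(2\mathbb{N})^\gamma := (2\cdot1)^{\gamma_1}(2\cdot2)^{\gamma_2}\cdots(2\cdot m)^{\gamma_m} \tag{35}$$

if $\gamma=(\gamma_1,\dots,\gamma_m)\in\mathcal{J}$.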
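Definition 3. a) The space $(\mathcal{S})$ of Hida test functions is the set of all $\psi\in L^2(P)$ whose expansion

$$\psi(\omega) = \sum_{\alpha\in\mathcal{J}} a_\alpha H_\alpha(\omega)$$

satisfies

$$\sum_{\alpha\in\mathcal{J}} a_\alpha^2\,\alpha!\,(2\mathbb{N})^{k\alpha} < \infty \quad\text{for all } k=1,2,\dots \tag{36}$$

b) The space $(\mathcal{S})^*$ of Hida distributions is the set of all formal expansions

$$G(\omega) = \sum_{\alpha\in\mathcal{J}} b_\alpha H_\alpha(\omega)$$

such that

$$\sum_{\alpha\in\mathcal{J}} b_\alpha^2\,\alpha!\,(2\mathbb{N})^{-q\alpha} < \infty \quad\text{for some } q\in\mathbb{N}. \tag{37}$$

We equip $(\mathcal{S})$ with the projective topology and $(\mathcal{S})^*$ with the inductive topology. Then $(\mathcal{S})^*$ becomes the dual of $(\mathcal{S})$ and the action of $G\in(\mathcal{S})^*$ on $\psi\in(\mathcal{S})$ is given by

$$\langle G,\psi\rangle = \langle G,\psi\rangle_{(\mathcal{S})^*,(\mathcal{S})} = \sum_{\alpha\in\mathcal{J}} \alpha!\,a_\alpha b_\alpha. \tag{38}$$

Note that

$$(\mathcal{S})\subset L^2(P)\subset(\mathcal{S})^*. \tag{39}$$

Moreover, if $G\in L^2(P)$ then

$$\langle G,\psi\rangle = E[G\cdot\psi] \quad\text{for all } \psi\in(\mathcal{S}). \tag{40}$$

Definition 4. (Integration in $(\mathcal{S})^*$). Suppose $Z:\mathbb{R}\to(\mathcal{S})^*$ has the property that

$$\langle Z(t),\psi\rangle\in L^2(\mathbb{R},dt) \quad\text{for all } \psi\in(\mathcal{S}).$$

Then the integral $\int_{\mathbb{R}} Z(t)\,dt$ is defined to be the unique element of $(\mathcal{S})^*$ such that

$$\Big\langle\int_{\mathbb{R}} Z(t)\,dt,\psi\Big\rangle = \int_{\mathbb{R}} \langle Z(t),\psi\rangle\,dt \quad\text{for all } \psi\in(\mathcal{S}). \tag{41}$$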
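Such functions $Z(t)$ are called integrable in $(\mathcal{S})^*$.

Example 2. (White noise). Define

$$W(t) = \sum_{k=1}^\infty \xi_k(t)\,H_{\varepsilon^{(k)}}(\omega); \quad t\in\mathbb{R}. \tag{42}$$

Then by Definition 3.b we see that $W(t)\in(\mathcal{S})^*$ for all $t$. Moreover

$$\int_0^t W(s)\,ds = \sum_{k=1}^\infty \int_0^t \xi_k(s)\,ds\; H_{\varepsilon^{(k)}}(\omega) = B(t), \tag{43}$$

by Example 1. In other words, the function $t\to B(t)$ is differentiable in $(\mathcal{S})^*$ and

$$\frac{d}{dt}B(t) = W(t) \quad\text{in } (\mathcal{S})^*. \tag{44}$$

This justifies the name white noise for $W(t)$.

We now recall the definition of the Wick product, which was originally introduced by the physicist G. Wick in the early 1950's as a renormalization operation in quantum physics, but has later turned out to be central in stochastic analysis as well:

Definition 5. (The Wick product). Let

$$F(\omega) = \sum_{\alpha\in\mathcal{J}} a_\alpha H_\alpha(\omega)\in(\mathcal{S})^* \quad\text{and}\quad G(\omega) = \sum_{\beta\in\mathcal{J}} b_\beta H_\beta(\omega)\in(\mathcal{S})^*.$$

Then the Wick product of $F$ and $G$, $F\diamond G$, is defined by

$$(F\diamond G)(\omega) = \sum_{\alpha,\beta\in\mathcal{J}} a_\alpha b_\beta H_{\alpha+\beta}(\omega) = \sum_{\gamma\in\mathcal{J}}\Big(\sum_{\alpha+\beta=\gamma} a_\alpha b_\beta\Big) H_\gamma(\omega). \tag{45}$$

One can easily verify that the Wick product is a commutative, associative and distributive (over addition) binary operation on both $(\mathcal{S})$ and on $(\mathcal{S})^*$. Moreover, note that

$$F\diamond G = F\cdot G \quad\text{if either } F \text{ or } G \text{ is deterministic.} \tag{46}$$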
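Example 3. If

$$F(\omega) = \int_{\mathbb{R}} f(t)\,dB(t) \quad\text{and}\quad G(\omega) = \int_{\mathbb{R}} g(t)\,dB(t)$$

with $f,g\in L^2(\mathbb{R})$ (deterministic), then

$$F\diamond G = F\cdot G - (f,g), \tag{47}$$

where $(f,g) = (f,g)_{L^2(\mathbb{R})}$.

Proof: Using (45) and that $h_2(x)=x^2-1$ we get

$$\begin{aligned} F\diamond G &= \langle\omega,f\rangle\diamond\langle\omega,g\rangle = \Big(\sum_{k=1}^\infty (f,\xi_k)\langle\omega,\xi_k\rangle\Big)\diamond\Big(\sum_{\ell=1}^\infty (g,\xi_\ell)\langle\omega,\xi_\ell\rangle\Big)\\ &= \sum_{k,\ell=1}^\infty (f,\xi_k)(g,\xi_\ell)\,H_{\varepsilon^{(k)}+\varepsilon^{(\ell)}}\\ &= \sum_{k\ne\ell} (f,\xi_k)(g,\xi_\ell)\,H_{\varepsilon^{(k)}}H_{\varepsilon^{(\ell)}} + \sum_{k=1}^\infty (f,\xi_k)(g,\xi_k)\,h_2(\langle\omega,\xi_k\rangle)\\ &= \sum_{k,\ell=1}^\infty (f,\xi_k)(g,\xi_\ell)\,H_{\varepsilon^{(k)}}H_{\varepsilon^{(\ell)}} - \sum_{k=1}^\infty (f,\xi_k)(g,\xi_k)\\ &= \langle\omega,f\rangle\cdot\langle\omega,g\rangle - (f,g). \end{aligned}$$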
One reason for the importance of the Wick product is the following result [we refer to Holden, Øksendal, Ubøe and Zang (1996) for a proof and more information]:

Theorem 2. Suppose that $Y(t,\omega)$ is a stochastic process which is Skorohod integrable. Then $Y(t)\diamond W(t)$ is integrable in $(\mathcal{S})^*$ and

$$\int_{\mathbb{R}} Y(t)\,\delta B(t) = \int_{\mathbb{R}} Y(t)\diamond W(t)\,dt, \tag{48}$$

where the left hand side denotes the Skorohod integral of $Y(\cdot)$ with respect to $B(\cdot)$.

The Skorohod integral is an extension of the classical Itô integral, in the sense that if $Y(t,\omega)$ is measurable w.r.t. the $\sigma$-algebra $\mathcal{F}_t$ generated by $B(s,\omega)$; $s\le t$, for all $t$ (i.e. if $Y(\cdot)$ is $\mathcal{F}_t$-adapted) and

$$E\Big[\int_0^T Y^2(t,\omega)\,dt\Big] < \infty, \tag{49}$$

then

$$\int_0^T Y(t)\,\delta B(t) = \int_0^T Y(t)\,dB(t), \quad\text{the classical Itô integral.} \tag{50}$$

The integral on the right hand side of (48) may exist even if $Y$ is not Skorohod integrable. Therefore we may regard the right hand side of (48) as an extension of the Skorohod integral and we call it the extended Skorohod integral. We will use the same notation

$$\int_{\mathbb{R}} Y(t)\,\delta B(t)$$

for the extended Skorohod integral.

Example 4. Using Wick calculus in $(\mathcal{S})^*$ we get

$$\int_0^T B(T)\,\delta B(t) = \int_0^T B(T)\diamond W(t)\,dt = B(T)\diamond\int_0^T W(t)\,dt = B(T)\diamond B(T) = B^2(T) - T, \tag{51}$$

by Example 3 with $f=g=\mathcal{X}_{[0,T]}$.

The following result gives a useful interpretation of the Skorohod integral as a limit of Riemann sums:

Theorem 3. Let $Y:[0,T]\to(\mathcal{S})^*$ be a caglad function, i.e. $Y(t)$ is left-continuous with right sided limits. Then $Y$ is Skorohod integrable over $[0,T]$ and

$$\int_{\mathbb{R}} Y(t)\,\delta B(t) = \lim_{\Delta t_j\to0}\sum_{j=0}^{N-1} Y(t_j)\diamond(B(t_{j+1})-B(t_j)) \tag{52}$$

where the limit is taken in $(\mathcal{S})^*$ and $0=t_0<t_1<\dots<t_N=T$ is a partition of $[0,T]$, $\Delta t_j = t_{j+1}-t_j$, $j=0,\dots,N-1$.

Proof: This is an easy consequence of Theorem 2.
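We also note the following:

Theorem 4. Let $Y:\mathbb{R}\to(\mathcal{S})^*$. Suppose $Y(t)$ has the expansion

$$Y(t) = \sum_{\alpha\in\mathcal{J}} c_\alpha(t)\,H_\alpha(\omega); \quad t\in\mathbb{R}$$

where $c_\alpha\in L^2(\mathbb{R})$ for all $\alpha\in\mathcal{J}$. Then

$$\int_{\mathbb{R}} Y(t)\,\delta B(t) = \sum_{\alpha\in\mathcal{J}}\sum_{k\in\mathbb{N}} (c_\alpha,\xi_k)\,H_{\alpha+\varepsilon^{(k)}}(\omega), \tag{53}$$

provided that the right hand side converges in $(\mathcal{S})^*$. In particular, if

$$\int_{\mathbb{R}} Y(t)\,\delta B(t)\in L^2(P)$$

then

$$E\Big[\int_{\mathbb{R}} Y(t)\,\delta B(t)\Big] = 0. \tag{54}$$

1.3.1 The forward integral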
We have already noted that the Skorohod integral is an extension of the classical Itô integral to integrands which are not necessarily adapted. There is another natural extension of this type, called the forward integral, which we now define:

Definition 6. The forward integral of a function $Y:\mathbb{R}\to(\mathcal{S})^*$ is defined by

$$\int_{\mathbb{R}} Y(t)\,d^-B(t) = \lim_{\varepsilon\to0}\int_{\mathbb{R}} Y(t)\,\frac{B(t+\varepsilon)-B(t)}{\varepsilon}\,dt,$$

provided that the limit exists in $(\mathcal{S})^*$.

We refer to Nualart and Pardoux (1988), and Russo and Vallois (2000) for more information about the forward integral. At this stage we will settle with the following result, which gives an easy comparison with the Skorohod integral (see Theorem 3).

Theorem 5. Suppose that $Y:[0,T]\to(\mathcal{S})^*$ is caglad and forward integrable over $[0,T]$. Then

$$\int_0^T Y(t)\,d^-B(t) = \lim_{\Delta t_j\to0}\sum_{j=0}^{N-1} Y(t_j)\cdot(B(t_{j+1})-B(t_j)) \quad(\text{limit in } (\mathcal{S})^*). \tag{55}$$

Proof: This follows by a Fubini argument. See e.g. (2.24) in Biagini and Øksendal (2004) for a proof.

We say that $X(t)$ is a forward Itô process if

$$X(t) = x + \int_0^t u(s,\omega)\,ds + \int_0^t v(s,\omega)\,d^-B(s); \quad t\ge0 \tag{56}$$

for some measurable processes $u(s,\omega), v(s,\omega)\in\mathbb{R}$ (not necessarily adapted) such that

$$\int_0^t |u(s,\omega)|\,ds < \infty \tag{57}$$

and the Itô forward integral

$$\int_0^t v(s,\omega)\,d^-B(s) \tag{58}$$

exists for all $t>0$. In that case we use the shorthand notation

$$d^-X(t) = u(t)\,dt + v(t)\,d^-B(t); \quad X(0)=x \tag{59}$$

for the integral equation (56). For such processes we have the following Itô formula:

Theorem 6 (Itô formula for forward processes; Russo and Vallois (2000)). Let $f\in C^2(\mathbb{R})$ and define $Y(t)=f(X(t))$. Then $Y(t)$ is a forward Itô process and

$$d^-Y(t) = f'(X(t))\,d^-X(t) + \tfrac12 f''(X(t))\,v^2(t)\,dt. \tag{60}$$

1.3.2 Stochastic differentiation
We now make use of our explicit knowledge of the space $\Omega=\mathcal{S}'(\mathbb{R})$ to define differentiation with respect to $\omega$, as follows:

Definition 7. a) Let $F:\Omega\to\mathbb{R}$, $\gamma\in L^2(\mathbb{R})$. Then the directional derivative of $F$ in the direction $\gamma$ is defined by

$$D_\gamma F(\omega) = \lim_{\varepsilon\to0}\frac{F(\omega+\varepsilon\gamma)-F(\omega)}{\varepsilon} \tag{61}$$

provided that the limit exists in $(\mathcal{S})^*$.

b) Suppose there exists a function $\psi:\mathbb{R}\to(\mathcal{S})^*$ such that

$$D_\gamma F(\omega) = \int_{\mathbb{R}} \psi(t)\gamma(t)\,dt \quad\text{for all } \gamma\in L^2(\mathbb{R}). \tag{62}$$

Then we say that $F$ is differentiable and we call $\psi(t)$ the stochastic gradient of $F$ (or the Hida-Malliavin derivative of $F$). We use the notation $D_tF=\psi(t)$ for the stochastic gradient of $F$ at $t\in\mathbb{R}$. Note that, in spite of the notation, $D_tF$ is not a derivative w.r.t. $t$ but (a kind of) derivative w.r.t. $\omega\in\Omega$.

Example 5. Suppose

$$F(\omega) = \langle\omega,f\rangle = \int_{\mathbb{R}} f(s)\,dB(s)$$

for some $f\in L^2(\mathbb{R})$. Then by linearity

$$D_\gamma F(\omega) = \lim_{\varepsilon\to0}\frac1\varepsilon\big[\langle\omega+\varepsilon\gamma,f\rangle-\langle\omega,f\rangle\big] = \langle\gamma,f\rangle = \int_{\mathbb{R}} f(t)\gamma(t)\,dt$$

for all $\gamma\in L^2(\mathbb{R})$. We conclude that $F$ is differentiable and

$$D_t\Big(\int_{\mathbb{R}} f(s)\,dB(s)\Big) = f(t) \quad\text{for a.a. } t. \tag{63}$$

(Note that this is only valid for deterministic integrands $f$. See Theorem 11 for the general case.)
We note two useful chain rules for stochastic differentiation:

Theorem 7. (Chain rule I). Let $\varphi:\mathbb{R}^n\to\mathbb{R}$ be a Lipschitz continuous function, i.e. there exists $C<\infty$ such that

$$|\varphi(x)-\varphi(y)| \le C|x-y| \quad\text{for all } x,y\in\mathbb{R}^n.$$

Let $X=(X_1,\dots,X_n)$ where each $X_i:\Omega\to\mathbb{R}$ is differentiable. Then $\varphi(X)$ is differentiable and

$$D_t\varphi(X) = \sum_{k=1}^n \frac{\partial\varphi}{\partial x_k}(X)\,D_tX_k. \tag{64}$$

We refer to Nualart (1995) for a proof.

If $f(x)=\sum_{m=0}^\infty a_m x^m$ is a real analytic function and $X\in(\mathcal{S})^*$ we put

$$f^\diamond(X) = \sum_{m=0}^\infty a_m X^{\diamond m}, \tag{65}$$

provided the sum converges in $(\mathcal{S})^*$. We call $f^\diamond(X)$ the Wick version of $f(X)$. A similar definition applies to real analytic functions on $\mathbb{R}^n$.

Theorem 8. (The Wick chain rule). Let $f:\mathbb{R}^n\to\mathbb{R}$ be real analytic and let $X=(X_1,\dots,X_n)\in((\mathcal{S})^*)^n$. Then if $f^\diamond(X)\in(\mathcal{S})^*$

$$D_t(f^\diamond(X)) = \sum_{k=1}^n \Big(\frac{\partial f}{\partial x_k}\Big)^{\diamond}(X)\diamond D_tX_k; \quad t\in\mathbb{R}. \tag{66}$$

We refer to Biagini, Øksendal, Sulem and Wallner (2004) for a proof. Note that by Example 5 and the chain rule (64) we have

$$D_tH_\alpha(\omega) = \sum_{i=1}^m \alpha_i\,H_{\alpha-\varepsilon^{(i)}}(\omega)\,\xi_i(t)\in(\mathcal{S})^* \quad\text{for all } t. \tag{67}$$

In fact, using the topology for $(\mathcal{S})^*$ one can prove:
Theorem 9. Let $F\in(\mathcal{S})^*$. Then $F$ is differentiable, and if $F$ has the expansion

$$F(\omega) = \sum_{\alpha\in\mathcal{J}} c_\alpha H_\alpha(\omega)$$

then

$$D_tF(\omega) = \sum_{\alpha,i} c_\alpha\,\alpha_i\,H_{\alpha-\varepsilon^{(i)}}(\omega)\,\xi_i(t) \quad\text{for all } t\in\mathbb{R}. \tag{68}$$

The stochastic gradient is the key to the connection between forward integrals and Skorohod integrals:

Theorem 10. Suppose $Y:\mathbb{R}\to(\mathcal{S})^*$ is caglad. Then

$$\int_0^T Y(t)\,d^-B(t) = \int_0^T Y(t)\,\delta B(t) + \int_0^T D_{t+}Y(t)\,dt \quad\text{for all } T>0, \tag{69}$$

provided that the integrals exist, where $D_{t+}Y(t) = \lim_{s\to t+} D_sY(t)$.

We now mention without proofs some of the most fundamental results from stochastic differential and integral calculus. For proofs we refer to Nualart and Pardoux (1988) and Biagini, Øksendal, Sulem and Wallner (2004).

Theorem 11. (Fundamental theorem of stochastic calculus). Suppose $Y(\cdot):\mathbb{R}\to(\mathcal{S})^*$ and $D_tY(\cdot):\mathbb{R}\to(\mathcal{S})^*$ are Skorohod integrable. Then

$$D_t\Big(\int_{\mathbb{R}} Y(s)\,\delta B(s)\Big) = \int_{\mathbb{R}} D_tY(s)\,\delta B(s) + Y(t). \tag{70}$$

Theorem 12. (Relation between the Wick product and the ordinary product). Suppose $g\in L^2(\mathbb{R})$ is deterministic and that $F\in L^2(P)$. Then

$$F\diamond\int_{\mathbb{R}} g(t)\,dB(t) = F\cdot\int_{\mathbb{R}} g(t)\,dB(t) - \int_{\mathbb{R}} g(t)\,D_tF\,dt. \tag{71}$$

Corollary 1. Let $g\in L^2(\mathbb{R})$ be deterministic and $F\in L^2(P)$. Then

$$E\Big[F\cdot\int_{\mathbb{R}} g(t)\,dB(t)\Big] = E\Big[\int_{\mathbb{R}} g(t)\,D_tF\,dt\Big] \tag{72}$$

provided that the integrals converge.
Theorem 13. (Integration by parts). Let $F\in L^2(P)$ and assume that $Y:\mathbb{R}\times\Omega\to\mathbb{R}$ is Skorohod integrable with

$$\int_{\mathbb{R}} Y(t)\,\delta B(t)\in L^2(P).$$

Then

$$F\int_{\mathbb{R}} Y(t)\,\delta B(t) = \int_{\mathbb{R}} F\,Y(t)\,\delta B(t) + \int_{\mathbb{R}} Y(t)\,D_tF\,dt \tag{73}$$

provided that the integral on the extreme right converges in $L^2(P)$.

This immediately gives the following generalization of Corollary 1:

Corollary 2. Let $F$ and $Y(t)$ be as in Theorem 13. Then

$$E\Big[F\int_{\mathbb{R}} Y(t)\,\delta B(t)\Big] = E\Big[\int_{\mathbb{R}} Y(t)\,D_tF\,dt\Big]. \tag{74}$$

Theorem 14. (The Itô-Skorohod isometry). Suppose that $Y:\mathbb{R}\times\Omega\to\mathbb{R}$ is Skorohod integrable with

$$\int_{\mathbb{R}} Y(t)\,\delta B(t)\in L^2(P).$$

Then

$$E\Big[\Big(\int_{\mathbb{R}} Y(t)\,\delta B(t)\Big)^2\Big] = E\Big[\int_{\mathbb{R}} Y^2(t)\,dt\Big] + E\Big[\int_{\mathbb{R}}\int_{\mathbb{R}} D_tY(s)\,D_sY(t)\,ds\,dt\Big]. \tag{75}$$
Using Theorem 12 we obtain the following relation between forward integrals and Skorohod integrals:

Theorem 15. Suppose that $Y:[0,T]\to(\mathcal{S})^*$ is caglad and Skorohod integrable over $[0,T]$. Moreover, suppose that

$$\int_0^T D_{t+}Y(t)\,dt$$

exists, where $D_{t+}Y(t)=\lim_{s\to t+}D_sY(t)$. Then

$$\int_0^T Y(t)\,d^-B(t) = \int_0^T Y(t)\,\delta B(t) + \int_0^T D_{t+}Y(t)\,dt. \tag{76}$$
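The relation (76) can be illustrated on the anticipating integrand $Y(t)=B(T)$ of Example 4. By (63) with $f=\mathcal{X}_{[0,T]}$ the stochastic gradient of $B(T)$ equals 1 on $[0,T]$, so the correction term in (76) is $T$, while the forward Riemann sums of Theorem 5 telescope to $B^2(T)$ on every path. The sketch below, assuming NumPy, checks this on one simulated path; the helper names, the grid size and the seed are hypothetical choices made only for illustration.

```python
import numpy as np


def forward_riemann_sum(integrand, path):
    """Sum of Y(t_j) * (B(t_{j+1}) - B(t_j)) with ordinary products, as in (55)."""
    increments = np.diff(path)
    return float(np.sum(integrand[:-1] * increments))


def simulate_brownian_path(T, n_steps, seed=0):
    """Sample B on a uniform grid of [0, T]; the seed is an arbitrary choice."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(steps)))


T, n_steps = 2.0, 1000
B = simulate_brownian_path(T, n_steps)
Y = np.full(n_steps + 1, B[-1])             # anticipating integrand Y(t) = B(T)
forward_value = forward_riemann_sum(Y, B)   # telescopes to B(T)^2 on this path
skorohod_value = B[-1] ** 2 - T             # closed form (51)
correction = forward_value - skorohod_value # should equal T, as in (76)
```

The difference between the two integrals does not depend on the sampled path, which is what (76) predicts for this integrand.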
1.4 Fractional stochastic calculus
We now consider the corresponding calculus for fractional Brownian motion $B_H(t)$ with arbitrary Hurst parameter $H\in(0,1)$. It turns out that it is possible to transform the calculus for $B(t)$ into the calculus for $B_H(t)$ by means of an operator $M$. This is the idea of Elliott and Van der Hoek (2003), which we now describe. The approach of Elliott and Van der Hoek (2003) represents an extension to all $H\in(0,1)$ of the fractional white noise calculus for $H\in(\tfrac12,1)$ introduced by Hu and Øksendal (2003). For details we refer to Elliott and Van der Hoek (2003), Hu and Øksendal (2003), Biagini, Øksendal, Sulem and Wallner (2004) and Biagini, Hu, Øksendal and Zang. See also Decreusefond and Üstünel (1998), and Nualart (2004) for an alternative approach.

Definition 8. For $H\in(0,1)$ put

$$c_H = \Big[2\,\Gamma(H-\tfrac12)\cos\Big(\frac{\pi}{2}(H-\tfrac12)\Big)\Big]^{-1}\big[\Gamma(2H+1)\sin(\pi H)\big]^{1/2} \tag{77}$$

where $\Gamma(\cdot)$ is the Gamma function. Define the operator $M=M^H$ on $\mathcal{S}(\mathbb{R})$ by

$$\widehat{Mf}(y) = c_H\,|y|^{\frac12-H}\,\hat f(y); \quad f\in\mathcal{S}(\mathbb{R}), \tag{78}$$

where in general

$$\hat g(y) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-ixy}g(x)\,dx$$

is the Fourier transform of $g$. Let $L^2_H(\mathbb{R})$ be the closure of $\mathcal{S}(\mathbb{R})$ in the norm

$$\|f\|^2_{L^2_H(\mathbb{R})} = (Mf,Mf)_{L^2(\mathbb{R})} = \int_{\mathbb{R}} (Mf(x))^2\,dx; \quad f\in\mathcal{S}(\mathbb{R}). \tag{79}$$

Then the operator $M$ extends in a natural way to an isometry between the two Hilbert spaces $L^2(\mathbb{R})$ and $L^2_H(\mathbb{R})$. Note that

$$(\widehat{Mf},\widehat{Mg}) = (Mf,Mg) = (f,M^2g) \quad\text{for } f,g\in L^2_H(\mathbb{R}). \tag{80}$$

Now define

$$\tilde B_H(t) = \tilde B_H(t,\omega) = \langle\omega,M\mathcal{X}_{[0,t]}\rangle. \tag{81}$$
Then by Section 1.3 we see that $\tilde B_H(t)$ is a Gaussian process with mean 0 and covariance

$$E[\tilde B_H(s)\tilde B_H(t)] = (M\mathcal{X}_{[0,s]},M\mathcal{X}_{[0,t]}) = (\mathcal{X}_{[0,s]},\mathcal{X}_{[0,t]})_{L^2_H(\mathbb{R})} = \tfrac12\big(|s|^{2H}+|t|^{2H}-|s-t|^{2H}\big), \tag{82}$$

by (A.10) in Elliott and Van der Hoek (2003). Therefore $\tilde B_H(t)$ has a continuous version, denoted by $B_H(t)$, which is a fractional Brownian motion with Hurst coefficient $H$.

Arguing as in Section 1.3 we see that if

$$f(t) = \sum_j a_j\,\mathcal{X}_{[t_j,t_{j+1})}(t)$$

is a (deterministic) step function, then

$$\langle\omega,Mf\rangle = \sum_j a_j\,(B_H(t_{j+1})-B_H(t_j)) = \int_{\mathbb{R}} f(t)\,dB_H(t).$$

On the other hand, we know that

$$\langle\omega,Mf\rangle = \int_{\mathbb{R}} Mf(t)\,dB(t).$$

Therefore

$$\int_{\mathbb{R}} f(t)\,dB_H(t) = \int_{\mathbb{R}} Mf(t)\,dB(t) \tag{83}$$

for all step functions $f$, and hence for all $f\in L^2_H(\mathbb{R})$. The chaos expansion of $B_H(t)\in L^2(P)$ is

$$\begin{aligned} B_H(t) &= \langle\omega,M\mathcal{X}_{[0,t]}\rangle = \Big\langle\omega,\sum_{k=1}^\infty (M\mathcal{X}_{[0,t]},\xi_k)\,\xi_k\Big\rangle\\ &= \sum_{k=1}^\infty (\mathcal{X}_{[0,t]},M\xi_k)\,\langle\omega,\xi_k\rangle = \sum_{k=1}^\infty \int_0^t M\xi_k(s)\,ds\; H_{\varepsilon^{(k)}}(\omega). \end{aligned} \tag{84}$$

Therefore, if we define fractional white noise $W_H(t)$ by

$$W_H(t) = \sum_{k=1}^\infty M\xi_k(t)\,H_{\varepsilon^{(k)}}(\omega), \tag{85}$$

then $W_H(t)\in(\mathcal{S})^*$ and

$$\frac{dB_H(t)}{dt} = W_H(t) \quad\text{in } (\mathcal{S})^*. \tag{86}$$
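The covariance (82) determines the law of $B_H$ on any finite grid, which gives a simple way to experiment with fractional Brownian motion numerically. The following is a minimal sketch, assuming NumPy; it samples $B_H$ by a Cholesky factorization of the covariance matrix, which is a generic Gaussian sampling technique and not the white noise construction of the text. The function names, the seed and the restriction to strictly positive, distinct grid times are assumptions made for illustration.

```python
import numpy as np


def fbm_covariance(s, t, hurst):
    """Covariance (82): 0.5 * (|s|^{2H} + |t|^{2H} - |s - t|^{2H})."""
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    two_h = 2.0 * hurst
    return 0.5 * (np.abs(s) ** two_h + np.abs(t) ** two_h - np.abs(s - t) ** two_h)


def sample_fbm(times, hurst, n_paths=1, seed=0):
    """Draw Gaussian vectors with covariance (82) on a grid of distinct positive times."""
    times = np.asarray(times, dtype=float)
    cov = fbm_covariance(times[:, None], times[None, :], hurst)
    chol = np.linalg.cholesky(cov)            # cov = chol @ chol.T
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_paths, times.size)) @ chol.T
```

For $H=\tfrac12$ and $s,t>0$ the function `fbm_covariance` reduces to $\min(s,t)$, in agreement with (25).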
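In view of this and Theorem 2 the following definition is natural:

Definition 9. The Skorohod integral of a function $Y:\mathbb{R}\to(\mathcal{S})^*$ with respect to $B_H(t)$ is defined by

$$\int_{\mathbb{R}} Y(t)\,\delta B_H(t) = \int_{\mathbb{R}} Y(t)\diamond W_H(t)\,dt, \tag{87}$$

provided that $Y(t)\diamond W_H(t)$ is integrable in $(\mathcal{S})^*$.

We can in a natural way extend the $M$-operator to functions $Y:\mathbb{R}\to(\mathcal{S})^*$ whose chaos expansion

$$Y(t) = \sum_{\alpha\in\mathcal{J}} c_\alpha(t)\,H_\alpha(\omega)$$

has coefficients $c_\alpha\in L^2_H(\mathbb{R})$, as follows:

$$MY(t) = \sum_{\alpha\in\mathcal{J}} Mc_\alpha(t)\,H_\alpha(\omega).$$

This is well-defined if the series converges in $(\mathcal{S})^*$. With this extension of $M$ we note that the connection between the classical white noise $W(t)$ and the fractional white noise $W_H(t)$ can be written

$$W_H(t) = MW(t); \quad t\in\mathbb{R}. \tag{88}$$

Combining this with Definition 9 we get

Theorem 16. Let $Y:\mathbb{R}\to(\mathcal{S})^*$. Suppose $Y(t)$ has the expansion

$$Y(t) = \sum_{\alpha\in\mathcal{J}} c_\alpha(t)\,H_\alpha(\omega); \quad t\in\mathbb{R}$$

where $c_\alpha(\cdot)\in L^2_H(\mathbb{R})$ for all $\alpha\in\mathcal{J}$. Then

$$\int_{\mathbb{R}} Y(t)\,\delta B_H(t) = \sum_{\alpha\in\mathcal{J}}\sum_{k\in\mathbb{N}} (c_\alpha,e_k)_{L^2_H(\mathbb{R})}\,H_{\alpha+\varepsilon^{(k)}}(\omega), \tag{89}$$

provided that the right hand side converges in $(\mathcal{S})^*$. (Here $e_k = M^{-1}\xi_k$, so that $(c_\alpha,e_k)_{L^2_H(\mathbb{R})} = (Mc_\alpha,\xi_k)$.)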
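Note in particular that if

$$\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\in L^2(P),$$

then

$$E\Big[\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\Big] = 0. \tag{90}$$

Proof:

$$\begin{aligned} \int_{\mathbb{R}} Y(t)\,\delta B_H(t) &= \int_{\mathbb{R}} Y(t)\diamond W_H(t)\,dt = \int_{\mathbb{R}} Y(t)\diamond\sum_{k=1}^\infty M\xi_k(t)\,H_{\varepsilon^{(k)}}(\omega)\,dt\\ &= \sum_{\alpha,k} (c_\alpha,M\xi_k)\,H_{\alpha+\varepsilon^{(k)}}(\omega) = \sum_{\alpha,k} (Mc_\alpha,\xi_k)\,H_{\alpha+\varepsilon^{(k)}}(\omega) \qquad (91)\\ &= \sum_{\alpha,k} (c_\alpha,e_k)_{L^2_H(\mathbb{R})}\,H_{\alpha+\varepsilon^{(k)}}(\omega). \qquad (92) \end{aligned}$$

We also note the following relation between the Skorohod integrals w.r.t. $B_H(\cdot)$ and $B(\cdot)$:

$$\int_{\mathbb{R}} Y(s)\,\delta B_H(s) = \int_{\mathbb{R}} M_sY(s)\,\delta B(s), \tag{93}$$

where $M_s$ indicates that $M$ is operating on the variable $s$. This follows from (92) and Theorem 4.

Example 6. What is $\int_0^T B_H(t)\,\delta B_H(t)$? We can answer this by using Wick calculus as in Example 4:

$$\begin{aligned} \int_0^T B_H(t)\,\delta B_H(t) &= \int_0^T B_H(t)\diamond W_H(t)\,dt = \int_0^T B_H(t)\diamond\frac{d}{dt}B_H(t)\,dt\\ &= \Big[\tfrac12 B_H^{\diamond2}(t)\Big]_0^T = \tfrac12 B_H^{\diamond2}(T) = \tfrac12 B_H^2(T) - \tfrac12 T^{2H}, \end{aligned} \tag{94}$$

because, by (81) and (47),

$$\begin{aligned} B_H^{\diamond2}(T) &= \langle\omega,M\mathcal{X}_{[0,T]}\rangle\diamond\langle\omega,M\mathcal{X}_{[0,T]}\rangle\\ &= \langle\omega,M\mathcal{X}_{[0,T]}\rangle\cdot\langle\omega,M\mathcal{X}_{[0,T]}\rangle - (M\mathcal{X}_{[0,T]},M\mathcal{X}_{[0,T]})\\ &= B_H(T)\cdot B_H(T) - (\mathcal{X}_{[0,T]},\mathcal{X}_{[0,T]})_{L^2_H(\mathbb{R})}\\ &= B_H^2(T) - T^{2H} \quad(\text{by (A.10) in Elliott and Van der Hoek (2003)}). \end{aligned} \tag{95}$$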
This result could also have been deduced from the following version of the Itô formula.

Theorem 17 (Itô formula for fractional Skorohod integrals; Biagini, Øksendal, Sulem and Wallner (2004), Theorem 3.8). Let $f(s,x):\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ belong to $C^{1,2}(\mathbb{R}\times\mathbb{R})$ and assume that the three random variables

$$f(t,B_H(t)), \quad \int_0^t \frac{\partial f}{\partial s}(s,B_H(s))\,ds \quad\text{and}\quad \int_0^t \frac{\partial^2 f}{\partial x^2}(s,B_H(s))\,s^{2H-1}\,ds$$

all belong to $L^2(P)$. Then

$$f(t,B_H(t)) = f(0,0) + \int_0^t \frac{\partial f}{\partial s}(s,B_H(s))\,ds + \int_0^t \frac{\partial f}{\partial x}(s,B_H(s))\,\delta B_H(s) + H\int_0^t \frac{\partial^2 f}{\partial x^2}(s,B_H(s))\,s^{2H-1}\,ds. \tag{96}$$

Proof: There are several versions of this result. See Mishura (2002), van der Hoek and Biagini, Øksendal, Sulem and Wallner (2004). This result is valid for all $H\in(0,1)$, but if we restrict ourselves to $\tfrac12<H<1$ there is a more general Itô formula in Duncan, Hu and Pasik-Duncan (2000) and Biagini and Øksendal (2004).

Example 7. Let $\alpha,\beta\ne0$ be constants. The fractional Skorohod equation

$$\delta Y(t) = \alpha Y(t)\,dt + \beta Y(t)\,\delta B_H(t); \quad Y(0)>0 \tag{97}$$

i.e.

$$Y(t) = Y(0) + \int_0^t \alpha Y(s)\,ds + \int_0^t \beta Y(s)\,\delta B_H(s); \quad t\ge0$$

has the unique solution

$$Y(t) = Y(0)\exp\big(\beta B_H(t) + \alpha t - \tfrac12\beta^2t^{2H}\big); \quad t>0. \tag{98}$$

This follows by applying Theorem 17 to the process $X(t) = \alpha t - \tfrac12\beta^2t^{2H} + \beta B_H(t)$ and the function $f(x) = Y(0)\exp x$. In analogy with the classical case we call this process $Y(t)$ the geometric Skorohod fractional Brownian motion. Note that if we put $H=\tfrac12$ we get the classical geometric Brownian motion.

We proceed to consider differentiation:

Definition 10. The Hida-Malliavin derivative $D_t^{(H)}$ (or stochastic gradient) of an element $F\in(\mathcal{S})^*$ is defined by

$$D_t^{(H)}F = M^{-1}D_tF; \quad t\in\mathbb{R}. \tag{99}$$

By Theorem 9 we see that if $F$ has the expansion

$$F(\omega) = \sum_{\alpha\in\mathcal{J}} c_\alpha H_\alpha(\omega)$$

then

$$D_t^{(H)}F = \sum_{\alpha\in\mathcal{J}}\sum_{i\in\mathbb{N}} c_\alpha\,\alpha_i\,H_{\alpha-\varepsilon^{(i)}}(\omega)\,e_i(t); \quad t\in\mathbb{R}. \tag{100}$$

We can now formulate the fractional analogue of Theorem 11:

Theorem 18 (Fractional fundamental theorem of calculus; Biagini, Øksendal, Sulem and Wallner (2004), Theorem 5.3). Suppose $Y(\cdot):\mathbb{R}\to(\mathcal{S})^*$ and $D_t^{(H)}Y(\cdot):\mathbb{R}\to(\mathcal{S})^*$ are Skorohod integrable w.r.t. $B_H$. Then

$$D_t^{(H)}\Big(\int_{\mathbb{R}} Y(s)\,\delta B_H(s)\Big) = \int_{\mathbb{R}} D_t^{(H)}Y(s)\,\delta B_H(s) + Y(t). \tag{101}$$
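Proof: By (99), (93) and Theorem 11 we get

$$\begin{aligned} D_t^{(H)}\Big(\int_{\mathbb{R}} Y(s)\,\delta B_H(s)\Big) &= M_t^{-1}D_t\Big(\int_{\mathbb{R}} M_sY(s)\,\delta B(s)\Big)\\ &= M_t^{-1}\int_{\mathbb{R}} D_t(M_sY(s))\,\delta B(s) + M_t^{-1}M_tY(t)\\ &= \int_{\mathbb{R}} M_t^{-1}D_t(M_sY(s))\,\delta B(s) + Y(t)\\ &= \int_{\mathbb{R}} D_t^{(H)}(M_sY(s))\,\delta B(s) + Y(t)\\ &= \int_{\mathbb{R}} M_s(D_t^{(H)}Y(s))\,\delta B(s) + Y(t)\\ &= \int_{\mathbb{R}} D_t^{(H)}Y(s)\,\delta B_H(s) + Y(t). \end{aligned}$$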
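Theorem 19 (Fractional integration by parts; Biagini, Øksendal, Sulem and Wallner (2004), Theorem 5.3). Let $F\in L^2(P)$ and assume that $Y:\mathbb{R}\times\Omega\to\mathbb{R}$ is Skorohod integrable w.r.t. $B_H$ with $\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\in L^2(P)$. Then

$$F\int_{\mathbb{R}} Y(t)\,\delta B_H(t) = \int_{\mathbb{R}} F\,Y(t)\,\delta B_H(t) + \int_{\mathbb{R}} Y(t)\,M_t^2D_t^{(H)}F\,dt. \tag{102}$$

Proof: By (93), Theorem 13 and (99) we get

$$\begin{aligned} F\int_{\mathbb{R}} Y(t)\,\delta B_H(t) &= F\int_{\mathbb{R}} M_tY(t)\,\delta B(t)\\ &= \int_{\mathbb{R}} F\,M_tY(t)\,\delta B(t) + \int_{\mathbb{R}} M_tY(t)\,D_tF\,dt\\ &= \int_{\mathbb{R}} M_t(F\,Y(t))\,\delta B(t) + \int_{\mathbb{R}} M_tY(t)\,M_tD_t^{(H)}F\,dt\\ &= \int_{\mathbb{R}} F\,Y(t)\,\delta B_H(t) + \int_{\mathbb{R}} Y(t)\,M_t^2D_t^{(H)}F\,dt. \end{aligned}$$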
Corollary 3. Let $F$ and $Y(t)$ be as in Theorem 19. Then

$$E\Big[F\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\Big] = E\Big[\int_{\mathbb{R}} Y(t)\,M_t^2D_t^{(H)}F\,dt\Big]. \tag{103}$$

We also note the following fractional version of Theorem 14:

Theorem 20 (The fractional Itô-Skorohod isometry; Elliott and Van der Hoek (2003)). Suppose $Y:\mathbb{R}\times\Omega\to\mathbb{R}$ is Skorohod-integrable with respect to $B_H$ with

$$\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\in L^2(P).$$

Then

$$E\Big[\Big(\int_{\mathbb{R}} Y(t)\,\delta B_H(t)\Big)^2\Big] = E\Big[\int_{\mathbb{R}} (MY(t))^2\,dt\Big] + E\Big[\int_{\mathbb{R}}\int_{\mathbb{R}} D_t^{(H)}M_s^2Y(s)\cdot D_s^{(H)}M_t^2Y(t)\,ds\,dt\Big]. \tag{104}$$

Proof: This follows by combining Theorem 14 with (93) and (99). We omit the details.

Finally we turn to the fractional forward integral. This is defined in the same way as in the classical case (Definition 6):

Definition 11. The forward integral of a function $Y:\mathbb{R}\to(\mathcal{S})^*$ with respect to $B_H(t)$ is defined by:

$$\int_{\mathbb{R}} Y(t)\,d^-B_H(t) = \lim_{\varepsilon\to0}\int_{\mathbb{R}} Y(t)\,\frac{B_H(t+\varepsilon)-B_H(t)}{\varepsilon}\,dt, \tag{105}$$

provided that the limit exists in $(\mathcal{S})^*$.
Just as in Theorem 5 we have:

Theorem 21. Suppose $Y:[0,T]\to(\mathcal{S})^*$ is caglad and forward integrable over $[0,T]$ w.r.t. $B_H(\cdot)$. Then

$$\int_0^T Y(t)\,d^-B_H(t) = \lim_{\Delta t_j\to0}\sum_{j=0}^{N-1} Y(t_j)\cdot(B_H(t_{j+1})-B_H(t_j)) \tag{106}$$

(limit in $(\mathcal{S})^*$).

Remark 1. In the special case when $Y=Y(t,\omega):[0,T]\times\Omega\to\mathbb{R}$ is a classical stochastic process (and $Y(t,\cdot)\in(\mathcal{S})^*$ for all $t$) and the limit in (105) exists for a.a. $\omega$, the forward integral of $Y$ coincides with the pathwise integral (or more precisely the left Young (LY) integral of $Y$) with respect to $dB_H(t)$. See Norvaisa (2000) for details. ∇

Definition 12. A function $Y:[0,T]\to(\mathcal{S})^*$ with expansion

$$Y(t) = \sum_{\alpha\in\mathcal{J}} c_\alpha(t)\,H_\alpha(\omega)$$

belongs to the space $\mathbb{D}^{(H)}_{1,2}$ if

$$\|Y\|^2_{\mathbb{D}^{(H)}_{1,2}} := \sum_{\alpha\in\mathcal{J}}\sum_{i=1}^\infty \alpha_i\,\alpha!\,(c_\alpha,\xi_i)^2 < \infty$$

where

$$(c_\alpha,\xi_i) = \int_0^T c_\alpha(s)\,\xi_i(s)\,ds.$$

The analogue of Theorem 15 is the following:

Theorem 22. Suppose that $Y:[0,T]\to(\mathcal{S})^*$ is cadlag and Skorohod integrable over $[0,T]$ w.r.t. $B_H(t)$. Moreover, suppose that $Y\in\mathbb{D}^{(H)}_{1,2}$. Then

$$\int_0^T \big[M_t^2D_t^{(H)}Y(u)\big]_{u=t}\,dt \quad\text{exists in } L^2(P)$$

and

$$\int_0^T Y(t)\,d^-B_H(t) = \int_0^T Y(t)\,\delta B_H(t) + \int_0^T \big[M_t^2D_t^{(H)}Y(u)\big]_{u=t}\,dt. \tag{107}$$
Proof: We refer to Biagini and Øksendal (2004) for details. See also Mishura (2002).

We end this section by giving an Itô formula for forward integrals w.r.t. fractional Brownian motion: A forward fractional Itô process is a process of the form

$$X(t) = x + \int_0^t u(s,\omega)\,ds + \int_0^t v(s,\omega)\,d^-B_H(s); \quad t\ge0 \tag{108}$$

where $u(s,\omega)$ and $v(s,\omega)$ are real-valued, measurable (not necessarily adapted) processes such that

$$\int_0^t |u(s,\omega)|\,ds < \infty \quad\text{and}\quad \int_0^t v(s,\omega)\,d^-B_H(s) \quad\text{exists a.e.}$$

In this case we use the shorthand notation

$$d^-X(t) = u(t)\,dt + v(t)\,d^-B_H(t); \quad X(0)=x. \tag{109}$$

Theorem 23. (An Itô formula for forward fractional processes). Suppose $H\in(\tfrac12,1)$. Let $f\in C^1(\mathbb{R})$ and put $Y(t)=f(X(t))$, where $X(t)$ is given by (108). Then

$$d^-Y(t) = f'(X(t))\,d^-X(t). \tag{110}$$

Proof: This is a classic result about forward (pathwise) integration. A direct proof can be found in Biagini and Øksendal (2004). See also Norvaisa (2000), Nualart (2004) and Russo and Vallois (2000) and the references therein. If $f$ possesses higher order regularity then a corresponding (but more complicated) Itô formula can be obtained for lower values of $H$. See e.g. Countin and Qian (2002) and Gradinaru, Nourdin, Russo and Vallois (2002).

Example 8. The fractional forward equation

$$d^-X(t) = \alpha X(t)\,dt + \beta X(t)\,d^-B_H(t); \quad X(0)=x>0 \tag{111}$$

has for $\tfrac12<H<1$ the unique solution

$$X(t) = x\exp(\beta B_H(t)+\alpha t); \quad t\ge0. \tag{112}$$

1.5 Summary of results
We now use the mathematical machinery described in the earlier sections to study finance models involving fBm. We have seen that there are two natural ways of defining integration with respect to fBm:
(a) The pathwise (forward) integration
(b) The Skorohod integration.
Therefore we discuss these two cases separately:

1.5.1 The pathwise integration model ($\tfrac12<H<1$)

For simplicity we concentrate on the simplest nontrivial type of market, namely on the fBm version of the classical Black-Scholes market, as follows: Suppose there are two investment possibilities:

(i) A safe or risk free investment, with price dynamics

$$dS_0(t) = rS_0(t)\,dt; \quad S_0(0)=1 \tag{113}$$

and

(ii) a risky investment, with price dynamics

$$d^-S_1(t) = \mu S_1(t)\,dt + \sigma S_1(t)\,d^-B_H(t); \quad S_1(0)=x>0, \tag{114}$$

where $r,\mu,\sigma\ne0$ and $x>0$ are constants. By Example 8 we know that the solution of this equation is

$$S_1(t) = x\exp(\sigma B_H(t)+\mu t); \quad t\ge0. \tag{115}$$

Let $\{\mathcal{F}_t^H\}_{t\ge0}$ be the filtration of $B_H(\cdot)$, i.e. $\mathcal{F}_t^H$ is the $\sigma$-algebra generated by the random variables $B_H(s)$, $s\le t$.
A portfolio in this market is a 2-dimensional $\mathcal{F}_t^H$-adapted stochastic process $\theta(t)=(\theta_0(t),\theta_1(t))$ where $\theta_i(t)$ gives the number of units of investment number $i$ held at time $t$, $i=0,1$. The corresponding wealth process $V^\theta(t)$ is defined by

$$V^\theta(t) = \theta(t)\cdot S(t) = \theta_0(t)S_0(t) + \theta_1(t)S_1(t), \tag{116}$$

where $S(t)=(S_0(t),S_1(t))$. We say that $\theta$ is pathwise self-financing if

$$d^-V^\theta(t) = \theta(t)\cdot d^-S(t) \tag{117}$$

i.e.

$$V^\theta(t) = V^\theta(0) + \int_0^t \theta_0(s)\,dS_0(s) + \int_0^t \theta_1(s)\,d^-S_1(s). \tag{118}$$

If, in addition, $V^\theta(t)$ is lower bounded, then we call the portfolio $\theta$ (pathwise) admissible.

Definition 13. A pathwise admissible portfolio $\theta$ is called an arbitrage if the corresponding wealth process $V^\theta(t)$ satisfies the following three conditions:

$$V^\theta(0) = 0 \tag{119}$$

$$V^\theta(T) \ge 0 \quad\text{a.s.} \tag{120}$$

$$P[V^\theta(T)>0] > 0. \tag{121}$$
Remark 2. The non-existence of arbitrage in a market is a basic equilibrium condition. It is not possible to make a sensible mathematical theory for a market with arbitrage. Therefore one of the first things to check in a mathematical finance model is whether arbitrages exist.

In the above pathwise fBm market the existence of arbitrage was proved by Rogers (1997). Subsequently several simple examples of arbitrage were found. See e.g. Dasgupta (1997), Salopek (1998) and Shiryaev (1999). Note, however, that the existence of arbitrage in this pathwise model is already a direct consequence of Theorem 7.2 in Delbaen and Schachermayer (1994): There it is proved in general that if there is no arbitrage using simple portfolios (with pathwise products), then the price process is a semimartingale. Hence, since the process $S_1(t)$ given by (114) is not a semimartingale, an arbitrage must exist.

Here is a simple arbitrage example, due to Dasgupta (1997) and Shiryaev (1998): For simplicity assume that

$$\mu = r \quad\text{and}\quad \sigma = x = 1. \tag{122}$$

Define

$$\theta_0(t) = 1-\exp(2B_H(t)), \quad \theta_1(t) = 2(\exp(B_H(t))-1). \tag{123}$$

Then the corresponding wealth process is

$$\begin{aligned} V^\theta(t) &= \theta_0(t)S_0(t) + \theta_1(t)S_1(t)\\ &= (1-\exp(2B_H(t)))\exp(rt) + 2(\exp(B_H(t))-1)\exp(B_H(t)+rt)\\ &= \exp(rt)(\exp(B_H(t))-1)^2 > 0 \quad\text{for a.a. } (t,\omega). \end{aligned} \tag{124}$$
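The algebra behind (124) holds for every value of $B_H(t)$, so it can be checked without simulating any path. The following is a minimal sketch, assuming NumPy; the argument `b` stands for a possible value of $B_H(t)$, and the function names are illustrative and not taken from the text.

```python
import numpy as np


def wealth(t, b, r):
    """theta_0 * S_0 + theta_1 * S_1 for the portfolio (123) under (122).

    The argument b stands for a value of B_H(t).
    """
    s0 = np.exp(r * t)                  # solution of (113)
    s1 = np.exp(b + r * t)              # (115) with mu = r, sigma = x = 1
    theta0 = 1.0 - np.exp(2.0 * b)
    theta1 = 2.0 * (np.exp(b) - 1.0)
    return theta0 * s0 + theta1 * s1


def closed_form(t, b, r):
    """Right-hand side of (124): exp(rt) * (exp(b) - 1)^2."""
    return np.exp(r * t) * (np.exp(b) - 1.0) ** 2
```

Evaluating both functions on a grid of values of `t` and `b` should give matching results, with zero initial wealth at `t = 0`, `b = 0`, in line with (119).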
This portfolio is self-financing, since

$$\begin{aligned} \theta_0(t)\,dS_0(t) + \theta_1(t)\,d^-S_1(t) &= (1-\exp(2B_H(t)))\,r\exp(rt)\,dt + 2(\exp(B_H(t))-1)\,S_1(t)\,[r\,dt + d^-B_H(t)]\\ &= r\exp(rt)(\exp(B_H(t))-1)^2\,dt + 2\exp(rt)(\exp(B_H(t))-1)\exp(B_H(t))\,d^-B_H(t)\\ &= d\big(\exp(rt)(\exp(B_H(t))-1)^2\big) = d^-V^\theta(t). \end{aligned}$$

∇

We have proved:

Theorem 24 (Dasgupta (1997) and Shiryaev (1999)). The portfolio $\theta(t)=(\theta_0(t),\theta_1(t))$ given by (123) is a (pathwise) arbitrage in the (pathwise) fractional Black-Scholes market given by (113), (114) and (122).

In view of this result the pathwise fBm model is not suitable in finance, at least not in this simple form (but possibly in combination with classical Brownian motion).
1.5.2 The Wick-Skorohod integration model ($0<H<1$)

We now consider the Wick-Skorohod integration version of the market (113)–(114). Mathematically the model below is an extension to $H\in(0,1)$ of the model introduced in Hu and Øksendal (2003) for $H\in(\tfrac12,1)$. (Subsequently a related model, also valid for all $H\in(0,1)$, was presented in Elliott and Van der Hoek (2003).) However, compared to Hu and Øksendal (2003) we give a different interpretation of the mathematical concepts involved: Assume that the values $S_0(t)$, $S_1(t)$ of the risk free (e.g. bond) and risky asset (e.g. stock), respectively, are given by

$$\text{(bond)}\quad dS_0(t) = rS_0(t)\,dt; \quad S_0(0)=1 \tag{125}$$

and

$$\text{(stock)}\quad \delta S_1(t) = \mu S_1(t)\,dt + \sigma S_1(t)\,\delta B_H(t); \quad S_1(0)=x>0 \tag{126}$$

where $r,\mu,\sigma\ne0$ and $x>0$ are constants. By Example 7 the solution of equation (126) is

$$S_1(t) = x\exp\big(\sigma B_H(t) + \mu t - \tfrac12\sigma^2t^{2H}\big); \quad t\ge0. \tag{127}$$
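One way to see the effect of the correction term $-\tfrac12\sigma^2t^{2H}$ in (127) is to compare expected values. Since $B_H(t)$ is Gaussian with mean 0 and variance $t^{2H}$ by (82), the mean of (127) should be $xe^{\mu t}$, while the mean of the pathwise solution (115) should be $xe^{\mu t+\frac12\sigma^2t^{2H}}$. The sketch below, assuming NumPy, checks this with Gauss-Hermite quadrature; the function names, the number of quadrature nodes and the parameter values in the usage note are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_expectation(func, std, n_nodes=40):
    """E[func(Z)] for Z ~ N(0, std^2), via Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)      # weight function exp(-z^2 / 2)
    return float(np.sum(weights * func(std * nodes)) / np.sqrt(2.0 * np.pi))


def mean_pathwise(t, x, mu, sigma, hurst):
    """E[S_1(t)] for the pathwise solution (115)."""
    std = t ** hurst                          # B_H(t) ~ N(0, t^{2H}) by (82)
    return gaussian_expectation(lambda b: x * np.exp(sigma * b + mu * t), std)


def mean_wick(t, x, mu, sigma, hurst):
    """E[S_1(t)] for the Wick-Skorohod solution (127)."""
    std = t ** hurst
    drift = mu * t - 0.5 * sigma ** 2 * t ** (2.0 * hurst)
    return gaussian_expectation(lambda b: x * np.exp(sigma * b + drift), std)
```

For example, with `t = 2.0`, `x = 1.5`, `mu = 0.1`, `sigma = 0.3` and `hurst = 0.7`, `mean_wick` should agree with `1.5 * exp(0.2)`, whereas `mean_pathwise` is larger by the factor `exp(0.5 * 0.09 * 2 ** 1.4)`.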
In this Wick-Skorohod model $S_1(t)$ does not represent the observed stock price at time $t$, but we give it a different interpretation: We assume that $S_1(t)$ represents in a broad sense the total value of the company and that it is not observed directly. Instead we adopt a quantum mechanical point of view, regarding $S_1(t,\omega)$ as a stochastic distribution in $\omega$ (represented mathematically as an element of $(\mathcal{S})^*$), and regarding the actual observed stock price $\hat S(t)$ as the result of applying $S_1(t,\cdot)\in(\mathcal{S})^*$ to a stochastic test function $\psi(\cdot)\in(\mathcal{S})$. In other words,

$$\hat S(t) := \langle S(t,\cdot),\psi(\cdot)\rangle = \langle S(t),\psi\rangle, \tag{128}$$

where in general $\langle F,\psi\rangle$ denotes the action of a stochastic distribution $F\in(\mathcal{S})^*$ on a stochastic test function $\psi\in(\mathcal{S})$. (See Section 1.3.) We call such stochastic test functions $\psi$ market observers. We will assume that they have the form

$$\psi(\omega) = \exp^\diamond\Big(\int_{\mathbb{R}} h(t)\,dB_H(t)\Big) = \exp\Big(\int_{\mathbb{R}} h(t)\,dB_H(t) - \tfrac12\|h\|^2_{L^2_H(\mathbb{R})}\Big) \tag{129}$$

for some $h\in L^2_H(\mathbb{R})$.
The set of all linear combinations of such $\psi$ is dense in both $(\mathcal{S})$ and $(\mathcal{S})^*$. Moreover, these $\psi$ are normalized, in the sense that

$$E\Big[\exp^\diamond\Big(\int_{\mathbb{R}} h(t)\,dB_H(t)\Big)\Big] = 1 \quad\text{for all } h\in L^2_H(\mathbb{R}). \tag{130}$$

We let $\mathcal{D}$ denote the set of all market observers of the form (129).

Similarly, a generalized portfolio is another adapted process

$$\theta(t) = \theta(t,\omega) = (\theta_0(t,\omega),\theta_1(t,\omega)); \quad (t,\omega)\in[0,T]\times\Omega$$

representing a general strategy for choosing the number of units of investment number $i$ at time $t$; $i=0,1$. (For example, $\theta_1(t)$ could be the usual "buy and hold" strategy, consisting of buying a certain number of stocks at a stopping time $\tau_1(\omega)$ and holding them until another stopping time $\tau_2(\omega)>\tau_1(\omega)$. Or $\theta_1(t)$ could be the strategy to hold a fixed fraction of the current wealth in stocks.) If the actual observed price at time $t$ is $\hat S_1(t) = \langle S_1(t,\cdot),\psi(\cdot)\rangle$, the actual number of stocks held is

$$\hat\theta_1(t) := \langle\theta_1(t,\cdot),\psi(\cdot)\rangle. \tag{131}$$

Thus the actual observed wealth $\hat V_1(t)$ held in the risky asset corresponding to this portfolio is

$$\hat V_1(t) = \langle\theta_1(t),\psi\rangle\cdot\langle S_1(t),\psi\rangle. \tag{132}$$

By Lemma 1 below this can be written

$$\hat V_1(t) = \langle\theta_1(t)\diamond S_1(t),\psi\rangle, \tag{133}$$

where $\diamond$ denotes the Wick product. In fact, $F := \theta_1(t)\diamond S_1(t)$ is the unique $F\in(\mathcal{S})^*$ such that

$$\langle F,\psi\rangle = \langle\theta_1(t),\psi\rangle\cdot\langle S_1(t),\psi\rangle \quad\text{for all } \psi\in\mathcal{D}. \tag{134}$$

In view of this it is natural to define the generalized total wealth process $V(t,\omega)$ associated to $\theta(t,\omega)$ by the Wick product

$$V(t,\cdot) = \theta(t,\cdot)\diamond S(t,\cdot) = \theta_0(t)S_0(t) + \theta_1(t)\diamond S_1(t). \tag{135}$$
Similarly, if we consider a discrete time market model and keep the generalized portfolio process

$$\theta(t) = \theta(t_k,\omega); \quad t_k\le t<t_{k+1}$$

constant from $t=t_k$ to $t=t_{k+1}$, the corresponding change in the generalized wealth process is

$$\Delta V(t_k) = \theta(t_k)\diamond\Delta S(t_k), \tag{136}$$

where

$$\Delta V(t_k) = V(t_{k+1})-V(t_k), \quad \Delta S(t_k) = S(t_{k+1})-S(t_k).$$

If we sum this over $k$ and take the limit as $\Delta t_k = t_{k+1}-t_k$ goes to 0, we end up with the following generalized wealth process formula

$$V(T) = V(0) + \int_0^T \theta(t)\diamond dS(t) = V(0) + \int_0^T \theta(t)\,\delta S(t), \tag{137}$$

where $\delta S(t)$ means that the integral is interpreted in the (Wick-Itô-) Skorohod sense. Therefore, by (125)–(126),

$$V(T) = V(0) + \int_0^T r\theta_0(t)S_0(t)\,dt + \int_0^T \mu\,\theta_1(t)\diamond S_1(t)\,dt + \int_0^T \sigma\,\theta_1(t)\diamond S_1(t)\,\delta B_H(t). \tag{138}$$
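We now prove the fundamental result which explains why the Wick product appears in (133) above:

Lemma 1. a) Let $F, G \in (\mathcal{S})^*$. Then
\[ \langle F \diamond G, \psi\rangle = \langle F, \psi\rangle \cdot \langle G, \psi\rangle \quad \text{for all } \psi \in \mathcal{D}. \tag{139} \]
b) Moreover, if $Z \in (\mathcal{S})^*$ is such that
\[ \langle Z, \psi\rangle = \langle F, \psi\rangle \cdot \langle G, \psi\rangle \quad \text{for all } \psi \in \mathcal{D}, \]
then
\[ Z = F \diamond G. \]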
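Proof: a) Choose
\[ \psi = \exp^{\diamond}\Bigl(\int_{\mathbb{R}} h(t)\,dB_H(t)\Bigr) \in \mathcal{D}. \]
We may assume that
\[ F = \exp^{\diamond}\Bigl(\int_{\mathbb{R}} f(t)\,dB_H(t)\Bigr) \quad \text{and} \quad G = \exp^{\diamond}\Bigl(\int_{\mathbb{R}} g(t)\,dB_H(t)\Bigr) \]
for some $f, g \in L^2_H(\mathbb{R})$, because the set of all linear combinations of such Wick exponentials is dense in $(\mathcal{S})^*$. For such $F, G, \psi$ we have
\[ \langle F, \psi\rangle = E[F \cdot \psi] \quad \text{and} \quad \langle G, \psi\rangle = E[G \cdot \psi]. \]
Therefore
\begin{align*}
\langle F \diamond G, \psi\rangle
&= E\Bigl[\exp^{\diamond}\Bigl(\int_{\mathbb{R}} (f+g)\,dB_H\Bigr) \cdot \exp^{\diamond}\Bigl(\int_{\mathbb{R}} h\,dB_H\Bigr)\Bigr] \\
&= E\Bigl[\exp\Bigl(\int_{\mathbb{R}} (f+g)\,dB_H - \tfrac12 \|f+g\|^2_{L^2_H(\mathbb{R})}\Bigr) \cdot \exp\Bigl(\int_{\mathbb{R}} h\,dB_H - \tfrac12 \|h\|^2_{L^2_H(\mathbb{R})}\Bigr)\Bigr] \\
&= E\Bigl[\exp\Bigl(\int_{\mathbb{R}} (f+g+h)\,dB_H - \tfrac12 \|f\|^2_{L^2_H(\mathbb{R})} - \tfrac12 \|g\|^2_{L^2_H(\mathbb{R})} - \tfrac12 \|h\|^2_{L^2_H(\mathbb{R})} - (f,g)_{L^2_H(\mathbb{R})}\Bigr)\Bigr] \\
&= E\Bigl[\exp\Bigl(\int_{\mathbb{R}} (f+g+h)\,dB_H - \tfrac12 \|f+g+h\|^2_{L^2_H(\mathbb{R})} + (f,h)_{L^2_H(\mathbb{R})} + (g,h)_{L^2_H(\mathbb{R})}\Bigr)\Bigr] \\
&= E\Bigl[\exp^{\diamond}\Bigl(\int_{\mathbb{R}} (f+g+h)\,dB_H\Bigr)\Bigr] \cdot \exp\bigl((f+g, h)_{L^2_H(\mathbb{R})}\bigr) \\
&= \exp\bigl((f+g, h)_{L^2_H(\mathbb{R})}\bigr). \tag{140}
\end{align*}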
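On the other hand, a similar computation gives
\begin{align*}
\langle F, \psi\rangle \cdot \langle G, \psi\rangle
&= E\Bigl[\exp^{\diamond}\Bigl(\int_{\mathbb{R}} f\,dB_H\Bigr) \cdot \exp^{\diamond}\Bigl(\int_{\mathbb{R}} h\,dB_H\Bigr)\Bigr] \cdot E\Bigl[\exp^{\diamond}\Bigl(\int_{\mathbb{R}} g\,dB_H\Bigr) \cdot \exp^{\diamond}\Bigl(\int_{\mathbb{R}} h\,dB_H\Bigr)\Bigr] \\
&= \exp\bigl((f,h)_{L^2_H(\mathbb{R})}\bigr) \cdot \exp\bigl((g,h)_{L^2_H(\mathbb{R})}\bigr) = \exp\bigl((f+g, h)_{L^2_H(\mathbb{R})}\bigr). \tag{141}
\end{align*}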
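The two computations (140) and (141) can be checked numerically in a one-dimensional toy setting. The sketch below is only an illustration under simplifying assumptions that are not part of the text: the space $L^2_H(\mathbb{R})$ is replaced by the real line, so that $f, g, h$ are scalars, $(f,h) = f h$, and $\int f\,dB_H$ is replaced by $fZ$ for a single standard normal variable $Z$. The helper names `wick_exp` and `gauss_expect` are hypothetical. Expectations are evaluated by Gauss-Hermite quadrature.

```python
import numpy as np

def wick_exp(c, z):
    """Wick exponential of c*Z for standard normal Z: exp(c*z - c^2/2)."""
    return np.exp(c * z - 0.5 * c * c)

def gauss_expect(values, weights):
    """E[phi(Z)] from probabilists' Gauss-Hermite weights (weight exp(-z^2/2))."""
    return float(np.dot(weights, values) / np.sqrt(2.0 * np.pi))

# Quadrature nodes and weights for the standard normal law.
nodes, weights = np.polynomial.hermite_e.hermegauss(60)

# Scalar stand-ins for f, g, h (assumed values, chosen for illustration).
f, g, h = 0.3, 0.5, 0.4

# Left side of (139): F <> G is the Wick exponential of (f + g) Z.
lhs = gauss_expect(wick_exp(f + g, nodes) * wick_exp(h, nodes), weights)

# Right side of (139): product of the two separate pairings.
rhs = (gauss_expect(wick_exp(f, nodes) * wick_exp(h, nodes), weights)
       * gauss_expect(wick_exp(g, nodes) * wick_exp(h, nodes), weights))

# Common closed form exp((f + g, h)) from (140) and (141).
closed_form = float(np.exp((f + g) * h))

print(lhs, rhs, closed_form)
```

In this toy setting all three printed numbers should agree up to quadrature error, which mirrors the comparison of (140) with (141).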
Comparing (140) and (141) we get a).
b) This follows from the fact that the set of linear combinations of elements of $\mathcal{D}$ is dense in $(\mathcal{S})$, and $(\mathcal{S})^*$ is the dual of $(\mathcal{S})$.

Remark 3. We emphasize that this model for fBm in finance does not a priori assume that the Wick product models the growth of wealth. In fact, the Wick product comes as a mathematical consequence of the basic assumption that the observed value is the result of applying a test function to a distribution process describing in a broad sense the value of a company. This way of thinking stems from microcosmos (quantum mechanics), but it has been argued that it is often a good description of macrocosmos situations as well. Here is an example: An agent from an opinion poll firm stops a man on the street and asks him what political party he would vote for if there was an election today. Often this man on the street does not really have a firm opinion about this beforehand (he is in a diffuse state of mind politically), but the contact with the agent forces him to produce an answer. In a similar sense the general state of a company does not really have a noted stock price a priori, but brings out a number (price) when confronted with a market observer (the stock market). ∇

In view of the above we now make the following definitions:

Definition 14. a) The total wealth process $V^\theta(t)$ corresponding to a portfolio $\theta(t)$ in the Wick-Skorohod model is defined by
\[ V^\theta(t) = \theta(t) \diamond S(t). \tag{142} \]
b) A portfolio $\theta(t)$ is called Wick-Skorohod self-financing if
\[ \delta V^\theta(t) = \theta(t)\,\delta S(t), \tag{143} \]
i.e.
\[ V^\theta(t) = V^\theta(0) + \int_0^t \theta_0(s)\,dS_0(s) + \int_0^t \theta_1(s)\,\delta S_1(s). \tag{144} \]
In particular, we assume that the two integrals in (144) exist. By the Girsanov theorem for fBm [see e.g. Molchan (1969), Valkeila (1999), [EvdV], Hu and Øksendal (2003)] there exists a probability measure $Q$ on $(\Omega, \mathcal{F})$ such that $Q$ is equivalent to $P$ (i.e. $Q$ has the same null sets as $P$) and such that
\[ \hat{B}_H(t) := \frac{\mu - r}{\sigma}\,t + B_H(t) \tag{145} \]
is a fractional Brownian motion w.r.t. $Q$. Replacing $B_H(t)$ by $\hat{B}_H(t)$ in (144) we get
\[ e^{-rt} V^\theta(t) = V^\theta(0) + \int_0^t e^{-rs} \sigma\,\theta_1(s) \diamond S_1(s)\,\delta \hat{B}_H(s). \tag{146} \]
Definition 15. We call a portfolio $\theta(t)$ Wick-Skorohod admissible if it is Wick-Skorohod self-financing and $\theta_1(s) \diamond S_1(s)$ is Skorohod integrable w.r.t. $\hat{B}_H(s)$.

Definition 16. A Wick-Skorohod admissible portfolio $\theta(t)$ is called a strong arbitrage if the corresponding total wealth process $V^\theta(t)$ satisfies
\[ V^\theta(0) = 0, \tag{147} \]
\[ V^\theta(T) \in L^2(Q) \quad \text{and} \quad V^\theta(T) \ge 0 \ \text{a.s. } P, \tag{148} \]
\[ P[V^\theta(T) > 0] > 0. \tag{149} \]
The following result was first proved by Hu and Øksendal (2003) for the case $\tfrac12 < H < 1$ and then extended to arbitrary $H \in (0,1)$ by Elliott and Van der Hoek (2003) (in a related model):

Theorem 25. There is no strong arbitrage in the Wick-Skorohod fractional Black-Scholes market (125)–(126).

Proof: If we take the expectation with respect to $Q$ of both sides of (146) with $t = T$ we get, by (90),
\[ e^{-rT} E_Q[V^\theta(T)] = V^\theta(0). \]
From this we see that (147)–(149) cannot hold.
Remark 4. Note that the non-existence of a strong arbitrage in this market (where the value process $S_1(t)$ is not a semimartingale) is not in conflict with the result of Delbaen and Schachermayer (1994) mentioned in Remark 2, because in this market the underlying products are Wick products, not ordinary pathwise products. ∇

We proceed to discuss completeness in this market:

Definition 17. The market is called (Wick-Skorohod) complete if for every $\mathcal{F}_T^{(H)}$-measurable random variable $F \in L^2(Q)$ there exists an admissible portfolio $\theta(t) = (\theta_0(t), \theta_1(t))$ such that
\[ F = V^\theta(T) \quad \text{a.s.} \tag{150} \]
By (146) we see that this is equivalent to requiring that there exists $\phi$ such that
\[ e^{-rT} F(\omega) = e^{-rT} E_Q[F] + \int_0^T \phi(s,\omega)\,\delta \hat{B}_H(s), \tag{151} \]
where
\[ \phi(s) = e^{-rs} \sigma\,\theta_1(s) \diamond S_1(s). \tag{152} \]
If such a $\phi$ can be found, then we put
\[ \theta_1(s) = \sigma^{-1} e^{rs} S_1(s)^{\diamond(-1)} \diamond \phi(s). \tag{153} \]
It was proved by Hu and Øksendal (2003) (for $\tfrac12 < H < 1$) and subsequently by Elliott and Van der Hoek (2003) in a related market (for arbitrary $H \in (0,1)$) that this market is complete. In fact, we have:

Theorem 26.⁸ Let $F \in L^2(Q)$ be $\mathcal{F}_T^{(H)}$-measurable. Then $F = V^\theta(T)$ a.s. for $\theta(t) = (\theta_0(t), \theta_1(t))$, with
\[ \theta_1(t) = \sigma^{-1} e^{-\rho(T-t)} S_1(t)^{\diamond(-1)} \diamond \tilde{E}_Q\bigl[\hat{D}_t^{(H)} F \mid \mathcal{F}_t^{(H)}\bigr], \tag{154} \]
where $\tilde{E}_Q[\cdot \mid \cdot]$ denotes the quasi-conditional expectation and $\hat{D}_t^{(H)}$ is the fractional Hida-Malliavin derivative with respect to $\hat{B}_H(\cdot)$.⁹ The other component, $\theta_0(t)$, is then uniquely determined by the self-financing condition (144).

In the Markovian case, i.e. when $F(\omega) = f(B_H(T))$ for some function $f : \mathbb{R} \to \mathbb{R}$, we can give a more explicit expression for the replicating portfolio $\theta(t)$. This is achieved by using the following representation theorem, due to C. Bender (2003a). It has the same form as in the well-known classical case ($H = \tfrac12$):

Theorem 27.¹⁰ Let $f : \mathbb{R} \to \mathbb{R}$ be such that $E[f^2(B_H(T))] < \infty$. Then
\[ f(B_H(T)) = E[f(B_H(T))] + \int_0^T \phi(t,\omega)\,dB_H(t), \tag{155} \]
where
\[ \phi(t,\omega) = \frac{\partial}{\partial x} E[f(x + B_H(T-t))]\Big|_{x = B_H(t)}. \tag{156} \]
In view of the interpretation of the observed wealth $\hat{V}(t)$ as the result of applying a test function $\psi \in \mathcal{D}$ to the general wealth process $V(t)$, i.e.
\[ \hat{V}(t) = \langle V(t), \psi\rangle, \tag{157} \]
the following alternative definition of an arbitrage is natural (compare with Definition 16):

Definition 18. A Wick-Skorohod admissible portfolio $\theta(t)$ is called a weak arbitrage if the corresponding total wealth process $V^\theta(t)$ satisfies
\[ V^\theta(0) = 0, \tag{158} \]
\[ \langle V^\theta(T), \psi\rangle \ge 0 \quad \text{for all } \psi \in \mathcal{D}, \tag{159} \]
\[ \langle V^\theta(T), \psi\rangle > 0 \quad \text{for some } \psi \in \mathcal{D}. \tag{160} \]

⁸ Hu and Øksendal (2003), Elliott and Van der Hoek (2003).
⁹ See Hu and Øksendal (2003) and Elliott and Van der Hoek (2003) for details.
¹⁰ Bender (2003a).
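Do weak arbitrages exist? The answer is yes. Here is an example, due to C. Bender (2003b):

Example 9.¹¹ (A weak arbitrage). For $\varepsilon > 0$ define
\[ K_\varepsilon(x) = \begin{cases} -1 & \text{if } |x| \le \varepsilon \\ \phantom{-}1 & \text{if } |x| > \varepsilon. \end{cases} \tag{161} \]
Then there exists $\varepsilon_0 > 0$ such that
\[ \int_{\mathbb{R}} K_{\varepsilon_0}(x) \exp\bigl(-\tfrac12 x^2\bigr)\,dx = 0. \tag{162} \]
By a variant of Lemma 2.6 in Bender (2002) we have
\[ E\Bigl[K(\langle \omega, f\rangle) \exp\bigl(\langle \omega, g\rangle - \tfrac12 \|g\|^2_{L^2_H(\mathbb{R})}\bigr)\Bigr] = (2\pi)^{-1/2}\, \|f\|^{-1}_{L^2_H(\mathbb{R})} \int_{\mathbb{R}} K(u) \exp\Bigl(-\frac{\bigl(u - (f,g)_{L^2_H(\mathbb{R})}\bigr)^2}{2\|f\|^2_{L^2_H(\mathbb{R})}}\Bigr)\,du \tag{163} \]
for all bounded $K : \mathbb{R} \to \mathbb{R}$, $f, g \in L^2_H(\mathbb{R})$. Applying (163) to $f = \chi_{[0,1]}$ and $\langle \omega, f\rangle = B_H(1)$ we get
\[ E[K_{\varepsilon_0}(B_H(1))] = 0, \tag{164} \]
\[ E\Bigl[K_{\varepsilon_0}(B_H(1)) \exp\bigl(\langle \omega, g\rangle - \tfrac12 \|g\|^2_{L^2_H(\mathbb{R})}\bigr)\Bigr] \ge 0 \quad \text{for all } g \in L^2_H(\mathbb{R}), \tag{165} \]
\[ E\Bigl[K_{\varepsilon_0}(B_H(1)) \exp\bigl(\langle \omega, \chi_{[0,1]}\rangle - \tfrac12 \|\chi_{[0,1]}\|^2_{L^2_H(\mathbb{R})}\bigr)\Bigr] > 0. \tag{166} \]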
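The constant $\varepsilon_0$ in (162) and the sign pattern in (164)–(166) can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of Bender's construction: it uses the fact that for $f = \chi_{[0,1]}$ the variable $B_H(1)$ has unit variance, so that by (163) the weighted expectation reduces to $E[K_{\varepsilon_0}(m + Z)]$ with $Z$ standard normal and shift $m = (f,g)_{L^2_H(\mathbb{R})}$. The function names are hypothetical, and the grid of shifts is an arbitrary choice.

```python
import math

def std_normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_K(eps, shift):
    """E[K_eps(shift + Z)] with K_eps = -1 on [-eps, eps] and +1 elsewhere."""
    inside = std_normal_cdf(eps - shift) - std_normal_cdf(-eps - shift)
    return 1.0 - 2.0 * inside

def solve_eps0(lo=0.0, hi=5.0, tol=1e-12):
    """Bisection for the zero of eps -> E[K_eps(Z)], which decreases in eps."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_K(mid, 0.0) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps0 = solve_eps0()

# Shift m plays the role of (f, g); m = 0 corresponds to (164),
# a general m to (165), and m = 1 (g = f) to (166).
shifts = [k / 10.0 for k in range(-50, 51)]
values = [expected_K(eps0, m) for m in shifts]

print(eps0, min(values), expected_K(eps0, 1.0))
```

The value of `eps0` is the 75% quantile of the standard normal law, since (162) requires $P(|Z| \le \varepsilon_0) = \tfrac12$. The smallest value over the grid occurs at shift zero, and the value at shift one is strictly positive.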
Now consider the Skorohod fractional market (125)–(126) with
\[ r = \mu = 0, \qquad \sigma = T = 1. \]
Then $S_0(t) = 1$ and $S_1(t) = x \exp(B_H(1) - 1/2)$.

¹¹ Bender (2003b).
Moreover, $\hat{B}_H(t) = B_H(t)$ and $P = Q$. Hence by Theorem 26 and (4.50) there exists a Skorohod self-financing portfolio $\theta(t) = (\theta_0(t), \theta_1(t))$ such that
\[ K_{\varepsilon_0}(B_H(1)) = V^\theta(1) = \int_0^T \theta_1(s)\,\delta S(s) \quad \text{a.s.} \tag{167} \]
Then $V^\theta(0) = 0$ and by (165), (166) and (129) we see that (159) and (160) hold. Hence $\theta(t)$ is a weak arbitrage.

1.5.3 A connection between the pathwise and the Wick-Skorohod model
In spite of the fundamental differences in the features of the pathwise model and the Wick-Skorohod model, it turns out that there is a close relation between them. Assume $H \in (\tfrac12, 1)$. Fix $\psi \in \mathcal{D}$ and define the function $b_H : [0,T] \to \mathbb{R}$ by
\[ b_H(t) = \langle B_H(t), \psi\rangle = E[B_H(t) \cdot \psi]. \tag{168} \]
Then for $p > 1$ and any partition $\mathcal{P} : 0 = t_0 < t_1 < \cdots < t_N = T$ of $[0,T]$ we have
\begin{align*}
\sum_{j=0}^{N-1} |b_H(t_{j+1}) - b_H(t_j)|^p
&= \sum_{j=0}^{N-1} \bigl|E[(B_H(t_{j+1}) - B_H(t_j)) \cdot \psi]\bigr|^p \\
&\le \sum_{j=0}^{N-1} \bigl(E[|B_H(t_{j+1}) - B_H(t_j)|^p]^{1/p} \cdot E[\psi^q]^{1/q}\bigr)^p \\
&\le C \sum_{j=0}^{N-1} E[|B_H(t_{j+1}) - B_H(t_j)|^p],
\end{align*}
where $\tfrac1p + \tfrac1q = 1$. Hence, by a known property of fBm,
\[ \sup_{\mathcal{P}} \sum_{j=0}^{N-1} |b_H(t_{j+1}) - b_H(t_j)|^p < \infty \quad \text{iff} \quad p \ge \frac{1}{H}. \]
In this sense the continuous function $b_H(t)$ is at least as regular as a generic path of a fractional Brownian motion $B_H(t,\omega)$. Therefore we can define integration with respect to $b_H(t)$ just as we define pathwise integration with respect to $B_H(t)$.

Now suppose we start with the wealth generating formula in the Wick-Skorohod model
\[ V^\theta(T) = V^\theta(0) + \int_0^T \phi(s,\omega)\,\delta B_H(s). \tag{169} \]
Suppose $\phi$ is caglad and $\psi \in \mathcal{D}$. Then this gives
\begin{align*}
\hat{V}^\theta(T) = \langle V^\theta(T), \psi\rangle
&= V^\theta(0) + \Bigl\langle \int_0^T \phi(s,\omega)\,\delta B_H(s), \psi\Bigr\rangle \\
&= V^\theta(0) + \lim_{\Delta t_j \to 0} \sum_{j=0}^{N-1} \bigl\langle \phi(t_j) \diamond (B_H(t_{j+1}) - B_H(t_j)), \psi\bigr\rangle \\
&= V^\theta(0) + \lim_{\Delta t_j \to 0} \sum_{j=0}^{N-1} \langle \phi(t_j), \psi\rangle \, \langle B_H(t_{j+1}) - B_H(t_j), \psi\rangle \\
&= V^\theta(0) + \lim_{\Delta t_j \to 0} \sum_{j=0}^{N-1} \hat{\phi}(t_j)\,(b_H(t_{j+1}) - b_H(t_j)) \\
&= V^\theta(0) + \int_0^T \hat{\phi}(t)\,db_H(t). \tag{170}
\end{align*}
We can summarize this as follows:

Theorem 28. If $H > \tfrac12$ the mapping $F \mapsto \langle F, \psi\rangle$; $F \in L^2(P)$ transforms the Wick-Skorohod fractional Brownian motion model into the pathwise fractional Brownian motion model. If $H = \tfrac12$ this mapping transforms the Wick-Skorohod Brownian motion model into the classical Brownian motion model.

1.6 Concluding remarks

At first glance there seems to be a disagreement between the existence of arbitrage in the (fractional) pathwise model (see Theorem 24) and the non-existence of a (strong) arbitrage in the Wick-Skorohod model (Theorem 25). The above discussion, including in particular Theorem 4.16, serves to explain this apparent contradiction: The arbitrages in the pathwise model correspond to the weak arbitrages in the Wick-Skorohod model (see Example 9), and not to the (non-existent) strong arbitrages.
In spite of the mathematical coherence of the Wick-Skorohod model, there is still a lot of controversy about its economic interpretation and features. We refer to the discussions in Björk and Hult (2005), and Sottinen and Valkeila (2003) for more details.

Acknowledgements: I am grateful to Christian Bender, Tomas Björk, Nils Christian Framstad, Walter Schachermayer and John van der Hoek for helpful communication.

References:

Bender, C. (2002) The Fractional Itô Integral, Change of Measure and Absence of Arbitrage. Manuscript.
Bender, C. (2003a) “An Itô Formula for Generalized Functionals of a Fractional Brownian Motion with Arbitrary Hurst Parameter.” Stochastic Processes and Their Applications 104: 81–106.
Bender, C. (2003b) Construction of a Weak Arbitrage. Manuscript, May.
Biagini, F., Hu, Y., Øksendal, B., and Zhang, T. Fractional Brownian Motion and Applications. Springer-Verlag (Forthcoming).
Biagini, F., and Øksendal, B. (2004) Forward Integrals and an Itô Formula for Fractional Brownian Motion. Preprint, Dept. of Mathematics, University of Oslo 22/2004.
Biagini, F., Øksendal, B., Sulem, A., and Wallner, N. (2004) “An Introduction to White Noise Theory and Malliavin Calculus for Fractional Brownian Motion.” The Proceedings of the Royal Society 460: 347–372.
Björk, T., and Hult, H. (2005) “A Note on the Wick Products and the Fractional Black-Scholes Model.” Finance and Stochastics 9: 197–209.
Brody, D., Syroka, J. and Zervos, M. (2002) “Dynamical Pricing of Weather Derivatives.” Quantitative Finance 2: 189–198.
Coutin, L., and Qian, Z. (2002) “Stochastic Analysis, Rough Path Analysis and Fractional Brownian Motions.” Prob. Theory Related Fields 122: 108–140.
Dasgupta, A. (1997) Fractional Brownian Motion: Its Properties and Applications to Stochastic Integration. Ph. D. thesis, Dept. of Statistics, Univ. of North Carolina at Chapel Hill.
Decreusefond, L., and Üstünel, A. S. (1998) “Stochastic Analysis of the Fractional Brownian Motion.” Potential Analysis 10: 177–214.
Delbaen, F., and Schachermayer, W. (1994) “A General Version of the Fundamental Theorem of Asset Pricing.” Mathematische Annalen 300: 463–520.
Duncan, T. E., Hu, Y., and Pasik-Duncan, B. (2000) “Stochastic Calculus for Fractional Brownian Motion.” SIAM Journal on Control and Optimization 38: 582–612.
Elliott, R., and van der Hoek, J. (2003) “A General Fractional White Noise Theory and Applications to Finance.” Mathematical Finance 13: 301–330.
Gradinaru, M., Nourdin, I., Russo, F., and Vallois, P. (2002) m-order integrals and generalized Itô’s formula: the case of a fractional Brownian motion with any Hurst index. Preprint.
Hida, T., Kuo, H.-H., Potthoff, J., and Streit, L. (1993) White Noise Analysis. Kluwer.
Holden, H., Øksendal, B., Ubøe, J., and Zhang, T. (1996) Stochastic Partial Differential Equations. Birkhäuser.
Hu, Y. (2003) Integral Transformations and Anticipative Calculus for Fractional Brownian Motion. Manuscript.
Hu, Y., and Øksendal, B. (2003) “Fractional White Noise Calculus and Application to Finance.” Infinite Dimensional Analysis, Quantum Probability and Related Topics 6: 1–32.
Kuo, H.-H. (1996) White Noise Distribution Theory. CRC Press.
Mishura, Y. (2002) Fractional Stochastic Integration and Black-Scholes Equation for Fractional Brownian Model with Stochastic Volatility. Manuscript, December.
Molchan, G. (1969) “Gaussian Processes with Spectra Which are Asymptotically Equivalent to a Power of λ.” Theory of Probability and Its Applications 14: 530–530.
Norvaisa, R. (2000) “Modelling of Stock Price Changes. A Real Analysis Approach.” Finance and Stochastics 4: 343–369.
Nualart, D. (1995) The Malliavin Calculus and Related Topics. Springer-Verlag.
Nualart, D. (2004) Stochastic Integration with Respect to Fractional Brownian Motion and Applications. Preprint.
Nualart, D., and Pardoux, E. (1988) “Stochastic Calculus with Anticipating Integrands.” Probability Theory and Related Fields 78: 555–581.
Rogers, L.C. (1997) “Arbitrage with Fractional Brownian Motion.” Math. Finance 7: 95–105.
Russo, F., and Vallois, P. (2000) “Stochastic Calculus with Respect to Continuous Finite Quadratic Variation Processes.” Stochastics and Stochastics Reports 70: 1–40.
Salopek, D. M. (1998) “Tolerance to Arbitrage.” Stochastic Processes and Their Applications 76: 217–230.
Shiryaev, A. (1999) Essentials of Stochastic Finance. World Scientific Publishing Company.
Simonsen, I. (2003) “Measuring Anti-Correlations in the Nordic Electricity Spot Market by Wavelets.” Physica A: Statistical Mechanics and Its Applications 322: 597–606.
Sottinen, T. and Valkeila, E. (2003) “On Arbitrage and Replication in the Fractional Black-Scholes Pricing Model.” Statistics and Decisions 21: 93–108.
Valkeila, E. (1999) On Some Properties of Geometric Fractional Brownian Motion. Preprint, Univ. of Helsinki, May.
J. van der Hoek: Private Communication.
Chapter 2 Moment Evolution of Gaussian and Geometric Wiener Diffusions
Bjarne S. Jensen Chunyan Wang University of Southern Denmark and Copenhagen Business School Jon Johnsen Department of Mathematical Sciences, Aalborg University
2.1
Introduction
The purpose of this chapter is to analyse two basic stochastic models in the plane: The time homogeneous Gaussian and Geometric Wiener diffusions. Using the theory of stochastic processes and the Itô lemma, the probability distributions of the stochastic state vectors are described by the evolution of their moments (expectation and covariance as functions of time). These moments satisfy certain systems of (deterministic) ordinary differential equations (ODE). We solve these ODE and present explicit solutions (time paths) for the first-order and second-order moments. The forward Kolmogorov equation is used to derive the same moment functions by alternative solution methods and gain further information on the probability distributions.

Motivation. Uncertainty or incompleteness evidently prevails in the process descriptions of many scientific disciplines. Hence, stochastic models must often be used for an adequate mathematical representation of the dynamic systems. For any stochastic process $X(t)$, an ideal but unattainable situation is to get an explicit formula for the probability distribution $P(t,x)$ of $X(t)$ at any future instant $t$. We deal with the situations, where we want to obtain a collection of distributions $P(t,x)$ - corresponding to various drift and diffusion coefficients, $a(x,t)$ and $B(x,t)$, that for $X(t)$ enter, respectively, the system of stochastic differential equations (SDE) in the Itô form or the forward Kolmogorov partial differential equation (PDE). Our intention is to elucidate to what extent the usual methods of the stochastic literature can provide explicit (closed form) expression of $P(t,x)$.

It turns out that the use of both Itô’s lemma and the forward Kolmogorov equation can only give explicit formulas in the very simplest cases: the Gaussian diffusion (GD) and geometric Wiener diffusion (GWD), in which $a(X,t)$ and $B(X,t)$ are suitable first-order polynomials in the state variables $X$ alone. For these two time-homogeneous diffusion processes, it is possible in the two-dimensional case to write up the complete expressions for the evolution of the mean vector and covariance matrix and also for their probability distributions. To our knowledge, these moment solution formulas have not been derived before, although their explicit derivation facilitates the understanding and modeling of stochastic processes in physics, biology, economics, finance and technological sciences. Given the increasing usage of stochastic differential equations, the results are likely to be of general interest.

Overview of results. In section 3, the evolution of the mean vector $m(t) = [m_x(t)\ m_y(t)]$ for the GD and GWD models is given in Theorem 1 with asymptotics in Corollary 1. In section 4, the evolution of the covariance matrix $\Sigma(t)$, covariance vector $\sigma(t) = [\sigma_{xx}(t)\ \sigma_{xy}(t)\ \sigma_{yy}(t)]$ for the GD and GWD models are given in Theorem 2, with asymptotics for covariances and correlation coefficients of the GD model in Corollary 1-2. In section 5 - using the Kolmogorov partial differential equation - the evolution of the density functions, and the time paths for the mean vector $m(t)$ and the covariance matrix $\Sigma(t)$ of the GD and GWD models are corroborated in Theorem 3 and Theorem 4. However, a GWD density function for transition probability distribution does not exist, but the probability measure is given.

The limitations of the methods. Regarding the difficulties of deriving the explicit moment formulas, it is easy to understand our restriction to diffusion processes in the plane. Indeed, the time dependence often emanates from an exponential matrix $e^{tA}$, and in higher dimensions, it is in general impossible to write the eigenvalues of $A$ as functions of its entries (in contrast to cases where the characteristic polynomial has the degree two). In three dimensions, there are six distinct second-order moments, and hence six linear differential equations to be solved explicitly. But according to Abel’s theorem, even the general quintic polynomial is unsolvable algebraically. For planar diffusion processes, however, the roots of our cubic characteristic polynomial became simple expressions of at most eight fundamental drift and diffusion parameters.

By the Itô Lemma, the ordinary differential equations for the first and second order moments are uncoupled when drift and diffusion coefficients depend linearly on state variables, $x$. In all other cases, knowledge of the full distribution $P(t,x)$ is required just to write down the moment differential equations. Alternatively, one could get an infinite number of coupled differential equations in moments of arbitrarily high order. We therefore only consider the GD and GWD models.

When applying the Kolmogorov equation, it is necessary to know beforehand that $P(t,x)$ has a density function. For our GWD model, this condition is not fulfilled (since we require the drift coefficient $Ax$ to have an arbitrary matrix $A$, we must, as explained below, use the Wiener process of dimension one). Moreover, the uniqueness of the obtained solution needs to be proved.

Needless to say, heavy calculations are involved in obtaining the final formulas for the second-order moment evolutions. The complications arise more from the intricate interconnections of the steps than from the difficulty of any step in particular. While computers cannot collect the intermediate elements into compact formulas, computer programs (here Maple IV) can check and confirm the final explicit solutions that we obtain.

2.2 Structure of basic diffusion processes
2.2.1 Stochastic preliminaries

Consider the stochastic differential equations (SDE) in the Itô form,
\[ dX = a(X,t)\,dt + B(X,t)\,dw, \tag{1} \]
\[ X(t) \in \mathbb{R}^n, \quad a \in \mathbb{R}^n, \quad B \in \mathbb{R}^{n \times r}, \quad w(t) \in \mathbb{R}^r, \tag{2} \]
where $a(X,t)$ is an $n$-dimensional vector function, called the drift coefficient, and $B(X,t)$ is an $n \times r$ matrix function, called the diffusion coefficient. The elements of $a$ and $B$ are Borel-measurable functions from $[0,\infty) \times \mathbb{R}^n$ into $\mathbb{R}$. The stochastic state vector is $X(t) \in \mathbb{R}^n$, while $x \in \mathbb{R}^n$ denotes the state variable. The random (stochastic) vector, $dw \in \mathbb{R}^r$, represents the noise in the stochastic dynamic system (1). As a stochastic process, $w(t)$ is assumed to be an $r$-dimensional standard Wiener process with $t \in \mathbb{R}$, and hence $w(t)$ has continuous (but nowhere differentiable) sample paths (phase paths, trajectories).

The drift coefficient $a(X,t)$ determines the local drift (change, increment) of the expected value (mean, average trend of evolution) of the stochastic process $X(t)$ in a short interval of time from $t$ to $t + dt$ under the condition that $X(t) = x$. The matrix product $BB^T$ ($T$ = transpose) of the diffusion coefficient $B(X,t)$ determines the local dispersion (the size of the central second-order moments, the mean square deviation of the stochastic process $X(t)$ from the original position $x$) during a short period of time from $t$ to $t + dt$.¹

From probability theory, it is well known that, if the multi-dimensional functions $a(x,t)$ and $B(x,t)$ satisfy both the Lipschitz and the linear growth conditions and are continuous with respect to $t$, then the stochastic process $X(t)$, solving (1), is a continuous Markov process with transition distribution (density) functions that are, under certain regularity assumptions, uniquely determined by only their first- and second-order moments. These moments are then completely described by, respectively, the drift and diffusion coefficients in (1). Such continuous Markov processes are called Itô diffusion processes. In addition, when the transition density function $p(x,t \mid x_0)$ of a diffusion process exists, it satisfies the (forward) Kolmogorov equation (PDE),
\[ \frac{\partial p(x,t \mid x_0)}{\partial t} = \frac12 \sum_{j,k=1}^{n} \bigl[B(x,t)B^T(x,t)\bigr]_{j,k} \frac{\partial^2 p(x,t \mid x_0)}{\partial x_j\,\partial x_k} - \sum_{j=1}^{n} a_j(x,t) \frac{\partial p(x,t \mid x_0)}{\partial x_j}, \tag{3} \]
where the elements of $a(x,t)$ and $B(x,t)B^T(x,t)$ enter, respectively, as the coefficients of the first-order and second-order partial derivatives.² More generally, the distribution itself, $P(t,x \mid x_0)$, solves (3). See also Appendix D. Moreover, for any function of a diffusion process, $X(t)$, Itô’s Lemma gives the following result:³

¹ Cf. Prohorov and Rozanov (1969, pp. 258, 282).
² Cf. Prohorov (1969, pp. 282).
³ Cf. Karatzas and Shreve (1988), and Øksendal (2005, pp. 48).
Lemma 1. (Itô). Let $X(t) \in \mathbb{R}^n$ be a general diffusion process defined as in (1). If $F(x,t)$ is an arbitrary $C^2$ map from $\mathbb{R}^{n+1} \to \mathbb{R}$, then
\[ dF(X,t) = F_t\,dt + F_x^T\,dX + \tfrac12\,dX^T F_{xx}\,dX, \tag{4} \]
i.e., $F(X,t)$, determined by the diffusion process $X(t)$, is again a diffusion process, where $F_t$ and $F_x$ represent, respectively, the partial derivatives with respect to $t$ and $x$ of the function $F(x,t)$, $F_{xx}$ represents the Hessian matrix of the function $F(x,t)$, and $(dw_i)^2 = dt\ \forall i$; $dw_i \cdot dw_j = 0$ for $i \ne j$; $(dt)^2 = 0$; $dt \cdot dw_i = 0$.

Using this Lemma, one can study many properties of the diffusion process $X(t)$ governed by (1). A well-known decisive property of a diffusion process is that its conditional transition probability, under certain regularity assumptions, is uniquely determined by only the first-order or second-order moments, which again are completely determined by the drift and diffusion coefficients. Therefore, it is sufficient to study the functions for the first-order and the second-order moments of diffusion processes. By Lemma 1, the following lemma for the moment statistics of the Itô diffusion process (1) can be derived:⁴

Lemma 2. The mean vector $m(t)$ and variance-covariance matrix $\Sigma(t)$ of the transition probability distribution for the family of solutions to the stochastic differential equation, (1), satisfy the deterministic ordinary differential equations (ODE),
\[ \dot{m} = \frac{dm(t)}{dt} = E\{a(X,t)\}, \tag{5} \]
\[ \dot{\Sigma}(t) = \frac{d\Sigma(t)}{dt} = E\bigl\{a(X,t)\bigl[X^T - m^T(t)\bigr] + [X - m(t)]\,a^T(X,t) + B(X,t)B^T(X,t)\bigr\}, \tag{6} \]
where
\[ m(t) = [m_1\ m_2\ \ldots\ m_n]^T = [E(X_1)\ E(X_2)\ \ldots\ E(X_n)]^T, \tag{7} \]
\[ \Sigma(t) = (\sigma_{ij})_{n \times n}, \quad \text{and} \quad \sigma_{ij} = E[(X_i - m_i)(X_j - m_j)], \tag{8} \]
for $i, j = 1, 2, \ldots, n$.

Therefore, $m(t)$ and $\Sigma(t)$ of the dynamic stochastic system (1) can be studied by solving the differential equations (5)-(6), and the future probability behavior of the diffusion process $X(t)$ is described by the time paths of these first-order and second-order moments.

Remark 1. The differential equations for the moments usually constitute an infinite coupled system, because the right hand sides of (5)-(6) and the equations for the other moments contain moments of arbitrarily high order. Because knowledge about the full probability distribution (probability density function) is necessary to solve (5) and (6), numerical methods are much more used in practice. For more details, see Soong (1973). But when the drift and diffusion coefficients are linear functions in the state variables $x$, the solution of (5) may be inserted in (6). However, although we specialize to dimension $n = 2$ – and for the GWD model to a one-dimensional Wiener process – the resulting system for variances-covariance, $\sigma_{xx}(t)$, $\sigma_{xy}(t)$, $\sigma_{yy}(t)$, is only barely solvable. The difficulty is to determine $e^{tA}$ (cf. Appendices B and C). ∇

⁴ Cf. Pugachev and Sinitsyn (1987, pp. 302).

2.2.2 Linear Itô diffusions in the plane
Henceforth, we make three assumptions [occasionally, the state vector $x$ is written as $(x, y)$, and similarly for $X(t)$]:

Assumption 1. The stochastic dynamic system (1) is a time-homogeneous (independent of time) system in the Euclidean plane, i.e.,
\[ a(x,t) = a(x), \qquad B(x,t) = B(x); \qquad x \in \mathbb{R}^2. \tag{9} \]
Assumption 2. The drift coefficients are linear functions of the state variables
\[ a(x) = \begin{bmatrix} a_1(x) \\ a_2(x) \end{bmatrix} = Ax = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}; \qquad x = \begin{bmatrix} x \\ y \end{bmatrix}. \tag{10} \]
Assumption 3. The diffusion coefficients and the vector $dw$ may, respectively, take two forms, either
\[ B(x)\,dw(t) = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \begin{bmatrix} dw_1 \\ dw_2 \end{bmatrix}, \tag{11} \]
i.e., the noise vector has two different independent random elements, and the diffusion coefficients are given by a constant matrix - or
\[ B(x)\,dw(t) = \begin{bmatrix} \beta x \\ \beta y \end{bmatrix} dw, \tag{12} \]
i.e., the noise vector has only one random element, but the components of the diffusion coefficient depend on the state of the system; (12) is often called a geometric Wiener diffusion (GWD) process.

Combining Assumptions 1-3, we can get two basic time-homogeneous diffusion models in the Euclidean plane:

GD model: General Bivariate Gaussian Diffusion: Using (9), (10) and (11), we have, in compact notation,
\[ dX = AX\,dt + B\,dw \tag{13} \]
or explicitly,
\[ \begin{aligned} dX &= (aX + bY)\,dt + \beta_{11}\,dw_1 + \beta_{12}\,dw_2, \\ dY &= (cX + dY)\,dt + \beta_{21}\,dw_1 + \beta_{22}\,dw_2. \end{aligned} \tag{14} \]
GWD model: Bivariate Geometric Wiener Diffusion: With Assumption (9), (10) and (12), we have,
\[ dX = AX\,dt + \beta X\,dw \tag{15} \]
or in explicit form,
\[ \begin{aligned} dX &= (aX + bY)\,dt + \beta X\,dw, \\ dY &= (cX + dY)\,dt + \beta Y\,dw. \end{aligned} \tag{16} \]
Remark 2. In practice, it is reasonable to think that the factors influencing the states of the system interact, and to assume that the uncertainties attributable to the interacting factors are correlated. Therefore, it is necessary to include dependent Wiener processes that are correlated with covariance matrix $\Sigma$. The stochastic model takes the form
\[ \begin{aligned} dX &= (aX + bY)\,dt + \beta_{11}\,dW_1 + \beta_{12}\,dW_2, \\ dY &= (cX + dY)\,dt + \beta_{21}\,dW_1 + \beta_{22}\,dW_2. \end{aligned} \tag{17} \]
In this case, $W_1$ and $W_2$ are not independent Wiener processes, but by using the linear transformation method to replace the correlated Wiener processes with the independent Wiener processes, the diffusion model (14) can be obtained: Rewriting (17) in the vector-matrix form
\[ dX = AX\,dt + B\,dW, \qquad W \sim N(0, \Sigma), \tag{18} \]
and $w = DW$ with $D = \Sigma^{-1/2}$, then $w \sim N(0, I)$ are the independent Wiener processes. By replacing $W$ with $D^{-1}w$, (18) is transformed into the stochastic differential equations (14). In the following, we will therefore focus only on the analysis of the probability properties of the diffusion process with independent Wiener processes. ∇

2.3 Dynamics of first-order and second-order moments
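Applying Lemma 2 to (14) and (16), we get ordinary differential equations (ODE) for the moments of both the Gaussian and geometric Wiener diffusions.

Proposition 1. The components of the mean vector function, $m = [m_x\ m_y]^T$, satisfy the differential equations,
\[ \dot{m} = \begin{bmatrix} \dot{m}_x \\ \dot{m}_y \end{bmatrix} = Am = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} m_x \\ m_y \end{bmatrix} \tag{19} \]
for both of the GD and GWD models (14) and (16), respectively.

Proof: By applying (5) to (14) or (16),
\[ \dot{m} = \begin{bmatrix} \dot{m}_x \\ \dot{m}_y \end{bmatrix} = E\{AX\} = E\begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} = \begin{bmatrix} a m_x + b m_y \\ c m_x + d m_y \end{bmatrix}, \]
which is (19). □

Proposition 2. The covariance vector function, $\sigma = [\sigma_{xx}\ \sigma_{xy}\ \sigma_{yy}]^T$ (equivalent to covariance matrix $\Sigma$) satisfies the differential equations:

GD model:
\[ \dot{\sigma} = C\sigma + \delta, \tag{20} \]
or explicitly
\[ \begin{bmatrix} \dot{\sigma}_{xx} \\ \dot{\sigma}_{xy} \\ \dot{\sigma}_{yy} \end{bmatrix} = \begin{bmatrix} 2a & 2b & 0 \\ c & a+d & b \\ 0 & 2c & 2d \end{bmatrix} \begin{bmatrix} \sigma_{xx} \\ \sigma_{xy} \\ \sigma_{yy} \end{bmatrix} + \begin{bmatrix} \beta_{11}^2 + \beta_{12}^2 \\ \beta_{11}\beta_{21} + \beta_{12}\beta_{22} \\ \beta_{21}^2 + \beta_{22}^2 \end{bmatrix}. \tag{21} \]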
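The vector system (21) for the GD model is the componentwise form of the matrix equation $\dot{\Sigma} = A\Sigma + \Sigma A^T + BB^T$ stated in (26). The following sketch compares the two right-hand sides numerically. It is an illustration only: the function names and the parameter values are assumptions chosen for the example, not taken from the chapter.

```python
import numpy as np

def gd_rhs_vector(a, b, c, d, B, sigma):
    """Right-hand side of (21): C @ sigma + delta, sigma = (s_xx, s_xy, s_yy)."""
    C = np.array([[2 * a, 2 * b, 0.0],
                  [c, a + d, b],
                  [0.0, 2 * c, 2 * d]])
    BBt = B @ B.T
    # delta collects the distinct entries of B B^T in the order xx, xy, yy.
    delta = np.array([BBt[0, 0], BBt[0, 1], BBt[1, 1]])
    return C @ sigma + delta

def gd_rhs_matrix(a, b, c, d, B, sigma):
    """Right-hand side of (26): A S + S A^T + B B^T for the symmetric matrix S."""
    A = np.array([[a, b], [c, d]])
    S = np.array([[sigma[0], sigma[1]],
                  [sigma[1], sigma[2]]])
    return A @ S + S @ A.T + B @ B.T

# Illustrative (assumed) drift and diffusion parameters and covariance state.
a, b, c, d = 0.4, -1.3, 0.7, -0.2
B = np.array([[0.5, 0.1],
              [-0.3, 0.8]])
sigma = np.array([1.2, 0.3, 0.9])

vec = gd_rhs_vector(a, b, c, d, B, sigma)
mat = gd_rhs_matrix(a, b, c, d, B, sigma)
print(vec)
print(mat)
```

The three entries of `vec` should coincide with the entries `mat[0, 0]`, `mat[0, 1]` and `mat[1, 1]`, and `mat` is symmetric, which is why three components suffice.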
GWD model:
\[ \dot{\sigma} = \tilde{C}\sigma + \tilde{\delta}(t) = (C + \beta^2 I)\sigma + \tilde{\delta}(t), \tag{22} \]
or explicitly
\[ \begin{bmatrix} \dot{\sigma}_{xx} \\ \dot{\sigma}_{xy} \\ \dot{\sigma}_{yy} \end{bmatrix} = \begin{bmatrix} 2a + \beta^2 & 2b & 0 \\ c & a + d + \beta^2 & b \\ 0 & 2c & 2d + \beta^2 \end{bmatrix} \begin{bmatrix} \sigma_{xx} \\ \sigma_{xy} \\ \sigma_{yy} \end{bmatrix} + \beta^2 \begin{bmatrix} m_x^2 \\ m_x m_y \\ m_y^2 \end{bmatrix}. \tag{23} \]
Proof: Applying (6) to (14) to get the differential equation for the covariance functions of the two diffusion processes, we get, cf. (11),
\[ \frac{d\Sigma(t)}{dt} = \begin{bmatrix} \dot{\sigma}_{xx} & \dot{\sigma}_{xy} \\ \dot{\sigma}_{xy} & \dot{\sigma}_{yy} \end{bmatrix} = E\left\{ \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \begin{bmatrix} x - m_x & y - m_y \end{bmatrix} \right\} + E\left\{ \begin{bmatrix} x - m_x \\ y - m_y \end{bmatrix} \begin{bmatrix} ax + by & cx + dy \end{bmatrix} \right\} + E\left\{ \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \begin{bmatrix} \beta_{11} & \beta_{21} \\ \beta_{12} & \beta_{22} \end{bmatrix} \right\}. \tag{24} \]
Since $E[x(x - m_x)] = E[(x - m_x)^2] = \sigma_{xx}$, etc., the right hand side of (24) becomes
\[ \begin{bmatrix} 2a\sigma_{xx} + 2b\sigma_{xy} & c\sigma_{xx} + (a+d)\sigma_{xy} + b\sigma_{yy} \\ c\sigma_{xx} + (a+d)\sigma_{xy} + b\sigma_{yy} & 2c\sigma_{xy} + 2d\sigma_{yy} \end{bmatrix} + BB^T. \tag{25} \]
By letting $\sigma = [\sigma_{xx}\ \sigma_{xy}\ \sigma_{yy}]^T$ and using (24) and (25), we get (21). To obtain (23), applying (6) to (16), the first two terms in (6) are the same as those in (24), while the third term is, cf. (12),
\[ E(BB^T) = E\left\{ \begin{bmatrix} \beta x \\ \beta y \end{bmatrix} \begin{bmatrix} \beta x & \beta y \end{bmatrix} \right\} = E\begin{bmatrix} \beta^2 x^2 & \beta^2 xy \\ \beta^2 xy & \beta^2 y^2 \end{bmatrix}. \]
As $E(x^2) = \sigma_{xx} + m_x^2$, etc., the combination of terms as before yields (23). □

Remark 3. Since the homogeneous diffusion process is a time-homogeneous Gaussian diffusion, the differential equations (21) can also be derived directly from the following expression
\[ \dot{\Sigma} = A\Sigma + \Sigma A^T + BB^T, \tag{26} \]
where $\Sigma = (\sigma_{ij})$ is the $2 \times 2$ covariance matrix, while $B$ and $A$ are given as in (14). The expression (26) is obtained in, e.g., Jacobsen (1970, pp. 12). ∇

Now, the differential equations for the moments of both the Gaussian and geometric Wiener diffusion processes have been obtained; clearly, the mean functions for the Gaussian diffusion process and for the geometric Wiener diffusion process satisfy the same differential equation. Contrary to this, the differential equation for the covariance function of the Gaussian diffusion is obviously different from that of the geometric diffusion. For the geometric Wiener diffusion, it can be seen in (23), that the geometric parameter $\beta$ appears not only in the non-homogeneous but also in the homogeneous parts of the differential equations. In addition, the mean functions $m_x(t)$ and $m_y(t)$ now appear as components in the non-homogeneous part of (23). Obviously, they are functions of time $t$, satisfying (19). It is easily recognized that if one has $\beta_1$ and $\beta_2$ (instead of having just $\beta$ for both state variables), or moreover two different Wiener processes $dw_1$ and $dw_2$ instead of just $dw$, then the equation for the covariance function in (23) will be much more complex and probably impossible to solve. It is worthwhile to try to solve (19)-(23) explicitly in closed form, in order to study the probability distribution, nonsingularity, stability, etc., of the diffusion processes.

2.3.1 Eigenvalues of $A$, $C$, and $\tilde{C}$
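As a preparation, we make a few observations on the drift coefficient matrix $A$ in (19). Its trace, determinant, and characteristic polynomial are given by

$$\operatorname{tr} A = a + d, \qquad |A| = ad - bc, \tag{27}$$

$$|A - \lambda I| = \lambda^2 - (\operatorname{tr} A)\lambda + |A|. \tag{28}$$

The latter, (28), has discriminant $4\Delta^2$ with

$$\Delta^2 = \tfrac{1}{4}(a + d)^2 - (ad - bc) = \tfrac{1}{4}(d - a)^2 + bc \iff \left(\frac{d-a}{2\Delta}\right)^2 = 1 - \frac{bc}{\Delta^2}. \tag{29}$$

Henceforth, we use the notation:

$$\Delta^2 > 0:\ \Delta \equiv +\sqrt{\Delta^2}, \qquad \Delta^2 < 0:\ \Delta \equiv +\sqrt{|\Delta^2|}. \tag{30}$$

Generally, the eigenvalues of $A$ may be written, cf. (28)-(30), as

$$\Delta^2 > 0:\quad \lambda_1 = \tfrac{1}{2}(a + d) + \Delta, \quad \lambda_2 = \tfrac{1}{2}(a + d) - \Delta \tag{31}$$

$$\Delta^2 = 0:\quad \lambda = \tfrac{1}{2}(a + d) \tag{32}$$

$$\Delta^2 < 0:\quad \lambda_1 = \tfrac{1}{2}(a + d) + i\Delta, \quad \lambda_2 = \tfrac{1}{2}(a + d) - i\Delta \tag{33}$$

The real eigenvectors of $A$ are proportional to $(1, \alpha)$, with slopes

$$\Delta^2 > 0:\quad \alpha_1 = \frac{d - a}{2b} + \frac{\Delta}{b}, \quad \alpha_2 = \frac{d - a}{2b} - \frac{\Delta}{b} \tag{34}$$

$$\Delta^2 = 0:\quad \alpha = \frac{d - a}{2b} = \frac{2c}{a - d} \tag{35}$$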
When $bc = 0$, $d \neq a$, one of the axes contains an eigenvector, whereas the other eigenvector has a slope given as $(d - a)/b$ or $c/(a - d)$.

With reference to the fundamental parameter elements of $A$, (19), we can now analyze the Itô coefficient matrix $C$, (20)-(21), defined in Proposition 2. Obviously $C$ is real; it has trace and determinant

$$\operatorname{tr} C = 3(a + d) = 3 \operatorname{tr} A, \tag{36}$$

$$|C| = 4(a + d)(ad - bc) = 4 \operatorname{tr} A\, |A|, \tag{37}$$

and its characteristic polynomial is

$$|C - \lambda I| = \lambda^3 - 3(\operatorname{tr} A)\lambda^2 + 2\left[(\operatorname{tr} A)^2 + 2|A|\right]\lambda - 4(\operatorname{tr} A)|A|. \tag{38}$$
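When comparing with the eigenvalues of $A$, $\lambda_1$ and $\lambda_2$, or $\lambda$, (31)-(33), we get the following result:

Lemma 3. The eigenvalues of the Itô matrix $C$, (20)-(21), associated with the Gaussian diffusions, are in the nonsingular case, $|C| \neq 0$, given by

$$\Delta^2 > 0:\quad \lambda_1^C = \operatorname{tr} A, \quad \lambda_2^C = \operatorname{tr} A + 2\Delta = 2\lambda_1, \quad \lambda_3^C = \operatorname{tr} A - 2\Delta = 2\lambda_2 \tag{39}$$

$$\Delta^2 = 0:\quad \lambda_1^C = \lambda_2^C = \lambda_3^C = \operatorname{tr} A \tag{40}$$

$$\Delta^2 < 0:\quad \lambda_1^C = \operatorname{tr} A, \quad \lambda_2^C = \operatorname{tr} A + 2i\Delta, \quad \lambda_3^C = \operatorname{tr} A - 2i\Delta \tag{41}$$

Theorem 1. The mean vector function $m(t)$ is given by:

$$\Delta^2 > 0:\quad m(t) = e^{\frac{1}{2}(a+d)t} \begin{bmatrix} \cosh(\Delta t) - \frac{d-a}{2\Delta}\sinh(\Delta t) & \frac{b}{\Delta}\sinh(\Delta t) \\ \frac{c}{\Delta}\sinh(\Delta t) & \cosh(\Delta t) + \frac{d-a}{2\Delta}\sinh(\Delta t) \end{bmatrix} m_0 \tag{45}$$

$$\Delta^2 = 0:\quad m(t) = e^{\frac{1}{2}(a+d)t} \begin{bmatrix} 1 - \frac{1}{2}(d-a)t & bt \\ ct & 1 + \frac{1}{2}(d-a)t \end{bmatrix} m_0 \tag{46}$$

$$\Delta^2 < 0:\quad m(t) = e^{\frac{1}{2}(a+d)t} \begin{bmatrix} \cos(\Delta t) - \frac{d-a}{2\Delta}\sin(\Delta t) & \frac{b}{\Delta}\sin(\Delta t) \\ \frac{c}{\Delta}\sin(\Delta t) & \cos(\Delta t) + \frac{d-a}{2\Delta}\sin(\Delta t) \end{bmatrix} m_0 \tag{47}$$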
where $m_0 = [m_x(0)\ m_y(0)]^T$ is the initial value of the expectation function at time $t = 0$.

Proof: Although the infinite series $e^{tA}$ converges for all $t$ and all square matrices $A$, it seems impossible in general to express $e^{tA}$ in closed form. But in the two-dimensional case, the series may be summed in terms of elementary functions by an application of the Hamilton–Cayley Theorem.⁵ □
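The closed forms (45)-(47) can be checked numerically against a direct summation of the exponential series. The following is a minimal sketch, not the authors' computation: the helper names `expm_series` and `mean_transition` and the three parameter sets are illustrative assumptions, chosen so that each sign of $\Delta^2$ occurs once.

```python
import numpy as np

def expm_series(M, terms=30):
    """Matrix exponential by scaling-and-squaring plus a truncated Taylor series."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2 ** s)
    term = np.eye(M.shape[0])
    result = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k          # X^k / k!
        result = result + term
    for _ in range(s):               # undo the scaling by repeated squaring
        result = result @ result
    return result

def mean_transition(a, b, c, d, t):
    """Closed-form e^{tA} for A = [[a, b], [c, d]], following (45)-(47)."""
    disc = 0.25 * (d - a) ** 2 + b * c           # Delta^2 as in (29)
    growth = np.exp(0.5 * (a + d) * t)
    if disc > 0:
        D = np.sqrt(disc)
        C_, S_ = np.cosh(D * t), np.sinh(D * t) / D
    elif disc < 0:
        D = np.sqrt(-disc)
        C_, S_ = np.cos(D * t), np.sin(D * t) / D
    else:                                        # limit Delta -> 0
        C_, S_ = 1.0, t
    h = 0.5 * (d - a)
    return growth * np.array([[C_ - h * S_, b * S_],
                              [c * S_, C_ + h * S_]])

# Hypothetical parameters: Delta^2 > 0, Delta^2 = 0, Delta^2 < 0.
cases = [(-1.0, 2.0, 1.0, -3.0), (1.0, 1.0, -1.0, 3.0), (0.5, 2.0, -3.0, -1.0)]
t = 0.7
for a, b, c, d in cases:
    A = np.array([[a, b], [c, d]])
    gap = np.max(np.abs(mean_transition(a, b, c, d, t) - expm_series(t * A)))
    print(f"a={a}, b={b}, c={c}, d={d}: max deviation {gap:.2e}")
```

Multiplying the returned matrix by $m_0$ gives $m(t)$; the deviation from the series sum should be at rounding level in all three regimes.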
Figure 1: Drift parameter regions and the mean functions (45)-(47)
⁵ Cf. Appendix B, and Jensen (1994, p. 307).

2.4.2 Asymptotics of the mean vector function
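For convenience we shall write $f \sim g$ whenever the functions $f(t)$ and $g(t)$ are such that the ratio $f(t)/g(t)$ tends to 1 for $t \to \infty$. Moreover, for two vector functions $u$ and $v$, the relation $u \sim v$ means that $u_j \sim v_j$ for each $j$.

Corollary 1. The asymptotic (long-run) behavior for $t \to \infty$ of the expectation functions $m(t)$ in Theorem 1 is given by

$\Delta^2 > 0$:

$$\lambda_1 < 0,\ \lambda_2 < 0:\quad \lim_{t\to\infty} m(t) = 0$$

$$\lambda_1 > 0,\ \frac{m_{0y}}{m_{0x}} \neq \alpha_2:\quad m(t) \sim \tfrac{1}{2} e^{(\frac{1}{2}(a+d)+\Delta)t} \begin{bmatrix} 1 - \frac{d-a}{2\Delta} & \frac{b}{\Delta} \\ \frac{c}{\Delta} & 1 + \frac{d-a}{2\Delta} \end{bmatrix} m_0$$

$$\lambda_1 > 0,\ \lambda_2 < 0,\ \frac{m_{0y}}{m_{0x}} = \alpha_2:\quad \lim_{t\to\infty} m(t) = 0$$

$$\lambda_1 > 0,\ \lambda_2 > 0,\ \frac{m_{0y}}{m_{0x}} = \alpha_2:\quad m(t) \sim \tfrac{1}{2} e^{(\frac{1}{2}(a+d)-\Delta)t} \begin{bmatrix} 1 + \frac{d-a}{2\Delta} & -\frac{b}{\Delta} \\ -\frac{c}{\Delta} & 1 - \frac{d-a}{2\Delta} \end{bmatrix} m_0 \tag{48}$$

$\Delta^2 \leq 0$:

$$\operatorname{tr} A < 0:\quad \lim_{t\to\infty} m(t) = 0.$$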
Proof: Writing the components of $m(t)$, (45), in alternative explicit form gives

$$m_x(t) = \tfrac{1}{2} e^{\frac{1}{2}(a+d+2\Delta)t}\left[\left(1 - \frac{d-a}{2\Delta}\right) m_x(0) + \frac{b}{\Delta} m_y(0)\right] + \tfrac{1}{2} e^{\frac{1}{2}(a+d-2\Delta)t}\left[\left(1 + \frac{d-a}{2\Delta}\right) m_x(0) - \frac{b}{\Delta} m_y(0)\right], \tag{49}$$

$$m_y(t) = \tfrac{1}{2} e^{\frac{1}{2}(a+d+2\Delta)t}\left[\frac{c}{\Delta} m_x(0) + \left(1 + \frac{d-a}{2\Delta}\right) m_y(0)\right] + \tfrac{1}{2} e^{\frac{1}{2}(a+d-2\Delta)t}\left[-\frac{c}{\Delta} m_x(0) + \left(1 - \frac{d-a}{2\Delta}\right) m_y(0)\right]. \tag{50}$$

When $\lambda_1 < 0$ and $\lambda_2 < 0$, the coefficients of $t$ in the exponents above are negative, and hence $\lim_{t\to\infty} m_x(t) = 0$ and $\lim_{t\to\infty} m_y(t) = 0$. In the other cases, the terms with $e^{\frac{1}{2}(a+d+2\Delta)t}$ dominate; the terms with $e^{\frac{1}{2}(a+d-2\Delta)t}$ dominate only for the special initial values $r_0 = m_y(0)/m_x(0) = \alpha_2$. In this way, the Corollary is verified. □

Remark 4. For parameter regions of the drift coefficients, as depicted on the plane $(\operatorname{tr} A, |A|)$, cf. (27)-(29), the geometry of the global phase portraits and the evolution (exact time paths) of the mean functions (45)-(47) are shown in Fig. 1. As seen further below, the parameter regions for classifying the global behavior of the covariance functions also correspond to the regions depicted in Fig. 1. Those GD and GWD processes with asymptotic stationary probability distributions are located in the interior of the second (upper-left, shaded) quadrant in Fig. 1.⁶ ∇

Whereas the sample paths of the diffusion processes are always continuous but nondifferentiable, the evolution of their moments is smoothly changing (described by C −1 -curves), as illustrated by the mean functions in Fig. 1.

2.5 Covariance matrix functions
2.5.1 Solutions to the differential equations

From (20)-(23), the covariance function $\sigma = [\sigma_{xx}\ \sigma_{xy}\ \sigma_{yy}]^T$, which contains the three independent elements of the covariance matrix $\Sigma(t)$ for both the Gaussian and geometric Wiener diffusions, satisfies a 3-dimensional ordinary differential equation in a general symbolic form:

$$\dot{\sigma}(t) = C\sigma(t) + \delta(t). \tag{51}$$

The complete solution of (51) can be symbolically written as follows:

$$\sigma(t) = e^{tC}\left[\sigma(0) + \int_0^t e^{-\tau C}\delta(\tau)\,d\tau\right], \qquad e^{0C} = I, \quad e^{-tC} = \left(e^{tC}\right)^{-1}, \tag{52}$$

where the parameter $\sigma(0) \in \mathbb{R}^n$ plays the role of initial value of the solution. For a given particular solution $\bar{\sigma}(t)$ of the equation (51), the complete solution $\sigma(t)$ passing through $\sigma_0$ at $t = 0$, $\sigma(0) = \sigma_0$, is

$$\sigma(t) = e^{tC}\sigma^*(0) + \bar{\sigma}(t); \tag{53}$$

$$\sigma^*(0) = \sigma_0 - \bar{\sigma}(0) \tag{54}$$

where $\sigma^*(0)$ denotes the initial value of the solution to the corresponding homogeneous system (51). This symbolic notation is used below with specific matrices.

⁶ For a full dynamic description of the trajectory configurations in Fig. 1, see Jensen (1994, pp. 235).
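To make the symbolic system (51) concrete, the sketch below builds the $3 \times 3$ matrix $C$ from the entries of $A$ and checks that $C\sigma$ reproduces the vectorized drift part $A\Sigma + \Sigma A^T$ of (26). The form of $C$ used here is inferred from (60), i.e. $C = (C - \operatorname{tr} A\, I) + \operatorname{tr} A\, I$; the function names and the numerical values of $A$, $B$, and $\Sigma$ are illustrative assumptions.

```python
import numpy as np

def ito_matrix(a, b, c, d):
    """Matrix C acting on sigma = [s_xx, s_xy, s_yy]^T (form inferred from (60))."""
    return np.array([[2 * a, 2 * b, 0.0],
                     [c, a + d, b],
                     [0.0, 2 * c, 2 * d]])

def vech(S):
    """Stack the three independent entries of a symmetric 2x2 matrix."""
    return np.array([S[0, 0], S[0, 1], S[1, 1]])

# Hypothetical drift and diffusion matrices (stable A, so tr A < 0 and |A| > 0).
A = np.array([[-1.0, 2.0], [0.5, -3.0]])
B = np.array([[1.0, 0.3], [0.2, 0.8]])
C = ito_matrix(*A.ravel())
Q = B @ B.T
delta = vech(Q)                      # constant non-homogeneous part for the GD model

# 1) C sigma equals vech(A Sigma + Sigma A^T) for an arbitrary symmetric Sigma.
S = np.array([[2.0, 0.4], [0.4, 1.5]])
lhs = vech(A @ S + S @ A.T)
rhs = C @ vech(S)

# 2) With constant delta and |C| != 0, a constant particular solution of (51)
#    is sigma_bar = -C^{-1} delta; it makes the right-hand side of (26) vanish.
sigma_bar = -np.linalg.solve(C, delta)
Sigma_bar = np.array([[sigma_bar[0], sigma_bar[1]],
                      [sigma_bar[1], sigma_bar[2]]])
residual = A @ Sigma_bar + Sigma_bar @ A.T + Q

print("vectorization gap:", np.max(np.abs(lhs - rhs)))
print("stationary residual:", np.max(np.abs(residual)))
```

With this constant particular solution, (53)-(54) reduce to $\sigma(t) = e^{tC}(\sigma_0 - \bar{\sigma}) + \bar{\sigma}$.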
Exponential matrices $e^{tC}$ and $e^{t\tilde{C}}$

As a first step in determining the solution in (52), we use the eigenvalues of our specific $C$ and $\tilde{C}$, cf. Lemma 3, to calculate the relevant exponential matrices, $e^{tC}$ and $e^{t\tilde{C}}$.

Proposition 3. The exponential matrix $e^{tC}$ of the GD model with $C$, (20)-(21), is given by

$$e^{tC} = e^{(a+d)t} M(t), \qquad e^{0C} = M(0) = I, \tag{55}$$

where the matrix $M(t)$ is presented in Tables 1-3 as follows:

Table 1: The nonsingular case, $|C| \neq 0$, with matrix $M(t)$ as $M_+(t)$, $M_0(t)$, $M_-(t)$ for respectively $\Delta^2 > 0$, $\Delta^2 = 0$ and $\Delta^2 < 0$.

Table 2: The singular case, $|C| = 0$, and $\operatorname{tr} A = 0$, with $M(t)$ as $M_+(t)$, $M_0(t)$, $M_-(t)$ for respectively $\Delta^2 > 0$, $\Delta^2 = 0$ and $\Delta^2 < 0$.

Table 3: The singular case, $|C| = 0$, and $|A| = 0$, with a single $M(t)$ matrix.

The exponential matrix $e^{t\tilde{C}}$ of the GWD model with $\tilde{C}$, (22)-(23), is given by

$$e^{t\tilde{C}} = e^{\beta^2 t} e^{tC} = e^{(a+d+\beta^2)t} M(t), \qquad e^{0\tilde{C}} = M(0) = I, \tag{56}$$
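where the matrix $M(t)$ is presented in Tables 1-3.

Proof: For $\Delta^2 > 0$, the eigenvalues of $C$ are distinct, and hence [Appendix C]

$$e^{tC} = e^{\lambda_1^C t}\,\frac{(C - \lambda_2^C I)(C - \lambda_3^C I)}{(\lambda_1^C - \lambda_2^C)(\lambda_1^C - \lambda_3^C)} + e^{\lambda_2^C t}\,\frac{(C - \lambda_1^C I)(C - \lambda_3^C I)}{(\lambda_2^C - \lambda_1^C)(\lambda_2^C - \lambda_3^C)} + e^{\lambda_3^C t}\,\frac{(C - \lambda_1^C I)(C - \lambda_2^C I)}{(\lambda_3^C - \lambda_1^C)(\lambda_3^C - \lambda_2^C)}.$$

By inserting the eigenvalues of $C$ for $\Delta^2 > 0$, cf. (39), we get

$$\begin{aligned} e^{tC} ={}& \frac{e^{(\operatorname{tr} A + 2\Delta)t}}{8\Delta^2}\left[C^2 - 2(\operatorname{tr} A - \Delta)C + \operatorname{tr} A(\operatorname{tr} A - 2\Delta)I\right] \\ &+ \frac{e^{(\operatorname{tr} A - 2\Delta)t}}{8\Delta^2}\left[C^2 - 2(\operatorname{tr} A + \Delta)C + \operatorname{tr} A(\operatorname{tr} A + 2\Delta)I\right] \\ &- \frac{e^{(\operatorname{tr} A)t}}{4\Delta^2}\left[C^2 - 2(\operatorname{tr} A)C + (\operatorname{tr} A - 2\Delta)(\operatorname{tr} A + 2\Delta)I\right] \\ ={}& \frac{e^{(\operatorname{tr} A)t}}{4\Delta^2}\left[(\cosh(2\Delta t) - 1)\,C^2 + 2\left(\operatorname{tr} A - \operatorname{tr} A\cosh(2\Delta t) + \Delta\sinh(2\Delta t)\right)C\right] \\ &+ \frac{e^{(\operatorname{tr} A)t}}{4\Delta^2}\left[(\operatorname{tr} A)^2\cosh(2\Delta t) - 2\Delta \operatorname{tr} A\sinh(2\Delta t) - 4|A|\right] I. \end{aligned} \tag{57}$$

Using that

$$C^2 = \begin{bmatrix} 4a^2 + 2bc & 6ab + 2bd & 2b^2 \\ 3ac + dc & (a+d)^2 + 4bc & 3bd + ab \\ 2c^2 & 6cd + 2ac & 4d^2 + 2bc \end{bmatrix}, \tag{58}$$

a regrouping of the terms in (57) leads to $M_+(t)$ in Table 1. Similarly, when $\Delta^2 < 0$, there are also three distinct eigenvalues, and following the same procedure with (41), we get $M_-(t)$ in Table 1.

For the case $\Delta^2 = 0$, it is seen from (40) that $C$ has three identical eigenvalues, $\lambda_1^C = \lambda_2^C = \lambda_3^C = \lambda^C = \operatorname{tr} A$. Hence, by Appendix C,

$$e^{tC} = e^{\lambda^C t}\left[I + t\,(C - \lambda^C I) + \tfrac{1}{2}t^2 (C - \lambda^C I)^2\right] = e^{(\operatorname{tr} A)t}\left[I + (C - \operatorname{tr} A\, I)t + \tfrac{1}{2}(C - \operatorname{tr} A\, I)^2 t^2\right]. \tag{59}$$

By inserting the matrices, cf. (21),

$$C - \operatorname{tr} A\, I = \begin{bmatrix} a - d & 2b & 0 \\ c & 0 & b \\ 0 & 2c & d - a \end{bmatrix}, \tag{60}$$

$$(C - \operatorname{tr} A\, I)^2 = \begin{bmatrix} (d-a)^2 + 2bc & -2b(d-a) & 2b^2 \\ -c(d-a) & 4bc & b(d-a) \\ 2c^2 & 2c(d-a) & (d-a)^2 + 2bc \end{bmatrix} \tag{61}$$

into (59), $M_0(t)$ in Table 1 is obtained. The singular cases in Tables 2-3 are the relevant simplifications of $M_+(t)$, $M_0(t)$ and $M_-(t)$ in Table 1.

From (23), we have $\tilde{C} = C + \beta^2 I$, where $C$ and $\beta^2 I$ commute so that

$$e^{t\tilde{C}} = e^{\beta^2 t I} \cdot e^{tC} = e^{\beta^2 t} \cdot e^{tC}.$$ □

Proposition 3 is a major result of our investigation. Having obtained explicit expressions for the exponential matrices $e^{tC}$ and $e^{t\tilde{C}}$, the problem is now, by (53) and (52), to calculate the particular solutions of the differential equations (21) and (23).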
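Two ingredients of the proof lend themselves to a quick numerical check: the eigenvalues of $C$ stated in Lemma 3, and the fact that for $\Delta^2 = 0$ the matrix $C - \operatorname{tr} A\, I$ in (60) is nilpotent, which is why the series for $e^{tC}$ terminates after the quadratic term in (59). The sketch below is illustrative only; the parameter values are assumptions, and the last check writes $M_0(t)$ as products of the entries of the mean transition matrix (46), an observation that is consistent with Table 1 but is not how the text derives it.

```python
import numpy as np

def ito_matrix(a, b, c, d):
    """Matrix C, with C - tr(A) I as displayed in (60)."""
    return np.array([[2 * a, 2 * b, 0.0],
                     [c, a + d, b],
                     [0.0, 2 * c, 2 * d]])

# Case Delta^2 > 0 (hypothetical values): eigenvalues of C are tr A, 2*lambda_1, 2*lambda_2.
a, b, c, d = -1.0, 2.0, 1.0, -3.0
lam = np.linalg.eigvals(np.array([[a, b], [c, d]]))
expected = np.sort(np.array([a + d, 2 * lam[0], 2 * lam[1]]).real)
computed = np.sort(np.linalg.eigvals(ito_matrix(a, b, c, d)).real)

# Case Delta^2 = 0 (hypothetical values with (d - a)^2 / 4 + bc = 0).
a0, b0, c0, d0 = 1.0, 1.0, -1.0, 3.0
N = ito_matrix(a0, b0, c0, d0) - (a0 + d0) * np.eye(3)   # matrix (60)
N2 = N @ N                                                 # matrix (61)
N3 = N2 @ N                                                # vanishes when Delta^2 = 0

# Formula (59) versus e^{(a+d)t} M_0(t) built from the entries of (46).
t = 0.4
exp_tC = np.exp((a0 + d0) * t) * (np.eye(3) + N * t + 0.5 * N2 * t ** 2)
p, q = 1 - 0.5 * (d0 - a0) * t, b0 * t
r, s = c0 * t, 1 + 0.5 * (d0 - a0) * t
M0 = np.array([[p * p, 2 * p * q, q * q],
               [p * r, p * s + q * r, q * s],
               [r * r, 2 * r * s, s * s]])

print("eigenvalue gap:", np.max(np.abs(expected - computed)))
print("max |N^3|:", np.max(np.abs(N3)))
print("(59) vs M_0:", np.max(np.abs(exp_tC - np.exp((a0 + d0) * t) * M0)))
```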
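Table 1. The matrices $M_+(t)$, $M_0(t)$, $M_-(t)$ for the nonsingular case: $|C| \neq 0$.

$$M_+(t) = \begin{bmatrix} \left(1 - \frac{bc}{2\Delta^2}\right)\cosh(2\Delta t) - \frac{d-a}{2\Delta}\sinh(2\Delta t) + \frac{bc}{2\Delta^2} & -\frac{b(d-a)}{2\Delta^2}(\cosh(2\Delta t) - 1) + \frac{b}{\Delta}\sinh(2\Delta t) & \frac{b^2}{2\Delta^2}(\cosh(2\Delta t) - 1) \\[4pt] -\frac{c(d-a)}{4\Delta^2}(\cosh(2\Delta t) - 1) + \frac{c}{2\Delta}\sinh(2\Delta t) & \frac{bc}{\Delta^2}(\cosh(2\Delta t) - 1) + 1 & \frac{b(d-a)}{4\Delta^2}(\cosh(2\Delta t) - 1) + \frac{b}{2\Delta}\sinh(2\Delta t) \\[4pt] \frac{c^2}{2\Delta^2}(\cosh(2\Delta t) - 1) & \frac{c(d-a)}{2\Delta^2}(\cosh(2\Delta t) - 1) + \frac{c}{\Delta}\sinh(2\Delta t) & \left(1 - \frac{bc}{2\Delta^2}\right)\cosh(2\Delta t) + \frac{d-a}{2\Delta}\sinh(2\Delta t) + \frac{bc}{2\Delta^2} \end{bmatrix}$$

$$M_0(t) = \begin{bmatrix} \left[1 - \frac{1}{2}(d-a)t\right]^2 & 2bt - b(d-a)t^2 & b^2t^2 \\[4pt] ct - \frac{1}{2}c(d-a)t^2 & 1 - \frac{1}{2}(d-a)^2t^2 & bt + \frac{1}{2}b(d-a)t^2 \\[4pt] c^2t^2 & 2ct + c(d-a)t^2 & \left[1 + \frac{1}{2}(d-a)t\right]^2 \end{bmatrix}$$

$$M_-(t) = \begin{bmatrix} \left(1 - \frac{bc}{2\Delta^2}\right)\cos(2\Delta t) - \frac{d-a}{2\Delta}\sin(2\Delta t) + \frac{bc}{2\Delta^2} & -\frac{b(d-a)}{2\Delta^2}(\cos(2\Delta t) - 1) + \frac{b}{\Delta}\sin(2\Delta t) & \frac{b^2}{2\Delta^2}(\cos(2\Delta t) - 1) \\[4pt] -\frac{c(d-a)}{4\Delta^2}(\cos(2\Delta t) - 1) + \frac{c}{2\Delta}\sin(2\Delta t) & \frac{bc}{\Delta^2}(\cos(2\Delta t) - 1) + 1 & \frac{b(d-a)}{4\Delta^2}(\cos(2\Delta t) - 1) + \frac{b}{2\Delta}\sin(2\Delta t) \\[4pt] \frac{c^2}{2\Delta^2}(\cos(2\Delta t) - 1) & \frac{c(d-a)}{2\Delta^2}(\cos(2\Delta t) - 1) + \frac{c}{\Delta}\sin(2\Delta t) & \left(1 - \frac{bc}{2\Delta^2}\right)\cos(2\Delta t) + \frac{d-a}{2\Delta}\sin(2\Delta t) + \frac{bc}{2\Delta^2} \end{bmatrix}$$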
2cd [cos(2Δt) − 1] + 2cΔ sin(2Δt)
2c2 [cosh((a + d)t) − 1]
1 1 [ae 2 (a + d)t + de− 2 (a + d)t ]2 ⎢ ⎢ 1 ⎢ M (t) = ⎢ ace(a+d)t + c(d − a) − dce−(a+d)t (a + d)2 ⎢ ⎣
⎡
2cde(a+d)t − 2c(d − a) − 2ace−(a+d)t
2bc [cosh((a + d)t) − 1]
2abe(a+d)t + 2b(d − a) − 2bde−(a+d)t
⎥ ⎥ ⎥ bde(a+d)t − b(d − a) − abe−(a+d)t ⎥ ⎥ ⎦ 1 1 (a + d)t − (a + d)t 2 + ae 2 ] [de 2
2b2 [cosh((a + d)t) − 1]
⎤
(d2 + Δ2 ) cos(2Δt) + 2dΔ sin(2Δt) + bc
bd [cos(2Δt) − 1] + bΔ sin(2Δt)
2bc [cos(2Δt) − 1] + 2Δ2
Table 3. The matrix M (t) for the singularity : |C| = 0, with |A| = 0.
c2 [cos(2Δt) − 1]
b2 [cos(2Δt) − 1]
(dt + 1)2
2ct(dt + 1)
2ab [cos(2Δt) − 1] + 2bΔ sin(2Δt)
c2 t2
⎤
⎥ ⎥ bt(dt + 1) ⎥ ⎦
b2 t2
2bct2 + 1
2bt(at + 1)
⎥ ⎥ ⎥ ⎦
⎤
(d2 + Δ2 ) cosh(2Δt) + 2dΔ sinh(2Δt) + bc
bd [cosh(2Δt) − 1] + bΔ sinh(2Δt)
2bc [cosh(2Δt) − 1] + 2Δ2 2cd [cosh(2Δt) − 1] + 2cΔ sinh(2Δt)
b2 [cosh(2Δt) − 1]
2ab [cosh(2Δt) − 1] + 2bΔ sinh(2Δt)
(at + 1)2 ⎢ ⎢ M0 (t) = ⎢ ct(at + 1) ⎣
⎡
(a2 + Δ2 ) cos(2Δt) + 2aΔ sin(2Δt) + bc ⎢ 1 ⎢ M− (t) = ⎢ ac [cos(2Δt) − 1] + cΔ sin(2Δt) 2Δ2 ⎣
⎡
c2 [cosh(2Δt) − 1]
(a2 + Δ2 ) cosh(2Δt) + 2aΔ sinh(2Δt) + bc 1 ⎢ ⎢ M+ (t) = ⎢ ac [cosh(2Δt) − 1] + cΔ sinh(2Δt) 2Δ2 ⎣
⎡
Table 2. The matrices M+ (t), M0 (t), M− (t) for the singularity : |C| = 0, with trA = 0. ⎥ ⎥ ⎥ ⎦
⎤
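Particular solutions

Proposition 4. For the GD, (20)-(21), particular solutions $\bar{\sigma}(t)$ are provided by the following expressions:

In the nonsingular case, $|C| = 4|A| \operatorname{tr} A \neq 0$, $\bar{\sigma}(t)$ can be taken as a constant vector:

$$\bar{\sigma} = \begin{bmatrix} \bar{\sigma}_{xx} \\ \bar{\sigma}_{xy} \\ \bar{\sigma}_{yy} \end{bmatrix} = \frac{1}{2|A| \operatorname{tr} A} \begin{bmatrix} -|A|(\beta_{11}^2 + \beta_{12}^2) - (d\beta_{11} - b\beta_{21})^2 - (b\beta_{22} - d\beta_{12})^2 \\ cd\beta_{11}^2 - 2ad\beta_{11}\beta_{21} + ab\beta_{21}^2 + cd\beta_{12}^2 - 2ad\beta_{12}\beta_{22} + ab\beta_{22}^2 \\ -|A|(\beta_{21}^2 + \beta_{22}^2) - (c\beta_{11} - a\beta_{21})^2 - (c\beta_{12} - a\beta_{22})^2 \end{bmatrix} \tag{62}$$

In the singular case, $|C| = 4|A| \operatorname{tr} A = 0$, the particular solution $\bar{\sigma}(t)$ is as follows: if

$$\operatorname{tr} A = 0:\quad \bar{\sigma}(t) = \begin{bmatrix} \gamma t \\ \frac{1}{2b}\left[\gamma - (\beta_{11}^2 + \beta_{12}^2)\right] - \frac{a}{b}\gamma t \\ -\frac{c}{b}\gamma t - \frac{a}{b^2}\gamma - \frac{\beta_{11}\beta_{21} + \beta_{12}\beta_{22}}{b} \end{bmatrix} \tag{63}$$

where

$$\gamma = \frac{1}{2|A|}\left[(d\beta_{11} - b\beta_{21})^2 + (d\beta_{12} - b\beta_{22})^2\right] + \frac{1}{2}(\beta_{11}^2 + \beta_{12}^2),$$

or if

$$|A| = 0:\quad \bar{\sigma}(t) = \begin{bmatrix} \gamma t \\ \frac{1}{2b}\left[\gamma - (\beta_{11}^2 + \beta_{12}^2)\right] - \frac{a}{b}\gamma t \\ \frac{a^2}{b^2}\gamma t + \frac{a+d}{2b^2}(\beta_{11}^2 + \beta_{12}^2 - \gamma) - \frac{a}{b^2}\gamma - \frac{\beta_{11}\beta_{21} + \beta_{12}\beta_{22}}{b} \end{bmatrix} \tag{64}$$

where $\gamma$ now denotes $(\operatorname{tr} A)^{-2}\left[(d\beta_{11} - b\beta_{21})^2 + (d\beta_{12} - b\beta_{22})^2\right]$.

For the GWD with $|\tilde{C}| \neq 0$, (22)-(23), a particular component $\tilde{\sigma}(t)$ is

$$\tilde{\sigma}(t) = \int_0^t e^{-\tau\tilde{C}}\,\tilde{\delta}(\tau)\,d\tau = \left(1 - e^{-\beta^2 t}\right)\begin{bmatrix} m_x^2(0) \\ m_x(0)m_y(0) \\ m_y^2(0) \end{bmatrix}, \tag{65}$$

where [$e^{-t\tilde{C}}$ is the inverse of (56); $e^{-t\tilde{C}} = e^{-(a+d+\beta^2)t}M(-t)$, $M(-t) = [M(t)]^{-1}$]

$$e^{-t\tilde{C}}\,\tilde{\delta}(t) = e^{-\beta^2 t}\beta^2\begin{bmatrix} m_x^2(0) \\ m_x(0)m_y(0) \\ m_y^2(0) \end{bmatrix}, \tag{66}$$

and $m_x(0)$ and $m_y(0)$ are the initial values of the mean function $m(t)$, cf. (45)-(47) in Theorem 1.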
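As a plausibility check of the constant particular solution (62), one can compare it with the direct solution $\bar{\sigma} = -C^{-1}\delta$ of the stationary form of (51), where $\delta$ collects the entries of $BB^T$. The sketch below assumes the form of $C$ implied by (60); the function names and the numerical values of $A$ and $B$ are hypothetical.

```python
import numpy as np

def sigma_bar_closed_form(A, B):
    """Constant particular solution following formula (62)."""
    (a, b), (c, d) = A
    (b11, b12), (b21, b22) = B
    det_a = a * d - b * c
    tr_a = a + d
    numerator = np.array([
        -det_a * (b11**2 + b12**2) - (d * b11 - b * b21)**2 - (b * b22 - d * b12)**2,
        c * d * b11**2 - 2 * a * d * b11 * b21 + a * b * b21**2
        + c * d * b12**2 - 2 * a * d * b12 * b22 + a * b * b22**2,
        -det_a * (b21**2 + b22**2) - (c * b11 - a * b21)**2 - (c * b12 - a * b22)**2,
    ])
    return numerator / (2 * det_a * tr_a)

def sigma_bar_linear_solve(A, B):
    """Same quantity obtained as -C^{-1} delta (requires |C| != 0)."""
    (a, b), (c, d) = A
    C = np.array([[2 * a, 2 * b, 0.0],
                  [c, a + d, b],
                  [0.0, 2 * c, 2 * d]])
    Q = B @ B.T
    delta = np.array([Q[0, 0], Q[0, 1], Q[1, 1]])
    return -np.linalg.solve(C, delta)

# Hypothetical parameters with |A| != 0 and tr A != 0.
A = np.array([[-1.0, 2.0], [0.5, -3.0]])
B = np.array([[1.0, 0.3], [0.2, 0.8]])
closed = sigma_bar_closed_form(A, B)
solved = sigma_bar_linear_solve(A, B)
print("closed form :", closed)
print("linear solve:", solved)
```

For a stable drift matrix the first and third components are positive, as they must be for stationary variances.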
Proof: Concerning (62)-(64) for the GD model, see Appendix A. For the GWD model, (23) shows that the nonhomogeneous part depends on the mean function, which is a function of time $t$. Noting $\tilde{\delta}(t)$, (22)-(23) and (45)-(47) together with (52), we get (65) and (66) after considerable simplification. □

Remark 5. The RHS of formula (65) for the GWD model is another main result of this study. It is noteworthy how the initial mean values of $m_x$ and $m_y$ enter there in precisely the same manner as they enter in $\tilde{\delta}(t)$, cf. (22)-(23). The reductions leading to the RHS of formula (65) are lengthy; it is decisive, first, to have Theorem 1 on the means (45)-(47) available, and second, to have an extensive cancellation of terms in the integrand (66). The details are left out to save space, but a technical report may be obtained from the authors. Alternatively, one could also make a computer-aided calculation of the integral (65), using Maple or similar software. ∇

General covariance vector solutions

Finally, by (52)-(53), (55)-(56), and Propositions 3-4, we get:

Theorem 2. The covariance vector $\sigma(t) = [\sigma_{xx}(t)\ \sigma_{xy}(t)\ \sigma_{yy}(t)]^T$ that explicitly solves (20)-(21) for the GD model with the nonsingular Itô coefficient matrix, $|C| \neq 0$, is [cf. (53), (55), (62), (63) and (64)]

$$\sigma(t) = e^{(a+d)t} M(t)\sigma^*(0) + \bar{\sigma}(t). \tag{67}$$

This vector $\sigma(t)$ is presented in Tables 4, 5, and 6 for the three cases $\Delta^2 > 0$, $\Delta^2 = 0$, and $\Delta^2 < 0$, respectively.

The covariance vector $\sigma(t) = [\sigma_{xx}(t)\ \sigma_{xy}(t)\ \sigma_{yy}(t)]^T$ that explicitly solves (22)-(23) for the GWD model is [cf. (52), (56) and (65)]

$$\sigma(t) = e^{(a+d+\beta^2)t} M(t)\left[\sigma(0) + \tilde{\sigma}(t)\right]. \tag{68}$$
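This vector $\sigma(t)$ is presented in Tables 7, 8, and 9 for the cases $\Delta^2 > 0$, $\Delta^2 = 0$, and $\Delta^2 < 0$, respectively.

Table 4. GD function $\sigma(t)$, (53), (55), (62), (29), for $\Delta^2 > 0$, $|C| \neq 0$:

$$\sigma_{xx}(t) = e^{(a+d)t}\left[k_1^* e^{2\Delta t} + k_2^* e^{-2\Delta t} + k_3^*\right] + \bar{\sigma}_{xx} \tag{69}$$

$$\begin{aligned} k_1^* &= \tfrac{1}{4}\left(1 - \tfrac{d-a}{2\Delta}\right)^2 \sigma^*_{xx}(0) + \tfrac{b}{2\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)\sigma^*_{xy}(0) + \tfrac{b^2}{4\Delta^2}\sigma^*_{yy}(0) \\ k_2^* &= \tfrac{1}{4}\left(1 + \tfrac{d-a}{2\Delta}\right)^2 \sigma^*_{xx}(0) - \tfrac{b}{\Delta}\left(\tfrac{1}{2} + \tfrac{d-a}{4\Delta}\right)\sigma^*_{xy}(0) + \tfrac{b^2}{4\Delta^2}\sigma^*_{yy}(0) \\ k_3^* &= \tfrac{b}{2\Delta^2}\left[c\,\sigma^*_{xx}(0) + (d-a)\,\sigma^*_{xy}(0) - b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_1^* + k_2^* + k_3^* = \sigma^*_{xx}(0)$$

$$\sigma_{xy}(t) = e^{(a+d)t}\left[k_4^* e^{2\Delta t} + k_5^* e^{-2\Delta t} + k_6^*\right] + \bar{\sigma}_{xy} \tag{70}$$

$$\begin{aligned} k_4^* &= \tfrac{c}{4\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)\sigma^*_{xx}(0) + \tfrac{bc}{2\Delta^2}\sigma^*_{xy}(0) + \tfrac{b}{4\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)\sigma^*_{yy}(0) \\ k_5^* &= -\tfrac{c}{4\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)\sigma^*_{xx}(0) + \tfrac{bc}{2\Delta^2}\sigma^*_{xy}(0) - \tfrac{b}{4\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)\sigma^*_{yy}(0) \\ k_6^* &= \tfrac{d-a}{4\Delta^2}\left[c\,\sigma^*_{xx}(0) + (d-a)\,\sigma^*_{xy}(0) - b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_4^* + k_5^* + k_6^* = \sigma^*_{xy}(0)$$

$$\sigma_{yy}(t) = e^{(a+d)t}\left[k_7^* e^{2\Delta t} + k_8^* e^{-2\Delta t} + k_9^*\right] + \bar{\sigma}_{yy} \tag{71}$$

$$\begin{aligned} k_7^* &= \tfrac{c^2}{4\Delta^2}\sigma^*_{xx}(0) + \tfrac{c}{2\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)\sigma^*_{xy}(0) + \tfrac{1}{4}\left(1 + \tfrac{d-a}{2\Delta}\right)^2\sigma^*_{yy}(0) \\ k_8^* &= \tfrac{c^2}{4\Delta^2}\sigma^*_{xx}(0) - \tfrac{c}{2\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)\sigma^*_{xy}(0) + \tfrac{1}{4}\left(1 - \tfrac{d-a}{2\Delta}\right)^2\sigma^*_{yy}(0) \\ k_9^* &= \tfrac{c}{2\Delta^2}\left[-c\,\sigma^*_{xx}(0) - (d-a)\,\sigma^*_{xy}(0) + b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_7^* + k_8^* + k_9^* = \sigma^*_{yy}(0)$$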
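Table 5. GD function $\sigma(t)$, (53), (55), (62), for $\Delta^2 = 0$, $|C| \neq 0$:

$$\sigma_{xx}(t) = e^{(a+d)t}\left\{\left[1 - \tfrac{1}{2}(d-a)t\right]^2\sigma^*_{xx}(0) + 2bt\left[1 - \tfrac{1}{2}(d-a)t\right]\sigma^*_{xy}(0) + b^2t^2\sigma^*_{yy}(0)\right\} + \bar{\sigma}_{xx} \tag{72}$$

$$\sigma_{xy}(t) = e^{(a+d)t}\left\{\left[ct - \tfrac{1}{2}c(d-a)t^2\right]\sigma^*_{xx}(0) + \left[1 - \tfrac{1}{2}(d-a)^2t^2\right]\sigma^*_{xy}(0) + \left[bt + \tfrac{1}{2}b(d-a)t^2\right]\sigma^*_{yy}(0)\right\} + \bar{\sigma}_{xy} \tag{73}$$

$$\sigma_{yy}(t) = e^{(a+d)t}\left\{c^2t^2\sigma^*_{xx}(0) + 2ct\left[1 + \tfrac{1}{2}(d-a)t\right]\sigma^*_{xy}(0) + \left[1 + \tfrac{1}{2}(d-a)t\right]^2\sigma^*_{yy}(0)\right\} + \bar{\sigma}_{yy} \tag{74}$$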
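Table 6. GD function $\sigma(t)$, (53), (55), (62), for $\Delta^2 < 0$, $|C| \neq 0$:

$$\sigma_{xx}(t) = e^{(a+d)t}\left\{k_{10}^*\cos(2\Delta t) + k_{11}^*\sin(2\Delta t) + k_{12}^*\right\} + \bar{\sigma}_{xx} \tag{75}$$

$$\begin{aligned} k_{10}^* &= \left(1 - \tfrac{bc}{2\Delta^2}\right)\sigma^*_{xx}(0) - \tfrac{b(d-a)}{2\Delta^2}\sigma^*_{xy}(0) + \tfrac{b^2}{2\Delta^2}\sigma^*_{yy}(0) \\ k_{11}^* &= \tfrac{1}{\Delta}\left[-\tfrac{d-a}{2}\sigma^*_{xx}(0) + b\,\sigma^*_{xy}(0)\right] \\ k_{12}^* &= \tfrac{b}{2\Delta^2}\left[c\,\sigma^*_{xx}(0) + (d-a)\,\sigma^*_{xy}(0) - b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_{10}^* + k_{12}^* = \sigma^*_{xx}(0)$$

$$\sigma_{xy}(t) = e^{(a+d)t}\left\{k_{13}^*\cos(2\Delta t) + k_{14}^*\sin(2\Delta t) + k_{15}^*\right\} + \bar{\sigma}_{xy} \tag{76}$$

$$\begin{aligned} k_{13}^* &= \tfrac{1}{\Delta^2}\left[-\tfrac{c(d-a)}{4}\sigma^*_{xx}(0) + bc\,\sigma^*_{xy}(0) + \tfrac{b(d-a)}{4}\sigma^*_{yy}(0)\right] \\ k_{14}^* &= \tfrac{1}{2\Delta}\left[c\,\sigma^*_{xx}(0) + b\,\sigma^*_{yy}(0)\right] \\ k_{15}^* &= \tfrac{d-a}{4\Delta^2}\left[c\,\sigma^*_{xx}(0) + (d-a)\,\sigma^*_{xy}(0) - b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_{13}^* + k_{15}^* = \sigma^*_{xy}(0)$$

$$\sigma_{yy}(t) = e^{(a+d)t}\left\{k_{16}^*\cos(2\Delta t) + k_{17}^*\sin(2\Delta t) + k_{18}^*\right\} + \bar{\sigma}_{yy} \tag{77}$$

$$\begin{aligned} k_{16}^* &= \tfrac{c^2}{2\Delta^2}\sigma^*_{xx}(0) + \tfrac{c(d-a)}{2\Delta^2}\sigma^*_{xy}(0) + \left(1 - \tfrac{bc}{2\Delta^2}\right)\sigma^*_{yy}(0) \\ k_{17}^* &= \tfrac{1}{\Delta}\left[c\,\sigma^*_{xy}(0) + \tfrac{d-a}{2}\sigma^*_{yy}(0)\right] \\ k_{18}^* &= \tfrac{(-c)}{2\Delta^2}\left[c\,\sigma^*_{xx}(0) + (d-a)\,\sigma^*_{xy}(0) - b\,\sigma^*_{yy}(0)\right] \end{aligned}$$

$$k_{16}^* + k_{18}^* = \sigma^*_{yy}(0)$$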
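Table 7. GWD function $\sigma(t)$, (52), (56), (65), for $\Delta^2 > 0$.

$$\begin{aligned} \sigma_{xx}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_1 e^{2\Delta t} + k_2 e^{-2\Delta t} + k_3 + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \tfrac{1}{4}\left[\left(1 - \tfrac{d-a}{2\Delta}\right)m_x(0) + \tfrac{b}{\Delta}m_y(0)\right]^2 e^{2\Delta t} \\ & + \tfrac{1}{4}\left[\left(1 + \tfrac{d-a}{2\Delta}\right)m_x(0) - \tfrac{b}{\Delta}m_y(0)\right]^2 e^{-2\Delta t} + \tfrac{b}{2\Delta^2}\left(c\,m_x^2(0) + (d-a)m_x(0)m_y(0) - b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{78}$$

where $k_1$, $k_2$ and $k_3$ are given in Table 4 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$,

$$\begin{aligned} \sigma_{xy}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_4 e^{2\Delta t} + k_5 e^{-2\Delta t} + k_6 + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \Big(\tfrac{c}{4\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)m_x^2(0) + \tfrac{bc}{2\Delta^2}m_x(0)m_y(0) + \tfrac{b}{4\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)m_y^2(0)\Big)e^{2\Delta t} \\ & + \Big(-\tfrac{c}{4\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)m_x^2(0) + \tfrac{bc}{2\Delta^2}m_x(0)m_y(0) - \tfrac{b}{4\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)m_y^2(0)\Big)e^{-2\Delta t} \\ & + \tfrac{d-a}{4\Delta^2}\left(c\,m_x^2(0) + (d-a)m_x(0)m_y(0) - b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{79}$$

where $k_4$, $k_5$ and $k_6$ are given in Table 4 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$,

$$\begin{aligned} \sigma_{yy}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_7 e^{2\Delta t} + k_8 e^{-2\Delta t} + k_9 + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \tfrac{1}{4}\left[\tfrac{c}{\Delta}m_x(0) + \left(1 + \tfrac{d-a}{2\Delta}\right)m_y(0)\right]^2 e^{2\Delta t} \\ & + \tfrac{1}{4}\left[-\tfrac{c}{\Delta}m_x(0) + \left(1 - \tfrac{d-a}{2\Delta}\right)m_y(0)\right]^2 e^{-2\Delta t} + \tfrac{c}{2\Delta^2}\left(-c\,m_x^2(0) - (d-a)m_x(0)m_y(0) + b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{80}$$

where $k_7$, $k_8$ and $k_9$ are given in Table 4 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$.

Table 8. GWD function $\sigma(t)$, (52), (56), (65), for $\Delta^2 = 0$.

$$\begin{aligned} \sigma_{xx}(t) = e^{(a+d+\beta^2)t}\Big\{ & \left[1 - \tfrac{1}{2}(d-a)t\right]^2\sigma_{xx}(0) + 2bt\left[1 - \tfrac{1}{2}(d-a)t\right]\sigma_{xy}(0) + b^2t^2\sigma_{yy}(0) \\ & + \left(1 - e^{-\beta^2 t}\right)\left[\left(1 - \tfrac{1}{2}(d-a)t\right)m_x(0) + bt\,m_y(0)\right]^2 \Big\} \end{aligned} \tag{81}$$

$$\begin{aligned} \sigma_{xy}(t) = e^{(a+d+\beta^2)t}\Big\{ & ct\left[1 - \tfrac{1}{2}(d-a)t\right]\sigma_{xx}(0) + \left[1 - \tfrac{1}{2}(d-a)^2t^2\right]\sigma_{xy}(0) + bt\left[1 + \tfrac{1}{2}(d-a)t\right]\sigma_{yy}(0) \\ & + \left(1 - e^{-\beta^2 t}\right)\Big[ct\left(1 - \tfrac{1}{2}(d-a)t\right)m_x^2(0) + \left(1 - \tfrac{1}{2}(d-a)^2t^2\right)m_x(0)m_y(0) + bt\left(1 + \tfrac{1}{2}(d-a)t\right)m_y^2(0)\Big] \Big\} \end{aligned} \tag{82}$$

$$\begin{aligned} \sigma_{yy}(t) = e^{(a+d+\beta^2)t}\Big\{ & c^2t^2\sigma_{xx}(0) + 2ct\left[1 + \tfrac{1}{2}(d-a)t\right]\sigma_{xy}(0) + \left[1 + \tfrac{1}{2}(d-a)t\right]^2\sigma_{yy}(0) \\ & + \left(1 - e^{-\beta^2 t}\right)\left[ct\,m_x(0) + \left(1 + \tfrac{1}{2}(d-a)t\right)m_y(0)\right]^2 \Big\} \end{aligned} \tag{83}$$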
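Table 9. GWD function $\sigma(t)$, (52), (56), (65), for $\Delta^2 < 0$.

$$\begin{aligned} \sigma_{xx}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_{10}\cos(2\Delta t) + k_{11}\sin(2\Delta t) + k_{12} + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \Big(\left(1 - \tfrac{bc}{2\Delta^2}\right)m_x^2(0) - \tfrac{b(d-a)}{2\Delta^2}m_x(0)m_y(0) + \tfrac{b^2}{2\Delta^2}m_y^2(0)\Big)\cos(2\Delta t) \\ & + \Big(-\tfrac{d-a}{2\Delta}m_x^2(0) + \tfrac{b}{\Delta}m_x(0)m_y(0)\Big)\sin(2\Delta t) + \tfrac{b}{2\Delta^2}\left(c\,m_x^2(0) + (d-a)m_x(0)m_y(0) - b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{84}$$

where $k_{10}$, $k_{11}$ and $k_{12}$ are given in Table 6 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$,

$$\begin{aligned} \sigma_{xy}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_{13}\cos(2\Delta t) + k_{14}\sin(2\Delta t) + k_{15} + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \Big(-\tfrac{c(d-a)}{4\Delta^2}m_x^2(0) + \tfrac{bc}{\Delta^2}m_x(0)m_y(0) + \tfrac{b(d-a)}{4\Delta^2}m_y^2(0)\Big)\cos(2\Delta t) \\ & + \Big(\tfrac{c}{2\Delta}m_x^2(0) + \tfrac{b}{2\Delta}m_y^2(0)\Big)\sin(2\Delta t) + \tfrac{d-a}{4\Delta^2}\left(c\,m_x^2(0) + (d-a)m_x(0)m_y(0) - b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{85}$$

where $k_{13}$, $k_{14}$ and $k_{15}$ are given in Table 6 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$,

$$\begin{aligned} \sigma_{yy}(t) = e^{(a+d+\beta^2)t}\Big\{ & k_{16}\cos(2\Delta t) + k_{17}\sin(2\Delta t) + k_{18} + \left(1 - e^{-\beta^2 t}\right) \times \Big[ \Big(\tfrac{c^2}{2\Delta^2}m_x^2(0) + \tfrac{c(d-a)}{2\Delta^2}m_x(0)m_y(0) + \left(1 - \tfrac{bc}{2\Delta^2}\right)m_y^2(0)\Big)\cos(2\Delta t) \\ & + \Big(\tfrac{c}{\Delta}m_x(0)m_y(0) + \tfrac{d-a}{2\Delta}m_y^2(0)\Big)\sin(2\Delta t) + \tfrac{(-c)}{2\Delta^2}\left(c\,m_x^2(0) + (d-a)m_x(0)m_y(0) - b\,m_y^2(0)\right) \Big] \Big\}, \end{aligned} \tag{86}$$

where $k_{16}$, $k_{17}$ and $k_{18}$ are given in Table 6 with $\sigma^*_{xx}(0)$, $\sigma^*_{xy}(0)$, $\sigma^*_{yy}(0)$ replaced by $\sigma_{xx}(0)$, $\sigma_{xy}(0)$, $\sigma_{yy}(0)$.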
Remark 6. It should be noted that the moment formulas (69)-(77) for the GD model with $|C| \neq 0$ have, independent of the sign of $\Delta^2$, constant terms $\bar{\sigma}_{xx}$, $\bar{\sigma}_{xy}$, $\bar{\sigma}_{yy}$, (62), depending on both the drift parameters ($A$) and the diffusion parameters ($B$). The terms $k_i^*$ in Tables 4-6 are independent of the diffusion parameters ($B$). For the moments (78)-(86) of the GWD model, however, the diffusion parameter $\beta$ is always involved in the long-run covariance vector solutions, except for the trivial long-run stationary solution "0", see (89)-(90). ∇

Remark 7. In both models, the initial value $\sigma(0)$ may be thought of as an uncertainty in the measurement of the initial state vector, $X(0)$. Then the explicit formulae allow a discussion of whether this initial uncertainty $\sigma(0)$ or the diffusion coefficient $B$ plays the dominating role for $\sigma(t)$, and hence for the future deviations of $X(t)$ from the mean value $m(t)$. Symbolic versions of the formulas for $\sigma(t)$, $\Sigma(t)$, are for the GD and GWD models, in the case of $\sigma(0) = 0$, found in Theorems 3 and 4 below. ∇

2.5.2 Asymptotics of the covariance vector function

From (69)-(86), the next result follows immediately.

Corollary 2. The asymptotic or long-run behavior, as $t \to \infty$, for the covariance vector function $\sigma(t)$ of the nonsingular GD model with $|C| \neq 0$ is given by [cf. Lemma 3]

$$\Delta^2 > 0:\quad \begin{cases} \lambda_1 < 0,\ \lambda_2 < 0: & \lim_{t\to\infty}\sigma(t) = \bar{\sigma} \\[4pt] \lambda_1 > 0: & \sigma(t) \sim e^{(a+d+2\Delta)t}\begin{bmatrix} k_1^* \\ k_4^* \\ k_7^* \end{bmatrix} \end{cases} \tag{87}$$

$$\Delta^2 \leq 0:\quad \operatorname{tr} A < 0:\quad \lim_{t\to\infty}\sigma(t) = \bar{\sigma} \tag{88}$$

where $\bar{\sigma}$ is given in (62) and $k_1^*$, $k_4^*$, $k_7^*$ are given in (69)-(71).

The asymptotic behavior, as $t \to \infty$, for the covariance vector function $\sigma(t)$ of the GWD model is given by

$$\Delta^2 > 0:\quad \begin{cases} \lambda_1 < 0,\ \lambda_2 < 0: & \lim_{t\to\infty}\sigma(t) = 0 \\[4pt] \lambda_1 > 0: & \sigma(t) \sim e^{(a+d+2\Delta+\beta^2)t}\begin{bmatrix} k_1 + \kappa_1 \\ k_4 + \kappa_4 \\ k_7 + \kappa_7 \end{bmatrix} \end{cases} \tag{89}$$

$$\Delta^2 \leq 0:\quad a + d + \beta^2 < 0:\quad \lim_{t\to\infty}\sigma(t) \equiv 0 \tag{90}$$

where $k_1$, $k_4$, $k_7$ are given in (78)-(80), and

$$\begin{aligned} \kappa_1 &= \tfrac{1}{4}\left[\left(1 - \tfrac{d-a}{2\Delta}\right)m_x(0) + \tfrac{b}{\Delta}m_y(0)\right]^2 \\ \kappa_4 &= \tfrac{c}{4\Delta}\left(1 - \tfrac{d-a}{2\Delta}\right)m_x^2(0) + \tfrac{bc}{2\Delta^2}m_x(0)m_y(0) + \tfrac{b}{4\Delta}\left(1 + \tfrac{d-a}{2\Delta}\right)m_y^2(0) \\ \kappa_7 &= \tfrac{1}{4}\left[\tfrac{c}{\Delta}m_x(0) + \left(1 + \tfrac{d-a}{2\Delta}\right)m_y(0)\right]^2 \end{aligned} \tag{91}$$

The correlation coefficient of the GD model with $|C| \neq 0$ has the following asymptotics, in the notation of (62) and (69)-(77):

$$\begin{aligned} \Delta^2 > 0:&\quad \begin{cases} \lambda_1 < 0,\ \lambda_2 < 0: & \lim_{t\to\infty}\rho(t)^2 = \dfrac{\bar{\sigma}_{xy}^2}{\bar{\sigma}_{xx}\bar{\sigma}_{yy}} \\[8pt] \lambda_1 > 0: & \lim_{t\to\infty}\rho(t)^2 = \dfrac{[k_4^*]^2}{k_1^* k_7^*} = 1 \end{cases} \\[6pt] \Delta^2 = 0:&\quad \begin{cases} \operatorname{tr} A < 0: & \lim_{t\to\infty}\rho(t)^2 = \dfrac{\bar{\sigma}_{xy}^2}{\bar{\sigma}_{xx}\bar{\sigma}_{yy}} \\[8pt] \operatorname{tr} A > 0: & \lim_{t\to\infty}\rho(t)^2 = 1 \end{cases} \\[6pt] \Delta^2 < 0:&\quad \operatorname{tr} A < 0:\quad \lim_{t\to\infty}\rho(t)^2 = \dfrac{\bar{\sigma}_{xy}^2}{\bar{\sigma}_{xx}\bar{\sigma}_{yy}} \end{aligned} \tag{92}$$

For the GWD model [cf. (78)-(86) and (89)]

$$\Delta^2 > 0:\quad \lim_{t\to\infty}\rho(t)^2 = \frac{[k_4 + \kappa_4]^2}{[k_1 + \kappa_1][k_7 + \kappa_7]}, \qquad \Delta^2 = 0:\quad \lim_{t\to\infty}\rho(t)^2 = \text{constant} \tag{93}$$

Proof: By the definition, $\rho^2(t) = \dfrac{\sigma_{xy}^2(t)}{\sigma_{xx}(t)\,\sigma_{yy}(t)}$, and by the asymptotics for the covariance function in the first part above, the asymptotics for the correlation function in the second part is obtained. □

Remark 8. In Corollary 2, we have only given the asymptotics for the correlation coefficients of the GD model with $|C| \neq 0$. It is interesting to note that for the GD model with the singular Itô coefficient matrix, $|C| = 0$, e.g., in Table 2 with $\Delta^2 < 0$, $\operatorname{tr} A = 0$ and (63), the asymptotics [dominated by the particular solutions (63)] of the correlation coefficient gives a constant: $\lim_{t\to\infty}\rho(t)^2 = -\dfrac{a^2}{bc} > 0$; see the related ellipses and circles of the mean functions in Fig. 1. ∇
Stationarity. A nonsingular diffusion process is said to be stationary, or to have a stationary version, if it admits a time-invariant probability distribution. It is well known that a necessary and sufficient condition for the GD processes (13), with $|\Sigma(t)| \neq 0$, to be stationary, cf. (87)-(88), is that all eigenvalues of the drift coefficient $A$, cf. (31)-(33), or equivalently of the Itô coefficient matrix $C$, cf. (39)-(41), have negative real parts. Similarly, we see from (89) and (90) that the necessary and sufficient condition for the GWD processes (15) to be stationary is that all eigenvalues of the Itô coefficient matrix $\tilde{C}$, cf. (42), have negative real parts; the stationary GWD process is trivial, cf. Remark 6.

Moreover, we note that the asymptotically stable deterministic system remains stochastically asymptotically stable upon addition of arbitrarily strong disturbances ("arithmetic noise"), e.g. the GD model. Such a stability property, determined entirely by the drift coefficient matrix $A$, disappears with higher (3-dimensional) diffusion processes. Furthermore, we note that, with the addition of geometric noise, e.g. the GWD model, this stability property cannot be determined entirely by the drift matrix $A$. The GWD model can only be stable when $\lambda_1 < -\beta^2$. Hence the stability region for the GWD model is smaller than for the GD model, see Fig. 1.

2.5.3 Singularity of the covariance matrix

A diffusion process is said to be nonsingular if its covariance matrix $\Sigma(t)$ is nonsingular for all $t > 0$, or singular if $\Sigma(t)$ is singular for all $t > 0$. For the GD model, (13)-(14), $\Sigma(t)$, $n = 2$, is nonsingular if and only if

$$\operatorname{rank}\,(B\ \ AB\ \ A^2B\ \cdots\ A^{n-1}B) = n. \tag{94}$$

This rank condition is equivalent to hypoellipticity of the forward Kolmogorov equation, cf. Theorem 3 below. We state:

Lemma 4. Let $X(t)$ be described by the GD model, (13)-(14). When $|B| \neq 0$, $X(t)$ is always nonsingular. When $|B| = 0$, $X(t)$ is nonsingular if and only if the columns of $B$ in (11) are not proportional to any eigenvector of $A$. The eigenvectors are proportional to $(1, \alpha_1)$, $(1, \alpha_2)$ and $(1, \alpha)$, where $\alpha_1$, $\alpha_2$ and $\alpha$ are given in (34)-(35); i.e., $X(t)$ is singular when

$$\frac{\beta_{21}}{\beta_{11}} \quad\text{or}\quad \frac{\beta_{22}}{\beta_{12}} = \frac{d-a}{2b} \pm \frac{\Delta}{b} = \alpha_i, \quad i = 1, 2. \tag{95}$$
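The rank condition (94) is easy to test numerically. The sketch below, with hypothetical matrices, builds the block matrix $(B\ AB\ \cdots\ A^{n-1}B)$ and illustrates Lemma 4 for $|B| = 0$: a noise column along an eigenvector of $A$ gives a singular process, whereas a column that is not an eigenvector gives a nonsingular one. The function name `gd_is_nonsingular` is an assumption made for illustration.

```python
import numpy as np

def gd_is_nonsingular(A, B):
    """Check the rank condition (94): rank(B, AB, ..., A^{n-1} B) = n."""
    n = A.shape[0]
    blocks = [np.linalg.matrix_power(A, k) @ B for k in range(n)]
    return np.linalg.matrix_rank(np.hstack(blocks)) == n

# Hypothetical drift matrix with eigenvectors (1, 0) and (1, -1).
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])

B_full = np.array([[1.0, 0.0], [0.0, 1.0]])      # |B| != 0
B_eigen = np.array([[1.0, 0.0], [0.0, 0.0]])     # |B| = 0, column along eigenvector (1, 0)
B_generic = np.array([[0.0, 0.0], [1.0, 0.0]])   # |B| = 0, column (0, 1) is not an eigenvector

for name, B in [("full rank B", B_full),
                ("column along eigenvector", B_eigen),
                ("column off the eigenvectors", B_generic)]:
    print(f"{name}: nonsingular = {gd_is_nonsingular(A, B)}")
```

In the second case the noise never leaves the invariant line spanned by $(1, 0)$, so the covariance matrix stays degenerate for all $t > 0$.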
Proof: When $|B| \neq 0$, (94) clearly holds. But if $|B| = 0$, i.e. $\beta_{11}\beta_{22} - \beta_{12}\beta_{21} = 0$, then $\operatorname{rank}(B) < 2$. Consider

$$(B\ \ AB) = \begin{bmatrix} \beta_{11} & \beta_{12} & a\beta_{11} + b\beta_{21} & a\beta_{12} + b\beta_{22} \\ \beta_{21} & \beta_{22} & c\beta_{11} + d\beta_{21} & c\beta_{12} + d\beta_{22} \end{bmatrix}. \tag{96}$$

Calculating all the determinants of the $2 \times 2$ submatrices, we get the same criteria for the singularity of the submatrices, namely that

$$\frac{\beta_{11}}{\beta_{21}} \cdot \frac{\beta_{22}}{\beta_{12}} = 1 \quad\text{and}\quad \frac{\beta_{22}}{\beta_{12}} = \frac{d-a}{2b} \pm \frac{\Delta}{b} = \alpha_i, \quad i = 1, 2. \tag{97}$$

Given $|B| = 0$, the first equality is of course satisfied. Hence, the only way to avoid singularity is that the second equality is not satisfied. Evidently, $\beta_{22}/\beta_{12}$ is the slope of the column vector of the diffusion matrix $B$, (11). Thus, the necessary and sufficient condition for $X(t)$ of the GD model to be nonsingular is that none of the column vectors of $B$ coincide with any eigenvector of $A$, (34), (35). See also Fig. 1. □

2.6 Probability density functions

2.6.1 Kolmogorov's forward equation
In this section, we take an alternative approach to the treatment of the problems with the diffusion models in (13)-(14) and (15)-(16). For simplicity, the initial conditions are taken as deterministic, i.e., $x_0(\omega)$ is independent of $\omega \in \Omega$. Our models are written compactly as

$$dX = a(X)\,dt + B(X)\,dw, \qquad X(0) = x_0. \tag{98}$$

The transition probability distribution, $P(s, y, t, A)$, is given by

$$P(s, y, t, A) = P(X(t) \in A \mid X(s) = y) \tag{99}$$

for every $t > s > 0$, $y \in \mathbb{R}^n$ and each Borel set $A \subset \mathbb{R}^n$. Moreover, provided that $P(s, y, t, \cdot)$, a probability measure on $\mathbb{R}^n$, is known to have a transition probability density function $p(s, y, t, x)$, then $p(0, x_0, t, x)$ solves the following system of equations (written with $u(t, x)$ as the unknown), cf. (3):

$$\frac{\partial u}{\partial t} = \frac{1}{2}\sum_{j,k=1}^{n} b_{j,k}(x)\frac{\partial^2 u}{\partial x_j \partial x_k} - \sum_{j=1}^{n} a_j(x)\frac{\partial u}{\partial x_j} \quad\text{for } t > 0, \tag{100}$$

$$1 = \int_{\mathbb{R}^n} u(t, x)\,dx \quad\text{for each } t > 0, \tag{101}$$

$$\delta_{x_0}(x) = \lim_{t \to 0^+} u(t, x); \tag{102}$$

where $b_{j,k}(x)$ is shorthand for the $jk$-th entry of $B(x)B(x)^T$, and $\delta_z$ denotes the point mass (Dirac delta function) at $z$ in $\mathbb{R}^n$. See Remark 10 below for assumptions and references.

The probability density $p(s, y, t, x)$ is a fundamental solution to the partial differential equation (PDE) in (100), which is known as Kolmogorov's forward equation or the Fokker–Planck equation. See also Appendix D.

Our results in the previous sections 2.4 and 2.5 may be found by solving (100)-(102) for the GD and GWD models. In addition, by doing so (or by the mere attempt to do so), further information may be derived; ultimately one finds the probability distribution for the stochastic process $X(t)$ at each fixed $t > 0$, and not just the first-order and second-order moments of the diffusion $X(t)$.

The GD model

For the Gaussian diffusion model, the approach to solving (100)-(102) is well known. A complete analysis was given in Hörmander (1967). Here we only need to give a brief account.

The main tool is the Fourier transformation, $\mathcal{F}$. Recall that $\mathcal{S}(\mathbb{R}^n)$ denotes the Fréchet space of rapidly decreasing $C^\infty$ functions, that is, smooth $\varphi(x)$ that for all multi-indices $\alpha$ and $\beta$ in $\mathbb{N}_0^n$ satisfy

$$\sup\left\{\,|x^\alpha \partial^\beta \varphi(x)| \;:\; x \in \mathbb{R}^n\right\} < \infty; \tag{103}$$

where $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ and $\partial^\beta = \partial_x^\beta = (\partial/\partial x_1)^{\beta_1} \cdots (\partial/\partial x_n)^{\beta_n}$. Then $\mathcal{F}\varphi$ defined by

$$\mathcal{F}\varphi(\xi) = \int_{\mathbb{R}^n} e^{-i\xi\cdot x}\varphi(x)\,dx \tag{104}$$

is a linear continuous bijection $\mathcal{F}: \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$, the inverse of which is given as $\mathcal{F}^{-1}\psi(x) = (2\pi)^{-n}\mathcal{F}\psi(-x)$. Moreover, the formulae

$$\mathcal{F}(\partial_x^\alpha \varphi) = i^{|\alpha|}\xi^\alpha \cdot \mathcal{F}\varphi, \qquad \mathcal{F}(x^\alpha \varphi) = i^{|\alpha|}\partial_\xi^\alpha \mathcal{F}\varphi, \tag{105}$$

are valid for all multi-indices $\alpha \in \mathbb{N}_0^n$ and all $\varphi \in \mathcal{S}(\mathbb{R}^n)$.
Furthermore, $\mathcal{F}$ extends to a linear continuous bijection $\mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$, where $\mathcal{S}'(\mathbb{R}^n)$ denotes the dual space of tempered distributions. More precisely, any element $v$ of $\mathcal{S}'(\mathbb{R}^n)$ is a continuous linear map $\mathcal{S}(\mathbb{R}^n) \to \mathbb{C}$, and, with the value at a given $\varphi \in \mathcal{S}(\mathbb{R}^n)$ denoted by $\langle v, \varphi \rangle$, $\mathcal{F}v$ is the element of $\mathcal{S}'(\mathbb{R}^n)$ for which

$$\langle \mathcal{F}v, \varphi \rangle = \langle v, \mathcal{F}\varphi \rangle \quad\text{for every } \varphi \in \mathcal{S}(\mathbb{R}^n). \tag{106}$$

The space $\mathcal{S}'(\mathbb{R}^n)$ contains $L_p(\mathbb{R}^n)$ for every $p \in [1, \infty]$ and $M(\mathbb{R}^n)$, the Radon measures of finite total variation. In fact, the inclusion $M(\mathbb{R}^n) \subset \mathcal{S}'(\mathbb{R}^n)$ is given by

$$\langle \mu, \varphi \rangle = \int_{\mathbb{R}^n} \varphi\,d\mu.$$

Moreover, $\mathcal{F}\mu(\xi) = \int e^{-i\xi\cdot x}\,d\mu$, since by (106),

$$\langle \mathcal{F}\mu, \varphi \rangle = \int_{\mathbb{R}^n} \mathcal{F}\varphi(x)\,d\mu(x) = \left\langle \int e^{-i\xi\cdot x}\,d\mu(x),\ \varphi \right\rangle. \tag{107}$$

In particular, if $\mu$ on $\mathbb{R}^n$ is given as the probability distribution of a stochastic variable $X$, this means that $\mathcal{F}\mu(-\xi) = E(e^{i\xi\cdot X})$, i.e., the characteristic function of $X$. As an example, $\mathcal{F}\delta_z = e^{-i\xi\cdot z}$.

Recall also that $\partial^\alpha u$ is a well-defined element of $\mathcal{S}'(\mathbb{R}^n)$ for each $u$ therein and each multi-index $\alpha$, namely

$$\langle \partial^\alpha u, \varphi \rangle = \langle u, (-1)^{|\alpha|}\partial^\alpha \varphi \rangle.$$

Using this definition, the identities in (105) carry over to all $u \in \mathcal{S}'(\mathbb{R}^n)$. Further details about these standard techniques are found in Rudin (1973).

Concerning the solution $u(t, x)$ in (100), one can now for each $t \geq 0$ compute the Fourier transform in the $x$-variable and denote this by $\mathcal{F}u(t, \xi)$ or by $\hat{u}(t, \xi)$. Given formula (105), and because the diffusion coefficients $b_{j,k}(x)$ in the GD model are independent of $x$, the transformed system

$$\frac{\partial \hat{u}}{\partial t} = -\frac{1}{2}\sum_{j,k=1}^{n} b_{j,k}\,\xi_j \xi_k\,\hat{u} + \sum_{j,k=1}^{n} a_{j,k}\,\xi_j \frac{\partial \hat{u}}{\partial \xi_k}, \tag{108}$$

$$1 = \hat{u}(t, 0) \quad\text{for each } t > 0, \tag{109}$$

$$e^{-i\xi\cdot x_0} = \lim_{t \to 0^+} \hat{u}(t, \xi) \tag{110}$$

is equivalent to (100)-(102). As noted in Hörmander (1967), when the drift coefficient $A = (a_{j,k})$, as in the GD model, is a constant matrix, then the Fourier transformed system, (108)-(110), is solved uniquely by

$$\hat{u}(t, \xi) = \exp\left(-i x_0 \cdot e^{tA^T}\xi - \frac{1}{2}\int_0^t \left(e^{sA^T}\xi\right)^T BB^T \left(e^{sA^T}\xi\right) ds\right) \tag{111}$$

for $t > 0$. By comparison with the characteristic function for Gaussian random variables, one gets the well-known result, (112)-(113):

Theorem 3. The GD model, (13)-(14), has a unique solution $X(t)$, which, for each $t > 0$, has a Gaussian distribution with the mean vector and covariance matrix [with initial value $\Sigma(0) = 0$] formally given by:

$$m(t) = e^{tA}x_0 \tag{112}$$

$$\Sigma(t) = \int_0^t e^{sA} BB^T e^{sA^T}\,ds \tag{113}$$

In particular, $m(t)$ is given explicitly in (45)-(47) of Theorem 1, and $\Sigma(t)$ is given explicitly by the expression (67) of Theorem 2 (with $\sigma(0) = 0$), and it is positive definite, i.e., $\Sigma(t) > 0$, if and only if Kolmogorov's equation (100) is hypoelliptic with the values of $a_{j,k}$ and $b_{j,k}$ used in the GD model; i.e., if and only if the rank condition (94) is fulfilled.

Proof: See Theorem 8.2.10 in Arnold (1973) for the Gaussian distribution and the expressions for the mean and covariance (even between different times $s \neq t$). The equivalence with the hypoellipticity was observed in Hörmander (1967). □
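The integral representation (113) can be evaluated by numerical quadrature and cross-checked against the differential equation (26): since $\dot{\Sigma}(t)$ equals the integrand of (113) at $s = t$, the identity $A\Sigma(t) + \Sigma(t)A^T + BB^T = e^{tA}BB^Te^{tA^T}$ must hold. The sketch below is illustrative only; the matrices are hypothetical, `expm_eig` assumes that $A$ is diagonalizable, and composite Simpson quadrature is just one convenient choice.

```python
import numpy as np

def expm_eig(A, t):
    """e^{tA} via eigen-decomposition; assumes A is diagonalizable."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real

def covariance_integral(A, B, t, n=400):
    """Composite Simpson approximation of the integral (113); n must be even."""
    Q = B @ B.T
    s = np.linspace(0.0, t, n + 1)
    w = np.ones(n + 1)
    w[1:-1:2] = 4.0                   # odd interior nodes
    w[2:-1:2] = 2.0                   # even interior nodes
    total = np.zeros_like(Q)
    for wi, si in zip(w, s):
        Phi = expm_eig(A, si)
        total += wi * (Phi @ Q @ Phi.T)
    return total * (t / n) / 3.0

# Hypothetical GD parameters with distinct real eigenvalues and |B| != 0.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.5, 0.0], [0.2, 1.0]])
t = 1.5

Sigma = covariance_integral(A, B, t)
Phi_t = expm_eig(A, t)
Q = B @ B.T
lhs = A @ Sigma + Sigma @ A.T + Q     # right-hand side of (26)
rhs = Phi_t @ Q @ Phi_t.T             # integrand of (113) at s = t

print("Sigma(t) =\n", Sigma)
print("max deviation:", np.max(np.abs(lhs - rhs)))
print("eigenvalues of Sigma(t):", np.linalg.eigvalsh(Sigma))
```

Because $|B| \neq 0$ here, the rank condition (94) holds and the computed $\Sigma(t)$ has strictly positive eigenvalues.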
Observe that since the rank condition (94) involves only $A$ and $B$, and not $t$, one has $\Sigma(t) > 0$ for every $t > 0$ if and only if $\Sigma(t) > 0$ holds at a single time $t$.

The formula for $\hat u(t,\xi)$ in (111) has another merit: Suppose that $A$ and $B$ in (98) are such that $\Sigma(t)$ in (113) is only positive semidefinite. Then there is a subspace $N \subset \mathbb{R}^n$ such that the integral in (111) vanishes for every $\xi \in N$; consequently $|\hat u(t,\xi)| = 1$ for all $\xi \in N$. Therefore the transition probability $P(0,x_0,t,A)$ in (99) cannot have a density function $p$, because on the one hand, such a density would necessarily belong to $L^1$ with respect to the Lebesgue measure on $\mathbb{R}^n$, so $\hat p(0,x_0,t,\xi)$ would go to 0 for $|\xi| \to \infty$, and on the other hand, $p$ would solve (100)–(102) and therefore have $|\hat p(0,x_0,t,\xi)| = 1$ for $\xi \in N$, as we have just seen. This contradiction immediately leads to the following result:

Proposition 5. When $x_0$ is independent of $\omega$ in $\Omega$, the properties:
(i) the GD model has a transition probability density $p(0,x_0,t,x)$,
(ii) the covariance function $\Sigma(t)$ is invertible for every $t > 0$,
(iii) the covariance function $\Sigma(t)$ is invertible for some $t > 0$,
are equivalent.

Remark 9. The formula in (113) is useful as a point of departure for an alternative determination of $\Sigma(t)$. In fact, in two dimensions where $e^{tA}$ can be written down explicitly, we have made a computer-aided calculation using Maple, and the result coincides with the formulae of Theorem 2, when the initial value $\Sigma(0)$ equals zero. Specifically, this means that the GD density function $u(t,x) = p(0,x_0,t,x)$ becomes

$$p(0,x_0,t,x) = \frac{1}{\sqrt{(2\pi)^n\,|\Sigma(t)|}}\; e^{-\frac{1}{2}(x-m(t))^T \Sigma(t)^{-1}(x-m(t))} \tag{114}$$

with $m(t)$, $\Sigma(t)$ as in Theorems 1–2, provided $\Sigma(t) > 0$. ∇

Remark 10. That the GD density function (114) solves (100)–(102) is known under the assumption that, first of all, $p(s,y,t,x)$ exists; secondly, the necessary derivatives

$$\frac{\partial p}{\partial t},\qquad a_j(x)\frac{\partial p}{\partial x_j},\qquad b_{j,k}(x)\frac{\partial^2 p}{\partial x_j \partial x_k},\qquad j,k = 1,\dots,n \tag{115}$$

exist as continuous functions; and, thirdly, the transition function of a diffusion process, $P(s,y,t,A)$, should have $a(x)$ and $(b_{j,k}(x))_{j,k=1,\dots,n}$ as the drift and diffusion coefficients with $y$-uniform convergence, cf. the definition of diffusion processes in Gikman and Skorokhod (1969, p. 375). As observed in their Remark 1, the existence of the derivatives in (115) is unnecessary because these always exist in the distribution sense in $\mathcal{S}'(\mathbb{R}^n)$.

However, the natural questions are not settled hereby: once a function solving (100)–(102) is obtained, it still has to be verified that it coincides with the density, $p(0,x_0,t,x)$, even when the latter is known to exist [In the GD case, this is clear, however, since the adopted method shows the uniqueness of the solution]. Moreover, since $P(0,x_0,t,\cdot)$ is a probability measure for each $t > 0$, it is in $\mathcal{S}'(\mathbb{R}^n)$, cf. (106)–(107), so it is reasonable to ask whether it solves (100) in the distribution sense. Furthermore, it is a question whether any solution $u(t,x)$ of (100)–(102) coincides with $P(0,x_0,t,x)$ in general. ∇

Geometric Wiener Diffusion

The GWD model (15)–(16) will not have probability densities.

Proposition 6. For the GWD model, (15)–(16), the solution of Kolmogorov's forward equation with side conditions, (100)–(102), is supported by a half-line in $\mathbb{R}^n$. Hence a probability density function, $p(0,x_0,t,x)$, does not exist.

Proof: The forward Kolmogorov equation (100) is here,

$$\frac{\partial u}{\partial t} = \frac{1}{2}\beta^2 \sum_{j,k=1}^{n} x_j x_k \frac{\partial^2 u}{\partial x_j \partial x_k} - \sum_{j,k=1}^{n} a_{j,k}\,x_k \frac{\partial u}{\partial x_j}, \tag{116}$$
since $b_{j,k}(x) = \beta^2 x_j x_k$ for $j,k = 1,\dots,n$. The solution to (116), satisfying (101)–(102), is the probability measure $u(t,dx)$ on $\mathbb{R}^n$, defined for each $t > 0$ by the fulfillment of

$$\int_{\mathbb{R}^n} \varphi(x)\,u(t,dx) = \int_{-\infty}^{\infty} \varphi\big(e^{t(A-\frac{\beta^2}{2}I)+r\beta I}x_0\big)\,\frac{1}{\sqrt{2\pi t}}\,e^{-\frac{r^2}{2t}}\,dr \tag{117}$$

for every continuous bounded function $\varphi(x)$ on $\mathbb{R}^n$; cf. Remark 11.
Clearly, $u(t,dx) = 0$ outside of the set:

$$\big\{\, x \in \mathbb{R}^n \;\big|\; \exists\, r \in \mathbb{R}:\; x = e^{t(A-\frac{1}{2}\beta^2 I)+r\beta I}x_0 \,\big\}, \tag{118}$$

which is a ($t$-dependent) half-line, since

$$e^{t(A-\frac{1}{2}\beta^2 I)+r\beta I}x_0 = e^{r\beta}\,e^{t(A-\frac{1}{2}\beta^2 I)}x_0. \tag{119}$$

As $u(t,dx)$ is supported by the Lebesgue null-set (118), the transition probability, $P(0,x_0,t,dx)$, has no probability density function. □

Remark 11. That the $u(t,dx)$ stated in (117) actually is a fundamental solution of Kolmogorov's equation associated with the GWD model, (15)–(16), will not be proved here; the verification is quite lengthy and requires full use of the various techniques for partial differential equations. In fact, our "proof" here is based on determination of the solution to a corresponding GD model (for the 'log' of $X$) followed by a transformation as a distribution density; cf. Hörmander (1985, Ch. 6) (a technical report may be obtained from the authors). Moreover, in view of Remark 10, we are not able to definitely conclude that the solution (117) equals the transition probability $P(0,x_0,t,dx)$.

Theorem 4. Assuming that the probability measure $u(t,dx)$ in (117) is the probability distribution $P(0,x_0,t,dx)$ of $X(t)$ described by the GWD model, (15)–(16), then inserting $\varphi(x) = x$ into (117), we get

$$E[X(t)] = \int_{\mathbb{R}^n} x\,u(t,dx) = \int_{-\infty}^{\infty} e^{t(A-\frac{1}{2}\beta^2 I)+r\beta I}\cdot x_0\,\frac{1}{\sqrt{2\pi t}}\,e^{-\frac{r^2}{2t}}\,dr = e^{tA}\cdot x_0 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\,e^{-\frac{(r-t\beta)^2}{2t}}\,dr = e^{tA}\cdot x_0 \tag{120}$$

and by insertion of $\varphi(x) = x\cdot x^T$ in (117), we get [with $\Sigma(0) = 0$]

$$E[X(t)X(t)^T] = \int_{\mathbb{R}^n} x\cdot x^T\,u(t,dx) = e^{\beta^2 t}(e^{tA}x_0)(e^{tA}x_0)^T \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}}\,e^{-\frac{(r-2t\beta)^2}{2t}}\,dr = e^{\beta^2 t}(e^{tA}x_0)(e^{tA}x_0)^T. \tag{121}$$

Hence by (120) and (121), we obtain the formal expression

$$\Sigma(t) = E(XX^T) - E(X)E(X)^T = (e^{\beta^2 t} - 1)(e^{tA}x_0)(e^{tA}x_0)^T. \tag{122}$$

□
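The two Gaussian integrals behind (120)–(121) and the rank-one structure of (122) can be checked with a short numerical sketch. It assumes the representation $X(t) = e^{r\beta - t\beta^2/2}\,e^{tA}x_0$ with $r \sim N(0,t)$ read off from (117) and (119); the function names and the trapezoidal quadrature are illustrative choices, not part of the original derivation.

```python
import math

def gaussian_expectation(g, t, half_width=12.0, steps=20000):
    """E[g(r)] for r ~ N(0, t), approximated by the trapezoidal rule."""
    s = math.sqrt(t)
    lo, hi = -half_width * s, half_width * s
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        r = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-r * r / (2 * t)) / math.sqrt(2 * math.pi * t)
        total += weight * g(r) * density
    return total * h

def gwd_scalar_moments(beta, t):
    """Scalar factors multiplying m(t) in (120) and m(t) m(t)^T in (121)."""
    first = gaussian_expectation(lambda r: math.exp(beta * r - 0.5 * beta**2 * t), t)
    second = gaussian_expectation(lambda r: math.exp(2 * beta * r - beta**2 * t), t)
    return first, second

def gwd_covariance(beta, t, m):
    """Sigma(t) of (122) for a given mean vector m = e^{tA} x0."""
    factor = math.exp(beta**2 * t) - 1.0
    return [[factor * mi * mj for mj in m] for mi in m]

first, second = gwd_scalar_moments(beta=0.5, t=2.0)
print(first, second)                      # close to 1 and exp(beta^2 t)
S = gwd_covariance(0.5, 2.0, [1.5, -0.7])
print(S[0][0] * S[1][1] - S[0][1] * S[1][0])   # determinant of a rank-one matrix
```

The determinant printed last is zero up to rounding, in line with the singularity of $\Sigma(t)$ asserted for the GWD model.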
Expanding (122) by the explicit formulae (45)–(47) of Theorem 1 for the mean values, $m(t) = e^{tA}x_0$, we find precisely the expression (68) for $\sigma(t)$ in Theorem 2 [with $\sigma(0) = 0$], as shown in (78)–(86). Accordingly, under the assumption that $u(t,dx)$ in (117) is the probability distribution, we have not only the convenient expression in (122), but also independent evidence that parts of (78)–(86) are correct.

Formula (122) also shows that $\Sigma(t)$ is singular for every $t > 0$, when $\Sigma(0) = 0$, since $y = e^{tA}x_0$ gives $y\cdot y^T = (y_j y_k)$, which has rank 1.

2.6.2 Formulae from stochastic integration

The conclusion in Proposition 6 above that a density function does not exist can also be reached in the following way. The stochastic differential equation

$$dX = AX\,dt + BX\,dw \tag{123}$$

with $X(0) = x_0$, is known from the stochastic integrals to have the solution

$$X(t) = \exp\Big( t\big(A - \tfrac{1}{2}B^2\big) + (w(t) - w(0))B \Big)\cdot x_0, \qquad t > 0, \tag{124}$$

when $A$ and $B$ commute. In particular, this holds when $B = \beta I$ and $A$ is arbitrary, as in our GWD model. See Arnold (1973, Thm. 8.5.2, Rem. 8.5.9) for a more general formula for $m$ independent Wiener processes. Since $[A,B] = 0$, (124) becomes

$$X(t) = e^{t(A-\frac{1}{2}\beta^2 I)}\,e^{(w(t)-w(0))\beta I}\cdot x_0 = e^{t(A-\frac{1}{2}\beta^2 I)}\big(e^{(w(t)-w(0))\beta}\cdot x_0\big). \tag{125}$$

The vector $e^{(w(t)-w(0))\beta}\cdot x_0$ is in $\mathrm{span}(x_0)$ for each $t > 0$ and $\omega \in \Omega$, and the linear mapping $\mathrm{span}(x_0) \to \mathbb{R}^n$ given by $e^{t(A-\frac{1}{2}\beta^2 I)}$ has at most one-dimensional range; hence by (125), the vector $X(t)$ lies in a ($t$-dependent) subspace of $\mathbb{R}^n$. Hence $P(0,x_0,t,\cdot)$ must vanish outside

$$e^{t(A-\frac{1}{2}\beta^2 I)}\big(\mathrm{span}(x_0)\big),$$

and for this reason, the measure $P(0,x_0,t,\cdot)$ does not have a density function.
2.7 Final comments

The evolution of the expectation and the covariance matrix of planar time homogeneous diffusion processes is seldom systematically presented in the literature. We have obtained the complete set of solutions for the moments of the general bivariate Gaussian diffusion process and a broad class of planar geometric Wiener diffusion processes. They were derived from the ordinary differential equations generated by Itô's Lemma. These closed-form moment solutions, expressed in terms of the basic drift and diffusion coefficients, may offer useful insights and should find applications in factual studies of the diffusion dynamics within many areas of economic, social and biological process analysis. We have also related the moment solutions explicitly to the corresponding evolution of the transition probability density (or distribution), generated by Kolmogorov's forward equation, which may play a large role in the future research on stochastic differential equations.

Appendix A: Particular solutions for the GD model

Based on the theory of ordinary differential equations, the particular solution of the affine differential equation

$$\dot\sigma = C\sigma + \delta \tag{126}$$
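can be obtained as follows:

1. When the coefficient matrix $C$ is nonsingular, i.e. $|C| \neq 0$, the critical point is

$$\bar\sigma = -C^{-1}\delta. \tag{127}$$

In Proposition 2,

$$C = \begin{bmatrix} 2a & 2b & 0 \\ c & a+d & b \\ 0 & 2c & 2d \end{bmatrix}, \qquad \delta = \begin{bmatrix} \beta_{11}^2 + \beta_{12}^2 \\ \beta_{11}\beta_{21} + \beta_{12}\beta_{22} \\ \beta_{21}^2 + \beta_{22}^2 \end{bmatrix}, \tag{128}$$

hence, since $|C| = 4\,(\mathrm{tr}\,A)\,|A|$,

$$-C^{-1} = \frac{1}{4\,(\mathrm{tr}\,A)\,|A|} \begin{bmatrix} -2d^2 - 2|A| & 4bd & -2b^2 \\ 2cd & -4ad & 2ab \\ -2c^2 & 4ac & -2a^2 - 2|A| \end{bmatrix}. \tag{129}$$

By inserting (128) and (129) into (127), (62) follows.

2. When $C$ is singular, i.e. $|C| = 0$, then (21) implies

$$\dddot\sigma_{xx} - 3(\mathrm{tr}\,A)\,\ddot\sigma_{xx} + 2\big[(\mathrm{tr}\,A)^2 + 2|A|\big]\dot\sigma_{xx} - 4(\mathrm{tr}\,A)|A|\,\sigma_{xx} = 2|A|(\beta_{11}^2 + \beta_{12}^2) + 2(d\beta_{11} - b\beta_{21})^2 + 2(d\beta_{12} - b\beta_{22})^2. \tag{130}$$

When $\mathrm{tr}\,A = a + d = 0$, (130) becomes

$$\dddot\sigma_{xx} + 4|A|\,\dot\sigma_{xx} = 2|A|(\beta_{11}^2 + \beta_{12}^2) + 2(d\beta_{11} - b\beta_{21})^2 + 2(d\beta_{12} - b\beta_{22})^2. \tag{131}$$

Assuming that $\sigma_{xx} = \gamma t$, then by (131),

$$\gamma = \frac{\beta_{11}^2 + \beta_{12}^2}{2} + \frac{(d\beta_{11} - b\beta_{21})^2 + (d\beta_{12} - b\beta_{22})^2}{2|A|}. \tag{132}$$

Based on (21), we get the relationship between $\sigma_{xx}$ and $\sigma_{xy}$ as

$$\sigma_{xy} = \frac{1}{2b}\big(\dot\sigma_{xx} - 2a\,\sigma_{xx} - \beta_{11}^2 - \beta_{12}^2\big), \tag{133}$$

and the relationship between $\sigma_{yy}$ and $\sigma_{xx}$ as

$$\sigma_{yy} = \frac{1}{2b^2}\big[\ddot\sigma_{xx} - (3a+d)\,\dot\sigma_{xx} + (2a^2 + 2ad - 2bc)\,\sigma_{xx} + (a+d)(\beta_{11}^2 + \beta_{12}^2) - 2b(\beta_{11}\beta_{21} + \beta_{12}\beta_{22})\big]. \tag{134}$$

By inserting (132) and $\sigma_{xx} = \gamma t$ into (133) and (134), the particular solution of (63) is obtained.

When $|A| = ad - bc = 0$, (130) becomes

$$\dddot\sigma_{xx} - 3(\mathrm{tr}\,A)\,\ddot\sigma_{xx} + 2(\mathrm{tr}\,A)^2\,\dot\sigma_{xx} = 2(d\beta_{11} - b\beta_{21})^2 + 2(d\beta_{12} - b\beta_{22})^2. \tag{135}$$

Assuming that $\sigma_{xx} = \gamma t$, we get, from (135):

$$\gamma = (\mathrm{tr}\,A)^{-2}\big[(d\beta_{11} - b\beta_{21})^2 + (d\beta_{12} - b\beta_{22})^2\big]. \tag{136}$$

By inserting (136) and $\sigma_{xx} = \gamma t$ into (133) and (134), the particular solution of (64) is obtained.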
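The nonsingular case can be verified with exact rational arithmetic. The sketch below is an illustration under the assumption that $C$ and $\delta$ have the entries displayed in Proposition 2; it evaluates the critical point through the explicit inverse and confirms that $C\bar\sigma + \delta$ vanishes. The function names are hypothetical.

```python
from fractions import Fraction

def stationary_covariance(a, b, c, d, beta):
    """Critical point (sigma_xx, sigma_xy, sigma_yy) of (126) via the explicit inverse of C."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    (b11, b12), (b21, b22) = [[Fraction(v) for v in row] for row in beta]
    tr, det = a + d, a * d - b * c
    if tr * det == 0:
        raise ValueError("C is singular: |C| = 4 tr(A) |A| = 0")
    delta = [b11**2 + b12**2, b11 * b21 + b12 * b22, b21**2 + b22**2]
    # Entries of 4 tr(A) |A| times (-C^{-1}).
    neg_inv = [[-2 * d * d - 2 * det, 4 * b * d, -2 * b * b],
               [2 * c * d, -4 * a * d, 2 * a * b],
               [-2 * c * c, 4 * a * c, -2 * a * a - 2 * det]]
    return [sum(m * v for m, v in zip(row, delta)) / (4 * tr * det) for row in neg_inv]

def residual(a, b, c, d, beta, sigma):
    """C sigma + delta, which must vanish at the critical point."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    (b11, b12), (b21, b22) = [[Fraction(v) for v in row] for row in beta]
    C = [[2 * a, 2 * b, 0], [c, a + d, b], [0, 2 * c, 2 * d]]
    delta = [b11**2 + b12**2, b11 * b21 + b12 * b22, b21**2 + b22**2]
    return [sum(m * s for m, s in zip(row, sigma)) + dl for row, dl in zip(C, delta)]

beta = [[1, 2], [0, 1]]
sigma_bar = stationary_covariance(-2, 1, 1, -3, beta)
print(sigma_bar)
print(residual(-2, 1, 1, -3, beta, sigma_bar))
```

Because `Fraction` arithmetic is exact, the residual is exactly zero whenever the closed-form inverse is correct, which makes this a convenient consistency check on the matrix entries.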
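Appendix B: Exponential matrices in two dimensions

To calculate $e^{tA}$ for a matrix $A$ with entries $a$, $b$, $c$, and $d$, decompose it as

$$A = \tfrac{1}{2}(a+d)I + B, \qquad B = A - \tfrac{1}{2}(a+d)I = \begin{bmatrix} \frac{a-d}{2} & b \\ c & \frac{d-a}{2} \end{bmatrix}. \tag{137}$$

Since the matrices in (137) commute,

$$e^{tA} = e^{\frac{1}{2}(a+d)t}\,e^{tB}. \tag{138}$$

From the Hamilton-Cayley theorem, (28), $\mathrm{tr}\,B = 0$ and (29), we get

$$B^2 = (\mathrm{tr}\,B)B - |B|I = \Delta^2 I, \tag{139}$$

and from this it is inferred that

$$B^{2n} = (\Delta^2)^n I, \qquad B^{2n+1} = (\Delta^2)^n B, \qquad n \in \mathbb{N}_0 = \{0\} \cup \mathbb{N}. \tag{140}$$

Using (140) and the definition in (137), we have

$$e^{tB} = \sum_{n=0}^{\infty} \frac{t^n B^n}{n!} = \sum_{n=0}^{\infty} \frac{t^{2n} B^{2n}}{(2n)!} + \sum_{n=0}^{\infty} \frac{t^{2n+1} B^{2n+1}}{(2n+1)!} = \sum_{n=0}^{\infty} \frac{t^{2n} (\Delta^2)^n}{(2n)!}\,I + \sum_{n=0}^{\infty} \frac{t^{2n+1} (\Delta^2)^n}{(2n+1)!}\,B. \tag{141}$$

In case $\Delta^2 = 0$, and with $0^0 = 1$, $0! = 1$, we immediately obtain from (141)

$$\Delta^2 = 0: \qquad e^{tB} = I + tB. \tag{142}$$

Thus, (142), (138) and (137) establish (46). In case $\Delta^2 > 0$, (141) gives, cf. (30),

$$\Delta^2 > 0: \qquad e^{tB} = \sum_{n=0}^{\infty} \frac{(\Delta t)^{2n}}{(2n)!}\,I + \frac{1}{\Delta}\sum_{n=0}^{\infty} \frac{(\Delta t)^{2n+1}}{(2n+1)!}\,B = \cosh(\Delta t)\,I + \frac{1}{\Delta}\sinh(\Delta t)\,B. \tag{143}$$

In case $\Delta^2 < 0$, (141) gives analogously, cf. (30),

$$\Delta^2 < 0: \qquad e^{tB} = \sum_{n=0}^{\infty} \frac{(-1)^n (\Delta t)^{2n}}{(2n)!}\,I + \frac{1}{\Delta}\sum_{n=0}^{\infty} \frac{(-1)^n (\Delta t)^{2n+1}}{(2n+1)!}\,B = \cos(\Delta t)\,I + \frac{1}{\Delta}\sin(\Delta t)\,B. \tag{144}$$

Thus, (143), (144) and (138), (137), respectively, establish (45) and (47).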
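The three cases (142)–(144) translate directly into a closed-form routine, sketched below and compared with a truncated Taylor series. The sketch assumes that $\Delta^2 = -|B| = \big(\tfrac{a-d}{2}\big)^2 + bc$, as follows from (139) with $\mathrm{tr}\,B = 0$, and that in the case $\Delta^2 < 0$ the symbol $\Delta$ stands for $\sqrt{-\Delta^2}$; the function names are illustrative.

```python
import math

def expm2_closed(a, b, c, d, t):
    """e^{tA} for A = [[a, b], [c, d]] from (137)-(138) and (142)-(144)."""
    B = [[0.5 * (a - d), b], [c, 0.5 * (d - a)]]
    delta2 = (0.5 * (a - d)) ** 2 + b * c      # Delta^2 = -det(B)
    if delta2 > 0:
        D = math.sqrt(delta2)
        f0, f1 = math.cosh(D * t), math.sinh(D * t) / D
    elif delta2 < 0:
        D = math.sqrt(-delta2)
        f0, f1 = math.cos(D * t), math.sin(D * t) / D
    else:
        f0, f1 = 1.0, t
    scale = math.exp(0.5 * (a + d) * t)
    return [[scale * (f0 * (i == j) + f1 * B[i][j]) for j in range(2)]
            for i in range(2)]

def expm2_taylor(a, b, c, d, t, terms=60):
    """Reference value: truncated power series of e^{tA}."""
    A = [[a * t, b * t], [c * t, d * t]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

for entries in [(0.0, -1.0, 1.0, 0.0),    # Delta^2 < 0 (rotation)
                (0.0, 1.0, 1.0, 0.0),     # Delta^2 > 0
                (1.0, 1.0, 0.0, 1.0)]:    # Delta^2 = 0
    print(expm2_closed(*entries, 0.7), expm2_taylor(*entries, 0.7))
```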
Appendix C: Exponentials in three dimensions

To calculate the exponential matrix $e^{tC}$, we need the following

Lemma 5. If a $3 \times 3$ matrix $A$ has three equal eigenvalues $\lambda$, then

$$e^{tA} = e^{\lambda t}\Big[ I + t(A - \lambda I) + \tfrac{1}{2}t^2 (A - \lambda I)^2 \Big], \tag{145}$$

and if it has three distinct eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$, then

$$e^{tA} = e^{\lambda_1 t}\,\frac{(A - \lambda_2 I)(A - \lambda_3 I)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} + e^{\lambda_2 t}\,\frac{(A - \lambda_1 I)(A - \lambda_3 I)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)} + e^{\lambda_3 t}\,\frac{(A - \lambda_1 I)(A - \lambda_2 I)}{(\lambda_3 - \lambda_1)(\lambda_3 - \lambda_2)}. \tag{146}$$

Proof: See Apostol (1969). □
Appendix D: The Basic Wiener Process and the Evolution of Transition Probability Densities

Bjarne S. Jensen
University of Southern Denmark and Copenhagen Business School

Mogens E. Larsen
Department of Mathematics, University of Copenhagen

The fundamental stochastic process in continuous time is the Wiener process (Brownian motion). The transition probability from any initial state (interval) to any other state (interval) is a fixed probability (unaffected by the past history of the stochastic process/states). A brief review of the methodology in Einstein (1905) is instructive and useful for an exact derivation of the partial differential equation (PDE) that generates the dynamics (evolution) of the probability densities from a Wiener process. The general solution of this PDE clarifies the origin and how the conditional (transition) probability density function of the Wiener process is calculated. Our demonstrations also provide a background for stochastic differential equations (SDE) and their solutions (sample paths), which are driven by the standard Wiener process.
Transition probability density and Kolmogorov's equation

Suppose we have a stochastic process described by the conditional density function $p(x,t \mid x_0)$, with $x$ as the state variable (position, point) and $t$ as the continuous time variable, describing for each value of $t$ the distribution of the states, e.g., the position of particles. For a small time step (interval), $\tau$, the conditional distribution of states at time $t + \tau$ depends on its neighboring distribution. The probability density of the increments during the period $\tau$ is $f(\Delta;\tau)$, with $\Delta$ as changes (distance) in the state (position) variable $x$ from any initial state (position), $x_0$. The probability distribution of increments is assumed to be symmetric, $f(\Delta;\tau) = f(-\Delta;\tau)$, and to have mean zero:

$$E(\Delta) = \int_{-\infty}^{\infty} \Delta f(\Delta;\tau)\,d\Delta = 0; \tag{147}$$

$$\int_{-\infty}^{\infty} f(\Delta;\tau)\,d\Delta = 1; \tag{148}$$

and a variance proportional to the size of $\tau$:

$$\sigma_\Delta^2 = \int_{-\infty}^{\infty} \Delta^2 f(\Delta;\tau)\,d\Delta = b^2\tau; \qquad b > 0 \quad (b^2 = D). \tag{149}$$

The conditional probability density function $p(x,t+\tau \mid x_0)$ at $t + \tau$ can then be obtained as a weighted average of the neighboring conditional probability density functions

$$p(x,t+\tau \mid x_0) = \int_{-\infty}^{\infty} p(x+\Delta,t \mid x_0)\,f(\Delta;\tau)\,d\Delta \tag{150}$$

or alternatively, in view of the symmetry of $f$, as a convolution of $p(x,t \mid x_0)$ and $f(\Delta;\tau)$ [cf. (162) below]

$$p(x,t+\tau \mid x_0) = \int_{-\infty}^{\infty} p(x+\Delta,t \mid x_0)\,f(-\Delta;\tau)\,d\Delta. \tag{151}$$

The Taylor series of $p(x,t \mid x_0)$, developed from $(x,t)$ in both directions, yields

$$p(x,t+\tau \mid x_0) = p(x,t \mid x_0) + \frac{\partial p(x,t \mid x_0)}{\partial t}\,\tau + \cdots \tag{152}$$

and

$$p(x+\Delta,t \mid x_0) = p(x,t \mid x_0) + \frac{\partial p(x,t \mid x_0)}{\partial x}\,\Delta + \frac{1}{2}\frac{\partial^2 p(x,t \mid x_0)}{\partial x^2}\,\Delta^2 + \cdots \tag{153}$$

With substitution of (152) and (153) into, respectively, the left and right side of (150), we obtain

$$p(x,t \mid x_0) + \frac{\partial p(x,t \mid x_0)}{\partial t}\,\tau \approx p(x,t \mid x_0)\int_{-\infty}^{\infty} f(\Delta;\tau)\,d\Delta + \frac{\partial p(x,t \mid x_0)}{\partial x}\int_{-\infty}^{\infty} \Delta f(\Delta;\tau)\,d\Delta + \frac{1}{2}\frac{\partial^2 p(x,t \mid x_0)}{\partial x^2}\int_{-\infty}^{\infty} \Delta^2 f(\Delta;\tau)\,d\Delta. \tag{154}$$

Reduction of (154) yields, using (147)–(149),

$$\frac{\partial p(x,t \mid x_0)}{\partial t}\,\tau \approx \frac{1}{2}\,b^2\,\frac{\partial^2 p(x,t \mid x_0)}{\partial x^2}\,\tau, \tag{155}$$

which shows that, approximately and independently of $\tau$, the conditional density function $p(x,t \mid x_0)$ satisfies the diffusion ("heat") equation [Kolmogorov's PDE]:

$$\frac{\partial p(x,t \mid x_0)}{\partial t} = \frac{1}{2}\,b^2\,\frac{\partial^2 p(x,t \mid x_0)}{\partial x^2}. \tag{156}$$

Thus, by making the simple general descriptive assumptions (148)–(150), and then the first-order (152) and the second-order expansions (153), Einstein (1905, pp. 556) gave a clear-cut procedure for a rigorous derivation of the diffusion equation (156), which of course physicists knew had a well-known ($b^2 = D$) solution.

To solve the partial differential equation (156), we use here the Fourier transformation ($\mathcal{F}$)

$$\mathcal{F}p(x,t \mid x_0) = \hat p(\xi,t \mid x_0) = \int_{-\infty}^{\infty} p(x,t \mid x_0)\,e^{-i\xi(x-x_0)}\,dx \tag{157}$$

to obtain the ordinary differential equation in $t$:

$$\frac{\partial \hat p}{\partial t}(\xi,t \mid x_0) = \frac{d\hat p}{dt}(\xi,t \mid x_0) = \frac{1}{2}\,b^2(-i\xi)^2\,\hat p(\xi,t \mid x_0) = -\frac{1}{2}\,b^2\xi^2\,\hat p(\xi,t \mid x_0). \tag{158}$$

Hence the solution of (158) is

$$\hat p(\xi,t \mid x_0) = \hat\upsilon(\xi)\,e^{-\frac{1}{2}b^2\xi^2 t}. \tag{159}$$

Using the inverse Fourier transformation, we derive the solution of (156) as a convolution [cf. (162)] of the inverse of (159). We obtain $\upsilon(x - x_0)$, cf. (159), as

$$\upsilon(x - x_0) = \mathcal{F}^{-1}\hat\upsilon(\xi) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat\upsilon(\xi)\,e^{i\xi(x-x_0)}\,d\xi \tag{160}$$

and the Gauss kernel, $p_N(x,t \mid x_0)$, as

$$p_N(x,t \mid x_0) = \mathcal{F}^{-1}e^{-\frac{1}{2}b^2\xi^2 t} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}b^2\xi^2 t}\,e^{i\xi(x-x_0)}\,d\xi = \frac{1}{\sqrt{2\pi b^2 t}}\,e^{-\frac{1}{2}\frac{(x-x_0)^2}{b^2 t}}. \tag{161}$$

As the inverse Fourier transform takes a product into a convolution, we get, cf. (159)–(161),

$$p(x,t \mid x_0) = \mathcal{F}^{-1}\hat p(\xi,t \mid x_0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat p(\xi,t \mid x_0)\,e^{i\xi(x-x_0)}\,d\xi = \upsilon(x - x_0) * p_N(x,t \mid x_0) = \int_{-\infty}^{\infty} \upsilon(x - x_0 - s)\,p_N(s,t \mid x_0)\,ds. \tag{162}$$

Now, in the sense of "distributions" (generalized functions), the Gauss kernel (161) converges to the Dirac function $\delta(x - x_0)$ for $t \to 0$. Hence (162) gives

$$p(x,0 \mid x_0) = \lim_{t\to 0} p(x,t \mid x_0) = \upsilon(x - x_0) * \delta(x - x_0) = \upsilon(x - x_0). \tag{163}$$

If we originally had chosen the boundary condition as

$$\upsilon(x - x_0) = \delta(x - x_0), \tag{164}$$

then we would have the conditional probability density function as

$$p(x,t \mid x_0) = \delta(x - x_0) * p_N(x,t \mid x_0) = p_N(x,t \mid x_0), \tag{165}$$

i.e. we may consider $p(x,t \mid x_0)$ and the Gauss kernel (161) as the solution (162) to Kolmogorov's equation (156) with the boundary condition (164). See also Bachelier (1900, p. 46, 1964, p. 38).
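That the Gauss kernel (161) solves the diffusion equation (156) and carries unit probability mass can be confirmed with a small finite-difference sketch. The step size `h`, the integration window and the function names are assumptions made for illustration only.

```python
import math

def gauss_kernel(x, t, x0=0.0, b=1.0):
    """p_N(x, t | x0) from (161)."""
    return math.exp(-0.5 * (x - x0) ** 2 / (b * b * t)) / math.sqrt(2 * math.pi * b * b * t)

def heat_residual(x, t, x0=0.0, b=1.0, h=1e-3):
    """dp/dt - (1/2) b^2 d^2p/dx^2 by central differences; near zero if (156) holds."""
    dp_dt = (gauss_kernel(x, t + h, x0, b) - gauss_kernel(x, t - h, x0, b)) / (2 * h)
    d2p_dx2 = (gauss_kernel(x + h, t, x0, b) - 2 * gauss_kernel(x, t, x0, b)
               + gauss_kernel(x - h, t, x0, b)) / (h * h)
    return dp_dt - 0.5 * b * b * d2p_dx2

def total_mass(t, x0=0.0, b=1.0, half_width=10.0, steps=4000):
    """Trapezoidal approximation of the integral of p_N over x."""
    s = b * math.sqrt(t)
    lo, hi = x0 - half_width * s, x0 + half_width * s
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gauss_kernel(lo + i * h, t, x0, b)
    return total * h

print(heat_residual(0.4, 1.5, x0=0.1, b=0.8))
print(total_mass(1.5, x0=0.1, b=0.8))
```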
Acknowledgements: We wish to thank P. Alsholm (Technical University of Denmark) and J.M.S. Jensen (Vinci Computers) for kind assistance in computer calculations, and G. Grubb, M. Jacobsen (University of Copenhagen), H. Spliid, H. Holst (Technical University of Denmark) for valuable comments and discussions. For discussions about the Kolmogorov equation, thanks are also due to Y. V. Prohorov (Steklov-Institute for Mathematics, Moscow). Bjarne S. Jensen thanks the Danish Social Sciences Research Council for Grant no. 9500740.
References:

Apostol, T. M. (1969) "Some Explicit Formulas for the Exponential Matrix $e^{tA}$." American Mathematical Monthly 76: 289-292.

Arnold, L. (1973) Stochastic Differential Equations: Theory and Applications. John Wiley and Sons, New York, London, Sydney, Toronto. German original published by R. Oldenbourg Verlag, Munich 1973.

Bachelier, L. (1900) Théorie de la Spéculation. Annales scientifiques de l'École normale supérieure, (3), No. 1018. Gauthier-Villars, Paris. English translation in: Cootner, P.H. (ed.) The Random Character of Stock Market Prices. MIT Press, Cambridge Mass., 1964: 17–78.

Chang, F. and Malliaris, A. G. (1987) "Asymptotic Growth under Uncertainty: Existence and Uniqueness." Review of Economic Studies 54: 169-174.

Einstein, A. (1905) "Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen." Annalen der Physik 17: 549-560. Translated as "On the Motion of Small Particles in Liquids at Rest Required by the Molecular-Kinetic Theory of Heat" in the following volumes: (i) Fürth, R. (ed.) Albert Einstein: Investigations on the Theory of the Brownian Movement. Dover Publ., New York. (ii) Stachel, J. (ed.) Einstein's Miraculous Year. Five Papers that Changed the Face of Physics. Princeton University Press, Princeton, 2005.

Gikman, I. I. and Skorokhod, A. V. (1969) Introduction to the Theory of Random Processes. W. B. Saunders Company, Philadelphia, London, Toronto. Russian original published by Nauka, Moscow, 1965.

Hörmander, L. (1967) "Hypoelliptic Second Order Differential Equations." Acta Mathematica 119: 147–171.

Hörmander, L. (1983, 1985) The Analysis of Linear Partial Differential Operators. Grundlehren der mathematischen Wissenschaften, vol. 256, 257, 274, 275. Springer Verlag, Berlin.

Itô, K. and McKean, H. P. Jr. (1974) Diffusion Processes and Their Sample Paths. Second printing. Springer-Verlag, New York.

Jacobsen, M. (1991) Homogeneous Gaussian Diffusions in Finite Dimensions. Institute of Mathematical Statistics, University of Copenhagen, Preprint No. 3.

Jensen, B. S. (1994) The Dynamic Systems of Basic Economic Growth Models. Kluwer Academic Publishers, Dordrecht/Boston/London.

Karatzas, I. and Shreve, S. (1988) Brownian Motions and Stochastic Calculus. Springer-Verlag, New York.

Merton, R. C. (1975) "An Asymptotic Theory of Growth Under Uncertainty." Review of Economic Studies 47: 375-393.

Prohorov, Y. V. and Rozanov, Y. A. (1969) Probability Theory. Springer-Verlag.

Pugachev, V. S. and Sinitsyn, I. N. (1987) Stochastic Differential Systems. John Wiley and Sons, New York.

Rudin, W. (1973) Functional Analysis. McGraw-Hill.

Selby, S. M. (1973) Standard Mathematical Tables. 21. ed. Chemical Rubber Co., Cleveland, Ohio.

Soong, T. T. (1973) Random Differential Equations in Science and Engineering. Academic Press, New York.

Øksendal, B. (2005) Stochastic Differential Equations. 6. edition. Series Universitext, Springer-Verlag, Berlin.
Chapter 3
Two-Dimensional Linear Dynamic Systems with Small Random Terms

Nishioka Kunio
Faculty of Commerce, Chuo University, Tokyo, Japan

3.1 Introduction

Let $(\Omega, \mathcal{F}, P)$ be a standard probability space¹ and $\{W(t,\omega),\ t \ge 0\}$ be an $\mathbb{R}^1$-valued Brownian motion on $(\Omega, \mathcal{F}, P)$. We consider a linear stochastic differential equation (SDE for short):

$$dx^\varepsilon(t,\omega) = A\cdot x^\varepsilon(t,\omega)\,dt + \varepsilon\,G\cdot x^\varepsilon(t,\omega)\,dW(t,\omega), \qquad x^\varepsilon(0,\omega) = x^*, \tag{1}$$

where $A$ and $G$ are constant regular $2 \times 2$ matrices, $\varepsilon$ is a positive real number, and $x^* = (x_1^*, x_2^*)$ is a point in $\mathbb{R}^2$. There exists a unique solution of SDE (1), which defines a random dynamical system $\{x^\varepsilon(t,\omega),\ t \ge 0\}$ in $\mathbb{R}^2$. Note that the origin $0 = (0,0)$ is a singular point of our dynamical system for any $\varepsilon \ge 0$, that is, $x^\varepsilon(t,\omega) \equiv 0$ for all $t \ge 0$ if $x^\varepsilon(0,\omega) = 0$.

¹ $\Omega$ is a sample space, whose element $\omega \in \Omega$ denotes an individual experiment. $\mathcal{F}$ is a σ-field of $\Omega$ and $P$ is a probability measure on $(\Omega, \mathcal{F})$. See Ito and McKean (1968) or Revuz and Yor (1999) for strict definitions of probability space, Brownian motion, SDE, and the others.

Let $\varepsilon = 0$ in SDE (1), and we have an ordinary differential equation

$$\frac{dx}{dt}(t) = A\cdot x(t), \qquad x(0) = x^*, \tag{2}$$
whose solution defines a non-random dynamical system $\{x(t),\ t \ge 0\}$. It is easy to investigate the asymptotic behavior of $x(t)$ as $t \to \infty$. According to this asymptotic behavior, the origin $0$ is classified as a spiral point, a center, a saddle point, an improper node, or a proper node. (See section 3.2 in this paper and Coddington and Levinson (1955), Ch. 15, §1.) Our problem in this note is to answer the question: does this classification for the non-random system (2) remain valid for the random system (1) with sufficiently small $\varepsilon$?

This note is organized as follows. In section 3.2, we discuss the non-random system (2), and review some facts needed to investigate our problem in section 3.2 and 3.3. We discuss the random system (1) for a spiral point and a center in section 3.4, for a saddle point in section 3.5, and for an improper and a proper node in section 3.6. Combining Theorems 1, 2, 3, 4, 5, and 6, we present the following main result:

Main Theorem.² Let the origin $0$ be a spiral point, an improper node, or a saddle point for the non-random system (2); then the same is true for the random system (1) with small $\varepsilon > 0$. However, if the origin is a center or a proper node for the non-random system (2), the same is not necessarily true for the random system (1), even though $\varepsilon > 0$ is small.

² See Remark 3 for an intuitive explanation of this theorem.

3.2 Non-random dynamic system

We begin by remarking that there is a regular matrix $Q$ such that the transformed matrix $Q\cdot A\cdot Q^{-1}$ or $-Q\cdot A\cdot Q^{-1}$ is one of the following canonical forms:

$$\begin{array}{lll}
\text{(I)} & \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix} & \text{with } a_1 < 0 \text{ and } a_2 > 0, \\[2ex]
\text{(II)} & \begin{pmatrix} 0 & -a_2 \\ a_2 & 0 \end{pmatrix} & \text{with } a_2 > 0, \\[2ex]
\text{(III)} & \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix} & \text{with } a_1 < 0 < a_2, \\[2ex]
\text{(IV)} & \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix} & \text{with } a_2 < a_1 < 0, \\[2ex]
\text{(V)} & \begin{pmatrix} a_1 & 0 \\ a_2 & a_1 \end{pmatrix} & \text{with } a_1 < 0 \text{ and } a_2 > 0, \\[2ex]
\text{(VI)} & \begin{pmatrix} a_1 & 0 \\ 0 & a_1 \end{pmatrix} & \text{with } a_1 < 0.
\end{array} \tag{3}$$

For simplicity, we assume that the matrix $A$ is one of the above canonical forms (I) through (VI), and consider the following pair of non-random dynamical systems instead of (2) alone:

$$\frac{dx}{dt}(t) = A\cdot x(t), \quad x(0) = x^*, \qquad\qquad \frac{d\tilde x}{dt}(t) = \tilde A\cdot \tilde x(t), \quad \tilde x(0) = x^*, \tag{4}$$

where we put

$$\tilde A \equiv -A. \tag{5}$$

In order to analyze (4), we introduce new coordinate functions for $\{x(t) = (x_1(t), x_2(t)),\ t \ge 0\}$, that is

$$\theta(t) \equiv \tan^{-1}\big(x_2(t)/x_1(t)\big), \qquad \rho(t) \equiv \log|x(t)|. \tag{6}$$

The function

$$L(x^*) \equiv \lim_{t\to\infty} \frac{\rho(t)}{t}, \qquad x^* = x(0), \tag{7}$$

is called the Lyapunov index, which denotes exponential stability or instability of the dynamical system (4). In fact, if $L(x^*) \neq 0$, then

$$|x(t)| \sim K\exp\{L(x^*)\,t\} \quad \text{as } t \to \infty, \qquad x^* = x(0),$$

where $K$ is a positive constant.
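The Lyapunov index (7) of the non-random system can be approximated by integrating (2) and evaluating $\rho(T)/T$ for a large $T$. The sketch below is one possible way to do this, assuming a classical Runge-Kutta step with renormalisation of $x$ after each step (permissible because the system is linear); the function name and the values of `T` and `dt` are illustrative.

```python
import math

def lyapunov_index(A, x0, T=200.0, dt=0.01):
    """Approximate rho(T)/T = log|x(T)|/T for dx/dt = A x, x(0) = x0."""
    def f(x):
        return (A[0][0] * x[0] + A[0][1] * x[1],
                A[1][0] * x[0] + A[1][1] * x[1])

    def shifted(x, k, s):
        return (x[0] + s * k[0], x[1] + s * k[1])

    norm0 = math.hypot(x0[0], x0[1])
    rho = math.log(norm0)                      # rho(0) = log|x(0)|
    x = (x0[0] / norm0, x0[1] / norm0)
    for _ in range(int(round(T / dt))):
        k1 = f(x)
        k2 = f(shifted(x, k1, 0.5 * dt))
        k3 = f(shifted(x, k2, 0.5 * dt))
        k4 = f(shifted(x, k3, dt))
        x = (x[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             x[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        r = math.hypot(x[0], x[1])
        rho += math.log(r)                     # accumulate the growth of log|x|
        x = (x[0] / r, x[1] / r)               # renormalise to avoid overflow/underflow
    return rho / T

spiral = [[-0.5, -2.0], [2.0, -0.5]]           # canonical form (I): a1 = -0.5, a2 = 2
saddle = [[-1.0, 0.0], [0.0, 0.5]]             # canonical form (III): a1 = -1, a2 = 0.5
print(lyapunov_index(spiral, (1.0, 0.0)))      # close to a1
print(lyapunov_index(saddle, (1.0, 1.0)))      # close to a2, since x2* != 0
print(lyapunov_index(saddle, (1.0, 0.0)))      # close to a1, since x2* = 0
```

The last two calls illustrate that for the diagonal form (III) the index depends on whether the initial point has a non-zero second coordinate.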
For the other dynamical system $\{\tilde x(t),\ t \ge 0\}$ in (4), we define $\tilde\theta(t)$, $\tilde\rho(t)$, and $\tilde L(x^*)$ by the corresponding functions in (6) and (7) respectively.

If the matrix $A$ is (I) in (3), we call the origin a spiral point for the dynamical system (4) (see Figure 1). By a simple calculation, we see that

$$\theta(t) = \theta(0) + a_2 t, \qquad \rho(t) = \rho(0) + a_1 t. \tag{8}$$

So the spiral point is characterized by the fact:

$$\lim_{T\to\infty} \frac{\theta(T)}{T} = a_2 \neq 0, \quad L(x^*) = a_1 < 0, \qquad \lim_{T\to\infty} \frac{\tilde\theta(T)}{T} = -a_2 \neq 0, \quad \tilde L(x^*) = -a_1 > 0. \tag{9}$$

Figure 1: A spiral point (I), and a center (II)

If $A$ is (II), then it is just a special case of (8) where $a_1 = 0$, and the origin is called a center (see Figure 1). A center is distinguished by the fact:

$$\lim_{T\to\infty} \frac{\theta(T)}{T} = a_2 \neq 0, \quad L(x^*) = 0, \qquad \lim_{T\to\infty} \frac{\tilde\theta(T)}{T} = -a_2 \neq 0, \quad \tilde L(x^*) = 0.$$

If the matrix $A$ is (III) or (IV) in (3), then

$$\tan\theta(t) = \tan\theta(0)\,\exp\{(a_2 - a_1)t\}, \qquad \rho(t) = \rho(0) + a_2 t + (a_1 - a_2)\int_0^t \cos^2\theta(s)\,ds. \tag{10}$$

The origin $0$ is called a saddle point in the case of (III) (see Figure 2), and it is distinguished by the following:

$$\lim_{t\to\infty} \theta(t) = \begin{cases} \pi/2 & \text{if } \theta(0) \in (0,\pi) \\ 3\pi/2 & \text{if } \theta(0) \in (\pi,2\pi), \end{cases} \tag{11a}$$

$$L(x^*) = a_2 > 0 \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_2^* \neq 0, \tag{11b}$$

$$\lim_{t\to\infty} \tilde\theta(t) = \begin{cases} 0 & \text{if } \tilde\theta(0) \in (-\pi/2,\pi/2) \\ \pi & \text{if } \tilde\theta(0) \in (\pi/2,3\pi/2), \end{cases} \tag{11c}$$

$$\tilde L(x^*) = -a_1 > 0 \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_1^* \neq 0. \tag{11d}$$
Figure 2: A saddle point (III), and an improper node (IV)

The origin is an improper node in the case (IV) (see Figure 2), and the following holds:

$$\lim_{t\to\infty} \theta(t) = \text{same as the right side of (11c)}, \tag{12a}$$

$$L(x^*) = a_1 < 0 \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_1^* \neq 0, \tag{12b}$$

$$\lim_{t\to\infty} \tilde\theta(t) = \text{same as the right side of (11a)}, \tag{12c}$$

$$\tilde L(x^*) = -a_2 > 0 \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_2^* \neq 0. \tag{12d}$$
When the matrix $A$ is (V) in (3), then

$$\tan\theta(t) = \tan\theta(0) + a_2 t, \qquad \rho(t) = \rho(0) + a_1 t + \frac{a_2}{2}\int_0^t \sin 2\theta(s)\,ds,$$

and we say also that the origin $0$ is an improper node (see Figure 3).

Figure 3: An improper node (V), and a proper node (VI)

This improper node is characterized by the fact:

$$\lim_{t\to\infty} \theta(t) = \begin{cases} \pi/2 & \text{if } \theta(0) \in (-\pi/2,\pi/2] \\ -\pi/2 & \text{if } \theta(0) \in (\pi/2,3\pi/2], \end{cases} \tag{13a}$$

$$L(x^*) = a_1 < 0, \tag{13b}$$

$$\lim_{t\to\infty} \tilde\theta(t) = \begin{cases} -\pi/2 & \text{if } \theta(0) \in [-\pi/2,\pi/2) \\ \pi/2 & \text{if } \theta(0) \in [\pi/2,3\pi/2), \end{cases} \tag{13c}$$

$$\tilde L(x^*) = -a_1 > 0. \tag{13d}$$

If $A$ is (VI) in (3), then

$$\theta(t) = \theta(0) \ \text{ for all } t \ge 0, \qquad \rho(t) = \rho(0) + a_1 t, \tag{14}$$

where the origin $0$ is called a proper node (see Figure 3), and it is distinguished by the fact:

$$\theta(t) = \theta(0) \ \text{ for all } t \ge 0, \quad L(x^*) = a_1 < 0, \qquad \tilde\theta(t) = \theta(0) \ \text{ for all } t \ge 0, \quad \tilde L(x^*) = -a_1 > 0. \tag{15}$$

3.3 Lyapunov index of the random system
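Let $e(\theta)$ and $e^\dagger(\theta)$ be two-dimensional vectors such that

$$e(\theta) \equiv (\cos\theta, \sin\theta), \qquad e^\dagger(\theta) \equiv (\sin\theta, -\cos\theta). \tag{16}$$

We denote by $\langle x, y\rangle$ the inner product of vectors $x$ and $y$ in $\mathbb{R}^2$, and by $g_{ij}$ the $ij$-element of the matrix $G$ and so on.

As in the previous section, we also use the same coordinate functions for the random dynamical system $\{x^\varepsilon(t,\omega) = (x_1^\varepsilon(t,\omega), x_2^\varepsilon(t,\omega)),\ t \ge 0\}$, that is

$$\theta^\varepsilon(t,\omega) \equiv \tan^{-1}\big(x_2^\varepsilon(t,\omega)/x_1^\varepsilon(t,\omega)\big), \qquad \rho^\varepsilon(t,\omega) \equiv \log|x^\varepsilon(t,\omega)|. \tag{17}$$

Applying Itô's formula, we know that the diffusion process $\{\theta^\varepsilon(t,\omega),\ t \ge 0\}$ satisfies the SDE

$$d\theta^\varepsilon(t,\omega) = b^\varepsilon(\theta^\varepsilon(t,\omega))\,dt + \varepsilon\,\sigma(\theta^\varepsilon(t,\omega))\,dW(t,\omega) \tag{18}$$

where

$$\sigma(\theta) \equiv -\langle e^\dagger(\theta), G\cdot e(\theta)\rangle = g_{21}\cos^2\theta + (g_{22} - g_{11})\cos\theta\sin\theta - g_{12}\sin^2\theta, \qquad b^\varepsilon(\theta) \equiv -\langle e^\dagger(\theta), A\cdot e(\theta)\rangle + \varepsilon^2\,b_G(\theta) \tag{19}$$

with

$$b_G(\theta) \equiv \langle e^\dagger(\theta), G\cdot e(\theta)\rangle\,\langle e(\theta), G\cdot e(\theta)\rangle = -\sigma(\theta)\big[g_{11}\cos^2\theta + (g_{12} + g_{21})\cos\theta\sin\theta + g_{22}\sin^2\theta\big].$$

On the other hand, the diffusion process $\{\rho^\varepsilon(t,\omega),\ t \ge 0\}$ is a solution of the SDE

$$d\rho^\varepsilon(t,\omega) = Q^\varepsilon(\theta^\varepsilon(t,\omega))\,dt + \varepsilon\,R(\theta^\varepsilon(t,\omega))\,dW(t,\omega) \tag{20}$$

where

$$R(\theta) \equiv \langle e(\theta), G\cdot e(\theta)\rangle, \qquad Q^\varepsilon(\theta) \equiv \langle e(\theta), A\cdot e(\theta)\rangle + \varepsilon^2\,q_G(\theta) \tag{21}$$

with

$$q_G(\theta) \equiv \frac{1}{2}\,|G\cdot e(\theta)|^2 - \langle e(\theta), G\cdot e(\theta)\rangle^2.$$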
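Definition 1. As in (7), a non-random function³

$$L^\varepsilon(x^*) \equiv \lim_{t\to\infty} \frac{\rho^\varepsilon(t,\omega)}{t} \quad \text{a.s.}, \qquad x^* = x^\varepsilon(0,\omega), \tag{22}$$

is called the Lyapunov index of the random system $\{x^\varepsilon(t,\omega),\ t \ge 0\}$, if it exists.

As is well known, this Lyapunov index denotes stability of the random system $\{x^\varepsilon(t,\omega),\ t \ge 0\}$:⁴

Proposition 1. Assume that the Lyapunov index $L^\varepsilon(x^*)$ exists with probability one. If $L^\varepsilon(x^*) < 0$, then the random system $\{x^\varepsilon(t,\omega),\ t \ge 0\}$ is asymptotically stable, that is

$$\lim_{t\to\infty} |x^\varepsilon(t,\omega)| = 0 \quad \text{a.s.}, \qquad x^* = x^\varepsilon(0,\omega),$$

while if $L^\varepsilon(x^*) > 0$, then the random system is asymptotically unstable, that is

$$\lim_{t\to\infty} |x^\varepsilon(t,\omega)| = \infty \quad \text{a.s.}, \qquad x^* = x^\varepsilon(0,\omega).$$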
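The coefficient functions in (19) and (21) are easy to evaluate numerically, which gives a quick check of the trigonometric expansions. The sketch below assumes matrices stored as nested lists; all function names are illustrative. For $G = gI$ it also shows that $\sigma \equiv 0$ and that the drift of $\rho^\varepsilon$ reduces to $\langle e, A e\rangle - \tfrac{1}{2}\varepsilon^2 g^2$.

```python
import math

def e(theta):
    return (math.cos(theta), math.sin(theta))

def e_dagger(theta):
    return (math.sin(theta), -math.cos(theta))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def sigma(G, theta):
    """Diffusion coefficient of theta in (18)-(19)."""
    return -dot(e_dagger(theta), apply(G, e(theta)))

def b_eps(A, G, eps, theta):
    """Drift coefficient of theta in (18)-(19)."""
    b_G = dot(e_dagger(theta), apply(G, e(theta))) * dot(e(theta), apply(G, e(theta)))
    return -dot(e_dagger(theta), apply(A, e(theta))) + eps**2 * b_G

def R(G, theta):
    """Diffusion coefficient of rho in (20)-(21)."""
    return dot(e(theta), apply(G, e(theta)))

def Q_eps(A, G, eps, theta):
    """Drift coefficient of rho in (20)-(21)."""
    Ge = apply(G, e(theta))
    q_G = 0.5 * dot(Ge, Ge) - dot(e(theta), Ge) ** 2
    return dot(e(theta), apply(A, e(theta))) + eps**2 * q_G

A = [[-0.5, -2.0], [2.0, -0.5]]       # canonical form (I) with a1 = -0.5, a2 = 2
G = [[0.8, 0.0], [0.0, 0.8]]          # G = g I with g = 0.8
for theta in (0.0, 0.9, 2.5):
    print(sigma(G, theta), b_eps(A, G, 0.1, theta), Q_eps(A, G, 0.1, theta))
```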
Notation 1. As in section 3.2, we suppose that the matrix $A$ is one of the canonical forms (3). So besides SDE (1), we consider an SDE such that

$$d\tilde x^\varepsilon(t,\omega) = \tilde A\cdot \tilde x^\varepsilon(t,\omega)\,dt + \varepsilon\,G\cdot \tilde x^\varepsilon(t,\omega)\,dW(t,\omega), \qquad \tilde x^\varepsilon(0,\omega) = x^*, \tag{23}$$

where $\tilde A = -A$ as in (5). We denote the random system defined by SDE (23) by $\{\tilde x^\varepsilon(t,\omega),\ t \ge 0\}$, and the functions corresponding to (17) and (22) by $\tilde\theta^\varepsilon(t,\omega)$, $\tilde\rho^\varepsilon(t,\omega)$, and $\tilde L^\varepsilon(x^*)$ respectively.

We consider how to calculate the Lyapunov index (22). From SDE (20), it follows that

$$\rho^\varepsilon(T,\omega) - \rho^\varepsilon(0,\omega) = \int_0^T Q^\varepsilon(\theta^\varepsilon(t,\omega))\,dt + \varepsilon\int_0^T \langle e(\theta^\varepsilon(t,\omega)), G\cdot e(\theta^\varepsilon(t,\omega))\rangle\,dW(t,\omega).$$

³ The notation 'a.s.' in the equation below is an abbreviation of 'almost surely', which is to say that a stochastic event occurs with probability one.
⁴ Khas'minski (1980).

Since the function $\langle e(\theta), G\cdot e(\theta)\rangle$ is bounded, the representation theorem of continuous martingales⁵ and the law of the iterated logarithm⁶ imply that

$$\lim_{T\to\infty} \frac{\rho^\varepsilon(T,\omega)}{T} = \lim_{T\to\infty} \frac{1}{T}\int_0^T Q^\varepsilon(\theta^\varepsilon(t,\omega))\,dt \quad \text{a.s.}, \tag{24}$$

which provides a way to compute the Lyapunov index for the random system (1).

Remark 1. Since the random system (1) is a two-dimensional linear system, we enjoy the following remarkable facts: (a) the function $Q^\varepsilon$ in SDE (20) depends only on the variable $\theta$, (b) and there is no variable $\rho^\varepsilon(t,\omega)$ in SDE (18) for $\{\theta^\varepsilon(t,\omega),\ t \ge 0\}$. In conclusion, we may calculate the Lyapunov index $L^\varepsilon(x^*)$ of the random system (1), if we can analyze the asymptotic behavior of the stochastic process $\{\theta^\varepsilon(t,\omega),\ t \ge 0\}$ on the unit circle. ∇

3.4 One-dimensional diffusion process in an interval
Before analyzing the stochastic process $\{\theta^\varepsilon(t,\omega),\ t \ge 0\}$ on the unit circle, we consider a one-dimensional diffusion process $\{x(t,\omega),\ t \ge 0\}$ defined by the SDE

$$dx(t,\omega) = b(x(t,\omega))\,dt + \sigma(x(t,\omega))\,dW(t,\omega). \tag{25}$$

Assumption 1. Let $[\alpha,\beta]$ be a closed interval and we suppose the following:
(a) The coefficients $\sigma$ and $b$ are continuous functions on $\mathbb{R}^1$ and they satisfy the Lipschitz condition, i.e., there is a constant $K$ such that $|\sigma(x) - \sigma(y)| \le K|x - y|$ and $|b(x) - b(y)| \le K|x - y|$ for all $x, y$.
(b) $\sigma(x) > 0$ if $\alpha < x < \beta$, and $\sigma(\alpha) = 0 = \sigma(\beta)$.

⁵ See Revuz and Yor (1999), Ch. V, (1.7) Theorem.
⁶ See Revuz and Yor (1999), Ch. II, (1.9) Theorem.

We define Feller's canonical scale function for the diffusion process of (25) by

$$S(x) \equiv \int_{x^\dagger}^{x} s(x^\dagger, y)\,dy, \qquad \alpha < x < \beta, \tag{26}$$

where $x^\dagger$ is a fixed point in the open interval $(\alpha,\beta)$ and

$$s(x^\dagger, x) \equiv \exp\Big\{ -\int_{x^\dagger}^{x} \frac{2\,b(y)}{\sigma(y)^2}\,dy \Big\}, \qquad \alpha < x < \beta. \tag{27}$$

The first hitting time to a point $x^* \in [\alpha,\beta]$ is defined as

$$\tau(x^*,\omega) \equiv \begin{cases} \inf\{t > 0 : x(t,\omega) = x^*\} \\ \infty \quad \text{if the above } \{\ \} \text{ is empty.} \end{cases}$$

For each point $x^*$, we introduce the functions⁷

$$h^-(x^*) \equiv \lim_{\delta\downarrow 0} E_{x^*}[\exp\{-\tau(x^* - \delta,\omega)\}], \qquad h^+(x^*) \equiv \lim_{\delta\downarrow 0} E_{x^*}[\exp\{-\tau(x^* + \delta,\omega)\}].$$

Here it is known from Kolmogorov's 0-1 law⁸ that the values of these functions $h^-(x^*)$ and $h^+(x^*)$ are 0 or 1. Following Itô and McKean (1968), Ch. 3, §3.4, we classify a point $x^* \in [\alpha,\beta]$.
(i) A point $x^*$ is regular, if $h^-(x^*) = 1$ and $h^+(x^*) = 1$.
(ii) It is a left shunt, if $h^-(x^*) = 1$ and $h^+(x^*) = 0$, while it is a right shunt, if $h^-(x^*) = 0$ and $h^+(x^*) = 1$.
(iii) It is a trap, if $h^-(x^*) = 0$ and $h^+(x^*) = 0$. Moreover the trap $x^*$ is called right (left) repelling, if

$$S(x^*+) \equiv \lim_{\delta\downarrow 0} S(x^* + \delta) = -\infty \qquad (\text{resp. } S(x^*-) = \infty)$$

for Feller's canonical scale function $S$ in (26). On the other hand, it is right (left) attracting, if $S(x^*+) > -\infty$ (resp. $S(x^*-) < \infty$).

⁷ From now on, $E_{x^*}[\,\cdot\,]$ in the equation below denotes the conditional expectation with respect to the probability measure $P$ under the condition $x(0,\omega) = x^*$.
⁸ See Ito and McKean (1968), §3.3.
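Let Assumption 1 hold. Then the following table (28) lists all possible combinations of the boundary points $\alpha$ and $\beta$ under the foregoing classification:

$$\begin{array}{lllll}
 & b(\alpha) & S(\alpha+) & b(\beta) & S(\beta-) \\
\text{Case 3.1:} & b(\alpha) > 0 & & b(\beta) < 0 & \\
\text{Case 3.2:} & b(\alpha) > 0 & & b(\beta) > 0 & \\
\text{Case 3.3:} & b(\alpha) < 0 & & b(\beta) > 0 & \\
\text{Case 3.4:} & b(\alpha) = 0 & S(\alpha+) = -\infty & b(\beta) < 0 & \\
\text{Case 3.5:} & b(\alpha) = 0 & S(\alpha+) > -\infty & b(\beta) < 0 & \\
\text{Case 3.6:} & b(\alpha) = 0 & S(\alpha+) = -\infty & b(\beta) > 0 & \\
\text{Case 3.7:} & b(\alpha) = 0 & S(\alpha+) > -\infty & b(\beta) > 0 & \\
\text{Case 3.8:} & b(\alpha) = 0 & S(\alpha+) = -\infty & b(\beta) = 0 & S(\beta-) = \infty \\
\text{Case 3.9:} & b(\alpha) = 0 & S(\alpha+) = -\infty & b(\beta) = 0 & S(\beta-) < \infty \\
\text{Case 3.10:} & b(\alpha) = 0 & S(\alpha+) > -\infty & b(\beta) = 0 & S(\beta-) < \infty
\end{array} \tag{28}$$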
where $b$ is the coefficient function in SDE (25) and $S$ is Feller's canonical scale function (26). Following the classification in table (28), we review the behavior of the diffusion process $\{x(t, \omega), t \ge 0\}$ in the interval $[\alpha, \beta]$. Proofs of each part of the next proposition can be found in Friedman (1976), Itô and McKean (1968), Khas'minski (1967), Maruyama and Tanaka (1957), and Nishioka (1976a, 1976b).

Proposition 2. Let Assumption 1 hold and let $\{x(t, \omega), t \ge 0\}$ be the diffusion process defined by SDE (25).

(i) In Case 3.1, the boundary point $\alpha$ is a right shunt and $\beta$ is a left shunt. Therefore $x(t, \omega) \in (\alpha, \beta)$ a.s. for all $t > 0$ if $x^* = x(0, \omega) \in [\alpha, \beta]$, and there exists an invariant probability measure $\mu$ such that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(x(t, \omega))\, dt = \int_\alpha^\beta f(y)\, \mu(dy) \quad \text{a.s.} \tag{29}$$

for each function $f$ which is summable with respect to $\mu$.
(ii) In Case 3.2, $\alpha$ and $\beta$ are both right shunts. So $x(t, \omega) > \alpha$ a.s. for all $t > 0$ and $E_{x^*}[\tau(\beta, \omega)] < \infty$ if $x^* \in [\alpha, \beta]$.

(iii) In Case 3.3, $\alpha$ is a left shunt and $\beta$ is a right shunt. Therefore $E_{x^*}[\min\{\tau(\alpha, \omega), \tau(\beta, \omega)\}] < \infty$ for each $x^* \in (\alpha, \beta)$, and it holds that⁹

$$P_{x^*}[\tau(\alpha, \omega) < \tau(\beta, \omega)] = \frac{S(\beta-) - S(x^*)}{S(\beta-) - S(\alpha+)}, \qquad P_{x^*}[\tau(\beta, \omega) < \tau(\alpha, \omega)] = \frac{S(x^*) - S(\alpha+)}{S(\beta-) - S(\alpha+)}.$$

(iv) In Case 3.4, $\alpha$ is a right repelling trap and $\beta$ is a left shunt. So it holds that

$$x(t, \omega) \in (\alpha, \beta) \ \text{a.s. for all } t > 0 \ \text{if } x^* = x(0, \omega) \in (\alpha, \beta], \tag{30}$$

and there exists an invariant measure¹⁰ $\mu$ such that

$$\lim_{T \to \infty} \frac{\int_0^T f(x(t, \omega))\, dt}{\int_0^T g(x(t, \omega))\, dt} = \frac{\int_\alpha^\beta f(y)\, \mu(dy)}{\int_\alpha^\beta g(y)\, \mu(dy)} \quad \text{a.s.} \tag{31}$$

for all functions $f$ and $g$ which are summable with respect to the measure $\mu$ and satisfy $\int_\alpha^\beta g(y)\, \mu(dy) \neq 0$.

(v) In Case 3.5, $\alpha$ is a right attracting trap and $\beta$ is a left shunt. Then (30) holds, but

$$P_{x^*}\big[\lim_{t \to \infty} x(t, \omega) = \alpha\big] = 1 \quad \text{for each } x^* \in (\alpha, \beta].$$

⁹ Here and later on, $P_{x^*}[\tau(\alpha, \omega) < \tau(\beta, \omega)]$ denotes the conditional probability of the event $\{\omega : \tau(\alpha, \omega) < \tau(\beta, \omega)\}$ under the condition $x(0, \omega) = x^*$, and so on.
¹⁰ This $\mu$ is not necessarily a probability measure.
(vi) In Case 3.6, $\alpha$ is a right repelling trap and $\beta$ is a right shunt. Then it holds that

$$P_{x^*}[\tau(\beta, \omega) < \infty] = 1 \quad \text{for each } x^* \in (\alpha, \beta].$$

(vii) In Case 3.7, $\alpha$ is a right attracting trap and $\beta$ is a right shunt. Therefore it holds that

$$P_{x^*}\big[\lim_{t \to \infty} x(t, \omega) = \alpha\big] = \frac{S(\beta-) - S(x^*)}{S(\beta-) - S(\alpha+)} \quad \text{and} \quad P_{x^*}[\tau(\beta, \omega) < \infty] = \frac{S(x^*) - S(\alpha+)}{S(\beta-) - S(\alpha+)} \quad \text{for each } x^* \in (\alpha, \beta).$$

(viii) In Case 3.8, $\alpha$ is a right repelling trap and $\beta$ is a left repelling trap. Therefore

$$x(t, \omega) \in (\alpha, \beta) \ \text{a.s. for all } t \ge 0 \ \text{if } x^* = x(0, \omega) \in (\alpha, \beta), \tag{32}$$

and there exists an invariant measure¹¹ $\mu$ such that (31) holds.

(ix) In Case 3.9, $\alpha$ is a right repelling trap and $\beta$ is a left attracting trap. Then (32) is valid, but

$$P_{x^*}\big[\lim_{t \to \infty} x(t, \omega) = \beta\big] = 1 \quad \text{for each } x^* \in (\alpha, \beta).$$

(x) In Case 3.10, $\alpha$ is a right attracting trap and $\beta$ is a left attracting trap. Therefore (32) is true, but

$$P_{x^*}\big[\lim_{t \to \infty} x(t, \omega) = \alpha\big] = \frac{S(\beta-) - S(x^*)}{S(\beta-) - S(\alpha+)} \quad \text{and} \quad P_{x^*}\big[\lim_{t \to \infty} x(t, \omega) = \beta\big] = \frac{S(x^*) - S(\alpha+)}{S(\beta-) - S(\alpha+)} \quad \text{for each } x^* \in (\alpha, \beta).$$

3.5 Spiral point and center
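Let the matrix $A$ be (I) in (3), so that the origin 0 is a spiral point of the non-random system (4) (see Figure 1), which is characterized by (9). In this setting, SDEs (18) and (20) become

$$\theta^\varepsilon(T, \omega) - \theta^\varepsilon(0, \omega) = a_2 T + \varepsilon^2 \int_0^T b_G(\theta^\varepsilon(t, \omega))\, dt + \varepsilon \int_0^T \sigma(\theta^\varepsilon(t, \omega))\, dW(t, \omega), \tag{33}$$

$$\rho^\varepsilon(T, \omega) - \rho^\varepsilon(0, \omega) = a_1 T + \varepsilon^2 \int_0^T q_G(\theta^\varepsilon(t, \omega))\, dt + \varepsilon \int_0^T R(\theta^\varepsilon(t, \omega))\, dW(t, \omega). \tag{34}$$

¹¹ This $\mu$ is not necessarily a probability measure.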
Step 1. First we suppose that

$$(g_{11} - g_{22})^2 + 4\, g_{12}\, g_{21} < 0 \tag{35}$$

holds for the matrix $G = (g_{ij})$ in the random system (1). Since (19) implies that the coefficient function $\sigma(\theta)$ in SDE (18) does not vanish, the process $\{\theta^\varepsilon(t, \omega), t \ge 0\}$ is a non-degenerate diffusion process and there exists an invariant probability measure $\mu^\varepsilon$ such that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt = \int_0^{2\pi} \phi(y)\, \mu^\varepsilon(dy) \quad \text{a.s.} \tag{36}$$
for each continuous function $\phi$ on the unit circle.

Step 2. Next we suppose that the matrix $G = (g_{ij})$ satisfies

$$(g_{11} - g_{22})^2 + 4\, g_{12}\, g_{21} \ge 0. \tag{37}$$

Note that the function $\sigma(\theta)$ in SDE (18) is a trigonometric polynomial of degree 2 and is periodic with period $\pi$. Therefore, if (37) holds, then $\sigma(\theta)$ vanishes at four points on the unit circle, say $\gamma_1$, $\gamma_2$, $\gamma_3$ ($= \gamma_1 + \pi$), and $\gamma_4$ ($= \gamma_2 + \pi$), where $\gamma_1$ may equal $\gamma_2$ (see Figure 4). From (28), all $\gamma_k$ are right (anti-clockwise) shunts, because $b^\varepsilon(\theta) = a_2 + \varepsilon^2 b_G(\theta) > 0$ for any $\theta$, owing to the smallness of $\varepsilon$. Now we can apply Proposition 2 (ii) to $\{\theta^\varepsilon(t), t \ge 0\}$ in the interval $[\gamma_k, \gamma_{k+1}]$. Repeating this procedure on the intervals $[\gamma_1, \gamma_2]$, $[\gamma_2, \gamma_3]$, $\dots$, we see that

$$\theta^\varepsilon(t, \omega) > \theta^* - \pi \ \text{for all } t \ge 0 \ \text{and} \ \lim_{t \to \infty} \theta^\varepsilon(t, \omega) = \infty \ \text{a.s.,} \qquad E_{\theta^*}[\tau(\theta^\dagger, \omega)] < \infty \ \text{for each point } \theta^\dagger \text{ on the unit circle.} \tag{38}$$

Therefore there exists an invariant probability measure such that (36) holds.

Lemma 1. Let the matrix $A$ be (I) or (II) in (3) and let $\varepsilon$ be small. Then there exists an invariant measure $\mu^\varepsilon$ such that (36) holds.
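The zeros $\gamma_k$ used in Step 2 can be located numerically. The sketch below assumes that $\sigma(\theta) = \langle e^\perp(\theta), G\, e(\theta) \rangle$ with $e(\theta) = (\cos\theta, \sin\theta)$ and $e^\perp(\theta) = (-\sin\theta, \cos\theta)$; this explicit form is an assumption made here (definition (19) is not restated in this section), chosen because it reproduces the discriminant in (35) and (37). The function names are hypothetical.

```python
import math

def sigma(theta, G):
    """Assumed angular diffusion coefficient <e_perp(theta), G e(theta)>."""
    (g11, g12), (g21, g22) = G
    c, s = math.cos(theta), math.sin(theta)
    return -s * (g11 * c + g12 * s) + c * (g21 * c + g22 * s)

def discriminant(G):
    """Left-hand side of (35) and (37)."""
    (g11, g12), (g21, g22) = G
    return (g11 - g22) ** 2 + 4.0 * g12 * g21

def sigma_zeros(G, tol=1e-12):
    """Zeros of sigma in [0, 2*pi), using sigma = C + P cos(2θ) + Q sin(2θ)."""
    (g11, g12), (g21, g22) = G
    P = 0.5 * (g21 + g12)
    Q = 0.5 * (g22 - g11)
    C = 0.5 * (g21 - g12)
    R = math.hypot(P, Q)
    if R < tol or abs(C) > R:
        # sigma is constant, or its oscillating part never cancels C
        return []
    phi = math.atan2(Q, P)
    delta = math.acos(max(-1.0, min(1.0, -C / R)))
    base = {((phi + delta) / 2.0) % math.pi, ((phi - delta) / 2.0) % math.pi}
    return sorted(z + k * math.pi for z in base for k in (0, 1))

G_sym = [[0.0, 1.0], [1.0, 0.0]]     # discriminant > 0: four zeros
G_skew = [[0.0, 1.0], [-1.0, 0.0]]   # discriminant < 0: no zeros
zeros_sym = sigma_zeros(G_sym)
zeros_skew = sigma_zeros(G_skew)
```

Under this assumed form, the symmetric matrix gives zeros at $\pi/4$, $3\pi/4$, $5\pi/4$, $7\pi/4$, while the skew-symmetric matrix gives none, in line with the dichotomy between (35) and (37).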
Figure 4: Positions of γk ’s on the unit circle
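From Lemma 1 we see that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T b_G(\theta^\varepsilon(t, \omega))\, dt \quad \text{and} \quad \lim_{T \to \infty} \frac{1}{T} \int_0^T q_G(\theta^\varepsilon(t, \omega))\, dt$$

each converge a.s. to a constant. In addition, since the functions $\sigma(\theta)$ and $R(\theta)$ are bounded, the law of the iterated logarithm implies that

$$\Big| \lim_{T \to \infty} \frac{\theta^\varepsilon(T, \omega)}{T} - a_2 \Big| \le 16\, \varepsilon^2 \|G\|^2 \quad \text{a.s.,} \tag{39a}$$

$$a_1 - 16\, \varepsilon^2 \|G\|^2 \le L^\varepsilon(x^*) = \lim_{T \to \infty} \frac{\rho^\varepsilon(T, \omega)}{T} \le a_1 + 4\, \varepsilon^2 \|G\|^2 \quad \text{a.s.,} \tag{39b}$$

where $\|G\| \equiv \max_{i,j} |(G)_{ij}|$.

Repeating similar arguments for $\tilde\theta^\varepsilon(T, \omega)$ and $\tilde\rho^\varepsilon(T, \omega)$, we have the following theorem.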
Theorem 1. Suppose that the origin 0 is a spiral point of the non-random system (4), characterized by (9). Then the origin is also a spiral point of the related random system for sufficiently small $\varepsilon > 0$. More precisely, for any $\delta > 0$ there exists a positive $\varepsilon^*$ such that the following inequalities hold with probability one if $\varepsilon < \varepsilon^*$:

$$\Big| \lim_{T \to \infty} \frac{\theta^\varepsilon(T, \omega)}{T} - a_2 \Big| < \delta, \qquad |L^\varepsilon(x^*) - a_1| < \delta, \qquad \Big| \lim_{T \to \infty} \frac{\tilde\theta^\varepsilon(T, \omega)}{T} + a_2 \Big| < \delta, \qquad |\tilde L^\varepsilon(x^*) + a_1| < \delta. \tag{40}$$
A center is a special case of a spiral point, and (40) is valid with $a_1 = 0$. We present two counterexamples in which the Lyapunov index $L^\varepsilon(x^*)$ is not zero even though $\varepsilon$ is sufficiently small.

Example 1. Let the matrix $A$ be (II), so that the origin 0 is a center of the non-random system (4), which is characterized by (10).

(i) First we set

$$G = \begin{pmatrix} 0 & c \\ -c & 0 \end{pmatrix}, \qquad c \neq 0,$$

in the related random system (1). In this setting, SDE (20) becomes

$$\rho^\varepsilon(T, \omega) - \rho^\varepsilon(0, \omega) = \varepsilon^2 c^2\, T + \varepsilon \int_0^T R(\theta^\varepsilon(t, \omega))\, dW(t, \omega).$$

From Theorem 1 and the law of the iterated logarithm we derive the following relations instead of (40):

$$\Big| \lim_{T \to \infty} \frac{\theta^\varepsilon(T, \omega)}{T} - a_2 \Big| < \delta, \qquad L^\varepsilon(x^*) = \varepsilon^2 c^2 > 0, \qquad \Big| \lim_{T \to \infty} \frac{\tilde\theta^\varepsilon(T, \omega)}{T} + a_2 \Big| < \delta, \qquad \tilde L^\varepsilon(x^*) = \varepsilon^2 c^2 > 0.$$

From Proposition 1, we see that for any $\varepsilon > 0$

$$\lim_{t \to \infty} |x^\varepsilon(t, \omega)| = \infty \ \text{a.s.} \quad \text{and} \quad \lim_{t \to \infty} |\tilde x^\varepsilon(t, \omega)| = \infty \ \text{a.s.}$$

in this random system, and the origin 0 is closer to an unstable spiral point than to a center characterized by (10).

(ii) Next we set
$$G = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix}, \qquad c \neq 0,$$

in the related random system. In this set-up, SDE (20) is

$$\rho^\varepsilon(T, \omega) - \rho^\varepsilon(0, \omega) = -\frac{\varepsilon^2 c^2}{2}\, T + \varepsilon \int_0^T R(\theta^\varepsilon(t, \omega))\, dW(t, \omega),$$

and we have the following relations instead of (40):

$$\Big| \lim_{T \to \infty} \frac{\theta^\varepsilon(T, \omega)}{T} - a_2 \Big| < \delta, \qquad L^\varepsilon(x^*) = -\varepsilon^2 c^2 / 2 < 0, \qquad \Big| \lim_{T \to \infty} \frac{\tilde\theta^\varepsilon(T, \omega)}{T} + a_2 \Big| < \delta, \qquad \tilde L^\varepsilon(x^*) = -\varepsilon^2 c^2 / 2 < 0.$$

So we obtain that, for any $\varepsilon > 0$,

$$\lim_{t \to \infty} |x^\varepsilon(t, \omega)| = 0 \ \text{a.s.} \quad \text{and} \quad \lim_{t \to \infty} |\tilde x^\varepsilon(t, \omega)| = 0 \ \text{a.s.,}$$

and the origin is nearer to a stable spiral point than to a center.

Remark 2. When the origin is a center of the non-random system, it is neither stable nor unstable. But it becomes stable or unstable after the foregoing random terms are added. ∇

Now the following result is evident.

Theorem 2. Let the origin be a center of the non-random system (4), i.e. the matrix $A$ is (II) in (3) and (10) holds. Then the origin is not necessarily a center of the related random system, even though $\varepsilon > 0$ is sufficiently small.

3.6 Saddle point
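Let the matrix $A$ be (III) in (3), so that the origin 0 is a saddle point of the non-random system (4) (see Figure 2). As the related random system, we first consider $\{\theta^\varepsilon(t, \omega), t \ge 0\}$ given by SDE (18), in which the drift becomes

$$b^\varepsilon(\theta) = \frac{a_2 - a_1}{2} \sin 2\theta + \varepsilon^2 b_G(\theta), \qquad a_1 < 0 < a_2. \tag{41}$$

Since this $b^\varepsilon$ is a periodic function with period $\pi$ and $\varepsilon$ is small, there are four points

$$\xi_1^\varepsilon < \eta_1^\varepsilon < \xi_2^\varepsilon\ (= \xi_1^\varepsilon + \pi) < \eta_2^\varepsilon\ (= \eta_1^\varepsilon + \pi)$$

on the unit circle such that

$$b^\varepsilon(\theta) \begin{cases} = 0 & \text{if } \theta = \xi_1^\varepsilon, \xi_2^\varepsilon, \eta_1^\varepsilon, \eta_2^\varepsilon, \\ > 0 & \text{if } \theta \in (\xi_1^\varepsilon, \eta_1^\varepsilon) \cup (\xi_2^\varepsilon, \eta_2^\varepsilon), \\ < 0 & \text{if } \theta \in (\eta_1^\varepsilon, \xi_2^\varepsilon) \cup (\eta_2^\varepsilon, \xi_1^\varepsilon + 2\pi). \end{cases} \tag{42}$$

First we suppose that $\sigma(\theta) > 0$ for all $\theta$. In this set-up, $\{\theta^\varepsilon(t, \omega), t \ge 0\}$ is a non-degenerate diffusion process on the unit circle, and there exists an invariant probability measure $\mu^\varepsilon$ such that (36) holds. The invariant probability measure is

$$\mu^\varepsilon(d\theta) \equiv \frac{m_1^\varepsilon(\theta) + m_2^\varepsilon(\theta)}{\int_0^{2\pi} \big( m_1^\varepsilon(x) + m_2^\varepsilon(x) \big)\, dx}\, d\theta, \tag{43}$$

where (and later on) we set the function¹²

$$s^\varepsilon(\theta^\dagger, \theta) \equiv \exp\Big\{ -\frac{1}{\varepsilon^2} \int_{\theta^\dagger}^{\theta} \frac{2\, b^\varepsilon(x)}{\sigma(x)^2}\, dx \Big\} \tag{44}$$

and define the functions $m_1^\varepsilon$ and $m_2^\varepsilon$ as follows.

(a) If $\theta \in [\xi_1^\varepsilon, \xi_2^\varepsilon) = [\xi_1^\varepsilon, \xi_1^\varepsilon + \pi)$,

$$m_1^\varepsilon(\theta) \equiv \frac{\int_\theta^{\xi_2^\varepsilon} s^\varepsilon(\xi_1^\varepsilon, x)\, dx}{\varepsilon^2 \sigma(\theta)^2\, s^\varepsilon(\xi_1^\varepsilon, \theta)}, \qquad m_2^\varepsilon(\theta) \equiv \frac{s^\varepsilon(\theta, \xi_2^\varepsilon)}{\varepsilon^2 \sigma(\theta)^2} \int_{\xi_1^\varepsilon}^{\theta} s^\varepsilon(\xi_1^\varepsilon, x)\, dx.$$

(b) If $\theta \in [\xi_2^\varepsilon, \xi_2^\varepsilon + \pi) = [\xi_2^\varepsilon, \xi_1^\varepsilon + 2\pi)$,

$$m_1^\varepsilon(\theta) \equiv \frac{\int_\theta^{\xi_2^\varepsilon + \pi} s^\varepsilon(\xi_2^\varepsilon, x)\, dx}{\varepsilon^2 \sigma(\theta)^2\, s^\varepsilon(\xi_2^\varepsilon, \theta)}, \qquad m_2^\varepsilon(\theta) \equiv \frac{s^\varepsilon(\theta, \xi_2^\varepsilon + 2\pi)}{\varepsilon^2 \sigma(\theta)^2} \int_{\xi_2^\varepsilon}^{\theta} s^\varepsilon(\xi_1^\varepsilon, x)\, dx.$$

¹² This is the same function as (27).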
We shall investigate the asymptotic behavior of the process $\theta^\varepsilon(t, \omega)$ for large $t$ and small $\varepsilon$.

Proposition 3. Let the matrix $A$ be (III) in (3) and suppose that $\sigma(\theta) > 0$ for all $\theta$, i.e. the matrix $G = (g_{ij})$ satisfies (35).
(i) The process $\{\theta^\varepsilon(t, \omega), t \ge 0\}$ is a non-degenerate diffusion process on the unit circle.
(ii) We define the probability measure $\mu^\varepsilon$ by (43). Then

$$\frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt \to \int_0^{2\pi} \phi(\theta)\, \mu^\varepsilon(d\theta) \quad \text{a.s. as } T \to \infty$$

for each periodic continuous function $\phi$ with period $\pi$. Moreover, it holds that

$$\int_0^{2\pi} \phi(\theta)\, \mu^\varepsilon(d\theta) \to \phi\Big(\frac{\pi}{2}\Big) \quad \text{as } \varepsilon \to 0.$$

Using this proposition, we can calculate the Lyapunov index of the random system in our setting.

Corollary 1. Under the same assumptions as in the previous proposition, the Lyapunov index $L^\varepsilon(x^*)$ of the random system (1) converges to $a_2$ as $\varepsilon \to 0$, where $a_2$ is the Lyapunov index of the corresponding non-random system (4).

Before the proof of Proposition 3, we prove the corollary.

Proof: From Definition 1, (24), and Proposition 3, we see that

$$L^\varepsilon(x^*) = \int_0^{2\pi} Q^\varepsilon(\theta)\, \mu^\varepsilon(d\theta) \to Q^0\Big(\frac{\pi}{2}\Big) = a_2 \quad \text{as } \varepsilon \to 0. \qquad \Box$$

In order to prove Proposition 3, we need the following result, which is known as the Laplace method¹³.

¹³ One can find its proof in many books and papers. See Nevelson (1964), for instance.
Lemma 2. (Laplace method) Let $g$ and $f$ be continuous functions defined on a closed interval $[\alpha, \beta]$. Suppose that (a) there is a unique point $x^* \in [\alpha, \beta]$ such that $\max_{x \in [\alpha, \beta]} f(x) = f(x^*)$, (b) in a neighborhood of $x^*$, $f$ is a function of class $C^2$ and $f''(x^*) \neq 0$, and (c) $g(x^*) > 0$. For a positive number $K$, we put

$$J(K) \equiv \int_\alpha^\beta g(x) \exp\{K f(x)\}\, dx.$$

(i) If $\alpha < x^* < \beta$, then

$$J(K) = \sqrt{\frac{2\pi}{-K f''(x^*)}}\; g(x^*) \exp\{K f(x^*)\} \Big( 1 + o\Big(\frac{1}{K}\Big) \Big) \quad \text{as } K \to \infty.$$

(ii) If $x^*$ equals $\alpha$ or $\beta$ and if $f'(x^*) = 0$, then

$$J(K) = \sqrt{\frac{\pi}{-K f''(x^*)}}\; g(x^*) \exp\{K f(x^*)\} \Big( 1 + o\Big(\frac{1}{K}\Big) \Big) \quad \text{as } K \to \infty.$$

(iii) If $x^*$ equals $\alpha$ or $\beta$ and if $f'(x^*) \neq 0$, then

$$J(K) = \frac{1}{K\, |f'(x^*)|}\; g(x^*) \exp\{K f(x^*)\} \Big( 1 + o\Big(\frac{1}{K}\Big) \Big) \quad \text{as } K \to \infty.$$
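The leading terms in parts (i) and (iii) of Lemma 2 can be compared with a direct quadrature of $J(K)$. The Python sketch below does this for two hypothetical test functions chosen here for illustration ($f(x) = -x^2$ on $[-1, 1]$ and $f(x) = -x$ on $[0, 1]$); the helper names and the midpoint rule are likewise assumptions, not part of the text.

```python
import math

def laplace_integral(g, f, a, b, K, n=20_000):
    """Midpoint-rule approximation of J(K) = int_a^b g(x) exp(K f(x)) dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += g(x) * math.exp(K * f(x))
    return total * h

def interior_leading_term(g_star, f_star, f2_star, K):
    """Leading term of Lemma 2 (i): sqrt(2 pi / (-K f''(x*))) g(x*) exp(K f(x*))."""
    return math.sqrt(2.0 * math.pi / (-K * f2_star)) * g_star * math.exp(K * f_star)

def boundary_leading_term(g_star, f_star, f1_star, K):
    """Leading term of Lemma 2 (iii): g(x*) exp(K f(x*)) / (K |f'(x*)|)."""
    return g_star * math.exp(K * f_star) / (K * abs(f1_star))

K = 200.0

# Interior maximum at x* = 0: f(x) = -x^2, f''(0) = -2, g(x) = 2 + x.
J_interior = laplace_integral(lambda x: 2.0 + x, lambda x: -x * x, -1.0, 1.0, K)
ratio_interior = J_interior / interior_leading_term(2.0, 0.0, -2.0, K)

# Boundary maximum at x* = 0 with f'(0) = -1: f(x) = -x, g(x) = 1.
J_boundary = laplace_integral(lambda x: 1.0, lambda x: -x, 0.0, 1.0, K)
ratio_boundary = J_boundary / boundary_leading_term(1.0, 0.0, -1.0, K)
```

Both ratios should be close to 1 for large $K$, since the exact integrals differ from the leading terms only by exponentially small factors in these two examples.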
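Proof of Proposition 3: Assertion (i) and the first half of (ii) are well-known results¹⁴. So we shall prove the last half of statement (ii). First note that the equality

$$\int_0^{2\pi} \phi(\theta)\, \mu^\varepsilon(d\theta) = 2 \int_{\xi_1^\varepsilon}^{\xi_2^\varepsilon} \phi(\theta)\, \mu^\varepsilon(d\theta), \qquad \xi_2^\varepsilon = \xi_1^\varepsilon + \pi,$$

is true, since $\phi$ and $\mu^\varepsilon$ are periodic with period $\pi$. We define a function $F^\varepsilon(\theta)$ by

$$F^\varepsilon(\theta) \equiv \int_{\xi_1^\varepsilon}^{\theta} \frac{2\, b^\varepsilon(x)}{\sigma(x)^2}\, dx$$

and apply Lemma 2 to the integral

$$J \equiv \int_{\xi_1^\varepsilon}^{\xi_2^\varepsilon} \phi(\theta)\, m_1^\varepsilon(\theta)\, d\theta = \int_{\xi_1^\varepsilon}^{\xi_2^\varepsilon} \psi^\varepsilon(\theta) \exp\Big\{ \frac{1}{\varepsilon^2} F^\varepsilon(\theta) \Big\}\, d\theta, \qquad \text{where} \quad \psi^\varepsilon(\theta) \equiv \frac{\phi(\theta)}{\varepsilon^2 \sigma(\theta)^2} \int_\theta^{\xi_2^\varepsilon} \exp\Big\{ -\frac{1}{\varepsilon^2} F^\varepsilon(x) \Big\}\, dx.$$

In our set-up, $\sigma(\theta) > 0$ for all $\theta$. Since (42) holds, $\max_{x \in [\xi_1^\varepsilon, \xi_2^\varepsilon)} F^\varepsilon(x) = F^\varepsilon(\eta_1^\varepsilon)$. Now we have

$$J \sim \psi^\varepsilon(\eta_1^\varepsilon)\; \varepsilon \sqrt{\frac{2\pi}{|F^{\varepsilon\prime\prime}(\eta_1^\varepsilon)|}}\; \exp\Big\{ \frac{1}{\varepsilon^2} F^\varepsilon(\eta_1^\varepsilon) \Big\} \quad \text{when } \varepsilon \to 0. \tag{45}$$

We again apply Lemma 2 to the function $\psi^\varepsilon(\eta_1^\varepsilon)$ in (45):

$$\psi^\varepsilon(\eta_1^\varepsilon) = \frac{\phi(\eta_1^\varepsilon)}{\varepsilon^2 \sigma(\eta_1^\varepsilon)^2} \int_{\eta_1^\varepsilon}^{\xi_2^\varepsilon} \exp\Big\{ -\frac{1}{\varepsilon^2} F^\varepsilon(x) \Big\}\, dx \sim \frac{\phi(\eta_1^\varepsilon)}{\varepsilon^2 \sigma(\eta_1^\varepsilon)^2}\; \varepsilon \sqrt{\frac{\pi}{|F^{\varepsilon\prime\prime}(\xi_2^\varepsilon)|}}\; \exp\Big\{ -\frac{1}{\varepsilon^2} F^\varepsilon(\xi_2^\varepsilon) \Big\} \quad \text{when } \varepsilon \to 0. \tag{46}$$

Combining (45) with (46), we derive that

$$J \sim \frac{\sqrt{2}\, \pi \exp\big\{ \big( F^\varepsilon(\eta_1^\varepsilon) - F^\varepsilon(\xi_2^\varepsilon) \big) / \varepsilon^2 \big\}}{\sqrt{|F^{\varepsilon\prime\prime}(\eta_1^\varepsilon)\, F^{\varepsilon\prime\prime}(\xi_2^\varepsilon)|}}\; \frac{\phi(\eta_1^\varepsilon)}{\sigma(\eta_1^\varepsilon)^2} \quad \text{when } \varepsilon \to 0.$$

Recall that

$$s^\varepsilon(\theta, \xi_2^\varepsilon) = \exp\big\{ \big( F^\varepsilon(\theta) - F^\varepsilon(\xi_2^\varepsilon) \big) / \varepsilon^2 \big\},$$

and repeat the analogous arguments as before. Then it follows that

$$\int_{\xi_1^\varepsilon}^{\xi_2^\varepsilon} \phi(\theta)\, m_2^\varepsilon(\theta)\, d\theta \sim \frac{\sqrt{2}\, \pi \exp\big\{ \big( F^\varepsilon(\eta_1^\varepsilon) - F^\varepsilon(\xi_2^\varepsilon) \big) / \varepsilon^2 \big\}}{\sqrt{|F^{\varepsilon\prime\prime}(\eta_1^\varepsilon)\, F^{\varepsilon\prime\prime}(\xi_1^\varepsilon)|}}\; \frac{\phi(\eta_1^\varepsilon)}{\sigma(\eta_1^\varepsilon)^2} \quad \text{when } \varepsilon \to 0.$$

Consequently we obtain that

$$\int_{\xi_1^\varepsilon}^{\xi_2^\varepsilon} \phi(\theta)\, \mu^\varepsilon(d\theta) \sim \frac{1}{2} \phi(\eta_1^\varepsilon) \sim \frac{1}{2} \phi\Big(\frac{\pi}{2}\Big) \quad \text{as } \varepsilon \to 0,$$

¹⁴ See Khas'minski (1967), Ch. 4, for instance.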
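and the proof is complete. □

Next we suppose that $\sigma(\theta)$ vanishes. Recalling (19), we see that the function $\sigma(\theta)$ is a trigonometric polynomial of degree 2 with period $\pi$. So there are at most four points

$$0 \le \gamma_1 \le \gamma_2 < \gamma_3\ (= \gamma_1 + \pi) \le \gamma_4\ (= \gamma_2 + \pi) < 2\pi$$

such that

$$\sigma(\theta) \begin{cases} = 0 & \text{if } \theta = \gamma_k \text{ for some } k, \\ \neq 0 & \text{if } \theta \neq \gamma_k \text{ for all } k. \end{cases} \tag{47}$$

We classify the positions of the $\gamma_k$ on the unit circle as follows:

$$\begin{array}{ll}
\text{Case 5.1:} & 0 < \gamma_1 < \gamma_2 < \pi/2 \\
\text{Case 5.2:} & 0 < \gamma_1 < \pi/2 < \gamma_2 < \pi \\
\text{Case 5.3:} & \pi/2 < \gamma_1 < \gamma_2 < \pi \\
\text{Case 5.4:} & 0 = \gamma_1,\ 0 < \gamma_2 < \pi/2 \\
\text{Case 5.5:} & 0 = \gamma_1,\ \pi/2 < \gamma_2 < \pi \\
\text{Case 5.6:} & 0 < \gamma_1 < \pi/2,\ \gamma_2 = \pi/2 \\
\text{Case 5.7:} & \pi/2 = \gamma_1,\ \pi/2 < \gamma_2 < \pi \\
\text{Case 5.8:} & 0 = \gamma_1,\ \pi/2 = \gamma_2 \\
\text{Case 5.9:} & 0 < \gamma_1 = \gamma_2 < \pi/2 \\
\text{Case 5.10:} & \pi/2 < \gamma_1 = \gamma_2 < \pi \\
\text{Case 5.11:} & 0 = \gamma_1 = \gamma_2 \\
\text{Case 5.12:} & \pi/2 = \gamma_1 = \gamma_2.
\end{array} \tag{48}$$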
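Case 5.1: In this case, all $\gamma_k$ are right (= anti-clockwise) shunts, which are discussed in Proposition 2 (ii) (see Figure 5). So we may repeat all the discussions in Step 2 of §4.3, and obtain the same assertion as in Lemma 1. In this case, the invariant probability measure is

$$\mu^\varepsilon(d\theta) \equiv \frac{1}{N^\varepsilon}\, m_k^\varepsilon(\theta)\, d\theta \quad \text{for } \gamma_k \le \theta < \gamma_{k+1}, \tag{49}$$

where $\gamma_5 \equiv \gamma_1 + 2\pi$ and we set

$$m_k^\varepsilon(\theta) \equiv \frac{1}{\varepsilon^2 \sigma(\theta)^2\, s^\varepsilon(\gamma_k^\dagger, \theta)} \int_\theta^{\gamma_{k+1}} s^\varepsilon(\gamma_k^\dagger, x)\, dx, \qquad \gamma_k \le \theta < \gamma_{k+1}, \quad k = 1, \dots, 4,$$

$$N^\varepsilon \equiv \sum_{k=1}^{4} \int_{\gamma_k}^{\gamma_{k+1}} m_k^\varepsilon(\theta)\, d\theta,$$

Figure 5: Case 5.1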
in which the function $s^\varepsilon$ is given by (44) and each $\gamma_k^\dagger$ is a fixed point chosen from the open interval $(\gamma_k, \gamma_{k+1})$.

Proposition 4. Suppose that Case 5.1 in (48) holds and let $\varepsilon$ be small. Then the statements in Proposition 3 (ii) are true, except that the invariant measure $\mu^\varepsilon$ is defined by (49).

Proof: We denote by $U(\gamma_k)$ a small open neighborhood of each $\gamma_k$. By a simple calculation, we see that

$$\frac{d m_k^\varepsilon}{d\theta}(\theta) = \frac{1}{2\, b(\theta) - \varepsilon^2 \sigma(\theta)\, \sigma'(\theta)}.$$

This is uniformly bounded in $\theta \in U(\gamma_k)$ with respect to small $\varepsilon$. So there are a constant $K$ and a positive number $\delta$ such that

$$\sup_{\varepsilon \le \delta}\ \sum_{k=1}^{4} \int_{U(\gamma_k)} m_k^\varepsilon(\theta)\, d\theta < K.$$

Since $\sigma(\theta) > 0$ if $\theta \in [0, 2\pi] - \cup_{k=1}^{4} U(\gamma_k)$, we may repeat an argument similar to that in the proof of Proposition 3 (ii), after replacing $[0, 2\pi]$ with $[0, 2\pi] - \cup_{k=1}^{4} U(\gamma_k)$ and the definition of $\mu^\varepsilon$ with (49). □

The situation of Case 5.1 is analogous to some other cases in which an invariant measure exists, and we have the following results.

Corollary 2. Let Case 5.2, 5.3, 5.4, 5.5, 5.9, 5.10, or 5.11 in (48) hold and let $\varepsilon$ be small. Then the statement in Proposition 3 (ii) is true, except that the invariant measure $\mu^\varepsilon$ is slightly different.
Proof: The proof of this corollary can be found in Nishioka (1976b), so we omit it. □

Case 5.6: By an argument similar to that in Case 5.4, we have the following inequalities:

$$\frac{-k_1}{\pi/2 - \theta} \le -\frac{2\, b^\varepsilon(\theta)}{\sigma(\theta)^2} \le \frac{-k_2}{\pi/2 - \theta} \quad \text{if } \theta \in \Big[\frac{\pi}{2} - \delta, \frac{\pi}{2}\Big),$$

$$\frac{k_3}{\pi/2 - \theta} \le -\frac{2\, b^\varepsilon(\theta)}{\sigma(\theta)^2} \le \frac{k_4}{\pi/2 - \theta} \quad \text{if } \theta \in \Big(\frac{\pi}{2}, \frac{\pi}{2} + \delta\Big],$$

where the $k_j$ are some positive constants and $\delta > 0$. By the arguments in section 3.4.1, these inequalities imply that $\gamma_2 = \pi/2$ is a right and left attracting trap. On the other hand, $\gamma_1$ is a right (= anti-clockwise) shunt, and Proposition 2 (v) is applicable to $\{\theta^\varepsilon(t), t \ge 0\}$ in the interval $(\gamma_1, \pi/2)$ and (vii) in the interval $(-\pi/2, \gamma_1)$ (see Figure 6).

Figure 6: Case 5.6
The next conclusion follows directly from Proposition 2:

Proposition 5. Suppose that Case 5.6 in (48) holds and $\varepsilon$ is small.
(i) If $\theta^* = \theta^\varepsilon(0, \omega) \in (-\pi/2, \gamma_1)$, then the statement of Proposition 2 (vii) holds with $\alpha = -\pi/2$ and $\beta = \gamma_1$.
(ii) If $\theta^* = \theta^\varepsilon(0, \omega) \in (\gamma_1, \pi/2)$, then $\theta^\varepsilon(t) \to \pi/2$ a.s. as $t \to \infty$.
(iii) If $\theta^* = \theta^\varepsilon(0, \omega)$ is neither 0 nor $\pi$, then

$$L^\varepsilon(x^*) = Q^\varepsilon\Big(\frac{\pi}{2}\Big) = a_2 + \frac{\varepsilon^2}{2}\, g_{22}^2 \quad \text{a.s.} \tag{50}$$

The situation of Case 5.6 is very similar to some other cases in which an attracting trap exists, and we have the following results.

Corollary 3. Let Case 5.7, 5.8, or 5.12 in (48) hold and let $\varepsilon$ be small.
(i) If $\theta^* = \theta^\varepsilon(0, \omega)$ is not one of the trap points $(0, \pi/2, \pi, 3\pi/2)$, then $\theta^\varepsilon(t, \omega)$ converges to the attracting traps almost surely.
(ii) If $\theta^* = \theta^\varepsilon(0, \omega)$ is one of the trap points, then $\theta^\varepsilon(t, \omega)$ stays there almost surely.

Proof: The proof of this corollary can be found in Nishioka (1976b), so we omit it. □

Generally speaking, it is not natural to expect that the diffusion process $\{\theta^\varepsilon(t), t \ge 0\}$ converges a.s. to a point as $t \to \infty$. In fact, such behavior is affected essentially by the properties of the random term $\varepsilon\, G \cdot x^\varepsilon(t, \omega)\, dW(t, \omega)$ in SDE (1), as shown in Lemma 1, Propositions 3 and 5, etc. Therefore we present a slightly wider characterization of a saddle point than the original (11a) – (11d). The new characterization requires that the following equalities¹⁵ hold instead of (11a) and (11c):

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta(t))\, dt = \phi\Big(\frac{\pi}{2}\Big) \quad \text{if } \theta(0) \neq 0 \text{ or } \pi, \qquad \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta(t))\, dt = \phi(0) \quad \text{if } \tilde\theta(0) \neq \pi/2 \text{ or } 3\pi/2 \tag{51}$$

for each periodic continuous function $\phi$ with period $\pi$. Note that (11a) and (11c) imply (51), but the converse is not necessarily true.

¹⁵ The limits in (51) are known as Toeplitz-type limits, which are introduced in order to extend the concept of a limit for a sequence and a series.
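We assert that the random dynamical system (1) satisfies the above requirement.

Theorem 3. Let the origin 0 be a saddle point of the non-random dynamical system (4), which is characterized by (11b), (11d), and (51). If $\varepsilon$ is small, then the origin is also a saddle point of the related random system. More precisely, for any $\delta > 0$ there exists a positive $\varepsilon^*$ such that the following inequalities hold with probability one if $\varepsilon < \varepsilon^*$:

$$|L^\varepsilon(x^*) - a_2| < \delta \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_2^* \neq 0, \tag{52}$$

$$|\tilde L^\varepsilon(x^*) + a_1| < \delta \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_1^* \neq 0, \tag{53}$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt - \phi\Big(\frac{\pi}{2}\Big) \Big| < \delta \quad \text{if } \theta^* = \theta^\varepsilon(0, \omega) \neq 0 \text{ or } \pi, \tag{54}$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta^\varepsilon(t, \omega))\, dt - \phi(0) \Big| < \delta \quad \text{if } \theta^* = \tilde\theta^\varepsilon(0, \omega) \neq \frac{\pi}{2} \text{ or } \frac{3\pi}{2}, \tag{55}$$

for each periodic continuous function $\phi$ with period $\pi$.

Proof: We have already proved the validity of (52) and (54) by the propositions in section 3.6.1 and 5.2. For the random system $\{\tilde x^\varepsilon(t, \omega), t \ge 0\}$ in Notation 1 (page 108), we consider the diffusion process $\{\tilde\theta^\varepsilon(t, \omega), t \ge 0\}$ given by SDE (18) (page 107), except that

$$b^\varepsilon(\theta) = \frac{a_1 - a_2}{2} \sin 2\theta + \varepsilon^2 b_G(\theta), \qquad a_1 < 0 < a_2.$$

Similarly, the diffusion process $\{\tilde\rho^\varepsilon(t), t \ge 0\}$ is given by SDE (20) (page 107), but with

$$Q^\varepsilon(\theta) = -a_1 + (a_1 - a_2) \sin^2\theta + \varepsilon^2 q_G(\theta).$$

So after a little modification, we may apply the propositions in section 3.6.1 and section 3.6.2 to $\{\tilde\theta^\varepsilon(t), t \ge 0\}$ and $\{\tilde\rho^\varepsilon(t), t \ge 0\}$, and then (53) and (55) follow. □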
3.7 Improper and proper node

First we suppose that the matrix $A$ is (IV) in (3), so that the origin 0 is an improper node of the non-random system (4) (see Figure 2). We also extend the characterization (12a) – (12d) of an improper node. We replace (12a) and (12c) by the requirement that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta(t))\, dt = \phi(0) \quad \text{if } \theta(0) \neq \pi/2 \text{ or } 3\pi/2, \qquad \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta(t))\, dt = \phi\Big(\frac{\pi}{2}\Big) \quad \text{if } \tilde\theta(0) \neq 0 \text{ or } \pi \tag{56}$$

for each periodic continuous function $\phi$ with period $\pi$. In this case, the random system $\{x^\varepsilon(t), t \ge 0\}$ satisfies the same SDE as the random system in the saddle point case, except for the signs of $a_1$ and $a_2$. So we need only a little modification to obtain the following result:

Theorem 4. Suppose that the matrix $A$ is (IV) in (3) and the origin 0 is an improper node of the non-random dynamical system (4), characterized by (12b), (12d), and (56). If $\varepsilon$ is small, then the origin is also an improper node of the related random system. In detail, for any $\delta > 0$ there exists an $\varepsilon^* > 0$ such that the following inequalities hold a.s. if $\varepsilon < \varepsilon^*$:

$$|L^\varepsilon(x^*) - a_1| \le \delta \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_1^* \neq 0,$$

$$|\tilde L^\varepsilon(x^*) + a_1| \le \delta \quad \text{if } x^* = (x_1^*, x_2^*) \text{ with } x_1^* \neq 0,$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt - \phi(0) \Big| \le \delta \quad \text{if } \theta^* = \theta^\varepsilon(0, \omega) \neq \frac{\pi}{2} \text{ or } \frac{3\pi}{2},$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta^\varepsilon(t, \omega))\, dt - \phi\Big(\frac{\pi}{2}\Big) \Big| \le \delta \quad \text{if } \theta^* = \tilde\theta^\varepsilon(0, \omega) \neq 0 \text{ or } \pi,$$

for each periodic continuous function $\phi$ with period $\pi$.
Let the matrix $A$ be (V) in (3) (see Figure 3). In this case, the origin 0 is an improper node of the non-random system, which is characterized by (13a) – (13d). We replace the characterization (13a) and (13c) by the condition

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta(t))\, dt = \phi\Big(\frac{\pi}{2}\Big) \quad \text{and} \quad \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta(t))\, dt = \phi\Big(\frac{\pi}{2}\Big) \tag{57}$$

for each periodic continuous function $\phi$ with period $\pi$. In the random system (1) of this set-up, we have

$$d\theta^\varepsilon(t, \omega) = b^\varepsilon(\theta^\varepsilon(t, \omega))\, dt + \varepsilon\, \sigma(\theta^\varepsilon(t, \omega))\, dW(t, \omega) \quad \text{with} \quad b^\varepsilon(\theta) = a_2 \cos^2\theta + \varepsilon^2 b_G(\theta), \qquad a_1 < 0 < a_2.$$

The main part of the function $b^\varepsilon(\theta)$ is the non-negative term

$$a_2 \cos^2\theta \tag{58}$$

that has zeros of degree 2 at the points $\theta = 0$ and $\theta = \pi$. Therefore we must treat this case more delicately than the previous saddle point case, but the argument is essentially similar.

Theorem 5. Suppose that the matrix $A$ is (V) in (3) and the origin is an improper node of the non-random dynamical system, which is characterized by (13b), (13d), and (57). If $\varepsilon$ is small, then the origin is also an improper node of the related random system. In detail, for any $\delta > 0$ there exists an $\varepsilon^* > 0$ such that the next inequalities hold a.s. if $\varepsilon < \varepsilon^*$:

$$|L^\varepsilon(x^*) - a_1| \le \delta, \qquad |\tilde L^\varepsilon(x^*) + a_1| \le \delta, \tag{59a}$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt - \phi\Big(\frac{\pi}{2}\Big) \Big| \le \delta, \tag{59b}$$

$$\Big| \lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta^\varepsilon(t, \omega))\, dt - \phi\Big(\frac{\pi}{2}\Big) \Big| \le \delta \tag{59c}$$

for each periodic continuous function $\phi$ with period $\pi$.

Proof: We prove the theorem in a way analogous to the arguments developed in section 3.6 to prove Theorem 3.
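But the proof needs a different and more complicated classification than (48), since the main term (58) differs from that in the saddle point case. So we omit the proof, whose essential part can be found in Nishioka (1976b), Proof of Theorem 3. □

Let the matrix $A$ be (VI) in (3). The origin 0 is a proper node of the non-random system (4) (see Figure 3) and it is characterized by (15), which should be extended into:

$$L(x^*) = a_1 < 0 \quad \text{and} \quad \tilde L(x^*) = -a_1 > 0, \tag{60a}$$

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta(t))\, dt = \phi(\theta(0)), \tag{60b}$$

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\tilde\theta(t))\, dt = \phi(\tilde\theta(0)) \tag{60c}$$

for each periodic continuous function $\phi$ with period $\pi$. In the proper node case, $\{\rho^\varepsilon(t, \omega), t \ge 0\}$ and $\{\tilde\rho^\varepsilon(t, \omega), t \ge 0\}$ satisfy the following SDEs:

$$d\rho^\varepsilon(t, \omega) = a_1\, dt + \varepsilon^2 q_G(\theta^\varepsilon(t, \omega))\, dt + \varepsilon\, R_G(\theta^\varepsilon(t, \omega))\, dW(t, \omega),$$

$$d\tilde\rho^\varepsilon(t, \omega) = -a_1\, dt + \varepsilon^2 q_G(\tilde\theta^\varepsilon(t, \omega))\, dt + \varepsilon\, R_G(\tilde\theta^\varepsilon(t, \omega))\, dW(t, \omega).$$

Therefore the law of the iterated logarithm implies that

$$L^\varepsilon(x^*) = a_1 + \varepsilon^2 \lim_{T \to \infty} \frac{1}{T} \int_0^T q_G(\theta^\varepsilon(t, \omega))\, dt \quad \text{a.s.,}$$

$$\tilde L^\varepsilon(x^*) = -a_1 + \varepsilon^2 \lim_{T \to \infty} \frac{1}{T} \int_0^T q_G(\tilde\theta^\varepsilon(t, \omega))\, dt \quad \text{a.s.,}$$

once we have proved that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T q_G(\theta^\varepsilon(t, \omega))\, dt \quad \text{and} \quad \lim_{T \to \infty} \frac{1}{T} \int_0^T q_G(\tilde\theta^\varepsilon(t, \omega))\, dt$$

each converge a.s. to a constant. Then it follows that

$$L^\varepsilon(x^*) \to a_1 \ \text{a.s.} \quad \text{and} \quad \tilde L^\varepsilon(x^*) \to -a_1 \ \text{a.s.} \quad \text{as } \varepsilon \to 0.$$

However, for the diffusion process $\{\theta^\varepsilon(t, \omega), t \ge 0\}$, we cannot assert any general result, as shown in the following example.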
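Example 2. Let the matrix $A$ be (VI) in (3).

(i) We set

$$G = \begin{pmatrix} 0 & -c \\ c & 0 \end{pmatrix}, \qquad c \neq 0,$$

in SDE (1), and this setting yields

$$d\theta^\varepsilon(t, \omega) = \varepsilon\, c\, dW(t, \omega).$$

So the invariant measure of this $\{\theta^\varepsilon(t), t \ge 0\}$ is Lebesgue measure on the unit circle, and we obtain that

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \phi(\theta^\varepsilon(t, \omega))\, dt = \int_0^{2\pi} \phi(\theta)\, d\theta \quad \text{a.s. for all } \varepsilon > 0.$$

This is far from (60b) for the non-random system with a proper node.

(ii) When we put

$$G = \begin{pmatrix} 0 & c \\ c & 0 \end{pmatrix}, \qquad c \neq 0,$$

in SDE (1), we have

$$d\theta^\varepsilon(t, \omega) = \varepsilon^2 c^2 \cos 2\theta^\varepsilon(t, \omega)\, \sin 2\theta^\varepsilon(t, \omega)\, dt + \varepsilon\, c \cos 2\theta^\varepsilon(t, \omega)\, dW(t, \omega).$$

The random term of the above SDE vanishes at the points

$$\gamma_1 \equiv \pi/4, \qquad \gamma_2 \equiv 3\pi/4, \qquad \gamma_3 \equiv 5\pi/4, \qquad \gamma_4 \equiv 7\pi/4,$$

and all of them are right and left attracting traps. So Proposition 2 (x) yields the next fact: Set $\gamma_5 \equiv \gamma_1 + 2\pi = 9\pi/4$. Then for each $\varepsilon > 0$,

$$\lim_{t \to \infty} \theta^\varepsilon(t, \omega) = \begin{cases} \gamma_k \text{ or } \gamma_{k+1} & \text{if } \theta^* = \theta^\varepsilon(0, \omega) \in (\gamma_k, \gamma_{k+1}), \\ \gamma_k & \text{if } \theta^* = \theta^\varepsilon(0, \omega) = \gamma_k \quad (k = 1, \dots, 4) \end{cases}$$

with probability one. This is also different from (60b) or the original (15) of the corresponding non-random system.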
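The behavior described in Example 2 (ii) can be explored with a simple Euler–Maruyama simulation of the angular SDE exactly as it is displayed there. The Python sketch below is illustrative only: the discretization scheme, the parameter values (including $\varepsilon = c = 1$), the seed, and the function names are all assumptions made here, and a finite-time simulation can only suggest, not prove, the almost sure limit.

```python
import math
import random

def simulate_theta(theta0, eps, c, T, dt, seed=0):
    """Euler-Maruyama path of
    d(theta) = eps^2 c^2 cos(2 theta) sin(2 theta) dt + eps c cos(2 theta) dW,
    returning the value at time T."""
    rng = random.Random(seed)
    theta = theta0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(T / dt))):
        c2, s2 = math.cos(2.0 * theta), math.sin(2.0 * theta)
        drift = eps ** 2 * c ** 2 * c2 * s2
        theta += drift * dt + eps * c * c2 * sqrt_dt * rng.gauss(0.0, 1.0)
    return theta

def distance_to_nearest_trap(theta):
    """Distance from theta to the nearest point of the form pi/4 + k*pi/2."""
    d = (theta - math.pi / 4.0) % (math.pi / 2.0)
    return min(d, math.pi / 2.0 - d)

# Start strictly between gamma_1 = pi/4 and gamma_2 = 3*pi/4.
theta_T = simulate_theta(theta0=1.0, eps=1.0, c=1.0, T=20.0, dt=1e-3, seed=0)
gap = distance_to_nearest_trap(theta_T)
```

In this sketch `gap` measures how close the simulated angle ends up to one of the points $\gamma_k$; varying `seed` changes which of the two neighboring traps is approached.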
Remark 3. This example reflects the fact that there is no angular motion in the non-random system (4) if the origin is a proper node. Therefore, in this case, the angular part $\{\theta^\varepsilon(t, \omega), t \ge 0\}$ of the random system (1) is determined only by the added random term $\varepsilon\, G \cdot x^\varepsilon(t, \omega)\, dW(t, \omega)$, and a certain $G$ may break the characterization (60b) and (15) of the non-random system with a proper node. Similarly, there is no radial motion in the non-random system when the origin is a center, and the resulting situation is much the same as what we showed in Example 1. ∇

Now the following assertion is evident.

Theorem 6. Let the origin be a proper node of the non-random system (4), i.e. the matrix $A$ is (VI) in (3) and (60a) – (60c) hold. Then the origin is not necessarily a proper node of the related random system, even though $\varepsilon > 0$ is sufficiently small.
References:

Coddington, E. A., and Levinson, N. (1955) Theory of Ordinary Differential Equations. McGraw-Hill.

Friedman, A. (1976) Stochastic Differential Equations and Applications, Vol. 2. Academic Press.

Itô, K., and McKean, Jr., H. P. (1968) Diffusion Processes and Their Sample Paths. Springer Verlag.

Khas'minski, R. Z. (1967) "Necessary and Sufficient Conditions for the Asymptotic Stability of Linear Stochastic Systems." Theory of Probability and its Applications, SIAM 12: 167–172.

Khas'minski, R. Z. (1980) Stochastic Stability of Differential Equations. English Ed., Sijthoff & Noordhoff.

Maruyama, G., and Tanaka, H. (1957) "Some Properties of One Dimensional Diffusion Processes." Memoirs of the Faculty of Science, Kyushu University. Series A. Mathematics 13: 117–141.

Nevelson, M. B. (1964) "On the Behavior of the Invariant Measure of a Diffusion Processes with Small Diffusion on a Circle." Theory of Probability and its Applications, SIAM 9: 125–131.

Nishioka, K. (1975) "Approximation Theorem on Stochastic Stability." Proceedings of the Japan Academy 51, Suppl.: 795–797.

Nishioka, K. (1976a) "On the Stability of Two-Dimensional Linear Stochastic Systems." Kōdai Mathematical Seminar Reports 27: 211–230.

Nishioka, K. (1976b) "Asymptotic Behaviors of Two-Dimensional Autonomous Systems with Small Random Perturbations." Journal of Mathematics of Kyoto University 16: 56–69.

Revuz, D., and Yor, M. (1999) Continuous Martingales and Brownian Motion. Third ed. Springer Verlag.
Chapter 4 Dynamic Theory of Stochastic Movement of Systems

Masao Nagasawa
Institute of Mathematics, University of Zürich

4.1 Dynamic theory of stochastic processes

The dynamic theory of stochastic processes consists of two parts, kinematics and mechanics.¹ The dynamic theory concerns an evolution equation

$$\frac{\partial u}{\partial t} + \frac{1}{2} \sum_{i,j=1}^{d} (\sigma^2(t, x))^{ij} \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum_{i=1}^{d} b^i(t, x) \frac{\partial u}{\partial x^i} + c(t, x)\, u = 0, \tag{4.1.1}$$

which contains a potential function $c(t, x)$, and the diffusion matrix $\sigma(t, x): [a, b] \times \mathbf{R}^d \to \mathbf{R}^d \times \mathbf{R}^d$ and drift vector $b(t, x): [a, b] \times \mathbf{R}^d \to \mathbf{R}^d$ must be prescribed. The case with no potential term can be treated in the framework of the conventional theory of Markov processes of Kolmogorov and Itô, which is a kinematic theory, as will be explained. In kinematics we have the kinematic equation

$$\frac{\partial u}{\partial t} + \frac{1}{2} \sum_{i,j=1}^{d} (\sigma^2(t, x))^{ij} \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum_{i=1}^{d} b^i(t, x) \frac{\partial u}{\partial x^i} = 0, \tag{4.1.2}$$

which contains the drift terms $b^i(t, x)\, \partial u / \partial x^i$ but no potential term. The kinematic equation determines Markov (diffusion) processes, i.e., the movement of systems. By contrast, we have the equation of motion in the mechanics part of the dynamic theory. The equation of motion contains the potential function $c(t, x)$ of external forces as in (1). External forces influence the movement of systems, but not in a direct way. As will be explained, the potential function determines a drift vector through the equation of motion. The induced drift vector then defines the kinematic equation. The kinematic equation finally describes sample paths of the movement of the observed systems. We must therefore clarify the mathematical structures which connect three notions: external force, induced drift vector, and sample paths of the movement.

¹ In Nagasawa (1993) the two parts of the dynamic theory, kinematics and mechanics, are called q-representation and p-representation.

4.2 Kinematic theory
There is an important class of theories that we call kinematic theories. In kinematic theories one handles the kinematic equation given in (2). According to the analytic method of Kolmogorov (1931), equation (2) characterizes a Markov (diffusion) process uniquely, if an initial distribution is prescribed. To discuss the kinematic equation in (2) in applications, one must take a crucial step, namely an appropriate choice of the diffusion and drift coefficients $\sigma(t, x)$ and $b(t, x)$, which are decided through careful analysis of the systems under consideration, depending on the chosen models.² By using the fundamental solution $q(s, x; t, y)$ of equation (2), we set

$$Q(s, x; t, B) = \int_B q(s, x; t, y)\, dy, \qquad t - s > 0.$$

Then it is a transition probability, that is, $Q(s, x; t, B)$ satisfies the normality condition

$$Q(s, x; t, \mathbf{R}^d) = 1, \tag{3}$$

and obeys the Chapman–Kolmogorov equation

$$Q(s, x; t, B) = \int_{\mathbf{R}^d} Q(s, x; r, dy)\, Q(r, y; t, B), \qquad s \le r \le t. \tag{4}$$

² Cf. e.g. Jensen and Richter (2007), Jensen, Wang and Johnsen (2007), this volume.
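Conditions (3) and (4) can be checked numerically in the simplest special case. The Python sketch below takes $d = 1$, $\sigma \equiv 1$ and $b \equiv 0$, for which the fundamental solution is the Gaussian (Brownian) transition density; this choice, the integration range, and the helper names are assumptions made here purely for illustration.

```python
import math

def q(s, x, t, y):
    """Brownian transition density: the case d = 1, sigma = 1, b = 0."""
    var = t - s
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(func, lo, hi, n=4000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

s, r, t = 0.0, 0.4, 1.0
x, z = 0.3, -0.5

# Normality condition (3): the density integrates to one.
total_mass = integrate(lambda y: q(s, x, t, y), -12.0, 12.0)

# Chapman-Kolmogorov equation (4), written for densities at the point z.
lhs = q(s, x, t, z)
rhs = integrate(lambda y: q(s, x, r, y) * q(r, y, t, z), -12.0, 12.0)
```

Here `total_mass` should be numerically 1 and `lhs` should agree with `rhs`, which is the density form of (4) obtained by taking $B$ to be a small neighborhood of `z`.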
Moreover, with a transition probability $Q(s, x; t, B)$ and an initial distribution density $\mu_a(x_0)$ at the switch-on time $t = a$, we can construct a Markov process $\{X_t, t \in [a, b], Q\}$ through the finite dimensional distributions

$$Q[f(X_a, X_{t_1}, \dots, X_{t_{n-1}}, X_b)] = \int \mu_a(x_0)\, dx_0\, q(a, x_0; t_1, x_1)\, dx_1\, q(t_1, x_1; t_2, x_2)\, dx_2\, q(t_2, x_2; t_3, x_3)\, dx_3 \cdots q(t_{n-1}, x_{n-1}; b, x_n)\, dx_n\, f(x_0, x_1, \dots, x_n), \tag{5}$$

where $a < t_1 < \cdots < t_{n-1} < b$, and $f(x_0, x_1, \dots, x_n)$ is any bounded measurable function on the space $(\mathbf{R}^d)^{n+1}$, cf. Kolmogorov (1933). Equation (5) determines a Markov process uniquely, and is a fundamental equation in the conventional theory of Markov processes of Kolmogorov. For equation (5) the normality condition in (3) is indispensable. The (marginal) distribution $Q[X_t \in dx]$ of a Markov process is a special case of equation (5). We write the distribution as

$$Q[X_t \in dx] = e^{2R(t, x)}\, dx, \tag{6}$$

with a density $\mu(t, x) = e^{2R(t, x)}$, and call $R(t, x)$ the 'exponent of distribution'. A Markov process is therefore determined by an exponent of distribution $R(a, x)$ at the switch-on time $t = a$ and a transition probability $Q(s, x; t, B)$ through equation (5). We carefully note that the finite dimensional distribution, i.e., equation (5), belongs to neither classical analysis nor functional analysis, and is unfamiliar to people who are accustomed to working with classical mathematics. In other words, with equation (5), we enter a new mathematical field called sample path analysis, leaving classical and functional analysis. This is a key to discussing and understanding stochastic processes.

4.3 Sample path equation in kinematic theory
The kinematic equation in (2), together with (5), is equivalent to Itô's stochastic differential equation

$$X_t = X_a + \int_a^t \sigma(s,X_s)\,dB_s + \int_a^t b(s,X_s)\,ds, \tag{7}$$

which is the equation of sample paths $X_t(\omega)$ (abbreviated as $X_t$), where $B_t$ $(=B_t(\omega))$ is a $d$-dimensional Brownian motion (Wiener process) defined on a probability space $\{\Omega,\mathcal{F},P\}$, and $X_a$ denotes the initial value (position) at the switch-on time $t=a$.
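Equation (7) also suggests how sample paths can be approximated on a computer. The following is a minimal sketch of the standard Euler–Maruyama discretization, assuming a one-dimensional process and user-supplied coefficient functions; the function name `euler_maruyama`, its parameters, and the example coefficients are hypothetical and serve only as an illustration, not as the method of the text.

```python
import math
import random

def euler_maruyama(sigma, b, x_a, a, t_end, n_steps, rng):
    """Approximate one sample path of dX = sigma(t, X) dB + b(t, X) dt on [a, t_end]."""
    dt = (t_end - a) / n_steps
    t, x = a, x_a
    path = [x]
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, distributed N(0, dt)
        x = x + sigma(t, x) * dB + b(t, x) * dt
        t += dt
        path.append(x)
    return path

# Illustrative coefficients (assumptions): constant diffusion 0.5, linear drift -x.
rng = random.Random(0)
path = euler_maruyama(lambda t, x: 0.5, lambda t, x: -x,
                      x_a=1.0, a=0.0, t_end=1.0, n_steps=1000, rng=rng)
print(len(path), path[-1])
```

With the diffusion coefficient set to zero the scheme reduces to the explicit Euler method for the ordinary differential equation $dX_t = b(t,X_t)\,dt$, which gives a simple sanity check.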
We can decompose the movement $X_t$ into two parts

$$X_t = X_r + \int_r^t \sigma(s,X_s)\,dB_s + \int_r^t b(s,X_s)\,ds, \tag{8}$$

where $a\le r\le t$, and

$$X_r = X_a + \int_a^r \sigma(s,X_s)\,dB_s + \int_a^r b(s,X_s)\,ds, \tag{9}$$

i.e., from the switch-on time $a$ to the present time $r$, and from the present time $r$ to the future time $t$. Then equation (8) shows that the process $X_t$ depends on the information of the past only through the position $X_r$ at the present time $r$, which is given by equation (9). In other words, the process $X_t$ from the present time $r$ to the future time $t$ does not depend on the detail of the past history. This property is called the Markov property of the process $X_t$.

The first integral on the right-hand side of (7) is Itô's stochastic integral. We often write equation (7) in a differential form

$$dX_t = \sigma(t,X_t)\,dB_t + b(t,X_t)\,dt, \tag{10}$$

or component-wise

$$dX_t^i = (\sigma(t,X_t)\,dB_t)^i + b^i(t,X_t)\,dt, \qquad i=1,\dots,d. \tag{11}$$
We can compute the differential of $f(t,X_t)$, for $f\in C^2([a,b]\times\mathbf{R}^d)$, with the Itô formula. If we take up to the second order differentials of $X_t$, i.e., up to $dX_t^i\,dX_t^j$, then we get

$$d f(t,X_t) = \frac{\partial f}{\partial t}\,dt + \sum_{i=1}^d \frac{\partial f}{\partial x^i}\,dX_t^i + \frac12\sum_{i,j=1}^d \frac{\partial^2 f}{\partial x^i\partial x^j}\,dX_t^i\,dX_t^j. \tag{12}$$

In classical analysis we take only the first order differentials, and consider

$$d f(t,X_t) = \frac{\partial f}{\partial t}\,dt + \sum_{i=1}^d \frac{\partial f}{\partial x^i}\,dX_t^i, \tag{13}$$

where we assume that the second order differentials $dX_t^i\,dX_t^j$ are of small order compared to the first order differentials. But in sample path analysis this is not the case, since P. Lévy's symbolic formulas hold:

$$dB_t^i\,dB_t^j = \delta^{ij}\,dt, \qquad dB_t^i\,dt = 0, \qquad\text{and}\qquad (dt)^2 = 0. \tag{14}$$
Hence the second order differentials $dX_t^i\,dX_t^j$ are not of small order compared to the first order differentials, since we have $dB_t^i\,dB_t^i = dt$, $i=1,\dots,d$. Combining equations (11), (12) and (14), we get the Itô formula

$$d f(t,X_t) = \Big(\frac{\partial}{\partial t} + \frac12\sum_{i,j=1}^d (\sigma^2)^{ij}\frac{\partial^2}{\partial x^i\partial x^j} + \sum_{i=1}^d b^i\frac{\partial}{\partial x^i}\Big) f(t,X_t)\,dt + \sum_{i=1}^d \frac{\partial f(t,X_t)}{\partial x^i}\,(\sigma\,dB_t)^i. \tag{15}$$
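P. Lévy's symbolic rules in (14) can be made plausible by a small numerical experiment. The sketch below, which assumes a one-dimensional Brownian motion on $[0,1]$ sampled on a uniform grid, sums the three kinds of products over all increments: the sum of $(\Delta B)^2$ should be close to the elapsed time, while the sums of $\Delta B\,\Delta t$ and $(\Delta t)^2$ should be negligible. The grid size and seed are arbitrary illustrative choices.

```python
import math
import random

def brownian_increments(t_end, n, rng):
    """Return n independent increments of a Brownian motion on [0, t_end] and the step dt."""
    dt = t_end / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)], dt

rng = random.Random(0)
increments, dt = brownian_increments(1.0, 200_000, rng)

sum_dB_dB = sum(dB * dB for dB in increments)  # behaves like dt summed, i.e. close to t = 1
sum_dB_dt = sum(dB * dt for dB in increments)  # of order dt, negligible
sum_dt_dt = len(increments) * dt * dt          # equals t * dt, negligible

print(sum_dB_dB, sum_dB_dt, sum_dt_dt)
```

This is only a heuristic check of the orders of magnitude behind (14), not a proof.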
Itô's stochastic differential equation (7) and the Itô formula above are not only of theoretical importance but also powerful mathematical devices in the theory of Markov processes. The Itô formula (15) proves that the transition density of $X_t$ in (7) satisfies (2). The kinematic equation in (2), which is often called Kolmogorov's equation, is not easy to solve except in some simple cases. Therefore, to analyze the Markov (diffusion) process $X_t$ it is often better to handle Itô's stochastic differential equation in (7). In solving it, we have extremely powerful tools, the so-called sample path analysis, in particular P. Lévy's formulae in (14) and Itô's formula in (15).

4.4 Mechanics and the equation of motion
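The single equation in (1) does not help us. The equation of motion in the dynamic theory of random motion is given by a pair of twin evolution equations with a scalar potential $c(t,x)$

$$\begin{aligned}
\frac{\partial\phi}{\partial t} + \frac12\Delta\phi + c(t,x)\phi &= 0,\\
-\frac{\partial\hat\phi}{\partial t} + \frac12\Delta\hat\phi + c(t,x)\hat\phi &= 0,
\end{aligned} \tag{16}$$

where $\Delta$ denotes the Laplace–Beltrami operator

$$\Delta = \nabla\cdot\nabla = \frac{1}{\sqrt{\det|\sigma^2(x)|}}\,\frac{\partial}{\partial x^i}\Big(\sqrt{\det|\sigma^2(x)|}\,(\sigma^2(x))^{ij}\,\frac{\partial}{\partial x^j}\Big),$$

which is necessary for discussing duality. The equation of motion in the general case with a vector potential $b(t,x)$ is a pair of twin evolution equations

$$\begin{aligned}
\frac{\partial\phi}{\partial t} + \frac12(\nabla + b(t,x))^2\phi + c(t,x)\phi &= 0,\\
-\frac{\partial\hat\phi}{\partial t} + \frac12(\nabla - b(t,x))^2\hat\phi + c(t,x)\hat\phi &= 0,
\end{aligned} \tag{17}$$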
which are in formal duality with respect to $d\tilde x\,dt$, where $d\tilde x = \sqrt{\det|\sigma^2(x)|}\,dx$, since we have

$$\int g\,(\nabla + b(t,x))^2 f\,d\tilde x = \int f\,(\nabla - b(t,x))^2 g\,d\tilde x,$$

for any smooth $f$ and $g$ vanishing at infinity. This is the duality relation between $(\nabla + b(t,x))^2$ and $(\nabla - b(t,x))^2$ with respect to the measure $d\tilde x = \sqrt{\det|\sigma^2(x)|}\,dx$.

When $b$ and $c$ are independent of time, we often consider stationary solutions. In this case, substituting $\phi(t,x) = e^{\lambda t}\varphi(x)$ into the first evolution equation in (17), we get

$$\lambda\varphi + \frac12(\nabla + b(x))^2\varphi + c(x)\varphi = 0,$$

which is an eigenvalue problem, and plays a crucial role in quantum physics.

We carefully note that the equation of motion in (17) (or (16)) does not belong to the conventional theory of Markov processes of Kolmogorov and Itô. In fact, let $p(s,x;t,y)$ be the fundamental solution of the twin equations of motion in (17). Then it does not satisfy the normality condition in (3). Instead, we have

$$\int p(s,x;t,y)\,dy \neq 1,$$

because of the potential terms in equation (17). Hence $p(s,x;t,y)\,dy$ is not a transition probability, and equation (5) of Kolmogorov is not applicable. This means that we cannot apply the conventional theory of Markov processes to equations with potential terms. We need a new method, which will be explained in the following, for constructing stochastic processes.
Let $p(s,x;t,y)$ be the fundamental solution of the twin equations of motion in (17) (or (16)), and $\{\hat\phi_a(x),\phi_b(y)\}$ be a pair of functions which are normalized as

$$\int \hat\phi_a(x)\,dx\,p(a,x;b,y)\,\phi_b(y)\,dy = 1. \tag{18}$$

We then get solutions $\phi(t,x)$ and $\hat\phi(t,x)$ of the twin equations of motion in (17) by

$$\begin{aligned}
\phi(t,x) &= \int p(t,x;b,y)\,\phi_b(y)\,dy,\\
\hat\phi(t,x) &= \int \hat\phi_a(z)\,dz\,p(a,z;t,x).
\end{aligned} \tag{19}$$

We will call $\phi(t,x)$ the 'evolution function' and $\hat\phi(t,x)$ the 'time-reversed (or backward) evolution function'. The condition in (18) implies the normality condition in the dynamic theory

$$\int \hat\phi(t,x)\,\phi(t,x)\,dx = 1.$$

Making use of the triplet $\{p(s,x;t,y),\hat\phi_a(x),\phi_b(y)\}$, we can construct a stochastic process $\{X_t,\ t\in[a,b],\ Q\}$ through the finite dimensional distributions

$$Q[f(X_a,X_{t_1},\dots,X_{t_{n-1}},X_b)] = \int dx_0\,\hat\phi_a(x_0)\,p(a,x_0;t_1,x_1)\,dx_1\,p(t_1,x_1;t_2,x_2)\,dx_2\cdots p(t_{n-1},x_{n-1};b,x_n)\,\phi_b(x_n)\,dx_n\,f(x_0,x_1,\dots,x_n). \tag{20}$$

This is a new method for constructing stochastic processes in the dynamic theory. As a special case of (20), the distribution of the stochastic process $\{X_t,\ t\in[a,b],\ Q\}$ is given by

$$Q[X_t\in dx] = \hat\phi(t,x)\,\phi(t,x)\,dx.$$

We carefully compare equation (20) with Kolmogorov's equation in (5) which defines a Markov process. Equation (5) has only an initial function $\mu_a(x)$, and is defined by the fundamental solution $q(s,x;t,y)$ of the kinematic equation. By contrast, equation (20) has an initial function $\hat\phi_a(x)$ and in addition a terminal function $\phi_b(y)$, and is defined by the fundamental solution $p(s,x;t,y)$ of the twin equations of motion. This is a decisive point that makes the dynamic theory of random motion completely different from the conventional theory of Markov processes of Kolmogorov and Itô.
4.5 Evolution function and kinematic equation
Since the process $\{X_t,\ t\in[a,b],\ Q\}$ constructed by equation (20) depends on the initial and terminal functions $\{\hat\phi_a(x),\phi_b(y)\}$, it is not a Markov process with $p(s,x;t,y)$, which is the fundamental solution of the twin equations of motion in (17). We can nevertheless find a basic relation between the process $\{X_t,\ t\in[a,b],\ Q\}$ and a Markov process. We otherwise cannot discuss the equation of sample paths in (7). This will be explained in what follows.

Let $\phi(t,x)$ and $\hat\phi(t,x)$ be the evolution function and time-reversed (or backward) evolution function given in (19). Then we have

Theorem 1. The twin evolution functions $\phi(t,x)$ and $\hat\phi(t,x)$ induce the forward drift vector $a(t,x)$ and backward drift vector $\hat a(t,x)$ by

$$a(t,x) = \sigma^2\,\frac{\nabla\phi(t,x)}{\phi(t,x)}, \qquad \hat a(t,x) = \sigma^2\,\frac{\nabla\hat\phi(t,x)}{\hat\phi(t,x)}. \tag{21}$$
We introduce a new transition density $q(s,x;t,y)$.

Definition 1. Let $p(s,x;t,y)$ be the fundamental solution of the twin equations of motion in (17). By using an evolution function $\phi(t,x)$, we define a new transition density by

$$q(s,x;t,y) = \frac{1}{\phi(s,x)}\,p(s,x;t,y)\,\phi(t,y). \tag{22}$$
Theorem 2. (i) The function $q(s,x;t,y)$ defined by (22) is the fundamental solution of diffusion equations in formal duality

$$\begin{aligned}
\frac{\partial u}{\partial t} + \frac12\Delta u + (b(t,x) + a(t,x))\cdot\nabla u &= 0,\\
-\frac{\partial\mu}{\partial t} + \frac12\Delta\mu - \operatorname{div}\big((b(t,x) + a(t,x))\mu\big) &= 0,
\end{aligned} \tag{23}$$

which contain the drift vector $a(t,x)$ induced by an evolution function $\phi(t,x)$. The function $q(s,x;t,y)$ obeys the Chapman–Kolmogorov equation in (4), and satisfies the normality condition

$$\int q(s,x;t,y)\,dy = 1, \qquad s<t. \tag{24}$$

(ii) The process $\{X_t,\ t\in[a,b],\ Q\}$ constructed through equation (20) is a Markov process which has the transition probability $q(s,x;t,y)\,dy$, and its distribution is given by

$$Q[X_t\in dx] = \hat\phi(t,x)\,\phi(t,x)\,dx,$$

with a density $\mu(t,x) = \hat\phi(t,x)\phi(t,x)$, which satisfies the second equation in (23).

The first equation in (23) is often called the Kolmogorov–Smoluchowski equation, which describes the transition of motion (a stochastic process). The second equation in (23) is called the Fokker–Planck equation, which is exclusively for distribution densities. The two equations must be strictly distinguished from each other to avoid confusion and misunderstandings. I carefully note that even though one knows distribution densities, one cannot see the motion itself. This fact is nowadays well known in the theory of stochastic processes, but was not known in the 1920s (and is not well understood even nowadays), and people computed only distribution densities by solving the Fokker–Planck equation. Therefore, discussing the motion itself was pure guesswork at the time. This fact is extremely important when we look at the history of physics. By contrast, the twin equations of motion in (16) or (17) describe the transition (or evolution) in normal time and in reversed time, respectively. Therefore, they play exactly the same role, although time runs in opposite directions in the two equations.

For proving Theorem 2, we rewrite equation (20) into the form of equation (5) by manipulating equation (20). We first insert $\phi(a,x_0)\phi^{-1}(a,x_0)$ just after $\hat\phi_a(x_0)$. We then replace $\phi^{-1}(a,x_0)\,p(a,x_0;t_1,x_1)\,\phi(t_1,x_1)$ by $q(a,x_0;t_1,x_1)$, by using the formula in (22). Repeating this procedure, we finally reach $q(t_{n-1},x_{n-1};b,x_n)$ at the tail of the equation. Thus we get equation (5) with the initial distribution density $\mu_a(x_0) = \hat\phi_a(x_0)\phi(a,x_0)$ and the transition function $q(s,x;t,y)$. This proves that the process constructed by (20) is a Markov process with the transition probability $q(s,x;t,y)\,dy$. For details we refer to Nagasawa (1993, 2000).
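The construction in (18) to (22) and the statement of Theorem 2 can be illustrated by a finite-state, discrete-time analogue. This analogue is an assumption made purely for illustration: the continuous kernel $p(s,x;t,y)$ is replaced by a hypothetical one-step nonnegative matrix `P` whose rows do not sum to one (mimicking the effect of a potential), integrals become sums, and all numerical values are arbitrary.

```python
def matvec(P, v):
    """Matrix times column vector: (P v)[x] = sum_y P[x][y] v[y]."""
    return [sum(P[x][y] * v[y] for y in range(len(v))) for x in range(len(P))]

def vecmat(v, P):
    """Row vector times matrix: (v P)[y] = sum_x v[x] P[x][y]."""
    return [sum(v[x] * P[x][y] for x in range(len(v))) for y in range(len(P[0]))]

# Hypothetical one-step kernel: nonnegative, but NOT a transition probability.
P = [[0.5, 0.2, 0.0],
     [0.3, 0.6, 0.3],
     [0.0, 0.1, 0.9]]
n_steps = 4
phi_hat_a = [1.0, 2.0, 1.0]  # initial function (arbitrary positive values)
phi_b = [1.0, 0.5, 2.0]      # terminal function (arbitrary positive values)

# Analogue of the first line of (19): propagate the terminal function backwards.
phi = [None] * (n_steps + 1)
phi[n_steps] = phi_b[:]
for t in range(n_steps - 1, -1, -1):
    phi[t] = matvec(P, phi[t + 1])

# Analogue of (18): rescale the initial function so that the pair is normalized.
z = sum(u * v for u, v in zip(phi_hat_a, phi[0]))
phi_hat = [[u / z for u in phi_hat_a]]

# Analogue of the second line of (19): propagate the initial function forwards.
for t in range(n_steps):
    phi_hat.append(vecmat(phi_hat[t], P))

def q(t):
    """Analogue of (22): transition matrix from time t to time t + 1."""
    m = len(P)
    return [[P[x][y] * phi[t + 1][y] / phi[t][x] for y in range(m)] for x in range(m)]

# Analogue of the distribution density mu = phi_hat * phi at each time.
mu = [[u * v for u, v in zip(phi_hat[t], phi[t])] for t in range(n_steps + 1)]

print([sum(row) for row in q(0)], [sum(m_t) for m_t in mu])
```

In this toy setting each row of `q(t)` sums to one even though the rows of `P` do not, and `mu[t + 1]` is obtained from `mu[t]` by applying `q(t)`, which mirrors the normality condition (24) and part (ii) of Theorem 2.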
Thus an evolution function $\phi(t,x)$ given by (19) determines a drift vector $a(t,x)$ by (21), and the induced drift vector $a(t,x)$ then determines, together with the prescribed drift vector $b(t,x)$, the kinematic equation in (23), hence we finally get sample paths of a Markov process $X_t$, which has the drift vector $b(t,x) + a(t,x)$. Let us write this as a diagram:

Potential $c(t,x)$ ⟹ Induced drift $a(t,x)$ ⟹ Sample paths $X_t$.
Remark 1. By using a time-reversed (or backward) evolution function $\hat\phi(t,x)$ in (19), we define also a time-reversed (or backward) transition density

$$\hat q(s,x;t,y) = \hat\phi(s,x)\,p(s,x;t,y)\,\frac{1}{\hat\phi(t,y)},$$

which satisfies the time-reversed normality condition

$$\int dx\,\hat q(s,x;t,y) = 1, \qquad s<t,$$

and the function $\hat q(s,x;t,y)$ is the fundamental solution of the time-reversed kinematic equation

$$\begin{aligned}
-\frac{\partial\hat u}{\partial t} + \frac12\Delta\hat u + (-b(t,x) + \hat a(t,x))\cdot\nabla\hat u &= 0,\\
\frac{\partial\mu}{\partial t} + \frac12\Delta\mu - \operatorname{div}\big((-b(t,x) + \hat a(t,x))\mu\big) &= 0.
\end{aligned}$$

When we discuss the time-reversed description, we read equation (20) from right to left with a clock running backwards. ∇

4.6 Exponent of motion and initial condition
We introduce a new pair of variables defined by

$$R(t,x) = \frac12\log\phi(t,x)\hat\phi(t,x) \qquad\text{and}\qquad S(t,x) = \frac12\log\frac{\phi(t,x)}{\hat\phi(t,x)}, \tag{25}$$

and represent the evolution function $\phi(t,x)$ and time-reversed (or backward) evolution function $\hat\phi(t,x)$, by using the pair of functions $R(t,x)$ and $S(t,x)$, in the exponential form as

$$\begin{aligned}
\phi(t,x) &= e^{R(t,x)+S(t,x)},\\
\hat\phi(t,x) &= e^{R(t,x)-S(t,x)}.
\end{aligned} \tag{26}$$
The distribution density of our process depends on the exponent of distribution $R(t,x)$, but does not depend on the function $S(t,x)$, since $\mu(t,x) = e^{2R(t,x)}$. By contrast the drift vector $a(t,x)$ (resp. $\hat a(t,x)$) depends on $S(t,x)$. In fact, let a pair of functions $R(t,x)$ and $S(t,x)$ be defined by (25). Then the formulae in (21) yield

$$\begin{aligned}
a(t,x) &= \sigma^2(\nabla R(t,x) + \nabla S(t,x)),\\
\hat a(t,x) &= \sigma^2(\nabla R(t,x) - \nabla S(t,x)).
\end{aligned} \tag{27}$$
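For the reader's convenience, the short computation behind (27) can be spelled out. It uses only (21) and the exponential representation (26); the two consequences in the last line are added here as a worked step and are not stated in the text.

```latex
\begin{align*}
a(t,x) &= \sigma^2\,\frac{\nabla\phi}{\phi} = \sigma^2\,\nabla\log\phi
        = \sigma^2\,\nabla\big(R + S\big),\\
\hat a(t,x) &= \sigma^2\,\frac{\nabla\hat\phi}{\hat\phi} = \sigma^2\,\nabla\log\hat\phi
        = \sigma^2\,\nabla\big(R - S\big),\\
a + \hat a &= 2\sigma^2\,\nabla R = \sigma^2\,\nabla\log\mu,
\qquad a - \hat a = 2\sigma^2\,\nabla S .
\end{align*}
```

In words, the sum of the two drift vectors is fixed by the distribution density alone, while their difference is fixed by $S(t,x)$.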
We will call the function $S(t,x)$ the 'exponent of motion', and the pair of functions $\{R(a,x),S(a,x)\}$ the 'initial condition' of the movement of a system. Then we have

Theorem 3. Let an initial condition $\{R(a,x),S(a,x)\}$ at the switch-on time $t=a$ be prescribed. Set $\hat\phi_a(x) = e^{R(a,x)-S(a,x)}$. Then the time-reversed evolution function is given by

$$\hat\phi(t,x) = \int \hat\phi_a(y)\,dy\,p(a,y;t,x), \tag{28}$$

where $p(s,z;t,x)$ is the fundamental solution of the twin equations of motion in (17). Further, the evolution function $\phi(t,x)$ is given as a solution of a linear integral equation

$$\phi_a(z) = \int p(a,z;t,x)\,dx\,\phi(t,x), \tag{29}$$

where $\phi_a(x) = e^{R(a,x)+S(a,x)}$, which is also a known function.

We can get the terminal function $\phi_b(x) = \phi(b,x)$ at the terminal time $t=b$ by equation (29), hence the terminal function $\phi_b(x)$ is determined by an initial condition $\{R(a,x),S(a,x)\}$. Since the initial and terminal functions $\{\hat\phi_a,\phi_b\}$ are determined by the initial condition, we can use a triplet $\{p(s,x;t,y),R(a,x),S(a,x)\}$ instead of the triplet $\{p(s,x;t,y),\hat\phi_a,\phi_b\}$. In other words, the process $\{X_t,\ t\in[a,b],\ Q\}$ is uniquely determined by the fundamental solution $p(s,x;t,y)$ of the twin equations of motion in (17) and an initial condition $\{R(a,x),S(a,x)\}$.

We carefully note that one can prescribe an initial condition $\{R(a,x),S(a,x)\}$ in our dynamic theory. By contrast, only an initial distribution with a density $e^{2R(a,x)}$, i.e., only the exponent of distribution $R(a,x)$, can be prescribed in the conventional theory of Markov processes of Kolmogorov and Itô. I will demonstrate this decisive advantage of my dynamic theory with simple examples.

4.7 Examples
We will consider examples in one dimension for simplicity.

Example 1. The free movement of a system in one dimension is governed by the equation of motion, which is a pair of twin equations

$$\begin{aligned}
\frac{\partial\phi}{\partial t} + \frac12\sigma^2\frac{\partial^2\phi}{\partial x^2} &= 0,\\
-\frac{\partial\hat\phi}{\partial t} + \frac12\sigma^2\frac{\partial^2\hat\phi}{\partial x^2} &= 0,
\end{aligned} \tag{30}$$

where $\sigma$ is a constant. The fundamental solution of the equation of motion in (30) is

$$p(s,x;t,y) = \frac{1}{\sqrt{2\pi\sigma^2(t-s)}}\exp\Big(-\frac{(y-x)^2}{2\sigma^2(t-s)}\Big).$$

We take an evolution function

$$\phi(t,x) = e^{-(\kappa^2/2)t + (\kappa/\sigma)x}, \tag{31}$$

which is a solution of the first equation in (30), where $\kappa$ is an arbitrary constant. We then get the drift

$$a(t,x) = \sigma^2\,\frac{\partial\log\phi(t,x)}{\partial x} = \sigma\kappa,$$

in view of equation (21). The transition density of the free movement, which is a Markov process in one dimension, is given by

$$q(s,x;t,y) = \frac{1}{\sqrt{2\pi\sigma^2(t-s)}}\exp\Big(-\frac{(y-x)^2}{2\sigma^2(t-s)} - \frac12\kappa^2(t-s) + \frac{\kappa}{\sigma}(y-x)\Big),$$

in view of (22). Then functions defined by

$$u(t,x) = \int q(t,x;b,z)\,dz\,f(z) \qquad\text{and}\qquad \mu(t,x) = \int \mu(y)\,dy\,q(a,y;t,x)$$

are solutions of the kinematic equation

$$\begin{aligned}
\frac{\partial u}{\partial t} + \frac12\sigma^2\frac{\partial^2 u}{\partial x^2} + \sigma\kappa\frac{\partial u}{\partial x} &= 0,\\
-\frac{\partial\mu}{\partial t} + \frac12\sigma^2\frac{\partial^2\mu}{\partial x^2} - \sigma\kappa\frac{\partial\mu}{\partial x} &= 0,
\end{aligned}$$

with constant drift $\sigma\kappa$. We note that $\mu(t,x)$ is the distribution density of the process. Therefore, sample paths of the free movement are given by

$$X_t = X_a + \sigma B_{t-a} + \sigma\kappa(t-a). \tag{32}$$

Thus the free movement shows random zigzag motion as the Brownian motion $\sigma B_{t-a}$, and moreover it has drift $\sigma\kappa(t-a)$, although no external force is in existence.
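The claims of Example 1 can be checked numerically. The sketch below builds $q$ from $p$ and $\phi$ exactly as in (22) and integrates it by the trapezoidal rule: the total mass should be one, the mean should be $x+\sigma\kappa(t-s)$, and the variance should be $\sigma^2(t-s)$, in agreement with (32). The parameter values, the integration window, and the helper names are illustrative assumptions.

```python
import math

SIGMA, KAPPA = 0.8, 1.5  # illustrative parameter values (assumptions)

def p(s, x, t, y):
    """Fundamental solution of (30): Gaussian kernel with variance sigma^2 (t - s)."""
    var = SIGMA ** 2 * (t - s)
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def phi(t, x):
    """Evolution function (31)."""
    return math.exp(-(KAPPA ** 2 / 2.0) * t + KAPPA * x / SIGMA)

def q(s, x, t, y):
    """Transition density defined by (22)."""
    return p(s, x, t, y) * phi(t, y) / phi(s, x)

def trapezoid(f, lo, hi, n):
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + k * h) for k in range(1, n)) + 0.5 * f(hi))

s, x, t = 0.0, 0.5, 2.0
lo, hi, n = -15.0, 20.0, 7000  # window wide enough to hold essentially all of the mass

mass = trapezoid(lambda y: q(s, x, t, y), lo, hi, n)
mean = trapezoid(lambda y: y * q(s, x, t, y), lo, hi, n)
var = trapezoid(lambda y: (y - mean) ** 2 * q(s, x, t, y), lo, hi, n)

print(mass, mean, var)
```

The factor $\phi(t,y)/\phi(s,x)$ shifts the Gaussian kernel by $\sigma\kappa(t-s)$ without changing its width, which is the constant drift of the free movement.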
Example 2. We now consider the movement of a system governed by the twin equations of motion with Hooke's potential

$$\begin{aligned}
\frac{\partial\phi}{\partial t} + \frac12\sigma^2\frac{\partial^2\phi}{\partial x^2} - \frac12\kappa^2x^2\phi &= 0,\\
-\frac{\partial\hat\phi}{\partial t} + \frac12\sigma^2\frac{\partial^2\hat\phi}{\partial x^2} - \frac12\kappa^2x^2\hat\phi &= 0.
\end{aligned} \tag{33}$$

We set, respectively,

$$\phi = e^{\lambda t}\varphi(x) \qquad\text{and}\qquad \hat\phi = e^{-\lambda t}\varphi(x) \tag{34}$$

in equation (33), then we get an eigenvalue problem

$$-\frac12\sigma^2\frac{d^2\varphi}{dx^2} + \frac12\kappa^2x^2\varphi = \lambda\varphi.$$

Hence the process has a stationary distribution density

$$\mu(x) = \phi(t,x)\hat\phi(t,x) = \varphi^2(x),$$

in view of (34). For the smallest eigenvalue $\lambda_0 = \sigma\kappa/2$ (the value obtained by substituting the eigenfunction below into the eigenvalue problem), we have the associated eigenfunction

$$\varphi(x) = \beta e^{-\kappa x^2/(2\sigma)}, \tag{35}$$

where $\beta$ is a normalizing constant. The drift coefficient is therefore

$$a(x) = \sigma^2\frac{d}{dx}\big(\log e^{\lambda t}\varphi(x)\big) = -\sigma\kappa x,$$

in view of equation (21), hence the kinematic equation is

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\frac{\partial^2 u}{\partial x^2} - \sigma\kappa x\frac{\partial u}{\partial x} = 0.$$

Sample paths of the motion are given by solutions of a stochastic differential equation

$$X_t = X_a + \sigma B_{t-a} - \sigma\kappa\int_a^t ds\,X_s. \tag{36}$$

The drift $a(x) = -\sigma\kappa x$ induces a tendency of the movement towards the origin, added to the Brownian motion $\sigma B_{t-a}$, hence sample paths cannot stay far away from the origin for a long time. It is a stochastic generalization of the classic harmonic oscillation. In view of (35) it has the stationary distribution with a Gaussian density

$$\mu(x) = \varphi^2(x) = \beta^2 e^{-\kappa x^2/\sigma}.$$
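The Gaussian density $\beta^2 e^{-\kappa x^2/\sigma}$ has variance $\sigma/(2\kappa)$, which gives a simple numerical check of Example 2. The sketch below simulates equation (36) with an Euler–Maruyama step and compares the time average of $X_t^2$ along one long path with $\sigma/(2\kappa)$. The step size, path length, seed, and function name are illustrative assumptions, and the agreement is only approximate because of discretization and sampling error.

```python
import math
import random

def time_averaged_second_moment(sigma, kappa, dt=0.01, n_steps=200_000, seed=1):
    """Time average of X^2 along one Euler-Maruyama path of dX = sigma dB - sigma*kappa*X dt."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = 0.0
    acc = 0.0
    for _ in range(n_steps):
        x += sigma * rng.gauss(0.0, sqrt_dt) - sigma * kappa * x * dt
        acc += x * x
    return acc / n_steps

sigma, kappa = 1.0, 2.0
estimate = time_averaged_second_moment(sigma, kappa)
predicted = sigma / (2.0 * kappa)  # variance of the density beta^2 exp(-kappa x^2 / sigma)

print(estimate, predicted)
```

A single long path is used here because the process is stationary and mean-reverting; averaging over many independent paths would serve equally well.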
4.8 Schrödinger's wave theory and dynamic theory
The relation between the dynamic theory and Schrödinger's wave theory should be explained. Although we can treat the general case with a vector potential, we will consider, for simplicity, the Schrödinger equations with a scalar potential $V(t,x)$ and with a constant coefficient $\sigma$, namely

$$\begin{aligned}
i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\Delta\psi - V(t,x)\psi &= 0,\\
-i\frac{\partial\bar\psi}{\partial t} + \frac12\sigma^2\Delta\bar\psi - V(t,x)\bar\psi &= 0,
\end{aligned} \tag{37}$$

where $\sigma^2 = h/(2\pi m)$, $h$ is Planck's constant and $m$ is the mass of an electron. We represent the solution of the first equation in (37) as a complex-valued exponential function

$$\psi(t,x) = e^{R(t,x)+iS(t,x)}. \tag{38}$$

We identify $R(t,x)$ and $S(t,x)$ of the complex-valued exponential function in (38) with the exponent of distribution and the exponent of motion $\{R,S\}$, and set

$$\begin{aligned}
\phi(t,x) &= e^{R(t,x)+S(t,x)},\\
\hat\phi(t,x) &= e^{R(t,x)-S(t,x)}.
\end{aligned} \tag{39}$$

Then the real-valued exponential functions $\phi$ and $\hat\phi$ satisfy the twin equations of motion

$$\begin{aligned}
\frac{\partial\phi}{\partial t} + \frac12\sigma^2\Delta\phi + c(t,x)\phi &= 0,\\
-\frac{\partial\hat\phi}{\partial t} + \frac12\sigma^2\Delta\hat\phi + c(t,x)\hat\phi &= 0,
\end{aligned} \tag{40}$$

where

$$c(t,x) = -(V(t,x) + \tilde V(t,x)), \tag{41}$$

with

$$\tilde V(t,x) = \sigma^2(\nabla S)^2 + 2\frac{\partial S}{\partial t}, \tag{42}$$

which we will call the 'self potential', since it is not caused by external forces. Its physical meaning will be clarified later on in Remark 2. Moreover, we get a fundamental relation

$$\psi(t,x)\bar\psi(t,x) = \phi(t,x)\hat\phi(t,x), \tag{43}$$
between the wave function $\psi(t,x)$ and the evolution functions $\phi(t,x)$ and $\hat\phi(t,x)$. Equation (43) implies that the intensity $\psi(t,x)\bar\psi(t,x)$ of the complex-valued wave $\psi(t,x)$ coincides with the distribution density $\phi(t,x)\hat\phi(t,x)$ of the stochastic process $X_t$ determined by the equation of motion in (40). Cf. Nagasawa (1993, 2000) for a proof.

We will quickly look at the history of quantum physics. Based on the wave equation in (37), Schrödinger (1926) developed a wave theory of electrons. It was successful in computing the energy. But the wave theory failed in explaining the fact that an electron is always found at a point. He then recognized the necessity of a particle theory for electrons, and discussed the stochastic motion of particles in Schrödinger (1931). However, it was not fully successful, since he could not find the formula given in (27) (he did not know the existence of the exponent of motion $S(t,x)$), and postponed further discussions about the relation between his theory of stochastic motion and quantum mechanics. The dynamic theory of stochastic movement explained in the present exposition is a further development of an idea in Schrödinger (1931).

We carefully note that the Schrödinger equation in (37) is a complex-valued counterpart of the twin equations of motion in (40). The Schrödinger equation is the complex-valued evolution equation, and a useful mathematical device in the dynamic theory of stochastic motion of particles. In the last century people interpreted the Schrödinger equation as a wave equation, and conceptual confusion arose as a result.

4.9 Sample paths of motion governed by the Schrödinger equation
We will consider, for simplicity, the Schrödinger equation

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\Delta\psi - V(t,x)\psi = 0, \tag{44}$$

whose solution we write as a complex-valued exponential function

$$\psi(t,x) = e^{R(t,x)+iS(t,x)}. \tag{45}$$

In Schrödinger's wave theory, $\psi(t,x)$ is a complex-valued wave function. However, we will regard equation (44) not as a wave equation but as a complex-valued counterpart of the twin equations of motion of particles in (40), based on the equivalence explained in the preceding section. Hence we identify $R(t,x)$ and $S(t,x)$ of $\psi(t,x)$ in (45) with the exponent of distribution and the exponent of motion. We can then apply the first formula in equation (27), and get a drift vector

$$a(t,x) = \sigma^2(\nabla R(t,x) + \nabla S(t,x)), \tag{46}$$

with which we define the equation of sample paths in (7). In this case with a constant $\sigma$ we get

$$X_t = X_a + \sigma B_{t-a} + \int_a^t a(s,X_s)\,ds, \tag{47}$$
where $B_t$ denotes the $d$-dimensional Brownian motion, and $X_a$ is an initial position, which is a random variable independent of the Brownian motion $B_t$.

Example 3. We consider the Schrödinger equation in one dimension with no potential function

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\frac{\partial^2\psi}{\partial x^2} = 0.$$

We take a special solution

$$\psi(t,x) = e^{\kappa x/\sigma + i(\kappa^2/2)t}.$$

Based on the equivalence explained in the preceding section, we identify

$$R(t,x) = \frac{\kappa}{\sigma}x \qquad\text{and}\qquad S(t,x) = \frac{\kappa^2 t}{2}$$

of $\psi(t,x)$ with the exponent of distribution and the exponent of motion. Then we get

$$a(t,x) = \sigma^2\frac{\partial R}{\partial x} = \sigma\kappa,$$

by equation (46). Therefore, the equation of sample paths in (47) is in this simple case

$$X_t = X_a + \sigma B_{t-a} + \sigma\kappa(t-a),$$

which coincides with equation (32) of the free motion with a constant drift $\sigma\kappa$.

Example 4. We consider the Schrödinger equation in one dimension

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\frac{\partial^2\psi}{\partial x^2} - \frac12\kappa^2x^2\psi = 0,$$
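with Hooke's potential $V(x) = \kappa^2x^2/2$. Substituting $\psi(t,x) = e^{-i\lambda t}\varphi(x)$, we get

$$-\frac12\sigma^2\frac{d^2\varphi}{dx^2} + \frac12\kappa^2x^2\varphi = \lambda\varphi.$$

For the smallest eigenvalue $\lambda_0 = \sigma\kappa/2$ (the value obtained by substituting the eigenfunction into the eigenvalue problem), we have the associated eigenfunction $\varphi(x) = \beta e^{-\kappa x^2/(2\sigma)}$, hence $\psi(t,x) = \beta e^{-\kappa x^2/(2\sigma)}e^{-i\sigma\kappa t/2}$. We then identify

$$R(t,x) = -\frac{\kappa}{2\sigma}x^2 + \log\beta \qquad\text{and}\qquad S(t,x) = -\frac{\sigma\kappa t}{2}$$

of $\psi(t,x)$ with the exponent of distribution and the exponent of motion. Then we get

$$a(t,x) = \sigma^2\frac{\partial R}{\partial x} = -\sigma\kappa x,$$

by equation (46). Therefore, the equation of sample paths in (47) is in this case

$$X_t = X_a + \sigma B_{t-a} - \sigma\kappa\int_a^t ds\,X_s,$$

which coincides with equation (36), which is the sample path equation of the stochastic generalization of the classic harmonic oscillation.

Example 5. We consider the Schrödinger equation with Hooke's potential in two dimensions in the polar coordinates

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\Big(\frac1r\frac{\partial}{\partial r}\Big(r\frac{\partial\psi}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2\psi}{\partial\eta^2}\Big) - \frac12\kappa^2r^2\psi = 0, \tag{48}$$

where $\sigma^2 = \hbar/m$, $\hbar = h/(2\pi)$ and $\kappa$ is a constant. By substituting $\psi(t,r,\eta) = e^{-i\lambda t}\psi(r,\eta)$, and by separating variables as $\psi(r,\eta) = R(r)\Phi(\eta)$, we get

$$\begin{aligned}
-\frac12\sigma^2\frac1r\frac{d}{dr}\Big(r\frac{dR}{dr}\Big) + \Big(\frac12\kappa^2r^2 + \frac12\sigma^2\frac{m^2}{r^2}\Big)R &= \lambda R,\\
-\frac{d^2\Phi}{d\eta^2} &= m^2\Phi.
\end{aligned}$$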
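For the angular equation we take complex-valued solutions

$$\Phi = e^{im\eta}, \qquad m = 0,\pm1,\pm2,\dots.$$

Solutions to the eigenvalue problem of the radial part are known. The eigenvalues are

$$\lambda_{|m|+n} = \sigma\kappa(|m| + n + 1),$$

where $m = 0,\pm1,\pm2,\dots$ and $n = 0,2,4,\dots$. Associated eigenfunctions are given by

$$R_{m,n}(r) = F_{|m|,n}\Big(\sqrt{\frac{\kappa}{\sigma}}\,r\Big)\exp\Big(-\frac{\kappa}{2\sigma}r^2\Big),$$

where $F_{|m|,n}(x)$ is a polynomial function of $x$, cf. (11–13) in Pauling and Wilson (1935).

For the case of the smallest eigenvalue $\lambda_0 = \sigma\kappa$ ($m=0$, $n=0$, and $F_{|0|,0}(r) = 1$) we have $\psi(t,r,\eta) = \beta e^{-\kappa r^2/(2\sigma) - i\sigma\kappa t}$. We identify

$$R = -\frac{\kappa r^2}{2\sigma} \qquad\text{and}\qquad S = -\sigma\kappa t$$

with the exponent of distribution and the exponent of motion. Then we get the drift function of the motion

$$a(r) = \sigma^2\frac{\partial R}{\partial r} = -\sigma\kappa r,$$

in view of (46). The kinematic equation is therefore

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\Big(\frac1r\frac{\partial}{\partial r}\Big(r\frac{\partial u}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2 u}{\partial\eta^2}\Big) - \sigma\kappa r\frac{\partial u}{\partial r} = 0.$$

To get stochastic differential equations, we must expand the Laplacian, and rewrite it as

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\Big(\frac{\partial^2 u}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\eta^2}\Big) + \frac12\sigma^2\frac1r\frac{\partial u}{\partial r} - \sigma\kappa r\frac{\partial u}{\partial r} = 0,$$

in the form of equation (2). We then get the sample path equations

$$dr_t = \sigma\,dB_t^1 + \Big(\frac12\sigma^2\frac{1}{r_t} - \sigma\kappa r_t\Big)dt, \qquad d\eta_t = \frac{\sigma}{r_t}\,dB_t^2,$$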
where $B_t^1$ and $B_t^2$ are independent one-dimensional Brownian motions.

In this case we can also write the kinematic equation in the rectangular coordinates as

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\Big(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\Big) - \sigma\kappa\Big(x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y}\Big) = 0.$$

We then get a Markov process $X_t = (X_t^1,X_t^2)$, which is a solution of a system of stochastic differential equations

$$X_t^i = X_a^i + \sigma B_{t-a}^i - \sigma\kappa\int_a^t ds\,X_s^i, \qquad i=1,2.$$

Hence $X_t = (X_t^1,X_t^2)$ has a stationary distribution density

$$\mu = \psi\bar\psi = \varphi^2 = \beta^2 e^{-\kappa r^2/\sigma},$$
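and is the stochastic generalization of the classic harmonic oscillation in two dimensions.

We now choose $m=\pm1$ and $n=0$ for the first excited eigenvalue $\lambda_1 = 2\sigma\kappa$. In this case, $F_{|\pm1|,0}(r) = 2r$, and we get two complex-valued eigenfunctions

$$\begin{aligned}
\varphi_{+1}(r,\eta) &= \beta r e^{-\kappa r^2/(2\sigma)}e^{i\eta}, && \text{for } m=1,\\
\varphi_{-1}(r,\eta) &= \beta r e^{-\kappa r^2/(2\sigma)}e^{-i\eta}, && \text{for } m=-1.
\end{aligned} \tag{49}$$

With the eigenfunctions, we define complex-valued functions by

$$\psi_{\pm1}(t,r,\eta) = e^{-i2\sigma\kappa t}\varphi_{\pm1}(r,\eta) = \beta r e^{-\kappa r^2/(2\sigma) + i(-2\sigma\kappa t\pm\eta)},$$

which satisfy the Schrödinger equation in (48). We identify

$$R = \log\beta r - \frac{\kappa}{2\sigma}r^2 \qquad\text{and}\qquad S_{\pm1} = -2\sigma\kappa t\pm\eta$$

of $\psi_{\pm1}(t,r,\eta)$ with the exponent of distribution and the exponent of motion. In view of equation (46) we get

$$\begin{aligned}
a^r(r,\eta) &= \sigma^2\frac{\partial R}{\partial r} = \sigma^2\frac1r - \sigma\kappa r,\\
a^\eta_{\pm1}(r,\eta) &= \sigma^2\frac1r\frac{\partial S_{\pm1}}{\partial\eta} = \pm\frac{\sigma^2}{r},
\end{aligned} \tag{50}$$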
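which give the drift vectors in the polar coordinates. Therefore, the kinematic equation is

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\Big(\frac1r\frac{\partial}{\partial r}\Big(r\frac{\partial u}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2 u}{\partial\eta^2}\Big) + \Big(\sigma^2\frac1r - \sigma\kappa r\Big)\frac{\partial u}{\partial r} \pm \frac{\sigma^2}{r}\frac1r\frac{\partial u}{\partial\eta} = 0,$$

that is

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\Big(\frac{\partial^2 u}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2 u}{\partial\eta^2}\Big) + \frac12\sigma^2\frac1r\frac{\partial u}{\partial r} + \Big(\sigma^2\frac1r - \sigma\kappa r\Big)\frac{\partial u}{\partial r} \pm \frac{\sigma^2}{r}\frac1r\frac{\partial u}{\partial\eta} = 0. \tag{51}$$

The kinematic equation (51) then implies the sample path equations

$$\begin{aligned}
dr_t &= \sigma\,dB_t^1 + \frac12\sigma^2\frac{1}{r_t}\,dt + \Big(\sigma^2\frac{1}{r_t} - \sigma\kappa r_t\Big)dt,\\
d\eta_t &= \frac{\sigma}{r_t}\,dB_t^2 \pm \frac{\sigma^2}{r_t^2}\,dt,
\end{aligned} \tag{52}$$

where $B_t^1$ and $B_t^2$ are independent one-dimensional Brownian motions. The radial motion $r_t$ does not hit the origin because of the repulsive drift $3\sigma^2/(2r)$ (i.e., the Bessel process, cf. McKean (1960)), and is attracted by the drift $-\sigma\kappa r$. Moreover, solving $(3/2)\sigma^2/r - \sigma\kappa r = 0$, we get

$$\bar r = \sqrt{\frac{3\sigma}{2\kappa}}.$$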
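The zero point $\bar r$ of the radial drift can also be located numerically, which is useful when the drift has no closed-form root. The sketch below applies plain bisection to the total radial drift $(3/2)\sigma^2/r-\sigma\kappa r$ read off from (52) and compares the result with $\sqrt{3\sigma/(2\kappa)}$. The function names, the bracketing interval, and the parameter values are illustrative assumptions.

```python
import math

def radial_drift(r, sigma, kappa):
    """Total radial drift in (52): repulsive Bessel-type part plus attraction by Hooke's force."""
    return 1.5 * sigma ** 2 / r - sigma * kappa * r

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

sigma, kappa = 1.0, 2.0
r_bar = bisect_root(lambda r: radial_drift(r, sigma, kappa), 1e-3, 10.0)

print(r_bar, math.sqrt(3.0 * sigma / (2.0 * kappa)))
```

Evaluating `radial_drift` on either side of `r_bar` shows the sign change of the drift: outward for smaller radii and inward for larger ones.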
If $r<\bar r$, then the radial drift function is positive, while it is negative for $r>\bar r$. Hence our particle is attracted toward the zero point $\bar r$ of the drift function. The angular motion $\eta_t$ with drift $\pm\sigma^2/r^2$ induces rotational motion. Therefore, the particle makes random motion in a two-dimensional (American) doughnut of an average radius $\bar r$, with anti-clockwise rotation for the drift $\sigma^2/r^2$, or clockwise rotation for the drift $-\sigma^2/r^2$, respectively.

Remark 2. For the case of the smallest eigenvalue $\lambda_0 = \sigma\kappa$, we can understand the motion quite naturally as the stochastic generalization of the classic harmonic oscillation in two dimensions. But, for the first excited eigenvalue $\lambda_1 = 2\sigma\kappa$, it is not so easy to understand the motion, if we only look at the Schrödinger equation in (48). In fact, it seems our particle knows that it must be in a two-dimensional doughnut. But how did it know this? This is hard to understand, since there is only Hooke's potential $\kappa^2r^2/2$ in the Schrödinger equation. How can Hooke's force confine our particle in the doughnut?

Now, let us look at the twin equations of motion in (40). It has a potential function $-c(r) = V(r) + \tilde V(r)$ with the self-potential

$$\tilde V(r) = \sigma^2(\nabla S)^2 + 2\frac{\partial S}{\partial t},$$

and in the case of the first excited eigenvalue $\lambda_1 = 2\sigma\kappa$

$$\tilde V(r) = \sigma^2/r^2 - 4\sigma\kappa.$$

Hence $-c(r)$ is $\infty$ at the origin. Moreover, we have $dc(r)/dr = -\kappa^2r + 2\sigma^2/r^3$, hence the potential function $-c(r)$ becomes minimum at

$$\tilde r = \sqrt{\sqrt2\,\sigma/\kappa}.$$

Therefore, our particle must stay near the radius $\tilde r$. We now understand that our particle learned from the potential function $-c(r)$ that it must be in the two-dimensional doughnut. However, to see the motion of our particle we must analyze the kinematic equation in (51) or (52), in particular the drift vector given by (50). Through this we have seen that our particle makes the rotational motion in the doughnut together with the random motion. ∇

Remark 3. We once more look at the radial motion $r_t$ described by

$$\frac{\partial u}{\partial t} + \frac12\sigma^2\frac{\partial^2 u}{\partial r^2} + \Big(\frac32\sigma^2\frac1r - \sigma\kappa r\Big)\frac{\partial u}{\partial r} = 0.$$

To have another view of the motion, we apply the so-called time change, that is, $X_t(\omega) = r_{\tau^{-1}(t,\omega)}(\omega)$, where

$$\tau^{-1}(t,\omega) = \sup\{s;\ \tau(s,\omega)\le t\}, \qquad \tau(t,\omega) = \int_0^t \alpha(r_s(\omega))\,ds, \qquad\text{and}\qquad \alpha(r) = 1/r,$$

cf. e.g., Nagasawa (1993, 2000). Then the motion $X_t = r_{\tau^{-1}(t)}$ satisfies a stochastic differential equation

$$dX_t = \sigma\sqrt{X_t}\,dB_t + (3\sigma^2/2 - \sigma\kappa X_t^2)\,dt,$$

in which the coefficient $\sigma\sqrt{x}$ vanishes at the origin, but the drift has no singularity. ∇

Example 6. We consider the motion of an electron in a hydrogen atom and take the Schrödinger equation

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\Delta\psi - V(r)\psi = 0,$$
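with the Coulomb potential $V(r) = -\alpha/r$, where $\alpha$ is a constant. In the spherical coordinates $(r,\theta,\eta)$, $x = r\sin\theta\cos\eta$, $y = r\sin\theta\sin\eta$ and $z = r\cos\theta$, it is

$$i\frac{\partial\psi}{\partial t} + \frac12\sigma^2\Big(\frac{1}{r^2}\frac{\partial}{\partial r}\Big(r^2\frac{\partial\psi}{\partial r}\Big) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\frac{\partial\psi}{\partial\theta}\Big) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi}{\partial\eta^2}\Big) - V(r)\psi = 0. \tag{53}$$

Substituting $\psi = e^{-i\lambda t}\varphi$, we get

$$-\frac12\sigma^2\Big(\frac{1}{r^2}\frac{\partial}{\partial r}\Big(r^2\frac{\partial\varphi}{\partial r}\Big) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\Big(\sin\theta\frac{\partial\varphi}{\partial\theta}\Big) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\varphi}{\partial\eta^2}\Big) + V(r)\varphi = \lambda\varphi.$$

We then apply the separation of variables, namely by substituting $\varphi = R(r)\Theta(\theta)\Phi(\eta)$, we get

$$\begin{aligned}
-\frac{d^2\Phi}{d\eta^2} &= m^2\Phi,\\
-\frac{1}{\sin\theta}\frac{d}{d\theta}\Big(\sin\theta\frac{d\Theta}{d\theta}\Big) + \frac{m^2}{\sin^2\theta}\Theta &= \beta\Theta,\\
-\frac12\sigma^2\Big(\frac{1}{r^2}\frac{d}{dr}\Big(r^2\frac{dR}{dr}\Big) - \frac{\beta}{r^2}R\Big) + V(r)R &= \lambda_nR,
\end{aligned}$$

where

$$\lambda_n = -\frac{\alpha^2}{2\sigma^2}\frac{1}{n^2}, \qquad n = 1,2,3,\dots,$$

and

$$\beta = l(l+1), \qquad l = |m|,|m|+1,\dots.$$

For each $n$ we can choose

$$l = 0,1,2,\dots,n-1, \qquad m = 0,\pm1,\pm2,\dots,\pm l, \tag{54}$$

cf. e.g. section V-21 of Pauling and Wilson (1935). Each $(n,l,m)$ determines the motion of an electron in a hydrogen atom.

For $n=1$, $(n,l,m) = (1,0,0)$, we get a solution $\varphi(r,\theta,\eta) = \beta e^{-\alpha r/\sigma^2}$, where $\beta$ is a normalizing constant. Hence the distribution of the motion $X_t = (r_t,\theta_t,\eta_t)$ is given by

$$P[X_t\in d(r,\theta,\eta)] = \beta^2e^{-2\alpha r/\sigma^2}\,r^2\,dr\,\sin\theta\,d\theta\,d\eta.$$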
Dynamic Theory of Stochastic Movement of Systems The solution of the Schr¨odinger equation is ψ(t, (r, θ, η)) = βe−αr/σ
2 −iλ t 1
.
By identifying −αr/σ 2 and −λ1 t with the exponents of distribution and motion, we get the evolution function φ(t, (r, θ, η)) = βe−αr/σ
2 −λ t 1
.
The evolution function determines a drift vector by a = σ2
grad φ = (−α, 0, 0). φ
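Therefore, the kinematic equation in the spherical coordinates is
$$\frac{\partial u}{\partial t} + \frac{1}{2}\sigma^2\left\{\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 u}{\partial\eta^2}\right\} - \alpha\frac{\partial u}{\partial r} = 0.$$
The kinematic equation describes the motion of an electron in a hydrogen atom. But to see sample paths of the motion we need stochastic differential equations. To get them we first expand the Laplacian in the kinematic equation, and rewrite it as
$$\frac{\partial u}{\partial t} + \frac{1}{2}\left\{\sigma^2\frac{\partial^2 u}{\partial r^2} + \frac{\sigma^2}{r^2}\frac{\partial^2 u}{\partial\theta^2} + \frac{\sigma^2}{r^2\sin^2\theta}\frac{\partial^2 u}{\partial\eta^2}\right\} + \sigma^2\frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{2}\frac{\sigma^2}{r^2}\cot\theta\,\frac{\partial u}{\partial\theta} - \alpha\frac{\partial u}{\partial r} = 0,$$
in the form of equation (2). Then the kinematic equation in this form implies the sample path equations for the motion of an electron in a hydrogen atom, when $n = 1$,
$$dr_t = \sigma\,dB_t^1 + \sigma^2\frac{1}{r_t}\,dt - \alpha\,dt,$$
$$d\theta_t = \frac{\sigma}{r_t}\,dB_t^2 + \frac{1}{2}\sigma^2\frac{1}{r_t^2}\cot\theta_t\,dt,$$
$$d\eta_t = \frac{\sigma}{r_t\sin\theta_t}\,dB_t^3,$$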
where $B_t^1$, $B_t^2$ and $B_t^3$ are independent one-dimensional Brownian motions, $r_t > 0$, $\theta_t \in (0,\pi)$ and $\eta_t$ is the angular motion. The radial motion $r_t$ does not hit the origin because of the repulsive drift $\sigma^2/r$, and is attracted with drift $-\alpha$ (see Remark 3 and Remark 4 below). Solving $\sigma^2/r - \alpha = 0$, we get $\bar r = \sigma^2/\alpha$. If $r < \bar r$, then the radial drift function is positive, while it is negative for $r > \bar r$. Hence our electron is attracted toward the zero point $\bar r$ of the drift function, and often moves near the radius $\bar r$. Since $\sigma^2 = h/2\pi m$ and $\alpha = 2\pi e^2/h$, where $h$ is the Planck constant, and $m$ and $-e$ are the mass and electric charge of an electron, we have
$$\bar r = \frac{h^2}{4\pi^2}\frac{1}{m e^2}.$$
The classic Bohr radius agrees with this. The drift term $\sigma^2\cot\theta/(2r^2)$ of the motion $\theta_t$ is singular at $\theta = 0$ and $\pi$, hence the electron is repelled from the z-axis and moves near the xy-plane. We carefully note that our electron does not rotate around a proton in this case of the lowest energy, since the angular motion $\eta_t$ has no drift.

Remark 4. Stochastic processes with drift can be treated with the help of the Maruyama–Girsanov theorem, cf. e.g. Nagasawa (1993, 2000). We consider, for instance, the radial motion. Let $\{B_t, P\}$ be a one-dimensional Brownian motion, and set $X_t = X_a + \sigma B_{t-a}$. We define an exponential functional by
$$M_a^t = \exp\left\{\int_a^t \sigma^{-1}a(X_s)\,dB_s - \frac{1}{2}\int_a^t \left(\sigma^{-1}a(X_s)\right)^2 ds\right\},$$
where
$$a(r) = \sigma^2\frac{1}{r} - \alpha,$$
and a new probability measure $R$ by
$$\frac{dR}{dP} = M_a^b.$$
Moreover, set
$$\sigma\tilde B_{t-a} = \sigma B_{t-a} - \int_a^t a(X_s)\,ds.$$
Then $\{\tilde B_t, R\}$ is a one-dimensional Brownian motion, and $X_t$ can be written as
$$X_t = X_a + \sigma\tilde B_{t-a} + \int_a^t a(X_s)\,ds,$$
hence $\{X_t, R\}$ gives the radial motion $r_t$ with the drift vector $a(r) = \sigma^2/r - \alpha$ under the transformed probability measure $R = M_a^b P$. Therefore, $R[f(X_t)] = P[M_a^t f(X_t)]$. ∇
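The radial sample path equation for $n = 1$, $dr_t = \sigma\,dB_t + (\sigma^2/r_t - \alpha)\,dt$, can be explored numerically. The following is only a sketch and is not taken from the text: the function name `simulate_radial`, the dimensionless choice $\sigma = \alpha = 1$, and the drift-implicit time stepping are assumptions made here for illustration. The singular term $\sigma^2/r$ is treated implicitly, so each step is the positive root of a quadratic and the simulated path stays away from the origin. Under the distribution density proportional to $r^2 e^{-2\alpha r/\sigma^2}$ given above, the mean radius is $3\sigma^2/(2\alpha)$, which the simulated ensemble should approach.

```python
import numpy as np

def simulate_radial(sigma=1.0, alpha=1.0, r0=1.0, dt=2e-3,
                    n_steps=10_000, n_paths=4_000, seed=0):
    """Simulate dr = sigma dB + (sigma^2 / r - alpha) dt for many paths.

    Assumed scheme (not from the text): the singular drift sigma^2 / r is
    evaluated at the new point, the constant drift -alpha at the old one:
        r_new = r - alpha*dt + sigma*dB + sigma^2*dt / r_new.
    This is a quadratic in r_new whose positive root is taken.
    """
    rng = np.random.default_rng(seed)
    r = np.full(n_paths, float(r0))
    for _ in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(n_paths)
        c = r - alpha * dt + sigma * dB
        # positive root of r_new^2 - c * r_new - sigma^2 * dt = 0
        r = 0.5 * (c + np.sqrt(c * c + 4.0 * sigma**2 * dt))
    return r

r_end = simulate_radial()           # radii of all paths at the final time
r_bar = 1.0**2 / 1.0                # zero of the drift, sigma^2 / alpha
mean_theory = 3.0 * 1.0**2 / (2.0 * 1.0)  # mean of density ~ r^2 exp(-2 alpha r / sigma^2)
print("smallest radius reached:", r_end.min())
print("ensemble mean radius:", r_end.mean(), "theory:", mean_theory)
```

The paths start at the zero $\bar r$ of the drift function; the ensemble mean lies above $\bar r$ because the density $r^2 e^{-2\alpha r/\sigma^2}$ is skewed toward larger radii.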
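For $n = 2$, we can choose $(l, m)$ according to (54). We consider an interesting case of $(n, l, m) = (2, 1, \pm 1)$. In this case we get complex-valued solutions
$$\varphi_{+1}(r,\theta,\eta) = \beta\,\frac{\alpha}{\sigma^2}\frac{r}{2}\,e^{-\alpha r/(2\sigma^2)}\sin\theta\,e^{i\eta},$$
$$\varphi_{-1}(r,\theta,\eta) = \beta\,\frac{\alpha}{\sigma^2}\frac{r}{2}\,e^{-\alpha r/(2\sigma^2)}\sin\theta\,e^{-i\eta},$$
where $\beta$ is a normalizing constant, and solutions of the Schrödinger equation
$$\psi_{\pm 1}(t,(r,\theta,\eta)) = \beta\,\frac{\alpha}{\sigma^2}\frac{r}{2}\,e^{-\alpha r/(2\sigma^2) \pm i\eta - i\lambda_2 t}\sin\theta.$$
Therefore, the evolution functions are
$$\phi_{\pm 1}(t,(r,\theta,\eta)) = \beta\,\frac{\alpha}{\sigma^2}\frac{r}{2}\,e^{-\alpha r/(2\sigma^2) \pm \eta - \lambda_2 t}\sin\theta.$$
The drift vectors determined by the evolution functions $\phi\,(= \phi_{\pm 1})$ are
$$a(r,\theta,\eta) = \sigma^2\,\frac{\operatorname{grad}\phi}{\phi} = \frac{\sigma^2}{\phi}\left(\frac{\partial\phi}{\partial r},\ \frac{1}{r}\frac{\partial\phi}{\partial\theta},\ \frac{1}{r\sin\theta}\frac{\partial\phi}{\partial\eta}\right),$$
and in the spherical coordinates we get
$$a_r(r,\theta,\eta) = \sigma^2\frac{1}{r} - \frac{\alpha}{2},$$
$$a_\theta(r,\theta,\eta) = \frac{\sigma^2}{r}\cot\theta,$$
$$a_\eta(r,\theta,\eta) = \pm\frac{\sigma^2}{r\sin\theta}.$$
Therefore, the kinematic equation in the spherical coordinates is
$$\frac{\partial u}{\partial t} + \frac{1}{2}\sigma^2\left\{\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 u}{\partial\eta^2}\right\} + a\cdot\nabla u = 0,$$
where the sign $\pm$ corresponds to $m = \pm 1$ in
$$a\cdot\nabla u = \left(\sigma^2\frac{1}{r} - \frac{\alpha}{2}\right)\frac{\partial u}{\partial r} + \frac{\sigma^2}{r}\cot\theta\,\frac{1}{r}\frac{\partial u}{\partial\theta} \pm \frac{\sigma^2}{r\sin\theta}\frac{1}{r\sin\theta}\frac{\partial u}{\partial\eta}.$$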
To get stochastic differential equations, we first expand the Laplacian in the kinematic equation, and rewrite it in the form of equation (2), i.e.,
$$\frac{\partial u}{\partial t} + \frac{1}{2}\left\{\sigma^2\frac{\partial^2 u}{\partial r^2} + \frac{\sigma^2}{r^2}\frac{\partial^2 u}{\partial\theta^2} + \frac{\sigma^2}{r^2\sin^2\theta}\frac{\partial^2 u}{\partial\eta^2}\right\} + \sigma^2\frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{2}\frac{\sigma^2}{r^2}\cot\theta\,\frac{\partial u}{\partial\theta} + a\cdot\nabla u = 0.$$
Then the kinematic equation in this form implies the sample path equations for the motion of an electron in a hydrogen atom, when $(n, l, m) = (2, 1, \pm 1)$,
$$dr_t = \sigma\,dB_t^1 + \sigma^2\frac{1}{r_t}\,dt + \left(\sigma^2\frac{1}{r_t} - \frac{\alpha}{2}\right)dt,$$
$$d\theta_t = \frac{\sigma}{r_t}\,dB_t^2 + \frac{1}{2}\frac{\sigma^2}{r_t^2}\cot\theta_t\,dt + \frac{\sigma^2}{r_t^2}\cot\theta_t\,dt,$$
$$d\eta_t = \frac{\sigma}{r_t\sin\theta_t}\,dB_t^3 \pm \frac{\sigma^2}{r_t^2\sin^2\theta_t}\,dt,$$
where $B_t^1$, $B_t^2$ and $B_t^3$ are independent one-dimensional Brownian motions, $r_t > 0$, $\theta_t \in (0,\pi)$ and $\eta_t$ is the angular motion. Because of the singularity of the radial drift function $2\sigma^2/r$ at the origin, our electron is repelled strongly from the origin and the origin is inaccessible, but the electron is attracted toward the origin with constant drift $-\alpha/2$ (see Remark 4). Solving $2\sigma^2/r - \alpha/2 = 0$, we get $\bar r = 4\sigma^2/\alpha$. If $r < \bar r$, then the radial drift function is positive, while it is negative for $r > \bar r$. Hence our electron is attracted toward the zero point $\bar r$ of the drift function, and often moves near the radius $\bar r$. Since $\sigma^2 = h/2\pi m$ and $\alpha = 2\pi e^2/h$, we have
$$\bar r = \frac{h^2}{4\pi^2}\frac{1}{m e^2}\,2^2.$$
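As a quick numerical cross-check of the two radii, the zeros of the radial drift functions can be evaluated from $\sigma^2 = h/2\pi m$ and $\alpha = 2\pi e^2/h$. The sketch below is not part of the original exposition; it assumes Gaussian (CGS) units, in which the Coulomb potential has the form $-e^2/r$, and the constant and function names are chosen here for illustration.

```python
import math

# Physical constants in Gaussian (CGS) units; the unit system is an assumption here.
H_PLANCK = 6.62607015e-27     # Planck constant, erg s
M_ELECTRON = 9.1093837015e-28 # electron mass, g
E_CHARGE = 4.80320471e-10     # elementary charge, statcoulomb

SIGMA2 = H_PLANCK / (2.0 * math.pi * M_ELECTRON)   # sigma^2 = h / (2 pi m)
ALPHA = 2.0 * math.pi * E_CHARGE**2 / H_PLANCK     # alpha = 2 pi e^2 / h

def radial_drift_n1(r):
    """Total radial drift for (n, l, m) = (1, 0, 0): sigma^2 / r - alpha."""
    return SIGMA2 / r - ALPHA

def radial_drift_n2(r):
    """Total radial drift for (n, l, m) = (2, 1, +-1): 2 sigma^2 / r - alpha / 2."""
    return 2.0 * SIGMA2 / r - ALPHA / 2.0

r_bar_1 = SIGMA2 / ALPHA        # zero of radial_drift_n1
r_bar_2 = 4.0 * SIGMA2 / ALPHA  # zero of radial_drift_n2

print("r_bar (n = 1) in cm:", r_bar_1)
print("r_bar (n = 2) in cm:", r_bar_2)
```

With these constants the first radius comes out at roughly $5.29 \times 10^{-9}$ cm, and the second is four times larger.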
The classic Bohr radius for $n = 2$ agrees with this value of $\bar r$. The drift term $3\sigma^2\cot\theta/(2r^2)$ of the motion $\theta_t$ is singular at $\theta = 0$ and $\pi$, hence the electron is repelled strongly from the z-axis and moves near the xy-plane. Our electron therefore moves in a three-dimensional doughnut with an average radius $\bar r$ around a proton. Moreover, the drift terms $\pm\sigma^2/(r^2\sin^2\theta)$ of the angular motion $\eta_t$ induce rotational motion, hence the electron rotates anti-clockwise if $m = 1$, and clockwise if $m = -1$. The rotational motion of the electron induces the magnetic moment of the hydrogen atom in this case of $(n, l, m) = (2, 1, \pm 1)$. For the case of further excited motion, cf. section 4.6 of Nagasawa (2000).

I remark here that we can more generally treat the motion of a charged particle in an electro-magnetic field. In this case we must handle the Schrödinger equation with a vector potential. An interesting case of the motion in a homogeneous magnetic field is analyzed in Nagasawa (2002*).

The analysis of the typical examples in this section clarifies that the Schrödinger equation is a complex-valued alternative of the twin equations of motion, and one can use it for computing the energy and distribution density of the motion, but one needs the dynamic theory of stochastic motion for analyzing the sample paths of the motion of particles. The Schrödinger equation is in fact the complex-valued evolution equation and its solutions are complex-valued evolution functions in the dynamic theory, and we can use them in analyzing the sample paths of the motion of particles.

4.10 Interference phenomena and entangled motion
‘Interference’ is originally a notion in wave theories, and the stripe-like patterns of distributions in the problem of double slits have been understood in wave mechanics and quantum mechanics as the typical effect of the ‘wave property’ of electrons. This is not correct. Moreover, it has been said that the theory of stochastic processes cannot solve the double slits problem. This is, in a sense, true. As will be explained, Kolmogorov–Itô’s conventional theory of Markov processes does not have the mathematical structures for discussing the double slits problem. The dynamic theory of stochastic motion, however, does. We will explain that the stripe-like patterns of distributions in the double slits problem are a result caused by the entangled motion of an electron in the dynamic theory of stochastic motion.

We shoot an electron and observe it at a screen. In between the electron gun and the screen we set double slits. In the wave theory of electrons, the wave function $\psi$ splits into two parts $\psi_1$ and $\psi_2$ at the double slits, and at the screen we get $\psi_1(t,x) + \psi_2(t,x)$, which is called the superposition of the wave functions $\psi_1(t,x)$ and $\psi_2(t,x)$, where we ignore the normality condition for simplicity. Therefore, the intensity of the wave is given by
$$|\psi_1(t,x) + \psi_2(t,x)|^2 = |\psi_1(t,x)|^2 + |\psi_2(t,x)|^2 + \left(\overline{\psi_1}(t,x)\psi_2(t,x) + \psi_1(t,x)\overline{\psi_2}(t,x)\right),$$
where $\overline{\psi_1}\psi_2 + \psi_1\overline{\psi_2}$ is the interference of the wave functions $\psi_1(t,x)$ and $\psi_2(t,x)$. However, a single electron arrives at a point on the screen, and does not show this intensity. This means that the wave theory failed in solving the problem of double slits. Nevertheless, the intensity $|\psi_1 + \psi_2|^2$ is realized as a statistical distribution of many electrons. In fact, if we shoot electrons one by one, so that only a single electron is in the apparatus, then electrons will arrive at the screen one by one successively, and the distribution density of those electrons shows the existence of ‘interference’ $\overline{\psi_1}\psi_2 + \psi_1\overline{\psi_2}$ statistically. Therefore, this ‘interference’ cannot be the interference of waves in conventional wave theories, but it is a statistical effect, which is caused by the entanglement of motion. We will analyze it with the dynamic theory of stochastic motion.

We first note carefully that there is no ‘entanglement of motion’ in the classical theory of Markov processes, because one can prescribe only initial distributions in the conventional theory of Markov processes. To see this, let us assume that the path of an electron goes through either the first slit or the second slit. If we apply the classical theory of Markov processes to the motion of an electron after the double slits, the Markov process starts at the double slits with an initial distribution density $\frac{1}{2}(\mu_1(x) + \mu_2(x))$, where $\mu_1(x)$ and $\mu_2(x)$ depend on the width of the first and second slits and the distance between the slits. The Markov process will then arrive at the screen with the distribution density $\frac{1}{2}(\mu_1(t,x) + \mu_2(t,x))$, which shows no entanglement. We can understand this as follows: we consider two Markov processes starting at the two slits with the distribution densities $\mu_1(x)$ and $\mu_2(x)$, respectively, and make the superposition of the two Markov processes. Through this superposition in the conventional theory of Markov processes we get no entanglement.
We now analyze the double slits problem with the dynamic theory of stochastic motion of particles, which is different from the conventional theory of Markov processes, as we have seen. In the dynamic theory, the motion of an electron is determined by an evolution function $\phi(t,x)$ and a backward evolution function $\hat\phi(t,x)$. At the double slits they are decomposed into
$$\phi_1(t,x) = e^{R_1(t,x)+S_1(t,x)}, \quad \hat\phi_1(t,x) = e^{R_1(t,x)-S_1(t,x)}, \qquad \phi_2(t,x) = e^{R_2(t,x)+S_2(t,x)}, \quad \hat\phi_2(t,x) = e^{R_2(t,x)-S_2(t,x)}. \tag{55}$$
To describe the motion of an electron after the double slits, we apply the equivalence of a complex-valued exponential function and a pair of real-valued exponential functions, explained in Section 4.8, in which we identify the pair of functions $\{R, S\}$ that appears as the exponents of the complex-valued function in equation (38) and as the exponents of the real-valued functions in equation (39). By using the pairs $\{R_1, S_1\}$ and $\{R_2, S_2\}$ in (55), we first define complex-valued exponential functions by
$$\psi_1(t,x) = e^{R_1(t,x)+iS_1(t,x)}, \qquad \psi_2(t,x) = e^{R_2(t,x)+iS_2(t,x)}, \tag{56}$$
and apply the superposition $\psi(t,x) = \psi_1(t,x) + \psi_2(t,x)$, where we ignore the normality condition for simplicity. We then represent $\psi(t,x)$ in the exponential form as
$$\psi(t,x) = e^{R^*(t,x)+iS^*(t,x)}. \tag{57}$$
Let $t = t_0$ at the double slits. Then $\{R^*(t_0,x), S^*(t_0,x)\}$ is the initial condition at $t = t_0$ for the motion of an electron after the double slits. We define a pair of real-valued exponential functions
$$\phi(t,x) = e^{R^*(t,x)+S^*(t,x)}, \qquad \hat\phi(t,x) = e^{R^*(t,x)-S^*(t,x)}, \tag{58}$$
by using $\{R^*(t,x), S^*(t,x)\}$ in (57). For the motion of an electron after the double slits, we thus get the evolution function $\phi(t,x)$ and the time-reversed evolution function $\hat\phi(t,x)$, which determine a Markov process with an entangled drift vector
$$a(t,x) = \sigma^2\nabla\left(R^*(t,x) + S^*(t,x)\right),$$
in view of (27), hence the Markov process is described by the kinematic equation
$$\frac{\partial u}{\partial t} + \frac{1}{2}\sigma^2\Delta u + a(t,x)\cdot\nabla u = 0,$$
where $\sigma^2 = \hbar/m$ with $\hbar = h/2\pi$. In this way we get an entangled Markov process $\{X_t, Q\}$ which describes the motion of an electron from the double slits to the screen. Then, in view of the basic relation in (43), the distribution density of the entangled process is given by
$$\mu(t,x) = \hat\phi(t,x)\phi(t,x) = \overline{\psi}(t,x)\psi(t,x) = |\psi_1(t,x) + \psi_2(t,x)|^2 = |\psi_1(t,x)|^2 + |\psi_2(t,x)|^2 + \psi_1(t,x)\overline{\psi_2}(t,x) + \overline{\psi_1}(t,x)\psi_2(t,x), \tag{59}$$
where $\psi_1(t,x)$ and $\psi_2(t,x)$ are given by (56). Therefore, we get
$$\mu(t,x) = e^{2R_1(t,x)} + e^{2R_2(t,x)} + 2e^{R_1(t,x)+R_2(t,x)}\cos\left(S_1(t,x) - S_2(t,x)\right). \tag{60}$$
One shoots electrons one by one, with a time lag long enough that only a single electron is in the apparatus, to avoid interactions between electrons. Then electrons arrive on the screen one by one, and the distribution density of the electrons on the screen statistically shows the effect of entangled motion
$$2e^{R_1(t,x)+R_2(t,x)}\cos\left(S_1(t,x) - S_2(t,x)\right). \tag{61}$$
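The chain (56) to (60) can be checked numerically at a fixed time. The following sketch uses hypothetical exponents $R_1, S_1, R_2, S_2$ on a one-dimensional screen coordinate (Gaussian envelopes centered at two slit positions with opposite linear phases); these profiles and all parameter values are illustrative assumptions, not taken from the text. It builds $\{R^*, S^*\}$ from the superposition (57), forms the pair (58), and compares $\hat\phi\phi$ with formula (60) and with the unentangled sum $e^{2R_1} + e^{2R_2}$.

```python
import numpy as np

# Hypothetical exponents at a fixed time t on a 1-D screen coordinate x.
x = np.linspace(-10.0, 10.0, 2001)
d, w, kappa = 2.0, 1.5, 3.0            # slit offset, envelope width, phase slope (all assumed)
R1 = -(x - d) ** 2 / (4.0 * w ** 2)
R2 = -(x + d) ** 2 / (4.0 * w ** 2)
S1 = kappa * x
S2 = -kappa * x

def density_from_superposition(R1, S1, R2, S2):
    """Equations (56)-(59): superpose, extract (R*, S*), return phi_hat * phi."""
    psi = np.exp(R1 + 1j * S1) + np.exp(R2 + 1j * S2)   # (56) and superposition
    R_star = np.log(np.abs(psi))                        # (57): psi = exp(R* + i S*)
    S_star = np.angle(psi)
    phi = np.exp(R_star + S_star)                       # (58)
    phi_hat = np.exp(R_star - S_star)
    return phi_hat * phi                                # (59): equals |psi|^2

def density_formula_60(R1, S1, R2, S2):
    """Equation (60): two single-slit terms plus the entanglement term (61)."""
    return (np.exp(2 * R1) + np.exp(2 * R2)
            + 2 * np.exp(R1 + R2) * np.cos(S1 - S2))

mu_star = density_from_superposition(R1, S1, R2, S2)
mu_60 = density_formula_60(R1, S1, R2, S2)
mu_no_entanglement = np.exp(2 * R1) + np.exp(2 * R2)

print("max |mu_star - mu_60|:", np.max(np.abs(mu_star - mu_60)))
print("max size of term (61):", np.max(np.abs(mu_60 - mu_no_entanglement)))
```

The first printed number is at rounding level, while the second shows that the term (61) is of order one for these profiles, which is what produces the stripes.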
This should not be confused with the interference of waves. Let me remind you that we have discussed the double slits problem in the framework of the dynamic theory of random motion, which is a particle theory, not a wave theory. As remarked already, the wave theory cannot explain the double slits experiment.

The problem of double slits was for a long time a thought experiment (Gedankenexperiment). It has, however, been realized experimentally by Tonomura (cf. e.g. Tonomura (1994)), and his experiment clearly shows that the stripe-like pattern of the distribution density of many electrons in the double slits problem is not a wave phenomenon but a purely statistical phenomenon. I emphasize that if an electron were a wave and the stripe-like pattern were caused by the wave, then the pattern would have to appear already with a single electron. But this does not occur. Therefore, Tonomura’s double slits experiment definitively denies the claim in Schrödinger’s wave mechanics that an electron is a wave, and also the claim in quantum mechanics that an electron has the wave property. The experiment justifies the dynamic theory of stochastic motion.

We have shown that the stripe-like pattern is not caused by the so-called wave ‘property’ of an electron, but is a ‘phenomenon’ produced by the motion of an electron, i.e., by the ‘entanglement of the motion’ of an electron. Property and phenomenon are different notions, and we should distinguish them clearly. We notice moreover that the existence of the exponent of motion $S(t,x)$ plays a decisive role in equation (61), which is the formula for the effect of entanglement of motion. This fact had not been recognized in the conventional theory of Markov processes. We carefully note that the exponent of motion $S(t,x)$ was found through the analysis of time reversal of Markov processes and the duality relation between semi-groups of a space-time Markov process and its time-reversed process, cf. Nagasawa (1993, 2000).
Acknowledgements: I would like to express my gratitude to Professor Bjarne S. Jensen of Copenhagen Business School, who kindly arranged for me to give a lecture at the University of Copenhagen in 2004 on the theme explained in this exposition.

References:

Aspect, A., Dalibard, J., and Roger, G. (1982) “Experimental Test of Bell’s Inequalities Using Time-Varying Analyzers.” Physical Review Letters 49: 1804–1807.

Bell, J.S. (1964) “On the Einstein Podolsky Rosen Paradox.” Physics 1: 195–202.

Born, M. (1926) “Zur Quantenmechanik der Stossvorgänge.” Zeitschrift für Physik 37: 863–867.

Condon, E.U. (1962) “60 Years of Quantum Physics.” Physics Today 15: 37–49.

Einstein, A., Podolsky, B., and Rosen, N. (1935) “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?” Physical Review 47: 777–780.

Jensen, B.S., and Richter, M. (2007) “Stochastic Neoclassical One-Sector and Two-Sector Growth Models with Uncertainty in Continuous Time.” In this volume.

Jensen, B.S., Wang, C., and Johnsen, J. (2007) “Moment Evolution of Gaussian and Geometric Wiener Diffusion – Derived by the Itô Lemma and Kolmogorov Equation.” In this volume.

Kolmogoroff, A. (1931) “Über die Analytischen Methoden in Wahrscheinlichkeitsrechnung.” Mathematische Annalen 104: 415–458.

Kolmogoroff, A. (1933) “Grundbegriffe der Wahrscheinlichkeitsrechnung.” Ergebnisse der Mathematik 2: Heft 3. Springer-Verlag.

McKean, H.P. (1960) “The Bessel motion and a singular integral equation.” Memoir of College of Science, University of Kyoto, Series A, Mathematics 33: 317–322.

Nagasawa, M. (1993) Schrödinger Equations and Diffusion Theory. Birkhäuser Verlag, Basel, Boston, Berlin.

Nagasawa, M. (1997) “On the locality of hidden variable theories in quantum physics.” Chaos, Solitons and Fractals 8: 1773–1792.

Nagasawa, M. (2000) Stochastic Processes in Quantum Physics. Birkhäuser Verlag, Basel, Boston, Berlin.

Nagasawa, M. (2002) “On quantum particles.” Chaos, Solitons and Fractals 13: 1393–1405.

Nagasawa, M. (2002) “A note on a remark by Landau regarding a charged particle in a magnetic field.” Chaos, Solitons and Fractals 14: 1065–1070.

Nagasawa, M. (2003) Schrödinger’s Dilemma and Dream, Stochastic Processes and Wave Mechanics (in Japanese). Morikita Shuppan (publishing), Tokyo.

Nagasawa, M., and Schröder, K. (1997) “A note on the locality of Gudder’s hidden-variable theory.” Chaos, Solitons and Fractals 8: 1793–1805.

Nagasawa, M., and Tanaka, H. (1999a) “Stochastic differential equations of pure-jumps in relativistic quantum theory.” Chaos, Solitons and Fractals 10: 1265–1280.

Nagasawa, M., and Tanaka, H. (1999b) “Time dependent subordination and Markov processes with jumps.” Séminaire de Probabilités 34: 257–288. Lecture Notes in Mathematics 1729, Springer.

Nagasawa, M., and Tanaka, H. (1999c) “The principle of variation for relativistic quantum particles.” Séminaire de Probabilités 35: 1–27. Lecture Notes in Mathematics 1775, Springer.

Pauling, L., and Wilson, E.B. (1935) Introduction to Quantum Mechanics with Applications to Chemistry. McGraw-Hill Book Co. Inc., New York.

Schrödinger, E. (1926a) “Quantisierung als Eigenwertproblem (1. Mitteilung).” Annalen der Physik 79: 336–376.

Schrödinger, E. (1926, II) “Über das Verhältnis der Heisenberg–Born–Jordanschen Quantenmechanik zu der meinen.” Annalen der Physik 79: 734–756.

Schrödinger, E. (1931) “Über die Umkehrung der Naturgesetze.” Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physikalisch-Mathematische Klasse, 144–153.

Tonomura, A. (1994) Microscopic world visualized with electron beam holography (in Japanese). Nihon Hyoronsha, Tokyo.
Part II: Stochastic Dynamics of Basic Growth Models and Time Delays
Chapter 5 Stochastic One-Sector and Two-Sector Growth Models in Continuous Time
Bjarne S. Jensen
University of Southern Denmark and Copenhagen Business School

Martin Richter
Copenhagen Business School
Danske Research, Danske Bank, Copenhagen
5.1 Introduction
To set the stage for the main content, exposition, and organisation of our subject matter (stochastic neoclassical models of capital accumulation in continuous time with steady-state or persistent growth per capita), we will review some fundamental issues of “stochastic processes” [parameterized collections (sequences) of stochastic variables] that were first raised in the seminal papers of Mirman (1972, 1973), Brock and Mirman (1972, 1973), Merton (1975), and Bourguignon (1974). Since the concept of a steady-state equilibrium has played an important role in both the positive and the optimal theory of economic growth, Mirman asked whether the same questions can be posed in random growth models as in deterministic growth models: “In what sense should one even discuss the random evolution of the system? How does one define a concept in the random case analogous to the deterministic steady state? Added to these questions are the usual questions of existence, uniqueness, and stability for the random analogue of the steady state”, Mirman (1973, p. 220).
Then he redefined the concept of a steady state in a stochastic sense: “This is done by using the distribution function of possible capital labor ratios generated by the stochastic growth process. Having defined the steady state in terms of a distribution function, we then show that, for each admissible policy, the corresponding stochastic system has a unique steady state distribution, which is a degenerate distribution in the deterministic theory. Moreover, it is shown that this unique steady state distribution is stable in the sense that the set of possible states of the system converges over time to a well-defined set, the analogue of the deterministic steady state, which supports the unique steady state distribution. Finally, it is shown that the sequence of distributions converges to a unique steady state distribution”, Mirman (1973, p. 220).

In implementing this research program and in the mathematical analysis, Mirman (1972, p. 224) first assumed that his random variables A(t), the “technology shocks”, were independent and identically distributed in discrete time, and at any time independent of the capital-labor ratios, k(t); further, it was assumed that the shocks A(t) were always strictly positive and finite (bounded away from both zero and infinity). To further simplify the analysis and avoid the possibility of a steady state at either zero or infinity, the technology (production function) was assumed to satisfy the simple derivative conditions of Inada (satisfied by the CD technology). With these assumptions, Mirman used mathematical techniques similar to those in the theory of Markov chains to establish the stochastic generalization of the Solow growth model by showing the existence, uniqueness, and stability of stationary probability measures.
With his assumptions, he proved that a stationary measure will always exist, and that the stationary measure will be unique if the recurrent states all communicate and admit no cyclically moving subsets. Stability meant that iterates (sequences) of the transition probability tend to a unique asymptotic (time-invariant) probability measure (distribution). The tools of Markov processes were used to demonstrate such stability (convergence) to the unique stationary (steady-state) distribution of the capital-labor ratio. In short, particular neoclassical assumptions and “techniques from positive deterministic growth theory were combined with the tools of Markov processes to achieve a positive theory of stochastic economic growth”, Mirman (1973, p. 230).

Mirman (1973) had assumed that his random variables A(t) (technology shocks), which influence the production process, are strictly positive and finite. “More precisely, for any capital stock, output can neither be arbitrarily large nor arbitrarily small (even with arbitrarily small probability)”, but “it was not clear where the bounds of possible random effects should be set”, Mirman (1972, p. 271). Next, he addressed this problem of arbitrarily large or small outputs. Still, the existence of a stationary measure (steady-state distribution) was demonstrated with a fixed-point argument, Mirman (1972, p. 279): “However, it is possible that there exists a positive probability of extinction or positive probability of an infinite capital stock in the stochastically generalized notion of a steady state”. Mirman (1972, p. 271) studied conditions “for the existence of stationary measures having zero probability at zero and infinity”, which is analogous to imposing the Inada conditions on the production process. For recent advances in the field of stochastic neoclassical growth models in discrete time, see Schenk-Hoppe (2002), Lau (2002).

The first important extension of the stochastic study by Mirman and Brock of the discrete-time, neoclassical one-sector growth model to continuous-time stochastic processes was made by Merton (1975) and Bourguignon (1974). Besides existence and uniqueness, much more specific (parametric) structures of the steady-state (asymptotic, limit) distributions for the capital-labor ratio and other variables could now be examined.
Thus, Merton (1975, 1990) derived the density functions and the first and second moments that would be obtained in a steady state, and rigorously compared the results and biases (in expected value) of deterministic (certainty) and stochastic modelling for a stochastic growth model with a CD production function. The source of uncertainty was not positive technology shocks; instead, the uncertainty element in Merton (1975) affected the evolution of the labor (population) stock, which he assumed followed a geometric Wiener process. With the latter, the boundary problems at zero and infinity for the capital-labor ratio were essentially absent from the first stochastic Solow growth model in continuous time.
In the neoclassical one-sector growth model, the sources of uncertainty were extended by Bourguignon (1974) to saving and depreciation rates, and the more general CES technology was adopted. Hence, the boundary problem for the capital-labor ratio naturally came into focus, and the result was that “uncertainty can make the neoclassical model closer to the Harrod-Domar-type models of growth in introducing the possibility of a collapse of the economy”, Bourguignon (1974, p. 142). In particular, uncertainty in the saving rate posed (without further parameter restrictions) a serious problem of absorption at the lower boundary (k = 0). Since 1975, the consequence of such critical boundary problems has been that this field of research in continuous-time stochastic growth models has not matured and has in fact mostly disappeared from the economic literature.

A new start is needed and is attempted here, partly by first resolving the older methodological problems with absorbing boundaries and steady states, and partly by extending the stochastic neoclassical framework to one-sector and two-sector models with parameters generating endogenous (persistent) per capita growth. Simulations of sample paths and asymptotic density functions will illustrate our Theorems and the properties of the parametric stochastic processes.

The study of deterministic general equilibrium dynamics in two-sector and multi-sector growth models has been reviewed and extended in Jensen (2003) and Jensen and Larsen (2005), with emphasis on factor allocation, output composition, and the dualities for commodity and factor prices. Sample paths of the stochastic two-sector analogue are discussed and demonstrated here.

For our purposes and as benchmarks, production functions of the CD and CES form are used in both the stochastic one-sector and two-sector growth models. These technologies must therefore first be introduced and adequately described.

5.2 Neoclassical technologies and CES forms
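The sector technologies are, in stochastic one-sector and two-sector dynamics, described by nonnegative, smooth, concave, homogeneous production functions, $F_i(L_i, K_i)$, $i = 1, 2$, with constant returns to scale in labor and capital,
$$Y_i = F_i(L_i, K_i) = L_i F_i(1, k_i) \equiv L_i f_i(k_i) \equiv L_i y_i, \quad L_i \neq 0; \qquad F_i(0,0) = 0 \tag{1}$$
where the function $f_i(k_i)$ is strictly concave and monotonically increasing in the capital-labor ratio $k_i \in [0, \infty)$, i.e.,
$$\forall k_i > 0: \quad f_i'(k_i) = df_i(k_i)/dk_i > 0, \qquad f_i''(k_i) = d^2 f_i(k_i)/dk_i^2 < 0 \tag{2}$$
The sectorial output elasticities, $\epsilon_{L_i}$, $\epsilon_{K_i}$, $\epsilon_i$ (with respect to marginal and proportional factor variation) are, cf. (1),
$$\epsilon_{L_i} \equiv E(Y_i, L_i) \equiv \frac{\partial Y_i}{\partial L_i}\frac{L_i}{Y_i} = \frac{MP_{L_i}}{AP_{L_i}} = 1 - \frac{k_i f_i'(k_i)}{f_i(k_i)} > 0, \quad k_i \neq 0 \tag{3}$$
$$\epsilon_{K_i} \equiv E(Y_i, K_i) \equiv \frac{\partial Y_i}{\partial K_i}\frac{K_i}{Y_i} = \frac{MP_{K_i}}{AP_{K_i}} = \frac{k_i f_i'(k_i)}{f_i(k_i)} = E(y_i, k_i) > 0 \tag{4}$$
$$\epsilon_i \equiv \epsilon_{L_i} + \epsilon_{K_i} = 1. \tag{5}$$
At any point on the isoquants, the marginal rates of technical substitution, $\omega_i(k_i)$, are, by (2), positive monotonic functions,
$$\omega_i(k_i) = \frac{MP_{L_i}}{MP_{K_i}} = \frac{f_i(k_i)}{f_i'(k_i)} - k_i = \frac{\epsilon_{L_i}}{\epsilon_{K_i}}\,k_i > 0, \quad \forall k_i > 0. \tag{6}$$

CES Production Functions

General CES forms of $F_i(L_i, K_i)$, (1), with $\gamma_i > 0$, $0 < a_i < 1$, $\sigma_i > 0$, are the CD form ($\sigma_i = 1$)
$$Y_i = F_i(L_i, K_i) = \gamma_i L_i^{1-a_i} K_i^{a_i} = L_i \gamma_i k_i^{a_i} \equiv L_i f_i(k_i) \tag{7}$$
and, for $\sigma_i \neq 1$,
$$Y_i = F_i(L_i, K_i) = \gamma_i\left[(1-a_i)L_i^{\frac{\sigma_i-1}{\sigma_i}} + a_i K_i^{\frac{\sigma_i-1}{\sigma_i}}\right]^{\frac{\sigma_i}{\sigma_i-1}} \tag{8}$$
$$= L_i \gamma_i\left[(1-a_i) + a_i k_i^{(\sigma_i-1)/\sigma_i}\right]^{\sigma_i/(\sigma_i-1)} \equiv L_i f_i(k_i) \tag{9}$$
with the respective derivatives
$$f_i'(k_i) = \gamma_i a_i k_i^{a_i-1}, \qquad f_i'(k_i) = \gamma_i a_i\left[a_i + (1-a_i)k_i^{-(\sigma_i-1)/\sigma_i}\right]^{1/(\sigma_i-1)} \tag{10}$$
The limits of $f_i(k_i)$ and $f_i'(k_i)$ become ($\forall i: \sigma_i \gtrless 1 \Rightarrow a_i^{\sigma_i/(\sigma_i-1)} \lessgtr 1$),
$$\sigma_i < 1: \quad \begin{cases} \lim_{k_i\to 0} f_i(k_i) = 0, & \lim_{k_i\to\infty} f_i(k_i) = \gamma_i(1-a_i)^{\frac{\sigma_i}{\sigma_i-1}} \\ \lim_{k_i\to 0} f_i'(k_i) = \gamma_i a_i^{\frac{\sigma_i}{\sigma_i-1}}, & \lim_{k_i\to\infty} f_i'(k_i) = 0 \end{cases} \tag{11}$$
$$\sigma_i = 1: \quad \begin{cases} \lim_{k_i\to 0} f_i(k_i) = 0, & \lim_{k_i\to\infty} f_i(k_i) = \infty \\ \lim_{k_i\to 0} f_i'(k_i) = \infty, & \lim_{k_i\to\infty} f_i'(k_i) = 0 \end{cases} \tag{12}$$
$$\sigma_i > 1: \quad \begin{cases} \lim_{k_i\to 0} f_i(k_i) = \gamma_i(1-a_i)^{\frac{\sigma_i}{\sigma_i-1}}, & \lim_{k_i\to\infty} f_i(k_i) = \infty \\ \lim_{k_i\to 0} f_i'(k_i) = \infty, & \lim_{k_i\to\infty} f_i'(k_i) = \gamma_i a_i^{\frac{\sigma_i}{\sigma_i-1}} \end{cases} \tag{13}$$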
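The intensive forms (7) and (9), the derivative (10), and the limits (11) can be checked numerically. The sketch below is illustrative only: the function names `ces_f`, `ces_fprime`, `elasticities` and the parameter values ($\gamma = 1$, $a = 0.3$, $\sigma = 0.5$) are assumptions made here, not values used in the chapter.

```python
import numpy as np

def ces_f(k, gamma=1.0, a=0.3, sigma=0.5):
    """Intensive production function f(k): CD form (7) if sigma == 1, else CES form (9)."""
    if sigma == 1.0:
        return gamma * k ** a
    rho = (sigma - 1.0) / sigma
    return gamma * ((1.0 - a) + a * k ** rho) ** (1.0 / rho)

def ces_fprime(k, gamma=1.0, a=0.3, sigma=0.5):
    """Derivative f'(k) as written in (10)."""
    if sigma == 1.0:
        return gamma * a * k ** (a - 1.0)
    rho = (sigma - 1.0) / sigma
    return gamma * a * (a + (1.0 - a) * k ** (-rho)) ** (1.0 / (sigma - 1.0))

def elasticities(k, **params):
    """Output elasticities (3)-(4) and the marginal rate of substitution (6)."""
    eps_K = k * ces_fprime(k, **params) / ces_f(k, **params)
    eps_L = 1.0 - eps_K
    omega = ces_f(k, **params) / ces_fprime(k, **params) - k
    return eps_L, eps_K, omega

# Compare (10) with a central finite difference at k = 2.
k, h = 2.0, 1e-6
fd = (ces_f(k + h) - ces_f(k - h)) / (2.0 * h)
print("f'(2) from (10):", ces_fprime(k), " finite difference:", fd)

# Limits (11) for sigma < 1, using the default parameters.
print("f(k) for large k:", ces_f(1e8), " limit in (11):", (1.0 - 0.3) ** (0.5 / (0.5 - 1.0)))
print("f'(k) for small k:", ces_fprime(1e-8), " limit in (11):", 0.3 ** (0.5 / (0.5 - 1.0)))

eps_L, eps_K, omega = elasticities(k)
print("eps_L + eps_K:", eps_L + eps_K, " omega:", omega)
```

Calling the same functions with `sigma=1.0` or `sigma=2.0` gives the cases (12) and (13) in the same way.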
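For the CES technologies, the monotonic relations between marginal rates of substitution, factor proportions, and output elasticities are, cf. (8-10),
$$\omega_i = \frac{1-a_i}{a_i}\,k_i^{1/\sigma_i}, \qquad k_i = \frac{1}{c_i}\,[\omega_i]^{\sigma_i}, \qquad c_i = \left[\frac{1-a_i}{a_i}\right]^{\sigma_i}, \qquad i = 1, 2. \tag{14}$$
$$\epsilon_{K_i} = \left[1 + \frac{1-a_i}{a_i}\,k_i^{\frac{1-\sigma_i}{\sigma_i}}\right]^{-1} = \frac{1}{1 + c_i\omega_i^{1-\sigma_i}}, \qquad \epsilon_{L_i} = \frac{c_i\omega_i^{1-\sigma_i}}{1 + c_i\omega_i^{1-\sigma_i}} \tag{15}$$
With two-sector models and CES technologies, it is apparent from (14) that sectorial factor ratio (“intensity”) reversals are avoided if and only if $\sigma_1 = \sigma_2$ and $a_1 \neq a_2$. Hence, with $\sigma_1 \neq \sigma_2$, there will be a reversal point, $(k_i, \omega_i) = (\bar k, \bar\omega)$:
$$\bar k = \left[\frac{a_1(1-a_2)}{a_2(1-a_1)}\right]^{\frac{\sigma_1\sigma_2}{\sigma_2-\sigma_1}} = \left[\frac{c_2^{\sigma_1}}{c_1^{\sigma_2}}\right]^{\frac{1}{\sigma_2-\sigma_1}}, \qquad \bar\omega = \left[\frac{c_2}{c_1}\right]^{\frac{1}{\sigma_2-\sigma_1}} \tag{16}$$

5.3 Stochastic one-sector growth models

5.3.1 Introduction of stochastic elements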
The standard deterministic neoclassical one-sector growth model is described by the ordinary differential equations (ODE), cf. (1), dL/dt dK/dt dk/dt
L˙ = Ln ≡ K˙ = Lsf (k) − δK ≡ k˙ = sf (k) − (n + δ)k ≡
(17) (18) (19)
This general model becomes, with uncertainty (stochastic elements i ) in the growth rate of labor n, the gross saving rate s, and the capital depreciation rate δ, L˙ = L(n + β1 1 ) K˙ = L(s + φ3 (k) 3 )f (k) − (δ + β2 2 )K
(20) (21)
where βi ≥ 0, and (1 , 2 , 3 ) are “white noise” (stochastic process with a constant spectral density function), related to Wiener processes (w1 , w2 , w3 ) with the correlation structure d wi , wj = ρij dt,
ρii = 1,
i, j = 1, 2, 3
(22)
and wi , wj is the quadratic variation process for the components of the Wiener process, Karatzas and Shreve (1991), Øksendal (2003). For 172
Stochastic One-Sector and Two-Sector Models the formal connection between Wiener processes and “white noise”, see Holden et. al (1996, chap.3). The function φ3 is, as later explained, here conveniently chosen (to avoid boundary problems at zero) as φ3 (k) = β3 tanh(λ3 k),
k ∈ [0, ∞),
φ3 (0) = 0,
φ3 (∞) = β3 (23)
For the labor and capital stock, the associated stochastic differential equations (SDE) to (20–21) are given by dL = Ln dt + Lβ1 dw1 dK = (sLf (k) − δK) dt − β2 K dw2 + Lf (k)φ3 (k) dw3
(24) (25)
The drift and diffusion coefficients of the stochastic dynamic system (24–25) are homogeneous functions of degree one in the state variables L and K. The homogeneity of degree one allows us to reduce the two-dimensional stochastic system (24–25) to one-dimensional stochastic dynamics of the capital-labor ratio.

As an alternative to the uncertainty in the saving rate, we also consider uncertainty (stochastic element $\varepsilon_4$) in technology, more precisely, uncertainty in the total productivity parameter ($\gamma$) of the production function f(k), i.e.,
$$\dot{K} = Ls(\gamma + \phi_4(k)\varepsilon_4)\,[f(k)/\gamma] - (\delta + \beta_2\varepsilon_2)K \tag{26}$$
where $\phi_4$ is similar to $\phi_3$ defined in equation (23). Hence, with (26), the stochastic differential equation (25) is replaced by
$$dK = (sLf(k) - \delta K)\,dt - \beta_2 K\,dw_2 + Ls\,[f(k)/\gamma]\,\phi_4(k)\,dw_4 \tag{27}$$
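As an illustration of how the system (24)–(25) can be simulated directly, the following is a minimal sketch of an Euler scheme with correlated Wiener increments. It assumes a Cobb-Douglas technology f(k) = γk^a; the function names, the parameter values, and the use of a Cholesky factor to impose the correlations (22) are illustrative choices, not taken from the chapter.

```python
import math
import random


def cholesky3(rho12, rho13, rho23):
    """Lower-triangular C with C C^T equal to the 3x3 correlation matrix of (22)."""
    c22 = math.sqrt(1.0 - rho12 ** 2)
    c32 = (rho23 - rho13 * rho12) / c22
    c33 = math.sqrt(1.0 - rho13 ** 2 - c32 ** 2)
    return [[1.0, 0.0, 0.0], [rho12, c22, 0.0], [rho13, c32, c33]]


def simulate_LK(L0, K0, T, steps, n, s, delta, gamma, a,
                beta1, beta2, beta3, lam3, rho=(0.0, 0.0, 0.0), seed=1):
    """Euler scheme for (24)-(25), assuming Cobb-Douglas f(k) = gamma * k**a."""
    rng = random.Random(seed)
    C = cholesky3(*rho)
    dt = T / steps
    L, K = L0, K0
    path = [(L, K)]
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated Wiener increments: dw = sqrt(dt) * C z
        dw = [math.sqrt(dt) * sum(C[i][j] * z[j] for j in range(3))
              for i in range(3)]
        k = K / L
        f = gamma * k ** a
        phi3 = beta3 * math.tanh(lam3 * k)                     # (23)
        dL = L * n * dt + L * beta1 * dw[0]                    # (24)
        dK = ((s * L * f - delta * K) * dt
              - beta2 * K * dw[1] + L * f * phi3 * dw[2])      # (25)
        L, K = L + dL, K + dK
        path.append((L, K))
    return path


# Hypothetical parameter values, chosen only for illustration.
path = simulate_LK(L0=1.0, K0=1.0, T=50.0, steps=5000, n=0.02, s=0.2,
                   delta=0.05, gamma=1.0, a=0.4, beta1=0.02, beta2=0.03,
                   beta3=0.05, lam3=1.0, rho=(0.5, 0.2, 0.1))
L_T, K_T = path[-1]
print(f"L(T) = {L_T:.4f}, K(T) = {K_T:.4f}, k(T) = {K_T / L_T:.4f}")
```

The ratio K/L along the simulated path is a sample path of the capital-labor ratio, so no separate equation for k is needed for simulation purposes.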
The stochastic differential equations (24)–(25), or (24) and (27), represent a two-dimensional stochastic system, driven by a three-dimensional Wiener process. For the purposes of simulation, i.e., computing the sample paths L(t, ω) and K(t, ω) or the ratio k(t, ω) = K(t, ω)/L(t, ω), it is sufficient to use the equations (24), (25) or (24), (27). In fact, the SDE (24) is the well-known geometric Wiener process. There is in general no closed-form expression for the solutions (sample paths) K(t, ω) or k(t, ω). However, to examine precisely the absorbing boundary conditions and stationarity conditions for the diffusion process k(t), it is necessary to obtain an analytical expression for k(t) as driven by a particular one-dimensional Wiener process. Fortunately, as both drift and diffusion coefficients are homogeneous functions of degree one in K and L, we can describe k(t) analytically by a one-dimensional SDE, driven by a one-dimensional Wiener process, where the relevant drift coefficient and diffusion coefficient now need to be exactly determined, cf. Jensen and Wang (1999, Lemma 1).

5.3.2 The SDE of the capital-labor ratio
Theorem 1. The stochastic neoclassical dynamics for the capital-labor ratio k(t) of (24–25) is a diffusion process given by the SDE,
$$dk = -(K/L^2)\,dL + (1/L)\,dK + (K/L^3)\,dL^2 - (1/L^2)\,dL\,dK \tag{28}$$
$$\phantom{dk} = \big\{[s - \rho_{13}\beta_1\phi_3(k)]f(k) - [n + \delta - (\beta_1^2 + \rho_{12}\beta_1\beta_2)]\,k\big\}\,dt - \beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3 \tag{29}$$
The SDE (29) can in its domain be given in the compact form,
$$dk = a(k)\,dt + b(k)\,dw, \qquad k(t) \in (0, \infty) \tag{30}$$
with the drift coefficient,
$$a(k) = \bar{s}(k)f(k) - \Theta k \tag{31}$$
$$\bar{s}(k) = s - \rho_{13}\beta_1\phi_3(k), \qquad \Theta = n + \delta - (\beta_1^2 + \rho_{12}\beta_1\beta_2) \tag{32}$$
and the diffusion coefficient,
$$b^2(k) = \beta^2k^2 + \phi_3(k)^2f(k)^2 - \rho\,\phi_3(k)f(k)\,k \tag{33}$$
$$\beta^2 = \beta_1^2 + \beta_2^2 + 2\rho_{12}\beta_1\beta_2, \qquad \rho = 2(\rho_{13}\beta_1 + \rho_{23}\beta_2) \tag{34}$$
Proof: Itô's Lemma: Let $X(t) \in R^n$ be a general diffusion process, and let F(X) be an arbitrary $C^2$ map from $R^n \to R$. Then
$$dF(X) = F_x^T\,dX + \tfrac{1}{2}\,dX^T F_{xx}\,dX \tag{35}$$
i.e., F(X), determined by the diffusion process X(t), is again a diffusion process, where $F_x$ represents the partial derivatives of the function F(x) with respect to x, $F_{xx}$ represents the Hessian matrix of F(x), and where $(dw_i)^2 = dt$ for all i; $dw_i\,dw_j = \rho_{ij}\,dt$ for $i \ne j$, cf. (22); $(dt)^2 = 0$; $dt\,dw_i = 0$. Hence, with
$$X = (L, K)^T, \qquad dX = (dL, dK)^T, \qquad F(X) = K/L \equiv k \tag{36}$$
$$F_X = \begin{pmatrix} \partial F/\partial L \\ \partial F/\partial K \end{pmatrix} = \begin{pmatrix} -K/L^2 \\ 1/L \end{pmatrix}, \qquad F_{XX} = \begin{pmatrix} \partial^2F/\partial L^2 & \partial^2F/\partial L\,\partial K \\ \partial^2F/\partial K\,\partial L & \partial^2F/\partial K^2 \end{pmatrix} = \begin{pmatrix} 2K/L^3 & -1/L^2 \\ -1/L^2 & 0 \end{pmatrix} \tag{37}$$
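we get, cf. (35–37),
$$dk = -\frac{K}{L^2}\,dL + \frac{1}{L}\,dK + \frac{1}{2}\Big[\frac{2K}{L^3}\,dL^2 - 2\,\frac{1}{L^2}\,dL\,dK + 0\cdot dK^2\Big] \tag{38}$$
$$\phantom{dk} = -\frac{K}{L^2}\,dL + \frac{1}{L}\,dK + \frac{K}{L^3}\,dL^2 - \frac{1}{L^2}\,dL\,dK \tag{39}$$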
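which is (28). Inserting (24) and (25) into (39) gives
$$
\begin{aligned}
dk &= -\frac{K}{L^2}\,L(n\,dt + \beta_1\,dw_1) + \frac{1}{L}\,L\big[\{sf(k) - \delta k\}\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3\big] \\
&\quad + \frac{K}{L^3}\,L^2(n\,dt + \beta_1\,dw_1)^2 - \frac{1}{L^2}\,L(n\,dt + \beta_1\,dw_1)\,L\big[\{sf(k) - \delta k\}\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3\big] \\
&= -k(n\,dt + \beta_1\,dw_1) + \{sf(k) - \delta k\}\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3 \\
&\quad + k(n\,dt + \beta_1\,dw_1)^2 - (n\,dt + \beta_1\,dw_1)\big[\{sf(k) - \delta k\}\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3\big] \\
&= -nk\,dt - \beta_1 k\,dw_1 + sf(k)\,dt - \delta k\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3 \\
&\quad + k\beta_1^2\,dt - \beta_1\,dw_1\big[-\beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3\big] \\
&= -nk\,dt - \beta_1 k\,dw_1 + sf(k)\,dt - \delta k\,dt - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3 \\
&\quad + k\beta_1^2\,dt + \rho_{12}\beta_1\beta_2 k\,dt - \rho_{13}\beta_1\phi_3(k)f(k)\,dt \\
&= \big\{sf(k) - \rho_{13}\beta_1\phi_3(k)f(k) - nk - \delta k + \beta_1^2 k + \rho_{12}\beta_1\beta_2 k\big\}\,dt \\
&\quad - \beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3
\end{aligned}
\tag{40}
$$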
which establishes (29). Finally, using Lévy's characterization, the local martingale term, $-\beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3$, can be simplified to $b(k)\,dw$, where w is a new one-dimensional Wiener process. The diffusion coefficient b(k) can be calculated by determining the quadratic variation of $-\beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3$. Hence, we get
$$
\begin{aligned}
b(k)\,dw &\equiv -\beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3 \\
b^2(k)\,dt &\equiv \big[-\beta_1 k\,dw_1 - \beta_2 k\,dw_2 + \phi_3(k)f(k)\,dw_3\big]^2 \\
b^2(k)\,dt &= \beta_1^2k^2\,dt + \beta_2^2k^2\,dt + \phi_3(k)^2f(k)^2\,dt + 2\beta_1\beta_2k^2\,[dw_1, dw_2] \\
&\quad - 2\beta_1 k\,\phi_3(k)f(k)\,[dw_1, dw_3] - 2\beta_2 k\,\phi_3(k)f(k)\,[dw_2, dw_3] \\
b^2(k)\,dt &= \beta_1^2k^2\,dt + \beta_2^2k^2\,dt + \phi_3(k)^2f(k)^2\,dt + 2\rho_{12}\beta_1\beta_2k^2\,dt \\
&\quad - 2\rho_{13}\beta_1 k\,\phi_3(k)f(k)\,dt - 2\rho_{23}\beta_2 k\,\phi_3(k)f(k)\,dt \\
b^2(k) &= (\beta_1^2 + \beta_2^2 + 2\rho_{12}\beta_1\beta_2)\,k^2 + \phi_3(k)^2f(k)^2 - 2(\rho_{13}\beta_1 + \rho_{23}\beta_2)\,\phi_3(k)f(k)\,k
\end{aligned}
\tag{41}
$$
which is succinctly summarized in (33) together with (34). □
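To make Theorem 1 concrete, the sketch below codes the drift (31)–(32) and the diffusion coefficient (33)–(34) and approximates a sample path of the scalar SDE (30) with an Euler scheme. It assumes a Cobb-Douglas technology f(k) = γk^a; the function names and parameter values are hypothetical and serve only as an illustration.

```python
import math
import random


def make_coefficients(s, n, delta, gamma, a, beta1, beta2, beta3, lam3,
                      rho12=0.0, rho13=0.0, rho23=0.0):
    """Return drift a(k) and diffusion b(k) of (30)-(34), assuming f(k) = gamma*k**a."""
    theta = n + delta - (beta1 ** 2 + rho12 * beta1 * beta2)          # (32)
    beta_sq = beta1 ** 2 + beta2 ** 2 + 2.0 * rho12 * beta1 * beta2   # (34)
    rho = 2.0 * (rho13 * beta1 + rho23 * beta2)                       # (34)

    def phi3(k):
        return beta3 * math.tanh(lam3 * k)                            # (23)

    def drift(k):
        f = gamma * k ** a
        return (s - rho13 * beta1 * phi3(k)) * f - theta * k          # (31)

    def diffusion(k):
        f = gamma * k ** a
        b_sq = (beta_sq * k ** 2 + (phi3(k) * f) ** 2
                - rho * phi3(k) * f * k)                              # (33)
        return math.sqrt(max(b_sq, 0.0))  # guard against rounding below zero

    return drift, diffusion


def euler_path(drift, diffusion, k0, T, steps, seed=1):
    """Euler approximation of dk = a(k) dt + b(k) dw on [0, T]."""
    rng = random.Random(seed)
    dt = T / steps
    k = k0
    path = [k]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        k = k + drift(k) * dt + diffusion(k) * dw
        path.append(k)
    return path


# Hypothetical parameter values, chosen only for illustration.
drift, diffusion = make_coefficients(s=0.2, n=0.02, delta=0.05, gamma=1.0,
                                     a=0.4, beta1=0.02, beta2=0.03,
                                     beta3=0.05, lam3=1.0)
path = euler_path(drift, diffusion, k0=1.0, T=200.0, steps=20000)
print(f"k(T) = {path[-1]:.4f}, min k = {min(path):.4f}")
```

With all βi set to zero the scheme reduces to the Euler method for the deterministic equation (19), which gives a simple check of the implementation.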
The sample path (trajectory) of the process k(t), (30–34), is formally given by
$$k(t; \omega) = k(0) + \int_0^t a(k[u; \omega])\,du + \int_0^t b(k[u; \omega])\,dw_u(\omega) \tag{42}$$
where k(0) is a fixed initial condition, and ω symbolizes a particular realization of the Wiener process. The sample path (42) is in this paper approximated by the Euler scheme, Kloeden and Platen (1995).

Remark 1. Note that the correlation $\rho_{23}$ does not enter the drift coefficient in (31–32), because the coefficient of $dK^2$ is zero, cf. (37–38). Furthermore, note that our introduction of uncertainties (24–25) implies that the deterministic accumulation parameters, s, n, δ, only appear in the drift coefficient, (31–32), but not in the diffusion coefficient, (33–34). Some diffusion parameters $\beta_i$, $\rho_{ij}$, however, may appear in the drift coefficient, (31–32).

From the derived diffusion process of the capital-labor ratio k(t) in Theorem 1, we can now analyze the evolution of the one-sector economy, with emphasis on long-run behavior (asymptotic properties). The drift and diffusion coefficients, however, govern the evolution only at interior points of the state space. To fully define a diffusion process, the behavior at any boundary points requires separate specification.

For our purposes of studying the long-run evolution of nontrivial states, we need to carefully examine the conditions that will make the boundaries, k = 0 or k = ∞, inaccessible for any finite time (t < ∞). If a(0) = 0 and b(0) = 0, as is often seen, cf. (30–34) and (11–13), then k(t) = 0 is an absorbing boundary, i.e., the sample paths k(t) remain at the zero position once it is attained. Even if a(0) ≠ 0 or b(0) ≠ 0, and hence k = 0 is not an absorbing state, we cannot admit negative state values of k(t), i.e., a viable (working) economic diffusion model must require that the inaccessibility of the boundary state k = 0 is ensured by imposing sufficient parameter restrictions on the actual drift and diffusion coefficients. As the incremental Wiener processes $dw_i(t) \in (-\infty, \infty)$ may occasionally take on very large negative values, the drift and diffusion coefficients must indeed be carefully studied to prevent the random variable k(t) from hitting the lower boundary, k = 0.

5.4 Boundaries, steady-state, and convergence

5.4.1 Terminology, concepts and definitions

Let the transition probability in case of a one-dimensional stochastic process X(t) be denoted
$$P(x, t; x_0, t_0) = \Pr[X(t) \le x \mid X(t_0) = x_0] \tag{43}$$
where X(t) is the state of the process at instant t. The transition probability distribution $P(x, t; x_0, t_0)$ is assumed to have a probability density function $p(x, t; x_0, t_0)$, defined everywhere.

Boundary conditions. In terms of the notation in (30), in the one-dimensional case, we define the following indefinite integrals (functions),
$$J(x) = \int_{x_0}^{x} \frac{a(u)}{b^2(u)}\,du; \qquad s(x) = \exp\{-2J(x)\}, \qquad S(x) = \int_{x_0}^{x} s(u)\,du \tag{44}$$
$$m(x) = \frac{\exp\{2J(x)\}}{b^2(x)} = \frac{1}{b^2(x)s(x)}, \qquad M(x) = \int_{x_0}^{x} m(u)\,du \tag{45}$$
The functions s(x), S(x), m(x), and M(x) are called, respectively, the scale density function, the scale function, the speed density function, and the speed measure of the stochastic process X(t); cf. Karlin and Taylor (1981, p. 194–96, p. 229).

Inaccessible boundaries. Let the diffusion process X(t) have two boundaries $r_1 < r_2$. Sufficient conditions: The boundaries $r_1$ and $r_2$ are inaccessible, if for all $x_0 \in (r_1, r_2)$,
$$S(r_1) = -\infty; \qquad S(r_2) = +\infty \tag{46}$$
equivalently, if s(x) is not integrable on the closed interval $[r_i, x_0]$, or if
$$S(r_i) = \lim_{x \to r_i} S(x) = \lim_{x \to r_i} \int_{x_0}^{x} s(u)\,du = \mp\infty; \qquad i = 1, 2 \tag{47}$$
The necessary and sufficient condition is
$$\Sigma(r_i) \equiv \int_{x_0}^{r_i} [S(r_i) - S(x)]\,m(x)\,dx = +\infty; \qquad i = 1, 2 \tag{48}$$
Existence of steady-state distribution. A time-invariant distribution function P(x) exists if and only if $S(r_i) = \mp\infty$ and M(x) is finite at $r_i$, i.e.,
$$|M(r_i)| < \infty; \qquad i = 1, 2 \tag{49}$$
The existence of a steady-state distribution P(x), which implies inaccessible boundaries, also implies the convergence of the nonstationary distribution functions P(x, t) towards P(x) as t → ∞.

Existence of steady-state density function. A time-invariant probability density function p(x) exists if and only if the speed density m(x) satisfies
$$\int_{r_1}^{r_2} m(x)\,dx < \infty, \qquad p(x) = \bar{m}\,m(x), \qquad \int_{r_1}^{r_2} p(x)\,dx = 1 \tag{50}$$
where $\bar{m}$ is the normalizing constant. The conditions and formulas above can be applied directly when we have the same stochastic differential equation (drift and diffusion coefficients) for the whole interval of x. For more details, see Karlin and Taylor (1981) and Mandl (1968).

Remark 2. It is well known (Karlin and Taylor, 1981, p. 359) that the solution (Itô integral) to the stochastic differential equation (24) is the geometric Wiener process with the continuous sample paths (stochastic trajectories, realizations),
$$L(t) = L_0\exp\{(n - \tfrac{1}{2}\beta_1^2)\,t + \beta_1 w_1(t)\}; \qquad 2n \gtrless \beta_1^2: \ \lim_{t\to\infty} L(t) = \begin{cases} \infty \\ 0 \end{cases} \tag{51}$$
Moreover, the boundaries, zero or infinity, are inaccessible, as the sample paths (51) cannot attain either of the two boundaries in finite time. It is instructive to prove the latter statement as a prelude to the general procedure of proving inaccessibility. From (24) and (44), the scale density s(L) and the scale function S(L) become
$$s(L) = \exp\Big\{-2\int_{L_0}^{L} nL/(\beta_1^2L^2)\,dL\Big\} = (L/L_0)^{-2n/\beta_1^2} \tag{52}$$
$$S(L) = \int_{L_0}^{L} s(L)\,dL = \frac{L_0}{1 - 2n/\beta_1^2}\Big[(L/L_0)^{1 - 2n/\beta_1^2} - 1\Big] \tag{53}$$
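The closed forms (52)–(53) can be checked numerically. The following minimal sketch compares the closed-form scale function with a midpoint-rule quadrature of the scale density and evaluates S near zero in the two parameter regimes; the function names and the parameter values are illustrative assumptions.

```python
def scale_density(L, L0, n, beta1):
    """Scale density (52) of the geometric Wiener process (24)."""
    return (L / L0) ** (-2.0 * n / beta1 ** 2)


def scale_function(L, L0, n, beta1):
    """Closed-form scale function (53); requires 2n != beta1**2."""
    e = 1.0 - 2.0 * n / beta1 ** 2
    return L0 / e * ((L / L0) ** e - 1.0)


def scale_function_numeric(L, L0, n, beta1, m=20000):
    """Midpoint-rule approximation of the integral of s(u) from L0 to L."""
    h = (L - L0) / m
    return h * sum(scale_density(L0 + (i + 0.5) * h, L0, n, beta1)
                   for i in range(m))


# Case 2n > beta1**2: S(L) diverges to minus infinity as L -> 0.
print(scale_function(0.5, 1.0, 0.02, 0.1),
      scale_function_numeric(0.5, 1.0, 0.02, 0.1))
print(scale_function(1e-6, 1.0, 0.02, 0.1))

# Case 2n < beta1**2: S(0) is finite, so (48) must be checked separately.
print(scale_function(1e-12, 1.0, 0.02, 0.3))
```

In the second regime the finite limit of S at zero is why the necessary and sufficient criterion (48) is needed in addition to the sufficient condition (46).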
As S(0) = −∞ for $2n \ge \beta_1^2$, the latter is a sufficient parameter condition for the inaccessibility of the boundary L = 0. But despite the finite $S(0) = -L_0/(1 - 2n/\beta_1^2)$ for $2n < \beta_1^2$, the boundary L = 0 may still not be attainable. The speed density m(L) is, cf. (45),
$$m(L) = \frac{(L/L_0)^{2n/\beta_1^2}}{\beta_1^2L^2} = \frac{L_0^{-2n/\beta_1^2}}{\beta_1^2}\,L^{2n/\beta_1^2 - 2} \tag{54}$$
and we must now, with (53)–(54) and (48), evaluate
$$\Sigma(0) = \int_0^{L_0} [S(L) - S(0)]\,m(L)\,dL = \frac{1}{\beta_1^2(1 - 2n/\beta_1^2)}\int_0^{L_0} \frac{dL}{L} = \infty \tag{55}$$
Thus, Σ(0) = +∞ says that it takes infinite time to reach the zero boundary from any interior state, i.e., L = 0 is after all inaccessible for $2n < \beta_1^2$, as it cannot be attained in finite time. The same analysis can be applied to the boundary L = ∞, which is likewise not attainable in finite time.

From this examination of the labor diffusion process (24), which has no steady-state distribution, it is clear that boundary problems for the capital-labor ratio diffusion are essentially due to boundary problems associated with the capital stock diffusion process.

5.4.2 Boundary conditions – neoclassical growth models

Labor growth and capital depreciation rates are uncertain

From (30–34) with $\beta_3 = 0$, and the CES, $f = f_i$, (7–9), we have,
$$\sigma = 1: \quad dk = \{s\gamma k^a - \Theta k\}\,dt - \beta k\,dw \tag{56}$$
$$\sigma \ne 1: \quad dk = \{s\gamma[(1 - a) + ak^{(\sigma-1)/\sigma}]^{\sigma/(\sigma-1)} - \Theta k\}\,dt - \beta k\,dw \tag{57}$$
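Theorem 2. The sufficient conditions for the diffusion processes of one-sector neoclassical growth models to have inaccessible boundaries, with CES technologies, (56–57), and uncertainties in both labor growth and capital depreciation, are:
$$\sigma < 1: \quad k = 0: \ 2(n + \delta - s\gamma a^{\sigma/(\sigma-1)}) \le \beta_1^2 - \beta_2^2; \qquad k = \infty: \ 2(n + \delta) \ge \beta_1^2 - \beta_2^2 \tag{58}$$
$$\sigma = 1: \quad k = 0: \ \text{always inaccessible}; \qquad k = \infty: \ 2\Theta + \beta^2 \ge 0 \ \Leftrightarrow \ 2(n + \delta) \ge \beta_1^2 - \beta_2^2 \tag{59}$$
$$\sigma > 1: \quad k = 0: \ \text{always inaccessible}; \qquad k = \infty: \ 2(n + \delta - s\gamma a^{\sigma/(\sigma-1)}) \ge \beta_1^2 - \beta_2^2 \tag{60}$$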
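The parametric conditions, (58)–(60), with strict inequalities, ensure the existence and the long-run convergence of the stochastic capital-labor ratio k(t) to a time-invariant (steady-state) probability distribution P(k).

Proof: σ = 1: For an arbitrary $k_0 \in (0, \infty)$, the scale density function is given by, cf. (44), (56),
$$s(k) = \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma k^a - \Theta k}{\beta^2k^2}\,dk\Big\}, \qquad 0 < k < \infty \tag{61}$$
Evaluating the integral gives, with a constant $m_0$,
$$s(k) = m_0\,k^{2(\Theta/\beta^2)}\exp\Big\{\frac{2s\gamma}{(1 - a)\beta^2}\,k^{-(1-a)}\Big\} \tag{62}$$
$$S(0) = \int_{k_0}^{0} s(k)\,dk \equiv m_0\int_{k_0}^{0} k^{2(\Theta/\beta^2)}\exp\Big\{\frac{2s\gamma}{(1 - a)\beta^2}\,k^{-(1-a)}\Big\}\,dk \tag{63}$$
Since 1 − a > 0 and $2s\gamma/[(1 - a)\beta^2] > 0$, cf. (7), the exponential term in (63) will dominate and explode for k → 0, and hence S(0) diverges, i.e., S(0) = −∞. Thus, the lower boundary k = 0 is always inaccessible, irrespective of the size of the drift and diffusion parameters.
$$S(\infty) = \int_{k_0}^{\infty} s(k)\,dk \equiv m_0\int_{k_0}^{\infty} k^{2(\Theta/\beta^2)}\exp\Big\{\frac{2s\gamma}{(1 - a)\beta^2}\,k^{-(1-a)}\Big\}\,dk \tag{64}$$
Since 1 − a > 0, cf. (7), the divergence of S(∞) here only depends on the polynomial term in (64) with the exponent $2(\Theta/\beta^2)$. Hence, divergence of S(∞) requires that $2(\Theta/\beta^2) \ge -1$, or, equivalently, $2(n + \delta) \ge \beta_1^2 - \beta_2^2$. Thus, the upper boundary k = ∞ is made inaccessible by imposing the parameter restriction stated in (59).

σ ≠ 1: From (57), we have the expression, cf. (61),
$$s(k) = \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma[(1 - a) + ak^{(\sigma-1)/\sigma}]^{\sigma/(\sigma-1)} - \Theta k}{\beta^2k^2}\,dk\Big\} \tag{65}$$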
σ < 1: From (65) and (11), we have,
$$S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk = \int_{k_0}^{0} (k/k_0)^{-2[s\gamma a^{\sigma/(\sigma-1)} - \Theta]/\beta^2}\,dk \tag{66}$$
Hence, it follows from (66) that the divergence of S(0) to −∞ requires that the exponent must be less than or equal to −1, or equivalently, $2s\gamma a^{\sigma/(\sigma-1)} \ge 2\Theta + \beta^2$, which is the lower boundary condition in (58). From (65) and (11), we have,
$$S(\infty) = \lim_{k\to\infty}\int_{k_0}^{k} s(k)\,dk = \int_{k_0}^{\infty} \Big(\frac{k}{k_0}\Big)^{2\Theta/\beta^2}\exp\Big\{\frac{2s\gamma(1 - a)^{\sigma/(\sigma-1)}}{\beta^2}\Big(\frac{1}{k} - \frac{1}{k_0}\Big)\Big\}\,dk \tag{67}$$
The polynomial term in (67) decides the divergence of S(∞); it diverges to +∞ if the exponent $2\Theta/\beta^2 \ge -1$, which gives the upper inaccessibility condition in (58).

σ > 1: From (65) and (13), we have,
$$S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk = \int_{k_0}^{0} \Big(\frac{k}{k_0}\Big)^{2\Theta/\beta^2}\exp\Big\{\frac{2s\gamma(1 - a)^{\sigma/(\sigma-1)}}{\beta^2}\Big(\frac{1}{k} - \frac{1}{k_0}\Big)\Big\}\,dk \tag{68}$$
The exponential term in (68) will always explode for k → 0. Hence, S(0) diverges, and accordingly, k = 0 is inaccessible, irrespective of parameter restrictions. From (65) and (13), we have,
$$S(\infty) = \lim_{k\to\infty}\int_{k_0}^{k} s(k)\,dk = \int_{k_0}^{\infty} (k/k_0)^{-2[s\gamma a^{\sigma/(\sigma-1)} - \Theta]/\beta^2}\,dk \tag{69}$$
S(∞) is divergent if and only if the exponent of the polynomial in (69) is larger than or equal to −1, which gives the parametric inaccessibility restriction as stated in (60). □

Remark 3. Corresponding to the CD case, (59), Bourguignon (1974, pp. 153–54) gave the upper-boundary inaccessibility condition as
$$2b/c \ge -1 \ \Leftrightarrow \ 2(n + \delta) \ge \beta_1^2 - \beta_2^2 - \beta_1\beta_2\rho_{12} \tag{70}$$
which differs from our simpler expression in (59). The result (70) is due to a "misprint" in his formula for dk (p. 146), equivalent to our (38–39). Thus, we observe that, in contrast to his result, (70), the correlation $\rho_{12}$ has no implication for the inaccessibility of the upper boundary. Incidentally, note that $\rho_{12}$ does not enter the boundary conditions for the CES cases, (58), (60).

By the way, the LHS of the boundary conditions (58–60) represent, with $\beta_1 = \beta_2 = 0$, the necessary and sufficient conditions for the existence of a non-trivial (non-zero and finite) deterministic steady state. Evidently, $\beta_1 > 0$ makes it easier to avoid the trivial boundary k = 0, but makes explosion more likely. Note that $\beta_2 > 0$ makes it easier to avoid explosion, but makes it more likely to hit the boundary k = 0. The economic-mathematical intuition of such $\beta_i > 0$ effects is left to the reader.

The saving rate is uncertain

It was seen in (30–34) that uncertainty in saving behaviour will always introduce nonlinear terms in the diffusion coefficient. Inaccessible boundaries here raise conditions that clash with common, deterministic, dynamic regularity properties. By (30–34) with $\beta_1 = \beta_2 = 0$, $\lambda_3 = \infty$, $\phi_3(k) = \beta_3$, cf. (23), we get
$$\sigma = 1: \quad dk = \{s\gamma k^a - (n + \delta)k\}\,dt + \beta_3\gamma k^a\,dw \tag{71}$$
$$\sigma \ne 1: \quad dk = \{sf(k) - (n + \delta)k\}\,dt + \beta_3 f(k)\,dw, \qquad f(k): (9) \tag{72}$$
Theorem 3. The diffusion process (30–34) with CD or CES functions, (7), (9), and uncertainty only in the saving rate, $\beta_1 = \beta_2 = 0$, $\beta_3 \ne 0$, will have boundary properties and sufficient inaccessibility conditions as follows:
$$\sigma < 1: \quad k = 0: \ 2[s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)] \ge [\beta_3\gamma a^{\sigma/(\sigma-1)}]^2; \qquad k = \infty: \ \text{always inaccessible} \tag{73}$$
$$\sigma = 1: \quad k = 0: \ \text{inaccessible if } a > \tfrac{1}{2}, \ \text{possibly accessible if } a < \tfrac{1}{2}; \qquad k = \infty: \ \text{always inaccessible} \tag{74}$$
$$\sigma > 1: \quad k = 0: \ \text{possibly accessible}; \qquad k = \infty: \ 2[s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)] \le [\beta_3\gamma a^{\sigma/(\sigma-1)}]^2 \tag{75}$$
Proof: CD case. Applying (44) to (71) gives,
$$s(k) = \exp\Big\{\frac{(n + \delta)(k^{2(1-a)} - k_0^{2(1-a)}) - 2s\gamma(k^{1-a} - k_0^{1-a})}{\beta_3^2\gamma^2(1 - a)}\Big\} \tag{76}$$
From (44) and (76), we have,
$$S(0) = \int_{k_0}^{0} s(k)\,dk \equiv m_0\int_{k_0}^{0} \exp\Big\{\frac{(n + \delta)k^{2(1-a)} - 2s\gamma k^{1-a}}{\beta_3^2\gamma^2(1 - a)}\Big\}\,dk \tag{77}$$
Since 1 − a > 0, S(0) will converge, and, accordingly, k = 0 may possibly be accessible.
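To decide whether k = 0 is in fact inaccessible, we must calculate Σ(0). In the CD case, we have, for the lower boundary k = 0, cf. (48),
$$\Sigma(0) = \int_{k_0}^{0} [S(0) - S(k)]\,m(k)\,dk \tag{78}$$
The limit of the integrand S(k)m(k) in (78) is, cf. (44)–(45), (77),
$$\lim_{k\to0} S(k)m(k) = \lim_{k\to0} \frac{m_0\int_{k_0}^{k} \exp\{[\beta_3^2\gamma^2(1 - a)]^{-1}[(n + \delta)k^{2(1-a)} - 2s\gamma k^{1-a}]\}\,dk}{\beta_3^2\gamma^2k^{2a}\exp\{[\beta_3^2\gamma^2(1 - a)]^{-1}[(n + \delta)k^{2(1-a)} - 2s\gamma k^{1-a}]\}} \tag{79}$$
Since $[\beta_3^2\gamma^2(1 - a)]^{-1} > 0$, the limit (79) converges iff 2(1 − a) > 1, i.e., a < 1/2. Hence, with a > 1/2, Σ(0), (78), will be divergent, and thus the lower boundary is inaccessible. From (44) and (76), we have,
$$S(\infty) = \int_{k_0}^{\infty} s(k)\,dk \equiv m_0\int_{k_0}^{\infty} \exp\Big\{\frac{(n + \delta)k^{2(1-a)} - 2s\gamma k^{1-a}}{\beta_3^2\gamma^2(1 - a)}\Big\}\,dk \tag{80}$$
Since 1 − a > 0, and $k^{2(1-a)}$ is the dominating term, S(∞) will always diverge, i.e., k = ∞ is inaccessible.

CES case. Applying (44) to (72) gives
$$s(k) = \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma[(1 - a) + ak^{(\sigma-1)/\sigma}]^{\sigma/(\sigma-1)} - (n + \delta)k}{\beta_3^2\gamma^2[(1 - a) + ak^{(\sigma-1)/\sigma}]^{2\sigma/(\sigma-1)}}\,dk\Big\} \tag{81}$$
σ < 1: From (81), we have, for small k, cf. (11),
$$
\begin{aligned}
S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk &= \int_{k_0}^{0} \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)}{\beta_3^2\gamma^2a^{2\sigma/(\sigma-1)}}\,k^{-1}\,dk\Big\}\,dk \\
&= \int_{k_0}^{0} (k/k_0)^{-2[s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)]/[\beta_3^2\gamma^2a^{2\sigma/(\sigma-1)}]}\,dk
\end{aligned}
\tag{82}
$$
S(0) diverges if the exponent of $k/k_0$ is less than or equal to −1, which immediately gives the condition (73). From (81), we have, for large k, cf. (11),
$$
\begin{aligned}
S(\infty) = \lim_{k\to\infty}\int_{k_0}^{k} s(k)\,dk &= \int_{k_0}^{\infty} \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma(1 - a)^{\sigma/(\sigma-1)} - (n + \delta)k}{\beta_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)}}\,dk\Big\}\,dk \\
&= \int_{k_0}^{\infty} \exp\Big\{\frac{(n + \delta)(k^2 - k_0^2) - 2s\gamma(1 - a)^{\sigma/(\sigma-1)}(k - k_0)}{\beta_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)}}\Big\}\,dk \\
&\equiv m_{30}\int_{k_0}^{\infty} \exp\Big\{\frac{n + \delta}{\beta_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)}}\,k^2 - \frac{2s}{\beta_3^2\gamma(1 - a)^{\sigma/(\sigma-1)}}\,k\Big\}\,dk
\end{aligned}
\tag{83}
$$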
As 1 − a > 0, the constant denominators in (83) are positive, and since the $k^2$ term is the dominating term in the exponential expression, S(∞) will always be divergent; hence, the upper boundary is always inaccessible.

σ > 1: From (81), we have, for small k, cf. (13),
$$
\begin{aligned}
S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk &= \int_{k_0}^{0} \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma(1 - a)^{\sigma/(\sigma-1)} - (n + \delta)k}{\beta_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)}}\,dk\Big\}\,dk \\
&\equiv m_{01}\int_{k_0}^{0} \exp\Big\{\frac{n + \delta}{\beta_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)}}\,k^2 - \frac{2s}{\beta_3^2\gamma(1 - a)^{\sigma/(\sigma-1)}}\,k\Big\}\,dk
\end{aligned}
\tag{84}
$$
Since 1 − a > 0, S(0) will always converge, and hence k = 0 may possibly be accessible. Whether k = 0 is in fact attainable in finite time requires evaluations similar to those shown above, cf. (78)–(79), (55). From (81), we have, for large k, cf. (13),
$$
\begin{aligned}
S(\infty) = \lim_{k\to\infty}\int_{k_0}^{k} s(k)\,dk &= \int_{k_0}^{\infty} \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)}{\beta_3^2\gamma^2a^{2\sigma/(\sigma-1)}}\,k^{-1}\,dk\Big\}\,dk \\
&= \int_{k_0}^{\infty} (k/k_0)^{-2[s\gamma a^{\sigma/(\sigma-1)} - (n + \delta)]/[\beta_3^2\gamma^2a^{2\sigma/(\sigma-1)}]}\,dk
\end{aligned}
\tag{85}
$$
S(∞) diverges if the exponent of $k/k_0$ is larger than or equal to −1, which is equivalent to the upper inaccessibility condition in (75). □

With uncertainty in the saving rate, the drift and diffusion coefficients in (20)–(21) now have similar nonlinear elements that, if dominating, will prevent us from satisfying a sufficient lower inaccessibility condition, as the scale function S(k) at k = 0 is now finite whenever σ ≥ 1. The factor accumulation process is likely to be much more severely affected (large volatility) by uncertainty about the saving rate than by uncertainties in labor growth and depreciation rates. The lack of any parametric restrictions preventing the accessibility of the absorbing boundary k = 0 (implosion, "economic collapse"), cf. (74), (75), represents a critical stochastic dynamic model complication for the system (30–34) and a mathematical issue to be adequately resolved below.

5.4.3 General parameter uncertainty and inaccessible boundaries
To dampen the impact of the Wiener process $dw_3$ near k = 0 in our (30–34), the random element $\varepsilon_3$ in the saving parameter must be state-dependent, and to preserve (24)–(25) as a homogeneous stochastic dynamic system, the function $\phi_3(k)$ was chosen, cf. (23).

For economic reasons, the actual shape of $\phi_3(k)$ on the domain k ∈ [0, ∞) is chosen as a monotonically increasing curve, but this curve should also, to avoid creating excessive saving parameter volatility, be bounded above by a horizontal asymptote. With these two stipulations upon relevant selections of $\phi_3(k)$, one choice might be the logistic (S-shaped) curve described by a well-known exponential expression. But among the exponentials, a relevant and convenient choice of $\phi_3(k)$, with proper domain and range for our purposes, is (23).

Theorem 4. The sufficient conditions for the general diffusion process (30–34), with CES technologies and uncertainties in labor growth, capital depreciation, and saving rates, to have inaccessible lower and upper boundaries are:
$$k = 0; \ \sigma < 1: \quad 2(n + \delta - s\gamma a^{\sigma/(\sigma-1)}) \le \beta_1^2 - \beta_2^2 \tag{86}$$
$$k = 0; \ \sigma \ge 1: \quad \text{always inaccessible} \tag{87}$$
$$k = \infty; \ \sigma \le 1: \quad 2(n + \delta) \ge \beta_1^2 - \beta_2^2 \tag{88}$$
$$k = \infty; \ \sigma > 1: \quad 2(n + \delta - s\gamma a^{\sigma/(\sigma-1)}) \ge \beta_1^2 - \beta_2^2 - \Delta \tag{89}$$
$$\Delta \equiv [\beta_3\gamma a^{\sigma/(\sigma-1)}]^2 - 2\rho_{23}\beta_2\beta_3\gamma a^{\sigma/(\sigma-1)}.$$
Proof: The hyperbolic function $\phi_3(k) = \beta_3\tanh(\lambda_3 k)$, (23), is
$$\phi_3(k) = \beta_3\tanh(\lambda_3 k) = \beta_3\,\frac{e^{\lambda_3 k} - e^{-\lambda_3 k}}{e^{\lambda_3 k} + e^{-\lambda_3 k}}, \qquad k \ge 0 \tag{90}$$
It is well known and easily verified from (90) that for
$$\text{small } k: \quad \phi_3(k) \sim \beta_3\lambda_3 k \ \Leftrightarrow \ \phi_3(k)/(\beta_3\lambda_3 k) \to 1 \ \text{ as } k \to 0 \tag{91}$$
$$\text{large } k: \quad \phi_3(k) \sim \beta_3 \ \Leftrightarrow \ \phi_3(k)/\beta_3 \to 1 \ \text{ as } k \to \infty \tag{92}$$
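The two asymptotic regimes (91)–(92) are easy to inspect numerically. The short sketch below, with hypothetical values for β3 and λ3, prints both ratios over a range of k, showing that the volatility term vanishes linearly near zero and saturates at β3 for large k.

```python
import math


def phi3(k, beta3, lam3):
    """State-dependent saving-rate volatility (23)."""
    return beta3 * math.tanh(lam3 * k)


beta3, lam3 = 0.1, 2.0   # hypothetical values, for illustration only
for k in (1e-4, 1e-2, 1.0, 10.0):
    small = phi3(k, beta3, lam3) / (beta3 * lam3 * k)   # ratio in (91)
    large = phi3(k, beta3, lam3) / beta3                # ratio in (92)
    print(f"k = {k:g}: phi3/(beta3*lam3*k) = {small:.6f}, "
          f"phi3/beta3 = {large:.6f}")
```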
Lower boundary. σ = 1: With the CD production function (7), the scale density function s(k) now becomes, cf. (44), (90),
$$s(k) = \exp\Big\{-2\int_{k_0}^{k} \frac{[s - \rho_{13}\beta_1\phi_3(k)]\gamma k^a - \Theta k}{\beta^2k^2 + \phi_3^2(k)\gamma^2k^{2a} - \rho\,\phi_3(k)\gamma k^{1+a}}\,dk\Big\} \tag{93}$$
Since a < 1, the dominating terms in the numerator and the denominator of (93) give, for small k, cf. (91),
$$s(k) \sim \exp\Big\{-2\int_{k_0}^{k} \frac{s\gamma k^a}{\beta^2k^2}\,dk\Big\} = \exp\Big\{\frac{2s\gamma}{(1 - a)\beta^2}\big(k^{-(1-a)} - k_0^{-(1-a)}\big)\Big\} \tag{94}$$
Since 1 − a > 0, it is seen from (94) that $S(0) = \int_{k_0}^{0} s(k)\,dk$ diverges at k = 0, cf. (63), and hence, the lower boundary is inaccessible.
σ < 1: With the CES function (9), the scale function with the dominating terms becomes, cf. (9), (11), (91), (94),
$$S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk = \int_{k_0}^{0} (k/k_0)^{2[\Theta - s\gamma a^{\sigma/(\sigma-1)}]/\beta^2}\,dk \tag{95}$$
Hence, it follows from (95) that the divergence of S(0) requires that the exponent $2[\Theta - s\gamma a^{\sigma/(\sigma-1)}]/\beta^2 \le -1$, which is the lower boundary condition in (86), cf. (66).

σ > 1: With the CES function (9), the scale function with the dominating terms becomes, cf. (13),
$$S(0) = \lim_{k\to0}\int_{k_0}^{k} s(k)\,dk = \lim_{k\to0}\int_{k_0}^{k} \exp\Big\{\frac{2s\gamma(1 - a)^{\sigma/(\sigma-1)}}{\bar{b}^2}\,(k^{-1} - k_0^{-1})\Big\}\,dk \tag{96}$$
where $\bar{b}^2 \equiv \beta^2 + \beta_3^2\lambda_3^2\gamma^2(1 - a)^{2\sigma/(\sigma-1)} - \rho\beta_3\lambda_3\gamma(1 - a)^{\sigma/(\sigma-1)} > 0$. Since the parameter $2s\gamma(1 - a)^{\sigma/(\sigma-1)}/\bar{b}^2$ in the exponential term is always positive, it is seen from (96) that S(0) always diverges; hence, k = 0 is inaccessible, irrespective of parameter restrictions, cf. (68).

Upper boundary. σ = 1: With the CD function (7), the scale density s(k) becomes, cf. (44), (90),
$$s(k) = \exp\Big\{-2\int_{k_0}^{k} \frac{[s - \rho_{13}\beta_1\phi_3(k)]\gamma k^a - \Theta k}{\beta^2k^2 + \phi_3^2(k)\gamma^2k^{2a} - \rho\,\phi_3(k)\gamma k^{1+a}}\,dk\Big\} \tag{97}$$
Since a < 1, the dominating terms in the numerator and denominator of (97) give, for large k, cf. (92),
$$s(k) \sim \exp\Big\{-2\int_{k_0}^{k} \frac{-\Theta k}{\beta^2k^2}\,dk\Big\} = (k/k_0)^{2\Theta/\beta^2} \tag{98}$$
The divergence of S(∞) from (98) is analogous to the result in (64), (59); hence, we have (88) for σ = 1.

σ < 1: With the CES function (9), the scale density s(k) becomes, keeping the dominant terms for large k, cf. (97), (92),
$$s(k) \sim \exp\Big\{-2\int_{k_0}^{k} \frac{-\Theta k}{\beta^2k^2}\,dk\Big\} = (k/k_0)^{2\Theta/\beta^2} \tag{99}$$
which is the same as (98), and the divergence of S(∞) is analogous to (67), (58).

σ > 1: From (97), (9), (13), (92), we have, for large k,
$$
\begin{aligned}
s(k) &\sim \exp\Big\{-2\int_{k_0}^{k} \frac{(s - \rho_{13}\beta_1\beta_3)\gamma a^{\sigma/(\sigma-1)}k - \Theta k}{\beta^2k^2 + \beta_3^2\gamma^2a^{2\sigma/(\sigma-1)}k^2 - \rho\beta_3\gamma a^{\sigma/(\sigma-1)}k^2}\,dk\Big\} \\
&= \exp\Big\{-2\int_{k_0}^{k} (\bar{a}/\bar{b}^2)\,k^{-1}\,dk\Big\} = (k/k_0)^{-2\bar{a}/\bar{b}^2}
\end{aligned}
\tag{100}
$$
$$\bar{a} \equiv (s - \rho_{13}\beta_1\beta_3)\gamma a^{\sigma/(\sigma-1)} - \Theta, \qquad \bar{b}^2 \equiv \beta^2 + \beta_3^2\gamma^2a^{2\sigma/(\sigma-1)} - \rho\beta_3\gamma a^{\sigma/(\sigma-1)} > 0.$$
Now S(∞) from (100) diverges, cf. the analogue (85) and (75), if the exponent satisfies $-2\bar{a}/\bar{b}^2 \ge -1$. Rewriting this condition, using (32), (34), gives our condition (89), where $\Delta \equiv \bar{b}^2 - \beta^2 + 2\rho_{13}\beta_1\beta_3\gamma a^{\sigma/(\sigma-1)} = [\beta_3\gamma a^{\sigma/(\sigma-1)}]^2 - 2\rho_{23}\beta_2\beta_3\gamma a^{\sigma/(\sigma-1)}$. □
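The conditions of Theorem 4 are simple parameter inequalities, so they can be collected in a small checker. The sketch below is an illustrative helper (the function name, return format, and test values are assumptions); with β3 = 0 it reduces to the conditions (58)–(60) of Theorem 2.

```python
def boundary_conditions(s, n, delta, gamma, a, sigma,
                        beta1, beta2, beta3=0.0, rho23=0.0):
    """Evaluate the sufficient inaccessibility conditions (86)-(89).

    Returns a dict with booleans for the lower (k = 0) and upper
    (k = infinity) boundary.
    """
    rhs = beta1 ** 2 - beta2 ** 2
    if sigma == 1.0:
        lower = True                                        # (87)
        upper = 2.0 * (n + delta) >= rhs                    # (88)
    else:
        A = a ** (sigma / (sigma - 1.0))
        if sigma < 1.0:
            lower = 2.0 * (n + delta - s * gamma * A) <= rhs    # (86)
            upper = 2.0 * (n + delta) >= rhs                    # (88)
        else:
            Delta = ((beta3 * gamma * A) ** 2
                     - 2.0 * rho23 * beta2 * beta3 * gamma * A)
            lower = True                                        # (87)
            upper = 2.0 * (n + delta - s * gamma * A) >= rhs - Delta  # (89)
    return {"lower": lower, "upper": upper}


# Hypothetical parameter sets, for illustration only.
base = dict(n=0.02, delta=0.05, gamma=1.0, a=0.4, beta1=0.02, beta2=0.03)
print(boundary_conditions(s=0.2, sigma=0.5, **base))
print(boundary_conditions(s=0.2, sigma=1.0, **base))
print(boundary_conditions(s=0.6, sigma=2.0, beta3=0.05, **base))
```

A failed upper condition with σ > 1 does not signal a defect of the model; it marks the parameter region of persistent growth.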
The assumptions about $\phi_3(k)$, (23), have removed the uncertainty in saving rates (21) entirely from the lower boundary problems with σ ≥ 1, cf. (74) and (75), because, with (23), we now have that (87) holds, irrespective of the size of any drift and diffusion parameters. With σ < 1, proper parameter restrictions (86) can safeguard against attaining k = 0. Thus, for any substitution elasticity of the CES technology, the stochastic neoclassical growth model of Theorem 1, (29–34), is made fully workable without any boundary problems (extinction, explosion).

5.4.4 Neoclassical SDE and asymptotic non-stationarity

The relaxation of the sufficient inaccessibility condition S(∞) = ∞ does not in itself imply an explosion in finite time. Still, to avoid any risk of implosion, we want to keep the sufficient condition S(0) = −∞. But, together with a finite S(∞), we have the following well-known implications (with probability one),
$$[S(0) = -\infty \ \wedge \ S(\infty) < \infty] \ \Rightarrow \ \lim_{t\to\infty} k(t) = \infty \ \Rightarrow \ \lim_{t\to\infty} E[k(t)] = \infty \tag{101}$$
Within our stochastic neoclassical growth model, a finite S(∞) is simply equivalent to reversing the inequality in (88)–(89). For σ ≤ 1, the reverse of (88) is $2(n + \delta) \le \beta_1^2 - \beta_2^2$. The latter implies that k(t) → ∞ as t → ∞, but it is a pathological case, as the reversal of (88) also implies that L(t) → 0 (although never reached in finite time). In short, no relevant stochastic endogenous growth is possible with σ ≤ 1. Hence, as in the deterministic case, stochastic endogenous economic growth requires that the marginal product of capital is bounded below, i.e., σ > 1, cf. (13).
By reversing (89), the sufficient condition of persistent growth becomes, cf. (101), (87),
$$\sigma > 1: \quad S(\infty) < \infty \ \Leftrightarrow \ s\gamma a^{\sigma/(\sigma-1)} \ge n + \delta + \tfrac{1}{2}(-\beta_1^2 + \beta_2^2 + \Delta) \tag{102}$$
which is the stochastic analogue to the deterministic condition (with only n + δ on the RHS) of endogenous (persistent) growth; see Jensen and Wang (1997, p. 93), Jensen and Larsen (1987). We note from (102) that it is generally more difficult (higher saving rates are required) to achieve persistent economic growth per capita in the face of uncertainty, as $n - \tfrac{1}{2}\beta_1^2 > 0$ is now taken for granted in (102), and Δ is always positive when $\rho_{23} = 0$, cf. (89). Uncertainties in the accumulation of capital, (23), (25), ($\beta_2 \ne 0$, $\beta_3 \ne 0$) make the stochastic analogue (102) harder to satisfy.

The rapidity of stochastic growth is not directly seen from (102). However, with S(∞) < ∞, the stochastic differential equation (30) is, asymptotically,
$$dk \sim \bar{a}\,k\,dt + \bar{b}\,k\,dw \equiv \{(s - \rho_{13}\beta_1\beta_3)\gamma a^{\sigma/(\sigma-1)} - \Theta\}\,k\,dt + \big[\beta^2 + \beta_3^2\gamma^2a^{2\sigma/(\sigma-1)} - \rho\beta_3\gamma a^{\sigma/(\sigma-1)}\big]^{1/2}\,k\,dw \tag{103}$$
i.e., a geometric Wiener process with sample paths:
$$k(t) \sim k_0\exp\{(\bar{a} - \bar{b}^2/2)\,t + \bar{b}\,w(t)\}; \qquad 2\bar{a} > \bar{b}^2 \tag{104}$$
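The link between the growth condition in (104) and the persistent-growth condition (102) can be verified numerically: the exponent ā − b̄²/2 equals the difference between the two sides of (102). The following sketch checks this identity for arbitrary parameter values; the function names and the numbers are illustrative assumptions.

```python
def asymptotic_coefficients(s, n, delta, gamma, a, sigma, beta1, beta2, beta3,
                            rho12=0.0, rho13=0.0, rho23=0.0):
    """Return a_bar and b_bar**2 of (103) for a CES technology with sigma > 1."""
    A = a ** (sigma / (sigma - 1.0))
    theta = n + delta - (beta1 ** 2 + rho12 * beta1 * beta2)          # (32)
    beta_sq = beta1 ** 2 + beta2 ** 2 + 2.0 * rho12 * beta1 * beta2   # (34)
    rho = 2.0 * (rho13 * beta1 + rho23 * beta2)                       # (34)
    a_bar = (s - rho13 * beta1 * beta3) * gamma * A - theta
    b_bar_sq = beta_sq + (beta3 * gamma * A) ** 2 - rho * beta3 * gamma * A
    return a_bar, b_bar_sq


def growth_margin(s, n, delta, gamma, a, sigma, beta1, beta2, beta3, rho23=0.0):
    """LHS minus RHS of (102); a nonnegative value means persistent growth."""
    A = a ** (sigma / (sigma - 1.0))
    Delta = (beta3 * gamma * A) ** 2 - 2.0 * rho23 * beta2 * beta3 * gamma * A
    return s * gamma * A - (n + delta + 0.5 * (-beta1 ** 2 + beta2 ** 2 + Delta))


# Hypothetical parameter values, for illustration only.
params = dict(s=0.6, n=0.02, delta=0.05, gamma=1.0, a=0.4, sigma=2.0,
              beta1=0.02, beta2=0.03, beta3=0.05)
a_bar, b_bar_sq = asymptotic_coefficients(**params, rho12=0.3, rho13=0.2, rho23=0.1)
margin = growth_margin(**params, rho23=0.1)
print(f"a_bar - b_bar^2/2 = {a_bar - 0.5 * b_bar_sq:.6f}, margin of (102) = {margin:.6f}")
```

The correlations ρ12 and ρ13 cancel in ā − b̄²/2, which is why they do not appear in (102).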
It is easily verified that the exponential growth condition, $\bar{a} - \tfrac{1}{2}\bar{b}^2 > 0$, in (104) is equivalent to (102). Thus, the stochastic condition (102) is indeed the analogue of deterministic exponential per capita growth in the neoclassical growth model.

5.5 Explicit steady-state distribution with CD technologies
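Having obtained the conditions for the existence of and convergence to a steady-state (time-invariant) distribution, cf. (59), we also want to obtain as a benchmark, with CD sector technologies, a closed-form expression for the time-invariant probability density function p(k) and the distribution function P(k) of the diffusion process (56). It turns out that the benchmark distribution function P(k) for the CD economy can be expressed by gamma Γ(α) and incomplete gamma functions Γ(α, x0), which are generally defined, respectively, by the improper integrals,
$$\Gamma(\alpha) \equiv \int_0^{\infty} x^{\alpha-1}e^{-x}\,dx, \qquad \Gamma(\alpha, x_0) \equiv \int_{x_0}^{\infty} x^{\alpha-1}e^{-x}\,dx, \qquad \alpha > 0 \tag{105}$$
Theorem 5. The time-invariant (steady-state) distribution P(k) for the stochastic process (56), cf. Theorem 1, will have a density function p(k) if and only if:
$$2\Theta + \beta^2 > 0 \ \Leftrightarrow \ 2(n + \delta) > \beta_1^2 - \beta_2^2 \tag{106}$$
With (106), the time-invariant probability density function p(k) is in closed form,
$$p(k) = c_0\,k^{-2[1 + \Theta/\beta^2]}\exp\{-ck^{-(1-a)}\}, \qquad 0 < k < \infty \tag{107}$$
where
$$c = \frac{2s\gamma}{(1 - a)\beta^2}, \qquad \alpha = \frac{2\Theta/\beta^2 + 1}{1 - a} \tag{108}$$
and $c_0 = (1 - a)\,c^{\alpha}/\Gamma(\alpha)$ is the normalizing constant. The steady-state distribution function is
$$P(k) = \Gamma(\alpha, ck^{-(1-a)})/\Gamma(\alpha) \tag{110}$$
The first-order and second-order moments of P(k) exist if and only if, respectively,
$$\Theta > 0 \ \Leftrightarrow \ n + \delta > \beta_1^2 + \rho_{12}\beta_1\beta_2 \tag{111}$$
$$2\Theta > \beta^2 \ \Leftrightarrow \ 2(n + \delta) > 3\beta_1^2 + 4\rho_{12}\beta_1\beta_2 + \beta_2^2 \tag{112}$$
With (111)–(112), the steady-state distribution P(k), (110), will have first-order and second-order moments given by,
$$E(k) = \frac{\Gamma(\alpha^*)}{\Gamma(\alpha)}\,c^{(1-a)^{-1}}, \qquad E(k^2) = \frac{\Gamma(\alpha^{**})}{\Gamma(\alpha)}\,c^{2(1-a)^{-1}}, \qquad \sigma^2 = E(k^2) - [E(k)]^2 \tag{113}$$
where
$$\alpha^* = \alpha - (1 - a)^{-1} \qquad \text{and} \qquad \alpha^{**} = \alpha - 2(1 - a)^{-1} \tag{114}$$
and α was given by (108).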
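As a numerical illustration of Theorem 5, the sketch below evaluates the moments (113) with the log-gamma function. It assumes the constants c = 2sγ/[(1 − a)β²] and α = (2Θ/β² + 1)/(1 − a), which follow from normalizing the speed density of (56) by the substitution x = ck^{−(1−a)}; the function names and parameter values are hypothetical.

```python
import math


def cd_stationary(s, n, delta, gamma, a, beta1, beta2, rho12=0.0):
    """Return (theta, beta_sq, c, alpha) for the CD diffusion (56).

    c and alpha are the assumed constants of the stationary density
    p(k) proportional to k**(-2*(1 + theta/beta_sq)) * exp(-c*k**(-(1-a))).
    """
    theta = n + delta - (beta1 ** 2 + rho12 * beta1 * beta2)          # (32)
    beta_sq = beta1 ** 2 + beta2 ** 2 + 2.0 * rho12 * beta1 * beta2   # (34)
    c = 2.0 * s * gamma / ((1.0 - a) * beta_sq)
    alpha = (2.0 * theta / beta_sq + 1.0) / (1.0 - a)
    return theta, beta_sq, c, alpha


def moment(j, c, alpha, a):
    """E(k**j) = Gamma(alpha - j/(1-a)) / Gamma(alpha) * c**(j/(1-a)), cf. (113)."""
    shifted = alpha - j / (1.0 - a)
    if shifted <= 0.0:
        raise ValueError("moment does not exist, cf. (111)-(112)")
    # Work in logs: Gamma(alpha) overflows for the large alpha typical here.
    return math.exp(math.lgamma(shifted) - math.lgamma(alpha)
                    + j / (1.0 - a) * math.log(c))


# Hypothetical parameter values, for illustration only.
theta, beta_sq, c, alpha = cd_stationary(s=0.2, n=0.02, delta=0.05, gamma=1.0,
                                         a=0.4, beta1=0.02, beta2=0.03)
mean = moment(1, c, alpha, 0.4)
std = math.sqrt(moment(2, c, alpha, 0.4) - mean ** 2)
kappa = (1.0 * 0.2 / (0.02 + 0.05)) ** (1.0 / (1.0 - 0.4))
print(f"E(k) = {mean:.4f}, sd(k) = {std:.4f}, deterministic kappa = {kappa:.4f}")
```

Using `math.lgamma` rather than `math.gamma` is a deliberate choice: with small volatilities α is of the order of several hundred, and the gamma function itself would overflow.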
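Proof: For the stochastic dynamic system (56), the speed density is, cf. (44)–(45),
$$m(k) = \Big[\beta^2k^2\exp\Big\{-2\int_{\tilde{k}}^{k} \frac{s\gamma u^a - \Theta u}{\beta^2u^2}\,du\Big\}\Big]^{-1}, \qquad 0 < k < \infty \tag{115}$$
Evaluating the integral as in (64) shows that m(k) is proportional to $k^{-2[1 + \Theta/\beta^2]}\exp\{-ck^{-(1-a)}\}$, and the substitution $x = ck^{-(1-a)}$ reduces the normalizing integral and the moment integrals of k and $k^2$ to gamma integrals of the form (105) with the arguments α, α*, and α**, respectively. These integrals are finite if and only if their arguments are positive, and by (108), (114),
$$\alpha^* > 0 \ \Leftrightarrow \ \Theta > 0, \qquad \alpha^{**} > 0 \ \Leftrightarrow \ 2\Theta > \beta^2 \tag{119}$$
which gives the moment existence restrictions (111–112), and Theorem 5 is established. □

5.6 Sample paths and asymptotic densities with CD and CES technologies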
We include simulations of both sample paths and asymptotic (long-run stationary) densities of the stochastic growth models. For different sets of model parameters, we calculate the steady-state values (mean, mode, standard deviation) of the long-run stationary processes, or alternatively, simulate particular sample paths with infinity as attractor for parameters with stochastic endogenous (persistent) growth.

If they exist, steady-state values (κ) of the capital-labor ratio in deterministic one-sector growth models are the critical points of (19):
$$[\,\dot{k} = 0 \ \Leftrightarrow \ k(t) = \kappa\,] \ \Leftrightarrow \ \Big[\frac{f(\kappa)}{\kappa} = \frac{n + \delta}{s}\Big]; \qquad AP_K(\kappa) = (n + \delta)/s \tag{120}$$
Closed-form expressions for the root values (κ), LHS of (120), with CD and CES technologies, (122), are given in Table 1, with explicit expressions for other steady-state properties, cf. (121–123), (4),
$$f(k) = \gamma k^a; \qquad f'(k) = \gamma ak^{a-1} \tag{121}$$
$$f(k) = \gamma\big[(1 - a) + ak^{(\sigma-1)/\sigma}\big]^{\sigma/(\sigma-1)}; \qquad \omega(k) = \frac{1 - a}{a}\,k^{1/\sigma} \tag{122}$$
$$f'(k) = \gamma a\big[a + (1 - a)k^{-(\sigma-1)/\sigma}\big]^{1/(\sigma-1)} \tag{123}$$
These formulas of Table 1 show explicitly how six basic (structural) parameters, the factor accumulation parameters (s, n, δ) and the technology parameters (γ, a, σ), determine various steady-state values (certainty equivalents). The tabulated CES formulas are rather elaborate parametric expressions, except $AP_K(\kappa)$ or its reciprocal (K/Y) at (κ). The actual invariance of $AP_K(\kappa)$ to changes in technology parameters is a peculiarity that is solely tied to one-sector growth models [hence absent in Table 3 below] together with the concept of steady states (balanced growth: $\dot{L}/L = \dot{K}/K = \dot{Y}/Y$), and the assumptions of: i) $\dot{L}/L = n$, and ii) constant gross saving rates, (s). By (19):
$$d\ln k(t)/dt = \dot{k}/k = \hat{k} = s\,AP_K(k) - (n + \delta) \tag{124}$$
This growth equation (124) has played an important role in empirical convergence studies, cf. Sala-i-Martin (1996, p. 1342), Quah (1996), Barro and Sala-i-Martin (1992), Barro (1991). Here the invariance of $AP_K(\kappa)$, (120), (124), allows a simple numerical consistency check of all the calculations of f(κ) and (κ) in Table 2.

Although little commentary is allowed or necessary here, the extensive set of CD and CES parameter cases in Table 2, illustrating the formulas of Table 1, deserves careful study and scrutiny, as such systematic steady-state numbers (certainty equivalents) are seldom shown; Table 2 is also important as a benchmark for corresponding asymptotic expectations, E(k), and the stochastic growth model simulations.
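The consistency check based on (120) can be sketched in a few lines. The code below solves f(κ)/κ = (n + δ)/s in closed form for CD and CES technologies and confirms that AP_K(κ) equals (n + δ)/s. The CES root is derived here directly from (120) and (122) rather than copied from Table 1, and the function names and parameter values are illustrative assumptions.

```python
def f_ces(k, gamma, a, sigma):
    """CD (sigma = 1) or CES production function in intensive form, (121)-(122)."""
    if sigma == 1.0:
        return gamma * k ** a
    r = (sigma - 1.0) / sigma
    return gamma * ((1.0 - a) + a * k ** r) ** (1.0 / r)


def steady_state(s, n, delta, gamma, a, sigma):
    """Root kappa of f(kappa)/kappa = (n + delta)/s, or None if no finite positive root."""
    x = (n + delta) / (s * gamma)
    if sigma == 1.0:
        return x ** (-1.0 / (1.0 - a))
    r = (sigma - 1.0) / sigma
    base = (x ** r - a) / (1.0 - a)
    if base <= 0.0:
        return None          # no non-trivial steady state
    return base ** (-1.0 / r)


# Hypothetical parameter values, for illustration only.
for sigma in (0.5, 1.0, 2.0):
    kappa = steady_state(0.2, 0.02, 0.05, 1.0, 0.4, sigma)
    apk = f_ces(kappa, 1.0, 0.4, sigma) / kappa
    print(f"sigma = {sigma}: kappa = {kappa:.4f}, AP_K(kappa) = {apk:.4f}")
```

The case returning None for σ > 1 corresponds to sγa^{σ/(σ−1)} ≥ n + δ, the deterministic condition of persistent growth.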
Bjarne S. Jensen, Martin Richter In addition to the six basic accumulation and technology parameters for (κ), E(k) is also affected by the uncertainty (volatility) and correlation parameters : (β1 , β2 , β3 , β4 , λ3 , λ4 , ρij ), cf. (20–27), that are involved in the drift and diffusion cofficients, (31–34), of the stochastic capital-labor ratio, k(t). As most cases in Table 2 show, despite their theoretical distinctions, the actual values of (κ) and E(k) nearly coincide. Moreover, as the CD cases (8-9) show, the additional impact of (β1 = 0.01, β2 = 0.03) on E(k) needs 5 decimals to be seen. In the CD cases (14-15) and CES case (10), the starred values of (β3 , λ3 ) indicate that they are both zero and instead represent (β4 , λ4 ). The CD cases (13-14) and CES cases (9-10) in Table 2 show that interchanging the level size and the uncertainty about (s) and (γ) are equivalent as to the impacts on : (κ), E(k), σ(k), mode (k). The stationary densities of p(k) are obtained, cf. (115-117), by numerically integrating and normalizing their respective speed densities, (44-45), with Mathematica. Some p(k) are exhibited in Figures 5.1 - 5.10. The statistics in Table 2 - expectations, standard deviations, cf. (118) - for the stationary distribution are also calculated by using Mathematica, but closed forms of σ(k) are used as a control whenever they exist, cf. CD, (113). The modes are obtained by solving (125). In Merton (1975), with a one-sector CD-technology and uncertainty only in labor growth, it follows, Merton (1975, p. 383, footnote 1), that the deterministic steady-state value (κ), has the same value as the mode of the stationary (steady-state) distribution, and that (κ) is not equal to the expectation E(k) of the stationary distribution. But with uncertainty in both labor and depreciation rates, β2 = 0, the mode and (κ) do not necessarily coincide. 
The mode(k) (the most probable long-run value of k) can be obtained by differentiating the density p(k) or the speed density m(k) and setting the derivative equal to zero; we have, cf. (44)–(45),

$p'(k) = 0 \iff m'(k) = 0 \iff b(k)\,b'(k) = a(k)$   (125)

Solving (125), together with (30–34) and β3 = 0, (56), gives

$\mathrm{mode}(k) = [\gamma s/(\Theta + \beta^2)]^{1/(1-\alpha)} = [\gamma s/(n + \delta + \beta_2^2 + \rho_{12}\beta_1\beta_2)]^{1/(1-\alpha)}$   (126)

Since with CD, Table 1, we have $\kappa = [\gamma s/(n + \delta)]^{1/(1-\alpha)}$, it follows that

$\mathrm{mode}(k) = \kappa \iff \beta_2 = -\rho_{12}\beta_1; \quad \beta_2 \neq 0$   (127)

This equality can only be satisfied if β1 ≥ β2. With ρ12 = 0, β2 > 0, and CD, cf. (126), we always find: mode(k) < κ. See also Table 2.
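The numerical procedure described above (integrate and normalize the speed density, then read off E(k), σ(k) and the mode) can be sketched in a few lines. This is only an illustrative sketch, not the authors' Mathematica code. It assumes the CD coefficients take the form a(k) = sγk^a − Θk and b(k)² = β²k² (with β3 = 0), that ρ12 = 0, and that the speed density is m(k) = b(k)⁻² exp(∫ 2a/b² dk); the grid bounds and all function names are hypothetical choices. The parameters are those of CD case 1 in Table 2.

```python
import math

# Table 2, CD case 1 (beta3 = 0). rho12 = 0 is an assumption of this sketch.
s, n, delta, gamma, a = 0.20, 0.02, 0.05, 1.0, 0.20
beta1, beta2, rho12 = 0.01, 0.03, 0.0

theta = n + delta - (beta1**2 + rho12 * beta1 * beta2)        # Theta
beta_sq = beta1**2 + beta2**2 + 2 * rho12 * beta1 * beta2     # beta^2


def log_speed_density(k):
    """log m(k) up to an additive constant, for a(k) = s*gamma*k**a - theta*k
    and b(k)**2 = beta_sq * k**2 (assumed CD coefficients)."""
    # closed-form integral of 2a/b^2 for these coefficients
    integral = (2.0 / beta_sq) * (s * gamma * k**(a - 1) / (a - 1) - theta * math.log(k))
    return -2.0 * math.log(k) + integral


# Uniform grid (hypothetical bounds, wide enough to hold essentially all mass)
k_lo, k_hi, steps = 0.5, 12.0, 20000
h = (k_hi - k_lo) / steps
grid = [k_lo + i * h for i in range(steps + 1)]

log_m = [log_speed_density(k) for k in grid]
shift = max(log_m)                       # subtract the maximum to avoid underflow
m = [math.exp(v - shift) for v in log_m]


def trapezoid(values):
    """Trapezoidal rule on the uniform grid."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


norm = trapezoid(m)
p = [v / norm for v in m]                # normalized stationary density p(k)

mean_k = trapezoid([k * d for k, d in zip(grid, p)])
std_k = math.sqrt(trapezoid([(k - mean_k) ** 2 * d for k, d in zip(grid, p)]))
mode_grid = grid[p.index(max(p))]        # mode located on the grid

kappa = (gamma * s / (n + delta)) ** (1 / (1 - a))                 # deterministic steady state
mode_formula = (gamma * s / (theta + beta_sq)) ** (1 / (1 - a))    # equation (126)

print(f"kappa={kappa:.3f}  E(k)={mean_k:.3f}  sigma(k)={std_k:.3f}")
print(f"mode (grid)={mode_grid:.3f}  mode (126)={mode_formula:.3f}")
```

Under these assumptions the printed values can be compared with the first CD row of Table 2, and the ordering mode(k) < κ stated after (127) can be checked directly.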
Stochastic One-Sector and Two-Sector Models

Although the mode does not coincide with the expectation, the stationary densities, p(k) - though not normal and frequently very spiked - are often close to being symmetric (see the numbers in Table 2 and their shapes in Figures 5.1 - 5.10). But for some stationary distributions with basic parameters close to the boundary of stationarity (endogenous growth), we find heavy-tailed distributions with an expectation significantly larger than the mode.

We have chosen in Fig. 5.1 - 5.11 to exhibit the sample paths for k(t), ω(t), and y(t). All simulations are done using a simple Euler scheme for the underlying stochastic differential equations (SDE). Hence, in one-sector models, we simulate the sample paths for the capital-labor ratio, k(t), of the SDE, (30). The sample path (trajectory) of the stochastic process k(t), (30), is formally given by

$k(t) = k[t;\omega] = k(0) + \int_0^t a(k[u;\omega])\,du + \int_0^t b(k[u;\omega])\,dw_u[\omega]$   (128)
where k(0) is a fixed initial condition and [ω] symbolizes a particular realization of the Wiener process. This sample path (128) is thus approximated by the Euler scheme, Kloeden & Platen (1995). The random numbers (realizations) used in the simulations are generated by the Ran2 generator from Numerical Recipes in C++ with the same initial seed. All processes are sampled with the same step size. After having obtained the simulated sample path k(t), by numerically solving the SDE, (128), the sample paths for the wage-rental ratio, MRS = ω(t), and labor productivity, APL = y(t) - with the same realization of the Wiener process - are determined, for every simulated time point, by inserting the sample path k(t), (128), into the CD-CES equations, (121–122):

$\omega(t) = \frac{1-a}{a}\,k(t)^{1/\sigma}$   (129)

$y(t) = \gamma k(t)^{a}$ (CD); $\quad y(t) = \gamma\left[(1-a) + a\,k(t)^{(\sigma-1)/\sigma}\right]^{\sigma/(\sigma-1)}$ (CES)   (130)
As observed in Table 2 and from (129), the numerical values of k(t) in Fig. 5.1 - 5.11 are located below (above) the values of ω(t), in CD cases, if a < 1/2 (a > 1/2), and if these conditions in CES cases are combined with σ ≤ 1 (σ ≥ 1). The values of y(t), (130), are, with the selected size of (γ), located below the sample path of k(t).
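The simulation steps described above can be sketched as follows for a CD case. This is a minimal illustration, not the authors' C++ implementation: it uses Python's standard random generator instead of Ran2, assumes β3 = 0 and ρ12 = 0, and assumes the coefficients take the form a(k) = sγk^a − Θk and b(k) = βk. The step size, the positivity guard and the function names are hypothetical choices.

```python
import math
import random


def simulate_cd_path(k0, T, dt, s, n, delta, gamma, a, beta1, beta2, rho12=0.0, seed=1):
    """Euler scheme for dk = a(k) dt + b(k) dw with the assumed CD coefficients
    a(k) = s*gamma*k**a - theta*k and b(k) = beta*k (beta3 = 0)."""
    theta = n + delta - (beta1**2 + rho12 * beta1 * beta2)
    beta = math.sqrt(beta1**2 + beta2**2 + 2 * rho12 * beta1 * beta2)
    rng = random.Random(seed)              # fixed seed: same realization on every run
    k = k0
    path = [k0]
    for _ in range(int(round(T / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over one step
        k = k + (s * gamma * k**a - theta * k) * dt + beta * k * dw
        k = max(k, 1e-12)                   # guard: keep the discretized path positive
        path.append(k)
    return path


def cd_omega_and_y(path, gamma, a):
    """Map a k-path into omega(t) and y(t) via (129)-(130) with sigma = 1."""
    omega = [(1 - a) / a * k for k in path]
    y = [gamma * k**a for k in path]
    return omega, y


# Table 2, CD case 1, started from k(0) = 1
k_path = simulate_cd_path(1.0, 500.0, 0.1, 0.20, 0.02, 0.05, 1.0, 0.20, 0.01, 0.03)
omega_path, y_path = cd_omega_and_y(k_path, 1.0, 0.20)
print(len(k_path), k_path[-1], omega_path[-1], y_path[-1])
```

Because ω(t) and y(t) are computed from the same simulated k(t), all three paths share one realization of the Wiener process, as in the text. Setting β1 = β2 = 0 in the same function gives the deterministic trajectory, which should settle at κ.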
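Table 1. Steady-state values (certainty equivalents) of one-sector growth models

Capital-labor ratio: K/L = κ
  CD (σ = 1):  $\kappa = \left[\frac{\gamma s}{n+\delta}\right]^{\frac{1}{1-a}}$
  CES (σ ≠ 1): $\kappa = \left[\frac{1}{1-a}\left(\left(\frac{\gamma s}{n+\delta}\right)^{\frac{1-\sigma}{\sigma}} - a\right)\right]^{\frac{\sigma}{1-\sigma}}$

Average product of L: APL = f(κ)
  CD (σ = 1):  $\gamma^{\frac{1}{1-a}}\left[\frac{s}{n+\delta}\right]^{\frac{a}{1-a}}$
  CES (σ ≠ 1): $\gamma(1-a)^{\frac{\sigma}{\sigma-1}}\left[1 - a\left(\frac{\gamma s}{n+\delta}\right)^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{1-\sigma}}$

Marginal product of L: MPL(κ)
  CD (σ = 1):  $(1-a)\,\gamma^{\frac{1}{1-a}}\left[\frac{s}{n+\delta}\right]^{\frac{a}{1-a}}$
  CES (σ ≠ 1): $\gamma(1-a)^{\frac{\sigma}{\sigma-1}}\left[1 - a\left(\frac{\gamma s}{n+\delta}\right)^{\frac{\sigma-1}{\sigma}}\right]^{\frac{1}{1-\sigma}}$

Average product of K: APK(κ), f(κ)/κ
  CD (σ = 1):  $\frac{n+\delta}{s}$
  CES (σ ≠ 1): $\frac{n+\delta}{s}$

Capital–output ratio: K/Y, κ/f(κ)
  CD (σ = 1):  $\frac{s}{n+\delta}$
  CES (σ ≠ 1): $\frac{s}{n+\delta}$

Marginal product of K: MPK(κ) = f'(κ)
  CD (σ = 1):  $\frac{a(n+\delta)}{s}$
  CES (σ ≠ 1): $a\,\gamma^{\frac{\sigma-1}{\sigma}}\left[\frac{n+\delta}{s}\right]^{\frac{1}{\sigma}}$

Wage–rental ratio: ω(κ)
  CD (σ = 1):  $\frac{1-a}{a}\left[\frac{\gamma s}{n+\delta}\right]^{\frac{1}{1-a}}$
  CES (σ ≠ 1): $\frac{1-a}{a}\left[\frac{1}{1-a}\left(\left(\frac{\gamma s}{n+\delta}\right)^{\frac{1-\sigma}{\sigma}} - a\right)\right]^{\frac{1}{1-\sigma}}$

Capital share: $\epsilon_K(\kappa) = 1 - \epsilon_L(\kappa)$
  CD (σ = 1):  $a$
  CES (σ ≠ 1): $a\left[\frac{\gamma s}{n+\delta}\right]^{\frac{\sigma-1}{\sigma}}$

Per capita consumption: $c_L(\kappa)$
  CD (σ = 1):  (1 − s) f(κ)
  CES (σ ≠ 1): (1 − s) f(κ)

Per capita saving: $s_L(\kappa)$
  CD (σ = 1):  s f(κ)
  CES (σ ≠ 1): s f(κ)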
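The CES entries of Table 1 can be evaluated numerically and checked against the invariance f(κ)/κ = (n + δ)/s mentioned in the text. The sketch below is illustrative only: it assumes the CES form f(k) = γ[(1 − a) + a k^((σ−1)/σ)]^(σ/(σ−1)) of (130), and the function name and the returned dictionary keys are hypothetical.

```python
def ces_steady_state(s, n, delta, gamma, a, sigma):
    """Steady-state values for the CES case (sigma != 1), assuming
    f(k) = gamma * ((1 - a) + a * k**rho) ** (1 / rho), rho = (sigma - 1) / sigma."""
    rho = (sigma - 1.0) / sigma
    x = gamma * s / (n + delta)
    base = (x ** (-rho) - a) / (1.0 - a)
    if base <= 0.0:
        # no finite positive kappa solves s*f(kappa) = (n + delta)*kappa
        raise ValueError("no finite steady state for these parameters")
    kappa = base ** (-1.0 / rho)                        # exponent sigma / (1 - sigma)
    f = gamma * ((1.0 - a) + a * kappa**rho) ** (1.0 / rho)
    mpk = a * gamma**rho * (f / kappa) ** (1.0 / sigma)  # f'(kappa)
    omega = (1.0 - a) / a * kappa ** (1.0 / sigma)       # wage-rental ratio
    return {
        "kappa": kappa,
        "f": f,
        "apk": f / kappa,
        "mpk": mpk,
        "omega": omega,
        "capital_share": kappa * mpk / f,
        "K_over_Y": kappa / f,
    }


# Table 2, CES case 1
case1 = ces_steady_state(s=0.20, n=0.02, delta=0.05, gamma=1.0, a=0.25, sigma=0.5)
for name, value in case1.items():
    print(f"{name:14s} {value:10.4f}")
```

For parameter sets of the endogenous growth type no finite κ exists; the sketch signals this by raising `ValueError` rather than returning a number.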
Table 2. Numerical cases for one-sector growth models: CD (σ = 1) and CES

Parameter values - stationary models, CD (σ = 1)

case  s     n     δ     γ    a     σ    β1    β2    β3     λ3
1     0.20  0.02  0.05  1.0  0.20  1.0  0.01  0.03  0.00   0
2     0.20  0.02  0.05  1.0  0.25  1.0  0.01  0.03  0.00   0
3     0.20  0.02  0.05  1.0  0.40  1.0  0.01  0.03  0.00   0
4     0.20  0.02  0.05  3.0  0.40  1.0  0.01  0.03  0.00   0
5     0.20  0.02  0.05  1.0  0.40  1.0  0.00  0.00  0.10   1
6     0.25  0.02  0.05  1.0  0.40  1.0  0.00  0.00  0.05   1
7     0.25  0.02  0.05  1.0  0.40  1.0  0.00  0.00  0.10   1
8     0.30  0.02  0.08  1.0  0.40  1.0  0.00  0.00  0.10   1
9     0.30  0.02  0.08  1.0  0.40  1.0  0.01  0.03  0.10   1
10    0.20  0.02  0.08  1.0  0.40  1.0  0.01  0.03  0.10   1
11    0.20  0.01  0.08  1.0  0.40  1.0  0.00  0.00  0.10   1
12    0.20  0.01  0.08  2.0  0.40  1.0  0.00  0.00  0.10   1
13    0.20  0.01  0.08  0.3  0.40  1.0  0.00  0.00  0.10   1
14    0.30  0.01  0.08  0.2  0.40  1.0  0.00  0.00  0.10*  1*
15    0.33  0.01  0.08  1.0  0.40  1.0  0.00  0.00  0.10*  1*
16    0.25  0.02  0.05  1.0  0.60  1.0  0.01  0.03  0.10   1

Model characteristics, CD (σ = 1)

case  κ       E(k)     mode(k)  σ(k)   ω(κ)    f(κ)/κ  f'(κ)  f(κ)    ε_K(κ)  K/Y    (n+δ)/s
1     3.715   3.718    3.656    0.353  14.858  0.350   0.070  1.300   0.20    2.857  0.350
2     4.054   4.057    3.986    0.398  12.163  0.350   0.088  1.419   0.25    2.857  0.350
3     5.753   5.753    5.632    0.631  8.629   0.350   0.140  2.014   0.40    2.857  0.350
4     35.900  35.900   35.143   3.936  53.849  0.350   0.140  12.565  0.40    2.857  0.350
5     5.753   5.736    5.685    0.694  8.629   0.350   0.140  2.014   0.40    2.857  0.350
6     8.345   8.341    8.329    0.403  12.517  0.280   0.112  2.336   0.40    3.571  0.280
7     8.345   8.329    8.282    0.806  12.517  0.280   0.112  2.336   0.40    3.571  0.280
8     6.240   6.22862  6.194    0.600  9.360   0.333   0.133  2.080   0.40    3.000  0.333
9     6.240   6.22858  6.102    0.830  9.360   0.333   0.133  2.080   0.40    3.000  0.333
10    3.175   3.162    3.072    0.541  4.762   0.500   0.200  1.587   0.40    2.000  0.500
11    3.784   3.770    3.726    0.517  5.676   0.450   0.180  1.703   0.40    2.222  0.450
12    12.014  11.969   11.833   1.644  18.021  0.450   0.180  5.406   0.40    2.222  0.450
13    0.509   0.508    0.504    0.033  0.763   0.450   0.180  0.229   0.40    2.222  0.450
14    0.509   0.509    0.504    0.033  0.763   0.300   0.120  0.153   0.40    3.333  0.300
15    8.719   8.707    8.719    0.079  13.078  0.273   0.109  2.378   0.40    3.667  0.273
16    24.105  23.960   22.949   4.312  16.070  0.280   0.168  6.749   0.60    3.571  0.280

Parameter values - stationary models, CES

case  s     n     δ     γ    a     σ    β1    β2    β3    λ3
1     0.20  0.02  0.05  1.0  0.25  0.5  0.01  0.03  0.0   0
2     0.20  0.02  0.05  1.0  0.40  0.5  0.01  0.03  0.0   0
3     0.20  0.02  0.05  1.0  0.60  0.5  0.01  0.03  0.0   0
4     0.20  0.02  0.05  3.0  0.60  0.5  0.01  0.03  0.0   0
5     0.20  0.02  0.05  1.0  0.40  1.5  0.01  0.03  0.0   0
6     0.25  0.02  0.05  1.0  0.40  1.5  0.01  0.03  0.0   0
7     0.25  0.02  0.05  1.0  0.60  1.5  0.01  0.03  0.0   0
8     0.20  0.02  0.05  1.0  0.60  1.5  0.01  0.03  0.0   0
9     0.20  0.01  0.08  0.3  0.40  1.5  0.00  0.00  0.1   1
10    0.30  0.01  0.08  0.2  0.40  1.5  0.00  0.00  0.1*  1*
11    0.20  0.02  0.05  1.0  0.40  2.0  0.01  0.03  0.0   0
12    0.20  0.02  0.05  1.0  0.40  3.0  0.01  0.03  0.0   0
13    0.20  0.02  0.05  1.0  0.40  7.0  0.01  0.03  0.0   0
14    0.25  0.02  0.05  1.0  0.40  3.0  0.01  0.03  0.0   0

Model characteristics, CES

case  κ        E(k)     mode(k)  σ(k)     ω(κ)     f(κ)/κ  f'(κ)  f(κ)     ε_K(κ)  K/Y    (n+δ)/s
1     3.476    3.479    3.428    0.309    36.252   0.350   0.031  1.217    0.088   2.857  0.350
2     4.095    4.097    4.035    0.375    25.157   0.350   0.049  1.433    0.140   2.857  0.350
3     5.643    5.642    5.552    0.539    21.228   0.350   0.074  1.975    0.210   2.857  0.350
4     19.929   19.949   19.657   1.756    264.766  0.350   0.024  6.975    0.070   2.857  0.350
5     7.633    7.635    7.412    0.988    5.815    0.350   0.199  2.672    0.568   2.857  0.350
6     13.148   13.147   12.724   1.795    8.356    0.280   0.171  3.681    0.611   3.571  0.280
7     401.664  398.071  345.493  120.230  36.293   0.280   0.257  112.466  0.917   3.571  0.280
8     55.715   55.493   51.178   12.365   9.725    0.350   0.298  19.500   0.851   2.857  0.350
9     0.523    0.523    0.518    0.033    0.974    0.450   0.157  0.235    0.349   2.222  0.450
10    0.523    0.523    0.518    0.033    0.974    0.300   0.105  0.157    0.349   3.333  0.300
11    9.806    9.812    9.429    1.471    4.697    0.350   0.237  3.432    0.676   2.857  0.350
12    15.469   15.507   14.502   3.021    3.737    0.350   0.282  5.414    0.805   2.857  0.350
13    191.698  203.598  105.010  182.373  3.178    0.350   0.344  67.094   0.984   2.857  0.350
14    99.222   99.721   82.518   34.630   6.944    0.280   0.262  27.782   0.935   3.571  0.280

Parameters - Endogenous growth models (CES)

case  s     n     δ     γ    a     σ    β1    β2    β3  λ3
1     0.25  0.02  0.05  1.0  0.40  4.0  0.01  0.03  0   0
2     0.20  0.02  0.05  1.0  0.60  2.0  0.01  0.03  0   0
3     0.25  0.02  0.05  1.0  0.60  2.0  0.01  0.03  0   0
4     0.30  0.02  0.05  1.0  0.60  2.0  0.01  0.03  0   0

Model characteristics - Endogenous growth models: limits for k → ∞

case  κ   f(κ)/κ  f'(κ)
1     ∞   0.295   0.295
2     ∞   0.360   0.360
3     ∞   0.360   0.360
4     ∞   0.360   0.360

* Starred entries: (β3, λ3) are both zero, and the values shown represent (β4, λ4).
Among the alternative initial values k(0) for sample paths (128) and deterministic trajectories (smooth curves) are: [k(0) = 1, 10, 30]; corresponding values of ω(0), y(0), follow from (129–130). With different parameter cases selected from Table 2, the relevant scaling of the vertical axis in Fig. 5.1 - 5.11 is, for illustrative visual and comparative purposes, a delicate graphic problem (in particular without colors). However, the salient features of many sample paths arising from stationary and non-stationary stochastic dynamics are apparent in our Fig. 5.1 - 5.11 for one-sector stochastic growth models.

In the CD case of Fig. 5.1, the influence of the respective initial values upon sample paths of ω(t) has mostly disappeared around (t = 100), cf. (LHS, transient movements), and the two curves have henceforth visually merged on the long-run time interval, (t = 100 − 500). The asymptotic (stationary) probability density of ω(t) is also depicted in Fig. 5.1, cf. (RHS, light). The sample paths of k(t) - and (transient/long-run) deterministic (black) trajectories and a horizontal line for (κ) - are similarly shown in Fig. 5.1, together with p(k), (115-117), cf. (RHS, dark). To avoid clutter, the sample paths of y(t) are presented without a density curve on the RHS of Fig. 5.1.

The overall pattern of transient motion and long-run evolution for the sample paths of Fig. 5.1 is essentially repeated for their alternative CD and CES sample paths, as exhibited in Fig. 5.2 - 5.10. But, evidently, the higher the long-run (steady-state) level of the sample paths is located, the larger is the volatility of the process (and the standard deviation of its asymptotic density) - simply because the diffusion coefficients, b(k), (33-34), are increasing functions of (k).
It is noteworthy that in the short run (transient motion), the upward (downward) parts of sample paths look more regular (monotone) and deceptively less noisy than the underlying reality (process). The uncertainty (volatility) is hidden by the fact that the occurrence of some negative random shocks from the diffusion term, b(k) dw, has been masked (absorbed) by the temporary upward contribution from the dominating drift term, a(k) dt - which later loses its dominance and then leaves the scene free for the volatility effects of the diffusion term, b(k) dw. Only with a low speed (drift) are bicycles susceptible to shocks and show erratic motions.

Remark 4. For the stationary cases, the “mean reverting effect” for the simulated sample paths, k(t), is ensured by strong mixing properties of the processes. This has the effect that, far from the equilibrium, the “mean reverting effect kills the volatility of the process”. Closer to the mode of the distributions, we have more volatility, and the process behavior is more like a random walk. The mean reverting effect is also depicted in expectations, E[k(t)], since E[k(t)] converges exponentially fast towards the stationary mean value.

As mentioned at the beginning of this section, parametric variation within the basic stochastic growth model (30-34) can also generate non-stationary stochastic processes with persistent (endogenous) growth solutions (sample paths) for the capital-labor ratio, k(t). We will briefly exhibit the stochastic solutions to (57) under parametric CES regimes that satisfy the fundamental sufficient (with probability one) condition (102) for long-run per capita growth - which for (57) [with β3 = 0] just corresponds to reversing the condition, (60). Condition (102) with (β3 = 0) is easily verified to be satisfied by the last four CES cases given in Table 2 [and of course by no other parameter cases listed in Table 2; CES case 13 is close, but does not satisfy it, as: 0.0687 < 0.0704]. The sizes of the parameters (a) and (s) are seen in Table 2 to be critical for reducing the actual size of the substitution elasticity (σ) that is required for satisfying (102).

The endogenous stochastic growth paths for k(t), y(t) and ω(t), cf. (128–130), for the CES cases (3-4) are simulated and displayed in Fig. 5.11. The time scale (unit: year, quarter, month) of economic growth models is seldom given much attention. However, whatever unit is appropriate, the sample paths in Fig. 5.11 demonstrate the character of the possible growth solutions to the stochastic process (57). We see also that the growth effects of a higher saving rate are significant on an extended time horizon. Moreover, as mentioned above, for the transient upward part of sample paths, a strong drift term can now, besides generating long-term growth, also absorb negative shocks along sample paths; fast motion (growth) is helpful.
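The comparison 0.0687 < 0.0704 quoted for CES case 13 can be reproduced numerically. Condition (102) itself is stated earlier in the chapter; the sketch below merely assumes that, for β3 = 0 and ρ12 = 0, it compares sγa^(σ/(σ−1)) with Θ + β²/2. That form is an assumption of this sketch, chosen because it reproduces the two quoted numbers, and the function name is hypothetical.

```python
def growth_condition_sides(s, gamma, a, sigma, n, delta, beta1, beta2, rho12=0.0):
    """Return (lhs, rhs) of the assumed persistent-growth condition lhs > rhs,
    with lhs = s*gamma*a**(sigma/(sigma-1)) and rhs = Theta + beta^2 / 2."""
    lhs = s * gamma * a ** (sigma / (sigma - 1.0))
    theta = n + delta - (beta1**2 + rho12 * beta1 * beta2)
    beta_sq = beta1**2 + beta2**2 + 2.0 * rho12 * beta1 * beta2
    return lhs, theta + 0.5 * beta_sq


common = dict(gamma=1.0, n=0.02, delta=0.05, beta1=0.01, beta2=0.03)

# Table 2: stationary CES case 13 and the four endogenous growth cases
cases = {
    "CES 13": dict(s=0.20, a=0.40, sigma=7.0),
    "endogenous 1": dict(s=0.25, a=0.40, sigma=4.0),
    "endogenous 2": dict(s=0.20, a=0.60, sigma=2.0),
    "endogenous 3": dict(s=0.25, a=0.60, sigma=2.0),
    "endogenous 4": dict(s=0.30, a=0.60, sigma=2.0),
}

for label, params in cases.items():
    lhs, rhs = growth_condition_sides(**params, **common)
    print(f"{label:13s} lhs={lhs:.4f} rhs={rhs:.4f} growth={lhs > rhs}")
```

Under this assumed form, the left-hand side depends on s, γ, a and σ only, which is one way to see why a larger a or s lowers the σ needed for persistent growth.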
Such a stochastic growth path will, like deterministic growth, exhibit a pronounced tendency for monotone evolution. For large values of k(t), the SDE (57) is approximated by an analogously parameterized geometric Wiener process, (103), with sample paths, (104), and their explicit expectation and standard deviation given by, cf. Dixit and Pindyck (1994, p.71), de La Grandville (2001, p.292):

$E[k(t)] \sim k_0\,e^{\bar{a} t}, \qquad \sigma(t) = k_0\,e^{\bar{a} t}\left(e^{\bar{b}^2 t} - 1\right)^{1/2}$   (131)

Incidentally, we note that (131) is the one-dimensional version for the expectation vector and the covariance matrix of the GWD (Geometric Wiener Diffusion) model, cf. Jensen et al (2007, Theorem 4, p. 88). Geometric Wiener processes have no asymptotic stationary density functions (curves) of the kind shown in Fig. 5.1 - 5.10.
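The moment formulas in (131) are easy to tabulate. The sketch below is illustrative: it treats ā and b̄ as the drift and volatility parameters of a geometric Wiener process dk = āk dt + b̄k dw, and the numerical values of k0, ā and b̄ are hypothetical, not taken from the chapter.

```python
import math


def gwp_mean_and_std(k0, a_bar, b_bar, t):
    """Expectation and standard deviation of a geometric Wiener process at time t,
    following (131): E = k0*exp(a_bar*t), sd = E*sqrt(exp(b_bar**2 * t) - 1)."""
    mean = k0 * math.exp(a_bar * t)
    std = mean * math.sqrt(math.expm1(b_bar**2 * t))  # expm1(x) = exp(x) - 1
    return mean, std


# Hypothetical parameter values, for illustration only
k0, a_bar, b_bar = 10.0, 0.02, 0.03
for t in (0, 25, 50, 100):
    mean, std = gwp_mean_and_std(k0, a_bar, b_bar, t)
    ratio = std / mean
    print(f"t={t:3d}  E[k(t)]={mean:8.3f}  sigma(t)={std:8.3f}  sigma/E={ratio:.4f}")
```

The ratio σ(t)/E[k(t)] depends on b̄ and t only and grows without bound, which is the sense in which no stationary density exists.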
[Figure 5.1: CD: case 1. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.2: CD: case 3. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.3: CD: case 4. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]
[Figure 5.4: CD: case 8. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.5: CD: case 9. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.6: CD: case 16. Sample paths of k(t), ω(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]
[Figure 5.7: CES: case 2. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.8: CES: case 3. Sample paths of ω(t), k(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.9: CES: case 5. Sample paths of k(t), ω(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]
The uncertainty (volatility) expressed by σ(t) in (131) seems overwhelming. However, the formula (131) describes the potential volatility associated with all (infinitely many) possible sample paths (realizations, stochastic simulations) for k(t) = k[t; ω], (128); hence (131) says nothing about an individual (single) sample path (realization, simulation), however erratic it may appear. Thus, the σ(t) expression of volatility in (131) does not contradict the calm picture of the evolutions (sample paths) exhibited in Fig. 5.11. Economic history (society and individual) is fortunately not repeated (replayed), and the unique growth histories (paths), although uncertain, look like the calm (monotone) sample paths exhibited in Fig. 5.11. Evidently, stochastic growth models contribute to our understanding of historical time series observed from growing economies.

[Figure 5.10: CES: case 11. Sample paths of k(t), ω(t), y(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.11: CES: case 3 and 4 - endogenous growth. Two panels with sample paths of k(t), y(t), ω(t) for t = 0-100.]
5.7 General equilibria of two-sector economies

Great emphasis was naturally first given to labor and capital accumulation in aggregate (one-sector) growth models. An extensive literature on two-sector growth models, however, began in the 1960s. The seminal work on two-sector growth models with flexible sector technologies was done by Uzawa (1961-62, 1963), Solow (1961-62), Inada (1963), Drandakis (1963). The main expositions and references to the early two-sector growth literature are: Stiglitz & Uzawa (1969), Burmeister & Dobell (1970), Wan Jr. (1971), Gandolfo (1980). The study of general equilibrium dynamics in two-sector and multisector growth models has been reviewed and extended in Jensen (2003), Jensen & Larsen (2005) - with emphasis on factor allocation, output composition, and the dualities for commodity and factor prices.

5.7.1 Factor Endowment Allocation and Prices
We now consider an economy consisting of a capital good industry (sector) and a consumer good industry, labeled i = 1, 2, respectively. The factor endowments, total labor force (L) and the total capital stock (K), are inelastically supplied and are fully employed (utilized):

$L = L_1 + L_2, \quad L_1/L + L_2/L \equiv \lambda_{L_1} + \lambda_{L_2} \equiv 1$   (132)

$K = K_1 + K_2, \quad K_1/K + K_2/K \equiv \lambda_{K_1} + \lambda_{K_2} \equiv 1$   (133)

$k \equiv K/L \equiv \lambda_{L_1} k_1 + \lambda_{L_2} k_2 \equiv k_2 + (k_1 - k_2)\lambda_{L_1}$   (134)

$\lambda_{L_1} \equiv (k - k_2)/(k_1 - k_2), \quad \lambda_{K_i} \equiv (k_i/k)\,\lambda_{L_i}, \quad k_1 \neq k_2$   (135)

where the factor allocation fractions are denoted $\lambda_{L_i}$, $\lambda_{K_i}$, (132-133). Free factor mobility between the two industries and efficient factor allocation impose the common MRS condition, cf. (6),

$\omega = \omega_1(k_1) = \omega_2(k_2)$   (136)

For the variables k1 and k2 to satisfy (136), it is, beyond (2) and (6), further required that the intersection of the sectorial ranges for ω1(k1) and ω2(k2) is not empty,

$\omega_i(k_i) \in \Omega_i = [\underline{\omega}_i, \overline{\omega}_i] \subseteq R_+, \quad \omega \in \Omega \equiv \Omega_1 \cap \Omega_2 = [\underline{\omega}, \overline{\omega}] \neq \emptyset$   (137)

The two industries are assumed to operate under perfect competition (zero excess profit); absolute (money) input (factor) prices (w, r) are the same in both industries; and absolute (money) output (product, commodity) prices (P1, P2) represent unit cost. Hence, we have the competitive producer equilibrium equations,

$w = P_i \cdot MP_{L_i}, \quad r = P_i \cdot MP_{K_i}; \quad \omega = w/r, \quad P_i \neq 0$   (138)

$Y_i = L\,y_i\,\lambda_{L_i}, \quad P_i Y_i = wL_i + rK_i, \quad \epsilon_{L_i} = wL_i/(P_i Y_i), \quad \epsilon_{K_i} + \epsilon_{L_i} = 1$   (139)

$p \equiv \frac{P_1}{P_2} = \frac{MP_{L_2}}{MP_{L_1}} = \frac{f_2(k_2) - k_2 f_2'(k_2)}{f_1(k_1) - k_1 f_1'(k_1)} = \frac{MP_{K_2}}{MP_{K_1}} = \frac{f_2'(k_2)}{f_1'(k_1)}$   (140)

Gross domestic product, Y, is the monetary value of sector outputs,

$Y \equiv P_1 Y_1 + P_2 Y_2 = L(P_1 y_1 \lambda_{L_1} + P_2 y_2 \lambda_{L_2}) \equiv Ly$   (141)
and is, with (138-140), equal to the total factor incomes:

$Y = wL + rK = L(w + rk) = L(\omega + k)\,P_i f_i'(k_i) = Ly$   (142)

Hence, the factor income distribution shares, δK + δL = 1, become,

$\delta_K \equiv \frac{rK}{Y} \equiv \frac{rk}{y}, \quad \delta_L \equiv \frac{wL}{Y}; \qquad \delta_K \equiv \frac{k}{\omega + k}, \quad \frac{\delta_K}{\delta_L} \equiv \frac{k}{\omega}$   (143)

The macro equivalence of total revenues and total expenditures gives the decomposition of GDP (141) into expenditure shares, si, as

$s_i = P_i Y_i / Y, \qquad \sum_{i=1}^{2} s_i \equiv \sum_{i=1}^{2} P_i Y_i / Y = 1$   (144)
Lemma 1. The macro factor income shares, δL, δK, (143), are expenditure-weighted combinations of the sectorial factor (cost) shares,

$\delta_L = \sum_{i=1}^{2} s_i \epsilon_{L_i}, \qquad \delta_K = \sum_{i=1}^{2} s_i \epsilon_{K_i}, \qquad \delta_K + \delta_L = 1$   (145)

The factor allocation fractions, (132-133), are obtained by,

$L_i/L = \lambda_{L_i} = s_i \epsilon_{L_i}/\delta_L, \qquad K_i/K = \lambda_{K_i} = s_i \epsilon_{K_i}/\delta_K$   (146)

The total factor endowment ratio, (134), satisfies the identity, cf. (143):

$K/L \equiv k \equiv \frac{\omega\,\delta_K}{\delta_L} \equiv \omega \sum_{i=1}^{2} s_i \epsilon_{K_i} \Big/ \sum_{i=1}^{2} s_i \epsilon_{L_i}$   (147)

which is a representation of Walras’s law.
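Lemma 1 can be illustrated numerically with the first CD case of Table 3. The sketch below is a hedged illustration: it assumes expenditure shares (s1, s2) = (s, 1 − s), as suggested by the saving function S = sY, and CD cost shares equal to a_i, with sector ratios k_i = a_i ω/(1 − a_i). The function name is hypothetical.

```python
def macro_shares_and_allocation(exp_shares, capital_cost_shares):
    """Apply (145)-(146): macro income shares and factor allocation fractions
    from expenditure shares s_i and sector capital cost shares eps_Ki."""
    labor_cost_shares = [1.0 - e for e in capital_cost_shares]
    delta_K = sum(si * e for si, e in zip(exp_shares, capital_cost_shares))
    delta_L = sum(si * e for si, e in zip(exp_shares, labor_cost_shares))
    lam_L = [si * e / delta_L for si, e in zip(exp_shares, labor_cost_shares)]
    lam_K = [si * e / delta_K for si, e in zip(exp_shares, capital_cost_shares)]
    return delta_K, delta_L, lam_L, lam_K


# Table 3, CD case 1: s = 0.20, a1 = 0.2, a2 = 0.5, steady-state omega = 5.546
s, a1, a2, omega_bar = 0.20, 0.2, 0.5, 5.546

delta_K, delta_L, lam_L, lam_K = macro_shares_and_allocation([s, 1.0 - s], [a1, a2])

k_from_identity = omega_bar * delta_K / delta_L                 # equation (147)
k_sector = [a / (1.0 - a) * omega_bar for a in (a1, a2)]        # CD sector ratios
k_from_allocation = sum(l * k for l, k in zip(lam_L, k_sector)) # equation (134)

print(f"delta_K={delta_K:.3f}  delta_L={delta_L:.3f}")
print(f"lambda_L={lam_L}  lambda_K={lam_K}")
print(f"k via (147)={k_from_identity:.3f}  k via (134)={k_from_allocation:.3f}")
```

Both routes to k agree, and under these assumptions the value can be compared with the κ and δK entries for CD case 1 in Table 3.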
Proof: By definition, we have,

$\delta_L = wL/Y = [wL_1 + wL_2]/Y, \qquad \delta_K = rK/Y = [rK_1 + rK_2]/Y$   (148)

From (139) and (144), we get

$wL_i = \epsilon_{L_i} P_i Y_i = s_i \epsilon_{L_i} Y, \qquad rK_i = \epsilon_{K_i} P_i Y_i = s_i \epsilon_{K_i} Y$   (149)

Hence, by (148) and (149) we obtain (145). Next, we obtain

$\lambda_{L_i} = \frac{L_i}{L} = \frac{wL_i}{wL} = \frac{s_i \epsilon_{L_i} Y}{\delta_L Y} = \frac{s_i \epsilon_{L_i}}{\delta_L}$   (150)

$\lambda_{K_i} = \frac{K_i}{K} = \frac{rK_i}{rK} = \frac{s_i \epsilon_{K_i} Y}{\delta_K Y} = \frac{s_i \epsilon_{K_i}}{\delta_K}$   (151)

as stated in (146). □

5.7.2 Commodity prices and factor prices with CES
The connection between relative factor (service) prices and relative commodity prices follows from (6, 136, 138, 140),

$p(\omega) = \frac{P_1}{P_2}(\omega) = \frac{MP_{K_2}[k_2(\omega)]}{MP_{K_1}[k_1(\omega)]} = \frac{f_2'[k_2(\omega)]}{f_1'[k_1(\omega)]}, \qquad \omega = w/r$   (152)
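The exact form of the function (152) needs particular attention. With (10) and (14), the relative commodity prices (comparative costs) (152) become, with σi = 1, σi ≠ 1, and σ1 = σ2 = σ, respectively,

$p(\omega) = \frac{f_2'[k_2(\omega)]}{f_1'[k_1(\omega)]} = \frac{\gamma_2 a_2 k_2(\omega)^{a_2-1}}{\gamma_1 a_1 k_1(\omega)^{a_1-1}} = \frac{\gamma_2 a_2^{a_2}(1-a_2)^{1-a_2}}{\gamma_1 a_1^{a_1}(1-a_1)^{1-a_1}}\;\omega^{(a_2-a_1)}$   (153)

$p(\omega) = \frac{f_2'[k_2(\omega)]}{f_1'[k_1(\omega)]} = \frac{\gamma_2 a_2\left[a_2 + (1-a_2)k_2(\omega)^{-(\sigma_2-1)/\sigma_2}\right]^{1/(\sigma_2-1)}}{\gamma_1 a_1\left[a_1 + (1-a_1)k_1(\omega)^{-(\sigma_1-1)/\sigma_1}\right]^{1/(\sigma_1-1)}} = \frac{\gamma_2 a_2^{\sigma_2/(\sigma_2-1)}\,(1 + c_2\,\omega^{1-\sigma_2})^{1/(\sigma_2-1)}}{\gamma_1 a_1^{\sigma_1/(\sigma_1-1)}\,(1 + c_1\,\omega^{1-\sigma_1})^{1/(\sigma_1-1)}}$   (154)

$p(\omega) = \frac{\gamma_2}{\gamma_1}\left[\frac{a_2}{a_1}\right]^{\sigma/(\sigma-1)}\left[\frac{1 + c_2\,\omega^{1-\sigma}}{1 + c_1\,\omega^{1-\sigma}}\right]^{1/(\sigma-1)}, \qquad c_i = \left[\frac{1-a_i}{a_i}\right]^{\sigma}$   (155)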
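The closed forms (153)-(154) can be cross-checked against the direct ratio of marginal products in (152). The sketch below is illustrative and rests on stated assumptions: sector ratios k_i(ω) = [a_i ω/(1 − a_i)]^σi, the marginal products as written in (153)-(154), and c_i = [(1 − a_i)/a_i]^σi in the general case. The parameter values and function names are hypothetical.

```python
def sector_ratio(omega, a, sigma):
    """Sector capital-labor ratio k_i(omega) = (a*omega/(1-a))**sigma (assumed form)."""
    return (a * omega / (1.0 - a)) ** sigma


def marginal_product_capital(k, gamma, a, sigma):
    """f_i'(k) for CD (sigma == 1) and CES (sigma != 1), as used in (153)-(154)."""
    if sigma == 1.0:
        return gamma * a * k ** (a - 1.0)
    inner = a + (1.0 - a) * k ** (-(sigma - 1.0) / sigma)
    return gamma * a * inner ** (1.0 / (sigma - 1.0))


def price_direct(omega, sector1, sector2):
    """p(omega) = f2'[k2(omega)] / f1'[k1(omega)], equation (152)."""
    values = []
    for gamma, a, sigma in (sector1, sector2):
        k = sector_ratio(omega, a, sigma)
        values.append(marginal_product_capital(k, gamma, a, sigma))
    return values[1] / values[0]


def price_closed_form(omega, sector1, sector2):
    """Right-hand sides of (153) (CD) and (154) (CES), written in omega only."""
    def term(gamma, a, sigma):
        if sigma == 1.0:
            return gamma * a**a * (1.0 - a) ** (1.0 - a) * omega ** (a - 1.0)
        c = ((1.0 - a) / a) ** sigma
        return (gamma * a ** (sigma / (sigma - 1.0))
                * (1.0 + c * omega ** (1.0 - sigma)) ** (1.0 / (sigma - 1.0)))
    return term(*sector2) / term(*sector1)


# Hypothetical sectors given as (gamma_i, a_i, sigma_i)
ces_1, ces_2 = (1.0, 0.4, 1.5), (1.0, 0.5, 1.2)
cd_1, cd_2 = (1.0, 0.2, 1.0), (1.0, 0.5, 1.0)

for omega in (0.5, 2.0, 8.0):
    print(omega,
          price_direct(omega, ces_1, ces_2), price_closed_form(omega, ces_1, ces_2),
          price_direct(omega, cd_1, cd_2), price_closed_form(omega, cd_1, cd_2))
```

In the CD case the printed price is proportional to ω^(a2 − a1), so it rises with ω when a2 > a1.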
5.7.3 Walrasian general equilibrium and CES

As to the demand (expenditure share) decomposition between consumption and investment (saving), we shall employ the “neoclassical” saving assumption, which has been standard in much of the growth literature. It is immaterial for our purposes whether investment is controlled by owners or managers. Hence, we use the aggregate monetary saving function: S = sY,
0 (n + δ)/s σ1 σ1 −1
γ1 a1
> (n + δ)
(173) (174)
σ1 ≤ 1 : (Sufficient condition) Persistent growth of k(t) is impossible.

σ1 > 1 : Necessary and sufficient conditions for $\lim_{t\to\infty} k(t) = \infty$ are:

σ1 > 1, σ2 < 1 :  $\gamma_1 a_1^{\sigma_1/(\sigma_1-1)} > (n + \delta)$   (175)

σ1 > 1, σ2 > 1 :  $\gamma_1 a_1^{\sigma_1/(\sigma_1-1)} > (n + \delta)/s$   (176)
except that (175) is occasionally not sufficient for small initial values.

Proof: See Jensen (2003). The proposition follows essentially from a straightforward examination of the sign of h(k), (169), by a comparative evaluation of f1'(k1[ω]), (172), and δK(ω), (162). The difference (s) in the RHS constant of the inequalities comes from δK(ω), (162), taking values 1 or s - for ω → 0 or for ω → ∞ - depending on the size of σ2. The boundary behavior of the marginal product of capital, cf. (11 - 13), in the capital good sector, f1'(k1[ω]), (172), is crucial for a steady state or persistent growth. □

Proposition 1 shows explicitly that the global existence issues of any steady state or persistent growth depend on the size of the key parameters: σi, a1, γ1, s, n, δ. While the accumulation parameters (s, n, δ) play some role, the fundamental role of the technology parameters in the capital good sector (σ1, γ1, a1) in deciding the type of long-run evolution in the CGE growth models complies with observation and economic intuition, and confirms the strategic importance ascribed to capital good industries by economic historians and the general public, cf. Mahalanobis (1955), Rosenberg (1963). The most important parameter in Proposition 1 is the substitution elasticity in the capital good sector, σ1. The total productivity parameter γ1 in the capital-good sector matters in all the stated conditions (173-175). If we restrict γ1 = 1 and if σ1 2, then (176) is usually satisfied for other relevant parameters, e.g. high saving rates. The critical role in Proposition 1 is played by σ1 rather than σ2.

The conclusions in Proposition 1 contrast sharply with the earlier standard literature on two-sector growth models. Stiglitz & Uzawa (1969, p.407) reported (with neoclassical saving) that a sufficient condition for uniqueness and stability of (convergence to) balanced (steady state) growth paths is: “substitution elasticity in each sector greater than or equal to one.” This condition is neither necessary nor sufficient for the long-run steady state family. Indeed, a high value of σ1 would preclude the existence of steady-state growth, cf. (175 - 176).

5.8.2 Neoclassical SDE of the capital-labor ratio

The deterministic model (164-165) becomes, with uncertainties (stochastic elements $\epsilon_i$) in the growth rate of labor, (n), the gross saving rate, (s), and the capital depreciation rate, (δ):

$\dot{L} = L(n + \beta_1 \epsilon_1)$   (177)

$\dot{K} = L(s + \phi_3(k)\,\epsilon_3)\,y/P_1 - (\delta + \beta_2 \epsilon_2)K$   (178)
For the labor and capital stock, the stochastic differential equations (SDE) associated with (177–178) are given by

$dL = Ln\,dt + L\beta_1\,dw_1$   (179)

$dK = (Ls\,y/P_1 - \delta K)\,dt - \beta_2 K\,dw_2 + L\,(y/P_1)\,\phi_3(k)\,dw_3$   (180)

Theorem 6. The stochastic neoclassical dynamics for the capital-labor ratio of the two-sector model (179-180) is given by the SDE

$dk = a(k)\,dt + b(k)\,dw, \qquad k \in (0, \infty)$   (181)

where the drift coefficient a(k) and diffusion coefficient b(k) are,

$a(k) = \bar{s}(k)\,(y/P_1) - \Theta k, \qquad \bar{s}(k) = s - \rho_{13}\beta_1\phi_3(k)$
$\Theta = n + \delta - (\beta_1^2 + \rho_{12}\beta_1\beta_2)$
$b^2(k) = \beta^2 k^2 + \phi_3(k)^2 (y/P_1)^2 - \rho\,\phi_3(k)\,(y/P_1)\,k$
$\beta^2 = \beta_1^2 + \beta_2^2 + 2\rho_{12}\beta_1\beta_2, \qquad \rho = 2(\rho_{13}\beta_1 + \rho_{23}\beta_2)$   (182)

where y/P1 is given by, cf. (142), (168),

$\frac{y}{P_1} = (\omega + k)\,f_1'(k_1(\omega)) = (\Psi^{-1}(k) + k)\,f_1'(k_1[\Psi^{-1}(k)]) \equiv \Upsilon(k)$   (183)

Proof: The proof of Theorem 6 is a replication of the proof of Theorem 1 with f(k) in (31), (33) replaced by Υ(k), (183). □
For the one-sector growth models, we obtained explicit conditions for the lower and upper boundaries - zero and ∞ - to be inaccessible boundaries. These explicit conditions - stated in Theorems 2-4 - were established after lengthy calculation, as their proofs showed. With the examples considered here, the βi terms only have a second-order effect, and the intuition from Proposition 1 will help to understand the stochastic system. We have not calculated the exact boundary conditions, which is probably not feasible due to the complex structure of y/P1, cf. (183).

5.9 Sample paths of two-sector models and CES
We include a few simulations of stochastic two-sector growth models with some variation of the parameters to show the critical parameter values in long-run stationary processes and, alternatively, with infinity as the attractor, in stochastic endogenous growth. Numerical parameter cases illustrating steady-state values $(\kappa, \bar{\omega})$, (170–172), with CD/CES technologies are collected in Table 3. The sample path of the stochastic process, k(t), in two-sector growth models, (181), is formally given by:

$k(t) = k[t;\omega] = k(0) + \int_0^t a(k[u;\omega])\,du + \int_0^t b(k[u;\omega])\,dw_u[\omega]$   (184)

where k(0) is a fixed initial condition and [ω] symbolizes a particular realization of the Wiener process. This sample path (184) is thus approximated by the Euler scheme, Kloeden & Platen (1995). After having determined the simulated sample path, k(t), the sample paths for ω(t), k1(t), and y/P1(t) for the two-sector model (with the same realization of the Wiener process) are determined by using the respective equations, (169), (14), (163), (183), and (172),

$\omega(t) = \Psi^{-1}[k(t)]; \qquad k_i(t) = \left[\frac{a_i}{1-a_i}\,\Psi^{-1}[k(t)]\right]^{\sigma_i}$   (185)

$y/P_1(t) = \Upsilon[k(t)] = \left[\Psi^{-1}[k(t)] + k(t)\right] f_1'\!\left(k_1[\Psi^{-1}[k(t)]]\right)$   (186)

Some sample paths, (184-186), are exhibited in Fig. 5.12 - 5.14. From the CGE relationship, ω = Ψ⁻¹(k), (169), (163), the stationary density function for (ω), called ϕ(ω), is the stationary density function, p(k), transformed by the function Ψ⁻¹. Hence, using the transformation theorem of densities, ϕ(ω) in Fig. 5.12 - 5.14 is,

$\varphi(\omega) = p(\Psi(\omega))\,\Psi'(\omega)$   (187)
Table 3. Numerical cases for two sector growth models: CD and CES

Parameters - stationary models, CD

case  s     n     δ     a1   a2   σ1   σ2   β1    β2    β3   λ3
1     0.20  0.02  0.05  0.2  0.5  1.0  1.0  0.01  0.03  0.0  0
2     0.20  0.02  0.05  0.4  0.5  1.0  1.0  0.01  0.03  0.0  0

Model characteristics, CD

case  κ      E(k)   mode(k)  σ(k)   ω̄      f1(k1(ω̄))/k1(ω̄)  f1'(k1(ω̄))  f1(k1(ω̄))  ε_K1(ω̄)  δK(ω̄)
1     4.357  4.361  4.288    0.414  5.546  0.770             0.154        1.068       0.200     0.440
2     5.878  5.878  5.754    0.644  6.368  0.420             0.168        1.783       0.400     0.480

Parameters - stationary models, CES

case  s     n     δ     a1   a2   σ1   σ2   β1    β2    β3   λ3
1     0.25  0.02  0.05  0.4  0.5  0.5  0.7  0.01  0.03  0.0  0
2     0.25  0.02  0.05  0.6  0.5  0.5  0.7  0.01  0.03  0.1  1
3     0.25  0.02  0.05  0.4  0.5  1.2  1.5  0.01  0.03  0.0  0
4     0.25  0.02  0.05  0.4  0.5  1.2  1.5  0.01  0.03  0.1  1
5     0.25  0.02  0.05  0.4  0.5  1.5  1.2  0.01  0.03  0.0  0
6     0.25  0.02  0.05  0.4  0.5  1.5  1.2  0.01  0.03  0.1  1

Model characteristics, CES

case  κ       E(k)    mode(k)  σ(k)   ω̄       f1(k1(ω̄))/k1(ω̄)  f1'(k1(ω̄))  f1(k1(ω̄))  ε_K1(ω̄)  δK(ω̄)
1     5.583   5.586   5.502    0.507  14.684  0.439             0.077        1.374       0.176     0.275
2     7.586   7.564   7.435    0.950  20.606  0.354             0.075        1.969       0.212     0.269
3     11.196  11.192  10.902   1.369  6.029   0.389             0.182        2.063       0.468     0.650
4     11.196  11.146  10.769   1.797  6.029   0.389             0.182        2.063       0.468     0.650
5     13.198  13.150  12.725   1.797  8.499   0.277             0.170        3.739       0.613     0.607
6     13.198  13.113  12.501   2.398  8.499   0.277             0.170        3.739       0.613     0.607

Parameters - endogenous growth models

case  s     n     δ     a1   a2   σ1   σ2   β1    β2    β3   λ3
1     0.25  0.02  0.05  0.6  0.5  2.0  1.5  0.01  0.03  0.0  0
2     0.25  0.02  0.05  0.6  0.5  2.0  1.5  0.01  0.03  0.1  1
3     0.30  0.02  0.05  0.6  0.5  2.0  1.5  0.01  0.03  0.1  1

Model characteristics - endogenous growth models: limits for k → ∞

case  κ   f1(k1(ω̄))/k1(ω̄)  f1'(k1(ω̄))
1     ∞   0.360             0.360
2     ∞   0.360             0.360
3     ∞   0.360             0.360

Remark. Whereas the stationary probability density (188) and the first part of Proposition 1, (173-174), refer to figures like Fig. 5.12-5.14, the second part of Proposition 1, (175-176), refers to and provides the parameter values that generate persistent (endogenous) economic growth in figures like Fig. 5.15. Proposition 1 serves as a deterministic substitute for the two-sector analogues of the stochastic Theorems 2-4 and the stochastic condition, (102), of one-sector growth models.
[Figure 5.12: CES II: case 1. Sample paths of ω(t), k(t), k1(t), y/P1(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.13: CES II: case 3. Sample paths of k(t), ω(t), k1(t), y/P1(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]

[Figure 5.14: CES II: case 6. Sample paths of ω(t), k1(t), k(t), y/P1(t) for t = 0-100 and t = 100-500, with stationary densities on the right.]
where p is the basic stationary density function for (k) as defined by the drift and diffusion coefficients: (44–45), (115–117). The leading role of the capital-good sector as an engine of persistent growth is exhibited by sample paths in Fig. 5.15. For proper scaling of the vertical axis, the sample path of k1(t) is not included in the left panel, whereas it appears in the right panel. The comments given above to Fig. 5.11 apply similarly to Fig. 5.15.

Stochastic (probability) issues appear, according to Pierre-Simon Laplace, because we are partly knowing and partly ignorant. The dynamics of stochastic and deterministic growth models share a common element: the drift coefficient, which constitutes the substance of our knowledge, based on economic theory and empirical (verified) data. The course of future events is admittedly uncertain, but the diffusion coefficient, similarly based on solid theoretical/empirical premises, governs the volatility of outcomes. The mathematical tool of continuous-time probability of Norbert Wiener and Kiyoshi Itô allows probability calculations to be made about observable future time paths. Stochastic dynamics properly employed, far from making economic growth models more abstract, actually serve as a powerful liberating framework, enabling the analysis and integration of ever more realistic and complicated hypotheses. Stochastic and deterministic growth models contribute together to a genuine understanding of historical time series in economics and other disciplines.
[Figure 5.15: CES II: case 1 - endogenous growth. Left panel: sample paths of k(t), k2(t), y/P1(t), ω(t) for t = 0-100; right panel: the same paths together with k1(t).]
References:
Aghion, P., and Howitt, P. (1998) Endogenous Growth Theory. Cambridge, MA and London, England: The MIT Press.
Arrow, K.J., Chenery, H.B., Minhas, B.S., and Solow, R.M. (1961) “Capital-Labour Substitution and Economic Efficiency.” Review of Economics and Statistics 43: 225–250.
Barro, R.J. (1991) “Economic Growth in a Cross-Section of Countries.” Quarterly Journal of Economics 106: 407–443.
Barro, R.J., and Sala-i-Martin, X. (1992) “Convergence.” Journal of Political Economy 100: 223–251.
Bourguignon, F. (1974) “A Particular Class of Continuous-Time Stochastic Growth Models.” Journal of Economic Theory 9: 141–158.
Burmeister, E., and Dobell, A.R. (1970) Mathematical Theories of Economic Growth. London: MacMillan.
Chang, F.R., and Malliaris, A.G. (1987) “Asymptotic Growth under Uncertainty – Existence and Uniqueness.” Review of Economic Studies 54: 169–174.
De Long, J.B., and Summers, L.H. (1991) “Equipment Investment and Economic Growth.” Quarterly Journal of Economics 106: 445–502.
Dixit, A.K., and Pindyck, R.S. (1994) Investment under Uncertainty. New Jersey: Princeton University Press.
Gandolfo, G. (1980) Economic Dynamics: Methods and Models. 2. ed. Amsterdam: North-Holland.
Gandolfo, G. (1997) Economic Dynamics. Berlin/New York: Springer Verlag.
Itô, K., and McKean, H.P., Jr. (1965) Diffusion Processes and Their Sample Paths. Berlin: Springer-Verlag.
Jensen, B.S., and Larsen, M.E. (1987) “Growth and Long-Run Stability.” Acta Applicandae Mathematicae 9: 219–237. Also in: Jensen (1994).
Jensen, B.S. (1994) The Dynamic Systems of Basic Economic Growth Models. Dordrecht: Kluwer Academic Publishers.
Jensen, B.S., and Wang, C. (1997) “General Equilibrium Dynamics of Basic Trade Models for Growing Economies.” In: Bjarne S. Jensen and Kar-yiu Wong (eds.) Dynamics, Economic Growth, and International Trade. Ann Arbor: University of Michigan Press.
Jensen, B.S., and Wang, C. (1999) “Basic Stochastic Dynamic Systems of Growth and Trade.” Review of International Economics 7: 378–402.
Jensen, B.S., Richter, M., Wang, C., and Alsholm, P.K. (2001) “Saving Rates, Trade, Technology, and Stochastic Dynamics.” Review of Development Economics 5: 182–204.
Jensen, B.S. (2003) “Walrasian General Equilibrium Allocations and Dynamics in Two-Sector Growth Models.” German Economic Review 4: 53–87.
Jensen, B.S., and Larsen, M.E. (2005) “General Equilibrium Dynamics of Multi-Sector Growth Models.” Journal of Economics Supplement 10: 17–56.
Jensen, B.S., Wang, C., and Johnsen, J. (2007) “Moment Evolution of Gaussian and Geometric Wiener Diffusions - derived by Itô’s Lemma and Kolmogorov’s Forward Equation.” This volume.
Jones, R.W. (1965) “The Structure of Simple General Equilibrium Models.” The Journal of Political Economy 73: 557–72.
Karlin, S., and Taylor, H.M. (1981) A Second Course in Stochastic Processes. N.Y.: Academic Press.
Kemp, M.C. (1969) The Pure Theory of International Trade and Investment. New Jersey: Prentice-Hall.
Kloeden, P.E., and Platen, E. (1995) Numerical Solutions to Stochastic Differential Equations. 2. ed. Berlin: Springer Verlag.
Klump, R. (1995) “On the Institutional Determinants of Economic Development – Lessons from a Stochastic Neoclassical Growth Model.” Jahrbuch für Wirtschaftswissenschaft 46: 138–51.
de La Grandville, O. (2001) Bond Pricing and Portfolio Analysis. Cambridge, Mass.: MIT Press.
Lau, S.-H.P. (2002) “Further Inspection of the Stochastic Growth Model by an Analytical Approach.” Macroeconomic Dynamics 6: 748–757.
Mahalanobis, P.C. (1955) “The approach of operational research to planning in India.” Sankhya: Indian Journal of Statistics 16: 3–62.
Mandl, P. (1968) Analytical Treatment of One-Dimensional Markov Processes. (Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen, Band 151) Berlin: Springer-Verlag.
Malliaris, A.G., and Brock, W.A. (1982) Stochastic Methods in Economics and Finance. Amsterdam: North-Holland/Elsevier.
Mas-Colell, A., Whinston, M.D., and Green, J.R. (1995) Microeconomic Theory. New York: Oxford University Press.
Merton, R.C. (1975) “An Asymptotic Theory of Growth under Uncertainty.” Review of Economic Studies 42: 375–393.
Merton, R.C. (1990) Continuous-Time Finance. Cambridge MA: Blackwell.
Minhas, B.S. (1962) “The Homohypallagic Production Function, Factor Intensity Reversals, and the Heckscher-Ohlin Theorem.” The Journal of Political Economy 70: 138–156.
Quah, D.T. (1996) “Empirics for Economic Growth and Convergence.” European Economic Review 40: 1353–1375.
Rosenberg, N. (1963) “Capital Goods, Technology, and Economic Growth.” Oxford Economic Papers 15: 217–227.
Sala-i-Martin, X.X. (1996) “Regional Cohesion: Evidence and Theories of Regional Growth and Convergence.” European Economic Review 40: 1325–1352.
Sandmo, A. (1970) “The Effect of Uncertainty on Saving Decisions.” Review of Economic Studies 37: 353–60.
Schenk-Hoppé, K.R. (2002) “Is there a Golden Rule for the Stochastic Solow Growth Model?” Macroeconomic Dynamics 6: 457–475.
Solow, R.M. (1961-62) “Note on Uzawa’s Two-Sector Model of Economic Growth.” Review of Economic Studies 29: 48–50. Also in: Stiglitz and Uzawa (1969).
Stigum, B.P. (1972) “Balanced Growth under Uncertainty.” Journal of Economic Theory 5: 42–68.
Stiglitz, J.E., and Uzawa, H. (eds.) (1969) Readings in the Modern Theory of Economic Growth. Cambridge (Mass.): M.I.T. Press.
Uzawa, H. (1961-62) “On a Two-Sector Model of Economic Growth: I.” Review of Economic Studies 29: 40–47.
Uzawa, H. (1963) “On a Two-Sector Model of Economic Growth: II.” Review of Economic Studies 30: 105–18. Also in: Stiglitz and Uzawa (1969).
Wan, H.Y., Jr. (1971) Economic Growth. N.Y.: Harcourt Brace Jovanovich.
Øksendal, B. (2005) Stochastic Differential Equations – An Introduction with Applications. 6. ed. Series: Universitext, Springer-Verlag.
Chapter 6 Comparative Dynamics in a Stochastic Growth and Trade Model with a Variable Savings Rate
Zhu Hongliang School of Management and Engineering, Nanjing University, Nanjing, P.R.China Huang Wenzao Department of Financial Mathematics, Peking University, Beijing, P.R.China
6.1 Introduction
The central purpose of theories of economic growth is to understand the factors behind the long-run growth of economies, and to explain differences in their growth performances. A wide class of growth models has been explored in the past decades, see Barro and Sala-i-Martin (1995), Burmeister and Dobell (1970). Most of these models, however, are deterministic and hence ignore fundamental uncertainties, which affect productivity and cause diversity between nations. Such uncertainties are intrinsic features of dynamic economic systems, and stochastic elements will appear in any economic growth process generated by factor endowment accumulation and technological change, as many sources of uncertainty exist in population growth, production processes, consumer behavior, government expenditure, and policy decisions.

In this chapter, we consider the neoclassical two-sector growth model of a small country that trades in both commodities under uncertainty. This model originates in the open neoclassical two-sector growth model by Deardorff (1974), Deardorff and Hanson (1978), and Jensen and Wang (1997) in Jensen and Wong (1997), and was extended into a stochastic environment in continuous time by Jensen and Wang (1999). In addition, we use a savings function $s(k, \theta)$ instead of the constant rate of savings in Jensen and Wang (1999), i.e., the rate of saving depends on the capital-labor ratio $k$ and on any policy parameter $\theta$. As to the dependence of the savings rate on the capital-labor ratio, see Solow (1956). Policy instruments such as the (initial) stock of money, the rate of capital income taxation, or other government regulations enter into our analysis as parameters. A perturbation in these parameters will influence the dynamic behavior of the economy. Similar savings functions have been studied by Atkinson and Stiglitz (1980), Boadway (1979), Chang and Malliaris (1987) and Merton (1975).

We study the global comparative dynamic properties of the capital accumulation process with respect to any policy parameter. By characterizing the entire time path of the capital accumulation process, the effect of a policy parameter on the behavior of the entire capital accumulation path can be determined. We show that the time path of the capital-labor ratio is monotone with respect to any policy parameter, if the savings function changes monotonically with respect to that policy parameter. In addition, we analyze the impact of the policy parameter on the steady-state distribution of the capital-labor ratio in the stochastic growth and trade model.

6.2 Stochastic dynamic systems for trading economies
Two-sector growth models were first studied systematically by Shinkai (1960), Uzawa (1961). Due to the fundamental differences between capital goods and consumer goods, production is divided into two sectors, a capital-good sector and a consumer-good sector, which are described by different production functions. Here we briefly present the structure of the neoclassical two-sector growth and trade model. The sector technologies are described by neoclassical production functions exhibiting constant returns to scale,

$$Y_i = F_i(L_i, K_i) = L_i F_i(1, K_i/L_i) = L_i f_i(k_i) = L_i y_i, \quad i = 1, 2. \tag{1}$$

The factor endowments $(L, K)$ belong to the diversification cone $C_k$:

$$C_k = \{(L, K) \in \mathbb{R}^2_+ \mid k_1 < K/L < k_2 \ \text{or} \ k_2 < K/L < k_1\}. \tag{2}$$
The two-sector economy is assumed to operate under perfect competition; money factor prices $(w, r)$ are the same in both sectors, and output prices $(P_1, P_2)$ represent unit cost. Hence, we have the competitive general equilibrium relations, $P_i Y_i = rK_i + wL_i$, $i = 1, 2$. Gross domestic product $Y$ is the monetary value of outputs from both sectors and represents aggregate gross factor incomes (with $l_i = L_i/L$):

$$Y = P_1 Y_1 + P_2 Y_2 = L(P_1 y_1 l_1 + P_2 y_2 l_2) = rK + wL = L(rk + w) = Ly. \tag{3}$$

The small open, competitive, two-sector economy is trading at international prices, determined in the world market, i.e., at the exogenous terms of trade $p = P_1/P_2$. Let $Q_i$, $i = 1, 2$, respectively, denote the quantitative size of the domestic demand for investment (good 1) and consumption (good 2); then we have $Q_i = Y_i - X_i$, where $X_i$, $i = 1, 2$, are net exports of the two goods. The trade balance is assumed to satisfy the constraint

$$P_1 X_1 + P_2 X_2 = 0. \tag{4}$$

Then

$$Y = P_1 Y_1 + P_2 Y_2 = P_1 Q_1 + P_1 X_1 + P_2 Q_2 + P_2 X_2 = P_1 Q_1 + P_2 Q_2, \tag{5}$$

i.e., trade equilibrium prevails with no foreign borrowing/lending allowed.

Lemma 1 (Deardorff (1974), Jensen and Wang (1997)). With given prices $(P_1, P_2)$ and the monotonicity and concavity conditions, the GNP function $y(k)$ is a concave $C^1$-class function on $(0, \infty)$, and $y(k)$ has a linear segment in the diversification cone $C_k$.

Next we set up the neoclassical deterministic dynamic system for the small two-sector trading economy. The proportion of gross income $Y$, cf. (3), that is saved is given by the savings ratio $s(k, \theta)$. The savings ratio, as a function, is allowed to depend on $k$, and more importantly on any policy parameter $\theta$. This general assumption is made, because any policy parameter will influence the savings-consumption decision, and thereby the dynamic behavior of the trading economy through the savings ratio.
With the depreciation of capital, $(\delta P_1 K)$, and per capita income $y = Y/L$, cf. (3), the factor accumulation equations become:

$$\begin{cases} \dot L = \dfrac{dL}{dt} = Ln, \\[4pt] \dot K = \dfrac{dK}{dt} = L\,s(k, \theta)\dfrac{y}{P_1} - \delta K = L\{s(k, \theta)[y_1 l_1 + (y_2/p)\, l_2] - \delta k\} = L g(k). \end{cases} \tag{6}$$

Thus, the dynamics of the capital-labor ratio $k$, for $0 < k < \infty$, can be obtained as

$$\dot k = g(k) - nk = s(k, \theta)\frac{y}{P_1} - (\delta + n)k = s(k, \theta)[y_1 l_1 + (y_2/p)\, l_2] - (n + \delta)k. \tag{7}$$
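The right-hand side of (7) can be made concrete with a small numerical sketch. It assumes Cobb-Douglas sector technologies $f_i(k) = A_i k^{a_i}$ with $a_1 < a_2$ and a constant savings rate; the function names and all parameter values are hypothetical and chosen only for illustration. Per capita income in units of good 1 is then $f_1(k)$ below the cone, a straight line (the common tangent of $f_1$ and $f_2/p$) inside it, and $f_2(k)/p$ above it, in line with Lemma 1.

```python
def cobb_douglas_cone(A1=1.0, a1=0.3, A2=1.0, a2=0.6, p=1.0):
    """Common tangent of g1(k) = A1*k**a1 and g2(k) = (A2/p)*k**a2 (units of good 1).

    Returns (k1_star, k2_star, slope, intercept). Illustrative assumption: a1 < a2,
    so that k1_star < k2_star.
    """
    B2 = A2 / p
    # For g(k) = A*k**a the tangent at k has slope r = a*A*k**(a-1) and
    # intercept w = (1-a)*A*k**a, hence k = a/(1-a) * (w/r).
    c1 = a1 * A1 * (a1 / (1 - a1)) ** (a1 - 1)
    c2 = a2 * B2 * (a2 / (1 - a2)) ** (a2 - 1)
    omega = (c2 / c1) ** (1.0 / (a1 - a2))   # w/r on the common tangent
    k1 = a1 / (1 - a1) * omega
    k2 = a2 / (1 - a2) * omega
    y1, y2 = A1 * k1 ** a1, B2 * k2 ** a2
    slope = (y1 - y2) / (k1 - k2)
    intercept = (y2 * k1 - y1 * k2) / (k1 - k2)
    return k1, k2, slope, intercept


def gnp_per_capita(k, A1=1.0, a1=0.3, A2=1.0, a2=0.6, p=1.0):
    """Per capita income y/P1: specialization, diversification, specialization."""
    k1, k2, slope, intercept = cobb_douglas_cone(A1, a1, A2, a2, p)
    if k <= k1:
        return A1 * k ** a1
    if k < k2:
        return slope * k + intercept
    return (A2 / p) * k ** a2


def k_dot(k, s=0.25, n=0.02, delta=0.05, **tech):
    """Right-hand side of (7) with a constant savings rate s (an assumption)."""
    return s * gnp_per_capita(k, **tech) - (n + delta) * k


if __name__ == "__main__":
    print(cobb_douglas_cone())
    print([round(k_dot(k), 4) for k in (0.2, 1.0, 3.0)])
```

Inside the cone the drift is linear in $k$ for a constant savings rate, which is the "linear dynamic system" case; with a savings function $s(k, \theta)$ the same construction applies with `s` replaced by a function of `k`.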
By Lemma 1, the complete dynamic system of the small trading economy is (7): either a linear dynamic system, operating within the diversification cone $C_k$ with "fixed coefficient" sector technologies, $k_1^*, k_2^*, y_1^* = f_1(k_1^*), y_2^* = f_2(k_2^*)$, or a nonlinear dynamic system, operating outside the diversification cone $C_k$ with complete specialization.

Following Merton (1975), our source of uncertainty is the size of the population. Introducing stochastic elements into the growth rate of labor in (6), we have

$$\dot L = L(n + \varepsilon_1), \qquad \dot K = L\,s(k, \theta)\frac{y}{P_1} - \delta K, \tag{8}$$

where $n$ represents the expected rate of growth of the population per unit of time, and the random variable $\varepsilon_1$ is formally given as $\varepsilon_1 = \beta_1\, dW_t/dt$, with $\beta_1 > 0$, where $W_t$ is a standard Wiener process and $\beta_1^2$ is the instantaneous variance. Hence the stochastic model (8) can be written as the stochastic differential equations, for $0 \le t < \infty$:

$$dL = L(n\,dt + \beta_1\, dW), \qquad dK = \left[L\,s(k, \theta)\frac{y}{P_1} - \delta K\right]dt, \tag{9}$$

defined on the whole nonnegative orthant for $L$ and $K$, and which, like (6)-(7), consist of three stochastic subsystems, allowing for, respectively, specialization in good 1, nonspecialization (diversification), and specialization in good 2. Using the tools of Itô's stochastic calculus, we obtain:
Lemma 2. In the stochastic two-sector growth model with trade, (9), the stochastic dynamics for the capital-labor ratio, $k(t)$, is a diffusion process given by, with $k_2^* > k_1^*$,

$$dk = \left[s(k, \theta)\frac{y}{P_1} - (n + \delta - \beta_1^2)k\right]dt - \beta_1 k\, dW, \quad k \in [0, \infty). \tag{10}$$

Explicitly, for the three subintervals of $k$,

i) $0 < k \le k_1^*$:

$$dk = [s(k, \theta) f_1(k) - (n + \delta - \beta_1^2)k]\,dt - \beta_1 k\, dW; \tag{11a}$$

ii) $k_1^* < k < k_2^*$:

$$dk = [s(k, \theta)(\tilde\Theta_2 + \tilde\Theta_1 k) - (n + \delta - \beta_1^2)k]\,dt - \beta_1 k\, dW; \tag{11b}$$

iii) $k \ge k_2^*$:

$$dk = \left[s(k, \theta)\frac{f_2(k)}{p} - (n + \delta - \beta_1^2)k\right]dt - \beta_1 k\, dW; \tag{11c}$$

where

$$\tilde\Theta_1 = \frac{y_1^* - (y_2^*/p)}{k_1^* - k_2^*} > 0, \qquad \tilde\Theta_2 = \frac{(y_2^*/p)k_1^* - y_1^* k_2^*}{k_1^* - k_2^*} > 0.$$

Proof: See Jensen and Wang (1999, p. 382) with $\beta_2 = \beta_3 = 0$. □

6.3 Comparative dynamics and policy parameters
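In this section, we analyze the impact of any policy parameter on the entire capital accumulation path, as well as on the stochastic steady states. Assume that the savings ratio $s(k, \theta)$ satisfies $\partial s/\partial\theta \neq 0$, which means that if $\theta_1$ and $\theta_2$ are two alternative values of the policy parameter $\theta$ with $\theta_1 < \theta_2$, then for all $k > 0$:

$$s(k, \theta_1) f(k) < (>)\ s(k, \theta_2) f(k), \quad \text{if } \frac{\partial s}{\partial\theta} > (<)\ 0.$$

Theorem 1. Let $\theta_1 < \theta_2$. If $s_\theta \equiv \partial s/\partial\theta > (<)\ 0$, then the respective time paths, $k_t^{\theta_1}$, $k_t^{\theta_2}$ for $k_t$ satisfy:

$$k_t^{\theta_1} \le (\ge)\ k_t^{\theta_2} \ \text{a.s. } P, \ \text{for all } t > 0. \tag{13}$$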
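Proof: Let $s_\theta > 0$ (the proof is identical for $s_\theta < 0$). We consider the comparative dynamics for the three subintervals of $k$, respectively.

1) $0 < k \le k_1^*$: Given any $k, k' \in (0, k_1^*]$, let $x = |k - k'|$. For $\varepsilon > 0$, define the indicator function $\chi_{(0,\varepsilon)}$ of the set $(0, \varepsilon)$:

$$\chi_{(0,\varepsilon)}(x) = \begin{cases} 1, & \text{if } x \in (0, \varepsilon), \\ 0, & \text{if } x \notin (0, \varepsilon). \end{cases} \tag{14}$$

Then

$$\lim_{\varepsilon \to 0} \int \chi_{(0,\varepsilon)}(x)\frac{1}{x^2}\,dx = \infty. \tag{15}$$

We can choose a sequence $\{a_n\} \subset (0, 1]$, $n = 1, 2, \cdots$, such that $a_0 = 1$, $a_n \ge a_{n+1}$, $\lim_{n\to\infty} a_n = 0$, and for any $n \ge 1$,

$$\int \chi_{(a_n, a_{n-1})}(x)\frac{1}{x^2}\,dx = n. \tag{16}$$

Choose the continuous function sequence $\{\rho_n(x)\}$, $\rho_n(x) = \dfrac{1}{n x^2}$; it has support on the interval $(a_n, a_{n-1})$. Let $\Delta_t = k_t^{\theta_1} - k_t^{\theta_2}$ and consider the functions:

$$\Psi_n(\Delta_t) = \int_0^{|\Delta_t|}\!\!\int_0^{y} \rho_n(x)\,dx\,dy, \qquad \Phi_n(\Delta_t) = \Psi_n(\Delta_t)\,\chi_{(0,\infty)}(\Delta_t). \tag{17}$$

Then $\Psi_n$ is twice continuously differentiable on $\mathbb{R}_+$, and for a fixed $y$ and sufficiently large $n$, $\int_0^y \rho_n(x)\,dx = 1$. Note that:

$$\lim_{n\to\infty} \Phi_n(\Delta_t) = \chi_{(0,\infty)}(\Delta_t) \lim_{n\to\infty} \Psi_n(\Delta_t) = \chi_{(0,\infty)}(\Delta_t)|\Delta_t| = \sup\{0, \Delta_t\} \equiv \Delta_t^+. \tag{18}$$

For $\Delta_t \le 0$, $\Phi_n(\Delta_t) = 0$. And for $\Delta_t > 0$, we have

$$0 < \Phi_n'(\Delta_t) = \Psi_n'(\Delta_t) = \int_0^{\Delta_t} \rho_n(x)\,dx \le 1. \tag{19}$$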
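The smoothing construction in (14)-(19) can be checked numerically. Since $\int_{a_n}^{a_{n-1}} x^{-2}\,dx = 1/a_n - 1/a_{n-1}$, condition (16) with $a_0 = 1$ gives $1/a_n = 1 + n(n+1)/2$. The sketch below uses this closed form and evaluates $\Psi_n$ exactly; the function names are hypothetical and the code is only an illustration of why $\Psi_n(\Delta) \to |\Delta|$.

```python
from fractions import Fraction
import math


def a(n: int) -> Fraction:
    """Endpoint a_n with a_0 = 1 and 1/a_n - 1/a_{n-1} = n, cf. (16)."""
    return Fraction(1, 1 + n * (n + 1) // 2)


def inner(n: int, y: float) -> float:
    """Integral of rho_n(x) = 1/(n x^2) over (0, y), with rho_n = 0 off (a_n, a_{n-1})."""
    lo, hi = float(a(n)), float(a(n - 1))
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return (1.0 / lo - 1.0 / y) / n


def psi(n: int, delta: float) -> float:
    """Psi_n(delta): integral of inner(n, y) over (0, |delta|), in closed form."""
    y = abs(delta)
    lo, hi = float(a(n)), float(a(n - 1))
    if y <= lo:
        return 0.0
    top = min(y, hi)
    # integral over (lo, top) of (1/n) * (1/lo - 1/t) dt
    ramp = ((top - lo) / lo - math.log(top / lo)) / n
    return ramp + max(0.0, y - hi)


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, float(a(n)), psi(n, 0.5))
```

Because the inner integral lies between 0 and 1 and equals 1 beyond $a_{n-1}$, the gap $|\Delta| - \Psi_n(\Delta)$ is at most $a_{n-1}$, which tends to zero.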
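Moreover, $\Phi_n''(\Delta_t) = \rho_n(\Delta_t)$. From (11a),

$$d\Delta_t = [s(k_t^{\theta_1}, \theta_1) f_1(k_t^{\theta_1}) - s(k_t^{\theta_2}, \theta_2) f_1(k_t^{\theta_2})]\,dt + (\delta + n - \beta_1^2)(k_t^{\theta_2} - k_t^{\theta_1})\,dt + \beta_1 (k_t^{\theta_2} - k_t^{\theta_1})\,dW_t. \tag{20}$$

From Itô's Lemma,

$$\begin{aligned} d\Phi_n(\Delta_t) &= \Phi_n'(\Delta_t)\,d\Delta_t + \tfrac12 \Phi_n''(\Delta_t)(d\Delta_t)^2 \\ &= \Phi_n'(\Delta_t)[s(k_t^{\theta_1}, \theta_1) f_1(k_t^{\theta_1}) - s(k_t^{\theta_2}, \theta_2) f_1(k_t^{\theta_2})]\,dt \\ &\quad + \Phi_n'(\Delta_t)(\delta + n - \beta_1^2)(k_t^{\theta_2} - k_t^{\theta_1})\,dt \\ &\quad + \tfrac12 \Phi_n''(\Delta_t)\beta_1^2 (k_t^{\theta_2} - k_t^{\theta_1})^2\,dt \\ &\quad + \Phi_n'(\Delta_t)\beta_1 (k_t^{\theta_2} - k_t^{\theta_1})\,dW_t. \end{aligned} \tag{21}$$

A result in Karatzas and Shreve (1991) shows there exists a real-valued function $Z(k): \mathbb{R}_+ \to \mathbb{R}_+$, which satisfies the Lipschitz condition and is such that

$$s(k, \theta_1) f_1(k) \le Z(k) \le s(k, \theta_2) f_1(k), \quad \text{if } \frac{\partial s}{\partial\theta} > 0. \tag{22}$$

Then $s(k_t^{\theta_1}, \theta_1) f_1(k_t^{\theta_1}) \le Z(k_t^{\theta_1})$ and $s(k_t^{\theta_2}, \theta_2) f_1(k_t^{\theta_2}) \ge Z(k_t^{\theta_2})$. Therefore,

$$\Phi_n'(\Delta_t)[s(k_t^{\theta_1}, \theta_1) f_1(k_t^{\theta_1}) - s(k_t^{\theta_2}, \theta_2) f_1(k_t^{\theta_2})]\,dt \le \Phi_n'(\Delta_t)[Z(k_t^{\theta_1}) - Z(k_t^{\theta_2})]\,dt, \tag{23}$$

since $0 < \Phi_n'(\Delta_t) \le 1$. Note also that $E(dW_t) = 0$ for any $t \ge 0$.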
Hence, by taking expectations in the integral form of (21), we have

$$E\Phi_n(\Delta_t) \le \xi_1 \int_0^t E(\Delta_s^+)\,ds + \frac12 E\int_0^t \Phi_n''(\Delta_s)\beta_1^2 (\Delta_s)^2\,ds, \tag{24}$$

where $\xi_1$ is a constant. By $\Phi_n''(\Delta_t) = \rho_n(\Delta_t)$,

$$E\Phi_n(\Delta_t) \le \xi_1 \int_0^t E(\Delta_s^+)\,ds + \frac{t\beta_1^2}{2n}. \tag{25}$$

Thus, letting $n \to \infty$, we have

$$E(\Delta_t^+) \le \xi_1 \int_0^t E(\Delta_s^+)\,ds.$$

Using the Gronwall inequality, it follows that $E(\Delta_t^+) = 0$. This implies that $\Delta_t^+ = 0$ a.s. $P$, then $\Delta_t \le 0$ a.s. $P$ for $0 < k \le k_1^*$, i.e.,

$$k_t^{\theta_2} \ge k_t^{\theta_1} \ \text{a.s. } P \ \text{for all } t > 0. \tag{26}$$
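2) For $k_1^* < k < k_2^*$, the stochastic dynamics of the capital-labor ratio is given by

$$dk = [s(k, \theta)(\tilde\Theta_1 k + \tilde\Theta_2) - (\delta + n - \beta_1^2)k]\,dt - \beta_1 k\,dW. \tag{27}$$

Let $\Delta_t = k_t^{\theta_1} - k_t^{\theta_2}$, and define the same functions $\rho_n(x)$, $\Phi_n(\Delta_t)$ as in case 1). For $\Delta_t \le 0$, $\Phi_n(\Delta_t) = 0$, and for $\Delta_t > 0$, we have

$$\begin{aligned} d\Delta_t &= [s(k_t^{\theta_1}, \theta_1)(\tilde\Theta_1 k_t^{\theta_1} + \tilde\Theta_2) - (\delta + n - \beta_1^2)k_t^{\theta_1}]\,dt - \beta_1 k_t^{\theta_1}\,dW_t \\ &\quad - [s(k_t^{\theta_2}, \theta_2)(\tilde\Theta_1 k_t^{\theta_2} + \tilde\Theta_2) - (\delta + n - \beta_1^2)k_t^{\theta_2}]\,dt + \beta_1 k_t^{\theta_2}\,dW_t \\ &= [s(k_t^{\theta_1}, \theta_1)(\tilde\Theta_1 k_t^{\theta_1} + \tilde\Theta_2) - s(k_t^{\theta_2}, \theta_2)(\tilde\Theta_1 k_t^{\theta_2} + \tilde\Theta_2)]\,dt \\ &\quad + (\delta + n - \beta_1^2)(k_t^{\theta_2} - k_t^{\theta_1})\,dt + \beta_1 (k_t^{\theta_2} - k_t^{\theta_1})\,dW_t. \end{aligned} \tag{28}$$

From Itô's Lemma,

$$\begin{aligned} d\Phi_n(\Delta_t) &= \Phi_n'(\Delta_t)\,d\Delta_t + \tfrac12 \Phi_n''(\Delta_t)(d\Delta_t)^2 \\ &= \Phi_n'(\Delta_t)[s(k_t^{\theta_1}, \theta_1)(\tilde\Theta_1 k_t^{\theta_1} + \tilde\Theta_2) - s(k_t^{\theta_2}, \theta_2)(\tilde\Theta_1 k_t^{\theta_2} + \tilde\Theta_2)]\,dt \\ &\quad + \Phi_n'(\Delta_t)(\delta + n - \beta_1^2)(k_t^{\theta_2} - k_t^{\theta_1})\,dt + \tfrac12 \Phi_n''(\Delta_t)\beta_1^2 (k_t^{\theta_2} - k_t^{\theta_1})^2\,dt \\ &\quad + \Phi_n'(\Delta_t)\beta_1 (k_t^{\theta_2} - k_t^{\theta_1})\,dW_t. \end{aligned} \tag{29}$$

From $s(k_t^{\theta_2}, \theta_2) > s(k_t^{\theta_1}, \theta_1)$ and $\tilde\Theta_1 > 0$, it follows that

$$\begin{aligned} s(k_t^{\theta_1}, \theta_1)(\tilde\Theta_1 k_t^{\theta_1} + \tilde\Theta_2) - s(k_t^{\theta_2}, \theta_2)(\tilde\Theta_1 k_t^{\theta_2} + \tilde\Theta_2) &\le s(k_t^{\theta_2}, \theta_2)[\tilde\Theta_1 k_t^{\theta_1} + \tilde\Theta_2 - (\tilde\Theta_1 k_t^{\theta_2} + \tilde\Theta_2)] \\ &= s(k_t^{\theta_2}, \theta_2)\tilde\Theta_1 (k_t^{\theta_1} - k_t^{\theta_2}) \\ &\le \tilde\Theta_1 |k_t^{\theta_2} - k_t^{\theta_1}|. \end{aligned}$$

Note that $0 < \Phi_n'(\Delta_t) \le 1$ and $E(dW_t) = 0$, for any $t \ge 0$. Therefore, from (29), we have

$$E\Phi_n(\Delta_t) \le \xi_2 \int_0^t E(\Delta_s^+)\,ds + \frac12 E\int_0^t \Phi_n''(\Delta_s)\beta_1^2 (\Delta_s)^2\,ds, \tag{30}$$

where $\xi_2 = \tilde\Theta_1 + \delta + n - \beta_1^2$. From $\Phi_n''(\Delta_t) = \rho_n(\Delta_t)$, the inequality (30) becomes

$$E\Phi_n(\Delta_t) \le \xi_2 \int_0^t E(\Delta_s^+)\,ds + \frac{t\beta_1^2}{2n}. \tag{31}$$

Let $n \to \infty$, then

$$E(\Delta_t^+) \le \xi_2 \int_0^t E(\Delta_s^+)\,ds. \tag{32}$$

By the Gronwall inequality, $E(\Delta_t^+) = 0$, then $\Delta_t^+ = 0$ a.s. $P$; this implies $\Delta_t \le 0$ a.s. $P$, and for $k_1^* < k < k_2^*$,

$$k_t^{\theta_2} \ge k_t^{\theta_1} \ \text{a.s. } P \ \text{for all } t > 0. \tag{33}$$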
3) $k \ge k_2^*$: The proof is similar to case 1) and is therefore omitted.

Thus, we have completed the proof of Theorem 1. □
In Theorem 1, the savings rate $s(k, \theta)$ depends monotonically on the policy parameter $\theta$. Such a monotonicity restriction is common in conventional models, when analyzing the effect of a policy parameter, see Atkinson and Stiglitz (1980). Theorem 1 shows that the time path of the capital-labor ratio for a small trading economy enjoys the following property with respect to the policy parameter $\theta$: If the savings function depends positively (negatively) on $\theta$ for all $k > 0$, then the capital-labor ratio with parameter value $\theta_2$ always lies above (below) the capital-labor ratio with parameter value $\theta_1$ at every point in time.

As is done in Jensen and Wang (1999), we can analyze the existence of the steady-state distribution in the stochastic model (12); moreover, with Cobb-Douglas sector technologies, we can obtain a closed-form expression for the time-invariant distribution function of the diffusion process. As an implication of Theorem 1, we offer a comparative dynamic analysis of the steady-state distributions. Define the distribution function $\{F_t^\theta\}$, $t \ge 0$, associated with the time path of the capital-labor ratio $\{k_t^\theta\}$ as: $F_t^\theta(k) = P[k_t^\theta \le k]$, $t \ge 0$. Then the steady-state distribution is defined by $F^\theta = \lim_{t\to\infty} F_t^\theta$.

Corollary 1. If $s_\theta > (<)\ 0$ for all $k > 0$, and $\theta_1 < \theta_2$, then the respective steady-state distributions satisfy: $F^{\theta_1} \ge (\le)\ F^{\theta_2}$ for all $k \ge 0$.

Proof: From Theorem 1, with $s_\theta > (<)\ 0$, we have $k_t^{\theta_1} \le (\ge)\ k_t^{\theta_2}$. Then for all $t \ge 0$, we get

$$F_t^{\theta_1}(k) = P[k_t^{\theta_1} \le k] \ge (\le)\ P[k_t^{\theta_2} \le k] = F_t^{\theta_2}(k). \tag{34}$$

Since $F^\theta = \lim_{t\to\infty} F_t^\theta$, letting $t \to \infty$ in the above inequalities yields the result that $F^{\theta_1} \ge (\le)\ F^{\theta_2}$ if $s_\theta > (<)\ 0$ for all $k > 0$. □
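The pathwise ordering in Theorem 1 can be illustrated with a small simulation. The sketch below is not the authors' method: it applies an Euler-Maruyama discretization to a diffusion of the form (11a), assuming a Cobb-Douglas technology $f_1(k) = k^{\alpha}$ and a savings rate $s(k, \theta) = \theta$ that is constant in $k$; all parameter values and the function name are hypothetical. Both paths start from the same $k_0$ and are driven by the same Wiener increments, which is what makes a pathwise comparison meaningful.

```python
import math
import random


def simulate_pair(theta1=0.2, theta2=0.3, alpha=0.5, n=0.02, delta=0.05,
                  beta1=0.1, k0=1.0, dt=0.01, steps=5000, seed=0):
    """Euler-Maruyama paths of
        dk = [theta*k**alpha - (n + delta - beta1**2)*k] dt - beta1*k dW
    for two savings rates theta1 < theta2, using the SAME Wiener increments."""
    rng = random.Random(seed)
    c = n + delta - beta1 ** 2
    k1, k2 = k0, k0
    path1, path2 = [k1], [k2]
    for _ in range(steps):
        dW = rng.gauss(0.0, math.sqrt(dt))   # common shock for both economies
        k1 = k1 + (theta1 * k1 ** alpha - c * k1) * dt - beta1 * k1 * dW
        k2 = k2 + (theta2 * k2 ** alpha - c * k2) * dt - beta1 * k2 * dW
        path1.append(k1)
        path2.append(k2)
    return path1, path2


if __name__ == "__main__":
    p1, p2 = simulate_pair()
    print(p1[-1], p2[-1])
    print(all(x <= y for x, y in zip(p1, p2)))
```

Repeating the simulation over many seeds and comparing the empirical distribution functions of the terminal values would give a rough numerical counterpart to Corollary 1, under the same assumptions.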
This corollary says that the steady-state distribution of the stochastic dynamics of a small two-sector trading economy with policy parameter value $\theta_2$ dominates (is dominated by) the steady-state distribution of the economy with parameter value $\theta_1$, if the savings rate depends positively (negatively) on the policy parameter.

Acknowledgements: This research was supported in part by NNSF, and the Fund for “Study on the Evolution of Complex Economic System” at “Innovation Center of Economic Transition and Development of Nanjing University” of Ministry of Education, China. We would like to thank Professor Bjarne S. Jensen, Copenhagen Business School, for many valuable discussions and suggestions.

References:
Atkinson A.B., and Stiglitz J.E. (1980) Lectures on Public Economics. McGraw-Hill Inc., New York.
Barro R.J., and Sala-i-Martin X. (1995) Economic Growth. New York.
Boadway R. (1979) “Long-run Tax Incidence: A Comparative Dynamic Approach.” Review of Economic Studies 46: 505–511.
Burmeister E., and Dobell A.R. (1970) Mathematical Theories of Economic Growth. MacMillan, London.
Chang F.R., and Malliaris A.G. (1987) “Asymptotic Growth under Uncertainty: Existence and Uniqueness.” Review of Economic Studies 54: 169–174.
Deardorff A.V. (1974) “A Geometry of Growth and Trade.” Canadian Journal of Economics 7: 295–306.
Deardorff A.V., and Hanson J.A. (1978) “Accumulation and a Long-run Heckscher-Ohlin Theorem.” Economic Inquiry 16: 288–292.
Jensen B.S. (1994) The Dynamic Systems of Basic Economic Growth Models. Dordrecht: Kluwer Academic Publishers.
Jensen B.S., and Larsen M.E. (1987) “Growth and Long-Run Stability.” Acta Applicandae Mathematicae 9: 219–237.
Jensen B.S., and Wong K.Y. (1997) Dynamics, Economic Growth, and International Trade. Ann Arbor, University of Michigan Press.
Jensen B.S., and Wang C. (1999) “Basic Stochastic Dynamic Systems of Growth and Trade.” Review of International Economics 7: 378–402.
Karatzas I., and Shreve S.E. (1991) Brownian Motion and Stochastic Calculus. Berlin, Springer-Verlag.
Merton R.C. (1975) “An Asymptotic Theory of Growth under Uncertainty.” Review of Economic Studies 42: 375–393.
Shinkai Y. (1960) “On the Equilibrium Growth of Capital and Labor.” International Economic Review 1: 107–111.
Solow R.M. (1956) “A Contribution to the Theory of Economic Growth.” Quarterly Journal of Economics 70: 65–94.
Uzawa H. (1961) “On a Two-sector Model of Economic Growth.” Review of Economic Studies 29: 40–47.
Chapter 7 Inada Conditions and Global Dynamic Analysis of Basic Growth Models with Time Delays
Zhu Hongliang School of Management and Engineering, Nanjing University, Nanjing, P.R.China Huang Wenzao Department of Financial Mathematics, Peking University, Beijing, P.R.China
7.1 Introduction
Standard growth models exhibit solutions that in the long run converge smoothly to a unique equilibrium (steady state) from any positive initial point. But actual observations of most variables, from any country, show many fluctuations. Arrow and Smale (1980) pointed out that the economic system can be seen as an evolutionary complex system, and that it is necessary to study economic theory from the viewpoint of nonlinear dynamics. Much evidence indicates that economic fluctuations come not only from exogenous events, but also from the inner structure of the economic dynamic system, see Zhang (1990a, 1990b), Jarsulic (1993), Lorenz (1989) and Puu (1991). It is notable, however, that time delays are often neglected in continuous-time dynamics in economics, to a great extent, no doubt, due to the difficulty in solving and analyzing such models. Nevertheless, economic development depends not only on the current state, but also on past states (history). Gandolfo (1997) points out that delay dynamical systems are much more suitable than differential equations alone or difference equations alone for an adequate treatment of dynamic economic phenomena.

In recent years, functional differential equation theory has made considerable progress. Invernizzi and Medio (1991) probed the relationship between time delay and chaos in economic systems, and argued that time delay is important in bringing about chaos. Chukwu (1996, 1998) discussed the controllability of some economic growth models with time delay, and Boucekkine et al. (1997) studied an economic growth model by numerical methods, and found that time delays have a great impact on the dynamic properties. In this chapter, we consider the neoclassical economic growth model with time delays, and our focus is on obtaining global conditions for steady-state stability or oscillations.

The standard growth model in labour, $L$, and capital, $K$, accumulation ($k = K/L$) is:

$$\dot L(t) = nL(t), \qquad \dot K(t) = sF(L(t), K(t)) - \delta K(t), \tag{1}$$

$$\dot k(t) = sf(k(t)) - (n + \delta)k(t), \tag{2}$$

where $0 < s < 1$, $0 < \delta < 1$, $n > 0$ represent the gross saving rate, depreciation rate, and growth rate of labour, respectively, and $F(L(t), K(t))$ is a neoclassical production function, homogeneous of degree one, i.e., $F(L(t), K(t))/L(t) = F(1, k(t)) = f(k(t))$. Moreover, the production function $f(k)$ is here assumed to satisfy the Inada conditions:

$$f'(k) > 0, \ f''(k) < 0 \ \text{for all } k > 0; \quad f(0) = 0, \ f(\infty) = \infty; \quad \lim_{k\to\infty} f'(k) = 0, \ \lim_{k\to 0} f'(k) = \infty. \tag{3}$$

A standard result in growth theory is:

Lemma 1. System (2) has a unique positive equilibrium point (steady state), $k = \kappa$, and $\kappa$ is globally asymptotically stable under the Inada condition (3).
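Lemma 1 can be illustrated numerically. The following sketch assumes a Cobb-Douglas technology $f(k) = k^{1/3}$ and hypothetical parameter values; it locates the steady state of (2) by bisection and integrates (2) with an explicit Euler scheme from two different initial points. The function names are illustrative only.

```python
import math


def steady_state(f, s, n, delta, lo=1e-12, hi=1e12, iters=200):
    """Positive root of s*f(k) = (n + delta)*k by bisection on a log scale.

    Assumes s*f(k) > (n + delta)*k at lo and the reverse at hi, which holds
    for small lo and large hi when f satisfies Inada-type conditions.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if s * f(mid) > (n + delta) * mid:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def solow_path(f, s, n, delta, k0, dt=0.1, steps=5000):
    """Terminal value of the explicit Euler path of k' = s*f(k) - (n + delta)*k."""
    k = k0
    for _ in range(steps):
        k += dt * (s * f(k) - (n + delta) * k)
    return k


if __name__ == "__main__":
    f = lambda k: k ** (1.0 / 3.0)          # illustrative technology
    s, n, delta = 0.3, 0.02, 0.05           # illustrative parameters
    kappa = steady_state(f, s, n, delta)
    print(kappa, (s / (n + delta)) ** 1.5)  # numerical root vs. closed form
    print(solow_path(f, s, n, delta, 0.1), solow_path(f, s, n, delta, 50.0))
```

For $f(k) = \gamma k^{\alpha}$ the root has the closed form $\kappa = (s\gamma/(n+\delta))^{1/(1-\alpha)}$, which the bisection reproduces; paths started below and above $\kappa$ both approach it.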
7.2 Neoclassical growth model with time delays

In the production function, we introduce a delay (time lag) $\tau$ (a positive constant) in the full utilization (productive operation) of acquired (installed) capital,

$$\dot L(t) = nL(t), \qquad \dot K(t) = sF(L(t), K(t - \tau)) - \delta K(t). \tag{4}$$

The time derivative of the capital/labour ratio, $k(t) = K(t)/L(t)$, is obtained from (4). Since $K(t - \tau)/L(t) = k(t - \tau)L(t - \tau)/L(t) = e^{-n\tau}k(t - \tau)$, we get the following delay equation for $k(t)$:

$$\dot k(t) = sf(\beta k(t - \tau)) - (n + \delta)k(t); \qquad \beta = e^{-n\tau} > 0, \tag{5}$$

where $f(\beta k(t)) = F(1, \beta k(t))$ satisfies the Inada condition. We study the qualitative behavior of the delay differential equation (5), and discuss its economic meaning. Assuming initial time $t_0 = 0$, define $E_{t_0} = [-\tau, 0]$ as the initial interval. Let $\mathbb{R}_+ = (0, +\infty)$, and let $C^+ = C([-\tau, 0], \mathbb{R}_+)$ be the space of continuous functions from $[-\tau, 0]$ to $\mathbb{R}_+$. For every $\varphi \in C^+$, define the norm $\|\varphi\| = \sup_{-\tau \le \theta \le 0} |\varphi(\theta)|$.

Theorem 1. If the initial function, $k_0(t) \equiv \Phi(t)$, for $t \in [-\tau, 0]$, belongs to $C^+$, then the solution $k(t)$ of the delay differential equation (5) exists and is unique for $t \in [0, +\infty)$. The equilibrium point (steady state) $\kappa$ is obtained from (5) as

$$sf(\beta\kappa) = (n + \delta)\kappa, \tag{6}$$

and it is globally asymptotically stable.

Proof: The existence and uniqueness of the solutions in Theorem 1 follow easily from standard theory, see Hale (1993). Since $f(\beta k)$ satisfies the Inada condition, there exists a unique positive equilibrium point $\kappa$ in (6). We must prove the global attractivity of $\kappa$.
As the first step, we show that the solution $k(t)$ of the delay differential equation (5) is positive and bounded for all $t \ge 0$. Assuming $k(t)$ is not always positive for $t \ge 0$, there must be a $t_1 > 0$ such that $k(t) > 0$ for $t \in [0, t_1)$, but $k(t_1) = 0$. Thus, we have $\dot k(t_1) \le 0$, i.e.,

$$sf(\beta k(t_1 - \tau)) \le (n + \delta)k(t_1) = 0.$$

Since $f(\beta k) > 0$ for $k > 0$, and $f(0) = 0$, then $f(\beta k(t_1 - \tau)) = 0$. Thus $k(t_1 - \tau) = 0$, which contradicts the definition of $t_1$. So $k(t) > 0$ for $t \ge 0$. By the Inada condition, we have

$$\lim_{k\to\infty} \frac{sf(\beta k)}{(n + \delta)k} = \lim_{k\to\infty} \frac{s\beta f'(\beta k)}{n + \delta} = 0.$$

Thus, there is an $N > 0$ such that $sf(\beta k) < (n + \delta)k$ for all $k \ge N$. If $k(t)$ is unbounded, then there is a $t_2 > 0$ such that $\dot k(t_2) \ge 0$, $k(t_2) \ge N$, and $k(t) < k(t_2)$ for $t \in [0, t_2)$; hence, $sf(\beta k(t_2)) < (n + \delta)k(t_2)$. Since $f'(k) > 0$, then

$$sf(\beta k(t_2 - \tau)) < sf(\beta k(t_2)) < (n + \delta)k(t_2).$$

But $\dot k(t_2) \ge 0$ leads to

$$sf(\beta k(t_2 - \tau)) \ge (n + \delta)k(t_2),$$

which is a contradiction.

Next we show that if $k(t)$ is a solution of equation (5), with $0 < k(0) < \kappa$, where $\kappa$ is the equilibrium of equation (5), then $k(t) > k_m$ for all $t > \tau$, where

$$k_m = \min\{k(t) : t \in [0, \tau]\}. \tag{7}$$

From the above, we know that $k_m > 0$ and $k_m < \kappa$. If the argument does not hold, then there exists a $t_1 > \tau$ such that $k(t_1) = k_m$; but for $t \in [0, t_1)$, we have $k(t) > k_m$, and $\dot k(t_1) \le 0$. Thus

$$sf(\beta k(t_1 - \tau)) \le (n + \delta)k(t_1).$$

Therefore,

$$k(t_1 - \tau) < k(t_1) = k_m,$$

which contradicts the definition of $k_m$, (7). Thus for $t > \tau$, we have $k(t) > \min\{k(t) : t \in [0, \tau]\} = k_m$.

Next, we prove the global attractivity of the equilibrium $\kappa$. Denote

$$u = \limsup_{t\to\infty} |k(t) - \kappa|.$$

As proved above, $k(t)$ is positive and bounded, so $u < +\infty$. We must prove $u = 0$. Otherwise, if $u > 0$, then one of the following two statements holds:

(i) There exists a sequence $\{t_i\}$ with $t_i > t_{i-1}$, $\lim_{i\to\infty} t_i = +\infty$, and

$$\lim_{i\to\infty} k(t_i) = \kappa + u; \tag{8}$$

(ii) There exists a sequence $\{t_i\}$ with $t_i > t_{i-1}$, $\lim_{i\to+\infty} t_i = +\infty$, and

$$\lim_{i\to\infty} k(t_i) = \kappa - u. \tag{9}$$

Assume that (i) holds, then there exists an $\varepsilon > 0$ such that

$$sf(\beta(u + \varepsilon + \kappa)) < (n + \delta)(u - \varepsilon + \kappa). \tag{10}$$

For this $\varepsilon$, from (8), there is a $T = T(\varepsilon) > \tau$ such that for $t \ge T - \tau$, we have

$$k(t) < u + \varepsilon + \kappa.$$

In the following, we consider two cases: (ia) $k(t)$ is not monotone; (ib) $k(t)$ is monotone. First, assume that $k(t)$ is not monotone; then there is a $t' > T$ such that

$$\dot k(t') = 0, \qquad k(t') - \kappa > u - \varepsilon.$$

Thus,

$$sf(\beta k(t' - \tau)) = (n + \delta)k(t') > (n + \delta)(u - \varepsilon + \kappa).$$

From (10), we obtain

$$sf(\beta k(t' - \tau)) > (n + \delta)(u - \varepsilon + \kappa) > sf(\beta(u + \varepsilon + \kappa)).$$

But $f'(k) > 0$, so $k(t' - \tau) > u + \varepsilon + \kappa$.
This contradicts $k(t) < u + \varepsilon + \kappa$. Next assume that $k(t)$ is monotone, then

$$\lim_{t\to\infty} \dot k(t) = sf(\beta(\kappa + u)) - (n + \delta)(\kappa + u) < 0.$$

This leads to

$$\lim_{t\to\infty} k(t) = -\infty.$$

This contradicts $\lim_{t\to\infty} k(t) = \kappa + u$. Now if (ii) holds, we can, with no loss of generality, let $0 < k(0) < \kappa$, and choose $0 < \varepsilon < k_m$, where $k_m = \min\{k(t) : t \in [0, \tau]\}$, such that

$$sf(\beta(\kappa - u - \varepsilon)) > (n + \delta)(\kappa - u + \varepsilon).$$

The remaining proof is similar to case (i) with (ia), (ib), so we omit it. This completes the proof of $u = 0$, i.e., the global attractivity of $\kappa$.

As the second step, we prove the local stability of the equilibrium $\kappa$ of the delay equation (5). Consider the linearized system around $\kappa$ in equation (5):

$$\dot k(t) = s\beta f'(\beta\kappa)k(t - \tau) - (n + \delta)k(t). \tag{11}$$

Its characteristic equation is

$$\lambda = s\beta f'(\beta\kappa)e^{-\lambda\tau} - (n + \delta). \tag{12}$$

By the Inada condition, we know that $s\beta f'(\beta\kappa) < (n + \delta)$: strict concavity and $f(0) = 0$ give $\beta\kappa f'(\beta\kappa) < f(\beta\kappa)$, and $sf(\beta\kappa) = (n + \delta)\kappa$ by (6). Thus, all roots of equation (12) have negative real parts, see Gopalsamy (1992), i.e., the equilibrium point $\kappa$ of the delay system (5) is asymptotically stable. With both global attractivity and asymptotic stability of $\kappa$, the proof is completed. □
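As a small numerical companion to the characteristic equation (12), the sketch below writes (12) as $\lambda = a e^{-\lambda\tau} - b$ with $a = s\beta f'(\beta\kappa)$ and $b = n + \delta$, and locates its real root by bisection. The function name and the illustrative values $a = 1$, $b = 2$, $\tau = 3$ are assumptions made for this sketch; the code only finds the real root and does not replace the cited result on all (complex) roots.

```python
import math


def real_root(a: float, b: float, tau: float, iters: int = 200) -> float:
    """Real root of h(lam) = lam + b - a*exp(-lam*tau), assuming 0 < a < b.

    h is strictly increasing, h(-b) = -a*exp(b*tau) < 0 and h(0) = b - a > 0,
    so the unique real root lies in (-b, 0).
    """
    h = lambda lam: lam + b - a * math.exp(-lam * tau)
    lo, hi = -b, 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(real_root(a=1.0, b=2.0, tau=3.0))
```

With $0 < a < b$ the real root is always negative, consistent with the local stability statement; a larger delay moves it closer to zero, so the local convergence associated with this root becomes slower.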
Figure 1: Convergence of the delay system

By comparing Lemma 1 and Theorem 1, we find that introducing any delay (any size of $\tau$) for capital, $K(t - \tau)$, in the production function does not change the global stability property of the equilibrium (steady state). The standard growth model, at least with Inada conditions (Lemma 1), is quite robust to such delays that occur only within the neoclassical production function, i.e., the nonlinear part of (5). Numerical simulation of the delay system (5) shows that, in contrast to the smooth monotone solutions from the standard model, the solution of the delay model (5) converges to the positive equilibrium state $\kappa$ with some fluctuation, see Figure 1, where for simplicity we choose the parameters

$$s = \tfrac12, \quad n = \tfrac32, \quad \delta = \tfrac12, \quad \tau = 3$$

and $f(k) = k^{1/2}$, which give the delay model:

$$\dot k(t) = \tfrac12 \beta^{1/2} k^{1/2}(t - 3) - 2k(t),$$

where $\beta = e^{-n\tau} = e^{-9/2}$, and from (6), $\kappa = \tfrac{1}{16} e^{-9/2}$. Let $k(t) = \kappa e^{x(t)}$, then the delay model can be transformed into the system:

$$\dot x(t) = 2\left(\exp\left[\tfrac12 x(t - 3) - x(t)\right] - 1\right).$$
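The transformed delay equation for $x(t)$ can be integrated with a simple fixed-step scheme. The sketch below uses an explicit Euler step with a history buffer, assuming a constant initial function $x(t) = x_0$ on $[-3, 0]$; the step size, horizon, initial function, and function name are assumptions made for illustration, and the scheme is not necessarily the one used for Figure 1.

```python
import math


def simulate_x(x0=1.0, tau=3.0, dt=0.01, t_end=60.0):
    """Explicit Euler with a history buffer for
        x'(t) = 2*(exp(x(t - tau)/2 - x(t)) - 1),
    with constant history x(t) = x0 on [-tau, 0]. Returns x on the grid of [0, t_end]."""
    lag = round(tau / dt)
    xs = [x0] * (lag + 1)                 # grid values on [-tau, 0]
    for _ in range(round(t_end / dt)):
        x_now, x_lag = xs[-1], xs[-1 - lag]
        xs.append(x_now + dt * 2.0 * (math.exp(0.5 * x_lag - x_now) - 1.0))
    return xs[lag:]


if __name__ == "__main__":
    kappa = math.exp(-4.5) / 16.0         # steady state from (6)
    xs = simulate_x()
    # recover k(t) = kappa * exp(x(t)) at a few dates
    for t in (0, 10, 30, 60):
        print(t, kappa * math.exp(xs[round(t / 0.01)]))
```

Since $x = 0$ corresponds to $k = \kappa$, convergence of $x(t)$ to zero is the numerical counterpart of the global stability result in Theorem 1.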
7.3 Dynamics with delays in production and depreciation

In this section, the delay model (5) is extended by also allowing for a time delay in capital depreciation. When, for various reasons, a certain amount of time is required for acquired capital to be utilized (operated) efficiently, the actual depreciation of capital could properly be postponed, too. For simplicity, the same time lag (delay, $\tau$) will be used for installation and depreciation of capital. Thus, the extended delay model of economic growth becomes:

$$\dot L(t) = nL(t), \qquad \dot K(t) = sF(L(t), K(t - \tau)) - \delta K(t - \tau). \tag{13}$$

Hence the time derivative of $k(t)$ gives (after some manipulations) the following delay equation for $k(t)$:

$$\dot k(t) = sf(\beta k(t - \tau)) - \delta\beta k(t - \tau) - nk(t), \tag{14}$$

where $s, n, \tau, \delta, \beta$ are parameters as previously stated, and $f(k)$ satisfies the Inada conditions. The standard example of $f(k)$ meeting (3) is the CD technology:

$$f(k) = \gamma k^{\alpha}; \quad \gamma > 0, \quad 0 < \alpha < 1. \tag{15}$$

Thus from (14)-(15), we have the delay differential equation for $k(t)$:

$$\dot k(t) = s\gamma\beta^{\alpha} k^{\alpha}(t - \tau) - \delta\beta k(t - \tau) - nk(t), \tag{16}$$

which can be written as

$$\dot k(t) = B[k(t - \tau)] - nk(t), \tag{17}$$

where $B(k) = s\gamma\beta^{\alpha} k^{\alpha} - \delta\beta k$. Here $B(k) = 0$ gives

$$k^* = 0, \qquad k^{**} = \left(\frac{s\gamma}{\delta}\right)^{\frac{1}{1-\alpha}} \frac{1}{\beta},$$

and $B'(k) = 0$ gives

$$k = k_M = \left(\frac{s\gamma\alpha}{\delta}\right)^{\frac{1}{1-\alpha}} \frac{1}{\beta},$$

i.e.,

$$B(k_M) = \max\{B(k) : k \in [0, k^{**}]\}. \tag{18}$$
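The closed-form quantities attached to $B(k)$ are easy to verify numerically. The sketch below evaluates $k^{**}$, $k_M$, and the positive root $\kappa$ of $B(k) = nk$ for hypothetical parameter values (none are taken from the text), so that the formulas can be checked against the definition of $B$. The function name and dictionary keys are illustrative.

```python
import math


def delay_model(s=0.3, gamma=1.0, alpha=0.3, n=0.02, delta=0.04, tau=2.0):
    """Key quantities of k'(t) = B(k(t - tau)) - n*k(t) with
    B(k) = s*gamma*beta**alpha*k**alpha - delta*beta*k and beta = exp(-n*tau)."""
    beta = math.exp(-n * tau)
    e = 1.0 / (1.0 - alpha)

    def B(k):
        return s * gamma * beta ** alpha * k ** alpha - delta * beta * k

    k_ss = (s * gamma / delta) ** e / beta              # k**: positive zero of B
    k_M = (s * gamma * alpha / delta) ** e / beta       # maximizer of B on [0, k**]
    kappa = (s * gamma * beta ** alpha / (n + delta * beta)) ** e  # B(kappa) = n*kappa
    return {"beta": beta, "B": B, "k_ss": k_ss, "k_M": k_M, "kappa": kappa}


if __name__ == "__main__":
    m = delay_model()
    print(m["k_M"], m["kappa"], m["k_ss"])
    print(m["B"](m["k_ss"]), m["B"](m["kappa"]) - 0.02 * m["kappa"])
```

Comparing `k_M` with `kappa` for different values of `delta` shows on which side of the maximizer of $B$ the steady state lies, which is the distinction the stability conditions on $\delta$ turn on.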
Theorem 2. If the initial function, $k_0(t) \equiv \Phi(t)$ for $t \in [-\tau, 0]$, belongs to $C^{+}$, then the solution $k(t)$ of the delay differential equation (16) exists and is unique for $t \in [0, +\infty)$. The unique equilibrium point (steady state) $\kappa$ is obtained from (16) as
\[ \kappa = \left(\frac{s\gamma\beta^{\alpha}}{n + \delta\beta}\right)^{\frac{1}{1-\alpha}}. \tag{19} \]
i) If $\delta$ satisfies:
\[ \delta < \frac{1}{\beta}\left(\frac{1}{\alpha}\right)^{\frac{\alpha}{1-\alpha}} \frac{n}{1-\alpha}, \tag{20} \]
then there exists a $T > 0$ such that the solution $k(t)$ to (16) is bounded for $t \ge T$:
\[ k(t) < k^{**} = \left(\frac{s\gamma}{\delta}\right)^{\frac{1}{1-\alpha}} \frac{1}{\beta}. \tag{21} \]
ii) If $\delta$ satisfies:
δ
$\left|\tfrac{1}{n} B(k) - \kappa\right|$; $k \in [k_M, k^{**}]$, $k \neq \kappa$, (23)
iii) If $\delta$ satisfies:
\[ \delta \le \frac{n}{\beta}\,\frac{\alpha}{1-\alpha}, \tag{24} \]
then the steady state, (19), is globally asymptotically stable.
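The two explicit thresholds on $\delta$ in conditions (20) and (24) are easy to evaluate for given parameters. The sketch below is a hedged illustration: the function names are hypothetical, and the numerical values are those used in the Example that follows the proof ($n = 0.02$, $\delta = 0.05$, $\beta = 0.15$, $\alpha = 1/3$).

```python
def delta_thresholds(n, beta, alpha):
    """Return (lower, upper): the bound in (24) and the bound in (20)."""
    lower = (n / beta) * alpha / (1.0 - alpha)
    upper = (1.0 / beta) * (1.0 / alpha) ** (alpha / (1.0 - alpha)) * n / (1.0 - alpha)
    return lower, upper

def classify(delta, n, beta, alpha):
    """Report which of the explicit conditions (24) or (20) the rate delta meets."""
    lower, upper = delta_thresholds(n, beta, alpha)
    if delta <= lower:
        return "globally asymptotically stable by (24)"
    if delta < upper:
        return "eventually bounded below k** by (20)"
    return "outside the ranges covered by (20) and (24)"

lower, upper = delta_thresholds(n=0.02, beta=0.15, alpha=1.0 / 3.0)
verdict = classify(delta=0.05, n=0.02, beta=0.15, alpha=1.0 / 3.0)
```

Because the bound in (24) is always smaller than the bound in (20), any $\delta$ meeting (24) also meets (20); the function reports only the stronger conclusion.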
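Proof: As to existence and uniqueness of the solutions $k(t)$ to (16), see Theorem 1. Part i): We first show that, if (20) holds, then
\[ \kappa \le \frac{1}{n} B(k_M) < k^{**}. \tag{25} \]
From (18), it is obvious that $\kappa \le \frac{1}{n} B(k_M)$, because $\kappa = \frac{1}{n} B(\kappa)$ by (17). In order to get $\frac{1}{n} B(k_M) < k^{**}$, i.e.,
\[ \frac{1}{n}\left(s\gamma\beta^{\alpha} k_M^{\alpha} - \delta\beta k_M\right) < \left(\frac{s\gamma}{\delta}\right)^{\frac{1}{1-\alpha}} \frac{1}{\beta}, \]
we need (20).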
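If for all large $t$, i.e., $t \ge T$ with $T$ sufficiently large, $k(t) \ge \kappa$, then $\dot k(t) \le 0$; thus $\lim_{t\to+\infty} k(t) = \kappa$, and the conclusion is true. If $k(t)$ oscillates around the equilibrium $\kappa$, then there must be a $t' > 2\tau$ such that $k(t') > \kappa$ and $\dot k(t') = 0$. We have
\[ k(t') = \frac{1}{n} B(k(t'-\tau)) \le \frac{1}{n} B(k_M) < k^{**}. \]
Therefore, there exists a $T > 0$ such that $k(t) < k^{**}$ for $t \ge T$.
Part ii): Denote
\[ u = \limsup_{t\to+\infty} |k(t) - \kappa|. \]
From (21), $u < +\infty$. We assume $u > 0$. First we show that, if (21*) holds, then $k_M < \kappa$. Indeed, by the expressions for $k_M$ and $\kappa$, (19), the inequality
\[ \frac{1}{\beta}\left(\frac{s\gamma\alpha}{\delta}\right)^{\frac{1}{1-\alpha}} < \left(\frac{s\gamma\beta^{\alpha}}{n+\delta\beta}\right)^{\frac{1}{1-\alpha}} \]
is equivalent to $\delta > \frac{n}{\beta}\,\frac{\alpha}{1-\alpha}$. Thus, for $k \in (0, \tilde k]$, where $\tilde k$ satisfies $0 < \tilde k < k_M$ and $B(\kappa) = B(\tilde k)$, we have
\[ |k - \kappa| > \left|\tfrac{1}{n} B(k) - \kappa\right|. \]
When $k \in (\tilde k, k_M]$, we have from (23),
\[ \kappa - k > \kappa - k' > \frac{1}{n} B(k') - \kappa \ge 0, \]
where $k_M < k' < \kappa$ and $B(k) = B(k')$. To summarize, we obtain
\[ |k - \kappa| > \left|\tfrac{1}{n} B(k) - \kappa\right|, \quad 0 < k < k^{**}, \quad k \neq \kappa. \tag{26} \]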
If the solution $k(t)$ of system (16) is monotone, then we have $u = 0$. If $k(t)$ oscillates around the equilibrium state $\kappa$, then by the continuity of $B(k)$ and (26), there exists an $\varepsilon > 0$ such that for $\xi \in [-\varepsilon, \varepsilon]$,
\[ \left|\tfrac{1}{n} B(\kappa + u + \xi) - \kappa\right| < u - \varepsilon, \tag{27} \]
and there exists $T > 0$ such that for $t \ge T$,
\[ |k(t) - \kappa| < u + \varepsilon. \tag{28} \]
Let $t_1 > T + \tau$ be such that $|k(t_1) - \kappa| > u - \varepsilon$, $k(t_1) > \kappa$, and $\dot k(t_1) \ge 0$. Thus, $B(k(t_1 - \tau)) \ge n k(t_1)$, i.e.,
\[ \frac{1}{n} B(k(t_1 - \tau)) - \kappa > u - \varepsilon. \tag{29} \]
From (28), $|k(t_1 - \tau) - \kappa| < u + \varepsilon$; then by (27), we have
\[ \frac{1}{n} B(k(t_1 - \tau)) - \kappa < u - \varepsilon. \]
This contradicts (29), hence $u = 0$. So (23) gives the range of variation of the solutions around the equilibrium $\kappa$ of the delay system (16).
Part iii): If the solution $k(t)$ of system (14) is monotone, then similarly we have $u = 0$. In the following, we assume that $k(t)$ is not monotone. When $k_M = \kappa$, we can prove that for $t_0 > 0$, if $k(t_0) \le \kappa$, then $k(t) \le \kappa$ for any $t \ge t_0$. Otherwise, if there exists a $t_1 > \tau$ such that $k(t_1) > \kappa$, $\dot k(t_1) \ge 0$, and $k(t) < k(t_1)$ for $t \in [t_0, t_1)$, then we have
\[ B(\kappa) \ge B(k(t_1 - \tau)) \ge n k(t_1) > n\kappa, \]
which is a contradiction. If $k(t_0) > \kappa$ and $k(t) > \kappa$ for all $t \ge t_0$, then $\dot k(t) < 0$. Thus $k(t)$ is monotone, so we have $\lim_{t\to+\infty} k(t) = \kappa$. In the following, without loss of generality, we assume that $k(t) \le \kappa$ for $t \ge t_0$. Obviously, for $k \in (0, k_M]$, we have
\[ |k - \kappa| > \left|\tfrac{1}{n} B(k) - \kappa\right|. \]
As in the proof of ii), we obtain:
\[ \lim_{t\to+\infty} k(t) = \kappa. \]
When $\delta < \frac{n}{\beta}\,\frac{\alpha}{1-\alpha}$, i.e., $k_M > \kappa$, it is easily shown that for $k \in (0, k_M]$,
\[ |k - \kappa| > \left|\tfrac{1}{n} B(k) - \kappa\right|. \]
As in the proof of ii), we obtain:
\[ \lim_{t\to+\infty} k(t) = \kappa, \]
which proves the global attractivity of $\kappa$. Next, consider the linearized equation of system (16) around $\kappa$:
\[ \dot k(t) = [n\alpha - \delta\beta(1-\alpha)]\,k(t-\tau) - n k(t). \tag{30} \]
Its characteristic equation is:
\[ \lambda = [n\alpha - \delta\beta(1-\alpha)]\,e^{-\lambda\tau} - n. \tag{31} \]
Since $n > n\alpha - \delta\beta(1-\alpha)$, and by condition (24), we have
\[ \delta \le \frac{n}{\beta}\,\frac{\alpha}{1-\alpha} < \frac{n}{\beta}\,\frac{\alpha+1}{1-\alpha}, \]
i.e., $n > |n\alpha - \delta\beta(1-\alpha)|$. Thus, all roots of the equation (31) have negative real part, so the equilibrium point $\kappa$ is locally stable. The latter, together with the global attractivity of $\kappa$, establishes that $\kappa$, (19), of the delay model (16) is globally asymptotically stable. □
Example. We consider some actual values of the parameters in the condition (24). The parameter $n$ ("natural rate of proliferation") has the range $0.005 \le n \le 0.3$, cf. Jensen and Wong (1997). If we choose $n = 0.02$, capital depreciation rate $\delta = 0.05$, $\beta = 0.15$, and $\alpha = \frac{1}{3}$, then
\[ \delta = 0.05 < \frac{n}{\beta}\,\frac{\alpha}{1-\alpha} = \frac{0.02}{0.15}\cdot\frac{1}{2} = \frac{1}{15}. \]
Condition (24) is met by these parameter values, and $\kappa$ is globally asymptotically stable.
7.4
Persistent oscillation in a growth model with delays
In this section, we consider the dynamics of the model (16) when the steady state is not asymptotically stable. We shall prove that in some circumstances, the solutions $k(t)$ will oscillate around the equilibrium (steady state), $\kappa$. First, we give the definition of oscillation around $\kappa$: Definition 1. Let $k(t)$ be a solution of the delay system (16) with the initial function $\Phi \in C^{+}$. If there exists a sequence $\{t_i\}$, $t_i \to \infty$ as $i \to \infty$, such that $k(t_i) = \kappa$, we call $k(t)$ an oscillating solution around the equilibrium point.
We give the main result:
Theorem 3. Let $k(t)$ be a solution of the delay system (16), (17). If the depreciation rate belongs to the interval
\[ \frac{n}{\beta}\,\frac{\alpha}{1-\alpha} < \delta < \frac{1}{\beta}\left(\frac{1}{\alpha}\right)^{\frac{\alpha}{1-\alpha}} \frac{n}{1-\alpha}, \tag{32} \]
then
\[ 0 < k(t) \le \frac{1}{n} B(k_M) < k^{**} = \left(\frac{s\gamma}{\delta}\right)^{\frac{1}{1-\alpha}} \frac{1}{\beta}, \quad t \ge 0. \]
a) Under the condition (32), and if $0 < k(0) < \kappa$, then there exist $\eta > 0$, $T > 0$, such that $k(t) \ge \eta$ for $t \ge T$.
b) Under the condition (32), and if two additional conditions hold:
\[ \text{i)} \quad n k_M < B\left(\tfrac{1}{n} B(k_M)\right), \tag{33} \]
\[ \text{ii)} \quad D\tau \ge 1; \quad D = -B'(k_\Delta), \quad k_\Delta = \tfrac{1}{n} B\left(\tfrac{1}{n} B(k_M)\right), \tag{34} \]
then the solutions of the delay model (16) oscillate around the equilibrium point $\kappa$.
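Conditions (32), (33) and (34) can be evaluated directly once the parameters are fixed. The following is a minimal sketch under stated assumptions: the function name and the parameter values are illustrative and not from the chapter, and $\beta$ is computed as $e^{-n\tau}$ as in the earlier example. Conditions (33) and (34) are only evaluated when (32) holds, since (32) guarantees $\frac{1}{n}B(k_M) < k^{**}$ and hence a positive $k_\Delta$.

```python
import math

def theorem3_conditions(s, gamma, alpha, delta, n, tau):
    """Evaluate conditions (32), (33), (34) of Theorem 3 for the CD case."""
    beta = math.exp(-n * tau)
    c = s * gamma * beta**alpha

    def B(k):
        return c * k**alpha - delta * beta * k

    def dB(k):
        return alpha * c * k ** (alpha - 1.0) - delta * beta

    lower = (n / beta) * alpha / (1.0 - alpha)
    upper = (1.0 / beta) * (1.0 / alpha) ** (alpha / (1.0 - alpha)) * n / (1.0 - alpha)
    cond32 = lower < delta < upper
    if not cond32:
        return {"(32)": False, "(33)": None, "(34)": None}

    k_M = (s * gamma * alpha / delta) ** (1.0 / (1.0 - alpha)) / beta
    k_bar = B(k_M) / n              # upper bound (1/n) B(k_M)
    k_Delta = B(k_bar) / n          # k_Delta as defined in (34)
    D = -dB(k_Delta)
    return {"(32)": True, "(33)": n * k_M < B(k_bar), "(34)": D * tau >= 1.0}

# Illustrative values (assumptions): they satisfy (32) but not necessarily (34)
result = theorem3_conditions(s=0.2, gamma=1.0, alpha=1.0 / 3.0,
                             delta=0.05, n=0.02, tau=20.0)
```

Scanning `tau` or `delta` with this function gives a quick picture of where the oscillation conditions of part b) can hold.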
Proof: It is easy to verify that if condition (32) holds, then there exists a $T > 0$ such that $k(t) \le \frac{1}{n} B(k_M)$ for $t \ge T$. Without loss of generality, we assume that the delay model (16) has the initial function $\Phi \in C^{+}$, and
\[ \Phi \le \frac{1}{n} B(k_M). \]
In the following, we also prove that $k(t) \le \frac{1}{n} B(k_M)$ for all $t \ge 0$. Otherwise, there exists a $t_1 > 0$ such that
\[ \dot k(t_1) \ge 0, \qquad k(t_1) > \frac{1}{n} B(k_M), \]
and for $t \in [0, t_1)$,
\[ k(t) \le \frac{1}{n} B(k_M). \]
Since $\dot k(t_1) \ge 0$, then
\[ B(k(t_1 - \tau)) \ge n k(t_1) > B(k_M), \]
which is a contradiction, cf. $B(k_M)$, (18).
Suppose that there exists a $t_2 > 0$ such that $k(t) > 0$ for $t \in [0, t_2)$, and $k(t_2) = 0$. Then we have $\dot k(t_2) \le 0$, i.e.,
\[ B(k(t_2 - \tau)) \le n k(t_2) = 0. \]
If $B(k(t_2 - \tau)) = 0$, then $k(t_2 - \tau) = 0$, which contradicts the definition of $t_2$. If $B(k(t_2 - \tau)) < 0$, then $k(t_2 - \tau) > k^{**}$, a contradiction. Hence,
\[ 0 < k(t) \le \frac{1}{n} B(k_M). \]
Condition (32) gives the boundedness of the solutions of the delay model (16).
a) If the statement is not true, then there must be a $t_1 > \tau$ such that
\[ k(t_1) = \min\{k(t) : t \in [0, t_1]\}, \qquad B(k(t_1)) < B\left(\tfrac{1}{n} B(k_M)\right), \]
and $\dot k(t_1) \le 0$. We choose $0 < k_1 < \kappa$ with $k_1 < k(t_1) < \frac{1}{n} B(k_1)$. Since $0 < k(0) < \kappa$, then $0 < k(t_1) < \kappa$, and $\dot k(t_1) \le 0$ leads to
\[ B(k(t_1 - \tau)) \le n k(t_1) < B(k_1). \]
Because $B(k)$ is an increasing function for $k \in [0, k_M]$, and is a decreasing function for $k \in [\kappa, \infty)$, then either $k(t_1 - \tau) < k(t_1)$, or
\[ k(t_1 - \tau) > \frac{1}{n} B(k_M) = \eta > 0. \]
But $k(t_1 - \tau) < k(t_1)$ contradicts the definition of $t_1$. The statement holds.
b) We first assert that, if for $t \in [-\tau, 0]$,
\[ k(t) \in \left[k_M, \tfrac{1}{n} B(k_M)\right], \]
then for all $t \ge 0$:
\[ k(t) \in \left[k_M, \tfrac{1}{n} B(k_M)\right]. \tag{35} \]
From (33), $\kappa > k_M$, and there exists a $k_1$ with $0 < k_1 < k_M$ such that $B\left(\frac{1}{n} B(k)\right) > nk$ for $k \in [k_1, k_M]$. If (35) does not hold, then by a), there exists a $t_1 > 0$ with $k_1 < k(t_1) < k_M$, such that for $t \in [0, t_1)$,
\[ k(t_1) < k(t) \le \frac{1}{n} B(k_M), \]
and $\dot k(t_1) < 0$. Thus,
\[ B(k(t_1 - \tau)) < n k(t_1). \tag{36} \]
Since $B(k) \ge nk$ for $k \in [0, k_M]$, then $k(t_1 - \tau) > \kappa$. By the monotonicity of $B(k)$, we have
\[ B(k(t_1 - \tau)) \ge B\left(\tfrac{1}{n} B(k_M)\right) > n k_M > n k(t_1), \]
which contradicts (36).
Next, we prove the oscillation of the solutions. Without loss of generality, we assume that $k(0) > \kappa$. First, we show that there exists an $e_1 > 0$ such that
\[ e_1 = \inf\{t : t > 0, \; k(t) = \kappa\}. \]
From (34) with $D = -B'(k_\Delta)$, we can choose a $\Delta > 0$ such that
\[ \Delta < \min\left\{\tfrac{1}{n} B(k_M) - \kappa, \; \kappa - k_M\right\}. \]
Then for $|k - \kappa| \le \Delta$, we have
\[ |B(k) - B(\kappa)| \ge D\,|k - \kappa|. \]
Obviously, if $k(t) \ge \kappa$ for $t \in [0, t_0]$, then $\dot k(t) \le 0$ for $t \in [0, t_0]$. Let $t_1 = \inf\{t : t > 0, \; k(t) \le \kappa + \Delta\}$. If $k(0) \le \kappa + \Delta$, then $t_1 = 0$. Now assume $k(0) > \kappa + \Delta$. For $0 \le t \le t_1$, we have
\[ \dot k(t) = [B(k(t-\tau)) - B(\kappa)] - n[k(t) - \kappa]. \]
Therefore,
\[ \dot k(t) \le -n[k(t) - \kappa] \le -n\Delta. \]
Thus,
\[ t_1 \le -\frac{1}{n\Delta}[k(t_1) - k(0)] \le \frac{1}{n\Delta}\left[\tfrac{1}{n} B(k_M) - \kappa - \Delta\right]. \]
If $0 < k(t) - \kappa \le \Delta$ for $t \in [t_1, t_1 + \tau]$, then for $t \in [t_1, t_1 + \tau]$,
\[ \dot k(t) \le B(k(t-\tau)) - B(\kappa) \le D(\kappa - k(t-\tau)) \le -D\Delta. \]
This leads to
\[ k(t_1 + \tau) \le k(t_1) - D\Delta\tau = \kappa + \Delta(1 - D\tau) \le \kappa, \]
which is a contradiction; so $t_1 + \tau \ge e_1$.
Moreover, it is easy to verify that $\dot k(e_1) < 0$; otherwise, if $\dot k(e_1) = 0$, then $k(e_1 - \tau) < k_M$, which is a contradiction. Next, we prove that there exists an $e_2$ such that
\[ e_2 = \inf\{t : t > e_1, \; k(t) = \kappa\}. \]
Denote $e^{*} = \min\{e_2, e_1 + \tau\}$. Note that if there does not exist such an $e_2$, then we let $e_2 = \infty$. Therefore, for $t \in (e_1, e^{*})$, we have $k(t) < \kappa$. Since
\[ \dot k(t) = [B(k(t-\tau)) - B(\kappa)] - n[k(t) - \kappa], \]
then
\[ \frac{d}{dt}\left[(k(t) - \kappa)e^{nt}\right] = e^{nt}[B(k(t-\tau)) - B(\kappa)] < 0, \tag{37} \]
i.e., $(k(t) - \kappa)e^{nt}$ is monotonically decreasing on $(e_1, e^{*})$; so $e^{*} = e_1 + \tau$. Thus $k_M \le k(t) < \kappa$ for $t \in (e_1, e_1 + \tau)$. For $t \in (e_1 + \tau, e_2)$, we have $\dot k(t) \ge 0$.
Now, we prove that $e_2$ is finite. Let $t_2 = \inf\{t : t \ge e_1 + \tau, \; k(t) \ge \kappa - \Delta\}$. If $k(e_1 + \tau) \ge \kappa - \Delta$, then $t_2 = e_1 + \tau$. Now assume that $k(e_1 + \tau) < \kappa - \Delta$. For $t \in [e_1 + \tau, t_2]$, we have $\dot k(t) \ge n\Delta$. Therefore,
\[ k(t_2) - k(e_1 + \tau) \ge n\Delta(t_2 - e_1 - \tau), \]
i.e.,
\[ t_2 \le e_1 + \tau + \frac{1}{n\Delta}(\kappa - k_M - \Delta). \]
If we assume that $e_2 > t_2 + \tau$, then for $t \in [t_2, t_2 + \tau]$,
\[ \dot k(t) \ge B(k(t-\tau)) - B(\kappa) \ge D(\kappa - k(t-\tau)) > D\Delta. \]
Integrating on both sides gives
\[ k(t_2 + \tau) > k(t_2) + D\Delta\tau = \kappa + \Delta(D\tau - 1) > \kappa, \]
which is a contradiction. So instead, we conclude that $e_2 \le t_2 + \tau$, i.e.,
\[ \begin{aligned} \tau < e_2 &\le t_2 + \tau \\ &\le e_1 + 2\tau + \frac{1}{n\Delta}(\kappa - k_M - \Delta) \\ &\le t_1 + 3\tau + \frac{1}{n\Delta}(\kappa - k_M - \Delta) \\ &\le 3\tau + \frac{1}{n\Delta}\left[\tfrac{1}{n} B(k_M) - k_M - 2\Delta\right]. \end{aligned} \]
Similarly, it can be shown that for $t \in (e_2, e_2 + \tau)$,
\[ \frac{d}{dt}\left[(k(t) - \kappa)e^{nt}\right] > 0. \tag{38} \]
By repeating the analysis from (37) above, (38) can be proved. □
Theorem 3 demonstrates that if delays in economic growth models are large enough, such delays can generate persistent oscillatory solutions in continuous time dynamics.
7.5
Final comments
In complex socio-economic systems, long time delays can be involved at every stage, from information collection and decision making to investment implementation. Obviously, delay phenomena will influence the dynamic characteristics of economic systems. But in many delay situations, their implications are not sufficiently recognized and effectively studied, and so many actual projects do not achieve good results. The effect of delays on the stability of systems is a difficult topic to handle. We have given conditions for global asymptotic stability of an economic growth model with time delay, and analyzed the oscillation around the steady state. The main contribution to economic growth theory is that we investigate a kind of nonlinear dynamic phenomenon, namely fluctuations. We may emphasize that oscillations are not rare, but common in growing economies.
Acknowledgements: This research was supported in part by NNSF, and the Fund for "Study on the Evolution of Complex Economic System" at "Innovation Center of Economic Transition and Development of Nanjing University" of Ministry of Education, China. We would like to thank Professor Bjarne S. Jensen, Copenhagen Business School, for many valuable discussions and suggestions.
References:
Boucekkine R., Licandro O., and Paul C. (1997) "Difference-difference Equations in Economics: On the Numerical Solution of Vintage Capital Growth Models." Journal of Economic Dynamics and Control 21: 347–362.
Chukwu E.N. (1996) "Universal Laws for the Control of Global Economic Growth with Nonlinear Hereditary Dynamics." Applied Mathematics and Computation 78: 19–81.
Chukwu E.N. (1998) "On the Controllability of Nonlinear Economic Systems with Delay: The Italian Example." Applied Mathematics and Computation 95: 245–274.
Gandolfo G. (1997) Economic Dynamics: Methods and Models. 3rd ed. North-Holland, Amsterdam.
Gopalsamy K. (1992) Stability and Oscillations in Delay Differential Equations of Population Dynamics. Kluwer Academic Publishers, Boston.
Hale J.K. (1993) Theory of Functional Differential Equation. Springer-Verlag, New York.
Invernizzi S., and Medio A. (1991) "On Lags and Chaos in Economic Dynamic Models." Journal of Mathematical Economics 20: 521–550.
Jarsulic M. (ed.) (1993) Non-Linear Dynamics in Economic Theory. Edward Elgar Publi. Com.
Jensen B.S. (1994) The Dynamic Systems of Basic Economic Growth Models. Dordrecht: Kluwer Academic Publishers.
Jensen B.S., and Wong K.Y. (1997) Dynamics, Economic Growth, and International Trade. Ann Arbor, University of Michigan Press.
Jensen B.S., and Larsen M.E. (1987) "Growth and Long-Run Stability." Acta Applicandae Mathematicae 9: 219–237.
Smale S. (1980) The Mathematics of Time: Essays on Dynamical Systems, Economic Processes. Berlin, Springer-Verlag.
Zhang W.B. (1990a) Economic Dynamics, Growth and Development. Lecture Notes in Economics and Mathematical Systems, Vol. 350. Springer-Verlag.
Zhang W.B. (1990b) Synergetic Economics: Dynamics, Nonlinear, Instability, Non-equilibrium, Fluctuations and Chaos. Springer-Verlag.
Chapter 8 Hopf Bifurcation in Growth Models with Time Delays
Morten Brøns Department of Mathematics, Technical University of Denmark Bjarne S. Jensen University of Southern Denmark and Copenhagen Business School
8.1
Introduction
Time delays (time lags) in production/capital utilization and accumulation dynamics were introduced into basic aggregate growth models by Zhu and Huang (2007). Using a CD production function (Inada conditions), they performed a global analysis of the delay differential ("mixed difference-differential") equation. They showed that for sufficiently small delays, the steady-state solution (equilibrium) is globally stable; but the steady-state solution of the capital-labor ratio loses this global stability property when the delay (time lag) is above a certain critical value (length). Furthermore, they show that for certain values of the time delay, all solutions oscillate persistently (but are not necessarily strictly periodic).
The purpose of the present paper is, with CD and for particular CES technologies, to complement this global dynamic analysis by performing a nonlinear, local analysis of the dynamics of the delay (time lag) model when the size of the time delay (lag) is close to the critical value. We show that for this critical delay value, a Hopf bifurcation (of a fixed point into a closed orbit in a neighborhood of the equilibrium) occurs, i.e., periodic solutions ("limit cycles") are created when the steady-state solution (equilibrium) of the capital-labor ratio loses its local stability. We also derive an analytical expression which determines the stability type (supercritical or subcritical) of the periodic solutions ("limit cycles"). Finally, we show that the delay model with CES can exhibit dynamics with solutions that have previously been observed in other (electrodynamic, engineering) delay differential equations, namely: square waves and chaos (aperiodic waves/cycles).
Economic models of business cycles [as recurrent fluctuations (upswing/downswing) in economic activity] have a long tradition as a specialized branch of economic theory, with many hypotheses or paradigms exhibited. However, regarding their dynamic properties, it is worth emphasizing, as do Gabisch and Lorenz (1987, p. 3):
"Whether or not a dynamic model of an economy is a business cycle model, i.e. a model which allows for fluctuations of major economic variables for a considerable amount time, does not depend on the general motivated and paradigmatic features of a model, but rather on its mathematical structure. While, e.g. the introduction of a certain lag-structure in the production function can certainly not be a distinctive mark from an economic point of view, this structure may be the essential dynamic structure which allows for oscillatory motions of an economy. Therefore, this text will concentrate on those features of dynamic economic models which constitute the essential fluctuation-generating forces."
It is the nature and consequences of these delays (time lags) that our theorem and solutions will demonstrate. The relevance and proper interpretation of all economic time lags (duration of delays) will depend on the time units (period) of measurement (year, quarter, month, day) for the economic variables. The delays/lags are not necessarily equal to integer multiples of any time unit.
Treating economic variables as continuous time processes, the actually occurring delays in such dynamic models can be studied for any length (real number). Often an analytical dilemma arises in economic dynamics, as noted by Goodwin (1990, p. 1):
"There are two broad types of dynamical equation systems: continuous time and discrete time; both have frequently been used in economics. The latter arise because there are significant time-lags in an economy. The trouble is that these occur in the context of economic activity which is substantially continuous, so that one should formulate mixed difference-differential systems, a procedure the complications of which place it beyond the scope of this book."
We examine the complications and show the advantages of a rigorous approach to delay issues, using a mixed dynamic system in continuous time.
8.2
Dynamics of growth and cycles
Let us briefly review the derivation of the dynamics for the capital-labor ratio in growth models with time delays. The starting point is the neoclassical economic growth model,
\[ \dot L = nL, \qquad \dot K = sY(t) = sF(L, K) = sLF(1, k) \equiv Ls\gamma f(k); \quad k(t) = K(t)/L(t), \tag{1} \]
where $K$ is capital, $L$ is labour, and $n$ and $s$ are standard parameters ($n > 0$, $0 < s < 1$). The production function $F$ is homogeneous of degree one; the TFP ("total factor productivity") parameter ($\gamma$) of $F$ is here explicitly specified together with the saving (investment) parameter ($s$), as this is a practical procedure in the numerical simulations below, cf. (7). The growth model is then modified to include a delay (time lag) $\tau$ in the productive operation (utilization) of installed (acquired) capital (machinery),
\[ \dot L = nL, \qquad \dot K = sY(t) = sF(L, K_\tau); \quad K_\tau = K(t-\tau), \tag{2} \]
with the standard notation $K_\tau = K(t-\tau)$. From (2), we obtain the time derivative of $k(t) = K(t)/L(t)$ as
\[ \begin{aligned} \dot k &= \frac{L\dot K - K\dot L}{L^2} = \frac{sF(L, K_\tau)}{L} - n\frac{K}{L} = sF\left(1, \frac{K_\tau}{L}\right) - nk \\ &= s\gamma f\left(\frac{K_\tau}{L_\tau}\,\frac{L_\tau}{L}\right) - nk = s\gamma f\left(k_\tau \frac{L_0 e^{n(t-\tau)}}{L_0 e^{nt}}\right) - nk \\ &= s\gamma f(e^{-n\tau} k_\tau) - nk; \quad k_\tau \equiv K_\tau/L_\tau = K(t-\tau)/L(t-\tau), \end{aligned} \tag{3} \]
i.e.,
\[ \dot k = s\gamma f(e^{-n\tau} k_\tau) - nk. \tag{4} \]
The delay model (4) may be combined with depreciation of the productively operating capital stock (same time lag, delay, $\tau$) to obtain the basic delay model, cf. Zhu and Huang (2006),
\[ \dot k = s\gamma f(e^{-n\tau} k_\tau) - \delta_\tau e^{-n\tau} k_\tau - nk. \tag{5} \]
For mathematical analysis, it is convenient to introduce a new variable
\[ q = e^{-n\tau} k = K_\tau/L; \qquad q_\tau = q(t-\tau) = e^{-n\tau} k_\tau = e^{-n\tau}(K_\tau/L_\tau), \tag{6} \]
which implies that the standard economic growth model with delay (5) becomes
\[ \dot q = e^{-n\tau}\dot k = \beta f(q_\tau) - \delta q_\tau - nq \equiv h(q, q_\tau); \qquad \beta = s\gamma e^{-n\tau}; \quad \delta = \delta_\tau e^{-n\tau}. \tag{7} \]
The family of solutions $q(t)$ to (7) will generally display a more oscillatory time path instead of the monotonicity usually seen with $\tau = 0$.
8.3
Hopf bifurcation analysis
We will first assume that there exists an equilibrium (steady-state) solution, $q^{*}$, to (7):
\[ \forall t: \; q(t) = q^{*} = q^{*}_\tau = q^{*}(t-\tau) = e^{-n\tau} k^{*} = e^{-n\tau} k^{*}_\tau = e^{-n\tau} k^{*}(t-\tau), \tag{8} \]
satisfying
\[ \dot q = h(q^{*}, q^{*}) = \beta f(q^{*}) - (\delta + n) q^{*} = 0. \tag{9} \]
It will be convenient to introduce the variable $z$ (deviation of $q$ from the equilibrium),
\[ z = q - q^{*}; \qquad z_\tau = z(t-\tau) = q_\tau - q^{*} = q(t-\tau) - q^{*}; \qquad z = 0 \Leftrightarrow z_\tau = 0. \tag{10} \]
We next assume that $f$, (7), is analytic (can be expanded as a Taylor series), and as our dynamic analysis will be local, we will make a Taylor expansion of $h$, (7), up to cubic (third) order at the equilibrium ($z = 0$), to explicitly obtain the parametrized dynamics:
\[ \dot z = h(q^{*} + z, q^{*} + z_\tau) = A_0 z + A_1 z_\tau + \frac{A_2}{2} z_\tau^2 + \frac{A_3}{6} z_\tau^3 + O(z_\tau^4), \tag{11} \]
where the Taylor coefficients are
\[ \begin{aligned} A_0 &= \frac{\partial h}{\partial q}(q^{*}, q^{*}) = -n, & A_1 &= \frac{\partial h}{\partial q_\tau}(q^{*}, q^{*}) = \beta f'(q^{*}) - \delta, \\ A_2 &= \frac{\partial^2 h}{\partial q_\tau^2}(q^{*}, q^{*}) = \beta f''(q^{*}), & A_3 &= \frac{\partial^3 h}{\partial q_\tau^3}(q^{*}, q^{*}) = \beta f'''(q^{*}). \end{aligned} \tag{12} \]
The basic properties and results are summarized in:
Theorem 1. Assume that $f$, (7), is analytic, and assume the existence of a steady-state (equilibrium) solution $q^{*}$, (9), i.e.,
\[ h(q^{*}, q^{*}) = \beta f(q^{*}) - (\delta + n) q^{*} = 0. \tag{13} \]
If the ratio ($R$) of the first-order Taylor coefficients satisfies the condition
\[ |R| = \left|\frac{A_0}{A_1}\right| = \left|\frac{-n}{\beta f'(q^{*}) - \delta}\right| < 1, \tag{14} \]
then there exists a critical delay
\[ \tau_0 = \frac{\arccos(-A_0/A_1)}{|A_1|\,[1 - (A_0/A_1)^2]^{1/2}}, \tag{15} \]
such that the steady state is stable for $\tau < \tau_0$ and unstable for $\tau > \tau_0$. Corresponding to $\tau_0$, (15), and the periodic solution of the linearization of (11), its angular velocity (angular frequency), $\omega_0$, and period, $T_0$, are given by:
\[ \omega_0 = [A_1^2 - A_0^2]^{1/2}; \qquad T_0 = 2\pi/\omega_0 = 1/\nu_0 \quad (\nu_0: \text{frequency}). \tag{16} \]
Furthermore, a family of small-amplitude periodic solutions bifurcates from the steady state, $q^{*}$, (13). The periodic solutions (limit cycles) exist for delays $\tau$ in an interval, either to the left or to the right of $\tau_0$. The important qualitative properties of the periodic solutions (limit cycles) are determined by the sign and the numerical value of a number (cubic expansion parameter), $\tau_2$, given by:
\[ \begin{aligned} \tau_2 &= \frac{1}{8\omega_0^2}\left\{ \frac{2 - A_1\tau_0[2R^2 + 6R - 11] + R[4R(1+R) - 13]}{A_1(1+R)(5-4R)}\, A_2^2 - (A_1\tau_0 - R) A_3 \right\} \\ &= \frac{2 - A_1\tau_0[2(A_0/A_1)^2 + 6(A_0/A_1) - 11] + (A_0/A_1)[4(A_0/A_1)(1 + (A_0/A_1)) - 13]}{8A_1^3\,[1 - (A_0/A_1)^2][1 + (A_0/A_1)][5 - 4(A_0/A_1)]}\, A_2^2 \\ &\quad - \frac{1}{8}\,\frac{A_1\tau_0 - (A_0/A_1)}{A_1^2\,[1 - (A_0/A_1)^2]}\, A_3. \end{aligned} \tag{17} \]
If $\tau_2 > 0$, the limit cycles existing for $\tau > \tau_0$ (supercritical) are stable/attractive; if $\tau_2 < 0$, the limit cycles existing for $\tau < \tau_0$ (subcritical) are unstable/repulsive. The period $T_2$ of the limit cycles is given by
\[ T_2 = \frac{2\pi}{\omega_0}\left(1 - \frac{\omega_2}{\omega_0\tau_2}[\tau - \tau_0]\right) + O([\tau - \tau_0]^2), \tag{18} \]
where
\[ \omega_2 = \frac{1}{8\omega_0}\left[\frac{2R^2 + 6R - 11}{(1+R)(5-4R)}\, A_2^2 + A_1 A_3\right]. \tag{19} \]
The absolute/numerical value of $\tau_2$ tells how fast the amplitude of the limit cycles grows with the size of the delay deviation $\tau - \tau_0$; the amplitude $\varepsilon$ of the limit cycle is given by
\[ \varepsilon = \sqrt{\frac{\tau - \tau_0}{\tau_2}} + O(\tau - \tau_0). \tag{20} \]
The smaller $\tau_2$, the quicker the amplitude grows, according to (20). In the atypical case $\tau_2 = 0$, the computations must be continued to a higher order than three in (11).
Proof: The stability of the equilibrium $z = 0$ of (11) can be determined by the linearization
\[ \dot z = A_0 z + A_1 z_\tau. \tag{21} \]
Solutions of this linear delay differential equation (21) have the complex form
\[ z(t) = Ce^{(\alpha + i\omega)t} = Ce^{\alpha t}(\cos\omega t + i\sin\omega t). \tag{22} \]
If all solutions have $\alpha < 0$, then the equilibrium $z = 0$ is stable, while it is unstable if there are solutions with $\alpha > 0$. Hence, a change of stability occurs at a value $\tau_0$ of the delay when there are solutions $z = Ce^{i\omega_0 t}$. Inserting such a solution, and taking real and imaginary parts, yields
\[ 0 = A_0 + A_1\cos\omega_0\tau_0, \qquad \omega_0 = -A_1\sin\omega_0\tau_0, \tag{23} \]
which gives
\[ \cos\omega_0\tau_0 = -A_0/A_1, \qquad \sin\omega_0\tau_0 = -\frac{\omega_0}{A_1}. \tag{24} \]
Then squaring and adding yields a critical angular velocity (angular frequency), $\omega_0$, as
\[ \omega_0 = \sqrt{A_1^2 - A_0^2} = |A_1|\,[1 - (A_0/A_1)^2]^{1/2}. \tag{25} \]
Hence a critical delay (time lag) $\tau_0$ exists, cf. (24), (25),
\[ \tau_0 = \frac{1}{\omega_0}\arccos(-A_0/A_1) = \frac{\arccos(-A_0/A_1)}{|A_1|\,[1 - (A_0/A_1)^2]^{1/2}}, \tag{26} \]
if
\[ |A_0/A_1| < 1. \tag{27} \]
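Formulas (25) and (26) translate directly into a few lines of code. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the sample values of $n$, $a$ and $\delta$ are not from the text, and the coefficient $A_1 = a(n+\delta) - \delta$ anticipates the CD case treated later in the chapter. The check $A_1 < 0$ is an added assumption: with $\arccos$ taking values in $[0, \pi]$, it is what makes the second equation in (23) hold with $\omega_0 > 0$.

```python
import math

def critical_delay(A0, A1):
    """Return (omega0, tau0) from (25) and (26).

    Assumes |A0/A1| < 1, condition (27), and A1 < 0 so that
    omega0 = -A1*sin(omega0*tau0) holds with omega0 > 0.
    """
    R = A0 / A1
    if A1 >= 0 or abs(R) >= 1:
        raise ValueError("requires A1 < 0 and |A0/A1| < 1")
    omega0 = math.sqrt(A1 * A1 - A0 * A0)
    tau0 = math.acos(-R) / omega0
    return omega0, tau0

# Hypothetical illustrative values
n, a, delta = 0.02, 0.3, 0.1
A0 = -n
A1 = a * (n + delta) - delta
omega0, tau0 = critical_delay(A0, A1)
period0 = 2.0 * math.pi / omega0    # T0 of the linearized oscillation
```

Substituting the returned pair back into (23) is a convenient consistency check. Note that in (7) the parameters $\beta$ and $\delta$ themselves contain $\tau$; the sketch treats them as fixed numbers, as the formulas above do.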
Inserting the Taylor coefficient expressions (12) in (27) and (26) establishes (14) and (15). At $\tau = \tau_0$, where the local dynamics changes from exponential decay towards the equilibrium to the existence of exponentially growing solutions, the linearized delay differential equation (21) has periodic solutions with the period:
\[ T_0 = \frac{2\pi}{\omega_0} = 1/\nu_0 \quad (\nu_0: \text{frequency}). \tag{28} \]
This indicates the existence of periodic solutions (limit cycles) for the full nonlinear delay differential equation (11) for delays $\tau$ close to $\tau_0$. In fact, the Hopf bifurcation theorem states that small-amplitude periodic solutions exist (Diekmann, Gils, Lunel and Walther 1995; Guckenheimer and Holmes 1983). We proceed to look for these periodic solutions (limit cycles) following Morris (1976). Since the period cannot be expected to be exactly given by (28) for all $\tau$, the exact determination/computation of the period ($T$), or angular velocity ($\omega$), is an important part of our task. We now introduce a scaled time variable
\[ \theta = \omega t, \tag{29} \]
where $\omega$ is a parameter/constant close to the critical angular velocity, $\omega_0$. This transforms (11) into
\[ \omega z' = A_0 z + A_1 z_{\omega\tau} + \frac{A_2}{2} z_{\omega\tau}^2 + \frac{A_3}{6} z_{\omega\tau}^3 + O(z_{\omega\tau}^4); \qquad z' = \frac{\partial z}{\partial\theta}. \tag{30} \]
The coefficients $A_i$, $i = 0, \ldots, 3$, are here the same as in (12). In (30) we will look for $2\pi$-periodic solutions, and determine the parameter $\omega = \omega(\tau)$ as we go along such that they have period $2\pi$. Let $\varepsilon$ be some measure of the amplitude of a periodic solution
\[ z = z(\theta, \varepsilon). \tag{31} \]
We expand the periodic solution in a Taylor series of the amplitude $\varepsilon$,
\[ z(\theta, \varepsilon) = z^{(1)}(\theta)\,\varepsilon + z^{(2)}(\theta)\,\varepsilon^2 + z^{(3)}(\theta)\,\varepsilon^3 + \ldots, \tag{32} \]
where
\[ z^{(n)}(\theta) = \frac{1}{n!}\,\frac{\partial^n z}{\partial\varepsilon^n}(\theta, 0), \tag{33} \]
and correspondingly expand $\omega$ and the delay $\tau$ at $\omega_0$, $\tau_0$, where the oscillations are initiated,
\[ \omega = \omega_0 + \omega_1\varepsilon + \omega_2\varepsilon^2 + \ldots, \tag{34} \]
\[ \tau = \tau_0 + \tau_1\varepsilon + \tau_2\varepsilon^2 + \ldots. \tag{35} \]
Inserting these expansions in (30), we collect terms of the same order in $\varepsilon$, and set each of the terms in the resulting power series equal to zero. To order $\varepsilon^1$, we get [with $z^{(1)\prime} = \partial^2 z(\theta, 0)/\partial\varepsilon\,\partial\theta$; $z^{(1)}_{\omega_0\tau_0} = z^{(1)}(\theta - \omega_0\tau_0)$]:
\[ \omega_0 z^{(1)\prime} - A_0 z^{(1)} - A_1 z^{(1)}_{\omega_0\tau_0} = 0. \tag{36} \]
When $(\omega_0, \tau_0)$ fulfil (24), the complete solution of (36) is
\[ z^{(1)}(\theta) = a\cos\theta + b\sin\theta. \tag{37} \]
We can fix the measure of the amplitude by picking the initial conditions $z'(0) = \varepsilon$, $z(0) = 0$, which gives:
\[ z^{(1)}(0) = 0, \quad z^{(1)\prime}(0) = 1, \quad z^{(j)}(0) = 0, \quad z^{(j)\prime}(0) = 0 \quad \text{for } j \ge 2. \tag{38} \]
With this, we get
\[ z^{(1)}(\theta) = \sin\theta. \tag{39} \]
Proceeding to order $\varepsilon^2$, we get the equation
\[ \omega_0 z^{(2)\prime} - A_0 z^{(2)} - A_1 z^{(2)}_{\omega_0\tau_0} = -\omega_1 z^{(1)\prime} - A_1(\omega_0\tau_1 + \omega_1\tau_0)\, z^{(1)\prime}_{\omega_0\tau_0} + \frac{A_2}{2}\left(z^{(1)}_{\omega_0\tau_0}\right)^2. \tag{40} \]
With the solution for $z^{(1)}$ from (39), (40) can be rewritten as
= ω0 z (2) − A0 z (2) − A1 zω(2) 0 τ0 1 A2 ω02 + [−ω1 + A0 (ω1 τ0 + ω0 τ1 )] cos θ + ω0 (ω1 τ0 + ω0 τ1 ) sin θ 4 1 A2 + [(ω02 − A20 ) cos 2θ + 2A0 ω0 sin 2θ]. (41) 4 A21 The homogeneous part of this inhomogeneous linear equation is identical to the equation for z (1) . Hence, the solution to the homogeneous equation is resonant with the sin θ and cos θ terms on the right hand side, and will give rise to non-periodic solutions. To obtain periodic solutions of period 2π, we must require that the resonant terms vanish, i.e. −ω1 + A0 (ω1 τ0 + ω0 τ1 ) = 0, ω0 (ω1 τ0 + ω0 τ1 ) = 0.
(42)
These are linear equations in $\tau_1$, $\omega_1$, with solution
\[ \tau_1 = 0, \qquad \omega_1 = 0. \tag{43} \]
With (43), we can find, by insertion in (41) and using the initial conditions (38), a periodic solution of (41) of the form
\[ z^{(2)}(\theta) = a_0 + a_1\cos\theta + b_1\sin\theta + a_2\cos 2\theta + b_2\sin 2\theta. \tag{44} \]
The result is
\[ a_0 = \frac{A_2}{4(A_0 + A_1)}, \tag{45} \]
\[ a_1 = \frac{A_2(2A_1 - A_0)(A_1 - A_0)}{2A_1(A_1 + A_0)(5A_1 - 4A_0)}, \tag{46} \]
\[ b_1 = \frac{A_2\,\omega_0(A_1 - A_0)}{2A_1(A_1 + A_0)(5A_1 - 4A_0)}, \tag{47} \]
\[ a_2 = \frac{A_2(A_1^2 - 2A_0^2 + 2A_1 A_0)}{2A_1(A_1 + A_0)(5A_1 - 4A_0)}, \tag{48} \]
\[ b_2 = \frac{A_2\,\omega_0(A_1 - A_0)}{2A_1(A_1 + A_0)(5A_1 - 4A_0)}. \tag{49} \]
Finally turning to order $\varepsilon^3$, we obtain
\[ \omega_0 z^{(3)\prime} - A_0 z^{(3)} - A_1 z^{(3)}_{\omega_0\tau_0} = -\omega_2 z^{(1)\prime} - A_1(\omega_0\tau_2 + \omega_2\tau_0)\, z^{(1)\prime}_{\omega_0\tau_0} + A_2\, z^{(1)}_{\omega_0\tau_0} z^{(2)}_{\omega_0\tau_0} + \frac{A_3}{6}\left(z^{(1)}_{\omega_0\tau_0}\right)^3. \tag{50} \]
From the previous calculations, the right hand side is known. It has the form of a trigonometric polynomial,
\[ u_0 + u_1\cos\theta + v_1\sin\theta + u_2\cos 2\theta + v_2\sin 2\theta + u_3\cos 3\theta + v_3\sin 3\theta. \tag{51} \]
The expressions for the coefficients are long and complicated. We are only interested in $u_1$ and $v_1$, as they multiply the resonant terms, which must be zero to allow periodic solutions. Thus one obtains
\[ u_1 = A_0\omega_0\tau_2 + (\tau_0 A_0 - 1)\omega_2 + \omega_0\,\frac{-11A_1^2A_2^2 + 4A_1A_0A_2^2 - 4A_1A_0^2A_3 + A_1^2A_0A_3 + 5A_1^3A_3 + 4A_0^2A_2^2}{8A_1^2(A_1 + A_0)(5A_1 - 4A_0)}, \tag{52} \]
\[ v_1 = \omega_0^2\tau_2 + \omega_0\tau_0\omega_2 + \frac{4A_1A_0^2A_2^2 + 4A_0^3A_2^2 - 4A_1A_0^3A_3 + A_1^2A_0^2A_3 + 5A_1^3A_0A_3 + 2A_1^3A_2^2 - 13A_1^2A_0A_2^2}{8A_1^2(A_1 + A_0)(5A_1 - 4A_0)}. \tag{53} \]
Solving the equations $u_1 = 0$, $v_1 = 0$, one obtains $\tau_2$, (17), and $\omega_2$, (19), after simplifications using (14), (15), (16).
Solving (35) for $\varepsilon$ yields (20). The amplitude $\varepsilon$ is only defined when $\tau - \tau_0$ has the same sign as $\tau_2$; hence the periodic solutions exist only on one side of $\tau_0$, as described in Theorem 1. Inserting (20) in (34) yields
\[ \omega = \omega_0 + \frac{\omega_2}{\tau_2}[\tau - \tau_0] + O([\tau - \tau_0]^2). \tag{54} \]
In the original time variable the period is T = 2π/ω. Inserting (54) in the latter expression for T and making a Taylor expansion in τ − τ0 finally gives (18). We omit the proof of the final statement of the theorem, concerning stability of the periodic solutions. The result is a standard property of the Hopf bifurcation (Diekmann, Gils, Lunel, Walther 1995). The expression (17) for τ2 is complicated, and no direct interpretation in economic terms seems to be possible. In particular, it depends on both second and third derivatives of f , so convexity properties alone are not sufficient to determine τ2 . Hence, one must turn to concrete computations as we do now for specific illustrative examples. 8.4
CD technologies and time delays
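We apply Theorem 1 to the equation (7) with a CD production function
\[ F(L, K) = \gamma L^{1-a} K^{a} = L\gamma f(k) \equiv L\gamma k^{a}; \quad 0 < a < 1, \tag{55} \]
\[ f(q) = q^{a} = (e^{-n\tau}k)^{a}; \quad f'(q) = aq^{a-1}, \quad f''(q) = a(a-1)q^{a-2}, \quad f'''(q) = a(a-1)(a-2)q^{a-3}. \tag{56} \]
The equation (13) has a unique steady state (equilibrium), cf. (56), (8), (7),
\[ q^{*} = \left(\frac{\beta}{n+\delta}\right)^{\frac{1}{1-a}} = \left(\frac{s\gamma}{n+\delta}\right)^{\frac{1}{1-a}} e^{\frac{-n\tau}{1-a}} = \kappa\, e^{\frac{-n\tau}{1-a}}; \qquad k^{*} = q^{*}e^{n\tau} = \kappa\, e^{\frac{-an\tau}{1-a}}; \qquad \kappa = q^{*} e^{\frac{n\tau}{1-a}}. \tag{57} \]
From the simple exact CD expressions, (57), we should note the general inequalities:
\[ q^{*} < k^{*} < \kappa. \tag{58} \]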
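Next we find, with (12), (56), (57),
\[ A_0 = -n, \tag{59} \]
\[ A_1 = a(n+\delta) - \delta, \tag{60} \]
\[ A_2 = \beta\left(\frac{\beta}{n+\delta}\right)^{\frac{a-2}{1-a}} a(a-1), \tag{61} \]
\[ A_3 = \beta\left(\frac{\beta}{n+\delta}\right)^{\frac{a-3}{1-a}} a(a-1)(a-2). \tag{62} \]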
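The closed forms (59) to (62) can be cross-checked by evaluating the derivatives (56) at the steady state (57). The following sketch does this numerically; the function name and the parameter values are illustrative assumptions, not taken from the chapter.

```python
def cd_taylor_coefficients(beta, n, delta, a):
    """Steady state q* from (57) and A0..A3 from (12) with f(q) = q**a."""
    q_star = (beta / (n + delta)) ** (1.0 / (1.0 - a))
    A0 = -n
    A1 = beta * a * q_star ** (a - 1.0) - delta
    A2 = beta * a * (a - 1.0) * q_star ** (a - 2.0)
    A3 = beta * a * (a - 1.0) * (a - 2.0) * q_star ** (a - 3.0)
    return q_star, (A0, A1, A2, A3)

# Hypothetical illustrative values
beta, n, delta, a = 0.15, 0.02, 0.1, 0.3
q_star, (A0, A1, A2, A3) = cd_taylor_coefficients(beta, n, delta, a)

# Closed forms (60) and (61) for comparison
A1_closed = a * (n + delta) - delta
A2_closed = beta * (beta / (n + delta)) ** ((a - 2.0) / (1.0 - a)) * a * (a - 1.0)
```

The two routes should agree up to rounding, and `q_star` should satisfy the steady-state equation (13).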
The bifurcation condition (14) becomes
\[ |R| = |A_0/A_1| = \left|\frac{-n}{a(n+\delta) - \delta}\right| < 1 \;\Leftrightarrow\; n/\delta < \frac{1-a}{1+a} < 1. \tag{63} \]
Hence a Hopf bifurcation to periodic solutions can occur only when the depreciation parameter $\delta$ is sufficiently large. In particular, when delay of the depreciation of the capital is omitted, corresponding to $\delta = 0$, the bifurcation condition is not fulfilled. Thus, with $\delta = 0$, the steady state is asymptotically stable for all delays $\tau$. This is in agreement with the results from Zhu and Huang (2006), where it is even shown that the steady state/equilibrium is globally attracting. Returning to situations where (63) is fulfilled, the bifurcation analysis can be applied. We find, according to Theorem 1, the exact critical delay $\tau_0$ as
\[ \tau_0 = \frac{\arccos(-A_0/A_1)}{|A_1|\,[1 - (A_0/A_1)^2]^{1/2}} = \frac{\arccos\left(\frac{n}{a(n+\delta)-\delta}\right)}{|a(n+\delta) - \delta|\left[1 - \left(\frac{n}{a(n+\delta)-\delta}\right)^2\right]^{1/2}}, \tag{64} \]
and the exact cubic expansion parameter $\tau_2$ as
\[ \tau_2 = \frac{1}{8\omega_0^2}\left\{ \frac{2 - A_1\tau_0[2R^2 + 6R - 11] + R[4R(1+R) - 13]}{A_1(1+R)(5-4R)}\, A_2^2 - (A_1\tau_0 - R)A_3 \right\}, \]
which, written out with $R = \frac{-n}{a(n+\delta)-\delta}$, is
\[ \begin{aligned} \tau_2 = \frac{1}{8[(a(n+\delta)-\delta)^2 - n^2]} \Bigg\{ & \frac{2 - (a(n+\delta)-\delta)\tau_0[2R^2 + 6R - 11] + R[4R(1+R) - 13]}{[a(n+\delta)-\delta][1+R][5-4R]} \left[\beta\left(\frac{\beta}{n+\delta}\right)^{\frac{a-2}{1-a}} a(a-1)\right]^2 \\ & - \left[(a(n+\delta)-\delta)\tau_0 - R\right] \beta\left(\frac{\beta}{n+\delta}\right)^{\frac{a-3}{1-a}} a(a-1)(a-2) \Bigg\}. \end{aligned} \tag{65} \]
A general determination of the sign of τ2 (65) as a function of the parameters is a huge task. For specific applications, a numerical computation is needed, but special cases may be analyzed in detail. As an example, we now consider the limit of small a. Using Taylor’s theorem, the critical length of the delay becomes,
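$$ \tau_0 = \frac{\arccos(-A_0/A_1)}{|A_1|\left[1-(A_0/A_1)^2\right]^{1/2}} = \frac{\arccos(-n/\delta)}{|\delta|\left[1-(n/\delta)^2\right]^{1/2}} + O(a). \tag{66} $$

The cubic expansion parameter similarly becomes

$$ \tau_2 = \frac{\left[(1/\beta)(1+n/\delta)\right]^2 (n + \tau_0\delta^2)}{4(1-n/\delta)}\,a + O(a^2). \tag{67} $$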
We note that τ2 → 0 for a → 0. Since n/δ < 1 follows from (63), each factor in the coefficient of a in (67) is positive, so τ2 is positive. Hence the Hopf bifurcation is supercritical.
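As a rough numerical illustration of (59)-(60), the condition (63), the critical delay, and the small-a expansions (66)-(67), the following Python sketch evaluates these quantities for one Cobb-Douglas example. The parameter values (a = 0.05, n = 0.02, δ = 0.1, β = 0.5) and the function names are assumptions chosen only for illustration; they are not taken from the chapter.

```python
import math


def cd_hopf_quantities(a, n, delta):
    """Return (A0, A1, R, omega0, tau0) for the CD case, following (59)-(60)."""
    A0 = -n
    A1 = a * (n + delta) - delta
    R = A0 / A1
    if abs(R) >= 1.0:
        # condition (63) is violated, so no critical delay exists
        raise ValueError("bifurcation condition (63) fails: |A0/A1| >= 1")
    omega0 = math.sqrt(A1**2 - A0**2)   # equals |A1| * sqrt(1 - R^2)
    tau0 = math.acos(-R) / omega0       # exact critical delay
    return A0, A1, R, omega0, tau0


def tau2_small_a(a, n, delta, beta, tau0):
    """Leading-order term of (67); only meaningful for small a."""
    numerator = ((1.0 / beta) * (1.0 + n / delta)) ** 2 * (n + tau0 * delta**2)
    return numerator / (4.0 * (1.0 - n / delta)) * a


# Illustrative (hypothetical) parameter values
a, n, delta, beta = 0.05, 0.02, 0.1, 0.5

A0, A1, R, omega0, tau0 = cd_hopf_quantities(a, n, delta)
condition_63 = n / delta < (1 - a) / (1 + a)

# Leading-order critical delay from (66), for comparison with the exact tau0
tau0_leading = math.acos(-n / delta) / (delta * math.sqrt(1 - (n / delta) ** 2))
tau2 = tau2_small_a(a, n, delta, beta, tau0)

print(f"condition (63) holds: {condition_63}")
print(f"tau0 exact = {tau0:.3f}, leading order = {tau0_leading:.3f}")
print(f"tau2 (small-a approximation) = {tau2:.4f}")
```

Calling `cd_hopf_quantities` with δ = 0 raises the error for any 0 < a < 1, in line with the remark above that no Hopf bifurcation occurs without delayed depreciation.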
8.5 CES technologies and time delays
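Next we study the Hopf bifurcation of (7) with

$$ F(L,K) = \gamma\left[(1-a)L^{(\sigma-1)/\sigma} + aK^{(\sigma-1)/\sigma}\right]^{\sigma/(\sigma-1)}; \quad 0 < a < 1,\ \sigma > 0,\ \sigma \neq 1, \tag{68} $$

$$ f(k) = \left[(1-a) + a k^{(\sigma-1)/\sigma}\right]^{\sigma/(\sigma-1)}. \tag{69} $$

Solving (13) with (69), we get the unique steady state, cf. (57), (58), (7),

$$ q^* = \left\{\frac{1}{1-a}\left[\left(\frac{\beta}{n+\delta}\right)^{\frac{1-\sigma}{\sigma}} - a\right]\right\}^{\frac{\sigma}{1-\sigma}}; \quad q^* < k^* < \kappa \ (= q^* : \tau = 0), \tag{70} $$

which economically exists when the parameters satisfy

$$ \frac{1}{a}\left(\frac{\beta}{n+\delta}\right)^{(1-\sigma)/\sigma} > 1. \tag{71} $$

The bifurcation condition (14) is

$$ |A_0/A_1| = \left|\frac{-n}{a\,\beta^{(\sigma-1)/\sigma}(n+\delta)^{1/\sigma} - \delta}\right| < 1. \tag{72} $$

For the CES production function, a delay in the capital stock depreciation is also needed to obtain a critical (bifurcation) time delay. For δ = 0, the economic condition (71) for existence of a steady state is

$$ \frac{1}{a}\left(\frac{\beta}{n}\right)^{(1-\sigma)/\sigma} > 1. \tag{73} $$

But for δ = 0, the bifurcation condition (72) is

$$ \frac{1}{a}\left(\frac{\beta}{n}\right)^{(1-\sigma)/\sigma} < 1. \tag{74} $$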
Evidently, both cannot be satisfied simultaneously. The general expressions for τ2 and ω2 are formidable, and we omit them here. Again, as an example, we consider the limit of small a. The critical delay from (15) is

$$ \tau_0 = \frac{\arccos(-A_0/A_1)}{\left[A_1^2 - A_0^2\right]^{1/2}} = \frac{\arccos(-n/\delta)}{|\delta|\left[1-(n/\delta)^2\right]^{1/2}} + O(a), \tag{75} $$

so n/δ < 1 is needed for the existence of a critical delay. Next, we get

$$ \tau_2 = \frac{\left[(1/\beta)(1+n/\delta)\right]^{\frac{\sigma+1}{\sigma}} (n + \tau_0\delta^2)\,\delta^{\frac{1-\sigma}{\sigma}}}{\left[8\sigma^2/(\sigma+1)\right](1-n/\delta)}\,a + O(a^2). \tag{76} $$
Here we also see that τ2 → 0 for a → 0. As each of the factors in the coefficient of a is positive, τ2 is positive, and the Hopf bifurcation is supercritical.

8.6 CES and delays with cycles, square waves, and chaos
Next we turn to numerical simulations to demonstrate that a number of typical features of delay differential equations appear in the present model (7) with the CES production function (69). The simulations are performed with a simple Euler method, using a time step Δt = τ/200. Tests with smaller time steps resulted in the same solutions with high accuracy. Two sets of basic economic parameters are used, as shown in Table 1, together with variation of the delay time τ.

Consider a general delay-differential equation

$$ \dot{x} = h(x_\tau, x), \tag{77} $$

and consider x0 ≠ y0 such that

$$ h(x_0, y_0) = h(y_0, x_0) = 0. \tag{78} $$
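The Euler scheme described above can be sketched in Python as follows. The right-hand side used here, h(x_τ, x) = β f(x_τ) − δ x_τ − n x with f from (69), is an assumption: equation (7) is not restated in this section, and this form is only inferred from the coefficients A0 = −n and A1 = β f′(q*) − δ that appear in the bifurcation conditions. The constant initial history, the function names, and the subcritical delay τ = 20 are likewise illustrative choices. The parameter values are those of Case 1 in Table 1.

```python
def f_ces(k, a, sigma):
    """CES production function in intensive form, eq. (69)."""
    rho = (sigma - 1.0) / sigma
    return ((1.0 - a) + a * k**rho) ** (1.0 / rho)


def h(x_tau, x, beta, delta, n, a, sigma):
    """ASSUMED right-hand side of (77): delayed output and depreciation, current dilution."""
    return beta * f_ces(x_tau, a, sigma) - delta * x_tau - n * x


def euler_dde(tau, t_end, q_init, params, steps_per_delay=200):
    """Explicit Euler for q'(t) = h(q(t - tau), q(t)) with constant history q_init.

    The time step is dt = tau / steps_per_delay, so the delayed value is
    always exactly steps_per_delay grid points behind the current one.
    """
    dt = tau / steps_per_delay
    n_steps = int(round(t_end / dt))
    # q[0 .. steps_per_delay] stores the history on [-tau, 0]
    q = [q_init] * (steps_per_delay + 1)
    for _ in range(n_steps):
        q_now = q[-1]
        q_delayed = q[-1 - steps_per_delay]
        q.append(q_now + dt * h(q_delayed, q_now, **params))
    return dt, q[steps_per_delay:]   # the path on [0, t_end]


# Case 1 parameters from Table 1
case1 = dict(beta=0.7, delta=0.12, n=0.07, a=0.6, sigma=0.5)
q_star = 7.710  # steady state reported in Table 1

# Subcritical delay (tau < tau0 = 44.96): the path should settle at q*
dt, path = euler_dde(tau=20.0, t_end=2000.0, q_init=8.0, params=case1)
print(f"dt = {dt}, q(t_end) = {path[-1]:.4f}, q* = {q_star}")
```

Under this assumed right-hand side, the subcritical run returns to the steady state, as the linear theory predicts; choosing `tau` above the critical delay is how the oscillatory regimes discussed in this section would be explored.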
It is well known (e.g., Chow, Hale, and Huang 1992; Ivanov and Sharkovsky 1992) that under certain conditions, the general equation (77) allows a periodic solution with period approximately 2τ, which is of the square-wave type. The periodic solution spends approximately time τ at each of x0, y0, with rapid transitions between these almost constant states. Without going into detail, we demonstrate below the possibility of this feature with the parameters of the CES delay model, cf. Case 1.

                        Case 1                     Case 2
Basic parameters
  σ                     0.5                        0.5
  a                     0.6                        0.3
  n                     0.07                       0.01
Hopf parameters
  β                     0.7                        0.1
  δ                     0.12                       0.02
  q*                    7.710                      4.333
  τ0                    44.96                      154.9
  τ2                    0.5837                     1.700
  ω0                    5.506 × 10^−2              1.412 × 10^−2
  T0                    114.1                      445.0
  ω2                    −5.986 × 10^−4             −1.278 × 10^−4
  T2                    114.1 + 2.125(τ − τ0)      445.0 + 5.729(τ − τ0)

Table 1: Parameter values for numerical simulations, together with computed quantities from Theorem 1: critical point q*, critical delay τ0, cubic expansion coefficient τ2, angular frequency of linearized system ω0, expansion coefficient for angular frequency ω2, and approximate period T2 of limit cycles as a function of the deviation of the delay from the critical value.

Parameter Case 1. Fig. 1(a) shows the periodic solution for a slightly supercritical value of τ. As expected from the Hopf theory, the solution is close to harmonic. As τ is increased, the periodic solution gradually turns into a square wave as shown in Fig. 1(b). Solving (78) yields x0 = 2.57, y0 = 11.4, which agrees with the numerically obtained levels in the square wave. In Table 2, we see a dramatic difference between q* and k*; this steady-state discrepancy measures the excess capital (installed compared to operating) per worker, that is, the long-run accumulation of idle (non-operating) capital stocks due to a time delay of the particular (critical) size τ0. Fig. 1(b) with square waves is extreme (generated by large values of TPF and delay), but it looks a bit like the so-called “ceiling/floor” models, patterns that have been associated with, e.g., the housing industry. Rather large values of the basic parameters for the capital intensity and labor growth in Case 1 may also support alternating time paths of “ceiling/floor” type for this and similar industries.

  β      τ      sγ         q*      k*          κ
  0.7    50     23.2       7.71    255.3       303.5
  0.7    150    25420.9    7.71    279993.5    334483.4

Table 2: Parameter values (steady-state capital-labor ratios) corresponding to the solutions in Fig. 1.

Figure 1: Numerical simulations for Case 1, showing q(t) against t. (a): τ = 50. (b): τ = 150. The dashed line represents the equilibrium q*.

Parameter Case 2. We shall demonstrate transition to chaos. As τ2 is larger than for Case 1, the amplitude of the periodic solution grows more slowly with τ. An almost harmonic periodic solution quite far from the critical value τ0 is shown in Fig. 2(a). However, by increasing τ slightly as in Fig. 2(b), a period-doubling bifurcation occurs, as the solution now consists of repeated pairs of different peaks. This is also demonstrated when the solution is plotted as a parameterized curve (q(t), q(t−τ)), as in the right panels. It is well known, e.g., Guckenheimer and Holmes (1983), that a period doubling is typically the first step in a route to chaos. Indeed, a further increase of τ, Fig. 2(c), shows a solution with a complicated pattern of different peaks. The solution appears chaotic. Liapunov exponents can be computed to substantiate this claim (cf. Guckenheimer and Holmes 1983).

  β      τ      sγ      q*      k*       κ
  0.1    220    0.90    4.33    39.08    42.40
  0.1    240    1.10    4.33    47.73    51.95
  0.1    255    1.28    4.33    55.45    60.52

Table 3: Parameter values (steady-state capital-labor ratios) corresponding to the solutions in Fig. 2.

In Table 3, the difference between q* and k* is less pronounced than in Table 2. Still, the steady-state (long-run) discrepancy is of tenfold size. The importance of, and gain from, eliminating any delays (slack) in capital utilization is again demonstrated, cf. (κ); the delay model provides some evidence for the increased economic-financial attention to “turn-key delivery/contracting” and “just in time” supply-chain management efforts.
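The steady state and the linear Hopf quantities listed in Table 1 can be recomputed from (70), the coefficient A1 appearing in (72), and the critical-delay formula τ0 = arccos(−A0/A1)/ω0 with ω0 = (A1² − A0²)^{1/2}. The Python sketch below does this for both parameter cases. The function name is hypothetical, and the relation T0 = 2π/ω0 is an assumption made here (it is the usual period of a harmonic oscillation with angular frequency ω0); the nonlinear coefficients τ2 and ω2 are not recomputed.

```python
import math


def ces_hopf_quantities(sigma, a, n, beta, delta):
    """Return (q_star, A1, omega0, tau0, T0) for the CES delay model."""
    e = (1.0 - sigma) / sigma
    # Steady state, eq. (70)
    q_star = (((beta / (n + delta)) ** e - a) / (1.0 - a)) ** (1.0 / e)
    A0 = -n
    # A1 = a * beta^((sigma-1)/sigma) * (n+delta)^(1/sigma) - delta, cf. (72)
    A1 = a * beta ** (-e) * (n + delta) ** (1.0 / sigma) - delta
    if abs(A0 / A1) >= 1.0:
        raise ValueError("bifurcation condition (72) fails: no critical delay")
    omega0 = math.sqrt(A1**2 - A0**2)
    tau0 = math.acos(-A0 / A1) / omega0
    T0 = 2.0 * math.pi / omega0   # assumed: period of the linearized oscillation
    return q_star, A1, omega0, tau0, T0


case1 = ces_hopf_quantities(sigma=0.5, a=0.6, n=0.07, beta=0.7, delta=0.12)
case2 = ces_hopf_quantities(sigma=0.5, a=0.3, n=0.01, beta=0.1, delta=0.02)

for label, (q_star, A1, omega0, tau0, T0) in (("Case 1", case1), ("Case 2", case2)):
    print(f"{label}: q* = {q_star:.3f}, omega0 = {omega0:.4e}, "
          f"tau0 = {tau0:.2f}, T0 = {T0:.1f}")
```

The printed values can be compared with the q*, ω0, τ0 and T0 rows of Table 1; small differences in the last digit may arise from rounding.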
Figure 2: Numerical simulations for parameter Case 2. Left panels show time paths, q(t), and the dashed lines represent the steady state (equilibrium), q*. Right panels show the pair (q, qτ) in a phase plane; the marker represents the equilibrium, (q, qτ) = (q*, q*). (a): τ = 220. (b): τ = 240. (c): τ = 255.

Regarding the occurrence of chaotic motion, it is well known that, cf. Lorenz (1989, p. 139):

    While chaotic dynamics in discrete-time systems can already occur in one-dimensional systems like the logistic equation, the equivalent phenomenon in continuous time is restricted to at least three-dimensional systems. Canonically, chaos cannot occur in a two-dimensional system, because a trajectory cannot intersect itself. The cyclical motion in a two-dimensional system is thus restricted to a monotonically damped, or explosive oscillations, and closed orbits. The fact that chaos can occur in three-dimensional continuous time systems, the Lorenz attractor or the Rössler attractor can be illustrated with the help of so-called Poincare sections and maps.

As to dimensional aspects of any continuous time model with delays, it is essentially “infinite dimensional”. Hence we can see a chaotic time path of our single delay differential equation in Fig. 2. When the general and chaotic time paths are exhibited as phase portraits in Fig. 3, the fundamental difference between dynamic systems with delays (b) and systems with no delays (a) is highlighted. Moreover, the portrait (b) observationally integrates the phenomena of economic growth and cycles.
Figure 3: Phase portraits of {L(t), K(t) = L(t)k(t)} from CES without delay (a), cf. Jensen et al. (2005), and with delay (b), Table 1.
8.7 Final comments
The search for various cycles in historical economic data has a long tradition, in Western economies essentially back to the take-off of industrialization with its rapid population and economic growth per capita. Two distinct “causes” (explanations, paradigms) of recurrent industrial fluctuations can be found in the mainstream economic literature. One category of models relies heavily on exogenous forces (stochastic shocks) to start as well as to maintain cycles. Another type of model generates cycles endogenously by its own formal (mathematical) structure. We have only been concerned with the latter.

However, in pure forms, both types of cyclical models have mostly ignored that industrial fluctuations (shorter or longer “waves”) occur against the backdrop of more or less steady economic growth. Despite many devices for “detrending” observed economic series, the dynamics of business cycle theory has been divorced from the dynamic systems of the basic (mainstream) growth models, which methodologically entered late in formal quantitative economic theory; mathematical physics/mechanics also began with equilibrium analysis and progressed to analyzing periodic motions (harmonic oscillators). It seems instructive now to recall a perspective on model building from Poincaré (1952, pp. 181-182):

    Long ago it was said: If Tycho had had instruments ten times as precise, we would never have had a Kepler, or a Newton, or astronomy. It is misfortune for a science to be born too late, when the means of observation have become too perfect. That is what is happening at this moment with respect to physical chemistry; the founders are hampered in their general grasp by third and fourth decimal places; happily they are men of robust faith. As we get to know the properties of matter better, we see that continuity reigns. From the work of Andrews and Van der Waals, we see how the transition from the liquid to the gaseous state is made, and that is not abrupt. Similarly, there is no gap between the liquid and solid states. With this tendency there is no doubt a loss of simplicity. Such and such an effect was represented by straight lines; it is now necessary to connect these lines by more and more complicated curves. On the other hand, unity is gained. Separate categories quieted but did not satisfy the mind.

Hence beginning with aspirations and mathematical tools and data for cyclical “fine tuning” (epi-cycles) can hamper progress in overall economic understanding and successful application of dynamic models for secularly growing economies. Thus, if we look at the numerical size (length) of the critical delays (τ0) for Hopf bifurcations, and the necessary delays (τ) for the limit cycles or “chaos”, it is clear that the
relevant time unit cannot be decades or years, and hardly quarters. But if these delays/lags refer to months or days, their proper interpretation and the economic impacts of such delays become more relevant for useful insights into the dynamics behind such observed data series. Aperiodic oscillations (“chaos”) on a small time scale are not “strange economics”, but their mathematical model is fairly complicated. However, the conclusion from our extension of the standard aggregate growth model is that delay differential equations offer a powerful mathematical instrument for a coherent treatment (integration), as exhibited in Fig. 3, of economic growth and cycle models.

References:

Chow, S.N., Hale, J.K., and Huang, W. (1992) “From Sine Waves to Square Waves in Delay Equations.” Proceedings of the Royal Society of Edinburgh 120A: 223–229.

Diekmann, O., Gils, S.M. van, Verduyn Lunel, S.M., and Walther, H.-O. (1995) Delay Equations. Springer-Verlag, New York.

Gabisch, G., and Lorenz, H.W. (1987) Business Cycle Theory. Springer Verlag.

Goodwin, R.M. (1990) Chaotic Economic Dynamics. Oxford University Press.

Lorenz, H.W. (1989) Business Cycle Theory. Springer Verlag.

Guckenheimer, J., and Holmes, P. (1983) Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer Verlag, New York.

Ivanov, A.F., and Sharkovsky, A.N. (1992) “Oscillations in Singularly Perturbed Delay Equations.” Dynamics Reported (New Series) 1: 164–224.

Jensen, B.S. (1994) The Dynamic Systems of Basic Economic Growth Models. Kluwer Academic Publishers, Dordrecht (trans. Peking University Press, Beijing).

Jensen, B.S., Alsholm, P.K., Larsen, M.E., and Jensen, J.M. (2005) “Dynamic Structure, Exogeneity, Phase Portraits, Growth Paths, and Scale and Substitution Elasticities.” Review of International Economics 13: 59–89.

Morris, H.C. (1976) “A Perturbative Approach to Periodic Solutions of Delay-Differential Equations.” Journal of the Institute of Mathematics and its Applications 18: 15–24.

Poincaré, H. (1952) Science and Hypothesis. Dover Publishers, New York.

Puu, T. (2003) Attractors, Bifurcations, and Chaos. Springer Verlag.

Zhu, H., and Huang, W. (2007) “The Economic Growth Models with Time Delays.” Chapter 7 in this volume.
Part III: Intertemporal Optimization in Consumption, Finance, and Growth
Chapter 9 Optimal Consumption and Investment Strategies in Dynamic Stochastic Economies
Claus Munk Department of Business and Economics, University of Southern Denmark Carsten Sørensen Department of Finance, Copenhagen Business School
9.1 Introduction
Individuals save in order to transfer consumption opportunities over time. They must determine how much to save (and hence how much to consume now) and how the savings should be allocated to different financial assets. Of course the optimal decisions will depend on the preferences of the individual and on the price dynamics of the financial assets. It is extremely important to study how the dynamics of financial investment opportunities – represented by interest rates, expected returns, volatilities, and correlations – affect the optimal consumption and investment decisions of various individuals. Since investors care about the real returns on their investments, the stochastic variations in the prices of consumer goods should also be taken into account. Recently, numerous papers have focused on one or a few sources of asset price uncertainty. In this chapter, we give a unified analysis by deriving the optimal consumption and investment strategy in a complete financial market where prices follow continuous, not necessarily Markovian, stochastic processes. We focus on investors with constant relative risk aversion (CRRA), but extend the results to hyperbolic
absolute risk aversion (HARA) and to power-linear habit formation preferences. Our general result shows how individuals will optimally hedge time-variations in investment opportunities and points out exactly which risks are to be hedged. We discuss how the results of recent studies come out as special cases of our analysis and provide several new examples.

The intertemporal consumption and investment decision of a utility-maximizing investor is a classical problem of financial economics dating back to Samuelson (1969) and Merton (1969), who derive the optimal strategies for investors with time-additive CRRA utility in a market with constant investment opportunities. Basically, all investors should invest in the same mean-variance optimal, speculative portfolio and in the riskless asset. While Merton (1971, 1973b) gave some preliminary results in the presence of stochastic shifts in investment opportunities, several recent papers have extended and concretized the analysis. We will now give a short review of some of these papers and then describe our analysis and results in more detail.

Focusing on interest rate risk, Sørensen (1999) derives the optimal investment strategy when the short-term interest rates and bond prices follow the Vasicek (1977) model and the stock market index has a constant excess rate of return and volatility. Brennan and Xia (2000) allow for a two-factor version of the Vasicek model, but their analysis and results are basically the same. Grasselli (2000) assumes instead that the short rate complies with the Cox, Ingersoll and Ross (1985) model. All three papers find that for CRRA utility of terminal wealth only, the optimal investment strategy is to combine the speculative portfolio and the zero-coupon bond expiring at the investment horizon (or a portfolio replicating this zero-coupon bond).
The higher the risk aversion, the higher the portfolio weight in the bond and the lower the portfolio weight in the speculative portfolio. In particular, the bond to stock ratio increases with the risk aversion, explaining the asset allocation puzzle identified by Canner, Mankiw and Weil (1997). Within the class of Gaussian term structure models of the type introduced by Heath, Jarrow and Morton (1992), Munk and Sørensen (2004) discuss the sensitivity of the optimal investment strategy with respect to the dynamics and the current form of the term structure of interest rates.

Other papers generalize the dynamics of stock prices relative to the standard assumption of a constant excess rate of return and volatility. Both Kim and Omberg (1996) and Wachter (2002) derive closed-form expressions for the optimal portfolio when the excess expected stock market return follows a mean-reverting Gaussian process and interest rates and volatilities are constant. Kim and Omberg consider an investor with CRRA utility of terminal wealth only, which enables them to allow for non-perfect correlation between the stock price and the excess expected return. Wachter assumes a perfect negative correlation in order to be able to explicitly solve the optimization of a time-additive CRRA utility of consumption. The presence of mean reversion in stock prices increases the demand for stocks, especially for long-term investors. Barberis (2000) and Xia (2001) explicitly allow for parameter uncertainty in similar settings.

A few recent papers have investigated the effects of stochastic stock price volatility on portfolio choice. Chacko and Viceira (2005) study the case where the inverse of the stock price variance follows a mean-reverting square-root process and the expected stock return is either constant or affine in the variance. They derive an explicit, but only approximately correct, expression for the optimal strategies of an infinitely lived investor with recursive preferences for consumption. Calibrating their model to U.S. stock returns, they conclude that the intertemporal hedging demand for stocks due to stochastic volatility in their model is lower in size than the hedging demands generated by variations in interest rates or excess returns. Liu and Pan (2003) also consider stochastic stock market volatility and add jump price risk. However, they allow agents also to invest in stock market derivatives and obtain a closed-form solution assuming a mean-reverting square-root process for the stock price variance and a market price of variance risk proportional to the price volatility.

When individuals save by investing in financial assets, they are interested in the real returns these assets offer.
None of the papers referred to above explicitly takes inflation risk into account, so they apply to the real asset price dynamics. However, in most financial markets the traded bonds are nominal in the sense that they promise some fixed monetary payments.[1] Similarly, the available short-term deposits promise a given nominal interest rate. Due to stochastic variations in the prices of consumer goods, such bonds and deposits have risky real returns. So far, only very few papers have explicitly incorporated inflation risk in the asset allocation problem and they all consider rather simple, specialized models. In both Brennan and Xia (2002) and Campbell and Viceira (2001), the real interest rate is described by a Vasicek model and the expected inflation dynamics is given by an Ornstein-Uhlenbeck process. The term structure of nominal interest rates is therefore described by a two-factor model. While Brennan and Xia assume CRRA utility preferences, Campbell and Viceira apply a recursive utility specification in an infinite horizon setting. Munk, Sørensen and Vinther (2004) take a one-factor model for nominal interest rates while the implied term structure of real interest rates is described by a two-factor model, which makes it impossible to replicate real bonds by trading in nominal securities. They incorporate mean reversion in stock prices and derive the optimal investment strategy for investors with CRRA utility of terminal wealth. They demonstrate that stocks may be used as a non-perfect substitute for real bonds for hedging long-term real interest rate risk in cases where the stock is negatively correlated with the real interest rate.

[1] In some countries, e.g. the United States and the United Kingdom, inflation-indexed bonds are traded, but only for a few maturities and often with a modest turnover.

In this chapter, we first study a very general complete financial market of nominal securities with prices following continuous, not necessarily Markovian, stochastic processes. Interest rates, excess expected returns, price volatilities, correlations, and consumer prices may all evolve stochastically over time. Despite the general market setting, we are able to derive an explicit and very precise characterization of the optimal consumption and, more remarkably, the optimal investment strategy of an investor with CRRA utility of consumption and/or terminal wealth. This result pinpoints exactly what risks individuals want to hedge and shows how to finance a desired real consumption process by investing in a market of nominal securities. We discuss how most of the studies listed above come out as special cases of our analysis. We extend our general result to the case of HARA utility and power-linear habit utility. We also discuss how to include labor income.
Furthermore, we derive conditions under which an undiversifiable shock to the consumer price index can be allowed without changing the structure of the optimal strategies. Such a model feature is also present in Brennan and Xia (2002) in a model with very simple dynamics, but here we establish precisely when such an incompleteness will not ruin the form of the optimal policies.

As a special case of our general analysis, we focus on the case where real interest rates are Gaussian and real market prices of risk are deterministic. Under these assumptions, the optimal investment strategy of a CRRA investor combines the speculative portfolio and a single real bond hedging changes in real investment opportunities, even though these changes may be generated by a multi-dimensional Brownian motion. With utility from terminal wealth only, the hedge bond is the real zero-coupon bond maturing at the horizon of the investor. With utility from intermediate consumption, the hedge bond has a continuous coupon proportional to the expected future real consumption rate under the forward martingale measure. This result links the optimal hedge strategy and the optimal real consumption strategy closely together. While this result is to some extent known from Munk and Sørensen (2004), in this chapter we illustrate and extend the result by an example featuring specialized inflation uncertainty and where nominal interest rates follow a possibly non-Markovian, multi-factor Gaussian Heath-Jarrow-Morton model. In the example, we demonstrate how the optimal hedge strategy can as well be implemented by nominal bonds that match the expected nominal consumption under the forward martingale measure that is relevant with respect to nominal valuation.

In a second example, we use an analogy to Markovian term structure models to solve in closed form the investment strategy of a CRRA investor who can invest in stock and a stock derivative in a complete market setting. The model of the equity market is adopted from Heston (1993) and features stochastic volatility and excess return on the stock. Besides providing a closed-form solution for the optimal investment strategy, we also present numerical results based on realistic model parameters and focus on how changes in the investment opportunity set can be optimally hedged by investing relatively small amounts in a straddle position written on the stock.

The rest of the chapter is organized as follows. In Section 9.2, we set up a general complete financial market and briefly review the martingale approach for general time-separable utility functions.
In Section 9.3, we focus on CRRA utility and derive a general condition that the portfolio optimally hedging changes in the opportunity set must satisfy. We show how to obtain specific results by: (1) using an analogy to Markovian term structure models, and (2) specializing to Gaussian, not necessarily Markovian, real interest rates. Section 9.4 provides the two specific examples, including the example with non-Markovian Heath-Jarrow-Morton term structure dynamics. Section 9.5 contains the extensions to HARA or habit utility, to labor income, and to undiversifiable inflation risk. Finally, Section 9.6 concludes the chapter.
9.2 Consumption and investment in complete markets

In this section, we describe the general economic model and state the utility maximization problem of an individual investor. We allow for non-Markovian dynamics of prices, but restrict ourselves to a complete market setting.

9.2.1 Information structure
Let $z$ be a d-dimensional standard Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbf{F}, \mathbb{P})$, where $\mathbf{F} = \{\mathcal{F}_t \mid t \in \mathcal{T}\}$ is a right-continuous filtration. Here we take $\mathcal{T} = [0, T]$ for some $T > 0$ representing the time horizon of the investor considered. $\mathcal{F}_t$ is the augmentation of the σ-algebra generated by $\{z_u \mid 0 \le u \le t\}$. It is assumed that $\mathcal{F}_0$ is the σ-algebra generated by the zero sets of $\mathbb{P}$ and that $\mathcal{F} = \mathcal{F}_T$. Below, all statements involving stochastic variables are assumed to hold almost surely with respect to $\mathbb{P}$, and all stochastic processes are assumed to be adapted to $\mathbf{F}$.

9.2.2 Consumption good and inflation
We assume that the economy has a single consumption good [2] with a unit price $\Pi_t$ that follows a stochastic process with dynamics

$$ d\Pi_t = \Pi_t\left[\pi_t\,dt + \sigma_{\Pi t}^\top dz_t\right]. \tag{1} $$

Since $d\Pi_t/\Pi_t$ is the realized inflation rate over the next instant, $\pi_t$ is the expected inflation rate and $\sigma_{\Pi t}$ is the volatility of the inflation rate.

[2] For consumption/investment problems with multiple consumption goods see, e.g., Breeden (1979), Cuoco and Liu (2000), and Damgaard, Fuglsbjerg and Munk (2003).

9.2.3 Financial assets

The agents in the economy have access to continuous trading in (at least) d + 1 financial assets without transaction costs. One asset is an instantaneously nominally riskless asset called the savings account with nominal price $A_t$ satisfying

$$ dA_t = R_t A_t\,dt, \tag{2} $$

where $R$ is the continuously compounded short-term nominal interest rate process. Note that this asset is not riskless in real terms since the real price $A_t/\Pi_t$ has a diffusion term. The other d assets are nominally risky with nominal prices given by the vector $P_t = (P_{1t}, \dots, P_{dt})^\top$ satisfying

$$ dP_t = \mathrm{diag}(P_t)\left[(R_t\mathbf{1} + \sigma_{Pt}\Lambda_t)\,dt + \sigma_{Pt}\,dz_t\right]. \tag{3} $$

Here, $\Lambda$ is an $\mathbb{R}^d$-valued $L^2[0,T]$ stochastic process of nominal market prices of risk, and $\sigma_P$ is an $\mathbb{R}^{d\times d}$-valued stochastic process determining volatilities and correlations of the financial assets.[3] The processes $R$, $\Lambda$, and $\sigma_P$ are assumed to be progressively measurable with respect to $\mathbf{F}$ and such that the equations (2) and (3) are well-defined.[4] Note that we allow for non-Markovian dynamics of the investment opportunity set. The volatility process $\sigma_{Pt}$ is assumed to satisfy the non-degeneracy assumption

$$ \exists\,\varepsilon > 0\ \ \forall (x,t) \in \mathbb{R}^d \times \mathcal{T}: \quad x^\top \sigma_{Pt}\sigma_{Pt}^\top x \ge \varepsilon\|x\|^2. \tag{4} $$

As a consequence of condition (4), $\sigma_P$ has full rank d, implying the dynamic completeness of the market.[5] The unique nominal pricing kernel in this economy is the process $M = (M_t)$ defined by

$$ M_t = \exp\left\{-\int_0^t R_s\,ds - \int_0^t \Lambda_s^\top dz_s - \frac{1}{2}\int_0^t \|\Lambda_s\|^2 ds\right\}, \tag{5} $$

with dynamics

$$ dM_t = -M_t\left[R_t\,dt + \Lambda_t^\top dz_t\right]. $$

The unique real pricing kernel is $m = (m_t)$, where $m_t = M_t\Pi_t/\Pi_0$, so that

$$ dm_t = -m_t\left[(R_t - \pi_t + \Lambda_t^\top\sigma_{\Pi t})\,dt + (\Lambda_t - \sigma_{\Pi t})^\top dz_t\right]. $$

The implicit, real, and short-term interest rate is therefore [6]

$$ r_t = R_t - \pi_t + \Lambda_t^\top\sigma_{\Pi t} $$

and the real market price of risk vector is $\lambda_t = \Lambda_t - \sigma_{\Pi t}$. We can then write the real pricing kernel as

$$ m_t = \exp\left\{-\int_0^t r_s\,ds - \int_0^t \lambda_s^\top dz_s - \frac{1}{2}\int_0^t \|\lambda_s\|^2 ds\right\}. \tag{6} $$

See, e.g., Cochrane (2001) or Duffie (2001) for more on pricing kernels.

[3] $L^2[0,T]$ is the set of adapted stochastic processes $x$ such that $\int_0^T \|x_t\|^2 dt < \infty$ almost surely. Similarly, $L^1[0,T]$ is the set of adapted processes $x$ with $\int_0^T \|x_t\|\,dt < \infty$ almost surely.

[4] In addition to the $L^2[0,T]$ assumption on $\Lambda$, it suffices that $R \in L^1[0,T]$ and $\sigma_P \in L^2[0,T]$.

[5] The market is potentially complete, but by restricting the set of admissible portfolio processes, various types of incompleteness can be modeled. This complicates the solution of the utility maximization problem considerably, cf. Cvitanic and Karatzas (1992) and Cuoco (1997).

[6] The real rate can also be computed as the rate of return of the portfolio $x_t = (\sigma_{Pt}^\top)^{-1}\sigma_{\Pi t}$, which is riskless in real terms.

9.2.4 The individual’s choice problem
We consider an investor seeking to maximize his expected remaining life-time utility by choosing a trading and consumption strategy appropriately. We assume that the investor does not receive income from non-traded assets so that zero is a natural lower bound on the wealth process. (The effects of income will be studied in Section 9.5.3.) We can represent the trading strategy of the investor by an Rd -valued progressively measurable stochastic process x = (x1 , . . . , xd ) with xit denoting the fraction of wealth invested in the i’th risky asset at time t. The fraction of wealth invested in the savings account is residually determined as x0t = 1 − x t 1. A real consumption strategy is a progressively measurable process c = (ct ) with the corresponding nominal consumption given by Ct = ct Πt . Given a trading strategy x and a nominal consumption strategy C, the nominal wealth Wt = WtC,x of the investor evolves according to C,x xt σP t dzt . (7) dWtC,x = Rt WtC,x + WtC,x x t σP t Λt − Ct dt + Wt We denote by C the set of all consumption processes C ∈ L1 [0, T ] and by L1+ the set of FT -measurable random variables W with finite expectations. A consumption/terminal wealth pair (C, W ) ∈ C × L1+ is called admissible with initial wealth W0 if a trading strategy x ∈ L2 [0, T ] exists such that W0C,x = W0 and WTC,x = W . In that case, the trading strategy x is said to finance (C, W ). The expected life-time utility of the agent is assumed to be of the time-additive form T U1 (Ct /Πt , t) dt + U2 (WT /ΠT ) , E 0
where $U_1(\cdot,t)$ and $U_2(\cdot)$ are strictly increasing and concave $C^1(0,\infty)$ functions with $U_1'(\infty,t) \equiv \lim_{c\uparrow\infty}U_1'(c,t) = 0$ and $U_1'(0,t) \equiv \lim_{c\downarrow 0}U_1'(c,t) = \infty$, where the primes denote partial derivatives with respect to the first argument. Similarly for $U_2(\cdot)$. It follows from the martingale approach initiated by Pliska (1986) and formalized by Karatzas, Lehoczky and Shreve (1987) and Cox and Huang (1989, 1991) that we can find the optimal consumption $C^*$ and terminal wealth level $W^*$ by solving the static problem
$$\sup_{(C,W)\in\mathcal{C}\times L^1_+} E\left[\int_0^T U_1(C_s/\Pi_s, s)\,ds + U_2(W/\Pi_T)\right], \tag{8}$$
$$\text{s.t.}\quad E\left[\int_0^T M_t C_t\,dt + M_T W\right] \le W_0. \tag{9}$$
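Subsequently, a portfolio $x^*$ financing $(C^*,W^*)$ must be found. Due to our assumption that the inflation risk is spanned, we can alternatively let the agent directly choose a real consumption process $c = (c_t)$ and a real terminal wealth level $w$. Given a real consumption process $c$ and a portfolio process $x$, the real wealth of the investor, $w_t^{c,x} = W_t^{c\Pi,x}/\Pi_t$, evolves as
$$\begin{aligned}
dw_t^{c,x} &= \left(w_t^{c,x}\left[R_t - \pi_t + \|\sigma_{\Pi t}\|^2 + x_t^\top\sigma_{Pt}(\Lambda_t - \sigma_{\Pi t})\right] - c_t\right)dt + w_t^{c,x}\left(x_t^\top\sigma_{Pt} - \sigma_{\Pi t}^\top\right)dz_t \\
&= \left(w_t^{c,x}\left[r_t + \left(x_t^\top\sigma_{Pt} - \sigma_{\Pi t}^\top\right)\lambda_t\right] - c_t\right)dt + w_t^{c,x}\left(x_t^\top\sigma_{Pt} - \sigma_{\Pi t}^\top\right)dz_t.
\end{aligned} \tag{10}$$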
In terms of real consumption and wealth, the utility maximization problem is formulated as follows:
$$\sup_{(c,w)\in\mathcal{C}\times L^1_+} E\left[\int_0^T U_1(c_s, s)\,ds + U_2(w)\right], \tag{11}$$
$$\text{s.t.}\quad E\left[\int_0^T m_t c_t\,dt + m_T w\right] \le w_0. \tag{12}$$
Given a solution $(c^*,w^*)$ we have to find a portfolio $x^*$ financing $(c^*,w^*)$ in the sense that $w_T^{c^*,x^*} = w^*$.

9.2.5 General solution technique

By Lagrangian theory, the optimal solution to the static problem will satisfy
$$U_1'(c_t,t) = \psi m_t, \qquad U_2'(w) = \psi m_T,$$
where $\psi$ is such that the inequality constraint holds as an equality. Let $I_1(\cdot,t)$ denote the inverse of the marginal utility function $U_1'(\cdot,t)$ and $I_2(\cdot)$ the inverse of $U_2'(\cdot)$. Define
$$H(\psi) = E\left[\int_0^T m_t I_1(\psi m_t, t)\,dt + m_T I_2(\psi m_T)\right].$$
By concavity of the utility functions, $H(\cdot)$ is a decreasing function. Assume that $H(\psi)$ is finite for all $\psi$. Then $H(\cdot)$ has an inverse denoted by $\mathcal{Y}(\cdot)$ so that $\psi = \mathcal{Y}(w_0)$, and the optimal solution to (11)–(12) can be written as
$$c_t^* = I_1(\mathcal{Y}(w_0)m_t, t), \qquad t\in[0,T], \tag{13}$$
$$w^* = I_2(\mathcal{Y}(w_0)m_T). \tag{14}$$
The real wealth process under the optimal policy is given by
$$w_t^* = \frac{1}{m_t}E_t\left[\int_t^T m_s c_s^*\,ds + m_T w^*\right]. \tag{15}$$
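The solution technique can be illustrated on a toy problem. The sketch below is only an illustration under strong assumptions that are not part of the chapter: a finite state space, utility from terminal wealth only, the CRRA form $U_2(w) = w^{1-\gamma}/(1-\gamma)$, and made-up probabilities and pricing kernel values. All function names are hypothetical. It finds the multiplier $\psi = \mathcal{Y}(w_0)$ by bisection, using only the fact that $H$ is decreasing, and then reads off the optimal terminal wealth from (14).

```python
import math

def inverse_marginal_utility(y, gamma):
    """I2(y) for the illustrative CRRA utility U2(w) = w**(1-gamma)/(1-gamma)."""
    return y ** (-1.0 / gamma)

def budget_cost(psi, probs, kernel, gamma):
    """H(psi) = E[m_T * I2(psi * m_T)] on a finite state space."""
    return sum(p * m * inverse_marginal_utility(psi * m, gamma)
               for p, m in zip(probs, kernel))

def solve_multiplier(w0, probs, kernel, gamma, lo=1e-12, hi=1e12, iters=200):
    """Find psi with H(psi) = w0 by bisection on a log scale (H is decreasing)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if budget_cost(mid, probs, kernel, gamma) > w0:
            lo = mid              # plan costs more than w0: psi must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical three-state example; the numbers are for illustration only.
probs = [0.3, 0.4, 0.3]
kernel = [1.20, 0.95, 0.70]      # values of m_T in each state
gamma, w0 = 3.0, 100.0

psi = solve_multiplier(w0, probs, kernel, gamma)
optimal_wealth = [inverse_marginal_utility(psi * m, gamma) for m in kernel]
cost = sum(p * m * w for p, m, w in zip(probs, kernel, optimal_wealth))
print(psi, optimal_wealth, cost)
```

In this sketch the budget constraint binds at the computed $\psi$, and the investor holds more terminal wealth in states where the pricing kernel is low, i.e. where consumption is cheap.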
The indirect utility is the future expected utility generated by the optimal policies, i.e.
$$V_0 = E\left[\int_0^T U_1(c_t^*,t)\,dt + U_2(w^*)\right] = E\left[\int_0^T U_1\big(I_1(\mathcal{Y}(w_0)m_t,t),t\big)\,dt + U_2\big(I_2(\mathcal{Y}(w_0)m_T)\big)\right].$$
A drawback of the martingale approach is that the optimal investment policy is only given implicitly by the martingale representation theorem. Therefore, it is generally not clear how to implement $(c^*,w^*)$ by a trading strategy $x^*$ such that
$$w_t^{c^*,x^*} = w_t^*, \qquad t\in[0,T].$$
In the case of logarithmic utility, it can be shown (see, e.g., Karatzas and Shreve (1988, Example 3.6.6)) that the optimal investment strategy is to invest the fractions
$$x_t^{\log} = (\sigma_{Pt}^\top)^{-1}\Lambda_t = (\sigma_{Pt}^\top)^{-1}(\lambda_t + \sigma_{\Pi t}) \tag{16}$$
of wealth in the $d$ risky assets and the remaining fraction $1 - (x_t^{\log})^\top\mathbf{1}$ in the nominal savings account. For general utility functions, the optimal investment strategy can be represented rather abstractly in terms of stochastic integrals of Malliavin derivatives by the Clark-Ocone formula, cf. Ocone and Karatzas (1991), but to derive an explicit expression for the optimal portfolio for non-logarithmic utility functions, it is generally recognized that the price dynamics must be specialized. Cox and Huang (1989) show that when the state-price density and the risky asset prices constitute a Markovian system, the optimal investment strategy is given in terms of the solution of a linear second order partial differential equation. More explicit results are given in the case of a deterministically changing investment opportunity set, cf., e.g., Cvitanic and Karatzas (1992). In the following sections, we provide instead a closed-form expression for the optimal investment strategy in a very general, possibly non-Markovian, market setting.

9.3 Results for CRRA utility in general markets
We will characterize the optimal investment strategy of a CRRA utility investor in the general market setting outlined above. Therefore, we define
$$U_1(c,t) = \varepsilon_1 e^{-\beta t}\frac{c^{1-\gamma}}{1-\gamma}, \qquad U_2(w) = \varepsilon_2 e^{-\beta T}\frac{w^{1-\gamma}}{1-\gamma}, \qquad \gamma > 0, \tag{17}$$
which for $\gamma = 1$ is interpreted as the limiting case of logarithmic utility. The parameter $\beta$ is the investor's subjective time preference rate. The non-negative constants $\varepsilon_1$ and $\varepsilon_2$ allow for different weightings of intermediate and terminal consumption, including the case with utility from intermediate consumption only ($\varepsilon_2 = 0$) and the case with utility from terminal wealth only ($\varepsilon_1 = 0$). We will state the optimal consumption and investment strategies in terms of the stochastic process $Q = (Q_t)$, defined by
$$Q_t = \varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}q_t^s\,ds + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T, \tag{18}$$
where
$$q_t^s = E_t\left[\left(\frac{m_s}{m_t}\right)^{1-\frac{1}{\gamma}}\right]. \tag{19}$$
If we write the dynamics of $q_t^s$ in the form
$$dq_t^s = q_t^s\left[\mu_{qt}^s\,dt + (\sigma_{qt}^s)^\top dz_t\right], \tag{20}$$
it follows from a Leibniz-type rule for stochastic processes proved in the Appendix that
$$\sigma_{Qt} = \frac{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}q_t^s\sigma_{qt}^s\,ds + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T\sigma_{qt}^T}{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}q_t^s\,ds + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T}. \tag{21}$$
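Theorem 1. The optimal real consumption strategy of a CRRA investor is
$$c_t^* = \varepsilon_1^{1/\gamma}\frac{w_t^*}{Q_t}. \tag{22}$$
The optimal investment strategy is given by
$$x_t^* = (\sigma_{Pt}^\top)^{-1}\left(\frac{1}{\gamma}\Lambda_t + \left(1-\frac{1}{\gamma}\right)\sigma_{\Pi t} + \sigma_{Qt}\right). \tag{23}$$
The indirect utility of the investor is
$$V_t = \frac{1}{1-\gamma}Q_t^{\gamma}(w_t^*)^{1-\gamma}. \tag{24}$$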
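In these expressions, $w^*$ denotes the real wealth process generated by the optimal strategies.

Proof: With CRRA utility we have
$$I_1(y,t) = \varepsilon_1^{1/\gamma}e^{-\frac{\beta}{\gamma}t}y^{-\frac{1}{\gamma}}, \qquad I_2(y) = \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}T}y^{-\frac{1}{\gamma}},$$
so that we get
$$H(\psi) = \psi^{-\frac{1}{\gamma}}E\left[\varepsilon_1^{1/\gamma}\int_0^T e^{-\frac{\beta}{\gamma}t}m_t^{1-\frac{1}{\gamma}}\,dt + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}T}m_T^{1-\frac{1}{\gamma}}\right] = \psi^{-\frac{1}{\gamma}}Q_0,$$
and consequently $\mathcal{Y}(w_0) = Q_0^{\gamma}w_0^{-\gamma}$. The optimal consumption rate and terminal wealth are thus
$$c_t^* = \varepsilon_1^{1/\gamma}e^{-\frac{\beta}{\gamma}t}m_t^{-\frac{1}{\gamma}}\frac{w_0}{Q_0}, \tag{25}$$
$$w^* = \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}T}m_T^{-\frac{1}{\gamma}}\frac{w_0}{Q_0}. \tag{26}$$
Substituting into (15) we obtain the optimal real wealth level
$$w_t^* = \frac{w_0}{Q_0}e^{-\frac{\beta}{\gamma}t}Q_t m_t^{-\frac{1}{\gamma}}.$$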
Combining this with (25), we obtain the expression for the optimal consumption process given in the theorem. By Itô's Lemma, the dynamics of optimal real wealth is
$$dw_t^* = \dots\,dt + w_t^*\left(-\frac{1}{\gamma}\frac{dm_t}{m_t} + \frac{dQ_t}{Q_t}\right) = \dots\,dt + w_t^*\left(\frac{1}{\gamma}\lambda_t + \sigma_{Qt}\right)^\top dz_t,$$
where we leave the drift term unspecified. Aligning this with the wealth dynamics for a general portfolio $x$ in (10) and using the relation $\lambda_t = \Lambda_t - \sigma_{\Pi t}$, we derive the expression for the optimal portfolio stated in the theorem. The computation of the indirect utility applies the fact that
$$w_s^* = w_t^*\,e^{-\frac{\beta}{\gamma}(s-t)}\left(\frac{m_s}{m_t}\right)^{-\frac{1}{\gamma}}\frac{Q_s}{Q_t}$$
for all $s$ and $t$ in $[0,T]$. Consequently, we can write consumption in $[t,T]$ and terminal wealth in terms of time $t$ values:
$$c_s^* = \varepsilon_1^{1/\gamma}e^{-\frac{\beta}{\gamma}(s-t)}\left(\frac{m_s}{m_t}\right)^{-\frac{1}{\gamma}}\frac{w_t^*}{Q_t}, \qquad w_T^* = \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}\left(\frac{m_T}{m_t}\right)^{-\frac{1}{\gamma}}\frac{w_t^*}{Q_t}. \tag{27}$$
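The value function becomes
$$\begin{aligned}
V_t &= E_t\Big[\int_t^T e^{-\beta(s-t)}\varepsilon_1\frac{1}{1-\gamma}(c_s^*)^{1-\gamma}\,ds + e^{-\beta(T-t)}\varepsilon_2\frac{1}{1-\gamma}(w_T^*)^{1-\gamma}\Big] \\
&= \frac{1}{1-\gamma}E_t\Big[\int_t^T e^{-\beta(s-t)}\varepsilon_1\,\varepsilon_1^{\frac{1-\gamma}{\gamma}}e^{-\frac{\beta(1-\gamma)}{\gamma}(s-t)}\Big(\frac{m_s}{m_t}\Big)^{-\frac{1-\gamma}{\gamma}}\Big(\frac{w_t^*}{Q_t}\Big)^{1-\gamma}ds \\
&\qquad\qquad + e^{-\beta(T-t)}\varepsilon_2\,\varepsilon_2^{\frac{1-\gamma}{\gamma}}e^{-\frac{\beta(1-\gamma)}{\gamma}(T-t)}\Big(\frac{m_T}{m_t}\Big)^{-\frac{1-\gamma}{\gamma}}\Big(\frac{w_t^*}{Q_t}\Big)^{1-\gamma}\Big] \\
&= \frac{1}{1-\gamma}\Big(\frac{w_t^*}{Q_t}\Big)^{1-\gamma}Q_t = \frac{1}{1-\gamma}Q_t^{\gamma}(w_t^*)^{1-\gamma}
\end{aligned}$$
as claimed. □

We see that it is optimal to consume a time- and state-dependent fraction of wealth. With constant real investment opportunities, i.e. a constant real short rate $r_t$ and a constant real market price of risk $\lambda_t$, we have
$$Q_t = \frac{1}{\xi}\left(\varepsilon_1^{1/\gamma} + \left[\xi\varepsilon_2^{1/\gamma} - \varepsilon_1^{1/\gamma}\right]e^{-\xi(T-t)}\right),$$
where
$$\xi = \frac{\beta}{\gamma} + \left(1-\frac{1}{\gamma}\right)\left(r + \frac{1}{2\gamma}\lambda^\top\lambda\right).$$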
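The closed form for $Q_t$ under constant real investment opportunities can be cross-checked against the definition (18). The following is a minimal sketch, not part of the original text: it assumes that with constant $r$ and $\lambda$ the kernel ratio is log-normal, so that $q_t^s = \exp\{-(1-\frac{1}{\gamma})(r + \frac{1}{2\gamma}\lambda^\top\lambda)(s-t)\}$, and it uses hypothetical parameter values and function names. It also assumes $\xi \neq 0$.

```python
import math

def xi(beta, gamma, r, lam_sq):
    """xi = beta/gamma + (1 - 1/gamma) * (r + lam'lam / (2*gamma))."""
    return beta / gamma + (1.0 - 1.0 / gamma) * (r + lam_sq / (2.0 * gamma))

def q_closed_form(tau, beta, gamma, r, lam_sq, eps1, eps2):
    """Closed-form Q_t for time to horizon tau = T - t (requires xi != 0)."""
    x = xi(beta, gamma, r, lam_sq)
    e1, e2 = eps1 ** (1.0 / gamma), eps2 ** (1.0 / gamma)
    return (e1 + (x * e2 - e1) * math.exp(-x * tau)) / x

def q_by_quadrature(tau, beta, gamma, r, lam_sq, eps1, eps2, n=20000):
    """Definition (18) evaluated with the trapezoidal rule on n subintervals."""
    decay = (1.0 - 1.0 / gamma) * (r + lam_sq / (2.0 * gamma))  # rate in q_t^s
    e1, e2 = eps1 ** (1.0 / gamma), eps2 ** (1.0 / gamma)

    def integrand(u):
        # exp(-beta/gamma * u) * q_t^{t+u} for constant opportunities
        return math.exp(-(beta / gamma + decay) * u)

    h = tau / n
    inner = sum(integrand(i * h) for i in range(1, n))
    integral = h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(tau))
    return e1 * integral + e2 * integrand(tau)

# Hypothetical parameter values, chosen only for illustration.
params = dict(beta=0.03, gamma=4.0, r=0.02, lam_sq=0.25, eps1=1.0, eps2=1.0)
closed = q_closed_form(20.0, **params)
numeric = q_by_quadrature(20.0, **params)
print(closed, numeric)
```

Setting `gamma=1.0` in this sketch gives `xi == beta`, which corresponds to the logarithmic utility benchmark.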
Since $\sigma_{Qt}$ is then zero, we are back at the Merton (1969) solution adapted to our setting, which explicitly takes inflation risk into account. Since the difference between the optimal portfolio with and without stochastic investment opportunities is given by the term $(\sigma_{Pt}^\top)^{-1}\sigma_{Qt}$, we may interpret this as a hedge against shifts in real investment opportunities. Note that the only shifts that the investor wants to hedge are shifts that change expectations of a power of the future real pricing kernel. In particular, only changes in the real short rate and the real market price of risk are of concern to the investor in this complete market setting. We can also see that the hedge term is determined by the volatility of the wealth/consumption ratio. In a later section, we will strengthen the link between the hedge portfolio and the optimal consumption strategy in a simplified framework.

We will now rewrite the optimal investment strategy to better understand its structure and to simplify the comparison with the literature that does not explicitly take inflation risk into account. So far we have allowed the individual to invest in $d$ nominally risky securities and one nominally instantaneously riskless security, the nominal savings account. Due to market completeness, we can combine these $d+1$ securities so that we obtain a security which is instantaneously riskless in real terms. This is achieved by the portfolio with weights $\tilde{x}_t = (\sigma_{Pt}^\top)^{-1}\sigma_{\Pi t}$ in the risky assets and the weight $1 - \tilde{x}_t^\top\mathbf{1}$ in the nominal savings account. While the dynamics of the real value $\tilde{p}_{0t}$ of this real savings account is of course $d\tilde{p}_{0t} = \tilde{p}_{0t}r_t\,dt$, the dynamics of the nominal value $\tilde{P}_{0t} = \tilde{p}_{0t}\Pi_t$ is given by
$$d\tilde{P}_{0t} = \tilde{P}_{0t}\left[(r_t + \pi_t)\,dt + \sigma_{\Pi t}^\top dz_t\right].$$
Suppose we allow the individual to invest in the real savings account instead of the nominal savings account and also in the same $d$ risky assets as before.
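A portfolio $\bar{x}$ of the $d$ risky assets is then accompanied by a position of $1 - \bar{x}^\top\mathbf{1}$ in the real savings account. For a given portfolio $\bar{x}$ and a consumption process $C$, the nominal wealth $W_t^{C,\bar{x}}$ will evolve as
$$dW_t^{C,\bar{x}} = \left(W_t^{C,\bar{x}}\left[r_t + \pi_t + \bar{x}_t^\top\left(\sigma_{Pt} - \mathbf{1}\sigma_{\Pi t}^\top\right)\Lambda_t\right] - C_t\right)dt + W_t^{C,\bar{x}}\left[\sigma_{\Pi t}^\top + \bar{x}_t^\top\left(\sigma_{Pt} - \mathbf{1}\sigma_{\Pi t}^\top\right)\right]dz_t.$$
Comparing this with (7), we can conclude that a portfolio $x_t$ in the "old" set of assets (including the nominal savings account) is equivalent to the portfolio
$$\bar{x}_t = \left(\sigma_{Pt}^\top - \sigma_{\Pi t}\mathbf{1}^\top\right)^{-1}\left(\sigma_{Pt}^\top x_t - \sigma_{\Pi t}\right)$$
in the "new" set of assets (including the real savings account). In particular, the optimal portfolio in (23) corresponds to
$$\bar{x}_t^* = \left(\left(\sigma_{Pt} - \mathbf{1}\sigma_{\Pi t}^\top\right)^\top\right)^{-1}\left(\frac{1}{\gamma}\lambda_t + \sigma_{Qt}\right). \tag{28}$$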
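The change of asset menu can be checked numerically in the simplest case. The sketch below is an illustration only: it assumes a single Brownian motion and a single risky asset ($d = 1$), so that all volatility "matrices" are scalars, and it uses hypothetical parameter values and function names. It evaluates (23), maps the weight to the menu with the real savings account, and confirms that the wealth volatility is unchanged and that the mapped weight agrees with (28).

```python
def optimal_old_portfolio(gamma, Lam, sig_P, sig_Pi, sig_Q=0.0):
    """Equation (23) for d = 1: risky-asset weight, nominal account as residual."""
    return ((1.0 / gamma) * Lam + (1.0 - 1.0 / gamma) * sig_Pi + sig_Q) / sig_P

def to_new_portfolio(x, sig_P, sig_Pi):
    """Map a weight x (nominal account residual) to x_bar (real account residual)."""
    return (sig_P * x - sig_Pi) / (sig_P - sig_Pi)

def wealth_vol_old(x, sig_P):
    """Nominal wealth volatility with the nominal savings account (zero volatility)."""
    return x * sig_P

def wealth_vol_new(x_bar, sig_P, sig_Pi):
    """Nominal wealth volatility with the real savings account (volatility sig_Pi)."""
    return sig_Pi + x_bar * (sig_P - sig_Pi)

# Hypothetical scalar inputs, for illustration only.
gamma, Lam, sig_P, sig_Pi, sig_Q = 5.0, 0.4, 0.2, 0.02, 0.01

x = optimal_old_portfolio(gamma, Lam, sig_P, sig_Pi, sig_Q)
x_bar = to_new_portfolio(x, sig_P, sig_Pi)
x_bar_28 = ((Lam - sig_Pi) / gamma + sig_Q) / (sig_P - sig_Pi)  # equation (28)
print(x, x_bar, x_bar_28)
```

In the same scalar setting, `optimal_old_portfolio(1.0, Lam, sig_P, sig_Pi)` returns `Lam / sig_P`, the logarithmic utility portfolio in (16).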
Note that $\sigma_{Pt} - \mathbf{1}\sigma_{\Pi t}^\top$ is the volatility matrix of the real asset prices $P_t/\Pi_t$. A similar expression for the optimal portfolio was given by Munk and Sørensen (1999) in a real economy. The result in the theorem shows how to implement the strategy in a nominal economy. Next, let us look at some benchmark risk aversion parameters. For a log utility investor ($\gamma = 1$), we get
$$Q_t = \frac{1}{\beta}\left(\varepsilon_1 + [\beta\varepsilon_2 - \varepsilon_1]e^{-\beta(T-t)}\right)$$
and $\sigma_{Qt} = 0$, so that the well-known optimal strategies for log investors are obtained. In particular, log investors do not hedge changes in investment opportunities. For infinitely risk averse investors, interpreted as the limit $\gamma\to\infty$, we have (assuming $\varepsilon_1,\varepsilon_2 > 0$)
$$Q_t = \int_t^T E_t\left[\frac{m_s}{m_t}\right]ds + E_t\left[\frac{m_T}{m_t}\right],$$
which is the time $t$ real price of a real bond with a continuous coupon of one (consumption unit) in the time interval $[t,T]$ and a lump sum time $T$ payment of one (consumption unit). An infinitely risk averse investor will not invest speculatively, but simply try to replicate the riskless real annuity bond.

To apply the theorem, we have to identify $Q_t$ and its volatility vector $\sigma_{Qt}$. Below, we discuss three ways to obtain such specific results.

9.3.1 Specific results by analogy to term structure models

The theorem generalizes recent results in affine and quadratic Markovian frameworks, cf. Liu (1999), Brennan and Xia (2000), Sørensen (1999), Grasselli (2000), and Wachter (2002). To obtain their results, first rewrite the relevant expectation of the pricing kernel as
$$\begin{aligned}
q_t^s &= E_t\left[\left(\frac{m_s}{m_t}\right)^{1-\frac{1}{\gamma}}\right] \\
&= E_t\left[\exp\left\{-\left(1-\frac{1}{\gamma}\right)\int_t^s\left(r_u + \frac{1}{2}\|\lambda_u\|^2\right)du - \left(1-\frac{1}{\gamma}\right)\int_t^s\lambda_u^\top dz_u\right\}\right] \\
&= E_t^{Q^{(\gamma)}}\left[e^{-\int_t^s r_u^{(\gamma)}\,du}\right],
\end{aligned} \tag{29}$$
where we have defined the process $r^{(\gamma)}$ by
$$r_t^{(\gamma)} = \left(1-\frac{1}{\gamma}\right)r_t + \frac{1}{2\gamma}\left(1-\frac{1}{\gamma}\right)\|\lambda_t\|^2, \tag{30}$$
and $Q^{(\gamma)}$ is the probability measure under which the process $z^{(\gamma)}$ defined by
$$z_t^{(\gamma)} = z_t + \int_0^t\left(1-\frac{1}{\gamma}\right)\lambda_u\,du \tag{31}$$
is a standard Brownian motion. Combining this observation with the well-known zero-coupon bond pricing results in affine and quadratic term structure models, cf. Duffie and Kan (1996) and Leippold and Wu (2000), we can recover Liu's results. For example, if $r^{(\gamma)}$ has an affine drift and variance under the probability measure $Q^{(\gamma)}$, functions $a^{(\gamma)}$ and $b^{(\gamma)}$ will exist such that
$$q_t^s = E_t\left[\left(\frac{m_s}{m_t}\right)^{1-\frac{1}{\gamma}}\right] = e^{-a^{(\gamma)}(s-t) - b^{(\gamma)}(s-t)\,r_t^{(\gamma)}},$$
and hence
$$\begin{aligned}
\sigma_{Qt} = -\sigma_{r^{(\gamma)},t}&\left(\varepsilon_1^{1/\gamma}\int_t^T b^{(\gamma)}(s-t)\,e^{-\hat{a}^{(\gamma)}(s-t) - b^{(\gamma)}(s-t)r_t^{(\gamma)}}\,ds + \varepsilon_2^{1/\gamma}b^{(\gamma)}(T-t)\,e^{-\hat{a}^{(\gamma)}(T-t) - b^{(\gamma)}(T-t)r_t^{(\gamma)}}\right) \\
\times&\left(\varepsilon_1^{1/\gamma}\int_t^T e^{-\hat{a}^{(\gamma)}(s-t) - b^{(\gamma)}(s-t)r_t^{(\gamma)}}\,ds + \varepsilon_2^{1/\gamma}e^{-\hat{a}^{(\gamma)}(T-t) - b^{(\gamma)}(T-t)r_t^{(\gamma)}}\right)^{-1},
\end{aligned}$$
where
$$\hat{a}^{(\gamma)}(\tau) = a^{(\gamma)}(\tau) + \frac{\beta}{\gamma}\tau,$$
and $\sigma_{r^{(\gamma)},t}$ is the (absolute) volatility of the process $r^{(\gamma)}$ defined above. For example, with utility of terminal wealth only ($\varepsilon_1 = 0$, $\varepsilon_2 = 1$), constant market prices of risk, and the real short rate following a one-factor Vasicek model
$$dr_t = \kappa(\bar{r} - r_t)\,dt - \sigma_r^\top dz_t,$$
we get that $\sigma_{r^{(\gamma)},t} = -(1-1/\gamma)\sigma_r$ and
$$b^{(\gamma)}(\tau) = \frac{1}{\kappa}\left(1 - e^{-\kappa\tau}\right) \equiv b(\tau),$$
and consequently
$$\sigma_{Qt} = \left(1-\frac{1}{\gamma}\right)\sigma_r\,b(T-t).$$
Since $\sigma_r b(T-t)$ is the volatility of a real zero-coupon bond maturing at $T$, we see that the optimal portfolio combines the speculative portfolio and a position in the real zero-coupon bond maturing at the end of the investor's horizon. This is the main result of Sørensen (1999).

9.3.2 Results under Gaussian real interest rate dynamics

Following ideas originally laid out in Munk and Sørensen (1999), it is possible to describe the optimal consumption and investment strategies of CRRA investors in cases where the real interest rate, $r_t$, follows a Gaussian process and the real market price of risk process, $\lambda_t$, is a deterministic function of time. In this case, the real pricing kernel is log-normally distributed. Hence, it is possible to directly evaluate the relevant expectations that enter the definition of the stochastic process $Q$ defined in (18). First note that the real price at time $t$ of a real zero-coupon bond which pays off one consumption unit at time $s \ge t$ is given by
$$B_t^s = E_t\left[\frac{m_s}{m_t}\right] = \exp\left\{E_t\left[\ln\frac{m_s}{m_t}\right] + \frac{1}{2}\mathrm{Var}_t\left[\ln\frac{m_s}{m_t}\right]\right\}.$$
It follows that
$$\begin{aligned}
q_t^s &= E_t\left[\left(\frac{m_s}{m_t}\right)^{1-\frac{1}{\gamma}}\right] \\
&= \exp\left\{\left(1-\frac{1}{\gamma}\right)E_t\left[\ln\frac{m_s}{m_t}\right] + \frac{1}{2}\left(1-\frac{1}{\gamma}\right)^2\mathrm{Var}_t\left[\ln\frac{m_s}{m_t}\right]\right\} \\
&= \left(B_t^s\right)^{1-\frac{1}{\gamma}}\exp\left\{\frac{1-\gamma}{2\gamma^2}\,g(t,s)\right\},
\end{aligned} \tag{32}$$
where for notational simplicity we have introduced the deterministic function
$$g(t,s) = \mathrm{Var}_t\left[\ln\frac{m_s}{m_t}\right] = \mathrm{Var}_t\left[-\int_t^s r_u\,du - \frac{1}{2}\int_t^s\|\lambda_u\|^2\,du - \int_t^s\lambda_u^\top dz_u\right]. \tag{33}$$
Substituting the above expressions for $q_t^s$ into (18) now provides the optimal real consumption and investment strategy and indirect utility of a CRRA investor through Theorem 1. In particular, the optimal hedge behavior is obtained by investing in a portfolio with diffusion term $\sigma_{Qt}$. In the given context, the relevant hedge portfolio at any time $t$ can be characterized as a real coupon bond with a continuous coupon payment stream where the future coupon rate at time $s \ge t$ must be chosen to be equal to the conditional expected future consumption rate at time $s$, where the expectations are taken under the so-called forward martingale measures $\bar{Q}_s$ introduced by Jamshidian (1987) and Geman (1989). In general, expectations under the subjective probability measure $\mathbb{P}$ and the time $s$ forward martingale measure differ. We are concerned with valuation in real terms, and the subjective probability measure $\mathbb{P}$ and the relevant time $s$ forward martingale measure $\bar{Q}_s$ are interlinked through the relation
$$E_t\left[\frac{m_s}{m_t}X\right] = B_t^s\,E_t^{\bar{Q}_s}[X], \tag{34}$$
which must be satisfied for any sufficiently well-behaved random variable $X$, cf., e.g., Duffie (2001). All in all, we formally have the following corollary to Theorem 1.

Corollary 1. If the real interest rate, $r_t$, follows a Gaussian process and the real market price of risk process, $\lambda_t$, is deterministic, the optimal investment strategy is given by
$$x_t^* = (\sigma_{Pt}^\top)^{-1}\left(\frac{1}{\gamma}\Lambda_t + \left(1-\frac{1}{\gamma}\right)(\sigma_{Bt} + \sigma_{\Pi t})\right), \tag{35}$$
where $(\sigma_{Bt} + \sigma_{\Pi t})$ is the volatility vector of the nominal price of a real bond which pays continuous real coupon according to
$$k(s) = E_t^{\bar{Q}_s}[c_s^*] = (B_t^s)^{-1}\frac{w_t^*}{Q_t}\varepsilon_1^{1/\gamma}e^{-\frac{\beta}{\gamma}(s-t)}q_t^s, \qquad 0 \le t \le s < T, \tag{36}$$
and has a terminal lump sum real payment at time $T$ of
$$k(T) = E_t^{\bar{Q}_T}[w_T^*] = (B_t^T)^{-1}\frac{w_t^*}{Q_t}\varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T. \tag{37}$$
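The log-normal step behind (32), on which Corollary 1 rests, can be checked by simulation. The sketch below is an illustration only and not part of the original derivation: it assumes that $\ln(m_s/m_t)$ is conditionally normal with a hypothetical mean `mu` and variance `g`, and it compares the right-hand side of (32) with a Monte Carlo estimate of $E_t[(m_s/m_t)^{1-1/\gamma}]$. The function names and numbers are made up for the example.

```python
import math
import random

def bond_price(mu, g):
    """B_t^s = E[exp(X)] for X ~ N(mu, g), the log-normal mean."""
    return math.exp(mu + 0.5 * g)

def q_from_bond(mu, g, gamma):
    """Right-hand side of (32): B**(1 - 1/gamma) * exp((1 - gamma) / (2 gamma^2) * g)."""
    adjustment = math.exp((1.0 - gamma) / (2.0 * gamma ** 2) * g)
    return bond_price(mu, g) ** (1.0 - 1.0 / gamma) * adjustment

def q_monte_carlo(mu, g, gamma, n=200_000, seed=0):
    """Sample mean of exp((1 - 1/gamma) * X) with X ~ N(mu, g)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    power = 1.0 - 1.0 / gamma
    sd = math.sqrt(g)
    return sum(math.exp(power * rng.gauss(mu, sd)) for _ in range(n)) / n

# Hypothetical conditional mean and variance of ln(m_s / m_t).
mu, g, gamma = -0.10, 0.04, 3.0
exact = q_from_bond(mu, g, gamma)
simulated = q_monte_carlo(mu, g, gamma)
print(exact, simulated)
```

The two numbers agree up to sampling error in this sketch; the adjustment factor is below one for $\gamma > 1$, so $q_t^s$ is then smaller than the corresponding power of the bond price.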
Proof: First note that the last equalities in (36) and (37) follow by the definition of the relevant forward martingale measure, as in (34), and by the characterization of optimal consumption and terminal wealth in (27). Now, as a key observation, it follows from the characterization of $q_t^s$ in (32) and Itô's lemma that
$$\sigma_{qt}^s = \left(1-\frac{1}{\gamma}\right)\sigma_{Bt}^s, \tag{38}$$
where $\sigma_{Bt}^s$ is the volatility vector of the real zero-coupon bond price involved in (32). The real price of a coupon bond paying continuous real coupon $k(s)$, $t \le s < T$, and with a terminal lump sum real payment, $k(T)$, at time $T$ is in general given by
$$B_t = \int_t^T k(s)B_t^s\,ds + k(T)B_t^T.$$
Moreover, using Lemma 1 in the appendix, the volatility vector of such a coupon bond is given by
$$\sigma_{Bt} = \frac{\int_t^T k(s)B_t^s\sigma_{Bt}^s\,ds + k(T)B_t^T\sigma_{Bt}^T}{\int_t^T k(s)B_t^s\,ds + k(T)B_t^T}. \tag{39}$$
By inserting $k(s)$ and $k(T)$ as described in (36) and (37) as well as the relationship in (38), and by comparison with the characterization of the volatility vector $\sigma_{Qt}$ in (21), it is seen that
$$\sigma_{Qt} = \left(1-\frac{1}{\gamma}\right)\sigma_{Bt}.$$
The corollary follows by inserting this relationship in (23) and observing that $(\sigma_{Bt} + \sigma_{\Pi t})$ is the volatility vector of the nominal price of the relevant real bond, $\Pi_t B_t$. □

Corollary 1 provides an explicit expression for the optimal investment strategy under inflation and with possibly non-Markovian, multi-factor dynamics of interest rates. The optimal portfolio allocates a fraction of wealth ($1/\gamma$) into the speculative portfolio and a fraction of wealth ($1 - 1/\gamma$) into the suggested real coupon bond, which hedges against changes in the investment opportunity set. In the case of no intermediate consumption, the relevant bond is a real zero-coupon bond which pays off at the terminal date, and the corollary generalizes the insights of Brennan and Xia (2000) and Sørensen (1999) into a setting with inflation uncertainty. In the case of intermediate consumption, the corollary generalizes results in Munk and Sørensen (1999, 2004) into a setting with inflation uncertainty.

9.4 Examples

9.4.1 Example 1 (Non-Markovian term structure dynamics)
In this example, we will consider an economy where the dynamics of the term structure of nominal interest rates is given by a $k$-factor model of the HJM-class introduced by Heath, Jarrow and Morton (1992). An HJM-framework is natural since it allows for perfect calibration of the model to initially observed nominal zero-coupon bond prices of different maturity. Nominal zero-coupon bond prices are related one-to-one to the term structure of forward rates by the relationship $D_t^\tau = \exp\left(-\int_t^\tau f_t^s\,ds\right)$, and the HJM-approach focuses on modeling the simultaneous dynamics of all points on the whole forward rate curve. For any $\tau$, the dynamics of the $\tau$-maturity instantaneous forward rate is assumed given by
$$f_t^\tau = f_0^\tau + \int_0^t\alpha(s,\tau)\,ds + \int_0^t\sigma_f(s,\tau)^\top dz_s, \tag{40}$$
where $\sigma_f(\cdot,\tau)$ is an $\mathbb{R}^k$-valued process while $f_0^\tau$ denotes the $\tau$-maturity forward rate observed initially at time 0. As a no-arbitrage drift restriction, Heath, Jarrow and Morton (1992) have shown that
$$\alpha(t,\tau) = \sigma_f(t,\tau)^\top\left(\Lambda_t + \int_t^\tau\sigma_f(t,u)\,du\right)$$
must be satisfied. This implies that one only has to specify the initial term structure of forward rates and the volatility structure, $\sigma_f(t,\tau)$, when modeling term structure dynamics in an HJM-framework. The nominal short interest rate is given by $R_t = f_t^t$ and evolves, therefore, according to the equation
$$R_t = f_0^t + \int_0^t\alpha(s,t)\,ds + \int_0^t\sigma_f(s,t)^\top dz_s. \tag{41}$$
In the following, we will assume that $\sigma_f(\cdot,\tau)$ is a deterministic vector process, which implies that the nominal short interest rate, $R_t$, is a Gaussian process. Furthermore, we will assume that the nominal market price of risk process, $\Lambda_t$, is a deterministic vector process. It is important to point out that in this kind of Gaussian HJM-model, the short rate process is not necessarily Markovian, although the general HJM-framework also encompasses all known Markovian term structure models as special cases.$^{7}$ We will consider a specific HJM three-factor numerical example below which exhibits non-Markovian Gaussian interest rate dynamics; this problem is thus not solvable by a direct dynamic programming solution approach as in the tradition of Merton (1971, 1973b).

In the absence of inflation, Corollary 1 applies. In particular, the relevant portfolio for hedging changes in investment opportunities can be characterized as a bond with coupon payments that match the forward-expected consumption pattern. We will now consider a specialized case of inflation dynamics of the form
$$d\Pi_t = \Pi_t\left[\pi\,dt + \sigma_\Pi^\top dz_t\right], \tag{42}$$
where the expected inflation rate, $\pi$, and the volatility vector, $\sigma_\Pi$, are constant. This process is known as a geometric process, and $\Pi_s/\Pi_t$ is log-normally distributed. The real short interest rate is given by $r_t = R_t - \pi + \Lambda_t^\top\sigma_\Pi$, and the real market price of risk vector is given by $\lambda_t = \Lambda_t - \sigma_\Pi$. Hence, the real interest rate is a Gaussian process and the real market price of risk vector process is deterministic. In this case, Corollary 1 applies and the optimal investment strategy can be implemented by investing in a portfolio of the nominal savings account, the nominal speculative portfolio, and a real bond with coupons that in real terms match the forward-expected consumption pattern in order to hedge changes in investment opportunities.
7 In fact, the short rate is only Markovian if $\sigma_f(t,\tau)$ can be separated as $\sigma_f(t,\tau) = G(t)H(\tau)$, where $H$ is a real-valued continuously differentiable function that never changes sign and $G$ is an $\mathbb{R}^k$-valued continuously differentiable function, cf. Carverhill (1994).

Moreover, for concrete calculations of the relevant real coupons, as expressed in equations (36) and (37) in Corollary 1, one can use that
$$\begin{aligned}
g(t,s) &= \mathrm{Var}_t\left[-\int_t^s r_u\,du - \frac{1}{2}\int_t^s\|\lambda_u\|^2\,du - \int_t^s\lambda_u^\top dz_u\right] \\
&= \int_t^s\|\lambda_u\|^2\,du + \int_t^s\left\|\int_u^s\sigma_f(u,\tau)\,d\tau\right\|^2du + 2\int_t^s\lambda_u^\top\int_u^s\sigma_f(u,\tau)\,d\tau\,du.
\end{aligned} \tag{43}$$
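The three terms in (43) can be evaluated for a concrete volatility structure. The sketch below is an illustration under assumptions that are not taken from the chapter: a single factor, a constant real market price of risk `lam`, and the hypothetical exponentially damped forward rate volatility $\sigma_f(u,\tau) = \sigma e^{-\kappa(\tau-u)}$. It compares a midpoint-rule evaluation of (43) with the closed form that this particular volatility structure admits.

```python
import math

def integrated_vol(u, s, sigma, kappa):
    """int_u^s sigma_f(u, tau) dtau for sigma_f(u, tau) = sigma * exp(-kappa * (tau - u))."""
    return sigma * (1.0 - math.exp(-kappa * (s - u))) / kappa

def g_three_terms(t, s, lam, sigma, kappa, n=4000):
    """Right-hand side of (43), term by term, with the midpoint rule on n cells."""
    h = (s - t) / n
    term_lam = term_vol = term_cross = 0.0
    for i in range(n):
        u = t + (i + 0.5) * h
        b = integrated_vol(u, s, sigma, kappa)
        term_lam += lam * lam * h          # int ||lambda_u||^2 du
        term_vol += b * b * h              # int ||int sigma_f dtau||^2 du
        term_cross += 2.0 * lam * b * h    # 2 int lambda_u' int sigma_f dtau du
    return term_lam + term_vol + term_cross

def g_closed_form(t, s, lam, sigma, kappa):
    """Analytic value of the same three integrals for the assumed volatility."""
    tau = s - t
    e1 = (1.0 - math.exp(-kappa * tau)) / kappa
    e2 = (1.0 - math.exp(-2.0 * kappa * tau)) / (2.0 * kappa)
    int_b = sigma / kappa * (tau - e1)
    int_b_sq = sigma ** 2 / kappa ** 2 * (tau - 2.0 * e1 + e2)
    return lam ** 2 * tau + int_b_sq + 2.0 * lam * int_b

# Hypothetical inputs, chosen only for illustration.
numeric = g_three_terms(0.0, 10.0, lam=0.5, sigma=0.015, kappa=0.3)
analytic = g_closed_form(0.0, 10.0, lam=0.5, sigma=0.015, kappa=0.3)
print(numeric, analytic)
```

Because the three terms are the expansion of a squared norm, `g_three_terms` is non-negative for any inputs of this form.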
9.4.2 Implementation of hedge by nominal bonds
In the following, we will consider how to implement the hedge against changes in investment opportunities in the absence of real bonds. In particular, we will demonstrate that a conceptually similar hedge portfolio can be implemented using nominal coupon bonds that match the forward-expected nominal consumption pattern.

At this point, it is important to point out that the relevant time $s$ forward martingale measure based on nominal valuation, $Q_s$, differs from the time $s$ forward martingale measure based on real valuation, $\bar{Q}_s$, as defined in (34). The different forward martingale measures are connected through the following relations to the subjective probability measure $\mathbb{P}$,
$$E_t\left[\frac{m_s}{m_t}X\right] = B_t^s\,E_t^{\bar{Q}_s}[X] \quad\text{and}\quad E_t\left[\frac{M_s}{M_t}X\right] = D_t^s\,E_t^{Q_s}[X], \tag{44}$$
which again must be satisfied for all sufficiently well-behaved random variables $X$. Since $C^* = \Pi c^*$ and $W^* = \Pi w^*$, and by manipulation of the relations in (44), it is seen that
$$D_t^s\,E_t^{Q_s}[C_s^*] = \Pi_t B_t^s\,E_t^{\bar{Q}_s}[c_s^*] \quad\text{and}\quad D_t^T\,E_t^{Q_T}[W_T^*] = \Pi_t B_t^T\,E_t^{\bar{Q}_T}[w_T^*], \tag{45}$$
where the expressions on the left-hand and right-hand sides of the equations are simply different ways of expressing the present nominal value of the future consumption rate at time $s$ and terminal wealth at time $T$, respectively. Using that the nominal pricing kernel, $M_t$, and the real pricing kernel, $m_t$, are both log-normally distributed under the specialized inflation dynamics, one can establish the following connection between nominal and real zero-coupon bond prices:
$$D_t^s = E_t\left[\frac{M_s}{M_t}\right] = E_t\left[\frac{m_s}{m_t}\right]\Pi_t\Psi(t,s) = \Pi_t B_t^s\Psi(t,s), \tag{46}$$
where
$$\Psi(t,s) = \exp\left\{-\pi(s-t) + \Lambda^\top\sigma_\Pi(s-t) + \int_t^s\sigma_\Pi^\top\int_u^s\sigma_f(u,\tau)\,d\tau\,du\right\}.$$
In particular, since $\Psi(t,s)$ is a deterministic function it follows by Itô's lemma that real and nominal zero-coupon bond volatilities satisfy
$$\sigma_{Dt}^s = \sigma_\Pi + \sigma_{Bt}^s. \tag{47}$$
Consider now a coupon bond paying continuous nominal coupon at a rate $K(s) = E_t^{Q_s}[C_s^*]$ and with a terminal lump sum nominal payment $K(T) = E_t^{Q_T}[W_T^*]$. The nominal price of the bond is
$$D_t = \int_t^T K(s)D_t^s\,ds + K(T)D_t^T$$
and, by again using Lemma 1 in the appendix, the volatility vector of such a coupon bond can be characterized by
$$\sigma_{Dt} = \frac{\int_t^T K(s)D_t^s\sigma_{Dt}^s\,ds + K(T)D_t^T\sigma_{Dt}^T}{\int_t^T K(s)D_t^s\,ds + K(T)D_t^T}. \tag{48}$$
As inferred from the definition of $K(s)$ and $K(T)$, the definition of $k(s)$ and $k(T)$ in Corollary 1, and (45), we have $D_t^s K(s) = \Pi_t B_t^s k(s)$, $t \le s \le T$. Inserting this observation and the relation in (47) into (48), it is seen that $\sigma_{Dt} = \sigma_\Pi + \sigma_{Bt}$, where $\sigma_{Bt}$ is given in (39). Hence, the suggested nominal bond can be used to implement the hedge against changes in the investment opportunity set given in Corollary 1.

9.4.3 Example 2 (Stochastic volatility and excess returns on stocks)

In this example, we will consider an investor who can invest in a single stock (a stock index) and an option on the stock. There are two basic sources of randomness in the economy and hence, since the investor can trade in two securities, markets are complete. The model of the economy is based on the stochastic volatility option pricing model of Heston (1993). The notation used in the example is similar to the notation used in Heston (1993), but after having described the formal model below, it is subsequently pointed out how the model exactly fits into the general description of asset price dynamics in Section 9.2.
The dynamics of the stock price (cum-dividends), $S_t$, and the option price, $C_t$, are assumed described by
$$\frac{dS_t}{S_t} = (R + \lambda_s v_t)\,dt + \sqrt{v_t}\,d\hat{z}_{1t}, \tag{49}$$
$$\frac{dC_t}{C_t} = \left(R + \frac{\partial C}{\partial S}\lambda_s v_t + \frac{\partial C}{\partial v}\sigma\lambda_v v_t\right)dt + \frac{\partial C}{\partial S}\sqrt{v_t}\,d\hat{z}_{1t} + \frac{\partial C}{\partial v}\sigma\sqrt{v_t}\,d\hat{z}_{2t}, \tag{50}$$
and
$$dv_t = \kappa(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,d\hat{z}_{2t}, \tag{51}$$
where $\hat{z}_{1t} = z_{1t}$ and $\hat{z}_{2t} = \rho z_{1t} + \sqrt{1-\rho^2}\,z_{2t}$ are Brownian motions with $\mathrm{Cov}(d\hat{z}_{1t}, d\hat{z}_{2t}) = \rho\,dt$. Using a standard no-arbitrage approach, Heston (1993) shows how to price options in the above economy. Prices on, e.g., a European call option $C(S,v,t)$ must thus, as usual, satisfy a PDE with appropriate boundary conditions. Heston (1993), pp. 330-331 and his appendix, demonstrates that the solution is of the following form:
$$C(S,v,t) = SP_1 - Ke^{-r(T-t)}P_2, \tag{52}$$
where (see equation (18) in Heston (1993))
$$P_j = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty \mathrm{Re}\left[\frac{e^{-i\varphi\ln[K]}f_j(x,v,T;\varphi)}{i\varphi}\right]d\varphi, \qquad j = 1,2, \tag{53}$$
and where $x = \ln S$ and $f_j(\cdot;\varphi)$ denotes characteristic functions that are obtained as solutions to PDEs with terminal condition $e^{i\varphi x}$. Since the coefficients in these particular PDEs are affine and the terminal conditions are exponential-affine, the solutions for $f_j$, $j = 1,2$, are exponential-affine and of the form in equation (17) in Heston (1993). In addition, it is straightforward to obtain closed-form expressions similar to (52) for the partial derivatives $\frac{\partial C}{\partial S}$ and $\frac{\partial C}{\partial v}$ that enter the description of option price dynamics in (50); the specific expression for option price dynamics in (50) is implied by Itô's lemma.

The above model of stock price and option price dynamics is a special case of the asset price dynamics in (3). The dynamics of asset prices in (49) and (50) are thus encompassed as a special case of (3)
where $P_t = (S_t, C_t)^\top$,
$$\Lambda_t = \sqrt{v_t}\begin{pmatrix}\lambda_s \\ (1-\rho^2)^{-\frac{1}{2}}(\lambda_v - \rho\lambda_s)\end{pmatrix}, \quad\text{and}\quad \sigma_{Pt} = \sqrt{v_t}\begin{pmatrix}1 & 0 \\ \frac{\partial C}{\partial S} + \rho\sigma\frac{\partial C}{\partial v} & \sqrt{1-\rho^2}\,\sigma\frac{\partial C}{\partial v}\end{pmatrix}.$$
We will consider the intertemporal portfolio choice of a power utility investor in a setting with constant rate of inflation, $\pi_t = \pi$, and without inflation uncertainty, i.e. $\sigma_{\Pi t} = 0$. In this economy, the real rate is thus given by the constant $r = R - \pi$, and nominal risk premia and real risk premia must coincide, $\lambda_t = \Lambda_t$. The optimal investment strategy of the investor is described in Theorem 1, and the portfolio solution depends critically on the processes $Q_t$ and $q_t^s$, as defined in (18) and (19).

It is possible to determine the specific optimal investment strategy by using the results on analogy to term structure models. In particular, the process $r_t^{(\gamma)}$ defined in (30) in the present context is
$$r_t^{(\gamma)} = \left(1-\frac{1}{\gamma}\right)r + \frac{1}{2\gamma}\left(1-\frac{1}{\gamma}\right)\frac{\lambda_s^2 + \lambda_v^2 - 2\rho\lambda_s\lambda_v}{1-\rho^2}\,v_t = k_0 + k_1 v_t = k_0 + v_t^*, \tag{54}$$
where the first equality defines the constants $k_0$ and $k_1$, and the second equality defines the proportional volatility process $v^* = k_1 v$. In the following, we assume that $\gamma > 1$ so that, e.g., $k_1$ is positive.$^{8}$ By an application of Itô's lemma, it follows that the dynamics of the proportional volatility process are described by
$$dv_t^* = \kappa^*(\theta^* - v_t^*)\,dt + \sigma^*\sqrt{v_t^*}\,d\hat{z}_{2t}^{(\gamma)}, \tag{55}$$
where
$$\kappa^* = \kappa + \left(1-\frac{1}{\gamma}\right)\sigma\lambda_v, \qquad \theta^* = k_1\theta\frac{\kappa}{\kappa^*}, \qquad \sigma^* = \sqrt{k_1}\,\sigma,$$
and
$$\hat{z}_{2t}^{(\gamma)} = \rho z_{1t}^{(\gamma)} + \sqrt{1-\rho^2}\,z_{2t}^{(\gamma)} = \hat{z}_{2t} + \left(1-\frac{1}{\gamma}\right)\lambda_v\int_0^t\sqrt{v_u}\,du.$$

8 The logarithmic utility case, $\gamma = 1$, is described by the general results in Section 9.2. For example, the optimal investment strategy for a log-investor is given by (16) where the hedge term is absent.

In particular, $\hat{z}_{2t}^{(\gamma)}$ is a standard Brownian motion under the probability measure $Q^{(\gamma)}$, as defined in (29). Using the results on the analogy to term structure models, the relevant calculations with respect to $q_t^s$ are now similar to the evaluations of zero-coupon bond prices in a term structure model where the dynamics of the short interest rate is given by (54) and (55). This term structure model is known as the extended CIR-model; cf. Pearson and Sun (1994). By analogy to the derivations in Pearson and Sun (1994), we thus have
$$q_t^s = E_t^{Q^{(\gamma)}}\left[e^{-\int_t^s r_u^{(\gamma)}\,du}\right] = e^{-a^{(\gamma)}(s-t) - b^{(\gamma)}(s-t)\,r_t^{(\gamma)}}, \tag{56}$$
where
$$a^{(\gamma)}(\tau) = k_0\tau - \frac{2\kappa^*\theta^*}{\sigma^{*2}}\log\left(\frac{2\gamma e^{(\gamma+\kappa^*)\tau/2}}{(\gamma+\kappa^*)(e^{\gamma\tau}-1) + 2\gamma}\right),$$
$$b^{(\gamma)}(\tau) = \frac{2(e^{\gamma\tau}-1)}{(\gamma+\kappa^*)(e^{\gamma\tau}-1) + 2\gamma},$$
$$\gamma = \sqrt{\kappa^{*2} + 2\sigma^{*2}}.$$
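The two loading functions can be sanity-checked numerically. The sketch below is an illustration, not the authors' code: it writes the constant $\sqrt{\kappa^{*2} + 2\sigma^{*2}}$ as `eta` to keep it apart from the risk aversion parameter, uses hypothetical parameter values, and assumes the standard CIR interpretation in which $b^{(\gamma)}$ is the loading on the scaled variance $v_t^*$. Under that assumption the functions should satisfy the Riccati equations $b' = 1 - \kappa^* b - \frac{1}{2}\sigma^{*2}b^2$ and $a' = k_0 + \kappa^*\theta^* b$ with $a(0) = b(0) = 0$, which the code verifies by central finite differences.

```python
import math

def cir_loadings(tau, k0, kappa_star, theta_star, sigma_star):
    """Return (a(tau), b(tau)) of the extended CIR-type formula."""
    eta = math.sqrt(kappa_star ** 2 + 2.0 * sigma_star ** 2)
    growth = math.exp(eta * tau) - 1.0
    denom = (eta + kappa_star) * growth + 2.0 * eta
    b = 2.0 * growth / denom
    log_term = math.log(2.0 * eta * math.exp((eta + kappa_star) * tau / 2.0) / denom)
    a = k0 * tau - (2.0 * kappa_star * theta_star / sigma_star ** 2) * log_term
    return a, b

def riccati_residuals(tau, k0, kappa_star, theta_star, sigma_star, h=1e-5):
    """Finite-difference residuals of the two Riccati equations at maturity tau."""
    a_up, b_up = cir_loadings(tau + h, k0, kappa_star, theta_star, sigma_star)
    a_dn, b_dn = cir_loadings(tau - h, k0, kappa_star, theta_star, sigma_star)
    _, b = cir_loadings(tau, k0, kappa_star, theta_star, sigma_star)
    da = (a_up - a_dn) / (2.0 * h)
    db = (b_up - b_dn) / (2.0 * h)
    res_b = db - (1.0 - kappa_star * b - 0.5 * sigma_star ** 2 * b ** 2)
    res_a = da - (k0 + kappa_star * theta_star * b)
    return res_a, res_b

# Hypothetical parameter values, for illustration only.
args = dict(k0=0.01, kappa_star=0.42, theta_star=0.0038, sigma_star=0.0566)
print(cir_loadings(5.0, **args), riccati_residuals(5.0, **args))
```

For long maturities `b` levels off at `2 / (eta + kappa_star)` in this sketch, so the sensitivity to the scaled variance is bounded in the horizon.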
The optimal investment strategy is given in (23) in Theorem 1. In the present context, we assume that $\sigma_{\Pi t} = 0$ and the optimal investment strategy in (23) reduces to
$$x_t^* = \frac{1}{\gamma}(\sigma_{Pt}^\top)^{-1}\Lambda_t + (\sigma_{Pt}^\top)^{-1}\sigma_{Qt}, \tag{57}$$
where the hedge term is given by (using (21), (56), and (51))
$$(\sigma_{Pt}^\top)^{-1}\sigma_{Qt} = -h(v_t,t,s)\,k_1\sigma\sqrt{v_t}\,(\sigma_{Pt}^\top)^{-1}\begin{pmatrix}\rho \\ \sqrt{1-\rho^2}\end{pmatrix} \tag{58}$$
with
$$h(v_t,t,s) = \frac{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}q_t^s\,b^{(\gamma)}(s-t)\,ds + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T\,b^{(\gamma)}(T-t)}{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}q_t^s\,ds + \varepsilon_2^{1/\gamma}e^{-\frac{\beta}{\gamma}(T-t)}q_t^T}. \tag{59}$$
In the special case where the investor has utility from terminal wealth only ($\varepsilon_1 = 0$, $\varepsilon_2 = 1$), the expression for the optimal investment strategy simplifies. In this case, the optimal investment strategy is given by (58) and (59), but the function $h(v_t,t,s)$ in (59) reduces to a function of time only given by
$$h(v_t,t,s) = b^{(\gamma)}(T-t). \tag{60}$$
Also, in the special case where the relevant hedging instrument (the option with dynamics described in (50)) has zero sensitivity with respect to the underlying stock price, the relevant hedge strategy simplifies. In this case, $\partial C/\partial S = 0$, and the hedge term (58) in the optimal investment strategy can be written as

$$\left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{Qt} = -h(v_t,t,s)\,k_1\left(\frac{\partial C}{\partial v}\right)^{-1}\begin{pmatrix}0\\ 1\end{pmatrix}. \tag{61}$$

Hence, in the special case where $\partial C/\partial S = 0$, the optimal hedge portfolio only involves taking a position in the relevant option strategy.

Finally, in the numerical calibration below, we also consider the special case where the stock and the volatility process are perfectly negatively correlated ($\rho = -1$). The optimal investment strategy can in this case be obtained as a limiting case of (57) (where one must set $\lambda_v = -\lambda_s$), or by similar explicit derivation. In particular, implementation of the optimal strategy in this one-dimensional case only requires investing in a single asset, the stock. The optimal investment strategy thus describes the optimal stock position, which is given by

$$x_t^* = \frac{1}{\gamma}\lambda_s + h(v_t,t,s)\,k_1\,\sigma, \tag{62}$$

where now $k_1 = \frac{1}{2\gamma}\left(1-\frac{1}{\gamma}\right)\lambda_s^2$, and $h(v_t,t,s)$ is described in (59).
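The structure of (62) can be illustrated with a minimal Python sketch. It takes the value of $h$ as an input rather than computing it from (59), since that requires the functions $q_t^s$ and $b^{(\gamma)}$ defined earlier in the chapter. The function names are hypothetical; the parameter values $\lambda_s = 0.80$ and $\sigma = 0.20$ are the ones used in the calibration of Section 9.4.4.

```python
def hedge_coefficient(gamma, lambda_s):
    """k1 = (1 / (2*gamma)) * (1 - 1/gamma) * lambda_s**2, as stated below (62)."""
    return (1.0 / (2.0 * gamma)) * (1.0 - 1.0 / gamma) * lambda_s ** 2


def stock_weight(gamma, lambda_s, sigma, h):
    """Stock proportion from (62): myopic term lambda_s/gamma plus hedge term h*k1*sigma.

    h is supplied by the caller (hypothetical input); it is not computed here.
    """
    myopic = lambda_s / gamma
    hedge = h * hedge_coefficient(gamma, lambda_s) * sigma
    return myopic + hedge


if __name__ == "__main__":
    lambda_s, sigma = 0.80, 0.20
    for gamma in (1, 2, 4, 8):
        # With h = 0 only the myopic (speculative) demand remains.
        print(gamma, stock_weight(gamma, lambda_s, sigma, h=0.0))
```

Two features follow directly from the formula: for log utility ($\gamma = 1$) the coefficient $k_1$ is zero, so the stock weight equals $\lambda_s$ whatever the horizon, and for $\gamma > 1$ a positive $h$ raises the stock weight above the myopic level $\lambda_s/\gamma$.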
9.4.4 Numerical results

This subsection presents numerical asset allocation results based on the optimal investment strategies derived above for the Heston (1993) model. We assume a constant rate of inflation at $\pi = 0.02$ and a constant nominal interest rate of $R = 0.04$ (and thus a constant real interest rate of $r = 0.02$). Moreover, the parameters of the basic Heston (1993) stochastic volatility model are chosen close to empirical estimates obtained for this specific model based on US data; see, e.g., Andersen, Benzoni and Lund (2002), Table IV and footnote 10. In particular, the parameters of the stochastic volatility process in (51) are set so that $\kappa = 0.50$, $\theta = 0.04$, and $\sigma = 0.20$.⁹ The current variance rate $v_t$ is chosen such that the current stock volatility is 0.20; i.e. $\sqrt{v_t} = 0.20$ and, hence, $v_t = 0.04$. Finally, the price on stock price risk is $\lambda_s = 0.80$. This implies that the expected excess return on stocks is $\lambda_s v_t = 0.80 \cdot 0.04 = 3.2\%$ at the current volatility level. The price on volatility risk is set at $\lambda_v = \rho\lambda_s$, which in our examples implies that there is no speculative demand for bearing volatility risk.

⁹ In fact, Andersen, Benzoni and Lund (2002) present a higher estimate of the mean-reversion parameter, $\kappa = 3.2508$. We have chosen to use a slower rate of mean-reversion mainly to illustrate the possibility of longer horizon asset allocation effects in the model. Thus, using a parameter value of $\kappa = 3.2508$ would almost eliminate the 5 year, 15 year, and 35 year higher stock allocations in Table 1 and Table 2.

It is a well-known fact that the correlation between stock prices and stock price volatility is usually estimated to be negative; in fact, this phenomenon is often referred to as the “leverage effect.” Below we present optimal asset allocation choices for two values of the correlation coefficient between the stock price and the volatility. First we consider the case of perfect negative correlation, $\rho = -1$. Then we consider the case where the correlation is set at $\rho = -0.60$.¹⁰

In Table 1, we have tabulated the optimal stock proportion for investors with different degrees of constant relative risk aversion and time horizons in the case of perfectly negative correlation between the stock and the volatility process ($\rho = -1$). In this case, the investor only needs to consider how much to invest in stocks in order to implement the optimal investment strategy, and the residual is invested in the bank account or, equivalently, in bonds (since interest rates are assumed constant). The optimal stock proportions in Table 1 are obtained by inserting the relevant parameter values in the expression for the optimal investment strategy in (62). The results indicate that investors with longer investment horizons should optimally invest a higher fraction of wealth in stocks. This result is similar to results presented by Wachter (2002) in a similar setting where the excess return of stocks follows a mean-reverting process which is perfectly negatively correlated with the stock price. While the excess return in the present context (i.e.
$\lambda_s v_t$) follows a CIR square-root process, Wachter (2002) assumes that the excess return follows an Ornstein-Uhlenbeck process. Moreover, Wachter (2002) assumes constant volatilities. However, conceptually similar to the insight of Wachter (2002), the higher stock proportions for long horizon investors are due to the mean-reversion in stock prices that the negative correlation between excess returns and stock price movement induces. Similar to the complete market analysis in Wachter (2002), the asset allocation results with utility from intermediate consumption in Table 1 are basically obtained as “duration”-weighted averages over similar strategies for investors with utility from terminal wealth only, as formalized in our setting by the expression for $h(v_t,t,s)$ in (59).

¹⁰ This specific parameter value matches the estimate of $\rho = -0.5877$ presented by Andersen, Benzoni and Lund (2002).

Table 1: Stock investment in perfect negative correlation case (ρ = −1)

Panel A: Utility from terminal wealth only

                        Relative Risk Aversion
Investment Horizon      γ=1      γ=2      γ=4      γ=8
one month               80.0%    40.1%    20.1%    10.1%
5 year                  80.0%    43.3%    22.7%    11.6%
15 year                 80.0%    43.8%    23.1%    11.9%
25 year                 80.0%    43.8%    23.1%    11.9%

Panel B: Utility from consumption and terminal wealth

                        Relative Risk Aversion
Investment Horizon      γ=1      γ=2      γ=4      γ=8
one month               80.0%    40.1%    20.1%    10.1%
5 year                  80.0%    42.4%    21.8%    11.1%
15 year                 80.0%    43.1%    22.5%    11.5%
25 year                 80.0%    43.3%    22.7%    11.6%
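The role of the mean-reversion parameter $\kappa$ discussed in footnote 9 can be made concrete with a small sketch. It assumes that the variance process in (51) has the standard square-root drift $\kappa(\theta - v_t)$, so that a deviation of $v_t$ from $\theta$ is expected to decay at the rate $e^{-\kappa\tau}$; this drift form and the function names are assumptions made for illustration, not taken from (51) itself.

```python
import math


def half_life(kappa):
    """Years until an expected deviation of v from theta has halved (assumed drift kappa*(theta - v))."""
    return math.log(2.0) / kappa


def expected_variance(v_now, kappa, theta, horizon):
    """Conditional mean of the variance rate `horizon` years ahead under the assumed drift."""
    return theta + (v_now - theta) * math.exp(-kappa * horizon)


def shock_remaining(kappa, horizon):
    """Fraction of a current deviation from theta still expected after `horizon` years."""
    return math.exp(-kappa * horizon)


if __name__ == "__main__":
    for kappa in (0.50, 3.2508):  # calibrated value and the Andersen-Benzoni-Lund estimate
        print(f"kappa={kappa}: half-life {half_life(kappa):.2f} years, "
              f"share of a shock left after 5 years {shock_remaining(kappa, 5.0):.2e}")
```

Under this assumption, shocks to the excess return die out within months when $\kappa = 3.2508$, which is in line with the remark that the higher value would almost eliminate the horizon effects in the tables.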
In Table 2, we have tabulated similar optimal investment strategies for investors with different degrees of constant relative risk aversion and time horizons in the case of less than perfectly negative correlation between the stock and the volatility process ($\rho = -0.60$). In this case, we also allow the investor to invest in a derivative on the volatility/excess return state variable, and the optimal asset allocations are obtained by inserting the relevant parameter values in the expression for the optimal investment strategy in (57). In particular, in our numerical example the investor is allowed to invest in a straddle written on the stock and priced according to the Heston (1993) expressions.¹¹ The straddle expires in one month, and the exercise price in the straddle position is set at 1.0075 times the current stock price such that the straddle is currently insensitive to changes in the underlying stock price (i.e. $\partial C/\partial S = 0$). The straddle is thus designed to have maximal correlation with, and to be a relevant instrument for hedging changes in, the volatility/excess return state variable $v_t$ in the economy considered.

¹¹ A straddle is a combination of a bought call option and a bought put option written on the stock and with the same exercise prices and maturities. The price on the straddle is obtained using the expression for the call option price in (52) and the put-call parity.

Table 2: Asset allocation choices under stochastic volatility/excess returns

Panel A: Utility from terminal wealth only

                 Relative Risk Aversion
                 γ=1      γ=2      γ=4      γ=8
Stock            80.0%    40.0%    20.0%    10.0%
Bank/Bonds       20.0%    60.3%    80.2%    90.2%
Straddle          0.0%    −0.3%    −0.2%    −0.2%

Panel B: Utility from consumption and terminal wealth

                 Relative Risk Aversion
                 γ=1      γ=2      γ=4      γ=8
Stock            80.0%    40.0%    20.0%    10.0%
Bank/Bonds       20.0%    60.3%    80.2%    90.1%
Straddle          0.0%    −0.3%    −0.2%    −0.1%

The results tabulated in Table 2 are obtained for investors having an investment horizon of 15 years. However, the results vary little across horizons since the optimal stock allocation is the same for all investment horizons. Thus, only the allocation into the straddle position reflects potential horizon effects (which is also reflected in the residually determined bank account or, equivalently, bond position). For all investment horizons, the proportion of wealth invested in the straddle in order to hedge changes in volatility/excess return risk is numerically small. In the case of perfectly negative correlation in Table 1, this hedge was reflected in the higher stock proportions for longer term investors. The straddle is intuitively a more powerful instrument and, thus, the similar hedge can be accomplished using relatively small straddle positions. Also note that the correlation between the stock and the straddle (which is designed to have perfect positive correlation with $v_t$) is negative in the example. Therefore, while the hedge in Table 1 is accomplished by investing an additional proportion of wealth in stocks, the hedge position in Table 2 involves a short straddle position.

9.5 Extensions
9.5.1 HARA utility

Let us describe how the general result for CRRA utility functions in the previous section can be generalized to HARA utility functions of the form

$$U_1(c,t) = \varepsilon_1 e^{-\beta t}\,\frac{(c-\bar c(t))^{1-\gamma}}{1-\gamma}, \qquad U_2(w) = \varepsilon_2 e^{-\beta T}\,\frac{(w-\bar w)^{1-\gamma}}{1-\gamma}, \tag{63}$$

where $\bar c(t)$ and $\bar w$ are non-negative, non-stochastic real numbers that can be interpreted as the subsistence level of real consumption at time $t$ and the subsistence level of terminal wealth, respectively. Defining $\hat c_t = c_t - \bar c(t)$ and $\hat w = w - \bar w$, we can reformulate the static utility maximization problem (11)–(12) for the HARA investor as

$$\sup_{(\hat c,\hat w)}\; E\left[\int_0^T e^{-\beta t}\,\frac{\hat c_t^{\,1-\gamma}}{1-\gamma}\,dt + e^{-\beta T}\,\frac{\hat w^{\,1-\gamma}}{1-\gamma}\right], \tag{64}$$

$$\text{s.t.}\quad E\left[\int_0^T m_t\,\hat c_t\,dt + m_T\,\hat w\right] \le w_0 - L_0, \tag{65}$$

where

$$L_t = \frac{1}{m_t}\,E_t\left[\int_t^T m_s\,\bar c(s)\,ds + m_T\,\bar w\right]$$

denotes the costs of meeting the future subsistence consumption and terminal wealth level. Of course, if $w_0 < L_0$, the problem has no solution. If $w_0 > L_0$, the problem is mathematically equivalent to the problem of a CRRA investor. We find that the optimal consumption process is

$$c_t^* = \bar c(t) + \varepsilon_1^{1/\gamma}\,\frac{w_t^* - L_t}{Q_t},$$

while the optimal wealth process is

$$w_t^* = L_t + e^{-\frac{\beta}{\gamma}t}\,Q_t\,m_t^{-\frac{1}{\gamma}}\,\frac{w_0 - L_0}{Q_0},$$

which is obtained by the portfolio

$$x_t^* = \left(1 - \frac{L_t}{w_t^*}\right)\left(\sigma_{Pt}^{\top}\right)^{-1}\left(\frac{1}{\gamma}\lambda_t + \sigma_{Qt}\right) + \left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{\Pi t}.$$

The value function becomes

$$V_t = \frac{1}{1-\gamma}\,Q_t^{\gamma}\,(w_t^* - L_t)^{1-\gamma}.$$
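To give a feel for the HARA adjustment, the following sketch evaluates the subsistence cost $L_t$ and the scaling factor $1 - L_t/w_t^*$ in the simplest possible case. It assumes a constant real interest rate $r$ (so that $E_t[m_s/m_t] = e^{-r(s-t)}$) and constant subsistence levels $\bar c$ and $\bar w$; these simplifications and the function names are illustrative assumptions and are not part of the general result above.

```python
import math


def subsistence_cost(c_bar, w_bar, r, tau):
    """L_t for constant subsistence consumption c_bar and terminal floor w_bar.

    Assumes a constant real rate r, so a unit paid in tau years costs exp(-r*tau);
    tau = T - t is the remaining horizon.
    """
    if r == 0.0:
        annuity = tau
    else:
        annuity = (1.0 - math.exp(-r * tau)) / r
    return c_bar * annuity + w_bar * math.exp(-r * tau)


def risky_scale(wealth, cost):
    """Factor (1 - L_t / w_t) multiplying the CRRA speculative and hedge terms."""
    if wealth < cost:
        raise ValueError("wealth below the cost of subsistence: the problem has no solution")
    return 1.0 - cost / wealth


if __name__ == "__main__":
    L = subsistence_cost(c_bar=1.0, w_bar=0.0, r=0.02, tau=10.0)
    print(L, risky_scale(wealth=40.0, cost=L))
```

The factor shows how the investor first sets aside the amount $L_t$ needed to finance the subsistence levels and applies the CRRA strategy only to the remaining "free" wealth $w_t^* - L_t$.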
9.5.2 Habit formation

The preferences applied above and in most papers on portfolio and consumption choice are additively time separable. In particular, the utility of consumption at one point in time is independent of the consumption level at all other dates. However, it is probably more realistic that individuals develop habits for consumption so that the utility of the consumption level on a given day is decreasing in some average of past consumption levels, cf., e.g., Browning (1991). Several papers have shown that the introduction of habit formation can resolve several of the “puzzles” of asset pricing models with a representative agent having time separable utility; see, e.g., Constantinides (1990) and Campbell and Cochrane (1999). A particularly tractable case is that of power-linear habit utility, i.e. the utility of a consumption stream $(c_t)_{t\in[0,T]}$ is of the form

$$\int_0^T e^{-\beta t}\,\frac{(c_t - h_t)^{1-\gamma}}{1-\gamma}\,dt,$$

where $h_t$ is the habit level defined by

$$h_t = h_0 e^{-\beta t} + \alpha\int_0^t e^{-\beta(t-s)}\,c_s\,ds,$$

i.e. an exponentially weighted average of past consumption. Schroder and Skiadas (2002) demonstrate that the solution of a maximization problem with linear habit utility can be expressed in terms of the solution to a problem with standard time additive utility. Munk (2002) applies this procedure to derive the optimal strategies for an investor with power-linear habit utility in a complete market, but does not explicitly incorporate inflation risk. Adapting that result to our setting with inflation risk, we obtain the optimal strategies given below.

The solution will be stated in terms of the processes $F = (F_t)$ and $G = (G_t)$ defined by

$$G_t = E_t\left[\int_t^T e^{-\frac{\delta}{\gamma}(s-t)}\left(\frac{m_s}{m_t}\right)^{1-\frac{1}{\gamma}}(1+\alpha F_s)^{1-\frac{1}{\gamma}}\,ds\right],$$

$$F_t = E_t\left[\int_t^T e^{-(\beta-\alpha)(s-t)}\,\frac{m_s}{m_t}\,ds\right] = \int_t^T e^{-(\beta-\alpha)(s-t)}\,B_t^s\,ds,$$

where $B_t^s$ is the real price at time $t$ of a zero-coupon real bond paying one consumption unit at time $s$. We can interpret $F_t$ as the real price of a real bond paying a continuous coupon that is exponentially declining over time. Then $h_t F_t$ is the cost of ensuring that future consumption exactly equals the habit level since with $c_s = h_s$ for all $s \ge t$, we have $h_s = e^{-(\beta-\alpha)(s-t)}h_t$. If we write the dynamics of the zero-coupon bond prices as

$$dB_t^s = B_t^s\left[\left(r_t + (\sigma_t^s)^{\top}\lambda_t\right)dt + (\sigma_t^s)^{\top}dz_t\right],$$

the dynamics of $F_t$ becomes

$$dF_t = -1\,dt + F_t\left[\left(r_t + \sigma_{Ft}^{\top}\lambda_t\right)dt + \sigma_{Ft}^{\top}dz_t\right],$$

where

$$\sigma_{Ft} \equiv \frac{\int_t^T e^{-(\beta-\alpha)(s-t)}\,B_t^s\,\sigma_t^s\,ds}{\int_t^T e^{-(\beta-\alpha)(s-t)}\,B_t^s\,ds}.$$

We write the dynamics of $G_t$ as

$$dG_t = G_t\left[\mu_{Gt}\,dt + \sigma_{Gt}^{\top}dz_t\right].$$

The optimal consumption process $c^* = (c_t^*)$ is given by

$$c_t^* = h_t^* + (1+\alpha F_t)^{-\frac{1}{\gamma}}\,\frac{w_t^* - h_t^* F_t}{G_t}.$$

The indirect utility is

$$V_t = \frac{1}{1-\gamma}\,G_t^{\gamma}\,(w_t^* - h_t^* F_t)^{1-\gamma}.$$

Finally, the optimal investment strategy is given by the vector

$$x_t^* = \frac{1}{\gamma}\left(1 - \frac{h_t^* F_t}{w_t^*}\right)\left(\sigma_{Pt}^{\top}\right)^{-1}\lambda_t + \left(1 - \frac{h_t^* F_t}{w_t^*}\right)\left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{Gt} + \frac{h_t^* F_t}{w_t^*}\left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{Ft} + \left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{\Pi t}.$$

Here $h_t^*$ and $w_t^*$ are the habit level and the real wealth induced by the optimal consumption and investment strategy.

9.5.3 Labor income
So far we have assumed that the only income in the life of the investor is given by the return on her financial investments. Since labor income is the predominant source of income for most individuals, it is extremely important for consumption and investment decisions to take into account the current level of labor income, the drift and riskiness of the income stream, and its correlation with financial asset returns. The introduction of labor income does not dramatically complicate the analysis as long as (1) the labor income stream is spanned by the traded financial assets, i.e. has no other risk components, and (2) the investor is able to borrow using future income as implicit collateral. Suppose for example that the individual receives an exogenously given labor income at the rate $Y_t$, where

$$dY_t = Y_t\left[\mu_{Yt}\,dt + \sigma_{Yt}^{\top}dz_t\right],$$

so that the nominal income over the short period $[t, t+dt]$ is $Y_t\,dt$. Since the income stream is fully hedgeable, it can be valued as any financial asset. The time $t$ real value of the income stream $(Y_s)_{s\in[t,T]}$ is therefore

$$l_t = E_t\left[\int_t^T \frac{m_s}{m_t}\,\frac{Y_s}{\Pi_s}\,ds\right].$$

In this situation, we can think of the agent “selling” his future income at the financial market in exchange for the payment $l_t$ so that he has a total real wealth of $w_t + l_t$ to use for consumption and investments. Consequently, the optimal consumption rate of a CRRA investor in Eq. (22) must be adjusted to

$$c_t^* = \varepsilon_1^{1/\gamma}\,\frac{w_t^* + l_t}{Q_t}.$$

The individual will invest in a financial portfolio such that the riskiness of the total position of financial investments and labor income is similar to the riskiness of the optimal financial portfolio in the absence of labor income. Denoting the percentage volatility of $l_t$ by $\sigma_{lt}$, we arrive at the optimal portfolio

$$x_t^* = \left(1 + \frac{l_t}{w_t^*}\right)\left(\sigma_{Pt}^{\top}\right)^{-1}\left(\frac{1}{\gamma}\lambda_t + \sigma_{Qt}\right) - \frac{l_t}{w_t^*}\left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{lt} + \left(\sigma_{Pt}^{\top}\right)^{-1}\sigma_{\Pi t},$$

which generalizes (23).

While the results above certainly provide some intuition on the effects of labor income, the underlying assumptions on the income process are probably not realistic. The labor income of most individuals is not fully hedgeable in the financial markets and due to moral hazard and adverse selection problems, it may be impossible to borrow against future income so that the individual faces portfolio constraints. However, the optimal consumption and investment choice problem in settings allowing undiversifiable income risk and portfolio constraints can only be completely solved using numerical methods for optimal control. Key papers addressing the implications of labor income on consumption and portfolio choice are Bodie, Merton and Samuelson (1992), Cuoco (1997), Duffie, Fleming, Soner and Zariphopoulou (1997), Koo (1998), and Munk (2000).

9.5.4 Allowing for undiversifiable inflation risk

Brennan and Xia (2002) solve the consumption and investment choice problem of a CRRA investor in a very concrete setting with particularly simple stochastic processes for a stock index, interest rates, the consumer price level, and the expected inflation rate. In particular, they allow for the case where the consumer price level has an undiversifiable risk component so that the market is incomplete. Nevertheless, they obtain closed-form solutions for the optimal strategies both with and without intermediate consumption.
This contrasts with the analysis of Liu (1999), who is only able to find closed-form optimal strategies with intermediate consumption if the financial market is complete. A natural question is: Does the extension to undiversifiable inflation risk depend crucially on their specialized setting, or can such a risk component generally be included without significantly complicating the analysis?

To investigate this issue, we generalize the complete markets model of Sections 2 and 3 by adding a term with a new Brownian motion to the dynamics of the consumer price level. To be more precise, we replace (1) by

$$d\Pi_t = \Pi_t\left[\pi_t\,dt + \sigma_{\Pi t}^{\top}dz_t + \hat\sigma_{\Pi t}\,d\hat z_t\right], \tag{66}$$
where $\hat z$ is a one-dimensional standard Brownian motion independent of $z$. Due to the unhedgeable component of the inflation process, the individual can no longer completely control his real wealth process by appropriate behavior. Hence we can no longer think of the investor choosing the real consumption process and the real terminal wealth as in the formulation (11)–(12), but we have to consider the nominal version (8)–(9) and distinguish between the risks that the individual controls and those he cannot control. From (66) we can write

$$\begin{aligned}
\Pi_s &= \Pi_t \exp\left\{\int_t^s \pi_u\,du - \frac{1}{2}\int_t^s \|\sigma_{\Pi u}\|^2\,du + \int_t^s \sigma_{\Pi u}^{\top}dz_u\right\} \\
&\quad\times \exp\left\{-\frac{1}{2}\int_t^s \hat\sigma_{\Pi u}^2\,du + \int_t^s \hat\sigma_{\Pi u}\,d\hat z_u\right\} \\
&\equiv \Pi_t\,\eta(t,s)\,\hat\eta(t,s).
\end{aligned} \tag{67}$$

Assuming that $(\sigma_{\Pi t})$ is adapted to the filtration generated by $z$ and $(\hat\sigma_{\Pi t})$ is adapted to the filtration generated by $\hat z$, the random variables $\eta(t,s)$ and $\hat\eta(t,s)$ will be independent.

There may be a market price of risk, $\hat\lambda$, associated with the new source of risk, $\hat z$, so that the real pricing kernel satisfies

$$\begin{aligned}
m_s &= m_t \exp\left\{-\int_t^s r_u\,du - \frac{1}{2}\int_t^s \|\lambda_u\|^2\,du - \int_t^s \lambda_u^{\top}dz_u\right\} \\
&\quad\times \exp\left\{-\frac{1}{2}\int_t^s \hat\lambda_u^2\,du - \int_t^s \hat\lambda_u\,d\hat z_u\right\} \\
&\equiv m_t\,\zeta(t,s)\,\hat\zeta(t,s).
\end{aligned} \tag{68}$$
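Since the investor can only vary his nominal consumption rate and terminal wealth level in the space of random variables measurable with respect to the $z$-filtration, we have to be careful when deriving the first-order conditions for the utility maximization problem (8)–(9). Applying (67) and (68), we can write the Lagrangian as

$$\begin{aligned}
\mathcal{L} = E\Bigg[\, &\varepsilon_1\int_0^T e^{-\beta t}\,\frac{1}{1-\gamma}\left(\frac{C_t}{\Pi_0\,\eta(0,t)\,\hat\eta(0,t)}\right)^{1-\gamma}dt + \varepsilon_2\,e^{-\beta T}\,\frac{1}{1-\gamma}\left(\frac{W_T}{\Pi_0\,\eta(0,T)\,\hat\eta(0,T)}\right)^{1-\gamma} \\
&- \frac{\psi}{\Pi_0}\left\{\int_0^T \frac{\zeta(0,t)\,\hat\zeta(0,t)}{\eta(0,t)\,\hat\eta(0,t)}\,C_t\,dt + \frac{\zeta(0,T)\,\hat\zeta(0,T)}{\eta(0,T)\,\hat\eta(0,T)}\,W_T - W_0\right\}\Bigg].
\end{aligned}$$

We cannot maximize this expectation with respect to the filtration generated by both $z$ and $\hat z$ by a state-by-state maximization, since the individual can only control the states in the filtration generated by $z$. By independence, however, we can rewrite the Lagrangian as

$$\begin{aligned}
\mathcal{L} = E\Bigg[\, &\frac{\varepsilon_1}{1-\gamma}\int_0^T e^{-\beta t}\,C_t^{1-\gamma}\,\Pi_0^{\gamma-1}\,\eta(0,t)^{\gamma-1}\,E\left[\hat\eta(0,t)^{\gamma-1}\right]dt \\
&+ \frac{\varepsilon_2}{1-\gamma}\,e^{-\beta T}\,W_T^{1-\gamma}\,\Pi_0^{\gamma-1}\,\eta(0,T)^{\gamma-1}\,E\left[\hat\eta(0,T)^{\gamma-1}\right] \\
&- \frac{\psi}{\Pi_0}\left\{\int_0^T \frac{\zeta(0,t)}{\eta(0,t)}\,E\left[\frac{\hat\zeta(0,t)}{\hat\eta(0,t)}\right]C_t\,dt + \frac{\zeta(0,T)}{\eta(0,T)}\,E\left[\frac{\hat\zeta(0,T)}{\hat\eta(0,T)}\right]W_T - W_0\right\}\Bigg],
\end{aligned}$$

where the outer expectation is now with respect to the uncertainty that the individual can control so that we can maximize state-by-state as is usually done. Doing that, the first-order conditions become

$$C_t = \frac{\varepsilon_1^{1/\gamma}}{\psi^{1/\gamma}}\,\Pi_0\,e^{-\frac{\beta}{\gamma}t}\,\eta(0,t)\,\zeta(0,t)^{-\frac{1}{\gamma}}\,E\left[\hat\eta(0,t)^{\gamma-1}\right]^{\frac{1}{\gamma}}\,E\left[\frac{\hat\zeta(0,t)}{\hat\eta(0,t)}\right]^{-\frac{1}{\gamma}},$$

$$W_T = \frac{\varepsilon_2^{1/\gamma}}{\psi^{1/\gamma}}\,\Pi_0\,e^{-\frac{\beta}{\gamma}T}\,\eta(0,T)\,\zeta(0,T)^{-\frac{1}{\gamma}}\,E\left[\hat\eta(0,T)^{\gamma-1}\right]^{\frac{1}{\gamma}}\,E\left[\frac{\hat\zeta(0,T)}{\hat\eta(0,T)}\right]^{-\frac{1}{\gamma}}.$$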
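Substituting into the budget constraint, we find that

$$\psi^{-1/\gamma} = \frac{W_0}{\Pi_0\,\hat Q_0},$$

where we have defined

$$\begin{aligned}
\hat Q_t = {} &\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}\,E_t\left[\zeta(t,s)^{1-\frac{1}{\gamma}}\right]E_t\left[\frac{\hat\zeta(t,s)}{\hat\eta(t,s)}\right]\Gamma(s)\,ds \\
&+ \varepsilon_2^{1/\gamma}\,e^{-\frac{\beta}{\gamma}(T-t)}\,E_t\left[\zeta(t,T)^{1-\frac{1}{\gamma}}\right]E_t\left[\frac{\hat\zeta(t,T)}{\hat\eta(t,T)}\right]\Gamma(T)
\end{aligned}$$

with

$$\Gamma(s) = E\left[\hat\eta(0,s)^{\gamma-1}\right]^{\frac{1}{\gamma}}\,E\left[\frac{\hat\zeta(0,s)}{\hat\eta(0,s)}\right]^{-\frac{1}{\gamma}}, \qquad s\in[0,T].$$

The optimal nominal wealth process becomes

$$W_t^* = \frac{W_0}{\hat Q_0}\,e^{-\frac{\beta}{\gamma}t}\,\eta(0,t)\,\zeta(0,t)^{-\frac{1}{\gamma}}\,\hat Q_t. \tag{69}$$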
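Define

$$q_t^s = E_t\left[\zeta(t,s)^{1-\frac{1}{\gamma}}\right].$$

Since this expectation only involves uncertainty induced by $z$, the dynamics will be of the form

$$dq_t^s = q_t^s\left[\mu_{qt}^s\,dt + (\sigma_{qt}^s)^{\top}dz_t\right].$$

Similarly define

$$\hat q_t^s = E_t\left[\frac{\hat\zeta(t,s)}{\hat\eta(t,s)}\right],$$

which only involves uncertainty induced by $\hat z$ so the dynamics will take the form

$$d\hat q_t^s = \hat q_t^s\left[\hat\mu_{qt}^s\,dt + \hat\sigma_{qt}^s\,d\hat z_t\right].$$

Consequently, the dynamics of $\hat Q_t$ is

$$d\hat Q_t = \hat Q_t\left[\dots\,dt + \sigma_{Qt}^{\top}dz_t + \hat\sigma_{Qt}\,d\hat z_t\right],$$

where

$$\sigma_{Qt} = \frac{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}\,\Gamma(s)\,q_t^s\,\hat q_t^s\,\sigma_{qt}^s\,ds + \varepsilon_2^{1/\gamma}\,e^{-\frac{\beta}{\gamma}(T-t)}\,\Gamma(T)\,q_t^T\,\hat q_t^T\,\sigma_{qt}^T}{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}\,\Gamma(s)\,q_t^s\,\hat q_t^s\,ds + \varepsilon_2^{1/\gamma}\,e^{-\frac{\beta}{\gamma}(T-t)}\,\Gamma(T)\,q_t^T\,\hat q_t^T}, \tag{70}$$

$$\hat\sigma_{Qt} = \frac{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}\,\Gamma(s)\,q_t^s\,\hat q_t^s\,\hat\sigma_{qt}^s\,ds + \varepsilon_2^{1/\gamma}\,e^{-\frac{\beta}{\gamma}(T-t)}\,\Gamma(T)\,q_t^T\,\hat q_t^T\,\hat\sigma_{qt}^T}{\varepsilon_1^{1/\gamma}\int_t^T e^{-\frac{\beta}{\gamma}(s-t)}\,\Gamma(s)\,q_t^s\,\hat q_t^s\,ds + \varepsilon_2^{1/\gamma}\,e^{-\frac{\beta}{\gamma}(T-t)}\,\Gamma(T)\,q_t^T\,\hat q_t^T}. \tag{71}$$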
Applying Itô's Lemma we find that the dynamics of the optimal wealth process is

$$\begin{aligned}
dW_t^* &= \dots\,dt + W_t^*\left[\frac{d\eta(0,t)}{\eta(0,t)} - \frac{1}{\gamma}\,\frac{d\zeta(0,t)}{\zeta(0,t)} + \frac{d\hat Q_t}{\hat Q_t}\right] \\
&= \dots\,dt + W_t^*\left(\frac{1}{\gamma}\lambda_t + \sigma_{Qt} + \sigma_{\Pi t}\right)^{\top}dz_t + W_t^*\,\hat\sigma_{Qt}\,d\hat z_t.
\end{aligned} \tag{72}$$

Comparing with the nominal wealth process (7) for a given investment strategy, we see that the optimal choice of consumption and terminal wealth can only be financed with a portfolio of traded securities if $\hat\sigma_{Qt}$ is identically equal to zero. In that case, the portfolio is given by (23), but with $\sigma_{Qt}$ defined in (70), and we have obtained a generalized version of Theorem 1 which encompasses the Brennan and Xia analysis as a very special case. The term $\hat\sigma_{Qt}$ will be zero whenever $\hat q_t^s$ is deterministic for each $s$, i.e. whenever there is no uncertainty about how the expectations

$$E_t\left[\frac{\hat\zeta(t,s)}{\hat\eta(t,s)}\right] = E_t\left[\exp\left\{-\frac{1}{2}\int_t^s\left(\hat\lambda_u^2 - \hat\sigma_{\Pi u}^2\right)du - \int_t^s\left(\hat\lambda_u + \hat\sigma_{\Pi u}\right)d\hat z_u\right\}\right]$$

are to be updated over time. This is satisfied when $\hat\lambda_u$ and $\hat\sigma_{\Pi u}$ are both deterministic functions of time. In the model of Brennan and Xia (2002), it is therefore the assumptions of constant $\hat\lambda$ and $\hat\sigma_{\Pi}$ (in their notation $-\varphi_u$ and $\xi_u$, respectively) that are crucial for obtaining a closed-form solution for the optimal consumption and portfolio strategies in the incomplete market setting, while the other assumptions on the dynamics of rates and prices are not needed in this respect.
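As a worked illustration of this condition, consider the case of constant $\hat\lambda$ and $\hat\sigma_{\Pi}$. The stochastic integral in the exponent is then normally distributed with mean zero and variance $(\hat\lambda + \hat\sigma_{\Pi})^2(s-t)$, so the expectation can be evaluated with the moment generating function of the normal distribution:

$$\begin{aligned}
\hat q_t^s &= \exp\left\{-\frac{1}{2}\left(\hat\lambda^2 - \hat\sigma_{\Pi}^2\right)(s-t)\right\}\,E_t\left[\exp\left\{-\left(\hat\lambda + \hat\sigma_{\Pi}\right)\left(\hat z_s - \hat z_t\right)\right\}\right] \\
&= \exp\left\{-\frac{1}{2}\left(\hat\lambda^2 - \hat\sigma_{\Pi}^2\right)(s-t) + \frac{1}{2}\left(\hat\lambda + \hat\sigma_{\Pi}\right)^2(s-t)\right\} \\
&= \exp\left\{\hat\sigma_{\Pi}\left(\hat\sigma_{\Pi} + \hat\lambda\right)(s-t)\right\}.
\end{aligned}$$

This is a deterministic function of $t$ and $s$, so $\hat\sigma_{qt}^s = 0$ for every $s$, and (71) then gives $\hat\sigma_{Qt} = 0$.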
9.6 Concluding remarks

In this chapter, we have derived optimal consumption and investment strategies of a CRRA investor in a complete capital market setting, and surveyed related and recent literature on optimal consumption and investment strategies. Our analysis has stressed the risks individuals want to hedge, and how to implement optimal real consumption strategies by investing in nominal securities. In line with results in Munk and Sørensen (2004), we have thus shown that a CRRA investor faced with Gaussian interest rate uncertainty will optimally hedge changes in future interest rates by investing in a real coupon bond with real payments that match the forward expected real consumption pattern. Furthermore, this result has been extended to a case under specialized inflation uncertainty where the same investment strategy applies, but using similar nominal bonds in implementing the optimal hedge strategy. In addition, several extensions of the general modeling framework have been given and discussed, including HARA utility, habit formation, labor income, and non-diversifiable inflation risk.
Appendix: A Leibnitz-type rule for stochastic processes

Lemma 1. Let $Z_t^s$ be a family of stochastic processes so that for each fixed $s \in [0,T]$

$$dZ_t^s = \mu_t^s\,dt + \sigma_t^s\,dz_t, \qquad 0 \le t \le s,$$

where $\sigma_t^s$ satisfies

(a) $\int_0^T (\sigma_t^s)^2\,dt < \infty$ for all $s \in [0,T]$,

(b) $\int_0^T \left(\int_t^T \sigma_t^s\,ds\right)^2 dt < \infty$

almost surely. Let $Y_t$ be defined by

$$Y_t = \int_t^T Z_t^s\,ds.$$

Then the dynamics of $Y_t$ are given by

$$dY_t = \left(\int_t^T \mu_t^s\,ds - Z_t^t\right)dt + \left(\int_t^T \sigma_t^s\,ds\right)dz_t.$$
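Proof: The proof is an application of the generalized Fubini-type rule for stochastic processes stated and applied in the Appendix of Heath, Jarrow and Morton (1992). Let $t_0 \le t_1$, then since

$$Z_{t_1}^s = Z_{t_0}^s + \int_{t_0}^{t_1}\mu_t^s\,dt + \int_{t_0}^{t_1}\sigma_t^s\,dz_t, \tag{73}$$

we have

$$\begin{aligned}
Y_{t_1} &= \int_{t_1}^T Z_{t_0}^s\,ds + \int_{t_1}^T\!\int_{t_0}^{t_1}\mu_t^s\,dt\,ds + \int_{t_1}^T\!\int_{t_0}^{t_1}\sigma_t^s\,dz_t\,ds \\
&= \int_{t_1}^T Z_{t_0}^s\,ds + \int_{t_0}^{t_1}\!\int_{t_1}^T\mu_t^s\,ds\,dt + \int_{t_0}^{t_1}\!\int_{t_1}^T\sigma_t^s\,ds\,dz_t \\
&= Y_{t_0} + \int_{t_0}^{t_1}\!\int_t^T\mu_t^s\,ds\,dt + \int_{t_0}^{t_1}\!\int_t^T\sigma_t^s\,ds\,dz_t \\
&\quad - \int_{t_0}^{t_1} Z_{t_0}^s\,ds - \int_{t_0}^{t_1}\!\int_t^{t_1}\mu_t^s\,ds\,dt - \int_{t_0}^{t_1}\!\int_t^{t_1}\sigma_t^s\,ds\,dz_t \\
&= Y_{t_0} + \int_{t_0}^{t_1}\!\int_t^T\mu_t^s\,ds\,dt + \int_{t_0}^{t_1}\!\int_t^T\sigma_t^s\,ds\,dz_t \\
&\quad - \int_{t_0}^{t_1} Z_{t_0}^s\,ds - \int_{t_0}^{t_1}\!\int_{t_0}^{s}\mu_t^s\,dt\,ds - \int_{t_0}^{t_1}\!\int_{t_0}^{s}\sigma_t^s\,dz_t\,ds \\
&= Y_{t_0} + \int_{t_0}^{t_1}\!\int_t^T\mu_t^s\,ds\,dt + \int_{t_0}^{t_1}\!\int_t^T\sigma_t^s\,ds\,dz_t \\
&\quad - \int_{t_0}^{t_1}\left(Z_{t_0}^s + \int_{t_0}^{s}\mu_t^s\,dt + \int_{t_0}^{s}\sigma_t^s\,dz_t\right)ds \\
&= Y_{t_0} + \int_{t_0}^{t_1}\!\int_t^T\mu_t^s\,ds\,dt + \int_{t_0}^{t_1}\!\int_t^T\sigma_t^s\,ds\,dz_t - \int_{t_0}^{t_1} Z_t^t\,dt,
\end{aligned}$$

where the Fubini rule is used in the second and fourth equality while the first equality follows by inserting (73) in the definition of $Y_t$ and, also, the last equality follows by using (73) and the fact that

$$\int_{t_0}^{t_1} Z_t^t\,dt = \int_{t_0}^{t_1} Z_s^s\,ds;$$

the other equalities follow by pure manipulation of the involved expressions. The claim is now established. □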
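The lemma can be checked numerically on a simple example. The sketch below uses the illustrative family $Z_t^s = s + t + s\,z_t$ (so $\mu_t^s = 1$ and $\sigma_t^s = s$), for which $Y_t$ is available in closed form, and compares that closed form with an Euler discretization of the dynamics stated in Lemma 1 along one simulated Brownian path. The choice of family, the step size, and the function names are assumptions made purely for illustration.

```python
import math
import random

T = 1.0  # upper integration limit in Y_t = int_t^T Z_t^s ds


def z_diagonal(t, z):
    """Z_t^t for the illustrative family Z_t^s = s + t + s * z_t."""
    return 2.0 * t + t * z


def y_exact(t, z):
    """Closed form of int_t^T (s + t + s*z) ds."""
    return 0.5 * (T ** 2 - t ** 2) * (1.0 + z) + t * (T - t)


def max_discrepancy(n_steps=2000, seed=0):
    """Largest gap between the closed form and the Euler scheme built from Lemma 1."""
    rng = random.Random(seed)
    dt = T / n_steps
    t, z = 0.0, 0.0
    y = y_exact(t, z)
    worst = 0.0
    for _ in range(n_steps):
        dz = rng.gauss(0.0, math.sqrt(dt))
        drift = (T - t) - z_diagonal(t, z)   # int_t^T mu_t^s ds - Z_t^t, with mu_t^s = 1
        diffusion = 0.5 * (T ** 2 - t ** 2)  # int_t^T sigma_t^s ds, with sigma_t^s = s
        y += drift * dt + diffusion * dz
        t += dt
        z += dz
        worst = max(worst, abs(y - y_exact(t, z)))
    return worst


if __name__ == "__main__":
    print(max_discrepancy())
```

The discrepancy should shrink as the number of steps grows; dropping the $-Z_t^t$ term from the drift would instead leave an error that does not vanish with the step size.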
References:
Amin, K. I., and Jarrow, R. A. (1992) “Pricing Options on Risky Assets in a Stochastic Interest Rate Economy.” Mathematical Finance 2(4): 217–237.
Andersen, T. G., Benzoni, L., and Lund, J. (2002) “An Empirical Investigation of Continuous-Time Models for Equity Returns.” Journal of Finance 57(3): 1239–1284.
Barberis, N. (2000) “Investing for the Long Run when Returns are Predictable.” The Journal of Finance 55: 225–264.
Bodie, Z., Merton, R. C., and Samuelson, W. F. (1992) “Labor Supply Flexibility and Portfolio Choice in a Life Cycle Model.” Journal of Economic Dynamics and Control 16: 427–449.
Brace, A., and Musiela, M. (1994) “A Multifactor Gauss Markov Implementation of Heath, Jarrow, and Morton.” Mathematical Finance 4(3): 259–283.
Breeden, D. T. (1979) “An Intertemporal Asset Pricing Model with Stochastic Consumption and Investment Opportunities.” Journal of Financial Economics 7: 265–296.
Brennan, M. J., and Xia, Y. (2000) “Stochastic Interest Rates and the Bond-Stock Mix.” European Finance Review 4(2): 197–210.
Brennan, M. J., and Xia, Y. (2002) “Dynamic Asset Allocation under Inflation.” The Journal of Finance 57(3): 1201–1238.
Browning, M. (1991) “A Simple Nonadditive Preference Structure for Models of Household Behavior over Time.” Journal of Political Economy 99(3): 607–637.
Campbell, J. Y., and Cochrane, J. H. (1999) “By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior.” Journal of Political Economy 107: 205–251.
Campbell, J. Y., and Viceira, L. M. (2001) “Who Should Buy Long-Term Bonds?” American Economic Review 91(1): 99–127.
Canner, N., Mankiw, N. G., and Weil, D. N. (1997) “An Asset Allocation Puzzle.” American Economic Review 87(1): 181–191.
Carverhill, A. (1994) “When is the Short Rate Markovian?” Mathematical Finance 4(4): 305–312.
Chacko, G., and Viceira, L. M. (2005) “Dynamic Consumption and Portfolio Choice with Stochastic Volatility in Incomplete Markets.” Review of Financial Studies 18(4): 1369–1402.
Cochrane, J. H. (2001) Asset Pricing. Princeton University Press.
Constantinides, G. M. (1990) “Habit Formation: A Resolution of the Equity Premium Puzzle.” Journal of Political Economy 98: 519–543.
Cox, J. C., and Huang, C.-F. (1989) “Optimal Consumption and Portfolio Policies when Asset Prices Follow a Diffusion Process.” Journal of Economic Theory 49: 33–83.
Cox, J. C., and Huang, C.-F. (1991) “A Variational Problem Arising in Financial Economics.” Journal of Mathematical Economics 20: 465–487.
Cox, J. C., Ingersoll, J. E. Jr., and Ross, S. A. (1985) “A Theory of the Term Structure of Interest Rates.” Econometrica 53(2): 385–407.
Cuoco, D. (1997) “Optimal Consumption and Equilibrium Prices with Portfolio Constraints and Stochastic Income.” Journal of Economic Theory 71(1): 33–73.
Cuoco, D., and Liu, H. (2000) “Optimal Consumption of a Divisible Durable Good.” Journal of Economic Dynamics and Control 24(4): 561–613.
Cvitanić, J., and Karatzas, I. (1992) “Convex Duality in Constrained Portfolio Optimization.” The Annals of Applied Probability 2(4): 767–818.
Damgaard, A., Fuglsbjerg, B., and Munk, C. (2003) “Optimal Consumption and Investment Strategies with a Perishable and an Indivisible Durable Consumption Good.” Journal of Economic Dynamics and Control 28(2): 209–253.
Duffie, D. (2001) Dynamic Asset Pricing Theory (Third ed.). Princeton University Press.
Duffie, D., Fleming, W., Soner, H. M., and Zariphopoulou, T. (1997) “Hedging in Incomplete Markets with HARA Utility.” Journal of Economic Dynamics and Control 21(4–5): 753–782.
Duffie, D., and Kan, R. (1996) “A Yield-Factor Model of Interest Rates.” Mathematical Finance 6(4): 379–406.
Geman, H. (1989) The Importance of the Forward Neutral Probability in a Stochastic Approach of Interest Rates. Working paper, ESSEC.
Grasselli, M. (2000) HJB Equations with Stochastic Interest Rates and HARA Utility Functions. Working paper, CREST, Malakoff Cedex, France.
Heston, S. L. (1993) “A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options.” Review of Financial Studies 6(2): 327–343.
Heath, D., Jarrow, R., and Morton, A. (1992) “Bond Pricing and the Term Structure of Interest Rates: A New Methodology for Contingent Claims Valuation.” Econometrica 60(1): 77–105.
Jamshidian, F. (1987) Pricing of Contingent Claims in the One Factor Term Structure Model. Working paper, Merrill Lynch Capital Markets.
Karatzas, I., Lehoczky, J. P., and Shreve, S. E. (1987) “Optimal Portfolio and Consumption Decisions for a “Small Investor” on a Finite Horizon.” SIAM Journal on Control and Optimization 25(6): 1557–1586.
Karatzas, I., and Shreve, S. E. (1998) Methods of Mathematical Finance, Volume 39 of Applications of Mathematics. New York: Springer-Verlag.
Kim, T. S., and Omberg, E. (1996) “Dynamic Nonmyopic Portfolio Behavior.” The Review of Financial Studies 9(1): 141–161.
Koo, H. K. (1998) “Consumption and Portfolio Selection with Labor Income: A Continuous Time Approach.” Mathematical Finance 8(1): 49–65.
Leippold, M., and Wu, L. (2000) Quadratic Term Structure Models. Working paper, University of St. Gallen and Fordham University.
Liu, J. (1999) Portfolio Selection in Stochastic Environments. Working paper, Stanford University.
Liu, J., and Pan, J. (2003) “Dynamic Derivative Strategies.” Journal of Financial Economics 49(3): 401–430.
Merton, R. C. (1969) “Lifetime Portfolio Selection Under Uncertainty: The Continuous-Time Case.” Review of Economics and Statistics 51: 247–257. Reprinted as Chapter 4 in Merton (1992).
Merton, R. C. (1971) “Optimum Consumption and Portfolio Rules in a Continuous-Time Model.” Journal of Economic Theory 3: 373–413. Erratum: Merton (1973a). Reprinted as Chapter 5 in Merton (1992).
Merton, R. C. (1973a) “Erratum.” Journal of Economic Theory 6: 213–214.
Merton, R. C. (1973b) “An Intertemporal Capital Asset Pricing Model.” Econometrica 41(5): 867–887. Reprinted in an extended form as Chapter 15 in Merton (1992).
Merton, R. C. (1992) Continuous-Time Finance. Padstow, UK: Basil Blackwell Inc.
Munk, C. (2000) “Optimal Consumption-Investment Policies with Undiversifiable Income Risk and Liquidity Constraints.” Journal of Economic Dynamics and Control 24(9): 1315–1343.
Munk, C. (2002) Portfolio and Consumption Choice with Stochastic Investment Opportunities and Habit Formation in Preferences. Working paper, University of Southern Denmark.
Munk, C., and Sørensen, C. (1999) Optimal Investment Strategies with a Heath-Jarrow-Morton Term Structure of Interest Rates. Working paper, University of Southern Denmark at Odense and Copenhagen Business School.
Munk, C., and Sørensen, C. (2004) “Optimal Consumption and Investment Strategies with Stochastic Interest Rates.” Journal of Banking and Finance 28(8): 1987–2013.
Munk, C., Sørensen, C., and Vinther, T. N. (2004) “Dynamic Asset Allocation Under Mean-Reverting Returns, Stochastic Interest Rates and Inflation Uncertainty: Are Popular Recommendations Consistent with Rational Behavior?” International Review of Economics and Finance 13(2): 141–166.
Ocone, D. L., and Karatzas, I. (1991) “A Generalized Clark Representation Formula, with Application to Optimal Portfolios.” Stochastics and Stochastic Reports 34: 187–220.
Pearson, N. D., and Sun, T.-S. (1994) “Exploiting the Conditional Density in Estimating the Term Structure: An Application to the Cox, Ingersoll, and Ross Model.” Journal of Finance 49(4): 1279–1304.
Pliska, S. R. (1986) “A Stochastic Calculus Model of Continuous Trading: Optimal Portfolios.” Mathematics of Operations Research 11(2): 371–382.
Samuelson, P. A. (1969) “Lifetime Portfolio Selection by Dynamic Stochastic Programming.” Review of Economics and Statistics 51: 239–246.
Schroder, M., and Skiadas, C. (2002) “An Isomorphism between Asset Pricing Models with and without Linear Habit Formation.” Review of Financial Studies 15(4): 1189–1221.
Sørensen, C. (1999) “Dynamic Asset Allocation and Fixed Income Management.” Journal of Financial and Quantitative Analysis 34: 513–531.
Vasicek, O. (1977) “An Equilibrium Characterization of the Term Structure.” Journal of Financial Economics 5: 177–188.
Wachter, J. A. (2002) “Portfolio and Consumption Decisions under Mean-Reverting Returns: An Exact Solution for Complete Markets.” Journal of Financial and Quantitative Analysis 37(1): 63–91.
Xia, Y. (2001) Long Term Bond Markets and Investor Welfare. Working paper, The Wharton School.
Chapter 10 Differential Systems in Finance and Life Insurance
Mogens Steffensen
Department of Applied Mathematics and Statistics, Institute for Mathematical Sciences, University of Copenhagen
10.1 Introduction
The mathematics of finance and the mathematics of life insurance are always intersecting. Life insurance contracts specify an exchange of streams of payments between the insurance company and the contract holder. These payment streams may cover the lifetime of the contract holder. Therefore, time valuation of money is crucial for any measurement of payments due in the past as well as in the future. Life insurance companies never put their money under the pillow, and accumulation and distribution of capital gains were always part of the insurance business. With respect to the future, appropriate discounting of contractual obligations improves the estimates of liabilities. Financial contracts specify an exchange of streams of payments as well. However, while the life insurance payment stream is partly linked to the state of health of the insured, the financial payment stream is linked to the ’state of health’ of an enterprise. That could be the stream of dividends distributed to the owners of the enterprise or the stream of claims contingent on the price of the enterprise paid to the holder of a so-called derivative. The discipline of personal finance is particularly closely linked to life insurance. Decisions on, e.g., consumption, investment, retirement, and insurance coverage belong to some of the most substantial lifetime financial decisions of an individual.
Valuation of payment streams is probably the most important discipline in the intersection between finance and life insurance. Various valuation dogmas are in play here. The principle of no arbitrage and the market efficiency assumption are taken as given in the majority of modern academic approaches to valuation of financial contracts. Life insurance contract valuation typically relies on independence, or at least asymptotic independence, between insured lives. Then the law of large numbers ensures that reasonable estimates can be found if the portfolio of insurance contracts is sufficiently large. Both dogmas reduce the valuation problem to being primarily a matter of calculation of conditional expected values.
Conditional expected values can be approached by several different techniques. Monte Carlo simulation, for instance, exploits the property that conditional expected values can be approximated by empirical means. Sometimes, however, one can go at least part of the way by explicit calculations, for example, when a series of auxiliary models with explicit expected values converges towards the real model in such a way that the series of explicit expected values converges to the desired quantity. A different route can be taken when the underlying stochastic system is Markovian, i.e., if given the present state, the future is independent of the past. Then solutions to certain systems of deterministic differential equations can often be proved to characterize the conditional expected values. This is the route taken to various valuation problems and optimization problems in finance and life insurance in this exposition. Here, we just state the differential equations, but do not discuss possible numerical solutions to them.
Valuation is performed by calculation of conditional expected values. 
However, the claim to be valuated may contain decision processes, in which case the valuation problem is extended to a matter of calculating extrema of conditional expected values. The extrema are taken over the set of admissible decision processes. Extrema of conditional expected values can also be characterized by differential equations, albeit more involved ones. Decision problems that are not part of a valuation problem are relevant as well and are studied here. We solve both a problem of minimizing expected quadratic disutility and a problem of maximizing expected power utility. In both cases, we state differential equations characterizing the solutions. Actually, from a technical point of view, valuation under decision making and utility optimization basically only differ in that the first measures streams of payments and the second measures streams of utility of payments. Even from a qualitative point of view the disciplines are closely related, e.g. in the valuation approach called utility indifference pricing, which, however, we shall not deal with here.
The models used in this article combine the geometric Brownian motion modelling of financial assets with the finite state Markov chain modelling of the state of a life insurance policy. However, the finite state Markov chain model appears in finance in other connections than life insurance. Therefore the stated differential equations apply to other fields of finance. One example is reduced form modelling of credit risk where the ’state of health’, or in this connection creditworthiness, of an enterprise can be modelled by a Markov chain. Another example is valuation of innovative enterprise pipelines. Many types of innovative projects may be modelled by a finite state Markov chain. In e.g. drug development, the drug candidate can be in different states (phases) and certain milestone payments are connected to certain states of the drug candidate.
The list of discoverers in the field of Markov processes and systems of partial differential equations is awe-inspiring: Feller, Kolmogorov and Dynkin are the fathers of the connection between Markov processes and mathematical analysis. After them, contributions by Feynman, Kac, Davis, Bensoussan and Lions, among others, are relevant in the context of this article. However, we concentrate on a few references on more recent applications related to the material of this article and enclose a sectionwise outline.
Section 2: Thiele wrote down in 1875 an ordinary differential equation for the reserve of a life insurance contract. Interestingly, Thiele was actually also the first to model the Brownian motion mathematically in connection with his studies of time series, see Thiele (1880). Thiele’s work on reserves in life insurance was generalized by Hoem (1969) and further by Norberg (1991). 
The Nobel Prize-winning work by Black and Scholes (1973) and Merton (1973) gave new insight into the pricing of claims contingent on underlying financial processes. The theory of option pricing has since then turned into one of the larger industries of applied mathematics worldwide. Shortly after, applications to insurance products with contingent claims were suggested by Brennan and Schwartz (1976). The first hybrid between Thiele’s and Black and Scholes’ differential equations appeared in Aase and Persson (1994). Differential equations for the reserve that connect Hoem (1969) with Aase and Persson (1994) appeared in Steffensen (2000). We state and derive the differential equations of Thiele, Black and Scholes and a particular hybrid equation.
Section 3: Applications to more general life insurance products are based on the notions of surplus and dividend distribution. These were studied by Norberg (1999, 2001) who also valuated future dividends by systems of ordinary differential equations. Steffensen (2006b) approached the dividend valuation problem by solving systems of partial differential equations conforming with a particular specification of the underlying financial market. We state the partial differential equation studied in Steffensen (2006b), including a particular case with a semi-explicit solution.
Section 4: Contingent claims with early exercise options are connected to the theory of optimal stopping and variational inequalities. Grosen and Jørgensen (2000) realized the connection to surrender options in life insurance. In Steffensen (2002), the connection was generalized to general intervention options and the Markov chain model for the insurance policy. We state and prove the variational inequality for the price of a contingent claim and state the corresponding system for an insurance contract with a surrender option.
Section 5: Optimal arrangement of payment streams in life insurance was first based on the linear regulator. We refer the reader to Fleming and Rishel (1975) for the linear regulator and Cairns (2000) for an overview of its applications to life insurance. The linear regulator was combined with the Markov chain model of an insurance contract in Steffensen (2006a). We state and prove the Bellman equation for the linear regulator, and state the Bellman equation derived in Steffensen (2006a), including an indication of the solution.
Section 6: The more conventional approach to decision making in finance is based on utility optimization, see Korn (1997) and Merton (1990). Merton (1990) approached decision problems in personal finance and introduced uncertainty of lifetimes. 
A connection to the Markov chain model of an insurance contract was suggested in Steffensen (2004). In Nielsen (2004) a related problem is solved. We state the Bellman equations for the decision problems solved by Merton (1990) and Steffensen (2004), including an indication of the solution. Both Steffensen and Nielsen approach the decision problem of the life insurance company. The methodology used also applies, however, to related decision problems of the policy holder. These problems are studied by Kraft and Steffensen (2006) who generalize original results by Richard (1975).
10.2 The differential equations of Thiele and Black-Scholes
10.2.1 Thiele’s differential equation
In this section we state and derive the differential equation for the so-called reserves connected to a life insurance contract with deterministic payments. We give a proof for the differential equation that corresponds to the proofs that will appear in the rest of the article. We end the section by considering the stochastic differential equation for the reserve with application to unit-link life insurance. See Hoem (1969) and Norberg (1991) for differential equations for the reserve.
We consider an insurance policy issued at time 0 and terminating at a fixed finite time $n$. There is a finite set of states of the policy, $\mathcal{J} = \{0, \ldots, J\}$. Let $Z(t)$ denote the state of the policy at time $t \in [0, n]$ and let $Z$ be an RCLL process (right-continuous, left limits). By convention, 0 is the initial state, i.e. $Z(0) = 0$. Then also the associated $J$-dimensional counting process $N = (N^k)_{k \in \mathcal{J}}$ is an RCLL process, where $N^k$ counts the number of transitions into state $k$, i.e.
$$N^k(t) = \#\{s \mid s \in (0, t],\ Z(s-) \neq k,\ Z(s) = k\}.$$
The history of the policy up to and including time $t$ is represented by the sigma-algebra $\mathcal{F}^Z(t) = \sigma\{Z(s),\ s \in [0, t]\}$. The development of the policy is given by the filtration $\mathbf{F}^Z = (\mathcal{F}^Z(t))_{t \in [0, n]}$.
Let $B(t)$ denote the total amount of contractual benefits less premiums payable during the time interval $[0, t]$. We assume that it develops in accordance with the dynamics
$$dB(t) = dB^{Z(t)}(t) + \sum_{k: k \neq Z(t-)} b^{Z(t-)k}(t)\,dN^k(t). \tag{1}$$
Here, $B^j$ is a deterministic and sufficiently regular function specifying payments due during sojourns in state $j$, and $b^{jk}$ is a deterministic and sufficiently regular function specifying payments due upon transition from state $j$ to state $k$. We assume that each $B^j$ decomposes into an absolutely continuous part and a discrete part, i.e.
$$dB^j(t) = b^j(t)\,dt + \Delta B^j(t). \tag{2}$$
Here, $\Delta B^j(t) = B^j(t) - B^j(t-)$, when different from 0, is a jump representing a lump sum payable at time $t$ if the policy is then in state $j$. The set of time points with jumps in $(B^j)_{j \in \mathcal{J}}$ is $\mathcal{D} = \{t_0, t_1, \ldots, t_q\}$ where $0 = t_0 < t_1 < \ldots < t_q = n$.
We assume that $Z$ is a time-continuous Markov process on the state space $\mathcal{J}$. Furthermore, we assume that there exist deterministic and sufficiently regular functions $\mu^{jk}(t)$ such that $N^k$ admits the stochastic intensity process $(\mu^{Z(t-)k}(t))_{t \in [0, n]}$, i.e.
$$M^k(t) = N^k(t) - \int_0^t \mu^{Z(s)k}(s)\,ds$$
constitutes an $\mathbf{F}^Z$-martingale.
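The martingale property of the compensated counting process can be checked by simulation. The following is a minimal sketch, not part of the original exposition: it assumes a three-state disability model (0 active, 1 disabled, 2 dead) with constant intensities whose numerical values are purely hypothetical, simulates paths of $Z$, and averages $M^k(t) = N^k(t) - \int_0^t \mu^{Z(s)k}\,ds$ over many paths. The names `MU`, `compensated_count` and `mean_compensated_count` are illustrative only.

```python
import random

# Hypothetical constant transition intensities (assumed values):
# state 0 = active, 1 = disabled, 2 = dead (absorbing).
MU = {0: {1: 0.05, 2: 0.01}, 1: {0: 0.02, 2: 0.03}, 2: {}}


def compensated_count(mu, k, horizon, rng):
    """Simulate one path of Z on [0, horizon], started in state 0, and return
    M^k(horizon) = N^k(horizon) - int_0^horizon mu^{Z(s)k} ds."""
    state, t, jumps_into_k, compensator = 0, 0.0, 0, 0.0
    while t < horizon:
        rates = mu[state]
        total = sum(rates.values())
        if total == 0.0:  # absorbing state: no further transitions
            break
        wait = rng.expovariate(total)        # exponential sojourn time
        sojourn = min(wait, horizon - t)     # part of the sojourn inside [0, horizon]
        compensator += rates.get(k, 0.0) * sojourn
        t += wait
        if t >= horizon:
            break
        # choose the destination with probability proportional to its intensity
        u = rng.random() * total
        for target, rate in rates.items():
            u -= rate
            if u <= 0.0:
                break
        state = target
        if state == k:
            jumps_into_k += 1
    return jumps_into_k - compensator


def mean_compensated_count(mu, k, horizon, paths, seed=0):
    """Empirical mean of M^k(horizon); it should be close to zero."""
    rng = random.Random(seed)
    return sum(compensated_count(mu, k, horizon, rng) for _ in range(paths)) / paths
```

With a large number of paths the empirical mean of $M^k(t)$ should be close to zero for each $k$, up to Monte Carlo error, which is what the martingale property with $M^k(0) = 0$ implies.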
[Figure 1: Disability model with mortality, disability, and possible recovery. States: 0 (active), 1 (disabled), 2 (dead), with transitions between active and disabled in both directions and from either of them to dead.]
Figure 1 illustrates the disability model used to describe a policy on a single life, with payments depending on the state of health of the insured.
We assume that the investment portfolio earns return on investment by a constant interest rate $r$. We use the notation $\int_s^t = \int_{(s,t]}$ throughout and introduce the short-hand notation
$$\int_t^s r = \int_t^s r(\tau)\,d\tau = r(s - t).$$
Throughout we use subscript for partial differentiation, e.g. $V_t^j(t) = \frac{\partial}{\partial t} V^j(t)$.
The insurer needs an estimate of the future obligations stipulated in the contract. The usual approach to such a quantity is to think of the insurer having issued a large number of similar contracts with payment streams linked to independent lives. The law of large numbers then leaves the insurer with a liability per insured that tends to the expected present value of future payments, given the past history of the policy, as the number of policy holders tends to infinity. We say that the valuation technique is based on diversification of risk. The conditional expected present value is called the reserve and appears on the liability side of the insurer’s balance scheme. By the Markov assumption the reserve is given by
$$V^{Z(t)}(t) = E\left[\int_t^n e^{-\int_t^s r}\,dB(s) \,\middle|\, Z(t)\right]. \tag{3}$$
We introduce the differential operator $A$, the rate of payments $\beta$, and the updating sum $R$,
$$AV^j(t) = \sum_{k: k \neq j} \mu^{jk}(t)\left(V^k(t) - V^j(t)\right),$$
$$\beta^j(t) = b^j(t) + \sum_{k: k \neq j} \mu^{jk}(t)\,b^{jk}(t),$$
$$R^j(t) = \Delta B^j(t) + V^j(t) - V^j(t-).$$
We can now present the first differential equation, in general spoken of as Thiele’s differential equation.
Proposition 1. The statewise reserve defined in (3) is characterized by the following deterministic system of backward ordinary differential equations,
$$0 = V_t^j(t) + AV^j(t) + \beta^j(t) - rV^j(t), \quad t \notin \mathcal{D}, \tag{4a}$$
$$0 = R^j(t), \quad t \in \mathcal{D}, \tag{4b}$$
$$0 = V^j(n). \tag{4c}$$
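Although numerical methods are not discussed in this exposition, the backward structure of Proposition 1 is easy to illustrate. The following is a minimal sketch under stated assumptions: intensities and payment functions are passed in as ordinary Python functions, the only lump sum is paid at time $n$, and the system is integrated backward from $V^j(n-) = \Delta B^j(n)$ with classical Runge-Kutta steps. The function names and all numerical values are hypothetical and serve only as an example.

```python
import math


def thiele_rhs(t, V, r, mu, b, b_trans):
    """Solve (4a) for the time derivative: V_t^j = r V^j - beta^j - A V^j."""
    states = range(len(V))
    out = []
    for j in states:
        beta = b(j, t)  # continuous payment rate b^j(t)
        AV = 0.0
        for k in states:
            if k != j:
                m = mu(j, k, t)
                beta += m * b_trans(j, k, t)   # mu^{jk} b^{jk}
                AV += m * (V[k] - V[j])        # mu^{jk} (V^k - V^j)
        out.append(r * V[j] - beta - AV)
    return out


def solve_thiele(r, mu, b, b_trans, lump_at_n, n, steps=1000):
    """Integrate (4a) backward from n to 0 with classical Runge-Kutta steps.

    lump_at_n[j] is Delta B^j(n), so V^j(n-) = lump_at_n[j] by (4b) and (4c).
    Assumes no lump sums strictly before n. Returns the list of V^j(0).
    """
    h = n / steps
    V = list(lump_at_n)
    f = lambda t, W: thiele_rhs(t, W, r, mu, b, b_trans)
    back = lambda W, K, c: [w - c * k for w, k in zip(W, K)]
    for i in range(steps):
        t = n - i * h
        k1 = f(t, V)
        k2 = f(t - h / 2, back(V, k1, h / 2))
        k3 = f(t - h / 2, back(V, k2, h / 2))
        k4 = f(t - h, back(V, k3, h))
        V = [v - h / 6 * (a + 2 * c + 2 * d + e)
             for v, a, c, d, e in zip(V, k1, k2, k3, k4)]
    return V


# Hypothetical disability model: 0 active, 1 disabled, 2 dead.
mu_dis = lambda j, k, t: {(0, 1): 0.05, (1, 0): 0.02,
                          (0, 2): 0.01, (1, 2): 0.03}.get((j, k), 0.0)
# Premium at rate 0.2 while active, annuity at rate 1 while disabled (assumed).
b_dis = lambda j, t: {0: -0.2, 1: 1.0}.get(j, 0.0)
no_transition_payment = lambda j, k, t: 0.0
reserves = solve_thiele(0.03, mu_dis, b_dis, no_transition_payment,
                        [0.0, 0.0, 0.0], n=10.0)
```

A simple sanity check uses a two-state model (alive, dead) with constant mortality intensity $\mu$: for a pure endowment of 1 at time $n$ the system gives $V^0(t) = e^{-(r+\mu)(n-t)}$, and for a death benefit of 1 it gives $V^0(t) = \frac{\mu}{r+\mu}\left(1 - e^{-(r+\mu)(n-t)}\right)$, both of which the sketch should reproduce to high accuracy.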
In most expositions on the subject, (4a) is written as
$$V_t^j(t) = rV^j(t) - b^j(t) - \sum_{k: k \neq j} \mu^{jk}(t)\,R^{jk}(t),$$
with the so-called sum at risk $R^{jk}(t)$ defined by
$$R^{jk}(t) = b^{jk}(t) + V^k(t) - V^j(t).$$
In the succeeding sections, however, it turns out to be convenient to work with the differential operator abbreviation. We choose to do this already at this stage in order to communicate the cross-sectional similarities.
There are several roads leading to (4). We present a proof that shows that any function solving the differential equation (4) actually equals the reserve defined in (3). Such a result shows that (4) is a sufficient condition on $V$ in the sense that the differential equation characterizes the reserve uniquely. Take an arbitrary function $H^j(t)$ solving (4) and consider the process $H^{Z(t)}(t)$. For this process the following line of equalities holds,
$$\begin{aligned}
H^{Z(t)}(t) &= -\int_t^n d\left(e^{-\int_t^s r} H^{Z(s)}(s)\right) \\
&= -\int_t^n e^{-\int_t^s r}\left(-rH^{Z(s)}(s)\,ds + dH^{Z(s)}(s)\right) \\
&= \int_t^n e^{-\int_t^s r}\Big(dB(s) - \sum_{k: k \neq Z(s-)} R_H^{Z(s-)k}(s)\,dM^k(s)\Big) \\
&\quad - \int_t^n e^{-\int_t^s r}\left(H_s^{Z(s)}(s) + AH^{Z(s)}(s) + \beta^{Z(s)}(s) - rH^{Z(s)}(s)\right)ds \\
&\quad - \sum_{s \in (t, n] \cap \mathcal{D}} e^{-\int_t^s r} R_H^{Z(s)}(s) \\
&= \int_t^n e^{-\int_t^s r}\Big(dB(s) - \sum_{k: k \neq Z(s-)} R_H^{Z(s-)k}(s)\,dM^k(s)\Big).
\end{aligned} \tag{5}$$
The first equality uses the terminal condition (4c), and the last equality uses that $H$ solves (4a) and (4b).
Here $R_H^j$ and $R_H^{jk}$ are defined as $R^j$ and $R^{jk}$ with $V$ replaced by $H$. Now, taking conditional expectation on both sides and assuming sufficient integrability, the integral with respect to the martingale vanishes. This leaves us with the conclusion that any solution to (4) equals the reserve,
$$H^{Z(t)}(t) = V^{Z(t)}(t).$$
We end this section by stating the dynamics of the reserve. Using (4) we get the following,
$$\begin{aligned}
dV^{Z(t)}(t) &= rV^{Z(t)}(t)\,dt - dB^{Z(t)}(t) - \sum_{k: k \neq Z(t)} \mu^{Z(t)k}(t)\,R^{Z(t)k}(t)\,dt \\
&\quad + \sum_{k: k \neq Z(t-)} \left(V^k(t) - V^{Z(t-)}(t)\right)dN^k(t),
\end{aligned} \tag{6}$$
that is a backward stochastic differential equation. The word backward refers to the fact that the solution is fixed by the terminal condition (4c), i.e. $V^{Z(n)}(n) = 0$. Usually this terminal condition is rewritten by (4b) into $V^j(n-) := \Delta B^j(n)$ where $\Delta B^j(n)$ is a fixed terminal payment. However, one can turn things upside down by taking this terminal condition to be the defining relation of $\Delta B^{Z(n)}(n)$ in terms of $V^{Z(n)}(n-)$, i.e. $\Delta B^{Z(n)}(n) := V^{Z(n)}(n-)$ with $V^{Z(n)}(n-)$ given by (6). Then the terminal condition $V^{Z(n)}(n) = 0$ is fulfilled by construction. We then just need an initial condition on $V$ to consider it as a forward stochastic differential equation. Here, one should take the so-called equivalence relation $V^0(0-) = 0$ as initial condition. Hereafter, $V^k(t)$ can be taken to be anything and plays the role of initial condition at time $t$ on $V$, given that the policy jumps into state $k$. See also Kraft and Steffensen (2006), who study in further detail the transformation from a backward to a forward differential equation.
The type of life insurance where terminal payments are linked to the development of the policy is, generally speaking, known as unit-link life insurance. The construction described above is indeed a kind of unit-link life insurance with no guarantee in the sense that there are no predefined bounds on $\Delta B^{Z(n)}(n)$. The simplest implementation comes from putting $V^k(t) = V^{Z(t-)}(t)$ so that
$$dV^{Z(t)}(t) = rV^{Z(t)}(t)\,dt - dB^{Z(t)}(t) - \sum_{k: k \neq Z(t)} \mu^{Z(t)k}(t)\,b^{Z(t)k}(t)\,dt. \tag{7}$$
This means that the reserve is maintained upon transition and the risk sum $R^{jk}(t)$ reduces to the transition payment $b^{jk}(t)$. Then the reserve is really nothing but an account from which the infinitesimal benefits less premiums $dB^{Z(t)}(t)$ are paid and from which the so-called natural risk premium rate
$$\sum_{k: k \neq Z(t)} \mu^{Z(t)k}(t)\,b^{Z(t)k}(t)$$
is withdrawn to cover the benefits $b^{Z(t)k}(t)$, $k \neq Z(t)$.
10.2.2 Black-Scholes differential equation
In this section we state and prove the differential equation for the value of a financial contract with payments linked to a stock index. See Black and Scholes (1973) and Merton (1973) for the original contributions.
We consider a financial contract issued at time 0 and terminating at a fixed finite time $n$. The payoff from the financial contract is linked to the value of a stock index. Let $X(t)$ denote the stock index at time $t \in [0, n]$. The history of the stock index up to and including time $t$ is represented by the sigma-algebra $\mathcal{F}^X(t) = \sigma\{X(s),\ s \in [0, t]\}$. The development of the stock index is formalized by the filtration $\mathbf{F}^X = (\mathcal{F}^X(t))_{t \in [0, n]}$.
Let $B(t)$ denote the total amount of contractual payments during the time interval $[0, t]$. We assume that it develops in accordance with the dynamics
$$dB(t) = b(t, X(t))\,dt + \Delta B(t, X(t)), \tag{8}$$
where $b(t, x)$ and $\Delta B(t, x)$ are deterministic and sufficiently regular functions specifying payments if the stock value is $x$ at time $t$. The decomposition of $B$ into an absolutely continuous part and a discrete part conforms with (2). Again, we denote the set of time points with jumps in $B$ by $\mathcal{D} = \{t_0, t_1, \ldots, t_q\}$ where $0 = t_0 < t_1 < \ldots < t_q = n$. The most classical example of a contractual payment function is the European call option given by the following specification of payment coefficients,
$$b(t, x) = 0, \qquad \Delta B(t, x) = 0,\ t < n, \qquad \Delta B(n, x) = \max(x - K, 0), \tag{9}$$
for some constant $K$.
We assume that $X$ is a time-continuous Markov process on $\mathbb{R}_+$ with continuous paths. Furthermore, we assume that the dynamics of $X$ are given by the stochastic differential equation,
$$dX(t) = \alpha X(t)\,dt + \sigma X(t)\,dW(t), \qquad X(0) = x_0,$$
where $W$ is a Wiener process, and $\alpha$ and $\sigma$ are constants.
We assume that one may invest in $X$ but, at the same time, a riskfree investment opportunity is available. The riskfree investment opportunity earns return on investment by a constant interest rate $r$, corresponding to the investment portfolio underlying the insurance portfolio in the previous section.
The issuer of the financial contract wishes to calculate the value of the future payments in the contract. The idea of so-called derivative pricing is that the contract value should prevent the contract from imposing arbitrage possibilities, i.e. riskfree capital gains beyond the return rate $r$. The entrepreneurs of modern financial mathematics realized that, in certain financial markets like the one given here, this idea is sufficient to produce the unique value of the financial contract. This contract value equals the conditional expected value,
$$V(t, X(t)) = E^Q\left[\int_t^n e^{-\int_t^s r}\,dB(s) \,\middle|\, X(t)\right], \tag{10}$$
where
$$dX(t) = rX(t)\,dt + \sigma X(t)\,dW^Q(t),$$
with $W^Q$ being a Wiener process under the measure $Q$. The measure $Q$ is called a martingale measure because the discounted stock index $e^{-rt}X(t)$ is a martingale under this measure. This construction ensures that the price preventing arbitrage possibilities can be represented in the form (10). Thus, it is actually just a probability theoretical tool for representation.
We introduce the differential operator $A$, the rate of payments $\beta$, and the updating sum $R$,
$$AV(t, x) = V_x(t, x)\,rx + \frac{1}{2} V_{xx}(t, x)\,\sigma^2 x^2,$$
$$\beta(t, x) = b(t, x),$$
$$R(t, x) = \Delta B(t, x) + V(t, x) - V(t-, x).$$
We can now present the second differential equation.
Proposition 2. The contract value given by (10) is characterized by the following deterministic backward partial differential equation,
$$0 = V_t(t, x) + AV(t, x) + \beta(t, x) - rV(t, x), \quad t \notin \mathcal{D}, \tag{11a}$$
$$0 = R(t, x), \quad t \in \mathcal{D}, \tag{11b}$$
$$0 = V(n, x). \tag{11c}$$
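As an illustration of Proposition 2, consider the European call option (9), for which the solution is available in closed form. The following minimal sketch evaluates the standard textbook Black-Scholes call formula and checks by central finite differences that it satisfies (11a) with $\beta = 0$. The function names, step sizes and parameter values are assumptions made for this example only.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(t, x, K, r, sigma, n):
    """Black-Scholes value at time t < n of the payoff max(X(n) - K, 0)."""
    tau = n - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(t, x, K, r, sigma, n, ht=1e-4, hx=1e-2):
    """Finite-difference estimate of V_t + V_x r x + 0.5 V_xx sigma^2 x^2 - r V."""
    V = lambda s, y: bs_call(s, y, K, r, sigma, n)
    Vt = (V(t + ht, x) - V(t - ht, x)) / (2.0 * ht)
    Vx = (V(t, x + hx) - V(t, x - hx)) / (2.0 * hx)
    Vxx = (V(t, x + hx) - 2.0 * V(t, x) + V(t, x - hx)) / hx ** 2
    return Vt + Vx * r * x + 0.5 * Vxx * sigma ** 2 * x ** 2 - r * V(t, x)


# Hypothetical parameters: strike 100, r = 5%, sigma = 20%, maturity n = 1.
price = bs_call(0.0, 100.0, 100.0, 0.05, 0.2, 1.0)
residual = pde_residual(0.5, 110.0, 100.0, 0.05, 0.2, 1.0)
```

The residual should vanish up to discretization and rounding error, whereas the individual terms of (11a) are of order one for these parameters.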
The usual situation in financial expositions is that there are no payments until termination, in which case (11) reduces to
$$0 = V_t(t, x) + AV(t, x) - rV(t, x), \qquad V(n-, x) = \Delta B(n, x),$$
in general spoken of as the Black-Scholes equation. For the European call option given by (9), the terminal condition is given by $V(n-, x) = \max(x - K, 0)$. In this case, the system has an explicit solution that is known as the Black-Scholes formula. This can be found in almost any textbook on derivative pricing.
As in the previous section we prove that the differential equation is a sufficient condition on the contract value in the sense that any function solving (11) indeed equals the contract value given by (10). Take an arbitrary function $H$ solving (11) and consider the process $H(t, X(t))$. For this process the following line of equalities holds,
$$\begin{aligned}
H(t, X(t)) &= -\int_t^n d\left(e^{-\int_t^s r} H(s, X(s))\right) \\
&= -\int_t^n e^{-\int_t^s r}\left(-rH(s, X(s))\,ds + dH(s, X(s))\right) \\
&= \int_t^n e^{-\int_t^s r}\left(dB(s) - H_x(s, X(s))\,\sigma X(s)\,dW^Q(s)\right) \\
&\quad - \int_t^n e^{-\int_t^s r}\left(H_s(s, X(s)) + AH(s, X(s)) + \beta(s, X(s)) - rH(s, X(s))\right)ds \\
&\quad - \sum_{s \in (t, n] \cap \mathcal{D}} e^{-\int_t^s r} R_H(s, X(s)) \\
&= \int_t^n e^{-\int_t^s r}\left(dB(s) - H_x(s, X(s))\,\sigma X(s)\,dW^Q(s)\right).
\end{aligned} \tag{12}$$
Here $R_H$ is defined as $R$ with $V$ replaced by $H$.
Now, taking conditional expectation on both sides and assuming sufficient integrability, the integral with respect to the martingale vanishes. This leaves us with
$$H(t, X(t)) = V(t, X(t)).$$
Thus, any function solving (11) equals the contract value, and the differential equation is then a sufficient condition to characterize the contract value.
10.2.3 A hybrid equation
In this section we state the differential equation for the reserve connected to a life insurance contract with payments linked to a stock index. We end the section by considering a stochastic differential equation for the reserve with applications to unit-link life insurance. See Brennan and Schwartz (1976), Aase and Persson (1994), and Steffensen (2000) for the original ideas and the general hybrid equations, respectively.
As in section 10.2.1, we consider an insurance policy issued at time 0 and terminating at a fixed finite time $n$ with a payment stream given by (1). However, instead of letting each $B^j$ and each $b^{jk}$ be deterministic functions of time, we introduce dependence on the stock index as formalized in section 10.2.2. We assume that the accumulated payment process develops in accordance with the dynamics
$$dB(t) = dB^{Z(t)}(t, X(t)) + \sum_{k: k \neq Z(t-)} b^{Z(t-)k}(t, X(t))\,dN^k(t), \tag{13}$$
where
$$dB^j(t, x) = b^j(t, x)\,dt + \Delta B^j(t, x),$$
with sufficiently regular functions $b^{jk}(t, x)$, $b^j(t, x)$, and $\Delta B^j(t, x)$.
As in the previous sections, we are interested in valuation of the future payments in the payment process. The question is now how we should integrate the two approaches to risk pricing presented there. In section 10.2.1, we assumed insured risk to obey the law of large numbers and based the risk valuation on diversification. This left us with a conditional expected present value under the objective probability measure. In section 10.2.2, we based the risk valuation on the no-arbitrage paradigm of derivative pricing. This left us with a conditional expected present value under an artificial measure $Q$ called the martingale measure. Which measure should we now use for valuation of integrated insurance and financial risk in the payment process (13)?
The prevention of arbitrage possibilities is not sufficient to get a unique martingale measure. Instead, this idea leaves us with an infinite set of martingale measures. From these measures, some can be said to play more important roles than others. Probably the most important role is played by the product measure that combines the objective measure of insurance risk with the martingale measure of financial risk. We denote, with a slight misuse of notation, also this product measure by $Q$. This particular martingale measure appears both in several so-called quadratic hedging approaches and in the theory of asymptotic arbitrage. Typically, this measure is applied for valuation of integrated financial and insurance risk. Here, we simply take this measure for given and proceed.
It should be mentioned that the differential equation below holds for a much larger class of martingale measures in the following sense: Instead of valuating insurance risk under the objective measure, one could change this measure and still have a martingale measure. 
However, changing the measure of insurance risk is just a matter of changing the transition intensities for $Z$. So changing the intensities in the formulas below corresponds to picking out an alternative martingale measure to the product measure described in the previous paragraph.
We can now define the reserve by
$$V^{Z(t)}(t, X(t)) = E^Q\left[\int_t^n e^{-\int_t^s r}\,dB(s) \,\middle|\, Z(t), X(t)\right]. \tag{14}$$
Note here that we choose the term reserve for the hybrid (14) of the reserve given in (3) and the contract value given in (10). This reflects that the reserve (14) typically appears on the liability side of an insurance company’s balance scheme.
We introduce the differential operator $A$, the payment rate $\beta$ and the updating sum $R$,
$$AV^j(t, x) = \sum_{k: k \neq j} \mu^{jk}(t)\left(V^k(t, x) - V^j(t, x)\right) + V_x^j(t, x)\,rx + \frac{1}{2} V_{xx}^j(t, x)\,\sigma^2 x^2, \tag{15a}$$
$$\beta^j(t, x) = b^j(t, x) + \sum_{k: k \neq j} \mu^{jk}(t)\,b^{jk}(t, x), \tag{15b}$$
$$R^j(t, x) = \Delta B^j(t, x) + V^j(t, x) - V^j(t-, x). \tag{15c}$$
We can now present the third differential equation.
Proposition 3. The reserve given by (14) is characterized by the following deterministic system of backward partial differential equations,
$$0 = V_t^j(t, x) + AV^j(t, x) + \beta^j(t, x) - rV^j(t, x), \quad t \notin \mathcal{D}, \tag{16a}$$
$$0 = R^j(t, x), \quad t \in \mathcal{D}, \tag{16b}$$
$$0 = V^j(n, x). \tag{16c}$$
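To make the role of the product measure in (14) concrete, here is a minimal sketch for one hypothetical contract: a two-state model (alive, dead) with constant mortality intensity $\mu$, and a single lump sum $\max(X(n), K)$ paid at time $n$ if the insured is then alive. Under these assumptions the reserve at time 0 factorizes into the survival probability $e^{-\mu n}$ and the arbitrage-free price of the payoff, which is $Ke^{-rn}$ plus a Black-Scholes call price. The sketch compares this closed form with a plain Monte Carlo estimate of (14), drawing the lifetime under the objective measure and the stock index under $Q$. All names and parameter values are illustrative assumptions.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reserve_closed_form(x, K, r, sigma, mu, n):
    """V^0(0, x) for a lump sum max(X(n), K) paid at n if the insured is alive."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * n) / (sigma * math.sqrt(n))
    d2 = d1 - sigma * math.sqrt(n)
    call = x * norm_cdf(d1) - K * math.exp(-r * n) * norm_cdf(d2)
    # survival probability times (guarantee + call option on the index)
    return math.exp(-mu * n) * (K * math.exp(-r * n) + call)


def reserve_monte_carlo(x, K, r, sigma, mu, n, paths, seed=0):
    """Empirical mean of the discounted payment under the product measure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        alive = rng.expovariate(mu) > n          # lifetime, objective measure
        if alive:
            w = rng.gauss(0.0, math.sqrt(n))     # W^Q(n)
            x_n = x * math.exp((r - 0.5 * sigma ** 2) * n + sigma * w)
            total += math.exp(-r * n) * max(x_n, K)
    return total / paths


# Hypothetical parameters.
exact = reserve_closed_form(100.0, 100.0, 0.05, 0.2, 0.02, 1.0)
estimate = reserve_monte_carlo(100.0, 100.0, 0.05, 0.2, 0.02, 1.0, paths=100000)
```

Replacing $\mu$ by another intensity in both functions corresponds to valuing under an alternative martingale measure, in line with the remark on changing the transition intensities for $Z$.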
We shall not go through the derivation of the differential equation characterizing the reserve. The recipe and the calculations can be copied from the previous section but they become more messy as the valuation problem expands. But it is worthwhile to realize that the differential equation (16) is a true generalization of both (4) and (11). The specialization of (16) into (4) comes from erasing all stock index dependence. The specialization into (11) comes from erasing all state dependences and all payments triggered by transitions of $Z$.
We end this section by studying the special insurance contract introduced at the end of section 10.2.1 in the presence of stock index dependence. The backward stochastic differential equation corresponding to (6) describing the dynamics of the reserve turns into
$$\begin{aligned}
dV^{Z(t)}(t, X(t)) &= \left(rV^{Z(t)}(t, X(t)) + (\alpha - r)\,V_x^{Z(t)}(t, X(t))\,X(t)\right)dt \\
&\quad + V_x^{Z(t)}(t, X(t))\,\sigma X(t)\,dW(t) - dB^{Z(t)}(t, X(t)) \\
&\quad - \sum_{k: k \neq Z(t)} \mu^{Z(t)k}(t)\,R^{Z(t)k}(t, X(t))\,dt \\
&\quad + \sum_{k: k \neq Z(t-)} \left(V^k(t, X(t)) - V^{Z(t-)}(t, X(t))\right)dN^k(t)
\end{aligned} \tag{17}$$
with
$$R^{jk}(t, x) = b^{jk}(t, x) + V^k(t, x) - V^j(t, x).$$
As in section 10.2.1 we let $\Delta B^{Z(n)}(n) := V^{Z(n)}(n-)$ be the defining relation implying that the terminal condition $V^{Z(n)}(n) = 0$ is fulfilled by construction. Furthermore, we assume that from the reserve a proportion $\pi(t)$ is invested in the stock index at time $t$. Then, letting $h(t)$ denote the number of stock indices held at time $t$ and noting that
$$\pi(t)\,V^{Z(t)}(t, X(t)) = h(t)\,X(t),$$
we then have that
$$V_x^{Z(t)}(t, X(t)) = h(t) = \pi(t)\,\frac{V^{Z(t)}(t, X(t))}{X(t)}.$$
Plugging this relation into (17) gives us a general version of (6). We write down here the special case coming from $V^k(t) = V^{Z(t-)}(t)$, corresponding to (7),
$$\begin{aligned}
dV^{Z(t)}(t, X(t)) &= \left(r + \pi(t)(\alpha - r)\right)V^{Z(t)}(t, X(t))\,dt + \sigma\pi(t)\,V^{Z(t)}(t, X(t))\,dW(t) \\
&\quad - dB^{Z(t)}(t, X(t)) - \sum_{k: k \neq Z(t)} \mu^{Z(t)k}(t)\,b^{Z(t)k}(t, X(t))\,dt.
\end{aligned}$$
Now, this is an investment account with the proportion $\pi$ invested in the stock index and with a flow of payments corresponding to (7), except for the possibility of stock index dependence in all payments.
10.3 Surplus and dividends
10.3.1 The dynamics of the surplus
In this section we introduce the notion of surplus that measures the excess of assets over liabilities. Also the notion of dividends that allows the insured to participate in the performance of the insurance contract is introduced. For the succeeding sections, only the process of dividends and the derived dynamics of the surplus are important. See Norberg (1999,2001) and Steffensen (2006b) for detailed studies of the notions of surplus and dividends. Life insurance contracts are typically long-term contracts with time horizons up to half a century or more. Calculation of reserves is based on assumptions on interest rates and transition intensities until termination. Two difficulties arise in this connection. First, these are quantities that are difficult to predict even on a shorter-term basis. Second, the policy holder may be interested in participating in returns on risky assets rather than risk free assets. At the end of section 10.2.3 we gave one approach to the second difficulty: Let the terminal lump sum payment be defined by the terminal value of the reserve. Then the prospective expected value given by (14) can be calculated retrospectively. The unit-linked insurance without a guarantee is hereby constructed. For various reasons, however, only few life insurance contracts were constructed like that in the past. Instead the insurer makes a first prudent guess on the future interest rates and transition intensities in order to be able to put up a reserve, knowing quite well that realized returns and transitions differ. This first guess on interest rates and transition intensities, here denoted by (r∗ , μ∗ ), is called the first order basis, and gives rise to the first order reserve, V ∗ . The set of payments B settled under the first order basis is called the first order payments or the guaranteed payments. However, the insurer and the policy holder agree that the realized returns and transitions should be reflected in the realized payment stream. 
For this reason the insurer adds to the first order payments a dividend payment stream. We denote this payment stream by $D$ and assume that its structure corresponds to the structure of $B$, i.e.
\[
\begin{aligned}
dD(t) &= dD^{Z(t)}(t) + \sum_{k:\,k \neq Z(t-)} \delta^{Z(t-)k}(t)\, dN^{k}(t), \\
dD^{j}(t) &= \delta^{j}(t)\, dt + \Delta D^{j}(t).
\end{aligned}
\tag{18}
\]
Here, however, the coefficients of $D$, $\delta^{jk}(t)$, $\delta^{j}(t)$, and $\Delta D^{j}(t)$, are not assumed to be deterministic. In contrast, the dividends should reflect realized returns and transitions relative to the first order basis assumptions.

One can now categorize basically all types of life and pension insurance by their specification of $D$. Such a specification includes possible constraints on $D$, the way $D$ is settled, and the way in which $D$ materializes into payments for the policy holder or others. We shall not give a thorough exposition of the various types of life insurance existing but just give a few hints at what we mean by categorization. When dividends are constrained to be to the benefit of the policy holder, i.e. $D$ is positive and increasing, one speaks of participating or with-profit life insurance. In so-called pension funding there is no such constraint. There, however, the insured himself is often not affected by dividends. In return, an employer pays or receives dividends. No matter whether dividends affect the insured or his employer, the dividends do not necessarily materialize into cash payments. The insurer may convert them into adjustments to first order payments. Such a conversion is then agreed upon in the contract. In participating life insurance, this adjustment of first order payments is called bonus.

We could continue the categorization of life insurance contracts but we stop here. For all types of contracts, however, the question remains: How should dividends reflect the realized returns and transitions? A natural measure of realized performance is the surplus given by excess of assets over liabilities.
Assuming that payments are invested in a portfolio with value process $Y$ and that liabilities are measured by the first order reserve, we get the surplus
\[
X(t) = \int_{0-}^{t} \frac{Y(t)}{Y(s)}\, d\left(-(B + D)\right)(s) - V^{Z(t)*}(t),
\]
where the first part is the total payments in the past accumulated with capital gains from investing in $Y$. Note that $X$ in this section is defined as the surplus, in contrast to the previous section where $X$ was the stock index. We now assume that a proportion of $Y$ given by
\[
\frac{\pi(t, X(t))\, X(t)}{X(t) + V^{Z(t)*}(t)}
\]
is invested in a risky asset modelled as in section 10.2.2. Then the dynamics of $Y$ are given by
\[
dY(t) = rY(t)\, dt + \sigma\, \frac{\pi(t, X(t))\, X(t)}{X(t) + V^{Z(t)*}(t)}\, Y(t)\, dW^{Q}(t).
\]
Note that we choose to specify the dynamics of $Y$ directly in terms of $W^{Q}$, the Wiener process under the valuation measure. Deriving the dynamics of $X$, using these dynamics for $Y$, one arrives, after a number of rearrangements and abbreviations, at
\[
\begin{aligned}
dX(t) &= rX(t)\, dt + \pi(t, X(t))\, \sigma X(t)\, dW^{Q}(t) + d(C - D)(t), \\
X(0) &= x_0,
\end{aligned}
\tag{19}
\]
where $C$ is a surplus contribution process with a structure corresponding to the structure of $B$ and $D$, i.e.
\[
\begin{aligned}
dC(t) &= dC^{Z(t)}(t) + \sum_{k:\,k \neq Z(t-)} c^{Z(t-)k}(t)\, dN^{k}(t), \\
dC^{j}(t) &= c^{j}(t)\, dt + \Delta C^{j}(t).
\end{aligned}
\tag{20}
\]
The dynamics of $X$ show that $\pi$ is actually the proportion of the surplus invested in the risky asset. This is the reason for starting out with the proportion $\pi(t, X(t))\, X(t) / \left(X(t) + V^{Z(t)*}(t)\right)$. The elements $c^{jk}$, $c^{j}$, and $\Delta C^{j}$ of $C$ are deterministic functions. They are, of course, important for a closer study of the elements of the surplus. However, they are not crucial for derivation and comprehension of the formulas in what follows. See also Norberg and Steffensen (2005) for more on surplus dynamics in the case where payments, and not only capital gains, are diffusion processes.

Having introduced the surplus above as a performance measure, a natural next step is to link the dividend payments directly to the surplus, i.e.
\[
\delta^{j}(t) = \delta^{j}(t, X(t)), \qquad \delta^{jk}(t) = \delta^{jk}(t, X(t)), \qquad \Delta D^{j}(t) = \Delta D^{j}(t, X(t)),
\]
where we, with a slight misuse of notation, use the same notation for the dividend payments and their functional dependence on $(t, X(t))$. This formalization of dividends would certainly be a way of getting realized returns (in $Y$) and transitions (in $N$) reflected in the dividend payments.
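To get a feeling for the surplus dynamics (19), one can simulate them numerically. The following is a minimal sketch, not the author's method: it assumes a single state (no jumps and no lump sums), constant $r$, $\sigma$, and $\pi$, a constant contribution rate `c`, and a surplus-linked dividend rate supplied by the caller. The function name `simulate_surplus` and all parameter values are hypothetical, and the discretization is a plain Euler-Maruyama scheme.

```python
import math
import random


def simulate_surplus(x0, r, sigma, pi, c, dividend_rate, n, steps, seed=0):
    """Euler-Maruyama path of dX = rX dt + pi*sigma*X dW + (c - delta(t, X)) dt.

    Assumptions (illustrative only): one state, no jumps or lump sums,
    constant coefficients, dividend rate given by dividend_rate(t, x).
    """
    rng = random.Random(seed)
    dt = n / steps
    x = x0
    path = [x]
    for i in range(steps):
        t = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment over [t, t + dt]
        drift = r * x + c - dividend_rate(t, x)
        x = x + drift * dt + pi * sigma * x * dw
        path.append(x)
    return path


# Hypothetical linear dividend rule delta(t, x) = 0.03 * x.
path = simulate_surplus(x0=1.0, r=0.03, sigma=0.2, pi=0.5, c=0.5,
                        dividend_rate=lambda t, x: 0.03 * x,
                        n=10.0, steps=1000)
```

With $\pi = 0$ the scheme becomes deterministic, which gives a simple sanity check: if the dividend rate equals $rX$, the surplus grows linearly at the contribution rate.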
We could have introduced other performance measures than the surplus defined above. However, other well-founded performance measures would typically also follow the dynamics given by (19) with appropriate definition of the coefficients in $C$. The formulas derived below would hold true. Thus, in this respect, the story about first order quantities and surplus can be seen as just one example of the state process $X$ underlying the dividend payments.

10.3.2 The differential equation for the market reserve

In this section, we state the differential equation for the reserves connected to a life insurance contract with dividend payments linked to the surplus. This formalizes most practical life insurance contracts where dividends are linked to the performance of the insurance contract. Furthermore, for the special case of dividends that are linear in the surplus, we separate variables of the reserves. Thereby one system of partial differential equations is reduced to two systems of ordinary differential equations. See Steffensen (2006b) for further studies on partial differential equations for valuation of surplus-linked dividends.

The insurer is interested in valuation of the total future liabilities. We introduce as reserve the expected present value of future total payments given the past history of the policy. The expectation is taken under the product measure $Q$ introduced in section 10.2.2. Since future payments depend on $(Z(t), X(t))$ only and $(Z(t), X(t))$ is a Markov process, the reserve is given by
\[
V^{Z(t)}(t, X(t)) = E^{Q}\left[ \int_{t}^{n} e^{-\int_{t}^{s} r}\, d(B + D)(s) \,\middle|\, Z(t), X(t) \right].
\tag{21}
\]
We introduce the differential operator $A$, the payment rate $\beta$, and the updating sum $R$,
\[
\begin{aligned}
A V^{j}(t, x) &= \sum_{k:\,k \neq j} \mu^{jk}(t) \left( V^{k}\left(t, x + c^{jk}(t) - \delta^{jk}(t, x)\right) - V^{j}(t, x) \right) \\
&\quad + V_{x}^{j}(t, x) \left( rx + c^{j}(t) - \delta^{j}(t, x) \right) + \frac{1}{2} V_{xx}^{j}(t, x)\, \pi^{2}(t, x)\, \sigma^{2} x^{2},
\end{aligned}
\tag{22a}
\]
\[
\beta^{j}(t, x) = b^{j}(t) + \delta^{j}(t, x) + \sum_{k:\,k \neq j} \mu^{jk}(t) \left( b^{jk}(t) + \delta^{jk}(t, x) \right),
\tag{22b}
\]
\[
R^{j}(t, x) = \Delta B^{j}(t) + \Delta D^{j}(t, x) + V^{j}\left(t, x + \Delta C^{j}(t) - \Delta D^{j}(t, x)\right) - V^{j}(t-, x).
\tag{22c}
\]
We are now ready to present the fourth differential equation.

Proposition 4. The reserve given by (21) is characterized by the following deterministic system of backward partial differential equations,
\[
\begin{aligned}
0 &= V_{t}^{j}(t, x) + A V^{j}(t, x) + \beta^{j}(t, x) - r V^{j}(t, x), && t \notin \mathcal{D}, && \text{(23a)} \\
0 &= R^{j}(t, x), && t \in \mathcal{D}, && \text{(23b)} \\
0 &= V^{j}(n, x). && && \text{(23c)}
\end{aligned}
\]
As in section 10.2.3, we shall not go through the derivation of the differential equation. The calculations are even more messy than those leading to the system (16), but the basic ingredients remain the same. However, we explain how (23) generalizes (16) in several respects.

First, compare the differential operators (15a) and (22a). In (15a), the change in the reserve corresponding to a transition from $j$ to $k$ is reflected in the difference $V^{k}(t, x) - V^{j}(t, x)$. In this section, a state transition also affects the variable $X$ such that after a jump from $j$ to $k$ at time $t$,
\[
X(t) = X(t-) + c^{jk}(t) - \delta^{jk}(t, X(t-)).
\]
In (22a), this is seen in the change in the reserve by an updating of the variable $x$ accordingly. A similar difference appears between (15c) and (22c). In (15c) the state process $X$ is not affected by a lump sum payment at a deterministic point in time. This leads to a change in the reserve of $V^{j}(t, x) - V^{j}(t-, x)$. In this section, a lump sum payment at time $t$ yields
\[
X(t) = X(t-) + \Delta C^{j}(t) - \Delta D^{j}(t, X(t-)).
\]
This is then seen in (22c) by an updating of the variable $x$ accordingly.

Second, in (15a) the coefficient on $V_{x}^{j}(t, x)$, $rx$, stems from the systematic return rate on investment $rX(t)$. In this section, the systematic rate of increments of $X$, given sojourn in state $j$, equals
\[
rX(t) + c^{j}(t) - \delta^{j}(t, X(t)).
\]
This is then reflected in the coefficient on $V_{x}^{j}(t, x)$, $rx + c^{j}(t) - \delta^{j}(t, x)$.

Finally, we have in this section allowed for a certain proportional investment of the surplus in the risky asset. The volatility term $\pi(t, X(t))\, \sigma X(t)\, dW^{Q}(t)$ leads to a different coefficient on $V_{xx}^{j}(t, x)$ in (22a) than in (15a).

Apart from the difference between the differential operators, the systems (23) and (16) are almost identical. In this section, we have added the two payment streams $B$ and $D$, of which only $D$ is linked to $X$. In section 10.2.3, the payment stream $B$ was linked to what $X$ represented there. This is reflected in the corresponding replacement of payments in (15b) and (15c), such that (22b) and (22c) appear.

So far we have just presented the differential equation characterizing the reserve. We have not discussed which functional dependence of dividends on $X$ might be relevant. For such a discussion we need to know the insurer's and the policy holder's agreement on reflection of performance in dividends. In practice, dividends are always increasing in $X$. Then a good performance is shared between the two parties by the insurer paying back part of the surplus as positive dividends. A bad performance is shared between the two parties by the insurer collecting part of the deficit as negative dividends. Since there may be constraints on $D$, e.g., $D$ increasing, these qualitative statements are not necessarily strict, though. There are only a few examples of a functional dependence that allow for more explicit calculations. Luckily, the most important one allows us to take an important step further.

We end this section by specifying a particular functional dependence of dividends on $X$ that allows for more explicit calculations of the reserve. We introduce dividends that are linear in the surplus in the sense that
\[
\begin{aligned}
\delta^{j}(t) &= p^{j}(t) + q^{j}(t)\, X(t), \\
\delta^{jk}(t) &= p^{jk}(t) + q^{jk}(t)\, X(t), \\
\Delta D^{j}(t) &= \Delta P^{j}(t) + \Delta Q^{j}(t)\, X(t),
\end{aligned}
\]
where $p^{j}$, $p^{jk}$, $\Delta P^{j}$, $q^{j}$, $q^{jk}$, and $\Delta Q^{j}$ are positive deterministic functions. It is an easy exercise to plug these dividends into the system (23). The next step is then to suggest a useful separation of variables in $V$. Linearity of dividends inspires a guess of the form
\[
V^{j}(t, x) = f^{j}(t) + g^{j}(t)\, x.
\]
Plugging this guess and its derivatives into (23) and collecting all terms including and excluding $x$, respectively, gives us systems of ordinary differential equations for $f$ and $g$. We leave it to the reader to verify that the differential equations covering $f$ and $g$ are similar in structure to (4). This makes further studies, interpretations, and representations possible. In this exposition, we just note the separation of variables of the reserve function for linear dividends. This separation reduces the system (23) of partial differential equations to two systems of ordinary differential equations characterizing $f$ and $g$. See Steffensen (2006b) for all details.

10.4 Intervention

10.4.1 Optimal stopping and early exercise options
In this section, we state and prove the differential equation for the value of a financial contract with payments linked to a stock index and with an early exercise option. The proof shows that the differential equation is sufficient for a characterization of the contract value.

In section 10.2.2, we studied the price of a financial contract where the payment rates and lump sum payments at deterministic points in time were linked to a stock index. Typically, there is the additional feature to such a contract that the contract holder can, at any point in time $t$ until termination, close the contract. He then receives a payoff that depends on the stock value upon closure. This feature is known as the premature or early exercise option, since it gives the contract holder the opportunity to convert future payments into an immediate premature payment.

Recall the payment stream (8) in section 10.2.2. Now assume that, given exercise at time $t$, all future payments are converted into one exercise payment, due at time $t$, and denoted by
\[
\Phi(t) = \Phi(t, X(t)),
\]
where we, with a slight misuse of notation, use $\Phi$ for both the process and its sufficiently regular functional dependence on $(t, X(t))$. We are now interested in calculating the value of the contract. It is possible to give an arbitrage argument for the unique contract value,
\[
V(t, X(t)) = \sup_{\tau \in [t, n]} E^{Q}\left[ \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, dB(s) + e^{-\int_{t}^{\tau} r}\, \Phi(\tau) \,\middle|\, X(t) \right].
\tag{24}
\]
The decision not to exercise prematurely is included in the supremum in (24) by specifying
\[
\Phi(n) = 0
\tag{25}
\]
and by representing the decision not to exercise prematurely by $\tau = n$.

Assume that $X$ is modelled as in section 10.2.2 and the market available is as in section 10.2.2. One cannot immediately see from the results in the previous sections how the differential equation from there can be generalized to the situation in this section. For a fixed $\tau$, the valuation problem is the same as in section 10.2.2 with $n$ replaced by $\tau$, but how does the supremum affect the results? Does there still exist a deterministic differential equation characterizing the contract value?

We define the differential operator $A$, the rate of payments $\beta$, and the sum $R$ as in section 10.2.2, and introduce furthermore the sum $\mathcal{R}$ by
\[
\mathcal{R}(t, x) = \Delta B(t, x) + \Phi(t, x) - V(t-, x).
\]
We can now present the fifth differential equation.

Proposition 5. The contract value given by (24) is characterized by the following deterministic backward partial variational inequality,
\[
\begin{aligned}
0 &\geq V_{t}(t, x) + A V(t, x) + \beta(t, x) - r V(t, x), && t \notin \mathcal{D}, && \text{(26a)} \\
0 &\geq \Phi(t, x) - V(t, x), && t \notin \mathcal{D}, && \text{(26b)} \\
0 &= \left[ V_{t}(t, x) + A V(t, x) + \beta(t, x) - r V(t, x) \right] \times \left[ V(t, x) - \Phi(t, x) \right], && t \notin \mathcal{D}, && \text{(26c)} \\
0 &\geq R(t, x), && t \in \mathcal{D}, && \text{(26d)} \\
0 &\geq \mathcal{R}(t, x), && t \in \mathcal{D}, && \text{(26e)} \\
0 &= R(t, x)\, \mathcal{R}(t, x), && t \in \mathcal{D}, && \text{(26f)} \\
0 &= V(n, x). && && \text{(26g)}
\end{aligned}
\]
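The structure of the system above can be illustrated numerically in the simplest special case. The sketch below is an assumption-laden illustration, not the author's procedure: it takes a contract with no payments before exercise or termination, a put-type exercise payoff $\max(K - x, 0)$ that is also paid at termination, and a Black-Scholes stock. Instead of discretizing the partial variational inequality itself, it swaps in a Cox-Ross-Rubinstein binomial tree, in which the step "value = max(continuation value, exercise payoff)" is the discrete-time counterpart of requiring that the value never falls below the exercise payoff while following the no-exercise equation wherever it is strictly larger. The function name `binomial_put` and the parameter values are hypothetical.

```python
import math


def binomial_put(s0, strike, r, sigma, n, steps, american=True):
    """Put value on a Cox-Ross-Rubinstein tree (illustrative sketch).

    With american=True each node takes max(continuation, exercise payoff);
    with american=False the early exercise option is switched off.
    """
    dt = n / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    disc = math.exp(-r * dt)              # one-step discount factor
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability

    # Payoff at termination: node j has j up-moves and steps - j down-moves.
    values = [max(strike - s0 * u ** j * d ** (steps - j), 0.0)
              for j in range(steps + 1)]

    # Backward induction through the tree.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                exercise = max(strike - s0 * u ** j * d ** (i - j), 0.0)
                values[j] = max(cont, exercise)
            else:
                values[j] = cont
    return values[0]


american_value = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200)
european_value = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
```

The difference `american_value - european_value` is the value of the early exercise option in this example. With $r = 0$ the two values coincide up to rounding, since early exercise of a put is then never strictly optimal.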
This system should be compared with (11). First, (11a) is replaced by (26a)-(26c). The equation in (11a) turns into an inequality in (26a). An additional inequality (26b) states that the contract value always exceeds the exercise payoff. This is reasonable, since one of the possible exercise strategies is to exercise immediately and this would give an immediate exercise payoff. The equality (26c) is the mathematical version of the following statement: At any point in the state space $(t, x)$, at least one of the inequalities in (26a) and (26b) must be an equality.

Second, (11b) is replaced by (26d)-(26f). The equation in (11b) turns into an inequality in (26d). An additional inequality (26e) states that the contract value on the time set $\mathcal{D}$ exceeds the lump sum plus the exercise payoff falling due. The equality (26f) states that at least one of the inequalities in (26d) and (26e) must be an equality. Note that (26d)-(26f) can easily be written as
\[
V(t-, x) = \Delta B(t, x) + \max\left( V(t, x), \Phi(t, x) \right), \qquad t \in \mathcal{D},
\tag{27}
\]
while there is no such abbreviation available for (26a)-(26c). However, we choose the version (26d)-(26f) to illustrate the symmetry with (26a)-(26c).

The usual situation in financial expositions is that there are no payments until exercise or termination, whichever comes first. In that case $\beta(t, x)$ disappears from (26a)-(26c) and (26d)-(26g) reduce to
\[
V(n-, x) = \Delta B(n, x),
\]
since both $V(n, x)$ and $\Phi(n, x)$ are zero. With this specification, (26) is the variational inequality characterizing the value of a so-called American option.

By the variational inequality (26) one can divide the state space into two regions, possibly intersecting. In the first region, (26a) and (26d) are equalities. This region consists of the states where the optimal stopping strategy for the contract holder is not to stop. In this region, the contract value follows a differential equation as if there were no exercise option. In the second region, (26b) and (26e) are equalities. This region consists of the states where the optimal stopping strategy for the contract holder is to stop. Thus, in this region the value of the contract equals the exercise payoff.

It is possible to show that (26) is a necessary condition on the contract value. However, instead we go directly to verifying that (26) is also a sufficient condition. The proof starts out in the same way as the verification argument in section 10.2.2. Take an arbitrary function $H$ solving (26) and consider the process $H(t, X(t))$. Then we can write, by replacing $n$ by $\tau$ in (12),
\[
\begin{aligned}
H(t, X(t)) &= e^{-\int_{t}^{\tau} r}\, H(\tau, X(\tau)) \\
&\quad + \int_{t}^{\tau} e^{-\int_{t}^{s} r} \left( dB(s) - H_{x}(s, X(s))\, \sigma X(s)\, dW^{Q}(s) \right) \\
&\quad - \int_{t}^{\tau} e^{-\int_{t}^{s} r} \left( H_{s}(s, X(s)) + A H(s, X(s)) + \beta(s, X(s)) - r H(s, X(s)) \right) ds \\
&\quad - \sum_{s \in (t, \tau] \cap \mathcal{D}} e^{-\int_{t}^{s} r}\, R^{H}(s, X(s)).
\end{aligned}
\]
Now consider an arbitrary stopping time $\tau$. For this stopping time, we know from (26a), (26b) and (26d) that
\[
H(t, X(t)) \geq \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, dB(s) + e^{-\int_{t}^{\tau} r}\, \Phi(\tau) - \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, H_{x}(s, X(s))\, \sigma X(s)\, dW^{Q}(s).
\]
First, taking conditional expectation, given $X(t)$, on both sides and then taking supremum over $\tau$ gives that
\[
H(t, X(t)) \geq \sup_{\tau \in [t, n]} E^{Q}\left[ \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, dB(s) + e^{-\int_{t}^{\tau} r}\, \Phi(\tau) \,\middle|\, X(t) \right].
\tag{28}
\]
Now consider instead the stopping time defined by
\[
\tau^{*} = \inf \left\{ s \in [t, n] : H(s, X(s)) = \Phi(s, X(s)) \right\}.
\]
This stopping time is indeed well-defined since, from (25) and (26g), $H(n, X(n)) = \Phi(n, X(n)) = 0$, so that $\tau^{*}$ occurs no later than $n$. We now know from (26c) and (26f) that
\[
\begin{aligned}
0 &= H_{s}(s, X(s)) + A H(s, X(s)) + \beta(s, X(s)) - r H(s, X(s)), && s \in [t, \tau^{*}], \\
0 &= R^{H}(s, X(s)), && s \in [t, \tau^{*}] \cap \mathcal{D},
\end{aligned}
\]
such that
\[
H(t, X(t)) = \int_{t}^{\tau^{*}} e^{-\int_{t}^{s} r}\, dB(s) + e^{-\int_{t}^{\tau^{*}} r}\, \Phi(\tau^{*}) - \int_{t}^{\tau^{*}} e^{-\int_{t}^{s} r}\, H_{x}(s, X(s))\, \sigma X(s)\, dW^{Q}(s).
\]
Taking conditional expectation, given $X(t)$, on both sides and then comparing both sides for all possible stopping times yields the inequality
\[
H(t, X(t)) \leq \sup_{\tau \in [t, n]} E^{Q}\left[ \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, dB(s) + e^{-\int_{t}^{\tau} r}\, \Phi(\tau) \,\middle|\, X(t) \right].
\tag{29}
\]
By (28) and (29), we conclude that
\[
H(t, X(t)) = V(t, X(t)).
\]
Thus, any function solving (26) characterizes the contract value.

Note that the proof also produces the optimal exercise strategy. The contract holder should exercise according to the stopping time $\tau^{*}$. However, in order to know when to exercise, one must be able to calculate the value. Only rarely does the variational inequality (26) have an explicit solution. However, there are several numerical procedures developed for this purpose. One may, e.g., use Monte Carlo techniques, general partial differential equation approximations, or certain specific approximations developed for specific functions $\Phi$.

10.4.2 Intervention options in life and pension insurance
In this section, we state the differential equation for the reserve of a life insurance contract with dividends linked to the surplus and with a surrender option. Furthermore we comment on the generalization to general intervention options. See Grosen and Jørgensen (2000) and Steffensen (2002) for results on the surrender options and general intervention options.

In correspondence with the previous section, the holder of a life insurance contract can also, typically, terminate his policy prematurely. The act of terminating a life insurance policy is called surrender, and the exercise option is in this context called a surrender option. We consider the insurance contract described in section 10.3, i.e. a contract with the total accumulated payments given by $B + D$. Assume now that the contract holder can terminate his policy at any point in time. Given that he does so at time $t$, he receives the surrender value
\[
\Phi(t) = \Phi^{Z(t)}(t, X(t)),
\]
for a sufficiently regular function $\Phi^{j}(t, x)$. Here, we take $X$ to be the surplus process introduced in section 10.3. We are now interested in calculating the value of future payments specified in the policy. We consider the reserve,
\[
V^{Z(t)}(t, X(t)) = \sup_{\tau \in [t, n]} E^{Q}\left[ \int_{t}^{\tau} e^{-\int_{t}^{s} r}\, d(B + D)(s) + e^{-\int_{t}^{\tau} r}\, \Phi(\tau) \,\middle|\, Z(t), X(t) \right],
\tag{30}
\]
where $Q$ is the product measure described in section 10.2.3. As in the previous section, one cannot immediately see how the differential equation (23) generalizes to this situation. The results in the previous section indicate, however, that the differential equation can be replaced by a variational inequality.

We define the differential operator $A$, the payment rate $\beta$, and the updating sum $R$ as in section 10.3, and introduce furthermore the sum $\mathcal{R}^{j}$ by
\[
\mathcal{R}^{j}(t, x) = \Delta B^{j}(t, x) + \Phi^{j}(t, x) - V^{j}(t-, x).
\]
We can now present the sixth differential equation.

Proposition 6. The reserve given by (30) is characterized by the following deterministic system of backward partial variational inequalities,
\[
\begin{aligned}
0 &\geq V_{t}^{j}(t, x) + A V^{j}(t, x) + \beta^{j}(t, x) - r V^{j}(t, x), && t \notin \mathcal{D}, \\
0 &\geq \Phi^{j}(t, x) - V^{j}(t, x), && t \notin \mathcal{D}, \\
0 &= \left[ V_{t}^{j}(t, x) + A V^{j}(t, x) + \beta^{j}(t, x) - r V^{j}(t, x) \right] \times \left[ V^{j}(t, x) - \Phi^{j}(t, x) \right], && t \notin \mathcal{D}, \\
0 &\geq R^{j}(t, x), && t \in \mathcal{D}, \\
0 &\geq \mathcal{R}^{j}(t, x), && t \in \mathcal{D}, \\
0 &= R^{j}(t, x)\, \mathcal{R}^{j}(t, x), && t \in \mathcal{D}, \\
0 &= V^{j}(n, x).
\end{aligned}
\]
This differential equation can be compared with (23) in the same way as (26) was compared with (11). Its verification goes in the same way as the verification of (26), although it becomes somewhat more involved. We shall not go through this here.

As in the previous section, one can now divide the state space into two regions, possibly intersecting. In the first region, the reserve follows a differential equation as if surrender were not possible. This region consists of states from which immediate surrender is suboptimal. In the second region, the reserve equals the surrender value, and this region consists of the states where immediate surrender is optimal. The surrender value is often in practice given by the first order reserve defined in section 10.3, in the sense that
\[
\Phi^{j}(t, x) = V^{j*}(t),
\]
and is, thus, not surplus dependent.
The title of this section is Intervention. So far, we have only dealt with stopping of a financial contract in the previous subsection, and stopping of an insurance contract in this subsection. In practice, the insurance policy holder typically holds other options that in some respects are similar in nature to the surrender option but in other respects are not. The most important one is the free policy option that allows the policy holder to stop all premium payments but continue the contract in a so-called free policy state. Exercising a free policy option leads to a reduction of the first order benefits that were settled under the assumption of full premium payment. Thus, exercising a free policy option does not stop the insurance policy, which continues under free policy conditions, but stops only the premium payments. Therefore, one should speak of intervention in, rather than stopping of, the insurance policy.

Of course, stopping is a special example of intervention. For a stopping or surrender option, there is always only one control act, namely the act of stopping, since hereafter the contract has expired. Given that the policy has been converted into a free policy, the policy holder may still hold a surrender option. Thus, introducing interventions, the policy holder may choose between different series of interventions. This feature produces technical challenges in the verification of a variational inequality characterizing the reserve. However, the basic structure of the resulting variational inequality remains the same. See Steffensen (2002) for all details.

10.5 Quadratic optimization

10.5.1 Portfolio quadratic optimization of dividends
In this section, we state and prove the differential equation for a value function of an optimization problem where preferences over surplus and dividends are specified by a quadratic disutility function. We speak of the value process as a disutility reserve. The surplus introduced in section 10.3 is here approximated by a considerably simpler process. We also indicate the solution to the differential equation and the optimal dividend strategy. The control problem studied in this section is known as the linear regulator. See Fleming and Rishel (1975) for the linear regulator in general and Cairns (2000) for its applications to life insurance.

In section 10.3, we introduced the notion of surplus. The surplus accumulates a stochastic process of surplus contributions $C$ and capital gains from investment in a Black-Scholes market. Redistributions to the policy holders, in terms of dividends, are withdrawn from the surplus. We modelled the process of dividends similarly to the underlying payment process $B$ (and the process of surplus contributions $C$). In (23) a deterministic differential equation for the reserve was presented where the coefficients in the dividend process are linked to the surplus. We concluded section 10.3 by proposing dividends to be affine in the surplus. This led to a reserve that is affine in the surplus. Thus, section 10.3 dealt with valuation of certain dividend plans. The question that we did not address was whether, or rather when, surplus-linked dividends, or dividends affine in the surplus for that matter, are particularly attractive. Questions of that kind appear in the discipline of optimization rather than valuation.

We approximate the surplus by a diffusion process on the basis of the following list of adaptations:
• We assume that the surplus is invested in the risk-free asset exclusively.
• We approximate the process of surplus contributions by a Brownian motion with volatility $\rho$ and drift $c$.
• We assume that accumulated dividends are absolutely continuous and paid out by the rate $\delta$.

These adaptations give us the following surplus dynamics,
\[
\begin{aligned}
dX(t) &= rX(t)\, dt + d(C - D)(t), \\
X(0) &= x_0,
\end{aligned}
\]
where
\[
\begin{aligned}
dC(t) &= c(t)\, dt + \rho(t)\, dW(t), \\
dD(t) &= \delta(t)\, dt.
\end{aligned}
\]
We are now interested in deciding on a dividend rate $\delta$ that we prefer over other dividend rates according to some preference criterion. For this purpose, we introduce a process of accumulated disutilities $U$ that is absolutely continuous with disutility rate $u(t, \delta(t), X(t))$, i.e.
\[
dU(t) = u(t, \delta(t), X(t))\, dt.
\]
We now introduce a certain quadratic disutility criterion,
\[
u(t, \delta, x) = p(t) \left( \delta - a(t) \right)^{2} + q(t)\, x^{2}.
\tag{31}
\]
This criterion punishes quadratic deviations of the present dividend rate from a dividend target rate $a$ and deviations of the surplus from 0. Such a disutility criterion reflects a trade-off between policy holders preferring stability of dividends, relative to $a$, over non-stability, and the insurance company preferring stability of the surplus relative to 0. The preference over the surplus could be driven by regulatory rules stating that earned surplus contributions should be redistributed upon earning in some sense. The deterministic functions $p$ and $q$ give weights to these preference formalizations.

At time $t$, the future disutilities are measured by their conditional expectation. We define the disutility reserve as the infimum of all such conditional expectations over all admissible dividend payment streams, i.e.,
\[
V(t, X(t)) = \inf_{D} E\left[ \int_{t}^{n} dU(s) \,\middle|\, X(t) \right].
\tag{32}
\]
Except for the infimum over $D$, note the similarity with, e.g., (14). The primary difference is that, instead of measuring an expected (present) value of payment rates $\delta$, we now measure an expected disutility function of payment rates, $p(t) \left( \delta(t) - a(t) \right)^{2}$. Hereto we add an expected disutility function of the position of the surplus, $q(t)\, X(t)^{2}$.

We introduce the differential operator $A$ and the rate of disutilities $\beta$,
\[
\begin{aligned}
A V(t, x) &= V_{x}(t, x) \left( rx + c(t) - \delta \right) + \frac{1}{2} V_{xx}(t, x)\, \rho^{2}, \\
\beta(t, x) &= u(t, \delta, x).
\end{aligned}
\]
We are ready to present the seventh differential equation, which is a so-called Bellman equation.

Proposition 7. The disutility reserve given by (32) is characterized by the following Bellman equation,
\[
\begin{aligned}
0 &= V_{t}(t, x) + \inf_{\delta} \left[ A V(t, x) + \beta(t, x) \right], && \text{(33a)} \\
0 &= V(n, x). && \text{(33b)}
\end{aligned}
\]
A supplement to this differential equation is the specification of the optimal dividend stream, i.e., the dividend stream that actually attains the infimum in the disutility reserve (32). This optimal dividend stream, specified by the optimal rate $\delta^{*}$, is simply the argument of the infimum in (33a), i.e.,
\[
\delta^{*} = \arg \inf_{\delta} \left[ A V(t, x) + \beta(t, x) \right].
\tag{34}
\]
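Because the expression in the square brackets is a convex quadratic function of $\delta$ whenever $p(t) > 0$, the minimization can be carried out by hand. The following short derivation is a sketch under the assumptions that $p(t) > 0$, that $V$ is differentiable in $x$, and that the dividend rate is unconstrained; it is worked out here from the definitions of $A$ and $\beta$ above and is not quoted from the source. Setting the derivative with respect to $\delta$ equal to zero gives

\[
-V_{x}(t, x) + 2 p(t) \left( \delta - a(t) \right) = 0
\quad \Longrightarrow \quad
\delta^{*}(t, x) = a(t) + \frac{V_{x}(t, x)}{2 p(t)},
\]

and inserting this rate back into the Bellman equation removes the infimum,

\[
0 = V_{t}(t, x) + V_{x}(t, x) \left( rx + c(t) - a(t) \right) - \frac{V_{x}(t, x)^{2}}{4 p(t)} + \frac{1}{2} V_{xx}(t, x)\, \rho^{2} + q(t)\, x^{2}.
\]

The optimal rate thus equals the target rate $a$ adjusted by the marginal disutility reserve $V_{x}$, scaled by the weight $p$ placed on dividend stability.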
It is worthwhile to comment on the connection between (33) and e.g. the variational inequality (26). In (26a)-(26b) and in (26d)-(26e), we had two inequalities, corresponding to two different actions, stopping and not stopping. From (26c) and (26f) one of the inequalities must be an equality. The structure of (33a) is the same in the sense that (33a) represents an infinite set of inequalities, corresponding to each possible dividend rate. However, one of the inequalities must hold with equality. Since for each dividend rate, the disutility reserve is described by the same partial differential equation, we can write this in the very compact way (33a). This compact way actually corresponds to the compact writing of (26d)-(26f) in (27).

We now go to the verification of (33) being a sufficient condition for characterization of the disutility reserve. We start out in the same way as in section 10.2. Given a function $H(t, x)$ solving (33) and an arbitrary dividend strategy $\delta$, we can write
\[
\begin{aligned}
H(t, X(t)) &= -\int_{t}^{n} dH(s, X(s)) \\
&= \int_{t}^{n} \left( dU(s) - H_{x}(s, X(s))\, \rho(s)\, dW(s) \right) \\
&\quad - \int_{t}^{n} \left( H_{s}(s, X(s)) + A H(s, X(s)) + \beta(s, X(s)) \right) ds.
\end{aligned}
\tag{35}
\]
Note that, given sufficient integrability, we could now, by taking conditional expectation on both sides of (35), conclude the following: If the disutility reserve were defined for an exogenously given dividend payment stream, then (33) would characterize the disutility reserve with this stream plugged in and without the infimum over $\delta$. This result is obtained by the methodology used in section 10.2.

We now argue how the extremum in (32) simply leads to the extremum in (33a). First, consider an arbitrary strategy $\delta$. For this strategy we know, by (33a), that
\[
0 \leq H_{t}(t, X(t)) + A H(t, X(t)) + \beta(t, X(t)),
\]
such that, by (35),
\[
H(t, X(t)) \leq \int_{t}^{n} \left( dU(s) - H_{x}(s, X(s))\, \rho(s)\, dW(s) \right).
\]
Now, assuming sufficient integrability, taking conditional expectation on both sides and then taking infimum over $D$ gives us the inequality
\[
H(t, X(t)) \leq \inf_{D} E\left[ \int_{t}^{n} dU(s) \,\middle|\, X(t) \right].
\tag{36}
\]
Second, for the specific strategy,
\[
\delta^{*} = \arg \inf_{\delta} \left[ A H(t, X(t)) + \beta(t, X(t)) \right],
\]
we know from (33a) that
\[
H_{t}(t, X(t)) + A H(t, X(t)) + \beta(t, X(t)) = 0.
\]
Inserting this in (35) yields
\[
H(t, X(t)) = \int_{t}^{n} \left( dU(s) - H_{x}(s, X(s))\, \rho(s)\, dW(s) \right).
\]
Now, taking conditional expectation on both sides and then comparing with the infimum over all possible dividend strategies yields the inequality
\[
H(t, X(t)) \geq \inf_{D} E\left[ \int_{t}^{n} dU(s) \,\middle|\, X(t) \right].
\tag{37}
\]
That $H(t, X(t)) = V(t, X(t))$ now follows from (36) and (37).

We shall not go into the methodology of solving (33a), but just state that it actually has a solution in explicit form. The solution is
\[
V(t, X(t)) = f(t) \left( X(t) - g(t) \right)^{2} + h(t),
\]
which is just a certain parametrization of a second order polynomial function in $X(t)$. The functions $f$, $g$, and $h$ are deterministic functions solving certain differential equations. We choose this parametrization in order to write the optimal dividend rate as
\[
\delta^{*}(t, x) = a(t) + \frac{f(t)}{p(t)} \left( x - g(t) \right),
\]
which leads to the following interpretation. First, the dividends contain the target rate $a$, taking into consideration the preferences over present dividends. Second, the preferences over the present and future surplus are hidden in an adjustment to this control. This adjustment controls $X$ towards $g$, which can be considered the optimal position for $X$ at time $t$. This adjustment happens with the force $f/p$, which weighs the future preferences over $X$ through $f$ against the present preferences over $\delta$ through $p$. The functions $a$, $p$, and $q$ appear in the differential equations for $f$, $g$, and $h$.
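The differential equations for the coefficient functions are not written out in the text, but they can be explored numerically. The sketch below is an illustration under stated assumptions rather than the author's formulation: it uses the equivalent parametrization $V(t, x) = A(t) x^{2} + B(t) x + C(t)$, so that $f = A$, $g = -B/(2A)$, and $h = C - B^{2}/(4A)$, and it assumes constant coefficients $r$, $c$, $a$, $p$, $q$, $\rho$. Inserting this guess into the Bellman equation with the minimizing rate $\delta^{*} = a + V_{x}/(2p)$ and collecting powers of $x$ gives, by our own calculation, $A' = A^{2}/p - 2rA - q$, $B' = AB/p - rB - 2A(c - a)$, and $C' = B^{2}/(4p) - B(c - a) - A\rho^{2}$, all with terminal value zero at $n$. The function names are hypothetical, and the integrator is a classical fourth-order Runge-Kutta scheme run backward in time.

```python
def solve_linear_regulator(r, c, a, p, q, rho, n, steps=2000):
    """Integrate A, B, C backward from t = n to t = 0 (constant coefficients).

    Returns a list `path` with path[i] = (A, B, C) at time t = i * n / steps.
    The ODEs are derived under the assumptions stated in the lead-in.
    """
    def rhs(y):
        A, B, C = y
        dA = A * A / p - 2.0 * r * A - q
        dB = A * B / p - r * B - 2.0 * A * (c - a)
        dC = B * B / (4.0 * p) - B * (c - a) - A * rho ** 2
        return (dA, dB, dC)

    h = n / steps
    y = (0.0, 0.0, 0.0)  # terminal condition V(n, x) = 0
    path = [y]
    for _ in range(steps):
        # One RK4 step from t to t - h.
        k1 = rhs(y)
        k2 = rhs(tuple(v - 0.5 * h * k for v, k in zip(y, k1)))
        k3 = rhs(tuple(v - 0.5 * h * k for v, k in zip(y, k2)))
        k4 = rhs(tuple(v - h * k for v, k in zip(y, k3)))
        y = tuple(v - h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
                  for v, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4))
        path.append(y)
    path.reverse()  # so that index 0 corresponds to t = 0
    return path


def optimal_dividend_rate(x, A, B, a, p):
    """delta* = a + V_x / (2p) with V_x = 2*A*x + B."""
    return a + (2.0 * A * x + B) / (2.0 * p)


path = solve_linear_regulator(r=0.03, c=1.0, a=1.0, p=1.0, q=0.5, rho=0.2, n=10.0)
A0, B0, C0 = path[0]
rate_at_start = optimal_dividend_rate(2.0, A0, B0, a=1.0, p=1.0)
```

In this example the contribution rate equals the target rate, so `B0` is zero and the surplus is steered towards $g = 0$. With $q = 0$ there is no preference over the surplus, all three functions vanish, and the optimal rate is simply the target rate $a$.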
The optimal dividend rate is affine in X. So we can conclude that if we redistribute according to the specifications in this section, then it makes sense to work with affine dividend strategies. In general, disutility rates that are functions of the dividend rates and the surplus always lead to optimal dividend rates that are linked to the surplus. This is a consequence of the Markov property. Thus, it does make sense in general to work with the system (10.23).

10.5.2 Statewise quadratic optimization of dividends

In this section, we state the differential equation for the disutility reserve of an optimization problem where preferences over surplus and dividends are specified by a quadratic disutility function. The surplus is modelled as in section 10.3. We also indicate the solution to the differential equation and the optimal dividend strategy. See Steffensen (2006a) for the generalization of the linear regulator to Markov chain driven payments.

In the previous section, we approximated the surplus by a diffusion process and controlled it by an absolutely continuous dividend process D. We now take the step back to the original surplus process with dynamics given by (19). Again, however, we skip investment in the risky asset such that (19) reduces to

$$dX(t) = rX(t)\,dt + d(C - D)(t), \qquad X(0) = x_0,$$

where C and D are the contribution and dividend processes given in (20) and (18), respectively. As in the previous section, we now introduce a process U of accumulated disutilities. However, due to the structure of C and D, we allow for lump sum disutilities at the discontinuities of C and D. Thus, inheriting the structure of the payment processes, U is taken to have the dynamics

$$dU(t) = dU^{Z(t)}\bigl(t, \delta(t), \Delta D(t), X(t)\bigr) + \sum_{k:\,k \ne Z(t-)} u^{Z(t-)k}\bigl(t, \delta^k(t), X(t)\bigr)\,dN^k(t),$$

$$dU^j(t, \delta, \Delta D, x) = u^j(t, \delta, x)\,dt + \Delta U^j(t, \Delta D, x).$$

Inspired by the quadratic disutility functions introduced in the previous section, we form the coefficients in the process of accumulated
disutilities U accordingly, i.e.

$$u^j(t, \delta, x) = p^j(t)\bigl(\delta - a^j(t)\bigr)^2 + q^j(t)\,x^2,$$

$$u^{jk}(t, \delta^k, x) = p^{jk}(t)\bigl(\delta^k - a^{jk}(t)\bigr)^2 + q^{jk}(t)\,x^2,$$

$$\Delta U^j(t, \Delta D, x) = \Delta P^j(t)\bigl(\Delta D - \Delta A^j(t)\bigr)^2 + \Delta Q^j(t)\,x^2.$$

These coefficients should be compared with (31). First, there are now three coefficients corresponding to disutility rates, lump sum disutilities upon transitions of Z, and lump sum disutilities at deterministic points in time. Second, for each type of dividend payment, we allow the target to be state dependent. Third, the weights on disutility of dividend deviations against disutility of surplus deviations are also allowed to be state dependent.

The idea is now, with the generalized process of accumulated disutilities, to solve the corresponding optimization problem associated with the disutility reserve

$$V^{Z(t)}(t, X(t)) = \inf_{D} E\left[ \int_t^n dU(s) \,\middle|\, Z(t), X(t) \right]. \tag{38}$$
We introduce the differential operator $\mathcal{A}$, the disutility rate β and the updating sum R,

$$\mathcal{A}V^j(t, x) = \sum_{k:\,k \ne j} \mu^{jk}(t)\Bigl( V^k\bigl(t, x + c^{jk}(t) - \delta^k\bigr) - V^j(t, x) \Bigr) + V_x^j(t, x)\bigl( rx + c^j(t) - \delta \bigr),$$

$$\beta^j(t, x) = p^j(t)\bigl(\delta - a^j(t)\bigr)^2 + q^j(t)\,x^2 + \sum_{k:\,k \ne j} \mu^{jk}(t)\Bigl( p^{jk}(t)\bigl(\delta^k - a^{jk}(t)\bigr)^2 + q^{jk}(t)\bigl(x + c^{jk}(t) - \delta^k\bigr)^2 \Bigr),$$

$$R^j(t, x) = \Delta P^j(t)\bigl(\Delta D - \Delta A^j(t)\bigr)^2 + \Delta Q^j(t)\,x^2 + V^j\bigl(t, x + \Delta C^j(t) - \Delta D\bigr) - V^j(t-, x).$$

We are now ready to present the eighth differential equation, which is a generalized version of the Bellman equation (10.33).
Proposition 8. The disutility reserve given by (38) is characterized by the following Bellman equation,

$$0 = V_t^j(t, x) + \inf_{\delta, \delta^k} \bigl[ \mathcal{A}V^j(t, x) + \beta^j(t, x) \bigr], \quad t \notin \mathcal{D}, \tag{39a}$$

$$0 = \inf_{\Delta D} R^j(t, x), \quad t \in \mathcal{D}, \tag{39b}$$

$$0 = V^j(n, x). \tag{39c}$$
The methodology needed for verification of (39) as a sufficient condition for characterization of the disutility reserve is the same as in the previous section. However, the state dependence makes the calculations somewhat more involved.

In the previous section, we proposed an appropriately parametrized second order polynomial function as solution to the Bellman equation. It is very convenient that this simple structure is inherited by the solution to (39). The only generalization of the proposed solution is that the coefficient functions f, g, and h should be state dependent, i.e.

$$V^{Z(t)}(t, X(t)) = f^{Z(t)}(t)\bigl( X(t) - g^{Z(t)}(t) \bigr)^2 + h^{Z(t)}(t).$$

Now, it is possible to derive systems of ordinary differential equations for f, g, and h that can be solved numerically. The optimal dividend payments, given that the policy is in state j at time t, are given by

$$\delta^{*j}(t, x) = a^j(t) + \frac{f^j(t)}{p^j(t)}\bigl( x - g^j(t) \bigr),$$

$$\delta^{*jk}(t, x) = \frac{p^{jk}(t)}{p^{jk}(t) + q^{jk}(t) + f^k(t)}\,a^{jk}(t) + \frac{q^{jk}(t)}{p^{jk}(t) + q^{jk}(t) + f^k(t)}\bigl( x + c^{jk}(t) \bigr) + \frac{f^k(t)}{p^{jk}(t) + q^{jk}(t) + f^k(t)}\bigl( x + c^{jk}(t) - g^k(t) \bigr).$$
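The formula for the lump sum dividend upon transition can be checked numerically. The sketch below assumes, following the definitions of the operator A, the rate β and the quadratic form of V, that the terms depending on δ^k are p(δ − a)² + q(x + c − δ)² + f(x + c − δ − g)², where p, q, a, c stand for the jk-indexed coefficients and f, g for the coefficients of the destination state k, all evaluated at one fixed time. The function names and the numbers are illustrative only.

```python
def transition_dividend(x, c, a, p, q, f_k, g_k):
    """Weighted average of a, (x + c) and (x + c - g_k) with weights p, q, f_k."""
    total = p + q + f_k
    y = x + c  # surplus after the transition contribution, before the dividend
    return (p * a + q * y + f_k * (y - g_k)) / total


def transition_cost(delta, x, c, a, p, q, f_k, g_k):
    """Terms of the Bellman equation that depend on the transition dividend delta."""
    y = x + c
    return (p * (delta - a) ** 2
            + q * (y - delta) ** 2
            + f_k * (y - delta - g_k) ** 2)


params = dict(x=5.0, c=1.0, a=2.0, p=1.0, q=2.0, f_k=3.0, g_k=4.0)
best = transition_dividend(**params)

# The closed form should not be beaten by nearby alternatives.
for step in (-0.5, -0.01, 0.01, 0.5):
    assert transition_cost(best, **params) < transition_cost(best + step, **params)
print(best)
```

Setting the derivative of the quadratic cost to zero reproduces the three weights, which is why the sketch compares the closed form only against perturbed payments rather than running a general optimizer.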
The optimal lump sum dividend payment on $\mathcal{D}$, $\Delta D^{*j}(t)$, follows a formula similar in structure to the formula for $\delta^{*jk}(t)$. Due to the parametrization of the second order polynomial solution, the following interpretations of $\delta^{*j}(t)$ and $\delta^{*jk}(t)$, respectively, apply.

The optimal dividend rate should be interpreted in the same way as in the previous section. The rate is given by the target rate and an adjustment that takes care of future preferences over X. The adjustment moves X towards its optimal position at time t, $g^j(t)$, with the force $f^j(t)/p^j(t)$.

Now consider the optimal lump sum payment upon transition. This is actually a weighted average of three quantities corresponding to three considerations. First, a dividend payment equal to its target is preferred with the first weight $p^{jk}(t) / \bigl(p^{jk}(t) + q^{jk}(t) + f^k(t)\bigr)$. Second, a payment pushing $X(t)$ towards its target 0 is preferred with the second weight $q^{jk}(t) / \bigl(p^{jk}(t) + q^{jk}(t) + f^k(t)\bigr)$. Third, the consideration of the position of X in the future leads to an adjustment that brings X close to its optimal position after the transition, $g^k(t)$, by a force equal to the third weight, $f^k(t) / \bigl(p^{jk}(t) + q^{jk}(t) + f^k(t)\bigr)$. A similar interpretation applies for the lump sum payment at deterministic points in time.

10.6 Utility optimization

10.6.1 Merton’s optimization problem
In this section, we state the differential equation for a value function, here called the utility reserve, of an optimization problem where preferences over surplus and dividends are specified by a power utility function. The surplus introduced in section 10.3 is here approximated by a considerably simpler process. We also indicate the solution to the differential equation and the optimal dividend strategy. See Korn (1997) and Merton (1990) for original contributions.

In section 10.5.1, we approximated the surplus introduced in 10.3. This led to a portfolio version of the quadratic optimization problem of a life insurance company. Here again, we formulate the redistribution problem as a control problem. However, we now add a decision variable. We do not assume that surplus is invested in the riskfree asset only. Instead, we consider the proportion invested in risky assets as a decision variable. Here, we start out by approximating the surplus introduced in section 10.3 on the basis of the following list of adaptations:

• We assume that the process of contributions to the surplus is absolutely continuous and accumulates by the rate c.
• We assume that accumulated dividends are absolutely continuous and paid out by the rate δ.

This gives us the following surplus dynamics,

$$dX(t) = \bigl( r + \pi(t)(\alpha - r) \bigr) X(t)\,dt + \pi(t)\,\sigma X(t)\,dW(t) + d(C - D)(t), \qquad X(0) = x_0,$$

where

$$dC(t) = c(t)\,dt, \qquad dD(t) = \delta(t)\,dt.$$

We present the problem here and the solution below in terms of the insurance company’s surplus distribution problem. However, the personal financial problem of optimal investment-consumption is the same. There the contribution process is interpreted as the personal income process and the dividend process is the consumption process. So the results below should also be read with that application in mind.

As in section 10.5, we introduce a preference criterion to decide on a dividend rate δ and an investment proportion π. We introduce a process of accumulated utilities U and a power utility rate $u(t, \delta(t))$, i.e. for γ < 1,

$$dU(t) = u(t, \delta(t))\,dt, \qquad u(t, \delta) = \frac{1}{\gamma}\,a(t)^{1-\gamma}\,\delta^{\gamma}. \tag{40}$$
The criterion (40) rewards high dividend rates without consideration of the surplus. The deterministic function a weighs the utility of dividends over time. Without further specifications such a problem has no solutions since it would be optimal to pay out infinite dividend rates. However, adding the constraint that the terminal surplus must be non-negative, the problem makes sense. We now measure the future utilities by the utility reserve,

$$V(t, X(t)) = \sup_{\pi, D} E\left[ \int_t^n dU(s) \,\middle|\, X(t) \right]. \tag{41}$$
Note the similarity with e.g. (14) where we now, instead of measuring an expected (present) value of the payment rates δ, measure an expected utility function of the payment rates. Hereto, we have added the supremum that leaves us with an optimization problem.

Now, we introduce the differential operator $\mathcal{A}$ and the utility rate β,

$$\mathcal{A}V(t, x) = V_x(t, x)\bigl( (r + \pi(\alpha - r))\,x + c(t) - \delta \bigr) + \frac{1}{2} V_{xx}(t, x)\,\pi^2 \sigma^2 x^2,$$

$$\beta(t) = \frac{1}{\gamma}\,a(t)^{1-\gamma}\,\delta^{\gamma}.$$

We are now ready to present the ninth differential equation, which is a Bellman equation.

Proposition 9. The utility reserve given by (41) is characterized by the following Bellman equation,

$$0 = V_t(t, x) + \sup_{\delta, \pi} \bigl[ \mathcal{A}V(t, x) + \beta(t) \bigr], \tag{42a}$$

$$0 = V(n, x). \tag{42b}$$
The optimal dividend stream and the optimal investment strategy, specified by the optimal rate $\delta^*$ and the optimal proportion $\pi^*$, are simply the arguments of the supremum in (42), i.e.,

$$\delta^* = \arg\sup_{\delta} \bigl[ \mathcal{A}V(t, x) + \beta(t) \bigr],$$

$$\pi^* = \arg\sup_{\pi} \bigl[ \mathcal{A}V(t, x) + \beta(t) \bigr].$$
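The two suprema can be made explicit by a short worked step, which is not spelled out in the text. It assumes an interior optimum with $V_x > 0$ and $V_{xx} < 0$, and uses only the terms of $\mathcal{A}V + \beta$ that depend on δ and π.

```latex
% Terms depending on \delta:  -V_x \delta + \tfrac{1}{\gamma} a(t)^{1-\gamma} \delta^{\gamma}
\frac{\partial}{\partial \delta}: \quad
  -V_x(t,x) + a(t)^{1-\gamma} \delta^{\gamma-1} = 0
  \;\Longrightarrow\;
  \delta^* = a(t)\, V_x(t,x)^{-1/(1-\gamma)} .

% Terms depending on \pi:  V_x \pi (\alpha - r) x + \tfrac12 V_{xx} \pi^2 \sigma^2 x^2
\frac{\partial}{\partial \pi}: \quad
  V_x(t,x)(\alpha - r)\,x + V_{xx}(t,x)\,\pi\,\sigma^2 x^2 = 0
  \;\Longrightarrow\;
  \pi^* x = -\frac{V_x(t,x)}{V_{xx}(t,x)}\,\frac{\alpha - r}{\sigma^2} .
```

Both conditions are stated for a generic value function, so they only become usable once a candidate form for V is inserted.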
The verification of (42) as a sufficient condition characterizing the utility reserve goes in exactly the same way as in section 10.5. The only difference is that all inequalities are turned around since we are now solving a maximization problem instead of a minimization problem.

As we did in section 10.5, we can separate the variables of the solution. In this case, the solution is given by

$$V(t, X(t)) = \frac{1}{\gamma}\,f(t)^{1-\gamma}\,\bigl( X(t) + g(t) \bigr)^{\gamma}.$$

With this parametrization of the solution, both f and g have solutions that can be interpreted as present values. There exists an artificial rate $r^*$ that depends on all parameters in the model, such that

$$f(t) = \int_t^n e^{-\int_t^s r^*}\,a(s)\,ds,$$

$$g(t) = \int_t^n e^{-\int_t^s r}\,c(s)\,ds.$$
The function f is a present value that says something about the value of investing and smoothing out the surplus over the residual time to maturity. The time weights in the function a appear in f. The function g is the present value of future contributions to the surplus. From its appearance in the utility reserve, the insurance company could activate all future surplus contributions and account for them in the surplus. The optimal controls become

$$\delta^*(t, x) = \frac{a(t)}{f(t)}\,\bigl( x + g(t) \bigr),$$

$$\pi^*(t, x)\,x = \frac{1}{1-\gamma}\,\frac{\alpha - r}{\sigma^2}\,\bigl( x + g(t) \bigr).$$
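The optimal controls can be evaluated with a small sketch under simplifying assumptions that are not in the text: the weight a, the contribution rate c and both discount rates are taken as constants, so that f and g reduce to annuity values, and the artificial rate r* is treated as a given input (its dependence on the model parameters is not stated in this section). All names and numbers are illustrative.

```python
import math


def annuity(rate, horizon):
    """Integral of exp(-rate * u) for u from 0 to horizon."""
    if abs(rate) < 1e-12:
        return horizon
    return (1.0 - math.exp(-rate * horizon)) / rate


def merton_controls(t, x, n, a, c, r, r_star, alpha, sigma, gamma):
    """Dividend rate and amount in stocks for constant a, c, r and a given r_star.

    Requires t < n, because f(t) tends to 0 as t approaches n.
    """
    f = a * annuity(r_star, n - t)   # present value of the weights a at rate r*
    g = c * annuity(r, n - t)        # present value of future contributions
    dividend_rate = a / f * (x + g)
    stock_amount = (alpha - r) / ((1.0 - gamma) * sigma ** 2) * (x + g)
    return dividend_rate, stock_amount


common = dict(x=100.0, n=10.0, a=1.0, c=0.0, r=0.02, r_star=0.03,
              alpha=0.06, sigma=0.2, gamma=-1.0)
early = merton_controls(t=0.0, **common)
late = merton_controls(t=9.0, **common)
print(early, late)
```

With these inputs the amount held in stocks is the same at both dates, while the dividend rate per unit of surplus rises as the remaining horizon shrinks, in line with f tending to 0 at maturity.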
These strategies are easy to interpret. One should pay out, at any point in time, a fraction $a(t)/f(t)$ of $X(t) + g(t)$ that weighs the future preferences over dividends through f against the present preferences over dividends through a. The amount optimally invested in stocks, $\pi^*(t, X(t))\,X(t)$, is a proportion of the surplus plus activated future surplus contributions. In the personal finance consumption-investment problem, the function g is the present value of future income, which is also called human wealth. Note that for t → n, f → 0, such that the proportion of X paid out as dividends tends to infinity. The consequence is that the optimally controlled surplus ends at 0.

10.6.2 Statewise power utility optimization of dividends

In this section, we state the differential equation for the utility reserve of an optimization problem where preferences over surplus and dividends are specified by a power utility function. The surplus is modelled as in section 10.3. We also indicate the solution to the differential equation and the optimal dividend strategy. See Steffensen (2004) for the generalization of Merton’s optimization problem to Markov chain driven payments.

In the previous section, we approximated the surplus by modelling both contributions and dividends as absolutely continuous processes. We take a step back to the original surplus process with dynamics given by (19), and model the surplus as in (19) with the exception that we still take C to be approximated by a deterministic function, i.e.,

$$dC(t) = c(t)\,dt + \Delta C(t),$$

not allowing for state dependence in the surplus contribution. We return to this detail at the very end. The full return to (19) with C given by (20) is not immediately possible. The process of accumulated utilities is, on the other hand, given by

$$dU(t) = dU^{Z(t)}\bigl(t, \delta(t), \Delta D(t)\bigr) + \sum_{k:\,k \ne Z(t-)} u^{Z(t-)k}\bigl(t, \delta^k(t)\bigr)\,dN^k(t),$$

$$dU^j(t, \delta, \Delta D) = u^j(t, \delta)\,dt + \Delta U^j(t, \Delta D).$$

We generalize the power utility function to state-dependent utility functions in the sense that

$$u^j(t, \delta) = \frac{1}{\gamma}\,a^j(t)^{1-\gamma}\,\delta^{\gamma},$$

$$u^{jk}(t, \delta^k) = \frac{1}{\gamma}\,a^{jk}(t)^{1-\gamma}\,\bigl(\delta^k\bigr)^{\gamma},$$

$$\Delta U^j(t, \Delta D) = \frac{1}{\gamma}\,\Delta A^j(t)^{1-\gamma}\,(\Delta D)^{\gamma}.$$
These coefficients should be compared with (40). Due to the structure of the dividend payments, there is now one coefficient for each type of dividend payment. Furthermore, we allow the coefficient functions to depend on the state of Z. We now introduce the utility reserve

$$V^{Z(t)}(t, X(t)) = \sup_{D, \pi} E\left[ \int_t^n dU(s) \,\middle|\, Z(t), X(t) \right]. \tag{43}$$
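We introduce the differential operator $\mathcal{A}$, the utility rate β and the updating sum R,

$$\mathcal{A}V^j(t, x) = \sum_{k:\,k \ne j} \mu^{jk}(t)\Bigl( V^k\bigl(t, x - \delta^k\bigr) - V^j(t, x) \Bigr) + V_x^j(t, x)\bigl( (r + \pi(\alpha - r))\,x + c(t) - \delta \bigr) + \frac{1}{2} V_{xx}^j(t, x)\,\pi^2 \sigma^2 x^2,$$

$$\beta^j(t) = \frac{1}{\gamma}\,a^j(t)^{1-\gamma}\,\delta^{\gamma} + \sum_{k:\,k \ne j} \mu^{jk}(t)\,\frac{1}{\gamma}\,a^{jk}(t)^{1-\gamma}\,\bigl(\delta^k\bigr)^{\gamma},$$

$$R^j(t, x) = \frac{1}{\gamma}\,\Delta A^j(t)^{1-\gamma}\,(\Delta D)^{\gamma} + V^j\bigl(t, x + \Delta C(t) - \Delta D\bigr) - V^j(t-, x).$$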
We are now ready to present the tenth - and final - differential equation, which is a generalized version of the Bellman equation (42).

Proposition 10. The utility reserve given by (43) is characterized by the following Bellman equation,

$$0 = V_t^j(t, x) + \sup_{\pi, \delta, \delta^k} \bigl[ \mathcal{A}V^j(t, x) + \beta^j(t) \bigr], \quad t \notin \mathcal{D}, \tag{44a}$$

$$0 = \sup_{\Delta D} R^j(t, x), \quad t \in \mathcal{D}, \tag{44b}$$

$$0 = V^j(n, x). \tag{44c}$$
The methodology needed for verifying (44) as a sufficient condition for the characterization of the utility reserve is the same as in section 10.5.

In section 10.5, we separated variables of the utility reserve. In section 10.5, introducing state dependence led to a separation of variables, such that parts depending on time became state dependent as well. The question is whether this trick works here again. Indeed,

$$V^{Z(t)}(t, X(t)) = \frac{1}{\gamma}\,f^{Z(t)}(t)^{1-\gamma}\,\bigl( X(t) + g(t) \bigr)^{\gamma}. \tag{45}$$
In the previous section, the function f could be interpreted as an artificial present value of the stream of coefficients a. Here again, the resulting differential equation for f leads to similar possibilities for interpretations. However, the conclusion becomes somewhat involved and is not pursued further here. On the other hand, we still have that

$$g(t) = \int_t^n e^{-\int_t^s r}\,dC(s),$$

and the insurance company can again activate all future deterministic surplus contributions and account for them in the surplus. The optimal amount invested in stocks is still given by

$$\pi^*(t, x)\,x = \frac{1}{1-\gamma}\,\frac{\alpha - r}{\sigma^2}\,\bigl( x + g(t) \bigr),$$

whereas the optimal dividend payments are formalized by

$$\delta^{*j}(t, x) = \frac{a^j(t)}{f^j(t)}\,\bigl( x + g(t) \bigr),$$

$$\delta^{*jk}(t, x) = \frac{a^{jk}(t)}{a^{jk}(t) + f^k(t)}\,\bigl( x + g(t) \bigr),$$

$$\Delta D^{*j}(t, x) = \frac{\Delta A^j(t)}{\Delta A^j(t) + f^j(t)}\,\bigl( x + g(t) \bigr).$$
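The fraction in the lump sum dividend upon transition can be recovered in one step. The derivation below is a sketch rather than part of the original text: it assumes an interior optimum, inserts the form (45) for the destination state k, and keeps only the terms of (44a) that depend on $\delta^k$.

```latex
% Terms depending on \delta^k (common factor \mu^{jk}(t)/\gamma dropped):
%   a^{jk}(t)^{1-\gamma} (\delta^k)^{\gamma}
%   + f^k(t)^{1-\gamma} \bigl( x - \delta^k + g(t) \bigr)^{\gamma}
\frac{\partial}{\partial \delta^k}: \quad
  a^{jk}(t)^{1-\gamma} (\delta^k)^{\gamma-1}
  = f^k(t)^{1-\gamma} \bigl( x + g(t) - \delta^k \bigr)^{\gamma-1}
\;\Longrightarrow\;
  \frac{a^{jk}(t)}{\delta^k} = \frac{f^k(t)}{x + g(t) - \delta^k}
\;\Longrightarrow\;
  \delta^{*jk}(t,x) = \frac{a^{jk}(t)}{a^{jk}(t) + f^k(t)}\,\bigl( x + g(t) \bigr).
```

The denominator therefore collects the present preference $a^{jk}$ and the future preference $f^k$ of the state entered by the transition.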
Again, we can interpret the optimal fraction of surplus in the optimal dividend rate as a trade-off between present considerations in a and future considerations in f . The same interpretation applies for the optimal lump sum dividends. The numerator concerns the present preferences, while the denominator concerns the future preferences, including the present. For all considerations, the state dependence of f is reflected in the state dependent optimal dividend payments. In section 10.5, we ended up with X (n) = 0 due to infinite dividend proportions of the surplus as we get closer to maturity. In this case, the same conclusion is a consequence of the terminal condition f j (n) = 0. We end by a comment on the assumption that the contribution process is not state dependent, in contrast to section 10.5.2. In the case of power utility and state dependent contributions, the value function (45) is not correct. The problem is that the market is incomplete, since the Z-risk is not given any price by the market in contrast to W -risk. Therefore, there exists no unique arbitrage free price of the future contributions if they depend on Z. In order to obtain a complete market situation, one has to add decision variables corresponding to insurance against Z-risk. This is exactly what happens in Kraft and Steffensen (2006). There, however, the viewpoint is not that of the insurance company. In contrast, the viewpoint is that of the policy holder, and therefore Kraft and Steffensen (2006) is a generalization of the personal finance consumption-investment problem, which incorporates Z-risk and insurance decisions.
References:

Aase, K. K., and Persson, S.-A. (1994) “Pricing of unit-linked life insurance policies.” Scandinavian Actuarial Journal: 26–52.
Black, F., and Scholes, M. (1973) “The pricing of options and corporate liabilities.” Journal of Political Economy 81: 637–654.
Brennan, M. J., and Schwartz, E. S. (1976) “The pricing of equity-linked life insurance policies with an asset value guarantee.” Journal of Financial Economics 3: 195–213.
Cairns, A. J. G. (2000) “Some notes on the dynamics and optimal control of stochastic pension fund models in continuous time.” ASTIN Bulletin 30(1): 19–55.
Fleming, W. H., and Rishel, R. W. (1975) Deterministic and Stochastic Optimal Control. Springer-Verlag.
Grosen, A., and Jørgensen, P. L. (2000) “Fair valuation of life insurance liabilities: The impact of interest rate guarantees, surrender options, and bonus policies.” Insurance: Mathematics and Economics 26: 37–57.
Hald, A. (1981) “T. N. Thiele’s contributions to Statistics.” International Statistical Review 49: 1–20.
Hoem, J. M. (1969) “Markov chain models in life insurance.” Blätter der Deutschen Gesellschaft für Versicherungsmathematik 9: 91–107.
Korn, R. (1997) Optimal Portfolios. World Scientific.
Kraft, H., and Steffensen, M. (2006) Optimal Consumption and Insurance: A Continuous-Time Markov Chain Approach. Technical report, Laboratory of Actuarial Mathematics, University of Copenhagen.
Merton, R. C. (1973) “Theory of rational option pricing.” Bell Journal of Economics and Management Science 4: 141–183.
Merton, R. C. (1990) Continuous-time Finance. Blackwell.
Nielsen, P. H. (2005) “Optimal bonus strategies in life insurance: The Markov chain interest rate case.” Scandinavian Actuarial Journal 2: 81–102.
Norberg, R. (1991) “Reserves in life and pension insurance.” Scandinavian Actuarial Journal: 3–24.
Norberg, R. (1999) “A theory of bonus in life insurance.” Finance and Stochastics 3(4): 373–390.
Norberg, R. (2001) “On bonus and bonus prognoses in life insurance.” Scandinavian Actuarial Journal 2: 126–147.
Norberg, R., and Steffensen, M. (2005) “What is the time value of a stream of investment?” Journal of Applied Probability 42(3): 861–866.
Richard, S. F. (1975) “Optimal consumption, portfolio and life insurance rules for an uncertain lived individual in a continuous time model.” Journal of Financial Economics 2: 187–203.
Steffensen, M. (2000) “A no arbitrage approach to Thiele’s differential equation.” Insurance: Mathematics and Economics 27: 201–214.
Steffensen, M. (2002) “Intervention options in life insurance.” Insurance: Mathematics and Economics 31: 71–85.
Steffensen, M. (2004) “On Merton’s problem for life insurers.” ASTIN Bulletin 34(1): 5–25.
Steffensen, M. (2006a) “Quadratic optimization of life and pension insurance payments.” ASTIN Bulletin 36(1): 245–267.
Steffensen, M. (2006b) “Surplus-linked life insurance.” Scandinavian Actuarial Journal 2006(1): 1–22.
Thiele, T. N. (1880) Sur la compensation de quelques erreurs quasi-systématiques par la méthode des moindres carrés. Reitzel, Copenhagen. See also Hald (1981).
Chapter 11 Uncertain Technological Change and Capital Mobility
Paul A. de Hek Netherlands Bureau for Economic Policy Analysis (CPB)
11.1 Introduction
Although much work has been done in the field of stochastic endogenous growth models (see e.g. King and Rebelo 1988; King, Plosser and Rebelo 1988; Obstfeld 1994; Hopenhayn and Muniagurria 1996; Wälde 1999), there are only a few analyses of the influence of uncertainty on (the distribution of) the long-run growth rate. Previous work on economic growth under uncertainty has focused on issues like the existence of a limiting distribution for capital and consumption, but has not tried to understand how the distribution of productivity shocks affects growth in the long run.

In an important empirical study, Ramey and Ramey (1995) find evidence that economic growth and the volatility of the economic fluctuations are negatively linked. This negative relationship is primarily due to the volatility of the innovations to growth (i.e., of unpredictable changes in the growth rate). This latter measure corresponds closely to the notion of uncertainty. At face value, this result seems to contradict those of Kormendi and Meguire (1985), who find that the standard deviation of output growth has a significant positive effect on growth. However, Ramey and Ramey (1995, p.1145) argue that in the regressions of Kormendi and Meguire, the positive effect of the standard deviation may be capturing the effect of predictable movements in growth. In that way, both results are consistent: volatility of the innovations seems to have a negative effect, while volatility in the predicted variable has a positive effect on growth.[1] Recent studies by Martin and Rogers (2000) and Imbs (2004) largely confirm the result that countries with higher volatility grow (conditionally) at a lower rate.

Investments in research and development (R&D) or, more generally, investments in the creation of knowledge are the driving force behind the advancement of the technology. More investments will generally lead to a higher rate of technological change, and, consequently, to higher economic growth. However, the return to these investments is not known in advance, that is, the productivity of knowledge creation is uncertain. This creates a link between uncertainty and (long-run) growth. In the present study, uncertainty derives from randomness in the productivity of R&D. In general, one part of uncertainty is due to individual, firm-specific (idiosyncratic) uncertainty, while the other part arises from economy-wide (common) shocks, which have the same impact on all firms. Here, the analysis will focus on common shocks[2], such as technology and policy shocks. The objective of this study is, then, to find out the nature (positive or negative) of the link between growth and aggregate uncertainty and to identify the main factors that determine this nature.

Concerning the theoretical literature on this topic, both De Hek (1999) and Jones, Manuelli and Stacchetti (2005) show that the relationship between volatility in macroeconomic productivity and mean growth can be either positive or negative. The curvature of the utility function is identified as a key parameter that determines the sign of the relationship. In a recent paper, Blackburn and Pelloni (2004) investigate the relationship between growth and volatility in learning-by-doing economies. They find that the correlation between long-term growth and short-term volatility depends on the source of stochastic fluctuations and the functioning of the labor market.
As regards the former, long-run growth is negatively related to the volatility of (nonneutral) nominal shocks, but positively related to the volatility of real shocks.

The present analysis uses a model of endogenous technological change where sustained growth stems from intentional investments in R&D from profit-maximizing, risk-averse firms. Physical capital is assumed to be fully mobile, while labor is assumed to be immobile. Uncertainty derives from the productivity of investments in R&D. The main result of this analysis is that the relationship between long-run growth and uncertainty (on the productivity of knowledge creation) depends on two main factors: increasing or non-increasing returns to scale in knowledge creation, and a high or low value of the elasticity of intertemporal substitution (of a firm’s profits). Empirical studies on the returns to scale in knowledge creation (“non-increasing”) and the value of the elasticity of intertemporal substitution (“higher than the critical value”) indicate a negative relationship between long-run growth and uncertainty regarding the productivity of knowledge creation. Hence, this study identifies a new factor, the returns to scale in the research sector, which influences the growth-uncertainty relationship. Moreover, while Jones, Manuelli and Stacchetti (2005) quantitatively find a positive relationship between growth and uncertainty[3], the present analysis establishes a verifiable critical value of the elasticity of intertemporal substitution, implying a negative relationship between growth and uncertainty that is consistent with the empirical evidence cited above.

[1] See also Guiso and Parigi (1999) and Aizenman and Marion (1999), who find a negative relationship between volatility and (private) investment.
[2] Schankerman (2001) finds that idiosyncratic shocks do not account for much (approximately 25%) of the variation in investment decisions. Nearly 75% of the micro-variance is due to heterogeneity in micro-level responses to aggregate (common) shocks.

11.2 Framework of the model
The model that will be developed in this section is based on the models of endogenous technological change of Romer (1990) and Aghion and Howitt (1998, Ch. 3). The main difference with these models is that in the present model, instead of having a separate research sector, research is being undertaken by the intermediate-good producers. Research by a firm enhances the firm’s own state of the technology (and has a positive external effect on the other firms’ states of the technology). This setting allows us to find the effect of higher uncertainty in the productivity of investments in R&D on the growth rate through the optimal choices, concerning capital and (skilled) labor, of the intermediate-good producers.
[3] De Hek (1999) makes no prediction concerning the most likely nature of the relationship.

11.2.1 Technology
The consumption-capital good in the economy, final output Y, is produced according to

$$Y_t = L_t^{1-\beta} \int_0^1 A_{it}\,x_{it}^{\beta}\,di, \tag{1}$$

where $x_{it}$ is the quantity of intermediate (or capital[4]) good i, $L_t$ is the quantity of labor employed to produce final output and $A_{it}$ is an index for the technology or knowledge in firm (or sector) i. At each date, the representative final-output firm decides how much of each intermediate good it rents from the producers of those goods. Maximization of its profits implies that the price (or rental rate) $p_{it}$ of intermediate good i is given by

$$p_{it} = \beta L_t^{1-\beta} A_{it}\,x_{it}^{\beta-1}, \quad \forall i \in [0, 1]. \tag{2}$$

The wage rate $w_{L,t}$ of (skilled) labor used in the final-output sector is equal to its marginal product,

$$w_{L,t} = (1-\beta)\,L_t^{-\beta} \int_0^1 A_{it}\,x_{it}^{\beta}\,di. \tag{3}$$
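As a consistency check on (2) and (3), a short worked step (not part of the original text) shows that payments to intermediate goods and to labor in the final-output sector add up to final output, as expected for a constant-returns technology with competitive factor pricing.

```latex
\int_0^1 p_{it}\,x_{it}\,di
  = \beta L_t^{1-\beta} \int_0^1 A_{it}\,x_{it}^{\beta}\,di
  = \beta Y_t ,
\qquad
w_{L,t}\,L_t
  = (1-\beta)\,L_t^{1-\beta} \int_0^1 A_{it}\,x_{it}^{\beta}\,di
  = (1-\beta)\,Y_t ,

\int_0^1 p_{it}\,x_{it}\,di + w_{L,t}\,L_t = Y_t .
```

The share β of output therefore accrues to the intermediate-good producers, and the share 1 − β to labor employed in final production.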
Each intermediate good is produced by a firm that has an infinitely-valid patent on that design (or can in some other way effectively prevent other competitors from entering the market, without affecting the profit maximization). Due to this monopoly power, an intermediate firm can devote resources, i.e., labor, to research and development (R&D), which enhances the state of the technology of that firm. A higher state of the technology might be seen as an improvement of the quality of the firm’s product and implies higher profits. The intermediate sector uses labor to conduct research. Labor or human capital[5] in sector i is denoted by $h_{it}$. Average or total human capital used to conduct research is then given by

$$H_t = \int_0^1 h_{jt}\,dj.$$

The total labor force in the economy is fixed and set to 1, i.e., $L_t + H_t = 1$ for all t.

[4] Intermediate goods and capital (goods) are used interchangeably throughout the study.
[5] In this study, the amount of human capital used in sector i, $h_{it}$, is defined to be the amount of labor used in sector i, $l_{Ait}$, times the (constant) skill level, h. Normalizing h to 1 implies that $h_{it} = l_{Ait}$.
To produce intermediate goods at the rate $x_{it}$, the firm in sector i requires the use of $A_{it} x_{it}$ units of capital. We assume throughout this study full international capital mobility, while labor (human capital) is assumed to be immobile. Thus the interest rate r is exogenously given and is equal to the international interest rate. The per period profit of an intermediate-good producer is therefore given by

$$\pi_{it} = p_{it}\,x_{it} - r A_{it}\,x_{it} - w_{H,t}\,h_{it}, \tag{4}$$

where $w_{H,t}$ is the wage rate of human capital. Suppose that technology or knowledge evolves according to

$$A_{i,t+1} = \bigl( 1 + \eta_{t+1}\,h_{it}^{\gamma}\,H_t^{\theta} \bigr) A_{it}, \tag{5}$$

where γ > 0 is a returns-to-scale parameter, θ > 0 a parameter controlling the spill-over effect of average (or total) human capital,

$$H_t = \int_0^1 h_{jt}\,dj,$$
and η a random variable representing the productivity of human capital in the accumulation of knowledge. In every period, η may take any value on some interval I. As a result, the return to research is uncertain. The probability distribution of the return is, however, known and fixed. More formally, assume that the sequence of shocks $\{\eta_t\}$ satisfies:

$\{\eta_t\}$ is a sequence of independently and identically distributed (i.i.d.) random variables with probability distribution μ and support $I = [\underline{\eta}, \overline{\eta}]$, $\overline{\eta} > \underline{\eta} > 0$.

Clearly, more (less) uncertainty is associated with higher (lower) variability. (For a formal definition of variability see Rothschild and Stiglitz, 1970.) To determine the effect of changing the variability on the expectation of a function of the random variable, the following result by Rothschild and Stiglitz (1971) is very useful:

Given that Y is more variable than X, $E f(X) > (\ge)\; E f(Y)$ if f is strictly (weakly) concave, while $E g(X) < (\le)\; E g(Y)$ if g is strictly (weakly) convex.

Therefore, to determine the effect of increasing (or decreasing) variability on $E f(X)$, it is sufficient to find out whether $f(\cdot)$ is strictly concave or strictly convex. E.g. if $f(X)$ is strictly concave, increasing the variability of X leads to a decrease in the expectation of $f(X)$.

One line of reasoning suggests that, since all firms are owned by the consumers (possibly represented by the representative consumer), the utility functions of the consumers should determine how firms behave. That is, firms should make their choices to maximize the expected utility (of consumption) of the owners of the firm. However, according to a second line of reasoning, if the owners delegate the management of the firm to a manager, one could argue that the manager does not know the utility functions of the owners of the firm. Suppose, for example, that the ownership shares held in the firms can be traded among the consumers, either nationally or internationally. Then, if consumers (foreign or domestic) differ with respect to their utility functions, the managers of the firms will not know which (kind of) consumers own their firm. In that case, it seems natural for the manager to maximize the expected discounted stream of profits or, to incorporate possible risk aversion, the expected discounted stream of the utility of profits.[6]

Adopting the second line of reasoning[7], the intertemporal expected profit maximization problem of an intermediate-good producer is given by:

$$\max\; E\left[ \sum_{t=0}^{\infty} \delta^t\,\frac{\pi_{it}^{1-\sigma_f} - 1}{1 - \sigma_f} \right] \tag{6}$$

$$\text{s.t.}\quad A_{i,t+1} = \bigl( 1 + \eta_{t+1}\,h_{it}^{\gamma}\,H_t^{\theta} \bigr) A_{it},$$

where E is the expectation operator, δ ≡ 1/(1+r) the discount factor, with r representing the interest rate, and $\pi_{it}$ is given by equation (4). The parameter $\sigma_f \in [0, \infty)$ reflects both a measure of risk aversion and the reciprocal of the elasticity of intertemporal substitution. Notice that this set-up includes the “standard case” of risk neutrality (and an infinite elasticity of intertemporal substitution), which occurs if $\sigma_f = 0$. Notice that utility is not well-defined if profits are nonpositive in any period.
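The role of σ_f in (6) can be illustrated with the Rothschild and Stiglitz comparison in a minimal sketch. The two profit lotteries below are hypothetical numbers chosen so that the second is a mean-preserving spread of the first, and the logarithmic case at σ_f = 1 is the conventional limit of the period utility in (6), added here as an assumption.

```python
import math


def utility_of_profit(profit, sigma_f):
    """Period utility from (6): (profit**(1 - sigma_f) - 1) / (1 - sigma_f)."""
    if profit <= 0:
        raise ValueError("utility is not well-defined for nonpositive profits")
    if abs(sigma_f - 1.0) < 1e-12:
        return math.log(profit)  # conventional limit as sigma_f -> 1 (assumption)
    return (profit ** (1.0 - sigma_f) - 1.0) / (1.0 - sigma_f)


def expected_utility(outcomes, probabilities, sigma_f):
    return sum(p * utility_of_profit(x, sigma_f)
               for x, p in zip(outcomes, probabilities))


# Hypothetical lotteries with the same mean (5); the second is more variable.
less_variable = ([4.0, 6.0], [0.5, 0.5])
more_variable = ([2.0, 8.0], [0.5, 0.5])

for sigma_f in (0.0, 1.0, 2.0):
    print(sigma_f,
          expected_utility(*less_variable, sigma_f),
          expected_utility(*more_variable, sigma_f))
```

With σ_f = 0 the utility is linear in profits and the two expectations coincide, while for σ_f > 0 the strictly concave utility ranks the less variable lottery higher, which is the concave case of the result quoted above.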
As shown in Appendix B, profits are positive (negative) if and only if $h_t < (>)\ \beta/(1+\beta)$. This implies that, regardless of its utility function, an intermediate-good producer will never employ more (or just as much) labor than the critical level, since this will yield negative (or zero) profits independently of the shocks.8

6 In the literature on the theory of the firm under uncertainty, the assumption that the firm maximizes the expected utility of profits is widely used. See e.g. Sandmo (1971) and Viaene and Zilcha (1998).
7 A short exposition of the first line of reasoning is given in Appendix E.

Returns on investment in R&D are uncertain. In each period, the impact of research on each firm's stock of knowledge is randomly determined. Since η is assumed to be independent of i, this specification of the uncertainty implies that the shocks are economy wide, i.e., the same for each firm. Therefore, the riskiness of the investments in R&D is the result of changes in the economic climate, e.g., induced by technology or policy shocks. In maximizing the expected discounted stream of profits, the firm knows the demand for its product as given by equation (2). Therefore, replacing $p_{it}$ in the maximization problem with the right-hand side of equation (2) and differentiating with respect to the two choice variables $x_{it}$ and $h_{it}$ leads to the first-order conditions. These two conditions can be written as

$$x_{it} = \left(\frac{r}{\beta^{2}}\right)^{\frac{1}{\beta-1}} L_t, \tag{7}$$

$$\pi_{it}^{-\sigma_f}\, w_{H,t} = E\left[\delta\, \pi_{i,t+1}^{-\sigma_f}\, \eta_{t+1}\, \gamma h_{it}^{\gamma-1} H_t^{\theta} A_{it}\, \beta(1-\beta) L_{t+1}^{1-\beta} x_{i,t+1}^{\beta}\right]. \tag{8}$$
It is assumed that the transversality condition, as given in Appendix A, holds. Moreover, profits are assumed to be positive (see the appendix for the associated restriction on the optimal level of human capital). Let $A_t$ denote the average productivity parameter across all firms at date t:

$$A_t \equiv \int_0^1 A_{it}\, di.$$

Because each sector i uses $A_{it} x_{it}$ units of capital, the total capital stock (measured in forgone consumption) is equal to

$$K_t \equiv \int_0^1 A_{it} x_{it}\, di.$$
According to equation (7), all firms produce the same amount at any given time: $x_{it} = x_t = K_t/A_t$ for all i. Next, suppose that initially at t = 0 every firm has the same productivity, that is, $A_{i0} = A_0$ for all i, which implies that $A_{it} = A_t$ for all i. Then equation (8) allows us to have $h_{it} = h_t$ for all i, which, in turn, implies that $H_t = h_t$. As a result, the aggregate technology (1) can now be expressed in the simpler form

$$Y_t = A_t x_t^{\beta} L_t^{1-\beta}. \tag{9}$$

8 Another drawback resulting from this specific utility function concerns the fact that, at zero profit, marginal utility is infinite. However, as explained in the text, this situation will not arise. On the contrary, profits will grow larger and larger over time (see equation (12)), implying that this feature of the utility function is not driving any of the results. The apparent advantage of this specific utility function is that it produces an analytical solution.

11.2.2 Preferences
Assume that consumers behave as if they maximize their expected value of lifetime utility. Consumers are heterogeneous in the sense that they differ in their time preference, $\rho_j$, and their elasticity of intertemporal substitution, $\sigma_j$. The objective of agent j, then, is to select consumption and savings to maximize the expected value of his lifetime utility:

$$\max\ E \sum_{t=0}^{\infty} \left(\frac{1}{1+\rho_j}\right)^{t} \frac{c_{j,t}^{1-\sigma_j} - 1}{1 - \sigma_j} \qquad \text{s.t.} \quad b_{j,t+1} = (1+r)\, b_{j,t} + w_{L,t} L_{j,t} + w_{H,t} h_{j,t} + s_j \pi_t - c_{j,t}, \tag{10}$$

where $c_{j,t}$ is consumption and $b_{j,t}$ represents assets. The agent's sources of income are interest on his stock of assets $r b_{j,t}$, wage income $w_{L,t} L_{j,t} + w_{H,t} h_{j,t}$ and his share $s_j$ of profits $\pi_t = \beta Y_t - r K_t - w_t h_t$. Maximization with respect to consumption and savings implies that the optimal path of consumption follows the Euler equation,

$$c_{j,t}^{-\sigma_j} = E\left[c_{j,t+1}^{-\sigma_j}\right] \frac{1+r}{1+\rho_j}. \tag{11}$$

The associated transversality condition, which is assumed to be satisfied, is given in Appendix A.

11.2.3 Equilibrium
In equilibrium, the wage rate in the intermediate sector should equal the wage rate in the final-output sector, i.e.,

$$w_{H,t} = (1-\beta) L_t^{-\beta} A_t x_t^{\beta}.$$
Furthermore, as the total amount of labor present in the economy is normalized to 1, the time allocation restriction reads $L_t + h_t = 1$. Due to the presence of shocks, the notion of balanced growth needs adjustment. Therefore, instead of a constant growth rate, the analysis here focuses on a constant expected growth rate. On this balanced expected-growth path (BEGP), the levels of the intermediate goods and labor are constant. This implies that the per-period profit grows with the technology, that is,

$$\pi_{t+1} = \pi_t \left(1 + \eta_{t+1} h_t^{\gamma+\theta}\right). \tag{12}$$

Incorporating these considerations in equation (8) leads to the BEGP research condition,

$$E\left[\frac{\beta\gamma(1-h)h^{\gamma+\theta-1}}{1+r} \cdot \frac{\eta_{t+1}}{\left(1 + \eta_{t+1} h^{\gamma+\theta}\right)^{\sigma_f}}\right] = 1. \tag{13}$$

The left-hand side of this equation gives the ratio of the return to an additional unit of skilled labor over the cost of an additional unit of skilled labor, on the BEGP. Given the probability measure of η, the intermediate producers choose the optimal amount of time spent on research, h, according to the above equation, which determines the rate of technological change,

$$g_{A,t} = 1 + \eta_{t+1} h_t^{\gamma+\theta}. \tag{14}$$
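The step from equation (8) to the research condition (13) is stated without intermediate algebra. One way to fill it in is sketched below, under the symmetry conditions $h_{it} = H_t = h$, $x_{it} = x$, $L_t = L$ that hold on the BEGP; this is a reconstruction from the equations in the text, not the author's own derivation.

```latex
\begin{align*}
w_{H,t} &= E\!\left[\delta \left(\frac{\pi_{t+1}}{\pi_t}\right)^{-\sigma_f}
           \eta_{t+1}\,\gamma h^{\gamma+\theta-1} A_t\, \beta(1-\beta) L^{1-\beta} x^{\beta}\right]
        && \text{(8) divided by } \pi_t^{-\sigma_f} \\
(1-\beta) L^{-\beta} A_t x^{\beta} &= E\!\left[\delta \left(1+\eta_{t+1}h^{\gamma+\theta}\right)^{-\sigma_f}
           \eta_{t+1}\,\gamma h^{\gamma+\theta-1} A_t\, \beta(1-\beta) L^{1-\beta} x^{\beta}\right]
        && \text{wage equality and (12)} \\
1 &= E\!\left[\frac{\beta\gamma\, L\, h^{\gamma+\theta-1}}{1+r}\cdot
           \frac{\eta_{t+1}}{\left(1+\eta_{t+1}h^{\gamma+\theta}\right)^{\sigma_f}}\right]
        && \delta = 1/(1+r)
\end{align*}
```

Substituting the time allocation restriction $L = 1 - h$ into the last line gives condition (13).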
On the BEGP, x, L and h are determined by the three conditions given by equations (7) and (13) and the time allocation restriction. Additionally, it is assumed that the solution to this set of equations also satisfies the transversality condition associated with the optimization problem. See Appendix A for the exact condition. Since the inputs x and L in the production function are constant along the BEGP, the growth rate of output is equal to the rate of technological change:

$$g_{Y,t} = g_{A,t}. \tag{15}$$

11.3 The effect of uncertainty on growth
11.3.1 The growth rate of output

The effect of higher volatility of the shock η on the optimal choice of h depends on the functional form of the BEGP research condition regarding the shock η and the variable h. The first step in finding the effect of more uncertainty on the growth rate of output is to determine the effect of a higher volatility of η on the left-hand side of equation (13), which will be denoted by E(Φ). It turns out (see Proposition 1 below) that Φ is a concave function of η, implying that a higher volatility of η has a negative effect on the expectation of Φ. Second, the effect of a smaller E(Φ) on the equilibrium value of h depends on the functional form of E(Φ) as a function of h. If γ + θ ≤ 1, it is easy to see that E(Φ) is a decreasing function of h, as depicted in Figure 1. A higher volatility, which decreases E(Φ) as a function of h, then leads to a smaller level of research. On the other hand, if γ + θ > 1, E(Φ) as a function of h is hump-shaped. This implies that there are two equilibrium values of h, a "low research level equilibrium" and a "high research level equilibrium" (that is, if the maximum of E(Φ) is higher than 1). See Figure 2 for an example of this situation. There will actually be more time spent on research due to more uncertainty if the economy is in the low level equilibrium, as opposed to less research time in the high level equilibrium.

Figure 1: Equilibrium research condition, with γ + θ ≤ 1. The figure is based on equation (13) where the expectation is approximated with a second-order Taylor series expansion (see Appendix D). The parameter values are: β = 1/3, γ = 0.5, θ = 0.05, ρ = 0.05, $\sigma_f$ = 1.25, η = 0.85, $\sigma_\eta^2$ = 0.01.

Figure 2: Equilibrium research condition, with γ + θ > 1. The figure is based on equation (13) where the expectation is approximated with a second-order Taylor series expansion (see Appendix D). The parameter values are: β = 1/3, γ = 1.1, θ = 0.05, ρ = 0.05, $\sigma_f$ = 1.25, η = 6, $\sigma_\eta^2$ = 0.01.

What is the effect of a change in the time spent on research on the growth rate of the economy? A reduction in the time spent on research, for example, implies that the expectation of $g_A$ decreases, which, in turn, implies that the growth rate of the economy, g, will be smaller on average. More formally, consider the two probability measures μ and $\mu^+$, where $\mu^+$ is more uncertain than μ, that is, it has the same mean but a higher volatility. Then the average growth rate under $\mu^+$ is smaller than the average growth rate under μ for almost any sequence of realizations of η; i.e., it occurs almost surely. The effect of uncertainty on the time spent on research and the average long-run growth rate is summarized in the next proposition.

Proposition 1. Let

$$0 < \sigma_f < 2\, \frac{1 + g_A(\bar{\eta})}{g_A(\bar{\eta})} - 1.$$

(A) If γ + θ ≤ 1, then more uncertainty leads to (i) less time spent on research and (ii) a smaller growth rate of output on average.
(B) If γ + θ > 1, there may exist two equilibria. Then more uncertainty leads to (i) more (less) time spent on research and (ii) a higher (smaller) growth rate of output on average if the economy is in the low (high) research level equilibrium.

Proof: See Appendix C. □
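The bound on $\sigma_f$ in Proposition 1 and the concavity claim behind it can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and it assumes (following Appendix C, where the bound is written with $b\eta = \eta h^{\gamma+\theta}$) that $g_A(\bar{\eta})$ in the bound is the net rate of technological change under the best shock.

```python
def sigma_f_upper_bound(g_best: float) -> float:
    """Upper bound on sigma_f in Proposition 1: 2 * (1 + g) / g - 1.

    g_best is taken to be the net rate of technological change under the
    best shock (an interpretation based on Appendix C).
    """
    if g_best <= 0:
        raise ValueError("the growth rate under the best shock must be positive")
    return 2.0 * (1.0 + g_best) / g_best - 1.0


def g_second_derivative(eta: float, b: float, sigma_f: float) -> float:
    """Second derivative of G(eta) = eta / (1 + b * eta) ** sigma_f (Appendix C)."""
    numerator = ((1.0 + b * eta) * (1.0 - sigma_f) * b
                 - (1.0 + (1.0 - sigma_f) * b * eta) * (1.0 + sigma_f) * b)
    return numerator / (1.0 + b * eta) ** (2.0 + sigma_f)


if __name__ == "__main__":
    bound = sigma_f_upper_bound(0.2)
    print(f"bound on sigma_f for a 20% best-case growth rate: {bound:.4f}")
    for sigma_f in (1.25, 10.0, 12.0):
        # b * eta = 0.2 corresponds to the same 20% best-case growth rate.
        print(sigma_f, g_second_derivative(eta=1.0, b=0.2, sigma_f=sigma_f))
```

For values of $\sigma_f$ below the bound the second derivative is negative at the best shock (G is concave there); above the bound it turns positive.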
The effect of uncertainty on the path of final output is as follows. For example, in case (A) of Proposition 1, more uncertainty leads to less labor used in research and therefore to more labor used in the production of final output. Equation (7), then, shows that the amount of every capital good increases. This implies, by equation (1), that final output increases initially. However, since the growth rate of output has fallen, at some point in time the new path of final output will lie below the initial path. Thus, in the long run, final output is negatively influenced by uncertainty (that is, in case (A) of Proposition 1).

In the previous analysis, the negative effect of uncertainty on output growth could be shown under two restrictions. The first restriction puts an upper bound on $\sigma_f$, the reciprocal of the elasticity of intertemporal substitution (of the profits of the intermediate-good producers) as well as a measure of risk aversion. Even if $g_A$ under the best shock is as high as 20%, the restriction requires $\sigma_f$ to be less than 11. This means that this restriction will certainly be satisfied if the firms act as if they were close to risk neutral. (However, if firms behave in a strictly risk-neutral manner, uncertainty will have no effect on the time spent on research and, hence, on the expected growth rate.) Moreover, even if the firms have similar attitudes towards risk and intertemporal substitution as households, estimates of $\sigma_f$ (and hence of σ) usually indicate that its value is roughly between 1 and 7 (see e.g. Gertner, 1993; Metrick, 1995; Beetsma and Schotman, 2001; Vissing-Jørgensen, 2002; Guvenen, 2006).9 The second restriction is that there are no increasing returns to R&D; i.e., γ + θ ≤ 1. The presence of constant or decreasing returns seems a fairly realistic assumption, which is confirmed by recent empirical evidence.
For example, Dinopoulos and Thompson (1996, 2000) estimate versions of Romer's model of endogenous technological change (Romer, 1990) and find positive, but decreasing, returns to R&D. Similar results are found in Hall, Griliches and Hausman (1986), Kortum (1993) and Thompson (1996).

The intuition behind the finding that the nature of the effect of uncertainty on the time spent on research - positive or negative - depends on the parameter σ draws on the fact that this parameter represents both risk aversion and the elasticity of intertemporal substitution. The fact that firms are risk averse implies that higher uncertainty reduces the return on investment (in skilled labor) in terms of utility. This affects the amount of investment positively or negatively depending on the relative strengths of the income and substitution effects. A relatively small σ, for example, implies that the substitution effect dominates the income effect, inducing a positive effect on investment (i.e., the time spent on research).

9 On the contrary, very high values of σ are found by Hall (1988) and implied by evidence provided by the equity premium puzzle (see e.g. Campbell et al., 1997).

11.3.2 The growth rate of consumption

Due to the international capital market, the growth rates of output and consumption differ. Although individual consumption levels and growth rates differ across consumers, as denoted by the subscript j, the nature of the effect - positive or negative - does not depend on these differences. As a result, we suppress the subscripts in the following analysis. To determine the effect of uncertainty on the long-run growth rate of consumption, we insert $c_{t+1} = g_{c,t+1}\, c_t$, where $g_{c,t+1}$ is the gross growth rate of consumption (defined analogously to $g_{A,t}$ in (14)), into the Euler equation (11) to get

$$c_t^{-\sigma} = E\left[g_{c,t+1}^{-\sigma}\, c_t^{-\sigma}\right] \frac{1+r}{1+\rho}, \tag{16}$$

which implies that

$$E\left[g_c^{-\sigma}\right] = \frac{1+\rho}{1+r}. \tag{17}$$

Using a second-order Taylor series expansion around $E[g_c]$, $E[g_c^{-\sigma}]$ can be approximated by

$$E\left[g_c^{-\sigma}\right] = E[g_c]^{-\sigma} + \frac{1}{2}\, \sigma(\sigma+1)\, E[g_c]^{-(\sigma+2)}\, \mathrm{var}(g_c), \tag{18}$$
where $\mathrm{var}(g_c)$ is the variance of the growth rate of consumption. Since consumption depends on the income of the consumers, which in turn depends on the state of the technology, consumption depends on the shock η. This implies that a higher variability of the shock leads to more variable consumption and, hence, to a more variable growth rate of consumption. Thus, more uncertainty regarding the shock implies a higher $\mathrm{var}(g_c)$. Since, according to equation (17), $E[g_c^{-\sigma}]$ is constant, equation (18) shows that an increase in $\mathrm{var}(g_c)$ will be accompanied by an increase in $E[g_c]$. Therefore, from this analysis we may conclude that, due to more uncertainty, the growth rate of consumption will on average be higher.
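The comparative-static argument built on equations (17) and (18) can be illustrated numerically: hold the left-hand side fixed at $(1+\rho)/(1+r)$ and solve the approximation (18) for the mean growth rate at different variances. This is a minimal sketch; the parameter values (σ = 2, ρ = 0.05, r = 0.06), the function names and the bisection bracket are assumptions made only for illustration, and the growth rate is treated as a gross rate.

```python
def approx_expected_marginal(mean_g: float, var_g: float, sigma: float) -> float:
    """Right-hand side of approximation (18) for E[g_c^(-sigma)]."""
    return (mean_g ** (-sigma)
            + 0.5 * sigma * (sigma + 1.0) * mean_g ** (-(sigma + 2.0)) * var_g)


def implied_mean_growth(var_g, sigma, rho, r, lo=0.5, hi=2.0, tol=1e-12):
    """Mean gross growth rate that satisfies (17) under approximation (18).

    The right-hand side of (18) is decreasing in the mean, so plain
    bisection on [lo, hi] is enough.
    """
    target = (1.0 + rho) / (1.0 + r)

    def gap(mean_g):
        return approx_expected_marginal(mean_g, var_g, sigma) - target

    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("bracket [lo, hi] does not contain a solution")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # approximation still too large: the mean must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for var_g in (0.0, 0.01, 0.02):
        mean_g = implied_mean_growth(var_g, sigma=2.0, rho=0.05, r=0.06)
        print(f"var(g_c)={var_g}: E[g_c]={mean_g:.6f}")
```

With zero variance the solution reduces to $((1+r)/(1+\rho))^{1/\sigma}$; raising the variance raises the implied mean, which is the precautionary-saving effect described in the text.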
This result - that more uncertainty implies a higher average growth rate of consumption - is driven by the consumers' precautionary saving motive. Due to this motive, consumers save more in more uncertain circumstances in order to insure themselves against 'bad shocks'. Naturally, these higher savings lead to a higher growth rate. The technical reason for the existence of a precautionary saving motive is the fact that the marginal utility is convex. This convexity implies that the negative consequence, in terms of utility, of a bad shock dominates the positive consequence of a similar (in size) good shock.

11.4 Conclusion
The analysis in this study shows that unpredictable variations in economic productivity may have a positive or negative effect on the average growth rate of output. This confirms the results of earlier papers on this subject. However, this analysis adds two new elements. First, physical capital is assumed to be fully mobile, allowing capital to flow freely between economies. This is in contrast with the earlier closed-economy models. Second, the theoretical ambiguity result is not solely determined by the value of the elasticity of intertemporal substitution (of consumption) - as is the case in the earlier analyses - but depends on two factors. That is, the relationship between unpredictable variations (uncertainty) in economic productivity and economic growth depends on whether returns to scale in knowledge creation are increasing or non-increasing and whether the elasticity of intertemporal substitution (of profits) is higher or lower than some critical value. Both factors have been studied in the empirical literature. First, empirical studies on the returns to scale in knowledge creation (R&D) indicate that these returns are decreasing. Second, based on empirical analyses of the elasticity of intertemporal substitution and given the critical value as implied by the rate of technological change (under the best possible shock), it is most likely that the value of the elasticity of intertemporal substitution is higher than the critical value. Together these two results imply that unpredictable variations in economic productivity have a negative effect on the average long-run growth rate.
Appendix A: Transversality conditions

The transversality condition of the intermediate-good producer's optimization problem is given by

$$\lim_{t\to\infty} E\, \delta^{t} \pi_t^{-\sigma_f} A_{t+1} = 0. \tag{19}$$

The transversality condition of the consumer's optimization problem is given by

$$\lim_{t\to\infty} E \left(\frac{1}{1+\rho}\right)^{t} c_t^{-\sigma}\, b_{t+1} = 0. \tag{20}$$

Appendix B: Restriction for "π > 0"

The one-period profit of an intermediate-good producer can be written as

$$\pi_t = \beta L_t^{1-\beta} A_t x_t^{\beta} - \beta^{2} L_t^{1-\beta} A_t x_t^{\beta} - (1-\beta) L_t^{-\beta} A_t x_t^{\beta} h_t = \beta(1-\beta) Y_t - (1-\beta) Y_t \frac{h_t}{L_t}.$$
This equation implies that $\pi_t > 0$ iff $h_t < \beta L_t = \beta(1 - h_t)$. Hence, profit $\pi_t$ is positive if and only if $h_t < \beta/(1+\beta)$. If β = 1/3, this implies that $h_t < 1/4$.

Appendix C: Proof of Proposition 1

This proof consists of proving the two steps taken in the text prior to the proposition. First, we have to prove that $G(\eta) \equiv \eta/(1 + \eta h^{\gamma+\theta})^{\sigma_f}$ is a concave function of η. Let us write $G(\eta) = \eta/(1 + b\eta)^{\sigma_f}$, with $b = h^{\gamma+\theta}$. Differentiating G(.) with respect to η shows that $\partial G/\partial\eta = (1 + (1-\sigma_f) b\eta)/(1 + b\eta)^{1+\sigma_f}$. Differentiating again with respect to η yields

$$\frac{\partial^{2} G(\eta)}{\partial \eta^{2}} = \frac{(1 + b\eta)(1 - \sigma_f)\, b - (1 + (1-\sigma_f) b\eta)(1 + \sigma_f)\, b}{(1 + b\eta)^{2+\sigma_f}}.$$

From this, we can conclude that $\partial^{2} G/\partial\eta^{2} < 0$ if and only if $1 + \sigma_f < 2(1 + b\eta)/(b\eta)$. Hence, G(η) is (strictly) concave for all $\eta \in [\underline{\eta}, \bar{\eta}]$ iff

$$1 + \sigma_f < \min_{\eta \in [\underline{\eta}, \bar{\eta}]} 2\, \frac{1 + b\eta}{b\eta} = 2\, \frac{1 + g_A(\bar{\eta})}{g_A(\bar{\eta})},$$

since $b = h^{\gamma+\theta}$. Hence, by Lemma 1, a higher volatility of η decreases E(Φ).

Define the function F as follows:

$$F(h) = \frac{m\, \eta\, (1-h)\, h^{\gamma+\theta-1}}{(1 + \eta h^{\gamma+\theta})^{\sigma_f}},$$

with $m = \beta\gamma/(1+r)$. If γ + θ ≤ 1, it is evident that F(h) is decreasing in h. As a result, E(Φ) is decreasing in h. If γ + θ > 1, numerical simulations indicate that F(h) is hump-shaped. Thus, if the maximum of the function is high enough, there exist two equilibria.

The first step implies that a higher volatility of η decreases E(Φ). The second step implies that, depending on whether γ + θ ≤ 1 or γ + θ > 1, E(Φ) is decreasing in h for all h ∈ [0, 1] or hump-shaped. For example, in the first case, h has to fall in order to keep E(Φ) equal to 1.

Appendix D: Taylor series approximation

Using the second-order Taylor series expansion around the mean of η implies that

$$E\left[\frac{\eta}{(1 + \eta h^{\gamma+\theta})^{\sigma_f}}\right] \approx \frac{\eta}{(1 + \eta h^{\gamma+\theta})^{\sigma_f}} + \frac{1}{2}\left[\frac{(\sigma_f + 1)\sigma_f\, \eta\, h^{2(\gamma+\theta)}}{(1 + \eta h^{\gamma+\theta})^{\sigma_f+2}} - \frac{2\sigma_f\, h^{\gamma+\theta}}{(1 + \eta h^{\gamma+\theta})^{\sigma_f+1}}\right] \sigma_\eta^{2},$$

where η on the right-hand side is evaluated at its mean and $\sigma_\eta^{2}$ represents the variance of η. Hence, this yields an approximation of the expectation in the BEGP research condition (13), which is used to draw the graphs in Figures 1 and 2.
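The Taylor approximation of Appendix D makes the research condition (13) easy to solve numerically for the case γ + θ ≤ 1. The sketch below is not the author's code and rests on labeled assumptions: the Figure 1 parameter list gives ρ = 0.05 but no value for r, so r = 0.05 is assumed here; the value η = 0.85 is treated as the mean of η; the function and dictionary names are hypothetical; and the search is restricted to $h < \beta/(1+\beta)$, the profit-positivity bound from Appendix B.

```python
def expected_phi(h, eta_var, beta, gamma, theta, r, sigma_f, eta_mean):
    """Left-hand side of (13), with the expectation replaced by the
    second-order Taylor approximation of Appendix D."""
    b = h ** (gamma + theta)
    base = 1.0 + eta_mean * b
    level = eta_mean / base ** sigma_f
    curvature = ((sigma_f + 1.0) * sigma_f * eta_mean * b ** 2 / base ** (sigma_f + 2.0)
                 - 2.0 * sigma_f * b / base ** (sigma_f + 1.0))
    m = beta * gamma / (1.0 + r)
    return m * (1.0 - h) * h ** (gamma + theta - 1.0) * (level + 0.5 * curvature * eta_var)


def solve_research_time(params, eta_var, lo=1e-9, tol=1e-12):
    """Bisection for h with expected_phi(h) = 1 on (lo, beta / (1 + beta))."""
    hi = params["beta"] / (1.0 + params["beta"])  # profits are positive below this level

    def gap(h):
        return expected_phi(h, eta_var, **params) - 1.0

    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Figure 1 parameters; r = 0.05 is an assumption (the caption lists rho = 0.05 only).
FIG1 = dict(beta=1.0 / 3.0, gamma=0.5, theta=0.05, r=0.05, sigma_f=1.25, eta_mean=0.85)

if __name__ == "__main__":
    for eta_var in (0.01, 0.02):
        print(f"variance {eta_var}: h = {solve_research_time(FIG1, eta_var):.6f}")
```

Under these assumptions a higher variance of η lowers the approximated E(Φ) at every h, so the solution for h falls, in line with case (A) of Proposition 1.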
Appendix E: Alternative model

This version of the model follows the line of reasoning that, since firms are owned by the representative consumer, the utility function of the representative consumer determines how firms behave.10 Hence, we assume here that all consumers are the same. As there are infinitely many firms, a single firm has no effect on the profit of the representative consumer. We therefore let the representative consumer make all the choices (and, hence, internalize the external effect of skilled labor). This implies that the representative consumer solves the optimization problem:

$$\max_{x,h,c}\ E \sum_{t=0}^{\infty} \left(\frac{1}{1+\rho}\right)^{t} \frac{c_t^{1-\sigma} - 1}{1 - \sigma} \qquad \text{s.t.} \quad b_{t+1} = (1+r)\, b_t + w_t + \pi_t - c_t, \quad A_{t+1} = \left(1 + \eta_{t+1} h_t^{\gamma+\theta}\right) A_t. \tag{21}$$

Inserting the expression for $\pi_t$ (= $\pi_{it}$) into the budget restriction, the restriction becomes

$$b_{t+1} = (1+r)\, b_t + Y_t - r A_t x_t - c_t. \tag{22}$$

The first-order condition with respect to $x_t$ implies that

$$r = \beta L_t^{1-\beta} A_t x_t^{\beta-1}. \tag{23}$$

The first-order condition with respect to $h_t$ can be written as

$$c_t^{-\sigma}\left[-(1-\beta)(1-h_t)^{-\beta} A_t x_t^{\beta}\right] + \frac{1}{1+\rho}\, E\left[c_{t+1}^{-\sigma}\, (\gamma+\theta)\, \eta_{t+1} h_t^{\gamma+\theta-1} \left((1-h_{t+1})^{1-\beta} A_t x_{t+1}^{\beta} - r A_t x_{t+1}\right)\right] = 0.$$

Using both first-order conditions and the fact that on a BEGP $x_t = x_{t+1}$ and $h_t = h_{t+1}$, the 'alternative' BEGP research condition reads

$$E\left[\frac{(\gamma+\theta)(1-h)h^{\gamma+\theta-1}}{1+\rho} \cdot \frac{\eta_{t+1}}{(c_{t+1}/c_t)^{\sigma}}\right] = 1. \tag{24}$$

10 In a way, this is similar to Wälde (1999), where firms indirectly maximize the expected utility of the representative consumer; that is, firms themselves are only engaged in static maximization.
If we compare this equation with the BEGP research condition as given by equation (13) in the text, there are three differences. First, instead of γ we have here γ + θ, reflecting the fact that the representative consumer internalizes the externalities between the firms. Second, r is replaced by ρ, since the consumer discounts time with the time preference, while the firms discount time with the interest rate. Third and most importantly, $1 + \eta_{t+1} h_t^{\gamma+\theta}$, the growth rate of technology, is replaced by $c_{t+1}/c_t$, the growth rate of consumption. While the first two differences do not affect the qualitative effect of uncertainty on growth, the third difference makes it hard if not impossible to determine the effect of uncertainty on growth in general, since we cannot solve for the growth rate of consumption,11 except when the interest rate is exactly that value at which saving equals investment. In the latter case, the growth rate of consumption exactly equals the rate of technological change, and equation (24) is qualitatively similar to equation (13), yielding the same result concerning the effect of uncertainty on growth as stated in Proposition 1.

11 Actually, one does not necessarily have to solve for the growth rate of consumption completely. For example, if the growth rate is a linear function of the shock, the effect of uncertainty is similar to that in the model in the text.

Acknowledgements: I thank Jean-Marie Viaene for helpful discussions. Financial support from the Netherlands Organisation for Scientific Research (NWO) is gratefully acknowledged.

References:
Aghion, P., and Howitt, P. (1998) Endogenous Growth Theory. Cambridge, MA and London, England: The MIT Press.
Aizenman, J., and Marion, N. (1999) “Volatility and Investment: Interpreting Evidence from Developing Countries.” Economica 66 (262): 157–79.
Beetsma, R.M.W.J., and Schotman, P.C. (2001) “Measuring risk attitudes in a natural experiment: data from the television game show lingo.” The Economic Journal 111: 821–48.
Blackburn, K., and Pelloni, A. (2004) “On the Relationship between Growth and Volatility.” Economics Letters 83: 123–127.
Campbell, J.Y., Lo, A.W., and MacKinlay, A.C. (1997) The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press.
De Hek, P.A. (1999) “On Endogenous Growth under Uncertainty.” International Economic Review 40: 727–44.
Dinopoulos, E., and Thompson, P. (1996) “A Contribution to the Empirics of Endogenous Growth.” Eastern Economic Journal 22: 389–400.
Dinopoulos, E., and Thompson, P. (2000) “Endogenous Growth in a Cross-Section of Countries.” Journal of International Economics 51: 335–62.
Gertner, R. (1993) “Game shows and economic behavior: Risk taking on ‘card sharks’.” Quarterly Journal of Economics 108: 507–21.
Guiso, L., and Parigi, G. (1999) “Investment and Demand Uncertainty.” Quarterly Journal of Economics 114: 185–227.
Guvenen, M.F. (2006) “Reconciling Conflicting Evidence on the Elasticity of Intertemporal Substitution: A Macroeconomic Perspective.” Journal of Monetary Economics, forthcoming.
Hall, R.E. (1988) “Intertemporal Substitution in Consumption.” Journal of Political Economy 96: 339–57.
Hall, B., Griliches, Z., and Hausman, J. (1986) “Patents and R&D. Is there a Lag?” International Economic Review 27: 265–283.
Hopenhayn, H., and Muniagurria, M. (1996) “Policy Variability and Economic Growth.” Review of Economic Studies 63: 611–625.
Imbs, J. (2004) Growth and Volatility. Mimeo, London Business School.
Jones, L.E., Manuelli, R.E., Siu, H.E., and Stacchetti, E. (2005) “Fluctuations in Convex Models of Endogenous Growth I: Growth Effects.” Review of Economic Dynamics 8: 780–804.
King, R.G., Plosser, C., and Rebelo, S.T. (1988) “Production, Growth and Business Cycles, II: New Directions.” Journal of Monetary Economics 21: 309–341.
King, R.G., and Rebelo, S.T. (1988) Business Cycles with Endogenous Growth. Unpublished Paper, University of Rochester.
Kormendi, R.L., and Meguire, P.G. (1985) “Macroeconomic Determinants of Growth: Cross-Country Evidence.” Journal of Monetary Economics 16: 141–163.
Kortum, S. (1993) “Equilibrium R&D and the Patent-R&D Ratio: U.S. Evidence.” American Economic Review Papers and Proceedings 83: 450–457.
Lucas, R.E., Jr. (1988) “On the Mechanics of Economic Development.” Journal of Monetary Economics 22: 3–42.
Metrick, A. (1995) “A natural experiment in ‘jeopardy’.” American Economic Review 85: 240–53.
Obstfeld, M. (1994) “Risk-Taking, Global Diversification and Growth.” American Economic Review 84: 1310–29.
Ramey, G., and Ramey, V.A. (1995) “Cross-Country Evidence on the Link between Volatility and Growth.” American Economic Review 85: 1138–1151.
Romer, P.M. (1986) “Increasing Returns and Long-Run Growth.” Journal of Political Economy 94: 1002–1037.
Romer, P.M. (1990) “Endogenous Technological Change.” Journal of Political Economy 98: S71–S102.
Rothschild, M., and Stiglitz, J.E. (1970) “Increasing Risk I: A Definition.” Journal of Economic Theory 2: 225–243.
Rothschild, M., and Stiglitz, J.E. (1971) “Increasing Risk II: Its Economic Consequences.” Journal of Economic Theory 3: 66–84.
Sandmo, A. (1971) “On the Theory of the Competitive Firm under Price Uncertainty.” American Economic Review 61: 65–73.
Thompson, P. (1996) “Technological Opportunity and the Growth of Knowledge.” Journal of Evolutionary Economics 6: 77–97.
Viaene, J.-M., and Zilcha, I. (1998) “The Behavior of Competitive Exporting Firms under Multiple Uncertainty.” International Economic Review 39: 591–609.
Vissing-Jørgensen, A. (2002) “Limited Asset Market Participation and the Elasticity of Intertemporal Substitution.” Journal of Political Economy 110: 825–53.
Wälde, K. (1999) “A Model of Creative Destruction with Undiversifiable Risk and Optimizing Households.” Economic Journal 109: C156–C171.
Chapter 12 Stochastic Control, Non-Depletion of Renewable Resources, and Intertemporal Substitution
Nils Chr. Framstad The Financial Supervisory Authority of Norway1
12.1 Introduction
It is well known that if the economic discount rate uniformly exceeds the relative growth rate of a resource – measured in physical terms if price is constant, or more generally in value of extracting it all – then, assuming zero costs, a profit maximizer will want to do just that: instantly deplete the resource completely. Thus, from a conservationist point of view, a high discount rate is undesirable, since it represents less value of savings for future times and may lead to the extinction of populations and entire species and the irrecoverable loss of natural resources.

There is considerable literature (see e.g., Alvarez 2001 and the references therein) on the effect of uncertainty in such expected profit maximizer models, mainly where uncertainty is modeled by Brownian motion. Pindyck (1984) concludes that in an equilibrium model with bounded maximal extraction rate, one will extract at a rate which is either zero or the maximum possible, but decreasing as a function of the volatility. The bang-bang property indicates that if any extraction rate is allowed, then the optimal strategy should be characterized as a reflection, i.e., harvesting precisely as much as necessary in order to prevent the population from exceeding a given threshold. In such a setting, Alvarez (2001) confirms rigorously that increasing Brownian uncertainty will increase the threshold, and lead one to wait for a higher population before harvesting (i.e. the opposite effect of the discounting term). However, as pointed out in Framstad (2003), the choice of Brownian noise is crucial, as introducing qualitatively different zero-mean noises – namely jump uncertainty – may in fact lead to downwards reflection at populations lower than in the deterministic case. Having established that zero-mean uncertainty may actually lead to harvesting at a lower level, it is however shown that just as in the deterministic case, it is not optimal for an expected profit maximizer with an infinite time horizon to deplete the resource completely as long as the resource's expected relative growth at zero exceeds the discount rate.

1 This work does not reflect the views of the Financial Supervisory Authority of Norway.

Unlike the aforementioned works, this paper does not attempt to find any optimal solution; we shall show that the same criterion will imply that complete depletion cannot be optimal under a far more general class of preferences. It turns out that in this respect, linear utility still is the “worst case” among the risk averse, a property which is not a priori obvious as consumption now is certain while future consumption is not. We assume a setting where a single agent completely and cost free controls the irreversible extraction of the resource and possesses the relevant information on population size and the stochastic evolution law of the relative growth rate at zero. The model is assumed to be an Ito process with semimartingale driving noise.
The reader who is not fully familiar with the mathematics behind this may think of the model as growth which at zero is locally approximately geometric, with a relative growth rate b distorted by zero-mean noise; the noise will actually cancel out from the model, so we only need it to be sufficiently well behaved.

12.2 The preferences
With preferences represented by a direct utility function of present consumption rate, there might not be any substitute for consumption at a given time (i.e., a non-degenerate time interval). An obvious example is if direct utility is −∞ at zero. It is however objectionable to assume that the agent has to consume at each and every second. In a more realistic setting, formalized by Hindy, Huang and Kreps (1992) in the deterministic case, and by Hindy and Huang (1992) under uncertainty, a positive portion consumed should keep the agent satisfied for some time, and consumption at two close points in time should be considered close substitutes not only when considering the value function (indirect utility), but also when considering the running direct utility rate. This means that the agent does not necessarily have to consume at all times; a “gulp” consumed now will not only increase direct utility rate at this particular moment, but also in the future. It is not unreasonable to guess that this could lead to earlier harvesting than in the case without intertemporal substitution. So just like in the jump uncertainty case, we may want to ask if the old criterion for non-depletion prevails: is it sufficient that the harvested population has a relative growth rate exceeding the economic discount rate? This paper sets out to show that this criterion seems quite robust.

To introduce intertemporal substitution, we shall assume that the current consumption rate only indirectly affects direct utility. Although current consumption rate does not enter as an argument directly [cf. Hindy and Huang (1992), section 5], the case where direct utility depends only on current consumption rate may be obtained as a limiting case. As a simplification, we shall assume the agent's direct utility to depend on a single nonnegative process C which represents not consumption itself, but a decayed transformation of the past consumption path (also frequently referred to as “durability”).

12.2.1 Exponential decay

Since we assume C to be one-dimensional, it will turn out from quite reasonable mathematical assumptions that past consumption decays exponentially as time goes by. The justification is as follows: Assume consumption at two stopping times $t_1 \geq t_0$. We will assume that the rate at which past consumption is “forgotten” is F, so that we have

$$C(t) = \begin{cases} C(t_0^{+}) \cdot F(t, t_0) & \text{for } t \in (t_0, t_1), \\ C(t_1^{+}) \cdot F(t, t_1) & \text{for } t > t_1. \end{cases} \tag{1}$$

Now assume that there is continuity in the sense that if a “zero amount” is consumed at $t_1$, then C is continuous at $t_1$ and both formulae may be applied. Hence we will require, for $t > t_1$, that

$$C(t) = F(t, t_1)\, C(t_1) = F(t, t_1)\, F(t_1, t_0)\, C(t_0) \quad \text{but also} \quad = F(t, t_0)\, C(t_0). \tag{2}$$
Since F is supposed to represent decay, the following assumptions are natural:

$$\frac{\partial F}{\partial t}(t,\bar t) \le 0 \quad\text{and}\quad F(t,t) = 1 \quad\text{everywhere}. \tag{3}$$
Conditions (2)–(3) ensure that $F(t,\bar t)$ represents exponential decay with respect to the difference $(t - \bar t)$. Now exponential decay has the property that one mouthful consumed today increases utility at all future times, regardless of how distant, and one may object to this property. It will however turn out that long durability of consumption is undesirable from a conservationist point of view, in the sense that assuming infinite memory is a harder test for the non-depletion criterion we want to prove. On the other hand, it poses little technical obstacle to allow the decay rate δ (which enters in (4)–(5) below) to be time-dependent, and we will do so, but (for simplicity) we shall assume it to be non-random. We will frequently need the multiplicative decay factor

$$\Delta := \exp\Big\{-\int_0^{\,\cdot}\delta\,dt\Big\}, \tag{4}$$

where δ(t) is continuous at 0 and locally bounded.

12.2.2 The process C and direct utility
Having justified the exponential decay form, we shall assume that, for H being the cumulative harvest process, C obeys

$$dC = -\delta C\,dt + k\,dH. \tag{5}$$
We shall argue that we can take k = 1 (constant), but first we notice that if we put k = δ and let δ → ∞, we recover the classical non-intertemporal case C dt = dH, in which C is the extraction rate itself. Furthermore, we can interpret k as a price and technology parameter: imagine that extracting 1 unit from the resource yields k units to consume (the units need not be the same, for example if k reflects a price). We note that Theorem 1 will be trivial unless k > 0. It is natural to allow this parameter to vary over time, but we leave it to the reader to see that such time-dependence can be incorporated into the drift term b of the process X to be introduced in (7) below. That way, or by considering C/k instead of C, we see that we can (and will) assume

$$k = 1. \tag{6}$$
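To make the dynamics (5) concrete, the following minimal sketch evaluates C for a harvest path consisting of a few discrete gulps, assuming k = 1, C(0) = 0 and a constant decay rate. The function name `durable_consumption`, the gulp times and sizes, and the value of the decay rate are all hypothetical and chosen only for illustration.

```python
import math

def durable_consumption(gulps, delta, t):
    """C(t) for dC = -delta*C dt + dH with C(0) = 0, k = 1, constant delta.

    gulps: list of (time, amount) pairs describing jumps of H.
    A gulp taken at time s contributes to C only for t > s, and its
    contribution decays by the factor exp(-delta*(t - s)).
    """
    return sum(amount * math.exp(-delta * (t - s))
               for s, amount in gulps if s < t)

gulps = [(0.0, 1.0), (2.0, 0.5)]   # hypothetical harvest path
delta = 0.7                        # hypothetical decay rate

for t in (0.5, 2.0, 2.5, 10.0):
    print(t, durable_consumption(gulps, delta, t))
```

Between gulps the sketch reproduces the multiplicative structure of (1)–(2): the value at a later time is the earlier value times the decay factor over the elapsed time.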
We then assume that the direct utility rate at time t is U(t, C(t)), where U is continuous near t = 0 and $C^2$ near C = 0. We denote the derivatives in the second argument by primes, and for technical reasons assume that they are sufficiently regular for small nonzero values of C. The prototypical direct utility rate would be the discounted form exp{−ρt}u(C(t)), but our generalization represents no obstacle and allows for t-dependence in u as well. Hence we shall stick with the U notation through Theorem 1 below until Corollary 1, where a more explicit calculation will turn out convenient.

12.3 The optimal control problem
Consider an agent who wants to maximize expected total utility from harvesting from a population X, obeying an Ito stochastic differential equation of the form

$$dX(t) = X(t^-)\cdot\Big[b(t,X)\,dt + \sigma\,dM(t) + \int \eta_z\,\tilde\Pi(dt,dz)\Big] - dH(t), \qquad X(0^+) = x, \tag{7}$$
where we assume the following mathematical detail: M is a continuous martingale, and $\tilde\Pi = \Pi - \pi$ is a measure-valued pure jump martingale composed from an integer-valued random measure Π (assumed right continuous with left limits) and its continuous compensator π. Here, σ and η are stochastic functions, and we assume ad hoc uniqueness and (local) existence of solutions of (7), and furthermore that b is continuous at (0, 0), while both σ d[M, M] and $\eta_z\,d\pi$ are sufficiently integrable. The process H is our control, the total amount extracted up to and including time t. Of course, the harvested process X should remain nonnegative, so we assume that X does not jump past 0 by itself (i.e. we assume $\eta_z \ge -1$) and that we cannot harvest more than the present population, i.e. we restrict the set of admissible H by imposing

$$H(t^+) - H(t^-) \le X(t) \quad\text{for all } t. \tag{8}$$

For given values of X(0) and C(0), define the performance up to time T to be

$$J_T(H) := E\Big[\int_0^T U(t, C(t))\,dt\Big], \tag{9}$$

and suppose the objective function to be maximized over H to be $J_{\bar T}$, where $\bar T$ is a fixed deterministic time horizon, finite or infinite.
12.3.1 Optimality criteria

Now the usual idea of optimality would be to try to find a control maximizing $J_{\bar T}$. There are however other optimality criteria, designed either for refinement or to ensure existence. We want to treat both the quite weak “sporadically catching up” (SCU) and the strong “overtaking” (OT) optimality criteria, i.e., by definition, an “optimal” $H^*$ should satisfy, respectively, for all admissible H:

$$\text{SCU:}\quad \limsup_{T \nearrow \bar T}\,\big(J_T(H^*) - J_T(H)\big) \ge 0, \tag{10}$$

$$\text{OT:}\quad J_T(H^*) - J_T(H) \ge 0 \quad\text{for all large enough } T < \bar T. \tag{11}$$
For some intuition on the concepts, think of $\bar T = \infty$ as merely a mathematically convenient approximation to a very long time horizon, and assume that there are multiple controls which accumulate arbitrarily high utility at large times. For example, assume that we have three controls $H_1, H_2, H_3$ with $J_T(H_1) = T$, $J_T(H_2) = 2(T - 1/(2T+1))$, and $J_T(H_3) = 2(T - 2/(2T+1))\sin T$. Then $J_\infty(H_1) = J_\infty(H_2) = +\infty$. However, $H_2$ overtakes the two others, while $H_3$ still sporadically catches up with the two others: along the times where sin T = 1 we have $J_T(H_3) - J_T(H_2) = -2/(2T+1) \to 0$, so that $\limsup\,(J_T(H_3) - J_T(H_2)) = 0$, and $\limsup\,(J_T(H_3) - J_T(H_1)) = \infty$. This reasoning exhibits OT-optimality as stronger than both “ordinary” optimality and SCU-optimality, but gives no relation between the latter two, although SCU-optimality is quite weak if $\limsup_T J_T < \infty$ for all H.

For a more thorough treatment of different optimality concepts, see e.g. Seierstad and Sydsæter (1987).
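The three-control example can be checked numerically. The sketch below evaluates the stated performance functions on a grid of horizons and along the times where sin T = 1; the grid, the chosen peak indices and the helper names `J1`, `J2`, `J3` are illustrative assumptions, and a finite grid can of course only suggest, not prove, the limiting statements.

```python
import math

def J1(T):
    return T

def J2(T):
    return 2.0 * (T - 1.0 / (2.0 * T + 1.0))

def J3(T):
    return 2.0 * (T - 2.0 / (2.0 * T + 1.0)) * math.sin(T)

# Horizons T from 1 to about 1000 in steps of 0.01.
grid = [1.0 + 0.01 * i for i in range(100_000)]

# Overtaking: H2 is at least as good as H1 and H3 at every horizon on the grid.
h2_overtakes = all(J2(T) >= J1(T) and J2(T) >= J3(T) for T in grid)

# Sporadic catching up: at T_n = pi/2 + 2*pi*n we have sin(T_n) = 1, and the
# gap J3 - J2 equals -2/(2T + 1), which shrinks to zero as n grows.
peaks = [math.pi / 2 + 2 * math.pi * n for n in (1, 10, 100, 1000)]
gaps = [J3(T) - J2(T) for T in peaks]

print(h2_overtakes)
print(gaps)
```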
12.3.2 Comparing only a few strategies

A priori, the agent should be permitted to choose among a possibly quite large class of non-anticipating, left continuous, non-decreasing (cumulative) harvesting processes H satisfying (8), and as pointed out by Hindy and Huang, it should be possible to consume both in a continuous way and with discrete gulps. However, the purpose of this paper is not to prove optimality, but to prove non-optimality of a given strategy, namely immediate total depletion, denoted by $H = \bar H_0$. For this purpose, it suffices to find one strategy which is better. Specifically, we shall consider strategies where nearly all of the initial population X(0) is harvested immediately, and after a short time $\tilde t$, the rest is harvested. Therefore, we can without loss of generality assume C(0) = 0: we are not allowed to extract a negative amount, but we can imagine doing so, only to immediately reverse the operation and also extract most of the rest. This fictitious operation will not affect the processes nor the running utility rate except at the single time zero, hence not the problem. We schematize:

• We assume H(0) = 0 and $C(0) + X(0) = x_0$; without loss of generality, we assume C(0) = 0 and $X(0) = x_0$.
• We harvest $H(0^+) = x_0 - x$ and “start X at” $X(0^+) = x$ (cf. (7)), where x is assumed small. Thus $C(0^+) = x_0 - x$.
• We let X evolve according to (7) until time $\tilde t$, when we harvest the rest, namely $X(\tilde t)$.
• This strategy will be denoted $\bar H_{\tilde t,x}$. Observe that $\bar H_{0,x} = \bar H_{\tilde t,0} =: \bar H_0$.

It will turn out that $\tilde t x \mapsto J_T(\bar H_{\tilde t,x}) - J_T(\bar H_0)$ has a first-order term.

12.4 Non-optimality of immediate total depletion
Let us first define a function K which will help us to state the result in a very general form:

$$K(T) := (\bar b + \bar\delta)\int_0^T \Delta(t)\cdot U'(t, \Delta(t)x_0)\,dt - U'(0, x_0), \tag{12}$$

where

$$\bar b := b(0,0) \tag{13}$$

is the relative drift rate at the limit (t, X) → (0, 0), and

$$\bar\delta := \delta(0) \tag{14}$$
is the relative decay rate of the consumption at time 0. We then have the following:

Theorem 1. Assume $x_0 > 0$, and that the above conditions hold.

• If $K(T_n) > 0$ for some sequence $T_n \nearrow \bar T$, then $\bar H_0$ is not overtaking-optimal.
• If K is positive and bounded away from 0 on some nonempty interval $(T, \bar T]$, then $\bar H_0$ is not sporadically catching up-optimal.
• If in addition to the second bullet point $J_{\bar T}(\bar H_0) < \infty$ holds, then $\bar H_0$ is not optimal in the ordinary sense.

Proof: With the above described strategy $\bar H_{\tilde t,x}$, we can calculate C easily:

$$C(t) = \begin{cases}\Delta(t)(x_0-x) & \text{if } t\in(0,\tilde t],\\ \Delta(t)\big[(x_0-x)+X(\tilde t)/\Delta(\tilde t)\big] & \text{if } t>\tilde t.\end{cases}$$

Then we have for each $T<\bar T$,

$$J_T(\bar H_{\tilde t,x}) = \int_0^{\tilde t}U\big(t,\Delta(t)(x_0-x)\big)\,dt + E\int_{\tilde t}^{T}U\big(t,\Delta(t)[(x_0-x)+X(\tilde t)/\Delta(\tilde t)]\big)\,dt.$$

Observe that $|J_T(\bar H_0)|<\infty$, and consider the difference $D(\tilde t,x) := J_T(\bar H_{\tilde t,x}) - J_T(\bar H_0)$:

$$\begin{aligned} D(\tilde t,x) ={}& \int_0^{\tilde t}\Big[U\big(t,\Delta(t)(x_0-x)\big) - U\big(t,\Delta(t)x_0\big)\Big]dt\\ &+ \int_{\tilde t}^{T}E\Big[U\big(t,\Delta(t)[(x_0-x)+X(\tilde t)/\Delta(\tilde t)]\big) - U\big(t,\Delta(t)x_0\big)\Big]dt.\end{aligned}$$

By the mean value theorem, there is some $t_1 \in (0,\tilde t)$ and some $x_1 \in (0,x)$ so that the first line of the right hand side is equal to $-x\tilde t\,\Delta(t_1)U'\big(t_1,\Delta(t_1)(x_0-x_1)\big)$.
As for the second line, fix t and write the expectation argument as $f(Y(\tilde t))$ where Y := X/Δ; note that (dY)/Y = δ dt + (dX)/X. By the Ito formula,

$$\begin{aligned} df(Y) ={}& \big(b(\tilde t, Y\Delta) + \delta\big)\,Y f'(Y)\,d\tilde t + \tfrac12\sigma^2 Y^2 f''(Y)\,d[M,M]\\ &+ \int\Big\{ f\big((1+\eta_z)Y\big) - f(Y) - \eta_z Y f'(Y)\Big\}\,d\pi + [\text{martingale terms}].\end{aligned} \tag{15}$$
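Noting that in general, $f(Y(t_1))$ will depend on t (but that f(Y(0)) = 0 for all t), and taking expectation, we get that for some $t_2 \in (0,\tilde t)$,

$$\begin{aligned} D(\tilde t,x) ={}& -x\tilde t\,\Delta(t_1)U'\big(t_1,\Delta(t_1)(x_0-x_1)\big)\\ &+ E\bigg[\tilde t\,Y(t_2)\big(\delta(t_2)+b(t_2,X(t_2))\big)\int_{\tilde t}^{T}f'(Y(t_2))\,dt\\ &\qquad+ \int_{\tilde t}^{T}\!\!\int_0^{\tilde t}[\text{higher than first order terms in } x]\,d[M,M]\,dt\\ &\qquad+ \int_{\tilde t}^{T}\!\!\int_0^{\tilde t}[\text{higher than first order terms in } x]\,d\pi\,dt\bigg].\end{aligned} \tag{16}$$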
So

$$\lim_{\tilde t x\to 0}\frac{D(\tilde t,x)}{\tilde t x} = -U'(0,x_0) + (\bar\delta+\bar b)\int_0^T E f'(Y(0))\,dt.$$

Substituting for $f'(Y(0)) = \Delta(t)U'(t,\Delta(t)x_0)$, we see that the right hand side is precisely K(T). Now let T grow through some sequence (OT) or all sequences (SCU), and the conclusion follows. □

A more recognizable form of the criterion follows immediately:
Corollary 1. Assume $\bar T = \infty$, $\delta = \bar\delta$ (constant) and that for all $(t,c) \in [0,\infty)\times(0,x_0]$, the utility function U(t, c) coincides with $e^{-\rho t}u(c)$ for some nondecreasing and concave u, with constant $\rho > -\bar\delta$ (implying $J_\infty(\bar H_0) < \infty$). If $\bar b > \rho$, then K(T) is bounded away from 0 for all T large enough, and thus $\bar H_0$ is neither SCU-optimal nor optimal in the ordinary sense.

Proof: Consider K. Observe that the coefficient in front of the integral is positive, and that by concavity, $U'(t,\Delta(t)x_0) \ge e^{-\rho t}u'(x_0)$. Substitute this and calculate explicitly the resulting lower estimate of K, and note that it is increasing at infinity. □

Note that the conditions of infinite horizon and constant discount rate go hand in hand here: a finite horizon $\bar T$ corresponds to shifting ρ to +∞ at $\bar T$, so a corresponding result would be less powerful for $\bar T < \infty$. By proceeding as in the proof, we can nevertheless obtain conditions for non-depletion for this and other cases of time-dependent ρ. We skip the details.

Corollary 1 suggests that linear utility is the “worst case” among concave utility functions. This is however no proof, as we have only compared within a narrow class of (usually sub-optimal) strategies. But even an expected profit maximizer will not want to deplete the population immediately if growth at 0 exceeds the discount rate, just as in the case where C equals the extraction rate itself. Rather than using concavity, we may sometimes want a stronger result for a given utility function:

Example. (CRRA utility) Assume that we are in the setting of Corollary 1, with u being CRRA, i.e.

$$u(c) = \frac{c^{1-\theta}-1}{1-\theta} \quad\text{for some } \theta \ge 0.$$
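For this utility function the integral in (12) is elementary, since with u′(c) = c^(−θ) and constant decay rate the integrand is a single exponential in t. The sketch below evaluates K(T) in closed form and cross-checks it against a midpoint-rule quadrature. The function names `K_crra` and `K_numeric` and all parameter values are hypothetical and serve only as an illustration under the assumptions of Corollary 1.

```python
import math

def K_crra(T, b_bar, delta, rho, theta, x0):
    """K(T) from (12) for U(t, c) = exp(-rho*t) * u(c) with u'(c) = c**(-theta)
    and Delta(t) = exp(-delta*t) (constant decay rate)."""
    a = rho + delta * (1.0 - theta)   # exponential rate of the integrand
    integral = T if a == 0.0 else (1.0 - math.exp(-a * T)) / a
    return x0 ** (-theta) * ((b_bar + delta) * integral - 1.0)

def K_numeric(T, b_bar, delta, rho, theta, x0, n=200_000):
    """Same quantity, with the integral approximated by the midpoint rule."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        decay = math.exp(-delta * t)
        total += decay * math.exp(-rho * t) * (decay * x0) ** (-theta) * h
    return (b_bar + delta) * total - x0 ** (-theta)

# Hypothetical parameters: drift below the discount rate, but positive decay.
params = dict(b_bar=0.05, delta=0.5, rho=0.08, theta=0.5, x0=2.0)
print(K_crra(200.0, **params))
print(K_numeric(50.0, **params), K_crra(50.0, **params))
```

With these illustrative numbers the drift rate is below the discount rate, yet K is positive for large T, which is the kind of improvement over Corollary 1 discussed next.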
Calculating K(∞) explicitly, we first find that K(∞) = +∞ in the case ρ + δ·(1 − θ) ≤ 0; otherwise, we have

$$\big(\rho + \delta\cdot(1-\theta)\big)\,x_0^{\theta}\,K(\infty) = \bar b + \delta\theta - \rho.$$

We get an improved condition compared to Corollary 1: immediate depletion is non-optimal, even in all three senses, if either

$$\rho \le \delta\cdot(\theta - 1) \quad\text{or}\quad \rho < \bar b + \delta\theta,$$

and we note that for the most interesting parameter ranges, $\bar b + \delta > 0$, the latter holds whenever the former does. In particular, it will always hold for large enough δ, i.e. low enough degree of intertemporal substitution, cf. in particular the limiting case where k = δ → ∞. Thus our class of preferences gives for CRRA utility the entire spectrum of conditions, from none at all in the classical nonlinear case (k = δ → ∞, θ > 0; cf. the discussion at the beginning of section 12.2), to $\rho < \bar b$ for no decay in direct utility (δ = 0). We see that δ and θ enter our criterion only as a product δθ, i.e. symmetrically. An economic interpretation could be that intertemporal decay plays a role similar to that of risk aversion as represented by the Arrow-Pratt index: no decay or zero risk aversion gives the classical $\rho < \bar b$ criterion, while high decay or high risk aversion indicates a more conservative resource extraction strategy. We remark though that this interpretation is not sufficiently supported by logic: again, we have only compared immediate depletion with non-optimal strategies.

12.5 Concluding remarks
It is no surprise that intertemporal direct utility can be “worse” (from a conservationist point of view) than the classical case, as seen by our example: in the extreme, direct utility functions which yield −∞ at zero will immediately prohibit immediate depletion in the classical case, whereas an initial “gulp” will grant the agent finite running utility in our setup. Nevertheless, we have shown that the well-known discount-rate criterion for non-depletion seems fairly robust to these kinds of preferences and to non-Markovian noise. Arguably, it is a weakness that the model has no financial market; the harvested amount must immediately be converted into consumption. There might be a price ratio, but there is no way of investing in this model. Finally, we mention that our results do admit several improvements in the technical conditions. For example, the only need for U″ is to be able to deal with stochastic differentials. The Meyer-Ito formula allows c ↦ U to merely be a difference between convex functions, and the second-order terms still vanish in the limit transition. Maybe more importantly, we can, under the appropriate conditions, allow the coefficients to depend on other stochastic parameters than X, and in this way cover more complex systems.
Acknowledgements: Main research was carried out at the Department of Mathematics, University of Oslo. The author gratefully acknowledges financial support from the Research Council of Norway.

References:

Alvarez, L. H. R. (2001) “Singular Stochastic Control, Linear Diffusions, and Optimal Stopping: a Class of Solvable Problems.” SIAM Journal on Control and Optimization 39: 1697–1710.
Framstad, N. C. (2003) “Optimal Harvesting of a Jump Diffusion Population and the Effect of Jump Uncertainty.” SIAM Journal on Control and Optimization 42: 1451–1465.
Hindy, A., and Huang, C.-F. (1992) “Intertemporal Preferences for Uncertain Consumption: a Continuous Time Approach.” Econometrica 60: 781–801.
Hindy, A., Huang, C.-F., and Kreps, D. (1992) “On Intertemporal Preferences in Continuous Time: the Case of Certainty.” Journal of Mathematical Economics 21: 401–440.
Pindyck, R. S. (1984) “Uncertainty in the Theory of Renewable Resource Markets.” Review of Economic Studies 51: 289–303.
Seierstad, A., and Sydsæter, K. (1987) Optimal Control Theory with Economic Applications. North-Holland Publishing Co., Amsterdam.
Chapter 13 Capital Accumulation in a Growth Model with Creative Destruction
Klaus Wälde
University of Würzburg, Würzburg, Germany

13.1 Introduction
Aghion and Howitt (1992) have presented a very influential model of endogenous growth. Long-run growth results from R&D for improved intermediate goods, where each new vintage of intermediate goods yields a higher total factor productivity. While the original presentation of the model did not take capital accumulation into consideration, various more recent contributions (Aghion and Howitt, 1998; Howitt and Aghion, 1998; Howitt, 1999) combined capital accumulation with R&D. These contributions share the feature of risk-neutral agents. Wälde (1999a) has shown that introducing risk-averse households into the Aghion and Howitt (1992) model substantially alters equilibrium properties. Three out of four market failures disappear, and a new market failure resulting from a complementarity in financing R&D is identified. It is the objective of the present paper to show that extending the models of Aghion and Howitt (1998), Howitt and Aghion (1998) and Howitt (1999) to risk-averse households as well considerably broadens the range of phenomena to which their model can be applied. Such an extension allows us to understand not only long-run growth but also short-run fluctuations. Aghion and Howitt’s basic setup therefore implies much richer predictions once the assumption of risk-neutrality is relaxed.
The next section presents a model that contains the central features of Aghion and Howitt’s setup, notably an R&D sector whose probability of success (arrival rate) depends on the amount of resources allocated to R&D. In addition, capital accumulation and the consumption and investment decision of risk-averse households are modeled explicitly. The economy we present produces one good that can be used for consumption, for capital accumulation and as an input for risky R&D. This good is produced with capital and labor. Risk-averse households can use their savings for financing capital accumulation and R&D. As this investment decision is based on (expected) returns, the amount of resources allocated to capital accumulation will be high when returns to capital accumulation are high relative to expected returns to R&D. With high capital returns, capital accumulation will be fast. When returns to capital accumulation have fallen (due to decreasing returns to capital), capital accumulation will be slower, just as on the saddle path of a standard Ramsey growth model. When capital returns are sufficiently low, research for new technologies will be financed. Once research is successful, a new technology is available and returns to capital accumulation will again be high. The discrete increase in total factor productivity due to new technologies, combined with gradual capital accumulation, allows us to understand how short-run fluctuations and long-run growth are jointly determined.¹

Two features of the model that differ from Aghion and Howitt’s setup are worth emphasizing. First, gradual capital accumulation can be studied only with risk-averse households. Hence without risk-aversion, short-run fluctuations (of the type presented here) cannot be understood. Second, some features of Aghion and Howitt’s setup, which are not essential for the argument we want to make here, are not taken into consideration. Most importantly, the present model does not have any imperfect competition features. Modeling an economy that is perfectly competitive in all sectors (and therefore has no monopolist in the intermediate good sector) makes the model very tractable. Incentives for R&D are nevertheless present in a decentralized economy, as the outcome of R&D is assumed to consist not only in a blueprint but also in a prototype of the new units of production. As will become clear below, the qualitative properties of the present model should be identical to the qualitative properties of a model with a monopolist in the intermediate goods sector.

¹ Some equilibrium properties of the present model resemble the findings of Bental and Peled (1996) and Matsuyama (1999), who use a discrete-time framework.
13.2 Framework of the model

13.2.1 Technologies

Technological progress is labor augmenting and embodied in capital. A capital good $K_j$ of vintage j allows workers to produce with a labor productivity of $A^j$. Hence, a more modern vintage j + 1 implies a labor productivity that is A times higher than the labor productivity of vintage j. The production function corresponding to this capital good reads

$$Y_j = K_j^{\alpha}\left(A^j L_j\right)^{1-\alpha}. \tag{1}$$

The amount of labor allocated to this capital good is denoted by $L_j$, and 0 < α < 1 is the output elasticity of capital. The sum of labor employment $L_j$ per vintage equals aggregate constant labor supply L, $\sum_{j=0}^{q} L_j = L$, where q is the most advanced vintage currently available. Independently of which vintage is used, the same type of output is produced. Aggregate production therefore equals

$$Y = \sum_{j=0}^{q} Y_j. \tag{2}$$
Aggregate output is used for producing consumption goods C and investment goods I, and as an input R for doing R&D,

$$C + I + R = Y. \tag{3}$$

The objective of R&D is to develop capital goods that yield a higher labor productivity than existing capital goods. R&D is an uncertain activity which is modeled by the Poisson process q. The probability of successful R&D over a short time interval dt is given by λ dt, where λ is the arrival rate of the process q. This arrival rate is an increasing function of the amount of resources R used for R&D,

$$\lambda = R/D(q). \tag{4}$$
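As a small illustration of (4), the sketch below computes the arrival rate and the implied probability of at least one success over a finite horizon, and draws waiting times until success. It assumes that R and D(q) stay constant while waiting, so that the waiting time is exponentially distributed; the function names and the numbers are hypothetical.

```python
import math
import random

def arrival_rate(R, D_q):
    """Arrival rate lambda = R / D(q) from (4)."""
    return R / D_q

def success_probability(R, D_q, horizon):
    """Probability of at least one R&D success within `horizon`,
    assuming R and D(q) are held constant over that period."""
    return 1.0 - math.exp(-arrival_rate(R, D_q) * horizon)

def simulate_waiting_time(R, D_q, rng):
    """Draw the waiting time until the next success (requires R > 0)."""
    return rng.expovariate(arrival_rate(R, D_q))

rng = random.Random(0)                      # fixed seed for reproducibility
print(success_probability(2.0, 10.0, 1.0))  # hypothetical R and D(q)
print(simulate_waiting_time(2.0, 10.0, rng))
```

Doubling R doubles the arrival rate and halves the expected waiting time, while a larger D(q) has the opposite effect.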
The parameter D(q), a fundamental of the model, captures differences in sector input requirements between R&D and the other sectors. It is an increasing function of the currently most advanced vintage q, as will be discussed later. It will basically be used to remove the well-known scale effect (Backus, Kehoe and Kehoe, 1992; Jones, 1995; Segerstrom, 1998; Young, 1998; Howitt, 1999) in the present model.

When R&D is successful, a first prototype of a production unit that yields a labor productivity of $A^{q+1}$ becomes available. Let the size of this first machine be given by $\kappa_{q+1}$.² It might appear unusual that research actually leads to a first production unit. Usually, output of successful research is modeled as a blueprint. It should not be too difficult to imagine, however, that at the end of some research project, engineers have actually developed a first machine that implies this higher labor productivity. With this assumption, there are incentives to finance R&D in a decentralized economy, even though all sectors produce under perfect competition: Those who have financed R&D obtain the production unit $\kappa_{q+1}$, whose capital rewards balance R&D costs. Hence, no profits by a monopolist are required.³ As a second effect of successful R&D, the economy can accumulate capital that yields this higher labor productivity. This is a positive externality.⁴

Each vintage of capital is subject to depreciation at the constant rate δ. If more investment is allocated to vintage j than capital is lost due to depreciation, the capital stock of this vintage increases in a deterministic way,

$$dK_j = (I_j - \delta K_j)\,dt, \qquad j = 0\ldots q. \tag{5}$$
When research is successful, the capital stock of the next vintage q + 1 increases discretely by the size $\kappa_{q+1}$ of the first new machine of vintage q + 1,

$$dK_{q+1} = \kappa_{q+1}\,dq. \tag{6}$$

Afterwards, (5) would apply to vintages j = 0...q + 1.⁵

Before describing households in this economy, we now derive some straightforward equilibrium considerations that simplify both the presentation of the production side and, more importantly, the derivation of the budget constraint of households in the next section. Allowing labor to be mobile across vintages j = 0...q such that wage rates equalize, the total output of the economy can be represented by a simple Cobb-Douglas production function (cf. Appendix A)

$$Y = K^{\alpha}L^{1-\alpha}. \tag{7}$$

Vintage specific capital stocks have been aggregated to an aggregate capital index K,

$$K = K_0 + BK_1 + \dots + B^qK_q = \sum_{j=0}^{q}B^jK_j, \qquad B = A^{\frac{1-\alpha}{\alpha}}. \tag{8}$$

² The size can differ for different vintages, and we will later assume that $\kappa_q$ increases in q.
³ With a monopolist and capital, agents could hold capital and shares in the monopolist. This would require asset pricing, which would make the model intractable when transitional dynamics are to be analyzed.
⁴ There is an interesting link to the Coase theorem as it was recently amended by Dixit and Olson (2000): When bundling a collective good (the new technology) with a private good (the new machine), the collective good will be provided.
⁵ Formally, this equation is a stochastic differential equation driven by the Poisson process q. The increment dq of this process can either be 0 or 1. Successful R&D means dq = 1. For an introduction, cf., e.g., Dixit and Pindyck (1994).
This index can be considered to be a quality-adjusted measure of the aggregate capital stock, where $B^j$ captures the quality of capital of vintage j. The value marginal productivity of a vintage j is then given by

$$w_j^K = p_c\frac{\partial Y}{\partial K}B^j, \tag{9}$$

where $p_c$ is the price of the consumption good. The evolution of this aggregate capital index K follows from (5) and (6). Given that the price of an investment good does not depend on where this investment good is used, that depreciation is the same for all investment goods, and that value marginal productivities (9) are highest for the most advanced vintage, investment takes place only in the currently most advanced vintage q,

$$I_j = \begin{cases} 0 & \forall j < q,\\ I & j = q.\end{cases}$$

Hence,

$$\begin{aligned} dK &= \left(-\delta K_0 - B\delta K_1 - \dots - B^{q-1}\delta K_{q-1} + B^q[I_q - \delta K_q]\right)dt + B^{q+1}\kappa_{q+1}\,dq\\ &= \left(B^q I - \delta K\right)dt + B^{q+1}\kappa_{q+1}\,dq.\end{aligned} \tag{10}$$
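The jump-diffusion structure of (10) can be illustrated with a simple Euler scheme. The sketch below is only an illustration under strong simplifying assumptions that are not part of the model: investment I and the arrival rate are held at fixed hypothetical values rather than being chosen by households, and at most one jump can occur per time step. The name `simulate_capital_index` and the prototype-size function `kappa` are hypothetical.

```python
import random

def simulate_capital_index(K0, q0, B, delta, investment, kappa, lam, dt, steps, rng):
    """Euler scheme for dK = (B**q * I - delta*K) dt + B**(q+1) * kappa(q+1) dq.

    kappa: function mapping a vintage index to the size of its prototype.
    lam:   constant Poisson arrival rate (a simplifying assumption).
    Returns a list of (time, K, q) tuples.
    """
    K, q = K0, q0
    path = [(0.0, K, q)]
    for n in range(1, steps + 1):
        # Deterministic part: investment in the newest vintage minus depreciation.
        K += (B ** q * investment - delta * K) * dt
        # Stochastic part: a success arrives with probability lam*dt per step.
        if rng.random() < lam * dt:
            K += B ** (q + 1) * kappa(q + 1)
            q += 1
        path.append((n * dt, K, q))
    return path

path = simulate_capital_index(K0=1.0, q0=0, B=1.5, delta=0.1, investment=0.2,
                              kappa=lambda q: 0.1, lam=0.05, dt=0.01,
                              steps=10_000, rng=random.Random(1))
print(path[-1])
```

Without arrivals, the index converges to the level where investment in efficiency units just offsets depreciation; each arrival adds the prototype in efficiency units and raises the productivity of subsequent investment by the factor B.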
Concerning prices in this economy, the technologies presented above imply

$$p_Y = p_c = p_I = p_R. \tag{11}$$

Good Y will be chosen as numeraire. Prices $p_Y$, $p_c$, $p_I$, $p_R$ will therefore be constant throughout the paper; we will nevertheless use them at various places (and not normalize to unity) as this makes some derivations more transparent. As long as investment is positive, the price $v_q$ of an installed unit of the most recent vintage of capital equals the price of an investment good, $v_q = p_I$. As different vintages are perfect substitutes in production (8), prices of different vintages are linked to each other by

$$p_I = v_q = B^{q-j}v_j, \qquad \forall j = 0\ldots q. \tag{12}$$
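The pricing relation (12) can be tabulated directly. The sketch below, with hypothetical function names and a hypothetical value of B, computes the price of each installed vintage and the fraction of its value that an older vintage loses when the frontier q moves up by one.

```python
def vintage_price(j, q, B, p_I=1.0):
    """Price v_j of an installed unit of vintage j when the frontier is q,
    obtained by solving (12) for v_j: v_j = B**(-(q - j)) * p_I."""
    return B ** (-(q - j)) * p_I

def devaluation_share(B):
    """Fraction of value lost by every existing vintage when q rises by one."""
    return (B - 1.0) / B

B = 1.25   # hypothetical quality step
q = 3
print([vintage_price(j, q, B) for j in range(q + 1)])
print(1.0 - vintage_price(1, q + 1, B) / vintage_price(1, q, B), devaluation_share(B))
```

The two numbers on the last line agree: a new vintage multiplies the price of every existing vintage by 1/B, independently of j.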
Further, the price $p_K$ of one efficiency unit of capital (which corresponds to one unit of capital of vintage 0) is a decreasing function of the most advanced vintage q,

$$p_K = B^{-q}p_I. \tag{13}$$
This also reflects the term $B^q$ in the capital accumulation equation (10). The pricing relationship (12) reveals a creative destruction mechanism in the model, despite the absence of aggressive competition between firms (as, e.g., in the original Aghion and Howitt model, where the intermediate firm is always a monopolist). When a new vintage is found, i.e., when q increases by one, the price of older vintages relative to the consumption good falls, as by (12) and (11) $v_j/p_c = B^{-(q-j)}$. Capital owners therefore experience a certain reduction in their real wealth.

13.2.2 Households
There is a discrete finite number of households in this economy. Each household is sufficiently small to neglect the effects of own behavior on aggregate variables. Households maximize expected utility U(t), given by the sum of instantaneous utility u(.) resulting from consumption flows c(τ), discounted at the time preference rate ρ,

$$U(t) = E\int_t^{\infty}e^{-\rho[\tau-t]}u(c(\tau))\,d\tau, \tag{14}$$

where the instantaneous utility function u(.) is characterized by constant relative risk aversion,

$$u(c(\tau)) = \frac{c(\tau)^{1-\sigma}-1}{1-\sigma}. \tag{15}$$
For saving purposes, households can buy capital and finance R&D. When they buy capital, their real wealth a increases in a deterministic and continuous way. This increase equals capital plus labor income ra + w minus expenditure i for R&D and expenditure $p_c c$ for consumption. When financing R&D, i.e., when i is positive, successful research changes their wealth in a discrete way. A household receives the same share of the value of the successful research project that it has contributed to financing this project. When total investment into research is given by J, the household receives the share i/J.⁶ The value of the successful research project depends on the price $v_{q+1}$ of the capital good and the “size” $\kappa_{q+1}$ of the prototype. In summary, the budget constraint (16) is a stochastic differential equation, where the deterministic part (.) dt stems from buying capital and the stochastic part (.) dq captures the effects of financing R&D. As in (6), when R&D is successful, the increment dq of the Poisson process q underlying R&D equals unity; otherwise, dq = 0. A negative effect of successful research stems from the devaluation of capital, as discussed in relation to the pricing equation (12). As the relative price (13) of an efficiency unit of capital falls when a new vintage is discovered, households experience a loss in the value of their assets relative to the consumption good price. The share of assets that is “lost” due to this devaluation is denoted by s. Hence,

$$da = \left(ra + \frac{w-i}{p_Y} - c\right)dt + \left(\kappa_{q+1}\frac{i}{J} - sa\right)dq, \tag{16}$$

where

$$s = \frac{B-1}{B} \tag{17}$$

and the interest rate is given by

$$r = B^q\frac{\partial Y}{\partial K} - \delta. \tag{18}$$
This budget constraint is formally derived in Appendix B.

13.3 Solving the model

This section shows that the economy can be analyzed almost as easily as a standard textbook growth model. All optimality and equilibrium conditions will be expressed in terms of aggregate consumption C and the capital stock K. The behavior of the economy is summarized in the next section as an almost standard phase diagram.

⁶ This sharing rule introduces an externality in this economy. Individuals tend to invest too much, as shown (in a different setup) in Wälde (1999a).
Klaus W¨alde 13.3.1
Investment decisions of households
Households maximize utility (14) subject to the budget constraint (16) by choosing investment $i$ into R&D and the consumption level $c$. Optimal investment follows a bang-bang investment rule saying that either no savings are used for R&D at all or all savings are used for R&D. Formally (cf. Appendix C or Wälde, 1999b),

$$\frac{i}{p_Y} = \begin{cases} 0 \\ ra + w/p_Y - c \end{cases} \iff r - \rho - \lambda\left[1 - (1-s)\,\Omega\right] \begin{cases} > \\ = \end{cases} 0, \tag{19}$$

where

$$\Omega = \frac{u'(\tilde c)}{u'(c)} \tag{20}$$

is the ratio of marginal utility of consumption under the new technology to marginal utility of consumption under the current technology. In this paper, a tilde (˜) denotes the value of a variable immediately after successful R&D.

This rule says that R&D is not financed ($i = 0$) when the right hand side is positive, i.e., when returns $r$ to capital accumulation are sufficiently high. With low returns such that the right hand side is zero (as shown in Wälde (1999b) and, as we will see, it cannot be negative in equilibrium), all savings net of capital depreciation will be used for financing R&D.

This bang-bang result might be surprising but it is extremely useful for keeping the model tractable. It is the consequence of three sufficient (not necessarily necessary) conditions: (i) there is a representative consumer, so distributional aspects are not taken into consideration here, (ii) the R&D sector operates under constant returns to scale, a standard assumption that allows us to model perfect competition and (iii) the result $\kappa_{q+1}$ of a successful research project is independent of the amount of resources allocated to R&D. More or less investment into R&D has only an impact on the probability of success, not on its outcome $\kappa_{q+1}$.7

7 This bang-bang property can also be found in central planner solutions of economies of this type (Wälde, 2001). A technical condition is the continuous-time setup. Discrete-time models would have an interior solution (Wälde, 1998, ch. 8).

It is important to note at this point that allocating all savings to R&D implies that wealth of households remains constant (as long as R&D is not successful). This directly follows from inserting $i/p_Y = ra + w/p_Y - c$ into the budget constraint (16) of households. When wealth of households is constant, aggregate wealth, i.e., the capital stock, needs to be constant as well. Hence, when all savings are allocated to R&D, there is still some investment in new equipment such that depreciation is just balanced. Looking at the expression for the interest rate (18) shows that this is no contradiction to the allocation of all savings to R&D. Savings are net savings, i.e., gross savings

$$B^q \frac{\partial Y}{\partial K}\,a + w/p_Y - c$$

minus $\delta a$, losses due to depreciation. Hence, gross savings are used for keeping wealth (and thereby the capital stock) constant and for financing R&D,

$$B^q \frac{\partial Y}{\partial K}\,a + w/p_Y - c = \delta a + i/p_Y.$$
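The investment rule (19) can be sketched in a few lines of code. The following is a minimal illustration, not the author's implementation: it assumes CES marginal utility $u'(c) = c^{-\sigma}$, and the function names `omega_ces` and `rd_investment`, the argument `net_savings` (standing for $ra + w/p_Y - c$) and the tolerance `tol` are hypothetical choices made for this example.

```python
def omega_ces(c, c_tilde, sigma):
    """Omega in (20), assuming CES marginal utility u'(c) = c**(-sigma)."""
    return (c_tilde / c) ** (-sigma)


def rd_investment(r, rho, lam, s, omega, net_savings, tol=1e-9):
    """Real R&D investment i/p_Y implied by the bang-bang rule (19).

    net_savings stands for r*a + w/p_Y - c (an assumed argument name).
    """
    # Left-hand side of the condition in (19).
    gap = r - rho - lam * (1.0 - (1.0 - s) * omega)
    if gap > tol:
        # Returns to capital are high: no R&D is financed.
        return 0.0
    if abs(gap) <= tol:
        # Households are indifferent: all net savings finance R&D.
        return net_savings
    # The text argues that a negative gap cannot occur in equilibrium.
    raise ValueError("gap < 0 is ruled out in equilibrium")
```

With $\Omega = 1$, $\rho = 0.02$, $\lambda = 0.05$ and $s = 0.2$, the threshold interest rate is $\rho + \lambda[1 - (1-s)\Omega] = 0.03$; any higher $r$ returns zero R&D investment.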
From an intuitive point of view, this rule can most easily be understood by looking at the Keynes-Ramsey rule that would hold in an economy where households allocate savings both to R&D and capital accumulation, i.e., where an interior solution for investments into R&D exists. It reads (cf. Appendix D)

$$-\frac{du'(c)}{u'(c)} = \left[r - \rho - \lambda\left[1 - [1-s]\,\Omega\right]\right]dt + [1 - \Omega]\,dq. \tag{21}$$
The deterministic part of this rule is identical to the investment rule. The deterministic part says that consumption grows as long as the interest rate $r$ is sufficiently high. When the interest rate is too low, no further accumulation of assets takes place. This is a well-known relationship from standard growth models. This helps to understand the above investment rule for the case where no interior solution for investment into R&D exists. As long as the interest rate is sufficiently high, only capital accumulation takes place and consumption rises. When the interest rate has fallen to $\rho + \lambda[1 - [1-s]\,\Omega]$, no further assets are accumulated and consumption is constant. Hence all savings go to financing R&D.

13.3.2 The regimes of the economy

We now exploit the implications of the investment rule. When no R&D is undertaken, the economy finds itself in a period of deterministic changes, the deterministic regime. When R&D is undertaken, the economy finds itself in a stochastic regime.

Deterministic regime

When all savings are allocated to capital accumulation, no research takes place and no uncertainty is present in the economy. In those "deterministic times", consumption follows the standard Keynes-Ramsey rule,

$$-\frac{u''(C)}{u'(C)}\,\dot C = r - \rho, \tag{22}$$

where $C$ is aggregate consumption. Capital accumulation is then also deterministic and reads from (10) and (3)

$$\dot K = B^q (Y - C) - \delta K. \tag{23}$$
Stochastic regime

By contrast, when the interest rate is sufficiently low such that the investment rule (19) advises to allocate all savings to R&D, the economy finds itself on what could be called the R&D line. This line follows from the investment rule (19) and reads (cf. Appendix E)

$$B^q \frac{\partial Y}{\partial K} - \delta - \rho = \frac{Y - B^{-q}\delta K - C}{D(q)}\left[1 - \frac{D(q)}{B\kappa_{q+1}}\right]. \tag{24}$$
This line gives combinations of the aggregate capital stock and consumption where the economy is in the stochastic regime. As follows from the discussion of (19), individual consumption, individual wealth and therefore aggregate consumption and the aggregate capital stock are constant on this line. The economy is therefore in a transitory stationary equilibrium. At some point, however, a new technology will be found and individuals adjust their saving plans. The associated jump in consumption is given by (cf. Appendix E)

$$u'(C) = \frac{\kappa_{q+1}}{D(q)}\,u'(\tilde C). \tag{25}$$

13.4 Cycles and growth
This section shows how a phase diagram can be used to illustrate the equilibrium path of the economy. It also presents selected properties of time paths as predicted by the model.

13.4.1 The equilibrium path

Studying a standard phase diagram would be cumbersome, as the phase diagram "grows" with each vintage. More formally, zero-motion lines are a function of the most advanced vintage $q$ and they shift outward when $q$ rises. We will therefore present a phase diagram where variables have been transformed according to

$$K = \hat K A^{q/\alpha}, \qquad C = \hat C A^{q}. \tag{26}$$
With these new productivity-adjusted variables $\hat K$ and $\hat C$, zero-motion lines are vintage-independent.

Zero-motion lines

The phase diagram consists of zero-motion lines and, in addition to standard phase diagrams, of an R&D line. Productivity-adjusted capital and consumption follow (cf. Appendix F)

$$\frac{d}{dt}\hat K = \hat Y - \hat C - \delta \hat K, \tag{27}$$

$$\frac{d}{dt}\hat C = \frac{\hat r - \rho}{\sigma}\,\hat C. \tag{28}$$

The zero-motion lines for consumption and capital are then

$$\hat C = \hat Y - \delta \hat K, \qquad \hat r = \rho, \qquad \text{where} \quad \hat Y = \hat K^{\alpha} L^{1-\alpha}, \quad \hat r = \alpha \left(L/\hat K\right)^{1-\alpha} - \delta. \tag{29}$$
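The laws of motion (27) and (28) and the steady state implied by (29) can be written down directly. The sketch below is only an illustration under the Cobb-Douglas and CES assumptions of the text; the function names and the closed-form steady state $\hat K^* = L\,[\alpha/(\rho+\delta)]^{1/(1-\alpha)}$, obtained by solving $\hat r = \rho$, are introduced here for the example.

```python
def y_hat(k_hat, L, alpha):
    """Productivity-adjusted output, Y_hat = K_hat**alpha * L**(1 - alpha)."""
    return k_hat ** alpha * L ** (1.0 - alpha)


def r_hat(k_hat, L, alpha, delta):
    """Productivity-adjusted interest rate from (29)."""
    return alpha * (L / k_hat) ** (1.0 - alpha) - delta


def laws_of_motion(k_hat, c_hat, L, alpha, delta, rho, sigma):
    """Time derivatives (dK_hat/dt, dC_hat/dt) from (27) and (28)."""
    dk = y_hat(k_hat, L, alpha) - c_hat - delta * k_hat
    dc = (r_hat(k_hat, L, alpha, delta) - rho) / sigma * c_hat
    return dk, dc


def steady_state(L, alpha, delta, rho):
    """Intersection of the two zero-motion lines in (29)."""
    # r_hat = rho  <=>  alpha * (L / K_hat)**(1 - alpha) = rho + delta
    k_star = L * (alpha / (rho + delta)) ** (1.0 / (1.0 - alpha))
    c_star = y_hat(k_star, L, alpha) - delta * k_star
    return k_star, c_star
```

To the left of the steady state and below the zero-motion line for capital, both derivatives are positive, which is the region in which the equilibrium path described in the text moves towards the R&D line.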
The R&D line and jumps in consumption and capital

Transforming the R&D line (24) yields (cf. Appendix F)

$$\hat C = \hat Y - \delta \hat K - \phi\,\frac{\hat r - \rho}{1 - \dfrac{\phi}{A^{1/\alpha}\hat\kappa_0}}. \tag{30}$$

For this transformation, we assumed

$$D(q) = \phi A^{q} \quad \text{and} \quad \kappa_{q+1} = A^{q+1}\hat\kappa_0, \tag{31}$$

where $\phi$ is a positive parameter that reflects relative productivity of investment vs. R&D. This derivation assumed also

$$1 - \frac{D(q)}{B\kappa_{q+1}} = 1 - \frac{\phi}{A^{1/\alpha}\hat\kappa_0} > 0. \tag{32}$$
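The position of the R&D line (30) relative to the zero-motion line for capital can be checked numerically. The sketch below is an illustration only; the function names and the parameter values used in the usage note are assumptions made for the example, not a calibration from the text.

```python
def capital_zero_motion_line(k_hat, L, alpha, delta):
    """Zero-motion line for capital in (29): C_hat = Y_hat - delta * K_hat."""
    return k_hat ** alpha * L ** (1.0 - alpha) - delta * k_hat


def rd_line(k_hat, L, alpha, delta, rho, phi, A, kappa0):
    """Productivity-adjusted R&D line (30) under restriction (32)."""
    wedge = 1.0 - phi / (A ** (1.0 / alpha) * kappa0)
    if wedge <= 0.0:
        raise ValueError("parameter restriction (32) is violated")
    r = alpha * (L / k_hat) ** (1.0 - alpha) - delta  # r_hat from (29)
    return capital_zero_motion_line(k_hat, L, alpha, delta) - phi * (r - rho) / wedge
```

For instance, with `L=1.0, alpha=0.3, delta=0.05, rho=0.03, phi=0.5, A=1.2, kappa0=1.0`, the two lines coincide at the capital stock where $\hat r = \rho$, and the R&D line lies below the zero-motion line for capital at any smaller capital stock, because $\hat r > \rho$ there.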
Both the parameter $\phi$ and this parameter restriction will be discussed when drawing the phase diagram. The first assumption in (31) is by now standard in models of economic growth. It implies by the resource constraint (3) of the economy that more resources are required to increase labor productivity with a "probability" $\lambda$ from $q$ to $q+1$ than with the same $\lambda$ from $q-1$ to $q$. When $q$ machines, each providing higher productivity, have already been developed, it is harder to find new and more productive ones.8 Without this assumption, the economy would be characterized by the scale effect: The larger the economy, the faster it grows (which is empirically disputed, cf., e.g., Jones, 1995 or Backus, Kehoe and Kehoe, 1992). This scale effect has been removed in many ways (Segerstrom, 1998; Young, 1998; Howitt, 1999), of which the approach chosen here (close to Segerstrom, 1998) appears to be the simplest one from a modelling perspective. The second assumption implies that the size of new machines is such that total factor productivity of this new machine (compare the technology (1)) is $A$ times higher than total factor productivity with the previous vintage. Both assumptions together yield a productivity-adjusted R&D line (30) that is an invariant line in the productivity-adjusted phase diagram.

8 Segerstrom (2001) provides convincing data on R&D expenditures by Intel that support this (and his) view.

The consumption jump condition (25) reads with (15) and (31)

$$C^{-\sigma} = \frac{A\hat\kappa_0}{\phi}\,\tilde C^{-\sigma} \iff \tilde C = \left(\frac{A\hat\kappa_0}{\phi}\right)^{1/\sigma} C$$

for actual consumption and

$$\hat{\tilde C} = A^{-1}\left(\frac{A\hat\kappa_0}{\phi}\right)^{1/\sigma}\hat C \tag{33}$$

for productivity-adjusted consumption. The capital stock increases due to successful research according to (10) by $\tilde K - K = B^{q+1}\kappa_{q+1}$. Productivity-adjusted capital changes following (26) and (31) are then given by

$$\hat{\tilde K} = \hat K / A^{1/\alpha} + \hat\kappa_0. \tag{34}$$
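The jump conditions (33) and (34) define a simple map from the point where the economy sits on the R&D line to the point where it restarts after a successful innovation. The following sketch illustrates that map; the function name `jump` and its argument names are hypothetical.

```python
def jump(k_hat, c_hat, A, alpha, sigma, phi, kappa0):
    """Productivity-adjusted capital and consumption right after successful R&D.

    Implements (34) for capital and (33) for consumption.
    """
    k_new = k_hat / A ** (1.0 / alpha) + kappa0
    c_new = (A * kappa0 / phi) ** (1.0 / sigma) * c_hat / A
    return k_new, c_new
```

Multiplying the results by $A^{(q+1)/\alpha}$ and $A^{q+1}$, respectively, as in the transformation (26), recovers the post-jump levels of actual capital and consumption.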
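[Phase diagram with $\hat K$ on the horizontal and $\hat C$ on the vertical axis, showing the zero-motion lines $d\hat C/dt = 0$ and $d\hat K/dt = 0$, the R&D line, and the equilibrium path EP − EP starting at $(\hat K_0, \hat C_0)$.]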
Figure 1: Long-run growth and short-run cycles.

Equilibrium

Let us now plot the phase diagram that will help to understand what an equilibrium in this economy is. Figure 1 plots $\hat K$ on the horizontal and $\hat C$ on the vertical axis. Zero-motion lines have the usual shape and laws of motion indicated by arrows are identical to standard Ramsey growth models as well. The R&D line is upward sloping and crosses the steady state. The slope of the R&D line crucially depends on $\phi$, the parameter that captures relative productivity of the R&D sector vs. the investment good sector. A high $\phi$ means high productivity in the investment goods sector relative to the research sector (compare (31) and (4) with the resource constraint (3)). The higher $\phi$, the further the R&D line (30) moves to the right. Ceteris paribus, this means longer capital accumulation before R&D starts.

The R&D line lies below the zero-motion line for capital because of the parameter restriction (32). If $1 - \phi/(A^{1/\alpha}\hat\kappa_0) = 0$, the R&D line would coincide with the zero-motion line for consumption. This can most easily be seen from the expression for the R&D line in (24). If $1 - \phi/(A^{1/\alpha}\hat\kappa_0) < 0$, the R&D line would lie above the zero-motion line for capital.9

9 In the present paper, we restrict attention to the case in (32). When the R&D line coincides with the zero-motion line for consumption, no resources are left for R&D when the R&D line is hit, and new technologies would never be discovered.
Equations (27), (28), (30), (33) and (34) jointly determine the evolution of productivity-adjusted capital and consumption in this economy. An equilibrium is a path EP − EP as drawn in the phase diagram, starting at a point $(\hat K_0, \hat C_0)$, following laws of motion (27) and (28), ending on the R&D line (30), jumping according to (33) and (34) to $(\hat{\tilde K}, \hat{\tilde C})$, and ending up, after having followed again laws of motion (27) and (28), at $(\hat K_0, \hat C_0)$.10

13.4.2 Properties of the equilibrium path
This section will present properties of the equilibrium path of this economy. It studies both the long-run and the short-run predictions of the model. While it would be extremely interesting to calibrate this model and derive quantitative predictions, this is left for future work.11

Short-run fluctuations

This economy is characterized by long-run growth with short-run fluctuations. The evolution of the economy can nicely be summarized using the above phase diagram. The subsequent discussion refers to actual quantities (like $K$ and $C$ rather than productivity-adjusted variables $\hat K$ and $\hat C$), assuming the economy is in equilibrium.

Let the economy start with a capital stock $K_0$ and let it choose a consumption level such that it is on the equilibrium path EP − EP. As returns to capital are sufficiently high, no one wants to finance research for new technologies. The economy therefore accumulates more capital of the currently most advanced vintage and approaches the R&D line. Consumption rises and returns to capital fall.

9 (continued) The implications of an R&D line lying above the zero-motion line for capital are still to be worked out.

10 In equilibrium, the interest rate is always larger than or equal to $\rho + \lambda[1 - (1-s)\,\Omega]$, as argued in (19). The interest rate would be smaller than this expression only if the economy were below the R&D line.

11 As the objective of a theoretical model is to present an argument as easily as possible, certain predictions, especially on cyclical and counter-cyclical behavior, are extreme. In work in progress (in a discrete-time version of the present model), the author shows that these extreme predictions can be weakened, which makes the discrete-time version more suitable for calibration.
After some finite length of time, the economy hits the R&D line (at the upper EP). Investors realize that capital rewards have fallen so much that they are now indifferent between accumulating capital and financing research for a new technology. Resources that were used an instant before for producing new capital equipment are now used for searching for a better type of capital. As long as research is not successful, the economy remains on the R&D line at this point EP. Some new capital goods continue to be produced, just to compensate depreciation. Hence, the aggregate capital stock is constant.

Once a new technology is found, the economy is hit by an endogenous technology shock. Its capital stock increases in a discrete way by the size of the new machine $\kappa_{q+1}$, as shown in (10), and consumption jumps according to the consumption jump condition (25). The capital stock $K$ unambiguously increases, consumption might rise or fall. After these discrete changes in aggregate capital and consumption, the economy starts accumulating capital again in a smooth way. It now accumulates capital of the new vintage. With this new vintage, the consumption level is on average $A$ times higher than one vintage before. This increase in labor productivity implies positive long-run growth. Moving up the equilibrium path towards the R&D line implies non-constant growth rates.

Short-run properties of this model are presented in the following figures. Aggregate consumption plotted in the upper figure rises over time until the economy hits the R&D line at $t_{R\&D}$. From then on, research is undertaken and consumption is constant. At some point in time $t^*$, research is successful. The length between $t_{R\&D}$ and $t^*$ is indeterminate, while the expected length is given by $\lambda^{-1}$. Consumption rises or falls after successful research. Inserting assumptions (31) into (25) yields

$$u'(C) = \frac{A\hat\kappa_0}{\phi}\,u'(\tilde C). \tag{35}$$

This implies

$$\tilde C \gtrless C \iff \frac{A\hat\kappa_0}{\phi} \gtrless 1. \tag{36}$$
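Condition (36), read together with the parameter restriction (32), can be illustrated with a short numerical check. This is a sketch only; the function name and the example parameter values are assumptions made for the illustration.

```python
def consumption_jump_direction(A, alpha, kappa0, phi):
    """Direction of the consumption jump after successful R&D, from (36).

    Raises ValueError when the parameter restriction (32) does not hold.
    """
    if A ** (1.0 / alpha) * kappa0 / phi <= 1.0:
        raise ValueError("parameter restriction (32) is violated")
    ratio = A * kappa0 / phi  # the factor multiplying u'(C_tilde) in (35)
    if ratio > 1.0:
        return "rises"
    if ratio < 1.0:
        return "falls"
    return "unchanged"
```

With `A=1.2` and `alpha=0.3`, a ratio $\hat\kappa_0/\phi = 0.7$ satisfies (32) but gives $A\hat\kappa_0/\phi = 0.84 < 1$, so consumption falls; a ratio of 2 gives $A\hat\kappa_0/\phi = 2.4 > 1$, so consumption rises. Both cases are compatible with (32) because $A^{1/\alpha} > A$ when $A > 1$ and $0 < \alpha < 1$.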
Assumption (32), which we made in deriving the R&D line, requires $A^{1/\alpha}\hat\kappa_0/\phi > 1$. Since $A$ is larger than unity by definition and $0 < \alpha < 1$, we have $A^{1/\alpha} > A$, so $A\hat\kappa_0/\phi$ can be larger or smaller than unity. The increase in consumption from a given point of one cycle to the same point of the next cycle is known. Denoting the consumption level on the R&D line by $C^*$, consumption on the R&D line in the next cycle is $A$ times higher at $AC^*$. This immediately follows from the transformation (26) and the fact that, in equilibrium, productivity-corrected consumption $\hat C$ is at the same level independently of the currently most advanced vintage $q$.

[Upper panel: consumption $C$ against time $t$, with the levels $C^*$ and $AC^*$ and the dates $t_{R\&D}$ and $t^*$ marked. Lower panel: investment $I$ and R&D expenditure $R$ against time $t$, with the same dates marked.]

Figure 2: Time series of consumption, investment and R&D expenditure.

Output and capital follow qualitatively identical paths to consumption. In contrast to consumption, output and capital definitely increase after successful R&D. Output also increases from one cycle to the next one by the factor $A$. The physical capital stock increases by $A^{1/\alpha}$, which also follows from (26) and vintage-independent $\hat K$. Measuring the capital stock in terms of the consumption good, however, shows that it increases by $A$ as well (cf. next section on long-run growth). The growth rate of output relative to capital is given by the standard expression $\dot Y/Y = \alpha \dot K/K$. The growth rate of output relative to consumption depends on whether consumption drops or rises after successful R&D. When it drops, consumption grows faster than output (as at the end of a cycle, both have increased by the same factor $A$). If consumption rises more than output, output grows faster than consumption.

Investment decreases over time, as does the interest rate, while resources $R$ are allocated to R&D only at the end of a cycle. This is shown in the lower part of the figure. Resources $R$ allocated to R&D in the stochastic regime are lower than resources used for investment $I$ an instant before R&D starts: Aggregate output $Y$ does not jump when the economy hits the R&D line, simply because the capital stock does not jump at this point. As consumption remains constant as well, the amount of resources for investment and R&D in the stochastic regime is just as high as an instant before the economy hits the R&D line. This follows from the resource constraint (3). As investment equals depreciation, not all resources that were used for investment go into R&D. Both quantities increase by the factor $A$ from one cycle to the other.

The prediction about the timing of R&D is extreme and will probably not hold empirically. The more general prediction of the model is that R&D investment is larger when returns to capital accumulation are low.12

Long-run growth

The model satisfies all of Kaldor's stylized facts (cf., e.g., Barro and Sala-i-Martin, 1995) which are relevant for the present model.

(i) Per capita output grows at a constant rate: Output at some fixed point of a cycle $q$ can be written with (26) as $Y = A^{q}\hat K^{\alpha}L^{1-\alpha}$, where the aggregate production function (7) and the transformation (26) were used. As the labor force $L$ is constant and productivity-adjusted capital $\hat K$ is the same at some fixed point (take, e.g., the capital stock on the R&D line) of any cycle, output per capita increases by $A$ from one cycle to the other.
(ii) Physical capital per worker grows over time: This directly follows from $K/L = A^{q/\alpha}\hat K/L$, which uses the argument just made.

(iii) The rate of return to capital is nearly constant: The interest rate was computed for the R&D line in (29). As it is a function of the productivity-adjusted capital stock $\hat K$ only, it does not display any long-run trend.

(iv) The ratio of physical capital to output is nearly constant: This stylized fact is the least obvious to see in the present model. Physical capital is measured as the value of all capital in an economy, deflated in an appropriate way. Here, the value of capital is given by its price $p_K$ per efficiency unit times the measure of the aggregate stock $K$. Using (8) and (13) yields $p_K K = B^{-q}p_I K$. Dividing by the value of output, $p_c Y$, yields

$$\frac{B^{-q}p_I K}{p_c Y} = \frac{B^{-q}p_I A^{q/\alpha}\hat K}{p_c A^{q}\hat K^{\alpha}L^{1-\alpha}} = \frac{\hat K}{\hat K^{\alpha}L^{1-\alpha}},$$

where we used $B^{-q}A^{q/\alpha} = A^{q}$ and $p_I = p_c$ from (11). Hence, capital per output is constant.

(v) The shares of labor and physical capital in national income are nearly constant: This directly follows from a Cobb-Douglas production function.

12 In the discrete-time version of the model mentioned in a footnote above, R&D takes place all of the time. The discrete-time version is not as tractable as the version presented here, however.

13.5 Conclusions
The economy we have analyzed is characterized by short-run fluctuations and long-run growth. Both short-run fluctuations and long-run growth are caused by endogenous technology shocks. Technology shocks are endogenous, i.e., the point in time when a shock occurs depends on decisions made by agents in this economy, as the economy offers two saving technologies. Households accumulate capital when returns to capital accumulation are sufficiently high. Capital accumulation implies decreasing capital returns and, at some point, households put their savings into R&D activities when capital returns are low. When R&D is successful, a new technology is available, i.e., a technology shock occurs, and returns to capital accumulation are high again. This result follows from allowing households to be risk-averse. While capital accumulation and uncertain R&D have been studied in the literature, these results were so far not available, as risk-averse households were not taken into consideration. The present paper has shown that including this feature considerably broadens the range of phenomena to which models of creative destruction and long-run growth can be applied.
Acknowledgements: I thank Bettina Büttner, seminar participants at CES Munich, Louvain-la-Neuve, the University of Amsterdam and the Federal Reserve Bank of Minnesota for useful discussions and comments and especially Pat Kehoe, Tim Kehoe and Paul Segerstrom for helpful suggestions and stimulating discussions. Olaf Posch and Benjamin Weigert provided excellent research assistance.
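Appendix A: A vintage capital structure

Vintage-specific technologies are given by

$$\begin{aligned} Y_0 &= K_0^{\alpha}\left(A^{0}L_0\right)^{1-\alpha}, \\ Y_1 &= K_1^{\alpha}\left(A^{1}L_1\right)^{1-\alpha}, \\ &\ \ \vdots \\ Y_q &= K_q^{\alpha}\left(A^{q}L_q\right)^{1-\alpha}. \end{aligned}$$

Labor mobility implies equality of wages for all vintages $j$, $w_j = w_0\ \forall j$. The wage rate of vintage $j$ is given by

$$w_j = p_c (1-\alpha)\left(\frac{K_j}{A^{j}L_j}\right)^{\alpha} A^{j}.$$

The wage rate of vintage 0 is

$$w_0 = p_c (1-\alpha)\left(\frac{K_0}{A^{0}L_0}\right)^{\alpha} A^{0}.$$

Equality of wages for vintages 0 and 1 implies labor allocation to vintage 1 relative to vintage 0 of

$$w_0 = w_1 \iff \left(\frac{K_0}{A^{0}L_0}\right)^{\alpha}A^{0} = \left(\frac{K_1}{A^{1}L_1}\right)^{\alpha}A^{1} \iff \frac{K_0}{A^{0}L_0} = A^{\frac{1}{\alpha}}\frac{K_1}{A^{1}L_1} \iff L_1 = A^{\frac{1}{\alpha}}\frac{A^{0}}{A^{1}}\frac{K_1}{K_0}L_0 = A^{\frac{1-\alpha}{\alpha}}\frac{K_1}{K_0}L_0. \tag{37}$$

Undertaking the same steps for vintage $j$ yields

$$w_0 = w_j \iff \left(\frac{K_0}{A^{0}L_0}\right)^{\alpha}A^{0} = \left(\frac{K_j}{A^{j}L_j}\right)^{\alpha}A^{j} \iff \frac{K_0}{A^{0}L_0} = A^{\frac{j}{\alpha}}\frac{K_j}{A^{j}L_j} \iff L_j = A^{\frac{j}{\alpha}}\frac{A^{0}}{A^{j}}\frac{K_j}{K_0}L_0 = A^{j\frac{1-\alpha}{\alpha}}\frac{K_j}{K_0}L_0. \tag{38}$$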
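Inserting into the labor market clearing condition $\sum_{j=0}^{q} L_j = L$ yields

$$L_0 + A^{\frac{1-\alpha}{\alpha}}\frac{K_1}{K_0}L_0 + \ldots + A^{q\frac{1-\alpha}{\alpha}}\frac{K_q}{K_0}L_0 = L \iff \frac{L_0}{K_0}\left(K_0 + A^{\frac{1-\alpha}{\alpha}}K_1 + \ldots + A^{q\frac{1-\alpha}{\alpha}}K_q\right) = L \iff L_0 = \frac{K_0}{K}L, \tag{39}$$

where

$$K = K_0 + A^{\frac{1-\alpha}{\alpha}}K_1 + \ldots + A^{q\frac{1-\alpha}{\alpha}}K_q \equiv K_0 + BK_1 + \ldots + B^{q}K_q.$$

Inserting (39) in (37) gives labor allocation to vintage 1,

$$L_1 = \frac{A^{\frac{1-\alpha}{\alpha}}K_1}{K}L,$$

and inserting (39) in (38) gives labor allocation to vintage $j$,

$$L_j = \frac{A^{j\frac{1-\alpha}{\alpha}}K_j}{K}L.$$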
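Now aggregate over outputs. Output of vintage 0 is

$$Y_0 = K_0^{\alpha}\left(A^{0}L_0\right)^{1-\alpha} = K_0^{\alpha}\left(\frac{K_0 L}{K}\right)^{1-\alpha} = K_0\left(\frac{L}{K}\right)^{1-\alpha},$$

where we used (39). Output of other vintages are

$$\begin{aligned} Y_1 &= K_1^{\alpha}\left(A^{1}L_1\right)^{1-\alpha} = K_1^{\alpha}\left(\frac{A^{\frac{1}{\alpha}}K_1 L}{K}\right)^{1-\alpha} = K_1\left(A^{\frac{1}{\alpha}}\frac{L}{K}\right)^{1-\alpha}, \\ &\ \ \vdots \\ Y_j &= K_j^{\alpha}\left(A^{j}L_j\right)^{1-\alpha} = K_j^{\alpha}\left(\frac{A^{\frac{j}{\alpha}}K_j L}{K}\right)^{1-\alpha} = K_j\left(A^{\frac{j}{\alpha}}\frac{L}{K}\right)^{1-\alpha}. \end{aligned}$$

Total output is then given by

$$Y = Y_0 + Y_1 + \ldots + Y_q = \left(K_0 + K_1 A^{\frac{1-\alpha}{\alpha}} + \ldots + K_q A^{q\frac{1-\alpha}{\alpha}}\right)\left(\frac{L}{K}\right)^{1-\alpha} = K^{\alpha}L^{1-\alpha}.$$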
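The aggregation result can be verified numerically. The sketch below allocates labor across vintages as in (37) to (39), computes vintage outputs, and compares their sum with $K^{\alpha}L^{1-\alpha}$. Function names and the example capital stocks are assumptions made for this illustration.

```python
def aggregate_capital(K_list, A, alpha):
    """Capital index K = K_0 + B*K_1 + ... + B**q * K_q, B = A**((1 - alpha)/alpha)."""
    B = A ** ((1.0 - alpha) / alpha)
    return sum(B ** j * K_j for j, K_j in enumerate(K_list))


def labor_allocation(K_list, L, A, alpha):
    """Labor per vintage, L_j = B**j * K_j * L / K, from (38) and (39)."""
    B = A ** ((1.0 - alpha) / alpha)
    K = aggregate_capital(K_list, A, alpha)
    return [B ** j * K_j * L / K for j, K_j in enumerate(K_list)]


def vintage_outputs(K_list, L, A, alpha):
    """Output per vintage, Y_j = K_j**alpha * (A**j * L_j)**(1 - alpha)."""
    labor = labor_allocation(K_list, L, A, alpha)
    return [
        K_j ** alpha * (A ** j * L_j) ** (1.0 - alpha)
        for j, (K_j, L_j) in enumerate(zip(K_list, labor))
    ]
```

For example, with `K_list = [4.0, 2.0, 1.0]`, `L = 10.0`, `A = 1.2` and `alpha = 0.3`, the labor allocations sum to `L` and `sum(vintage_outputs(...))` agrees with `aggregate_capital(...) ** alpha * L ** (1 - alpha)` up to floating-point error.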
Appendix B: The budget constraint

Real wealth $a$ of households is given by the sum of the number $k_j$ of units of capital of vintage $j$ held by the household times their real price $v_j/p_Y$,

$$a = \sum_{j=0}^{q+1} k_j \frac{v_j}{p_Y}. \tag{40}$$

For reasons that will become clear in a moment, the sum extends from 0 to $q+1$, though the most advanced vintage is vintage $q$ and households therefore cannot own any capital of vintage $q+1$, $k_{q+1} = 0$. Households trade only capital goods of the most recent vintage. The allocation of older capital goods is fixed (in equilibrium, households would be indifferent about trading old capital goods). Capital held by households therefore follows for old vintages $j$

$$dk_j = -\delta k_j\,dt, \qquad \forall j < q, \tag{41}$$

for the most recent one

$$dk_q = \left[\frac{\sum_{j=0}^{q+1} w_j^{K}k_j + w - i - p_c c}{v_q} - \delta k_q\right]dt, \tag{42}$$

and for the next vintage $q+1$

$$dk_{q+1} = \kappa_{q+1}\frac{i}{J}\,dq. \tag{43}$$
The capital stock $k_q$ in (42) of a household increases in a deterministic fashion when the difference between actual income and spending,

$$\sum_{j=0}^{q+1} w_j^{K}k_j + w - i - p_c c,$$

divided by the price $v_q$ of an installed or the price $p_I$ of a new unit of capital exceeds losses $\delta k_q$ of capital due to depreciation. Capital income $\sum_{j=0}^{q+1} w_j^{K}k_j$ of households is given by factor rewards $w_j^{K}$ for capital (value marginal productivities) times the amount of capital $k_j$, summed up over all vintages.

Equation (43) shows that in the case of a successful R&D project, i.e., when $dq = 1$, the household obtains the share $i/J$, i.e., depending on its investment $i$ relative to total investment $J$ into the successful project, of total payoffs $\kappa_{q+1}$. A successful research project therefore increases the capital stock of vintage $q+1$ held by the household from 0 to $\kappa_{q+1}\,i/J$. After that, equation (42) applies to vintage $q+1$.

The price of a vintage $j$ in terms of the numeraire good is given by (12) with (11). Hence, letting vintage prices evolve in all generality as

$$d\frac{v_j}{p_Y} = \alpha_j\frac{v_j}{p_Y}\,dt + \beta_s\frac{v_j}{p_Y}\,dq, \tag{44}$$

we know that the deterministic change of the real price $v_j/p_Y$ must be zero, $\alpha_j = 0\ \forall j = 0 \ldots q$. When research is successful, the price of a unit of a given vintage $j$ in terms of the numeraire good drops as then, by (12) and (11), $\tilde v_j/\tilde p_Y = B^{j-(q+1)}$.13 Hence, as $d(v_j/p_Y) = \tilde v_j/\tilde p_Y - v_j/p_Y$, we have $d(v_j/p_Y) = B^{j-(q+1)} - B^{j-q}$. As a consequence and with (44),

$$\beta_s = \frac{d(v_j/p_Y)}{v_j/p_Y} = \frac{B^{j-(q+1)} - B^{j-q}}{B^{j-q}} = \frac{1-B}{B} < 0,$$

which is identical for all vintages $j \le q$. Real vintage prices (44) therefore evolve according to

$$d\frac{v_j}{p_Y} = -\frac{B-1}{B}\frac{v_j}{p_Y}\,dq \qquad \forall j \le q. \tag{45}$$
This equation reflects the devaluation of old vintages relative to the numeraire good when a new vintage has been developed. This is the source of the creative destruction mechanism in the present model.

We can now derive the budget constraint by computing the differential

$$da = \sum_{j=0}^{q+1} d\left(k_j\frac{v_j}{p_Y}\right).$$

For all vintages $0 \le j < q$, we obtain with (41) and (45) and using Ito's Lemma

$$d\left(k_j\frac{v_j}{p_Y}\right) = -\frac{v_j}{p_Y}\delta k_j\,dt + \left[k_j\left(\frac{v_j}{p_Y} - \frac{B-1}{B}\frac{v_j}{p_Y}\right) - k_j\frac{v_j}{p_Y}\right]dq = -\delta\frac{v_j}{p_Y}k_j\,dt - \frac{B-1}{B}\frac{v_j}{p_Y}k_j\,dq \qquad \forall j = 0 \ldots q-1.$$

For the currently most advanced vintage $q$, we use (42) and (45) to obtain

$$\begin{aligned} d\left(k_q\frac{v_q}{p_Y}\right) &= \frac{v_q}{p_Y}\left[\frac{\sum_{j=0}^{q+1} w_j^{K}k_j + w - i - p_c c}{v_q} - \delta k_q\right]dt + \left[\left(\frac{v_q}{p_Y} - \frac{B-1}{B}\frac{v_q}{p_Y}\right)k_q - \frac{v_q}{p_Y}k_q\right]dq \\ &= \left[\sum_{j=0}^{q+1}\frac{w_j^{K}}{p_Y}k_j + \frac{w-i}{p_Y} - c - \delta\frac{v_q}{p_Y}k_q\right]dt - \frac{B-1}{B}\frac{v_q}{p_Y}k_q\,dq. \end{aligned}$$

13 A tilde (˜) denotes the value of a quantity immediately after successful research.
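For the next vintage $q+1$ to come, from (43) and with a real price $\tilde v_{q+1}/\tilde p_Y$ for the prototype after successful R&D, i.e., only when the good $\kappa_{q+1}$ exists,

$$d\left(k_{q+1}\frac{v_{q+1}}{p_Y}\right) = \frac{\tilde v_{q+1}}{\tilde p_Y}\kappa_{q+1}\frac{i}{J}\,dq = \kappa_{q+1}\frac{i}{J}\,dq. \tag{46}$$

The real price equals unity, $\tilde v_{q+1}/\tilde p_Y = 1$ from (12). Hence, $\kappa_{q+1}$ stands for the number of consumption goods that can be exchanged for the prototype. This is in accordance with the definition of real wealth in (40), which also is the number of consumption goods that can be exchanged for $a$. Summarizing, we obtain14

$$\begin{aligned} da &= \sum_{j=0}^{q+1} d\left(k_j\frac{v_j}{p_Y}\right) \\ &= \sum_{j=0}^{q-1}\left[-\delta\frac{v_j}{p_Y}k_j\,dt - \frac{B-1}{B}\frac{v_j}{p_Y}k_j\,dq\right] + \left[\sum_{j=0}^{q+1}\frac{w_j^{K}}{p_Y}k_j + \frac{w-i}{p_Y} - c - \delta\frac{v_q}{p_Y}k_q\right]dt - \frac{B-1}{B}\frac{v_q}{p_Y}k_q\,dq + \kappa_{q+1}\frac{i}{J}\,dq \\ &= \sum_{j=0}^{q}\left[-\delta\frac{v_j}{p_Y}k_j\,dt - \frac{B-1}{B}\frac{v_j}{p_Y}k_j\,dq\right] + \left[\sum_{j=0}^{q+1}\frac{w_j^{K}}{p_Y}k_j + \frac{w-i}{p_Y} - c\right]dt + \kappa_{q+1}\frac{i}{J}\,dq \\ &= \left[\frac{p_c}{p_Y}\frac{\partial Y}{\partial K}\sum_{j=0}^{q+1}B^{j}k_j - \delta a + \frac{w-i}{p_Y} - c\right]dt + \left[\kappa_{q+1}\frac{i}{J} - \frac{B-1}{B}a\right]dq, \end{aligned}$$

where the last equality used (9). As (12) tells us $p_I B^{j} = B^{q}v_j$ and $p_c = p_I$ by (11), we can replace $B^{j}$ by $B^{j} = B^{q}v_j/p_c$ and obtain

$$\begin{aligned} da &= \left[B^{q}\frac{\partial Y}{\partial K}\sum_{j=0}^{q+1}\frac{v_j}{p_Y}k_j - \delta a + \frac{w-i}{p_Y} - c\right]dt + \left[\kappa_{q+1}\frac{i}{J} - \frac{B-1}{B}a\right]dq \\ &= \left[ra + \frac{w-i}{p_Y} - c\right]dt + \left[\kappa_{q+1}\frac{i}{J} - sa\right]dq, \end{aligned}$$

where the interest rate $r$ and $s$ stand for

$$r = B^{q}\frac{\partial Y}{\partial K} - \delta, \qquad s = \frac{B-1}{B}.$$

14 Here we need assets $a$ to equal the sum over all vintages including the not-yet-existing one $q+1$, as we need to include the development of $\kappa_{q+1}$ in (46).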
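The two components of the budget constraint derived above, the deterministic drift and the jump at the moment of a successful innovation, can be illustrated as follows. This is a sketch under the notation of this appendix; the function names and the arguments `w_real` (for $w/p_Y$), `i_real` (for $i/p_Y$) and `share` (for $i/J$) are hypothetical.

```python
def devaluation_share(B):
    """Share s = (B - 1)/B of wealth lost when a new vintage arrives."""
    return (B - 1.0) / B


def wealth_drift(a, r, w_real, i_real, c):
    """Deterministic part of the budget constraint: r*a + (w - i)/p_Y - c."""
    return r * a + w_real - i_real - c


def wealth_after_success(a, B, kappa_next, share):
    """Wealth right after successful R&D: (1 - s)*a + kappa_{q+1} * i/J."""
    s = devaluation_share(B)
    return (1.0 - s) * a + kappa_next * share
```

For instance, with `B = 1.25` the devaluation share is 0.2, so a household with wealth 10 that financed 10 percent of a project yielding a prototype of size 5 holds $0.8 \cdot 10 + 5 \cdot 0.1 = 8.5$ after the jump. When all net savings go to R&D, `i_real` equals `r * a + w_real - c` and the drift is zero, which is the constancy of wealth on the R&D line noted in the main text.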
Appendix C: The Bellman equation, the investment rule and the consumption jump condition

The Bellman equation is (cf., e.g., Dixit and Pindyck, 1994)

$$\rho V(a,q) = \max_{c,\,i}\left\{u(c) + V_a(a,q)\left[ra + \frac{w-i}{p_Y} - c\right] + \lambda\left[V(\tilde a, q+1) - V(a,q)\right]\right\} \tag{47}$$

with

$$\tilde a = (1-s)\,a + \kappa_{q+1}\frac{i}{J}. \tag{48}$$

The first order condition for consumption is

$$u'(c) = V_a(a,q). \tag{49}$$

The derivative with respect to real investment $i/p_Y$ in R&D is

$$\frac{d\{\cdot\}}{d(i/p_Y)} = -V_a(a,q) + \lambda V_{\tilde a}(\tilde a, q+1)\,\kappa_{q+1}\frac{1}{J/p_Y} = -V_a(a,q) + V_{\tilde a}(\tilde a, q+1)\,\kappa_{q+1}\frac{1}{D(q)}. \tag{50}$$
As R&D is undertaken under perfect competition, total investment $J$ into R&D equals total production costs $p_R R$ of R&D firms. Using (4), we obtain $J = p_R R = p_R\lambda D(q)$, which has been used for the last equality in (50). The derivative (50) is in perfect analogy to expression (12) in Wälde (1999b) (with $\kappa_{q+1} \equiv$ and $D(q) \equiv p_I/b$ though). Hence, the investment rule can be taken from there, taking the slightly different budget constraint (16) into account.

The consumption jump condition is in analogy to Wälde (1999b) as well: As on the R&D line, households are indifferent between financing R&D and accumulating capital, the derivative (50) with respect to investment $i$ in R&D is zero. Inserting the first order condition for consumption (49) into (50) yields

$$u'(c) = \frac{\kappa_{q+1}}{D(q)}\,u'(\tilde c). \tag{51}$$

Replacing individual consumption by aggregate consumption (apply the inverse function of $u'(\cdot)$ before) yields the aggregate consumption jump condition (25).

Appendix D: The Keynes-Ramsey rule

The marginal value of a unit of wealth $V_a(a,q)$ is a function of both assets $a$ and of the technological level $q$. Applying the appropriate version of Ito's Lemma (Wälde, 1999b, appendix 1), the differential of the marginal value reads

$$dV_a(a,q) = V_{aa}(a,q)\left[ra + \frac{w-i}{p_Y} - c\right]dt + \left[V_{\tilde a}(\tilde a, q+1) - V_a(a,q)\right]dq, \tag{52}$$

where $\tilde a$ is as in (48). It is important to note that Ito's Lemma is applied to the partial derivative of the function $V(\cdot)$ with respect to the first argument. This means that the jump term $V_{\tilde a}(\tilde a, q+1) - V_a(a,q)$ is a difference between partial derivatives with respect to first arguments and not a difference between partial derivatives with respect to $a$.
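The partial derivative of the maximized Bellman equation using the envelope theorem, i.e., assuming interior solutions such that derivatives with respect to control variables are zero, reads (this is derived with more intermediate steps in Wälde, 1999b, app. 3)

$$\begin{aligned} \rho V_a(a,q) &= V_{aa}(a,q)\left[ra + \frac{w-i}{p_Y} - c\right] + V_a(a,q)\,r + \lambda\left[V_a(\tilde a, q+1) - V_a(a,q)\right] \\ &= V_{aa}(a,q)\left[ra + \frac{w-i}{p_Y} - c\right] + V_a(a,q)\,r + \lambda\left[(1-s)V_{\tilde a}(\tilde a, q+1) - V_a(a,q)\right], \end{aligned}$$

where the last equality used (48). With $V_a \equiv V_a(a,q)$, $V_{aa} \equiv V_{aa}(a,q)$ and $V_{\tilde a} \equiv V_{\tilde a}(\tilde a, q+1)$ and rearranging this reads

$$[\rho - r + \lambda]\,V_a - \lambda(1-s)V_{\tilde a} = V_{aa}\left[ra + \frac{w-i}{p_Y} - c\right].$$

Replacing

$$V_{aa}\left[ra + \frac{w-i}{p_Y} - c\right]$$

in (52) by this expression gives

$$dV_a = \left[(\rho - r + \lambda)V_a - (1-s)\lambda V_{\tilde a}\right]dt + \left[V_{\tilde a} - V_a\right]dq \iff \frac{dV_a}{V_a} = \left\{\rho - r + \lambda\left[1 - (1-s)\frac{V_{\tilde a}}{V_a}\right]\right\}dt + \left\{\frac{V_{\tilde a}}{V_a} - 1\right\}dq. \tag{53}$$

Using the first order condition for consumption (49), we can express the differential of $V_a$ in (53) as

$$dV_a = du'(c). \tag{54}$$

Dividing (54) by (49) yields

$$\frac{dV_a}{V_a} = \frac{du'(c)}{u'(c)}.$$

The Keynes-Ramsey rule therefore reads with (53)

$$\begin{aligned} -\frac{du'(c)}{u'(c)} &= \left\{r - \rho - \lambda\left[1 - (1-s)\frac{V_{\tilde a}}{V_a}\right]\right\}dt - \left\{\frac{V_{\tilde a}}{V_a} - 1\right\}dq \\ &= \left[r - \rho - \lambda\left[1 - [1-s]\,\Omega\right]\right]dt + [1 - \Omega]\,dq. \end{aligned} \tag{55}$$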
Appendix E: Deriving the R&D line

We start from the investment rule (19) and first derive an expression for $(1-s)\,\Omega$. From the definition of $\Omega$ in (20) and the consumption jump condition (25),

$$\Omega = \frac{u'(\tilde c)}{u'(c)} = \frac{D(q)}{\kappa_{q+1}}. \tag{56}$$

Hence, from (17),

$$(1-s)\,\Omega = \frac{D(q)}{B\kappa_{q+1}}.$$

Then, we use the resource constraint (3) to express the arrival rate (4) on the R&D line where $I = B^{-q}\delta K$ as

$$\lambda = \frac{Y - B^{-q}\delta K - C}{D(q)}.$$

The fact that investment in capital just balances depreciation follows from looking at the budget constraint of households and the interest rate, as discussed after presenting the investment rule (19).

Finally, using these two equations plus the expression for the interest rate (18) we obtain a rewritten expression for the investment rule (19) that is a function of the aggregate capital stock and aggregate consumption only,

$$B^{q}\frac{\partial Y}{\partial K} - \delta - \rho > \frac{Y - B^{-q}\delta K - C}{D(q)}\left[1 - \frac{D(q)}{B\kappa_{q+1}}\right]. \tag{57}$$

This is (24) in the main text, where the inequality sign was replaced by the equality sign. The inequality sign is important to check whether investment in capital accumulation takes place above or below the R&D line. As this expression shows, capital is accumulated when consumption $C$ is sufficiently high. The phase-diagram analysis will show that this implies that capital accumulation takes place above the R&D line.

Appendix F: Transformation in section 13.4.1

This section derives the phase diagram in the transformed variables $\hat K$ and $\hat C$, where $K = \hat K A^{q/\alpha}$, $C = \hat C A^{q}$ as in (26). Starting from (23), the transformed capital stock $\hat K$ follows (remember the definition $B = A^{\frac{1-\alpha}{\alpha}}$ in (8))

$$\frac{d}{dt}\left(A^{q/\alpha}\hat K\right) = B^{q}\left(A^{q}\hat Y - A^{q}\hat C\right) - \delta \hat K A^{q/\alpha} \iff \frac{d}{dt}\hat K = \hat Y - \hat C - \delta \hat K,$$

where

$$\hat Y = \hat K^{\alpha}L^{1-\alpha}. \tag{58}$$
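With (22), consumption follows

$$-\frac{u''(\hat C A^{q})}{u'(\hat C A^{q})}\frac{d}{dt}\left(A^{q}\hat C\right) = B^{q}\frac{\partial\left(A^{q}\hat Y\right)}{\partial\left(\hat K A^{q/\alpha}\right)} - \delta - \rho \iff -\frac{u''(\hat C A^{q})}{u'(\hat C A^{q})}\frac{d}{dt}\left(A^{q}\hat C\right) = \frac{\partial \hat Y}{\partial \hat K} - \delta - \rho,$$

where we used the definition of the interest rate in (18). For our CES utility function, the LHS simplifies and one gets

$$\frac{d\hat C/dt}{\hat C} = \frac{\partial \hat Y/\partial \hat K - \delta - \rho}{\sigma}.$$

Let us now derive the transformed R&D line. Replace actual consumption and capital levels by productivity-adjusted levels as in (26),

$$\begin{aligned} & B^{q}\alpha\left(\frac{L}{\hat K A^{q/\alpha}}\right)^{1-\alpha} - \delta - \rho > \frac{A^{q}\hat Y - B^{-q}\delta A^{q/\alpha}\hat K - A^{q}\hat C}{D(q)}\left[1 - \frac{D(q)}{B\kappa_{q+1}}\right] \\ \iff\ & \alpha\left(\frac{L}{\hat K}\right)^{1-\alpha} - \delta - \rho > \frac{\hat Y - \delta\hat K - \hat C}{A^{-q}D(q)}\left[1 - \frac{D(q)}{B\kappa_{q+1}}\right], \end{aligned} \tag{59}$$

where we used $B = A^{\frac{1-\alpha}{\alpha}}$ from (8). This expression shows us that the R&D line is vintage independent (and therefore does not move in the phase diagram) if

$$D(q) = \phi A^{q} \quad \text{and} \quad \kappa_{q+1} = A^{q+1}\hat\kappa_0,$$

where $\hat\kappa_0$ is the productivity-adjusted size of the prototype 0. Assuming

$$1 - \frac{D(q)}{B\kappa_{q+1}} = 1 - \frac{\phi A^{q}}{BA^{q+1}\hat\kappa_0} = 1 - \frac{\phi}{A^{1/\alpha}\hat\kappa_0} > 0, \tag{60}$$

which is (32) in the main text, inserting these assumptions (which have an intuitive economic meaning given in the main text), defining

$$\hat r = \alpha\left(L/\hat K\right)^{1-\alpha} - \delta$$

as in (29) and solving for consumption yields

$$\phi\,\frac{\hat r - \rho}{1 - \dfrac{\phi}{A^{1/\alpha}\hat\kappa_0}} > \hat Y - \delta\hat K - \hat C \iff \hat C > \hat Y - \delta\hat K - \phi\,\frac{\hat r - \rho}{1 - \dfrac{\phi}{A^{1/\alpha}\hat\kappa_0}}.$$

The capital stock increases due to successful research according to (10) by

$$\tilde K - K = B^{q+1}\kappa_{q+1}.$$

Productivity-adjusted capital changes following (26) and (31) are then given by

$$A^{\frac{q+1}{\alpha}}\hat{\tilde K} - A^{\frac{q}{\alpha}}\hat K = B^{q+1}A^{q+1}\hat\kappa_0 = A^{\frac{q+1}{\alpha}}\hat\kappa_0 \iff \hat{\tilde K} = \hat K/A^{1/\alpha} + \hat\kappa_0.$$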
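Several steps in Appendices E and F rely on the same exponent arithmetic implied by the definition of $B$ in (8). As a reading aid, the identities used can be collected in one place; they follow directly from $B = A^{(1-\alpha)/\alpha}$ and introduce no additional assumptions.

```latex
\begin{align*}
B      &= A^{\frac{1-\alpha}{\alpha}}, \\
BA     &= A^{\frac{1-\alpha}{\alpha}+1} = A^{1/\alpha}, \\
B^{q}A^{q} &= A^{q/\alpha}
  \quad\Longrightarrow\quad B^{-q}A^{q/\alpha} = A^{q}, \\
B^{q+1}A^{q+1} &= A^{(q+1)/\alpha}.
\end{align*}
```

The second line gives the denominator $A^{1/\alpha}\hat\kappa_0$ in (60), the third line is the step used in (59) and in the capital-output ratio of section 13.4.2, and the fourth line is the step used for the capital jump at the end of this appendix.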
References:

Aghion, P., and Howitt, P. (1992) “A Model of Growth through Creative Destruction.” Econometrica 60: 323–351.
Aghion, P., and Howitt, P. (1998) Endogenous Growth Theory. Cambridge, MA, MIT Press.
Backus, D. K., Kehoe, P. J., and Kehoe, T. J. (1992) “In Search of Scale Effects in Trade and Growth.” Journal of Economic Theory 58: 377–409.
Barro, R. J., and Sala-i-Martin, X. (1995) Economic Growth. New York, McGraw-Hill.
Bental, B., and Peled, D. (1996) “The Accumulation of Wealth and the Cyclical Generation of New Technologies: A Search Theoretic Approach.” International Economic Review 37: 687–718.
Cripps, M. W., Keller, G., and Rady, S. (2002) Strategic Experimentation: The Case of Poisson Bandits. CESifo Working Paper No. 737.
Dixit, A. K., and Olson, M. (2000) “Does Voluntary Participation Undermine the Coase Theorem?” Journal of Public Economics 76: 309–335.
Dixit, A. K., and Pindyck, R. S. (1994) Investment Under Uncertainty. Princeton University Press.
Howitt, P. (1999) “Steady Endogenous Growth with Population and R&D Inputs Growing.” Journal of Political Economy 107: 715–730.
Howitt, P., and Aghion, P. (1998) “Capital Accumulation and Innovation as Complementary Factors in Long-Run Growth.” Journal of Economic Growth 3: 111–130.
Jones, C. I. (1995) “R&D-Based Models of Economic Growth.” Journal of Political Economy 103: 759–84.
Matsuyama, K. (1999) “Growing through Cycles.” Econometrica 67: 335–347.
Segerstrom, P. S. (1998) “Endogenous Growth without Scale Effects.” American Economic Review 88: 1290–1310.
Segerstrom, P. S. (2001) Intel Economics. Stockholm School of Economics, mimeo. August 2001.
Wälde, K. (1999a) “A Model of Creative Destruction with Undiversifiable Risk and Optimising Households.” Economic Journal 109: C156–C171.
Wälde, K. (1999b) “Optimal Saving under Poisson Uncertainty.” Journal of Economic Theory 87: 194–217.
Wälde, K. (2002) “The Economic Determinants of Technology Shocks in a Real Business Cycle Model.” Journal of Economic Dynamics and Control 27: 1–28.
Young, A. (1998) “Growth without Scale Effects.” Journal of Political Economy 106: 41–63.
Chapter 14 Employment Cycles in a Growth Model with Creative Destruction
Tapio Palokangas University of Helsinki and HECER
14.1 Introduction
The purpose of this chapter is to construct a model that explains economic growth with fluctuations in output and employment. This study is closely related to theories of endogenous growth and real business cycles (RBC). Aghion and Howitt (1992) show that the introduction of jump processes into general equilibrium models leads to endogenous business cycles. In their original model, however, there is a perfect labor market and no real capital, and the households are risk neutral. Aghion and Howitt (1998) incorporate capital accumulation, and Wälde (1999) risk-averse households, into the same model. Despite these generalizations, it is still typical of these models that the economy generates output and employment cycles only outside the balanced-growth path. In this chapter, we construct a model which generates such cycles on the balanced-growth path, but in which there are constant equilibrium levels for the labor-capital ratio and the productivity-adjusted wages. In the long run, the economy is expected to follow a balanced-growth path that satisfies Kaldor’s stylized facts as follows:¹

1. Output per physical labor in production grows over time.
2. Capital per physical labor grows over time.
3. The rate of return to capital is constant.
4. The proportion of output to capital is constant.
5. The share of labor in national income is nearly constant.

¹ Cf. Barro and Sala-i-Martin (1995), p. 5. The original reference is Kaldor (1963).

Around this balanced-growth path, the economy generates cycles. There is ample empirical evidence that technology shocks are contractionary on impact. Gali (1999) and Basu et al. (2004) document, for the U.S. and other G7 economies, a negative correlation between technology shocks, identified under different assumptions, and several measures of labor and other inputs. Marchetti (2005) confirms the same result using panel data on Italian manufacturing firms. These authors, however, interpret the finding as evidence in favour of sticky nominal prices, as follows. In the wake of a technology expansion, nominal rigidities prevent prices from falling and thus aggregate demand does not increase. Therefore, firms produce the same output with a smaller volume of inputs, which have become more productive. In this chapter, an alternative explanation is constructed on the basis of sticky real wages.

Introducing endogenous shocks into an RBC model, Wälde (2002) showed that the ‘laissez faire’ economy and the social planner generate different outcomes. In his model, however, the economy is characterized by ‘bang-bang’ development: because R&D is subject to constant returns to scale and the same good is used in both R&D and capital accumulation, the firms either do R&D or invest in real capital, but not both at the same time. In this chapter, it is assumed that, because the firms also learn from each other, technological change in a single firm is a function of the R&D inputs of all firms in the economy. This means that firms invest in R&D and real capital simultaneously and the economy remains in a stationary state, although there are endogenous technological shocks.
This study constructs a model where the economy adjusts to productivity shocks through employment, while wages evolve in proportion to the productivity of labor. The major causes of non-competitive real wages in macroeconomic models are efficiency wages and union-employer bargaining. It is assumed that workers in R&D earn efficiency wages, i.e., their productivity depends on their expected relative wage, while workers in production belong to a labor union which sets wages for its members. In the production of goods, the marginal product of labor is falling for a given capital stock. Hence, there are profits to be bargained over. In the R&D sector, the marginal product of labor is constant and there are no profits, but a worker’s efficiency depends on his expected relative wage. The interplay of union and efficiency wages generates involuntary unemployment, but relative wages are stable over a cycle.

The remainder of this chapter is organized as follows. Technological change is specified in section 14.2, and R&D and capital accumulation in section 14.3. Sections 14.4 and 14.5 introduce capitalists and wage settlement into the model. Growth and cycles are examined in sections 14.6 and 14.7.

14.2 Technology
There is a fixed number $m$ of workers who supply labor and consume all their income, and a fixed number $n$ of capitalists who invest in capital and R&D projects.² Each capitalist owns and fully controls one firm. All firms produce the same consumption good, which is chosen as the numeraire. In addition, each firm produces a capital good which is specific to the firm itself.³ The productivity of labor is unity in R&D and $a$ in production. Learning by investment in capital by any firm contributes to a stock of knowledge which is common to all firms. This spillover of knowledge increases the productivity of labor in production, $a$.

² For simplicity, I keep workers and capitalists separate. If workers possessed any capital and were therefore also owners of firms, then it would be very difficult to model wage bargaining in section 14.5.

³ If capital were freely tradable and not firm-specific, then the capitalist’s budget constraint (2) should be modeled as $C_j + W_j L_j + vZ_j + I_j = A^{\gamma_j} Y_j$. In that case, the capitalist’s propensity to consume, (17), would not be constant and there would be no solution for the dynamic program that characterizes the capitalist’s behavior in section 14.4.

Each capitalist $j$ accumulates firm-specific capital $K_j$, produces goods $Y_j$ from labor $L_j$ and firm-specific capital $K_j$, and does R&D by labor $Z_j$. Capitalist $j$ produces the consumption good and the firm-specific capital good by otherwise the same technology, but total factor productivity in the production of the consumption good is subject to technological change. Thus, capitalist $j$ accumulates firm-specific capital $K_j$ by the amount $I_j$ and converts the rest $Y_j - I_j$ of output $Y_j$ into a consumption good in proportion $A^{\gamma_j}$, where $A > 1$ is a constant and $\gamma_j$ is the serial number of technology.⁴

⁴ If the production of capital goods were subject to technological change as well, then the rate of return to capital would increase with time and Kaldor’s third stylized fact of growth [cf. section 14.1] would not hold.

Capitalist $j$ produces its output $Y_j$ from labor $L_j$ and capital $K_j$ through a twice-differentiable production function with constant returns to scale:

$$Y_j = F(aL_j, K_j) = f(l_j)K_j, \quad l_j \doteq aL_j/K_j, \quad f' > 0, \quad f'' < 0. \tag{1}$$

Capitalist $j$ pays the wage $W_j$ per worker in production and all capitalists pay the same wage $v$ per worker in R&D. Capitalist $j$’s budget constraint is given by

$$C_j + W_j L_j + vZ_j = A^{\gamma_j}(Y_j - I_j), \tag{2}$$

where $C_j$ is consumption, $Y_j$ output, $W_j L_j$ and $vZ_j$ wages in production and R&D, and $A^{\gamma_j}(Y_j - I_j)$ the supply of the consumption good. It is assumed that the capitalists have the following rational expectations on the working of the labor market institutions in the economy:

Assumption 1. Each capitalist $j$ expects that the wage $W_j$ for its workers in production will increase in proportion to the total productivity of these workers, $aA^{\gamma_j}$.

This implies that

$$W_j = w_j aA^{\gamma_j}, \tag{3}$$

where the productivity-adjusted wage $w_j$ is exogenous for capitalist $j$. Each capitalist $j$ can increase the probability of technological change for itself by investing in R&D. In the event of technological change, the conversion ratio between consumption and investment for the capitalist increases from $A^{\gamma_j}$ to $A^{\gamma_j+1}$. This will generate cycles.

The improvement of technology for capitalist $j$ depends on the capitalist’s own R&D, $Z_j$. In a small period of time $dt$, the probability that R&D leads to the development of a new technology is given by $(\lambda \log Z_j)dt$, while the probability that R&D remains without success is given by $1 - (\lambda \log Z_j)dt$, where $\lambda$ is research workers’ productivity:⁵

$$dq_j = \begin{cases} 1 & \text{with probability } (\lambda \log Z_j)\,dt, \\ 0 & \text{with probability } 1 - (\lambda \log Z_j)\,dt, \end{cases} \tag{4}$$

where $q_j$ is the Poisson process resulting from capitalist $j$’s investment in R&D and $dq_j$ is the increment of this process.

⁵ The logarithmic specification is chosen for analytical convenience. With some complication, the results could be generalized to the case in which the probability $(\lambda \log Z_j)dt$ is replaced by $(\lambda/\nu) \log Z_j^{\nu}\,dt$, and $\nu \in (0, \infty)$ is a constant.
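The arrival rule (4) can be illustrated with a short simulation. The following is a minimal sketch rather than anything taken from the chapter: the function names `arrival_rate` and `simulate_jumps`, the parameter values and the discrete time grid are all illustrative assumptions. It draws the increment $dq_j$ in each small interval $dt$ with success probability $(\lambda \log Z_j)dt$, which presupposes $Z_j \geq 1$ so that the probability is non-negative.

```python
import math
import random


def arrival_rate(lam, Z):
    """Poisson arrival rate lambda * log(Z) from equation (4)."""
    if Z < 1:
        raise ValueError("Z must be at least 1 so that the rate is non-negative")
    return lam * math.log(Z)


def simulate_jumps(lam, Z, horizon, dt, seed=0):
    """Count technology jumps over [0, horizon] on a grid of step dt.

    In each step dq = 1 with probability (lam * log Z) * dt, else dq = 0.
    """
    rng = random.Random(seed)
    p = arrival_rate(lam, Z) * dt  # success probability per step
    steps = int(round(horizon / dt))
    return sum(1 for _ in range(steps) if rng.random() < p)


if __name__ == "__main__":
    # Hypothetical values: lam = 0.5 and Z = e give a rate of 0.5 per unit of time.
    jumps = simulate_jumps(lam=0.5, Z=math.e, horizon=2000.0, dt=0.01)
    print(jumps, "jumps; the expected number is", 0.5 * 2000.0)
```

With these assumed values the simulated number of jumps should lie close to the expected number, and setting $Z_j = 1$ switches innovation off altogether because $\log 1 = 0$.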
14.3 R&D and capital accumulation
Assume, for simplicity, that capital is a stock of goods that does not depreciate. Investment per unit of time $dt$ then equals deterministic capital accumulation, $I_j\,dt = dK_j^d$. Solving for $I_j$ from (2) and noting (1) and (3) yields

$$dK_j^d = I_j\,dt = \big\{[f(l_j) - w_j l_j]K_j - A^{-\gamma_j}(C_j + vZ_j)\big\}\,dt. \tag{5}$$

R&D is directed at developing new production units. Assume that after a successful development of new technology, a constant share $\phi$ of the previous vintage can be upgraded, which therefore has the higher productivity.⁶ The remaining share $1 - \phi$ of capital stock becomes obsolete. After successfully completing an R&D project, capital stock is then given by

$$\tilde K_j = \phi K_j, \quad 0 < \phi < 1. \tag{6}$$

Given this definition, the entire capital stock belongs to the same vintage. Noting (5), capital accumulation for capitalist $j$ is given by

$$\begin{aligned} dK_j &= dK_j^d + (\tilde K_j - K_j)\,dq_j = I_j\,dt + (\tilde K_j - K_j)\,dq_j \\ &= \big\{[f(l_j) - w_j l_j]K_j - A^{-\gamma_j}(C_j + vZ_j)\big\}\,dt + (\tilde K_j - K_j)\,dq_j. \end{aligned} \tag{7}$$

This is a stochastic differential equation where uncertainty results from a Poisson process $q_j$. During a small period of time $dt$, the capital stock of vintage $\gamma_j$ increases deterministically by investment in capital accumulation. With a successful R&D project, $dq_j = 1$, capital stock jumps by $\tilde K_j - K_j$ and the level of productivity in the consumption-goods sector rises by the factor $A$. When no investment in R&D takes place or when R&D fails, the increment $dq_j$ is zero, the level of productivity does not change and there is no jump in capital stock $K_j$.

⁶ This idea is from Wälde (2002).

Lucas (1988) shows that endogenous growth models produce constant steady-state growth only under “knife-edge” parameter assumptions. The “knife-edge” assumption of this model is the following:

Assumption 2. There is no trend in unemployment.

The economy would converge to full employment if aggregate capital stock grew faster than the productivity of labor, and unemployment would increase indefinitely if aggregate capital stock grew slower than the productivity of labor [cf. section 14.5]. Assumption 2 means that capital stock and the productivity of labor must grow at the same rate in the long run.
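The mechanics of (7) can be made concrete with a single discrete-time update. The sketch below is only an illustration under stated assumptions: the helper name `step_capital`, the Euler discretization and the numbers are hypothetical, and the investment flow $I_j$ is simply passed in instead of being derived from (5).

```python
def step_capital(K, gamma, investment, dt, jumped, phi):
    """One Euler step of equation (7): dK = I dt + (phi*K - K) dq.

    Returns the updated capital stock and technology index.
    """
    dq = 1 if jumped else 0
    K_new = K + investment * dt + (phi * K - K) * dq
    return K_new, gamma + dq


if __name__ == "__main__":
    K, gamma = 100.0, 0
    # No innovation: capital grows by investment only.
    K, gamma = step_capital(K, gamma, investment=5.0, dt=0.1, jumped=False, phi=0.8)
    print(K, gamma)
    # Innovation: a share 1 - phi of the pre-jump capital becomes obsolete
    # and the technology index rises by one.
    K, gamma = step_capital(K, gamma, investment=5.0, dt=0.1, jumped=True, phi=0.8)
    print(K, gamma)
```

The update keeps the two sources of change in (7) separate: the deterministic drift $I_j\,dt$ and the jump $(\tilde K_j - K_j)\,dq_j = (\phi - 1)K_j\,dq_j$.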
14.4 Capitalists
Capitalist $j$ maximizes its expected utility over time by choosing its streams of consumption $C_j$, the labor-capital ratio in production, $l_j$, and investment in R&D, $Z_j$, subject to the production function (1), capital accumulation (7) and the stochastic process (4), given the wage $v$ and the productivity-adjusted wage $w_j$. Let $\rho > 0$ be the constant rate of time preference and $1/(1 - \sigma)$ the constant rate of risk aversion. The value of the optimal program starting at time $t$ is then

$$\Gamma(K_j, Z, W_j, v, \gamma_j) = \max_{C_j,\, Z_j,\, l_j} E \int_t^{\infty} e^{-\rho(\tau - t)} C_j^{\sigma}\, d\tau \quad \text{s.t. (1) and (7)}, \tag{8}$$

where $E$ is the expectations operator. Because the capitalist is a risk averter, $0 < \sigma < 1$ holds. Let $\tilde K_j$ and $\tilde\Gamma \doteq \Gamma(\tilde K_j, Z, \tilde W_j, v, \gamma_j + 1)$ be the values of $K_j$ and $\Gamma$ after successfully completing an R&D project. Denoting $\Gamma_K \doteq \partial\Gamma/\partial K_j$ and noting (4) and (7), the Bellman equation of the optimal program of capitalist $j$ is as follows:⁷

$$\rho\,\Gamma(K_j, Z, W_j, v, \gamma_j) = \max_{C_j,\, Z_j,\, l_j} \Phi(C_j, Z_j, l_j, K_j, Z, W_j, v, \gamma_j), \tag{9}$$

where

$$\begin{aligned} \Phi(C_j, Z_j, l_j, K_j, Z, W_j, v, \gamma_j) &\doteq C_j^{\sigma} + (\lambda \log Z_j)[\tilde\Gamma - \Gamma] + \Gamma_K I_j \\ &= C_j^{\sigma} + (\lambda \log Z_j)\big[\Gamma(\tilde K_j, Z, \tilde W_j, v, \gamma_j + 1) - \Gamma(K_j, Z, W_j, v, \gamma_j)\big] \\ &\quad + \big\{[f(l_j) - w_j l_j]K_j - A^{-\gamma_j}(C_j + vZ_j)\big\}\,\Gamma_K(K_j, Z, W_j, v, \gamma_j). \end{aligned} \tag{10}$$

⁷ Cf. Dixit and Pindyck (1994).

The first-order conditions associated with the optimal program of capitalist $j$ are the following. First, maximizing (10) by the labor-capital ratio $l_j$ yields

$$w_j = f'(l_j), \quad \Pi_j \doteq \pi(l_j)K_j A^{\gamma_j}, \quad \pi(l_j) \doteq f(l_j) - f'(l_j)l_j, \tag{11}$$

where $\Pi_j$ is capitalist $j$’s income (= profits). Second, maximizing (10) by consumption $C_j$ yields

$$\sigma C_j^{\sigma - 1} = \Gamma_K A^{-\gamma_j}. \tag{12}$$

Finally, maximizing (10) by R&D $Z_j$ yields

$$\partial\Phi/\partial Z_j = (\tilde\Gamma - \Gamma)\lambda/Z_j - v\,\Gamma_K A^{-\gamma_j} = 0. \tag{13}$$
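As a numerical illustration of condition (11), the sketch below assumes a particular production function, $f(l) = l/(1 + l)$, which is not taken from the chapter but satisfies $f' > 0$ and $f'' < 0$. For a given productivity-adjusted wage $w_j$, it compares the labor-capital ratio implied by $w_j = f'(l_j)$ with a brute-force maximization of $f(l_j) - w_j l_j$ on a grid. All function names and parameter values are hypothetical.

```python
def f(l):
    """Assumed intensive-form production function (illustrative only)."""
    return l / (1.0 + l)


def f_prime(l):
    """Derivative of the assumed production function."""
    return 1.0 / (1.0 + l) ** 2


def profit_rate(l):
    """pi(l) = f(l) - f'(l) * l, as defined in (11)."""
    return f(l) - f_prime(l) * l


def labor_capital_ratio(w):
    """Solve w = f'(l) for l under the assumed f; needs 0 < w < 1."""
    return w ** -0.5 - 1.0


def grid_argmax(w, upper=10.0, n=100001):
    """Maximize f(l) - w * l over an evenly spaced grid on [0, upper]."""
    best_l, best_val = 0.0, f(0.0)
    for i in range(1, n):
        l = upper * i / (n - 1)
        val = f(l) - w * l
        if val > best_val:
            best_l, best_val = l, val
    return best_l


if __name__ == "__main__":
    w = 0.25
    l_star = labor_capital_ratio(w)
    print(l_star, grid_argmax(w), profit_rate(l_star))
```

Under the assumed functional form, the closed-form solution of $w_j = f'(l_j)$ and the grid maximizer should agree up to the grid spacing, which is the content of the first condition in (11).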
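Because the productivity-adjusted wage $w_j$ is exogenous for capitalist $j$, given (11), its optimal labor-capital ratio $l_j = (f')^{-1}(w_j)$ can be considered as constant in optimization. To solve the dynamic program, assume first that the capitalist’s consumption expenditure $C_j$ is a fixed share $c_j \in (0, 1)$ of its income $\Pi_j$, and second that the value function $\Gamma$ is in fixed proportion $(c_j r_j)^{-1}$ to the instantaneous utility $C_j^{\sigma}$, where $c_j$ and $r_j$ are constants. From these, (6), (11) and (12) it follows that

$$\begin{aligned} & C_j = c_j\Pi_j = c_j\pi(l_j)K_j A^{\gamma_j}, \quad \partial C_j/\partial K_j = c_j\pi(l_j)A^{\gamma_j}, \quad \Gamma = C_j^{\sigma}/(c_j r_j), \\ & \Gamma_K = \frac{1}{c_j r_j}\frac{\partial C_j^{\sigma}}{\partial K_j} = \frac{\sigma C_j^{\sigma - 1}}{c_j r_j}\frac{\partial C_j}{\partial K_j} = \frac{\Gamma_K}{c_j r_j A^{\gamma_j}}\frac{\partial C_j}{\partial K_j} = \frac{\Gamma_K}{r_j}\pi(l_j), \\ & r_j = \pi(l_j), \quad \tilde C_j/C_j = A\tilde K_j/K_j = \phi A, \quad \tilde\Gamma/\Gamma = (\tilde C_j/C_j)^{\sigma} = (\phi A)^{\sigma}, \\ & K_j\Gamma_K/\Gamma = K_j\sigma C_j^{\sigma - 1}A^{\gamma_j}/\Gamma = K_j\sigma c_j r_j A^{\gamma_j}/C_j = K_j\sigma c_j\pi(l_j)A^{\gamma_j}/C_j = \sigma. \end{aligned} \tag{14}$$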
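Two of the relations in (14) lend themselves to a quick numerical check. The following sketch, with hypothetical function names and parameter values, evaluates the conjectured value function $\Gamma = C_j^{\sigma}/(c_j r_j)$ with $C_j = c_j\pi K_j A^{\gamma_j}$ and $r_j = \pi$, estimates the elasticity $K_j\Gamma_K/\Gamma$ by a central difference, and compares the welfare ratio $\tilde\Gamma/\Gamma$ with $(\phi A)^{\sigma}$.

```python
def value(K, c, pi, A, gamma, sigma):
    """Gamma = C**sigma / (c * r) with C = c * pi * K * A**gamma and r = pi."""
    C = c * pi * K * A ** gamma
    return C ** sigma / (c * pi)


def capital_elasticity(K, c, pi, A, gamma, sigma, h=1e-6):
    """Central-difference estimate of K * Gamma_K / Gamma."""
    up = value(K * (1 + h), c, pi, A, gamma, sigma)
    down = value(K * (1 - h), c, pi, A, gamma, sigma)
    gamma_K = (up - down) / (2 * h * K)
    return K * gamma_K / value(K, c, pi, A, gamma, sigma)


def welfare_ratio(phi, A, sigma):
    """Gamma-tilde / Gamma = (phi * A)**sigma after a successful project."""
    return (phi * A) ** sigma


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    K, c, pi, A, gamma, sigma, phi = 50.0, 0.6, 0.25, 1.5, 3, 0.5, 0.7
    print(capital_elasticity(K, c, pi, A, gamma, sigma))  # should be close to sigma
    after = value(phi * K, c, pi, A, gamma + 1, sigma)
    print(after / value(K, c, pi, A, gamma, sigma), welfare_ratio(phi, A, sigma))
```

The check relies only on the functional form conjectured above (14); it does not verify that the conjecture solves the Bellman equation, which is what the remaining derivation establishes.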
Assume that a technological change leads to an increase in welfare, $\tilde\Gamma > \Gamma$, since otherwise there would be no incentive to do R&D. Given (14), there is then a constant

$$\theta \doteq \tilde\Gamma/\Gamma - 1 = (\phi A)^{\sigma} - 1 > 0. \tag{15}$$
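From (13), (14) and (15) it follows that

$$vA^{-\gamma_j} = \frac{\tilde\Gamma - \Gamma}{\Gamma_K}\frac{\lambda}{Z_j} = \frac{\theta\Gamma}{\Gamma_K}\frac{\lambda}{Z_j} = \frac{\theta\lambda K_j}{\sigma Z_j}.$$

Given this equation, (14) and (15) imply

$$vZ_j\Gamma_K/\Gamma = \theta\lambda A^{\gamma_j}, \quad Z_j = [\theta\lambda/(\sigma v)]K_j A^{\gamma_j}. \tag{16}$$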
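The step from (13) to (16) can be checked numerically. The sketch below is a hedged illustration with hypothetical names and parameter values: it computes $\theta$ from (15) and $Z_j$ from (16), and then evaluates the first-order condition (13) divided by $\Gamma$, using $K_j\Gamma_K/\Gamma = \sigma$ from (14). Note that $\theta > 0$ requires $\phi A > 1$, which the assumed values satisfy.

```python
def theta(phi, A, sigma):
    """Welfare gain from an innovation, equation (15)."""
    return (phi * A) ** sigma - 1.0


def rd_labor(K, gamma, v, lam, phi, A, sigma):
    """R&D labor Z_j = [theta * lam / (sigma * v)] * K_j * A**gamma_j from (16)."""
    return theta(phi, A, sigma) * lam / (sigma * v) * K * A ** gamma


def foc_residual(Z, K, gamma, v, lam, phi, A, sigma):
    """Condition (13) divided by Gamma, using K * Gamma_K / Gamma = sigma."""
    return theta(phi, A, sigma) * lam / Z - v * (sigma / K) * A ** (-gamma)


if __name__ == "__main__":
    # Hypothetical parameter values with phi * A = 1.05 > 1.
    K, gamma, v, lam, phi, A, sigma = 100.0, 2, 1.2, 1.0, 0.7, 1.5, 0.5
    Z = rd_labor(K, gamma, v, lam, phi, A, sigma)
    print(Z, foc_residual(Z, K, gamma, v, lam, phi, A, sigma))
```

At the value of $Z_j$ given by (16) the residual should vanish up to rounding error, while any other positive $Z_j$ leaves it non-zero.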
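Inserting (11), (14), (15) and (16) into (9) and (10) yields

$$\begin{aligned} \rho = \Phi/\Gamma &= C_j^{\sigma}/\Gamma + (\tilde\Gamma/\Gamma - 1)\lambda\log Z_j + \big\{\pi(l_j)K_j - A^{-\gamma_j}(C_j + vZ_j)\big\}\Gamma_K/\Gamma \\ &= C_j^{\sigma}/\Gamma + \theta\lambda\log Z_j + \big\{\pi(l_j)K_j - A^{-\gamma_j}(C_j + vZ_j)\big\}\Gamma_K/\Gamma \\ &= c_j r_j + \theta\lambda\log Z_j + \big[(1 - c_j)\pi(l_j)K_j - vZ_j A^{-\gamma_j}\big]\Gamma_K/\Gamma \\ &= c_j r_j + \theta\lambda\log Z_j + (1 - c_j)\pi(l_j)\sigma - vZ_j A^{-\gamma_j}\Gamma_K/\Gamma \\ &= c_j\pi(l_j) + \theta\lambda\log Z_j + (1 - c_j)\pi(l_j)\sigma - \theta\lambda \\ &= [(1 - \sigma)c_j + \sigma]\pi(l_j) + \theta\lambda\log Z_j - \theta\lambda. \end{aligned}$$

Solving for $c_j$ yields the propensity to consume for capitalist $j$:

$$c_j = \frac{\rho + \theta\lambda(1 - \log Z_j)}{(1 - \sigma)\pi(l_j)} - \frac{\sigma}{1 - \sigma}. \tag{17}$$

14.5 Wage settlement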
Workers employed by capitalist $j$ in production are organized in labor union $j$. In a bargain over the wage $W_j$, capitalist $j$ attempts to maximize its profit $\Pi_j$, while union $j$ attempts to maximize its members’ income $W_j L_j$. I construct the reference income for the parties of bargaining as follows.⁸ Let the wage and employment before the dispute be $W_j^0$ and $L_j^0$, respectively. It is assumed that, due to labor market legislation, a fixed proportion $\beta \in (0, 1)$ of the employed workers $L_j^0$ cannot go on strike (e.g. protection work) and that they get the previous wage $W_j^0 = w_j^0 aA^{\gamma_j}$ during a strike. Noting (1), this implies that during a strike the labor force is $\beta L_j^0$ and the employer earns, in terms of consumption,

$$\Pi_j^0 = A^{\gamma_j}F(a\beta L_j^0, K_j) - W_j^0\beta L_j^0 = \big[f(\beta l_j^0) - w_j^0\beta l_j^0\big]K_j A^{\gamma_j}, \quad \text{with } l_j^0 \doteq aL_j^0/K_j. \tag{18}$$

Thus, the reference income is zero for the union and (18) for the capitalist. It is assumed that both parties in bargaining take capital stock $K_j$, the level of productivity, $aA^{\gamma_j}$, the previous productivity-adjusted wage $w_j^0$ and the previous labor-capital ratio $l_j^0$ as given.⁹ The Generalized Nash Product of the asymmetric bargaining between the union and the capitalist is

$$\Lambda_j \doteq (W_j L_j)^{\alpha}\big(\Pi_j - \Pi_j^0\big)^{1 - \alpha}, \tag{19}$$

where the constant $\alpha \in (0, 1)$ is the union’s relative bargaining power. Given (1), (11) and (18), the product (19) takes the form

$$\begin{aligned} \Lambda_j(W_j, K_j, \alpha) &\doteq (W_j L_j)^{\alpha}\big[\Pi_j - \Pi_j^0\big]^{1 - \alpha} \\ &= \big[l_j f'(l_j)\big]^{\alpha}\big[\pi(l_j) - f(\beta l_j^0) + w_j^0\beta l_j^0\big]^{1 - \alpha}K_j A^{\gamma_j} \\ &= \big[l_j f'(l_j)\big]^{\alpha}\big[\pi(l_j) - f(\beta l_j^0) + f'(l_j^0)\beta l_j^0\big]^{1 - \alpha}K_j A^{\gamma_j}. \end{aligned} \tag{20}$$

⁸ The same assumption is used in Palokangas (2005). Some papers assume that the expected wage outside the firm is the union’s reference point, but this is not quite in line with the microfoundations of the alternating offers game. Binmore, Rubinstein and Wolinsky (1986, pp. 177, 185-6) state that the reference income should not be identified with the outside option point. Rather, despite the availability of these options, it remains appropriate to identify the reference income with the income streams accruing to the parties in the course of the dispute. For example, if the dispute involves a strike, these income streams are the employee’s income from temporary work, union strike funds, and similar sources, while the employer’s income might derive from temporary arrangements that keep the business running.

⁹ If these parties also took the effect of the wage $W_j$ through capital accumulation into account, then the union’s (capitalist’s) target would be the expected value of the stream of wages (profits). Because in our model capital stock follows a cycle, the mathematical solutions for such expected values would be very difficult to obtain.
The outcome of the bargaining is obtained through maximizing the Generalized Nash Product (20), given capital stock $K_j$ and the level of productivity $A^{\gamma_j}$. Because there is a one-to-one correspondence between $W_j$ and $l_j$ through (3) and (11), the wage $W_j$ is replaced by the labor-capital ratio $l_j$ as the instrument of this maximization. Hence, there must be

$$l_j(l_j^0) \doteq \arg\max_{l_j}\Lambda_j = \arg\max_{l_j}\big[l_j f'(l_j)\big]^{\alpha}\big[\pi(l_j) - f(\beta l_j^0) + f'(l_j^0)\beta l_j^0\big]^{1 - \alpha}. \tag{21}$$
In equilibrium, the labor-capital ratio is constant over time, $l_j^0 = l_j$. This, (21) and the symmetry throughout all $j$ imply

$$l_j = l = \text{constant}. \tag{22}$$
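The bargaining outcome (21) and the stationarity condition $l_j^0 = l_j$ behind (22) can be explored numerically. The sketch below is only an illustration under strong assumptions that are not in the chapter: it uses the production function $f(l) = l/(1 + l)$, the hypothetical parameter values $\alpha = 0.5$ and $\beta = 0.1$, a plain grid search for the maximizer, and repeated substitution $l_j^0 \mapsto l_j(l_j^0)$ to approximate the stationary ratio. Whether an interior solution exists depends on the functional form and on the parameters.

```python
def f(l):
    """Assumed production function in intensive form (illustrative only)."""
    return l / (1.0 + l)


def f_prime(l):
    """Derivative of the assumed production function."""
    return 1.0 / (1.0 + l) ** 2


def nash_product(l, l0, alpha, beta):
    """Objective in (21); set to zero when the capitalist's surplus is not positive."""
    surplus = (f(l) - f_prime(l) * l) - f(beta * l0) + f_prime(l0) * beta * l0
    if surplus <= 0.0:
        return 0.0
    return (l * f_prime(l)) ** alpha * surplus ** (1.0 - alpha)


def bargained_ratio(l0, alpha, beta, upper=20.0, n=2000):
    """Grid search for the maximizer of (21) over l in (0, upper]."""
    grid = [upper * i / n for i in range(1, n + 1)]
    return max(grid, key=lambda l: nash_product(l, l0, alpha, beta))


def stationary_ratio(alpha, beta, start=1.0, rounds=60):
    """Iterate l0 -> l(l0) to approximate the equilibrium with l0 = l."""
    l0 = start
    for _ in range(rounds):
        l0 = bargained_ratio(l0, alpha, beta)
    return l0


if __name__ == "__main__":
    l_eq = stationary_ratio(alpha=0.5, beta=0.1)
    print(l_eq, bargained_ratio(l_eq, 0.5, 0.1))
```

At the approximate stationary point the bargained ratio reproduces the previous ratio up to the grid spacing, which is the numerical counterpart of $l_j^0 = l_j$.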
Assume that a research worker’s productivity $\lambda$ is an increasing function of his wage $v$ relative to the expected wage after losing one’s job, $\omega$, as follows:¹⁰

$$\lambda = (v/\omega - 1)^{\zeta}, \quad 0 < \zeta < 1, \tag{23}$$

where the elasticity $\zeta$ of productivity with respect to extra wage is a constant. An R&D firm chooses the wage for its employees, $v$, to minimize its unit cost of research, $v/\lambda$, given the expected wage after losing one’s job, $\omega$. Noting (23), this yields the equilibrium conditions

$$v = \omega/(1 - \zeta), \quad \lambda = [\zeta/(1 - \zeta)]^{\zeta} = \text{constant}. \tag{24}$$

¹⁰ This assumption is a modification of the efficiency wage model presented in Solow (1979), Summers (1988) and Van Schaik and De Groot (1998).

Let $m$ be the number of workers. If the number of capitalists, $n$, is large, the expected wage in the economy after losing one’s job, $\omega$, is then equal to the sum of the wages $W_j$ weighted by the probabilities of being employed, $L_j/m$, for all capitalists $j$, plus the wage in R&D, $v$, times the probability of being employed in R&D, $\sum_j Z_j/m$:¹¹

$$\omega = \sum_{j=1}^{n} W_j\frac{L_j}{m} + v\sum_{j=1}^{n}\frac{Z_j}{m} = \frac{1}{m}\sum_{j=1}^{n} W_j L_j + \frac{v}{m}\sum_{j=1}^{n} Z_j. \tag{25}$$
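The efficiency-wage conditions (23) and (24) and the expected wage (25) can be checked with a few lines of arithmetic. The sketch below uses hypothetical function names and numbers; it confirms that the wage $v = \omega/(1 - \zeta)$ yields a lower unit cost $v/\lambda$ than nearby wages and evaluates (25) for a two-firm example.

```python
def research_productivity(v, omega, zeta):
    """lambda = (v / omega - 1)**zeta from (23); requires v > omega."""
    return (v / omega - 1.0) ** zeta


def unit_cost(v, omega, zeta):
    """Unit cost of research v / lambda that the R&D firm minimizes."""
    return v / research_productivity(v, omega, zeta)


def efficiency_wage(omega, zeta):
    """Cost-minimizing wage v = omega / (1 - zeta) from (24)."""
    return omega / (1.0 - zeta)


def expected_wage(wages, employment, v, rd_labor, m):
    """Expected wage omega from (25): (sum_j W_j L_j + v * sum_j Z_j) / m."""
    wage_bill = sum(W * L for W, L in zip(wages, employment))
    return (wage_bill + v * sum(rd_labor)) / m


if __name__ == "__main__":
    omega, zeta = 1.0, 0.5  # hypothetical values
    v = efficiency_wage(omega, zeta)
    print(v, research_productivity(v, omega, zeta))
    print(unit_cost(1.9, omega, zeta), unit_cost(v, omega, zeta), unit_cost(2.1, omega, zeta))
    print(expected_wage([2.0, 3.0], [10.0, 20.0], 4.0, [5.0, 5.0], 100.0))
```

The comparison of unit costs is a local check only; the first-order condition of $\min_v v/\lambda$ under (23) is what delivers (24) in general.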
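The macroeconomic real variables are as follows:

$$C \doteq \sum_{j=1}^{n} C_j, \quad I \doteq \sum_{j=1}^{n} I_j, \quad Z \doteq \sum_{j=1}^{n} Z_j, \quad Y \doteq \sum_{j=1}^{n} Y_j, \quad K \doteq \sum_{j=1}^{n} K_j, \quad L \doteq \sum_{j=1}^{n} L_j. \tag{26}$$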
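From (11), (16), (22), (24), (25) and (26) it follows that

$$vZ = \frac{\theta}{\sigma}\lambda\sum_{j=1}^{n} K_j A^{\gamma_j}, \quad \frac{Z_j}{Z} = \frac{K_j A^{\gamma_j}}{\sum_{k=1}^{n} K_k A^{\gamma_k}}, \tag{27}$$

$$\begin{aligned} (1 - \zeta)\frac{m}{Z} &= \frac{\omega m}{vZ} = \frac{1}{vZ}\sum_{j=1}^{n} W_j L_j + \frac{1}{Z}\sum_{j=1}^{n} Z_j = \frac{f'(l)l}{vZ}\sum_{j=1}^{n} K_j A^{\gamma_j} + 1 = \frac{\sigma}{\theta\lambda}f'(l)l + 1, \\ Z &= (1 - \zeta)m\Big[\frac{\sigma}{\theta\lambda}f'(l)l + 1\Big]^{-1} = \text{constant}. \end{aligned} \tag{28}$$

14.6 Economic growth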
From (1), (11), (22) and (26) it follows that

$$W_j = aA^{\gamma_j}f'(l), \quad Y = f(l)K, \quad L = lK/a, \quad [Y_j - A^{-\gamma_j}W_j L_j]/K_j = f(l) - lf'(l) = \pi. \tag{29}$$
These results can be rephrased as:

Proposition 1. The output-capital ratio $Y/K = f(l)$ and the rate of return paid to capital, $\pi(l)$, are constants. A worker’s wage in production, $W_j$, grows in fixed proportion to the total productivity of labor in production, $aA^{\gamma_j}$, for each capitalist $j$.

¹¹ This specification is based on the simplification that the unemployed are supported by the employed workers in production. It would be a minor modification with the same results to extend the model as follows. (i) There are unemployment benefits. (ii) In line with Summers (1988), each unemployed worker obtains benefits in fixed proportion to e.g. the expected wage $\omega$. (iii) The benefits are financed by a proportional labor income tax.

Defining the serial number $\gamma$ of macroeconomic technology so that

$$A^{\gamma}(Y - I) = \sum_{j=1}^{n} A^{\gamma_j}(Y_j - I_j), \tag{30}$$

and aggregating throughout all $j$, the equation (2) becomes

$$C + \sum_{j=1}^{n} W_j L_j + vZ = A^{\gamma}(Y - I),$$
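where $C$ is the capitalists’ total consumption, $\sum_{j=1}^{n} W_j L_j + vZ$ the workers’ total consumption (= all wages paid in the economy) and $A^{\gamma}(Y - I)$ aggregate production of the consumption good. Noting (4), (27) and (30), the average growth rate of aggregate productivity $A^{\gamma}$ is as follows:¹²

$$\begin{aligned} E\big[\log A^{\gamma + 1} - \log A^{\gamma}\big] &= \sum_{j=1}^{n}\frac{\partial[A^{\gamma}(Y - I)]}{\partial[A^{\gamma_j}(Y_j - I_j)]}\,\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}\,E\big[\log A^{\gamma_j + 1} - \log A^{\gamma_j}\big] \\ &= \sum_{j=1}^{n}\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}\,E\big[\log A^{\gamma_j + 1} - \log A^{\gamma_j}\big] \\ &= \lambda\sum_{j=1}^{n}\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}\log Z_j \\ &= \lambda\sum_{j=1}^{n}\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}\log\Big[\frac{Z}{n}\,\frac{nK_j A^{\gamma_j}}{\sum_{k=1}^{n} K_k A^{\gamma_k}}\Big] \\ &= \lambda\Big\{\log\frac{Z}{n} + \sum_{j=1}^{n}\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}\log\Big[\frac{nK_j A^{\gamma_j}}{\sum_{k=1}^{n} K_k A^{\gamma_k}}\Big]\Big\}, \end{aligned} \tag{31}$$

where $E$ is the expectations operator. Now assume that the number of capitalists, $n$, is large enough. Because the terms

$$\frac{A^{\gamma_j}(Y_j - I_j)}{A^{\gamma}(Y - I)}$$

approach zero for all $j$, but the terms

$$\frac{nK_j A^{\gamma_j}}{\sum_{k=1}^{n} K_k A^{\gamma_k}}$$

are bounded and close to one for all $j$, when $n \to \infty$, the equation (31) can be written in the form

$$\lim_{n\to\infty} E\big[\log A^{\gamma + 1} - \log A^{\gamma}\big] = \lambda\log(Z/n).$$

¹² For this, see Aghion and Howitt (1998), p. 59.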
This result can be rephrased as follows:

Proposition 2. The level of productivity in the consumption-good sector, $A^{\gamma}$, grows on average at the constant rate $\lambda\log(Z/n)$.

A technological change that increases capitalist $j$’s productivity from $A^{\gamma_j}$ to $A^{\gamma_j + 1}$ and decreases its capital stock from $K_j$ to $\tilde K_j = \phi K_j$ follows the Poisson process $q_j$ given by (4). This means that, given (30), Proposition 2 and the properties of the Poisson processes, a technological change that increases macroeconomic productivity from $A^{\gamma}$ to $A^{\gamma + 1}$ and decreases total capital stock from $K$ to $\tilde K = \phi K$ follows the Poisson process $q$ with

$$dq = \begin{cases} 1 & \text{with probability } \lambda\log(Z/n)\,dt, \\ 0 & \text{with probability } 1 - \lambda\log(Z/n)\,dt, \end{cases} \tag{32}$$

where $dq$ is the increment of the process $q$. Noting (1), (11), (22), (26), (27) and (28), the ratio of total labor income $\sum_{j=1}^{n}(W_j L_j + vZ_j)$ to national income $\sum_{j=1}^{n} A^{\gamma_j}Y_j$ can be written as follows:

$$\frac{\sum_{j=1}^{n}(W_j L_j + vZ_j)}{\sum_{j=1}^{n} A^{\gamma_j}Y_j} = \frac{\sum_{j=1}^{n} f'(l_j)l_j K_j A^{\gamma_j} + vZ}{\sum_{j=1}^{n} f(l_j)K_j A^{\gamma_j}} = \frac{\sum_{j=1}^{n} f'(l_j)l_j K_j A^{\gamma_j} + \frac{\theta}{\sigma}\lambda\sum_{j=1}^{n} K_j A^{\gamma_j}}{\sum_{j=1}^{n} f(l_j)K_j A^{\gamma_j}} = \frac{f'(l)l + \theta\lambda/\sigma}{f(l)},$$

which is a constant. This result can be rephrased as follows:

Proposition 3. The share of labor in national income is constant.
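The constants behind Propositions 2 and 3 can be evaluated for a numerical example. The following sketch is illustrative only: the function names, the production function $f(l) = l/(1 + l)$ and all parameter values are assumptions chosen so that the resulting labor share lies between zero and one. It computes total R&D labor $Z$ from (28), the average growth rate $\lambda\log(Z/n)$ and the labor share $[f'(l)l + \theta\lambda/\sigma]/f(l)$.

```python
import math


def rd_employment(m, zeta, sigma, theta, lam, wage_bill_rate):
    """Total R&D labor Z from (28); wage_bill_rate stands for f'(l) * l."""
    return (1.0 - zeta) * m / (sigma * wage_bill_rate / (theta * lam) + 1.0)


def productivity_growth(lam, Z, n):
    """Average growth rate lambda * log(Z / n) from Proposition 2."""
    return lam * math.log(Z / n)


def labor_share(output_rate, wage_bill_rate, theta, lam, sigma):
    """Labor share [f'(l) l + theta * lam / sigma] / f(l) behind Proposition 3."""
    return (wage_bill_rate + theta * lam / sigma) / output_rate


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    phi, A, sigma, zeta = 0.7, 1.5, 0.5, 0.5
    m, n, l = 1000.0, 10, 1.0
    theta = (phi * A) ** sigma - 1.0                  # equation (15)
    lam = (zeta / (1.0 - zeta)) ** zeta               # equation (24)
    f_l, f1_l = l / (1.0 + l), 1.0 / (1.0 + l) ** 2   # assumed f(l) = l / (1 + l)
    Z = rd_employment(m, zeta, sigma, theta, lam, f1_l * l)
    print(Z, productivity_growth(lam, Z, n), labor_share(f_l, f1_l * l, theta, lam, sigma))
```

None of the three quantities depends on $K_j$ or $\gamma_j$, which is why they stay constant along the cycle in the model.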
14.7 Cycles
Noting (11) and (29), unemployment is expressed as

$$U = m - L - Z = m - lK/a - Z, \tag{33}$$

where $m$ is the number of workers, $L$ employment in production and $Z$ labor devoted to R&D. Because there is no trend in unemployment $U$ by Assumption 2, there is no trend in $K/a$ either and

$$\frac{\dot a}{a} = E\Big[\frac{1}{K}\frac{dK}{dt}\Big], \tag{34}$$
where E is the expectations operator. There is a theoretical possibility that total employment L + Z hits the total supply of labor, m, which would excessively complicate the dynamics of the model. To eliminate this, the number of workers, m, is assumed to be large. Noting (33) and (28), one obtains that the economy never attains full employment, if U > 0 and K(t) 1−ζ =