Stochastic Analysis with Financial Applications: Hong Kong 2009 [1 ed.] 3034800967, 9783034800969


English, 430 [440] pages, 2011

Table of contents:
Front Matter....Pages i-ix
Front Matter....Pages 1-1
Dirichlet Forms for Poisson Measures and Lévy Processes: The Lent Particle Method....Pages 3-20
Stability of a Nonlinear Equation Related to a Spatially-inhomogeneous Branching Process....Pages 21-32
Backward Stochastic Difference Equations with Finite States....Pages 33-42
On a Forward-backward Stochastic System Associated to the Burgers Equation....Pages 43-59
On the Estimate for Commutators in DiPerna–Lions Theory....Pages 61-71
Approximation Theorem for Stochastic Differential Equations Driven by G-Brownian Motion....Pages 73-81
Stochastic Flows for Nonlinear SPDEs Driven by Linear Multiplicative Space-time White Noises....Pages 83-97
Optimal Stopping Problem Associated with Jump-diffusion Processes....Pages 99-120
A Review of Recent Results on Approximation of Solutions of Stochastic Differential Equations....Pages 121-144
Strong Consistency of Bayesian Estimator Under Discrete Observations and Unknown Transition Density....Pages 145-167
Exponentially Stable Stationary Solutions for Delay Stochastic Evolution Equations....Pages 169-178
Robust Stochastic Control and Equivalent Martingale Measures....Pages 179-189
Multi-valued Stochastic Differential Equations Driven by Poisson Point Processes....Pages 191-205
Sensitivity Analysis for Jump Processes....Pages 207-219
Quantifying Model Uncertainties in Complex Systems....Pages 221-252
Front Matter....Pages 253-253
Convertible Bonds in a Defaultable Diffusion Model....Pages 255-298
A Convexity Approach to Option Pricing with Transaction Costs in Discrete Models....Pages 299-315
Completeness and Hedging in a Lévy Bond Market....Pages 317-330
Asymptotically Efficient Discrete Hedging....Pages 331-346
Efficient Importance Sampling Estimation for Joint Default Probability: The First Passage Time Problem....Pages 347-359
Market Models of Forward CDS Spreads....Pages 361-411
Optimal Threshold Dividend Strategies under the Compound Poisson Model with Regime Switching....Pages 413-429

Progress in Probability Volume 65

Series Editors: Charles Newman, Sidney I. Resnick

For other volumes published in this series, go to www.springer.com/series/4839

Stochastic Analysis with Financial Applications Hong Kong 2009

Arturo Kohatsu-Higa, Nicolas Privault, Shuenn-Jyi Sheu (Editors)

Editors

Arturo Kohatsu-Higa
Department of Mathematical Sciences
Ritsumeikan University / Japan Science and Technology Agency
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
[email protected]

Nicolas Privault
Division of Mathematical Sciences
School of Physical and Mathematical Sciences
Nanyang Technological University
SPMS-MAS-05-43, 21 Nanyang Link, Singapore 637371
[email protected]

Shuenn-Jyi Sheu
Department of Mathematics
National Central University
No. 300, Zhongda Rd., Zhongli City, Taoyuan County 320, Taiwan (R.O.C.)
[email protected]

2010 Mathematics Subject Classification: 60H, 65C, 91B, 91G, 93E

ISBN 978-3-0348-0096-9
e-ISBN 978-3-0348-0097-6
DOI 10.1007/978-3-0348-0097-6

Library of Congress Control Number: 2011933482

© Springer Basel AG 2011
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use, permission of the copyright owner must be obtained.

Cover design: deblik, Berlin
Printed on acid-free paper
Springer Basel AG is part of Springer Science+Business Media
www.birkhauser-science.com

Contents

Preface .......................................................... vii
List of Speakers ................................................. ix

Part I: Stochastic Analysis

N. Bouleau and L. Denis
Dirichlet Forms for Poisson Measures and Lévy Processes:
The Lent Particle Method ......................................... 3

S. Chakraborty, E.T. Kolkovska and J.A. López-Mimbela
Stability of a Nonlinear Equation Related to a
Spatially-inhomogeneous Branching Process ........................ 21

S.N. Cohen and R.J. Elliott
Backward Stochastic Difference Equations with Finite States ...... 33

A.B. Cruzeiro and E. Shamarova
On a Forward-backward Stochastic System Associated to
the Burgers Equation ............................................. 43

S. Fang and H. Lee
On the Estimate for Commutators in DiPerna–Lions Theory .......... 61

F. Gao and H. Jiang
Approximation Theorem for Stochastic Differential Equations
Driven by G-Brownian Motion ...................................... 73

B. Goldys and X. Zhang
Stochastic Flows for Nonlinear SPDEs Driven by Linear
Multiplicative Space-time White Noises ........................... 83

Y. Ishikawa
Optimal Stopping Problem Associated with Jump-diffusion
Processes ........................................................ 99

B. Jourdain and A. Kohatsu-Higa
A Review of Recent Results on Approximation of Solutions
of Stochastic Differential Equations ............................. 121

A. Kohatsu-Higa, N. Vayatis and K. Yasuda
Strong Consistency of Bayesian Estimator Under Discrete
Observations and Unknown Transition Density ...................... 145

J. Luo
Exponentially Stable Stationary Solutions for Delay Stochastic
Evolution Equations .............................................. 169

B. Øksendal and A. Sulem
Robust Stochastic Control and Equivalent Martingale Measures ..... 179

J. Ren and J. Wu
Multi-valued Stochastic Differential Equations Driven by
Poisson Point Processes .......................................... 191

A. Takeuchi
Sensitivity Analysis for Jump Processes .......................... 207

J. Yang and J. Duan
Quantifying Model Uncertainties in Complex Systems ............... 221

Part II: Financial Applications

T.R. Bielecki, S. Crépey, M. Jeanblanc and M. Rutkowski
Convertible Bonds in a Defaultable Diffusion Model ............... 255

T.-S. Chiang and S.-J. Sheu
A Convexity Approach to Option Pricing with Transaction Costs
in Discrete Models ............................................... 299

J.M. Corcuera
Completeness and Hedging in a Lévy Bond Market ................... 317

M. Fukasawa
Asymptotically Efficient Discrete Hedging ........................ 331

C.-H. Han
Efficient Importance Sampling Estimation for Joint Default
Probability: The First Passage Time Problem ...................... 347

L. Li and M. Rutkowski
Market Models of Forward CDS Spreads ............................. 361

J. Wei, H. Yang and R. Wang
Optimal Threshold Dividend Strategies under the Compound
Poisson Model with Regime Switching .............................. 413

Preface

The Workshop on Stochastic Analysis and Finance took place at City University of Hong Kong from June 29 to July 3, 2009. The goal of this workshop was to present a broad overview of the range of applications of stochastic analysis and of its recent theoretical developments, while giving some weight to the research being carried out in the East Asia region. The topics of the talks ranged from mathematical aspects of the theory of stochastic processes to their applications to finance. This is reflected in the organization of the volume, which is split into two parts, on stochastic analysis and on financial applications.

In recent times the applications of stochastic analysis to finance and insurance have bloomed, and for this reason we have devoted significant attention to them. Stochastic analysis also has a variety of other applications to biological systems and to physical and engineering problems, requiring the development of advanced techniques, a representative sample of which is included here as well.

A large number of articles in this volume deal with stochastic equations, in particular stochastic (partial) differential equations and stochastic delay equations, which arise naturally in physical systems depending on time and space. Contributions dealing with the numerical simulation and error analysis of these stochastic systems, an obligatory step before carrying out actual applications, are also included and can be of crucial importance to finance. The important and difficult topics of statistical estimation of parameters in these models, as well as their control and robustness, are also treated in this volume.

The conference also covered two areas that are growing rapidly: first, backward stochastic differential equations and all their variants, which have deep connections with non-linear partial differential equations; secondly, the recent developments around (non-linear) G-Brownian motion and its potential uses in risk analysis, which are opening a new avenue of development for stochastic analysis.

From a technical point of view, the existence of densities of random variables associated with stochastic differential equations is an important matter, for which a fairly complete basic theory is already available for continuous diffusion processes. The case of jump processes, which has already been the object of many important advances, is still in need of developments motivated by applications, as shown in this volume. Concerning the applications to finance, many of the articles deal with a topic that has taken our current society by storm: how to deal with the valuation and hedging of credit risk in its various forms. The results presented in the financial applications part cover in particular pricing and hedging in credit risk and jump models, including recent results on markets with frictions, such as transaction costs, and on Lévy-driven market models.

The articles contained in these proceedings are survey articles and original research papers which have been peer reviewed, and we take this opportunity to thank the colleagues who generously contributed their time as referees. We also thank the contributors for answering our requests to improve the presentation of their results, so as to produce a high-quality volume, and all workshop participants for lively discussions. The participants and organizers are also grateful to the Lee Hysan Foundation, the Hong Kong Pei Hua Education Foundation, and the Department of Mathematics at City University of Hong Kong for their generous financial support and for providing the conference venue. Last but not least, we acknowledge Ms Lonn Chan of the Mathematics General Office, whose highly efficient organizational skills ensured the complete success of the event.

October 2010

Arturo Kohatsu-Higa, Nicolas Privault, Shuenn-Jyi Sheu

List of Speakers

Bouleau, N. (ENPC Paris, France)
Corcuera, J.M. (University of Barcelona, Spain)
Cruzeiro, A.B. (IST Lisbon, Portugal)
Da Fonseca, J. (Auckland University of Technology, New Zealand)
Dai, M. (National University of Singapore)
Denis, L. (University of Evry, France)
Di Nunno, G. (University of Oslo, Norway)
Dong, Z. (Chinese Academy of Sciences, China)
Duan, J. (Illinois Institute of Technology, USA)
El Khatib, Y. (United Arab Emirates University)
Elliott, R.J. (University of Calgary, Canada)
Fang, S. (University of Bourgogne, France)
Fukasawa, M. (Osaka University, Japan)
Gao, F.Q. (Wuhan University, China)
Han, C.-H. (National Tsing-Hua University, Taiwan)
Hayashi, M. (Kyoto University, Japan)
Hwang, C.R. (Academia Sinica, Taiwan)
Ishikawa, Y. (Ehime University, Japan)
Jeanblanc, M. (University of Evry, France)
Kohatsu-Higa, A. (Ritsumeikan University, Japan)
Li, X.D. (Fudan University, China)
López-Mimbela, J.A. (CIMAT Guanajuato, Mexico)
Øksendal, B. (University of Oslo, Norway)
Ren, J. (Sun Yat-sen University, China)
Rutkowski, M. (University of Sydney, Australia)
Sheu, S.-J. (National Central University, Taiwan)
Shin, Y.H. (KAIST Daejeon, Korea)
Siu, T.K. (Macquarie University, Australia)
Takeuchi, A. (Osaka City University, Japan)
Wu, C.T. (National Chiao Tung University, Taiwan)
Yang, H. (University of Hong Kong, China)
Yasuda, K. (Hosei University, Japan)
Ying, J. (Fudan University, China)
Zambrini, J.-C. (University of Lisbon, Portugal)
Zhang, X. (Wuhan University, China)
Zheng, H. (Imperial College London, U.K.)

Part I Stochastic Analysis

Progress in Probability, Vol. 65, 3–20
© 2011 Springer Basel AG

Dirichlet Forms for Poisson Measures and Lévy Processes: The Lent Particle Method

Nicolas Bouleau and Laurent Denis

Abstract. We present a new approach to the absolute continuity of laws of Poisson functionals. The theoretical framework is that of local Dirichlet forms as a tool for studying probability spaces. The argument gives rise to a new explicit calculus that we first present on some simple examples: it consists in adding a particle and taking it back after computing the gradient. We then apply the method to SDEs driven by a Poisson measure.

Mathematics Subject Classification (2000). Primary 60G57, 60H05; secondary 60J45, 60G51.

Keywords. Stochastic differential equation, Poisson functional, Dirichlet form, energy image density, Lévy processes, gradient, carré du champ.

1. Introduction

In order to situate the method, it is worth emphasizing some features of the Dirichlet forms approach in comparison with the Malliavin calculus, which is generally better known among probabilists. First, the arguments hold under Lipschitz hypotheses only: for example, the method applies to stochastic differential equations with Lipschitz coefficients. Second, a general criterion for the existence of a density of an R^d-valued random variable is available: (EID), the Energy Image Density property (proved on the Wiener space for the Ornstein-Uhlenbeck form, still a conjecture in general, cf. Bouleau-Hirsch [7], but established in the case of random Poisson measures under natural hypotheses). Third, Dirichlet forms are easy to construct in the infinite-dimensional frameworks encountered in probability theory, and this yields a theory of error propagation through the stochastic calculus, especially for finance and physics (cf. Bouleau [2]), but also for the numerical analysis of PDEs and SPDEs (cf. Scotti [18]).

Our aim is to extend, thanks to Dirichlet forms, the Malliavin calculus to the case of Poisson measures and SDEs with jumps. Let us recall that, in the case of jumps, there are several ways of applying the ideas of Malliavin calculus. Existing works are based either on the chaos decomposition (Nualart-Vives [14]), which provides tools analogous to the Malliavin calculus on Wiener space but non-local (Picard [15], Ishikawa-Kunita [12]), or on local operators acting on the size of the jumps, using the expression of the generator on a sufficiently rich class and closing the structure, for instance by Friedrichs' argument (cf. especially Bichteler-Gravereaux-Jacod [1], Coquio [8] and Ma-Röckner [13]). We follow a way close to this last one.

We first expose the method from a practical point of view, in order to show how it acts in concrete cases. In a separate part we then give the main elements of the proof of the main theorem on the lent particle formula, and we display several examples where the method improves known results. In the last section, we apply the lent particle method to SDEs driven by a Poisson measure or a Lévy process. Complete details of the proofs and of the hypotheses needed for (EID) are published in [3] and [4].

2. The lent particle method

Consider a random Poisson measure as a distribution of points, and view a Lévy process as defined by a Poisson random measure; that is, let us think on the configuration space. We suppose the particles live in a space (called the bottom space) equipped with a local Dirichlet form with carré du champ and gradient. This makes it possible to construct a local Dirichlet form with carré du champ on the configuration space (called the upper space). To calculate, for some functional, the Malliavin matrix (which in the framework of Dirichlet forms becomes the carré du champ matrix), the method consists first in adding a particle to the system. The functional then possesses a new argument, due to this new particle. We can compute the bottom gradient of the functional with respect to this argument, as well as its bottom carré du champ. Taking back the particle we have added does not erase the new argument of the functional obtained. We can then integrate the new argument with respect to the Poisson measure, and this gives the upper carré du champ matrix, that is, the Malliavin matrix. This is the exact summary of the method.

2.1. Let us give more details and notation

Let (X, X, ν, d, γ) be a local symmetric Dirichlet structure which admits a carré du champ operator. This means that (X, X, ν) is a measured space, ν is σ-finite, and the bilinear form e[f, g] = (1/2) ∫ γ[f, g] dν is a local Dirichlet form with domain d ⊂ L²(ν) and carré du champ γ (cf. Fukushima-Oshima-Takeda [10] in the locally compact case and Bouleau-Hirsch [7] in a general setting). (X, X, ν, d, γ) is called the bottom space.

Consider a Poisson random measure N on [0, +∞[×X with intensity measure dt × ν. A Dirichlet structure may be constructed canonically on the probability space of this Poisson measure, which we denote (Ω1, A1, P1, D, Γ). We call this space the upper space. D is a set of functions in the domain of Γ, in other words a set of random variables which are functionals of the random distribution of points. The main result is the following formula: for all F ∈ D,

Γ[F] = ∫_0^{+∞} ∫_X ε−(γ[ε+F]) dN,

in which ε+ and ε− are the creation and annihilation operators. Let us explain the meaning and the use of this formula on an example.
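The mechanics of this formula can also be mimicked in a few lines of code. The following is a toy numerical sketch, not from the paper: all choices are ours, namely a bottom space R\{0} with carré du champ γ[f](x) = x² f′(x)², a fixed finite configuration of points, and the linear functional F = N(h) for a smooth h. For each point of the configuration we lend a particle, differentiate in the lent argument, take the particle back, and sum against N; for a linear functional the answer must equal N(γ[h]).

```python
import math

# Toy check of Gamma[F] = int eps-(gamma[eps+ F]) dN  (all choices are ours):
# bottom carre du champ gamma[f](x) = x^2 f'(x)^2, configuration of 4 points,
# and the linear Poisson functional F = N(h).

config = [0.5, -1.2, 2.0, 0.3]   # sizes of the points of the configuration
h = math.sin                      # a smooth bottom function (our choice)
h_prime = math.cos

def F(points):
    """Linear Poisson functional N(h) = sum of h over the points."""
    return sum(h(x) for x in points)

def Gamma(points, d=1e-6):
    total = 0.0
    for i, x in enumerate(points):
        rest = points[:i] + points[i + 1:]
        # eps+ : lend a particle of size y to the configuration
        eps_plus_F = lambda y: F(rest + [y])
        # bottom gradient in the lent argument (numerical derivative);
        # eps- has already removed the particle (the list `rest`),
        # which does not erase the argument y
        deriv = (eps_plus_F(x + d) - eps_plus_F(x - d)) / (2 * d)
        total += x * x * deriv * deriv   # integrate against N: sum over points
    return total

# For a linear functional, Gamma[N(h)] must equal N(gamma[h]) = sum x^2 h'(x)^2.
expected = sum(x * x * h_prime(x) ** 2 for x in config)
print(Gamma(config), expected)
```

For non-linear functionals of the configuration the two columns would no longer coincide, but the four-step recipe in the function above is unchanged.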

2.2. First example

Let Yt be a centered Lévy process with Lévy measure ν integrating x². We assume that ν is such that a local Dirichlet structure may be constructed on R\{0} with carré du champ γ[f] = x² f′²(x). The notion of gradient in the sense of Dirichlet forms is explained in [7], Chapter V: it is a linear operator with values in an auxiliary Hilbert space, giving the carré du champ by taking the square of the Hilbert norm. It is convenient to choose as Hilbert space an L² space of a probability space. Here we define a gradient ♭ associated with γ by choosing a function ξ such that

∫_0^1 ξ(r) dr = 0   and   ∫_0^1 ξ²(r) dr = 1,

and putting

f♭ = x f′(x) ξ(r).

Practically, ♭ acts as a derivation obeying the chain rule (ϕ(f))♭ = ϕ′(f)·f♭ (for ϕ ∈ C¹ ∩ Lip, or even only Lipschitz). N is the Poisson random measure associated with Y, with intensity dt × σ, such that ∫_0^t h(s) dYs = ∫ 1_{[0,t]}(s) h(s) x Ñ(ds dx) for h ∈ L²_loc(R+).

We study the regularity of

V = ∫_0^t ϕ(Y_{s−}) dY_s,

where ϕ is Lipschitz and C¹.

1) We add a particle (α, x), i.e., a jump to Y at time α with size x, which gives

ε+V = V + ϕ(Y_{α−}) x + ∫_{]α,t]} (ϕ(Y_{s−} + x) − ϕ(Y_{s−})) dY_s.

2) V♭ = 0, since V does not depend on x, and

(ε+V)♭ = ( ϕ(Y_{α−}) x + ∫_{]α,t]} ϕ′(Y_{s−} + x) x dY_s ) ξ(r),

because x♭ = x ξ(r).

3) We compute

γ[ε+V] = ∫ ((ε+V)♭)² dr = ( ϕ(Y_{α−}) x + ∫_{]α,t]} ϕ′(Y_{s−} + x) x dY_s )².

4) We take back the particle, which gives

ε−γ[ε+V] = ( ϕ(Y_{α−}) x + ∫_{]α,t]} ϕ′(Y_{s−}) x dY_s )²,

and compute Γ[V] = ∫ ε−γ[ε+V] dN (lent particle formula):

Γ[V] = ∫ ( ϕ(Y_{α−}) + ∫_{]α,t]} ϕ′(Y_{s−}) dY_s )² x² N(dα dx)
     = Σ_{α≤t} ΔY_α² ( ϕ(Y_{α−}) + ∫_{]α,t]} ϕ′(Y_{s−}) dY_s )²,

where ΔY_α = Y_α − Y_{α−}. For a real-valued functional, the condition (EID) is always fulfilled: V possesses a density as soon as Γ[V] > 0. The above expression may then be used to discuss the strict positivity of Γ[V], depending on whether the mass of ν is finite or infinite, cf. [4], Example 5.2. Before giving a typical set of assumptions that the Lévy measure ν has to fulfill, let us make the (EID) property explicit.

2.3. Energy Image Density property (EID)

A Dirichlet form on L²(Λ) (Λ σ-finite) with carré du champ γ satisfies (EID) if, for any d and all U with values in R^d whose components are in the domain of the form, the image by U of the measure having as density with respect to Λ the determinant of the carré du champ matrix is absolutely continuous with respect to the Lebesgue measure, i.e.,

U∗[(det γ[U, Uᵗ]) · Λ] ≪ λ^d.

This property is true for the Ornstein-Uhlenbeck form on the Wiener space, and in several other cases, cf. Bouleau-Hirsch [7]. It was conjectured in 1986 that it is always true; it is still a conjecture. It is therefore necessary to prove this property in the context of Poisson random measures. With natural hypotheses (cf. [4], Parts 2 and 4), as soon as (EID) is true for the bottom space, (EID) is true for the upper space. Our proof uses a result of Shiqi Song [19].

2.4. Main example of bottom structure in R^d

Let (Y_t)_{t≥0} be a centered d-dimensional Lévy process without Gaussian part, with Lévy measure ν = k dx. Under standard hypotheses, we have the following representation:

Y_t = ∫_0^t ∫_{R^d} x Ñ(ds, dx),

where Ñ is a compensated Poisson measure with intensity dt × k dx. In this case the idea is to introduce an ad-hoc Dirichlet structure on R^d. The following example gives a case of such a structure (d, e) which satisfies all the required hypotheses and which is flexible enough to encompass many cases:

Lemma 1. Let r ∈ N*, (X, X) = (R^r, B(R^r)) and ν = k dx, where k is non-negative and Borelian. We are given ξ = (ξ_{i,j})_{1≤i,j≤r}, an R^{r×r}-valued symmetric Borel function. We assume that there exist an open set O ⊂ R^r and a function ψ, continuous on O and null on R^r \ O, such that
1) k > 0 on O, ν-a.e., and k is locally bounded on O;
2) ξ is locally bounded and locally elliptic on O;
3) k ≥ ψ > 0 ν-a.e. on O;
4) for all i, j ∈ {1, . . . , r}, ξ_{i,j} ψ belongs to H¹_loc(O).
We denote by H the subspace of functions f ∈ L²(ν) ∩ L¹(ν) such that the restriction of f to O belongs to C_c^∞(O). Then the bilinear form defined by

∀f, g ∈ H,   e(f, g) = Σ_{i,j=1}^r ∫_O ξ_{i,j}(x) ∂_i f(x) ∂_j g(x) ψ(x) dx

is closable in L²(ν). Its closure, (d, e), is a local Dirichlet form on L²(ν) which admits a carré du champ γ:

∀f ∈ d,   γ(f)(x) = Σ_{i,j=1}^r ξ_{i,j}(x) ∂_i f(x) ∂_j f(x) ψ(x)/k(x).

Moreover, it satisfies property (EID).

Remark. In the case of a Lévy process, we will often apply this lemma with ξ the identity mapping. We shall often consider an open domain of the form O = {x ∈ R^d ; |x| < ε}, which means that we "differentiate" only with respect to the small jumps; hypothesis 3) means that we do not need to assume regularity of k, but only that k dominates a regular function.

2.5. Multivariate example

Consider, as in the previous section, a centered Lévy process Y without Gaussian part such that its Lévy measure ν satisfies the assumptions of Lemma 1 (which imply 1 + ΔY_s ≠ 0 a.s.) with d = 1 and ξ(x) = x². We want to study the existence of a density for the pair (Y_t, Exp(Y)_t), where Exp(Y) is the Doléans exponential of Y:

Exp(Y)_t = e^{Y_t} Π_{s≤t} (1 + ΔY_s) e^{−ΔY_s}.

1) We add a particle (α, y), i.e., a jump to Y at time α ≤ t with size y:

ε+_{(α,y)}(Exp(Y)_t) = e^{Y_t + y} Π_{s≤t} (1 + ΔY_s) e^{−ΔY_s} (1 + y) e^{−y} = Exp(Y)_t (1 + y).

2) We compute

γ[ε+Exp(Y)_t](y) = (Exp(Y)_t)² y² ψ(y)/k(y).

3) We take back the particle:

ε−γ[ε+Exp(Y)_t] = ( Exp(Y)_t (1 + y)^{−1} )² y² ψ(y)/k(y).

We integrate with respect to N, and this gives the upper carré du champ operator (lent particle formula):

Γ[Exp(Y)_t] = ∫_{[0,t]×R} ( Exp(Y)_t (1 + y)^{−1} )² y² ψ(y)/k(y) N(dα, dy)
            = Σ_{α≤t} ( Exp(Y)_t (1 + ΔY_α)^{−1} )² ΔY_α² ψ(ΔY_α)/k(ΔY_α).

By a similar computation, the matrix Γ of the pair (Y_t, Exp(Y)_t) is given by

Γ = Σ_{α≤t} ( 1                            Exp(Y)_t (1 + ΔY_α)^{−1}
              Exp(Y)_t (1 + ΔY_α)^{−1}     (Exp(Y)_t (1 + ΔY_α)^{−1})² ) ΔY_α² ψ(ΔY_α)/k(ΔY_α).

Hence, under hypotheses implying (EID), such as those of Lemma 1, the density of the pair (Y_t, Exp(Y)_t) is yielded by the condition

dim span{ (1, Exp(Y)_t (1 + ΔY_α)^{−1}) : α ∈ J_T } = 2,

where J_T denotes the set of jump times of Y between 0 and t. Carrying this out in detail, we obtain: let Y be a real Lévy process whose Lévy measure is infinite and has a density dominating, near 0, a positive continuous function; then the pair (Y_t, Exp(Y)_t) possesses a density on R².
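The Doléans exponential used above can be sanity-checked numerically. This is a toy sketch with arbitrary jump sizes, not from the paper: for a finite-activity pure-jump Y one has Y_t = Σ_{s≤t} ΔY_s, so the product formula for Exp(Y)_t must reduce to Π(1 + ΔY_s), which is the explicit solution of Z_t = 1 + ∫_0^t Z_{s−} dY_s.

```python
import math

# A finite-activity pure-jump path: jump sizes of Y on [0, t] (our choice).
jumps = [0.4, -0.2, 1.5, 0.1]

Y_t = sum(jumps)  # for a pure-jump path, Y_t is the sum of its jumps

# Doleans exponential: Exp(Y)_t = e^{Y_t} * prod (1 + dY) e^{-dY}
doleans = math.exp(Y_t)
for dy in jumps:
    doleans *= (1.0 + dy) * math.exp(-dy)

# Explicit solution of Z_t = 1 + int_0^t Z_{s-} dY_s for a pure-jump driver:
# each jump multiplies Z by (1 + dY).
Z = 1.0
for dy in jumps:
    Z *= 1.0 + dy

print(doleans, Z)  # the two agree up to rounding
```

The exponential factors cancel exactly because the Brownian and drift parts are absent; with a Gaussian component the product formula would pick up the usual e^{−⟨Y^c⟩_t/2} correction.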

3. Demonstration of the lent particle formula

3.1. The construction

Let us recall that (X, X, ν, d, γ) is a local Dirichlet structure with carré du champ, called the bottom space; ν is σ-finite, and the bilinear form e[f, g] = (1/2) ∫ γ[f, g] dν is a local Dirichlet form with domain d ⊂ L²(ν) and with carré du champ γ. We assume {x} ∈ X for all x ∈ X and that ν is diffuse. The associated generator is denoted a; its domain is D(a) ⊂ d. We consider a random Poisson measure N on [0, +∞[×X with intensity dt×ν. It is defined on (Ω1, A1, P1), where Ω1 is the configuration space of countable sums of Dirac masses on [0, +∞[×X, A1 is the σ-field generated by N, and P1 is the law of N. (Ω1, A1, P1) is called the upper space. The question is to construct a Dirichlet structure on the upper space, induced "canonically" by the Dirichlet structure of the bottom space.


This question is natural in view of the following interpretation: the bottom structure may be thought of as describing a single particle moving according to the symmetric Markov process associated with the bottom Dirichlet form. Considering an infinite family of independent such particles, with initial law given by (Ω1, A1, P1), then shows that a Dirichlet structure can be canonically considered on the upper space (cf. the introduction of [4] for different ways of tackling this question). Because of the typical formulas for functions of the form e^{iN(f)} related to the Laplace functional, we take as space of test functions D0 the set of elements of L²(P1) which are linear combinations of variables of the form e^{iÑ(f)}, where f belongs to (D(a) ⊗ L²(dt)) ∩ L¹(dt×ν) and is such that γ[f] ∈ L²(dt×ν); recall that Ñ = N − dt×ν.

Remark 1. As we need D0 to be a dense subset, we make what we call a bottom core hypothesis: namely, we assume that there exists a subspace H of D(a) ∩ L¹(ν), dense in L¹(ν) ∩ L²(ν), such that ∀f ∈ H, γ[f] ∈ L²(ν) (see [4] for more details on the technical hypotheses we adopt).

If U = Σ_p λ_p e^{iÑ(f_p)} belongs to D0, we put

A0[U] = Σ_p λ_p e^{iÑ(f_p)} ( iÑ(a[f_p]) − (1/2) N(γ[f_p]) ),   (1)

where, in a natural way, if f(x, t) = Σ_l u_l(x) ϕ_l(t) ∈ D(a) ⊗ L²(dt),

a[f] = Σ_l a[u_l] ϕ_l   and   γ[f] = Σ_l γ[u_l] ϕ_l.

N. Bouleau and L. Denis

Without loss of generality, we assume moreover that the operator ♭ takes its values in the orthogonal space of 1 in L2 (R, R, ρ). So that we have  ∀u ∈ d, u♭ dρ = 0 ν-a.e. 3.3. Candidate gradient for the upper space Now, we introduce the creation operator (resp. annihilation operator) which consists in adding (resp. removing if necessary) a jump at time t with size u: ε+ / w1 } (t,u) (w1 ) = w1 1{(t,u)∈supp w1 } + (w1 + ε(t,u) })1{(t,u)∈supp

ε− / w1 } + (w1 − ε(t,u) })1{(t,u)∈supp w1 } . (t,u) (w1 ) = w1 1{(t,u)∈supp In a natural way, we extend these operators to the functionals by − − ε+ H(w1 , t, u) = H(ε+ (t,u) w1 , t, u) ε H(w1 , t, u) = H(ε(t,u) w1 , t, u).

Definition. For F ∈ D0 , we define the pre-gradient  +∞  F♯ = ε− ((ε+ F )♭ ) dN ⊙ ρ, 0

X×R

where N ⊙ρ is the point process N “marked” by ρ, i.e., if N is the family of marked points (Ti , Xi ), N ⊙ ρ is the family (Ti , Xi , ri ) where the ri are new independent random variables mutually independent and identically distributed with law ρ, ˆ A, ˆ P). ˆ So N ⊙ ρ is a Poisson random defined on an auxiliary probability space (Ω, measure on [0, +∞[×X × R.

3.4. Main result The above candidate may be shown to extend in a true gradient for the upper structure. The argument is based on the extension of the pregenerator A0 thanks to Friedrichs’ property (cf. for instance [7] p. 4): A0 is shown to be well defined on D0 which is dense, A0 is non-positive and symmetric and therefore possesses a selfadjoint extension. Before stating the main theorem, let us introduce some notation. We denote by D the completion of D0 ⊗ L2 ([0, +∞[, dt) ⊗ d with respect to the norm  +∞ 

21 − H D = E ε (γ[H])(w, t, u)N (dt, du) 0

+E



 = E

0

0

+E

X

+∞





+∞ 

0

+∞

X

X



(ε− |H|)(w, t, u)η(t, u)N (dt, du)

12 γ[H](w, t, u)ν(du)dt

X

|H|(w, t, u)η(t, u)ν(du)dt,

where η is a fixed positive function in L2 (R+ × X, dt × dν).

Dirichlet Forms for Poisson Measures and L´evy Processes

11

As we shall see below, a peculiarity of the method comes from the fact that it involves, in the computation, successively mutually singular measures, such as the measures PN = P1(dω)N(ω, dt, dx) and P1 × dt × ν. This imposes some care in the applications.

Main theorem. The formula

∀F ∈ D,   F♯ = ∫_0^{+∞} ∫_{X×R} ε−((ε+F)♭) dN⊙ρ

extends from D0 to D; it is justified by the following decomposition:

F ∈ D  --(ε+ − I)-->  ε+F − F ∈ D  --(ε−((·)♭))-->  ε−((ε+F)♭) ∈ L²_0(PN × ρ)  --(∫ · d(N⊙ρ))-->  F♯ ∈ L²(P1 × P̂),

where each operator is continuous on the range of the preceding one, and where L²_0(PN × ρ) is the closed set of elements G of L²(PN × ρ) such that ∫_R G dρ = 0, PN-a.s. Furthermore, for all F ∈ D,

Γ[F] = Ê(F♯)² = ∫_0^{+∞} ∫_X ε−γ[ε+F] dN.   (2)

Let us explain the steps of a typical calculation applying this theorem. Let H = Φ(F1, . . . , Fn) with Φ ∈ C¹ ∩ Lip(R^n) and F = (F1, . . . , Fn) with Fi ∈ D. We have:

a) γ[ε+H] = Σ_{i,j} Φ′_i(ε+F) Φ′_j(ε+F) γ[ε+F_i, ε+F_j]   P × ν-a.e.;

b) ε−γ[ε+H] = Σ_{i,j} Φ′_i(F) Φ′_j(F) ε−γ[ε+F_i, ε+F_j]   PN-a.e.;

c) Γ[H] = ∫ ε−γ[ε+H] dN = Σ_{i,j} Φ′_i(F) Φ′_j(F) ∫ ε−γ[ε+F_i, ε+F_j] dN   P-a.e.
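Step c) is just a quadratic form in the gradient of Φ applied to the integrated matrix. The following is a small organisational sketch of that step only, with fabricated per-jump values of ε−γ[ε+F_i, ε+F_j] (the numbers and the two-functional setting are purely illustrative, not taken from the paper):

```python
# Sketch of step c): Gamma[H] = sum_ij Phi'_i(F) Phi'_j(F) * int eps- gamma[eps+ Fi, eps+ Fj] dN
# Each element of per_jump stands for the matrix eps- gamma[eps+ Fi, eps+ Fj]
# evaluated at one jump of the realisation -- fabricated values.

def gamma_H(grad_phi_at_F, per_jump_matrices):
    n = len(grad_phi_at_F)
    # integrate against N: sum the matrices over the jumps
    integrated = [[sum(c[i][j] for c in per_jump_matrices) for j in range(n)]
                  for i in range(n)]
    # quadratic form in the gradient of Phi at F
    return sum(grad_phi_at_F[i] * integrated[i][j] * grad_phi_at_F[j]
               for i in range(n) for j in range(n))

# Two functionals F1, F2 and three jumps; each matrix is symmetric non-negative.
per_jump = [
    [[0.25, 0.10], [0.10, 0.09]],
    [[0.04, 0.02], [0.02, 0.16]],
    [[0.01, 0.00], [0.00, 0.04]],
]
print(gamma_H([1.0, -2.0], per_jump))
```

The hard analytic content of the method is of course the computation of the per-jump matrices themselves, as in the examples of Section 2; the routine above only records how they are assembled.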

Let us finally remark that the lent particle formula (2) has been encountered previously by some authors for test functions (see, e.g., [17] before Prop. 8). Here it is proved on the whole domain D; this is essential in order to apply the method to SDEs and to exploit the full strength of the functional calculus of Dirichlet forms.

4. Applications

4.1. Sup of a stochastic process on [0, t]

The fact that the operation of taking the maximum is typically a Lipschitz operation makes it easy to apply the method. Let Y be a centered Lévy process as in §2.2. Let K be a càdlàg process independent of Y. We put H_s = Y_s + K_s.

Proposition. If ν(R\{0}) = +∞ and if P₁[sup_{s≤t} H_s = H₀] = 0, the random variable sup_{s≤t} H_s has a density.
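As a purely numerical illustration of this proposition (not part of the original argument), one can approximate a symmetric α-stable Lévy process on a grid with the Chambers–Mallows–Stuck sampler and inspect the empirical law of sup_{s≤t} Y_s; the parameter values, grid, and seed below are our own assumptions:

```python
import numpy as np

def symmetric_stable_increments(alpha, scale, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable variables."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    e = rng.exponential(1.0, size)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha))
    return scale * x

rng = np.random.default_rng(0)
alpha, t, n_steps, n_paths = 1.5, 1.0, 200, 2000
dt = t / n_steps
# self-similarity: an increment over dt is distributed as dt^{1/alpha} * stable(1)
incs = symmetric_stable_increments(alpha, dt ** (1.0 / alpha),
                                   (n_paths, n_steps), rng)
paths = np.cumsum(incs, axis=1)
sups = np.maximum(paths.max(axis=1), 0.0)  # include Y_0 = 0 in the supremum
frac_positive = np.mean(sups > 0)
print(frac_positive)
```

A histogram of `sups` shows a smooth profile, consistent with absolute continuity of the law of the supremum; this is of course only heuristic, since the discretized process has only finitely many jumps per path.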


N. Bouleau and L. Denis

As a consequence, any Lévy process starting from zero and immediately entering R*₊, whose Lévy measure dominates a measure ν which is infinite and satisfies the Hamza condition, is such that sup_{s≤t} X_s has a density. Let us recall that the Hamza condition (cf. Fukushima et al. [10], Chapter 3) gives a necessary and sufficient condition for the existence of a Dirichlet structure on L²(ν). Such a necessary and sufficient condition is only known in dimension one.

4.2. Regularity without Hörmander

Consider the following SDE driven by a two-dimensional Brownian motion:

    X¹_t = z₁ + ∫₀ᵗ dB¹_s
    X²_t = z₂ + ∫₀ᵗ 2X¹_s dB¹_s + ∫₀ᵗ dB²_s    (3)
    X³_t = z₃ + ∫₀ᵗ X¹_s dB¹_s + 2 ∫₀ᵗ dB²_s.

This diffusion is degenerate and the Hörmander conditions are not fulfilled. The generator is A = ½(U₁² + U₂²) + V and its adjoint is A* = ½(U₁² + U₂²) − V, with

    U₁ = ∂/∂x₁ + 2x₁ ∂/∂x₂ + x₁ ∂/∂x₃,   U₂ = ∂/∂x₂ + 2 ∂/∂x₃,   V = −∂/∂x₂ − ½ ∂/∂x₃.

The Lie brackets of these vector fields vanish and the Lie algebra is of dimension 2: the diffusion remains on the quadric of equation ¾x₁² − x₂ + ½x₃ − ¾t = C. Consider now the same equation driven by a Lévy process:

    Z¹_t = z₁ + ∫₀ᵗ dY¹_s
    Z²_t = z₂ + ∫₀ᵗ 2Z¹_{s−} dY¹_s + ∫₀ᵗ dY²_s
    Z³_t = z₃ + ∫₀ᵗ Z¹_{s−} dY¹_s + 2 ∫₀ᵗ dY²_s

under hypotheses on the Lévy measure such that the bottom space may be equipped with the carré du champ operator γ[f] = y₁² f′₁² + y₂² f′₂² satisfying the hypotheses yielding (EID). Let us apply the lent particle method in full detail. For α ≤ t,

    ε⁺_{(α,y₁,y₂)} Z_t = Z_t + ( y₁ ,  2Y¹_{α−} y₁ + 2y₁ ∫_{]α,t]} dY¹_s + y₂ ,  Y¹_{α−} y₁ + y₁ ∫_{]α,t]} dY¹_s + 2y₂ )
                       = Z_t + ( y₁ ,  2y₁ Y¹_t + y₂ ,  y₁ Y¹_t + 2y₂ ),

where we have used Y¹_{α−} = Y¹_α, because ε⁺ sends into P₁ × dt × ν classes. That gives

    γ[ε⁺Z_t] =
      ( y₁²    2y₁² Y¹_t              y₁² Y¹_t              )
      ( id     4y₁² (Y¹_t)² + y₂²     2y₁² (Y¹_t)² + 2y₂²   )
      ( id     id                     y₁² (Y¹_t)² + 4y₂²    )

and

    ε⁻γ[ε⁺Z_t] =
      ( y₁²    2y₁² (Y¹_t − ∆Y¹_α)              y₁² (Y¹_t − ∆Y¹_α)              )
      ( id     4y₁² (Y¹_t − ∆Y¹_α)² + y₂²       2y₁² (Y¹_t − ∆Y¹_α)² + 2y₂²     )
      ( id     id                               y₁² (Y¹_t − ∆Y¹_α)² + 4y₂²      ),


where id denotes the symmetry of the matrices. Hence

    Γ[Z_t] = Σ_{α≤t} (∆Y¹_α)²
      ( 1     2(Y¹_t − ∆Y¹_α)      (Y¹_t − ∆Y¹_α)    )
      ( id    4(Y¹_t − ∆Y¹_α)²     2(Y¹_t − ∆Y¹_α)²  )
      ( id    id                   (Y¹_t − ∆Y¹_α)²   )
    + (∆Y²_α)²
      ( 0  0  0 )
      ( 0  1  2 )
      ( 0  2  4 ).

With this formula we can reason, trying to find conditions for the determinant of Γ[Z_t] to be positive. For instance, if the Lévy measures of Y¹ and Y² are infinite, it follows that Z_t has a density as soon as

    dim L{ (1, 2(Y¹_t − ∆Y¹_α), (Y¹_t − ∆Y¹_α))ᵗ , (0, 1, 2)ᵗ : α ∈ J_T } = 3.

But Y¹ necessarily possesses jumps of different sizes, hence Z_t has a density on R³. It follows that the integro-differential operator

    Ãf(z) = ∫ [ f( z₁ + y₁ , z₂ + 2z₁y₁ + y₂ , z₃ + z₁y₁ + 2y₂ ) − f(z) − (f′₁(z), f′₂(z), f′₃(z)) · ( y₁ , 2z₁y₁ + y₂ , z₁y₁ + 2y₂ ) ] σ(dy₁ dy₂)

is hypoelliptic at order zero, in the sense that its semigroup P_t has a density. No lower bound is assumed on the growth of the Lévy measure near 0, as is assumed by some authors. This result implies that, for any Lévy process Y satisfying the above hypotheses, even a subordinated one in the sense of Bochner, the process Z is never a subordination of the Markov process X solution of equation (3) (otherwise it would live on the same manifold as the initial diffusion).
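The claim that the Brownian diffusion (3) stays on the quadric ¾x₁² − x₂ + ½x₃ − ¾t = C can be checked numerically with an Euler–Maruyama discretization (a sketch of our own; the step size, horizon, and starting point are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n = 1e-4, 10_000            # time step and number of steps (horizon t = 1)
x1, x2, x3 = 0.5, -0.3, 0.2     # arbitrary starting point (z1, z2, z3)

def quadric(x1, x2, x3, t):
    return 0.75 * x1 ** 2 - x2 + 0.5 * x3 - 0.75 * t

c0 = quadric(x1, x2, x3, 0.0)
for _ in range(n):
    db1, db2 = rng.normal(0.0, np.sqrt(dt), 2)
    # tuple assignment: every right-hand side uses the pre-step x1
    x1, x2, x3 = (x1 + db1,
                  x2 + 2.0 * x1 * db1 + db2,
                  x3 + x1 * db1 + 2.0 * db2)

drift = abs(quadric(x1, x2, x3, 1.0) - c0)
print(drift)  # O(sqrt(dt)) discretization noise, not exactly zero
```

Per step the Euler scheme changes the quadric by exactly ¾((∆B¹)² − ∆t), so the accumulated deviation is of order √(∆t) and shrinks as the grid is refined, in agreement with the exact invariance of the continuous-time diffusion.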

5. Application to SDE's driven by a Poisson measure

5.1. The equation we study

We consider another probability space (Ω₂, A₂, P₂) on which an Rⁿ-valued semimartingale Z = (Z¹, ..., Zⁿ) is defined, n ∈ N*. We adopt the following assumption on the bracket of Z and on the total variation of its finite variation part; it is satisfied if both are dominated by the Lebesgue measure uniformly:


Assumption on Z: There exists a positive constant C such that, for any square integrable Rⁿ-valued predictable process h,

    ∀t ≥ 0,   E[ ( ∫₀ᵗ h_s dZ_s )² ] ≤ C² E[ ∫₀ᵗ |h_s|² ds ].   (4)

We shall work on the product probability space (Ω, A, P) = (Ω₁ × Ω₂, A₁ ⊗ A₂, P₁ × P₂). For simplicity, we fix a finite terminal time T > 0. Let d ∈ N*; we consider the following SDE:

    X_t = x + ∫₀ᵗ ∫_X c(s, X_{s−}, u) Ñ(ds, du) + ∫₀ᵗ σ(s, X_{s−}) dZ_s,   (5)

where x ∈ R^d, c : R₊ × R^d × X → R^d and σ : R₊ × R^d → R^{d×n} satisfy the set of hypotheses below, denoted (R).

Hypotheses (R):

1. There exists η ∈ L²(X, ν) such that:
   a) for all t ∈ [0, T] and u ∈ X, c(t, ·, u) is differentiable with continuous derivative and

       ∀u ∈ X,   sup_{t∈[0,T], x∈R^d} |D_x c(t, x, u)| ≤ η(u);

   b) ∀(t, u) ∈ [0, T] × X, |c(t, 0, u)| ≤ η(u);
   c) for all t ∈ [0, T] and x ∈ R^d, c(t, x, ·) ∈ d and

       sup_{t∈[0,T], x∈R^d} γ[c(t, x, ·)](u) ≤ η²(u);

   d) for all t ∈ [0, T], all x ∈ R^d and u ∈ X, the matrix I + D_x c(t, x, u) is invertible and

       sup_{t∈[0,T], x∈R^d} | (I + D_x c(t, x, u))^{−1} × c(t, x, u) | ≤ η(u).

2. For all t ∈ [0, T], σ(t, ·) is differentiable with continuous derivative and

       sup_{t∈[0,T], x∈R^d} |D_x σ(t, x)| < +∞.

3. As a consequence of hypotheses 1 and 2 above, it is well known that equation (5) admits a unique solution X such that E[sup_{t∈[0,T]} |X_t|²] < +∞. We suppose that, for all t ∈ [0, T], the matrix I + Σ_{j=1}ⁿ D_x σ_{·,j}(t, X_{t−}) ∆Z^j_t is invertible and its inverse is bounded by a deterministic constant, uniformly with respect to t ∈ [0, T].

Remark. We have defined a Dirichlet structure (D, E) on L2 (Ω1 , P1 ). Now, we work on the product space, Ω1 × Ω2 . Using natural notations, we consider from now on that (D, E) is a Dirichlet structure on L2 (Ω, P). In fact, it is the product structure of (D, E) with the trivial one on L2 (Ω2 , P2 ) (see [7] ). Of course, all the properties


remain true. In other words, we only differentiate w.r.t. the Poisson noise and not w.r.t. the noise introduced by Z.

5.2. Spaces of processes and functional calculus

We denote by P the predictable sigma-field on [0, T] × Ω and we define the following sets of processes:

– H_D: the set of real-valued processes (H_t)_{t∈[0,T]} which belong to L²([0, T]; D), i.e., such that

    ‖H‖²_{H_D} = E[ ∫₀ᵀ |H_t|² dt ] + ∫₀ᵀ E(H_t) dt < +∞.

– H_{D,P}: the subspace of predictable processes in H_D.

– H_{D⊗d,P}: the set of real-valued processes H defined on [0, T] × Ω × X which are predictable and belong to L²([0, T]; D ⊗ d), i.e., such that

    ‖H‖²_{H_{D⊗d,P}} = E[ ∫₀ᵀ ∫_X |H_t|² ν(du) dt ] + ∫₀ᵀ ∫_X E(H_t(·, u)) ν(du) dt + E[ ∫₀ᵀ e(H_t) dt ] < +∞.

The main idea is to differentiate equation (5). To do that we need some functional calculus, which is given by the next proposition and proved by approximation:

Proposition 2. Let H ∈ H_{D⊗d,P} and G ∈ Hⁿ_{D,P}. Then:

1) The process

    X_t = ∫₀ᵗ ∫_X H(s, w, u) Ñ(ds, du),   t ∈ [0, T],

is a square integrable martingale which belongs to H_D and is such that the process X⁻ = (X_{t−})_{t∈[0,T]} belongs to H_{D,P}. The gradient operator satisfies, for all t ∈ [0, T],

    X♯_t(w, ŵ) = ∫₀ᵗ ∫_X H♯(s, w, u, ŵ) Ñ(ds, du) + ∫₀ᵗ ∫_{X×R} H♭(s, w, u, r) N ⊙ ρ(ds, du, dr).   (6)

2) The process

    Y_t = ∫₀ᵗ G(s, w) dZ_s,   t ∈ [0, T],

is a square integrable semimartingale which belongs to H_D, Y⁻ = (Y_{t−})_{t∈[0,T]} belongs to H_{D,P}, and

    ∀t ∈ [0, T],   Y♯_t(w, ŵ) = ∫₀ᵗ G♯(s, w, ŵ) dZ_s.   (7)


5.3. Computation of the carré du champ matrix of the solution

Applying the standard functional calculus related to Dirichlet forms, the previous proposition and a Picard iteration argument, we obtain:

Proposition 3. Equation (5) admits a unique solution X in H_D^d. Moreover, the gradient of X satisfies:

    X♯_t = ∫₀ᵗ ∫_X D_x c(s, X_{s−}, u) · X♯_{s−} Ñ(ds, du) + ∫₀ᵗ ∫_{X×R} c♭(s, X_{s−}, u, r) N ⊙ ρ(ds, du, dr) + ∫₀ᵗ D_x σ(s, X_{s−}) · X♯_{s−} dZ_s.

Let us define the R^{d×d}-valued process U by

    dU_s = Σ_{j=1}ⁿ D_x σ_{·,j}(s, X_{s−}) dZ^j_s,

and the derivative of the flow generated by X:

    K_t = I + ∫₀ᵗ ∫_X D_x c(s, X_{s−}, u) K_{s−} Ñ(ds, du) + ∫₀ᵗ dU_s K_{s−}.

Proposition 4. Under our hypotheses, for all t ≥ 0 the matrix K_t is invertible and its inverse K̄_t = (K_t)^{−1} satisfies:

    K̄_t = I − ∫₀ᵗ ∫_X K̄_{s−} (I + D_x c(s, X_{s−}, u))^{−1} D_x c(s, X_{s−}, u) Ñ(ds, du) − ∫₀ᵗ K̄_{s−} dU_s + Σ_{s≤t} K̄_{s−} (∆U_s)² (I + ∆U_s)^{−1} + ∫₀ᵗ K̄_s d⟨Uᶜ, Uᶜ⟩_s.

We are now able to calculate the carré du champ matrix. This is done in the next theorem, whose proof is sketched to show how simple the lent particle method is.

Theorem 5. For all t ∈ [0, T],

    Γ[X_t] = K_t ( ∫₀ᵗ ∫_X K̄_s γ[c(s, X_{s−}, ·)](u) K̄*_s N(ds, du) ) K*_t.

Proof. Let (α, u) ∈ [0, T] × X. We put X^{(α,u)}_t = ε⁺_{(α,u)} X_t. Then

    X^{(α,u)}_t = x + ∫₀^α ∫_X c(s, X^{(α,u)}_{s−}, u′) Ñ(ds, du′) + ∫₀^α σ(s, X^{(α,u)}_{s−}) dZ_s + c(α, X^{(α,u)}_{α−}, u) + ∫_{]α,t]} ∫_X c(s, X^{(α,u)}_{s−}, u′) Ñ(ds, du′) + ∫_{]α,t]} σ(s, X^{(α,u)}_{s−}) dZ_s.

Let us remark that X^{(α,u)}_t = X_t if t < α, so that, taking the gradient with respect to the variable u, we obtain:

    (X^{(α,u)}_t)♭ = (c(α, X_{α−}, u))♭ + ∫_{]α,t]} ∫_X D_x c(s, X^{(α,u)}_{s−}, u′) · (X^{(α,u)}_{s−})♭ Ñ(ds, du′) + ∫_{]α,t]} D_x σ(s, X^{(α,u)}_{s−}) · (X^{(α,u)}_{s−})♭ dZ_s.

Let us now introduce the process K^{(α,u)}_t = ε⁺_{(α,u)}(K_t), which satisfies the following SDE:

    K^{(α,u)}_t = I + ∫₀ᵗ ∫_X D_x c(s, X^{(α,u)}_{s−}, u′) K^{(α,u)}_{s−} Ñ(ds, du′) + ∫₀ᵗ dU^{(α,u)}_s K^{(α,u)}_{s−},

and its inverse K̄^{(α,u)}_t = (K^{(α,u)}_t)^{−1}. Then, using the flow property, we have:

    ∀t ≥ 0,   (X^{(α,u)}_t)♭ = K^{(α,u)}_t K̄^{(α,u)}_α (c(α, X_{α−}, u))♭.

Now we calculate the carré du champ and then we take back the particle:

    ∀t ≥ 0,   ε⁻_{(α,u)} γ[X^{(α,u)}_t] = K_t K̄_α γ[c(α, X_{α−}, ·)] K̄*_α K*_t.

Finally, integrating with respect to N, we get

    ∀t ≥ 0,   Γ[X_t] = K_t ( ∫₀ᵗ ∫_X K̄_s γ[c(s, X_{s−}, ·)](u) K̄*_s N(ds, du) ) K*_t.  □
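In a scalar toy case, a pure-jump equation X_t = x + Σ_{s≤t} σ(X_{s−}) ∆Y_s with finitely many jumps and bottom-space operator γ[f](u) = u² f′(u)², the formula of Theorem 5 reduces to Γ[X_t] = K_t² Σᵢ K̄²_{Tᵢ} σ(X_{Tᵢ−})² uᵢ², and it can be cross-checked against the chain-rule expression Σᵢ uᵢ² (∂X_t/∂uᵢ)² by finite differences. Everything below (the choice of σ and the jump data) is our own illustration, not from the text:

```python
import math

sigma = lambda x: math.sin(x) + 2.0      # jump coefficient, bounded away from 0
dsigma = lambda x: math.cos(x)           # its derivative

def flow(us, x0=1.0):
    """Propagate X through the jumps; return X_t, K_t and the lent-particle terms."""
    x, K, terms = x0, 1.0, []
    for u in us:
        x_pre = x
        x = x_pre + sigma(x_pre) * u                   # jump of X
        K = K * (1.0 + dsigma(x_pre) * u)              # derivative of the flow
        terms.append((sigma(x_pre) / K) ** 2 * u * u)  # Kbar_{T_i}^2 sigma^2 u_i^2
    return x, K, terms

us = [0.5, -0.2, 0.8]                    # assumed jump sizes of Y before time t
xT, KT, terms = flow(us)
gamma = KT ** 2 * sum(terms)             # Theorem 5, scalar case

# chain-rule cross-check: gamma should equal sum_i u_i^2 (dX_t/du_i)^2
eps = 1e-6
fd = 0.0
for i, u in enumerate(us):
    up, dn = us.copy(), us.copy()
    up[i] += eps
    dn[i] -= eps
    d = (flow(up)[0] - flow(dn)[0]) / (2.0 * eps)
    fd += u * u * d * d

print(gamma, fd)  # the two values agree up to finite-difference error
```

The agreement reflects the identity ∂X_t/∂uᵢ = σ(X_{Tᵢ−}) K_t K̄_{Tᵢ}, i.e., the flow-derivative factorization used in the proof above.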

5.4. First application: the regular case

An immediate consequence of the previous theorem is:

Proposition 6. Assume that X is a topological space and that the intensity measure ds × ν of N is such that ν has infinite mass near some point u₀ in X. If the matrix (s, y, u) ↦ γ[c(s, y, ·)](u) is continuous on a neighborhood of (0, x, u₀) and invertible at (0, x, u₀), then the solution X_t of (5) has a density for all t ∈ ]0, T].


5.5. Application to SDE's driven by a Lévy process

Let Y be a Lévy process with values in R^d, independent of another variable X₀. We consider the following equation:

    X_t = X₀ + ∫₀ᵗ a(X_{s−}, s) dY_s,   t ≥ 0,

where a : R^k × R₊ → R^{k×d} is a given map.

Proposition 7. We assume that:

1) The Lévy measure ν of Y satisfies the hypotheses of the example given in Section 2.4 with ν(O) = +∞ and ξ_{i,j}(x) = x_i δ_{i,j}. Then we may choose the operator γ to be

    γ[f] = (ψ(x)/k(x)) Σ_{i=1}^d x_i² (∂_i f)²   for f ∈ C₀^∞(R^d).

2) a is C¹ ∩ Lip with respect to the first variable, uniformly in s, and

    sup_{t,x} |(I + D_x a · u)^{−1}(x, t)| ≤ η(u),

where η ∈ L²(ν).

3) a is continuous with respect to the second variable at 0, and the matrix aa*(X₀, 0) is invertible.

Then, for all t > 0, the law of X_t is absolutely continuous w.r.t. the Lebesgue measure.

Proof. We just give an idea of the proof in the case d = 1. Let us recall that γ[f](u) = (ψ(u)/k(u)) u² f′(u)². We have the representation Y_t = ∫₀ᵗ ∫_R u Ñ(ds, du), so that

    X_t = X₀ + ∫₀ᵗ ∫_R a(s, X_{s−}) u Ñ(ds, du).

The lent particle method yields:

    Γ[X_t] = K_t² ∫₀ᵗ ∫_X K̄_s² a²(s, X_{s−}) γ[j](u) N(ds, du),

where j is the identity map: γ[j](u) = (ψ(u)/k(u)) u². So

    Γ[X_t] = K_t² ∫₀ᵗ ∫_X K̄_s² a²(s, X_{s−}) (ψ(u)/k(u)) u² N(ds, du) = K_t² Σ_{s≤t} K̄_s² a²(s, X_{s−}) (ψ(∆Y_s)/k(∆Y_s)) ∆Y_s²,

… α > 0 and p₀ > 0. Pinsky [10] and Zhang [13] studied both existence and non-existence of positive global solutions to the above equation when the sum in the reaction term is of the form a(x)u^p with p > 1, p₀ does not depend on x, and p₁ = 0. Using a probabilistic approach, in [2] we investigated existence of positive global solutions to semilinear equations of the form

    ∂u/∂t = ∆_α u + Σ_{k=2}^∞ p_k u^k,   (2)


S. Chakraborty, E.T. Kolkovska and J.A. L´ opez-Mimbela

where p_k ≥ 0 for all k ≥ 2 and Σ_k p_k = 1. Both equations (1) and (2) admit the probabilistic representation (see [8], [9] or [2])

    u(t, x) = E[ e^{L_t} Π_{z ∈ supp X_t^x} u(0, z) ],

where X_t^x is a branching α-stable motion in R^d (with exponential individual lifetimes and offspring number Z such that P[Z = k] = p_k) starting from an ancestor at x, and L_t is the total length of the family tree up to time t. The approach developed in [2] works not only for the α-Laplacian but also for a wider class of Markov generators. In the framework of the above representation, for suitably decaying initial values u(0) := f, it is the exponential of the tree length that makes the solution u grow. Hence, in order to get a global solution it is necessary that the motion process be mobile enough to quickly populate the regions where f is near 0. Therefore, in order to know for which values of the equation parameters (2) admits global solutions, a careful analysis of the underlying branching model is in order. For simpler semilinear equations, this was successfully achieved in a number of previous papers (see, e.g., [11, 9, 2]). At the level of the branching model, (1) corresponds to a branching α-stable motion in which the branching rate depends on the space variable in such a way that it can grow and need not be bounded away from 0.¹ These circumstances pose serious technical problems for applying the probabilistic approach directly, and therefore we have to proceed in a more analytical way. In Section 2 we prove existence and uniqueness of mild solutions to Equation (1) on R^d × [0, t_max), where [0, t_max) is a maximal existence interval (see Proposition 2). Typically, the “life span” t_max of a positive solution u depends on its initial value u(·, 0); when t_max = +∞ the corresponding solution is called global. Our main result, Theorem 4 below, gives conditions on the equation parameters and on the decay of the initial value of (1) which are sufficient for the existence of a nontrivial positive global solution to (1). Our proof of Theorem 4 relies on a comparison procedure (which is based on Proposition 3; see Section 2) and is carried out in Section 3.
The final section is devoted to proving a technical lemma which is necessary to implement the comparison method.

2. Preliminary results

We denote by |·| the usual Euclidean norm in R^d, and by ‖·‖_∞ the norm in L^∞(R^d). For any T > 0, let C_B(R^d × [0, T]) be the space of bounded continuous functions on R^d × [0, T], endowed with the supremum norm. Let us write {S_t, t ≥ 0} for the semigroup of the spherically symmetric α-stable process in R^d,

¹ Other relevant aspects of related branching models featuring this form of spatially-inhomogeneous branching have been investigated, e.g., in [5] and [6]; see also [3].
Stability of a Nonlinear Equation

namely,

    S_t g(x) = E[g(x + W_t)] = ∫_{R^d} g(y) p_t^{(α)}(y − x) dy,

where g : R^d → R is bounded and measurable, and {W_t, t ≥ 0} is the standard d-dimensional symmetric stable process with transition densities p_t^{(α)}(·), t > 0. When α = 2, the infinitesimal generator ∆_α of {S_t, t ≥ 0} is the d-dimensional Laplacian. When 0 < α < 2, the generator is given by

    ∆_α f(x) = γ_{−α,d} PV ∫_{R^d} (f(x + y) − f(x)) / |y|^{d+α} dy,   f ∈ C_b²(R^d),

where γ_{α,d} is given in (11) and PV stands for principal value; see [1]. Let ω(x) = (1 + |x|)^m, x ∈ R^d, where 0 ≤ m < α, and

    L^∞_ω(R^d) = { f ∈ L^∞(R^d) : ‖fω‖_∞ < +∞ }.

We put ‖f‖_ω ≡ ‖fω‖_∞ when f ∈ L^∞_ω(R^d). We also denote

    L^∞([0, T], L^∞_ω(R^d)) = { F : [0, T] → L^∞_ω(R^d) : ‖F‖_{L^∞[0,T]} < +∞ },

with

‖F‖_{L^∞[0,T]} := ess sup_{0≤t≤T} ‖F(t)‖_ω.

Lemma 1. Let T > 0 and α ∈ (0, 2]. There exists a constant C₁(T) > 0 such that for all t ∈ [0, T] and all f ∈ L^∞_ω(R^d),

    (S_t |f|)(x) ≤ (2^m + C₁(T)) ‖f‖_ω ω^{−1}(x).

(α)

Proof. Recall that t^{d/α} p_t^{(α)}(t^{1/α} x) = p_1^{(α)}(x) for all t > 0 and x ∈ R^d. Therefore

    S_t|f|(x) = ∫_{R^d} p_t^{(α)}(x − y) |f(y)| dy
             ≤ ‖f‖_ω ∫_{R^d} p_t^{(α)}(x − y) ω^{−1}(y) dy
             = ‖f‖_ω ∫_{R^d} p_t^{(α)}(t^{1/α} η) (1 + |x − t^{1/α} η|)^{−m} t^{d/α} dη
             = ‖f‖_ω ( ∫_{R₁} + ∫_{R₂} ) p_1^{(α)}(η) (1 + |x − t^{1/α} η|)^{−m} dη
                 ( R₁ := { η ∈ R^d : |x − t^{1/α} η| ≤ |x|/2 },  R₂ := R^d \ R₁ )
             ≤ ‖f‖_ω ( ∫_{R₁} p_1^{(α)}(η) dη + (1 + |x|/2)^{−m} ∫_{R^d} p_1^{(α)}(η) dη )
             ≤ ‖f‖_ω ( ∫_{R₁} p_1^{(α)}(η) dη + 2^m ω^{−1}(x) ).   (3)



If |x| > 0, using that |x| − t1/α |η| ≤ |x − t1/α η| ≤ |x|/2 on R1 , we obtain    (α) (α) (α) p1 (η) dη ≤ p1 (η) dη, I1 := p1 (η) dη ≤ |η|≥

R1

|x| −1/α 2 t

|η|≥

|x| −1/α 2 T

hence, % & % & |x| −1/α |x|m 2m T m/α E|W1 |m I1 ≤ Pr |W1 | ≥ T , = Pr |W1 |m ≥ m T −m/α ≤ 2 2 |x|m where E|W1 |m < ∞ due to α > m. Since I1 ≤ 22m T m/α E|W1 |m ω −1 (x) for |x| ≥ 1, and I1 ≤ 1 ≤ 2m ω −1 (x) if |x| < 1, it follows that I1 ≤ 4m T m/α E|W1 |m ω −1 (x)1{|x|≥1} + 2m ω −1 (x)1{|x| 0, u ∈ CB (Rd × [0, T ′ ]) for any 0 < T ′ < T , and   t ∞  u(x, t) = St f (x) + St−s a(·) pk uk (·, s) + (p0 + p1 u(·, s))φ(·) (x) ds 0

k=2

for each (x, t) ∈ Rd × [0, T ). We define a mild upper (lower) solution to (1) by replacing “=” above by “≥” (“≤”). In the proofs of Propositions 2 and 3 below we follow a method of [12]. Proposition 2. Let d > α. Consider the inhomogeneous equation (1), where 1. 0 ≤ a(x) ≤ c1 (1+|x|)m , x ∈ Rd , for some constants c1 > 0 and m ∈ (−∞, α), and 2. 0 ≤ φ(x) ≤ c2 (1 + |x|)−q , x ∈ Rd , for certain constants c2 > 0 and q ∈ (α, d].

d If f ∈ L∞ ω (R ) is continuous, nonnegative and satisfies ∞ 

k=2 ∞ 

k=2

' (k pk f ω (3 + 2m+2 ) < ∞,

(5)

' (k−1 kpk f ω (3 + 2m+2 ) < ∞,

(6)

then there exists a constant Tf > 0 such that Equation (1) has a unique mild solution on Rd × [0, Tf ). Moreover, if Tf < ∞, then limT →T − u(·, T ) ∞ = +∞. f



Proof. We deal only with the case of m ≥ 0; if m < 0 the function a is bounded by a constant and the result can be obtained in the same way as for m = 0. Let us define the operator  ∞  t  St−s a F (u) = St f + pk uk + (p0 + p1 u)ϕ ds. (7) 0

k=2

We denote % & ∞ ∞ d = u ∈ L ([0, T ], Lω (R )) : sup u(·, t) − f ω ≤ δ ,

BδT (f )

0≤t≤T

where δ > 0 and T > 0 are going to be specified afterwards. For any u ∈ BδT (f ), |F (u) − f |(x, t)

≤ |f (x)| + St |f |(x) #  t ∞  (α) pt−s (x − y) c1 (1 + |y|)m + pk |u(y, s)|k Rd

0

k=2

$ + c2 (p0 + p1 |u(y, s)|)(1 + |y|)−m dy ds

≤ f ω ω(x)−1 (1 + C1 (T ) + 2m ) + I, where we have used Lemma 1, and    t

(α)

I :=

0





 t 0

 t 0

Rd

Rd

Rd

pt−s (x − y) c1 (1 + |y|)m

(α) pt−s (x

(α) pt−s (x



k=2

pk |u(y, s)|k

( +c2 (p0 + p1 |u(y, s)|)(1 + |y|)−m dy ds

− y) c1 (1 + |y|)m 

∞ 

∞ 

k=2

pk u(·, s) kω ω −k (y)

( +c2 (p0 + p1 u(·, s) ω ω −1 (y))(1 + |y|)−m dy ds

− y) c1

∞ 

k=2

pk u(·, s) kω + c2 (p0 + p1 u(·, s) ω ) ω −1 (y) dy ds.

Noting that u(·, t) ω ≤ f ω + δ for 0 ≤ t ≤ T , from Lemma 1 we get  t (α) pt−s (x − y)ω −1 (y) dy ds ≤ C2 (2m + C1 (T ))T ω −1(x), I ≤ C2 Rd

0

where C2 = c1



k=2

pk ( f ω + δ)k + c2 (p0 + p1 ( f ω + δ)). Therefore,

F (u) − f ω ≤ f ω (1 + 2m + C1 (T )) + C2 T (2m + C1 (T )) ,

(8)



where C1 (T ) is given by (4). In a similar way, for any u, v ∈ BδT (f ), |F (u) − F (v)|(x, t)   t ∞  (α) pk |uk − vk |(y, s) ≤ pt−s (x − y) c1 (1 + |y|)m 0





Rd

k=2

 t

(α) pt−s (x

 t

(α) pt−s (x

0

0

Rd

Rd

× u − v ω ω



( +c2 p1 |u − v|(y, s)(1 + |y|)−m dy ds

− y) c1 (1 + |y|)m 

k=2

  kpk |u|k−1 + |v|k−1 (y, s)

( +c2 p1 ω −1 (y) |u − v|(y, s) dy ds

− y) c1

−1

∞ 

∞ 

k=2

(y) dy ds.

  + c2 p1 kpk u(·, s) k−1 + v(·, s) k−1 ω ω

Since max{ u(·, t) ω , v(·, t) ω } ≤ f ω + δ for 0 ≤ t ≤ T , using again Lemma 1 gives |F (u) − F (v)|(x, z) ≤ C3 (2m + C1 (T )) T ω −1 (x) u − v ω , k−1 where C3 = 2c1 ∞ + c2 p1 . Hence, k=2 kpk ( f ω + δ) F (u) − F (v) ω ≤ C3 (2m + C1 (T ))T u − v ω .

Let us now choose δ = 2(2m+1 + 1) f ω and " ! ) * −α/m T < min 2−α (E|W1 |m ) , f ω 2−(m+1) + 1 C2−1 , 2−(m+1) C3−1 := Tf .

Then C2 < ∞ and C3 < ∞ due to assumptions (5) and (6), whereas C1 (T ) = 2m . It follows that F (u) ∈ BδT (f ) and that F is a contraction on BδT (f ). Therefore F possesses a unique fixed point in BδT (f ), which is the unique mild solution to Equation (1) on Rd × [0, Tf ). The fact that limT →T − u(·, T ) ∞ = +∞ when f Tf < ∞, follows from a ladder argument.  Remark. Notice that, due to k pk = 1, assumptions (5) and (6) are trivially fulfilled when f ω < (3 + 2m+2 )−1 . Proposition 3. Under the assumptions in Proposition 2, suppose that u and u are, respectively, upper and lower mild solutions of (1) on Rd × [0, T ). Then the unique mild solution of (1) on Rd × [0, Tf ) satisfies the inequalities u ≤ u ≤ u on Rd × [0, T ) for each T ≤ Tf . Proof. Let us take T+ = min{T, Tf }, where Tf is given in Proposition 2. From the proof of Proposition 2 we know that there exists a T1 ≤ T+ such that the T1 T1 operator F is a contraction !on BδT1 (f ) and maps " Bδ (f ) into Bδ (f ), where δ =  m+2  f ω 2 + 2 . Let B = v ∈ BδT1 (f ) : v ≤ u . Since F is increasing, F (v) ≤

F (u) for any v ∈ BδT1 (f ). In addition, F (u) ≤ u because u is a supersolution



to (1). Hence F : B → B, and therefore, F has a unique fixed point in B. Since u is the unique fixed point of F in BδT1 (f ), it follows that u ∈ B and u ≤ u on Rd × [0, T1 ]. By a ladder argument, one can prove that u ≤ u on Rd × [T1 , T2 ] for some T+ ≥ T2 > T1 , and so on. The assertion regarding the inequality u ≤ u can be proved similarly. 

3. Main theorem Theorem 4. Assume that d > α, and let pk , k = 0, 1, . . . , be nonnegative numbers such that ∞ k=0 pk = 1 and 0 < p0 , p1 < 1. Consider the inhomogeneous equation (1), where 1. 0 ≤ a(x) ≤ c1 (1+|x|)m , x ∈ Rd , for some constants c1 > 0 and m ∈ (−∞, α), 2. 0 ≤ φ(x) ≤ c2 (1 + |x|)−q , x ∈ Rd , for certain constants c2 > 0 and q ∈ (α, d], 3. q − α > (α + m)+ .   Let δ ∈ (0, q − α), ǫ > q − α − (α + m)+ and µ ∈ 0, (2c1 γα,d C 2 (δ) + C(δ))−1 , where γα,d > 0 is given) by (11) and C(δ) > 0 ) is given in Lemma*5 below. * −1

−1

Assume also that p0 ∈ 0, µ (4γα,d c2 ) , p1 ∈ 0, (4γα,d C(δ)c2 ) and let  , - m+2 −1 c ∈ 0, min µC∗ , (3 + 2 ) , where C∗ is given in Lemma 5 below. Then, for all continuous initial values f satisfying 0 ≤ f (x) ≤ c (1 + |x|)

−(α+m)+ −ǫ

,

the mild solution to (1) is global.

x ∈ Rd ,

Our method of proof is an adaptation of a technique used in [7] and [10]. We use the following lemma (which is proved in the final section of the present paper). Lemma 5. There exists a C∗ > 0 and for all δ > 0, a C(δ) > 0 such that,  (1 + |y|)−q dy ≤ C(δ)(1 + |x|)α−q+δ , q ∈ (α, d]. C∗ (1 + |x|)α−q ≤ d−α Rd |x − y| For a given constant µ > 0 we define  (1 + |y|)−q v(x) = µ dy. d−α Rd |x − y|

(10)

Recall [1] that the Green’s function G of the linear operator −∆α satisfies γα,d G(x, y) = where γα,d = Γ((d − α)/2)/(2α π d/2 |Γ(α/2)|). |x − y|d−α

Lemma 6. Let v be given by (10). There exists a δ > 0 such that ∞  ∆α v + a(x) pk vk + (p0 + p1 v)φ(x) ≤



k=2



 −µ + c1 pk C(δ)k µk + (p0 + p1 C(δ)µ)c2 γα,d k=2

(9)



(1 + |x|)−q .

(11)

(12)



Proof. This follows from Inequality (9). Indeed, from (11) we get that µ (1 + |x|)−q , ∆α v(x) = − γα,d (see, e.g., [4], Chapter 2), and by assumption p0 φ(x) ≤ p0 c2 (1 + |x|)−q . Using Lemma 5 we obtain that, for any δ > 0, a(x)

∞ 

k=2

pk v k ≤ c1

∞ 

k=2

pk C(δ)k µk (1 + |x|)m+k(α−q+δ) .

Let δ ∈ (0, q − α). Since the inequality m + k(α − q + δ) ≤ m + 2(α − q + δ) is valid for all k ≥ 2, to prove (12) it suffices to show that m + 2(α − q + δ) ≤ −q.

(13)

If m + α < 0, then we choose δ > 0 so small that m + α + δ < 0, and thus m + 2(α − q + δ) ≤ −q. In case of m + α ≥ 0, since q − α > (m + α)+ = m + α, we can choose δ > 0 so small that m + α + (α − q) + 2δ is negative. This proves (13). In addition, because of α − q + δ < 0, p1 vφ(x) ≤ p1 c2 C(δ)µ(1 + |x|)α−q+δ−q ≤ p1 c2 C(δ)µ(1 + |x|)−q . Thus (12) is satisfied.  Proof of Theorem 4. Since by assumption δ ∈ (0, q−α), Lemma 5 implies c2 v(x) ≤ c2 µC(δ) for all x ∈ Rd . Therefore p1 c2 C(δ)µ < µ/(4γα,d ) because ) * −1 p1 ∈ 0, (4γα,d C(δ)c2 ) . Hence,



 −µ µ2 C(δ)2 2µ −µ + c1 pk C(δ)k µk + (p0 + p1 C(δ)µ)c2 < + c1 + q, 0 ≤ (18)    −(d−α) log(1 + |x|), d = q. b9 |x|

If in addition |x| ≤ 1,  |x|/2  −(d−α) −q d−1 −(d−α) |x| (1 + r) r dr ≤ |x| 0

|x|/2 0

rd−1 dr = b10 |x|α ≤ b10 . (19)

)d−α . When x ∈ D3 and |x| ≥ 1 we get from (18) that for Let κ = sup|x|≥1 ( 1+|x| |x| d > q,  |x|/2 b7 |x|−(d−α) (1 + r)−q rd−1 dr ≤ b8 κ(1 + |x|)α−q+δ , (20) 0

whereas for d = q,  b7 |x|−(d−α)

|x|/2

0

(1 + r)−q rd−1 dr ≤ b9 κ(1 + |x|)−(d−α) log(1 + |x|) < b11 (1 + |x|)

α−q+δ

(21)

.

Putting together (16), (17) and (19)–(21) gives the RHS inequality in (9). To prove the LHS inequality in (9) we notice that   (1 + |y|)−q dy −q ≥ b12 (1 + |x|) |x − y|α−d dy, (22) d−α |x − y| d + R D

+ = {y ∈ Rd : |x − y| < |x|/2}. This is because |x − y| < |x|/2 implies where D |y| < 3|x|/2, hence (1 + |x|)q (1 + |x|)q ≥ ≥ b13 , q (1 + |y|) (1 + 3|x|/2)q

yielding (1 + |y|)−q ≥ b13 (1 + |x|)−q . We use once again polar coordinates to show that the RHS of (22) equals  b12 (1 + |x|)−q |x − y|α−d dy {y: 0 s. Application of Itˆo’s formula together with the BSDE (27) imply:  s′  s′ 2 t,e 2 2 Yst,e −Yst,e F (r, Z ) dr+ Yrt,e −Yst,e n n ′ L (Tn ,Rn )  ′ L (Tn ,Rn ) dr. r L2 (T ,R ) 2 2 s

s

Gronwall's lemma implies that there exist constants γ̃ > 0 and γ > 0 such that

    ‖Y_s^{t,e} − Y_{s′}^{t,e}‖²_{L²(Tⁿ,Rⁿ)} ≤ γ̃ ∫_s^{s′} ‖F(r, Z_r^{t,e})‖²_{L²(Tⁿ,Rⁿ)} dr ≤ γ(s′ − s).

This implies that if p > 1 then

    ‖Y_s^{t,e} − Y_{s′}^{t,e}‖^{2p}_{L²(Tⁿ,Rⁿ)} ≤ γ^p |s − s′|^p.
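For the record (a standard consequence, spelled out by us rather than by the authors): taking expectations in the last bound gives exactly the moment condition of Kolmogorov's continuity criterion,

```latex
E\,\bigl\|Y^{t,e}_s - Y^{t,e}_{s'}\bigr\|^{2p}_{L^2(\mathbb{T}^n,\mathbb{R}^n)}
\;\le\; \gamma^p\,|s - s'|^{\,p},
\qquad p > 1,
```

with exponent excess p − 1 > 0, so Y^{t,e} admits a modification whose paths are Hölder continuous in L²(Tⁿ,Rⁿ) of any order β < (p−1)/(2p); letting p → ∞ yields every β < 1/2.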


A.B. Cruzeiro and E. Shamarova

By Kolmogorov's continuity criterion, Y_s^{t,e} has a continuous path modification with respect to the L²(Tⁿ,Rⁿ)-topology. The SDE (26) implies that Z_s^{t,e} has a continuous path modification in the L²(Tⁿ,Rⁿ)-topology as well. □

Lemmas 5 and 6 below characterize the deterministic nature of the process Y_s^{t,e} and describe its continuity properties.

Lemma 5. The map [0, T] × Tⁿ → Rⁿ, (t, θ) ↦ Y_t^{t,e}(θ), is deterministic, and the function [0, T] → H^α(Tⁿ, Rⁿ), t ↦ Y_t^{t,e}, is continuous.

Proof. The first statement is a consequence of Blumenthal's zero-one law and the fact that the random variable Y_t^{t,e} is F₀-measurable (as in Lemma 13 of [C-S] or Corollary 1.5 of [D]). The proof of the continuity of the map [0, T] → L²(Tⁿ, Rⁿ), t ↦ Y_t^{t,e}, follows as in Lemma 14 of [C-S]. Consider the FBSDEs below on the interval [0, T] with respect to (∇^l Z_s^{t,ξ}, ∇^l Y_s^{t,ξ}, ∇^l X_s^{t,ξ}):

    ∇^l Z_s^{t,ξ} = ∇^l ξ + ∫₀^s I_{[t,T]}(r) ∇^l Y_r^{t,ξ} dr   (31)

l



Yst,ξ

  = ∇h ZTt,ξ ( · ) ∇l ZTt,ξ + +

l  j=2

+



T

 ∇j h ZTt,ξ ( · ) I[t,T ]

s







l  j=2



s

T

3



s

T

  I[t,T ] ∇F r, Zrt,ξ ( · ) ∇l Zrt,ξ dr 

i1 +···+ij =l−j+1

 3 ∇j F r, Zrt,ξ ( · )

∇i1 ZTt,ξ . . . ∇ij ZTt,ξ 

i1 +···+ij =l−j+1

4

4 ∇i1 Zrt,ξ . . . ∇ij Zrt,ξ dr

∇l Xrt,ξ dWr

(32)

and note that its solution (∇l Zst,ξ , ∇l Yst,ξ , ∇l Xst,ξ ) can be obtained from the solution to (11), (12) by extending it to [0, t] as follows: ∇l Zst,ξ = ∇l ξ, ∇l Yst,ξ = ∇l Ytt,ξ , ∇l Xst,ξ = 0, s ∈ [0, t]. The extended triple solves the FBSDEs (31), (32) on [0, T ]. The same argument as in the proof of Lemma 14 of [C-S] implies that there exists a constant γ > 0 such that ′

    ‖∇^l Y_t^{t,ξ} − ∇^l Y_{t′}^{t′,ξ}‖_{L²(Tⁿ,R^{nˡ})} ≤ γ|t − t′|.

Therefore the map t ↦ Y_t^{t,ξ} is continuous with respect to the H^α(Tⁿ, Rⁿ)-topology. □

Lemma 6. Let the function y : [0, T] × Tⁿ → Rⁿ be defined by the formula

    y(t, θ) = Y_t^{t,e}(θ).   (33)

Forward-backward Stochastic System


Then, for every t ∈ [0, T ], there exists a set Ω′ ⊂ Ω of full P-measure, so that for all u ∈ [t, T ], for all ω ∈ Ω′ the following relation holds: Yut,e = y(u, · ) ◦ Zut,e .

(34)

Proof. Note that (25) implies that if ξ is Ft -measurable then Ytt,ξ = y(t, · ) ◦ ξ.

(35)

(Zst,e , Yst,e , Xst,e )

Further, for each fixed u ∈ [t, T ], is a solution of the following problem on [u, T ]: √ s . t,e Zs = Zut,e + u Yrt,e dr + 2ν (Ws − Wu ), √ T   T   Yst,e = h ZTt,e ( · ) + s F r, Zrt,e ( · ) dr − 2ν s Xrt,edWr . t,e u,Zu

By the uniqueness of solution, it holds that Yst,e = Ys u,Z t,e Yu u

a.s. on [u, T ]. Next, by

Zut,e .

(35), we obtain that = y(u, · ) ◦ This implies that there exists a set Ωu (which depends on u) of full P-measure such that (34) holds everywhere on Ωu . Clearly, one can find a set ΩQ , P(ΩQ ) = 1, such that (34) holds on ΩQ for all rational u ∈ [t, T ]. But the trajectories of Zst,e and Yst,e are a.s. continuous with respect to L2 (Tn , Rn )-topology by Lemma 4. Furthermore y(t, · ) is continuous in t with respect to (at least) the L2 (Tn , Rn )-topology. Therefore, (34) holds a.s. with respect to the L2 (Tn , Rn )-topology. Since both sides of (34) are continuous in θ ∈ Tn it also holds a.s. for all θ ∈ Tn .  Finally the function y(s, · ) defined by (33) indeed verifies the Burgers equation. This is the content of the next lemma. Lemma 7. The function y defined by formula (33) is C 1 -smooth in t ∈ [0, T ], and is a solution of problem (1). Proof. Let δ > 0. We obtain: t+δ,e t,e t,e t+δ,e y(t + δ, · ) − y(t, · ) = Yt+δ − Ytt,e = Yt+δ − Yt+δ + Yt+δ − Ytt,e .

As before, let Gα be the group of H α -diffeomorphisms Tn → Tn , and let Yˆs be the right-invariant vector field on Gα generated by y(s, · ) (see [C-S]). Relation (34) allows us to represent the SDE (26) as an SDE on the manifold Gα . Indeed, by results of [G1] and [C-S], the SDE . t,e √ dZs = exp{Yˆs (Zst,e ) ds + 2ν dWs }, (36) Ztt,e = e where exp is the exponential map of the weak Riemannian metric on Gα (see [C-S]), has a unique Gα -valued solution. As it was proved in [C-S], the latter solution coincides with the unique solution of the H α (Tn , Rn )-valued SDE . t,e √ dZs = y(s, · ) ◦ Zst,e ds + 2ν dWs , Ztt,e = e.



Therefore, the Zst,e -part of the solution to (26), (27) is the unique solution to (36). t,e t,e By Lemma 6, a.s. Yt+δ = Yˆt+δ (Zt+δ ). Thus we obtain that a.s.  t,e  t,e y(t + δ, · ) − y(t, · ) = Yˆt+δ (e) − Yˆt+δ (Zt+δ ) + (Yt+δ − Ytt,e ). We use the BSDE (27) for the second difference and apply Itˆo’s formula to the first difference when considering Yˆt+δ as a C 2 -smooth function Gα → L2 (Tn , R2 ). We obtain:  t+δ t,e Yˆrt,e (Zrt,e )[Yˆt+δ (Zrt,e )] dr Yˆt+δ (Zt+δ ) − Yˆt+δ (e) = +







t

t+δ

t

n  ' i=1

( ¯ ei Yˆt+δ (Zrt,e ) dWr + 2ν ∇



t+δ

t

n  ' i=1

( ¯ 2e Yˆt+δ (Zrt,e ) dr ∇ i

¯ is the covariant derivative on Gα , ei are regarded as constant vector fields where ∇ α on G , and the expression Yˆrt,e (Zrt,e )[Yˆt+δ (Zrt,e )] has the same meaning as in [C-S]. We obtain:  t+δ t,e dr ∇y(r, · ) y(t + δ, · ) ◦ Zrt,e Yˆt+δ (Zt+δ ) − Yˆt+δ (e) = +



t

t

t+δ

dr ν ∆ y(t + δ, · ) ◦

Further we have: t,e Ytt,e − Yt+δ =



t+δ

t

Taking expectations implies:

Zrt,e

√  + 2ν

t+δ

t

dr F (r, · ) ◦ Zrt,e −

n  ' ( ¯ ei Yˆt+δ (Z t,e ) dWr . ∇ r i=1







t

t+δ

Xrt,e dWr .

  1 1 3 t+δ y(t + δ, · ) − y(t, · ) = − E dr [ (y(r, · ), ∇) y(t + δ, · ) δ δ t 4 + ν ∆ y(t + δ, · ) + F (r, · )] ◦ Zrt,e .

(37)

Note that Zrt,e , F (r, · ), and (y(r, · ), ∇) y(t + δ, · ) ◦ Zrt,e are continuous in r a.s. with respect to the L2 (Tn , R2 )-topology. By Lemma 5, ∇ y(t, · ) and ∆ y(t, · ) are continuous in t with respect to at least the L2 (Tn , R2 )-topology. Formula (37) and the fact that Ztt,e = e imply that in the L2 (Tn , R2 )-topology ∂t y(t, · ) = −[∇y(t, · ) y(t, · ) + ν ∆ y(t, · ) + F (t, · )].

(38)

Since the right-hand side of (38) is an H α−2 -map, so is the left-hand side. This implies that ∂t y(t, · ) is continuous in θ ∈ Tn . Therefore, (38) holds for any θ ∈ Tn . Relation (38) is obtained so far for the right derivative of y(t, θ) with respect to t. Note that the right-hand side of (38) is continuous in t which implies that the right derivative ∂t y(t, θ) is continuous in t on [0, T ). Hence, it is uniformly continuous on every compact subinterval of [0, T ). This implies the existence of the left derivative of y(t, θ) in t, and therefore, the existence of the continuous  derivative ∂t y(t, θ) everywhere on [0, T ].


Remark. Note that at the same time we have proved that the process $Z_s^{t,e}$ takes values in the group $G^\alpha$ of $H^\alpha$-diffeomorphisms $\mathbb T^n \to \mathbb T^n$.

Acknowledgement. The first author acknowledges the support of the Portuguese Foundation for Science and Technology through the project PTDC/MAT/69635/2006. The second author acknowledges the support of the Portuguese Foundation for Science and Technology through the Centro de Matemática da Universidade do Porto.

References
[A] V.I. Arnold, Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits, Ann. Inst. Fourier 16 (1966), 316–361.
[B] Ya.I. Belopolskaya, Yu.L. Dalecky, Stochastic Equations and Differential Geometry, Mathematics and its Applications, Kluwer Academic Publishers, Netherlands, 1989, 260 p.
[C-S-T-V] P. Cheridito, H. Mete Soner, N. Touzi and N. Victoir, Second order backward stochastic differential equations and fully nonlinear parabolic PDEs, Comm. Pure Appl. Math. 60 (2007), 1081–1110.
[C-S] A.B. Cruzeiro, E. Shamarova, Navier–Stokes equations and forward-backward SDEs on the group of diffeomorphisms of a torus, Stochastic Processes and their Applications 119 (2009), 4034–4060.
[D] F. Delarue, On the existence and uniqueness of solutions to FBSDEs in a non-degenerate case, Stoch. Proc. and their Appl. 99 (2002), 209–286.
[G] Yu.E. Gliklikh, Solutions of Burgers, Reynolds, and Navier–Stokes equations via stochastic perturbations of inviscid flows, Journal of Nonlinear Mathematical Physics 17, Suppl. 1 (2010), 15–29.
[G1] Yu.E. Gliklikh, Global Analysis in Mathematical Physics: Geometric and Stochastic Methods, Springer, 1997, 213 p.
[E-M] D.G. Ebin and J. Marsden, Groups of diffeomorphisms and the motion of an incompressible fluid, Ann. of Math. 92 (1970), 102–163.
[N-Y-Z] T. Nakagomi, K. Yasue, J.-C. Zambrini, Stochastic variational derivations of the Navier–Stokes equation, Lett. Math. Phys. 160 (1981), 337–365.
[Y] K. Yasue, A variational principle for the Navier–Stokes equation, J. Funct. Anal. 51 (2) (1983), 133–141.

Ana Bela Cruzeiro
Dep. de Matemática, IST-UTL and Grupo de Física Matemática da Universidade de Lisboa
e-mail: [email protected]

Evelina Shamarova
Centro de Matemática da Universidade do Porto
e-mail: [email protected]

Progress in Probability, Vol. 65, 61–71 © 2011 Springer Basel AG

On the Estimate for Commutators in DiPerna–Lions Theory
Shizan Fang and Huaiqian Lee
Abstract. In this note we exhibit estimates on commutators of semigroups, motivated by the commutator estimates in DiPerna–Lions theory.
Mathematics Subject Classification (2000). 60H30, 34A12, 60H10.
Keywords. Sobolev regularity, OU semigroup, heat semigroup, commutator, renormalized solutions.

1. Introduction
In 1989, R.J. DiPerna and P.-L. Lions proved in [4] that for a vector field $Z$ on $\mathbb R^d$ having only Sobolev regularity $W_1^p$, such that $\operatorname{div}(Z)\in L^\infty$ and $\int_{\mathbb R^d} \frac{|Z(x)|}{1+|x|}\,dx < +\infty$, the differential equation
\[
dX_t(x) = Z(X_t(x))\,dt \tag{1}
\]
has a unique solution $X_t(x)$ for almost all initial $x\in\mathbb R^d$; moreover, for each $t\ge 0$, the Lebesgue measure $dx$ on $\mathbb R^d$ admits a density $K_t(x)$ under $x\mapsto X_t(x)$; that is, the push-forward measure $(X_t)_*(dx)$ by $X_t$ is absolutely continuous with respect to the Lebesgue measure: $(X_t)_*(dx) = K_t(x)\,dx$. In the approximation procedure used to solve (1), instead of estimating the Jacobian of vector fields, they transformed the ordinary differential equation (1) into the transport equation
\[
\frac{du_t}{dt} + Z\cdot\nabla u_t = 0. \tag{2}
\]
We say that $u\in L^\infty([0,T], L^p(\mathbb R^d))$ solves (2) in the distribution sense if
\[
\int_{[0,T]\times\mathbb R^d} \bigl[-\alpha' F\,u_t - \alpha\operatorname{div}(FZ)\,u_t\bigr]\,dx\,dt = \int_{\mathbb R^d} \alpha(0)F\,u_0\,dx, \tag{3}
\]
where $\alpha\in C_c^\infty([0,T))$ and $F\in C_c^\infty(\mathbb R^d)$.


A solution in the distribution sense is generally difficult to handle. A useful concept in this respect is the notion of renormalized solutions: for any $\beta\in C_b^1(\mathbb R)$, $\beta(u_t)$ again solves (3). A basic result of the above-cited paper is

Theorem 1.1. Let $u_t^n = u_t * \chi_n$. Then $u_t^n$ satisfies
\[
\frac{du_t^n}{dt} + Z\cdot\nabla u_t^n = c_n(u_t, Z), \tag{4}
\]
where $c_n(f,Z) = (D_Z f)*\chi_n - D_Z(f*\chi_n)$ and
\[
\|c_n(f,Z)\|_{L^1} \le C\,\|f\|_{L^p}\bigl(\|\nabla Z\|_{L^p} + \|\operatorname{div}(Z)\|_{L^p}\bigr). \tag{5}
\]

Theorem 1.1 plays a key role in proving the existence and uniqueness of solutions to (2). By establishing the well-posedness of (2), they solved the differential equation (1). This concept of renormalized solutions was substantially pushed forward by L. Ambrosio [1]: he took advantage of the continuity equation
\[
\frac{\partial\mu_t}{\partial t} + D_x\cdot(Z\mu_t) = 0, \quad t>0, \qquad \mu|_{t=0} = \mu_0,
\]
and dealt successfully with vector fields of bounded variation only. The main purpose of this note is to establish this kind of estimate for the heat semigroup on Riemannian manifolds. The main result is Theorem 3.3 in Section 3. In order to contrast with the flat case, we include a short discussion of the Wiener space in Section 2.
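The behaviour captured by Theorem 1.1 can be illustrated numerically. The following sketch (not from the paper; all names and the discrete triangular mollifier are my own choices) evaluates $c(f,Z) = (D_Z f)*\chi - Z\cdot\nabla(f*\chi)$ on a periodic grid for smooth data and shows that the commutator shrinks as the mollifier support shrinks:

```python
import math

N = 512                      # grid points on the periodic interval [0, 2*pi)
h = 2 * math.pi / N
xs = [i * h for i in range(N)]

f = [math.sin(x) for x in xs]            # f(x)  = sin x
df = [math.cos(x) for x in xs]           # f'(x) = cos x
Z = [math.cos(x) for x in xs]            # Z(x)  = cos x
DZf = [Z[i] * df[i] for i in range(N)]   # D_Z f = Z f'

def mollify(g, w):
    """Convolve g with a discrete triangular (hat) kernel of half-width w cells."""
    ker = [w - abs(k) for k in range(-w + 1, w)]
    s = sum(ker)
    ker = [k / s for k in ker]
    return [sum(ker[j] * g[(i + j - w + 1) % N] for j in range(len(ker)))
            for i in range(N)]

def ddx(g):
    """Centered finite difference on the periodic grid."""
    return [(g[(i + 1) % N] - g[(i - 1) % N]) / (2 * h) for i in range(N)]

def commutator_sup(w):
    """sup-norm of c(f, Z) = (D_Z f)*chi - D_Z(f*chi), hat mollifier of half-width w."""
    lhs = mollify(DZf, w)
    rhs = [Z[i] * d for i, d in enumerate(ddx(mollify(f, w)))]
    return max(abs(lhs[i] - rhs[i]) for i in range(N))

c_small, c_large = commutator_sup(4), commutator_sup(32)
print(c_small, c_large)
```

For smooth $f$ and $Z$ the commutator is of order $\varepsilon^2$ in the mollifier width, so the narrow kernel yields a much smaller sup-norm than the wide one; the point of (5) is that an $L^1$ bound survives even when $Z$ only has Sobolev regularity.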

2. The case of the Wiener space
In the spirit of [1], Ambrosio and Figalli [2] treated vector fields $Z$ on the abstract Wiener space $(W,H,\mu)$. Let $P_t$ be the Ornstein–Uhlenbeck semigroup:
\[
P_t f(x) = \int_W f\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,d\mu(y).
\]
They proved that if $r_t(f,Z) = e^t\langle Z, \nabla P_t(f)\rangle - P_t(\operatorname{div}_\mu(fZ))$, then
\[
\|r_t(f,Z)\|_{L^1} \le 2\|f\|_{L^p}\Bigl[\frac{\Lambda(p)\,t}{\sqrt{1-e^{-2t}}}\,\|Z\|_{L^q} + \|\operatorname{div}_\mu(Z)\|_{L^q} + \|\nabla Z\|_{L^q}\Bigr].
\]
Notice first that the divergence is defined by
\[
\int_W f\operatorname{div}_\mu(Z)\,d\mu = -\int_W \langle\nabla f, Z\rangle_H\,d\mu.
\]
For $h\in H$,
\[
D_h P_t f = e^{-t} P_t(D_h f).
\]
Then $r_t(f,h) = -P_t(f\operatorname{div}(h))$. In this simple case:
\[
\|r_t(f,h)\|_{L^1} \le \|f\|_{L^p}\,\|\operatorname{div}(h)\|_{L^q}. \tag{6}
\]
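The Mehler-type formula defining $P_t$ lends itself to a direct Monte Carlo check. The following one-dimensional sketch (my own illustration; the paper's $W$ is infinite-dimensional) compares a sample average against the closed form $P_t(x^2) = e^{-2t}x^2 + (1-e^{-2t})$:

```python
import math, random

def ou_semigroup_mc(f, t, x, n=200_000, seed=0):
    """Monte Carlo evaluation of the Mehler formula
       P_t f(x) = E[ f(e^{-t} x + sqrt(1 - e^{-2t}) Y) ],  Y ~ N(0, 1)."""
    rng = random.Random(seed)
    a, b = math.exp(-t), math.sqrt(1 - math.exp(-2 * t))
    return sum(f(a * x + b * rng.gauss(0, 1)) for _ in range(n)) / n

# For f(x) = x^2 one has the closed form P_t f(x) = e^{-2t} x^2 + (1 - e^{-2t}).
t, x = 0.5, 1.0
exact = math.exp(-2 * t) * x**2 + (1 - math.exp(-2 * t))
approx = ou_semigroup_mc(lambda z: z * z, t, x)
print(abs(approx - exact))
```

With a couple of hundred thousand samples the standard error is of order $10^{-3}$, so the discrepancy should be small.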


But for $Z=h$ the term $\|Z\|_{L^q} = |h|_H$ does not vanish, so the first term on the right-hand side of (6) is an extra term compared with the above estimate. In the paper [9], we proved that if $c_t(f,Z) = D_Z P_t f - P_t(D_Z f)$, then
\[
\|c_t(f,Z)\|_{L^1} \le 2\|f\|_{L^p}\bigl(\|\nabla Z\|_{L^q} + \|\operatorname{div}_\mu(Z)\|_{L^q}\bigr). \tag{7}
\]
In order to emphasize the difference with the case of Riemannian manifolds, let us explain a bit of the proof of (7). Let $h\in H$. Then
\[
(D_h P_t f)(x) = e^{-t}\int_W \bigl\langle (\nabla f)(e^{-t}x + \sqrt{1-e^{-2t}}\,y),\ h\bigr\rangle_H\,d\mu(y)
= \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W f\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\delta h(y)\,d\mu(y),
\]
where $\delta h(y) = \langle h,y\rangle$ is a Gaussian random variable of variance $|h|_H^2$. If we replace $h$ by a vector field $Z\colon W\to H$, then
\[
(D_Z P_t f)(x) = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W f\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\langle Z_x, y\rangle\,d\mu(y). \tag{8}
\]
There is no convenient expression for $P_t(D_Z f)$. Recall that for $Z = \sum_i f_i h_i$, $\operatorname{div}_\mu(Z) = \sum_i\bigl(-f_i\,\delta(h_i) + D_{h_i} f_i\bigr)$. We have
\[
P_t(D_{h_i} f_i)(x) = \int_W D_{h_i} f_i\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,d\mu(y)
= \frac{1}{\sqrt{1-e^{-2t}}}\int_W f_i\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\langle h_i, y\rangle\,d\mu(y),
\]
and
\[
P_t(f_i\,\delta(h_i)) = \int_W f_i\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\bigl\langle h_i,\ e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr\rangle\,d\mu(y).
\]
Remark that
\[
\frac{y}{\sqrt{1-e^{-2t}}} - \bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr) = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\bigl(-\sqrt{1-e^{-2t}}\,x + e^{-t}y\bigr).
\]
Let $\tilde Z(x,y) = \langle Z_x, y\rangle = \delta(Z_x)(y)$; then
\[
P_t(\operatorname{div}_\mu(Z))(x) = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W \tilde Z(O_t(x,y))\,d\mu(y), \tag{9}
\]
where
\[
O_t(x,y) = \bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y,\ -\sqrt{1-e^{-2t}}\,x + e^{-t}y\bigr).
\]
Replacing $Z$ by $fZ$ in (9), we get
\[
P_t(\operatorname{div}_\mu(fZ)) = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W f\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\tilde Z(O_t(x,y))\,d\mu(y). \tag{10}
\]
Now note $\operatorname{div}_\mu(fZ) = f\operatorname{div}_\mu(Z) + D_Z f$. Then
\[
c_t(f,Z) = D_Z P_t f - P_t(D_Z f) = D_Z P_t f - P_t(\operatorname{div}_\mu(fZ)) + P_t(f\operatorname{div}_\mu(Z)).
\]

Let $B_t = D_Z P_t f - P_t(\operatorname{div}_\mu(fZ))$; then
\[
B_t = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W f\bigl(e^{-t}x + \sqrt{1-e^{-2t}}\,y\bigr)\,\bigl[\tilde Z(x,y) - \tilde Z(O_t(x,y))\bigr]\,d\mu(y).
\]
Note that
\[
\frac{dO_t}{dt} = \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\,R_{-\pi/2}\,O_t.
\]
If $\tilde f(x,y) = f(x)$, then
\[
B_t = -\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_W \Bigl[\int_0^1 \frac{t\,e^{-ts}}{\sqrt{1-e^{-2ts}}}\,\tilde f(O_t(x,y))\,I(O_{ts}(x,y))\,ds\Bigr]\,d\mu(y),
\]
where $I = \tilde Z'(R_{-\pi/2})$ and $I(x,y) = \langle Z'(x)y,\ y\rangle - \langle Z(x),\ x\rangle$. Therefore the estimate (7),
\[
\|c_t(f,Z)\|_{L^1} \le 2\|f\|_{L^p}\bigl(\|\nabla Z\|_{L^q} + \|\operatorname{div}_\mu(Z)\|_{L^q}\bigr), \qquad c_t(f,Z) = D_Z P_t f - P_t(D_Z f),
\]
follows (for more details, see [9]).

3. The case of Riemannian manifolds
Let $M$ be a smooth compact Riemannian manifold and consider the heat semigroup $T_t^M := e^{t\Delta}$ on $M$. Recall first a probabilistic construction of $T_t^M$. Let $O(M)$ be the bundle of orthonormal frames of $M$; $r\in O(M)$ is an isometry from $\mathbb R^d$ onto $T_{\pi(r)}M$. Endowing $M$ with the Levi-Civita connection, denote by $A_1,\ldots,A_d$ the $d$ canonical horizontal vector fields on $O(M)$: let $\{\varepsilon_1,\ldots,\varepsilon_d\}$ be the canonical basis of $\mathbb R^d$, consider the geodesic $\xi_i(t)$ on $M$ starting from $x_0$ and tangent to $r_0\varepsilon_i \in T_{x_0}M$; translating the frame $r_0$ along $\xi_i(t)$ gives a curve $r_i(t)\in O(M)$, and then
\[
A_i(r_0) = \frac{dr_i(t)}{dt}\Big|_{t=0}.
\]
Let $r_t(w,r_0)$ solve the following Stratonovich SDE on $O(M)$:
\[
dr_t(w) = \sum_{i=1}^d A_i(r_t(w))\circ dw_t^i, \qquad r_0(w) = r_0, \tag{11}
\]
where $t\mapsto w(t)$ is the standard $\mathbb R^d$-valued Brownian motion, defined on a probability space $(\Omega, \mathcal F, P)$. Let $x_t(w) = \pi(r_t(w,r_0))$. The law of $w\mapsto x_\cdot(w)$ is independent of $r_0$ and
\[
T_t^M f(x_0) = \mathbb E(f(x_t)), \quad \text{for any } r_0\in\pi^{-1}(x_0).
\]
Now let us state two basic formulae of path space analysis. The first one is due to J.M. Bismut [3]:

Theorem 3.1. Let $Z$ be a $C^1$ vector field on $M$. Then for any $0\le t\le 1$ and $r_0\in\pi^{-1}(x_0)$,
\[
(D_Z T_t^M f)(x_0) = \frac1t\,\mathbb E\Bigl[f(x_t)\int_0^t \bigl\langle Q_s\cdot r_0^{-1}Z_{x_0},\ dw_s\bigr\rangle\Bigr], \tag{12}
\]


where $\{Q_t;\ t\in[0,1]\}$ solves the following resolvent equation:
\[
\frac{dQ_t}{dt} = -\frac12\,\mathrm{ric}_{r_t(w,r_0)}\,Q_t, \qquad Q_0 = \mathrm{Id}. \tag{13}
\]
For a more general study of such formulae, we refer to S. Fang and P. Malliavin [10], D. Elworthy and X. Li [7], or the book [12]. Another formula, due to B. Driver [5], concerns the forward derivative on the heat semigroup.

Theorem 3.2. For any $r_0\in\pi^{-1}(x_0)$,
\[
\mathbb E(\operatorname{div}(Z)(x_t)) = \frac1t\,\mathbb E\Bigl\langle r_t^{-1}Z_{x_t(w)},\ \int_0^t \Bigl(I - \frac{s}{2}\,\mathrm{ric}_{r_s(w)}\Bigr)\,\bar dw_s\Bigr\rangle, \tag{14}
\]
where $\bar dw_s$ denotes the backward Itô integral:
\[
\int_0^t A_s\,\bar dw_s = \lim_{|\mathcal P|\to 0} \sum_{\tau\in\mathcal P} A_{t\wedge\tau_+}\,(w_{t\wedge\tau_+} - w_{t\wedge\tau}),
\]
where $\tau_+$ denotes the point to the right of $\tau$ in the partition $\mathcal P$. The relation between $\int_0^t A_s\,\bar dw_s$ and the usual Itô stochastic integral is given by
\[
\int_0^t A_s\,\bar dw_s = \int_0^t A_s\,dw_s + \int_0^t dA_s\cdot dw_s. \tag{15}
\]
It seems that formula (14) has the inconvenience that the derivative of the Ricci tensor gets involved, according to (15). There is another formula, given in [6]:
\[
\mathbb E(\operatorname{div}(Z)(x_t)) = -\frac1t\,\mathbb E\Bigl\langle r_t^{-1}Z_{x_t(w)},\ Q_t\int_0^t Q_s^{-1}\,dw_s\Bigr\rangle.
\]
Set $M_t = Q_t\int_0^t Q_s^{-1}\,dw_s$. Then by Itô's formula
\[
dM_t = -\frac12\,\mathrm{ric}_{r_t(w,r_0)}\Bigl(Q_t\int_0^t Q_s^{-1}\,dw_s\Bigr)\,dt + dw_t,
\quad\text{or}\quad M_t = w_t - \frac12\int_0^t \mathrm{ric}_{r_s(w,r_0)}\,M_s\,ds.
\]
So
\[
\mathbb E(\operatorname{div}(Z)(x_t)) = -\frac1t\,\mathbb E\bigl(\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr) + \frac{1}{2t}\,\mathbb E\Bigl\langle r_t^{-1}Z_{x_t},\ \int_0^t \mathrm{ric}_{r_s}\,M_s\,ds\Bigr\rangle. \tag{16}
\]
We rewrite (14):
\[
\mathbb E(\operatorname{div}(Z)(x_t)) = \frac1t\,\mathbb E\bigl(\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr)
- \frac{1}{2t}\,\mathbb E\Bigl\langle r_t^{-1}Z_{x_t},\ \int_0^t s\,\mathrm{ric}_{r_s}\,dw_s\Bigr\rangle
- \frac{1}{2t}\,\mathbb E\Bigl\langle r_t^{-1}Z_{x_t},\ \int_0^t s\,J_{r_s}\,ds\Bigr\rangle, \tag{17}
\]
where
\[
J(r) = \sum_{i=1}^d \bigl(L_{A_i}\mathrm{ric}\bigr)(r)\,\varepsilon_i.
\]
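In the flat case the Ricci terms vanish and $Q_t \equiv \mathrm{Id}$, so a Bismut-type formula reduces to the classical Gaussian integration by parts. The sketch below (my own one-dimensional illustration, with the convention $P_t f(x) = \mathbb E f(x + w_t)$, $w_t \sim N(0,t)$, which differs from the manifold normalization above) checks $\frac{d}{dx}P_t f(x) = \frac1t \mathbb E[f(x+w_t)\,w_t]$ by Monte Carlo:

```python
import math, random

def grad_heat_mc(f, t, x, n=200_000, seed=1):
    """Flat-space analogue of the Bismut formula (ric = 0, Q_t = Id):
       d/dx P_t f(x) = (1/t) E[ f(x + w_t) w_t ],  P_t f(x) = E f(x + w_t),
       w_t ~ N(0, t).  One-dimensional Monte Carlo sketch."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        w = rng.gauss(0, math.sqrt(t))
        acc += f(x + w) * w
    return acc / (n * t)

# For f(x) = x^2:  P_t f(x) = x^2 + t, so the exact derivative is 2x.
t, x = 0.5, 0.7
print(abs(grad_heat_mc(lambda z: z * z, t, x) - 2 * x))
```

The derivative is obtained without differentiating $f$ itself, which is exactly the point of (12): the curvature correction $Q_s$ is what replaces the identity weight on a manifold.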


Note that as $t\to 0$, the singular term in the above two formulae is $\frac1t\,\mathbb E(\langle r_t^{-1}Z_{x_t}, w_t\rangle)$, but with opposite signs. To see the compatibility of these two formulae, note that
\[
\frac1t\,\mathbb E\bigl(\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr) = \frac1t\,\mathbb E\bigl(\langle r_t^{-1}Z_{x_t} - r_0^{-1}Z_{x_0},\ w_t\rangle\bigr).
\]
This means that the singular term disappears, at the price that the derivative of $Z$ gets involved. In order to obtain the term $\mathbb E(D_Z f(x_t))$, we use the relation $\operatorname{div}(fZ) = D_Z f + f\operatorname{div}(Z)$; therefore, replacing $Z$ by $fZ$ in formula (16), we get
\[
\mathbb E(D_Z f(x_t)) = \mathbb E(\operatorname{div}(fZ)(x_t)) - \mathbb E(f(x_t)\operatorname{div}(Z)(x_t))
= -\frac1t\,\mathbb E\bigl(f(x_t)\,\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr)
+ \frac{1}{2t}\,\mathbb E\Bigl\langle r_t^{-1}f(x_t)Z_{x_t},\ \int_0^t \mathrm{ric}_{r_s}\,M_s\,ds\Bigr\rangle
- \mathbb E\bigl(f(x_t)\operatorname{div}(Z)(x_t)\bigr). \tag{18}
\]
Recall the Bismut formula
\[
(D_Z T_t^M f)(x_0) = \frac1t\,\mathbb E\Bigl[f(x_t)\int_0^t \langle Q_s\cdot r_0^{-1}Z_{x_0},\ dw_s\rangle\Bigr].
\]
In order to produce a cancellation between the singular term in (18) and the term in the above Bismut formula, we write
\[
-\frac1t\,\mathbb E\bigl(f(x_t)\,\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr)
= \frac1t\,\mathbb E\bigl(f(x_t)\,\langle r_t^{-1}Z_{x_t},\ w_t\rangle\bigr) - \frac2t\,\mathbb E\bigl\langle r_t^{-1}f(x_t)Z_{x_t} - r_0^{-1}f(x_0)Z_{x_0},\ w_t\bigr\rangle;
\]
this last term gives rise to the derivative of $f$. So formula (16) is not convenient for our purpose.

Theorem 3.3. Let $q\ge 2$ and $\varepsilon>0$. Then for $t\in[0,1]$,
\[
\|c_t(f,Z)\|_{L^1} \le C_1\,\|f\|_{L^{p+\varepsilon}}\|\nabla Z\|_{L^q} + C_2(\mathrm{ric})\,\sqrt t\,\|f\|_{L^{p+\varepsilon}}\|Z\|_{L^q} + \|f\|_{L^p}\|\operatorname{div}(Z)\|_{L^q}. \tag{19}
\]
Proof. By Bismut's and Driver's formulae, $c_t(f,Z)$ admits the expression
\[
c_t(f,Z)(x_0) = \frac1t\,\mathbb E\bigl(f(x_t)\,\langle r_0^{-1}Z_{x_0} - r_t^{-1}Z_{x_t},\ w(t)\rangle\bigr)
- \frac{1}{2t}\,\mathbb E\Bigl(f(x_t)\Bigl\langle \int_0^t\Bigl(\int_0^s \mathrm{ric}_{r_u}Q_u\,du\Bigr)(r_0^{-1}Z_{x_0}),\ dw_s\Bigr\rangle\Bigr)
+ \frac{1}{2t}\,\mathbb E\Bigl\langle f(x_t)\,r_t^{-1}Z_{x_t},\ \int_0^t s\,\mathrm{ric}_{r_s}\,dw_s\Bigr\rangle
+ \frac{1}{2t}\,\mathbb E\Bigl\langle f(x_t)\,r_t^{-1}Z_{x_t},\ \int_0^t s\,J_{r_s}\,ds\Bigr\rangle
+ \mathbb E\bigl((f\operatorname{div}(Z))(x_t)\bigr). \tag{20}
\]

Let $a_1(t,r_0)$ be the first term on the right side. Using Itô's formula here would involve the second derivative of the vector field $Z$. To avoid this, we shall use the reversibility of the process $r_t(w,r_0)$ with initial law $dr_0$ on $O(M)$ (see D. Stroock [13]), which is locally $dx\otimes dg$, where $dx$ is the normalized Riemannian measure on $M$ and $dg$ the normalized Haar measure on $SO(d)$. Let $\{\varepsilon_1,\ldots,\varepsilon_d\}$ be the canonical basis of $\mathbb R^d$; then we have $\langle r_t^{-1}Z_{x_t}, \varepsilon_i\rangle = \langle Z_{x_t}, r_t\varepsilon_i\rangle$. We know that $\pi'(r)A_i(r) = r\varepsilon_i$. Set $\varphi_i(r) = \langle Z_{\pi(r)}, \pi'(r)A_i(r)\rangle$ and $\tilde f(r) = f(\pi(r))$. Then
\[
a_1(t,r_0) = \frac1t\sum_{i=1}^d \mathbb E\Bigl[\tilde f(r_t(w,r_0))\,w_t^i\cdot\bigl(\varphi_i(r_t(w,r_0)) - \varphi_i(r_0)\bigr)\Bigr],
\]
and
\[
\mathbb E\Bigl|\frac1t\sum_{i=1}^d w_t^i\bigl(\varphi_i(r_t(w,r_0)) - \varphi_i(r_0)\bigr)\Bigr|^{q'}
\le C_{q,q'}\Bigl[\mathbb E\Bigl|\frac{\varphi(r_t(w,r_0)) - \varphi(r_0)}{\sqrt t}\Bigr|^q\Bigr]^{q'/q}.
\]
Now it remains to estimate
\[
\int_{O(M)} \mathbb E\Bigl|\frac{\varphi(r_t(w,r_0)) - \varphi(r_0)}{\sqrt t}\Bigr|^q\,dr_0.
\]
By the Lyons–Zheng decomposition formula [11], for each $i$,
\[
\varphi_i(r_t) - \varphi_i(r_0) = \frac12\,\bigl(M_t - \bar M_t\bigr), \tag{21}
\]

where $s\mapsto M_s$ is a martingale relative to $\mathcal F_s = \sigma(r_u;\ u\le s)$ and $s\mapsto \bar M_s$ is a martingale relative to $\bar{\mathcal F}_s = \sigma(r_{t-u};\ 0\le u\le s)$. Their quadratic variations are given by
\[
dM_s\cdot dM_s = \sum_{j=1}^d |L_{A_j}\varphi_i|^2(r_s)\,ds, \qquad
d\bar M_s\cdot d\bar M_s = \sum_{j=1}^d |L_{A_j}\varphi_i|^2(r_{t-s})\,ds.
\]
Set, for a smooth function $F\colon O(M)\to\mathbb R$,
\[
|\nabla^H F(r)|^2 = \sum_{j=1}^d |L_{A_j}F(r)|^2.
\]
By the Burkholder inequality,
\[
\Bigl\|\frac{M_t}{\sqrt t}\Bigr\|_{L^q}^q \le C_q\,\frac{1}{t^{q/2}}\,\mathbb E_{\hat P}\Bigl[\Bigl(\int_0^t |\nabla^H\varphi_i|^2(r_u)\,du\Bigr)^{q/2}\Bigr],
\]
where $\hat P$ is the probability under which $r_t$ is reversible. For $q\ge 2$, the right-hand side is dominated by
\[
C_q\,\mathbb E_{\hat P}\Bigl[\frac1t\int_0^t |\nabla^H\varphi_i|^q(r_u)\,du\Bigr]
= C_q\,\frac1t\int_0^t\int_{O(M)} |\nabla^H\varphi_i|^q\,dr_0\,du
= C_q\int_{O(M)} |\nabla^H\varphi_i|^q\,dr_0.
\]
Therefore
\[
\Bigl\|\frac{\varphi(r_t)-\varphi(r_0)}{\sqrt t}\Bigr\|_{L^q(\hat P)}^q \le C_{q,d}\sum_{i=1}^d\int_{O(M)} |\nabla^H\varphi_i|^q\,dr_0. \tag{22}
\]
Now we shall compute the right-hand side of (22). Let $U_j(s)$ be the flow associated to $A_j$ on $O(M)$:
\[
\frac{dU_j(s)}{ds} = A_j(U_j(s)).
\]
Recall that $\varphi_i(r) = \langle Z_{\pi(r)}, \pi'(r)A_i(r)\rangle$. Then $\varphi_i(U_j(s)) = \langle Z_{\pi(U_j(s))}, \pi'(U_j(s))A_i(U_j(s))\rangle$. Set $m_j(s) = \pi(U_j(s))$. Then
\[
\frac{dm_j(s)}{ds} = \pi'(U_j(s))\cdot\frac{dU_j(s)}{ds} = \pi'(U_j(s))\cdot A_j(U_j(s)) = U_j(s)\varepsilon_j.
\]
So $\varphi_i(U_j(s)) = \langle Z_{m_j(s)}, U_j(s)\varepsilon_i\rangle = \langle r_0 U_j(s)^{-1} Z_{m_j(s)}, r_0\varepsilon_i\rangle$. Therefore, taking the derivative with respect to $s$ at $s=0$, we get the expression
\[
(L_{A_j}\varphi_i)(r_0) = \bigl\langle(\nabla_{r_0\varepsilon_j}Z)(x_0),\ r_0\varepsilon_i\bigr\rangle. \tag{23}
\]
It follows that
\[
\sum_{i,j=1}^d |L_{A_j}\varphi_i|^2(r_0) = |\nabla Z|^2(x_0). \tag{24}
\]
The other terms in (20) are obtained easily. □

4. The case of Lie groups
A natural question now is: what happens for compact Lie groups? Is there a simpler approach based on the group structure?

Let $G$ be a compact Lie group (for example, a closed subgroup of $SO(N)$) and $\mathcal G$ its Lie algebra, equipped with an $\mathrm{Ad}_G$-invariant metric $\langle\,,\,\rangle$. For $a\in\mathcal G$, the left-invariant vector field is $\tilde a(g) = ga$ and the right-invariant vector field is $\hat a(g) = ag$. Let $\{\varepsilon_1,\ldots,\varepsilon_d\}$ be an orthonormal basis of $\mathcal G$. Then the Laplace operator is given by
\[
\Delta = \frac12\sum_{i=1}^d L^2_{\tilde\varepsilon_i} = \frac12\sum_{i=1}^d L^2_{\hat\varepsilon_i}.
\]
Let $p_t(g)$ be the heat kernel:
\[
T_t f(g_0) = e^{t\Delta}f(g_0) = \int_G f(g)\,p_t(g_0^{-1}g)\,dg.
\]
Let $a\in\mathcal G$. Then
\[
(L_{\tilde a}T_t f)(g_0) = \frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\int_G f(g)\,p_t(e^{-\varepsilon a}g_0^{-1}g)\,dg
= -\int_G f(g_0 g)\,(L_{\hat a}p_t)(g)\,dg.
\]
On the other hand,
\[
T_t(L_{\tilde a}f)(g_0) = \int_G L_{\tilde a}f(g)\,p_t(g_0^{-1}g)\,dg = -\int_G f(g_0 g)\,(L_{\tilde a}p_t)(g)\,dg.
\]
If $G$ is not commutative, there is no reason to expect that $L_{\tilde a}T_t f = T_t(L_{\tilde a}f)$.

Let $(w(t))_{t\ge 0}$ be the $\mathcal G$-valued standard Brownian motion. Consider the SDE on $G$
\[
dg_w(t) = g_w(t)\circ dw(t), \qquad g_w(0) = e.
\]
Let $P_e(G) = \{\gamma\colon [0,T]\to G \text{ continuous},\ \gamma(0)=e\}$ and $\nu$ the Wiener measure on $P_e(G)$, the law of $w\mapsto g_w(\cdot)$. Denote by
\[
H = \Bigl\{h\colon [0,T]\to\mathcal G;\ |h|_H^2 = \int_0^T |\dot h(s)|^2\,ds < +\infty\Bigr\}.
\]
For $h\in H$, consider
\[
Z_h(w)(t) = \int_0^t \mathrm{Ad}_{g_w(s)^{-1}}\dot h(s)\,ds.
\]
Then, by an elementary calculation (see [8]), for a cylindrical functional $F\colon P_e(G)\to\mathbb R$,
\[
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} F(e^{\varepsilon h}g_w) = \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} F(g_{w+\varepsilon Z_h(w)}).
\]
It follows that
\[
\int_{P_e(G)} \partial_h^\ell F(\gamma)\,d\nu(\gamma) = \int_{P_e(G)} F(\gamma)\,K_h^\ell(\gamma)\,d\nu(\gamma), \tag{25}
\]
where $\partial_h^\ell F(\gamma) = \frac{d}{d\varepsilon}\big|_{\varepsilon=0}F(e^{\varepsilon h}\gamma)$ and
\[
K_h^\ell(g_w) = \int_0^T \bigl\langle \mathrm{Ad}_{g_w(s)^{-1}}\dot h(s),\ dw(s)\bigr\rangle.
\]
But for the right derivative we have ([8])
\[
\int_{P_e(G)} \partial_h^r F(\gamma)\,d\nu(\gamma) = \int_{P_e(G)} F(\gamma)\,K_h^r(\gamma)\,d\nu(\gamma), \tag{26}
\]
where
\[
K_h^r(g_w) = \int_0^T \langle \dot h(s),\ dw(s)\rangle.
\]
Now take $F(\gamma) = f(\gamma(T))$ and $h(s) = \frac{s}{T}\,a$ with $a\in\mathcal G$ in (25); we get
\[
\int_{P_e(G)} \bigl\langle\nabla f(\gamma(T)),\ h(T)\gamma(T)\bigr\rangle\,d\nu(\gamma) = \int_{P_e(G)} f(\gamma(T))\,K_h^\ell(\gamma)\,d\nu(\gamma),
\]
or
\[
\int_G \langle\nabla f(g),\ ag\rangle\,p_T(g)\,dg = \int_G f(g)\,\mathbb E\bigl(K_h^\ell(g_w)\,\big|\,g_w(T)=g\bigr)\,p_T(g)\,dg.
\]
It follows that
\[
\frac{L_{\hat a}p_T(g)}{p_T(g)} = -\frac1T\,\mathbb E\Bigl(\Bigl\langle a,\ \int_0^T \mathrm{Ad}_{g_w(s)}\,dw(s)\Bigr\rangle\,\Big|\,g_w(T)=g\Bigr). \tag{27}
\]
Using (26), we get
\[
\frac{L_{\tilde a}p_T(g)}{p_T(g)} = -\frac1T\,\mathbb E\bigl(\langle a,\ w(T)\rangle\,\big|\,g_w(T)=g\bigr). \tag{28}
\]
It is difficult to conclude from the formulae (27) and (28). This means that the estimate in Theorem 3.3 is genuinely geometric: even in the case of Lie groups, we have to take the Riemannian structure into account.

Acknowledgment. The authors are grateful to the referees for pertinent suggestions improving the exposition of this note.
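The process $g_w$ solving $dg_w = g_w\circ dw$ is easy to simulate for a matrix group. The sketch below (my own illustration; the Lie-Euler scheme and the function names are not from the paper) generates Brownian motion on $SO(3)$ by multiplying exponentials of Gaussian Lie-algebra increments, and verifies that the path stays on the group:

```python
import math, random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm_so3(v):
    """Rodrigues formula: matrix exponential of the skew matrix built from v in R^3."""
    x, y, z = v
    th = math.sqrt(x * x + y * y + z * z)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    a = 1.0 if th < 1e-12 else math.sin(th) / th
    b = 0.5 if th < 1e-12 else (1 - math.cos(th)) / (th * th)
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def brownian_on_so3(n_steps=1000, T=1.0, seed=2):
    """Lie-Euler (geodesic) scheme g_{k+1} = g_k exp(dw_k) for dg = g o dw on SO(3)."""
    rng = random.Random(seed)
    dt = T / n_steps
    g = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for _ in range(n_steps):
        dw = [rng.gauss(0, math.sqrt(dt)) for _ in range(3)]
        g = matmul(g, expm_so3(dw))
    return g

g = brownian_on_so3()
# Each exponential factor is orthogonal, so g^T g stays close to the identity.
err = max(abs(sum(g[k][i] * g[k][j] for k in range(3)) - (1.0 if i == j else 0.0))
          for i in range(3) for j in range(3))
print(err)
```

Each step multiplies on the right by $\exp(\Delta w)$, which mirrors the left-invariant drift $g_w\,\circ dw$ of the SDE above.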

References
[1] L. Ambrosio, Transport equation and Cauchy problem for BV vector fields, Invent. Math. 158 (2004), 227–260.
[2] L. Ambrosio and A. Figalli, On flows associated to Sobolev vector fields in Wiener spaces: an approach à la DiPerna–Lions, Preprint, 2008.
[3] J.M. Bismut, Large Deviations and the Malliavin Calculus, Birkhäuser, Boston/Basel/Stuttgart, 1984.
[4] R.J. DiPerna and P.-L. Lions, Ordinary differential equations, transport theory and Sobolev spaces, Invent. Math. 98 (1989), 511–547.


[5] B. Driver, Integration by parts for heat kernel measures revisited, J. Math. Pures Appl. 76 (1997), 703–737.
[6] B. Driver and A. Thalmaier, Heat equation derivative formulas for vector bundles, J. Funct. Anal. 183 (2001), 42–108.
[7] D. Elworthy and X. Li, Formulae for the derivative of heat semigroups, J. Funct. Anal. 125 (1994), 252–286.
[8] S. Fang, Introduction to Malliavin Calculus, Math. Series for Graduate Students, vol. 3, Springer, Tsinghua University Press, 2005.
[9] S. Fang and D. Luo, Transport equations and quasi-invariant flows on the Wiener space, to appear in Bull. Sci. Math.
[10] S. Fang and P. Malliavin, Stochastic analysis on the path space of a Riemannian manifold, J. Funct. Anal. 118 (1993), 249–274.
[11] T. Lyons and W. Zheng, A crossing estimate for the canonical process on a Dirichlet space and a tightness result, Colloque Paul Lévy, Astérisque 157 (1988), 248–272.
[12] P. Malliavin, Stochastic Analysis, Grund. Math. Wissen., vol. 313, Springer, 1997.
[13] D. Stroock, An Introduction to the Analysis of Paths on a Riemannian Manifold, Math. Surveys and Monographs, vol. 74, AMS, 1999.

Shizan Fang and Huaiqian Lee
I.M.B., B.P. 47870, Université de Bourgogne, Dijon, France
e-mail: [email protected]

Progress in Probability, Vol. 65, 73–81 © 2011 Springer Basel AG

Approximation Theorem for Stochastic Differential Equations Driven by G-Brownian Motion Fuqing Gao and Hui Jiang Abstract. We present an approximation theorem for stochastic differential equations driven by G-Brownian motion, i.e., solutions of stochastic differential equations driven by G-Brownian motion can be approximated by solutions of ordinary differential equations. Mathematics Subject Classification (2000). 60F10, 60H05, 60H10, 60J65. Keywords. G-Brownian motion, G-expectation, stochastic differential equations, approximation theorem.

1. Introduction
Recently, G-Brownian motion was defined in [6] and [7]. The expectation $\mathbb E[\cdot]$ associated with G-Brownian motion is a sublinear expectation, called the G-expectation. In this framework, the related stochastic calculus has been established in [6] and [7]. The Hölder continuity and the homeomorphic property of the solution of stochastic differential equations driven by G-Brownian motion were studied in [2]. Large deviations for G-Brownian motion and for stochastic differential equations driven by G-Brownian motion were established in [3]. The approximation theorem for stochastic differential equations driven by Brownian motion has been studied by many authors; for references, one can see Stroock and Varadhan [11] and Ikeda and Watanabe [4]. Ren [10] investigated this problem under $C_{r,p}$-capacity. The aim of this paper is to study the approximation theorem for stochastic differential equations driven by G-Brownian motion.

1.1. G-expectation and G-Brownian motion
Let us briefly recall some basic concepts and results on G-expectation and G-Brownian motion (see [1], [6], [7] for details).

Research supported by the National Natural Science Foundation of China (10871153).

Let $\Omega$ denote the space of all $\mathbb R^d$-valued continuous paths $\omega\colon [0,+\infty)\ni t\mapsto \omega_t\in\mathbb R^d$ with $\omega_0=0$, equipped with the distance
\[
\rho(\omega^1,\omega^2) := \sum_{n=1}^\infty 2^{-n}\Bigl(\max_{t\in[0,n]} |\omega_t^1 - \omega_t^2| \wedge 1\Bigr).
\]
For each $T>0$, set
\[
L_{ip}(\mathcal F_T) := \bigl\{\varphi(\omega_{t_1},\omega_{t_2},\ldots,\omega_{t_n}) :\ n\ge 1,\ t_1,\ldots,t_n\in[0,T],\ \varphi\in \mathrm{lip}(\mathbb R^{d\times n})\bigr\},
\]
where $\mathrm{lip}(\mathbb R^{d\times n})$ is the set of bounded Lipschitz continuous functions on $\mathbb R^{d\times n}$. Define $L_{ip}(\mathcal F) := \bigcup_{n=1}^\infty L_{ip}(\mathcal F_n)\subset C_b(\Omega)$. $\mathbb S_d$ denotes the space of $d\times d$ symmetric matrices. $\Gamma$ is a given nonempty, bounded and closed subset of $\mathbb R^{d\times d}$, the space of all $d\times d$ matrices. Set $\Sigma := \{\gamma\gamma^\tau,\ \gamma\in\Gamma\}\subset\mathbb S_d$ and assume that $\Sigma$ is a bounded, convex and closed subset. For a given $A = (A_{ij})_{i,j=1}^d\in\mathbb S_d$, set
\[
G(A) = \frac12\sup_{\gamma\in\Gamma}\mathrm{tr}[\gamma\gamma^\tau A]. \tag{1.1}
\]

For each $\varphi\in\mathrm{lip}(\mathbb R^d)$, define
\[
\mathbb E(\varphi) = u(1,0),
\]
where $u(t,x)$ is the viscosity solution of the following G-heat equation:
\[
\frac{\partial u}{\partial t} - G(D^2 u) = 0 \ \text{ on } (t,x)\in[0,\infty)\times\mathbb R^d, \qquad u(0,x) = \varphi(x), \tag{1.2}
\]
and $D^2 u$ is the Hessian matrix of $u$, i.e., $D^2 u = (\partial^2_{x_i x_j}u)_{i,j=1}^d$. Then $\mathbb E\colon \mathrm{lip}(\mathbb R^d)\to\mathbb R$ is a sublinear expectation. This sublinear expectation is also called the G-normal distribution on $\mathbb R^d$ and is denoted by $\mathcal N(0,\Sigma)$ (cf. [9]).

Let $\mathcal H$ be a vector lattice of real functions defined on $\Omega$ such that $L_{ip}(\mathcal F)\subset\mathcal H$ and such that if $X_1,\ldots,X_n\in\mathcal H$, then $\varphi(X_1,\ldots,X_n)\in\mathcal H$ for each $\varphi\in\mathrm{lip}(\mathbb R^n)$. Let $\mathbb E[\cdot]\colon\mathcal H\to\mathbb R$ be a sublinear expectation on $\mathcal H$. A $d$-dimensional random vector $X$ with each component in $\mathcal H$ is said to be G-normally distributed under the sublinear expectation $\mathbb E[\cdot]$ if, for each $\varphi\in\mathrm{lip}(\mathbb R^d)$,
\[
u(t,x) := \mathbb E\bigl(\varphi(x + \sqrt t\,X)\bigr), \quad t\ge 0,\ x\in\mathbb R^d,
\]
is the viscosity solution of the G-heat equation (1.2). $\mathbb E[\cdot]$ is called a G-expectation if the $d$-dimensional canonical process $\{B_t(\omega) = \omega_t,\ t\ge 0\}$ is a G-Brownian motion under the sublinear expectation, that is, $B_0 = 0$ and
(i) for any $s,t\ge 0$, $B_t\sim B_{t+s}-B_s\sim\mathcal N(0, t\Sigma)$;
(ii) for any $m\ge 1$ and $0=t_0<t_1<\cdots<t_m<\infty$, the increment $B_{t_m}-B_{t_{m-1}}$ is independent of $B_{t_1},\ldots,B_{t_{m-1}}$, i.e., for each $\varphi\in\mathrm{lip}(\mathbb R^{d\times m})$,
\[
\mathbb E\bigl(\varphi(B_{t_1},\ldots,B_{t_{m-1}},\ B_{t_m}-B_{t_{m-1}})\bigr) = \mathbb E\bigl(\psi(B_{t_1},\ldots,B_{t_{m-1}})\bigr), \tag{1.3}
\]
where $\psi(x_1,\ldots,x_{m-1}) = \mathbb E\bigl(\varphi(x_1,\ldots,x_{m-1},\ B_{t_m}-B_{t_{m-1}})\bigr)$.

75

Throughout this paper, we assume that there exist constants 0 < σ ≤ σ < ∞ such that , Γ ⊂ γ ∈ Rd×d ; σId×d ≤ γγ τ ≤ σId×d . (1.4)

The topological completion of Lip (Ft ) (resp. Lip (F )) under the Banach norm · p,G := (E(| · |p ))1/p is denoted by LpG (Ft ) (resp. LpG (F)), where p ≥ 1. E(·) can be extended uniquely to a sublinear expectation on LpG (F). We denote also by E the extension. p,0 Given T > 0. For p ∈ [1, ∞), let MG (0, T ) denote the space of R-valued n−1 piecewise constant processes ηt = i=0 ηti 1[ti ,ti+1 ) (t) where ηti ∈ LpG (Fti ), 0 = p,0 t0 < t1 < · · · < tn = T . For η ∈ MG (0, T ), the G-stochastic integral is defined by  t n−1  j j j j I (η) := ηs dBs := ηti (Bt∧t − Bt∧t ). i+1 i 0

Let

p MG (0, T )

i=0

be the closure of

p,0 MG (0, T )

η pM p (0,T ) := G



under the norm: T

E (|ηt |p ) dt.

0

2,0 Then the mapping I j : MG (0, T ) → L2G (FT ) is continuous, and so it can be 2 2 continuously extended to MG (0, T ). For any η = (η1 , . . . , η d ) ∈ (MG (0, T ))d , define  t d  t  ηs dBs = ηsi dBsi . 0

0

i=1

The quadratic variation process of G-Brownian motion is defined by

 t ij i j i j Bt := (Bt )1≤i,j≤d = Bt Bt − 2 Bs dBs . 0 ≤ t ≤ T, 0

1≤i,j≤d

The G-stochastic integral satisfies the following BDG inequality: p

 u  t   ¯  ηv dBv  ≤ Cp σ p/2 |t − s|p/2−1 E sup  E (|ηu |p ) du, s≤u≤t

s

(1.5)

s

where p ≥ 2, 0 < Cp < ∞ is a constant independent of Γ and η. 1.2. Main result Let us consider the following G-SDE

\[
dx(t) = \sigma(x(t))\,dB_t + b(x(t))\,dt + h(x(t))\,d\langle B\rangle_t, \qquad x(0) = x_0, \tag{1.6}
\]
where
\[
\sigma = (\sigma_{ij})\colon\mathbb R^d\to\mathbb R^d\otimes\mathbb R^r,\quad b = (b_i)\colon\mathbb R^d\to\mathbb R^d,\quad h = (h_{i,(j,l)})\colon\mathbb R^d\to\mathbb R^d\otimes\mathbb R^{r^2}
\]
satisfy the following conditions: $\sigma_{ij}$, $b_i$, $h_{i,(j,l)}$, $\frac{\partial}{\partial x_k}\sigma_{ij}$, $\frac{\partial}{\partial x_k}h_{i,(j,l)}$ are bounded and Lipschitz continuous. Denote
\[
g = \bigl(g_{i,(j,l)}\bigr) := \Bigl(\sum_{k=1}^d \bigl(\tfrac{\partial}{\partial x_k}\sigma_{ij}\bigr)\sigma_{kl}\Bigr)_{1\le i\le d,\ 1\le j,l\le r}.
\]


(1.6) can be written in Fisk–Stratonovich form:
\[
dx(t) = \sigma(x(t))\circ dB_t + b(x(t))\,dt + \Bigl(h(x(t)) - \frac12 g(x(t))\Bigr)\,d\langle B\rangle_t, \qquad x(0) = x_0,
\]
where
\[
\int_0^t \sigma(x(s))\circ dB_s = \int_0^t \sigma(x(s))\,dB_s + \frac12\int_0^t g(x(s))\,d\langle B\rangle_s.
\]
Let $x_n$ be the solution of the following ordinary differential equation:
\[
dx_n(t) = \Bigl[\sigma(x_n(t))\dot W_n(t) + b(x_n(t)) + \Bigl(h(x_n(t)) - \frac12 g(x_n(t))\Bigr)\dot V_n(t)\Bigr]dt, \qquad x_n(0) = x_0, \tag{1.7}
\]
where, for $t_n = \frac{[2^n t]}{2^n}$ and $t_n^+ = \frac{[2^n t]+1}{2^n}$,
\[
\dot W_n(t) := 2^n\bigl(B(t_n^+) - B(t_n)\bigr), \qquad
\dot V_n(t) := \bigl(\dot V_n^{ij}(t)\bigr)_{d\times d} := 2^n\bigl(\langle B\rangle(t_n^+) - \langle B\rangle(t_n)\bigr).
\]
Now we state our main result.

Theorem 1.1. For any $p\ge 1$, there exists a constant $C(p)>0$ such that for all $n\ge 1$,
\[
\mathbb E\Bigl(\sup_{t\in[0,1]} |x_n(t) - x(t)|^{2p}\Bigr) \le \frac{C(p)}{2^{np}}.
\]
Throughout this paper, $C$, with or without indices, denotes various constants whose precise values are not important.

Remark 1.1. Compared with the classical case, we additionally need to deal with the quadratic variation process $\langle B\rangle$.
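The scheme (1.7) is a Wong–Zakai-type approximation: the driver is replaced by its dyadic piecewise-linear interpolation, and the limit is the Stratonovich solution. The sketch below (my own classical single-volatility illustration; it is not the G-Brownian setting, where the quadratic variation is itself uncertain) solves the ODE for the linear equation $dx = \sigma x\circ dW$ and compares with the exact Stratonovich solution $x(1) = e^{\sigma W(1)}$:

```python
import math, random

def dyadic_approximation_error(n=6, fine=2**14, sigma=1.0, seed=4):
    """Classical analogue of the scheme (1.7): drive the ODE
       dx = sigma * x * dot-W_n(t) dt with the dyadic piecewise-linear
       interpolation W_n of a Brownian path, and compare with the
       Stratonovich solution x(1) = exp(sigma * W(1))."""
    rng = random.Random(seed)
    # Brownian increments over the 2^n dyadic intervals.
    dW = [rng.gauss(0, math.sqrt(2.0 ** -n)) for _ in range(2 ** n)]
    x, h = 1.0, 1.0 / fine
    sub = fine // 2 ** n                      # Euler sub-steps per dyadic interval
    for k in range(2 ** n):
        slope = sigma * dW[k] * 2.0 ** n      # sigma * dot-W_n, constant on the interval
        for _ in range(sub):
            x += slope * x * h                # forward Euler for the ODE
    exact = math.exp(sigma * sum(dW))         # Stratonovich solution at t = 1
    return abs(x - exact) / exact

print(dyadic_approximation_error())
```

For this linear example the ODE driven by $W_n$ agrees with the Stratonovich solution exactly at dyadic times, so the residual measures only the ODE-solver error; the content of Theorem 1.1 is the quantitative rate $2^{-np}$ for general coefficients under G-expectation.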

2. The proof of the approximation theorem
In this section we prove Theorem 1.1. We omit some estimates that are similar to the classical case.

Lemma 2.1. For any $p\ge 1$, there exists a constant $C(p)>0$ such that for all $n\ge 1$,
\[
\mathbb E\Bigl(\sup_{t\in[0,1]} |x(t) - x(t_n)|^{2p}\Bigr) \le C(p)\,2^{-np} \tag{2.1}
\]
and
\[
\mathbb E\Bigl(\sup_{t\in[0,1]} |x_n(t) - x_n(t_n)|^{2p}\Bigr) \le C(p)\,2^{-np}. \tag{2.2}
\]

Proof. It is obvious that
\[
x(t) - x(t_n) = \int_{t_n}^t \sigma(x(s))\,dB(s) + \int_{t_n}^t b(x(s))\,ds + \int_{t_n}^t h(x(s))\,d\langle B\rangle_s
\]
and
\[
x_n(t) - x_n(t_n) = \int_{t_n}^t \sigma(x_n(s))\dot W_n(s)\,ds + \int_{t_n}^t b(x_n(s))\,ds + \int_{t_n}^t \Bigl(h(x_n(s)) - \frac12 g(x_n(s))\Bigr)\dot V_n(s)\,ds.
\]
Then, by the assumptions and the BDG inequality for the G-stochastic integral (cf. [2]), we obtain (2.1) and (2.2) immediately. □

The following result is very useful for our proof.

Lemma 2.2. For each $1\le m\le d$ and $1\le i,j\le r$,
\[
\int_0^{t_n} g_{m,(i,j)}(x_n(s_n))\Bigl[\bigl(W_n^i(s_n^+) - W_n^i(s)\bigr)\dot W_n^j(s) - \frac12\dot V_n^{i,j}(s)\Bigr]\,ds
= \frac12\int_0^{t_n} g_{m,(i,j)}(x_n(s_n))\bigl(B_s^i - B_{s_n}^i\bigr)\,dB_s^j
+ \frac12\int_0^{t_n} g_{m,(i,j)}(x_n(s_n))\bigl(B_s^j - B_{s_n}^j\bigr)\,dB_s^i,
\]
where $s_n = \frac{[2^n s]}{2^n}$, $s_n^+ = \frac{[2^n s]+1}{2^n}$, and for $s\in\bigl[\frac{k-1}{2^n}, \frac{k}{2^n}\bigr)$ and $i=1,\ldots,r$,
\[
W_n^i(s) = B^i\Bigl(\frac{k-1}{2^n}\Bigr) + 2^n\Bigl(B^i\Bigl(\frac{k}{2^n}\Bigr) - B^i\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\Bigl(s - \frac{k-1}{2^n}\Bigr).
\]

Proof. Since $\dot V_n^{i,j}(s) \equiv 2^n\bigl(\langle B^i, B^j\rangle_{\frac{k}{2^n}} - \langle B^i, B^j\rangle_{\frac{k-1}{2^n}}\bigr)$ for $s\in\bigl[\frac{k-1}{2^n}, \frac{k}{2^n}\bigr)$, and
\[
W_n^i\Bigl(\frac{k}{2^n}\Bigr) - W_n^i(s) = \Bigl(B^i\Bigl(\frac{k}{2^n}\Bigr) - B^i\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\Bigl(1 - 2^n\Bigl(s - \frac{k-1}{2^n}\Bigr)\Bigr),
\]
we have
\[
\int_{\frac{k-1}{2^n}}^{\frac{k}{2^n}} g_{m,(i,j)}(x_n(s_n))\Bigl[\bigl(W_n^i(s_n^+) - W_n^i(s)\bigr)\dot W_n^j(s) - \frac12\dot V_n^{i,j}(s)\Bigr]\,ds
= \frac12\,g_{m,(i,j)}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\Bigl[\bigl(B^i_{\frac{k}{2^n}} - B^i_{\frac{k-1}{2^n}}\bigr)\bigl(B^j_{\frac{k}{2^n}} - B^j_{\frac{k-1}{2^n}}\bigr) - \langle B^i, B^j\rangle_{\frac{k}{2^n}} + \langle B^i, B^j\rangle_{\frac{k-1}{2^n}}\Bigr]
\]
\[
= \frac12\,g_{m,(i,j)}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\int_{\frac{k-1}{2^n}}^{\frac{k}{2^n}} \bigl(B_s^i - B^i_{\frac{k-1}{2^n}}\bigr)\,dB_s^j
+ \frac12\,g_{m,(i,j)}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\int_{\frac{k-1}{2^n}}^{\frac{k}{2^n}} \bigl(B_s^j - B^j_{\frac{k-1}{2^n}}\bigr)\,dB_s^i.
\]
Summing over $k = 1,\ldots,2^n t_n$ completes the proof of this lemma. □

Proof of Theorem 1.1. Since $x_n(t) - x(t) = x_n(t_n) - x(t_n) + x_n(t) - x_n(t_n) - x(t) + x(t_n)$, by Lemma 2.1 we only have to estimate $\mathbb E|x_n(t_n) - x(t_n)|^{2p}$. By the Newton–Leibniz formula, for any $s\in\bigl[\frac{k-1}{2^n}, \frac{k}{2^n}\bigr)$,
\[
\sigma_{ij}(x_n(s)) = \sigma_{ij}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr) + \sum_{m=1}^d \int_{\frac{k-1}{2^n}}^s \frac{\partial}{\partial x_m}\sigma_{ij}(x_n(u))\,dx_n^m(u)
\]
and
\[
h_{i,(j,l)}(x_n(s)) = h_{i,(j,l)}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr) + \sum_{m=1}^d \int_{\frac{k-1}{2^n}}^s \frac{\partial}{\partial x_m}h_{i,(j,l)}(x_n(u))\,dx_n^m(u).
\]
Therefore, we have
\[
x_n^i(t_n) = x_0^i + \sum_{j=1}^r\sum_{k=1}^{2^n t_n} \sigma_{ij}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\Bigl(B^j_{\frac{k}{2^n}} - B^j_{\frac{k-1}{2^n}}\Bigr)
\]
\[
\quad + \sum_{j=1}^r\sum_{k=1}^d \int_0^{t_n} \bigl(W_n^j(s_n^+) - W_n^j(s)\bigr)\,\frac{\partial}{\partial x_k}\sigma_{ij}(x_n(s))
\Bigl[\sum_{l=1}^r \sigma_{kl}(x_n(s))\dot W_n^l(s) + b_k(x_n(s)) + \sum_{m,l=1}^r \Bigl(h_{k,(m,l)}(x_n(s)) - \frac12 g_{k,(m,l)}(x_n(s))\Bigr)\dot V_n^{m,l}(s)\Bigr]\,ds
\]
\[
\quad + \sum_{j,l=1}^r\sum_{k=1}^{2^n t_n} h_{i,(j,l)}\Bigl(x_n\Bigl(\frac{k-1}{2^n}\Bigr)\Bigr)\Bigl(\langle B^j, B^l\rangle_{\frac{k}{2^n}} - \langle B^j, B^l\rangle_{\frac{k-1}{2^n}}\Bigr)
\]
\[
\quad + \sum_{j,l=1}^r\sum_{k=1}^d \int_0^{t_n} \bigl(V_n^{j,l}(s_n^+) - V_n^{j,l}(s)\bigr)\,\frac{\partial}{\partial x_k}h_{i,(j,l)}(x_n(s))
\Bigl[\sum_{l'=1}^r \sigma_{k,l'}(x_n(s))\dot W_n^{l'}(s) + b_k(x_n(s)) + \sum_{u,v=1}^r \Bigl(h_{k,(u,v)}(x_n(s)) - \frac12 g_{k,(u,v)}(x_n(s))\Bigr)\dot V_n^{u,v}(s)\Bigr]\,ds
\]
\[
\quad + \int_0^{t_n} b_i(x_n(s))\,ds - \frac12\sum_{j,l=1}^r \int_0^{t_n} g_{i,(j,l)}(x_n(s))\dot V_n^{j,l}(s)\,ds,
\]
and we can write
\[
I(t) := x_n^i(t_n) - x^i(t_n) = \sum_{m=1}^8 I_m(t),
\]
where
\[
I_1(t) = \sum_{j=1}^r \int_0^{t_n} \bigl(\sigma_{ij}(x_n(s_n)) - \sigma_{ij}(x(s))\bigr)\,dB^j(s),
\]
\[
I_2(t) = \sum_{j,l=1}^r \int_0^{t_n} \bigl(h_{i,(j,l)}(x_n(s_n)) - h_{i,(j,l)}(x(s))\bigr)\,d\langle B^j, B^l\rangle_s,
\]
\[
I_3(t) = \int_0^{t_n} \bigl(b_i(x_n(s)) - b_i(x(s))\bigr)\,ds,
\]
\[
I_4(t) = \sum_{j,l=1}^r \int_0^{t_n} \bigl(g_{i,(j,l)}(x_n(s)) - g_{i,(j,l)}(x_n(s_n))\bigr)\Bigl[\bigl(W_n^j(s_n^+) - W_n^j(s)\bigr)\dot W_n^l(s) - \frac12\dot V_n^{j,l}(s)\Bigr]\,ds,
\]
\[
I_5(t) = \sum_{j,l=1}^r \int_0^{t_n} g_{i,(j,l)}(x_n(s_n))\Bigl[\bigl(W_n^j(s_n^+) - W_n^j(s)\bigr)\dot W_n^l(s) - \frac12\dot V_n^{j,l}(s)\Bigr]\,ds,
\]
\[
I_6(t) = \sum_{j=1}^r\sum_{k=1}^d \int_0^{t_n} \bigl(W_n^j(s_n^+) - W_n^j(s)\bigr)\,\frac{\partial}{\partial x_k}\sigma_{ij}(x_n(s))\,b_k(x_n(s))\,ds,
\]
\[
I_7(t) = \sum_{j,m,l=1}^r\sum_{k=1}^d \int_0^{t_n} \bigl(W_n^j(s_n^+) - W_n^j(s)\bigr)\,\frac{\partial}{\partial x_k}\sigma_{ij}(x_n(s))\Bigl(h_{k,(m,l)}(x_n(s)) - \frac12 g_{k,(m,l)}(x_n(s))\Bigr)\dot V_n^{m,l}(s)\,ds,
\]
and
\[
I_8(t) = \sum_{j,l=1}^r\sum_{k=1}^d \int_0^{t_n} \bigl(V_n^{j,l}(s_n^+) - V_n^{j,l}(s)\bigr)\,\frac{\partial}{\partial x_k}h_{i,(j,l)}(x_n(s))
\Bigl[\sum_{l'=1}^r \sigma_{k,l'}(x_n(s))\dot W_n^{l'}(s) + b_k(x_n(s)) + \sum_{u,v=1}^r \Bigl(h_{k,(u,v)}(x_n(s)) - \frac12 g_{k,(u,v)}(x_n(s))\Bigr)\dot V_n^{u,v}(s)\Bigr]\,ds.
\]
Then, by the BDG inequality for the G-stochastic integral and Lemma 2.1, it is easy to get that for any $t\in[0,1]$,
\[
\mathbb E\Bigl(\sup_{0\le s\le t}|I_1(s)|^{2p}\Bigr) \le C\Bigl(2^{-np} + \int_0^t \mathbb E|I(s)|^{2p}\,ds\Bigr)
\]
and
\[
\mathbb E\Bigl(\sup_{0\le s\le t}|I_2(s)|^{2p}\Bigr) + \mathbb E\Bigl(\sup_{0\le s\le t}|I_3(s)|^{2p}\Bigr) \le C\Bigl(2^{-np} + \int_0^t \mathbb E|I(s)|^{2p}\,ds\Bigr),
\qquad
\mathbb E\Bigl(\sup_{0\le t\le 1}|I_4(t)|^{2p}\Bigr) \le C\,2^{-np}.
\]
By the BDG inequality for the G-stochastic integral and Lemma 2.2, we also have
\[
\mathbb E\Bigl(\sup_{0\le t\le 1}|I_5(t)|^{2p}\Bigr) \le C\,2^{-np}.
\]
From $\mathbb E\bigl|W_n^j(s_n^+) - W_n^j(s)\bigr|^{2p} = C\,2^{-np}\bigl(1 - 2^n(s - s_n)\bigr)^{2p}$, it is easy to see that
\[
\mathbb E\Bigl(\sup_{0\le t\le 1}|I_6(t)|^{2p}\Bigr) + \mathbb E\Bigl(\sup_{0\le t\le 1}|I_7(t)|^{2p}\Bigr) \le C\,2^{-np}.
\]
By $\bigl|V_n^{j,l}(s_n^+) - V_n^{j,l}(s)\bigr| \le C\,2^{-n}$ and $\mathbb E\bigl|\dot W_n^j\bigr|^{2p} \le C\,2^{np}$, we have
\[
\mathbb E\Bigl(\sup_{0\le t\le 1}|I_8(t)|^{2p}\Bigr) \le C\,2^{-np}.
\]
Together with the above estimates, we obtain
\[
\mathbb E\Bigl(\sup_{0\le s\le t}|I(s)|^{2p}\Bigr) \le C\Bigl(2^{-np} + \int_0^t \mathbb E\Bigl(\sup_{0\le u\le s}|I(u)|^{2p}\Bigr)\,du\Bigr).
\]
Consequently, by Gronwall's inequality, $\mathbb E\bigl(\sup_{0\le t\le 1}|I(t)|^{2p}\bigr) \le C\,2^{-np}$, which completes the proof of the theorem. □

References
[1] L. Denis, M.S. Hu, S. Peng, Function spaces and capacity related to a sublinear expectation: application to G-Brownian motion paths, Potential Analysis 34 (2011), 139–161.
[2] F.Q. Gao, Pathwise properties and homeomorphic flows for stochastic differential equations driven by G-Brownian motion, Stoch. Proc. Appl. 119 (2009), 3356–3382.


[3] F.Q. Gao, H. Jiang, Large deviations for stochastic differential equations driven by G-Brownian motion, Stoch. Proc. Appl. 120 (2010), 2212–2240.
[4] N. Ikeda, S. Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland/Kodansha, Amsterdam, 1989.
[5] M. Ledoux, M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes, Springer-Verlag, 1991.
[6] S. Peng, G-expectation, G-Brownian motion and related stochastic calculus of Itô type, in: Proceedings of the 2005 Abel Symposium 2, eds. Benth et al., 541–567, Springer-Verlag, 2006.
[7] S. Peng, Multi-dimensional G-Brownian motion and related stochastic calculus under G-expectation, Stochastic Processes and their Applications 118 (2008), 2223–2253.
[8] S. Peng, G-Brownian motion and dynamic risk measure under volatility uncertainty, preprint, arXiv:0711.2834v1 [math.PR], 19 Nov. 2007.
[9] S. Peng, A new central limit theorem under sublinear expectations, arXiv:0803.2656v1 [math.PR], 18 Mar. 2008.
[10] J.G. Ren, Analyse quasi-sûre des équations différentielles stochastiques, Bull. Sc. Math. 114 (1990), 187–213.
[11] D. Stroock, S.R.S. Varadhan, On the support of diffusion processes with applications to the strong maximum principle, Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability 3 (1972), 333–359.
[12] T. Taniguchi, Successive approximations to solutions of stochastic differential equations, J. Differential Equations 96 (1992), 152–169.
[13] J. Xu, B. Zhang, Martingale characterization of G-Brownian motion, Stochastic Processes and their Applications 119 (2009), 232–248.
[14] X.C. Zhang, Euler–Maruyama approximations for SDEs with non-Lipschitz coefficients and applications, J. Math. Anal. Appl. 316 (2006), 447–458.

Fuqing Gao
School of Mathematics and Statistics, Wuhan University, Wuhan 430072, P.R. China
e-mail: [email protected]

Hui Jiang
School of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, P.R. China
e-mail: [email protected]

Progress in Probability, Vol. 65, 83–97 © 2011 Springer Basel AG

Stochastic Flows for Nonlinear SPDEs Driven by Linear Multiplicative Space-time White Noises

Benjamin Goldys and Xicheng Zhang

Abstract. For a nonlinear stochastic partial differential equation driven by linear multiplicative space-time white noises, we prove that there exists a bicontinuous version of the solution with respect to the initial value and the time variable.

Mathematics Subject Classification (2000). 60H15.

Keywords. Stochastic flows, stochastic partial differential equations, space-time white noises.

1. Introduction and main result

Consider the following stochastic partial differential equation (SPDE) on [0,1] with Dirichlet boundary conditions:

  ∂u/∂t = ∂²u/∂x² + f(x, u(t,x)) + [b(x)u(t,x) + c(x)] ∂²W/(∂t∂x),
  u(t,0) = u(t,1) = 0,  u(0,x) = u₀(x),   (1)

where W(t,x) is a space-time white noise on ℝ₊ × [0,1] defined on some stochastic basis (Ω, F, P; (F_t)_{t≥0}), i.e., a two-parameter Brownian sheet, and f(x,r) : [0,1] × ℝ → ℝ and b(x), c(x) : [0,1] → ℝ are real measurable functions satisfying

  |f(x,r) − f(x,r′)| ≤ C_f |r − r′|,  ∀ r, r′ ∈ ℝ, x ∈ [0,1],   (2)

and

  f(x,0), c(x) ∈ L²(0,1),  b(x) ∈ L^∞(0,1).   (3)

This work is supported by ARC Discovery grant DP0663153 of Australia and NSF of China (no. 10971076).


Since ∂²W/(∂t∂x) does not make sense in general, equation (1) is understood in the following mild form (cf. [9]):

  u(t,x) = ∫₀¹ G_t(x,y) u₀(y) dy + ∫₀ᵗ ∫₀¹ G_{t−s}(x,y) f(y, u(s,y)) dy ds
       + ∫₀ᵗ ∫₀¹ G_{t−s}(x,y) [b(y)u(s,y) + c(y)] W(ds, dy),   (4)

where G_t(x,y) is the fundamental solution of the associated homogeneous heat equation, i.e., equation (1) with f = b = c = 0 and u₀(x) = δ_y(x), which has the expression

  G_t(x,y) := (1/√(4πt)) Σ_{n=−∞}^{∞} [ exp(−(y−x−2n)²/(4t)) − exp(−(y+x−2n)²/(4t)) ];

the third integral in (4) is the usual Itô stochastic integral with respect to the martingale measure W(ds, dy) (cf. [9]).

It is well known that for any u₀ ∈ L²(0,1), there exists a unique continuous (F_t)-adapted process t ↦ u(t; u₀) ∈ L²(0,1) such that equation (4) holds in L²(0,1) (cf. [9, Theorem 3.2]). However, it is not known whether the family of solutions {u(t; u₀) : t ∈ ℝ₊, u₀ ∈ L²(0,1)} forms a stochastic flow. More precisely, does there exist a version ũ(t; u₀) of u(t; u₀) such that ℝ₊ × L²(0,1) ∋ (t, u₀) ↦ ũ(t; u₀) ∈ L²(0,1) is continuous almost surely?

For finite-dimensional stochastic dynamical systems, the existence of stochastic flows has been studied extensively by means of Kolmogorov's continuity criterion (cf. [6, 10], etc.). However, the flow property for infinite-dimensional stochastic dynamical systems may fail (cf. [2, pp. 5–6] or [8]). For linear stochastic evolution equations with multiplicative noises, this problem was investigated by Flandoli in [2]; see also [9, p. 332, Theorem 4.1]. For nonlinear SPDEs there are few results; we are only aware of [3] and [4], where the authors considered SPDEs driven by linear multiplicative noises with finitely many Brownian motions, a setting which cannot be applied to equation (1).

Our main result in this paper is stated as follows:

Theorem 1.1. Under (2) and (3), for u₀ ∈ L²(0,1), let u(t; u₀) be the unique solution of equation (4). Then there exists a version ũ(t; u₀) of u(t; u₀) such that for any p ≥ 2 and P-almost all ω ∈ Ω,

  (0,∞) × L²(0,1) ∋ (t, u₀) ↦ ũ(t, ω; u₀) ∈ L^p(0,1)

is continuous. In particular, the following flow property holds: for P-almost all ω ∈ Ω,

  ũ(t+s, ω; u₀) = ũ(t, θ_s ω; ũ(s, ω; u₀)),  ∀ t, s ≥ 0, u₀ ∈ L²(0,1),

where θ_s is the usual shift operator.
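The image-charge series for G_t above and the eigenfunction expansion G_t(x,y) = Σ_{n≥1} e^{−π²n²t}·2 sin(nπx) sin(nπy) (the spectral data of the Dirichlet Laplacian used later in Section 3) represent the same kernel, and this can be cross-checked numerically. The following sketch is not part of the paper; truncation levels and tolerances are ad hoc.

```python
import math

def G_images(t, x, y, N=50):
    # Dirichlet heat kernel on [0,1] via the method of images (the series above)
    s = 0.0
    for n in range(-N, N + 1):
        s += math.exp(-(y - x - 2 * n) ** 2 / (4 * t))
        s -= math.exp(-(y + x - 2 * n) ** 2 / (4 * t))
    return s / math.sqrt(4 * math.pi * t)

def G_spectral(t, x, y, N=200):
    # Eigenfunction expansion: sum_n exp(-pi^2 n^2 t) * 2 sin(n pi x) sin(n pi y)
    return sum(math.exp(-math.pi ** 2 * n ** 2 * t)
               * 2 * math.sin(n * math.pi * x) * math.sin(n * math.pi * y)
               for n in range(1, N + 1))

t = 0.05
for (x, y) in [(0.3, 0.7), (0.5, 0.5), (0.2, 0.9)]:
    assert abs(G_images(t, x, y) - G_spectral(t, x, y)) < 1e-10
# Dirichlet boundary condition: the kernel vanishes at y = 0 and y = 1
assert abs(G_images(t, 0.3, 0.0)) < 1e-12 and abs(G_images(t, 0.3, 1.0)) < 1e-12
```

The agreement of the two truncated series, together with the vanishing at the boundary, is exactly what makes (4) consistent with the Dirichlet condition in (1).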


We remark that for nonlinear multiplicative noises the above problem is still open. Our proof of this theorem is mainly based on careful estimates of the singularity along the diagonal for a family of random linear evolution operators studied in [8, 2]. In Section 2 we give an abstract result on the existence of stochastic flows for stochastic evolution equations. In Section 3, Theorem 1.1 is proved.

2. An abstract result for stochastic evolution equations

Let H be a real separable Hilbert space and let L : D(L) ⊂ H → H be the infinitesimal generator of an analytic semigroup (T_t)_{t≥0} of positive type. Define a family of Hilbert spaces

  H^α := D(L^{α/2}),  α ∈ ℝ,

endowed with the graph norm ‖x‖_{H^α} := ‖x‖_H + ‖L^{α/2}x‖_H. Since we are assuming that (T_t)_{t≥0} is of positive type, we have the equivalent norm ‖x‖_{H^α} ≃ ‖L^{α/2}x‖_H; below we shall always use this norm. Moreover, we can extend the operator T_t to H^α for any α ∈ ℝ (cf. [7]).

The following properties are well known (cf. [5, pp. 24–27] or [7, p. 74]).

Proposition 2.1.
(i) T_t : H → H^α for each t > 0 and α > 0.
(ii) For each t > 0, α ∈ ℝ and every x ∈ H^{2α}, T_t L^α x = L^α T_t x.
(iii) For some δ > 0 and each t, α > 0, the operator L^α T_t is bounded in H and

  ‖L^α T_t x‖_H ≤ C_α t^{−α} e^{−δt} ‖x‖_H,  ∀ x ∈ H.   (5)

(iv) Let α ∈ (0,1] and x ∈ H^{2α}; then

  ‖T_t x − x‖_H ≤ C_α t^α ‖x‖_{H^{2α}},  t > 0.   (6)

Let {W^k(t), t ≥ 0, k ∈ ℕ} be an infinite sequence of independent standard (F_t)-Brownian motions on (Ω, F, P), and let l² be the usual Hilbert space of square-summable real sequences. Then {W^k(t), t ≥ 0, k ∈ ℕ} can be regarded as a standard cylindrical Brownian motion on l². For α ∈ ℝ, let L₂(l²; H^α) be the space of all Hilbert–Schmidt operators from l² to H^α. Below, we fix β ∈ [0,1) and let

  B = (B_k)_{k∈ℕ} : H → L₂(l²; H^{−β})   (7)

be a bounded linear operator. For r ≥ 0, consider the following linear SPDE:

  dX(t) = −LX(t) dt + B_k X(t) dW^k(t),  t ≥ r,  X(r) = x₀.   (8)


Here and below we use the convention that repeated indices are summed, and the letter C, with or without subscripts, denotes a positive constant whose value may change from one place to another.

For any x₀ ∈ H^{−β}, it is well known that there exists a unique solution X(t, r; x₀) of equation (8) such that for any γ ∈ (0, 1−β) (cf. [2, p. 25, Theorem 5.4])

  t ↦ X(t, r; x₀) ∈ C((r, ∞); H^γ),  P-a.s.,

and

  X(t, r; x₀) = T_{t−r} x₀ + ∫_r^t T_{t−s} B_k X(s, r; x₀) dW^k(s).

It is easy to see that H^{−β} ∋ x₀ ↦ X(t, r; x₀) ∈ L²(Ω, P; H^γ) is a bounded linear operator for any γ ∈ (0, 1−β). By the uniqueness of the solution we also have

  X(t, r, ω; x₀) = X(t−r, 0, θ_r ω; x₀),  P-a.s.,   (9)

where θ_r is the usual shift operator.

The following simple estimate will be used frequently: for α, β ∈ [0,1) and 0 ≤ r < t,

  ∫_r^t ds/((t−s)^β (s−r)^α) = (t−r)^{1−α−β} ∫₀¹ ds/((1−s)^β s^α) ≤ C_{α,β} · (t−r)^{1−α−β}.   (10)

We begin with the following Gronwall lemma of Volterra type.

Lemma 2.2. Let T > 0 and let f : [0,T] → ℝ₊ be a continuous function. Assume that for some α, β ∈ [0,1) and γ > α + β − 1,

  f(t) ≤ f₀ + C t^γ ∫₀^t f(s) ds/((t−s)^β s^α),  ∀ t ∈ [0,T].

Then for some C_T > 0,

  f(t) ≤ C_T · f₀,  ∀ t ∈ [0,T].   (11)

Proof. Choose p > 1 so that

  q := p/(p−1) < (1+γ)/(α+β) ∧ 1/α ∧ 1/β.

By Hölder's inequality we have

  f(t)^p ≤ C_p f₀^p + C_p t^{γp} ( ∫₀^t ds/((t−s)^{qβ} s^{qα}) )^{p−1} ∫₀^t f(s)^p ds
      ≤ C_p f₀^p + C_p t^{(1+γ−q(α+β))(p−1)+γ} ∫₀^t f(s)^p ds   (by (10))
      ≤ C_p f₀^p + C_{T,p} ∫₀^t f(s)^p ds.

The estimate (11) now follows by the usual Gronwall inequality. □

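The scaling identity (10) is, after the substitution s = r + (t−r)u, the Euler Beta integral ∫₀¹ u^{−α}(1−u)^{−β} du = B(1−α, 1−β). A quick numerical sanity check (not from the paper; quadrature scheme and tolerance are ad hoc):

```python
import math

def kernel_integral(r, t, alpha, beta, N=200_000):
    # Midpoint rule for ∫_r^t (t-s)^(-beta) (s-r)^(-alpha) ds
    # (both endpoint singularities are integrable since alpha, beta < 1)
    h = (t - r) / N
    total = 0.0
    for i in range(N):
        s = r + (i + 0.5) * h
        total += (t - s) ** (-beta) * (s - r) ** (-alpha)
    return total * h

alpha, beta = 0.3, 0.5
# Exact value: (t-r)^(1-alpha-beta) * B(1-alpha, 1-beta) via the Gamma function
Bab = math.gamma(1 - alpha) * math.gamma(1 - beta) / math.gamma(2 - alpha - beta)

for (r, t) in [(0.0, 1.0), (0.5, 2.0)]:
    exact = (t - r) ** (1 - alpha - beta) * Bab
    approx = kernel_integral(r, t, alpha, beta)
    assert abs(approx - exact) / exact < 1e-2
```

In particular the constant C_{α,β} in (10) can be taken equal to B(1−α, 1−β), which is finite precisely because α, β < 1.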

We now prove several moment estimates for the mapping (t, r) ↦ X(t, r; x₀).

Lemma 2.3. Let α ∈ [0,1). For any p ≥ 1 and T > 0, there exists a positive constant C_{T,p} such that for all x₀ ∈ H,

  sup_{0≤r<t≤T} E[ (t−r)^{αp} ‖X(t, r; x₀)‖_H^{2p} ] ≤ C_{T,p} · ‖x₀‖_{H^{−α}}^{2p}.   (12)

Lemma 2.4. Let α ∈ [0,1) and γ ∈ [0, 1−β). For any p ≥ 1 and T > 0, there exists a positive constant C_{T,p} such that for all x₀ ∈ H and all 0 ≤ r ≤ r₁ < r₂ ≤ t ≤ T,

  E‖ ∫_{r₁}^{r₂} T_{t−s} B_k X(s, r; x₀) dW^k(s) ‖_{H^γ}^{2p} ≤ C_{T,p} ( ∫_{r₁}^{r₂} ds/((t−s)^{γ+β}(s−r)^α) )^p · ‖x₀‖_{H^{−α}}^{2p}.


Proof. By the BDG inequality and Hölder's inequality we have

  E‖ ∫_{r₁}^{r₂} T_{t−s} B_k X(s, r; x₀) dW^k(s) ‖_{H^γ}^{2p}
   ≤ C_p E( ∫_{r₁}^{r₂} ‖T_{t−s} B X(s, r; x₀)‖²_{L₂(l²;H^γ)} ds )^p
   ≤ C_p E( ∫_{r₁}^{r₂} (t−s)^{−(γ+β)} ‖L^{−β/2} B X(s, r; x₀)‖²_{L₂(l²;H)} ds )^p   (by (5))
   ≤ C_p E( ∫_{r₁}^{r₂} (t−s)^{−(γ+β)} ‖X(s, r; x₀)‖²_H ds )^p   (by (7))
   ≤ C_p ( ∫_{r₁}^{r₂} ds/((t−s)^{γ+β}(s−r)^α) )^{p−1} ∫_{r₁}^{r₂} E((s−r)^α ‖X(s, r; x₀)‖²_H)^p ds/((t−s)^{γ+β}(s−r)^α)
   ≤ C_{T,p} ( ∫_{r₁}^{r₂} ds/((t−s)^{γ+β}(s−r)^α) )^p ‖x₀‖_{H^{−α}}^{2p}   (by (12)).

The proof is thus complete. □

Lemma 2.5. Let α ∈ [0,1) and ε ∈ (0, 1−α). For any p ≥ 1 and T > 0, there exists a positive constant C_{T,p} such that for all x₀ ∈ H and 0 ≤ r < r′ < t ≤ T,

  E[ (t−r′)^{(α+ε)p} ‖X(t, r′; x₀) − X(t, r; x₀)‖_H^{2p} ] ≤ C_{T,p} · |r′−r|^{(ε∧(1−β))p} · ‖x₀‖_{H^{−α}}^{2p},   (13)

where β is from (7).

Proof. Let 0 ≤ r < r′ < t ≤ T and set

  g(t, r′, r) := (t−r′)^{(α+ε)/2} (X(t, r′; x₀) − X(t, r; x₀)).

Then

  g(t, r′, r) = (t−r′)^{(α+ε)/2} (T_{t−r′} − T_{t−r}) x₀
        + (t−r′)^{(α+ε)/2} ∫_r^{r′} T_{t−s} B_k X(s, r; x₀) dW^k(s)
        + (t−r′)^{(α+ε)/2} ∫_{r′}^t T_{t−s} B_k (X(s, r′; x₀) − X(s, r; x₀)) dW^k(s)
       =: I₁(t, r′, r) + I₂(t, r′, r) + I₃(t, r′, r).

For I₁, we have

  ‖I₁(t, r′, r)‖²_H ≤ C (r′−r)^ε · (t−r′)^{α+ε} · ‖L^{ε/2} T_{t−r′} x₀‖²_H   (by (6))
          ≤ C (r′−r)^ε · ‖L^{−α/2} x₀‖²_H   (by (5))
          = C (r′−r)^ε · ‖x₀‖²_{H^{−α}}.


For I₂, by Lemma 2.4 we have

  E‖I₂(t, r′, r)‖_H^{2p} ≤ C (t−r′)^{(α+ε)p} ( ∫_r^{r′} ds/((t−s)^β (s−r)^α) )^p · ‖x₀‖_{H^{−α}}^{2p}
           ≤ C (t−r′)^{εp} ( ∫_r^{r′} ds/((t−s)^{β−α}(s−r)^α) )^p · ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_{T,p} ( ∫_r^{r′} ds/((r′−s)^{β−α}(s−r)^α) )^p · ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_{T,p} · |r′−r|^{(1−β)p} · ‖x₀‖_{H^{−α}}^{2p}   (by (10)).

As in the proof of Lemma 2.4, for I₃ we have

  E‖I₃(t, r′, r)‖_H^{2p} ≤ C (t−r′)^{(α+ε)p} E( ∫_{r′}^t ‖g(s, r′, r)‖²_H ds/((t−s)^β (s−r′)^{α+ε}) )^p
           ≤ C (t−r′)^{(α+ε)p} ( ∫_{r′}^t ds/((t−s)^β (s−r′)^{α+ε}) )^{p−1} ∫_{r′}^t E‖g(s, r′, r)‖_H^{2p} ds/((t−s)^β (s−r′)^{α+ε})
           ≤ C ∫_{r′}^t (t−r′)^{(1−β)(p−1)+α+ε} E‖g(s, r′, r)‖_H^{2p} ds/((t−s)^β (s−r′)^{α+ε})   (by (10)).   (14)

Combining the above calculations, we obtain

  E‖g(t, r′, r)‖_H^{2p} ≤ C_{T,p} [ (r′−r)^{εp} + (r′−r)^{(1−β)p} ] · ‖x₀‖_{H^{−α}}^{2p}
           + C_{T,p} ∫_{r′}^t (t−r′)^{(1−β)(p−1)+α+ε} E‖g(s, r′, r)‖_H^{2p} ds/((t−s)^β (s−r′)^{α+ε}).

The result follows by Lemma 2.2.



Set

  f(t, r; x₀) := (t−r)^{(α+γ)/2} ∫_r^t T_{t−s} B_k X(s, r; x₀) dW^k(s).

Lemma 2.6. Let α ∈ [0,1), γ ∈ [0, 1−β) and

  0 < ε < (α+γ) ∧ (1−β−γ) ∧ (1−α).   (15)

For any p ≥ 1 and T > 0, there exists a positive constant C_{T,p} such that for all x₀ ∈ H and 0 ≤ r ≤ t ≤ T, 0 ≤ r′ ≤ t′ ≤ T,

  E‖f(t′, r′; x₀) − f(t, r; x₀)‖_{H^γ}^{2p} ≤ C_{T,p} (|t′−t|^{εp} + |r′−r|^{εp}) · ‖x₀‖_{H^{−α}}^{2p}.   (16)


Proof. For 0 ≤ r < t ≤ t′ ≤ T, we have

  f(t′, r; x₀) − f(t, r; x₀)
   = [ (t′−r)^{(α+γ)/2} − (t−r)^{(α+γ)/2} ] ∫_r^{t′} T_{t′−s} B_k X(s, r; x₀) dW^k(s)
    + (t−r)^{(α+γ)/2} ∫_t^{t′} T_{t′−s} B_k X(s, r; x₀) dW^k(s)
    + (t−r)^{(α+γ)/2} ∫_r^t (T_{t′−s} − T_{t−s}) B_k X(s, r; x₀) dW^k(s)
   =: I₁(t′, t) + I₂(t′, t) + I₃(t′, t).

Note that

  |(t′−r)^{(α+γ)/2} − (t−r)^{(α+γ)/2}| ≤ |t′−t|^{(α+γ)/2} ≤ |t′−t|^{ε/2} · |t′−r|^{(α+γ−ε)/2}.

For any p ≥ 1, by virtue of γ+β < 1 and (15), Lemma 2.4 gives

  E‖I₁(t′, t)‖_{H^γ}^{2p} ≤ C_p |t′−t|^{εp} |t′−r|^{(α+γ−ε)p} ( ∫_r^{t′} ds/((t′−s)^{γ+β}(s−r)^α) )^p ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_p |t′−t|^{εp} |t′−r|^{(1−ε−β)p} ( ∫₀¹ ds/((1−s)^{γ+β} s^α) )^p ‖x₀‖_{H^{−α}}^{2p}   (by (10))
           ≤ C_{T,p} |t′−t|^{εp} ‖x₀‖_{H^{−α}}^{2p}.   (17)

Similarly, we have

  E‖I₂(t′, t)‖_{H^γ}^{2p} ≤ C_p (t−r)^{(α+γ)p} ( ∫_t^{t′} ds/((t′−s)^{γ+β}(s−r)^α) )^p ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_p (t−r)^{γp} ( ∫_t^{t′} ds/(t′−s)^{γ+β} )^p ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_{T,p} |t′−t|^{(1−γ−β)p} ‖x₀‖_{H^{−α}}^{2p},

and, as in the proof of Lemma 2.4,

  E‖I₃(t′, t)‖_{H^γ}^{2p} ≤ C_p (t−r)^{(α+γ)p} E( ∫_r^t ‖(T_{t′−s} − T_{t−s}) B X(s, r; x₀)‖²_{L₂(l²;H^γ)} ds )^p
           ≤ C_{T,p} (t−r)^{(α+γ)p} E( ∫_r^t (t′−t)^ε ‖L^{ε/2} T_{t−s} B X(s, r; x₀)‖²_{L₂(l²;H^γ)} ds )^p
           ≤ C_{T,p} (t′−t)^{εp} (t−r)^{(α+γ)p} ( ∫_r^t ds/((t−s)^{ε+γ+β}(s−r)^α) )^p ‖x₀‖_{H^{−α}}^{2p}
           ≤ C_{T,p} |t′−t|^{εp} ‖x₀‖_{H^{−α}}^{2p}   (by (10)).


Combining the above calculations, we obtain that for all 0 ≤ r < t ≤ t′ ≤ T,

  E‖f(t′, r; x₀) − f(t, r; x₀)‖_{H^γ}^{2p} ≤ C_{T,p} · |t′−t|^{εp} · ‖x₀‖_{H^{−α}}^{2p}.   (18)

For 0 ≤ r < r′ < t ≤ T we have

  f(t, r′; x₀) − f(t, r; x₀)
   = [ (t−r′)^{(α+γ)/2} − (t−r)^{(α+γ)/2} ] ∫_r^t T_{t−s} B_k X(s, r; x₀) dW^k(s)
    + (t−r′)^{(α+γ)/2} ∫_r^{r′} T_{t−s} B_k X(s, r; x₀) dW^k(s)
    + (t−r′)^{(α+γ)/2} ∫_{r′}^t T_{t−s} B_k (X(s, r′; x₀) − X(s, r; x₀)) dW^k(s)
   =: J₁(r′, r) + J₂(r′, r) + J₃(r′, r).

As in the estimation of (17), we have

  E‖J₁(r′, r)‖_{H^γ}^{2p} ≤ C_p (r′−r)^{εp} ‖x₀‖_{H^{−α}}^{2p},

and as in the estimation of (14),

  E‖J₂(r′, r)‖_{H^γ}^{2p} ≤ C_p (r′−r)^{(1−β)p} ‖x₀‖_{H^{−α}}^{2p}.

For J₃ we have

  E‖J₃(r′, r)‖_{H^γ}^{2p} ≤ (t−r′)^{(α+γ)p} E( ∫_{r′}^t ‖X(s, r′; x₀) − X(s, r; x₀)‖²_H ds/(t−s)^{γ+β} )^p
           ≤ (t−r′)^{(α+γ)p} ( ∫_{r′}^t ds/((t−s)^{γ+β}(s−r′)^{α+ε}) )^{p−1}
             × ∫_{r′}^t E((s−r′)^{α+ε} ‖X(s, r′; x₀) − X(s, r; x₀)‖²_H)^p ds/((t−s)^{γ+β}(s−r′)^{α+ε})
           ≤ C_T (t−r′)^{(1−ε−β)(p−1)+α+γ} (r′−r)^{εp} ∫_{r′}^t ds/((t−s)^{γ+β}(s−r′)^{α+ε}) · ‖x₀‖_{H^{−α}}^{2p}   (by (13))
           ≤ C_T (t−r′)^{(1−ε−β)p} (r′−r)^{εp} · ‖x₀‖_{H^{−α}}^{2p}   (by (10)).

Thus, for 0 ≤ r ≤ r′ < t ≤ T, we have

  E‖f(t, r′; x₀) − f(t, r; x₀)‖_{H^γ}^{2p} ≤ C_{T,p} · |r′−r|^{εp} · ‖x₀‖_{H^{−α}}^{2p}.   (19)

Moreover, similarly to the above calculations, we also have for all 0 ≤ r < t ≤ T,

  E‖f(t, r; x₀)‖_{H^γ}^{2p} ≤ C_{T,p} · |t−r|^{εp} · ‖x₀‖_{H^{−α}}^{2p}.   (20)

Estimate (16) now follows from (18), (19) and (20). □

Define a family of random linear operators by

  S_{t,r}(ω) x₀ := X(t, r, ω; x₀).   (21)

Then we have:

Theorem 2.7. Assume that for some α ∈ (0,1), L^{−α/2} is a Hilbert–Schmidt operator on H. Then for any γ ∈ [0, 1−β), there exists a version S̃_{t,r} of S_{t,r} such that for P-almost all ω,

  {(t, r) ∈ ℝ₊² : r < t} ∋ (t, r) ↦ S̃_{t,r}(ω) ∈ L₂(H, H^γ)  is continuous,

and for any T > 0 and some random variable K_T ∈ ∩_{p≥1} L^p(Ω, F, P),

  ‖S̃_{t,r}(ω)‖_{L₂(H,H^γ)} ≤ (t−r)^{−(α+γ)/2} · K_T(ω),  ∀ 0 ≤ r < t ≤ T.   (22)

Proof. We decompose S_{t,r} into the following two operators:

  S_{t,r} x₀ = S^{(1)}_{t,r} x₀ + S^{(2)}_{t,r} x₀,

where

  S^{(1)}_{t,r} x₀ := T_{t−r} x₀,  S^{(2)}_{t,r} x₀ := ∫_r^t T_{t−s} B_k X(s, r; x₀) dW^k(s).

Let {e_k, k ∈ ℕ} be an orthonormal basis of H. It is clear that

  Σ_k ‖S^{(1)}_{t,r} e_k‖²_{H^γ} = Σ_k ‖T_{t−r} e_k‖²_{H^γ} ≤ (C_α/(t−r)^{α+γ}) Σ_k ‖L^{−α/2} e_k‖²_H ≤ C_α/(t−r)^{α+γ}.

Thus, we only need to prove that x₀ ↦ S^{(2)}_{t,r}(ω) x₀ has the desired properties. For any p ≥ 1 and T > 0, by Minkowski's inequality and Lemma 2.6 we have

  E( Σ_{k=1}^∞ ‖(t′−r′)^{(α+γ)/2} S^{(2)}_{t′,r′} e_k − (t−r)^{(α+γ)/2} S^{(2)}_{t,r} e_k‖²_{H^γ} )^p
   = E( Σ_{k=1}^∞ ‖f(t′, r′; e_k) − f(t, r; e_k)‖²_{H^γ} )^p
   ≤ ( Σ_{k=1}^∞ [ E‖f(t′, r′; e_k) − f(t, r; e_k)‖_{H^γ}^{2p} ]^{1/p} )^p
   ≤ C_{T,p} (|t′−t|^{εp} + |r′−r|^{εp}) ( Σ_{k=1}^∞ ‖L^{−α/2} e_k‖²_H )^p
   ≤ C_{T,p} (|t′−t|^{εp} + |r′−r|^{εp}),  ∀ r ≤ t, r′ ≤ t′ ∈ [0,T].

By Kolmogorov's continuity criterion, there exists a version S̃^{(2)}_{t,r} of S^{(2)}_{t,r} such that

  {(t, r) ∈ ℝ₊² : r ≤ t} ∋ (t, r) ↦ ( (t−r)^{(α+γ)/2} S̃^{(2)}_{t,r} e_k )_{k∈ℕ} ∈ L₂(l²; H^γ)

is locally Hölder continuous, and for any p ≥ 1 and T > 0,

  E sup_{0≤r≤t≤T} ( Σ_{k=1}^∞ (t−r)^{α+γ} ‖S̃^{(2)}_{t,r} e_k‖²_{H^γ} )^p < +∞.

Therefore, if we define for x₀ ∈ H

  S̃^{(2)}_{t,r}(ω) x₀ := Σ_{k=1}^∞ ⟨x₀, e_k⟩_H · S̃^{(2)}_{t,r}(ω) e_k,

then S̃^{(2)}_{t,r} is a version of S^{(2)}_{t,r}, and for any p ≥ 1,

  E sup_{0≤r≤t≤T} ( (t−r)^{(α+γ)/2} ‖S̃^{(2)}_{t,r}‖_{L₂(H,H^γ)} )^p < +∞.

The proof is thus complete. □

We now turn to the following nonlinear stochastic evolution equation:

  dY(t) = −LY(t) dt + F(Y(t)) dt + [B_k Y(t) + D_k(t)] dW^k(t),  Y(0) = y₀,   (23)

where, for some γ ∈ [0, 1−β) (β from (7)),

  ‖F(y₁) − F(y₂)‖_H ≤ C_F · ‖y₁ − y₂‖_{H^γ}   (24)

and

  ∫₀^T ‖D(t)‖²_{L₂(l²;H^{−β})} dt < +∞,  ∀ T > 0.   (25)

Theorem 2.8. Assume that (7), (24) and (25) hold. Then there exists a unique (F_t)-adapted solution Y(t, ω; y₀) such that for P-almost all ω ∈ Ω,

  (0,∞) × H ∋ (t, y₀) ↦ Y(t, ω; y₀) ∈ H^γ

is continuous.

Proof. Let Z(t) ∈ C((0,∞); H^γ) solve the following equation:

  Z(t) = ∫₀^t T_{t−s} [B_k Z(s) + D_k(s)] dW^k(s).   (26)

Then the unique solution of equation (23) can be represented, by the variation of constants formula, as

  Y(t, ω; x₀) = S̃_{t,0}(ω) x₀ + ∫₀^t S̃_{t,s}(ω) F(Y(s, ω; x₀)) ds + Z(t, ω),   (27)

where S̃_{t,s} is given in Theorem 2.7. Indeed, note that by definition (21),

  ∫₀^t T_{t−s} B_k S̃_{s,0} x₀ dW^k(s) = S_{t,0} x₀ − T_t x₀,   (28)

and by the stochastic Fubini theorem,

  ∫₀^t T_{t−s} B_k ( ∫₀^s S̃_{s,r} F(Y(r; x₀)) dr ) dW^k(s)
   = ∫₀^t ∫_r^t T_{t−s} B_k S̃_{s,r} F(Y(r; x₀)) dW^k(s) dr
   = ∫₀^t [ S̃_{t,r} F(Y(r; x₀)) − T_{t−r} F(Y(r; x₀)) ] dr.   (29)


By (26)–(29), we find that

  Y(t; x₀) = T_t x₀ + ∫₀^t T_{t−s} F(Y(s; x₀)) ds + ∫₀^t T_{t−s} [B_k Y(s; x₀) + D_k(s)] dW^k(s).

Thus, from (27) we have, for any T > 0,

  ‖Y(t, ω; x₀) − Y(t, ω; y₀)‖_{H^γ}
   ≤ ‖S̃_{t,0}(ω)(x₀ − y₀)‖_{H^γ} + ‖ ∫₀^t S̃_{t,s}(ω)(F(Y(s, ω; x₀)) − F(Y(s, ω; y₀))) ds ‖_{H^γ}
   ≤ t^{−(α+γ)/2} K_T(ω) ‖x₀ − y₀‖_H + ∫₀^t C_F · K_T(ω) (t−s)^{−(α+γ)/2} ‖Y(s, ω; x₀) − Y(s, ω; y₀)‖_{H^γ} ds   (by (22) and (24)).

By [5, Lemma 7.1] we get

  ‖Y(t, ω; x₀) − Y(t, ω; y₀)‖_{H^γ} ≤ C_T t^{−(α+γ)/2} K_T(ω) ‖x₀ − y₀‖_H.

The proof is complete. □

Remark 2.9. We shall use this abstract result to prove Theorem 1.1. We emphasize that condition (7) is rather strong, so that we can only apply it to stochastic equations in spatial dimension 1 (see Lemma 3.1 below).

3. Proof of Theorem 1.1

In order to use Theorem 2.8, we let H := L²(0,1) and L := −∂²/∂x² with the domain

  D(L) := {φ ∈ L²(0,1) : Lφ ∈ L²(0,1)} = H₀¹(0,1) ∩ H²(0,1),

where H₀¹(0,1) and H²(0,1) are the usual Sobolev spaces. It is clear that (L, D(L)) is a positive, self-adjoint and densely defined operator in L²(0,1), and generates an analytic semigroup (T_t)_{t≥0}, which is explicitly given by the fundamental solution G_t(x,y) as follows:

  T_t φ(x) = ∫₀¹ G_t(x,y) φ(y) dy.   (30)

The eigenvalues and normalized eigenfunctions of L are given by

  λ_n = π²n²,  e_n(x) = √2 sin(πnx),  n ∈ ℕ,

i.e., L e_n = λ_n e_n. For α ∈ ℝ, the fractional power L^α is defined by

  L^α φ = Σ_{n=1}^∞ λ_n^α ⟨φ, e_n⟩_H e_n,

and H^α can be characterized by

  H^α := { φ ∈ L²(0,1) : ‖φ‖²_{H^α} = Σ_{n=1}^∞ λ_n^α |⟨φ, e_n⟩_H|² < +∞ }.

From this it is easy to see that

  L^{−α} is a Hilbert–Schmidt operator on H for any α > 1/4.   (31)

Define

  W^k(t) = ∫₀¹ e_k(y) W(t, dy);

then {W^k, k ∈ ℕ} is a sequence of independent standard Brownian motions. Let g(s,x) be a measurable (F_t)-adapted process with

  E ∫₀^T ∫₀¹ |g(s,x)|² ds dx < +∞,  ∀ T > 0.

By approximation with elementary processes, one may prove that

  ∫₀^t ∫₀¹ g(s,y) W(ds, dy) = Σ_k ∫₀^t ⟨g(s), e_k⟩_H dW^k(s).
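The spectral data used here — orthonormality of e_n(x) = √2 sin(nπx) and the Hilbert–Schmidt criterion (31), i.e. Σ_n (π²n²)^{−2α} < ∞ iff 4α > 1 — can be sanity-checked numerically. This sketch is not from the paper; grid sizes, the test exponents and tolerances are ad hoc.

```python
import math

# Orthonormality of e_n(x) = sqrt(2) sin(n pi x) on (0,1), by midpoint quadrature
def inner(n, m, N=20_000):
    h = 1.0 / N
    return sum(2 * math.sin(n * math.pi * (i + 0.5) * h)
                 * math.sin(m * math.pi * (i + 0.5) * h)
               for i in range(N)) * h

assert abs(inner(3, 3) - 1.0) < 1e-6
assert abs(inner(3, 5)) < 1e-6

# ||L^{-a}||_HS^2 = sum_n (pi^2 n^2)^(-2a): finite iff 4a > 1 (criterion (31))
def partial(a, N):
    return sum((math.pi ** 2 * n ** 2) ** (-2 * a) for n in range(1, N + 1))

# a = 1/2 (4a = 2 > 1): the series converges, in fact to sum 1/(pi^2 n^2) = 1/6
assert abs(partial(0.5, 200_000) - 1.0 / 6.0) < 1e-4
# a = 1/5 (4a = 0.8 < 1): partial sums keep growing, no Hilbert-Schmidt bound
assert partial(0.2, 200_000) - partial(0.2, 100_000) > 1.0
```

The convergent case a = 1/2 corresponds to the classical identity Σ 1/n² = π²/6; the divergent case illustrates why (31) fails for α ≤ 1/4, which is the dimension restriction behind Remark 2.9.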

So, the stochastic integral occurring in (4) can be rewritten as

  ∫₀^t ∫₀¹ G_{t−s}(x,y) [b(y)u(s,y) + c(y)] W(ds, dy)
   = Σ_k ∫₀^t ( ∫₀¹ G_{t−s}(x,y) [b(y)u(s,y) + c(y)] e_k(y) dy ) dW^k(s)
   = Σ_k ∫₀^t [T_{t−s}(b u(s) e_k + c e_k)](x) dW^k(s)   (by (30)).

Thus, if we define

  F(u)(x) := f(x, u(x))  and  B_k u := b · u · e_k,  D_k := c · e_k,

then equation (4) takes the form (23).

Lemma 3.1. For any β with 1/2 < β < 1, we have D ∈ L₂(l²; H^{−β}), and B is a bounded linear operator from H to L₂(l²; H^{−β}).

Proof. Note that L^{−α} also has the expression

  L^{−α} = (1/Γ(α)) ∫₀^∞ t^{α−1} T_t dt,

where Γ(α) is the usual Gamma function.

By Minkowski's inequality, we have

  Σ_k ‖L^{−β/2}(b u e_k)‖²_H = Σ_k ∫₀¹ ( (1/Γ(β/2)) ∫₀^∞ t^{β/2−1} [T_t(b u e_k)](x) dt )² dx
   ≤ ( (1/Γ(β/2)) ∫₀^∞ t^{β/2−1} ( Σ_k ∫₀¹ |[T_t(b u e_k)](x)|² dx )^{1/2} dt )²
   = ( (1/Γ(β/2)) ∫₀^∞ t^{β/2−1} ( ∫₀¹ ∫₀¹ |G_t(x,y) b(y) u(y)|² dy dx )^{1/2} dt )²   (by (30) and Parseval's identity)
   =: (1/Γ(β/2))² (I₁ + I₂)²,

where

  I₁ = ∫₁^∞ t^{β/2−1} ( ∫₀¹ ∫₀¹ |G_t(x,y) b(y) u(y)|² dy dx )^{1/2} dt,
  I₂ = ∫₀¹ t^{β/2−1} ( ∫₀¹ ∫₀¹ |G_t(x,y) b(y) u(y)|² dy dx )^{1/2} dt.

From the expression for G_t(x,y), it is easy to see that

  G_t(x,y) ≤ (1/√(4πt)) exp(−|x−y|²/(4t)).

Thanks to b ∈ L^∞(0,1), we have

  I₁ ≤ (1/√(4π)) ∫₁^∞ t^{β/2−3/2} ( ∫₀¹ |b(y)u(y)|² dy )^{1/2} dt ≤ C ‖u‖_H,

and, thanks to β > 1/2,

  I₂ ≤ (1/√(4π)) ∫₀¹ t^{β/2−3/2} ( ∫_{−∞}^∞ e^{−|x|²/(2t)} dx )^{1/2} ( ∫₀¹ |b(y)u(y)|² dy )^{1/2} dt
    ≤ C ∫₀¹ t^{β/2−5/4} dt · ‖u‖_H ≤ C ‖u‖_H.

Hence, for any u ∈ H = L²(0,1),

  ‖Bu‖²_{L₂(l²;H^{−β})} = Σ_k ‖L^{−β/2}(b u e_k)‖²_H ≤ C ‖u‖²_H.

The statement for D = (c · e_k)_{k∈ℕ} follows in the same way, with c ∈ L²(0,1) in place of b·u. The proof is complete. □
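The only delicate point in the estimate of I₂ is the integrability of t^{β/2−5/4} near t = 0, which holds precisely when β > 1/2; this is why Lemma 3.1 requires 1/2 < β < 1. A small numerical illustration (not from the paper; the cutoff levels are ad hoc):

```python
# Truncated integral ∫_eps^1 t^(β/2 - 5/4) dt, evaluated in closed form.
def truncated(beta, eps):
    c = beta / 2.0 - 5.0 / 4.0          # exponent of t in the I_2 estimate
    # for c != -1: ∫_eps^1 t^c dt = (1 - eps^(c+1)) / (c + 1)
    return (1.0 - eps ** (c + 1.0)) / (c + 1.0)

# beta = 0.8 > 1/2: c + 1 = 0.15 > 0, the integral converges to 1/(c+1)
ok = [truncated(0.8, 10.0 ** (-k)) for k in (16, 32, 64)]
assert abs(ok[2] - ok[1]) < 1e-2
assert abs(ok[2] - 1.0 / 0.15) < 1e-3

# beta = 0.4 < 1/2: c + 1 = -0.05 < 0, the truncated integrals blow up
div = [truncated(0.4, 10.0 ** (-k)) for k in (16, 32, 64)]
assert div[1] > 5 * div[0] and div[2] > 5 * div[1]
```

The same threshold β > 1/2 reappears as the condition α > 1/4 (with β = 2α) in the Hilbert–Schmidt criterion (31).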

Proof of Theorem 1.1. For any p ≥ 2, one chooses γ with 1/2 − 1/p < γ such that (cf. [p. 243, (4.10)]) H^γ ⊂ L^p(0,1).
We further introduce the notion of viscosity solution for the penalized equation, and show that the above u_ε is in fact a viscosity solution. In Section 4 we let ε → 0 to obtain the weak solution, using an approximation argument in the Banach space of continuous functions. In Section 5 we show that the weak solution of Section 4 is in fact a viscosity solution of the original variational inequality (HJB equation), and we prove uniqueness. The solution v is expressed in terms of a stopping time, and at the end of Section 5 we give an expression for this optimal stopping time. With this approach, no regularity of the value function can be expected.

Optimal Stopping Problems

101

2. Problem setting

We assume that the wealth price X = {X(t)} evolves according to the one-dimensional stochastic differential equation of jump-diffusion type

  dX(t) = ( r + ∫_{|z|≤1} (e^z − 1 − z) µ(dz) ) X(t) dt + σ X(t) dB(t)
      + X(t−) ( ∫_{|z|≤1} (e^z − 1) Ñ(dz, dt) + ∫_{|z|>1} (e^z − 1) N(dz, dt) ),   (1)

where B is a standard Brownian motion, N is a Poisson random measure with Lévy measure µ, and Ñ is its compensated measure. One may take, for example, a discrete Lévy measure µ(dz) = Σ_k c_k δ_{a_k}(dz) with c_k > 0 and a_k → 0 as k → ∞ obeying (A0); the Lévy measure of α-stable type (where α = 2 − β ∈ (0,2)) and that of generalized inverse Gaussian type also apply. (The measure above is an extension of the one treated in [25, Sect. 3].)

Y. Ishikawa

We put

  r̃ = r + ∫_{ℝ∖{0}} (e^z − 1 − z · 1_{|z|≤1}) µ(dz);

then equation (1) can be written in the equivalent form

  dX(t) = r̃ X(t) dt + σ X(t) dB(t) + X(t−) ∫_{ℝ∖{0}} (e^z − 1) Ñ(dz, dt).   (1)′

We assume

  supp µ ⊂ [0, +∞)   (A1)

for simplicity. (See the next section for the detailed reason.) This implies

  ∫_{ℝ∖{0}} (e^z − 1 − z · 1_{|z|≤1}) µ(dz) > 0.

Further we assume

  r̃ > 0.   (A2)

By (A1) the process X(t) has no negative jumps. Hence, starting from x > 0, X(t) diffuses around {x} (when σ > 0), or moves rightward by jumps.

The reward function g(x) is assumed to have the following property:

  g ≥ 0,  g ∈ C,   (2)

where C denotes the Banach space of all continuous functions on [0,∞) vanishing at infinity, with norm ‖h‖ = sup_{x∈[0,∞)} |h(x)|. We fix g in the sequel.

A mathematical objective of this article is to find an optimal stopping time τ* so as to maximize the expected reward function

  J(τ) = E[ e^{−r̃τ} g(X(τ)) ]   (3)

over the class S of all stopping times τ associated with (F_t), where e^{−r̃τ} g(X_τ) at τ = ∞ is interpreted as zero. Here and in what follows, E denotes the expectation with respect to the probability law such that X(0) = x. This function, viewed as a function of the starting point x, is called the value function, denoted by v(x); that is, v(x) = sup_{τ∈S} J(τ).

To solve this problem, we consider the variational inequality (HJB equation, for short):

  max(Lu, g − u) = 0 in (0,∞),  u(0) = g(0),   (4)

with the interpretation u(0) = u(0+). Here L is an integro-differential operator given by

  Lu = −r̃u + (1/2)σ²x²u″ + r̃xu′ + ∫ { u(x + γ(x,z)) − u(x) − u′(x) · γ(x,z) } µ(dz),


where γ(x,z) = x(e^z − 1). We write Lu = −r̃u + L₀u in the sequel.

The condition (2) is fulfilled if the reward function is given by the bounded function g(x) = (K − x)⁺ · 1_{x≥0} for the strike price K > 0 of a European put option. It may also hold for some type of call option g with an overhead barrier: choose g ∈ C such that g(x) = (x − K)⁺ · 1_{x>0} if x ≤ M and g(x) = 0 if x > M + 1, for some large M > K > 0.

Remarks. (i) Suppose that the variational inequality (4) admits a solution v ∈ C²([0,∞)). Then the optimal stopping time τ̂ is given by

  τ̂ = inf{ t > 0 : v(X(t)) ≤ g(X(t)) }.

Indeed, assuming τ̂ < ∞ a.e., from (4) it follows that Lv = 0 if v > g. Hence Lv(X(t)) = 0 for t < τ̂. By Itô's formula, under some additional assumptions on v, we obtain

  E[ e^{−r̃τ̂} v(X(τ̂)) ] = v(x) + E ∫₀^τ̂ e^{−r̃t} Lv(X(t)) dt
            + E ∫₀^τ̂ e^{−r̃t} v′(X(t)) σ X(t) dB(t)
            + E ∫₀^τ̂ ∫ e^{−r̃t} ( v(X(t) + γ(X(t), z)) − v(X(t)) ) Ñ(ds, dz)
           = v(x).

Thus

  E[ e^{−r̃τ̂} g(X(τ̂)) ] ≥ E[ e^{−r̃τ̂} v(X(τ̂)) ] = v(x).

On the other hand, since Lv ≤ 0, Itô's formula gives

  E[ e^{−r̃τ} v(X(τ)) ] ≤ v(x),  τ ∈ S.

We assume v is bounded, and choose τ = τ̂. Therefore we might seem to obtain the optimality of τ̂, and to have Φ(x) = v(x), where Φ(x) = sup_τ J^x(τ). However, we remark that v ∈ C² may fail, because v is connected at some point x to g, which is supposed to be only continuous, and because we admit σ to be degenerate. (In [28], Chap. 2, a similar problem is studied with σ = 1, cf. [28] p. 43; a similar work [22] is carried out under the assumption σ > 0.) This is the reason why we use the penalty method.


(ii) There are several definitions of viscosity solutions for HJB equations associated with integro-differential operators, and some of them are equivalent. We adopt below the one introduced by Benth et al. [7], [8], [9], which accords with that given by Pham [29]. There are other approaches (e.g., [1], [2], [5]).

(iii) Since L satisfies the positive maximum principle, L can be viewed as a pseudo-differential operator with symbol a(x,ξ) = a₁(x,ξ) + a₂(x,ξ), where

  a₁(x,ξ) = −r̃ − (1/2)σ²x²ξ² + i r̃ x ξ,
  a₂(x,ξ) = ∫ { e^{iξγ(x,z)} − 1 − iξ · γ(x,z) } µ(dz).

The symbol of L₀ is (a₁(x,ξ) + r̃) + a₂(x,ξ). Since we only assume σ ≥ 0, the symbol a₁ may fail to be elliptic. On the other hand, due to (A00), the symbol a₂ satisfies |a₂(x,ξ)| ≥ c(x)|ξ|^α for each x ∈ ℝ, where α = 2 − β ∈ (0,2). Hence a(x,ξ) is a non-degenerate symbol of order α.

3. Penalized problem

In this section we show the existence of a unique solution u of equation (4). To solve (4), we need to study the penalized equation for ε > 0:

  r̃u = L₀u + (1/ε)(u − g)⁻ in (0,∞),  u(0) = g(0)/(r̃ε + 1),   (5)

originated by Bensoussan and Lions [10]. Here and in what follows we write (x)⁺ = max(x, 0) and (x)⁻ = max(−x, 0) for x ∈ ℝ. At the point x = 0, it is known from [20] that the "trace" u(0+) exists and is finite under the assumption (A1), due to the transmission condition.

Remark. In general we have e^z − 1 ≥ −1 for z ∈ ℝ, and hence ΔX(t) = X(t−)(e^z − 1) ≥ −X(t−) if X(t−) ≥ 0 at each jump corresponding to z ∈ supp µ. This implies X(t) = X(t−) + ΔX(t) ≥ 0 if X(t−) ≥ 0 at each jump moment t. However, since there may be infinitely many jumps in a finite time interval, if we admitted negative jumps (i.e., supp µ ∩ (−∞, 0) ≠ ∅) we could not in general guarantee the existence of the trace of u in (5) from the right at x = 0.


Hence we may search for the solution u in C (cf. Theorems 3.1, 4.1 below). We begin with a probabilistic penalty equation for u:

  u(x) = E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε)(u ∨ g)(X(t)) dt ],   (6)

for x ≥ 0, ε > 0, with the boundary condition u(0) = g(0)/(r̃ε + 1).

Theorem 3.1. We assume (2), (A0), (A00), (A1) and (A2). Then, for each ε > 0, there exists a unique nonnegative solution u = u_ε ∈ C of (6).

Proof. Define

  T h(x) = E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε)(h ∨ g)(X(t)) dt ]  for h ∈ C₊,   (7)

where C₊ = {h ∈ C : h ≥ 0}. Clearly, C₊ is a closed subset of C. By (7), we have

  0 ≤ T h(x) = E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε)(h ∨ g)(X(t)) dt ] ≤ ‖h ∨ g‖ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε) dt = ‖h ∨ g‖/(r̃ε + 1) ≤ ‖h ∨ g‖.

Then

  |T h(y) − T h(x)| ≤ E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε) |h ∨ g(X(t)) − h ∨ g(Y(t))| dt ] → 0 as y → x,

where {Y(t)} is the solution of (1) with the initial condition Y(0) = y > 0. This is true since |h ∨ g(X(t)) − h ∨ g(Y(t))| ≤ 2‖h ∨ g‖, where h ∨ g is bounded, and since we can use the dominated convergence theorem. Moreover, we have

  T h(x) = E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε)(h ∨ g)(X(t)) dt ] → 0 as x → ∞.

Indeed, write X(t) = X^x(t) to indicate the dependence of X(t) on x. We see X^x(t) → ∞ as x → ∞ by the expression for X^x(t) in Section 2. Therefore (h ∨ g)(X^x(t)) → 0 as x → ∞, and

  ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε)(h ∨ g)(X^x(t)) dt → 0,  x → ∞, a.s.

by the dominated convergence theorem. Therefore T h(x) → 0 as x → ∞ by the dominated convergence theorem again.

Thus T maps C₊ into C₊. Now, by (7), we have

  |T h₁(x) − T h₂(x)| ≤ E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε) |h₁(X(t)) − h₂(X(t))| dt ]
          ≤ E[ ∫₀^∞ e^{−(r̃+1/ε)t} (1/ε) ‖h₁ − h₂‖ dt ]
          = ‖h₁ − h₂‖/(r̃ε + 1),  h₁, h₂ ∈ C₊.

This yields that T is a contraction mapping. Thus T has a fixed point u in C₊, which solves (6). □

We fix ε > 0 temporarily.

Theorem 3.2. We make the assumptions of Theorem 3.1. Then the solution u of (6) is a viscosity solution of

  r̃u = L₀u + (1/ε)(u − g)⁻ on (0,∞),   (8)

with boundary condition u(0) = g(0)/(r̃ε + 1). Further, for any τ ∈ S, we have

  u(x) = E[ ∫₀^τ e^{−r̃t} (1/ε)(u − g)⁻(X(t)) dt + e^{−r̃τ} u(X(τ)) ].   (9)

In particular,

  u(x) = E[ ∫₀^∞ e^{−r̃t} (1/ε)(u − g)⁻(X(t)) dt ].   (10)

Optimal Stopping Problems

107

We postpone the proof of this theorem to the last section. Here we remark that, since u ∨ g = u + (u − g)− , we can rewrite (8) as

1 1 r˜ + u = L0 u + (u ∨ g) ε ε

in (0, ∞).

(11)

Remark. In general we can not guarantee the regularity nor the uniqueness of the solution uε of (8), in case σ is possibly degenerate. If σ > 0, we will be able to use the method of Garroni et al. [18] and lead to the regularity (i.e., the C 2 property) of u.

4. Passing to the limit as ε → 0 In this section, granting Theorem 3.2 for the moment, we study the convergence of u = uε ∈ C+ as ε → 0, where uε is a solution to (8). Define the Green function # ∞ $ Gβ h(x) = E e−βt h(X(t))dt , β > 0, (12) 0

and let G = {Gβ (βh) : h ∈ C, β > r˜}.

(13)

Our objective is to prove the following.

Theorem 4.1. We assume (2), (A0), (A1) and (A2). Let εn > 0 be any (A00), ∞ sequence such that εn → 0 and that n=1 εn < ∞. Then we have uε n



v

(14)

where v is some element in C. For the proof of this theorem, we prepare the following lemmas (Lemmas 4.2–4.4). Lemma 4.2. The subset G is dense in C. Proof. Step 1. Let h ∈ C be arbitrary. It is clear that Gβ (βh) ≤ h . By the dominated convergence theorem, we get # ∞ $ |Gβ (βh)(y) − Gβ (βh)(x)| ≤ E e−βt β|h(Y (t)) − h(X(t))|dt → 0 as y → x, 0

and

Gβ (βh)(x) = E

#

0

as in Section 3. Thus



e

−βt

$ βh(X(t))dt

G ⊂ C.

→ 0

as x → ∞

108

Y. Ishikawa

Step 2. On the other hand, for each x ≥ 0, it is easy to see that $ # ∞ −βt e β|h(X(t)) − h(x)|dt |Gβ (βh)(x) − h(x)| ≤ E #0 ∞ $ −s =E e |h(X(s/β)) − h(x)|ds → 0

as

0

β → ∞,

by the right continuity of X(t) as t → 0. We extend h ∈ C on the compactification [0, ∞) ∪ {∞} as h(∞) = 0. Hence Gβ (βh)(x) → h(x)

as β → ∞,

x ∈ [0, ∞) ∪ {∞}.

Let Λ be any element in the dual space C ∗ of C. By the Riesz representation theorem ([31] Theorem 6.19), there exists a measure µ = µΛ on [0, ∞) ∪ {∞} such that  Λ(h) = hdµ

for all h ∈ C. Let Λ be such that Λ(f ) = 0 for all f = Gβ (βh) ∈ G. By the dominated convergence theorem, we have Λ(h) = lim Λ(βGβ h) = 0, β→∞

h ∈ C.

Now, suppose that G¯ =  C, where G¯ is the closure of G. Then, there exists h0 ∈ C \ G¯ such that δ := inf f − h0 > 0. f ∈G¯

By the Hahn-Banach theorem, (cf. [31] Theorem 5.19), we can find some Λ∗ ∈ C ∗ such that ¯ Λ∗ (h0 ) = δ, Λ∗ (f ) = 0 for all f ∈ G. This is a contradiction, since h above ranges among C. Therefore, we deduce G¯ = C.  Lemma 4.3. Let u ˜ ∈ C+ be the solution of (8) with g˜ ∈ C+ replacing g. Then we have u − u ˜ ≤ g − g˜ . (15) Proof. Assume, on the contrary, u − u ˜ > g − g˜ . Let T˜ denote T in (7) with g˜ replacing g. Then, by Theorem 3.1, we observe u = T u and u ˜ = T˜ u ˜. Therefore we have  ∞ 1 1 1 u ∨ g˜) ≤ u − u ˜ . e−(˜r+ ε )t dt (u ∨ g) − (˜ |u − u ˜| = |T u − T˜ u ˜| ≤ ε r˜ε + 1 0 Here we used the relation |a ∨ b − c ∨ d| ≤ |b − d|

if

Hence we see that

|a − c| ≤ |b − d|,

a, b, c, d ∈ R.

1 u − u˜ . r˜ε + 1 Hence u − u ˜ = 0, and g − g˜ = 0. This is a contradiction. u − u ˜ ≤



Optimal Stopping Problems

109

Lemma 4.4. We have
$$u_\varepsilon(x) = \sup_{\tau \in S} E\big[e^{-\tilde r\tau}\{g - (u_\varepsilon - g)^-\}(X(\tau))\big]. \eqno(16)$$

Proof. Let $x > 0$. By (9), we have
$$u_\varepsilon(x) = E\Big[\int_0^\tau \frac1\varepsilon e^{-\tilde rt}(u_\varepsilon - g)^-(X(t))\,dt + e^{-\tilde r\tau}u_\varepsilon(X(\tau))\Big] \ge E\big[e^{-\tilde r\tau}(u_\varepsilon \wedge g)(X(\tau))\big].$$
On the other hand, we take $\tau = \tau_\varepsilon := \inf\{t : u_\varepsilon(X(t)) \le g(X(t))\}$, and consider $\tau_\varepsilon \wedge T$, $T > 0$. We have by (9)
$$u_\varepsilon(x) = E\Big[\int_0^{\tau_\varepsilon \wedge T} \frac1\varepsilon e^{-\tilde rt}(u_\varepsilon - g)^-(X(t))\,dt + e^{-\tilde r(\tau_\varepsilon \wedge T)}u_\varepsilon(X(\tau_\varepsilon \wedge T))\Big].$$
Since
$$e^{-\tilde r\tau_\varepsilon}u_\varepsilon(X(\tau_\varepsilon)) = e^{-\tilde r\tau_\varepsilon}(u_\varepsilon \wedge g)(X(\tau_\varepsilon)),$$
we let $T \to \infty$, and have
$$u_\varepsilon(x) = E\Big[\int_0^{\tau_\varepsilon} \frac1\varepsilon e^{-\tilde rt}(u_\varepsilon - g)^-(X(t))\,dt + e^{-\tilde r\tau_\varepsilon}(u_\varepsilon \wedge g)(X(\tau_\varepsilon))\Big] = E\big[e^{-\tilde r\tau_\varepsilon}(u_\varepsilon \wedge g)(X(\tau_\varepsilon))\big].$$
Therefore
$$u_\varepsilon(x) \le \sup_{\tau \in S} E\big[e^{-\tilde r\tau}(u_\varepsilon \wedge g)(X(\tau))\big].$$
Comparing this with the above inequality, and noting that $g - (u_\varepsilon - g)^- = u_\varepsilon \wedge g$, we obtain the relation (16). $\square$

We can now prove Theorem 4.1.

Proof of Theorem 4.1. Step 1. We claim that
$$\|(u_\varepsilon - g)^-\| \le \varepsilon\,\|\beta h + (\tilde r - \beta)g\|, \eqno(17)$$
if $g = G_\beta(\beta h) \in G$ for some $h \in C$. Indeed, along the same lines as the proof of Theorem 3.2, we observe that $g$ is the unique viscosity solution of
$$\beta g = L_0 g + \beta h \quad \text{in } (0,\infty), \qquad g(0) = h(0),$$
or equivalently,
$$\Big(\tilde r + \frac1\varepsilon\Big)g = L_0 g + \hat h + \frac1\varepsilon g \quad \text{in } (0,\infty), \qquad g(0) = \frac{1}{\tilde r + \frac1\varepsilon}\Big(\hat h(0) + \frac1\varepsilon g(0)\Big),$$


where $\hat h = \beta h + (\tilde r - \beta)g$ (see the proof of Theorem 5.3 below for the uniqueness). Hence we have $g = G_{\tilde r + \frac1\varepsilon}\big(\hat h + \frac1\varepsilon g\big)$. Therefore, by (6),
$$u_\varepsilon(x) - g(x) = E\Big[\int_0^\infty e^{-(\tilde r + \frac1\varepsilon)t}\Big\{\frac1\varepsilon(u_\varepsilon \vee g)(X(t)) - \Big(\hat h(X(t)) + \frac1\varepsilon g(X(t))\Big)\Big\}\,dt\Big] \ge -E\Big[\int_0^\infty e^{-(\tilde r + \frac1\varepsilon)t}\,\hat h(X(t))\,dt\Big] \ge -\varepsilon\,\|\hat h\|, \quad x > 0,$$
which implies (17).

Step 2. Let $g = G_\beta(\beta h) \in G$. Applying (17) to $u_{\varepsilon_{n+1}}(x)$ and $u_{\varepsilon_n}(x)$, by Lemma 4.4, we have
$$|u_{\varepsilon_{n+1}}(x) - u_{\varepsilon_n}(x)| \le \sup_{\tau \in S} E\big[e^{-\tilde r\tau}\,\big|(u_{\varepsilon_{n+1}} - g)^- - (u_{\varepsilon_n} - g)^-\big|(X(\tau))\big] \le (\varepsilon_{n+1} + \varepsilon_n)\,\|\beta h + (\tilde r - \beta)g\|.$$
Thus
$$\sum_{n=1}^\infty \|u_{\varepsilon_{n+1}} - u_{\varepsilon_n}\| \le \sum_{n=1}^\infty (\varepsilon_{n+1} + \varepsilon_n)\,\|\beta h + (\tilde r - \beta)g\| < \infty.$$
This implies that $\{u_{\varepsilon_n}\}$ is a Cauchy sequence in $C$, and we get (14).

Step 3. Let $g$ satisfy (2). By Lemma 4.2, there exists a sequence $\{g_m\} \subset G$ such that $g_m \to g$ in $C$. Let $u_\varepsilon^m$ be the solution of (8) corresponding to $g_m$. By Step 2, we see that
$$u_{\varepsilon_n}^m \longrightarrow v^m \in C \quad \text{as } n \to \infty. \eqno(18)$$
By Lemma 4.3,
$$\|u_{\varepsilon_n}^m - u_{\varepsilon_n}^{m'}\| \le \|g_m - g_{m'}\|.$$
Letting $n \to \infty$, we have
$$\|v^m - v^{m'}\| \le \|g_m - g_{m'}\|.$$
Hence $\{v^m\}$ is a Cauchy sequence, and
$$v^m \longrightarrow v \in C. \eqno(19)$$
Thus
$$\|u_{\varepsilon_n} - v\| \le \|u_{\varepsilon_n} - u_{\varepsilon_n}^m\| + \|u_{\varepsilon_n}^m - v^m\| + \|v^m - v\| \le \|g - g_m\| + \|u_{\varepsilon_n}^m - v^m\| + \|v^m - v\|.$$
Letting $n \to \infty$ and then $m \to \infty$, we obtain (14). The limit does not depend on the choice of $(\varepsilon_n)$ and $\{g_m\}$ as long as $(\varepsilon_n)$ satisfies the condition in the theorem. $\square$
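The penalized approximations $u_\varepsilon$ and their convergence (14) can also be illustrated numerically. The following finite-difference sketch is purely illustrative and not from the text: it treats a pure-diffusion special case (the jump part of $L_0$ is omitted), takes $\tilde r = r$ and a hypothetical put payoff $g(x) = (K-x)^+$, and solves the penalized equation $\tilde r u = L_0 u + \frac1\varepsilon(u-g)^-$ by policy iteration on the penalty term.

```python
import numpy as np

def penalized_value(eps, K=100.0, r=0.05, sigma=0.3, x_max=300.0, n=300):
    """Finite-difference sketch of the penalized problem
        r*u = L0 u + (1/eps)*(u - g)^-   on (0, x_max),
    with L0 u = 0.5*sigma^2*x^2*u'' + r*x*u' (pure diffusion, no jump term)
    and g(x) = (K - x)^+.  The penalty is linearized on the current active
    set {u < g} and the resulting linear systems are solved repeatedly
    (policy iteration), which typically stabilizes in a few steps."""
    x = np.linspace(0.0, x_max, n + 1)
    h = x[1] - x[0]
    g = np.maximum(K - x, 0.0)
    xi = x[1:-1]
    a = 0.5 * sigma**2 * xi**2 / h**2      # diffusion weight
    lo = -a                                # coefficient of u_{i-1}
    up = -a - r * xi / h                   # coefficient of u_{i+1} (upwind drift)
    di = r + 2.0 * a + r * xi / h          # diagonal of r*I - L0
    u = g.copy()                           # boundaries: u(0) = g(0), u(x_max) = 0
    for _ in range(50):
        active = (u[1:-1] < g[1:-1]) / eps        # 1/eps on the active set, else 0
        A = np.diag(di + active) + np.diag(lo[1:], -1) + np.diag(up[:-1], 1)
        rhs = active * g[1:-1]
        rhs[0] -= lo[0] * g[0]                    # fold in the boundary u(0) = g(0)
        new = np.linalg.solve(A, rhs)
        if np.allclose(new, u[1:-1]):
            break
        u[1:-1] = new
    return x, g, u

x, g, u = penalized_value(1e-3)
```

Decreasing `eps` along a summable sequence, as in Theorem 4.1, the computed $u_\varepsilon$ stabilizes, and the bound (17) is visible as $u_\varepsilon \ge g - O(\varepsilon)$.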


5. Viscosity solutions of variational inequalities

In this section, we study the viscosity solution of the original variational inequality:
$$\max(Lv,\ g - v) = 0 \quad \text{in } (0,\infty), \qquad v(0) = g(0). \eqno(20)$$
First we recall the notation $Lv = -\tilde r v + L_0 v$. We repeat the definition of the viscosity solution associated to the non-local operator.

Definition 5.1 (cf. [7], [8]). Let $v \in C([0,\infty))$ satisfy $v(0) = g(0)$. Then $v$ is called a viscosity solution of (20) if the following assertions are satisfied:
(a) For any $\varphi \in C^2$ and for any local minimum point $\bar z > 0$ of $v - \varphi$,
$$-\tilde r v(\bar z) + L_0\varphi(\bar z) \le 0;$$
(b) $v(x) \ge g(x)$ for all $x \ge 0$;
(c) For any $\varphi \in C^2$ and for any local maximum point $z > 0$ of $v - \varphi$,
$$(-\tilde r v + L_0\varphi)(v - g)^+\big|_{x=z} \ge 0.$$

Theorem 5.2. We assume (2), (A0), (A00), (A1) and (A2). Then the limit $v$ in Theorem 4.1 is a viscosity solution of (20).

Proof. Let $\varphi \in C^2$ and let $z > 0$ be a local maximum point of $v - \varphi$ such that
$$v(z) - \varphi(z) > v(x) - \varphi(x), \quad x \in \bar B_\delta(z),\ x \ne z,$$
for some $\delta > 0$. Here $B_\delta(z)$ denotes the neighborhood of $z$ with radius $\delta > 0$, and $\bar B_\delta(z)$ denotes its closure.

By the uniform convergence in Theorem 4.1, the function $u_{\varepsilon_n} - \varphi$ attains a local maximum at some $x_n \in \bar B_\delta(z)$. We deduce $x_n \to z$ as $n \to \infty$. Indeed, for the sequence of local maximum points $(x_n)$ in $\bar B_\delta(z)$ of $(u_{\varepsilon_n} - \varphi)$, choose a subsequence $(x_{n_k})$ of $(x_n)$ so that $x_{n_k} \to z'$ for some $z' \in \bar B_\delta(z)$. By Theorem 4.1,
$$(u_{\varepsilon_{n_k}} - \varphi)(x_{n_k}) \to (v - \varphi)(z') \quad \text{and} \quad \max_{x \in \bar B_\delta(z)}\big(u_{\varepsilon_{n_k}}(x) - \varphi(x)\big) \to \max_{x \in \bar B_\delta(z)}\big(v(x) - \varphi(x)\big).$$
Hence $(v - \varphi)(z') \ge (v - \varphi)(x)$, $x \in \bar B_\delta(z)$, and hence $(v - \varphi)(z') \ge (v - \varphi)(z)$. Hence we have $z' = z$.

Now, by Definition 3.3, we have
$$-\tilde r u_{\varepsilon_n}(x) + L_0\varphi(x) + \frac{1}{\varepsilon_n}(u_{\varepsilon_n} - g)^-(x)\Big|_{x = x_n} \ge 0.$$


Multiplying both sides by $(u_{\varepsilon_n} - g)^+(x_n)$, we obtain
$$\big(-\tilde r u_{\varepsilon_n}(x_n) + L_0\varphi(x_n)\big)(u_{\varepsilon_n} - g)^+(x_n) \ge 0.$$
Letting $n \to \infty$, we get
$$\big(-\tilde r v(z) + L_0\varphi(z)\big)(v - g)^+(z) \ge 0.$$
Next, by (17), we have
$$\|(u_{\varepsilon_n}^m - g_m)^-\| \le \varepsilon_n\,\|\beta h_m + (\tilde r - \beta)g_m\|,$$
where $g_m = G_\beta(\beta h_m)$ for some $h_m \in C$ and $u_{\varepsilon_n}^m$ is as in the proof of Theorem 4.1. Letting $n \to \infty$ (the norm on the right-hand side does not depend on $n$), we have by (18)
$$v^m(x) \ge g_m(x), \quad x \ge 0,$$
and then, by (19),
$$v(x) \ge g(x) \quad \text{for all } x \ge 0.$$
Finally, let $\bar z$ be a local minimizer of $v - \varphi$, and let $(\bar x_n)$ be a sequence of local minimizers of $u_{\varepsilon_n} - \varphi$ such that $\bar x_n \to \bar z$. Then, by Definition 3.3,
$$-\tilde r u_{\varepsilon_n}(x) + L_0\varphi(x) + \frac{1}{\varepsilon_n}(u_{\varepsilon_n} - g)^-(x)\Big|_{x = \bar x_n} \le 0,$$
from which
$$-\tilde r u_{\varepsilon_n}(\bar x_n) + L_0\varphi(\bar x_n) \le 0.$$
Letting $n \to \infty$, we deduce
$$-\tilde r v(\bar z) + L_0\varphi(\bar z) \le 0.$$
Thus we get the assertion of the theorem. $\square$

Theorem 5.3. We make the assumptions of Theorem 5.2. Let $v_i \in C$, $i = 1, 2$, be two viscosity solutions of (20). Then we have $v_1 = v_2$.

Proof. Step 1. Let $\varphi \in C^2$ and let $z > 0$ be a local maximizer of $v_1 - \varphi$. If $v_1 \le v_2$, then $(v_1 - v_2)^+ = 0$. If $v_1 > v_2$, then, as $v_1 > v_2 \ge g$, we have $(v_1 - g)^+ > 0$. By Definition 5.1(c), we have
$$\big(-\tilde r v_1(z) + L_0\varphi(z)\big)(v_1 - g)^+(z) \ge 0.$$
Hence
$$-\tilde r v_1(z) + L_0\varphi(z) \ge 0.$$
Thus
$$\big(-\tilde r v_1(z) + L_0\varphi(z)\big)(v_1 - v_2)^+(z) \ge 0.$$


Here we define the superjet $J^{2,+}v(x)$ and the subjet $J^{2,-}v(x)$ as follows:
$$J^{2,+}v(x) = \Big\{(p,q) \in \mathbb R \times S^0 :\ v(y) \le v(x) + p\cdot(y-x) + \tfrac12 q(y-x)^2 + o(|x-y|^2) \ \text{as } y \to x\Big\},$$
$$J^{2,-}v(x) = \Big\{(p,q) \in \mathbb R \times S^0 :\ v(y) \ge v(x) + p\cdot(y-x) + \tfrac12 q(y-x)^2 + o(|x-y|^2) \ \text{as } y \to x\Big\},$$
and $\bar J^{2,+}v(x)$, $\bar J^{2,-}v(x)$ denote their closures. Let
$$F\big(x,u,p,q,B^\delta(x,u,p),B_\delta(x,u,p)\big) = -\tilde r u + \tfrac12\sigma^2x^2q + rxp + B^\delta(x,u,p) + B_\delta(x,u,p),$$
where
$$B^\delta(x,u,p) = \int_{|z|>\delta}\{u(x+\gamma(x,z)) - u(x) - p\cdot\gamma(x,z)\}\,\mu(dz)$$
and
$$B_\delta(x,u,p) = \int_{|z|\le\delta}\{u(x+\gamma(x,z)) - u(x) - p\cdot\gamma(x,z)\}\,\mu(dz).$$
We have the following basic result concerning the characterisation of superjets (subjets).

Lemma 5.4 (cf. [8], Lemmas 3.1, 3.2). Let $v$ be a subsolution (supersolution) in $(0,\infty)$. Then for all $\delta > 0$, all $x \in (0,\infty)$, and all $(p,q) \in \bar J^{2,+}v(x)$ (resp. $\bar J^{2,-}v(x)$), there exists $\varphi \in C^2$ such that
$$F\big(x,v,p,q,B^\delta(x,v,p),B_\delta(x,\varphi,\varphi')\big) \ge 0 \quad (\le 0).$$
Further, $\varphi$ must satisfy $\varphi'(x) = p$ and $\varphi''(x) = q$.

Therefore, we obtain
$$\Big(-\tilde r v_1(x) + \tfrac12\sigma^2x^2q + rxp + B^\delta(x,v_1,p) + B_\delta(x,\varphi,\varphi')\Big)(v_1 - v_2)^+(x) \ge 0, \qquad \forall (p,q) \in \bar J^{2,+}v_1(x),\ \forall x > 0.$$

Step 2. Suppose there exists $x_0 > 0$ such that $v_1(x_0) - v_2(x_0) > 0$. Then we find $\eta > 0$ such that
$$M_0 := \sup_{x \ge 0}\big[v_1(x) - v_2(x) - 2\eta\psi(x)\big] > 0,$$
where $\psi(x) = (1+x^2)^{1/2} + \psi_0$ for a constant $\psi_0 > 0$ chosen later. Define
$$\Psi_k(x,y) = v_1(x) - v_2(y) - \frac k2|x-y|^2 - \eta\{\psi(x) + \psi(y)\}$$


for $k \in \mathbb N$. Then there exists a maximizer $(x_k, y_k) \in [0,\infty)^2$ of $\Psi_k(x,y)$ such that
$$\Psi_k(x_k,y_k) = \sup_{(x,y)}\Psi_k(x,y) \ge \max\big(\Psi_k(\bar x,\bar x),\ \Psi_k(x_k,x_k)\big),$$
where $\bar x > 0$ is a maximum point in $M_0$. Since $\Psi_k(x_k,y_k) \ge \Psi_k(\bar x,\bar x)$, we have
$$v_1(x_k) - v_2(y_k) - \frac k2|x_k-y_k|^2 - \eta\{\psi(x_k)+\psi(y_k)\} \ge M_0,$$
and
$$\frac k2|x_k-y_k|^2 < v_1(x_k) - v_2(y_k) - \eta\{\psi(x_k)+\psi(y_k)\} < v_1(x_k) - v_2(y_k) < M$$
for some $M > 0$. Hence we have $|x_k - y_k| \to 0$ as $k \to \infty$. Further, since $\Psi_k(x_k,y_k) \ge \Psi_k(x_k,x_k)$, we have
$$\frac k2|x_k-y_k|^2 \le v_2(x_k) - v_2(y_k) + \eta\big(\psi(x_k) - \psi(y_k)\big).$$
Here we let $k \to \infty$; then the above implies that $k|x_k-y_k|^2 \to 0$. By extracting subsequences $x_{k'}, y_{k'}$, we obtain that $x_{k'} \to \tilde x$, $y_{k'} \to \tilde x$ for some $\tilde x$. We denote this $(x_{k'}, y_{k'})$ by the same $(x_k, y_k)$. Therefore
$$v_1(\tilde x) - v_2(\tilde x) - 2\eta\psi(\tilde x) > 0, \qquad \tilde x > 0. \eqno(21)$$
Now, applying Ishii's lemma (cf. [14], Theorem 3.2) to $\Psi_k(x,y)$, we obtain $q_1, q_2 \in \mathbb R$ such that
$$\big(k(x_k-y_k),\ q_1\big) \in \bar J^{2,+}\tilde v_1(x_k), \qquad \big(k(x_k-y_k),\ q_2\big) \in \bar J^{2,-}\tilde v_2(y_k),$$
$$\begin{pmatrix} q_1 & 0 \\ 0 & -q_2 \end{pmatrix} \le 3k\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},$$
where $\tilde v_1(x) = v_1(x) - \eta\psi(x)$, $\tilde v_2(y) = v_2(y) + \eta\psi(y)$. Hence
$$(\hat p_1, \hat q_1) := \big(k(x_k-y_k) + \eta\psi'(x_k),\ q_1 + \eta\psi''(x_k)\big) \in \bar J^{2,+}v_1(x_k),$$
$$(\hat p_2, \hat q_2) := \big(k(x_k-y_k) - \eta\psi'(y_k),\ q_2 - \eta\psi''(y_k)\big) \in \bar J^{2,-}v_2(y_k).$$
By Definition 5.1(a) and Lemma 5.4, we have
$$-\tilde r v_2(y_k) + \tfrac12\sigma^2y_k^2\hat q_2 + ry_k\hat p_2 + B^\delta(y_k, v_2, \hat p_2) + B_\delta(y_k, \varphi_2, \varphi_2') \le 0.$$
Also, by Step 1,
$$\Big(-\tilde r v_1(x_k) + \tfrac12\sigma^2x_k^2\hat q_1 + rx_k\hat p_1 + B^\delta(x_k, v_1, \hat p_1) + B_\delta(x_k, \varphi_1, \varphi_1')\Big)(v_1 - v_2)^+(x_k) \ge 0.$$
Hence, by (21),
$$-\tilde r v_1(x_k) + \tfrac12\sigma^2x_k^2\hat q_1 + rx_k\hat p_1 + B^\delta(x_k, v_1, \hat p_1) + B_\delta(x_k, \varphi_1, \varphi_1') \ge 0$$


for sufficiently large $k$. Putting these inequalities together, we get
$$\tilde r\{v_1(x_k) - v_2(y_k)\} \le \tfrac12\sigma^2\big(x_k^2\hat q_1 - y_k^2\hat q_2\big) + r\big(x_k\hat p_1 - y_k\hat p_2\big)$$
$$\quad + \int_{|z|>\delta}\Big[\big(v_1(x_k+\gamma(x_k,z)) - v_2(y_k+\gamma(y_k,z))\big) - \big(v_1(x_k) - v_2(y_k)\big) - \big(\hat p_1\cdot\gamma(x_k,z) - \hat p_2\cdot\gamma(y_k,z)\big)\Big]\,\mu(dz)$$
$$\quad + \int_{|z|\le\delta}\Big[\big(\varphi_1(x_k+\gamma(x_k,z)) - \varphi_2(y_k+\gamma(y_k,z))\big) - \big(\varphi_1(x_k) - \varphi_2(y_k)\big) - \big(\varphi_1'(x_k)\cdot\gamma(x_k,z) - \varphi_2'(y_k)\cdot\gamma(y_k,z)\big)\Big]\,\mu(dz)$$
$$\equiv I_1 + I_2 + I_3 + I_4 \quad \text{(say)}.$$
Here we remark that, from Lemma 5.4, we can choose $\varphi_1$ and $\varphi_2$ such that $\varphi_1'(x_k) = \hat p_1$, $\varphi_1''(x_k) = \hat q_1$, $\varphi_2'(y_k) = \hat p_2$ and $\varphi_2''(y_k) = \hat q_2$. Letting $k \to \infty$, we have
$$I_1 \le \frac{\sigma^2}2\big[3k|x_k-y_k|^2 + \eta\{x_k^2\psi''(x_k) + y_k^2\psi''(y_k)\}\big] \to \sigma^2\eta\tilde x^2\psi''(\tilde x),$$
and
$$I_2 = r\big[k|x_k-y_k|^2 + \eta\{x_k\psi'(x_k) + y_k\psi'(y_k)\}\big] \to 2r\eta\tilde x\psi'(\tilde x).$$
On the other hand,
$$I_3 = \int_{|z|>\delta}\Big[\big(v_1(x_k+\gamma(x_k,z)) - v_2(y_k+\gamma(y_k,z))\big) - \big(v_1(x_k) - v_2(y_k)\big) - \big(k(x_k-y_k)^2(e^z-1) + \eta(\gamma(x_k,z)\psi'(x_k) + \gamma(y_k,z)\psi'(y_k))\big)\Big]\,\mu(dz).$$
By (A0), (A00), $\mu$ is a bounded measure on $\{|z| > \delta\}$, for which a constant times $(e^z-1)$ is integrable. As $k \to \infty$, $x_k \to \tilde x$ and $y_k \to \tilde x$, and we have by the Lebesgue convergence theorem
$$I_3 \to -2\eta\psi'(\tilde x)\int_{|z|>\delta}\gamma(\tilde x,z)\,\mu(dz).$$
On the other hand,
$$I_4 = \int_{|z|\le\delta}\Big[\big(\varphi_1(x_k+\gamma(x_k,z)) - \varphi_2(y_k+\gamma(y_k,z))\big) - \big(\varphi_1(x_k) - \varphi_2(y_k)\big) - \big(\varphi_1'(x_k)\cdot\gamma(x_k,z) - \varphi_2'(y_k)\cdot\gamma(y_k,z)\big)\Big]\,\mu(dz).$$
Here the integrand $\{\cdots\}$ is $O((e^z-1)^2) = O(|z|^2)$ as $|z| \to 0$, uniformly in $k$. Hence we can use the Lebesgue convergence theorem, and have
$$I_4 \to -2\eta\psi'(\tilde x)\int_{|z|\le\delta}\gamma(\tilde x,z)\,\mu(dz), \qquad k \to \infty.$$


Thus, choosing $\psi_0 > 0$ such that $\tfrac12\sigma^2 \le \tilde r\psi_0$, we obtain
$$\tilde r\{v_1(\tilde x) - v_2(\tilde x)\} \le 2\eta\Big\{\tfrac12\sigma^2\tilde x^2\psi''(\tilde x) + r\tilde x\psi'(\tilde x) - \psi'(\tilde x)\int\gamma(\tilde x,z)\,\mu(dz)\Big\}$$
$$\le 2\eta\Big\{\tfrac12\sigma^2\tilde x^2\frac{1}{(1+\tilde x^2)^{3/2}} + r\tilde x\frac{\tilde x}{\sqrt{1+\tilde x^2}}\Big\}$$
$$\le 2\eta\Big(\tfrac12\sigma^2 + \tilde r\tilde x^2\Big)(1+\tilde x^2)^{-1/2}$$
$$\le 2\eta\Big\{\tfrac12\sigma^2(1+\tilde x^2)^{1/2} + \tilde r(1+\tilde x^2)\Big\}(1+\tilde x^2)^{-1/2}$$
$$\le 2\eta\big\{\tilde r\psi_0(1+\tilde x^2)^{1/2} + \tilde r(1+\tilde x^2)\big\}(1+\tilde x^2)^{-1/2} = 2\tilde r\eta\psi(\tilde x).$$
This contradicts (21). The proof is finished. $\square$

Theorem 5.5. We make the assumptions of Theorem 5.2. Then we have
$$v(x) = \sup_{\tau \in S} E\big[e^{-\tilde r\tau}g(X(\tau))\big].$$
Further, the optimal stopping time $\tau^*$ is given by
$$\tau^* = \lim_{m\to\infty}\varrho_m$$
for $x > 0$, where $\varrho_m = \inf\{t \ge 0 :\ v(X(t)) - \tfrac1m \le g(X(t))\}$.

Proof. Let $x > 0$ and $\tau \in S$. By (9), we get
$$u_{\varepsilon_n}(x) = E\Big[\int_0^\tau \frac{1}{\varepsilon_n} e^{-\tilde rt}(u_{\varepsilon_n} - g)^-(X(t))\,dt + e^{-\tilde r\tau}u_{\varepsilon_n}(X(\tau))\Big] \ge E\big[e^{-\tilde r\tau}u_{\varepsilon_n}(X(\tau))\big].$$
Letting $n \to \infty$, by Theorems 4.1 and 5.2, we have
$$v(x) \ge E\big[e^{-\tilde r\tau}v(X(\tau))\big] \ge E\big[e^{-\tilde r\tau}g(X(\tau))\big].$$
Since
$$v(X(t)) - \frac1m > g(X(t)) \quad \text{on } \{t < \varrho_m\},$$
we have
$$E\Big[\int_0^{\varrho_m} e^{-\tilde rt}(u_{\varepsilon_n} - g)^-(X(t))\,dt\Big] \le E\Big[\int_0^{\varrho_m} e^{-\tilde rt}\Big(u_{\varepsilon_n} - \Big(v - \frac1m\Big)\Big)^-(X(t))\,dt\Big] \le E\Big[\int_0^{\varrho_m} e^{-\tilde rt}\Big(\frac1m - \|u_{\varepsilon_n} - v\|\Big)^-\,dt\Big] = 0 \eqno(22)$$


for sufficiently large $n$. Hence, by (9),
$$u_{\varepsilon_n}(x) = E\Big[\int_0^{\varrho_m} \frac{1}{\varepsilon_n} e^{-\tilde rt}(u_{\varepsilon_n} - g)^-(X(t))\,dt + e^{-\tilde r\varrho_m}u_{\varepsilon_n}(X(\varrho_m))\Big] = E\big[e^{-\tilde r\varrho_m}u_{\varepsilon_n}(X(\varrho_m))\big].$$
Letting $n \to \infty$, by (22), we get
$$v(x) = E\big[e^{-\tilde r\varrho_m}v(X(\varrho_m))\big] \le E\Big[e^{-\tilde r\varrho_m}\Big\{g(X(\varrho_m)) + \frac1m\Big\}\Big] \le \sup_{\tau \in S}E\big[e^{-\tilde r\tau}g(X(\tau))\big] + \frac1m.$$
Passing to the limit, we deduce
$$v(x) \le \sup_{\tau \in S}E\big[e^{-\tilde r\tau}g(X(\tau))\big].$$
For the latter statement, we set $\bar\tau = \tau_R \wedge \varrho_m$, where $\tau_R = \inf\{t \ge 0 :\ X(t) > R \text{ or } X(t) < \frac1R\}$, $R > 1$. We then have $\bar\tau < \infty$ a.s. By (22), it is clear that
$$E\Big[\int_0^{\bar\tau} e^{-\tilde rt}(u_{\varepsilon_n} - g)^-(X(t))\,dt\Big] = 0$$
for sufficiently large $n$. Hence again by (9)
$$E\big[e^{-\tilde r\bar\tau}u_{\varepsilon_n}(X(\bar\tau))\big] = u_{\varepsilon_n}(x).$$
Letting $n \to \infty$ and then $R \to \infty$, we get
$$E\big[e^{-\tilde r\varrho_m}v(X(\varrho_m))\big] = v(x).$$
Note that $\varrho_m \nearrow \tau^*$ as $m \nearrow \infty$. Passing to the limit, we deduce
$$E\big[e^{-\tilde r\tau^*}g(X(\tau^*))\big] = E\big[e^{-\tilde r\tau^*}v(X(\tau^*))\big] = v(x).$$
Thus we obtain the assertion. This completes the proof. $\square$



6. Proof of Theorem 3.2

We have by (6)
$$u(x) = E\Big[\int_0^\tau e^{-(\tilde r+\frac1\varepsilon)t}\frac1\varepsilon(u\vee g)(X(t))\,dt + \int_\tau^\infty e^{-(\tilde r+\frac1\varepsilon)t}\frac1\varepsilon(u\vee g)(X(t))\,dt\Big].$$
We have by the strong Markov property of $X(t)$
$$e^{-(\tilde r+\frac1\varepsilon)\tau}u(X(\tau)) = e^{-(\tilde r+\frac1\varepsilon)\tau}E\Big[\int_0^\infty e^{-(\tilde r+\frac1\varepsilon)t}\frac1\varepsilon(u\vee g)(X(t+\tau))\,dt\ \Big|\ \mathcal F_\tau\Big] = E\Big[\int_\tau^\infty e^{-(\tilde r+\frac1\varepsilon)t}\frac1\varepsilon(u\vee g)(X(t))\,dt\ \Big|\ \mathcal F_\tau\Big] \quad \text{a.s.}$$


Hence
$$u(x) = E\Big[\int_0^\tau e^{-(\tilde r+\frac1\varepsilon)t}\frac1\varepsilon(u\vee g)(X(t))\,dt + e^{-(\tilde r+\frac1\varepsilon)\tau}u(X(\tau))\Big], \qquad \tau \in S. \eqno(23)$$
This can be regarded as the dynamic programming principle for the solution $u$. This property and Itô's formula lead to the fact that $u$ is a viscosity solution of the equation
$$\Big(\tilde r + \frac1\varepsilon\Big)u = L_0u(x) + \frac1\varepsilon(u\vee g) \quad \text{in } (0,\infty), \qquad u(0) = \frac{g(0)}{\tilde r\varepsilon+1}, \eqno(24)$$
as in the proof of Theorem 1 in [19]. This is equivalent to
$$\tilde r u = L_0u(x) + \frac1\varepsilon(u-g)^- \quad \text{in } (0,\infty), \qquad u(0) = \frac{g(0)}{\tilde r\varepsilon+1}. \eqno(25)$$
Hence we have (9). By letting $\tau \to \infty$ we see that $u$ in (6) can be written as (10). $\square$

Acknowledgement. I thank the referee for suggesting several improvements of the manuscript.

References

[1] M. Arisawa, A new definition of viscosity solutions for a class of second-order degenerate elliptic integro-differential equations, Ann. Inst. Henri Poincaré, Anal. Non Linéaire 23 (2006), 695–711; Corrigendum for comparison theorems in: "A new definition of viscosity solutions for a class of second-order degenerate elliptic integro-differential equations", Ann. Inst. Henri Poincaré, Anal. Non Linéaire 24 (2007), 167–169.
[2] M. Arisawa, A remark on the definitions of viscosity solutions for the integro-differential equations with Lévy operators, J. Math. Pures Appl. (9) 89 (2008), 567–574.
[3] L.H.R. Alvarez, Solving optimal stopping problems of linear diffusions by applying convolution approximations, Math. Methods Oper. Res. 53 (2001), 89–99.
[4] G. Barles, R. Buckdahn and E. Pardoux, Backward stochastic differential equations and integral-partial differential equations, Stochastics Stochastics Rep. 60 (1997), 57–83.
[5] G. Barles and C. Imbert, Second-order elliptic integro-differential equations: viscosity solutions' theory revisited, Ann. Inst. Henri Poincaré, Anal. Non Linéaire 25 (2008), 567–585.
[6] N. Bellamy, Wealth optimization in an incomplete market driven by a jump-diffusion process, J. Math. Econom. 35 (2001), 259–287.


[7] F. Benth, K. Karlsen and K. Reikvam, Optimal portfolio selection with consumption and nonlinear integro-differential equations with gradient constraint: a viscosity solution approach, Finance Stoch. 5 (2001), 275–303.
[8] F. Benth, K. Karlsen and K. Reikvam, Optimal portfolio management rules in a non-Gaussian market with durability and intertemporal substitution, Finance Stoch. 5 (2001), 447–467.
[9] F. Benth, K. Karlsen and K. Reikvam, Portfolio optimization in a Lévy market with international substitution and transaction costs, Stoch. Stoch. Reports 74 (2002), 517–569.
[10] A. Bensoussan and J.L. Lions, Applications of variational inequalities in stochastic control, North-Holland Pub. Co., New York, N.Y., 1982.
[11] C. Cancelier, Problèmes aux limites pseudo-différentiels donnant lieu au principe du maximum, Comm. Partial Differential Equations 11 (1986), 1677–1726.
[12] M. Chesney and M. Jeanblanc, Pricing American currency options in an exponential Lévy model, Appl. Math. Finance 11 (2004), 207–225.
[13] M. Crandall and P.-L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc. 277 (1983), 1–42.
[14] M. Crandall, H. Ishii and P.-L. Lions, User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. (N.S.) 27 (1992), 1–67.
[15] N. Framstad, B. Oksendal and A. Sulem, Optimal consumption and portfolio in a jump diffusion market with proportional transaction costs, J. Math. Econom. 35 (2001), 233–257.
[16] N. Framstad, B. Oksendal and A. Sulem, Sufficient stochastic maximum principle for the optimal control of jump diffusion and applications to finance, JOTA 121 (2004), 77–98.
[17] W. Fleming and M. Soner, Controlled Markov processes and viscosity solutions (second edition), Stochastic Modelling and Applied Probability 25, Springer, New York, 2006.
[18] N.G. Garroni and J.-L. Menaldi, Green functions for second order parabolic integro-differential problems, Longman Scientific & Technical, Essex, 1992.
[19] Y. Ishikawa, Optimal control problem associated with jump processes, Appl. Math. Optim. 50 (2004), 21–65.
[20] Y. Ishikawa, A small example of non-local operators having no transmission property, Tsukuba J. Math. 25 (2001), 399–411.
[21] N. Jacob, Pseudo-differential operators and Markov processes, Akademie Verlag, Berlin, 1996.
[22] M. Jeanblanc, Financial markets in continuous time, Springer Finance, Springer, Berlin, 2003.
[23] H. Kumano-go, Pseudodifferential operators. Translated from the Japanese by the author, R. Vaillancourt and M. Nagase, MIT Press, Cambridge, Mass.-London, 1981.
[24] N.V. Krylov, Controlled diffusion processes, Applications of Mathematics 14, Springer-Verlag, New York-Berlin, 1980.
[25] E. Mordecki, Optimal stopping and perpetual options for Lévy processes, Finance Stochast. 6 (2002), 473–493.


[26] E. Mordecki and P. Salminen, Optimal stopping of Hunt and Lévy processes, Stochastics 79 (2007), 233–251.
[27] H. Morimoto, Stochastic control and mathematical modeling: application to economics, Cambridge Univ. Press, Cambridge, 2010.
[28] B. Oksendal and A. Sulem, Applied stochastic control of jump processes (second edition), Universitext, Springer-Verlag, Berlin, 2007.
[29] H. Pham, Optimal stopping of controlled jump diffusion processes: a viscosity solution approach, J. Math. System Estim. Control 8 (1998), 1–27.
[30] Ph. Protter, Stochastic integration and differential equations. A new approach, Applications of Mathematics 21, Springer-Verlag, Berlin, 1990.
[31] W. Rudin, Real and complex analysis, third edition, McGraw-Hill, New York, 1986.

Yasushi Ishikawa
Department of Mathematics
Faculty of Science
Ehime University
790-8577 Matsuyama, Japan
e-mail: [email protected]

Progress in Probability, Vol. 65, 121–144
© 2011 Springer Basel AG

A Review of Recent Results on Approximation of Solutions of Stochastic Differential Equations

Benjamin Jourdain and Arturo Kohatsu-Higa

Abstract. In this article, we give a brief review of some recent results concerning the study of the Euler-Maruyama scheme and its high-order extensions. These numerical schemes are used to approximate solutions of stochastic differential equations, which enables one to approximate various important quantities, including solutions of partial differential equations. Some of them have been implemented in Premia [56]. In this article we mainly consider results about weak approximation, the most important ones for financial applications.

Mathematics Subject Classification (2000). 60H35, 65C05, 65C30.

Keywords. Euler-Maruyama scheme, Kusuoka scheme, Milshtein scheme, weak approximations, stochastic equations.

1. Introduction

The Euler-Maruyama scheme is a simple and natural approximation method for the solution of various types of stochastic differential equations. It helps not only to simulate the solutions of stochastic equations but it also serves theoretical purposes (see, e.g., the articles of E. Gobet [18, 19] on the local asymptotic mixed normality (LAMN) property in statistics). To introduce this notion, consider the stochastic differential equation
$$X(t) = x + \int_0^t b(X(s))\,ds + \sum_{j=1}^r \int_0^t \sigma_j(X(s))\,dZ^j(s), \eqno(1)$$
where $b, \sigma_i : \mathbb R^d \to \mathbb R^d$, $i = 1,\dots,r$, are Lipschitz coefficients and $Z = (Z^1,\dots,Z^r)$ is an $r$-dimensional Wiener process. For a partition of the interval $[0,T]$ denoted as $\pi : 0 = t_0 < \cdots < t_n = T$, we define the norm of the partition as
$$\|\pi\| = \max\{t_{i+1} - t_i\,;\ i = 0,\dots,n-1\}$$

122

B. Jourdain and A. Kohatsu-Higa

and $\eta(t) = \sup\{t_i\,;\ t_i \le t\}$ the last discretization time before $t$. Then the Euler-Maruyama scheme is defined inductively by
$$\forall t \in [t_i, t_{i+1}],\quad X^\pi(t) = X^\pi(t_i) + b(X^\pi(t_i))(t - t_i) + \sum_{j=1}^r \sigma_j(X^\pi(t_i))\big(Z^j(t) - Z^j(t_i)\big).$$
The simplicity and the generality of the possible applications are the main attractions of this scheme. In practice, one only needs to simulate the Brownian increments $(Z(t_{i+1}) - Z(t_i))_{0\le i\le n-1}$ in order to compute $(X^\pi(t_1), X^\pi(t_2), \dots, X^\pi(t_n))$.

Let us first mention the strong convergence rate result.

Theorem 1. Under the above assumptions,
$$\forall p \ge 1, \qquad E\Big[\sup_{t\le T}\|X(t) - X^\pi(t)\|^{2p}\Big] \le C\,\|\pi\|^p,$$

where the constant $C$ depends on $p$, $T$, $x$ and the Lipschitz constants.

The proof of this result is standard and essentially goes through the same methodology used to prove existence of solutions to (1). This result can also be generalized to various equations without changing the essential ideas.

In this paper, we are mainly interested in analyzing different error terms which involve a test function. The corresponding results are called weak convergence results. Section two deals with weak convergence for the Euler scheme. In case the partition is uniform ($t_i = \frac{iT}n$), denoting the Euler scheme by $X^n$ instead of $X^\pi$, we first state the convergence in law of the normalized error process $\sqrt n(X - X^n)$ to a process $\chi$ which writes as a stochastic integral with respect to a Brownian motion independent from $Z$. More precisely, for any continuous and bounded function $F$ on the space of continuous paths, $E\big(F(\sqrt n(X - X^n))\big)$ converges to $E(F(\chi))$ as $n \to +\infty$.

Analysis of the weak error $E(f(X(T))) - E(f(X^n(T)))$ turns out to be more important for applications: for instance, the price of a European option with payoff function $f$ and maturity $T$ written on an underlying evolving according to (1) under the risk-neutral measure writes $E(e^{-rT}f(X(T)))$, where $r$ denotes the risk-free rate. In [60], Talay and Tubaro prove that this difference can be expanded in powers of $\frac1n$. This justifies the use of Romberg-Richardson extrapolations in order to speed up the convergence: for instance, $E(f(X(T))) - E\big(2f(X^{2n}(T)) - f(X^n(T))\big) = O(\frac1{n^2})$ (see Pagès [53] for a recent study devoted to the numerical implementation of these extrapolations). The proof given by Talay and Tubaro relies on the Feynman-Kac partial differential equation associated with (1). Here, we present another methodology introduced in [33] and based on the integration by parts formula of Malliavin calculus.
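The scheme and the rate of Theorem 1 are easy to check numerically. The sketch below is illustrative only (a hypothetical scalar SDE with Lipschitz coefficients, not an example from the text): it couples a coarse Euler-Maruyama path with a fine-grid reference built from the same Brownian increments and estimates the $L^2$ error at time $T$, which should scale like $\|\pi\|^{1/2}$.

```python
import numpy as np

def euler_maruyama(x0, b, sigma, dW, dt):
    """One Euler-Maruyama path on a uniform grid with increments dW."""
    x = np.empty(len(dW) + 1)
    x[0] = x0
    for i, dw in enumerate(dW):
        x[i + 1] = x[i] + b(x[i]) * dt + sigma(x[i]) * dw
    return x

def strong_error(n_coarse, n_fine=1024, n_paths=200, T=1.0, seed=0):
    """Root-mean-square error at T between the n_coarse-step scheme and an
    n_fine-step reference driven by the same Brownian path
    (n_fine must be a multiple of n_coarse)."""
    rng = np.random.default_rng(seed)
    b = lambda x: -x                          # illustrative drift
    s = lambda x: 1.0 + 0.5 * np.sin(x)      # illustrative Lipschitz diffusion
    errs = []
    for _ in range(n_paths):
        dW = rng.normal(0.0, np.sqrt(T / n_fine), n_fine)
        fine = euler_maruyama(1.0, b, s, dW, T / n_fine)
        k = n_fine // n_coarse
        coarse = euler_maruyama(1.0, b, s,
                                dW.reshape(n_coarse, k).sum(axis=1),
                                T / n_coarse)
        errs.append((fine[-1] - coarse[-1]) ** 2)
    return float(np.sqrt(np.mean(errs)))

e_8, e_32 = strong_error(8), strong_error(32)   # step sizes T/8 and T/32
```

Dividing the step size by four should roughly halve the root-mean-square error, in line with the exponent $1/2$.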
In [10], a general framework relying on the study of the linear stochastic equation satisfied by the error process $X - X^n$ is presented. This new methodology enables one to deal with a great variety of equations, including some which seem beyond the scope of the former PDE approach. We illustrate this latter point on the example of stochastic differential equations with delay.
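The effect of the Romberg-Richardson combination mentioned above can be checked in closed form in a toy case (an illustrative choice, not from the text): for $dX = X\,dZ$, $X(0) = 1$ and $f(x) = x^2$, one has $E[f(X(T))] = e^T$ exactly, while the Euler scheme gives $E[f(X^n(T))] = (1 + T/n)^n$, since each step multiplies the second moment by $1 + T/n$.

```python
import math

def euler_second_moment(n, T=1.0):
    """E[(X^n(T))^2] for the Euler scheme applied to dX = X dZ, X(0) = 1:
    X^n_{i+1} = X^n_i * (1 + dZ_i), so each step multiplies the second
    moment by (1 + T/n)."""
    return (1.0 + T / n) ** n

T = 1.0
exact = math.exp(T)                      # E[X(T)^2] = e^T
errors = {}
for n in (10, 20, 40):
    plain = euler_second_moment(n, T)                                # error O(1/n)
    romberg = 2.0 * euler_second_moment(2 * n, T) - euler_second_moment(n, T)
    errors[n] = (abs(plain - exact), abs(romberg - exact))           # O(1/n^2)
```

For $n = 10$ the plain error is about $0.12$ while the extrapolated one is about $0.005$, consistent with the announced orders $O(1/n)$ and $O(1/n^2)$.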

Approximation of Solutions of Stochastic Differential Equations

123

The third section is devoted to a method of exact (in law) simulation of (1) recently introduced by [4, 5] in the one-dimensional case $r = d = 1$. There, for a smooth diffusion coefficient $\sigma$ which does not vanish, one can make a change of variables which transforms (1) into an SDE with diffusion coefficient constant and equal to one. Under a new probability measure given by the Girsanov theorem, the original Brownian motion $Z$ solves the latter SDE. The exponential factor giving the change of probability measure is then simulated by a rejection/acceptance technique involving a Poisson point process.

It is possible to obtain schemes with a convergence order higher than that of the Euler scheme by keeping more terms in the stochastic Taylor expansion of the solution of the SDE. Section four deals with such schemes. We first introduce Stratonovitch stochastic integrals in order to write nice Taylor expansions. Then, on the example of the Milshtein scheme, we illustrate the difficulty of simulating the iterated Brownian integrals which appear in the expansions, and therefore of implementing schemes with a high order of strong convergence. Recently, to overcome this difficulty, Kusuoka [40, 41] proposed to replace these iterated Brownian integrals by random variables with the same moments up to a given order. This leads to schemes with a high order of weak convergence. These schemes and their application in finance are currently the subject of considerable research activity: [42, 47, 48, 50, 17].

The last section addresses extensions of the results presented previously. The case where $Z$ is replaced by a Lévy process is considered first. Discretization of reflecting stochastic differential equations is also discussed.

2. Weak errors: from Jacod-Kurtz-Protter to Milshtein-Talay

If one is trying to approach the problem of weak convergence of the error process, then the first natural approach is to study the weak convergence of the process $\sqrt n\,(X(t) - X^n(t))$. This is done in a series of articles by Jacod, Kurtz and Protter (e.g., see Section 5 in [39]). To simplify the ideas, suppose that we are dealing with the Wiener case in one dimension ($r = 1$), $b \equiv 0$ and that the partition is uniform: $t_i = \frac{iT}n$. Since the process $X^n$ solves
$$X^n(t) = x + \int_0^t \sigma(X^n(\eta(s)))\,dZ(s),$$

we have that
$$X(t) - X^n(t) = \int_0^t \sigma_1^n(s)\big(X(s) - X^n(s)\big)\,dZ(s) + \int_0^t \sigma_2^n(s)\big(Z(s) - Z(\eta(s))\big)\,dZ(s), \eqno(2)$$


where
$$\sigma_1^n(s) = \int_0^1 \sigma'\big(\alpha X(s) + (1-\alpha)X^n(s)\big)\,d\alpha,$$
$$\sigma_2^n(s) = \int_0^1 \sigma'\big(\alpha X^n(s) + (1-\alpha)X^n(\eta(s))\big)\,d\alpha\ \sigma(X^n(\eta(s))).$$
Given the strong convergence result and assuming smoothness of $\sigma$, one has that $\sigma_1^n$ and $\sigma_2^n$ converge in the $L^p(C[0,T],\mathbb R)$-norm to
$$\sigma_1(s) = \sigma'(X(s)), \qquad \sigma_2(s) = \sigma'\sigma(X(s)).$$
Solving (2), we obtain that

$$X(t) - X^n(t) = E^n(t)\int_0^t E^n(s)^{-1}\sigma_2^n(s)\big(Z(s) - Z(\eta(s))\big)\,dZ(s) - E^n(t)\int_0^t E^n(s)^{-1}\sigma_1^n\sigma_2^n(s)\big(Z(s) - Z(\eta(s))\big)\,ds,$$
where
$$E^n(t) = \exp\Big(\int_0^t \sigma_1^n(s)\,dZ(s) - \frac12\int_0^t (\sigma_1^n(s))^2\,ds\Big)$$
is the Doléans-Dade exponential, solution of the linear equation
$$E^n(t) = 1 + \int_0^t \sigma_1^n(s)E^n(s)\,dZ(s). \eqno(3)$$

Now consider the process
$$\sqrt n\int_0^t \big(Z(s) - Z(\eta(s))\big)\,dZ(s) = \frac{\sqrt n}2\Big(\sum_{i=0}^{j(t)-1}\big(Z(t_{i+1}) - Z(t_i)\big)^2 + \big(Z(t) - Z(\eta(t))\big)^2 - t\Big),$$
where $t_{j(t)} = \eta(t)$. Then using Donsker's theorem (see, e.g., Billingsley [6], p. 68) we have that
$$\sqrt n\int_0^\cdot \big(Z(s) - Z(\eta(s))\big)\,dZ(s) \Longrightarrow \sqrt{\tfrac T2}\,Z',$$
where $Z'$ is a Wiener process independent of $Z$. Furthermore, if we consider the quadratic covariation
$$\Big\langle Z,\ \sqrt n\int_0^\cdot \big(Z(s) - Z(\eta(s))\big)\,dZ(s)\Big\rangle_t = \sqrt n\int_0^t \big(Z(s) - Z(\eta(s))\big)\,ds,$$
we have that this quadratic covariation converges to $0$ in $L^2$. This points to the following convergence:
$$\Big(Z,\ \sqrt n\int_0^\cdot \big(Z(s) - Z(\eta(s))\big)\,dZ(s)\Big) \Longrightarrow \Big(Z,\ \sqrt{\tfrac T2}\,Z'\Big),$$


where $Z$ and $Z'$ are two independent Wiener processes. Therefore one can hint at the following result:
$$\sqrt n\,(X(t) - X^n(t)) \Longrightarrow \sqrt{\tfrac T2}\,E(t)\int_0^t E(s)^{-1}\sigma_2(s)\,dZ'(s),$$
where
$$E(t) = \exp\Big(\int_0^t \sigma_1(s)\,dZ(s) - \frac12\int_0^t (\sigma_1(s))^2\,ds\Big)$$
and $(Z, Z')$ is a two-dimensional Wiener process.

This result, in a variety of forms and generalizations, has been extensively proved by Jacod, Kurtz and Protter. In particular, from this result one obtains that for any continuous bounded functional $F$ on $C[0,T]$ one has that $E\big[F\big(\sqrt n(X - X^n)\big)\big]$ converges to
$$E\Big[F\Big(\sqrt{\tfrac T2}\,E(\cdot)\int_0^\cdot E(s)^{-1}\sigma_2(s)\,dZ'(s)\Big)\Big].$$
On one hand these results give more detail about the limit law of the error process. Nevertheless, this does not give full information about the rate of convergence of various other functionals such as $E(X(t)^p) - E(X^n(t)^p)$ or $p_{X(t)}(x) - p_{X^n(t)}(x)$, where $p$ stands for the density function. For this reason other efforts have been directed into extending the type of convergence into stronger topologies than the one given by weak convergence of processes. In [29], the authors prove that for any continuous bounded functional $F$ and any bounded real variable $Y$ we have that
$$E\big[Y F\big(\sqrt n(X - X^n)\big)\big] \to E\Big[Y F\Big(\sqrt{\tfrac T2}\,E(\cdot)\int_0^\cdot E(s)^{-1}\sigma_2(s)\,dZ'(s)\Big)\Big].$$
This type of convergence is called stable convergence in law. It is worth noting that if $Y$ is restricted to a subfiltration, this concept also allows the study of the convergence of the conditional expectation of the error process. This type of results is promising, but still it does not allow the analysis of the convergence of quantities like the ones mentioned before. In order to analyze this problem, there is another "parallel" theory called weak approximation that deals particularly with the error
$$E[f(X) - f(X^n)].$$

The state of the art for this problem is more advanced than the theory of Jacod-Kurtz-Protter described above. In fact one is able to deal with unbounded, discontinuous and even Schwartz distribution functions $f$ (see Guyon [24]). On the other hand one is not able to give precise information on the distribution of the limit error. To explain the ideas behind this approach in simple terms, let us describe a complex result due to Bally and Talay [2, 3].


To clarify the methodology, we consider a real diffusion process (that is, $Z$ is a one-dimensional Wiener process)
$$X(t) = x + \int_0^t \sigma(X(s))\,dZ(s), \qquad t \in [0,T],$$
and its Euler approximation
$$X^n(t) = x + \int_0^t \sigma(X^n(\eta(s)))\,dZ(s), \qquad t \in [0,T],$$
where $\eta(s) = kT/n$ for $kT/n \le s < (k+1)T/n$. The error process $Y = X - X^n$ solves
$$Y(t) = \int_0^t \big(\sigma(X(s)) - \sigma(X^n(\eta(s)))\big)\,dZ(s) = \int_0^t \int_0^1 \sigma'\big(aX(s) + (1-a)X^n(\eta(s))\big)\,da\ \big(X(s) - X^n(\eta(s))\big)\,dZ(s);$$

this can be written
$$Y(t) = \int_0^t \sigma_1^n(s)Y(s)\,dZ(s) + G(t), \qquad 0 \le t \le T,$$
with
$$\sigma_1^n(s) = \int_0^1 \sigma'\big(aX(s) + (1-a)X^n(\eta(s))\big)\,da,$$
$$G(t) = \int_0^t \sigma_1^n(s)\big(X^n(s) - X^n(\eta(s))\big)\,dZ(s) = \int_0^t \sigma_1^n(s)\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,dZ(s).$$
In this simple case we have an explicit expression for $Y(t)$:
$$Y(t) = E^n(t)\int_0^t E^n(s)^{-1}\big(dG(s) - \sigma_1^n(s)\,d\langle G, Z\rangle_s\big),$$
where $E^n(t)$ is given by (3). Finally we obtain
$$Y(t) = E^n(t)\int_0^t E^n(s)^{-1}\sigma_1^n(s)\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,dZ(s) - E^n(t)\int_0^t E^n(s)^{-1}\sigma_1^n(s)^2\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,ds.$$
Now let $f$ be a smooth function with possibly polynomial growth at infinity. We are interested in obtaining the rate of convergence of $Ef(X^n(T))$ to $Ef(X(T))$. We first write the difference
$$Ef(X(T)) - Ef(X^n(T)) = E\Big[\int_0^1 f'\big(aX(T) + (1-a)X^n(T)\big)\,da\ Y(T)\Big].$$


Replacing $Y(T)$ by its expression, we obtain, with the additional notation
$$F^n = \int_0^1 f'\big(aX(T) + (1-a)X^n(T)\big)\,da,$$
$$Ef(X(T)) - Ef(X^n(T)) = E\Big[F^n E^n(T)\int_0^T E^n(s)^{-1}\sigma_1^n(s)\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,dZ(s)\Big] - E\Big[F^n E^n(T)\int_0^T E^n(s)^{-1}\sigma_1^n(s)^2\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,ds\Big]. \eqno(4)$$
Applying the duality formula for stochastic integrals ($E[\langle DF, u\rangle_{L^2[0,T]}] = E[F\delta(u)]$, see [51]), where $D$ stands for the stochastic derivative and $\delta$ stands for the adjoint of the stochastic derivative, this gives
$$Ef(X(T)) - Ef(X^n(T)) = E\Big[\int_0^T D_s\big(F^n E^n(T)\big)E^n(s)^{-1}\sigma_1^n(s)\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,ds\Big] - E\Big[F^n E^n(T)\int_0^T E^n(s)^{-1}\sigma_1^n(s)^2\sigma(X^n(\eta(s)))\big(Z(s) - Z(\eta(s))\big)\,ds\Big].$$
Consequently, the difference $Ef(X(T)) - Ef(X^n(T))$ has the simple expression
$$Ef(X(T)) - Ef(X^n(T)) = E\Big[\int_0^T U^n(s)\big(Z(s) - Z(\eta(s))\big)\,ds\Big],$$
with $U^n(s) = \big(D_s(F^n E^n(T)) - F^n E^n(T)\sigma_1^n(s)\big)\big(E^n(s)^{-1}\sigma_1^n(s)\sigma(X^n(\eta(s)))\big)$. We finally obtain the rate of convergence by applying once more the duality for stochastic integrals:
$$Ef(X(T)) - Ef(X^n(T)) = E\Big[\int_0^T\int_{\eta(s)}^s D_u U^n(s)\,du\,ds\Big].$$
This last formula ensures that $|Ef(X(T)) - Ef(X^n(T))| \le CT/n$ and, with some additional work, leads to an expansion of $Ef(X(T)) - Ef(X^n(T))$. Furthermore, the above argument extends easily to the case where $f$ is an irregular function, through the use of the integration by parts formula of Malliavin calculus.

For other stochastic equations, one cannot explicitly solve the stochastic linear equation satisfied by $Y$, but in a recent article [10], one can find a general framework that allows treating a great variety of equations. As an example, we have developed the case of delay equations. The idea explained above appeared for the first time in some workshop proceedings (in [33]) and was later used by various


B. Jourdain and A. Kohatsu-Higa

authors (see [20, 21, 22]) to prove weak approximation errors in other contexts such as the Zakai equation or backward stochastic differential equations. In fact, when this argument first appeared in [33], it was just considered as an alternative way to prove the classical weak approximation results of Milshtein [43], which are usually obtained through a PDE method. It has since been shown that this new approach can go beyond the classical proof method. To explain this with a concrete example, we briefly describe the problem with delay equations solved in [10]. Notice that such equations were introduced in finance by Hobson and Rogers [25] in order to propose a complete model with stochastic volatility. In a few words, the problem with the Euler approximation for delay equations is that if one tries to use the Milshtein method one runs into infinite-dimensional problems quite rapidly, and therefore the degree of generality is quite limited. In fact, consider (see the article of Buckwar and Shardlow [8]) the following one-dimensional delay equation

\[ dX(t) = \left(\int_{-\tau}^0 X(t+s)\,dm(s) + b(X(t))\right)dt + \sigma(X(t))\,dZ(t) \]

where m is a deterministic finite measure on the interval $[-\tau, 0]$ and the initial conditions are $X(s) = x(s)$ for $s \in [-\tau, 0]$. Consider the integral operator A defined by
\[ Ax(t) = \int_{-\tau}^0 x(t+s)\,dm(s). \]
Then, using the classical theory of stochastic differential equations in infinite dimensions (an extension of the variation of constants method, see Da Prato-Zabczyk), one obtains
\[ X(t) = S(t)x + \int_0^t S(t-s)b(X(s))\,ds + \int_0^t S(t-s)\sigma(X(s))\,dZ(s) \]

where S is the semigroup associated with the linear first term in the equation for X. The natural definition of the Euler scheme is obtained by discretizing the integral in the drift term. That is,
\[ X^n(t_{i+1}) = X^n(t_i) + \left(\sum_{j=0}^m X^n(t_i + s_j)\,m((s_j, s_{j+1}]) + b(X^n(t_i))\right)(t_{i+1}-t_i) + \sigma(X^n(t_i))(Z(t_{i+1}) - Z(t_i)) \]
where $(s_j)$ is a partition of the interval $[-\tau, 0]$ such that $t_i + s_j = t_l$ for some $l \le i$. Similarly, one finds that $X^n$ is generated using, instead of S, the Yosida approximation of this semigroup; that is, the semigroup $S^n$ associated with
\[ A^n x(t) = \sum_{j=0}^m x(t+s_j)\,m((s_j, s_{j+1}]). \]


Then, as $A^n x \to Ax$ and $S^n x \to Sx$ when n tends to infinity, one expects the strong convergence of the Euler scheme. In order to study the weak error one has to go further and define the solution of the partial differential equation associated with this problem:
\[ u_t(t,y) = \frac{1}{2}\, u_{x_0 x_0}(t,y)\,\sigma(x_0)^2 + u_x(t,y)\,Ax + u_{x_0}(t,y)\,b(x_0) \]
where $x(0) = x_0$ for $y = (x_0, x) \in \mathbb{R} \times L^2[-\tau, 0]$. The (non-trivial) argument is then similar to the classical Milshtein argument. Nevertheless, it is also clear from the above set-up that this approach has its limitations. For example, one cannot allow a continuous delay in the diffusion coefficient or a non-linear delay term. In comparison, using the method explained previously, one obtains the following result. Let $(X_t)_{t\in[0,T]}$ be the solution of the stochastic delay equation:

\[ \begin{cases} dX_t = \sigma\left(\displaystyle\int_{-\tau}^0 X_{t+s}\,d\nu(s)\right)dZ_t + b\left(\displaystyle\int_{-\tau}^0 X_{t+s}\,d\nu(s)\right)dt \\ X_s = \xi_s, \quad s \in [-\tau, 0], \end{cases} \]
where $\tau > 0$, $\xi \in C([-\tau,0],\mathbb{R})$ and $\nu$ is a finite measure. We consider the Euler approximation of $(X_t)$ with step $h = \tau/n$:
\[ \begin{cases} dX^n_t = \sigma\left(\displaystyle\int_{-\tau}^0 X^n_{\eta(t)+\eta(s)}\,d\nu(s)\right)dZ_t + b\left(\displaystyle\int_{-\tau}^0 X^n_{\eta(t)+\eta(s)}\,d\nu(s)\right)dt \\ X^n_s = \xi_s, \quad s \in [-\tau, 0], \end{cases} \]
with $\eta(s) = [ns/\tau]\,\tau/n$, where $[t]$ stands for the integer part of t. We assume that the functions f, $\sigma$ and b are $C_b^3$. Then we obtain that
\[ Ef(X_T) - Ef(X_T^n) = hC_f + I^h(f) + o(h) \tag{5} \]

where $C_f = C(U^0)$ and $I^h(f) = I^h(U^0)$ are defined in [10]. In particular $|I^h(f)| \le Ch$ and
\[ U^0_s = \sigma'\left(\int_{-r}^0 X_{s+u}\,d\nu(u)\right) D_s f(X_T) + b'\left(\int_{-r}^0 X_{s+u}\,d\nu(u)\right) f'(X_T) \]
\[ \qquad + \sigma'\left(\int_{-r}^0 X_{s+u}\,d\nu(u)\right) D_s\!\int_0^T \theta_t\,dt + b'\left(\int_{-r}^0 X_{s+u}\,d\nu(u)\right) \int_0^T \theta_t\,dt, \]
and $\theta$ is the unique solution of
\[ \theta_t = \alpha^*\!\left(J\!\left[f'(X_T) + \int_t^T \theta_s\,ds\right]\right)\!(t) + \beta^*\!\left(E\!\left[f'(X_T) + \int_t^T \theta_s\,ds \,\middle|\, \mathcal{F}_t\right]\right)\!(t), \]
with
\[ \alpha^*(X)(t) = E\left[\int_{\max(t-T,-r)}^0 \sigma'\left(\int_{-r}^0 X_{t-u+v}\,d\nu(v)\right) X_{t-u}\,d\nu(u) \,\middle|\, \mathcal{F}_t\right], \]
\[ \beta^*(X)(t) = E\left[\int_{\max(t-T,-r)}^0 b'\left(\int_{-r}^0 X_{t-u+v}\,d\nu(v)\right) X_{t-u}\,d\nu(u) \,\middle|\, \mathcal{F}_t\right]. \]
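The delay Euler scheme defined above is straightforward to implement once $\nu$ is discretized on the time grid. The sketch below is a minimal illustration under assumptions of ours (uniform weights for $\nu$, a constant initial segment, arbitrary coefficients), not the implementation of [10]:

```python
import math
import random

def delay_euler(xi, b, sig, nu_w, tau, n, T, rng):
    # Euler scheme for dX = b(I_t) dt + sig(I_t) dZ_t with memory
    # I_t = int_{-tau}^0 X_{eta(t)+eta(s)} dnu(s) and step h = tau / n.
    # nu is discretized as weights nu_w[j] at the grid points -tau + j h,
    # and xi is the initial segment on [-tau, 0] sampled at the same grid.
    h = tau / n
    path = list(xi)                        # path[j] = X(-tau + j h)
    for _ in range(int(round(T / h))):
        mem = sum(w * x for w, x in zip(nu_w, path[-(n + 1):]))
        dz = rng.gauss(0.0, math.sqrt(h))
        path.append(path[-1] + b(mem) * h + sig(mem) * dz)
    return path

rng = random.Random(3)
n, tau, T = 10, 1.0, 2.0
xi = [1.0] * (n + 1)                       # constant initial segment
nu_w = [1.0 / (n + 1)] * (n + 1)           # nu: normalized uniform weights
path = delay_euler(xi, lambda m: -0.5 * m, lambda m: 0.2, nu_w, tau, n, T, rng)
```

The only delay-specific ingredient is the window `path[-(n + 1):]`, which holds exactly the grid values of the path on $[t-\tau, t]$.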

The above-quoted result (5) is an expansion of the error. It can be used to increase the rate of convergence of the method via Romberg extrapolation (see [60] for details in the diffusion case). That is, suppose $Ef(X_T) - Ef(X_T^n) = hC_f + o(h)$. Then, defining $Y = 2f(X_T^n) - f(X_T^{n/2})$, one obviously obtains $Ef(X_T) - EY = o(h)$, therefore increasing the rate of convergence of the method. In order to know exactly what has been gained, it is also important to obtain the order of the term $o(h)$ in (5). This can be done with the same method but through tedious work, obtaining that in fact $Ef(X_T) - Ef(X_T^n) = hC_f + O(h^2)$ under enough smoothness conditions on $\sigma$ and b. In the diffusion case, Pagès [53] addresses variance issues in the context of multi-step Romberg extrapolation.
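The effect of this extrapolation can be checked on a toy case with no Monte Carlo error: for $dX = aX\,dt + \sigma X\,dZ$ and $f(x) = x$, the Euler scheme satisfies $Ef(X^n_T) = x_0(1 + aT/n)^n$ exactly. The following sketch (our own illustrative parameter values) compares the plain and extrapolated weak errors:

```python
import math

def euler_mean(x0, a, T, n):
    # For dX = a X dt + s X dZ and f(x) = x, the Euler scheme satisfies
    # E[f(X^n_T)] = x0 (1 + a T / n)^n exactly, since the Brownian
    # increments have mean zero; no Monte Carlo is needed here.
    return x0 * (1.0 + a * T / n) ** n

x0, a, T = 1.0, 0.5, 1.0
exact = x0 * math.exp(a * T)                   # E[X_T] = x0 e^{aT}

n = 64
plain = euler_mean(x0, a, T, n)
romberg = 2.0 * euler_mean(x0, a, T, n) - euler_mean(x0, a, T, n // 2)

err_plain = abs(exact - plain)                 # behaves like C h
err_romberg = abs(exact - romberg)             # behaves like O(h^2)
```

With n = 64 the extrapolated error is roughly two orders of magnitude below the plain Euler weak error, in line with the $O(h^2)$ remainder.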

3. An exact simulation method for one-dimensional elliptic diffusions

Recently, in two articles by Beskos et al. [4, 5], an interesting exact simulation method in dimension one has been introduced. Consider the one-dimensional diffusion
\[ X(t) = x + \int_0^t b(X(s))\,ds + \int_0^t \sigma(X(s))\,dZ(s) \]
where $\sigma(x) \ge c > 0$ for any $x \in \mathbb{R}$ and $\sigma \in C^1(\mathbb{R})$. Then perform the change of variables $Y_t = \eta(X_t)$ where $\eta(z) = \int_x^z \frac{1}{\sigma(u)}\,du$. By Itô's formula, Y satisfies the following sde:
\[ Y(t) = \int_0^t \alpha(Y(s))\,ds + Z(t) \]
where $\alpha(y) = \frac{b}{\sigma}(\eta^{-1}(y)) - \frac{\sigma'}{2}(\eta^{-1}(y))$. Suppose that we want to compute $E(f(X_T))$. Then, using Girsanov's theorem, we have
\[ E(f(X_T)) = E\left[f(B_T)\exp\left(\int_0^T \alpha(B_s)\,dB_s - \frac{1}{2}\int_0^T \alpha(B_t)^2\,dt\right)\right] \tag{6} \]
where B is another Wiener process and here we assume that $\alpha$ is bounded. This idea is usually found when one proves the existence of weak solutions of stochastic differential equations.


Next, one defines the function $A(u) = \int_0^u \alpha(y)\,dy$. With this definition we have, applying Itô's formula,
\[ A(B_T) = \int_0^T \alpha(B_s)\,dB_s + \frac{1}{2}\int_0^T \alpha'(B_s)\,ds. \]
Therefore
\[ Ef(X_T) = E\left[f(B_T)\exp\left(A(B_T) - \frac{1}{2}\int_0^T \left(\alpha(B_t)^2 + \alpha'(B_t)\right)dt\right)\right]. \]
If one were to simulate the above quantity directly, one would need the whole path of the Wiener process B. In fact this is done in a series of papers by Detemple et al. [14, 15, 16], where the Doss-Sussman formula is used to improve the approximation and obtain a scheme of strong order one. Instead, Beskos et al. [5] propose to use a Poisson process to simulate the exponential in the above expression. In fact, one assumes that $\phi(x) = \frac{1}{2}(\alpha(x)^2 + \alpha'(x))$ satisfies $0 \le \phi(x) \le M$ for all $x \in \mathbb{R}$, and introduces a Poisson point process N with intensity $ds \times du$ on $[0,T] \times [0,M]$, independent of B. For any Borel subset S of $[0,T] \times [0,M]$, $N(S)$ is a Poisson random variable whose parameter is the Lebesgue measure of S. Hence, the random variable $N_1 = N(\{(s,u) \in [0,T] \times [0,M] : 0 \le u \le \phi(B(s))\})$ is such that
\[ P(N_1 = 0 \mid B) = \exp\left(-\int_0^T \phi(B_s)\,ds\right). \]

The simulation scheme follows from the equality
\[ E(f(X_T)) = E\left[f(B_T)\exp(A(B_T))\,E(1_{\{N_1=0\}} \mid B)\right] = E\left[f(B_T)\exp(A(B_T))\,1_{\{N_1=0\}}\right]. \]

How is the simulation done? First one simulates independent exponential random variables $X_1, \ldots, X_\nu$ with parameter M until $\sum_{i=1}^\nu X_i > T$. Then one simulates independent random variables $U_1, \ldots, U_{\nu-1}$ uniformly distributed on the interval $[0,M]$. The resulting point process $N = \sum_{i=1}^{\nu-1} \delta_{(X_1+\cdots+X_i,\, U_i)}$ on $[0,T] \times [0,M]$ is Poisson with intensity $ds \times du$. Now one simulates the independent increments $B(X_1), B(X_1+X_2) - B(X_1), \ldots, B(T) - B(\sum_{i=1}^{\nu-1} X_i)$ of the Brownian motion and computes $N_1 = \sum_{i=1}^{\nu-1} 1_{\{U_i \le \phi(B(X_1+\cdots+X_i))\}}$. Obviously there are various issues that have not been considered in this short introduction, which remain open problems or had already been treated by the authors. Also, as was well known before, the one-dimensional case always permits various reductions that do not carry over to higher dimensions. For instance, in higher dimensions, the so-called Doss transformation, which permits one to obtain an SDE with a constant diffusion coefficient, is only possible when $\sigma$ satisfies a restrictive commutativity condition. Notice that under that condition, the discretization scheme obtained by applying the Euler scheme to the SDE with constant diffusion coefficient and making the inverse change of variables is of strong order one (see [14]). Moreover, in higher dimensions the replacement of the stochastic integral


in (6) by a standard integral thanks to Itô's formula is only possible when $\alpha$ is a gradient function. Nevertheless, the one-dimensional case always remains a testing ground for new methodology, as shown by the recent development described in Section 2. An exact Monte Carlo method for the pricing of Asian options in the Black-Scholes model, inspired by the above ideas, will be implemented by Jourdain and Sbai [30] in version 10 of Premia [56].
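The rejection procedure just described can be sketched as follows. This is a minimal illustration, not the authors' implementation; the toy choice $\alpha = \tanh$ (ours) makes $\phi$ identically $1/2$, so the bound $M = 1/2$ is exact (the thinning is degenerate), and it yields $Ef(X_T) = 1$ for $f \equiv 1$, which gives a simple check:

```python
import math
import random

def exact_sample(f, alpha, alpha_prime, A, M, T, rng):
    # One draw of f(B_T) * exp(A(B_T)) * 1{N1 = 0}: Poisson points (s_i, u_i)
    # on [0,T] x [0,M]; the draw is kept only if no point falls below the
    # graph of phi(B(s)) = (alpha(B_s)^2 + alpha'(B_s)) / 2.
    times, t = [], 0.0
    while True:                      # exponential(M) spacings give the s_i
        t += rng.expovariate(M)
        if t > T:
            break
        times.append(t)
    b, s_prev, b_vals = 0.0, 0.0, []
    for s in times:                  # Brownian motion at the arrival times
        b += rng.gauss(0.0, math.sqrt(s - s_prev))
        b_vals.append(b)
        s_prev = s
    b_T = b + rng.gauss(0.0, math.sqrt(T - s_prev))
    n1 = sum(1 for bi in b_vals      # thinning: points under phi(B(s))
             if rng.uniform(0.0, M) <= 0.5 * (alpha(bi) ** 2 + alpha_prime(bi)))
    return f(b_T) * math.exp(A(b_T)) if n1 == 0 else 0.0

# Toy case alpha = tanh: phi = 1/2 identically, A(u) = log cosh(u),
# and E f(X_T) = 1 for f identically equal to 1.
rng = random.Random(0)
est = sum(exact_sample(lambda x: 1.0, math.tanh,
                       lambda x: 1.0 - math.tanh(x) ** 2,
                       lambda u: math.log(math.cosh(u)), 0.5, 1.0, rng)
          for _ in range(20000)) / 20000
```

Note that the Brownian path is needed only at the Poisson arrival times and at T, which is precisely what makes the method exact rather than a discretization.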

4. Schemes with high-order of convergence

4.1. Stochastic Taylor expansions

In order to make such expansions, it is more convenient to rewrite (1) in Stratonovich form. The interest is that the chain rule holds for Stratonovich integrals. We recall that for a regular adapted one-dimensional process $(H(s))_{s\le t}$, the Stratonovich integral $\int_0^t H(s)\circ dZ^j(s)$ is equal to the limit in probability of $\sum_i \frac{1}{2}(H(t_{i+1}\wedge t) + H(t_i\wedge t))(Z^j(t_{i+1}\wedge t) - Z^j(t_i\wedge t))$ as $\max_i |t_{i+1}-t_i|$ tends to 0. Hence
\[ \int_0^t H(s)\circ dZ^j(s) = \int_0^t H(s)\,dZ^j(s) + \frac{1}{2}\langle H, Z^j\rangle_t \]
and (1) writes

\[ X(t) = x + \int_0^t \sigma_0(X(s))\,ds + \sum_{j=1}^r \int_0^t \sigma_j(X(s))\circ dZ^j(s) \tag{7} \]
where $\sigma_0 = b - \frac{1}{2}\sum_{j=1}^r \partial\sigma_j\,\sigma_j$, with $\partial\sigma_j$ denoting the matrix $\left(\frac{\partial\sigma_{ij}}{\partial x_l}\right)_{1\le i,l\le d}$ for $\sigma_j = (\sigma_{1j},\ldots,\sigma_{dj})^*$. Let us introduce the differential operators $V_j = \sum_{i=1}^d \sigma_{ij}(x)\partial_{x_i}$ for $0 \le j \le r$. Since the chain rule holds for Stratonovich integrals, for f a smooth function on $\mathbb{R}^d$,
\[ f(X(t)) = f(x) + \int_0^t V_0 f(X(s))\,ds + \sum_{j=1}^r \int_0^t V_j f(X(s))\circ dZ^j(s) = f(x) + \sum_{j=0}^r \int_0^t V_j f(X(s))\circ dZ^j(s), \]
where for notational convenience we set $Z^0(s) = s$. Now, remarking that $V_j f(X(s)) = V_j f(x) + \sum_{l=0}^r \int_0^s V_l V_j f(X(u))\circ dZ^l(u)$, one obtains
\[ f(X(t)) = f(x) + \sum_{j=0}^r V_j f(x)\int_0^t \circ dZ^j(s) + \sum_{j,l=0}^r \int_{0\le u\le s\le t} V_l V_j f(X(u))\circ dZ^l(u)\circ dZ^j(s). \]


Iterating the reasoning, one obtains that for any positive integer m,
\[ f(X(t)) = f(x) + \sum_{k=1}^m \sum_{j_1,\ldots,j_k=0}^r V_{j_1}V_{j_2}\ldots V_{j_k}f(x)\,Z^{(j_1,\ldots,j_k)}(t) \]
\[ \qquad + \sum_{j_1,\ldots,j_{m+1}=0}^r \int_{0\le s_1\le\cdots\le s_{m+1}\le t} V_{j_1}\ldots V_{j_{m+1}}f(X(s_1)) \circ dZ^{j_1}(s_1)\circ\cdots\circ dZ^{j_{m+1}}(s_{m+1}) \]
where $Z^{(j_1,\ldots,j_k)}(t) = \int_{0\le s_1\le\cdots\le s_k\le t} \circ dZ^{j_1}(s_1)\circ\cdots\circ dZ^{j_k}(s_k)$.
For $1 \le j \le r$, $Z^j(s)$ is of order $\sqrt{s}$ while $Z^0(s) = s$ or, in other words, by scaling, $Z^{(j_1,\ldots,j_k)}(t)$ has the same distribution as $t^{(k+\#\{1\le l\le k : j_l=0\})/2}\, Z^{(j_1,\ldots,j_k)}(1)$. Hence, to obtain terms with the same order of magnitude in the above expansion, one has to count the integrals with respect to $Z^0(s)$ twice. That is why, for $\alpha = (j_1,\ldots,j_k) \in \mathcal{A} = \bigcup_{l\in\mathbb{N}^*}\{0,\ldots,r\}^l$, we set $|\alpha| = k$ and $\|\alpha\| = k + \#\{1\le l\le k : j_l = 0\}$. Then we write
\[ f(X(t)) = f(x) + \sum_{\alpha :\, \|\alpha\|\le m} V_{j_1}\ldots V_{j_k}f(x)\,Z^\alpha(t) + R_{m,f}(t), \quad\text{where} \]
\[ R_{m,f}(t) = \sum_{\alpha :\, |\alpha|\le m,\ \|\alpha\|>m} V_{j_1}\ldots V_{j_k}f(x)\,Z^\alpha(t) \]
\[ \qquad + \sum_{j_1,\ldots,j_{m+1}=0}^r \int_{0\le s_1\le\cdots\le s_{m+1}\le t} V_{j_1}\ldots V_{j_{m+1}}f(X(s_1)) \circ dZ^{j_1}(s_1)\circ\cdots\circ dZ^{j_{m+1}}(s_{m+1}). \tag{8} \]
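The multi-index bookkeeping behind (8) is easy to make concrete. The following sketch (function names are ours, purely illustrative) enumerates the indices $\alpha$ with $\|\alpha\| \le m$, which parametrize the retained terms of the expansion and whose cardinality also appears in the cubature bound of [42]:

```python
from itertools import product

def norm(alpha):
    # ||alpha|| = k + #{l : j_l = 0}: time integrals (j = 0) count twice
    return len(alpha) + sum(1 for j in alpha if j == 0)

def indices_up_to(r, m):
    # all alpha in A with ||alpha|| <= m; since ||alpha|| >= |alpha|,
    # lengths k <= m are enough
    return [alpha
            for k in range(1, m + 1)
            for alpha in product(range(r + 1), repeat=k)
            if norm(alpha) <= m]

idx = indices_up_to(1, 3)    # r = 1: single Brownian motion, degree m = 3
```

For r = 1 and m = 3 the retained indices are (0), (1), (0,1), (1,0), (1,1) and (1,1,1), six in total.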

Since the remainder $R_{m,f}(t)$ involves terms scaling like t to a power greater than or equal to $(m+1)/2$, the following result (see Proposition 2.1 of [42] for $p = 1$) is not surprising.

Proposition 2. When the functions f, b and $\sigma_j$ are smooth, the remainder $R_{m,f}(t)$ is such that for $p \ge 1$, $E(|R_{m,f}(t)|^{2p})^{1/(2p)} \le Ct^{\frac{m+1}{2}}$, where the constant C depends on p, f, b, $\sigma_j$ and their derivatives.

4.2. The Milshtein scheme

The Milshtein scheme consists in choosing $f(x) = x$ and $m = 2$ in the above expansion (8) and removing the remainder: for all $t \in [t_i, t_{i+1}]$,
\[ X^\pi(t) = X^\pi(t_i) + \sum_{j=0}^r \sigma_j(X^\pi_{t_i})(Z^j(t) - Z^j(t_i)) + \sum_{j,l=1}^r \partial\sigma_j\,\sigma_l(X^\pi_{t_i})\left(Z^{(l,j)}(t) - Z^{(l,j)}(t_i)\right). \tag{9} \]
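For a single Brownian motion ($r = 1$) the iterated-integral increment reduces to a square, and after regrouping the Stratonovich drift $\sigma_0$ scheme (9) takes the usual Itô form with the correction $\frac{1}{2}\sigma'\sigma((\Delta Z)^2 - h)$. A minimal sketch, with names and parameter values of ours:

```python
import math
import random

def milstein_path(x0, b, sig, dsig, T, n, rng):
    # 1-d Milstein step: X += b h + sig dZ + 0.5 sig' sig (dZ^2 - h),
    # which is scheme (9) with r = 1 after regrouping the drift.
    h, x = T / n, x0
    for _ in range(n):
        dz = rng.gauss(0.0, math.sqrt(h))
        x += b(x) * h + sig(x) * dz + 0.5 * dsig(x) * sig(x) * (dz * dz - h)
    return x

# With sig = 0 the scheme degenerates to the explicit Euler method for
# the ODE x' = b(x); this gives a deterministic sanity check.
rng = random.Random(4)
x_ode = milstein_path(1.0, lambda x: -x, lambda x: 0.0, lambda x: 0.0, 1.0, 100, rng)
x_gbm = milstein_path(1.0, lambda x: 0.1 * x, lambda x: 0.2 * x, lambda x: 0.2, 1.0, 100, rng)
```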

The strong order of convergence of the Milshtein scheme is one (see, e.g., [32]):


Theorem 3. Assume that the functions $\sigma_j$ and b are $C^2$ with bounded derivatives. Then for $p \ge 1$,
\[ \sup_{t\le T} E\left[\|X(t) - X^\pi(t)\|^p\right] \le C|\pi|^p, \]
where the constant C does not depend on the partition $\pi$.

To implement the Milshtein scheme, one faces the difficulty usually encountered when trying to construct practical discretization schemes from Taylor expansions: the need to simulate the increments of the multiple stochastic integrals which appear. On the one hand, by the fundamental theorem of calculus, for $1 \le j \le r$, $Z^{(j,j)}(t) = \int_{0\le u\le s\le t}\circ dZ^j_u\circ dZ^j_s$ is equal to $\frac{1}{2}(Z^j(t))^2$. But on the other hand, no such nice expression in terms of $Z^j(t)$, $Z^l(t)$ holds for $Z^{(l,j)}(t)$ when $j \ne l$. The generalization of the equality $2Z^{(j,j)}(t) = (Z^j(t))^2$ writes $Z^{(l,j)}(t) + Z^{(j,l)}(t) = Z^l(t)Z^j(t)$. Hence, for the Milshtein scheme to be simulable, one needs the following commutativity condition:
\[ \text{(C)} \qquad \forall\, 1\le l<j\le r, \quad \partial\sigma_j\,\sigma_l = \partial\sigma_l\,\sigma_j, \]

which always holds when $r = 1$ (single Brownian motion case). Under (C), it is enough to simulate the Brownian increments, since the Milshtein scheme writes
\[ X^\pi(t_{i+1}) = X^\pi(t_i) + b(X^\pi_{t_i})(t_{i+1}-t_i) + \sum_{j=1}^r \sigma_j(X^\pi_{t_i})(Z^j(t_{i+1}) - Z^j(t_i)) \]
\[ \qquad + \frac{1}{2}\sum_{j=1}^r \partial\sigma_j\,\sigma_j(X^\pi_{t_i})\left((Z^j(t_{i+1}) - Z^j(t_i))^2 - (t_{i+1}-t_i)\right) \]
\[ \qquad + \sum_{1\le l<j\le r} \partial\sigma_j\,\sigma_l(X^\pi_{t_i})(Z^l(t_{i+1}) - Z^l(t_i))(Z^j(t_{i+1}) - Z^j(t_i)). \]
For any $t > 0$, $x \mapsto E(f(X(t)))$ is smooth in the directions given by the fields generated by the Lie brackets of the $V_j$ even if f is not. For the non-uniform grid refined near the maturity T, $t_i = T\left(1 - \left(\frac{n-i}{n}\right)^\gamma\right)$ with $\gamma > m$, he obtains convergence of the approximation with order $(m-1)/2$ for functions f that are only $C^1$:
\[ \left|E(f(X(T))) - Q_{t_1}Q_{t_2-t_1}\ldots Q_{T-t_{n-1}}f(x)\right| \le \frac{C\|\nabla f\|_\infty}{n^{(m-1)/2}}. \]

Proof. Setting $P_t f(x) = E(f(X(t)))$, one has the following decomposition of the error:
\[ |E(f(X(T))) - Q_{t_1}\ldots Q_{T-t_{n-1}}f(x)| \le |E(P_{T-t_{n-1}}f(X(t_{n-1}))) - Q_{t_1}\ldots Q_{t_{n-1}-t_{n-2}}P_{T-t_{n-1}}f(x)| \]
\[ \qquad + |Q_{t_1}Q_{t_2-t_1}\ldots Q_{t_{n-1}-t_{n-2}}(P_{T-t_{n-1}}f - Q_{T-t_{n-1}}f)(x)|. \tag{11} \]
We now prove the result by induction on n. For $n = 1$, one deals with $P_t f(z) - Q_t f(z)$, which by (8) is equal to
\[ E\left[f\left(z + \sum_{\|\alpha\|\le m} V_{j_1}\ldots V_{j_k}I(z)Z^\alpha(t) + R_{m,I}(t)\right) - f\left(z + \sum_{\|\alpha\|\le m} t^{\|\alpha\|/2}\, V_{j_1}\ldots V_{j_k}I(z)\zeta^\alpha\right)\right]. \]
Assuming for simplicity that $d = 1$ and making a standard Taylor expansion of the function f in the neighborhood of z, one deduces that $P_t f(z) - Q_t f(z)$ is equal to
\[ \sum_{k=1}^m \frac{f^{(k)}(z)}{k!}\, E\left[\left(\sum_{\|\alpha\|\le m} V_{j_1}\ldots V_{j_k}I(z)Z^\alpha(t) + R_{m,I}(t)\right)^{\!k} - \left(\sum_{\|\alpha\|\le m} t^{\|\alpha\|/2}\, V_{j_1}\ldots V_{j_k}I(z)\zeta^\alpha\right)^{\!k}\right] + O(t^{\frac{m+1}{2}}). \]


According to Proposition 2, $R_{m,I}(t)$ scales like $t^{(m+1)/2}$. Now, developing the powers and using (10), one obtains that the expectations of all terms scaling like $t^{l/2}$ with $l \le m$ vanish. Therefore $|P_t f(z) - Q_t f(z)| \le Ct^{(m+1)/2}$ and the induction hypothesis holds for $n = 1$. We now assume that the induction hypothesis holds at rank $n-1$. Since when f is smooth, so is $P_{T-t_{n-1}}f$, we deduce that the first term of the right-hand side of (11) is smaller than $C\sum_{i=0}^{n-2}(t_{i+1}-t_i)^{(m+1)/2}$. By the result proved for $n = 1$, $\|P_{T-t_{n-1}}f - Q_{T-t_{n-1}}f\|_\infty \le C(T-t_{n-1})^{(m+1)/2}$. Since for all $t \ge 0$, $\|Q_t g\|_\infty \le \|g\|_\infty$, one deduces that the second term of the right-hand side of (11) is smaller than $C(T-t_{n-1})^{(m+1)/2}$. This concludes the proof. $\square$

Let us briefly present the related approximation based on the notion of cubature proposed by Lyons and Victoir [42].

Definition 8. Let $m \in \mathbb{N}^*$ and $t > 0$. Continuous paths $\omega_{t,1},\ldots,\omega_{t,N}$ with bounded variation from $[0,t]$ to $\mathbb{R}^r$ and positive weights $\lambda_1,\ldots,\lambda_N$ such that $\sum_{l=1}^N \lambda_l = 1$ define a cubature formula with degree m at time t if, for all $\alpha = (j_1,\ldots,j_k) \in \mathcal{A}$ such that $\|\alpha\| \le m$,
\[ E(Z^\alpha(t)) = \sum_{l=1}^N \lambda_l \int_{0\le s_1\le\cdots\le s_k\le t} d\omega^{j_1}_{t,l}(s_1)\ldots d\omega^{j_k}_{t,l}(s_k) \tag{12} \]
where $\omega^j_{t,l}(s)$ denotes the jth coordinate of $\omega_{t,l}(s)$ when $1 \le j \le r$ and $\omega^0_{t,l}(s) = s$.

According to [42], there exists a cubature with degree m at time 1 such that N is smaller than the cardinality of $\{\alpha \in \mathcal{A} : \|\alpha\| \le m\}$. Moreover, one deduces a cubature of degree m at time t by scaling. For $l \in \{1,\ldots,N\}$, let $(y_{t,l}(s,x))_{s\le t}$ denote the solution of the ODE
\[ y_{t,l}(0,x) = x \quad\text{and}\quad \forall s\in[0,t],\ dy_{t,l}(s,x) = \sum_{j=0}^r \sigma_j(y_{t,l}(s,x))\,d\omega^j_{t,l}(s). \]

Lyons and Victoir propose to approximate $E(f(X(t)))$ by
\[ Q_t f(x) = \sum_{l=1}^N \lambda_l f(y_{t,l}(t,x)). \]
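In dimension $r = d = 1$, a concrete degree-3 cubature uses the two straight-line paths $\omega_{t,\pm}(s) = \pm s/\sqrt{t}$ with weights $1/2$; this standard choice matches $E(Z^\alpha(t))$ for all $\|\alpha\| \le 3$. The sketch below (our own illustration, with a crude Euler integration of the ODEs) computes the corresponding $Q_t f$:

```python
import math

def cubature_qt(f, b, sig, x, t, ode_steps=200):
    # Degree-3 cubature for r = d = 1: two bounded-variation paths
    # omega_(t,+-)(s) = +- s / sqrt(t), weight 1/2 each. Each path turns
    # the SDE into the ODE dy = (b(y) +- sig(y)/sqrt(t)) ds, integrated
    # here with plain Euler steps.
    total = 0.0
    for slope in (1.0 / math.sqrt(t), -1.0 / math.sqrt(t)):
        y, h = x, t / ode_steps
        for _ in range(ode_steps):
            y += (b(y) + sig(y) * slope) * h
        total += 0.5 * f(y)
    return total

# sanity check on dX = mu dt + s dZ: Q_t f with f(x) = x gives x + mu t
qt_linear = cubature_qt(lambda z: z, lambda y: 0.1, lambda y: 0.2, 1.0, 1.0)
```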

Theorem 6 still holds with this new definition of $Q_t f$. The proof is based on a similar decomposition of the error, but the analysis of $E(f(X(t))) - Q_t f(x)$ is easier. Indeed, the Taylor expansion (8) holds with Z replaced by $\omega_{t,l}$. Multiplying by $\lambda_l$, summing over l and subtracting (8), then using Proposition 2 and (12), one obtains that $|E(f(X(t))) - Q_t f(x)| \le Ct^{(m+1)/2}$. Let us finally mention interesting schemes with high weak order of convergence recently proposed by Ninomiya and Victoir [50] and Fujiwara [17]. Even if the idea of these schemes also comes from stochastic Taylor expansions, their implementation is different from the previous ones. It requires a sequence of independent uniform random variables $(U_i)_{1\le i\le n}$ independent from $(Z^1,\ldots,Z^r)$.


For $\theta \in \mathbb{N}^*$, to go from $\bar{X}^\pi_\theta(t_i)$ to $\bar{X}^\pi_\theta(t_{i+1})$, one repeats the following steps for $k \in \{1,\ldots,\theta\}$:
1. integrate the ordinary differential equation $\frac{d}{dt}x(t) = \sigma_0(x(t))$ on an interval with length $(t_{i+1}-t_i)/2\theta$;
2. depending on whether $U_{i+1} \le \frac{1}{2}$ or not, integrate successively, for j increasing from 1 to r or for j decreasing from r to 1, the ODE $\frac{d}{dt}x(t) = \sigma_j(x(t))$ on an interval with random length $Z^j(t_i^k) - Z^j(t_i^{k-1})$, where $t_i^k = t_i + k(t_{i+1}-t_i)/\theta$;
3. do the first step again.
Ninomiya and Victoir [50] prove that $\bar{X}^\pi_1(T)$ is an approximation of $X(T)$ with weak order of convergence 2. The idea of Fujiwara [17] is to make Romberg-like extrapolations in order to improve the weak rate of convergence. Indeed, he proves that $E(f(X(T)))$ is respectively approximated by $\frac{1}{3}E(4f(\bar{X}^n_2(T)) - f(\bar{X}^n_1(T)))$ and $\frac{1}{120}E(243f(\bar{X}^n_3(T)) - 128f(\bar{X}^n_2(T)) + 5f(\bar{X}^n_1(T)))$ with orders of convergence 4 and 6. When, for some of the above ODEs, no analytical expression of the solution is available, one has to resort to discretization schemes. Those schemes have to be chosen carefully in order to preserve the weak order of the resulting scheme for (1). For instance, Fujiwara suggests a Runge-Kutta scheme of order 13 to preserve the weak order 6. More recently, Ninomiya and Ninomiya [49] have proposed a scheme with weak order 2 in which, at each time-step, only two ordinary differential equations have to be integrated over a random time-horizon. They have also analysed, in terms of the weak error, the effect of resorting to Runge-Kutta schemes to integrate the ordinary differential equations. The schemes with weak order 2 proposed by Ninomiya and Victoir and by Ninomiya and Ninomiya are both implemented in Premia [56] for the pricing of Asian options under the Heston model of stochastic volatility.
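As an illustration of steps 1-3, consider the one-dimensional geometric Brownian motion $dX = aX\,dt + \sigma X\,dZ$ with $\theta = 1$ and $r = 1$, where every ODE flow is explicit (no Runge-Kutta integration is needed); since the flows commute in this scalar linear case, one Ninomiya-Victoir step reproduces the exact solution driven by the same increments. A minimal sketch, with parameter values of ours:

```python
import math
import random

def nv_step(x, h, dz, a, sigma):
    # One Ninomiya-Victoir step (theta = 1, r = 1) for dX = a X dt + s X dZ.
    # The Stratonovich drift is sigma0(x) = (a - s^2/2) x, and all three
    # ODE flows are linear, hence available in closed form; with a single
    # Brownian motion the ordering coin flip is irrelevant.
    drift = a - 0.5 * sigma ** 2
    x *= math.exp(drift * h / 2)     # half step along sigma0
    x *= math.exp(sigma * dz)        # flow of sigma1 over "time" dz
    x *= math.exp(drift * h / 2)     # second half step along sigma0
    return x

rng = random.Random(1)
x0, a, sigma, T, n = 1.0, 0.1, 0.3, 1.0, 8
x, z = x0, 0.0
for _ in range(n):
    dz = rng.gauss(0.0, math.sqrt(T / n))
    z += dz
    x = nv_step(x, T / n, dz, a, sigma)
exact = x0 * math.exp((a - 0.5 * sigma ** 2) * T + sigma * z)
```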

5. Comments on some extensions

We discuss first the case when Z is a Lévy process, that is, a process with independent and stationary increments whose characteristic function is given by
\[ E[\exp(i\langle\theta, Z(t)\rangle)] = \exp\left(t\left(-\frac{1}{2}\langle\theta,\Gamma\theta\rangle + i\langle b,\theta\rangle + \int_{\mathbb{R}^r}\left(e^{i\langle\theta,x\rangle} - 1 - i\langle\theta,x\rangle 1_{\{\|x\|\le 1\}}\right)\nu(dx)\right)\right) \]
where $\theta \in \mathbb{R}^r$, $\Gamma \in \mathbb{R}^{r\times r}$ is a symmetric non-negative matrix and $\nu$ is a measure satisfying $\int_{\mathbb{R}^r}(1\wedge|x|^2)\,\nu(dx) < \infty$. When $b = 0$, $\nu = 0$ and $\Gamma$ is the identity matrix, Z is a standard r-dimensional Wiener process. The constant b denotes the drift of the process and $\nu$ is the Lévy measure associated to the process Z. We note that, in comparison with the Wiener process case, not all moments of Z are finite. In fact, the moment of order k of Z is finite if $\int_{\mathbb{R}^r}|x|^k 1_{\{\|x\|\ge 1\}}\,\nu(dx) < \infty$. The existence and uniqueness of a solution of the above equation (1) is ensured by standard theorems that can be found in, e.g., Protter [57] under Lipschitz assumptions on


the coefficients b and $\sigma$. Nevertheless, it is not clear under which conditions the moments of the solution are finite when Z is a Lévy process, except for the case of bounded coefficients. In particular, we do not know how the finite moment property transfers from Z to X when the coefficients are Lipschitz. These properties are important in order to determine the convergence properties of the Euler scheme. The situation in the case that $\sigma$ is constant is already difficult enough; nevertheless, this is an interesting problem. We quote here some results from the article by Kohatsu-Higa and Yamazato [34], who study this problem in the particular case that $\sigma$ is constant. For example, consider for simplicity the one-dimensional case $r = d = 1$ with $\Gamma = 0$, $b = 0$ and $\nu$ a measure concentrated on $(0,\infty)$. The moment $E(X(t)^\beta)$ is finite or not depending on whether the integral with respect to $\nu$ in the last column is finite or not.

β

0≤α≤1

β>0

α>1

0 < β < α−1

α>1

β = α−1

α>1

β > α−1

Criterion for finiteness of E(X(t)β )  +∞ β y ν(dy) 1 always finite  +∞ log (y) ν(dy) 1  +∞ β−α+1 y ν(dy) 1
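The decision rule of the table can be encoded for a concrete family of Lévy measures. The sketch below assumes a Pareto tail $\nu(dy) = y^{-1-p}\,dy$ on $(1,\infty)$ (our illustrative choice, for which $\int_1^\infty y^q\,\nu(dy) < \infty$ exactly when $q < p$, and the log-integral is always finite):

```python
def beta_moment_finite(alpha, beta, p):
    # Decision rule of the table, specialized to Levy measures with
    # Pareto tail nu(dy) = y^(-1-p) dy on (1, inf): int_1^inf y^q nu(dy)
    # is finite iff q < p, and the log-integral is always finite.
    # Drift b(y) = y^alpha, moment E(X(t)^beta).
    if alpha <= 1:
        return beta < p              # int y^beta nu(dy)
    if beta < alpha - 1:
        return True                  # always finite
    if beta == alpha - 1:
        return True                  # int log(y) nu(dy), finite here
    return beta - alpha + 1 < p      # int y^(beta - alpha + 1) nu(dy)
```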

Along the same lines as the above table, but in another set-up, Grigoriu and Samorodnitsky [23] studied the tail behavior of $X(t)$. In either case the conclusions are similar. The rule seems to be that if the drift coefficient is sublinear, then the drift does not influence the finite moment property of Z, which transfers directly to X. If the drift is superlinear, the situation is different: the finite moment property depends on the difference of powers between the drift and the moment to be evaluated. Therefore, it can be conjectured that this is also the situation in the Lipschitz case. Currently, as far as our knowledge goes, it is not known whether X has finite moments even when the exponential moments of Z are bounded, unless one imposes a series of stringent conditions. In most papers found in the literature, besides this assumption, one also has to assume that the moments of X are bounded, which is an unsatisfactory feature of this problem. For example, one has the following.

Theorem 9. Suppose that Z has exponential moments and that X has finite moments. Then
\[ E\left[\sup_{t\le T}\|X(t) - X^\pi(t)\|^{2p}\right] \le C|\pi|^p \]
where the constant C depends on T, x and the Lipschitz constants.


One case markedly different from the discussion in this paper is that of reflecting stochastic differential equations. In general, if the domain is closed and convex and the reflection is normal, then the results can usually be obtained as generalizations of the non-reflecting case. The main difference lies in how the inequalities are obtained. In fact, instead of using strong-type inequalities directly on the error process $X(t) - X^\pi(t)$, one has to use Itô's formula and the fact that the contribution of the reflecting processes brings $X^n$ closer to X. If the domain is more general, the results are no longer valid. In fact, as proven by Pettersson [54] (later refined by Słomiński [59]), the rates can decay slightly depending on the properties of the domain. The latest refined results on this can be found in a recent thesis by S. Menozzi [44]. Nevertheless, there is no parallel theory in the style of Jacod-Kurtz-Protter. In finance, one-dimensional processes that remain non-negative are of particular interest. This non-negativity property comes from the choice of the coefficients in the SDE rather than from reflection. The typical example is the Cox-Ingersoll-Ross process
\[ X(t) = x + \int_0^t (a - kX(s))\,ds + \sigma\int_0^t \sqrt{X(s)}\,dZ(s), \quad\text{with } x, a, \sigma \ge 0 \text{ and } k \in \mathbb{R}, \]
which is used to model short interest rates but also the stochastic volatility in the Heston model. It is not possible to discretize this SDE by the standard Euler scheme: indeed, $X^\pi(t_1)$ is negative with positive probability and then it is not possible to compute the square root in the diffusion coefficient in order to define $X^\pi(t_2)$. To overcome this problem, Deelstra and Delbaen [12] propose to take the positive part before the square root and define recursively:
\[ X^\pi(t_{i+1}) = X^\pi(t_i)(1 - k(t_{i+1}-t_i)) + a(t_{i+1}-t_i) + \sigma\sqrt{(X^\pi(t_i))^+}\,(Z(t_{i+1}) - Z(t_i)). \]

In her thesis [13], Diop studies the symmetrized Euler scheme defined by
\[ X^\pi(t_{i+1}) = \left|X^\pi(t_i)(1 - k(t_{i+1}-t_i)) + a(t_{i+1}-t_i) + \sigma\sqrt{X^\pi(t_i)}\,(Z(t_{i+1}) - Z(t_i))\right|. \]

In [1], Alfonsi compares those schemes with some new ones that he proposes. He concludes that the following explicit scheme combines the best features when $a \ge \frac{\sigma^2}{4}$:
\[ X^\pi(t_{i+1}) = \left(\left(1 - \frac{k}{2}(t_{i+1}-t_i)\right)\sqrt{X^\pi(t_i)} + \frac{\sigma(Z(t_{i+1}) - Z(t_i))}{2\left(1 - \frac{k}{2}(t_{i+1}-t_i)\right)}\right)^{\!2} + \left(a - \frac{\sigma^2}{4}\right)(t_{i+1}-t_i). \]

This scheme has been implemented in Premia [56] in order to discretize the SSRD model of credit risk [7]. Another interesting issue is adaptive methods, that is, how to choose the partition so as to improve the first term in the expansion of the strong error. For this, we refer the reader to [9] and the subsequent articles [26] and [46].
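The three square-root discretizations above can be sketched as follows (a minimal illustration with arbitrary parameter values of ours; only well-definedness and positivity are exercised):

```python
import math
import random

def dd_step(x, h, dz, a, k, sigma):
    # Deelstra-Delbaen: positive part taken under the square root
    return x * (1 - k * h) + a * h + sigma * math.sqrt(max(x, 0.0)) * dz

def diop_step(x, h, dz, a, k, sigma):
    # Diop: absolute value of the whole Euler increment (symmetrized scheme)
    return abs(x * (1 - k * h) + a * h + sigma * math.sqrt(abs(x)) * dz)

def alfonsi_step(x, h, dz, a, k, sigma):
    # Alfonsi's explicit scheme, well defined and positive when a >= sigma^2/4
    c = 1 - 0.5 * k * h
    return (c * math.sqrt(x) + sigma * dz / (2 * c)) ** 2 + (a - sigma ** 2 / 4) * h

rng = random.Random(2)
x, a, k, sigma, h = 0.04, 0.05, 0.5, 0.3, 1.0 / 252
for _ in range(252):
    x = alfonsi_step(x, h, rng.gauss(0.0, math.sqrt(h)), a, k, sigma)
```

Note that the Alfonsi iterate stays strictly positive by construction (a square plus a non-negative term), whereas the first two schemes handle a negative argument of the square root in different ways.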


References

[1] Alfonsi, A. On the discretization schemes for the CIR (and Bessel squared) processes. Monte Carlo Methods Appl. 11(4):355–384, 2005.
[2] Bally, V. and Talay, D. The law of the Euler scheme for stochastic differential equations (II): convergence rate of the density. Monte Carlo Methods and Applications, 2:93–128, 1996.
[3] Bally, V. and Talay, D. The law of the Euler scheme for stochastic differential equations (I): convergence rate of the distribution function. Probability Theory and Related Fields, 104:43–60, 1995.
[4] Beskos, A. and Roberts, G.O. Exact simulation of diffusions. Ann. Appl. Probab., 15(4):2422–2444, 2005.
[5] Beskos, A., Papaspiliopoulos, O. and Roberts, G.O. Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12(6):1077–1098, 2006.
[6] Billingsley, P. Convergence of Probability Measures. John Wiley and Sons, 1968.
[7] Brigo, D. and Alfonsi, A. Credit default swap calibration and derivatives pricing with the SSRD stochastic intensity model. Finance Stoch. 9(1):29–42, 2005.
[8] Buckwar, E. and Shardlow, T. Weak approximation of stochastic differential delay equations. IMA J. Numerical Analysis, 25:57–86, 2005.
[9] Cambanis, S. and Hu, Y. Exact convergence rate of the Euler-Maruyama scheme, with application to sampling design. Stochastics Stochastics Rep. 59(3-4):211–240, 1996.
[10] Clément, E., Kohatsu-Higa, A. and Lamberton, D. A duality approach for the weak approximation of stochastic differential equations. Ann. Appl. Probab. 16(3):1124–1154, 2006.
[11] Cruzeiro, A.B., Malliavin, P. and Thalmaier, A. Geometrization of Monte Carlo numerical analysis of an elliptic operator: strong approximation. C. R. Math. Acad. Sci. Paris 338(6):481–486, 2004.
[12] Deelstra, G. and Delbaen, F. Convergence of discretized stochastic (interest rate) processes with stochastic drift term. Appl. Stochastic Models Data Anal. 14(1):77–84, 1998.
[13] Diop, A. Sur la discrétisation et le comportement à petit bruit d'EDS unidimensionnelles dont les coefficients sont à dérivées singulières. Thèse, Université de Nice Sophia-Antipolis, 2003.
[14] Detemple, J., Garcia, R. and Rindisbacher, M. Asymptotic properties of Monte Carlo estimators of diffusion processes. Journal of Econometrics, 134(1):349–367, 2005.
[15] Detemple, J., Garcia, R. and Rindisbacher, M. Representation formulas for Malliavin derivatives of diffusion processes. Finance and Stochastics, 9(3):349–367, 2005.
[16] Detemple, J., Garcia, R. and Rindisbacher, M. Asymptotic properties of Monte Carlo estimators of derivatives. Management Science, 51:1657–1675, 2005.
[17] Fujiwara, T. Sixth order method of Kusuoka approximation. Preprint, 2006.
[18] Gobet, E. LAMN property for elliptic diffusion: a Malliavin calculus approach. Bernoulli 7:899–912, 2001.


[19] Gobet, E. LAN property for ergodic diffusions with discrete observations. Ann. Inst. H. Poincaré Probab. Statist. 38:711–737, 2002.
[20] Gobet, E. and Munos, R. Sensitivity analysis using Itô-Malliavin calculus and martingales. Application to stochastic optimal control. SIAM Journal on Control and Optimization, 43(5):1676–1713, 2005.
[21] Gobet, E., Pagès, G., Pham, H. and Printems, J. Discretization and simulation of the Zakai equation. SIAM J. Numer. Anal. 44(6):2505–2538, 2006.
[22] Gobet, E. and Labart, C. Error expansion for the discretization of backward stochastic differential equations. Stochastic Process. Appl. 117(7):803–829, 2007.
[23] Grigoriu, M. and Samorodnitsky, G. Tails of solutions of certain nonlinear stochastic differential equations driven by heavy tailed Lévy processes. Stochastic Proc. Appl. 105:69–97, 2003.
[24] Guyon, J. Euler scheme and tempered distributions. Stochastic Process. Appl. 116(6):877–904, 2006.
[25] Hobson, D.G. and Rogers, L.C.G. Complete models with stochastic volatility. Math. Finance 8(1):27–48, 1998.
[26] Hofmann, N., Müller-Gronbach, T. and Ritter, K. Optimal approximation of stochastic differential equations by adaptive step-size control. Math. Comp. 69(231):1017–1034, 2000.
[27] Ikeda, N. and Watanabe, S. Stochastic Differential Equations and Diffusion Processes. North-Holland/Kodansha, Amsterdam Oxford New York, 1989.
[28] Jacod, J., Kurtz, T., Méléard, S. and Protter, P. The approximate Euler method for Lévy driven stochastic differential equations. Ann. Inst. H. Poincaré, Probab. Statist. 41(3):523–558, 2005.
[29] Jacod, J. and Protter, P. Asymptotic error distributions for the Euler method for stochastic differential equations. Ann. Probab. 26(1):267–307, 1998.
[30] Jourdain, B. and Sbai, M. Exact retrospective Monte Carlo computation of arithmetic average Asian options. Monte Carlo Methods Appl. 13(2):135–171, 2007.
[31] Kebaier, A. Statistical Romberg extrapolation: a new variance reduction method and applications to option pricing. Ann. Appl. Probab. 15(4):2681–2705, 2005.
[32] Kloeden, P.E. and Platen, E. Numerical Solution of Stochastic Differential Equations. Springer, Berlin Heidelberg New York, 1992.
[33] Kohatsu-Higa, A. and Pettersson, R. On the simulation of some functionals of diffusion processes. Proceedings of the "Fourth Workshop on Stochastic Numerics". Research Institute for the Mathematical Sciences, Kyoto, 2000.
[34] Kohatsu-Higa, A. and Yamazato, M. On moments and tail behaviors of storage processes. Journal of Applied Probability, 20:1069–1086, 2003.
[35] Kurtz, T. and Protter, P. Weak convergence of stochastic integrals and differential equations. II. Infinite-dimensional case. Probabilistic models for nonlinear partial differential equations (Montecatini Terme, 1995), Lecture Notes in Math. 1627:197–285, Springer, Berlin, 1996.
[36] Kurtz, T. and Protter, P. Weak convergence of stochastic integrals and differential equations. Probabilistic models for nonlinear partial differential equations (Montecatini Terme, 1995), Lecture Notes in Math. 1627:1–41, Springer, Berlin, 1996.


[37] Kurtz, T. and Protter, P. Characterizing the weak convergence of stochastic integrals. Stochastic analysis (Durham, 1990), London Math. Soc. Lecture Note Ser. 167:255–259, Cambridge Univ. Press, Cambridge, 1991.
[38] Kurtz, T. and Protter, P. Wong-Zakai corrections, random evolutions, and simulation schemes for SDEs. Stochastic analysis, 331–346, Academic Press, Boston, MA, 1991.
[39] Kurtz, T. and Protter, P. Weak limit theorems for stochastic integrals and stochastic differential equations. Ann. Probab. 19(3):1035–1070, 1991.
[40] Kusuoka, S. Approximation of expectation of diffusion process and mathematical finance. Taniguchi Conference on Mathematics Nara '98, Adv. Stud. Pure Math. 31:147–165, Math. Soc. Japan, Tokyo, 2001.
[41] Kusuoka, S. Approximation of expectation of diffusion processes based on Lie algebra and Malliavin calculus. Adv. Math. Econ., 6:69–83, Springer, Tokyo, 2004.
[42] Lyons, T. and Victoir, N. Cubature on Wiener space. Stochastic analysis with applications to mathematical finance. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 460:169–198, 2004.
[43] Milshtein, G.N. A method of second order accuracy integration of stochastic differential equations. Theory Probab. Appl. 23:396–401, 1978.
[44] Menozzi, S. Discrétisations associées à un processus dans un domaine et schémas numériques probabilistes pour les EDP paraboliques quasilinéaires. Thèse, Université Pierre et Marie Curie – Paris VI, 2004.
[45] Mikulevicius, R. and Platen, E. Time discrete Taylor approximations for Itô processes with jump component. Math. Nachr. 138(9):93–104, 1988.
[46] Müller-Gronbach, T. Optimal pointwise approximation of SDEs based on Brownian motion at discrete points. Ann. Appl. Probab. 14(4):1605–1642, 2004.
[47] Ninomiya, S. A new simulation scheme of diffusion processes: application of the Kusuoka approximation to finance problems. Math. Comput. Simulation 62:479–486, 2003.
[48] Ninomiya, S. A partial sampling method applied to the Kusuoka approximation. Monte Carlo Methods Appl. 9(1):27–38, 2003.
[49] Ninomiya, M. and Ninomiya, S. A new higher-order weak approximation scheme for stochastic differential equations and the Runge-Kutta method. Finance and Stochastics 13:415–443, 2009.
[50] Ninomiya, S. and Victoir, N. Weak approximation of stochastic differential equations and application to derivative pricing. Applied Mathematical Finance, 15(2):107–121, 2008.
[51] Nualart, D. The Malliavin Calculus and Related Topics. Springer, Berlin Heidelberg New York, 1995.
[52] Nualart, D. Analysis on Wiener space and anticipating stochastic calculus. Lecture Notes in Math., 1690:123–227, 1998.
[53] Pagès, G. Multi-step Richardson-Romberg extrapolation: remarks on variance control and complexity. Monte Carlo Methods Appl. 13(1):37–70, 2007.
[54] Pettersson, R. Approximations for stochastic differential equations with reflecting convex boundaries. Stochastic Proc. Appl. 59:295–308, 1995.

144

B. Jourdain and A. Kohatsu-Higa

[55] Platen, E. An approximation method for a class of Itˆ o processes with jump component. Litovsk. Mat. Sb. 22(2):124–136, 1982. [56] PREMIA: An Option Pricer, Mathfi Project (INRIA, CERMICS, UMLV). http://www-rocq.inria.fr/mathfi/Premia/ [57] Protter, P.E. Stochastic integration and differential equations. Applications of Mathematics (New York) 21. Stochastic Modelling and Applied Probability. SpringerVerlag, 2004. [58] Slomi´ nski, L. On approximation of solutions of multidimensional SDEs with reflecting boundary condition. Stochastic Proc. Appl. 50:197–219, 1994. [59] Slomi´ nski, L. Euler approximation of solutions of SDEs with reflecting boundary. Stochastic Proc. Appl. 94:317–337, 2001. [60] Talay, D. and Tubaro, L. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Anal. Appl. 8(4):483–509, 1990. Benjamin Jourdain Universit´e Paris-Est, CERMICS Project team MathFi ENPC-INRIA-UMLV 6 et 8, avenue Blaise Pascal Cit´e Descartes – Champs-sur-Marne F-77455 Marne-la-Vall´ee Cedex 2, France e-mail: [email protected] Arturo Kohatsu-Higa Ritsumeikan University Japan Science and Technology Agency Department of Mathematical Sciences 1-1-1 Nojihigashi, Kusatsu Shiga, 525-8577, Japan e-mail: [email protected]

Progress in Probability, Vol. 65, 145–167
© 2011 Springer Basel AG

Strong Consistency of Bayesian Estimator Under Discrete Observations and Unknown Transition Density

Arturo Kohatsu-Higa, Nicolas Vayatis and Kazuhiro Yasuda

Abstract. We consider the asymptotic behavior of a Bayesian parameter estimation method under discrete stationary observations. We suppose that the transition density of the data is unknown, and therefore we approximate it using a kernel density estimation method applied to Monte Carlo simulations of approximations of the theoretical random variables generating the observations. In this article, we estimate the error between the theoretical estimator, which assumes knowledge of the transition density, and its approximation, which uses the simulations. We prove the strong consistency of the approximated estimator and find the order of the error. Most importantly, we give a parameter tuning result which relates the number of data, the number of time steps used in the approximation process, the number of Monte Carlo simulations and the bandwidth size of the kernel density estimation.

Mathematics Subject Classification (2000). 62F12, 62F15, 65C60.

Keywords. Bayesian inference; strong consistency; discrete observations.

1. Introduction

We consider a parameter estimation method of Bayesian type under discrete observations. That is, our goal is to estimate the posterior expectation of some function f given the observed data Y_0^N = (Y_0, Y_1, ..., Y_N):

\[ E_N[f] := E_\theta[f \mid Y_0, \dots, Y_N] := \frac{\int f(\theta)\,\phi_\theta(Y_0^N)\,\pi(\theta)\,d\theta}{\int \phi_\theta(Y_0^N)\,\pi(\theta)\,d\theta}, \tag{1.1} \]

where \(\phi_\theta(Y_0^N)\) is the joint density of the model process for Y_0^N = (Y_0, Y_1, ..., Y_N) and π(θ) is a prior distribution. This method can be applied when the joint density \(\phi_\theta(Y_0^N)\) is known a priori. In this article, we suppose that this is not the case.
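To make (1.1) concrete, here is a minimal numerical sketch (our illustration, not part of the paper) that evaluates the posterior expectation on a grid of θ values for a toy AR(1) model with a known Gaussian transition density. The model, the uniform prior, the grid and all parameter values are illustrative assumptions, and the invariant density µ_θ(Y_0) is omitted for brevity.

```python
import numpy as np

def posterior_expectation(f, y, log_p, prior_pdf, thetas):
    """Grid approximation of E_N[f] = ∫ f(θ) φ_θ(Y_0^N) π(θ) dθ / ∫ φ_θ(Y_0^N) π(θ) dθ,
    working in log space for numerical stability."""
    # log joint density Σ_i log p_θ(Y_i, Y_{i+1}) for each θ on the grid
    log_phi = np.array([np.sum(log_p(th, y[:-1], y[1:])) for th in thetas])
    w = np.exp(log_phi - log_phi.max()) * prior_pdf(thetas)
    return np.sum(f(thetas) * w) / np.sum(w)

# toy data: AR(1) chain Y_{i+1} = θ0·Y_i + ε_i, ε_i ~ N(0,1), true θ0 = 0.5
rng = np.random.default_rng(0)
theta0, N = 0.5, 2000
y = np.empty(N + 1)
y[0] = 0.0
for i in range(N):
    y[i + 1] = theta0 * y[i] + rng.standard_normal()

# Gaussian transition log-density and a flat prior on Θ = [0, 1]
log_p = lambda th, a, b: -0.5 * (b - th * a) ** 2 - 0.5 * np.log(2.0 * np.pi)
thetas = np.linspace(0.0, 1.0, 401)
est = posterior_expectation(lambda t: t, y, log_p, lambda t: np.ones_like(t), thetas)
print(est)  # posterior mean of θ, close to θ0 = 0.5
```

With N = 2000 observations, the posterior concentrates near the true parameter, which is the behavior quantified by Theorem 2.5 below.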


Suppose that Y_i is a stationary Markov chain, and therefore \(\phi_\theta(y_0^N) = \mu_\theta(y_0)\prod_{i=0}^{N-1} p_\theta(y_i, y_{i+1})\), where µ_θ is the probability density of the invariant measure of Y_i and p_θ(y, z) is the transition density from y to z. But the expression (1.1) is still theoretical, as we do not know the transition density p_θ. So we propose to estimate this quantity based on simulation of the underlying process. Usually the underlying process is also approximated using, for example, the Euler-Maruyama scheme in the case that Y is generated by a diffusion. Then we approximate the transition density of the Euler-Maruyama approximation through the kernel density estimation method. Under these settings, we consider an approximation of the posterior expectation (1.1):

\[ \hat E^n_{N,m}[f] := \frac{\int f(\theta)\,\hat\phi^N_\theta(Y_0^N)\,\pi(\theta)\,d\theta}{\int \hat\phi^N_\theta(Y_0^N)\,\pi(\theta)\,d\theta}, \]

where \(\hat\phi^N_\theta(Y_0^N) := \mu_\theta(Y_0)\prod_{j=1}^{N} \hat p^N_\theta(Y_{j-1}, Y_j)\) and \(\hat p^N_\theta(y,z)\) is an approximation of the transition density obtained using the kernel density estimation method.

The problem we want to address is whether there exists an appropriate choice of all the parameters that appear in the approximation of the posterior expectation so that convergence is obtained. In particular, the issue that is new in this research in comparison with previous results is that we give an estimate of how good the approximation of the transition density has to be as the number of observations increases to infinity (therefore the number of arguments in q_θ tends to infinity). In reality one also needs to approximate the invariant measure, but this problem can be handled with an extra term; the quality of this approximation is studied in Talay [10] and the references therein.

Approximating the posterior distribution can also be interpreted as a type of filtering problem for a diffusion process. This approach is considered in a general framework by Del Moral, Jacod and Protter [6]. Cano et al. [5] considered this problem based on Bayesian inference for a stochastic differential equation with parameter θ under settings similar to ours. They use direct calculations through an error estimate, given by Bally and Talay [1], between the transition density of the true stochastic differential equation and the transition density of the Euler-Maruyama approximation. They prove that, for fixed N, the approximate posterior distribution which uses the exact density of the Euler approximation converges to the exact posterior distribution as the number of steps increases. Yasuda [12] improved this result and gave a general framework where the rate of convergence is N^{−1/2} under a condition relating the simulated density and the density of the approximation. This condition ((6)-(b) in this article) hides the tuning relationship between m and N. The result is stated in general terms so that it can be adapted to various diffusion cases.

In this article, we consider the problem of approximating the transition densities using the kernel density estimation method, and therefore we clarify the tuning process that is required for the convergence of the approximation of posterior expectations. Therefore, we give an explicit expression that shows how to choose


parameters (the number of Monte Carlo simulations n, the bandwidth size h of the kernel density estimation and the approximation parameter m of the process) based on the number of observations N.

In this paper, we assume that the observation data Y_0, Y_1, ..., Y_N come from a stationary, α-mixing process and that the time interval between Y_i and Y_{i+1} is fixed for all i. We give the relationship between the number of data and the parameters of the transition density approximation, and we estimate the error as follows:

\[ \left| E_N[f] - \hat E^n_{N,m}[f] \right| \le \frac{\Xi}{\sqrt N} \quad \text{a.s. } \omega, \hat\omega, \]

where Ξ is some positive random variable with Ξ < +∞ a.s. ω, ω̂ (here ω denotes the randomness associated with the data and ω̂ denotes the randomness associated with the simulations). We also prove that E_N[f] converges to f(θ_0) with order \(\frac{1}{\sqrt N}\), where θ_0 is the true value of the parameter.

This paper is structured as follows. In Section 2, we give the setting of our problem and precisely state our main theorem. In Section 3, we prove the main theorem by the Laplace method, dividing the proof into the four estimates stated in Proposition 3.1; this decomposition plays a central role in the proof. In Section 4, we show how to deal with each term in the decomposition. In Section 5, we give the tuning result, which states how the number of Monte Carlo simulations n and the bandwidth size h should be related to N so that a certain rate of convergence is achieved.

The proofs of various statements are involved, and therefore we have tried to give the main line of thought of the proofs in this article. Full proofs of the theorems, propositions and lemmas of this paper can be found in Kohatsu-Higa et al. [8] and Yasuda [12]. An example, the Ornstein-Uhlenbeck process case, can be found in Kohatsu-Higa et al. [9].

2. Settings and main theorem

2.1. Settings

First we recall the definition of an α-mixing process.

Definition 2.1 (Billingsley [2], p. 315). For a sequence X_1, X_2, ... of random variables, let α_n be a number such that

\[ |P(A \cap B) - P(A)P(B)| \le \alpha_n \]

for A ∈ σ(X_1, ..., X_k), B ∈ σ(X_{k+n}, X_{k+n+1}, ...) and k, n ∈ N. Suppose that α_n → 0 as n → ∞; then we call the sequence X_1, X_2, ... an α-mixing process. For convenience, we set α_0 = 1.

Here we consider the following setting. Let θ_0 ∈ Θ := [θ_l, θ_u] (θ_l < θ_u) be the parameter that we want to estimate, where Θ is a compact subset of R and \(\theta_0 \in \dot\Theta\), where \(\dot\Theta\) denotes the interior of the set Θ; set \(\Theta_0 = \Theta \setminus \{\theta_0\}\). Let \((\Omega, \mathcal F, P_{\theta_0})\), \((\bar\Omega, \bar{\mathcal F}, \bar P)\) and \((\hat\Omega, \hat{\mathcal F}, \hat P)\) be three probability spaces, where the


probability measure \(P_{\theta_0}\) is parametrized by θ_0. ∆ > 0 is a fixed parameter that represents the time between observations. The probability space \((\Omega, \mathcal F, P_{\theta_0})\) is used for the observations, \((\bar\Omega, \bar{\mathcal F}, \bar P)\) is used for the process that defines the model process with law \(P_\theta\), and \((\hat\Omega, \hat{\mathcal F}, \hat P)\) is used for the simulations that are used in estimating the transition density.

(i) (Observation process) Let \(\{Y_{i\Delta}\}_{i=0,1,\dots,N}\) be a sequence of N+1 observations of a Markov chain having transition density \(p_{\theta_0}(y,z)\), y, z ∈ R, and invariant measure \(\mu_{\theta_0}\). This sequence is defined on the probability space \((\Omega, \mathcal F, P_{\theta_0})\). We write \(Y_i := Y_{i\Delta}\) for i = 0, 1, ..., N.

(ii) (Model process) Denote by \(X^y(\theta)\) a random variable defined on the probability space \((\bar\Omega, \bar{\mathcal F}, \bar P)\) such that its law is given by \(p_\theta(y,\cdot)\).

(iii) Denote by \((\hat\Omega, \hat{\mathcal F}, \hat P)\) the probability space which generates the simulation of the random variable \(X^y(\theta)\).

(iv) (Approximating process) Denote by \(X^y_{(m)}(\theta)\) the simulation of an approximation of the model process \(X^y(\theta)\), which is defined on \((\bar\Omega, \bar{\mathcal F}, \bar P)\). \(\hat H_t := \hat{\mathcal F}_t \otimes \bar{\mathcal F}\) satisfies the usual conditions. m is the parameter that determines the quality of the approximation. \(\tilde p^N_\theta(y,\cdot) = \tilde p^N_\theta(y,\cdot; m(N))\) is the density of the random variable \(X^y_{(m)}(\theta)\).

(v) (Approximated transition density) Let K : R → R_+ be a kernel which satisfies \(\int K(x)\,dx = 1\) and K(x) > 0 for all x. Denote by \(\hat p^N_\theta(y,z)\) the kernel density estimation of \(\tilde p^N_\theta(y,z)\) based on n independent simulations of \(X^y_{(m)}(\theta)\), which are defined on \((\hat\Omega, \hat{\mathcal F}, \hat P)\). The outcomes of the n simulations are denoted by \(X^{y,(k)}_{(m)}(\theta)\), k = 1, ..., n. For h(N) > 0,

\[ \hat p^N_\theta(y,z) := \hat p^N_\theta(y,z;\hat\omega; m(N), h(N), n(N)) := \frac{1}{n(N)h(N)} \sum_{k=1}^{n(N)} K\!\left( \frac{X^{y,(k)}_{(m(N))}(\theta,\hat\omega) - z}{h(N)} \right). \]

(vi) For given m, we introduce the "average" transition density over all trajectories with respect to the kernel K:

\[ \bar p^N_\theta(y,z) := \bar p^N_\theta(y,z; m(N), h(N)) := \hat E\!\left[ \hat p^N_\theta(y,z) \right] = \frac{1}{h(N)}\, \hat E\!\left[ K\!\left( \frac{X^{y,(1)}_{(m(N))}(\theta,\cdot) - z}{h(N)} \right) \right], \]

where Ê means the expectation with respect to P̂.

As can be deduced from the above set-up, we have preferred to state our problem in abstract terms, without explicitly defining the dynamics that generate \(X^y(\theta)\) or how the approximation \(X^y_{(m)}(\theta)\) is defined. This is done in order not to obscure the arguments that follow and to avoid making the paper excessively long.
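Although the paper keeps the model abstract, the construction in items (iv)-(v) can be sketched for a toy diffusion. The following illustration (our code, not the paper's) builds \(\hat p^N_\theta(y,z)\) for the Ornstein-Uhlenbeck dynamics \(dX_t = -\theta X_t\,dt + dW_t\), simulating \(X^{y,(k)}_{(m)}(\theta)\) by the Euler-Maruyama scheme with m time steps and averaging Gaussian kernels over n independent draws; all names and parameter values are assumptions.

```python
import numpy as np

def euler_endpoints(theta, y0, delta, m, n, rng):
    """n independent Euler-Maruyama approximations X^{y,(k)}_(m)(θ) of the
    time-Δ marginal of dX = -θ X dt + dW started at y0 (toy OU dynamics)."""
    dt = delta / m
    x = np.full(n, y0, dtype=float)
    for _ in range(m):
        x += -theta * x * dt + np.sqrt(dt) * rng.standard_normal(n)
    return x

def p_hat(theta, y, z, delta=1.0, m=50, n=20_000, h=0.1, rng=None):
    """Kernel density estimate \\hat p^N_θ(y, z) with a Gaussian kernel K."""
    rng = rng or np.random.default_rng(0)
    x = euler_endpoints(theta, y, delta, m, n, rng)
    return np.mean(np.exp(-0.5 * ((x - z) / h) ** 2)) / (h * np.sqrt(2.0 * np.pi))

# sanity check against the exact OU transition density at one point:
# X_Δ | X_0 = y  ~  N(y e^{-θΔ}, (1 - e^{-2θΔ})/(2θ))
theta, y, delta = 1.0, 1.0, 1.0
mean = y * np.exp(-theta * delta)
var = (1.0 - np.exp(-2.0 * theta * delta)) / (2.0 * theta)
exact = 1.0 / np.sqrt(2.0 * np.pi * var)   # exact density at z = mean
est = p_hat(theta, y, mean)
print(est, exact)  # the two values should be close
```

The gap between `est` and `exact` combines exactly the three error sources the paper tracks: the Euler discretization (m), the Monte Carlo noise (n) and the kernel smoothing bias (h).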


All the properties that will be required of \(p_\theta\) and \(\tilde p^N_\theta\) will be satisfied for a subclass of diffusion processes.

Remark 2.2. (i) Without loss of generality, we can consider the product of the above three probability spaces, so that all random variables are defined on the same probability space. We do this without further mention.

(ii) Note that, from the definition and the properties of the kernel K, \(\bar p^N_\theta(y,\cdot)\) satisfies, for all θ ∈ Θ and y ∈ R,

\[ \int \bar p^N_\theta(y,z)\,dz = 1 \quad\text{and}\quad \bar p^N_\theta(y,z) > 0 \ \text{for all } z \in \mathbb R. \]

Our purpose is to estimate the posterior expectation of some function f ∈ C¹(Θ) given the data:

\[ E_N[f] := E_\theta[f \mid Y_0, \dots, Y_N] = \frac{I_N(f)}{I_N(1)}, \qquad I_N(f) := \int f(\theta)\,\phi_\theta(Y_0^N)\,\pi(\theta)\,d\theta, \]

where \(\phi_\theta(Y_0^N) = \phi_\theta(Y_0, \dots, Y_N) = \mu_\theta(Y_0)\prod_{j=1}^{N} p_\theta(Y_{j-1}, Y_j)\) is the joint density of (Y_0, Y_1, ..., Y_N). We propose to estimate this quantity based on the simulation of the process:

\[ \hat E^n_{N,m}[f] := \frac{\hat I^n_{N,m}(f)}{\hat I^n_{N,m}(1)}, \qquad \hat I^n_{N,m}(f) := \int f(\theta)\,\hat\phi^N_\theta(Y_0^N)\,\pi(\theta)\,d\theta, \]

where \(\hat\phi^N_\theta(Y_0^N) := \mu_\theta(Y_0)\prod_{j=1}^{N} \hat p^N_\theta(Y_{j-1}, Y_j)\).

2.2. Main theorem

Assumption 2.3. We assume the following.

(1) (Observation process) \(\{Y_i\}_{i=0,1,\dots,N}\) is an α-mixing process with \(\alpha_n = O(n^{-5})\).

(2) (The prior distribution) The prior distribution π is continuous in θ and its support is Θ; that is, π(θ) > 0 for all θ ∈ Θ.

(3) (Density regularity) The transition densities satisfy \(p, \bar p^N \in C^{2,0,0}(\Theta \times \mathbb R^2; \mathbb R_+)\), and for all θ ∈ Θ, y, z ∈ R we have \(\min\{p_\theta(y,z), \bar p^N_\theta(y,z)\} > 0\). Moreover \(p_\theta\) admits an invariant measure \(\mu \in C_b^{0,0}(\Theta \times \mathbb R; \mathbb R_+)\), and for all θ ∈ Θ, \(\mu_\theta(y) > 0\) for every y ∈ R.

(4) (Identifiability) Assume that there exist \(c_1, c_2 : \mathbb R \to (0,\infty)\) such that for all θ ∈ Θ,

\[ \inf_N \int \left| q^i_\theta(y,z) - q^i_{\theta_0}(y,z) \right| dz \ge c_i(y)\,|\theta - \theta_0|, \]

and \(C_i(\theta_0) := \int c_i(y)^2\,\mu_{\theta_0}(y)\,dy \in (0,+\infty)\) for i = 1, 2, where \(q^1_\theta = p_\theta\) and \(q^2_\theta = \bar p^N_\theta\).


(5) (Regularity of the log-density) We assume that, for \(q_\theta = p_\theta, \bar p^N_\theta\),

\[ \sup_N \sup_{\theta\in\Theta} \int \left| \frac{\partial^i}{\partial\theta^i} \ln q_\theta(y,z) \right|^{12} p_{\theta_0}(y,z)\,\mu_{\theta_0}(y)\,dy\,dz < +\infty, \quad \text{for } i = 0, 1, 2,^1 \]
\[ \sup_N \sup_{\theta\in\Theta} \left| \int \frac{\partial^2}{\partial\theta^2} \left( \ln q_\theta(y,z) \right) \bar p^N_{\theta_0}(y,z)\,\mu_{\theta_0}(y)\,dy\,dz \right| < +\infty, \]
\[ \sup_N \sup_{\theta\in\Theta} \left| \int \frac{\partial^i}{\partial\theta^i} \ln q_\theta(y,z)\, \bar p^N_{\theta_0}(y,z)\,\mu_{\theta_0}(y)\,dy\,dz \right| < +\infty, \quad \text{for } i = 0, 1, \]

where \(\frac{\partial^0}{\partial\theta^0} q_\theta = q_\theta\).

(6) (Parameter tuning)

(a) We assume the following boundedness condition:

\[ \sup_N \sup_{\theta\in\Theta} \left| \frac{1}{\sqrt N} \sum_{i=0}^{N-1} \left( \frac{\partial}{\partial\theta} \ln \hat p^N_\theta(Y_i, Y_{i+1}) - \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(Y_i, Y_{i+1}) \right) \right| < +\infty \quad \text{a.s.} \]

(b) Assume that for each y, z ∈ R there exist factors \(C^N_1(y,z)\) and \(c_1(y,z)\) such that

\[ \left| p_{\theta_0}(y,z) - \bar p^N_{\theta_0}(y,z) \right| \le C^N_1(y,z)\,a_1(N), \]

where \(\sup_N C^N_1(y,z) < +\infty\) and \(a_1(N) \to 0\) as N → ∞, and

\[ C^N_1(y,z)\,a_1(N)\,\sqrt N < c_1(y,z), \]

where \(c_1\) satisfies

\[ \sup_N \sup_{\theta\in\Theta} \int \left| \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(y,z) \right| c_1(y,z)\,\mu_{\theta_0}(y)\,dy\,dz < +\infty. \]

(c) There exist some function \(g^N : \mathbb R^2 \to \mathbb R\) and a constant \(a_2(N)\), which depends on N, such that for all y, z ∈ R,

\[ \sup_{\theta\in\Theta} \left| \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(y,z) - \frac{\partial}{\partial\theta} \ln p_\theta(y,z) \right| \le |g^N(y,z)|\,a_2(N), \]

where \(\sup_N E_{\theta_0}\!\left[ |g^N(Y_0,Y_1)|^4 \right] < +\infty\) and \(a_2(N) \to 0\) as N → ∞.

Remark 2.4. (i) Assumption 2.3 (4) is needed in order to ensure that the density can be used to discern the value of θ from the observations. This type of assumption is natural in statistics and can be assured in the case of one-dimensional stochastic differential equations under differentiability of the coefficients. Assumption 2.3 (5) will be satisfied under enough regularity of the transition density function \(p_\theta\) and its approximation \(\bar p^N_\theta\). This can be achieved with Malliavin calculus techniques in the case of diffusion equations.

¹ The power 12 is needed to prove a central limit theorem (Proposition 4.4).


(ii) Assumption 2.3 (6)-(a) will be crucial in what follows, and it is the property that will determine the rate of convergence and the tuning properties. Note that all other hypotheses deal with the transition density \(p_\theta\) or the average of its approximation \(\bar p^N_\theta\); therefore the needed properties essentially follow from similar properties of \(p_\theta\) and some limit arguments. Assumption 2.3 (6)-(a) is the only condition that deals with the approximation \(\hat p^N_\theta\) itself, which is random. In particular, obtaining a lower bound for \(\hat p^N_\theta\) will be the important problem to solve. This will be further discussed in Section 5.

(iii) In this problem, we need to study two approximation problems: one is the difference between the transition densities of the observation process and the approximated process, and the other is the difference between the transition density of the approximated process and the expectation of the approximation based on kernel density estimation. Assumption 2.3 (6)-(b) and (c) state the rate of convergence of the density of the approximation and its derivatives. The first problem was studied in Bally and Talay [1] and Guyon [7]; the second can be dealt with using kernel density estimation theory. For example, in the case that the data \(\{Y_i\}\) come from a stochastic differential equation, the Euler-Maruyama approximation is most commonly used, and in that case we have the following results. Assume that the drift and diffusion coefficients belong to \(C_b^\infty(\mathbb R)\), and that the diffusion coefficient satisfies the uniform ellipticity condition. Then for α, β ∈ N there exist \(c_1 \ge 0\) and \(c_2 > 0\) such that for all N ≥ 1, ∆ ∈ (0,1] and y, z ∈ R,

\[ \left| \partial_y^\alpha \partial_z^\beta \tilde p^N_{\theta_0}(y,z) - \partial_y^\alpha \partial_z^\beta p_{\theta_0}(y,z) \right| \le \frac{1}{m(N)}\, \partial_y^\alpha \partial_z^\beta \pi_{\theta_0}(\Delta, y, z) + r^{m(N)}_{\theta_0}(\Delta, y, z), \]

where, letting D be a differential operator and \(p^t_{\theta_0}(y,z)\) the transition density of \(Y_t\), we set

\[ \pi_{\theta_0}(\Delta,y,z) := \int_0^\Delta \int_{-\infty}^{\infty} p^s_{\theta_0}(y,w)\, D\!\left( p^{\Delta-s}_{\theta_0}(\cdot,z) \right)\!(w)\,dw\,ds, \]

and

\[ \left| r^{m(N)}_{\theta_0}(\Delta,y,z) \right| \le c_1\, \frac{1}{m(N)^2}\, \frac{1}{\Delta^{\frac{\alpha+\beta+5}{2}}}\, \exp\!\left( -c_2\, \frac{|y-z|^2}{\Delta} \right). \]

For more details, see Proposition 1 in Guyon [7]. For the second problem, in the case that the kernel K satisfies

\[ \int x\,K(x)\,dx = 0 \quad\text{and}\quad \int x^2 K(x)\,dx < +\infty, \]

if \(\tilde p^N_{\theta_0}(y,z)\) is uniformly bounded in y, z ∈ R and N ∈ N, and twice continuously differentiable with respect to z for all y ∈ R and N ∈ N, then we have, for all y ∈ R and N ∈ N,

\[ \bar p^N_{\theta_0}(y,z) - \tilde p^N_{\theta_0}(y,z) = \frac{h(N)^2}{2}\, \frac{\partial^2 \tilde p^N_{\theta_0}}{\partial z^2}(y,z) \int x^2 K(x)\,dx + o\!\left( h(N)^2 \right). \]

For more details, see Wand and Jones [11].

Therefore, roughly speaking, \(a_1(N) = \frac{1}{m(N)} + h(N)^2\), and the choice \(m(N) = \sqrt N\) will satisfy Assumption 2.3 (6)-(b).
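The \(h(N)^2\) kernel bias can be checked in closed form for a toy case (our illustration, not from the paper): smoothing a N(0, σ²) density with a Gaussian kernel of bandwidth h yields exactly a N(0, σ² + h²) density, so the exact bias at z = 0 can be compared with the second-order prediction \(\frac{h^2}{2}\,p''(0)\int x^2 K(x)\,dx\). All numbers below are illustrative assumptions.

```python
import numpy as np

sigma, h = 1.0, 0.2
phi = lambda z, s: np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# exact bias at z = 0: a Gaussian convolved with the Gaussian kernel is Gaussian
exact_bias = phi(0.0, np.sqrt(sigma**2 + h**2)) - phi(0.0, sigma)
# second-order prediction (h²/2)·p''(0)·∫x²K dx, with p''(0) = -φ_σ(0)/σ²
# and ∫x²K(x)dx = 1 for the standard Gaussian kernel
predicted = -0.5 * h**2 * phi(0.0, sigma) / sigma**2
print(exact_bias, predicted)  # both ≈ -8e-3, i.e. the bias is O(h²)
```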

Now we state the main result of the paper.

Theorem 2.5. Under Assumption 2.3, there exists some positive random variable Ξ such that Ξ < +∞ a.s. and

\[ \left| E_N[f] - \hat E^n_{N,m}[f] \right| \le \frac{\Xi}{\sqrt N} \quad \text{a.s.} \]

We also have

\[ |E_N[f] - f(\theta_0)| \le \frac{\Xi_1}{\sqrt N} \quad \text{a.s.} \qquad\text{and}\qquad \left| \hat E^n_{N,m}[f] - f(\theta_0) \right| \le \frac{\Xi_2}{\sqrt N} \quad \text{a.s.}, \]

where \(\Xi_1\) and \(\Xi_2\) are positive random variables with \(\Xi_1 < +\infty\) a.s. and \(\Xi_2 < +\infty\) a.s.

3. Idea of the proof of Theorem 2.5

First we introduce some notation. Let \(p, q : \mathbb R^2 \to \mathbb R_+\) be strictly positive functions of two variables. Then let

\[ H(p,q) := \int \left( \ln p(y,z) \right) q(y,z)\,\mu_{\theta_0}(y)\,dy\,dz, \]
\[ Z_N(\theta) := \frac{1}{\sqrt N} \sum_{i=0}^{N-1} \left[ \ln p_\theta(Y_i, Y_{i+1}) - H(p_\theta, p_{\theta_0}) \right], \]
\[ \varepsilon(\theta) := H(p_\theta, p_{\theta_0}) - H(p_{\theta_0}, p_{\theta_0}), \qquad \beta_N(\theta) := Z_N(\theta) - Z_N(\theta_0). \]

We also set

\[ \bar Z_N(\theta) := \frac{1}{\sqrt N} \sum_{i=0}^{N-1} \left[ \ln \hat p^N_\theta(Y_i, Y_{i+1}) - H\!\left( \bar p^N_\theta, \bar p^N_{\theta_0} \right) \right], \]
\[ \bar\varepsilon_N(\theta) := H\!\left( \bar p^N_\theta, \bar p^N_{\theta_0} \right) - H\!\left( \bar p^N_{\theta_0}, \bar p^N_{\theta_0} \right), \qquad \bar\beta_N(\theta) := \bar Z_N(\theta) - \bar Z_N(\theta_0). \]

Set \(\Theta_0 := \Theta \setminus \{\theta_0\}\). The following proposition states the properties that are needed to complete the proof of Theorem 2.5.

Proposition 3.1. Under Assumption 2.3, we have the following results.

(i) There exist some strictly negative constants \(c_1, c_2\) such that

\[ c_1 \le \inf_{\theta\in\Theta_0} \frac{\varepsilon(\theta)}{(\theta-\theta_0)^2} \le \sup_{\theta\in\Theta_0} \frac{\varepsilon(\theta)}{(\theta-\theta_0)^2} \le c_2 < 0. \]


(ii) There exist some random variables \(d_1, d_2\) such that

\[ d_1 \le \inf_N \inf_{\theta\in\Theta_0} \frac{\beta_N(\theta)}{\theta-\theta_0} \le \sup_N \sup_{\theta\in\Theta_0} \frac{\beta_N(\theta)}{\theta-\theta_0} \le d_2 \quad \text{a.s.} \]

(iii) There exist some strictly negative constants \(c_3, c_4\) such that

\[ c_3 \le \inf_N \inf_{\theta\in\Theta_0} \frac{\bar\varepsilon_N(\theta)}{(\theta-\theta_0)^2} \le \sup_N \sup_{\theta\in\Theta_0} \frac{\bar\varepsilon_N(\theta)}{(\theta-\theta_0)^2} \le c_4 < 0. \]

(iv) There exist some random variables \(d_3, d_4\) such that

\[ d_3 \le \inf_N \inf_{\theta\in\Theta_0} \frac{\bar\beta_N(\theta)}{\theta-\theta_0} \le \sup_N \sup_{\theta\in\Theta_0} \frac{\bar\beta_N(\theta)}{\theta-\theta_0} \le d_4 \quad \text{a.s.} \]

We will give an idea of the proof of this proposition after this section. We now give the proof of Theorem 2.5 using the results of Proposition 3.1.

Idea of the proof of Theorem 2.5. We decompose the approximation error as follows:

\[ E_N[f] - \hat E^n_{N,m}[f] = \frac{I_N(f) - f(\theta_0)\,I_N(1)}{I_N(1)} - \frac{\hat I^n_{N,m}(f) - f(\theta_0)\,\hat I^n_{N,m}(1)}{\hat I^n_{N,m}(1)}. \]

The goal is then to prove that there exist some random variables \(C_1\) and \(C_2\) such that

\[ \left| \frac{I_N(f) - f(\theta_0)\,I_N(1)}{I_N(1)} \right| \le \frac{C_1}{\sqrt N} \ \text{a.s.} \qquad\text{and}\qquad \left| \frac{\hat I^n_{N,m}(f) - f(\theta_0)\,\hat I^n_{N,m}(1)}{\hat I^n_{N,m}(1)} \right| \le \frac{C_2}{\sqrt N} \ \text{a.s.} \]

Indeed, we can write \(I_N(f)\) and \(\hat I^n_{N,m}(f)\) as follows:

\[ I_N(f) = e^{N H(p_{\theta_0}, p_{\theta_0}) + \sqrt N\, Z_N(\theta_0)} \int_{\Theta_0} f(\theta)\, e^{N\varepsilon(\theta) + \sqrt N\,\beta_N(\theta)}\, \mu_\theta(Y_0)\,\pi(\theta)\,d\theta, \]
\[ \hat I^n_{N,m}(f) = e^{N H(\bar p^N_{\theta_0}, \bar p^N_{\theta_0}) + \sqrt N\, \bar Z_N(\theta_0)} \int_{\Theta_0} f(\theta)\, e^{N\bar\varepsilon_N(\theta) + \sqrt N\,\bar\beta_N(\theta)}\, \mu_\theta(Y_0)\,\pi(\theta)\,d\theta. \]

Then, using the Laplace method and Proposition 3.1, we obtain our conclusion. In fact, as N goes to infinity, the leading terms in the quotients \(\frac{I_N(f)}{I_N(1)}\) and \(\frac{\hat I^n_{N,m}(f)}{\hat I^n_{N,m}(1)}\) are determined by \(\varepsilon(\theta)\) and \(\bar\varepsilon_N(\theta)\), due to Proposition 3.1. Their behavior is similar to Gaussian integrals where the variance tends to zero, and therefore the integrals tend to the value at their "mean", which in this case is θ_0, as follows from Proposition 3.1. Details can be found in Kohatsu-Higa et al. [8]. □


4. Proof of Proposition 3.1

4.1. Ideas for the proof of Proposition 3.1 (i) and (iii)

In order to prove the upper bounds in the statements of Proposition 3.1 (i) and (iii), one uses Pinsker's inequality,

\[ \frac{1}{2} \left( \int \left| p_\theta(y,z) - p_{\theta_0}(y,z) \right| dz \right)^2 \le \int \ln\!\left( \frac{p_{\theta_0}(y,z)}{p_\theta(y,z)} \right) p_{\theta_0}(y,z)\,dz, \]

and the identifiability condition. Therefore under Assumption 2.3 (4) we obtain the upper bound in (i), and under Assumption 2.3 (4) and (5) we obtain the upper bound in (iii).

For the lower bounds, we give a useful lemma on the first derivative of \(H(p_\theta, p_{\theta_0})\) in θ.

Lemma 4.1. Let q be a transition density which depends on a parameter θ. We assume that for all θ ∈ Θ,

\[ \int \frac{\partial}{\partial\theta} \left( \ln q(y,z;\theta) \right) q(y,z;\theta_0)\,\mu_{\theta_0}(y)\,dy\,dz = \frac{\partial}{\partial\theta} \int \ln q(y,z;\theta)\, q(y,z;\theta_0)\,\mu_{\theta_0}(y)\,dy\,dz. \]

Then

\[ \left. \frac{\partial}{\partial\theta} \int \left( \ln q(y,z;\theta) \right) q(y,z;\theta_0)\,\mu_{\theta_0}(y)\,dy\,dz \right|_{\theta=\theta_0} = 0. \]
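Both ingredients, Pinsker's inequality and the vanishing score of Lemma 4.1, can be verified numerically for a toy Gaussian family \(q_\theta = N(\theta, 1)\) (one state variable, with the factor \(\mu_{\theta_0}\) dropped); the sketch below is our illustration under those assumptions.

```python
import numpy as np

z = np.linspace(-12.0, 12.0, 200001)
dz = z[1] - z[0]
norm = lambda th: np.exp(-0.5 * (z - th) ** 2) / np.sqrt(2.0 * np.pi)

theta0, theta = 0.0, 0.5
p, p0 = norm(theta), norm(theta0)

# Pinsker: (1/2)(∫|p_θ - p_θ0| dz)² ≤ ∫ ln(p_θ0/p_θ) p_θ0 dz  (= θ²/2 here)
l1 = np.sum(np.abs(p - p0)) * dz
kl = np.sum(np.log(p0 / p) * p0) * dz
print(0.5 * l1**2 <= kl)  # True

# Lemma 4.1: θ ↦ ∫ (ln q_θ) q_θ0 dz has derivative 0 at θ = θ0
H = lambda th: np.sum(np.log(norm(th)) * p0) * dz
eps = 1e-5
score = (H(theta0 + eps) - H(theta0 - eps)) / (2.0 * eps)
print(abs(score) < 1e-6)  # True
```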

Using the above lemma together with Taylor's expansion, we obtain the lower bound in (i) under Assumption 2.3 (3), (4) and (5), and the lower bound in (iii) under Assumption 2.3 (4) and (5).

4.2. Ideas for the proof of Proposition 3.1 (ii)

In this section, we consider Proposition 3.1 (ii). We prove this boundedness by using a central limit theorem in \(C(\Theta; \mathbb R^\infty)\). Note that \(C(\Theta; \mathbb R^\infty)\) is a complete and separable metric space with the metric η defined as follows: for \(x = (x_1, x_2, \dots), y = (y_1, y_2, \dots) \in C(\Theta; \mathbb R^\infty)\),

\[ \eta(x,y) := \sup_{\theta\in\Theta} \sum_{i=1}^{\infty} \frac{1}{2^i} \left\{ |x_i(\theta) - y_i(\theta)| \wedge 1 \right\}. \]

For N ∈ N, set

\[ \tilde\beta_N(\theta) := \begin{cases} \dfrac{\beta_N(\theta)}{\theta-\theta_0} = \dfrac{Z_N(\theta) - Z_N(\theta_0)}{\theta-\theta_0} & \text{if } \theta \ne \theta_0, \\[2mm] \dfrac{\partial}{\partial\theta} Z_N(\theta_0) & \text{if } \theta = \theta_0, \end{cases} \]

and set \(\gamma_N := (\tilde\beta_N, \tilde\beta_{N-1}, \dots, \tilde\beta_1, 0, \dots) \in C(\Theta; \mathbb R^\infty)\). The reason why we need to use \(\mathbb R^\infty\) in the above setting can be seen from the following lemmas. In fact, in order to prove the boundedness of \(\tilde\beta_N(\theta)\) in N, we need to consider another


random vector which has the same "joint" distribution as \(\gamma_N\) when we apply the Skorohod representation theorem.

The idea of the proof consists of proving that the sequence \(\gamma_N\) converges weakly in \(C(\Theta; \mathbb R^\infty)\). Therefore the limit \(\gamma = (\gamma^1, \gamma^2, \dots)\) should satisfy that there exist some random variables \(d_1, d_2\) such that

\[ d_1 \le \inf_{\theta\in\Theta} \gamma^1(\theta) \le \sup_{\theta\in\Theta} \gamma^1(\theta) \le d_2 \quad \text{a.s.} \]

Now, without loss of generality, we can use the Skorohod representation theorem, so that the first component \(\gamma^1_N = \tilde\beta_N\) of \(\gamma_N\) will satisfy a similar property. Finally, from the convergence, we obtain the boundedness of \(\tilde\beta_N\) in N. In particular, we use the following lemmas.

Lemma 4.2. Let \(X = (X_1, X_2, \dots), Y = (Y_1, Y_2, \dots) \in C(\Theta; \mathbb R^\infty)\) be random variables such that \(X \stackrel{d}{=} Y\). Then we have

\[ \sup_n \sup_{\theta\in\Theta} |X_n(\theta)| \stackrel{d}{=} \sup_n \sup_{\theta\in\Theta} |Y_n(\theta)|. \]

Lemma 4.3. Let \((S, \|\cdot\|)\) be a complete, separable normed metric space. Let X be an S-valued random variable on a probability space \((\Omega, \mathcal F, P)\) and let Y be an S-valued random variable on a probability space \((\Omega', \mathcal F', P')\). Suppose that \(X \stackrel{d}{=} Y\) and that there exists an \(\mathbb R_+\)-valued random variable d on \((\Omega', \mathcal F', P')\) such that

\[ \|Y(\omega')\| \le d(\omega') \quad \text{for all } \omega' \in \Omega'. \]

Then there exists a positive random variable M on \((\Omega, \mathcal F, P)\) such that

\[ \|X(\omega)\| \le M(\omega) \quad \text{a.s. } \omega \in \Omega. \]

In order to prove the weak convergence of \(\gamma_N\), we extend Theorem 7.1 and Theorem 7.3 in Billingsley [3], which imply that it is enough to prove convergence of the marginals and tightness of the sequence \(\gamma_N\). In order to prove convergence of the marginals, we use Assumption 2.3 (1) and (5), so that for every r ∈ N and \(\theta_1, \dots, \theta_r \in \Theta\), \((\gamma_N(\theta_1), \dots, \gamma_N(\theta_r))\) converges weakly. The proof uses the Cramér-Wold device and the following extension of the central limit theorem for α-mixing processes (the proof is an extension of Theorem 27.5 in Billingsley [2], p. 316).

Proposition 4.4. Suppose that \(X_1, X_2, \dots\) is stationary and α-mixing with \(\alpha_n = O(n^{-5})\), where we set \(\alpha_0 = 1\), and f is a \(\mathcal B(\mathbb R^2)\)-measurable function which satisfies \(E[f(X_0, X_1)] = 0\) and \(E[f(X_0, X_1)^{12}] < +\infty\). If we set \(f_i := f(X_i, X_{i+1})\) and \(S_n := f_1 + \cdots + f_n\), then

\[ \frac{1}{n} \operatorname{Var}(S_n) \longrightarrow \sigma^2 := E\!\left[ f_1^2 \right] + 2 \sum_{k=1}^{\infty} E\left[ f_1 f_{1+k} \right], \]

where the series converges absolutely. Moreover \(S_n/\sqrt n \Rightarrow M\), where M is a normally distributed random variable with mean 0 and variance σ². If σ = 0, we define M = 0.
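Proposition 4.4 can be illustrated by simulation (our sketch, not from the paper): for a stationary AR(1) chain \(X_{i+1} = \rho X_i + \varepsilon_i\), which is geometrically α-mixing, and \(f(x,y) = x\), the limiting variance is \(\sigma^2 = \sigma_\varepsilon^2/(1-\rho)^2\), and the empirical \(\operatorname{Var}(S_n)/n\) approaches it. The model and all numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.5, 2000, 400

vals = np.empty(reps)
for r in range(reps):
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1.0 - rho**2)  # stationary start
    for i in range(n - 1):
        x[i + 1] = rho * x[i] + rng.standard_normal()
    vals[r] = x.sum() / np.sqrt(n)          # S_n/√n with f(x, y) = x

# σ² = E[f₁²] + 2 Σ_k E[f₁ f_{1+k}] = γ(0) + 2 Σ_{k≥1} γ(k) = 1/(1-ρ)² = 4
print(vals.var())  # empirical Var(S_n)/n, close to 4
```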


To prove the tightness of \(\gamma_N\), we use Assumption 2.3 (1), (3) and (5). One needs to prove that, for all ε > 0,

\[ \lim_{\delta\to 0} \limsup_{N\to\infty} P_{\theta_0}\!\left( \sum_{i=1}^{N} \frac{1}{2^i} \left\{ \sup_{|\theta-\theta'|\le\delta} \left| \tilde\beta_{N-i+1}(\theta) - \tilde\beta_{N-i+1}(\theta') \right| \wedge 1 \right\} \ge \varepsilon \right) = 0. \]

In the proof, the basic ingredient is the Garsia-Rodemich-Rumsey lemma.

4.3. Decomposition for the estimate of Proposition 3.1 (iv)

Set

\[ J^1_N(\theta) := \begin{cases} \dfrac{1}{\sqrt N}\, \dfrac{1}{\theta-\theta_0} \displaystyle\sum_{i=0}^{N-1} \left( \ln \frac{\hat p^N_\theta}{\bar p^N_\theta}(Y_i, Y_{i+1}) - \ln \frac{\hat p^N_{\theta_0}}{\bar p^N_{\theta_0}}(Y_i, Y_{i+1}) \right), & \theta \ne \theta_0, \\[3mm] \dfrac{1}{\sqrt N} \displaystyle\sum_{i=0}^{N-1} \left. \frac{\partial}{\partial\theta} \ln \frac{\hat p^N_\theta}{\bar p^N_\theta}(Y_i, Y_{i+1}) \right|_{\theta=\theta_0}, & \theta = \theta_0, \end{cases} \]

\[ J^2_N(\theta) := \begin{cases} \sqrt N \displaystyle\int \frac{\ln \bar p^N_\theta - \ln \bar p^N_{\theta_0}}{\theta-\theta_0}(y,z) \left( \bar p^N_{\theta_0} - p_{\theta_0} \right)(y,z)\,\mu_{\theta_0}(y)\,dy\,dz, & \theta \ne \theta_0, \\[3mm] \sqrt N \displaystyle\int \left. \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(y,z) \right|_{\theta=\theta_0} \left( \bar p^N_{\theta_0} - p_{\theta_0} \right)(y,z)\,\mu_{\theta_0}(y)\,dy\,dz, & \theta = \theta_0, \end{cases} \]

\[ J^3_N(\theta) := \begin{cases} \dfrac{1}{\sqrt N} \displaystyle\sum_{i=0}^{N-1} \left( \frac{\ln \bar p^N_\theta - \ln \bar p^N_{\theta_0}}{\theta-\theta_0}(Y_i, Y_{i+1}) - \int \frac{\ln \bar p^N_\theta - \ln \bar p^N_{\theta_0}}{\theta-\theta_0}(y,z)\, p_{\theta_0}(y,z)\,\mu_{\theta_0}(y)\,dy\,dz \right), & \theta \ne \theta_0, \\[3mm] \dfrac{1}{\sqrt N} \displaystyle\sum_{i=0}^{N-1} \left( \left. \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(Y_i, Y_{i+1}) \right|_{\theta=\theta_0} - \int \left. \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(y,z) \right|_{\theta=\theta_0} p_{\theta_0}(y,z)\,\mu_{\theta_0}(y)\,dy\,dz \right), & \theta = \theta_0. \end{cases} \]

Then we have the decomposition

\[ \frac{\bar\beta_N(\theta)}{\theta-\theta_0} = J^1_N(\theta) - J^2_N(\theta) + J^3_N(\theta). \]

We can easily prove the boundedness of the first term \(J^1_N(\theta)\) from Assumption 2.3 (6)-(a), and the boundedness of \(J^2_N(\theta)\) from Assumption 2.3 (6)-(b). Finally we consider the third term \(J^3_N(\theta)\); to prove its boundedness, we prove a weak convergence result by an argument similar to that of Section 4.2. This proves Proposition 3.1 (iv).


5. Parameter tuning and Assumption 2.3 (6)-(a)

This section is devoted to proving that Assumption 2.3 (6)-(a) is satisfied under sufficient smoothness hypotheses on the random variables and processes that appear in the problem, as well as the correct parameter tuning. That is, we need to prove that the following condition (Assumption 2.3 (6)-(a) in Section 2.2) is satisfied:

\[ \sup_N \sup_{\theta\in\Theta} \left| \frac{1}{\sqrt N} \sum_{i=0}^{N-1} \left( \frac{\partial}{\partial\theta} \ln \hat p^N_\theta(Y_i, Y_{i+1}) - \frac{\partial}{\partial\theta} \ln \bar p^N_\theta(Y_i, Y_{i+1}) \right) \right| < +\infty \quad \text{a.s.} \tag{5.1} \]

In order to understand the role of all the approximation parameters, we rewrite \(\hat p^N_\theta\) and \(\bar p^N_\theta\) as follows:

\[ \hat p^N_\theta(y,z) := \frac{1}{nh} \sum_{k=1}^{n} K\!\left( \frac{X^{y,(k)}_{(m)}(\theta) - z}{h} \right), \qquad \bar p^N_\theta(y,z) := \frac{1}{h}\, \hat E\!\left[ K\!\left( \frac{X^{y,(1)}_{(m)}(\theta,\cdot) - z}{h} \right) \right]. \]

Here m ≡ m(N), n ≡ n(N) and h ≡ h(N) are parameters that depend on N: n is the number of Monte Carlo simulations used in order to estimate the density, m is the approximation parameter (in the Euler scheme this is the number of time steps used in the simulation of \(X^{y,(1)}_{(m)}(\theta)\)), and h is the bandwidth size associated with the kernel density estimation method. In this sense we will always think of the hypotheses in terms of N, although we will drop it from the notation and just use m, n and h. The goal of this section is to prove that, under certain hypotheses, there is a choice of m, n and h that ensures that condition (5.1) is satisfied.

As the main problem is to obtain upper and lower bounds for \(\hat p^N_\theta\) at random arguments, we first restrict the values of the random variables \(Y_i\), i = 0, ..., N−1, to a compact set. This is obtained using an exponential-type Chebyshev inequality and the Borel-Cantelli lemma (Theorem 4.3 in p. 53 of Billingsley [2]), as follows.

Lemma 5.1. Assume the following hypothesis:

(H0) \(m_{c_1} := \sup_i E\!\left[ e^{c_1 |Y_i|^2} \right] < \infty\) for some constant \(c_1 > 0\).

Furthermore, let \(a_N \ge \theta_u - \theta_l\) be a sequence of strictly positive numbers such that

\[ \sum_{N=1}^{\infty} N \exp\!\left( -c_1 a_N^2 \right) < \infty. \]

Then for a.s. ω ∈ Ω there exists N big enough such that \(\max_{i=1,\dots,N} |Y_i| < a_N\). That is, for

\[ A_N := \{ \omega \in \Omega;\ \exists\, i = 1, \dots, N \ \text{s.t.}\ |Y_i| > a_N \}, \]

we have

\[ P\!\left( \limsup_{N\to\infty} A_N \right) = 0. \]
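As an illustration of a valid choice (our assumption, not prescribed by the paper), taking \(a_N = \big( (3/c_1)\log(N+1) \big)^{1/2}\), which exceeds \(\theta_u - \theta_l\) for N large enough, gives \(N \exp(-c_1 a_N^2) = N/(N+1)^3 \lesssim N^{-2}\), so the series of Lemma 5.1 converges. A quick numerical check with an illustrative constant:

```python
import math

c1 = 1.0
a = lambda N: math.sqrt((3.0 / c1) * math.log(N + 1))  # candidate a_N
# partial sum of Σ_N N·exp(-c1·a_N²) = Σ_N N/(N+1)³, a convergent series
s = sum(N * math.exp(-c1 * a(N) ** 2) for N in range(1, 200_001))
print(s)  # ≈ 0.44, so the Borel-Cantelli condition of Lemma 5.1 holds
```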


The decomposition that we will use in order to prove (5.1) is the same as in the proof of Theorem 3.2 in p. 73 of Bosq [4]. That is,

\[ \sup_{\theta\in\Theta} \sup_{|x|,|y|\le a_N} \left| \frac{\partial_\theta \hat p^N_\theta}{\hat p^N_\theta}(x,y) - \frac{\partial_\theta \bar p^N_\theta}{\bar p^N_\theta}(x,y) \right| \le \frac{ \sup_{\theta\in\Theta} \sup_{|x|,|y|\le a_N} \left| \partial_\theta \hat p^N_\theta(x,y) - \partial_\theta \bar p^N_\theta(x,y) \right| }{ \inf_{\theta\in\Theta} \inf_{|x|,|y|\le a_N} \bar p^N_\theta(x,y) } \]
\[ \qquad + \sup_{\theta\in\Theta} \sup_{|x|,|y|\le a_N} \left| \frac{\partial_\theta \hat p^N_\theta}{\hat p^N_\theta}(x,y) \right| \cdot \frac{ \sup_{\theta\in\Theta} \sup_{|x|,|y|\le a_N} \left| \hat p^N_\theta(x,y) - \bar p^N_\theta(x,y) \right| }{ \inf_{\theta\in\Theta} \inf_{|x|,|y|\le a_N} \bar p^N_\theta(x,y) } =: \frac{A}{B} + C\,\frac{D}{B}, \tag{5.2} \]

where we remark that

\[ \partial_\theta \hat p^N_\theta(x,y) = \frac{1}{nh^2} \sum_{k=1}^{n} K'\!\left( \frac{X^{x,(k)}_{(m)}(\theta) - y}{h} \right) \partial_\theta X^{x,(k)}_{(m)}(\theta), \]
\[ \partial_\theta \bar p^N_\theta(x,y) = \frac{1}{h^2}\, \hat E\!\left[ K'\!\left( \frac{X^{x,(1)}_{(m)}(\theta) - y}{h} \right) \partial_\theta X^{x,(1)}_{(m)}(\theta) \right]. \]

Therefore, in order to prove the finiteness of (5.1), we need to bound \(\sqrt N \left( \frac{A}{B} + C \frac{D}{B} \right)\). This will be done in a series of lemmas using Borel-Cantelli arguments together with the modulus of continuity of the quantities \(\bar p^N_\theta\) and \(\hat p^N_\theta\). First, we start by analyzing the difficult part \(C\frac{D}{B}\).

5.1. Upper bound for C D/B in (5.2)

We work in this section under the following hypotheses:

(H1) Assume that there exist some positive constants \(\varphi_1, \varphi_2\), where \(\varphi_1\) is independent of N and \(\varphi_2\) is independent of N and ∆, such that the following holds:

\[ \inf_{(x,\theta)\in B^N} \bar p^N_\theta(x) \ge \varphi_1 \exp\!\left( -\frac{\varphi_2\, a_N^2}{\Delta} \right), \]

where we set

\[ B^N := \left\{ (x,\theta) \in \mathbb R^2 \times \Theta;\ \|x\| < a_N \right\}, \]

and ‖·‖ is the max-norm.


(H2) Assume that the kernel K is the Gaussian kernel,

\[ K(z) := \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{1}{2} z^2 \right). \]

(H3) Assume that for some constant \(r_3 > 0\) and a sequence \(\{b_{3,N};\ N \in \mathbb N\} \subset [1,\infty)\) we have

\[ \sum_{N=1}^{\infty} \frac{n\, a_N^{2 r_3}}{(h^2 b_{3,N})^{r_3}}\, E\!\left[ |Z_{3,N}(\cdot)|^{r_3} \right] < \infty, \]

where

\[ Z^{(k)}_{3,N}(\omega) := a_N^{-2} \left( \sup_{(x,\theta)\in B^N} \left| X^{x,(k)}_{(m)}(\theta,\omega) \right| + 1 \right) \sup_{(x,\theta)\in B^N} \left| \partial_\theta X^{x,(k)}_{(m)}(\theta,\omega) \right|. \]

(H4) Assume that for some constant \(r_4 > 0\) and a sequence \(\{b_{4,N};\ N \in \mathbb N\} \subset [1,\infty)\) we have

\[ \sum_{N=1}^{\infty} \frac{n\, E\!\left[ Z^{(k)}_{4,N}(\cdot)^{r_4} \right]}{(b_{4,N})^{r_4}} < \infty, \]

where

\[ Z^{(k)}_{4,N}(\omega) := a_N^{-1} \left( \sup_{(x,\theta)\in B^N} \left| \partial_x X^{x,(k)}_{(m)}(\theta;\omega) \right| + \sup_{(x,\theta)\in B^N} \left| \partial_\theta X^{x,(k)}_{(m)}(\theta;\omega) \right| \right). \]

(H5) Assume that there exists some positive constant \(C_5 > 0\) such that for all x, y ∈ R, m ∈ N and θ ∈ Θ,

\[ \left| \partial_x \bar p^N_\theta(x,y) \right|,\ \left| \partial_y \bar p^N_\theta(x,y) \right|,\ \left| \partial_\theta \bar p^N_\theta(x,y) \right| \le C_5 < +\infty. \]

(H6) Assume that \(\eta_N\) and \(\nu_N\) are sequences of positive numbers such that

\[ \sum_{N=1}^{\infty} \nu_N^3 \exp\!\left( -\frac{(\eta_N)^2\, n h^2}{16\, \|K\|_\infty} \right) < \infty, \]

where ‖·‖_∞ denotes the sup-norm.

Note that assumption (H1) gives a lower bound for B in (5.2).

Lemma 5.2. Assume hypotheses (H2) and (H3). Then we have

\[ P\!\left( \limsup_{N\to\infty} \left\{ \sup_{\theta\in\Theta} \sup_{|x|,|y|\le a_N} \left| \frac{\partial_\theta \hat p^N_\theta}{\hat p^N_\theta}(x,y) \right| > b_{3,N} \right\} \right) = 0. \]

The proof of the lemma follows by rewriting the fraction \(\frac{\partial_\theta \hat p^N_\theta}{\hat p^N_\theta}\) and showing that it can be bounded using the upper bound of \(\frac{K'}{K}\) and the derivative of the approximating process with respect to θ.

and the derivative of the
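For the Gaussian kernel of (H2) this ratio is explicit, which is what makes the bound tractable; the following one-line computation is our own remark, not from the source:

```latex
% For the Gaussian kernel K(z) = (2*pi)^{-1/2} e^{-z^2/2},
% differentiation gives K'(z) = -z K(z), hence
\[
\frac{K'(z)}{K(z)} = -z,
\qquad\text{so}\qquad
\left|\frac{K'(z)}{K(z)}\right| \le |z|.
\]
% On the region |z| <= a_N relevant here, the ratio grows only
% linearly in a_N, which is absorbed by the normalization a_N^{-2}
% appearing in the definition of Z_{3,N}.
```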

5.2. Upper bound of $D$ in (5.2)

In this section, we use the modulus of continuity of $\hat p^N_\theta$ and $\bar p^N_\theta$ in order to find an upper bound for $D$. Similar ideas appear in Theorem 2.2 on p. 49 of Bosq [4].


A. Kohatsu-Higa, N. Vayatis and K. Yasuda

Lemma 5.3. Set
\[
B^N_{l_1 l_2}:=\left\{(x,\theta)\in\mathbb{R}^2\times\Theta;\ \left\|x-x^N_{l_1}\right\|\le\frac{a_N}{\nu_N},\ \left|\theta-\theta^N_{l_2}\right|\le\frac{\theta^u-\theta^l}{\nu_N}\right\},
\qquad l_1=1,\dots,\nu_N^2,\ l_2=1,\dots,\nu_N,
\]
with $\mathring{B}^N_{l_1 l_2}\cap\mathring{B}^N_{l'_1 l'_2}=\emptyset$ for $(l_1,l_2)\neq(l'_1,l'_2)$, and an appropriate set of points $x^N_{l_1}$, $\theta^N_{l_2}$, $l_1=1,\dots,\nu_N^2$, $l_2=1,\dots,\nu_N$, such that $\bigcup_{l_1=1}^{\nu_N^2}\bigcup_{l_2=1}^{\nu_N}B^N_{l_1 l_2}=B^N$. Then
\[
\sup_{(x,\theta)\in B^N}\left|\hat p^N_\theta(x)-\bar p^N_\theta(x)\right|
=\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\hat p^N_\theta(x)-\bar p^N_\theta(x)\right|
\le \max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\hat p^N_\theta(x)-\hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|
\]
\[
+\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\left|\hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|
+\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\bar p^N_\theta(x)\right|. \tag{5.3}
\]

The proof of the above lemma is straightforward. We consider the first term of (5.3).

Lemma 5.4. Under (H2) and (H4) we have
\[
P\left(\limsup_{N\to\infty}\left\{\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\hat p^N_\theta(x)-\hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|>\frac{2\|K'\|_\infty a_N^2}{h^2\nu_N}\,b_{4,N}\right\}\right)=0.
\]

Now we consider the third term in (5.3).

Lemma 5.5. Assume (H5); then
\[
\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\bar p^N_\theta(x)\right|\ \le\ 3C_5\,\frac{a_N}{\nu_N}.
\]

Finally, we consider the second term of (5.3).

Lemma 5.6. Assume (H2) and that $\eta_N$ satisfies (H6); then we have
\[
P\left(\limsup_{N\to\infty}\left\{\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\left|\hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|>\eta_N\right\}\right)=0.
\]

Now we can conclude this section with the following upper bound for $C\frac{D}{B}$.

Theorem 5.7. Assume conditions (H2), (H4), (H5) and (H6); then for $N$ big enough,
\[
C\,\frac{D}{B}\ \le\ \frac{1}{\varphi_1}\,b_{3,N}\,\exp\left(\frac{\varphi_2 a_N^2}{\Delta}\right)\times\left(\frac{2\|K'\|_\infty a_N^2}{h^2\nu_N}\,b_{4,N}+\eta_N+3C_5\,\frac{a_N}{\nu_N}\right).
\]

5.3. Upper bound for $\frac{A}{B}$ in (5.2)

The proof in this case is simpler, on the one hand, because many of the previous estimates can be reused. On the other hand, the analogue of Lemma 5.6 for the derivatives $\partial_\theta \hat p^N_\theta$ and $\partial_\theta \bar p^N_\theta$ has to be reworked, since the conditions $\|\hat p^N_\theta\|_\infty<+\infty$ and $\|\bar p^N_\theta\|_\infty<+\infty$ are no longer valid. From condition (H1) we have
\[
\frac{A}{B}\ \le\ \frac{1}{\varphi_1}\,e^{\varphi_2 a_N^2/\Delta}\times\sup_{\theta\in\Theta}\sup_{|x|,|y|\le a_N}\left|\partial_\theta \hat p^N_\theta(x,y)-\partial_\theta \bar p^N_\theta(x,y)\right|.
\]

Here we consider the sup-term as before.

Lemma 5.8. We use the same notations as in the previous section. Then we have
\[
\sup_{\theta\in\Theta}\sup_{|x|,|y|\le a_N}\left|\partial_\theta \hat p^N_\theta(x)-\partial_\theta \bar p^N_\theta(x)\right|
\le \max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\partial_\theta \hat p^N_\theta(x)-\partial_\theta \hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|
\]
\[
+\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\left|\partial_\theta \hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\partial_\theta \bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|
+\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\partial_\theta \bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\partial_\theta \bar p^N_\theta(x)\right|. \tag{5.4}
\]

As in previous sections, we define
\[
\dot Z^{(k)}_{4,N}:=a_N^{-1}\left[\,h\sup_{\theta\in\Theta}\sup_{|x|\le a_N}\left|\partial_x\partial_\theta X^{x,(k)}_{(m)}(\theta)\right|
+h\sup_{\theta\in\Theta}\sup_{|x|\le a_N}\left|\partial_\theta\partial_\theta X^{x,(k)}_{(m)}(\theta)\right|
+\left(Z^{(k)}_{4,N}+1\right)\sup_{\theta\in\Theta}\sup_{|x|\le a_N}\left|\partial_\theta X^{x,(k)}_{(m)}(\theta)\right|\right].
\]
Note that $\{\dot Z^{(k)}_{4,N}\}_{k=1,2,\dots}$ is a sequence of independent and identically distributed random variables. Then we set the following hypothesis.

(H4′) Assume that for some constant $\dot r_4>0$ and a sequence $\{\dot b_{4,N};N\in\mathbb{N}\}\subset[1,\infty)$ we have
\[
\sum_{N=1}^{\infty}\frac{n\,E\big[|\dot Z^{(k)}_{4,N}|^{\dot r_4}\big]}{(\dot b_{4,N})^{\dot r_4}}<\infty.
\]

Lemma 5.9. Under the above hypotheses and (H4′), we have
\[
P\left(\limsup_{N\to\infty}\left\{\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\partial_\theta \hat p^N_\theta(x)-\partial_\theta \hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|\ \ge\ \frac{(\|K'\|_\infty\vee\|K''\|_\infty)\,a_N^2}{h^3\nu_N}\,\dot b_{4,N}\right\}\right)=0.
\]


We consider the third term of (5.4). By the mean value theorem,
\[
\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\partial_\theta \bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\partial_\theta \bar p^N_\theta(x)\right|
\le \frac{a_N}{\nu_N}\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\Bigg[\sup_{0\le\varepsilon\le1}\left|\partial_x\partial_\theta \bar p^N_\theta\big(\varepsilon x+(1-\varepsilon)x^N_{l_1},y\big)\right|
\]
\[
+\sup_{0\le\varepsilon\le1}\left|\partial_y\partial_\theta \bar p^N_\theta\big(x^N_{l_1},\varepsilon y+(1-\varepsilon)y^N_{l_1}\big)\right|
+\sup_{0\le\varepsilon\le1}\left|\partial_\theta\partial_\theta \bar p^N_{\varepsilon\theta+(1-\varepsilon)\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|\Bigg].
\]

(H5′) Assume that there exists some positive constant $\dot C_5>0$ such that for all $x,y\in\mathbb{R}$, $m\in\mathbb{N}$ and $\theta\in\Theta$,
\[
\left|\partial_x\partial_\theta \bar p^N_\theta(x,y)\right|,\ \left|\partial_y\partial_\theta \bar p^N_\theta(x,y)\right|,\ \left|\partial_\theta^2 \bar p^N_\theta(x,y)\right|\ \le\ \dot C_5<+\infty.
\]

Lemma 5.10. Assume (H5′). Then we have
\[
\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\ \sup_{(x,\theta)\in B^N_{l_1 l_2}}\left|\partial_\theta \bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\partial_\theta \bar p^N_\theta(x)\right|\ \le\ 3\dot C_5\,\frac{a_N\vee(\theta^u-\theta^l)}{\nu_N}.
\]

Finally, we consider the second term of (5.4). Set
\[
\dot W^{(j),x}_{m,h}(\theta):=\frac{1}{h^2}K'\!\left(\frac{X^{x,(j)}_{(m)}(\theta)-y}{h}\right)\partial_\theta X^{x,(j)}_{(m)}(\theta)
-\frac{1}{h^2}\,E\!\left[K'\!\left(\frac{X^{x,(1)}_{(m)}(\theta;\cdot)-y}{h}\right)\partial_\theta X^{x,(1)}_{(m)}(\theta)\right].
\]
Note that $\{\dot W^{(j),x}_{m,h}(\theta)\}_{j=1,2,\dots}$ is a sequence of independent and identically distributed random variables with $E\big[\dot W^{(j),x}_{m,h}(\theta)\big]=0$.

(H6′) There exist $\dot C_6>0$, $\dot\alpha_6>0$ and a sequence of positive numbers $\dot b_{6,N}$ such that
\[
\sum_{N=1}^{\infty}\nu_N^3\exp\left(-\frac{n(\eta_N)^2h^4}{2\|K'\|_\infty^2(\dot b_{6,N})^2a_N^2}\right)\left\{1+\frac{\dot C_6}{n^{1+\dot\alpha_6}}\right\}^n<\infty.
\]

(H6a′) Assume that there exist $\dot r_6>0$ and a sequence of positive numbers $\dot b_{6,N}$ in (H6′) satisfying $\sum_{N=1}^{\infty}\frac{n\,E\big[|\dot Z_{6,N}|^{\dot r_6}\big]}{(\dot b_{6,N})^{\dot r_6}}<\infty$, where
\[
\dot Z^{(j)}_{6,N}:=a_N^{-1}\left(\sup_{\theta\in\Theta}\sup_{|x|\le a_N}\left|\partial_\theta X^{x,(j)}_{(m)}(\theta)\right|+E\left[\sup_{\theta\in\Theta}\sup_{|x|\le a_N}\left|\partial_\theta X^{x,(1)}_{(m)}(\theta)\right|\right]\right).
\]

(H6b′) Assume that for some $\dot q_6>1$, $\sup_{N\in\mathbb{N}}E\big[|\dot Z_{6,N}|^{\dot q_6}\big]<+\infty$ and that for $\dot\alpha_6>0$, $\dot C_6>0$ and $\dot b_{6,N}$ given in (H6′) the following is satisfied:
\[
\left(\frac{\eta_N h^2}{\|K'\|_\infty\,d_{a_N,m}\,a_N}\right)^{\dot q_6}\exp\left(-\frac{(\eta_N)^2}{2\big(\frac{\|K'\|_\infty}{h^2}\dot b_{6,N}a_N\big)^2}\right)\ \le\ \frac{\dot C_6}{n^{1+\dot\alpha_6}}.
\]

Applying Lemma 6.2 in the Appendix, we obtain the following important lemma.

Lemma 5.11. Assume (H6), (H6a′) and (H6b′). Then there exists $N$ big enough such that
\[
\max_{\substack{1\le l_1\le\nu_N^2\\ 1\le l_2\le\nu_N}}\left|\partial_\theta \hat p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)-\partial_\theta \bar p^N_{\theta^N_{l_2}}\!\big(x^N_{l_1}\big)\right|\ \le\ \eta_N.
\]

Theorem 5.12. Assume conditions (H1), (H2), (H4′), (H5′), (H6′), (H6a′) and (H6b′). Then for a.s. $\omega$ there exists $N_0\equiv N_0(\omega)$ such that for all $N\ge N_0$ we have
\[
\frac{A}{B}\ \le\ \frac{1}{\varphi_1}\,e^{\varphi_2 a_N^2/\Delta}\times\left(\frac{(\|K'\|_\infty\vee\|K''\|_\infty)\,a_N^2}{h^3\nu_N}\,\dot b_{4,N}+\eta_N+3\dot C_5\,\frac{a_N}{\nu_N}\right).
\]

Finally, putting all our results together, we have (see Theorem 5.7):

Theorem 5.13. Assume conditions (H0), (H1), (H2), (H3), (H4), (H5), (H6), (H4′), (H5′), (H6′), (H6a′) and (H6b′). Then for a.s. $\omega$ there exists $N_0\equiv N_0(\omega)$ such that for all $N\ge N_0$ we have
\[
\sqrt{N}\left(\frac{A}{B}+C\,\frac{D}{B}\right)\ \le\ 6\sqrt{N}\,\frac{1}{\varphi_1}\,e^{\varphi_2 a_N^2/\Delta}\times\left[\frac{(\|K'\|_\infty\vee\|K''\|_\infty)\,a_N^2}{h^2\nu_N}\left(\frac{\dot b_{4,N}}{h}+b_{4,N}b_{3,N}\right)+\eta_N+\big(C_5\vee\dot C_5\big)\,b_{3,N}\,\frac{a_N}{\nu_N}\right].
\]

5.4. Conclusion: Tuning for n and h

We now need to find a sequence of values for $n$ and $h$ such that all the hypotheses in the previous theorem are satisfied and the upper bound is uniformly bounded in $N$. We rewrite some needed conditions that are related to the parameters $n$ and $h$. We assume stronger hypotheses that may help us understand better the existence of the right choice of parameters $n$ and $h$. As we are only interested in the relationship of $n$ and $h$ with $N$, we will denote by $C_1$, $C_2$, etc., various constants that may change from one equation to the next. These constants depend on $K$, $\Delta$ and $\Theta$; they are independent of $n$, $h$ and $N$, but they depend continuously on other parameters.

(i) There exists some positive constant $C_{K,\Delta,\Theta}\ge0$, which depends on $K$, $\Delta$, $\Theta$ and is independent of $N$, such that
\[
\sqrt{N}\,e^{\varphi_2 a_N^2/\Delta}\times\left[\frac{a_N^2}{h^2\nu_N}\left(b_{4,N}b_{3,N}+\frac{\dot b_{4,N}}{h}\right)+\eta_N+b_{3,N}\,\frac{a_N}{\nu_N}\right]\ \le\ C_{K,\Delta,\Theta}. \tag{5.5}
\]


(ii) (Borel–Cantelli for $Y_i$, (H0)) $m_{c_1}:=E\big[e^{c_1|Y_1|^2}\big]<+\infty$ for some constant $c_1>0$, and $\{a_N\}_{N\in\mathbb{N}}\subset[\theta^u-\theta^l,\infty)$ is a sequence such that, for the same $c_1$,
\[
\sum_{N=1}^{\infty}\frac{N}{\exp(c_1 a_N^2)}<+\infty.
\]

(iii) (Borel–Cantelli for $Z^{(k)}_{3,N}(\omega)$, (H3)) For some $r_3>0$,
\[
\sum_{N=1}^{\infty}\frac{n\,a_N^{2r_3}}{(h^2 b_{3,N})^{r_3}}<+\infty \quad\text{and}\quad \sup_{N\in\mathbb{N}}E\big[|Z_{3,N}(\cdot)|^{r_3}\big]<+\infty \text{ for each fixed } m\in\mathbb{N}.
\]

(iv) (Borel–Cantelli for $Z^{(k)}_{4,N}(\omega)$, (H4)) For some $r_4>0$,
\[
\sum_{N=1}^{\infty}\frac{n}{(b_{4,N})^{r_4}}<+\infty \quad\text{and}\quad \sup_{N\in\mathbb{N}}E\big[|Z_{4,N}(\cdot)|^{r_4}\big]<+\infty \text{ for each fixed } m\in\mathbb{N}.
\]

(v) (Borel–Cantelli for $|\hat p^N_\theta(x)-\bar p^N_\theta(x)|$, (H6))
\[
\sum_{N=1}^{\infty}\nu_N^3\exp\left(-\frac{(\eta_N)^2 n h^2}{16\|K\|_\infty^2}\right)<+\infty.
\]

(vi) (Borel–Cantelli for $\dot Z^{(k)}_{4,N}(\omega)$, (H4′)) For some $\dot r_4>0$,
\[
\sum_{N=1}^{\infty}\frac{n}{(\dot b_{4,N})^{\dot r_4}}<+\infty \quad\text{and}\quad \sup_{N\in\mathbb{N}}E\big[|\dot Z_{4,N}(\cdot)|^{\dot r_4}\big]<+\infty \text{ for each fixed } m\in\mathbb{N}.
\]

(vii) (Borel–Cantelli for $|\partial_\theta\hat p^N_\theta(x)-\partial_\theta\bar p^N_\theta(x)|$, (H6′)) For some $\dot\alpha_6>0$ and a constant $\dot C_6$,
\[
\sum_{N=1}^{\infty}\nu_N^3\exp\left(-\frac{n(\eta_N)^2h^4}{2\|K'\|_\infty^2(\dot b_{6,N})^2a_N^2}\right)\left\{1+\frac{\dot C_6}{n^{1+\dot\alpha_6}}\right\}^n<+\infty.
\]

(viii) (Borel–Cantelli for $\dot Z^{(k)}_{6,N}(\omega)$, (H6a′)) For some $\dot r_6>0$,
\[
\sum_{N=1}^{\infty}\frac{n}{(\dot b_{6,N})^{\dot r_6}}<+\infty \quad\text{and}\quad \sup_{N\in\mathbb{N}}E\big[|\dot Z_{6,N}(\cdot)|^{\dot r_6}\big]<+\infty \text{ for each fixed } m\in\mathbb{N}.
\]

(ix) ((H6b′)) For some $\dot q_6>1$,
\[
\left(\frac{\eta_N h^2}{\|K'\|_\infty\,\dot b_{6,N}\,a_N}\right)^{\dot q_6}\exp\left(-\frac{(\eta_N)^2}{2\big(\frac{\|K'\|_\infty}{h^2}\dot b_{6,N}a_N\big)^2}\right)\ \le\ \frac{\dot C_6}{n^{1+\dot\alpha_6}}
\quad\text{and}\quad
\sup_{N\in\mathbb{N}}E\big[|\dot Z_{6,N}(\cdot)|^{\dot q_6}\big]<+\infty,
\]
where $\dot C_6$ and $\dot\alpha_6$ are the same as in (vii) above.


5.4.1. Parameter tuning. Set $a_N:=\sqrt{c_2\ln N}$ for some positive constant $c_2$. Set $n=C_1N^{\alpha_1}$ for $\alpha_1,C_1>0$ and $h=C_2N^{-\alpha_2}$ for $\alpha_2,C_2>0$. For (ii) to be satisfied, we need
\[
\sum_{N=1}^{\infty}\frac{N}{\exp(c_1c_2\ln N)}=\sum_{N=1}^{\infty}\frac{1}{N^{c_1c_2-1}}<+\infty.
\]
Then we need $c_1>\frac{2}{c_2}$. Note that if we choose $c_2$ large enough, then $c_1$ can be chosen as small as needed. With the above specifications, we can check that all the conditions in Section 5.4 are satisfied if the following parameter condition holds for $N$ sufficiently large:

\[
\left(4\alpha_2+\frac{1}{2}+\frac{\varphi_2c_2}{\Delta}+\frac{\gamma_3+\alpha_1}{r_3}+\frac{\alpha_1+\dot\gamma_6}{\dot r_6}\right)\dot q_6\ >\ \alpha_1, \tag{5.6}
\]
which has to be satisfied together with
\[
\alpha_1\left(1-\frac{2}{r_3}-\frac{2}{\dot r_6}\right)\ >\ 8\alpha_2+1+\frac{4\varphi_2c_2}{\Delta}+\frac{2\gamma_3}{r_3}+\frac{2\dot\gamma_6}{\dot r_6}. \tag{5.7}
\]

Notice that the above two inequalities will be satisfied if we take $c_2$ or $c$ big enough.

Here for (iii), we assume that there exist some $r_3>0$, $\gamma_3>1$ and some constant $C_3\neq0$ such that
\[
\frac{n(c_2\ln N)^{r_3}}{(h^2b_{3,N})^{r_3}}=\frac{C_3}{N^{\gamma_3}} \quad\text{and therefore}\quad b_{3,N}=\left(\frac{N^{\gamma_3}n}{C_3}\right)^{1/r_3}\frac{c_2\ln N}{h^2}.
\]
And for (viii), we assume that there exist some $\dot r_6>0$, $\dot\gamma_6>1$ and some constant $\dot C_6\neq0$ such that
\[
\frac{n}{(\dot b_{6,N})^{\dot r_6}}=\frac{\dot C_6}{N^{\dot\gamma_6}} \quad\text{and therefore}\quad \dot b_{6,N}=\left(\frac{nN^{\dot\gamma_6}}{\dot C_6}\right)^{1/\dot r_6}.
\]
For (iv) and (vi), we assume that for $g=b_{4,N},\dot b_{4,N}$ there exist some $r_g>0$, $\gamma_g>1$ and some constant $C_g\neq0$ such that
\[
\frac{n}{g^{r_g}}=\frac{C_g}{N^{\gamma_g}}.
\]
These constants do not appear in the main restriction, but the fact that the convergence is of the above form is used in the proof of the following theorem.

Theorem 5.14. Assume that the constants are chosen so as to satisfy $c_1>\frac{2}{c_2}$, (5.6) and (5.7). Also assume that the moment conditions stated in (ii), (iii), (iv), (vi), (viii) and (ix) above are satisfied. Then (H0), (H3), (H4), (H4′), (H6), (H6′), (H6a′) and (H6b′) are satisfied. Furthermore, if we assume (H1), (H2), (H5) and (H5′), then Assumption 2.3 (6)-(a) is satisfied.
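As a concrete (hypothetical) instance of the tuning above, the next snippet picks a value of $c_2$, then $c_1>2/c_2$, and numerically confirms the summability required in (ii) by comparing partial sums of $\sum_N N^{1-c_1c_2}$; the specific constants are ours, chosen only for illustration.

```python
def partial_sum(c1, c2, terms):
    """Partial sum of the series in (ii) with a_N^2 = c2 ln N,
    i.e. sum_N N * exp(-c1*c2*ln N) = sum_N N**(1 - c1*c2)."""
    return sum(N ** (1.0 - c1 * c2) for N in range(1, terms + 1))

c2 = 4.0
c1 = 2.0 / c2 + 0.5          # any c1 > 2/c2 makes the exponent < -1
exponent = 1.0 - c1 * c2     # here 1 - 4 = -3, a convergent p-series
s_small = partial_sum(c1, c2, 1_000)
s_large = partial_sum(c1, c2, 100_000)
tail = s_large - s_small     # contribution of terms beyond N = 1000 is tiny
```

The tail beyond $N=1000$ is of order $\int_{1000}^{\infty}x^{-3}\,dx=5\times10^{-7}$, which is what the comparison shows.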


And also, if all other conditions in Assumption 2.3 are satisfied, then there exist some positive finite random variables $\Xi$, $\Xi_1$ and $\Xi_2$ such that
\[
\big|E_N[f]-f(\theta_0)\big|\le\frac{\Xi_1}{\sqrt{N}}\ \text{a.s.} \quad\text{and}\quad \big|E^n_{N,m}[f]-f(\theta_0)\big|\le\frac{\Xi_2}{\sqrt{N}}\ \text{a.s.},
\]
and therefore
\[
\big|E_N[f]-E^n_{N,m}[f]\big|\le\frac{\Xi}{\sqrt{N}}\ \text{a.s.}
\]

Remark 5.15.
(i) In (5.7), $r_3$ and $\dot r_6$ represent moment conditions on the derivatives of $X^x_{(m)}(\theta)$, $\varphi_2^{-1}$ represents the variance of $X^x_{(m)}(\theta)$, and $\Delta$ represents the length of the time interval between observations. Finally, $c_2>2c_1^{-1}$ expresses a moment condition on $Y_i$. In (5.6), recall that $\dot q_6$ determines a moment condition on $X^x_{(m)}(\theta)$.
(ii) Roughly speaking, if $r_3$, $\dot r_6$ and $\dot q_6$ are big enough (which implies a condition on $n$), and we choose $\alpha_1>8\alpha_2+1+\frac{4\varphi_2c_2}{\Delta}$, $m=\sqrt{N}$, $h=C_2N^{-\alpha_2}$ and $n=C_1N^{\alpha_1}$, then Assumption 2.3 (6)-(a) and (6)-(b) are satisfied. These conditions contain the main tuning requirements.

6. Appendix

6.1. Refinements of Markov's inequalities

In this section we state a refinement of Markov's inequality that is applied in this article. For $\lambda>0$, let $S_n:=\sum_{i=1}^{n}X_i$, where $X_i$ is a sequence of independent and identically distributed random variables with $E[X_i]=0$.

Lemma 6.1. Let $X$ be a random variable with $E[X]=0$. Then, for $\lambda\in\mathbb{R}$, $c>0$ and $p:=P(|X|<c)$, we have
\[
E\big[e^{\lambda X}\mathbf{1}(|X|<c)\big]\ \le\ -\frac{e^{\lambda c}-e^{-\lambda c}}{2c}\,E\big[X\mathbf{1}(|X|\ge c)\big]+p\,e^{\lambda^2c^2/2}.
\]
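As a quick numerical sanity check (ours, not part of the paper), the bound of Lemma 6.1 can be verified by Monte Carlo for a simple symmetric variable:

```python
import math
import random

def lemma61_check(lam=1.0, c=0.5, n_samples=200_000, seed=0):
    """Monte Carlo check of Lemma 6.1 for X ~ Uniform(-1, 1), which has
    E[X] = 0.  Returns (lhs, rhs); the lemma asserts lhs <= rhs."""
    rng = random.Random(seed)
    lhs_sum = 0.0        # accumulates e^{lam X} 1(|X| < c)
    tail_sum = 0.0       # accumulates X 1(|X| >= c)
    inside = 0           # counts |X| < c, estimating p
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        if abs(x) < c:
            lhs_sum += math.exp(lam * x)
            inside += 1
        else:
            tail_sum += x
    lhs = lhs_sum / n_samples
    p = inside / n_samples
    tail = tail_sum / n_samples
    rhs = -((math.exp(lam * c) - math.exp(-lam * c)) / (2 * c)) * tail \
          + p * math.exp(lam ** 2 * c ** 2 / 2)
    return lhs, rhs
```

For this symmetric example the truncated tail mean is essentially zero, the left-hand side is close to $\sinh(1/2)\approx0.521$, and the right-hand side is close to $\tfrac12 e^{1/8}\approx0.567$, so the inequality holds with a clear margin.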

Lemma 6.2. Let $q_1^{-1}+q_2^{-1}=1$ and assume that $E[|X_i|^{q_1}]<\bar C_{q_1}$. Then, for all $0<\varepsilon<1$ and $\{f_n\}_{n\in\mathbb{N}}\subset\mathbb{R}_+$ satisfying $\varepsilon f_n^{-1}\le K$, we have
\[
P\big(|S_n|>n\varepsilon;\ |X_i|<f_n,\ i=1,\dots,n\big)\ \le\ 2e^{-\frac{n\varepsilon^2}{2f_n^2}}\left\{1+(q_2-1)^{-1}q_2\,K_1\left(\frac{\varepsilon}{f_n^2}\right)^{q_2}\bar C_{q_1}^{q_1^{-1}}e^{\frac{\varepsilon^2}{2f_n^2}}\right\}^n. \tag{6.1}
\]
Here $K_1=\max\left\{1,\frac{e^K-e^{-K}}{2K}\right\}$.


References

[1] V. Bally, D. Talay, The law of the Euler scheme for stochastic differential equations II: Convergence rate of the density, Monte Carlo Methods Appl. 2 (1996), no. 2, 93–128.
[2] P. Billingsley, Probability and Measure, John Wiley & Sons, 1979.
[3] P. Billingsley, Convergence of Probability Measures (second edition), John Wiley & Sons, 1999.
[4] D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction (second edition), Springer-Verlag, 1998.
[5] J.A. Cano, M. Kessler, D. Salmeron, Approximation of the posterior density for diffusion processes, Statist. Probab. Lett. 76 (2006), no. 1, 39–44.
[6] P. Del Moral, J. Jacod, P. Protter, The Monte Carlo method for filtering with discrete-time observations, Probab. Theory Related Fields 120 (2001), 346–368.
[7] J. Guyon, Euler scheme and tempered distributions, Stochastic Process. Appl. 116 (2006), no. 6, 877–904.
[8] A. Kohatsu-Higa, N. Vayatis, K. Yasuda, Tuning of a Bayesian estimator under discrete time observations and unknown transition density, submitted.
[9] A. Kohatsu-Higa, N. Vayatis, K. Yasuda, Strong consistency of Bayesian estimator for the Ornstein–Uhlenbeck process, accepted.
[10] D. Talay, Stochastic Hamiltonian dissipative systems: exponential convergence to the invariant measure and discretization by the implicit Euler scheme, Markov Process. Related Fields 8 (2002), 163–198.
[11] M.P. Wand, M.C. Jones, Kernel Smoothing, Chapman & Hall, 1995.
[12] K. Yasuda, Kernel Density Estimation: The Malliavin–Thalmaier Formula and Bayesian Parameter Estimation, Ph.D. thesis, 2008.
Arturo Kohatsu-Higa
Ritsumeikan University and Japan Science and Technology Agency
Department of Mathematical Sciences
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
e-mail: [email protected]

Nicolas Vayatis
École Normale Supérieure de Cachan
Centre de Mathématiques et de Leurs Applications
61 avenue du Président Wilson
F-94235 Cachan cedex, France
e-mail: [email protected]

Kazuhiro Yasuda
Hosei University, Faculty of Science and Engineering
3-7-2 Kajino-cho, Koganei-shi
Tokyo 184-8584, Japan
e-mail: k [email protected]

Progress in Probability, Vol. 65, 169–178 © 2011 Springer Basel AG

Exponentially Stable Stationary Solutions for Delay Stochastic Evolution Equations Jiaowan Luo Abstract. We establish some sufficient conditions ensuring existence, uniqueness and exponential stability of non-trivial stationary mild solutions for a class of delay stochastic partial differential equations. Some known results are generalized and improved. Mathematics Subject Classification (2000). 60H15; 34K21; 34K50. Keywords. Stochastic partial differential equations; stationary mild solutions; exponential stability; delay.

1. Introduction In this paper a stochastic evolution equation with memory is considered. The drift coefficient and the diffusion term of the equation are nonlinear functionals of the history of the solution. Sufficient conditions for the existence and uniqueness of a non-trivial stationary exponentially stable mild solution are given. This work is motivated by the recent papers [6, 8, 9] and [20, 21, 22]. In [6, 8, 9], by using a general random fixed point theorem, Caraballo et al. considered the exponential stability of non-constant stationary mild solutions of semilinear stochastic partial differential equations with delay and without delay, respectively. In [20, 21, 22], Liu studied the existence of a non-trivial stationary mild solution of retarded linear evolution equations with additive noise by means of Green operators. The approach of [20, 21, 22] works only for linear systems with additive noise and is not applicable to the cases of nonlinear systems and to linear systems with multiplicative noise. The exponential stability of stochastic partial differential equations is an important task which has been receiving considerable attention during the last decades. In particular, stochastic evolution equations containing some sort of delay or retarded argument have also been extensively studied due to their importance in applications (see, for example, [4, 11, 18] and [27]). However, even in the nondelay framework, most results in the literature are concerned with the exponential


stability of constant stationary solutions, mainly the trivial one (see [15, 25] in the finite-dimensional context, and [13, 16, 17, 23] among others in the infinite-dimensional framework). It should be mentioned that there are a few papers already published for the stationary solutions of stochastic partial differential equations. Bakhtin and Mattingly in [1] proved the existence and uniqueness of the stationary solution of stochastically forced dissipative partial differential equations such as the stochastic Navier-Stokes equation and the stochastic Ginzburg–Landau equation; Bessaih in [2] studied the existence of stationary martingale solutions of a two-dimensional dissipative Euler equation subjected to a random perturbation; Blömker and Hairer in [3] considered the existence of stationary mild solutions for a model of amorphous thin-film growth; Caraballo, Garrido-Atienza and Schmalfuß in [6] and Caraballo, Kloeden and Schmalfuß in [8, 9] considered the exponential stability of a non-trivial stationary solution of stochastic evolution equations with/without memory on some separable Hilbert space; Liu in [20, 21] considered the existence of stationary solutions of retarded Ornstein-Uhlenbeck processes in Hilbert spaces; Liu in [22] studied the stationary solution of a class of Langevin equations driven by Lévy processes with time delay. In this note, some new techniques are applied to obtain the existence and uniqueness of a non-trivial stationary mild solution which is exponentially stable in the mean square sense. Our result improves and generalizes the results in [6, 8, 9, 10, 20, 21]. The content of the paper is as follows. In Section 2, we recall some definitions and do some preliminaries. In Section 3, we consider a linear stochastic evolution equation, and obtain the existence of a non-trivial stationary exponentially stable mild solution. Finally, in Section 4, we consider a nonlinear stochastic evolution equation.
Based on the result of Section 3, we prove the existence of a unique non-trivial stationary mild solution which is exponentially stable in the mean square sense.

2. Definitions and preliminaries

Let $H$ and $K$ be two real separable Hilbert spaces and denote by $\langle\cdot,\cdot\rangle_H$, $\langle\cdot,\cdot\rangle_K$ their inner products and by $\|\cdot\|_H$, $\|\cdot\|_K$ their vector norms, respectively. We denote by $L(K,H)$ the set of all linear bounded operators from $K$ into $H$, equipped with the usual operator norm $\|\cdot\|$. In this paper, we always use the same symbol $\|\cdot\|$ to denote norms of operators regardless of the spaces potentially involved when no confusion possibly arises. Let $Q$ be a linear, symmetric and nonnegative bounded operator on $K$. Suppose that $W(t)$, $t\in\mathbb{R}$, is a $K$-valued $Q$-Wiener process on a probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\},P)$ with covariance operator $Q$, adapted to the filtration $\mathcal{F}_t$ and with increments $W(t)-W(s)$ independent of $\mathcal{F}_s$ (see Da Prato and Zabczyk (1992)).

Stationary Solutions of Stochastic Equations


Let us fix an $r\in[0,+\infty]$ and let $C_r:=C([-r,0];H)$ denote the family of all continuous functions $\varphi$ from $[-r,0]$ into $H$. As usual, the space $C([-r,0];H)$ is assumed to be equipped with the supremum norm $\|\varphi\|_{C_r}=\sup_{-r\le\theta\le0}\|\varphi(\theta)\|_H$ for any $\varphi\in C_r$. Given a continuous $H$-valued function $X$, for each $t\in\mathbb{R}$ we denote by $X_t\in C_r$ the function defined by $X_t(\theta)=X(t+\theta)$, $-r\le\theta\le0$.

In this paper, we consider the following infinite-dimensional stochastic differential equation:
\[
dX(t)=AX(t)\,dt+F(X_t)\,dt+G(X_t)\,dW(t),\qquad t\in\mathbb{R}, \tag{2.1}
\]

where $A$ is the infinitesimal generator of a strongly continuous semigroup $e^{tA}$, $t\in\mathbb{R}$, in $H$ satisfying
\[
\|e^{tA}\|\le Me^{-\gamma t}, \tag{2.2}
\]
where $M>0$ and $\gamma>0$ are constants. The operators $F:C_r\longrightarrow H$, $G:C_r\longrightarrow L(K,H)$ are supposed to satisfy the following properties: for any positive continuous function $\rho$ over $(-r,+\infty)$ there exist constants $L_F,L_G\ge0$ such that for all $x,y\in C([-r,+\infty);H)$ and all $t\ge s$,
\[
\int_s^t\rho(u)\|F(x_u)-F(y_u)\|_H^2\,du\ \le\ L_F^2\int_{s-r}^t\rho(u)\|x(u)-y(u)\|_H^2\,du \tag{2.3}
\]
and
\[
\int_s^t\rho(u)\|G(x_u)-G(y_u)\|^2\,du\ \le\ L_G^2\int_{s-r}^t\rho(u)\|x(u)-y(u)\|_H^2\,du, \tag{2.4}
\]

where $\|G(x)-G(y)\|^2:=\mathrm{tr}\big((G(x)-G(y))Q(G(x)-G(y))^*\big)$ (see Da Prato and Zabczyk [13], Chapter 4). We mention that Assumptions (2.3) and (2.4) are often assumed in the context of delay partial differential equations; see [5, 7, 8, 12] for some particular examples.

Definition 2.1. A stochastic process $X(t,s)$, $s\le t$ (we simply write $X(t)$ and omit the initial time $s$), defined on $(\Omega,\mathcal{F},\{\mathcal{F}_t\},P)$ is called a mild solution of Equation (2.1) if
(i) $X(t)$, $s\le t$, is adapted to $\mathcal{F}_t$, and for arbitrary $t\ge s$,
\[
P\left(\omega:\int_s^t\|X(u)\|_H^2\,du<\infty\right)=1;
\]
(ii) for any $t\ge s$, we have
\[
X(t)=e^{(t-s)A}X(s)+\int_s^te^{(t-u)A}F(X_u)\,du+\int_s^te^{(t-u)A}G(X_u)\,dW(u);
\]
(iii) $X(s)\in H$.
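For intuition, in the scalar case $H=K=\mathbb{R}$ with $A=-a$ (so $e^{tA}=e^{-at}$ and (2.2) holds with $M=1$, $\gamma=a$), equation (2.1) can be simulated with a straightforward Euler scheme over the delay segment. The concrete drift and diffusion functionals below are our own illustrative choices, not from the paper:

```python
import math
import random

def euler_delay_path(a, F, G, r, dt, n_steps, x0=0.0, seed=0):
    """Euler scheme for dX = -a X dt + F(X_t) dt + G(X_t) dW, where the
    segment X_t is represented by the last lag+1 discrete states.
    F and G take that list (index 0 is the most delayed state X(t - r))."""
    rng = random.Random(seed)
    lag = int(round(r / dt))
    path = [x0] * (lag + 1)            # constant initial segment on [-r, 0]
    for _ in range(n_steps):
        seg = path[-(lag + 1):]
        x = path[-1]
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x + (-a * x + F(seg)) * dt + G(seg) * dw)
    return path

# Illustrative Lipschitz functionals acting on the delayed state X(t - r):
F = lambda seg: 0.1 * math.sin(seg[0])
G = lambda seg: 0.2 + 0.1 * math.cos(seg[0])

path = euler_delay_path(a=1.0, F=F, G=G, r=0.5, dt=0.01, n_steps=2000)
```

With these choices one may take $L_F=L_G=0.1$ in (2.3)–(2.4)-style Lipschitz bounds, so that $3M^2L_F^2/\gamma+3M^2L_G^2=0.06<1=\gamma$ and the smallness condition (4.1) of Theorem 4.1 below holds for this toy example.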


Definition 2.2. A solution $X(t)$, $t\ge s$, of Equation (2.1) is called strongly stationary, or simply stationary, if the finite-dimensional distributions are invariant under time translations, i.e.,
\[
P\{X(t+t_k)\in\Gamma_k,\ k=1,2,\dots,n\}=P\{X(t_k)\in\Gamma_k,\ k=1,2,\dots,n\}
\]
for all $t\ge s$, $n\ge1$, $t_1,\dots,t_n\ge0$ and all Borel sets $\Gamma_k\in\mathcal{B}(H)$, $k=1,\dots,n$, or equivalently, for any $h_1,\dots,h_n\in H$,
\[
E\left[\exp\left(i\sum_{k=1}^n\langle X(t_k+t),h_k\rangle_H\right)\right]=E\left[\exp\left(i\sum_{k=1}^n\langle X(t_k),h_k\rangle_H\right)\right].
\]

We say that (2.1) has a stationary solution X(t) if there is an initial X(s) ∈ H such that X(t), t ≥ s, is a stationary solution of Equation (2.1).

3. Linear stochastic evolution equations

In this section we consider the linear version of equation (2.1):
\[
dx(t)=Ax(t)\,dt+\alpha(t)\,dt+\beta(t)\,dW(t),\qquad t\in\mathbb{R}, \tag{3.1}
\]
where $\alpha(t)$ and $\beta(t)$ are continuous stochastic processes such that $\alpha(t)$ and $\beta(t)$ are $\mathcal{F}_t$-measurable for every $t$ and
\[
\sup_{t\in\mathbb{R}}E\|\alpha(t)\|_H^2<\infty,\qquad \sup_{t\in\mathbb{R}}E\|\beta(t)\|^2<\infty. \tag{3.2}
\]

Theorem 3.1. Let (2.2) and (3.2) hold. Moreover, suppose that $\eta(t):=(\alpha(t),\beta(t))$ is stationary. Then equation (3.1) has a unique non-trivial stationary mild solution which is exponentially stable in the mean square sense.

Proof. From Proposition 1.3.3 in [19], we know that there exists a mild solution of equation (3.1). Define
\[
x^*(t)=\int_{-\infty}^te^{(t-u)A}\alpha(u)\,du+\int_{-\infty}^te^{(t-u)A}\beta(u)\,dW(u). \tag{3.3}
\]

The right-hand side of (3.3) exists for all $t$ by the assumptions (2.2) and (3.2). Now, we show that $x^*(t)$, $t\in\mathbb{R}$, is a mild solution of Equation (3.1). In fact, it is obvious that the process $x^*(t)$ verifies the following integral equation:
\[
x(t)=e^{(t-s)A}x(s)+\int_s^te^{(t-u)A}\alpha(u)\,du+\int_s^te^{(t-u)A}\beta(u)\,dW(u). \tag{3.4}
\]


Next we prove that $x^*(t)$ is mean square bounded by estimating every term in (3.3). In fact, it follows from the Cauchy–Bunyakovskii inequality that
\[
E\left\|\int_{-\infty}^te^{(t-u)A}\alpha(u)\,du\right\|_H^2
\le E\left(\int_{-\infty}^t\|e^{(t-u)A}\|\cdot\|\alpha(u)\|_H\,du\right)^2
\le M^2E\left(\int_{-\infty}^te^{-\frac{\gamma}{2}(t-u)}e^{-\frac{\gamma}{2}(t-u)}\|\alpha(u)\|_H\,du\right)^2
\]
\[
\le M^2\int_{-\infty}^te^{-\gamma(t-u)}\,du\int_{-\infty}^te^{-\gamma(t-u)}E\|\alpha(u)\|_H^2\,du
\le \frac{M^2}{\gamma^2}\sup_{t\in\mathbb{R}}E\|\alpha(t)\|_H^2<\infty.
\]
Properties of stochastic integrals imply that
\[
E\left\|\int_{-\infty}^te^{(t-u)A}\beta(u)\,dW(u)\right\|_H^2
\le \int_{-\infty}^tM^2e^{-2\gamma(t-u)}E\|\beta(u)\|^2\,du
\le \frac{M^2}{2\gamma}\sup_{t\in\mathbb{R}}E\|\beta(t)\|^2<\infty.
\]

M sup E β(t) 2 < ∞. 2γ t∈R

Second, we show that $x^*(t)$ is mean square exponentially stable. Let $x(t)$ be an arbitrary mild solution of equation (3.1). Then
\[
x(t)=e^{(t-s)A}x(s)+\int_s^te^{(t-u)A}\alpha(u)\,du+\int_s^te^{(t-u)A}\beta(u)\,dW(u). \tag{3.5}
\]
Thus, subtracting (3.4) from (3.5),
\[
E\|x(t)-x^*(t)\|_H^2\le\|e^{(t-s)A}\|^2E\|x(s)-x^*(s)\|_H^2\le M^2e^{-2\gamma(t-s)}E\|x(s)-x^*(s)\|_H^2,
\]
hence the exponential stability follows.

Third, we prove that $x^*(t)$ is strongly unique. Let $y(t)$ be a mild solution of equation (3.1) that is different from $x^*(t)$ and mean square bounded on the whole axis. Then $z(t):=x^*(t)-y(t)$ satisfies
\[
E\|z(t)\|_H^2\le M^2e^{-2\gamma(t-\tau)}E\|z(\tau)\|_H^2
\]
for all $t,\tau\in\mathbb{R}$ such that $t\ge\tau$. Set $\sup_{t\in\mathbb{R}}E\|z(t)\|_H^2=a$; then we have
\[
E\|z(t)\|_H^2\le aM^2e^{-2\gamma(t-\tau)}.
\]
Passing to the limit as $\tau\to-\infty$, we get $E\|z(t)\|_H^2=0$ for all $t\in\mathbb{R}$, whence it follows that $P\{x^*(t)\neq y(t)\}=0$ for all $t\in\mathbb{R}$. Since $x^*(t)$ and $y(t)$ are continuous, we have
\[
P\left\{\sup_{t\in\mathbb{R}}\|x^*(t)-y(t)\|_H>0\right\}=0,
\]
and the uniqueness is proved.


At last, we prove that $x^*(t)$ is stationary, similarly to [24]. In fact, since $W(t)$, $t\in\mathbb{R}$, has stationary independent increments, $e^{tA}$ is a strongly continuous semigroup, and $(\alpha(t),\beta(t))$ is stationary, it follows from (3.3) that the joint distributions of $x^*(t)$ are identical with those of $x^*(t+u)$ for all $u,t\in\mathbb{R}$. □

4. Nonlinear stochastic evolution equations

In this section, taking advantage of what has been obtained for the linear equation in Section 3, it is proved that the infinite-dimensional stochastic differential equation (2.1) also has a unique non-trivial stationary mild solution which is exponentially stable in mean square.

Theorem 4.1. Let (2.2), (2.3) and (2.4) hold. Moreover, suppose that
\[
\frac{3M^2L_F^2}{\gamma}+3M^2L_G^2<\gamma. \tag{4.1}
\]

Then equation (2.1) has a unique non-trivial stationary mild solution which is exponentially stable in the mean square sense.

Proof. We apply the successive approximation method to equation (2.1). We seek its mean square bounded solution as the limit of the sequence $\{X^{(m)}(t)\}$, where $X^{(m+1)}(t)$ is defined as the unique mean square bounded mild solution of
\[
\begin{cases}
dX(t)=AX(t)\,dt+F(X^{(m)}_t)\,dt+G(X^{(m)}_t)\,dW(t),\qquad t\in\mathbb{R},\\
X^{(0)}(t)=0.
\end{cases} \tag{4.2}
\]

sup E F (Xt t∈R

) 2H < ∞,

(m)

sup E G(Xt t∈R

) 2 < ∞,

whenever X (m) (t) is mean square bounded. Hence by Theorem 3.1, there is a unique mean square exponentially stable stationary solution of the form  t  t (m+1) (t−u)A (m) X (t) = e F (Xu )du + e(t−u)A G(Xu(m) )dW (u). (4.3) −∞

−∞

It is easy to prove that the solutions $\{X^{(m)}(t)\}$ converge with probability one, uniformly on $[t_1,t_2]$, and that
\[
E\|X^{(m)}(t)\|_H^2\le C
\]
for all $t\in\mathbb{R}$ and some $C>0$. By Fatou's lemma this implies that
\[
E\|X^{(\infty)}(t)\|_H^2\le C.
\]


Passing to the limit in (4.3) as $m\to+\infty$ and using the properties of stochastic integrals and the continuity of $F$ and $G$, we find that the limit stochastic process $X^{(\infty)}(t)$ satisfies
\[
X^{(\infty)}(t)=\int_{-\infty}^te^{(t-u)A}F(X^{(\infty)}_u)\,du+\int_{-\infty}^te^{(t-u)A}G(X^{(\infty)}_u)\,dW(u), \tag{4.4}
\]
which is a mild solution of equation (2.1).

The solution obtained this way is exponentially stable in mean square. Indeed, let $Y(t)$ be another solution of equation (2.1). Then
\[
Y(t)=e^{(t-s)A}Y(s)+\int_s^te^{(t-u)A}F(Y_u)\,du+\int_s^te^{(t-u)A}G(Y_u)\,dW(u). \tag{4.5}
\]

s

Hence, from (4.4) and (4.5), similarly as in [10], we have for $t\ge s$
\[
\|X^{(\infty)}(t)-Y(t)\|_H^2\le 3\|e^{(t-s)A}\|^2\|X^{(\infty)}(s)-Y(s)\|_H^2
+3\left\|\int_s^te^{(t-u)A}\big[F(X^{(\infty)}_u)-F(Y_u)\big]\,du\right\|_H^2
+3\left\|\int_s^te^{(t-u)A}\big[G(X^{(\infty)}_u)-G(Y_u)\big]\,dW(u)\right\|_H^2
\]
\[
\le 3M^2e^{-2\gamma(t-s)}\|X^{(\infty)}(s)-Y(s)\|_H^2
+\frac{3M^2L_F^2}{\gamma}\int_{s-r}^te^{-\gamma(t-u)}\|X^{(\infty)}(u)-Y(u)\|_H^2\,du
+3\left\|\int_s^te^{(t-u)A}\big[G(X^{(\infty)}_u)-G(Y_u)\big]\,dW(u)\right\|_H^2,
\]
i.e., after taking expectations,
\[
e^{\gamma(t-s)}E\|X^{(\infty)}(t)-Y(t)\|_H^2
\le 3M^2e^{-\gamma(t-s)}E\|X^{(\infty)}(s)-Y(s)\|_H^2
+\left(\frac{3M^2L_F^2}{\gamma}+3M^2L_G^2\right)\int_{s-r}^te^{\gamma(u-s)}E\|X^{(\infty)}(u)-Y(u)\|_H^2\,du.
\]
Hence by the Gronwall–Bellman inequality,
\[
E\|X^{(\infty)}(t)-Y(t)\|_H^2\le Ce^{\left(\frac{3M^2L_F^2}{\gamma}+3M^2L_G^2-\gamma\right)(t-s)}, \tag{4.6}
\]
where $C>0$ is a constant, which shows that $X^{(\infty)}(t)$ is exponentially stable.

Theorem 3.1 implies that $X^{(\infty)}(t)$ is the limit of a sequence of stationary processes $X^{(m)}(t)$ defined by (4.3). The existence of a second stationary solution of equation (2.1) contradicts the uniqueness of a mean square bounded solution of this equation, since every stationary solution has bounded second moment. □


Remark 4.2. In [6, 8, 9], the authors considered three types of equation (2.1), but they need the restriction that $M=1$ in (2.2), that is, $\|e^{tA}\|\le e^{-\gamma t}$.

Remark 4.3. The approach of [20, 21, 22] works only for linear systems with additive noise and is not applicable to the cases of nonlinear systems and linear systems with multiplicative noise.

Remark 4.4. In [10], the authors obtained the mean square exponential stability of the zero solution of equation (2.1) with variable delays. For example, consider the following equation:
\[
dX(t)=AX(t)\,dt+F(X(t-r))\,dt+G(X(t-r))\,dW(t),\qquad t\in\mathbb{R}, \tag{4.7}
\]
where $r>0$ is a constant. By [10], if (2.2), (2.3), (2.4) and
\[
\frac{3M^2L_F^2}{\gamma}+3M^2L_G^2\,e^{\gamma r}<\gamma \tag{4.8}
\]

hold, then the zero solution of equation (4.7) is mean square exponentially stable. However, by Theorem 4.1 of the present paper, if (2.2), (2.3), (2.4) and
\[
\frac{3M^2L_F^2}{\gamma}+3M^2L_G^2<\gamma \tag{4.9}
\]
hold, then equation (4.7) has a unique stationary mild solution which is also mean square exponentially stable. Obviously, our condition (4.9), which does not depend on the time delay, is better than (4.8) above. In addition, our conditions are similar to the well-known conditions for stability for every value of the delay $r$ in the finite-dimensional setting [14].

Acknowledgement. The author would like to thank the referee for the careful reading of the manuscript; the valuable remarks led to numerous improvements throughout. This work is partially supported by NNSF of China (Grant No. 10971041), by the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20090002110047) and by the Foundation of Guangzhou Education Bureau (Grant No. 08C019).
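The improvement of Remark 4.4 can be made concrete with a small numerical comparison (our own illustration, with sample constants of our choosing): the delay-dependent left-hand side of (4.8) grows like $e^{\gamma r}$ and eventually fails, while (4.9) is unaffected by $r$.

```python
import math

def lhs_48(M, LF, LG, gamma, r):
    """Left-hand side of the delay-dependent condition (4.8)."""
    return 3 * M**2 * LF**2 / gamma + 3 * M**2 * LG**2 * math.exp(gamma * r)

def lhs_49(M, LF, LG, gamma):
    """Left-hand side of the delay-independent condition (4.9)."""
    return 3 * M**2 * LF**2 / gamma + 3 * M**2 * LG**2

M, LF, LG, gamma = 1.0, 0.2, 0.2, 1.0
print(lhs_49(M, LF, LG, gamma) < gamma)          # True: 0.24 < 1
print(lhs_48(M, LF, LG, gamma, r=0.5) < gamma)   # True: still holds for small r
print(lhs_48(M, LF, LG, gamma, r=5.0) < gamma)   # False: fails for large delay
```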

References

[1] Yuri Bakhtin and Jonathan C. Mattingly, Stationary solutions of stochastic differential equations with memory and stochastic partial differential equations, Commun. Contemp. Math., 7 (2005), 553–582.
[2] Hakima Bessaih, Stationary solutions for the 2D stochastic dissipative Euler equation, in: Seminar on Stochastic Analysis, Random Fields and Applications V, 23–36, Progr. Probab. 59, Birkhäuser, Basel, 2008.
[3] Dirk Blömker and Martin Hairer, Stationary solutions for a model of amorphous thin-film growth, Stochastic Anal. Appl., 22 (2004), 903–922.


[4] Tomás Caraballo, Asymptotic exponential stability of stochastic partial differential equations with delay, Stochastics Stochastics Rep., 33 (1990), 27–47.
[5] Tomás Caraballo, M.J. Garrido-Atienza and José Real, Existence and uniqueness of solutions for delay stochastic evolution equations, Stochastic Anal. Appl., 20 (2002), 1225–1256.
[6] Tomás Caraballo, M.J. Garrido-Atienza and Björn Schmalfuß, Existence of exponentially attracting stationary solutions for delay evolution equations, Discrete Contin. Dyn. Syst., 18 (2007), 271–293.
[7] Tomás Caraballo, M.J. Garrido-Atienza and Björn Schmalfuß, Asymptotic behaviour of non-trivial stationary solutions of stochastic functional evolution equations, Dopov. Nats. Akad. Nauk Ukr. Mat. Prirodozn. Tekh. Nauki, 6 (2004), 39–42.
[8] Tomás Caraballo, Peter E. Kloeden and Björn Schmalfuß, Exponentially stable stationary solutions for stochastic evolution equations and their perturbation, Appl. Math. Optim., 50 (2004), 183–207.
[9] Tomás Caraballo, Peter E. Kloeden and Björn Schmalfuß, Stabilization of stationary solutions of evolution equations by noise, Discrete Contin. Dyn. Syst. Ser. B, 6 (2006), no. 6, 1199–1212.
[10] T. Caraballo and Kai Liu, Exponential stability of mild solutions of stochastic partial differential equations with delays, Stochastic Anal. Appl., 17 (1999), 743–763.
[11] Tomás Caraballo, Kai Liu and A. Truman, Stochastic functional partial differential equations: existence, uniqueness and asymptotic decay property, Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 456 (2000), 1775–1802.
[12] Tomás Caraballo, José Real and Takeshi Taniguchi, The exponential stability of neutral stochastic delay partial differential equations, Discrete Contin. Dyn. Syst., 18 (2007), 295–313.
[13] G. Da Prato and J. Zabczyk, Stochastic Equations in Infinite Dimensions, Cambridge University Press, 1992.
[14] Jack Hale, Theory of Functional Differential Equations, Springer-Verlag, 1992.
[15] R.Z. Hasminskiĭ, Stochastic Stability of Differential Equations, Sijthoff & Noordhoff, Alphen aan den Rijn, The Netherlands; Rockville, Maryland, USA, 1980.
[16] U.G. Haussmann, Asymptotic stability of the linear Itô equation in infinite dimensions, J. Math. Anal. Appl., 65 (1978), 219–235.
[17] A. Ichikawa, Stability of semilinear stochastic evolution equations, J. Math. Anal. Appl., 90 (1982), 12–44.
[18] Kai Liu, Lyapunov functionals and asymptotic stability of stochastic delay evolution equations, Stochastics Stochastics Rep., 63 (1998), 1–26.
[19] Kai Liu, Stability of Infinite Dimensional Stochastic Differential Equations with Applications, Monographs and Surveys in Pure and Applied Mathematics 135, Chapman & Hall/CRC, 2006.
[20] Kai Liu, Stationary solutions of retarded Ornstein-Uhlenbeck processes in Hilbert spaces, Statist. Probab. Lett., 78 (2008), 1775–1783.
[21] Kai Liu, A criterion for stationary solutions of retarded linear equations with additive noise, preprint, 2008.

178

J. Luo

Jiaowan Luo
Department of Probability and Statistics
School of Mathematics and Information Sciences
Guangzhou University
Guangzhou, Guangdong 510006, P.R. China
e-mail: [email protected]

Progress in Probability, Vol. 65, 179–189 © 2011 Springer Basel AG

Robust Stochastic Control and Equivalent Martingale Measures

Bernt Øksendal and Agnès Sulem

Abstract. We study a class of robust, or worst case scenario, optimal control problems for jump diffusions. The scenario is represented by a probability measure equivalent to the initial probability law. We show that if there exists a control that annihilates the noise coefficients in the state equation and a scenario which is an equivalent martingale measure for a specific process which is related to the control-derivative of the state process, then this control and this probability measure are optimal. We apply the result to the problem of consumption and portfolio optimization under model uncertainty in a financial market, where the price process S(t) of the risky asset is modeled as a geometric Itô-Lévy process. In this case the optimal scenario is an equivalent local martingale measure of S(t). We solve this problem explicitly in the case of logarithmic utility functions.

Mathematics Subject Classification (2010). 93E20, 60G51, 60H20, 60G44, 91G80.

Keywords. Robust stochastic control, worst case scenario, stochastic differential game, equivalent martingale measure, jump diffusion.

1. Introduction During the last decade there has been an increasing awareness of the importance of taking model uncertainty into account when dealing with mathematical models. See, e.g., [HS] and the references therein. A general feature of model uncertainty is the recognition of the uncertainty about the underlying probability law, or scenario, for the model. This leads to the study of robust models, where one seeks an optimal strategy among a family of possible scenarios. A special case is the problem of optimal control in the worst possible scenario. This is the topic of this paper. We consider a class of scenarios which is basically the set of all probability measures Q which are absolutely continuous with respect to a given reference measure P , and we study an optimal control problem for a jump diffusion under the worst possible scenario.


Mathematically this leads to a stochastic differential game between the controller and the "environment" who chooses the scenario. Assuming the system is Markovian and using the Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation, we show that if there exists a control û which annihilates the noise coefficients of the system and a scenario Q̂ which is an equivalent martingale measure for a specific process which is related to the control-derivative of the state process, then this control is optimal for the controller and the scenario Q̂ is a worst case scenario. We then apply this result to the problem of optimal consumption and portfolio under model uncertainty in a financial market, where the price process S(t) of the risky asset is modeled as a geometric Itô-Lévy process. In this case the optimal scenario is an equivalent local martingale measure of S(t). We solve this problem explicitly in the case of logarithmic utility functions.

Robust control problems and worst case scenario problems have been studied by many researchers. We mention in particular the paper [BMS], where the following approach is used: the authors first fix a strategy and prove the existence of a corresponding optimal scenario Q*, and then subsequently use BSDEs to study the optimal strategy problem for a fixed scenario. Robust control problems for possibly non-Markovian systems can also be studied by means of stochastic maximum principles (see [AØ]) and by means of backward stochastic differential equations (see, e.g., [ØS3] and the references therein).

2. Robust optimal control

2.1. Stochastic differential game approach

Consider a controlled jump diffusion X(t) = X^u(t) in R of the form

\[
dX(t) = b(t,X(t),u(t))\,dt + \sigma(t,X(t),u(t))\,dB(t) + \int_{\mathbb R_0}\gamma(t,X(t),u(t),z)\,\tilde N(dt,dz);\quad X(0)=x\in\mathbb R;\ t\in[0,T],
\tag{2.1}
\]

where T > 0 is a fixed constant. Here \(\tilde N(dt,dz) = N(dt,dz) - \nu(dz)\,dt\) is the compensated Poisson random measure of a Lévy process with jump measure N(·,·) and Lévy measure ν(·), and B(t) is an independent Brownian motion. Both Ñ(·,·) and B(·) live on a filtered probability space (Ω, F, {F_t}, P). The process u(t) = (u_1(t), …, u_m(t)) ∈ R^m is the control process of the agent, and b(s,x,u), σ(s,x,u) and γ(s,x,u,z); s ∈ [0,T], x ∈ R, u ∈ R^m, z ∈ R_0 := R\{0}, are assumed to be C¹ functions with respect to u. The scenario of X(t) is determined by a (positive) measure Q_θ of the form

\[
dQ_\theta(\omega) = K^\theta(T)\,dP(\omega)\quad\text{on }\mathcal F_T,
\tag{2.2}
\]


where

\[
dK^\theta(t) = -K^\theta(t^-)\Big[\theta_0(t)\,dB(t) + \int_{\mathbb R_0}\theta_1(t,z)\,\tilde N(dt,dz)\Big];\quad t\in[0,T];\qquad K^\theta(0)=k>0.
\tag{2.3}
\]

Here θ = θ(t,z) = (θ_0(t), θ_1(t,z)) ∈ R² is the scenario control, assumed to be F_t-predictable and such that

\[
E[K^\theta(T)] = K^\theta(0) =: k > 0.
\tag{2.4}
\]
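Condition (2.4) says the scenario density process K^θ of (2.3) is a martingale with constant expectation k. As an illustrative sketch (not part of the paper), one can check this by Monte Carlo in the Brownian-only case θ_1 ≡ 0 with constant θ_0, where (2.3) has the explicit stochastic-exponential solution K^θ(T) = k·exp(−θ_0 B_T − ½θ_0²T):

```python
# Monte Carlo sanity check of (2.4): E[K^theta(T)] = k, in the
# Brownian-only case (theta_1 = 0, constant theta_0), where (2.3) gives
# K^theta(T) = k * exp(-theta0*B_T - 0.5*theta0**2*T).
import numpy as np

rng = np.random.default_rng(0)
k, theta0, T, n = 1.0, 0.5, 1.0, 400_000

B_T = rng.standard_normal(n) * np.sqrt(T)          # B_T ~ N(0, T)
K_T = k * np.exp(-theta0 * B_T - 0.5 * theta0**2 * T)

mean_K = K_T.mean()
print(mean_K)                    # close to k = 1
assert abs(mean_K - k) < 0.01
```

The parameter values (θ_0 = 0.5, T = 1) are arbitrary choices for the check; any bounded deterministic θ_0 gives the same conclusion.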

Let V, Θ be two sets such that u(t) ∈ V and θ(t,z) ∈ Θ for all t, z, and let U, A be given families of admissible u-controls and θ-controls, respectively. Define the process Y(t) = Y^{θ,u}(t) := (K^θ(t), X^u(t)). Then Y(t) is a controlled jump diffusion with generator

\[
\begin{aligned}
A^{\theta,u}\varphi(t,y) = A^{\theta,u}\varphi(t,k,x)
&= b(t,x,u)\frac{\partial\varphi}{\partial x} + \frac12 k^2\theta_0^2\frac{\partial^2\varphi}{\partial k^2} + \frac12\sigma^2(t,x,u)\frac{\partial^2\varphi}{\partial x^2} - \theta_0 k\,\sigma(t,x,u)\frac{\partial^2\varphi}{\partial k\,\partial x}\\
&\quad + \int_{\mathbb R_0}\Big\{\varphi(t,\,k-k\theta_1(z),\,x+\gamma(t,x,u,z)) - \varphi(t,k,x)\\
&\qquad\qquad + k\theta_1(z)\frac{\partial\varphi}{\partial k}(t,k,x) - \gamma(t,x,u,z)\frac{\partial\varphi}{\partial x}(t,k,x)\Big\}\,\nu(dz);\qquad \varphi\in C^{1,2,2}(\mathbb R^3).
\end{aligned}
\tag{2.5}
\]

We refer to [ØS1] for more information about stochastic control of jump diffusions. Let f : R² × V → R and g : R → R be functions such that

\[
E_{Q_\theta}\Big[\int_0^T |f(t,X(t),u(t))|\,dt + |g(X(T))|\Big] < \infty
\]

for all u ∈ U, θ ∈ A. Define the performance functional by

\[
J^{\theta,u}(t,y) = E^{t,y}_{Q_\theta}\Big[\int_t^T f(s,X(s),u(s))\,ds + g(X(T))\Big]
= E^{t,y}\Big[\int_t^T K^\theta(s)f(s,X(s),u(s))\,ds + K^\theta(T)g(X(T))\Big],
\tag{2.6}
\]

where E^{t,y}_{Q_θ} and E^{t,y} denote expectation with respect to Q_θ and P, respectively, given Y(t) = y.
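The second equality in (2.6) is the usual reweighting of expectations by the density K^θ(T). A minimal numerical illustration, under our own simplifying assumptions (Brownian-only noise, constant θ_0, k = 1): weighting by K^θ(T) reproduces the Q_θ-expectation, and by Girsanov's theorem B acquires drift −θ_0 under Q_θ, so E_{Q_θ}[B_T] = −θ_0 T:

```python
# Reweighting identity behind (2.6): E_Q[F] = E[K^theta(T) * F], checked
# with F = B_T. Under Q_theta, B has drift -theta0, so E_Q[B_T] = -theta0*T.
import numpy as np

rng = np.random.default_rng(1)
theta0, T, n = 0.5, 1.0, 400_000

B_T = rng.standard_normal(n) * np.sqrt(T)
K_T = np.exp(-theta0 * B_T - 0.5 * theta0**2 * T)   # density with k = 1

weighted_mean = np.mean(K_T * B_T)    # Monte Carlo estimate of E_Q[B_T]
print(weighted_mean)                   # close to -theta0*T = -0.5
assert abs(weighted_mean - (-theta0 * T)) < 0.02
```

This is only a sketch of the measure-change mechanism; the paper itself works with the general jump-diffusion density of (2.3).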


We consider the following robust, or worst case scenario, stochastic control problem:

Problem 2.1. Find θ* ∈ A, u* ∈ U and Φ(t,y) such that

\[
\Phi(t,y) = \inf_{\theta\in\mathcal A}\Big(\sup_{u\in\mathcal U} J^{\theta,u}(t,y)\Big) = J^{\theta^*,u^*}(t,y).
\tag{2.7}
\]

2.2. The main theorem

We now formulate our main result:

Theorem 2.2. Suppose there exist a C¹ function ψ(t,x) and feedback controls û = û(t,x) ∈ U, θ̂ = (θ̂_0(t,x), θ̂_1(t,x,z)) ∈ A such that

\[
\sigma(t,x,\hat u(t,x)) = \gamma(t,x,\hat u(t,x),z) = 0\quad\text{for all }t,x,z
\tag{2.8}
\]

and

\[
\Big[\hat\theta_0(t,x)\frac{\partial\sigma}{\partial u_i}(t,x,\hat u(t,x)) + \int_{\mathbb R_0}\hat\theta_1(t,x,z)\frac{\partial\gamma}{\partial u_i}(t,x,\hat u(t,x),z)\,\nu(dz) - \frac{\partial b}{\partial u_i}(t,x,\hat u(t,x))\Big]\frac{\partial\psi}{\partial x}(t,x) = \frac{\partial f}{\partial u_i}(t,x,\hat u(t,x))
\tag{2.9}
\]

for all t, x, i = 1, 2, …, m. Then (û, θ̂) is an optimal pair for the robust control problem (2.7) and the value function is given by Φ(t,k,x) = kψ(t,x), provided that ψ(t,x) is the solution of the PDE

\[
\frac{\partial\psi}{\partial t}(t,x) + b(t,x,\hat u(t,x))\frac{\partial\psi}{\partial x}(t,x) + f(t,x,\hat u(t,x)) = 0;\quad (t,x)\in[0,T]\times\mathbb R,
\tag{2.10}
\]
\[
\psi(T,x) = g(x);\quad x\in\mathbb R.
\tag{2.11}
\]

Proof. We apply Theorem 3.2 in [MØ]. Maximizing u ↦ A^{θ,u}φ(t,k,x) + kf(t,x,u) with respect to u gives the following first-order conditions for an optimal û:

\[
\frac{\partial b}{\partial u_i}(t,x,\hat u)\frac{\partial\varphi}{\partial x} + \sigma(t,x,\hat u)\frac{\partial\sigma}{\partial u_i}(t,x,\hat u)\frac{\partial^2\varphi}{\partial x^2} - \theta_0 k\frac{\partial\sigma}{\partial u_i}(t,x,\hat u)\frac{\partial^2\varphi}{\partial k\,\partial x} + \int_{\mathbb R_0}\Big[\frac{\partial\varphi}{\partial x}(t,\,k-k\theta_1(z),\,x+\gamma(t,x,\hat u,z)) - \frac{\partial\varphi}{\partial x}(t,k,x)\Big]\frac{\partial\gamma}{\partial u_i}(t,x,\hat u,z)\,\nu(dz) + k\frac{\partial f}{\partial u_i}(t,x,\hat u) = 0;\quad i=1,2,\dots,m.
\tag{2.12}
\]

Minimizing A^{θ,û}φ(t,k,x) + kf(t,x,û) with respect to θ = (θ_0, θ_1(z)), we get the following first-order conditions for optimal θ̂_0, θ̂_1(z):

\[
k^2\hat\theta_0\frac{\partial^2\varphi}{\partial k^2} - k\,\sigma(t,x,\hat u)\frac{\partial^2\varphi}{\partial k\,\partial x} = 0
\tag{2.13}
\]

and

\[
\int_{\mathbb R_0}\Big[\frac{\partial\varphi}{\partial k}(t,\,k-k\hat\theta_1(z),\,x+\gamma(t,x,\hat u,z)) - \frac{\partial\varphi}{\partial k}(t,k,x)\Big]\nu(dz) = 0.
\tag{2.14}
\]

Let us try a value function of the form

\[
\varphi(t,k,x) = k\,\psi(t,x).
\tag{2.15}
\]

Then (2.12)–(2.14) get the form

\[
\frac{\partial b}{\partial u_i}(t,x,\hat u)\frac{\partial\psi}{\partial x} + \sigma(t,x,\hat u)\frac{\partial\sigma}{\partial u_i}(t,x,\hat u)\frac{\partial^2\psi}{\partial x^2} - \hat\theta_0\frac{\partial\sigma}{\partial u_i}(t,x,\hat u)\frac{\partial\psi}{\partial x} + \int_{\mathbb R_0}\Big[(1-\hat\theta_1(z))\frac{\partial\psi}{\partial x}(t,\,x+\gamma(t,x,\hat u,z)) - \frac{\partial\psi}{\partial x}(t,x)\Big]\frac{\partial\gamma}{\partial u_i}(t,x,\hat u,z)\,\nu(dz) + \frac{\partial f}{\partial u_i}(t,x,\hat u) = 0;\quad i=1,2,\dots,m,
\tag{2.16}
\]
\[
\sigma(t,x,\hat u)\frac{\partial\psi}{\partial x}(t,x) = 0,
\tag{2.17}
\]
and
\[
\int_{\mathbb R_0}\big\{\psi(t,\,x+\gamma(t,x,\hat u,z)) - \psi(t,x)\big\}\,\nu(dz) = 0.
\tag{2.18}
\]

Suppose there exists a Markov control û = û(t,x) such that

\[
\sigma(t,x,\hat u(t,x)) = \gamma(t,x,\hat u(t,x),z) = 0\quad\text{for all }z\in\mathbb R_0.
\tag{2.19}
\]

Then (2.17)–(2.18) are satisfied, and (2.16) gets the form

\[
\Big[\hat\theta_0(t,x)\frac{\partial\sigma}{\partial u_i}(t,x,\hat u) + \int_{\mathbb R_0}\hat\theta_1(t,x,z)\frac{\partial\gamma}{\partial u_i}(t,x,\hat u,z)\,\nu(dz) - \frac{\partial b}{\partial u_i}(t,x,\hat u)\Big]\frac{\partial\psi}{\partial x}(t,x) = \frac{\partial f}{\partial u_i}(t,x,\hat u);\quad i=1,2,\dots,m.
\tag{2.20}
\]

Suppose û, θ̂ satisfy (2.19)–(2.20). Then by Theorem 3.2 in [MØ] we are required to have

\[
\frac{\partial\varphi}{\partial t}(t,y) + A^{\hat\theta,\hat u}\varphi(t,y) + kf(t,x,\hat u) = 0;\quad t<T.
\tag{2.21}
\]

By (2.5) and (2.19), this gives the equation

\[
\frac{\partial\psi}{\partial t}(t,x) + b(t,x,\hat u(t,x))\frac{\partial\psi}{\partial x}(t,x) + f(t,x,\hat u(t,x)) = 0;\quad t<T,
\tag{2.22}
\]

with terminal condition ψ(T,x) = g(x); x ∈ R. This completes the proof. □

Remark 2.3. Note that û(t,x) and θ̂(t,x) might depend on ∂ψ/∂x(t,x). Hence equation (2.10) is in general a nonlinear PDE in the unknown function ψ.


2.3. Equivalent local martingale measures

The following definition is motivated by applications in mathematical finance:

Definition 2.4. Let S(t) be an Itô-Lévy process of the form

\[
dS(t) = \alpha(t)\,dt + \beta(t)\,dB(t) + \int_{\mathbb R_0}\zeta(t,z)\,\tilde N(dt,dz)
\]

for predictable processes α(t), β(t), ζ(t,z); t ∈ [0,T], z ∈ R_0. A probability measure Q on F_T is called an equivalent local martingale measure (ELMM) for S(·) if Q ∼ P (i.e., Q ≪ P and P ≪ Q) and {S(t)}_{t∈[0,T]} is a local martingale with respect to Q.

It is well known (see, e.g., [ØS1, Theorem 1.31]) that a measure Q_θ of the form (2.2)–(2.4) with k = 1 is an ELMM for S(·) if and only if

\[
\theta_0(t)\beta(t) + \int_{\mathbb R_0}\theta_1(t,z)\zeta(t,z)\,\nu(dz) = \alpha(t);\quad t\in[0,T].
\tag{2.23}
\]

Suppose that ∂ψ/∂x ≠ 0. Then, in view of Definition 2.4, the measure Q_θ̂ of the form (2.2)–(2.4) with k = 1, where θ̂ is a scenario control satisfying (2.9), is an ELMM for all the processes G_i(t) given by

\[
dG_i(t) := \Big[\frac{\partial b}{\partial u_i}(t,\hat X(t),\hat u(t)) + \Big(\frac{\partial\psi}{\partial x}(t,\hat X(t))\Big)^{-1}\frac{\partial f}{\partial u_i}(t,\hat X(t),\hat u(t))\Big]dt + \frac{\partial\sigma}{\partial u_i}(t,\hat X(t),\hat u(t))\,dB(t) + \int_{\mathbb R_0}\frac{\partial\gamma}{\partial u_i}(t,\hat X(t),\hat u(t),z)\,\tilde N(dt,dz);\quad i=1,2,\dots,m,
\tag{2.24}
\]

where û(t) = û(t, X̂(t)) and X̂(t) = X^û(t); t ∈ [0,T].
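In the presence of jumps, the drift condition (2.23) generally has many solutions (θ_0, θ_1): the market is incomplete and the ELMM is not unique. A toy computation under assumptions of our own (constant coefficients and a single jump size z_0, so ν = λ·δ_{z_0} and the integral in (2.23) collapses to θ_1·ζ·λ) makes this explicit:

```python
# The ELMM drift condition (2.23) in a toy market: constant alpha, beta,
# zeta and a one-point Levy measure nu = lam * delta_{z0}. Condition:
#     theta0*beta + theta1*zeta*lam = alpha
# has a one-parameter family of solutions -> the ELMM is not unique.
alpha, beta, zeta, lam = 0.07, 0.2, 0.1, 2.0   # hypothetical market data

def theta0_for(theta1):
    """Solve (2.23) for theta0 given any free choice of theta1."""
    return (alpha - theta1 * zeta * lam) / beta

for theta1 in (0.0, 0.1, -0.2):                # three distinct scenarios...
    theta0 = theta0_for(theta1)
    drift = theta0 * beta + theta1 * zeta * lam
    assert abs(drift - alpha) < 1e-12          # ...all satisfying (2.23)
```

The choice θ_1 = 0 recovers the pure Girsanov shift θ_0 = α/β; the other choices move part of the measure change onto the jumps.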

3. Example

Suppose we have a financial market with a risk free asset with unit price S_0(t) = 1 and a risky asset with unit price S(t) given by

\[
dS(t) = S(t^-)\Big[b_0(t)\,dt + \sigma_0(t)\,dB(t) + \int_{\mathbb R_0}\gamma_0(t,z)\,\tilde N(dt,dz)\Big];\quad t\in[0,T];\qquad S(0)>0,
\tag{3.1}
\]

where b_0(t), σ_0(t) and γ_0(t,z) are bounded deterministic functions, γ_0(t,z) > −1. If we apply a portfolio π(t), representing the proportion of the total wealth X(t) invested in the risky asset at time t, and a relative consumption rate λ(t) ≥ 0, the


corresponding wealth process X(t) = X^{λ,π}(t) will have the dynamics

\[
dX(t) = \pi(t)X(t^-)\Big[b_0(t)\,dt + \sigma_0(t)\,dB(t) + \int_{\mathbb R_0}\gamma_0(t,z)\,\tilde N(dt,dz)\Big] - \lambda(t)X(t)\,dt;\quad t\in[0,T];\qquad X(0)=x>0.
\tag{3.2}
\]

We say that the pair u = (λ, π) is an admissible control if λ and π are F-predictable, λ ≥ 0, π(t)γ_0(t,z) > −1 and

\[
\int_0^T\Big(\pi^2(t) + \lambda(t) + \int_{\mathbb R_0}|\log(1+\pi(t)\gamma_0(t,z))|\,\nu(dz)\Big)dt < \infty\quad\text{a.s.}
\]

Note that under these conditions the unique solution X(t) of (3.2) is given by

\[
X(t) = x\exp\Big(\int_0^t\big\{\pi(s)b_0(s) - \lambda(s) - \tfrac12\pi^2(s)\sigma_0^2(s)\big\}\,ds + \int_0^t\pi(s)\sigma_0(s)\,dB_s + \int_0^t\int_{\mathbb R_0}\big(\log(1+\pi(s)\gamma_0(s,z)) - \pi(s)\gamma_0(s,z)\big)\,\nu(dz)\,ds + \int_0^t\int_{\mathbb R_0}\log(1+\pi(s)\gamma_0(s,z))\,\tilde N(ds,dz)\Big);\quad t\in[0,T].
\]

In particular, X(t) > 0 for all t ∈ [0,T].

Let U_1 : [0,T] × [0,∞) → R, U_2 : [0,∞) → R be two given C¹ functions. We assume that c ↦ U_1(t,c) and x ↦ U_2(x) are strictly increasing, concave functions (utility functions) for all t ∈ [0,T]. We also assume that c ↦ ∂U_1/∂c(t,c) is strictly decreasing and that lim_{c→+∞} ∂U_1/∂c(t,c) = 0 for all t ∈ [0,T]. Put x_0 = ∂U_1/∂c(t,0) and define

\[
I(t,x) = \begin{cases} 0 & \text{for } x\ge x_0,\\[3pt] \Big(\dfrac{\partial U_1}{\partial c}(t,\cdot)\Big)^{-1}(x) & \text{for } 0\le x<x_0. \end{cases}
\tag{3.3}
\]

Define the absolute consumption rate at time t by

\[
c(t) = \lambda(t)X(t);\quad t\in[0,T].
\tag{3.4}
\]
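The exponential formula for X(t) can be checked pathwise against an Euler scheme for (3.2). This is an illustrative sketch under simplifying assumptions of ours: no jumps (γ_0 ≡ 0) and constant π, λ, b_0, σ_0, in which case the formula reduces to X(t) = x·exp((πb_0 − λ − ½π²σ_0²)t + πσ_0 B_t):

```python
# Pathwise check of the explicit wealth formula in the continuous case:
# Euler scheme for (3.2) with gamma_0 = 0 vs. the closed-form solution,
# driven by the SAME Brownian increments.
import numpy as np

rng = np.random.default_rng(2)
x, pi, lam, b0, s0, T, N = 1.0, 0.4, 0.1, 0.05, 0.2, 1.0, 10_000
dt = T / N
dB = rng.standard_normal(N) * np.sqrt(dt)

X_euler = x
for i in range(N):
    X_euler += pi * X_euler * (b0 * dt + s0 * dB[i]) - lam * X_euler * dt

B_T = dB.sum()
X_exact = x * np.exp((pi * b0 - lam - 0.5 * pi**2 * s0**2) * T + pi * s0 * B_T)

rel_err = abs(X_euler - X_exact) / X_exact
print(X_euler, X_exact, rel_err)
assert rel_err < 1e-2
```

The −½π²σ_0² correction in the exponent is exactly the Itô term; omitting it makes the two paths diverge systematically.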

Suppose the performance functional is given by

\[
J^{\theta,\lambda,\pi}(t,y) = E^{t,y}_{Q_\theta}\Big[\int_t^T U_1(s,c(s))\,ds + U_2(X(T))\Big]
= E^{t,y}\Big[\int_t^T K^\theta(s)U_1(s,c(s))\,ds + K^\theta(T)U_2(X(T))\Big].
\tag{3.5}
\]

To solve the problem

\[
\Phi(t,y) = \inf_{\theta\in\mathcal A}\Big(\sup_{(\lambda,\pi)\in\mathcal U} J^{\theta,\lambda,\pi}(t,y)\Big) = J^{\theta^*,\lambda^*,\pi^*}(t,y)
\tag{3.6}
\]


we apply Theorem 2.2. Thus we search for a solution ĉ, π̂, θ̂ = (θ̂_0, θ̂_1) and ψ(t,x) such that (2.8) and (2.9) hold, when b(t,x,u) = πxb_0(t) − λx, σ(t,x,u) = πxσ_0(t), γ(t,x,u,z) = πxγ_0(t,z) and f(t,x,u) = U_1(t,c), g(x) = U_2(x), u = (λ,π), c = λx. We see that (2.8) holds with π̂ = 0, for all ĉ. Writing

\[
\Big(\frac{\partial}{\partial u_1},\frac{\partial}{\partial u_2}\Big) = \Big(\frac{\partial}{\partial\lambda},\frac{\partial}{\partial\pi}\Big),
\]

equation (2.9) becomes

\[
\frac{\partial\psi}{\partial x}(t,x) = \frac{\partial U_1}{\partial c}(t,\hat c(t,x))
\tag{3.7}
\]

and

\[
\hat\theta_0(t,x)\sigma_0(t) + \int_{\mathbb R_0}\hat\theta_1(t,x,z)\gamma_0(t,z)\,\nu(dz) = b_0(t).
\tag{3.8}
\]

The equation (2.10)–(2.11) for ψ gets the form

\[
\frac{\partial\psi}{\partial t}(t,x) - \hat c(t,x)\frac{\partial U_1}{\partial c}(t,\hat c(t,x)) + U_1(t,\hat c(t,x)) = 0;\quad t<T;\qquad \psi(T,x)=U_2(x),
\tag{3.9}
\]

which has the solution

\[
\psi(t,x) = U_2(x) + \int_t^T\Big\{U_1(s,\hat c(s,x)) - \hat c(s,x)\frac{\partial U_1}{\partial c}(s,\hat c(s,x))\Big\}\,ds.
\tag{3.10}
\]

In this case the processes G_i defined in (2.24) are

\[
dG_1(t) = \hat X(t)\Big[-1 + \Big(\frac{\partial\psi}{\partial x}(t,\hat X(t))\Big)^{-1}\frac{\partial U_1}{\partial c}(t,\hat c(t,\hat X(t)))\Big]\,dt = 0,
\]
\[
dG_2(t) = \hat X(t^-)\Big[b_0(t)\,dt + \sigma_0(t)\,dB(t) + \int_{\mathbb R_0}\gamma_0(t,z)\,\tilde N(dt,dz)\Big] = \frac{\hat X(t^-)}{S(t^-)}\,dS(t).
\]

We conclude that the optimal scenario control is to choose θ̂ = (θ̂_0, θ̂_1) such that (3.8) holds, i.e., such that Q_θ̂ is an ELMM for S(t). The corresponding optimal portfolio is to choose π̂ = 0 (no money in the risky asset). This is intuitively reasonable, because if the price process is a martingale, there is no money to be gained by investing in this asset. Finally, to find the optimal consumption rate ĉ(t,x) we combine (3.7) and (3.10): from (3.7) and (3.3) we have

\[
\hat c(t,x) = I\Big(t,\frac{\partial\psi}{\partial x}(t,x)\Big).
\tag{3.11}
\]

Differentiating (3.10) we therefore get

\[
\frac{\partial\psi}{\partial t}(t,x) = -U_1\Big(t,\,I\Big(t,\frac{\partial\psi}{\partial x}(t,x)\Big)\Big) + I\Big(t,\frac{\partial\psi}{\partial x}(t,x)\Big)\frac{\partial\psi}{\partial x}(t,x);\quad t<T.
\tag{3.12}
\]


This is a nonlinear first-order partial differential equation in ψ(t,x). Together with the terminal value (obtained from (3.10))

\[
\psi(T,x) = U_2(x),
\tag{3.13}
\]

this determines ψ(t,x) uniquely. To summarize, we have proved

Theorem 3.1. The optimal (i.e., worst case) scenario control for the problem (3.6) is to choose θ* = (θ̂_0, θ̂_1) such that (3.8) holds, which is equivalent to saying that the measure Q_θ̂ is an ELMM for the price process S(t) given by (3.1). The optimal portfolio under this scenario is to choose π̂ = 0 (no money in the risky asset). The optimal consumption rate ĉ(t,x) under this scenario is given by (3.11), i.e.,

\[
\frac{\partial U_1}{\partial c}(t,\hat c(t,x)) = \frac{\partial\psi}{\partial x}(t,x),
\tag{3.14}
\]

where ψ(t,x) is the solution of (3.12)–(3.13). The corresponding value function is, by (2.15),

\[
\Phi(t,k,x) = k\,\psi(t,x).
\tag{3.15}
\]

This is an extension of the result in [ØS2].

A special case. To illustrate the content of Theorem 3.1, we consider the special case when U_1 and U_2 are logarithmic utility functions, i.e., U_1(s,c) = U_1(c) = ln c; U_2(x) = a ln x, where a > 0 is constant. Then U_1'(c) = 1/c and, by (3.3), I(x) = 1/x. Therefore equation (3.12) gets the form

\[
\frac{\partial\psi}{\partial t}(t,x) = \ln\Big(\frac{\partial\psi}{\partial x}(t,x)\Big) + 1.
\tag{3.16}
\]

Set h(t,x) = ψ(t,x) − t. Then

\[
\frac{\partial h}{\partial t}(t,x) = \ln\Big(\frac{\partial h}{\partial x}(t,x)\Big).
\tag{3.17}
\]

Set H(t,x) = ∂h/∂x(t,x). Then H satisfies the nonlinear PDE

\[
H(t,x)\frac{\partial H}{\partial t}(t,x) = \frac{\partial H}{\partial x}(t,x);\quad t\le T.
\tag{3.18}
\]

We assign the terminal condition H(T,x) = a/x and try to solve this equation by setting H(t,x) = H_1(t)·(1/x).

(3.18)

188

B. Øksendal and A. Sulem

Substituting into (3.18) gives H1′ (t) = −1.

Since H1 (T ) = λ, this gives the solution H(t, x) =

T −t+a x

and hence h(t, x) = (T − t + a) ln x + C(t)

for some function C(t). This gives

ψ(t, x) = (T − t + a) ln x + t + C(t) and hence

∂ψ (t, x) = − ln x + C ′ (t) + 1 ∂t while

∂ψ (t, x) = ln(T − t + a) − ln x. ln ∂x Using (3.17), we get C ′ (t) = ln(T − t + a) − 1

or Hence Requiring

C(t) = −(T − t + a) ln(T − t + a) + (T − t + a) − t + C0 ψ(t, x) = (T − t + a)[ln x − ln(T − t + a) + 1] + C0 . ψ(T, x) = U2 (x) = a ln x

leads to the condition Hence the solution is

C0 = a ln a − a.

ψ(t, x) = (T − t + a)[ln x − ln(T − t + a)] + a ln a + T − t

(3.19)

and the optimal consumption rate is then x . (3.20) T −t+a In particular, we see that the consumption increases with time, which makes sense, because a large early consumption reduces the growth for the whole remaining time period. cˆ(t, x) =

Acknowledgment The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. [228087].

Robust Stochastic Control

189

References [AØ]

[BMS]

[HS] [MØ] [ØS1] [ØS2]

[ØS3]

T.T.K. An and B. Øksendal: A maximum principle for stochastic differential games with partial information. J. Optim. Theory and Appl. 139 (2008), 463– 483. G. Bordigoni, A. Matoussi and M. Schweizer: A stochastic control approach to a robust utility maximization problem. In F.E. Benth et al. (editors): Stochastic Analysis and Applications. Proceedings of the Second Abel Symposium, Oslo 2005. Springer 2007, pp. 135–151. L.P. Hansen and T.J. Sargent: Robust control and model uncertainty. The American Economic Review, Vol. 91, no. 2, (2001), pp. 60–66. S. Mataramvura and B. Øksendal: Risk minimizing portfolios and HJBI equations for stochastic differential games. Stochastics 80 (2008), 317–337. B. Øksendal and A. Sulem: Applied Stochastic Control of Jump Diffusions. Second Edition, Springer 2007. B. Øksendal and A. Sulem: A game theoretic approach to martingale measures in incomplete markets. Eprint, Dept. of Mathematics, University of Oslo 24/2006. Survey of Applied and Industrial Mathematics (TVP Publishers, Moscow), 15, (2008), 18–24. B. Øksendal and A. Sulem: Portfolio optimization under model uncertainty and BSDE games. Eprint, University of Oslo, Dept. of Mathematics, No. 1, 2011. Submitted to Quantitative Finance.

Bernt Øksendal Center of Mathematics for Applications (CMA) Dept. of Mathematics University of Oslo P.O. Box 1053 Blindern N–0316 Oslo, Norway and Norwegian School of Economics and Business Administration Helleveien 30 N–5045 Bergen, Norway e-mail: [email protected] Agn`es Sulem INRIA Paris-Rocquencourt Domaine de Voluceau BP 105 F-78153 Le Chesnay Cedex, France e-mail: [email protected]

Progress in Probability, Vol. 65, 191–205 c 2011 Springer Basel AG 

Multi-valued Stochastic Differential Equations Driven by Poisson Point Processes Jiagang Ren and Jing Wu Abstract. We prove the existence and uniqueness of solutions of multi-valued stochastic differential equations driven by Poisson point processes when the domain of the multi-valued maximal monotone operator is the whole space Rd . Mathematics Subject Classification (2000). 60H10, 34F05,47H04. Keywords. Multi-valued stochastic differential equation; Skorohod problem; L´evy processes; Maximal monotone operator; random time change; Helly’s first theorem.

1. Introduction Multi-valued stochastic differential equations driven by Brownian motions have been the object of several works (see, e.g., [5, 6, 13, 11]). In this paper we consider the following multi-valued stochastic differential equation (MSDE in short) of jump type: dXt ∈ − A(Xt )dt + b(Xt )dt + σ(Xt )dBt   ˜ + f (Xt− , u)N (dt, du) + g(Xt− , u)N (dt, du) U0

(1)

U\U0

where A is a multi-valued maximal monotone operator defined on (some domain of) Rd and B is an r-dimensional Brownian motion, N is the counting measure of a stationary Poisson point process with characteristic measure ν on some mea˜ (dt, du) is the compensator of N and U0 ∈ BU with surable space (U, BU ), N ν(U \ U0 ) < ∞. If no point process is involved, the existence and uniqueness of a solution to (1) are proved in [5, 6, 13]. Note that for a multi-valued operator A whose domain is not necessarily the whole space, the continuity of the sample paths Research supported by NSFC (Grant no. 10871215), Doctor Fund of Ministry of Education (Grant no. 20100171110038), and by the Plan of Yat-sen Innovative Talents (2009).


of the processes under consideration plays an indispensable role in proving the existence of solutions. In the jump case, however, it is easy to imagine, and a trivial example shows, that the equation may have no solution if the domain of A is not the whole space (see the remark after Proposition 3.10). So, for simplicity and for putting the emphasis on the general principle, we shall assume throughout the paper that A is defined on the whole space, leaving the troublesome problem of determining the minimal domain to a forthcoming paper.

2. Preliminaries and main result

Let (U, B_U, ν) be a σ-finite measure space and (Ω, F, F_t, P) a filtered probability space on which an r-dimensional Brownian motion B = (B(t)) and a stationary Poisson point process p = (p(t)) on (U, B_U) with Lévy measure ν are defined. Let N(dt,du) be the counting measure of p and set

\[
\tilde N(dt,du) := N(dt,du) - dt\,\nu(du).
\]

Let R^d be the d-dimensional Euclidean space.

Definition 2.1. By a multi-valued operator on R^d we mean an operator A from R^d to 2^{R^d}. We set

\[
D(A) := \{x\in\mathbb R^d : A(x)\neq\emptyset\},\qquad
\mathrm{Gr}(A) := \{(x,y)\in\mathbb R^{2d} : x\in\mathbb R^d,\ y\in A(x)\}.
\]

(1) A multi-valued operator A is called monotone if

\[
\langle y_1-y_2,\,x_1-x_2\rangle\ge0,\quad\forall (x_1,y_1),(x_2,y_2)\in\mathrm{Gr}(A).
\]

(2) A monotone operator A is called maximal monotone if and only if

\[
(x_1,y_1)\in\mathrm{Gr}(A)\ \Leftrightarrow\ \langle y_1-y_2,\,x_1-x_2\rangle\ge0,\quad\forall (x_2,y_2)\in\mathrm{Gr}(A).
\]

We collect here some facts about maximal monotone operators which will be used in the paper. For more details, we refer to [2].

Proposition 2.2. Let A be a maximal monotone operator on R^d. Then:

(i) Int(D(A)) and \(\overline{D(A)}\) are convex subsets of R^d and Int(D(A)) = Int(\(\overline{D(A)}\)).

(ii) For each x ∈ D(A), A(x) is a closed and convex subset of R^d. Let A°(x) := proj_{A(x)}(0) be the minimal section of A, where proj_D is designated as the projection on every closed and convex subset D of R^d and proj_∅(0) = ∞. Then

\[
x\in D(A)\ \Leftrightarrow\ |A^\circ(x)|<+\infty.
\]

(iii) The resolvent operator J_n := (I + (1/n)A)^{-1} is a single-valued and contractive operator defined on R^d and valued in D(A). Moreover, as n → ∞, J_n(x) → proj_{\(\overline{D(A)}\)}(x).

(iv) The Yosida approximation A_n := n(I − J_n) is a single-valued, maximal monotone and Lipschitz continuous function with Lipschitz constant n. Moreover, as n ↑ ∞,

\[
A_n(x)\to A^\circ(x)\ \text{ and }\ |A_n(x)|\uparrow|A^\circ(x)|\quad\text{if }x\in D(A);\qquad
|A_n(x)|\uparrow+\infty\quad\text{if }x\notin D(A).
\]

x∈ / D(A).

The following definition is taken from [5, 6]. Definition 2.3. If X, K are (Ft )-adapted processes, X is c`adl` ag and K is continuous. Then the pair (X, K) is called a strong solution of (1) if (i) X0 = x0 and X(t) ∈ D(A) for every 0  t < +∞; (ii) K is of locally finite variation and K0 = 0 a.s.;  ˜ (dt, du) (iii) dXt = b(Xt )dt + σ(Xt )dBt + U0 f (Xt− , u)dtN  + U\U0 g(Xt− , u)N (dt, du) − dKt , 0  t < ∞, a.s.; (iv) For any c` adl` ag and (Ft )-adapted functions (α, β) with (αt , βt ) ∈ Gr(A),

∀t ∈ [0, +∞),

the measure Xt − αt , dKt − βt dt  0

a.s.

Remark. The attribute “strong” is used in the usual sense signifying the solution is an adapted function of the driving L´evy process. We make the following assumptions: (H1) There exists a U0 ∈ BU such that ν(U \ U0 ) < ∞. (H2) (Lipschitz continuity) There exists a constant C1 such that for any y1 , y 2 ∈ Rd , y1 − y2 , b(y1 ) − b(y2 ) + y1 − y2 , σ(y1 ) − σ(y2 ) < =  + y 1 − y2 , (f (y1 , u) − f (y2 , u))ν(du)  C1 |y1 − y2 |2 . U0

(H3) (Linear growth) There exists C2 > 0 such that for any y ∈ Rd ,  |b(y)|2 + σ(y) 2 + |f (y, u)|2 ν(du)  C2 (1 + |y|2 ). U0

(H4) A is a maximal monotone operator with D(A) = Rd . We can now state our main result. Theorem 2.4. Under the above assumptions, Equation (1) admits a unique solution. The next section is devoted to the proof of this result.

194

J. Ren and J. Wu

3. Proof of Theorem 2.4 The following result is a slight improvement of [6, Lemma 4.1]. Proposition 3.1. Let A be a multi-valued maximal monotone operator, (X, K) and (X ′ , K ′ ) be c` adl` ag functions with X, X ′ ∈ D(A), K, K ′ of finite variation. If for any c` adl` ag functions α ∈ D(A) and βt := A0 (α(t)) satisfying  t |βs |ds < ∞, ∀t  0, 0

it holds that

Xt − αt , dKt − βt dt  0, Xt′ − αt , dKt′ − βt dt  0,

then

Xt − Xt′ , dKt − dKt′   0. Note that if α(v) is a c`adl` ag function and α(v) ∈ D(A) for all v ∈ R+ , then it is easy to see that β(v) := A0 (α(v)) is Lebesgue measurable. In fact, this can be seen from the facts that A0 (α(v)) = lim An (α(v)), ∀v  0 n→∞

and v → An (α(v)) is c` adl`ag, hence Lebesgue measurable. Hence the integral in the above proposition makes sense. Note also that any c` adl` ag function is bounded on bounded intervals. Proof. The proof is based on the following two properties of maximal monotone operators: (i) n1 An (x) → 0 (n → ∞), ∀x ∈ D(A); (ii) y, A0 (x)  |A0 (x)|2 , ∀(x, y) ∈ Gr(A). For (ii) see [2, p. 77]; (i) holds since An = n(I − Jn ) and Jn (x) → projD(A) (x) for all x. Set Xv + Xv′ αn (v) = Jn ( ), β n (v) = A0 (αn (v)). 2 Then αn is c` adl` ag. Thus for any s < t,  t Xv − αn (v), dKv − β n (v)dv  0, s  t Xv′ − αn (v), dKv′ − β n (v)dv  0. s

1 A , n n

we have Note that Jn = I −

=  t< Xv − Xv′ 1 Xv + Xv′ n + An , dKv − β (v)dv  0; 2 n 2 s

Multi-valued Stochastic Differential Equations  t< s

1 Xv′ − Xv + An 2 n



Xv + Xv′ 2



, dKv′

n

− β (v)dv

Summing them we get  t  < 1 t 1 Xv Xv − Xv′ , dKv − dKv′   − An n s s 2  < Xv 2 t + An n s

+ Xv′ 2

=

195  0.



, dKv′ + dKv

= + Xv′ , β n (v) dv. 2

=

By (ii) the second term is positive since An (x) ∈ AJn (x) for all x. Since |An (x)|  cn|x| for some constant c independent of n, the first term goes to 0 as n → ∞ in virtue of (i) and the dominated convergence theorem. Hence we get the result.  We can deduce from this immediately the following corollary. Corollary 3.2. In Definition 2.3, (iv) can be replaced by the following equivalent condition (iv′ ) For any c` adl` ag and (Ft )-adapted function α with αt ∈ D(A),

and



t

0

we have

∀t ∈ [0, +∞),

|A0 (αs )|ds < ∞, ∀t  0,

Xt − αt , dKt − A0 (αt )dt  0

a.s.

The rest of the section is devoted to the proof of Theorem 2.4. We first consider the following equation:    dZt ∈ −A(Zt )dt + b(Zt )dt + σ(Zt )dBt + ˜ (dt, du), f (Zt− , u)N (2) U0  Z 0 = x0 . where U0 ∈ B(U) such that ν(U \ U0 ) < ∞. We have

Theorem 3.3. Under (H1)–(H4), Equation (2) has a unique strong solution. 3.1. Proof of Theorem 3.3: Uniqueness Suppose (Z, K) and (Z ′ , K ′ ) are solutions. By Itˆ o’s formula,  t  t Zs − Zs′ , σ(Zs ) − σ(Zs′ )dBs |Zt − Zt′ |2 = 2 Zs − Zs′ , b(Zs ) − b(Zs′ )ds + 2 0 0  t  t+  ′ ˜ (ds, du) −2 Zs − Zs′ , dKs − dKs′  + (f (Zs− , u) − f (Zs− , u))2 N 0

+2



0

t+

′ Zs− − Zs− ,



0

U0

U0

′ ˜ (ds, du) (f (Zs− , u) − f (Zs− , u))N

196

J. Ren and J. Wu +



0

t

σ(Zs ) −

σ(Zs′ ) 2 ds

+

 t 0

U0

′ (f (Zs− , u) − f (Zs− , u))2 dsν(du)

Taking expectations on both sides, we have by (H1) and (H2)  t  t E|Zt − Zt′ |2  E Zs − Zs′ , b(Zs ) − b(Zs′ )ds + E σ(Zs ) − σ(Zs′ ) 2 ds 0

+E  CE

 t

0  t 0

0+

U0

′ (f (Zs− , u) − f (Zs− , u))2 ν(du)ds

|Zs − Zs′ |2 ds.

Gronwall’s lemma yields that E|Zt − Zt′ |2 = 0 and therefore we get the uniqueness. 3.2. Proof of Theorem 3.3: Existence We shall denote by D the space of c` adl` ag increasing functions f on [0, ∞) such that f (0) = 0. We equip it with the topology of convergence at the continuity points of the target function. Then it is a Polish space with the metric  ∞ e−t [1 ∧ |f (t) − g(t)|]dt. dD (f, g) := 0

By Helly’s first theorem, for any positive increasing function C(t), the set {f : |f (t)|  C(t), ∀t > 0}

is compact in D. Hence we have

Lemma 3.4. If (ξn ) is a family of D-valued random variables with sup E|ξn (t)| < ∞, ∀t > 0, n

then (ξn ) is tight on D.

Let D([0, +∞); Rd ) (resp. C([0, +∞); Rd )) denote the space of c`adl` ag (resp. continuous) functions from [0, +∞) to Rd equipped with the Skorohod (resp. compact uniform convergence) topology. Denote by Ts [0, ∞) (resp. Tc [0, ∞)) the space of right-continuous and strictly increasing (resp. continuous and nondecreasing) functions a on [0, ∞) with a(0) = 0 and a(∞) = ∞. The following proposition is due to [9]. Proposition 3.5. Suppose {xn } ⊂ D([0, +∞); Rd ), {an } ⊂ Ts [0, ∞) and an (0) = 0, an (t) → ∞ when t → ∞. Let yn (t) = xn (a−1 n (t)). Suppose there exist y ∈ D([0, +∞); Rd ) and a ∈ Ts [0, ∞) such that d

yn → y,

n → ∞,

in D([0, +∞); R ) and an (t) → a(t) at each continuity point of a. Define x(t) = y(a(t)). If y is constant except for at most one jump on any interval [s, t] on which a−1 is constant, then xn → x in the Skorohod topology.

Multi-valued Stochastic Differential Equations

197

Consider the equation: dZtn = − An (Ztn )dt + b(Ztn )dt + σ(Ztn )dBt  n ˜ (dt, du); Z0n = x0 . + f (Zt− , u)N

(3)

U0

Since An is Lipschitz by Proposition 2.2, there exists a unique solution Z n to (3) under (H1)–(H4) by [7, Ch. 4, Th. 9.1]. Now we set  t n at = |An (Zvn )|dv + t; θtn = (ant )−1 ; 0  t  t n n n ˜ (dv, du); Mt = σ(Zv )dBv + f (Zv− , u)N 0

Ktn =



0

U0

t

0

An (Zvn )dv;

Ytn = Zθntn ;

Htn = Kθntn .

We need the lemma below from C´epa [5]: Lemma 3.6. ∃a0 ∈ Rd , r > 0, µ  0 such that for all n ∈ N, x ∈ Rd An (x), x − a0   r|An (x)| − µ|x − a0 | − rµ. Proposition 3.7. (an , M n , Y n , H n , θn )n is tight in D × D(R2d ) × C(Rd+1 ). Proof. (i) (H n , θn )n is tight in C(Rd+1 ) since for any n |θtn − θsn |  |t − s| and

(4)

   n    t A (Y n )  θt   n v n ′ n n  dv  An (Zv′ − )dv  =  |Ht − Hs | =  n   θsn s 1 + |An (Yv )|   t n  An (Yv )    dv  |t − s|.   n  s 1 + |An (Yv )|

(5)

(ii) Define T^n_N = inf{t ≥ 0 : |Z^n_t| ≥ N or |Z^n_{t−}| ≥ N}. By Itô's formula and Lemma 3.6,

  (1/2)|Z^n_{t∧T^n_N} − a_0|^2
  = (1/2)|x_0 − a_0|^2 + ∫_0^{t∧T^n_N} ∫_{U_0} ⟨Z^n_{s−} − a_0, f(Z^n_{s−}, u)⟩ Ñ(ds, du)
   + (1/2) ∫_0^{t∧T^n_N} ∫_{U_0} (f(Z^n_{s−}, u))^2 Ñ(ds, du) + ∫_0^{t∧T^n_N} ⟨Z^n_s − a_0, b(Z^n_s)⟩ ds
   + ∫_0^{t∧T^n_N} ⟨Z^n_s − a_0, σ(Z^n_s) dB_s⟩ − ∫_0^{t∧T^n_N} ⟨Z^n_s − a_0, A_n(Z^n_s)⟩ ds
   + (1/2) ∫_0^{t∧T^n_N} ∫_{U_0} (f(Z^n_s, u))^2 ds ν(du) + (1/2) ∫_0^{t∧T^n_N} tr(σσ*)(Z^n_s) ds
  ≤ (1/2)|x_0 − a_0|^2 + ∫_0^{t∧T^n_N} ∫_{U_0} ⟨Z^n_{s−} − a_0, f(Z^n_{s−}, u)⟩ Ñ(ds, du)
   + (1/2) ∫_0^{t∧T^n_N} ∫_{U_0} (f(Z^n_{s−}, u))^2 Ñ(ds, du) + ∫_0^{t∧T^n_N} ⟨Z^n_s − a_0, b(Z^n_s)⟩ ds
   + ∫_0^{t∧T^n_N} ⟨Z^n_s − a_0, σ(Z^n_s) dB_s⟩ + (1/2) ∫_0^{t∧T^n_N} ∫_{U_0} (f(Z^n_s, u))^2 ds ν(du)
   + (1/2) ∫_0^{t∧T^n_N} tr(σσ*)(Z^n_s) ds − r ∫_0^{t∧T^n_N} |A_n(Z^n_s)| ds + C ∫_0^{t∧T^n_N} (1 + |Z^n_s − a_0|^2) ds.

By the BDG inequality,

  E[ sup_{0≤s≤t} |∫_0^{s∧T^n_N} ∫_{U_0} ⟨Z^n_{v−} − a_0, f(Z^n_{v−}, u)⟩ Ñ(dv, du)| ]
  ≤ C E[ ( ∫_0^{t∧T^n_N} ∫_{U_0} |Z^n_s − a_0|^2 ‖f(Z^n_s, u)‖^2 ν(du) ds )^{1/2} ]
  ≤ C E[ ( ∫_0^{t∧T^n_N} |Z^n_s − a_0|^2 (1 + |Z^n_s|^2) ds )^{1/2} ]
  ≤ (1/4) E[ sup_{0≤s≤t} |Z^n_{s∧T^n_N} − a_0|^2 ] + C E ∫_0^{t∧T^n_N} |Z^n_v − a_0|^2 dv + Ct,

where we used the inequality |xy| ≤ (1/9)|x|^2 + 9|y|^2. Similarly,

  E[ sup_{0≤s≤t} |∫_0^{s∧T^n_N} ⟨Z^n_v − a_0, σ(Z^n_v) dB_v⟩| ]
  ≤ (1/4) E[ sup_{0≤s≤t} |Z^n_{s∧T^n_N} − a_0|^2 ] + C E ∫_0^{t∧T^n_N} |Z^n_v − a_0|^2 dv + Ct,

  E[ sup_{0≤s≤t} |∫_0^{s∧T^n_N} ∫_{U_0} (f(Z^n_{v−}, u))^2 Ñ(dv, du)| ]
  ≤ (1/4) E[ sup_{0≤s≤t} |Z^n_{s∧T^n_N} − a_0|^2 ] + C E ∫_0^{t∧T^n_N} |Z^n_s − a_0|^2 ds + Ct.

It is easy to prove that the expectations of the other terms are bounded by

  C E ∫_0^{t∧T^n_N} |Z^n_s − a_0|^2 ds.   (6)


Combining all these inequalities we obtain finally

  E[ sup_{0≤s≤t} |Z^n_{s∧T^n_N} − a_0|^2 ] ≤ C(t + 1) + C ∫_0^t E[ sup_{0≤v≤s} |Z^n_{v∧T^n_N} − a_0|^2 ] ds.

Then Gronwall's lemma yields that

  E[ sup_{0≤s≤t} |Z^n_{s∧T^n_N} − a_0|^2 ] ≤ C(t) e^{Ct}.

Letting N → ∞ we have

  E[ sup_{0≤s≤t} |Z^n_s − a_0|^2 ] ≤ C(t).   (7)

Substituting this into (6) we obtain

  E ∫_0^{t∧T^n_N} |A_n(Z^n_v)| dv ≤ C E|Z^n_{t∧T^n_N} − a_0|^2.

Letting N → ∞ gives

  E ∫_0^t |A_n(Z^n_v)| dv ≤ C(t).

Hence for all n,

  E(a^n_t) ≤ C(t) < ∞.   (8)

This proves that (a^n)_n is tight in D by Lemma 3.4.

(iii) By (H2) and (ii),

  E|M^n_t − M^n_s|^2 = E|∫_s^t σ(Z^n_v) dB_v + ∫_s^t ∫_{U_0} f(Z^n_{v−}, u) Ñ(dv, du)|^2
  ≤ E|∫_s^t σ(Z^n_v) dB_v|^2 + E|∫_s^t ∫_{U_0} f(Z^n_{v−}, u) Ñ(dv, du)|^2
  ≤ E ∫_s^t ‖σ(Z^n_v)‖^2 dv + E ∫_s^t ∫_{U_0} ‖f(Z^n_v, u)‖^2 ν(du) dv
  ≤ C E ∫_s^t (1 + ‖Z^n_v‖^2) dv
  ≤ C(t)|t − s|.

Hence for 0 ≤ t_1 < t < t_2 < ∞, we have by the Markov property

  E{|M^n_t − M^n_{t_1}|^2 |M^n_{t_2} − M^n_t|^2} = E{E(|M^n_t − M^n_{t_1}|^2 |M^n_{t_2} − M^n_t|^2 | F_t)}
  = E{|M^n_t − M^n_{t_1}|^2 E(|M^n_{t_2} − M^n_t|^2 | F_t)}
  = E{|M^n_t − M^n_{t_1}|^2 E(|M^n_{t_2} − M^n_t|^2 | M^n_t)}
  ≤ C E{|M^n_t − M^n_{t_1}|^2 |t_2 − t|}
  ≤ C|t − t_1||t_2 − t| ≤ C|t_2 − t_1|^2.   (9)

For s < t we have by the BDG inequality

  E[|Z^n_{θ^n_t} − Z^n_{θ^n_s}|^2 | F_{θ^n_s}]
  = E[|∫_{θ^n_s}^{θ^n_t} b(Z^n_v) dv − H^n_t + H^n_s|^2 | F_{θ^n_s}] + E[|∫_{θ^n_s}^{θ^n_t} σ(Z^n_v) dB_v|^2 | F_{θ^n_s}]
   + E[|∫_{θ^n_s}^{θ^n_t} ∫_{U_0} f(Z^n_{v−}, u) Ñ(dv, du)|^2 | F_{θ^n_s}]
  ≤ C E[|θ^n_t − θ^n_s| | F_{θ^n_s}] ≤ C|t − s|.

Consequently for t_1 < t < t_2 we have

  E{|Y^n_{t_2} − Y^n_t|^2 |Y^n_t − Y^n_{t_1}|^2} = E[|Z^n_{θ^n_{t_2}} − Z^n_{θ^n_t}|^2 |Z^n_{θ^n_t} − Z^n_{θ^n_{t_1}}|^2]
  = E[|Z^n_{θ^n_t} − Z^n_{θ^n_{t_1}}|^2 E[|Z^n_{θ^n_{t_2}} − Z^n_{θ^n_t}|^2 | F_{θ^n_t}]]
  ≤ C|t_2 − t| E[|Z^n_{θ^n_t} − Z^n_{θ^n_{t_1}}|^2]
  ≤ C|t_2 − t||t − t_1| ≤ C|t_2 − t_1|^2.   (10)

Hence the tightness of (M^n, Y^n)_n in D(R^{2d}) follows from (9) and (10) by [3, Ch. 3] (or [10, Theorem 6.23]). The proof is now complete. □

Denote the law of (a^n, M^n, Y^n, H^n, θ^n) by P^n. Then by the proposition above, there exists a subsequence n_k such that

  P^{n_k} → P,  k → ∞.

By the Skorohod realization theorem, there exists a probability space (Ω̃, F̃, P̃) on which (ã^n, M̃^n, Ỹ^n, H̃^n, θ̃^n) and (ã, M̃, Ỹ, H̃, θ̃) are defined such that for all n,

  (ã^n, M̃^n, Ỹ^n, H̃^n, θ̃^n) has the same distribution as (a^n, M^n, Y^n, H^n, θ^n),  P̃^{(ã, M̃, Ỹ, H̃, θ̃)} = P,

and P̃-almost surely

  (ã^n, M̃^n, Ỹ^n, H̃^n, θ̃^n) → (ã, M̃, Ỹ, H̃, θ̃).   (11)

In particular, since H̃^n and θ̃^n are almost surely Lipschitz continuous with a common Lipschitz constant 1, so are H̃ and θ̃. Denote

  Z̃^n_t = Ỹ^n_{ã^n_t},  Z̃_t = Ỹ_{ã_t};  K̃^n_t = H̃^n_{ã^n_t},  K̃_t = H̃_{ã_t}.


Proposition 3.8. There exists Ω̃_1 ∈ F̃ with P̃(Ω̃_1) = 1 such that if ω̃ ∈ Ω̃_1 and there exist 0 ≤ s ≤ t < ∞ such that θ̃_s(ω̃) = θ̃_t(ω̃), then for all x ∈ R^d,

  ∫_s^t ⟨Ỹ_u(ω̃) − x, dH̃_u(ω̃)⟩ ≥ 0.

Proof. The proof is the same as in the continuous case given in [6]; we give the short argument here for completeness. Choose Ω̃_1 ⊂ Ω̃ such that P̃(Ω̃_1) = 1 and the convergence (11) holds on it. Since A_n is monotone,

  ∫_{θ̃^n_s}^{θ̃^n_t} ⟨Z̃^n_v − x, dK̃^n_v − A_n(x) dv⟩ ≥ 0.

Thus

  ∫_s^t ⟨Ỹ^n_v − x, dH̃^n_v⟩ ≥ ∫_s^t ⟨Ỹ^n_v − x, A_n(x)⟩ dθ̃^n_v.

Now

  |∫_s^t ⟨Ỹ^n_v − x, A_n(x)⟩ dθ̃^n_v| ≤ C( sup_{s≤v≤t} |Ỹ^n_v| + |x| ) |A°(x)| |θ̃^n_t − θ̃^n_s| ≤ C|θ̃^n_t − θ̃^n_s|,

and the result follows by letting n → ∞ and using (11). □

Proposition 3.9. There exists Ω̃_2 ∈ F̃ with P̃(Ω̃_2) = 1 such that for all ω̃ ∈ Ω̃_2, if there exist 0 ≤ s ≤ t < ∞ such that θ̃_s = θ̃_t, then

  Ỹ_v(ω̃) − Ỹ_s(ω̃) = H̃_s(ω̃) − H̃_v(ω̃),  ∀v ∈ [s, t].

Proof. Denote by Ω̃^n the subset of Ω̃ with P̃(Ω̃^n) = 1 on which the equation below holds:

  dZ̃^n_t = b(Z̃^n_t) dt + σ(Z̃^n_t) dB̃^n_t + ∫_{U_0} f(Z̃^n_{t−}, u) Ñ(dt, du) − dK̃^n_t.

Set Ω̃_2 = ∩_n Ω̃^n; then P̃(Ω̃_2) = 1 and on Ω̃_2 we have for v ∈ (s, t],

  Ỹ^n_v − Ỹ^n_s = ∫_{θ̃^n_s}^{θ̃^n_v} b(Z̃^n_h) dh + ∫_{θ̃^n_s}^{θ̃^n_v} σ(Z̃^n_h) dB̃^n_h + ∫_{θ̃^n_s}^{θ̃^n_v} ∫_{U_0} f(Z̃^n_{h−}, u) Ñ(dh, du) − (H̃^n_v − H̃^n_s)
  = ∫_{θ̃^n_s}^{θ̃^n_v} b(Z̃^n_h) dh + (M̃^n_{θ̃^n_v} − M̃^n_{θ̃^n_s}) − (H̃^n_v − H̃^n_s).


Since

  |∫_{θ̃^n_s}^{θ̃^n_v} b(Z̃^n_h) dh| ≤ ∫_{θ̃^n_s}^{θ̃^n_v} |b(Z̃^n_h)| dh ≤ C_2 ∫_{θ̃^n_s}^{θ̃^n_v} (1 + |Z̃^n_h|) dh ≤ C_2 |θ̃^n_v − θ̃^n_s|,

if θ̃_t = θ̃_s then, by the convergence (11),

  |∫_{θ̃^n_s}^{θ̃^n_v} b(Z̃^n_h) dh| ≤ C|θ̃^n_s − θ̃^n_v| → 0.

Since v ↦ M̃_{θ̃_v} is constant on [s, t], it is continuous at least on (s, t]. Hence, by the convergence M̃^n → M̃ in D(R^d) one knows that

  M̃^n_{θ̃^n_v} → M̃_{θ̃_v}

for v ∈ (s, t]. Thus

  M̃^n_{θ̃^n_v} − M̃^n_{θ̃^n_s} → 0,  ∀v ∈ (s, t].

Taking limits yields, for all v ∈ (s, t],

  Ỹ_v − Ỹ_s = H̃_s − H̃_v;

note that the equality holds of course for v = s, and the proof is therefore complete. □

Proposition 3.10. There exists Ω̃′ ∈ F̃ with P̃(Ω̃′) = 1 such that for all ω̃ ∈ Ω̃′, if there exist 0 ≤ s ≤ t < ∞ such that θ̃_s = θ̃_t, then Ỹ_v = Ỹ_t for v ∈ [s, t].

Proof. Take

  Ω̃′ = Ω̃_1 ∩ Ω̃_2;

then P̃(Ω̃′) = 1. If θ̃_s = θ̃_t then, since θ̃ is increasing, θ̃ is constant on [s, t]. Noticing that h ↦ H̃_h is Lipschitz continuous and hence absolutely continuous, by Propositions 3.8 and 3.9 we have for v ∈ [s, t],

  |Ỹ_t − Ỹ_v|^2 = |H̃_t − H̃_v|^2 ≤ 2 ∫_v^t ⟨H̃_h − H̃_v, dH̃_h⟩ = −2 ∫_v^t ⟨Ỹ_h − Ỹ_v, dH̃_h⟩ ≤ 0.

Thus Ỹ_v = Ỹ_t. □

Remark. Under (H4), we do not need to prove that Y(t) ∈ D̄(A) a.s. for all t ∈ [0, ∞). Otherwise it would be a delicate task, since after the time change the càdlàg function Y(t) may jump out of the domain.

By Proposition 3.5 and Proposition 3.10, we know that (Z̃^n, K̃^n) → (Z̃, K̃) in the Skorohod topology. Since, relativized to the space C of continuous functions, the Skorohod topology coincides with the compact uniform topology, we have

Proposition 3.11. (Z̃^n, K̃^n) → (Z̃, K̃) in D(R^d) × C(R^d).


To proceed we need the following lemma (see [8, Lemma 2]):

Lemma 3.12. Let f^n, g^n ∈ D(R^d) with sup_n |g^n|_T < +∞ and (f^n, g^n) → (f, g) in D(R^{2d}). Then

  ( f^n, g^n, ∫_0^· f^n_s dg^n_s ) → ( f, g, ∫_0^· f_s dg_s )  in D(R^{2d+1}).

We need to prove that on the space (Ω̃, F̃, P̃), (Z̃, K̃) solves the following equation:

  dZ̃_t ∈ −A(Z̃_t) dt + b(Z̃_t) dt + σ(Z̃_t) dB̃_t + ∫_{U_0} f(Z̃_{t−}, u) Ñ(dt, du),  Z̃_0 = x_0.   (12)

Proposition 3.13. Z̃ satisfies the following equation:

  dZ̃_t = b(Z̃_t) dt + σ(Z̃_t) dB̃_t + ∫_{U_0} f(Z̃_{t−}, u) Ñ(dt, du) − dK̃_t.   (13)

Proof. Since Z̃^n solves

  dZ̃^n_t = b(Z̃^n_t) dt + σ(Z̃^n_t) dB̃^n_t + ∫_{U_0} f(Z̃^n_{t−}, u) Ñ(dt, du) − dK̃^n_t,

˜ n ) → (Z, ˜ K) ˜ on D(Rd ) × C(Rd ). Then the result follows by letting and (Z˜ n , K n → ∞ and using the same argument as in [12] by using Lemma 3.12.  ˜ K) ˜ is a solution, it remains to verify (iv) in DefiniFinally, to prove that (Z, tion 2.3. To this end, by Corollary 3.2, it suffices to prove Proposition 3.14. For any adapted c` adl` ag function αt such that αs ∈ D(A),

and



0

we have

t

∀s ∈ [0, +∞)

|A0 (αs )|ds < ∞, ∀t  0,

˜ s − A0 (αs )ds  0. Z˜s − αs , dK Proof. Since An is monotone,  t ˜ v − An (αv )dv  0. Z˜vn − αv , dK s

From the proof of Proposition 3.7 we know that ˜n − H ˜ n |  |t − s|. an  C(t) a.s.; |H t

t

s

Thus ˜ n |t  |H n |an  ant  C(t, ω |K ˜ ) < ∞. t

(14)

204

J. Ren and J. Wu

By Lemma 3.6, Proposition 3.11 we have by letting n → ∞,  t  t ˜ n → ˜ v ; Z˜vn − αv , dK Z˜v − αv , dK v s s  t  t Z˜vn − αv , An (αv )dv → Z˜v − αv , A◦ (αv )dv. s

(15) (16)

s

Combining (14) and (15) gives  t ˜ v − A◦ (αv )dv  0. Z˜v − αv , dK



s

˜ K) ˜ is a weak solution of Equation (2). By Subsection 3.1, the Hence (Z, pathwise uniqueness for this equation holds. Consequently, there is a unique strong solution to Equation (2) by Yamada-Watanabe theorem and the proof is complete. 3.3. Proof of Theorem 1.1 Now that Theorem 3.3 is proved, to prove Theorem 2.4 it suffices to patch up together the unique solutions in different intervals between big jumps. The procedure is standard as in, e.g., [7, 1] and we omit the details. Acknowledgment We thank Xicheng Zhang heartily for his very carefully reading the manuscript and for helpful discussions. We also thank the anonymous referees for their useful suggestions.

References
[1] Applebaum, D.: Lévy Processes and Stochastic Calculus. Cambridge University Press, U.K., 2004.
[2] Barbu, V.: Nonlinear Semigroups and Differential Equations in Banach Spaces. Noordhoff International Publishing, Leyden, The Netherlands, 1976.
[3] Billingsley, P.: Convergence of Probability Measures. John Wiley and Sons, Inc., New York, 1968.
[4] Carothers, N.L.: Real Analysis. Cambridge Univ. Press, 2000.
[5] Cépa, E.: Équations différentielles stochastiques multivoques. Lect. Notes in Math., Sém. Prob. XXIX, 86–107 (1995).
[6] Cépa, E.: Problème de Skorohod multivoque. Ann. Probab. 26, No. 2, 500–532 (1998).
[7] Ikeda, N. and Watanabe, S.: Stochastic Differential Equations and Diffusion Processes. 2nd ed., Kodansha, Tokyo/North-Holland, Amsterdam, 1989.
[8] Jakubowski, A., Mémin, J. and Pagès, G.: Convergence en loi des suites d'intégrales stochastiques sur l'espace D^1 de Skorohod. Probab. Th. Rel. Fields 81, No. 2, 111–137 (1989).
[9] Kurtz, T.: Random time changes and convergence in distribution under the Meyer-Zheng conditions. Ann. Probab. 19, No. 3, 1010–1034 (1991).


[10] Qian, M. and Gong, G.: Theory of Stochastic Processes. Beijing Univ. Press, 1997 (in Chinese).
[11] Ren, J. and Xu, S.: A transfer principle for multivalued stochastic differential equations. J. Funct. Anal. 256, No. 9, 2780–2814 (2009).
[12] Skorohod, A.V.: Studies in the Theory of Random Processes. Addison-Wesley, Reading, Massachusetts, 1965.
[13] Zhang, X.: Skorohod problem and multivalued stochastic evolution equations in Banach spaces. Bull. Sci. Math. 131, 175–217 (2007).

Jiagang Ren and Jing Wu
School of Mathematics and Computational Science
Sun Yat-sen University
Guangzhou, Guangdong 510275, P.R. China
e-mail: [email protected]
[email protected]

Progress in Probability, Vol. 65, 207–219 © 2011 Springer Basel AG

Sensitivity Analysis for Jump Processes

Atsushi Takeuchi

Abstract. Consider stochastic differential equations with jumps. The goal of this paper is to study the sensitivity of the solution with respect to the initial point, under conditions on the Lévy measure and a uniformly elliptic condition on the coefficients. The key tool is the martingale property based upon the Kolmogorov backward equation for the infinitesimal generator associated with the equation.

Mathematics Subject Classification (2000). 60H30, 60J75, 60H07.

Keywords. Heat kernel, jump processes, logarithmic derivatives, stochastic differential equations.

1. Introduction

The Malliavin calculus helps us to study the existence of smooth densities for solutions to stochastic differential equations (SDEs). It is well known that the Hörmander type conditions on the coefficients imply the negative-moment integrability of the Malliavin covariance matrix, and that the Sobolev inequality leads to the existence of the smooth density. See [16, 17, 18]. In this paper, we shall focus on the sensitivity analysis for the solution to a jump-type SDE with respect to the initial point, which is equivalent to the Greeks (Delta) computations for payoff functions of asset price models in mathematical finance. There have been a lot of studies in the case of processes without jumps, using the Malliavin calculus on the Wiener space ([12, 17]), the Girsanov transform of Brownian motions ([3]), and the Kolmogorov backward equation for the infinitesimal generator associated with the diffusion process ([11]). On the other hand, a similar problem has been discussed for jump-type financial models. Cass and Friz [7], and Davis and Johansson [8], studied jump diffusion processes, but their approach does not capture any effect coming from the jumps. Bally et al. [2], El-Khatib and Privault [9], and Elliott and Tsoi [10] applied a calculus focused on the jump term effects to the Greeks computations. The author in [19] studied the same problem for solutions to SDEs with jumps via the martingale method


by using the Kolmogorov backward equation for the integro-differential operator associated with the equation. The results obtained in [19] reflect the effects from the diffusion and the jump coefficients. Moreover, the process can be of pure jump type and of infinite activity type.

This paper is organized as follows: In Section 2, we shall prepare some notations and introduce our framework. Furthermore, the main result on the sensitivity with respect to the initial point will also be stated there. Section 3 is devoted to a sketch of the proof, and we shall give some examples in Section 4.

2. Preliminaries and main result

At the beginning, we shall prepare some notations. Fix T > 0, and let (Ω, F, P) be our underlying probability space. Denote by E[·] the expectation with respect to the measure dP, and by I_K the indicator function for K ⊂ R. The script "0" of C_0^2(R), etc. indicates compact support, and the script "b" of C_b(R), etc. indicates boundedness. Let C^∞_{1+,b}(R) be the family of C^∞(R)-functions such that all derivatives of order at least 1 are bounded. For an R-valued function f(t, y, θ), denote the derivative in t by ∂_t f(t, y, θ), the one in y by f′(t, y, θ), and the one in θ by ∂_θ f(t, y, θ). Denote by c_i (i = 0, 1, 2, ...) different positive finite constants.

Write R_0 = R \ {0}, and let dν be a Lévy measure over R_0. Throughout the paper, suppose that

(i) (the moment condition): for any p ≥ 1,

  ∫_{|θ|≤1} |θ| dν + ∫_{|θ|>1} |θ|^p dν < +∞;   (2.1)

(ii) (the order condition): there exists α > 0 such that

  liminf_{ρ↘0} ρ^α ∫_{R_0} ( (θ/ρ)^2 ∧ 1 ) dν > 0;   (2.2)

(iii) the measure dν has a C^1-density g(θ) with respect to the Lebesgue measure over R_0 such that

  lim_{|θ|↗+∞} θ^2 g(θ) = 0.   (2.3)

Let W = {W_t ; t ∈ [0, T]} be a one-dimensional Brownian motion with W_0 = 0, and dJ a Poisson random measure on [0, T] × R_0 with intensity dĴ = dt dν. Denote by {F_t ; t ∈ [0, T]} the augmented filtration generated by W and dJ. In order to avoid lengthy expressions, we shall use the notations:

  dJ̃ = dJ − dĴ,  dJ̄ = I_(|θ|≤1) dJ̃ + I_(|θ|>1) dJ.


Let A_i(y) ∈ C^∞_{1+,b}(R) (i = 0, 1), and let B_θ(y) be in C^{∞,∞}_{1+,b}(R × R_0) as a function of (y, θ). Suppose that the functions A_1(y) and B_θ(y) satisfy

  inf_{y∈R} inf_{θ∈R_0} |1 + B′_θ(y)| > 0,  lim_{|θ|↘0} B_θ(y) = 0,   (2.4)
  inf_{y∈R} (A_1(y))^2 > 0,  inf_{y∈R} inf_{θ∈R_0} (∂_θ B_θ(y))^2 > 0.   (2.5)

For x ∈ R, consider the jump process X = {X_t ; t ∈ [0, T]} determined by the SDE of the form: X_0 = x and

  dX_t = A_0(X_t) dt + A_1(X_t) ∘ dW_t + ∫_{R_0} B_θ(X_{t−}) dJ̄,   (2.6)

where the symbol "∘ dW_t" indicates the Stratonovich stochastic integral. From the conditions on the coefficients, there exists a unique solution to the equation (2.6). The process X is Markovian, and the associated infinitesimal generator is the integro-differential operator of the form:

  Lf(y) = A_0 f(y) + (1/2) A_1 A_1 f(y) + ∫_{R_0} { B̄_θ f(y) − I_(|θ|≤1) B_θ f(y) } dν

for f ∈ C_0^2(R), where A_i f(y) = A_i(y) f′(y) (i = 0, 1) and B_θ f(y) = B_θ(y) f′(y) are vector fields over R, A_1 A_1 f(y) = A_1(y) (A_1(y) f′(y))′, and B̄_θ f(y) = f(y + B_θ(y)) − f(y).

The Kolmogorov continuity criterion on random fields implies that, for each t ∈ [0, T], the function R ∋ x ↦ X_t ∈ R has a C^1-modification (cf. [13]). Moreover, the derivative Z_t = ∂_x X_t is invertible a.s., and the process Z = {Z_t ; t ∈ [0, T]} satisfies the linear SDE: Z_0 = 1 and

  dZ_t = A′_0(X_t) Z_t dt + A′_1(X_t) Z_t ∘ dW_t + ∫_{R_0} B′_θ(X_{t−}) Z_{t−} dJ̄.   (2.7)

Now, we shall briefly recall the criterion for the existence of a smooth density for the random variable X_T stated in [16, 18]. Write

  B̃_θ(y) = ( 1 + B′_θ(y) )^{−1} ∂_θ B_θ(y) θ,

and let 0 < ρ < 1. Since

  ( B̃_θ(y)/ρ )^2 ≥ c_0 ( ∂_θ B_θ(y) θ/ρ )^2 ≥ c_1 ( θ/ρ )^2

from B_θ(y) ∈ C^{∞,∞}_{1+,b}(R × R_0) and the condition (2.5), we see that

  inf_{y∈R} [ ( A_1(y)/ρ )^2 + ∫_{R_0} ( ( B̃_θ(y)/ρ )^2 ∧ 1 ) dν ]
  ≥ c_2 [ ρ^{−2} + ∫_{R_0} ( (θ/ρ)^2 ∧ 1 ) dν ]
  ≥ c_3 ρ^{−γ}

by the condition (2.2), where γ = 2 ∧ α. Then, using the result in [16, 18], there exists a smooth density p_T(x; y) for the probability law of the random variable X_T


with respect to the Lebesgue measure on R. The goal of the present paper is to study the sensitivity of the solution X_T in the initial point x ∈ R.

In order to state the main result on the sensitivity of X_T with respect to x, we shall prepare some notation. Write

  Ã_T = ∫_0^T ∫_{R_0} θ^2 dJ,

and define the family of R-valued functions by

  F = { Σ_{k=1}^n α_k f_k I_{A_k} ; α_k ∈ R, f_k ∈ C_LG(R), A_k ⊂ R : interval },

where C_LG(R) is the set of continuous functions ψ with the linear growth condition |ψ(y)| ≤ c_4 (1 + |y|).

Theorem 1. For φ ∈ F, it holds that

  ∂_x (E[φ(X_T)]) = E[ φ(X_T) ( (L_T − V_T)/A_T + K_T/A_T^2 ) ],   (2.8)

where A_T = T + Ã_T and

  L_T = ∫_0^T Z_t / A_1(X_t) dW_t,  K_T = ∫_0^T ∫_{R_0} 2 Z_{t−} θ^4 / B̃_θ(X_{t−}) dJ,
  V_T = ∫_0^T ∫_{R_0} (1/g(θ)) ∂_θ ( g(θ) Z_{t−} θ^3 / B̃_θ(X_{t−}) ) dJ̃.

Remark 2.1. The class F includes typical functions which often appear in mathematical finance, such as continuous or discontinuous European payoffs (e.g., call options, put options, digital options, etc.). □

Remark 2.2. Let λ > 1 be sufficiently large, and write

  N^λ_T = ∫_0^T ∫_{R_0} ( exp(−λθ^2) − 1 ) dĴ.

Since A_T is non-negative and

  N^λ_T ≤ −(T/2) ∫_{R_0} ( (λθ^2) ∧ 1 ) dν ≤ −c_5 λ^{α/2} T

from the condition (2.2), we see that, for any p > 0,

  E[A_T^{−p}] = (1/Γ(p)) ∫_0^{+∞} λ^{p−1} E[exp(−λA_T)] dλ
  = (1/Γ(p)) ∫_0^{+∞} λ^{p−1} E[ exp(−λÃ_T − N^λ_T) exp(−λT + N^λ_T) ] dλ
  ≤ (1/Γ(p)) ∫_0^{+∞} λ^{p−1} exp( −λT − c_5 λ^{α/2} T ) dλ < +∞

from the Fubini theorem. □
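The identity E[A_T^{−p}] = Γ(p)^{−1} ∫_0^∞ λ^{p−1} E[e^{−λA_T}] dλ used in Remark 2.2 holds for any positive random variable. As a quick sanity check (with a Gamma-distributed stand-in for A_T — an assumption made only so that both sides are available in closed form), one can compare a numerical quadrature of the right-hand side with the exact value Γ(k−p)/Γ(k):

```python
import math
import numpy as np

p, k = 0.5, 3.0   # moment order and Gamma(k, 1) shape (illustrative choices)

# RHS: (1/Gamma(p)) * int_0^inf lam^{p-1} E[exp(-lam*A)] dlam,
# with E[exp(-lam*A)] = (1 + lam)^{-k} for A ~ Gamma(k, 1).
# Substitute lam = u^2 to remove the integrable singularity at 0.
u = np.linspace(1e-6, 60.0, 400_000)
integrand = 2.0 * u ** (2 * p - 1) * (1.0 + u ** 2) ** (-k)
rhs = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))) / math.gamma(p)

lhs = math.gamma(k - p) / math.gamma(k)   # exact E[A^{-p}] for the Gamma law
assert abs(rhs - lhs) < 1e-4
```

The same Laplace-transform trick is exactly what removes the factor A_T in the proof of Theorem 1 below.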


3. Proof of Theorem 1

We shall give the proof of Theorem 1 briefly. Suppose that φ ∈ C_0^2(R) for a while. Define

  u(t, x) = E[ φ(X_{T−t}) | X_0 = x ]  (t ∈ [0, T)).

Since φ is bounded, u belongs to C_b^{1,2}([0, T) × R). Moreover, the function u satisfies the Kolmogorov backward equation (cf. [14]):

  ∂_t u + Lu = 0,  lim_{t↗T} u(t, x) = φ(x).

The following lemma plays a crucial role in our discussion.

Lemma 3.1. For φ ∈ C_0^2(R), it holds that

  φ(X_T) = E[φ(X_T)] + ∫_0^T u′(s, X_s) A_1(X_s) dW_s + ∫_0^T ∫_{R_0} B̄_θ u(s, X_{s−}) dJ̃.   (3.9)

Proof. Since u ∈ C_b^{1,2}([0, T) × R), the Itô formula yields that

  u(t, X_t) = u(0, x) + ∫_0^t u′(s, X_s) A_1(X_s) dW_s + ∫_0^t ∫_{R_0} B̄_θ u(s, X_{s−}) dJ̃   (3.10)

for t ∈ [0, T). Both sides of (3.10) converge to those of (3.9) as t ↗ T, by φ ∈ C_0^2(R) and the boundedness of u′. □

Lemma 3.2. For φ ∈ C_0^2(R), it holds that

  T ∂_x (E[φ(X_T)]) = E[φ(X_T) L_T].   (3.11)

Proof. Since

  ( ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) dJ̃ ) L_T = ∫_0^T ( ∫_0^t ∫_{R_0} B̄_θ u(s, X_{s−}) dJ̃ ) dL_t + ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) L_{t−} dJ̃

from the Itô formula, we have

  E[ ( ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) dJ̃ ) L_T ]
  = E[ ∫_0^T ( ∫_0^t ∫_{R_0} B̄_θ u(s, X_{s−}) dJ̃ ) dL_t ] + E[ ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) L_{t−} dJ̃ ]
  = 0.

Multiplying (3.9) in Lemma 3.1 by L_T yields that

  E[φ(X_T) L_T] = E[φ(X_T)] E[L_T] + E[ ( ∫_0^T u′(t, X_t) A_1(X_t) dW_t ) L_T ] + E[ ( ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) dJ̃ ) L_T ]
  = E[ ∫_0^T u′(t, X_t) Z_t dt ]
  = T ∂_x (E[φ(X_T)]).

The last equality can be obtained by the commutativity between the derivative ∂_x and the integral with respect to dP ⊗ dt, which comes from the dominated convergence theorem and φ ∈ C_0^2(R). □

Remark 3.1. As seen above, only the uniformly elliptic condition on the diffusion coefficient A_1(y) is used in Lemma 3.2. The condition on the Lévy measure dν and the uniformly elliptic condition on the jump term coefficient are not necessary. □

Lemma 3.3. For φ ∈ C_0^2(R), it holds that

  ∂_x (E[φ(X_T) Ã_T]) = −E[φ(X_T) V_T].   (3.12)

Proof. Write J̃_T = ∫_0^T ∫_{R_0} θ^2 dJ̃. Since

  ( ∫_0^T u′(s, X_s) A_1(X_s) dW_s ) J̃_T = ∫_0^T ( ∫_0^t u′(s, X_s) A_1(X_s) dW_s ) dJ̃_t + ∫_0^T u′(t, X_t) A_1(X_t) J̃_t dW_t

from the Itô formula, we have

  E[ ( ∫_0^T u′(s, X_s) A_1(X_s) dW_s ) J̃_T ]
  = E[ ∫_0^T ( ∫_0^t u′(s, X_s) A_1(X_s) dW_s ) dJ̃_t ] + E[ ∫_0^T u′(t, X_t) A_1(X_t) J̃_t dW_t ]
  = 0.

Multiplying (3.9) in Lemma 3.1 by J̃_T yields that

  E[φ(X_T) J̃_T] = E[φ(X_T)] E[J̃_T] + E[ ( ∫_0^T u′(t, X_t) A_1(X_t) dW_t ) J̃_T ] + E[ ( ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) dJ̃ ) J̃_T ]
  = E[ ∫_0^T ∫_{R_0} B̄_θ u(t, X_t) θ^2 dĴ ].

Hence, the Fubini theorem enables us to see that

  E[φ(X_T) Ã_T] = E[ ∫_0^T ∫_{R_0} u(t, X_t + B_θ(X_t)) θ^2 dĴ ].   (3.13)

On the other hand, since

  ( ∫_0^T u′(t, X_t) A_1(X_t) dW_t ) V_T = ∫_0^T u′(t, X_t) A_1(X_t) V_t dW_t + ∫_0^T ( ∫_0^t u′(s, X_s) A_1(X_s) dW_s ) dV_t

from the Itô formula, we have

  E[ ( ∫_0^T u′(t, X_t) A_1(X_t) dW_t ) V_T ]
  = E[ ∫_0^T u′(t, X_t) A_1(X_t) V_t dW_t ] + E[ ∫_0^T ( ∫_0^t u′(s, X_s) A_1(X_s) dW_s ) dV_t ]
  = 0.

Multiplying (3.9) in Lemma 3.1 by V_T yields that

  E[φ(X_T) V_T] = E[φ(X_T)] E[V_T] + E[ ( ∫_0^T u′(t, X_t) A_1(X_t) dW_t ) V_T ] + E[ ( ∫_0^T ∫_{R_0} B̄_θ u(t, X_{t−}) dJ̃ ) V_T ]
  = E[ ∫_0^T ∫_{R_0} B̄_θ u(t, X_t) (1/g(θ)) ∂_θ ( g(θ) Z_t θ^3 / B̃_θ(X_t) ) dĴ ]
  = −E[ ∫_0^T ∫_{R_0} u′(t, X_t + B_θ(X_t)) (1 + B′_θ(X_t)) Z_t θ^2 dĴ ]
  = −∂_x E[ ∫_0^T ∫_{R_0} u(t, X_t + B_θ(X_t)) θ^2 dĴ ]
  = −∂_x ( E[φ(X_T) Ã_T] ).

Here, the third equality holds by the condition (2.3), and the fourth equality can be justified from the commutativity between the derivative ∂_x and the integral with respect to dP ⊗ dĴ, which comes from the dominated convergence theorem and φ ∈ C_0^2(R). The last equality follows from the equality (3.13). □

Remark 3.2. The uniformly elliptic condition on the function B_θ(y) and the conditions on the Lévy measure dν are essential in Lemma 3.3. □

Summing up Lemma 3.2 and Lemma 3.3, we have

Corollary 3.1. For φ ∈ C_0^2(R), it holds that

  ∂_x (E[φ(X_T) A_T]) = E[φ(X_T) (L_T − V_T)].   (3.14)

Proof of Theorem 1. Since the random variable X_T admits a smooth density with respect to the Lebesgue measure on R, as stated in Section 2, it is sufficient to prove the assertion of Theorem 1 in the case φ ∈ C_0^2(R), via the standard density argument. See [15, 19] for details. Since we have already obtained in Corollary 3.1 that

  ∂_x (E[φ(X_T) A_T]) = E[φ(X_T) (L_T − V_T)]

for φ ∈ C_0^2(R), it is sufficient to discuss how to get rid of A_T from the left-hand side of the above equality.

Write M^λ_T = exp(−λT + N^λ_T), and introduce the new probability measure dP_λ over our underlying measurable space (Ω, F) defined by

  dP_λ/dP |_{F_T} = exp( −λÃ_T − N^λ_T ).

From the Girsanov theorem (cf. [1]), we see that under the probability measure dP_λ the process W = {W_t ; t ∈ [0, T]} is still a one-dimensional Brownian motion, and dJ is the Poisson random measure on [0, T] × R_0 with intensity e^{−λθ^2} dĴ. Moreover, remark that the random measure dJ̃_λ defined by dJ̃_λ = dJ − e^{−λθ^2} dĴ generates a martingale under the measure dP_λ. We shall rewrite the equation (2.6) as follows:

  dX_t = A_0(X_t) dt + A_1(X_t) ∘ dW_t + ∫_{R_0} B_θ(X_{t−}) dJ̄
  = Ã_0(X_t) dt + A_1(X_t) ∘ dW_t + ∫_{R_0} B_θ(X_{t−}) dJ̄_λ,   (3.15)

where

  Ã_0(y) = A_0(y) + ∫_{|θ|≤1} B_θ(y) ( e^{−λθ^2} − 1 ) dν,  dJ̄_λ = I_(|θ|≤1) dJ̃_λ + I_(|θ|>1) dJ.

Denote by E_λ[·] the expectation with respect to the measure dP_λ. Then, similarly to Corollary 3.1 on the original probability space (Ω, F, P), it holds that

  ∂_x (E_λ[φ(X_T) A_T]) = E_λ[φ(X_T) (L_T − V^λ_T)]   (3.16)

on the new probability space (Ω, F, P_λ), where g_λ(θ) = e^{−λθ^2} g(θ) and

  V^λ_T = ∫_0^T ∫_{R_0} (1/g_λ(θ)) ∂_θ ( g_λ(θ) Z_{t−} θ^3 / B̃_θ(X_{t−}) ) dJ̃_λ.

Since

  dJ̃_λ = dJ̃ + (1 − e^{−λθ^2}) dĴ,
  (1/g_λ(θ)) ∂_θ ( g_λ(θ) Z_t θ^3 / B̃_θ(X_t) ) = −2λ Z_t θ^4 / B̃_θ(X_t) + (1/g(θ)) ∂_θ ( g(θ) Z_t θ^3 / B̃_θ(X_t) ),

and the condition (2.3) implies that

  ∫_0^T ∫_{R_0} (1/g(θ)) ∂_θ ( g(θ) Z_t θ^3 / B̃_θ(X_t) ) (1 − e^{−λθ^2}) dĴ = −λ ∫_0^T ∫_{R_0} ( 2 Z_t θ^4 / B̃_θ(X_t) ) e^{−λθ^2} dĴ

from the integration by parts, we can get V^λ_T = V_T − λK_T. Then, it holds that

  ∂_x (E[φ(X_T)])
  = ∂_x E[ φ(X_T) A_T ∫_0^{+∞} e^{−λA_T} dλ ]
  = ∂_x [ ∫_0^{+∞} M^λ_T E_λ[φ(X_T) A_T] dλ ]
  = ∫_0^{+∞} M^λ_T ∂_x ( E_λ[φ(X_T) A_T] ) dλ
  = ∫_0^{+∞} M^λ_T E_λ[ φ(X_T) (L_T − V_T + λK_T) ] dλ
  = E[ φ(X_T) (L_T − V_T) ∫_0^{+∞} e^{−λA_T} dλ ] + E[ φ(X_T) K_T ∫_0^{+∞} λ e^{−λA_T} dλ ]
  = E[ φ(X_T) ( (L_T − V_T)/A_T + K_T/A_T^2 ) ].

Here, we have used the Fubini theorem in the second equality; the third equality can be obtained by the commutativity between the derivative ∂_x and the integral with respect to dP_λ ⊗ dλ, which comes from the dominated convergence theorem, φ ∈ C_0^2(R), and the independence of M^λ_T and A_T of the variable x. In the fourth equality, we have used (3.16). The proof of Theorem 1 is complete. □

Remark 3.3. Consider the equation (2.6) where only the condition 2

inf (A1 (y)) > 0

(3.17)

y∈R

is satisfied. Then, we can get from Lemma 3.2 that # $   LT ∂x E[ϕ(XT )] = E ϕ(XT ) T

(3.18)

for ϕ ∈ C02 (R), and the equality (3.18) can be extended to the case of ϕ ∈ F via the standard density argument because of the existence of the smooth density for XT under (3.17). Remark that the conditions (2.1), (2.2) and (2.3) on the measure dν are not necessary. In case of the diffusion process given by dXt = A0 (Xt ) dt + A1 (Xt ) ◦ dWt with the uniform ellipticity on A1 (y), the formula (3.18) is exactly the same as the Bismut-Elworthy-Li formula (cf. [3, 5, 11]).  Remark 3.4. Consider the equation (2.6) where only the condition 2

inf inf (∂θ Bθ (y)) > 0

(3.19)

y∈R θ∈R0

is satisfied. We have already obtained in Lemma 3.3 that ) 3 4* ∂x E ϕ(XT ) A˜T = −E [ϕ(XT ) VT ]

(3.20)

for ϕ ∈ C02 (R), and the equality (3.20) can be extended to the case of ϕ ∈ F via the standard density argument because of the existence of the smooth density for XT under (3.19). By a similar argument to the proof of Theorem 1, we can get rid of the effect of A˜T from the left-hand side of (3.20). Thus, we have    KT VT + 2 ∂x (E[ϕ(XT )]) = E ϕ(XT ) − A˜T A˜ T

for ϕ ∈ F. In this case, the assumptions (2.1), (2.2) and (2.3) on the measure dν are essential.  Remark 3.5. As seen in Theorem 1, Remark 3.3 and Remark 3.4, we have already obtained the sensitivity formula for ϕ ∈ F in the three forms below: #

$ LT − VT KT ∂x (E[ϕ(XT )]) = E ϕ(XT ) + 2 , AT AT # $ LT ∂x (E[ϕ(XT )]) = E ϕ(XT ) , T    VT KT ∂x (E[ϕ(XT )]) = E ϕ(XT ) − + 2 . A˜T A˜T

Sensitivity Analysis for Jump Processes

217

It seems to be a natural question which minimizes the variance under the condition (2.5). In order to avoid lengthy expression, write αT =

KT LT − VT + 2 , AT AT

βT =

LT , T

γT = −

KT VT + 2 . ˜ AT A˜T

We shall consider the minimization problem of V [aαT + bβT + cγT ] under a, b, c ∈ R with a + b + c = 1. It is a routine work to see that V [aαT + bβT + cγT ] ≥ V [a0 αT + b0 βT + c0 γT ] ,

where C[X, Y ] = E [(X − E[X])(Y − E[Y ])], a0 = b0 =

C[αT − γT , βT − γT ] C[βT − γT , γT ] − V[βT − γT ] C[αT − γT , γT ] 2

,

2

,

V[αT − γT ] V[βT − γT ] − {C[αT − γT , βT − γT ]} C[αT − γT , βT − γT ] C[αT − γT , γT ] − V[αT − γT ] C[βT − γT , γT ] V[αT − γT ] V[βT − γT ] − {C[αT − γT , βT − γT ]}

and c0 = 1 − a0 − b0 . Thus, we have

∂x (E [ϕ (XT )]) = E [ϕ(XT ) (a0 αT + b0 βT + c0 γT )] ,

which minimizes the variance.
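The optimal weights (a_0, b_0) solve a 2×2 linear system and can be estimated in sample form as well. A small numerical sketch follows, with synthetic correlated samples standing in for α_T, β_T, γ_T (an assumption, since the true estimators depend on the full jump path); by construction of the closed form, the combined estimator's sample variance cannot exceed that of any single estimator:

```python
import numpy as np

def optimal_weights(alpha, beta, gamma):
    """Variance-minimizing weights (a0, b0, c0) with a0 + b0 + c0 = 1,
    via the closed form of Remark 3.5 with sample covariances."""
    X, Y = alpha - gamma, beta - gamma
    VX, VY = np.var(X), np.var(Y)
    CXY = np.cov(X, Y, bias=True)[0, 1]
    CXZ = np.cov(X, gamma, bias=True)[0, 1]
    CYZ = np.cov(Y, gamma, bias=True)[0, 1]
    det = VX * VY - CXY ** 2
    a0 = (CXY * CYZ - VY * CXZ) / det
    b0 = (CXY * CXZ - VX * CYZ) / det
    return a0, b0, 1.0 - a0 - b0

# synthetic correlated estimators of the same quantity (mean 1.0)
rng = np.random.default_rng(2)
z = rng.normal(size=100_000)
alpha = 1.0 + z + 0.5 * rng.normal(size=z.size)
beta = 1.0 + 0.8 * z + 0.7 * rng.normal(size=z.size)
gamma = 1.0 - 0.3 * z + 0.9 * rng.normal(size=z.size)

a0, b0, c0 = optimal_weights(alpha, beta, gamma)
combo = a0 * alpha + b0 * beta + c0 * gamma
assert np.var(combo) <= min(np.var(alpha), np.var(beta), np.var(gamma)) + 1e-9
```

This is the classical control-variate weighting argument; the constraint a + b + c = 1 keeps the combined estimator unbiased.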



Remark 3.6. Our method using the martingale property based on the Kolmogorov backward equation can be also applied to the sensitivity analysis on the secondorder derivative (Gamma) in the initial point, and the other parameters (e.g., Vega) governing the process {Xt ; t ∈ [0, T ]} (cf. [19]). 

4. Examples Example 1 (L´evy measures). Let a, b, c > 0 and 0 < β < 1. Define , dν = a exp(bθ) I(θ0) |θ|−1−β dθ,

which is a special case of CGMY processes ([6]). It is easy to see that the measure dν satisfies the conditions (2.1), (2.2) and (2.3). In particular, the L´evy measures of inverse Gaussian processes (b = +∞, β = 1/2) and tempered stable processes (b = +∞, 0 < β < 1) are in our position.  Example 2 (L´evy processes). Let X = {Xt ; t ∈ [0, T ]} be the R-valued L´evy process given by  t Xt = x + γ t + σ Wt + δ θ dJ, 0

where x, γ ∈ R and (σ, δ) ∈ [0, +∞) × [0, +∞). (1) If σ > 0 and δ > 0, then we have ′  T  g(θ) θ2 WT ˜ LT = , VT = dJ, σ δ g(θ) 0 R0

R0

KT =

 T 0

R0

2θ3 dJ. δ

Hence, the sensitivity formula in this case is as follows:

  ∂_x (E[φ(X_T)]) = E[ φ(X_T) ( (L_T − V_T)/A_T + K_T/A_T^2 ) ].

(2) If σ > 0, then we have

  ∂_x (E[φ(X_T)]) = E[ φ(X_T) W_T/(σT) ].

(3) If δ > 0, then we have

  ∂_x (E[φ(X_T)]) = E[ φ(X_T) ( −V_T/Ã_T + K_T/Ã_T^2 ) ]. □

Example 3 (geometric Lévy processes). Let (x, γ, σ, δ) ∈ (0, +∞) × R × [0, +∞) × [0, +∞). Define the R-valued Lévy process {x_t ; t ∈ [0, T]} by

  x_t = γt + σW_t + δ ∫_0^t ∫_{R_0} θ dJ̄.

The geometric Lévy process X = {X_t ; t ∈ [0, T]} is defined by X_t = x exp(x_t). From the Itô formula, we can easily see that

  dX_t = ( γ + ∫_{|θ|≤1} ( exp(δθ) − 1 − δθ ) dν ) X_t dt + σ X_t ∘ dW_t + ∫_{R_0} ( exp(δθ) − 1 ) X_{t−} dJ̄.

Thus, the coefficients of the above equation do not satisfy the condition (2.5) any more. However, since

  ∂_x (φ(X_t)) = φ′(X_t) X_t / x = (1/x) ∂_X ( (φ ∘ h)(X + x_t) ) |_{X = log x},

where h(y) = exp(y), we can compute the sensitivity formula for bounded functions φ ∈ F by using the result stated in Example 2. □
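As a concrete sketch of this reduction (taking the purely Brownian case δ = 0, a call payoff from F, and illustrative parameters; the closed-form comparison value is ordinary calculus, not part of the paper), the delta of the geometric model can be estimated through Example 2(2) applied to φ ∘ h, with the extra factor 1/x from the chain rule:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
x, gamma_, sigma, T, K, n = 1.0, 0.02, 0.25, 1.0, 1.1, 400_000

W = rng.normal(0.0, math.sqrt(T), n)
XT = x * np.exp(gamma_ * T + sigma * W)          # geometric model without jumps
payoff = np.maximum(XT - K, 0.0)                 # call payoff, an element of F

# Example 2(2) applied to psi = phi o exp, then the chain-rule factor 1/x:
delta_mc = np.mean(payoff * W / (sigma * T)) / x

# closed-form delta: E[ e^{gamma*T + sigma*W} 1{X_T > K} ]
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
w_star = (math.log(K / x) - gamma_ * T) / sigma
delta_exact = math.exp(gamma_ * T + 0.5 * sigma ** 2 * T) * Phi((sigma * T - w_star) / math.sqrt(T))
assert abs(delta_mc - delta_exact) < 0.02
```

Again the kink of the call payoff causes no difficulty, since the weight differentiates the law of X_T rather than the payoff.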

References
[1] D. Applebaum: Lévy Processes and Stochastic Calculus, 2nd ed., Cambridge Univ. Press (2009).
[2] V. Bally, M.-P. Bavouzet, M. Messaoud: Integration by parts formula for locally smooth laws and applications to sensitivity computations, Ann. Appl. Probab. 17, 33–66 (2007).
[3] J.M. Bismut: Martingales, the Malliavin calculus and hypoellipticity under general Hörmander's conditions, Z. Wahrsch. Verw. Gebiete 56, 469–505 (1981).


[4] J.M. Bismut: Calcul des variations stochastiques et processus de sauts, Z. Wahrsch. Verw. Gebiete 63, 147–235 (1983).
[5] J.M. Bismut: Large Deviations and the Malliavin Calculus, Birkhäuser (1984).
[6] P. Carr, H. Geman, D.B. Madan, M. Yor: The fine structure of asset returns: an empirical investigation, Journal of Business 75, 305–332 (2002).
[7] T.R. Cass, P.K. Friz: The Bismut-Elworthy-Li formula for jump-diffusions and applications to Monte Carlo methods in finance, available at arXiv:math/0604311v3 (2007).
[8] M.H.A. Davis, M.P. Johansson: Malliavin Monte Carlo Greeks for jump diffusions, Stoch. Processes Appl. 116, 101–129 (2006).
[9] Y. El-Khatib, N. Privault: Computations of Greeks in a market with jumps via the Malliavin calculus, Finance Stoch. 8, 161–179 (2004).
[10] R. Elliott, A.H. Tsoi: Integration by parts for Poisson processes, J. Multivariate Anal. 44, 179–190 (1993).
[11] K.D. Elworthy, X.M. Li: Formulae for the derivatives of heat semigroups, J. Funct. Anal. 125, 252–286 (1994).
[12] E. Fournié, J.M. Lasry, J. Lebuchoux, P.L. Lions, N. Touzi: Applications of Malliavin calculus to Monte Carlo methods in finance, Finance Stoch. 3, 391–412 (1999).
[13] T. Fujiwara, H. Kunita: Stochastic differential equations of jump type and Lévy processes in diffeomorphisms group, J. Math. Kyoto Univ. 25, 71–106 (1985).
[14] I.I. Gihman, A.V. Skorokhod: Stochastic Differential Equations, Springer-Verlag (1972).
[15] R. Kawai, A. Takeuchi: Greeks formulae for an asset price model with gamma processes, to appear in Math. Finance (2011).
[16] T. Komatsu, A. Takeuchi: Generalized Hörmander theorem for non-local operators, in: Recent Developments in Stochastic Analysis and Related Topics, S. Albeverio et al. (eds.), World Scientific, Singapore (2004), 234–245.
[17] D. Nualart: The Malliavin Calculus and Related Topics, 2nd edition, Springer-Verlag (2006).
[18] A. Takeuchi: The Malliavin calculus for SDE with jumps and the partially hypoelliptic problem, Osaka J. Math. 39, 523–559 (2002).
[19] A. Takeuchi: The Bismut-Elworthy-Li type formulae for stochastic differential equations with jumps, J. Theoret. Probab. 23, 576–604 (2010).

Atsushi Takeuchi
Department of Mathematics
Osaka City University
Sugimoto 3-3-138
Osaka 558-8585, Japan
e-mail: [email protected]

Progress in Probability, Vol. 65, 221–252 © 2011 Springer Basel AG

Quantifying Model Uncertainties in Complex Systems

Jiarui Yang and Jinqiao Duan

Abstract. Uncertainties are abundant in complex systems. Appropriate mathematical models for these systems thus contain random effects or noises. The models are often in the form of stochastic differential equations, with some parameters to be determined by observations. The stochastic differential equations may be driven by Brownian motion, fractional Brownian motion, or Lévy motion. After a brief overview of recent advances in estimating parameters in stochastic differential equations, various numerical algorithms for computing parameters are implemented. The numerical simulation results are shown to be consistent with theoretical analysis. Moreover, for fractional Brownian motion and α-stable Lévy motion, several algorithms are reviewed and implemented to numerically estimate the Hurst parameter H and characteristic exponent α.

Mathematics Subject Classification (2000). Primary: 37L55, 35R60; Secondary: 58B99, 35L20.

Keywords. Model uncertainty, parameter estimation, Brownian motion (BM), fractional Brownian motion (fBM), Lévy motion (LM), Hurst parameter, characteristic exponent, stochastic differential equations (SDEs).

1. Introduction

Since random fluctuations are common in the real world, mathematical models for complex systems are often subject to uncertainties, such as fluctuating forces, uncertain parameters, or random boundary conditions [89, 55, 44, 121, 122, 125]. Uncertainties may also be caused by a lack of knowledge of some physical, chemical or biological mechanisms that are not well understood, and thus are not appropriately represented (or are missed completely) in the mathematical models [19, 65, 97, 123, 124]. Although these fluctuations and unrepresented mechanisms may be very small or very fast, their long-term impact on the system evolution may be delicate [7, 55, 44] or even profound. This kind of delicate impact on the overall evolution of dynamical systems has been observed in, for example, stochastic bifurcation [25, 17, 55], stochastic resonance [59], and noise-induced pattern formation [44, 14]. Thus taking stochastic effects into account is of central importance for the mathematical modeling of complex systems under uncertainty.

Stochastic differential equations (SDEs) are appropriate models for many of these systems [7, 27, 108, 122]. For example, Langevin-type models are stochastic differential equations describing various phenomena in physics, biology, and other fields; in finance, SDEs are used to model various price processes, exchange rates, and interest rates, among others. Noises in these SDEs may be modeled as a generalized time derivative of some distinguished stochastic process, such as Brownian motion (BM), Lévy motion (LM) or fractional Brownian motion (fBM) [36]. Usually we choose different noise processes according to the statistical properties of the observational data. For example, if the data has long-range dependence, we consider fractional Brownian motion rather than Brownian motion; if the data shows considerable discrepancy from Gaussianity or normality, Lévy motion may be an appropriate choice. In building these SDE models, some parameters appear, as we do not know certain quantities exactly. Based on the choice of noise process, different mathematical techniques are needed for estimating the parameters in SDEs with Brownian motion, Lévy motion, or fractional Brownian motion.

(This work was partially supported by NSF grants 0620539 and 0731201, and by an open research grant of the State Key Laboratory for Nonlinear Mechanics, China.)
In this article, we are interested in estimating and computing parameters contained in stochastic differential equations, so that we obtain computational models useful for investigating complex dynamics under uncertainty. We first review recent theoretical results on estimating parameters in SDEs, including statistical properties and convergence of various estimates. Then we develop and implement numerical algorithms for approximating these parameters. Theoretical results on parameter estimation for SDEs driven by Brownian motion are relatively well developed ([5, 28, 48, 99]), and various numerical simulations for these parameter estimates ([1, 3, 99, 62]) have been implemented, so in Section 2 below we do not present such numerical results. Instead, we concentrate on numerical algorithms for parameter estimation in SDEs driven by fractional Brownian motion and Lévy motion in Sections 3 and 4, respectively.

This paper is organized as follows. In Section 2, we consider parameter estimation for SDEs with Brownian motion B_t. We present a brief overview of available techniques for estimating parameters in these stochastic differential equations with continuous-time or discrete-time observations. Specifically, we present results on how to estimate parameters in diffusion terms and drift terms, given continuous observations and discrete observations, respectively.


In Section 3, we consider parameter estimation for SDEs driven by fractional Brownian motion B_t^H with Hurst parameter H. After discussing basic properties of fBM, we consider estimation methods for the Hurst parameter H from given fBM data. Then we compare the convergence rate of each method by comparing estimates computed on hypothetical data. Unlike the case of SDEs with Brownian motion, there is no general formula for the estimate of a parameter in the drift (or diffusion) coefficient of a stochastic differential equation driven by fBM. We discuss different estimates associated with different models, discuss their statistical properties, and develop and implement numerical simulation methods for these estimates. Finally, in Section 4, for stochastic differential equations with (non-Gaussian) α-stable Lévy motion L_t^α, we consider estimates, and their numerical implementation, for the parameter α and other parameters in the drift or diffusion coefficients.

2. Quantifying uncertainties in SDEs driven by Brownian motion

In this section, we consider a diffusion process X_t ∈ R^d, 0 ≤ t ≤ T, satisfying the stochastic differential equation

dX_t = µ(θ, t, X_t)dt + σ(ϑ, t, X_t)dB_t,   X_0 = ζ,   (1)

where B_t is an m-dimensional Brownian motion, and θ ∈ Θ, a compact subset of R^p, and ϑ ∈ Ξ, a compact subset of R^q, are unknown parameters which are to be estimated on the basis of observations. Here µ : Θ × [0, T] × R^d → R^d, the drift coefficient, and σ : Ξ × [0, T] × R^d → R^{d×m}, the diffusion coefficient, are usually known functions but with unknown parameters θ and ϑ. Some remarks are in order here.
• Under local Lipschitz and sub-linear growth conditions on the coefficients µ and σ, there exists a unique strong solution of the above stochastic differential equation (see [77] or [85]); this is a universal assumption for all results we discuss below.
• The diffusion coefficient σ is almost surely determined by the process, i.e., it can be estimated without any error if the process is observed continuously throughout a time interval (see [47] and [30]).
• The diffusion matrix defined by Σ(ϑ, t, X_t) ≡ σ(ϑ, t, X_t)σ(ϑ, t, X_t)^T plays an important role in parameter estimation problems.

2.1. How to estimate parameters given continuous observation
Since it is not easy to estimate the parameters θ and ϑ at the same time, we usually simplify the model by assuming that one of them is known and then estimate the other. Moreover, instead of presenting results for all types of diffusion processes, we choose to present conclusions for the most general one; for example, we prefer the nonhomogeneous case to the homogeneous one, and the nonlinear one to the linear one.


2.1.1. Parameter estimation of diffusion terms with continuous observation. We assume that the parameter θ in the drift coefficient is known. Then our model can be simplified as

dX_t = µ(t, X)dt + σ(ϑ, t, X_t)dB_t,   X_0 = ζ.   (2)

Remarks
• In contrast with model (1), the drift coefficient µ(t, X) in model (2) is possibly unknown and may depend on the whole past of the process X instead of only on X_t. In this case, our model can easily be extended to the non-Markovian case, which is more general than case (1).
• If µ depends on the unknown parameter ϑ in model (2), the local asymptotic mixed normality property for the maximum likelihood estimate (MLE) can also be proved when µ(t, X) = µ(ϑ, X_t) and σ(ϑ, t, X_t) = σ(ϑ, X_t) (see [38]).

If the diffusion matrix Σ(ϑ, t, X_t) is invertible, define a family of contrasts by

U^n(ϑ) = (1/n) Σ_{i=1}^n [ log det Σ(ϑ, t_{i−1}^n, X_{t_{i−1}^n}) + (X_i^n)^T Σ(ϑ, t_{i−1}^n, X_{t_{i−1}^n})^{−1} X_i^n ],   (3)

where

X_i^n = (1/δ_i^n)(X_{t_i^n} − X_{t_{i−1}^n}),   δ_i^n = t_i^n − t_{i−1}^n,   for 1 ≤ i ≤ n,

and {t_i^n} is an appropriate partition of [0, T]. However, the invertibility assumption does not always hold, so we consider a more general class of contrasts of the form

U^n(ϑ) = (1/n) Σ_{i=1}^n f(Σ(ϑ, t_{i−1}^n, X_{t_{i−1}^n}), X_i^n),   (4)

where f should satisfy certain conditions in order to obtain the asymptotic and consistency properties below for the estimate generated by these contrasts (see [45]). Let ϑ̂_n be a minimum contrast estimate associated with U^n, i.e., ϑ̂_n satisfies

U^n(ϑ̂_n) = min_{ϑ∈Ξ} U^n(ϑ).

Under some smoothness assumptions on the coefficients µ and σ, an empirical sampling measure assumption on the sample times t_i^n, and an identifiability assumption on the law of the solution of (2), Genon-Catalot and Jacod [47] have proved that the estimate ϑ̂_n has the local asymptotic mixed normality property, i.e., √n(ϑ̂_n − ϑ_0), where ϑ_0 is the true value of the parameter, converges in law to N(0, S).

Remarks
• We do not include the drift coefficient µ in the contrast U^n(ϑ) because it is possibly unknown. Even if it is known, we still do not want it involved, since it is a function of the whole past of X and thus is not observable.


• If the diffusion matrix Σ is invertible, it can be proven that the contrast of form (3) is optimal in the class of contrasts of type (4).

2.1.2. Parameter estimation of drift terms with continuous observations. We assume that the parameter ϑ in the diffusion coefficient is known. Then the model (1) can be simplified as

dX_t = µ(θ, t, X_t)dt + σ(t, X_t)dB_t,   X_0 = ζ.   (5)

Since no general result exists for the above model, we present instead the result for the following nonhomogeneous diffusion process. Consider a real-valued diffusion process {X_t, t ≥ 0} satisfying the stochastic differential equation

dX_t = µ(θ, t, X_t)dt + dB_t,   X_0 = ζ,   (6)

where the drift coefficient µ is assumed to be nonanticipative. Denote the observation of the process by X_0^T := {X_t, 0 ≤ t ≤ T} and let P_θ^T be the measure generated by the process X_0^T. Then the Radon–Nikodym derivative (likelihood function) of P_θ^T with respect to P_{θ_0}^T, where θ_0 is the true value of the parameter θ, is given by (see [80])

L_T(θ) := (dP_θ^T / dP_{θ_0}^T)(X_0^T) = exp{ ∫_0^T [µ(θ, t, X_t) − µ(θ_0, t, X_t)] dX_t − (1/2) ∫_0^T [µ²(θ, t, X_t) − µ²(θ_0, t, X_t)] dt }.

So we can get the maximum likelihood estimate (MLE) defined by

θ_T := arg sup_{θ∈Θ} L_T(θ).

Then one can show that the MLE is strongly consistent, i.e., θ_T → θ_0, P_{θ_0}-a.s. as T → ∞, and converges to a normal distribution (see Chapter 4 in [13] for more details).

Remarks
• In [13], Bishwal also proves that the MLE and a regular class of Bayes estimates (BE) are asymptotically equivalent.
• By applying an increasing transformation as described in [1],

Y_t = g(X_t) ≡ ∫^{X_t} du/σ(u),   (7)

we can transform the diffusion process X_t defined by dX_t = µ(θ, X_t)dt + σ(X_t)dB_t into another diffusion process Ỹ_t defined by dỸ_t = µ̃(θ, Ỹ_t)dt + dB_t, where

µ̃(θ, y) ≡ µ(θ, g^{-1}(y))/σ(g^{-1}(y)) − (1/2) σ′(g^{-1}(y)).   (8)


Then we can obtain the MLE of the process X_t by calculating the MLE of the process Y_t as described in this section (see [1] or [2] for more details).
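As a quick illustration of the transformation (7)–(8), the following sketch works out the case of the hypothetical model dX_t = θX_t dt + σX_t dB_t, for which g and the transformed drift µ̃ have closed forms; all parameter values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustration of (7)-(8) for the hypothetical model dX = theta*X dt + sigma*X dB:
# here g(x) = log(x)/sigma, and the transformed drift is constant in y.

def g(x, sigma):
    """Increasing transformation (7) for the diffusion coefficient sigma(x) = sigma*x."""
    return np.log(x) / sigma

def g_inv(y, sigma):
    """Inverse transformation: x = exp(sigma*y)."""
    return np.exp(sigma * y)

def mu_tilde(y, theta, sigma):
    """Transformed drift (8); for this model sigma'(x) = sigma (constant)."""
    x = g_inv(y, sigma)
    return (theta * x) / (sigma * x) - 0.5 * sigma

theta, sigma = 1.5, 0.4        # illustrative values
vals = mu_tilde(np.linspace(-2.0, 2.0, 5), theta, sigma)
print(vals)  # all entries equal theta/sigma - sigma/2
```

Since µ̃ is constant here, the transformed process Ỹ_t is a Brownian motion with drift, whose MLE is elementary.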

2.2. How to estimate parameters given discrete observation
Given the practical difficulty of obtaining a complete continuous observation, we now discuss parameter estimation with discrete observations.

2.2.1. Parameter estimation of drift terms with discrete time. In this section, we assume that the parameter ϑ in the diffusion coefficient σ is known. Then the model (1) can be simplified as

dX_t = µ(θ, t, X_t)dt + σ(t, X_t)dB_t,   X_0 = ζ.   (9)

Ideally, when the transition densities p(s, x, t, y; θ) of X are known, we can use the log-likelihood function

l_n(θ) = Σ_{i=1}^n log p(t_{i−1}, X_{t_{i−1}}, t_i, X_{t_i}; θ)

to compute the MLE θ̂, which is strongly consistent and asymptotically normally distributed (see [12], [26], [79] and [109]).

If the transition densities of X are unknown, then instead of computing the log-likelihood function l_n(θ) we use the approximate log-likelihood function which, under some regularity conditions (see [56]), is given by

l_T(θ) = ∫_0^T (µ(θ, t, X_t)/σ²(t, X_t)) dX_t − (1/2) ∫_0^T (µ²(θ, t, X_t)/σ²(t, X_t)) dt

to approximate the log-likelihood function based on continuous observations (see [103]). Then, using an Itô-type approximation for the stochastic integral, we obtain

l̃_n(θ) = Σ_{i=1}^n (µ(θ, t_{i−1}, X_{t_{i−1}})/σ²(t_{i−1}, X_{t_{i−1}})) (X_{t_i} − X_{t_{i−1}}) − (1/2) Σ_{i=1}^n (µ²(θ, t_{i−1}, X_{t_{i−1}})/σ²(t_{i−1}, X_{t_{i−1}})) (t_i − t_{i−1}).

Thus, the maximizer of l̃_n(θ) provides an approximate maximum likelihood estimate (AMLE). In 1992, Yoshida [130] proved that the AMLE is weakly consistent and asymptotically normally distributed when the diffusion is homogeneous and ergodic. In [13], Bishwal obtained a similar result for the nonhomogeneous case with drift function µ(θ, t, X) = θf(t, X_t) for some smooth function f(t, x). Moreover, he measured the loss of information of several AMLEs corresponding to different approximations of l_T(θ).


2.2.2. Parameter estimation of diffusion terms (and/or drift terms) with discrete observation. In the previous sections, we always assumed that one of the parameters is known and then estimated the other. In this section, we consider the situation when both θ and ϑ are unknown and estimate them simultaneously, based on discrete observations of the diffusion process. Suppose we consider the real-valued diffusion process X_t satisfying the stochastic differential equation

dX_t = µ(θ, X_t)dt + σ(ϑ, X_t)dB_t.   (10)

Denote the observation times by τ_0 = 0, τ_1, τ_2, . . . , τ_{N_T}, where N_T is the smallest integer such that τ_{N_T+1} > T. In this section, we mainly consider three cases of estimation: β = (θ, ϑ) jointly, β = θ with ϑ known, and β = ϑ with θ known. In regular circumstances, the estimate β̂ converges in probability to some β̄, and √T(β̂ − β̄) converges in law to N(0, Ω_β) as T tends to infinity; here β_0 denotes the true value of the parameter. For simplicity, we set the law of the sampling intervals ∆_n = τ_n − τ_{n−1} as

∆ = ǫ∆_0,   (11)

where ∆_0 has a given finite distribution and ǫ is deterministic.

Remark. We study not only the case when the sampling interval is fixed, i.e., Var[∆_0] = 0, but also the continuous observation case, i.e., ǫ = 0, as well as the random sampling case.

Let h(y_1, y_0, δ, β, ǫ) denote an r-dimensional vector function which consists of r moment conditions of the discretized stochastic differential equation (10) (see [51] or [54] for more details). Moreover, this function satisfies

E_{∆_n, Y_n, Y_{n−1}}[h(Y_n, Y_{n−1}, ∆_n, β, ǫ)] = 0,

where the expectation is taken with respect to the joint law of (∆_n, Y_n, Y_{n−1}). By the Law of Large Numbers, E[h(Y_n, Y_{n−1}, ∆_n, β, ǫ)] may be estimated by the sample average

m_T(β) ≡ N_T^{-1} Σ_{n=1}^{N_T − 1} h(Y_n, Y_{n−1}, ∆_n, β, ǫ).   (12)

Then we can obtain an estimate β̂ by minimizing the quadratic function

Q_T(β) ≡ m_T(β)′ W_T m_T(β),   (13)

where W_T is an r × r positive definite weight matrix; this method is called the Generalized Method of Moments (GMM). In [51], Hansen proved the strong consistency and asymptotic normality of the GMM estimate, i.e.,

√N(θ̂ − θ) → N(0, V),

when θ = ϑ and W_T satisfies certain conditions. Mykland used this technique to obtain a closed form for the asymptotic bias, but sacrificed the consistency of the estimate.
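A sketch of the GMM machinery for the hypothetical model dX = θX dt + σ dB with fixed sampling interval ∆. With the Euler residual e_n = Y_n − Y_{n−1} − θY_{n−1}∆ and the two moment conditions h = (e_n Y_{n−1}, e_n² − σ²∆), the problem is exactly identified, so minimizing Q_T drives m_T to zero and the minimizer has a closed form. The moment conditions and parameter values here are illustrative assumptions, not those used in [51].

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, sig0, Delta, N = -0.8, 0.5, 0.05, 50_000   # hypothetical true values
Y = np.empty(N + 1); Y[0] = 0.0
for n in range(N):  # simulate a discretized sample path
    Y[n + 1] = Y[n] + theta0 * Y[n] * Delta + sig0 * np.sqrt(Delta) * rng.standard_normal()

def m_T(beta):
    """Sample moment vector (12) for the two Euler moment conditions."""
    theta, sig2 = beta
    e = Y[1:] - Y[:-1] - theta * Y[:-1] * Delta
    return np.array([np.mean(e * Y[:-1]), np.mean(e ** 2 - sig2 * Delta)])

def Q_T(beta, W=np.eye(2)):
    """Quadratic GMM objective (13) with weight matrix W."""
    m = m_T(beta)
    return m @ W @ m

# Exactly identified case: solve m_T(beta) = 0 in closed form.
theta_hat = np.sum(Y[:-1] * (Y[1:] - Y[:-1])) / (Delta * np.sum(Y[:-1] ** 2))
sig2_hat = np.mean((Y[1:] - Y[:-1] - theta_hat * Y[:-1] * Delta) ** 2) / Delta
print(theta_hat, np.sqrt(sig2_hat))   # near theta0 = -0.8 and sig0 = 0.5
print(Q_T((theta_hat, sig2_hat)))     # essentially zero at the GMM solution
```

With more moment conditions than parameters, m_T can no longer be driven exactly to zero and the choice of W_T matters; the closed form above is then replaced by a numerical minimization of Q_T.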


3. Quantifying uncertainties in SDEs driven by fractional Brownian motion

Colored noise, i.e., noise with non-zero correlation in time, is common in the physical, biological and engineering sciences. One candidate for modeling colored noise is fractional Brownian motion [36].

3.1. Fractional Brownian motion
Fractional Brownian motion (fBM) was introduced within a Hilbert space framework by Kolmogorov in 1940 in [73], where it was called Wiener Helix. It was further studied by Yaglom in [127]. The name fractional Brownian motion is due to Mandelbrot and Van Ness, who in 1968 provided in [84] a stochastic integral representation of this process in terms of a standard Brownian motion.

Definition 3.1 (Fractional Brownian motion [96]). Let H be a constant belonging to (0, 1). A fractional Brownian motion (fBM) (B^H(t))_{t≥0} of Hurst index H is a continuous and centered Gaussian process with covariance function

E[B^H(t)B^H(s)] = (1/2)(t^{2H} + s^{2H} − |t − s|^{2H}).

By the above definition, we see that a standard fBM B^H has the following properties:
1. B^H(0) = 0 and E[B^H(t)] = 0 for all t ≥ 0.
2. B^H has homogeneous increments, i.e., B^H(t + s) − B^H(s) has the same law as B^H(t) for s, t ≥ 0.
3. B^H is a Gaussian process and E[B^H(t)²] = t^{2H}, t ≥ 0, for all H ∈ (0, 1).
4. B^H has continuous trajectories.
Using the method presented in [23, 24], we can simulate sample paths of fractional Brownian motion with different Hurst parameters (see Figure 1).


Figure 1. Three sample paths of fBM with Hurst parameter H = 0.25, 0.5, 0.9

For H = 1/2, the fBM is a standard Brownian motion, and hence in this case the increments of the process are independent. On the contrary, for H ≠ 1/2 the increments are not independent. More precisely, by the definition of fBM, we know that the covariance between B^H(t + h) − B^H(t) and B^H(s + h) − B^H(s), with s + h ≤ t and t − s = nh, is

ρ_H(n) = (1/2) h^{2H} [ (n + 1)^{2H} + (n − 1)^{2H} − 2n^{2H} ].

In particular, two increments of the form B^H(t + h) − B^H(t) and B^H(t + 2h) − B^H(t + h) are positively correlated for H > 1/2, while they are negatively correlated for H < 1/2. In the first case the process presents an aggregation behavior, and this property can be used to describe “cluster” phenomena (systems with memory and persistence). In the second case it can be used to model sequences with intermittency and antipersistence. From the above description, we get a general idea that the Hurst parameter H plays an important role in how the respective fBM behaves. So it should be considered as an extra parameter when we estimate other parameters in the coefficients of an SDE driven by fBM. With later computations in mind, we introduce one more useful property of fBM.

Definition 3.2 (Self-similarity). A stochastic process X = {X_t, t ∈ R} is called b-self-similar, or satisfies the property of self-similarity, if for every a > 0 there exists b > 0 such that

Law(X_{at}, t ≥ 0) = Law(a^b X_t, t ≥ 0).

Note that “Law = Law” means that the two processes X_{at} and a^b X_t have the same finite-dimensional distribution functions, i.e., for every choice t_0, . . . , t_n in R,

P(X_{at_0} ≤ x_0, . . . , X_{at_n} ≤ x_n) = P(a^b X_{t_0} ≤ x_0, . . . , a^b X_{t_n} ≤ x_n),

for every x_0, . . . , x_n in R. Since the covariance function of the fBM is homogeneous of order 2H, we obtain that B^H is a self-similar process with Hurst index H, i.e., for any constant a > 0, the processes B^H(at) and a^H B^H(t) have the same distribution law.

3.2. How to estimate Hurst parameter H
Let us start with the simplest case:

dX_t = dB^H(t),   i.e., X_t = B^H(t), t ≥ 0,

where {B^H(t), t ≥ 0} is a fBM with Hurst parameter H ∈ (0, 1). Our question now is how to estimate the Hurst parameter H given observation data X_0, X_1, . . . , X_N. For this we need an extra ingredient, fractional Gaussian noise (fGn).

Definition 3.3 (Fractional Gaussian noise, [110]). Fractional Gaussian noise (fGn) {Y_i, i ≥ 1} is the increment of fractional Brownian motion, namely

Y_i = B^H(i + 1) − B^H(i), i ≥ 1.

230

J. Yang and J. Duan

Remark. It is a mean zero, stationary Gaussian time series whose autocovariance function is given by

ρ(h) = E(Y_i Y_{i+h}) = (1/2){ (h + 1)^{2H} − 2h^{2H} + |h − 1|^{2H} },   h ≥ 0.

An important point about ρ(h) is that

ρ(h) ∼ H(2H − 1)h^{2H−2},   as h → ∞,

when H ≠ 1/2. Since ρ(h) = 0 for h ≥ 1 when H = 1/2, the Y_i's are white noise in this case. The Y_i's, however, are positively correlated when 1/2 < H < 1, and we say that they display long-range dependence (LRD) or long memory. From the expression of fGn, we know that estimating the Hurst parameter of a fBM is the same as estimating the Hurst parameter of the respective fGn. Here, we introduce four different methods for measuring the Hurst parameter. Measurements are given on artificial data and the results of the methods are compared at the end. The measurement techniques can only be described briefly here, but references to fuller descriptions with mathematical details are given.

3.2.1. R/S method. The R/S method is one of the oldest and best known techniques for estimating H. It is discussed in detail in [83] and [10], p. 83–87. For a time series {Y_t : t = 1, 2, . . . , N} with partial sums given by Z(n) = Σ_{i=1}^n Y_i and the sample variance given by

S²(n) = (1/(n − 1)) Σ_{i=1}^n Y_i² − (1/(n(n − 1))) Z(n)²,

the R/S statistic, or rescaled adjusted range, is given by

R/S(n) = (1/S(n)) [ max_{1≤t≤n} (Z(t) − (t/n)Z(n)) − min_{1≤t≤n} (Z(t) − (t/n)Z(n)) ].

For fractional Gaussian noise,

E[R/S(n)] ∼ C_H n^H,

as n → ∞, where C_H is another positive, finite constant not depending on n. The procedure to estimate H is therefore as follows. For a time series of length N, subdivide the series into K blocks, each of size n = N/K. Then, for each lag n, compute R/S(k_i, n), starting at the points k_i = iN/K + 1, i = 1, 2, . . . , K − 1. In this way, a number of estimates of R/S(n) are obtained for each value of n. For values of n approaching N, one gets fewer values, as few as 1 when n ≥ N − N/K. Choosing logarithmically spaced values of n, plot log[R/S(k_i, n)] versus log n to get, for each n, several points on the plot. This plot is sometimes called the pox plot for the R/S statistic. The parameter H can be estimated by fitting a line to the points in the pox plot.

There are several disadvantages with this technique. Most notably, there are more estimates of the statistic for low values of n, where the statistic is affected most heavily by short-range correlation behavior. On the other hand, for high values of n there are too few points for a reliable estimate. The values between these high and low cut-off points should be used to estimate H but, in practice, it is often the case that widely differing values of H can be found by this method, depending on the high and low cut-off points chosen. To modify the R/S statistic, we can use a weighted sum of autocovariances instead of the sample variance. Details can be found in [82].

3.2.2. Aggregated variance. Given a time series {Y_t : t = 1, 2, . . . , N}, divide it into blocks of length m and aggregate the series over each block:

Y^{(m)}(k) := (1/m) Σ_{i=(k−1)m+1}^{km} Y_i,   k = 1, 2, . . . , [N/m].

We compute its sample variance,

VarY^{(m)} = (1/(N/m)) Σ_{k=1}^{N/m} (Y^{(m)}(k) − Ȳ)²,

where Ȳ = (1/N) Σ_{i=1}^N Y_i is the sample mean. The sample variance should be asymptotically proportional to m^{2H−2} for large N/m and m. Then, for successive values of m, the sample variance of the aggregated series is plotted versus m on a log-log plot, and we get the estimate of H by computing the gradient of that log-log plot. However, jumps in the mean and slowly decaying trends can severely affect this statistic. One technique to combat this is to difference the aggregated variance and work instead with

VarY^{(m+1)} − VarY^{(m)}.

3.2.3. Variance of residuals. This method is described in more detail in [101]. Take the series {Y_t : t = 1, 2, . . . , N} and divide it into blocks of length m. Within each block calculate the partial sums

Z_k(t) = Σ_{i=(k−1)m+1}^{(k−1)m+t} Y_i,   k = 1, . . . , N/m,   t = 1, . . . , m.

For each block make a least squares fit to a line a_k + b_k t. Subtract this line from the samples in the block to obtain the residuals, and then calculate their variance

V_k = (1/m) Σ_{t=1}^m (Z_k(t) − a_k − b_k t)².

The variance of residuals is proportional to m^{2H}. For the proof in the Gaussian case, see [118]. This variance of residuals is computed for each block, and the median (or average) is computed over the blocks. A log-log plot versus m should follow a straight line with a slope of 2H.
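The block/detrend/fit procedure above can be sketched as follows; it is tested here on white noise, i.e., fGn with H = 1/2, so the estimate should come out near 0.5.

```python
import numpy as np

def residual_variances(Y, m):
    """Median over blocks of the residual variance V_k after detrending the
    partial sums Z_k(t) within each block of length m by a least-squares line."""
    t = np.arange(1, m + 1)
    V = []
    for k in range(len(Y) // m):
        Z = np.cumsum(Y[k * m:(k + 1) * m])    # partial sums within block k
        b, a = np.polyfit(t, Z, 1)             # least-squares line a + b*t
        V.append(np.mean((Z - a - b * t) ** 2))
    return np.median(V)

# Slope of log median(V) versus log m estimates 2H.
Y = np.random.default_rng(5).standard_normal(20_000)   # white noise: H = 1/2
ms = np.array([20, 50, 100, 200, 500])
logV = [np.log(residual_variances(Y, m)) for m in ms]
H_hat = np.polyfit(np.log(ms), logV, 1)[0] / 2
print(H_hat)  # roughly 0.5 for white noise
```

Using the median over blocks, as suggested above, makes the statistic less sensitive to occasional outlying blocks than the average.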


3.2.4. Periodogram. The periodogram is a frequency-domain technique described in [49]. For a time series {Y_t : t = 1, 2, . . . , N}, it is defined by

I(λ) = (1/(2πN)) | Σ_{j=1}^N Y_j e^{ijλ} |²,

where λ is the frequency. In the finite variance case, I(λ) is an estimate of the spectral density of Y, and a series with long-range dependence will have a spectral density proportional to |λ|^{1−2H} for frequencies close to the origin. Therefore, the log-log plot of the periodogram versus the frequency displays a straight line with a slope of 1 − 2H.
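As a sketch, here are minimal implementations of two of the estimators described above, the aggregated-variance and R/S methods; applied to white noise (fGn with H = 1/2), both should return values near 0.5.

```python
import numpy as np

def hurst_aggvar(Y, block_sizes):
    """Aggregated-variance estimate: slope of log Var(Y^(m)) vs log m is 2H - 2."""
    logm, logv = [], []
    for m in block_sizes:
        k = len(Y) // m
        blocks = Y[:k * m].reshape(k, m).mean(axis=1)   # aggregated series Y^(m)
        logm.append(np.log(m)); logv.append(np.log(blocks.var()))
    slope = np.polyfit(logm, logv, 1)[0]
    return (slope + 2) / 2

def hurst_rs(Y, block_sizes):
    """R/S estimate: slope of log mean[R/S(n)] vs log n is H."""
    logn, logrs = [], []
    for n in block_sizes:
        rs = []
        for i in range(len(Y) // n):
            blk = Y[i * n:(i + 1) * n]
            Z = np.cumsum(blk - blk.mean())   # Z(t) - (t/n) Z(n), t = 1..n
            R = Z.max() - Z.min()             # adjusted range
            S = blk.std(ddof=1)               # sample standard deviation
            if S > 0:
                rs.append(R / S)
        logn.append(np.log(n)); logrs.append(np.log(np.mean(rs)))
    return np.polyfit(logn, logrs, 1)[0]

Y = np.random.default_rng(3).standard_normal(20_000)   # white noise: H = 1/2
sizes = [10, 20, 50, 100, 200, 500]
print(hurst_aggvar(Y, sizes), hurst_rs(Y, sizes))
```

As noted in the text, the R/S statistic is biased for small n, so in practice only a middle range of block sizes should enter the line fit.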

3.2.5. Results on simulated data. In this subsection, we use artificial data to check the robustness of the above techniques and compare the results. For each of the simulation methods chosen, traces of 10,000 points each have been generated. Hurst parameters of 0.65 and 0.95 have been chosen to represent a low and a high level of long-range dependence in the data. From Figure 2 and Figure 3, we see that the Variance of Residuals method and the R/S method give the most accurate results. The Modified Aggregated Variance method improves a little over the original one, but both still fluctuate too much.

3.3. How to estimate parameters in SDEs driven by fBM
Having discussed how to estimate the Hurst parameter from a series of artificial fBM data, we now consider how to estimate the parameters of linear or nonlinear stochastic differential equations driven by fBM. The coefficients in the stochastic differential equation could be deterministic or random, linear or nonlinear. No general results are available, so specific statistical results will be discussed below according to the particular model under consideration.

3.3.1. Preparation. The main difficulty in dealing with fBM is that it is not a semimartingale when H ≠ 1/2, and hence the results from the classical stochastic integration theory for semimartingales cannot be applied. We therefore first introduce the following integral transformation, which transforms fBM into a martingale and will be a key point in our development below. For 0 < s < t ≤ T, denote

k_H(t, s) = κ_H^{-1} s^{(1/2)−H} (t − s)^{(1/2)−H},   (14)

κ_H = 2HΓ(3/2 − H)Γ(H + 1/2),   (15)

w_t^H = λ_H^{-1} t^{2−2H},   λ_H = 2HΓ(3 − 2H)Γ(H + 1/2) / Γ(3/2 − H),   (16)

M_t^H = ∫_0^t k_H(t, s) dB_s^H.   (17)
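The constants and deterministic functions in (14)–(16) are elementary to compute; as a sanity check, for H = 1/2 (ordinary Brownian motion) one gets κ_H = λ_H = 1, k_H ≡ 1 and w_t^H = t.

```python
import math

def kappa(H):
    """Constant kappa_H in (15)."""
    return 2 * H * math.gamma(1.5 - H) * math.gamma(H + 0.5)

def lam(H):
    """Constant lambda_H appearing in (16)."""
    return 2 * H * math.gamma(3 - 2 * H) * math.gamma(H + 0.5) / math.gamma(1.5 - H)

def w(t, H):
    """Variance function (16): w_t^H = lambda_H^{-1} t^{2-2H}."""
    return t ** (2 - 2 * H) / lam(H)

def k(t, s, H):
    """Kernel (14): k_H(t, s) for 0 < s < t."""
    return s ** (0.5 - H) * (t - s) ** (0.5 - H) / kappa(H)

print(kappa(0.5), lam(0.5), w(2.0, 0.5))  # 1.0 1.0 2.0
```

These helpers are used below whenever the fundamental martingale machinery has to be discretized.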



Figure 2. Numerical estimation of the Hurst parameter H of fBM by the Aggregated Variance, Modified Aggregated Variance, Periodogram, Variance of Residuals and R/S methods: Actual value H = 0.65

Then the process M^H is a Gaussian martingale (see [78] and [92]), called the fundamental martingale, with variance function w^H.

3.3.2. Parameter estimation for a fractional Langevin equation. Suppose {X_t, t ≥ 0} satisfies the stochastic differential equation

X_t = θ ∫_0^t X_s ds + σB_t^H,   0 ≤ t ≤ T,


where θ and σ are unknown constant parameters and B_t^H is a fBM with Hurst parameter H ∈ [1/2, 1]. Denote by Z = (Z_t, t ∈ [0, T]) the process

Z_t = ∫_0^t k_H(t, s) dX_s.   (18)

Then one can prove that Z is a semimartingale associated to X with the following decomposition (see [69])

Z_t = θ ∫_0^t Q(s) dw_s^H + σM_t^H,   (19)



Figure 3. Numerical estimation of the Hurst parameter H of fBM by the Aggregated Variance, Modified Aggregated Variance, Periodogram, Variance of Residuals and R/S methods: Actual value H = 0.95

where

Q(t) = d/dw_t^H ∫_0^t k_H(t, s) X(s) ds,   (20)

and M_t^H is the Gaussian martingale defined by (17). From the representation (19), the quadratic variation of Z on the interval [0, t] is simply

⟨Z⟩_t = σ² w_t^H,   a.s.

Hence the parameter σ² can be obtained from

[w_t^H]^{-1} lim_n Σ_i ( Z_{t_{i+1}^n} − Z_{t_i^n} )² = σ²,   a.s.,

where {t_i^n} is an appropriate partition of [0, t] such that sup_i |t_{i+1}^n − t_i^n| → 0 as n → ∞. So the variance parameter can be computed with probability 1 on any finite time interval. As for the parameter θ, by applying the Girsanov-type formula for fBM proved in [69], we can define the following maximum likelihood estimate of θ based on the observation on the interval [0, T]:

θ_T = ( ∫_0^T Q²(s) dw_s^H )^{-1} ∫_0^T Q(s) dZ_s,   (21)

where the processes Q, Z and w_t^H are defined by (20), (18) and (16), respectively. For this estimate, strong consistency is proven, and explicit formulas for the asymptotic bias and mean square error are derived, by Kleptsyna and Le Breton [70].

Remarks
• When H = 1/2, since Q = Z = X and dw_s^{1/2} = ds, formula (21) reduces to the result of [80] for a usual Ornstein–Uhlenbeck process.
• For an arbitrary H ∈ [1/2, 1], one can derive the following alternative expression for θ_T:

θ_T = ( ∫_0^T Q²(s) dw_s^H )^{-1} [ (λ_H / (2(2 − 2H))) Z_T ∫_0^T s^{2H−1} dZ_s − T/2 ].

Example 3.4. Consider the special Ornstein–Uhlenbeck model dX_t = θX_t dt + 2dB_t^H. Then, according to the above approximation scheme, we can numerically estimate θ; for actual value θ = 1 the results are shown in Figure 4.


Figure 4. Numerical estimation of drift parameter θ in dXt = θXt dt + 2dBtH with Hurst parameter H = 0.75: Actual value θ0 = 1
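For the case H = 1/2, where (21) reduces to the classical Ornstein–Uhlenbeck MLE θ_T = (∫_0^T X_s² ds)^{-1} ∫_0^T X_s dX_s, the estimate is easy to check numerically on Euler-simulated data; the parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, sigma, dt, n = -0.5, 2.0, 0.01, 50_000   # hypothetical true values
X = np.empty(n + 1); X[0] = 0.0
for i in range(n):  # Euler-Maruyama: dX = theta0*X dt + sigma dB
    X[i + 1] = X[i] + theta0 * X[i] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

# Discretized version of (int X dX) / (int X^2 ds):
theta_T = np.sum(X[:-1] * np.diff(X)) / np.sum(X[:-1] ** 2 * dt)
print(theta_T)  # close to theta0 = -0.5
```

For H ≠ 1/2, the same ratio structure survives, but X must first be mapped to Z via the kernel k_H and Q computed from (20), which is substantially more work numerically.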


3.3.3. Parameter estimation in linear deterministic regression. Suppose X_t satisfies the stochastic differential equation

X_t = θ ∫_0^t A(s) ds + ∫_0^t C(s) dB_s^H,   0 ≤ t ≤ T,

where A and C are deterministic measurable functions on [0, T] and B_t^H is a fBM with Hurst parameter H ∈ [1/2, 1]. Let q_t be defined by

q_t = d/dw_t^H ∫_0^t k_H(t, s) (A/C)(s) ds,

where w_t^H and k_H(t, s) are defined by (16) and (14). Then, from Theorem 3 in [69], we obtain the maximum likelihood estimate of θ defined by

θ_T = ( ∫_0^T q_t² dw_t^H )^{-1} ∫_0^T q_t dZ_t,

where Z_t is defined by (18).

Remark. This result can be extended to an arbitrary H in (0, 1) (see [78]), and θ_T is also the best linear unbiased estimate of θ.

Example 3.5. Consider the special linear deterministic regression dX_t = −θ dt + t dB_t^H.

Then, using the above estimate, we can run a numerical simulation; the result is shown in Figure 5.

3.3.4. Parameter estimation in linear random regression. Let us consider a stochastic differential equation
$$dX(t) = [A(t, X(t)) + \theta\,C(t, X(t))]\,dt + \sigma(t)\,dB_t^H,\qquad t\ge 0,$$

where $B^H = \{B_t^H, t\ge 0\}$ is a fractional Brownian motion with Hurst parameter $H$ and $\sigma(t)$ is a positive nonvanishing function on $[0,\infty)$. According to [105], the maximum likelihood estimate $\hat\theta_T$ of $\theta$ is given by
$$\hat\theta_T = \frac{\int_0^T J_2(t)\,dZ_t - \int_0^T J_1(t)J_2(t)\,dw_t^H}{\int_0^T J_2^2(t)\,dw_t^H},$$
where the processes $Z_t$, $J_1$, $J_2$ are defined by
$$Z_t = \int_0^t \frac{k_H(t,s)}{\sigma(s)}\,dX_s,\quad t\ge 0,\qquad
J_1(t) = \frac{d}{dw_t^H}\int_0^t k_H(t,s)\,\frac{A(s, X(s))}{\sigma(s)}\,ds,\qquad
J_2(t) = \frac{d}{dw_t^H}\int_0^t k_H(t,s)\,\frac{C(s, X(s))}{\sigma(s)}\,ds,$$



Figure 5. Numerical estimation of drift parameter $\theta$ in the linear deterministic regression $dX_t = -\theta\,dt + t\,dB_t^H$ with Hurst parameter $H = 0.75$: actual value $\theta = 1$

and $w_t^H$, $k_H(t, s)$ are defined by (16) and (14). In the same paper it is also proved that $\hat\theta_T$ is strongly consistent for the true value $\theta$.

Example 3.6. Consider a special linear random regression $dX_t = (t + \theta X_t)\,dt + t\,dB_t^H$. A numerical estimation of the parameter $\theta$ is shown in Figure 6.

4. Parameter estimation for SDEs driven by α-stable Lévy motion

Brownian motion, as a Gaussian process, has been widely used to model fluctuations in engineering and science. For a particle in Brownian motion, its sample paths are continuous in time almost surely (i.e., no jumps), its mean square displacement increases linearly in time (i.e., normal diffusion), and its probability density function decays exponentially in space (i.e., light tail or exponential relaxation) [95]. However, some complex phenomena involve non-Gaussian fluctuations, with properties such as anomalous diffusion (mean square displacement is a nonlinear power law of time) [15] and heavy tails (non-exponential relaxation) [129]. For instance, it has been argued that diffusion in a case of geophysical turbulence [114] is anomalous. Loosely speaking, the diffusion process consists of a series of "pauses", when the particle is trapped by a coherent structure, and "flights" or "jumps" or other extreme events, when the particle moves in a jet flow. Moreover,



Figure 6. Numerical estimation of drift parameter $\theta$ in the linear random regression $dX_t = (t + \theta X_t)\,dt + t\,dB_t^H$ with Hurst parameter $H = 0.75$: actual value $\theta = 1$

anomalous electrical transport properties have been observed in some amorphous materials such as insulators, semiconductors and polymers, where the transient current is asymptotically a power-law function of time [112, 53]. Finally, some paleoclimatic data [29] indicate heavy-tail distributions, and some DNA data [114] show long-range power-law decay of spatial correlations. Lévy motions are thought to be appropriate models for non-Gaussian processes with jumps [111]. Here we consider a special non-Gaussian process, the α-stable Lévy motion, which arises in many complex systems [126].

4.1. α-stable Lévy motion
There are several reasons for using a stable distribution to model a fluctuation process in a dynamical system. Firstly, there are theoretical reasons for expecting a non-Gaussian stable model, e.g., hitting times for a Brownian motion yield a Lévy distribution, and reflection off a rotating mirror yields a Cauchy distribution. The second reason is the generalized central limit theorem, which states that the only possible non-trivial limit of normalized sums of i.i.d. terms is stable. The third argument for modeling with stable distributions is empirical: many large data sets exhibit heavy tails and skewness. In this section, we consider one-dimensional α-stable distributions defined as follows.


Definition 4.1 ([64], Chapter 2.4). The characteristic function $\varphi(u)$ of an α-stable random variable is given by
$$\varphi(u) = \exp\big(-\sigma^\alpha|u|^\alpha\{1 - i\beta\,\mathrm{sgn}(u)\tan(\alpha\pi/2)\} + i\mu u\big), \tag{22}$$
where $\alpha\in(0,1)\cup(1,2)$, $\beta\in[-1,1]$, $\sigma\in\mathbb{R}^+$, $\mu\in\mathbb{R}$; by
$$\varphi(u) = \exp\Big(-\sigma|u|\Big\{1 + i\beta\,\frac{2}{\pi}\,\mathrm{sgn}(u)\log|u|\Big\} + i\mu u\Big) \tag{23}$$
when $\alpha = 1$, which for $\beta = 0$ gives the well-known symmetric Cauchy distribution; and by
$$\varphi(u) = \exp\Big(-\frac{1}{2}\,\sigma^2|u|^2 + i\mu u\Big) \tag{24}$$
when $\alpha = 2$, which gives the well-known Gaussian distribution.

For a random variable $X$ distributed according to the rule described above we use the notation $X \sim S_\alpha(\sigma, \beta, \mu)$. In particular, when $\mu = \beta = 0$, i.e., $X$ is a symmetric α-stable random variable, we denote it $X \sim S\alpha S$. From the above definition it is easy to see that the full stable class is characterized by four parameters, usually designated $\alpha$, $\beta$, $\sigma$ and $\mu$.

The shift parameter $\mu$ simply shifts the distribution to the left or right. The scale parameter $\sigma$ compresses or extends the distribution about $\mu$ in proportion to $\sigma$: if the variable $x$ has the stable distribution $S_\alpha(\sigma, \beta, \mu)$, the transformed variable $z = (x - \mu)/\sigma$ has the same shaped distribution, but with location parameter 0 and scale parameter 1. The two remaining parameters completely determine the distribution's shape. The characteristic exponent $\alpha$ lies in the range $(0, 2]$ and determines the rate at which the tails of the distribution taper off. When $\alpha = 2$, a normal distribution results; when $\alpha < 2$, the variance is infinite. When $\alpha > 1$, the mean of the distribution exists and is equal to $\mu$; when $\alpha \le 1$, the tails are so heavy that even the mean does not exist. The fourth parameter $\beta$ determines the skewness of the distribution and lies in the range $[-1, 1]$.

Now let us introduce α-stable Lévy motions.

Definition 4.2 (α-stable Lévy motion [64]). A stochastic process $\{X(t) : t \ge 0\}$ is called the (standard) α-stable Lévy motion if
1. $X(0) = 0$ a.s.;
2. $\{X(t) : t \ge 0\}$ has independent increments;
3. $X(t) - X(s) \sim S_\alpha((t-s)^{1/\alpha}, \beta, 0)$ for any $0 \le s < t < \infty$.

So, by the third condition, we can simulate any α-stable Lévy motion once we know how to simulate $X \sim S_\alpha(\sigma, \beta, 0)$. In particular, it is enough to simulate $X \sim S_\alpha(\sigma, 0, 0)$ if we want to get the trajectories of symmetric α-stable Lévy motions.
We recall an important property of α-stable random variables, which gives the following result: it is enough to know how to simulate $X \sim S_\alpha(1, 0, 0)$ in order to get any $X \sim S_\alpha(\sigma, 0, 0)$, $\sigma \in \mathbb{R}^+$.
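In practice $S_\alpha(1,0,0)$ samples are usually generated with the Chambers-Mallows-Stuck transform, a standard method in the simulation literature around [64] (not spelled out in the text; the function name below is ours):

```python
import numpy as np

def symmetric_alpha_stable(alpha, size, rng=None):
    """Draw S_alpha(1, 0, 0) samples via the Chambers-Mallows-Stuck
    transform (standard simulation route for stable laws)."""
    rng = np.random.default_rng(rng)
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform phase
    W = rng.exponential(1.0, size)                 # unit exponential
    if alpha == 1.0:
        return np.tan(U)                           # symmetric Cauchy case
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * U) / W) ** ((1 - alpha) / alpha))

# Scaling property: sigma * S_alpha(1,0,0) ~ S_alpha(sigma,0,0)
samples = 2.0 * symmetric_alpha_stable(1.2, 10_000, rng=42)
```

Cumulative sums of such increments, scaled by $(\Delta t)^{1/\alpha}$ as in condition 3 of Definition 4.2, produce sample paths like those in Figure 7.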



Figure 7. Three sample paths of symmetric α-stable Lévy motion with α = 0.4, 1.2, 1.9, respectively

Proposition 4.3. If $X_1, X_2 \sim S_\alpha(\sigma, \beta, \mu)$, $A$ and $B$ are real positive constants and $C$ is a real constant, then
$$AX_1 + BX_2 + C \sim S_\alpha\big(\sigma(A^\alpha + B^\alpha)^{1/\alpha},\ \beta,\ \mu(A^\alpha + B^\alpha)^{1/\alpha} + C\big).$$

Proposition 4.4. Let $X \sim S_\alpha(\sigma, \beta, 0)$ with $0 < \alpha < 2$. Then $E|X|^p < \infty$ for any $0 < p < \alpha$, and $E|X|^p = \infty$ for any $p \ge \alpha$.

Figure 7 shows sample paths of the α-stable Lévy motion for different α. As we can see in Figure 7, the bigger the parameter α is, the more the path looks like Brownian motion. Generally speaking, when we deal with concrete data, we have to choose α-stable processes very carefully to get the best estimation. We now discuss how to estimate α.

4.2. How to estimate the characteristic exponent α
Five different methods for estimating the characteristic exponent α of an α-stable distribution are considered: the Characteristic Function Method (CFM), the Quantile Method, the Maximum Likelihood Method, the Extreme Value Method and the Moment Method. As in the last section, the methods are run on artificial data and their results are compared at the end of this section.

4.2.1. Characteristic function method. Since α-stable distributions are uniquely determined by their characteristic function (CF), it is natural to estimate parameters by studying the CF. Press [106] introduced a parameter estimation method based on the CF, which obtains estimates by minimizing differences between the sample CF and the theoretical one, but this method is only applicable to standard distributions. Another method, which uses the linearity of the logarithm of the CF, was developed by Koutrouvelis [74] and can be applied to general α-stable cases; it is denoted Kou-CFM. The idea is as follows. On the one hand, since $\mathrm{Re}(\ln\varphi(u)) = -\sigma^\alpha|u|^\alpha$, taking the logarithm of the real part of the log-CF gives
$$\ln\big[-\mathrm{Re}(\ln\varphi(u))\big] = \alpha\ln|u| + \alpha\ln\sigma.$$
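This log-linearity can be checked numerically: evaluate the empirical characteristic function on a grid of $u$ values and read α off the slope by least squares, in the spirit of the Kogon-Williams variant (the grid, sample sizes and function name below are our illustrative choices):

```python
import numpy as np

def estimate_alpha_cf(x, us=None):
    """Slope of ln(-ln|sample CF|) against ln(u) estimates alpha.
    Note -Re(ln phi) = -ln|phi|, so the modulus of the CF suffices."""
    if us is None:
        us = np.linspace(0.1, 1.0, 10)             # illustrative u grid
    phi = np.array([np.abs(np.mean(np.exp(1j * u * x))) for u in us])
    y = np.log(-np.log(phi))                       # = alpha*ln u + alpha*ln sigma
    alpha_hat, _ = np.polyfit(np.log(us), y, 1)    # least-squares line fit
    return alpha_hat
```

Since $|\varphi(u)| = \exp(-\sigma^\alpha|u|^\alpha)$ regardless of the skewness $\beta$, the slope fit targets α even for skewed samples.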


On the other hand, the sample characteristic function is given by $\hat\varphi(\theta) = \frac{1}{N}\sum_{k=1}^{N} e^{i\theta y_k}$, where the $y_k$ are $N$ independent observations. In [74], a regression technique is applied to obtain estimates of all parameters of an observed α-stable distribution. In [72], Kogon improved this method by replacing the linear regression fit by a linear least-squares fit, which gives a more accurate estimate at lower computational cost.

4.2.2. Quantile method. Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. Suppose we have $n$ independent symmetric α-stable random variables with the stable distribution $S_\alpha(\sigma, \beta, \mu)$, whose parameters are to be estimated. Let $x_p$ be the $p$th quantile, so that $S_\alpha(x_p; \sigma, \beta, \mu) = p$, and let $\hat x_p$ be the corresponding sample quantile; then $\hat x_p$ is a consistent estimate of $x_p$. In 1971, Fama and Roll [41] discovered that, for some large $p$ (for example, $p = 0.95$),
$$\hat z_p = \frac{\hat x_p - \hat x_{1-p}}{2\hat\sigma} = (0.827)\,\frac{\hat x_p - \hat x_{1-p}}{\hat x_{0.72} - \hat x_{0.28}}$$
is an estimate of the $p$-quantile of the standardized symmetric stable distribution with exponent α. Based on this, they proposed an estimate (QM) for symmetric α-stable distributions. The serious disadvantage of this method, however, is that its estimates are asymptotically biased. Later on, McCulloch [87] improved and extended this result to general α-stable distributions (McCulloch-QM). First, he defined
$$v_\alpha = \frac{x_{0.95} - x_{0.05}}{x_{0.75} - x_{0.25}},\qquad
v_\beta = \frac{x_{0.95} + x_{0.05} - 2x_{0.5}}{x_{0.95} - x_{0.05}},$$
and let $\hat v_\alpha$ and $\hat v_\beta$ be the corresponding sample values:
$$\hat v_\alpha = \frac{\hat x_{0.95} - \hat x_{0.05}}{\hat x_{0.75} - \hat x_{0.25}},\qquad
\hat v_\beta = \frac{\hat x_{0.95} + \hat x_{0.05} - 2\hat x_{0.5}}{\hat x_{0.95} - \hat x_{0.05}},$$

which are consistent estimates of the indices $v_\alpha$ and $v_\beta$. He then showed that an estimate of α can be expressed as a function of $\hat v_\alpha$ and $\hat v_\beta$. Compared with QM, McCulloch-QM yields consistent and unbiased estimates for general α-stable distributions and extends the estimation range of α to $0.6 \le \alpha \le 2$. Despite its computational simplicity, the method has a number of drawbacks: the fractiles have no analytical expressions, and the required tables depend on α in a nonlinear way. The technique does not provide any closed-form solutions.

4.2.3. Extreme value method. In 1996, based on asymptotic extreme value theory, order statistics and fractional lower-order moments, Tsihrintzis and Nikias [119] proposed a new, fast-to-compute estimate for symmetric α-stable distributions from a set of i.i.d. observations. Five years later, Kuruoglu [76] extended it to general α-stable distributions. The general idea of this method is as follows. Given a data series $\{X_i : i = 1, 2, \ldots, N\}$, divide it into $L$ nonoverlapping blocks of length $K = N/L$. Then the logarithms of the maximum and minimum samples of each block are computed:
$$Y_l = \log\big(\max\{X_{lK-K+i} \mid i = 1, 2, \ldots, K\}\big),\qquad
\bar Y_l = \log\big(-\min\{X_{lK-K+i} \mid i = 1, 2, \ldots, K\}\big).$$
The sample means and variances of $Y_l$ and $\bar Y_l$ are calculated as
$$Y = \frac{1}{L}\sum_{l=1}^{L} Y_l,\qquad s^2 = \frac{1}{L-1}\sum_{l=1}^{L}(Y_l - Y)^2,$$
$$\bar Y = \frac{1}{L}\sum_{l=1}^{L} \bar Y_l,\qquad \bar s^2 = \frac{1}{L-1}\sum_{l=1}^{L}(\bar Y_l - \bar Y)^2.$$
Finally, an estimate for α is given in terms of the sample standard deviations:
$$\hat\alpha = \frac{\pi}{2\sqrt{6}}\Big(\frac{1}{s} + \frac{1}{\bar s}\Big).$$

Even though the accuracy decreases, so does the computational complexity; however, there is no closed form for the block size, which means a look-up table must be used to determine the segment size $K$.

4.2.4. Moment estimation method. Another way to estimate the parameters of a general α-stable distribution is the logarithmic moments method, also introduced by Kuruoglu [76]. The advantage of this method relative to the fractional lower-order moment method is that it requires neither the inversion of a sinc function nor the choice of a moment exponent $p$. The main feature is that the estimate of α can be expressed as a function of the second-order log-moment of the skewed process, i.e.,
$$\hat\alpha = \Big(\frac{L_2}{\psi_1} - \frac{1}{2}\Big)^{-1/2},$$
where $\psi_1 = \frac{\pi^2}{6}$ and, for any $X \sim S_\alpha(\sigma, \beta, 0)$, $L_2$ is defined by
$$L_2 = E\big[(\log|X| - E[\log|X|])^2\big] = \psi_1\Big(\frac{1}{2} + \frac{1}{\alpha^2}\Big) - \frac{\theta^2}{\alpha^2}.$$
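In the symmetric case ($\theta = 0$) the estimator above becomes a one-liner, replacing $L_2$ by the sample variance of $\log|X|$; a sketch (the function name is ours):

```python
import numpy as np

def alpha_log_moment(x):
    """Log-moment estimate alpha_hat = (L2/psi1 - 1/2)^(-1/2),
    with L2 the sample variance of log|X| (symmetric-case sketch)."""
    psi1 = np.pi ** 2 / 6
    L2 = np.var(np.log(np.abs(x)), ddof=1)   # sample second log-moment
    return (L2 / psi1 - 0.5) ** -0.5
```

For a Cauchy sample ($\alpha = 1$) the population value is $L_2 = \psi_1 \cdot 3/2$, and for a Gaussian sample ($\alpha = 2$) it is $L_2 = \pi^2/8$, so both recover α exactly in the limit.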

4.2.5. Results on simulated data. In this subsection we use artificial data to check the robustness of the above techniques and compare the results. For each of the estimation methods, estimates of α have been generated, each trace consisting of 1,000 data points. Characteristic exponents of 0.95 and 1.70 were chosen to represent a low and a high rate at which the tails of the distribution taper off. From Figures 8 and 9, we can see that the Characteristic Function Method and the Moment Estimation Method give the most accurate results. The Quantile



Figure 8. Numerical estimation of the characteristic exponent α in the α-stable Lévy motion $L_t^\alpha$ by 4 different methods (CFM, EVM, QM, MEM): actual value α = 0.95

Method behaved a little better than the Extreme Value Method, but both still fluctuate too much when α is small. As to convergence, all methods get closer and closer to the true value as the number of data points increases, except for the Extreme Value Method.

4.3. How to Estimate Parameters in SDEs Driven by Lévy Motion
Having discussed how to estimate the characteristic exponent of α-stable Lévy motions, we now consider how to estimate the parameters in stochastic differential equations driven by general Lévy motion. Just as for fBM, no general results about parameter estimation are available at this time; some special results are listed below for different equations.

We consider parameter estimation of the Lévy-driven stationary Ornstein-Uhlenbeck process. Recently, Brockwell, Davis and Yang [16] studied parameter estimation problems for the Lévy-driven Langevin equation (whose solution is called an Ornstein-Uhlenbeck process) based on observations made at uniformly and closely-spaced times. The idea is to obtain a highly efficient estimate of the Lévy-driven Ornstein-Uhlenbeck coefficient by estimating the corresponding coefficient of the sampled process. The main feature is discussed below. Consider a stochastic differential equation driven by the Lévy motion $\{L(t), t \ge 0\}$:
$$dY(t) = -\theta Y(t)\,dt + \sigma\,dL(t).$$



Figure 9. Numerical estimation of the characteristic exponent α in the α-stable Lévy motion $L_t^\alpha$ by 4 different methods (CFM, EVM, QM, MEM): actual value α = 1.70

When $L(t)$ is Brownian motion, the solution of the above equation can be expressed as
$$Y(t) = e^{-\theta t}\,Y(0) + \sigma\int_0^t e^{-\theta(t-u)}\,dL(u). \tag{25}$$

For any second-order driving Lévy motion, the process $\{Y(t)\}$ can be defined in the same way, and if $\{L(t)\}$ is non-decreasing, $\{Y(t)\}$ can also be defined pathwise as a Riemann-Stieltjes integral by (25). For the convenience of the simulation, we rewrite the solution as follows:
$$Y(t) = e^{-\theta(t-s)}\,Y(s) + \sigma\int_s^t e^{-\theta(t-u)}\,dL(u),\qquad \text{for all } t > s \ge 0. \tag{26}$$

Now we collect all information corresponding to the sampled process in order to get the estimate. Set $t = nh$ and $s = (n-1)h$ in equation (26). Then the sampled process $\{Y_n^{(h)}, n = 0, 1, 2, \ldots\}$ (or the discrete-time AR(1) process) satisfies
$$Y_n^{(h)} = \varphi\,Y_{n-1}^{(h)} + Z_n,\qquad \text{where } \varphi = e^{-\theta h}\ \text{ and }\ Z_n = \sigma\int_{(n-1)h}^{nh} e^{-\theta(nh-u)}\,dL(u).$$

Then, using the highly efficient Davis-McCormick estimate of $\varphi$, namely
$$\hat\varphi_N^{(h)} = \min_{1\le n\le N} \frac{Y_n^{(h)}}{Y_{n-1}^{(h)}},$$


we can get the estimates of $\theta$ and $\sigma$ as follows:
$$\hat\theta_N^{(h)} = -h^{-1}\log\hat\varphi_N^{(h)},\qquad
\hat\sigma_N^2 = \frac{2\hat\theta_N^{(h)}}{N}\sum_{i=0}^{N}\big(Y_i^{(h)} - \bar Y_N^{(h)}\big)^2.$$
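A minimal end-to-end sketch of this scheme (all numeric values, and the choice of a compound-Poisson subordinator with unit-exponential jumps as the nonnegative driving Lévy process, are our illustrative assumptions):

```python
import numpy as np

# Davis-McCormick sketch for a nonnegative Levy driver: L is taken to be
# a compound-Poisson subordinator (rate lam, unit-exponential jumps).
rng = np.random.default_rng(1)
theta, sigma, lam, h, N = 1.0, 2.0, 5.0, 0.01, 20_000

Y = np.empty(N + 1)
Y[0] = 1.0
for n in range(1, N + 1):
    # jumps of L in ((n-1)h, nh]: Poisson count, exponential sizes,
    # uniform positions; each jump discounted by exp(-theta*(nh - u))
    k = rng.poisson(lam * h)
    u = rng.uniform(0.0, h, k)
    Z_n = sigma * np.sum(rng.exponential(1.0, k) * np.exp(-theta * (h - u)))
    Y[n] = np.exp(-theta * h) * Y[n - 1] + Z_n

phi_hat = np.min(Y[1:] / Y[:-1])       # Davis-McCormick estimate of phi
theta_hat = -np.log(phi_hat) / h       # recovered drift parameter
```

With jump rate $\lambda h = 0.05$ per sampling interval, most intervals contain no jump, so the minimum ratio recovers $\varphi = e^{-\theta h}$ essentially exactly; $\hat\sigma_N^2$ then follows from the second formula above.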

Example 4.5. Consider a Lévy-driven Ornstein-Uhlenbeck process satisfying the following SDE:
$$dX_t = -X_t\,dt + \sigma\,dL_t^\alpha. \tag{27}$$

A numerical estimation of the diffusion parameter $\sigma$ is shown in Figure 10.


Figure 10. Numerical estimation of the diffusion parameter $\sigma$ in the Lévy-driven Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + \sigma\,dL_t^\alpha$ with α = 0.95: actual value $\sigma = 2$

References

[1] Y. Aït-Sahalia (2002), Maximum-likelihood estimation of discretely-sampled diffusions: a closed-form approximation approach, Econometrica 70, 223–262.
[2] Y. Aït-Sahalia and P.A. Mykland (2004), Estimators of diffusions with randomly spaced discrete observations: a general theory, The Annals of Statistics 32(5), 2186–2222.
[3] Y. Aït-Sahalia and P.A. Mykland (2003), The effects of random and discrete sampling when estimating continuous-time diffusions, Econometrica 71(2), 483–549.


[4] S. Albeverio, B. Rüdiger and J.L. Wu (2000), Invariant measures and symmetry property of Lévy type operators, Potential Analysis 13, 147–168.
[5] S. Alizadeh, M.W. Brandt and F.X. Diebold (2002), Range-based estimation of stochastic volatility models, The Journal of Finance 57(3), 1047–1091.
[6] D. Applebaum (2009), Lévy Processes and Stochastic Calculus, 2nd edition, Cambridge University Press, UK.
[7] L. Arnold (1998), Random Dynamical Systems, Springer, New York.
[8] O.E. Barndorff-Nielsen, T. Mikosch and S.I. Resnick (Eds.) (2001), Lévy Processes: Theory and Applications, Birkhäuser, Boston.
[9] C. Bender (2003), An Itô formula for generalized functionals of a fractional Brownian motion with arbitrary Hurst parameter, Stoch. Proc. Appl. 104, 81–106.
[10] J. Beran (1994), Statistics for Long-Memory Processes, Chapman and Hall.
[11] J. Bertoin (1998), Lévy Processes, Cambridge University Press, Cambridge, UK.
[12] P. Billingsley (1961), Statistical Inference for Markov Processes, Chicago University Press, Chicago.
[13] J.P.N. Bishwal (2007), Parameter Estimation in Stochastic Differential Equations, Springer, New York.
[14] D. Blömker and S. Maier-Paape (2003), Pattern formation below criticality forced by noise, Z. Angew. Math. Phys. 54(1), 1–25.
[15] J.P. Bouchaud and A. Georges (1990), Anomalous diffusion in disordered media: statistical mechanisms, models and physical applications, Phys. Repts 195, 127–293.
[16] P.J. Brockwell, R.A. Davis and Y. Yang (2007), Estimation for nonnegative Lévy-driven Ornstein-Uhlenbeck processes, J. Appl. Probab. 44(4), 977–989.
[17] T. Caraballo, J. Langa and J.C. Robinson (2001), A stochastic pitchfork bifurcation in a reaction-diffusion equation, R. Soc. Lond. Proc. Ser. A 457, 2041–2061.
[18] B. Chen (2009), Stochastic dynamics of water vapor in the climate system, Ph.D. Thesis, Illinois Institute of Technology, Chicago, USA.
[19] B. Chen and J. Duan (2009), Stochastic quantification of missing mechanisms in dynamical systems, in "Recent Development in Stochastic Dynamics and Stochastic Analysis", Interdisciplinary Math. Sci. 8, 67–76.
[20] A. Chronopoulou and F. Viens (2009), Hurst index estimation for self-similar processes with long-memory, in "Recent Development in Stochastic Dynamics and Stochastic Analysis", J. Duan, S. Luo and C. Wang (Eds.), 91–118, World Scientific.
[21] J.M. Corcuera, D. Nualart and J.H.C. Woerner (2006), Power variation of some integral fractional processes, Bernoulli 12, 713–735.
[22] J.M. Corcuera, D. Nualart and J.H.C. Woerner (2007), A functional central limit theorem for the realized power variation of integrated stable processes, Stochastic Analysis and Applications 25, 169–186.
[23] J. Coeurjolly (2001), Estimating the parameters of the fractional Brownian motion by discrete variations of its sample paths, Statistical Inference for Stochastic Processes 4, 199–227.


[24] J. Coeurjolly (2000), Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study, Journal of Statistical Software 5(7).
[25] H. Crauel and F. Flandoli (1998), Additive noise destroys a pitchfork bifurcation, Journal of Dynamics and Differential Equations 10, 259–274.
[26] D. Dacunha-Castelle and D. Florens-Zmirou (1986), Estimation of the coefficients of a diffusion from discrete observations, 19, 263–284.
[27] G. Da Prato and J. Zabczyk (1992), Stochastic Equations in Infinite Dimensions, Cambridge University Press.
[28] M. Davis (2001), Pricing weather derivatives by marginal value, Quantitative Finance 1(3), 305–308.
[29] P.D. Ditlevsen (1999), Observation of α-stable noise induced millennial climate changes from an ice record, Geophys. Res. Lett. 26, 1441–1444.
[30] J.L. Doob (1953), Stochastic Processes, John Wiley, New York.
[31] A. Du and J. Duan (2009), A stochastic approach for parameterizing unresolved scales in a system with memory, Journal of Algorithms & Computational Technology 3, 393–405.
[32] J. Duan (2009), Stochastic modeling of unresolved scales in complex systems, Frontiers of Math. in China 4, 425–436.
[33] J. Duan (2009), Predictability in spatially extended systems with model uncertainty I & II, Engineering Simulation 2, 17–32 & 3, 21–35.
[34] J. Duan (2009), Predictability in nonlinear dynamical systems with model uncertainty, in Stochastic Physics and Climate Modeling, T.N. Palmer and P. Williams (Eds.), Cambridge Univ. Press, pp. 105–132.
[35] J. Duan, X. Kan and B. Schmalfuss (2009), Canonical sample spaces for stochastic dynamical systems, in "Perspectives in Mathematical Sciences", Interdisciplinary Math. Sci. 9, 53–70.
[36] J. Duan, C. Li and X. Wang (2009), Modeling colored noise by fractional Brownian motion, Interdisciplinary Math. Sci. 8, 119–130.
[37] J. Duan and B. Nadiga (2007), Stochastic parameterization of large eddy simulation of geophysical flows, Proc. American Math. Soc. 135, 1187–1196.
[38] G. Dohnal (1987), On estimating the diffusion coefficient, J. Appl. Prob. 24, 105–114.
[39] O. Elerian, S. Chib and N. Shephard (2001), Likelihood inference for discretely observed non-linear diffusions, Econometrica 69(4), 959–993.
[40] B. Eraker (2001), MCMC analysis of diffusion models with application to finance, Journal of Business and Economic Statistics 19(2), 177–191.
[41] E.F. Fama and R. Roll (1971), Parameter estimates for symmetric stable distributions, Journal of the American Statistical Association 66, 331–338.
[42] D. Florens-Zmirou (1989), Approximate discrete-time schemes for statistics of diffusion processes, Statistics 20, 547–557.
[43] C.W. Gardiner (1985), Handbook of Stochastic Methods, Second Ed., Springer, New York.


[44] J. Garcia-Ojalvo and J.M. Sancho (1999), Noise in Spatially Extended Systems, Springer-Verlag.
[45] V. Genon-Catalot and J. Jacod (1993), On the estimation of the diffusion coefficient for multi-dimensional diffusion processes, Annales de l'Inst. H. Poincaré, section B, tome 29.
[46] V. Genon-Catalot and J. Jacod (1993), On the estimation of the diffusion coefficient for multidimensional diffusion processes, Ann. Inst. Henri Poincaré, Probabilités et Statistiques 29, 119–151.
[47] V. Genon-Catalot and J. Jacod (1994), On the estimation of the diffusion coefficient for diffusion processes, J. Statist. 21, 193–221.
[48] V. Genon-Catalot, T. Jeantheau and C. Laredo (1999), Parameter estimation for discretely observed stochastic volatility models, Bernoulli 5(5), 855–872.
[49] J. Geweke and S. Porter-Hudak (1983), The estimation and application of long memory time series models, Time Ser. Anal. 4, 221–238.
[50] P. Hanggi and P. Jung (1995), Colored noise in dynamical systems, Advances in Chem. Phys. 89, 239–326.
[51] L.P. Hansen (1982), Large sample properties of generalized method of moments estimators, Econometrica 63, 767–804.
[52] C. Hein, P. Imkeller and I. Pavlyukevich (2009), Limit theorems for p-variations of solutions of SDEs driven by additive stable Lévy noise and model selection for paleo-climatic data, in "Recent Development in Stochastic Dynamics and Stochastic Analysis", J. Duan, S. Luo and C. Wang (Eds.), Interdisciplinary Math. Sci. 8.
[53] M.P. Herrchen (2001), Stochastic Modeling of Dispersive Diffusion by Non-Gaussian Noise, Doctoral Thesis, Swiss Federal Inst. of Tech., Zürich.
[54] C.C. Heyde (1997), Quasi-Likelihood and its Application: A General Approach to Optimal Parameter Estimation, Springer, New York.
[55] W. Horsthemke and R. Lefever (1984), Noise-Induced Transitions, Springer-Verlag, Berlin.
[56] J.E. Hutton and P.I. Nelson (1986), Quasi-likelihood estimation for semimartingales, Stochastic Processes and their Applications 22, 245–257.
[57] I.A. Ibragimov and R.Z. Has'minskii (1981), Statistical Estimation: Asymptotic Theory, Springer-Verlag.
[58] N. Ikeda and S. Watanabe (1989), Stochastic Differential Equations and Diffusion Processes, North-Holland Publishing Company, Amsterdam.
[59] P. Imkeller and I. Pavlyukevich (2002), Model reduction and stochastic resonance, Stochastics and Dynamics 2(4), 463–506.
[60] P. Imkeller and I. Pavlyukevich (2006), First exit time of SDEs driven by stable Lévy processes, Stoch. Proc. Appl. 116, 611–642.
[61] P. Imkeller, I. Pavlyukevich and T. Wetzel (2009), First exit times for Lévy-driven diffusions with exponentially light jumps, Annals of Probability 37(2), 530–564.
[62] J. Nicolau (2004), Introduction to the Estimation of Stochastic Differential Equations Based on Discrete Observations, Stochastic Finance 2004 (Autumn School and International Conference).


[63] J. Jacod (2006), Parametric inference for discretely observed non-ergodic diffusions, Bernoulli 12(3), 383–401.
[64] A. Janicki and A. Weron (1994), Simulation and Chaotic Behavior of α-Stable Stochastic Processes, Marcel Dekker, Inc.
[65] W. Just, H. Kantz, C. Rodenbeck and M. Helm (2001), Stochastic modeling: replacing fast degrees of freedom by noise, J. Phys. A: Math. Gen. 34, 3199–3213.
[66] I. Karatzas and S.E. Shreve (1991), Brownian Motion and Stochastic Calculus, 2nd edition, Springer.
[67] M. Kessler (2000), Simple and explicit estimating functions for a discretely observed diffusion process, Scandinavian Journal of Statistics 27(1), 65–82.
[68] V. Krishnan (2005), Nonlinear Filtering and Smoothing: An Introduction to Martingales, Stochastic Integrals and Estimation, Dover Publications, Inc., New York.
[69] M.L. Kleptsyna, A. Le Breton and M.C. Roubaud (2000), Parameter estimation and optimal filtering for fractional type stochastic systems, Statist. Inf. Stochast. Proces. 3, 173–182.
[70] M.L. Kleptsyna and A. Le Breton (2002), Statistical analysis of the fractional Ornstein-Uhlenbeck type process, Statistical Inference for Stochastic Processes 5(3), 229–242.
[71] F. Klebaner (2005), Introduction to Stochastic Calculus with Application, Imperial College Press, Second Edition.
[72] S. Kogon and D. Williams (1998), Characteristic function based estimation of stable distribution parameters, in A Practical Guide to Heavy Tails, R. Adler, R. Feldman and M. Taqqu (Eds.), Birkhäuser, Berlin, 311–335.
[73] A.N. Kolmogorov (1940), Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum, C.R. (Doklady) Acad. URSS (N.S.) 26, 115–118.
[74] I.A. Koutrouvelis (1980), Regression-type estimation of the parameters of stable laws, Journal of the American Statistical Association 75, 918–928.
[75] H. Kunita (2004), Stochastic differential equations based on Lévy processes and stochastic flows of diffeomorphisms, in Real and Stochastic Analysis (M.M. Rao, Ed.), 305–373, Birkhäuser, Boston, MA.
[76] E.E. Kuruoglu (2001), Density parameter estimation of skewed α-stable distributions, IEEE Transactions on Signal Processing 49(10), 2192–2201.
[77] Yu.A. Kutoyants (1984), Parameter estimation for diffusion type processes of observations, Statistics 15(4), 541–551.
[78] A. Le Breton (1998), Filtering and parameter estimation in a simple linear model driven by a fractional Brownian motion, Stat. Probab. Lett. 38(3), 263–274.
[79] A. Le Breton (1976), On continuous and discrete sampling for parameter estimation in diffusion type processes, Mathematical Programming Study 5, 124–144.
[80] R.S. Lipster and A.N. Shiryaev (1977), Statistics of Random Processes, Springer, New York.
[81] X. Liu, J. Duan, J. Liu and P.E. Kloeden (2009), Synchronization of systems of Marcus canonical equations driven by α-stable noises, Nonlinear Analysis: Real World Applications, to appear.


[82] A.W. Lo (1991), Long-term memory in stock market prices, Econometrica 59, 1279–1313.
[83] B.B. Mandelbrot and J.R. Wallis (1969), Computer experiments with fractional Gaussian noises, Water Resources Research 5, 228–267.
[84] B.B. Mandelbrot and J.W. Van Ness (1968), Fractional Brownian motions, fractional noises and applications, SIAM Rev. 10, 422–437.
[85] X. Mao (1995), Stochastic Differential Equations and Applications, Horwood Publishing, Chichester.
[86] B. Maslowski and B. Schmalfuss (2005), Random dynamical systems and stationary solutions of differential equations driven by the fractional Brownian motion, Stoch. Anal. Appl. 22(6), 1577–1607.
[87] J.H. McCulloch (1986), Simple consistent estimators of stable distributions, Communications in Statistics: Simulation and Computation 15, 1109–1136.
[88] Y.S. Mishura (2008), Stochastic Calculus for Fractional Brownian Motion and Related Processes, Springer, Berlin.
[89] F. Moss and P.V.E. McClintock (Eds.), Noise in Nonlinear Dynamical Systems. Volume 1: Theory of Continuous Fokker-Planck Systems (2007); Volume 2: Theory of Noise Induced Processes in Special Applications (2009); Volume 3: Experiments and Simulations (2009), Cambridge University Press.
[90] J.P. Nolan (2007), Stable Distributions: Models for Heavy Tailed Data, Birkhäuser, Boston.
[91] I. Nourdin and T. Simon (2006), On the absolute continuity of Lévy processes with drift, Ann. Prob. 34(3), 1035–1051.
[92] I. Norros, E. Valkeila and J. Virtamo (1999), An elementary approach to a Girsanov formula and other analytical results on fractional Brownian motions, Bernoulli 5(4), 571–587.
[93] D. Nualart (2003), Stochastic calculus with respect to the fractional Brownian motion and applications, Contemporary Mathematics 336, 3–39.
[94] B. Oksendal (2005), Applied Stochastic Control of Jump Diffusions, Springer-Verlag, New York.
[95] B. Oksendal (2003), Stochastic Differential Equations, Sixth Ed., Springer-Verlag, New York.
[96] B. Oksendal, F. Biagini, T. Zhang and Y. Hu (2008), Stochastic Calculus for Fractional Brownian Motion and Applications, Springer.
[97] T.N. Palmer, G.J. Shutts, R. Hagedorn, F.J. Doblas-Reyes, T. Jung and M. Leutbecher (2005), Representing model uncertainty in weather and climate prediction, Annu. Rev. Earth Planet. Sci. 33, 163–193.
[98] A. Papoulis (1984), Probability, Random Variables, and Stochastic Processes, McGraw-Hill Companies, 2nd edition.
[99] N.D. Pearson and T. Sun (1994), Exploiting the conditional density in estimating the term structure: an application to the Cox, Ingersoll and Ross model, The Journal of Finance 49(4), 1279–1304.


[100] A.R. Pedersen (1995), Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes, Bernoulli 1(3), 257–279. [101] C.K. Peng, V. Buldyrev, S. Havlin, M. Simons, H.E. Stanley, and A.L. Goldberger (1994), Mosaic organization of DNA nucleotides. Phys. Rev. E 49, 1685–1689. [102] S. Peszat and J. Zabczyk (2007), Stochastic Partial Differential Equations with L´evy Processes, Cambridge University Press, Cambridge, UK. [103] B.L.S. Prakasa Rao (1999), Statistical Inference for Diffusion Type Processes, Arnold, London. [104] B.L.S. Prakasa Rao (1999), Semimartingales and their Statistical Inference, Chapman & Hall/CRC. [105] B.L.S. Prakasa Rao (2003), Parametric estimation for linear stochastic differential equations driven by fractional Brownian motion. http://www.isid.ac.in/statmath/eprints [106] S. Press (1972), Estimation of univariate and multivariate stable distributions, Journal of the Americal Statistical Association 67, 842–846. [107] P.E. Protter (2005), Stochastic Integration and Differential Equations, SpringerVerlag, New York, Second Edition. [108] B.L. Rozovskii (1990), Stochastic Evolution Equations, Kluwer Academic Publishers, Boston. [109] P.M. Robinson (1977), Estimation of a time series model from unequally spaced data, Stoch. Proc. Appl. 6, 9–24. [110] G. Samorodnitsky, M.S. Taqqu (2008), Stable Non-Gaussian Random ProcessesStochastic Models with Infinite Variance. Chapman & Hall/CRC. [111] K. Sato (1999), L´evy Processes and Infinitely Divisible Distrributions, Cambridge University Press, Cambridge, UK, 1999 [112] H. Scher, M.F. Shlesinger and J.T. Bendler (1991), Time-scale invariance in transport and relaxation, Phys. Today 44(1), 26–34. [113] D. Schertzer, M. Larcheveque, J. Duan, V. Yanovsky and S. Lovejoy (2000), Fractional Fokker–Planck equation for non-linear stochastic differential equations driven by non-Gaussian L´evy stable noises, J. Math. Phys. 42, 200–212. 
[114] M.F. Shlesinger, G.M. Zaslavsky and U. Frisch (1995), L´evy Flights and Related Topics in Physics, Lecture Notes in Physics, Springer-Verlag, Berlin. [115] M. Sorensen (1999), On asymptotics of estimating functions, Brazillian Journal of Probability and Statistics 13, 111–136. [116] D.W. Stroock and S.R.S. Varadhan (1979), Multidimensional Diffusion Processes, Springer Verlag, Berlin. [117] T.H. Solomon, E.R. Weeks, and H.L. Swinney (1993), Observation of anomalous diffusion and L´evy flights in a two-dimensional rotating flow, Phys. Rev. Lett. 71(24), 3975–3978. [118] M.S. Taqqu, V. Teverovsky, and W. Willinger (1995), Estimators for long-range dependence: an empirical study, Fractals, 3(4), 785–798.

252

J. Yang and J. Duan

Jiarui Yang and Jinqiao Duan
Department of Applied Mathematics
Illinois Institute of Technology
Chicago, IL 60616, USA
e-mail: [email protected], [email protected]

Part II Financial Applications

Progress in Probability, Vol. 65, 255–298. © 2011 Springer Basel AG

Convertible Bonds in a Defaultable Diffusion Model

Tomasz R. Bielecki, Stéphane Crépey, Monique Jeanblanc and Marek Rutkowski

Abstract. In this paper, we study convertible securities (CS) in a primary market model consisting of: a savings account, a stock underlying a CS, and an associated CDS contract (or, alternatively to the latter, a rolling CDS, more realistically used as a hedging instrument). We model the dynamics of these three securities in terms of a Markovian diffusion set-up with default. In this model, we show that a doubly reflected Backward Stochastic Differential Equation associated with a CS has a solution, meaning that super-hedging of the arbitrage value of a convertible security is feasible in the present set-up for both issuer and holder at the same initial cost, and we provide the related (super-)hedging strategies. Moreover, we characterize the price of a CS in terms of viscosity solutions of associated variational inequalities, and we prove the convergence of suitable approximation schemes. We finally specialize these results to convertible bonds and their straight bond and game exchange option components, and provide numerical results.

Mathematics Subject Classification (2000). 62L15, 91G40, 60G40, 49J40.

Keywords. BSDE, Superhedging, Markov models, Variational inequalities.

1. Introduction

The goal of this work is a detailed and rigorous examination of convertible securities (CS) in a financial market model endowed with the following primary traded assets: a savings account, a stock underlying the CS, and an associated credit default swap (CDS) contract or, alternatively to the latter, a rolling CDS. Let us stress that we deal here not only with the valuation, but also, even more crucially, with the issue of hedging convertible securities that are subject to credit risk. Special emphasis is put on the properties of convertible bonds (CB) with credit risk, which constitute an important class of actively traded convertible securities. It should be acknowledged that convertible bonds were already extensively studied in the past by, among others, Andersen and Buffum [1], Ayache et al. [2], Brennan and Schwartz [12, 13], Davis and Lischka [22], Kallsen and Kühn [31], Kwok and Lau [35], Lvov et al. [37], Sîrbu et al. [42], Takahashi et al. [43], Tsiveriotis and Fernandes [45], to mention just a few. Of course, it is not possible to give here even a brief overview of models, methods and results from the above-mentioned papers (for a discussion of some of them and further references, we refer to [4]–[6]). Despite the existence of these papers, it was nevertheless our feeling that a rigorous, systematic and fully consistent approach to hedging-based valuation of convertible securities with credit risk (as opposed to a formal risk-neutral valuation approach) was not available in the literature, and thus we decided to make an attempt to fill this gap in a series of papers for which this work can be seen as the final episode. We strive to provide here the most explicit valuation and hedging techniques, including numerical analysis of specific features of convertible bonds with call protection and call notice periods.

The research of T.R. Bielecki was supported by NSF Grant DMS-0604789 and NSF Grant DMS-0908099. The research of S. Crépey benefited from the support of the 'Chaire Risque de crédit', Fédération Bancaire Française, and of Ito33. The research of M. Jeanblanc benefited from the support of the 'Chaire Risque de crédit', Fédération Bancaire Française, and of Moody's Corporation grant 5-55411. The research of M. Rutkowski was supported by the ARC Discovery Project DP0881460.
The main original contributions of the present paper, in which we apply and make concrete several results of previous works, can be summarized as follows:
• we make a judicious choice of primary traded instruments used for hedging of a convertible security, specifically, the underlying stock and the rolling credit default swap written on the same credit name;
• we study the completeness of the model, up to the default time of the underlying name, in terms of the uniqueness of a martingale measure;
• we provide a detailed specification of the model assumptions that subsequently allow us to apply in the present framework our general results from the preceding papers [4]–[6];
• we show that super-hedging of the arbitrage value of a convertible security is feasible in the present set-up for both issuer and holder at the same initial cost;
• we give sufficient regularity conditions for the validity of the aggregation property for the value of a convertible bond at call time in the case of a positive call notice period;
• we provide numerical results for the decomposition of the value of a convertible bond into straight bond and embedded option components;
• we state precise definitions of the implied spread and implied volatility of a convertible bond and conduct some numerical analysis for both quantities.

Before commenting further on this work, let us first describe very briefly the results of our preceding papers. In [4], working in an abstract set-up, we characterized arbitrage prices of generic convertible securities (CS), such as convertible bonds (CB), and we provided a rigorous decomposition of a CB into a straight bond component and a game option component, in order to give a definite meaning to the commonly used terms of 'CB spread' and 'CB implied volatility.' Subsequently, in [5], we showed that in the hazard process set-up, the theoretical problem of pricing and hedging CS can essentially be reduced to a problem of solving an associated doubly reflected Backward Stochastic Differential Equation (BSDE for short). Finally, in [6], we established a formal connection between this BSDE and the corresponding variational inequalities with double obstacles in a generic Markovian intensity model. The related mathematical issues are dealt with in companion papers by Crépey [18] and Crépey and Matoussi [19].

In the present paper, we focus on a detailed study of convertible securities in a specific market set-up with the following traded assets: a savings account, a stock underlying a convertible security, and an associated rolling credit default swap. In Section 2, the dynamics of these three securities are formally introduced in terms of a Markovian diffusion set-up with default. We also study there the arbitrage-free property of this model, as well as its completeness. The model considered in this work appears as the simplest equity-to-credit reduced-form model, in which the connection between equity and credit is reflected by the fact that the default intensity γ depends on the stock level S. To the best of our knowledge, it is widely used by the financial industry for dealing with convertible bonds with credit risk. This specific model choice was the first rationale for the present study. Our second motivation was to show that all assumptions that were postulated in our previous theoretical works [4]–[6] are indeed satisfied within this set-up; in this sense, the model can be seen as a practical implementation of the general theory of arbitrage pricing and hedging of convertible securities.
Section 3 is devoted to the study of convertible securities. We first provide a general result on the valuation of a convertible security within the present framework (see Proposition 3.1). Next, we address the issue of valuation and hedging through a study of the associated doubly reflected BSDE. Proposition 3.3 provides a set of explicit conditions, obtained by applying general results of Crépey [18], which ensure that the BSDE associated with a convertible security has a unique solution. This allows us to establish in Proposition 3.2 the form of the (super-)hedging strategy for a convertible security. Subsequently, we characterize in Proposition 3.4 the pricing function of a convertible security in terms of the viscosity solution to associated variational inequalities, and we prove in Proposition 3.5 the convergence of suitable approximation schemes for the pricing function.

In Section 4, we further specify these results to the special case of a convertible bond. In [4, 6] we worked under the postulate that the value $U^{cb}_t$ of a convertible bond upon a call at time $t$ yields, as a function of time, a well-defined process satisfying some natural conditions. In the specific framework considered here, using the uniqueness of arbitrage prices established in Propositions 2.1 and 3.1 and the continuous aggregation property for the value $U^{cb}_t$ of a convertible bond upon a call at time $t$ furnished by Proposition 4.7, we actually prove that this assumption is satisfied, and we subsequently discuss in Propositions 4.6 and 4.8 the methods for computation of $U^{cb}_t$. We also examine in some detail the decomposition into straight bond and embedded game option components, which is practically relevant, since it provides a formal way of defining the implied volatility of a convertible bond. We conclude the paper by illustrating some results through numerical computations of relevant quantities in a simple example of an equity-to-credit model.

2. Markovian equity-to-credit framework

We first introduce a generic Markovian default intensity set-up. More precisely, we consider a defaultable diffusion model with time- and stock-dependent local default intensity and local volatility (see [1, 2, 6, 14, 22, 24, 34]). We denote by $\int_0^t$ the integral over $(0,t]$.

2.1. Default time and pre-default equity dynamics

Let us be given a standard stochastic basis $(\Omega,\mathcal G,\mathbb F,\mathbb Q)$ over $[0,\Theta]$, for some fixed $\Theta\in\mathbb R_+$, endowed with a standard Brownian motion $(W_t)_{t\in[0,\Theta]}$. We assume that $\mathbb F$ is the filtration generated by $W$. The underlying probability measure $\mathbb Q$ is aimed to represent a risk-neutral probability measure (or 'pricing probability') in a financial market model that we are now going to construct. In the first step, we define the pre-default factor process $(\widetilde S_t)_{t\in[0,\Theta]}$ (to be interpreted later as the pre-default stock price of the firm underlying a convertible security) as the diffusion process with a strictly positive initial value $\widetilde S_0$ and the dynamics over $[0,\Theta]$ given by the stochastic differential equation (SDE)
$$ d\widetilde S_t = \widetilde S_t\Big(\big(r(t)-q(t)+\eta\,\gamma(t,\widetilde S_t)\big)\,dt + \sigma(t,\widetilde S_t)\,dW_t\Big). \tag{1} $$
We denote by $\mathcal L$ the infinitesimal generator of $\widetilde S$, that is, the differential operator given by the formula
$$ \mathcal L = \partial_t + \big(r(t)-q(t)+\eta\,\gamma(t,S)\big)S\,\partial_S + \frac{\sigma^2(t,S)S^2}{2}\,\partial^2_{S^2}. \tag{2} $$

Assumption 2.1.
(i) The riskless short interest rate $r(t)$, the equity dividend yield $q(t)$, and the local default intensity $\gamma(t,S)\ge 0$ are bounded, Borel-measurable functions, and $\eta\in[0,1]$ is a real constant, to be interpreted later as the fractional loss upon default on the stock price.
(ii) The local volatility $\sigma(t,S)$ is a positively bounded, Borel-measurable function, so that, in particular, $\sigma(t,S)\ge\underline\sigma>0$ for some constant $\underline\sigma$.
(iii) The functions $\gamma(t,S)S$ and $\sigma(t,S)S$ are Lipschitz continuous in $S$, uniformly in $t$.

Note that we allow for negative values of $r$ and $q$ in order, for instance, to possibly account for repo rates in the model. Under Assumption 2.1, SDE (1) is known

to admit a unique strong solution $\widetilde S$, which is non-negative over $[0,\Theta]$. Moreover, the following (standard) a priori estimate is available, for any $p\in[2,+\infty)$:
$$ \mathbb E_{\mathbb Q}\Big[\sup_{t\in[0,\Theta]}|\widetilde S_t|^p\Big] \le C\big(1+|\widetilde S_0|^p\big). \tag{3} $$
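The generator (2) can be sanity-checked numerically. The following minimal sketch (all parameter values are hypothetical, chosen only for illustration) evaluates $\mathcal L f$ by central finite differences for the time-independent test function $f(S)=S^2$, for which $\mathcal L f = \big(2(r-q+\eta\gamma)+\sigma^2\big)S^2$ when the coefficients are constant.

```python
def generator_L(f, t, S, r, q, eta, gamma, sigma, h=1e-4):
    """Finite-difference evaluation of the generator (2) applied to a
    time-independent test function f (so the partial_t term vanishes here)."""
    dfdS = (f(S + h) - f(S - h)) / (2 * h)
    d2fdS2 = (f(S + h) - 2 * f(S) + f(S - h)) / h**2
    return (r - q + eta * gamma) * S * dfdS + 0.5 * sigma**2 * S**2 * d2fdS2

# sanity check on f(S) = S^2 with constant (hypothetical) coefficients:
# L f = (2*(r - q + eta*gamma) + sigma^2) * S^2 = 0.15 * 100^2 = 1500
val = generator_L(lambda s: s * s, 0.0, 100.0,
                  r=0.03, q=0.01, eta=0.5, gamma=0.02, sigma=0.3)
```

Central differences are exact on quadratics up to floating-point cancellation, so this check isolates errors in the drift and diffusion coefficients rather than in the discretization.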

In the next step, we define the $[0,\Theta]\cup\{+\infty\}$-valued default time $\tau_d$, using the so-called canonical construction [8]. Specifically, we set (by convention, $\inf\emptyset=\infty$)
$$ \tau_d = \inf\Big\{\, t\in[0,\Theta];\ \int_0^t \gamma(u,\widetilde S_u)\,du \ge \varepsilon \,\Big\}, \tag{4} $$

where $\varepsilon$ is a random variable on $(\Omega,\mathcal G,\mathbb F,\mathbb Q)$ with the unit exponential distribution, independent of $\mathbb F$. Because of our construction of $\tau_d$, the process $G_t := \mathbb Q(\tau_d > t\,|\,\mathcal F_t)$ satisfies, for every $t\in[0,\Theta]$,
$$ G_t = e^{-\int_0^t \gamma(u,\widetilde S_u)\,du}, $$
and thus it has continuous and non-increasing sample paths. This also means that the process $\gamma(t,\widetilde S_t)$ is the $\mathbb F$-hazard rate of $\tau_d$ (see, e.g., [8, 30]). The fact that the hazard rate $\gamma$ may depend on $\widetilde S$ is crucial, since this dependence actually conveys all the 'equity-to-credit' information in the model. A natural choice for $\gamma$ is a decreasing (e.g., negative power) function of $\widetilde S$, capped when $\widetilde S$ is close to zero. A possible further refinement would be to put a positive floor on the function $\gamma$; the lower bound on $\gamma$ would then reflect the perceived level of systemic default risk, as opposed to firm-specific default risk.

Let $H_t = \mathbf 1_{\{\tau_d\le t\}}$ be the default indicator process and let the process $(M^d_t)_{t\in[0,\Theta]}$ be given by the formula
$$ M^d_t = H_t - \int_0^t (1-H_u)\,\gamma(u,\widetilde S_u)\,du. $$
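The canonical construction (4) translates directly into a simulation recipe: accumulate the hazard along a discretized path and record the first grid time at which it crosses an independent unit-exponential threshold. The sketch below is a minimal illustration under hypothetical parameter choices (the capped negative-power intensity is only an example of the shape discussed above).

```python
import numpy as np

def simulate_default_time(path, dt, gamma, eps):
    """Canonical construction (4): tau_d = inf{t : int_0^t gamma(u, S_u) du >= eps},
    with eps a unit-exponential threshold independent of the path (left-point
    Riemann sum on a grid of step dt). Returns inf if the cumulative hazard
    never reaches eps before the horizon."""
    cum = np.cumsum([gamma(i * dt, s) * dt for i, s in enumerate(path)])
    idx = int(np.searchsorted(cum, eps))
    return idx * dt if idx < len(path) else float('inf')

rng = np.random.default_rng(0)
# hypothetical equity-to-credit intensity: decreasing in S, capped near zero
gamma = lambda t, s: min(2.0, 5.0 / max(s, 1e-8))
path = np.full(250, 100.0)                     # flat stock path, dt = 1/250
tau_d = simulate_default_time(path, 1.0 / 250, gamma, rng.exponential(1.0))
```

By construction, a larger hazard along the path makes the threshold crossing (and hence default) earlier, while a vanishing hazard yields $\tau_d=\infty$, mirroring the survival probability $G_t$ above.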

We denote by $\mathbb H$ the filtration generated by the process $H$ and by $\mathbb G$ the enlarged filtration given as $\mathbb F\vee\mathbb H$. Then the process $M^d$ is known to be a $\mathbb G$-martingale, called the compensated jump martingale. Moreover, the filtration $\mathbb F$ is immersed in $\mathbb G$, in the sense that all $\mathbb F$-martingales are $\mathbb G$-martingales; this property is also frequently referred to as Hypothesis (H). It implies, in particular, that the $\mathbb F$-Brownian motion $W$ remains a Brownian motion with respect to the enlarged filtration $\mathbb G$ under $\mathbb Q$.

2.2. Market model

We are now in a position to define the prices of primary traded assets in our market model. Assuming that $\tau_d$ is the default time of a reference entity (firm), we consider a continuous-time market on the time interval $[0,\Theta]$ composed of three primary assets:
• the savings account evolving according to the deterministic short-term interest rate $r$; we denote by $\beta$ the discount factor (the inverse of the savings account), so that $\beta_t = e^{-\int_0^t r(u)\,du}$;


• the stock of the reference entity with the pre-default price process $\widetilde S$ given by (1) and the fractional loss upon default determined by a constant $\eta\le 1$;
• a CDS contract written at time 0 on the reference entity, with maturity $\Theta$, the protection payment given by a Borel-measurable, bounded function $\nu:[0,\Theta]\to\mathbb R$ and the fixed CDS spread $\bar\nu$.

Remarks 2.1. It is worth noting that the choice of a fixed-maturity CDS as a primary traded asset is only temporary; it is made here mainly for the sake of expositional simplicity. In Section 2.3 below, we will replace this asset by the more practical concept of a rolling CDS, which essentially is a self-financing trading strategy in market CDSs.

The stock price process $(S_t)_{t\in[0,\Theta]}$ is formally defined by setting
$$ dS_t = S_{t-}\Big(\big(r(t)-q(t)\big)\,dt + \sigma(t,S_t)\,dW_t - \eta\,dM^d_t\Big), \qquad S_0 = \widetilde S_0, \tag{5} $$
so that, as required, the equality $(1-H_t)S_t = (1-H_t)\widetilde S_t$ holds for every $t\in[0,\Theta]$. Note that estimate (3) enforces the following moment condition on the process $S$:
$$ \mathbb E_{\mathbb Q}\Big[\sup_{t\in[0,\tau_d\wedge\Theta]} S_t\Big] < \infty. \tag{6} $$
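A path-level view of (1) and (5) can be obtained with a plain Euler scheme: before default the drift carries the compensator term $\eta\gamma$, and at $\tau_d$ the price drops by the fractional loss $\eta$. The sketch below (hypothetical constant parameters; the drop is recorded from the next grid point on) is an illustration, not a production discretization.

```python
import numpy as np

def simulate_stock(S0, T, n, r, q, eta, gamma, sigma, rng):
    """Euler sketch of the stock dynamics (5): pre-default drift
    r - q + eta*gamma (the compensator of the default jump), a drop by the
    fractional loss eta at tau_d, and plain diffusion afterwards."""
    dt = T / n
    S = np.empty(n + 1); S[0] = S0
    eps, cum, tau_d = rng.exponential(1.0), 0.0, np.inf
    for i in range(n):
        s = S[i]
        if tau_d == np.inf:
            cum += gamma(i * dt, s) * dt
            if cum >= eps:                    # default occurs on this step
                tau_d = i * dt
                s *= (1.0 - eta)              # fractional loss eta upon default
        intensity = gamma(i * dt, s) if tau_d == np.inf else 0.0
        drift = (r - q + eta * intensity) * s
        S[i + 1] = s + drift * dt + sigma * s * np.sqrt(dt) * rng.standard_normal()
    return S, tau_d

rng = np.random.default_rng(1)
S_path, tau_hat = simulate_stock(100.0, 1.0, 250, r=0.03, q=0.01, eta=0.5,
                                 gamma=lambda t, s: 0.02, sigma=0.3, rng=rng)
```

With $\gamma\equiv 0$ and $\sigma=0$ the scheme reduces to deterministic compounding at rate $r-q$, which gives a convenient regression test for the drift handling.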

We define the discounted cumulative stock price $\beta\widehat S$ stopped at $\tau_d$ by setting, for every $t\in[0,\Theta]$,
$$ \beta_t\widehat S_t = \beta_t(1-H_t)S_t + \int_0^{t\wedge\tau_d} \beta_u\Big((1-\eta)\widetilde S_u\,dH_u + q(u)\widetilde S_u\,du\Big) $$
or, equivalently, in terms of $S$,
$$ \beta_t\widehat S_t = \beta_{t\wedge\tau_d}S_{t\wedge\tau_d} + \int_0^{t\wedge\tau_d} \beta_u\,q(u)\,S_u\,du. $$

Note that we deliberately stopped $\beta\widehat S$ at the default time $\tau_d$, since we will not need to consider the behavior of the stock price strictly after default. Indeed, it will be enough to work under the assumption that all trading activities are stopped no later than at the random time $\tau_d\wedge\Theta$.

Let us now examine the valuation in the present model of a CDS written on the reference entity. We take the perspective of the credit protection buyer. Consistently with the no-arbitrage requirements (cf. [7]), we assume that the pre-default CDS price $(\widetilde B_t)_{t\in[0,\Theta]}$ is given as $\widetilde B_t = \widetilde B(t,\widetilde S_t)$, where the pre-default CDS pricing function $\widetilde B(t,S)$ is the unique (classical) solution on $[0,\Theta]\times\mathbb R_+$ to the following parabolic PDE:
$$ \mathcal L\widetilde B(t,S) + \delta(t,S) - \mu(t,S)\,\widetilde B(t,S) = 0, \qquad \widetilde B(\Theta,S)=0, \tag{7} $$

where:
• the differential operator $\mathcal L$ is given by (2),
• $\delta(t,S) = \nu(t)\gamma(t,S) - \bar\nu$ is the pre-default dividend function of the CDS,
• $\mu(t,S) = r(t) + \gamma(t,S)$ is the credit-risk adjusted interest rate.
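PDE (7) is a standard linear parabolic problem and can be treated with any off-the-shelf scheme. The following minimal explicit finite-difference sketch (hypothetical constant $r$, $q$, $\nu$, $\sigma$, and a user-supplied intensity $\gamma(S)$; time step chosen small enough for stability) steps backward from the terminal condition $\widetilde B(\Theta,\cdot)=0$.

```python
import numpy as np

def cds_price_fd(Theta, r, q, eta, nu, nu_bar, gamma, sigma,
                 S_max=400.0, nS=200, nt=4000):
    """Explicit finite-difference sketch for the pricing PDE (7):
    L B + (nu*gamma - nu_bar) - (r + gamma)*B = 0, B(Theta, S) = 0.
    Constant r, q, nu, sigma are assumed for simplicity; nt must be large
    enough for stability of the explicit scheme."""
    dS, dt = S_max / nS, Theta / nt
    S = np.linspace(0.0, S_max, nS + 1)
    g = np.array([gamma(s) for s in S])
    B = np.zeros(nS + 1)                      # terminal condition B(Theta, .) = 0
    for _ in range(nt):
        dB = np.gradient(B, dS)               # central first derivative in S
        d2B = np.gradient(dB, dS)             # second derivative in S
        LB = (r - q + eta * g) * S * dB + 0.5 * sigma**2 * S**2 * d2B
        B = B + dt * (LB + nu * g - nu_bar - (r + g) * B)   # step backward
    return S, B

# hypothetical constant-intensity case: B is then flat in S and solves an ODE
S_grid, B0 = cds_price_fd(Theta=5.0, r=0.05, q=0.0, eta=0.5,
                          nu=0.4, nu_bar=0.01, gamma=lambda s: 0.02, sigma=0.25)
```

A useful validation: when $\gamma$ is constant, the $S$-derivatives vanish and (7) collapses to the ODE $B'(t) + (\nu\gamma-\bar\nu) - (r+\gamma)B(t)=0$, whose time-0 value is $\frac{\nu\gamma-\bar\nu}{r+\gamma}\big(1-e^{-(r+\gamma)\Theta}\big)$.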

The discounted cumulative CDS price $\beta\widehat B$ equals, for every $t\in[0,\Theta]$,
$$ \beta_t\widehat B_t = \beta_t(1-H_t)\widetilde B_t + \int_0^{t\wedge\tau_d} \beta_u\big(\nu(u)\,dH_u - \bar\nu\,du\big). \tag{8} $$

Remarks 2.2.

(i) It is worth noting that as soon as the risk-neutral parameters in the dynamics of the stock price $S$ are given by (5), the dynamics (8) of the CDS price are derived from the dynamics of $S$ and our postulate that $\mathbb Q$ is the 'pricing probability' for a CDS. This procedure resembles the standard method of completing a stochastic volatility model by taking a particular option as an additional primary traded asset (see, e.g., Romano and Touzi [41]). We will sometimes refer to dynamics (5) as the model; it will be implicitly assumed that this model is actually completed either by trading a fixed-maturity CDS (as in Section 2.2.1) or by trading a rolling CDS (see Section 2.3). Given the interest rate $r$, the dividend yield $q$, the parameter $\eta$, and the covenants of a (rolling) CDS, the model calibration will then reduce to a specification of the local intensity $\gamma$ and the local volatility $\sigma$ only. We refer, in particular, to Section 4.3.6, in which the concepts of the implied spread and the implied volatility of a convertible bond are examined.

(ii) Assuming dependence of the default intensity on the stock level is, of course, not the only way to model the equity-to-credit relationship. It is even fair to say that the above set-up is in fact a very stylized model of equity-to-credit. More precisely, it is not realistic to model default by only having equity and convertible bonds in the structure of the firm. Rather than modeling default on equity, it would be more realistic to model default as the inability to pay senior debt, or, in case no senior debt is outstanding, as the inability to repay the face value of the convertible bond (that would happen at maturity only, though). However, the simple equity-to-credit framework of this paper is customary in the credit risk literature. From an economic point of view, it is motivated by the empirical evidence of a negative correlation between equity levels and CDS spreads.
More pragmatically, this model is justified by its good calibration properties to CDS and equity option data.

2.2.1. Risk-neutral measures and model completeness. Since $\beta\widehat S$ and $\beta\widehat B$ are manifestly locally bounded processes, a risk-neutral measure for the market model is defined as any probability measure $\widetilde{\mathbb Q}$ equivalent to $\mathbb Q$ such that the discounted cumulative prices $\beta\widehat S$ and $\beta\widehat B$ are $(\mathbb G,\widetilde{\mathbb Q})$-local martingales (see, for instance, page 234 in Björk [9]). In particular, we note that the underlying probability measure $\mathbb Q$ is a risk-neutral measure for the market model. The following lemma can be easily proved using the Itô formula.

Lemma 2.1. Let us denote $\widehat X_t = \begin{pmatrix} \widehat S_t \\ \widehat B_t \end{pmatrix}$, so that $\beta_t\widehat X_t = \begin{pmatrix} \beta_t\widehat S_t \\ \beta_t\widehat B_t \end{pmatrix}$. We have, for every $t\in[0,\Theta]$,
$$ d(\beta_t\widehat X_t) = \mathbf 1_{\{t\le\tau_d\}}\,\beta_t\,\Sigma_t\, d\begin{pmatrix} W_t \\ M^d_t \end{pmatrix}, \tag{9} $$
where the $\mathbb F$-predictable, matrix-valued process $\Sigma$ is given by the formula
$$ \Sigma_t = \begin{pmatrix} \sigma(t,\widetilde S_t)\,\widetilde S_t & -\eta\,\widetilde S_t \\ \sigma(t,\widetilde S_t)\,\widetilde S_t\,\partial_S\widetilde B(t,\widetilde S_t) & \nu(t)-\widetilde B(t,\widetilde S_t) \end{pmatrix}. \tag{10} $$

In what follows, we work under the following standing assumption.

Assumption 2.2. The matrix-valued process Σ is invertible on [0, Θ]. The next proposition suggests that, under Assumption 2.2, our market model is complete with respect to defaultable claims maturing at τd ∧ Θ.

Proposition 2.1. For any risk-neutral measure $\widetilde{\mathbb Q}$ for the market model, the Radon-Nikodym density $Z_t := \mathbb E_{\mathbb Q}\big[\frac{d\widetilde{\mathbb Q}}{d\mathbb Q}\,\big|\,\mathcal G_t\big]$ equals 1 on $[0,\tau_d\wedge\Theta]$.

Proof. For any probability measure $\widetilde{\mathbb Q}$ equivalent to $\mathbb Q$ on $(\Omega,\mathcal G_\Theta)$, the Radon-Nikodym density process $Z_t$, $t\in[0,\Theta]$, is a strictly positive $(\mathbb G,\mathbb Q)$-martingale. Therefore, by the predictable representation theorem due to Kusuoka [33], there exist two $\mathbb G$-predictable processes, $\varphi$ and $\varphi^d$ say, such that
$$ dZ_t = Z_{t-}\big(\varphi_t\,dW_t + \varphi^d_t\,dM^d_t\big), \qquad t\in[0,\Theta]. \tag{11} $$
A probability measure $\widetilde{\mathbb Q}$ is then a risk-neutral measure whenever the process $\beta\widehat X$ is a $(\mathbb G,\widetilde{\mathbb Q})$-local martingale or, equivalently, whenever the process $\beta\widehat X Z$ is a $(\mathbb G,\mathbb Q)$-local martingale. The latter condition is satisfied if and only if
$$ \Sigma_t \begin{pmatrix} \varphi_t \\ \gamma(t,\widetilde S_t)\,\varphi^d_t \end{pmatrix} = 0. \tag{12} $$
The unique solution to (12) on $[0,\tau_d\wedge\Theta]$ is $\varphi=\varphi^d=0$, and thus $Z=1$ on $[0,\tau_d\wedge\Theta]$. □
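In implementations, Assumption 2.2 can be checked pointwise along the state grid: completeness up to $\tau_d\wedge\Theta$ requires $\det\Sigma_t\neq 0$. A minimal sketch for the fixed-maturity case (10), with purely hypothetical state values for $\widetilde S_t$, $\widetilde B$ and $\partial_S\widetilde B$:

```python
import numpy as np

def sigma_matrix(S, sigma, eta, nu, B, dBdS):
    """The volatility matrix (10) of the discounted cumulative prices,
    for the fixed-maturity CDS case, at a given pre-default state."""
    return np.array([[sigma * S,          -eta * S],
                     [sigma * S * dBdS,    nu - B]])

# hypothetical state: det = sigma*S*(nu - B) + eta*S*sigma*S*dBdS
Sig = sigma_matrix(S=100.0, sigma=0.3, eta=0.5, nu=0.4, B=0.1, dBdS=-0.002)
det = float(np.linalg.det(Sig))
```

Here $\det\Sigma = \sigma\widetilde S\,(\nu-\widetilde B) + \eta\widetilde S\cdot\sigma\widetilde S\,\partial_S\widetilde B$, so invertibility can fail, e.g., when the CDS price hits the protection payment while the equity sensitivity of the CDS vanishes.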

2.3. Modified market model

In market practice, traders would typically prefer to use for hedging purposes the rolling CDS, rather than the fixed-maturity CDS considered in Section 2.2. Formally, the rolling CDS is defined as the wealth process of a self-financing trading strategy that amounts to continuously rolling one unit of long CDS contracts indexed by their inception date $t\in[0,\Theta]$, with respective maturities $\theta(t)$, where $\theta:[0,\Theta]\to[0,\Theta]$ is an increasing and piecewise constant function satisfying $\theta(t)\ge t$ (in particular, $\theta(\Theta)=\Theta$). We shall denote such contracts as CDS$(t,\theta(t))$. Intuitively, the above-mentioned strategy amounts to holding, at every time $t\in[0,\Theta]$, one unit of the CDS$(t,\theta(t))$ combined with the margin account, that is, either positive or negative positions in the savings account. At time $t+dt$, the unit position in the CDS$(t,\theta(t))$ is unwound (or offset) and the net mark-to-market proceeds, which may be either positive or negative depending on the evolution of the CDS market spread between the dates $t$ and $t+dt$, are reinvested in the savings account. Simultaneously, a freshly issued unit credit default swap CDS$(t+dt,\theta(t+dt))$ is entered into at no cost. This procedure is carried on in continuous time (in practice, on a daily basis) until the hedging horizon. In the case of the rolling CDS, the entry $\beta\widehat B$ in (9) is meant to represent the discounted cumulative wealth process of this trading strategy. The next result shows that the only modification with respect to the case of a fixed-maturity CDS is that the matrix-valued process $\Sigma$, which was given previously by (10), should now be adjusted to $\Sigma$ given by (13).

Lemma 2.2. Under the assumption that $\widehat B$ represents the rolling CDS, Lemma 2.1 holds with the $\mathbb F$-predictable, matrix-valued process $\Sigma$ given by the expression
$$ \Sigma_t = \begin{pmatrix} \sigma(t,\widetilde S_t)\,\widetilde S_t & -\eta\,\widetilde S_t \\ \sigma(t,\widetilde S_t)\,\widetilde S_t\,\partial_S\widetilde P_{\theta(t)}(t,\widetilde S_t) - \bar\nu(t,\widetilde S_t)\,\sigma(t,\widetilde S_t)\,\widetilde S_t\,\partial_S\widetilde F_{\theta(t)}(t,\widetilde S_t) & \nu(t) \end{pmatrix}, \tag{13} $$
where the functions $\widetilde P_{\theta(t)}$ and $\widetilde F_{\theta(t)}$ are the pre-default pricing functions of the protection and fee legs of the CDS$(t,\theta(t))$, respectively, and the quantity
$$ \bar\nu(t,\widetilde S_t) = \frac{\widetilde P_{\theta(t)}(t,\widetilde S_t)}{\widetilde F_{\theta(t)}(t,\widetilde S_t)} $$
represents the related CDS spread.

Proof. Of course, it suffices to focus on the second row of the matrix $\Sigma$. We start by noting that Lemma 2.4 in [7], when specified to the present set-up, yields the following dynamics for the discounted cumulative wealth $\beta\widehat B$ of the rolling CDS between the deterministic times representing the jump times of the function $\theta$:
$$ d(\beta_t\widehat B_t) = (1-H_t)\,\beta_t\,\alpha_t^{-1}\big(dp_t - \bar\nu(t,\widetilde S_t)\,df_t\big) + \beta_t\,\nu(t)\,dM^d_t, \tag{14} $$
where we denote
$$ p_t = \mathbb E_{\mathbb Q}\Big[\int_0^{\theta(t)} \alpha_u\,\nu(u)\,\gamma(u,\widetilde S_u)\,du \,\Big|\, \mathcal F_t\Big], \qquad f_t = \mathbb E_{\mathbb Q}\Big[\int_0^{\theta(t)} \alpha_u\,du \,\Big|\, \mathcal F_t\Big], $$
and where in turn the process $\alpha$ is given by
$$ \alpha_t = e^{-\int_0^t \mu(u,\widetilde S_u)\,du} = e^{-\int_0^t (r(u)+\gamma(u,\widetilde S_u))\,du}. $$
In addition, being a $(\mathbb G,\mathbb Q)$-local martingale, the process $\beta\widehat B$ is necessarily continuous prior to the default time $\tau_d$ (this follows, for instance, from Kusuoka [33]). It is therefore justified to use (14) for the computation of the diffusion term in the dynamics of $\beta\widehat B$. To establish (13), it remains to compute explicitly the diffusion term in (14). Since the function $\theta$ is piecewise constant, it suffices in fact to examine the stochastic differentials $dp_t$ and $df_t$ for a fixed value $\theta=\theta(t)$ over each interval of constancy of $\theta$. By the standard valuation formulae in an intensity-based framework, the pre-default price of a protection payment $\nu$ with a fixed horizon $\theta$ is given by, for $t\in[0,\theta]$,
$$ \widetilde P_\theta(t,\widetilde S_t) = \alpha_t^{-1}\,\mathbb E_{\mathbb Q}\Big[\int_t^{\theta} \alpha_u\,\nu(u)\,\gamma(u,\widetilde S_u)\,du \,\Big|\, \mathcal F_t\Big]. $$
Therefore, by the definition of $p$, we have that, for $t\in[0,\theta]$,
$$ p_t = \int_0^t \alpha_u\,\nu(u)\,\gamma(u,\widetilde S_u)\,du + \alpha_t\,\widetilde P_\theta(t,\widetilde S_t). \tag{15} $$
Since $p$ is manifestly a $(\mathbb G,\mathbb Q)$-martingale, an application of the Itô formula to (15) yields, in view of (1),
$$ dp_t = \alpha_t\,\sigma(t,\widetilde S_t)\,\widetilde S_t\,\partial_S\widetilde P_\theta(t,\widetilde S_t)\,dW_t. $$
Likewise, the pre-default price of a unit-rate fee payment with a fixed horizon $\theta$ is given by
$$ \widetilde F_\theta(t,\widetilde S_t) = \alpha_t^{-1}\,\mathbb E_{\mathbb Q}\Big[\int_t^{\theta} \alpha_u\,du \,\Big|\, \mathcal F_t\Big]. $$
By the definition of $f$, we obtain, for $t\in[0,\theta]$,
$$ f_t = \int_0^t \alpha_u\,du + \alpha_t\,\widetilde F_\theta(t,\widetilde S_t), $$
and thus, noting that $f$ is a $(\mathbb G,\mathbb Q)$-martingale, we conclude easily that
$$ df_t = \alpha_t\,\sigma(t,\widetilde S_t)\,\widetilde S_t\,\partial_S\widetilde F_\theta(t,\widetilde S_t)\,dW_t. $$
By inserting $dp_t$ and $df_t$ into (14), we complete the derivation of (13). □
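The legs $\widetilde P_\theta$ and $\widetilde F_\theta$ and the spread $\bar\nu=\widetilde P_\theta/\widetilde F_\theta$ are easy to evaluate numerically once the credit-risk adjusted discount factor $\alpha$ is known. The following sketch uses hypothetical constant $r$, $\gamma$ and protection payment $\nu$, in which case $\alpha_u=e^{-(r+\gamma)u}$ and the spread collapses to $\nu\gamma$ — a handy closed-form check.

```python
import math

def cds_legs_t0(theta, r, gamma, nu, n=100_000):
    """Midpoint-rule sketch of the time-0 protection and fee legs from the
    proof of Lemma 2.2, with constant r, gamma and protection payment nu:
      P = int_0^theta alpha_u * nu * gamma du,   F = int_0^theta alpha_u du,
    where alpha_u = exp(-(r + gamma) * u)."""
    du = theta / n
    P = F = 0.0
    for i in range(n):
        alpha = math.exp(-(r + gamma) * (i + 0.5) * du)
        P += alpha * nu * gamma * du
        F += alpha * du
    return P, F

P, F = cds_legs_t0(theta=5.0, r=0.03, gamma=0.02, nu=0.4)
spread = P / F   # with constant intensity, nu_bar = P/F reduces to nu * gamma
```

Note that with a constant intensity the ratio $P/F$ equals $\nu\gamma$ for any discretization, since the same discount weights $\alpha_u$ appear in both legs.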



Remarks 2.3. It is worth noting that, for a fixed $u$, the pricing functions $\widetilde P_{\theta(u)}$ and $\widetilde F_{\theta(u)}$ can be characterized as solutions of the PDE of the form (7) on $[u,\theta(u)]\times\mathbb R_+$ with the function $\delta$ therein given by $\delta^1(t,S)=\nu(t)\gamma(t,S)$ and $\delta^2(t,S)=1$, respectively. Hence the use of the Itô formula in the proof of Lemma 2.2 can indeed be justified. Note also that, under the standing Assumption 2.2, a suitable form of completeness of the modified market model will follow from Proposition 2.1.

3. Convertible securities

In this section, we first recall the concept of a convertible security (CS). Subsequently, we establish, or specify to the present situation, the fundamental results related to its valuation and hedging. We start by providing a formal specification, in the present set-up, of the notion of a convertible security. Let $0$ (resp. $T\le\Theta$) stand for the inception date (resp. the maturity date) of a CS with the underlying asset $S$. For any $t\in[0,T]$, we write $\mathcal F^t_T$ (resp. $\mathcal G^t_T$) to denote the set of all $\mathbb F$-stopping times (resp. $\mathbb G$-stopping times) with values in $[t,T]$. Given the time of lifting of a call protection of a CS,


which is modeled by a stopping time $\bar\tau$ belonging to $\mathcal G^0_T$, we denote by $\bar{\mathcal G}^t_T$ the following class of stopping times:
$$ \bar{\mathcal G}^t_T = \big\{\, \vartheta\in\mathcal G^t_T;\ \vartheta\wedge\tau_d \ge \bar\tau\wedge\tau_d \,\big\}. $$

We will frequently use $\tau$ as a shorthand notation for $\tau_p\wedge\tau_c$, for any choice of $(\tau_p,\tau_c)\in\mathcal G^t_T\times\bar{\mathcal G}^t_T$. For the definition of the game option, we refer to Kallsen and Kühn [31] and Kiefer [32].

Definition 3.1. A convertible security with the underlying $S$ is a game option with the ex-dividend cumulative discounted cash flows $\pi(t;\tau_p,\tau_c)$ given by the following expression, for any $t\in[0,T)$ and $(\tau_p,\tau_c)\in\mathcal G^t_T\times\bar{\mathcal G}^t_T$:
$$ \beta_t\,\pi(t;\tau_p,\tau_c) = \int_t^{\tau}\beta_u\,dD_u + \mathbf 1_{\{\tau_d>\tau\}}\Big(\beta_\tau\,\mathbf 1_{\{\tau=\tau_p<T\}}L_{\tau_p} + \beta_\tau\,\mathbf 1_{\{\tau<\tau_p\}}U_{\tau_c} + \mathbf 1_{\{\tau=\tau_p=T\}}\beta_T\,\xi\Big), \tag{21} $$
where $D$ stands for the dividend process of the CS, $L$ and $U$ for its put and call payment processes, and $\xi$ for its payoff at maturity.


3.2. Doubly reflected BSDEs approach

We will now apply to convertible securities the method proposed by El Karoui et al. [27] for American options and extended by Cvitanić and Karatzas [20] to the case of stochastic games. In order to deal effectively with the doubly reflected BSDE associated with a convertible security, which is introduced in Definition 3.3 below, we need to impose some technical assumptions. We refer the reader to Section 4 for concrete examples in which all these assumptions are indeed satisfied.

Assumption 3.1. We postulate that:
• the coupon process $C$ satisfies
$$ C_t = C(t) := \int_0^t c(u)\,du + \sum_{0\le T_i\le t} c^i, \tag{22} $$

for a bounded, Borel-measurable continuous-time coupon rate function c(·) and deterministic discrete times and coupons Ti and ci , respectively; we take the tenor of the discrete coupons as T0 = 0 < T1 < · · · < TI−1 < TI with TI−1 < T ≤ TI ; • the recovery process (Rt )t∈[0,T ] is of the form R(t, St− ) for a Borel-measurable function R; • Lt = L(t, St), Ut = U (t, St ), ξ = ξ(ST ) for some Borel-measurable functions L, U and ξ such that, for any t, S, we have L(t, S) ≤ U (t, S),

• the call protection time τ¯ ∈ FT0 .

L(T, S) ≤ ξ(S) ≤ U (T, S);

The accrued interest at time t is given by
$$A(t) = \frac{t - T_{i_t-1}}{T_{i_t} - T_{i_t-1}}\, c^{i_t}, \eqno(23)$$
where i_t is the integer satisfying T_{i_t−1} ≤ t < T_{i_t}. On open intervals between the discrete coupon dates we thus have dA(t) = a(t) dt with $a(t) = \frac{c^{i_t}}{T_{i_t} - T_{i_t-1}}$.

To a CS with data (functions) C, R, ξ, L, U and lifting time of call protection τ̄, we associate the Borel-measurable functions f(t, S, x) (for x real), g(S), ℓ(t, S) and h(t, S) defined by
$$g(S) = \xi(S) - A(T), \qquad \ell(t,S) = L(t,S) - A(t), \qquad h(t,S) = U(t,S) - A(t), \eqno(24)$$
and (recall that μ(t, S) = r(t) + γ(t, S))
$$f(t,S,x) = \gamma(t,S)R(t,S) + \Gamma(t,S) - \mu(t,S)\,x, \eqno(25)$$
where we set
$$\Gamma(t,S) = c(t) + a(t) - \mu(t,S)A(t). \eqno(26)$$

Remarks 3.1. In the case of a puttable security, the process U is not relevant, and thus we may and do set h(t, S) = +∞. Moreover, in the case of an elementary security, the process L plays no role either, and we further redefine ℓ(t, S) = −∞.
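For concreteness, the accrual conventions (23) and the coefficient Γ of (26) are easy to implement. The sketch below is illustrative only: the helper names, the coupon schedule, and the flat rate/intensity functions used in the example are assumptions, not the paper's calibration.

```python
from bisect import bisect_right

def accrued(t, dates, coupons):
    """A(t) and a(t) per (23): linear accrual of the next discrete coupon.
    dates = [T_0, ..., T_I] with T_0 = 0; coupons[i-1] is the coupon c^i
    paid at T_i. Valid for T_{i_t - 1} <= t < T_{i_t}."""
    i = bisect_right(dates, t)              # i_t: first index with t < T_i
    length = dates[i] - dates[i - 1]
    return (t - dates[i - 1]) / length * coupons[i - 1], coupons[i - 1] / length

def big_gamma(t, S, c, r, gamma, dates, coupons):
    """Gamma(t, S) = c(t) + a(t) - mu(t, S) * A(t), with mu = r + gamma, cf. (26)."""
    A, a = accrued(t, dates, coupons)
    return c(t) + a - (r(t) + gamma(t, S)) * A
```

For instance, with semiannual coupons c^1 = 2, c^2 = 3 on [0, 1], the accrued interest at t = 0.75 is half of the second coupon, A = 1.5, with accrual rate a = 6.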

268

T.R. Bielecki, S. Cr´epey, M. Jeanblanc and M. Rutkowski

We define the quadruplet (f, g, ℓ, h) associated to a CS (parameterized by x ∈ ℝ, as regards f) as
$$f_t(x) = f(t, \hat S_t, x), \qquad g = g(\hat S_T), \qquad \ell_t = \ell(t, \hat S_t), \qquad h_t = \mathbf 1_{\{t<\bar\tau\}}(+\infty) + \mathbf 1_{\{t\ge\bar\tau\}}\, h(t, \hat S_t).$$

$$Z_t = \sigma(t, \hat S_t)\,\hat S_t \Big( \mathbf 1_{\{t\le\bar\tau\}}\, \partial_S \hat\Pi^1(t, \hat S_t) + \mathbf 1_{\{t>\bar\tau\}}\, \partial_S \hat\Pi^0(t, \hat S_t) \Big), \eqno(37)$$
where the last equality holds provided that the pricing functions Π̂^0 and Π̂^1 are sufficiently regular for the Itô formula to be applicable. Recall that the process Σ is given either by (10) or by (13), depending on whether we choose the fixed-maturity CDS of Section 2.2 or the rolling CDS of Section 2.3 as a traded asset B̂.

3.3.2. Approximation Schemes for Variational Inequalities. We now come to the issues of uniqueness and approximation of solutions for (VI). For this, we make the following additional standing assumption.

Assumption 3.4. The functions r, q, γ, σ are locally Lipschitz continuous.

We refer the reader to Barles and Souganidis [3] (see also Crépey [18]) for the definition of stable, monotone and consistent approximation schemes for (VI), and for the related notion of convergence of the scheme, involved in the following proposition.

Proposition 3.5.
(i) Post-protection price. The function Π̂^0 introduced in Proposition 3.4(i) is the unique P-solution, the maximal P-subsolution, and the minimal P-supersolution of the related problem (VI) on D = [0, T] × ℝ. Let (Π̂_h^0)_{h>0} denote a stable, monotone and consistent approximation scheme for the function Π̂^0. Then Π̂_h^0 → Π̂^0 locally uniformly on D as h → 0^+.
(ii) Protection price. The function Π̂^1 introduced in Proposition 3.4(ii) is the unique P-solution, the maximal P-subsolution, and the minimal P-supersolution of the related problem (VI.1) on D = D(T̄, S̄). Let (Π̂_h^1)_{h>0} denote a stable, monotone and consistent approximation scheme for the function Π̂^1. Then Π̂_h^1 → Π̂^1 locally uniformly on D as h → 0^+, provided (in case S̄ < +∞) Π̂_h^1 → Π̂^1 = Π̂^0 on [0, T̄] × {S̄}.


Proof. Note, in particular, that under our assumptions:
• the functions (r(t) − q(t) + ηγ(t, S))S and σ(t, S)S are locally Lipschitz continuous;
• the function f admits a modulus of continuity in S, in the sense that for every constant c > 0 there exists a continuous function η_c : ℝ_+ → ℝ_+ with η_c(0) = 0 and such that, for any t ∈ [0, T] and S, S′, x ∈ ℝ with |S| ∨ |S′| ∨ |x| ≤ c,
$$|f(t, S, x) - f(t, S', x)| \le \eta_c(|S - S'|).$$

The assertions are then consequences of the results in [18]. □

Remarks 3.10. In particular, we refer the reader to [18] regarding the fact that the potential discontinuities of f at the T_i's (which represent a non-standard feature from the point of view of the classical theory of viscosity solutions as presented, for instance, in Crandall et al. [17]) are not a real issue in the previous results, provided one works with Definition 3.8 of viscosity solutions to our problems.
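To make the notion of a stable, monotone and consistent scheme more concrete, here is a minimal explicit finite-difference sketch with projection on the obstacle, applied to a simpler model problem (an American put under Black–Scholes). This is only an illustrative stand-in for the schemes of Proposition 3.5: the equation, payoff, and all parameters are assumptions, not the paper's problem (VI).

```python
def american_put_explicit(S_max=200.0, K=100.0, T=1.0, r=0.05, sigma=0.3,
                          M=100, N=2000):
    """Explicit scheme with projection for the obstacle problem
    min(v_t + L v, ...) / max(payoff - v, ...) = 0 (American put).
    The scheme is monotone (all explicit coefficients nonnegative) under the
    CFL-type condition checked below, hence convergent in the
    Barles-Souganidis framework."""
    dS, dt = S_max / M, T / N
    S = [i * dS for i in range(M + 1)]
    payoff = [max(K - s, 0.0) for s in S]
    # monotonicity: dt*(sigma^2*M^2 + r) <= 1, and sigma^2 >= r at the lowest node
    assert dt * (sigma ** 2 * M ** 2 + r) <= 1.0 and sigma ** 2 >= r
    v = payoff[:]
    for _ in range(N):
        new = v[:]
        for i in range(1, M):
            diff = 0.5 * sigma ** 2 * S[i] ** 2 * (v[i+1] - 2*v[i] + v[i-1]) / dS ** 2
            drift = r * S[i] * (v[i+1] - v[i-1]) / (2 * dS)
            cont = v[i] + dt * (diff + drift - r * v[i])
            new[i] = max(cont, payoff[i])       # projection on the obstacle
        new[0], new[M] = K, 0.0                 # put boundary values
        v = new
    return S, v
```

By construction the computed value dominates the obstacle everywhere; refining the grid (while keeping the CFL condition) drives the scheme toward the viscosity solution, which is the content of the convergence statements above.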

4. Convertible bonds

As was already pointed out, a convertible bond is a special case of a convertible security. To describe the covenants of a typical convertible bond (CB), we introduce the following additional notation (for a detailed description and discussion of typical covenants of a CB, see, e.g., [2, 4, 35]):

N̄: the par (nominal) value,
η: the fractional loss on the underlying equity upon default,
C: the deterministic coupon process given by (22),
R̄: the recovery process on the CB upon default of the issuer at time t, given by R̄_t = R̄(t, S_{t−}) for a continuous bounded function R̄,
κ: the conversion factor,
R_t^cb = R^cb(t, S_{t−}) = (1 − η)κS_{t−} ∨ R̄_t: the effective recovery process,
ξ^cb = N̄ ∨ κS_T + A(T): the effective payoff at maturity, with A given by (23),
P̄, C̄: the put and call nominal payments, respectively, such that P̄ ≤ N̄ ≤ C̄,
δ ≥ 0: the length of the call notice period (see below),
t^δ = (t + δ) ∧ T: the end date of the call notice period started at t.

Note that putting a convertible bond at τ_p effectively means either putting or converting the bond at τ_p, whichever is best for the bondholder. This implies that, accounting for the accrued interest, the effective payment to the bondholder who decides to put at time t is
$$P_t^{ef} := \bar P \vee \kappa S_t + A(t). \eqno(38)$$
As for calling, convertible bonds typically stipulate a positive call notice period δ clause, so that if the bond issuer makes a call at time τ_c, then the bondholder


has the right to either redeem the bond for C̄ or convert it into κ shares of stock at any time t ∈ [τ_c, τ_c^δ], where τ_c^δ = (τ_c + δ) ∧ T. If the bond has been called at time t then, accounting for the accrued interest, the effective payment to the bondholder in case of exercise at time u ∈ [t, (t + δ) ∧ T] equals
$$C_t^{ef} := \bar C \vee \kappa S_t + A(t). \eqno(39)$$

4.1. Reduced convertible bonds

A CB with a positive call notice period is rather hard to price directly. To overcome this difficulty, it is natural to use a two-step valuation method. In the first step, one searches for the value of the CB upon call, by considering a suitable family of puttable bonds indexed by the time variable t (see Propositions 4.7 and 4.8). In the second step, the price process obtained in the first step is used as the payoff at a call time of a CB with no call notice period, that is, with δ = 0. To formalize this procedure, we find it convenient to introduce the concept of a reduced convertible bond, i.e., a particular convertible bond with no call notice period. Essentially, a reduced convertible bond associated with a given convertible bond with a positive call notice period is an 'equivalent' convertible bond with no call notice period, but with the payoff process at call adjusted upwards in order to account for the additional value due to the option-like feature of the positive call notice period for the bondholder.

Definition 4.1. A reduced convertible bond (RB) is a convertible security with coupon process C, recovery process R^cb and terminal payoffs L^cb, U^cb, ξ^cb such that (cf. (38)–(39))
$$R_t^{cb} = (1-\eta)\kappa S_{t-} \vee \bar R_t, \qquad \xi^{cb} = \bar N \vee \kappa S_T + A(T),$$
and, for every t ∈ [0, T],
$$L_t^{cb} = \bar P \vee \kappa S_t + A(t) = P_t^{ef},$$

$$U_t^{cb} = \mathbf 1_{\{t<\bar\tau\}}\,(\cdots)(t, S_t) + \mathbf 1_{\{t\ge\bar\tau\}}\, C_t^{ef}.$$

A Convexity Approach to Option Pricing with Transaction Costs in Discrete Models

Definition 2.1. For r > 0 and z_0 = (x_0, y_0) ∈ ℝ², we define two half-lines
$$(r, z_0) = \{(x, y) :\ (y - y_0) + r(x - x_0) = 0,\ x \le x_0,\ y \ge y_0\}$$
and
$$(z_0, r) = \{(x, y) :\ (y - y_0) + r(x - x_0) = 0,\ x \ge x_0,\ y \le y_0\}.$$

Remark 2.2. Since we shall be dealing with lines with negative slopes throughout the paper, we simply use the absolute value of the slope to denote the steepness. Note that the roles of z_0 in (l, z_0) and (z_0, r) in Definition 2.1 are different. For two points z_1 = (x_1, y_1) and z_2 = (x_2, y_2), where x_1 ≤ x_2 and y_1 ≥ y_2 (i.e., z_1 is to the northwest of z_2), we shall use |z_1 z_2| to denote the absolute value of the slope of the line through z_1 and z_2 (i.e., $|z_1 z_2| = \frac{y_1 - y_2}{x_2 - x_1}$), and use (z_1, |z_1 z_2|, z_2) for the line segment connecting z_1 and z_2. Similarly, for z_1, . . . , z_n in ℝ² with z_i to the northwest of z_{i+1} for each i, (z_1, |z_1 z_2|, z_2, . . . , z_n) is the polygon connecting z_1, z_2, . . . , z_n.

Definition 2.3. Let z_0 = (x_0, y_0) ∈ ℝ² and r > 0. A point z = (x, y) ∈ ℝ² is said to be to the left of (r, z_0) (or (z_0, r)) if y + rx ≤ y_0 + rx_0, and to the right of (r, z_0) (or (z_0, r)) if y + rx ≥ y_0 + rx_0.

Definition 2.4. For two half-lines (l, z_0) and (z_0, r) with l > r, let (l, z_0, r) = (l, z_0) ∪ (z_0, r), and let conv(l, z_0, r) be the convex set enclosed by (l, z_0, r). We call conv(l, z_0, r) a wedge; obviously, (l, z_0, r) = ∂ conv(l, z_0, r). For l_1 > l_2 > · · · > l_{n+1} and z_1, . . . , z_n in ℝ², where z_i is to the northwest of z_{i+1} with |z_i z_{i+1}| = l_{i+1}, let
$$\mathrm{conv}(l_1, z_1, l_2, z_2, \ldots, l_n, z_n, l_{n+1}) = \bigcap_{i=1}^n \mathrm{conv}(l_i, z_i, l_{i+1}),$$
and obviously (l_1, z_1, . . . , z_n, l_{n+1}) = ∂ conv(l_1, z_1, l_2, z_2, . . . , l_n, z_n, l_{n+1}). We also define (z_1, l_2, . . . , z_n) to be the effective boundary of conv(l_1, z_1, l_2, . . . , l_n, z_n, l_{n+1}).

T.-S. Chiang and S.-J. Sheu

Formally, in Definition 2.4,
$$\mathrm{conv}(l, z_0, r) = \{z \in \mathbb R^2 :\ z \text{ is to the right of both } (l, z_0) \text{ and } (z_0, r)\}$$
and
$$\mathrm{conv}(l_1, z_1, \ldots, l_n, z_n, l_{n+1}) = \{z \in \mathbb R^2 :\ z \text{ is to the right of } (l_1, z_1), \ldots, (l_n, z_n), (z_n, l_{n+1})\}.$$

Lemma 2.5. For any z_0 = (x_0, y_0) ∈ ℝ² and s > 0, we have R(z_0, s) = conv(l, z_0, r), where l = s(1 + k) and r = s(1 − k). Moreover, ∂R(z_0, s) consists of all replicating portfolios of z_0 when the stock price is s.

Proof. A point z = (x, y) belongs to R(z_0, s) if xs + y ≥ x_0 s + y_0 + |x − x_0|sk. If x ≤ x_0, the inequality becomes xs + y ≥ x_0 s + y_0 + (x_0 − x)sk, i.e., xs(1 + k) + y ≥ x_0 s(1 + k) + y_0. The case x ≥ x_0 is similar. Hence R(z_0, s) = conv(s(1 + k), z_0, s(1 − k)). Trivially, ∂R(z_0, s) is the set of all portfolios replicating z_0 when the stock price is s. □

For a contingent claim X(ω) = (x(ω), y(ω)), let H^T(ω) = {z = (x̄, ȳ) : x̄ ≥ x(ω), ȳ ≥ y(ω)} and R^T(ω) = R(X(ω), S_T(ω)) be the set of super-replicating portfolios of X(ω) at time T. We next define successively the adapted set-valued hedging process H^t and super-replicating process R^t, t = 1, 2, . . . , T − 1, as follows. First,
$$H^{T-1}(\omega) = H^{T-1}(\omega(T-1)) = \bigcap_{\bar\omega(T-1)=\omega(T-1)} R^T(\bar\omega)$$

and
$$R^{T-1}(\omega) = R^{T-1}(\omega(T-1)) = \bigcup_{z \in H^{T-1}(\omega)} R(z, S_{T-1}(\omega)) \quad \text{(Figure 1)}.$$
In general, having defined R^{t+1}(ω) = R^{t+1}(ω(t + 1)), t ≥ 1, let
$$H^t(\omega) = \bigcap_{\bar\omega(t)=\omega(t)} R^{t+1}(\bar\omega) \eqno(2.1)$$
and
$$R^t(\omega) = \bigcup_{z \in H^t(\omega)} R(z, S_t(\omega)). \eqno(2.2)$$
Finally, at t = 0, let
$$H^0 = \bigcap_{\omega} R^1(\omega). \eqno(2.3)$$
For each ω and 1 ≤ t ≤ T − 1, R^t(ω) consists of the portfolios that can super-replicate a portfolio in H^t(ω) when the price is S_t(ω), and H^t(ω) is characterized as the set such that, if z is in H^t(ω), then, carried over to time t + 1, z belongs to R^{t+1}(ω̄) for any ω̄ with ω̄(t) = ω(t). The following theorem is an easy consequence of the above definitions. We omit the proof.
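As a concrete illustration of the recursion (2.1)–(2.3) in its simplest, one-period case, the sketch below computes the minimal super-replication cost min_{z∈H^0}(S_0 x + y). Each scenario contributes the two half-plane constraints of Lemma 2.5, and (assuming the minimum is attained at a vertex of H^0) it suffices to enumerate pairwise intersections of the constraint lines. The physically settled call used in the example is an illustrative assumption.

```python
from itertools import combinations

def seller_price(S0, k, scenarios):
    """One-period seller's price: minimize S0*x + y over
    H^0 = intersection over scenarios of R(X(w), S1(w)), where each
    R(z0, s) is the wedge of Lemma 2.5, i.e., the two constraints
        x*s*(1+k) + y >= x0*s*(1+k) + y0,
        x*s*(1-k) + y >= x0*s*(1-k) + y0.
    scenarios: list of (s1, (x_claim, y_claim))."""
    cons = []                                # constraint: a*x + y >= c
    for s1, (xc, yc) in scenarios:
        for a in (s1 * (1 + k), s1 * (1 - k)):
            cons.append((a, a * xc + yc))
    feasible = lambda x, y: all(a * x + y >= c - 1e-9 for a, c in cons)
    best = float("inf")
    for (a1, c1), (a2, c2) in combinations(cons, 2):
        if abs(a1 - a2) < 1e-12:
            continue
        x = (c1 - c2) / (a1 - a2)            # vertex of two constraint lines
        y = c1 - a1 * x
        if feasible(x, y):
            best = min(best, S0 * x + y)
    return best
```

For a one-period call with strike 100, S_0 = 100, up price 120 and down price 80, the frictionless (k = 0) price is 10; a positive fee k raises it, since super-replication then requires a larger hedge.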

A Convexity Approach to Option Pricing

Figure 1. H^{T−1}(ω) and R^{T−1}(ω) in the CRR model

Theorem 2.6. For a portfolio process {Z_t} to super-replicate a contingent claim X, it is necessary and sufficient that Z_{t+1}(ω) ∈ H^t(ω) for each ω (or, equivalently, Z_{t+1}(ω) ∈ R^{t+1}(ω̄) with ω(t) = ω̄(t)) for 0 ≤ t ≤ T.

If no transaction fee is charged at t = 0, then the (seller's) price for X is obviously p(X) = inf_{z=(x,y)∈H^0} (S_0 x + y). A point (x, y) ∈ H^0 is said to determine p(X) if the infimum is attained at (x, y). The seller's price is the minimal cost of creating a portfolio process that super-replicates X. A portfolio process Z_t which super-replicates X is optimal if Z_1 = (x, y) for some (x, y) determining p(X). It is uniquely optimal if for any other optimal process Z_t′ we have Z_t = Z_t′ for every 1 ≤ t ≤ T + 1.

Trivially, R^T(ω) is the wedge conv(S_T(ω)(1 + k), X(ω), S_T(ω)(1 − k)). But already H^{T−1} becomes complicated, because it is the intersection of several wedges. Some simple but important observations follow.

Lemma 2.7. For wedges conv(l_i, z_i, r_i), i = 1, . . . , n, let l = max(l_1, . . . , l_n) and r = min(r_1, . . . , r_n). Then for some k and points w_1, . . . , w_k,
$$\bigcap_{i=1}^n \mathrm{conv}(l_i, z_i, r_i) = \mathrm{conv}(c_1, w_1, c_2, w_2, \ldots, c_k, w_k, c_{k+1})$$

where c1 = l > c2 > · · · > ck+1 = r.

Proof. Let conv(l_1, z_1, r_1) and conv(l_2, z_2, r_2) be two wedges with l_1 > l_2 and r_1 > r_2. If z_2 ∈ conv(l_1, z_1, r_1), then either (l_2, z_2) ∩ (l_1, z_1) ≠ ∅ or (l_2, z_2) ∩ (z_1, r_1) ≠ ∅. In


the first case, we have conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, w, l_2, z_2, r_2) and, in the second case, conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, z_1, r_1, w, l_2, z_2, r_2), where w is the intersection point. If z_2 ∉ conv(l_1, z_1, r_1), then four cases can occur:
(i) (l_2, z_2) ∩ (l_1, z_1) ≠ ∅ and (l_2, z_2) ∩ (z_1, r_1) ≠ ∅;
(ii) (z_2, r_2) ∩ (l_1, z_1) ≠ ∅, but (z_2, r_2) ∩ (z_1, r_1) = ∅;
(iii) (z_2, r_2) ∩ (l_1, z_1) = ∅, but (z_2, r_2) ∩ (z_1, r_1) ≠ ∅; or
(iv) (l_2, z_2) ∩ (l_1, z_1) = ∅, but (l_2, z_2) ∩ (z_1, r_1) ≠ ∅.
In (i), conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, w_1, l_2, w_2, r_1, w_3, r_2). In (ii), conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, w_1, r_2). In (iii), conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, z_1, r_1, w_1, r_2), and in (iv), conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, z_1, r_1, w_1, l_2, z_2, r_2). Here, w_1, w_2 and w_3 are intersection points.

The proof is similar for the other cases. □

From this point on, the convention of Lemma 2.7 concerning the ordering of slopes will be used.

Corollary 2.8. For convex sets K_i = conv(l_1^i, w_1^i, . . . , l_{L_i−1}^i, w_{L_i−1}^i, l_{L_i}^i), i = 1, . . . , n, with l = max(l_1^1, . . . , l_1^n) and r = min(l_{L_1}^1, . . . , l_{L_n}^n), we have ∩_{i=1}^n K_i = conv(l_1, w_1, . . . , l_{L−1}, w_{L−1}, l_L), where l_1 = l and l_L = r, for some L and points w_1, w_2, . . . , w_{L−1}.

Proof. This immediately follows from Lemma 2.7, because each conv(l_1^i, . . . , l_{L_i}^i) is itself the intersection of the wedges conv(l_1^i, w_1^i, l_2^i), conv(l_2^i, w_2^i, l_3^i), . . . , conv(l_{L_i−1}^i, w_{L_i−1}^i, l_{L_i}^i). □

Let $M_t(\omega) = \sup_{\bar\omega(t)=\omega(t)} \frac{S_{t+1}(\bar\omega)}{S_t(\omega)}$ and $m_t(\omega) = \inf_{\bar\omega(t)=\omega(t)} \frac{S_{t+1}(\bar\omega)}{S_t(\omega)}$. We shall always assume M_t > 1 and m_t < 1. The following two lemmas describe the shapes of H^t and R^t, respectively.
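Since a wedge is, by Definitions 2.3–2.4, the intersection of the two half-planes {y + s x ≥ y_0 + s x_0}, s ∈ {l, r}, intersecting wedges as in Lemma 2.7 and Corollary 2.8 amounts to computing the upper envelope of finitely many lines y = c − s x: the boundary is a convex polyline whose slopes (in absolute value, cf. Remark 2.2) strictly decrease from left to right. A brute-force sketch (hypothetical helper; assumes at least two distinct slopes):

```python
def upper_envelope(lines):
    """lines: list of (s, c), each describing the line y = c - s*x, s > 0.
    The region \\cap {(x, y): y >= c - s*x} is bounded below by the upper
    envelope of these lines. Returns (active, breaks): the slopes of the
    active pieces from left to right, and the x-coordinates at which the
    active line changes."""
    # candidate breakpoints: all pairwise intersections
    xs = sorted({(c1 - c2) / (s1 - s2)
                 for i, (s1, c1) in enumerate(lines)
                 for (s2, c2) in lines[i+1:] if s1 != s2})
    # the argmax line is constant between consecutive candidates,
    # so one sample per interval (plus the two unbounded ends) suffices
    samples = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    active, breaks, prev = [], [], None
    for x in samples:
        s_max, _ = max(lines, key=lambda sc: sc[1] - sc[0] * x)
        if s_max != prev:
            if prev is not None:
                c_prev = next(c for s, c in lines if s == prev)
                c_new = next(c for s, c in lines if s == s_max)
                breaks.append((c_prev - c_new) / (prev - s_max))
            active.append(s_max)
            prev = s_max
    return active, breaks
```

For the wedge conv(l, z_0, r) the two contributing lines are (l, y_0 + l x_0) and (r, y_0 + r x_0); feeding the lines of several wedges into the helper reproduces the decreasing slope sequence c_1 > c_2 > · · · > c_{k+1} of Lemma 2.7, with non-active wedges dropping out, as in the example below.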

Lemma 2.9. For any 0 ≤ t ≤ T − 1, H^t(ω) = conv(l_1, z_1, l_2, . . . , l_n, z_n, l_{n+1}), where l_1 = S_t(ω)M_t(ω)(1 + k) and l_{n+1} = S_t(ω)m_t(ω)(1 − k), for some points z_i, i = 1, . . . , n; and R^t(ω) = conv(S_t(1 + k), w_1, . . . , w_k, S_t(1 − k)) for some points w_i, i = 1, . . . , k.

Proof. For t = T − 1,
$$H^{T-1}(\omega) = \bigcap_{\bar\omega(T-1)=\omega(T-1)} R^T(\bar\omega) = \bigcap_{\bar\omega(T-1)=\omega(T-1)} \mathrm{conv}\big(S_T(\bar\omega)(1+k),\ X(\bar\omega),\ S_T(\bar\omega)(1-k)\big).$$
Hence, by Lemma 2.7, for some points w_1, w_2, . . . , w_k,
$$H^{T-1}(\omega) = \mathrm{conv}\big(S_{T-1}(\omega)M_{T-1}(\omega)(1+k),\ w_1, \ldots, w_k,\ S_{T-1}(\omega)m_{T-1}(\omega)(1-k)\big). \eqno(2.4)$$


Now, for a wedge conv(l, z_0, r), the super-replicating set ∪_{z∈conv(l,z_0,r)} R(z, s) is obviously conv(s(1 + k), z_0, s(1 − k)) if s(1 + k) ≤ l and r ≤ s(1 − k). In general,
$$\bigcup_{z \in \mathrm{conv}(l_1, z_1, \ldots, l_L)} R(z, s) = \mathrm{conv}\big(s(1+k),\ z_m, l_{m+1}, z_{m+1}, \ldots, l_{M-1}, z_{M-1}, l_M, z_M,\ s(1-k)\big),$$
where m and M are chosen so that l_{m+1} < s(1 + k) ≤ l_m and l_{M+1} ≤ s(1 − k) < l_M. Since M_t(ω) > 1 and m_t(ω) < 1, we thus have, by (2.4),
$$R^{T-1}(\omega) = \bigcup_{z \in H^{T-1}(\omega)} R(z, S_{T-1}(\omega)) = \mathrm{conv}\big(S_{T-1}(\omega)(1+k),\ \ldots,\ S_{T-1}(\omega)(1-k)\big).$$
Now Corollary 2.8 and an induction yield the lemma. □

Remark 2.10. For a hedging set H^t = conv(l_1, z_1, . . . , l_n, z_n, l_{n+1}) with l_1 = S_t(1 + k)M_t > l_2 > · · · > l_{n+1} = S_t(1 − k)m_t as in Lemma 2.9, let m and M be chosen so that
$$l_m \ge S_t(1+k) > l_{m+1} \quad \text{and} \quad l_M > S_t(1-k) \ge l_{M+1}, \qquad t = 1, 2, \ldots, T, \eqno(2.5)$$
respectively. Then R^t = conv(S_t(1 + k), z_m, l_{m+1}, . . . , l_M, z_M, S_t(1 − k)). Note that the effective boundary of R^t (= (z_m, l_{m+1}, . . . , l_M, z_M)) is contained in ∂H^t. If we have the strict inequalities
$$l_m > S_t(1+k) > l_{m+1} \quad \text{and} \quad l_M > S_t(1-k) > l_{M+1}, \qquad t = 1, 2, \ldots, T, \eqno(2.6)$$
and if w is a portfolio in ∂R^t, then at the price S_t there can be only one post-trade portfolio of w in H^t. To be precise, if w ∈ (z_M, S_t(1 − k)), then z_M is the only possible post-trade portfolio of w in H^t when the stock price is S_t. If w ∈ (S_t(1 + k), z_m), then z_m is the only possible post-trade portfolio of w in H^t. If w ∈ (z_m, l_{m+1}, . . . , l_M, z_M), then w itself is the only element of H^t such that w ∈ R(w, S_t). Hence, if Z_t super-replicates X and Z_t(ω) ∈ ∂R^t(ω̄) for some ω̄ with ω̄(t − 1) = ω(t − 1), then Z_{t+1}(ω̄) must be contained in the effective boundary of R^t(ω̄) (⊂ ∂H^t(ω̄)). In particular, if R^t(ω̄) is a wedge conv(S_t(1 + k), z, S_t(1 − k)), then Z_{t+1}(ω̄) = z.

Remark 2.11. Since H^0 = conv(l_1, z_1, . . . , l_n, z_n, l_{n+1}) with l_1 = S_0 M_0(1 + k) > S_0 > l_{n+1} = S_0 m_0(1 − k), there is a first i such that l_i ≥ S_0 > l_{i+1}, and p(X) = x_i S_0 + y_i, where z_i = (x_i, y_i). (Actually, if l_i = S_0, then p(X) = xS_0 + y for any z = (x, y) ∈ (l_i, z_i).) If
$$l_i > S_0 > l_{i+1}, \eqno(2.7)$$
then z_i is the unique portfolio determining p(X), and any optimal portfolio process must have Z_1 = z_i. We next state a situation in which the optimal portfolio process is unique.


Theorem 2.12. Let S_t be a price process and X a contingent claim. Suppose (2.6) and (2.7) are satisfied, and let z be the unique portfolio that determines p(X). Then the optimal portfolio process super-replicating X is unique if z ∈ ∂R^1(ω) a.s. and the effective boundary of R^t(ω) is contained in ∂R^{t+1}(ω̄) with ω̄(t) = ω(t) for t = 1, . . . , T.

Proof. If {Z_t} is optimal, then Z_1 = z by Remark 2.11. Since Z_1 = z ∈ ∂R^1, by Remark 2.10, Z_2 is forced to be the unique admissible choice in the effective boundary of R^1, which is contained in ∂R^2. Induction then implies that the process is unique. □

3. Perfect replication and optimality

In this section, we re-derive three theorems dealing with the question of whether a perfect replication is optimal. Theorem 3.1 was proved in (Kusuoka [9], p. 199), which considers, in general models, when a perfectly replicating portfolio process (assuming it exists) of a contingent claim is optimal. Here, we can also establish the uniqueness of the optimal process. It was proved in (Bensaid et al. [1], p. 74) that in binomial models with d^{−1} = u > (1 + k)^2 (i.e., the transaction fee is small), perfect replication is the unique optimal portfolio process. Theorem 3.2 was proved in [12, Theorem 9], generalizing the theorem to binomial models with u(1 − k) > 1 + k > 1 − k > d(1 + k). Theorem 3.3 was proved in (Bensaid et al. [1], p. 75), generalizing the fact (Boyle and Vorst [2]) that perfect replication is optimal for long European calls in binomial models. We shall present much simpler proofs of the three theorems using the convexity method, and show that all three seemingly unrelated theorems hold true for the same reason: ∂R^t is a wedge and the effective boundary of R^t is contained in ∂R^{t+1}. Theorem 2.12 then establishes the uniqueness of the optimal portfolio process in all three cases.

Theorem 3.1. Let S_0, S_1, . . . , S_T be a stock price process on Ω with the filtration {F_t}, and let Z_1, Z_2, . . . , Z_{T+1} be a portfolio process perfectly replicating a contingent claim X (i.e., X = Z_{T+1}). Let Z_t = (x_t, y_t). If, for t = 0, 1, . . . , T − 1,
$$P(S_{t+1} \ge S_t,\ x_{t+2} \ge x_{t+1} \mid \mathcal F_t) > 0 \eqno(3.1)$$
and
$$P(S_{t+1} \le S_t,\ x_{t+2} \le x_{t+1} \mid \mathcal F_t) > 0, \eqno(3.2)$$
then Z_t is optimal. If we have strict inequalities in (3.1) and (3.2), then {Z_t} is unique.

Proof. We first claim that R^t is the wedge conv(S_t(1 + k), Z_{t+1}, S_t(1 − k)), t = 1, . . . , T. By definition, R^T = conv(S_T(1 + k), Z_{T+1}, S_T(1 − k)). By (3.1), for every ω ∈ Ω there exists an ω′ such that ω(T − 1) = ω′(T − 1) but
$$S_{T-1}(\omega') \le S_T(\omega') \quad \text{and} \quad x_T(\omega') \le x_{T+1}(\omega'). \eqno(3.3)$$


Since Z_T(ω) replicates Z_{T+1}(ω′), we thus have Z_T(ω) ∈ ∂R^T(ω′). Moreover, (3.3) implies that
$$Z_T(\omega) \in \big(S_T(\omega')(1+k),\ Z_{T+1}(\omega')\big). \eqno(3.4)$$
Similarly, condition (3.2) implies that there exists an ω″ such that ω(T − 1) = ω″(T − 1) but
$$S_{T-1}(\omega'') \ge S_T(\omega'') \quad \text{and} \quad x_T(\omega'') \ge x_{T+1}(\omega''). \eqno(3.5)$$
Also,
$$Z_T(\omega) \in \big(Z_{T+1}(\omega''),\ S_T(\omega'')(1-k)\big). \eqno(3.6)$$
By Theorem 2.6, we have
$$Z_T(\omega) \in H^{T-1}(\omega). \eqno(3.7)$$
Since int H^{T−1}(ω) is contained in int R^T(ω′), the perfect replication of Z_{T+1}(ω′) by Z_T(ω) implies
$$Z_T(\omega) \in \partial H^{T-1}(\omega). \eqno(3.8)$$

From Lemma 2.9, H^{T−1}(ω) takes the form conv(l_1, . . . , l, Z_T(ω), r, . . .), and (3.3) implies that l ≥ S_T(ω′)(1 + k) and r ≤ S_T(ω″)(1 − k). By (3.3), (3.5) and Remark 2.10, R^{T−1} is conv(S_{T−1}(1 + k), Z_T, S_{T−1}(1 − k)). This completes the induction. Now, at time 0, H^0 = conv(l_1, . . . , l, Z_1, r, . . .) with l ≥ S_1(ω′)(1 + k) and r ≤ S_1(ω″)(1 − k). Since S_1(ω′) ≥ S_0 ≥ S_1(ω″) by (3.3) and (3.5), we obviously have min_{z=(x,y)∈H^0} (S_0 x + y) = S_0 x_0 + y_0, where Z_1 = (x_0, y_0). Hence {Z_t} is optimal. The effective boundary of R^t is the point Z_{t+1}, and it is contained in ∂R^{t+1} because Z_{t+1} perfectly replicates Z_{t+2}. The uniqueness follows from Theorem 2.12, because (2.6) and (2.7) are satisfied by the strict inequalities in (3.1) and (3.2). □

It is easy to see that in a binomial model the condition u(1 − k) > d(1 + k) is necessary and sufficient for any contingent claim to be uniquely perfectly replicated. With a slightly stronger condition, we re-derive the following theorem (Theorem 4.1 [8], Prop. 3.1 [11] and Theorem 9 [12]) and establish the uniqueness of the optimal replicating portfolio process using our method.

Theorem 3.2. In a binomial model, suppose
$$u > \frac{1+k}{1-k} \quad \text{and} \quad \frac{1}{d} > \frac{1+k}{1-k}. \eqno(3.9)$$
Then the portfolio process which perfectly replicates a contingent claim is uniquely optimal.

Proof. The proof is essentially the same as that of Theorem 3.1. We first claim that
$$R^t(\omega) = \mathrm{conv}\big(S_t(\omega)(1+k),\ Z_{t+1},\ S_t(\omega)(1-k)\big), \quad t = 1, \ldots, T, \eqno(3.10)$$
where Z_t, t = 1, . . . , T + 1, is the unique portfolio process replicating X. By definition, R^T = conv(S_T(1 + k), X, S_T(1 − k)), and we proceed to show R^{T−1}(ω) = conv(S_{T−1}(ω)(1 + k), Z_T(ω), S_{T−1}(ω)(1 − k)). From (2.1),
$$H^{T-1}(\omega) = \bigcap_{\bar\omega(T-1)=\omega(T-1)} R^T(\bar\omega) = \mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ X(\omega'),\ S_{T-1}(\omega)u(1-k)\big) \cap \mathrm{conv}\big(S_{T-1}(\omega)d(1+k),\ X(\omega''),\ S_{T-1}(\omega)d(1-k)\big),$$
where ω′, ω″ ∈ Ω, ω′(T − 1) = ω″(T − 1) = ω(T − 1), S_T(ω′) = S_{T−1}(ω)u and S_T(ω″) = S_{T−1}(ω)d. Since Z_T replicates Z_{T+1}, H^{T−1}(ω) must take one of the following four forms:
$$\mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ Z_T(\omega),\ S_{T-1}(\omega)d(1-k)\big),$$
$$\mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ Z_T(\omega),\ S_{T-1}(\omega)d(1+k),\ Z_{T+1}(\omega''),\ S_{T-1}(\omega)d(1-k)\big),$$
$$\mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ Z_{T+1}(\omega'),\ S_{T-1}(\omega)u(1-k),\ Z_T(\omega),\ S_{T-1}(\omega)d(1+k),\ Z_{T+1}(\omega''),\ S_{T-1}(\omega)d(1-k)\big)$$
and
$$\mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ Z_{T+1}(\omega'),\ S_{T-1}(\omega)u(1-k),\ Z_T(\omega),\ S_{T-1}(\omega)d(1-k)\big). \eqno(3.11)$$
By Remark 2.10, condition (3.9) implies that (3.10) holds for t = T − 1 in any of the four cases. Induction completes (3.10). Now, at t = 0, H^0 takes one of the four forms in (3.11) with T = 1, and it is obvious that min_{z=(x,y)∈H^0} (S_0 x + y) = S_0 x_0 + y_0, where Z_1 = (x_0, y_0). Thus {Z_t} is optimal. Moreover, (2.6) and (2.7) hold because of (3.11), and, similarly to the proof of Theorem 3.1, Theorem 2.12 implies the uniqueness of the optimal portfolio process. □

The following theorem generalizes the long European call option ([1] and [2]).

Theorem 3.3. Let X = X(S_T) = (x(S_T), y(S_T)) be a contingent claim in a binomial model such that x(s) is non-decreasing and y(s) is non-increasing in s. Suppose that, for every ω,
$$S_{T-1}(\omega)d(1-k) \le |X(S_{T-1}(\omega)d)X(S_{T-1}(\omega)u)| \le S_{T-1}(\omega)u(1+k); \eqno(3.12)$$
then there exists a unique portfolio process perfectly replicating X. It is also uniquely optimal.

Proof. Let Z_{T+1} = X. Since
$$H^{T-1}(\omega) = \mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ X(S_T(\omega')),\ S_{T-1}(\omega)u(1-k)\big) \cap \mathrm{conv}\big(S_{T-1}(\omega)d(1+k),\ X(S_T(\omega'')),\ S_{T-1}(\omega)d(1-k)\big),$$
where ω′, ω″ ∈ Ω, ω′(T − 1) = ω″(T − 1) = ω(T − 1), S_T(ω′) = S_{T−1}(ω)u and S_T(ω″) = S_{T−1}(ω)d, condition (3.12) implies that there is a unique
$$Z_T(\omega) = Z(S_{T-1}(\omega)) \in \big(S_{T-1}(\omega)u(1+k),\ X(S_T(\omega'))\big) \cap \big(X(S_T(\omega'')),\ S_{T-1}(\omega)d(1-k)\big)$$
such that H^{T−1}(ω) = conv(S_{T−1}(ω)u(1 + k), Z_T(ω), S_{T−1}(ω)d(1 − k)).

From Remark 2.10, it follows that

RT −1 (ω) = conv(ST −1 (ω)(1 + k), ZT (ω), ST −1 (ω)(1 − k)).

Now, for any ω, let ω′, ω″ ∈ Ω be such that ω(T − 2) = ω′(T − 2) = ω″(T − 2) and S_{T−1}(ω′) = S_{T−2}(ω)u, S_{T−1}(ω″) = S_{T−2}(ω)d. We claim that
$$S_{T-2}(\omega)d(1-k) \le |Z_T(\omega'')Z_T(\omega')| \le S_{T-2}(\omega)u(1+k).$$


Let ω‴ be such that S_T(ω‴) = S_{T−1}(ω′)d = S_{T−1}(ω″)u. It is obvious that
$$|Z_T(\omega'')Z_{T+1}(\omega''')| = S_T(\omega''')(1+k) \quad \text{and} \quad |Z_{T+1}(\omega''')Z_T(\omega')| = S_T(\omega''')(1-k).$$
Thus
$$S_{T-2}(\omega)d(1-k) \le S_T(\omega''')(1-k) \le |Z_T(\omega'')Z_T(\omega')| \le S_T(\omega''')(1+k) \le S_{T-2}(\omega)u(1+k).$$

An induction implies that H^0(ω) = conv(S_0 u(1 + k), Z_1(ω), S_0 d(1 − k)). Obviously, Z_1 determines p(X) and {Z_t} is optimal. It is uniquely optimal because of Theorem 2.12. □
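The backward induction in the proofs of Theorems 3.2 and 3.3 is constructive: at each node, the replicating portfolio solves the two linear "buy at the up-move, sell at the down-move" equations. A sketch of this computation follows; the recombining-tree layout and the call-type claim are illustrative assumptions, and the parameters of the example satisfy (3.9).

```python
def replicate_binomial(S0, u, d, k, T, claim):
    """Perfectly replicating portfolio process for X(S_T) = (x, y) with
    x(s) non-decreasing and y(s) non-increasing (Theorem 3.3 setting),
    on a recombining binomial tree. At a node with price s and successor
    portfolios (xu, yu) at s*u, (xd, yd) at s*d, with xd <= xu,
    Z = (x, y) solves
        x*s*u*(1+k) + y = xu*s*u*(1+k) + yu   (buy xu - x shares at s*u)
        x*s*d*(1-k) + y = xd*s*d*(1-k) + yd   (sell x - xd shares at s*d)."""
    layer = [claim(S0 * u**j * d**(T - j)) for j in range(T + 1)]
    for t in range(T - 1, -1, -1):
        new = []
        for j in range(t + 1):
            s = S0 * u**j * d**(t - j)
            xd, yd = layer[j]                 # down-successor
            xu, yu = layer[j + 1]             # up-successor
            a_up, a_dn = s * u * (1 + k), s * d * (1 - k)
            x = (a_up * xu + yu - a_dn * xd - yd) / (a_up - a_dn)
            y = a_up * (xu - x) + yu
            assert xd - 1e-9 <= x <= xu + 1e-9  # buy/sell pattern consistency
            new.append((x, y))
        layer = new
    x0, y0 = layer[0]
    return x0, y0, S0 * x0 + y0               # Z_1 and p(X) (no fee at t = 0)
```

With k = 0 the computation collapses to standard CRR replication; the asserted consistency x_d ≤ x ≤ x_u reflects the monotonicity hypothesis of Theorem 3.3, and for a call a positive fee k raises the replication cost, consistent with Boyle and Vorst [2].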

4. Trinomial model

In this section, we apply the results of §2 to pricing contingent claims in trinomial models. In a trinomial model, perfect replication is in general not possible, and we therefore need to consider super-replicating portfolios. For contingent claims X(S_T) = (x(S_T), y(S_T)), we formulate two conditions, AS and DS, in Definitions 4.1 and 4.2, covering long and some short European calls respectively (Corollary 4.13), and determine their optimal super-replicating portfolio processes. We remark that options with cash settlement satisfy neither AS nor DS.

Definition 4.1. Let X = X(S_T) = (x(S_T), y(S_T)) be a contingent claim, where x denotes the number of shares of the stock and y the number of shares of the bond. We say that X satisfies condition AS if
(1) x(s) is increasing in s while y(s) is decreasing in s, and
(2) X(S_{T−1}u^i) is to the left of (S_{T−1}u^{i+1}(1 + k), X(S_{T−1}u^{i+1})) and to the right of (S_{T−1}u^i(1 − k), X(S_{T−1}u^{i+1})) for i = −1, 0.

It is easy to see that the long position in a European call option (X(S_T) = (1, −K) if S_T ≥ K and X(S_T) = (0, 0) if S_T < K) satisfies Definition 4.1 (Figure 2).

Definition 4.2. Let X = X(S_T) be a contingent claim as in Definition 4.1. We say that X satisfies condition DS if
(1) x(s) is decreasing in s while y(s) is increasing in s, and
(2) X(S_{T−1}u^i) is to the right of (X(S_{T−1}u^{i+1}), S_{T−1}u^{i+1}(1 − k)) and to the left of (X(S_{T−1}u^{i+1}), S_{T−1}u^i(1 + k)) for i = −1, 0 (Figure 3). (See Remark 4.4.)

The following lemmas follow easily from geometric arguments, and we omit their proofs.

Figure 2. X = X(S_T) satisfies AS

Figure 3. X = X(S_T) satisfies DS


Lemma 4.3. For l > r, the following conditions are equivalent:
(a) z_2 is to the left of (l, z_1) and to the right of (r, z_1);
(b) $r \le \frac{y_2 - y_1}{x_1 - x_2} \le l$ with y_2 ≥ y_1 and x_1 ≥ x_2, where z_1 = (x_1, y_1) and z_2 = (x_2, y_2).

Remark 4.4. From Lemma 4.3, it is easy to see that condition AS is a generalization of [1] (p. 75) in a binomial model, and is equivalent to
$$S_{T-1}u^i(1-k) \le |X(S_{T-1}u^i)X(S_{T-1}u^{i+1})| \le S_{T-1}u^{i+1}(1+k), \quad i = 0, -1.$$
Condition DS is equivalent to
$$S_{T-1}u^i(1+k) \le |X(S_{T-1}u^{i+1})X(S_{T-1}u^i)| \le S_{T-1}u^{i+1}(1-k), \quad i = 0, -1.$$
Obviously, condition DS can hold true only when $\frac{1+k}{1-k} < u$. Also, 'decreasing' ('increasing') in Definitions 4.1 and 4.2 means non-increasing (non-decreasing).
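The slope characterization of Remark 4.4 is straightforward to verify numerically. The sketch below is a hypothetical helper for condition AS at one node (coincident payoff points are treated as trivially admissible, and a vertical payoff segment as a violation):

```python
def satisfies_AS(claim, S_prev, u, k):
    """Check condition AS at one node via Remark 4.4:
    S u^i (1-k) <= |X(S u^i) X(S u^{i+1})| <= S u^{i+1} (1+k), i = -1, 0,
    together with x non-decreasing and y non-increasing in s."""
    prices = [S_prev * u ** i for i in (-1, 0, 1)]
    pts = [claim(s) for s in prices]
    # (1) monotonicity of the payoff pair
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        if xb < xa or yb > ya:
            return False
    # (2) slope condition; the northwest point is the one at the lower price
    for i in (0, 1):
        (xa, ya), (xb, yb) = pts[i], pts[i + 1]
        if (xa, ya) == (xb, yb):
            continue                 # degenerate pair: nothing to check
        if xb == xa:
            return False             # vertical segment: slope bound fails
        m = (ya - yb) / (xb - xa)    # |z_a z_b| of Remark 2.2
        if not (prices[i] * (1 - k) <= m <= prices[i + 1] * (1 + k)):
            return False
    return True
```

As expected from the remarks above, a physically settled long call passes the check, while a cash-settled call (whose bond position increases with s) fails it.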

Lemma 4.5. For two wedges (l_1, z_1, r_1) and (l_2, z_2, r_2) with l_1 > l_2 and r_1 > r_2, if z_2 is to the left of (l_1, z_1) and to the right of (r_2, z_1), then conv(l_1, z_1, r_1) ∩ conv(l_2, z_2, r_2) = conv(l_1, z, r_2), where z ∈ (l_1, z_1) ∩ (z_2, r_2).

Lemma 4.6. For three points z_1, z_2 and z_3 in ℝ², if |z_1 z_2| = l and |z_2 z_3| = r, then min(r, l) ≤ |z_1 z_3| ≤ max(r, l).

The next two lemmas describe the intersections of three and four wedges, respectively.

Lemma 4.7. For three wedges conv(l_i, z_i, r_i) with l_1 > l_2 > l_3 and r_1 > r_2 > r_3, if
$$z_{i+1} \text{ is to the left of } (l_i, z_i) \text{ and to the right of } (r_{i+1}, z_i), \quad i = 1, 2, \eqno(4.1)$$
then
$$\bigcap_{i=1}^3 \mathrm{conv}(l_i, z_i, r_i) = \mathrm{conv}(l_1, z, r_3), \quad \text{where } z \in (l_1, z_1, r_1) \cap (l_3, z_3, r_3).$$
Moreover,
$$\bigcap_{i=1}^3 \mathrm{conv}(l_i, z_i, r_i) = \mathrm{conv}(l_1, z_1, r_1) \cap \mathrm{conv}(l_3, z_3, r_3).$$

Lemma 4.8. Suppose there are (l_i, z_i, r_i), i = 1, . . . , 4, satisfying l_1 > l_2 > l_3 > l_4 and r_1 > r_2 > r_3 > r_4, with z_{i+1} to the left of (l_i, z_i) and to the right of (r_{i+1}, z_i) for each i. Let, as in Lemma 4.7,
$$\bigcap_{i=1}^3 \mathrm{conv}(l_i, z_i, r_i) = \mathrm{conv}(l_1, w_2, r_3) \quad \text{and} \quad \bigcap_{i=2}^4 \mathrm{conv}(l_i, z_i, r_i) = \mathrm{conv}(l_2, w_4, r_4),$$
where w_2 ∈ (z_3, r_3) ∩ (l_1, z_1) and w_4 ∈ (z_4, r_4) ∩ (l_2, z_2). Then r_3 ≤ |w_4 w_2| ≤ l_2.


Let X be a contingent claim satisfying AS in a trinomial model. For each ω ∈ Ω and 0 ≤ t ≤ T, we define the binomial process S_i^{ω,t}, i = 0, . . . , T − t, with S_0^{ω,t} = ω_t and d = u^{−1}. When restricted to {S_i^{ω,t}}_{0≤i≤T−t}, X naturally defines a contingent claim X^{ω,t} on this binomial model at time T − t. To be precise, we have X^{ω,t}(S_{T−t}^{ω,t}) = X(S_{T−t}^{ω,t}). When t = 0, we use X^b to denote X^{ω,0}. By Theorem 3.3, there exists a unique portfolio process Z_i^{ω,t}, i = 1, . . . , T − t + 1, perfectly replicating X^{ω,t}. (Condition (3.12) in Theorem 3.3 holds because of condition AS.) We first claim that, for each t,
$$H^t(\omega) = \mathrm{conv}\big(S_t(\omega)u(1+k),\ Z_1^{\omega,t},\ S_t(\omega)u^{-1}(1-k)\big). \eqno(4.2)$$
It then follows that
$$R^t(\omega) = \mathrm{conv}\big(S_t(\omega)(1+k),\ Z_1^{\omega,t},\ S_t(\omega)(1-k)\big). \eqno(4.3)$$
For t = T − 1 and ω ∈ Ω, by definition, H^{T−1}(ω) = ∩_{i=−1,0,1} conv(l_i, X(S_{T−1}(ω)u^i), r_i), where l_i = S_{T−1}(ω)u^i(1 + k) and r_i = S_{T−1}(ω)u^i(1 − k). Condition AS implies that (4.1) in Lemma 4.7 holds, and hence H^{T−1}(ω) = ∩_{i=−1,1} conv(l_i, X(S_{T−1}(ω)u^i), r_i). Therefore,
$$H^{T-1}(\omega) = \mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ z,\ S_{T-1}(\omega)u^{-1}(1-k)\big)$$
and
$$R^{T-1}(\omega) = \mathrm{conv}\big(S_{T-1}(\omega)(1+k),\ z,\ S_{T-1}(\omega)(1-k)\big),$$
where z ∈ (S_{T−1}(ω)u(1 + k), X(S_{T−1}(ω)u)) ∩ (X(S_{T−1}(ω)u^{−1}), S_{T−1}(ω)u^{−1}(1 − k)). But it is obvious that z = Z_1^{ω,T−1}. Let ω′ ∈ Ω be such that ω′(T − 2) = ω(T − 2) and ω′(T − 1)u = ω(T − 1). We need to show that
$$S_{T-1}(\omega')(1-k) \le |Z_1^{\omega',T-1} Z_1^{\omega,T-1}| \le S_{T-1}(\omega)(1+k), \eqno(4.4)$$

(i.e., condition (2) in Definition 4.1, in order for the induction to proceed). Since
$$\mathrm{conv}\big(S_{T-1}(\omega)u(1+k),\ Z_1^{\omega,T-1},\ S_{T-1}(\omega)u^{-1}(1-k)\big) = \bigcap_{i=-1,0,1} \mathrm{conv}\big(S_{T-1}(\omega)u^i(1+k),\ X(S_{T-1}(\omega)u^i),\ S_{T-1}(\omega)u^i(1-k)\big)$$
and
$$\mathrm{conv}\big(S_{T-1}(\omega')u(1+k),\ Z_1^{\omega',T-1},\ S_{T-1}(\omega')u^{-1}(1-k)\big) = \bigcap_{i=-1,0,1} \mathrm{conv}\big(S_{T-1}(\omega')u^i(1+k),\ X(S_{T-1}(\omega')u^i),\ S_{T-1}(\omega')u^i(1-k)\big) = \bigcap_{i=-2,-1,0} \mathrm{conv}\big(S_{T-1}(\omega)u^i(1+k),\ X(S_{T-1}(\omega)u^i),\ S_{T-1}(\omega)u^i(1-k)\big),$$
Lemma 4.8 thus implies (4.4).

We next consider t = T − 2. Let ω_{−1}, ω_0 and ω_1 ∈ Ω be such that ω_{−1}(T − 2) = ω_0(T − 2) = ω_1(T − 2) and S_{T−1}(ω_{−1}) = S_{T−1}(ω_0)u^{−1},


ST −1 (ω1 ) = ST −1 (ω0 )u. Then,

\[ H^{T-2}(\omega) = \bigcap_{i=-1,0,1} \operatorname{conv}(S_{T-2}(\omega)u^i(1+k),\ Z_1^{\omega_i,T-1},\ S_{T-2}(\omega)u^i(1-k)) \]
\[ = \operatorname{conv}(S_{T-2}(\omega)u(1+k),\ Z_1^{\omega_1,T-1},\ S_{T-2}(\omega)u(1-k)) \cap \operatorname{conv}(S_{T-2}(\omega)u^{-1}(1+k),\ Z_1^{\omega_{-1},T-1},\ S_{T-2}(\omega)u^{-1}(1-k)) \]
\[ = \operatorname{conv}(S_{T-2}(\omega)u(1+k),\ z,\ S_{T-2}(\omega)u^{-1}(1-k)). \]

for some $z \in \mathbb{R}^2$. By the uniqueness in Theorem 3.3, $z = Z_1^{\omega,T-2}$. An induction thus completes the proof of (4.2) and (4.3). Now, since both $p(X)$ and $p(X^b)$ are determined by $Z_1^{\omega,0}$, obviously $p(X) = p(X^b)$. We summarize the above as follows.

Theorem 4.9. Let $X = (x(S_T), y(S_T))$ be a contingent claim satisfying condition AS on a trinomial model. Then there exists an adapted process $\{Z_t\}$ such that
\[ H^t(\omega) = \operatorname{conv}(S_t(\omega)u(1+k),\ Z_t(\omega),\ S_t(\omega)u^{-1}(1-k)), \quad t = 0, \dots, T-1, \]
and
\[ R_t(\omega) = \operatorname{conv}(S_t(\omega)(1+k),\ Z_t(\omega),\ S_t(\omega)(1-k)), \quad t = 1, \dots, T. \]
Moreover, if $X^b$ is the contingent claim in the corresponding binomial model with $d = u^{-1}$, then $p(X) = p(X^b)$. Obviously, any portfolio process super-replicating $X$ starting from $Z_0$ is optimal.

We next consider contingent claims satisfying condition DS, and omit the proofs of the first two lemmas.

Lemma 4.10. For two wedges $(l_1, z_1, r_1)$ and $(l_2, z_2, r_2)$, if $l_1 > r_1 > l_2 > r_2$ and $l_2 < |z_1 z_2| < r_1$, then
\[ \operatorname{conv}(l_1, z_1, r_1) \cap \operatorname{conv}(l_2, z_2, r_2) = \operatorname{conv}(l_1, z_1, r_1, w, l_2, z_2, r_2) \]
for some $w \in (z_1, r_1) \cap (l_2, z_2)$.

Lemma 4.11. For three wedges $(l_i, z_i, r_i)$, $i = 1, 2, 3$, if $l_1 > r_1 > l_2 > r_2 > l_3 > r_3$ and $l_2 < |z_1 z_2| < r_1$, $l_3 < |z_2 z_3| < r_2$, then
\[ \bigcap_{i=1}^{3} \operatorname{conv}(l_i, z_i, r_i) = \operatorname{conv}(l_1, z_1, r_1, w_1, l_2, z_2, r_2, w_2, l_3, z_3, r_3) \]
for some $w_1 \in (z_1, r_1) \cap (l_2, z_2)$ and $w_2 \in (z_2, r_2) \cap (l_3, z_3)$.

Theorem 4.12. For a contingent claim $X = X(S_T)$ satisfying condition DS in a trinomial model with $u > \frac{1+k}{1-k}$, there are two adapted processes $Z_t^1$ and $Z_t^2$ such that
\[ H^t(\omega) = \operatorname{conv}\big(S_t(\omega)u(1+k),\ X(S_t(\omega)u),\ S_t(\omega)u(1-k),\ Z_t^1(\omega),\ S_t(\omega)(1+k),\ X(S_t(\omega)),\ S_t(\omega)(1-k),\ Z_t^2(\omega),\ S_t(\omega)u^{-1}(1+k),\ X(S_t(\omega)u^{-1}),\ S_t(\omega)u^{-1}(1-k)\big), \quad 0 \le t \le T-1, \tag{4.5} \]
and
\[ R_t(\omega) = \operatorname{conv}(S_t(\omega)(1+k),\ X(S_t(\omega)),\ S_t(\omega)(1-k)), \quad 1 \le t \le T, \tag{4.6} \]


where
\[ Z_t^1(\omega) \in (X(S_t(\omega)u),\ S_t(\omega)u(1-k)) \cap (S_t(\omega)(1+k),\ X(S_t(\omega))) \]
and
\[ Z_t^2(\omega) \in (X(S_t(\omega)),\ S_t(\omega)(1-k)) \cap (S_t(\omega)u^{-1}(1+k),\ X(S_t(\omega)u^{-1})). \tag{4.7} \]

Moreover, $X(S_0)$ determines the price of $X$, where $S_0$ is the initial price.

Proof. Let $\omega_{-1}$, $\omega_0$ and $\omega_1 \in \Omega$ be such that $\omega_{-1}(T-1) = \omega_0(T-1) = \omega_1(T-1) = \omega(T-1)$ and $S_T(\omega_1) = S_{T-1}(\omega)u$, $S_T(\omega_0) = S_{T-1}(\omega)$, and $S_T(\omega_{-1}) = S_{T-1}(\omega)u^{-1}$. By definition,
\[ H^{T-1}(\omega) = \bigcap_{i=-1,0,1} \operatorname{conv}(S_{T-1}(\omega)u^i(1+k),\ X(S_T(\omega_i)),\ S_{T-1}(\omega)u^i(1-k)) \]
\[ = \operatorname{conv}\big(S_{T-1}(\omega)u(1+k),\ X(S_T(\omega_1)),\ S_{T-1}(\omega)u(1-k),\ Z_{T-1}^1(\omega),\ S_{T-1}(\omega)(1+k),\ X(S_T(\omega_0)),\ S_{T-1}(\omega)(1-k),\ Z_{T-1}^2(\omega),\ S_{T-1}(\omega)u^{-1}(1+k),\ X(S_T(\omega_{-1})),\ S_{T-1}(\omega)u^{-1}(1-k)\big). \]
The last equality follows from Lemma 4.11. (Condition DS implies that Lemma 4.11 applies here.) Also,
\[ Z_{T-1}^1(\omega) \in (X(S_T(\omega_1)),\ S_{T-1}(\omega)u(1-k)) \cap (S_{T-1}(\omega)(1+k),\ X(S_T(\omega_0))) \]
and
\[ Z_{T-1}^2(\omega) \in (X(S_T(\omega_0)),\ S_{T-1}(\omega)(1-k)) \cap (S_{T-1}(\omega)u^{-1}(1+k),\ X(S_T(\omega_{-1}))). \]

By Remark 2.10, $R^{T-1}(\omega) = \operatorname{conv}(S_{T-1}(\omega)(1+k),\ X(S_{T-1}(\omega)),\ S_{T-1}(\omega)(1-k))$. Since $\{X(S_{T-1})\}$ again satisfies condition DS, an induction thus concludes (4.5), (4.6) and (4.7). It follows from Remark 2.11 that $X(S_0)$ determines the price of $X$. $\square$

Corollary 4.13. Let $X$ be a short European call option in a trinomial model with $u \ge \frac{1+k}{1-k}$. If no terminal price $S_T$ falls in the interval $(\frac{K}{1+k}, \frac{K}{1-k})$, where $K$ is the strike price, then $X$ satisfies condition DS.

Proof. All we need to check is
\[ S_{T-1}(1+k) < |X(S_{T-1}u)X(S_{T-1})| < S_{T-1}u(1-k) \tag{4.8} \]
when $S_{T-1} < K < S_{T-1}u$. But $|X(S_{T-1}u)X(S_{T-1})| = K$, thus (4.8) holds if and only if $S_{T-1} < \frac{K}{1+k}$ and $\frac{K}{1-k} < S_{T-1}u$. $\square$

Remark 4.14. The condition imposed on $K$ in Corollary 4.13 is similar to that in binomial models for short European call options in [2, p. 278]. But the optimal super-replicating portfolio process is not unique in trinomial models.
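The condition of Corollary 4.13 is easy to verify mechanically. The following sketch is our illustration, not part of the paper: the function name, parameter values and the representation of the terminal prices as $S_0 u^i$, $i = -T, \dots, T$, are all assumptions made for the example.

```python
def satisfies_ds_corollary(S0, u, k, K, T):
    """Sufficient condition of Corollary 4.13 for a short European call in
    the trinomial model: u >= (1+k)/(1-k), and no terminal price
    S_T = S0 * u**i lies in the open interval (K/(1+k), K/(1-k))."""
    if u < (1 + k) / (1 - k):
        return False
    lo, hi = K / (1 + k), K / (1 - k)
    terminal_prices = (S0 * u**i for i in range(-T, T + 1))
    return not any(lo < s < hi for s in terminal_prices)

# With a 1% proportional cost and u = 1.2, the strike K = 95 avoids the
# forbidden interval for S0 = 100, T = 2, while K = 100 coincides with a
# terminal price and therefore fails the condition.
print(satisfies_ds_corollary(100.0, 1.2, 0.01, 95.0, 2))   # True
print(satisfies_ds_corollary(100.0, 1.2, 0.01, 100.0, 2))  # False
```

The check mirrors the proof: the interval $(K/(1+k), K/(1-k))$ is exactly where (4.8) would fail at some node.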


References

[1] Bensaid, B., Lesne, J., Pages, H. and Scheinkman, H. (1992), Derivative Asset Pricing with Transaction Costs, Mathematical Finance 2, 63–86.
[2] Boyle, P. and Vorst, T. (1992), Option Pricing in Discrete Time with Transaction Costs, J. of Finance 47, 271–293.
[3] Cvitanić, J., Pham, H. and Touzi, N. (1999), A Closed-form Solution to the Problem of Super-replication under Transaction Costs, Finance and Stochastics 3, no. 1, 35–54.
[4] Cox, J.C., Ross, S.A. and Rubinstein, M. (1979), Option Pricing: A Simplified Approach, J. of Financial Economics 7, 229–263.
[5] Davis, M.H.A., Panas, V.J. and Zariphopoulou, T. (1992), European Options Pricing with Transaction Costs, SIAM J. of Control and Optimization 31, 470–498.
[6] Kocinski, M. (2001), Hedging in the CRR Model under Concave Transaction Costs, Demonstratio Math. 34, 497–512.
[7] Kocinski, M. (2004), Hedging of the European Option in Discrete Time under Proportional Transaction Costs, Math. Methods Oper. Research 59, 315–328.
[8] Koehl, P.E., Pham, H. and Touzi, N. (1999), Hedging in Discrete Time under Transaction Costs and Continuous-time Limit, J. Appl. Probab. 36, no. 1, 163–178.
[9] Kusuoka, S. (1995), Limit Theorem on Option Replication with Transaction Costs, Annals of Applied Probability 5, 198–221.
[10] Leland, H.E. (1985), Option Pricing and Replication with Transaction Costs, J. of Finance 40, 1283–1301.
[11] Rutkowski, M. (1998), Optimality of Replication in the CRR Model with Transaction Costs, Applicationes Mathematicae 25, 29–53.
[12] Stettner, L. (1997), Option Pricing in CRR Model with Proportional Transaction Costs: A Cone Transformation Approach, Applicationes Mathematicae 24, 475–514.

Tzuu-Shuh Chiang and Shuenn-Jyi Sheu
Institute of Mathematics
Academia Sinica
Taipei, Taiwan
e-mail: [email protected]
[email protected]

Progress in Probability, Vol. 65, 317–330 © 2011 Springer Basel AG

Completeness and Hedging in a Lévy Bond Market

José M. Corcuera

Abstract. In this paper we analyze the completeness problem in a bond market where the short rate is driven by a non-homogeneous Lévy process. Even though it is known that under certain conditions we have a kind of uniqueness of the risk-neutral measure, little is known about how to hedge in this market. We show that perfect replication formulas are not, in general, possible to obtain, and that approximate hedging, in an $L^2$ sense, is then the appropriate approach.

Mathematics Subject Classification (2000). 60H30, 91B28, 60G51.

Keywords. Bond market, term structures, arbitrage, market completeness, Lévy processes.

1. Introduction

Fixed income markets can be described as markets where agents trade contracts in which deterministic flows of money are prescribed at certain times in the future. The problem is how to price these contracts and how to hedge the associated risk. The risk is due to the fact that the value of money changes with time: one unit now is, normally, more valuable than one unit tomorrow. We can describe this by a money account process
\[ B_T = \exp\left( \int_0^T r(t)\,dt \right), \quad T \ge 0, \]
which indicates that one unit today ($t = 0$) is equivalent, that is, has the same value, to $B_T$ units at time $T$. This evolution is governed by $r(t)$, $0 \le t \le T$, the interest rate, a random process, usually positive, defined on a stochastic basis $(\Omega, \mathcal{F}, P, (\mathcal{F}_t))$. Unfortunately, since $r$ is random, we do not know the price of money in the future. Even though we might have a perfect model for $r(t)$,

This work is supported by the MCI Grant MTM2009-08218.


$0 \le t \le T$, it is not obvious how to price, at time 0, one unit at time $T$. One could think that this price is $1/B_T$, but its value is not observed at time 0. So, having a model for $r(t)$, $0 \le t \le T$, is not sufficient to price the so-called $T$-zero coupon bond, a contract which guarantees the holder one unit at time $T$. Let $P(t,T)$ denote a possible price of a $T$-zero coupon bond at time $t$. Arbitrage-free prices can be obtained by setting
\[ P(t,T) := E_Q\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right], \quad T \ge t, \]
where $Q$ is any probability measure equivalent to $P$. Fixing the real evolution of the short interest rates is not enough to explain the prices of the $T$-zero coupon bonds; but if, additionally, we choose a risk-neutral measure $Q$, we have a model for the prices of $T$-zero coupon bonds for every $T \ge 0$. Then the question is whether, by doing this, we are fixing the prices of the rest of the contracts, or derivatives, in this market, and whether we can hedge against any risk. This is the completeness problem. Note that if we have another contract with payoff $X$ at $\bar T$ and we take $\tilde Q$ equivalent to $P$, then $E_{\tilde Q}(X \mid \mathcal{F}_t)$ would give an arbitrage-free price, so in order to ensure that the prices $P(t,T)$, $T \ge t$, match the rest of the prices we need that
\[ E_{\tilde Q}\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right] = E_Q\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right], \quad 0 \le t \le T \wedge \bar T, \]
implies that $Q = \tilde Q$ on $\mathcal{F}_{\bar T}$. This can be read as the uniqueness of the risk-neutral (or martingale) measure and, as we shall see later, it implies that we can hedge approximately, in an $L^2$ sense, against any risk by using self-financing portfolios of bonds with different maturities. Another way of modelling the market, with horizon $\bar T$, is to give a dynamics for $P(t,T)$, $0 \le t \le T \wedge \bar T$, or equivalently for the forward rates $f(t,T)$, defined as

\[ f(t,T) = -\frac{\partial \log P(t,T)}{\partial T}, \quad 0 \le t \le T \wedge \bar T, \]
and to check whether the model is arbitrage free and complete. If there is $Q$ such that the discounted bond prices, which we denote $\tilde P(t,T)$, are $Q$-martingales with respect to the filtration $(\mathcal{F}_t)$, we will have that
\[ P(t,T) = E_Q\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right], \quad 0 \le t \le T \wedge \bar T, \]
and if $Q$ is equivalent to $P$ the model will be free of arbitrage with respect to the information $(\mathcal{F}_t)$. Moreover, if this $Q$ is unique, we will have that
\[ E_Q\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right] = E_{\tilde Q}\left[ \frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \right], \quad 0 \le t \le T \wedge \bar T, \]
implies that $Q = \tilde Q$ on $\mathcal{F}_{\bar T}$, and we will be able to replicate in an $L^2$ sense. So both approaches, once we have a martingale measure $Q$, are equivalent. In this paper we follow the first approach by skipping, by construction, the existence of $Q$ and by taking an appropriate risk-neutral measure in such a way that


we can have an approximately complete model with jumps. The paper is organized as follows: in the second section we define the model and study its basic properties. In the third section we make precise the notion of (perfect) completeness and analyze when the model is perfectly complete. The last section is devoted to the study of the approximate, $L^2$ or quasi-completeness of the model.
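Before turning to the model, the basic pricing rule $P(0,T) = E_Q[1/B_T]$ can be checked numerically. The sketch below is ours, not from the paper: it takes the toy case $r(t) = \mu + \gamma W_t$ with a Brownian driver and constant coefficients (an assumption for illustration), for which $\int_0^T r(t)\,dt$ is Gaussian with mean $\mu T$ and variance $\gamma^2 T^3/3$, so the bond price has the closed form $\exp(-\mu T + \gamma^2 T^3/6)$.

```python
import numpy as np

# Toy special case (our assumption): r(t) = mu + gamma * W_t under Q.
# Then I = int_0^T W_t dt ~ N(0, T^3/3), so
#   P(0,T) = E_Q[exp(-int_0^T r(t) dt)] = exp(-mu*T + gamma**2 * T**3 / 6).
rng = np.random.default_rng(0)
mu, gamma, T, n_paths = 0.03, 0.01, 1.0, 200_000

I = np.sqrt(T**3 / 3) * rng.standard_normal(n_paths)  # exact law of int W dt
price_mc = np.exp(-(mu * T + gamma * I)).mean()
price_closed = np.exp(-mu * T + gamma**2 * T**3 / 6)

print(price_mc, price_closed)  # the two agree to several decimal places
```

Sampling $\int_0^T W_t\,dt$ directly from its Gaussian law avoids any time-discretization bias in the comparison.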

2. The model

We assume that only the interest rate is fixed exogenously and under the historical probability $P$. There is no other primary asset in the market. So, any equivalent measure can serve to price derivatives, and a zero coupon bond is a derivative whose underlying is the bank account. Then we have a bank account process $(B_t)_{t \ge 0}$ that evolves as
\[ B_t = \exp\left( \int_0^t r(s)\,ds \right), \]
where $r(t)$ is the so-called short rate. We consider a dynamics of the form
\[ r(t) = \mu(t) + \int_0^t \gamma(s,t)\,dL_s, \tag{1} \]

where $L$ is a non-homogeneous Lévy process, i.e., a process with independent increments and absolutely continuous characteristics $(\sigma_s^2, \upsilon_s, a_s)$:
\[ L_t = \int_0^t a_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t \int_{|x| \le 1} x\,\big( J(dx,ds) - \upsilon_s(dx)\,ds \big) + \sum_{s \le t} \Delta L_s 1_{\{|\Delta L_s| > 1\}} \]
\[ = \int_0^t a_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t \int_{|x| \le 1} x\,d\tilde M_s^P + \sum_{s \le t} \Delta L_s 1_{\{|\Delta L_s| > 1\}}, \]
where $\tilde M^P$ denotes the compensated jump measure of $L$ under $P$. The coefficients $\mu(t)$ and $\gamma(s,t)$ are assumed to be deterministic and càdlàg. We also assume that the filtrations generated by $L$ and by $r$ are the same. Fix $T > 0$ and consider a $T$-zero-coupon bond. This is a contract that guarantees the holder 1 monetary unit at time $T$. Write $P(t,T)$, $0 \le t \le T$, for the price of this contract. We know that $P(T,T) = 1$; take $Q$ equivalent to $P$ (the historical probability) and structure preserving (with respect to $L$), and define
\[ \tilde P(t,T) = E_Q\left[ \exp\left( -\int_0^T r(s)\,ds \right) \Big|\, \mathcal{F}_t \right]. \]
This discounted price, whatever $Q$, equivalent to $P$, we choose, does not produce arbitrage.
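The short-rate dynamics (1) are straightforward to simulate. The following sketch is our illustration, not from the paper: the compound-Poisson choice of $L$, the constant $\gamma(s,t) \equiv \gamma$, and all parameter values are assumptions. With standard normal jumps and intensity $\lambda$, $E[L_t] = 0$ and $\mathrm{Var}(L_t) = \lambda t$, so $E[r(t)] = \mu$ and $\mathrm{Var}(r(t)) = \gamma^2 \lambda t$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, gamma, lam, t, n_paths = 0.03, 0.5, 4.0, 1.0, 100_000

# L_t: compound Poisson with intensity lam and standard normal jumps,
# so E[L_t] = 0 and Var(L_t) = lam * t.  With gamma(s,t) = gamma constant,
# r(t) = mu + gamma * L_t, hence E[r(t)] = mu, Var(r(t)) = gamma^2 * lam * t.
counts = rng.poisson(lam * t, size=n_paths)
L_t = np.array([rng.standard_normal(c).sum() for c in counts])
r_t = mu + gamma * L_t

print(r_t.mean(), r_t.var())  # approximately 0.03 and 1.0
```

The simulation makes visible that, unlike in Gaussian models, the short rate here moves by discrete jumps between Poisson event times.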


From here we have that
\[ \tilde P(t,T) = E_Q\left[ \exp\left( -\int_0^T r(s)\,ds \right) \Big|\, \mathcal{F}_t \right] \]
\[ = \exp\left( -\int_0^T \mu(s)\,ds \right) E_Q\left[ \exp\left( -\int_0^T \left( \int_u^T \gamma(u,s)\,ds \right) dL_u \right) \Big|\, \mathcal{F}_t \right] \]
\[ = \exp\left( -\int_0^T \mu(s)\,ds \right) E_Q\left[ \exp\left( \int_0^T \Gamma(T)_u\,dL_u \right) \Big|\, \mathcal{F}_t \right], \]
where
\[ \Gamma(T)_t = -\int_t^T \gamma(t,s)\,ds. \]
Under $Q$, $\int_0^{\cdot} \Gamma(T)_u\,dL_u$ is still a process with independent increments, so
\[ E_Q\left[ \exp\left( \int_0^T \Gamma(T)_u\,dL_u \right) \Big|\, \mathcal{F}_t \right] = \exp\left( \int_0^t \Gamma(T)_u\,dL_u \right) E_Q\left[ \exp\left( \int_t^T \Gamma(T)_u\,dL_u \right) \right], \]
therefore
\[ \tilde P(t,T) \propto \exp\left( \int_0^t \Gamma(T)_u\,dL_u \right) \]
and, since $(\tilde P(t,T))_{0 \le t \le T}$ is, by construction, a $Q$-martingale, we can write
\[ \tilde P(t,T) = P(0,T)\, \frac{\exp\left( \int_0^t \Gamma(T)_u\,dL_u \right)}{E_Q\left[ \exp\left( \int_0^t \Gamma(T)_u\,dL_u \right) \right]}. \tag{2} \]
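As a worked special case (added here for illustration; it is not in the original text), take $L$ to be a standard $Q$-Brownian motion $W$. Then the normalizing expectation in (2) is the exponential moment of a Wiener integral:

```latex
% Worked special case (ours): L = W, a standard Q-Brownian motion.
E_Q\!\left[\exp\left(\int_0^t \Gamma(T)_u \, dW_u\right)\right]
  = \exp\left(\frac{1}{2}\int_0^t \Gamma(T)_u^2 \, du\right),
% so (2) reduces to the familiar Gaussian (HJM-type) bond price
\tilde P(t,T) = P(0,T)\,
  \exp\left(\int_0^t \Gamma(T)_u \, dW_u - \frac{1}{2}\int_0^t \Gamma(T)_u^2 \, du\right).
```

In this Gaussian case the discounted bond price is an exponential martingale, which is the classical continuous benchmark against which the jump case below should be compared.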

2.1. The forward rates

The previous dynamics of $\tilde P(t,T)$, for all maturities $T > t$, can be described in terms of the instantaneous forward rates by using the definition
\[ \tilde P(t,T) = \exp\left( -\int_0^t r(s)\,ds - \int_t^T f(t,s)\,ds \right). \]
Then
\[ f(t,T) = -\partial_T \log \tilde P(t,T) = f(0,T) + A(T)_t + \int_0^t \gamma(s,T)\,dL_s, \]

where
\[ A(T)_t = \frac{\partial_T\, E_Q\left[ \exp\left( \int_0^t \Gamma(T)_u\,dL_u \right) \right]}{E_Q\left[ \exp\left( \int_0^t \Gamma(T)_u\,dL_u \right) \right]}. \]
In Eberlein, Jacod and Raible (2005), the authors assume that the forward rates are given and then study the arbitrage and completeness problems. In fact, they start by assuming that (under $P$)
\[ f(t,T) = f(0,T) + \int_0^t \alpha(s,T)\,ds + \int_0^t \gamma(s,T)\,dL_s, \]
and they look for martingale measures. They obtain that their existence implies certain constraints on the process $\int_0^t \alpha(s,T)\,ds$. This was already observed by Heath, Jarrow and Morton in 1992 in the case when $L$ is continuous. The constraint for the existence of "structure preserving" martingale measures is then
\[ \int_0^t \alpha(s,T)\,ds = A(T)_t, \]
for some structure-preserving measure $Q$. We have assumed that $Q$ is a given structure-preserving equivalent measure; we construct the bond market with this $Q$ and deduce the form of the forward rates which are compatible with these prices. Then, by construction, the market is arbitrage free. So the relevant problem is the completeness of the market. We take $Q$ and we check whether this market is complete by trying to show that any ($Q$-square integrable) contingent claim can be replicated by a self-financing portfolio.
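Numerically, the definition $f(t,T) = -\partial_T \log P(t,T)$ can be applied to a grid of bond prices by finite differences. The sketch below is ours; the flat curve $P(0,T) = e^{-rT}$ is only an assumed test input for which every forward rate must equal $r$.

```python
import numpy as np

# Recover forward rates f(0,T) = -d/dT log P(0,T) from discrete bond prices.
maturities = np.linspace(0.5, 5.0, 10)
r = 0.02
prices = np.exp(-r * maturities)      # flat short rate: P(0,T) = e^{-rT}

forwards = -np.diff(np.log(prices)) / np.diff(maturities)
print(forwards)  # every entry equals 0.02 for a flat curve
```

With market data one would replace the synthetic `prices` array by observed zero-coupon prices; the finite difference then approximates the forward curve between adjacent maturities.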

3. Completeness of the market

According to Eberlein, Jacod and Raible (2005), completeness is important for two reasons:
• Completeness amounts to the fact that any claim can be priced in a unique way: by taking the expectation of the discounted value of the claim with respect to the (unique) equivalent martingale measure. This means, in our context, that if we choose different structure-preserving equivalent measures $Q$, we obtain different bond prices.
• Completeness is also related with hedging and is in fact "equivalent" to the property that any square-integrable (discounted) contingent claim $Y$ can be written as
\[ Y = E(Y) + \sum_{T_j \in J} \int_0^T H_s^j\,d\tilde P(s,T_j), \tag{3} \]

and this is equivalent to the martingale representation property
\[ E(Y \mid \mathcal{F}_t) = E(Y) + \sum_{T_j \in J} \int_0^t H_s^j\,d\tilde P(s,T_j). \]

Instead of checking whether a system of bond prices implies a unique $Q$, we can use martingale representation theorems to study whether and how we can replicate any contingent claim. We have seen that $\tilde P(t,T) = P(0,T)\exp\{\bar Z_t\}$, where $(\bar Z_t)_{0 \le t}$ … $T$, and we want to complete the market with bonds with maturity times $T^* > \bar T$. Then the question is whether these bonds are sufficient to complete the market. First we see that any measurable function $h$ can be approximated by linear combinations of $P(t,T^*)$ with $T^* > \bar T$. In fact, if $E_Q[h(P(t,T))P(t,T^*)] = 0$ for all $T^* > T$, we have that $h(P(t,T)) = 0$ a.s. since, by (9),
\[ E_Q[h(P(t,T))P(t,T^*)] = E_Q\left[ g\left( \int_0^t b(s)\,dL_s \right) \exp\left( \lambda \int_0^t b(s)\,dL_s \right) \right] = 0, \]


with $g(x) = h(Ce^{(a(T)-a(t))x})$, for all $\lambda = a(T^*) - a(t)$. Then, in particular, this implies that
\[ P(t,T)^k = \sum_{j \in J} \lambda_j^k(t)\,P(t,T_j^k(t)) \]
for a certain countable set of maturity times $\{T_j^k(t),\ j \in J\}$ in $(\bar T, \infty)$, and
\[ \varphi^{(k)}\big(t, \bar T, T\big)\, \tilde P(t,T)^k = \varphi^{(k)}\big(t, \bar T, T\big)\, \frac{P(t,T)^k}{B_t^k} = \varphi^{(k)}\big(t, \bar T, T\big)\, B_t^{1-k} \sum_{j \in J} \lambda_j^k(t)\,\tilde P(t,T_j^k(t)). \]

Consequently the result follows as in the previous subsection.

Remark. Note that if, in the expression
\[ dM_t = \gamma_t\,d\bar W_t + \int_{\mathbb{R}} \varphi(x,t)\,d\tilde M_t^Q, \]
$\varphi(\cdot,t)$ is analytic for fixed $t$, then we can write a.s. that
\[ dM_t = \sum_{k \ge 1} a_k\,d\tilde Y_t^{(k)}, \]
and the corresponding contingent claim will be replicated in a perfect (a.s.) sense. This was already observed, by using different arguments, in Björk, Kabanov and Runggaldier (1997).

References

[1] Björk, T., Kabanov, Y. and Runggaldier, W. (1997) Bond market structure in the presence of marked point processes, Math. Finance 7(2), 211–239.
[2] Corcuera, J.M., Guerra (2010) Dynamic complex hedging in additive markets, Quantitative Finance 10(9), 1023–1037.
[3] Corcuera, J.M., Nualart, D. and Schoutens, W. (2005) Completion of a Lévy Market by Power-Jump-Assets, Finance and Stochastics 9(1), 109–127.
[4] Eberlein, E., Jacod, J. and Raible, S. (2005) Lévy term structure models: No-arbitrage and completeness, Finance and Stochastics 9, 67–88.
[5] Heath, D., Jarrow, R. and Morton, A. (1992) Bond Pricing and the Term Structure of Interest Rates, Econometrica 60, 77–106.

José M. Corcuera
Universitat de Barcelona
Gran Via de les Corts Catalanes 585
E-08007 Barcelona, Spain
e-mail: [email protected]

Progress in Probability, Vol. 65, 331–346 © 2011 Springer Basel AG

Asymptotically Efficient Discrete Hedging

Masaaki Fukasawa

Abstract. The notion of asymptotic efficiency for discrete hedging is introduced, and a discretizing strategy which is asymptotically efficient is given explicitly. A lower bound for the asymptotic risk of discrete hedging is given, which is attained by a simple discretization scheme. Numerical results for delta hedging in the Black-Scholes model are also presented.

Mathematics Subject Classification (2000). 60F25, 60H05.

Keywords. Asymptotic efficiency, discrete hedging, stopping time, Riemann sum.

1. Introduction

This article considers hedging a derivative by dynamically rebalancing a portfolio of the underlying asset of the derivative and a risk-free asset, under the realistic condition that rebalancing is limited to occur discretely. Under this condition, a future payoff cannot be hedged in general, even in the Black-Scholes model or other complete market models. In incomplete market models, hedge-error is decomposed into two parts: one due to the incompleteness of the market and the other due to the fact that we cannot rebalance a portfolio continuously. We deal with the difference between what can be hedged by continuous rebalancing and what can be realized by discrete rebalancing. Therefore, we are not concerned with portfolio selection in the usual sense but with the selection of discretization schemes for a given continuous-time portfolio strategy. We present a lower bound for the asymptotic mean squared error. We introduce the notion of asymptotic efficiency for discrete hedging and construct a discretization scheme which is efficient in that it attains the lower bound asymptotically. This problem reduces to analyzing the discretization error of stochastic integrals. Here, the rebalancing times correspond to a partition for the Riemann sum associated with a stochastic integral. Naturally, we suppose that the rebalancing times are increasing stopping times. In the case that the times are equidistant, Rootzén (1980) gave the convergence rate and the asymptotic distribution of the discretization error. Bertsimas, Kogan and Lo (2000), Hayashi and Mykland (2005)


treated the same problem in the context of hedge-error. Tankov and Voltchkova (2009) extended the results to discontinuous semimartingales. Gobet and Temam (2001) and Geiss (2002), among others, gave asymptotic estimates in the $L^p$ sense under the Black-Scholes model. However, from a practical viewpoint, it seems unnatural to suppose that the rebalancing times are deterministic. They should be determined adaptively to, for example, the number of shares currently held, the number of shares that should be held according to a given continuous-time strategy, and the current price of the underlying asset. Nevertheless, it remains an open problem to construct an optimal strategy of rebalancing times. Martini and Patry (1999) considered the minimization of the $L^2$ hedge-error under the risk-neutral measure in the Black-Scholes model. They analyzed the optimal stopping times given a fixed number of transactions: their existence, uniqueness and numerical construction. We take a different approach to the discrete hedging problem, taking the following aspects into consideration: 1) there is no need to fix the number of transactions in practice as far as it is finite, 2) it is not so clear which probability measure we should take in the $L^2$ minimization problem, and 3) an explicit construction of a discretization scheme is preferable even if it is only an approximation of the optimal solution. We develop an asymptotic theory to tackle the discrete hedging problem. A rigorous formulation is given in Section 2. The notion of asymptotic efficiency for discrete hedging is introduced and a discretizing strategy which is asymptotically efficient is constructed in Section 3. Numerical results for delta hedging in the Black-Scholes model are also presented in Section 4.

2. Formulation

2.1. Notation and definitions

Here we give a rigorous formulation of our problem. For the sake of brevity, we suppose that the risk-free rate is always 0 throughout this article. Suppose that an asset price process $Y$ is a continuous semimartingale on a stochastic basis $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$. Denote by $E$ the expectation with respect to the measure $P$. If we write $G \cdot F$ for adapted processes $G$ and $F$, it stands for the stochastic integral or the Stieltjes integral of $G$ with respect to $F$. Fix a stopping time $T$ which stands for the maturity of a derivative. Since we are interested in the discretization of the hedging strategy, we suppose that a hedging strategy $X$ defined on $[0,T]$ is given and that the stochastic integral
\[ X \cdot Y = \int_0^T X_s\,dY_s \tag{1} \]

is what we should replicate. Due to our restriction on rebalancing, what we can realize is of the form
\[ X^n \cdot Y = \int_0^T X_s^n\,dY_s = \sum_{j=0}^{\infty} \bar X_j^n \big( Y_{\tau_{j+1}^n \wedge T} - Y_{\tau_j^n \wedge T} \big), \tag{2} \]


where we define $X^n$ as
\[ X_s^n = \bar X_j^n, \quad s \in [\tau_j^n \wedge T, \tau_{j+1}^n \wedge T), \]
with random variables $\bar X_j^n$ which are $\mathcal{F}_{\tau_j^n}$-measurable for each $j \ge 0$. Here, the $\tau_j^n$ are stopping times such that
\[ 0 = \tau_0^n < \tau_1^n < \cdots < \tau_j^n < \cdots \quad \text{a.s.,} \]

and such that, for any stopping time $\tau$ with $\tau < T$, it holds that
\[ N_\tau^n := \max\{ j \ge 0;\ \tau_j^n \le \tau \} < \infty \quad \text{a.s.} \tag{3} \]

Note that $\tau_j^n$ stands for the $j$th rebalancing time and $N_\tau^n$ is the number of transactions up to time $\tau$. The problem treated here is that even if $X_s$ is given explicitly for any $s \in [0,T]$, we can only rebalance our portfolio discretely in time. For example, in the Black-Scholes world
\[ dY_t = \mu Y_t\,dt + \sigma Y_t\,dW_t, \tag{4} \]

the future payoff $f(Y_T)$ can be hedged by continuous rebalancing as
\[ f(Y_T) = P_f(0, Y_0) + \int_0^T X_s\,dY_s \]

with

\[ X_t = \partial_y P_f(t, Y_t), \qquad P_f(t,y) = \int f\big( y \exp\big( -\sigma^2 (T-t)/2 + \sigma \sqrt{T-t}\,z \big) \big)\,\phi(z)\,dz, \]
where




\[ \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right). \]
Although the hedging strategy $X$ is given explicitly here, we can only have the Riemann sum (2) in practice. The stochastic part of the corresponding hedge-error is then given by
\[ Z^n = X \cdot Y - X^n \cdot Y. \]
Our aim here is to estimate $Z^n$ and to construct a sequence $(\tau_j^n, \bar X_j^n)$ which is optimal in some sense. The simplest strategy (of stopping times) is the equidistant one: $\tau_j^n = j h_n$, $j = 0, 1, \dots$, for a positive constant $h_n > 0$. We can consider, for example,
\[ \tau_0^n = 0, \qquad \tau_{j+1}^n = \inf\big\{ t > \tau_j^n;\ |X_t - X_{\tau_j^n}|^2 = h_n \big\} \]

on the other hand, which was studied in Karandikar (1995). We want to choose a strategy among these many candidates. Here we explain the role of the artificial perturbation variable $n$. By the nature of the problem, the durations between transactions $\tau_{j+1}^n - \tau_j^n$ can be considered sufficiently small relative to $[0,T]$. For example, if the maturity is one year ($T = 1$) and the portfolio is rebalanced approximately every weekday, then $\tau_{j+1}^n - \tau_j^n \approx 1/250$. Therefore, dealing with a sequence $\tau^n = \{\tau_j^n\}_j$ with $\tau_{j+1}^n - \tau_j^n \to 0$ as


$n \to \infty$, we can expect that the distribution of $Z^n$ is approximated well by its limit as $n \to \infty$. This asymptotic argument enables us to derive a simple lower bound of risk which is attained by a simple strategy, as we will see in the next section. The precise description of the asymptotic condition is given later. We call the sequence $\{\pi^n\}_n$ with $\pi^n = \{(\tau_j^n, \bar X_j^n)\}_j$ a discrete hedging strategy. The simplest and most natural choice is to take $\bar X_j^n = X_{\tau_j^n}$ for each $j \ge 0$. Put



\[ \hat Z^n = X \cdot Y - \hat X^n \cdot Y \qquad \text{with} \qquad \hat X_s^n = X_{\tau_j^n}, \quad s \in [\tau_j^n \wedge T, \tau_{j+1}^n \wedge T). \]

We say $X^n$ is natural, and $\pi^n$ is natural, if $X^n = \hat X^n$. Note that there is no need to take such a special form of $X^n$ from a practical point of view. Nevertheless, we will see that our lower bound of asymptotic risk is attained by a natural strategy.

2.2. Discrete hedging risk

Here we shall introduce a risk criterion in terms of the stochastic process $Z^n$. At the maturity $T$, the hedge-error is $Z_T^n$; therefore the simplest criterion would be $E[|Z_T^n|^2]$. It is, however, not clear which measure $E$ we should take. Martini and Patry (1999) and many preceding studies on hedging in incomplete markets took an equivalent martingale measure as $E$. If that is the case, $Z^n = (X - X^n) \cdot Y$ is a local martingale, so that $E[|Z_T^n|^2] = E[\langle Z^n \rangle_T]$ provided that $\langle Z^n \rangle_T$ is integrable. Notice that $\langle Z^n \rangle$ can be interpreted as the integrated variance of $Z^n$, so that it also serves as a risk criterion. Taking this into consideration, we introduce
\[ E[\langle Z^n \rangle_T] \tag{5} \]

as our risk criterion. We stress that $E$ is not necessarily an equivalent martingale measure here. We will see later that an asymptotically efficient strategy with respect to the risk defined by (5) does not depend explicitly on $E$. In other words, we do not need to estimate the drift parameter under $E$. This is important because statistical estimation of the drift parameter under the physical measure is not stable. Now, we fix our stochastic basis $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ and introduce several definitions.

Definition 1. An adapted process $\varphi$ is said to be locally bounded on $[0,T)$ if there exists a sequence of stopping times $\sigma^m$ with $\sigma^m < T$, $\sigma^m \to T$ a.s. such that $\varphi_{\sigma^m \wedge \cdot}$ is bounded.

Definition 2. An adapted process $M$ is said to be a local martingale on $[0,T)$ if there exists a sequence of stopping times $\sigma^m$ with $\sigma^m < T$, $\sigma^m \to T$ a.s. such that $M_{\sigma^m \wedge \cdot}$ is a bounded martingale.


Definition 3. An adapted process $\varphi$ is said to be $F$-continuous for an increasing continuous adapted process $F$ if $\varphi$ is continuous and, for any stopping times $\tau_1, \tau_2$ with $\tau_1 < \tau_2$ and $F_{\tau_1} = F_{\tau_2}$, $\varphi$ is constant on the interval $[\tau_1, \tau_2]$.

Let us assume the following structure condition:

Condition S. $Y$ is a one-dimensional continuous semimartingale and there exist
1. a continuous adapted process $M$ which is a local martingale on $[0,T)$,
2. an adapted process $\psi$ which is locally bounded on $[0,T)$,
3. an $M$-continuous adapted process $\kappa$ which is strictly positive on $[0,T)$,

such that it holds on [0, T ) that

\[ X = X_0 + \psi \cdot \langle M \rangle + M, \qquad \langle Y \rangle = \kappa^2 \cdot \langle X \rangle. \]
The above structure condition holds, for example, in the preceding Black-Scholes model (4) with
\[ M = (\sigma Y \Gamma) \cdot W, \qquad \kappa = \Gamma^{-1}, \qquad \Gamma_t = \partial_y^2 P_f(t, Y_t), \]

provided that the payoff $f$ is a convex function whose right derivative $f'$ is not a continuous singular function. Here we use the fact that $\partial_y^2 P_f$ is a strictly positive continuous function on $[0,T) \times (0,\infty)$ if the Stieltjes measure defined by the right-continuous increasing function $f'$ has a discrete or absolutely continuous component. A sufficient condition is that $f$ is a convex function which is piecewise $C^2$. See Section 4 for the details. Note that we allow $\Gamma_T = 0$ as well as $\Gamma_T = \infty$, so that the call and put payoffs are included. If $f$ is not convex, then $\Gamma$ can reach 0 in $[0,T)$, so the third condition may fail. It also remains for further research to include the case where $Y$ is multi-dimensional and discontinuous.
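The risk criterion (5) for the natural equidistant strategy can be illustrated by simulation. The sketch below is our illustration, not from the paper: it delta-hedges an at-the-money call in the Black-Scholes model (4) under the risk-neutral dynamics (zero drift, zero rate, both assumptions for simplicity) and estimates the mean squared terminal hedge-error $E[|Z_T^n|^2]$ for two rebalancing frequencies; all parameter values are assumed.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
S0, K, sigma, T = 100.0, 100.0, 0.2, 1.0

Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))  # normal cdf

def bs_delta(S, tau):
    d1 = (np.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    return Phi(d1)

def bs_price(S, tau):
    d1 = (np.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    return S * Phi(d1) - K * Phi(d1 - sigma * np.sqrt(tau))

def hedge_error(n_steps, n_paths=2000):
    """Terminal hedge-error Z_T^n for the natural equidistant strategy."""
    dt = T / n_steps
    S = np.full(n_paths, S0)
    value = np.full(n_paths, bs_price(S0, T))   # start from the BS price
    for i in range(n_steps):
        d = bs_delta(S, T - i * dt)
        Z = rng.standard_normal(n_paths)
        S_next = S * np.exp(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * Z)
        value += d * (S_next - S)               # discrete Riemann sum X^n . Y
        S = S_next
    return np.maximum(S - K, 0.0) - value       # payoff minus portfolio value

mse_coarse = np.mean(hedge_error(50) ** 2)
mse_fine = np.mean(hedge_error(400) ** 2)
print(mse_coarse, mse_fine)   # the risk shrinks roughly like 1/n
```

The observed decrease of the mean squared error with the number of rebalancing dates is exactly the trade-off against transaction count that the efficiency results of the next section quantify.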

3. Main results: asymptotic efficiency

The notion of asymptotic efficiency for discrete hedging and an asymptotically efficient strategy are given in this section. It should be noted first that our risk (5) converges to 0 as $n \to \infty$ if, for example,
\[ \sup_{j \ge 0} |\tau_{j+1}^n \wedge T - \tau_j^n \wedge T| \to 0, \qquad \sup_{0 \le j \le N_T^n} |\bar X_j^n - X_{\tau_j^n}| \to 0 \]


in probability, under suitable integrability conditions. The more frequently the portfolio is rebalanced, the less hedge-error results. To make the problem realistic, we introduce a cost for rebalancing. Let us consider minimizing
\[ C\big(E[N_\tau^n],\ E[\langle Z^n \rangle_\tau]\big) \]

for a given stopping time $\tau$ with $\tau \le T$ and for a given function $C : [0,\infty)^2 \to [0,\infty)$ which is increasing in both variables. Suppose for a while that we are given an inequality
\[ E[N_\tau^n]\,E[\langle Z^n \rangle_\tau] \ge K \]


for any discrete hedging strategy, where $K > 0$ is a constant depending on $\tau$. Then apparently it holds that
\[ C\big(E[N_\tau^n],\ E[\langle Z^n \rangle_\tau]\big) \ge C\big(E[N_\tau^n],\ K/E[N_\tau^n]\big). \]

Therefore, if there exists a discrete hedging strategy $\{\pi^n\}_n$ which attains the lower bound $K$ for each $n$ and $\tau$, and $E[N_\tau^n] \to \infty$ ($n \to \infty$), then the solution of the minimization problem is given by $\pi^n$ for a certain $n$ which minimizes $C(E[N_\tau^n], K/E[N_\tau^n])$. In the following, we realize this idea in an asymptotic sense. Let us introduce a class of discrete hedging strategies which seems sufficiently large. Denote by $\mathcal{T}$ the class of discrete hedging strategies $\{(\tau_j^n, \bar X_j^n)\}_{j,n}$ satisfying the following condition.

Condition T. There exists a sequence of stopping times $\sigma^m$ with $\sigma^m < T$, $\sigma^m \to T$ a.s. ($m \to \infty$) such that for each $m$,
• it holds that
\[ \sup_{j \ge 0} \big| \langle X \rangle_{\tau_{j+1}^n \wedge \sigma^m} - \langle X \rangle_{\tau_j^n \wedge \sigma^m} \big| \to 0 \tag{6} \]

in probability as $n \to \infty$;
• there exists a deterministic sequence $K_{m,n}$ with $K_{m,n} \to \infty$ ($n \to \infty$) such that
\[ E[N_{\sigma^m}^n]\, \sup_{j \ge 0} E\big[ \langle X \rangle_{\tau_{j+1}^n \wedge \sigma^m} - \langle X \rangle_{\tau_j^n \wedge \sigma^m} \,\big|\, \mathcal{F}_{\tau_j^n \wedge \sigma^m} \big] \tag{7} \]
and

\[ K_{m,n}\, \sup_{j \ge 0} \big| X_{\tau_j^n \wedge \sigma^m} - X_{\tau_j^n \wedge \sigma^m}^n \big| \]

are uniformly bounded in $n$;
• the sequences
\[ E[N_{\sigma^m}^n]\,\langle \hat Z^n \rangle_{\sigma^m}, \qquad E[N_{\sigma^m}^n]\,\langle Z^n \rangle_{\sigma^m} \tag{8} \]
are uniformly integrable in $n$.

Note that $\langle Z^n \rangle = (X - X^n)^2 \cdot \langle Y \rangle$. Condition T is satisfied if, for example, it holds that $dX_t = g_t\,dt$ for an adapted process $g$ such that both $g$ and $1/g$ are locally bounded on $[0,T)$, and that
\[ \sup_{j \ge 0} |\tau_{j+1}^n \wedge T - \tau_j^n \wedge T| \le a h_n, \qquad N_T^n \le a/h_n, \qquad \sup_{0 \le j \le N_T^n} |\bar X_j^n - X_{\tau_j^n}|^2 \le a h_n \quad \text{a.s.} \]

with a constant $a$ and a sequence $h_n \to 0$. In fact, (6) and (7) follow immediately. To see (8), notice that there exists a localizing sequence $\sigma^m$ of stopping times such that
\[ E[\langle Z^n \rangle_{\sigma^m}] \le C_m h_n + E[\langle \hat Z^n \rangle_{\sigma^m}] \le C_m h_n + C_m \int_0^T E\big[ (X_{t \wedge \sigma^m} - \hat X_{t \wedge \sigma^m}^n)^2 \big]\,dt \le C_m h_n, \]

where $C_m$ is a generic constant depending on $m$. The convergence (6) is weaker than the usual high-frequency assumption
\[ \sup_{j \ge 0} |\tau_{j+1}^n \wedge T - \tau_j^n \wedge T| \to 0 \]


in probability. In such an asymptotic situation, the natural Riemann sum $\hat X^n \cdot Y$ converges to the stochastic integral $X \cdot Y$. It is therefore not a serious restriction to suppose $|\bar X_j^n - X_{\tau_j^n}| \to 0$ as in (7). The first quantities of (7) and (8) are related to

each other, and their uniform properties serve as a condition on the regularity of the sequence of stopping times $\tau_j^n$. The last quantity of (8) actually controls the difference between $X^n$ and $\hat X^n$. Besides, the integrability condition on $\langle Z^n \rangle$ is preferable in terms of minimizing the hedge-error.

Theorem A. For any $\{\pi^n\}_n \in \mathcal{T}$ and any stopping time $\tau$ with $\tau \le T$ a.s., it holds that
\[ \liminf_{n \to \infty} E[N_\tau^n]\,E[\langle Z^n \rangle_\tau] \ge \frac{1}{6}\, \big| E[(\kappa \cdot \langle X \rangle)_\tau] \big|^2. \]

Proof. There exists a sequence of stopping times $\sigma^m$ with $\sigma^m < T$, $\sigma^m \to T$ a.s. ($m \to \infty$) such that for each $m$, $M_{\sigma^m \wedge \cdot}$ is a bounded local martingale on $[0,\tau]$ and the adapted processes
\[ Y_{\sigma^m \wedge \cdot}, \quad X_{\sigma^m \wedge \cdot}, \quad \kappa_{\sigma^m \wedge \cdot}, \quad \frac{1}{\kappa_{\sigma^m \wedge \cdot}}, \quad \psi_{\sigma^m \wedge \cdot} \]

are bounded on [0, τ ]. Without loss of generality, we assume that the statements of Condition T hold with the same localizing sequence σ m . By monotonicity, it suffices to show for each m 1 lim inf E[Nσnm ∧τ ]E[Z n σm ∧τ ] ≥ |E[κ · Y σm ∧τ ]|2 . n→∞ 6 Hence, we can suppose without loss of generality that M itself is a bounded martingale on [0, τ ] and that Y , X, κ, 1/κ, ψ theirselves are bounded on [0, τ ] as well as that (6), (7), (8) are satisfied for σ m ≡ τ . Now, define uniformly bounded adapted processes κn and ψ n on [0, τ ] as n κns = κτjn , ψsn = ψτjn , s ∈ [τjn ∧ τ, τj+1 ∧ τ ).

By Itˆo’s formula, we have  t n Z t = (Xs − Xsn )2 dY s 0  t  t n 2 n 2 = (Xs − Xs ) |κs | dXs + (Xs − Xsn)2 (κ2s − |κns |2 )dXs 0

0

∞ " 1 2 ! n 4 n 4 n ∧t − X n n = κτjn (Xτj+1 ) − (X − X ) n τj ∧t τj ∧t τj ∧t 6 j=0  t  2 t n2 n 3 − |κ | (Xs − Xs ) dXs + (Xs − Xsn )2 (κ2s − |κns |2 )dXs . 3 0 s 0

Let us see that
$$\lim_{n \to \infty} E[N^n_\tau]\, E\Big[ \int_0^\tau (X_s - X^n_s)^2 \big( \kappa_s^2 - |\kappa^n_s|^2 \big)\, d\langle X \rangle_s \Big] = 0. \tag{9}$$


M. Fukasawa

In fact, putting
$$\epsilon_n = \sup_{0 \le s \le \tau} \big| \kappa_s^2 - |\kappa^n_s|^2 \big|, \qquad V^n = E[N^n_\tau] \int_0^\tau (X_s - X^n_s)^2\, d\langle X \rangle_s,$$
we have
$$E[N^n_\tau] \int_0^\tau (X_s - X^n_s)^2 \big| \kappa_s^2 - |\kappa^n_s|^2 \big|\, d\langle X \rangle_s \le \epsilon_n V^n.$$
By the assumptions, $\epsilon_n$ is uniformly bounded and $V^n$ is uniformly integrable. By Karatzas and Shreve (1991), 3.4.5 (iv), the $\langle M \rangle$-continuity of $\kappa$ implies that $\epsilon_n \to 0$ in probability under (6). Since $\epsilon_n$ is bounded, $\epsilon_n V^n$ is uniformly integrable, which implies $E[\epsilon_n V^n] \to 0$. Similarly, we can show
$$E[N^n_\tau]\, E\Big[ \int_0^\tau |\kappa^n_s|^2 (X_s - X^n_s)^3\, dX_s \Big] = E[N^n_\tau]\, E\Big[ \int_0^\tau |\kappa^n_s|^2 (X_s - X^n_s)^3 \psi_s\, d\langle X \rangle_s \Big] \to 0.$$
Here we have used the $\langle M \rangle$-continuity of $X$ instead of $\kappa$. So far, we have

Here we have used the M -continuity of X instead of κ. So far, we have

$$\liminf_{n \to \infty} E[N^n_\tau]\, E[\langle Z^n \rangle_\tau] = \liminf_{n \to \infty} E[N^n_\tau]\, E\Bigg[ \sum_{j=0}^{\infty} \frac{1}{6} \kappa^2_{\tau^n_j} \Big( (X_{\tau^n_{j+1} \wedge \tau} - X^n_{\tau^n_j \wedge \tau})^4 - (X_{\tau^n_j \wedge \tau} - X^n_{\tau^n_j \wedge \tau})^4 \Big) \Bigg].$$
Let us denote $E_j[\cdot] = E[\,\cdot \,|\, \mathcal{F}_{\tau^n_j \wedge \tau}]$ and put
$$\check{X}_j = X_{\tau^n_{j+1} \wedge \tau} - E_j[X_{\tau^n_{j+1} \wedge \tau}], \quad \alpha_j = E_j[X_{\tau^n_{j+1} \wedge \tau}] - X^n_{\tau^n_j \wedge \tau}, \quad \beta_j = X_{\tau^n_j \wedge \tau} - X^n_{\tau^n_j \wedge \tau}.$$
Then

 ∞ !  E κ2τ n (Xτ n

j+1 ∧τ

j

j=0



=E 

=E

Let us see that



n

Nτ  j=0 n

Nτ  j=0

j

ˇ j + αj )4 − βj4 ) κ2τjn ((X ,

4



ˇ j4 ] + 4αj Ej [X ˇ j3 ] + 6α2j Ej [X ˇ j2 ] + α4j − βj  . κ2τjn Ej [X  n  Nτ  , lim E[Nτn ]E  κ2τjn α4j − βj4  = 0.

n→∞

In fact, putting

− Xτnjn ∧τ )4 − (Xτjn ∧τ

 " − Xτnn ∧τ )4 

j=0

n ∧τ ] − Xτ n ∧τ = αj − βj , α ˆ j = Ej [Xτj+1 j

Asymptotically Efficient Discrete Hedging it suffices to observe that  n  Nτ  lim E[Nτn ]E  κ2τjn α ˆ 4j  = 0, n→∞

j=0

 n  Nτ  lim E[Nτn ]E  κ2τjn α ˆ 2j βj2  = 0,

n→∞

j=0



lim E[Nτn ]E 

n→∞



lim E[Nτn ]E 

n→∞



n

Nτ  j=0 n

Nτ  j=0

339

κ2τjn |ˆ αj |3 |βj | = 0, 

(10)

κ2τjn |ˆ αj ||βj |3  = 0.

n ∧τ − Xτ n ∧τ . Let us prove (10). Let C be a generic constant and put Qj = Xτj+1 j   n   n Nτ Nτ   κ2τjn α ˆ 4j  ≤ CE[Nτn ]E  |Ej [Qj ]|4  E[Nτn ]E 

j=0

j=0

 n  Nτ  C CE[Xτ ] E Ej [Qj ] = . ≤ E[Nτn ]2 E[Nτn ]2 j=0

We have used Condition S and (7) for the first and second inequalities respectively. The right-hand side converges to 0 since by (6), 0
$$P_2^c(\mu) = E_\mu\left[ I(X > c)\, \frac{f^2(X)}{f_\mu^2(X)} \right].$$
By substituting $f_\mu(x)$ given above into $P_2^c(\mu)$, one can obtain the following results:
$$P_2^c(\mu) = E\left[ I(X > c)\, \frac{f(X)}{f_\mu(X)} \right] = M(\mu)\, E\left\{ I(X > c) \exp(-\mu X) \right\} \le M(\mu) \exp(-\mu c), \tag{5}$$
where $\mu$ and $c$ are assumed to be positive numbers for this upper bound to hold. To minimize the logarithm of this upper bound, its first-order condition satisfies
$$\frac{d \ln\big( M(\mu) \exp(-\mu c) \big)}{d\mu} = \frac{M'(\mu)}{M(\mu)} - c = 0.$$
Let $\mu^\star$ solve $\frac{M'(\mu^\star)}{M(\mu^\star)} = c$. It follows that the expected value of $X$ under the new probability measure $P_{\mu^\star}$ is exactly the loss threshold $c$. This is confirmed by evaluating $E_{\mu^\star}(X) = \int x f_{\mu^\star}(x)\, dx$ and substituting $f_{\mu^\star}(x)$ defined in (4), so that
$$E_{\mu^\star}(X) = \frac{M'(\mu^\star)}{M(\mu^\star)} = c. \tag{6}$$
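For the Gaussian case discussed below, the first-order condition can be checked numerically: minimizing the upper bound $M(\mu)e^{-\mu c}$ of (5) over a grid of $\mu$ values recovers $\mu^\star = c$. A minimal sketch (the helper name `bound` is ours, not from the text):

```python
import math

def bound(mu, c):
    """Upper bound (5) on the second moment for X ~ N(0,1):
    M(mu) * exp(-mu*c), with moment generating function M(mu) = exp(mu**2 / 2)."""
    return math.exp(0.5 * mu * mu - mu * c)

c = 3.0
grid = [c + 0.01 * k for k in range(-100, 101)]   # mu values around c
mu_star = min(grid, key=lambda mu: bound(mu, c))  # numerical minimizer
# mu_star == 3.0, matching the first-order condition M'(mu)/M(mu) = c
```

The bound is strictly convex in $\mu$, so the grid minimizer coincides with $c$ whenever $c$ lies on the grid.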

Efficient Importance Sampling Estimation

Remark that this whole procedure of exponential change of measure can be extended to higher dimensions. From the simulation point of view, the final result shown in (6) is appealing because the rare event of default under the original probability measure is no longer rare under this new measure $P_{\mu^\star}$. In addition, even in situations where moment generating functions are difficult or impossible to find, one can still possibly use other change of measure techniques to fulfill (6), i.e., "the expected value of a defaultable asset is equal to its debt value" in financial terms. We will see such an example in Section 3 for the high-dimensional Black–Cox model.

In the concrete case of $X$ being a standard normal random variable, the minimizer $\mu^\star$ is exactly equal to $c$. This result is derived from a direct calculation of $c = M'(\mu^\star)/M(\mu^\star)$, given that the moment generating function of $X$ is $M(\mu) = \exp(\mu^2/2)$. The twisted (tilted) density function $f_{\mu^\star}(x) = \exp(\mu^\star x) f(x)/M(\mu^\star)$ becomes $\exp(-(x-c)^2/2)/\sqrt{2\pi}$. Hence, random samples are generated from $X \sim N(c, 1)$ under this new density function, instead of $X \sim N(0, 1)$ under the original measure. The default probability $P_1^c$ can be explicitly expressed by
$$P_1^c = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} I(x > c)\, \frac{e^{-x^2/2}}{e^{-(x-c)^2/2}}\, e^{-(x-c)^2/2}\, dx = E_c\left[ I(X > c)\, e^{c^2/2 - cX} \right], \tag{7}$$
and its second moment becomes
$$P_2^c(c) := E_c\left[ I(X > c)\, e^{c^2 - 2cX} \right].$$

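A minimal Monte Carlo sketch of the efficient estimator (7): draw from the tilted density $N(c, 1)$ and weight each sample exceeding $c$ by $e^{c^2/2 - cX}$. The function names here are our own illustration, not from the original text:

```python
import math
import random

def efficient_is_estimate(c, n_samples=100_000, seed=0):
    """Estimate P(X > c) for X ~ N(0,1) via the tilted density N(c, 1).

    Each sample exceeding c is weighted by the likelihood ratio
    f(x)/f_c(x) = exp(c**2/2 - c*x), as in (7)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(c, 1.0)  # draw under the tilted measure
        if x > c:
            total += math.exp(0.5 * c * c - c * x)
    return total / n_samples

def exact_tail(c):
    """Exact value N(-c) of the default probability."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))
```

For $c = 3$ the estimate lands close to the exact value $N(-3) \approx 0.00135$, with a standard error roughly an order of magnitude below that of plain Monte Carlo, in line with Table 1.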
Naturally, one can pose an optimization problem which minimizes the variance over all possible importance sampling estimators associated with $f_\mu(x) = \exp(-(x-\mu)^2/2)/\sqrt{2\pi}$ for each $\mu \in \mathbb{R}$. That is, given that
$$P_1^c = E_\mu\left[ I(X > c)\, \frac{f(X)}{f_\mu(X)} \right] = E_\mu\left[ I(X > c)\, e^{\mu^2/2 - \mu X} \right], \tag{8}$$
we seek to minimize its second moment
$$P_2^c(\mu) = E_\mu\left[ I(X > c)\, e^{\mu^2 - 2\mu X} \right]. \tag{9}$$
The associated minimizer guarantees the minimal variance within the $\mu$-parametrized measures, but solving for the minimizer via (9) requires numerical computation. In Section 2.2, we compare the variance reduction performance of the optimal estimator (8), obtained by minimizing (9), and the efficient estimator (7). We will see that in our numerical experiments these two estimators reach the same level of accuracy, but their computing times are very different: efficient estimators perform more effectively than optimal estimators.

2.1. Asymptotic variance analysis by large deviation principle

It is known that the number of simulations needed to estimate a quantity to a certain level of accuracy should be proportional to $P_2^c(\mu)/(P_1^c)^2 - 1$. See, for example, Section 4.5 in [7]. If the decay rates of $P_2^c(\mu)$ and $(P_1^c)^2$ are asymptotically the same, we say that the asymptotic variance rate is zero and the corresponding importance sampling estimator is asymptotically optimal or efficient. In this section, we aim to prove that when the density parameter $\mu$ is chosen as the default loss threshold $c$, the variance reduction achieved by the importance sampling scheme defined in (7) is asymptotically optimal, by an application of Cramér's theorem in large deviation theory. Recall that

Theorem 1 (Cramér's theorem [7]). Let $\{X_i\}$ be real-valued i.i.d. random variables under $P$ with $E X_1 < \infty$. For any $x \ge E\{X_1\}$, we have
$$\lim_{n \to \infty} \frac{1}{n} \ln P\left( \frac{S_n}{n} \ge x \right) = -\inf_{y \ge x} \Gamma^*(y), \tag{10}$$

where $S_n = \sum_{i=1}^n X_i$ denotes the sample sum of size $n$, $\Gamma(\theta) = \ln E\left[ e^{\theta X_1} \right]$ denotes the cumulant function, and $\Gamma^*(x) = \sup_{\theta \in \mathbb{R}} [\theta x - \Gamma(\theta)]$.

From this theorem and the moment generating function $E\{\exp(\theta X)\} = \exp(\theta^2/2)$ for $X$ a standard normal variate, we obtain the following asymptotic approximations.

Lemma 1 (Asymptotically Optimal Importance Sampling). When $c$ approaches infinity, the variance rate of the estimator defined in (7) approaches zero. That is,
$$\lim_{c \to \infty} \frac{1}{c^2} \ln P_2^c(c) = 2 \lim_{c \to \infty} \frac{1}{c^2} \ln P_1^c = -1.$$
Therefore, this importance sampling is asymptotically optimal or efficient.

Proof. Given that $X_i$, $i = 1, 2, \ldots$, are i.i.d. one-dimensional standard normal random variables, it is easy to obtain
$$\lim_{n \to \infty} \frac{1}{n} \ln P\left( \frac{\sum_{i=1}^n X_i}{n} \ge x \right) = -\frac{x^2}{2},$$
or equivalently $P\left( \frac{\sum_{i=1}^n X_i}{n} \ge x \right) \approx \exp\left( -\frac{n x^2}{2} \right)$, by an application of Theorem 1. Introduce a rescaled default probability $P(X \ge \sqrt{n}\, x)$ for $n$ large; then
$$P\left( X \ge \sqrt{n}\, x \right) = P\left( \frac{\sum_{i=1}^n X_i}{\sqrt{n}} \ge \sqrt{n}\, x \right) = P\left( \frac{\sum_{i=1}^n X_i}{n} \ge x \right),$$
in which each random variable $X_i$ has the same distribution as $X$. Hence, letting $1 \ll c := \sqrt{n}\, x$, the approximation to the first moment of $I(X \ge c)$, i.e., the default probability $P_1^c$, is obtained:
$$\lim_{n \to \infty} \frac{1}{(\sqrt{n}\, x)^2} \ln P\left( X \ge \sqrt{n}\, x \right) = -\frac{1}{2},$$
or equivalently
$$P(X \ge c) \approx \exp\left( -\frac{c^2}{2} \right). \tag{11}$$
Given the second moment defined in (9), it is easy to see that
$$P_2^c(\mu) = E_{-\mu}\left\{ I(X > c) \right\} e^{\mu^2} = E_0\left\{ I(X > \mu + c) \right\} e^{\mu^2}.$$
The first line is obtained by changing the measure via $dP_\mu/dP_{-\mu}$, and the second line shifts the mean value of $X \sim N(-\mu, 1)$ to 0. With the choice $\mu = c$, we get $P_2^c(c) = E_0\left\{ I(X > 2c) \right\} e^{c^2}$, and its approximation $P_2^c(c) \approx \exp(-c^2)$ can be easily obtained from the same derivation as in (11). As a result, we verify that

the decay rate of the second moment $P_2^c(c)$ is twice the decay rate of the probability $P_1^c$, i.e.,
$$\lim_{c \to \infty} \frac{1}{c^2} \log P_2^c(c) = 2 \lim_{c \to \infty} \frac{1}{c^2} \log P_1^c = -1. \tag{12} \quad \square$$
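The two decay rates in (12) can be checked numerically from the closed forms $P_1^c = N(-c)$ and $P_2^c(c) = N(-2c)\,e^{c^2}$ used in the proof. A sketch (the helper names are ours):

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def rates(c):
    """Return (ln P1c / c^2, ln P2c(c) / c^2), using P1c = N(-c) and
    P2c(c) = N(-2c) * exp(c^2); the log is assembled to avoid overflow."""
    r1 = math.log(Phi(-c)) / c ** 2
    r2 = (math.log(Phi(-2.0 * c)) + c * c) / c ** 2
    return r1, r2

# As c grows, the first rate tends to -1/2 and the second to -1 = 2 * (-1/2).
```

At $c = 8$ the two rates are already within a few percent of $-1/2$ and $-1$, respectively.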

We have shown that the proposed importance sampling scheme defined in (7) is efficient. This zero variance rate should be understood as optimal variance reduction in an asymptotic sense, because the variance rate cannot be less than zero.

Remark. This lemma can be generalized to any finite dimension and possibly extended to other generalized distributions such as Student's t. We refer to [14] for further details with applications in credit risk.

For our thematic topic of estimating joint default probabilities under the high-dimensional Black–Cox model, this technique unfortunately does not work, because the moment generating function of multivariate first passage times is unknown. We overcome this difficulty by considering a simplified problem, namely changing the measure for the joint distribution of the underlying processes at maturity rather than for their first passage times. Remarkably, we find that this measure change can still be proven asymptotically optimal for the original first passage time problem. Details can be found in Section 3.

2.2. Numerical results

Table 1 demonstrates the performance of two importance sampling schemes, the optimal estimator and the efficient estimator, for estimating the default probability $P(X > c)$ for $X \sim N(0, 1)$ with various loss threshold values $c$. In Column 2, the exact solutions $N(-c)$ are reported. In each simulation column, Mean and SE stand for the sample mean and the sample standard error, respectively. IS($\mu = c$) represents the scheme in (7) using the pre-determined choice $\mu = c$ suggested by the asymptotic analysis in Lemma 1, while IS($\mu = \mu^\star$) represents the optimal scheme in (8) using $\mu = \mu^\star$, which minimizes $P_2^c(\mu)$ numerically. We observe that the standard errors obtained from these two importance sampling schemes are comparable, with the same order of accuracy, while the computing time is not.
From the last row, the optimal importance sampling scheme IS($\mu = \mu^\star$) takes about 50 times more computing time than the efficient importance sampling scheme IS($\mu = c$). These numerical experiments are implemented in Matlab on a laptop PC with a 2.40GHz Intel Duo CPU T8300. There have been extensive studies and applications of this concept of minimizing an upper bound of the second moment under a parametrized twisted probability and then constructing an importance sampling scheme; see Chapter 9 of [11] for various applications in risk management. However, it remains to check whether such a scheme is efficient (i.e., has zero variance rate) or not.

Table 1. Estimation of the default probability P(X > c) with different loss thresholds c when X ∼ N(0, 1). The total number of simulations is 10,000.

          DP        Basic MC             IS(µ = c)             IS(µ = µ⋆)
  c       true      Mean      SE         Mean      SE          Mean      SE
  1       0.1587    0.1566    0.0036     0.1592    0.0019      0.1594    0.0018
  2       0.0228    0.0212    0.0014     0.0227    3.49e-04    0.0225    3.37e-04
  3       0.0013    1.00e-03  3.16e-04   0.0014    2.53e-05    0.0014    2.51e-05
  4       3.17e-05  –         –          3.13e-05  6.62e-07    3.11e-05  6.66e-07
  time              0.004659             0.020904              1.060617

3. Efficient importance sampling for the high-dimensional first passage time problem

In this section, we review an importance sampling scheme developed by Han and Vestal [13] for the first passage time problem (3), in order to improve the convergence of Monte Carlo simulation. In addition, we provide a variance analysis to justify that the importance sampling scheme is asymptotically optimal (or efficient) in one dimension.

The basic Monte Carlo simulation approximates the joint default probability defined in (3) by the estimator
$$DP \approx \frac{1}{N} \sum_{k=1}^{N} \prod_{i=1}^{n} I\left( \tau_i^{(k)} \le T \right), \tag{13}$$
where $\tau_i^{(k)}$ denotes the $k$th i.i.d. sample of the $i$th default time defined in (2) and $N$ denotes the total number of simulations.

By the Girsanov Theorem, one can construct an equivalent probability measure $\tilde{P}$ defined by the following Radon–Nikodym derivative:
$$\frac{dP}{d\tilde{P}} = Q_T(h_\cdot) = \exp\left( \int_0^T h(s, S_s) \cdot d\widetilde{W}_s - \frac{1}{2} \int_0^T \|h(s, S_s)\|^2\, ds \right), \tag{14}$$
where we denote by $S_s = (S_{1s}, \ldots, S_{ns})$ the state variable (asset value process) vector and by $\widetilde{W}_s = (\widetilde{W}_{1s}, \ldots, \widetilde{W}_{ns})$ the vector of standard Brownian motions, respectively. The function $h(s, S_s)$ is assumed to satisfy Novikov's condition, so that $\widetilde{W}_t = W_t + \int_0^t h(s, S_s)\, ds$ is a vector of Brownian motions under $\tilde{P}$.

The importance sampling scheme proposed in [13] is to select a constant vector $h = (h_1, \ldots, h_n)$ which satisfies the following $n$ conditions:
$$\tilde{E}\{S_{iT} \,|\, \mathcal{F}_0\} = B_i, \quad i = 1, \ldots, n. \tag{15}$$
These equations can be simplified by using the explicit log-normal density of $S_{iT}$, so we deduce the following system of linear equations for the $h_i$'s:
$$\sum_{j=1}^{n} \rho_{ij} h_j = \frac{\mu_i}{\sigma_i} - \frac{\ln(B_i/S_{i0})}{\sigma_i T}, \quad i = 1, \ldots, n. \tag{16}$$
If the correlation matrix $\Sigma = (\rho_{ij})_{1 \le i,j \le n}$ is non-singular, the vector $h$ exists uniquely, so that the equivalent probability measure $\tilde{P}$ is uniquely determined. The joint default probability defined in (3) becomes
$$DP = \tilde{E}\left\{ \prod_{i=1}^{n} I(\tau_i \le T)\, Q_T(h) \,\Big|\, \mathcal{F}_0 \right\}. \tag{17}$$
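The linear system (16) has a simple closed-form solution when the correlation matrix is equicorrelated, $\rho_{ij} = \rho$ for $i \ne j$, which is the case in the numerical examples below. A sketch under that assumption (the helper name `drift_vector` is ours):

```python
import math

def drift_vector(mu, sigma, S0, B, T, rho):
    """Solve (16), sum_j rho_ij h_j = mu_i/sigma_i - ln(B_i/S_i0)/(sigma_i*T),
    for the equicorrelated matrix rho_ij = rho (i != j), rho_ii = 1, using
    its closed-form inverse (1-rho)^{-1} (I - rho/(1+(n-1)rho) J)."""
    n = len(mu)
    b = [mu[i] / sigma[i] - math.log(B[i] / S0[i]) / (sigma[i] * T)
         for i in range(n)]
    f = rho / (1.0 + (n - 1) * rho)
    s = sum(b)
    return [(b[i] - f * s) / (1.0 - rho) for i in range(n)]

# Parameters of Table 3 (three names, rho = 0.3):
h = drift_vector([0.05] * 3, [0.4, 0.4, 0.3], [100] * 3, [50, 50, 60], 1.0, 0.3)
```

Multiplying the solution back by the correlation matrix recovers the right-hand side of (16), which is how we validate the closed-form inverse.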

Equation (15) requires that, under the new probability measure $\tilde{P}$, the expectation of each asset's value at time $T$ be equal to its debt level. When the debt level $B$ of a company is much smaller than its initial asset value $S_0$ (see examples in Table 2), or when the returns of any two names are highly negatively correlated (see examples in Table 3), joint default events are rare. Under the proposed importance sampling scheme, random samples drawn under the new measure $\tilde{P}$ cause more defaults than samples drawn under $P$.

Table 2. Comparison of single-name default probability by basic Monte Carlo (BMC), exact solution, and importance sampling (IS). The number of simulations is 10^4 and an Euler discretization for (1) is used with time step size T/400, where T is one year. Other parameters are S_0 = 100, µ = 0.05 and σ = 0.4. Standard errors are shown in parentheses.

  B     BMC               Exact Sol     IS
  50    0.0886 (0.0028)   0.0945        0.0890 (0.0016)
  20    – (–)             7.7310e-05    7.1598e-05 (2.3183e-06)
  1     – (–)             1.3341e-30    1.8120e-30 (3.4414e-31)
Table 2 and Table 3 illustrate numerical results for estimating the (joint) default probabilities in a single-name case and a three-name case. The exact solution for the single-name default probability,
$$1 - N(d_2^+) + \left( \frac{S_0}{B} \right)^{1 - 2\mu/\sigma^2} N(d_2^-), \tag{18}$$
with $d_2^\pm = \frac{\pm \ln(S_0/B) + (\mu - \sigma^2/2)T}{\sigma \sqrt{T}}$, can be found in [8]. This result is obtained from the distribution of the running minimum of Brownian motion. However, there is no closed-form solution for the joint default probability of three names in Table 3. It is worth noting the drastic difference in joint default probabilities caused by different correlations. Furthermore, Table 4 shows the capability of the proposed importance sampling method to treat first passage time problems in high dimension.
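Formula (18) can be evaluated directly; with the parameters of Table 2 it reproduces the "Exact Sol" column. A sketch (`single_name_default_prob` is our naming, not from the paper):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def single_name_default_prob(S0, B, mu, sigma, T):
    """First passage probability P(min_{0<=t<=T} S_t <= B) for the geometric
    Brownian motion dS = mu*S dt + sigma*S dW, as in (18)."""
    d2p = ( math.log(S0 / B) + (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2m = (-math.log(S0 / B) + (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 1.0 - Phi(d2p) + (S0 / B) ** (1.0 - 2.0 * mu / sigma ** 2) * Phi(d2m)

# single_name_default_prob(100, 50, 0.05, 0.4, 1.0) ≈ 0.0945 (Table 2, B = 50)
```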

Table 3. Comparison of three-name joint default probability by basic Monte Carlo (BMC) and importance sampling (IS). The number of simulations is 10^4 and an Euler discretization for (1) is used with time step size T/100, where T is one year. Other parameters are S_{10} = S_{20} = S_{30} = 100, µ_1 = µ_2 = µ_3 = 0.05, σ_1 = σ_2 = 0.4, σ_3 = 0.3, B_1 = B_2 = 50, and B_3 = 60. Standard errors are shown in parentheses.

  ρ      BMC                       IS
  0.3    0.0049 (6.9832e-04)       0.0057 (1.9534e-04)
  0      3.0000e-04 (1.7319e-04)   6.4052e-04 (6.9935e-05)
  −0.3   – (–)                     2.2485e-05 (1.1259e-05)

Table 4. Comparison of multi-name joint default probabilities by basic Monte Carlo (BMC) and importance sampling (IS) under the high-dimensional Black–Cox model. Here n denotes the dimension, i.e., the total number of firms. The number of simulations is 3 × 10^4 and an Euler discretization for (1) is used with time step size T/100, where T is one year. Other parameters are S_0 = 100, µ = 0.05, σ = 0.3, ρ = 0.3, and B = 50.

         Basic MC               Importance Sampling
  n      Mean       SE          Mean        SE
  2      1.1e-03    3.31e-04    1.04e-03    2.83e-05
  5      –          –           6.36e-06    3.72e-07
  10     –          –           2.90e-07    2.66e-08
  15     –          –           9.45e-09    1.16e-09
  20     –          –           1.15e-09    1.98e-10
  25     –          –           2.06e-10    3.84e-11
  30     –          –           6.76e-11    2.36e-11
  35     –          –           1.35e-11    2.89e-12
  40     –          –           6.59e-12    1.58e-12
  45     –          –           3.25e-12    1.08e-12
  50     –          –           6.76e-13    2.26e-13
3.1. Asymptotic variance analysis by large deviation principle

We provide a theoretical verification that the importance sampling scheme developed above is asymptotically optimal for the one-dimensional first passage time problem under geometric Brownian motion. This problem has also been considered in Carmona et al. [8].

Our proof is based on the Freidlin–Wentzell theorem [6, 9] in large deviation theory, which we use to approximate the default probability and the second moment of the importance sampling estimator defined in (17). We consider the scale $\varepsilon = -(\ln(B/S_0))^{-1}$ being small, or equivalently $0 < B \ll S_0$: the current asset value $S_0$ is much larger than the debt value $B$. Our asymptotic results show that the second moment approximation is the square of the first moment (or default probability) approximation. Therefore, we attain the optimality of variance reduction in an asymptotic sense, i.e., the proposed importance sampling scheme is efficient.

Theorem 2 (Efficient Importance Sampling). Let $S_t$ denote the asset value following the log-normal process $dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$ with initial value $S_0$, and let $B$ denote the default boundary. We define the default probability and its importance sampling scheme by
$$P_1^\varepsilon = E\left[ I\left( \min_{0 \le t \le T} S_t \le B \right) \right] = \tilde{E}\left[ I\left( \min_{0 \le t \le T} S_t \le B \right) Q_T(h) \right],$$
where the Radon–Nikodym derivative $Q_T(h)$ is defined in (14). The second moment of this estimator is denoted by
$$P_2^\varepsilon(h) = \tilde{E}\left[ I\left( \min_{0 \le t \le T} S_t \le B \right) Q_T^2(h) \right].$$
By the choice of $h = (\mu T + 1/\varepsilon)/(\sigma T)$ with the scale $\varepsilon$ defined by $-1/\varepsilon = \ln(B/S_0)$, the expected value of $S_T$ under $\tilde{P}$ is $B$; that is, $\tilde{E}\{S_T\} = B$. When $\varepsilon$ is small enough, or equivalently $B \ll S_0$, we obtain a zero variance rate, i.e.,
$$\lim_{\varepsilon \to 0} \varepsilon^2 \ln\left( P_2^\varepsilon(h) / (P_1^\varepsilon)^2 \right) = 0,$$
so that the importance sampling scheme is efficient.

Proof. Recall that the one-dimensional default probability is defined by
$$P\left( \inf_{0 \le t \le T} S_t \le B \right) = E\left[ I\left( \inf_{0 \le t \le T} \left\{ \varepsilon \left( \mu - \frac{\sigma^2}{2} \right) t + \varepsilon \sigma W_t \right\} \le -1 \right) \right], \tag{19}$$
where $S_t = S_0 e^{(\mu - \sigma^2/2)t + \sigma W_t}$, we have used the strict monotonicity of the logarithmic transformation, and we introduce the scaling $\ln(B/S_0) = -\frac{1}{\varepsilon}$. For a small parameter $\varepsilon$, the default probability will be small, in line with financial intuition, because the debt-to-asset ratio $B/S_0$ is small. By an application of the Freidlin–Wentzell theorem, it is easy to prove that the rate function of (19) is $\frac{1}{2\sigma^2 T}$. That is, the rescaled default probability satisfies
$$P\left( \inf_{0 \le t \le T} S_t \le B \right) \approx \exp\left( \frac{-1}{\varepsilon^2}\, \frac{1}{2\sigma^2 T} \right), \tag{20}$$
when $\varepsilon$ is small.

Recall that under the measure change defined in (17), the price dynamics becomes $S_t = S_0 e^{(\mu - \frac{\sigma^2}{2} - \sigma h)t + \sigma \widetilde{W}_t}$ with $h = \frac{\mu}{\sigma} - \frac{\ln(B/S_0)}{\sigma T}$. The second moment $P_2^\varepsilon(h)$ becomes
$$\begin{aligned}
\tilde{E}\left[ I\left( \inf_{0 \le t \le T} S_t \le B \right) e^{2h\widetilde{W}_T - h^2 T} \right]
&= \hat{E}\left[ I\left( \inf_{0 \le t \le T} S_0 e^{(\mu - \frac{\sigma^2}{2} + \sigma h)t + \sigma \widehat{W}_t} \le B \right) \right] e^{h^2 T} \\
&= \hat{E}\left[ I\left( \inf_{0 \le t \le T} \varepsilon \left( 2\mu - \frac{\sigma^2}{2} + \frac{1}{\varepsilon T} \right) t + \varepsilon \sigma \widehat{W}_t \le -1 \right) \right] \times e^{\left( \frac{\mu}{\sigma} + \frac{1}{\varepsilon \sigma T} \right)^2 T},
\end{aligned}$$
where the measure change $d\hat{P}/d\tilde{P}$ is defined by $Q_T(2h)$ for the second line, and we incorporate the same scaling $\ln(B/S_0) = -\frac{1}{\varepsilon}$ to rescale the problem in the last line. By the Freidlin–Wentzell theorem, the rate function of the expectation is $\frac{2}{\sigma^2 T}$. Consequently, the approximation
$$\tilde{E}\left[ I\left( \inf_{0 \le t \le T} S_t \le B \right) e^{2h\widetilde{W}_T - h^2 T} \right] \approx \exp\left( \frac{-1}{\varepsilon^2}\, \frac{1}{\sigma^2 T} \right) \tag{21}$$
is derived. By $\lim_{\varepsilon \to 0} \varepsilon^2 \ln P_2^\varepsilon(h) = 2 \lim_{\varepsilon \to 0} \varepsilon^2 \ln P_1^\varepsilon$, we confirm that the variance rate of this importance sampling is asymptotically zero, so that this scheme is efficient. $\square$

Remark. The same result can be obtained from a PDE argument studied in [13].
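The one-dimensional scheme of Theorem 2 can be sketched in a few lines: simulate the log-price under $\tilde{P}$ (drift shifted by $-\sigma h$) on a time grid, flag paths whose discrete minimum crosses $B$, and weight them by $Q_T(h) = \exp(h\widetilde{W}_T - h^2 T/2)$. The function name and the plain grid check are our choices; the discrete minimum slightly undershoots the continuous crossing probability, consistent with Table 2:

```python
import math
import random

def first_passage_is(S0, B, mu, sigma, T, n_steps=400, n_paths=10_000, seed=1):
    """One-dimensional version of scheme (17): simulate S under P-tilde, whose
    drift is shifted so that E-tilde[S_T] = B, and weight each barrier-crossing
    path by the Radon-Nikodym derivative exp(h*W_T - h**2*T/2)."""
    h = mu / sigma - math.log(B / S0) / (sigma * T)
    dt = T / n_steps
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        logS = math.log(S0)
        wT = 0.0          # terminal value of the Brownian motion under P-tilde
        hit = False
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            wT += dw
            logS += (mu - sigma * h - 0.5 * sigma ** 2) * dt + sigma * dw
            if logS <= math.log(B):
                hit = True
        if hit:
            total += math.exp(h * wT - 0.5 * h * h * T)
    return total / n_paths
```

With the Table 2 parameters ($S_0 = 100$, $B = 50$, $\mu = 0.05$, $\sigma = 0.4$, $T = 1$) this yields an estimate near 0.089, slightly below the exact value 0.0945 because of the discrete-time barrier check.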

4. Conclusion

Estimation of joint default probabilities under a first passage time problem in the structural-form model is tackled by importance sampling. Imposing "the expected asset value at debt maturity equals its debt value" as a condition, importance sampling schemes can be uniquely determined within a family of parametrized probability measures. This approach overcomes the hurdle of the exponential change of measure, which requires the existence of the moment generating function of the first passage times. By the large deviation principle, our proposed importance sampling scheme is asymptotically optimal, or efficient, for rare event simulation.

Acknowledgment

Work supported by NSC 97-2115-M-007-002-MY2, Taiwan. We are also grateful to an anonymous referee and Professor Nicolas Privault. Other acknowledgments: NCTS, National Tsing-Hua University; TIMS, National Taiwan University; CMMSC, National Chiao-Tung University.


References

[1] B. Arouna, "Robbins–Monro algorithms and variance reduction in finance," Journal of Computational Finance, 7(2) (Winter 2003/04).
[2] B. Arouna, "Adaptive Monte Carlo method, a variance reduction technique," Monte Carlo Methods Appl., 10(1) (2004), 1–24.
[3] F. Black and M. Scholes, "The Pricing of Options and Corporate Liabilities," Journal of Political Economy, 81 (1973), 637–654.
[4] F. Black and J. Cox, "Valuing Corporate Securities: Some Effects of Bond Indenture Provisions," Journal of Finance, 31(2) (1976), 351–367.
[5] T.R. Bielecki and M. Rutkowski, Credit Risk: Modeling, Valuation and Hedging, Springer, 2002.
[6] J.A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation, Wiley-Interscience, Applied Probability and Statistics Series, New York, 1990.
[7] J.A. Bucklew, Introduction to Rare Event Simulation, Springer, 2003.
[8] R. Carmona, J.-P. Fouque, and D. Vestal, "Interacting Particle Systems for the Computation of Rare Credit Portfolio Losses," Finance and Stochastics, 13(4) (2009), 613–633.
[9] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications, 2nd ed., Springer, 1998.
[10] J.-P. Fouque and C.-H. Han, "Variance Reduction for Monte Carlo Methods to Evaluate Option Prices under Multi-factor Stochastic Volatility Models," Quantitative Finance, 4(5) (2004), 1–10.
[11] P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer-Verlag, New York, 2003.
[12] C.-H. Han, W.-H. Liu, and Z.-Y. Chen, "An Improved Procedure for VaR/CVaR Estimation under Stochastic Volatility Models," submitted, 2010.
[13] C.-H. Han and D. Vestal, "Estimating the probability of joint default in structural form by efficient importance sampling," working paper, National Tsing-Hua University, 2010.
[14] C.-H. Han and C.-T. Wu, "Efficient importance sampling for estimating lower tail probabilities under Gaussian and Student's distributions," preprint, National Tsing-Hua University, 2010.
[15] R. Merton, "On the Pricing of Corporate Debt: The Risk Structure of Interest Rates," The Journal of Finance, 29 (1974), 449–470.
[16] C. Zhou, "The Term Structure of Credit Spreads with Jump Risk," Journal of Banking & Finance, 25(11) (November 2001), 2015–2040.
[17] C. Zhou, "An Analysis of Default Correlations and Multiple Defaults," The Review of Financial Studies, 14(2) (2001), 555–576.

Chuan-Hsiang Han
Department of Quantitative Finance
National Tsing Hua University
Hsinchu, Taiwan, 30013, ROC
e-mail: [email protected]

Progress in Probability, Vol. 65, 361–411
© 2011 Springer Basel AG

Market Models of Forward CDS Spreads

Libo Li and Marek Rutkowski

Abstract. The paper re-examines and generalizes the construction of several variants of market models for forward CDS spreads, as first presented by Brigo [10]. We compute explicitly the joint dynamics of some families of forward CDS spreads under a common probability measure. We first examine this problem for single-period CDS spreads under certain simplifying assumptions. Subsequently, we derive, without any restrictions, the joint dynamics under a common probability measure for the family of one- and two-period forward CDS spreads, as well as for the family of one-period and co-terminal forward CDS spreads. For the sake of generality, we work throughout within a general semimartingale framework.

Mathematics Subject Classification (2000). 60H30, 91B70.

Keywords. Credit default swap, market model, LIBOR.

1. Introduction

The market model for forward LIBORs was first examined in papers by Brace et al. [6] and Musiela and Rutkowski [23]. Their approach was subsequently extended by Jamshidian in [16, 17] to the market model for co-terminal forward swap rates. Since then, several papers on alternative market models for LIBORs and other families of forward swap rates have been published. Since the modeling of (non-defaultable) forward swap rates is not presented here, the interested reader is referred, for instance, to Galluccio et al. [14], Pietersz and Regenmortel [25], Rutkowski [26], or the monographs by Brace [5] or Musiela and Rutkowski [24] and the references therein.

To the best of our knowledge, the financial literature on either the existence or the methods of construction of market models for forward CDS spreads is relatively scarce. This apparent gap is a bit surprising, especially when confronted with the market practitioners' approach to credit default swaptions, which hinges on a suitable variant of the Black formula. The standard argument which underpins the validity of this formula is the postulate of lognormality of


credit default swap (CDS) spreads, as discussed, for instance, in Brigo and Morini [11], Jamshidian [18], Morini and Brigo [22], or Rutkowski and Armstrong [28]. In the commonly used intensity-based approach to default risk, this crucial property of forward CDS spreads fails to hold, however (see, e.g., Bielecki et al. [2]), and thus a need for a novel modeling approach arises in a natural way. Recently, attempts have been made to search for explicit constructions of market models for forward CDS spreads in papers by Brigo [8, 9, 10] and Schlögl [29] (related issues were also studied in Lotz and Schlögl [20] and Schönbucher [30]). The present work is inspired by these papers, where, under certain simplifying assumptions, the joint dynamics of a family of CDS spreads were derived explicitly under a common probability measure and a construction of the model was provided.

Our main goal is to derive the joint dynamics of certain families of forward CDS spreads, under a common probability measure, in a general semimartingale setup. Firstly, we will derive the joint dynamics of a family of single-period CDS spreads under the postulate that the interest rate is deterministic. In the second part, we will derive the joint dynamics of a family of single-period CDS spreads under the assumption that the interest rate and the default indicator process are independent. Lastly, without any simplifying assumptions, we will derive both the joint dynamics of a family of one- and two-period CDS spreads and of a family of one-period and co-terminal CDS spreads. We would also like to mention that, although not presented here, the joint dynamics of a family of one-period and co-initial CDS spreads can also be derived using the techniques developed in Subsections 5.3 and 5.4. For each market model, we will also present both the bottom-up approach and the top-down approach to the modeling of CDS spreads.
In the bottom-up approach, one usually starts with a credit risk model and, relying on the assumption that the Predictable Representation Property (PRP) holds, one shows the existence of a family of 'volatility' processes for a family of forward CDS spreads and derives their joint dynamics under a common probability measure. On the other hand, in the top-down approach, for any family of 'volatilities' given in advance, we focus on the direct derivation of the joint dynamics for a given family of forward CDS spreads under a common probability measure. It is fair to point out, however, that in this paper we do not provide a fully developed credit risk model obtained through the top-down approach, since the construction of the default time consistent with the derived dynamics of forward CDS spreads is not studied. In Subsection 5.5, we make an attempt to identify the set of postulates that underpin the top-down approach to a generic model of forward spreads, and we derive the joint dynamics of forward spreads in a fairly general setup. We emphasize the fact that this construction is only feasible for a judiciously chosen family of forward spreads. The work concludes with a brief discussion of the most pertinent open problems that need to be addressed in the context of top-down models.


2. Forward credit default swaps

Let $(\Omega, \mathcal{G}, \mathbb{F}, Q)$ be a filtered probability space, where $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ is the reference filtration, which is assumed to satisfy the usual conditions. We work throughout within the framework of the reduced-form (i.e., intensity-based) methodology. Let us first take the perspective of the bottom-up approach, that is, an approach in which we specify explicitly the default time using some salient probabilistic features, such as the knowledge of its survival process or, equivalently, its hazard process. We thus assume that we are given the default time $\tau$ defined on this space in such a way that the $\mathbb{F}$-survival process $G_t = Q(\tau > t \,|\, \mathcal{F}_t)$ is positive. It is well known that this goal can be achieved in several alternative ways, for instance, using the so-called canonical construction of the random time for a given in advance $\mathbb{F}$-adapted intensity process $\lambda$.

We denote by $\mathbb{G} = (\mathcal{G}_t)_{t \in [0,T]}$ the full filtration, that is, the filtration generated by $\mathbb{F}$ and the default indicator process $H_t = \mathbb{1}_{\{\tau \le t\}}$. Formally, we set $\mathcal{G}_t = \sigma(\mathcal{H}_t, \mathcal{F}_t)$ for every $t \in \mathbb{R}_+$, where $\mathbb{H} = (\mathcal{H}_t)_{t \in [0,T]}$ is the filtration generated by $H$. It is well known that for any $0 \le t < u \le T$ and any $Q$-integrable, $\mathcal{F}_u$-measurable random variable $X$ the following equality is valid (see, for instance, Chapter 5 in Bielecki and Rutkowski [1] or Chapter 3 in Bielecki et al. [4]):
$$E_Q(\mathbb{1}_{\{\tau > u\}} X \,|\, \mathcal{G}_t) = \mathbb{1}_{\{\tau > t\}} G_t^{-1} E_Q(G_u X \,|\, \mathcal{F}_t). \tag{1}$$

Finally, we assume that an underlying default-free term structure model is given, and we denote by $\beta(t, u) = B_t B_u^{-1}$ the default-free discount factor over the time period $[t, u]$ for $0 \le t \le u \le T$, where in turn $B = (B_t,\, t \in [0,T])$ represents the savings account. By assumption, the probability measure $Q$ will be interpreted as the risk-neutral measure. The same basic assumptions underpinning the bottom-up approach will be maintained in Section 5, where alternative variants of market models are presented.

Let $\mathcal{T} = \{T_0 < T_1 < \cdots < T_n\}$ with $T_0 \ge 0$ be a fixed tenor structure and let us write $a_i = T_i - T_{i-1}$. We observe that it is always true that, for every $i = 1, \ldots, n$,
$$Q(\tau > T_{i-1} \,|\, \mathcal{F}_t) \ge Q(\tau > T_i \,|\, \mathcal{F}_t). \tag{2}$$
When dealing with the bottom-up approach, we will make the stronger assumption that the following inequality holds, for every $i = 1, \ldots, n$:
$$Q(\tau > T_{i-1} \,|\, \mathcal{F}_t) > Q(\tau > T_i \,|\, \mathcal{F}_t). \tag{3}$$
We are in a position to formally introduce the concept of the forward credit default swap. To this end, we will describe the cash flows of the two legs of the stylized forward CDS starting at $T_i$ and maturing at $T_l$, where $T_0 \le T_i < T_l \le T_n$. We denote by $\delta_j \in [0, 1)$ the constant recovery rate, which determines the size of the protection payment at time $T_j$ if default occurs between the dates $T_{j-1}$ and $T_j$.

Definition 2.1. The forward credit default swap issued at time $s \in [0, T_i]$, with unit notional and $\mathcal{F}_s$-measurable spread $\kappa$, is determined by its discounted

L. Li and M. Rutkowski

The discounted payoff equals $D_t^{i,l} = P_t^{i,l} - \kappa A_t^{i,l}$ for every $t \in [s, T_i]$, where in turn the discounted payoff of the protection leg equals
$$P_t^{i,l} = \sum_{j=i+1}^{l} (1 - \delta_j)\, \beta(t, T_j)\, \mathbf{1}_{\{T_{j-1} < \tau \le T_j\}} \tag{4}$$
and the discounted payoff of the fee leg per unit of spread equals
$$A_t^{i,l} = \sum_{j=i+1}^{l} a_j\, \beta(t, T_j)\, \mathbf{1}_{\{\tau > T_j\}}. \tag{5}$$
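The two legs (4)–(5) can be evaluated pathwise for a single default scenario. The sketch below assumes a flat short rate $r$, so that $\beta(t,u) = e^{-r(u-t)}$; the tenor dates, recovery rate, spread, and scenario default time are illustrative values, not from the text.

```python
import math

def beta(t, u, r=0.03):
    """Deterministic discount factor beta(t, u) under an assumed flat rate r."""
    return math.exp(-r * (u - t))

def protection_leg(t, tenor, i, l, tau, delta):
    """Discounted protection payoff (4): (1 - delta_j) is paid at T_j
    if default occurs in (T_{j-1}, T_j] (postponed convention)."""
    return sum(
        (1 - delta[j]) * beta(t, tenor[j])
        for j in range(i + 1, l + 1)
        if tenor[j - 1] < tau <= tenor[j]
    )

def fee_leg(t, tenor, i, l, tau):
    """Discounted fee payoff per unit spread (5): a_j is paid at T_j if tau > T_j."""
    return sum(
        (tenor[j] - tenor[j - 1]) * beta(t, tenor[j])
        for j in range(i + 1, l + 1)
        if tau > tenor[j]
    )

tenor = [0.0, 1.0, 2.0, 3.0]   # T_0 < T_1 < T_2 < T_3 (illustrative)
delta = [0.4] * len(tenor)     # flat recovery rate (illustrative)
tau = 1.5                      # scenario: default in (T_1, T_2]
kappa = 0.02                   # illustrative spread

P = protection_leg(0.0, tenor, 0, 3, tau, delta)  # pays (1-delta)*beta(0, T_2)
A = fee_leg(0.0, tenor, 0, 3, tau)                # only the T_1 coupon survives
D = P - kappa * A                                 # discounted swap payoff
```

Note how the postponed convention shows up in the code: the protection payment is discounted from the tenor date $T_j$ following the default, not from the default time itself.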

Remark 2.1. It should be stressed that in the specification of the two legs we have deliberately omitted the so-called accrual payment, that is, the portion of the fee that should be paid if default occurs between two tenor dates, say $T_{j-1}$ and $T_j$. The interested reader is referred to Brigo [8, 9, 10], Brigo and Mercurio [12], or Rutkowski [27] for more details. Specifications (4)–(5) mean that we have adopted here the postponed running CDS convention proposed by Brigo [8, 9, 10]. This particular choice of convention is motivated by the fact that it appears to be the most convenient one for constructing market models of forward CDS spreads.

The value (or the fair price) of the forward CDS at time $t$ is obtained from the risk-neutral valuation formula under $\mathbb{Q}$ applied to the discounted future payoffs. Note that we only consider here the case $t \in [s, T_i]$, although an extension to the general case where $t \in [s, T_l]$ is readily available as well.

Definition 2.2. The value of the forward credit default swap for the protection buyer equals, for every $t \in [s, T_i]$,
$$S_t^{i,l}(\kappa) = \mathbb{E}_{\mathbb{Q}}\big(D_t^{i,l} \,\big|\, \mathcal{G}_t\big) = \mathbb{E}_{\mathbb{Q}}\big(P_t^{i,l} \,\big|\, \mathcal{G}_t\big) - \kappa\, \mathbb{E}_{\mathbb{Q}}\big(A_t^{i,l} \,\big|\, \mathcal{G}_t\big). \tag{6}$$

In the second equality in (6) we used the definition of the process $D^{i,l}$ and the postulated property that the spread $\kappa$ is $\mathcal{F}_s$-measurable, and thus also $\mathcal{G}_t$-measurable for every $t \in [s, T_i]$. Let us observe that $A_t^{i,l} = \mathbf{1}_{\{\tau > T_i\}} A_t^{i,l}$ and $P_t^{i,l} = \mathbf{1}_{\{\tau > T_i\}} P_t^{i,l}$, so that also $D_t^{i,l} = \mathbf{1}_{\{\tau > T_i\}} D_t^{i,l}$. Using formula (1), it is thus straightforward to show that the value at time $t \in [s, T_i]$ of the forward CDS satisfies
$$S_t^{i,l}(\kappa) = \mathbf{1}_{\{\tau > t\}}\, G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(D_t^{i,l} \,\big|\, \mathcal{F}_t\big) = \mathbf{1}_{\{\tau > t\}}\, \widetilde{S}_t^{i,l}(\kappa), \tag{7}$$
where the pre-default price satisfies $\widetilde{S}_t^{i,l}(\kappa) = \widetilde{P}_t^{i,l} - \kappa \widetilde{A}_t^{i,l}$, where we denote
$$\widetilde{P}_t^{i,l} = G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(P_t^{i,l} \,\big|\, \mathcal{F}_t\big), \qquad \widetilde{A}_t^{i,l} = G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(A_t^{i,l} \,\big|\, \mathcal{F}_t\big). \tag{8}$$
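To make the passage from (6) to (7)–(8) fully explicit, one may apply formula (1) term by term. For instance, for the $j$th term of the fee leg (a routine step, spelled out here only for the reader's convenience; $u = T_j > T_i \ge t$):
$$\mathbb{E}_{\mathbb{Q}}\big(a_j\, \beta(t,T_j)\, \mathbf{1}_{\{\tau > T_j\}} \,\big|\, \mathcal{G}_t\big) = \mathbf{1}_{\{\tau > t\}}\, G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(a_j\, \beta(t,T_j)\, G_{T_j} \,\big|\, \mathcal{F}_t\big) = \mathbf{1}_{\{\tau > t\}}\, a_j\, G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(\beta(t,T_j)\, \mathbf{1}_{\{\tau > T_j\}} \,\big|\, \mathcal{F}_t\big),$$
where the first equality is (1) with $X = a_j \beta(t,T_j)$, and the second follows from the tower property of conditional expectation, since $\beta(t,T_j)$ is $\mathcal{F}_{T_j}$-measurable and $\mathbb{E}_{\mathbb{Q}}(\mathbf{1}_{\{\tau > T_j\}} \,|\, \mathcal{F}_{T_j}) = G_{T_j}$.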

More explicitly, the pre-default value at time $t \in [0, T_i]$ of the fee leg per one unit of spread, that is, of the defaultable annuity, is given by
$$\widetilde{A}_t^{i,l} = \sum_{j=i+1}^{l} a_j\, G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(\beta(t, T_j)\, \mathbf{1}_{\{\tau > T_j\}} \,\big|\, \mathcal{F}_t\big). \tag{9}$$
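In the special case where the intensity is a constant $\lambda$ and interest rates are deterministic with a flat rate $r$ independent of default, each term of (9) reduces to $a_j\, e^{-(r+\lambda)(T_j - t)}$, since $G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}(\mathbf{1}_{\{\tau > T_j\}} \,|\, \mathcal{F}_t) = e^{-\lambda(T_j - t)}$. The following sketch assumes this flat setting; all parameter values are illustrative.

```python
import math

def annuity(t, tenor, i, l, r, lam):
    """Pre-default defaultable annuity (9) under an assumed flat rate r and
    flat hazard rate lam: each coupon a_j is discounted at rate r + lam."""
    return sum(
        (tenor[j] - tenor[j - 1]) * math.exp(-(r + lam) * (tenor[j] - t))
        for j in range(i + 1, l + 1)
    )

tenor = [0.0, 1.0, 2.0, 3.0]                         # illustrative tenor
A_tilde = annuity(0.0, tenor, 0, 3, r=0.03, lam=0.02)
```

The effective discount rate $r + \lambda$ reflects the fact that a coupon is received only on survival to its payment date.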

Market Models of Forward CDS Spreads


Similarly, the pre-default value at time $t \in [0, T_i]$ of the protection leg equals
$$\widetilde{P}_t^{i,l} = \sum_{j=i+1}^{l} (1 - \delta_j)\, G_t^{-1}\, \mathbb{E}_{\mathbb{Q}}\big(\beta(t, T_j)\, \mathbf{1}_{\{T_{j-1} < \tau \le T_j\}} \,\big|\, \mathcal{F}_t\big).$$
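In the same flat setting (constant hazard rate $\lambda$, flat rate $r$, flat recovery $\delta$), each protection term becomes $(1-\delta)\, e^{-r(T_j - t)} \big(e^{-\lambda(T_{j-1} - t)} - e^{-\lambda(T_j - t)}\big)$, and for an equally spaced tenor with $a_j = a$ the fair spread $\kappa = \widetilde{P}_t^{i,l} / \widetilde{A}_t^{i,l}$ collapses to the closed form $(1-\delta)(e^{\lambda a} - 1)/a$, independently of $r$. This is a sketch under assumed illustrative parameters, not a formula from the text.

```python
import math

def protection_pv(t, tenor, i, l, r, lam, delta):
    """Pre-default protection leg under assumed flat r and flat hazard lam:
    each term is (1 - delta) * beta(t, T_j) * Q(T_{j-1} < tau <= T_j | tau > t)."""
    return sum(
        (1 - delta) * math.exp(-r * (tenor[j] - t))
        * (math.exp(-lam * (tenor[j - 1] - t)) - math.exp(-lam * (tenor[j] - t)))
        for j in range(i + 1, l + 1)
    )

def annuity_pv(t, tenor, i, l, r, lam):
    """Pre-default defaultable annuity (9) in the same flat setting."""
    return sum(
        (tenor[j] - tenor[j - 1]) * math.exp(-(r + lam) * (tenor[j] - t))
        for j in range(i + 1, l + 1)
    )

tenor = [0.0, 1.0, 2.0, 3.0]     # equally spaced, a_j = 1 (illustrative)
r, lam, delta = 0.03, 0.02, 0.4  # illustrative parameters

kappa = (protection_pv(0.0, tenor, 0, 3, r, lam, delta)
         / annuity_pv(0.0, tenor, 0, 3, r, lam))
# Each protection term equals (1-delta)*(e^{lam*a} - 1)*e^{-(r+lam)*T_j},
# so the discount factors cancel against the annuity terms and
# kappa = (1 - delta) * (exp(lam * a) - 1) / a  with a = 1 here.
```

The cancellation of $r$ is special to the flat, independent setting; in the general model of this section the spread depends on the joint law of $\beta$ and $\tau$ under $\mathbb{Q}$.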